Module 14

LangServe: Deploying Chains & Agents as APIs

⏱ ~3.5 hours ❓ 12-question quiz 🎯 Unlock Module 15

1. What is LangServe?

LangServe is a library built on FastAPI + Pydantic that exposes any LangChain Runnable (chain, agent, retriever) as a REST API with a single add_routes() call.

Auto-Generated Endpoints

POST /invoke, /batch, /stream, and /stream_events, plus GET routes for the input/output schemas and the playground

Built-in Playground

Interactive UI at /playground for testing chains with real inputs — no Postman needed.

RemoteRunnable Client

Call a remote LangServe endpoint exactly like a local chain — same .invoke(), .stream(), .batch() API.

Configurable Fields

Expose model selection, temperature, and other parameters to API callers at runtime.

Type Safety

Pydantic schemas generated from your chain's input/output types. Full OpenAPI spec at /openapi.json.

LangSmith Integration

Traces are automatically sent to LangSmith if LANGCHAIN_TRACING_V2=true is set.

LangServe vs raw FastAPI: LangServe is the right choice when your endpoint is a LangChain runnable. For non-LangChain endpoints (auth, user management, file upload), use raw FastAPI alongside LangServe in the same app.

2. Installation & Project Setup

bash
pip install "langserve[all]" langchain-openai langchain uvicorn

# Or use the LangChain CLI to scaffold a new project
pip install langchain-cli
langchain app new my-langserve-app
cd my-langserve-app
project structure
my-langserve-app/
├── app/
│   ├── server.py        ← FastAPI app + add_routes()
│   └── chains/
│       ├── rag_chain.py
│       └── agent.py
├── Dockerfile
├── pyproject.toml
└── .env
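
The scaffold also includes a Dockerfile. A minimal sketch of what it might contain, assuming dependencies are pinned in a requirements.txt (the CI workflow in section 9 makes the same assumption):

Dockerfile
FROM python:3.11-slim
WORKDIR /code

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY ./app ./app

EXPOSE 8000
CMD ["uvicorn", "app.server:app", "--host", "0.0.0.0", "--port", "8000"]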

3. Basic add_routes()

One call exposes your entire chain as a production API.

app/server.py
from fastapi import FastAPI
from langserve import add_routes
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

app = FastAPI(
    title="My LangServe App",
    description="LangChain chains deployed as REST APIs",
    version="1.0.0",
)

# Define your chain
llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{question}"),
])
chain = prompt | llm | StrOutputParser()

# Mount it: auto-generates all the /chat/* endpoints listed below
add_routes(
    app,
    chain,
    path="/chat",
    enable_feedback_endpoint=True,   # POST /chat/feedback for thumbs up/down
)

# Health check
@app.get("/health")
def health():
    return {"status": "ok"}

# Run: uvicorn app.server:app --reload
Generated endpoints at /chat:
  • POST /chat/invoke — single synchronous call
  • POST /chat/batch — parallel batch calls
  • POST /chat/stream — SSE streaming
  • POST /chat/stream_events — detailed event streaming (astream_events)
  • GET /chat/input_schema — Pydantic input JSON schema
  • GET /chat/output_schema — Pydantic output JSON schema
  • GET /chat/playground — interactive UI
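
You can hit these endpoints directly. LangServe wraps the chain's input in an "input" envelope and returns results under "output". A minimal sketch, assuming the server above is running on port 8000:

bash
# Single synchronous call
curl -X POST http://localhost:8000/chat/invoke \
  -H "Content-Type: application/json" \
  -d '{"input": {"question": "What is LangServe?"}}'

# Token-by-token SSE streaming (-N turns off curl's buffering)
curl -N -X POST http://localhost:8000/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"input": {"question": "What is LangServe?"}}'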

4. Multiple Chains in One App

Register multiple chains on different paths in the same FastAPI application.

app/server.py — multi-route
from fastapi import FastAPI
from langserve import add_routes
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_chroma import Chroma
from langchain_core.runnables import RunnablePassthrough

app = FastAPI()
llm = ChatOpenAI(model="gpt-4o-mini")

# Chain 1: Simple Q&A
qa_chain = (
    ChatPromptTemplate.from_messages([("human", "{question}")])
    | llm
    | StrOutputParser()
)

# Chain 2: RAG pipeline
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
store = Chroma(collection_name="docs", embedding_function=embeddings)
retriever = store.as_retriever()

rag_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using context: {context}"),
    ("human", "{question}"),
])
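# Note: rag_chain takes the raw question string as input; the retriever
# and RunnablePassthrough each receive it directly,
# e.g. rag_chain.invoke("What is LCEL?")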
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

# Mount both chains
add_routes(app, qa_chain, path="/qa")
add_routes(app, rag_chain, path="/rag")

5. RemoteRunnable Client

Call a LangServe API from any Python application with the same interface as a local chain.

client.py
from langserve import RemoteRunnable

# Connect to a deployed LangServe app
chain = RemoteRunnable("http://localhost:8000/chat")

# Same API as local chains
response = chain.invoke({"question": "What is LangGraph?"})
print(response)

# Batch
results = chain.batch([
    {"question": "Explain RAG"},
    {"question": "What is LCEL?"},
])
for r in results:
    print(r)

# Streaming
for chunk in chain.stream({"question": "Tell me about LangSmith."}):
    print(chunk, end="", flush=True)
print()
async_client.py
import asyncio
from langserve import RemoteRunnable

chain = RemoteRunnable("http://localhost:8000/chat")

async def main():
    # Async invoke
    response = await chain.ainvoke({"question": "What is LangGraph?"})
    print(response)

    # Async streaming
    async for chunk in chain.astream({"question": "Explain LCEL"}):
        print(chunk, end="", flush=True)
    print()

asyncio.run(main())

6. Configurable Fields

Expose chain parameters (model, temperature, system prompt) to API callers without rebuilding the chain.

configurable_server.py
from fastapi import FastAPI
from langserve import add_routes
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import ConfigurableField

app = FastAPI()

# Expose model_name and temperature as runtime-configurable fields
llm = ChatOpenAI(model="gpt-4o-mini").configurable_fields(
    model_name=ConfigurableField(
        id="model",
        name="LLM Model",
        description="The OpenAI model to use",
    ),
    temperature=ConfigurableField(
        id="temperature",
        name="Temperature",
        description="Sampling temperature (0.0–2.0)",
    ),
)

# The system prompt is a plain input variable, so callers can override
# it per request in the request body; no configurable field needed
prompt = ChatPromptTemplate.from_messages([
    ("system", "{system_prompt}"),
    ("human", "{question}"),
])

chain = prompt | llm | StrOutputParser()
add_routes(app, chain, path="/configurable-chat")
configurable_client.py — calling with custom config
from langserve import RemoteRunnable

chain = RemoteRunnable("http://localhost:8000/configurable-chat")

# Override the model and temperature via config;
# the system prompt travels in the input itself
response = chain.invoke(
    {
        "question": "Explain LangServe",
        "system_prompt": "You are a pirate. Answer with 'Arrr!'",
    },
    config={
        "configurable": {
            "model": "gpt-4o",
            "temperature": 0.9,
        }
    },
)
print(response)
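
To let callers swap whole model providers rather than individual parameters, LangChain also provides configurable_alternatives. A sketch, assuming langchain-anthropic is installed and ANTHROPIC_API_KEY is set:

configurable_alternatives.py
from langchain_anthropic import ChatAnthropic
from langchain_core.runnables import ConfigurableField
from langchain_openai import ChatOpenAI

# "openai" is the default; callers switch providers at call time with
# config={"configurable": {"llm": "anthropic"}}
llm = ChatOpenAI(model="gpt-4o-mini").configurable_alternatives(
    ConfigurableField(id="llm", name="LLM Provider"),
    default_key="openai",
    anthropic=ChatAnthropic(model="claude-3-5-sonnet-20240620"),
)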

7. Authentication Middleware

Add JWT authentication to LangServe routes with standard FastAPI dependency injection, and use LangServe's per_req_config_modifier hook to enrich each run's config.

auth_server.py
from fastapi import FastAPI, Depends, HTTPException, Request
from langserve import add_routes
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Re-use get_current_tenant from Module 13
from auth import get_current_tenant

app = FastAPI()

llm = ChatOpenAI(model="gpt-4o-mini")
chain = (
    ChatPromptTemplate.from_messages([("human", "{question}")])
    | llm
    | StrOutputParser()
)

# LangServe-specific hook (not a FastAPI dependency) that runs before
# each request and can modify the per-request run config
def per_req_config_modifier(config: dict, request: Request) -> dict:
    """Validate the Bearer token and attach tracing metadata."""
    auth_header = request.headers.get("Authorization", "")
    if not auth_header.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Missing auth token")
    # Metadata added here is attached to the run and shows up in traces
    config["metadata"] = {
        "request_id": request.headers.get("X-Request-ID", ""),
    }
    return config

add_routes(
    app,
    chain,
    path="/secure-chat",
    per_req_config_modifier=per_req_config_modifier,
    # Regular FastAPI dependencies also apply to LangServe routes
    dependencies=[Depends(get_current_tenant)],
)
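
For reference, here is a hypothetical sketch of the get_current_tenant dependency imported above (the real implementation lives in Module 13). It assumes HS256-signed JWTs carrying a tenant_id claim, verified with PyJWT:

auth.py (hypothetical sketch)
import os

import jwt  # PyJWT
from fastapi import HTTPException, Request

JWT_SECRET = os.environ["JWT_SECRET"]

def get_current_tenant(request: Request) -> str:
    """FastAPI dependency: decode the Bearer JWT and return its tenant_id claim."""
    auth_header = request.headers.get("Authorization", "")
    if not auth_header.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Missing auth token")
    token = auth_header.removeprefix("Bearer ")
    try:
        payload = jwt.decode(token, JWT_SECRET, algorithms=["HS256"])
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")
    return payload["tenant_id"]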

8. Deploying a LangGraph Agent

LangGraph compiled graphs are LangChain runnables — they can be mounted with add_routes() just like any chain.

agent_server.py
from fastapi import FastAPI
from langserve import add_routes
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
from langchain_core.messages import HumanMessage

app = FastAPI()

@tool
def search_kb(query: str) -> str:
    """Search the knowledge base."""
    return f"KB result: {query} is well documented."

@tool
def get_ticket_status(ticket_id: str) -> str:
    """Get the status of a support ticket."""
    return f"Ticket {ticket_id}: In Progress — assigned to team-2."

llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
agent = create_react_agent(llm, tools=[search_kb, get_ticket_status])

# Mount the compiled LangGraph agent
add_routes(
    app,
    agent,
    path="/agent",
    input_type=dict,     # agent accepts {"messages": [...]}
    config_keys=["configurable"],
)

# Run: uvicorn agent_server:app --reload
agent_client.py
from langserve import RemoteRunnable
from langchain_core.messages import HumanMessage

agent = RemoteRunnable("http://localhost:8000/agent")

# Invoke the remote agent
result = agent.invoke({
    "messages": [HumanMessage(content="What is the status of ticket TKT-42?")]
})
print(result["messages"][-1].content)

# Stream agent steps
for chunk in agent.stream({
    "messages": [HumanMessage(content="Search for LangGraph docs")]
}):
    print(chunk)
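
For step-level visibility into the remote agent (tool starts, token chunks), the /agent/stream_events endpoint is reachable through astream_events. A sketch, assuming the server above; the event names follow the astream_events v1 schema:

agent_events_client.py
import asyncio

from langchain_core.messages import HumanMessage
from langserve import RemoteRunnable

agent = RemoteRunnable("http://localhost:8000/agent")

async def main():
    async for event in agent.astream_events(
        {"messages": [HumanMessage(content="Search for LangGraph docs")]},
        version="v1",
    ):
        if event["event"] == "on_tool_start":
            print(f"\n[tool] {event['name']} input={event['data'].get('input')}")
        elif event["event"] == "on_chat_model_stream":
            # Print tokens as the model generates them
            print(event["data"]["chunk"].content, end="", flush=True)
    print()

asyncio.run(main())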

9. CI/CD for LangServe

.github/workflows/deploy.yml
name: Deploy LangServe App

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      - run: pip install -r requirements.txt
      - run: pytest tests/ -v
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

  build-and-push:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker build -t ${{ secrets.REGISTRY }}/langserve-app:${{ github.sha }} .
      - name: Push to registry
        run: |
          echo ${{ secrets.REGISTRY_PASSWORD }} | docker login -u ${{ secrets.REGISTRY_USER }} --password-stdin ${{ secrets.REGISTRY }}
          docker push ${{ secrets.REGISTRY }}/langserve-app:${{ github.sha }}

  deploy:
    needs: build-and-push
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to Kubernetes
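        # Assumes the runner is already authenticated to the cluster (e.g. via a kubeconfig secret)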
        run: |
          kubectl set image deployment/langserve-app langserve-app=${{ secrets.REGISTRY }}/langserve-app:${{ github.sha }}
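
The test job above expects a tests/ directory. A minimal smoke-test sketch against the section 3 server; the invoke test calls OpenAI for real, which is why the workflow passes OPENAI_API_KEY:

tests/test_server.py
from fastapi.testclient import TestClient

from app.server import app

client = TestClient(app)

def test_health():
    resp = client.get("/health")
    assert resp.status_code == 200
    assert resp.json() == {"status": "ok"}

def test_input_schema():
    # Schema endpoints don't invoke the LLM, so they're cheap to check
    resp = client.get("/chat/input_schema")
    assert resp.status_code == 200

def test_invoke():
    resp = client.post("/chat/invoke", json={"input": {"question": "Say hi"}})
    assert resp.status_code == 200
    assert "output" in resp.json()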

📝 Knowledge Check

Module 14 — Quiz

Score 80% or higher (10 out of 12) to unlock Module 15.
