LangServe: Deploying Chains & Agents as APIs
1. What is LangServe?
LangServe is a library built on FastAPI + Pydantic that exposes any LangChain Runnable (chain, agent, retriever) as a REST API with a single add_routes() call.
Auto-Generated Endpoints
POST /invoke, /batch, /stream, /stream_events plus GET /input_schema and /output_schema for every mounted runnable.
Built-in Playground
Interactive UI at /playground for testing chains with real inputs — no Postman needed.
RemoteRunnable Client
Call a remote LangServe endpoint exactly like a local chain — same .invoke(), .stream(), .batch() API.
Configurable Fields
Expose model selection, temperature, and other parameters to API callers at runtime.
Type Safety
Pydantic schemas generated from your chain's input/output types. Full OpenAPI spec at /openapi.json.
LangSmith Integration
Traces are automatically sent to LangSmith if LANGCHAIN_TRACING_V2=true is set.
2. Installation & Project Setup
pip install "langserve[all]" langchain-openai langchain uvicorn
# Or use the LangChain CLI to scaffold a new project
pip install langchain-cli
langchain app new my-langserve-app
cd my-langserve-app
my-langserve-app/
├── app/
│ ├── server.py ← FastAPI app + add_routes()
│ └── chains/
│ ├── rag_chain.py
│ └── agent.py
├── Dockerfile
├── pyproject.toml
└── .env
3. Basic add_routes()
One call exposes your entire chain as a production API.
from fastapi import FastAPI
from langserve import add_routes
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
app = FastAPI(
    title="My LangServe App",
    description="LangChain chains deployed as REST APIs",
    version="1.0.0",
)

# Define your chain
llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{question}"),
])
chain = prompt | llm | StrOutputParser()

# Mount it — one call generates all the endpoints listed below
add_routes(
    app,
    chain,
    path="/chat",
    enable_feedback_endpoint=True,  # POST /chat/feedback for thumbs up/down
)

# Health check
@app.get("/health")
def health():
    return {"status": "ok"}

# Run: uvicorn app.server:app --reload
Endpoints generated at /chat:
POST /chat/invoke - single synchronous call
POST /chat/batch - parallel batch calls
POST /chat/stream - SSE streaming
POST /chat/stream_events - detailed event streaming (astream_events)
GET /chat/input_schema - Pydantic input JSON schema
GET /chat/output_schema - Pydantic output JSON schema
GET /chat/playground - interactive UI
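On the wire, /invoke wraps the chain's input in an "input" envelope and returns the result under "output". A minimal sketch using requests, assuming the server above is running locally on port 8000:

import requests

# LangServe's invoke payload shape: {"input": ..., "config": {...}}
resp = requests.post(
    "http://localhost:8000/chat/invoke",
    json={"input": {"question": "What is LangServe?"}},
)
resp.raise_for_status()
print(resp.json()["output"])  # the chain's StrOutputParser result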
4. Multiple Chains in One App
Register multiple chains on different paths in the same FastAPI application.
from fastapi import FastAPI
from langserve import add_routes
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_chroma import Chroma
from langchain_core.runnables import RunnablePassthrough
app = FastAPI()
llm = ChatOpenAI(model="gpt-4o-mini")
# Chain 1: Simple Q&A
qa_chain = (
    ChatPromptTemplate.from_messages([("human", "{question}")])
    | llm
    | StrOutputParser()
)

# Chain 2: RAG pipeline
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
store = Chroma(collection_name="docs", embedding_function=embeddings)
retriever = store.as_retriever()
rag_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using context: {context}"),
    ("human", "{question}"),
])
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)
# Mount both chains
add_routes(app, qa_chain, path="/qa")
add_routes(app, rag_chain, path="/rag")
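Each mount is independent, with its own schemas and playground (/qa/playground, /rag/playground). Note that the two chains take different input shapes; a quick client-side sketch using the RemoteRunnable client covered in the next section, assuming the app above runs on localhost:8000:

from langserve import RemoteRunnable

qa = RemoteRunnable("http://localhost:8000/qa")
rag = RemoteRunnable("http://localhost:8000/rag")

# qa_chain's prompt expects a {"question": ...} dict
print(qa.invoke({"question": "What is LCEL?"}))

# rag_chain pipes the whole input into the retriever and RunnablePassthrough,
# so it takes the raw question string
print(rag.invoke("How does the RAG chain build its context?"))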
5. RemoteRunnable Client
Call a LangServe API from any Python application with the same interface as a local chain.
from langserve import RemoteRunnable
# Connect to a deployed LangServe app
chain = RemoteRunnable("http://localhost:8000/chat")
# Same API as local chains
response = chain.invoke({"question": "What is LangGraph?"})
print(response)
# Batch
results = chain.batch([
    {"question": "Explain RAG"},
    {"question": "What is LCEL?"},
])
for r in results:
    print(r)

# Streaming
for chunk in chain.stream({"question": "Tell me about LangSmith."}):
    print(chunk, end="", flush=True)
print()
import asyncio
from langserve import RemoteRunnable
chain = RemoteRunnable("http://localhost:8000/chat")
async def main():
    # Async invoke
    response = await chain.ainvoke({"question": "What is LangGraph?"})
    print(response)

    # Async streaming
    async for chunk in chain.astream({"question": "Explain LCEL"}):
        print(chunk, end="", flush=True)
    print()

asyncio.run(main())
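The /stream_events endpoint is also reachable from the client via astream_events, which yields structured events (model tokens, chain starts/ends) instead of bare output chunks. A sketch, assuming a langserve/langchain-core version that implements astream_events on RemoteRunnable with the v2 event schema:

import asyncio
from langserve import RemoteRunnable

chain = RemoteRunnable("http://localhost:8000/chat")

async def main():
    # Each event is a dict with "event", "name", and "data" keys
    async for event in chain.astream_events(
        {"question": "What is LangServe?"}, version="v2"
    ):
        if event["event"] == "on_chat_model_stream":
            print(event["data"]["chunk"].content, end="", flush=True)
    print()

asyncio.run(main())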
6. Configurable Fields
Expose chain parameters (model, temperature) to API callers without rebuilding the chain; the system prompt below is handled as an ordinary input variable.
from fastapi import FastAPI
from langserve import add_routes
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import ConfigurableField

app = FastAPI()

# Create a configurable chain: callers can override these fields per request
llm = ChatOpenAI(model="gpt-4o-mini").configurable_fields(
    model_name=ConfigurableField(
        id="model",
        name="LLM Model",
        description="The OpenAI model to use",
    ),
    temperature=ConfigurableField(
        id="temperature",
        name="Temperature",
        description="Sampling temperature (0.0–2.0)",
    ),
)

# The system prompt is a regular input variable supplied by the caller
prompt = ChatPromptTemplate.from_messages([
    ("system", "{system_prompt}"),
    ("human", "{question}"),
])

chain = prompt | llm | StrOutputParser()
add_routes(app, chain, path="/configurable-chat")
from langserve import RemoteRunnable
chain = RemoteRunnable("http://localhost:8000/configurable-chat")
# Override model and temperature at call time; the system prompt is plain input
response = chain.invoke(
    {
        "question": "Explain LangServe",
        "system_prompt": "You are a pirate. Answer with 'Arrr!'",
    },
    config={
        "configurable": {
            "model": "gpt-4o",
            "temperature": 0.9,
        }
    },
)
print(response)
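For swapping whole components rather than single fields, langchain-core also provides configurable_alternatives. A sketch, assuming langchain-anthropic is installed and the model names are current:

from langchain_anthropic import ChatAnthropic
from langchain_core.runnables import ConfigurableField
from langchain_openai import ChatOpenAI

# Callers pick a provider via config={"configurable": {"provider": "anthropic"}}
llm = ChatOpenAI(model="gpt-4o-mini").configurable_alternatives(
    ConfigurableField(id="provider"),
    default_key="openai",  # used when no config is supplied
    anthropic=ChatAnthropic(model="claude-3-5-sonnet-20240620"),
)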
7. Authentication Middleware
Add JWT authentication to LangServe routes using FastAPI dependency injection.
from fastapi import FastAPI, Depends, HTTPException, Request
from langserve import add_routes
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Re-use get_current_tenant from Module 13
from auth import get_current_tenant

# App-level dependency: every route requires a valid tenant JWT
app = FastAPI(dependencies=[Depends(get_current_tenant)])

llm = ChatOpenAI(model="gpt-4o-mini")
chain = (
    ChatPromptTemplate.from_messages([("human", "{question}")])
    | llm
    | StrOutputParser()
)

# Hook that runs before each LangServe request to adjust the run config
def per_req_config_modifier(config: dict, request: Request) -> dict:
    """Validate the Bearer token and attach per-request metadata."""
    # Reject requests with a missing or malformed Authorization header
    auth_header = request.headers.get("Authorization", "")
    if not auth_header.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Missing auth token")
    # Metadata added here shows up in traces
    config["metadata"] = {
        "request_id": request.headers.get("X-Request-ID", ""),
    }
    return config

add_routes(
    app,
    chain,
    path="/secure-chat",
    per_req_config_modifier=per_req_config_modifier,
)
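On the client side, RemoteRunnable can send the token with every call. A minimal sketch, assuming your langserve version supports the headers argument (the placeholder token is illustrative):

from langserve import RemoteRunnable

# Every request carries the Authorization header
chain = RemoteRunnable(
    "http://localhost:8000/secure-chat",
    headers={"Authorization": "Bearer <your-jwt>"},
)
print(chain.invoke({"question": "Am I authenticated?"}))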
8. Deploying a LangGraph Agent
LangGraph compiled graphs are LangChain runnables — they can be mounted with add_routes() just like any chain.
from fastapi import FastAPI
from langserve import add_routes
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

app = FastAPI()

@tool
def search_kb(query: str) -> str:
    """Search the knowledge base."""
    return f"KB result: {query} is well documented."

@tool
def get_ticket_status(ticket_id: str) -> str:
    """Get the status of a support ticket."""
    return f"Ticket {ticket_id}: In Progress — assigned to team-2."

llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
agent = create_react_agent(llm, tools=[search_kb, get_ticket_status])

# Mount the compiled LangGraph agent
add_routes(
    app,
    agent,
    path="/agent",
    input_type=dict,  # the agent accepts {"messages": [...]}
    config_keys=["configurable"],
)
# Run: uvicorn agent_server:app --reload
from langserve import RemoteRunnable
from langchain_core.messages import HumanMessage
agent = RemoteRunnable("http://localhost:8000/agent")
# Invoke the remote agent
result = agent.invoke({
    "messages": [HumanMessage(content="What is the status of ticket TKT-42?")]
})
print(result["messages"][-1].content)

# Stream agent steps
for chunk in agent.stream({
    "messages": [HumanMessage(content="Search for LangGraph docs")]
}):
    print(chunk)
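input_type=dict works, but it produces a loose schema in the playground and OpenAPI spec. Any runnable can be given an explicit schema with with_types; a sketch extending the server above, where AgentInput is a hypothetical model matching the agent's messages input:

from typing import List

from pydantic import BaseModel
from langchain_core.messages import AnyMessage

class AgentInput(BaseModel):
    """Hypothetical schema mirroring the agent's {"messages": [...]} input."""
    messages: List[AnyMessage]

# with_types overrides the schema LangServe infers for the runnable
add_routes(app, agent.with_types(input_type=AgentInput), path="/typed-agent")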
9. CI/CD for LangServe
A GitHub Actions workflow that tests the app, builds and pushes a Docker image, then rolls it out to Kubernetes on every push to main:
name: Deploy LangServe App

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      - run: pip install -r requirements.txt
      - run: pytest tests/ -v
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

  build-and-push:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker build -t ${{ secrets.REGISTRY }}/langserve-app:${{ github.sha }} .
      - name: Push to registry
        run: |
          echo ${{ secrets.REGISTRY_PASSWORD }} | docker login -u ${{ secrets.REGISTRY_USER }} --password-stdin ${{ secrets.REGISTRY }}
          docker push ${{ secrets.REGISTRY }}/langserve-app:${{ github.sha }}

  deploy:
    needs: build-and-push
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to Kubernetes
        run: |
          kubectl set image deployment/langserve-app langserve-app=${{ secrets.REGISTRY }}/langserve-app:${{ github.sha }}
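The pytest step can exercise the app without a live server via FastAPI's TestClient. A minimal sketch, assuming the Section 3 app is importable as app.server:

from fastapi.testclient import TestClient

from app.server import app

client = TestClient(app)

def test_health():
    resp = client.get("/health")
    assert resp.status_code == 200
    assert resp.json() == {"status": "ok"}

def test_input_schema_exposed():
    # Schema endpoints don't call the LLM, so no real API key is exercised
    resp = client.get("/chat/input_schema")
    assert resp.status_code == 200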