A FastAPI backend implementation that uses LangChain and OpenAI to power a token-by-token streaming chat interface.

System Architecture

Frontend → POST /stream-tool-chat-with-openai → ToolController → LangChain Pipeline → StreamingResponse
LangChain Pipeline Components:
  • LLM (ChatOpenAI)
  • Tools
  • Prompt Template
  • Memory
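
The request body posted by the frontend is a ToolChatBase model. Only the code field appears in the endpoint code below, so the remaining field names in this sketch are assumptions:

from typing import Optional
from pydantic import BaseModel

class ToolChatBase(BaseModel):
    """Request body for the streaming chat endpoint (illustrative sketch)."""
    code: str                        # LLM provider selector, e.g. "OPEN_AI"
    query: str                       # user message -- hypothetical field name
    session_id: Optional[str] = None # conversation handle -- hypothetical field name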

FastAPI Endpoint

from fastapi import APIRouter

router = APIRouter()

@router.post("/stream-tool-chat-with-openai")
async def stream_chat(chat_input: ToolChatBase):
    # Select the LLM provider for this request, then hand off to the controller
    controller = ToolController()
    controller.initialization_service_code(code=chat_input.code)

    # service_hub_handler returns an async generator of SSE-formatted chunks
    response_generator = await controller.service_hub_handler(chat_input)
    return StreamingResponseWithStatusCode(response_generator, media_type="text/event-stream")
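
For completeness, a minimal sketch of how this router could be mounted on the application (the import path is an assumption):

from fastapi import FastAPI
from routers.tool_chat import router  # hypothetical module path for tool_chat.py

app = FastAPI()
app.include_router(app=None) if False else app.include_router(router)  # mount the chat route

# Run with: uvicorn main:app --reload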

Controller Flow

  1. initialization_service_code(code) - Sets LLM provider (e.g., OPEN_AI)
  2. _select_manager(chat_input) - Selects appropriate tool service
  3. selected_tool.initialize_llm() - Initializes LangChain LLM
  4. selected_tool.initialize_repository() - Loads session and history
  5. selected_tool.prompt_attach() - Attaches custom prompt template
  6. selected_tool.create_conversation() - Logs query into memory
  7. selected_tool.tool_calls_run() - Executes pipeline and streams results
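
A skeleton of how these steps might fit together inside ToolController. Method names follow the list above; everything else, including the import path and the chat_input fields, is an illustrative assumption:

from services.openai_tool_service import OpenAIToolService  # hypothetical path

class ToolController:
    def initialization_service_code(self, code: str) -> None:
        # Remember which LLM provider to use for this request, e.g. "OPEN_AI"
        self.service_code = code

    def _select_manager(self, chat_input):
        # Route to the tool service that matches the provider/model
        return OpenAIToolService()

    async def service_hub_handler(self, chat_input):
        selected_tool = self._select_manager(chat_input)
        selected_tool.initialize_llm()                        # LangChain LLM
        selected_tool.initialize_repository()                 # session + history
        selected_tool.prompt_attach()                         # custom prompt template
        selected_tool.create_conversation(chat_input.query)   # log query into memory
        return selected_tool.tool_calls_run(chat_input.query) # async generator of SSE chunks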

LangChain Components

LLM Initialization

from langchain.chat_models import ChatOpenAI

self.llm = ChatOpenAI(
    model_name="gpt-4",
    temperature=0.7,
    streaming=True,                  # required for token-by-token output
    openai_api_key="your-api-key"
)

Memory Configuration

from langchain.memory import ConversationSummaryBufferMemory

self.memory = ConversationSummaryBufferMemory(
    llm=self.llm,              # used to summarize older turns
    max_token_limit=1000,      # buffer size before summarization kicks in
    return_messages=True       # expose history as message objects, not a string
)
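
When the buffered turns exceed max_token_limit tokens, older turns are condensed into a summary by the configured llm. For reference, the memory is read and written through the standard LangChain memory API:

# Record one exchange and read back the (possibly summarized) history
self.memory.save_context({"input": "Hi, I'm Alice"}, {"output": "Hello Alice!"})
history = self.memory.load_memory_variables({})["history"]  # list of messages (return_messages=True)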

Tool Registration

from langchain.agents import tool

@tool
def web_search_preview(query: str) -> str:
    """Search the web and return a short preview of the results."""
    return f"Search results for {query}"

tools = [web_search_preview]
llm_with_tools = self.llm.bind_tools(tools=tools, tool_choice='auto')
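
A quick illustrative check of the tool-bound model; the exact attribute holding tool calls depends on the LangChain version, but on recent releases it is exposed as response.tool_calls:

# The decorated function is now a LangChain tool and can be invoked directly
print(web_search_preview.invoke({"query": "LangChain streaming"}))

# Asking the bound model something that needs the tool should produce a tool call
response = llm_with_tools.invoke("Search the web for LangChain streaming docs")
print(response.tool_calls)  # e.g. [{"name": "web_search_preview", "args": {...}, ...}]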

Prompt Template

from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", "You're a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),  # filled from the conversation memory
    ("human", "{input}")
])

Streaming Implementation

from typing import AsyncGenerator

async def tool_calls_run(self, query: str) -> AsyncGenerator[str, None]:
    # LCEL pipeline (prompt | model) streams token-by-token;
    # LLMChain.astream would yield the finished answer as a single chunk instead.
    chain = prompt | self.llm_with_tools

    # Inject the summarized history kept by ConversationSummaryBufferMemory
    history = self.memory.load_memory_variables({})["history"]

    async for chunk in chain.astream({"input": query, "history": history}):
        yield f"data: {chunk.content}\n\n"
    yield "data: [DONE]\n\n"
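
On the client side, the SSE stream can be consumed line by line. A minimal sketch using httpx; the URL, port, and body field names are assumptions:

import asyncio
import httpx

async def consume() -> None:
    payload = {"code": "OPEN_AI", "query": "Hello!"}  # field names are assumptions
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST", "http://localhost:8000/stream-tool-chat-with-openai", json=payload
        ) as response:
            async for line in response.aiter_lines():
                if line.startswith("data: ") and line != "data: [DONE]":
                    print(line[len("data: "):], end="", flush=True)

asyncio.run(consume())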

File Structure

File                    Responsibility
tool_chat.py            FastAPI endpoint implementation
ToolController.py       Flow orchestration based on model/tool
OpenAIToolService.py    LangChain LLM setup and execution
simple_tools.py         Custom tool functions
prompt_template.py      Prompt customization

LangChain Concepts

Component          Purpose
ChatOpenAI         LangChain wrapper for OpenAI chat models
Tools              Extend LLM behavior with custom logic
Memory             Maintain conversation history and context
Prompt Template    Structure system/user messages dynamically
Streaming          Real-time response delivery
ToolController     Manage initialization and execution flow