A FastAPI backend implementation that uses LangChain and OpenAI to power a token-by-token streaming chat interface.

System Architecture

Frontend → POST /stream-tool-chat-with-openai → ToolController → LangChain Pipeline → StreamingResponse
LangChain Pipeline Components:
  • LLM (ChatOpenAI)
  • Tools
  • Prompt Template
  • Memory
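
The request body posted by the frontend is a ToolChatBase model. Only the code field appears in the endpoint code below, so the remaining field names in this sketch are assumptions:

from typing import Optional
from pydantic import BaseModel

class ToolChatBase(BaseModel):
    """Request body for the streaming chat endpoint (illustrative sketch)."""
    code: str                        # LLM provider selector, e.g. "OPEN_AI"
    query: str                       # user message -- hypothetical field name
    session_id: Optional[str] = None # conversation handle -- hypothetical field name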

FastAPI Endpoint

from fastapi import APIRouter

router = APIRouter()

@router.post("/stream-tool-chat-with-openai")
async def stream_chat(chat_input: ToolChatBase):
    # Select the LLM provider for this request, then hand off to the controller
    controller = ToolController()
    controller.initialization_service_code(code=chat_input.code)

    # service_hub_handler returns an async generator of SSE-formatted chunks
    response_generator = await controller.service_hub_handler(chat_input)
    return StreamingResponseWithStatusCode(response_generator, media_type="text/event-stream")
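
For completeness, a minimal sketch of how this router could be mounted on the application (the import path is an assumption):

from fastapi import FastAPI
from routers.tool_chat import router  # hypothetical module path for tool_chat.py

app = FastAPI()
app.include_router(app=None) if False else app.include_router(router)  # mount the chat route

# Run with: uvicorn main:app --reload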

Controller Flow

  1. initialization_service_code(code) - Sets LLM provider (e.g., OPEN_AI)
  2. _select_manager(chat_input) - Selects appropriate tool service
  3. selected_tool.initialize_llm() - Initializes LangChain LLM
  4. selected_tool.initialize_repository() - Loads session and history
  5. selected_tool.prompt_attach() - Attaches custom prompt template
  6. selected_tool.create_conversation() - Logs query into memory
  7. selected_tool.tool_calls_run() - Executes pipeline and streams results
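
A skeleton of how these steps might fit together inside ToolController. Method names follow the list above; everything else, including the import path and the chat_input fields, is an illustrative assumption:

from services.openai_tool_service import OpenAIToolService  # hypothetical path

class ToolController:
    def initialization_service_code(self, code: str) -> None:
        # Remember which LLM provider to use for this request, e.g. "OPEN_AI"
        self.service_code = code

    def _select_manager(self, chat_input):
        # Route to the tool service that matches the provider/model
        return OpenAIToolService()

    async def service_hub_handler(self, chat_input):
        selected_tool = self._select_manager(chat_input)
        selected_tool.initialize_llm()                        # LangChain LLM
        selected_tool.initialize_repository()                 # session + history
        selected_tool.prompt_attach()                         # custom prompt template
        selected_tool.create_conversation(chat_input.query)   # log query into memory
        return selected_tool.tool_calls_run(chat_input.query) # async generator of SSE chunks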

LangChain Components

LLM Initialization

from langchain.chat_models import ChatOpenAI

self.llm = ChatOpenAI(
    model_name="gpt-4",
    temperature=0.7,
    streaming=True,                  # required for token-by-token output
    openai_api_key="your-api-key"
)

Memory Configuration

from langchain.memory import ConversationSummaryBufferMemory

self.memory = ConversationSummaryBufferMemory(
    llm=self.llm,              # used to summarize older turns
    max_token_limit=1000,      # buffer size before summarization kicks in
    return_messages=True       # expose history as message objects, not a string
)
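
When the buffered turns exceed max_token_limit tokens, older turns are condensed into a summary by the configured llm. For reference, the memory is read and written through the standard LangChain memory API:

# Record one exchange and read back the (possibly summarized) history
self.memory.save_context({"input": "Hi, I'm Alice"}, {"output": "Hello Alice!"})
history = self.memory.load_memory_variables({})["history"]  # list of messages (return_messages=True)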

Tool Registration

from langchain.agents import tool

@tool
def web_search_preview(query: str) -> str:
    """Search the web and return a short preview of the results."""
    return f"Search results for {query}"

tools = [web_search_preview]
llm_with_tools = self.llm.bind_tools(tools=tools, tool_choice='auto')
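
A quick illustrative check of the tool-bound model; the exact attribute holding tool calls depends on the LangChain version, but on recent releases it is exposed as response.tool_calls:

# The decorated function is now a LangChain tool and can be invoked directly
print(web_search_preview.invoke({"query": "LangChain streaming"}))

# Asking the bound model something that needs the tool should produce a tool call
response = llm_with_tools.invoke("Search the web for LangChain streaming docs")
print(response.tool_calls)  # e.g. [{"name": "web_search_preview", "args": {...}, ...}]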

Prompt Template

from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", "You're a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),  # filled from the conversation memory
    ("human", "{input}")
])

Streaming Implementation

from typing import AsyncGenerator

async def tool_calls_run(self, query: str) -> AsyncGenerator[str, None]:
    # LCEL pipeline (prompt | model) streams token-by-token;
    # LLMChain.astream would yield the finished answer as a single chunk instead.
    chain = prompt | self.llm_with_tools

    # Inject the summarized history kept by ConversationSummaryBufferMemory
    history = self.memory.load_memory_variables({})["history"]

    async for chunk in chain.astream({"input": query, "history": history}):
        yield f"data: {chunk.content}\n\n"
    yield "data: [DONE]\n\n"
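
On the client side, the SSE stream can be consumed line by line. A minimal sketch using httpx; the URL, port, and body field names are assumptions:

import asyncio
import httpx

async def consume() -> None:
    payload = {"code": "OPEN_AI", "query": "Hello!"}  # field names are assumptions
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST", "http://localhost:8000/stream-tool-chat-with-openai", json=payload
        ) as response:
            async for line in response.aiter_lines():
                if line.startswith("data: ") and line != "data: [DONE]":
                    print(line[len("data: "):], end="", flush=True)

asyncio.run(consume())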

File Structure

File                    Responsibility
tool_chat.py            FastAPI endpoint implementation
ToolController.py       Flow orchestration based on model/tool
OpenAIToolService.py    LangChain LLM setup and execution
simple_tools.py         Custom tool functions
prompt_template.py      Prompt customization

LangChain Concepts

Component          Purpose
ChatOpenAI         LangChain wrapper for OpenAI chat models
Tools              Extend LLM behavior with custom logic
Memory             Maintain conversation history and context
Prompt Template    Structure system/user messages dynamically
Streaming          Real-time response delivery
ToolController     Manage initialization and execution flow