A dual-service processing pipeline that handles user queries through Python (LLM processing) and Node.js (chat management).
The architecture separates compute-heavy AI logic from chat state management for better scalability and performance.
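
Both sides are invoked in parallel for each query: the client streams the answer from the Python service while notifying the Node.js service over a socket. The sketch below shows one minimal way that dispatch could look in TypeScript; the service URLs, the /query endpoint, the storeQuestion event name, and the payload shape are all assumptions for illustration, not the project's actual API.

```typescript
// Hypothetical dispatch sketch: URLs, endpoint, event name, and payload are assumptions.
import { io } from "socket.io-client";

const socket = io("http://localhost:4000"); // assumed Node.js chat service URL

async function submitQuery(chatId: string, question: string): Promise<Response> {
  // Fire-and-forget to the Node.js side: persist the question in chat history.
  socket.emit("storeQuestion", { chatId, question });

  // In parallel, request the streamed answer from the Python (FastAPI) side.
  return fetch("http://localhost:8000/query", { // assumed FastAPI service URL
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ chatId, question }),
  });
}
```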

LLM Query Flow

[Diagram: LLM Query Flow Architecture]

Python Side: LLM Processing

  1. Call Python API - The user query is forwarded to the FastAPI service
  2. Python API Response - FastAPI handles the LLM invocation and prepares the streaming response
  3. Streaming Response - Chunks are sent to the client in real time via StreamingResponse
  4. Send to Frontend - The response is piped to the frontend UI for display (see the client-side sketch after this list)
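
Steps 3 and 4 are easiest to see from the client's side. Below is a hedged TypeScript sketch that reads the streamed response body chunk by chunk and hands each decoded piece to the UI; the renderStream name and the onChunk callback are illustrative, not part of the project.

```typescript
// Hypothetical client-side reader for the FastAPI streaming response.
async function renderStream(
  response: Response,
  onChunk: (text: string) => void, // e.g. append to the chat transcript
): Promise<void> {
  if (!response.body) throw new Error("response is not streamable");
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // Decode incrementally; { stream: true } keeps multi-byte characters
    // intact when they are split across chunk boundaries.
    onChunk(decoder.decode(value, { stream: true }));
  }
}
```

Paired with the submitQuery sketch above, a caller might write `await renderStream(await submitQuery(chatId, question), (t) => { transcript.textContent += t; })` to display tokens as they arrive.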

Node.js Side: Chat Management

  1. Call Node.js API Socket - A parallel request stores chat state via Socket.IO
  2. Store Question - The message is saved to MongoDB for chat history
  3. Create Chat - A new chat thread is created if one does not exist yet
  4. Add Member to Chat - The user is registered in the thread for continued interaction (see the server sketch after this list)
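
A minimal sketch of how the Node.js side might wire steps 2 through 4 together, assuming Socket.IO and the official MongoDB driver; the chats and messages collection names, event names, and document shapes are assumptions, not the project's actual schema.

```typescript
// Hypothetical Node.js chat-management service; names and schemas are assumptions.
import { Server } from "socket.io";
import { MongoClient, ObjectId } from "mongodb";

const mongo = new MongoClient("mongodb://localhost:27017");
await mongo.connect();
const db = mongo.db("chat");

const io = new Server(4000, { cors: { origin: "*" } });

io.on("connection", (socket) => {
  socket.on(
    "storeQuestion",
    async ({ chatId, question, userId }: { chatId?: string; question: string; userId: string }) => {
      // Create Chat: start a new thread if the client has none yet (step 3).
      if (!chatId) {
        const chat = await db
          .collection("chats")
          .insertOne({ members: [], createdAt: new Date() });
        chatId = chat.insertedId.toHexString();
      }
      // Add Member to Chat: register the user for continued interaction (step 4).
      await db.collection("chats").updateOne(
        { _id: new ObjectId(chatId) },
        { $addToSet: { members: userId } },
      );
      // Store Question: persist the message for chat history (step 2).
      await db.collection("messages").insertOne({ chatId, question, createdAt: new Date() });
      socket.emit("questionStored", { chatId });
    },
  );
});
```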