A dual-service processing pipeline that handles user queries through Python (LLM processing) and Node.js (chat management).
The architecture separates compute-heavy AI logic from chat state management for better scalability and performance.
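
Both sides are invoked in parallel for each query: the client streams the answer from the Python service while notifying the Node.js service over a socket. The sketch below shows one minimal way that dispatch could look in TypeScript; the service URLs, the /query endpoint, the storeQuestion event name, and the payload shape are all assumptions for illustration, not the project's actual API.

```typescript
// Hypothetical dispatch sketch: URLs, endpoint, event name, and payload are assumptions.
import { io } from "socket.io-client";

const socket = io("http://localhost:4000"); // assumed Node.js chat service URL

async function submitQuery(chatId: string, question: string): Promise<Response> {
  // Fire-and-forget to the Node.js side: persist the question in chat history.
  socket.emit("storeQuestion", { chatId, question });

  // In parallel, request the streamed answer from the Python (FastAPI) side.
  return fetch("http://localhost:8000/query", { // assumed FastAPI service URL
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ chatId, question }),
  });
}
```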

LLM Query Flow

[Diagram: LLM Query Flow Architecture]

Python Side: LLM Processing

  1. Call Python API - The user query is forwarded to the FastAPI service
  2. Python API Response - FastAPI handles the LLM invocation and prepares the streaming response
  3. Streaming Response - Chunks are sent to the client in real time via StreamingResponse
  4. Send to Frontend - The response is piped to the frontend UI for display (see the client-side sketch after this list)
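
Steps 3 and 4 are easiest to see from the client's side. Below is a hedged TypeScript sketch that reads the streamed response body chunk by chunk and hands each decoded piece to the UI; the renderStream name and the onChunk callback are illustrative, not part of the project.

```typescript
// Hypothetical client-side reader for the FastAPI streaming response.
async function renderStream(
  response: Response,
  onChunk: (text: string) => void, // e.g. append to the chat transcript
): Promise<void> {
  if (!response.body) throw new Error("response is not streamable");
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // Decode incrementally; { stream: true } keeps multi-byte characters
    // intact when they are split across chunk boundaries.
    onChunk(decoder.decode(value, { stream: true }));
  }
}
```

Paired with the submitQuery sketch above, a caller might write `await renderStream(await submitQuery(chatId, question), (t) => { transcript.textContent += t; })` to display tokens as they arrive.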

Node.js Side: Chat Management

  1. Call Node.js API Socket - A parallel request stores chat state via Socket.IO
  2. Store Question - The message is saved to MongoDB for chat history
  3. Create Chat - A new chat thread is created if one does not exist yet
  4. Add Member to Chat - The user is registered in the thread for continued interaction (see the server sketch after this list)
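
A minimal sketch of how the Node.js side might wire steps 2 through 4 together, assuming Socket.IO and the official MongoDB driver; the chats and messages collection names, event names, and document shapes are assumptions, not the project's actual schema.

```typescript
// Hypothetical Node.js chat-management service; names and schemas are assumptions.
import { Server } from "socket.io";
import { MongoClient, ObjectId } from "mongodb";

const mongo = new MongoClient("mongodb://localhost:27017");
await mongo.connect();
const db = mongo.db("chat");

const io = new Server(4000, { cors: { origin: "*" } });

io.on("connection", (socket) => {
  socket.on(
    "storeQuestion",
    async ({ chatId, question, userId }: { chatId?: string; question: string; userId: string }) => {
      // Create Chat: start a new thread if the client has none yet (step 3).
      if (!chatId) {
        const chat = await db
          .collection("chats")
          .insertOne({ members: [], createdAt: new Date() });
        chatId = chat.insertedId.toHexString();
      }
      // Add Member to Chat: register the user for continued interaction (step 4).
      await db.collection("chats").updateOne(
        { _id: new ObjectId(chatId) },
        { $addToSet: { members: userId } },
      );
      // Store Question: persist the message for chat history (step 2).
      await db.collection("messages").insertOne({ chatId, question, createdAt: new Date() });
      socket.emit("questionStored", { chatId });
    },
  );
});
```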