The architecture separates compute-heavy AI logic (the Python service) from chat state management (the Node.js service), so each side can be scaled and tuned independently.

LLM Query Flow Architecture
Python Side: LLM Processing
- Call Python API - The user query is forwarded to the FastAPI service
- Python API Response - FastAPI invokes the LLM and prepares a streaming response
- Streaming Response - Chunks are sent to the client in real time via FastAPI's StreamingResponse
- Send to Frontend - The streamed response is piped to the frontend UI for display (see the sketch after this list)
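How the frontend consumes the stream is client-specific, but a minimal TypeScript sketch might look like the following. The endpoint path /api/llm/query, the request payload shape, and the onChunk callback are illustrative assumptions, not the actual API.

```ts
// Minimal sketch: read a FastAPI StreamingResponse chunk-by-chunk in the
// browser. The endpoint path and payload shape are assumptions.
async function streamLlmAnswer(
  query: string,
  onChunk: (text: string) => void,
): Promise<void> {
  const response = await fetch("/api/llm/query", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  });
  if (!response.ok || !response.body) {
    throw new Error(`LLM request failed: ${response.status}`);
  }

  // StreamingResponse delivers raw bytes; decode each chunk as it arrives
  // and hand it to the UI instead of waiting for the full body.
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    onChunk(decoder.decode(value, { stream: true }));
  }
}
```

A caller would pass a UI callback, e.g. streamLlmAnswer(userQuery, (chunk) => appendToChatWindow(chunk)), where appendToChatWindow stands in for whatever the frontend uses to render partial output.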
Node.js Side: Chat Management
- Call NodeJS API Socket - In parallel with the LLM call, a Socket.IO request persists chat state on the Node.js service (a sketch follows this list)
- Store Question - The user's message is saved in MongoDB for chat history
- Create Chat - A new chat thread is created if one does not already exist
- Add Member to Chat - The user is registered in the thread for continued interaction
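These four steps map naturally onto a single Socket.IO event handler backed by MongoDB. The following is a minimal sketch assuming the socket.io and mongodb npm packages; the event names (question, question-stored), the database and collection names, and the payload shape are assumptions for illustration.

```ts
import { Server } from "socket.io";
import { MongoClient, ObjectId } from "mongodb";

const mongo = new MongoClient("mongodb://localhost:27017");
await mongo.connect(); // top-level await; assumes an ES module context
const db = mongo.db("chat");

const io = new Server(3001, { cors: { origin: "*" } });

io.on("connection", (socket) => {
  socket.on(
    "question",
    async (payload: { chatId?: string; userId: string; text: string }) => {
      let chatId = payload.chatId;

      if (!chatId) {
        // Create Chat: start a new thread, registering the user as a member.
        const chat = await db.collection("chats").insertOne({
          members: [payload.userId],
          createdAt: new Date(),
        });
        chatId = chat.insertedId.toHexString();
      } else {
        // Add Member to Chat: register the user in the existing thread.
        await db.collection("chats").updateOne(
          { _id: new ObjectId(chatId) },
          { $addToSet: { members: payload.userId } },
        );
      }

      // Store Question: persist the message for chat history.
      await db.collection("messages").insertOne({
        chatId,
        userId: payload.userId,
        text: payload.text,
        createdAt: new Date(),
      });

      // Acknowledge so the client can associate the streamed answer
      // with the persisted chat thread.
      socket.emit("question-stored", { chatId });
    },
  );
});
```

The frontend would trigger this in parallel with the streaming fetch, e.g. socket.emit("question", { chatId, userId, text }), so the chat history is persisted even while the answer is still streaming.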