Simple Chat now uses LangGraph and MCP (Model Context Protocol) integration for enhanced tool handling and a modular execution flow:
  • Dynamically pulls tools from the MCP server
  • Builds a LangGraph state machine that controls execution between tool calls and the chatbot
  • Supports new integrations: Slack, GitHub, Google Drive, Gmail, Google Calendar

1. User Query Flow

  1. User sends a query from the frontend UI.
  2. The Python API receives the request and passes it to the Service Controller.
  3. The Service Controller uses the model selected in the dropdown to decide which Model Service to initialize.
  4. The selected LLM is initialized.
  5. Simultaneously, a MultiServerClient fetches all available tools from the MCP server.
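
A minimal sketch of the tool-fetch step, assuming the MultiServerClient corresponds to MultiServerMCPClient from the langchain-mcp-adapters package; the server names, ports, and transport below are placeholders:

  # Sketch: fetch every available tool from the configured MCP servers.
  import asyncio
  from langchain_mcp_adapters.client import MultiServerMCPClient

  async def fetch_mcp_tools():
      client = MultiServerMCPClient({
          "slack": {"url": "http://localhost:8001/mcp", "transport": "streamable_http"},
          "github": {"url": "http://localhost:8002/mcp", "transport": "streamable_http"},
      })
      # Each MCP tool is exposed as a LangChain-compatible tool object.
      return await client.get_tools()

  tools = asyncio.run(fetch_mcp_tools())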

2. Tool Registration and Filtering

  • Tools received from MCP include:
    • Slack
    • GitHub
    • Google Drive
    • Google Calendar
    • Gmail
  • These tools are matched against the current query.
  • Tools irrelevant to the query are filtered out before the remaining tools are bound to the LLM, as sketched below.
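
The exact matching rule lives in the Service Controller; as a rough illustration, a keyword filter over tool names and descriptions could look like this (the heuristic is hypothetical, and llm and tools carry over from the previous step):

  # Hypothetical query-based filter; the production matching rule may differ.
  def filter_tools(tools, query):
      query_words = set(query.lower().split())
      selected = []
      for tool in tools:
          text = f"{tool.name} {tool.description}".lower()
          # Keep a tool only if the query mentions something it relates to.
          if any(word in text for word in query_words):
              selected.append(tool)
      return selected

  relevant_tools = filter_tools(tools, "post the release notes to Slack")
  llm_with_tools = llm.bind_tools(relevant_tools)  # bind only the surviving tools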

3. LangGraph StateGraph Construction

  • With tools filtered and bound, a LangGraph StateGraph is created:
    • Contains a ToolNode for tool execution.
    • Contains a ChatbotNode for normal LLM-based responses.
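
A minimal construction sketch using LangGraph's prebuilt helpers, with llm_with_tools and relevant_tools carried over from the filtering step:

  from langgraph.graph import StateGraph, MessagesState, START
  from langgraph.prebuilt import ToolNode, tools_condition

  def chatbot(state: MessagesState):
      # ChatbotNode: a plain LLM turn over the accumulated messages.
      return {"messages": [llm_with_tools.invoke(state["messages"])]}

  builder = StateGraph(MessagesState)
  builder.add_node("chatbot", chatbot)
  builder.add_node("tools", ToolNode(relevant_tools))  # ToolNode executes tool calls
  builder.add_edge(START, "chatbot")
  # Route to the ToolNode when the LLM emitted tool calls, otherwise finish.
  builder.add_conditional_edges("chatbot", tools_condition)
  builder.add_edge("tools", "chatbot")  # feed tool results back to the LLM
  graph = builder.compile()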

4. Chat History Initialization

  • In parallel, the Chat Repository is initialized using the provided chat_id.
  • Full chat history is retrieved.
  • History is used to create memory for the conversation flow.
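
A sketch of the history step; the ChatRepository interface shown here, a get_messages(chat_id) method returning stored (role, content) pairs, is illustrative rather than the actual repository API:

  from langchain_core.messages import AIMessage, HumanMessage

  def load_history(chat_repository, chat_id):
      # Rebuild LangChain messages from stored (role, content) rows.
      history = []
      for role, content in chat_repository.get_messages(chat_id):
          message_type = HumanMessage if role == "user" else AIMessage
          history.append(message_type(content=content))
      return history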

5. StateGraph Execution

  • The LangGraph StateGraph is invoked.
  • Based on the query and history:
    • If tool usage is detected → ToolNode is called
    • Else → ChatbotNode is used to generate a natural LLM response
  • The combined context (query + history) is passed to the chatbot.
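
Invocation combines the stored history with the new query; graph is the compiled StateGraph from section 3, load_history is the sketch from section 4, and tools_condition handles the tool-versus-chatbot routing:

  from langchain_core.messages import HumanMessage

  history = load_history(chat_repository, chat_id)
  result = graph.invoke({"messages": history + [HumanMessage(content=query)]})
  final_answer = result["messages"][-1].content  # last message is the LLM's reply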

6. Tool Invocation Rules

Tool Activation Based on Model Selection

  Tool                               Condition
  Web Search Tool                    Only used if the GPT-4.1-search model is selected
  Image Generation                   Only used for GPT-based models
  MCP Tools (Slack, GitHub, etc.)    Dynamically fetched and filtered per query
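
The gating can be expressed as a simple predicate applied before binding; this helper is hypothetical and its model-name checks merely mirror the table above:

  # Hypothetical gating helper mirroring the table above.
  def gate_tools(model_name, web_search_tool, image_tool, mcp_tools):
      tools = list(mcp_tools)            # MCP tools are always candidates (filtered per query)
      if model_name == "gpt-4.1-search":
          tools.append(web_search_tool)  # web search only for the search model
      if model_name.startswith("gpt"):
          tools.append(image_tool)       # image generation only for GPT-based models
      return tools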

7. Streaming the Response

  1. The selected tool (if needed) is invoked.
  2. The tool may internally call the LLM to complete its task.
  3. The final response is streamed back to the frontend in real time, as sketched below.
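
Streaming can be built on LangGraph's stream API; with stream_mode="messages" the compiled graph yields LLM token chunks as they are produced (graph as compiled in section 3):

  def stream_response(graph, messages):
      # stream_mode="messages" yields (token_chunk, metadata) pairs from the LLM.
      for chunk, metadata in graph.stream({"messages": messages}, stream_mode="messages"):
          if chunk.content:
              yield chunk.content  # forward each token chunk to the frontend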

8. Logging and Cost Management

  • LLM token cost is calculated from input and output token counts.
  • This cost is written to the DB using the Cost Callback.
  • The MongoDB Handler logs:
    • Final response
    • Token usage
    • Tool activations
    • Total cost breakdown
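
A sketch of the Cost Callback, assuming a LangChain callback handler and pymongo; the per-token prices, database, and collection names are placeholders:

  from langchain_core.callbacks import BaseCallbackHandler
  from pymongo import MongoClient

  INPUT_PRICE = 0.000002   # placeholder USD per input token
  OUTPUT_PRICE = 0.000008  # placeholder USD per output token

  class CostCallback(BaseCallbackHandler):
      def __init__(self, mongo_uri, chat_id):
          self.collection = MongoClient(mongo_uri)["chat"]["usage_logs"]
          self.chat_id = chat_id

      def on_llm_end(self, response, **kwargs):
          # Token counts arrive in the provider's llm_output payload.
          usage = (response.llm_output or {}).get("token_usage", {})
          cost = (usage.get("prompt_tokens", 0) * INPUT_PRICE
                  + usage.get("completion_tokens", 0) * OUTPUT_PRICE)
          self.collection.insert_one(
              {"chat_id": self.chat_id, "token_usage": usage, "cost_usd": cost})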

9. Architecture Overview

LLM Query Flow Diagram
Web Search and Image Generation remain model-gated; verify the model-specific logic before binding or invoking these tools. For deeper detail, inspect the Service Controller and the LangGraph node implementations.

Troubleshooting