Agents allow enhanced chat experiences using custom prompts and contextual documents:
  • Automatically selects between Simple Chat and RAG Chat
  • Dynamically assembles tools based on agent settings and selected model
This guide outlines the complete flow from agent selection to response generation and cost tracking.
1

Flow Overview

  1. User submits a query along with a selected agent.
  2. Python API routes the request to the Service Controller.
  3. The Service Controller selects the appropriate model service based on user input.
Developer Tip:
  • The agent ID and configuration are critical at this point.
  • Ensure agent metadata (tools, prompt, doc presence) is fully loaded.
2

Agent Document Check

The system checks whether the selected agent has any uploaded documents.
  • If documents exist: A RAG Chat is triggered using:
    • Agent’s custom prompt
    • Toolset:
      • RAG Tool
      • Web Analysis Tool
      • Image Generation Tool (if GPT model selected)
      • Web Search Tool (if GPT-4.1 search selected)
  • If no documents: Falls back to a Simple Chat using:
    • Agent’s prompt
    • Toolset:
      • Simple Chat
      • Image Generation Tool (if GPT model selected)
      • Web Analysis Tool
      • Web Search Tool (if GPT-4.1 search selected)
Developer Tip:
  • Path selection logic is handled inside the Agent Routing Layer.
  • Make sure document presence is reflected in agent.doc_ids or a similar flag.
3

Context Construction

The following elements are combined into a single context prompt:
  • Chat History
  • Agent Prompt
  • User Query
This compiled context is passed to the LLM Chain for response generation.Developer Tip:
  • Prompt templates typically follow:
    "You are a helpful assistant. {agent_prompt}\n\n{chat_history}\n\nUser: {query}"
  • Token overflow is trimmed or batched using a rolling window strategy.
4

Response Streaming and Storage

  1. The LLM generates a response, which is streamed live to the user.
  2. System stores:
    • The LLM response in MongoDB
    • The token cost using a Cost Callback tracker
Developer Tip:
  • MongoDB schema should track agent ID, query, response, and model metadata.
  • Cost calculation is triggered after the LLM returns, and stored with token usage.
5

RAG with Agent (when docs exist)

RAG chat allows file-based augmentation when the agent has documents uploaded.
  1. Uploaded documents are:
    • Text-extracted
    • Split into chunks
    • Embedded using an embedding model
  2. Embeddings are stored in Qdrant (or Pinecone)
  3. During inference:
    • RAG Tool queries Qdrant for similar chunks
    • Retrieved chunks are used as context for the LLM to enhance answers
Developer Tip:
  • Ensure embeddings use the same model at upload and retrieval time.
  • Retrieval uses Top-k vector search. Use agent-level metadata for filtering.
6

Architecture Diagram

LLM Query Flow Diagram
Tool activation depends on both the selected agent and the model. Be sure to validate tool permissions and document presence during implementation and testing.

Troubleshooting