Python API implementation for agents that support both Simple Chat and RAG-style interactions with custom prompts and contextual documents.
Agents provide an enhanced chat experience by automatically selecting between Simple Chat and RAG Chat based on document availability and by dynamically assembling the appropriate tools.

Processing Flow

1. Agent Selection and Routing

  1. User submits query with selected agent
  2. Python API routes request to Service Controller
  3. Service Controller selects appropriate model service
Critical Requirements:
  • Agent ID and configuration must be fully loaded
  • Agent metadata (tools, prompt, document presence) verified
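A minimal routing sketch is shown below. The `Agent` dataclass, `AgentNotReady` exception, and the `controller.select_model_service` call are illustrative placeholders for the actual Service Controller interface, not confirmed API names.

```python
# Illustrative routing sketch; class and method names are assumptions.
from dataclasses import dataclass, field

@dataclass
class Agent:
    agent_id: str
    prompt: str
    tools: list = field(default_factory=list)
    doc_ids: list = field(default_factory=list)

class AgentNotReady(Exception):
    """Raised when agent metadata has not been fully loaded."""

def route_query(agent: Agent, query: str, controller):
    # Critical requirement: agent ID and configuration fully loaded.
    if not agent.agent_id or not agent.prompt:
        raise AgentNotReady(f"agent {agent.agent_id!r} is missing configuration")
    # Verify metadata: tools, prompt, and document presence.
    has_documents = bool(agent.doc_ids)
    # The Service Controller selects the appropriate model service.
    service = controller.select_model_service(agent)
    return service.handle(query=query, agent=agent, use_rag=has_documents)
```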

2. Document-Based Path Selection

With Documents (RAG Chat):
  • Agent’s custom prompt
  • RAG Tool
  • Web Analysis Tool
  • Image Generation Tool (GPT models only)
  • Web Search Tool (GPT-4.1 search only)
Without Documents (Simple Chat):
  • Agent’s prompt
  • Simple Chat Tool
  • Image Generation Tool (GPT models only)
  • Web Analysis Tool
  • Web Search Tool (GPT-4.1 search only)
Implementation Notes:
  • Path selection handled by Agent Routing Layer
  • Document presence tracked via agent.doc_ids or similar flag
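A sketch of the path-selection logic follows. It assumes the tools above map to string identifiers and that model capability can be inferred from the model name; both are illustrative assumptions, not the actual implementation.

```python
# Path selection based on document presence, mirroring the lists above.
def assemble_tools(agent, model_name: str) -> list:
    has_documents = bool(getattr(agent, "doc_ids", None))  # document presence flag
    # RAG Chat when the agent has documents, otherwise Simple Chat.
    tools = ["rag_tool"] if has_documents else ["simple_chat_tool"]
    tools.append("web_analysis_tool")             # available on both paths
    if is_gpt_model(model_name):
        tools.append("image_generation_tool")     # GPT models only
    if is_gpt41_search(model_name):
        tools.append("web_search_tool")           # GPT-4.1 search only
    return tools

def is_gpt_model(name: str) -> bool:
    # Assumed heuristic: capability derived from the model name.
    return name.lower().startswith("gpt")

def is_gpt41_search(name: str) -> bool:
    lowered = name.lower()
    return "gpt-4.1" in lowered and "search" in lowered
```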

3. Context Construction

Combined context elements:
  • Chat History - Previous conversation messages
  • Agent Prompt - Custom agent instructions
  • User Query - Current user input
Prompt Template Structure:
"You are a helpful assistant. {agent_prompt}\n\n{chat_history}\n\nUser: {query}"
Token Management:
  • Overflow handled using rolling window strategy
  • Context trimmed or batched as needed
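A minimal sketch of context assembly with rolling-window trimming. The whitespace token count is a stand-in for the model's real tokenizer (e.g. tiktoken), and the 4000-token budget is an arbitrary example.

```python
# Template mirrors the structure shown above.
TEMPLATE = "You are a helpful assistant. {agent_prompt}\n\n{chat_history}\n\nUser: {query}"

def count_tokens(text: str) -> int:
    # Crude whitespace approximation; swap in the model's tokenizer in practice.
    return len(text.split())

def build_context(agent_prompt: str, history: list, query: str, max_tokens: int = 4000) -> str:
    # Rolling window: keep the agent prompt and current query fixed,
    # drop the oldest history messages until the prompt fits the budget.
    window = list(history)
    while True:
        prompt = TEMPLATE.format(
            agent_prompt=agent_prompt,
            chat_history="\n".join(window),
            query=query,
        )
        if count_tokens(prompt) <= max_tokens or not window:
            return prompt
        window.pop(0)  # discard the oldest message and retry
```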

4. Response Generation and Storage

Processing:
  1. LLM generates the response, which is streamed live to the user
  2. Response and metadata are stored in MongoDB via the Cost Callback tracker
Data Stored:
  • LLM response content
  • Agent ID and query metadata
  • Token cost and usage metrics
  • Model configuration details
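A storage sketch for the Cost Callback step using pymongo. The database, collection, and field names are illustrative, not the production schema; `insert_one` is the only external call assumed.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
usage_log = client["agents_db"]["llm_usage"]  # hypothetical DB/collection names

def record_usage(agent_id: str, query: str, response_text: str,
                 usage: dict, model_config: dict):
    usage_log.insert_one({
        "agent_id": agent_id,                       # agent ID and query metadata
        "query": query,
        "response": response_text,                  # LLM response content
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "cost_usd": usage.get("cost_usd"),          # token cost metric
        "model": model_config,                      # model configuration details
        "created_at": datetime.now(timezone.utc),
    })
```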

5. RAG Implementation (Document-Based Agents)

Document Processing:
  1. Text extraction from uploaded files
  2. Content split into chunks
  3. Embedding generation using embedding model
  4. Storage in Qdrant (or Pinecone)
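An ingestion sketch using qdrant-client. The `agent_docs` collection name, chunk sizes, payload keys, and the `embed` callable are assumptions; the collection is assumed to already exist with a matching vector size.

```python
import uuid
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

qdrant = QdrantClient(url="http://localhost:6333")

def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list:
    # Fixed-size character chunks with overlap; real splitters may be smarter.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def ingest_document(agent_id: str, doc_id: str, text: str, embed):
    points = [
        PointStruct(
            id=str(uuid.uuid4()),
            vector=embed(chunk),  # must match the retrieval-time embedding model
            payload={"agent_id": agent_id, "doc_id": doc_id, "text": chunk},
        )
        for chunk in chunk_text(text)
    ]
    qdrant.upsert(collection_name="agent_docs", points=points)
```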
Inference Process:
  1. RAG Tool queries Qdrant for similar chunks
  2. Retrieved chunks used as LLM context
  3. Enhanced responses based on document content
Technical Requirements:
  • Consistent embedding model for upload and retrieval
  • Top-k vector search with agent-level metadata filtering
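A matching retrieval sketch: top-k vector search filtered to the agent's own documents, assuming the same collection and payload keys as the ingestion sketch above.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

qdrant = QdrantClient(url="http://localhost:6333")

def retrieve_chunks(agent_id: str, query: str, embed, top_k: int = 5) -> list:
    hits = qdrant.search(
        collection_name="agent_docs",
        query_vector=embed(query),  # same embedding model as at upload time
        query_filter=Filter(must=[
            # Agent-level metadata filtering.
            FieldCondition(key="agent_id", match=MatchValue(value=agent_id)),
        ]),
        limit=top_k,
    )
    # Retrieved chunks become the LLM context for the RAG Tool.
    return [hit.payload["text"] for hit in hits]
```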

Architecture

Agent Architecture Diagram

[Diagram: Agent Processing Architecture]

Tool Activation Matrix

| Tool             | Simple Chat | RAG Chat | Model Requirement    |
| ---------------- | ----------- | -------- | -------------------- |
| RAG Tool         | ✗           | ✓        | Any                  |
| Simple Chat Tool | ✓           | ✗        | Any                  |
| Web Analysis Tool| ✓           | ✓        | Any                  |
| Image Generation | ✓           | ✓        | GPT models only      |
| Web Search       | ✓           | ✓        | GPT-4.1 search only  |

Key Components

  • Agent Routing Layer: Path selection logic
  • Service Controller: Model service management
  • RAG Tool: Document-based context retrieval
  • LLM Chain: Response generation pipeline
  • Cost Callback: Token usage and pricing tracking
Tool activation depends on both selected agent and model. Validate tool permissions and document presence during implementation.
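One way to encode the matrix as data and validate tool activation is sketched below; `TOOL_RULES` and the model-name checks are illustrative assumptions, not the actual permission system.

```python
# Rules derived from the Tool Activation Matrix above.
TOOL_RULES = {
    "rag_tool":              {"paths": {"rag"},           "models": None},
    "simple_chat_tool":      {"paths": {"simple"},        "models": None},
    "web_analysis_tool":     {"paths": {"simple", "rag"}, "models": None},
    "image_generation_tool": {"paths": {"simple", "rag"}, "models": "gpt"},
    "web_search_tool":       {"paths": {"simple", "rag"}, "models": "gpt-4.1-search"},
}

def validate_tool(tool: str, path: str, model_name: str) -> bool:
    rule = TOOL_RULES.get(tool)
    if rule is None or path not in rule["paths"]:
        return False
    required = rule["models"]
    if required == "gpt":
        return model_name.lower().startswith("gpt")
    if required == "gpt-4.1-search":
        lowered = model_name.lower()
        return "gpt-4.1" in lowered and "search" in lowered
    return True  # no model restriction ("Any")
```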

Troubleshooting

Agent RAG Not Triggering

  • Confirm agent has active documents linked in backend
  • Verify embeddings generated and stored correctly in Qdrant
  • Check model selection supports required tools
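A quick diagnostic that counts stored vectors for an agent; zero means embeddings were never written. The collection and payload key names are assumptions carried over from the ingestion sketch.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

qdrant = QdrantClient(url="http://localhost:6333")

def count_agent_vectors(agent_id: str) -> int:
    result = qdrant.count(
        collection_name="agent_docs",
        count_filter=Filter(must=[
            FieldCondition(key="agent_id", match=MatchValue(value=agent_id)),
        ]),
        exact=True,
    )
    return result.count  # 0 means no embeddings are stored for this agent

print(count_agent_vectors("agent-123"))
```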

Agent Prompt Issues

  • Review agent metadata and prompt formatting
  • Verify context builder includes agent prompt and history
  • Check LLM Chain handler logs for prompt construction issues
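When prompt construction is suspect, logging the fully assembled prompt before it enters the LLM Chain makes malformed agent prompts or dropped history visible. A minimal sketch with the standard logging module; the logger name is an assumption.

```python
import logging

logger = logging.getLogger("llm_chain")
logging.basicConfig(level=logging.DEBUG)

def log_prompt(agent_id: str, prompt: str):
    # Log size first so truncated contexts stand out at a glance.
    logger.debug("agent=%s prompt_chars=%d", agent_id, len(prompt))
    logger.debug("assembled prompt:\n%s", prompt)
```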