Agents: Working and Implementation

This page describes the Python API implementation of agents, which supports both Simple Chat and RAG-style interactions with custom prompts and contextual documents.

Agents provide enhanced chat experiences by automatically selecting between Simple Chat and RAG Chat based on document availability and dynamically assembling tools.

Processing Flow

1. Agent Selection and Routing

User submits query with selected agent
Python API routes request to Service Controller
Service Controller selects appropriate model service

Critical Requirements:

Agent ID and configuration must be fully loaded
Agent metadata (tools, prompt, document presence) verified
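
The checks above could be expressed as a small validation step that runs before routing. This is a minimal sketch, not the actual implementation: the `AgentConfig` class, its field names, and `validate_agent` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    # Hypothetical shape: the source only says that tools, prompt,
    # and document presence must be verified before routing.
    agent_id: str
    prompt: str
    tools: list = field(default_factory=list)
    doc_ids: list = field(default_factory=list)


def validate_agent(agent):
    """Return a list of problems; an empty list means the agent can be routed."""
    problems = []
    if not agent.agent_id:
        problems.append("missing agent_id")
    if not agent.prompt.strip():
        problems.append("empty prompt")
    if not agent.tools:
        problems.append("no tools configured")
    return problems
```

Returning a list of problems rather than raising on the first one lets the caller report every configuration gap in a single pass.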

2. Document-Based Path Selection

With Documents (RAG Chat), the agent is assembled with:

Agent’s custom prompt
RAG Tool
Web Analysis Tool
Image Generation Tool (GPT models only)
Web Search Tool (GPT-4.1 search only)

Without Documents (Simple Chat), the agent is assembled with:

Agent’s prompt
Simple Chat Tool
Image Generation Tool (GPT models only)
Web Analysis Tool
Web Search Tool (GPT-4.1 search only)

Implementation Notes:

Path selection handled by Agent Routing Layer
Document presence tracked via agent.doc_ids or similar flag
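
The path decision itself reduces to a check on document presence. A minimal sketch, assuming the flag is a list of document IDs as `agent.doc_ids` suggests; the `ChatPath` enum and `select_path` function are hypothetical names:

```python
from enum import Enum


class ChatPath(Enum):
    SIMPLE = "simple_chat"
    RAG = "rag_chat"


def select_path(doc_ids):
    """Pick RAG Chat when the agent has at least one document, else Simple Chat."""
    # An empty list and None are both treated as "no documents".
    return ChatPath.RAG if doc_ids else ChatPath.SIMPLE
```

For example, `select_path(["doc-1"])` yields `ChatPath.RAG`, while `select_path([])` yields `ChatPath.SIMPLE`.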

3. Context Construction

Combined context elements:

Chat History - Previous conversation messages
Agent Prompt - Custom agent instructions
User Query - Current user input

Prompt Template Structure:

"You are a helpful assistant. {agent_prompt}\n\n{chat_history}\n\nUser: {query}"
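
Filling the template might look like the following sketch. The template string is taken from above; the `build_prompt` function and the representation of history as `(role, text)` pairs are assumptions for illustration.

```python
PROMPT_TEMPLATE = (
    "You are a helpful assistant. {agent_prompt}\n\n{chat_history}\n\nUser: {query}"
)


def build_prompt(agent_prompt, history, query):
    """Render the prompt template from the three context elements."""
    # history is assumed to be a list of (role, text) pairs, oldest first.
    chat_history = "\n".join(f"{role}: {text}" for role, text in history)
    return PROMPT_TEMPLATE.format(
        agent_prompt=agent_prompt, chat_history=chat_history, query=query
    )
```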

Token Management:

Overflow handled using rolling window strategy
Context trimmed or batched as needed
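
One way to read the rolling window strategy is: keep the most recent messages that fit a token budget and drop the oldest first. The sketch below assumes that reading. `count_tokens` is a crude whitespace stand-in for a real tokenizer, and both function names are hypothetical.

```python
def count_tokens(text):
    # Stand-in only: counts whitespace-separated words.
    # A real system would use the tokenizer of the selected model.
    return len(text.split())


def trim_history(messages, budget):
    """Keep the newest messages whose combined token count fits within budget."""
    kept = []
    used = 0
    # Walk from newest to oldest so the most recent context survives.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Stopping at the first message that does not fit keeps the retained history contiguous, so the model never sees a conversation with gaps in the middle.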

4. Response Generation and Storage

Processing:

LLM generates the response, which is streamed live to the user
Response and usage data are stored in MongoDB via the Cost Callback tracker

Data Stored:

LLM response content
Agent ID and query metadata
Token cost and usage metrics
Model configuration details
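
The stored fields could be gathered into a single record once streaming finishes. This is a sketch under stated assumptions: the field names, the `build_usage_record` function, and the per-1,000-token pricing parameters are all hypothetical, and the actual schema may differ.

```python
from datetime import datetime, timezone


def build_usage_record(agent_id, query, response, model,
                       prompt_tokens, completion_tokens,
                       price_per_1k_prompt, price_per_1k_completion):
    """Assemble the document a cost callback might persist after a response."""
    # Prices are passed in by the caller; no real pricing is assumed here.
    cost = (prompt_tokens / 1000) * price_per_1k_prompt \
        + (completion_tokens / 1000) * price_per_1k_completion
    return {
        "agent_id": agent_id,
        "query": query,
        "response": response,
        "model": model,
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
            "cost": round(cost, 6),
        },
        "created_at": datetime.now(timezone.utc),
    }
```

A dictionary of this shape can be written to a MongoDB collection as a single document, for instance with pymongo's `insert_one`.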

5. RAG Implementation (Document-Based Agents)

Document Processing:

Text extraction from uploaded files
Content split into chunks
Embedding generation using embedding model
Storage in Qdrant (or Pinecone)

Inference Process:

RAG Tool queries Qdrant for similar chunks
Retrieved chunks used as LLM context
Enhanced responses based on document content

Technical Requirements:

Consistent embedding model for upload and retrieval
Top-k vector search with agent-level metadata filtering
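
The chunking and filtered top-k steps can be illustrated without a vector database. The sketch below swaps in an in-memory list and plain cosine similarity where the real system queries Qdrant; the function names, the point layout (`agent_id`, `vector`, `text`), and the fixed-size character chunking are assumptions.

```python
import math


def chunk_text(text, size, overlap=0):
    """Split text into windows of `size` characters, overlapping by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, points, agent_id, k=3):
    """Return the k chunk texts most similar to query_vec for one agent."""
    # Metadata filter first, so other agents' documents are never candidates.
    candidates = [p for p in points if p["agent_id"] == agent_id]
    ranked = sorted(candidates, key=lambda p: cosine(query_vec, p["vector"]),
                    reverse=True)
    return [p["text"] for p in ranked[:k]]
```

The query vector must come from the same embedding model that produced the stored vectors; otherwise the similarity scores are not comparable, which is why the first technical requirement above matters.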

Architecture

[Figure: Agent Processing Architecture]

Tool Activation Matrix

Tool	Simple Chat	RAG Chat	Model Requirement
RAG Tool	❌	✅	Any
Simple Chat Tool	✅	❌	Any
Web Analysis Tool	✅	✅	Any
Image Generation	✅	✅	GPT models only
Web Search	✅	✅	GPT-4.1 search only
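
The matrix can be encoded as data and queried at request time. In this sketch the tool identifiers, the `active_tools` function, and the model-name matching rules are assumptions; in particular, how the real system recognizes a "GPT model" or the "GPT-4.1 search" model is not stated in the source.

```python
TOOL_MATRIX = {
    # tool: (enabled in Simple Chat, enabled in RAG Chat, model requirement)
    "rag_tool": (False, True, "any"),
    "simple_chat_tool": (True, False, "any"),
    "web_analysis_tool": (True, True, "any"),
    "image_generation": (True, True, "gpt"),
    "web_search": (True, True, "gpt-4.1-search"),
}


def model_allows(requirement, model):
    """Check a model name against a requirement using assumed naming rules."""
    name = model.lower()
    if requirement == "any":
        return True
    if requirement == "gpt":
        return name.startswith("gpt")
    # Assumed rule for "GPT-4.1 search only".
    return name.startswith("gpt-4.1") and "search" in name


def active_tools(has_documents, model):
    """List the tools enabled for the chosen path and model."""
    column = 1 if has_documents else 0
    return [tool for tool, row in TOOL_MATRIX.items()
            if row[column] and model_allows(row[2], model)]
```

Keeping the matrix as a single table means a new tool or model restriction is a one-line change rather than another branch in the routing code.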

Key Components

Agent Routing Layer: Path selection logic
Service Controller: Model service management
RAG Tool: Document-based context retrieval
LLM Chain: Response generation pipeline
Cost Callback: Token usage and pricing tracking

Tool activation depends on both the selected agent and the model. Validate tool permissions and document presence before assembling the tool set.
