This guide explains how token usage and credit (cost) are calculated and stored for LLM API calls (OpenAI, Gemini, etc.) in your project.
## CostCalculator

`src/custom_lib/langchain/callbacks/huggingface/cost/cost_calc_handler.py`
| Method | Description |
| --- | --- |
| `add_prompt_tokens(count: int)` | Adds the number of tokens in the prompt |
| `add_completion_tokens(count: int)` | Adds the number of tokens in the completion |
| `calculate_total_cost(model_name: str)` | Calculates the total cost using the model's rate per 1K tokens |
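Taken together, the table describes a simple accumulator. Below is a minimal sketch of that interface; the attribute names and the flat illustrative rate are assumptions (the real class resolves rates via `MODEL_COST_PER_1K_TOKENS`, covered next):

```python
# Minimal sketch of the CostCalculator interface from the table above.
# Attribute names and the illustrative rate are assumptions, not the
# project's actual implementation.
_ILLUSTRATIVE_RATES = {"gpt-3.5-turbo": 0.002}  # $ per 1K tokens, example only


class CostCalculator:
    def __init__(self) -> None:
        self.prompt_tokens = 0
        self.completion_tokens = 0
        self.total_cost = 0.0

    def add_prompt_tokens(self, count: int) -> None:
        # Accumulate tokens sent in the prompt.
        self.prompt_tokens += count

    def add_completion_tokens(self, count: int) -> None:
        # Accumulate tokens generated in the completion.
        self.completion_tokens += count

    def calculate_total_cost(self, model_name: str) -> float:
        # cost = (total tokens / 1000) * rate-per-1K-tokens for this model
        rate = _ILLUSTRATIVE_RATES[model_name]
        self.total_cost = (self.prompt_tokens + self.completion_tokens) / 1000 * rate
        return self.total_cost
```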
Per-model rates are defined in the `MODEL_COST_PER_1K_TOKENS` mapping, and `get_openai_token_cost_for_model(model_name, input_tokens, output_tokens)` looks up the rate and returns the dollar cost for a call.
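A hedged sketch of that lookup, following the signature stated above. The dictionary shape, the `-completion` key suffix for output rates, and the dollar figures are illustrative assumptions; the real mapping lives in `model_cost_mapping.py`:

```python
# Illustrative rate table: dollars per 1K tokens, keyed by model name.
# The "-completion" suffix for output rates is an assumed convention;
# the project's actual mapping is defined in model_cost_mapping.py.
MODEL_COST_PER_1K_TOKENS = {
    "gpt-3.5-turbo": 0.0015,            # input rate, example only
    "gpt-3.5-turbo-completion": 0.002,  # output rate, example only
}


def get_openai_token_cost_for_model(
    model_name: str, input_tokens: int, output_tokens: int
) -> float:
    # Output tokens are usually priced higher; fall back to the input
    # rate when no separate completion rate is defined.
    input_rate = MODEL_COST_PER_1K_TOKENS[model_name]
    output_rate = MODEL_COST_PER_1K_TOKENS.get(f"{model_name}-completion", input_rate)
    return input_tokens / 1000 * input_rate + output_tokens / 1000 * output_rate
```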
## CostCalcCallbackHandler

`src/custom_lib/langchain/callbacks/huggingface/cost/cost_calc_handler.py`
| Method | Description |
| --- | --- |
| `async def on_llm_end(self, response: LLMResult, **kwargs)` | Triggered at the end of the LLM call; collects tokens, calculates cost, and persists usage |
Inside `on_llm_end`, the handler collects the token counts from the response, calls `CostCalculator.calculate_total_cost(...)`, assembles a `token_data = {...}` dictionary including usage and cost, and then calls `thread_repo.initialization(...)` to store the data. A Gemini-specific variant of the handler lives at `src/custom_lib/langchain/callbacks/gemini/cost/cost_calc_handler.py`. A hedged sketch of this flow follows.
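The sketch below reuses the `CostCalculator` sketch above. The `token_usage` keys follow the common OpenAI `llm_output` shape, and the constructor arguments plus the `update_token_usage` method name are assumptions:

```python
# Hedged sketch of the on_llm_end flow described above. Only the method
# signature and the initialization(...) call come from the guide; the
# rest is illustrative.
from langchain_core.callbacks import AsyncCallbackHandler
from langchain_core.outputs import LLMResult


class CostCalcCallbackHandler(AsyncCallbackHandler):
    def __init__(self, model_name, thread_repo, thread_id, collection_name):
        self.model_name = model_name
        self.calculator = CostCalculator()  # as sketched earlier
        self.thread_repo = thread_repo
        self.thread_id = thread_id
        self.collection_name = collection_name

    async def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        # 1. Collect the token counts reported by the provider.
        usage = (response.llm_output or {}).get("token_usage", {})
        self.calculator.add_prompt_tokens(usage.get("prompt_tokens", 0))
        self.calculator.add_completion_tokens(usage.get("completion_tokens", 0))

        # 2. Convert tokens to a dollar cost for this model.
        total_cost = self.calculator.calculate_total_cost(self.model_name)

        # 3. Assemble the usage record and persist it thread-wise.
        token_data = {
            "prompt_tokens": self.calculator.prompt_tokens,
            "completion_tokens": self.calculator.completion_tokens,
            "total_cost": total_cost,
        }
        self.thread_repo.initialization(
            thread_id=self.thread_id, collection_name=self.collection_name
        )
        self.thread_repo.update_token_usage(token_data)  # method name assumed
```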
## ThreadRepository

`src/chatflow_langchain/repositories/thread_repository.py`

Persists usage data per conversation thread. The repository is initialized with a `thread_id` and a `collection_name`, which identify where the record is written.
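A hypothetical usage sketch; only `initialization(...)`, `thread_id`, and `collection_name` come from the guide, and the import path is inferred from the file location above:

```python
# Hypothetical usage; the import path is inferred from the file location,
# and the argument values are placeholders.
from chatflow_langchain.repositories.thread_repository import ThreadRepository

thread_repo = ThreadRepository()
thread_repo.initialization(thread_id="some-thread-id", collection_name="chat_threads")
# The handler then hands its token_data dict to the repository, which
# writes the usage record against this thread.
```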
## Provider-Specific Handlers

Each provider has its own handler module, e.g. `callbacks/gemini/cost/cost_calc_handler.py` and `callbacks/huggingface/cost/cost_calc_handler.py`. Every handler implements `on_llm_end` and combines a `CostCalculator` with the shared `ThreadRepository`. A wiring example appears below.
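A hedged example of attaching such a handler to a LangChain call; it reuses the handler and repository sketches above, and the model name and IDs are placeholders:

```python
# Hedged wiring sketch: registering the handler so on_llm_end fires after
# the model call. Reuses CostCalcCallbackHandler and thread_repo from the
# sketches above; model name and IDs are placeholders.
import asyncio

from langchain_openai import ChatOpenAI


async def main() -> None:
    handler = CostCalcCallbackHandler(
        model_name="gpt-3.5-turbo",
        thread_repo=thread_repo,
        thread_id="some-thread-id",
        collection_name="chat_threads",
    )
    llm = ChatOpenAI(model="gpt-3.5-turbo")
    # Callbacks passed via config receive on_llm_end once the call
    # completes, triggering the token counting and persistence above.
    await llm.ainvoke("Hello!", config={"callbacks": [handler]})


asyncio.run(main())
```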
## File Overview

| File | Purpose |
| --- | --- |
| `cost_calc_handler.py` | Token & cost tracking logic |
| `thread_repository.py` | Persists thread-wise usage data |
| `*_cost_calc_handler.py` | API-specific handlers (OpenAI, Gemini, etc.) |
| `model_cost_mapping.py` | Model token cost rate definitions |