Intelligent Cost Optimization
EDDI's Model Cascading system enables cost-aware multi-model routing: requests start on fast, inexpensive models and escalate automatically to more powerful (and more expensive) models only when confidence is low, reducing LLM costs without sacrificing answer quality.
Cascading Features
- Cost Optimization — Try cheap/fast models first, escalate to powerful models only when confidence is low
- 4 Confidence Strategies — Structured output, heuristic, judge model, or none — choose the evaluation method that fits your use case (see the sketch after this list)
- Per-Conversation Budgets — Automatic cost tracking with budget caps; conversations that exceed their cap are evicted
- Tenant Cost Ceilings — Monthly cost budgets per tenant with automatic enforcement in multi-tenant deployments
- 12 LLM Providers — OpenAI, Anthropic, Google Gemini, Mistral, Azure OpenAI, Amazon Bedrock, Oracle GenAI, Vertex AI, Ollama, Jlama, Hugging Face, and any OpenAI-compatible endpoint
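To make the strategy differences concrete, here is a minimal sketch that models confidence evaluation as a pluggable interface. All names here are hypothetical assumptions for illustration, not EDDI's actual API; the structured-output and judge-model strategies require an extra LLM call and are only described in comments.

```java
/**
 * Hypothetical sketch of pluggable confidence evaluation.
 * Interface and class names are illustrative, not EDDI's real API.
 */
interface ConfidenceEvaluator {
    /** Returns a confidence score in [0.0, 1.0] for a model response. */
    double evaluate(String prompt, String response);
}

/** Heuristic strategy: penalize short answers and hedging phrases. */
class HeuristicEvaluator implements ConfidenceEvaluator {
    @Override
    public double evaluate(String prompt, String response) {
        String lower = response.toLowerCase();
        double score = 1.0;
        if (response.length() < 20) score -= 0.4;      // very short answers are suspect
        if (lower.contains("not sure")) score -= 0.3;  // hedging language
        if (lower.contains("cannot answer")) score -= 0.5;
        return Math.max(0.0, score);
    }
}

/** "None" strategy: always fully confident, so the cascade never escalates. */
class AlwaysConfident implements ConfidenceEvaluator {
    @Override
    public double evaluate(String prompt, String response) {
        return 1.0;
    }
}

// The structured-output strategy would ask the model itself to emit a
// confidence field alongside its answer; the judge-model strategy would
// pass (prompt, response) to a second model for grading. Both require an
// additional LLM call, so they are omitted from this sketch.
```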
How It Works
Configure a cascade chain of models ordered from cheapest to most expensive. For each user message, EDDI tries the cheapest model first and evaluates the response's confidence; if it falls below the configured threshold, EDDI automatically escalates to the next model in the chain. Because most queries in typical workloads are simple enough for smaller models, this approach can reduce LLM costs by 60-80%.
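As a rough sketch, the loop might look like the following, reusing the hypothetical ConfidenceEvaluator interface from the previous sketch. The LlmClient, ModelTier, and Cascade names and the flat per-call cost are illustrative assumptions, not EDDI's actual implementation; a real deployment would price by tokens and also enforce tenant-level ceilings.

```java
import java.util.List;

/** Hypothetical model client; in EDDI this would be backed by one of its providers. */
interface LlmClient {
    String complete(String prompt);
}

/** One step in the cascade: a model plus an assumed flat per-call cost. */
record ModelTier(String name, double costPerCall, LlmClient client) {}

/** Cheapest-first cascade with a per-conversation budget cap. */
class Cascade {
    private final List<ModelTier> tiers;          // ordered cheapest to most expensive
    private final ConfidenceEvaluator evaluator;
    private final double threshold;               // e.g. 0.7
    private final double budget;                  // per-conversation spending cap
    private double spent = 0.0;

    Cascade(List<ModelTier> tiers, ConfidenceEvaluator evaluator,
            double threshold, double budget) {
        this.tiers = tiers;
        this.evaluator = evaluator;
        this.threshold = threshold;
        this.budget = budget;
    }

    String answer(String prompt) {
        String best = null;
        for (ModelTier tier : tiers) {
            if (spent + tier.costPerCall() > budget) {
                break; // budget cap reached: stop escalating
            }
            best = tier.client().complete(prompt);
            spent += tier.costPerCall();
            if (evaluator.evaluate(prompt, best) >= threshold) {
                return best; // confident enough, no escalation needed
            }
            // below threshold: fall through to the next, stronger tier
        }
        return best; // last affordable tier's answer (null if none was affordable)
    }
}
```

Ordering the tiers cheapest-first means the common case, a simple query answered confidently by the first tier, costs a single cheap call; only hard queries pay for the tiers they pass through, and the per-conversation cap bounds the worst case.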