Models Overview
Arcee AI offers models at various sizes to meet different deployment scenarios. Choosing the right model helps you complete tasks more efficiently, accurately, and cost-effectively.
Trinity Mini and Trinity Large (Preview) are currently the only models available via the API. You can try Trinity-Large-Preview on OpenRouter.
| | (Coming Soon) | Trinity Mini | Trinity Large (Preview) |
|---|---|---|---|
| **Strength** | Lightweight, ultra-low-latency model. | Fast and cost-efficient model for well-defined tasks. | Robust generalist model with strong performance across reasoning, coding, math, and complex task decomposition. |
| **Ideal Deployment** | Fully local on consumer GPUs, edge servers, and mobile devices. Tuned for offline operation. | Serves customer-facing apps, agent backends, and high-throughput services in cloud or VPC. | Advanced agents, reasoning systems, and developer tools. Deployed via hosted cloud endpoints or self-hosted in multi-GPU configurations. |
| **Active Parameters** | 1B per token | 3B per token | 13B per token |
| **Context Window** | 128k tokens | 128k tokens | 512k tokens (hosted at 128k) |
| **Knowledge Cutoff** | 2024 | 2024 | 2024 |
| **Speed** | ⚡⚡⚡⚡⚡ Instant | ⚡⚡⚡ Very Fast | ⚡⚡⚡ Very Fast |
| **API Model Name** | Coming Soon | `trinity-mini` | `trinity-large-preview` |
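Since the two API-available models trade off cost against reasoning depth, one simple pattern is to route requests by task complexity using the API model names from the table above. The sketch below builds an OpenAI-style chat-completions payload; the payload shape is an assumption for illustration, and `pick_model` is a hypothetical helper, not part of any Arcee SDK.

```python
# Hypothetical routing helper: well-defined tasks go to trinity-mini,
# complex reasoning goes to trinity-large-preview. Model names come from
# the table above; the OpenAI-style payload shape is an assumption.

def pick_model(needs_reasoning: bool) -> str:
    """Return the API model name suited to the task."""
    return "trinity-large-preview" if needs_reasoning else "trinity-mini"

def build_request(prompt: str, needs_reasoning: bool = False) -> dict:
    """Assemble a chat-completions request body for the chosen model."""
    return {
        "model": pick_model(needs_reasoning),
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Summarize this support ticket.")
print(payload["model"])  # trinity-mini
```

For example, a summarization backend could call `build_request(text)` for routine tickets and pass `needs_reasoning=True` only for multi-step planning tasks.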