Trinity Large (400B)
Overview
Trinity Large (Preview) is a 400B-parameter (13B active) sparse mixture-of-experts language model, engineered to scale model capacity while maintaining inference efficiency over long contexts. It delivers strong performance on reasoning-heavy workloads, including math, coding, and multi-step agent workflows.
Key Features
Sparse mixture-of-experts architecture: Uses an extremely sparse MoE design with 400B total parameters and 13B activated per token. Sparse expert routing constrains per-token activation, enabling efficient inference at scale.
Long-context training and utilization: Trained at a 256K-token sequence length with support for 512K-token inference (hosted deployments serve 128K), using architecture and training procedures designed to operate effectively over long inputs and extended multi-turn interactions.
High throughput efficiency: Designed with inference-time efficiency as a primary objective, leveraging both extreme sparsity and optimized attention mechanisms to achieve strong throughput on modern accelerator hardware.
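The sparse routing described above (4 of 256 experts activated per token) can be sketched as follows. This is a minimal illustration of top-k gating, not the model's actual router; `route_topk` is a hypothetical helper.

```python
import numpy as np

def route_topk(gate_logits, k=4):
    """Keep the k largest gate logits per token and softmax-normalize them."""
    topk = np.argsort(gate_logits, axis=-1)[:, -k:]          # indices of the k chosen experts
    picked = np.take_along_axis(gate_logits, topk, axis=-1)
    w = np.exp(picked - picked.max(axis=-1, keepdims=True))  # stable softmax over the k logits
    w /= w.sum(axis=-1, keepdims=True)
    return topk, w

rng = np.random.default_rng(0)
gate_logits = rng.normal(size=(8, 256))      # 8 tokens, one gate logit per expert
idx, weights = route_topk(gate_logits, k=4)  # only 4 of 256 experts fire per token
```

Because only the selected experts' feed-forward blocks run, per-token compute tracks the 13B active parameters rather than the 400B total.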
Deployment Quickstart
To get started deploying Trinity Large, download the model here, then proceed to Quick Deploys.
Model Summary
Name: Trinity-Large-400B (Preview)
Architecture: Mixture-of-Experts (MoE)
Parameters: 400 billion total, 13 billion active
Experts: 256 experts, 4 active per token
Attention Mechanism: Grouped Query Attention (GQA)
Training Tokens: 17 trillion
License: Apache 2.0
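The summary lists Grouped Query Attention, in which several query heads share each key/value head to shrink the KV cache over long contexts. A minimal sketch with made-up shapes (the head counts and dimensions here are illustrative assumptions, not the model's actual configuration):

```python
import numpy as np

def gqa_attention(q, k, v):
    """Grouped Query Attention: each group of query heads shares one KV head."""
    n_q_heads, seq, d = q.shape
    group = n_q_heads // k.shape[0]
    k = np.repeat(k, group, axis=0)                 # broadcast each KV head to its query group
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # scaled dot-product attention
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                   # softmax over key positions
    return w @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 5, 16))  # 8 query heads, 5 tokens, head dim 16
k = rng.normal(size=(2, 5, 16))  # only 2 KV heads: each serves 4 query heads
v = rng.normal(size=(2, 5, 16))
out = gqa_attention(q, k, v)     # one output per query head
```

With 2 KV heads instead of 8, the cached K and V tensors are 4x smaller, which is what makes GQA attractive at 128K+ context lengths.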
Recommended Inference Parameters
temperature: 0.8
top_p: 0.8
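These two settings interact at decode time: temperature rescales the logits, then top_p (nucleus) sampling keeps only the smallest set of tokens whose cumulative probability reaches 0.8. A minimal sketch of that procedure (illustrative only, not the serving stack's sampler):

```python
import numpy as np

def sample_token(logits, temperature=0.8, top_p=0.8, seed=0):
    """Temperature scaling followed by top_p (nucleus) sampling."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())           # stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                 # most likely token first
    cut = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cut]                              # smallest nucleus covering top_p mass
    p = probs[keep] / probs[keep].sum()             # renormalize within the nucleus
    return int(np.random.default_rng(seed).choice(keep, p=p))
```

Lowering temperature sharpens the distribution before the nucleus cut, so fewer tokens survive the top_p filter; raising either value makes decoding more exploratory.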
Last updated

