Trinity Large (400B)

Overview

Trinity Large (Preview) is a sparse mixture-of-experts language model with 400B total parameters and 13B active per token, engineered to scale model capacity while maintaining inference efficiency over long contexts. It performs strongly on reasoning-heavy workloads, including math, coding, and multi-step agent workflows.

Key Features

  • Sparse mixture-of-experts architecture: Uses an extremely sparse MoE design with 400B total parameters and 13B activated per token. Sparse expert routing constrains per-token activation, enabling efficient inference at scale (see the routing sketch after this list).

  • Long-context training and utilization: Trained at 256K sequence length with support for 512K inference (hosted at 128K), using an architecture and training procedure designed to operate effectively over long inputs and extended multi-turn interactions.

  • High throughput efficiency: Designed with inference-time efficiency as a primary objective, leveraging both extreme sparsity and optimized attention mechanisms to achieve strong throughput on modern accelerator hardware.
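
As a rough illustration of how sparse expert routing constrains per-token compute, here is a minimal PyTorch sketch of generic top-k routing using the counts published below (256 experts, 4 active). The hidden size, gate layer, and function names are placeholders for illustration, not Trinity's actual implementation:

```python
import torch
import torch.nn.functional as F

NUM_EXPERTS = 256  # total experts (matches the Model Summary)
TOP_K = 4          # experts activated per token
HIDDEN = 1024      # placeholder hidden size, not Trinity's real width

def route(hidden_states: torch.Tensor, gate: torch.nn.Linear):
    """Generic top-k expert routing: each token activates only TOP_K
    experts, so compute scales with active, not total, parameters."""
    logits = gate(hidden_states)                     # [tokens, NUM_EXPERTS]
    probs = F.softmax(logits, dim=-1)
    weights, expert_ids = probs.topk(TOP_K, dim=-1)  # [tokens, TOP_K]
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize
    return weights, expert_ids

gate = torch.nn.Linear(HIDDEN, NUM_EXPERTS, bias=False)
tokens = torch.randn(8, HIDDEN)   # 8 example token embeddings
w, ids = route(tokens, gate)
print(ids.shape)                  # torch.Size([8, 4]): 4 experts per token
```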

Deployment Quickstart

To get started deploying Trinity Large, download the model weights and proceed to Quick Deploys.
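
As one way to try the downloaded weights locally, here is a minimal offline-inference sketch assuming a vLLM-style serving stack; the model path, parallelism degree, and context length are placeholder assumptions, and Quick Deploys may use a different setup:

```python
from vllm import LLM, SamplingParams

# Placeholder path: point at the directory containing the downloaded weights.
llm = LLM(
    model="/models/Trinity-Large-400B-Preview",
    tensor_parallel_size=8,   # adjust to your accelerator count
    max_model_len=131072,     # 128K, matching the hosted context above
)

params = SamplingParams(temperature=0.8, top_p=0.8)  # recommended defaults
outputs = llm.generate(["Explain mixture-of-experts routing."], params)
print(outputs[0].outputs[0].text)
```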

Model Summary

  • Name: Trinity-Large-400B (Preview)

  • Architecture: Mixture-of-Experts

  • Parameters: 400 billion total, 13 billion active

  • Experts: 256 experts, 4 active per token

  • Attention Mechanism: Grouped Query Attention (GQA)

  • Training Tokens: 17 trillion

  • License: Apache 2.0

Recommended Inference Parameters

  • temperature: 0.8

  • top_p: 0.8
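
As a usage sketch, these defaults can be passed through an OpenAI-compatible client; the endpoint URL, API key, and model name below are placeholders to be replaced with your own deployment's values:

```python
from openai import OpenAI

# Placeholder endpoint and model name: substitute your deployment's values.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Trinity-Large-400B-Preview",
    messages=[{"role": "user", "content": "Summarize grouped query attention."}],
    temperature=0.8,  # recommended default
    top_p=0.8,        # recommended default
)
print(response.choices[0].message.content)
```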
