# Trinity-Large-Preview

**Overview**

Trinity Large (Preview) is a 400B-parameter (13B active) sparse mixture-of-experts language model, engineered to scale model capacity while maintaining inference efficiency over long contexts, with strong performance in reasoning-heavy workloads including math, coding-related tasks, and multi-step agent workflows.

**Key Features**&#x20;

* **Sparse mixture-of-experts architecture:** Uses an extremely sparse MoE design with 400B total parameters and 13B activated per token. Sparse expert routing constrains per-token activation, enabling efficient inference at scale.
* **Long-context training and utilization:** Trained at 256K sequence length with support for 512K inference (hosted at 128k), using architecture and training procedures designed to operate effectively over long inputs and extended multi-turn interactions over large inputs.
* **High throughput efficiency:** Designed with inference-time efficiency as a primary objective, leveraging both extreme sparsity and optimized attention mechanisms to achieve strong throughput on modern accelerator hardware.

### Deployment Quickstart

To get started deploying Trinity Large, download the model [here](https://huggingface.co/arcee-ai) and proceed to [Quick Deploys](/quick-deploys/hardware-prerequisites.md)

### Model Summary

|                                  |                                                        |
| -------------------------------- | ------------------------------------------------------ |
| Name                             | Trinity-Large-Preview                                  |
| Architecture                     | Mixture-of-Experts                                     |
| Parameters                       | 400 Billion Total, 13 Billion Active                   |
| Experts                          | 256 Experts, 4 Active                                  |
| Attention Mechanism              | Grouped Query Attention (GQA)                          |
| Training Tokens                  | 17 trillion                                            |
| License                          | Apache 2.0                                             |
| Recommended Inference Parameters | <ul><li>temperature: 0.8</li><li>top\_p: 0.8</li></ul> |

v


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.arcee.ai/language-models/trinity-large-preview.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.