# Trinity-Large-Thinking

**Overview**

Trinity-Large-Thinking is a reasoning-optimized variant of Arcee AI's Trinity-Large family: a 398B-parameter sparse Mixture-of-Experts (MoE) model with approximately 13B active parameters per token. Built on Trinity-Large-Base and post-trained with extended chain-of-thought reasoning and agentic RL, it delivers strong performance on agentic benchmarks while maintaining broad general capabilities.

Trinity-Large-Thinking generates explicit reasoning traces wrapped in `<think>...</think>` blocks before producing its final response. This thinking process is critical to the model's performance — **thinking tokens must be kept in context** for multi-turn conversations and agentic loops to function correctly.
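For applications that display or log the trace separately from the answer, the block can be split off with a small helper. The sketch below is illustrative (the `split_thinking` name is not part of any Arcee SDK) and assumes a single leading `<think>...</think>` block in the raw completion text:

```python
import re

def split_thinking(completion: str) -> tuple[str, str]:
    """Split a raw completion into (thinking_trace, final_answer).

    Assumes at most one <think>...</think> block at the start of the
    completion, as Trinity-Large-Thinking emits.
    """
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", completion, re.DOTALL)
    if match is None:
        return "", completion  # no trace present
    return match.group(1).strip(), match.group(2).strip()
```

Splitting is for display only: the full raw completion, thinking included, is what should go back into the conversation history.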


**Key Features**

* **Agentic-first design**: Purpose-built for tool calling, multi-step planning, and agent workflows
* **Strong agentic performance**: 88.0% on Tau2-Airline, 94.7% on Tau2-Telecom, and 91.9% on PinchBench (see the benchmark table below)
* **Native reasoning traces**: Extended chain-of-thought via `<think>...</think>` blocks
* **Compatible with major agent frameworks**: Works out of the box with [OpenClaw](https://github.com/openclaw) and [Hermes Agent](https://github.com/NousResearch/hermes-agent)

**Thinking-in-Context: Important Usage Note**

Trinity-Large-Thinking produces its reasoning trace inside a `<think>...</think>` block before generating the final response. In practice, this means:

1. **Multi-turn conversations**: When building chat applications, include the full assistant response (thinking + answer) in the conversation history for subsequent turns.
2. **Agentic loops**: When using Trinity-Large-Thinking as the backbone of an agent (OpenClaw, Hermes Agent, or custom), ensure your tool-calling loop preserves `<think>` blocks in the message history between steps.
3. **Context window management**: The 512k extended context window accommodates long reasoning chains across many agentic steps. If you must truncate history, prefer removing older turns entirely rather than stripping thinking tokens from recent turns.
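The points above can be sketched as a minimal history-management helper. Everything here (function names, the `max_messages` budget) is illustrative rather than an official API; the two key behaviors are storing the assistant reply verbatim and truncating by dropping whole old turns:

```python
def truncate_history(messages: list[dict], max_messages: int) -> list[dict]:
    """Drop the oldest non-system messages, keeping the system prompt."""
    if len(messages) <= max_messages:
        return messages
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Keep only the most recent messages; never strip <think> blocks
    # out of the turns that remain.
    keep = max(max_messages - len(system), 0)
    return system + (rest[-keep:] if keep else [])

history = [{"role": "system", "content": "You are a helpful agent."}]

def record_turn(user_text: str, assistant_reply: str) -> list[dict]:
    """Append one turn, storing the assistant reply verbatim (thinking + answer)."""
    history.append({"role": "user", "content": user_text})
    # Keep the full reply, including the <think>...</think> block.
    history.append({"role": "assistant", "content": assistant_reply})
    return truncate_history(history, max_messages=9)
```

The same pattern applies inside an agentic tool loop: each tool-call step's assistant message goes back into `messages` unmodified.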

For implementation details, pitfalls (`reasoning` vs `reasoning_content`), and Python/TypeScript examples, refer to the [Reasoning Traces](https://docs.arcee.ai/capabilities/reasoning-traces) page.

### Benchmarks

<table><thead><tr><th>Benchmark</th><th width="114.20703125" align="right">Trinity-Large-Thinking</th><th width="117.40625" align="right">Opus-4.6</th><th width="113.6171875" align="right">GLM-5</th><th width="113.8203125" align="right">MiniMax-M2.7</th><th width="124.5" align="right">Kimi-K2.5</th></tr></thead><tbody><tr><td>IFBench</td><td align="right">52.3</td><td align="right">53.1</td><td align="right">72.3</td><td align="right"><strong>75.7</strong></td><td align="right">70.2</td></tr><tr><td>GPQA-Diamond</td><td align="right">76.3</td><td align="right"><strong>89.2</strong></td><td align="right">81.6</td><td align="right">86.2</td><td align="right">86.9</td></tr><tr><td>Tau2-Airline</td><td align="right"><strong>88.0</strong></td><td align="right">82.0</td><td align="right">80.5</td><td align="right">80.0</td><td align="right">80.0</td></tr><tr><td>Tau2-Telecom</td><td align="right">94.7</td><td align="right">92.1</td><td align="right"><strong>98.2</strong></td><td align="right">84.8</td><td align="right">95.9</td></tr><tr><td>PinchBench</td><td align="right">91.9</td><td align="right"><strong>93.3</strong></td><td align="right">86.4</td><td align="right">89.8</td><td align="right">84.8</td></tr><tr><td>AIME25</td><td align="right">96.3</td><td align="right"><strong>99.8</strong></td><td align="right">93.3</td><td align="right">80.0</td><td align="right">96.3</td></tr><tr><td>BCFLv4</td><td align="right">70.1</td><td align="right"><strong>77.0</strong></td><td align="right">70.8</td><td align="right">70.6</td><td align="right">68.3</td></tr><tr><td>MMLU-Pro</td><td align="right">83.4</td><td align="right"><strong>89.1</strong></td><td align="right">85.8</td><td align="right">80.8</td><td align="right">87.1</td></tr><tr><td>SWE-bench Verified*</td><td align="right">63.2</td><td align="right"><strong>75.6</strong></td><td align="right">72.8</td><td align="right">75.4</td><td align="right">70.8</td></tr></tbody></table>

\*All models evaluated with mini-swe-agent-v2

### Deployment Quickstart

To get started deploying Trinity-Large-Thinking, download the model weights [here](https://huggingface.co/arcee-ai) and follow the [quick-deploys](https://docs.arcee.ai/quick-deploys "mention") guide.

### Model Summary

|                                  |                                      |
| -------------------------------- | ------------------------------------ |
| Name                             | Trinity-Large-Thinking               |
| Architecture                     | Sparse MoE (AfmoeForCausalLM)        |
| Parameters                       | 398 Billion Total, 13 Billion Active |
| Experts                          | 256 Experts, 4 Active                |
| Attention Mechanism              | Grouped Query Attention (GQA)        |
| Training Tokens                  | 17 trillion                          |
| License                          | Apache 2.0                           |
| Recommended Inference Parameters | temperature: 0.3                     |
