Trinity-Large-Thinking

Overview

Trinity-Large-Thinking is a reasoning-optimized variant of Arcee AI's Trinity-Large family — a 398B-parameter sparse Mixture-of-Experts (MoE) model with approximately 13B active parameters per token. Built on Trinity-Large-Base and post-trained with extended chain-of-thought reasoning and agentic RL, Trinity-Large-Thinking delivers state-of-the-art performance on agentic benchmarks while maintaining strong general capabilities.

Trinity-Large-Thinking generates explicit reasoning traces wrapped in <think>...</think> blocks before producing its final response. This thinking process is critical to the model's performance — thinking tokens must be kept in context for multi-turn conversations and agentic loops to function correctly.
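Downstream code that only wants the final answer (for display, logging, or evaluation) still needs to handle the reasoning trace. A minimal sketch of separating the two, assuming the `<think>...</think>` convention described above; the `split_thinking` helper name is illustrative, not part of any shipped SDK:

```python
import re

# Illustrative helper: split a completion into its reasoning trace and the
# final answer that follows the closing </think> tag.
def split_thinking(completion: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        # No reasoning trace emitted; treat the whole output as the answer.
        return "", completion.strip()
    thinking = match.group(1).strip()
    answer = completion[match.end():].strip()
    return thinking, answer

raw = "<think>The user asked for 2+2. That is 4.</think>The answer is 4."
thinking, answer = split_thinking(raw)
print(answer)  # -> The answer is 4.
```

Note that this split is for presentation only: as described below, the full raw completion (thinking included) should be what goes back into the conversation history.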

Key Features

  • Agentic-first design: Purpose-built for tool calling, multi-step planning, and agent workflows

  • State-of-the-art agentic performance: 94.7% on Tau2-Telecom, 88.0% on Tau2-Airline, 91.9% on PinchBench

  • Native reasoning traces: Extended chain-of-thought via <think>...</think> blocks

  • Compatible with major agent frameworks: Works out of the box with OpenClaw and Hermes Agent

Thinking-in-Context: Important Usage Note

Trinity-Large-Thinking produces reasoning traces inside <think>...</think> blocks before generating its final response.

This means:

  1. Multi-turn conversations: When building chat applications, include the full assistant response (thinking + answer) in the conversation history for subsequent turns.

  2. Agentic loops: When using Trinity-Large-Thinking as the backbone of an agent (OpenClaw, Hermes Agent, or custom), ensure your tool-calling loop preserves <think> blocks in the message history between steps.

  3. Context window management: The 512k extended context window accommodates long reasoning chains across many agentic steps. If you must truncate history, prefer removing older turns entirely rather than stripping thinking tokens from recent turns.
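The three points above can be sketched as a small history-management routine. This is a minimal illustration, not a framework API; the function names and the character-count budget are assumptions (a real implementation would budget in tokens):

```python
# Sketch of agent-loop history handling per the notes above.
def append_turn(history, role, content):
    """Store the full message, including any <think>...</think> block."""
    history.append({"role": role, "content": content})

def truncate_history(history, max_chars):
    """If the history exceeds the budget, drop whole older user/assistant
    pairs rather than stripping thinking tokens from recent turns."""
    def size(msgs):
        return sum(len(m["content"]) for m in msgs)
    trimmed = list(history)
    # Preserve a leading system message, if present.
    start = 1 if trimmed and trimmed[0]["role"] == "system" else 0
    while size(trimmed) > max_chars and len(trimmed) > start + 1:
        del trimmed[start:start + 2]  # oldest user+assistant pair
    return trimmed

history = [{"role": "system", "content": "You are a helpful agent."}]
append_turn(history, "user", "What is 2+2?")
# The full assistant output, thinking included, goes back into the history.
append_turn(history, "assistant", "<think>2+2=4</think>It is 4.")
append_turn(history, "user", "And 3+3?")
trimmed = truncate_history(history, max_chars=60)
```

The key design choice, per point 3 above, is that truncation removes entire old turns while recent assistant messages keep their `<think>` blocks intact.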

Benchmarks

| Benchmark | Trinity-Large-Thinking | Opus-4.6 | GLM-5 | MiniMax-M2.7 | Kimi-K2.5 |
|---|---|---|---|---|---|
| IFBench | 52.3 | 53.1 | 72.3 | 75.7 | 70.2 |
| GPQA-Diamond | 76.3 | 89.2 | 81.6 | 86.2 | 86.9 |
| Tau2-Airline | 88.0 | 82.0 | 80.5 | 80.0 | 80.0 |
| Tau2-Telecom | 94.7 | 92.1 | 98.2 | 84.8 | 95.9 |
| PinchBench | 91.9 | 93.3 | 86.4 | 89.8 | 84.8 |
| AIME25 | 96.3 | 99.8 | 93.3 | 80.0 | 96.3 |
| BCFLv4 | 70.1 | 77.0 | 70.8 | 70.6 | 68.3 |
| MMLU-Pro | 83.4 | 89.1 | 85.8 | 80.8 | 87.1 |
| SWE-bench Verified* | 63.2 | 75.6 | 72.8 | 75.4 | 70.8 |

*All models evaluated in mini-swe-agent-v2

Deployment Quickstart

To get started deploying Trinity-Large-Thinking, download the model here and proceed to Quick Deploys.

Model Summary

| Name | Trinity-Large-Thinking |
|---|---|
| Architecture | Sparse MoE (AfmoeForCausalLM) |
| Parameters | 398 billion total, 13 billion active |
| Experts | 256 experts, 4 active |
| Attention Mechanism | Grouped Query Attention (GQA) |
| Training Tokens | 17 trillion |
| License | Apache 2.0 |
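The total/active split in the summary can be sanity-checked with quick arithmetic. Note the two sparsity figures differ: the exact layer breakdown is not given here, but dense components (attention, embeddings, and any shared experts) are always active, so the active-parameter fraction exceeds the pure expert-routing fraction:

```python
# Headline sparsity ratios from the model summary above.
total_params = 398e9    # 398B total
active_params = 13e9    # ~13B active per token
experts_total = 256
experts_active = 4

active_fraction = active_params / total_params     # about 3.3% of weights
routing_fraction = experts_active / experts_total  # 1/64, about 1.6% of experts
print(f"{active_fraction:.1%} of parameters active, "
      f"{routing_fraction:.1%} of experts routed per token")
```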

Recommended Inference Parameters

  • temperature: 0.3
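A minimal request sketch applying this setting, assuming an OpenAI-compatible chat-completions endpoint (e.g. a vLLM server); the model identifier and message content are placeholders, not fixed values from this card:

```python
import json

# Minimal chat-completions payload using the recommended temperature.
payload = {
    "model": "trinity-large-thinking",  # placeholder model id for your server
    "messages": [
        {"role": "user", "content": "Plan the steps to refactor this module."}
    ],
    "temperature": 0.3,  # recommended inference setting from above
}
print(json.dumps(payload, indent=2))
```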
