
Trinity-Mini (26B)

Overview

Trinity-Mini is a 26B-parameter (3.5B active) sparse mixture-of-experts language model, engineered for efficient inference over long contexts with robust function calling and multi-step agent workflows.
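For the function-calling and agent use cases mentioned above, the sketch below shows one way to request a tool call through an OpenAI-compatible endpoint serving the model. It is a minimal illustration only: the base URL, model ID, and `get_weather` tool are placeholders, not values defined on this page.

```python
# Hedged sketch: tool calling against an OpenAI-compatible server hosting
# Trinity-Mini. The base_url, model ID, and get_weather tool are
# illustrative placeholders, not values defined on this page.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Describe a single tool the model may choose to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="trinity-mini-26b",
    messages=[{"role": "user", "content": "What's the weather in Oslo right now?"}],
    tools=tools,
)

# If the model elected to call the tool, the structured call appears here;
# in a multi-step agent loop you would execute it and send the result back.
print(response.choices[0].message.tool_calls)
```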

Key Features

  • Efficient attention mechanism: reduces memory and compute requirements while preserving long-context coherence.

  • 128K-token context window: supports multi-turn interactions and extended document processing.

  • Strong context utilization: fully leverages long inputs for coherent multi-turn reasoning and reliable function/tool calls.

  • High inference efficiency: generates tokens rapidly while minimizing compute, delivering an outstanding price-to-performance ratio.

Deployment Quickstart

To get started deploying Trinity-Mini-26B, download the model here and proceed to Quick Deploys.
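As a minimal local-inference sketch, assuming the downloaded weights are a Hugging Face-style checkpoint loadable with `transformers` (the `trinity-mini-26b` path below is a placeholder for wherever you saved the model):

```python
# Minimal sketch: local inference with Hugging Face transformers.
# "trinity-mini-26b" is a placeholder; point it at the checkpoint you
# downloaded via the link above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "trinity-mini-26b"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,  # keep the MoE weights in bf16
    device_map="auto",           # spread layers across available GPUs
)

prompt = "Summarize the advantages of sparse mixture-of-experts models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```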

Model Summary

  • Name: Trinity-Mini-26B

  • Architecture: Mixture-of-Experts (MoE)

  • Parameters: 26 billion total, 3.5 billion active

  • Experts: 128 experts, 8 active

  • Attention Mechanism: Grouped Query Attention (GQA)

  • Training Tokens: 10 trillion

  • License: Apache 2.0

Recommended Inference Parameters

  • temperature: 0.15

  • top_p: 0.75

  • top_k: 50

  • min_p: 0.06
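
As a hedged example, these defaults can be passed through an OpenAI-compatible client; `top_k` and `min_p` are not part of the standard OpenAI request schema, so they ride along in `extra_body`, which many serving stacks accept. The endpoint URL and model ID are placeholders.

```python
# Sketch: applying the recommended sampling defaults via an
# OpenAI-compatible client. temperature/top_p are standard fields;
# top_k and min_p are passed through extra_body (accepted by common
# serving stacks such as vLLM). URL and model ID are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="trinity-mini-26b",
    messages=[{"role": "user", "content": "Briefly explain grouped query attention."}],
    temperature=0.15,
    top_p=0.75,
    extra_body={"top_k": 50, "min_p": 0.06},
)
print(response.choices[0].message.content)
```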
