Trinity-Nano (6B)
Overview
Trinity-Nano is a 6B-parameter (1B active) sparse Mixture-of-Experts language model optimized for high-efficiency inference in real-time, on-device, and embedded AI applications.
Key Features
Efficient attention mechanism: reduces memory and compute requirements while preserving long-context coherence.
128K-token context window: supports multi-turn interactions and extended document processing.
Strong context utilization: fully leverages long inputs for coherent multi-turn reasoning and reliable function/tool calls.
High inference efficiency: generates tokens quickly at low compute cost, yielding a strong price-to-performance ratio.
Deployment Quickstart
To get started deploying Trinity-Nano-6B, download the model here and proceed to Quick Deploys.
Model Summary
Name: Trinity-Nano-6B
Architecture: Mixture-of-Experts (MoE)
Parameters: 6 billion total, 1 billion active
Experts: 128 experts, 8 active per token
Attention Mechanism: Grouped Query Attention (GQA)
Training Tokens: 10 trillion
License: Apache 2.0
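The exact router used by Trinity-Nano is not described here; the following is a minimal, generic sketch in NumPy of top-k expert routing, illustrating how only 8 of the 128 experts contribute to each token (all names and shapes are illustrative assumptions).

```python
import numpy as np

def topk_route(router_logits, k=8):
    """Pick the top-k experts per token and softmax-renormalize their gate weights."""
    # Indices of the k highest-scoring experts for each token.
    topk = np.argsort(router_logits, axis=-1)[..., -k:]
    # Router logits of just those k winners.
    gates = np.take_along_axis(router_logits, topk, axis=-1)
    # Softmax over the k selected experts so their weights sum to 1.
    gates = np.exp(gates - gates.max(axis=-1, keepdims=True))
    gates = gates / gates.sum(axis=-1, keepdims=True)
    return topk, gates

# One token's router logits over 128 experts (random, for illustration).
rng = np.random.default_rng(0)
logits = rng.normal(size=(1, 128))
experts, weights = topk_route(logits, k=8)
```

Because only the 8 selected experts run their feed-forward computation, roughly 1B of the 6B parameters are active per token, which is where the model's inference efficiency comes from.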
Recommended Inference Parameters
temperature: 0.15
top_p: 0.75
top_k: 50
min_p: 0.06
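As one way to apply these settings, here is a sketch of a request body for an OpenAI-compatible completions endpoint; note that `top_k` and `min_p` are not part of the official OpenAI API but are accepted by common open-source serving stacks such as vLLM and llama.cpp (the served-model name and prompt are assumptions).

```python
# Recommended sampling settings from the model card.
sampling = {
    "temperature": 0.15,
    "top_p": 0.75,
    "top_k": 50,
    "min_p": 0.06,
}

# Example request body for an OpenAI-compatible /v1/completions server.
payload = {
    "model": "trinity-nano-6b",  # hypothetical served-model name
    "prompt": "Summarize the report in one sentence.",
    **sampling,
}
```

The low temperature and `min_p` cutoff bias generation toward the model's most confident tokens, which suits tool calling and other tasks where determinism matters more than diversity.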