Page cover

Trinity-Nano (6B)

Overview

Trinity Nano is a 6B-parameter (1B active) sparse mixture-of-experts language model, optimized for high-efficiency inference in real-time, on-device, and embedded AI applications.

Key Features

  • Efficient attention mechanism: reduces memory and compute requirements while preserving long-context coherence.

  • 128K-token context window: supports multi-turn interactions and extended document processing.

  • Strong context utilization: fully leverages long inputs for coherent multi-turn reasoning and reliable function/tool calls.

  • High inference efficiency: generates tokens rapidly while minimizing compute, delivering an outstanding price-to-performance ratio.

Deployment Quickstart

To get started deploying Trinity-6B, download the model here and proceed to Quick Deploys

Model Summary

Name

Trinity-Nano-6B

Architecture

Mixture-of-Experts

Parameters

6 Billion Total, 1 Billion Active

Experts

128 Experts, 8 Active

Attention Mechanism

Grouped Query Attention (GQA)

Training Tokens

10 trillion

License

Apache 2.0

Recommended Inference Parameters

  • temperature: 0.15

  • top_p: 0.75

  • top_k: 50

  • min_p: 0.06

Last updated