Consumer GPU Performance
This section summarizes Trinity model performance on common single-GPU consumer hardware.
Benchmarks were run on:
NVIDIA RTX 3090: 24 GB VRAM
NVIDIA RTX 4090: 24 GB VRAM
NVIDIA RTX 5090: ~32 GB VRAM
The results are organized by inference framework:
vLLM Benchmarks: performance of Trinity Nano using vLLM, including request throughput, token throughput, and latency metrics.
llama.cpp Benchmarks: performance of Trinity Nano and Trinity Mini using GGUF quantizations across decode speed, context scaling, and generation workloads.
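The vLLM results report request throughput, token throughput, and latency. The sketch below shows one common way to derive these three metrics from per-request timings. It is illustrative only. RequestRecord and summarize are hypothetical names and are not part of vLLM or of the harness used for these benchmarks. The exact definitions behind the published numbers may also differ; for example, token throughput here counts only generated tokens, not prompt tokens.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class RequestRecord:
    # Hypothetical per-request record; field names are illustrative assumptions.
    start_s: float      # time the request was sent, in seconds
    end_s: float        # time the final token was received, in seconds
    output_tokens: int  # number of generated (output) tokens


def summarize(records: list[RequestRecord]) -> dict[str, float]:
    """Compute throughput and latency over one benchmark run (assumed definitions)."""
    if not records:
        raise ValueError("need at least one request")
    # Wall-clock span of the run: first request sent to last token received.
    wall_s = max(r.end_s for r in records) - min(r.start_s for r in records)
    if wall_s <= 0:
        raise ValueError("run duration must be positive")
    # End-to-end latency of each request.
    latencies = [r.end_s - r.start_s for r in records]
    total_tokens = sum(r.output_tokens for r in records)
    return {
        "request_throughput_rps": len(records) / wall_s,
        "output_token_throughput_tps": total_tokens / wall_s,
        "mean_latency_s": mean(latencies),
        "median_latency_s": median(latencies),
    }


if __name__ == "__main__":
    # Three overlapping requests spanning a 4-second run (made-up timings).
    run = [
        RequestRecord(0.0, 2.0, 100),
        RequestRecord(0.5, 3.0, 150),
        RequestRecord(1.0, 4.0, 150),
    ]
    print(summarize(run))
```

With these made-up timings, the run lasts 4 seconds and completes 3 requests that generate 400 tokens. That works out to 0.75 requests/s and 100 tokens/s, with a mean end-to-end latency of 2.5 s. Because the requests overlap, throughput rises with concurrency even when per-request latency does not improve. For that reason, throughput and latency are usually reported together.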
All results shown are from single-GPU runs to reflect typical workstation and desktop deployments.
Benchmark Coverage
The benchmark dataset includes:
throughput and latency measurements
quantization sweeps
decode speed benchmarks
context scaling tests
real generation workloads (QA, code generation, long-form text)