# Consumer Hardware

## Consumer GPU Performance

This section summarizes Trinity model performance on common **single-GPU consumer hardware**.

Benchmarks were run on:

| GPU             | VRAM    |
| --------------- | ------- |
| NVIDIA RTX 3090 | 24 GB   |
| NVIDIA RTX 4090 | 24 GB   |
| NVIDIA RTX 5090 | 32 GB   |

The results are organized by inference framework:

* **vLLM Benchmarks**\
  Performance of Trinity Nano using vLLM, including request throughput, token throughput, and latency metrics.
* **llama.cpp Benchmarks**\
  Performance of Trinity Nano and Trinity Mini using GGUF quantizations across decode speed, context scaling, and generation workloads.

All results come from **single-GPU runs**, which reflect typical workstation and desktop deployments.

### Benchmark Coverage

The benchmark dataset includes:

* throughput and latency measurements
* quantization sweeps
* decode speed benchmarks
* context scaling tests
* real generation workloads (QA, code generation, long-form text)
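
To make the throughput and latency terms concrete, the sketch below shows one common way such figures can be derived from per-request timings. It is illustrative only: the `RequestRecord` layout and the `summarize` helper are hypothetical names introduced here, not part of the benchmark harness, vLLM, or llama.cpp, and the actual benchmarks may define these metrics differently.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class RequestRecord:
    """One completed request (hypothetical record layout, for illustration)."""
    start_s: float        # time the request was sent
    first_token_s: float  # time the first output token arrived
    end_s: float          # time the last output token arrived
    output_tokens: int    # number of generated tokens


def summarize(records: list[RequestRecord]) -> dict[str, float]:
    """Aggregate per-request timings into throughput and latency figures."""
    if not records:
        raise ValueError("no records to summarize")

    # Wall-clock span from the first request sent to the last token received.
    wall_s = max(r.end_s for r in records) - min(r.start_s for r in records)
    if wall_s <= 0:
        raise ValueError("run has no measurable duration")

    total_tokens = sum(r.output_tokens for r in records)
    latencies = [r.end_s - r.start_s for r in records]
    ttfts = [r.first_token_s - r.start_s for r in records]

    return {
        "requests_per_s": len(records) / wall_s,       # request throughput
        "output_tokens_per_s": total_tokens / wall_s,  # token throughput
        "mean_latency_s": mean(latencies),             # end-to-end latency
        "median_latency_s": median(latencies),
        "mean_ttft_s": mean(ttfts),                    # time to first token
    }


if __name__ == "__main__":
    # Made-up timings, used only to exercise the helper.
    demo = [
        RequestRecord(0.0, 0.5, 2.0, 100),
        RequestRecord(1.0, 1.5, 4.0, 200),
    ]
    for name, value in summarize(demo).items():
        print(f"{name}: {value:.2f}")
```

Dividing by the wall-clock span of the whole run, rather than summing per-request durations, keeps the throughput figures meaningful when requests overlap.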
