llama.cpp Benchmarks
RTX 3090
Nano decode
Quantization | tg128 (tokens/s)
Mini decode
Quantization | tg128 (tokens/s)
RTX 4090
Nano decode
Quantization | tg128 (tokens/s)
Mini decode
Quantization | tg128 (tokens/s)
RTX 5090
Nano decode
Quantization | tg128 (tokens/s)
Mini decode
Quantization | tg128 (tokens/s)
Context scaling (RTX 5090)
Model | ctx 512 | ctx 32768
Model compatibility
Model | Size | RTX 3090 | RTX 4090 | RTX 5090
Last updated

