# Inference Engines

Arcee models can be deployed across several popular inference engines depending on your hardware, performance goals, and integration needs. Each engine offers different strengths, from high-throughput GPU serving to lightweight local CPU inference. The table below summarizes the recommended environments and use cases for each option to help you choose the best deployment path for your application.

| Inference Engine | Recommended For                                                                                            |
| ---------------- | ---------------------------------------------------------------------------------------------------------- |
| **vLLM**         | GPU servers with high-throughput needs; predictable prompts, batch processing, and structured workflows    |
| **SGLang**       | Dynamic, multi-turn GPU workloads such as chat applications and assistants                                 |
| **llama.cpp**    | CPU or edge devices; quantized models and environments where you need efficient inference without a GPU    |
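
Whichever engine you choose, all three serve an OpenAI-compatible `/v1/chat/completions` endpoint, so your client code can stay the same across deployments. The sketch below builds a request body for that endpoint using only the standard library; the model identifier is a placeholder, not a specific Arcee release.

```python
import json

def chat_request_payload(model: str, prompt: str, max_tokens: int = 256) -> str:
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions call.

    vLLM, SGLang, and llama.cpp's llama-server all accept this request shape,
    so switching engines only changes the URL you send it to.
    """
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    })

# "arcee-model" is an illustrative placeholder, not an actual model name.
payload = chat_request_payload("arcee-model", "Hello!")
print(payload)
```

You would POST this body to whichever port your chosen engine listens on, with the `Content-Type: application/json` header set.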

To learn more about supported hardware and recommended setups, visit [Hardware Prerequisites](https://docs.arcee.ai/quick-deploys/hardware-prerequisites).
