Quick Deploys
Arcee's family of foundation models is designed to run efficiently in a variety of environments. Models under 8B parameters perform well in edge, on-device, CPU, and low-VRAM GPU environments; models larger than 8B excel with GPU inference.
For GPU deployments, we recommend using vLLM or SGLang.
For CPU deployments, we recommend using llama.cpp or Ollama.
Model size directly determines the hardware required to run it. For example, AFM-4.5B can run in as little as 3 GB of RAM when quantized to 4-bit, but needs at least 9 GB of RAM when loaded in bf16.
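To see where figures like these come from, here is a back-of-the-envelope sketch of the memory needed just to hold the weights (parameter count × bytes per parameter). The function name is illustrative, and real deployments need extra headroom for the KV cache and runtime overhead, which is why the 4-bit figure above is ~3 GB rather than the raw 2.25 GB of weights:

```python
def weight_memory_gb(num_params_billion: float, bits_per_param: float) -> float:
    """Memory (decimal GB) to hold model weights alone.

    Excludes KV cache, activations, and runtime overhead,
    which add to the practical requirement.
    """
    bytes_total = num_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# AFM-4.5B (4.5B parameters):
print(weight_memory_gb(4.5, 4))   # 4-bit quantized -> 2.25 GB of weights
print(weight_memory_gb(4.5, 16))  # bf16 -> 9.0 GB of weights
```

The same arithmetic applies to any model in the family: multiply the parameter count by the storage width to get a lower bound, then budget extra for cache and overhead.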
Deployment Guides:
For a detailed, step-by-step walkthrough of deploying AFM on various platforms, see AFM Learning Paths.