
AFM-4.5B Quick Deploys

AFM-4.5B is designed to run efficiently on low-VRAM GPUs and on CPUs. For GPU deployments, we recommend vLLM or SGLang. For CPU deployments, we recommend llama.cpp or Ollama.
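As a quick sketch of what those recommended stacks look like in practice (the model identifier, GGUF filename, and Ollama tag below are placeholders — substitute the ones from your download):

```shell
# GPU: serve an OpenAI-compatible endpoint with vLLM
# (Hugging Face model id shown is illustrative)
vllm serve <hf-org>/AFM-4.5B --port 8000

# CPU: serve a quantized GGUF with llama.cpp's built-in server
# (filename is illustrative; use whichever quantization you downloaded)
llama-server -m afm-4.5b-q4_k_m.gguf --port 8080

# CPU: or run interactively with Ollama (model tag assumed)
ollama run <model-tag>
```

Both `vllm serve` and `llama-server` expose an OpenAI-compatible HTTP API, so existing client code can point at either endpoint unchanged.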

AFM-4.5B can run in as little as 3 GB of RAM when quantized to 4-bit, and needs at least 9 GB of RAM when loaded in bf16. For a breakdown of AFM-4.5B performance on Intel Sapphire Rapids, AWS Graviton4, and Qualcomm Z1E-80-100 processors, read Is Running Language Models on CPU Really Viable?
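The bf16 and 4-bit figures above follow directly from the 4.5B parameter count; a back-of-envelope sketch:

```python
# Approximate weight memory for a 4.5B-parameter model at different precisions.
PARAMS = 4.5e9  # parameter count from the model name

def weight_gb(bits_per_param: float) -> float:
    """Weight memory in GB (excludes KV cache and runtime overhead)."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"bf16 : {weight_gb(16):.2f} GB")  # 9.00 GB -> the 9 GB minimum above
print(f"4-bit: {weight_gb(4):.2f} GB")   # 2.25 GB; ~3 GB in practice with overhead
```

The ~0.75 GB gap between the raw 4-bit weight size and the 3 GB figure is runtime overhead: activations, the KV cache, and quantization scales.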

Deployment Guides:

For a more detailed, step-by-step walkthrough of deploying AFM on various platforms, see AFM Learning Paths.
