AFM-4.5B
AFM-4.5B is the first model in the Arcee Foundation Model family. It is a 4.5-billion-parameter small language model that delivers performance on business tasks comparable to much larger models at vastly lower hosting costs, while being efficient enough to run on low-RAM GPUs or even CPUs.
AFM-4.5B comes in two variants: base and instruct. The base model was trained on 8 trillion tokens, comprising 6.5 trillion tokens of general pre-training data followed by 1.5 trillion tokens of mid-training data with an enhanced focus on mathematical reasoning and code generation. Following pre-training, the model underwent supervised fine-tuning on high-quality instruction datasets. The instruction-tuned model was further refined through reinforcement learning, both with verifiable rewards and for human preference alignment.
We used a modified version of TorchTitan for pre-training, Axolotl for supervised fine-tuning, and a modified version of Verifiers for reinforcement learning.
Both the base and instruct variants of AFM-4.5B are available on Hugging Face.
Get Started
To get started deploying AFM-4.5B on a GPU, proceed to AFM-4.5B GPU Quick Deploy.
To get started deploying AFM-4.5B on a CPU, proceed to AFM-4.5B CPU Quick Deploy.
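If you just want to verify the model locally before following either deployment guide, the instruct variant can be loaded with Hugging Face Transformers. This is a minimal sketch, assuming the repo id arcee-ai/AFM-4.5B (check the Hugging Face links above) and a machine with enough memory for the bf16 weights.

```python
# Minimal local sanity check with Hugging Face Transformers.
# Assumption: the instruct model lives at "arcee-ai/AFM-4.5B" on Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/AFM-4.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16, matching the GPU benchmarks below
    device_map="auto",           # GPU if available, otherwise CPU
)

messages = [{"role": "user", "content": "Explain grouped query attention in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```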
Model Summary
| Property | Value |
| --- | --- |
| Name | AFM-4.5B |
| Parameters | 4.5 billion |
| Architecture | Decoder-only Transformer |
| Activation Function | ReLU² |
| Attention | Grouped Query Attention |
| Training Tokens | 8 trillion |
| License | |
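For reference, the ReLU² activation listed above is the standard ReLU followed by squaring. A minimal sketch, assuming the usual squared-ReLU definition:

```python
import torch

def relu_squared(x: torch.Tensor) -> torch.Tensor:
    # ReLU² (squared ReLU): 0 for negative inputs, x**2 for positive inputs.
    return torch.relu(x).square()
```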
Recommended Inference Parameters
- temperature: 0.5
- top_k: 50
- top_p: 0.95
- repeat_penalty: 1.1
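The parameter names follow llama.cpp conventions. As an illustration, here is a sketch of passing them to a Q4_0 GGUF build of the model with llama-cpp-python; the GGUF file path is a placeholder, not an official artifact name.

```python
# Sketch: apply the recommended sampling parameters with llama-cpp-python.
# The GGUF path below is a placeholder; point it at your AFM-4.5B Q4_0 file.
from llama_cpp import Llama

llm = Llama(model_path="./afm-4.5b-q4_0.gguf", n_ctx=4096)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me three uses for a small language model."}],
    temperature=0.5,
    top_k=50,
    top_p=0.95,
    repeat_penalty=1.1,
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```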
Training Pipeline
Pre-training (6.5T tokens): General web, code, multilingual, and reasoning data.
Mid-training (1.5T tokens): Emphasis on math, programming, and structured reasoning.
Supervised Fine-tuning: High-quality instruction datasets for chat-style interactions.
RLHF: Reinforcement learning with verifiable rewards and human preference optimization.
Data Curation: Powered by DatologyAI, using model-based filtering, source mixing, and synthetic data generation.
Performance Characteristics
Factual Accuracy: Low hallucination rate due to a clean, curated training dataset.
Compliance: Minimal IP risk with exclusion of copyrighted books and restricted data.
Inference Efficiency: Suitable for real-time applications on lower-end GPUs or CPUs.
Multilingual: Supports Arabic, English, French, German, Hindi, Italian, Korean, Mandarin, Portuguese, Russian, and Spanish.
Performance Metrics
| Hardware | Context Length | Precision | Concurrent Requests | TPS per Request* |
| --- | --- | --- | --- | --- |
| H100 x 1 | 65536 (Max) | bf16 | 16 | 136 |
| H100 x 1 | 4096 | bf16 | 250 | 74.5 |
| L40S x 1 | 8190 | bf16 | 58 | 137 |
| L40S x 1 | 4096 | bf16 | 115 | 136 |
| Intel CPU¹ | 1024 | Q4_0 | 4 | 29 |
| Graviton4² | 1024 | Q4_0 | 4 | 60 |

¹ Intel Sapphire Rapids CPU with 32 threads
² AWS Graviton4 instance with 32 vCPUs
*TPS benchmarks represent tokens per second per request at maximum concurrent requests. TPS will increase with fewer concurrent requests, so the benchmark numbers effectively represent minimum TPS.
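As a rough reading of the table (an assumption, since only per-request TPS is reported), aggregate throughput is approximately per-request TPS multiplied by the number of concurrent requests:

```python
# Back-of-the-envelope aggregate throughput from two rows of the table above.
# Assumption: aggregate ≈ per-request TPS × concurrent requests; the real figure
# depends on the serving stack and batching behavior.
rows = {
    "H100 x 1 @ 4096 context": (250, 74.5),
    "L40S x 1 @ 4096 context": (115, 136),
}
for name, (concurrent, tps_per_request) in rows.items():
    print(f"{name}: ~{concurrent * tps_per_request:,.0f} tokens/s aggregate")
```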
Relevant Blogs
Announcing Arcee Foundation Models