AFM-4.5B
AFM-4.5B is the first model in the Arcee Foundation Model family. It is a 4.5-billion-parameter small language model that delivers performance on business tasks comparable to much larger models at a fraction of the hosting cost, while remaining efficient enough to run on low-RAM GPUs or even CPUs.
AFM-4.5B comes in two variants: base and instruct. The base model was trained on 8 trillion tokens, comprising 6.5 trillion tokens of general pre-training data followed by 1.5 trillion tokens of mid-training data with an enhanced focus on mathematical reasoning and code generation. Following pre-training, the model underwent supervised fine-tuning on high-quality instruction datasets. The instruct model was then further refined with reinforcement learning, both against verifiable rewards and for alignment with human preferences.
We used a modified version of TorchTitan for pre-training, Axolotl for supervised fine-tuning, and a modified version of Verifiers for reinforcement learning.
Both variants of AFM-4.5B are available on Hugging Face.
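For local experimentation, the instruct variant can be loaded with the Hugging Face transformers library. This is a minimal sketch only: the repository id arcee-ai/AFM-4.5B is an assumption based on the model name, so substitute the actual repository ids shown on Hugging Face for the instruct and base variants.

```python
# Minimal sketch: load the instruct variant from Hugging Face with transformers.
# The repository id "arcee-ai/AFM-4.5B" is an assumption based on the model name;
# substitute the base repository id to use the base variant instead.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/AFM-4.5B"  # assumed instruct repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16, as used in the GPU benchmarks below
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain the trade-offs of small language models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```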
Deployment Quickstart
To get started deploying AFM-4.5B, proceed to AFM-4.5B Quick Deploys.
Model Summary
Name: AFM-4.5B
Parameters: 4.5 billion
Architecture: Decoder-only Transformer
Activation Function: ReLU²
Attention: Grouped Query Attention
Training Tokens: 8 trillion
License: Apache 2.0
Recommended Inference Parameters
temperature: 0.5
top_k: 50
top_p: 0.95
repeat_penalty: 1.1
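As a sketch, here is how those settings map onto a transformers text-generation pipeline. Note that repeat_penalty is llama.cpp-style naming and corresponds to repetition_penalty in transformers; the repository id is the same assumption as in the loading example above.

```python
# Sketch: the recommended decoding settings applied through a transformers
# text-generation pipeline. "repeat_penalty" (llama.cpp naming) maps to
# "repetition_penalty" in transformers. The repository id is assumed.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="arcee-ai/AFM-4.5B",  # assumed instruct repository id
    torch_dtype="bfloat16",
    device_map="auto",
)

result = generator(
    [{"role": "user", "content": "Draft a short product update email."}],
    max_new_tokens=256,
    do_sample=True,
    temperature=0.5,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1,
)
print(result[0]["generated_text"][-1]["content"])
```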
Training Pipeline
Pre-training (6.5T tokens): General web, code, multilingual, and reasoning data.
Mid-training (1.5T tokens): Emphasis on math, programming, and structured reasoning.
Supervised Fine-tuning: High-quality instruction datasets for chat-style interactions.
RLHF: Reinforcement learning with verifiable rewards and human preference optimization.
Data Curation: Powered by DatologyAI, using model-based filtering, source mixing, and synthetic data generation.
Performance Characteristics
Factual Accuracy: Low hallucination rate thanks to a clean, curated training dataset.
Compliance: Minimal IP risk, as copyrighted books and restricted data were excluded from training.
Inference Efficiency: Suitable for real-time applications on lower-end GPUs or CPUs.
Multilingual: Supports Arabic, English, French, German, Hindi, Italian, Korean, Mandarin, Portuguese, Russian, and Spanish.
Performance Metrics
| Hardware | Context Length | Precision | Concurrent Requests | Tokens/sec per Request |
| --- | --- | --- | --- | --- |
| H100 x 1 | 65536 (Max) | bf16 | 16 | 136 |
| H100 x 1 | 4096 | bf16 | 250 | 74.5 |
| L40S x 1 | 8192 | bf16 | 55 | 59 |
| L40S x 1 | 4096 | bf16 | 109 | 64 |
| A10 x 1 | 8192 | bf16 | 12 | 65 |
| A10 x 1 | 4096 | bf16 | 25 | 75 |
| Intel CPU¹ | 1024 | Q4_0 | 4 | 29 |
| Graviton4² | 1024 | Q4_0 | 4 | 60 |

¹ Intel Sapphire Rapids CPU with 32 threads
² AWS Graviton4 instance with 32 vCPUs
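The CPU rows above use Q4_0 quantization, which runs well through llama.cpp. The sketch below uses the llama-cpp-python bindings with the same context length and thread count as the benchmarks; the GGUF file name is a placeholder for whichever Q4_0 conversion of AFM-4.5B you have locally.

```python
# Sketch of CPU inference with a Q4_0 GGUF through llama-cpp-python, mirroring the
# quantization, context length, and thread count of the CPU benchmark rows above.
# The GGUF file name is a placeholder, not an official artifact name.
from llama_cpp import Llama

llm = Llama(
    model_path="afm-4.5b-q4_0.gguf",  # placeholder path to a Q4_0 conversion
    n_ctx=1024,     # context length used in the CPU benchmark rows
    n_threads=32,   # matches the 32-thread / 32-vCPU footnotes
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "List three on-device uses for a 4.5B model."}],
    max_tokens=256,
    temperature=0.5,
    top_k=50,
    top_p=0.95,
    repeat_penalty=1.1,
)
print(out["choices"][0]["message"]["content"])
```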
Relevant Blogs
Announcing Arcee Foundation Models