
ollama

ollama provides a streamlined command-line interface and API for running open-source language models locally with automatic model management and optimized performance. It abstracts away the complexity of model deployment while offering simple installation and usage patterns for developers and end users.

The deployment steps in this document are written for AFM-4.5B; however, they work exactly the same for all Arcee AI models. To deploy a different model, simply change the model name to the one you'd like to deploy.

Prerequisites

  1. A computer or instance with more than 9 GB of RAM (if running the model in BF16)

  2. A Hugging Face account with access to arcee-ai/AFM-4.5B-GGUF

  3. Download ollama
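
If you haven't installed ollama yet, the official install script works on Linux (macOS and Windows installers are available from the ollama website):

curl -fsSL https://ollama.com/install.sh | sh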

Deployment

  1. Download an AFM-4.5B GGUF version from Hugging Face (we recommend using BF16, Q8_0, or Q4_0)

pip install --upgrade "huggingface_hub[cli]"
hf auth login

mkdir afm

# bf16
hf download arcee-ai/AFM-4.5B-GGUF AFM-4.5B-bf16.gguf --repo-type model --local-dir ./afm

# Q8_0
hf download arcee-ai/AFM-4.5B-GGUF AFM-4.5B-Q8_0.gguf --repo-type model --local-dir ./afm

# Q4_0
hf download arcee-ai/AFM-4.5B-GGUF AFM-4.5B-Q4_0.gguf --repo-type model --local-dir ./afm

  2. Create a Modelfile
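
One minimal way to do this, assuming the Modelfile is kept next to the downloaded weights in ./afm so that the relative FROM path in the next step resolves:

cd afm
touch Modelfile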

  3. Paste the following content into the Modelfile

In the first line, change FROM ./AFM-4.5B-Q4_0.gguf to the filename of the model you downloaded
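
A minimal Modelfile that loads the weights is sketched below. The TEMPLATE and PARAMETER directives recommended on the arcee-ai/AFM-4.5B-GGUF model card should be added as well; the PARAMETER line here is an illustrative assumption, not a recommendation from the original guide:

# Load the downloaded GGUF weights (edit to match your file)
FROM ./AFM-4.5B-Q4_0.gguf

# Optional sampling parameter (an illustrative assumption)
PARAMETER temperature 0.7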

  4. Create the model in ollama
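
For example, using afm-4.5b as an arbitrary local model name (any name works, as long as you reuse it in the next step), run this from the directory containing the Modelfile:

ollama create afm-4.5b -f Modelfile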

  5. Run AFM-4.5B
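
With the model name chosen above:

ollama run afm-4.5b

Once created, the model is also reachable through ollama's local HTTP API (default port 11434):

curl http://localhost:11434/api/generate -d '{"model": "afm-4.5b", "prompt": "Hello"}'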
