
ollama

ollama provides a streamlined command-line interface and API for running open-source language models locally with automatic model management and optimized performance. It abstracts away the complexity of model deployment while offering simple installation and usage patterns for developers and end users.

The deployment steps in this document are written for AFM-4.5B; however, they work exactly the same for all Arcee AI models. To deploy a different model, simply change the model name to the one you'd like to deploy.

Prerequisites

  1. A computer or instance with more than 9 GB of RAM (if running the model in BF16)

  2. A Hugging Face account with access to arcee-ai/AFM-4.5B-GGUF

  3. Download ollama
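
If you haven't installed ollama yet, the official install script works on Linux (macOS and Windows installers are available from the ollama website):

curl -fsSL https://ollama.com/install.sh | sh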

Deployment

  1. Download an AFM-4.5B GGUF version from Hugging Face (we recommend using BF16, Q8_0, or Q4_0)

pip install --upgrade "huggingface_hub[cli]"
hf auth login

mkdir afm

# bf16
hf download arcee-ai/AFM-4.5B-GGUF AFM-4.5B-bf16.gguf --repo-type model --local-dir ./afm

# Q8_0
hf download arcee-ai/AFM-4.5B-GGUF AFM-4.5B-Q8_0.gguf --repo-type model --local-dir ./afm

# Q4_0
hf download arcee-ai/AFM-4.5B-GGUF AFM-4.5B-Q4_0.gguf --repo-type model --local-dir ./afm

  2. Create a Modelfile
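
One minimal way to do this, assuming the Modelfile is kept next to the downloaded weights in ./afm so that the relative FROM path in the next step resolves:

cd afm
touch Modelfile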

  3. Paste the following content into the Modelfile

In the first line, change FROM ./AFM-4.5B-Q4_0.gguf to the filename of the model you downloaded
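
A minimal Modelfile that loads the weights is sketched below. The TEMPLATE and PARAMETER directives recommended on the arcee-ai/AFM-4.5B-GGUF model card should be added as well; the PARAMETER line here is an illustrative assumption, not a recommendation from the original guide:

# Load the downloaded GGUF weights (edit to match your file)
FROM ./AFM-4.5B-Q4_0.gguf

# Optional sampling parameter (an illustrative assumption)
PARAMETER temperature 0.7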

  4. Create the model in ollama
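
For example, using afm-4.5b as an arbitrary local model name (any name works, as long as you reuse it in the next step), run this from the directory containing the Modelfile:

ollama create afm-4.5b -f Modelfile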

  5. Run AFM-4.5B
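
With the model name chosen above:

ollama run afm-4.5b

Once created, the model is also reachable through ollama's local HTTP API (default port 11434):

curl http://localhost:11434/api/generate -d '{"model": "afm-4.5b", "prompt": "Hello"}'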
