
ollama

ollama provides a streamlined command-line interface and API for running open-source language models locally with automatic model management and optimized performance. It abstracts away the complexity of model deployment while offering simple installation and usage patterns for developers and end users.

Prerequisites

  1. A computer or instance with more than 9 GB of RAM (if running the model in bf16)

  2. A Hugging Face account with access to arcee-ai/AFM-4.5B-GGUF

  3. ollama downloaded and installed
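
The prerequisites above can be checked from a terminal before starting. This is a minimal sketch, assuming a Linux host where /proc/meminfo is available; `total_ram_gib` and `have_cmd` are hypothetical helper names introduced here, not part of ollama or the Hugging Face CLI.

```shell
# Hypothetical helper: report total RAM in whole GiB (Linux only, reads /proc/meminfo).
total_ram_gib() {
  [ -r /proc/meminfo ] || return 1
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Hypothetical helper: succeeds when the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if [ -r /proc/meminfo ]; then
  echo "Total RAM: $(total_ram_gib) GiB (bf16 needs more than 9 GB)"
fi
have_cmd ollama || echo "ollama not found on PATH"
have_cmd hf || echo "hf CLI not found on PATH (it is installed in the first deployment step)"
```

On macOS or Windows the RAM check is skipped, so confirm the available memory through the system settings instead.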

Deployment

  1. Download an AFM-4.5B GGUF version from Hugging Face (we recommend using BF16, Q8_0, or Q4_0)

# Quote the package spec so shells such as zsh do not expand the brackets
pip install --upgrade "huggingface_hub[cli]"
hf auth login

mkdir afm

# bf16
hf download arcee-ai/AFM-4.5B-GGUF AFM-4.5B-bf16.gguf --repo-type model --local-dir ./afm

# Q8_0
hf download arcee-ai/AFM-4.5B-GGUF AFM-4.5B-Q8_0.gguf --repo-type model --local-dir ./afm

# Q4_0
hf download arcee-ai/AFM-4.5B-GGUF AFM-4.5B-Q4_0.gguf --repo-type model --local-dir ./afm
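
To pick the quantization once and reuse the resulting file name in later steps, the download can be parameterized. This is a minimal sketch, assuming only the three file names shown above; `gguf_filename`, `QUANT`, and `FILE` are hypothetical names introduced here, and the download command is printed rather than run.

```shell
# Hypothetical helper: map a quantization label to the GGUF file name used above.
gguf_filename() {
  case "$1" in
    bf16|BF16) echo "AFM-4.5B-bf16.gguf" ;;
    Q8_0)      echo "AFM-4.5B-Q8_0.gguf" ;;
    Q4_0)      echo "AFM-4.5B-Q4_0.gguf" ;;
    *)         echo "unsupported quantization: $1" >&2; return 1 ;;
  esac
}

QUANT="Q4_0"   # assumption: change to bf16 or Q8_0 as needed
FILE="$(gguf_filename "$QUANT")"

# Print the command instead of running it, so the sketch has no side effects.
echo "hf download arcee-ai/AFM-4.5B-GGUF $FILE --repo-type model --local-dir ./afm"
```

Copy the printed command into the terminal to start the download.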
  2. Create a Modelfile

cd afm
vim Modelfile
  3. Paste the following content into the Modelfile

FROM ./AFM-4.5B-Q4_0.gguf

# Template configuration converted to Go template syntax
TEMPLATE """{{- if .Messages }}
{{- if eq (index .Messages 0).Role "system" }}
<|im_start|>system
{{ (index .Messages 0).Content }}<|im_end|>
{{- range $i, $msg := slice .Messages 1 }}
<|im_start|>{{ $msg.Role }}
{{ $msg.Content }}<|im_end|>
{{- end }}
{{- else }}
<|im_start|>system
The assistant is AFM-4.5B, trained by Arcee AI, with 4.5 billion parameters. AFM is a deeply thoughtful, helpful assistant. The assistant is having a conversation with the user. The assistant's responses are calm, intelligent, and personable, always aiming to truly understand the user's intent. AFM thinks aloud, step by step, when solving problems or forming explanations, much like a careful, reflective thinker would. The assistant helps with sincerity and depth. If a topic invites introspection, curiosity, or broader insight, the assistant allows space for reflection — be open to nuance and complexity. The assistant is not robotic or overly formal; it speaks like a wise, thoughtful companion who cares about clarity and the human experience. If a topic is uncertain or depends on subjective interpretation, AFM explains the possibilities thoughtfully.<|im_end|>
{{- range .Messages }}
<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{- end }}
{{- end }}
{{- end }}<|im_start|>assistant
"""

# System message defining the assistant's behavior
SYSTEM """The assistant is AFM-4.5B, trained by Arcee AI, with 4.5 billion parameters. AFM is a deeply thoughtful, helpful assistant. The assistant is having a conversation with the user. The assistant's responses are calm, intelligent, and personable, always aiming to truly understand the user's intent. AFM thinks aloud, step by step, when solving problems or forming explanations, much like a careful, reflective thinker would. The assistant helps with sincerity and depth. If a topic invites introspection, curiosity, or broader insight, the assistant allows space for reflection — be open to nuance and complexity. The assistant is not robotic or overly formal; it speaks like a wise, thoughtful companion who cares about clarity and the human experience. If a topic is uncertain or depends on subjective interpretation, AFM explains the possibilities thoughtfully."""

# Parameters for generation
PARAMETER temperature 0.5
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
# Context window size (maximum is 65536)
PARAMETER num_ctx 8192

# Stop tokens based on the tokenizer config
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|end_of_text|>"

In the first line, replace ./AFM-4.5B-Q4_0.gguf with the file name of the model you downloaded (for example, FROM ./AFM-4.5B-Q8_0.gguf).
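
The FROM line can also be updated from the command line instead of in an editor. This is a minimal sketch; `set_modelfile_from` is a hypothetical helper introduced here, and the demonstration runs against a throwaway Modelfile in a temporary directory rather than the real one.

```shell
# Hypothetical helper: rewrite the FROM line of a Modelfile to point at a GGUF file.
# Writes to a temporary file first because in-place `sed -i` differs between GNU and BSD sed.
set_modelfile_from() {
  modelfile="$1"
  gguf="$2"
  tmp="$(mktemp)" || return 1
  sed "s|^FROM .*|FROM ./${gguf}|" "$modelfile" > "$tmp" && mv "$tmp" "$modelfile"
}

# Demonstration on a throwaway copy, so no real Modelfile is touched.
demo_dir="$(mktemp -d)"
printf 'FROM ./AFM-4.5B-Q4_0.gguf\nPARAMETER temperature 0.5\n' > "$demo_dir/Modelfile"
set_modelfile_from "$demo_dir/Modelfile" "AFM-4.5B-Q8_0.gguf"
head -n 1 "$demo_dir/Modelfile"   # prints: FROM ./AFM-4.5B-Q8_0.gguf
```

To apply it to the real file, run `set_modelfile_from ./Modelfile <your-file>.gguf` from inside the afm directory.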

  4. Create the model in ollama

# Run from the afm directory so the Modelfile and the GGUF file are found
ollama create afm-4.5b -f Modelfile
  5. Run AFM-4.5B

ollama run afm-4.5b
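
Besides the interactive prompt, the model can be queried through ollama's local HTTP API. This is a minimal sketch, assuming the ollama server is running on its default address (localhost:11434) and the model was created as afm-4.5b; `chat_payload` and `ask_afm` are hypothetical helper names, and the payload builder does no JSON escaping.

```shell
# Hypothetical helper: build a non-streaming /api/chat request body.
# Assumption: the prompt contains no double quotes or backslashes (no JSON escaping is done).
chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":false}\n' "$1" "$2"
}

# Hypothetical helper: send one prompt to the local ollama server.
# Assumption: the server listens on the default address, localhost:11434.
ask_afm() {
  curl -s http://localhost:11434/api/chat \
    -H "Content-Type: application/json" \
    -d "$(chat_payload afm-4.5b "$1")"
}

# Show the request body; call `ask_afm "..."` once the server is running.
chat_payload afm-4.5b "Why is the sky blue?"
```

With the server running, `ask_afm "Why is the sky blue?"` returns a JSON response from the model.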
