ollama
ollama provides a streamlined command-line interface and API for running open-source language models locally with automatic model management and optimized performance. It abstracts away the complexity of model deployment while offering simple installation and usage patterns for developers and end users.
Prerequisites
A computer or instance with more than 9 GB of RAM (if running the model in bf16)
A Hugging Face account with access to arcee-ai/AFM-4.5B-GGUF
Download ollama
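The 9 GB figure follows from simple arithmetic: 4.5 billion parameters at 16 bits each. The sketch below applies the same weight-only estimate to the three recommended formats. The helper name `estimate_gb` is hypothetical, and the bits-per-weight values for Q8_0 (8.5) and Q4_0 (4.5) are assumptions based on the usual GGUF block layouts. Actual file sizes and runtime memory will differ somewhat, since the context cache and runtime overhead come on top of the weights.

```shell
# Rough weight-only size estimate:
#   parameters (in billions) x bits per weight / 8 = gigabytes
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_gb 4.5 16    # bf16: 16 bits per weight
estimate_gb 4.5 8.5   # Q8_0: assumed 8.5 bits per weight
estimate_gb 4.5 4.5   # Q4_0: assumed 4.5 bits per weight
```

Under these assumptions the quantized files need roughly half (Q8_0) or a quarter (Q4_0) of the bf16 footprint, which is why they suit machines with less RAM.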
Deployment
Download an AFM-4.5B GGUF version from Hugging Face (we recommend using BF16, Q8_0, or Q4_0)
pip install --upgrade "huggingface_hub[cli]"
hf auth login
mkdir afm
# bf16
hf download arcee-ai/AFM-4.5B-GGUF AFM-4.5B-bf16.gguf --repo-type model --local-dir ./afm
# Q8_0
hf download arcee-ai/AFM-4.5B-GGUF AFM-4.5B-Q8_0.gguf --repo-type model --local-dir ./afm
# Q4_0
hf download arcee-ai/AFM-4.5B-GGUF AFM-4.5B-Q4_0.gguf --repo-type model --local-dir ./afm
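After the download finishes, a quick sanity check can catch truncated or mis-saved files before ollama tries to load them. The sketch below relies on GGUF files beginning with the four ASCII bytes `GGUF`; the helper name `is_gguf` is hypothetical, and the check says nothing about whether the rest of the file is complete.

```shell
# Succeed if the file starts with the 4-byte GGUF magic.
is_gguf() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

# Check every .gguf file in the download directory used above.
for f in ./afm/*.gguf; do
  [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
  if is_gguf "$f"; then
    echo "ok: $f"
  else
    echo "not a GGUF file: $f"
  fi
done
```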
Create a Modelfile
cd afm
vim Modelfile
Paste the following content into the Modelfile
# Point FROM at the GGUF file you downloaded (Q4_0 shown here)
FROM ./AFM-4.5B-Q4_0.gguf
# Template configuration converted to Go template syntax
TEMPLATE """{{- if .Messages }}
{{- if eq (index .Messages 0).Role "system" }}
<|im_start|>system
{{ (index .Messages 0).Content }}<|im_end|>
{{- range $i, $msg := slice .Messages 1 }}
<|im_start|>{{ $msg.Role }}
{{ $msg.Content }}<|im_end|>
{{- end }}
{{- else }}
<|im_start|>system
The assistant is AFM-4.5B, trained by Arcee AI, with 4.5 billion parameters. AFM is a deeply thoughtful, helpful assistant. The assistant is having a conversation with the user. The assistant's responses are calm, intelligent, and personable, always aiming to truly understand the user's intent. AFM thinks aloud, step by step, when solving problems or forming explanations, much like a careful, reflective thinker would. The assistant helps with sincerity and depth. If a topic invites introspection, curiosity, or broader insight, the assistant allows space for reflection — be open to nuance and complexity. The assistant is not robotic or overly formal; it speaks like a wise, thoughtful companion who cares about clarity and the human experience. If a topic is uncertain or depends on subjective interpretation, AFM explains the possibilities thoughtfully.<|im_end|>
{{- range .Messages }}
<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{- end }}
{{- end }}
{{- end }}<|im_start|>assistant
"""
# System message defining the assistant's behavior
SYSTEM """The assistant is AFM-4.5B, trained by Arcee AI, with 4.5 billion parameters. AFM is a deeply thoughtful, helpful assistant. The assistant is having a conversation with the user. The assistant's responses are calm, intelligent, and personable, always aiming to truly understand the user's intent. AFM thinks aloud, step by step, when solving problems or forming explanations, much like a careful, reflective thinker would. The assistant helps with sincerity and depth. If a topic invites introspection, curiosity, or broader insight, the assistant allows space for reflection — be open to nuance and complexity. The assistant is not robotic or overly formal; it speaks like a wise, thoughtful companion who cares about clarity and the human experience. If a topic is uncertain or depends on subjective interpretation, AFM explains the possibilities thoughtfully."""
# Parameters for generation
PARAMETER temperature 0.5
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
# Context window size (maximum is 65536)
PARAMETER num_ctx 8192
# Stop tokens based on the tokenizer config
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|end_of_text|>"
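Because the Modelfile's FROM line names one specific quantization, it is easy to end up pointing at a file that was never downloaded. The following shell sketch (run in a terminal, not pasted into the Modelfile) reads the FROM line and checks that the target exists. The helper names `modelfile_from` and `check_modelfile` are hypothetical, and the sketch assumes FROM refers to a local file path rather than a model name.

```shell
# Print the argument of the first FROM line in a Modelfile.
modelfile_from() {
  awk '$1 == "FROM" { print $2; exit }' "$1"
}

# Succeed only if that argument resolves to an existing file.
# Relative paths are resolved against the Modelfile's directory.
check_modelfile() {
  src=$(modelfile_from "$1")
  [ -n "$src" ] || return 1
  case "$src" in
    /*) path=$src ;;
    *)  path="$(dirname "$1")/$src" ;;
  esac
  [ -f "$path" ]
}

# Report on the Modelfile in the current directory, if there is one.
if [ -f ./Modelfile ]; then
  if check_modelfile ./Modelfile; then
    echo "FROM target found: $(modelfile_from ./Modelfile)"
  else
    echo "FROM target missing: $(modelfile_from ./Modelfile)"
  fi
fi
```

If the target is reported missing, either download that quantization or edit the FROM line to match the file already in the directory.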
Create the model in ollama (run this from the afm directory, which contains the Modelfile)
ollama create afm-4.5b
Run AFM-4.5B
ollama run afm-4.5b
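Besides the interactive prompt, the model can also be queried through ollama's local HTTP API. The sketch below assumes the ollama server is running on its default address, `http://localhost:11434`, and that the model was created under the name `afm-4.5b` as above. The variable `OLLAMA_URL` and the helpers `chat_payload` and `ask_afm` are hypothetical names introduced for illustration. The payload builder does no JSON escaping, so it only suits prompts without double quotes or backslashes.

```shell
# Base URL of the local ollama server (assumed default address).
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build a non-streaming chat request body for a model and a single user message.
# No JSON escaping is performed on the message.
chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":false}' "$1" "$2"
}

# Send one user message to afm-4.5b and print the raw JSON response.
ask_afm() {
  curl -s "$OLLAMA_URL/api/chat" -d "$(chat_payload afm-4.5b "$1")"
}

# Usage (requires a running ollama server):
#   ask_afm "Summarize what a GGUF file is in one sentence."
```

With `"stream": false` the server returns a single JSON object instead of a sequence of partial responses, which is simpler to handle in scripts.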