ElevenLabs
ElevenLabs Agents is a conversational voice agent platform that combines automatic speech recognition (ASR), a pluggable language model, human-like TTS, and a turn-taking engine into a complete voice stack.
This tutorial walks you through integrating Arcee AI models as the language model for your ElevenLabs agent. The first section shows how to use our models through Together.ai; the second shows a self-hosted option.
Using Arcee Models with ElevenLabs Agents (via Together.ai)
Step 1: Create a Together.ai API Key
Sign in to the Together.ai dashboard and navigate to the API Keys page
Click “Create API Key”
Copy the key and store it securely
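To confirm the key works before wiring it into ElevenLabs, you can call Together's OpenAI-compatible API directly. A minimal sanity check (the TOGETHER_API_KEY shell variable below is just a local convenience, not something ElevenLabs needs):

```bash
# Keep the key in an environment variable for the examples below
export TOGETHER_API_KEY="paste-your-key-here"

# List the models available to your account; a JSON response confirms the key is valid
curl https://api.together.xyz/v1/models \
  -H "Authorization: Bearer $TOGETHER_API_KEY"
```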
Step 2: Connect an Arcee model to Your ElevenLabs Agent
In the ElevenLabs dashboard, go to Settings → Workspace Secrets
Click “Add a Secret”
Name: together-ai-api-key
Value: Paste your Together AI API key
Click “Add a Secret” to save it to your workspace
Go to the Agents tab from the left pane
Select your existing agent or create a new one
Scroll to the LLM section
Beside "Select which provider and model to use for the LLM", select “Custom LLM”
Fill in the following fields:
Server URL: https://api.together.xyz/v1
Model ID: the Arcee model you want to use, e.g. arcee-ai/AFM-4.5B
API Key: select the together-ai-api-key secret you created above
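Before saving, you can verify that the endpoint and model respond outside of ElevenLabs. A minimal sketch, assuming arcee-ai/AFM-4.5B is the model ID you chose (substitute your own if different):

```bash
# Send a test chat completion through Together's OpenAI-compatible endpoint
curl https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "arcee-ai/AFM-4.5B",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```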
Click Save to apply the agent configuration
Step 3: Test the Agent
Click "Test AI Agent" in the ElevenLabs dashboard to chat with the model.
Using Arcee Models with ElevenLabs Agents (Self-Hosted Example)
This section explains how to use one of our models as the LLM backbone for your agent by self-hosting it on your own infrastructure. We will use AFM-4.5B in this example.
Step 1: Deploy the model
Our models can be deployed using any framework that exposes an OpenAI-compatible endpoint. Choose one of the following depending on your environment:
| Framework | Best for |
| --- | --- |
| vLLM | GPU servers with high-throughput needs |
| SGLang | GPU serving, fast routing, OpenAI-style APIs |
| llama.cpp | CPU or edge devices, quantized inference |
| Ollama | Lightweight local deployments with a simple CLI |
Hardware Notes:
AFM‑4.5B can run on as little as 3 GB RAM when quantized to 4-bit
For bf16 inference, allocate at least 9 GB RAM
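For GPU deployments, a minimal vLLM launch might look like the sketch below. This is illustrative only; the Hugging Face model ID arcee-ai/AFM-4.5B is an assumption, so check the model card for the exact identifier:

```bash
# Serve AFM-4.5B behind an OpenAI-compatible API on port 8000
vllm serve arcee-ai/AFM-4.5B --host 0.0.0.0 --port 8000
```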
For deployment guides, refer to our documentation for each framework.
In this example, we will use a self-hosted AFM-4.5B served with llama.cpp. After you've followed the steps in the llama.cpp guide to download the model, complete the following:
Step 2: Launch the OpenAI-Compatible Server
Start llama-server with the correct model and context size. This will expose an OpenAI-compatible /v1/chat/completions endpoint:
```bash
bin/llama-server -m ./afm/AFM-4.5B-bf16.gguf \
  --host 0.0.0.0 \
  --port 8000 \
  --jinja \
  --ctx-size 8192
```
Make sure the `--jinja` flag is included. This is required to enable the OpenAI-compatible API.
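Before exposing the server publicly, confirm it answers locally. A quick check (the model field can be any string, since llama-server serves whichever model it was launched with):

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "afm-4-5b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```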
Step 3: Expose the Server with ngrok (Required)
To make your server accessible, create a public URL using a tunneling tool like ngrok:
```bash
ngrok http 8000
```
This will generate a public HTTPS URL like:
```
https://your-subdomain.ngrok-free.dev → http://localhost:8000
```
Keep this ngrok tunnel open while the agent is active.
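You can confirm the tunnel reaches your server by repeating the local check against the public URL (replace the subdomain with the one ngrok printed; the /v1/models route is part of llama-server's OpenAI-compatible API):

```bash
# Should return the model that llama-server is currently serving
curl https://your-subdomain.ngrok-free.dev/v1/models
```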
Step 4: Configure the ElevenLabs Agent to Use Your Self-Hosted Model
Configure your agent
Go to the Agents tab and open your agent
In the Model Configuration section, enter the ngrok URL with “/v1” appended, a placeholder model ID, and select “None” for the API key. For example, assuming the tunnel URL from the previous step:
Server URL: https://your-subdomain.ngrok-free.dev/v1
Model ID: any placeholder string, e.g. afm-4-5b (llama-server serves the model it was launched with)
API Key: None
Step 5: Test the Agent
Click "Test AI Agent" in the ElevenLabs dashboard to chat with the model.