Deepgram
Deepgram's Voice Agent is a flexible conversational AI stack that combines speech-to-text (STT), text-to-speech (TTS), and pluggable language models to power real-time, multi-turn voice applications.
This tutorial walks you through integrating Arcee AI models as the LLM backbone for your Deepgram voice agent. The first section shows how to use our models through Together.ai, and the second covers a self-hosted option.
Using Arcee Models with Deepgram Voice Agents (via Together.ai)
Step 1: Create a Together AI API Key
In your Together.ai account, click Create API Key
Copy the key and store it securely; you'll need it to authorize model requests
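For example, you might keep the key in an environment variable rather than pasting it into scripts (the variable name below is just a convention):

export TOGETHER_API_KEY="paste-your-key-here"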
Step 2: Configure the Deepgram Agent to Use our Model
Navigate to the Deepgram Voice Agent section of the Deepgram playground
Scroll to the Model section
Under Select a Large Language Model, choose:
Other – Custom model
Fill in the following fields (replacing the placeholder in the Authorization header with your Together AI API key from Step 1):
Custom Model Name
arcee-ai/AFM-4.5B (or any Arcee model)
Custom Model URL
https://api.together.xyz/v1/chat/completions
Custom Model API Format
OpenAI
Authorization Header
Authorization → Bearer YOUR_TOGETHER_API_KEY
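Before wiring the model into Deepgram, you can sanity-check the endpoint directly with curl. This is a minimal sketch; it assumes your key is exported as TOGETHER_API_KEY as suggested in Step 1:

# Send a test chat completion request to Together AI
curl https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "arcee-ai/AFM-4.5B", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'

If this returns a JSON response with a completion, the same URL, model name, and key will work in the Deepgram playground.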
Step 3: Test Your Agent
Scroll down and click Talk to your Agent
Speak to your agent or type a message
Open the Developer Console to view the underlying API calls and verify responses.
You’ll see the full conversation log in real time, and hear your model’s response played back using Deepgram’s TTS engine.
Using AFM‑4.5B with Deepgram Voice Agents (Self-Hosted)
This guide explains how to integrate a self-hosted Arcee model as the LLM backbone for your Deepgram voice agent. We will use AFM-4.5B for this example.
Step 1: Deploy a Model with an OpenAI-Compatible Server
Refer to our Quick Deploys section and our Hardware Prerequisites page to select a deployment method based on your use case and hardware. In this example, we'll use llama.cpp:
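A minimal sketch of the server command, assuming you have a local GGUF copy of the model (the file name and port are placeholders for your own setup):

# Serve the model with llama.cpp's OpenAI-compatible server on port 8080
llama-server -m AFM-4.5B.gguf --port 8080 --jinja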
Make sure --jinja is included to enable the OpenAI-compatible API.
Step 2: Expose the server via ngrok
Deepgram needs a public HTTPS endpoint to reach your model.
Use ngrok or any tunneling tool:
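For example, assuming the server from Step 1 is listening on port 8080:

# Open a public HTTPS tunnel to the local server
ngrok http 8080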
This will forward traffic to your local server and give you a public URL like https://your-subdomain.ngrok-free.dev
Keep this tunnel active while your Deepgram agent is running.
Step 3: Configure the Deepgram Agent to Use AFM‑4.5B
Navigate to the Deepgram Voice Agent section of the Deepgram playground
Scroll to the Model section
Under Select a Large Language Model, choose:
Other – Custom model
Fill in the following fields:
Custom Model Name
AFM
Custom Model URL
https://your-subdomain.ngrok-free.dev/v1/chat/completions
Custom Model API Format
OpenAI
Authorization Header
Authorization → Bearer None
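As a quick check, you can hit the tunneled endpoint with curl before testing in the playground. This is a sketch: substitute your actual ngrok URL, and note that llama.cpp serves whichever model it loaded regardless of the model name in the request:

# Send a test chat completion request through the ngrok tunnel
curl https://your-subdomain.ngrok-free.dev/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "AFM", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'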
Step 4: Test Your Agent
Scroll down and click Talk to your Agent
Speak to your agent
Open the Developer Console to examine the underlying API calls and verify responses
You’ll see the full conversation log in real time, and hear your model's response played back using Deepgram’s TTS engine.