# Deepgram

**Deepgram Voice Agents** is a flexible conversational AI stack that includes speech‑to‑text (STT), text‑to‑speech (TTS), and pluggable language models, designed to power real‑time, multi‑turn voice applications.

This tutorial will guide you through integrating Arcee AI models as the LLM backbone for your Deepgram voice agent. The first section shows how to use our models through **Together.ai**, and the second covers a **self-hosting** option.

***

## Using Arcee Models with Deepgram Voice Agents via Together.ai

#### Create a Together AI API Key

1. Go to [api.together.xyz/settings/api-keys](https://api.together.xyz/settings/api-keys)
2. Click **Create API Key**
3. Copy the key and store it securely; you’ll need it to authorize model requests
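Before wiring the key into Deepgram, you can sanity-check it with a direct call. The sketch below assumes Together's OpenAI-compatible `/v1/models` endpoint and reads the key from a `TOGETHER_API_KEY` environment variable (both assumptions, not part of the Deepgram setup itself).

```python
import json
import os
import urllib.request

TOGETHER_BASE_URL = "https://api.together.xyz/v1"

def build_models_request(api_key: str) -> urllib.request.Request:
    """Build an authenticated GET request for Together's model list."""
    return urllib.request.Request(
        f"{TOGETHER_BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

if __name__ == "__main__" and "TOGETHER_API_KEY" in os.environ:
    # Requires network access; a 200 response means the key is valid.
    req = build_models_request(os.environ["TOGETHER_API_KEY"])
    with urllib.request.urlopen(req, timeout=30) as resp:
        print("Key OK, HTTP status:", resp.status)
```

A `401` response here means the key was copied incorrectly or revoked.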

#### Configure the Deepgram Agent to Use our Model

1. Navigate to the **Deepgram Voice Agent** section of the Deepgram playground
2. Scroll to the **Model** section
3. Under **Select a Large Language Model**, choose:

   ```
   Other – Custom model
   ```
4. Fill in the following fields, replacing the API key in the authorization header with your Together API key from Step 1:

| Field                       | Value                                            |
| --------------------------- | ------------------------------------------------ |
| **Custom Model Name**       | arcee-ai/trinity-mini *(or any Arcee model)*     |
| **Custom Model URL**        | `https://api.together.xyz/v1/chat/completions`   |
| **Custom Model API Format** | OpenAI                                           |
| **Authorization Header**    | `Authorization` → `Bearer YOUR_TOGETHER_API_KEY` |

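The fields above describe a standard OpenAI-format chat completions call, which is what the Deepgram agent issues on your behalf. Here is a minimal sketch of the equivalent request; the helper name and the test prompt are illustrative, not part of either API.

```python
import json
import os
import urllib.request

def build_chat_request(url: str, model: str, user_text: str, api_key: str):
    """Build an OpenAI-format chat completions POST request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

if __name__ == "__main__" and "TOGETHER_API_KEY" in os.environ:
    req = build_chat_request(
        "https://api.together.xyz/v1/chat/completions",
        "arcee-ai/trinity-mini",
        "Say hello in one sentence.",
        os.environ["TOGETHER_API_KEY"],
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        reply = json.loads(resp.read())
        print(reply["choices"][0]["message"]["content"])
```

If this call succeeds from your machine, the same values will work in the playground form.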
#### Test Your Agent

1. Scroll down and click **Talk to your Agent**
2. Speak to your agent or type a message
3. Open the **Developer Console** to view the underlying API calls and verify responses.
4. You’ll see the full conversation log in real time, and hear your model’s response played back using Deepgram’s TTS engine.

***

## Using a Self-Hosted Model with Deepgram Voice Agents

This guide explains how to integrate a **self-hosted Arcee model** as the LLM backbone for your Deepgram voice agent. We'll use AFM-4.5B for this example.

#### Deploy a Model with an OpenAI-Compatible Server

Refer to our [Quick Deploys](/quick-deploys/hardware-prerequisites.md) section and our [Hardware Prerequisites](/quick-deploys/hardware-prerequisites.md) page to select a deployment method based on your use case and hardware. In this example, we'll use **llama.cpp**:

```bash
./bin/llama-server -m ./afm/AFM-4.5B-bf16.gguf \
  --host 0.0.0.0 \
  --port 8000 \
  --ctx-size 8192 \
  --jinja
```

Make sure `--jinja` is included so llama-server applies the model's chat template to OpenAI-compatible chat completion requests.
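Before exposing the server, it's worth a quick local smoke test. The sketch below probes llama-server's `/health` endpoint (path assumed from recent llama.cpp builds) on the port used above.

```python
import urllib.error
import urllib.request

def server_is_healthy(base_url: str = "http://localhost:8000") -> bool:
    """Return True if the llama-server health endpoint answers 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    if server_is_healthy():
        print("llama-server is up; safe to start the tunnel")
    else:
        print("llama-server not reachable on port 8000")
```

Only move on to tunneling once this reports the server as up; otherwise Deepgram will see connection errors with no useful detail.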

#### Expose the Server via ngrok

Deepgram needs a public HTTPS endpoint to reach your model.

Use ngrok or any tunneling tool:

```bash
ngrok http 8000
```

This will forward to your local server and give you a public URL like:

```
https://your-subdomain.ngrok-free.dev → http://localhost:8000
```

Keep this tunnel active while your Deepgram agent is running.

#### Configure the Deepgram Agent to Use Your Model

1. Navigate to the **Deepgram Voice Agent** section of the Deepgram playground
2. Scroll to the **Model** section
3. Under **Select a Large Language Model**, choose: `Other – Custom model`
4. Fill in the following fields:

| Field                       | Value                                                       |
| --------------------------- | ----------------------------------------------------------- |
| **Custom Model Name**       | AFM                                                         |
| **Custom Model URL**        | `https://your-subdomain.ngrok-free.dev/v1/chat/completions` |
| **Custom Model API Format** | OpenAI                                                      |
| **Authorization Header**    | `Authorization` → `Bearer None`                             |

#### Test Your Agent

1. Scroll down and click **Talk to your Agent**
2. Speak to your agent or type a message
3. Open the **Developer Console** to inspect the underlying API calls

You’ll see the full conversation log in real time, and hear your model's response played back using Deepgram’s TTS engine.


***

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.arcee.ai/get-started/integration-list/deepgram.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
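Since the question travels in a query parameter, it must be URL-encoded. A minimal sketch of building such a URL with the standard library (the sample question is illustrative):

```python
from urllib.parse import urlencode

# Page URL taken from the GET example above.
PAGE_URL = "https://docs.arcee.ai/get-started/integration-list/deepgram.md"

def ask_url(question: str) -> str:
    """Build a documentation-query URL with the question URL-encoded."""
    return f"{PAGE_URL}?{urlencode({'ask': question})}"

print(ask_url("Which Arcee models work with Deepgram voice agents?"))
```

The printed URL can then be fetched with any HTTP client, e.g. `curl "$(...)"`.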
