# Multi-Turn Conversations

Multi-turn conversations allow a model to retain the context of previous messages in a conversation, enabling richer, more coherent responses. This guide shows how to use Arcee AI models on the Arcee Platform for multi-turn conversations.

The Arcee AI `/chat/completions` API is "stateless": the server does not store any context between requests. The client must therefore **concatenate all previous conversation history** and pass it to the chat API with each request.

The following Python code demonstrates how to concatenate the conversation history to achieve multi-turn conversations.

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY", 
    base_url="https://api.arcee.ai/api/v1"
)

# Round 1
messages = [{"role": "user", "content": "What is a small language model?"}]
response = client.chat.completions.create(
    model="MODEL_NAME",
    messages=messages
)

answer = {"role": "assistant", "content": response.choices[0].message.content}
messages.append(answer)
print(f"Messages Round 1: {messages}")

# Round 2: append the follow-up question to the accumulated history
messages.append({"role": "user", "content": "How do they differ from LLMs?"})
response = client.chat.completions.create(
    model="arcee-ai/trinity-mini-thinking",
    messages=messages
)

answer = {"role": "assistant", "content": response.choices[0].message.content}
messages.append(answer)
print(f"Messages Round 2: {messages}")
```

***

In the **first round**, the `messages` passed to the API are:

```json
[
    {"role": "user", "content": "What is a small language model?"}
]
```

In the **second round**:

1. Add the model's output from the first round to the end of the `messages`.
2. Add the new question to the end of the `messages`.

The `messages` ultimately passed to the API are:

```json
[
    {"role": "user", "content": "What is a small language model?"},
    {"role": "assistant", "content": "A small language model refers to a model that has a relatively limited number of parameters compared to other large language models. Here's a detailed explanation:\n\nKey Characteristics:\n- Smaller-scale model architecture\n- Fewer parameters (typically tens of billions instead of hundreds of billions)\n- Reduced computational resources needed\n- Faster training and inference times\n- Potentially lower energy consumption\n\nUse Cases:\n1. Resource-constrained environments\n2. Real-time applications\n3. Embedded systems\n4. Low-power devices\n5. Situational-specific needs\n\nExamples include models like:\n- Duchowny et al.'s DLVM\n- Chan et al.'s SHOORN\n- Salimon's ALma (a mid-scale model)\n\nThese smaller models strike a balance between performance and efficiency, often making them suitable for specific applications where resource limitations are a factor."},
    {"role": "user", "content": "How do they differ from LLMs?"}
]
```
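
The same pattern generalizes to any number of rounds: keep a single `messages` list, append each user turn and each assistant reply, and send the whole list with every request. Below is a minimal sketch of a reusable helper that does this; the `chat` function name and the `MODEL_NAME` placeholder are illustrative, not part of the Arcee API.

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.arcee.ai/api/v1"
)

# Full conversation history, resent with every request
messages = []

def chat(user_input: str) -> str:
    """Illustrative helper: append the user turn, call the stateless API
    with the full history, append the assistant reply, and return it."""
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="MODEL_NAME",  # replace with the Arcee AI model you are using
        messages=messages
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

print(chat("What is a small language model?"))
print(chat("How do they differ from LLMs?"))
```

Because the entire history is sent on every call, long conversations consume more input tokens each round; trimming or summarizing older turns is a common way to keep requests within the model's context window.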

