# Multi-Turn Conversations

Multi-turn conversations allow a model to retain context from previous messages in a conversation, enabling a more coherent, in-depth exchange. This guide shows how to use Arcee AI models through the Arcee Platform for multi-turn conversations.

The Arcee AI `/chat/completions` API is a "stateless" API, meaning the server does not retain the context of previous requests. The client must therefore **concatenate all previous conversation history** and pass it to the chat API with each request.

The following Python code demonstrates how to concatenate context across requests to achieve a multi-turn conversation.

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY", 
    base_url="https://api.arcee.ai/api/v1"
)

# Round 1
messages = [{"role": "user", "content": "What is a small language model?"}]
response = client.chat.completions.create(
    model="MODEL_NAME",
    messages=messages
)

# Append the assistant's reply so the next request carries the full history
answer = {"role": "assistant", "content": response.choices[0].message.content}
messages.append(answer)
print(f"Messages Round 1: {messages}")

# Round 2
messages.append({"role": "user", "content": "How do they differ from LLMs?"})
response = client.chat.completions.create(
    model="arcee-ai/trinity-mini-thinking",
    messages=messages
)

answer = {"role": "assistant", "content": response.choices[0].message.content}
messages.append(answer)
print(f"Messages Round 2: {messages}")
```
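
If a conversation runs for many rounds, the append-and-call pattern above is easy to factor into a small helper. The sketch below is one possible wrapper, not part of the Arcee API; the `ask` function and the `MODEL_NAME` placeholder are illustrative.

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.arcee.ai/api/v1"
)

messages = []  # full conversation history, grows with every round

def ask(question: str) -> str:
    """Append the user's question, call the API with the full history,
    store the assistant's reply in the history, and return it."""
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(
        model="MODEL_NAME",
        messages=messages
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

print(ask("What is a small language model?"))
print(ask("How do they differ from LLMs?"))
```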

***

In the **first round**, the `messages` passed to the API are:

```json
[
    {"role": "user", "content": "What is a small language model?"}
]
```

In the **second round**:

1. Append the model's output from the first round to the end of `messages`.
2. Append the new question to the end of `messages`.

The `messages` ultimately passed to the API are:

```json
[
    {"role": "user", "content": "What is a small language model?"},
    {"role": "assistant", "content": "A small language model refers to a model that has a relatively limited number of parameters compared to other large language models. Here's a detailed explanation:\n\nKey Characteristics:\n- Smaller-scale model architecture\n- Fewer parameters (typically tens of billions instead of hundreds of billions)\n- Reduced computational resources needed\n- Faster training and inference times\n- Potentially lower energy consumption\n\nUse Cases:\n1. Resource-constrained environments\n2. Real-time applications\n3. Embedded systems\n4. Low-power devices\n5. Situational-specific needs\n\nExamples include models like:\n- Duchowny et al.'s DLVM\n- Chan et al.'s SHOORN\n- Salimon's ALma (a mid-scale model)\n\nThese smaller models strike a balance between performance and efficiency, often making them suitable for specific applications where resource limitations are a factor."},
    {"role": "user", "content": "How do they differ from LLMs?"}
]
```
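
Because the full history is resent with every request, the prompt grows with each round and token usage rises accordingly; long conversations will eventually approach the model's context window. A common mitigation, independent of any particular provider, is to keep the system message (if any) plus only the most recent turns. The `trim_history` helper below is a hypothetical sketch of that sliding-window approach.

```python
def trim_history(messages, max_messages=10):
    """Keep the system message (if present) plus the most recent
    user/assistant messages. A simple sliding-window strategy."""
    system = [m for m in messages if m["role"] == "system"]
    recent = [m for m in messages if m["role"] != "system"][-max_messages:]
    return system + recent
```

Call it on `messages` before each request, e.g. `messages = trim_history(messages)`. Summarizing older turns into a single message is another common option when earlier context still matters.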
