Multi-Turn Conversations

Multi-turn conversations let a model use the context of earlier messages in the same conversation, so follow-up questions can build on previous answers. This guide shows how to hold a multi-turn conversation with Arcee AI models through the Arcee Platform.

The Arcee AI /chat/completions API is stateless: the server does not store anything from earlier requests. To continue a conversation, the client must send the entire conversation history, including the model's previous replies, in the messages list of every request.
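
Because the server keeps no state, every request body has to carry the whole conversation. The sketch below builds the JSON bodies for two consecutive rounds using only the standard library. It assumes the OpenAI-compatible model/messages request shape that the SDK example below also relies on; build_request is a hypothetical helper, and the assistant text is a placeholder, not real model output.

```python
import json

MODEL = "arcee-ai/trinity-mini-thinking"


def build_request(history, user_text):
    """Return a /chat/completions request body (hypothetical helper).

    The API is stateless, so the body contains every prior message
    followed by the new user message.
    """
    messages = list(history) + [{"role": "user", "content": user_text}]
    return {"model": MODEL, "messages": messages}


# Round 1: there is no history yet.
round1 = build_request([], "What is a small language model?")

# Placeholder for the model's answer; a real reply comes from the API response.
history = round1["messages"] + [
    {"role": "assistant", "content": "<placeholder for the round-1 reply>"}
]

# Round 2: the body repeats everything from round 1 plus the new question.
round2 = build_request(history, "How do they differ from LLMs?")

print(json.dumps(round2, indent=2))
print(len(round1["messages"]), len(round2["messages"]))  # 1 3
```

The request grows by two messages per round (the stored reply plus the new question), which is the cost of a stateless design: the server stays simple, and the client decides exactly what context the model sees.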

The following Python code, which uses the OpenAI Python SDK, appends each turn to the message history to hold a two-round conversation.

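import os

from openai import OpenAI

# Read the API key from an environment variable instead of hard-coding it.
# ARCEE_API_KEY is an illustrative name; use whatever variable your setup defines.
client = OpenAI(
    api_key=os.environ["ARCEE_API_KEY"],
    base_url="https://api.arcee.ai/api/v1",
)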

# Round 1
messages = [{"role": "user", "content": "What is a small language model?"}]
response = client.chat.completions.create(
    model="arcee-ai/trinity-mini-thinking",
    messages=messages
)

# Store the model's reply in the history so the next request includes it
answer = {"role": "assistant", "content": response.choices[0].message.content}
messages.append(answer)
print(f"Messages Round 1: {messages}")

# Round 2
# Add the follow-up question after the stored reply, then resend the full history
messages.append({"role": "user", "content": "How do they differ from LLMs?"})
response = client.chat.completions.create(
    model="arcee-ai/trinity-mini-thinking",
    messages=messages
)

answer = {"role": "assistant", "content": response.choices[0].message.content}
messages.append(answer)
print(f"Messages Round 2: {messages}")
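
Rounds 1 and 2 repeat the same append-call-append pattern, so it can be wrapped in a small helper. The sketch below is one way to do that; the Conversation class and its send parameter are hypothetical names, not part of the Arcee or OpenAI SDKs. send is any function that takes the message list and returns the reply text, which keeps the class independent of the network client and easy to test offline.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the running message history for a stateless chat API (hypothetical helper)."""

    def __init__(self, send: Callable[[List[Message]], str]) -> None:
        # send() receives the full history and returns the assistant's reply text.
        self._send = send
        self.messages: List[Message] = []

    def ask(self, question: str) -> str:
        # Add the new user turn, send the whole history, then record the reply
        # so the next call includes it.
        self.messages.append({"role": "user", "content": question})
        reply = self._send(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# A stand-in for the API so the sketch runs without a network call:
# it only reports how many messages it received.
def fake_send(messages: List[Message]) -> str:
    return f"received {len(messages)} message(s)"


chat = Conversation(fake_send)
print(chat.ask("What is a small language model?"))  # received 1 message(s)
print(chat.ask("How do they differ from LLMs?"))    # received 3 message(s)
```

With the client from the example above, send could be lambda msgs: client.chat.completions.create(model="arcee-ai/trinity-mini-thinking", messages=msgs).choices[0].message.content. Note that if send raises, the user message has already been appended; a production version might remove it before retrying.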

In the first round, the messages passed to the API are:

[
    {"role": "user", "content": "What is a small language model?"}
]

In the second round, two messages are added to the end of the list:

1. The model's output from the first round, as an assistant message.
2. The new question, as a user message.
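
The two steps can also be written as a function that returns a new list instead of appending to messages in place, which leaves earlier rounds untouched if you want to keep them or branch a conversation from them. This is a minimal sketch; next_round is a hypothetical name, and the assistant text is a placeholder.

```python
from typing import Dict, List

Message = Dict[str, str]


def next_round(messages: List[Message], reply: str, question: str) -> List[Message]:
    """Build the message list for the next request without mutating the input."""
    return messages + [
        {"role": "assistant", "content": reply},  # step 1: the model's previous output
        {"role": "user", "content": question},    # step 2: the new question
    ]


round1 = [{"role": "user", "content": "What is a small language model?"}]
round2 = next_round(round1, "<round-1 reply>", "How do they differ from LLMs?")

print([m["role"] for m in round2])  # ['user', 'assistant', 'user']
print(len(round1))                  # 1 (round 1 is unchanged)
```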

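The messages passed to the API in the second round are shown below. The assistant entry is a sample reply from round 1; model output varies, so the text you receive will differ: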

[
    {"role": "user", "content": "What is a small language model?"},
    {"role": "assistant", "content": "A small language model refers to a model that has a relatively limited number of parameters compared to other large language models. Here's a detailed explanation:\n\nKey Characteristics:\n- Smaller-scale model architecture\n- Fewer parameters (typically tens of billions instead of hundreds of billions)\n- Reduced computational resources needed\n- Faster training and inference times\n- Potentially lower energy consumption\n\nUse Cases:\n1. Resource-constrained environments\n2. Real-time applications\n3. Embedded systems\n4. Low-power devices\n5. Situational-specific needs\n\nExamples include models like:\n- Duchowny et al.'s DLVM\n- Chan et al.'s SHOORN\n- Salimon's ALma (a mid-scale model)\n\nThese smaller models strike a balance between performance and efficiency, often making them suitable for specific applications where resource limitations are a factor."},
    {"role": "user", "content": "How do they differ from LLMs?"}
]

