# SGLang

SGLang is a fast serving framework for language models. It co-designs the backend runtime and the frontend language to make interactions with models faster and more controllable.

{% hint style="warning" %}
The examples in this document deploy Trinity-Nano-6B, but they work exactly the same way for all Arcee AI models. To deploy a different model, change the model name to the one you'd like to deploy.
{% endhint %}

### Docker Container for SGLang

**Prerequisites**

1. Sufficient VRAM (refer to [Hardware Prerequisites](https://docs.arcee.ai/~/revisions/UOfL3qIelQCFUdc2TpQu/quick-deploys/hardware-prerequisites))
2. A Hugging Face account
3. Docker and NVIDIA Container Toolkit installed on your instance
   1. If you need assistance, see [Install Docker Engine](https://docs.docker.com/engine/install/) and [Installing the NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
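
Before deploying, it can help to confirm that the tools from the list above are actually on your `PATH`. The following is a minimal sketch, not an official check: `require_command` and `check_prerequisites` are hypothetical helper names, and the sketch assumes `nvidia-smi` was installed alongside the NVIDIA driver.

```bash
#!/usr/bin/env bash

# require_command (hypothetical helper): report whether a CLI tool is on PATH.
require_command() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# check_prerequisites (hypothetical helper): check the tools this guide relies on.
check_prerequisites() {
  local status=0
  require_command docker || status=1
  require_command nvidia-smi || status=1   # assumed to ship with the NVIDIA driver
  return "$status"
}
```

Run `check_prerequisites` in the same shell. If it passes, `docker run --rm --gpus all ubuntu nvidia-smi` is a quick way to confirm that containers can see the GPU through the NVIDIA Container Toolkit.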

**Deployment**

```bash
docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=your_hf_token_here" \
  -p 8000:8000 \
  --ipc=host \
  lmsysorg/sglang:latest \
  python -m sglang.launch_server \
  --model-path arcee-ai/trinity-nano-thinking \
  --host 0.0.0.0 \
  --port 8000 \
  --max-total-tokens 8192 \
  --served-model-name trinity \
  --trust-remote-code
```

{% hint style="info" %}
Replace `your_hf_token_here` with your Hugging Face token.
{% endhint %}
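
The first launch may take a while because the model weights have to download and load, and requests sent before that finishes will fail. The following is a minimal sketch of a readiness check. It assumes the server is reachable at `http://localhost:8000` and that your SGLang version exposes a `/health` route; `wait_for_sglang` is a hypothetical helper name, not part of SGLang.

```bash
#!/usr/bin/env bash

# wait_for_sglang (hypothetical helper): poll the server until it answers.
# Arguments: base URL, maximum number of attempts, seconds between attempts.
wait_for_sglang() {
  local base_url="${1:-http://localhost:8000}"
  local max_attempts="${2:-60}"
  local delay_seconds="${3:-5}"
  local attempt

  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    # -f returns non-zero on HTTP errors; -s hides progress output.
    # The /health route is an assumption about the running SGLang version.
    if curl -fs -o /dev/null --max-time 5 "${base_url}/health"; then
      echo "Server is ready after ${attempt} attempt(s)."
      return 0
    fi
    sleep "${delay_seconds}"
  done

  echo "Server did not become ready after ${max_attempts} attempt(s)." >&2
  return 1
}
```

For example, `wait_for_sglang http://localhost:8000 60 5` waits up to roughly five minutes before giving up.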

### Run Inference Using the Chat Completions Endpoint

```bash
curl http://Your.IP.Address:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "trinity",
        "messages": [
          { "role": "user", "content": "What are the benefits of model merging?" }
        ],
        "temperature": 0.7,
        "top_k": 50,
        "repetition_penalty": 1.1
      }'
```

{% hint style="info" %}
Replace `Your.IP.Address` with the IP address of the instance hosting the model.
{% endhint %}
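
The response follows the OpenAI Chat Completions format, so the generated text sits at `choices[0].message.content`. The following is a minimal sketch for printing only that field. It assumes `python3` is available on the machine sending the request; `extract_reply` is a hypothetical helper name.

```bash
#!/usr/bin/env bash

# extract_reply (hypothetical helper): read a Chat Completions JSON response
# on stdin and print only the assistant's message text.
extract_reply() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}
```

To use it, add `-s` to the `curl` command above and pipe its output into `extract_reply`.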
