SGLang
Docker Container for SGLang
docker run --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HUGGING_FACE_HUB_TOKEN=your_hf_token_here" \
-p 8000:8000 \
--ipc=host \
lmsysorg/sglang:latest \
python -m sglang.launch_server \
--model-path arcee-ai/trinity-nano-thinking \
--host 0.0.0.0 \
--port 8000 \
--max-total-tokens 8192 \
--served-model-name afm \
--trust-remote-code

Run inference using the Chat Completions endpoint.
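The server started above exposes an OpenAI-compatible API on port 8000, so any OpenAI-style client can query it. Below is a minimal sketch using the openai Python package; the base URL and the model name afm mirror the flags passed to launch_server above, and the prompt, API key placeholder, and sampling parameters are illustrative assumptions.

from openai import OpenAI

# Point the client at the local SGLang server started above.
# The API key is a placeholder; supply a real value only if the server
# was launched with authentication enabled.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="afm",  # matches --served-model-name above
    messages=[
        {"role": "user", "content": "Give me a one-sentence summary of SGLang."},
    ],
    max_tokens=256,
    temperature=0.7,
)

print(response.choices[0].message.content)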