vLLM
Docker Container for vLLM
docker run --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HUGGING_FACE_HUB_TOKEN=your_hf_token_here" \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:latest \
--model arcee-ai/Trinity-Mini \
--dtype bfloat16 \
--enable-auto-tool-choice \
--reasoning-parser deepseek_r1 \
--port 8000 \
--tool-call-parser hermes

Manual Install using vLLM
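If you prefer to run vLLM directly instead of through Docker, you can install it with pip and serve the model with the same flags. The exact command from this page is not preserved here, so the following is a minimal sketch that mirrors the Docker arguments above:

```bash
# Install vLLM (assumes a CUDA-capable Python environment)
pip install vllm

# Serve the model with the same flags used in the Docker command above
vllm serve arcee-ai/Trinity-Mini \
  --dtype bfloat16 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --reasoning-parser deepseek_r1 \
  --port 8000
```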
Run inference using the OpenAI-compatible Chat Completions endpoint.
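The vLLM server exposes an OpenAI-compatible API, so any OpenAI-style client can talk to it. As a sketch, assuming the server is running locally on port 8000 as configured above (the prompt text is illustrative):

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "arcee-ai/Trinity-Mini",
        "messages": [
          {"role": "user", "content": "Write a haiku about GPUs."}
        ]
      }'
```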