vLLM
Docker Container for vLLM
```shell
docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=your_hf_token_here" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model arcee-ai/trinity-nano-thinking \
  --host 0.0.0.0 \
  --port 8000 \
  --max-model-len 8192 \
  --served-model-name afm \
  --model-impl transformers \
  --trust-remote-code
```

Manual Install using vLLM
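Alternatively, vLLM can be installed with pip and the model served without Docker. A minimal sketch, assuming Python 3.10+ and a CUDA-capable GPU; the settings mirror the Docker command above:

```shell
# Install vLLM (ideally inside a fresh virtual environment)
pip install vllm

# Make your Hugging Face token available for gated model downloads
export HUGGING_FACE_HUB_TOKEN=your_hf_token_here

# Serve the model with the vllm CLI using the same options as the container
vllm serve arcee-ai/trinity-nano-thinking \
  --host 0.0.0.0 \
  --port 8000 \
  --max-model-len 8192 \
  --served-model-name afm \
  --model-impl transformers \
  --trust-remote-code
```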
Run inference against the server's OpenAI-compatible Chat Completions endpoint.
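A minimal request sketch, assuming the server is running locally on port 8000 as configured above (the model name `afm` matches `--served-model-name`):

```shell
# POST a chat request to the OpenAI-compatible endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "afm",
        "messages": [
          {"role": "user", "content": "Explain what vLLM is in one sentence."}
        ],
        "max_tokens": 256
      }'
```

The generated text is returned in `choices[0].message.content` of the JSON response.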