Deployment

Start Deployment

Overview

To start a deployment in Arcee, select one of your pretrained, merged, or aligned models. Once the deployment is running, verify that the model performs as expected.

Free Endpoints Last 2 Hours

Due to GPU shortages, free deployment endpoints on Arcee will remain active for only 2 hours for testing before shutting down automatically.
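As a small illustration of the 2-hour window, the sketch below computes when a free endpoint would be expected to shut down and how much testing time remains. The function names are hypothetical helpers, not part of Arcee, and the calculation assumes the 2 hours are counted from the moment the deployment starts.

```python
from datetime import datetime, timedelta, timezone

# Lifetime of a free endpoint, taken from the note above.
FREE_ENDPOINT_LIFETIME = timedelta(hours=2)


def free_endpoint_deadline(started_at: datetime) -> datetime:
    """Return the time at which a free endpoint is expected to shut down.

    Assumption: the 2-hour window is measured from `started_at`.
    """
    return started_at + FREE_ENDPOINT_LIFETIME


def time_remaining(started_at: datetime, now: datetime) -> timedelta:
    """Return the testing time left, never less than zero."""
    remaining = free_endpoint_deadline(started_at) - now
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    started = datetime.now(timezone.utc)
    print("Expected shutdown:", free_endpoint_deadline(started).isoformat())
```

Recording the start time when you create the deployment makes it easier to plan a test session that fits inside the window.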

Expect Deployment in 5 minutes

Deploying a 7-8B model in Arcee typically takes about 5 minutes. Stay on the page so you can test your model as soon as the deployment is ready.

Mind the Token Limit

Arcee generation windows are capped at 2048 tokens. If you need a longer context, please download your model.
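To make the cap concrete, here is a minimal sketch that checks whether a request fits in the window before you send it. It assumes the 2048-token cap covers the prompt and the generated tokens together, which the note above does not state explicitly, and it takes token counts as plain integers so it does not depend on any particular tokenizer. Both function names are hypothetical.

```python
# Token cap stated in the note above.
GENERATION_WINDOW = 2048


def fits_generation_window(prompt_tokens: int, max_new_tokens: int,
                           window: int = GENERATION_WINDOW) -> bool:
    """Return True if prompt plus requested output fits in the window.

    Assumption: the cap applies to prompt and generated tokens combined.
    """
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= window


def max_new_tokens_allowed(prompt_tokens: int,
                           window: int = GENERATION_WINDOW) -> int:
    """Return how many tokens can still be generated after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(window - prompt_tokens, 0)
```

If `fits_generation_window` returns False for your typical prompts, that is a sign the hosted endpoint is too small for the workload and downloading the model is the better route.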

Starting a Deployment on the Arcee Platform

Step-by-step
  1. Select the Deployments tab and click on the Create Deployment button.

  2. Enter the deployment name in the provided text field.

  3. Select the model to deploy from the dropdown menu.

  4. Configure the deployment settings as needed.

  5. Click on the Create Deployment button.

  6. Verify the deployment status.
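If you script the final step, verifying the status usually means polling until the deployment reports that it is ready. The sketch below shows a generic polling loop with a timeout. The `get_status` callable and the `"ready"` status value are assumptions for illustration; substitute whatever status check and status strings your Arcee client actually provides.

```python
import time
from typing import Callable


def wait_until_ready(get_status: Callable[[], str],
                     ready_value: str = "ready",
                     timeout_s: float = 600.0,
                     poll_interval_s: float = 15.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `get_status` until it returns `ready_value` or the timeout passes.

    `get_status` is a hypothetical callable supplied by the caller; it is
    not an Arcee API. Returns True if the deployment became ready in time.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == ready_value:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval_s)
```

The default timeout of 10 minutes leaves some margin over the roughly 5 minutes a 7-8B deployment typically takes.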

Setting Up Deployment

Deployment Requirements

To start a deployment in Arcee, you provide the arguments below, which define the characteristics and configuration of your deployment.

  • Deployment Name: The unique name for your deployment. It distinguishes this deployment from your others.

  • Alignment: Specifies the model alignment to be used in the deployment.

  • Merging: Specifies the merged model to be used in the deployment.

  • Pretraining: Points to the pretrained model version you want to deploy.

  • Target Instance: Specifies the instance type where the deployment will be hosted.

Deploying with Arcee Python

```plaintext
!pip install -q arcee-py
%env ARCEE_API_KEY=[MY-ARCEE-API-KEY]
```

```python
import arcee

# Start a deployment named "mydeploy" from the alignment "Qwen2-7B-Instruct".
arcee.start_deployment("mydeploy", alignment="Qwen2-7B-Instruct")
```

Parameters:

- `deployment_name` (str): The name of the deployment (required).
- `alignment` (Optional[str]): The alignment configuration name.
- `merging` (Optional[str]): The merging configuration name.
- `pretraining` (Optional[str]): The pretraining configuration name.
- `retriever` (Optional[str]): The retriever configuration name.
- `target_instance` (Optional[str]): The target instance for deployment. Defaults to `ml.g5.2xlarge` (a single A10G GPU).
- `openai_compatability` (Optional[bool]): Flag indicating OpenAI compatibility, defaults to False. Not advised unless you need the messages API.
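The parameters above can be assembled and checked before the call is made. The following sketch builds a keyword-argument dictionary for `start_deployment`. The helper `build_deployment_kwargs` is hypothetical, and the rule that exactly one of `alignment`, `merging`, or `pretraining` is given is an assumption made for illustration; the parameter list itself only marks `deployment_name` as required.

```python
from typing import Optional


def build_deployment_kwargs(deployment_name: str,
                            alignment: Optional[str] = None,
                            merging: Optional[str] = None,
                            pretraining: Optional[str] = None,
                            target_instance: Optional[str] = None) -> dict:
    """Collect arguments for start_deployment, omitting unset options."""
    if not deployment_name:
        raise ValueError("deployment_name is required")

    sources = {"alignment": alignment, "merging": merging,
               "pretraining": pretraining}
    chosen = {key: value for key, value in sources.items() if value is not None}
    # Assumption: a deployment serves a single model source.
    if len(chosen) != 1:
        raise ValueError("pass exactly one of alignment, merging, or pretraining")

    kwargs = {"deployment_name": deployment_name, **chosen}
    if target_instance is not None:
        # When omitted, the documented default instance type applies.
        kwargs["target_instance"] = target_instance
    return kwargs
```

With `arcee-py` installed and the API key set, the result could then be passed along as `arcee.start_deployment(**build_deployment_kwargs("mydeploy", alignment="Qwen2-7B-Instruct"))`.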

Free Tier Limitation

Deployment endpoints on the free tier are automatically closed within two hours after deployment.

Production Recommendation

For production deployments, we recommend downloading your model weights.

Choosing the Model to Deploy

To select a model for deployment in Arcee, open the Create Deployment screen. It appears as a modal window where you configure the deployment.

First, enter a name for your deployment in the Deployment Name text field. Next, use the Model to Deploy dropdown menu to choose the model you wish to deploy. A list of available pretrained models will appear when you click on this dropdown menu. You can select from Qwen2-7B-Instruct, Qwen2-7B, Mistral-7B-Instruct-V0.2, Mistral-7B-Instruct-V0.1, and Mistral-7B-V0.1.

Choosing the correct model is essential, as it should align with your deployment goals and requirements. Please carefully review the available models to select the best fit for your needs.

Available Model Types for Deployment

| Model Type | Example Models |
| --- | --- |
| Pretrained | Your pretrained models, e.g., llama-base |
| Merged | Your merges |
| Aligned | Your alignments, e.g., llama-chat |

Frequently Asked Questions

  • You can deploy models that have undergone alignment, pretraining, or merging within the Arcee platform.

  • To select a model for deployment, choose from the list of available pretrained models in the Model to Deploy dropdown menu in the Create Deployment modal.

  • Verifying deployment is essential to ensure your model's performance meets your expectations.

  • On the free tier, deployment endpoints are automatically closed within two hours after deployment.