
Model Deployment

About Deployment

Model deployment lets you deploy your own public or private models on GPU instances and configure an automatic scaling range. Compared with calling public models directly through the API, a deployment is more stable and efficient: it automatically adjusts computing resources to the request volume and provides monitoring, so you can easily track the service's running status.

Create Deployment

Go to the My Deployments page and click Create Deployment to open the deployment creation page.

Fill in Basic Information

  • Enter a deployment name. Letters, numbers, hyphens (-), and periods (.) are allowed.
  • Select the model and version to deploy. Both private and public models can be selected, and you can change the version at any time afterwards.
  • Note: Official public models cannot be deployed.
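The naming rule can be checked locally before submitting the form. The following is a minimal sketch that assumes the allowed set is exactly ASCII letters, digits, hyphens, and periods. This page does not mention a length limit or any restriction on the first or last character, so the hypothetical `is_valid_deployment_name` helper enforces neither.

```python
import re

# Assumption: allowed characters are exactly ASCII letters, digits, hyphens (-),
# and periods (.). No length limit or first/last-character rule is enforced,
# because none is documented.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9.-]+")


def is_valid_deployment_name(name: str) -> bool:
    """Hypothetical helper: True if `name` is non-empty and uses only allowed characters."""
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["llama-3.1-prod", "my_model", "demo v2", ""]:
        print(f"{candidate!r}: {is_valid_deployment_name(candidate)}")
```

Underscores, spaces, and empty names are rejected. The service itself remains the authority on what it accepts.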

Configure Auto Scaling

Choose the instance type for the deployment, the number of GPUs per instance, and the auto scaling range (the minimum and maximum number of instances).

  • When set to 0-0, the deployment does not run and incurs no charges.
  • When set to 0-n (n > 0), the deployment automatically scales between 0 and n instances based on request volume and is billed by the number of running instances and how long they run. This setting can cause cold starts: if no requests arrive for 10 minutes, the deployment scales down to 0 instances, and the next request must wait while an instance starts.
  • When set to n-m (n > 0, m ≥ n), at least n instances are always running, which keeps the service warm, so requests do not experience startup latency.
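The three range settings can be modeled as a clamp on a desired instance count. The sketch below is a toy model, not the platform's actual scaler. Only the range semantics and the 10-minute idle rule come from this page. Deriving demand from in-flight requests, the per-instance capacity `requests_per_instance`, and keeping a single instance until the idle window expires are all assumptions made for illustration, as are the names `ScalingRange` and `target_instances`.

```python
import math
from dataclasses import dataclass

# From this page: with a 0-n range, 10 minutes without requests scales to 0.
IDLE_SCALE_TO_ZERO_SECONDS = 10 * 60


@dataclass(frozen=True)
class ScalingRange:
    """Hypothetical model of the configured auto scaling range (min-max)."""
    min_instances: int
    max_instances: int

    def __post_init__(self) -> None:
        if not 0 <= self.min_instances <= self.max_instances:
            raise ValueError("expected 0 <= min_instances <= max_instances")


def target_instances(
    scaling: ScalingRange,
    current_instances: int,
    in_flight_requests: int,
    seconds_since_last_request: float,
    requests_per_instance: int = 4,  # assumption: hypothetical per-instance capacity
) -> int:
    """Toy estimate of how many instances a deployment would aim to run."""
    if scaling.max_instances == 0:
        return 0  # 0-0: the deployment does not run
    wanted = math.ceil(in_flight_requests / requests_per_instance)
    if (
        wanted == 0
        and current_instances > 0
        and seconds_since_last_request < IDLE_SCALE_TO_ZERO_SECONDS
    ):
        # Simplification: stay at one instance until the idle window expires.
        wanted = 1
    # Clamp to the configured range; a floor of n > 0 keeps the service warm.
    return max(scaling.min_instances, min(scaling.max_instances, wanted))


if __name__ == "__main__":
    on_demand = ScalingRange(0, 3)  # 0-n style range
    warm = ScalingRange(2, 5)       # n-m style range
    print(target_instances(on_demand, current_instances=0, in_flight_requests=1,
                           seconds_since_last_request=0))     # 1 (needs a cold start)
    print(target_instances(on_demand, current_instances=1, in_flight_requests=0,
                           seconds_since_last_request=600))   # 0 (idle 10 minutes)
    print(target_instances(warm, current_instances=2, in_flight_requests=0,
                           seconds_since_last_request=3600))  # 2 (never below n)
```

The clamp makes the trade-off visible: a minimum of 0 allows scale-to-zero (and cold starts), while a minimum above 0 keeps that many instances running regardless of traffic.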

Manage Deployment

Go to My Deployments to see your deployments. The list shows each deployment's name and its number of running instances.

View Details

Click a deployment to open its details page, which contains an area for trying out the deployed model, the API documentation, and the operation logs.

Modify Deployment

In the Settings panel, you can modify the deployment's configuration, including its name, model version, and hardware settings.

Disable Deployment

Click Disable this deployment and confirm. The deployment stops running and billing stops. Use this to pause the service temporarily, for example while you adjust its configuration. Disabling takes about 1 minute to take effect.

Delete Deployment

Click Delete Deployment to permanently delete the deployment. This action cannot be undone, so proceed with caution.