Model Upload
Use the Gear model building tool to upload your custom model.
1. Prerequisites
- This tool supports macOS and Linux only
- The machine running Gear must have a working Docker environment
2. Install Gear
sudo curl -o /usr/local/bin/gear -L http://oss-high-qy01.cdsgss.com/gear/gear_`uname -s`_`uname -m` && sudo chmod +x /usr/local/bin/gear
3. Start the Model
3.1. Initialization
$ mkdir qwen0_5B && cd qwen0_5B
$ gear init
Setting up the current directory for use with gear...
✅ Created /data/qwen/predict.py
✅ Created /data/qwen/.dockerignore
✅ Created /data/qwen/.github/workflows/push.yaml
✅ Created /data/qwen/gear.yaml
Done!
3.2. Prepare Model Files
Pulling model files from a remote source is currently not supported, so you must first download the model to your local machine.
$ git clone https://hf-mirror.com/Qwen/Qwen2.5-0.5B
Cloning into 'Qwen2.5-0.5B'...
remote: Enumerating objects: 42, done.
remote: Counting objects: 100% (39/39), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 42 (delta 18), reused 0 (delta 0), pack-reused 3 (from 1)
Unpacking objects: 100% (42/42), 3.61 MiB | 3.99 MiB/s, done.
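Before wiring the weights into predict.py, you can optionally verify that they load. A minimal sketch, assuming the transformers and torch packages from gear.yaml are also installed on the local machine:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the cloned checkpoint from the local directory created above
tokenizer = AutoTokenizer.from_pretrained("./Qwen2.5-0.5B")
model = AutoModelForCausalLM.from_pretrained("./Qwen2.5-0.5B", torch_dtype="auto")
print(model.config.model_type)  # should print "qwen2" for this checkpoint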
3.3. Prepare gear.yaml
Configuration file:
# Configuration for gear ⚙️
build:
  # Whether GPU is needed
  gpu: true

  # Ubuntu system packages to install
  # system_packages:
  #   - "libgl1-mesa-glx"
  #   - "libglib2.0-0"

  # Python version, e.g. "3.11"
  python_version: "3.11"

  # Python packages in <package-name>==<version> format
  python_packages:
    - "transformers"
    - "torch"
    - "accelerate>=0.26.0"

  # Commands to run after environment setup
  # run:
  #   - "echo env is ready!"
  #   - "echo another command if needed"

# predict.py defines how to run prediction on the model
predict: "predict.py:Predictor"

# Control request concurrency; if specified, `def predict` in predict.py must be changed to `async def predict`
# concurrency:
#   max: 20
3.4. Prepare predict.py
Template file:
from gear import BasePredictor, Input, ConcatenateIterator


class Predictor(BasePredictor):
    def setup(self) -> None:
        """Load the model into memory to efficiently run multiple predictions"""
        # self.model = AutoModelForCausalLM.from_pretrained(
        #     "./Qwen2.5-0.5B",
        #     torch_dtype="auto",
        #     device_map="auto"
        # )
        # self.tokenizer = AutoTokenizer.from_pretrained("./Qwen2.5-0.5B")

    async def predict(
        self,
        prompt: str = Input(description="prompt input", default="Hello")
    ) -> ConcatenateIterator[str]:
        """Run a prediction on the model"""
        # Placeholder output; replace with real inference against self.model
        test_model = ["This", "is", "a", "test", "example"]
        for token in test_model:
            yield token
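The template above only streams placeholder strings. Below is a sketch of what the class might look like once the commented-out loading code is filled in, using the standard transformers streaming API; everything outside the gear imports is illustrative, not the required implementation:

from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

from gear import BasePredictor, Input, ConcatenateIterator


class Predictor(BasePredictor):
    def setup(self) -> None:
        """Load the model into memory to efficiently run multiple predictions"""
        self.model = AutoModelForCausalLM.from_pretrained(
            "./Qwen2.5-0.5B",
            torch_dtype="auto",
            device_map="auto",
        )
        self.tokenizer = AutoTokenizer.from_pretrained("./Qwen2.5-0.5B")

    async def predict(
        self,
        prompt: str = Input(description="prompt input", default="Hello"),
    ) -> ConcatenateIterator[str]:
        """Stream generated text back chunk by chunk"""
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        streamer = TextIteratorStreamer(
            self.tokenizer, skip_prompt=True, skip_special_tokens=True
        )
        # generate() blocks, so run it in a background thread and drain the streamer
        thread = Thread(
            target=self.model.generate,
            kwargs={**inputs, "max_new_tokens": 128, "streamer": streamer},
        )
        thread.start()
        for text in streamer:
            yield text
        thread.join()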
3.5. Test the Model Locally
Run the test command:
$ gear serve
Building Docker image from environment in gear.yaml...
[+] Building 5.6s (13/13) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 844B 0.0s
=> resolve image config for docker-image://registry-bj.capitalonline.net/maas/dockerfile:1.4 5.2s
=> CACHED docker-image://registry-bj.capitalonline.net/maas/dockerfile:1.4@sha256:1f6e06f58b23c0700df6c05284ac25b29088 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 366B 0.0s
=> [internal] load metadata for registry-bj.capitalonline.net/maas/kit-base:cuda11.8-python3.11 0.1s
=> [stage-0 1/6] FROM registry-bj.capitalonline.net/maas/kit-base:cuda11.8-python3.11@sha256:0e9a503b356d548998e326f24 0.0s
=> => resolve registry-bj.capitalonline.net/maas/kit-base:cuda11.8-python3.11@sha256:0e9a503b356d548998e326f24d6ca50d4 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 137.36kB 0.0s
=> CACHED [stage-0 2/6] COPY .gear/tmp/build20250321152525.358211857500444/gear-1.0.0-py3-none-any.whl /tmp/gear-1.0.0 0.0s
=> CACHED [stage-0 3/6] RUN --mount=type=cache,target=/root/.cache/pip pip install --no-cache-dir /tmp/gear-1.0.0-py3- 0.0s
=> CACHED [stage-0 4/6] COPY .gear/tmp/build20250321152525.358211857500444/requirements.txt /tmp/requirements.txt 0.0s
=> CACHED [stage-0 5/6] RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt 0.0s
=> CACHED [stage-0 6/6] WORKDIR /src 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => preparing layers for inline cache 0.0s
=> => writing image sha256:9342ca519d1f66e0f642cb7630581700a56f8a86f11c83ca89e9bf440a461b33 0.0s
=> => naming to docker.io/library/gear-qwen05b-base 0.0s
Running 'python --check-hash-based-pycs never -m gear.server.http --await-explicit-shutdown true' in Docker with the current directory mounted as a volume...
Serving at http://127.0.0.1:8393
....
After the service starts, verify:
$ curl -X POST http://127.0.0.1:8393/predictions -H 'Content-Type: application/json' -H 'Stream: true' -d '{"input": {"prompt": "Hello"}}'
{"input": {"prompt": "1+1=?"}, "output": ["This is"], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": null, "logs": "", "error": null, "status": "processing", "metrics": null}
{"input": {"prompt": "1+1=?"}, "output": ["一", "个"], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": null, "logs": "", "error": null, "status": "processing", "metrics": null}
{"input": {"prompt": "1+1=?"}, "output": ["Test", "Example"], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": null, "logs": "", "error": null, "status": "processing", "metrics": null}
{"input": {"prompt": "1+1=?"}, "output": [], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": "2025-03-21T08:05:57.154953+00:00", "logs": "", "error": null, "status": "succeeded", "metrics": {"predict_time": 0.198196}}
4. Publish Model on GpuGeek
4.1. Create Model
Click Create Model
4.2. Get Image Repository Token
Click Get Token
4.3. Log In to the Image Repository
Use the obtained token to log in to the image repository:
$ gear login
4.4. Build and Upload Image
$ gear build -t maas-harbor-cn.yun-paas.com/maas-<user-id>/<maas-name>:<tag>
$ docker push maas-harbor-cn.yun-paas.com/maas-<user-id>/<maas-name>:<tag>
You can also use the gear push command, but you need to define image in gear.yaml:
# Configuration for gear ⚙️
build:
  ...
# predict.py defines how to run prediction on the model
predict: "predict.py:Predictor"
image: maas-harbor-cn.yun-paas.com/maas-<user-id>/<maas-name>:<tag>
5. Predictor
Predictor.setup()
Prepare the model so multiple predictions can run efficiently. Use this optional method for any expensive one-time operations, such as loading the trained model or instantiating data transformations.
Predictor.predict(**kwargs)
Run a single prediction. This required method is where you call the model you loaded in setup(). You may also want to add preprocessing and postprocessing code here.
The predict() method takes arbitrarily named keyword parameters; each parameter name must correspond to an Input() annotation.
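For illustration, here is a predict() signature with two named inputs, each declared with Input(). The max_new_tokens parameter is hypothetical, and only the description and default options appear earlier in this guide:

from gear import BasePredictor, Input


class Predictor(BasePredictor):
    def predict(
        self,
        # Parameter names are arbitrary; what matters is the Input() annotation
        prompt: str = Input(description="prompt input", default="Hello"),
        max_new_tokens: int = Input(description="hypothetical token limit", default=128),
    ) -> str:
        """Echo the inputs back; real code would run the model here"""
        return f"{prompt} (up to {max_new_tokens} tokens)"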