Model Upload
Use the Gear model building tool to upload your custom model.
1. Instructions
- This tool supports only macOS and Linux
- The machine running the tool must have a working Docker environment
2. Install Gear
sudo curl -o /usr/local/bin/gear -L http://oss-high-qy01.cdsgss.com/gear/gear_`uname -s`_`uname -m` && sudo chmod +x /usr/local/bin/gear
3. Start the Model
3.1. Initialization
$ mkdir qwen0_5B && cd qwen0_5B
$ gear init
Setting up the current directory for use with gear...
✅ Created /data/qwen/predict.py
✅ Created /data/qwen/.dockerignore
✅ Created /data/qwen/.github/workflows/push.yaml
✅ Created /data/qwen/gear.yaml
Done!
3.2. Prepare Model Files
Pulling model files from a remote source is not currently supported; download the model to the local machine first.
$ git clone https://hf-mirror.com/Qwen/Qwen2.5-0.5B
Cloning into 'Qwen2.5-0.5B'...
remote: Enumerating objects: 42, done.
remote: Counting objects: 100% (39/39), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 42 (delta 18), reused 0 (delta 0), pack-reused 3 (from 1)
Unpacking objects: 100% (42/42), 3.61 MiB | 3.99 MiB/s, done.
3.3. Prepare gear.yaml Configuration File
# Configuration for gear ⚙️
build:
  # Whether GPU is needed
  gpu: true
  # Ubuntu system packages to install
  # system_packages:
  #   - "libgl1-mesa-glx"
  #   - "libglib2.0-0"
  # Python version '3.11'
  python_version: "3.11"
  # Python packages <package-name>==<version>
  python_packages:
    - "transformers"
    - "torch"
    - "accelerate>=0.26.0"
  # Commands to run after environment setup
  # run:
  #   - "echo env is ready!"
  #   - "echo another command if needed"
# predict.py defines how to run prediction on the model
predict: "predict.py:Predictor"
# Control request concurrency; if specified, `def predict` in predict.py must be changed to `async def predict`
# concurrency:
#   max: 20
3.4. Prepare predict.py File
from gear import BasePredictor, Input, ConcatenateIterator


class Predictor(BasePredictor):
    def setup(self) -> None:
        """Load the model into memory to efficiently run multiple predictions"""
        # self.model = AutoModelForCausalLM.from_pretrained(
        #     "./Qwen2.5-0.5B",
        #     torch_dtype="auto",
        #     device_map="auto"
        # )
        # self.tokenizer = AutoTokenizer.from_pretrained("./Qwen2.5-0.5B")

    async def predict(
        self,
        prompt: str = Input(description="prompt input", default="Hello")
    ) -> ConcatenateIterator[str]:
        """Run a prediction on the model"""
        test_tokens = ["This", "is", "a", "test", "example"]
        for token in test_tokens:
            yield token
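Once the model files from step 3.2 are in place, the commented-out hints above can be filled in. The sketch below shows one possible way to do that with the transformers streaming API (TextIteratorStreamer); the generation settings such as max_new_tokens=256 are illustrative and should be tuned for your model:

from threading import Thread

from gear import BasePredictor, Input, ConcatenateIterator
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer


class Predictor(BasePredictor):
    def setup(self) -> None:
        """Load the local Qwen2.5-0.5B weights once at startup"""
        self.model = AutoModelForCausalLM.from_pretrained(
            "./Qwen2.5-0.5B", torch_dtype="auto", device_map="auto"
        )
        self.tokenizer = AutoTokenizer.from_pretrained("./Qwen2.5-0.5B")

    async def predict(
        self,
        prompt: str = Input(description="prompt input", default="Hello")
    ) -> ConcatenateIterator[str]:
        """Stream generated text as it is produced"""
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        streamer = TextIteratorStreamer(
            self.tokenizer, skip_prompt=True, skip_special_tokens=True
        )
        # run generation in a background thread so tokens can be yielded as they arrive
        thread = Thread(
            target=self.model.generate,
            kwargs=dict(**inputs, streamer=streamer, max_new_tokens=256),
        )
        thread.start()
        for text in streamer:
            yield text
        thread.join()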
3.5. Test the Model Locally
Run the test command:
$ gear serve
Building Docker image from environment in gear.yaml...
[+] Building 5.6s (13/13) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 844B 0.0s
=> resolve image config for docker-image://registry-bj.capitalonline.net/maas/dockerfile:1.4 5.2s
=> CACHED docker-image://registry-bj.capitalonline.net/maas/dockerfile:1.4@sha256:1f6e06f58b23c0700df6c05284ac25b29088 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 366B 0.0s
=> [internal] load metadata for registry-bj.capitalonline.net/maas/kit-base:cuda11.8-python3.11 0.1s
=> [stage-0 1/6] FROM registry-bj.capitalonline.net/maas/kit-base:cuda11.8-python3.11@sha256:0e9a503b356d548998e326f24 0.0s
=> => resolve registry-bj.capitalonline.net/maas/kit-base:cuda11.8-python3.11@sha256:0e9a503b356d548998e326f24d6ca50d4 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 137.36kB 0.0s
=> CACHED [stage-0 2/6] COPY .gear/tmp/build20250321152525.358211857500444/gear-1.0.0-py3-none-any.whl /tmp/gear-1.0.0 0.0s
=> CACHED [stage-0 3/6] RUN --mount=type=cache,target=/root/.cache/pip pip install --no-cache-dir /tmp/gear-1.0.0-py3- 0.0s
=> CACHED [stage-0 4/6] COPY .gear/tmp/build20250321152525.358211857500444/requirements.txt /tmp/requirements.txt 0.0s
=> CACHED [stage-0 5/6] RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt 0.0s
=> CACHED [stage-0 6/6] WORKDIR /src 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => preparing layers for inline cache 0.0s
=> => writing image sha256:9342ca519d1f66e0f642cb7630581700a56f8a86f11c83ca89e9bf440a461b33 0.0s
=> => naming to docker.io/library/gear-qwen05b-base 0.0s
Running 'python --check-hash-based-pycs never -m gear.server.http --await-explicit-shutdown true' in Docker with the current directory mounted as a volume...
Serving at http://127.0.0.1:8393
....
After the service starts, verify:
$ curl -X POST http://127.0.0.1:8393/predictions -H 'Content-Type: application/json' -H 'Stream: true' -d '{"input": {"prompt": "Hello"}}'
{"input": {"prompt": "1+1=?"}, "output": ["This is"], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": null, "logs": "", "error": null, "status": "processing", "metrics": null}
{"input": {"prompt": "1+1=?"}, "output": ["一", "个"], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": null, "logs": "", "error": null, "status": "processing", "metrics": null}
{"input": {"prompt": "1+1=?"}, "output": ["Test", "Example"], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": null, "logs": "", "error": null, "status": "processing", "metrics": null}
{"input": {"prompt": "1+1=?"}, "output": [], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": "2025-03-21T08:05:57.154953+00:00", "logs": "", "error": null, "status": "succeeded", "metrics": {"predict_time": 0.198196}}
4. Publish Model on GpuGeek
4.1. Create Model
Click Create Model
4.2. Get Image Repository Token
Click Get Token
4.3. Login to Image Repository
Use the obtained token to log in to the image repository:
$ gear login
4.4. Build and Upload Image
$ gear build -t maas-harbor-cn.yun-paas.com/maas-<user-id>/<maas-name>:<tag>
$ docker push maas-harbor-cn.yun-paas.com/maas-<user-id>/<maas-name>:<tag>
You can also use the `gear push` command, but you must first define `image` in gear.yaml:
# Configuration for gear ⚙️
build:
  ...
# predict.py defines how to run prediction on the model
predict: "predict.py:Predictor"
image: maas-harbor-cn.yun-paas.com/maas-<user-id>/<maas-name>:<tag>
5. Predictor
Predictor.setup()
Prepare the model so multiple predictions can run efficiently. Use this optional method for any expensive one-time operations, such as loading the trained model or instantiating data transformations.
Predictor.predict(**kwargs)
Run a single prediction. This required method is where you call the model loaded in setup(). You may also want to add preprocessing and postprocessing code here.
The predict() method takes arbitrarily named keyword parameters; each parameter name must correspond to an Input() annotation.
Input()
Define each parameter in predict() using a gear.Input() object:
The Input() function takes the following keyword arguments:
- description: A description of this input passed to the model user.
- default: Sets the default value of the input. If not passed, the input is required. If explicitly set to None, the input is optional.
- ge: For int or float types, the value must be greater than or equal to this number.
- le: For int or float types, the value must be less than or equal to this number.
- min_length: For str types, the minimum string length.
- max_length: For str types, the maximum string length.
- regex: For str types, the string must match this regex.
- choices: For str or int types, a list of possible values for this input.
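To illustrate how these arguments combine, here is a small sketch; the parameter names (mode, num_outputs) are invented for the example and carry no special meaning to gear:

from gear import BasePredictor, Input


class Predictor(BasePredictor):
    def predict(
        self,
        prompt: str = Input(description="prompt input", min_length=1, max_length=2048),
        mode: str = Input(description="generation mode", default="chat", choices=["chat", "completion"]),
        num_outputs: int = Input(description="number of results to return", default=1, ge=1, le=4),
    ) -> str:
        # echo the validated inputs back; a real model call would go here
        return f"{mode}: {prompt} (x{num_outputs})"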
Path()
The gear.Path object represents the path of a file on disk and is used to pass files into and out of the model. For tasks such as text-to-image or text-to-video, return a Path pointing to the generated file.
from gear import BasePredictor, Path


class Predictor(BasePredictor):
    def setup(self) -> None:
        print("setup is ready...")

    def predict(self) -> Path:
        output_path = "/tmp/outp.mp3"
        return Path(output_path)
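In practice the predictor creates the output file itself before returning its path. A minimal sketch (the file name and contents below are placeholders):

from gear import BasePredictor, Path


class Predictor(BasePredictor):
    def predict(self) -> Path:
        # write the generated result to disk, then hand the path back to gear
        output_path = "/tmp/output.txt"
        with open(output_path, "w") as f:
            f.write("generated content goes here")
        return Path(output_path)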
Streaming Output
from gear import BasePredictor, ConcatenateIterator


class Predictor(BasePredictor):
    def predict(self) -> ConcatenateIterator[str]:
        tokens = ["Test", "streaming", "output", "."]
        for token in tokens:
            yield token + " "
Async and Concurrency
You can declare the predict() method as async def predict(...). If predict() is asynchronous, setup() may be asynchronous as well:
from gear import BasePredictor


class Predictor(BasePredictor):
    async def setup(self) -> None:
        print("async setup is also supported...")

    async def predict(self) -> str:
        print("async predict")
        return "hello world"
Models with asynchronous predict() functions can run predictions concurrently, up to the limit specified in gear.yaml under concurrency.max. Attempts to exceed this limit will return a 409 Conflict response.
OpenAI-Compatible
import json

from gear import BasePredictor, ConcatenateIterator, Input


class Predictor(BasePredictor):
    def setup(self) -> None:
        # load your model here, e.g. self.model = ...
        print("setup is ready...")

    async def predict(self,
                      prompt: str = Input(description="prompt input"),
                      max_tokens: int = Input(
                          description="Maximum number of tokens generated by the model.",
                          default=1024, ge=1, le=8192),
                      temperature: float = Input(
                          description="Controls randomness of the model output. Higher values (closer to 1) make output more random, lower values make it more deterministic.",
                          default=0.7, ge=0.00, le=2.0),
                      top_p: float = Input(
                          description="Controls output diversity. Lower values make output more focused, higher values make it more diverse.",
                          default=0.7, ge=0.1, le=1.0),
                      top_k: int = Input(
                          description="Samples from the top k tokens. Helps speed up generation and can improve quality.",
                          default=50, ge=1, le=100),
                      frequency_penalty: float = Input(
                          description="Reduces the likelihood of repeating words by penalizing those already used frequently.",
                          default=0.0, ge=-2.0, le=2.0)
                      ) -> ConcatenateIterator[str]:
        # The prompt may carry an OpenAI-style messages list encoded as JSON.
        try:
            messages = json.loads(prompt)
            if not isinstance(messages, list):
                raise ValueError("messages must be a list")
            for msg in messages:
                if not isinstance(msg, dict):
                    raise TypeError(f"Message item must be a dict, got {type(msg)}")
                if "role" not in msg:
                    raise KeyError("Message missing 'role' field")
                if "content" not in msg:
                    raise KeyError("Message missing 'content' field")
                if not isinstance(msg["content"], str):
                    raise TypeError("content must be a string")
        except Exception:
            # Fall back to treating the prompt as plain text.
            messages = [
                {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
                {"role": "user", "content": prompt},
            ]
        # self.model stands for the model loaded in setup(); generate() stands for
        # whatever streaming generation call your model provides.
        for token in self.model.generate(messages=messages):
            yield token
Currently, only the parameters prompt, max_tokens, temperature, top_p, top_k, and frequency_penalty are supported, and messages must be handled as shown above.
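For reference, a client can pass an OpenAI-style conversation by JSON-encoding it into the prompt field. The sketch below reuses the local endpoint from step 3.5 and assumes the `requests` package is installed; adjust the URL for a deployed model:

import json

import requests  # assumed installed; any HTTP client works

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "1+1=?"},
]
resp = requests.post(
    "http://127.0.0.1:8393/predictions",
    headers={"Content-Type": "application/json", "Stream": "true"},
    json={"input": {
        "prompt": json.dumps(messages),  # messages list encoded into the prompt field
        "max_tokens": 256,
        "temperature": 0.7,
    }},
    stream=True,
)
for line in resp.iter_lines():
    if line:
        print(json.loads(line).get("output"))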