Gear Model Building Tool
1. Overview
- This tool supports macOS and Linux only.
- Docker must be installed on the machine you operate from.
2. Install Gear
sudo curl -o /usr/local/bin/gear -L http://oss-high-sq01.cdsgss.com/gear/gear_`uname -s`_`uname -m` && sudo chmod +x /usr/local/bin/gear
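The download URL above embeds the operating system and CPU architecture via `uname`. This prints the binary name the command resolves to on the current machine, which is useful when downloading manually:

```shell
# gear binaries are published per OS/arch, e.g. gear_Linux_x86_64
echo "gear_$(uname -s)_$(uname -m)"
```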
3. Start a Model
3.1. Initialize
$ mkdir qwen0_5B && cd qwen0_5B
$ gear init
Setting up the current directory for use with gear...
✅ Created /data/qwen/predict.py
✅ Created /data/qwen/.dockerignore
✅ Created /data/qwen/.github/workflows/push.yaml
✅ Created /data/qwen/gear.yaml
Done!
3.2. Prepare the Model Files
Pulling model files from a remote source is not yet supported; download your model locally first.
$ git clone https://hf-mirror.com/Qwen/Qwen2.5-0.5B
Cloning into 'Qwen2.5-0.5B'...
remote: Enumerating objects: 42, done.
remote: Counting objects: 100% (39/39), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 42 (delta 18), reused 0 (delta 0), pack-reused 3 (from 1)
Unpacking objects: 100% (42/42), 3.61 MiB | 3.99 MiB/s, done.
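Hugging Face repositories store large weight files with Git LFS; a clone made without git-lfs fetches small text "pointer" stubs instead of the real weights. A quick check (the helper name `is_lfs_pointer` is ours, not part of gear), run before serving the model:

```python
LFS_MAGIC = b"version https://git-lfs.github.com/spec"

def is_lfs_pointer(path: str) -> bool:
    """Return True if the file is a Git LFS pointer stub, not real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_MAGIC)) == LFS_MAGIC
```

If this returns True for a weight file, run `git lfs install && git lfs pull` inside the repository to fetch the actual binaries.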
3.3. Prepare gear.yaml
The configuration file:
# Configuration for gear ⚙️
build:
  # Whether the model needs a GPU
  gpu: true
  # Ubuntu system packages to install
  # system_packages:
  #   - "libgl1-mesa-glx"
  #   - "libglib2.0-0"
  # Python version, e.g. "3.11"
  python_version: "3.11"
  # Python packages, as <package-name>==<version>
  python_packages:
    - "transformers"
    - "torch"
    - "accelerate>=0.26.0"
  # Commands to run once the environment is set up
  # run:
  #   - "echo env is ready!"
  #   - "echo another command if needed"
# predict.py defines how predictions are run on the model
predict: "predict.py:Predictor"
# Caps the number of concurrent requests; if set, `def predict` in predict.py must be changed to `async def predict`
# concurrency:
#   max: 20
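The commented `concurrency` block caps how many requests are handled at once, which is why `predict` must become `async def`: the server can then interleave waiting requests. Conceptually this behaves like a semaphore around the handler. The following is a stand-alone sketch of that idea, not gear's actual implementation:

```python
import asyncio

async def predict(prompt: str) -> str:
    """Stand-in for an async model prediction."""
    await asyncio.sleep(0)
    return f"echo: {prompt}"

async def serve(prompts, max_concurrency: int = 20):
    # The semaphore plays the role of `concurrency.max` in gear.yaml:
    # at most max_concurrency predictions run at the same time.
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(p):
        async with sem:
            return await predict(p)

    return await asyncio.gather(*(guarded(p) for p in prompts))

results = asyncio.run(serve([f"req{i}" for i in range(5)]))
```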
3.4. Prepare predict.py
The prediction file:
from gear import BasePredictor, Input, ConcatenateIterator
# from transformers import AutoModelForCausalLM, AutoTokenizer


class Predictor(BasePredictor):
    def setup(self) -> None:
        """Load the model into memory so multiple predictions run efficiently"""
        # self.model = AutoModelForCausalLM.from_pretrained(
        #     "./Qwen2.5-0.5B",
        #     torch_dtype="auto",
        #     device_map="auto"
        # )
        # self.tokenizer = AutoTokenizer.from_pretrained("./Qwen2.5-0.5B")

    async def predict(
        self,
        prompt: str = Input(description="prompt input", default="你好")
    ) -> ConcatenateIterator[str]:
        """Run a single prediction on the model"""
        test_model = ["这是", "一", "个", "测试", "样例"]
        for i in test_model:
            yield i
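The `ConcatenateIterator[str]` return type means the server streams each yielded chunk to the client, and the final output is the concatenation of all chunks. Stripped of the gear classes, the example generator behaves like this:

```python
def predict(prompt: str):
    """Plain-generator version of the example Predictor.predict."""
    for chunk in ["这是", "一", "个", "测试", "样例"]:
        yield chunk

# Streaming clients see the chunks one at a time;
# the complete answer is their concatenation.
answer = "".join(predict("你好"))
```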
3.5. Test the Model Locally
Run the serve command:
$ gear serve
Building Docker image from environment in gear.yaml...
[+] Building 5.6s (13/13) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 844B 0.0s
=> resolve image config for docker-image://registry-bj.capitalonline.net/maas/dockerfile:1.4 5.2s
=> CACHED docker-image://registry-bj.capitalonline.net/maas/dockerfile:1.4@sha256:1f6e06f58b23c0700df6c05284ac25b29088 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 366B 0.0s
=> [internal] load metadata for registry-bj.capitalonline.net/maas/kit-base:cuda11.8-python3.11 0.1s
=> [stage-0 1/6] FROM registry-bj.capitalonline.net/maas/kit-base:cuda11.8-python3.11@sha256:0e9a503b356d548998e326f24 0.0s
=> => resolve registry-bj.capitalonline.net/maas/kit-base:cuda11.8-python3.11@sha256:0e9a503b356d548998e326f24d6ca50d4 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 137.36kB 0.0s
=> CACHED [stage-0 2/6] COPY .gear/tmp/build20250321152525.358211857500444/gear-1.0.0-py3-none-any.whl /tmp/gear-1.0.0 0.0s
=> CACHED [stage-0 3/6] RUN --mount=type=cache,target=/root/.cache/pip pip install --no-cache-dir /tmp/gear-1.0.0-py3- 0.0s
=> CACHED [stage-0 4/6] COPY .gear/tmp/build20250321152525.358211857500444/requirements.txt /tmp/requirements.txt 0.0s
=> CACHED [stage-0 5/6] RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt 0.0s
=> CACHED [stage-0 6/6] WORKDIR /src 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => preparing layers for inline cache 0.0s
=> => writing image sha256:9342ca519d1f66e0f642cb7630581700a56f8a86f11c83ca89e9bf440a461b33 0.0s
=> => naming to docker.io/library/gear-qwen05b-base 0.0s
Running 'python --check-hash-based-pycs never -m gear.server.http --await-explicit-shutdown true' in Docker with the current directory mounted as a volume...
Serving at http://127.0.0.1:8393
....
Once the service is up, verify it:
$ curl -X POST http://127.0.0.1:8393/predictions -H 'Content-Type: application/json' -H 'Stream: true' -d '{"input": {"prompt": "你好"}}'
{"input": {"prompt": "1+1=?"}, "output": ["这是"], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": null, "logs": "", "error": null, "status": "processing", "metrics": null}
{"input": {"prompt": "1+1=?"}, "output": ["一", "个"], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": null, "logs": "", "error": null, "status": "processing", "metrics": null}
{"input": {"prompt": "1+1=?"}, "output": ["测试", "样例"], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": null, "logs": "", "error": null, "status": "processing", "metrics": null}
{"input": {"prompt": "1+1=?"}, "output": [], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": "2025-03-21T08:05:57.154953+00:00", "logs": "", "error": null, "status": "succeeded", "metrics": {"predict_time": 0.198196}}
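With the `Stream: true` header, each response line is an independent JSON record, and the full answer is the concatenation of every `output` chunk across records. A minimal client-side sketch, using simplified records modeled on the output above (only the fields relevant here are kept):

```python
import json

# Each line of the streaming response is a standalone JSON record.
lines = [
    '{"output": ["这是"], "status": "processing"}',
    '{"output": ["一", "个"], "status": "processing"}',
    '{"output": ["测试", "样例"], "status": "processing"}',
    '{"output": [], "status": "succeeded"}',
]

chunks = []
for line in lines:
    record = json.loads(line)
    chunks.extend(record["output"])
    if record["status"] == "succeeded":
        break

answer = "".join(chunks)
```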