
Gear Model Building Tool

1. Notes

  • This tool supports macOS and Linux only
  • The host machine must have a Docker environment available

2. Install Gear

sudo curl -o /usr/local/bin/gear -L http://oss-high-sq01.cdsgss.com/gear/gear_`uname -s`_`uname -m` && sudo chmod +x /usr/local/bin/gear

3. Start a Model

3.1. Initialize

$ mkdir qwen0_5B && cd qwen0_5B
$ gear init

Setting up the current directory for use with gear...

✅ Created /data/qwen/predict.py
✅ Created /data/qwen/.dockerignore
✅ Created /data/qwen/.github/workflows/push.yaml
✅ Created /data/qwen/gear.yaml

Done!

3.2. Prepare the Model Files

Pulling model files from a remote source is not currently supported, so download your model to the local machine first:

$ git clone https://hf-mirror.com/Qwen/Qwen2.5-0.5B
Cloning into 'Qwen2.5-0.5B'...
remote: Enumerating objects: 42, done.
remote: Counting objects: 100% (39/39), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 42 (delta 18), reused 0 (delta 0), pack-reused 3 (from 1)
Unpacking objects: 100% (42/42), 3.61 MiB | 3.99 MiB/s, done.

3.3. Prepare the gear.yaml Configuration File

# Configuration for gear ⚙️

build:
  # Whether a GPU is required
  gpu: true

  # Ubuntu system packages to install
  # system_packages:
  #   - "libgl1-mesa-glx"
  #   - "libglib2.0-0"

  # Python version, e.g. "3.11"
  python_version: "3.11"

  # Python packages, <package-name>==<version>
  python_packages:
    - "transformers"
    - "torch"
    - "accelerate>=0.26.0"

  # Commands to run after the environment is set up
  # run:
  #   - "echo env is ready!"
  #   - "echo another command if needed"

# predict.py defines how to run predictions on the model
predict: "predict.py:Predictor"

# Controls request concurrency. If this is set, `def predict` in
# predict.py must be changed to `async def predict`
# concurrency:
#   max: 20

3.4. Prepare the predict.py File

from gear import BasePredictor, Input, ConcatenateIterator


class Predictor(BasePredictor):
    def setup(self) -> None:
        """Load the model into memory so that multiple predictions run efficiently"""
        # self.model = AutoModelForCausalLM.from_pretrained(
        #     "./Qwen2.5-0.5B",
        #     torch_dtype="auto",
        #     device_map="auto"
        # )
        # self.tokenizer = AutoTokenizer.from_pretrained(model_name)

    async def predict(
        self,
        prompt: str = Input(description="prompt input", default="你好")
    ) -> ConcatenateIterator[str]:
        """Run a single prediction on the model"""
        test_model = ["这是", "一", "个", "测试", "样例"]
        for i in test_model:
            yield i

3.5. Test the Model Locally

Run the serve command:

$ gear serve

Building Docker image from environment in gear.yaml...
[+] Building 5.6s (13/13) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 844B 0.0s
=> resolve image config for docker-image://registry-bj.capitalonline.net/maas/dockerfile:1.4 5.2s
=> CACHED docker-image://registry-bj.capitalonline.net/maas/dockerfile:1.4@sha256:1f6e06f58b23c0700df6c05284ac25b29088 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 366B 0.0s
=> [internal] load metadata for registry-bj.capitalonline.net/maas/kit-base:cuda11.8-python3.11 0.1s
=> [stage-0 1/6] FROM registry-bj.capitalonline.net/maas/kit-base:cuda11.8-python3.11@sha256:0e9a503b356d548998e326f24 0.0s
=> => resolve registry-bj.capitalonline.net/maas/kit-base:cuda11.8-python3.11@sha256:0e9a503b356d548998e326f24d6ca50d4 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 137.36kB 0.0s
=> CACHED [stage-0 2/6] COPY .gear/tmp/build20250321152525.358211857500444/gear-1.0.0-py3-none-any.whl /tmp/gear-1.0.0 0.0s
=> CACHED [stage-0 3/6] RUN --mount=type=cache,target=/root/.cache/pip pip install --no-cache-dir /tmp/gear-1.0.0-py3- 0.0s
=> CACHED [stage-0 4/6] COPY .gear/tmp/build20250321152525.358211857500444/requirements.txt /tmp/requirements.txt 0.0s
=> CACHED [stage-0 5/6] RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt 0.0s
=> CACHED [stage-0 6/6] WORKDIR /src 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => preparing layers for inline cache 0.0s
=> => writing image sha256:9342ca519d1f66e0f642cb7630581700a56f8a86f11c83ca89e9bf440a461b33 0.0s
=> => naming to docker.io/library/gear-qwen05b-base 0.0s

Running 'python --check-hash-based-pycs never -m gear.server.http --await-explicit-shutdown true' in Docker with the current directory mounted as a volume...

Serving at http://127.0.0.1:8393

....

Once the service is up, verify it:

$ curl -X POST http://127.0.0.1:8393/predictions -H 'Content-Type: application/json' -H 'Stream: true'   -d '{"input": {"prompt": "你好"}}'

{"input": {"prompt": "1+1=?"}, "output": ["这是"], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": null, "logs": "", "error": null, "status": "processing", "metrics": null}

{"input": {"prompt": "1+1=?"}, "output": ["一", "个"], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": null, "logs": "", "error": null, "status": "processing", "metrics": null}

{"input": {"prompt": "1+1=?"}, "output": ["测试", "样例"], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": null, "logs": "", "error": null, "status": "processing", "metrics": null}

{"input": {"prompt": "1+1=?"}, "output": [], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": "2025-03-21T08:05:57.154953+00:00", "logs": "", "error": null, "status": "succeeded", "metrics": {"predict_time": 0.198196}}
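The same endpoint can also be called programmatically. Below is a minimal Python sketch, assuming (as the curl output above suggests) that with the `Stream: true` header the server emits one JSON object per line and each object carries the incremental chunks in its `output` field; the address and payload shape are taken directly from the example above:

```python
import json
import urllib.request

GEAR_URL = "http://127.0.0.1:8393/predictions"  # local `gear serve` address


def build_request(prompt: str) -> urllib.request.Request:
    """Build a POST request equivalent to the curl call above."""
    body = json.dumps({"input": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        GEAR_URL,
        data=body,
        headers={"Content-Type": "application/json", "Stream": "true"},
        method="POST",
    )


def predict(prompt: str) -> list[str]:
    """Send the request and collect the streamed output chunks."""
    chunks: list[str] = []
    with urllib.request.urlopen(build_request(prompt)) as resp:
        for line in resp:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)
            # Each streamed event carries the new chunks in "output"
            chunks.extend(event.get("output") or [])
    return chunks


# chunks = predict("你好")        # requires `gear serve` running locally
# print("".join(chunks))
```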

3.6. Build the Model Image

$ gear push maas-harbor-cn.yun-paas.com/maas-<user-id>/hotdog-detector
$ gear build

4. Publish the Model on GpuGeek

4.1. Create a Model

Click "Create Model".

4.2. Obtain an Image Registry Token

Click "Get Token".

4.3. Log in to the Image Registry

Log in to the image registry with the token obtained above:

$ gear login

4.4. Build and Upload the Image

$ gear build -t  maas-harbor-cn.yun-paas.com/maas-<user-id>/<maas-name>
$ docker push maas-harbor-cn.yun-paas.com/maas-<user-id>/<maas-name>

Alternatively, you can use the gear push command, but this requires the image field to be defined in gear.yaml:

# Configuration for gear ⚙️

build:
  ...

# predict.py defines how to run predictions on the model
predict: "predict.py:Predictor"

image: maas-harbor-cn.yun-paas.com/maas-<user-id>/<maas-name>

5. Predictor

Predictor.setup()

Prepares the model so that multiple predictions run efficiently.

Use this optional method for any expensive one-off operations, such as loading the trained model or instantiating data transforms.

Predictor.predict(**kwargs)

Runs a single prediction.

This required method is where you call the model loaded during setup(); you may also want to add pre- and post-processing code here.

The predict() method takes an arbitrary list of named parameters, where each parameter name must correspond to an Input() annotation.

Input()

Define each parameter of the predict() method with a gear.Input() object.

Input() accepts the following keyword arguments:

  • description: a description of this input, shown to the model's users.
  • default: sets the input's default value. If not passed, the input is required. If explicitly set to None, the input is optional.
  • ge: for int or float types, the value must be greater than or equal to this number.
  • le: for int or float types, the value must be less than or equal to this number.
  • min_length: for str types, the minimum length of the string.
  • max_length: for str types, the maximum length of the string.
  • regex: for str types, the string must match this regular expression.
  • choices: for str or int types, a list of possible values for this input.
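As an illustration of these keyword arguments, a hypothetical predict() signature might combine several constraints. The parameter names and return value below are invented for the example, and the try/except stub exists only so the snippet can run without the gear package installed:

```python
try:
    from gear import BasePredictor, Input
except ImportError:  # stand-in stubs; only so this example runs without gear
    class BasePredictor:
        pass

    def Input(description=None, default=None, ge=None, le=None,
              min_length=None, max_length=None, regex=None, choices=None):
        return default


class Predictor(BasePredictor):
    def predict(
        self,
        # required: no default given
        prompt: str = Input(description="Prompt text",
                            min_length=1, max_length=2048),
        # optional numeric input with a range constraint
        temperature: float = Input(description="Sampling temperature",
                                   default=0.7, ge=0.0, le=2.0),
        # optional input restricted to an enumerated set of values
        mode: str = Input(description="Generation mode", default="chat",
                          choices=["chat", "completion"]),
    ) -> str:
        # A trivial body; a real model would generate text here.
        return f"[{mode} t={temperature}] {prompt}"
```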

Path()

The gear.Path object is used to move files into and out of the model. It represents a path to a file on disk, and can be used for tasks such as text-to-image or text-to-video that return a file path.

from gear import BasePredictor, Path

class Predictor(BasePredictor):
    def predict(self) -> Path:
        output_path = "/tmp/outp.mp3"
        return Path(output_path)

Streaming Output

from gear import BasePredictor, ConcatenateIterator

class Predictor(BasePredictor):
    def predict(self) -> ConcatenateIterator[str]:
        tokens = ["测", "试", "一", "下", "流", "式", "输", "出", "。"]
        for token in tokens:
            yield token + " "

Async and Concurrency

You can declare the predict() method as async def predict(...). If you have an async predict() function, you may also have an async setup() function:

from gear import BasePredictor

class Predictor(BasePredictor):
    async def setup(self) -> None:
        print("async setup is also supported...")

    async def predict(self) -> str:
        print("async predict")
        return "hello world"

A model with an async predict() function can run predictions concurrently, up to the limit specified by concurrency.max in gear.yaml. Attempts to exceed this limit return a 409 Conflict response.

OpenAI Support

import json

from gear import BasePredictor, ConcatenateIterator, Input

class Predictor(BasePredictor):
    async def predict(
        self,
        prompt: str = Input(description="prompt input"),
        max_tokens: int = Input(
            description="Maximum number of tokens the model may generate.",
            default=1024,
            ge=1, le=8192),
        temperature: float = Input(
            description="Controls the randomness of the output; higher values (close to 1) make the output more random, lower values make it more deterministic.",
            default=0.7, ge=0.00, le=2.0),
        top_p: float = Input(
            description="Controls output diversity. Lower values make the output more focused, higher values make it more diverse.",
            default=0.7, ge=0.1, le=1.0),
        top_k: int = Input(
            description="Sample from the top k tokens. Helps speed up generation and can improve the quality of the generated text.",
            default=50, ge=1, le=100),
        frequency_penalty: float = Input(
            description="Reduces the likelihood of repeated words by penalizing words that have already appeared frequently.",
            default=0.0, ge=-2.0, le=2.0)
    ) -> ConcatenateIterator[str]:

        messages = []
        try:
            messages = json.loads(prompt)
            if not isinstance(messages, list):
                raise ValueError("messages must be a list")
            for msg in messages:
                if not isinstance(msg, dict):
                    raise TypeError(f"each message must be a dict, got {type(msg)}")
                if "role" not in msg:
                    raise KeyError("message is missing the 'role' field")
                if "content" not in msg:
                    raise KeyError("message is missing the 'content' field")
                if not isinstance(msg["content"], str):
                    raise TypeError("content must be a string")
        except Exception as e:
            messages = [
                {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
                {"role": "user", "content": prompt},
            ]

        # `youmodel` is a placeholder for your own loaded model
        for token in youmodel.generate(messages=messages):
            yield token

Currently only the prompt, max_tokens, temperature, top_p, top_k, and frequency_penalty parameters are supported; the messages field must be handled through prompt in the manner shown above.
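Following this convention, a client passes an OpenAI-style conversation by JSON-encoding the messages list into the prompt field. A sketch of the request payload (the max_tokens and temperature values are arbitrary examples):

```python
import json

# An OpenAI-style conversation, serialized into the single `prompt` input
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "1+1=?"},
]

payload = {
    "input": {
        "prompt": json.dumps(messages, ensure_ascii=False),
        "max_tokens": 256,
        "temperature": 0.7,
    }
}

# predict() above recovers the conversation with json.loads(prompt)
recovered = json.loads(payload["input"]["prompt"])
```

If the prompt is not valid JSON (or not a well-formed messages list), predict() falls back to wrapping it in a default system/user conversation, so plain-string prompts keep working.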