Model Upload
Upload your custom model with the Gear model build tool.
1. Notes
- This tool supports macOS and Linux only
- The operator machine needs a Docker environment
2. Install Gear
sudo curl -o /usr/local/bin/gear -L http://oss-high-sq01.cdsgss.com/gear/gear_`uname -s`_`uname -m` && sudo chmod +x /usr/local/bin/gear
3. Launch the model
3.1. Initialize
mkdir qwen0_5B && cd qwen0_5B
$ gear init
Setting up the current directory for use with gear...
✅ Created /data/qwen/predict.py
✅ Created /data/qwen/.dockerignore
✅ Created /data/qwen/.github/workflows/push.yaml
✅ Created /data/qwen/gear.yaml
Done!
3.2. Prepare the model files
Pulling model files from a remote source is not yet supported; download your model locally first.
$ git clone https://hf-mirror.com/Qwen/Qwen2.5-0.5B
Cloning into 'Qwen2.5-0.5B'...
remote: Enumerating objects: 42, done.
remote: Counting objects: 100% (39/39), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 42 (delta 18), reused 0 (delta 0), pack-reused 3 (from 1)
Unpacking objects: 100% (42/42), 3.61 MiB | 3.99 MiB/s, done.
3.3. Prepare the gear.yaml configuration file
# Configuration for gear ⚙️
build:
  # Whether a GPU is required
  gpu: true
  # Ubuntu system packages to install
  # system_packages:
  #   - "libgl1-mesa-glx"
  #   - "libglib2.0-0"
  # Python version, e.g. '3.11'
  python_version: "3.11"
  # Python packages, <package-name>==<version>
  python_packages:
    - "transformers"
    - "torch"
    - "accelerate>=0.26.0"
  # Commands to run after the environment is set up
  # run:
  #   - "echo env is ready!"
  #   - "echo another command if needed"
# predict.py defines how predictions are run on the model
predict: "predict.py:Predictor"
# Controls request concurrency; if set, `def predict` in predict.py must become `async def predict`
# concurrency:
#   max: 20
3.4. Prepare the predict.py file
from gear import BasePredictor, Input, ConcatenateIterator

class Predictor(BasePredictor):
    def setup(self) -> None:
        """Load the model into memory so that multiple predictions run efficiently"""
        # self.model = AutoModelForCausalLM.from_pretrained(
        #     "./Qwen2.5-0.5B",
        #     torch_dtype="auto",
        #     device_map="auto"
        # )
        # self.tokenizer = AutoTokenizer.from_pretrained(model_name)

    async def predict(
        self,
        prompt: str = Input(description="prompt input", default="你好")
    ) -> ConcatenateIterator[str]:
        """Run a single prediction on the model"""
        test_model = ["这是", "一", "个", "测试", "样例"]
        for i in test_model:
            yield i
3.5. Test the model locally
Run the test command
$ gear serve
Building Docker image from environment in gear.yaml...
[+] Building 5.6s (13/13) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 844B 0.0s
=> resolve image config for docker-image://registry-bj.capitalonline.net/maas/dockerfile:1.4 5.2s
=> CACHED docker-image://registry-bj.capitalonline.net/maas/dockerfile:1.4@sha256:1f6e06f58b23c0700df6c05284ac25b29088 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 366B 0.0s
=> [internal] load metadata for registry-bj.capitalonline.net/maas/kit-base:cuda11.8-python3.11 0.1s
=> [stage-0 1/6] FROM registry-bj.capitalonline.net/maas/kit-base:cuda11.8-python3.11@sha256:0e9a503b356d548998e326f24 0.0s
=> => resolve registry-bj.capitalonline.net/maas/kit-base:cuda11.8-python3.11@sha256:0e9a503b356d548998e326f24d6ca50d4 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 137.36kB 0.0s
=> CACHED [stage-0 2/6] COPY .gear/tmp/build20250321152525.358211857500444/gear-1.0.0-py3-none-any.whl /tmp/gear-1.0.0 0.0s
=> CACHED [stage-0 3/6] RUN --mount=type=cache,target=/root/.cache/pip pip install --no-cache-dir /tmp/gear-1.0.0-py3- 0.0s
=> CACHED [stage-0 4/6] COPY .gear/tmp/build20250321152525.358211857500444/requirements.txt /tmp/requirements.txt 0.0s
=> CACHED [stage-0 5/6] RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt 0.0s
=> CACHED [stage-0 6/6] WORKDIR /src 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => preparing layers for inline cache 0.0s
=> => writing image sha256:9342ca519d1f66e0f642cb7630581700a56f8a86f11c83ca89e9bf440a461b33 0.0s
=> => naming to docker.io/library/gear-qwen05b-base 0.0s
Running 'python --check-hash-based-pycs never -m gear.server.http --await-explicit-shutdown true' in Docker with the current directory mounted as a volume...
Serving at http://127.0.0.1:8393
....
Once the service is up, verify it:
$ curl -X POST http://127.0.0.1:8393/predictions -H 'Content-Type: application/json' -H 'Stream: true' -d '{"input": {"prompt": "你好"}}'
{"input": {"prompt": "1+1=?"}, "output": ["这是"], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": null, "logs": "", "error": null, "status": "processing", "metrics": null}
{"input": {"prompt": "1+1=?"}, "output": ["一", "个"], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": null, "logs": "", "error": null, "status": "processing", "metrics": null}
{"input": {"prompt": "1+1=?"}, "output": ["测试", "样例"], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": null, "logs": "", "error": null, "status": "processing", "metrics": null}
{"input": {"prompt": "1+1=?"}, "output": [], "id": "1e525032-cbaa-4fa5-82a3-b915754f072c", "version": null, "created_at": null, "started_at": "2025-03-21T08:05:56.956757+00:00", "completed_at": "2025-03-21T08:05:57.154953+00:00", "logs": "", "error": null, "status": "succeeded", "metrics": {"predict_time": 0.198196}}
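Each response line above is a complete prediction snapshot, and its output field carries the tokens emitted since the previous snapshot. A minimal client-side sketch for reassembling the streamed text (the helper name collect_stream is my own; the line-delimited framing is inferred from the output shown above):

```python
import json

def collect_stream(lines):
    """Reassemble streamed /predictions output.

    Each line is a JSON snapshot of the prediction; `output` carries the
    tokens produced since the previous snapshot, so concatenating the
    `output` lists of all snapshots yields the full text.
    """
    tokens = []
    status = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        snapshot = json.loads(line)
        tokens.extend(snapshot.get("output") or [])
        status = snapshot.get("status")
    return "".join(tokens), status
```

Applied to the four snapshots above, this yields the full text "这是一个测试样例" with a final status of "succeeded".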
3.6. Build the model image
$ gear push maas-harbor-cn.yun-paas.com/maas-<user-id>/hotdog-detector
$ gear build
4. Publish the model on GpuGeek
4.1. Create the model
Click Create Model
4.2. Get an image registry token
Click to get a token
4.3. Log in to the image registry
Log in to the image registry with the token you obtained
$ gear login
4.4. Build and upload the image
$ gear build -t maas-harbor-cn.yun-paas.com/maas-<user-id>/<maas-name>
$ docker push maas-harbor-cn.yun-paas.com/maas-<user-id>/<maas-name>
You can also use the gear push command instead, but that requires defining the image field in gear.yaml:
# Configuration for gear ⚙️
build:
  ...
# predict.py defines how predictions are run on the model
predict: "predict.py:Predictor"
image: maas-harbor-cn.yun-paas.com/maas-<user-id>/<maas-name>
5. Predictor
Predictor.setup()
Prepares the model so that multiple predictions run efficiently.
Use this optional method for any expensive one-off operations, such as loading a trained model or instantiating data transforms.
Predictor.predict(**kwargs)
Runs a single prediction.
This required method is where you call the model loaded during setup(), and where you may also want to add pre- and post-processing code.
The predict() method takes an arbitrary list of named arguments, where each argument name must correspond to an Input() annotation.
Input()
Define each parameter of the predict() method with a gear.Input() object.
The Input() function takes the following keyword arguments:
- description: A description for model users of what to pass to this input.
- default: Sets a default value for the input. If this argument is not passed, the input is required. If it is explicitly set to None, the input is optional.
- ge: For int or float types, the value must be greater than or equal to this number.
- le: For int or float types, the value must be less than or equal to this number.
- min_length: For str types, the minimum length of the string.
- max_length: For str types, the maximum length of the string.
- regex: For str types, the string must match this regular expression.
- choices: For str or int types, a list of possible values for this input.
Path()
The gear.Path object is used to get files into and out of models. It represents the path to a file on disk, and can be used for text-to-image, text-to-video, and other tasks that return file paths.
from gear import BasePredictor, Path

class Predictor(BasePredictor):
    def setup(self) -> None:
        print("setup...")
    def predict(self) -> Path:
        output_path = "/tmp/outp.mp3"
        return Path(output_path)
Streaming output
from gear import BasePredictor, ConcatenateIterator

class Predictor(BasePredictor):
    def predict(self) -> ConcatenateIterator[str]:
        tokens = ["测", "试", "一", "下", "流", "式", "输", "出", "。"]
        for token in tokens:
            yield token + " "
Async and concurrency
You can declare the predict() method as async def predict(...). If you have an async predict() function, you may also have an async setup() function:
from gear import BasePredictor

class Predictor(BasePredictor):
    async def setup(self) -> None:
        print("async setup is also supported...")
    async def predict(self) -> str:
        print("async predict")
        return "hello world"
A model with an async predict() function can run predictions concurrently, up to the limit specified by concurrency.max in gear.yaml. Attempts to exceed this limit return a 409 Conflict response.
OpenAI support
import json

from gear import BasePredictor, ConcatenateIterator, Input

class Predictor(BasePredictor):
    async def predict(self,
        prompt: str = Input(description="prompt input"),
        max_tokens: int = Input(
            description="Maximum number of tokens the model may generate.",
            default=1024,
            ge=1, le=8192),
        temperature: float = Input(
            description="Controls the randomness of the output: higher values (close to 1) make it more random, lower values make it more deterministic.",
            default=0.7, ge=0.00, le=2.0),
        top_p: float = Input(
            description="Controls output diversity: lower values make the output more focused, higher values make it more varied.",
            default=0.7, ge=0.1, le=1.0),
        top_k: int = Input(
            description="Sample from the top k tokens. Helps speed up generation and can improve output quality.",
            default=50, ge=1, le=100),
        frequency_penalty: float = Input(
            description="Reduces repetition by penalizing words that have already been used frequently.",
            default=0.0, ge=-2.0, le=2.0)
    ) -> ConcatenateIterator[str]:
        messages = []
        try:
            messages = json.loads(prompt)
            if not isinstance(messages, list):
                raise ValueError("messages must be a list")
            for msg in messages:
                if not isinstance(msg, dict):
                    raise TypeError(f"each message must be a dict, got {type(msg)}")
                if "role" not in msg:
                    raise KeyError("message is missing the 'role' field")
                if "content" not in msg:
                    raise KeyError("message is missing the 'content' field")
                if not isinstance(msg["content"], str):
                    raise TypeError("content must be a string")
        except Exception:
            # If prompt is not a valid messages list, treat it as plain text
            messages = [
                {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
                {"role": "user", "content": prompt},
            ]
        # `youmodel` stands in for your own loaded model
        for token in youmodel.generate(messages=messages):
            yield token
Currently only the prompt, max_tokens, temperature, top_p, top_k, and frequency_penalty parameters are supported; messages must be handled as shown above.
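Under that convention, a chat-style request passes the messages list as a JSON string in the prompt field, which predict() decodes with json.loads. A sketch of building such a request body (the payload shape mirrors the /predictions call in section 3.5; the message contents are illustrative):

```python
import json

# Chat-style messages are JSON-encoded into `prompt`; predict() parses
# them back into a list of {"role": ..., "content": ...} dicts.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "1+1=?"},
]

payload = {
    "input": {
        "prompt": json.dumps(messages, ensure_ascii=False),
        "max_tokens": 256,
        "temperature": 0.7,
    }
}
body = json.dumps(payload, ensure_ascii=False)
```

Sending body to /predictions then streams back snapshots in the same format shown in section 3.5.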