Data Storage
The GPUGEEK storage solutions are as follows:
Mount Directory | Type | Permission/Speed | Description |
---|---|---|---|
/ | Instance System Disk | Read/Write / Fastest | Stores code, projects, virtual environments, etc. Included in backup images |
/gz-data | Instance Data Disk | Read/Write / Fastest | Stores datasets and models; suited to high-IO workloads. Not included in backup images |
/gz-fs | Instance Network Disk | Read/Write / Normal | Shared among all instances in the same data center under the same account |
/gz-datasets | Public Datasets | Read-only / Normal | Public training datasets; copy to the /gz-data directory before training |
/gz-models | Public Models | Read-only / Normal | Public model data; copy to the /gz-data directory before inference |
Instance System Disk
The instance system disk is the / root directory of the instance. This directory uses the server's local NVMe disk, with the fastest IO read/write speed. The default storage space of this directory is 30GB, which can be viewed on the GPUGEEK Console.
It is recommended to store training code, projects, and conda, pip, and other virtual environments on the instance system disk. It is not recommended to store large training datasets or inference models in this directory.
The data in the instance system disk will be saved together with the [backup image], so it can be restored when needed.
- Do not fill the root path / of the instance system disk beyond 95% of its capacity, otherwise the instance may fail to start.
- Some projects save training data in /root or /tmp by default. Please check these settings and change them to /gz-data.
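The 95% guideline above can be checked from inside the instance. The following is a minimal sketch using Python's standard `shutil.disk_usage`; the helper names `usage_fraction` and `is_near_full` are illustrative, and it assumes the figures the operating system reports for / match the system disk quota shown in the GPUGEEK Console, which may not hold on every instance type.

```python
import shutil


def usage_fraction(path="/"):
    """Return the used fraction (0.0 to 1.0) of the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some virtual filesystems report a size of zero; treat them as empty.
        return 0.0
    return usage.used / usage.total


def is_near_full(path="/", threshold=0.95):
    """Return True when usage of `path` has reached `threshold` (95% by default)."""
    return usage_fraction(path) >= threshold


if __name__ == "__main__":
    print(f"/ is {usage_fraction('/'):.1%} full")
    if is_near_full("/"):
        print("Warning: system disk is at or above 95%; move large files to /gz-data.")
```

Running a check like this before a long training job gives early warning if logs or checkpoints are being written to the system disk.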
Instance Data Disk
The instance data disk is the /gz-data directory of the instance. This directory also uses the server's local NVMe disk, with the fastest IO read/write speed. The default storage space is 20GB.
You can check the usage in the GPUGEEK Console, and expand the data disk by going to the corresponding instance -> More -> Expand Data Disk to meet your needs.
It is recommended to store datasets, models, and other large data on the instance data disk, because the data disk can be expanded while the system disk cannot.
The data in the instance data disk will not be saved with the [backup image]. Therefore, if the instance is deleted or rebuilt, the data on the data disk will be lost.
Instance Network Disk
The instance network disk is the /gz-fs directory of the instance. It is distributed storage shared by instances within the same data center. It performs well with large files or compressed files, but read/write speed may be slower when working with hundreds of thousands or millions of small scattered files.
The /gz-fs directory is suitable for long-term storage: back up the data you need to it before shutting down or releasing the instance.
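Because the network disk handles large or compressed files better than many small ones, one reasonable way to back up a project is to pack it into a single archive first. The sketch below uses Python's standard `tarfile` module; the function name `archive_directory` and the example paths are hypothetical, not part of the platform.

```python
import tarfile
from pathlib import Path


def archive_directory(src_dir, dest_archive):
    """Pack `src_dir` into one gzip-compressed tar file at `dest_archive`."""
    src = Path(src_dir)
    dest = Path(dest_archive)
    # Create the target folder (for example, a backups folder on /gz-fs) if needed.
    dest.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(dest), "w:gz") as tar:
        # arcname keeps only the top-level folder name inside the archive.
        tar.add(str(src), arcname=src.name)
    return dest


# Example call with hypothetical paths:
# archive_directory("/gz-data/my_project", "/gz-fs/backups/my_project.tar.gz")
```

Writing one archive to /gz-fs avoids creating many small files on the distributed storage, which is the access pattern described above as slower.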
Each data center has its own /gz-fs network disk storage. Data is not shared between different data centers.
All instances created under the same account in the same data center mount the same /gz-fs directory, so it can be used for data sharing and data backup between those instances.
You can view and upload data in the network disk from the Network Storage console.
Public Datasets
Public datasets are located in the /gz-datasets directory of the instance.
This directory is populated by official GPUGEEK platform staff and is read-only inside the instance.
If the directory contains the dataset you need, copy it to the /gz-data directory before training. Reading datasets directly from /gz-datasets may slow down your training, and the platform bears no responsibility if this occurs.
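The copy step described above can be scripted. This is a minimal sketch using Python's standard `shutil` module; the function name `stage_to_data_disk` and the dataset name in the example call are hypothetical, and the actual layout under /gz-datasets may differ.

```python
import shutil
from pathlib import Path


def stage_to_data_disk(src, dest_root):
    """Copy a file or directory from a read-only location into `dest_root`.

    Returns the destination path. If the destination already exists, the copy
    is skipped, so an interrupted earlier copy must be removed by hand first.
    """
    src = Path(src)
    dest = Path(dest_root) / src.name
    if dest.exists():
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    if src.is_dir():
        shutil.copytree(src, dest)
    else:
        shutil.copy2(src, dest)
    return dest


# Example call with a hypothetical dataset name:
# local_path = stage_to_data_disk("/gz-datasets/some_dataset", "/gz-data")
```

Pointing the training script at the returned path means reads come from the local data disk rather than from the shared public directory.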
If the dataset you need is not in this directory, you can submit a ticket to request adding it. The staff will process it within 3-5 working days.
Public Models
Public models are located in the /gz-models directory of the instance.
This directory is populated by official GPUGEEK platform staff and is read-only inside the instance.
If the directory contains the model you need, copy it to the /gz-data directory before inference. Reading models directly from /gz-models may slow down your inference or training, and the platform bears no responsibility if this occurs.
If the model you need is not in this directory, you can submit a ticket to request adding it. The staff will process it within 3-5 working days.