Best Practices

📄️ Periodic Data Saving During Training

During training, issues such as a GPU dropping off the bus, GPU failure, network fluctuations, excessive traffic load, network disconnection, hardware failure, a machine crash, or the training process being terminated by the system at batch N due to an out-of-memory (OOM) error can interrupt a run. If training progress is not saved when such an issue occurs, previous results may be lost and training has to restart from scratch. This wastes valuable time and computing resources and increases the research and development workload.
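The page itself does not prescribe a specific saving mechanism; as an illustration only, a minimal sketch of periodic checkpointing in a PyTorch-style training loop might look like the following, where `model`, `optimizer`, `dataloader`, and the `SAVE_EVERY` interval are hypothetical placeholders:

```python
# Minimal sketch: save a checkpoint every SAVE_EVERY steps so training
# can resume after a crash instead of restarting from scratch.
import os
import torch

def save_checkpoint(model, optimizer, step, path="checkpoints"):
    os.makedirs(path, exist_ok=True)
    torch.save(
        {
            "step": step,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        os.path.join(path, f"step_{step}.pt"),
    )

SAVE_EVERY = 500  # hypothetical interval; tune to run length and storage budget

for step, batch in enumerate(dataloader):
    loss = model(batch).mean()   # placeholder forward pass and loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if step % SAVE_EVERY == 0:
        save_checkpoint(model, optimizer, step)
```

On restart, the latest checkpoint can be loaded with `torch.load` to restore the model and optimizer states and continue from the saved step.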