Containers in workspaces and jobs are ephemeral. To make a dataset, model checkpoint, or large config file available to your training scripts, put it in a persistent volume and mount that volume into the workload. This guide walks through the practical ways to get data into a volume.
Do not encode data into the job command itself. Pasting a gzip+base64 blob, a long heredoc, or other large payloads into --cmd is rejected by the API: command bodies over 256 KiB, environment variable values over 8 KiB, or more than 128 environment variable pairs return a 4xx error. Use one of the patterns below instead.

## At a glance
The right approach depends on which kind of volume you target.

| Volume kind | How to load data |
|---|---|
| Object storage | Upload directly from your machine with vesslctl volume upload, or any S3-compatible client via vesslctl volume token. |
| Cluster storage | Mount the volume into a workspace, then bring the data into the mount path from inside that workspace. |
## Object storage — upload from your machine
vesslctl volume upload is the primary path. Files stream from your machine straight to S3, so transfer size is limited only by your network and storage quota.
### Pick or create an Object storage volume
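If you create the volume from the CLI, the invocation might look like the following. This is a hedged sketch: the create subcommand and the --kind flag are assumptions, and the volume name and team slugs are placeholders; only --teams is documented on this page.

```shell
# Hypothetical sketch — assumes a `create` subcommand and a --kind flag exist
# under `vesslctl volume`; the volume name and team slugs are placeholders.
# --teams is required for Object storage volumes.
vesslctl volume create my-datasets \
  --kind object \
  --teams ml-research,ml-infra
```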
--teams controls which teams can mount this volume; it is required for Object storage.

### Upload local files
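An upload run might look like this (a sketch; the volume slug and local path are placeholders, and the positional-argument order is an assumption):

```shell
# Preview the file list without transferring anything
vesslctl volume upload my-datasets ./data/imagenet-subset --dry-run

# Real upload; --overwrite replaces remote keys that already exist —
# without it, identical keys are skipped
vesslctl volume upload my-datasets ./data/imagenet-subset --overwrite
```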
--dry-run previews the file list without transferring. --overwrite replaces existing remote keys; without it, identical keys are skipped. See vesslctl volume upload for the full flag reference.

### Verify and mount
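One way to verify the upload is to list the keys with an S3-compatible client, using a scoped credential from vesslctl volume token (a sketch; the credential-export step and the endpoint variable are assumptions):

```shell
# Mint temporary S3 credentials for the volume
# (the exact output format is an assumption)
vesslctl volume token my-datasets

# With the returned credentials exported, list the uploaded keys
aws s3 ls s3://my-datasets/ --recursive --endpoint-url "$S3_ENDPOINT"
```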
Once the upload is verified, mount the volume into a workload with --object-volume <volume-slug>:/shared.

## Cluster storage — load through a workspace
vesslctl volume upload does not support Cluster storage volumes. Instead, mount the Cluster storage volume into a workspace and bring the data into the mount path from inside that workspace.
### Create a workspace that mounts the volume
Mount the Cluster storage volume at a clear path under Persistent volume, for example /data. Any container image with the tools you need (curl, wget, aws, huggingface-cli, git-lfs, …) works. To minimize hourly cost while you move data, pick a CPU-only spec from vesslctl resource-spec list.

### Connect to the workspace
Wait until the workspace is running, then connect over SSH or in JupyterLab. See Connect to a workspace.

Once connected, cd /data (or whatever mount path you chose). Anything you write below this path lands in the Cluster storage volume and persists after the workspace is paused or terminated.

### Bring the data in (pick a pattern below)
Several patterns work. Pick the one that matches where the data lives.
#### Pattern A — Pull from the public internet
The simplest and most common case: the data is already at a public (or token-authenticated) URL.

aria2c -x 16 parallelizes HTTP downloads, and rclone copy handles cloud-storage providers with built-in retry and verification.
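A sketch of both tools, run from inside the workspace (the URL and bucket path are placeholders, and an rclone remote named s3 is assumed to be configured already):

```shell
cd /data

# Parallel HTTP download over 16 connections
aria2c -x 16 https://example.com/datasets/train.tar.gz

# Cloud storage: rclone retries on failure and verifies what it copied
# (assumes an rclone remote named "s3" is already configured)
rclone copy s3:my-bucket/datasets/imagenet ./imagenet --progress
```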
#### Pattern B — Push from your laptop over SSH
When the data is only on your laptop and you want to skip the round trip through the public internet, use SSH to copy directly into the mount path.

rsync is preferable for anything multi-gigabyte: it resumes after a dropped connection (--partial) and only retransmits changed files on a re-run.
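Run from the laptop side; the SSH port, user, host, and paths below are placeholders that come from your workspace's connection details:

```shell
# Resumable, incremental copy into the workspace mount path
# (port, user, host, and paths are placeholders)
rsync -avz --partial --progress \
  -e "ssh -p 2222" \
  ./datasets/imagenet/ vessl@workspace-host:/data/imagenet/

# scp is fine for small one-off files
scp -P 2222 ./config.yaml vessl@workspace-host:/data/
```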
#### Pattern C — Stage through Object storage
When you want a one-time copy from your laptop into Cluster storage on a different cluster (or from one cluster to another), use Object storage as a portable intermediate. Object storage is reachable from any cluster.

#### Pattern D — Open a custom HTTP port
Need a browser drag-and-drop, a sync server, or a temporary webhook into the workspace? Open a custom HTTP or TCP port when you create the workspace (see Workspace ports) and serve directly from the mount path.

## Anti-pattern: do not embed data in --cmd
A pattern that looks tempting — especially to LLM coding agents — is to gzip+base64 a dataset into a single shell line and pass it via --cmd:
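For illustration only, a sketch of the anti-pattern (the job subcommand shown here is an assumption, and the base64 blob is elided):

```shell
# DON'T DO THIS: inlining a dataset into the command body.
# A payload of this shape quickly exceeds the 256 KiB command cap and is
# rejected with a 4xx. (`vesslctl job create` is illustrative here.)
vesslctl job create \
  --cmd "echo 'H4sIAAAA...' | base64 -d | gunzip > /tmp/data.csv && python train.py /tmp/data.csv"
```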
Job.command is capped at 256 KiB, each environment variable value at 8 KiB, and the total environment variable count at 128. Requests beyond these thresholds return 4xx.
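If a script builds job submissions automatically, a pre-flight size check against the command cap is cheap insurance (a sketch; the payload below is a stand-in for the real command body):

```shell
# Compare a would-be --cmd payload against the 256 KiB Job.command cap.
LIMIT=$((256 * 1024))                  # 262144 bytes
payload='python train.py --epochs 3'   # stand-in for the real command body

SIZE=$(printf '%s' "$payload" | wc -c | tr -d ' ')
if [ "$SIZE" -gt "$LIMIT" ]; then
  echo "command too large: ${SIZE} bytes (limit ${LIMIT})" >&2
  exit 1
fi
echo "command size ok: ${SIZE} bytes"
```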
Always upload the data to a volume (this page) and mount it instead.
## Next steps
- Understand storage — Cluster vs Object storage characteristics and pricing.
- Create a volume — provision a new Object storage volume from the console.
- Create a workspace — attach volumes during workspace creation.
- vesslctl volume — full CLI reference for upload, download, token, and management.
