The Metrics tab shows time-series charts for GPU, VRAM, temperature, CPU, memory, network, and storage, so you can confirm that hardware is being used efficiently and spot waste while the job runs or after it finishes.
[Figure: Job metrics tab with GPU, VRAM, temperature, CPU, memory, network, and storage charts]

Charts

Job metrics use the same chart families as workspaces. The list below summarizes them; see Workspace metrics for the full chart reference, including threshold values, axis units, and interpretation rules.
  • GPU utilization (0–100%): Compute usage. Below 30% can indicate over-provisioned resources.
  • VRAM usage (GB): Video memory consumption. Above 95% risks Out of Memory (OOM) errors.
  • Temperature (°C): Sustained values above 85°C can indicate thermal throttling.
  • CPU and memory: Allocated CPU cores and system RAM in use.
  • Network I/O: Data transferred in (Rx) and out (Tx).
  • Storage: Temporary, Cluster storage, and Object storage usage for volumes attached to the job.
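
The thresholds in the list above can be expressed as a small check. The following is a minimal sketch for reasoning about readings you have exported or collected yourself. The `MetricSample` type and its field names are hypothetical and do not reflect a VESSL API or schema. The temperature check here looks at a single reading, whereas the chart guidance refers to sustained values, so treat a single hit as a hint only.

```python
from dataclasses import dataclass

# Thresholds taken from the chart descriptions above.
GPU_UTIL_LOW_PCT = 30.0
VRAM_HIGH_PCT = 95.0
TEMP_HIGH_C = 85.0


@dataclass
class MetricSample:
    """One hypothetical reading; field names are illustrative, not a VESSL schema."""
    gpu_util_pct: float
    vram_used_gb: float
    vram_total_gb: float
    temperature_c: float


def check_sample(sample: MetricSample) -> list[str]:
    """Return one warning string for each threshold the sample crosses."""
    warnings = []
    if sample.gpu_util_pct < GPU_UTIL_LOW_PCT:
        warnings.append("GPU utilization below 30%: resources may be over-provisioned")
    # The VRAM chart is in GB, so convert to a percentage of total VRAM first.
    vram_pct = 100.0 * sample.vram_used_gb / sample.vram_total_gb
    if vram_pct > VRAM_HIGH_PCT:
        warnings.append("VRAM above 95%: risk of OOM errors")
    if sample.temperature_c > TEMP_HIGH_C:
        warnings.append("Temperature above 85°C: possible thermal throttling")
    return warnings
```

For example, `check_sample(MetricSample(12.0, 78.0, 80.0, 88.0))` returns three warnings, because the reading is below the GPU threshold and above both the VRAM and temperature thresholds.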

Time range

Use the time range selector at the top of the charts to choose a window: 1h (default), 6h, 12h, 1d, or 7d. All charts update together.

Idle detection

Jobs whose GPU utilization averages below 30% over a one-hour window are flagged as idle on the Home dashboard. Review the metrics and decide whether to terminate the job to stop billing or let it continue.
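
To make the idle rule concrete, here is a minimal sketch of a one-hour average check over GPU utilization samples. It only illustrates the rule as described above; the platform's actual sampling interval and windowing logic are not documented here. The `is_idle` function and the `(timestamp, gpu_util_pct)` sample format are hypothetical, and returning `False` when the window holds no samples is an assumption.

```python
from datetime import datetime, timedelta

IDLE_THRESHOLD_PCT = 30.0          # From the idle rule above.
IDLE_WINDOW = timedelta(hours=1)   # One-hour averaging window.


def is_idle(samples: list[tuple[datetime, float]], now: datetime) -> bool:
    """Return True if mean GPU utilization over the last hour is below 30%.

    `samples` is a hypothetical list of (timestamp, gpu_util_pct) pairs.
    """
    window_start = now - IDLE_WINDOW
    # Keep only readings that fall inside the last hour.
    recent = [util for ts, util in samples if window_start <= ts <= now]
    if not recent:
        return False  # Assumption: no data in the window means no idle flag.
    return sum(recent) / len(recent) < IDLE_THRESHOLD_PCT
```

Because the rule uses an average, a job with short bursts of high utilization can still be flagged if the rest of the hour is mostly idle.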

See also