GPU as a Service

Dedicated NVIDIA GPUs via VMs or Kubernetes

Access high-performance NVIDIA GPUs via GPU Passthrough on your VMs, or via the NVIDIA GPU Operator on your Kubernetes clusters. Two modes, one hardware catalog.


Four NVIDIA families for each workload

| GPU | Memory | ECC | Performance |
|---|---|---|---|
| L40S | 48 GB GDDR6 | Included | INT8: 733 TOPS; FP32: 91.6 TFLOPS |
| A100 | 80 GB HBM2e | Included | INT8: 624 TOPS; FP32: 19.5 TFLOPS |
| H100 | 80 GB HBM2e | Included | INT8: 3,026 TOPS; TF32 Tensor: 756 TFLOPS (both with sparsity) |
| RTX PRO 6000 | 96 GB GDDR7 | Included | FP4: 3.7 PFLOPS; FP32: 117 TFLOPS |

GPU on VM or GPU on Kubernetes

| | GPU on VM | GPU on Kubernetes |
|---|---|---|
| Access mode | Exclusive, PCI passthrough | Shared, device plugin |
| Isolation | 1 GPU = 1 VM (dedicated) | Scheduling orchestrated by Kubernetes |
| Performance | Native (passthrough) | Native (device plugin) |
| NVIDIA drivers | Manual, via cloud-init | Automatic (GPU Operator) |
| Scaling | Vertical only | Horizontal and vertical |
| Sharing between workloads | No | Yes, between pods (whole GPUs only) |

Ready in a few lines of YAML

On a VM

Add a gpus[] field to your VMInstance spec. The GPU is attached via PCI passthrough, which gives the VM direct, exclusive access to the hardware. To attach several GPUs, repeat the entry in the list.

```yaml
kind: VMInstance
spec:
  instanceType: u1.2xlarge
  gpus:
    # One entry per GPU; repeat the entry to attach several GPUs.
    - name: "nvidia.com/AD102GL_L40S"
```
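Since a multi-GPU VM is declared by repeating the gpus entry, the manifest can also be generated. The following is a minimal sketch: vm_gpu_spec is a hypothetical shell helper, not part of any Hikube tooling, and it assumes all attached GPUs share the same device name. It prints the same VMInstance fragment as the example above, with one entry per GPU.

```bash
# Hypothetical helper, not part of any Hikube tool: print a VMInstance
# fragment that attaches <count> GPUs of the same model via PCI passthrough.
# Usage: vm_gpu_spec <instance-type> <gpu-device-name> <count>
vm_gpu_spec() {
  local instance_type="$1" gpu_name="$2" count="$3" i
  # Each passthrough GPU is a whole device, so only positive integers make sense.
  if ! [[ "$count" =~ ^[1-9][0-9]*$ ]]; then
    echo "vm_gpu_spec: count must be a positive integer, got '$count'" >&2
    return 1
  fi
  printf 'kind: VMInstance\nspec:\n  instanceType: %s\n  gpus:\n' "$instance_type"
  # One list entry per GPU: repeating the entry is how multi-GPU is declared.
  for ((i = 0; i < count; i++)); do
    printf '    - name: "%s"\n' "$gpu_name"
  done
}
```

For example, vm_gpu_spec u1.4xlarge "nvidia.com/AD102GL_L40S" 2 prints a spec with two L40S entries. The device name for models other than the L40S must come from the documentation.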


On Kubernetes

Add a GPU node group to your cluster, then request the GPU in your pods via resources.limits. The GPU Operator manages the drivers automatically.

```yaml
kind: Kubernetes
spec:
  nodeGroups:
    # Node group name; each node in this group is created with the GPUs listed below.
    gpu-workers:
      instanceType: u1.xlarge
      gpus:
        - name: "nvidia.com/AD102GL_L40S"
```
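With the node group in place, each pod claims GPUs through resources.limits. As a sketch, assuming the GPU Operator advertises GPUs under the standard nvidia.com/gpu resource of the NVIDIA device plugin, the hypothetical helper gpu_pod_manifest below prints a one-shot Pod that runs nvidia-smi on the requested number of whole GPUs.

```bash
# Hypothetical helper: print a Pod manifest that runs nvidia-smi once on
# <count> whole GPUs, requested via resources.limits (nvidia.com/gpu).
# Usage: gpu_pod_manifest <pod-name> <image> [count]
gpu_pod_manifest() {
  local name="$1" image="$2" count="${3:-1}"
  # The device plugin allocates whole GPUs only: no zero, no fractions.
  if ! [[ "$count" =~ ^[1-9][0-9]*$ ]]; then
    echo "gpu_pod_manifest: count must be a positive integer, got '$count'" >&2
    return 1
  fi
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: Never
  containers:
    - name: ${name}
      image: ${image}
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: ${count}
EOF
}
```

Example, with a placeholder pod name and an example CUDA image tag (use one available in your registry): gpu_pod_manifest gpu-smoke-test nvidia/cuda:12.4.1-base-ubuntu22.04 1 | kubectl apply -f -, then read the result with kubectl logs gpu-smoke-test.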


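Recommended CPU/RAM ratio per GPU

| Instance type | RAM | GPU configuration and typical use |
|---|---|---|
| u1.xlarge | 16 GB | 1× L40S: development, prototyping |
| u1.2xlarge | 32 GB | 1× A100: fine-tuning, multi-model inference |
| u1.4xlarge | 64 GB | 1-2× A100: intensive ML training |
| u1.8xlarge | 128 GB | 4× H100: distributed training, LLMs |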

Confirm GPU access

On a VM

```bash
# SSH connection
virtctl ssh -i ~/.ssh/id_ed25519 ubuntu@gpu-workstation

# Inside the VM: check that the GPU is visible
nvidia-smi

# Detailed info
nvidia-smi \
  --query-gpu=name,memory.total,utilization.gpu \
  --format=csv
```
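For scripted checks, for example at the end of a provisioning run, nvidia-smi can print the same fields without header or units using --format=csv,noheader,nounits. The hypothetical gpu_summary helper below (a sketch, not an NVIDIA tool) reads that output, lists each GPU, totals the memory, and exits with an error when no GPU is visible. It assumes the fields are queried in the order name, memory.total, utilization.gpu.

```bash
# Hypothetical helper: summarize nvidia-smi CSV read from stdin.
# Expected input, one line per GPU, as produced by:
#   nvidia-smi --query-gpu=name,memory.total,utilization.gpu --format=csv,noheader,nounits
# i.e. "<name>, <memory in MiB>, <utilization in %>"
gpu_summary() {
  awk -F', *' '
    NF >= 3 {
      printf "GPU %d: %s, %d MiB, %d%% utilized\n", n, $1, $2, $3
      n++
      mem += $2
    }
    END {
      # Non-zero exit when no GPU line was seen, so callers can fail fast.
      if (n == 0) { print "gpu_summary: no GPU visible" > "/dev/stderr"; exit 1 }
      printf "%d GPU(s), %d MiB of GPU memory in total\n", n, mem
    }'
}
```

Run it inside the VM: nvidia-smi --query-gpu=name,memory.total,utilization.gpu --format=csv,noheader,nounits | gpu_summary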

On Kubernetes

```bash
# GPUs exposed per node
kubectl get nodes \
  -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'

# From a pod
kubectl exec -it <pod-name> -- nvidia-smi

# Allocated resources
kubectl describe node <gpu-node> \
  | grep -A5 "Allocated resources"
```
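The per-node query can feed a quick capacity total as well. In this sketch, count_cluster_gpus is a hypothetical helper that reads the two-column output of that query run with --no-headers, skips nodes showing <none> (no nvidia.com/gpu resource), and sums the rest. Allocatable counts the GPUs a node offers to the scheduler, not those currently free; kubectl describe node remains the way to see what is already allocated.

```bash
# Hypothetical helper: total the allocatable GPUs per node from the output of
#   kubectl get nodes --no-headers \
#     -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
# Nodes without the nvidia.com/gpu resource print "<none>" and are skipped.
count_cluster_gpus() {
  awk '
    $2 ~ /^[0-9]+$/ && $2 > 0 {
      print $1 ": " $2 " GPU(s) allocatable"
      nodes++
      gpus += $2
    }
    END { printf "%d GPU node(s), %d allocatable GPU(s) in total\n", nodes, gpus }'
}
```

Usage: kubectl get nodes --no-headers -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' | count_cluster_gpus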

Why the GPU cloud

The GPU, the accelerator of modern workloads

The CPU is designed to execute complex sequential tasks. The GPU, by contrast, is built for massive parallelism: thousands of simple cores working simultaneously on the same problem. This fundamental difference is what makes the GPU indispensable for training machine learning models, large-scale inference, 3D rendering and scientific computing.

Beyond raw computing power, this architecture is what makes it possible to run AI projects end to end: from experimentation and model training through to production deployment and large-scale operation.

Buying GPU hardware in-house means long investment cycles, capacity needs that are hard to anticipate, and rapid obsolescence: an H100 bought today is likely to be outdated within about three years. The GPU as a Service model gives you on-demand access to the latest generation of NVIDIA hardware, with capacity that scales with actual load and billing only for what you consume.

At Hikube, GPUs are hosted in Switzerland and accessible through standard APIs, with no lock-in and no proprietary agents. Whether your workload runs on an isolated VM or in a Kubernetes cluster shared between teams, it gets the same hardware.

Which GPU should I choose for my workload?

The rule of thumb: start with the L40S for inference, development and prototyping. It covers the vast majority of cases at a lower cost. Move to the A100 when you are training models in earnest (fine-tuning, large datasets). Reserve the H100 for truly demanding workloads: LLMs with billions of parameters, or distributed training across multiple nodes.

VM or Kubernetes, what's the best choice?

If your application isn't containerized, you need full access to the GPU, or you're prototyping, choose a virtual machine. It is simpler and faster to set up, and the GPU is entirely dedicated to you.

If you already orchestrate your workloads with Kubernetes, need automatic scaling, or share GPU resources between several teams, choose Kubernetes mode. The additional complexity is offset by the flexibility.

How much CPU and RAM should I plan per GPU?

Plan on 8 to 16 vCPUs per GPU. A u1.2xlarge (8 vCPU, 32 GB RAM) is a good starting point for a single GPU; for 4 H100 GPUs, go up to a u1.8xlarge (32 vCPU, 128 GB RAM). Undersizing the CPU creates data-preprocessing bottlenecks that cap GPU utilization.
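The 8 to 16 vCPU-per-GPU guideline lends itself to a quick sizing check. This sketch returns the smallest instance type from the recommended list that reaches the lower bound of 8 vCPUs per GPU. The vCPU counts of u1.2xlarge (8) and u1.8xlarge (32) come from the answer above; those of u1.xlarge (4) and u1.4xlarge (16) are assumptions extrapolated from the same 1 vCPU to 4 GB RAM ratio, so verify them against the instance catalog. pick_instance_type is a hypothetical helper.

```bash
# Hypothetical sizing helper: smallest instance type giving >= 8 vCPUs per GPU.
# vCPU counts: u1.2xlarge=8 and u1.8xlarge=32 are documented; u1.xlarge=4 and
# u1.4xlarge=16 are ASSUMED from the 1 vCPU : 4 GB RAM ratio of the others.
pick_instance_type() {
  local gpus="$1" min_vcpus_per_gpu=8 entry name vcpus
  if ! [[ "$gpus" =~ ^[1-9][0-9]*$ ]]; then
    echo "pick_instance_type: GPU count must be a positive integer" >&2
    return 1
  fi
  # Candidates in increasing size, as "name:vcpus".
  for entry in u1.xlarge:4 u1.2xlarge:8 u1.4xlarge:16 u1.8xlarge:32; do
    name="${entry%%:*}"
    vcpus="${entry##*:}"
    if (( vcpus >= gpus * min_vcpus_per_gpu )); then
      echo "$name ($vcpus vCPU, $(( vcpus / gpus )) vCPU per GPU)"
      return 0
    fi
  done
  echo "pick_instance_type: no listed type reaches $min_vcpus_per_gpu vCPU/GPU for $gpus GPU(s)" >&2
  return 1
}
```

With these assumptions, 1 GPU maps to u1.2xlarge, 2 GPUs to u1.4xlarge and 4 GPUs to u1.8xlarge, which matches the recommended table; 5 or more GPUs exceed the listed types.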

Should you manage NVIDIA drivers yourself?

On a VM, yes: you install the drivers with a cloud-init script on first boot. The documentation provides the full script, so it is a one-time operation.

On Kubernetes, no: the GPU Operator takes care of the drivers automatically on GPU nodes. You enable the add-on in the cluster manifest, and the rest is transparent.

Can I share a GPU between several jobs?

In VM mode, no: the GPU is entirely dedicated to the VM. In Kubernetes mode, the GPU Operator lets you allocate whole GPUs to different pods on the same node, but a pod cannot request a fraction of a GPU. To run several small jobs in parallel, the most efficient approach is Kubernetes, with multiple pods on a multi-GPU node.

Dedicated NVIDIA GPUs via VMs or Kubernetes

4 GPU models | 2 access modes | Up to 96 GB of GPU memory | Up to 3,700 TOPS

Four NVIDIA families for each workload

L40S

A100

H100

RTX PRO 6000

Getting-started advice

GPU on VM or GPU on Kubernetes

Ready in a few lines of YAML

On a VM

On Kubernetes

Recommended CPU/RAM ratio per GPU

Confirm GPU access

On a VM

On Kubernetes

The GPU, the accelerator of modern workloads

CPU vs GPU: the right tool for every task

Guaranteed data sovereignty

Capex-free access to the latest generation

Integration into your existing stack

Questions about GPU as a Service

Which GPU should I choose for my workload?

VM or Kubernetes, what's the best choice?

Can I share a GPU between several jobs?

Should you manage NVIDIA drivers yourself?
