Kubernetes + GPU Setup

For Kubernetes clusters with GPU nodes. This is the most advanced setup: if you're not already running K8s, start with Docker instead.

Who this is for

You run a Kubernetes cluster with NVIDIA GPU nodes (on-prem, cloud, or hybrid). You want Immich Memories as a scheduled workload with GPU-accelerated encoding and optional music generation pods.

Architecture

┌─────────────────────────────────────────────────────┐
│ Kubernetes Cluster                                  │
│                                                     │
│  ┌──────────────────────────────────────────────┐   │
│  │ namespace: immich-memories                    │   │
│  │                                               │   │
│  │  ┌────────────┐  ┌────────────┐              │   │
│  │  │ Deployment  │  │ Job        │              │   │
│  │  │ (UI/API)    │  │ (batch     │              │   │
│  │  │ port 8080   │  │  generate) │              │   │
│  │  │ GPU: 1      │  │ GPU: 1     │              │   │
│  │  └────────────┘  └────────────┘              │   │
│  │                                               │   │
│  │  PVCs: config (1Gi), cache (20Gi),           │   │
│  │        output (50Gi)                          │   │
│  └──────────────────────────────────────────────┘   │
│                                                     │
│  ┌─────────────────┐                                │
│  │ GPU Operator     │  (manages nvidia.com/gpu)     │
│  └─────────────────┘                                │
└─────────────────────────────────────────────────────┘
         │
    ┌────┴─────────┐
    │ Immich server │ (same cluster or external)
    └──────────────┘

Kubernetes setup diagram

Prerequisites

NVIDIA GPU Operator installed:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --create-namespace

Storage class available for PersistentVolumeClaims
Immich accessible from the cluster (same namespace, different namespace, or external)

Deploy with Kustomize

The manifests live in deploy/kubernetes/ in the repo:

cd deploy/kubernetes

# Create the secret
cp secret.yaml.example secret.yaml
# Edit with your Immich URL and API key
vim secret.yaml

# Deploy
kubectl apply -k .

Or apply individually:

kubectl apply -f namespace.yaml
kubectl apply -f secret.yaml
kubectl apply -f configmap.yaml
kubectl apply -f pvc.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml

Access the UI

kubectl port-forward -n immich-memories svc/immich-memories 8080:80

Open http://localhost:8080. For production, set up an Ingress with your existing ingress controller.

GPU resource requests

The deployment requests 1 NVIDIA GPU. Adjust in the deployment manifest:

resources:
  requests:
    nvidia.com/gpu: "1"
    memory: "4Gi"
    cpu: "2"
  limits:
    nvidia.com/gpu: "1"
    memory: "8Gi"
    cpu: "4"

Node selection

Pods schedule on nodes with nvidia.com/gpu.present=true (set by the GPU Operator). If your cluster uses different labels:

nodeSelector:
  nvidia.com/gpu.present: "true"
  # Or your custom label:
  # gpu-node: "true"

For music generation pods (MusicGen/ACE-Step), you might want separate node affinity rules to schedule on nodes with more VRAM.

Batch jobs

Run one-off generation without the UI:

kubectl apply -f job.yaml
kubectl logs -n immich-memories -f job/immich-memories-generate

Storage

Default PVC sizes:

PVC	Size	Purpose
Config	1Gi	Config files, analysis cache database
Cache	20Gi	Downloaded video files, thumbnails
Output	50Gi	Generated videos

The cache PVC holds cache.db (analysis scores from all previous runs). This is the most valuable volume: losing it means re-analyzing your entire library. Back it up:

kubectl exec -n immich-memories deployment/immich-memories -- \
  immich-memories cache backup /output/cache-backup.db

Secrets management

Don't commit plain secrets to git. Use sealed-secrets or your cluster's secret management:

kubeseal --format=yaml < secret.yaml > sealed-secret.yaml
kubectl apply -f sealed-secret.yaml

Health monitoring

The /health endpoint returns JSON:

{
  "status": "ok",
  "immich_reachable": true,
  "last_successful_run": "2025-12-15T10:30:00",
  "version": "0.2.0"
}

Status is ok when Immich is reachable, degraded otherwise. The deployment includes liveness and readiness probes hitting this endpoint.

Point your monitoring (Uptime Kuma, Prometheus blackbox exporter, etc.) at /health on port 8080.

What works / what doesn't

Same as the Linux + NVIDIA setup: NVENC encoding, CUDA face detection, Taichi GPU titles. The Kubernetes layer adds scheduling, scaling, and PVC-based storage.

Performance

Same as bare-metal Linux + NVIDIA. Kubernetes overhead is negligible for this workload. The bottleneck is GPU encoding speed and Immich API download throughput, not container orchestration.

Who this is for​

Architecture​

Prerequisites​

Deploy with Kustomize​

Access the UI​

GPU resource requests​

Node selection​

Batch jobs​

Storage​

Secrets management​

Health monitoring​

What works / what doesn't​

Performance​

Further reading​