USMAN’S INSIGHTS
AI ARCHITECT
© 2026 Muhammad Usman Akbar. All rights reserved.

The Undo Button: Velero for Kubernetes Backup and Restore

It happens faster than you can react. A junior developer runs `kubectl delete namespace production` instead of `kubectl delete namespace test-production`. In the two seconds before muscle memory kicks in and they hit Ctrl+C, Kubernetes has already begun terminating every pod, service, configmap, and secret in your production namespace.

Your database PersistentVolumeClaim survives because the storage class has `reclaimPolicy: Retain`. Small mercy. But the Task API deployment, the inference service configuration, the carefully tuned HPA settings, the secrets containing API keys for three external services: all gone. You stare at the terminal, trying to remember what was in that namespace. You haven't touched some of those configurations in months.

This is the moment you discover whether your disaster recovery strategy exists beyond good intentions. Do you have backups? When were they last tested? Can you restore to a specific namespace? How long until production is back?

Velero answers these questions before disaster strikes. It's a CNCF project that backs up Kubernetes resources and persistent volumes, stores them safely off-cluster, and restores them when you need them. This lesson teaches you to install Velero, configure scheduled backups with retention policies, implement database-aware hooks for consistency, and verify that restores actually work.


Installing Velero with MinIO for Local Development

In production, you'll use cloud object storage such as S3, GCS, or Azure Blob. For local development and testing, MinIO provides an S3-compatible storage backend that runs inside your cluster.

Step 1: Deploy MinIO

```bash
helm repo add minio https://charts.min.io/
helm install minio minio/minio \
  --namespace minio \
  --create-namespace \
  --set rootUser=minioadmin \
  --set rootPassword=minioadmin \
  --set mode=standalone \
  --set resources.requests.memory=512Mi \
  --set persistence.size=10Gi
```

Create a bucket for Velero backups:

```bash
kubectl run minio-client --rm -it --restart=Never \
  --image=minio/mc \
  --namespace=minio \
  --command -- /bin/sh -c "
    mc alias set myminio http://minio:9000 minioadmin minioadmin && \
    mc mb myminio/velero-backups
  "
```

Step 2: Install Velero

```bash
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm install velero vmware-tanzu/velero \
  --namespace velero \
  --create-namespace \
  --set configuration.backupStorageLocation[0].name=default \
  --set configuration.backupStorageLocation[0].provider=aws \
  --set configuration.backupStorageLocation[0].bucket=velero-backups \
  --set configuration.backupStorageLocation[0].config.region=minio \
  --set configuration.backupStorageLocation[0].config.s3ForcePathStyle=true \
  --set configuration.backupStorageLocation[0].config.s3Url=http://minio.minio:9000 \
  --set snapshotsEnabled=false \
  --set initContainers[0].name=velero-plugin-for-aws \
  --set initContainers[0].image=velero/velero-plugin-for-aws:v1.10.0 \
  --set initContainers[0].volumeMounts[0].mountPath=/target \
  --set initContainers[0].volumeMounts[0].name=plugins \
  --set credentials.useSecret=true \
  --set credentials.secretContents.cloud=$'[default]\naws_access_key_id=minioadmin\naws_secret_access_key=minioadmin'
```

Note the `$'...'` quoting on the last line: bash expands the `\n` escapes into real newlines, so the credentials file Velero receives is properly formatted. With plain single quotes, the secret would contain a literal `\n` and authentication would fail.

Verify Velero is running:

```bash
kubectl get pods -n velero
kubectl get backupstoragelocation -n velero
```

A backup storage location phase of `Available` confirms Velero can reach the MinIO bucket.
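Before trusting the schedule you are about to create, it is worth a one-off smoke test. Assuming some small namespace exists (here `default`), an on-demand backup exercises the full path from the Velero server to the MinIO bucket:

```bash
# One-off backup; --wait blocks until the backup completes or fails.
velero backup create smoke-test --include-namespaces default --wait

# Phase should read Completed; errors and warnings show up here too.
velero backup describe smoke-test

# Remove the test backup (and its data in the bucket) when done.
velero backup delete smoke-test --confirm
```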


Understanding Velero's CRD Architecture

Velero introduces four core Custom Resource Definitions that drive its backup logic:

  1. BackupStorageLocation: Where backups live (S3/GCS bucket configs).
  2. Backup: A point-in-time snapshot of resources and PVCs.
  3. Schedule: Automated recurring backups based on a cron expression.
  4. Restore: Recovering Kubernetes objects and data from a specific Backup.
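For orientation, here is what a minimal on-demand Backup object looks like when written by hand. The name is illustrative, and the `production` namespace matches this chapter's examples:

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: manual-backup-example   # illustrative name
  namespace: velero             # Backup objects live in Velero's own namespace
spec:
  includedNamespaces:
    - production
  snapshotVolumes: true
  storageLocation: default
  ttl: 168h                     # keep this one-off backup for 7 days
```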

Creating a Production Schedule with 30-Day Retention

A 24-hour Recovery Point Objective (RPO) means you accept losing at most a day of changes, so daily backups suffice; 30-day retention means you can still recover from problems noticed weeks later.

Create the Schedule manifest (task-api-schedule.yaml):

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: task-api-production-daily
  namespace: velero
  labels:
    app: task-api
    environment: production
spec:
  schedule: "0 2 * * *"  # 2 AM UTC daily
  useOwnerReferencesInBackup: false
  template:
    includedNamespaces:
      - production
    includedResources:
      - "*"
    excludedResources:
      - events
      - pods
      - replicasets
    snapshotVolumes: true
    storageLocation: default
    ttl: 720h  # 30 days = 720 hours
```

Apply and verify:

```bash
kubectl apply -f task-api-schedule.yaml
velero schedule describe task-api-production-daily
```

Understanding TTL Retention

The `ttl` field sets how long each backup is kept; Velero automatically deletes expired backups and their associated data.

  • 168h = 7 days
  • 720h = 30 days
  • 2160h = 90 days
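Because the `ttl` field takes a Go-style hour duration rather than days, day-based retention policies have to be converted by hand. A one-line shell helper (a convenience sketch, not part of Velero) avoids arithmetic slips:

```bash
# Helper (not part of Velero): convert a retention period in days
# to the hour-based TTL string the Schedule's ttl field expects.
days_to_ttl() {
  echo "$(( $1 * 24 ))h"
}

days_to_ttl 30   # prints 720h
```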

Implementing Backup Hooks for Database Consistency

If Velero backs up a PostgreSQL volume while active, the backup might contain inconsistent data. Hooks quiesce the database before snapshotting.

Complete Schedule with Database Hooks

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: task-api-production-daily
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
      - production
    snapshotVolumes: true
    ttl: 720h
    hooks:
      resources:
        - name: postgres-consistency
          includedNamespaces:
            - production
          labelSelector:
            matchLabels:
              app: postgres
          pre:
            - exec:
                container: postgres
                command:
                  - /bin/bash
                  - -c
                  - |
                    psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c "CHECKPOINT;"
                    pg_dump -U "$POSTGRES_USER" -d "$POSTGRES_DB" > /var/lib/postgresql/data/backup.sql
                onError: Fail
                timeout: 120s
          post:
            - exec:
                container: postgres
                command:
                  - /bin/bash
                  - -c
                  - |
                    rm -f /var/lib/postgresql/data/backup.sql
                onError: Continue
                timeout: 30s
```

Why Pre-Backup Hooks Matter

The `CHECKPOINT` command forces PostgreSQL to write all dirty buffers to disk, ensuring the volume snapshot is consistent. The `pg_dump` creates an additional SQL backup inside the container, providing application-level recovery if the raw volume snapshot fails.
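Hooks can also be declared on the workload itself instead of in the Schedule: Velero reads `pre.hook.backup.velero.io/*` annotations from pods, which keeps the quiesce logic next to the database it protects. A minimal sketch, assuming the same `postgres` container as above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres-0            # illustrative; normally set via a StatefulSet's pod template
  namespace: production
  annotations:
    pre.hook.backup.velero.io/container: postgres
    pre.hook.backup.velero.io/command: '["/bin/bash", "-c", "psql -U $POSTGRES_USER -d $POSTGRES_DB -c CHECKPOINT"]'
    pre.hook.backup.velero.io/on-error: Fail
    pre.hook.backup.velero.io/timeout: 120s
spec:
  containers:
    - name: postgres
      image: postgres:16
```

In practice you would place these annotations in the StatefulSet's pod template rather than on a bare Pod, so they survive pod restarts.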


Restore Procedure: Step-by-Step Recovery

When disaster strikes, you need a reliable restore procedure.

Step 1: Verify Available Backups

```bash
velero backup get
```

Step 2: Inspect a Specific Backup

```bash
velero backup describe task-api-production-daily-20241230020000 --details
```

Step 3: Create a Restore

```bash
velero restore create restore-production-20241230 \
  --from-backup task-api-production-daily-20241230020000 \
  --include-namespaces production \
  --restore-volumes=true
```
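If you want to rehearse a recovery without touching the original namespace, Velero can remap namespaces during restore. This variant restores the backup into a scratch namespace (`production-restored` is an illustrative name) for inspection:

```bash
velero restore create rehearsal-20241230 \
  --from-backup task-api-production-daily-20241230020000 \
  --namespace-mappings production:production-restored \
  --restore-volumes=true
```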

Step 4: Monitor Restore Progress

```bash
velero restore describe restore-production-20241230 --details
# Re-run describe until Phase is Completed; stream warnings and errors with:
velero restore logs restore-production-20241230
```

Step 5: Verify the Restore

Check that resources are restored:

```bash
kubectl get deployments -n production
kubectl get services -n production
kubectl get pvc -n production
```

Step 6: Validate Application Functionality

```bash
kubectl port-forward svc/task-api 8000:8000 -n production &
curl http://localhost:8000/health
```
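Immediately after the port-forward starts, the first `curl` can race the tunnel coming up. A small retry wrapper (generic shell, an illustrative helper rather than anything Velero provides) makes the validation step deterministic:

```bash
# Generic retry helper: run a command up to N times, pausing 2 seconds
# between attempts, and succeed as soon as the command succeeds.
retry() {
  attempts=$1; shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep 2
  done
  return 1
}

# Example: tolerate the port-forward tunnel taking a moment to come up.
# retry 10 curl -fsS http://localhost:8000/health
```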

Try With AI

These prompts help you apply Velero patterns to your backup requirements.

Prompt 1 (Backup Strategy Design):

```text
I'm designing a backup strategy for my cluster:
- Task API (FastAPI + PostgreSQL)
- Inference Service (stateless, connects to external LLMs)
- Redis cache (ephemeral)

Help me decide:
1. Which components need Velero backups vs just Helm charts?
2. What backup frequency for each?
3. Which need database hooks?
```

Prompt 2 (Multi-Cluster Backup Architecture):

```text
I have three clusters: dev, staging, production. I want to:
- Back up production daily to S3
- Keep 7-day retention in dev, 30-day in staging, 90-day in production

Design the BackupStorageLocation and Schedule configurations for this setup.
```

Prompt 3 (Compliance-Driven Verification):

```text
My company needs to comply with SOC 2 Type II. The auditor asks:
1. How do you ensure backups are encrypted at rest?
2. How do you verify backup integrity?
3. Can you prove restore procedures work?

Show me how to configure Velero to answer these questions with evidence.
```

Safety Note

Restore operations can overwrite existing resources. Always use `--include-namespaces` to target specific namespaces, and test restores in non-production environments first. Passing `-o yaml` to `velero restore create` prints the Restore object without submitting it to the cluster, so you can review what would be created before running the real restore.