Your FastAPI agent runs in a Pod on Kubernetes. But Pods are ephemeral: when they restart, their filesystem disappears. This is a critical problem, because your agent has embedded vector search indexes, model checkpoints, and conversation logs. When the Pod crashes and Kubernetes creates a replacement, all of that data is gone.
PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs) solve this by decoupling storage from compute. Storage exists independent of Pods. When a Pod restarts, it reconnects to the same storage and your agent resumes with all previous state intact.
This lesson teaches you to provision persistent storage manually, understand the abstraction that makes Kubernetes storage work, and configure your Pods to use that storage reliably.
Let's see what happens without persistent storage.
Create a simple Pod that writes data to its local filesystem using an emptyDir volume:
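A minimal sketch of such a Pod (the name `ephemeral-demo`, the image, and the file paths are placeholders, not necessarily the lesson's exact manifest):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-demo
spec:
  containers:
  - name: writer
    image: busybox:1.36
    # Write a file into the Pod-local volume, then keep the Pod alive
    command: ["sh", "-c", "echo 'important agent state' > /data/state.txt && sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /data
  volumes:
  - name: scratch
    emptyDir: {}   # temporary storage, deleted with the Pod
```

Save it as `ephemeral-demo.yaml` and apply it with `kubectl apply -f ephemeral-demo.yaml`.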
Output:
Check that the file exists inside the Pod:
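One way to check, assuming the Pod is named `ephemeral-demo` and wrote to `/data/state.txt` (both are placeholder names from this sketch; this requires a running cluster):

```shell
kubectl exec ephemeral-demo -- cat /data/state.txt
```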
Output:
Now delete the Pod:
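Assuming the placeholder Pod name `ephemeral-demo`:

```shell
kubectl delete pod ephemeral-demo
```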
Output:
The data is gone forever. emptyDir (temporary storage) survives container restarts within a Pod, but it is deleted as soon as the Pod itself is removed. For embeddings, model weights, and conversation history, you need storage that survives Pod restarts.
Kubernetes separates storage concerns into two layers:
PersistentVolume (PV): The infrastructure—a chunk of storage that exists in your cluster. A cluster administrator provisions PVs from available storage (local disk, network storage, cloud volumes). PVs are cluster-level resources.
PersistentVolumeClaim (PVC): The request—a developer specifies "I need 10GB of storage with read-write access." Kubernetes finds a matching PV and binds them together. PVCs are namespace-scoped.
This abstraction parallels the CPU/memory model: workloads declare resource requests, and the cluster satisfies them from available capacity, so developers never manage the underlying hardware directly.
Think of it like renting office space: a building owner (the cluster administrator) provisions offices (PVs), and a company (your workload) signs a lease (a PVC) for an office matching its size and access requirements.
When the company moves to a different office building (Pod restarts), the same office (PV) still exists. A new company can occupy it, or the same company can return to the same office after relocation.
Let's create a PV manually. We'll use hostPath—storage backed by a directory on the Kubernetes node. This is suitable for learning and single-node clusters like Docker Desktop Kubernetes.
First, create a directory for storage (Docker Desktop mounts your local filesystem):
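For example (the file contents are an assumption; the `/tmp/k8s-data/test.txt` path matches the file the lesson's Pod reads later):

```shell
mkdir -p /tmp/k8s-data
echo "hello from persistent storage" > /tmp/k8s-data/test.txt
ls -l /tmp/k8s-data
```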
Output:
Now create a PersistentVolume that points to that directory:
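A sketch of such a PV (the name `agent-pv`, the 1Gi capacity, and the `manual` storageClassName are assumptions; the hostPath and Delete reclaim policy match what the lesson inspects next):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: agent-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: manual   # opt out of dynamic provisioning
  hostPath:
    path: /tmp/k8s-data      # the directory created on the node
```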
Apply this manifest:
Output:
Check that the PV was created:
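Listing PVs (a cluster-level resource, so no namespace flag is needed):

```shell
kubectl get pv
```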
Output:
Notice the STATUS is Available: the PV exists but is not yet bound to any PVC. The RECLAIM POLICY of Delete means that when the bound PVC is deleted, this PV will be deleted too (the other options are Retain and the deprecated Recycle).
A PVC is a request for storage. Create a PVC that claims the PV we just created:
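A matching claim might look like this (the name `agent-pvc` is a placeholder; the storageClassName, access mode, and size must be compatible with the PV for binding to happen):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: agent-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: manual   # must match the PV's storageClassName
  resources:
    requests:
      storage: 1Gi           # the PV's capacity must be >= this request
```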
Apply this manifest:
Output:
Check that the PVC was created and is bound:
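PVCs are namespace-scoped, so this lists the claims in your current namespace:

```shell
kubectl get pvc
```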
Output:
Check the PV status again:
Output:
The PV is now Bound to the PVC. They're connected. The binding was automatic based on three criteria: the PV's capacity is at least what the PVC requested, the access modes are compatible, and the storageClassName values match.
Now create a Pod that mounts this PVC:
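A sketch of the Pod (the name `agent-pod` and image are placeholders; it prints the test file from the mounted volume so the logs confirm the mount worked):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: agent-pod
spec:
  containers:
  - name: agent
    image: busybox:1.36
    # Read the file that was written on the host, then stay running
    command: ["sh", "-c", "cat /data/test.txt && sleep 3600"]
    volumeMounts:
    - name: agent-storage
      mountPath: /data
  volumes:
  - name: agent-storage
    persistentVolumeClaim:
      claimName: agent-pvc   # binds this Pod to the claim, not to a specific PV
```

Apply it with `kubectl apply -f agent-pod.yaml`, then inspect the logs with `kubectl logs agent-pod`.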
Apply this manifest:
Output:
Check the logs to confirm the Pod mounted the storage successfully:
Output:
The Pod successfully read the file we created in /tmp/k8s-data/test.txt earlier. The storage persists across container restarts because it's backed by the host filesystem, not the container's ephemeral layer.
Delete the Pod and recreate it:
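Assuming the placeholder names from this sketch (`agent-pod`, saved as `agent-pod.yaml`):

```shell
kubectl delete pod agent-pod
kubectl apply -f agent-pod.yaml
```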
Output:
Output:
Check the logs again:
Output:
The data is still there. The storage survived the Pod deletion and recreation. This is the core benefit of PersistentVolumes: data outlives container instances.
Creating PVs manually doesn't scale. In production, you use StorageClasses to provision PVs dynamically.
A StorageClass defines: the provisioner (the plugin that creates the underlying storage), provisioner-specific parameters (such as disk type), the reclaimPolicy applied to dynamically created PVs, and the volumeBindingMode (whether a PV is created immediately or only when the first Pod consumes the claim).
First, check what StorageClasses are available in your cluster:
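```shell
kubectl get storageclass
```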
Output:
Docker Desktop comes with a default StorageClass. Now create a PVC that uses this StorageClass (no PV needed—it's created automatically):
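A sketch of a dynamically provisioned claim (the name `dynamic-pvc` and size are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # No storageClassName: the cluster's default StorageClass is used,
  # and its provisioner creates a matching PV automatically.
```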
Apply this manifest:
Output:
Check the PVC:
Output:
Automatic PV creation: the provisioner created a PV automatically and bound the PVC to it. Notice the PV name is generated (pvc-4a2b1c9d...). You no longer need to create PVs manually.
Create a Pod using this dynamically-provisioned PVC:
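A sketch under assumed names (`dynamic-agent`, the claim name `dynamic-pvc`); each start appends a timestamp to a log file on the volume, so repeated entries in the logs demonstrate persistence:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dynamic-agent
spec:
  containers:
  - name: agent
    image: busybox:1.36
    # Append this start time to a persistent log, print the full history
    command: ["sh", "-c", "date >> /data/starts.log && cat /data/starts.log && sleep 3600"]
    volumeMounts:
    - name: storage
      mountPath: /data
  volumes:
  - name: storage
    persistentVolumeClaim:
      claimName: dynamic-pvc
```

Apply it with `kubectl apply -f dynamic-agent.yaml` and check with `kubectl logs dynamic-agent`.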
Apply and check logs:
Output:
Dynamic provisioning eliminates manual PV management. Developers just declare PVCs with desired storage size and access mode; the provisioner handles infrastructure provisioning.
PersistentVolumes support three access modes:
ReadWriteOnce (RWO): The volume can be mounted as read-write by a single node (all Pods scheduled to that node, and all their containers, can read and write). The most restrictive of the three modes.
ReadOnlyMany (ROX): The volume can be mounted as read-only by many Pods. Multiple readers, no writers allowed.
ReadWriteMany (RWX): The volume can be mounted as read-write by many Pods simultaneously. Requires network storage (not hostPath).
Create a read-only PVC for agent embeddings:
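A sketch (the name `embeddings-pvc` and the 5Gi size are placeholders; note that the backing PV must also list ReadOnlyMany among its access modes):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: embeddings-pvc
spec:
  accessModes:
  - ReadOnlyMany        # many Pods may mount it, none may write
  resources:
    requests:
      storage: 5Gi
```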
This PVC can be mounted by multiple inference Pods simultaneously. Because every mount is read-only, no inference Pod can accidentally overwrite the embeddings; writes are typically handled through a separate writable claim.
When you delete a PVC, what happens to the underlying PV? The reclaim policy controls this:
Delete: The PV is deleted when the PVC is deleted. Storage is freed immediately. Suitable for dynamic provisioning where storage is cheap.
Retain: The PV persists after PVC deletion. A cluster admin must manually delete the PV or recycle it. Suitable for important data where you want manual verification before deletion.
Recycle (deprecated): The PV is wiped and made available for reuse. It is avoided in production due to data security concerns.
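The reclaim policy lives on the PV, and you can change it on an existing PV, for example to keep a dynamically provisioned volume after its claim is deleted (the PV name below is a placeholder; use the generated name from `kubectl get pv`):

```shell
kubectl patch pv pvc-4a2b1c9d-example -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
```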
Here's a realistic Pod configuration for an agent that stores embeddings and checkpoints:
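A sketch under assumed names (the PVC names `embeddings-pvc` and `checkpoints-pvc`, the mount paths, and the image are placeholders, not the lesson's exact manifest):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-agent
spec:
  containers:
  - name: agent
    image: my-registry/fastapi-agent:latest   # placeholder image
    ports:
    - containerPort: 8000
    volumeMounts:
    - name: embeddings
      mountPath: /app/embeddings
      readOnly: true                 # shared vector index, never written here
    - name: checkpoints
      mountPath: /app/checkpoints    # read-write model checkpoints
  volumes:
  - name: embeddings
    persistentVolumeClaim:
      claimName: embeddings-pvc
  - name: checkpoints
    persistentVolumeClaim:
      claimName: checkpoints-pvc
```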
When this Pod runs, Kubernetes mounts both volumes before starting the container, so after any crash or restart the agent reattaches to the same embeddings and checkpoints. Your agent continues serving requests without recomputing embeddings from scratch.
Setup: You're designing persistent storage for a multi-agent system. One agent computes and caches vector embeddings. Five other agents need read-only access to those embeddings. A background service periodically updates the embeddings.
Challenge Prompts:
Ask AI: "Design a PVC and access mode strategy for this scenario: one agent computes and caches vector embeddings, five other agents need read-only access to those embeddings, and a background service periodically updates them. What access modes and binding strategy should I use? Should the embeddings and generators use separate PVCs?"
Follow up: "The embedding generator needs to update embeddings without downtime. My inference Pods must continue serving. What reclaim policy and update strategy would work best? Should I use ReadOnlyMany or a different approach?"
Then: "Write a Kubernetes manifest for this architecture. Include the PVC for embeddings, the PVC for the generator (if separate), and Pod definitions for one inference Pod and the generator Pod. Ensure the inference Pod includes volume mounts for the embeddings."