How to Deploy Globally Without a Single Point of Failure
Multi-Cluster Deployments

So far you've deployed your FastAPI agent to a single Kubernetes cluster. That works for development. But production systems need redundancy: if one cluster fails, your agent keeps running on another. If you need to test a new version before rolling out to all users, you deploy to a staging cluster first. This chapter teaches you to manage multiple clusters from one ArgoCD instance using a hub-spoke architecture.

In hub-spoke, ArgoCD (the hub) manages deployment to many Kubernetes clusters (the spokes). You define your application once in Git. ArgoCD syncs that same application to cluster 1, cluster 2, cluster 3—each with different configurations. One Git repository becomes the source of truth for your entire infrastructure.

The Hub-Spoke Architecture

A hub-spoke topology has one control point (ArgoCD hub) managing many execution points (Kubernetes clusters as spokes). This is different from decentralized approaches where each cluster runs its own ArgoCD instance.

Why Hub-Spoke?

Single pane of glass: One ArgoCD UI/CLI shows status across all clusters

text
ArgoCD Hub                   Kubernetes Clusters
┌──────────────┐
│  ArgoCD      │             ┌──────────────┐
│  Server      │─────────────│ Prod Cluster │
│              │             │  (us-east)   │
│              │             └──────────────┘
│  Git Repo    │
│  (source of  │             ┌──────────────┐
│   truth)     │─────────────│ Staging      │
│              │             │  (us-west)   │
│              │             └──────────────┘
└──────────────┘             ┌──────────────┐
        └────────────────────│ DR Cluster   │
                             │  (eu-west)   │
                             └──────────────┘

Cost of a unified approach: Secrets containing cluster credentials must be stored securely in ArgoCD, not in Git. We'll address this in Chapter 15 (Secrets Management).

Alternative: cluster-local ArgoCD (not hub-spoke):

text
Git Repo                     Kubernetes Clusters

Prod Cluster                 ┌──────────────┐
└─ ArgoCD ───────────────────│ Prod Cluster │
                             └──────────────┘
Staging Cluster              ┌──────────────┐
└─ ArgoCD ───────────────────│ Staging      │
                             └──────────────┘

This approach works for teams with separate infra teams per cluster but loses the unified deployment view. We'll focus on hub-spoke because it's more common for AI agents.

Registering External Clusters

ArgoCD starts with one cluster: the one it's installed in (the hub). To deploy to other clusters (spokes), you must register those clusters with ArgoCD first.

Local Cluster Registration (Hub Cluster)

When you install ArgoCD on a cluster, that cluster is available automatically under the name in-cluster. Internally, ArgoCD represents every registered cluster as a Kubernetes Secret labeled argocd.argoproj.io/secret-type: cluster; the hub's implicit registration is equivalent to:

yaml
apiVersion: v1
kind: Secret
metadata:
  name: in-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: in-cluster
  server: https://kubernetes.default.svc
  config: |
    {"tlsClientConfig": {"insecure": false}}

Verify with the argocd cluster list command:

text
SERVER                          NAME        VERSION  STATUS   MESSAGE
https://kubernetes.default.svc  in-cluster           Unknown  Cluster has no applications and is not being monitored.

Registering External Clusters

To register an external cluster (e.g., your staging environment), you need:

  1. Access to the external cluster's API server (kubeconfig context)
  2. A service account with cluster-admin permissions (or appropriate RBAC)
  3. The argocd CLI to register the cluster

Step 1: Create a service account on the external cluster

bash
# On the external cluster, create a namespace and service account
kubectl create namespace argocd
kubectl create serviceaccount argocd-manager -n argocd

# Grant cluster-admin permissions
kubectl create clusterrolebinding argocd-manager-cluster-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=argocd:argocd-manager

Output:

text
namespace/argocd created
serviceaccount/argocd-manager created
clusterrolebinding.rbac.authorization.k8s.io/argocd-manager-cluster-admin created

Step 2: Get the external cluster's kubeconfig

bash
# Confirm your kubeconfig has a context for the external cluster
kubectl config get-contexts

# The current context should be your external cluster.
# If not, switch to it:
kubectl config use-context <external-cluster-context>

Output:

text
CURRENT   NAME                CLUSTER     AUTHINFO   NAMESPACE
*         staging-us-west-1   us-west-1   admin
          prod-us-east-1      us-east-1   admin

Step 3: Register the cluster with ArgoCD

bash
# Switch back to your HUB cluster where ArgoCD is installed
kubectl config use-context <hub-cluster-context>

# Port-forward to ArgoCD (if it's not exposed)
kubectl port-forward -n argocd svc/argocd-server 8080:443 &

# Register the external cluster by its kubeconfig context name
argocd cluster add staging-us-west-1 --name staging

Output:

text
INFO[0003] ServiceAccount "argocd-manager" created in namespace "argocd"
INFO[0004] ClusterRole "argocd-manager-role" created
INFO[0005] ClusterRoleBinding "argocd-manager-rolebinding" created
Cluster 'staging' has been added to Argo CD.

Cluster Secrets and Authentication

When you register an external cluster, ArgoCD stores the cluster's API server URL and authentication credentials as a Kubernetes Secret in the hub cluster.

Viewing Registered Clusters

bash
# List all registered clusters
argocd cluster list

# Get details of a specific cluster
argocd cluster get staging

# View the cluster secret directly
kubectl get secret -n argocd -l argocd.argoproj.io/secret-type=cluster -o yaml

Output:

yaml
NAME        CLUSTER                          TLS
in-cluster  https://kubernetes.default.svc   false
staging     https://staging-api.example.com  true
prod        https://prod-api.example.com     true

---
apiVersion: v1
kind: Secret
metadata:
  name: cluster-staging-0123456789abcdef
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
data:
  server: aHR0cHM6Ly9zdGFnaW5nLWFwaS5leGFtcGxlLmNvbQ==  # base64 encoded
  name: c3RhZ2luZw==  # base64 encoded
  config: eyJiZWFyZXJUb2tlbiI6Ijc4OXB4eVl6ZUZRSXdVMkZrVUhGcGJISmhiblJsIn0=

Cluster Credentials: Bearer Token

The config field in the secret contains authentication details. For external clusters, it typically includes:

json
{
  "bearerToken": "<service-account-token>",
  "tlsClientConfig": {
    "insecure": false,
    "caData": "<base64-encoded-ca-cert>"
  }
}

The bearer token comes from the argocd-manager service account on the external cluster:

bash
# Get the token from the external cluster
kubectl get secret -n argocd \
  $(kubectl get secret -n argocd | grep argocd-manager-token | awk '{print $1}') \
  -o jsonpath='{.data.token}' | base64 -d

# Note: Kubernetes 1.24+ no longer auto-creates token Secrets for service
# accounts; there, mint a token with: kubectl create token argocd-manager -n argocd

Output:

text
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJhcmdvY2QiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoiYXJnb2NkLW1hbmFnZXItdG9rZW4tOXA0ZGwiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2VhY2NvdW50Lm5hbWUiOiJhcmdvY2QtbWFuYWdlciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZWFjY291bnQudWlkIjoiOWQ1YTc1YzItZjM0ZS00YjQ3LWJhYmUtODJmMmI4N2RhMjI0In0.4bGl...

Cluster Health Check

ArgoCD periodically verifies cluster connectivity:

bash
# Check cluster health
argocd cluster get staging

Output:

text
Name:               staging
Server:             https://staging-api.example.com
Connection Status:  Successful

If a cluster becomes unreachable, ArgoCD marks it as unhealthy but continues managing other clusters.
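You can automate this health check in a cron job or CI step. Here is a minimal sketch that parses `argocd cluster list` table output and flags any cluster whose connection status is not `Successful`; the column layout is assumed from a typical ArgoCD 2.x CLI, so verify it against your version. A sample of the table is inlined so the sketch is self-contained:

```shell
# Sketch: flag unhealthy clusters from `argocd cluster list` output.
# In practice you would run:  argocd cluster list | check_clusters
check_clusters() {
  # Column 4 of the table is STATUS (Successful / Failed / Unknown); skip the header row.
  awk 'NR > 1 && $4 != "Successful" { print "ALERT: cluster", $2, "is", $4 }'
}

# Inlined sample output (placeholder data) piped through the check:
printf '%s\n' \
  'SERVER NAME VERSION STATUS MESSAGE PROJECT' \
  'https://prod-api.example.com prod 1.29 Failed timeout default' \
  'https://staging-api.example.com staging 1.29 Successful ok default' \
  | check_clusters
# -> ALERT: cluster prod is Failed
```

Wire the alert line into whatever notifier you already use (Slack webhook, PagerDuty, etc.).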

ApplicationSet with Cluster Generator

You've already learned ApplicationSets in Chapter 11. Now you'll use the Cluster generator to deploy an application to multiple registered clusters with cluster-specific configurations.

The Cluster Generator Concept

Instead of creating separate Applications for prod, staging, and DR:

yaml
# ❌ Old way: Three separate Applications
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: agent-prod
spec:
  destination:
    server: https://prod-api.example.com
  # ... etc

Use a Cluster generator to create one Application per registered cluster:

yaml
# ✅ New way: One ApplicationSet generates three Applications
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: agent-multi-cluster
spec:
  generators:
  - clusters: {}  # Generates one Application per registered cluster
  template:
    metadata:
      name: 'agent-{{name}}'
    spec:
      project: default
      destination:
        server: '{{server}}'
        namespace: agent
      source:
        repoURL: https://github.com/example/agent
        path: manifests/
        targetRevision: main

The clusters: {} generator creates template variables for every registered cluster:

  • {{name}}: Cluster name (e.g., "staging", "prod")
  • {{server}}: Cluster API server URL
  • {{metadata.labels}}: Cluster labels (if you've added them)
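To make the expansion concrete, this is roughly the Application the generator renders for a registered cluster named staging (a sketch; the server URL is whatever is stored in that cluster's secret):

```yaml
# Hypothetical Application rendered for the "staging" cluster:
# {{name}} -> staging, {{server}} -> URL from the staging cluster secret
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: agent-staging
spec:
  project: default
  destination:
    server: https://staging-api.example.com  # from {{server}}
    namespace: agent
  source:
    repoURL: https://github.com/example/agent
    path: manifests/
    targetRevision: main
```

ArgoCD then syncs each rendered Application independently, so a failure on one cluster does not block the others.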

Cluster-Specific Configurations

Real deployments need different configs per cluster. You might want:

  • Prod: 3 replicas, resource limits, strict security policies
  • Staging: 1 replica, minimal resources, relaxed policies
  • DR: 3 replicas, same as prod but in different region

Use Helm values overrides to customize per cluster:

yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: agent-multi-cluster
spec:
  generators:
  - clusters:
      selector:
        matchLabels:
          deploy: "true"  # Only deploy to clusters with this label
  template:
    metadata:
      name: 'agent-{{name}}'
    spec:
      project: default
      destination:
        server: '{{server}}'
        namespace: agent
      source:
        repoURL: https://github.com/example/agent
        path: helm/
        targetRevision: main
        helm:
          releaseName: agent
          values: |
            environment: "{{name}}"
            # Per-cluster replica counts come from the value files created in Step 2;
            # the cluster generator only exposes {{name}}, {{server}}, and cluster metadata.

Step 1: Add labels to clusters

bash
# Cluster labels live on the cluster Secrets in the hub's argocd namespace.
# List them, then label each one (Secret names below are examples):
kubectl get secret -n argocd -l argocd.argoproj.io/secret-type=cluster
kubectl label secret -n argocd <staging-cluster-secret> env=staging deploy=true
kubectl label secret -n argocd <prod-cluster-secret> env=prod deploy=true
kubectl label secret -n argocd <dr-cluster-secret> env=dr deploy=true

Output:

text
secret/<staging-cluster-secret> labeled
secret/<prod-cluster-secret> labeled
secret/<dr-cluster-secret> labeled

Step 2: Create values-per-cluster in your Git repository

Create these files in your agent repository:

  • helm/values.yaml (default values)
  • helm/values-staging.yaml (staging-specific overrides)
  • helm/values-prod.yaml (prod-specific overrides)
  • helm/values-dr.yaml (DR cluster, same as prod)
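The exact contents depend on your chart; as a sketch, the field names below (replicas, resources) are hypothetical and must match whatever your Helm templates actually reference:

```yaml
# helm/values.yaml -- defaults shared by every cluster
replicas: 1
resources:
  limits:
    cpu: 500m
    memory: 512Mi

# helm/values-prod.yaml -- overrides layered on top of values.yaml for prod
replicas: 3
resources:
  limits:
    cpu: "2"
    memory: 2Gi
```

Helm merges the files in the order they appear in valueFiles, so later files win.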

Verify the files exist:

bash
ls -la helm/values*.yaml

Output:

text
-rw-r--r-- 1 user group 298 Dec 23 10:15 helm/values.yaml
-rw-r--r-- 1 user group 156 Dec 23 10:15 helm/values-staging.yaml
-rw-r--r-- 1 user group 298 Dec 23 10:15 helm/values-prod.yaml
-rw-r--r-- 1 user group 298 Dec 23 10:15 helm/values-dr.yaml

Step 3: Create ApplicationSet with per-cluster values

yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: agent-multi-cluster
spec:
  generators:
  - clusters:
      selector:
        matchLabels:
          deploy: "true"
  template:
    metadata:
      name: 'agent-{{name}}'
    spec:
      project: default
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
      destination:
        server: '{{server}}'
        namespace: agent
      source:
        repoURL: https://github.com/example/agent
        path: helm/
        targetRevision: main
        helm:
          releaseName: agent
          valueFiles:
          - values.yaml
          - values-{{name}}.yaml  # Cluster-specific overrides

Apply the ApplicationSet:

bash
kubectl apply -f applicationset.yaml
argocd app list

Output:

text
NAME           CLUSTER  NAMESPACE  PROJECT  STATUS  HEALTH
agent-staging  staging  agent      default  Synced  Healthy
agent-prod     prod     agent      default  Synced  Healthy
agent-dr       dr       agent      default  Synced  Healthy

Cross-Cluster Networking Considerations

Multi-cluster deployments raise networking questions:

Service Discovery Between Clusters

  • Option 1, direct IP/DNS: not recommended; cluster-local IPs don't route between clusters.
  • Option 2, ingress/load balancer: works, but adds extra hops and latency.
  • Option 3, service mesh (e.g. Istio multi-cluster): powerful, but high complexity; requires a shared control plane.

For your AI agent, if each cluster is independent (data doesn't flow between clusters), you don't need cross-cluster communication. Each cluster runs a complete copy of your agent with its own database.

DNS Across Clusters

Each Kubernetes cluster has its own DNS domain:

  • In Cluster A: agent-service.agent.svc.cluster.local resolves only within Cluster A
  • In Cluster B: Same agent-service.agent.svc.cluster.local is different from Cluster A

To expose a service to other clusters, use an external DNS name:

bash
# Get the external endpoint (run once against each cluster's context)
kubectl get svc -n agent agent-service -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'

Output:

text
agent-staging.example.com agent-prod.example.com agent-dr.example.com

Disaster Recovery: ArgoCD HA and Cluster Failover

With multiple clusters, you need resilience at two levels: ArgoCD itself must be HA, and your clusters must be capable of failover.

ArgoCD High Availability (Hub Cluster)

If your ArgoCD hub cluster goes down, you cannot deploy to spoke clusters. Make ArgoCD highly available:

bash
# Install ArgoCD with HA enabled (argo/argo-cd Helm chart)
helm install argocd argo/argo-cd \
  --namespace argocd \
  --set server.replicas=3 \
  --set repoServer.replicas=3 \
  --set controller.replicas=3 \
  --set redis-ha.enabled=true

Output:

text
Release "argocd" has been installed.
argocd-server: 3 replicas
argocd-repo-server: 3 replicas
argocd-application-controller: 3 replicas
redis-ha: enabled

Each component is fault-tolerant (Controller, Server, Repo Server, Redis). If one pod crashes, others take over.

Cluster Failover: Traffic Shifting

Your agent runs on three clusters (staging, prod, DR). If the prod cluster fails:

Scenario: User Traffic Shifting

text
User Traffic → AWS NLB (Network Load Balancer)
  ├─→ Prod cluster    (prod.example.com)     [FAILED]
  ├─→ DR cluster      (dr.example.com)       [HEALTHY]
  └─→ Staging cluster (staging.example.com)  [BACKUP]

Action: NLB detects prod failure → routes traffic to DR cluster

For your agent, implement:

  1. Health checks on all clusters
  2. DNS failover (Route53, Cloudflare) to shift traffic
  3. ArgoCD monitoring to detect when clusters become unhealthy
bash
# Check if a cluster is healthy
argocd cluster get prod

# Check application health on prod cluster
argocd app get agent-prod

Output:

text
Application: agent-prod
Status: Degraded
Server: https://prod-api.example.com (UNREACHABLE)
---
Cluster: prod
Connection Status: Failed (connection timeout)
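For the DNS failover step, a Route53 primary/secondary failover record pair is one option. The sketch below uses the change-resource-record-sets request format; the hostnames and the health-check ID are placeholders, and the assumption is that prod.example.com and dr.example.com already resolve to the two clusters' load balancers:

```json
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "agent.example.com",
        "Type": "CNAME",
        "SetIdentifier": "prod-primary",
        "Failover": "PRIMARY",
        "TTL": 60,
        "HealthCheckId": "<prod-health-check-id>",
        "ResourceRecords": [{ "Value": "prod.example.com" }]
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "agent.example.com",
        "Type": "CNAME",
        "SetIdentifier": "dr-secondary",
        "Failover": "SECONDARY",
        "TTL": 60,
        "ResourceRecords": [{ "Value": "dr.example.com" }]
      }
    }
  ]
}
```

When the prod health check fails, Route53 answers queries for agent.example.com with the secondary (DR) record instead.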

Complete Multi-Cluster ApplicationSet Example

Here's a production-ready example:

Directory structure:

text
repo/
├── argocd/
│   └── agent-multi-cluster-appset.yaml
└── helm/
    ├── Chart.yaml
    ├── values.yaml
    ├── values-staging.yaml
    ├── values-prod.yaml
    └── values-dr.yaml

argocd/agent-multi-cluster-appset.yaml:

yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: agent-multi-cluster
  namespace: argocd
spec:
  syncPolicy:
    preserveResourcesOnDeletion: true
  generators:
  - clusters:
      selector:
        matchLabels:
          deploy: "true"
  template:
    metadata:
      name: 'agent-{{name}}'
      finalizers:
      - resources-finalizer.argocd.argoproj.io
    spec:
      project: default
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        retry:
          limit: 5
          backoff:
            duration: 5s
            factor: 2
      destination:
        server: '{{server}}'
        namespace: agent
      source:
        repoURL: https://github.com/example/agent
        path: helm/
        targetRevision: main
        helm:
          releaseName: agent-{{name}}
          valueFiles:
          - values.yaml
          - values-{{metadata.labels.env}}.yaml

Deploy the ApplicationSet:

bash
# Label each cluster's Secret in the hub's argocd namespace
# (Secret names are examples; list them with:
#  kubectl get secret -n argocd -l argocd.argoproj.io/secret-type=cluster)
kubectl label secret -n argocd <staging-cluster-secret> env=staging deploy=true
kubectl label secret -n argocd <prod-cluster-secret> env=prod deploy=true
kubectl label secret -n argocd <dr-cluster-secret> env=dr deploy=true

# Apply the ApplicationSet
kubectl apply -f argocd/agent-multi-cluster-appset.yaml

# Check sync status
argocd app list

Output:

text
NAME           CLUSTER  STATUS  HEALTH
agent-staging  staging  Synced  Healthy
agent-prod     prod     Synced  Healthy
agent-dr       dr       Synced  Healthy

Try With AI

Setup: Use the same FastAPI agent from previous chapters. You now have three Kubernetes clusters available (or can simulate with three Minikube instances).

Part 1: Design Your Multi-Cluster Strategy

Ask AI: "I have a FastAPI agent that I want to deploy to three clusters: staging, prod, and DR. Each should have different resource allocations. Design a multi-cluster deployment strategy using ArgoCD that supports: (1) Separate configurations per cluster, (2) Secrets management outside of Git, (3) Automatic failover if one cluster becomes unhealthy."

Part 2: Refine Secret Handling

"How would I configure External Secrets to pull database passwords from HashiCorp Vault for my prod cluster, while the staging cluster gets test credentials from a different secret location?"

Part 3: Test with One Cluster First

"I want to set up a test ApplicationSet with just my staging cluster to verify the approach works before adding prod and DR. Give me a minimal ApplicationSet that deploys to a single cluster with custom values."

Part 4: Scaling to Three Clusters

"Now add the prod and dr clusters to the ApplicationSet. How do I ensure the cluster selector only deploys to clusters with the deploy=true label?"

Part 5: Design Failover

"If my prod cluster becomes unreachable, how does ArgoCD detect this and how would my users be notified? What monitoring should I add to alert when a cluster is unhealthy?"


Reflect on Your Skill

You built a gitops-deployment skill in Chapter 0. Test and improve it based on what you learned.

Test Your Skill

text
Using my gitops-deployment skill, register an external cluster with ArgoCD. Does my skill describe the service account creation and the argocd cluster add command?

Identify Gaps

Ask yourself:

  • Did my skill include ApplicationSet cluster generator for multi-cluster deployments?
  • Did it handle per-cluster Helm value overrides (values-prod.yaml, values-staging.yaml)?

Improve Your Skill

If you found gaps:

text
My gitops-deployment skill doesn't generate multi-cluster ApplicationSets. Update it to include cluster generators with label selectors and environment-specific value files.