USMAN’S INSIGHTS
AI ARCHITECT
AI Will Happily Write GitOps Manifests That Break Your Cluster



© 2026 Muhammad Usman Akbar. All rights reserved.

AI-Assisted GitOps Workflows

You've learned ArgoCD architecture, ApplicationSets, secrets management, and multi-cluster patterns by building everything manually. You can write manifests, reason about sync strategies, and debug deployment issues. Now you're ready for the next layer: using AI as a collaborator to generate sophisticated GitOps configurations that would take hours to write by hand.

This chapter teaches a critical skill: evaluating and refining AI-generated manifests. Claude can generate working ArgoCD configurations in seconds, but that output needs your domain knowledge to become production-ready.

Why AI Helps With GitOps

GitOps configurations are highly structured YAML where small mistakes have large consequences. A typo in a sync policy, a missing imagePullSecret, or incorrect resource ordering can break deployments.
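As a hypothetical illustration (not one of this chapter's manifests), consider a one-letter casing typo in a sync policy. Depending on how strictly the API server validates the CRD, the unknown key may simply be dropped rather than rejected, leaving self-healing silently disabled:

```yaml
# Hypothetical typo: 'selfheal' should be 'selfHeal'. An unknown key
# may be pruned without an error, so self-heal never turns on.
syncPolicy:
  automated:
    prune: true
    selfheal: true  # typo: should be selfHeal
```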

AI excels at:

  • Boilerplate generation — ApplicationSets with matrix generators, complex sync strategies
  • Multi-environment templates — Dev/staging/prod variations that differ only in replicas and registries
  • Manifest composition — Combining Helm values, ConfigMaps, and ArgoCD policies into coherent configurations
  • Pattern recognition — Suggesting sync strategies or health checks you might not have considered

But AI has no visibility into:

  • Your actual cluster topology and names
  • Registry credentials and pull secret names
  • Environment-specific constraints (storage classes, ingress controllers)
  • Your organization's naming conventions and security policies

This is where you come in. You provide constraints, validate assumptions, and catch environment-specific mistakes that AI can't know about.
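One concrete way to catch environment-specific mistakes is to diff AI output against facts only you know. A minimal sketch, assuming a hypothetical allowlist of your real cluster API servers (the URLs and function name below are illustrative):

```python
# Minimal sketch: the allowlist holds YOUR real cluster API servers
# (these URLs are hypothetical). Any other destination.server in an
# AI-generated manifest gets flagged for review.
KNOWN_CLUSTERS = {
    "https://kubernetes.default.svc",
    "https://staging-eks.example.com",
    "https://prod-eks.example.com",
}

def unknown_destinations(servers: list[str]) -> list[str]:
    """Return destination servers that are not known clusters."""
    return [s for s in servers if s not in KNOWN_CLUSTERS]

# An AI-invented placeholder cluster is caught immediately:
print(unknown_destinations(["https://prod-cluster.example.com"]))
# → ['https://prod-cluster.example.com']
```

The same pattern works for registries, pull secret names, and namespaces: encode what you know into data, then check the generated manifest against it.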

When to Use AI for GitOps

Ask yourself these questions:

Use AI if:

  • The configuration is complex (3+ environments, multiple deployment patterns)
  • You're using unfamiliar features (Argo Rollouts integration, advanced sync waves)
  • You're generating boilerplate that follows a pattern you've defined
  • You need to quickly explore design options

Don't rely on AI if:

  • The manifest is simple (single application, one cluster)
  • You're unsure what the manifest should do (write the spec first, then ask for the manifest)
  • The configuration involves undocumented internal systems
  • You haven't validated cluster names, registries, or credentials

Critical Evaluation: What to Check

When Claude generates a manifest, don't accept it as gospel. Evaluate it against your environment.

Checklist: Validate AI Output

  1. Cluster references — Does the manifest use YOUR cluster names?

```yaml
# Wrong (generic example)
destination:
  server: https://kubernetes.default.svc

# Right (your actual cluster)
destination:
  server: https://prod-eks-cluster.example.com
```

  2. Registry credentials — Are imagePullSecrets correct for your registries?

```yaml
# Check: Does my cluster have this secret?
imagePullSecrets:
  - name: ghcr-credentials  # Must exist in the namespace
```

  3. Namespace alignment — Does every resource deploy to the right namespace?

```yaml
# Validate: argocd/default/target namespace consistency
metadata:
  namespace: production-agents  # Must match destination.namespace
```

  4. Resource limits — Are requests/limits appropriate for your workloads?

```yaml
resources:
  requests:
    memory: "256Mi"  # Is this realistic for a FastAPI agent?
    cpu: "250m"
  limits:
    memory: "512Mi"  # Or too generous?
    cpu: "500m"
```

  5. Sync strategies — Does auto-sync make sense for this environment?

```yaml
syncPolicy:
  automated:
    prune: true     # Safe for dev? Dangerous for prod.
    selfHeal: true
```

  6. Health assessment — Does the health check match your service?

```yaml
# AI might assume HTTP health checks for all services
# Your database might need different health criteria
healthChecks:
  - type: Application  # Correct for most deployments
```

  7. Variable substitution — Are placeholders actually filled in?

```yaml
# Wrong (template not rendered)
image: ghcr.io/organization/agent:{{ version }}

# Right (AI should fill this from your context)
image: ghcr.io/organization/agent:sha-abc1234
```
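The variable-substitution check can be mechanized. A minimal sketch (the regex and function name are illustrative, not from any particular tool), meant for rendered output rather than ApplicationSet templates where `{{ }}` is expected:

```python
import re

# Minimal sketch (illustrative, not a real tool): flag any {{ ... }}
# template placeholders that were never rendered before a manifest
# goes up for review.
PLACEHOLDER = re.compile(r"\{\{[^{}]*\}\}")

def unrendered_placeholders(manifest_text: str) -> list[str]:
    """Return every unrendered {{ ... }} placeholder in the text."""
    return PLACEHOLDER.findall(manifest_text)

manifest = "image: ghcr.io/organization/agent:{{ version }}\nreplicas: 3\n"
print(unrendered_placeholders(manifest))  # → ['{{ version }}']
```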

Example: AI-Generated Manifest with Issues

Here's what Claude might generate for a multi-environment deployment:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: fastapi-agent-multi-env
spec:
  generators:
    - matrix:
        generators:
          - list:
              elements:
                - name: dev
                  cluster: minikube
                  replicas: 1
                  image_tag: latest
                - name: staging
                  cluster: staging-cluster
                  replicas: 3
                  image_tag: v1.2.3
                - name: prod
                  cluster: prod-cluster
                  replicas: 5
                  image_tag: v1.2.3
          - list:
              elements:
                - registry: docker.io
                - registry: ghcr.io
  template:
    metadata:
      name: fastapi-agent-{{name}}-{{registry}}
    spec:
      project: default
      source:
        repoURL: https://github.com/your-org/helm-charts
        chart: fastapi-agent
        targetRevision: HEAD
        helm:
          valuesInline:
            replicaCount: {{replicas}}
            image:
              repository: {{registry}}/your-org/fastapi-agent
              tag: {{image_tag}}
            ingress:
              enabled: true
              className: nginx
      destination:
        server: https://{{cluster}}.example.com
        namespace: agents
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
```

Problems to identify:

  1. Matrix generator — Combining environment with registry creates unwanted combinations (dev with ghcr, prod with docker.io)
  2. Cluster server — {{cluster}}.example.com won't resolve; needs actual cluster URLs
  3. Image registry mismatch — Prod probably shouldn't pull from docker.io; dev might not use ghcr
  4. Auto-sync in prod — prune: true in production is risky without additional safeguards
  5. No imagePullSecrets — ghcr and private registries need credentials
  6. Missing namespace — No validation that agents namespace exists
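The first problem is easy to quantify: a matrix generator emits the full cross product of its child generators. A plain Python sketch of the combination count, mirroring the two list generators above:

```python
from itertools import product

# Sketch: the cross product a matrix generator emits for the two list
# generators above -- six Applications, including pairings you never
# intended (dev with ghcr.io, prod with docker.io).
envs = ["dev", "staging", "prod"]
registries = ["docker.io", "ghcr.io"]

apps = [f"fastapi-agent-{env}-{reg}" for env, reg in product(envs, registries)]
print(len(apps))  # → 6
print(apps[-1])   # → fastapi-agent-prod-ghcr.io
```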

This is where you step in. You know these constraints. Claude doesn't.

Teaching Claude Your Constraints

The collaborative process works like this:

Round 1: Initial Generation

You ask Claude to generate the manifest. You get a structurally correct starting point that's missing your environment-specific details.

Round 2: Share Your Constraints

You respond with your actual constraints:

"The cluster setup is different. We have:

  • Dev cluster: minikube (local)
  • Staging: 'staging-eks' (AWS EKS)
  • Prod: 'prod-eks' (AWS EKS, requires IAM auth)

Dev can use public registries (docker.io), but staging and prod MUST use private ECR. The ECR URIs are: 123456789.dkr.ecr.us-east-1.amazonaws.com

All clusters have imagePullSecrets named 'ecr-credentials' except dev (which doesn't need one).

Auto-sync is okay for dev and staging, but prod MUST be manual sync only."

Claude doesn't know these constraints. But once you tell it, it can adapt immediately.

Round 3: Refined Manifest

Claude regenerates the manifest accounting for your constraints:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: fastapi-agent-multi-env
spec:
  generators:
    - list:
        elements:
          - name: dev
            cluster: minikube
            server: https://kubernetes.default.svc  # Local minikube
            replicas: 1
            registry: docker.io
            pullSecret: null
            autoSync: "true"
          - name: staging
            cluster: staging-eks
            server: https://staging-eks.example.com
            replicas: 3
            registry: 123456789.dkr.ecr.us-east-1.amazonaws.com
            pullSecret: ecr-credentials
            autoSync: "true"
          - name: prod
            cluster: prod-eks
            server: https://prod-eks.example.com
            replicas: 5
            registry: 123456789.dkr.ecr.us-east-1.amazonaws.com
            pullSecret: ecr-credentials
            autoSync: "false"
  template:
    metadata:
      name: fastapi-agent-{{name}}
    spec:
      project: default
      source:
        repoURL: https://github.com/your-org/helm-charts
        chart: fastapi-agent
        targetRevision: main
        helm:
          valuesInline:
            replicaCount: {{replicas}}
            image:
              repository: {{registry}}/your-org/fastapi-agent
              tag: sha-abc1234
            imagePullSecrets:
              {{#pullSecret}}- name: {{pullSecret}}{{/pullSecret}}
            resources:
              requests:
                memory: "256Mi"
                cpu: "250m"
              limits:
                memory: "512Mi"
                cpu: "500m"
      destination:
        server: {{server}}
        namespace: agents
      syncPolicy:
        {{#if autoSync == "true"}}
        automated:
          prune: true
          selfHeal: true
        {{/if}}
        syncOptions:
          - CreateNamespace=true
          - PruneLast=true
```

Output: The manifest now uses your actual cluster servers, applies imagePullSecrets only where needed, makes auto-sync conditional, specifies the correct registry for each environment, and uses a specific image tag (sha-abc1234) instead of latest.

Iterative Refinement

The collaborative process doesn't stop there. As Claude generates output, you ask clarifying questions:

Your question after seeing the refined manifest:

"The pull secret conditional syntax looks like Mustache, not Helm, and ArgoCD won't render it. ArgoCD uses standard Helm templating. How would you rewrite the imagePullSecrets to work with ArgoCD's Helm integration?"

Claude explains and corrects:

```yaml
# Cleaner approach: remove null values in values.yaml
imagePullSecrets: []  # Empty in dev, populated in values-staging.yaml and values-prod.yaml
```

Then you validate the fix by checking if it matches your Helm chart expectations.

You push back when Claude makes assumptions:

"You suggest using default StorageClass for all environments. But dev uses emptyDir, staging uses ebs-gp3, and prod uses ebs-io2 (expensive, high-performance). How do you handle per-environment storage class selection?"

Claude offers solutions:

```yaml
# Approach 1: Pass storageClassName through Helm values
values:
  persistence:
    storageClassName: {{storage_class}}

# Approach 2: Use an ArgoCD SyncWave to create environment-specific PVCs first
# Approach 3: Use Kustomize patches to override the storage class per environment
```

Testing Before Deploying

Claude can generate manifests, but you must validate them before applying:

```bash
# Step 1: Apply to a test cluster or dry-run
argocd app create fastapi-agent-dev \
  --file manifest.yaml \
  --dry-run

# Step 2: Check what ArgoCD would deploy
argocd app diff fastapi-agent-dev

# Step 3: Verify the ApplicationSet generates correct Applications
kubectl get Application -n argocd

# Step 4: Sync and monitor
argocd app sync fastapi-agent-dev

# Step 5: Validate actual deployment
kubectl get pods -n agents
kubectl logs -n agents -l app=fastapi-agent --tail=50
```

Each step confirms that Claude's generated manifest actually works in your environment.
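The last validation step can likewise be scripted. A sketch, where the JSON stub stands in for real `kubectl get pods -n agents -o json` output and the pod names are hypothetical:

```python
import json

# Sketch: parse `kubectl get pods -o json` output (stubbed below) and
# report any pod that is not in the Running phase.
pods_json = """
{"items": [
  {"metadata": {"name": "fastapi-agent-7d9f-1"}, "status": {"phase": "Running"}},
  {"metadata": {"name": "fastapi-agent-7d9f-2"}, "status": {"phase": "Pending"}}
]}
"""

def not_running(raw: str) -> list[str]:
    """Names of pods whose phase is not Running."""
    items = json.loads(raw)["items"]
    return [p["metadata"]["name"] for p in items
            if p["status"]["phase"] != "Running"]

print(not_running(pods_json))  # → ['fastapi-agent-7d9f-2']
```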

Try With AI: Multi-Environment GitOps Deployment

Now practice this collaborative pattern yourself.

Part 1: Initial Request

Ask Claude to generate an ApplicationSet:

```text
Generate an ApplicationSet for deploying our FastAPI agent to dev, staging, and prod.

Here's what we have:
- Dev: local Minikube cluster, public registries allowed
- Staging: AWS EKS cluster, private ECR registry
- Prod: AWS EKS cluster, private ECR, high-availability requirements

The agent needs:
- Dev: 1 replica, 256Mi memory
- Staging: 3 replicas, 512Mi memory
- Prod: 5 replicas, 1Gi memory, managed node group with specific labels

Generate a manifest that handles these variations using a single ApplicationSet.
```

Part 2: Critical Evaluation

Review Claude's output. Ask yourself:

  • Does this handle my cluster topology? Are cluster server URLs correct?
  • What assumptions did Claude make? Are pull secrets named correctly in my clusters?
  • What would fail in my environment? Are there hardcoded values that don't match my setup?

Part 3: Share Your Constraints

Tell Claude your actual constraints:

```text
The manifest is close, but I need adjustments:
- Dev cluster is 'minikube' (server: https://kubernetes.default.svc)
- Staging is 'staging-eks' (server: https://staging-eks-cluster-xyz.eks.amazonaws.com)
- Prod is 'prod-eks' (server: https://prod-eks-cluster-abc.eks.amazonaws.com)

Staging and prod both require nodes labeled 'workload: agents'.
```

Part 4: Refinement

After Claude regenerates, ask a clarifying question about an assumption:

```text
Is it better to set node affinity in the ApplicationSet or in the Helm chart's values.yaml? We want to be able to change it per environment without modifying the chart.
```

Part 5: Validation

Apply the final manifest and validate that Applications are generated and replicas match expectations across environments.
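The replica check can be made mechanical. A sketch, where the observed counts are stubbed (in practice you would read them from each cluster with kubectl):

```python
# Sketch for the validation step: compare expected replica counts per
# environment (from Part 1) against observed counts (stub data below).
expected = {"dev": 1, "staging": 3, "prod": 5}
observed = {"dev": 1, "staging": 3, "prod": 4}  # stub; read from clusters

drift = {env: (expected[env], observed.get(env))
         for env in expected if observed.get(env) != expected[env]}
print(drift)  # → {'prod': (5, 4)}
```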


Reflect on Your Skill

You built a gitops-deployment skill in Chapter 0. Test and improve it based on what you learned.

Test Your Skill

```text
Using my gitops-deployment skill, generate a complete multi-environment ApplicationSet with all the patterns from this chapter. Does my skill produce production-ready manifests with proper validation?
```

Identify Gaps

Ask yourself:

  • Did my skill include all the concepts: sync waves, hooks, secrets, multi-cluster, RBAC, notifications?
  • Did it validate cluster names, registries, and environment-specific constraints?

Improve Your Skill

If you found gaps:

```text
My gitops-deployment skill generates basic manifests but misses advanced patterns. Review all 16 chapters and update the skill to include comprehensive GitOps workflows.
```