Testing and Quality Gates

You've built images in CI and pushed them to registries. But before deploying an image to production, someone needs to verify it works. In Chapter 1, you learned that the Test stage is the quality gate—the checkpoint that prevents broken code from reaching users.

This chapter teaches you how to implement that test stage in GitHub Actions. You'll write tests with pytest, measure code coverage, enforce coverage thresholds, lint your code, and configure your workflow to fail if any test fails. No exceptions. No warnings that get ignored. A single failing test stops the entire pipeline.

By the end of this chapter, you'll understand how automated tests become a safety net that developers trust, and how quality gates make deployments safer.

Why Tests in CI Matter (Beyond "Best Practice")

Without tests in CI, here's what happens:

  1. Developer pushes code
  2. Image builds successfully
  3. Image is deployed to production
  4. A subtle bug appears in production (could have been caught by tests)
  5. Users notice and report the bug
  6. Rollback happens, timeline is disrupted

With tests in CI:

  1. Developer pushes code
  2. Tests run automatically
  3. A test catches the bug
  4. The pipeline fails
  5. Developer fixes the bug before deployment
  6. Tests pass, deployment proceeds

The test stage is your defense against shipping broken code. In a team setting, tests are the only thing preventing one person's mistake from taking down everyone's work.

Unit Testing with pytest

Most Python projects use pytest for unit testing. Let's look at what a test for your FastAPI agent looks like.

A Minimal FastAPI Test

Here's your FastAPI agent with a simple endpoint:

python
# app/main.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()


class Task(BaseModel):
    id: int
    title: str
    completed: bool = False


@app.post("/tasks")
async def create_task(task: Task):
    if not task.title:
        raise HTTPException(status_code=400, detail="Title is required")
    return {"id": task.id, "title": task.title, "completed": task.completed}


@app.get("/tasks/{task_id}")
async def get_task(task_id: int):
    return {"id": task_id, "title": "Sample Task", "completed": False}

Output:

text
App defined with POST /tasks and GET /tasks/{task_id} endpoints

Now here's a test for this endpoint:

python
# tests/test_main.py
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)


def test_create_task_success():
    """Test creating a task with valid data"""
    response = client.post("/tasks", json={
        "id": 1,
        "title": "Deploy FastAPI agent",
        "completed": False
    })
    assert response.status_code == 200
    assert response.json()["title"] == "Deploy FastAPI agent"


def test_create_task_missing_title():
    """Test that missing title returns 400"""
    response = client.post("/tasks", json={
        "id": 1,
        "title": "",
        "completed": False
    })
    assert response.status_code == 400

Output:

text
tests/test_main.py::test_create_task_success PASSED
tests/test_main.py::test_create_task_missing_title PASSED

============ 2 passed in 0.25s ============

The test creates a client, calls your endpoint, and asserts the response status code and body. If any assertion fails, the test fails.
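
When several inputs should map to predictable outcomes, pytest's parametrize mark keeps the suite compact. The following is a minimal sketch against the same endpoint; the extra payloads are assumptions about what the validation should accept or reject, not cases from the chapter's test file.

python
# tests/test_main_parametrized.py: a sketch, assuming the app and client shown above
import pytest
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)


# Each tuple is one case: (request payload, expected HTTP status)
@pytest.mark.parametrize("payload, expected_status", [
    ({"id": 2, "title": "Write tests", "completed": False}, 200),  # valid task
    ({"id": 3, "title": "", "completed": False}, 400),             # empty title rejected
])
def test_create_task_cases(payload, expected_status):
    """One test function exercises several input/outcome pairs."""
    response = client.post("/tasks", json=payload)
    assert response.status_code == expected_status

pytest reports each parametrized case as its own test, so a single bad payload shows up as one focused failure.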

Running Tests Locally

Before CI, you run tests on your machine to develop:

bash
pytest tests/

Output:

text
tests/test_main.py::test_create_task_success PASSED
tests/test_main.py::test_create_task_missing_title PASSED
tests/test_main.py::test_get_task PASSED

============ 3 passed in 0.34s ============

If a test fails:

bash
pytest tests/test_main.py::test_create_task_success

Output:

text
tests/test_main.py::test_create_task_success FAILED
AssertionError: assert 400 == 200

pytest shows exactly which assertion failed and why. This feedback loop—write code, run tests, see failures, fix code—is how developers build confidence.
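
As the suite grows, shared setup tends to move into a conftest.py so every test file can reuse it. A minimal sketch, assuming the same app module as above (the fixture name is illustrative):

python
# tests/conftest.py: a sketch; fixtures defined here are visible to every test in tests/
import pytest
from fastapi.testclient import TestClient

from app.main import app


@pytest.fixture
def client():
    """Provide a TestClient to any test that declares 'client' as a parameter."""
    return TestClient(app)

A test then asks for the fixture instead of constructing its own client, for example: def test_get_task(client): assert client.get("/tasks/1").status_code == 200.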

Code Coverage: Measuring Test Quality

A passing test is good, but does it actually test the important parts of your code? Code coverage measures what percentage of your code is executed during tests.

Running Coverage Reports

Use pytest-cov to measure coverage:

bash
pip install pytest-cov
pytest --cov=app --cov-report=html tests/

Output:

text
tests/test_main.py::test_create_task_success PASSED
tests/test_main.py::test_create_task_missing_title PASSED

============ 2 passed in 0.25s ============

---------- coverage: platform linux, python 3.11.0-final-0 ----------
Name            Stmts   Miss  Cover
-----------------------------------
app/main.py        15      2    87%
app/config.py       8      0   100%
app/models.py      12      3    75%
-----------------------------------
TOTAL              35      5    86%

Coverage shows:

  • Stmts: Executable statements in each file
  • Miss: Lines not executed by tests
  • Cover: Percentage executed

If you have 86% coverage, 14% of your code isn't tested. That could be edge cases or error handling that only triggers in production.
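
The missed lines are very often error branches that only run on bad input. As a hedged, self-contained illustration (the validation function below is hypothetical, not part of the earlier app), the raise line stays uncovered until a test deliberately triggers it:

python
# test_priority_coverage.py: a single-file sketch of an error branch and the tests that cover it
import pytest
from fastapi import HTTPException


def validate_priority(priority: int) -> int:
    """Reject priorities outside the 1-5 range."""
    if priority < 1 or priority > 5:
        # Coverage reports this line as missed until some test passes a bad value
        raise HTTPException(status_code=400, detail="Priority must be 1-5")
    return priority


def test_priority_valid():
    assert validate_priority(3) == 3


def test_priority_out_of_range():
    # This test exists purely to execute the error branch above
    with pytest.raises(HTTPException):
        validate_priority(9)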

Coverage Thresholds: Enforcing Minimum Quality

A quality gate enforces a minimum coverage threshold. If coverage drops below the threshold, the pipeline fails:

bash
pytest --cov=app --cov-fail-under=80 tests/

Output (if coverage is 86%):

text
============ 2 passed in 0.25s ============

---------- coverage: platform linux, python 3.11.0-final-0 ----------
TOTAL              35      5    86%

PASSED ✓ (86% >= 80% threshold)

Output (if coverage is 75%):

text
============ 2 passed in 0.25s ============

---------- coverage: platform linux, python 3.11.0-final-0 ----------
TOTAL              35     10    75%

FAILED ✗ (75% < 80% threshold)

The --cov-fail-under=80 flag makes pytest exit with a failure code if coverage doesn't meet the threshold. In CI, this failure stops the pipeline.
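
CI cares only about that exit code. If you want to see the mechanism locally, pytest.main returns the same code the command-line run would exit with; a small sketch (the paths and threshold mirror the command above):

python
# check_gate.py: a sketch showing that the coverage gate is communicated via pytest's exit code
import sys

import pytest

# Same flags as the CI step; any failing test or a coverage shortfall makes this non-zero
exit_code = pytest.main([
    "--cov=app",
    "--cov-fail-under=80",
    "tests/",
])

print(f"pytest exit code: {int(exit_code)}")  # 0 means every gate passed
sys.exit(int(exit_code))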

Linting: Catching Code Style Issues

Beyond functional tests, linting checks code style and catches common mistakes. Tools like ruff or flake8 scan your code for issues without running it.

Running a Linter

bash
pip install ruff
ruff check app/ tests/

Output (clean code):

text
All checks passed!

Output (with issues):

text
app/main.py:5:1: E302 expected 2 blank lines, found 1
app/main.py:12:80: E501 line too long (85 > 79 characters)
app/main.py:28:15: F841 local variable 'temp' is assigned but never used

Linting catches:

  • Unused imports
  • Undefined variables
  • Lines too long
  • Missing docstrings
  • Inconsistent style

In CI, a linting failure stops the pipeline just like a test failure.
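
To make those rule codes concrete, here is a small before/after sketch (the helper function is hypothetical). The first version trips F401 and F841; the second passes ruff check:

python
# lint_demo.py: a sketch of code ruff would flag, followed by the cleaned-up version

# BEFORE: two findings
import json  # F401: 'json' imported but unused


def summarize(task: dict) -> str:
    temp = task["id"]  # F841: local variable 'temp' is assigned but never used
    return f"Task: {task['title']}"


# AFTER: unused import and variable removed; ruff check passes
def summarize_fixed(task: dict) -> str:
    return f"Task: {task['title']}"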

Quality Gates in GitHub Actions

Now let's put this together in a GitHub Actions workflow. Your CI pipeline needs:

  1. Install dependencies
  2. Run tests with coverage
  3. Run linter
  4. Fail the job if any check fails

Complete Test and Quality Gate Workflow

Here's a workflow that runs tests and enforces quality gates:

yaml
# .github/workflows/ci.yml
name: CI - Build, Test, and Push

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pytest pytest-cov ruff
          pip install -r requirements.txt

      - name: Run linter
        run: ruff check app/ tests/

      - name: Run tests with coverage
        run: |
          pytest \
            --cov=app \
            --cov-fail-under=80 \
            --cov-report=term-missing \
            --cov-report=html \
            tests/

      - name: Upload coverage report
        uses: actions/upload-artifact@v3
        if: always()
        with:
          name: coverage-report
          path: htmlcov/

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Build Docker image
        run: |
          docker build -t agent-task-service:${{ github.sha }} .

Explanation: This workflow defines two jobs (test and build) where build only runs if test succeeds (via needs: test).

Output (when tests pass):

text
test job:
  Checkout code ... DONE
  Set up Python ... DONE
  Install dependencies ... DONE
  Run linter ... PASSED ✓
  Run tests with coverage ...
    tests/test_main.py::test_create_task_success PASSED
    tests/test_main.py::test_create_task_missing_title PASSED
    ============ 2 passed in 0.25s ============
    TOTAL coverage 86% (>= 80% threshold) ✓
  Upload coverage report ... DONE

build job:
  Starts (because test job passed)
  Build Docker image ... DONE

Output (when tests fail):

text
test job:
  Checkout code ... DONE
  Set up Python ... DONE
  Install dependencies ... DONE
  Run linter ... PASSED ✓
  Run tests with coverage ...
    tests/test_main.py::test_create_task_success FAILED
    AssertionError: assert 400 == 200
  PIPELINE STOPPED ✗

build job:
  SKIPPED (because test job failed)

Notice the needs: test line in the build job. This creates a dependency: build only starts if test passes. If test fails, build never runs.

Integration Tests with Service Containers

Unit tests verify individual functions. Integration tests verify components work together—especially when external services are involved. For a FastAPI agent that uses PostgreSQL, integration tests need a real database.

GitHub Actions supports service containers—temporary databases or services that spin up for your tests, then tear down.

Integration Test with PostgreSQL

Here's an integration test that reads from a database:

python
# tests/test_integration_db.py
# Requires pytest-asyncio for the async fixture and test below
import os

import asyncpg
import pytest
import pytest_asyncio
from fastapi.testclient import TestClient

from app.main import app


@pytest_asyncio.fixture
async def db_connection():
    """Connect to the test PostgreSQL database"""
    # DATABASE_URL is set by the workflow and points at the service container
    dsn = os.getenv("DATABASE_URL")
    conn = await asyncpg.connect(dsn)
    yield conn
    await conn.close()


@pytest.mark.asyncio
async def test_task_persists_to_db(db_connection):
    """Test that a task is saved to the database"""
    # Insert task via the app endpoint
    client = TestClient(app)
    response = client.post("/tasks", json={
        "id": 1,
        "title": "Test task",
        "completed": False
    })
    assert response.status_code == 200

    # Verify it's in the database
    result = await db_connection.fetchrow(
        "SELECT * FROM tasks WHERE id = $1", 1
    )
    assert result["title"] == "Test task"

Output:

text
tests/test_integration_db.py::test_task_persists_to_db PASSED

This test needs a real PostgreSQL running. GitHub Actions can provide it:

Workflow with Service Container

yaml
# .github/workflows/ci.yml (updated)
jobs:
  test:
    runs-on: ubuntu-latest

    services:
      postgres:
        image: postgres:15-alpine
        env:
          POSTGRES_PASSWORD: password
          POSTGRES_DB: test_db
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install pytest pytest-cov pytest-asyncio asyncpg
          pip install -r requirements.txt

      - name: Create database schema
        run: |
          psql -h localhost -U postgres -d test_db -c "
            CREATE TABLE tasks (
              id SERIAL PRIMARY KEY,
              title VARCHAR NOT NULL,
              completed BOOLEAN DEFAULT FALSE
            );"
        env:
          PGPASSWORD: password

      - name: Run tests (unit + integration)
        run: |
          pytest \
            --cov=app \
            --cov-fail-under=80 \
            tests/
        env:
          DATABASE_URL: postgresql://postgres:password@localhost:5432/test_db

Explanation: This workflow adds a services section with PostgreSQL, waits for health checks, creates the database schema, then runs tests with database connectivity.

Output:

text
services:
  postgres: Started on localhost:5432
  Health check passed ✓

test job:
  Create database schema ... DONE
  Run tests (unit + integration) ...
    tests/test_main.py::test_create_task_success PASSED
    tests/test_integration_db.py::test_task_persists_to_db PASSED
    ============ 2 passed in 0.35s ============
    Coverage 86% ✓

The service container automatically:

  • Starts before tests run
  • Provides a PostgreSQL database
  • Stops after tests complete
  • Tears down completely, so one workflow run leaves no side effects for the next (within a run, tests share the database; see the cleanup sketch below)
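
The container is recreated for every workflow run, but within one run all tests share the same database. A hedged sketch of a cleanup fixture (assuming the tasks table and DATABASE_URL from the workflow above, plus pytest-asyncio) keeps rows from one test leaking into the next:

python
# tests/conftest.py: a sketch; integration tests request this fixture to start from an empty table
import os

import asyncpg
import pytest_asyncio


@pytest_asyncio.fixture
async def clean_tasks_table():
    """Truncate the tasks table before a test that asks for this fixture."""
    conn = await asyncpg.connect(os.getenv("DATABASE_URL"))
    await conn.execute("TRUNCATE TABLE tasks RESTART IDENTITY")
    yield
    await conn.close()

An integration test opts in by adding the fixture name to its parameters, for example: async def test_task_persists_to_db(db_connection, clean_tasks_table).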

Fail-Fast: Stop on First Failure

Your pipeline should stop immediately when something fails. Don't continue building images and pushing to registries if tests fail.

Explicit Job Dependencies

The workflow above uses needs: test to enforce dependencies:

yaml
jobs:
  test:
    # Tests run first
    ...

  build:
    needs: test    # Build only runs if test succeeds
    ...

  push:
    needs: build   # Push only runs if build succeeds
    ...

Output (execution order):

text
Workflow starts
  → test job begins
  → all tests run
      → if PASS: build job begins
      → if FAIL: build and push are SKIPPED
  → build job (only if test passed)
  → push job (only if build passed)

If any job fails, all dependent jobs are skipped. This is fail-fast behavior.

Step Failure Behavior

By default, if a step fails, the job stops:

yaml
steps:
  - name: Run linter
    run: ruff check app/    # If this fails...

  - name: Run tests
    run: pytest tests/      # This step never runs

Output (when linter fails):

text
Run linter
  ruff check app/
  app/main.py:5:1: E302 expected 2 blank lines
  FAILED ✗

Run tests
  SKIPPED (because linter failed)

Job status: FAILED

You can override this with continue-on-error: true, but you shouldn't for quality gates. Failures should block the pipeline.

Uploading Test Artifacts

GitHub Actions can upload test reports and coverage reports as artifacts. These are stored and accessible through the GitHub UI.

yaml
- name: Upload coverage report
  uses: actions/upload-artifact@v3
  if: always()   # Upload even if tests fail
  with:
    name: coverage-report
    path: htmlcov/

- name: Upload test results
  uses: actions/upload-artifact@v3
  if: always()
  with:
    name: pytest-results
    path: test-results.xml

Output (GitHub UI):

text
Artifacts
├── coverage-report/
│   ├── index.html
│   ├── app_main_py.html
│   └── status.json
└── pytest-results/
    └── test-results.xml

[Download] coverage-report (2.4 MB)
[Download] pytest-results (45 KB)

After the workflow runs, GitHub provides a download link for these artifacts. Developers can download the HTML coverage report and view what code wasn't tested.
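
If you would rather scan the numbers than click through HTML pages, a short script can summarize a downloaded coverage XML report. A sketch, assuming the workflow produced coverage.xml via --cov-report=xml and the file sits in the current directory:

python
# coverage_summary.py: a sketch for inspecting a downloaded coverage.xml artifact
import xml.etree.ElementTree as ET

tree = ET.parse("coverage.xml")  # Cobertura-format report from pytest --cov-report=xml
root = tree.getroot()

# The root <coverage> element carries the overall line-rate (0.0 to 1.0)
total = float(root.get("line-rate", "0")) * 100
print(f"Total line coverage: {total:.1f}%")

# Per-file rates live on <class> elements nested under <packages>
for cls in root.iter("class"):
    rate = float(cls.get("line-rate", "0")) * 100
    print(f"{cls.get('filename')}: {rate:.1f}%")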

Complete Multi-Job CI Workflow

Here's a complete workflow combining everything:

yaml
# .github/workflows/ci.yml
name: CI - Build, Test, and Push

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}/agent-task-service

jobs:
  test:
    name: Test Suite
    runs-on: ubuntu-latest

    services:
      postgres:
        image: postgres:15-alpine
        env:
          POSTGRES_PASSWORD: testpass
          POSTGRES_DB: test_db
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pytest pytest-cov pytest-asyncio asyncpg ruff
          pip install -r requirements.txt

      - name: Lint with ruff
        run: ruff check app/ tests/

      - name: Create test database
        run: |
          psql -h localhost -U postgres -d test_db -c "
            CREATE TABLE tasks (
              id SERIAL PRIMARY KEY,
              title VARCHAR NOT NULL,
              completed BOOLEAN DEFAULT FALSE
            );"
        env:
          PGPASSWORD: testpass

      - name: Run pytest with coverage
        run: |
          pytest \
            --cov=app \
            --cov-fail-under=80 \
            --cov-report=term-missing \
            --cov-report=html \
            --cov-report=xml \
            tests/
        env:
          DATABASE_URL: postgresql://postgres:testpass@localhost:5432/test_db

      - name: Upload coverage to artifacts
        uses: actions/upload-artifact@v3
        if: always()
        with:
          name: coverage-report
          path: htmlcov/

  build:
    name: Build Docker Image
    needs: test
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Log in to Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max

Output (complete success):

text
Workflow: CI - Build, Test, and Push

test job:
  Lint with ruff ........................... PASSED ✓
  Run pytest with coverage ................. PASSED ✓
    8 passed, coverage 85% >= 80% threshold
  Upload coverage to artifacts ............. DONE

build job:
  Starts (because test passed)
  Build and push Docker image .............. DONE ✓
    Pushed: ghcr.io/fistasolutions/agent-task-service:abc123def456
    Pushed: ghcr.io/fistasolutions/agent-task-service:latest

All jobs passed ✓

Output (test failure):

text
Workflow: CI - Build, Test, and Push

test job:
  Lint with ruff ........................... PASSED ✓
  Run pytest with coverage ................. FAILED ✗
    tests/test_main.py::test_create_task FAILED
    assert 400 == 201

build job:
  SKIPPED (because test failed)

Workflow failed. Build not triggered.

Key Concepts

Quality Gate: An automated checkpoint that must pass before the pipeline continues. If any test fails, coverage drops, or linter finds issues, the pipeline stops.

Test Coverage: The percentage of code executed by your tests. Higher coverage (80%+) reduces the risk of uncaught bugs reaching production.

Fail-Fast: Stop immediately when a quality gate fails. Don't waste resources building and pushing images if tests will reject the code.

Service Containers: Temporary databases or services that spin up for tests and tear down automatically, ensuring tests are isolated and repeatable.

Artifacts: Files (like coverage reports) uploaded to GitHub for review. Developers can download and inspect what tests covered.

Try With AI

Ask Claude: "I have a FastAPI application with 80% test coverage. Add quality gates to my GitHub Actions workflow that fail if coverage drops below 80% or if any linting errors are found."

Before accepting the output:

  • Does it use pytest --cov-fail-under=80?
  • Does it include a separate linting step with ruff?
  • Does it fail the job (not just warn) when thresholds aren't met?

After Claude provides the workflow, ask: "Now add integration tests that require a PostgreSQL database using GitHub Actions service containers. The tests should verify that tasks are persisted to the database."

Verify the response includes:

  • A services section in the job with PostgreSQL configuration
  • Health checks to wait for the database to be ready
  • A step to create the database schema before tests run
  • Environment variables passed to pytest for database connection
  • Tests that actually interact with the database (not mocked)

Finally, ask: "Ensure the workflow has three jobs—test, build, and push—where build only runs if test passes, and push only runs if build succeeds. Show me the complete workflow with all three jobs."

Check that:

  • Each job has a clear needs: dependency
  • Test job includes linting, pytest, and coverage checks
  • Build job uses docker/build-push-action
  • Push job (if included) pushes to a registry
  • No job runs if its dependency fails

Reflect on Your Skill

You built a gitops-deployment skill in Chapter 0. Test and improve it based on what you learned.

Test Your Skill

bash
Using my gitops-deployment skill, create a testing pipeline with linting, unit tests, and coverage gates. Does my skill understand quality thresholds and test result reporting?

Identify Gaps

Ask yourself:

  • Did my skill include integration testing with service containers?
  • Did it handle test failures and report coverage to GitHub?

Improve Your Skill

If you found gaps:

bash
My gitops-deployment skill doesn't include service containers for integration tests. Update it to add PostgreSQL/Redis containers and environment variable configuration for tests.