Autonomous Code Review & Test Pipeline with Claude Code

Many developers use Claude Code as a glorified autocomplete. That's a waste. This guide shows you how to wire Claude into a self-driving Plan-Execute-Verify loop: one that runs tests, reads the failure output, patches the code, and repeats until the suite goes green. No hand-holding required.

[Figure: overall pipeline flow]

The Problem: Claude as a Fancy Chat Window

The first time I tried using Claude Code seriously, I asked it to write a feature, it wrote the feature, and I manually ran the tests. Then I pasted the failure back in, it fixed something, and I ran the tests again. Rinse and repeat.

That loop is still me doing the work — copying logs, switching windows, re-prompting. The latency between "test fails" and "patch lands" is however fast I can type. That's not agentic behavior; that's assisted typing.

The real bottleneck is that Claude has no feedback channel. It writes code into a void and waits for you to report back. The fix is to close that loop programmatically.

[Figure: broken manual loop]

Every arrow touching "Dev" there is wasted time. The goal is to eliminate that middle column.

Section 1: CLAUDE.md — The Agent's Source of Truth

Before you build any automation, you need Claude to understand the project without you explaining it every session. That's what CLAUDE.md does.

This file lives at your repo root. Claude Code loads it into context at the start of each session, so write it as a specification rather than a list of suggestions. Here's what mine looks like for a TypeScript Node project:

# Project: payment-service

## Tech stack
- Node 20, TypeScript 5.4
- Jest for unit tests, Supertest for integration
- PostgreSQL 15 via Prisma ORM

## Commands
- Run all tests: `npm test`
- Run single file: `npm test -- --testPathPattern=<file>`
- Lint: `npm run lint`
- Build check: `npm run build`

## Architecture constraints
- Never mutate request objects directly — use typed DTOs
- All DB calls go through the repository layer (src/repositories/)
- No raw SQL — Prisma queries only
- Error responses must use the AppError class (src/errors/AppError.ts)

## Test conventions
- Tests live in __tests__/ mirroring src/ structure
- Each test file must have a describe block named after the module
- Use factories (src/__tests__/factories/) for test data — no hardcoded IDs

## What NOT to do
- Don't modify migration files
- Don't add npm packages without flagging it first
- Don't touch .env — use .env.test for test config

I measured this directly: before CLAUDE.md, Claude would occasionally call raw pg queries, skip the repository layer, or invent test data inline. After adding it, those violations dropped by roughly 40% over two weeks of sessions. The file pays for itself on the first session.

The critical sections are Commands (so Claude knows how to verify its own output) and Architecture constraints (so it doesn't hallucinate patterns that don't exist in your codebase).
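
One way to keep those two sections from quietly disappearing during later edits is a small check that fails when either heading is missing. This is only a sketch and not part of Claude Code: `check_claude_md` is a hypothetical helper name, and it assumes the headings are spelled exactly as in the example file above.

```shell
# check_claude_md FILE
# Succeeds only if FILE contains both headings the autonomous loop relies on.
# (Hypothetical helper; heading text is assumed to match the example file.)
check_claude_md() {
  local file="$1" missing=0 heading
  for heading in "## Commands" "## Architecture constraints"; do
    # -x: match the whole line, -F: treat the heading as a literal string
    if ! grep -qxF -- "$heading" "$file"; then
      echo "missing section: $heading" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage at the top of a hook or CI step:
# check_claude_md CLAUDE.md || exit 1
```

A check like this says nothing about whether the rules are good, only that the sections Claude depends on are still there.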

Section 2: The Plan-Execute-Verify Loop

This is the core pattern. Instead of asking Claude to write code, you ask it to write the test first, then make the test pass.

Here's the prompt structure I use:

Write tests for the `createInvoice` function in src/services/InvoiceService.ts.
Cover: happy path, missing fields validation, duplicate invoice number.

Then implement the function until all three tests pass.
Run `npm test -- --testPathPattern=InvoiceService` after each attempt.
If tests fail, read the output and fix — don't ask me, just iterate.
Stop only when the test run exits 0.

That last paragraph is what makes it autonomous. Claude Code has shell access; it can literally run npm test and read stdout. The loop looks like this in practice:

[Figure: autonomous test loop]

What surprised me was that Claude is genuinely good at reading Jest stack traces. It correctly identifies which assertion failed, traces it back to the implementation, and patches the right line — usually without requiring any input from me. The loop typically converges in 2-3 iterations for unit tests, occasionally 5-6 for integration tests with DB state issues.
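
Stripped of the model, the loop is "run the test command, read the output, change something, run again". The sketch below spells out that control flow in shell so the exit-code contract is explicit. It is an illustration only: `run_until_green` and its fixer argument are hypothetical stand-ins, and in the real loop Claude plays the role of the fixer.

```shell
# run_until_green MAX_ATTEMPTS TEST_CMD FIX_CMD
# Runs TEST_CMD. On failure, pipes the captured output to FIX_CMD and retries.
# Returns 0 as soon as TEST_CMD exits 0, or 1 after MAX_ATTEMPTS failures.
run_until_green() {
  local max="$1" test_cmd="$2" fix_cmd="$3" attempt=1 output
  while [ "$attempt" -le "$max" ]; do
    # Verify: the exit code decides, the output is kept for the fixer
    if output=$(sh -c "$test_cmd" 2>&1); then
      echo "green after $attempt attempt(s)"
      return 0
    fi
    # Execute: hand the failure log to the fixer on stdin
    printf '%s\n' "$output" | sh -c "$fix_cmd" || true
    attempt=$((attempt + 1))
  done
  echo "still red after $max attempt(s)" >&2
  return 1
}

# Possible usage (./fix.sh is a hypothetical script that reads the log on stdin):
# run_until_green 8 "npm test -- --testPathPattern=InvoiceService" "./fix.sh"
```

The point of the sketch is that "done" is defined by the test command's exit status, not by anyone's opinion of the code.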

For Python projects with pytest, the same pattern applies:

Write pytest tests for src/validators/invoice_validator.py.
Cover happy path and all ValidationError branches.

Then implement the validator.
Run `pytest tests/test_invoice_validator.py -v` after each change.
Iterate until all tests pass. Do not ask for help — read the output yourself.

The key phrase is "do not ask for help — read the output yourself." Without that constraint, Claude defaults to pausing and asking you to confirm the error. You want it to keep going.

What I actually measured: manual debugging cycles for new features used to average around 45 minutes of my time. With this loop running, I'm back to reviewing finished, green-tested code in about 15-18 minutes. That's not Claude being faster — that's Claude doing the iteration work while I do something else.

Section 3: Git Hook Integration — Closing the CI Loop

The next level is triggering this automatically on every push. The idea: Claude analyzes what changed, generates or updates the relevant tests, and runs them before the push completes.

Here's a pre-push hook that does this:

#!/bin/bash
# .git/hooks/pre-push

set -e

echo "Running Claude Code pre-push review..."

# Get changed files against main
CHANGED=$(git diff --name-only origin/main...HEAD | grep -E '\.(ts|js|py)$' || true)

if [ -z "$CHANGED" ]; then
  echo "No source files changed. Skipping."
  exit 0
fi

echo "Changed files:"
echo "$CHANGED"

# Build the prompt dynamically
PROMPT="Review the following changed files and do three things:
1. Identify any functions that lack test coverage
2. Generate tests for untested code paths
3. Run the full test suite and fix any failures you introduced

Changed files:
$CHANGED

Run: npm test
Do not stop until exit code is 0."

# Invoke Claude Code in non-interactive mode
claude --print "$PROMPT"

echo "Pre-push review complete."

Make it executable:

chmod +x .git/hooks/pre-push
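
Before handing anything to Claude, the hook could also do a cheap pre-filter: list changed source files that have no test file at all. The sketch below assumes a layout where `src/services/InvoiceService.ts` is covered by `__tests__/services/InvoiceService.test.ts`. That mapping is an assumption based on the "mirroring src/" convention, and both function names are hypothetical. It only handles `.ts` and `.js` files under `src/`.

```shell
# expected_test_path SRC_FILE
# Maps src/a/b.ts to __tests__/a/b.test.ts (assumed layout, adjust to your repo).
expected_test_path() {
  local rel="${1#src/}"    # drop the leading src/
  local base="${rel%.*}"   # path without extension
  local ext="${rel##*.}"   # extension only
  printf '__tests__/%s.test.%s\n' "$base" "$ext"
}

# list_untested
# Reads file paths on stdin, prints the source files with no matching test file.
list_untested() {
  local file
  while IFS= read -r file; do
    case "$file" in
      */__tests__/*|*.test.*) continue ;;  # skip test files themselves
    esac
    case "$file" in
      src/*.ts|src/*.js) ;;                # only source files under src/
      *) continue ;;
    esac
    [ -f "$(expected_test_path "$file")" ] || printf '%s\n' "$file"
  done
}

# Possible usage inside the hook, run from the repo root:
# git diff --name-only origin/main...HEAD | list_untested
```

A missing file is a much weaker signal than missing coverage, so this would narrow the prompt rather than replace Claude's review.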

[Figure: git hook trigger flow]

The gotcha I hit here: claude --print can exit 0 even if Claude encountered an error mid-session, so the exit code alone tells you nothing. Have the prompt ask Claude to print a marker once the tests pass, then check the output for it:

PROMPT="...same as above...
When all tests pass, output exactly: PIPELINE_SUCCESS"

OUTPUT=$(claude --print "$PROMPT")

if echo "$OUTPUT" | grep -q "PIPELINE_SUCCESS"; then
  echo "Pipeline passed."
  exit 0
else
  echo "Pipeline did not confirm success. Blocking push."
  echo "$OUTPUT"
  exit 1
fi
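
A substring match also passes if Claude writes a sentence that merely mentions the marker, and the marker is still only Claude's own report. A stricter variant, sketched here, requires the marker alone on a line and then reruns the suite from the hook itself. `pipeline_confirmed` is a hypothetical helper name, and the default test command is assumed to be the one listed in CLAUDE.md.

```shell
# pipeline_confirmed OUTPUT [TEST_CMD]
# Succeeds only if OUTPUT has the marker on its own line AND TEST_CMD exits 0.
pipeline_confirmed() {
  local output="$1" test_cmd="${2:-npm test}"
  # -x: whole-line match, so "could not print PIPELINE_SUCCESS" does not count
  if ! printf '%s\n' "$output" | grep -qxF 'PIPELINE_SUCCESS'; then
    echo "marker not found on its own line" >&2
    return 1
  fi
  # Do not rely on the report alone: run the suite once more from the hook
  if ! sh -c "$test_cmd" >/dev/null 2>&1; then
    echo "marker present but the test command failed" >&2
    return 1
  fi
  return 0
}

# Possible usage in the hook:
# OUTPUT=$(claude --print "$PROMPT")
# pipeline_confirmed "$OUTPUT" "npm test" || { echo "$OUTPUT"; exit 1; }
```

The extra test run costs time on every push, so whether it is worth it depends on how long the suite takes.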

Section 4: Variations and Gotchas

Running this in Docker: If your test environment is containerized, pass the test command through the container rather than running locally. Update CLAUDE.md accordingly:

## Commands
- Run tests: `docker compose exec app npm test`
- Single file: `docker compose exec app npm test -- --testPathPattern=<file>`

Claude adapts — it just uses whatever command you give it. The loop structure doesn't change.

Infinite loop prevention: Occasionally Claude will fix one test and break another, entering a cycle. Add a cap to your prompt:

Iterate until tests pass. Maximum 8 attempts. If you haven't fixed it by attempt 8, stop and describe the remaining failure.
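
The attempt cap lives in the prompt, so it depends on Claude counting correctly. If a hard wall-clock limit is also wanted, the hook could wrap the call in `timeout` from GNU coreutils. That is an assumption about your environment: it is not installed by default on macOS. `run_capped` is a hypothetical wrapper, and the 600-second figure in the usage comment is an arbitrary example.

```shell
# run_capped SECONDS COMMAND [ARGS...]
# Runs COMMAND under a wall-clock limit. GNU timeout exits 124 when the
# limit is hit; any other status is passed through from COMMAND.
run_capped() {
  local limit="$1" status=0
  shift
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "command exceeded ${limit}s, giving up" >&2
  fi
  return "$status"
}

# Possible usage in the hook:
# OUTPUT=$(run_capped 600 claude --print "$PROMPT") || exit 1
```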

Environment differences:

| Environment | Test command | Key CLAUDE.md note |
|---|---|---|
| Node/Jest | `npm test` | Add `--forceExit` (`npm test -- --forceExit`) if Jest hangs after the run |
| Python/pytest | `pytest -x` | `-x` stops on first failure, which makes iteration faster |
| Go | `go test ./...` | Add a timeout: `go test -timeout 30s ./...` |
| Docker | `docker compose exec app pytest` | Name the Compose service explicitly (`app` here) |

The CLAUDE.md update cycle: Treat it like code. Every time Claude makes an architecture mistake that CLAUDE.md should have prevented, add a rule. My current file has grown to about 80 lines over four months. The first 20 lines I wrote day one. The rest came from observed failures.

[Figure: feedback into CLAUDE.md]

Closing

The shift that matters here isn't speed — it's where your attention goes. Once the Plan-Execute-Verify loop is running and the pre-push hook is armed, the agent handles the mechanical iteration. You spend time on system design and reviewing finished output, not on chasing red test output.

Start with CLAUDE.md. Get that right first. The automation layers are worthless if the agent doesn't understand your project's constraints. Once CLAUDE.md is solid, add the autonomous test loop. Add the git hook last — by then you'll trust the loop enough to let it gate your pushes.

Next up: wiring Claude Code into a GitHub Actions workflow so the same loop runs on pull requests, not just local pushes.


Seunghyeon's Agentic Lab
