Three-Body Agent

An autonomous development pipeline on GitHub Actions + Claude Code.

Five workflows, most of them on a schedule. The Implementer picks the next p0 issue from the project board and opens a PR. The Fixer chases CI until it's green. The Merger ships when reviews land. Board Sync and Rollover keep the queue tidy. No human in the loop until the morning standup.

.github/workflows/autoagent-implementer.yml
name: '[AUTOAGENT] Implementer'

on:
  schedule:
    - cron: '0 * * * *'      # hourly
  workflow_dispatch:

concurrency:
  group: autoagent-implementer
  cancel-in-progress: false

# pick next p0 → branch → code → push → open PR

The pitch is short. You open a GitHub Project board, drop issues on it ranked p0 through p5, walk away, and come back the next morning to a stack of merged PRs and a Telegram thread showing what got shipped.

That’s the whole loop. The interesting part is what’s underneath.

The three bodies

The system is shell scripts and GraphQL. Three workflows run on schedule, each with its own orbit.

The Implementer runs hourly. It scans the current week’s milestone, picks the highest-priority Todo issue, hands it to Claude Code, opens a PR.

The Fixer runs every 30 minutes. It watches PRs flagged for revision, reads the feedback, asks Claude Code to address it, and pushes the patch.

The Merger runs every two hours. It promotes green PRs that have been stable long enough to be trusted.
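What "stable long enough" means is a policy choice. A minimal sketch in Python of one way to express it, assuming a PR counts as stable once it is green, approved, and has seen no new push for a fixed window. The `PullRequest` fields and the six-hour window are illustrative assumptions, not values from the repo, which implements its logic in shell:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class PullRequest:
    # Hypothetical shape; the real data would come from the GitHub API.
    number: int
    checks_green: bool
    approved: bool
    last_pushed_at: datetime  # time of the most recent push to the PR branch


# Assumed window; tune to how long a green PR must sit before it is trusted.
MIN_STABLE = timedelta(hours=6)


def is_mergeable(pr: PullRequest, now: datetime,
                 min_stable: timedelta = MIN_STABLE) -> bool:
    """Return True when the PR is green, approved, and has been quiet long enough."""
    if not (pr.checks_green and pr.approved):
        return False
    return now - pr.last_pushed_at >= min_stable
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check easy to test against fixed timestamps.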

Two more support the loop. Board Sync listens for PR events and moves the board column. Rollover runs weekly, sweeping unfinished work into next week’s milestone.

The “three bodies” framing is the joke. They orbit each other, each pulling at the others’ work, and the whole arrangement is stable as long as nothing falls into a degenerate state. (Spoiler: things do fall into degenerate states. That’s what the priority labels and the paused column are for.)

What “flagged for revision” actually means is up to your setup. In mine, it can be any of three signals: CI failed, a human left a review comment, or a separate LLM acting as critic posted its verdict on the PR.
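The three signals can be folded into a single check. A rough Python sketch, assuming the PR state has already been fetched; the `PullRequestState` fields and the verdict strings are hypothetical names chosen for illustration:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PullRequestState:
    # Hypothetical fields, one per signal named above.
    ci_failed: bool = False
    open_human_comments: int = 0
    critic_verdict: str = "none"  # assumed values: "none", "pass", "revise"


def revision_reasons(pr: PullRequestState) -> List[str]:
    """List every signal that currently flags the PR for revision."""
    reasons = []
    if pr.ci_failed:
        reasons.append("ci_failed")
    if pr.open_human_comments > 0:
        reasons.append("human_review")
    if pr.critic_verdict == "revise":
        reasons.append("critic")
    return reasons


def needs_revision(pr: PullRequestState) -> bool:
    """A PR is flagged when at least one signal fires."""
    return bool(revision_reasons(pr))
```

Returning the list of reasons, not just a boolean, lets the Fixer's prompt say which kind of feedback it is responding to.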

The LLM acting as critic is the interesting one. Given a clear rubric of what counts as a critical issue versus a medium one, it tells the Fixer what to prioritise. And it’s deliberately stateless, which keeps it from anchoring on earlier passes: a fresh process per call, no memory of prior fixes, no context on what was already done. Every pass reads the diff as new code. A critic with memory drifts toward “looks good, you fixed it last time” instead of catching the new failure mode the last fix introduced.
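One way the rubric could translate into an ordering for the Fixer, sketched in Python. The finding format is hypothetical, and the "low" level is an assumption added for completeness; only "critical" and "medium" come from the rubric described above:

```python
from typing import Dict, List

# Assumed rubric levels, most urgent first.
SEVERITY_ORDER = {"critical": 0, "medium": 1, "low": 2}


def prioritise(findings: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Order critic findings so the Fixer sees the most severe first.

    sorted() is stable, so findings of equal severity keep the critic's order.
    Unknown severities sort last.
    """
    return sorted(
        findings,
        key=lambda f: SEVERITY_ORDER.get(f["severity"], len(SEVERITY_ORDER)),
    )
```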

One ticket’s round trip through the three workflows looks like this. The first attempt comes back flagged for revision (CI red, in this run), the Fixer hands the feedback to Claude Code and pushes the patch, the second attempt passes, and the Merger takes it to main.

What it runs on

Nothing exotic.

GitHub Actions runs the cron and the runner. GitHub Projects V2 is the shared state: columns are status, milestones are weeks, labels are priority. The Claude Code CLI is the worker; the Implementer hands it a prompt that includes the issue body, the file tree, and the current diff, while the Fixer hands it the reviewer’s feedback and asks for a patch.

Claude Code is what I run, but the loop doesn’t care which model sits behind the CLI; any frontier LLM with a coding CLI slots in. Local models work too, with the caveat that they need a serious setup behind them: a server that handles long contexts and tool calls reliably. A local model that drops half its tool calls or chokes on a 60k-token prompt costs more in harness debugging than it saves in inference.

Telegram is what I use for visibility, but any messaging channel with a bot API slots in: Slack, Discord, a webhook to anything you already check. It posts a message at every transition so I can watch from my phone.
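For reference, a transition notification boils down to one POST to the Telegram Bot API's `sendMessage` method. The sketch below builds that request in Python without sending it; the message format and function names are assumptions for illustration, and the repo itself would do the equivalent with curl:

```python
import json
import urllib.request


def build_telegram_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a Bot API sendMessage request."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transition_message(pr_number: int, old: str, new: str) -> str:
    """Hypothetical message format for a board transition."""
    return f"PR #{pr_number}: {old} -> {new}"


# Sending is one call; left commented out so the sketch has no side effects.
# TOKEN and CHAT_ID stand in for the values held in the repository secrets.
# urllib.request.urlopen(build_telegram_request(TOKEN, CHAT_ID, "hello"))
```

Swapping in Slack or Discord would mean changing only the URL and payload shape in `build_telegram_request`.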

No framework, no SDK, no dependencies beyond gh, jq, and curl. Everything is auditable in the Actions logs and version-controlled next to the code it operates on.

What surprises people

The Implementer sorts by priority and then by issue-body length. Longer descriptions tend to produce cleaner PRs, because Claude has more constraint to work with. Empty issues fail more often. So the bot self-selects toward issues you’ve actually thought about.
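The selection rule is small enough to sketch. This Python version assumes that longer bodies win ties within a priority level, which is how I read the paragraph above, and it uses a hypothetical `Issue` shape in place of the real board query:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Issue:
    # Hypothetical shape; the real fields would come from the project board query.
    number: int
    priority: int  # 0 for p0 ... 5 for p5
    body: str
    status: str = "Todo"


def pick_next(issues: List[Issue]) -> Optional[Issue]:
    """Pick the Todo issue with the best priority, longest body breaking ties."""
    todo = [i for i in issues if i.status == "Todo"]
    if not todo:
        return None
    # Lower priority number wins; negate body length so longer bodies sort first.
    return min(todo, key=lambda i: (i.priority, -len(i.body)))
```

With an empty p0 and a well-described p0 both in Todo, the described one is picked first, and the empty one waits until nothing better is left at that priority.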

The Fixer is the most-used workflow, by a wide margin. I expected the Implementer to do the heavy lifting. In practice, the first attempt almost never lands green. The usual culprits: flaky tests, missing dependencies, a type-narrowing issue. The Fixer is what turns attempts into merges. It runs more often, and it earns its keep more often.

Telegram is load-bearing. The notifications are the only place you can tell the system to stop before it pushes another bad PR.

What’s deliberately out of scope

No auto-deploy. Three-Body merges to main; whatever picks up main for deploy is your concern.

No model selection. The CLI uses whatever you’ve configured; Three-Body just hands it work.

No “agent supervisor.” Three-Body is the supervisor. Adding another layer above is where these systems start to lose their pragmatism.

What I’d do differently

The cron-based polling is wasteful. Three workflows wake up every hour, every thirty minutes, every two hours, whether there is anything to do or not. Most of the runs are no-ops. GitHub does have webhooks for this: both external HTTP webhooks and internal workflow triggers like on: pull_request. The Fixer and the Merger should really be wired to PR events instead of the clock. Both react to PR state changes, so an event filter would wake them at exactly the right moments and let them stay quiet the rest of the time. Only the Implementer genuinely needs a clock, because it polls the board for new tickets and there is no GitHub event for “the board has work waiting”.

The Fixer has no give-up rule. It loops on the same PR forever, which is fine until it isn’t. After three failed attempts at the same error, I want it to label the PR paused, post once to Telegram, and stop burning credits.
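A give-up rule along those lines could look like the following Python sketch. It assumes "three failed attempts at the same error" means three consecutive failures with an identical error signature; how an error gets normalised into a signature is left open, and the names are hypothetical:

```python
from typing import List

MAX_SAME_ERROR = 3  # threshold taken from the paragraph above


def should_pause(error_history: List[str], limit: int = MAX_SAME_ERROR) -> bool:
    """Return True when the last `limit` attempts all failed with the same error.

    `error_history` holds one normalised error signature per failed attempt,
    oldest first.
    """
    if len(error_history) < limit:
        return False
    recent = error_history[-limit:]
    return len(set(recent)) == 1
```

When `should_pause` returns True, the Fixer would label the PR paused, post once to Telegram, and skip the PR on later runs.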

Sub-agent runs read back into the same logs as the main thread, and that blurs the diagnosis when the Fixer is trying to understand why something failed. A separate transcript per attempt would make the post-mortem much cleaner.

Try it

Repo: https://github.com/a7t-ai/three-body-agent. The README walks through:

  1. Fork it onto a target repo.
  2. Create a GitHub Project with the columns: Todo · In Progress · Ready for QA · Done.
  3. Add labels p0 through p5.
  4. Set the milestone for this week.
  5. Add the secrets the workflows need (CLAUDE_CODE_API_KEY, TELEGRAM_BOT_TOKEN, and so on).
  6. Drop one issue on the board and watch the Implementer pick it up next hour.

Three-Body Agent: the Playbook

A booklet from a7t.ai, €19. The full architecture in PDF form. Fourteen chapters: the three roles, the two reviews, the board and triggers, the failure catalogue, the cost dashboards. Refundable for fourteen days.

What I’d do for you

This pattern transfers to teams that want a “junior engineer who never sleeps” running on their backlog. The right shape for you depends on three things: what your main looks like (protected? CODEOWNERS? required reviews?), what your test surface costs to run, and who reviews before merge. Tell me those three, and we can spec the right adaptation in 45 minutes.