We Built Our Own PR Agents, and You Can Too
$20 for an AI code review? 😔 Building your own PR agent for pennies? 😏
This article was originally published on X/Twitter.
Anthropic’s Claude Code Action, OpenAI’s Codex GitHub Action, Cursor’s Bugbot Autofix—everyone is shipping PR review bots right now, and they all want you to pay a premium.
These tools are great for generic code review, but what about everything else? What if you want an agent that adds analytics instrumentation to new features, auto-generates missing test coverage, keeps documentation in sync with code changes, or enforces internal API patterns your team cares about—and does it as a PR with human review and one-click publishing?
The pattern I describe in this article scales to any of those use cases. It’s a generic workflow for running custom agent skills on pull requests: the agent reads the diff, does its work in an isolated sandbox, pushes to a separate branch, posts a review with inline comments, and gives the author a one-click button to merge. Swap the skill, and you have a completely different agent.
We shamelessly stole this workflow pattern from @cursor_ai’s Bugbot Autofix, which we use across all our repos internally. Bugbot’s approach is elegant: the agent makes its fix, pushes it to a separate branch, then posts a PR comment with a one-click button to merge the changes in. The human reviews the diff and decides. We generalized that pattern into something you can plug any skill into.
Our first use case: Automatic event tracking at Amplitude
At @Amplitude_HQ, we’re building PR agents that automatically add event tracking instrumentation to pull requests. No human has to think about what to track, write the code, or review whether the analytics are correct. The agent reads the diff, decides what matters, writes the tracking code, validates it compiles, and posts a review—all before the author asks for review.
But the architecture isn't specific to analytics. The same prepare → review → publish pipeline works for any skill you want to run against a PR. Event tracking is just the first one we’re shipping.
The stack
- Cloudflare Workers: API surface. Receives GitHub webhooks, serves capability links.
- Cloudflare Workflows: Orchestration. Durable, retryable multi-step execution for long-running agent tasks.
- Cloudflare Sandbox: Isolation. A disposable Linux container per workflow instance.
- Flue: Sandbox agent framework. Manages sandboxes, proxies secrets, and orchestrates the agent inside each container. (Thanks @FredKSchott!)
- OpenCode: Open source, model agnostic coding agent built for flexibility and headless environments. Runs inside each sandbox, editing files, executing shell commands, and reasoning about the codebase. (@opencode)
- Cloudflare KV: State. Handoff records between prepare and publish phases.
Step 1: Prepare—from PR opened to agent review
When a pull request opens on GitHub, a webhook hits our Cloudflare Worker. The Worker verifies the signature, normalizes the PR metadata, and starts a durable Workflow instance. From there, Flue sets up an isolated Cloudflare Sandbox, clones the repo, checks out the PR branch, and hands off to OpenCode.
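The signature check is the first gate. A minimal sketch, assuming a shared webhook secret: GitHub signs the raw request body with HMAC-SHA256 and sends the result in the `x-hub-signature-256` header, which the Worker verifies before trusting any payload. (`verifyGithubSignature` is a hypothetical helper; shown with `node:crypto` for brevity, though a Worker would typically use Web Crypto.)

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify GitHub's HMAC-SHA256 webhook signature against the raw body.
// Constant-time comparison avoids leaking the expected digest byte by byte.
function verifyGithubSignature(
  rawBody: string,
  signatureHeader: string,
  secret: string,
): boolean {
  const expected =
    "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Only after this check passes does the Worker parse the payload and start the workflow.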
OpenCode loads and runs a skill from another repository. Our tracking skill tells the agent to assess the diff, discover existing tracking patterns, plan events and properties, implement the code, and validate the build. But you could swap this for any skill: custom code reviews, generating test coverage for untested code, syncing documentation with API changes, or enforcing internal patterns across a monorepo. The prepare workflow doesn't care what the skill does—it just runs it and collects the structured result.
The workflow ID is derived from the PR number and head SHA, giving us natural deduplication. If the same commit triggers twice, Cloudflare Workflows handles the conflict. Each step.do() call is independently retryable with its own timeout.
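The derivation itself is trivial; a sketch, with an illustrative ID format and binding name that may differ from ours:

```typescript
// Deterministic workflow instance ID from PR number + head SHA. Creating a
// second instance with the same ID fails, which is what gives us the
// natural deduplication described above.
function prepareWorkflowId(prNumber: number, headSha: string): string {
  return `prepare-pr-${prNumber}-${headSha.slice(0, 12)}`;
}

// e.g. await env.PR_PREPARE_WORKFLOW.create({
//   id: prepareWorkflowId(pr.number, pr.head.sha),
//   params: { ... },
// });
```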
```ts
class PrPrepareWorkflow extends WorkflowEntrypoint {
  async run(event, step) {
    await step.do('setup', { timeout: '5 minutes' }, async () => {
      await setupFlueRuntime(flue, sandbox);
      await ensureRepositoryWorkspace({ runtime: flue, repoName, defaultBranch });
      await checkoutPullRequestHeadBranch({ runtime: flue, ... });
    });

    const result = await step.do('agent', { timeout: '1 hour' }, async () => {
      return runSkill(flue.client);
    });

    // Commit, push, post review...
  }
}
```

We also create GitHub Check Runs at the start of each workflow so the PR shows a check in progress. If the workflow fails, the check fails too, giving authors clear visibility.
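As a sketch (the check name and helper are illustrative, not our exact code), the payloads follow the shape of GitHub's "create a check run" REST API:

```typescript
// Build a Check Run payload: "in_progress" when the workflow starts,
// "completed" with a conclusion when it finishes or fails. Undefined
// fields would be dropped before sending the request.
function checkRunPayload(
  headSha: string,
  status: "in_progress" | "completed",
  conclusion?: "success" | "failure",
) {
  return {
    name: "Amplitude / Tracking", // illustrative check name
    head_sha: headSha,
    status,
    conclusion,
  };
}
```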
Sandboxes and scoped proxies
Cloudflare Sandbox is what makes the whole thing safe. Each workflow instance gets its own disposable container. We clone the repository into it, give the agent shell access, and tear it down when the workflow ends.
```ts
const sandbox = getSandbox(env.Sandbox, event.instanceId, {
  sleepAfter: '60m',
});
```

Crucially, we don't give the agent unrestricted network access. Flue sets up scoped proxies that control exactly what the sandbox can reach:
- Anthropic API: for model calls
- GitHub API: read-only by default, with specific write permissions for git pushes and GitHub GraphQL mutations
```ts
const proxies = {
  anthropic: anthropic(),
  github: github({
    policy: {
      base: 'allow-read',
      allow: [
        { method: 'POST', path: '/graphql', body: githubBody.graphql() },
        { method: 'POST', path: '/**/git-receive-pack' },
      ],
    },
  }),
};
```

What Flue adds is the proxy layer: scoped, session-based secrets that never touch the container's environment, so the agent can make LLM and GitHub API calls without ever seeing a raw token.
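Conceptually, an "allow-read with explicit write exceptions" policy evaluates each proxied request like the sketch below. Flue's real matcher is more sophisticated; this is only meant to show the semantics of the rule shape above.

```typescript
type Rule = { method: string; path: string };

// Allow all reads; allow writes only when an explicit rule matches.
// `**` in a rule path matches any path segment(s), as in the config above.
function isAllowed(method: string, path: string, allow: Rule[]): boolean {
  if (method === "GET" || method === "HEAD") return true; // allow-read base
  return allow.some(
    (r) =>
      r.method === method &&
      new RegExp("^" + r.path.replace(/\*\*/g, ".*") + "$").test(path),
  );
}
```

Everything else, including deletes and arbitrary POSTs, falls through to a deny.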
Skills and typed output
Flue gives your workflow three primitives to control what OpenCode does inside the sandbox:
- flue.shell(command): run a shell command in the sandbox
- flue.prompt(text, { result }): one-shot LLM call with optional typed output
- flue.skill(name, { result }): delegate a complex task to OpenCode using a skill file
The key insight is skills. A skill is a markdown file that describes a multi-step task OpenCode should perform autonomously. The agent reads the instructions, uses its tools (shell, file editing, etc.), and returns a structured result. This is what makes the pattern generic—the prepare/publish pipeline stays the same, you just swap the skill.
For our tracking use case, the skill tells the agent to:
- Assess the branch diff against the merge base
- Discover existing tracking patterns in the codebase
- Plan events, properties, and placement
- Implement the tracking code and validate the build
And we get back typed output—not prose:
```ts
const TrackingResult = v.object({
  trackingRequired: v.boolean(),
  reasoning: v.string(),
  trackingPlan: v.optional(v.array(v.object({
    eventName: v.string(),
    eventProperties: v.array(v.object({
      name: v.string(),
      type: v.string(),
      description: v.string(),
    })),
    eventDescriptionAndReasoning: v.string(),
    implementationLocations: v.array(v.object({
      filePath: v.string(),
      originalLineNumberPreChanges: v.number(),
    })),
  }))),
});

const result = await flue.skill('tracking-implementation', {
  result: TrackingResult,
});
```

The structured output is what makes everything downstream possible. We use the tracking plan to generate inline review comments, build a diff preview, and decide whether to approve or request changes—all without parsing natural language. A different skill would return a different schema, but the workflow handles it the same way.
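To make that concrete, here is a hypothetical mapping from the typed tracking plan to inline review comment drafts. The field names come from the TrackingResult schema above; the comment shape (path/line/body) follows GitHub's pull request review API.

```typescript
interface PlannedEvent {
  eventName: string;
  eventDescriptionAndReasoning: string;
  implementationLocations: {
    filePath: string;
    originalLineNumberPreChanges: number;
  }[];
}

// One inline comment per implementation location: the typed plan maps
// directly to GitHub review comments with no natural-language parsing.
function toReviewComments(plan: PlannedEvent[]) {
  return plan.flatMap((event) =>
    event.implementationLocations.map((loc) => ({
      path: loc.filePath,
      line: loc.originalLineNumberPreChanges,
      body: `**${event.eventName}**\n\n${event.eventDescriptionAndReasoning}`,
    })),
  );
}
```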
Step 2: Publish—the Bugbot Autofix pattern
This is where we directly stole from Bugbot Autofix. The agent never pushes directly to the PR branch. Instead, it commits to a separate branch, then posts a PR comment with a one-click merge button—exactly like Bugbot does. The human reviews the diff and decides whether the changes land.
This pattern is what makes agent-on-PR workflows trustworthy regardless of what skill is running. Whether the agent added tracking code, generated missing tests, or updated stale documentation—the author always gets to review before anything lands.
When prepare finishes, we generate a capability link—a URL containing a handoff ID and a hashed token—and post it in the PR review comment. When the author clicks it, the Worker validates the token, checks that the PR head hasn't changed, and starts the publish workflow.
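A minimal sketch of that issuance and validation, with illustrative names (`mintCapability`, the URL format) rather than our actual code. The important property: only a hash of the token is stored, and validation is bound to the head SHA recorded at prepare time.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Mint a capability link: the raw token goes into the URL, only its
// SHA-256 hash is persisted (e.g. in KV, keyed by handoffId).
function mintCapability(handoffId: string, headSha: string) {
  const token = randomBytes(32).toString("hex");
  const tokenHash = createHash("sha256").update(token).digest("hex");
  return {
    url: `/publish/${handoffId}?token=${token}`,
    record: { tokenHash, headSha },
  };
}

// Validate on click: token must hash to the stored value, and the PR head
// must not have moved since prepare ran (stale links are rejected).
function validateCapability(
  token: string,
  record: { tokenHash: string; headSha: string },
  currentHeadSha: string,
): boolean {
  const hash = createHash("sha256").update(token).digest("hex");
  return hash === record.tokenHash && record.headSha === currentHeadSha;
}
```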
The publish workflow spins up a new sandbox, checks out the PR branch, fetches the prepare branch, and merges. If there are conflicts (because the PR moved forward since prepare ran), OpenCode gets a second skill—merge-conflict-resolution—to resolve them. Then we commit, push, resolve the old review threads, and dismiss the prepare review.
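The git side of those steps can be sketched as the command sequence the sandbox runs (branch names illustrative). A conflicting merge exits non-zero, which is the signal to hand the sandbox over to the merge-conflict-resolution skill before retrying the push.

```typescript
// Git commands for the publish phase: merge the prepare branch into the
// PR branch as a merge commit, then push. Run in order; stop on failure.
function publishCommands(prBranch: string, prepareBranch: string): string[] {
  return [
    `git checkout ${prBranch}`,
    `git fetch origin ${prepareBranch}`,
    `git merge --no-ff origin/${prepareBranch} -m "Merge prepared changes"`,
    `git push origin ${prBranch}`,
  ];
}
```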
Why two phases matter
- Human stays in control: The agent can do real work, but nothing lands without explicit approval
- Stale detection: If the PR head changes after prepare, the publish link is automatically invalidated
- Conflict resolution: The publish workflow can hand merge conflicts back to the agent
- Auditability: Every prepare creates a real git commit on a named branch, so you can inspect the diff before it lands
- Skill-agnostic: The prepare/publish pipeline doesn’t change when you add a new skill
Step 3: Re-trigger via comment
Authors can re-run the agent at any time by commenting “@amplitude track” on the PR. The webhook handler picks it up, fetches the latest head SHA, and creates a fresh prepare workflow instance.
This is useful when the PR has changed since the last prepare, or when the author wants a fresh analysis after updating the code. You could extend this to support multiple trigger phrases for different skills—“@amplitude track for analytics,” “@amplitude secure for security,” etc.
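A hypothetical parser for that multi-skill extension (the skill names here are examples from the paragraph above, not shipped triggers):

```typescript
// Map an "@amplitude <skill>" PR comment to a skill name, or null if the
// comment is not a trigger. Case-insensitive; ignores surrounding text
// after the trigger phrase.
function parseTrigger(comment: string): string | null {
  const match = comment.trim().match(/^@amplitude\s+(track|secure|docs)\b/i);
  return match ? match[1].toLowerCase() : null;
}
```

The webhook handler would look up the matched skill, fetch the latest head SHA, and create a fresh prepare workflow instance for it.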
What the PR author sees
When the agent finishes, the PR gets:
- A GitHub Check Run showing the skill status (e.g. “Amplitude—Tracking changes prepared”)
- A PR review with:
- A summary of what the agent did and why
- Inline comments on the exact lines where changes were made
- A collapsible diff preview showing the prepared changes
- A Merge Changes button (the capability link)
- Instructions to re-run the skill
If no changes are needed, the agent approves the PR with its reasoning.
If the author clicks merge, the check run updates to show the publish in progress, and the changes land on the PR branch as a merge commit.
Key takeaways
- Steal the Bugbot Autofix workflow. (Thanks Bugbot team!) Cursor’s Bugbot Autofix pushes fixes to a separate branch and gives the author a one-click merge button. That’s the right UX pattern for any PR agent: agent works in isolation, human reviews the diff, one click to merge. We copied it wholesale, and it generalizes to any skill.
- The pattern is skill-agnostic. The prepare → review → publish pipeline doesn't know or care what the agent does inside the sandbox. Swap the skill markdown file and the result schema, and you have a completely different agent running against the same PRs with the same human-in-the-loop workflow.
- Workflows > request handlers for agent tasks. Agent tasks are long-running, flaky, and multi-step. Cloudflare Workflows give you per-step retries, independent timeouts, and durable execution. A raw Worker request would crumble under a 10-minute agent run.
- Typed agent output unlocks real automation. If your agent returns prose, you'll spend half your workflow parsing it. If it returns structured data (via Valibot schemas in our case), you can pipe the output directly into GitHub API calls, review comments, and state machines.
- Safety is architecture, not prompting. We hash capability tokens instead of storing raw secrets. We scope network access through proxies. We isolate execution in sandboxes. We invalidate handoffs when the PR head changes. None of this is about the model—it’s about the system around it.

Brian Giori
Head of Engineering, MCP+Experiment, Amplitude
Brian Giori is head of engineering for Amplitude's MCP server and its experimentation platform. He's an AI enthusiast.