We Built Our Own PR Agents, and You Can Too

$20 for an AI code review? 😔 Building your own PR agent for pennies? 😏
Insights

Mar 27, 2026

12 min read

Brian Giori

Head of Engineering, MCP+Experiment, Amplitude


This article was originally published on X/Twitter. Read the original here.


Anthropic’s Claude Code Action, OpenAI’s Codex GitHub Action, Cursor’s Bugbot Autofix—everyone is shipping PR review bots right now, and they all want you to pay a premium.

These tools are great for generic code review, but what about everything else? What if you want an agent that adds analytics instrumentation to new features, auto-generates missing test coverage, keeps documentation in sync with code changes, or enforces internal API patterns your team cares about—and does it as a PR with human review and one-click publishing?

The pattern I describe in this article scales to any of those use cases. It’s a generic workflow for running custom agent skills on pull requests: the agent reads the diff, does its work in an isolated sandbox, pushes to a separate branch, posts a review with inline comments, and gives the author a one-click button to merge. Swap the skill, and you have a completely different agent.

We shamelessly stole this workflow pattern from @cursor_ai’s Bugbot Autofix, which we use across all our repos internally. Bugbot’s approach is elegant: the agent makes its fix, pushes it to a separate branch, then posts a PR comment with a one-click button to merge the changes in. The human reviews the diff and decides. We generalized that pattern into something you can plug any skill into.

Our first use case: Automatic event tracking at Amplitude

At @Amplitude_HQ, we’re building PR agents that automatically add event tracking instrumentation to pull requests. No human has to think about what to track, write the code, or review whether the analytics are correct. The agent reads the diff, decides what matters, writes the tracking code, validates it compiles, and posts a review—all before the author asks for review.

But the architecture isn't specific to analytics. The same prepare → review → publish pipeline works for any skill you want to run against a PR. Event tracking is just the first one we’re shipping.

The stack

  • Cloudflare Workers: API surface. Receives GitHub webhooks, serves capability links.
  • Cloudflare Workflows: Orchestration. Durable, retryable multi-step execution for long-running agent tasks.
  • Cloudflare Sandbox: Isolation. A disposable Linux container per workflow instance.
  • Flue: Sandbox agent framework. Manages sandboxes, proxies secrets, and orchestrates the agent inside each container. (Thanks @FredKSchott!)
  • OpenCode: Open source, model agnostic coding agent built for flexibility and headless environments. Runs inside each sandbox, editing files, executing shell commands, and reasoning about the codebase. (@opencode)
  • Cloudflare KV: State. Handoff records between prepare and publish phases.

Step 1: Prepare—from PR opened to agent review

When a pull request opens on GitHub, a webhook hits our Cloudflare Worker. The Worker verifies the signature, normalizes the PR metadata, and starts a durable Workflow instance. From there, Flue sets up an isolated Cloudflare Sandbox, clones the repo, checks out the PR branch, and hands off to OpenCode.

OpenCode loads and runs a skill from another repository. Our tracking skill tells the agent to assess the diff, discover existing tracking patterns, plan events and properties, implement the code, and validate the build. But you could swap this for any skill: custom code reviews, generating test coverage for untested code, syncing documentation with API changes, or enforcing internal patterns across a monorepo. The prepare workflow doesn't care what the skill does—it just runs it and collects the structured result.

The workflow ID is derived from the PR number and head SHA, giving us natural deduplication. If the same commit triggers twice, Cloudflare Workflows handles the conflict. Each step.do() call is independently retryable with its own timeout.
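That derivation fits in a few lines. The `pr-prepare` prefix and exact id shape below are assumptions for illustration; the point is that the same PR and head commit always map to the same instance id, so a duplicate delivery collides rather than running twice:

```typescript
// Deterministic workflow instance id from PR number + head SHA.
// Passing this id to Workflows' create() makes duplicate webhook
// deliveries for the same commit conflict instead of double-running.
export function prepareWorkflowId(prNumber: number, headSha: string): string {
  return `pr-prepare-${prNumber}-${headSha.slice(0, 12)}`;
}
```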

class PrPrepareWorkflow extends WorkflowEntrypoint {
  async run(event, step) {
    // Each step.do() is independently retryable with its own timeout.
    await step.do('setup', { timeout: '5 minutes' }, async () => {
      await setupFlueRuntime(flue, sandbox);
      await ensureRepositoryWorkspace({ runtime: flue, repoName, defaultBranch });
      await checkoutPullRequestHeadBranch({ runtime: flue, ... });
    });

    // The agent run gets a long timeout of its own; if it fails, only this
    // step retries instead of redoing the sandbox setup.
    const result = await step.do('agent', { timeout: '1 hour' }, async () => {
      return runSkill(flue.client);
    });

    // Commit, push, post review...
  }
}

We also create GitHub Check Runs at the start of each workflow so the PR shows a check in progress. If the workflow fails, the check fails too, giving authors clear visibility.

Sandboxes and scoped proxies

Cloudflare Sandbox is what makes the whole thing safe. Each workflow instance gets its own disposable container. We clone the repository into it, give the agent shell access, and tear it down when the workflow ends.

const sandbox = getSandbox(env.Sandbox, event.instanceId, {
  sleepAfter: '60m',
});

Crucially, we don't give the agent unrestricted network access. Flue sets up scoped proxies that control exactly what the sandbox can reach:

  • Anthropic API: for model calls
  • GitHub API: read-only by default, with specific write allowances for git pushes and GitHub GraphQL calls

const proxies = {
  anthropic: anthropic(),
  github: github({
    policy: {
      base: 'allow-read',
      allow: [
        { method: 'POST', path: '/graphql', body: githubBody.graphql() },
        { method: 'POST', path: '/**/git-receive-pack' },
      ],
    },
  }),
};
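To make the policy concrete, here is a toy matcher for the "allow-read base plus explicit write allowlist" idea. It is a simplified stand-in for Flue's real proxy, not its implementation; the `/**` prefix handling is an assumption about what that pattern means:

```typescript
type Rule = { method: string; path: string };

// Base policy: reads are always allowed. Writes pass only if an explicit
// rule matches. A rule path starting with "/**" matches any path that
// ends with the remainder (e.g. any repo's /git-receive-pack endpoint).
export function isAllowed(method: string, path: string, allow: Rule[]): boolean {
  if (method === "GET" || method === "HEAD") return true; // allow-read base
  return allow.some((rule) => {
    if (rule.method !== method) return false;
    if (rule.path.startsWith("/**")) return path.endsWith(rule.path.slice(3));
    return rule.path === path;
  });
}
```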

What Flue adds is the proxy layer: scoped, session-based secrets that never touch the container's environment, so the agent can make LLM and GitHub API calls without ever seeing a raw token.

Skills and typed output

Flue gives your workflow three primitives to control what OpenCode does inside the sandbox:

  • flue.shell(command): run a shell command in the sandbox
  • flue.prompt(text, { result }): one-shot LLM call with optional typed output
  • flue.skill(name, { result }): delegate a complex task to OpenCode using a skill file

The key insight is skills. A skill is a markdown file that describes a multi-step task OpenCode should perform autonomously. The agent reads the instructions, uses its tools (shell, file editing, etc.), and returns a structured result. This is what makes the pattern generic—the prepare/publish pipeline stays the same, you just swap the skill.

For our tracking use case, the skill tells the agent to:

  1. Assess the branch diff against the merge base
  2. Discover existing tracking patterns in the codebase
  3. Plan events, properties, and placement
  4. Implement the tracking code and validate the build

And we get back typed output—not prose:

const TrackingResult = v.object({
  trackingRequired: v.boolean(),
  reasoning: v.string(),
  trackingPlan: v.optional(v.array(v.object({
    eventName: v.string(),
    eventProperties: v.array(v.object({
      name: v.string(),
      type: v.string(),
      description: v.string(),
    })),
    eventDescriptionAndReasoning: v.string(),
    implementationLocations: v.array(v.object({
      filePath: v.string(),
      originalLineNumberPreChanges: v.number(),
    })),
  }))),
});

const result = await flue.skill('tracking-implementation', {
  result: TrackingResult,
});

The structured output is what makes everything downstream possible. We use the tracking plan to generate inline review comments, build a diff preview, and decide whether to approve or request changes—all without parsing natural language. A different skill would return a different schema, but the workflow handles it the same way.
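As one illustration of that pipe, a sketch that maps the tracking plan onto GitHub review-comment payloads. The `path`/`line`/`body` shape matches GitHub's pull-request review API; the mapping itself is hypothetical, not Amplitude's exact code:

```typescript
interface PlannedEvent {
  eventName: string;
  eventDescriptionAndReasoning: string;
  implementationLocations: {
    filePath: string;
    originalLineNumberPreChanges: number;
  }[];
}

// One inline review comment per implementation location, built directly
// from the typed tracking plan. No natural-language parsing anywhere.
export function planToReviewComments(plan: PlannedEvent[]) {
  return plan.flatMap((event) =>
    event.implementationLocations.map((loc) => ({
      path: loc.filePath,
      line: loc.originalLineNumberPreChanges,
      body: `**${event.eventName}**: ${event.eventDescriptionAndReasoning}`,
    })),
  );
}
```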

Step 2: Publish—the Bugbot Autofix pattern

This is where we directly stole from Bugbot Autofix. The agent never pushes directly to the PR branch. Instead, it commits to a separate branch, then posts a PR comment with a one-click merge button—exactly like Bugbot does. The human reviews the diff and decides whether the changes land.

This pattern is what makes agent-on-PR workflows trustworthy regardless of what skill is running. Whether the agent added tracking code, generated missing tests, or updated stale documentation—the author always gets to review before anything lands.

When prepare finishes, we generate a capability link—a URL containing a handoff ID and a hashed token—and post it in the PR review comment. When the author clicks it, the Worker validates the token, checks that the PR head hasn't changed, and starts the publish workflow.
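A minimal sketch of that capability flow, assuming SHA-256 token hashing and a record keyed to the head SHA (the names are illustrative; the article's stack stores such handoff records in Cloudflare KV). Only the hash is stored, so a leaked record can't mint valid links, and a force-push to the PR invalidates old ones:

```typescript
import { createHash, randomBytes } from "node:crypto";

function sha256(s: string): string {
  return createHash("sha256").update(s).digest("hex");
}

// The raw token lives only in the capability URL posted to the PR;
// storage keeps just its hash plus the head SHA it was minted for.
export function mintCapability(headSha: string) {
  const token = randomBytes(32).toString("hex");
  return { token, record: { tokenHash: sha256(token), headSha } };
}

// Valid only if the token hashes to the stored value AND the PR head
// hasn't moved since prepare ran (the stale-detection check).
export function validateCapability(
  token: string,
  currentHeadSha: string,
  record: { tokenHash: string; headSha: string },
): boolean {
  return sha256(token) === record.tokenHash && record.headSha === currentHeadSha;
}
```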

The publish workflow spins up a new sandbox, checks out the PR branch, fetches the prepare branch, and merges. If there are conflicts (because the PR moved forward since prepare ran), OpenCode gets a second skill—merge-conflict-resolution—to resolve them. Then we commit, push, resolve the old review threads, and dismiss the prepare review.

Why two phases matter

  • Human stays in control: The agent can do real work, but nothing lands without explicit approval
  • Stale detection: If the PR head changes after prepare, the publish link is automatically invalidated
  • Conflict resolution: The publish workflow can hand merge conflicts back to the agent
  • Auditability: Every prepare creates a real git commit on a named branch, so you can inspect the diff before it lands
  • Skill-agnostic: The prepare/publish pipeline doesn’t change when you add a new skill

Step 3: Re-trigger via comment

Authors can re-run the agent at any time by commenting “@amplitude track” on the PR. The webhook handler picks it up, fetches the latest head SHA, and creates a fresh prepare workflow instance.

This is useful when the PR has changed since the last prepare, or when the author wants a fresh analysis after updating the code. You could extend this to support multiple trigger phrases for different skills—“@amplitude track for analytics,” “@amplitude secure for security,” etc.
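A sketch of such a trigger parser; only the `track` mapping to the tracking skill is described above, so the rest of the registry (and the skill file names) is hypothetical:

```typescript
// Map "@amplitude <phrase>" PR comments to skill names.
// Unknown phrases and unrelated comments are ignored.
const SKILLS: Record<string, string> = {
  track: "tracking-implementation", // the tracking use case above
  secure: "security-review",        // hypothetical second skill
};

export function skillFromComment(body: string): string | null {
  const match = body.trim().match(/^@amplitude\s+(\w+)/i);
  return match ? SKILLS[match[1].toLowerCase()] ?? null : null;
}
```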

What the PR author sees

When the agent finishes, the PR gets:

  1. A GitHub Check Run showing the skill status (e.g. “Amplitude—Tracking changes prepared”)
  2. A PR review with:
  • A summary of what the agent did and why
  • Inline comments on the exact lines where changes were made
  • A collapsible diff preview showing the prepared changes
  • A Merge Changes button (the capability link)
  • Instructions to re-run the skill

If no changes are needed, the agent approves the PR with its reasoning.

If the author clicks merge, the check run updates to show the publish in progress, and the changes land on the PR branch as a merge commit.

Key takeaways

  1. Steal the Bugbot Autofix workflow. (Thanks Bugbot team!) Cursor’s Bugbot Autofix pushes fixes to a separate branch and gives the author a one-click merge button. That’s the right UX pattern for any PR agent: agent works in isolation, human reviews the diff, one click to merge. We copied it wholesale, and it generalizes to any skill.
  2. The pattern is skill-agnostic. The prepare → review → publish pipeline doesn't know or care what the agent does inside the sandbox. Swap the skill markdown file and the result schema, and you have a completely different agent running against the same PRs with the same human-in-the-loop workflow.
  3. Workflows > request handlers for agent tasks. Agent tasks are long-running, flaky, and multi-step. Cloudflare Workflows give you per-step retries, independent timeouts, and durable execution. A raw Worker request would crumble under a 10-minute agent run.
  4. Typed agent output unlocks real automation. If your agent returns prose, you'll spend half your workflow parsing it. If it returns structured data (via Valibot schemas in our case), you can pipe the output directly into GitHub API calls, review comments, and state machines.
  5. Safety is architecture, not prompting. We hash capability tokens instead of storing raw secrets. We scope network access through proxies. We isolate execution in sandboxes. We invalidate handoffs when the PR head changes. None of this is about the model—it’s about the system around it.
About the author

Brian Giori

Head of Engineering, MCP+Experiment, Amplitude

Brian Giori is head of engineering for Amplitude's MCP server and its experimentation platform. He's an AI enthusiast.
Topics

AI

Engineering

Tech Stack

Recommended Reading

  • How Healthcare Teams Use Amplitude AI with Confidence & Safety (Insights, Mar 25, 2026, 11 min read)
  • Amplitude Named to Fast Company’s Most Innovative Companies of 2026 (Company, Mar 24, 2026, 3 min read)
  • Get to Know Amplitude’s New Always-On Data Analysts (Product, Mar 23, 2026, 9 min read)
  • Structuring Documentation for AI Readers (Insights, Mar 20, 2026, 6 min read)
