What I Learned Pointing a Ralph Loop at My Product for a Week

With Amplitude data for feedback, taking the human out of the loop is not “unpossible.”
Insights

May 13, 2026

12 min read

Eric Carlson

Chief AI Architect, Amplitude

Ralph Loops Product Improvements

Amplitude recently hosted AI Week, a time dedicated to upending our normal work process to focus on a fully different AI-native model. As a data scientist by background, I wanted to run one experiment: can I give a coding agent a clear objective function, scaffold it with an orchestration system, then run a Ralph Wiggum loop to autonomously build a product?

Kick it off, walk away, see what happened. My app was a backcountry route planning app I had been toying with. The agent was Claude Code with browser use enabled. The orchestrator was Amplitude’s Opportunity Finder, a new experimental feature that’s supposed to identify product signals, surface improvement opportunities, then draft specs and PRs that coding agents pick up. The constraint I set for myself was that I would not intervene. No prompts, no nudges, just up-front goal definition.

By the end of the week, my app had 102 shipped features. Slope-angle overlays on a 3D Mount Hood. A physics-based avalanche runout simulation that modeled how snow would flow down a canyon before a skier ever dropped in. A mushroom foraging prediction model grounded in species-specific fruiting weather and micro-climate estimates from elevation and aspect. Weather tabs, resort data, mountain biking routes, kayaking maps. Every one of them shipped with a browser-recorded GIF validating that the feature worked end-to-end.

The features were impressive, but what really surprised me was the Ralph loop itself and the techniques that helped it run effectively. Here’s what I learned.

The Ralph loop, as I ran it

The Ralph loop is named after the Simpsons character Ralph Wiggum, who optimistically and persistently proceeds through life, even when incompetent. For the programmer, it is just a while loop over Claude Code. When one session completes, it fires up another one, always in full auto-mode. For the computer/data scientist, it is an intelligent global optimizer with a rigorous objective function that stops only after convergence, if ever.

The version I ran had three cycles: build the next opportunity, verify it in a browser, and generate more opportunities from the new product state. I pointed it at the app, gave it very high-level goals and constraints (examine competitors, optimize for fun planning), and let it run.
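Stripped of the agent specifics, the shape of that three-cycle loop can be sketched in a few lines. All the names here are illustrative stand-ins, not Amplitude or Claude Code APIs; each callable represents one full agent session.

```python
def ralph_loop(build, verify, find_opportunities, queue, max_cycles=None):
    """Minimal sketch of the three-cycle Ralph loop: build the next
    opportunity, verify it in a browser, and generate more opportunities
    from the new product state. Runs until the queue empties (convergence)
    or the optional cycle cap is hit."""
    cycles = 0
    while queue and (max_cycles is None or cycles < max_cycles):
        opportunity = queue.pop(0)          # highest-ranked item first
        feature = build(opportunity)        # one full auto-mode agent session
        if verify(feature):                 # browser click-through, not just CI
            queue.extend(find_opportunities(feature))  # refresh the queue
        cycles += 1
    return cycles
```

The important property is that the queue is refreshed from the product's new state each cycle, which is what separates this from a bare `while True` over an agent.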

Ralph loops are easy to write, but without precise definitions of how they should hill-climb, you're just burning tokens (like Ralph's famous line, "It tastes like ... burning."). My success depended on defining how to generate opportunities, how verification was measured, and how the outcomes of one cycle became the inputs to the next.

Going in, I frankly underestimated how effective this system would be for building my ski app.

Where the opportunities came from

Once it has a goal, a Ralph loop needs an input queue. You can just ask the agent to pick things on its own, but I wanted to try giving it a little more guidance, so I connected it to Amplitude’s experimental Opportunity Finder.

The Opportunity Finder is supposed to work as an "AI PM," identifying high-value tasks from signals in your analytics, session replays, customer feedback, agent traces, and competitive gaps, then drafting specs for a coding agent to pick up. For my Ralph loop, each opportunity arrived as a structured object: a one-line problem statement, a few-sentence proposed solution, and behavioral evidence for why it was worth picking up.
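As a rough sketch, each opportunity behaved like a small structured record. The field names below are my own shorthand for illustration, not the Opportunity Finder's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Opportunity:
    """Illustrative shape of one item in the opportunity queue."""
    problem: str             # one-line problem statement
    proposed_solution: str   # a few sentences on what to build
    evidence: list = field(default_factory=list)  # behavioral signals behind the pick
    rank: float = 0.0        # priority assigned by the dispatcher

def next_opportunity(queue):
    """The loop always works the highest-ranked opportunity first."""
    return max(queue, key=lambda o: o.rank)
```

The `rank` field is what makes the queue an input the loop can trust: it encodes the dispatcher's judgment so the agent doesn't have to improvise priorities from its priors.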

This was load-bearing in a way I did not fully appreciate before the week started. Without a structured opportunity queue, a forever-loop just asks the model “What should I build next?” and the answer drifts into whatever the model’s priors happen to be about ski apps. With the queue, the loop had a ranked input that was tied to the product’s actual goals and refreshed every cycle with whatever the previous cycle had produced.

My agent instrumented itself

I think this is the part that mattered most, and it is the part I almost did not do.

My agent did not just ship features. It wired up its own telemetry. Amplitude events, full session replay, metric definitions tied back to the opportunities that had proposed the feature in the first place. Every feature the agent built started reporting back on itself the moment it shipped. It really moved at the speed of feedback.
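In spirit, the pattern looks like the sketch below (not the actual Amplitude SDK calls): every event a feature emits carries the id of the opportunity that proposed it, so the next cycle can rank against real usage rather than guesswork.

```python
def make_tracker(sink):
    """Illustrative self-instrumentation helper. `sink` is wherever events
    land; in the real setup this would be an analytics client rather than a
    Python list. Each event is tagged with the opportunity that proposed
    the feature, tying usage data back to the spec that created it."""
    def track(event_type, opportunity_id, **properties):
        sink.append({
            "event_type": event_type,
            "opportunity_id": opportunity_id,  # links usage back to the spec
            **properties,
        })
    return track
```

With that tag in place, "what got used, what got ignored, where sessions got stuck" becomes a per-opportunity query instead of an open-ended question.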

Anyone who has gotten stuck with a single model trying to fix a bug over and over knows how that can fail. Amplitude provided a feedback machine that let Claude mine for the next thing to do rather than spinning its wheels. These external signals seemed to knock the agents out of mode collapse. Maybe the product features would be buggy at first, but they would heal themselves as the bugs were observed.

When the agent instruments its own output, the next cycle has behavioral evidence (albeit mostly synthetic behaviors) to work from, including what got used, what got ignored, where sessions got stuck. The Opportunity Finder ranked against that evidence.

The second cycle got smarter than the first. The tenth got smarter than the second. That is what compounding looks like in this setup, and it almost did not happen without the telemetry being a first-class part of what the agent ships. Not a thing I bolted on afterward.

Browser verification

I ran Claude Code with browser use enabled, so the agent had to click through the app it had just changed. Every cycle, the agent opened the browser, used the feature it just built, and recorded a GIF of the click-through.

My ski planning app didn't have a lot of real users, and I initially thought this would be a barrier to the machine working. But it turns out that the agent's browser-based verification drove a lot of synthetic traffic through parts of the site, which allowed Amplitude to pick up on flow oddities, bugs, and improvements. Session replays of these agent runs provided visual feedback that Amplitude's Opportunity Finder agents could use to understand the user journeys and features holistically, at a higher level than code alone.

This approach caught a class of bug that unit tests miss. Let's say the feature compiles, the tests pass, and the button renders, but the dropdown is wired to the wrong handler, or the form submits the wrong field, or a state update never propagates. An agent actually clicking through in a browser finds that in seconds. A green CI run does not.

The GIF was the artifact. Every PR had one attached. It was documentation, verification, and evidence in the same file. When I came back after a stretch of agent time and wanted to know whether feature #73 actually worked, I watched the GIF. I did not read the diff.
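The gate itself is simple to state; the key design choice is that the pass/fail signal and the artifact come out of the same recorded click-through. An illustrative sketch, not the actual harness:

```python
def verify_feature(run_recorded_session):
    """Browser verification sketch. `run_recorded_session` drives the
    just-built feature in a real browser while recording, and returns
    (passed, gif_path). The GIF rides along on the PR as documentation,
    verification, and evidence in one file; a green unit-test run alone
    never clears this gate."""
    passed, gif_path = run_recorded_session()
    return {"merge": passed, "artifact": gif_path}
```

Because the artifact is a recording of the verification itself, checking whether feature #73 worked is a matter of watching the GIF, not reading the diff.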

The features I didn’t know I needed

Going in, I expected the loop to mostly generate fixes and refinements, iterating on things I had already scoped. Instead, it proposed entire feature categories I had not asked for.

The mushroom foraging model is the one I keep coming back to. I never told the agent to build it. The Opportunity Finder, doing its own competitive research against other outdoor planning apps, decided that a foraging feature was a gap worth closing. Then it researched when different species fruit, pulled historical weather data, and built a prediction methodology in the style of a field guide, grounded in species-specific micro-climate estimates from elevation and aspect. I watched the GIF of the feature being used to find morels near a trailhead and thought: I would not have built this. But I'm delighted that the agent did.

The avalanche runout simulator is similar. I had scoped "slope angle visualization and route hazard highlighting" as a goal. The loop noticed that slope angle alone is not how backcountry skiers actually evaluate risk; instead, they evaluate terrain traps, runout paths, and trigger points. So it built a physics simulation that models how snow would flow down a specific canyon before a skier drops in.

The Ralph loop did overnight what I would have spent a week on as a PM, along with a handful of physics-savvy engineers. It looked at what competitors shipped this quarter, read session replays, synthesized the top three opportunities. Not always as well as I would have done it, but consistently, and across a wider surface than I would have covered alone. “Define goals and let the system converge” was the name of the game.

Risking auto-merge

The last manual step in the loop, the one I still had my hand on, was merge. By the end of the week, I was letting narrow categories of work (like small UI additions with clear success metrics) auto-merge without me looking. Anything touching user data I left alone. The judgment about which work was safe to auto-merge turned out to be a per-opportunity-type decision, not a global switch.
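That per-opportunity-type judgment fits in a few lines. The category names below are mine for illustration; the point is that the policy is keyed by opportunity type, not a single global flag:

```python
AUTO_MERGE_TYPES = {"small-ui-addition"}  # clear success metrics, low blast radius
ALWAYS_MANUAL = {"user-data"}             # anything touching user data stays human-gated

def merge_decision(opportunity_type, verification_passed):
    """Per-opportunity-type merge gate rather than a global auto-merge switch."""
    if not verification_passed:
        return "reject"
    if opportunity_type in ALWAYS_MANUAL:
        return "human-review"
    if opportunity_type in AUTO_MERGE_TYPES:
        return "auto-merge"
    return "human-review"   # default to a human when in doubt
```

Defaulting unknown categories to human review keeps the blast radius of a bad auto-merge decision bounded while the list of trusted types grows.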

Going forward, I want to get the deploy step, the last handoff left in the chain, into the loop too. If verification is strong enough for auto-merge, it is strong enough for auto-deploy. Same idea, one more step.

My case was pretty low-risk, but the more risk I was willing to take, the higher the return. It's an interesting case study in deeply understanding your own product's risks and getting systematic about where and how to gate them. When you hand over the reins, you can really move quickly.

What I took away

Three things I am carrying into the next project:

  1. The loop is not the interesting part. The dispatcher and the verification gate are. Any while-loop will spin. Only a loop with honest outcome signals and a clearly defined objective function will progress toward value.
  2. Self-instrumentation is what makes the loop compound. Without it, the agent has no idea whether the thing it shipped worked outside of the code, and the next cycle is running on the same priors as the first. With it, the loop starts telling you which of its own outputs actually worked, and you can start trusting specific opportunity shapes more over time.
  3. The bottleneck moves. When one agent can ship 102 verified features in a week, execution stops being the scarce resource. Taste moves to the front. Prioritization moves to the front. Knowing which opportunities are worth pursuing, and which of the ones that shipped actually moved a metric, became what I spent my attention on. The loop does the building. I do the judging.

The parts that make this worth anything are the parts that are easy to skip: the verification and the feedback loop. The Ralph loop does not work without them. Nothing does.

About the author

Eric Carlson

Chief AI Architect, Amplitude

Eric Carlson is a Principal AI Engineer helping to shape and build Amplitude's next-generation vision of agentic and data-driven product development. His background is in physics: he received a PhD from UC Santa Cruz, where he worked to detect dark matter at the center of the galaxy, before transitioning to healthcare data science. When not working, Eric enjoys playing guitar, cooking, and exploring the outdoors through skiing, mountain biking, and rafting.
Topics: Agents, Amplitude AI, Engineering, Product Analytics

