Sequential testing for statistical inference

Experiment uses a sequential testing method of statistical inference. With sequential testing, results stay valid whenever you view them. You can end an experiment early based on observations made to that point. The number of observations you need to make an informed decision is, on average, much lower than the number you need with T-tests or similar procedures. You can experiment rapidly, incorporate what you learn into your product, and accelerate the pace of your experimentation program.

Sequential testing has several advantages over T-tests. Primarily, you don't need to know the number of observations necessary to achieve significance before you start the experiment. You can use both sequential testing and T-tests for binary metrics and continuous metrics. If you have concerns about long-tailed distributions affecting the Central Limit Theorem assumption, refer to this article about outliers.

Given enough time, the statistical power of the sequential testing method is 1. In other words, if a true nonzero effect exists, this approach eventually detects it as observations accumulate.
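To see why validity under repeated viewing matters, the following minimal sketch simulates what happens when a fixed-horizon test is checked repeatedly while data arrives. It is an illustration only, not Amplitude's implementation. It assumes standard normal data with no true effect and swaps in a z-test with known variance in place of a T-test for simplicity. The function names, the sample size of 500, and the peek interval of 10 are all hypothetical choices.

```python
import math
import random

Z_CRIT = 1.959964  # two-sided 5% critical value of the standard normal


def rejects_with_peeking(rng, n_max, peek_every):
    """Run one null experiment and stop at the first 'significant' peek."""
    total = 0.0
    for n in range(1, n_max + 1):
        total += rng.gauss(0.0, 1.0)  # no true effect: mean is 0
        if n % peek_every == 0:
            z = total / math.sqrt(n)  # z-statistic with known variance 1
            if abs(z) > Z_CRIT:
                return True
    return False


def rejects_at_end_only(rng, n_max):
    """Run one null experiment and test once, at the planned sample size."""
    total = sum(rng.gauss(0.0, 1.0) for _ in range(n_max))
    return abs(total / math.sqrt(n_max)) > Z_CRIT


def false_positive_rate(decide, n_sims, seed):
    """Fraction of simulated null experiments that reject."""
    rng = random.Random(seed)
    return sum(decide(rng) for _ in range(n_sims)) / n_sims


peeking_rate = false_positive_rate(
    lambda rng: rejects_with_peeking(rng, 500, 10), 1000, seed=1
)
single_look_rate = false_positive_rate(
    lambda rng: rejects_at_end_only(rng, 500), 1000, seed=1
)
print(f"false positives with peeking:  {peeking_rate:.3f}")
print(f"false positives, single look:  {single_look_rate:.3f}")
```

In this sketch, the single-look rate stays near the nominal 5%, while the peeking rate is several times higher. A sequential test is designed so that the error guarantee holds at every look.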

This article explains the basics of sequential testing, how it fits into Amplitude Experiment, and how to make it work for you.

Hypothesis testing in Amplitude Experiment

When you run an A/B test, Experiment conducts a hypothesis test using a randomized controlled trial. In this trial, Amplitude randomly assigns users to either a treatment variant or the control. The control represents your product in its current state. Each treatment includes a set of potential changes to your current baseline product. With a predetermined metric, Experiment compares the performance of the treatment and control populations using a test statistic.
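As a rough illustration of random assignment, the sketch below buckets users by hashing a user ID together with an experiment key. This is an assumption made for the example, not a description of how Amplitude assigns users. The function `assign_variant`, the key `"checkout-test"`, and the variant names are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment_key, variants=("control", "treatment")):
    """Deterministically map a user to one variant with roughly equal odds."""
    raw = f"{experiment_key}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(raw).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# The same user always lands in the same variant for a given experiment.
print(assign_variant("user-123", "checkout-test"))
```

Hashing keeps the assignment stable across sessions without storing it, while still behaving like a random split across many users.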

In a hypothesis test, you look for performance differences between the control and your treatment variants. Amplitude Experiment tests the null hypothesis

H_0:\ \delta = 0

where

\delta = \mu_{\text{treatment}} - \mu_{\text{control}}

is the difference between the treatment's mean and the control's mean. The null hypothesis states that this difference is zero.

For example, suppose you want to measure the conversion rate of a treatment variant. The null hypothesis posits that the conversion rate of the treatment variant and the conversion rate of the control are the same.
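For a conversion metric, the observed difference can be computed directly from counts. The sketch below is a minimal example with made-up numbers. The function names and the counts are hypothetical and do not come from Amplitude.

```python
def conversion_rate(conversions, exposures):
    """Share of exposed users who converted."""
    if exposures <= 0:
        raise ValueError("exposures must be positive")
    return conversions / exposures


def estimated_delta(treat_conv, treat_exp, ctrl_conv, ctrl_exp):
    """Observed treatment rate minus observed control rate."""
    return conversion_rate(treat_conv, treat_exp) - conversion_rate(ctrl_conv, ctrl_exp)


# Hypothetical counts: 120 of 1000 treatment users and 100 of 1000
# control users converted, a difference of about 2 percentage points.
delta_hat = estimated_delta(120, 1000, 100, 1000)
print(f"observed delta: {delta_hat:.4f}")
```

An observed difference is only an estimate. The hypothesis test asks whether it is large enough, given the amount of data, to reject a true difference of zero.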

The alternative hypothesis states that there is a difference between the treatment and control. Experiment's statistical model uses sequential testing to look for any difference between treatments and control.

There are many different sequential testing options. Amplitude Experiment uses a family of sequential tests called the mixture sequential probability ratio test (mSPRT). mSPRT forms a mixture of likelihood ratios against the null hypothesis, weighted by a mixing distribution, H.
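In its textbook form, the mSPRT statistic after n observations averages the likelihood ratio against the null value over the mixing distribution:

\Lambda_n = \int_{\Theta} \prod_{i=1}^{n} \frac{f_{\theta}(X_i)}{f_{\theta_0}(X_i)} \, dH(\theta)

This is the general construction and may differ in detail from the exact expression Amplitude uses. The sketch below works through one common special case. It assumes that each observation is a treatment-minus-control difference, that differences are normal with known variance, and that H is a normal distribution centered on zero. Under those assumptions the integral has a closed form. The names `msprt_log_lambda`, `always_valid_p_values`, `var_diff`, and `tau_sq` are hypothetical.

```python
import math


def msprt_log_lambda(n, mean_diff, var_diff, tau_sq):
    """Log of the mixture likelihood ratio for H0: delta = 0.

    Assumes differences are N(delta, var_diff) and the mixing
    distribution H is N(0, tau_sq).
    """
    denom = var_diff + n * tau_sq
    return 0.5 * math.log(var_diff / denom) + (
        n ** 2 * tau_sq * mean_diff ** 2
    ) / (2.0 * var_diff * denom)


def always_valid_p_values(diffs, var_diff, tau_sq):
    """Running p-values that never increase as observations arrive."""
    p_value = 1.0
    running_sum = 0.0
    history = []
    for n, diff in enumerate(diffs, start=1):
        running_sum += diff
        log_lam = msprt_log_lambda(n, running_sum / n, var_diff, tau_sq)
        # p_n = min(p_{n-1}, 1 / Lambda_n), computed on the log scale.
        p_value = min(p_value, math.exp(-log_lam))
        history.append(p_value)
    return history


# A constant difference of 1.0 drives the p-value toward zero,
# while a constant difference of 0.0 leaves it at 1.
strong = always_valid_p_values([1.0] * 200, var_diff=2.0, tau_sq=1.0)
null = always_valid_p_values([0.0] * 200, var_diff=2.0, tau_sq=1.0)
print(strong[-1] < 0.05, null[-1])
```

Because the p-value only moves downward, you can stop the first time it falls below your significance threshold, which is the property that lets you check results at any time.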

Common questions

Why hasn't the p-value or confidence interval changed, even though the number of exposures is greater than 0?

Why don't I see any confidence interval on the Confidence Interval Over Time chart?

What are we estimating when we choose Uniques?

What are we estimating when we choose Average Totals?

What are we estimating when we choose Average Sum of Property?

What is absolute lift?

What is relative lift?

Why does absolute lift exit the confidence interval?

How does sequential testing compare to a T-test?