Analyze your experiment data with the T-test
The T-test makes the following assumptions:
- The Central Limit Theorem applies to the metric.
- The two populations don't share the same variance.
- You don't run the T-test until you reach the sample size specified by the duration estimator.
Looking for a Z-test?
The T-test supports many of the same options as a Z-test.
Conduct a T-test as either:
- Two-sided: looks for any change in the metric, in either direction.
- One-sided: looks for an increase or a decrease, but not both.
T-tests work for both binary and continuous metrics.
A two-sided test doesn't explicitly state a statistically significant increase or decrease, while a one-sided test does. If you select Increase, the upper confidence interval bound is positive infinity. For Decrease, the lower confidence interval bound is negative infinity.
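As a rough illustration of how the chosen direction shapes the confidence interval, the sketch below computes an interval for the difference in means with unequal variances. It is not Amplitude's implementation: the function name and parameters are hypothetical, and it assumes samples large enough that a normal quantile can stand in for the t quantile.

```python
from math import inf, sqrt
from statistics import NormalDist


def diff_confidence_interval(mean_a, var_a, n_a, mean_b, var_b, n_b,
                             direction="two-sided", alpha=0.05):
    """Confidence interval for mean_a - mean_b (hypothetical helper).

    Assumption: large samples, so the normal quantile approximates
    the t quantile.
    """
    diff = mean_a - mean_b
    # Unpooled standard error: each group keeps its own variance.
    se = sqrt(var_a / n_a + var_b / n_b)

    if direction == "two-sided":
        # Split alpha across both tails.
        z = NormalDist().inv_cdf(1 - alpha / 2)
        return (diff - z * se, diff + z * se)

    # One-sided: all of alpha sits in a single tail.
    z = NormalDist().inv_cdf(1 - alpha)
    if direction == "increase":
        return (diff - z * se, inf)   # upper bound is positive infinity
    if direction == "decrease":
        return (-inf, diff + z * se)  # lower bound is negative infinity
    raise ValueError("direction must be 'two-sided', 'increase', or 'decrease'")


two_sided = diff_confidence_interval(0.12, 0.1056, 1000, 0.10, 0.09, 1000)
increase = diff_confidence_interval(0.12, 0.1056, 1000, 0.10, 0.09, 1000,
                                    direction="increase")
print(two_sided)
print(increase)
```

In this sketch, a one-sided interval for Increase is bounded only from below, so it can exclude zero with a smaller observed lift than the two-sided interval requires.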
Configure T-test settings
Access the T-test settings from the Settings tab. The required settings depend on the T-test type you want to run and the direction you want the metric to move. To configure your T-test:
- Edit the Goals panel, then select Increase or Decrease for your metric.
- Open the Analysis Settings panel. Go to Stats Preferences > Advanced. Select the T-test stats method. Choose 1-sided or 2-sided based on the T-test type you want to run. For example, to run a two-sided T-test looking for an increase, select Increase in the primary metric and 2-sided T-test in statistical settings.
- Enter the number of users needed under Samples Per Variant Needed. If you don't know which sample size to enter, use Amplitude's duration estimator. To learn more, refer to the Help Center article on planning experiments with the duration estimator.
- Select Save to change the statistical settings to T-test.
Manage sample size needed for the T-test
You must reach a minimum sample size before you run a T-test. Experiment warns you if your dataset is too small.
The Cumulative Exposure graph and its table show your sample size requirements. The graph shows a constant, dotted line named Sample Size Target, which represents the total number of users needed for each variant. The table next to the graph highlights the Exposure Remaining, which is the number of users each variant still needs. This information confirms the number of users needed before running the T-test, and provides an estimate of the time the experiment needs before you use a T-test to interpret your results.
Reaching the needed sample size doesn't guarantee statistically significant results. For example, if your lift is smaller than the MDE, your results often aren't statistically significant.
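The relationship between Sample Size Target and Exposure Remaining can be sketched in a few lines. This is an illustration only: the function names and the sample numbers are hypothetical, and the time estimate assumes a constant daily exposure rate per variant, which may not match how Amplitude estimates duration.

```python
from math import ceil


def exposure_remaining(sample_size_target, exposures):
    """Users each variant still needs to reach the target (never below 0)."""
    return {variant: max(sample_size_target - count, 0)
            for variant, count in exposures.items()}


def days_until_target(remaining, daily_exposures_per_variant):
    """Days until the slowest variant reaches the target.

    Assumption: every variant gains the same fixed number of users per day.
    """
    if daily_exposures_per_variant <= 0:
        raise ValueError("daily_exposures_per_variant must be positive")
    return ceil(max(remaining.values()) / daily_exposures_per_variant)


exposures = {"control": 3200, "treatment": 3050}  # hypothetical counts
remaining = exposure_remaining(5000, exposures)
print(remaining)                          # {'control': 1800, 'treatment': 1950}
print(days_until_target(remaining, 400))  # 5
```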
Common questions
How does Amplitude calculate improvement over baseline?
Improvement over baseline is the ratio of the mean of the variant (A) over the mean of the baseline (B): mean(A) / mean(B).
For each group, Amplitude calculates the mean as k / n, where k is the number of conversions and n is the sample size.
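A minimal sketch of this calculation, using hypothetical function names and made-up counts:

```python
def conversion_mean(conversions, sample_size):
    """Mean for one group: k / n."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return conversions / sample_size


def improvement_over_baseline(k_variant, n_variant, k_baseline, n_baseline):
    """Ratio mean(A) / mean(B), where A is the variant and B the baseline."""
    baseline_mean = conversion_mean(k_baseline, n_baseline)
    if baseline_mean == 0:
        raise ValueError("baseline mean is zero, so the ratio is undefined")
    return conversion_mean(k_variant, n_variant) / baseline_mean


# Variant: 120 conversions out of 1,000 users; baseline: 100 out of 1,000.
ratio = improvement_over_baseline(120, 1000, 100, 1000)
print(f"{ratio:.2f}")  # 1.20
```

Here the variant's mean is 1.2 times the baseline's mean.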
Why do calculations use unique conversions instead of totals?
Amplitude uses unique conversions instead of totals when checking for statistical significance. Totals make false assumptions about a user's behavior in the funnel. The aggregate sum assumes that each time a user enters the funnel is independent of the previous time. That assumption isn't valid when calculating statistical significance, although totals can still help with other analyses in the Experiment Results chart or end-to-end Amplitude Experiment.
How does Amplitude calculate statistical significance?
Amplitude uses standardized statistical methods to calculate statistical significance. The method varies by feature: sequential testing or a two-tailed T-test. By default, Amplitude Experiment and the Experiment Results chart use sequential testing, while the Funnel Analysis chart uses the two-tailed T-test. When you compare analyses, p-values may not match across charts that use different testing methods.
For both methods, Amplitude uses a 5% false-positive rate by default. The threshold for significance is (1 - p_value) > 95%. You can change the false-positive rate in Amplitude Experiment. You can't change it in the Funnel Analysis chart.
To help reduce false positives, Amplitude requires a minimum sample size before declaring significance: 30 samples, five conversions, and five non-conversions for each variant. Amplitude automatically treats tests that don't meet these minimums as not statistically significant.
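The threshold and the minimum-sample rule can be combined as in the sketch below. The function names are hypothetical, and the p-value comes from a two-tailed test on conversion rates that uses a normal approximation in place of the t distribution, so it may not match the values Amplitude reports.

```python
from math import sqrt
from statistics import NormalDist

MIN_SAMPLES = 30
MIN_CONVERSIONS = 5
MIN_NON_CONVERSIONS = 5


def meets_minimums(conversions, sample_size):
    """Check the per-variant minimums described above."""
    return (sample_size >= MIN_SAMPLES
            and conversions >= MIN_CONVERSIONS
            and sample_size - conversions >= MIN_NON_CONVERSIONS)


def two_tailed_p_value(k_a, n_a, k_b, n_b):
    """Two-tailed p-value for a difference in conversion rates.

    Assumption: normal approximation with an unpooled standard error.
    Call only after meets_minimums passes, so the standard error is nonzero.
    """
    p_a, p_b = k_a / n_a, k_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_significant(k_a, n_a, k_b, n_b, false_positive_rate=0.05):
    """Apply the minimums first, then the (1 - p_value) threshold."""
    if not (meets_minimums(k_a, n_a) and meets_minimums(k_b, n_b)):
        return False  # too little data counts as not significant
    p_value = two_tailed_p_value(k_a, n_a, k_b, n_b)
    return (1 - p_value) > (1 - false_positive_rate)


print(is_significant(150, 1000, 100, 1000))  # True
print(is_significant(4, 20, 10, 20))         # False: below the minimums
```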