- can have multiple treatments, in which case the test is sometimes referred to as an A/B/n test
- use both success and guardrail metrics to identify experiences that improve some metrics without negatively impacting others
- let you learn and find promising ideas
- have a fixed allocation that doesn’t change during the experiment
- can use either a fixed or sequential design, where you view results upon conclusion or continuously during the experiment (a fixed-design analysis is sketched after this list)
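To make the fixed-design case concrete, here is a minimal sketch of a two-proportion z-test evaluated once at the experiment’s conclusion. This is only an illustration of the general technique, not a description of how Confidence analyzes results:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Fixed-design analysis: compute the test statistic exactly once,
    after the experiment has concluded. (Sequential designs instead allow
    continuous monitoring, at the cost of wider decision boundaries.)"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)        # pooled conversion rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value
    return z, p_value

# Example: 120/1000 conversions in control vs. 150/1000 in treatment.
z, p = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")  # roughly z = 1.96, p = 0.050
```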
Most A/B tests evaluate product changes, with the goal of understanding whether you should roll out the changes or develop them further.
A learning experiment is another type of A/B test that aims to learn about user behavior or to
measure a strategic baseline for the product.
This learning is typically achieved by removing a product or feature from the experience or degrading the
experience in some other way. Such a test helps inform future product prioritization
by breaking down which parts of the existing product have the most impact on
user behavior or the business.
Learning experiments can also be exploratory, aiming only to establish whether a variant has a causal relationship with an outcome, regardless of direction.
The Anatomy of an Experiment
An A/B test consists of several parts. This section gives a high-level overview of these concepts.

The Hypothesis is the Product Foundation of the Test
A hypothesis is a specific assumption that can be conclusively tested in an experiment, and it is the basis for a good experiment. It guides the experiment from a product perspective, and makes the anticipated impact and value of the experiment clear. For example: “Changing the button color from red to blue increases the click-through rate.”

A/B Tests Distribute Different Experiences Through Variants
An A/B test evaluates how users react after exposure to a new experience. Variants describe the different user experiences you test. For example, there could be different variants of a button color: one variant sets the button color to red, another to blue. A variant in an experiment is often referred to as a treatment. These variants often introduce new features, innovations, or changes that should improve the experience for the user. Typically, an experiment has one variant representing the current default (in production) experience, usually called the control or the control treatment.

Randomization Makes Differences Causal
Users in an experiment are randomly assigned a variant. The variant is the only difference in experience between the control and treatment groups, so any observed change in behavior can be attributed to the treatments: if the treatment group outperforms the control group on the target metric, the treatment variant improves the user experience. Randomization also ensures that the groups are similar, so external factors, such as seasonality, other feature launches, and competitor moves, affect control and treatment evenly and have no impact on the results of the experiment.

Note that the treatment effect estimated in an A/B test is only valid for the time of the test.
The estimated effect doesn’t necessarily generalize to other future points in time.
The same treatment can have a widely different impact depending on when you run the test.
For example, recommending Christmas songs in July might not have the same effect as in December.
The randomization only ensures that the groups are similar during the experiment.
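One common way to implement this randomization is deterministic hashing of user and experiment identifiers. The sketch below assumes a uniform split across variants; the function and its names are illustrative, not Confidence’s actual assignment logic:

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str,
                   variants: tuple[str, ...] = ("control", "treatment")) -> str:
    """Deterministically assign a user to a variant.

    Hashing the experiment and user IDs together gives each user a stable
    assignment for this experiment, while assignments look random across
    users and are independent across experiments."""
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)  # uniform bucket index
    return variants[bucket]

# The same user always sees the same experience within an experiment.
assert assign_variant("user-42", "button-color") == assign_variant("user-42", "button-color")
```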
Metrics Measure the Effect of the Treatments
Every A/B test needs at least one metric. Metrics help prove or disprove the hypothesis and support a business decision based on the outcome of the test. In other words, your metrics help answer whether the change is good enough to release widely. Confidence supports two types of metrics:

- Success metrics are metrics that should improve with the treatment
- Guardrail metrics are metrics that don’t need to improve, but shouldn’t deteriorate
It’s common and strongly recommended to use both success and guardrail metrics, because doing so guards against effects such as cannibalization. For example, an experiment may aim to increase engagement with a new feature, but not by cannibalizing engagement with a related feature. In this case, the engagement with the new feature is the success metric, while the engagement with the related feature is the guardrail metric.
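To make this decision rule concrete, the sketch below checks that every success metric shows a significant improvement while no guardrail metric shows a significant deterioration, based on confidence intervals for the treatment effect. The types and function are illustrative assumptions, not Confidence’s API:

```python
from dataclasses import dataclass

@dataclass
class MetricResult:
    name: str
    ci_lower: float  # lower bound of the treatment-effect confidence interval
    ci_upper: float  # upper bound of the treatment-effect confidence interval

def should_ship(success: list[MetricResult], guardrails: list[MetricResult]) -> bool:
    # Success metrics must improve: the whole interval lies above zero.
    success_ok = all(m.ci_lower > 0 for m in success)
    # Guardrail metrics must not deteriorate: the interval is not entirely below zero.
    guardrails_ok = all(m.ci_upper >= 0 for m in guardrails)
    return success_ok and guardrails_ok

# Example: engagement with the new feature is clearly up, and engagement with
# the related feature has not significantly dropped, so the change can ship.
new_feature = MetricResult("new_feature_engagement", 0.4, 2.1)
related_feature = MetricResult("related_feature_engagement", -1.5, 0.3)
print(should_ship([new_feature], [related_feature]))  # True
```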

