The success of an experiment depends on how the metrics you select move. Metrics describe the behavior of the entity you’re experimenting on. By first calculating the metrics for the control and treatment groups, you can later statistically compare the two groups. Metrics in Confidence belong to one of two classes: success metrics or guardrail metrics.
You first need to create a metric before you can use it in an experiment. You can read more about how to create metrics on the metrics page.

Assignments

For Confidence to be able to evaluate the experiment, it needs to know what you are experimenting on. Specify an entity and where the assignments for that entity exist.
  1. Click the Edit icon to the right of the Metrics section on the experiment edit page to open the metric configuration dialog.
  2. Select the entity that you want to analyze.
  3. Select the assignment table that has assignment logs for the experiment.
  4. Optional. Select an exposure filter.
  5. Select how often you want to compute metrics by specifying the metric interval.
  6. Click Save to save the metric configuration.

Exposure Filtering

Exposure filters narrow down which users are included in the exposure definition and in the analysis of your experiment. When you add an exposure filter to your experiment, the analysis only includes the exposed users that also match the exposure filter. The time of exposure is the first unit of time after default exposure where the user matches the exposure filter. Use exposure filters if the default definition of exposure is too broad for what you want to measure in your experiment. Read more about Exposure Filtering

Success Metrics

Success metrics aim to prove the hypothesis of the A/B test. For example, if your hypothesis is that “more users stream podcasts if you rank podcasts higher,” then an appropriate success metric is one that measures podcast consumption. In this example, the hypothesis expects the metric to increase. A significant result means that there is evidence of an effect in the desired direction. A success metric tests for a significant increase in the metric. For example, an increase in hours spent listening to Spotify.
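For intuition, the one-sided comparison a success metric implies can be sketched as a two-sample z-test on the control and treatment groups. This is a minimal illustration, not Confidence’s actual implementation; the function name, sample data, and significance level are assumptions.

```python
from statistics import NormalDist, mean, stdev

def significant_increase(control, treatment, alpha=0.05):
    """Two-sample z-test (large samples assumed): H1 is treatment > control."""
    n_c, n_t = len(control), len(treatment)
    se = (stdev(control) ** 2 / n_c + stdev(treatment) ** 2 / n_t) ** 0.5
    z = (mean(treatment) - mean(control)) / se
    p = 1 - NormalDist().cdf(z)  # one-sided p-value for an increase
    return p < alpha

# Hypothetical hours listened per user in each group.
control = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.1, 0.9]
treatment = [1.4, 1.6, 1.5, 1.3, 1.7, 1.5, 1.6, 1.4]
print(significant_increase(control, treatment))  # True: evidence of an increase
```

A result of `True` corresponds to a significant success metric: evidence of an effect in the desired direction.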
You can optionally set a minimum detectable effect (MDE) for success metrics. The MDE is the effect size that you want to be able to detect with high certainty. You must set an MDE to be able to run a power analysis for the metric and learn how much traffic the metric needs.
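The relationship between the MDE and the traffic a metric needs can be sketched with the standard two-sample sample-size formula. This is a simplified sketch under a normal approximation; the function name and defaults are assumptions, and Confidence’s power analysis may differ.

```python
import math
from statistics import NormalDist

def sample_size_per_group(mde, sd, alpha=0.05, power=0.8):
    """Approximate users per group needed for a two-sided z-test to detect
    an absolute effect of `mde` on a metric with standard deviation `sd`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / mde) ** 2)

# Halving the MDE roughly quadruples the traffic the metric needs.
print(sample_size_per_group(mde=0.1, sd=1.0))   # ~1570 per group
print(sample_size_per_group(mde=0.05, sd=1.0))  # roughly 4x as many
```

This is why a smaller MDE makes a metric much more expensive to power: required traffic grows with the inverse square of the effect size.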

Guardrail Metrics

A guardrail metric is a metric that ensures that the experiment doesn’t have any unexpected side effects. You can use any metric as a guardrail metric. The meaning of a significant result depends on the type of guardrail metric. Use guardrail metrics in the following situations:
  • When you want to ensure that your A/B test doesn’t introduce regressions in performance or product quality.
  • When you want to ensure that your A/B test doesn’t have a negative impact on a metric that some other part of the organization cares about.
Guardrail metrics can use non-inferiority margins that let you seek evidence that the change doesn’t have a negative impact that exceeds the margin you specify. The margin is optional. The interpretation of the results depends on whether you use non-inferiority margins. All guardrail metrics in the same experiment must either all use non-inferiority margins or all omit them.

With Non-Inferiority Margin

Use guardrail metrics with non-inferiority margins to look for evidence that the change doesn’t negatively impact the metric by more than your specified non-inferiority margin (NIM). A significant result means that there is evidence that the guardrail is within the acceptable margin. A guardrail metric with a NIM tests whether there is evidence that the metric hasn’t increased by more than the NIM. For example, the number of skipped songs in personalized playlists shouldn’t increase by more than 1%.
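The non-inferiority check can be sketched as a one-sided confidence bound on the treatment-minus-control difference, compared against the margin. This is a minimal sketch; the function name, the relative-margin convention, and the sample data are assumptions, not Confidence’s implementation.

```python
from statistics import NormalDist, mean, stdev

def within_nim(control, treatment, nim, alpha=0.05):
    """Significant (True) when the one-sided upper confidence bound on the
    treatment-minus-control difference stays below the relative margin."""
    diff = mean(treatment) - mean(control)
    margin = nim * mean(control)  # relative NIM converted to an absolute margin
    se = (stdev(control) ** 2 / len(control)
          + stdev(treatment) ** 2 / len(treatment)) ** 0.5
    upper = diff + NormalDist().inv_cdf(1 - alpha) * se  # one-sided upper bound
    return upper < margin

# Hypothetical skipped songs per user: no real change, so the 1% margin holds.
skips_control = [9.8, 10.2, 10.0, 9.9, 10.1, 10.0] * 4
skips_treatment = [9.8, 10.2, 10.0, 9.9, 10.1, 10.0] * 4
print(within_nim(skips_control, skips_treatment, nim=0.01))  # True
```

Note that a non-significant result here is not evidence of harm; it only means the data can’t yet rule out an increase larger than the margin.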

Without Non-Inferiority Margin

Use guardrail metrics without non-inferiority margins to look for evidence that the change has a negative impact on the metric. A significant result means that there is evidence that the guardrail deteriorates because of the change. A guardrail metric without a non-inferiority margin tests whether there is evidence that the metric has increased. For example, test if the number of skipped songs in personalized playlists has increased.
Set non-inferiority margins if you can. With non-inferiority margins, you seek evidence that the change doesn’t lead to a regression by more than a tolerance level you specify. Without a non-inferiority margin, a lack of evidence of a deterioration doesn’t imply a neutral result. Read more about how the two approaches compare in the guardrail metric lesson.

Required Metrics

For experiments that use a surface with required metrics, Confidence adds these metrics to the bottom of the design page. Read more about required metrics in the surface documentation.