The platform provides three types of tests (superiority, non-inferiority, and inferiority) for differences between the means of the treatment groups and the control group. Tests for success metrics and tests for guardrail metrics differ slightly in how their results are interpreted.

Superiority Tests

Confidence uses superiority tests for success metrics and for deterioration tests.
A success metric test can be significant or non-significant. Significant means that a difference between the group means at least as large as the observed one would be unlikely if the treatment had no effect. All success metric tests use the null hypothesis of zero difference. The two possible outcomes are:
  • Significant result: The data shows evidence that the treatment caused a change in the metric.
  • Non-significant result: The data shows no evidence that the treatment caused a change in the metric.
The statistical hypotheses used in the test are:
  • $H_0: \delta = 0$
  • $H_1: \delta \neq 0$
where $\delta$ is the treatment effect.
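A minimal sketch of such a superiority test, assuming a large-sample two-sided z-test on the difference of means with an unpooled (Welch-style) standard error. The function name `superiority_test` and the default `alpha=0.05` are illustrative assumptions; the exact estimator and variance calculation Confidence uses may differ.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def superiority_test(treatment, control, alpha=0.05):
    """Hypothetical two-sided z-test of H0: delta = 0 against H1: delta != 0."""
    # Treatment effect: difference of sample means (treatment minus control)
    delta = mean(treatment) - mean(control)
    # Unpooled (Welch-style) standard error of the difference of means
    se = sqrt(variance(treatment) / len(treatment) + variance(control) / len(control))
    z = delta / se
    # Two-sided p-value under the standard normal approximation
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return delta, p_value, p_value < alpha
```

For example, `superiority_test([10, 12, 11, 13], [8, 9, 10, 9])` returns the estimated δ, the p-value, and whether the result is significant at the chosen alpha.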

Non-Inferiority Tests

Confidence uses non-inferiority tests for guardrail metrics.
For non-inferiority tests, the null hypothesis is that the metric has deteriorated by at least the non-inferiority margin (NIM). You must select a direction for a non-inferiority test.
  • Significant result: The data shows evidence that the metric hasn’t decreased by more than NIM in the treatment group.
  • Non-significant result: The data doesn't rule out that the metric has decreased by more than NIM in the treatment group.
The statistical hypotheses used in the test are:
  • $H_0: \delta \le -\mathrm{NIM}$
  • $H_1: \delta > -\mathrm{NIM}$
where $\delta$ is the treatment effect and NIM is a positive margin.
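Under a normal approximation, a non-inferiority test can be sketched by shifting the estimate by the margin so the null boundary sits at zero. This assumes an increase in the metric is the desired direction and that `delta` (the estimated treatment effect), its standard error `se`, and `nim` are all on the absolute scale; the function and parameter names are hypothetical.

```python
from statistics import NormalDist

def non_inferiority_test(delta, se, nim, alpha=0.05):
    """Hypothetical one-sided z-test of H0: delta <= -nim against H1: delta > -nim.

    Assumes higher is better and that delta, se and nim share the same
    absolute scale.
    """
    # Shift the estimate by the margin so the null boundary is at zero
    z = (delta + nim) / se
    # Upper-tail p-value: small when delta is clearly above -nim
    p_value = 1 - NormalDist().cdf(z)
    return p_value, p_value < alpha
```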

Inferiority Tests

Confidence uses inferiority tests to detect unintended negative effects in success and guardrail metrics. An inferiority test checks for a change in the opposite direction to the intended one.
For inferiority tests, the test is against the null hypothesis of zero. You must select a direction for an inferiority test.
  • Significant result: The data shows evidence that the treatment caused a decrease in the metric.
  • Non-significant result: The data shows no evidence that the treatment caused a decrease in the metric.
The statistical hypotheses used in the test are:
  • $H_0: \delta = 0$
  • $H_1: \delta < 0$
where $\delta$ is the treatment effect.
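An inferiority test is then a one-sided lower-tail test against zero. This is again a hypothetical sketch assuming a normal approximation and that an increase is the intended direction; for a metric where a decrease is desirable, the sign of `delta` would be flipped before calling it.

```python
from statistics import NormalDist

def inferiority_test(delta, se, alpha=0.05):
    """Hypothetical one-sided z-test of H0: delta = 0 against H1: delta < 0."""
    z = delta / se
    # Lower-tail p-value: small when delta is clearly below zero
    p_value = NormalDist().cdf(z)
    return p_value, p_value < alpha
```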

Relative Values

Confidence runs the tests on absolute values but lets you specify NIMs on a relative scale. To convert a relative NIM into an absolute one, it multiplies the relative value by the mean of the baseline group, typically the control group.
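The conversion amounts to scaling the relative margin by the baseline mean. A minimal sketch, assuming the relative NIM is given as a fraction (0.02 for 2%) and the baseline mean is positive; the function name is hypothetical.

```python
def relative_to_absolute_nim(relative_nim, baseline_mean):
    """Hypothetical helper: convert a relative NIM into an absolute margin.

    Assumes relative_nim is a fraction (0.02 means 2%) and baseline_mean,
    usually the control group mean, is positive.
    """
    return relative_nim * baseline_mean
```

For example, a 2% NIM on a control-group mean of 50 seconds corresponds to an absolute margin of 1 second.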

Tests for Success Metrics

Success metrics always use a superiority test. The test is against the null hypothesis of zero mean difference between the groups.

Tests for Guardrail Metrics

You can test guardrail metrics in two different ways:
  • Use an inferiority test. This test evaluates whether there is evidence that the guardrail metric does worse in the treatment group compared to the control group.
  • Use a non-inferiority test. This test instead evaluates whether there is evidence that the guardrail metric is not worse in the treatment group than in the control group by more than a pre-defined threshold, the NIM.

Tests for Deterioration

Confidence tests all success and guardrail metrics for deterioration. For success metrics, this means running the superiority and inferiority tests separately. For guardrail metrics that use a non-inferiority test, this means running an inferiority test in addition to the non-inferiority test.
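Putting the pieces together, the set of tests run for each metric can be sketched as follows. This is a hypothetical illustration under the same assumptions as the individual tests (normal approximation, higher is better, `delta` and `se` precomputed, `nim` already on the absolute scale). It uses a single fixed `alpha` per test and ignores any multiple-testing correction Confidence may apply.

```python
from statistics import NormalDist

def evaluate_metric(delta, se, kind, nim=None, alpha=0.05):
    """Hypothetical sketch: which tests are significant for one metric."""
    phi = NormalDist().cdf
    results = {}
    if kind == "success":
        # Superiority: two-sided test of H0: delta = 0
        results["superiority"] = 2 * (1 - phi(abs(delta) / se)) < alpha
    elif kind == "guardrail":
        if nim is not None:
            # Non-inferiority: one-sided test of H0: delta <= -nim
            results["non_inferiority"] = 1 - phi((delta + nim) / se) < alpha
        # Without a NIM, the guardrail relies on the inferiority test alone
    else:
        raise ValueError(f"unknown metric kind: {kind!r}")
    # Deterioration: one-sided inferiority test of H0: delta = 0 vs H1: delta < 0
    results["deterioration"] = phi(delta / se) < alpha
    return results
```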