The Overall Shipping Decision
An important feature of the statistical analysis in Confidence is that the two possible errors, false positives and false negatives, matter at the experiment level rather than at the individual metric level. In other words, the rates at which these errors occur are defined over repeated experiments. From a product perspective, false positives and false negatives concern the decision to ship a feature or not. A false positive is when you ship a feature that truly doesn’t have an effect, and a false negative is when you don’t ship a feature that truly has an effect.
Confidence uses a composite decision rule to produce an overall recommendation for a shipping decision.
For Confidence to recommend shipping, the results must meet the following:
- at least one success metric has evidence of improvement
- all guardrail metrics show evidence of being within acceptable margins

Because the decision depends on several metrics, Confidence adjusts alpha and power:
- Alpha is adjusted using a Bonferroni correction: the original alpha is divided by the number of success metrics.
- The power level is adjusted to 1 - (1 - power) / (number of guardrails).
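
The decision rule and the two adjustments above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Confidence's implementation: the function names (`adjusted_alpha`, `adjusted_power`, `recommend_shipping`) and the boolean per-metric inputs are hypothetical, and the sketch assumes each metric has already been tested at the adjusted levels.

```python
from typing import Sequence


def adjusted_alpha(alpha: float, n_success: int) -> float:
    """Bonferroni correction: divide alpha by the number of success metrics."""
    if n_success < 1:
        raise ValueError("need at least one success metric")
    return alpha / n_success


def adjusted_power(power: float, n_guardrails: int) -> float:
    """Adjusted power level: 1 - (1 - power) / (number of guardrails)."""
    if n_guardrails < 1:
        raise ValueError("need at least one guardrail metric")
    return 1 - (1 - power) / n_guardrails


def recommend_shipping(
    success_improved: Sequence[bool],  # hypothetical input: one flag per success metric
    guardrails_ok: Sequence[bool],     # hypothetical input: one flag per guardrail metric
) -> bool:
    """Ship only if at least one success metric improved and
    every guardrail metric is within its acceptable margin."""
    return any(success_improved) and all(guardrails_ok)


if __name__ == "__main__":
    # Example: alpha = 0.05 with 2 success metrics, power = 0.8 with 4 guardrails.
    print(adjusted_alpha(0.05, 2))
    print(adjusted_power(0.8, 4))
    print(recommend_shipping([True, False], [True, True, True, True]))
```

With an original alpha of 0.05 and two success metrics, each success metric is tested at 0.025. With an original power of 0.8 and four guardrails, the allowed false negative rate of 0.2 is split four ways, which gives an adjusted power level of about 0.95 for each guardrail.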
References
- A. Dmitrienko, A.C. Tamhane, and F. Bretz (Eds.) (2009) “Multiple Testing Problems in Pharmaceutical Statistics” (First ed.), Chapman and Hall/CRC.


