This section provides technical specifications and reference information for analysis plans and statistical testing.
For conceptual explanations of analysis, see Stats Concepts.

Comparison Specifications

Define how to compare groups in an analysis:

All to Baseline

Compare all treatment groups to a designated control:
{
  "comparisonSpec": {
    "allToBaseline": {
      "baseline": "control"
    }
  }
}
Use when: standard A/B test with one control and multiple treatments

All Pairs

Compare every group to every other group:
{
  "comparisonSpec": {
    "allPairs": {}
  }
}
Use when: exploring all possible differences, no clear control group

Specific Pairs

Define exactly which groups to compare:
{
  "comparisonSpec": {
    "pairs": [
      {
        "baseline": "control",
        "treatment": "variant_a"
      },
      {
        "baseline": "control",
        "treatment": "variant_b"
      }
    ]
  }
}
Use when: complex designs with specific comparisons of interest
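
All three forms reduce to a set of (baseline, treatment) pairs. The following Python sketch shows that expansion for illustration only; the function name and the group list are assumptions, not part of any API:
from itertools import combinations

def expand_comparisons(spec, group_ids):
    """Expand a comparisonSpec dict into explicit (baseline, treatment) pairs."""
    if "allToBaseline" in spec:
        baseline = spec["allToBaseline"]["baseline"]
        return [(baseline, g) for g in group_ids if g != baseline]
    if "allPairs" in spec:
        return list(combinations(group_ids, 2))  # every unordered pair of groups
    if "pairs" in spec:
        return [(p["baseline"], p["treatment"]) for p in spec["pairs"]]
    raise ValueError("unrecognized comparison spec")

# One control and two treatments under allToBaseline
print(expand_comparisons({"allToBaseline": {"baseline": "control"}},
                         ["control", "variant_a", "variant_b"]))
# [('control', 'variant_a'), ('control', 'variant_b')]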

Hypothesis Types

Superiority Hypothesis

Test if a treatment improves a metric by a meaningful amount:
{
  "superiority": {
    "preferredDirection": "INCREASE",
    "minimumDetectableEffect": 0.03
  }
}
Fields:
  • preferredDirection: INCREASE or DECREASE
  • minimumDetectableEffect: Relative change considered meaningful (for example, 0.03 = 3%)
Use for: success metrics, primary outcomes
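
To make the relative MDE concrete: with a baseline conversion rate of 10% and a minimumDetectableEffect of 0.03, the smallest change treated as meaningful is a move to 10.3% (for INCREASE). A minimal Python illustration (the helper is hypothetical, not a product API):
def mde_target(baseline_value, minimum_detectable_effect, preferred_direction):
    """Smallest treatment value that counts as a meaningful change."""
    if preferred_direction == "INCREASE":
        return baseline_value * (1 + minimum_detectable_effect)
    return baseline_value * (1 - minimum_detectable_effect)

print(mde_target(0.10, 0.03, "INCREASE"))  # about 0.103: detect a move to at least 10.3%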

Non-Inferiority Hypothesis

Test if a treatment doesn’t harm a metric beyond an acceptable margin:
{
  "nonInferiority": {
    "preferredDirection": "INCREASE",
    "nonInferiorityMargin": 0.01
  }
}
Fields:
  • preferredDirection: INCREASE or DECREASE
  • nonInferiorityMargin: Maximum acceptable degradation (for example, 0.01 = 1%)
Use for: guardrail metrics, cost metrics, performance metrics
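
One common way to operationalize non-inferiority (assumed here for illustration; the product's exact procedure may differ) is to check the confidence interval for the relative lift against the margin: for an INCREASE metric, the treatment passes if the lower bound stays above -nonInferiorityMargin.
def is_non_inferior(lift_ci_low, lift_ci_high, margin, preferred_direction):
    """Check a relative-lift confidence interval against a non-inferiority margin."""
    if preferred_direction == "INCREASE":
        return lift_ci_low > -margin   # worst case must not degrade by more than the margin
    return lift_ci_high < margin       # for DECREASE metrics, degradation is an increase

print(is_non_inferior(-0.004, 0.012, 0.01, "INCREASE"))  # True: worst case -0.4% is above -1%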

Preferred Direction

Value | Meaning | Example Metrics
INCREASE | Higher is better | Revenue, conversion rate, engagement
DECREASE | Lower is better | Load time, error rate, bounce rate

Decision Rules

Combine multiple hypotheses into a single decision:

AND Rule

All hypotheses must be significant:
{
  "operator": "AND",
  "items": ["metric1", "metric2", "metric3"]
}

OR Rule

At least one hypothesis must be significant:
{
  "operator": "OR",
  "items": ["metric1", "metric2", "metric3"]
}

Complex Rule

Combine AND/OR logic:
{
  "operator": "AND",
  "items": [
    {
      "rule": {
        "operator": "AND",
        "items": ["guardrail1", "guardrail2"]
      }
    },
    {
      "rule": {
        "operator": "OR",
        "items": ["success1", "success2", "success3"]
      }
    }
  ]
}
Translates to: (guardrail1 AND guardrail2) AND (success1 OR success2 OR success3)
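
Conceptually, a decision rule is a boolean expression over per-hypothesis outcomes. The sketch below evaluates one recursively; it assumes string items name hypotheses and that results maps each name to whether it was significant in its preferred direction (an illustration, not the platform's evaluator).
def evaluate_rule(rule, results):
    """Recursively evaluate an AND/OR decision rule against per-hypothesis outcomes."""
    values = []
    for item in rule["items"]:
        if isinstance(item, str):
            values.append(results[item])                          # leaf: a named hypothesis
        else:
            values.append(evaluate_rule(item["rule"], results))   # nested rule
    return all(values) if rule["operator"] == "AND" else any(values)

rule = {
    "operator": "AND",
    "items": [
        {"rule": {"operator": "AND", "items": ["guardrail1", "guardrail2"]}},
        {"rule": {"operator": "OR", "items": ["success1", "success2", "success3"]}},
    ],
}
outcomes = {"guardrail1": True, "guardrail2": True,
            "success1": False, "success2": True, "success3": False}
print(evaluate_rule(rule, outcomes))  # True: both guardrails pass and one success metric fires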

Group Structure

Define groups with allocation weights:
{
  "groups": [
    {
      "id": "control",
      "weight": 1
    },
    {
      "id": "treatment",
      "weight": 1
    }
  ]
}
Fields:
  • id: Unique identifier for the group
  • weight: Relative allocation; traffic is split across groups in proportion to their weights
Common patterns:
  • Equal split: All weights = 1
  • 50/25/25: Weights = 2, 1, 1
  • 90/10: Weights = 9, 1
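
Because weights are relative, a group's traffic fraction is its weight divided by the sum of all weights, as in this small illustration:
def allocation(groups):
    """Convert relative weights into traffic fractions."""
    total = sum(g["weight"] for g in groups)
    return {g["id"]: g["weight"] / total for g in groups}

print(allocation([{"id": "control", "weight": 2},
                  {"id": "variant_a", "weight": 1},
                  {"id": "variant_b", "weight": 1}]))
# {'control': 0.5, 'variant_a': 0.25, 'variant_b': 0.25}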

Statistical Parameters

Significance Level (Alpha)

Probability of a false positive:
"alpha": 0.05  // 5% false positive rate
Common values:
  • 0.05: Standard significance level
  • 0.01: Stricter threshold
  • 0.10: More lenient threshold

Statistical Power

Probability of detecting a true effect:
"power": 0.80  // 80% power
Common values:
  • 0.80: Standard power level
  • 0.90: Higher power (larger sample needed)
  • 0.70: Lower power (smaller sample sufficient, but more likely to miss a real effect)
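
Alpha, power, and the minimum detectable effect together determine the required sample size. The sketch below uses the standard two-proportion approximation for a binary metric; it is illustrative only and may differ from the product's power analysis.
from scipy.stats import norm

def sample_size_per_group(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-proportion test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = norm.ppf(1 - alpha / 2)     # critical value for the significance level
    z_beta = norm.ppf(power)              # critical value for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2) + 1

print(sample_size_per_group(0.10, 0.03))  # roughly 160,000 per group for a 3% lift on a 10% baseline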

Data Types

Binary Data

For conversion-like metrics:
{
  "binaryData": {
    "successes": [100, 110],
    "trials": [1000, 1000]
  }
}
Use for: conversion rates, click-through rates, success/failure outcomes
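
These summary counts are enough for a pooled two-proportion z-test. A fixed-horizon, two-sided sketch of the underlying statistics (not necessarily the product's exact method):
from math import sqrt
from scipy.stats import norm

def two_proportion_ztest(successes, trials):
    """Pooled two-sided z-test from per-group success/trial counts."""
    (s1, s2), (n1, n2) = successes, trials
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return z, 2 * norm.sf(abs(z))

z, p_value = two_proportion_ztest(successes=[100, 110], trials=[1000, 1000])
print(round(z, 2), round(p_value, 2))  # about 0.73 and 0.47: not significant at alpha = 0.05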

Continuous Data

For numeric measurements:
{
  "continuousData": {
    "means": [42.5, 43.2],
    "variances": [12.3, 11.8],
    "counts": [1000, 1000]
  }
}
Use for: revenue, duration, ratings, counts
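
Because the payload carries summary statistics rather than raw observations, a test such as Welch's t-test can be run directly from them. A sketch using SciPy (illustrative; not necessarily the product's method):
from math import sqrt
from scipy.stats import ttest_ind_from_stats

means, variances, counts = [42.5, 43.2], [12.3, 11.8], [1000, 1000]

# Welch's t-test (unequal variances) from summary statistics; note SciPy expects
# standard deviations, so the variances are square-rooted first.
result = ttest_ind_from_stats(
    mean1=means[0], std1=sqrt(variances[0]), nobs1=counts[0],
    mean2=means[1], std2=sqrt(variances[1]), nobs2=counts[1],
    equal_var=False,
)
print(result.statistic, result.pvalue)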

Analysis Methods

Different methods have different assumptions and use cases:
Method | Sequential | Data Type | Use Case
Fixed horizon | No | Both | Final analysis only
Sequential | Yes | Both | Continuous monitoring
Bayesian | Yes | Both | Continuous updates with prior knowledge

Method Assumptions

All methods assume:
  • Random assignment: Users randomly assigned to groups
  • Independence: User outcomes are independent
  • Stable variance: Variance doesn’t change over time
  • No spillover: Treatment doesn’t affect control group
Sequential methods additionally assume:
  • Data arrives continuously: New data added over time
  • Stopping rules followed: Don’t peek without accounting for it

Best Practices

Hypothesis Design

  • Set MDE/NIM based on business impact, not statistical convenience
  • Use superiority for metrics you want to improve
  • Use non-inferiority for metrics you want to protect
  • Define hypotheses before looking at data

Decision Rules

  • Require all guardrails to pass (use AND)
  • Allow any success metric to trigger (use OR)
  • Be explicit about what defines success
  • Consider multiple testing adjustments (see the sketch after this list)
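
The simplest multiple-testing adjustment is a Bonferroni correction, which divides alpha by the number of hypotheses in the rule (a conservative illustration; other corrections exist):
def bonferroni_alpha(alpha, num_hypotheses):
    """Per-hypothesis significance threshold under a Bonferroni correction."""
    return alpha / num_hypotheses

print(bonferroni_alpha(0.05, 5))  # 0.01 per hypothesis when a decision rule covers 5 metrics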

Power Analysis

  • Run power analysis before experiment
  • Ensure adequate sample size for MDE
  • Consider seasonal effects on sample collection
  • Account for multiple comparisons in power calculation