
A/B Testing Interview Questions: What You Need to Know

A/B Testing Is a Data Science Interview Staple

Nearly every data science interview at a product company includes A/B testing questions. Google, Meta, Netflix, Airbnb, and Uber all rely heavily on experimentation, and they want to know that you can design, analyze, and reason about experiments.

The Basics: What Is an A/B Test?

An A/B test (randomized controlled experiment) compares two versions of something:

  • Control (A): the current experience
  • Treatment (B): the new experience

Each user is randomly assigned to one of the two groups. You measure a key metric in each group and test whether the difference between them is statistically significant.

Designing an A/B Test

Interview question: How would you design an A/B test for a new recommendation algorithm on our homepage?

Step 1: Define the Hypothesis

"We believe the new recommendation algorithm will increase click-through rate on homepage product recommendations."

  • H₀: The new algorithm has no effect on CTR
  • H₁: The new algorithm increases CTR

Step 2: Choose the Primary Metric

Pick ONE primary metric (the "Overall Evaluation Criterion"):

  • CTR — click-through rate on recommendations
  • Guardrail metrics — revenue per user, session duration, bounce rate (these should not get worse)

Interview tip: Mention guardrail metrics. It shows you think about unintended consequences.

Step 3: Calculate Sample Size

You need four inputs:

  1. Baseline conversion rate (e.g., 5% CTR)
  2. Minimum detectable effect (e.g., 10% relative increase → 5.5% CTR)
  3. Significance level α (typically 0.05)
  4. Power 1-β (typically 0.80)

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect = proportion_effectsize(0.055, 0.05)  # 5% → 5.5% (positive effect size)
analysis = NormalIndPower()
sample_size = analysis.solve_power(effect, alpha=0.05, power=0.80)
# ≈ 15,600 per group for a two-sided test

Step 4: Determine Duration

Duration = sample_size_per_group × 2 / daily_traffic

If you have 10,000 daily users: 31,200 / 10,000 ≈ 3 days minimum.

But also consider:

  • Day-of-week effects — run for full weeks
  • Novelty effects — run for at least 2 weeks
  • Seasonality — avoid launching during holidays

Step 5: Randomization

Randomize at the user level, not the session level. If a user sees version A on Monday and version B on Tuesday, you can't attribute their behavior to either version.
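Stable user-level assignment is often implemented by hashing the user ID with an experiment-specific salt, so the same user lands in the same bucket on every visit. A minimal sketch (the salt name and 50/50 split are illustrative):

```python
import hashlib

def assign_variant(user_id: str, experiment_salt: str = "homepage_recs_v1") -> str:
    """Deterministically assign a user to control or treatment.

    Hashing (salt + user_id) maps each user to a stable, roughly uniform
    bucket in [0, 100), so assignment persists across sessions.
    """
    digest = hashlib.md5(f"{experiment_salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "treatment" if bucket < 50 else "control"

# Same user, same variant -- every session:
assert assign_variant("user_42") == assign_variant("user_42")
```

Using a per-experiment salt also decorrelates assignments across experiments: a user in treatment for one test is not systematically in treatment for the next.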

Analyzing Results

The Z-Test for Proportions

For conversion rate metrics:

from statsmodels.stats.proportion import proportions_ztest

# control: 500 conversions out of 10000
# treatment: 550 conversions out of 10000
stat, p_value = proportions_ztest(
    [550, 500],      # successes
    [10000, 10000],  # totals
    alternative='larger'
)

Confidence Intervals

Always report confidence intervals, not just p-values:

Treatment CTR: 5.5% (95% CI: 5.1% - 5.9%)
Control CTR: 5.0% (95% CI: 4.6% - 5.4%)
Difference: +0.5pp (95% CI: -0.1pp to +1.1pp)

Interview tip: If the confidence interval for the difference includes zero, the result is not statistically significant at the 0.05 level.
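The interval on the difference can be reproduced with a normal (Wald) approximation from the z-test counts above; `diff_ci` is a hypothetical helper written for illustration, not a library function:

```python
import math

def diff_ci(x1, n1, x2, n2, z=1.96):
    """95% Wald confidence interval for the difference in proportions p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Treatment (550/10000) vs. control (500/10000):
lo, hi = diff_ci(550, 10000, 500, 10000)
print(f"Difference: {0.055 - 0.050:+.2%} (95% CI: {lo:+.2%} to {hi:+.2%})")
# The interval spans zero, so this lift is not significant on its own.
```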

Common Interview Questions

"Your test shows a 3% lift but it's not significant. What do you do?"

Options:

  1. Run longer — you may need more data
  2. Accept the result — maybe the effect is too small to detect with your traffic
  3. Segment analysis — check if the effect is concentrated in a subgroup (but correct for multiple comparisons)
  4. Don't just launch it — a non-significant result means you can't confidently say there's an effect

"How do you handle multiple comparisons?"

When testing multiple metrics or segments, you increase the chance of false positives:

  • 1 test at α=0.05 → 5% chance of a false positive
  • 20 tests at α=0.05 → 64% chance of at least one false positive

Solutions:

  • Bonferroni correction: divide α by the number of tests (conservative)
  • Pre-register your primary metric to avoid post-hoc testing
  • Use a single primary metric with guardrails
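Both the 64% figure and the Bonferroni cutoff are easy to verify; statsmodels also ships a `multipletests` helper. The p-values below are made up for illustration:

```python
from statsmodels.stats.multitest import multipletests

# With 20 independent tests at alpha = 0.05, the chance of at least one
# false positive is 1 - 0.95**20, roughly 64%.
fwer = 1 - (1 - 0.05) ** 20
print(f"Family-wise error rate over 20 tests: {fwer:.0%}")

# Bonferroni: each p-value must beat alpha / n_tests = 0.05 / 4 = 0.0125.
p_values = [0.001, 0.02, 0.04, 0.30]  # hypothetical per-metric p-values
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")
print(list(reject))  # only the first test survives the correction
```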

"What is the novelty effect?"

Users may engage more with a new feature simply because it's new, not because it's better. After the novelty wears off, engagement drops back.

Mitigation:

  • Run tests for longer (2+ weeks)
  • Only include users who haven't been exposed to either version before
  • Monitor the treatment effect over time — if it declines, it might be novelty

"How do you handle network effects?"

If users interact (social networks, marketplaces), treating individual users as independent violates the i.i.d. assumption.

Solutions:

  • Cluster randomization — randomize at the group/geo level
  • Switchback experiments — alternate treatments over time
  • Adjust standard errors for clustering
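Cluster randomization can be sketched in a few lines: shuffle the clusters and assign each whole cluster to one variant, so interacting users share a treatment. The city list here is a made-up example:

```python
import random

def assign_clusters(clusters, seed=7):
    """Randomize at the cluster (e.g., city) level: every user in a cluster
    gets the same variant, preserving within-cluster interactions."""
    rng = random.Random(seed)  # seeded for a reproducible assignment
    shuffled = clusters[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {c: ("treatment" if i < half else "control")
            for i, c in enumerate(shuffled)}

cities = ["Austin", "Boston", "Chicago", "Denver", "Miami", "Seattle"]
assignment = assign_clusters(cities)
```

The trade-off: with far fewer randomization units, you lose statistical power, so cluster experiments typically need larger effects or more clusters to reach significance.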

"When would you NOT run an A/B test?"

  • Ethical concerns — you can't randomly deny medical treatment
  • Too few users — not enough power to detect meaningful effects
  • Irreversible changes — can't un-rebrand a company
  • Obvious improvements — fixing a broken checkout button doesn't need a test
  • Strategic decisions — A/B tests measure incremental changes, not transformative ones

Sequential Testing

Interview question: Your PM wants to check test results daily. How do you handle this?

Traditional fixed-sample tests require you to wait until the sample size is reached. "Peeking" inflates the false positive rate because you're doing multiple tests.

Solutions:

  • Sequential testing (e.g., always-valid p-values) — designed for continuous monitoring
  • Alpha spending — allocate your Type I error budget across multiple looks
  • Bayesian A/B testing — naturally handles continuous monitoring
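Why peeking inflates false positives is easy to show by simulation: run many A/A experiments (no true effect), test after each daily batch, and count how often any look dips below p = 0.05. The sample sizes and look schedule below are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_days, daily_n = 2000, 10, 200

peeking_fp = fixed_fp = 0
for _ in range(n_experiments):
    # A/A test: both arms draw from the same distribution (no real effect).
    a = rng.normal(0, 1, size=(n_days, daily_n))
    b = rng.normal(0, 1, size=(n_days, daily_n))
    p_by_day = [
        stats.ttest_ind(a[:d].ravel(), b[:d].ravel()).pvalue
        for d in range(1, n_days + 1)
    ]
    peeking_fp += min(p_by_day) < 0.05   # stop at the first "significant" look
    fixed_fp += p_by_day[-1] < 0.05      # single test at the planned end

print(f"Fixed-sample false positive rate: {fixed_fp / n_experiments:.1%}")
print(f"Daily-peeking false positive rate: {peeking_fp / n_experiments:.1%}")
```

With ten looks, the peeking rate typically lands several times above the nominal 5%, which is exactly the inflation that sequential methods and alpha spending are designed to control.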

Practical Tips for Interviews

  1. Start with the business question — don't jump to statistics
  2. Define metrics before methodology — what you measure matters more than how
  3. Always mention power and sample size — shows you think practically
  4. Discuss guardrail metrics — shows you think about trade-offs
  5. Know when NOT to test — shows maturity and judgment

Practice Statistics Problems

Strengthen your experimentation skills with our statistics interview problems covering A/B testing, hypothesis testing, and probability.
