A/B Testing Interview Questions: What You Need to Know
A/B Testing Is a Data Science Interview Staple
Nearly every data science interview at a product company includes A/B testing questions. Google, Meta, Netflix, Airbnb, and Uber all rely heavily on experimentation, and they want to know that you can design, analyze, and reason about experiments.
The Basics: What Is an A/B Test?
An A/B test (randomized controlled experiment) compares two versions of something:
- Control (A): the current experience
- Treatment (B): the new experience
Users are randomly assigned to one group. You measure a key metric and determine whether the difference is statistically significant.
Designing an A/B Test
Interview question: How would you design an A/B test for a new recommendation algorithm on our homepage?
Step 1: Define the Hypothesis
"We believe the new recommendation algorithm will increase click-through rate on homepage product recommendations."
- H₀: The new algorithm has no effect on CTR
- H₁: The new algorithm increases CTR
Step 2: Choose the Primary Metric
Pick ONE primary metric (the "Overall Evaluation Criterion"):
- CTR — click-through rate on recommendations
Alongside it, track guardrail metrics — revenue per user, session duration, bounce rate — which should not get worse.
Interview tip: Mention guardrail metrics. It shows you think about unintended consequences.
Step 3: Calculate Sample Size
You need four inputs:
1. Baseline conversion rate (e.g., 5% CTR)
2. Minimum detectable effect (e.g., 10% relative increase → 5.5% CTR)
3. Significance level α (typically 0.05)
4. Power 1−β (typically 0.80)
```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Cohen's h effect size for 5% → 5.5%
effect = proportion_effectsize(0.055, 0.05)
analysis = NormalIndPower()
sample_size = analysis.solve_power(effect, alpha=0.05, power=0.80)
# ≈ 31,000 per group
```
Step 4: Determine Duration
Duration = sample_size_per_group × 2 / daily_traffic
With ~31,000 users per group you need ~62,000 users in total. If you have 10,000 daily users: 62,000 / 10,000 ≈ 7 days minimum.
But also consider:
- Day-of-week effects — run for full weeks
- Novelty effects — run for at least 2 weeks
- Seasonality — avoid launching during holidays
Step 5: Randomization
Randomize at the user level, not the session level. If a user sees version A on Monday and version B on Tuesday, you can't attribute their behavior to either version.
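In practice, user-level assignment is usually deterministic: hash a stable user ID together with a per-experiment salt, so the same user always lands in the same bucket. A minimal sketch (the salt string is made up for illustration):

```python
import hashlib

def assign_variant(user_id: str, salt: str = "homepage-rec-v1") -> str:
    """Deterministically bucket a user into control or treatment (50/50 split)."""
    digest = hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 100 < 50 else "control"

# The same user gets the same variant on Monday and on Tuesday
assert assign_variant("user_42") == assign_variant("user_42")
```

Using a different salt per experiment prevents the same users from always ending up in treatment across experiments.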
Analyzing Results
The Z-Test for Proportions
For conversion rate metrics:
```python
from statsmodels.stats.proportion import proportions_ztest

# control: 500 conversions out of 10,000
# treatment: 550 conversions out of 10,000
stat, p_value = proportions_ztest(
    [550, 500],       # successes (treatment, control)
    [10000, 10000],   # totals
    alternative='larger',  # one-sided: treatment > control
)
# p_value ≈ 0.056 — not significant at α = 0.05
```
Confidence Intervals
Always report confidence intervals, not just p-values:
Treatment CTR: 5.5% (95% CI: 5.1% - 5.9%)
Control CTR: 5.0% (95% CI: 4.6% - 5.4%)
Difference: +0.5pp (95% CI: -0.1pp to +1.1pp)
Interview tip: If the confidence interval includes zero, the result is not significant at the 95% level.
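The interval on the difference can be reproduced with a normal-approximation (Wald) interval; a minimal sketch using the counts from the z-test example (550/10,000 vs 500/10,000):

```python
import math

def diff_ci(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """95% Wald CI for the difference in proportions (treatment - control)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - z * se, p1 - p2 + z * se

low, high = diff_ci(550, 10000, 500, 10000)
print(f"{low:+.4f} to {high:+.4f}")  # -0.0012 to +0.0112: the interval includes zero
```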
Common Interview Questions
"Your test shows a 3% lift but it's not significant. What do you do?"
Options:
1. Run longer — you may need more data
2. Accept the result — maybe the effect is too small to detect with your traffic
3. Segment analysis — check if the effect is concentrated in a subgroup (but correct for multiple comparisons)
4. Don't just launch it — a non-significant result means you can't confidently say there's an effect
"How do you handle multiple comparisons?"
When testing multiple metrics or segments, you increase the chance of false positives:
- 1 test at α=0.05 → 5% chance of a false positive
- 20 tests at α=0.05 → 64% chance of at least one false positive
Solutions:
- Bonferroni correction — divide α by the number of tests (conservative)
- Pre-register your primary metric to avoid post-hoc testing
- Use a single primary metric with guardrails
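The 64% figure, and the effect of a Bonferroni correction, can be checked directly:

```python
alpha, m = 0.05, 20

# Chance of at least one false positive across 20 independent tests
fwer = 1 - (1 - alpha) ** m
print(round(fwer, 2))  # 0.64

# Bonferroni: test each metric at alpha/m to keep the family-wise rate near alpha
fwer_bonferroni = 1 - (1 - alpha / m) ** m
print(round(fwer_bonferroni, 3))  # 0.049
```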
"What is the novelty effect?"
Users may engage more with a new feature simply because it's new, not because it's better. After the novelty wears off, engagement drops back.
Mitigation:
- Run tests for longer (2+ weeks)
- Only include users who haven't been exposed to either version before
- Monitor the treatment effect over time — if it declines, it might be novelty
"How do you handle network effects?"
If users interact with each other (social networks, marketplaces), one user's treatment can spill over and change another user's behavior, violating the assumption that observations are independent.
Solutions:
- Cluster randomization — randomize at the group/geo level
- Switchback experiments — alternate treatments over time
- Adjust standard errors for clustering
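A hypothetical sketch of cluster randomization: hash the cluster key (here, a city) instead of the user ID, so users who interact all see the same variant:

```python
import hashlib

def assign_geo(city: str, salt: str = "marketplace-pricing-v2") -> str:
    """Randomize whole cities so interacting users share one experience."""
    digest = hashlib.md5(f"{salt}:{city}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"

# Every user in the same city lands in the same arm
assert assign_geo("austin") == assign_geo("austin")
```

The trade-off: far fewer randomization units means less statistical power, so analyze at the cluster level or use cluster-robust standard errors.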
"When would you NOT run an A/B test?"
- Ethical concerns — you can't randomly deny medical treatment
- Too few users — not enough power to detect meaningful effects
- Irreversible changes — can't un-rebrand a company
- Obvious improvements — fixing a broken checkout button doesn't need a test
- Strategic decisions — A/B tests measure incremental changes, not transformative ones
Sequential Testing
Interview question: Your PM wants to check test results daily. How do you handle this?
Traditional fixed-sample tests require you to wait until the sample size is reached. "Peeking" inflates the false positive rate because you're doing multiple tests.
Solutions:
- Sequential testing (e.g., always-valid p-values) — designed for continuous monitoring
- Alpha spending — allocate your Type I error budget across multiple looks
- Bayesian A/B testing — naturally handles continuous monitoring
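A quick A/A simulation (illustrative numbers, not from the article) makes the peeking problem concrete: both arms have the same true rate, yet checking daily pushes the false positive rate well above the nominal 5%:

```python
import numpy as np

rng = np.random.default_rng(0)
sims, days, n_per_day, p = 2000, 14, 1000, 0.05

# Daily conversion counts for two identical arms, accumulated over the test
a = rng.binomial(n_per_day, p, size=(sims, days)).cumsum(axis=1)
b = rng.binomial(n_per_day, p, size=(sims, days)).cumsum(axis=1)
n = n_per_day * np.arange(1, days + 1)  # cumulative sample size per arm

pooled = (a + b) / (2 * n)
se = np.sqrt(pooled * (1 - pooled) * 2 / n)
z = (a / n - b / n) / se

peeking_fpr = (np.abs(z) > 1.96).any(axis=1).mean()  # reject at ANY daily look
fixed_fpr = (np.abs(z[:, -1]) > 1.96).mean()         # single look at the end

print(f"daily peeking: {peeking_fpr:.0%}, fixed horizon: {fixed_fpr:.0%}")
```

In runs like this, the fixed-horizon rate stays near 5% while the peeking rate is several times higher, which is exactly why the sequential methods above exist.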
Practical Tips for Interviews
- Start with the business question — don't jump to statistics
- Define metrics before methodology — what you measure matters more than how
- Always mention power and sample size — shows you think practically
- Discuss guardrail metrics — shows you think about trade-offs
- Know when NOT to test — shows maturity and judgment
Practice Statistics Problems
Strengthen your experimentation skills with our statistics interview problems covering A/B testing, hypothesis testing, and probability.