Common Statistics Interview Questions for Data Science
Why Statistics Questions Matter
Statistics is the backbone of data science. While SQL and Python test your technical skills, statistics questions test whether you can think like a data scientist. Every FAANG company includes a statistics or probability round, and even SQL-heavy interviews sneak in questions about statistical concepts.
Probability Fundamentals
Conditional Probability
Interview question: If 1% of a population has a disease, and a test is 99% accurate (both sensitivity and specificity), what's the probability someone who tests positive actually has the disease?
This is Bayes' Theorem in action:
P(Disease | Positive) = P(Positive | Disease) × P(Disease) / P(Positive)
P(Positive) = P(Positive | Disease) × P(Disease) + P(Positive | No Disease) × P(No Disease)
P(Positive) = 0.99 × 0.01 + 0.01 × 0.99 = 0.0198
P(Disease | Positive) = 0.99 × 0.01 / 0.0198 = 0.50 = 50%
Despite the 99% accuracy, there's only a 50% chance the person is actually sick. This is the base rate fallacy — the low prevalence (1%) dramatically affects the result.
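The calculation above can be sketched in a few lines of Python (same assumed numbers: 1% prevalence, 99% sensitivity and specificity):

```python
prevalence = 0.01      # P(Disease)
sensitivity = 0.99     # P(Positive | Disease)
specificity = 0.99     # P(Negative | No Disease)

# Law of total probability: overall chance of testing positive
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes' Theorem: posterior probability of disease given a positive test
posterior = sensitivity * prevalence / p_positive
print(round(posterior, 2))  # 0.5
```

Walking through the arithmetic out loud like this is exactly what interviewers want to see.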
Expected Value
Interview question: You flip a fair coin. Heads you win $10, tails you lose $5. What's the expected value?
E(X) = 0.5 × $10 + 0.5 × (-$5) = $2.50
Expected value questions are common because they test whether candidates can think about decisions under uncertainty — a core data science skill.
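The general recipe is a probability-weighted sum over outcomes, which is worth being able to write down:

```python
# Expected value of the coin-flip bet above: heads +$10, tails -$5
outcomes = {10: 0.5, -5: 0.5}  # payoff -> probability
ev = sum(value * prob for value, prob in outcomes.items())
print(ev)  # 2.5
```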
The Birthday Problem
Interview question: How many people do you need in a room for a 50% chance that two share a birthday?
The answer is 23 — surprisingly low. The key insight is calculating the probability that NO one shares a birthday:
P(no match) = 365/365 × 364/365 × 363/365 × ... × (365-n+1)/365
At n=23, this drops below 0.5.
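The product above is easy to verify numerically; this sketch finds the smallest n where a match becomes more likely than not:

```python
def p_no_match(n):
    """Probability that n people all have distinct birthdays (365-day year)."""
    prob = 1.0
    for i in range(n):
        prob *= (365 - i) / 365
    return prob

# Smallest n where P(at least one shared birthday) exceeds 0.5
n = 1
while p_no_match(n) >= 0.5:
    n += 1
print(n)  # 23
```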
Distributions You Must Know
Normal Distribution
- 68-95-99.7 rule: 68% within 1 SD, 95% within 2 SD, 99.7% within 3 SD
- Central Limit Theorem: sample means are approximately normal regardless of population distribution (for large enough samples)
Binomial Distribution
Models the number of successes in n independent trials with probability p:
- Mean: n × p
- Variance: n × p × (1-p)
- Example: number of conversions out of 1000 website visitors
Poisson Distribution
Models the count of events in a fixed time/space interval:
- Mean = Variance = λ (lambda)
- Example: number of customer service calls per hour
Interview tip: Know when each distribution applies. "Number of heads in 100 flips" → Binomial. "Number of website crashes per month" → Poisson. "Height of adults" → Normal.
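A quick way to internalize these facts is to compute them from the pmfs directly. A stdlib-only sketch (the n = 1000, p = 0.03 and λ = 4 values are illustrative):

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Mean of Binomial(1000, 0.03), e.g. conversions out of 1000 visitors
mean_binom = sum(k * binomial_pmf(k, 1000, 0.03) for k in range(1001))
print(round(mean_binom, 6))  # 30.0 — matches n * p

# Mean of Poisson(4), e.g. support calls per hour (truncated sum)
mean_pois = sum(k * poisson_pmf(k, 4) for k in range(100))
print(round(mean_pois, 6))   # 4.0 — matches lambda
```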
Hypothesis Testing
The Framework
- State hypotheses: H₀ (null — no effect) vs H₁ (alternative — there is an effect)
- Choose significance level: α = 0.05 (typically)
- Collect data and compute test statistic
- Calculate p-value
- Make decision: reject H₀ if p-value < α
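The framework above maps directly onto code. A stdlib-only sketch of a two-proportion z-test (the conversion counts are made-up illustration data):

```python
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference in proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                 # pooled rate under H0
    se = (p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
    return z, p_value

# Control: 200/2000 converted; treatment: 250/2000 converted
z, p = two_prop_z_test(200, 2000, 250, 2000)
alpha = 0.05
print(round(p, 4), "reject H0" if p < alpha else "fail to reject H0")
```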
Type I and Type II Errors
| | H₀ is True | H₀ is False |
|---|---|---|
| Reject H₀ | Type I Error (α) | Correct (Power) |
| Fail to Reject | Correct | Type II Error (β) |
Interview question: What's worse, a Type I or Type II error?
It depends on context:
- Medical test: Type II (missing a disease) is worse
- Spam filter: Type I (blocking legitimate email) is worse
- A/B test: Type I (shipping a bad feature) is usually worse
P-Value
Interview question: What does a p-value of 0.03 mean?
A p-value of 0.03 means: if the null hypothesis is true, there's a 3% probability of observing a result as extreme as (or more extreme than) the one we got.
It does NOT mean there's a 97% chance the alternative hypothesis is true.
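The definition can be made concrete with a fair coin (the 60-heads scenario is illustrative): under H₀ "the coin is fair", the p-value is just the tail probability of the observed result.

```python
import math

def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# One-sided p-value: how surprising are 60+ heads in 100 flips if the coin is fair?
p_value = binom_tail(60, 100)
print(round(p_value, 4))  # ~0.0284
```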
A/B Testing
A/B testing is the most practically relevant statistics topic in data science interviews.
Designing an A/B Test
Interview question: How would you design an A/B test for a new checkout button?
- Define the metric: conversion rate (or revenue per user)
- Choose significance level: α = 0.05
- Determine minimum detectable effect: e.g., 2% relative improvement
- Calculate sample size: based on baseline rate, MDE, α, and desired power (typically 80%)
- Randomize users into control and treatment
- Run the test for the calculated duration
- Analyze results with appropriate test
Sample Size Calculation
The key formula components:
- Baseline conversion rate (p₁)
- Minimum detectable effect (determines p₂)
- Significance level (α, typically 0.05)
- Power (1-β, typically 0.80)
Higher power, smaller MDE, or lower α all require larger sample sizes.
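These components combine in the standard two-proportion sample size formula. A hedged, stdlib-only sketch (the 10% baseline and 2% relative MDE are illustrative assumptions):

```python
import math
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p1 - p2) ** 2)
    return math.ceil(n)

# 10% baseline, 2% relative lift -> p2 = 0.102
n = sample_size_per_group(0.10, 0.102)
print(n)  # hundreds of thousands per group — small MDEs are expensive
```

Note how sensitive n is to the MDE: doubling the detectable effect cuts the required sample roughly by a factor of four.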
Common A/B Testing Pitfalls
- Peeking at results — checking before the test completes inflates false positive rates
- Multiple comparisons — at α = 0.05, testing 20 metrics means roughly one will appear significant by chance alone (apply a Bonferroni correction)
- Simpson's Paradox — a trend that appears in groups can reverse when groups are combined
- Novelty effect — users may engage with new features just because they're new, not because they're better
- Network effects — if users interact, independence assumption is violated
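The multiple-comparisons pitfall is easy to quantify. With 20 independent tests at α = 0.05:

```python
alpha, m = 0.05, 20

# Probability of at least one false positive across 20 independent tests
p_any = 1 - (1 - alpha) ** m
print(round(p_any, 3))       # 0.642 — far above the nominal 5%

# Bonferroni correction: test each metric at alpha / m instead
corrected = 1 - (1 - alpha / m) ** m
print(round(corrected, 3))   # ~0.049 — back near the nominal level
```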
When Results Aren't Significant
Interview question: Your A/B test shows a 5% lift in conversion but it's not statistically significant. What do you do?
Options:
- Run the test longer to increase sample size
- Accept insufficient power for this effect size
- Look at segment-level results (but be careful of multiple comparisons)
- Consider practical significance even without statistical significance
Correlation vs Causation
Interview question: Ice cream sales and drowning deaths are correlated. Does ice cream cause drowning?
No — both are caused by a confounder (warm weather). This is the most fundamental concept in causal inference:
- Correlation: two variables move together
- Causation: one variable directly influences another
- Only randomized experiments (like A/B tests) can establish causation from data alone
Practice Statistics Problems
Test your statistical thinking with our statistics interview problems — covering probability, hypothesis testing, A/B testing, and more.