
Machine Learning Interview Questions: A Practical Guide

How ML Interviews Work

Machine learning interviews for data science roles focus on intuition and practical understanding — not mathematical proofs. You need to explain concepts clearly, know the trade-offs between approaches, and demonstrate that you can make sound modeling decisions.

The Bias-Variance Tradeoff

This is the single most important concept in ML interviews.

Bias — error from overly simplistic assumptions. A linear model fitting a curved relationship has high bias (underfitting).

Variance — error from sensitivity to training data fluctuations. A deep decision tree that memorizes training data has high variance (overfitting).

The tradeoff: As model complexity increases, bias decreases but variance increases. The sweet spot minimizes total error.

Interview question: Your model has 95% training accuracy but 60% test accuracy. What's happening and how do you fix it?

Answer: High variance (overfitting). Solutions:

  • More training data
  • Regularization (L1/L2)
  • Reduce model complexity (fewer features, shallower trees)
  • Cross-validation for hyperparameter tuning
  • Ensemble methods (bagging reduces variance)
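One way to see the diagnosis concretely is to compare an unconstrained and a depth-limited tree. A sketch using scikit-learn on synthetic data (the dataset, depth, and split are illustrative, not from the question above):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy binary classification data (illustrative)
rng = np.random.RandomState(0)
X = rng.randn(500, 5)
y = (X[:, 0] + 0.5 * rng.randn(500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Unconstrained tree: memorizes the training set (high variance)
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Depth-limited tree: one simple way to reduce variance
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print("deep    train/test:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("shallow train/test:", shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))
```

The unconstrained tree reaches perfect training accuracy but drops on the test set; capping the depth typically narrows that gap, which is exactly the 95%-vs-60% symptom described above.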

Model Selection

When to Use What

  • Logistic Regression — Best for: binary classification, interpretability. Pros: fast, interpretable, works well with linear boundaries. Cons: can't capture non-linear relationships.
  • Decision Trees — Best for: non-linear relationships, feature importance. Pros: interpretable, handles mixed data types. Cons: overfits easily, unstable.
  • Random Forest — Best for: general-purpose classification/regression. Pros: robust, handles non-linearity, feature importance. Cons: less interpretable, slower than single trees.
  • Gradient Boosting (XGBoost) — Best for: competitions, tabular data. Pros: often highest accuracy, handles missing values. Cons: can overfit, many hyperparameters.
  • Linear Regression — Best for: continuous target, interpretability. Pros: simple, fast, well-understood. Cons: assumes linear relationship.
  • K-Nearest Neighbors — Best for: small datasets, non-parametric problems. Pros: simple, no training phase. Cons: slow at prediction time, curse of dimensionality.

Interview question: When would you choose logistic regression over a random forest?

Choose logistic regression when:

  • You need interpretable coefficients (regulated industries)
  • The relationship is approximately linear
  • You have limited data (simpler models generalize better)
  • Speed matters (both training and inference)
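The interpretability point can be made concrete: a fitted logistic regression exposes one signed coefficient per feature. A minimal scikit-learn sketch on synthetic, linearly separable data (the data and coefficients are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = (2 * X[:, 0] - X[:, 1] > 0).astype(int)  # linear decision boundary

clf = LogisticRegression(max_iter=1000).fit(X, y)

# One coefficient per feature: sign and magnitude are directly readable,
# which is the interpretability a random forest does not offer.
print(clf.coef_)
```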

Feature Engineering

Feature engineering often matters more than model selection. Key techniques:

Encoding Categorical Variables

import pandas as pd

# One-hot encoding (for nominal categories)
df = pd.get_dummies(df, columns=['color'])

# Label encoding (for ordinal categories, where order carries meaning)
df['size_encoded'] = df['size'].map({'S': 1, 'M': 2, 'L': 3, 'XL': 4})

# Target encoding (for high-cardinality categories): replace each category
# with the mean of the target for that category. Compute the means on the
# training split only, to avoid target leakage.
# ('city' and 'target' are illustrative column names)
df['city_encoded'] = df['city'].map(df.groupby('city')['target'].mean())

Handling Missing Data

  • Drop rows/columns — only if very few missing values
  • Mean/median imputation — simple but can distort distributions
  • Mode imputation — for categorical features
  • Indicator variable — add a binary "is_missing" feature
  • Model-based imputation — use other features to predict missing values
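A minimal pandas sketch combining the indicator variable with median and mode imputation (the DataFrame and column names are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'age':  [25.0, np.nan, 40.0, 31.0],
    'city': ['NY', 'LA', None, 'NY'],
})

# Add the indicator first, so the "was missing" signal survives imputation
df['age_missing'] = df['age'].isna().astype(int)

# Median for the numeric column, mode for the categorical one
df['age'] = df['age'].fillna(df['age'].median())
df['city'] = df['city'].fillna(df['city'].mode()[0])

print(df)
```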

Feature Scaling

  • StandardScaler — zero mean, unit variance. Use for linear models, SVMs, KNN
  • MinMaxScaler — scales to [0, 1]. Use when you need bounded values
  • Tree-based models don't need scaling — they split on thresholds, not distances
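A quick scikit-learn sketch of both scalers on a toy array (values are illustrative); in a real pipeline, fit the scaler on the training split only and reuse it on the test split:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Zero mean, unit variance per column
X_std = StandardScaler().fit_transform(X)

# Each column rescaled to [0, 1]
X_mm = MinMaxScaler().fit_transform(X)

print(X_std.mean(axis=0))                  # approximately zero per column
print(X_mm.min(axis=0), X_mm.max(axis=0))  # column-wise 0s and 1s
```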

Evaluation Metrics

Classification Metrics

Accuracy — percentage of correct predictions. Misleading for imbalanced classes.

Precision — of all positive predictions, how many were actually positive? High precision matters when false positives are costly (spam filtering).

Recall — of all actual positives, how many did we catch? High recall matters when false negatives are costly (disease detection).

F1 Score — harmonic mean of precision and recall. Use when you need to balance both.

AUC-ROC — measures discrimination ability across all thresholds. Good for comparing models.
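These definitions are easy to check on a toy confusion matrix (the labels below are invented; with 3 TP, 1 FP, 1 FN, and 3 TN, every metric works out to 0.75):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # 3 TP, 3 TN, 1 FP, 1 FN

print(accuracy_score(y_true, y_pred))   # (TP + TN) / total = 6/8 = 0.75
print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 3/4 = 0.75
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 3/4 = 0.75
print(f1_score(y_true, y_pred))         # harmonic mean of the two = 0.75
```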

Interview question: You're building a fraud detection model. Which metric do you optimize?

Answer: Recall (or F1) — missing fraud (false negative) is more costly than flagging legitimate transactions (false positive). But you'd also monitor precision to avoid blocking too many good transactions.

Regression Metrics

  • MSE / RMSE — penalizes large errors heavily
  • MAE — more robust to outliers
  • R² — proportion of variance explained (at most 1; can be negative if the model is worse than predicting the mean)
  • MAPE — percentage error, useful for business interpretation
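A worked toy example (the values are invented) showing how the first three metrics are computed with scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 8.0, 9.5])

mse = mean_squared_error(y_true, y_pred)   # mean of squared errors
rmse = np.sqrt(mse)                        # back in the target's units
mae = mean_absolute_error(y_true, y_pred)  # mean of absolute errors
r2 = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot

print(mse, rmse, mae, r2)  # 0.375, ~0.612, 0.5, 0.925
```

Note how the single error of 1.0 dominates MSE (squared to 1.0) but contributes only proportionally to MAE, which is why MAE is the more robust choice when outliers are expected.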

Cross-Validation

Interview question: Why use cross-validation instead of a single train/test split?

A single split gives one estimate of model performance, which could be lucky or unlucky. K-fold cross-validation:

  1. Splits data into K folds
  2. Trains K models, each using a different fold as the test set
  3. Averages the K performance scores

This gives a more reliable performance estimate, and the spread of the K scores indicates how stable that estimate is.

When to use what:

  • K-fold (K=5 or 10) — standard approach for most problems
  • Stratified K-fold — preserves class distribution in each fold (use for imbalanced data)
  • Time-series split — respects temporal ordering (never train on future data)
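The stratified variant is a one-liner in scikit-learn. A sketch on synthetic data (LogisticRegression is just a stand-in model here):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, random_state=0)

# Stratified: each fold preserves the overall class balance
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

# The mean is the performance estimate; the spread across folds
# shows how stable that estimate is
print(scores.mean(), scores.std())
```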

Regularization

Regularization prevents overfitting by penalizing model complexity:

  • L1 (Lasso) — drives some coefficients to exactly zero (feature selection)
  • L2 (Ridge) — shrinks all coefficients toward zero (prevents any single feature from dominating)
  • ElasticNet — combination of L1 and L2
  • Dropout — randomly drops neurons during training (neural networks)
  • Early stopping — stop training when validation loss starts increasing
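The L1-vs-L2 difference shows up directly in the fitted coefficients. A sketch on synthetic data where only the first two of ten features matter (the alpha values are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(100)  # 8 features are pure noise

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty

print(np.sum(lasso.coef_ == 0))  # L1 drives noise coefficients to exactly zero
print(np.sum(ridge.coef_ == 0))  # L2 shrinks them but leaves them nonzero
```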

Common Interview Scenarios

Imbalanced Classes

You have 99% negative and 1% positive examples. How do you handle this?

  1. Don't use accuracy — a model predicting all negatives gets 99%
  2. Resampling: oversample minority (SMOTE) or undersample majority
  3. Class weights: penalize misclassification of minority class more heavily
  4. Use appropriate metrics: precision, recall, F1, AUC
  5. Threshold tuning: adjust decision threshold based on business needs
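Point 3 (class weights) is a one-argument change in scikit-learn. A sketch on a synthetic roughly-95/5 dataset (the numbers are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Roughly 95% negative / 5% positive
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# 'balanced' penalizes errors inversely to class frequency
weighted = LogisticRegression(class_weight='balanced', max_iter=1000).fit(X_tr, y_tr)

print(recall_score(y_te, plain.predict(X_te)))
print(recall_score(y_te, weighted.predict(X_te)))
```

The weighted model typically recovers more of the minority class, at some cost in precision, which is the trade-off you would then tune against business needs (point 5).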

Feature Importance

How do you determine which features matter most?

  1. Coefficient magnitude (linear models) — after scaling features
  2. Tree-based importance — Gini importance or permutation importance
  3. SHAP values — model-agnostic, theoretically grounded
  4. Correlation analysis — simple but doesn't capture non-linear relationships
  5. Recursive feature elimination — systematically remove least important features
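Permutation importance (point 2) is easy to demonstrate: shuffle one feature at a time and measure how much the score drops. A scikit-learn sketch on synthetic data where only feature 0 carries signal:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.RandomState(0)
X = rng.randn(300, 4)
y = (X[:, 0] > 0).astype(int)  # only feature 0 is informative

model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffling an important feature hurts the score; shuffling noise does not
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)
```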

Practice ML Problems

Explore our machine learning interview problems for hands-on practice with real questions from top companies.
