
Python List Comprehensions for Data Science Interviews

Why Interviewers Test List Comprehensions

List comprehensions are one of Python's most recognizable idioms for concise data transformation. In data science interviews, fluent use signals that a candidate thinks in Python rather than translating loops from another language. Companies like Netflix, Spotify, and Airbnb expect fluent use of comprehensions in their Python rounds.

Basic Syntax

A list comprehension creates a new list by applying an expression to each item in an iterable:

# Traditional loop
squares = []
for x in range(10):
    squares.append(x ** 2)

# List comprehension
squares = [x ** 2 for x in range(10)]

The general pattern is:

[expression for item in iterable]

Adding Conditions (Filtering)

Use if to filter which items are included:

# Only even numbers
even_squares = [x ** 2 for x in range(20) if x % 2 == 0]

# Filter strings by length
long_words = [w for w in words if len(w) > 5]

# Multiple conditions
results = [x for x in data if x > 0 if x < 100]
# Equivalent to: if x > 0 and x < 100
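As a quick check of the equivalence noted above, the sketch below runs the chained-if form, the `and` form, and Python's chained comparison on the same input. The sample values in `data` are made up for illustration.

```python
data = [-5, 0, 12, 99, 100, 250]  # made-up sample values

# Two chained if clauses: both must be true for x to be kept
chained_ifs = [x for x in data if x > 0 if x < 100]

# The same test written as a single condition
with_and = [x for x in data if x > 0 and x < 100]

# Python's chained comparison is usually the most readable form
chained_comparison = [x for x in data if 0 < x < 100]

print(chained_ifs)  # [12, 99]
```

All three lists are identical, so the choice between them is purely about readability.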

If-Else (Conditional Expression)

When you need different values based on a condition, the if-else goes before the for:

# Label values as "high" or "low"
labels = ["high" if x > 50 else "low" for x in scores]

# Replace negative values with zero
cleaned = [x if x >= 0 else 0 for x in raw_data]

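Common interview mistake: Putting if-else after the for. A trailing if is filtering syntax and cannot take an else, so [x if x > 0 else 0 for x in data] is valid while [x for x in data if x > 0 else 0] raises a SyntaxError.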
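The difference between the two positions of if can be sketched side by side. The `scores` values and the "top"/"high" labels are made up for illustration; the last part uses `compile()` only to show that the misplaced form is rejected before it ever runs.

```python
scores = [35, 72, 50, 91]  # made-up sample values

# Conditional expression (before the for): every item is kept, its value is chosen
labels = ["high" if x > 50 else "low" for x in scores]
# ['low', 'high', 'low', 'high']

# Filter (after the for): items that fail the test are dropped
high_only = [x for x in scores if x > 50]
# [72, 91]

# Both together: the filter decides what is kept, the expression decides the value
bands = ["top" if x > 90 else "high" for x in scores if x > 50]
# ['high', 'top']

# A trailing if cannot take an else, so this source text does not compile
try:
    compile("[x for x in scores if x > 50 else 0]", "<example>", "eval")
    compiled = True
except SyntaxError:
    compiled = False
```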

Nested Loops

Comprehensions can have multiple for clauses:

# Flatten a 2D list
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row]
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

# All pairs from two lists
pairs = [(x, y) for x in [1, 2, 3] for y in ['a', 'b']]
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')]

Reading order: The for clauses are read left to right, same as nested loops would be written.
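To make the reading order concrete, here is the flatten example written both ways. The first for clause in the comprehension corresponds to the outer loop and the second to the inner loop.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Nested loops, written top to bottom
flat_loop = []
for row in matrix:      # outer loop = first for clause
    for x in row:       # inner loop = second for clause
        flat_loop.append(x)

# The same clauses, written left to right
flat_comp = [x for row in matrix for x in row]

print(flat_loop == flat_comp)  # True
```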

Dictionary and Set Comprehensions

The same syntax works for dictionaries and sets:

# Dictionary comprehension
word_lengths = {word: len(word) for word in words}

# Invert a dictionary
inverted = {v: k for k, v in original.items()}

# Set comprehension (removes duplicates)
unique_lengths = {len(word) for word in words}
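One caveat with inverting a dictionary is worth knowing for interviews: if two keys share a value, the later key silently overwrites the earlier one. The sketch below uses a made-up `original` mapping to show the problem and one possible way to keep every key.

```python
original = {"a": 1, "b": 2, "c": 1}  # made-up mapping with a repeated value

# Plain inversion: 'a' is lost because 'c' maps to the same value
inverted = {v: k for k, v in original.items()}
# {1: 'c', 2: 'b'}

# One way to keep all keys: collect a list of keys for each distinct value
keys_by_value = {
    value: [k for k, v in original.items() if v == value]
    for value in set(original.values())
}
# {1: ['a', 'c'], 2: ['b']}
```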

Dictionary Comprehension Interview Pattern

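# Group items by a key
# grade_of() is assumed to be a helper that maps a score to its letter grade.
# A comprehension fits when each group can be computed independently
# (values are computed, not accumulated). Note that it rescans scores once per grade.
scores_by_grade = {
    grade: [s for s in scores if grade_of(s) == grade]
    for grade in ['A', 'B', 'C', 'D', 'F']
}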
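When values need to be accumulated in a single pass, a `defaultdict` with a regular loop is the usual alternative to the comprehension above. The sketch below is a minimal comparison; `grade_of` and its cutoffs are hypothetical, and the `scores` values are made up.

```python
from collections import defaultdict


def grade_of(score):
    """Hypothetical helper: map a 0-100 score to a letter grade."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


scores = [95, 82, 71, 88, 64, 55, 90]  # made-up sample values

# Accumulating version: one pass over scores, one grade_of() call per score
by_grade = defaultdict(list)
for s in scores:
    by_grade[grade_of(s)].append(s)

# Comprehension version: one pass over scores for every grade
by_grade_comp = {
    grade: [s for s in scores if grade_of(s) == grade]
    for grade in ["A", "B", "C", "D", "F"]
}

print(dict(by_grade) == by_grade_comp)  # True
```

The two results match here because every grade has at least one score. With the comprehension, a grade with no scores would still appear with an empty list, while the `defaultdict` version would omit it.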

Generator Expressions

Replace the brackets with parentheses for lazy evaluation:

# List comprehension — creates entire list in memory
total = sum([x ** 2 for x in range(1_000_000)])

# Generator expression — processes one at a time
total = sum(x ** 2 for x in range(1_000_000))

For large datasets, a generator expression yields one item at a time, so its memory use stays constant instead of growing as O(n) with the input. This is a common optimization question in interviews.
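The memory difference can be observed directly with `sys.getsizeof`, as in the rough sketch below. Exact byte counts vary by Python version and platform, so none are shown. The sketch also shows a trade-off: a generator can be consumed only once.

```python
import sys

squares_list = [x ** 2 for x in range(100_000)]
squares_gen = (x ** 2 for x in range(100_000))

# Size of the container objects themselves (the list's elements are not counted)
list_size = sys.getsizeof(squares_list)  # grows with the number of items
gen_size = sys.getsizeof(squares_gen)    # small and independent of the range

print(gen_size < list_size)  # True

# A generator is exhausted after one pass
total = sum(squares_gen)
leftover = sum(squares_gen)  # 0, because nothing is left to yield
```

If the values are needed more than once, a list is still the right choice.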

Data Science Applications

Cleaning and Transforming Data

# Parse numeric strings, skipping known placeholders ("N/A" and empty strings)
raw = ["42", "3.14", "N/A", "100", ""]
values = [float(x) for x in raw if x and x != "N/A"]
# [42.0, 3.14, 100.0]

# Normalize column names
columns = ["First Name", "Last-Name", "E Mail"]
clean_cols = [col.lower().replace(" ", "_").replace("-", "_") for col in columns]
# ['first_name', 'last_name', 'e_mail']
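Filtering out known placeholders only works when every bad value is known in advance. A comprehension cannot contain try/except, so one common approach is to move the error handling into a small helper. The sketch below assumes a hypothetical helper named `to_float` and adds a made-up unexpected value, "abc", to the input.

```python
def to_float(text):
    """Hypothetical helper: return the parsed float, or None if parsing fails."""
    try:
        return float(text)
    except ValueError:
        return None


raw = ["42", "3.14", "N/A", "100", "", "abc"]

# Parse everything, then keep only the values that parsed successfully
parsed = [to_float(x) for x in raw]
values = [v for v in parsed if v is not None]

print(values)  # [42.0, 3.14, 100.0]
```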

Feature Engineering

# Create binary features
features = [1 if val > threshold else 0 for val in column]

# Extract patterns from strings
import re
# Placeholder addresses for illustration
emails = ["alice@example.com", "bob@example.org", "carol@example.net"]
domains = [re.search(r'@(.+)', e).group(1) for e in emails]
# ['example.com', 'example.org', 'example.net']
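Calling `.group(1)` directly on the result of `re.search` raises an AttributeError when a string has no match, because `re.search` returns None in that case. A minimal sketch of a more defensive version follows; the addresses are placeholders, and the last entry is deliberately malformed.

```python
import re

# Placeholder inputs; the last one has no "@" and would break .group(1)
emails = ["alice@example.com", "bob@example.org", "not-an-email"]

pattern = re.compile(r"@(.+)")

# Search each string lazily, then keep only the successful matches
matches = (pattern.search(e) for e in emails)
domains = [m.group(1) for m in matches if m]

print(domains)  # ['example.com', 'example.org']
```

The inner generator expression keeps the regex from running twice per string.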

Working with DataFrames

import pandas as pd

# Select numeric columns by dtype (df is assumed to be an existing DataFrame
# with a 'category' column)
numeric_cols = [col for col in df.columns if df[col].dtype in ['int64', 'float64']]

# Build an aggregation spec for every numeric column
aggs = {col: ['mean', 'std'] for col in numeric_cols}
summary = df.groupby('category').agg(aggs)

Common Interview Questions

Question 1: Flatten and Filter

Given a list of lists of integers, create a flat list containing only positive even numbers.

nested = [[1, -2, 3, 4], [-5, 6, 7], [8, -9, 10]]
result = [x for sublist in nested for x in sublist if x > 0 and x % 2 == 0]
# [4, 6, 8, 10]

Question 2: Word Frequency

Count word frequencies in a sentence (case-insensitive).

from collections import Counter

sentence = "The cat sat on the mat and the cat"
word_counts = Counter(word.lower() for word in sentence.split())
# Counter({'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1, 'and': 1})

Question 3: Matrix Transpose

Transpose a matrix represented as a list of lists.

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
# [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
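Interviewers often follow up by asking for a shorter version. One common alternative uses `zip(*matrix)`, which pairs up the i-th element of every row. The sketch below uses a made-up non-square matrix to show that both forms agree.

```python
matrix = [[1, 2, 3], [4, 5, 6]]  # made-up 2 x 3 example

# Index-based version, as in the solution above
by_index = [[row[i] for row in matrix] for i in range(len(matrix[0]))]

# zip(*matrix) yields the columns as tuples; list() turns each into a list
by_zip = [list(col) for col in zip(*matrix)]

print(by_zip)  # [[1, 4], [2, 5], [3, 6]]
```

Both versions assume every row has the same length. With ragged rows, `zip` stops at the shortest row, while the index-based version may raise an IndexError.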

When NOT to Use Comprehensions

Comprehensions aren't always the answer:

  1. Side effects — If you're calling functions for their side effects (like print()), use a regular loop
  2. Complex logic — If the comprehension is hard to read, a loop is better
  3. Multiple statements — Comprehensions are single expressions; use loops for multi-step logic

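Interview tip: If a comprehension needs more than two for clauses or several nested conditions, refactor it into a loop or break it into steps. Splitting a comprehension across lines for layout is fine; cramming multi-step logic into one expression is not. Readability matters.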
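As an illustration of the readability point, the sketch below writes the same logic twice: once as a dense one-liner and once as a loop with named steps. The `records` data and field names are made up for the example.

```python
# Made-up records: a name plus a list of measurements
records = [
    {"name": "a", "values": [1, -2, 3]},
    {"name": "b", "values": []},
    {"name": "c", "values": [4, 5]},
]

# Dense version: filter, nested generator, and tuple building in one expression
dense = [(r["name"], sum(v for v in r["values"] if v > 0)) for r in records if r["values"]]

# Loop version: each step gets a name and can be explained in isolation
readable = []
for record in records:
    values = record["values"]
    if not values:
        continue  # skip records with no measurements
    positive_total = sum(v for v in values if v > 0)
    readable.append((record["name"], positive_total))

print(readable)  # [('a', 4), ('c', 9)]
```

Both produce the same result. The loop version is longer, but it is easier to talk through and debug in an interview.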

Practice List Comprehension Problems

Test your skills with our Python list comprehension problems — real interview questions with step-by-step solutions.
