Python List Comprehensions for Data Science Interviews
Why Interviewers Test List Comprehensions
List comprehensions are one of Python's most distinctive features for concise data transformation. In data science interviews, they signal that a candidate thinks in Python rather than writing loops translated from another language. Companies like Netflix, Spotify, and Airbnb expect fluent use of comprehensions in their Python rounds.
Basic Syntax
A list comprehension creates a new list by applying an expression to each item in an iterable:
# Traditional loop
squares = []
for x in range(10):
    squares.append(x ** 2)
# List comprehension
squares = [x ** 2 for x in range(10)]
The general pattern is:
[expression for item in iterable]
Adding Conditions (Filtering)
Use if to filter which items are included:
# Only even numbers
even_squares = [x ** 2 for x in range(20) if x % 2 == 0]
# Filter strings by length
long_words = [w for w in words if len(w) > 5]
# Multiple conditions
results = [x for x in data if x > 0 if x < 100]
# Equivalent to: if x > 0 and x < 100
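A quick check, on made-up sample data, that two stacked `if` clauses behave like a single `and`, and like Python's chained comparison `0 < x < 100`:

```python
# Sample data chosen for illustration only
data = [-5, 0, 12, 99, 100, 250, 42]

stacked = [x for x in data if x > 0 if x < 100]     # two filter clauses
combined = [x for x in data if x > 0 and x < 100]   # one clause with and
chained = [x for x in data if 0 < x < 100]          # chained comparison

print(stacked)  # [12, 99, 42]
```

The `and` form or the chained comparison is usually easier to read; stacked `if` clauses are legal but less common.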
If-Else (Conditional Expression)
When you need different values based on a condition, the if-else goes before the for:
# Label values as "high" or "low"
labels = ["high" if x > 50 else "low" for x in scores]
# Replace negative values with zero
cleaned = [x if x >= 0 else 0 for x in raw_data]
Common interview mistake: putting the if-else after the for. Only a bare if (a filter) may follow the for; adding an else there raises a SyntaxError.
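To see the difference, the sketch below compiles both placements from strings, so the invalid one can be caught instead of stopping the script. Only the version with the conditional expression before the `for` parses. The helper `is_valid_syntax` is a hypothetical name used for illustration.

```python
valid = "[x if x >= 0 else 0 for x in raw_data]"     # conditional expression first
invalid = "[x for x in raw_data if x >= 0 else 0]"   # if-else after the for

def is_valid_syntax(source: str) -> bool:
    """Return True if `source` parses as a Python expression (it is never executed)."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

print(is_valid_syntax(valid))    # True
print(is_valid_syntax(invalid))  # False
```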
Nested Loops
Comprehensions can have multiple for clauses:
# Flatten a 2D list
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row]
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
# All pairs from two lists
pairs = [(x, y) for x in [1, 2, 3] for y in ['a', 'b']]
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')]
Reading order: the for clauses are read left to right, in the same order (outermost first) that the equivalent nested loops would be written.
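Written out as explicit loops, the flatten example nests in exactly the order its `for` clauses appear:

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Comprehension: outer loop (row) first, inner loop (x) second
flat = [x for row in matrix for x in row]

# Equivalent explicit loops, same left-to-right order
flat_loop = []
for row in matrix:        # first for clause
    for x in row:         # second for clause
        flat_loop.append(x)

print(flat == flat_loop)  # True
```

Swapping the clauses to `[x for x in row for row in matrix]` does not mean the same thing: the first `row` is looked up before the clause that defines it, which raises a `NameError`, or silently reuses a stale `row` variable if one happens to exist.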
Dictionary and Set Comprehensions
The same syntax works for dictionaries and sets:
# Dictionary comprehension
word_lengths = {word: len(word) for word in words}
# Invert a dictionary
inverted = {v: k for k, v in original.items()}
# Set comprehension (removes duplicates)
unique_lengths = {len(word) for word in words}
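One caveat on the inversion pattern that is worth raising in an interview: if two keys share a value, the inverted dictionary keeps only the last key seen. A small sketch with made-up data, plus one way to keep every key by grouping:

```python
original = {"a": 1, "b": 2, "c": 1}

# Later keys overwrite earlier ones when values collide
inverted = {v: k for k, v in original.items()}
print(inverted)  # {1: 'c', 2: 'b'}

# Keeping every key requires grouping them into lists
grouped = {
    v: [k for k, val in original.items() if val == v]
    for v in sorted(set(original.values()))
}
print(grouped)  # {1: ['a', 'c'], 2: ['b']}
```

The grouped version rescans the dictionary once per distinct value, which is fine for small inputs but quadratic in the worst case.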
Dictionary Comprehension Interview Pattern
# Group items by a key
# With a comprehension (each group is computed by filtering, so scores is rescanned once per grade)
scores_by_grade = {
grade: [s for s in scores if grade_of(s) == grade]
for grade in ['A', 'B', 'C', 'D', 'F']
}
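The snippet above assumes a `grade_of` helper and a `scores` list that are not defined here. Below is a runnable sketch with a hypothetical `grade_of` (the cutoffs are made up) that also shows the single-pass `collections.defaultdict` alternative, which accumulates values instead of rescanning `scores` once per grade:

```python
from collections import defaultdict

def grade_of(score: int) -> str:
    """Hypothetical grade cutoffs, for illustration only."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"

scores = [95, 82, 67, 91, 55, 78, 88]  # sample data

# Comprehension: one filtering pass over scores per grade (5 passes here)
scores_by_grade = {
    grade: [s for s in scores if grade_of(s) == grade]
    for grade in ['A', 'B', 'C', 'D', 'F']
}

# defaultdict: a single pass, appending each score to its group
grouped = defaultdict(list)
for s in scores:
    grouped[grade_of(s)].append(s)

print(scores_by_grade)
# {'A': [95, 91], 'B': [82, 88], 'C': [78], 'D': [67], 'F': [55]}
```

One behavioral difference: the comprehension creates a key for every listed grade even when its list is empty, while the `defaultdict` loop only creates keys for grades that actually occur.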
Generator Expressions
Replace the brackets with parentheses for lazy evaluation:
# List comprehension — creates entire list in memory
total = sum([x ** 2 for x in range(1_000_000)])
# Generator expression — processes one at a time
total = sum(x ** 2 for x in range(1_000_000))
For large datasets, a generator expression needs only constant extra memory, whereas the list comprehension first materializes all n results (O(n) memory). This is a common optimization question in interviews.
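`sys.getsizeof` gives a rough way to see this: it reports the size of the container object itself (not the integers it references), which is enough to show that the list grows with n while the generator object stays small. Exact byte counts vary by Python version and platform, so the sketch avoids quoting them.

```python
import sys

n = 1_000_000
squares_list = [x ** 2 for x in range(n)]   # all n results materialized now
squares_gen = (x ** 2 for x in range(n))    # nothing computed yet

print(sys.getsizeof(squares_list))  # grows with n
print(sys.getsizeof(squares_gen))   # small and independent of n

# Both give the same total, but a generator can only be consumed once
assert sum(squares_gen) == sum(squares_list)
print(sum(squares_gen))  # 0: the generator is already exhausted
```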
Data Science Applications
Cleaning and Transforming Data
# Parse numeric strings, skipping empty and "N/A" entries
raw = ["42", "3.14", "N/A", "100", ""]
values = [float(x) for x in raw if x and x != "N/A"]
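Filtering on `x and x != "N/A"` only skips the bad values you anticipated; any other non-numeric string (say `"12kg"`) would still raise `ValueError`. A more defensive sketch wraps the conversion in a small helper. `to_float` is a hypothetical name, returning `None` on failure is one design choice among several, and the `:=` operator assumes Python 3.8+.

```python
from typing import Optional

def to_float(s: str) -> Optional[float]:
    """Hypothetical helper: parse s as a float, or return None if it can't be parsed."""
    try:
        return float(s)
    except ValueError:
        return None

raw = ["42", "3.14", "N/A", "100", "", "12kg"]  # sample data

# Parse each item once; := binds the result so to_float isn't called twice
values = [v for x in raw if (v := to_float(x)) is not None]
print(values)  # [42.0, 3.14, 100.0]
```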
# Normalize column names
columns = ["First Name", "Last-Name", "E Mail"]
clean_cols = [col.lower().replace(" ", "_").replace("-", "_") for col in columns]
# ['first_name', 'last_name', 'e_mail']
Feature Engineering
# Create binary features
features = [1 if val > threshold else 0 for val in column]
# Extract patterns from strings
import re
emails = ["alice@gmail.com", "bob@yahoo.com", "carol@example.org"]  # sample addresses
domains = [re.search(r'@(.+)', e).group(1) for e in emails]
# ['gmail.com', 'yahoo.com', 'example.org']
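`re.search` returns `None` when there is no match, so the comprehension above raises `AttributeError` on a malformed address. One way to guard against that (assuming Python 3.8+ for the `:=` operator) is to bind the match in the filter and keep only successful matches; the sample addresses are made up:

```python
import re

emails = ["alice@gmail.com", "not-an-email", "carol@example.org"]

# Bind each match once, then keep only the addresses that matched
domains = [m.group(1) for e in emails if (m := re.search(r'@(.+)', e))]
print(domains)  # ['gmail.com', 'example.org']
```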
Working with DataFrames
import pandas as pd
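# Select numeric columns (this dtype check misses e.g. int32/float32;
# df.select_dtypes(include='number').columns covers all numeric dtypes)
numeric_cols = [col for col in df.columns if df[col].dtype in ['int64', 'float64']]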
# Create multiple aggregation expressions
aggs = {col: ['mean', 'std'] for col in numeric_cols}
summary = df.groupby('category').agg(aggs)
Common Interview Questions
Question 1: Flatten and Filter
Given a list of lists of integers, create a flat list containing only positive even numbers.
nested = [[1, -2, 3, 4], [-5, 6, 7], [8, -9, 10]]
result = [x for sublist in nested for x in sublist if x > 0 and x % 2 == 0]
# [4, 6, 8, 10]
Question 2: Word Frequency
Count word frequencies in a sentence (case-insensitive).
from collections import Counter
sentence = "The cat sat on the mat and the cat"
word_counts = Counter(word.lower() for word in sentence.split())
# Counter({'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1, 'and': 1})
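If an interviewer asks for the same result without `Counter`, a dict comprehension over the unique words works, but calling `list.count` once per unique word costs O(n·k) for n words and k distinct words. A single loop that increments counts in a plain dict is linear:

```python
from collections import Counter

sentence = "The cat sat on the mat and the cat"
words = [w.lower() for w in sentence.split()]

# Dict comprehension: simple, but list.count rescans the list for each unique word
counts_comp = {w: words.count(w) for w in set(words)}

# Single pass with dict.get: linear time
counts_loop = {}
for w in words:
    counts_loop[w] = counts_loop.get(w, 0) + 1

print(counts_comp == counts_loop == Counter(words))  # True
```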
Question 3: Matrix Transpose
Transpose a matrix represented as a list of lists.
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
# [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
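A common follow-up is the built-in alternative: `zip(*matrix)` unpacks the rows as separate arguments, and `zip` groups their i-th elements together. It yields tuples, so a comprehension converts them back to lists:

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# zip(*matrix) is the same call as zip([1, 2, 3], [4, 5, 6], [7, 8, 9])
transposed = [list(col) for col in zip(*matrix)]
print(transposed)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

`zip` stops at the shortest row, so ragged input is silently truncated; the indexing version instead raises `IndexError` when any row is shorter than the first.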
When NOT to Use Comprehensions
Comprehensions aren't always the answer:
- Side effects: if you're calling functions only for their side effects (like print()), use a regular loop.
- Complex logic: if the comprehension is hard to read, a loop is better.
- Multiple statements: comprehensions are single expressions; use loops for multi-step logic.
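Why side effects belong in a loop: a comprehension always builds a list, so using one just to call a function such as `print()` produces a throwaway list of return values (here, all `None`):

```python
items = ["a", "b", "c"]

# Anti-pattern: prints each item, then keeps a useless [None, None, None]
result = [print(item) for item in items]
print(result)  # [None, None, None]

# Preferred: a plain loop says exactly what it does and builds nothing
for item in items:
    print(item)
```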
Interview tip: If you write a comprehension that's more than one line long, refactor it into a loop or break it into steps. Readability matters.
Practice List Comprehension Problems
Test your skills with our Python list comprehension problems — real interview questions with step-by-step solutions.