Python Dictionary Methods for Data Science
Why Dictionaries Matter in Data Science
Python dictionaries are one of the most versatile data structures you will use in data science. They provide O(1) average-case lookups, making them ideal for counting, grouping, caching, and mapping values. Data science interviews frequently test dictionary skills because they reveal how well you understand Python's core data structures.
Essential Dictionary Methods
Creating Dictionaries
# Literal syntax
user = {'name': 'Alice', 'role': 'Data Scientist', 'level': 'Senior'}
# From a list of tuples
pairs = [('a', 1), ('b', 2), ('c', 3)]
d = dict(pairs)
# From keys with a default value
keys = ['python', 'sql', 'statistics']
skill_scores = dict.fromkeys(keys, 0)
# {'python': 0, 'sql': 0, 'statistics': 0}
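Be careful when passing a mutable default to dict.fromkeys(). Every key receives a reference to the same object, so mutating one value appears to mutate all of them. The sketch below (key names reused from above, scores invented) shows the problem and the usual fix: a comprehension that builds a fresh list for each key. It also shows dict(zip(...)) for pairing two parallel lists.

```python
keys = ['python', 'sql', 'statistics']

# Pitfall: fromkeys stores the SAME list object under every key
shared = dict.fromkeys(keys, [])
shared['python'].append('pandas')
# shared is now {'python': ['pandas'], 'sql': ['pandas'], 'statistics': ['pandas']}

# Fix: the comprehension evaluates [] once per key, giving independent lists
separate = {k: [] for k in keys}
separate['python'].append('pandas')
# separate is now {'python': ['pandas'], 'sql': [], 'statistics': []}

# Pairing two parallel lists into a dict (sample scores are made up)
scores = [7, 5, 6]
skill_scores = dict(zip(keys, scores))
# {'python': 7, 'sql': 5, 'statistics': 6}
```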
get() — Safe Value Access
When a key is missing, the get() method returns a default value (None unless you pass one) instead of raising a KeyError:
config = {'batch_size': 32, 'learning_rate': 0.001}
# Without get() — raises KeyError if missing
# epochs = config['epochs'] # KeyError!
# With get() — returns default value
epochs = config.get('epochs', 10) # Returns 10
batch_size = config.get('batch_size', 64) # Returns 32 (key exists)
This is essential in data pipelines where missing keys should not crash your program.
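If you pass 0 as the default, get() becomes a compact counting idiom that needs no imports. A minimal sketch; the events list is invented sample data:

```python
events = ['click', 'view', 'click', 'purchase', 'view', 'click']

counts = {}
for event in events:
    # get() returns 0 the first time an event is seen, so there is no KeyError
    counts[event] = counts.get(event, 0) + 1

print(counts)  # {'click': 3, 'view': 2, 'purchase': 1}
```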
items(), keys(), values()
metrics = {'accuracy': 0.92, 'precision': 0.88, 'recall': 0.95}
# Iterate over key-value pairs
for metric_name, score in metrics.items():
    print(f"{metric_name}: {score:.2f}")
# Get all keys or values
print(list(metrics.keys())) # ['accuracy', 'precision', 'recall']
print(list(metrics.values())) # [0.92, 0.88, 0.95]
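When comparing model metrics, you often need the key with the largest value or the pairs ranked by value. A sketch using the metrics dict above, restated so it runs on its own:

```python
metrics = {'accuracy': 0.92, 'precision': 0.88, 'recall': 0.95}

# max() iterates over keys; metrics.get supplies each key's value for comparison
best_metric = max(metrics, key=metrics.get)  # 'recall'

# Sort (key, value) pairs by value, highest first
ranked = sorted(metrics.items(), key=lambda kv: kv[1], reverse=True)
# [('recall', 0.95), ('accuracy', 0.92), ('precision', 0.88)]

# Rebuild a dict in that order (dicts keep insertion order in Python 3.7+)
ranked_dict = dict(ranked)
```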
update() — Merge Dictionaries
defaults = {'batch_size': 32, 'epochs': 10, 'lr': 0.001}
custom = {'epochs': 50, 'lr': 0.01}
# Merge: custom values override defaults
config = {**defaults, **custom}
# {'batch_size': 32, 'epochs': 50, 'lr': 0.01}
# Or use update(), which modifies defaults in place instead of returning a new dict
defaults.update(custom)
In Python 3.9+, the | operator also merges, returning a new dict in which right-hand values win:
config = defaults | custom
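All of these merge styles are shallow. If both dicts hold a nested dict under the same key, the right-hand value replaces the left one entirely and the inner keys are not merged. A sketch with hypothetical config names (the | and |= operators require Python 3.9+):

```python
defaults = {'optimizer': {'name': 'adam', 'lr': 0.001}, 'epochs': 10}
custom = {'optimizer': {'lr': 0.01}}

# Shallow merge: the whole nested dict is replaced, so 'name' is lost
merged = defaults | custom
# {'optimizer': {'lr': 0.01}, 'epochs': 10}

# Merging one level deeper by hand, specific to this config shape
deep = {**defaults, 'optimizer': {**defaults['optimizer'], **custom['optimizer']}}
# {'optimizer': {'name': 'adam', 'lr': 0.01}, 'epochs': 10}

# |= updates in place, like update()
defaults |= {'epochs': 20}
```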
setdefault() — Get or Set
# Group items by category
items = [
('fruit', 'apple'), ('veggie', 'carrot'),
('fruit', 'banana'), ('veggie', 'pea')
]
groups = {}
for category, item in items:
    groups.setdefault(category, []).append(item)
# {'fruit': ['apple', 'banana'], 'veggie': ['carrot', 'pea']}
setdefault returns the existing value if the key exists, or sets it to the default and returns that. This is a clean pattern for building grouped data structures.
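The sketch below exercises both branches of setdefault() using invented inventory data. The default argument is evaluated on every call, even when the key already exists, so setdefault(k, []) constructs a throwaway list each time.

```python
inventory = {'apples': 5}

# Key exists: returns the current value and leaves the dict unchanged
a = inventory.setdefault('apples', 0)  # 5

# Key missing: inserts the default, then returns it
p = inventory.setdefault('pears', 0)   # 0

print(inventory)  # {'apples': 5, 'pears': 0}
```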
Dictionary Comprehensions
Dictionary comprehensions are concise and Pythonic:
# Square numbers
squares = {x: x**2 for x in range(1, 6)}
# {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
# Filter a dictionary
scores = {'alice': 92, 'bob': 67, 'charlie': 85, 'diana': 45}
passing = {name: score for name, score in scores.items() if score >= 70}
# {'alice': 92, 'charlie': 85}
# Invert a dictionary (swap keys and values); assumes values are unique and hashable
original = {'a': 1, 'b': 2, 'c': 3}
inverted = {v: k for k, v in original.items()}
# {1: 'a', 2: 'b', 3: 'c'}
# Transform values: scale each score by the maximum, computed once up front
max_score = max(scores.values())
normalized = {k: v / max_score for k, v in scores.items()}
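When two keys share a value, the inversion comprehension silently keeps only the last key. Grouping the keys into lists (here with setdefault, from the previous section) keeps every key. A sketch with invented data:

```python
team_of = {'alice': 'red', 'bob': 'blue', 'carol': 'red'}

# Naive inversion: 'carol' overwrites 'alice' under 'red'
lossy = {team: name for name, team in team_of.items()}
# {'red': 'carol', 'blue': 'bob'}

# Lossless inversion: collect every key that maps to each value
members = {}
for name, team in team_of.items():
    members.setdefault(team, []).append(name)
# {'red': ['alice', 'carol'], 'blue': ['bob']}
```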
collections.defaultdict
defaultdict automatically creates default values for missing keys:
from collections import defaultdict
# Group words by first letter
words = ['apple', 'banana', 'avocado', 'blueberry', 'cherry', 'apricot']
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
# defaultdict(list, {'a': ['apple', 'avocado', 'apricot'],
# 'b': ['banana', 'blueberry'],
# 'c': ['cherry']})
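Before relying on defaultdict, note one behavior: simply reading a missing key with square brackets calls the factory and inserts that key. Membership tests with in and calls to get() do not insert anything. A short sketch with made-up keys:

```python
from collections import defaultdict

counts = defaultdict(int)
counts['seen'] += 1

# A bracket lookup on a missing key inserts it with the factory's value (0)
value = counts['never_seen']
print(dict(counts))  # {'seen': 1, 'never_seen': 0}

# 'in' and get() inspect without inserting
present = 'other' in counts     # False
fallback = counts.get('other')  # None, and 'other' is still absent
```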
Common default factories:
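# Counting (data is any iterable of hashable items)
data = ['a', 'b', 'a', 'c', 'a']
counts = defaultdict(int)  # int() returns 0 for new keys
for item in data:
    counts[item] += 1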
# Nested dictionaries
nested = defaultdict(lambda: defaultdict(int))
nested['2024']['Q1'] += 100
nested['2024']['Q2'] += 200
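Because defaultdict keeps inserting keys on lookup and prints as nested defaultdict(...) reprs, it is common to convert the finished aggregation to plain dicts. The sketch below does this for the two-level year/quarter structure above, rebuilt so it runs on its own; the '2025' row is extra sample data.

```python
from collections import defaultdict

# The factory must be a zero-argument callable, hence the lambda
nested = defaultdict(lambda: defaultdict(int))
nested['2024']['Q1'] += 100
nested['2024']['Q2'] += 200
nested['2025']['Q1'] += 150  # extra sample row

# Convert each level to a plain dict (handles exactly two levels)
plain = {year: dict(quarters) for year, quarters in nested.items()}
print(plain)  # {'2024': {'Q1': 100, 'Q2': 200}, '2025': {'Q1': 150}}
```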
collections.Counter
Counter is a specialized dictionary for counting:
from collections import Counter
# Count occurrences
words = ['python', 'sql', 'python', 'pandas', 'sql', 'python']
word_counts = Counter(words)
# Counter({'python': 3, 'sql': 2, 'pandas': 1})
# Most common elements
print(word_counts.most_common(2))
# [('python', 3), ('sql', 2)]
# Arithmetic operations
survey_1 = Counter({'python': 50, 'r': 30, 'sql': 45})
survey_2 = Counter({'python': 60, 'r': 25, 'sql': 55})
combined = survey_1 + survey_2
# Counter({'python': 110, 'sql': 100, 'r': 55})
Counter is invaluable for text analysis, frequency distributions, and any counting problem.
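Besides +, Counter has a few other operations that are easy to mix up. The sketch below uses invented counts to show four behaviors: - drops zero and negative results, subtract() keeps them, update() adds counts instead of replacing values, and missing items read as 0.

```python
from collections import Counter

before = Counter({'python': 5, 'r': 2, 'sql': 3})
after = Counter({'python': 7, 'r': 2, 'sql': 1})

# The - operator keeps only positive counts
gained = after - before
# Counter({'python': 2})

# subtract() works in place and keeps zero and negative counts
change = after.copy()
change.subtract(before)
# Counter({'python': 2, 'r': 0, 'sql': -2})

# update() adds counts (unlike dict.update, which replaces values)
tally = Counter(['a', 'b'])
tally.update(['a', 'c'])
# Counter({'a': 2, 'b': 1, 'c': 1})

# Missing items return 0 instead of raising KeyError, and are not inserted
missing = tally['z']  # 0
```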
Interview Patterns
Pattern 1: Two-Sum Problem
def two_sum(nums, target):
    # Find indices of two numbers that add up to target
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return []
This single-pass O(n) solution uses a dictionary as a lookup table, which avoids the O(n²) nested loop over every pair. It is a classic interview pattern.
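Here is a quick check on a few standard inputs, with two_sum restated so the snippet runs on its own. The function checks seen before inserting the current number, and that ordering is what prevents an element from pairing with itself:

```python
def two_sum(nums, target):
    seen = {}  # maps each value already visited to its index
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return []

print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
print(two_sum([3, 3], 6))          # [0, 1]: duplicate values still work
print(two_sum([3], 6))             # []: a single 3 cannot pair with itself
```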
Pattern 2: Frequency Analysis
from collections import Counter
def find_duplicates(lst):
    # Find all elements that appear more than once
    counts = Counter(lst)
    return [item for item, count in counts.items() if count > 1]
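The snippet below runs find_duplicates on a sample list (the function is restated so the snippet is self-contained) and adds one more frequency-analysis idiom: two strings are anagrams exactly when their character Counters are equal. The is_anagram name is illustrative, not a library function.

```python
from collections import Counter

def find_duplicates(lst):
    counts = Counter(lst)
    return [item for item, count in counts.items() if count > 1]

# Results follow first-occurrence order, since Counter preserves insertion order
print(find_duplicates([3, 1, 3, 2, 1, 4]))  # [3, 1]

def is_anagram(a, b):
    # Equal character frequencies means one string is a rearrangement of the other
    return Counter(a) == Counter(b)

print(is_anagram('listen', 'silent'))  # True
print(is_anagram('data', 'date'))      # False
```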
Pattern 3: Grouping Data
from collections import defaultdict
def group_by_key(records, key):
    # Group a list of dicts by a given key
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)
employees = [
{'name': 'Alice', 'dept': 'Engineering'},
{'name': 'Bob', 'dept': 'Marketing'},
{'name': 'Charlie', 'dept': 'Engineering'},
]
print(group_by_key(employees, 'dept'))
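group_by_key maps each department to its list of full records. If you only need group sizes, a Counter over a generator expression handles the grouping and the counting in one step. A sketch reusing the sample employees list:

```python
from collections import Counter

employees = [
    {'name': 'Alice', 'dept': 'Engineering'},
    {'name': 'Bob', 'dept': 'Marketing'},
    {'name': 'Charlie', 'dept': 'Engineering'},
]

# Count records per department without building the grouped lists
dept_sizes = Counter(record['dept'] for record in employees)
print(dept_sizes)  # Counter({'Engineering': 2, 'Marketing': 1})
```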
Pattern 4: Caching with Dictionaries
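def fibonacci(n, cache={}):
    # Compute Fibonacci with memoization.
    # The default dict is created once, when the function is defined, so it
    # persists across calls; that is deliberate here but usually a bug elsewhere.
    if n in cache:
        return cache[n]
    if n <= 1:
        return n
    cache[n] = fibonacci(n - 1) + fibonacci(n - 2)
    return cache[n]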
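The standard library's functools.lru_cache decorator gives the same memoization without a hand-managed cache dict. A sketch of that alternative (the name fib is just illustrative):

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache; functools.cache is equivalent in 3.9+
def fib(n):
    # The decorator stores each result keyed by the argument n
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025
```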
Dictionaries with Pandas
Dictionaries and Pandas work together frequently:
import pandas as pd
# Create DataFrame from dict
data = {'name': ['Alice', 'Bob'], 'salary': [120000, 95000], 'status': [1, 0]}
df = pd.DataFrame(data)
# Map values using a dictionary
status_map = {1: 'Active', 0: 'Inactive'}
df['status_label'] = df['status'].map(status_map)
# Convert DataFrame to dict
records = df.to_dict('records')
# [{'name': 'Alice', 'salary': 120000, 'status': 1, 'status_label': 'Active'},
#  {'name': 'Bob', 'salary': 95000, 'status': 0, 'status_label': 'Inactive'}]
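Two details often come up with map() and to_dict(). Values that are missing from the mapping dict become NaN, and a Series converts to a plain {index: value} dict. The sketch below uses invented rows (the 'Chen' row and status 2 are made up to show the unmapped case):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Chen'],
    'status': [1, 0, 2],  # 2 is deliberately absent from status_map
})
status_map = {1: 'Active', 0: 'Inactive'}

# map() yields NaN for unmapped values; fillna() supplies a fallback label
df['status_label'] = df['status'].map(status_map).fillna('Unknown')

# Build a lookup dict from two columns via the index
name_to_status = df.set_index('name')['status_label'].to_dict()
# {'Alice': 'Active', 'Bob': 'Inactive', 'Chen': 'Unknown'}

# 'list' orientation maps each column name to a list of its values
columns = df.to_dict('list')
```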
Key Takeaways
Master get(), items(), comprehensions, defaultdict, and Counter. These tools cover the vast majority of dictionary use cases in data science. In interviews, dictionaries are the go-to data structure for O(1) lookups, counting, grouping, and caching. Practice the common patterns until they are second nature.