
Python Dictionary Methods for Data Science

Why Dictionaries Matter in Data Science

Python dictionaries are one of the most versatile data structures you will use in data science. They provide O(1) average-case lookups, making them ideal for counting, grouping, caching, and mapping values. Data science interviews frequently test dictionary skills because they reveal how well you understand Python's core data structures.

Essential Dictionary Methods

Creating Dictionaries

# Literal syntax
user = {'name': 'Alice', 'role': 'Data Scientist', 'level': 'Senior'}

# From a list of tuples
pairs = [('a', 1), ('b', 2), ('c', 3)]
d = dict(pairs)

# From keys with a default value
keys = ['python', 'sql', 'statistics']
skill_scores = dict.fromkeys(keys, 0)
# {'python': 0, 'sql': 0, 'statistics': 0}
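Two related construction idioms are worth knowing. The sketch below (column names and values are made up for illustration) builds a dict from two parallel lists with zip(). It also shows a common dict.fromkeys() pitfall: a mutable default such as a list is the same object for every key, so a comprehension is the safer choice when each key needs its own container.

```python
# Build a dict from two parallel lists (illustrative data)
columns = ['age', 'income', 'score']
dtypes = ['int', 'float', 'float']
schema = dict(zip(columns, dtypes))
print(schema)  # {'age': 'int', 'income': 'float', 'score': 'float'}

# Pitfall: fromkeys() reuses the SAME list object for every key
shared = dict.fromkeys(['train', 'test'], [])
shared['train'].append(1)
print(shared)  # {'train': [1], 'test': [1]}

# Safer: a comprehension creates a fresh list per key
separate = {k: [] for k in ['train', 'test']}
separate['train'].append(1)
print(separate)  # {'train': [1], 'test': []}
```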

get() — Safe Value Access

The get() method returns a default value instead of raising a KeyError:

config = {'batch_size': 32, 'learning_rate': 0.001}

# Without get() — raises KeyError if missing
# epochs = config['epochs']  # KeyError!

# With get() — returns default value
epochs = config.get('epochs', 10)  # Returns 10
batch_size = config.get('batch_size', 64)  # Returns 32 (key exists)

This is essential in data pipelines where missing keys should not crash your program.
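As a minimal sketch of that idea, assume a batch of raw records where optional fields may be absent (the records and field names are hypothetical). Required fields use [] so a truly malformed record still fails loudly, while optional fields use get(). Without a second argument, get() returns None, which makes missing values explicit.

```python
# Hypothetical raw records; some optional fields are missing
records = [
    {'id': 1, 'age': 34, 'country': 'US'},
    {'id': 2, 'country': 'DE'},
    {'id': 3, 'age': 27},
]

cleaned = []
for rec in records:
    cleaned.append({
        'id': rec['id'],                          # required: raise if absent
        'age': rec.get('age'),                    # None when missing
        'country': rec.get('country', 'unknown')  # explicit fallback value
    })

print(cleaned[1])  # {'id': 2, 'age': None, 'country': 'DE'}
print(cleaned[2])  # {'id': 3, 'age': 27, 'country': 'unknown'}
```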

items(), keys(), values()

metrics = {'accuracy': 0.92, 'precision': 0.88, 'recall': 0.95}

# Iterate over key-value pairs
for metric_name, score in metrics.items():
    print(f"{metric_name}: {score:.2f}")

# Get all keys or values
print(list(metrics.keys()))    # ['accuracy', 'precision', 'recall']
print(list(metrics.values()))  # [0.92, 0.88, 0.95]
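Because items() yields (key, value) pairs, it combines naturally with sorted() and max(). A short sketch using the same metrics dictionary: rank metrics by score, then find the best one. Passing metrics.get as the key function makes max() compare values while returning the key. The keys() view also supports set operators directly.

```python
metrics = {'accuracy': 0.92, 'precision': 0.88, 'recall': 0.95}

# Rank metrics from highest to lowest score
ranked = sorted(metrics.items(), key=lambda pair: pair[1], reverse=True)
print(ranked)  # [('recall', 0.95), ('accuracy', 0.92), ('precision', 0.88)]

# Key with the largest value (an argmax over a dict)
best = max(metrics, key=metrics.get)
print(best)  # recall

# keys() returns a set-like view, so set operators work on it
reported = metrics.keys() & {'accuracy', 'f1'}
print(reported)  # {'accuracy'}
```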

update() — Merge Dictionaries

defaults = {'batch_size': 32, 'epochs': 10, 'lr': 0.001}
custom = {'epochs': 50, 'lr': 0.01}

# Merge: custom values override defaults
config = {**defaults, **custom}
# {'batch_size': 32, 'epochs': 50, 'lr': 0.01}

# Or use update(), which modifies defaults in place and returns None
defaults.update(custom)

In Python 3.9+, you can also use the | operator, which returns a new dictionary and leaves both operands unchanged:

config = defaults | custom
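A quick way to check these merge semantics, starting again from the original example values: with both unpacking and |, the right-hand dictionary wins on duplicate keys and the inputs are left untouched, whereas update() mutates its target and returns None. Swapping the operands flips which side wins.

```python
defaults = {'batch_size': 32, 'epochs': 10, 'lr': 0.001}
custom = {'epochs': 50, 'lr': 0.01}

# Non-mutating merges: the right-hand side wins on shared keys
merged_unpack = {**defaults, **custom}
merged_pipe = defaults | custom  # Python 3.9+
print(merged_pipe)  # {'batch_size': 32, 'epochs': 50, 'lr': 0.01}
print(defaults)     # {'batch_size': 32, 'epochs': 10, 'lr': 0.001}

# update() mutates in place and returns None (a common bug: x = d.update(...))
working = dict(defaults)
result = working.update(custom)
print(result)   # None
print(working)  # {'batch_size': 32, 'epochs': 50, 'lr': 0.01}

# Reversing the operands lets the defaults win instead
print(custom | defaults)  # {'epochs': 10, 'lr': 0.001, 'batch_size': 32}
```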

setdefault() — Get or Set

# Group items by category
items = [
    ('fruit', 'apple'), ('veggie', 'carrot'),
    ('fruit', 'banana'), ('veggie', 'pea')
]

groups = {}
for category, item in items:
    groups.setdefault(category, []).append(item)
# {'fruit': ['apple', 'banana'], 'veggie': ['carrot', 'pea']}

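setdefault() returns the existing value if the key exists; otherwise it inserts the default and returns it. This is a clean pattern for building grouped data structures. One cost to keep in mind: the default argument ([] here) is evaluated on every call, even when the key already exists.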
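The same get-or-set pattern builds an inverted index, a common text-analysis structure that maps each word to the set of documents containing it. A minimal sketch, assuming tiny hypothetical documents keyed by ID and naive whitespace tokenization:

```python
# Hypothetical documents keyed by ID
docs = {
    1: 'python for data science',
    2: 'sql for analytics',
    3: 'python and sql',
}

index = {}
for doc_id, text in docs.items():
    for word in text.split():  # naive tokenization, for illustration only
        # setdefault returns the existing set, or inserts and returns a new one
        index.setdefault(word, set()).add(doc_id)

print(sorted(index['python']))  # [1, 3]
print(sorted(index['sql']))     # [2, 3]

# setdefault never overwrites an existing value
print(index.setdefault('python', set()))  # {1, 3}
```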

Dictionary Comprehensions

Dictionary comprehensions are concise and Pythonic:

# Square numbers
squares = {x: x**2 for x in range(1, 6)}
# {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# Filter a dictionary
scores = {'alice': 92, 'bob': 67, 'charlie': 85, 'diana': 45}
passing = {name: score for name, score in scores.items() if score >= 70}
# {'alice': 92, 'charlie': 85}

# Invert a dictionary (swap keys and values); assumes values are unique and hashable
original = {'a': 1, 'b': 2, 'c': 3}
inverted = {v: k for k, v in original.items()}
# {1: 'a', 2: 'b', 3: 'c'}

# Transform values (compute the max once, not on every iteration)
max_score = max(scores.values())
normalized = {k: v / max_score for k, v in scores.items()}
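Inverting with a comprehension silently drops data when values repeat: later keys overwrite earlier ones. A minimal sketch of the problem and a safer alternative that groups keys by value (the grade mapping is illustrative):

```python
grades = {'alice': 'A', 'bob': 'B', 'charlie': 'A'}

# Naive inversion: 'charlie' overwrites 'alice' under key 'A'
naive = {v: k for k, v in grades.items()}
print(naive)  # {'A': 'charlie', 'B': 'bob'}

# Safer: collect every key that shares a value
by_grade = {}
for name, grade in grades.items():
    by_grade.setdefault(grade, []).append(name)
print(by_grade)  # {'A': ['alice', 'charlie'], 'B': ['bob']}
```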

collections.defaultdict

defaultdict takes a factory function (such as list or int) and calls it to create a value the first time a missing key is accessed:

from collections import defaultdict

# Group words by first letter
words = ['apple', 'banana', 'avocado', 'blueberry', 'cherry', 'apricot']

by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# defaultdict(<class 'list'>, {'a': ['apple', 'avocado', 'apricot'],
#                              'b': ['banana', 'blueberry'],
#                              'c': ['cherry']})

Common default factories:

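# Counting
data = ['sql', 'python', 'sql']  # example input
counts = defaultdict(int)
for item in data:
    counts[item] += 1
# defaultdict(<class 'int'>, {'sql': 2, 'python': 1})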

# Nested dictionaries
nested = defaultdict(lambda: defaultdict(int))
nested['2024']['Q1'] += 100
nested['2024']['Q2'] += 200
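One behavior worth knowing before using defaultdict in a pipeline: merely reading a missing key with [] inserts it, whereas the in operator and get() do not. The sketch below demonstrates this, then converts the nested structure from the example above into plain dicts. Converting is useful, for instance, before pickling (a lambda default factory cannot be pickled) or before handing the data to code that should not auto-create keys.

```python
from collections import defaultdict

counts = defaultdict(int)
print('missing' in counts)    # False: membership tests do not insert
print(counts.get('missing'))  # None: get() does not insert either
print(counts['missing'])      # 0: [] access calls int() and inserts the key
print(dict(counts))           # {'missing': 0}

# Nested defaultdicts: convert the inner level too
nested = defaultdict(lambda: defaultdict(int))
nested['2024']['Q1'] += 100
nested['2024']['Q2'] += 200
plain = {year: dict(quarters) for year, quarters in nested.items()}
print(plain)  # {'2024': {'Q1': 100, 'Q2': 200}}
```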

collections.Counter

Counter is a specialized dictionary for counting:

from collections import Counter

# Count occurrences
words = ['python', 'sql', 'python', 'pandas', 'sql', 'python']
word_counts = Counter(words)
# Counter({'python': 3, 'sql': 2, 'pandas': 1})

# Most common elements
print(word_counts.most_common(2))
# [('python', 3), ('sql', 2)]

# Arithmetic operations
survey_1 = Counter({'python': 50, 'r': 30, 'sql': 45})
survey_2 = Counter({'python': 60, 'r': 25, 'sql': 55})
combined = survey_1 + survey_2
# Counter({'python': 110, 'sql': 100, 'r': 55})

Counter is invaluable for text analysis, frequency distributions, and any counting problem.
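A few more Counter operations that come up in frequency work, sketched with made-up word lists: update() adds counts from another iterable (unlike dict.update(), which replaces values), the - operator keeps only positive results, and subtract() keeps zero and negative counts. Missing keys read as 0 rather than raising KeyError.

```python
from collections import Counter

week_1 = Counter(['python', 'sql', 'python'])
week_2 = ['python', 'r']

# update() adds counts in place
totals = Counter(week_1)
totals.update(week_2)
print(totals)  # Counter({'python': 3, 'sql': 1, 'r': 1})

# The - operator drops counts that fall to zero or below
print(week_1 - Counter(week_2))  # Counter({'python': 1, 'sql': 1})

# subtract() keeps zero and negative counts
diff = Counter(week_1)
diff.subtract(week_2)
print(diff)  # Counter({'python': 1, 'sql': 1, 'r': -1})

# Missing keys read as 0 and are not inserted
print(week_1['java'])  # 0
```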

Interview Patterns

Pattern 1: Two-Sum Problem

def two_sum(nums, target):
    # Find indices of two numbers that add up to target
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return []

This O(n) solution uses a dictionary as a lookup table — a classic interview pattern.
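To see what the dictionary buys you, here is a sketch comparing the one-pass version with the O(n²) nested-loop approach it replaces (the name two_sum_brute is illustrative). Note that the lookup happens before the current number is stored, which prevents pairing an element with itself. When several pairs qualify, the two functions may return different but equally valid pairs.

```python
def two_sum_brute(nums, target):
    # O(n^2): check every pair of indices
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]
    return []

def two_sum(nums, target):
    # O(n): one pass; the dict maps value -> index where it was last seen
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:  # checked BEFORE storing num
            return [seen[complement], i]
        seen[num] = i
    return []

print(two_sum([2, 7, 11, 15], 9))        # [0, 1]
print(two_sum_brute([2, 7, 11, 15], 9))  # [0, 1]
print(two_sum([3, 3], 6))                # [0, 1]
print(two_sum([3], 6))                   # []
```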

Pattern 2: Frequency Analysis

from collections import Counter

def find_duplicates(lst):
    # Find all elements that appear more than once
    counts = Counter(lst)
    return [item for item, count in counts.items() if count > 1]
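A closely related frequency question asks for the first element that appears exactly once. A minimal sketch (the function name first_unique is illustrative): count with Counter, then scan the original list in order, since the list, not the counts, defines what "first" means.

```python
from collections import Counter

def first_unique(lst):
    # Return the first element that occurs exactly once, or None
    counts = Counter(lst)  # O(n) counting pass
    for item in lst:       # second pass preserves the original order
        if counts[item] == 1:
            return item
    return None

print(first_unique(['a', 'b', 'a', 'c', 'b']))  # c
print(first_unique([1, 1, 2, 2]))               # None
```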

Pattern 3: Grouping Data

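from collections import defaultdict

def group_by_key(records, key):
    # Group a list of dicts by the value each record has for `key`
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)  # plain dict, so missing keys raise KeyError again

employees = [
    {'name': 'Alice', 'dept': 'Engineering'},
    {'name': 'Bob', 'dept': 'Marketing'},
    {'name': 'Charlie', 'dept': 'Engineering'},
]
print(group_by_key(employees, 'dept'))
# {'Engineering': [{'name': 'Alice', 'dept': 'Engineering'}, {'name': 'Charlie', 'dept': 'Engineering'}],
#  'Marketing': [{'name': 'Bob', 'dept': 'Marketing'}]}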
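Grouping is usually followed by aggregation. A sketch assuming each record also carries a numeric salary field (the figures are invented for illustration): accumulate sums and counts per department in one pass with defaultdict, then derive the mean, the plain-Python analogue of a group-by-then-mean.

```python
from collections import defaultdict

employees = [
    {'name': 'Alice', 'dept': 'Engineering', 'salary': 120000},
    {'name': 'Bob', 'dept': 'Marketing', 'salary': 95000},
    {'name': 'Charlie', 'dept': 'Engineering', 'salary': 110000},
]

totals = defaultdict(float)  # running salary sum per department
counts = defaultdict(int)    # number of employees per department
for emp in employees:
    totals[emp['dept']] += emp['salary']
    counts[emp['dept']] += 1

mean_salary = {dept: totals[dept] / counts[dept] for dept in totals}
print(mean_salary)  # {'Engineering': 115000.0, 'Marketing': 95000.0}
```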

Pattern 4: Caching with Dictionaries

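def fibonacci(n, cache=None):
    # Compute Fibonacci with memoization; the cache is passed explicitly
    # to avoid the shared mutable-default-argument pitfall
    if cache is None:
        cache = {}
    if n in cache:
        return cache[n]
    if n <= 1:
        return n
    cache[n] = fibonacci(n - 1, cache) + fibonacci(n - 2, cache)
    return cache[n]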
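The standard library can manage this cache for you. functools.lru_cache (and functools.cache, its unbounded shorthand added in Python 3.9) memoizes a function keyed on its arguments, so the idea is the same as the hand-written version. A sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache; functools.cache is equivalent on 3.9+
def fib(n):
    # Plain recursive definition; the decorator memoizes results by argument
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(50))           # 12586269025
print(fib.cache_info())  # hit/miss statistics provided by lru_cache
```

In an interview, the hand-written dictionary shows you understand memoization; in production code, the decorator is usually the cleaner choice.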

Dictionaries with Pandas

Dictionaries and Pandas work together frequently:

import pandas as pd

# Create DataFrame from dict (keys become column names)
data = {'name': ['Alice', 'Bob'], 'salary': [120000, 95000], 'status': [1, 0]}
df = pd.DataFrame(data)

# Map values using a dictionary
status_map = {1: 'Active', 0: 'Inactive'}
df['status_label'] = df['status'].map(status_map)

# Convert DataFrame to a list of row dicts
records = df.to_dict('records')
# [{'name': 'Alice', 'salary': 120000, 'status': 1, 'status_label': 'Active'},
#  {'name': 'Bob', 'salary': 95000, 'status': 0, 'status_label': 'Inactive'}]
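One caveat with Series.map(): values missing from the mapping become NaN rather than raising an error. A short sketch, assuming a hypothetical status code 2 that the mapping does not cover, fills those gaps explicitly:

```python
import pandas as pd

status_map = {1: 'Active', 0: 'Inactive'}
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Cara'], 'status': [1, 0, 2]})

# Code 2 is not in status_map, so map() yields NaN for that row
labels = df['status'].map(status_map)
print(labels.isna().tolist())  # [False, False, True]

# Make the fallback explicit instead of leaving NaN in the data
df['status_label'] = labels.fillna('Unknown')
print(df['status_label'].tolist())  # ['Active', 'Inactive', 'Unknown']
```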

Practice Problems

For more Python fundamentals practice, visit the Python basics practice page.

Key Takeaways

Master get(), items(), comprehensions, defaultdict, and Counter. These tools cover the vast majority of dictionary use cases in data science. In interviews, dictionaries are the go-to data structure for O(1) lookups, counting, grouping, and caching. Practice the common patterns until they are second nature.
