SQL GROUP BY: Complete Guide with Interview Examples
Why GROUP BY Is Fundamental
GROUP BY is one of the most tested SQL concepts in data science interviews. It's the foundation of data aggregation — turning millions of rows into meaningful summaries. If you can't write a confident GROUP BY query under pressure, you'll struggle in technical screens at companies like Google, Meta, and Amazon.
This guide covers everything from basics to advanced patterns, with real examples you'll see in interviews.
How GROUP BY Works
GROUP BY collapses rows that share the same values in specified columns into a single summary row. You then apply aggregate functions (COUNT, SUM, AVG, MIN, MAX) to calculate values for each group.
SELECT department, COUNT(*) as employee_count
FROM employees
GROUP BY department;
This returns one row per department with the total number of employees in each.
The Execution Order
Understanding SQL's logical execution order is critical for interviews:
- FROM — identify the table(s)
- WHERE — filter individual rows
- GROUP BY — form groups
- HAVING — filter groups
- SELECT — compute expressions
- ORDER BY — sort results
This is why you can't reference an alias from SELECT in your WHERE clause — WHERE executes before SELECT.
WHERE vs HAVING
This is one of the most common interview questions about GROUP BY.
- WHERE filters rows before grouping
- HAVING filters groups after aggregation
-- Find departments with more than 10 employees hired after 2023
SELECT department, COUNT(*) as recent_hires
FROM employees
WHERE hire_date >= '2023-01-01' -- filters rows first
GROUP BY department
HAVING COUNT(*) > 10; -- filters groups after
Interview tip: If an interviewer asks "What's the difference between WHERE and HAVING?", start with execution order, then give a concrete example. Don't just say "HAVING is for aggregates."
Grouping by Multiple Columns
Real interview queries often group by two or more columns:
-- Revenue by region and product category
SELECT region, category,
SUM(revenue) as total_revenue,
COUNT(DISTINCT customer_id) as unique_customers
FROM orders
GROUP BY region, category
ORDER BY total_revenue DESC;
Each unique combination of (region, category) becomes one row.
Common Aggregate Functions
| Function | Purpose | NULL Handling |
|---|---|---|
| COUNT(*) | Count all rows | Includes NULLs |
| COUNT(col) | Count non-NULL values | Excludes NULLs |
| COUNT(DISTINCT col) | Count unique non-NULL values | Excludes NULLs |
| SUM(col) | Total of values | Ignores NULLs |
| AVG(col) | Average of values | Ignores NULLs |
| MIN(col) / MAX(col) | Minimum / Maximum | Ignores NULLs |
COUNT(*) vs COUNT(col)
This trips up many candidates:
SELECT department,
COUNT(*) as total_rows, -- counts every row
COUNT(bonus) as with_bonus, -- counts non-NULL bonus
COUNT(DISTINCT title) as titles -- unique job titles
FROM employees
GROUP BY department;
Conditional Aggregation with CASE
This is an advanced pattern that appears in medium-to-hard interview questions:
-- Pivot: count employees by department AND employment type
SELECT department,
COUNT(CASE WHEN employment_type = 'full-time' THEN 1 END) as full_time,
COUNT(CASE WHEN employment_type = 'part-time' THEN 1 END) as part_time,
COUNT(CASE WHEN employment_type = 'contractor' THEN 1 END) as contractors
FROM employees
GROUP BY department;
This is essentially a manual pivot table — a very common interview pattern.
-- Calculate retention rate per cohort
SELECT signup_month,
COUNT(*) as total_users,
SUM(CASE WHEN last_active >= signup_month + INTERVAL '30 days' THEN 1 ELSE 0 END) as retained,
ROUND(100.0 * SUM(CASE WHEN last_active >= signup_month + INTERVAL '30 days' THEN 1 ELSE 0 END)
/ COUNT(*), 1) as retention_rate
FROM users
GROUP BY signup_month
ORDER BY signup_month;
Real Interview Examples
Example 1: Top Customers (Easy)
Find the top 5 customers by total spending.
SELECT customer_id,
SUM(amount) as total_spent,
COUNT(*) as order_count
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 5;
Example 2: Active vs Churned Users (Medium)
For each month, count users who made a purchase vs users who didn't return.
WITH monthly_activity AS (
SELECT user_id,
DATE_TRUNC('month', purchase_date) as month
FROM purchases
GROUP BY user_id, DATE_TRUNC('month', purchase_date)
)
SELECT month,
COUNT(DISTINCT user_id) as active_users
FROM monthly_activity
GROUP BY month
ORDER BY month;
Example 3: Year-over-Year Growth (Hard)
Calculate monthly revenue and year-over-year growth percentage.
WITH monthly AS (
SELECT DATE_TRUNC('month', order_date) as month,
SUM(amount) as revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
)
SELECT month,
revenue,
LAG(revenue, 12) OVER (ORDER BY month) as same_month_last_year,
ROUND(100.0 * (revenue - LAG(revenue, 12) OVER (ORDER BY month))
/ NULLIF(LAG(revenue, 12) OVER (ORDER BY month), 0), 1) as yoy_growth
FROM monthly
ORDER BY month;
Common Mistakes to Avoid
- Selecting non-aggregated columns not in GROUP BY — this is invalid in standard SQL (though MySQL allows it with unpredictable results)
- Using WHERE instead of HAVING for aggregate conditions — remember the execution order
- Forgetting COUNT(col) excludes NULLs — use COUNT(*) when you want all rows
- Not handling division by zero — always use NULLIF in denominators
Practice GROUP BY Problems
Ready to practice? Work through our GROUP BY interview problems with detailed solutions and explanations. Start with easy problems and work your way up to the complex multi-step queries that top companies love to ask.
Ready to test your skills?
Practice real Group By interview questions from top companies — with solutions.
Get interview tips in your inbox
Join data scientists preparing smarter. No spam, unsubscribe anytime.