SQL GROUP BY: Complete Guide with Interview Examples

February 28, 2026

Why GROUP BY Is Fundamental

GROUP BY is one of the most tested SQL concepts in data science interviews. It's the foundation of data aggregation — turning millions of rows into meaningful summaries. If you can't write a confident GROUP BY query under pressure, you'll struggle in technical screens at companies like Google, Meta, and Amazon.

This guide covers everything from basics to advanced patterns, with real examples you'll see in interviews.

How GROUP BY Works

GROUP BY collapses rows that share the same values in specified columns into a single summary row. You then apply aggregate functions (COUNT, SUM, AVG, MIN, MAX) to calculate values for each group.

SELECT department, COUNT(*) as employee_count
FROM employees
GROUP BY department;

This returns one row per department with the total number of employees in each.

The Execution Order

Understanding SQL's logical execution order is critical for interviews:

FROM — identify the table(s)
WHERE — filter individual rows
GROUP BY — form groups
HAVING — filter groups
SELECT — compute expressions
ORDER BY — sort results

This is why you can't reference an alias from SELECT in your WHERE clause — WHERE executes before SELECT.

WHERE vs HAVING

This is one of the most common interview questions about GROUP BY.

WHERE filters rows before grouping
HAVING filters groups after aggregation

-- Find departments with more than 10 employees hired after 2023
SELECT department, COUNT(*) as recent_hires
FROM employees
WHERE hire_date >= '2023-01-01'    -- filters rows first
GROUP BY department
HAVING COUNT(*) > 10;             -- filters groups after

Interview tip: If an interviewer asks "What's the difference between WHERE and HAVING?", start with execution order, then give a concrete example. Don't just say "HAVING is for aggregates."

Grouping by Multiple Columns

Real interview queries often group by two or more columns:

-- Revenue by region and product category
SELECT region, category,
       SUM(revenue) as total_revenue,
       COUNT(DISTINCT customer_id) as unique_customers
FROM orders
GROUP BY region, category
ORDER BY total_revenue DESC;

Each unique combination of (region, category) becomes one row.

Common Aggregate Functions

Function	Purpose	NULL Handling
COUNT(*)	Count all rows	Includes NULLs
COUNT(col)	Count non-NULL values	Excludes NULLs
COUNT(DISTINCT col)	Count unique non-NULL values	Excludes NULLs
SUM(col)	Total of values	Ignores NULLs
AVG(col)	Average of values	Ignores NULLs
MIN(col) / MAX(col)	Minimum / Maximum	Ignores NULLs

COUNT(*) vs COUNT(col)

This trips up many candidates:

SELECT department,
       COUNT(*) as total_rows,           -- counts every row
       COUNT(bonus) as with_bonus,       -- counts non-NULL bonus
       COUNT(DISTINCT title) as titles   -- unique job titles
FROM employees
GROUP BY department;

Conditional Aggregation with CASE

This is an advanced pattern that appears in medium-to-hard interview questions:

-- Pivot: count employees by department AND employment type
SELECT department,
       COUNT(CASE WHEN employment_type = 'full-time' THEN 1 END) as full_time,
       COUNT(CASE WHEN employment_type = 'part-time' THEN 1 END) as part_time,
       COUNT(CASE WHEN employment_type = 'contractor' THEN 1 END) as contractors
FROM employees
GROUP BY department;

This is essentially a manual pivot table — a very common interview pattern.

-- Calculate retention rate per cohort
SELECT signup_month,
       COUNT(*) as total_users,
       SUM(CASE WHEN last_active >= signup_month + INTERVAL '30 days' THEN 1 ELSE 0 END) as retained,
       ROUND(100.0 * SUM(CASE WHEN last_active >= signup_month + INTERVAL '30 days' THEN 1 ELSE 0 END)
             / COUNT(*), 1) as retention_rate
FROM users
GROUP BY signup_month
ORDER BY signup_month;

Real Interview Examples

Example 1: Top Customers (Easy)

Find the top 5 customers by total spending.

SELECT customer_id,
       SUM(amount) as total_spent,
       COUNT(*) as order_count
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 5;

Example 2: Active vs Churned Users (Medium)

For each month, count users who made a purchase vs users who didn't return.

WITH monthly_activity AS (
    SELECT user_id,
           DATE_TRUNC('month', purchase_date) as month
    FROM purchases
    GROUP BY user_id, DATE_TRUNC('month', purchase_date)
)
SELECT month,
       COUNT(DISTINCT user_id) as active_users
FROM monthly_activity
GROUP BY month
ORDER BY month;

Example 3: Year-over-Year Growth (Hard)

Calculate monthly revenue and year-over-year growth percentage.

WITH monthly AS (
    SELECT DATE_TRUNC('month', order_date) as month,
           SUM(amount) as revenue
    FROM orders
    GROUP BY DATE_TRUNC('month', order_date)
)
SELECT month,
       revenue,
       LAG(revenue, 12) OVER (ORDER BY month) as same_month_last_year,
       ROUND(100.0 * (revenue - LAG(revenue, 12) OVER (ORDER BY month))
             / NULLIF(LAG(revenue, 12) OVER (ORDER BY month), 0), 1) as yoy_growth
FROM monthly
ORDER BY month;

Common Mistakes to Avoid

Selecting non-aggregated columns not in GROUP BY — this is invalid in standard SQL (though MySQL allows it with unpredictable results)
Using WHERE instead of HAVING for aggregate conditions — remember the execution order
Forgetting COUNT(col) excludes NULLs — use COUNT(*) when you want all rows
Not handling division by zero — always use NULLIF in denominators

Practice GROUP BY Problems

Ready to practice? Work through our GROUP BY interview problems with detailed solutions and explanations. Start with easy problems and work your way up to the complex multi-step queries that top companies love to ask.

Practice Makes Perfect

Ready to test your skills?

Practice real Group By interview questions from top companies — with solutions.

Practice Group By Problems Browse all problems

Get interview tips in your inbox

Join data scientists preparing smarter. No spam, unsubscribe anytime.

← Back to Blog