SQL Subqueries Explained: Correlated vs Non-Correlated
Why Subqueries Are Essential
Subqueries — queries nested inside other queries — are fundamental to SQL and appear constantly in data science interviews. They let you break complex problems into smaller, manageable steps. Understanding subqueries deeply, especially the difference between correlated and non-correlated, is a must for passing technical screens.
Non-Correlated Subqueries
A non-correlated subquery runs once, independently of the outer query. The inner query does not reference any columns from the outer query.
In the WHERE Clause
-- Find employees who earn more than the company average
SELECT employee_id, name, salary
FROM employees
WHERE salary > (
SELECT AVG(salary) FROM employees
);
The subquery SELECT AVG(salary) FROM employees runs once, returns a single value, and the outer query uses that value to filter rows.
With IN for Multiple Values
-- Find customers who placed an order in the last 30 days
SELECT customer_id, name
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
);
The subquery returns a list of customer IDs, and the outer query filters to only those customers.
In the FROM Clause (Derived Tables)
-- Find the top department by average salary
SELECT department, avg_salary
FROM (
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
) AS dept_averages
ORDER BY avg_salary DESC
LIMIT 1;
The subquery creates a temporary result set (a derived table) that the outer query treats like a regular table.
In the SELECT Clause (Scalar Subqueries)
-- Show each employee's salary and the company average
SELECT
employee_id,
name,
salary,
(SELECT AVG(salary) FROM employees) AS company_avg,
salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
Scalar subqueries must return exactly one value (one row, one column).
Correlated Subqueries
A correlated subquery references columns from the outer query. It runs once for each row processed by the outer query. This makes it conceptually different from non-correlated subqueries.
Basic Example
-- Find employees who earn more than their department average
SELECT e.employee_id, e.name, e.salary, e.department
FROM employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM employees
WHERE department = e.department
);
Notice that the inner query references e.department from the outer query. For each employee, the database recalculates the average salary for that employee's department.
Correlated Subquery in SELECT
-- For each department, count the number of high earners
SELECT
d.department_name,
(
SELECT COUNT(*)
FROM employees e
WHERE e.department_id = d.department_id
AND e.salary > 100000
) AS high_earner_count
FROM departments d;
EXISTS vs IN
Both EXISTS and IN can filter rows based on subquery results, but they work differently and have different performance characteristics.
IN
-- Find customers who have placed at least one order
SELECT customer_id, name
FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders
);
IN collects all values from the subquery and checks membership. It works well when the subquery returns a small result set.
EXISTS
-- Same query using EXISTS
SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
EXISTS checks whether the subquery returns any rows at all. It can stop as soon as it finds the first match, which makes it more efficient for large subquery result sets.
NOT IN vs NOT EXISTS
This is an important distinction for interviews:
-- NOT IN has a gotcha with NULLs
SELECT customer_id, name
FROM customers
WHERE customer_id NOT IN (
SELECT customer_id FROM orders
-- If any customer_id in orders is NULL,
-- NOT IN returns no rows at all!
);
-- NOT EXISTS handles NULLs correctly
SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
If the subquery in NOT IN can return NULL values, the entire NOT IN condition evaluates to UNKNOWN, and no rows are returned. This is a classic interview trap. NOT EXISTS does not have this problem.
Rule of thumb: prefer NOT EXISTS over NOT IN. It is safer and often faster.
Performance Considerations
Non-Correlated Subqueries
- Run once, result is cached
- Generally efficient
- Modern optimizers often convert IN subqueries to joins
Correlated Subqueries
- Conceptually run once per outer row
- Can be slow on large tables
- Modern optimizers may rewrite them as joins internally
When to Use Joins Instead
Many subqueries can be rewritten as joins:
-- Subquery version
SELECT e.name, e.salary
FROM employees e
WHERE e.department_id IN (
SELECT department_id
FROM departments
WHERE location = 'New York'
);
-- Equivalent join version
SELECT e.name, e.salary
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE d.location = 'New York';
In most modern databases, the optimizer produces the same execution plan for both. However, joins are often more readable for simple cases, while subqueries are clearer for complex filtering logic.
Common Interview Patterns
Find Rows with No Match
-- Customers with no orders (anti-join pattern)
SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
);
Find the Nth Highest Value
-- Second highest salary
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (
SELECT MAX(salary) FROM employees
);
Row-Level Comparisons to Aggregates
-- Products priced above their category average
SELECT p.product_name, p.price, p.category
FROM products p
WHERE p.price > (
SELECT AVG(price)
FROM products
WHERE category = p.category
);
Practice Problems
For more subquery practice, visit the subqueries practice page.
Key Takeaways
Non-correlated subqueries run once and are independent of the outer query. Correlated subqueries reference the outer query and run once per row. Prefer NOT EXISTS over NOT IN to avoid NULL pitfalls. Modern databases optimize many subqueries into joins, so focus on writing clear, correct queries first and optimizing second. Understanding these patterns will help you tackle most SQL interview problems with confidence.
Ready to test your skills?
Practice real Subqueries interview questions from top companies — with solutions.
Get interview tips in your inbox
Join data scientists preparing smarter. No spam, unsubscribe anytime.