SQL JOINs Explained: INNER, LEFT, RIGHT, and FULL
JOINs Are the Most Tested SQL Concept
In data science interviews, JOINs appear in nearly every SQL question. You'll combine data from multiple tables to answer business questions — just like you would on the job. Understanding when to use each JOIN type is essential.
The Four Main JOIN Types
INNER JOIN
Returns only rows that have matching values in both tables.
SELECT o.order_id, c.name, o.amount
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;
If a customer has no orders, they won't appear. If an order has no matching customer (orphaned record), it won't appear either.
LEFT JOIN (LEFT OUTER JOIN)
Returns all rows from the left table, plus matching rows from the right table. Non-matching rows get NULL for right-table columns.
-- Find all customers and their orders (including customers with no orders)
SELECT c.name, o.order_id, o.amount
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id;
This is the most common JOIN in interview problems because it preserves all rows from the primary table.
RIGHT JOIN (RIGHT OUTER JOIN)
Returns all rows from the right table, plus matching rows from the left. It's the mirror of LEFT JOIN.
SELECT c.name, o.order_id, o.amount
FROM orders o
RIGHT JOIN customers c ON o.customer_id = c.id;
In practice, most people rewrite RIGHT JOINs as LEFT JOINs by swapping table order. It's more readable.
FULL OUTER JOIN
Returns all rows from both tables, with NULLs where there's no match.
-- Find all customers and all orders, even unmatched ones
SELECT c.name, o.order_id, o.amount
FROM customers c
FULL OUTER JOIN orders o ON c.id = o.customer_id;
Use FULL OUTER JOIN when you need to find records that exist in one table but not the other.
Anti-Join Pattern (Finding Missing Records)
One of the most common interview patterns is finding rows in one table that don't have a match in another:
-- Find customers who have never placed an order
SELECT c.name, c.email
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.order_id IS NULL;
This LEFT JOIN + WHERE IS NULL pattern is called an anti-join. You'll see it in almost every interview.
Alternative: NOT EXISTS
SELECT c.name, c.email
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
Both approaches work. The anti-join pattern is often more readable; NOT EXISTS can be more efficient with indexes.
Self-Join
A self-join joins a table to itself. Common use cases:
Employees and Managers
-- Find employees who earn more than their manager
SELECT e.name as employee, e.salary as emp_salary,
m.name as manager, m.salary as mgr_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;
Sequential Events
-- Find users who logged in on consecutive days
SELECT DISTINCT a.user_id
FROM logins a
JOIN logins b ON a.user_id = b.user_id
AND b.login_date = a.login_date + INTERVAL '1 day';
CROSS JOIN
Produces the Cartesian product — every combination of rows from both tables.
-- Generate a row for every user × every month (for reporting)
SELECT u.user_id, m.month
FROM users u
CROSS JOIN (
SELECT generate_series('2025-01-01', '2025-12-01', INTERVAL '1 month') as month
) m;
CROSS JOINs are useful for generating scaffolding that you then LEFT JOIN actual data onto.
Multiple JOINs
Real queries often join 3+ tables:
SELECT c.name, p.product_name, o.quantity, o.order_date
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
WHERE o.order_date >= '2025-01-01'
ORDER BY o.order_date DESC;
JOIN Order Matters for Readability
Start with the table that represents the "main" entity, then join supporting tables. This makes the query easier to read and explain in an interview.
Common Interview Questions
Question 1: Duplicate Handling
What happens when a JOIN produces duplicates?
If a customer has 3 orders, an INNER JOIN on customers and orders produces 3 rows for that customer. This is expected behavior — but you need to account for it when aggregating.
-- WRONG: this double-counts customer data
SELECT c.name, SUM(o.amount), c.signup_bonus
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.name, c.signup_bonus;
-- signup_bonus appears correct but customer-level stats could be misleading
-- BETTER: aggregate orders separately, then join
WITH order_totals AS (
SELECT customer_id, SUM(amount) as total_spent
FROM orders
GROUP BY customer_id
)
SELECT c.name, ot.total_spent, c.signup_bonus
FROM customers c
JOIN order_totals ot ON c.id = ot.customer_id;
Question 2: Find Users Active in Both Platforms
-- Users who used both the web app and mobile app
SELECT DISTINCT w.user_id
FROM web_sessions w
INNER JOIN mobile_sessions m ON w.user_id = m.user_id;
Question 3: Customers with Orders in Every Category
SELECT c.customer_id, c.name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN products p ON o.product_id = p.product_id
GROUP BY c.customer_id, c.name
HAVING COUNT(DISTINCT p.category) = (SELECT COUNT(DISTINCT category) FROM products);
Performance Considerations
Interviewers sometimes ask about JOIN performance:
- Index the JOIN columns — this is the single biggest optimization
- Filter early with WHERE — reduce row counts before joining
- **Avoid SELECT *** — only select columns you need
- Consider join order — most query optimizers handle this, but it matters in some databases
Practice JOIN Problems
Master JOINs with our SQL JOIN interview problems — from basic two-table joins to complex multi-table queries with real company questions.
Ready to test your skills?
Practice real Joins interview questions from top companies — with solutions.
Get interview tips in your inbox
Join data scientists preparing smarter. No spam, unsubscribe anytime.