Lesson 10 of 20

GROUP BY & HAVING

GROUP BY - Grouping Results

GROUP BY divides the result set into groups based on one or more columns. It is almost always used with aggregate functions to calculate summary statistics for each group.

When you use GROUP BY, each unique combination of the grouped columns becomes one row in the result. Any column in the SELECT that is not inside an aggregate function must appear in the GROUP BY clause.

  • GROUP BY creates one result row per unique group
  • Every non-aggregated column in SELECT must be in the GROUP BY clause
  • Groups are formed before aggregate functions are applied
  • You can group by multiple columns for more granular results
Example
-- Count users by their active status
SELECT is_active, COUNT(*) AS user_count
FROM users
GROUP BY is_active;

-- Average age grouped by active status
SELECT is_active, AVG(age) AS avg_age
FROM users
GROUP BY is_active;

-- Count users who joined each month
SELECT 
    YEAR(join_date) AS join_year,
    MONTH(join_date) AS join_month,
    COUNT(*) AS new_users
FROM users
GROUP BY YEAR(join_date), MONTH(join_date)
ORDER BY join_year, join_month;
Notes
  • If you see the error 'column must appear in the GROUP BY clause or be used in an aggregate function', it means you selected a column that is not grouped or aggregated.

HAVING - Filtering Groups

HAVING filters groups after they have been formed by GROUP BY, while WHERE filters individual rows before grouping. This distinction is critical: you cannot use aggregate functions in a WHERE clause, but you can in a HAVING clause.

Think of it this way: WHERE filters rows, HAVING filters groups.

Example
-- Find statuses with more than 3 users
SELECT is_active, COUNT(*) AS user_count
FROM users
GROUP BY is_active
HAVING COUNT(*) > 3;

-- Find age groups with an average age over 30
SELECT 
    CASE
        WHEN age < 25 THEN 'Under 25'
        WHEN age BETWEEN 25 AND 35 THEN '25-35'
        ELSE 'Over 35'
    END AS age_group,
    COUNT(*) AS user_count,
    AVG(age) AS avg_age
FROM users
GROUP BY age_group
HAVING AVG(age) > 30;

-- WHERE and HAVING together
SELECT is_active, COUNT(*) AS user_count
FROM users
WHERE age > 20           -- filter rows first
GROUP BY is_active
HAVING COUNT(*) >= 2;    -- then filter groups
Notes
  • SQL execution order: FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT. Understanding this order helps you write correct queries.