QUERYINGINTERMEDIATElesson-07

Aggregate Functions, GROUP BY & HAVING

🕑 32 min
📃 5 sections
🎓 Querying
Aggregate functions collapse rows into summary values. They are the engine behind every dashboard, report, and analytics query. COUNT tells you how many. SUM adds them up. GROUP BY splits the data into buckets before aggregating. HAVING then filters those buckets. These four concepts combined can answer almost any business question.

Core Aggregate Functions

SQL
-- COUNT: number of rows
SELECT COUNT(*) AS total_students FROM students;
SELECT COUNT(email) AS has_email FROM students;     -- skips NULLs

-- SUM: add up values
SELECT SUM(price) AS revenue FROM orders;

-- AVG: mean value (ignores NULLs)
SELECT AVG(age) AS average_age FROM students;

-- MIN and MAX
SELECT MIN(age) AS youngest, MAX(age) AS oldest FROM students;

-- Multiple aggregates in one query
SELECT
  COUNT(*) AS total,
  AVG(age)  AS avg_age,
  MIN(age)  AS min_age,
  MAX(age)  AS max_age
FROM students;
🔢
COUNT
Number of rows — COUNT(*) includes NULLs, COUNT(col) skips them
Rows
SUM
Adds all values in a column — ignores NULLs
Total
📈
AVG
Arithmetic mean — sum divided by non-NULL count
Mean
📋
MIN
Smallest value — works on text (A comes first) too
Smallest
📊
MAX
Largest value — works on text and dates too
Largest
🧸
COUNT DISTINCT
Counts unique non-NULL values only
Unique

GROUP BY

SQL
-- Count students per grade
SELECT grade, COUNT(*) AS num_students
FROM students
GROUP BY grade;

-- Average age per grade
SELECT grade, AVG(age) AS avg_age
FROM students
GROUP BY grade
ORDER BY avg_age DESC;

-- Multiple GROUP BY columns
SELECT grade, age, COUNT(*) AS count
FROM students
GROUP BY grade, age;
⚠️
GROUP BY Rule
Every column in SELECT must either be in GROUP BY or inside an aggregate function. Violating this causes errors (or silent wrong results in non-strict mode).

HAVING — Filtering Groups

SQL
-- HAVING filters groups (like WHERE but for aggregates)
SELECT grade, COUNT(*) AS num
FROM students
GROUP BY grade
HAVING COUNT(*) >= 3;    -- only grades with 3+ students

-- HAVING with AVG
SELECT grade, AVG(age) AS avg_age
FROM students
GROUP BY grade
HAVING AVG(age) > 16.5;
ℹ️
WHERE vs HAVING
WHERE filters individual rows BEFORE grouping. HAVING filters groups AFTER grouping. You cannot use aggregate functions in WHERE.

Full Aggregation Pipeline

SQL
SELECT
  grade,
  COUNT(*) AS total_students,
  AVG(age) AS avg_age,
  MAX(age) AS oldest
FROM students
WHERE email IS NOT NULL         -- 1. filter rows first
GROUP BY grade                   -- 2. group remaining rows
HAVING COUNT(*) >= 2            -- 3. filter groups
ORDER BY avg_age DESC           -- 4. sort groups
LIMIT 5;                         -- 5. take top 5
Aggregation Execution Pipeline
WHERE
filter rows
GROUP BY
bucket rows
Aggregate
COUNT/SUM/AVG
HAVING
filter groups
ORDER BY
LIMIT

NULL in Aggregates

SQL
-- COUNT(*) counts all rows including NULLs
-- COUNT(col) skips NULLs
SELECT
  COUNT(*) AS total_rows,
  COUNT(email) AS rows_with_email,
  COUNT(*) - COUNT(email) AS rows_missing_email
FROM students;

-- AVG ignores NULLs (doesn't divide by rows with NULL)
-- SUM ignores NULLs
-- Use COALESCE to handle NULLs in SUM
SELECT SUM(COALESCE(discount, 0)) FROM orders;
🎤 Knowledge Check
1. What does COUNT(*) count?
COUNT(*) counts every row regardless of NULL values. COUNT(column_name) skips rows where that column is NULL.
2. Difference between WHERE and HAVING?
WHERE runs before GROUP BY (filters rows). HAVING runs after GROUP BY (filters groups). Use HAVING when your condition involves an aggregate like COUNT or AVG.
3. Can you use AVG() in a WHERE clause?
Aggregate functions (AVG, COUNT, SUM, etc.) cannot be used in WHERE. Put aggregate conditions in HAVING after GROUP BY.
4. GROUP BY grade means?
GROUP BY collects all rows with the same grade value into a "bucket," then the aggregate functions run on each bucket separately.
5. Which finds grades with more than 5 students?
After GROUP BY groups rows by grade, HAVING COUNT(*) > 5 filters out any groups that don't have more than 5 students.