QUERYINGINTERMEDIATElesson-07
Aggregate Functions, GROUP BY & HAVING
Aggregate functions collapse rows into summary values. They are the engine behind every dashboard, report, and analytics query. COUNT tells you how many. SUM adds them up. GROUP BY splits the data into buckets before aggregating. HAVING then filters those buckets. These four concepts combined can answer almost any business question.
Section 1
Core Aggregate Functions
SQL
-- COUNT: number of rows SELECT COUNT(*) AS total_students FROM students; SELECT COUNT(email) AS has_email FROM students; -- skips NULLs -- SUM: add up values SELECT SUM(price) AS revenue FROM orders; -- AVG: mean value (ignores NULLs) SELECT AVG(age) AS average_age FROM students; -- MIN and MAX SELECT MIN(age) AS youngest, MAX(age) AS oldest FROM students; -- Multiple aggregates in one query SELECT COUNT(*) AS total, AVG(age) AS avg_age, MIN(age) AS min_age, MAX(age) AS max_age FROM students;
COUNT
Number of rows — COUNT(*) includes NULLs, COUNT(col) skips them
RowsSUM
Adds all values in a column — ignores NULLs
TotalAVG
Arithmetic mean — sum divided by non-NULL count
MeanMIN
Smallest value — works on text (A comes first) too
SmallestMAX
Largest value — works on text and dates too
LargestCOUNT DISTINCT
Counts unique non-NULL values only
UniqueSection 2
GROUP BY
SQL
-- Count students per grade SELECT grade, COUNT(*) AS num_students FROM students GROUP BY grade; -- Average age per grade SELECT grade, AVG(age) AS avg_age FROM students GROUP BY grade ORDER BY avg_age DESC; -- Multiple GROUP BY columns SELECT grade, age, COUNT(*) AS count FROM students GROUP BY grade, age;
⚠️
GROUP BY Rule
Every column in SELECT must either be in GROUP BY or inside an aggregate function. Violating this causes errors (or silent wrong results in non-strict mode).
Section 3
HAVING — Filtering Groups
SQL
-- HAVING filters groups (like WHERE but for aggregates) SELECT grade, COUNT(*) AS num FROM students GROUP BY grade HAVING COUNT(*) >= 3; -- only grades with 3+ students -- HAVING with AVG SELECT grade, AVG(age) AS avg_age FROM students GROUP BY grade HAVING AVG(age) > 16.5;
ℹ️
WHERE vs HAVING
WHERE filters individual rows BEFORE grouping. HAVING filters groups AFTER grouping. You cannot use aggregate functions in WHERE.
Section 4
Full Aggregation Pipeline
SQL
SELECT grade, COUNT(*) AS total_students, AVG(age) AS avg_age, MAX(age) AS oldest FROM students WHERE email IS NOT NULL -- 1. filter rows first GROUP BY grade -- 2. group remaining rows HAVING COUNT(*) >= 2 -- 3. filter groups ORDER BY avg_age DESC -- 4. sort groups LIMIT 5; -- 5. take top 5
Aggregation Execution Pipeline
WHERE
filter rows
filter rows
→
GROUP BY
bucket rows
bucket rows
→
Aggregate
COUNT/SUM/AVG
COUNT/SUM/AVG
→
HAVING
filter groups
filter groups
→
ORDER BY
→
LIMIT
Section 5
NULL in Aggregates
SQL
-- COUNT(*) counts all rows including NULLs -- COUNT(col) skips NULLs SELECT COUNT(*) AS total_rows, COUNT(email) AS rows_with_email, COUNT(*) - COUNT(email) AS rows_missing_email FROM students; -- AVG ignores NULLs (doesn't divide by rows with NULL) -- SUM ignores NULLs -- Use COALESCE to handle NULLs in SUM SELECT SUM(COALESCE(discount, 0)) FROM orders;
🎤 Knowledge Check
1. What does COUNT(*) count?
COUNT(*) counts every row regardless of NULL values. COUNT(column_name) skips rows where that column is NULL.
2. Difference between WHERE and HAVING?
WHERE runs before GROUP BY (filters rows). HAVING runs after GROUP BY (filters groups). Use HAVING when your condition involves an aggregate like COUNT or AVG.
3. Can you use AVG() in a WHERE clause?
Aggregate functions (AVG, COUNT, SUM, etc.) cannot be used in WHERE. Put aggregate conditions in HAVING after GROUP BY.
4. GROUP BY grade means?
GROUP BY collects all rows with the same grade value into a "bucket," then the aggregate functions run on each bucket separately.
5. Which finds grades with more than 5 students?
After GROUP BY groups rows by grade, HAVING COUNT(*) > 5 filters out any groups that don't have more than 5 students.