DESIGN INTERMEDIATE lesson-13

Normalization: 1NF, 2NF, 3NF

🕑 28 min 📚 Lesson 13 of 15 ✅ 5 quiz questions
Normalization is a systematic process to organize a database so it reduces redundancy and prevents anomalies. Each normal form builds on the previous, adding a stricter rule about how data dependencies should be structured.

Why Normalize?

Without normalization you get three types of anomalies:

  • Update anomaly — Changing a city name requires updating it in hundreds of rows
  • Insertion anomaly — Can't add a course until at least one student enrolls
  • Deletion anomaly — Deleting the last student in a course deletes the course itself

1NF — First Normal Form

Rule: Every column must contain atomic (single, indivisible) values. No repeating groups or sets of values in one cell.

SQL — 1NF
-- VIOLATES 1NF: courses column stores multiple values
-- student_id | name  | courses
-- 1          | Sara  | Math, Science, English

-- CORRECT 1NF: separate rows per course
-- student_id | name  | course
-- 1          | Sara  | Math
-- 1          | Sara  | Science
-- 1          | Sara  | English
-- Or better: use a separate enrollments junction table

2NF — Second Normal Form

Rule: Must be in 1NF and have no partial dependencies. Every non-key column must depend on the entire primary key — not just part of it. This only matters when the PK is composite.

SQL — 2NF
-- VIOLATES 2NF: student_name depends only on student_id, not course_id
-- (student_id, course_id) → student_name, course_name, grade
-- student_name depends on PART of the composite key

-- FIX: decompose into three tables
-- students(id, student_name)           ← depends on student_id
-- courses(id, course_name)             ← depends on course_id
-- enrollments(student_id, course_id, grade)  ← depends on both

3NF — Third Normal Form

Rule: Must be in 2NF and have no transitive dependencies. A non-key column must not depend on another non-key column.

SQL — 3NF
-- VIOLATES 3NF: zip_code determines city (transitive dependency)
-- order_id → customer_zip → customer_city
-- customer_city depends on zip_code, not order_id

-- FIX: extract zip codes into their own table
-- orders: order_id, customer_id, zip_code
-- zip_codes: zip_code, city, state
-- Now city is accessed via: orders JOIN zip_codes ON zip_code

When to Denormalize

Normalization is great for OLTP (Online Transaction Processing) systems — apps that INSERT, UPDATE, DELETE frequently. However, OLAP (Online Analytical Processing) systems — data warehouses, reporting dashboards — often intentionally denormalize for read performance.

Normalization Progression
Raw Data
1NF
Atomic values
2NF
Remove partial deps
3NF
Remove transitive deps
Normalized DB
ℹ️
Production Target: 3NF
Most production databases target 3NF. Denormalize only when you have a measured performance problem — premature denormalization creates maintenance headaches without guaranteed benefit.
🏻 Knowledge Check — Normalization
1. 1NF requires that…
Correct! 1NF requires atomic values in each cell — no comma-separated lists, arrays, or repeating groups.
2. 2NF builds on 1NF by…
Correct! 2NF removes partial dependencies — every non-key column must depend on the whole composite key, not just part of it.
3. 3NF removes…
Correct! 3NF says: non-key column A should not determine another non-key column B. If it does, extract B into its own table.
4. Denormalization is used for…
Correct! Denormalization trades write complexity for faster reads — common in OLAP/data warehouse systems where reads vastly outnumber writes.
5. A transitive dependency means…
Correct! Transitive: PK → non-key column A → non-key column B. Column B depends on A, not directly on the PK.