DESIGN
INTERMEDIATE
lesson-13
Normalization: 1NF, 2NF, 3NF
Normalization is a systematic process to organize a database so it reduces redundancy and prevents anomalies. Each normal form builds on the previous, adding a stricter rule about how data dependencies should be structured.
Section 1
Why Normalize?
Without normalization you get three types of anomalies:
- Update anomaly — Changing a city name requires updating it in hundreds of rows
- Insertion anomaly — Can't add a course until at least one student enrolls
- Deletion anomaly — Deleting the last student in a course deletes the course itself
Section 2
1NF — First Normal Form
Rule: Every column must contain atomic (single, indivisible) values. No repeating groups or sets of values in one cell.
SQL — 1NF
-- VIOLATES 1NF: courses column stores multiple values -- student_id | name | courses -- 1 | Sara | Math, Science, English -- CORRECT 1NF: separate rows per course -- student_id | name | course -- 1 | Sara | Math -- 1 | Sara | Science -- 1 | Sara | English -- Or better: use a separate enrollments junction table
Section 3
2NF — Second Normal Form
Rule: Must be in 1NF and have no partial dependencies. Every non-key column must depend on the entire primary key — not just part of it. This only matters when the PK is composite.
SQL — 2NF
-- VIOLATES 2NF: student_name depends only on student_id, not course_id -- (student_id, course_id) → student_name, course_name, grade -- student_name depends on PART of the composite key -- FIX: decompose into three tables -- students(id, student_name) ← depends on student_id -- courses(id, course_name) ← depends on course_id -- enrollments(student_id, course_id, grade) ← depends on both
Section 4
3NF — Third Normal Form
Rule: Must be in 2NF and have no transitive dependencies. A non-key column must not depend on another non-key column.
SQL — 3NF
-- VIOLATES 3NF: zip_code determines city (transitive dependency) -- order_id → customer_zip → customer_city -- customer_city depends on zip_code, not order_id -- FIX: extract zip codes into their own table -- orders: order_id, customer_id, zip_code -- zip_codes: zip_code, city, state -- Now city is accessed via: orders JOIN zip_codes ON zip_code
Section 5
When to Denormalize
Normalization is great for OLTP (Online Transaction Processing) systems — apps that INSERT, UPDATE, DELETE frequently. However, OLAP (Online Analytical Processing) systems — data warehouses, reporting dashboards — often intentionally denormalize for read performance.
Normalization Progression
Raw Data
→
1NF
Atomic values
Atomic values
→
2NF
Remove partial deps
Remove partial deps
→
3NF
Remove transitive deps
Remove transitive deps
→
Normalized DB
ℹ️
Production Target: 3NF
Most production databases target 3NF. Denormalize only when you have a measured performance problem — premature denormalization creates maintenance headaches without guaranteed benefit.
🏻 Knowledge Check — Normalization
1. 1NF requires that…
Correct! 1NF requires atomic values in each cell — no comma-separated lists, arrays, or repeating groups.
2. 2NF builds on 1NF by…
Correct! 2NF removes partial dependencies — every non-key column must depend on the whole composite key, not just part of it.
3. 3NF removes…
Correct! 3NF says: non-key column A should not determine another non-key column B. If it does, extract B into its own table.
4. Denormalization is used for…
Correct! Denormalization trades write complexity for faster reads — common in OLAP/data warehouse systems where reads vastly outnumber writes.
5. A transitive dependency means…
Correct! Transitive: PK → non-key column A → non-key column B. Column B depends on A, not directly on the PK.