Normalization in Database | 1NF, 2NF, 3NF

Sohaib Anser
6 min readJun 4, 2021

Before discussing the Normalization let's first look at two basic terms of Data Redundancy and Anomalies…

Data redundancy occurs when the same piece of data exists in more than one place in the database. For example, you own an online books store and one specific customer has multiple orders. For every order, he has to enter his address. So that one customer has entered the same address multiple times. That address is redundant data in the system.

Anomalies are problems that can occur in a poorly planned and un-normalized database. There are three types of anomalies in the database. Insertion anomaly, Deletion anomaly, and Update anomaly.

An insertion anomaly occurs when an inserting piece of data is dependent on the insertion of other data. For example, a user can not be registered into the transport system until he buys one train ticket.

Deletion anomaly happens when deleting a record, results in the deletion of other related records. For example, a user deletes a booked train ticket resulted in the deletion of that particular train information.

Update anomaly is by updating a single or multiple records leads to updating of many other records incorrectly.

Normalization is an approach to reduce the data redundancy and remove anomalies from the database.

The normalization process is categorized into…
• First-Normal-Form (1 NF )
• Second-Normal-Form (2 NF )
• Third-Normal-Form (3 NF )

Let's discuss one by one,

First-Normal-Form (1 NF ):
According to the rule, every cell of a table must contain atomic values. It means a cell can not hold multiple values.
A customer John shopped multiple grocery items and customer_table contains multiple shopped_items for his record.

customer_table

In 1NF this is not acceptable, rather it should contain a single value for each record. As you can see in the table.

customer_table

The other implications of 1NF are,

  • Each record in the table should be unique, which means no duplicate record should exist.
  • Each column should belong to the same domain. It means customer_id column should contain only customer IDs and can not hold the customer's name or anything else.
  • The order of rows and columns is not important.
  • The name of each column should be unique. No two columns can have the same names.

Second-Normal-Form (2 NF ):
Before discussing the 2NF, let's look at few terms first.

Super Key:
A super key is a set of attributes used to uniquely find a record in the table.
The following table contains multiple super keys like,
{Customer_id}
{Customer_name}
{Bank_account}
{Customer_id, Customer_name}
{Customer_id, Bank_account}
{Customer_name, Bank_account}
{Customer_id, Customer_name, Bank_account}

customer_table_2

Candidate Key:
A minimal super key with no redundant attributes is called a candidate key.
In the above super key table (customer_table_2), the following can be the candidate keys.
{Customer_id}
{Bank_account}
{Customer_id, Bank_account}

As two customer’s names can be the same, so Customer_name is a redundant attribute and can not be used in the candidate key.

Primary Key:
A primary key is a single attribute used to find the record uniquely.
It has the following attributes,

  • The primary key can not be null.
  • The primary key must be unique for each record.

In the following table, customer_id is a primary key because it is used to identify customers uniquely.

Composite Key:
In composite key, two or more attributes are combined to identify the record uniquely.

Two customer’s names are the same, so we can not identify the records by just using the name. A combination of the customer_name and customer_address attributes guarantees uniqueness and we can identify the record correctly. So, customer_name and customer_address is the composite key.

Prime attributes:
Attributes that are part of the candidate key are called prime attributes.
In the above table (customer_table_2 of Super Key), Customer_id and Bank_account are the prime attributes.

Non-Prime attributes:
Attributes that are not part of the candidate key are called non-prime attributes. In the above table (customer_table_2 of Super Key), Customer_name is the non-prime attribute.

Foreign Key:
An attribute of the table that references the primary key of another table.

Second-Normal-Form (2 NF ):
• It should be in a 1NF.
• It should not contain any partial dependency.

Partial-dependency is when non-prime attributes of the table dependent on part of the candidate key.

customer_table_partial_dependent

In the above table, customer_id, Bank_account_number are the prime attributes, and customer_name, bank_account_name are the non-prime attributes.

Customer_name can be identified from customer_id and bank_account_name can be identified from bank_account_number, so there is a partial dependency. To make this schema in 2NF, we need to remove its partial dependency.

To remove the partial dependency, bank details will be transferred into a separate table.

bank_details

customer_table will hold the bank_details foreign key.

customer_table

Now there is no partial dependency exist in the schema, so it is in 2NF.

Third-Normal-Form (3 NF ):
Transitive Functional Dependency:
Functional dependency is a constraint that defines the relationship between two columns of a table.

When an indirect relation results in functional dependency then it is a transitive functional dependency. X → Z holds transitive functional dependency if X → Y and Y → Z hold true.

book_details

{book} → {author_name} (if we know the book name then we know the author name).
{author_name} does not hold → {book}
{author_name} → {author_age}
So {book} → {author_age} holds.

Third-Normal-Form (3 NF ):
• It should be in a 2NF.
• It should not hold transitive functional dependency.

The above table of books_details is not in 3 NF because it holds transitive functional dependency. To convert it into 3 NF, there is a need to remove the transitive functional dependency.

books_details
author_details

Now there is a separate table for author_details and no transitive functional dependency exists. The schema is in 3 NF.

That's it to Normalization 😇 😉 😊 🙂

--

--

Sohaib Anser

Backend Engineer, Python, AWS, Committed to making a difference. Follow me on LinkedIn: https://www.linkedin.com/in/muhammad-sohaib-python