Normalization – 1NF, 2NF, 3NF, BCNF and 4NF


Normalization
  • Normalization is the process of organizing the data.
  • Normalization is used to minimize and eliminate the redundancy from a relation or set of relations. It is also used to eliminate the undesirable characteristics like Insertion, Updation and Deletion anomalies.
  • It divides the larger table into the smaller table and links them using relationship.
Anomalies in Database

Update Anomalies: When several instances of the same data are scattered across the database without a proper relationship, it could cause inconsistency where few of the instances will get updated with new values while some of them will remain with old values. This might lead to an inconsistent state of database.
Deletion Anomalies: Incomplete deletion of a particular data section which leaves some residual instances of that data. The database creator remains unaware of such unwanted data as it is present at a different location. The deletion of any particular data may lead to deleting other attributes referencing another table.
Insertion Anomalies: This occurs when an attempt to insert data into a non-existent record i.e. child is inserted without a parent.

Types of Normal Forms:

There are mainly five types of normal forms:
1. First Normal Form(1NF)
2. Second Normal Form(2NF)
3. Third Normal Form(3NF)
4. Boyce codd Normal Form(BCNF)
5. Fourth Normal Form(4NF)

1. First Normal Form(1NF)

The relation is said to be in first normal form if it does not contain any multivalued attributes.
It should have only atomic value.
Every column must have values of its particular domain.
Every colum must have unique names.
Let us consider Student table having following data with multivalued attribute.
Roll_No
Stud_Name
Stud_Contact
1
Pooja
7649356783, 8574334584
2
Palak
9483945763
3
Rajvi
9596049359


From the above table, we can identify that student Pooja is having multiple values in Stud_Contact attribute, which violates the rule of first normal form.

So, to bring the above given table in 1st Normal Form, we can have following resolution:

Roll_No
Stud_Name
Stud_Contact
1
Pooja
7649356783
1
Pooja
8574334584
2
Palak
9483945763
3
Rajvi
9596049359

2. Second Normal Form(2NF)

A relation is said to be in second normal form, if it is in first normal form and every non-prime attribute is fully functionally dependent on the candidate key of that table.

Non-prime attribute: An attribute that is not a part of candidate key is called non-prime attribute.

Second Normal form does not allow partial dependency.

Partial dependency: In a relation, if proper subset of candidate key that determines non-prime attribute, is called partial dependency.

If any partial dependency exist, we remove it by placing the attributes in new relation for converting the relation into second normal form.

Let us consider one relation say Student, having following functional dependencies:
FD1: {Roll_No, Course_Id} -> {Student_Name}
FD2: {Course_Id} -> {Course_Name}

 Roll_No
Course_Id
Stud_Name
Course_Name
1
CE001
Pooja
Computer
2
CE002
Rani
Civil
3
CE003
Palak
IT

Step-1: Identify the candidate key by finding closure of attributes for given functional dependencies:
{Roll_No, Course_Id}+            =          {Roll_No, Course_Id}
                                                =          {Roll_No, Course_Id, Stud_Name}

                                                =          {Roll_No, Course_Id, Stud_Name, Course_Name}
As we can see, that we can derive all attributes of a relation with closure of {Roll_No, Course_Id}, so we can say that  {Roll_No, Course_Id} is a candidate key.

Step-2: Identify prime and non-prime attributes in a given relation:
Prime attributes: Roll_No, Course_Id (attributes of candidate key)
Non-prime attributes: Stud_Name, Course_Name (Dependent attributes)

Step-3: Find if any partial dependency exist
In the above given example there is partial dependency as Course_Id is individually trying to determine Course_Name without involving Roll_No, so there exist the partial dependency.

Step-4: Decompose the relation in such a way that, partial dependency is removed.
So the above given relation can be decomposed in following relations:

Relation1: Student_Detail
Roll_No
Course_Id
Stud_Name
1
CE001
Pooja
2
CE002
Rani
3
CE003
Palak

Relation2:  Course
Course_Id
Course_Name
CE001
Computer
CE002
Civil
CE003
IT

So, the relation formed after decomposing are Student_Detail and Course, and now no partial dependency exists.

Third Normal Form(3NF)

The relation is said to be in third normal form if it is in second normal form and it should not have any transitive dependency.

Transitive dependency: When a non-prime attribute finds another non-prime attribute, it is called transitive dependency.

Let us consider one relation say Student, having following functional dependencies:
FD1: {Roll_No} -> {Course_Id, Student_Name}
FD2: {Course_Id} -> {Course_Name}

 Roll_No
Course_Id
Stud_Name
Course_Name
1
CE001
Pooja
Computer
2
CE002
Rani
Civil
3
CE003
Palak
IT

Step-1: Identify the candidate key by finding closure of attributes for given functional dependencies:
{Roll_No}+      =          {Roll_No}
                        =          {Roll_No, Course_Id, Stud_Name}
=          {Roll_No, Course_Id, Stud_Name, Course_Name}
As we can see, that we can derive all attributes of a relation with closure of {Roll_No}, so we can say that  {Roll_No} is a candidate key.

Step-2: Identify prime and non-prime attributes in a given relation:
Prime attributes: Roll_No (attributes of candidate key)
Non-prime attributes: Course_Id, Stud_Name, Course_Name (Dependent attributes)

Step-3: Find if any transitive dependency exist?
In the above given example there is transitive dependency as Course_Id which is non-prime attribute is trying to determine Course_Name, so there is existence of transitive dependency.

Step-4: Decompose the relation in such a way that, transitive dependency is removed.
So the above given relation can be decomposed in following relations:

Relation1: Student_Detail
Roll_No
Course_Id
Stud_Name
1
CE001
Pooja
2
CE002
Rani
3
CE003
Palak

Relation2:  Course
Course_Id
Course_Name
CE001
Computer
CE002
Civil
CE003
IT

So by decomposing a relation in two relations i.e. Student_detail and Course, we have made Course_Id as a prime attribute in relation Course.

Boyce Codd Normal Form(BCNF)

Boyce Codd Normal Form is the strict form of third normal form.

A relation is said to be in BCNF, if it is in third normal form and for any dependency A->B, A should be super key of that table.

Note: All superkeys are not candidate keys but all candidate keys are super keys.

Let us take one example on this:
Consider a relation Student having following details with given functional dependencies:

FD1: {Roll_No, Course_Name} -> {Faculty}

Roll_No
Course_Name
Faculty
1
DBMS
APC
1
DSA
VHK
2
CO
RPP
3
DBMS
AJS
4
DBMS
APC

Step-1: Identify the candidate key by finding closure of attributes for given functional dependencies:
{Roll_No, Course_Name}+            =          {Roll_No, Course_Name}
                                                      =          {Roll_No, Course_Name, Faculty}
As we can see, that we can derive all attributes of relation with the closure of {Roll_No, Course_Name}, so we can say that  {Roll_No, Course_Name} are candidate keys.

Step-2: So now, we can decompose the given relation into two relations to have super keys within each relation.

Relation1: Student_Course
Roll_No
Faculty
1
APC
1
VHK
2
RPP
3
AJS
4
APC

Relation2: Course_Faculty
Course_Name
Faculty
DBMS
APC
DSA
VHK
CO
RPP
DBMS
AJS

The relation is said to be in fourth normal form, if it is Boyce codd normal form and the table should not have any multivalued dependency.

Multivalued dependency: For a single value of attribute A, if there exists multiple values of B, then such table contains multi-valued dependency.

Rules to ensure multivalued dependency on any relation:
For a functional dependency A -> B, if a single value of A derives multiple values of B, then the table might contain multi-valued dependency.
In a relation R(A,B,C), if there exists a multi-valued dependency between, A and B, then B and C should remain independent of each other.
Note: A table should have at-least 3 columns for it to have a multi-valued dependency.
Let us consider one relation Subject_allocation with following functional dependency:
Subject
Faculty
Textbook
DBMS
AJS
MCGraw Hill
APC
Pearson
AMS

CO
RPP
Pearson
MCGraw Hill
In the above given example we can see that, for subject attribute we have multivalued dependency faculty, as well as we can see textbook and faculty are independent attributes. So we can split the relation into two relations as follows:

Relation1: Subject_Faculty

Subject
Faculty

DBMS
AJS

APC

AMS
CO
RPP
Relation2: Subject_Textbook
Subject
Textbook
DBMS
MCGraw Hill
DBMS
Pearson
CO
Pearson
CO
MCGraw Hill


Comments

Popular posts from this blog

Transaction State Diagram

Introduction to transaction