Normalization – 1NF, 2NF, 3NF, BCNF and 4NF
Normalization
- Normalization is the process of organizing the data.
- Normalization is used to
minimize and eliminate the redundancy from a relation or set of relations.
It is also used to eliminate the undesirable characteristics like
Insertion, Updation and Deletion anomalies.
- It divides the larger table
into the smaller table and links them using relationship.
Anomalies in Database
Update Anomalies: When several
instances of the same data are scattered across the database without a
proper relationship, it could cause inconsistency where few of the
instances will get updated with new values while some of them will remain
with old values. This might lead to an inconsistent state of database.
Deletion Anomalies: Incomplete
deletion of a particular data section which leaves some residual instances
of that data. The database creator remains unaware of such unwanted data
as it is present at a different location. The deletion of any particular
data may lead to deleting other attributes referencing another table.
Insertion Anomalies: This occurs
when an attempt to insert data into a non-existent record i.e. child is
inserted without a parent.
Types of Normal
Forms:
There are
mainly five types of normal forms:
1. First
Normal Form(1NF)
2. Second
Normal Form(2NF)
3. Third
Normal Form(3NF)
4. Boyce
codd Normal Form(BCNF)
5. Fourth
Normal Form(4NF)
1. First Normal Form(1NF)
The relation is said to be in first normal form if it does not contain any multivalued attributes.
It should have only atomic value.
Every column must have values of its particular
domain.
Every colum must have unique names.
Let us consider Student table having following data
with multivalued attribute.
Roll_No
|
Stud_Name
|
Stud_Contact
|
1
|
Pooja
|
7649356783,
8574334584
|
2
|
Palak
|
9483945763
|
3
|
Rajvi
|
9596049359
|
From the above table, we can identify that student
Pooja is having multiple values in Stud_Contact attribute, which violates the
rule of first normal form.
So, to bring the above given table in 1st Normal
Form, we can have following resolution:
Roll_No
|
Stud_Name
|
Stud_Contact
|
1
|
Pooja
|
7649356783
|
1
|
Pooja
|
8574334584
|
2
|
Palak
|
9483945763
|
3
|
Rajvi
|
9596049359
|
2.
Second Normal Form(2NF)
A
relation is said to be in second normal form, if it is in first normal form and
every non-prime attribute is fully functionally dependent on the candidate key
of that table.
Non-prime
attribute: An attribute that is not a part of candidate key is called non-prime
attribute.
Second
Normal form does not allow partial dependency.
Partial
dependency: In a relation, if proper subset of candidate key that determines
non-prime attribute, is called partial dependency.
If any
partial dependency exist, we remove it by placing the attributes in new
relation for converting the relation into second normal form.
Let us
consider one relation say Student, having following functional dependencies:
FD1:
{Roll_No, Course_Id} -> {Student_Name}
FD2:
{Course_Id} -> {Course_Name}
Roll_No
|
Course_Id
|
Stud_Name
|
Course_Name
|
1
|
CE001
|
Pooja
|
Computer
|
2
|
CE002
|
Rani
|
Civil
|
3
|
CE003
|
Palak
|
IT
|
Step-1: Identify the candidate key
by finding closure of attributes for given functional dependencies:
{Roll_No,
Course_Id}+ =
{Roll_No, Course_Id}
=
{Roll_No, Course_Id, Stud_Name}
= {Roll_No, Course_Id, Stud_Name, Course_Name}
As we can
see, that we can derive all attributes of a relation with closure
of {Roll_No, Course_Id}, so we can say that {Roll_No,
Course_Id} is a candidate key.
Step-2: Identify prime and non-prime attributes in a given relation:
Prime
attributes: Roll_No, Course_Id (attributes of candidate key)
Non-prime
attributes: Stud_Name, Course_Name
(Dependent attributes)
Step-3: Find if any partial dependency exist
In the above given example there is partial dependency as Course_Id is
individually trying to determine Course_Name without involving Roll_No, so
there exist the partial dependency.
Step-4: Decompose the relation in
such a way that, partial dependency is removed.
So the
above given relation can be decomposed in following relations:
Relation1: Student_Detail
Roll_No
|
Course_Id
|
Stud_Name
|
1
|
CE001
|
Pooja
|
2
|
CE002
|
Rani
|
3
|
CE003
|
Palak
|
Relation2: Course
Course_Id
|
Course_Name
|
CE001
|
Computer
|
CE002
|
Civil
|
CE003
|
IT
|
So, the
relation formed after decomposing are Student_Detail and Course, and now no
partial dependency exists.
Third
Normal Form(3NF)
The
relation is said to be in third normal form if it is in second normal form and
it should not have any transitive dependency.
Transitive
dependency: When a non-prime attribute finds another non-prime attribute, it is
called transitive dependency.
Let us
consider one relation say Student, having following functional dependencies:
FD1:
{Roll_No} -> {Course_Id, Student_Name}
FD2:
{Course_Id} -> {Course_Name}
Roll_No
|
Course_Id
|
Stud_Name
|
Course_Name
|
1
|
CE001
|
Pooja
|
Computer
|
2
|
CE002
|
Rani
|
Civil
|
3
|
CE003
|
Palak
|
IT
|
Step-1: Identify the candidate key
by finding closure of attributes for given functional dependencies:
{Roll_No}+ =
{Roll_No}
=
{Roll_No, Course_Id, Stud_Name}
= {Roll_No, Course_Id, Stud_Name, Course_Name}
As we can
see, that we can derive all attributes of a relation with closure
of {Roll_No}, so we can say that {Roll_No} is a candidate key.
Step-2: Identify prime and non-prime attributes in a given relation:
Prime
attributes: Roll_No (attributes of candidate key)
Non-prime
attributes: Course_Id, Stud_Name,
Course_Name (Dependent attributes)
Step-3: Find if any transitive dependency exist?
In the above given example there is transitive dependency as Course_Id
which is non-prime attribute is trying to determine Course_Name, so there is
existence of transitive dependency.
Step-4: Decompose the relation in such a way that, transitive dependency is removed.
So the
above given relation can be decomposed in following relations:
Relation1: Student_Detail
Roll_No
|
Course_Id
|
Stud_Name
|
1
|
CE001
|
Pooja
|
2
|
CE002
|
Rani
|
3
|
CE003
|
Palak
|
Relation2: Course
Course_Id
|
Course_Name
|
CE001
|
Computer
|
CE002
|
Civil
|
CE003
|
IT
|
So by
decomposing a relation in two relations i.e. Student_detail and Course, we have
made Course_Id as a prime attribute in relation Course.
Boyce
Codd Normal Form(BCNF)
Boyce
Codd Normal Form is the strict form of third normal form.
A
relation is said to be in BCNF, if it is in third normal form and for any
dependency A->B, A should be super key of that table.
Note: All
superkeys are not candidate keys but all candidate keys are super keys.
Let us take one example on this:
Consider a relation Student having following details
with given functional dependencies:
FD1: {Roll_No, Course_Name}
-> {Faculty}
Roll_No
|
Course_Name
|
Faculty
|
1
|
DBMS
|
APC
|
1
|
DSA
|
VHK
|
2
|
CO
|
RPP
|
3
|
DBMS
|
AJS
|
4
|
DBMS
|
APC
|
Step-1: Identify the candidate key
by finding closure of attributes for given functional dependencies:
{Roll_No,
Course_Name}+ =
{Roll_No, Course_Name}
= {Roll_No, Course_Name, Faculty}
As we can
see, that we can derive all attributes of relation with the closure
of {Roll_No, Course_Name}, so we can say that {Roll_No, Course_Name}
are candidate keys.
Step-2: So now, we can decompose the
given relation into two relations to have super keys within each relation.
Relation1: Student_Course
Roll_No
|
Faculty
|
1
|
APC
|
1
|
VHK
|
2
|
RPP
|
3
|
AJS
|
4
|
APC
|
Relation2: Course_Faculty
Course_Name
|
Faculty
|
DBMS
|
APC
|
DSA
|
VHK
|
CO
|
RPP
|
DBMS
|
AJS
|
The relation is said to be in fourth normal form,
if it is Boyce codd normal form and the table should not have any multivalued
dependency.
Multivalued dependency: For a single value of
attribute A, if there exists multiple values of B, then such table contains
multi-valued dependency.
Rules to ensure multivalued dependency on any
relation:
For a functional dependency A -> B, if a single
value of A derives multiple values of B, then the table might contain
multi-valued dependency.
In a relation R(A,B,C), if there exists a
multi-valued dependency between, A and B, then B and C should remain
independent of each other.
Note: A table should have at-least 3 columns for it
to have a multi-valued dependency.
Let us consider one relation Subject_allocation
with following functional dependency:
Subject
|
Faculty
|
Textbook
|
DBMS
|
AJS
|
MCGraw Hill
|
APC
|
Pearson
|
|
AMS
|
||
CO
|
RPP
|
Pearson
|
MCGraw Hill
|
In the above given example we can see that, for subject
attribute we have multivalued dependency faculty, as well as we can see
textbook and faculty are independent attributes. So we can split the relation
into two relations as follows:
Relation1: Subject_Faculty
Subject
|
Faculty
|
|
DBMS
|
AJS
|
|
APC
|
||
AMS
|
||
CO
|
RPP
|
|
Relation2: Subject_Textbook
Subject
|
Textbook
|
DBMS
|
MCGraw Hill
|
DBMS
|
Pearson
|
CO
|
Pearson
|
CO
|
MCGraw Hill
|
Comments
Post a Comment