Normalization – NF1, NF2, NF3. How to normalize your database?

最新推荐文章于 2024-03-20 09:04:29 发布

weixin_33756418

最新推荐文章于 2024-03-20 09:04:29 发布

阅读量394

点赞数

CC 4.0 BY-SA版权

原文链接：https://my.oschina.net/u/2275100/blog/867745

本文深入浅出地介绍了数据库规范化的三个阶段：第一范式、第二范式和第三范式，并通过实例展示了如何消除数据冗余，提高数据清晰度，从而提升数据库性能。

2019独角兽企业重金招聘Python工程师标准>>>

What is normalization?

Database normalization is the process of organizing the columns and tables of a relational database to minimize data redundancy.

And another definition:

Normalization splits a large table into smaller tables and define relationships between them to increases the clarity in organizing data.

In general I like to see tables that have one purpose. Having the table serve many purposes introduces many of the challenges like: data duplication, data update issues, and increased effort to query data.

Single purpose and relations instead of mixed up table

Why normalize?

If you have big database that is not normalized, people working with it most likely will face lots of problems. Performance will be bad, there are high chances of having multiple rows for one object, then, some day one row will change, and then you have a conflict.

Long story short – successfully normalized databases will increase performance and reduce storage.

So this is our table, let’s say we need some kind of database for an agency.

EmployeeId	Person	Phone
1	John Jones	27716602, 26153215,
2	Peter Janes	47716602, 36153215, 52321262
3	Michael Juines	17716602

We are storing our employee id, the name and their phone numbers.

Common misconception: Our database structure heavily depends on the required functionality. For example. If we always use a full name of the person, it is not necessary to split the “Person” column into first name and last name columns.

About phone numbers, if we know, that there is a limit, for example: work phone, personal mobile phone and fax these three columns wouldn’t count as duplicative columns and would still pass first normal form requirements.

To clarify, let’s assume that we need this functionality for the finished product:

We are going to send personalized letters. (e.g. Hello John, how are you doing?)
People can only add their mobile phone number.
There is no limit of the mobile phone numbers they can add.

1NF

Firstnormal formsets the very basic rules for an organized database:

Each column is unique
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Requirements for 1NF

Table must have at least one candidate key (that all attributes depend on)
Every column stores atomic values (example, if you use full name all the time it makes sense to store full name, if you will use first name and last name separately – you should store name and last name separately)
Not a repeating group (multiple phone numbers in one row) – row that stores multiple similar values instead of one

EmployeeId	Person	Phone
1	John Jones	27716602
1	John Jones	26153215
2	Peter Janes	47716602
2	Peter Janes	36153215
2	Peter Janes	52321262
3	Michael Juines	17716602

Now first normal form is satisfied, as the columns in this table hold just one value.

2 Normal Form

Second normal form (2NF) further addresses the concept of removing duplicative data:

Meet all the requirements of the first normal form.
Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
Create relationships between these new tables and their predecessors through the use of foreign keys

Requirements for 2NF

Must meet all 1NF requirements

Non-key attributes must not depend on subset of any candidate key.

As we can see in this example, we have a new table to store all the phones, and another table to determine who owns which phone number.

Employees

EmployeeId	Name	Last
1	John	Jones
2	Peter	Janes
3	Michael	Juines

Phones

PhoneId	Phone
1	27716602
2	26153215
3	47716602
4	36153215
5	52321262
6	17716602

This means EmployeeId and PhoneId are both Primary keys of the table? Not exactly, they are composite primary keys referring to entries in other two tables.

SQL code to create that table would look like this.

Transact-SQL

Create table EmployeesPhones ( CustomerID int references Employees(EmployeeId) not null, ProductID int references Phones(PhoneId) not null Primary key (CustomerID, ProductID) )

Create table EmployeesPhones

(

CustomerID int references Employees(EmployeeId) not null,

ProductID int references Phones(PhoneId) not null

Primary key (CustomerID, ProductID)

)

EmployeesPhones

EmployeeId	PhoneId
1	1
1	2
2	3
2	4
2	5
3	6

Third Normal Form (3NF)

Third normal form (3NF) goes one large step further:

Meet all the requirements of the second normal form.
Remove columns that are not dependent upon the primary key.

For third normal form, I adjusted the table a little bit. I added Zip code and city columns.

We can still save a little bit of space, ZIP code determines a city, meaning that many people can live in the same city and have the same ZIP code.

We find that in the above Employees table, EmployeeId is the key and only prime key attribute. We find that City can be identified by EmployeeId as well as Zip itself. Neither Zip is a superkey nor is City a prime attribute. In this case – transitive dependency exists.

Employees