Database design is the process of producing a detailed data model of a database. This logical data model contains all the logical and physical design choices and physical storage parameters needed to generate a design in a Data Definition Language (DDL), which can then be used to create a database. A fully attributed data model contains detailed attributes for each entity.
The term database design can be used to describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views. In an object database the entities and relationships map directly to object classes and named relationships. However, the term database design could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the database management system (DBMS).
Database normalization
Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.
Normalization is carried out to optimize the structure of tables, improve speed, eliminate duplicated data, use storage media more efficiently, reduce redundancy, avoid anomalies (insertion, deletion, and update anomalies), and improve data integrity.
A table is considered well designed if it satisfies three criteria:
· If the table is decomposed, the decomposition must be lossless (Lossless-Join Decomposition).
· Functional dependencies are preserved when the data changes (Dependency Preservation).
· It does not violate Boyce-Codd Normal Form (BCNF).
If the third criterion (BCNF) cannot be fulfilled, then the table should at least not violate Third Normal Form (3NF).
Functional dependency
A functional dependency (FD) is a constraint between two sets of attributes in a relation from a database.
Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, (written X → Y) if and only if each X value is associated with precisely one Y value. Customarily we call X the determinant set and Y the dependent attribute. Thus, given a tuple and the values of the attributes in X, one can determine the corresponding value of the Y attribute. For the purposes of simplicity, given that X and Y are sets of attributes in R, X → Y denotes that X functionally determines each of the members of Y - in this case Y is known as the dependent set. Thus, a candidate key is a minimal set of attributes that functionally determine all of the attributes in a relation.
(Note: the "function" being discussed in "functional dependency" is the function of identification.)
A functional dependency FD: X → Y is called trivial if Y is a subset of X.
The determination of functional dependencies is an important part of designing databases in the relational model, and in database normalization and denormalization. The functional dependencies, along with the attribute domains, are selected so as to generate constraints that would exclude as much data inappropriate to the user domain from the system as possible.
For example, suppose one is designing a system to track vehicles and the capacity of their engines. Each vehicle has a unique vehicle identification number (VIN). One would write VIN → EngineCapacity because it would be inappropriate for a vehicle's engine to have more than one capacity. (Assuming, in this case, that vehicles only have one engine.) However, EngineCapacity → VIN is incorrect because there could be many vehicles with the same engine capacity.
This functional dependency may suggest that the attribute EngineCapacity be placed in a relation with candidate key VIN. However, that may not always be appropriate. For example, if that functional dependency occurs as a result of the transitive functional dependencies
VIN → VehicleModel, VehicleModel → EngineCapacity,
then that would not result in a normalized relation.
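As a minimal sketch, one can test whether a functional dependency is contradicted by a given set of rows. The function name fd_holds and the sample vehicle rows are hypothetical and are not taken from the text. Note that sample data can only refute a dependency; whether it holds for all possible data is a design decision.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        y = tuple(row[a] for a in dependent)
        # setdefault stores y the first time x is seen, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample data (assumption, for illustration only).
vehicles = [
    {"VIN": "V001", "EngineCapacity": 1600},
    {"VIN": "V002", "EngineCapacity": 2000},
    {"VIN": "V003", "EngineCapacity": 1600},
]

print(fd_holds(vehicles, ["VIN"], ["EngineCapacity"]))  # True
print(fd_holds(vehicles, ["EngineCapacity"], ["VIN"]))  # False
```

The second call fails because two different vehicles share the capacity 1600, which matches the reasoning given for EngineCapacity → VIN.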
Trivial functional dependency
A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.
Full functional dependency
An attribute is fully functionally dependent on a set of attributes X if it is
· functionally dependent on X, and
· not functionally dependent on any proper subset of X.
{Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.
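The two conditions for a full functional dependency can be sketched as a check over sample rows. This is only an illustration: the helper names and the "Employees' Skills" rows below are hypothetical, and a check on data can refute a dependency but cannot prove it for every possible table state.

```python
from itertools import combinations


def fd_holds(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        y = tuple(row[a] for a in dependent)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full_dependency(rows, determinant, dependent):
    """True if the FD holds for `determinant` and for none of its proper subsets."""
    if not fd_holds(rows, determinant, dependent):
        return False
    # Sizes 0 .. len-1 enumerate every proper subset (including the empty set).
    for size in range(len(determinant)):
        for subset in combinations(determinant, size):
            if fd_holds(rows, subset, dependent):
                return False
    return True


# Hypothetical rows for the "Employees' Skills" table (assumption).
skills = [
    {"Employee ID": 1, "Skill": "Typing", "Employee Address": "12 Oak St"},
    {"Employee ID": 1, "Skill": "Filing", "Employee Address": "12 Oak St"},
    {"Employee ID": 2, "Skill": "Typing", "Employee Address": "9 Elm Rd"},
]

print(is_full_dependency(skills, ["Employee ID", "Skill"], ["Employee Address"]))  # False
print(is_full_dependency(skills, ["Employee ID"], ["Employee Address"]))           # True
```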
Transitive dependency
A transitive dependency is an indirect functional dependency, one in which X→Z only by virtue of X→Y and Y→Z.
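One common way to find such indirect dependencies is to compute the closure of a set of attributes, that is, everything it determines directly or transitively. The sketch below uses the VIN, VehicleModel, and EngineCapacity dependencies from the earlier example; the function name closure and the representation of dependencies as pairs of sets are assumptions made for illustration.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the given FDs.

    `fds` is a list of (left_side, right_side) pairs of attribute sets.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the whole left side, we also know the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    ({"VIN"}, {"VehicleModel"}),
    ({"VehicleModel"}, {"EngineCapacity"}),
]

# VIN determines EngineCapacity only by way of VehicleModel.
print(sorted(closure({"VIN"}, fds)))  # ['EngineCapacity', 'VIN', 'VehicleModel']
```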
Multivalued dependency
A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows.
Join dependency
A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.
Superkey
A superkey is an attribute or set of attributes that uniquely identifies rows within a table; in other words, two distinct rows are always guaranteed to have distinct superkeys. {Employee ID, Employee Address, Skill} would be a superkey for the "Employees' Skills" table; {Employee ID, Skill} would also be a superkey.
Candidate key
A candidate key is a minimal superkey, that is, a superkey for which we can say that no proper subset of it is also a superkey. {Employee Id, Skill} would be a candidate key for the "Employees' Skills" table.
Non-prime attribute
A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.
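The definitions of superkey, candidate key, and non-prime attribute can be tied together in a short sketch. It assumes that the only non-trivial dependency in the "Employees' Skills" table is {Employee ID} → {Employee Address}; the function names are hypothetical. The search tries every subset of attributes, so it is only practical for small tables.

```python
from itertools import combinations


def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the given FDs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole table."""
    attributes = set(attributes)
    keys = []
    # Smaller sets are tried first, so any superset of a found key is not minimal.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


attributes = {"Employee ID", "Employee Address", "Skill"}
fds = [({"Employee ID"}, {"Employee Address"})]  # assumed dependency

keys = candidate_keys(attributes, fds)
non_prime = attributes - set().union(*keys)

print(keys)       # the single candidate key: Employee ID together with Skill
print(non_prime)  # {'Employee Address'}
```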
Primary key
Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose.
First
First Normal Form (1NF) is now generally considered part of the formal definition of a relation. Historically, 1NF was intended to disallow multi-valued attributes. 1NF dictates that the domains (allowable values) of attributes must include only atomic (simple, indivisible) values and that any given value of an instance of an attribute must be a single value from the domain of that attribute. In short, a given cell of a column in a table can contain only one value.
The following table violates 1NF because the second row contains more than one value in the COLOR column:
To ensure a table is in 1NF, one simply needs to decompose grouped attributes into separate rows (or, in some cases, separate tables). The following representation of the above data is in 1NF (and in other normal forms as well):
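The decomposition into separate rows can be sketched as follows. The ITEM column and the sample rows are hypothetical, since only the COLOR column is named in the text, and the sketch assumes the multiple values are stored as a comma-separated string.

```python
def to_first_normal_form(rows, column, separator=","):
    """Split a multi-valued cell into one row per value."""
    flat = []
    for row in rows:
        for value in row[column].split(separator):
            # Copy the row, replacing the grouped cell with a single value.
            flat.append({**row, column: value.strip()})
    return flat


# Hypothetical data: the second row holds two values in COLOR.
items = [
    {"ITEM": "Pen", "COLOR": "Blue"},
    {"ITEM": "Shirt", "COLOR": "Red, Green"},
]

for row in to_first_normal_form(items, "COLOR"):
    print(row)
```

Each resulting row holds exactly one COLOR value, so every cell is atomic.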
Second
Second Normal Form (2NF) is based on the concept of "full" functional dependency. A functional dependency, X → Y, is a full functional dependency if removal of any attribute from X means that the dependency does not hold anymore. For example, given a table that tracks the hours (HOURS) a given employee (SSN) devotes to a given project (PROJNUM), we note that HOURS is functionally dependent on the combination of SSN and PROJNUM because a given employee can work on more than one project. Removal of either SSN or PROJNUM from the functional dependency results in an incorrect relationship. For example, if we were to remove SSN from the previous functional dependency, we are left with HOURS and PROJNUM, but we don’t know which SSN worked those HOURS! Thus, we say that HOURS is fully functionally dependent on the combination of SSN and PROJNUM.
A table is in 2NF if it is in 1NF and every non-prime attribute (one that is not part of the primary key of the table) is fully functionally dependent on the primary key of the table.
For example, the following table is in 1NF but is not in 2NF because PNAME and PLOCATION are dependent on only part of the primary key (PROJNUM and SSN), namely PROJNUM. Likewise, ENAME is dependent only on SSN. (Primary keys will be denoted in this document by underlining the included columns.)
Employee_Project_Table
To correct this schema, we need to create additional tables and decompose the partial dependencies into these new tables, as in figure 4.
(Please note that, for simplicity, these diagrams will not denote the resulting referential integrity, such as foreign keys, that would need to be added to these decomposed schemas.)
Employee_Table
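As a rough sketch of the 2NF decomposition, the partial dependencies can be moved into their own tables using Python's built-in sqlite3 module. The name Project_Table, the column types, and the sample rows are assumptions. Unlike the diagrams, this sketch does declare the foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Employee_Table (
    SSN   TEXT PRIMARY KEY,
    ENAME TEXT NOT NULL              -- depends on SSN only
);
CREATE TABLE Project_Table (         -- hypothetical table name
    PROJNUM   INTEGER PRIMARY KEY,
    PNAME     TEXT NOT NULL,         -- depends on PROJNUM only
    PLOCATION TEXT NOT NULL
);
CREATE TABLE Employee_Project_Table (
    SSN     TEXT    NOT NULL REFERENCES Employee_Table(SSN),
    PROJNUM INTEGER NOT NULL REFERENCES Project_Table(PROJNUM),
    HOURS   REAL    NOT NULL,        -- depends on the whole key
    PRIMARY KEY (SSN, PROJNUM)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO Employee_Table VALUES (?, ?)",
                 [("111-11-1111", "Ada"), ("222-22-2222", "Grace")])
conn.executemany("INSERT INTO Project_Table VALUES (?, ?, ?)",
                 [(1, "Payroll", "Austin"), (2, "Inventory", "Boston")])
conn.executemany("INSERT INTO Employee_Project_Table VALUES (?, ?, ?)",
                 [("111-11-1111", 1, 12.5), ("111-11-1111", 2, 7.0),
                  ("222-22-2222", 1, 20.0)])

# Joining the three tables recreates the original wide table.
rows = conn.execute("""
    SELECT ep.SSN, e.ENAME, ep.PROJNUM, p.PNAME, p.PLOCATION, ep.HOURS
    FROM Employee_Project_Table AS ep
    JOIN Employee_Table AS e ON e.SSN = ep.SSN
    JOIN Project_Table  AS p ON p.PROJNUM = ep.PROJNUM
    ORDER BY ep.SSN, ep.PROJNUM
""").fetchall()

for row in rows:
    print(row)
```

Each employee name and each project name and location is now stored once, however many assignments refer to it.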
Third
Third Normal Form (3NF) is based on the concept of "transitive dependency". A transitive dependency can be loosely defined as a dependency between non-key attributes, so that an attribute depends on the primary key only indirectly. For example, in the table below, we see that while all attributes have functional dependencies on the key, SSN, there also exist other, transitive, dependencies. Namely, DEPTNAME is dependent on DEPTNUM.
Employee_Department_Table
A table is in 3NF if it is in 2NF and no non-key attributes are dependent on other non-key attributes.
We can decompose the above table into 3NF by creating a second table for department. Thus, the following structure is in 3NF:
Employee_Table
Department_Table
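The 3NF decomposition can be sketched by projecting the original rows onto the two new tables. The helper name project and the sample rows are hypothetical; only the columns SSN, DEPTNUM, and DEPTNAME come from the text.

```python
def project(rows, columns):
    """Project rows onto `columns`, dropping duplicates and keeping order."""
    result = []
    for row in rows:
        projected = {c: row[c] for c in columns}
        if projected not in result:
            result.append(projected)
    return result


# Hypothetical contents of Employee_Department_Table.
employee_department = [
    {"SSN": "111-11-1111", "DEPTNUM": 10, "DEPTNAME": "Research"},
    {"SSN": "222-22-2222", "DEPTNUM": 10, "DEPTNAME": "Research"},
    {"SSN": "333-33-3333", "DEPTNUM": 20, "DEPTNAME": "Sales"},
]

employee_table = project(employee_department, ["SSN", "DEPTNUM"])
department_table = project(employee_department, ["DEPTNUM", "DEPTNAME"])

# Each department name is now stored once instead of once per employee.
print(len(department_table))  # 2

# Joining on DEPTNUM recreates the original rows, so nothing was lost.
names = {d["DEPTNUM"]: d["DEPTNAME"] for d in department_table}
rejoined = [{**e, "DEPTNAME": names[e["DEPTNUM"]]} for e in employee_table]
print(rejoined == employee_department)  # True
```

Renaming a department now means changing one row in the department table, which avoids the update anomaly of the original layout.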
There is a subtle difference between 2NF and 3NF. In 2NF, we were concerned about non-key fields being dependent on subsets of the key. In 3NF, we are concerned about non-key fields being dependent on other non-key fields. Another way to say this has been nicely summarized as: any non-key field must be "… Dependent on the key, the whole key, and nothing but the key."
Elementary Key
Elementary Key Normal Form (EKNF) is a subtle enhancement on 3NF (by definition, EKNF tables are also in 3NF). The cases it addresses most often occur when a table has more than one unique composite key (a key of more than one column) and those keys overlap (one or more columns are involved in both keys). Such cases can cause redundant information in the overlapping column(s). For example, let’s assume that a subject title (SUBJECTTITLE) is also a unique identifier for a given subject in the following table:
Enrollment_Table
The primary key of the above table is the combination of STUDENTNUM and SUBJECTCODE. However, we can also see a (non-primary) uniqueness constraint (alternate key) that should span the STUDENTNUM and SUBJECTTITLE columns as well. The above schema could result in update and deletion anomalies because values of both SUBJECTCODE and SUBJECTTITLE tend to be repeated for a given subject. The following schema is a decomposition of the above table in order to satisfy EKNF:
Subject_Table
Enrollment_Table
For reasons that will become obvious in the following section, ensuring a table is in EKNF is usually skipped, as most designers will move directly on to Boyce-Codd Normal Form after ensuring that a schema is in 3NF. Thus, EKNF is included here only for reasons of historical accuracy and completeness.
Boyce-Codd
Like EKNF, the only time a table is in 3NF but is not in Boyce-Codd Normal Form (BCNF) is when the table contains two or more candidate keys that overlap. Beyond that, there is only a subtle difference between EKNF and BCNF, which I will outline below. Consider the same example we used to illustrate EKNF, but we now add a column (GRADE) to denote a student’s grade received in the course. (Further, for illustrative simplicity, let’s assume that a student can only take a course once.)
Enrollment_Grade_Table
We see here that the GRADE column is dependent only on a given enrollment pair and that the keys are now elementary (which satisfies EKNF). However, SUBJECTTITLE is dependent on SUBJECTCODE. Since the key of the table is STUDENTNUM and SUBJECTCODE, we decompose this structure into the following two tables which satisfy BCNF:
One may note that this would also have happened to solve our EKNF problem in the previous section. For that very reason, most designers seldom worry about EKNF and move straight on to BCNF.
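A BCNF check can be sketched as: every non-trivial dependency must have a superkey on its left side. The dependencies listed below are assumptions read off the example (the grade depends on the enrollment pair, and subject code and subject title determine each other); the function names are hypothetical.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the given FDs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a superkey."""
    attributes = set(attributes)
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependencies never violate BCNF
        if closure(lhs, fds) != attributes:
            violations.append((set(lhs), set(rhs)))
    return violations


enrollment_grade = {"STUDENTNUM", "SUBJECTCODE", "SUBJECTTITLE", "GRADE"}
fds = [
    ({"STUDENTNUM", "SUBJECTCODE"}, {"GRADE"}),
    ({"SUBJECTCODE"}, {"SUBJECTTITLE"}),
    ({"SUBJECTTITLE"}, {"SUBJECTCODE"}),
]
print(bcnf_violations(enrollment_grade, fds))  # the two subject dependencies

# After decomposition, each table is checked against its own dependencies.
subject_fds = [({"SUBJECTCODE"}, {"SUBJECTTITLE"}),
               ({"SUBJECTTITLE"}, {"SUBJECTCODE"})]
enrollment_fds = [({"STUDENTNUM", "SUBJECTCODE"}, {"GRADE"})]
print(bcnf_violations({"SUBJECTCODE", "SUBJECTTITLE"}, subject_fds))             # []
print(bcnf_violations({"STUDENTNUM", "SUBJECTCODE", "GRADE"}, enrollment_fds))  # []
```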
Fourth
The final normal forms are concerned with multi-valued facts. We can also note that they are concerned with composite keys, as they tend to minimize the number of fields involved in a composite key. A table is in Fourth Normal Form if it is in BCNF and all functional dependencies are "single valued". Another way to state this is to say that a table cannot contain two or more independent "multivalued" facts. By "independent", we mean to say that there is no direct connection between the two (or more) multivalued facts. This vague definition is better handled by example. In the following table (in BCNF, since it is entirely composed of attributes involved in the key), we record people (NAME), instruments they play (INSTRUMENT), and music styles (MUSICSTYLE) they play.
We see that redundancy occurs because a given person (NAME) can play more than one INSTRUMENT and play more than one MUSICSTYLE (the fact that ‘Hallock’ plays the ‘Piano’ is repeated, as is the fact that he plays the ‘Blues’ and ‘Classical’). Further, this table seems to suggest a link between instruments and music styles. Can ‘Hallock’ play ‘Blues’ with a French Horn? (Yes, and if you know Hallock, you know he plays the blues with anything, including spoons!)
In other words, we see that there are two independent multi-valued facts in the above table. The first is that a person (NAME) can play more than one INSTRUMENT while the second is that a person (NAME) can play more than one MUSICSTYLE. These facts are independent because these two facts have no bearing on each other. Decomposing this table into two tables (below) solves the problem.
Figure 12: The table in Figure 11 has been decomposed into a 4NF schema.
One should note that 4NF only applies to tables with three or more attributes (it eliminates overlapping multi-valued dependencies, which, by definition, require three or more attributes) and only when all attributes compose the primary key of the table.
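The effect of the 4NF decomposition can be sketched with plain sets of tuples. The rows for 'Hallock' follow the values mentioned in the text; the second person, 'Rivera', is a hypothetical addition. Because the two facts are independent, joining the two smaller tables on NAME gives back exactly the original rows.

```python
# (NAME, INSTRUMENT, MUSICSTYLE) rows of the all-key table.
plays = {
    ("Hallock", "Piano", "Blues"),
    ("Hallock", "Piano", "Classical"),
    ("Hallock", "French Horn", "Blues"),
    ("Hallock", "French Horn", "Classical"),
    ("Rivera", "Guitar", "Jazz"),  # hypothetical row
}

# Decompose into two tables, one per independent multi-valued fact.
instruments = {(name, instrument) for name, instrument, _ in plays}
styles = {(name, style) for name, _, style in plays}

# Natural join of the two projections on NAME.
rejoined = {
    (name, instrument, style)
    for name, instrument in instruments
    for other_name, style in styles
    if name == other_name
}

print(len(instruments), len(styles))  # 3 3
print(rejoined == plays)              # True
```

Each fact, such as Hallock playing the Piano, is now recorded once rather than once per music style.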
Fifth Normal Form and Project Join
Cases where a table is in 4NF but is not in Fifth Normal Form (5NF) are extremely rare. Further, Project Join
has the following projections:
The training table in Figure 13 may or may not be in 5NF depending on the business rules. Say we have to enforce the rule:
An EMPLOYEE trains a CLASSTYPE for a COMPANY if and only if an EMPLOYEE trains a CLASSTYPE, the EMPLOYEE trains for a COMPANY, and the COMPANY the EMPLOYEE trains for makes a tool that implements the CLASSTYPE the EMPLOYEE trains.
If we enforce the above rule the table in Figure 13 is not in 5NF and must be reduced to three tables represented by the above projections of the original table.
To achieve 5NF, one checks all-key tables for decompositions whose joins result in the same information. A cautionary note, however, is that such decompositions can lead to a loss of constraint knowledge. For example, in the above case, we need to create database code to handle the specified rule between an EMPLOYEE, the CLASSTYPEs they train, and the COMPANY who makes the tool that implements the CLASSTYPE.
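Whether an all-key table can be rebuilt from its three pairwise projections can be checked with a short sketch. The employee, class type, and company values below are hypothetical, as is the function name. The first data set breaks the rule, so the join produces a row that was never recorded; the second data set obeys the rule and is recreated exactly.

```python
def join_of_projections(rows):
    """Join the three pairwise projections of (EMPLOYEE, CLASSTYPE, COMPANY) rows."""
    employee_class = {(e, c) for e, c, _ in rows}
    employee_company = {(e, co) for e, _, co in rows}
    company_class = {(co, c) for _, c, co in rows}
    return {
        (e, c, co)
        for e, c in employee_class
        for e2, co in employee_company
        if e == e2 and (co, c) in company_class
    }


# Hypothetical data that does NOT obey the rule.
training = {
    ("Smith", "Spreadsheet", "Acme"),
    ("Smith", "Database", "Globex"),
    ("Jones", "Spreadsheet", "Globex"),
}
extra = join_of_projections(training) - training
print(extra)  # {('Smith', 'Spreadsheet', 'Globex')}

# With the rule enforced, that row must be present, and the join is exact.
training_with_rule = training | {("Smith", "Spreadsheet", "Globex")}
print(join_of_projections(training_with_rule) == training_with_rule)  # True
```

When the join is exact for every permitted table state, the three smaller tables carry the same information as the original one.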
The root concept behind 4NF, 5NF, and PJNF is that the tables not in these normal forms can be derived from simpler, more fundamental relationships. Further, 5NF does not differ from 4NF unless there are other rules (symmetric constraints) that dictate correct data population. Lastly, 5NF differs from 4NF in that the fact combinations we are concerned with are no longer independent from each other (due to the semantic constraints).
References:
http://databases.about.com/od/specificproducts/a/normalization.htm
http://en.wikipedia.org/wiki/Functional_dependency
http://en.wikipedia.org/wiki/Database_design
http://www3.gpf.or.th/knowledge/adb/im/sample4.pdf