gunggus: April 2009

Sunday, 26 April 2009

Database normalization

Database design

Database design is the process of producing a detailed data model of a database. This logical data model contains all the needed logical and physical design choices and physical storage parameters needed to generate a design in a Data Definition Language, which can then be used to create a database. A fully attributed data model contains detailed attributes for each entity.

The term database design can be used to describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views. In an object database the entities and relationships map directly to object classes and named relationships. However, the term database design could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the database management system (DBMS).

Database normalization

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

Normalization is performed to optimize table structure, improve speed, eliminate duplicated data, use storage media more efficiently, reduce redundancy, avoid anomalies (insertion, deletion, and update anomalies), and improve data integrity.

A table is considered well designed if it meets three criteria:

· If the table is decomposed, the decomposition must be lossless (Lossless-Join Decomposition).

· Functional dependencies are preserved when the data changes (Dependency Preservation).

· It does not violate Boyce-Codd Normal Form (BCNF).

If the third criterion (BCNF) cannot be fulfilled, the table should at least not violate Third Normal Form (3NF).

Functional dependency

A functional dependency (FD) is a constraint between two sets of attributes in a relation from a database.

Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, (written X → Y) if and only if each X value is associated with precisely one Y value. Customarily we call X the determinant set and Y the dependent attribute. Thus, given a tuple and the values of the attributes in X, one can determine the corresponding value of the Y attribute. For the purposes of simplicity, given that X and Y are sets of attributes in R, X → Y denotes that X functionally determines each of the members of Y - in this case Y is known as the dependent set. Thus, a candidate key is a minimal set of attributes that functionally determine all of the attributes in a relation.

(Note: the "function" being discussed in "functional dependency" is the function of identification.)

A functional dependency FD: X → Y is called trivial if Y is a subset of X.

The determination of functional dependencies is an important part of designing databases in the relational model, and in database normalization and denormalization. The functional dependencies, along with the attribute domains, are selected so as to generate constraints that would exclude as much data inappropriate to the user domain from the system as possible.

For example, suppose one is designing a system to track vehicles and the capacity of their engines. Each vehicle has a unique vehicle identification number (VIN). One would write VIN → EngineCapacity because it would be inappropriate for a vehicle's engine to have more than one capacity. (This assumes that vehicles have only one engine.) However, EngineCapacity → VIN is incorrect because there could be many vehicles with the same engine capacity.

This functional dependency may suggest that the attribute EngineCapacity be placed in a relation with candidate key VIN. However, that may not always be appropriate. For example, if that functional dependency occurs as a result of the transitive functional dependencies

VIN → VehicleModel, VehicleModel → EngineCapacity,

then that would not result in a normalized relation.
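The VIN example can be tried out on sample data. The sketch below is only an illustration: the function name `holds` and the three vehicle rows are made up for this post. Keep in mind that a functional dependency is a rule about every possible state of the table, so sample data can disprove a dependency but never prove it.

```python
def holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs is not violated by rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the dependent value the first time a determinant value
        # appears; any later, different value violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows, invented for this illustration.
vehicles = [
    {"VIN": "V1", "EngineCapacity": 1600},
    {"VIN": "V2", "EngineCapacity": 1600},
    {"VIN": "V3", "EngineCapacity": 2000},
]

print(holds(vehicles, ["VIN"], ["EngineCapacity"]))  # True
print(holds(vehicles, ["EngineCapacity"], ["VIN"]))  # False
```

Two vehicles share the capacity 1600, which is enough to show that EngineCapacity does not determine VIN.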

Trivial functional dependency

A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.

Full functional dependency

An attribute is fully functionally dependent on a set of attributes X if it is

· functionally dependent on X, and

· not functionally dependent on any proper subset of X.

{Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.

Transitive dependency

A transitive dependency is an indirect functional dependency, one in which X→Z only by virtue of X→Y and Y→Z.

Multivalued dependency

A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows.

Join dependency

A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

Superkey

A superkey is an attribute or set of attributes that uniquely identifies rows within a table; in other words, two distinct rows are always guaranteed to have distinct superkeys. {Employee ID, Employee Address, Skill} would be a superkey for the "Employees' Skills" table; {Employee ID, Skill} would also be a superkey.

Candidate key

A candidate key is a minimal superkey, that is, a superkey for which we can say that no proper subset of it is also a superkey. {Employee ID, Skill} would be a candidate key for the "Employees' Skills" table.
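The definitions of superkey and candidate key can be tested mechanically with the attribute closure: a set of attributes is a superkey when its closure under the known functional dependencies covers every attribute of the table. The sketch below assumes that the only non-trivial dependency in the "Employees' Skills" table is {Employee ID} → {Employee Address}; the function names are chosen for this illustration, and the brute-force search is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    return closure(attrs, fds) == set(all_attrs)


def candidate_keys(all_attrs, fds):
    """Enumerate minimal superkeys, smallest sets first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            # A set that contains a smaller key is a superkey but not minimal.
            if any(key <= candidate for key in keys):
                continue
            if is_superkey(candidate, all_attrs, fds):
                keys.append(candidate)
    return keys


ATTRS = {"Employee ID", "Employee Address", "Skill"}
# Assumed dependency for the example table.
FDS = [(frozenset({"Employee ID"}), frozenset({"Employee Address"}))]

print(is_superkey(ATTRS, ATTRS, FDS))             # True
print(is_superkey({"Employee ID"}, ATTRS, FDS))   # False
print(candidate_keys(ATTRS, FDS))
```

Under that assumption the only candidate key found is {Employee ID, Skill}, which matches the example in the text.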

Non-prime attribute

A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.

Primary key

Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose.

First Normal Form (1NF)

First Normal Form (1NF) is now generally considered part of the formal definition of a relation. Historically, 1NF was intended to disallow multi-valued attributes. 1NF dictates that the domains (allowable values) of attributes must include only atomic (simple, indivisible) values and that any given value of an instance of an attribute must be a single value from the domain of that attribute. In short, a given cell of a column in a table can contain only one value.

The following table violates 1NF because the second row contains more than one value in the COLOR column:

To ensure a table is in 1NF, one simply needs to decompose grouped attributes into separate rows (or, in some cases, tables). The following representation of the above data is in 1NF (and others):
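As a minimal sketch of this decomposition, assume a hypothetical table in which the COLOR cell holds a list of values. The ITEM column and all row values below are invented for illustration; only the COLOR column name comes from the text.

```python
def to_first_normal_form(rows, column):
    """Emit one row per value of a multi-valued column."""
    flat = []
    for row in rows:
        for value in row[column]:
            # Copy the other columns and keep a single value in `column`.
            flat.append({**row, column: value})
    return flat


# Hypothetical data: the second row holds two colors in one cell.
items = [
    {"ITEM": "pen", "COLOR": ["blue"]},
    {"ITEM": "shirt", "COLOR": ["red", "green"]},
]

for row in to_first_normal_form(items, "COLOR"):
    print(row)
```

After the call every COLOR cell holds exactly one value, at the cost of repeating the ITEM value in each new row.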

Second Normal Form

Second Normal Form (2NF) is based on the concept of "full" functional dependency. A functional dependency, X → Y, is a full functional dependency if removal of any attribute from X means that the dependency does not hold anymore. For example, given a table that tracks the hours (HOURS) a given employee (SSN) devotes to a given project (PROJNUM), we note that HOURS is functionally dependent on the combination of SSN and PROJNUM because a given employee can work on more than one project. Removal of either SSN or PROJNUM from the functional dependency results in an incorrect relationship. For example, if we were to remove SSN from the previous functional dependency, we are left with HOURS and PROJNUM, but we don’t know which SSN worked those HOURS! Thus, we say that HOURS is fully functionally dependent on {SSN, PROJNUM}.

A table is in 2NF if that table is in 1NF and every non-prime attribute (one that is not involved in a primary key of the table) is fully functionally dependent on the primary key of the table.

For example, the following table is in 1NF but is not in 2NF because PNAME and PLOCATION are dependent on only part of the primary key (PROJNUM and SSN): they depend on PROJNUM alone. Likewise, ENAME is dependent only on SSN. (Primary keys will be denoted in this document by underlining the included columns.)

Employee_Project_Table

To correct this schema, we need to create additional tables and decompose the partial dependencies into these new tables, as in figure 4.

(Please note that, for simplicity, these diagrams will not denote the resulting referential integrity, such as foreign keys, that would need to be added to these decomposed schemas.)

Employee_Table
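The 2NF decomposition described above amounts to projecting the original table onto each group of columns and dropping the duplicate rows. The sketch below uses the column names from the text; the row values and the `project` helper are hypothetical and only serve the illustration.

```python
def project(rows, columns):
    """Project rows onto columns, dropping duplicates in first-seen order."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(columns, key)))
    return out


# Hypothetical rows; ENAME, PNAME and PLOCATION repeat needlessly.
employee_project = [
    {"SSN": "111", "PROJNUM": 1, "HOURS": 10, "ENAME": "Ann",
     "PNAME": "Alpha", "PLOCATION": "Denpasar"},
    {"SSN": "111", "PROJNUM": 2, "HOURS": 5, "ENAME": "Ann",
     "PNAME": "Beta", "PLOCATION": "Jakarta"},
    {"SSN": "222", "PROJNUM": 1, "HOURS": 8, "ENAME": "Budi",
     "PNAME": "Alpha", "PLOCATION": "Denpasar"},
]

employees = project(employee_project, ["SSN", "ENAME"])
projects = project(employee_project, ["PROJNUM", "PNAME", "PLOCATION"])
hours = project(employee_project, ["SSN", "PROJNUM", "HOURS"])

print(employees)
print(projects)
print(hours)
```

Each employee name and each project description is now stored once, while the hours table keeps the full composite key.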

Third Normal Form

Third Normal Form (3NF) is based on the concept of "transitive dependency". A transitive dependency can be loosely defined as a dependency that does not involve the primary key. For example, in the table below, we see that while all elements have functional dependencies on the key, SSN, there also exist other, transitive, dependencies. Namely, DEPTNAME is dependent on DEPTNUM.

Employee_Department_Table

A table is in 3NF if it is in 2NF and no non-key attributes are dependent on other non-key attributes.

We can decompose the above table into 3NF by creating a second table for department. Thus, the following structure is in 3NF:

Employee_Table

Department_Table

There is a subtle difference between 2NF and 3NF. In 2NF, we were concerned about non-key fields being dependent on subsets of the key. In 3NF, we are concerned about non-key fields being dependent on other non-key fields. Another way to say this has been nicely summarized as: any non-key field must be "… Dependent on the key, the whole key, and nothing but the key."

Elementary Key Normal Form

Elementary Key Normal Form (EKNF) is a subtle enhancement on 3NF (by definition, EKNF tables are also in 3NF) that most often comes into play when a table has more than one unique composite key (a key of more than one column) and those keys overlap (one or more columns are involved in both keys). Such cases can cause redundant information in the overlapping column(s). For example, let’s assume that a subject title (SUBJECTTITLE) is also a unique identifier for a given subject in the following table:

Enrollment_Table

The primary key of the above table is the combination of STUDENTNUM and SUBJECTCODE. However, we can also see a (non-primary) uniqueness constraint (alternate key) that should span the STUDENTNUM and SUBJECTTITLE columns as well. The above schema could result in update and deletion anomalies because values of both SUBJECTCODE and SUBJECTTITLE tend to be repeated for a given subject. The following schema is a decomposition of the above table in order to satisfy EKNF:

Subject_Table

Enrollment_Table

For reasons that will become obvious in the following section, ensuring a table is in EKNF is usually skipped, as most designers will move directly on to Boyce-Codd Normal Form after ensuring that a schema is in 3NF. Thus, EKNF is included here only for reasons of historical accuracy and completeness.

Boyce-Codd Normal Form

Like EKNF, the only time a table is in 3NF but is not in Boyce-Codd Normal form (BCNF) is when the table contains two or more candidate keys that overlap. Beyond that, there is only a subtle difference between EKNF and BCNF, which I will outline below. Consider the same example we used to illustrate EKNF, but we now add a column (GRADE) to denote a student’s grade received in the course. (Further, for illustrative simplicity, let’s assume that a student can only take a course once.)

Enrollment_Grade_Table

We see here that the GRADE column is dependent only on a given enrollment pair and that the keys are now elementary (which satisfies EKNF). However, SUBJECTTITLE is dependent on SUBJECTCODE. Since the key of the table is STUDENTNUM and SUBJECTCODE, we decompose this structure into the following two tables which satisfy BCNF:

One may note that this would also have happened to solve our EKNF problem in the previous section. For that very reason, most designers seldom worry about EKNF and move straight on to BCNF.

Fourth Normal Form

The final normal forms are concerned with multi-valued facts. We can also note that they are concerned with composite keys, as they tend to minimize the number of fields involved in a composite key. A table is in Fourth Normal Form if it is in BCNF and all functional dependencies are "single valued". Another way to state this is to say that a table cannot contain two or more independent "multivalued" facts. By "independent", we mean to say that there is no direct connection between the two (or more) multivalued facts. This vague definition is better handled by example. In the following table (in BCNF, since it is entirely composed of attributes involved in the key), we record people (NAME), instruments they play (INSTRUMENT), and music styles (MUSICSTYLE) they play.

We see that redundancy occurs because a given person (NAME) can play more than one INSTRUMENT and play more than one MUSICSTYLE (the fact that ‘Hallock’ plays the ‘Piano’ is repeated, as is the fact that he plays the ‘Blues’ and ‘Classical’). Further, this table seems to suggest a link between instruments and music styles. Can ‘Hallock’ play ‘Blues’ with a French Horn? (Yes, and if you know Hallock, you know he plays the blues with anything, including spoons!)

In other words, we see that there are two independent multi-valued facts in the above table. The first is that a person (NAME) can play more than one INSTRUMENT while the second is that a person (NAME) can play more than one MUSICSTYLE. These facts are independent because these two facts have no bearing on each other. Decomposing this table into two tables (below) solves the problem.

Figure 12: The table in Figure 11 has been decomposed into a 4NF schema.

One should note that 4NF only applies to tables with three or more attributes (it eliminates overlapping multi-valued dependencies, which, by definition, require three or more attributes) and only when all attributes compose the primary key of the table.
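The claim that the two multi-valued facts are independent can be checked with a small experiment: split the table into two projections and join them back on NAME. The rows below are a hypothetical reading of the Hallock example, and the `project` helper is made up for this sketch.

```python
def project(rows, columns):
    """Project rows onto columns as a sorted list of distinct tuples."""
    return sorted({tuple(row[c] for c in columns) for row in rows})


# Hypothetical rows: every instrument is paired with every style.
plays = [
    {"NAME": "Hallock", "INSTRUMENT": "Piano", "MUSICSTYLE": "Blues"},
    {"NAME": "Hallock", "INSTRUMENT": "Piano", "MUSICSTYLE": "Classical"},
    {"NAME": "Hallock", "INSTRUMENT": "French Horn", "MUSICSTYLE": "Blues"},
    {"NAME": "Hallock", "INSTRUMENT": "French Horn", "MUSICSTYLE": "Classical"},
]

instruments = project(plays, ["NAME", "INSTRUMENT"])
styles = project(plays, ["NAME", "MUSICSTYLE"])

# Natural join of the two projections on NAME.
rejoined = sorted(
    (name, instrument, style)
    for name, instrument in instruments
    for other, style in styles
    if name == other
)

original = project(plays, ["NAME", "INSTRUMENT", "MUSICSTYLE"])
print(len(instruments), len(styles))  # 2 2
print(rejoined == original)           # True
```

Four rows shrink to two rows in each of the smaller tables, and the join gives back exactly the original rows, so nothing is lost by the decomposition.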

Fifth Normal Form and Project Join Normal Form

Cases where a table is in 4NF but is not in Fifth Normal Form (5NF) are extremely rare. Further, Project Join Normal Form (PJNF) is a slightly stronger (although this is debated) case of 5NF, and in virtually all cases it can be treated as an equivalent. Therefore, PJNF is included here for completeness. As in 4NF, 5NF considerations apply only to tables with three or more attributes, all of which comprise the primary key. The formal definition of 5NF and PJNF requires that we must first define a "projection". A projection of a table is a subset of the total number of columns with no duplicate rows. For example, the following table:

has the following projections:

The training table in Figure 13 may or may not be in 5NF depending on the business rules. Say we have to enforce the rule:

An EMPLOYEE trains a CLASSTYPE for a COMPANY if and only if an EMPLOYEE trains a CLASSTYPE, the EMPLOYEE trains for a COMPANY, and the COMPANY the EMPLOYEE trains for makes a tool that implements the CLASSTYPE the EMPLOYEE trains.

If we enforce the above rule the table in Figure 13 is not in 5NF and must be reduced to three tables represented by the above projections of the original table.

To achieve 5NF, one checks all-key tables for decompositions whose joins result in the same information. A cautionary note, however, is that such decompositions can lead to a loss of constraint knowledge. For example, in the above case, we need to create database code to handle the specified rule between an EMPLOYEE, the CLASSTYPEs they train, and the COMPANY who makes the tool that implements the CLASSTYPE.
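The check for 5NF can be sketched the same way: take the three two-column projections of the table and see whether joining all three gives back exactly the original rows. The column order (EMPLOYEE, CLASSTYPE, COMPANY) follows the rule quoted above, but the names and rows are hypothetical and were chosen so that the rule holds.

```python
# Hypothetical rows: (EMPLOYEE, CLASSTYPE, COMPANY).
training = {
    ("Ayu", "SQL", "Acme"),
    ("Ayu", "SQL", "Globex"),
    ("Ayu", "Java", "Globex"),
    ("Budi", "SQL", "Globex"),
}

# The three binary projections of the table.
employee_class = {(e, c) for e, c, k in training}
employee_company = {(e, k) for e, c, k in training}
class_company = {(c, k) for e, c, k in training}

# Joining only two projections invents a row that was never recorded.
two_way = {
    (e, c, k)
    for (e, c) in employee_class
    for (e2, k) in employee_company
    if e == e2
}

# Joining all three projections applies the business rule.
three_way = {row for row in two_way if (row[1], row[2]) in class_company}

print(two_way - training)     # {('Ayu', 'Java', 'Acme')}
print(three_way == training)  # True
```

With this data the two-way join produces one spurious row, and the third projection filters it out again. If the data did not obey the rule, the three-way join would not reproduce the table and the decomposition would not be valid.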

The root concept behind 4NF, 5NF, and PJNF is that the tables not in these normal forms can be derived from simpler, more fundamental relationships. Further, 5NF does not differ from 4NF unless there are other rules (symmetric constraints) that dictate correct data population. Lastly, 5NF differs from 4NF in that the fact combinations we are concerned with are no longer independent from each other (due to the semantic constraints).

References:

http://databases.about.com/od/specificproducts/a/normalization.htm

http://en.wikipedia.org/wiki/Functional_dependency

http://en.wikipedia.org/wiki/Database_design

http://www3.gpf.or.th/knowledge/adb/im/sample4.pdf

Sunday, 19 April 2009

DATABASE AND ER-Diagram

DATABASE

A database is a structured collection of records or data that is stored in a computer system. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model. Other models such as the hierarchical model and the network model use a more explicit representation of relationships.

DBMS ( Database Management System )

A database management system (DBMS) is computer software that manages databases. DBMSes may use any of a variety of database models, such as the network model or relational model. In large systems, a DBMS allows users and other software to store and retrieve data in a structured way.

BIT, BYTE, FIELD

Bit

A bit is a binary digit, taking a logical value of either "1" or "0" (also referred to as "true" or "false" respectively). Binary digits are a basic unit of information storage and communication in digital computing and digital information theory. Information theory also often uses the natural digit, called either a nit or a nat. Quantum computing uses qubits: single pieces of quantum information encoded on a two-level quantum system and hence having the potential to exist in a superposition of "true" and "false".

Byte

A byte is a basic unit of measurement of information storage in computer science. In many computer architectures it is the unit of memory addressing. There is no standard, but a byte most often consists of eight bits.

A byte is an ordered collection of bits, with each bit denoting a single binary value of 1 or 0. The byte most often consists of 8 bits in modern systems; however, the size of a byte can vary and is generally determined by the underlying computer operating system or hardware. Historically, byte size was determined by the number of bits required to represent a single character from a Western character set. Its size was generally determined by the number of possible characters in the supported character set and was chosen to be a divisor of the computer's word size. Historically, bytes have ranged from five to twelve bits.

Field

A field is a group of related bytes; in database terminology it is called an attribute.

Attribute/field

An attribute, or field, is a characteristic of an entity that provides a detailed description of that entity.
A relation can have attributes too.
Examples of attributes:

Student : NIM, Name, Sex, Address
Car : Plate Number, Color, CC
Book : ID, title, author

Types of Attribute

Single vs multivalue

Single > can hold at most one value

Multivalue > can hold more than one value of the same type

Atomic vs composite

Atomic > cannot be divided into smaller attributes

Composite > is a combination of several smaller attributes

Derived Attribute

An attribute whose value can be derived from other attribute values, for example age derived from the date of birth.

Null Value Attribute

An attribute that has no value for a given record

Mandatory Value Attribute

An attribute that must have a value
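The attribute types above can be illustrated with a small record type. This is only a sketch: the `Student` class, the `phones` attribute, and the sample values are hypothetical, while NIM, Name, and Address come from the student example earlier in this post.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Student:
    nim: str                        # mandatory, single-valued, atomic
    name: str                       # mandatory
    birth_date: date                # mandatory; source of the derived age
    address: Optional[str] = None   # null value allowed
    # Multivalued attribute (hypothetical, added for illustration).
    phones: List[str] = field(default_factory=list)

    def age_on(self, today: date) -> int:
        """Derived attribute: computed from birth_date, never stored."""
        years = today.year - self.birth_date.year
        # Subtract one year if the birthday has not yet occurred this year.
        if (today.month, today.day) < (self.birth_date.month, self.birth_date.day):
            years -= 1
        return years


student = Student(nim="0801", name="Ayu", birth_date=date(1990, 4, 27))
print(student.age_on(date(2009, 4, 26)))  # 18
print(student.address)                    # None
```

Leaving out `nim`, `name`, or `birth_date` raises a `TypeError`, which is how the mandatory attributes show up in this sketch.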

Record / Tuple

A record is a row of data in a relation. It consists of attributes that together give complete information about an entity or relation.

Entity or File

An entity is a collection of records of the same kind, with the same elements and attributes but different data values.
Types of entity:
In application processing, files can be categorized as:
- master file
- transaction file
- report file
- history file
- backup file
- work file

Domain

A domain is the set of values that are allowed in one or more attributes. Every attribute in a relational database is defined on a domain.

Key data element

A key is the element of a record that is used to find the record when accessing it, and that can also be used to identify every entity / record / row.

Types of key:

Superkey

A superkey is defined in the relational model of database organization as a set of attributes of a relation variable (relvar) for which it holds that in all relations assigned to that variable there are no two distinct tuples (rows) that have the same values for the attributes in this set. Equivalently a superkey can also be defined as a set of attributes of a relvar upon which all attributes of the relvar are functionally dependent.

Candidate key

In the relational model, a candidate key of a relvar (relation variable) is a set of attributes of that relvar such that

1. at all times it holds in the relation assigned to that variable that there are no two distinct tuples with the same values for these attributes and

2. there is not a proper subset of this set of attributes for which (1) holds.

Since a superkey is defined as a set of attributes for which (1) holds, we can also define a candidate key as a minimal superkey, i.e. a superkey of which no proper subset is also a superkey.

The importance of candidate keys is that they tell us how we can identify individual tuples in a relation. As such they are one of the most important types of database constraint that should be specified when designing a database schema. Since a relation is a set (no duplicate elements), it holds that every relation will have at least one candidate key (because the entire heading is always a superkey). Since in some RDBMSs tables may also represent multisets (which strictly means these DBMSs are not relational), it is an important design rule to specify explicitly at least one candidate key for each relation. For practical reasons RDBMSs usually require that for each relation one of its candidate keys is declared as the primary key, which means that it is considered as the preferred way to identify individual tuples. Foreign keys, for example, are usually required to reference such a primary key and not any of the other candidate keys.

Primary Key

One of the candidate keys can be selected as the primary key, using the following three criteria:

1. The key is more natural to use as a reference

2. The key is simpler

3. The key is guaranteed to be unique

Alternate Key

An alternate key is a candidate key that was not selected as the primary key.

Foreign Key

A foreign key is any attribute that refers to the primary key of another table. Foreign keys appear in relationships with one-to-many or many-to-many cardinality. The foreign key is usually placed in the table on the "many" side.

External Key

An external key is a lexical attribute (or set of lexical attributes) whose values always identify an object instance.

ERD (ENTITY RELATIONSHIP DIAGRAM)

Data models are tools used in analysis to describe the data requirements and assumptions in the system from a top-down perspective. They also set the stage for the design of databases later on in the SDLC.

There are three basic elements in ER models:

Entities are the "things" about which we seek information.

Attributes are the data we collect about the entities.

Relationships provide the structure needed to draw information from multiple entities.

Generally, ERDs look like this:

ELEMENTS OF THE ERD

Entity

In an ER diagram, an entity is drawn as a rectangle. An entity is something that exists in the real system, whether concrete or abstract, about which data is stored.

Relationship

In an ER diagram, a relationship is drawn as a lozenge. A relationship is a natural association that occurs between entities. In general, it is named with a base verb, which makes the relationship easier to read.

Relationship Degree

The relationship degree is the number of entities participating in a relationship. The degrees most often used in ERDs are unary, binary, and ternary.

Attribute

An attribute is a property or characteristic of an entity or relationship.

Cardinality

Cardinality indicates the maximum number of tuples that can be related to an entity in the other entity set.

Relationship Degree

A unary relationship is a relationship between entities that come from the same entity set.

A binary relationship is a relationship between two entities.

A ternary relationship is a relationship among instances of three entity types at the same time.

Cardinality

Cardinality indicates the maximum number of tuples that can be related to an entity in the other entity set.
Types of cardinality:

One to One:

In a one-to-one relationship, one occurrence in the first entity is related to only one occurrence in the second entity, and vice versa.

One to Many or Many to One:

A one-to-many relationship is the same as a many-to-one relationship, depending on the direction from which the relationship is viewed. One occurrence in the first entity can be related to many occurrences in the second entity, whereas one occurrence in the second entity can be related to only one occurrence in the first entity.

Many To Many:

A many-to-many relationship exists when each occurrence in one entity can be related to many occurrences in the other entity, and vice versa.
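The three cardinality types can be told apart by looking at a set of related pairs. The sketch below is an illustration with a made-up function name and made-up pairs. Cardinality is really a rule of the design, so sample data can only suggest which type applies.

```python
from collections import defaultdict


def cardinality(pairs):
    """Classify a binary relationship from (first, second) occurrence pairs."""
    first_to_second = defaultdict(set)
    second_to_first = defaultdict(set)
    for a, b in pairs:
        first_to_second[a].add(b)
        second_to_first[b].add(a)
    # Does any occurrence on one side relate to several on the other side?
    first_has_many = any(len(v) > 1 for v in first_to_second.values())
    second_has_many = any(len(v) > 1 for v in second_to_first.values())
    if first_has_many and second_has_many:
        return "many-to-many"
    if first_has_many:
        return "one-to-many"
    if second_has_many:
        return "many-to-one"
    return "one-to-one"


# Hypothetical pairs for illustration.
print(cardinality([("person1", "passport1"), ("person2", "passport2")]))
print(cardinality([("dept1", "emp1"), ("dept1", "emp2")]))
print(cardinality([("student1", "course1"), ("student1", "course2"),
                   ("student2", "course1")]))
```

The three calls print one-to-one, one-to-many, and many-to-many in that order.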

NOTATION (E-R DIAGRAM)

The symbols used in an ER diagram are:

1. A rectangle represents an entity set

2. A circle represents an attribute

3. A rhombus represents a relationship set

4. A line connects a relationship set to an entity set, and an entity set to its attributes

Reference:

ER Ngurah Agus Sanjaya. Slide Part 5 - Database Dan Er-Diagram.

http://en.wikipedia.org/wiki/Database

http://en.wikipedia.org/wiki/Database_management_system

http://kur2003.if.itb.ac.id/file/SE6162%20ERD.pdf

http://www.umsl.edu/~sauterv/analysis/er/er_intro.html

Sunday, 05 April 2009

"DFD" Data Flow Diagram

Data Flow Diagram

A data flow diagram (DFD) describes a system as a set of smaller modules, which makes it easier for users who are less familiar with computing to understand the system that will be built.

A DFD also serves to describe an existing system, or a new system to be developed, logically, without considering the physical environment in which the data flows or where the data is stored. The DFD is a tool used in structured system development methodology (structured analysis and design). It can describe the flow of data within the system in a structured and clear way.

The result is a series of diagrams that represent the business activities in a way that is clear and easy to communicate. A business model comprises one or more data flow diagrams (also known as business process diagrams). Initially a context diagram is drawn, which is a simple representation of the entire system under investigation. This is followed by a level 1 diagram, which provides an overview of the major functional areas of the business. Don't worry about the symbols at this stage; these are explained shortly. Using the context diagram together with additional information from the area of interest, the level 1 diagram can then be drawn.

The level 1 diagram identifies the major business processes at a high level and any of these processes can then be analyzed further - giving rise to a corresponding level 2 business process diagram. This process of more detailed analysis can then continue – through level 3, 4 and so on. However, most investigations will stop at level 2 and it is very unusual to go beyond a level 3 diagram.

Identifying the existing business processes, using a technique like data flow diagrams, is an essential precursor to business process re-engineering, migration to new technology, or refinement of an existing business process. However, the level of detail required will depend on the type of change being considered.

Data Flow Diagrams
There are only five symbols that are used in the drawing of business process diagrams (data flow diagrams). These are now explained, together with the rules that apply to them.

External Entity

An external entity is a source or destination of a data flow which is outside the area of study. Only those entities which originate or receive data are represented on a business process diagram. The symbol used is an oval containing a meaningful and unique identifier.

Process

A process shows a transformation or manipulation of data flows within the system. The symbol used is a rectangular box which contains 3 descriptive elements:
Firstly, an identification number appears in the upper left-hand corner. This is allocated arbitrarily at the top level and serves as a unique reference.
Secondly, a location appears to the right of the identifier and describes where in the system the process takes place. This may, for example, be a department or a piece of hardware.
Finally, a descriptive title is placed in the centre of the box. This should be a simple imperative sentence with a specific verb, for example 'maintain customer records' or 'find driver'.

Data Flow

A data flow shows the flow of information from its source to its destination. A data flow is represented by a line, with arrowheads showing the direction of flow. Information always flows to or from a process and may be written, verbal or electronic. Each data flow may be referenced by the processes or data stores at its head and tail, or by a description of its contents.

Data Store

A data store is a holding place for information within the system:
It is represented by an open ended narrow rectangle.
Data stores may be long-term files such as sales ledgers, or may be short-term accumulations: for example batches of documents that are waiting to be processed. Each data store should be given a reference letter followed by an arbitrary number.

Resource Flow

A resource flow shows the flow of any physical material from its source to its destination. For this reason they are sometimes referred to as physical flows.
The physical material in question should be given a meaningful name. Resource flows are usually restricted to early, high-level diagrams and are used when a description of the physical flow of materials is considered to be important to help the analysis.

Data Flow Diagrams – The Rules

External Entities
It is normal for all the information represented within a system to have been obtained from, and/or to be passed onto, an external source or recipient. These external entities may be duplicated on a diagram, to avoid crossing data flow lines. Where they are duplicated, a stripe is drawn across the left-hand corner.

The addition of a lowercase letter to each entity on the diagram is a good way to uniquely identify them.

Processes
When naming processes, avoid glossing over them, without really understanding their role. Indications that this has been done are the use of vague terms in the descriptive title area - like 'process' or 'update'.

The most important thing to remember is that the description must be meaningful to whoever will be using the diagram.

Data Flows
Double headed arrows can be used (to show two-way flows) on all but bottom level diagrams. Furthermore, in common with most of the other symbols used, a data flow at a particular level of a diagram may be decomposed to multiple data flows at lower levels.

Data Stores
Each store should be given a reference letter, followed by an arbitrary number. These reference letters are allocated as follows:

'D' - indicates a permanent computer file
'M' - indicates a manual file
'T' - indicates a transient store, one that is deleted after processing.

In order to avoid complex flows, the same data store may be drawn several times on a diagram. Multiple instances of the same data store are indicated by a double vertical bar on their left hand edge.
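
The reference convention above can be checked mechanically. The following is a minimal sketch in Python; the function name describe_store and the exact pattern (one letter followed directly by digits, such as "D1") are assumptions made for illustration, not part of any DFD standard.

```python
import re

# Prefix meanings taken from the list above.
STORE_KINDS = {
    "D": "permanent computer file",
    "M": "manual file",
    "T": "transient store",
}

# Assumed format: one of the letters D, M or T followed by a number, e.g. "D1".
_REFERENCE = re.compile(r"([DMT])(\d+)")


def describe_store(reference: str) -> str:
    """Return the kind of data store that a reference such as 'D1' denotes."""
    match = _REFERENCE.fullmatch(reference)
    if match is None:
        raise ValueError(f"not a valid data store reference: {reference!r}")
    return STORE_KINDS[match.group(1)]
```

For example, describe_store("M2") reports a manual file, while a reference such as "X9" is rejected with a ValueError.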

Data Flow Diagrams – Relationship Grid

There are rules governing various aspects of the diagram components and how they can relate to one another.

Data Flows
For data flows the rules are as follows:
Data flows and resource flows are allowed between external entities and processes. Data flows are also allowed between different external entities. However, data flows and resource flows are not allowed between external entities and data stores.

Processes
For processes the data flow rules are as follows:
Data flows and resource flows are allowed between processes and external entities and between processes and data stores. They are also allowed between different processes. In other words processes can communicate with all other areas of the business process diagram.

Data Stores
For data stores the data flow rules are as follows:
Data flows and resource flows are allowed between data stores and processes. However, these flows are not allowed between data stores and external entities or between one data store and another. In practice this means that data stores cannot initiate a communication of information; they require a process to do this.
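
The relationship grid described in this section can be written down as a small lookup table. The sketch below is in Python and covers data flows only; the constant and function names are hypothetical, and the table simply encodes the rules as stated above.

```python
# Component kinds used in the grid (names chosen for this sketch).
ENTITY, PROCESS, STORE = "external entity", "process", "data store"

# Unordered pairs between which a data flow may be drawn, as stated above.
# A single-element set stands for a flow between two different components
# of the same kind.
ALLOWED_DATA_FLOWS = {
    frozenset([ENTITY, PROCESS]),
    frozenset([ENTITY]),           # one external entity to another
    frozenset([PROCESS]),          # one process to another
    frozenset([PROCESS, STORE]),
}


def data_flow_allowed(source_kind: str, target_kind: str) -> bool:
    """Return True if the grid permits a data flow between the two kinds."""
    return frozenset([source_kind, target_kind]) in ALLOWED_DATA_FLOWS
```

Because the pairs are unordered, data_flow_allowed(STORE, PROCESS) and data_flow_allowed(PROCESS, STORE) give the same answer, which matches the grid: the direction of a flow does not change whether it is permitted.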

Data Flow Diagrams – Context Diagrams

The context diagram represents the entire system under investigation. This diagram should be drawn first, and used to clarify and agree the scope of the investigation.

In a context diagram, the system under investigation is represented as a single process, connected to external entities by data flows and resource flows.

The context diagram clearly shows the interfaces between the system under investigation and the external entities with which it communicates. Therefore, whilst it is often conceptually trivial, a context diagram serves to focus attention on the system boundary and can help in clarifying the precise scope of the analysis.

As an example, consider a context diagram that represents a book lending library. The library receives details of books, and orders books from one or more book suppliers.

Books may be reserved and borrowed by members of the public, who are required to give a borrower number. The library will notify borrowers when a reserved book becomes available or when a borrowed book becomes overdue.

In addition to supplying books, a book supplier will furnish details of specific books in response to library enquiries.

Note that communications involving external entities are only included where they involve the 'system' process. Whilst a book supplier would communicate with various agencies, for example publishers and other suppliers, these data flows are remote from the 'system' process and so are not included on the context diagram.

Data Flow Diagrams – Context Diagram Guidelines
Firstly, draw and name a single process box that represents the entire system.

Next, identify and add the external entities that communicate directly with the process box. Do this by considering origin and destination of the resource flows and data flows.

Finally, add the resource flows and data flows to the diagram.

In drawing the context diagram you should only be concerned with the most important information flows. These will be concerned with issues such as: how orders are received and checked, with providing good customer service and with the paying of invoices. Remember that no business process diagram is the definitive solution - there is no absolute right or wrong.
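
The three guideline steps can be mirrored in a small data structure. This is a minimal Python sketch using the lending library example; the class name ContextDiagram and the flow labels are illustrative assumptions, not taken from a particular tool or from the original diagram.

```python
from dataclasses import dataclass, field


@dataclass
class ContextDiagram:
    system: str                                 # step 1: the single process box
    entities: set = field(default_factory=set)  # step 2: external entities
    flows: list = field(default_factory=list)   # step 3: (source, label, target)

    def add_entity(self, name: str) -> None:
        self.entities.add(name)

    def add_flow(self, source: str, label: str, target: str) -> None:
        # Every flow must connect the system process with a known entity.
        outward = source == self.system and target in self.entities
        inward = target == self.system and source in self.entities
        if not (outward or inward):
            raise ValueError(f"flow {label!r} does not involve the system process")
        self.flows.append((source, label, target))


# Lending library example; the flow labels are made up for illustration.
library = ContextDiagram("Library")
library.add_entity("Book Supplier")
library.add_entity("Borrower")
library.add_flow("Book Supplier", "book details", "Library")
library.add_flow("Library", "book order", "Book Supplier")
library.add_flow("Borrower", "reservation", "Library")
library.add_flow("Library", "overdue notice", "Borrower")
```

A flow such as one from the book supplier to a publisher would be rejected by add_flow, because it does not touch the system process.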

Zero Diagram

The zero diagram is the diagram that describes the processes of the DFD. It gives a view of the overall system, showing the main functions (processes), the data flows and the external entities. Data stores already appear at this level.

Detailed Diagram

A detailed diagram breaks down a process that appears in the zero diagram or in any level above it.

Numbering level in the DFD:

A single level should contain no more than 7 processes, or 9 at the very most; when more are needed, the diagram should be decomposed further.

Process Specification
Each process in a DFD must have a process specification. At the top level, the process is described using descriptive sentences. At more detailed levels, down to the lowest-level processes (functional primitives), a structured specification is required. The process specification becomes the programmer's guide when coding. Methods used in a process specification:
* a narrative description of the process
* a decision table
* a decision tree

Data flow diagram levels

Level 1 (High Level Diagram)

A Level 1 Data flow diagram for the same system.

This level (level 1) shows all processes at the first level of numbering, data stores, external entities and the data flows between them. The purpose of this level is to show the major high-level processes of the system and their interrelation. A process model will have one, and only one, level-1 diagram. A level-1 diagram must be balanced with its parent context level diagram, i.e. there must be the same external entities and the same data flows. These can be broken down into more detail in the level 1, e.g. the "enquiry" data flow could be split into "enquiry request" and "enquiry results" and still be valid.

Level 2 (Low Level Diagram)

A Level 2 Data flow diagram showing the "Process Enquiry" process for the same system.

This level is a decomposition of a process shown in a level-1 diagram; as such, there should be a level-2 diagram for each and every process shown in a level-1 diagram. In this example processes 1.1, 1.2 and 1.3 are all children of process 1. Together they wholly and completely describe process 1, and combined must perform the full capacity of this parent process. As before, a level-2 diagram must be balanced with its parent level-1 diagram.
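
The parent and child relationship is carried by the dotted process numbers themselves. As a small Python sketch (the helper names parent_of and children_of are hypothetical), the parent of a process can be read off by dropping the last component of its number:

```python
def parent_of(process_number: str):
    """Return the parent's number, or None for a level-1 process such as '1'."""
    head, dot, _ = process_number.rpartition(".")
    return head if dot else None


def children_of(parent: str, process_numbers):
    """Return the numbers whose direct parent is the given process."""
    return [n for n in process_numbers if parent_of(n) == parent]
```

With the numbers 1, 1.1, 1.2, 1.3 and 2, children_of("1", ...) returns 1.1, 1.2 and 1.3, the processes that together must describe process 1.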

Data Dictionary
A data dictionary is a reserved space within a database that is used to store information about the database itself. Also called a system data dictionary, it is a catalog of facts about the data and the information needs of an information system. Its function is to help those working on the system interpret the application in detail and organize every data element used by the system precisely, so that users and system analysts share the same basic understanding of inputs, outputs, storage and processes. In the analysis phase, the data dictionary is used as a means of communication between the system analyst and the user. In the system development phase, it is used to design inputs, reports and the database. Data flows in a DFD are global in character; more detailed information can be found in the data dictionary.
The data dictionary contains the following:

Name of data flow: must be recorded so that readers who need further explanation about a data flow can find it easily
Alias: an alias or other name for the data, written down when one exists
Form of data: used to group the data dictionary entries for use when designing the system
Data flow: indicates where the data flows from and where it flows to
Description: gives an explanation of the meaning of the data flow
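
The five items listed above map naturally onto a record type. The following is a minimal Python sketch, assuming one entry per data flow; the class, field names and sample values are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class DataDictionaryEntry:
    name: str            # name of the data flow
    source: str          # where the data flows from
    destination: str     # where the data flows to
    description: str     # meaning of the data flow
    form: str = ""       # form of the data, e.g. document or screen
    aliases: list = field(default_factory=list)


def build_index(entries):
    """Index entries by name and by every alias (case-insensitive)."""
    index = {}
    for entry in entries:
        for key in [entry.name, *entry.aliases]:
            index[key.lower()] = entry
    return index


# Hypothetical entry; the source, destination and alias are made up.
enquiry = DataDictionaryEntry(
    name="enquiry request",
    source="Borrower",
    destination="Process Enquiry",
    description="A borrower's question about a book",
    form="form",
    aliases=["book enquiry"],
)
index = build_index([enquiry])
```

Indexing by alias as well as by name serves the stated purpose of the name field: a reader who needs more detail about a flow can find its entry under whichever name they know.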

Balancing in DFD
The data flows into and out of a process must be the same as the data flows into and out of the detailed diagram of that process on the level or levels below. The names of the data flows into and out of the process must match the names of the data flows into and out of its detailed diagram. The number and names of the external entities around the process must equal the number and names of the external entities around its detailed diagram.
Points that need attention in a DFD with more than one level:

Inputs and outputs must balance between one level and the next
The balance between level 0 and level 1 is checked on the input and output data flows to and from the terminators (external entities) at level 0, while the balance between level 1 and level 2 is checked on the input and output data flows to and from the process concerned
The names of data flows, data stores and terminators must be the same at every level when they refer to the same object
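
A balancing check between a process and its detailed diagram can be sketched as a set comparison. The Python below is a simplification that assumes flows are matched strictly by direction and name, as the rules above require; the function name and the sample flows are hypothetical.

```python
def unbalanced_flows(parent_flows, child_flows):
    """Compare the flows of a process with those of its detailed diagram.

    Each flow is a (direction, name) pair such as ("in", "enquiry").
    Returns (missing_in_child, extra_in_child); both sets are empty
    when the two levels balance.
    """
    parent, child = set(parent_flows), set(child_flows)
    return parent - child, child - parent


# Hypothetical flows around a parent process and its detailed diagram.
parent = {("in", "enquiry request"), ("out", "enquiry results")}
child = {("in", "enquiry request")}
missing, extra = unbalanced_flows(parent, child)
```

Here missing contains the "enquiry results" output, showing that the detailed diagram has dropped a flow that its parent process produces.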

Prohibition in DFD

A data flow may not go directly from one external entity to another external entity without passing through a process.
A data flow may not go directly from a data store to an external entity without passing through a process.
A data flow may not go directly from one data store to another data store without passing through a process.
A data flow that goes directly from one process to another process without passing through a data store should be avoided where possible.

References:
- Slide Part 4 - DATA FLOW DIAGRAM by ER Ngurah Agus Sanjaya
- http://www.getahead-direct.com/gwbadfd.htm

- http://en.wikipedia.org/wiki/Data_flow_diagram

gunggus