BIS 2030

1997/98 Semester 2. Hendon

Systems Analysis and Design

Data Flow Diagrams


A Data Flow Diagram (DFD) is a diagrammatic representation of the information flows within a system, showing:


In SSADM a DFD model includes supporting documentation describing the information shown in the diagram. DFDs are used not only in structured system analysis and design, but also as a general process modelling tool. There are a number of commercial tools in the market today which are based on DFD modelling.


SSADM uses DFDs in three stages of the development process:


1. The Notation


DFDs show the passage of data through the system by using 5 basic constructs: Data flows, Processes, Data Stores, External Entities, and Physical Resources.


1.1 Data Flows


A data flow shows the flow of data from a source to a destination. The flow is shown as an arrowed line with the arrowhead showing the direction of flow. Each data flow should be uniquely identified by a meaningful descriptive name (caption).

Flow may move from an external entity to a process, from a process to another process, into and out of a store from a process, and from a process to an external entity. Flows are not permitted to move directly from an external entity to a store or from a store directly to an external entity.


It is generally unacceptable to have a flow moving directly from one external entity to another. However, if such flows are felt to be useful and do not clutter the diagram, they can be shown as dotted lines.
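The connection rules above can be expressed as a small check. The following is only a sketch: the names `Kind` and `check_flow` and the three result strings are illustrative assumptions, not part of SSADM. Store-to-store flows are treated as not allowed because every flow must start or end at a process.

```python
from enum import Enum


class Kind(Enum):
    """The three kinds of symbol a data flow can connect (hypothetical names)."""
    PROCESS = "process"
    STORE = "data store"
    EXTERNAL = "external entity"


def check_flow(source: Kind, destination: Kind) -> str:
    """Classify a flow as 'allowed', 'discouraged' or 'not allowed'."""
    # Any flow that starts or ends at a process is permitted.
    if Kind.PROCESS in (source, destination):
        return "allowed"
    # External entity to external entity: generally unacceptable,
    # but may be drawn as a dotted line if it aids understanding.
    if source is Kind.EXTERNAL and destination is Kind.EXTERNAL:
        return "discouraged"
    # Remaining cases: store to external, external to store, store to store.
    return "not allowed"
```

For example, `check_flow(Kind.EXTERNAL, Kind.STORE)` reports "not allowed", which matches the rule that an external entity can reach a store only through a process.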


No two data flows should have the same name. The name of a flow moving into or out of a store may be omitted if the name of the store implies it. A name is useful if the flow is especially significant, or if its content is not easy to discern just by examining the diagram. However, omitting names can be justified only in complex diagrams, or when extra-long names would clutter the diagram. It is good practice to name every symbol represented in the diagram.


It may be possible to give a combined name where many flows move between the same source and destination.


It is very important that the direction of flow is represented correctly in the diagram. A flow always runs from or into a process. The figure below shows the connections that are allowed and not allowed when constructing a DFD.


1.2 Processes


Processes are transformations, changing incoming data flows into outgoing data flows. Processes are drawn as rectangular boxes with a descriptive name occupying the middle of the box. The box has a top stripe that contains an identification number on the left, and the location (or the role carrying out the work) on the right (this is optional and used only in the current physical DFD).

The numbering generally follows a left-to-right convention. This does not indicate priority or sequence. The identification number is purely an identifier. It also helps to associate a high-level process with its decomposed subprocesses. This will become clear when we discuss process decomposition in Section 2.


The name of the process should describe what happens to the data as it passes through it. An active verb (verify, compute, extract, create, retrieve, store, determine, etc.) followed by an object or object clause is a suggested notation.


In the current physical DFD, the location of the process is placed at the top right of the box. This might be a physical location or the staff responsible.


1.3 Data Stores


A store is a repository of data; it may be a card index, a database file, a temporary pile of sales orders awaiting processing, or a folder in a filing cabinet. The store may contain permanent data or temporary accumulations (pending documents, daily movements).


A store is represented by an open-ended box and is given a meaningful descriptive name. Each store is also given a reference number prefixed by a letter. In current physical DFDs, manual data stores are shown using the letter ‘M’, and a ‘D’ is used to represent a computer data store. In contrast to these permanent data stores, data can also be held for a short time in temporary or transient data stores. These are identified by a ‘T’. If they are also manual then a ‘T(M)’ is used.

In logical and required system DFDs, data stores are regarded as computerised and hence only a ‘D’ will be used. Some transient stores may remain and retain the ‘T’.
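The prefix scheme can be illustrated with a short parser. This is a sketch under an assumption: references are taken to be written as the prefix immediately followed by the number (for example `M1`, `D3`, `T2`, `T(M)4`). The names `StoreRef` and `parse_store_id` are hypothetical.

```python
import re
from typing import NamedTuple


class StoreRef(NamedTuple):
    transient: bool  # True for 'T' and 'T(M)' stores
    manual: bool     # True for 'M' and 'T(M)' stores
    number: int      # the reference number after the prefix


# 'T(M)' is listed before 'T' so the longer prefix is tried first.
_STORE_ID = re.compile(r"^(M|D|T\(M\)|T)(\d+)$")


def parse_store_id(ref: str) -> StoreRef:
    """Split a data store reference into its prefix meaning and number."""
    match = _STORE_ID.match(ref.strip())
    if match is None:
        raise ValueError(f"not a recognised data store reference: {ref!r}")
    prefix, number = match.groups()
    return StoreRef(
        transient=prefix.startswith("T"),
        manual=prefix in ("M", "T(M)"),
        number=int(number),
    )
```

On a logical or required system DFD, every parsed reference would be expected to have `manual` set to False, since only ‘D’ and ‘T’ remain.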

To prevent a DFD becoming a ‘spider’s web’ of crossing lines, the same data store may be included more than once on a DFD. Such duplication is shown by an additional vertical line within the store symbol.

1.3.1 Direction of Flow

If the arrow from the store is single-headed and points towards the process, this signifies a 'read' action. In other words, the process does not alter the contents of the store; it only accesses the data available. An example is the flow from the data store 'Customers' in the figure below.

If the single arrowhead points towards the data store, this indicates a 'write' action, e.g. creating a record. The flow to the data store "'Hold' forms" is an example of a write action.

An 'update' consists of both a read and a write. This can be shown either by a double-headed arrow or by two single arrows, one in each direction.
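The read, write and update cases can be summarised in code. In this sketch each flow is a (source, destination) pair, and a double-headed arrow is entered as two pairs. The function name `store_access`, the process name 'Check order' and the store 'Stock' are made up for illustration.

```python
from typing import Iterable, Tuple


def store_access(flows: Iterable[Tuple[str, str]], process: str, store: str) -> str:
    """Return 'read', 'write', 'update' or 'none' for one process/store pair."""
    flows = set(flows)
    reads = (store, process) in flows    # arrow points at the process
    writes = (process, store) in flows   # arrow points at the store
    if reads and writes:
        return "update"
    if reads:
        return "read"
    if writes:
        return "write"
    return "none"


# Hypothetical diagram: one process touching three stores.
flows = [
    ("Customers", "Check order"),      # read
    ("Check order", "'Hold' forms"),   # write
    ("Stock", "Check order"),          # update: a read ...
    ("Check order", "Stock"),          # ... plus a write
]
```

Here `store_access(flows, "Check order", "Stock")` gives "update" because flows exist in both directions.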


1.4 External Entities (Source or Sink)


The external entity represents a person or a part of an organisation which sends data to or receives data from the system, but is considered to be outside the system boundary (scope of the project). As with data stores, these may be duplicated on a DFD to simplify presentation. External entities may be further referenced by the use of an alpha character, and this is particularly recommended if the entity is being decomposed at a lower level.

Sometimes external entities are referred to as sources and sinks. An external entity supplies data to the system, which makes it a source, and/or receives data from the system, which makes it a sink.


1.5 Physical Resources


A physical flow represents the flow of material (as opposed to data flows representing the flow of information), the movement of some resources or goods which are relevant to the information system, from source to destination. They are included to aid communication. A physical flow is represented by a broad arrow. The resource store is represented by a closed rectangle.




You will find that some books describing earlier versions of SSADM do not include this symbol. This notation is not generally used in DFDs. When used, it is included only in the initial set of high-level DFDs. Physical flows add clutter to the DFD by their physical size. However, they can be useful for:



2. Modelling Hierarchy


A major advantage of a DFD is its use in communication between user and analyst, or even between two analysts. A DFD becomes difficult to understand when it has more than seven to nine processes. If a diagram threatens to exceed this (in other words, if the modeller feels the figure is too complex for easy understanding), the DFD should be redrawn so that processes which logically belong together are replaced by a single process encompassing them all. The replaced processes should appear on another DFD (considered to be at a lower level) that shows how the combined process can be exploded into its constituents. These constituents may themselves be complex and can be broken down into subprocesses shown on a DFD at a still lower level. This is known as decomposing the DFD.


The DFD that shows the entire system within a single diagram is the top-level or ‘level 1’ DFD. The DFDs that are expansions of processes at the top level are ‘level 2’ DFDs. Levels below this are called ‘level 3’, ‘level 4’, etc. Processes that are not further decomposed are bottom-level processes. Processes from the top-level DFD may be broken down (decomposed) into a number of levels if they are complex, or may not be broken down at all if they are simple. Thus, it is possible to have bottom-level processes appearing at all levels of the DFD.

In the figure below, the bottom-level processes are denoted by the letter ‘b’.





If a process is decomposed, the identifiers of the lower-level processes are prefixed by the identifier of the higher-level process. For example, if process 1 is decomposed, then the lower-level processes will be identified as 1.1, 1.2, etc. Similarly, if process 1.3 is subsequently decomposed, the lower-level processes will be 1.3.1, 1.3.2, and so on. This is shown in the figure below.
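The identifier scheme is mechanical enough to sketch in a few lines. The helper names `child_ids`, `parent_id` and `level` are illustrative assumptions. `level` follows the convention used here that processes 1, 2, ... appear on the level 1 DFD.

```python
from typing import List, Optional


def child_ids(parent: str, count: int) -> List[str]:
    """Identifiers for `count` subprocesses of the process `parent`."""
    return [f"{parent}.{n}" for n in range(1, count + 1)]


def parent_id(process_id: str) -> Optional[str]:
    """Identifier of the higher-level process, or None at the top level."""
    head, sep, _ = process_id.rpartition(".")
    return head if sep else None


def level(process_id: str) -> int:
    """Level of the DFD on which the process appears (1 = top level)."""
    return process_id.count(".") + 1
```

For instance, `child_ids("1.3", 2)` yields `["1.3.1", "1.3.2"]`, and `parent_id("1.3.2")` leads back to `"1.3"`.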


Note that all of the data flows to and from the high-level process have to be represented at the lower level. They can be either duplicated or broken down to several flows. If new data flows are identified at the lower level which cross the frame (indicating they are not internal to the process), these should be reflected at the higher level so that consistency is maintained between the levels.
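This consistency check between levels can be sketched as a comparison of flow names. The sketch makes simplifying assumptions: flows are matched by name only and direction is ignored. A flow that is broken down at the lower level is declared in a `splits` mapping. All names, including the sample flows, are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Flow = Tuple[str, str, str]  # (flow name, source, destination)


def boundary_flows(flows: Iterable[Flow], inside: Set[str]) -> Set[str]:
    """Names of flows that cross the frame: exactly one end is inside."""
    return {name for name, src, dst in flows if (src in inside) != (dst in inside)}


def unbalanced(
    parent_flows: Set[str],
    child_flows: Iterable[Flow],
    child_processes: Set[str],
    splits: Optional[Dict[str, Set[str]]] = None,
) -> Tuple[Set[str], Set[str]]:
    """Return (flows missing at the lower level, flows missing at the higher level)."""
    splits = splits or {}
    expected: Set[str] = set()
    for name in parent_flows:
        # A parent flow is either duplicated or replaced by its parts.
        expected |= splits.get(name, {name})
    actual = boundary_flows(child_flows, child_processes)
    return expected - actual, actual - expected


# Process 1 decomposed into 1.1 and 1.2 (made-up example).
level2 = [
    ("order", "Customer", "1.1"),
    ("checked order", "1.1", "1.2"),   # internal, does not cross the frame
    ("invoice", "1.2", "Customer"),
    ("stock query", "1.1", "D1"),      # new flow crossing the frame
]
missing_below, missing_above = unbalanced({"order", "invoice"}, level2, {"1.1", "1.2"})
```

In this example `missing_above` contains "stock query", signalling that the higher-level diagram needs a matching flow on process 1.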


This concept can also be extended backwards: the level 0 DFD is a single-process diagram which summarises the inputs and outputs of the system under consideration. This is called the context diagram.


2.1 Advantages of decomposition




3. Constructing the Logical Data Flow Diagram


The first step is to read the specification carefully, looking for and listing all mentions of data the system is to handle. Some data originates in the environment and is supplied as input documents to the system. Some data is generated by the system and delivered as output documents. Some data is retrieved from or saved in data stores.

Hint: When identifying data implied by a specification look out for nouns.


The next step is to list all mentions of processing that the data undergoes.

Hint: When doing this look out for verbs.


Now you can begin to develop a data flow diagram.


The figure below shows the simplest data flow diagram.




The figure above shows a top-level description of a system specification. The system as a whole is viewed as one process. At this level of abstraction, the inputs to the system come from the environment and its outputs go to the environment. In the above figure there is a single external entity (source) which sends data (input) to the system, and a single external entity (sink) which receives data (output) from the system. This is commonly known as a ‘context diagram’, or a 0-level DFD.


If the system also updates an external data store (e.g. a database, a file, a record) then the context diagram will look like:



An internal data store would not be shown at this level of abstraction and would appear only in the subsequent refinements of the transform.
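One way to picture the relationship between a context diagram and a more detailed DFD is to collapse the detailed diagram: all processes and internal stores disappear into the single system process, and only flows that touch an external entity or external data store remain. The sketch below assumes flows are (name, source, destination) triples. The function name `context_diagram` and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple, Union

Flow = Tuple[str, str, str]  # (flow name, source, destination)


def context_diagram(
    flows: Iterable[Flow], externals: Set[str], system_name: str = "System"
) -> Dict[str, Union[str, List[str]]]:
    """Collapse a detailed DFD into one process with its inputs and outputs."""
    flows = list(flows)
    # Inputs: flows from something external into the system.
    inputs = sorted(n for n, src, dst in flows if src in externals and dst not in externals)
    # Outputs: flows from the system to something external.
    outputs = sorted(n for n, src, dst in flows if dst in externals and src not in externals)
    return {"process": system_name, "inputs": inputs, "outputs": outputs}


# Made-up level 1 diagram with one internal store (D1).
level1 = [
    ("order", "Customer", "1"),
    ("order record", "1", "D1"),       # internal store, hidden at level 0
    ("order record", "D1", "2"),
    ("invoice", "2", "Customer"),
]
summary = context_diagram(level1, externals={"Customer"}, system_name="Order system")
```

The flows to and from D1 do not appear in `summary`, in line with the rule that internal data stores are not shown at this level.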


3.1 Naming Convention


The name of a notation is usually written within the symbol. Choose brief verb phrases for processes and noun phrases for data flows.


It is important that the name should say only what is necessary. Do not describe the representation of the data, its recording medium, or its type; or how the transforms are implemented - say only what processing is to be done.


3.1.1 Hints on names on DFDs


Data Flows





Processes


verb + object/object phrase


Data Stores


Show only net flow in/out/both (i.e. indicate whether it is read-only, write-only, or updated).


4. Advantages of DFDs






S Skidmore, R Farmer and G Mills, SSADM Models and Methods Version 4.


M Goodland and C Slater, SSADM Version 4: A Practical Approach, McGraw-Hill, 1995.


Any book on system development will describe the concepts of data flow diagrams. Remember to use the notation used in SSADM V4.