TECHNO TALK: file organization


Sunday, August 9, 2009

Indexed sequential files

We are all familiar with the concept of an index. For example, the directory in a large multistoried building is an index that helps us locate a particular person's room within the building. To find the room of Dr. Sam, we would look up his name in the directory (the index) and read the corresponding floor number and room number. Scanning a logically sequenced table in this way is preferable to searching door by door for a particular name. Indexed sequential files use exactly the same principle. An index table, kept in sorted sequence on the key value, is used to speed up access to the records without requiring a search of the entire file. The data records themselves can be stored in random sequence, because only the index table has to be in sorted order. This provides the user with a very powerful tool. Not only can the file be processed randomly, but it can also be processed sequentially. Since the index table is sorted on the key value, the file management system simply accesses the data records in the order of the index entries. Thus indexed sequential files give the user sequential access even though the file management system is reading the data records in a physically random order. This technique of file management is commonly referred to as the Indexed Sequential Access Method (ISAM). Files of this type are called ISAM files.
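The idea of a sorted index over randomly placed records can be sketched in a few lines of Python. This is only an illustration, not how any particular ISAM implementation works: the in-memory list stands in for the data area on disk, and the key numbers, the extra names, and the function names read_random and read_sequential are all made up for the example.

```python
import bisect

# Data area: records in arbitrary physical order (slot number = list position).
# Keys and names are hypothetical sample data.
records = [
    {"key": 305, "name": "Dr. Sam"},
    {"key": 101, "name": "Dr. Lee"},
    {"key": 212, "name": "Dr. Rao"},
]

# Index table: (key, slot) pairs kept in sorted sequence on the key value.
index = sorted((rec["key"], slot) for slot, rec in enumerate(records))
index_keys = [key for key, _ in index]


def read_random(key):
    """Random access: binary-search the index, then fetch the slot it names."""
    pos = bisect.bisect_left(index_keys, key)
    if pos < len(index_keys) and index_keys[pos] == key:
        return records[index[pos][1]]
    return None  # no record with this key


def read_sequential():
    """Sequential access: walk the index in key order, not the data area."""
    for _, slot in index:
        yield records[slot]
```

Calling read_random(212) goes straight to one record through the index, while list(read_sequential()) returns the records in key order (101, 212, 305) even though they sit in the data area in a different physical order.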

Saturday, August 8, 2009

Direct files

A direct file, also called a random or relative file, consists of records organized in such a way that the computer can locate the desired record directly from its key, without having to search through a sequence of other records. This means that online enquiry and updating of a few records take much less time than when batch techniques are used. However, a direct access storage device (DASD) such as a drum, disk, strip file, or mass core is essential for storing a direct file. A record is stored in a direct file by its key field. Although it might be possible to use the storage location numbers in the DASD directly as the keys for the records stored in those locations, this is seldom done. Instead, an arithmetic procedure called hashing is frequently used. In this method, an address generating function converts the record key number into a DASD storage address. The function is chosen so that the generated addresses are distributed uniformly over the entire range of the file area and, ideally, so that a unique address is generated for each record key. In practice these constraints are usually not satisfied, and the address generating function often maps several record keys to the same storage address. This is called a collision, and several methods are used to deal with it. One approach is to include a pointer field at the location calculated by the hashing function. This field points to the DASD location of another record that has the same calculated address value. When the computer is later given the key of a record to be processed, it reuses the hashing function to locate the stored record. If the record is found at the location calculated by the hashing function, the search is over and the record is directly accessed for processing. On the other hand, if the record at the calculated address does not have the correct key, the computer follows the pointer field to continue the search.

Advantages of direct files

The access to, and retrieval of a record is quick and direct. Any record can be located and retrieved directly in a fraction of a second without the need for a sequential search of the file
Transactions need not be sorted and placed in sequence prior to processing
Accumulation of transactions into batches is not required before processing them. They may be processed as and when generated
It can also provide up-to-the-minute information in response to inquiries from simultaneously usable online stations
If required, it is also possible to process direct file records sequentially in a record key sequence
A direct file organization is most suitable for interactive online applications such as airline or railway reservation systems, teller facility in banking applications, etc.

Disadvantages of direct files

These files must be stored on a direct-access storage device. Hence, relatively expensive hardware and software resources are required
Updating the file (adding and deleting records) is more difficult than with sequential files
Address generation overhead is incurred on every record access because of the hashing function
May be less efficient in the use of storage space than sequentially organized files
Special security measures are necessary for online files that are accessible from several stations
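The hashing and pointer-field scheme described for direct files above can be sketched as follows. This is a minimal illustration under stated assumptions, not a real DASD layout: the remainder-on-division hash, the table size of 7, the separate overflow list, and the names DirectFile, store and fetch are all choices made for the example. It also assumes each key is stored only once.

```python
TABLE_SIZE = 7  # assumed number of home addresses in the file area


def hash_address(key):
    """Address generating function (assumed: remainder after division)."""
    return key % TABLE_SIZE


class DirectFile:
    def __init__(self):
        # Home area: one slot per address calculated by the hashing function.
        self.home = [None] * TABLE_SIZE
        # Overflow area: holds records that collided with an occupied slot.
        self.overflow = []

    def store(self, key, data):
        record = {"key": key, "data": data, "next": None}  # "next" is the pointer field
        addr = hash_address(key)
        if self.home[addr] is None:
            self.home[addr] = record
            return
        # Collision: follow the pointer fields to the end of the chain,
        # then link the new record from there.
        current = self.home[addr]
        while current["next"] is not None:
            current = self.overflow[current["next"]]
        self.overflow.append(record)
        current["next"] = len(self.overflow) - 1

    def fetch(self, key):
        # Reuse the hashing function to find where the search starts.
        current = self.home[hash_address(key)]
        while current is not None:
            if current["key"] == key:
                return current["data"]
            # Wrong key at this location: follow the pointer field.
            nxt = current["next"]
            current = self.overflow[nxt] if nxt is not None else None
        return None  # key is not in the file
```

Keys 10, 17 and 24 all hash to address 3 with this function, so storing them one after another builds a chain of three records. Fetching key 10 succeeds on the first probe, while fetching key 24 needs two pointer hops, which shows how collisions erode the "direct" in direct access.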

Sequential Files

In a sequential file, records are stored one after another in ascending or descending order of the key field. In a payroll example, the records of the employee file may be organized sequentially by employee code.

Sequentially organized files that are processed by computer systems are normally stored on storage media such as magnetic tape, punched paper tape, punched cards or magnetic disks. To access these records, the computer must read the file in sequence from the beginning. The first record is read and processed first, then the second record, in the file sequence, and so on.

To locate a particular record, the computer program must read each record in sequence and compare its key field to the one that is needed. The retrieval search ends only when the desired key matches the key field of the record just read. On average, about half the file has to be searched to retrieve the desired record from a sequential file.
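The "about half the file" figure can be checked with a small sketch. It assumes every record is equally likely to be requested and that the requested key is present; the 100-record employee list and the function name sequential_search are invented for the illustration.

```python
def sequential_search(records, wanted_key):
    """Read records from the start; return (record, number of records read)."""
    reads = 0
    for record in records:
        reads += 1
        if record["key"] == wanted_key:
            return record, reads
    return None, reads  # whole file read without finding the key


# Hypothetical employee file with codes 1..100, stored in key sequence.
employees = [{"key": code, "name": f"Employee {code}"} for code in range(1, 101)]

# Look up every key once and average the number of reads.
total_reads = sum(sequential_search(employees, code)[1] for code in range(1, 101))
average_reads = total_reads / len(employees)  # works out to (n + 1) / 2
```

For n records the average is (n + 1) / 2 reads, so a 100-record file costs 50.5 reads per lookup under these assumptions. A key that is not in the file costs a full pass of n reads in this simple version.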

Advantages of sequential files

Easy to organize, maintain and understand
There is no overhead in address generation. Locating a particular record requires only the specification of the key field
Relatively inexpensive I/O media and devices can be used for the storage and processing of such files
It is the most efficient and economical file organization for applications in which a large number of file records are updated at regularly scheduled intervals.

Disadvantages of sequential files

It proves to be very inefficient and uneconomical for applications in which the activity ratio is very low
Since an entire sequential file may need to be read just to retrieve and update a few records, accumulation of transactions into batches is required before processing them
Transactions must be sorted and placed in sequence prior to processing
Timeliness of data in the file deteriorates while batches are being accumulated
Data redundancy is typically high since the same data may be stored in several files sequenced on different keys
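The reason transactions are batched and sorted first can be seen in a sketch of a one-pass update. It is an illustration only: the tuple layout, the sample pay figures, and the function name batch_update are assumptions, and the activity ratio is taken in its usual sense of the fraction of master records touched in one run.

```python
def batch_update(master, transactions):
    """Apply key-sorted transactions to a key-sorted master file in one pass.

    Both arguments are lists of (key, value) pairs in ascending key order.
    Returns the new master file and any transactions with no matching record.
    """
    new_master = []
    unmatched = []
    t = 0
    for key, value in master:
        # Transactions with a lower key than this record match nothing.
        while t < len(transactions) and transactions[t][0] < key:
            unmatched.append(transactions[t])
            t += 1
        # Apply every transaction for this key, in order.
        while t < len(transactions) and transactions[t][0] == key:
            value = transactions[t][1]
            t += 1
        new_master.append((key, value))
    unmatched.extend(transactions[t:])  # keys beyond the end of the master file
    return new_master, unmatched


# Hypothetical payroll master file: (employee code, pay), in code sequence.
master = [(101, 5000), (212, 6200), (305, 7100), (410, 4800)]
# Transactions arrive in any order and are sorted before the run.
batch = sorted([(305, 7300), (101, 5100), (999, 1)])

new_master, unmatched = batch_update(master, batch)
activity_ratio = (len(batch) - len(unmatched)) / len(master)
```

Every master record is read and rewritten however few transactions there are, which is why the approach pays off only when the activity ratio is high. Here two of the four records change, giving a ratio of 0.5, and the transaction for code 999 is reported as unmatched.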

Friday, August 7, 2009

File Organizations

System designers choose to organize, access, and process records and files in different ways depending on the type of application and the needs of users. The three file organizations commonly used in business data processing applications are sequential, direct, and indexed sequential. The best organization for a given application is the one that meets the user's needs in the most effective and economical manner, so designers must evaluate the distinct strengths and weaknesses of each organization before choosing. File organization requires the use of some key field or unique identifying value that is found in every record in the file. The key value must be unique for each record of the file because duplications would cause serious problems. In the payroll example, the employee code field may be used as the key field.
