Programming in C: Unit V: File processing

Unit V : Files

CHAPTER 10 : FILES

Takeaways

• Streams in C

• Error handling

• Renaming files

• Reading data from files

• Command line arguments

• Creating temporary files

• Writing data to files

• Random access of data


INTRODUCTION TO FILES

A file is a collection of data stored on a secondary storage device such as a hard disk. Until now, we have been processing data entered through the computer's keyboard. This becomes very tedious when there is a huge amount of data to be processed. A better solution, therefore, is to store all the input data in a file and then design a C program to read this data from the file whenever required.

Broadly speaking, files are used because real-life applications involve large amounts of data, and in such applications console-oriented I/O operations pose two major problems:

• First, it becomes cumbersome and time-consuming to handle huge amounts of data through terminals.

• Second, when doing I/O through a terminal, all the data is lost when either the program terminates or the computer is turned off. Therefore, it becomes necessary to store data on a permanent storage device (a disk) and read it whenever necessary, without destroying the data.

In order to use files, we have to learn file input and output operations, i.e., how data is read from or written to a file.

File I/O operations are almost the same as terminal I/O operations. The only difference is that when doing file I/O, the user must specify the name of the file from which data is to be read or to which it is to be written.

Streams in C

In C, the standard streams are pre-connected input and output channels between a text terminal and the program (available when it begins execution). More generally, a stream is a logical interface to the devices that are connected to the computer.

A stream is widely used as a logical interface to a file, where a file can refer to a disk file, the computer screen, the keyboard, etc. Although files may differ in form and capabilities, all streams behave the same way. The three standard streams (Figure 9.1) in C are as follows:

• standard input (stdin)

• standard output (stdout) and

• standard error (stderr).

Standard input (stdin) Standard input is the stream from which the program receives its data. The program requests transfer of data using the read operation. However, not all programs require input. Generally, unless redirected, input for a program is expected from the keyboard.

Standard output (stdout) Standard output is the stream where a program writes its output data. The program requests data transfer using the write operation. However, not all programs generate output.

Standard error (stderr) Standard error is an output stream used by programs to report error messages or diagnostics. It is independent of standard output and can be redirected separately. Standard output and standard error can, of course, also be directed to the same destination.

A stream is linked to a file using an open operation and dissociated from a file using a close operation.

Buffer Associated with File Stream

When a stream linked to a disk file is created, a buffer is automatically created and associated with the stream. A buffer is nothing but a block of memory that is used for temporary storage of data that has to be read from or written to a file.

Buffers are needed because disk drives are block-oriented devices: they operate efficiently when data is read or written in blocks of a certain size. The ideal buffer size is hardware-dependent.

The buffer acts as an interface between the stream (which is character-oriented) and the disk hardware (which is block-oriented). When the program has to write data to the stream, it is saved in the buffer till it is full. Then the entire contents of the buffer are written to the disk as a block. This is shown in Figure 9.2.

Similarly, when reading data from a disk file, the data is read as a block from the file and written into the buffer. The program then reads data from the buffer. The creation and operation of the buffer are handled automatically, by the C standard I/O library with the operating system adding its own caching underneath. However, C provides some functions for buffer manipulation. Output data resides in the buffer until the buffer is flushed, that is, written to the file.

Types of Files

In C, the types of files used can be broadly classified into two categories: ASCII text files and binary files.

ASCII Text Files

A text file is a stream of characters that can be processed sequentially by a computer in the forward direction. For this reason, a text file is usually opened for only one kind of operation (reading, writing, or appending) at any given time. Because text files only process characters, data is read from or written to them as a sequence of characters, conceptually one character at a time. In C, a text stream is treated as a special kind of file.

Depending on the requirements of the operating system and on the operation being performed on the file (read or write), newline characters may be converted to or from carriage return/line feed combinations. Besides this, other character conversions may also be done to satisfy the storage requirements of the operating system. However, these conversions are transparent to the program that processes the text file.

In a text file, each line contains zero or more characters and ends with one or more characters that specify the end of the line. The C standard only guarantees support for lines of up to 254 characters, including the terminating newline, although many implementations allow much longer lines. A line in a text file is not a C string, so it is not terminated by a null character. On systems that use a carriage return/line feed pair as the line terminator (such as Windows), each newline character written to a text file is converted to a carriage return/line feed pair, and each carriage return/line feed pair read from a text file is converted back into a newline character. On Unix-like systems no such conversion is needed.

Programming Tip: The contents of a binary file are not human-readable. If you want the data stored in the file to be human-readable, then store the data in a text file.

Another important point is that when a text file is used, there are actually two representations of data: internal and external. For example, an int value is represented internally as 2 or 4 bytes of memory, but externally it is represented as a string of characters giving its decimal or hexadecimal value. To convert the internal representation into the external one, we can use the printf and fprintf functions. Similarly, to convert an external representation into the internal one, scanf and fscanf can be used. We will read more about these functions in the coming sections.

Note

In a text file, each line of data ends with a newline character. The end of each file is signalled by the end-of-file (EOF) marker.

Binary Files

A binary file may contain any type of data, encoded in binary form for computer storage and processing purposes. Like a text file, a binary file is a collection of bytes. In C, a byte and a character are equivalent. Therefore, a binary file is also referred to as a character stream with the following two essential differences:

• A binary file does not require any special processing of the data and each byte of data is transferred to or from the disk unprocessed.

• C places no constraints on the file, and it may be read from, or written to, in any manner the programmer wants.

While text files can be processed sequentially, binary files, on the other hand, can be either processed sequentially or randomly depending on the needs of the application. In C, to process a file randomly, the programmer must move the current file position to an appropriate place in the file before reading or writing data. For example, if a file is used to store records (using structures) of students, then to update a particular record, the programmer must first locate the appropriate record, read the record into memory, update it, and finally write the record back to the disk at its appropriate location in the file.

Note

Binary files store data in the internal representation format. Therefore, an int value is stored in binary form as a 2- or 4-byte value. The same format is used to store data in memory as well as in the file. Like a text file, a binary file also ends with an EOF marker.

In a text file, the integer value 123 is stored as a sequence of three characters: 1, 2, and 3. Each character takes 1 byte, so storing the integer value 123 requires 3 bytes. In a binary file, however, the int value 123 is stored in binary form in 2 bytes (assuming a 2-byte int; an int occupies 4 bytes on many current systems). Binary files therefore often take less space to store the same data, particularly for large values, and they eliminate conversion between internal and external representations, which makes them more efficient than text files.
