Welcome to this session on processing data in external files in our C++ programs. Last time we saw how external files are organized on disk storage and other external storage, and also how we can associate special file pointers in our programs with such external files. We will now look into the details of how data can be read from external files or written to external files.
Let us first look at the logical model that C++ provides us. It is actually a very simple model, in which the file is considered to be a sequence of bytes. If there are 4 million bytes in a file, for example, then as far as our C++ program is concerned the file is a sequence of 4 million bytes. I have shown a view of some sample bytes in a file. Please note that there are portions here which I have left open; I will just mark them as not relevant as far as our processing is concerned. These are pieces of information stored by the operating system at the beginning of a physical file, at the end of a physical file, and so on, for its own internal bookkeeping. We are not concerned with that. We are concerned with the bytes which either exist in a file from which we have to read data, or which we want to write to an external file.
As far as a C++ program is concerned, once I open a file using a statement such as fp = fopen(...), the file pointer fp stays associated with this sequence of bytes for as long as the file remains open. In fact, you can regard the file as an array of bytes residing on external storage. What is the property of an array? In an array of bytes, say a char array, you can access any particular byte directly. In exactly the same fashion you can access bytes on external storage devices, particularly those devices which are called direct access devices, such as disks, CD-ROMs or pen drives. So the file is logically an array of bytes.
However, we read data from files or write data to files, and whenever we perform a read or write operation there is a certain position at which the reading or writing is done. That position is known internally to my C++ program through a variable which let us tentatively call pos. The internal variable pos might be pointing to this point. What it means is that any subsequent operation of reading or writing data will happen at this point. Whenever a read operation takes place, a certain number of bytes is read from the external file into the computer's memory, and the position is automatically advanced to the next byte position in this logical array.
There are, as I mentioned, a whole lot of functions which permit us to process this data. These functions are defined in the cstdio library. A few of them are relevant for processing data. For example, if I want to read, I use a function called fread. If I want to write, I use a function called fwrite. There is also a series of functions which permit me to read or define the internal position for the file. We will not go into those details now; we will discuss them in another session, on binary files. A binary file is a particular type of file which stores not necessarily text data but any kind of data, such as digital images, where a byte is not necessarily the ASCII code of a character. It could mean something completely different, depending upon the application which has written that byte or the application which will read it.
The fact is that external data is treated as if it were an array, or a sequence, of bytes. So, to recapitulate: when any file is opened, an internal file positioning pointer is automatically associated with that file by C++. Initially that pointer is at the beginning of the file.
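To make the idea of an automatically advancing position concrete, here is a minimal sketch; it is an illustration, not something shown in the lecture. It assumes the standard cstdio function ftell, which reports the current position of an open file and has not been introduced by name so far. The names ReadReport and read_and_report are hypothetical, chosen only for this example. The file is opened in binary mode ("rb") so that the reported position is a plain byte count.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical result type for this sketch: what one read achieved.
struct ReadReport {
    std::size_t bytes_read;  // how many bytes fread actually delivered
    long position;           // position reported by ftell after the read
};

// Opens the file at `path`, reads up to `count` bytes in one fread call,
// and reports where the internal position ended up.
// Returns {0, -1} if the file cannot be opened.
ReadReport read_and_report(const char* path, std::size_t count) {
    assert(path != nullptr);  // precondition: a file name must be supplied

    std::FILE* fp = std::fopen(path, "rb");
    if (fp == nullptr) {
        return ReadReport{0, -1L};
    }

    char buffer[64];
    if (count > sizeof buffer) {
        count = sizeof buffer;  // never ask for more than the buffer can hold
    }

    // Right after fopen the position is 0; fread moves it forward by the
    // number of bytes it actually read.
    const std::size_t got = std::fread(buffer, 1, count, fp);
    const long pos = std::ftell(fp);

    std::fclose(fp);
    return ReadReport{got, pos};
}
```

For a file holding the 11 bytes HELLO WORLD, read_and_report(path, 5) would be expected to report 5 bytes read and a position of 5. Asking for more bytes than the file holds simply yields fewer bytes and a position at the end of the file.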
As we read data from that file or write data to that file, the pointer automatically advances. We have tentatively shown it here under an arbitrary internal name, pos. We shall see later exactly how pos has to be handled, but if we are reading from or writing to a text file, where we read and write data sequentially, we do not have to bother about internal positioning pointers such as pos. What C++ guarantees is that whenever we read the next piece of information from the input file, reading automatically resumes where the last read ended; whatever is the next byte will be read from the file. In exactly the same fashion, when writing data to the output file, the writing always happens at the current position indicator pos, and the value of pos is automatically incremented by the number of bytes written to the file.
fread and fwrite are the two main functions used for reading and writing. For handling text data specifically, there are special functions. We will discuss fread and fwrite in the context of binary files later, but today let me describe how simple text data can be read from text files and processed. Imagine that fp is a file pointer associated through fopen with a text file, and I want to read text data from that file. It means that these various bytes actually contain ASCII-coded characters. You are familiar with the getc function, the get-a-character function. There is a very similar function called fgetc. The way to use it is to declare a character variable c and then simply say c = fgetc(fp). This gets the next character at position pos and assigns it to c. If you want to read not one character but a sequence of characters, that is, a string, then such a string is read using another function called fgets. It is very similar to get string, or gets.
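As an illustration of reading a file one character at a time with fgetc, here is a small sketch under stated assumptions. The function name count_char is hypothetical. One detail differs from the spoken description: the value returned by fgetc is stored in an int rather than a char, because fgetc returns an int so that the special end-of-file value EOF can be told apart from every real character.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper for this sketch: counts how many times `target`
// occurs in the text file at `path`.
// Returns -1 if the file cannot be opened.
long count_char(const char* path, char target) {
    assert(path != nullptr);  // precondition: a file name must be supplied

    std::FILE* fp = std::fopen(path, "r");
    if (fp == nullptr) {
        return -1;
    }

    long count = 0;
    int c;  // int, not char, so that EOF stays distinguishable

    // Each call returns the character at the current position and
    // advances the position by one; EOF signals that nothing is left.
    while ((c = std::fgetc(fp)) != EOF) {
        if (c == static_cast<unsigned char>(target)) {
            ++count;
        }
    }

    std::fclose(fp);
    return count;
}
```

Calling count_char with target 'a' on a file containing the two lines banana and bandana should give 6.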
The way you write it is to call fgets with three parameters: fgets(str, num, fp). The first parameter is a string pointer, such as you get when you declare a character string, let us say char str[100]. This is an array. Please remember that the name of an array is nothing but a pointer to the 0th element of that array, and therefore it is a character pointer. What this function does is read up to num minus 1 characters from the file associated with the file pointer fp and put those characters in the string pointed to by str. Obviously it is our responsibility to ensure that num does not exceed the size of the str array, but the value of num can be less than the size of the string that you have.
When you read data from an input file, there could be a situation where there is no more data, or where you encounter a newline character; for example, there could be a newline character here in this part. Imagine that the file is currently positioned here and there are only 23 characters before the \n is encountered while reading. Since fgets behaves like gets, it stops reading whenever it encounters a newline character. So it would have read only up to that point: the 23 characters and, unlike gets, the newline itself, which fgets keeps in the string. It transfers only those characters to the string array even if the num specified is, let us say, 50 or 90 or 100 (it cannot be more than 100, the size of the array, and one of those bytes is reserved for the end-of-string character).
What fgets also does is that, after reading the characters from the file and putting them in the string, it automatically inserts a \0 character. Remember that the null character is the natural indicator of end of string in C++. So this null character is automatically inserted by fgets. So, to recapitulate, fgets can read a string much as you would read a string using the gets function. It stops reading when it encounters a \n character, and if fewer bytes are read than were requested, then only that smaller number of bytes is transferred to the string.
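Here is a minimal sketch of a call to fgets, assuming the standard signature fgets(str, num, fp); the wrapper name read_line is hypothetical. Two details of the standard function are worth seeing in running code: fgets stores at most num - 1 characters so that the terminating \0 always fits, and when it stops at a newline, the newline itself is kept in the string.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical wrapper for this sketch: reads one line from `fp` into `str`,
// whose capacity is `num` bytes. Returns the number of characters stored
// (not counting the terminating '\0'), or -1 if nothing could be read.
int read_line(std::FILE* fp, char* str, int num) {
    assert(fp != nullptr && str != nullptr && num > 0);

    // fgets stops after a newline, at end of file, or after num - 1
    // characters, whichever comes first, and then appends '\0'.
    if (std::fgets(str, num, fp) == nullptr) {
        return -1;  // end of file (or a read error) before any character
    }
    return static_cast<int>(std::strlen(str));
}
```

With char str[100] and a file whose first line is first line, read_line(fp, str, 100) would store 11 characters: the 10 visible ones plus the newline. On a freshly opened copy of the same file, passing 5 as num would store only the first 4 characters.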
Every time bytes are transferred to the string, the string is automatically terminated with a \0 character. That means the \0 character is inserted by C++ in our string array, so we get a well-formed string as a result of fgets.
There is another situation in which our reading operation may go haywire. For example, suppose the file itself ends. I have specified 100 bytes to be read, or 90 bytes to be read, but the number of bytes available after pos is, let us say, only 35. No \n character is encountered; I just reach the end of the file. When I reach the end of the file, the operating system signals to the C++ program that there is no more data to be read. In such a case the C++ program behaves as if a newline character had been encountered. So note that fgets will read until it encounters either a \n character or the end of the file, and it will put as many characters as it has found into the string str, properly terminated by a \0.
It is quite possible that pos is actually located at this point during the operation; that is, it is at the end of the file. There is no more data to be read. Now it does not matter whether I say fread or fgetc or fgets: there is nothing to be read. This situation is indicated by the operating system to my program by returning a special signal. That signal can be tested as the end-of-file condition, using a special function called feof. feof, when invoked on a file pointer fp, returns true or false. When it is true, it means a read has run into the end of the file and there is nothing more to be read. If it is false, the end of the file has not yet been detected. Indeed, in a C++ program, when I want to set up an iteration, for example to read fifty, a hundred or a thousand sentences which have been put inside that text file, I will typically have a while loop at the beginning which says while (!feof(fp)), repeatedly do this.
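The loop just described can be sketched as follows. This is an illustration under assumptions, not the handout's program: count_lines is a hypothetical name, and lines are assumed to be shorter than 100 characters. The sketch keeps the while (!feof(fp)) shape but also checks the value returned by fgets, because in standard C and C++ the end-of-file indicator becomes true only after a read has actually run into the end of the file; without that check the loop body could process a stale string one extra time.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper for this sketch: counts the lines in the text file
// at `path`, assuming every line is shorter than 100 characters.
// Returns -1 if the file cannot be opened.
int count_lines(const char* path) {
    assert(path != nullptr);  // precondition: a file name must be supplied

    std::FILE* fp = std::fopen(path, "r");
    if (fp == nullptr) {
        return -1;
    }

    char str[100];
    int lines = 0;

    while (!std::feof(fp)) {
        // feof only turns true after a read has hit the end of the file,
        // so the result of the read itself must be checked as well.
        if (std::fgets(str, static_cast<int>(sizeof str), fp) == nullptr) {
            break;  // nothing more could be read
        }
        ++lines;  // a real program would process `str` here
    }

    std::fclose(fp);  // close the file once all input is processed
    return lines;
}
```

For a file with the three lines alpha, beta and gamma, count_lines should return 3 whether or not the last line ends with a newline.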
In other words, as long as the file has not ended, keep reading; when the file ends, stop reading, stop processing and get out. When I finish all my input processing, I can close the file using an fclose step.
Among the sample programs given in the handout, there is a program which reads such strings, processes them, and does something similar to what was indicated: data about participants' roll numbers and marks is read and some statistics are produced. That program illustrates the use of several of these functions. Additionally, in the handout you will find a list of all the functions available in the C++ file library, with a brief explanation of what each function does. Thank you.