Welcome to this session on computer programming. In this session, we are going to look at how to handle data from text files. Earlier, we saw that we can handle text data. We studied functions like printf and scanf and their variants sscanf, sprintf, fscanf and fprintf. We also saw how files are handled in C++: we have to define file pointers, and we have functions like fopen, fclose, feof, etcetera. In this session, we will use what we have learned to handle data from text files, rather than taking input from the keyboard and producing output on the monitor.

First, let us understand how text data originates in a text file. We can, of course, use standard text editors. There are special data entry programs which professional operators use to enter large volumes of data. And there are spreadsheet programs. Here is a spreadsheet, for example, in which I have entered some sample data for students. Each student has a roll number, a name, a batch number and marks obtained in an exam. You will notice that every row contains the same kind of information: roll number, name, batch number and marks.

Now, while spreadsheets store this data in a specialized format, it is also possible to save a spreadsheet in a plain text format called comma separated values, or CSV. What the spreadsheet does is write one line for each row, in the form value, comma, value, comma, value, comma, value, and so on for all the rows. Here is what a CSV data file looks like. I have here, for example, roll number, name, batch number and marks, then again roll number, name, batch number and marks, and there are as many lines as there are students in the data file.

What we wish to do is as follows. We want to take this CSV file as an input file and read one line at a time into a string, let us say line_string. For example, the first line is 10101,Anil Shah,112,12.5. This entire string will be read into line_string.
Now, we want to separate out the four parts into four different strings: the roll number in one string, Anil Shah in one string, 112 in one string and 12.5 in another. Then we wish to convert each part into an internal form, commensurate with the type of the variable that we use. For example, int sr and int sb will denote the roll number and the batch, float sm will denote the marks of the student, and char sn[30] will denote the name of the student. Next, we want to put these four parts together, now separated by blank spaces and not commas, in a string called out_string, which we define as, let us say, char out_string[80], and we want to write this string to the output file. We wish to repeat this procedure for all lines of the input file. Please understand what we achieve by this process: we convert the comma separated lines created by the spreadsheet program into blank separated lines in an ordinary text file, of the kind we normally process.

Here is the program logic. We read one line of the input file into line_string. Now, we set up a while iteration. Please note the while condition: while not end of file for the input file. That means as long as the input file contains data, we keep doing this. What do we do? We do exactly four things. One, process the input string and separate its parts into four strings. Two, convert each part and store it in an appropriate variable. Three, prepare an output string with these four values; note that we should separate these values by blank spaces. And four, finally, write this out_string to the output file. These four things we have to repeat again and again, and that is why we have the while loop. Please note that before entering the while loop we read one line, and before going to the next iteration we read the next line. Consequently, these four steps are always carried out on a line that has been correctly read. For every line that we read, we wish to separate the parts into four strings.
The first line, for example, is shown here, with each character separated by a vertical bar. We wish to have the first five characters in a character array called s_roll, the next characters, representing the name, in an array called s_name, the next characters, representing the batch, in a character array called s_batch, and the last characters, representing the marks, in an array called s_marks.

Notice how we handle this in the program. The program begins with the standard inclusions. Notice that we have defined line_string as 80 characters and out_string as 80 characters; these are the input and output character strings. We have used s_roll, s_name, s_batch and s_marks; these are four character arrays to store the four parts. We also have the variables int sr, char sn[30], int sb and float sm. These temporarily store the values, in internal format, that we get from the four character strings. Then there are i, j, k, n, etcetera, the standard index variables.

First, some housekeeping about files. We declare file pointers, as you know: FILE *fp_in and FILE *fp_out. So fp_in and fp_out are the names of the pointers which we shall use internally, but we must associate these names with external file names. This is done by the fopen statement, as you will recall. Here is the fopen. We say fp_in = fopen("csv_data.txt", "r"). This associates fp_in with an external file csv_data.txt, which should exist in the same directory in which our program is executing. The second parameter says that the file is used in read mode; that means it is open for input operations. Of course, it is possible that the file does not exist, in which case fp_in will be NULL. So, if fp_in is NULL, we just print an error message and return with a negative value. We do the same thing with the output file as well. We open marks_data.txt, but this time the parameter is "w", which means the file is open for writing. If the file does not exist, it will be created. If the file exists, it will be overwritten.
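The file housekeeping just described can be sketched like this. The function name open_files, the wording of the messages and the return value -1 are assumptions made for this illustration; the lecture's program does the same checks inline in main.

```cpp
#include <cstdio>

// Opens the input file for reading and the output file for writing.
// Returns 0 on success and -1 on failure (the value -1 is an assumption).
int open_files(const char *in_name, const char *out_name,
               FILE **fp_in, FILE **fp_out)
{
    *fp_in = std::fopen(in_name, "r");       // "r": must already exist
    if (*fp_in == NULL) {
        std::printf("Cannot open input file %s\n", in_name);
        return -1;
    }
    *fp_out = std::fopen(out_name, "w");     // "w": created or overwritten
    if (*fp_out == NULL) {
        std::printf("Cannot create output file %s\n", out_name);
        std::fclose(*fp_in);                 // do not leave the input open
        *fp_in = NULL;
        return -1;
    }
    return 0;
}
```

A caller would pass the two file names and the addresses of its two file pointers, and stop if the result is not 0.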
When we call fopen and assign the result to fp_out, fp_out gets a file pointer which connects it to this file. Of course, if for some reason we cannot create that file, then fp_out will be NULL. We test for it, and if that is indeed so, we say that we cannot create an output file and return with a negative value. This is the standard protocol that we have to follow while handling any external files: define file pointers, open the appropriate files and associate them with the file pointers, and in case you get a null pointer, print an error message and get out.

The input file is now open at this point. So, as per our program logic, we have to read lines one by one. Before setting up the loop, we read the first line. fgets is the function that we use. Notice that fp_in is the last parameter. That means: from this file, read a line into line_string, at most 79 characters. Why? Because line_string is only 80 characters long, and one character is needed for the terminating backslash 0.

Now comes the while loop. We say while not feof of fp_in; that means as long as the input file fp_in has not ended, I proceed with the loop. I know that I have a valid string here, because I have just read it into line_string. I now need to separate the parts. I start with i equal to 0 and k equal to 0. The idea is that I will use k as an index into the larger string, and I will use i as an index into each of the four individual character strings into which I wish to separate the parts.

Look at the simple way in which this separation is done. I know that the values are separated by commas. So, I assign line_string[k] to s_roll[i]. Please note that i++ and k++ simply mean that after this assignment both i and k are incremented by one. This goes on while the assigned character is not equal to a comma, and there is no body for this while. So, please note, what it does is keep assigning characters one after another till it encounters a comma. When it encounters a comma, the while loop ends.
At that point, the value of k will be one beyond the comma in line_string, and similarly the value of i will be one beyond the position where the comma was stored in s_roll. Wherever the comma was, I need to put a backslash 0 to end the s_roll string. So, I set s_roll[i - 1] to backslash 0 and reset i to 0. The value of k I continue from whatever it was, because it correctly points to the character just after the comma.

I do the same thing for the name. I keep extracting the name as long as I do not get a comma. It is possible that the name is actually shorter than the length that I have provided for s_name. Remember, s_name is a 30 character string. So, after encountering the comma, I put in another for loop. It says for j = i - 1; j < 29; j++, and I simply insert blank spaces into s_name. s_name[29] is set to backslash 0, and i is again reset to 0. I do the same thing with s_batch, and finally I do the same thing with s_marks. Observe that I do not test for a comma this time, but only for the end of the line string.

Having extracted these strings, I extract the relevant values from them. I use sscanf to convert the value from the string s_roll into sr, using a %d specifier in the format. Similarly, I convert the string s_name into another string sn. I convert s_batch into an integer sb, and I convert s_marks into a float variable sm. So, I have these four values separated out in internal form. Now, I collect these values and put them into out_string using sprintf. Observe that I am collecting sr, sn, sb and sm, and I am using the format %5d, %30s, %3d and %5.2f. I get a complete line constructed, which I output to fp_out using fputs. I also print this string for my own benefit.

Before going to the next iteration, I get the next string from fp_in and increment n, and that is the end of the while loop. So, the entire input file has been processed. At the end, I print a message saying that the input file has been read and printed and the output file has been created.
I output the number of records written and close the files fp_in and fp_out. That is the end of my program.

Let us see what happens when this program is executed. I got the roll number, and I got the name, batch and marks. But wait: this name was Anil Shah. This name was Shefali Pandya, for example. This name was Neel Mani Rao. What has happened to the second part of each name? Well, you will recall that when we read a name from one string into another using scanf or sscanf with the %s specifier, it stops at the first blank, and that is why the second part of the name has not come. However, the file has been correctly read.

In summary, we wrote a program to process data in CSV format from a file. In the next two sessions, we will create a binary file to store the database of students, and we will process the records in this file by using direct access to records. Please do refer to the C++ tutorials once again, and in particular read the reference on the functions of stdio. Read about all the relevant file functions. Thank you.