Good morning. So far we have been handling data input and output using the keyboard as the input medium and the monitor as the output. We type in ASCII characters, they come in through cin, and cout produces an ASCII character stream which comes out on the monitor. But if you want data to be stored persistently, you cannot keep it in memory variables, because the moment a program finishes executing, no memory contents are left. You would like your data to be preserved for later use, and for that you use external files.
These are typically stored on external storage devices. In the introductory video I told you about three types of these devices. One is the conventional magnetic disk, which now typically comes in one terabyte or two terabyte capacity. Then you have what we call solid state disks, which are like memory on a disk, and you also have optical disks, which are CDs and DVDs. Capacities differ and speeds differ, but essentially the data remains persistent. You can store it, forget it, and any time later, tomorrow, one month later, one year later, you can read the data back from those devices.
How is that data organized on those devices? It is organized in files. Now, files are not handled directly by C++ programs, or by programs written in any language. Files are always handled by the operating system which runs the computer. The operating system has a component called the file manager or file system which handles these. As a result, each file is given a name and a path, which is the location of the file on the disk. It has a size, and each file has access permissions associated with it. For example, if I create a file, I can set permissions so that nobody else can read it, or so that all others can read it but nobody can modify it, and so on. All files which are written on disks are modifiable; they can be extended. The operating system manages the allocation of space on the disk appropriately and handles them.
Now, a C++ program can define and use files, but internally a C++ program uses what are known as file pointers, and the name of the file pointer becomes the file name within the program. Of course, such a pointer cannot be used unless you associate that internal name with a physical, external file. Such an association is made through functions which are available in C++.
In the normal input and output that we do, we presume that cin gets input directly from the keyboard and cout produces output directly on the monitor, but that is not strictly true. All input and output inside a computer happens through the notion of a file. The operating system automatically makes two files available to you. One is called stdin, for standard input, and the other is called stdout, for standard output. Whenever you start executing a program, these two files are created by the operating system, opened, and made available. Also, by default, stdin is connected to the keyboard and stdout is connected to the monitor. This happens automatically. That is why, whenever you say cin >> something, the operating system actually reads the stream of ASCII bytes from your keyboard and then hands over that line or record to your program.
It is possible to reconnect stdin and stdout to some other files, not necessarily the keyboard and monitor. This is called redirection. Suppose you execute a program from the terminal: say on Ubuntu you open a terminal, go to the directory in which Code::Blocks has compiled the program, and execute it there. Then it is possible to give a command like this: myprog < infile.txt > outfile, where myprog is the compiled version. The less-than symbol stands for input redirection. That means stdin, instead of being connected to the keyboard, now gets connected to a file on the disk called infile.txt. Naturally, this file should exist in the same directory in which your program is.
If it exists somewhere else, you will have to specify the complete path name of that file. In exactly the same fashion you can do output redirection, and the output can be redirected to a file stored on the disk. You then don't see any output on the terminal, because all output goes to that file. But the advantage is that the file is persistent. You can read it later any number of times, and if you run the same program three times, you can redirect the output to three different files. So you can have the outputs of all the executions independently and persistently available. Redirection can be only for input or only for output, not necessarily for both.
There is actually a third standard file which is also opened by the operating system for each C++ program that executes. It is called stderr, the standard error file. The operating system uses this file to write error messages, and you can write to stderr as well.
Let us also discuss two functions which are used to analyze text input in a formatted fashion and to create formatted output. These are the traditional functions from the C programming language, which continue to exist in C++. The C language did not have any input/output instructions. It did not have operators like the ones used with cin or cout. Consequently everybody used only these functions. Let us understand what these functions are.
The printf function converts values as per a specified format string and produces the result on stdout. For example, say I write printf followed by this funny-looking string, comma roll, comma batch. Assume that roll is an integer with value 12345 and batch is another integer with value 112. Then this printf will produce an output line containing the values of roll and batch. The first string that is written here is called the format string. It has various format specifiers. Each format specifier is identified by a percent symbol.
So %5d means convert the internal value into a field of 5 ASCII characters. %3d means convert the corresponding integer value into a field of 3 characters. Of course, if the value needs more characters than that, the field simply grows to fit it and your columns will no longer line up, but that is your choice. If you don't put any 5 or 3, the conversion will take as many positions as are required.
More importantly, this format string contains format specifiers, and there must be one format specifier for each variable or expression that you write after it. In addition, you can write any other characters in that string, and those characters will appear verbatim on the output. For example, you will notice that there is a blank here. This blank will appear as is. There is a newline character here. This will appear as is on the output. So this will produce the output 12345, one blank, 112, and a newline. If you had put 4 blanks here, you would have got 4 blanks. If you had put a tab character here, you would have got a tab character. If you had written abrakadabra, abrakadabra would have come out, so you would have got 12345abrakadabra112. Whatever characters you write will appear verbatim. So please note that the format string consists of two components: one, the format specifiers, which must be as many as there are variables, each corresponding to its variable; and two, any number of other characters, which are reproduced verbatim. This is for printing.
For input, the operation is similar but works slightly differently. For example, say I have m and n as integers, x and y as floats, and a character string name of 40 characters, and I want to read these values from an input line. Ordinarily you would have said cin >> m >> n >> x >> y >> name and typed in the values. But in a scanf function call you put a format string and then write &m, &n, &x, &y and name. Why?
Because scanf is supposed to collect values and bring them into the variables that you have defined. The variable values must change, and therefore you must pass them by reference. That is why pointers are passed. Note that name does not have an explicit & before it. Why? Because name is the name of an array, and the array name itself is a pointer to the 0th element. So implicitly it is a pointer.
There are 1, 2, 3, 4, 5 values expected, and you will notice that the format string contains 5 specifiers: %d %d %f %f %s. %d is integer, %f is floating point, and %s is a character string; %c would be a single character. Notice that there is one blank between each of these. Now, a blank in a scanf format string is interpreted differently. One blank stands for any number of blanks or other white space, and white space includes the tab character, for example. Consequently, if you type a line which has 5 blanks between 2 values, they are all equivalent to a single blank. So one blank means any amount of white space. Consequently I can type this line as 25, three blanks, -78, two blanks, 0.00763, and so on, or I can type it with three leading blanks before the 25 and with 7.63e-3 in place of 0.00763. That is a valid representation: you can give an input value just as you would have written it inside a program. scanf will interpret these values correctly and will assign exactly the same values to all the variables.
Please note that the character string that I input cannot contain a blank. If it contains a blank, scanf, like cin, stops reading the string at that point. So a blank is a separator anyway.
But suppose you do not want to depend on blanks to separate values. For example, my input data line is this: 123456fanbelt150.50. This is a typical sample of a subset of what we call an inventory record. So 123456 may be an item identifier.
Fanbelt may be an item code, which is a character string, and 150.50 may be the value in rupees that is paid on average for one fan belt. Why am I not giving spaces in between? When I type, I can give a space. But please notice that such inventory records, for lakhs of items, will be stored persistently in a file, and when you store them in a file you don't want to waste storage space. There is no great advantage in putting a blank between values, because a human being is not going to read them. The computer is going to read them. So you would like to save space, and you might want to have a record on the disk which looks like this.
Unfortunately cin cannot read this as three different values. It will read it as one continuous character string, not meaning anything more than that. However, you can use scanf and give %6d, %7s and %f, with the variables &a, itemcode and &x. Notice that itemcode is a char string, so the name itself stands for a pointer. This will read the values correctly. It will interpret the first six characters as the value for a, the next seven characters as the value for itemcode, and the characters after that, until a space is found, as the value for x, because you have said %f. If you had given a specific width for %f, it would have read exactly that many characters. This is a facility which scanf provides.
As was stated in the lectures, I would advise you to read the C++ tutorials online, which define the scanf and printf format specifiers exhaustively. I think Abhiram Anand's book also gives enough details on that. There are variations of these scanf and printf functions which are very useful.
For example, what is being read in is just a string of characters, and what is being put out is a string of characters. So instead of interpreting data typed on the input, or creating an output line on the monitor, I could read an entire line into some string variable, or compose a string internally in a character array and then output the entire string. So it is possible to apply scanf and printf to strings which are internal. That is done by sprintf and sscanf. They work exactly like printf and scanf, except that for sprintf the destination is not the terminal or stdout but a string s, which is a character array variable that you have. Similarly, for sscanf the source is not the keyboard but the string s. So if you have somehow read an entire input line into a character string, you can apply sscanf to it to interpret the different values.
Exactly the same functions can be applied to lines in a text file, so the string may exist in a text file instead of in an internal variable. We shall see how text files are organized, but you can imagine it: the program that you type, for example, is stored in a text file. A text file is nothing but a series of lines. So if you want to read a line from an input file, then instead of scanf, which insists on working on stdin, you write fscanf, which means do the same thing as scanf but read from fpin, which is a file pointer. Remember I mentioned that you will have a file pointer; of course, the file pointer has to be associated with an external file, and we shall see that shortly. Similarly, to output a formatted line to a text file, you say fprintf. These variations of printf and scanf are extremely useful whenever you are dealing with text data, or converting text data into internal variables and vice versa.
To take this further, here is an example of how text data gets created in an external file to begin with. You can type the text data in manually, but when you are talking about thousands or millions of lines, such data usually comes from some other application. There are specialized data entry programs, for example, which capture information about, let's say, a new account that has been opened in a bank. A typical bank in India has 2 crore or 4 crore accounts, and on any one day there may be 20,000 accounts opened across the country. You fill up the form, but the data entry for it is done at a back office somewhere, in a common place. Now, they can't afford to write a C++ program which reads every line as it is typed: what if there are mistakes? It is an elaborate process of entering the data in a file, then somebody verifies that data, and only then is the entire file processed by the banking system to create the accounts.
In exactly the same way, a spreadsheet can be used to create text files. You are familiar with spreadsheets, whether it is Microsoft or OpenOffice or any other. Here, for example, is a spreadsheet in which each row contains data about one student: it has roll number, name, lab batch, for example 112, 111, etc., and marks in some quiz. Of course I can have more columns, but these are the typical columns in one sample spreadsheet. Now, spreadsheets are stored in special internal formats by the programs which work on them. Whether it is Microsoft, OpenOffice, or any of the other spreadsheets you use, they store data in their own peculiar format. However, all of them are capable of saving the data in an ordinary text format, which is called the comma separated values format, or CSV format. You can try it out: enter some rows in any spreadsheet and save it as a .csv file.
Then the file that you get will be a text file which looks like this: 10101,Anil Shah,112,... and then 10106,Avinash Arsway,112,14 and so on. Now this is similar to the kind of data that you handle, except that it has commas. Comma separated values is what you will ordinarily save. You can actually give any separator while saving the file, but the comma is the most common form. I am showing this because we will be discussing a program written to read such lines. Please note that these lines cannot be read easily through cin, because the comma is a troublesome character. We do not know how to handle it in cin. We also do not know how to handle it in scanf, although you can figure that out. But for the purpose of illustrating the use of external files, we shall treat this data file as a text file to be used as input, and we want to create an output file which is also a text file but has blank spaces instead of commas, because once we have blank-separated values we know exactly how to handle them with fscanf or whatever.
So a program to process data from a CSV file essentially reads one line from the input file as a single string, linestr. The first line is 10101, Anil Shah, etc., and this entire line is read into linestr. Now, the logic I have decided to use here is that this string itself consists of four strings. The first is the roll number, the second is the name, the third is the batch and the fourth is the marks, all as text. So, arbitrarily, I decide that I will separate the four parts into four different strings. Then I interpret each string and convert its value into an appropriate internal format. For example, I can define int sr, char sn[30], int sb, float sm. What are these? This is the roll number, this is the name, this is the student's batch and this is the student's marks.
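The separation and conversion of one line can be sketched as follows. This is only one possible way of doing it, not necessarily the code used in the lectures. The names Student, next_field and parse_csv_line are hypothetical, and the sketch copies the name instead of converting it with %s, since a name such as Anil Shah contains a blank.

```cpp
#include <cstdio>
#include <cstring>

// Internal variables for one student, named as in the lecture.
struct Student {
    int   sr;        // roll number
    char  sn[30];    // name
    int   sb;        // batch
    float sm;        // marks
};

// Copies characters from p up to the next comma, newline or end of string
// into dest (at most size-1 characters), then steps p past the comma.
static void next_field(const char*& p, char* dest, std::size_t size) {
    std::size_t i = 0;
    while (*p != '\0' && *p != ',' && *p != '\n') {
        if (i + 1 < size) dest[i++] = *p;
        ++p;
    }
    dest[i] = '\0';
    if (*p == ',') ++p;
}

// Splits linestr into four text parts and converts each part.
// Returns true only if roll, batch and marks were all converted.
bool parse_csv_line(const char* linestr, Student& s) {
    char sroll[10], sname[30], sbatch[10], smarks[10];
    const char* p = linestr;
    next_field(p, sroll,  sizeof sroll);
    next_field(p, sname,  sizeof sname);
    next_field(p, sbatch, sizeof sbatch);
    next_field(p, smarks, sizeof smarks);

    std::strcpy(s.sn, sname);   // both arrays hold 30 characters
    return std::sscanf(sroll,  "%d", &s.sr) == 1 &&
           std::sscanf(sbatch, "%d", &s.sb) == 1 &&
           std::sscanf(smarks, "%f", &s.sm) == 1;
}
```

A line such as 10101,Anil Shah,112,14.5 (sample marks invented for illustration) would give sr 10101, sn "Anil Shah", sb 112 and sm 14.5.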
So these are internal variables. Observe that this is equivalent to using cin or fscanf or sscanf directly, but I am just illustrating the process. So my process is: I associate my internal file pointer with an external input file, then I read one line from that input file and do all of this, the separation and conversion. Now I want to create an output line. I put the values together, separated by blank spaces, in a string called outstr. So I construct outstr internally and then write this string to the output file. I have an input file and an output file, and I am not doing any redirection. I am associating the input file with one file pointer and the output file with another, reading a string from the input file, doing all this processing, creating an output string, and writing the output string to the output file. This is exactly how text files are ordinarily processed. Of course, I repeat this procedure to process all lines of the input file.
So let us quickly go over the program logic: read one line of the input file, and while the file continues to contain more records, do all of this. Please note that any while or do-while requires a condition. Observe what you do when you use cin to read data for N students. There are two typical ways. Either you count how many students there are, say 575 in CS101, so you first read N as 575 and then read N lines. But suppose two students miss a quiz; then their data is not there and the count is wrong. Then there is an artificial method you can follow, namely that you keep reading data till you get a roll number which is -1, for example. The moment you input -1, your C++ program can test it and say, oh, the input has ended. But both methods are artificial.
In the case of files, the operating system is capable of telling you. Whenever you issue a read command to the operating system to read a record or a line from the file, the operating system will either give you a line or give you a signal: sorry, there is no line, the file has ended. The end of file is indicated by setting a flag, which you test with feof; we shall see that later. When the flag is set, you know that there are no more records. That is why the while condition is: while not end of file for the input file, keep going. Of course, as usual, you read the first line before entering the while. Then you process that line. Processing means: separate the parts into four strings, convert each part into internal variables, recreate an output text string with these four values, write this output string to the output file, read the next input line into linestr, and go back. Please note that when you read the next input line, you come to the while condition again. If there was no more line, the operating system will say end of file and you get out of the while. This is simple logic.
Here is a sample line: 10101, Anil, sorry, I will show it; for example, these are the values. What I want to do is read this into a string and then separate the string into four separate strings: sroll, sname, sbatch and smarks. It is an artificial decision, I mean I could have processed it in different ways, but this is what this particular program, which is there in the lectures, does.
The program simply defines all these character variables. It also defines the internal variables sr, sn, sb, sm and a few other things. The most important thing for file processing is the definition of two file pointers, fpin and fpout. Notice that FILE is written in capitals. FILE is a data type in C++.
Just as int x means x is an integer, FILE * something means a file pointer. So I am defining two file pointers, fpin and fpout. These are not the names by which the operating system knows the files. These are my internal names, and I can associate such a pointer with any external file that I wish. I want to read an input file called csv_data.txt. This is the name of the external file; this is how the operating system knows it. Now notice the association: fpin is equal to fopen. fopen is the function call which makes the association, which means it opens a file for processing. I am opening an input file. Notice that fopen has two parameters. One is the name of the file; it could be a complete path name if you want to be very clear about where the file is. The second parameter is called the mode. r means read. Actually there are two types of files, text files and binary files, and if you don't specify the type, by default it is a text file. r means open the file for reading; w means open a file for writing. There are many other modes which you can read about in the tutorial. So I am opening this file. Ordinarily, when I open the file, the operating system will associate csv_data.txt with my pointer fpin. But there is a possibility that this attempt may fail. Under what circumstances may the attempt to open a file fail?
The file may not exist in that directory, or, if the file exists, I as a user may not have the operating system's permission to read data from it. It could be any one of those reasons. In that case, how does the operating system indicate to my program that the association has not been established? It does so by sending you a null pointer. If the file is opened properly you get a proper pointer; otherwise you get a null pointer, which you examine: if fpin == NULL, well, shout, shout, shout, cannot open file, and return -1. Now please understand that whenever you are handling files this should be an absolutely standard routine. With closed eyes you should be able to write it: open a file and examine whether the pointer is null; if the pointer is null, get out, otherwise continue.
In exactly the same way you open an output file: fpout is equal to fopen of marks_data.txt, which is a file that I am creating, with w as the mode, which means it is being created as output, so records will be written out. Again, test whether fpout is null. But why would it be that an output file cannot be created? For an input file we can understand it: the file may not exist, etc.
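The standard open-and-check routine can be sketched as a small helper. The name open_or_report is hypothetical, and writing the message to stderr is a choice made for this sketch.

```cpp
#include <cstdio>

// Opens 'filename' in the given mode ("r" to read, "w" to write).
// On failure a message goes to stderr and a null pointer is returned,
// so the caller can get out immediately.
std::FILE* open_or_report(const char* filename, const char* mode) {
    std::FILE* fp = std::fopen(filename, mode);
    if (fp == nullptr) {
        std::fprintf(stderr, "cannot open file %s\n", filename);
    }
    return fp;
}
```

In main this would be used as fpin = open_or_report("csv_data.txt", "r"), followed by if (fpin == nullptr) return -1, and in the same way for fpout with "marks_data.txt" and "w".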
I am creating an output file, so what could be the problem? Any guess? Well, we should not call it memory; memory is the term used for the internal memory of the computer. But you are right: it may be that there is no space on the disk. Or, in the directory in which I am attempting to create the file, I may not have write permission given by the operating system. Please note that files are managed by the operating system; my programs do not control them directly.
There is a third possibility. Somebody else, or I myself, might already have created a file called marks_data.txt, so a file by this name already exists. What would happen in that case? Unfortunately the operating system doesn't care. If a file with the same name exists, it will be overwritten. Your previous file will be deleted, a new file will be created, and whatever you write in this program will be written into that. So it is your responsibility: if you want to create multiple output files, for example for different executions of your program, remember to give them different names. There is a neat facility for this in C++. I have written the name as a fixed string, but I could have a character string variable called filename, give it an appropriate value through a variety of means, including cin, and that file name would be used to open the file.
So, as I said, any output file that you want to create and write to should follow this standard procedure: fopen and examine the pointer; if the pointer is null, shout, shout, shout and get out, otherwise proceed.
Notice that I have opened an input file and an output file. I will now do the processing on these text files in almost the same fashion as I would for stdin and stdout, except that for stdin and stdout I could have used, for example, the gets and puts functions, but now I have to do it with files. So let us see how we do it: fgets. So
instead of gets I say fgets. Notice the parameters: linestr, 79, fpin. So fpin, which is my input file pointer, appears as a parameter. This statement says: read a line from fpin, but no more than the given limit, and put the result in linestr. Why 79? Because I have defined linestr as 80 characters. Strictly, the count passed to fgets includes the terminating null character, so with 79 at most 78 characters are read. I do not know what the line will contain; of course, if there are fewer characters in that line, only that many will be read.
Now look at the next statement: while not feof(fpin). feof is another function call, which returns the status of the file whose pointer is given as a parameter. So feof(fpin) means: at this juncture, has the file ended, has the flag for end of file been set? If so, it returns true; if not, it returns false, and not of false is true. So as long as end of file is not set, keep repeating. This is the equivalent of the logic that we saw: while not end of file, keep doing this.
Keep doing what? I have received a valid string, so I separate the four parts. You can look at this logic in the videos; it is simple. I just extract the characters up to each comma and put them into separate strings. Then I use sscanf as the extraction mechanism for the values. Notice that we have just seen sscanf. So for each of the strings sroll, sname, sbatch and smarks, I extract using the format specifier into &sr, &sb, &sm, and sn, which is itself a pointer. This is just an example; I could have done it in different ways. It is just to illustrate sscanf.
Having done that, I now do an sprintf into outstr. I want to write an output line of text, but I want to create that line internally first; I don't want to write it in portions. I can do that by saying sprintf, and here I am writing sr, sn, sb and sm as the values. Please note that these are all internal values: this is an integer, this is a character string, this is an integer, this is floating point. But using sprintf all of these will be put
as ASCII characters. Notice an important thing that has happened here: I am specifying the widths. This is five characters long, this is 30 characters long, this is 3 characters long, and this %5.2f means five characters with two digits after the decimal point. Consequently the output string that I am creating now has a fixed length. Earlier the lines were of different lengths, depending upon the length of the name; if some name was very long, that line would have been longer. Here it does not matter: all lines are now of fixed length. There is an advantage in having fixed-length lines, or fixed-length records, in a file, which we shall see in a moment. So is it clear how I am creating an output string like this?
To actually put it in the output file, I use fputs. Please note that the file fpout has been opened successfully, so every time I say fputs, outstr is written. Notice that in outstr I am deliberately inserting a backslash n, so the result will be a proper text file. If I open it in a text editor it will look like a normal text file, with a newline character after every line you write. Then, for the next iteration, I get the next line, I increment the counter by one, and at the end of the loop body I go back and keep doing this. When I finish, I output all the statistics. But I should not forget to say fclose(fpin) and fclose(fpout), so that fpin and fpout are closed properly.
In short, using external files for processing data is a simple extension of the same conceptual mechanism as stdin and stdout. The major differences are these: you define files by using file pointers; you use fopen at the beginning of the program to associate the file pointers with external files; you use fgets and fputs instead of gets and puts for handling character strings (there are other functions which we shall see); you test the end-of-file flag when reading input data; and at the end you explicitly close fpin and
fpout. That is your responsibility. stdin and stdout are opened and closed by the operating system, but any such text file that you handle you have to close explicitly, just as you have to open it explicitly. Is that clear? So file processing is basically a simple affair.
But there is much more to file processing than just reading and writing data sequentially, and that is what we would like to look at now. Observe that we were writing fixed-length records of text there. Why is it necessary to write records as text? Why should I spend five characters, or ten, to write a floating point value with its decimal point, when a floating point variable requires only four bytes? Why should I write an integer which could be seven digits long as seven ASCII characters, when a seven-digit integer can be stored internally in a four-byte int? Why can't I define a structure which has all these four components and write the entire structure, in its internal format, to the external file? As long as I can read the entire structure back in exactly the same fashion, it should not matter. I am not going to edit data in external files manually; only my programs are going to look at it.
That is possible if I define a structure. So here is a structure for student info, for example. It has int roll, char name[30], int batch, float marks. How many bytes would such a structure occupy? Quickly: four here, 30 here, four here, four here, which makes 42. Unfortunately, when compilers allocate memory, they insist on starting components such as an int or a float at what is known as a word boundary. That means the starting address of such a component has to be a multiple of four. The first one starts at a multiple of four, but the next one has 30 bytes, so the byte after those 30 bytes is not at a multiple-of-four address. So two more bytes are used as padding. Consequently the size of this entire thing works out to 44 bytes and not 42 bytes. Now, how do you
and I know what the size is? This one we can understand and count, but if we just go by counting and I have a complex structure, who will do the counting? Please remember that a structure can have another structure as an element, it could have an array as an element, all kinds of things. So there is a mechanism in C++. When you define a structure variable, say student_info s, then, as you already know, the individual elements of s can be accessed as s.roll, s.name, s.batch and s.marks. The size in bytes of a structure can be found by using the sizeof operator. So you say sizeof(struct student_info), not of the variable but of the type: what is the size of that type? The structure is a type that you have defined. This you can store in an integer variable, say recsize. If you print recsize, it will turn out to be 44 bytes. Whatever the number is, you need to know exactly what the size is.
So you have now created a structure for holding a student's information in an internal format, where roll number is an int, name is a string, batch is an int and marks is a float. No text, nothing; internal format. I would like to create a database file of students in which record after record contains such a structure, for the first student, second student, third student, fourth student. And please note that a structure is a fixed-length record, so I would be creating a file which has a fixed number of bytes in each record, one for each student. Having a fixed record length has certain advantages, because of the peculiar nature of disk files, whether they are stored on CDs or memory sticks or hard disks or whatever. The facility is that you can access data in a binary file directly by going to any position. Just as in an array you can go by an index i to the i-th position, on a disk you can go to any position and access the data there. So let us see how a direct access binary file operates.
In an operating system environment, every file which is
open will have an associated file pointer, which we already mentioned, say fp. But the operating system maintains an additional internal position indicator. This position indicator is internally positioned at some byte position, and that is the position where any read or write will happen. Upon opening a file, the position is set at 0, which is the beginning of the file. This is done by the operating system. Please note that ordinarily this is not something you store in a variable; it is maintained by the operating system. But every time I do a read or write, it happens at this position. So if for some reason the position is here, any read will read so many bytes from here, and any write will write bytes from here. And any time I do a read or write operation, this current internal position indicator will be advanced by the number of bytes read or written. All this is done by the operating system.

Say I am at the beginning of the file and I read 44 bytes from this point; that is, I issue a read command to read one structure variable. It will read 44 bytes, and the position will now be at byte 44: bytes 0, 1, 2, 3, 4, up to 43 would have been read. So automatically every read or write operation will keep advancing this, and that is how you can process the data sequentially if you so wish. But if you want, you can say: I want to go to byte number 24538, and from that byte position I want to read 100 bytes. Such funny things are possible, such interesting things are possible, and such useful things are possible. Let us see how.

If I want to be able to go to any place and read or write, I should have the capability of setting this current internal position to a desired point. I should also have the capability of knowing where the current position is. So knowing and setting are the two important things I should be able to do, for which C++ provides facilities. The first facility is to find out what the current internal position is. There is a function called ftell; tell is a very appropriate word: tell me where you
are. So ftell(fp): if fp is a file pointer, the operating system will look at its internal position and return that byte number, and that is what you store in, say, pos. Notice that by definition, by requirement of C++, pos cannot be a simple int variable; it has to be a long variable, because ftell returns a long. So you declare long pos, and anytime in your program you say pos = ftell(fp), the operating system will tell you: look, I am exactly at this point. So if you open the file and call ftell, you will get 0, because that is where the position is; otherwise it will tell you exactly where it is.

That is one part. But if you want to use the file at a specific position, you should be able to tell the operating system that you want the position to be set there. So I want to tell the operating system: I do not care where you are, or even if I know where you are, I want you to seek to this position. This is done by another function called fseek. The word seek comes from the old traditional disk architecture, where there is an arm and there are tracks on the disk, and that magnetic arm will actually move forward or backward to go to a desired track on the disk. That operation used to be called a seek operation: the arm is seeking to go to a track. That is why the word seek has come, and the function name is fseek.

Please note fseek has three parameters. The first is a file pointer fp. The second is the position where I want to go, say p. Again p, like your pos, has to be a long int variable, so you have declared long p and you said p equal to 1,25,430 or any arbitrary number, and you will go to that particular byte position. But from where you count that byte position is very interesting. If this is your file, for example, then the position is currently at some point. Now when you say go to this byte position, you are actually not giving an absolute position; you are giving a displacement, and this displacement can be counted from the beginning, from the current position, or from the end. I can
say go to position minus 20 from the end, or I can say go to the 100th byte from the beginning, or I can say go 112 bytes from wherever you are currently. Consequently, when I specify a p, please note that p is ordinarily not an absolute byte number; p is a displacement, and that displacement can be relative to the beginning, the current position or the end. With reference to what you are giving that displacement is indicated by the third parameter. SEEK_SET is the name of the value which indicates "from the beginning". So if you say SEEK_SET as the third parameter, then p is actually an absolute byte number, and you go to that particular byte. There are other values possible for this parameter besides SEEK_SET: SEEK_CUR for the current position and SEEK_END for the end position. These are predefined names in C++; they typically translate into the values 0, 1 and 2: 0 means from the beginning, 1 means from the current position and 2 means from the end, backward.

Do you understand how significant this facility of fseek is? It does not matter whether the file is millions of bytes or trillions of bytes; I can go to any position in that file directly. Going directly means the amount of time saved is enormous. Can you imagine what it would mean to get to that byte by reading one byte at a time? That is the difference between sequential processing and direct processing. Once you go to that position, you can read or write data at that position.

Now we come to the fixed-length record advantage. We know the record size, the number of bytes in the record; say it is s. If we know the relative position of the record, that this student is at the fifty-third position, that is, he is the fifty-third student (first, second, third, fourth, up to fifty-third), then I can subtract one from that record number and multiply by s; that is the byte position in the file. So if I know the mapping between the roll number of a
student and the position of the record in the file, I can directly access that record. This is a facility which is not possible in a text file at all. Let us say I have four lakh students appearing for the joint entrance exam. I have a file containing all four lakh students, and I want to find out some details about one student. In a text file I have no choice but to locate that student by sequentially scanning the whole thing. But if I have a mechanism to map the roll number of that student into a record position r, then I can directly access it: I use fseek to go to s * (r - 1) from the beginning, and I read the next s bytes for my record.

These are some relevant C++ functions for reading and writing direct access files: fread and fwrite. Simple functions. fread will read from the file fp as many records as you specify in the third parameter, each of the size rec_size given as the second parameter, and it will read those records into the structure referenced by &s. You have to pass &s, the address of s. This s ordinarily would be one record variable in which you are holding one student, and that is why you will say 1: one record is read into the struct variable s. For writing one record I do exactly the same thing with fwrite; the file must of course be opened for output, and I write rec_size bytes.

But this is a tremendously powerful statement. For example, instead of s I could have an array of a thousand structures, give the beginning pointer there, and write 1000 instead of 1; all the bytes will be read in one shot. I can read chunks of ten students' data at a time. I can do a variety of tricks; that is the power of fread and fwrite. The total number of bytes read is simply the product of the second and third parameters. I can specify any record size and any number of records, and that many bytes will be read. If the current position is somewhere and I want to reset it, there is a simple command called
rewind. This comes from the old tapes, which are spools that are rewound. So rewind means go to the beginning. Of course I can also do the same thing by saying fseek to zero from the beginning, which is the same as rewind.

Having done all that, here is a program which creates a binary file of the students' data from the input text file which was produced as output by our previous program. Look at how it does it. I say fp_input is fopen of marksdata.txt, and fp_output is fopen of studentdb. This is an arbitrary name I have given, studentdb, no file extension, nothing, but I open it as "wb": w is for writing, b means binary. So I am now creating a binary file; it is not a text file. There will be some special housekeeping information that the operating system may keep about that file, which is different from a text file. As usual, I check whether fp_output is NULL; otherwise I start processing like this. Notice I read the input string, I set s.roll, s.batch and s.marks, and I string-copy into s.name. What am I doing? The elements of s, which is a structure variable, are being given appropriate values for one student. Then I use fwrite; fwrite is the command which writes one record to fp_output from &s, the address of the variable. Please note I am creating output records one at a time because I am reading input lines one at a time: whatever I read, I create output; read, create output. I keep doing fwrite and then fscanf to read the next line, in the standard while iteration: while not feof(fp_input). That is it; the program ends by saying fclose(fp_input) and fclose(fp_output).

This is very similar to how we created an output text file. The difference is that this time we have created a binary file. It has fixed-length records, and each record contains values in an internal format, which we cannot decipher if we directly display the file contents on screen, because they are int and float and so on. But it's okay, because we propose to read that data back into exactly the same structure; we
don't care. Now you will understand why two's complement, floating point representation and so on have to be the same across different compilers; otherwise you could have problems. If you have written a file on one machine and you are trying to read it on another machine, you could have a problem. In fact there are some differences sometimes, and that is the reason why even binary files many times contain data either in a string format or in a format called BCD, for example, which contains half-byte decimal digits, some compression internally.

Next we discuss the program to handle this data. There are three components. This is the last program in the videos that have been uploaded. This program illustrates three different things: first, given a roll number, how I search sequentially; second, given a record number, how a record is read directly; and third, given a roll number and some implicit mapping, how the record of that roll number is directly read, its values changed, and rewritten at the same place on the disk. This is something which is not possible with text files.

So let's look at what this program does. It defines the structure and other things. Please note it calculates rec_size as the sizeof the structure; that is what is relevant. Please note that it is now opening the same studentdb file which we created, but this time it is opening it as "rb+": r is for reading, b means it is a binary file, not a text file, and plus means I will not only read but I may also modify. So this is the update kind of file opening mode. First the basic housekeeping: I open the file and I check whether the pointer is NULL for whatever reason. The sequential processing is like this. rewind(fp): I need not give this command here; it is just to illustrate that rewind can be given at any time, and your position becomes the beginning. Now I am searching for the marks of roll number 10105. I do a sequential search, except that the file is a binary file. So I calculate what the current pos is, just
for my information, and I want to find the marks for roll number 10105. So I set up an iteration; I am using a do-while loop. Do what? The while comes here: while not end of file on fp. This is the do loop, and inside it I read one record from the file, and if s.roll of that record is r, I have found it. So it is just a sequential search. What is important to note is that the file stored the data in an internal format, and fread correctly read that internal format. I just have to access the element called s.roll, and if I find it I print s.marks. I can also find out which record number it is, because I am also finding out pos at the current time, which I can convert into a record number using the record size.

Next is something slightly more involved: we want to demonstrate direct access to a record. So, find and print the sixth record in the database. First, second, third, fourth, fifth, sixth record: where is the sixth record? The first record starts from the 0th byte, the second one rec_size later, and so on. So the record position minus one, multiplied by rec_size, is the starting point. I assign 6 as rec_num because I want to read the sixth record. I calculate the position pos, which I have defined as long: pos is (rec_num - 1) * rec_size. And then I do what? fseek. I fseek to that position with SEEK_SET, and the moment I fseek, the internal position has gone there. I simply fread one record; this will be the sixth record. Notice how easy it is. If I know the record position, say the 1,25,434th record, then minus one into rec_size, I seek to that position, read one record, and then I can print that record starting at this byte position. So this is direct access. This is the peculiarity of binary files which, I repeat, is not available with text files.

Now I go one step forward. There is a student called Neelmani Rout; his roll number is 10108. I want to update Neelmani Rout's marks. Let's say he
had 91.5; now his paper has been re-examined and he should get, let's say, 93.5. So what do I want to do? I want to read Neelmani Rout's record, change the marks inside, and rewrite it at exactly the same position as the original record. But here I am making an assumption. How do I conclude it is the eighth record? I am saying that because his roll number is 10108. This time I am presuming a mapping between roll number and record position. Let me show you how I have presumed that mapping. r is 10108, so basically I have to read the record of roll number 10108. Notice what I am doing: I am calculating rec_num as r - 10100. What is the assumption here? All the students that I have are given roll numbers sequentially, starting with 10101, 10102, 10103, et cetera. What if roll numbers were given like that but some four students left in between? Well, for a simple mapping like this, it is worthwhile to create space for those four jokers also and keep nothing inside those records. If I do that, I will have records sequentially for 10101, 10102, 10103; if you have ten thousand students, I have ten thousand records like that. You get the advantage now: once I have all the records, it doesn't matter if some students are missing; I can map the roll number into a record position through a simple arithmetic expression. pos is (rec_num - 1) * rec_size; I do the fseek and I do an fread, and I know I have got Neelmani Rout's record.

The science of doing this mapping is extensively studied in other areas such as databases. You have studied binary trees; there are search trees for disks called B-trees, and there are index files. With them you may spend several reads in order to find out the position of a record and then go to that record directly, because reading from
disk is very costly. A conventional disk is at least one thousand times slower than main memory. That is why in main memory, in an array, you can handle four-byte elements and it takes the same time wherever they are. But on a disk, since it is a thousand times costlier to go to a particular position, you don't generally read four bytes; you read many bytes, one full record, et cetera. The most important thing is the mapping between the record position and the desired key attribute, such as roll number. Here is one example; there could be many other examples of such mappings.

So given this, I can read this record. I have read it. Now I want to update Neelmani Rout's marks. s.marks = 93.5 will simply update, inside memory, the component called marks; s is the structure into which Neelmani Rout's record has been read. I have updated it, but I want to rewrite it. Can I just say fwrite? What would happen if I did fwrite now? Please note that the previous fread read Neelmani Rout's record, but it has advanced the internal position further. If I do any write now, it will write over the next fellow. In fact, if I blindly do an fwrite, it will write Neelmani Rout's new record, obliterating somebody else's record, and Neelmani Rout will now have two records. That is not correct. What I must do is bring the current position back to where Neelmani Rout's record began. That is why, since the previous read has advanced the internal position, I do an fseek again. I already know pos; I calculated it and used it in fseek there. I can do fseek any number of times. When I do this fseek, the internal file position is brought back to the beginning of Neelmani Rout's record, and now when I do an fwrite, it will update the right record.

There are a few more lines of code written just to confirm that you actually updated only Neelmani Rout's record. To verify, you read it again from the disk. But again you have to do an fseek, because even the write operation has advanced the position. So
you do an fseek and you do an fread there. In the video there is an execution screenshot from Code::Blocks where you can see what happens.

So is this clear, how you can handle things? The crux of this is your ability to do a mapping, which is not as easy as it sounds, but it is possible. For example, suppose I was storing information about mobile numbers. Let us say I have a shop and I issue mobile numbers to people, and I have been given a bunch of mobile numbers to give. Let us say they all start from 98200 and then 10000, so I can sell 20,000 numbers or 50,000 numbers like that. Now it should be possible for me to do a simple mapping. I will create records for 98200 10000, 98200 10001, 98200 10002, and I can create 10,000 such records. So for each customer I have a record. What could that record contain? The name of the customer, the address of the customer, the proof that he gave (a scanned copy, a PDF file name or something like that), and the current value of money available for making phone calls: 200 rupees, 1000 rupees, 500.

Suppose somebody comes to my shop, gives a telephone number, and says I want to recharge my account by 300 rupees. So I say, what is your mobile number? I have a C++ program running here. I enter that mobile number, I do this mapping, I directly access that fellow's record, and I say how much value? 200 rupees; I type 200. I do exactly the same process: add 200 to his balance and update that record. This is a simplistic mechanism. Please note that for actual operations this entire file would have to be connected to the service provider's infrastructure, because after all the service provider must know that 200 rupees have been added, and that information must go back to your mobile through the service provider when you ask for a balance. All that happens in a distributed fashion, with multiple programs running elsewhere. And fclose is absolutely mandatory: you must close the file that you open.

So this is the practice problem. Assume that a binary file mobile.bin contains mobile information like mobile number, mobile recharge balance, et cetera; it has multiple other fields as well. Please note this down, because the problem itself is spread over two or three different slides. Very simple: you just need to write down that the file is called mobile.bin and it contains information like mobile number and mobile recharge amount, or you may call it mobile balance.

First, you want to write a main program which will open the binary file for reading and writing. The main program will call two functions. Please write down their names: the "mobile number exists" function and the "update recharge amount" function. The mobile number exists function, as the name suggests, would return a Boolean: the mobile number exists, yes or no. I mean, somebody says this is my mobile number, recharge 200 rupees; either that fellow may tell me a wrong number, or I might make a mistake in entering it, and the mobile number may not exist in my database. Please note that if I have done the simplistic mapping, then all mobile numbers will exist in the file, but whether an actual mobile number is active or not can be indicated by a field inside the record. For example, I can say "not active" means that the mobile number does not exist, or the record itself may not exist. Either way, you can take a call, but these are the two functions that you have to write.

Now what do you do in these two functions? The function update recharge amount should be called only if the function mobile number exists returns true. So declare these two functions with the required return value and the parameters that need to be passed. Please note you are dealing with files, and both these functions may be required to access the file, the first function to determine whether the number is there or not, meaning whether that number is active or not. Now this is something that you should note down: write the first function, mobile number exists, to check whether the mobile number entered by the user exists in the binary file: exists
as an active record, or exists at all, depending upon how you have decided the mapping. If it exists, the function should return true to the main program from where it is called. Your main program actually does two things. A customer comes to me and tells me: here is 200 rupees, this is my mobile number. So my main program will collect the mobile number as input. Then the main program will call this function that you are writing, mobile number exists; it will return true or false. If it returns true, then I will go to the next function, update recharge, and that function does the work of updating the binary file with the new recharge value. So obviously the main program should also collect the recharge amount as input, 200 rupees, 500 rupees, 1000 rupees, whatever it is, and the second function should do what that program did to update Neelmani Rout's marks: it should read that record, update the balance, and rewrite it at that same position.

So the actual function writing is very simple, because it is very similar to the illustration program that has been given, except that you are now required to write it as two functions, both callable from the main program. So there are issues like: where would you open the file? Obviously it does not make sense to open the file inside each function; best is to open the file outside, but then you may have to pass the file pointer for the actual activities to happen there. Then there is the position. When the first function finds out whether the number exists or not, while doing so it would have located the position. Would you like your update function to redo that work, or would you like to have some mechanism to get the position that has been computed by the first function? A better specification could have been that the first function is not a Boolean function but actually a long function, which returns the starting position of the record, or returns minus one if the number does not exist, for example. So, you see, these are the design decisions that you will take. Why I am
telling you these elaborate mechanisms is that there is no unique way of solving a general data processing problem. When you design your projects, these are the design decisions that you will have to take collectively during the project design: what parameters should be used where, how the functions should be written, what should be the interpretation of values inside the records, et cetera. All right. Well, with that we come to the end of this lecture. Thank you.
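
For reference, the direct-access steps described in this lecture (compute the byte offset from the record number, fseek, fread, modify, fseek back, fwrite) can be sketched roughly as below. This is only a minimal sketch and not the program shown in the video. The structure name StudentInfo, the 32-character name field (chosen so that the record comes to 44 bytes on typical compilers), the constant FIRST_ROLL and the helper functions recordOffset, createDb, readRecord and updateMarks are all assumptions made for illustration. It also assumes the simple mapping discussed above, where roll numbers are consecutive starting from 10101.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Fixed-length record. The 32-character name is an assumption, chosen to
// give the 44-byte record size mentioned in the lecture on typical compilers.
struct StudentInfo {
    int   roll;
    char  name[32];
    int   batch;
    float marks;
};

const int FIRST_ROLL = 10101;  // assumed: roll numbers are consecutive from here

// Byte offset of record number recNum (1-based): (recNum - 1) * record size.
long recordOffset(int recNum) {
    assert(recNum >= 1);
    return static_cast<long>(recNum - 1) * static_cast<long>(sizeof(StudentInfo));
}

// Create a small binary file holding `count` consecutive records ("wb" mode).
bool createDb(const char *path, int count) {
    FILE *fp = std::fopen(path, "wb");
    if (fp == nullptr) return false;
    for (int i = 0; i < count; ++i) {
        StudentInfo s{};  // zero-initialise so unused name bytes are defined
        s.roll = FIRST_ROLL + i;
        std::snprintf(s.name, sizeof s.name, "Student%d", i + 1);
        s.batch = 1;
        s.marks = 50.0f + i;
        if (std::fwrite(&s, sizeof s, 1, fp) != 1) {
            std::fclose(fp);
            return false;
        }
    }
    return std::fclose(fp) == 0;
}

// Direct access: seek to the start of the record, then read exactly one record.
bool readRecord(FILE *fp, int recNum, StudentInfo &out) {
    if (std::fseek(fp, recordOffset(recNum), SEEK_SET) != 0) return false;
    return std::fread(&out, sizeof out, 1, fp) == 1;
}

// Read-modify-write. The read advances the internal position, so we must
// seek back to the start of the same record before writing.
bool updateMarks(FILE *fp, int roll, float newMarks) {
    int recNum = roll - FIRST_ROLL + 1;  // e.g. roll 10108 -> record 8
    if (recNum < 1) return false;
    StudentInfo s;
    if (!readRecord(fp, recNum, s) || s.roll != roll) return false;
    s.marks = newMarks;
    if (std::fseek(fp, recordOffset(recNum), SEEK_SET) != 0) return false;
    if (std::fwrite(&s, sizeof s, 1, fp) != 1) return false;
    return std::fflush(fp) == 0;
}
```

With these helpers, a main program could create the file with createDb, open it in "rb+" mode, call readRecord(fp, 6, s) to fetch the sixth record, call updateMarks(fp, 10108, 93.5f) to rewrite only the eighth record, and finally fclose the file. Returning false for a roll number that maps beyond the end of the file is one possible design choice; returning the byte position or minus one, as suggested for the practice problem, would be another.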