Today we complete our discussion on handling data in disk files. We have already seen how we use files both implicitly and explicitly. Implicitly, we use files as streams of bytes coming from the keyboard. cin always operates upon stdin, which is a default file pointer given by the operating system. Similarly, cout operates upon stdout, which is another default file pointer given by the operating system. There is one more file pointer called stderr. We have not used that very much; later, I will indicate how it is written to. These are all implicit usages of files. Observe that these are all text files, so you read or write data as character strings. Typically, individual lines are terminated by newline characters, and different values are separated by one or more blanks or other white space.
We have also seen that we can explicitly open files. We have FILE, in capitals, as the type with which we declare a file pointer, and we can associate that file pointer with any specific file on the disk by using the fopen function. Subsequently, we have seen how we can use fscanf, fprintf, fgets, etc. to read strings, write strings, and interpret values in a text file. However, we have always commented that when we keep a very large amount of data on the disk, we would like to directly access a desired piece of information, rather than sequentially scanning for it starting from the first byte of the file. It is in this context that we need to look at direct access to files in C++. We have seen very briefly that direct access is feasible, but exactly how we use it in our applications is not yet clear. In the process, we shall also revisit the basics of data management.
Last time, very briefly, we looked at the entity model. We said that a class of entities, such as a class of students, is represented by a set of attributes, and that the attribute values define a particular student in that class uniquely; the attributes could be name, roll number, marks, courses registered for, hostel, room, etc. We also saw that this model of an entity can be translated first into a tabular form, and later into a disk file: on the disk, you create a file containing information about all the entities of the class. There, you store the information about a single entity in a record, and each record has fields which hold the different attributes. So records and fields are the basic notions inside a file that supports our information model.
Essentially, then, we are looking for ways in which we can directly access a record, or directly access a field in a record, or directly access a record, bring it into memory, process it in various ways, and possibly update the information inside that record in the same place where the original record was. For locating such records, we need the notion of a primary key. We saw in our sample entity model for students that roll number is such a primary key. However, we observed that a roll number is a character string, and it therefore admits far more possible values than there are actual primary key values; out of all of them, only 4000, 5000 or 6000 will be valid. On the other hand, we know that a file on the disk can be treated as an array of bytes, and just as we can index an array to go to any particular byte in the array, we can go to a specified position on the disk and access one or more bytes from that position.
What it means is that direct access is feasible only if I can specify the numerical position of a particular byte on the disk, which means that if we use something like a serial number, it will be much easier to go to the corresponding record on the disk. Now, in real life, primary keys are not such well-formed integer numbers. They are usually character strings, such as a roll number, or a part number in an inventory system, or a train ID in a train reservation system, whatever. If at all we have to directly access a record which is uniquely identified by a primary key that is not numeric, then we must find a way to map such non-numeric keys onto an appropriate position value, which is a numeric value on the disk. This mapping is usually created using an index table. Index tables themselves could be very large, so they can be stored in files, and these are called index files. We will discuss these issues later, very briefly. For now, we are going to illustrate the use of direct access to our records and fields assuming that the primary key is numeric, and that there is a simple arithmetic calculation by which the primary key can be converted into a position on the disk.
Here is some recapitulation of what we have briefly discussed about files on disk. A file on disk is a stream of bytes. The file itself is pointed to by a file pointer, say FILE *fp or something. Additionally, for every disk file, the operating system maintains a current position, which we may briefly call POS. This is very much like an index into an array, once we accept that this sequence of bytes can be considered to be an array of bytes. In our program we declare an array as, say, char A[500], which means we have 500 bytes of information. Now, by using an index as in A[i], setting i to 25 and referring to A[i] means I am referring to the byte at index 25.
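The "simple arithmetic calculation" mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name recordOffset is hypothetical, serial numbers are assumed to start at 1, and records are assumed to be of one fixed length stored back to back.

```cpp
#include <cstdio>
using namespace std;

// Hypothetical helper: byte position of the record with the given serial number.
// Assumes serial numbers start at 1 and every record occupies recordLength bytes.
long recordOffset(long serialNumber, long recordLength) {
    return (serialNumber - 1) * recordLength;
}
```

With a record length of 182 bytes, the third record would begin at byte 364, much as A[i] picks out one byte of an array.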
In exactly the same fashion, there is an invisible position indicator, or position index, called POS. Well, POS is a name I have used; there is no such name that you can explicitly use for the position. It is invisible. This position pointer is maintained for every file by the operating system. It is not visible to you, but you can set its value and you can test its value, and that is what is important. Now, there is a function called fseek which sets the position POS for access. Ordinarily, when you open a file, this position pointer or position index is set at the beginning, that is, at the 0th byte of the file. Observe that the similarity with an array is exact: the first byte of the file is actually called the 0th byte by the operating system, and the last byte is at position (size of the file in bytes) minus 1. There is a size of the file that you can get, just as you can get a string length.
So, as I mentioned, POS is invisible to us; however, POS can be set by us. How do we set it? We would like to set POS at any desired position, so we will have a numerical value. This numerical value, which we give as a parameter to the fseek function, is always given as a displacement, and this displacement can be specified relative to any one of three positions in the file. The first is the beginning of the file, indicated by a predefined constant called SEEK_SET. When you include the standard I/O header, cstdio, this constant is defined. SEEK_SET means: relative to the beginning, move the POS pointer by whatever number of bytes I give. I can also set it relative to the current position, which is specified by SEEK_CUR, for current. And I can specify it relative to the end of the file, using SEEK_END. Obviously, in that case the displacement should be negative in order to access a meaningful byte on the disk.
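A minimal sketch of the three reference positions, assuming hypothetical helper names fileSize and byteAt. It relies only on the standard fopen, fseek, ftell, fgetc and fclose calls; ftell is the standard function that reports the current position, which the lecture has not named. The file is opened in binary mode here so that positions are plain byte counts.

```cpp
#include <cstdio>
using namespace std;

// Size of the file in bytes, or -1 on failure: seek to the end and ask for the position.
long fileSize(const char *path) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;
    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0) size = ftell(fp);
    fclose(fp);
    return size;
}

// The byte found after moving by `displacement` relative to `origin`
// (SEEK_SET, SEEK_CUR or SEEK_END), or EOF if no byte can be read there.
int byteAt(const char *path, long displacement, int origin) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return EOF;
    int c = EOF;
    if (fseek(fp, displacement, origin) == 0) c = fgetc(fp);
    fclose(fp);
    return c;
}
```

For a file containing the ten bytes ABCDEFGHIJ, byteAt(path, 3, SEEK_SET) gives 'D', and byteAt(path, -1, SEEK_END) gives 'J', the last byte, at position size minus 1.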
Now, I can set the index for an array by saying i = something, j = something, and then use it. The corresponding way of setting this POS index for a file is only the fseek statement. Once again I would like to remind you that POS is invisible to you. You cannot say POS = something; you cannot declare POS as a variable. Saying POS = 54345 is not permissible. The invisible variable is set by the fseek statement. Remember, a read or write operation will always happen at the current position given by this invisible variable, so before every read or write operation you must use fseek to set the pointer properly.
You then execute a read or write by using the functions fread and fwrite. These are different from cin and cout, and they are different from fscanf and fprintf. Those input and output statements operate upon text files to extract individual values; you can even read a whole line from a text file by using a gets or getline function. This is not like that. fread or fwrite is capable of reading or writing a specified number of bytes, and it does not matter what those bytes contain. They may contain text, they may contain binary data, whatever. Consequently, there is no notion of white space being skipped or of a newline character ending something. If you say read 20 bytes, it will read exactly 20 bytes from this position onwards: 1, 2, 3, 4, 5, 6, ..., 20 bytes, that's it. And those 20 bytes will be given to you. Where will you keep them? Obviously, in a buffer. Traditionally such a buffer is declared as a char array, but it could be a structure, it could be an integer variable, whatever, because whatever has been written on the disk will be read, and whatever is there in memory will be written.
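A minimal sketch of a raw read at a chosen position, assuming a hypothetical helper name readBytesAt. The '\0' written at the end is added by the helper itself so that the buffer can be printed as a string; fread adds nothing of its own.

```cpp
#include <cstdio>
using namespace std;

// Reads up to `count` bytes starting at byte position `pos` into buf,
// which must have room for count + 1 chars. Returns the number of bytes read.
// No white space is skipped and a newline is just another byte.
size_t readBytesAt(FILE *fp, long pos, char *buf, size_t count) {
    if (fseek(fp, pos, SEEK_SET) != 0) {
        buf[0] = '\0';
        return 0;
    }
    size_t got = fread(buf, 1, count, fp);  // blocks of 1 byte, `count` of them
    buf[got] = '\0';                        // terminator supplied by us, not by fread
    return got;
}
```

After a successful call, ftell(fp) reports pos plus the number of bytes read, which is the automatic advance of the position that sequential reading relies on.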
There is no translation, no change, no format difference, etc., that happens in the fread or fwrite statement. There is one more thing that happens when you execute these statements: every such operation advances the invisible pointer by as many bytes as you have read or written. This makes sense, because quite often you might want to read all the records of a file. In that case, when you open the file, POS is set at the beginning. When you first issue an fread for so many bytes, say 100 bytes, it will read 100 bytes, and the position is automatically advanced by the operating system to the next byte. So if you issue another fread without doing an fseek, it will read the next 100 bytes from there, and so on. This makes sense because it lets you sequentially access a file which is otherwise meant for direct access.
However, suppose you want to update a record. For example, at this position you have read 20 bytes. Now you want to modify those 20 bytes in memory and rewrite them in the same place. Say, for example, that the marks in the mid-semester exam are recorded here as a float value, and the marks have been updated because of re-evaluation, so you have to change the marks of a particular student. Exactly how we reach a particular student's record we shall see later. But assuming that these are the bytes I want to rewrite, then if, after an fread of these 20 bytes, I issue an fwrite after modifying the values, the writing will happen after those 20 bytes, because POS has moved forward. Consequently, if you want to rewrite a particular set of bytes, you must bring the POS pointer back to its earlier position. That means you must reissue an fseek. You can reissue it by giving exactly the same positioning that you gave to reach here, or alternatively you can issue an fseek relative to the current position with a negative value, which will bring you back by these 20 bytes or whatever. There are multiple ways of doing that.
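The read, modify, seek back, rewrite sequence can be sketched as below. This is a sketch under stated assumptions, not the lecture's program: the struct MarkRecord, its fields and the function updateMarks are hypothetical, records are assumed to be numbered from serial 1, and the file is assumed to have been opened for update in binary mode ("rb+" or "wb+").

```cpp
#include <cstdio>
using namespace std;

// Hypothetical fixed-length record; the field names are assumptions for illustration.
struct MarkRecord {
    int   serial;    // serial number, starting at 1
    char  roll[10];  // roll number as a character string
    float marks;     // the value to be updated
};

// Replaces the marks in the record with the given serial number, in place.
// Returns true only if the record was read and rewritten successfully.
bool updateMarks(FILE *fp, int serial, float newMarks) {
    MarkRecord r;
    long pos = (long)(serial - 1) * (long)sizeof(MarkRecord);
    if (fseek(fp, pos, SEEK_SET) != 0) return false;
    if (fread(&r, sizeof(MarkRecord), 1, fp) != 1) return false;
    r.marks = newMarks;
    // The read moved the position past this record, so step back by one record.
    if (fseek(fp, -(long)sizeof(MarkRecord), SEEK_CUR) != 0) return false;
    if (fwrite(&r, sizeof(MarkRecord), 1, fp) != 1) return false;
    return fflush(fp) == 0;
}
```

The second fseek uses a negative displacement relative to SEEK_CUR; repeating the first fseek with the same pos and SEEK_SET would work equally well.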
What we shall do now is look at some examples which illustrate how we use direct access files for such purposes. To illustrate the need for data to be maintained in a form other than text, I have taken the same example that we saw earlier, the one that creates an artificial output file from input lines of text containing your mid-semester marks statement. Do you remember this? We saw, for example, that I would have a serial number, comma, roll number, comma, name, comma, batch number, comma, marks, marks, marks, etc. This was the kind of comma-separated field values that I got from the spreadsheet in which this data was entered. And we saw that we can create an output file containing exactly the same text lines, but each marked with 5 stars at the beginning. It does not make any real sense; it was just to illustrate that I can use not only cin and cout, but also files, to read complete lines and to write complete lines, and that I can write any lines, not necessarily the ones I have read.
So, essentially, we have a file which contains this kind of data. Observe that although it is a text file, I can actually open it for input and directly access any byte position here also. However, we observe that the lines are not of the same length, because the name may be shorter in one case and longer in another, and the marks may be 2.5 in one case and 1 in another. Therefore the length of one line is not the same as the length of another, although the attributes maintained in every line are exactly the same for every student: for every student we have a serial number, roll number, name, batch, etc.
But the values differ so much that the total length in bytes is different. Suppose I still try to see what happens when I directly access records here. This is an illustrative program; we have not gone through it yet, but it opens the file and directly accesses bytes at positions 0, 182, 364, etc. Why 182 and 364? These are arbitrary numbers, but let us say I have found out that the longest line was 182 bytes, so I presume that if the lines were of fixed length, they would be 182 bytes long. So let us see what happens when I read at byte position 182, byte position 364, and so on. Just to illustrate direct access, I have decided that I will read and print 20 characters at each position. The objective here is not to do any meaningful operation, but just to illustrate that direct access is feasible even in the case of text files.
Let us see how this program proceeds. It first prints an output line which says that it is reading from arbitrary positions. Then there is long int filepos. This filepos resembles the POS pointer that I indicated on the previous slide, but it is not the same. filepos is an integer variable in my program; the capital POS was a hidden pointer used by the operating system. Since I have to give a displacement for seeking a position, that displacement has to be specified as a numeric value (here I specify an absolute value), and that variable is required to be a long int. On most 32-bit computers, such as the ones you use, long int may still be 4 bytes long, just like an ordinary int. However, on more self-respecting computer systems you would have long int as 8 bytes, because you should be able to handle very large files. In any case, long int is the default declaration for such an internal pointer; not a pointer, really, but a position value of the kind we mentioned. Notice that what I want to do is read from arbitrary positions, and I want to read 20 bytes each time. Now, when I read 20 bytes arbitrarily from a text file, you will agree that no end of string, no backslash 0, is implied; there is nothing like
that in the text file. So, in order to ensure that when I read these bytes and try to print them, what is printed is a valid string, I artificially set the 20th element to backslash 0. That means that whatever the remaining part of the string is, there is definitely a backslash 0 at the end. Remember, when I use gets or fgets, a backslash 0 is inserted by the operation that reads the text, because it knows it is reading a text string. That is not so in direct access, as I mentioned, so I insert it artificially.
Now I open a file and give a file name. If you recall, there was a program we saw earlier in which the file name was read from the user, and that file name is what is used here. I open this file for reading, so I just say "r". This is the program. For the relative positions 0, 182, 364, etc., I set up an iteration on filepos from 0 to less than some arbitrary value, say 1000, because I just want to illustrate a sample of direct access, and I increment filepos by 182 every time. Observe carefully the fseek statement: fseek(infile, filepos, 0). That is the format: first the file pointer, then the displacement by which I want to move the position, and then 0, which means SEEK_SET. I can, and actually should, use SEEK_SET here. I use 0 merely to illustrate that SEEK_SET, SEEK_CUR and SEEK_END typically have the internal values 0, 1 and 2. In fact, in traditional C we used to write 0, 1, 2; later on, as the language evolved, these named constants were defined for this purpose. So instead of this 0 I can as well write SEEK_SET; in fact, that is the right way of doing it.
What happens when I issue this fseek? The internal file position is set, with respect to the beginning of the file, to as many bytes as are indicated by filepos. Effectively, then, if I use such an fseek statement, filepos is almost the same as the internal POS, but fseek is the only way of setting that pointer. Having set that pointer, I read 20 bytes from there. Observe the read statement, fread with partrecord; this
is the buffer. Observe that partrecord is a character array, and it is 20 bytes long. The fread statement reads blocks of bytes, and it can read as many blocks as you want. The second parameter says what the size of a block is; in this case the size is 20 bytes. The third parameter says how many such blocks are to be read. Observe that if my buffer were large enough, I could read 5000 blocks of 20 bytes in one shot. So fread is a very powerful statement. In fact, if your file is small and well structured, with fixed-length records, you can read the entire file in one fread statement, in one shot, provided you know the number of records and your buffer can hold them. That is easily possible. In this particular case, however, for illustration, we are just reading 20 bytes at each position. We then print filepos, which is our internal variable, and partrecord, which holds the 20 bytes found there. Observe that partrecord is a valid string now, because we have artificially inserted a backslash 0.
So here are the execution results: reading from arbitrary positions, where 0, 182, 364 are the arbitrary positions. At 0 I get the first 20 bytes of the record. Observe that the record was much bigger; in fact the name was Joshi something, and it has been cut short because I have artificially cut that string off. Observe the next string: it starts with 11, 0, etc. I think you can surmise that the first 11 must be the batch number of someone, because there was no question which carried 11 marks, and the subsequent 0, 2, 3 indicate that these are not the total marks of a student, if you recall the values which are stored in the record. So these are an arbitrary 20 bytes from there. Specifically note what happens when you go to byte 364: you have 0, 17.5, then there is nothing more on that line, but on the next line you are getting the stars. What it means is that the newline character which existed in the text file is also read as one character.
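The sampling loop described above might look like the following sketch. It is a reconstruction under assumptions, not the lecture's listing: the function name sampleFile is hypothetical, the buffer is given 21 chars so that a terminator always fits, the file is opened with "rb" rather than "r" so that arbitrary byte offsets are portable, and fread is asked for 20 blocks of 1 byte (rather than 1 block of 20) so that a short read near the end of the file is still usable.

```cpp
#include <cstdio>
using namespace std;

// Prints up to 20 bytes found at positions 0, step, 2*step, ... below `limit`.
// Returns the number of samples printed, or -1 if the arguments or the file are unusable.
int sampleFile(const char *fileName, long step, long limit) {
    if (step <= 0) return -1;
    FILE *infile = fopen(fileName, "rb");
    if (infile == NULL) return -1;
    char partrecord[21];
    int samples = 0;
    printf("Reading from arbitrary positions\n");
    for (long filepos = 0; filepos < limit; filepos += step) {
        if (fseek(infile, filepos, SEEK_SET) != 0) break;
        size_t got = fread(partrecord, 1, 20, infile);
        if (got == 0) break;        // nothing left to read at this position
        partrecord[got] = '\0';     // make the bytes printable as a string
        printf("%ld: %s\n", filepos, partrecord);
        ++samples;
    }
    fclose(infile);
    return samples;
}
```

A call such as sampleFile(name, 182, 1000) corresponds to the positions used in the lecture; any newline inside the 20 bytes is printed as is, which is why some samples spill onto the next output line.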
Remember, fread has no special regard for a newline character, an end-of-line character, or an end of string; it will just read those 20 bytes. So, because there is a newline character in the string, when you send it to cout the output goes to a new line. Very clearly, that is so because 17.5 is probably the sum of the marks obtained by a student, and what follows is the next student: again I get the 5 stars, and then not even the full roll number, because the total number of bytes, if you count including the newline character, will be 20. The same thing happens at 546, where a part of the name comes, and something like this at 728, where you again get 0, then 17, then a newline character, and then there is 910, etc. So, from a practical point of view, this exercise is not yielding any meaningful information, but we have understood how to set the internal position pointer of a file at any desired location, and how to read from that point. If we want to convert this power into meaningful reads and writes, we will have to take some decisions.
So when is direct access meaningful? Direct access is meaningful only if I can access the complete record of an entity; otherwise it is not very meaningful. Further, it is meaningful only if, besides being able to access the complete record of an entity, I can reach the beginning point of that record. So not only must the capability to read a full record exist, but the ability to reach the position of the beginning of that record must also exist. Clearly, these two requirements are satisfied if I design my file such that I do not write arbitrary text in the file representing information about people or entities, but use fixed-length records. A fixed-length record is a record containing information about an entity which always has exactly so many bytes: 185, 73, 2154, whatever, but always a fixed number of bytes. Obviously, if the fields within a record have variable length, then it is my responsibility to ensure that I convert all these variable values into a fixed-length pattern and write those
fixed-length attributes juxtaposed against each other as a full record of fixed length. If I do that, the first requirement is met: whenever I can go to the beginning of a record, I can read exactly so many bytes and I have the full record for that entity. But how do I go to that position? That is where I need a position indicator. Once I have fixed-length records and I know the length of a record, I know exactly where the 5th record will be, where the 124th record will be, where the 1000th record will be, because I know each record occupies so many bytes.
There are some consequences of this usage. One, no artificial delimiter is required. In a text file the delimiter is backslash n; in fact, we depend upon that backslash n to get one line of data, just as we depend upon white space, blanks or tabs, to differentiate between different values. When I have fixed-length records, I do not need such an artificial delimiter between records, such as a newline character. Nor do I need an artificial delimiter between field values: I do not need to separate one field from another by a comma if, just as I fix the length of the record, I also fix the length of every field. In short, if I move from the realm of completely variable fields and variable records to a new realm of fixed-length records and fixed-length fields within each record, then I have an extremely nice, ordered situation, wherein knowing the record length I can go to the beginning of a record and read the complete record in one shot. Interpreting the values of the fields within that record is of course my responsibility. What I should ensure is that the fixed-length fields I have decided upon are written exactly the way I have maintained them in memory, so that when I read that data back into the same in-memory data structure, I directly get the values. Since these values need not be text, they can be int or float; in fact they can even be a structure, and a structure is an ideal organization of the
information about an entity, because in a structure I can define separate components, and each component can be construed to be a field in my record. Each of these components will have a fixed length. Even if you have an array of characters, which is a character string, the declared dimension of that array tells me that it is so many bytes. Using this notion, then, I can meaningfully deploy direct access to records.
So imagine that, for the students of a course, somebody declares such a structure, which is a type called student_info. As I said, it is an abstract data type: when I say struct student_info, student_info is not a variable; I have to declare a variable separately. In it I have an int serial number. Notice that, as decided earlier, we will use a serial number as the primary key, because we still do not know how to translate a roll number into a position on the disk; so we will use the serial number to define a position. Then there is char sroll[10], a character string for the roll number; char sname[40], the name of the student; an int for the batch; float marks[10]; char grade[3]; and a remark of 255 characters. Notice that the 10 elements of float marks are not marks for individual questions and sub-parts in an exam. This is designed to represent the total performance record of a student in a course, so these fields are allocated to store the marks obtained in different types of evaluation. The 0th element may be the marks obtained in quizzes, the first element may be the marks obtained in assignments, the second the mid-semester marks, the third the end-semester marks, the fourth the project, the fifth something else; another element could be a bonus, whatever. You can imagine that these 10 are more than the normally required number. Rather than deciding that this course has 4 types of evaluation and that course has 6 types of evaluation, and changing the nature of the record each time, it is better to allocate additional space. The idea is that if I further introduce a course code in this, then this record can be used by all courses and all teachers in the
institute. This is the general idea; when you design your files, I would expect you to apply this kind of thinking to the design. The advantage of using a struct should be obvious. Once I define a variable, say struct student_info s, that s is a variable which will be allocated memory. And how much memory? Exactly equal to the size of this struct. Just as the size of an int is 4 and the size of a float is 4, the size of the struct is something which is fixed. Don't you therefore have a fixed-length record? Similarly, you can read that many bytes from the disk into a struct, and when you read them, the bytes automatically fall into place; the separation of fields in the disk record is handled automatically by the read and write statements.
To illustrate this, I have considered another example, again of members such as students, but of a larger dimension. With this kind of record, how many students are there in IIT? With 8000 records in a file, I could store all of them in an array. To illustrate that there are situations where I cannot handle all the records in memory, I have taken this example, which is a real-life example. There is a National Mission on Education through Information and Communication Technology, which attempts to enhance the quality of education all across the educational system. IIT Bombay is part of a component of that national mission which concerns enhancement of the quality of professional courses in engineering colleges; we are targeting engineering college teachers and students. Very roughly, the process is as follows. We decide on a core subject which is taught in most colleges to most students. Then we ask teachers to come and participate in a 2-week workshop. We prepare the lectures, the tutorials and the labs not only in consonance with how we teach those courses here in IIT, but also taking into account how these courses are taught at various universities. The university syllabi are different and the question paper patterns are different, so our
attempt is to tell these teachers: if you want to teach in the style in which you have been teaching, this is the material that you can use; however, if you wish to adopt something from the IIT style of teaching, this is additional material. We engage them for 2 weeks, and all the recorded audio and video lectures, the transcripts, all the tutorials, everything is later released in open source so that people can use it. This open source material is accessible to students, teachers, everybody, and it can be modified by teachers and used; that is the meaning of open source. Incidentally, the recordings that are happening for the CS 101 course here will all go into open source afterwards.
Now, open source means what? Do we just put up a web page with links, just like a home page? The objective is not merely to give information, because access to information is not knowledge. If that were true, librarians would be the most knowledgeable people in the world. Applying one's mind to that accessed information, discussing, learning, discovering new things, that is what knowledge is. Therefore we wish to set up subject-wise portals. The first portal, on computer programming, will be coming up this December. The next portal will be on database management systems, which Professor Sudarshan is heading; he will be conducting a workshop this December on database management. These will be used by whom? By all students, by all teachers, and even by professionals. Imagine that you pass out of an institution and join, let us say, TCS or Infosys or some other company, and you still want to keep in touch with what is happening in this basic subject of programming; you are able to make use of the portal. In fact, once I say open source, the information can be accessed by absolutely anybody in the world. However, we also want to create discussion forums and collaborative groups, so that these open source contents are perpetually enhanced by interested people. People will contribute from this place and that place;
different groups will come together. What should be the total number of members who may be accessing this portal, for this subject or that? Our estimation went like this. We first estimated the number of students who join engineering programs in the country every year: about 6.5 lakh students join engineering courses every year, so the total number of students studying engineering in a 4-year program is roughly 2.5 million. This is only undergraduate students. Then you take the postgraduate students, and you take the new colleges which are opening every day (some will soon start closing also, but that is a different story), and there will be professionals who will be interested in joining. Therefore we estimate that these courses should be used by about 5 million members.
Now, you will agree that 5 million is a large number. It is not 5,000 students; it is 5 million students, teachers and working professionals. If they have to access the portal, how do we give them access rights? Anybody who only wants to read does not need an access right; that is open source. But anybody who wants to contribute must be recognized by us, so there has to be a registration process, much like the registration you do here. Apart from that registration, there will be additional information which the portal will keep collecting from the members at different points of time. Some members may say, I want to participate in the discussion forum on pointers in this subject, so there may be a separate group identity, etc.; you have groups in the portal. But to illustrate the use of direct access files, I have decided to construct an example which merely does the basic enrollment of these students, teachers and professionals. So we have about 5 million records to be written. Initially, when I start the portal, there are zero members. So what do I do? I create a file with 5 million IDs, which are numbers that I give, enrollment numbers, etc., and some artificial information about their name, city,
and whatever else I decide that I must eventually collect from every member who joins, and I create such a file. Later on, when a particular person comes in and says this is my name and this is my city, I read that record directly and update that information. That is the objective of using direct access files here. We need software to create and maintain such a file and the associated files; not just one file. Consider the IIT Bombay academic system, the academic information processing system, which has a large number of programs. It does something like this for all the students, but it does not have one file, and it does not have only one entity called student. There are several other entities: course is an entity, teacher is an entity, hostel is an entity. In any significant information system there would be about 15 to 20 different main entities, and there could be artificial entities associated with them. The total number of attributes across these entities that a real-life system handles could range anywhere from 200 to 20 thousand. So the system becomes very large and complex, because you need programs to handle each one of these entities.
Here is a program to create the portal data. Its comment says that this program creates artificial data for students, professionals and teachers, numbering about 5 million, who will be members of the IIT Bombay open source education portal. The database file name is artificially set as portal.db, for portal database. Now, the primary key: we artificially decide that each participating member shall be given a 9-digit enrollment number. For the enrollment number to be a valid 9-digit number, it cannot start with 0; it should start with at least 1. Consequently we decide that the first enrolled member will have the enrollment number 100 million, and then 100 million 1, 100 million 2, 100 million 3, etc., for up to 5 million additional enrollment numbers. I would like to create this data. The file has the standard includes, and the meat of the program starts with the declaration of a structure to
store the data. So look at the structure member info. We have int enrollment; the enrollment number starts the structure. Then name, 60 characters, the name of the member. Initially all names are star. Why? Because I don't have any real names. What I am doing is creating 5 million artificial records. The only real thing about them is the enrollment number, which is exactly correct; all other information is artificial. City would be a name up to 40 characters long. Short for Powai, I will just put P in the first position. Then I will artificially insert a postal code; it is an integer. Category is a character array of 2; it is a one-character code. Why 2 for one character? Because the character should be followed by backslash zero, so that when I print it, it is a valid string. Observe the categories: S for student, which I treat as the default, which means initially I will insert S, and O for other. Why O for other? I have a good justification. Suppose a philanthropist, say Ratan Tata, announces: this is an excellent exercise started by IIT Bombay, I want to give a donation of 10 crore rupees. I will say, please become a member. Are you a student? No. Can you write programs? No. Are you a teacher? No. Sorry, I can't make you a member. Would that be the right thing to say? Of course not; you are welcome, give 10 crore rupees, that is the right answer. The point that I am trying to make should not be lost in the humour. Whenever you model any information system, you will invariably go by the limited thinking that you apply to the problem. So you will put in only what is visible to you: only those characteristics, those attributes and those possible values. When the premise of the program, its whole purpose, is stated as students, professionals and teachers, you would invariably think of only these three possible values. Here it does not really make much difference, because I can put any one character in this field. The point is, if you have initially identified the possible values by extending your imagination to who else could benefit and what else could be done, then you will get
better design of your information system. A student asks: Sir, for postal code and enrollment we have used type int, but does it not have a maximum value of the order of 32,000? No, no. An int is 4 bytes; a short int is 2 bytes. So a short int will bomb if you store a postal code in it; I think either North India or South India would be omitted. So int is OK here. For the enrollment number, 2 to the power 31 minus 1 is the largest number, which is fairly large, so don't worry. Besides, I have actually run this program, so it works. Notice this file pointer declaration: FILE star outfile. There is also filename of 30 characters, although I am not going to use filename. Read this very carefully, because various lines relate both to the formation of the structure and to writing data. struct memberinfo member: so member is the actual variable, and member.city, member dot postal code, etc., is the way to refer to the elements, as we have seen. recsize: to write a fixed-length record, the record size is the size of this structure member info, the abstract data type that I have described. Just for the sake of knowing it, I print it out. Now I initialize the default field values for the record. Observe that I put star in the name and a backslash zero after it. I put P in member.city[0] and backslash zero in member.city[1]; I am just putting one character as the artificial name of a city. Postal code is an integer number, which is assigned here. Category is assigned as student, which is the default category, and the remark is put as blank: I put all blanks here and a backslash zero at the end. I open the output file here. Notice the fopen statement. outfile is a file pointer: outfile equals fopen of the portal db file with mode wb. This is important: w means write and b means binary, so it is not a text file. If you don't write b, it will be assumed to be a text file. So we are all set to create records. Now, to the question about category: the reason I need two bytes for a
category is because, when I get the category into memory from the file, I would like to print it as a string, or treat it as a string, without further processing. If I just declare it as a single char, then I will not be able to treat it as a string. It is a matter of choice. His question is: why do I need an array of two elements for what is a one-character code, when I could have left out the backslash zero? The point that I am trying to make is that, unless explicitly required otherwise, you would invariably use strings to represent even individual characters. So all char arrays will actually be strings, and there cannot be a valid string internally unless there is a backslash zero. So if I put a backslash zero after the category, then when I read the record, member.category is a string. Now I can use cout with member.category and it will print correctly. Later on, as we shall see with the string type, I can concatenate strings and operate upon strings, and the string operations are exactly identical for any string, because every string is terminated by a backslash zero. If you know for sure that you will never treat an individual character as a string, you are most welcome to use char as a single type, no problem. It will take one byte less in storage. So please remember this point: if you are very sure that you are going to use one character, you can declare it as a char variable which is part of your member structure. Not only that: we shall be seeing bit fields very soon. C++ actually permits you to define individual bit fields and access them. So a three-bit field can be defined. The total record size, however, will depend upon the actual memory allocation, because memory allocation is required to begin at what is known as a word boundary. So a three-bit field cannot simply end and have the fourth bit onwards assigned to some other full byte; that is not possible. So the memory allocation will be different, but you need not worry about it, so
whatever the memory allocation is, the size of the record will reflect it, and reading and writing will work. So that is a valid point: there is no need for two bytes, it is a matter of choice, and as I said, the reason for this choice is that I like to treat all character data as strings. But wherever there is a single character there is no harm in using a single char. So this is all assignment. I have opened this file. Observe the error checking. Suppose the file could not be opened: for example, I don't have write permission in the directory in which I am trying to open the file, or there is no disk space, or whatever. In that case the fopen statement will return a null pointer, so outfile will be NULL. Invariably, after opening, closing, even reading and writing, you are supposed to do an error check. So this is the error check: if outfile is NULL, I output some error message and return 1 directly. There are different styles of programming here. In one, you say: if outfile is not NULL, then do all this, where all this is the rest of your program. Consequently you are required to indent every line of your program, because it is part of the same if block. I prefer this style: if there is a problem, I return immediately, and the rest of the program can follow at the same indentation level. But again, it is a matter of choice; both approaches are of course correct. This is the meat of the program, to write 5 million records into the portal database. How do I write 5 million records? I set up a loop: count equal to 0, count less than 5 million, count plus plus. member.enrollment is the only field to which I have not given a value yet; all other fields have been given their default values. And what value am I giving? Since the first enrollment number has to be 1 million, the next 1 million 1, then 1 million 2, I simply add count to 1 million. Notice that I can use the reverse logic to extract the position from a given enrollment number: I simply subtract 1 million from it, and I will know the count of the record, the record number; not really the position but the
record number. Anyway, that we shall see later. Having set member.enrollment to this integer, and with all other fields already set, my member record is ready to be written. I print a sample enrollment number every 5 lakh records, just to confirm that data is being written correctly to the database, because outputting every enrollment number for verification would mean 5 million numbers displayed on the screen, and that display would take longer than writing to the database. Once I have tested the program, I can simply remove these lines; I don't need them. What am I doing in this iteration? The statement that I am using is fwrite with ampersand member. Remember, member is a structure, and to such functions I must pass pointers. I don't expect fwrite to change the value of member. fread will change it, because fread will read the bytes into it. But the convention for the function call is always a pointer. If you don't give a pointer there will be a mismatch, because the way the fwrite function is defined, it expects a pointer, not the actual member. Then recsize: that is the size, how many bytes I want to write. Then how many units, or records: only one unit at a time. And this is the outfile pointer. Once I do that, the invisible file position pointer will move recsize bytes ahead. Next time I execute this fwrite, it will write the same number of bytes, with a different enrollment number, in the next position, then the next position, and so on. That's it. Whether I execute it 5 million times or 1 million times or any other number of times, this simple loop will write all the records in my file. So my file is now ready for usage. I close the file and just give a message. So this finishes the file creation. Let us very quickly look at how I will access the information. This is the output of the file creation program; nothing great about it. I have here the count, 0, 5 lakh, 10 lakh, etcetera, and these are the enrollment numbers. So this output file has been created. I have now a program
to read the portal database file. So the purpose of this associated program, as I said, is to illustrate direct access to the file. This program continues to use the same structure, so I have the same struct member info definition. The difference is that now I declare infile. I calculate recsize. I open the file for reading, so I say infile equals fopen of portal db with rb: r stands for read and b stands for binary. Notice that the same physical file can be opened once for writing, once for reading and, as we shall see in the next program, it can also be opened for update. Again I check for errors and print the information that I am reading the records, etcetera. Now the actual reading. What I want to do is set up an infinite iteration. In that infinite iteration I want you to give me your enrollment number, and I just display the information that I have. So this is the infinite iteration: do, while some condition on the given enrollment number. Actually, as we shall see from the logic, I could even say do while true, because this is an infinite iteration that I am setting up. Why? Because as soon as I come inside the iteration I ask for an enrollment number, collect the given enrollment number, and if that enrollment number is negative, I break; that means I get out. Why am I using do while rather than while something? do while guarantees at least one execution of the iteration, so I don't have to artificially read a given enrollment number first and then read it again at the end of the iteration for the next iteration, etcetera. This is a simpler method when I am absolutely sure that I can break out of the iteration. break is a powerful statement: it bypasses all of this, bypasses the while, and comes out. break means get out of the loop. If I only want to skip the rest of the current iteration, I should use the word continue. So differentiate between continue and break like that in your logical implementation. Now look at what I am doing. Given the enrollment number, I subtract 1 million from it. What do I get?
I get the record number that I want to read. It will be the zeroth record, first record, second record, whatever. I multiply that by recsize and I get the file position, so I know exactly where that record is located in the disk file. I do an fseek with that file position and I say: from the beginning, go filepos bytes into the file. filepos has been correctly calculated. So for a given enrollment number, in one shot I have got the file position; in the second shot I have set the internal disk pointer there. Now when I say fread, ampersand member, recsize, 1, meaning read one record of recsize bytes, all the bytes which are there on the disk will come and sit in the memory occupied by that structure, which is member. So automatically member.enrollment, member.city, member dot whatever, all fields will be populated. No scanf, no percent d, nothing is required, no cin for different members. In one shot you get the record. Just for confirmation, I print that the record position is at byte so and so, that is the file position, and that the enrollment number, category and postal code are so and so. So this is the iteration, at the end of which I close the file and return. This is the sample output. I am giving an enrollment number of 1234567. I am printing the record position; you will observe that this is obtained by subtracting 1 million from it and multiplying the result by recsize. So this is the actual file position at which this record is found. That record is read into member; I am merely printing the member components. So I am saying enrollment number is 1234567, record position is at byte so and so, and what I am printing is enrollment number, category and postal code. Observe that the only thing that differs between records is the enrollment number. This enrollment number is not the given enrollment number; I am printing member.enrollment, meaning whatever has been read from the disk. So it is a confirmation that yes, what I have read is correct. I am
giving another enrollment number, and then I am giving minus 1 to stop it. It is simple. Now I want to change the contents of a record. Let's say some 20 lakh students, 1 lakh teachers and 1,000 professionals have enrolled, and they have been given enrollment numbers. Let's say I had enrolled as a student; now I have passed out and joined as a teacher in some other college. I would like to come to the portal and say: look, my status is student, please change it to teacher. I should be able to do that. The program will ask what my enrollment number is, read the record, modify the status, and rewrite the record. That is the simple thing that I want to do. This program is straightforward; all of this, recsize etc., is as before. I have opened infile. The thing to be noted here is the way I have opened it: the mode is rb plus. rb means read and binary; plus means read and also write. This means the file is being opened for update. It is my responsibility to update properly; if I make a mistake in the file position, then I can corrupt the file. Please remember that the file position indicator is treated very much like an index. If you have declared an array of 100 bytes and your index is 124, it will write to position 124. Similarly, if you have created a file for 1000 records of a certain length, and by mistake your file position is somewhere around the 1 millionth record, then when you close the file, suddenly the file has become very large. There is nothing in between, but those bytes are written there, and the operating system remembers that something was written there. So you have to be very careful with the positions that you maintain. Here is the program, in exactly the same style as the reading one, except that I am reading and updating. Observe how it is done. I have collected an enrollment number; if it is negative I get out. Now I calculate filepos exactly as I did for reading, and I do fseek on infile with filepos, and fread with ampersand member. So far so good. I have printed that
information: member enrollment, category, postal code. Now I want to change the category to teacher and update the record. So this is how I am doing it; this again is the main part of this program. Look at what I am doing. member.category: whatever it was that I read, I want to change it to teacher, so I am putting T in it, and of course I am putting a backslash zero after it; not that it matters, the backslash zero would have been there anyway. Now what do I have to do? Please remember, go back to the previous statements, where I said fseek and then fread. After that fseek, my file pointer would have been pointing to the beginning of this record. After the fread, the operating system has moved that file position pointer ahead by one recsize. If I say fwrite now, the modified record will get written over the next record, which is an error. So what I should do is bring the pointer back, and there are multiple ways of doing that. Here I am saying fseek, infile, minus recsize, SEEK_CUR. A negative offset is perfectly valid; it means with respect to the current position, whatever it is. SEEK_CUR is not the current position itself, by the way. SEEK_CUR is a flag whose value is 1: SEEK_SET is 0, SEEK_CUR is 1, SEEK_END is 2. It is merely a flag which is passed to that function. The file position pointer is maintained internally by the operating system. All that I am saying is: with respect to the current position, bring the position back by one recsize, because that is where the record I read begins. So I bring it back; this is one way of repositioning the pointer. And I write this member. After writing this modified member, I want to verify whether I have written it correctly. What is the best way of verifying? Read that record again and print it out to check. How do I read that record again? The write statement has also advanced the position; every read or write advances the position, so I have to bring it back. This is another way of bringing it back: fseek with filepos and SEEK_SET. This was the original method that we used: with respect to the beginning, whatever is
my filepos, go to that position. I read the record, print the record position, etc. The key thing to remember: every read or write statement will advance the position by one record. If you want to rewrite, you have to bring it back. There are two ways of doing it: one, with respect to the current position, go back one recsize; another, with respect to the beginning, give the position that you have already calculated. Either way will work. This is the output of the update program. It says updating records of this database. It collects enrollment number 1234567. The record position is at this byte, and it prints the original record that I have read: enrollment number, category student, postal code. Then, record position is at this byte again: this is after I have updated, and I am reading it again. So notice that I have got it correctly. The fact that the record has been updated is indicated here: this S has become T. So that means the database is okay. Notice that you can use this to maintain any kind of file system, any kind of records, because now you know how to define a fixed-length record with various components, and how to write full records and read full records. And of course managing information within a record, which is a structure, is child's play, because every structure element, structure variable dot element, is like a variable which is an integer, float, character string or individual character, as I have already pointed out. You can handle them all. Is that clear?
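The three programs described above (create, read by enrollment number, update in place) can be condensed into the following sketch. It is not the lecturer's actual code: the names MemberInfo, filePosition, createPortal, readMember and updateCategory, the 80-character remark field and the postal code value are all assumptions made for illustration, and the file path and record count are passed in as parameters so that the idea can be tried on a small file rather than 5 million records.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Fixed-length record. The name (60), city (40) and category (2) sizes follow
// the lecture; the remark length of 80 is an assumption.
struct MemberInfo {
    int  enrollment;    // primary key: 1000000, 1000001, ...
    char name[60];
    char city[40];
    int  postalCode;
    char category[2];   // one-letter code followed by '\0'
    char remark[80];
};

const int FIRST_ENROLLMENT = 1000000;
const std::size_t REC_SIZE = sizeof(MemberInfo);  // includes any padding

// Byte offset of a record: (enrollment - 1 million) * record size.
long filePosition(int enrollment) {
    return (long)(enrollment - FIRST_ENROLLMENT) * (long)REC_SIZE;
}

// Write 'count' default records one after another. Returns 0 on success.
int createPortal(const char *path, int count) {
    MemberInfo m;
    std::memset(&m, 0, sizeof m);
    std::strcpy(m.name, "*");
    std::strcpy(m.city, "P");
    m.postalCode = 400076;            // assumed placeholder value
    std::strcpy(m.category, "S");     // default category: student
    std::FILE *out = std::fopen(path, "wb");
    if (out == NULL) return 1;        // early return keeps the rest unindented
    for (int count_i = 0; count_i < count; count_i++) {
        m.enrollment = FIRST_ENROLLMENT + count_i;
        if (std::fwrite(&m, REC_SIZE, 1, out) != 1) {
            std::fclose(out);
            return 1;
        }
    }
    return std::fclose(out) == 0 ? 0 : 1;
}

// Read one record directly, without scanning the file. Returns 0 on success.
int readMember(const char *path, int enrollment, MemberInfo *m) {
    std::FILE *in = std::fopen(path, "rb");
    if (in == NULL) return 1;
    int bad = std::fseek(in, filePosition(enrollment), SEEK_SET) != 0
           || std::fread(m, REC_SIZE, 1, in) != 1;
    std::fclose(in);
    return bad;
}

// Read a record, change its category, and write it back in the same place.
int updateCategory(const char *path, int enrollment, char newCategory) {
    MemberInfo m;
    std::FILE *f = std::fopen(path, "rb+");   // open for update
    if (f == NULL) return 1;
    int bad = std::fseek(f, filePosition(enrollment), SEEK_SET) != 0
           || std::fread(&m, REC_SIZE, 1, f) != 1;
    if (!bad) {
        m.category[0] = newCategory;
        m.category[1] = '\0';
        // fread advanced the position by one record, so step back before writing.
        bad = std::fseek(f, -(long)REC_SIZE, SEEK_CUR) != 0
           || std::fwrite(&m, REC_SIZE, 1, f) != 1;
    }
    if (std::fclose(f) != 0) bad = 1;
    return bad;
}
```

A typical use would be createPortal("portal_demo.db", 1000), then updateCategory("portal_demo.db", 1000123, 'T'), then readMember on the same enrollment number to confirm that S has become T. The fseek between the fread and the fwrite does more than reposition: on a stream opened for update, the C library requires a positioning call when switching from reading to writing. Checking that fread returns 1 also guards against an enrollment number whose computed position lies beyond the end of the file.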
With this we end the discussion on file and record management and information management. I have already told you that you should be able to read up the printf and scanf details and use them in your programs. There is another program; I have not discussed it, but you can read it. These slides should be on the machine. What it does is merely show one way of extracting fields from a comma-separated text line. We have seen this briefly; we have seen how we extract, let us say, the batch number or name, but this one extracts various pieces of information. I had actually written this as a precursor to another powerful function in the C++ library called strtok, the tokenizer function. The extraction of tokens from a character string is something which is very often required; all compilers actually do that when they read your program. So that is a powerful function which can, in iterative fashion, successively go to the next delimiter. Comma, for example, is a delimiter, so it can successively extract the information between two commas and give that information to you as a character string, which you can then interpret. This program is a precursor to that: commas are explicitly checked, so while the character is not equal to comma, do this, etc. This program is just one way of implementing information extraction. Since we have done that already, I will just read through this for you. So this is the program which extracts parts of the string containing serial number, name, etc. Just one thing is to be noted. If I am to read a character string such as the name using cin, it must not have a blank in between: if I want to read the name using a cin statement, then the first blank will terminate the read, whereas if I extract characters all the way till the next comma, I will get the full thing. That is the only important point here. So this is just an example. This is one record: the number of characters in the line is 65, serial number is 1, roll number is this, name
is this. You can read the program and relate it to this output. Later, maybe, I will briefly discuss the strtok function, a very useful and very powerful function to extract information from a text line which has field values delimited by specific symbols: comma, blank, tab.
This is the last slide and perhaps the most important slide. There has been a delay in posting additional project ideas. I regret that and I am sorry. What happened is that my right-hand man and colleague, Nagesh Karnali, fell seriously ill last week. He is the one who used to handle postings on Moodle for assignments, weekly lab handouts, etc. He was compiling the list of project ideas, but in the hustle and bustle of treating him we all forgot that he was the one who was to put it up. He has since become all right, but there has been a delay. Consequently most of you did not have the benefit of the other possible choices for the course project. Many of you would have thought of some course project; some of you might not yet have decided. Therefore, since the list was put up only last night, I am extending the first stage submission from the 19th to the 24th midnight, that is Sunday. Consequently the sample viva for two students from each batch will happen on Tuesday and Wednesday of the subsequent week. This is a breather for the first stage submission. However, there is no change in the deadline for the Tuesday submission; these deadlines remain. Now, unfortunately, most of you have not made any submissions. Yesterday I said that this is not correct: submission deadlines are submission deadlines, so why have you not submitted? I got three submissions by one o'clock after midnight. One of them was well before 12 midnight, another was 11 minutes past twelve, and the third was at about one o'clock or so. Only three submissions, and that too when I reminded people that this was a submission deadline. However, I had told that batch, and I am telling this batch also, that I am extending the weekly
submission deadline just this once, and that deadline is today, Wednesday midnight. So can I know why you did not submit by Tuesday midnight? It was an agreed deadline, and all deadlines were supposed to be hard deadlines. I would like to know why you did not submit. OK, just one second. I suppose that there will be at least some batch leaders or team leaders of batches present. Can you raise your hands? Now suddenly batch leaders don't want to be identified; there should be much larger numbers. OK, so why is it that Tuesday's submission did not occur? OK, so let's understand one thing. Excuse me, I will take just three or four minutes more, but this is perhaps an extremely important announcement and clarification, and you must appreciate the objective. The weekly deadlines are supposed to let us know how much work a team has done in a particular week. As I have said, you should ideally form a batch directory, under that batch directory three team directories, and under each team directory subdirectories for individual members. You will continue to work on your project on your own machine, but whenever your program is done for the week, to whatever extent, you are expected to put that source code in that directory. Whenever you complete some documentation, whenever you complete some write-up of some idea, you are supposed to put it in there. The only thing we don't want is actual executables; they don't make sense. Now these were to be put in your directory, and a single tar file was supposed to be submitted. That single tar file was supposed to be submitted on Moodle, and a mail was also to be sent to your lab TA and to me. So yesterday people said there is no Moodle link, there is no Moodle link. The lab TA's email id is available; my email id is available. The point is: are you going to do these submissions because I am telling you to do them, or are you going to do them so that you yourselves, as a group, understand how important weekly record maintenance is, so that you learn an important aspect of team
work and project work? I think you should take note of this: this is not acceptable. The least that could have happened is this: suppose you had met, let's say, once or twice; you could have given minutes of the meeting. This lab group or this lab batch met on these days; in the absence of any description on Moodle of project ideas, we discussed the possible projects; we discussed this within our lab team; we spent 1.5 hours doing this; this meeting was attended by so many people. Isn't that information about the work that you have done? If you don't report it, it will be assumed that you have done nothing. Please understand that, and it has repercussions on evaluation as well, although I don't consider evaluation to be the more important part. But when you work as teams, when you work as groups, it is mandatory that you report whatever work you have done during that week. Please understand what I told you: the biggest problem in this course project will be handling communication and decision making in a group. When you call a meeting of the 12 or 13 people who are there in the lab batch, invariably one or two will have some other work and will not be able to attend. You would have had such meetings earlier. How many of your batches can guarantee that every member attended every meeting that was organized? Somebody was absent; has it been recorded? The sky does not fall on that person's head if that person is not able to attend. But what is important is this: suppose I am a member who is not able to attend a meeting called by the team leader. Do I later on, on my own, approach my team leader and ask what happened in that meeting? If I don't, I am actually being discourteous to the other members of the batch, and I must therefore be punished for this lapse. I must proactively ask: what happened, what did you discuss? Have you set up email groups, or at least collected the email ids of all 12 people? Will one of the three team leaders, who becomes some kind of batch coordinator, send a mail to all members saying this is what
was discussed? Now these are non-trivial issues. They are not programming things; they are not about setting up a file pointer or accessing an index. But without doing these things properly, no programming project in the world has ever been successfully completed. I want you to learn this, and you will appreciate that if you learn it, it will be useful to you not only during your stay in this institute but even later. So no compromise on that. If you have not held any meeting so far, this is your last chance to submit and report. Obviously this report will be a text file; no program is expected to be written. In fact I don't expect you to write any program even in the next week, barring some trials that people might want to make. But these submissions must happen. For this week, and this week alone, if you don't submit a tar file it is okay, but some mail must come. So I am saying, for this week's report, just send a mail to your lab TA and to me; that will be adequate. From next week, a tar file must go in by the deadline of that day. The postponement is only for stage one and stage two. Is that okay? Thank you.