As I mentioned last time, I wanted to discuss binary files and direct access files. Unfortunately, I ran out of time last afternoon and could not complete the job. However, I have an example of a database file which will illustrate how file handling is generally done, and which will also illustrate the need to write functions and organize your program in a somewhat better fashion. Additionally, I wish to discuss a sample question, of the kind you will have to set, and a possible answer to it. So what I propose to discuss is text files with fixed-size records. For the student database file that we discussed last time, we need fixed-size records so that later on we can provide random access to the file and actually update individual records on the disk. Today we will just look again at sequential reading and writing. The remaining portion, random access files and mapping keys onto record positions, we will discuss on Saturday. Last time we saw that test questions and quizzes have to be formulated by you. I have constructed one example, converting a two-digit number into words, with a slightly peculiar way of solving that problem which we shall see. These are the standard includes. I would like you to look once again at the structure definition. Last time we saw in detail how a composite of variables could be defined as a single structure. The definition struct studentinfo defines an additional type of data, and you can then define variables or even arrays containing this type of data. The difference from an array is that a structure may have components which are not identical in type. For example, the first component is a character array which can hold a 30-character string followed by '\0'. The second component is another character array, but the third component is an integer variable.
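To make the structure concrete, here is a minimal sketch of such a definition. The type name studentinfo comes from the lecture; the field name rollno, the 9-character size of the roll-number array and the helper makeStudent are assumptions for illustration. It also shows that a whole structure can be copied with a single assignment.

```cpp
#include <cstring>

// Sketch of the record type. Assumed: field name `rollno` and its size.
struct studentinfo {
    char name[31];    // up to 30 characters followed by '\0'
    char rollno[9];   // assumed: 8-character roll number + '\0'
    int  hostel;      // stored in binary, typically 4 bytes
};

// Hypothetical helper: builds a record from its parts. The record is
// zero-initialized and strncpy copies at most size-1 characters, so both
// strings are always '\0'-terminated.
studentinfo makeStudent(const char* name, const char* rollno, int hostel) {
    studentinfo s{};
    std::strncpy(s.name, name, sizeof s.name - 1);
    std::strncpy(s.rollno, rollno, sizeof s.rollno - 1);
    s.hostel = hostel;
    return s;
}
```

Assigning one studentinfo variable to another (for example `b = a;`) copies all three components at once, including the character arrays.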
Remember that inside the computer's memory, hostel will be stored as either a 2-byte or a 4-byte integer, depending upon the representation. Now, so far, whenever we wrote to files and read from files, we were always reading and writing text files. Using the simple operator >> with cin, we are able to separate the different components typed on a line; those components, which are given as character strings, are interpreted as strings, integers, floating-point numbers or whatever the variable requires, and are correctly assigned, after conversion, into the internal memory structure. Here is a function to print a student. Notice that we are printing 30 characters for the name, 8 characters for the roll number, and %4d, which means we want to print the hostel number right-justified in a field 4 characters wide. Please note that when we print s.hostel in this format, the internal contents of memory, which use a binary representation, get converted into a character string, which is what you see. This is the program that we saw last time. We open a database file, arbitrarily named database.txt. It is defined as an ifstream, that is, an input file stream. We define the student list as an array of studentinfo. A thousand elements are enough for the CS101 students we are talking about. The list size could of course be arbitrarily large, as long as all of it fits in memory. We have actually seen the rest of the program last time, so I will just glance over it. The point I want to make is that this is not the way to write programs properly. When you write large software, as I mentioned, you would like to convert practically every major operation into a separate function and call that function independently. Going back to the previous slide, this is the function we have written for printing a student's information.
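A sketch of the print function, assuming the same three fields. It formats into a std::string with snprintf so that the fixed-width layout can be checked; the exact format string "%-30s %-8s %4d" (left-justified name and roll number, right-justified hostel) and the helper name formatStudent are assumptions modelled on the widths mentioned above.

```cpp
#include <cstdio>
#include <string>

struct studentinfo {
    char name[31];
    char rollno[9];   // assumed field name and size
    int  hostel;
};

// Formats one record in fixed-width columns: name in 30 characters,
// roll number in 8, hostel number right-justified in 4.
std::string formatStudent(const studentinfo& s) {
    char line[64];
    std::snprintf(line, sizeof line, "%-30s %-8s %4d", s.name, s.rollno, s.hostel);
    return line;   // the binary int has been converted to characters here
}

// Printing is then just writing the formatted line.
void printStudent(const studentinfo& s) {
    std::printf("%s\n", formatStudent(s).c_str());
}
```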
In exactly the same fashion, we would like to rewrite most of the functionality of our program as separate functions, particularly as the program becomes long. So this is the program which we originally wrote. We are just reading, and notice that it assumes the database file contains a first line which holds only the number of students, and that each subsequent line holds the record for one student. This is not necessarily an ideal way to store data in a database file. A database file would typically have records each of which is exactly the same length and contains the same information. The number of students is then information that has to be found by reading all the records of the file and counting them until end of file. The count on the first line is an artifact we are using for this particular program. So I read the number of elements in the file, and for i = 0 to n - 1, I read the name, roll number and hostel into s. s is a variable of the structure type, and therefore s automatically has exactly the same three components that were described in the structure. Once we have read all the components of s, we can simply assign s to the i-th element of the student list. We could also have read directly into the name, roll number and hostel components of the i-th element; that would have been perfectly all right. Having read the database, I can close the file and forget about it. Then, if you recall, last time we constructed a sort of query language: you give h followed by a number and it gives you details about all students in that hostel; if you say x it exits; if you say r followed by a roll number it gives you details about that particular student, and so on. This artificial query language interpreter is what the program is all about, and we have already seen it last time.
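The reading loop described above can be sketched as a function over any input stream, which also makes it easy to try on an in-memory string. The function name readRecords, the maxSize guard and the use of std::setw to bound each read into the character arrays are additions, not part of the original program.

```cpp
#include <iomanip>
#include <istream>
#include <sstream>   // std::istringstream, for trying the function on a string
#include <string>

struct studentinfo {
    char name[31];
    char rollno[9];   // assumed field name and size
    int  hostel;
};

// Expects the layout described above: a first line with the count n,
// then one "name rollno hostel" record per line.
// Returns how many records were actually stored (at most maxSize).
int readRecords(std::istream& in, studentinfo list[], int maxSize) {
    int n = 0;
    if (!(in >> n)) return 0;
    int count = 0;
    for (int i = 0; i < n && count < maxSize; i++) {
        studentinfo s;
        // setw limits each read so a long word cannot overflow the array
        if (!(in >> std::setw(sizeof s.name) >> s.name
                 >> std::setw(sizeof s.rollno) >> s.rollno
                 >> s.hostel))
            break;
        list[count++] = s;   // whole-structure assignment
    }
    return count;
}
```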
I am just recapitulating it so that when we rewrite it as functions you will understand exactly what is being done here. So here, for example, is option h. If h is the option given, then you have to give a hostel number, and you read the hostel number. Notice that this particular cout prompt is not really required, because you can type h followed by the hostel number on the same line as input. In fact, that is the format of the query language we artificially designed. That works because cin is capable of reading first the character h and then a number, irrespective of whether that number is written on the same line or on the next line. Having read that, you look for the records of the students who belong to that hostel. If the record of a student belongs to that hostel, which you check by testing whether s.hostel == hostelNumber, then you call the function to print that student's record. I have introduced an additional count here: how many students have you found? Ordinarily this is not required, because if two students or five students are found they will simply be listed. The reason this count is introduced is this: suppose you give a hostel number and the student database does not contain any record for that hostel. In that case nothing would be printed, and you would not know what has happened. So maintaining a separate count is useful in such cases: I increment countFound by one every time I find a student who satisfies the query. If nobody matches, the count remains zero at the end, and I can make use of that. Notice the usage !countFound. Ordinarily I would have written countFound == 0. But countFound == 0 is effectively equivalent to !countFound: when countFound is zero it is treated as false, so !countFound is true; when countFound has any non-zero value, it is treated as true, so !countFound is false.
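Here is a sketch of the h query with a found-count, using !countFound exactly as discussed. The function name findHostel and the output text are hypothetical; writing to a std::ostream parameter instead of directly to cout is only so that the behaviour can be checked.

```cpp
#include <ostream>
#include <sstream>   // std::ostringstream, for checking the output
#include <string>

struct studentinfo {
    char name[31];
    char rollno[9];   // assumed field name and size
    int  hostel;
};

// Lists every student in the given hostel and counts them, so that an
// empty result can be reported instead of silently printing nothing.
int findHostel(const studentinfo list[], int n, int hostelNumber, std::ostream& out) {
    int countFound = 0;
    for (int i = 0; i < n; i++) {
        if (list[i].hostel == hostelNumber) {
            out << list[i].name << ' ' << list[i].rollno << '\n';
            countFound++;
        }
    }
    // 0 converts to false and any non-zero value to true,
    // so !countFound is true exactly when countFound == 0.
    if (!countFound)
        out << "No students found in hostel " << hostelNumber << '\n';
    return countFound;
}
```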
So there is an implicit conversion between a numerical value and a logical Boolean value. A numerical value of zero is false. The Boolean value true, converted to a number, is 1. But any numerical value which is not zero is always treated as true by the C++ compiler. Now here is the actual challenge: we wish to rewrite this program with appropriately defined functions, so that the whole program looks organized. Take, for example, the struct that we have defined. Ordinarily we would include it in the program itself, as we have seen here. But when you have many structures and other data structures which are peculiar to your programming system, such as for your course project, it is customary to put them in separate header files. Header files are given the extension .h, as is the tradition in C/C++ programming. You create such header files separately, and just as you have includes at the beginning of your program, you also ask the compiler to include your header file. Including a header file is part of the preprocessing step of the compiler: the file is physically inserted into the program text, and then the compiler works on the result exactly as if you had written the header's contents within your program itself. So this is how we would like to organize our programs. Here is an example. First the header file. I have called it studentinfo.h, and it contains my structure definition. The only difference is that this is now a separate file. The important idea is that, for large projects such as the course project you are doing, global declarations should never be mixed in with the functions or with the main program. Declarations that are needed by the functions of your program, or by the main program itself, are typically separated out and put into a header file.
You can have more than one header file, which is not a bad idea if different types of structures are put into different headers, but you have to include all of them in your main program. So this is one way of doing it. This is the file called studentinfo.h. Now look at the main program. The main program has all these includes, iostream, iomanip, fstream, and whatever else you require, such as cmath, depending upon your program. using namespace std gives the namespace for all of these. However, notice the last statement, #include "studentinfo.h". This forces the compiler to read the header file we have just created and include it physically as part of the program being compiled, so the compiler automatically sees all the definitions. Do you notice the difference between the way studentinfo.h is written, in double quotes, and the other headers, which are written between the less-than and greater-than signs? Any idea what the difference is? This is a small but important difference which is worth looking at. Both of these statements mean essentially the same thing: get that file from the disk and include it before you compile the program. This is called the preprocessing step, which the C/C++ compiler invokes before it starts compilation. The difference is only the directory on the disk in which the compiler will search for the header file. With the angle-bracket notation, less-than and greater-than, the file is searched for in the standard library locations. When the C++ compiler system is installed on your machine, some directories are specified so that the compiler knows where to find header files that are to be included. Those are typically the standard library and other library directories, which are not in your working directory. They are somewhere else, because they are considered part of the C++ compiler system.
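A minimal sketch of what studentinfo.h might contain. The #ifndef/#define/#endif include guard is an addition not mentioned in the lecture; it is a common convention that prevents the structure from being defined twice if the header ends up included more than once in the same compilation. The rollno field name and size remain assumptions.

```cpp
// studentinfo.h (sketch)
#ifndef STUDENTINFO_H
#define STUDENTINFO_H

struct studentinfo {
    char name[31];    // up to 30 characters + '\0'
    char rollno[9];   // assumed field name and size
    int  hostel;
};

#endif // STUDENTINFO_H
```

The main program would then contain #include "studentinfo.h" after its standard includes, and every function that works with studentinfo sees the same definition.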
So ordinarily, whenever you write #include <file.h>, the file is searched for only in the standard library directories known to the C++ compiler. However, when you write your own programming system with your own header files, you have no way of putting your header files into the standard libraries. Indeed, it would be detrimental if you did that, because 20 different groups working on 20 different projects, all putting their header files into the same library directories, would cause a lot of confusion. Several different projects may have header files with the same name; they must necessarily remain in different directories. Consequently, the C++ preprocessor provides the double-quote notation. When you write #include "file.h" in double quotes, the preprocessor first looks in the directory containing the source file being compiled, which is usually your current directory, where your actual program and other programs are stored. It is also possible to give a relative path here. For example, you might have a directory structure in which the main program itself is in a src directory, which is the tradition on Linux-like systems, while header files and similar things are kept in a sibling directory at the same level, say an include directory. Suppose the .h files are kept in that include directory. If src is your working directory, your program is inside it, and when you compile from the terminal the current directory is src. Since the .h file is not in this directory, the plain specification #include "studentinfo.h" alone will not work. In such cases you can specify a path relative to your current directory, for example #include "../include/studentinfo.h": go up one level, into the include subdirectory, and include this file.
It is up to you how you organize this, but larger projects will invariably be organized so that the project directory has several subdirectories: a separate subdirectory for your source programs, which may themselves be several files; a separate directory for headers; and, since you could also compile some of your functions into your own library, a lib subdirectory. It all depends upon how you want to organize it. Coming back to our problem now, you will notice that when I write #include "studentinfo.h", you are essentially asking the compiler to look for this header file in your own directory first. Incidentally, the preprocessor is smart enough that if the file is not found there, it will go and search the standard library directories anyway. So, for that matter, if you write #include "iostream" in double quotes, it obviously will not find it in your working directory, so it will go to the standard library directories and find it there. Why then don't we write double quotes in all cases? Because we know where those headers are located, and we don't want the compiler to waste time searching here and there. In a nutshell, then, the distinction is that angle brackets, the less-than and greater-than symbols, are used for include files which are to be found in the standard library directories of the C++ compiler system, and your own header files are written in double quotes so that they are searched for in your own working directory first. Just as we wrote #include for a .h file instead of cluttering our main program with the complete definition of the structure, it is customary to write all function prototypes at the beginning. Let me explain this concept: in order to let the compiler understand how functions will be referenced, without necessarily writing the complete function definitions first, we have a provision to declare function prototypes.
So these are the function prototypes that I have written: read database, print student, find hostel info, find student details, etc. I have not written all the functions; I have written only the first two, which we shall see in a moment. The advantage is not only that the prototypes let the compiler understand how these functions will be referenced, but also that the functions themselves need not be physically written as part of this source file. They could be written separately and included. In fact, they could not only be written separately, they could be compiled separately, and the compiled code could be put into a library of your own, as I mentioned some time ago. This permits a very clean organization of your main program without cluttering it with other things. Notice the function parameters prescribed here. There are no actual parameters here; this is only a prototype. And even in a prototype you don't have to name variables; you just have to specify the types of data that come in. The first parameter, studentinfo followed by empty brackets, is a list of students, an array of student structures, as we have seen. By writing just the opening and closing brackets, you indicate that the first parameter is an array. You need not specify the size of the array, because the size of an array parameter in a function declaration has no meaning. The array used is the caller's original array, and its actual size is what matters, because you are operating directly upon that array. You must of course know what that array size is. And here we have used int&. This ampersand indicates a reference: as we have seen, whenever we transfer a parameter by reference, the reference is indicated by putting an ampersand. So when you call this function and pass, for example, the integer variable n, a reference to it is passed, and inside the function you will actually be operating upon the caller's n.
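The points about prototypes, array parameters and reference parameters can be seen in a small sketch. The function names setByValue, setByReference and fillSquares are hypothetical. The prototypes give only parameter types, the array parameter has no size, and only the int& version changes the caller's variable.

```cpp
// Prototypes: only the parameter types are required, names are optional.
void setByValue(int);
void setByReference(int&);
void fillSquares(int[], int);

// The definitions can come later in the file, or in a separately
// compiled source file.
void setByValue(int n)      { n = 42; }   // changes a local copy only
void setByReference(int& n) { n = 42; }   // changes the caller's variable

// No size inside the brackets: the function works directly on the
// caller's array, so the caller must pass the size separately.
void fillSquares(int a[], int size) {
    for (int i = 0; i < size; i++) a[i] = i * i;
}
```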
Consequently, when the read-database function reads the data, it finds the value of n, either as stored in the file, in our case, or by reading all the records and counting them, and it returns that value of n to the caller. Returning the value of n through a parameter is not possible if you pass the parameter by value. That is why we need the ampersand. Why does print student take only the studentinfo type? Please notice that studentinfo is not a variable; it is the type that we defined through the structure. Similarly, find hostel info will require the array and an integer for the hostel number. Find student details will require the student information, and its char * parameter will hold the roll number. We might also want to pass the studentinfo array there. I have not written the functions other than these two; you might want to write them, modifying the prototype definitions appropriately if you so wish. Here is the main program now. Look at how the main program gets shortened. After some basic definitions, such as the student list, I simply call the function to read the student database, passing the student list and n. A single function invocation reads the database. I could then have other functions: one that reads a query from the keyboard, another that processes the query, and, within query processing, calls to a function to print student data, a function to find data about students from a hostel, and a function to find details about one particular student whose roll number is given, etc. In general, most of the functionality of your main program should be divided into functions, and the main program should consist of function calls of this or a similar type. Here is the read-database function. Notice how the function itself is written: its return type is void, its first parameter is an array of studentinfo, and its second parameter is int &n. Let us go back to the previous slide to see how this function has been invoked.
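One possible shape for the query-processing function mentioned above, sketched under assumptions: the helper names, the output text and the choice to pass streams as parameters are all hypothetical, and roll numbers are compared as strings. With this in place, the main program reduces to reading the database followed by a loop that calls processQuery until it returns false.

```cpp
#include <istream>
#include <ostream>
#include <sstream>   // string streams, for trying the dispatcher
#include <string>

struct studentinfo {
    char name[31];
    char rollno[9];   // assumed field name and size
    int  hostel;
};

// Hypothetical helpers standing in for find hostel info / find student details.
int findHostelInfo(const studentinfo list[], int n, int hostel, std::ostream& out) {
    int found = 0;
    for (int i = 0; i < n; i++)
        if (list[i].hostel == hostel) { out << list[i].name << '\n'; found++; }
    return found;
}

int findStudentDetails(const studentinfo list[], int n, const std::string& roll, std::ostream& out) {
    for (int i = 0; i < n; i++)
        if (roll == list[i].rollno) { out << list[i].name << ' ' << list[i].hostel << '\n'; return 1; }
    return 0;
}

// Reads one query ("h <hostel>", "r <rollno>" or "x") and calls the
// matching function. Returns false when the user asks to exit.
bool processQuery(std::istream& in, const studentinfo list[], int n, std::ostream& out) {
    char option;
    if (!(in >> option) || option == 'x') return false;
    if (option == 'h') {
        int hostel;
        if (in >> hostel && !findHostelInfo(list, n, hostel, out)) out << "none\n";
    } else if (option == 'r') {
        std::string roll;
        if (in >> roll && !findStudentDetails(list, n, roll, out)) out << "none\n";
    } else {
        out << "unknown query\n";
    }
    return true;
}
```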
It is invoked with the student list, an array of a thousand elements meant to hold names, roll numbers, hostel numbers, etc. Notice that nothing exists inside it yet; in fact, the read-database function will populate this list and send it back. The next parameter passed is n. At the call, n appears to have been passed by value, and that is another point you should remember: when you look at an invocation of a function, it is not clear whether a parameter is passed by value or by reference. That becomes known only when you see the declaration of the function. At the call you always just write n, but in the declaration, int &n means that a reference is going to be passed. The body is exactly similar to the program we saw earlier. An ifstream named fin is defined, and fin is opened with the file database.txt. A variable s of type studentinfo is defined. Then you read n, and read records 1 to n. When all of this is done, the database has been read, and you close the file. There is a statement missing here: return. You should not write return 0; you should just write return. Why not return 0? Because the function has been declared with return type void, which means it does not return any value; writing return 0 in a void function is a compilation error. So a plain return statement will do. In exactly the same fashion, here is another function, print student, with return type void and a studentinfo parameter s. You pass to this function a single element of the structure type, and it prints the components of s, namely the name, the roll number and the hostel. I would suggest that all of you write functions for solving this query language problem: a generalized query, where the query could be of any type. Your creativity is permitted to have its own say: I will have this type of query, that type of query, whatever. You can have a variety of things. You can have additional information in the database, such as the CPI of people or their hobbies, and find out the people who play football or whatever you want.
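A sketch of the read-database function in the void / int& style described above. Unlike the lecture's version, which uses the fixed name database.txt, this one takes the file name and the array capacity as extra parameters (an assumption made so it can be reused and tested); it leaves early with a bare return; when the file cannot be opened.

```cpp
#include <fstream>
#include <iomanip>

struct studentinfo {
    char name[31];
    char rollno[9];   // assumed field name and size
    int  hostel;
};

// void function: the record count goes back to the caller through the
// reference n, and early exits use a bare `return;`, never `return 0;`.
void readDatabase(const char* filename, studentinfo studentList[], int maxSize, int& n) {
    n = 0;
    std::ifstream fin(filename);
    if (!fin) return;                    // file missing: report zero records
    int count;
    if (!(fin >> count)) return;         // first line holds the count
    studentinfo s;
    while (n < count && n < maxSize &&
           fin >> std::setw(sizeof s.name) >> s.name
               >> std::setw(sizeof s.rollno) >> s.rollno >> s.hostel)
        studentList[n++] = s;            // whole-structure assignment
    fin.close();
    return;                              // optional at the end of a void function
}
```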
In fact, you can extend it further. And this extension is precisely how I would like you to organize your database for the unique ID project that you are doing. Not all the information about a student, such as hobbies, may be needed for the sample application you are building in the project. So whatever sample application you have in mind, the student database must contain the information pertaining to that application, and you might want to put it in a database file of this type. On Saturday we shall see how database files can actually be read randomly and updated in place, so that you don't have to read the whole file into arrays, update it in memory and recreate the whole file. That can be done for a file containing data on 1,000 students. It cannot easily be done for a file containing 100,000 students' data, and it cannot be done if the number of records in the file is one crore (10 million). You cannot always read the entire file into memory, update it in large arrays and rewrite the file every time. You need a mechanism by which you read a specific record of the file, change it, and rewrite it in exactly the same position on the disk, so that any time in future when you read the data from the disk, the updated record is read as part of the file. That is the reason we would like records in files to be exactly the same length. Later on, as we shall see, we can compute the position of a particular record in a disk file by treating the file as if it were an array of bytes. So imagine a disk file to be an array of bytes, where you can locate a particular byte given its position number and either read or write that byte. That is precisely what the file systems of UNIX, or for that matter of any operating system, permit you to do, and C/C++ programming exploits that. But as I said, we shall discuss these things in this Saturday's lecture.
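The byte-offset idea can be sketched ahead of Saturday's discussion. Assuming each record is stored as exactly 43 characters (a 30-character name, an 8-character roll number, a 4-character hostel number and a newline), record i starts at byte i * 43, so seekp/seekg can jump straight to it and a record can be overwritten in place. The names writeRecordAt, readRecordAt and RECORD_SIZE are hypothetical, and a std::stringstream stands in for the disk file in the check; for a real file one would typically open a std::fstream with std::ios::in | std::ios::out | std::ios::binary.

```cpp
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <sstream>   // std::stringstream stands in for a file when trying this
#include <string>

struct studentinfo {
    char name[31];
    char rollno[9];   // assumed field name and size
    int  hostel;
};

// Assumed fixed layout: 30 + 8 + 4 characters plus '\n' per record.
constexpr long RECORD_SIZE = 30 + 8 + 4 + 1;

// Record `index` starts at byte index * RECORD_SIZE.
// Assumes the hostel number fits in 4 characters.
void writeRecordAt(std::iostream& f, long index, const studentinfo& s) {
    char buf[RECORD_SIZE + 1];   // +1 for the '\0' that snprintf appends
    std::snprintf(buf, sizeof buf, "%-30.30s%-8.8s%4d\n", s.name, s.rollno, s.hostel);
    f.seekp(index * RECORD_SIZE);
    f.write(buf, RECORD_SIZE);   // overwrite in place; the '\0' is not written
}

studentinfo readRecordAt(std::iostream& f, long index) {
    char buf[RECORD_SIZE];
    f.seekg(index * RECORD_SIZE);
    f.read(buf, RECORD_SIZE);
    std::string rec(buf, RECORD_SIZE);
    std::string name = rec.substr(0, 30), roll = rec.substr(30, 8);
    name.erase(name.find_last_not_of(' ') + 1);   // strip the padding
    roll.erase(roll.find_last_not_of(' ') + 1);
    studentinfo s{};                               // zero-filled, so copies stay terminated
    name.copy(s.name, name.size());
    roll.copy(s.rollno, roll.size());
    s.hostel = std::atoi(rec.substr(38, 4).c_str());
    return s;
}
```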
So here is a question. Remember that all of you have to set questions. Incidentally, I have modified the requirement. Instead of every team having to set questions, I am assigning the job of setting questions and quizzes to the entire lab batch. So a lab batch will not be setting five quiz questions and five sample test questions; instead it will set three test questions and three quiz questions. This will reduce the burden, and you can distribute the work across the teams of a lab batch. So a group of 20 to 22 students together should do this work. I will try to define the details of this work today and post them on the website. But this is a sample. I say: write a program which reads a two-digit number and translates it into words. And I have given an estimated time of 30 minutes, so this could be a half-hour exam question. Now, how would you attempt to solve it? Let's look at possible attempts. Suppose the number given is 34; then my program should output "thirty four". Suppose the number given is 50; my program should output only "fifty". If the number given is 12, the program should output "twelve". Suppose the number given is 4; what should the program output? Well, the problem says: given a two-digit number, print it in words. 4 is not a two-digit number. So if anybody gives you a number between 0 and 9, you should say invalid number. If somebody gives you 124, you should again say it is an invalid number. Why is this problem relevant? Because once you know how to print a two-digit number in words, you can print any number in words. In order to do this, let us see what you will require. The program is incapable of understanding English, by the way. So you have to provide, as constants, the strings which are commonly required to print any two-digit number. A simple brute-force way of solving this problem is to define an array of 100 strings and write out in words one, two, three, four, ..., thirty-five, thirty-six, thirty-seven, etc.
And then you read a number, use it as an index, go to that particular string and print it. That is hardly an elegant method. In any case, since that is not the desired answer, even if you write it in the exam you will not get good marks. The idea is to actually dissect the given number and print its separate portions in words. The words you will encounter will necessarily include the words which describe the units digit: one, two, three, four, five, six, seven, eight, nine. You obviously require these strings. You will also require strings to print the tens position, as in fifty or thirty-four. So you require twenty, thirty, forty, fifty, sixty, seventy, eighty and ninety. One could include ten in the list of twenty, thirty, forty, because ten standing alone does not have a units digit; its units digit is 0. When the last digit of a number is 0, you do not want to print anything at all for the units position. Only if it is 1 to 9 do you print one of those words. So if the number is 10, 20, 30, etc., right up to 90, you just want to print a single word: ten, twenty, thirty, forty, etc. However, if there is a non-zero digit following, say 91, you have to print "ninety one". Again, you can combine twenty, thirty, ..., ninety as words for the tens position with one, two, ..., nine as words for the units position. The only exception is what we call the teens: eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen and nineteen are unique strings in the English language. They do not follow the conventional pattern for forming number words, so you will have to store them separately. That means a number such as 12 has to be stored separately as a string. What I have done instead is to store not only eleven through nineteen as teens but also ten. If you consider ten to be the first of the teens, that is logically also fine, and it helps make the program much shorter.
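The decomposition described above can be sketched as follows. The function name twoDigitWords, the table names and the "invalid number" message are assumptions; ten is stored as the first teen, as suggested, and inputs outside 10 to 99 are rejected as in the problem statement.

```cpp
#include <string>

// Index 0 of `digits` is empty, so 20, 30, ... print as a single word.
const std::string digits[] = {"", "one", "two", "three", "four",
                              "five", "six", "seven", "eight", "nine"};
// Ten is treated as the first of the teens.
const std::string teens[] = {"ten", "eleven", "twelve", "thirteen", "fourteen",
                             "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
// Entries 0 and 1 are unused: left digits 0 and 1 never reach this table.
const std::string tens[] = {"", "", "twenty", "thirty", "forty",
                            "fifty", "sixty", "seventy", "eighty", "ninety"};

// Converts 10..99 to words by splitting the number into its two digits.
std::string twoDigitWords(int num) {
    if (num < 10 || num > 99) return "invalid number";
    int left = num / 10, right = num % 10;
    if (left == 1) return teens[right];          // 10..19 are irregular
    if (right == 0) return tens[left];           // 20, 30, ..., 90
    return tens[left] + " " + digits[right];     // e.g. "thirty four"
}
```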
What you will be required to do, if you set up such a problem, is this: at least three members of your lab batch must make an actual attempt to write this program in 30 minutes. It is quite likely that only one of them gets the correct answer. Incidentally, the identity of the students is not important at all, so you could just say student 1, student 2, student 3. The point is that a genuine 30-minute attempt has to be made by the three assigned students. They should time themselves, write down the program, then type it out and submit it. The typing time is not included; the attempt time is. These three attempts should be included along with the problem definition, and additionally the whole batch together should write a decent answer. They should also record the actual time taken to produce the decent answer, because the decent answer is going to be compiled, tested and run to ensure that the program works correctly, and it doesn't matter if that effort takes one hour or one and a half hours. So I hope you understand what you have to do for every question. You have to formulate a question and estimate the time. You will be required to formulate three types of questions: simple, medium and complex, for 15 minutes, 30 minutes and 45 minutes respectively. Each question has to be attempted by three members of your lab batch working against the clock for 15, 30 or 45 minutes. Whatever they write on paper has to be typed in without the identity of the student: just say student 1, student 2, student 3. And finally, the batch as a whole has to write a correct, reasonably well-written program, and also indicate how long it took and how many people worked together to assemble that answer. As for the total effort, the students who write the sample attempts will naturally take 15, 30 or 45 minutes, plus about 15 minutes to type out their efforts.
Identifying the correct answer, writing it down, debugging and testing it may take about a couple of hours, but this job has to be done by different members of the batch for each of the three questions you are required to set. Here is an answer that I have written, based on a web resource that I found. It's a very cute answer, so you might like to see it. So here is a program to translate a two-digit number into English words. Incidentally, the final, correct answer that you prepare should be a properly documented program. This is not what anybody is expected to write in an exam, but it is an example of a well-written program that solves the problem. It is based on a triplet-translation program from DaniWeb. Americans typically write numbers in words in triplets: there are hundreds, there are thousands, there are hundreds of thousands. They don't say one lakh or one crore. So there are 1,000, 2,000, 20,000, 98,000, 250,000, and then there is a million. After that, a thousand million becomes a billion. So they consider triplets of digits when converting into words. I have taken the triplet program and converted it into a two-digit one. So this is the program. It is very concise, so it may take some effort to understand. I will just explain it very briefly, in a nutshell, and then you can look at the program on the web and analyze it further. Here is an inline function definition. An inline function is almost like a macro: wherever the function is called, the compiler may substitute the function body for the call, and then the program is compiled. So here is a function to append strings. There is a left-hand-side string, a right-hand-side string and a separator between the two. The default separator is a blank.
If the right-hand side is not empty, it is appended to the left-hand side, preceded by the separator only when the left-hand side is not empty either (empty() is a member function of the string class). So lhs += separator plus rhs. It will take you some time to understand exactly what is happening. Basically, there are possibly two strings, say "twenty" and "one", which you want to concatenate, that is, you want to append "one" to "twenty". But if the number is just 20, there is no "one", so the second string will be empty and you would like to append nothing. This is the main program. As I mentioned, I am defining string constants. The first is the set of digits: one, two, three, four, five, six, seven, eight, nine. Notice that for 0 I have put a null string; it contains nothing, not even a blank space. Then I have the teens: ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen and nineteen. So this is the teens array. Then I have an array called duplets: twenty, thirty, forty, fifty, sixty, seventy, eighty, ninety. Here I have put two null strings at the beginning instead of one: just as digits has one null string, duplets has two. The purpose will become obvious when you look at the program. And finally I have defined an integer number which has to be read from the keyboard. The program itself is fairly straightforward once these constant strings are defined. I start with the number in words, declared as a string and initialized to the null string. So originally there is nothing. Then I ask for a two-digit number and read it. Once I have read the number, if it is greater than 99 or less than or equal to 0, I print "invalid number, cannot translate, sorry". Notice that I said earlier that 1, 2, 3, 4 are not valid two-digit numbers. But treating them as two-digit numbers with a leading zero, namely 01, 02, 03, 05, this program will still print them out.
The program refuses to print only if the number is zero or negative, or greater than 99. If that condition holds, I simply return -1 from main and exit. If I get past this check, I have a valid two-digit number, and it is printed in the following fashion. I first dissect the number into a left digit and a right digit. Notice that they are declared unsigned. Why? Because I do not want any confusion with the internal signed representation of negative numbers and so on; I want a pure digit. So I declare them as unsigned int. The left digit is the number modulo 100, divided by 10. You will agree that this gives me the leftmost digit. Say the number is 78. How much is 78 modulo 100? It is 78 itself. So is this modulo required at all? If I just divide the number by 10, will I not get the leftmost digit? Be careful about such leftovers in a program. Do you know why this has happened? Because the original program worked with triplets. If I had the number 478, then to get 78 I would have to take it modulo 100. This is a remnant of copy-paste: I copied that source, so the 100 is still here. I have deliberately kept it to show you, because you should actually be looking at such sources across the web. But when you decide to use any program published in the public domain, be very careful about which portions you use and how you use them. Anyway, you get the left digit here and the right digit here. As I mentioned, the numbers below 20, the teens in particular, need special treatment. The duplets start at 20: twenty, twenty-one, twenty-two, and so on. Duplets means two-digit numbers from 20 onwards. So everything below 20 is handled specially, and everything from 20 upward is handled with simple logic. Here is the statement: append to the number-in-words string the duplet for the left digit. I am taking the left digit and creating the first append.
Remember that the string starts as the null string. So in our first append, the word for the left digit gets appended; for a left digit of 0 or 1 this is one of the two null entries of duplets, so nothing is added. This is followed by appending the word for the right digit. Only if the left digit is greater than 1 does a word such as twenty, thirty or forty get appended, because I want that only for 20, 30, 40, etc. Beyond that, I append either a teens word or a digit word: if it is a teen, I append the teens entry; otherwise I append the digits entry for the right digit. Which array is used depends on the left digit; the right digit is the index into it. If the left digit is 0, the number is written with two digits but is actually a one-digit number: 01, 02, 03, 04, ..., 09. Then I want to print only one, two, three, ..., nine. On the other hand, if the number is 11, 12, 13 or 14, the left digit will be 1, and then the teens are printed: the entry of teens at the right-digit position, which gives eleven, twelve, thirteen, fourteen, fifteen. It's very cute logic, and I would like you to study it. Look at the simplicity of the resulting program. Now, it is not necessary that every problem you compose and solve should have such a cute solution. What is most important is that it has a correct solution, and that the correct solution is logically written. That is what is expected from you. But those of you who have creative talents might consider writing such programs elsewhere, not necessarily in this course. Thank you very much.