Good morning and welcome to the second week of the workshop. Today, we are going to discuss pointers, basic input output, which we should have discussed earlier, and text files. While pointers are a sort of independent topic, basic input output in C and text files are intrinsically linked, and linked with these is the handling of character arrays within our C programs. Consequently, in our examples, we shall be discussing those three things almost together because they are inseparable. First, we look at what we are going to discuss in detail today. We are going to discuss pointers, and somewhere later we will also describe the notion of a structure. A structure, as I had mentioned once, is a mechanism to describe a conglomeration of data which represents the values of the attributes of a particular object. For example, a student has a roll number, a name, and marks in various courses. All of them belong to one student. Similarly, an employee has an employee code, an employee name, an address, a date of birth. Notice that these attributes are of different types. The name is a character string; the date of birth is of a rather peculiar type, although it is often represented as an integer number. The salary could be a floating point number, and so on. But all of these attributes together describe an employee, and for every employee they are exactly the same attributes for which values need to be known, accessed and modified. Consequently, we need to understand how the data can be described in our programs such that we can easily identify all attribute values belonging to a single object such as a person, an employee or a student. We are, of course, going to discuss basic input, output and files. After the tea break, we shall have a discussion through several examples, and of course we will answer questions whenever they are raised from any centre. After tea, we will also discuss the workshop projects.
As I mentioned, I understand there is a lot of anxiety. I am going to substantially change the nature of the workshop projects. We had said that we will do some programming projects and that we will also contribute to the question bank. I believe that now, given the time frame, we should concentrate on the second aspect; for the first aspect, which may require slightly longer time from the teams, we will permit teams to submit later. However, we will not make that a precondition for the certification of individual participation. That will depend upon the contribution to the question bank, as I will describe in the second half. So, here we go. This is a recapitulation of something that we have been mentioning off and on. First, we recall that memory locations contain data values of certain defined types. So, we have integer, float, character, short int, whatever. Second, in our program each location has a name associated with it. Typically, it is a variable name, or the value is associated with an array element. So, a reference to an array element could also mean a location. In fact, generally these are the only two things which describe data values for us. Now, the size of a location is measured in terms of the bytes used to represent the value stored in that location. We saw through an example how the sizes of different data types can vary, and that variation tells us that different data values occupy different amounts of storage. In our simplified model of computing through Dumbo and the drawers, the drawers did represent the notion of locations fairly accurately, but all drawers in a cupboard are typically of the same size. That unfortunately does not reflect the reality of memory allocation and usage for different data values. So, in real life, as we had seen earlier, the basic memory location internally is one byte, and each byte can be uniquely addressed by a number. That address number itself is typically 4 bytes long.
As we saw on an earlier occasion, the size of a pointer was 4 bytes irrespective of whether it pointed to an integer or float or short or double or whatever. We shall see that through another example. Now, this address, this machine location address as we call it, is what is used to point to one or more bytes containing a data value. So, this address can point to a 1 byte location if, for example, it contains a single character value. It can point to a 4 byte location if it contains an integer or a float value. It can point to an 8 byte location if it contains a double precision floating point value. Because the address is used to point to one or more bytes containing a data value, it is naturally called a pointer. Is it relevant for us to know this pointer at all? If we are writing a program in a higher level language, we shall always be accessing the data which we declare in our program through the names of variables or array elements. Consequently, how does it matter to us whether internally a pointer, which is a certain number, is used by the machine to point to these data items? We shall look at the answer in a short while. But first, we confirm what we have been stating, that pointers do exist. These are merely numbers, like house numbers in a housing colony. So, suppose you have a colony of a thousand houses and each one has a unique house number. If we start with 000, then 999 will be the house number of the last house. Please note that the 3 digit address of a house is a pointer to the house. However, individual houses can be small or big. So, the size of the house has no relevance to the number of digits in the address. The address is an independent commodity, and we shall see the significance later. So, what use is the pointer?
Continuing with the analogy of houses in a colony, suppose each house has a unique name, in which case the number should not really matter, because one house will be called Bhagirath, another will be called My Home, a third will be called something else. If each house has a unique name, you do not need a house number. Yet invariably you observe that in real life you have both. You often have names for the houses and still you have a house number. The house number is used by the postman. It is an external service. The name of the home is used by us, the family members living in it, and our close friends. In exactly the same fashion, in our C programs we shall be using variable names, array names and such names that we give to objects to refer to our data values. But the operating system, the program translated by the compiler, and so on like to use the equivalent of house numbers. So, they use pointers. How exactly are pointers seen inside a machine? Here is a further model. Suppose we have the declarations int m, n and float a[3], and assume that some values have been assigned to these. The addresses of consecutive locations of these in computer memory will differ by four. Why? Because each of the integer items requires four bytes and each of the floating point items also requires four bytes. Consequently, if the first location has address 10000, which is an arbitrary number, then 10000 will be the address of the first byte allocated to m. Of course, m would have got four bytes allocated, and those four bytes will contain the value, let us say, 573. If n is allocated memory by the compiler immediately after m, then, m being four bytes long, the address of the first byte of n will be 10004. And this will be the address pointing to the contents, which are, let us say, minus 1234567. Here are the three contents that I have shown: a[0], a[1], a[2]. All three are floating point values. Each one occupies four bytes.
Consequently, the pointers or addresses of the first of the four bytes of each of these array elements are 10008, 10012 and 10016 respectively. Here is a program; I think I had shown this earlier, but if not, you can look at it now. This program is actually a short snippet. All that it does is declare a whole lot of variables with arbitrary names. For example, I have unsigned int, int, long, double, char, unsigned char, short int. These are the various data types that I have. I seem to be missing float here; I think there is also a float somewhere. int *p, double *q: this is the way pointers are declared in the C programming language. What this means is that p is a place holder which is allocated a place that will point to an integer value. When we declare it, it does not point to anything. Similarly, double *q means that q is a pointer which will point to a double data value, that is, a double precision data item which will itself be eight bytes long. To print the sizes of all these things, I have used the number-printing macro that we had defined and used earlier, and I have said sizeof a, sizeof b, sizeof c, etc. If I execute this snippet, then I get an output something of this sort: size of unsigned int is four, size of int is four, size of long is four, size of float is four, size of double is eight, size of char is one, size of unsigned char is one, size of short int is two, etc. Notice that this proves to us that when the compiler translates our program into machine language, the space that it allocates to hold our data values depends upon the type of the data value that needs to be stored. Therefore, on this machine, the space allocated for an int or a long or a float is four bytes, but the space that the compiler allocates to a double is eight bytes, to a char one byte, and to a short int two bytes.
However, you will notice, although I have not written it here, what happens if you also print the sizes of pointers of all types. I have not declared pointers to all types, but you can try that. You will find that the size of every pointer turns out to be only four. So what it means is that the addressing mechanism for the computer memory is fixed to be four bytes long. The contents of individual memory locations could be one byte, two bytes, four bytes or eight bytes long, depending upon the type of value that occupies the memory location. I will once again repeat the analogy of a housing colony. This is exactly like a housing colony which has a three digit addressing scheme, meaning that the houses are numbered 000, 001, etc. up to 999. So the address of, or pointer to, each house is three digits long. However, some houses may have five rooms, some three rooms, some eight rooms; that will depend upon the size of the family that lives in those houses. There are a few important things about pointers which we must remember. First, the capability of compilers to address locations directly is limited to those locations which can be uniquely represented by the value of a pointer. So what it means is that, of course, a compiler can generate instructions to address all memory locations, but direct access is limited by the addressing scheme, and in general it can address or access one byte at a time. However, most computers have machine instructions which, given the address of the first byte, can take two consecutive bytes or four consecutive bytes out of memory and push the contents into a register, and vice versa. That is where the notions of word length, half word, double word, etc. come into the picture. But we need not concern ourselves with that. Suffice it to say that if all pointers are four bytes long, the memory addressability in this case is 2 to the power 32.
Why 2 to the power 32? Because just as a three digit decimal number can represent 10 to the power 3 numbers, the base being ten, a 32 bit binary number can represent 2 to the power 32 numbers. Observe that we imagine it to start with zero and end with 2 to the power 32 minus one as the largest number. Note that this is not how an integer value is ordinarily stored. A signed integer will range from minus something to plus something, and if it is in two's complement form, as a friend pointed out, the range differs slightly: the largest negative value is one more in magnitude than the largest positive value. However, the range is never this large for normal integers. Indeed, that is the reason why you have unsigned integers. An unsigned integer does not waste any bit to represent the sign, because it is always presumed to hold a non-negative number. So an unsigned integer is the correct equivalent of the address number that is used internally by the machine. To recapitulate, on our machine, since we get four bytes as the pointer length, the memory addressability is 2 to the power 32, and this often depends upon the machine architecture. Somebody had asked the question, what do we mean by a 16 bit machine, a 32 bit machine, a 64 bit machine, etc. Well, suffice it to say that these are defined by the architectural parameters of the computer processor, and typically the addressability of the computer is defined by these numbers. So a 16 bit computer will intrinsically have a 16 bit address space, and a 32 bit computer will have a 32 bit address space. It does not mean, however, that I cannot write compilers which access memory using a larger address space, but that is not what is normally done, and therefore the addressability often depends upon the machine architecture.
Now here is an important point that we make. When a compiler translates our program, it assigns memory locations to all our program variables and converts all references to our variable names into the corresponding addresses of the allocated memory. For example, let us go back a few slides. Here was an example of some hypothetical addresses that I had shown, assuming that the address of the first byte of m is 10000, and so on. Now, in our program, whenever we say m = 573, the machine instruction internally will actually say: store the value 573 into machine location 10000. So it is the compiler which translates my C language instruction, or for that matter any higher level language instruction, into a machine instruction. Within that machine instruction there will be no mention of m. The name m will be removed and replaced by the physical address of that memory location. So, consequently, the compiler generates the actual memory addresses, which are the pointers to our data values and variables. Let us go back to our slide and continue this discussion. Well, if the compiler does all that, then why are we concerned with it? After all, we do not even know what the instruction code of the add instruction is in the machine language. We need not. We write a + b. We presume that the compiler will generate the appropriate instruction, which will be executed. Similarly, when we say c = a * b, we are referring to the data values of a and b; an expression is expected to be evaluated and the final result is to be assigned to the variable c. Why should we bother where exactly a, b and c are located, which means, why should we ever worry about what the pointer to a, the pointer to b and the pointer to c are?
So that is why, in conclusion, on this slide I say that ordinarily we do not need to know these, and in fact most higher level programming languages neither make an explicit pointer visible to your program nor advocate the use of such pointers. Take for example the programming language Java. Java simply does not have a notion of a pointer. Why did this notion of a pointer being available to a programmer emerge? It emerged because of the peculiar circumstances in which the C programming language was developed. The C programming language was designed to be close to the machine. What does that mean? It means that after the C programming language was developed and the first compiler was written, the C language itself was used to write compilers, operating systems and so on. As I had mentioned, Unix was written, or rather rewritten, in the C programming language after Kernighan and Ritchie developed that language. Consequently, the language allowed visibility of these internal pointers by permitting a pointer data type, so that you can write very efficient code, because that is what is required when you write compilers or operating systems. You should also write code which can be most easily translated into machine language, because again that is what is required if you are writing system software, as it is called: operating systems, compilers and so on. So this has been a feature of C right from the beginning, and it has remained. In fact, in C++ also pointers continue to be available. Apart from the fact that pointer types were made available because the language is closer to the machine, there are other things which cannot be done very easily without pointers. We shall see these and remark why pointers are actually important to us even when we are writing application programs. We are not going to write compilers and operating systems. Most of us are going to write applications, but even there pointers become useful. We shall see how exactly they become useful.
However, the key point that I would like to make at this stage was made on the last slide: ordinarily we do not need to know these. Replace the word "these" explicitly by "pointers". Ordinarily we do not need to know pointers. We do not need to use pointers. So if we use pointers, there should be specific reasons and specific advantages. And this is something I would like you to remember to convey to your students. Of course, the syllabus contains pointers, pointer arithmetic, pointer usage, etc. To that extent, we have to teach students how to use pointers effectively wherever and whenever they are required. But our advice to them should be: avoid the use of pointers unless it has a specific advantage in writing your programs. Now, having said that pointer types are made visible to us, what are the features that C provides? Well, C provides us features to determine the internal address of a variable. So we know the variable names and we use them, but in C we have a construct that says: given a variable name, what is its pointer? Similarly, we can use these pointers directly to refer to the data. So referencing data now has two possibilities. One, use the variable name or array element reference. Two, use the pointer which is the equivalent of these variable names and array references. Not only that, the C language permits the use of pointer arithmetic. Pointer arithmetic can be used to move over successive data elements, but one must be careful. If I have disparate data elements like m, n, roll number, salary and so on, it is not desirable to use pointers to move over these different types at all. However, if I have a set of elements of an array, such as an array of 100 elements, then knowing the pointer or address of the first location allocated to that array, a[0] for example, it is much easier to move over the successive elements a[1], a[2], a[3], a[4], etc., or to go backward or forward, because the displacement in the pointer is by a fixed amount.
Notice that if I have a float array, then every element of that array will be 4 bytes after its predecessor element and 4 bytes before its successor element. Consequently, if I have one pointer value available to me and I add 4 bytes to that pointer, then I will get the value of the pointer which points to the next array element, and so on. Why does one have to be careful? Well, we shall see that through a couple of examples. Here is a pointer example; it is of course an artificial example that I have constructed. So forget these #defines; they are merely mechanisms to print integer and floating point numbers. What I am showing here is a sort of programmatic implementation of the small example that we saw on an earlier slide, where we had a variable m, a variable n and a 3 element array a which contains floating point values. So I am using this pointer example program to illustrate what exactly the meaning of pointers is and how they are used in programs. Here I have declared int *p to be a pointer to an integer type and float *q to be a pointer to a floating point type. Please note that it will be wrong in our program to ever use p to point to any one of the floating point array elements. The moment we say int *p, p is bound by the C programming language to refer only to an integer data type. The moment we say float *q, q is bound. It can hold any pointer value, any 4 byte value could be a pointer, but this pointer must be pointing to a floating point value only. We shall see the reason for this binding, but it is a must. So please remember that there is no such declaration as pointer s, where s is a general purpose pointer. So while pointers are general purpose numbers, in this context these pointers are restricted to point to a specified data type only and not to any other data. Given this background, now assume that we have the following executable statements, beginning with m = 5.
m = 5: this is an assignment statement, and the variable m gets the value 5. p = &m: this permits us to calculate the address of m and put it in p. n = *p: notice what is being done here. p is now pointing to m, and in *p the star, by the way, is the dereferencing operator. So *p means the contents of the location pointed to by p. More specifically, since p points to an integer value, *p refers to the contents of an integer location pointed to by p. Obviously, p must at this juncture point to something definite; it cannot be pointing to something arbitrary, and that is what we have ensured by the assignment statement p = &m. Now p points to something concrete, namely the location m. Notice that location m has been assigned the value 5. So when p points to location m, the contents of the location addressed by p are also 5. Consequently, when we say n = *p, n should now get the value 5. So notice that these two statements are the equivalent of saying n = m. Of course, in actual practice we would not go through such a pointless roundabout; if we want n to be equal to m, we will simply say n = m. So please remember that this is an artificial example constructed merely to tell our students what exactly the implication of the use of pointers is. What is being restated here, most importantly, is the notion of referencing and dereferencing: &m gives us the address of m, and *p gives us the contents pointed to by p. If I print the value of n when it has not been assigned any value, I should ordinarily get an arbitrary value, which, as we know, exists in the location for n because of some previous assignment. However, because of the two statements that we have executed, namely p = &m and n = *p, this n will print as the value 5.
Look at a similar but slightly more complex usage of pointers. Here I have an array of three elements, a[0], a[1], a[2]. I am assigning 15.5 to a[0]. When I say q = &a[0], q becomes a pointer to a[0]. When I increment q, q will point to the next element, which is a[1]. Consequently, when I say *q = 25, this *q means the contents of the memory location referred to by q, which is nothing but a[1]. So a[1] will get the value 25. When I again increment q, it will now point to a[2]. Then I say *q = a[1] + 10. Notice that ordinarily such a statement should not make sense, because a[1] has not been explicitly assigned any value by us; we have assigned a value only to a[0]. However, through this jugglery of pointers we have made sure that a[1] contains the value 25, and therefore when we say a[1] + 10, 25 is added to 10 and we get 35. And this 35 goes where? The 35 does not go into the pointer. In fact, we should never assign absolute values to a pointer. That is something that is not illustrated here, but obviously it would be a pointless exercise; after all, we do not know where our data values are. The pointers to the various data values can be generated only by the compiler, and therefore only the compiler knows what pointer value is associated with any one of our data items. We do not know, and the only way to find out what the compiler has done is to use references such as p = &m or q = &a[0]. That is how you get a valid pointer value which points to a defined location in our program. But once a pointer has a valid value, then a star applied to it means the location which is pointed to. So what do these six statements here indicate: a[0] = 15.5, q = &a[0], q++, *q = 25, q++ again, *q = a[1] + 10? They represent the fact that I have assigned 15.5 to a[0], I have assigned 25 to a[1] implicitly, and I am assigning a[1] + 10, which is 35, to *q, that is, to a[2]. If I have a doubt, I can print the values of a[1] and a[2]; for
good measure, I can also print a[0]. This will confirm to us whether, when I execute the program, it works the way that we have reasoned out, or whether there is something wrong. So here is the execution of the program. By the way, this program will be put in a tarball along with your assignments on the Moodle site, so you can download it and check it. Instead of having a separate text file containing the output and a separate file containing the program, what I have been doing in the nights is this: I run these programs, create the output, then copy the text output and paste it as comments at the end of the same program. So you will notice, going to the previous slide, that this is the program, and it ends at return 0. Originally I would have had a closing brace here. On the next page I have inserted a multi-line comment between slash star and star slash, and all that I am doing in this multi-line comment is inserting text which represents exactly what you will see on your terminal when you execute this program, after compiling of course. I have shown here the cc command on the pointer example 1 C file, so this will compile it. When I execute a.out, I will get 5, 25.00 and 35.00. Why is 5 shown as 5 while 25 is shown as 25.00 and 35 as 35.00? Because 5 is not the value of the first element of a; it is the integer value from the previous group of statements. Just go back to the program itself. Here we have two groups of output statements. One prints an integer, and notice that we had assigned a value to n through pointer usage. m was 5, all right, but n is also expected to become 5, and it does indeed become 5. a[0] is 15.5, so we are not printing it, but for a[1] and a[2] we want to be sure that they are what we expect them to be. a[1] we expect to be 25, and a[2] we expect to be 25 plus 10, which is 35, which indeed is the case, as you can verify by running the program. So let us look at the basic input output in C. First and foremost, C does not have any instruction for reading or writing
data. Now, that weighs very heavily on the mindset of our students. Any other programming language, right from the simple BASIC programming language or the older languages such as FORTRAN, has input output instructions; most languages do. But C and its derivative languages do not have explicit input output instructions. Therefore, all input and output to the external world is performed through functions. Now, because that is so, and since input output is naturally an important activity that all programs must perform, special functions have been written, and they are made available through the standard library to perform this I/O. The most important input output that we are directly concerned with as programmers and users of the machine is the input that we give through the keyboard and the output that we see on our terminal. The input that we give through our keyboard is textual input. Symbols are input: letters, underscores, numerical values, full stops, commas, whatever. Similarly, what we see on the terminal are also symbols exactly like that, and that is the reason why this kind of input output is traditionally called text input output. This text input output, as you know, is also seen as a natural input output process, and therefore the disk files which get defined, and which we shall be seeing in greater detail tomorrow, are often categorized into two types: text files and binary files, that is, files which do not contain text as we understand it and as we can see it. Now, scanf and printf are functions. As teachers, you would be familiar with these. What we are going to very briefly describe is how exactly we are going to teach students to use scanf and printf. First of all, these are not very straightforward and easy functions to understand. These functions involve understanding the notion of a pointer, understanding the notion of a string, understanding the notion of formatting specifications, and understanding the notion of associating different format specifications in a string which looks
very familiar to us to different individual variables and other elements, to which either the input values are to be assigned or from which the output values are to be generated. So we can say, almost pedagogically or sort of empirically: look, scanf and printf are the two functions for reading data and writing data. The parameters to these functions include a format string followed by the data values, which for printing could be expressions, to be read or printed, and the C library applies the appropriate pattern to each value for interpreting the input characters or for generating the output characters. So it could be as simple as that. The subsequent slides include a very detailed description of the printf statement. Most of you would be familiar with that, but I thought it useful to actually go through the detailed specification, which we may give to our students although they may not fully understand its significance if it is introduced in the early stages. So one way is to start using scanf and printf without too much explanation and without using all the features. As a matter of fact, most of us who have written a large number of C programs may not have used all the features of scanf and printf, because these features are really rich. The traditional way to use printf is to say printf("%d is a number\n", n). This backslash n is actually a character within the string, but %d is a format specifier. So this format specifier %d is used to interpret the value of n, convert it to a formatted value and send it to the output. If there are multiple values, such as n, m and some other elements, then there will be multiple format specifications, one for each value. The specifiers can appear anywhere in the format string, but each must correspond correctly to its value. That means if I have, say, 5 variables here, one floating point, one character, one something else, then the format specifiers must be of those types, and they must occur in the same order in which these
variable names or expressions occur in the printf parameter list. This specification I have taken from the open source C book. You remember I mentioned that there is a book written in 1991, and although the standard used there has been superseded by the 1999 standard, much of the material is directly usable. The book is very rich in giving fairly simple to complex examples of most features, and what I find is that the book is almost neatly organized around the syllabus that most of you follow in your universities. It may be a good idea to tell your students about that open source book. That is why the PDF version has been kept on the Moodle site as also on the website; you can download it, and you are free to distribute it to people. These slides are not screenshots, but the contents are from that book, so I will not go over the details. I have included these slides here for the sake of completeness. So I have the output format specification. This is what is important: in a complete specification I have flags, width, precision and length, followed by the conversion. We are generally familiar with the conversion, such as d or s. We are also familiar with the width, as in 6d or 8d, and with the precision, as in 6.2. So we are familiar with the width and the precision, but we rarely use the flags. What I mean is that it is useful even for us teachers who have taught programming to once in a while go through these specifications, look at the examples and be ready to answer the questions of the more curious students, if not actually to introduce some of these notions to them and let people use those which are perhaps most frequently required. So here is the description of flags, field width, precision and length, and these are the format conversion specifiers. So look at these: d, i, u, o, x, X, f. Hexadecimal and octal we rarely use. It may be a good idea to explain to
students how these format conversions could be used. Then there are rules on exactly how the precision is to be interpreted and so on. So these are the various format conversion specifiers, and here there are some examples: %6d, %7s, %8.2f.
Here is an example of common day-to-day printing that we will be required to do. We have a, b, c as integers and p, q, r as floating point, and these are the values defined here: -1, 10, 100, 123.456, 0.1234 and -12.34. There is a particular reason for these values, which will be apparent when we look at what happens when we print using different format specifications. Notice the first three statements: printf with %5d for a, b and c. The values will appear as -1, 10, 100; remember that these appear properly right-aligned. But look at what happens if I change the specification to %2d, which means only two character positions are allocated to print the number. Here the first number is printed correctly because it is -1, and the second number is also printed correctly because it is 10; both fit within the two spaces. The third number is also printed correctly, but not within the two spaces that we have provided. Instead it overshoots its space allocation by one position: the value is 100, and the full value 100 is printed. This kind of simple example is very good for telling our students some basic things, and it can be given pretty early in the course. So once again, I have a, b, c as integer values, deliberately chosen so that we can explain what we just saw.
Similarly, let us go to the next slide. Here I have chosen to print the floating point values in %8.4f format and then the same values in %4.2f format. So we can tell our students that if the format is 8.4, the values are printed properly, with four digits after the decimal point. Of course the first value has only three such digits, so since there is no fourth decimal digit, a 0 is printed here. The negative value is also printed correctly. However, if I say 4.2, look at
what happens. The fractional portion of the first number is not truncated but rounded off, because only two digits are to be printed after the decimal point, so instead of .4560 you get .46. Look at the second number: it was 0.1234. Here it is rounded down, so in the round-off you lose the last two digits and it is printed as 0.12. Look at the third number: it is -12.34. Observe that if I have to print it in 4.2 format, two digits have to be printed after the decimal point, but the integer portion requires three spaces, two for the digits and one for the sign. Observe that the actual value overrides the format and it prints as -12.34. All that we need to tell our students is: please allocate adequate space for printing the numbers, and please ensure that when you print numerical values they are preferably aligned. Right alignment of numerical values and left alignment of character strings is considered a standard reporting format in the English language, and people should be made aware of that and of using it properly. With this we will take a 5 minute break. Thank you.