So, today we are going to discuss some more examples of string handling. More particularly, we will look at the input and output handling of characters. So far we have been using the C++ stream objects cin and cout for handling input and output. As you know, cin and cout use the >> and << operators to extract or print the individual values that we name in those statements. Now, the input that you type usually has these values separated by one or more blanks. So, whenever you give a blank, a value is supposed to have ended and the next value starts. What if you want to read a blank character itself from the input stream? Similarly, whenever you press enter while giving input, the present value terminates. What if you want to capture even the enter symbol itself, because enter also has an ASCII code? How do you generalize the input operation to capture each and every character that is typed, without bothering whether it is part of an integer, part of a floating-point number, part of a char, whatever? Obviously, such characters can be captured only in a char type variable. So, you can declare a char type variable, but the issue is still how we handle input and output of individual characters. In the process, we shall also reiterate that internally a character is represented by the ASCII code that corresponds to the symbol stored in that char variable.
Immediately after this, we will proceed to discuss an important concept in C++ called pointers. These pointers are essentially addresses of memory locations. Ordinarily, we do not deal with addresses. We deal directly with names, m, a, whatever, and we refer to any value by the name that we have given to it. But it is possible to refer directly to the addresses in which these values reside and to do some arithmetic with these addresses, which provides some interesting possibilities for programming.
So, we shall look at pointers and pointer arithmetic. Subsequently, we will look at functions again. So far, we have seen that functions are called by transferring the actual parameters that you mention in the function call to the formal parameters defined in the function header. These actual parameters are copied onto the corresponding formal parameters. That is one way of passing parameters, called parameter passing by value. There is another mechanism called parameter passing by reference, and we shall see what that means. We may not be able to cover the latter part in this lecture today. We will do that on Friday, when we discuss the general concept of files as well.
So, this is what we have noticed so far: as far as handling characters is concerned, we can define a char type variable, say a, x, y, pc, whatever, and store one character in it. What gets stored actually is an ASCII value. If we have to store a string of characters, it can be stored in an array. The convention that C++ follows is that whenever you have a string of characters stored in an array, you invariably put a '\0' (backslash zero) as the last character in that array. It is an artificial symbol; it is not part of your string, obviously. '\0' is the null character, a symbol whose ASCII code is literally 0. It is not printable, you cannot see it, but its ASCII code is 0. So whenever a 0 comes at the end of a string, any processing can terminate, because you know that the string has ended. We have seen that we can directly read a character string typed as input into an array of characters. For example, if I declare char str[50], then somewhere in my program I can say cin >> str. cin identifies str as a character array, and therefore it knows that it has to get a sequence of characters from the input and put those characters inside this string.
Assume that I type in, let's say, the name of our country, India, and let's say I put a blank here at the end. The moment the first blank comes, cin will terminate input, because that is the characteristic of cin: cin, remember, is supposed to identify and isolate the different values that you type at the input. The first blank will terminate it. What will go into str will be I, n (sorry, this will be a small n), d, i, a, and then a '\0'. So I will be the contents of str[0], n will be the contents of str[1], and so on. str[4], the element at index 4, will contain the symbol a, and '\0' will be the content of str[5]. Please note, therefore, that if you want to store a string in a char array, then the length of the actual string stored can be at most one less than the size of the array, because you require an extra element to store the '\0' at the end. This is how cin works.
The question that we are raising is: what if I want to read these blanks as well? Suppose there are these two blanks at the end, or suppose I put a comma somewhere, or an exclamation mark after the blanks, and I want to say: look, I will internally analyze what the contents of this string are; whether they contain integer values, floating-point values or some arbitrary symbols is my business. I do not want you to meddle. I want you to quietly give me all the characters that I type as the input. That is not what cin >> does; it will not work here. There are two variants of this that we would like, for which C++ provides special functions. These are called character I/O functions. The first of these reads a single character. It is invoked as getchar(). It has no parameters. The moment getchar() is invoked, whatever you type on the terminal, any character, any symbol, any key, is captured and given back as the return value. That would be assigned to c if you have written c = getchar(), having declared c as a char variable. Note that this will read every character.
getchar() will not only read a blank, it will also read the enter character. Enter itself has an ASCII code, that of the newline character, so it will read that too. Usually, if you want to read a series of characters using getchar(), you would use it in an iterative loop. So you can say while (true), c = getchar(), and then do something with that c. The problem is, how do you terminate this loop? Anything that you type is a valid input; it will be taken in. So you will have to have some special arrangement built into the logic of your program to terminate such an input. We may, for example, say: look, capital X is a symbol which I will never type ordinarily, so please look at all the characters that I type until I type capital X. Obviously, when capital X is typed, it will also be read by this function. But then I can check for it in my program, and if I find the capital X, I terminate the loop. Capital X is an artificial example. Usually you will pick a symbol which is not ordinarily expected to appear in your string.
Invariably, our actual requirement in practice is to read one full line of characters. That is, whenever I press enter, I say this line is terminated. That may not make sense for numerical values. Where I have to read, let's say, 1,000 values, I might arbitrarily type seven values separated by spaces on the first line and press enter, then eight on the second line, ten on the next, and so on. That is permissible. But it is different when I am handling text information. Take, for example, the roll number of a student, which incidentally in IIT is a string and not an integer, because a capital D is there sometimes, and so on. The name of a student, which could be first name, middle name, last name, et cetera. The batch number of a student, the marks of a student. And as I said, when I have to input a large set of values, I will not be typing it physically.
I'll be preparing that input in a file, and I'll be reading from that file directly. How will I prepare the file? Invariably, I will prepare it such that one line of text corresponds to one student, and the next line of text corresponds to the second student. That is natural. So roll number, name, marks, let's say, enter; roll number, name, marks, enter. This is one example where I want to read the entire string until enter is typed by someone. Then I want to read the next line until enter is typed. And I want to retain this entire string in an array. I will then take the responsibility of analyzing the contents of this array: which part is the roll number, which is the name, et cetera.
For such a situation, you have a function called gets, for get string. This is an old, deprecated function (it has since been removed from the C and C++ standards). So some modern C++ compilers may not support it, and others will warn that it is dangerous to use. Let us see what the danger is. The objective of this function is to read all characters typed as input until it encounters an enter. The moment enter is pressed, the enter is actually read by the operating system. But gets will sense that enter, terminate the input operation, and instead of the enter, insert a '\0' in the array that you gave it. So consequently, taking what we saw on the previous slide, if I type India, then when I execute gets(a), I will get India followed by '\0' in a. Let us see some examples of how we use this.
So these are some example problems. The first problem says: read all characters until I type capital X. The moment I type capital X, I should terminate the input operation. I want to store all the characters which I have read in an array, and finally I want to print the ASCII value of every character that is stored in that array. A simple problem.
The next problem says the same thing, but it says: read a string using gets, and do exactly the same operation as earlier, except that there may not be any artificial capital X typed in. When I read a string, the moment I type carriage return, or enter, the string will terminate. So I have to do the same thing.
The third and fourth problems are more interesting. I have read a string. Now I am told that this string contains the full name of a person. Usually it is a first name and a last name, and these two parts could be separated by one or more blanks. My job is to read such a string, isolate the first part, isolate the second part, and assemble them in two different strings. That is one problem. The next problem is more general. It says I have typed some words on a line, say how are you, with some blanks in between. If there is exactly one blank between every pair of words, then it is relatively easy. I can read the entire string into an array, and I can simply start scanning the array elements from 0, 1, 2, 3, 4, 5. The moment I get a blank, I know one word has ended. I will put a '\0' at the end of that word, start a new string for the second word, and carry on like that. But suppose I have a variable number of blanks. Suppose somebody says that I may put any number of blanks in between. Worse still, after I type the last word, I do not press enter immediately; I press some blanks also. Still worse, the first word which I type does not begin at the very first character position either. I say blank, blank, blank, how, blank, are, blank, blank, blank, blank, you, blank, blank, blank, blank, blank, enter. Now that is a more interesting problem. How do you extract these different words after having read a string?
The handout that I have given contains some of the programs that we will discuss. Please pay attention to the discussion here. You can read these contents later.
Whenever I wish you to refer to a page, I will mention that. We will look at some of these problems. Here is a program which reads individual characters typed by the user. Look at what the program declares. It declares a character variable c. It declares an array a of 256 elements. We are assuming that you will input at most 256 characters. Please note that in this particular problem we are not assembling a valid string, and therefore it is not necessary to put a '\0' symbol. We are merely storing the individual characters; the array is being used as a dumping bin, that's all. We are never going to treat that entire array as a single string, and therefore we can accommodate all 256 characters in the array. There is no need for a '\0', because the problem says whatever is typed should be stored, and at the end you should finish off by giving me the ASCII values of these characters. nchar is used to represent the number of characters, so I set it to zero. cvalue is the ASCII code of a particular character c that I will read or analyze.
This is the program. So let's see what I do in it. I start by reading a character. Whatever I type, getchar() will get me that. Please note that I need to check whether capital X has been typed or not. If capital X has been typed, I have to quit. So I set up a while loop: while c is not equal to capital X. Ordinarily that would be sufficient. But since I intend to put this character in an array element, I must ensure that the array does not get filled up and that I don't try to push anything in beyond its capacity. Therefore, since nchar is the number of characters that I hold at any point in time, that number should be less than 256, because the array has indexes from 0 to 255. Note what I am doing. Whatever character I have read, I am actually assigning its ASCII value to a[nchar]. Remember, a was int; a is not a character array. Let's go back: int a[256].
So I am not going to store characters in this array. I am going to store only the ASCII values of the characters. As you know, c has a dual representation internally. c is actually a number; a char is a small integer, and we can think of it as an unsigned value that can represent every ASCII code. Whenever I assign it to an integer variable, or whenever I use it in an expression, its intrinsic integer value is used, which is the ASCII code. So this will capture the ASCII code here. Immediately thereafter, I increment nchar, to keep myself ready to capture the next character in that array, and I read the next character. So, as is standard in any while loop, I set up the first character, and then I keep reading characters. Remember, after c = getchar(), I will be thrown back to the while condition. If the last character read happens to be capital X, I will get out. On the other hand, whatever the last character read was, if nchar has already reached 256, that means elements 0 to 255 are filled up. I have no more space, so I will come out here. In any case, nchar will represent the current number of characters read, whether that is the full 256, that is, elements 0 to 255, or fewer. So I will set up a simple for iteration, i equal to 0 to nchar minus 1, and I will output a[i] here. This output is valid integer output, because a is an integer array. So I will print the ith element. Is that clear? It is a very straightforward program. I put an end of line after printing all the values.
This will be the program output. Suppose I type the symbols 1, 2, 3, 4 on one line, separated by blanks, and press enter; then 5, 6, 7, another enter; 8, 9, enter; 0, enter; X, enter. getchar() starts capturing these values. What is the ASCII code for 1? 49. What is the ASCII code for a blank? 32. What is the ASCII code for 2? 50. The ASCII code for 3 is 51, and for 4 it is 52. Notice that what I am getting printed is 49, 32, 50, 32, 51, 32, 52. The 32s are all blanks; 32 is the ASCII value for a blank. But after 52, I get a 10. What could this 10 be? Enter.
That 10 appears because after 4 I have an enter symbol, not a blank symbol. Note that if, after 4, I had typed one blank and then pressed enter, there would have been a 32 before this 10. So although something is not visible to us, when we capture input character by character, that value will be captured. This is how it will continue to print. The last values are 48 and the enter: 48 is the value for 0. After that, I typed capital X. But notice that my program terminated the moment it read capital X, and therefore capital X is not printed. Is this understood?
Can I make this program simpler? Yes, I can. So this is another version. In this version, I have the same declarations. I start with the same thing, nchar equal to 0. But I don't read any character to kick-start my while loop. Notice the way the while loop is written. It's a very interesting implementation, shorter and more elegant. Look at what I am doing: while ((a[nchar++] = getchar()) != 'X' && nchar < 256). Where is the body of the while? There is no body; I have put a semicolon. So what is it repeating? It is repeating the condition itself for as long as the condition remains true. Now, we have said that in any while loop, the body must change the condition; otherwise the condition will remain perpetually valid or be immediately invalid. But notice what we are doing here. First of all, let us analyze this assignment. This gets a character, getchar(). So this is the first character that is being read. After reading it, it transfers that character to a[nchar]. The current value of nchar is 0 initially, so it will transfer the first character read to the 0th element. After doing this operation, it will compare it with X. The first character is unlikely to be X, so the comparison will hold. Then it will also check: is nchar less than 256? By the time this check is made nchar has already become 1, because the && operator finishes everything on its left, including the increment, before it looks at its right-hand side. So nchar is certainly less than 256.
Please remember how this post-increment operation behaves. Although I have said nchar++, the value used as the index in a[nchar++] is the old value of nchar, in this case 0. The increment itself takes effect as a side effect, and it is complete before the second half of the condition, nchar < 256, is examined. So the character goes into element 0, and then nchar becomes 1. There is no body in this loop, so the condition is simply evaluated again. The next character that is read will go into a[1], because nchar has changed. Again the same thing will happen: nchar will become 2, and so on. So you will notice that this loop does essentially what the previous loop did; one difference is that the terminating capital X is itself stored and counted before the comparison stops the loop. But I get rid of the initial getchar() for the first character that kick-started the earlier loop, and I do not need a body, because all the operations that I was doing in the body of the iteration I am now doing in the while condition. This is very commonly used, and you will come across it in many examples on character handling or, for that matter, array handling. The rest is of course the same. I just output the ASCII values. I have a single statement in the for loop, so I just put it here. Using the for loop, I go from 0 up to nchar. So this is clear, how you handle this?
Here is another way of doing it, except that it reads characters up to the end of the line. What I am doing here is saying gets(a) directly. Notice that a is now declared as a char array. So gets(a) will get a string from the input, and it will get the string including blanks, capital X, whatever, until enter is pressed. The moment you press enter, a '\0' will be inserted. So the char array a will now contain a proper string terminated by '\0'. How do you scan any proper string? The standard way: while a[nchar] is not equal to 0. Not equal to 0 means not equal to '\0', that is, not the null character.
So as long as you have not found the null character, you have to keep going; you continue the loop. And what you are doing inside is outputting cvalue = a[nchar++]. You see what we are doing? We are incrementing nchar inside the index expression itself. The element fetched is the one at the old index. So nchar is initially 0; the ASCII value of a[0] will be captured in cvalue, and that becomes the value of this expression. That value will be printed, followed by a comma and a blank. And after that nchar will be 1. Again you will execute this iteration, and so on.
This time, do you need to check for nchar being less than 256? We are not checking that. The reason is that we are guaranteed that the gets function would have inserted a '\0' when it read the string. That is the difference between getchar and gets. gets will read the string until you press enter, and whatever the string is, 20 characters, 25 characters before you press enter, instead of the enter it will put a '\0' at the end. So you are guaranteed that when you get the string back, you will have a '\0'.
There is only one problem with gets, and it is one of the reasons why it is a deprecated function. It does not check whether the array bound is being violated. So suppose I type 500 characters before I press enter. I am sunk. What gets will do, very methodically, is push all 500 characters into that array. Of course, the array is declared to have only 256 elements. But it will go on to the next location, and the next, and the next, and keep stuffing characters, and after the last one it will put a '\0'. So I will not encounter a '\0' within the array, and I will keep reading past its end. However, the way the program is written, it will appear to work in most cases, and if you have input 500 characters, you will typically get all 500 characters printed, even though the program has overrun its array.
Array bound checking is your responsibility. And since gets would have stuffed the characters in consecutive locations, you will get that many characters back. Of course, in the process something else may get chewed up. For example, if nchar itself happens to be a variable whose location comes after the array a (you don't know), then nchar will contain some funny, arbitrary value to begin with. So funny things may happen if you are not very careful.
Let us do one more thing here. Suppose we are not comfortable with this function gets. Can you write gets yourself? Instead of calling it, I want to write some code which will get me a valid string in a by using getchar(). Can we do that? Yes, we can. What will I have to do? We know what gets does. It reads every character until it finds an enter symbol. When it finds an enter symbol, it puts a '\0' in its place. How will I do that? I have already started with nchar equal to 0. So I will use the same logic that I used earlier, while c = getchar(). But instead of c = getchar(), I will read into a itself: a[nchar++] = getchar(). Suppose I wrote something like this. This will read a character into the array a. It will first read into the 0th element and make nchar equal to 1. The subsequent execution will read into element 1, and so on. But this time I need to check for something else, which is '\n', the enter symbol. So this loop says: while you have not typed enter, keep reading characters. When you do type enter, you come out. When I come out here, the value of nchar will actually be the number of characters in the array, because it would have been incremented by 1 at the end of each read. And the last character stored would have been '\n'. However, does nchar represent the length of the string? We have not put a '\0' inside that array yet. So if I want to simulate gets, it is my responsibility to put a '\0' at the end. So suppose I said a[nchar] = '\0'. Will this work? You say yes; I say no.
Okay, what was the last character that you read? The last character that you read was '\n'. At that time nchar had a value of, let us say, 5. So the element at index 5 is '\n'. But you do not want '\n' in your array. That is not what gets does. Not only that, this 5 would have become 6 by the time you exit the loop, because I have said nchar++. Remember, this statement I am executing outside the loop, after I come out of it. There is no body for the loop. So consequently, suppose I had typed India as my input. (Okay, oops, what happened?) I had typed India as my input. The I would go into which element? The 0th element. The n will go into element 1, the d into element 2, the i into element 3, the a into element 4. Agreed? Now I would have entered a '\n' here. This will go into element 5. And when I come out of this loop, my nchar would be 6, because nchar has been incremented. Where do I want to put the '\0'? Here, at index 5, which currently holds '\n'. So I should say a[nchar - 1] = '\0'. Now I have formed a valid string. So anybody who, on Ubuntu or any other machine, has a problem with gets can simply write this, and it will do exactly the same thing.
So are you comfortable with how the input is handled? The output is handled in the same way. In fact, I can print a character string a directly with cout. We have seen how that happens. If a contains a valid string, terminated by '\0', then cout << a would print that string up to the '\0'. cout is capable of recognizing the '\0' in a string and terminating the output there.
Sorry, I forgot to put this bracket here. But anyway, now the first part of the condition: observe that it is not really a condition. In fact, this a[nchar++] = getchar() is very unlikely ever to be false, because it is not really a condition. It is an operation.
I am cheating. Why? In the guise of a condition, I am actually asking it to do an operation. Technically, an operation can give a false result if it results in 0. If the value of an expression is 0, then that condition is false. That is the nature of condition evaluation. Now, what happens when this statement is executed? I am getting a character from the input. Notice that I can type any character except the one whose value is 0. I can never type that: not the symbol 0, but the value 0. If I type the digit 0, its ASCII code is what is captured and assigned to a[nchar], and that is not zero. nchar will be incremented only afterwards. So this part of the condition is never false.
Yeah, that's right, the old value of nchar is used first and it is incremented afterwards. No, no, no: nchar is always incremented once that operation is carried out, independent of what the operation produces. His question was: if this condition happens to become false, which means I have got a '\n' and I will come out of the loop, how can nchar be incremented? The answer is that incrementing nchar is not optional; it does not depend on what happens to the condition. The moment I have said nchar++, the compiler will introduce an instruction, nchar = nchar + 1, which is executed regardless of what the result of the condition is. So the post-increment operation is a very powerful fellow. It does not depend on what happens to the rest of the expression. Whatever happens to the expression, nchar will be incremented. Good question. This point is to be remembered: post-increment is forced. It will always happen.
Here is another problem. I have typed in a series of names of my colleagues. Nandlal, Sada, Mureshwar, Pujada, Emile, Sonia, Jitriwan. All that I want is a program which will separate out these names. You can try this out in the context of the more general problem which I will discuss here.
But this problem appears simple. You can define a string called first name. You can define a string called second name. You can start assembling characters, as you read them from the input, in the first name. The moment you come across a blank, you terminate that string by putting a '\0', and then start assembling the second string. And you do the same thing when the second string ends. Very simple. The problem arises if there are multiple blanks in between. Problems would arise if there are blanks after the second part. Problems could arise if there are blanks before the first part. So you have to check whether your program works correctly in such situations or not. I will leave it to you to write that program. But this is the program that we will briefly discuss, for which I have given one sample solution in the handout.
So here is a more general problem. I have typed in words. I want to read the string and analyze it, identifying the different words. A word is any contiguous set of non-blank symbols. So if there is a blank, it starts a different word. This is a simple example. How many words do I have here? Five words: hello world, how are you. This is a well-formed sentence, with one blank in between. But this is a sentence in which there are three blanks, two blanks, five blanks, and there are blanks at the beginning. And although you don't see it, I inserted some blanks at the end here when I typed it. Trailing blanks are the most difficult to verify visually. You can't see them. But in actual practice, you may get anything that somebody types there. How do you do this problem?
So first, we'll do some analysis to figure out how we will actually read a string and store the different words. Typically, this is my reasoning. I will type a line which is physically one line of the terminal. A monitor line has 80 characters; it is 80 columns wide. So I will generally not type a line of more than 80 columns. That's my assumption.
So I can define a string. This is one char array that I will declare: linestring. By using gets, I can read the line into this string. Now, in this string, I have to identify the different words. What is the longest possible word? All 80 characters, just a single word. What is the smallest possible word? One character. If I type one character, blank, one character, blank, one character, blank, I can get up to 40 words. Each word potentially could be up to 80 characters long. So the best bet for me to assemble these words is to declare an array of char... oh, sorry. So I have a two-dimensional array, a two-dimensional char array. How many rows does it have? 40 rows, because I don't expect more than 40 words to come up. Each word will be stored in the columns of one row, and the last symbol in that row would be a '\0'. That is exactly what I want to end up with. Of course, I won't get a '\0' when I type the string. I'll simply get words, blanks, et cetera. It is my job to assemble them.
So effectively, what I want to do is read a string into linestring. Then I want to start scanning from the first position, whatever it holds. When I come across a non-blank character, I know the first word has started. Let us say this is my two-dimensional array. So in word[0], I should start assembling from the first non-blank character. That means I should skip all the initial blanks; I should forget those blanks. Then what should I do? I should put the h of hello here. Then this should be followed by e, and that should be followed by l, l, o. So I have finished this. After that, the moment I notice a blank, I know one word has terminated. So, before going ahead, I should put a '\0' here, and then increment the row index to 1, because now I know that I am adding the next word. So can you see now how the iteration will work? There has to be a single iteration which covers the entire string from start to end. Within that iteration, I have to keep skipping blanks.
The moment a blank comes, I have to terminate the present word and start a new word. Of the program that has been written here, I will just write the operative part. Do you remember the string length function? gets will get me a valid string terminated by a '\0'. When I use the library function for string length, strlen, it will do an internal scan of that string, go up to the '\0', and return the length of the string. So nchar is the length of the string. If I have typed 45 characters, positions 0, 1, 2, 3, up to 44 would contain those characters, position 45 would contain a '\0', and the length of the string will be 45. All that I need to do now is set up an iteration: for i equal to 0, i less than nchar. I require only one iteration. It amounts to one scan of the entire string, starting with the 0th element and going up to the '\0'. Now remember, during this scan I will somewhere come across hello, somewhere world, somewhere how, somewhere are, somewhere you, et cetera. In this particular version of the program, I have assumed that I start with a non-blank character.
So the first thing I check is whether a character is blank or not. If it is not blank, I have to assemble it into the current word. What is the current word? The starting point is 0, 0. The variables which I have declared for that are j and k: j is the row number and k is the column number in the word array. So if you look at the next condition, it says: if linestring[i] is not equal to blank, that means I have found a valid character, and I must insert it in word. So I now say word[j], which is the jth word; notice j starts at 0 anyway. I have to insert the character into the kth column. k was 0 initially, but immediately after the insertion I should add 1 to k, so I put k++ here. All that I am doing is picking up the ith character from linestring, which I have just confirmed is not a blank and which therefore belongs to a word, and putting it in the next location of the word that I am looking at right now.
And I increment the character count of that word by incrementing k. I will do one more thing here, which says blank flag is 0. What does it mean? I am setting up a flag. If I encounter a blank, I will raise that flag to 1. If I don't encounter a blank, I will keep it as 0. But since I do not know what the previous character was, I will do this setting every time I look at a character. If that character is non-blank, which is what it was right now, then the flag must be 0. So I set it to 0. That is my convention. If I encounter a blank, I will set it to 1. This is all that I need to do in this loop if I find a non-blank character. Whatever else I need to do inside this loop deals with the situation where I have come across a blank character, or a sequence of blanks, etc. I have nothing to do with that if I have found a non-blank character, and that is why I use continue here. This continue will automatically skip the rest of the body of this for loop, so whatever comes next will not apply. So see how I am using the continue statement. I am not jumping out of the loop. I am jumping to the next iteration. Consequently, what this set of statements does is: if it finds a non-blank character, it puts it in word and simply goes on to the next value of i, because the continue statement forces the next iteration. And I will continue going around till I come across the first blank. Suppose I have hello, followed by the first blank. I will still make this test, but I will not execute the body of this if statement. So that means if the character is non-blank, I would not have come here. Now when I come here, there are two possibilities. Either I have finished the string, in which case this was the last word and I get out. Or I have found a blank character. If there is a blank character, what am I supposed to do? I will set the blank flag, that is, make it 1. And I will keep skipping those blanks.
However, when a non-blank character comes, I will have to start assembling a new word. But have I ended the previous word? Remember, so far I was assembling the 0th word. I have put H-E-L-L-O and I am waiting for the next character there. So when I come here having found the first blank, what must I do? I must go to word 0, put a backslash 0 in that position, and then increment j so that I start assembling a new word next time. This is what this program does. I would request you to look at this program carefully back home, try to hand-execute it for hello world or some two or three words like this, and check whether it works in all cases. There is at least one case in which this program does not work. I would like you to find that case, and to find out what you should do to the program so that it works correctly. It's a very small thing that you have to do, but you will have to think about the circumstances under which this program will not work. So you may try the same string, hello world, how are you, or whatever, just two or three words, and try different combinations: more blanks here, some blanks there, some blanks at the end, some blanks at the beginning, and figure out what happens. Is that okay? So you'll be able to answer this.
Next we come to revisiting memory locations. We'll consider the notion of a pointer that is available in C and C++. Ordinarily we know that memory is organized in bytes. A byte is too small to hold larger numbers. So we have, for example, two-byte, four-byte or eight-byte locations depending upon whether we say short int, long, double, float, whatever. So consider, let us say, the variables and arrays that I have declared here: int m, float a[3], char c[4]. This declares a single variable m. How many bytes will it have? Four bytes. An integer in Ubuntu is four bytes. float a[3]: how many bytes in total will this array have?
Three elements, four bytes each, four times three, twelve bytes. char c[4] is a four-element array. How many bytes will it have? Only four, because a char is allocated only one byte. These are some sample values that I have shown here; I have done some arbitrary assignments. m = 573. Sorry, there is a semicolon missing at the end. What is the objective of writing all of this? What we want to see is how exactly the C++ compiler is likely to allocate memory to these variables. What would be the addresses, and what would be the values stored in those variables? So we see what we may call a memory map, a possible memory map. We are making certain assumptions. The assumption that we have made here is that all the locations are allocated in the same sequential order in which I have declared those names. That assumption is not right, by the way. We shall soon see that C++ can freely choose to put one variable here, another variable there, one array there, etc. What a compiler guarantees, however, is that if an array is allocated space, then the elements of the array will necessarily be allocated consecutive locations. So if a[0] starts at some point, a[1] will necessarily start at the next point after allocating as many bytes as are required for a[0]. Here is an example. We assume that m is allocated at address 10000. This is a sample value. The number of bytes needed is 4. The array a is allocated next. a[0] will be allocated the byte address 10004. This will require 4 bytes, so a[1] will be allocated 10008, and a[2] 10012. There is no confusion there: 4 bytes, 4 bytes, 4 bytes for a[0], a[1], a[2]. Immediately after this, suppose array c has been allocated, and let's say c[0] is at address 10016. It contains, let's say, the character u. c[1] will be at 10017. Why? Because characters require only one byte. So you are comfortable with this mapping? Now ordinarily in our program, we refer to these values by these names: m = 573, cout << m, p = m.
So whenever we say m, we mean this value. We do not have to deal with these addresses at all. But can we, and should we, is the question. Should we deal with addresses directly? The current pedagogical answer is no. You should not deal with addresses directly. Can we deal with addresses directly? The answer is yes, because people who write compilers and people who write operating systems have to deal with these memory addresses, since actual memory locations participate in every instruction. So surely software can access those things. Should that access be made available at a higher level, in a language like C++? The designers of C and C++ said yes, it is required. It is required because adequate abstraction capabilities did not exist in the early programming languages. The very first programming language, Fortran, for example, had nothing to do with addresses. It would work strictly at the higher level, with names. But the language C, and therefore the language C++, provided for pointers. Address handling is done through a special variable type called the pointer type. This pointer type is completely different from integer, float, etc. We shall see that notion. So these addresses are essentially pointers to memory locations. Notice that each one points to a location containing a value of a specific type. For example, 10000 is a pointer to m. It is pointing to an integer type value. 10012 is pointing to a floating point value. I would like you to ponder this distinction. Both 10000 and 10012 are addresses. 10000, 10001, 10002, each one is an address. In absolute terms an address is a numerical value, because addresses in any memory start from 0 and go up to millions or billions, depending upon whether you have megabytes or gigabytes of memory. So in the address value as such there is no distinction. All addresses are the same. However, inside a programming language you want to remember that this address points to a specific type of value.
One address points to an integer type value. Another address points to a floating point type value. If you don't remember this and try to manipulate addresses and their contents directly, you may end up in the soup, because you don't remember what kind of value is there. And therefore the pointer notion which C and C++ define, as we shall see in a moment, is always associated with a specific type. What it means is: yes, we can deal with addresses directly, but we cannot deal with addresses in an absolutely indiscriminate fashion. Every address must come tagged with the type of value that it points to. And it is that type that will not permit us to mix addresses of different types. We shall see that in a moment. So, as I have said here, ordinarily we don't deal with these pointers or addresses. We use names instead of addresses. But C++ permits pointers. A pointer is a special type: the address of a location can be found and stored in a name of that pointer type, and we can also use the pointer to get at the contents of the location it points to. So here is an example of such a location, m. This is the location m. What are the contents of m? 573. What is the address of m? 10000. Now suppose I had a pointer p. This is a new element, a new animal. It has not existed so far in our programs. And suppose I manage to extract the address of m, which is 10000, and put it inside p. Then what do I have? I have a location which contains an address. How do I get this address into p? Having got this address into p, can I access 573 using p? So p is a pointer, and we are asking two questions. First, how do we get the address of a location inside p? Second, having got this address, 10000, is there a mechanism for me to say that I want to go to the contents pointed to by this address 10000? Whatever the address here is, say 10016, what are the contents of 10016? In short, I want to do two operations, if ever I am permitted to have such pointer variables.
One operation: get a valid address of some location into a pointer. Second operation: access the contents of that location using the address which is stored in the pointer. Both these operations are permitted, and they are executed in this fashion. First of all, this is the definition of a pointer. C++ defines a special type called the pointer type, and a pointer is declared by writing int *p. Int star p. Now that is a very funny convention. How do you know star so far? Star is multiplication. p * q, m * n is multiplication. That is when it appears in an expression. But star has multiple uses. In C++, in a declaration, if I say int *p, it means p is not of type int. Rather, p is a pointer which will contain an address, and that address will point to an integer type value. So a pointer is tagged with a type. Can I declare a floating point pointer? Any clue on how I will declare a floating point pointer? Is it necessary to write the star right next to p1? Can I write float, star, blank blank blank, p1? Any answer? The question I am asking is, will it make any difference? You should immediately say that it will never make a difference. You have forgotten one statement I made a long time ago. One of the first things that a C or C++ compiler does is discard the extra white space in your program. Extra blanks and tab characters between symbols are dropped before the compiler analyzes the program. So float, blank blank blank, star p1 and float star, blank blank blank, p1 are one and the same. There is no difference whatsoever. If you write float p1, it means a name p1 which actually holds a floating point value. If you say float *p1, then p1 is a pointer object. It's a separate fellow. And p1 can eventually contain an address which will point to a floating point value. That's the meaning of it. So this particular int *p defines a name p to have the type int star. So now we are introducing a new type. In fact, we are introducing multiple new types.
You already know the types int, float, double, char, etc. Now we know we can have int star, float star, char star, double star. So we can have pointers which point to contents of a specific type. How do we get a value for p? You remember that in the previous slide we showed that if m was an integer and it sat at 10000, I could have in pointer p the value 10000, which is the address. How do I extract the value of the address? Can I, for example, say p equal to 12000, some arbitrary integer? I know an address is an integer value in absolute terms. Can I do that? Well, with a cast I actually can, and C++ will let me do it. But extremely funny results may happen, because this 12000 has come out of your mind. Suppose you run your program today and, as we shall see, we can actually print the pointer values after extracting the right value, and you note that your variable m was at this location. Tomorrow, can you directly assign this address? The answer is no, because tomorrow when you run the program the actual memory locations may be different. Please note how actual memory locations are assigned to your variables when you execute your program. I'm digressing, but it's important to note the following. Whenever I say ./a.out, that is the point when my translated program is loaded into memory and then executed. At that time the operating system will give my program a chunk of memory and say, please accommodate yourself here; I know exactly how much total memory you will require. When this a.out program is loaded, at that time the loader will assign locations: you are m, today you go to location 1012. But the memory locations assigned by the operating system may be different every time you run your program, because the operating system is doing a variety of other things. So you can never be sure what the absolute value of an address is till your program is loaded there.
And only when your program execution begins can you find out what the address is. Therefore you must have a mechanism to find out the address of a location dynamically, and you require some special operations, which are indicated here. These are the two operations that you can perform related to pointers. The first operation is called the address operator, and you use the symbol & (ampersand) for that. So & followed by any name will return the address of that name. Then there is a dereferencing operator, which you may call the reverse of address. Given an address, it will find out the contents, and that dereferencing operator is again star. So star is overused very heavily. One use is for multiplication, another for declaring a pointer, and this is the third one now, which is used in any statement that you write in your C++ program. So, for example, &m means the address of m. Any time you write &m in an expression, you will get an address. Can I assign it to an integer variable x? I can force it with a cast, but it doesn't make sense. I should assign it to a pointer. In fact, this is the only way for any pointer in your program to get a proper value. So you write & followed by something, and the address of that something will be returned, which you can store in a pointer. That's why it's called the address operator or referencing operator. The opposite operator is star. If p is a pointer and you say *p, it means the contents of the address stored in p. So if p contains 1000, then whatever location 1000 contains, those contents are referred to by *p. And remember, those contents are already known to be either integer or float or char, depending upon the type of the pointer. That is how you define the pointer type. So p must be declared as a pointer and it should be assigned a value through the address operator. That is the right way of doing it. Is that understood?
Let us look at some examples to clarify this. I have declared int m and n, and then I have declared int *p.
m and n are normal locations; p is a pointer. m = 25. Again a semicolon is missing; you can add it there. p = &m. This is a valid operation. What will it give? The address of m. Whatever the address is, 10000, 20000, whatever, it will come and sit in p. Now suppose I write the expression n = *p + 3. This is a valid expression. 3 means the value 3. If I had used x or y, it would have meant the value of x or y. But when I say *p, it does not mean the value of p. It means the value pointed to by p. So *p is a dereferencing. Consequently, since p was assigned the address of m and since m contained 25, *p means 25, and 25 plus 3 is 28. So this will work correctly. So is this concept clear?
This is a consolidation of whatever we discussed. We can declare pointer variables in our programs, such as int *p1, float *p2, char *pt, etc. The above declarations will allocate locations for the three pointers. Each pointer location can contain an address. Now the addresses could be 32 bits or 64 bits, depending upon the nature of the operating system and the computer that you have. If you have a 64-bit computer, an address can be 64 bits long, so you can address far more than gigabytes, even terabytes, of memory. A pointer allocated in that environment will be a 64-bit location. Ordinarily it is a 32-bit location. 32-bit addresses are good enough. How many bytes can you access using a 32-bit address? 0, 1, 2, 3, 4, up to 2 to the power 32 minus 1. It is a fairly large number. Now suppose we write p1 = &m. Then if I say int q; q = *p1, this will assign 573 to q if m has the value 573, as we just saw. We can print the value pointed to by simply writing cout << *p1. In fact, there is no difference between writing *p1 or m or x. *p1 is as good as any other value. It is of course our responsibility to ensure that p1 has been assigned an appropriate address before we come to this point.
Now this is something special. I can also print the value of the pointer itself. I can say cout << p1. So at any given instant I can actually find out which particular address the operating system has given to my location. These addresses are printed in hexadecimal form. Remember we have talked about 0x. If anything begins with 0x, it means the number is hexadecimal. Hexadecimal means 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f. These are the 16 symbols. So this is how it will be printed.
We can also increment the value of a pointer. Now we come to pointer arithmetic, and this is what makes it interesting. Suppose I say p2 = &a[0]. What will this do? It will put the address of a[0], the 0th element of array a, into pointer p2. Now I say p2++. Ordinarily plus plus means add 1. But p2++, or p2 = p2 + 1, has a different interpretation in C++ when I am playing with pointers. The objective is that I should point to the next location of that type. Now if a[0] is a 4-byte location, integer or float, then I know that a[1], the next location, will be 4 bytes away. Therefore C++ will add not 1 but 4 bytes. Consequently, after adding 1 to p2 like this, if I say cout << *p2, I will get the contents of location a[1]. Is this understood? This is an important notion. So let us say this was a[0]. Now a[0] will have 1, 2, 3, 4 bytes. Then this will be a[1]. This will have 1, 2, 3, 4 bytes. Then this will be a[2]. Now suppose I have said this is my p2, and let us say p2 contains 10000 because this address was 10000. Now when I say p2 = p2 + 1, in the ordinary arithmetic sense I would expect 10000 to become 10001. But C++ remembers that p2 is a pointer. Whenever I increment it by 1, it will move this pointer, which is currently pointing here, to point to the next location. It knows the pointer is pointing to a 4-byte type, and therefore it has to add 4 bytes to it.
Consequently p2 will now become 10004. If I do p2++ once again, what will it point to? This fact can be utilized effectively to access consecutive array elements by simply using pointers. So I can capture some pointer p = &a[0], and now I can keep doing p++ and a star operation. Each p++ makes p point to the next consecutive element of array a. We shall see more examples of pointers later.
I will conclude this lecture by just showing you what happens when you execute some programs. Here is a pointer example program. I have declared variables m and n and an array a. I have given some values to m and n. This particular example is about just the two variables m and n. I have declared two pointers, *ptr1 and *ptr2, both of type int. I get the values of m and n printed, which are 573 and -1234567. Then I say &m is assigned to ptr1. So I am getting the address of m and putting it in ptr1. Since I have said m followed by n in my declaration, somehow in my mind I expect that m and n will be allocated consecutive locations: m will have 4 bytes and n will have the next 4 bytes. So I do an arbitrary thing. I say ptr2 = ptr1 + 1. And now I print that the pointer values are these, and that the corresponding data values are *ptr1 and *ptr2. What do I expect? I again expect to get the value of m and the value of n printed here. In the process, I will also get the values of the addresses. This is the output. m and n are printed like this. These are the pointer values. However, the corresponding data values are 573 and -1077217072. What has happened? What has happened is that the first pointer was correctly pointing to m, because I explicitly said ptr1 = &m. But then I added 1 to that pointer, which incremented the pointer value by 4. So I have here addresses ending in ac and b0, and b0 is correctly 4 bytes away.
But unfortunately that does not happen to be the location allocated to n, because the compiler said: you might have declared m followed by n, but I choose to put m here and I choose to put n there. And that is the reason why you are getting a funny value. The correct program is to say ptr1 = &m and ptr2 = &n. Then if I print the pointer values, I will see what addresses have actually been allocated, and let us see whether I get the correct values. Of course I will get the correct values. When I say *ptr1, I know I am pointing to the value of m. When I say *ptr2, I know I am addressing the value of n, since I had explicitly assigned its address. This happens to be the result. Notice that the pointers have these values: 0xbfb2772c and 0xbfb27728. 28 is before 2c. So the compiler has chosen to allocate n first, followed by m. That is the compiler's decision; it is not under my control. The moral of the story is that you should always capture explicit addresses through the & operator and use them. However, there is one thing C++ guarantees. If I have an array declared, the base address of the array is not under my control; the compiler will assign it anywhere. But once it assigns a base address, the compiler guarantees that all subsequent elements of that array will be given consecutive locations, which means my pointer arithmetic can work there, and I can access the various elements of that array using merely pointers.
So we will stop here. Just remember that there is a plethora of examples in any book on C++, or for that matter on C, which will tell you how to handle character arrays, floating point arrays, matrices, etc. using pointers. We will see some examples in the subsequent classes.