Let us get started. We are continuing with loops, and today we will go from the basic while loop to the more general, and sometimes easier to write, for loop. Strictly speaking you do not need a for loop; you can manage everything with while. But for syntactic convenience people very often use for loops, especially in conjunction with arrays, and we will start looking at array manipulation today as well. Most of this lecture will be a large number of simple and moderately simple examples of using loops for interesting computations. So we continue last time's theme, but we will see many more examples. Let us recall the counting problem that led to the sequence we have to solve for. If a bit vector, a sequence of bits, has length length, then the number of possible bit vectors of that length is 2 to the power length, as we have seen. But if I further constrain these strings so that none of them can have two or more consecutive zeros, then the number will of course decrease from 2 to the power length; some strings will be disqualified. If the length is 1, there are 2 possible strings, 0 and 1, and neither can possibly have two consecutive zeros. If the length is 2, there are 4 possible strings in all, but one of them, 00, is disqualified. So the number of legitimate strings is 3, namely 01, 10 and 11. Now define a(length) as the number of sequences of length length that end with a 0 and do not have two consecutive zeros, and b(length) as the number of sequences of length length without two consecutive zeros that end in a 1. So we define these two functions of length. The required answer is of course a(length) plus b(length), because the string has to end with either 0 or 1. And if the sequence ends with a 0, then the second-last bit has to be a 1; otherwise we would get two consecutive zeros.
Therefore a(length) has to equal b(length - 1), by the definitions we have set up. However, if a sequence ends with a 1, then there is no restriction on the second-last bit, because I will not be introducing consecutive zeros either way, so the second-last bit could be either 0 or 1. Therefore b(length) equals a(length - 1) plus b(length - 1). You look at the number of strings of length one less and add them up, because the last bit is already decided to be 1. This defines a system of mutual recurrences: b(length) depends on a(length - 1) and b(length - 1), and a(length) depends on b(length - 1). But the structure of the recurrence is fairly simple, in that a can be eliminated easily, and we thereby get a recurrence only in terms of b: b(length) equals b(length - 2), which is a(length - 1), plus b(length - 1). Most of you will recognize this as the Fibonacci sequence, but these are also known as the Hemachandra numbers, after Hemachandra, who is now credited with having described them before Fibonacci. Now the base cases for b. Remember, b(length) is the number of strings ending with 1. If the length is 1, then the only bit must be 1, so the bit sequence is 1 and b(1) equals 1. If you are given 2 bits and the string has to end with 1, then the previous bit can be either 0 or 1, so there are 2 possible combinations and b(2) equals 2. So b(1) = 1 and b(2) = 2; these are the base cases. Now, if I am given an integer length, which I read from cin, and the length turns out to be either 1 or 2, then there is nothing to do.
I just print out length itself as b(length). Otherwise I have to do some more work. Beyond the base cases, note that b(length) depends only on b(length - 1) and b(length - 2). Therefore, in this calculation it is not necessary to keep the entire sequence of b's from the beginning. I can just remember the last two values and get by. Look at this example. Suppose up to this point I have computed b(1), b(2), then b(3), which is 3, the sum of the previous two numbers, and b(4), which is 5, again the sum of the previous two numbers. Let us have an index variable called Lx, which takes values from 1 to length, and at the end we output the value for length. We keep three variables: one for b(Lx), the value of the function b at argument Lx, and likewise one for b(Lx - 1) and one for b(Lx - 2). Say at the moment Lx is 4, so I know b(Lx) as 5, b(Lx - 1) as 3 and b(Lx - 2) as 2. Note that once this calculation is done there is no need for the number 1, the value of b(1); I can afford to forget it. Now, what do I do in the next step? The important thing in writing these kinds of linear recurrence loops is to get the body of the loop right, in the sense of the correct order of assignments to variables, so that you forget the past and shift ahead to the future. Observe that to calculate the next value, in this cell, I will only require 3 and 5 and not 2. So I will destroy the value 2; I forget it by switching the variable for b(Lx - 2) to the next value, 3, by just copying b(Lx - 1) into b(Lx - 2). Now that 1 and 2 have disappeared from the program state, the next step is to copy the contents of b(Lx), namely 5, into b(Lx - 1). So now, if you will, b(Lx - 1) points to the value 5. We do not actually have an array; we have just three variables. Now b(Lx) becomes free, so to speak. We increase Lx to 5, and b(Lx) now points to this so-called cell; logically there is nothing in it yet.
And then we calculate b(Lx) as b(Lx - 1) plus b(Lx - 2), and that gives us the next value. This is the entire sequence of operations inside the loop. It is very important to get the order of the statements right; otherwise you might damage a variable you need in future, or fail to get rid of a stale value like 2. After you have computed the basic step, 2 goes the way of 1: it is lost in history, you do not care about it, and you keep on doing this shift. In the else clause we will assume that Ln is at least 3, because if Ln is 1 or 2 we know how to give the answer directly. So we initialize b(Lx - 2) to 1 and b(Lx - 1) to 2, and then we initialize Lx to 3. So the initial condition is that Lx is pointing here, and b(Lx) should be 3, the first value beyond the base cases. We do not actually compute b(Lx) yet; we start the loop with Lx at 3, and the input is Ln. Now take the easy case where Ln is exactly 3, the first non-base case. Then the test Lx less than Ln fails and the body of the while loop is not executed. You exit the while loop right away, and at the end you print Ln, which is 3, and b(Lx - 2) plus b(Lx - 1), which is also 3. So that is correct for the case where there are zero iterations of the while loop body. Now suppose Ln is 4. We enter the loop and evaluate b(Lx) as 1 plus 2, which is 3; Lx shifts to 4, and then I do the switch. The switch goes in this order: first the minus 1 value is copied into the minus 2 variable, then the current value is copied into the minus 1 variable. So now we have 2 and 3, and when I add them up I print 5. And this goes on. You can verify by induction that there is this invariant: the true value of the function b at Lx is always equal to b(Lx - 1) plus b(Lx - 2), no matter how many times you execute the loop body.
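The rolling-variable loop described above can be sketched as follows. This is only a sketch under assumptions: the function name `count_ending_in_one` and the variable names `bLxm1` and `bLxm2` are made up for illustration, the result is returned instead of being printed, and `long long` is used to delay overflow. The lecture's program reads Ln from cin and prints the answer.

```cpp
#include <cassert>

// b(length): number of bit strings of this length that end in 1 and
// contain no two consecutive zeros. Base cases: b(1) = 1, b(2) = 2.
// Names here are hypothetical; the lecture's slide is not available.
long long count_ending_in_one(int length) {
    assert(length >= 1);
    if (length <= 2) return length;   // base cases: b(length) == length

    long long bLxm2 = 1;  // holds b(lx - 2), initially b(1)
    long long bLxm1 = 2;  // holds b(lx - 1), initially b(2)
    int lx = 3;
    // Invariant at the loop test: b(lx) == bLxm1 + bLxm2.
    while (lx < length) {
        long long bLx = bLxm1 + bLxm2;  // compute b(lx)
        ++lx;
        bLxm2 = bLxm1;  // forget the oldest value first
        bLxm1 = bLx;    // then shift the newest value down
    }
    return bLxm1 + bLxm2;
}
```

Swapping the two shift assignments would overwrite `bLxm1` before it is copied, which is exactly the ordering mistake the lecture warns about.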
So here is a case where you have to write down the loop invariant, and the proof of this invariant depends critically on the ordering of the statements: this has to happen first, this has to happen next. That is how you can compute Fibonacci numbers. Now, many of you will know that the Fibonacci numbers do not grow as fast as 2 to the power Ln, but they still grow exponentially, with a smaller base of approximately 1.6. So within very modest values of Ln, b(Lx) will exceed integer capacity. If you want to compute Fibonacci numbers for large arguments, you might wish to use doubles. Yes? [Student question and most of the answer inaudible.] So this is how you would compute simple linear recurrences between discrete variables in time. The next example is actually a simpler one, but it has the same flavor. Suppose I want to print running averages. The user inputs a potentially unending sequence of numbers. After the third number is input, and for every subsequent input, print the three-point running average. Let us have variables a1, a2, a3 which hold the latest three numbers. You might want to use this kind of program, for example, to compute smoothed temperature variations in a room. You read the temperature every minute, and maybe there is some local disturbance in the airflow. You do not want to overreact to a particular reading, because it could be noisy or the sensor may have some dust on it. So you do a running average over a small window of temperatures to get a smoother estimate of the temperature in various parts of the room. That is one example of why you might want to code up running averages.
So we keep the last three temperatures in a1, a2, a3, where a3 is the most recent one and a1 is the oldest, and because we need three numbers to get primed, we read those three directly. Then we have an infinite loop where we print the average, and then, just like the pointer switching in the previous diagram, a1 takes the value of a2, a2 takes the value of a3, and now a3 is freed up, so you can read the next temperature from cin. To fit on the screen I have written this on one line, but in standard style you should write these statements on separate lines. So that is a very simple example of managing so-called pointers. You have a finite number of variables; you do not, or cannot, remember the entire history. You might be measuring the temperature every millisecond for 10 years, and you do not want all that storage. You decide how much storage you need and systematically forget the past. We can do simple tricks here. For example, suppose we say that recent temperatures are more important than temperatures in the distant past. Suppose I have a buffer which keeps the last 20 days of temperatures, and I decide that the so-called running average will be a weighted average where the current day has weight 1, yesterday has weight one half, the day before has weight one fourth, and so on. Now that we know about binary representation, you should figure out how to maintain running averages with that definition. It should be fairly efficient; think about it. In general, geometric decay is easy to deal with. Other kinds of decay functions may or may not be easy to deal with. There is a whole area of stream processing and sensor data processing which deals with problems like that: how can you keep very limited-memory sketches of a time series of measurements (temperature, pressure, humidity, call details in a router or a telephone network) and collect statistics on the fly over a moving window?
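The three-variable shifting for the running average might look like the sketch below. It makes some assumptions for the sake of a checkable example: instead of reading an unending stream from cin and printing, a hypothetical function `running_average3` takes the readings in a vector and returns the averages; the names a1, a2, a3 follow the lecture.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Three-point running average using only three "live" variables.
// running_average3 is a hypothetical name; the lecture's version
// loops forever, printing each average and reading a3 from cin.
std::vector<double> running_average3(const std::vector<double>& readings) {
    std::vector<double> averages;
    if (readings.size() < 3) return averages;  // not yet primed

    double a1 = readings[0];  // oldest
    double a2 = readings[1];
    double a3 = readings[2];  // most recent
    std::size_t next = 3;     // index of the next unread reading
    while (true) {
        averages.push_back((a1 + a2 + a3) / 3.0);
        if (next == readings.size()) break;  // input exhausted
        a1 = a2;                  // forget the oldest reading
        a2 = a3;
        a3 = readings[next++];    // a3 is free, take the new reading
    }
    assert(averages.size() == readings.size() - 2);
    return averages;
}
```

For the readings 3, 6, 9, 12 this produces the averages 6 and 9.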
That is a very important area of computer science as well. Next example: migration between two cities. Let us say the two cities have initial populations m and p, which we will represent as doubles. Every year one fourth of one city migrates to the other city, and half of the other city migrates to the first. Suppose you want to simulate this for k years. One of the important questions is whether the populations stabilize, or whether one city will just keep growing and growing. Here is a candidate piece of code. You start with the populations in variables m and p, read in the number of iterations as well as the initial populations, and then simulate the time steps t equal to 0 through k minus 1. We take the population of Pune, migrate half of them to Mumbai, and import one fourth of Mumbai into Pune. In the next statement we take Mumbai, reduce it by one fourth of itself, and add half the population of Pune. Is that okay, or is there a problem with this? This is called a loop-carried dependency. The semantics of the problem as specified on the previous slide basically say that people note down the populations through the year and do not make any moves; on 31st December they decide what to do based on the populations of the year just ended. That is called synchronous update semantics. Of course in real life it does not happen like that. But if the problem is set up as a synchronous update problem, then this code has a loop-carried dependency which violates that specification, and therefore it is wrong. What we have written is an asynchronous update, where p changes on the fly inside the loop body. If you do not want that, then you have to create temporary variables. You have to say something like pNew equal to the first formula and mNew equal to the other formula, and then you assign pNew to p and mNew to m.
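A sketch of the synchronous version with temporaries follows. The struct `Populations`, the function name `simulate` and the names `pNew` and `mNew` are assumptions for illustration; the rates (half of p leaves, one fourth of m leaves) follow the description of the candidate code.

```cpp
#include <cassert>

// Populations of the two cities after a simulation (hypothetical type).
struct Populations {
    double m;  // Mumbai
    double p;  // Pune
};

// Synchronous update: both new values are computed from the OLD m and p,
// and only then copied back.
Populations simulate(double m, double p, int k) {
    assert(k >= 0);
    for (int t = 0; t < k; ++t) {
        double pNew = p - p / 2 + m / 4;  // half of Pune leaves, a fourth of Mumbai arrives
        double mNew = m - m / 4 + p / 2;  // uses the old p, not pNew
        p = pNew;
        m = mNew;
    }
    return {m, p};
}
```

Starting from m = 100 and p = 100, one step gives p = 75 and m = 125. With m = 200 and p = 100 the two flows are equal (50 each way), so those populations stay fixed.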
This is not a contrived example. There are many cases in iterative numerical analysis where asynchronous updates may be slower to converge or even incorrect, while synchronous updates are proved to be correct or faster. In that case you have to code synchronous updates carefully, and they of course need more space. Here there are only two variables involved, so no one would feel it. But if you are updating matrices or big vectors that fill up almost all of RAM, and two copies of your vector will not fit in RAM, then you have to make some separate arrangement. When we look at more detailed matrix algorithms, like eigenvalues and eigenvectors, we will see examples of this. I am just creating more and more examples and giving some salient points of each, so that we can see the intricacies of doing loop programming correctly and efficiently. The next example: most of you will know how to calculate the GCD of two numbers. Given two integers m and n, where m is greater than or equal to n and n is greater than or equal to 1, find their GCD. It is well known that if h is the GCD of m and n, then of course h divides both of them, and if m equals nq plus r, then since h divides both m and n, it has to divide r. Therefore h is a common divisor of n and r, so h divides the GCD of n and r, and so h is less than or equal to GCD(n, r). It is also quite easy to convince yourself that h is greater than or equal to GCD(n, r). In other words, GCD(m, n) is exactly equal to GCD(n, m % n), where m % n is the remainder when m is divided by n, in other words r. Note that I flipped the arguments around and did not write GCD(m % n, n). That is because the convention was that the first argument is at least as large as the second one, and m % n is always less than n. So I arranged it so that the larger argument is again first, and we will assume that when writing the code.
Let us say I read integers m and n from cin, and let us assume that the user will enter them in the correct order; otherwise we can always fix it by taking max and min. Then we loop while n is greater than 0. The GCD of a positive number and 0 is defined to be that positive number. Inside the loop it is again an issue of synchronous update: I have to change two values simultaneously. The arguments which used to be m and n have to become n and m % n. So I save n in temp, I overwrite n with m % n, and then I write m equal to temp, which is the old n. Again this is very simple; the only important thing is that inside the loop you have to be careful about the order in which you save variables and overwrite variables, otherwise you will get it wrong. Finally, when n has gone to 0, you print the other variable; that is the GCD. There is nothing much to demo about this, but here is that code. Let us say I enter 12 and 8; I get 4. So (12, 8) turns into (8, 4), then (4, 0), and the answer is 4. Suppose I enter 13 and 7; I get 1. So this is correct. Now, is the assumption that m is greater than or equal to n really required? What happens if I enter the numbers in an illegal order, where the first input is less than the second? Can you think of an example which will give a wrong answer? Will it matter? Suppose the second argument is bigger. There are two opinions here. One is that there will be no problem, the arguments will just be reversed. The other opinion is that there will be a mistake in that case, that the smaller number will be output immediately. So let us see. I will try 7 and 13 instead of 13 and 7. So why did it get interchanged?
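The loop just described, with the save-then-overwrite order made explicit, might look like this. It is a sketch under assumptions: the function name `gcd_loop` is made up, and the values are passed as parameters and returned instead of being read from cin and printed.

```cpp
#include <cassert>

// Euclid's algorithm as a while loop: (m, n) becomes (n, m % n)
// until n reaches 0. gcd_loop is a hypothetical name.
int gcd_loop(int m, int n) {
    assert(m >= 0 && n >= 0);
    while (n > 0) {
        int temp = n;  // save the old n before it is overwritten
        n = m % n;     // new second argument: the remainder
        m = temp;      // new first argument: the old n
    }
    return m;  // when n has reached 0, m holds the GCD
}
```

With the lecture's inputs, `gcd_loop(12, 8)` steps through (8, 4) and (4, 0) and returns 4, and `gcd_loop(13, 7)` returns 1.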
Sometimes it may work out, but I will leave you to convince yourself one way or the other: either there is something wrong with this code when the input is not given in the stated order, or either order works fine. Think about that. Sometimes specifications can be satisfied automatically without your doing anything special. As I said, you can have real fun with loops only when you come to arrays, because it does not suffice to have repeated computation; you also need repetitive data structures. Arrays are the first example of a repetitive data structure. We have already introduced strings, which are basically arrays of characters. Luckily, unlike the old native C strings with null termination, we do not need to worry about who allocated the space, where the space comes from and where it goes. The system keeps track of how to allocate the space and how long a string currently is. The string class was designed to do all these things for you conveniently. All we have to know is that we can assign to strings: we can say string message equal to "hello world". We can append to a string: message equal to message plus "foo" will append that string to the old string. We can read out a position: cout << message[px] for position px. Or we can write into a position: message[px] equal to 'q'. So there are two flavors in which message[px] is used. If message[px] appears on the right-hand side, then it pulls out the character that is in position px of the string. Whereas if message[px] appears on the left-hand side of an assignment, then it means the particular memory cell where a character is to be written. So message[px] equal to 'q' says: write q into cell px of message. As long as you know all that, that is all we need for today, because the rest is just using loops to do fun stuff on strings. Here is an example. Suppose you want to print a string in reverse; that is pretty easy.
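A minimal sketch of such a reversal with a while loop is given here. As an assumption made so the result can be checked, a hypothetical function `reversed` collects the characters into a new string; the lecture's program instead reads the line with `getline(cin, message)` and sends each character to cout. The index name mx follows the lecture.

```cpp
#include <cassert>
#include <string>

// Walk the string from the last position down to position 0.
// reversed is a hypothetical helper, not the lecture's program.
std::string reversed(const std::string& message) {
    std::string out;
    // Positions run from 0 to size() - 1; convert to int first so that
    // an empty string gives mx == -1 instead of wrapping around.
    int mx = static_cast<int>(message.size()) - 1;
    while (mx >= 0) {
        out += message[mx];  // read the character in cell mx
        --mx;                // step towards the front
    }
    assert(out.size() == message.size());
    return out;
}
```

For example, `reversed("hello world")` yields "dlrow olleh".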
We read the string from cin using getline. What is the difference between cin >> message and getline(cin, message)? If you use cin >> message, it finishes reading at the first white space. Whereas getline reads until it hits the newline; even if you have spaces in your string, the whole thing will be read into message. So we get the whole message up to the end of the line, excluding the newline itself, which is not stored. Now, since you have to print it in reverse order, remember that if a string is 5 characters long, the positions are numbered 0 through 4, not 1 through 5. So we set mx to be the last position, which is message.size() minus 1; size is the method you call on message to get its current length. We will see more of that when we do object-oriented programming. Then we do a while loop: while mx is greater than or equal to 0, we print the contents of cell mx and then decrement mx by 1. So it is very easy, and this is an example where mx changes completely predictably inside the loop. Of course we know that this will terminate, because I started with mx equal to some value like 4 and then clocked it down, so it has to go past 0 at some point. Contrast that with the GCD problem, where there is no such simple looping on any variable, and yet we are very confident that the GCD program will terminate. Why is that? If you want to be really formal, the proof is slightly more involved. The GCD program always terminates because at every step the input was (m, n) and it became (n, m % n). What is the relationship between these two pairs? At least one of the two numbers strictly decreases. That is the property, because m % n is strictly less than n.
So the property you are looking for to show termination is that at least one of the numbers strictly decreases, and therefore, since n is always the smaller number, eventually n will hit 0. That is the argument. Earlier I talked about loop invariants. To prove termination you need a different property, called a loop variant. So there are loop invariants and loop variants. You have to show that a certain quantity is changing monotonically, and that there is some wall which the quantity will hit, and that will make the loop terminate. GCD is a very easy case, and of course for iterating over an array it is trivial that you will terminate. Sometimes, if you have complicated while loops, it takes quite a lot of convincing that the loop will actually terminate at some point. Now, if the loop is very predictable, and mx, the so-called looping variable, has this simple structure of being initialized at one end, clocking through, and finishing at the other end, then while is too cumbersome to write, and that is why the for loop was created. The while statement looks like while (condition) statement. As we have already seen, you evaluate the condition; if it is false you exit immediately; if it is true you execute the statement and then check the condition again. The for loop adds some syntactic sugar, as it is called, to ease life a little bit. It allows some initialization code, then it checks a condition. If the condition is false it exits right away. If the condition is true it first evaluates the statement in the loop body and then the stepper code. The stepper code changes something; in particular it can change px, the position where you are reading in the string. After evaluating the stepper code it goes back and reevaluates the condition, and it continues like that. So that is the semantics of the for statement.
If we take the example of printing a string in reverse and write it with for, it is going to be much shorter, just three or four lines. You would say for, then the init int mx equal to message.size() minus 1, semicolon, then mx greater than or equal to 0, which is the condition to check, and the stepper is --mx. Now observe that the init code is executed only once, whereas the condition is evaluated as many times as it is found true, plus once for the final false. So in case you are trying to squeeze the last bit of performance out of your machine, you need to make sure that the expensive work sits in the init and that the condition is cheap to evaluate. Here that is actually the situation, because the size method is called only once at the beginning, whereas mx greater than or equal to 0 is a very fast check between two integers. You could also have code which prints the string in the ordinary order, which of course is provided to you, but just as a demo you could say for int mx equal to 0, then mx less than message.size(), and, because no one needs the old value of mx in the stepper, you write ++mx instead of mx++, and then you do your calculations in the body. Observe that the check now involves a method call, and later on we will see that method calls and function calls have substantial overhead in many cases, because the machine has to do some special things to remember what to do after the method returns. So this may be a little expensive. Instead, if the size of the message is not changing inside the loop, and that is an important check, if the string is not manipulated inside the loop, then you can change the code to say mx equal to 0, comma, declare another variable mn which is the size, so that the size call in the init is done only once, and then in the test you just say mx less than mn. This is a very fast test between two integer variables, and then you do ++mx. In case the method is expensive, this can save you substantial time.
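Both for-loop variants just described can be sketched as below. The function names `reversed_for` and `copied_for` are assumptions, and the characters are collected into a string instead of being printed so the behaviour is easy to check; mx and mn are the names used in the lecture.

```cpp
#include <cassert>
#include <string>

// Reverse order: size() is called once in the init; the condition is a
// cheap comparison of an int against 0.
std::string reversed_for(const std::string& message) {
    std::string out;
    for (int mx = static_cast<int>(message.size()) - 1; mx >= 0; --mx) {
        out += message[mx];
    }
    return out;
}

// Ordinary order: the size is cached in mn inside the init, so the test
// mx < mn never calls size() again. Only valid if the loop body does
// not change the length of message.
std::string copied_for(const std::string& message) {
    std::string out;
    for (int mx = 0, mn = static_cast<int>(message.size()); mx < mn; ++mx) {
        out += message[mx];
    }
    assert(out == message);
    return out;
}
```

Whether caching the size gives a measurable speedup depends on the compiler and the method being called; for `std::string::size()` many compilers inline the call, so treat this mainly as an illustration of the init clause.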
So that is a tip for writing faster for loops. Coming back to the example, also observe that the initialization code can declare new variables, like int mx. The variable mx comes into being when the init code runs, and you can use mx just like any other variable inside the for loop. Inside, you can use mx and anything that was declared earlier, outside the for loop. The variable mx ceases to exist when the for loop terminates. That is as per the ISO standard. Older compilers will sometimes let mx leak beyond the termination of the for loop. That is usually considered a bad implementation feature, because programmers may write multiple loops one after the other, pick the same variable name, forget to initialize it, and then pick up some stale value from a previously terminated loop. So that is generally not a good idea. g++ has been moving over to the ANSI/ISO standards. Look at this code, where I start a for loop with i, close the loop, and then assign i to j afterwards, just to test what the compiler will do. Some old C++ compilers will let you do this. g++ will complain that name lookup of 'i' changed for ISO 'for' scoping, and if you still want to see i there, you have to give the flag -fpermissive. Strict ISO semantics will not let you use i once the loop terminates, and that is how you should keep it. Do not depend on using i after the loop. Loops can of course be nested; most non-trivial programs nest loops to do useful computation. As a simple example, let us calculate pi again. We have been calculating pi for a while now, and this time we will do it using a very simple counting technique. We generate a uniform grid of (x, y) points in the plane. We draw a circular disk of large radius and check whether each grid point lies inside the circle or outside. The count of points inside is an approximation to the area of the circle. So here is the code.
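The slide itself is not part of the transcript, so what follows is only a sketch of what such a grid-counting program might look like. The function name `estimate_pi` and the use of double loop counters are assumptions; the names radius and inside are the ones mentioned in the lecture.

```cpp
#include <cassert>

// Count integer grid points (x, y) with x*x + y*y <= radius*radius and
// divide by radius squared. estimate_pi is a hypothetical name.
double estimate_pi(const double radius) {
    assert(radius >= 1);
    double inside = 0;  // number of grid points found inside the disk
    for (double x = -radius; x <= radius; ++x) {
        for (double y = -radius; y <= radius; ++y) {
            // Only multiplications and additions, no sqrt call.
            if (x * x + y * y <= radius * radius) {
                ++inside;
            }
        }
    }
    // The area pi * r^2 is approximately the count of points inside.
    return inside / radius / radius;
}
```

Calling `estimate_pi(100)` should give a value close to the 3.1417 reported in the lecture's demo. The running time grows roughly with the square of the radius, since the grid has about four radius squared points.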
I declare a radius; let us start with something modest, say 100. The variable inside counts the number of grid points I have found inside the disk. It is declared as double, and radius is also declared as double, because I will be dividing later on. What I do is take x from minus radius to plus radius in steps of one, and if you want to be symmetric about it you can say less than or equal to radius. Similarly y goes from minus radius to plus radius in steps of one. Now I detect whether the point (x, y) is inside the circle by testing x squared plus y squared less than or equal to radius squared. Observe that this involves only direct floating point multiplications, which are implemented in hardware. It would be bad to test square root of the left-hand side less than or equal to radius, because square root, as you know, involves a function call to a library which does all kinds of complicated things. You would like to avoid that. At the same time, observe that the quantities x and y change through the iterations of the loop, but the quantity radius times radius is a constant. That is one place where declaring radius as const can help. The compiler can understand that no one can ever update radius, and so radius times radius is also a constant expression, which can then be pulled outside the two loops. So the radius times radius computation is done only once, not many times over. The compiler takes care of that automatically. If the point (x, y) is found to be inside the disk, then I increment the counter inside by 1. Finally I write out pi as inside over radius over radius: pi r squared is approximately equal to inside, and therefore pi is approximately inside over radius squared. Then I print it out to a lot of digits to see what happens. So it is a very simple piece of code. Let us run this. We get 3.1417, and it was pretty much instant. If you want to see how much time was actually taken, you say time ./a.out. It reports three times: real, user and system.
The real time is the wall-clock time. Your code coexists in the system with all kinds of system routines, the desktop manager, maybe a web browser, and the CPU is multitasking between them all the time. So the more accurate notion of the time the CPU takes to do your job is what is called the user time, which in this case is 0 because it is too short to measure; it is much less than 1 millisecond. [In reply to a student:] Yes, but then you would go along a diagonal; you would increase both of them at the same time. You also want very small values of x together with large values of y, so you do need the two nested loops. Now suppose we increase the radius from 100 to 1000. Roughly what should be the dependence of the time on the radius? Radius squared; there are about four radius squared points on this grid. So let us see. I increase it to 1000, compile it again and time the run. This time it is showing 96 milliseconds, and the number has also improved to 3.1415. In this case I have a stable algorithm, in that if I keep on increasing radius I should not easily run into those floating point problems which give me bad values. Yes, both the numbers will increase, but they will increase within reason; it is just radius squared. So I can really take radius to astronomical levels and get more and more accurate estimates of pi. You should play with this code offline. It was 96 milliseconds for 1000; now let us try 10000. I have increased the radius by a factor of 10, so the time should increase by a factor of 100, which means it should take about 10 seconds. Yes, 9.58 seconds. So if your job is CPU bound, and there are no large memory artifacts like giant arrays or matrices, then these kinds of back-of-the-envelope calculations are very accurate. The machine works at a constant speed, and if you can do 1000 squared things this fast, you can do 10,000 squared things in 100 times the time.
[In reply to a student:] Yes, it will not. When you say radius times radius, and radius has already reached the upper range of double, then radius times radius will become infinity. It will be stored as a double with the special value infinity, and comparing infinity with infinity tells you nothing useful for this test. So you realize that this is a more reliable algorithm, except that it is much slower. Even after 10 seconds my value was only 3.14159, so I got it right to the fifth decimal place but nothing beyond, whereas the earlier methods were converging to better values much faster. But this sort of integration is a little more reliable; you will not overflow easily. One more interesting comment. You might argue that I am replicating work by going from minus radius to plus radius: I can just go from zero to radius in the positive quadrant and do one fourth of the work. So try this out. Initialize x to zero and y to zero instead of minus radius. You will see something interesting happen: you will not converge to pi. That is apart from the factor of four, of course; even after you do the adjustment by four, you will not. You have to tell me why it misbehaves if you switch the lower limit from minus radius to zero. See if you can figure that out. The next problem is also a nested loop problem, but it is about strings. Suppose we are given two strings, one called needles and the other called haystack, and the problem is to find needles in the haystack.
Needles has no repeated character; it has only distinct characters in it. Haystack may of course repeat characters, and the question is how many characters in needles appear in haystack at least once. For example, if needles is bat and haystack is tabla, then observe that, in some arbitrary order, b appears in tabla, and a does, and so does t; therefore the answer should be 3. Whereas if the input is tab, which is functionally equivalent to bat, and haystack is bottle, then t and b are found. T is found twice, but that doesn't matter, and b is found in bottle once; a is not found, and therefore the answer should be 2. Suppose that's the specification of your problem.
So how do you find needles in a haystack? First we'll look at a subproblem: instead of being given a bunch of needles, if I give you only one needle character called ch, can I detect whether that character appears in a string haystack or not? That's pretty easy. So given one character ch and a string, find if ch appears in the string at least once. The character ch is suitably initialized, the string haystack is suitably initialized, and let's say the answer is going to be an integer, initially 0, which I will increment to 1, and only 1, if the character is found in the haystack. That loop is pretty easy to write. You say for (int hx = 0; hx < haystack.size(); ++hx), and you can implement that optimization I showed if you want. In standard coding practice it's also quite common for programmers to use as an index variable the initial character of the variable you are stepping through followed by x, so hx for haystack makes code easy to read. Then you say if ch is equal to haystack[hx], you increment the answer from 0 to 1, and as soon as you do that you are done. So use the break statement to quit the loop, so that at output the answer will be either 0 or 1: 0 if ch was not found and 1 if ch was found at least once in haystack.
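The single-needle loop just described can be sketched as follows. Wrapping it in a function named containsChar is an assumption made for illustration; the lecture writes the loop inline, uses an int index, and reads its inputs itself.

```cpp
#include <cstddef>
#include <string>

// Returns 1 if ch occurs at least once in haystack, and 0 otherwise.
// containsChar is a hypothetical name used only for this sketch.
int containsChar(char ch, const std::string& haystack) {
    int ans = 0;
    for (std::size_t hx = 0; hx < haystack.size(); ++hx) {
        if (ch == haystack[hx]) {
            ++ans;   // goes from 0 to 1, and never beyond 1 ...
            break;   // ... because we leave the loop on the first match
        }
    }
    return ans;
}
```

For example, containsChar('b', "tabla") gives 1 and containsChar('z', "tabla") gives 0.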
You don't even wait around for the other values; you break immediately. So the subproblem is easy to solve, and now if I have many needles then I need a nested loop. I read the strings needles and haystack from cin and I still initialize answer to 0. Then I have an outer loop, which is the needles loop: for (int nx = 0; nx < needles.size(); ++nx), and inside I initialize ch to needles[nx]. Now it reduces to my old problem. So then I have exactly the same loop body as before: for (int hx = 0; ...) and so on, and if ch is equal to haystack[hx] then increment the answer and then break. Now this break will only break the inner loop; it will not affect the outer loop. So you are abandoning the haystack loop because you already found one occurrence of needle ch, but you go back to executing the next iteration of the needles loop.
Does it go out of the innermost block in which the break appears? No, no, break doesn't affect the if. Break affects the closest enclosing while, do-while or for loop, or switch; that is the rule. Break applies exclusively to loops or switch. The switch break is a little different, as you have seen; it's a case-separation construct. Loop breaks break the innermost loop. There are ways by which you can declare a label and then jump out of several loops at the same time (in C++ that means a goto). Generally speaking, using break and continue indiscriminately is a bad idea; you should really use break only to break from the innermost loop and nothing else. So quit on first match, and then you end the two loops and output answer at the end. I am not printing that here, but if you implement the code you will print out answer at the end. So for needles let's put bat, and say bottle: the answer is 2. The a doesn't appear, and the fact that t appears twice didn't affect the result.
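Putting the outer needles loop around the inner haystack loop gives something like the sketch below. The function name countNeedles is hypothetical, and the strings are passed as parameters rather than read from cin so that the snippet stays short.

```cpp
#include <cstddef>
#include <string>

// Counts how many characters of needles occur at least once in haystack.
// Assumes needles has no repeated characters, as in the specification above.
// countNeedles is a hypothetical name used only for this sketch.
int countNeedles(const std::string& needles, const std::string& haystack) {
    int ans = 0;
    for (std::size_t nx = 0; nx < needles.size(); ++nx) {
        char ch = needles[nx];
        for (std::size_t hx = 0; hx < haystack.size(); ++hx) {
            if (ch == haystack[hx]) {
                ++ans;
                break;  // leaves only the haystack loop; the needles loop goes on
            }
        }
    }
    return ans;
}
```

With bat and tabla this gives 3, and with tab and bottle it gives 2. If needles is allowed to repeat characters, the count is inflated: tata against bottle gives 2 rather than 1.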
Now if I however give something with repetition, suppose I say bata and bottle, that doesn't matter, because a doesn't appear at all in the other string; it's still 2. But suppose I input something like tata and bottle. Now what happens? I count one for each t in tata, not one for each t in bottle, so the two t's in tata count for two. If you want the answer to be still one, then you need to somehow deduplicate tata before you get started. So that's the generalization here: can you make this work in case needles also has repeated characters? The specification is that if needles is bata and haystack is tabla, I still want to print 3; I don't want to account for the a's multiple times. There are two approaches. I can do a preprocessing pass on needles so that it's deduped, or deduplicated, so there are no duplicate characters; if I can do that then I've reduced it to a known problem. The other is to dedup needles on the fly inside the needle loop. So let's look at these two approaches one by one. One is that I'll deduplicate the string a priori. How do I do that? Say I have a string needle, and I have a dedup version where there are no duplicate characters.
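A minimal sketch of the first approach, the preprocessing pass, is given here; the walkthrough that follows explains each step. The function name dedup mirrors the variable name used in the lecture, but packaging the loop as a function is an assumption, and push_back is used because std::string has no single-character append(ch) overload.

```cpp
#include <cstddef>
#include <string>

// Returns a copy of needle that keeps only the first occurrence of each
// character, preserving the original order.
std::string dedup(const std::string& needle) {
    std::string out;
    for (std::size_t nx = 0; nx < needle.size(); ++nx) {
        bool already = false;
        // Look only at positions strictly to the left of nx.
        for (std::size_t lx = 0; lx < nx; ++lx) {
            if (needle[lx] == needle[nx]) {
                already = true;
                break;  // one earlier occurrence is enough
            }
        }
        if (!already) {
            out.push_back(needle[nx]);
        }
    }
    return out;
}
```

For the input bata this produces bat, which can then be handed to the earlier needle-counting loop unchanged.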
Now I could say something like this. Remember both are initialized, and say I read needle from standard input in whatever way, and now I start iterating through it: for (int nx = 0; nx < needle.size(); ++nx). What should I do here to plug only the distinct characters of needle into dedup? That's a nested loop as well. What I have to do is say something like bool already = false. So this is the needle string, and I'm considering the position nx for inclusion in the output dedup. I want to find whether that same character occurs any place to the left of nx in needle, up to and excluding nx. So I say for (int lx = 0; lx < nx; ++lx), and the condition I check is: if needle[lx] is equal to needle[nx] then I've already handled it. At this point I set already to true, and I don't need to check for anything more, so I break the lx loop. After the lx loop I basically say: if not already, then append needle[nx] to dedup. And then I close the nx loop.
So is this format clear? How many people are comfortable with what's happening in this code? I have a stepper nx going through needle. For every position of nx I have another stepper lx, which checks if any position to the left of nx already had that character. If I've already copied that character then I don't copy it to dedup; otherwise I append it to the output string. So if I had bata as input, b is transferred because there's nothing before it. See, in case nx is equal to 0, I initialize lx to 0 and the test lx < nx fails, so the inner loop will not be executed at all. So the limits all work out nicely. So b is transferred to the output. For a, you test whether b is equal to a; it's not, so a is transferred to the output. Similarly t is transferred to the output. The last a is lost because it matches a previous a. So this is how you can dedup needle first, and then you can invoke the earlier code that
will do your job. But there is a second way. In the old code I had the following structure: I said for (int nx = 0; ...) for the needle loop, and then inside it I had for (int hx = 0; ...), and this was the haystack loop. That was the broad structure of the code. What I'll do is, instead of creating a new string variable called dedup, I will check whether I need to deal with nx or not earlier in the loop. So this code, bool already = false followed by the code that found out whether already should be true, call it code number one, will be inserted into the needle loop before the haystack loop. So there is no storage required for dedup; I'll check it on the fly. And then at this point I'll write something like: if (already) continue.
That's a new keyword, so we need to understand what this statement does. If the character at position nx was already found earlier in needles, then we abandon the rest of the loop body. Continue means: abandon the rest of the body, execute the stepper of the loop you are continuing, and then get into the next iteration. So the statement looks like this: for (init; cond; stepper), and inside you have some statement s1, then after checking some condition you do a continue, and then you have s2. If the condition is not satisfied then continue is not invoked and your loop iterations go s1, s2, s1, s2, s1, s2 and so on. The first time that the condition holds and continue is invoked, s2 for that iteration is no longer executed, so your sequence will be like s1, stepper, cond, s1, s2 and so on, depending on what happens after that. Of course you don't technically need the continue statement; instead you could write it as if not already, then do this stuff, and close that. But if you're writing a long loop body, then indenting it more and more increases the difficulty of reading it, and using one or two continues in a loop is okay. So use it with discretion. So that is how you find needles in
haystacks. Any questions about the string examples we have seen so far? How many people are there whose unfamiliarity with strings is interfering in understanding this, or is it clear enough that a string is just an array of characters and we are manipulating them? So, people who are finding it uncomfortable to understand the string examples? Okay, or to pitch the question the right way, how many are comfortable with understanding this?
So the next example shows the use of parallel steppers. Sometimes it's useful to have multiple variables stepping through arrays, and here is an example. This is a little contrived, but we'll see good applications of it in the next couple of slides. Suppose I want to print a string in some crazy order: print the first character, then the last character, then the second character, then the second-last character, and keep doing this until you meet in the middle. We can do it in careful ways by doing index arithmetic. Say the size of the string is n; I start with lx equal to 0, then I print position lx and I print position n - 1 - lx. So I can do some index arithmetic, but you have to get all the indices right, and it's much easier to use two steppers. So I start off with lx equal to 0 and rx equal to message.size() - 1. I continue while lx is less than rx, while they haven't collided in the middle, and the stepper is now two parallel operations: I increment lx and I decrement rx. Much easier to read. And I output message[lx] and message[rx]. Now observe that the entire string may not be printed: if the string is of odd length, the middle character will get omitted.
Now what are the uses of parallel steppers? Suppose I want to reverse a string. You were printing it in that crazy order, but I can now put the same idea to use to reverse the string in place without creating a different string. How do I do that? I start with lx equal to 0 and rx equal to the last position, and then inside the loop I use a temporary variable, say, or do some other tricks that I have indicated
to switch between message[lx] and message[rx]. So again, when message[lx] is used in the right-hand-side expression it is a character value, whereas when message[rx] appears as the left-hand side it means write the character temp into the slot indicated by message[rx]. So suppose I have the string hello, with lx pointing to the first position and rx pointing to the last position. So temp will become h, the first position will be reassigned to o, and then the rx position will be assigned h. As a result I will get oellh, and then lx will step forward and rx will step backward. Similarly e and l will be switched, and then lx and rx will collide at the middle. In this case it doesn't matter if you collide in the middle and omit the middle character, because there is no reversing it. So this can reverse a string for you in place without allocating a new string. Fairly easy to do.
Next example. Suppose I give you two strings which are exactly the same length and I want you to compare them. By comparing I mean you have to output an integer which is negative, zero or positive according as the first string is less than the second string, equal to the second string, or greater than the second string. Now how do we define less than and greater than? Remember that the ASCII codes of characters are such that small a is less than small b, and so on up to small z. For simplicity, assume that the input strings are strictly lowercase characters only, from a to z. In particular, suppose we want to return the difference of the first differing characters. That will do the trick, because as long as things are equal we will not react; at the first differing character you return the difference of the character ASCII codes, and that will make sure that the smaller string comes out negative. So suppose my input strings are as and bs, which are suitably initialized, and let's say my answer is 0. Now because they're exactly equal length, I just need one stepper. So int ix
equal to 0, with n equal to as.size() (in real code you should check that the lengths are equal and not trust the specification), the loop runs while ix is less than n, and then ++ix. Now inside I have an assignment expression; we have talked about this before. You assign to answer as[ix] minus bs[ix]. Remember characters are integer types, so you can take a difference of two characters. It may not be printable, it may not make sense as a character, but it is an integer. And if that quantity, the answer, which is the difference, is not 0, then you break the loop and finally output answer. So what's going to happen is: if the strings are identical, you are never going to take the break. You are going to run through the loop normally and exit because ix becomes equal to n, and at that point you'll output 0, which says the strings are equal. If at any intermediate point the characters differ, then answer is going to be set to a non-zero value, the loop will break, and we'll output that non-zero value. If those are the input strings, at the first iteration h will compare equal and nothing will happen. At the second stage answer will be set to e minus a, which we know will be positive, and that is what will be emitted. So this will basically translate into the statement that as is greater than bs, where this greater-than is defined over strings of equal length. Clear? So that is how we compare strings of equal length.
Now of course all strings in the world are not the same length, and to compile a dictionary or a telephone directory we need to deal with strings of unequal length. We need to deal with very short names like Smith versus Krishnamurthy. Smith comes after Krishnamurthy in the dictionary, even though Smith is shorter, because it starts with s. So let's make up these rules. So hello will be less than help. Why is that? Because of the first differing character. Let's try to generalize this to strings of arbitrary length. That's it? Everyone is convinced? So we'll say that hello comes before help, sort of like
Krishnamurthy comes before Smith, because the first differing characters are the l and the p in the fourth position, and l is less than p, so therefore hello is going to be less than help. But hello is going to be greater than hell, just because hell is shorter. So a shorter word comes before any longer word that extends it. That's exactly how dictionaries are compiled; in dictionary order this will be the order. So the technique is that we scan both strings from the beginning. If a differing character is found then the job is the same as before: we return the difference of the character values. Otherwise, if one string ends earlier than the other, then it is less than the other. That's the convention, and this is easy to code up.
We initialize the for loop outside. By the way, when you write for (init; cond; stepper), any of those could be empty. If there is no stepper, then the job of stepping must be done inside the loop body somehow. If the cond is empty, that defaults to true, so the loop will keep on going unless you break inside. And even init can be empty, which means initialization has already been done by the time you enter the for statement. So in this case we have an empty initializer. I start off with answer equal to 0, ax equal to 0, bx equal to 0, and an and bn set to the sizes, and then for, with no initializer, while answer is equal to 0 and ax is less than an and bx is less than bn. Strictly speaking we don't need the test that answer is equal to 0, because if the answer turns out non-zero we'll break immediately. The conditions are that I have not run out of either of the strings. And the steppers are parallel: ++ax and ++bx. Now in the loop I just say: if the difference of characters at the current position is non-zero then break. But what could happen is that this loop terminates normally. What is normal termination? It could terminate because either of the strings has finished, in which case answer will never have been assigned a non-zero value. So I
check that if answer is found to be 0 then I reset it to an minus bn. So in case one of the strings is longer than the other, this will result in answer becoming non-zero. Now you should convince yourself that in all cases of loop termination that last patch will do the right thing. Suppose the strings are actually of equal length and they end at the same time, and the loop terminates normally. It means that the strings have to be equal, so answer is left at 0 finally as well. If the loop terminated because of a differing character, then this will not even execute. If the loop terminated because one of the strings was strictly shorter than the other, then this will again do the right thing. So this is how strings of arbitrary length are compared to each other using what's called the dictionary order or the lexicographic order.
Later on, when we look at sorting and searching, you will realize that this order is very helpful, because you can now take all citizens with ration cards or something and you can sort their names, and the advantage of keeping the names sorted is that if I am looking for someone I can create indices on it. See, in telephone directories there are these tabs which give you a, b, c, etc., so if you are looking for Krishnamurthy you can quickly skip to k. You don't need to do a sequential scan from the beginning of the telephone directory to the end, which would be very painful. Similarly, computers and databases which store these kinds of strings, and people's incomes and their tax returns for the last year, always store them with various indices on them, where the keys are kept sorted. Now to access someone's income tax record you don't need to search the whole database; you access the index to go into a small block and then you look inside the block. So that's why we care about ordering of strings and ordering of numbers. So, any questions on comparing strings of unequal length? So this is easily coded up and it
should do the right thing. In case you set answer to an minus bn, the answer is not a difference of characters, but that's okay; at the end what I care about is the sign of the answer, whether it's negative, zero or positive, nothing else.
So in summary, we have looked at many more examples of for loops. We started with the while loop, then we went to for, which gives you a little syntactic comfort of declaring the initializer, the condition, the stepper and the loop body all in one shot. And then we saw that it's convenient to have these two new constructs: one is continue and the other is break. Break we had already seen; continue we saw today. If you want to abandon the rest of a loop body and then continue with the next iteration of the same loop, use continue. If you want to abandon the rest of the loop body and abandon the loop entirely, use break. Continue and break affect only the immediately containing loop. If you don't want that, if you want to break or continue outer loops, you have to assign names: you have to say this loop is called foo, this loop is called bar, and you have to say continue or break with a label. (Java supports that directly; in C++ you would use a flag variable or a goto to a label instead.) That is somewhat discouraged because it makes code harder to read and even harder to maintain as you change the logic. So basically use them only sparingly, in a clean, well-lighted area; don't play with break and continue too dangerously. So that's pretty much what I had for today. Next time we will go outside the paradigm of character arrays and look at arrays of double-precision numbers, integers and so on.
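As a recap, two of the loops from this lecture can be sketched in C++ as follows. The first counts needles in a haystack while skipping repeated needle characters on the fly, using continue in the outer loop and break in the inner loops. The second compares two strings in dictionary order with parallel steppers and an empty initializer. The function names countDistinctNeedles and compareStrings are hypothetical, and taking the strings as parameters instead of reading them from cin is an assumption made to keep the sketch short.

```cpp
#include <cstddef>
#include <string>

// Counts distinct characters of needles that occur in haystack.
// Repeated needle characters are skipped on the fly with continue.
int countDistinctNeedles(const std::string& needles,
                         const std::string& haystack) {
    int ans = 0;
    for (std::size_t nx = 0; nx < needles.size(); ++nx) {
        bool already = false;
        for (std::size_t lx = 0; lx < nx; ++lx) {
            if (needles[lx] == needles[nx]) {
                already = true;
                break;      // leaves only the lx loop
            }
        }
        if (already) {
            continue;       // run ++nx, then start the next needle
        }
        for (std::size_t hx = 0; hx < haystack.size(); ++hx) {
            if (needles[nx] == haystack[hx]) {
                ++ans;
                break;      // leaves only the hx loop
            }
        }
    }
    return ans;
}

// Returns a negative, zero or positive value according as `as` comes
// before, equals, or comes after `bs` in dictionary order.
int compareStrings(const std::string& as, const std::string& bs) {
    int ans = 0;
    std::size_t ax = 0, bx = 0;
    const std::size_t an = as.size(), bn = bs.size();
    // Empty initializer; two steppers advance in parallel.
    for (; ans == 0 && ax < an && bx < bn; ++ax, ++bx) {
        ans = as[ax] - bs[bx];
        if (ans != 0) {
            break;
        }
    }
    if (ans == 0) {
        // No differing character was found: the shorter string comes first.
        // Only the sign of the result matters.
        ans = static_cast<int>(an) - static_cast<int>(bn);
    }
    return ans;
}
```

With these definitions, countDistinctNeedles("tata", "bottle") gives 1, compareStrings("hello", "help") is negative, and compareStrings("hello", "hell") is positive.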