Okay, so we are off to chapter two, and we're talking about analyzing algorithms. This is the introduction to it. We're going to be analyzing algorithms as we go along through the rest of the book, so we'll need to know how to do it.

Oh, that's nice to know that they can see my screen. Oh, somebody else joined us, I see. Well, I'll find out who it is later on. By the way, whoever just joined in: if you have any questions, just turn on your microphone and ask.

So the question is: when we have an algorithm, how do we know whether one algorithm is better than another? The book says there's a difference between the program and the underlying algorithm. So let's take a look at this program here, where we're finding the sum of the numbers one through n. You give me a number n, and I return a long, which by the way is a 64-bit integer; that means I have more and larger numbers that I can return. Then I run a loop from one up to and including n, and return that value. There's nothing particularly new or exciting here. So that's one way to find the sum of the numbers one through n.

Everybody okay with this code? Do you actually need me to run it, or are you going to believe that it works? Oh, all right. So let's copy this and save it, and this is called FindSum, I think. Well, now we get to see if what we put in the book is correct. We run it, and the answer's 55. Yeah, so it works. So everybody's okay with this code, right? Yes? Any part of it that you might not be familiar with?
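
For reference, here is a minimal sketch of what the loop version might look like. The on-screen code isn't captured in the transcript, so the class name, the variable names, and the test value of 10 are assumptions based on what is said above.

```java
// Sketch of the loop-based sum described in the lecture.
// Names (FindSum, sumOfN, theSum) are assumptions, not the book's exact code.
class FindSum {
    /** Adds the integers 1 through n with a loop; long is a 64-bit integer. */
    static long sumOfN(int n) {
        long theSum = 0;
        for (int i = 1; i <= n; i++) {
            theSum = theSum + i;   // the running total is updated on every pass
        }
        return theSum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfN(10));   // prints 55
    }
}
```

With n = 10 the running total goes 1, 3, 6, 10, and so on up to 55, which matches the answer seen in the lecture.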
Yeah? Okay, well, it's adding up the numbers from one up through and including ten, correct? The sum starts off at zero, agreed? So when i is one, we add one to zero, which gives us one. Then i becomes two; now we take one plus two, which gives us three. And the sum keeps getting updated every time through. I guess I could check it in JShell: one plus two plus three plus... okay, today is trouble-typing day, I can see that already. Nobody believes a word I say. Okay, wonderful.

So there's one algorithm for doing it, a program to add up the numbers one through n. Now the book goes on and says, okay, here's another one. If we look at this FindSum2, this is pretty awful. It's exactly the same program, except all the variable names are really stupid, so that it's harder to read. So the question is: is this a better algorithm or a worse algorithm? The answer is, it's the same algorithm. It's just a really badly written version of the same algorithm. But I'd still say the program is worse, because it's less readable. It also has this useless assignment statement here that doesn't get used for anything worthwhile.

So the question is, can we improve it? We can make it worse by giving things ridiculous names. Is there a better way to add up the numbers one through n, rather than running a loop? And it turns out, yes, there is. Sorry, I'm trying to figure out where this formula is... there. Namely, the sum from i equals one to n of i is n times n plus one, over two. So that formula is a lot easier; I can do it with just that one calculation.

So let's go back to this first one here, where we're running the loop. The question is, how well does this perform?
How good a performer is this algorithm? One thing that we can do is record the amount of time it takes to execute this method. So here I have this sumOfN; that's the same method that I had in the other program. I'm going to run 25 trials of this. I'm going to ask the user for n, to find the sum from one to n. Then I'm going to use something called the nanoTime method, which is in the System class, and what it does is read the Java virtual machine's clock time. By the way, the Java virtual machine's clock time has absolutely nothing to do with this one down here, which is the time of day, and nothing to do with the wall clock over there. It's an internal timer that the Java virtual machine uses.

So I'm going to record what my start time is. Then I'm going to calculate the result. Then for my elapsed time, I'll look at the system clock again, subtract the start time, and divide by a billion, because nanoTime counts in nanoseconds and I want seconds.

Are you all familiar with that notation, 1.0e9? Have you ever used that before? That's one of the nice things that you can do in Java. I can say, for example, 1.735e4, and that'll be 1.735 times 10 to the fourth. It's just a convenient notation, because for 10 to the ninth I don't want to have to write nine zeros and lose count; I can just say 1.0e9. I wonder what would happen if I tried to cast that to an integer: (int) 1.0e9. There you go, there's a billion.

And then I'll print out how long it took. So I'll say which trial it was, what the sum of the numbers was, and the time. So let's compile that and run it, and let's say the numbers from one to a million. And this is approximately how long it took.
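
A sketch of the timing program just described might look like the following. It is an approximation, not the program on screen: in particular, it takes n from the command line (defaulting to one million) instead of prompting the user, and the helper `elapsedSeconds` is a name introduced here for illustration.

```java
// Sketch of the 25-trial timing run; structure and names are assumptions.
class Timing {
    /** Loop version of the sum, same idea as in the earlier program. */
    static long sumOfN(int n) {
        long theSum = 0;
        for (int i = 1; i <= n; i++) {
            theSum = theSum + i;
        }
        return theSum;
    }

    /** Converts two System.nanoTime() readings into elapsed seconds. */
    static double elapsedSeconds(long startNanos, long endNanos) {
        return (endNanos - startNanos) / 1.0e9;   // 1.0e9 nanoseconds per second
    }

    public static void main(String[] args) {
        // Assumption: n comes from the command line rather than a prompt.
        int n = (args.length > 0) ? Integer.parseInt(args[0]) : 1_000_000;
        for (int trial = 1; trial <= 25; trial++) {
            long start = System.nanoTime();
            long result = sumOfN(n);
            long end = System.nanoTime();
            System.out.printf("Trial %2d: sum = %d, time = %.7f seconds%n",
                    trial, result, elapsedSeconds(start, end));
        }
    }
}
```

The actual times will differ from machine to machine and from run to run, so no expected output is shown.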
You'll notice something really weird here. The first one took 0.002 seconds, the second one 0.001, and then everything started to calm down and get to something consistent towards the end. So the question is, why did the first trial take longer than the 25th? The answer is because of the way the Java virtual machine works, and there are some things that we have to understand.

The Java virtual machine has a setup time. The first time I run a program, it has to do all that setup, so that adds to the amount of time for my first trial.

The Java virtual machine also does something called caching, which means it stores previous results in memory. That means if there are some things that are being done over and over again, it can take advantage of that without having to recalculate, but building up that cache takes some time.

And also the Java virtual machine does garbage collection. Remember what we were talking about earlier? Did I talk about that in this class? Does anybody remember? When you allocate objects in Java and you change the references to them, sometimes you'll have a reference that you used, and now you've got something out in memory but nobody's referring to it anymore. So what happens to that extra memory that's just hanging out with nobody referring to it? That's considered garbage, and a process called the garbage collector comes through, sweeps it up, and reclaims the memory so it can be reused.

All of these things take time, and that's why the first trial of the algorithm takes longer than the second. Eventually everything settles down, which is what we see here. In these last few trials, it's pretty much the same amount of time for adding up the numbers one through a million. Everybody okay with that so far? Yes? You sure you're okay?
Hey guys, I see some doubtful looks out there. Now, back to the algorithm analysis, which is the point of this whole thing. I'm sorry, I had to talk about this, because otherwise people look at these timings and say, what the hell's going on? Java is really weird.

Okay, let's save this as Timing2.java. This time, when I want to find the sum of n, I'm going to say long result becomes n times n plus 1, divided by 2, and then I'll return the result. So now we've changed the algorithm from the loop, which was this one here, to the formula. And now let's see if there's a difference. Let's compile this and run it, and we're going to go up to a million again. You'll notice that, again, the first time through everything's pretty wild, but everything settles down, and it's a much more efficient process, isn't it? Why? Because instead of running a loop a million times, I'm doing one calculation. Instead of a million steps, I have one step, and you can see the difference.

Do you want me to run the other one, so you can see again what the times were? Yeah, let's grab these last three trials here. So this is the one-line formula, and here's the loop, which is called Timing.java. By the way, I don't have to compile it again; once it's compiled and I haven't made any changes to it, I don't have to compile every time. And then we're going to go to one million. So yeah, that's a big difference.

Okay, the next thing we need to talk about is the number of steps that are needed to do something. If we wanted to compare these two algorithms to one another, we might want to find out how many assignment statements get performed. Let's go back and take a look at the first function again. Here we have one assignment statement, plus a million assignment statements.
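
The formula version of the method might be sketched like this. The cast to long is a choice made in this sketch, not something stated in the lecture: if n is an int, multiplying n by n + 1 in int arithmetic would overflow for n around a million, so the multiplication is done in 64 bits.

```java
// Sketch of the one-step formula version; the class name follows the
// Timing2.java file mentioned in the lecture, the rest is assumed.
class Timing2 {
    /** Sum of 1..n using n(n+1)/2: one assignment, no loop. */
    static long sumOfN(int n) {
        // Cast first so the product is computed as a long (avoids int overflow).
        long result = (long) n * (n + 1) / 2;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sumOfN(10));          // prints 55
        System.out.println(sumOfN(1_000_000));   // prints 500000500000
    }
}
```

However large n is, this does the same single calculation, which is why the timings no longer depend on n.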
So the number of steps, which we write as T of n: for the loop, the number of steps to do this sum is one plus however many times the loop runs. Using the formula, the number of assignments we have is two... do we have two or one? I don't remember. I think we may have had only one there. Yeah, there's only one assignment statement. Okay, well, there we go. And notice, by the way, that this is independent of which number we're using.

So T of n is the time it takes to solve a problem of size n, which for the loop is one plus n steps. The exact number of operations is really not as important as the most dominant part of the T of n function. As the problem gets larger, some part of this function will tend to overpower the rest.

Let's come back here. For example, when n is three, that first assignment statement is one fourth of the whole time, isn't it? But when we're doing the sum from one to a million, is that extra one assignment statement making a gigantic difference in how much time there really is? So what's really taking over here: is the one the important part, or is the n the important part? The n is the important part. And so we're going to use what's called big O notation, and the O stands for the order of magnitude of a problem. So for the loop, it's order n.
I think it's sometimes written O of n. The time is proportional to the number you're adding up to. Using the formula, it's order one: the time is independent of what number you're adding up to. And although T of n is interesting, we are most often comparing things. When I say order n versus order one, well, once n gets large, order n is always going to take longer than order one. Order one is the best we can ever do.

Now, sometimes there's also best case, worst case, and average case performance. For example, if you're sorting numbers, and the numbers are in the exact reverse order of what they're supposed to be, that might take longer than if they're just randomly arranged. So the worst case might be when something's completely sorted, but in the wrong order; that'll take the longest amount of time. The best case for sorting a list of numbers is if it's already sorted at the very beginning, because then there's nothing to do. And the average case is if you just have some random numbers thrown at you in any old order and you have to sort them; that's going to be the average amount of time. So sometimes we'll be characterizing an algorithm by how well it performs in the worst case, how well it performs in the best case, and how well it performs on average.

And here are the common functions for big O. If it's order one, it's called constant time: the amount of time it takes is totally independent of the number of items that it's manipulating. Log n is what's called a logarithmic algorithm, and we'll see some of those later on in the course. The loop that we did to add up the numbers from one to n, that's linear time, because the time it requires is proportional to the number of items. Log-linear is n log n, and again, we're going to see some algorithms like that. Then quadratic is n squared. If the amount of time it takes is proportional to the cube of the number of items, that's called a cubic algorithm. And if it's two to the n, that's called an exponential algorithm. This list goes from fastest to slowest.

Now, all of this I'm telling you is pretty much to build up the background for what we're going to be talking about the rest of the semester. That's why you want to be familiar with these terms, so you can say, oh, that's a quadratic algorithm; do you think we can improve it to be linear? That's going in the right direction. Quadratic is n squared; linear is on the order of the number of items.

And here's one that we can analyze. If we go and count the number of assignment operators, we have four assignments here at the top, correct? Those are the first four. The second term is three n squared: we have three of these, but they're done n times for the inner loop, and the inner loop runs once for each of the n times through the outer loop, correct? So we have four plus three n squared. Then this next term is two n, because we have two assignment statements done n times, yes? And then finally we have this one assignment statement out here at the end. So there's our T of n: four plus three n squared plus two n plus one, which is three n squared plus two n plus five.

Now, if I'm going through only one item, the five is going to be the most important part of this. But what happens if I have a thousand items? Which one's going to be the most important part of this now: the three n squared, the two n, or the five? The three n squared. And in fact the n squared is the dominant part; the factor of three is really not important. So we say this fragment of code is order n squared. The amount of time it's going to take is proportional to the number of items squared, because of this nested loop here. This nested loop is the thing that just kills us. When n gets up to a million, I mean, we have three times a million times a million. That's an enormous amount, right?
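
To check that count, here is a sketch that runs a fragment of the shape described and tallies every assignment as it happens. The fragment on screen isn't in the transcript, so the variable names and values are assumptions; in particular the fourth opening assignment (`extra`) is a stand-in, since the lecture only says there are four.

```java
// Sketch: count assignments in a nested-loop fragment and compare with
// T(n) = 3n^2 + 2n + 5. Variable names and values are hypothetical.
class CountAssignments {
    /** Runs the fragment for a given n and returns how many assignments ran. */
    static long countAssignments(int n) {
        long count = 0;
        int a = 5;      count++;
        int b = 6;      count++;
        int c = 10;     count++;
        int extra = 0;  count++;   // stand-in for the fourth opening assignment
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                int x = i * i;  count++;   // three assignments, done n * n times
                int y = j * j;  count++;
                int z = i * j;  count++;
            }
        }
        for (int k = 0; k < n; k++) {
            int w = a * k + 45;  count++;  // two assignments, done n times
            int v = b * b;       count++;
        }
        int d = 33;  count++;              // one final assignment
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countAssignments(1));      // prints 10
        System.out.println(countAssignments(10));     // prints 325
        System.out.println(countAssignments(1000));   // prints 3002005
    }
}
```

At n = 1000 the nested loop accounts for 3,000,000 of the 3,002,005 assignments, which is the sense in which the n squared term dominates.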
And the two million and the five, who cares about those? That's almost just rounding error.

This is a good example; I really like this one. What happens if we want to detect anagrams? For example, the strings taster and treats are anagrams; that means one string is a rearrangement of the other string. Here's the first algorithm that they give you in the book: check off each letter in one string against the letters in the other string.

So I'm going to have strings s1 and s2. Well, first of all, if the lengths are different, then they can't be anagrams. They have to have the same number of letters, right? Yes. Then I'm going to take the second string and make it into an array of characters. I'll show you in JShell what that means: if I have a string s, then I can say s.toCharArray(), and that'll give me an array of six items. It's very nice to be able to split that apart.

Then, as long as I still have an anagram, I look through. I'm going to have a position in my first string. I guess I have to write this here. So I start here, at the first position, and then I go through all of these characters in the second string to try and find the letter t. The moment I find it, I replace it with a dash, so I never see it again. So this t, I found it here, and I replace it with a dash. Now I go to the a. Is this an a? No. Is this an a? No. Is this an a? No. Is this an a? Yes, it is; terrific, I'll replace that with a dash. Then I go to the s, and I go through here looking for the s; I find it right there. Then I have another t: no, no, no, no, yes, and I replace it. Eventually, at the end, I'm either going to have all dashes or I'm going to have something left over.

So if I have something like cards... well, I need something that's not an anagram here.
So the c comes here, and I put in a dash. The a, I find that too. Now I'm looking for a t, and it's not found anywhere in here, and I can stop right there, because if I have something that's in one string but not in the other, then I guarantee it's not an anagram, right? That's why I have this found variable: a boolean that lets me get out of my while loop early.

So as long as I have letters to go in my first word and I still have an anagram, I go through all the letters that are remaining in the second word. For each one I ask: is the character from the string equal to the one in the array? If it is, cool, I found it; otherwise I have to keep moving. If I finally find it, then I set it to a dash; that's what I was doing by hand. If I didn't find it, then it's not an anagram. And then I return whether it's an anagram or not.

Let's run Anagram1. We get true, true, and false, because these are anagrams, these are also, and abcd and dcda are not. Now the question is: what is the performance of this algorithm?
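
The checking-off procedure walked through above could be sketched as follows. This is a reconstruction from the spoken description, not the book's listing: the method and variable names are assumptions, and the middle test pair in `main` is made up (only taster/treats and abcd/dcda are named in the lecture).

```java
// Sketch of the "checking off" anagram test. Assumes neither string
// contains a '-' character, since '-' is used as the checked-off marker.
class Anagram1 {
    static boolean isAnagram(String s1, String s2) {
        if (s1.length() != s2.length()) {
            return false;                       // different lengths: cannot be anagrams
        }
        char[] remaining = s2.toCharArray();    // letters of s2 not yet checked off
        boolean stillOK = true;
        int pos1 = 0;
        while (pos1 < s1.length() && stillOK) {
            int pos2 = 0;
            boolean found = false;
            // Search the second word for the current letter of the first word.
            while (pos2 < remaining.length && !found) {
                if (s1.charAt(pos1) == remaining[pos2]) {
                    found = true;
                } else {
                    pos2++;
                }
            }
            if (found) {
                remaining[pos2] = '-';          // check it off so it is never matched again
            } else {
                stillOK = false;                // a letter in s1 is missing from s2
            }
            pos1++;
        }
        return stillOK;
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("taster", "treats"));   // prints true
        System.out.println(isAnagram("abcd", "dcba"));       // prints true (hypothetical pair)
        System.out.println(isAnagram("abcd", "dcda"));       // prints false
    }
}
```

The compound loop conditions (`&& stillOK`, `&& !found`) are what let both loops stop early without a break statement.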
Okay, and the answer is, the performance of this algorithm is going to be order n squared. Why? Because I have a nested while loop. For each letter in the first string, I potentially have to search through all of the characters in the other string to find it. Each of the n characters in s1 will cause an iteration through up to n characters in the array from s2. And in fact the count happens to be the same formula that we had before, the sum from one to n, which is one half n squared plus one half n. The n squared dominates the n term, and we can ignore the factor of one half, so this is order n squared.

This is how we analyze an algorithm. And it's not a bad algorithm; it's the first one that most people will come up with. But if I have something that's, let's say, a hundred letters, then it's going to take a while, because in the worst case it potentially has to go through about a hundred times 101 divided by two, which is 5,000-some-odd comparisons.

Now, another way to do it is we could sort both of the strings. If we sort all the letters and then compare them one at a time, then if they're all the same, it's an anagram, right? It's a pretty clever way of doing it. I'm not going to run this program, but again, you can run it yourselves. What we're going to do is change each one of them to an array of characters, sort them both, and then have a while loop that compares them.

And you're going to say, oh well, this is order n, because there's only one loop, and it's not nested, right? But there's a catch here. We had to do the sort, so the question is, how efficient is the sort? It turns out that most of the good sorts are n log n, so sorting is going to be either order n squared or order n log n, and the sorting operations will really dominate the iteration. So the whole thing is only as good as the sort, and with an n squared sort we haven't gotten any real improvement.

Now this next one's really awful.
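
Before moving on to that one, the sort-and-compare approach just described might look like this. It is a sketch under assumptions: the names are made up here, and it uses the library method `Arrays.sort` for the sorting step, which may not be what the book's version does.

```java
import java.util.Arrays;

// Sketch of the sort-and-compare anagram test.
class AnagramSort {
    static boolean isAnagram(String s1, String s2) {
        if (s1.length() != s2.length()) {
            return false;
        }
        char[] letters1 = s1.toCharArray();
        char[] letters2 = s2.toCharArray();
        Arrays.sort(letters1);    // the sorts dominate the running time
        Arrays.sort(letters2);
        int pos = 0;
        boolean matches = true;
        // A single, un-nested loop: at most n comparisons.
        while (pos < letters1.length && matches) {
            if (letters1[pos] != letters2[pos]) {
                matches = false;
            }
            pos++;
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("taster", "treats"));   // prints true
        System.out.println(isAnagram("abcd", "dcda"));       // prints false
    }
}
```

The comparison loop alone is order n, so the overall order of magnitude is whatever the two sorts cost.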
Okay, remember how I said you can make things better and you can make things worse? This is worse. What I can do is try all the possibilities: I'll generate the list of all the possible strings from the letters of s1 and then see if s2 is one of them. Well, that's really bad, because it's factorial. Do you all know about factorial numbers? Can somebody tell me what, let's say, three factorial means? Yes, it's a number times the number minus one, times one less than that, all the way down to one. So for example, six factorial, which is written as six exclamation point, is six times five times four times three times two times one, which is 720.

Now can you imagine what that's going to look like if I have a 15-letter word that I'm trying to check for an anagram? That's 15 times 14 times 13 and so on; that's a gigantic number. This grows faster than quadratic or cubic. It gets really, really bad really, really fast. So this is right out. In fact, if there are 20 characters, here's how many possible candidates there are. Okay, so this is not a good solution.

However, this next one's an interesting one. Somebody could say, well, we could check off the letters as we match them, but when we have an anagram, the number of a's in the first word has to be the same as the number of a's in the second word, and the number of x's in the first word has to be the same as the number of x's in the second word. Because if those counts don't match, then they can't be anagrams. Does that make sense? Anybody need me to show an example?
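
To put numbers on how fast the brute-force candidate count grows, here is a small sketch that computes factorials with a loop. The class and method names are made up for illustration; a long is enough for values up to 20 factorial, and anything larger would overflow.

```java
// Sketch: factorial growth for the "try every rearrangement" approach.
class Factorial {
    /** Returns n! for 0 <= n <= 20 (21! no longer fits in a long). */
    static long factorial(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result = result * i;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(factorial(6));    // prints 720
        System.out.println(factorial(15));   // prints 1307674368000
        System.out.println(factorial(20));   // prints 2432902008176640000
    }
}
```

For comparison, 20 squared is 400 and 20 cubed is 8,000, so at 20 letters the factorial count is already far beyond the quadratic and cubic ones.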
Okay, good. And here is the code for that, so let me copy this here and paste it there. What we're going to do is create two arrays of length 26. So this is only going to work with letters, but that's okay. Then we're going to subtract the letter a from each character to get our index. Remember, characters are represented as integers in Java. I'd better show you this in JShell. For example, 'c' minus 'a' gives me two. 'x' minus 'a' gives me 23; x is actually the 24th letter, but we started at zero with a. So I can subtract characters from each other and get integers, and that tells me how far away one is from the other. I'll use that as my index into this array of length 26.

So I do that for the first word, I do it for the second word, and then I go through all 26 and check that the counts are equal. The moment they are not equal, it's not an anagram anymore, and that bounces us out of this loop. I'm really big, by the way, on using compound conditions. So as long as I still have letters to look at and I still have an anagram, check whether the counts for the letter are the same. If they are, great; otherwise, it's not an anagram, and when that flag becomes false, that ends our loop. This is a way of doing a while loop without having to use the break statement. Now, probably everybody else in the world who writes this will use a break statement, but oh well.

Now, what's the performance of this one? There are two loops, right? So is that also order n squared or not? How many people think it's still order n squared? How many people think it isn't? How many aren't sure? Okay, so people aren't sure. Here's what I do if I'm not sure: let's do the T of n business here. Is there a way for me to split this window? There we go.
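
A sketch of the count-and-compare version described above follows. It is reconstructed from the description, so the names are assumptions, and like the lecture's version it assumes both strings contain only the lowercase letters a through z.

```java
// Sketch of the count-and-compare anagram test.
// Assumes lowercase a-z only; any other character would index out of range.
class AnagramCount {
    static boolean isAnagram(String s1, String s2) {
        int[] counts1 = new int[26];
        int[] counts2 = new int[26];

        for (int i = 0; i < s1.length(); i++) {
            int pos = s1.charAt(i) - 'a';           // 'a' -> 0, 'c' -> 2, 'x' -> 23
            counts1[pos] = counts1[pos] + 1;
        }
        for (int i = 0; i < s2.length(); i++) {     // a separate loop, not nested
            int pos = s2.charAt(i) - 'a';
            counts2[pos] = counts2[pos] + 1;
        }

        int j = 0;
        boolean stillOK = true;
        // Compound condition instead of break: stop at the first mismatch.
        while (j < 26 && stillOK) {
            if (counts1[j] == counts2[j]) {
                j = j + 1;
            } else {
                stillOK = false;
            }
        }
        return stillOK;
    }

    public static void main(String[] args) {
        System.out.println('c' - 'a');                        // prints 2
        System.out.println('x' - 'a');                        // prints 23
        System.out.println(isAnagram("taster", "treats"));    // prints true
        System.out.println(isAnagram("abcd", "dcda"));        // prints false
    }
}
```

Each counting loop runs once per letter and the final loop runs at most 26 times, regardless of how long the words are.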
Okay, so here I have two assignment statements. Then let's say the length of the word is n. So I'm going to have plus two times n. Why? Because I have two assignment statements in here, and they're going to go n times, agreed? Now, this next loop is not nested in the other one; it's a completely separate loop. So independently, this is also going to take two n. And then here, since I might go 26 times, and each time I'll do either one assignment or the other, that means I'll have 26 in the worst case. So that's 28 plus four times n.

What's the term that's going to dominate this equation? Is it the 28? That's always going to be 28. What if I have, let's say, a thousand-letter word that I'm trying to check against another thousand-letter word? Is the 28 going to be dominant, or is the n going to be dominant? The n is going to be dominant, and therefore this is an order n algorithm. So if you just count them up, you can ask: what's the term that's going to take the biggest share of the time? And this is an order n algorithm.

So now this is definitely an improvement. Remember the first one, which was really quite nice, I mean, there's nothing wrong with it, was order n squared, whereas this last one here is order n. So it's the preferable algorithm if we have a long string that has to be analyzed. If we have only four or five letters, you know, 25 versus five, you could live with that. But once we get to more than, let's say, 15 or 20, then the difference becomes pretty significant.

And you might want to try these questions. Let's do these here. So what about this one? Is this going to be order n, n squared, log n, or n cubed? Remember, these are not separate loops now; they're nested. I've got to vote for n squared. How many people vote for n squared?
By the way, I'm going to have to rewrite this to use a superscript two at some point. And yes, a singly nested loop like this is order n squared, whereas this one here, where I have the two separate loops, is going to be order n, correct? It's on the order of two n, but again, I can ignore the constant.

Oh, this is an interesting one. This one you have to look at a little bit carefully. In fact, I'll give you a hint here. Let's say i starts off at 64. Is 64 greater than zero? Great, then we add two plus two, and then i becomes 32. 32 is greater than zero, so we do the addition; 32 divided by 2 is 16. So the question is, is this going to be order n or not? Yeah, you want to try? It's log n, because usually when we say log, it's log to the base two. When we have 64 things, that loop will run about six times; with 128 it'll run about seven times; with 256, about eight times. So for every doubling that we have, it takes only one more time through the loop, which is pretty darn good. Log n is much better than n. The value of i is cut in half each time, so it'll only take about log n iterations. You'll see these patterns and you'll become familiar with them as things go along.

So now what we'd like to talk about is the performance of operations on things like array lists and hash maps, and we're going to show you how to do timing, so you can time things. I'm going to go a little bit ahead to array lists here.

Oh, I need to go back for one thing that I forgot to mention; the book does mention it. You'll notice that in our original anagram solution we needed just a little bit of extra storage, for this one array. Here, we needed to create these two extra arrays for counting how many of each letter there were. So what we're doing here is trading off time and memory. And this is important.
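
Going back to the halving loop from the quiz question above, here is a sketch that counts how many times such a loop actually runs. The loop body is reconstructed from the description (test i greater than zero, add two plus two, halve i); the counter and the names are additions for illustration. Note that the exact count is one more than log base two of n, because the pass where i is 1 still runs; the pattern of one extra pass per doubling is the same.

```java
// Sketch: count the passes through a loop that halves i each time.
class HalvingLoop {
    static int countIterations(int n) {
        int count = 0;
        int i = n;
        while (i > 0) {
            int k = 2 + 2;   // the constant-time work done on each pass
            i = i / 2;       // integer division: 64, 32, 16, 8, 4, 2, 1, 0
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countIterations(64));    // prints 7
        System.out.println(countIterations(128));   // prints 8
        System.out.println(countIterations(256));   // prints 9
    }
}
```

Doubling n from 64 to 128 adds exactly one pass, which is the hallmark of an order log n loop.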
I should probably put that in the notes Okay, we'll give you an increase In speed But the cost of four Java program Um improved an order n squared Not great. How do I do these stupid super scripts? And at the expense of six integer arrays That's a great trade-off by the way So ordinarily we say you can often trade time for memory and vice versa So we talked about array lists already. Yes And they need a refresher on those or not Sure, let's go back to j shell and take a quick look at this Important java dot util dot array list And I can have an array list and this since this is a generic type I have to say what kind of items I have in my array list And I give it a name becomes new Array list And again, I could put the word string and repeat it, but I don't have to And then I can say city list dot add Of let's say kupertina City list oops And then if I look at city list there, they all are and I can say city list dot remove The item at element one It returns it to me And there it's taken out of the middle And so these are the operations that we can perform and we'd like to know how well they perform And here's the big O efficiency. So if I want to get something to index, that's constant time No matter how big or how small my array list is Reaching into the array list and grabbing out the value for one of them takes exactly the same amount of time Setting also is constant time If I want to add n items, it's order n To remove something at a given index is also order n Why? Because I might have to move all of the Ones after the one that I got rid of I have to move them all down To get rid of that empty space Does that make sense or do I need to go into that or explain that further? Okay, great. We're good on that. 
And then if I want to find the index of something, to ask whether it's in there, for example cityList.indexOf, I might have to search through the whole list to find out whether it's in there or not. So the amount of time it takes me to find out if something is in an array list is proportional to the number of items in the list. The amount of time it takes me to get rid of something is also proportional to the number of items in the list: the more items there are, the longer it'll take.

When you remove the first item, all the remaining items have to be moved left. If I remove the last item, that's the best case. So again, remember we have worst case and best case. For removing something from an array list, the best case is removing the one at the end; there's nothing to move. The worst case is removing the one at the beginning, because everybody has to move down to the left to fill in that space.

And in fact here, if I have two million items, here's how long it takes to remove the first item. For a million items it takes less time, and for a hundred thousand items it takes less time still. Why? Because there's less to move. So it's proportional to the number of items in the list. Whereas removing the last item (and this should be item, not time, by the way... oh no, this is a time, yeah) takes pretty much the same time no matter how long my list is, whether it was two million, one million, or a hundred thousand items.

And now the question is, how do we write this program? I think this is something that I'm going to start on Wednesday. So on Wednesday I'm going to go through array lists and hash maps, and we can also talk about the assignment. So this is a good time for a break. Or do you want me to talk about the assignment now, so that you can get a head start on it during lab if you want?
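
As a rough illustration of the remove-first versus remove-last comparison, here is one way such a measurement could be sketched. This is not the program behind the numbers shown in the lecture: the class and method names are made up, and a single timing per size is noisy for the start-up reasons discussed earlier, so treat the output as indicative only.

```java
import java.util.ArrayList;

// Sketch: time removing the first and the last item of lists of different sizes.
class RemoveTiming {
    /** Builds a list holding the integers 0 through size - 1. */
    static ArrayList<Integer> makeList(int size) {
        ArrayList<Integer> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            list.add(i);
        }
        return list;
    }

    /** Removes index 0 (worst case: everything shifts left); returns seconds. */
    static double timeRemoveFirst(ArrayList<Integer> list) {
        long start = System.nanoTime();
        list.remove(0);
        return (System.nanoTime() - start) / 1.0e9;
    }

    /** Removes the last index (best case: nothing to shift); returns seconds. */
    static double timeRemoveLast(ArrayList<Integer> list) {
        long start = System.nanoTime();
        list.remove(list.size() - 1);
        return (System.nanoTime() - start) / 1.0e9;
    }

    public static void main(String[] args) {
        int[] sizes = {2_000_000, 1_000_000, 100_000};
        for (int size : sizes) {
            ArrayList<Integer> list = makeList(size);
            double first = timeRemoveFirst(list);
            double last = timeRemoveLast(list);
            System.out.printf("%9d items: remove first %.7f s, remove last %.7f s%n",
                    size, first, last);
        }
    }
}
```

Repeating each measurement many times and discarding the early trials would give steadier numbers than this single-shot version.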
Okay, one thing I'm going to tell you is about the way I'm doing the timing here. Maybe I'll give you a term: this is an ad hoc solution, a solution that's written to fit one case specifically and not generalized. I was looking at what they had in the original book when I was translating this from Python into Java, and I wrote a really big ad hoc solution, which I don't think was a very good one. So I'm debating with myself whether to show you, on Wednesday, the better solution that I came up with yesterday. I'll carry on that debate with myself, and maybe ask one of the other professors and see what they have to say about it.

Oh, let me continue sharing the screen. Should I go over the assignment now or after the break? Now? Okay. So, algorithm analysis. Here's what we want to find out: we want to verify that indexOf really is order n. Then you're going to write a document that describes and summarizes your results. This is really open-ended. You can decide how to do the verification; you can decide how many items to put in the array list and how many times you want to call indexOf. You decide all of that, but you have to let me know what you did in your document.

Let's go and take a look at a sample summary. It'll have the course name, your name, and the date, in any format that you like. This person wrote a program that does the following for several different array list sizes, 100, 200, 300, 400, and 500 thousand and a million items: it sets up an array list with the integers zero through whatever the size is, starts a timer, generates 10,000 random integers, and calls indexOf with each random integer. So it just randomly generates a whole bunch of numbers and tries to find them in the list. It did each of those 25 times and averaged the last five timings. They discarded the first 20, because remember that stuff with the Java start-up time and the caching and all that; we want to give things time to calm down and stabilize. So that's what this person did when they wrote it.

And then here are the results of running the program: the number of items and the time in seconds. You'll notice that a million takes a lot more than a hundred thousand. It looks like it's proportional, and the best way to see whether it's proportional is to draw a graph. It's pretty much a straight line, very nearly linear. They also calculated the correlation coefficient; if you're really into statistics, you can do that. Again, this is something you do not have to do in your summary, but it will also help you check whether there's a linear relationship.

I will not take off for spelling, grammar, or punctuation unless it's so bad that I can't understand what you're saying. If you have problems proofreading and you want to send it to me, I really enjoy proofreading. So if you need to say, hey, can you correct spelling and grammar errors? I love to do that kind of stuff.

You'll upload two files: the Java file and your document. It could be docx; it could be PDF. Please do not send me one of those Macintosh-only formats, because I don't have a Macintosh to read it on. And I've told you all about LibreOffice, yes? If you want to use LibreOffice, that's fine also, because it also, by the way, gives you the opportunity to type in formulas and see them nicely typeset right away, which is really sort of cool.

So that's the assignment. I'll go over this again on Wednesday. I'm going to stop sharing and stop the Zoom session at this point, unless there are any questions from the folks who are watching. Okay, very good then. I'm going to stop recording and stop sharing. I'll see you all on Wednesday.