1 down, what is the answer? Frequency distribution. Frequency distribution, very good. So let us write this. Anything else? 11, 11 is mean, is that ok? So this is 11 down. Anything else? Which number? 15 across. Yes, any other? 12 across, bar graph. 12 across, that is correct. Then somebody said histogram, who said that? All people who said histogram raise your hands. Ok, great. What number? 5 down. Anything else? 8 down, range. Ok. 4 down, Venn diagram? What is 4 down, you said? No, it is not. It will not agree with the other entries. I will give the answer to this one, I do not know whether it is a popular one. Anything else? 9 across you should know, 14 across also you should know. Yes, 9 is line graph, essentially you just connect them as opposed to a bar graph, right. At every data point you put the frequency and just connect them with a straight line. 14 across you should know, ok. Anything else? Yes, 6 across, what did you say? Lower extreme. Lower extreme, that is correct. I want you to think about the rest, we will come back after a few minutes.
Let us get started with today's lecture. The outline for today's class: first, to recap whatever we have done, and then I want to present a case study involving something called four spot. Is there anyone who knows what four spot is? Four spot, anyone heard of Keno? No one? Ok, we will come to that. And then there are also some other measures, we will discuss those also.
So a frequency table is value versus number of occurrences. You can also express it as relative frequency, which you would represent as f by n; essentially you scale the whole thing, ok. Here f is the frequency of a particular value, that is, how many times it occurs, and n is the total number of data items. So it is scaled, and that is the relative frequency. If you present it in tabular form, it is a relative frequency table, ok.
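As a rough illustration of the relative frequency table just described, here is a minimal Python sketch. The data set `marks` and the function name `relative_frequency_table` are made up for this example; they are not from the lecture or the textbook.

```python
from collections import Counter

def relative_frequency_table(data):
    """Return rows (value, frequency f, relative frequency f/n), sorted by value."""
    n = len(data)              # total number of data items
    counts = Counter(data)     # value -> number of occurrences
    return [(value, f, f / n) for value, f in sorted(counts.items())]

# Hypothetical data set, invented for illustration only
marks = [3, 5, 5, 7, 3, 5, 9, 7, 5, 3]
table = relative_frequency_table(marks)

print(f"{'value':>5} {'f':>5} {'f/n':>6}")
for value, f, rel in table:
    print(f"{value:>5} {f:>5} {rel:>6.2f}")
```

Because every frequency is divided by the same n, the relative frequencies in the last column add up to 1.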
Line graph or line plot: the distinct data values go on the horizontal axis, the frequency is given by the height, and at that height you put a cross and draw a vertical line. We saw that in the crossword. A bar graph is the same as a line graph, but with added thickness. And of course we also saw the histogram. It is the same as a bar graph, but it is for grouped data, which we will see, and there is no gap between the bars. They are all compressed together, so that is a histogram. Then we have the frequency polygon. I told you that in the line graph you put a cross and draw a vertical line. Here you put a cross, do not draw the vertical line, and just connect them up, ok. Connect all the crosses by straight lines, that is the frequency polygon.
By the way, all these are taken from chapter 2 and I had already asked you to go through it. In case you have not, please go through it again, because some of this is just descriptive. There is no point in spending too much time on it. All the explanation is given in the book and it is likely that you have seen most of it already.
And then of course you know the pie chart: you break up a circle, and the area of each slice corresponds to the percentage for that particular data value. And as I mentioned, with relative frequency, that is after scaling by the total number, you can draw a relative frequency line graph or bar graph. The relative frequency line graph is just a cross with its line, and the bar graph means you have thickness. The next one is the relative frequency polygon, where you connect the crosses with straight lines.
When you have lots of data and you want to visualize them graphically, it may be difficult. Especially in the old days, the mechanism used was to group things within some range; it is almost like the grades. Say you have the marks from 80 to 89; then irrespective of what mark you scored in that range, all of them are put as AB. So that is a class or grouping. This creation of classes or groupings is done to handle large amounts of data.
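The grouping of marks into ranges such as 80 to 89 can be sketched in a few lines of Python. This is only an illustration under assumptions: the marks are invented, they are taken to be whole numbers, and the class width is exposed as a single parameter `width` so that the whole table can be recomputed by changing one number.

```python
from collections import Counter

def class_frequency_table(data, width):
    """Count how many whole-number values fall in each class [lo, lo + width - 1]."""
    counts = Counter((x // width) * width for x in data)   # lower limit of each class
    return [(lo, lo + width - 1, f) for lo, f in sorted(counts.items())]

# Hypothetical marks, invented for illustration only
marks = [81, 89, 72, 65, 88, 93, 77, 84, 69, 58]

by_ten = class_frequency_table(marks, 10)    # classes 50-59, 60-69, ...
by_five = class_frequency_table(marks, 5)    # same data, finer classes

for lo, hi, f in by_ten:
    print(f"{lo}-{hi}: {f}")
```

Plotting the third column of `by_ten` as bars with no gaps between adjacent classes would give the histogram for these data.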
Sometimes, even if you have the computational capability to handle each value separately, you just want a quick grasp of what is being presented, and for that also you would need this. As a result, such a thing is useful even now, even though we have lots of computers and excellent ways to represent things. Grouping gives some idea of who falls in which range, so you come up with an overall perspective. Of course, some details may be lost. So in case details are important, at the second level you can expand it further: you can split it into a larger number of classes and get more information. So this is one way to do that. Of course with computational tools such as Scilab, if you write the program properly, you will be able to do this just by changing one number and asking it to repeat the whole thing for a larger number of classes. For example, you did some grading based on every 10 marks, and then you say you want to do this for every 5 marks. You should be able to do that just by changing the number from 10 to 5 somewhere, and the program goes through and does the whole thing again.
Then you have the class frequency table; these are some of the terminologies. A class frequency table is a frequency table for grouped or class data. Similarly the histogram that I mentioned is a bar graph for grouped or class data, and it is compacted, there is no gap between the bars. So that is a histogram, and of course you can also do it with relative frequency, so that the total adds up to 1.
And then you have the stem and leaf plot. Have you seen this in the book? It is in the book, stem and leaf plot. Essentially, when you have a moderate amount of data, you divide each data value into two parts. For example, in the book the following example is given. You have numbers with two digits and one decimal, they are shown in a stem and leaf plot, and each number is split after the first digit.
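A stem and leaf plot of this kind, for two-digit numbers with one decimal split after the first digit, might be sketched in Python as follows. The data values are invented, and the sketch assumes non-negative numbers below 100 given to one decimal place; it is not the textbook's own example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each first digit (stem) to its sorted leaves, e.g. 59.1 -> stem 5, leaf '9.1'."""
    plot = defaultdict(list)
    for v in sorted(values):
        # Work in whole tenths so that floating-point noise does not change a digit.
        stem, leaf_tenths = divmod(round(v * 10), 100)
        plot[stem].append(f"{leaf_tenths / 10:.1f}")
    return dict(plot)

# Hypothetical data, invented for illustration only
values = [59.1, 58.3, 72.5, 50.0, 64.7]
plot = stem_and_leaf(values)

for stem in sorted(plot):
    print(f"{stem} | {', '.join(plot[stem])}")
```

Each stem is printed only once per row, which is what makes the representation compact.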
So for example if you have something like 59.1, you will have 5 here, then you put a vertical line, and then 9.1; or 8.3 after the line, which means 58.3. Similarly if you have the number 7 here, say 72.5, then you write 7 and after the line you put 2.5. So this is just a way to represent a moderate amount of information compactly, so that the first digit comes only once for all the values in that range. For example, all numbers from 50.0 to 59.9 have 5 in front. Then 50.0 will be written as 0.0 and the last one, 59.9, as 9.9. So you are removing that 5. In that sense you do not need space to repeat the first digit for every value, you write it only once. So it is a compact way to represent data. I suggest you read the book in case you have not.
We have already seen the mean, that is the arithmetic mean, and the median we also discussed. The median is, roughly speaking, the middle value when the data are arranged in increasing order, and of course if there are two middle values then you have to take their average. That is, if the median falls in between two values, you take the arithmetic mean of the two adjacent values. We saw that in the last class. The mean is sensitive to noise and the median is less sensitive. We saw the ABC News case study last time, the one that said it is mean to ignore the median. We discussed that in great detail in the last class.
And then sample variance. This is something that you would have seen in school. You have data x1 through xn, and the sample variance is denoted as s squared. Note that there are n data items: you subtract the mean from each data item, square the difference, sum over all the items, and divide by n minus 1. This is known as the sample variance. And of course there is an identity: you can expand the sum of squared differences to get the sum of the squares of the data items minus n times the square of the mean. Can somebody tell me why we have this n minus 1 here, and not n?
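To make the sample variance definition and its expanded form concrete, here is a small Python sketch. The data values are invented; `statistics.variance` from the standard library, which also divides by n minus 1, is used only as a cross-check.

```python
import math
import statistics

def sample_variance(xs):
    """s^2 = sum((x_i - mean)^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_variance_expanded(xs):
    """Same quantity via the identity sum((x_i - mean)^2) = sum(x_i^2) - n * mean^2."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)

# Hypothetical data, invented for illustration only
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

s2 = sample_variance(data)
s = math.sqrt(s2)   # the standard deviation is the square root of the sample variance

print(s2, sample_variance_expanded(data), statistics.variance(data))
```

For these six values the mean is 5 and the squared differences add up to 24, so both forms give 24 / 5 = 4.8.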
Why do we use n minus 1, or can somebody tell me what the textbook says about why n minus 1 comes? Anyone, do you recall? The book says, let me recall, that it is for a technical reason. After all, this is a definition, you could have defined it in any way. It is for a technical reason because of the following. There is the variance of the population, but you will never study the whole population, you will always take a sample. You take a sample and you find its variance. Is this variance of the sample representative of the population variance? That is the question. You can show it under some special conditions, for example that this is a truly random sample, collected in an unbiased manner, and so on and so forth. Consider the expected value of the sample variance. You can collect several samples: I take one sample and find the variance, I take another sample and find the variance, and I can go on collecting samples and calculating the variance for each one. What is the expected value of the variance values that you collect over all these samples? Will that expected value be the same as the variance of the entire population? You can show that if you use 1 over n minus 1, the expected value of the sample variance is equal to the variance of the population. If you divide by n instead, the expected value comes out as n minus 1 by n times the population variance, so you would need to put in an extra factor of n by n minus 1. So it is for that reason, to make sure that the expected value of the sample variance is the same as the variance of the entire population.
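The claim about expected values can be checked numerically. The sketch below is an illustration under assumptions, not the textbook's argument: the population is a made-up list of six values, samples are drawn with replacement (to mimic independent draws from a large population), and the helper name `average_variance_estimate` is hypothetical.

```python
import random

def average_variance_estimate(population, n, trials, divisor, rng):
    """Average over many random samples of sum((x - mean)^2) / divisor."""
    total = 0.0
    for _ in range(trials):
        sample = [rng.choice(population) for _ in range(n)]   # draws with replacement
        mean = sum(sample) / n
        total += sum((x - mean) ** 2 for x in sample) / divisor
    return total / trials

rng = random.Random(0)              # fixed seed so the run is repeatable
population = [1, 2, 3, 4, 5, 6]     # hypothetical population
n = 5                               # sample size

mu = sum(population) / len(population)
population_variance = sum((x - mu) ** 2 for x in population) / len(population)

with_n_minus_1 = average_variance_estimate(population, n, 20000, n - 1, rng)
with_n = average_variance_estimate(population, n, 20000, n, rng)

print(population_variance, with_n_minus_1, with_n)
```

Averaged over many samples, dividing by n minus 1 should land close to the population variance, while dividing by n should land close to (n minus 1) by n times it.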
That is why this factor 1 over n minus 1 is put. It is also explained in the PDF file that I have put up; it is explained by a professor from Drexel University. I just did a search, located this and put it on Moodle, you may want to go through that. So s squared is the sample variance, it measures the spread or variability of the data, and its square root, s, is the standard deviation.
Now I am going to talk about this case study. Four spot, or Keno, refers to a game. Did anyone see the write-up on that? If I am not mistaken, I put the write-up up last night itself. Did anyone see it, does anyone know what Keno is? Would somebody want to volunteer what Keno is, anyone?
While waiting for your response, before I go to that, let me move over to this. I see no one writing the wiki, so has anyone thought about it? So who are all going to write the wiki? Why is that? See, I was under the impression that there are people with different interests: some are interested in doing the calculations, some are interested in writing blogs. How many of you write blogs? Yes, there are so many of you. So each one has a different interest. Somebody wants to play games, somebody wants to solve puzzles, somebody wants to act. How many of you are interested in acting, I mean stage acting? Yes, what is your name? Rishabh, okay. Who else is interested? Yes, what is your name? Bansal, okay, thanks. So Rishabh, Bansal, who else? Yes, there, Aatish, okay. Anybody else? Yes, I saw you here, what is your name? Ashish, okay, very good. Anybody else? The reason why I am asking is that I have not received any offer to do this role play. Those of you who are interested, or at least would like to know more about it, please meet me after the class, is that okay?
Because we have allocated one class hour for that. There are so many interesting stories; it is possible to take one of them and just do that. No costumes required, no stage preparation required, you just have to come, okay? So it is extremely easy to do. It is just coming up with an idea and putting it in the form of a stage act, is that okay? I am sure there are other people who would be interested in that, but I now have four names: Rishabh, Bansal, Ashish, Aatish. So I want you to address these two activities, please. Those of you who are interested in writing blogs, write something. It does not matter, we are not going to keep track of who is doing this and so on. Please do that. I have put up lots of material. I have put up an instructional wiki, have you seen that? I have put a lot of links there, and there are lots of interesting stories. So just a few of you get together and say let us do that, and then other people will join, is that okay? What I am trying to do is to see if we can touch upon the interests of varying groups, so I would want you to participate in all of these. Any more here?
So I said that 14 across you should know. What do you think 14 across is? 14 across is outlier, it is very easy. 10 down also is very easy, we actually discussed this just now. I had talked about putting a cross for every value and then drawing a line, where the height is proportional to the number of items at that value. What did we call that?
We called it a line graph, and there is another word for graph. Yes, somebody said it, line plot. So 7 across, somebody got it, who said that? That is correct, scatter plot. And this is 2 across. I have not seen this terminology in the book, but in the book they talk about a box plot, right, so this is known as box and whisker. You should know 3 down. Yes, stem and leaf. Okay, good.
So let us go to the four spot case study. This is the file Keno.pdf, can anybody read this? Keno is a game and it is played for money. A lot of money is won and lost; typically it is won by the casinos and lost by the people who play. Keno is played using a field of numbers from 1 through 80. In fact I went through the explanation of Keno on Wikipedia, and it describes a process by which, out of 80 balls, 20 are drawn by blowing some air, so 20 come out of the 80 in a random fashion. You can select, or choose by quick pick, up to 10 numbers in that field. We are talking about the four spot game, so you are going to pick 4 numbers. You have selected 4 numbers, and the casino will randomly select 20 out of a total of 80. Then you compare whether your 4 numbers are part of the 20 that have been selected. Now suppose you bet 1 dollar. You select 4 numbers, the casino selects 20 numbers through an automated random process, and you see if any of the numbers that you have selected belong to these 20. In the case study that I put up, if all 4 numbers match they give 55 dollars, if 3 numbers match they give 5 dollars, and if 2 numbers match they give 1 dollar. Is it clear, should I repeat? So that is the game. You bet 1 dollar and select 4 numbers; they select 20 out of 80 numbers. If all 4 of your numbers are among the 20, you get 55 dollars; if 3 match, you get 5 dollars; if 2 match, you get 1 dollar; if only 1 matches, no money; if nothing matches, no money. Is it okay?
So I put in 1 dollar and I want to find out how much I will get back. If all 4 match I get 55 dollars, if 3 match I get 5 dollars, if 2 match I get 1 dollar. So how do I calculate my return? Let me summarize as you think about it. You have 80 numbers, 20 are selected by the casino randomly and compared with your 4 numbers. You bet 1 dollar and these are the returns. Somebody wants to calculate how much he is likely to earn; what is the formula? Let us do it in terms of probabilities. Suppose you know the probability of 4 matching, the probability of 3 matching and the probability of 2 matching. Then how will you calculate the earnings? Let us say that P4 is the probability of matching 4 numbers, P3 is for 3 numbers and P2 is for 2 numbers, and you also have the payouts of 55 dollars, 5 dollars and 1 dollar. Then the expected earning equals 55 times P4 plus 5 times P3 plus 1 times P2. If you match 4 you get some money, if you match 3 you get some other amount, so you look at all the chances and sum all of that up. Suppose there were a penalty, that is, if nothing matches you lose some more money and have to pay extra; then you would put in a negative amount times the probability of that. Is that okay? So we just have to calculate what P4, P3 and P2 are, and then finish it.
The question is, what is P4? By the way, this comes from what is known as the hypergeometric distribution. You will read about it in detail in this course. Suppose 3 match. That means 3 of your numbers are among the 20, and there are 20 C 3 different combinations for selecting them. If 3 are there, the remaining one is among the other 60, and there are 60 C 1 ways of selecting that. So 20 C 3 times 60 C 1 gives all the combinations by which you can get 3 matched, and you divide by the total number of possibilities, which is 80 C 4. In general, Px is 20 C x times 60 C (4 minus x) divided by 80 C 4. This will be discussed later on in the class, but it is something extremely important. So what will P4 be? It is 20 C 4 times 60 C 0 divided by 80 C 4. Can you calculate P4, do it now? I do not want you to carry out the multiplications, just cancel the common terms in the numerator and denominator. The numerator is a product of some factors and the denominator is a product of some other factors. Anyone got this? You can show that after cancelling terms you are left with factors such as 17 times 18 in the numerator and the remaining factors in the denominator. I had done this calculation in Scilab and multiplied by 55, which is the money that you earn if this happens. When you do this it comes to 0.1684, so the return through 4 numbers matching is about 16 cents for a $1 investment. Similarly, the return through 3 numbers matching, 5 times P3, you can show to be 0.216, and the next one is through 2 numbers matching. The income through 3 numbers matching is the highest.
So what do you get if you add up all of this, how many cents? Here you have 16 cents, 21 cents and another 21 cents, so 42, 58, plus 1 here, 59. So you put in $1 and on average you get back 59 cents, if you play lots of games. Now the lottery announced that on 4 Wednesdays in a month they would double the payouts. The return is actually about 59.7 cents, almost 60, so if you double it you get $1.19. That means on those days only, as a promotional thing, you put in $1 and you get back $1.19 on average.
So these guys, I think 2 or 3 of them, actually did a simulation. How did they do the simulation? You go on selecting numbers randomly. There is a random number generator; you can write it, for example, in Scilab, and in fact I will give it as a problem to one of you if you want to write it. You simulate it, just go on selecting and then matching. They simulated it and found that if they played for long enough they would indeed get $0.19 as profit per dollar. So they just went there and played on those 4 days, for 24 hours; they did not do anything else. They were delayed because it took time to take the printouts, because they had to collect the tickets, and because of that they lost some valuable time. The case study says that the money they eventually made was off from their simulation value by only $100. They made a cool profit of $100,000 in that month.
Of course, when some statistics people came across this, they went back to the casino and asked them whether this had happened. The casino people said they did not recall anything about the winners. Actually, for the people who played, it is good that they did not recall, otherwise there could have been some problems, because you go to the casino to lose money; if somebody is making a lot of money the casino people are not very happy about it. But anyway, the casino people did recall that, yes, they had run a promotional scheme. Apparently this professor asked the people who played this game to come and explain it to his students. They explained it and concluded by saying that sometimes it makes sense to pay attention in the mathematics class. That is how this story ends. I have taken this from Dartmouth's website, where they have something called Chance News, and I have put that up.
We will discuss this in the next class, but there is one thing I want you to do. There is a problem with rounding numbers in percentiles, so I want you to think about it. In fact there is a discussion on CNX.org from Rice University, I want you to look at this. So I will stop here, thank you for your patience.
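The four spot calculation and the kind of simulation described in the case study can both be sketched in Python. This is a hedged illustration rather than the players' actual program or the lecture's Scilab code: the payouts of 55, 5 and 1 dollars are the ones quoted in the lecture, while the function names, the seed and the number of simulated games are choices made for this example.

```python
from math import comb
import random

# Dollars returned on a $1 four spot bet for 4, 3 and 2 matches (as quoted in the lecture)
PAYOUT = {4: 55, 3: 5, 2: 1}

def match_probability(x, picked=4, drawn=20, field=80):
    """Hypergeometric probability that exactly x of the picked numbers are among the drawn ones."""
    return comb(drawn, x) * comb(field - drawn, picked - x) / comb(field, picked)

def expected_return():
    """Expected dollars returned per $1 bet: 55*P4 + 5*P3 + 1*P2."""
    return sum(pay * match_probability(x) for x, pay in PAYOUT.items())

def simulate(n_games, rng):
    """Average return per $1 bet over n_games randomly played games."""
    numbers = range(1, 81)
    total = 0
    for _ in range(n_games):
        ticket = set(rng.sample(numbers, 4))   # the player's 4 numbers
        draw = set(rng.sample(numbers, 20))    # the casino's 20 numbers
        total += PAYOUT.get(len(ticket & draw), 0)
    return total / n_games

exact = expected_return()
simulated = simulate(100_000, random.Random(1))   # seed chosen arbitrarily

for x in (4, 3, 2):
    print(f"{x} matches: contribution {PAYOUT[x] * match_probability(x):.4f}")
print(f"exact return per $1: {exact:.4f}, doubled on promotion days: {2 * exact:.4f}")
print(f"simulated return per $1: {simulated:.4f}")
```

The exact return is a little under 60 cents per dollar, so doubling the payouts pushes the expected return to about $1.19, which is the edge the players exploited. The simulated figure fluctuates around the exact one and gets closer as the number of games grows.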