guess the phrase. So, the hint that you are supposed to use is given here. It refers to a field that uses measurements. So, you have quite a few letters here. As the number of letters becomes larger, is it easier to guess or more difficult to guess? Easier to guess. Are you with me? Let us take a guess. So, it says it refers to a field that uses measurements. I will in fact give one more hint: it constructs models from data. So, let us get started with this. Let us take a guess. A, A is there. E also is there. U, look at this. I, this I, alright, I has been guessed. S, T, I think you got this. L, R, ok, great, excellent job. So, this course is about inferential statistics. There are going to be four instructors for this course; all four are listed there. There are a total of about 25 lectures in this course, and we will divide the lecturing load equally, ok. But I am going to be the overall instructor of this course, which means that I take care of all the administrative details: attending to your problems and needs, getting your feedback, all that I am supposed to coordinate, ok. So, the outline of today's lecture is as follows. I will partly explain the organization of the course. We will continue to do that in the next class as well. So, this course is on statistics, both descriptive and inferential. You have already seen this phrase, inferential statistics. We are going to spend some time on the history of statistics, and I am going to explain to you how you will all participate and learn this course in an active manner, ok. In fact, this is a new field in pedagogy, which has proven to be very effective, ok. The textbook for this course is by Sheldon Ross; it is available in an Indian edition. The bookstores have sufficient copies of this textbook, ok. Some details of the course: we will insist on attendance. There will be two quizzes, one mid-sem, and one end-sem.
Some of the details I will tell in the next class, some more details on what the weightage is and so on. There will be tutorials, there will be computation. I am personally interested in this package Scilab. We could use other packages as well. The book itself comes with a CD, ok, but the only thing is that it is more or less a closed kind of a thing, whereas if you use Scilab, you will be able to do a lot of computation. We are going to spend some time on this. In fact, I am going to devote the next tutorial session to going through Scilab for all of you. In fact, the next tutorial session will happen in this hall only. So, I will meet all of you in two batches, right. All of you will come here and then we will do the Scilab introduction during the tutorial. We are going to use Moodle quite heavily, a lot more than what you have seen earlier. Moodle will also become clear as we go along, ok. Do you have any questions? Any questions? Ok, if not, let us get started. We have to do a balancing act: how to teach the relevant material, knowing fully well that some of you may have had a very good introduction to probability and statistics, especially if you have gone through, let us say, CBSE, or if you have studied this portion reasonably carefully, which most of you would have done. So, there is a balancing act. We want to keep that in mind and at the same time deliver something that is useful to you. By the way, statistics is a really interesting field. It will become clear as we go along, through some case studies and so on that I am going to do. So, in that sense, my task is a lot easier. I do not have to teach too much theory. I have to spend a lot of time in motivating you, then giving you an overall picture, and then introducing you to some of the tools that we will be using and so on. Now, it is a wonderful subject. Some of you may already know enough of it. So, we need to do a balancing act and make you also participate, make you also contribute to the course learning.
So, if you can do that, then that will be wonderful. So, in that sense, I would also want everybody to participate, want you to give your feedback. So, you know, feel free to interrupt me anytime and ask me questions. Let us begin with statistics. It is all introduction. Statistics is learning from data, begins with a given set of data. Of course, it is important that the data have to be generated carefully. A lot of this course is going to be devoted to how you would generate the data. In fact, I will talk about this wiki later, where you would have to create a wiki that talks about some of these things. We will come to that shortly. How many of you have seen Moodle webpage? Just raise your hand. Is Moodle running? Is your course on? Okay, some people have seen their course, but some haven't seen. I haven't seen. I tried yesterday, it wasn't there, but it is in the process of getting ready. It will be up and running soon. So, we will be using all of this. So, data to be generated carefully. In fact, I want a set of data to be generated for this course through contribution by you. Hopefully, we can use that data for the rest of the course. What do I mean by data has to be generated carefully? Whatever data we use has to be characteristic of the population. You know, statistics never deals with counting everybody. Everybody in the country, for example. There is always a sample. In fact, there is an interesting anecdote to that. We will attend to that shortly. So, the data that you would use for analysis has to be carefully selected and the data collected has to be presented. This is known as descriptive statistics. I talked about two statistics, descriptive statistics and inferential statistics. This is descriptive statistics where you present it, you draw pictures, you draw bar chart, histogram, you know, box plot, all kinds of things. So, that is descriptive. It turns out that half the work is done once you follow the principles of descriptive statistics properly. 
This is the word that you saw in Hangman inferential statistics. It has to do with drawing conclusions from the analysis of data. Data has been collected somehow. We have the data. What do I mean by drawing conclusions? Can we arrive at probability, etcetera after doing a limited number of experiments? After talking to only 5 percent of the public in the country, can you draw conclusions about the whole country? 5 percent is actually very large. Maybe people, you know, maybe talk to at the most a million people. That itself is a huge number, but that is a very small percentage. So, can you arrive at probability, etcetera? First of all, even before that you will have to find out whether probability model is even applicable to this system. So, to draw such conclusions, make some assumptions. This is what I am saying that probability is even applicable. This is known as the probability model. First of all, you say that this whatever underlying phenomenon happens by chance. So, a probability model will be suitable for this purpose. Sometimes the underlying probability model is obvious. And some other times, one has to go through careful analysis and presentation to unearth the model. And it has lot of benefits. In case you can understand the model, underlying model, then you can construct, you can use the model to predict what will happen. In the next class, I am going to talk about case study on gambling. It is a real story. I will talk about it in the next class. And we will see at the time you will have to recall this how something like this is actually useful. As I mentioned earlier, all of the things that you saw in the previous slide assume that the underlying phenomenon can be understood through probabilities. And it uses data to arrive at the probabilities because you do not have the model. All that you have is data. You have to construct the model out of the data. And obviously, to construct such models and so on, a knowledge of probability is required. 
Is that okay? Do you have any questions? How many of you think that you studied your grade 11 and grade 12 statistics and probability reasonably well? Can you raise your hand? Whatever you are supposed to know in grade 11 and 12. I am not saying anything more than that. Did you study your portion properly? That is all. Please raise your hand. Anyone? No? Somebody raised his hand there. A few people. Yeah, I see some people there. Yeah, some more. So, what does it mean? Are you happy that at least some people studied? What was the portion for probability and statistics in JEE? What percentage? Ten percent, five percent, 20 percent? What is it? Does anyone remember the number? Do you know what the percentage for integration in JEE is? I recall a story from some time ago in IIT: about 100 students had failed in the first year. Of course, this course was not even taught then. I am talking about maths, physics, chemistry courses. And it was the case in all IITs also. And so, one of my colleagues talked to the students and asked them, so what happened? Please tell. What was difficult? Why did you do badly? Why is it that 100 students have failed? So, the students said that they came to know in the coaching class that integration had only, I think, 5 percent or 6 percent weightage in JEE. So, the coaching classes told them, do not study integration. So, this was the case. This happened about 4 or 5 years ago, a real story. If you want to know who the faculty member is, I will tell you in private. It is a real story. He talked to your colleagues, found out. And so, the students came here without studying probability. They could clear JEE, of course. So, it turns out that the weightage in JEE seems to have a bearing on what happens before IIT and after IIT or during IIT. That is the reason why I wanted to know what the weightage for probability and statistics in JEE was.
If it had been high, then you would have studied the grade 11, grade 12 portion well. These are two very important words, population and sample. Population is the entire universe. Sample is a small manageable size. In fact, you know, as I mentioned earlier, sampling is a very important tool used in census, population counting and so on. So, I said it is an important tool. Sampling is not just a process. In statistics, it becomes a tool, which means that it has to be sharpened. It has to be clearly designed. Sample is a small manageable size that comes out of sampling. And a rule of thumb is that a sample, if it has to represent the whole population, has to be randomly chosen without any bias. Let us begin with the history of statistics. In fact, we will look at the origin of the word statistics and so on. So, the history of statistics started with numbers on population, economy, etcetera. Of course, in this book, Sheldon Ross talks about what happened in Great Britain, in London. A lot of things seem to have started with plague. You know that plague killed a large number of people. It was prevalent for many hundreds of years. And he talks about statistics done around that time. So I would say in that sense it is a little biased, because you see that this is Europe-centric. It starts discussing statistics from the time of plague. But if you look at the history, people kept track of people in Babylon, the Egyptians, the Jews and so on and so forth. People had kept track of this. In fact, at this point you may want to know about this. How many of you know about the story in which Parsis came to India? Does anyone know? Raise your hand. I can see one person. Anybody else? Thanks. There are some more people. I am pretty sure there are quite a few people, but it is a very nice story, a very important story. It also has to do with this population count, keeping track of the numbers.
The government knowing how many people they could support with the infrastructure and so on. Anyway, because not many people know the story I will tell. So this happened in Gujarat. It happened long time ago. Possibly how long ago was it? Thousand years ago? Perhaps even before. So a group of Parsis come outside this town in Gujarat and they send a message to the king saying that we have come. Please let us in. We want to live here because we are getting persecuted in our country. We have come all the way to India. We have heard about India. So we have come to your country. Please let us in. Let us live. So the king sent a glass full of milk in a big platter. So like you would have seen in movies, somebody goes and delivers. Probably it was covered. They remove it and they see inside a glass of milk, full to the brim. But the Parsis leader was a very wise man. What he did was he took a spoon of sugar, put it in that, mixed it, sent it back. So the king of Gujarat, he drank that milk and it was sweet. So the message that was sent was that we will not increase in numbers because of us. You will not overflow. Essentially the king had said we are already overflowing. We have lots of people. This is all our facilities can support. Cannot support anything more. But the Parsis leader sent a message saying that it will not overflow. Not only that, we will add sweetness. We will add value to the population. And what is interesting is they have kept to that promise even today. So what I want to say is that the very fact that the king had sent this message says that he was fully aware of what the limits are, what the population was. In fact, you would see shortly how somebody estimated the population of London in 1600s and it was considered new. So that is why I am saying that counting, knowing the number of people, knowing what the facilities can support is actually an important thing. But people had known this. 
That is why I said that from ancient days, people had known this. The book does not talk about it, but you will talk about it in the wiki. You are going to develop a collaborative wiki. So statistics started with numbers on population, economy, etcetera. It was done by the state to collect information relevant to governance. It had to do with the state. So it was known as statistics. Examples: births, deaths, marriages, etcetera. Initially, as I mentioned earlier, only deaths due to plague in Europe. But later on, the same European countries started collecting details on other facets as well. So the Englishman John Graunt wrote a book, Natural and Political Observations Made Upon the Bills of Mortality. They started with the plague, collected from London's deaths. So look at these statistics. Year-wise, total deaths and deaths due to plague. Look at the year 1603. Most of the deaths were because of plague. And so plague indeed was serious, and a lot of things developed around the plague. And as I mentioned earlier, Graunt used the mortality information to estimate London's population. The sampling showed that there were three deaths for every 88 people in a particular year. Therefore, he estimated the population of London to be this: 13,200 deaths for the year, multiplied by 88 over 3. I think it was somewhere around this time, 1640 or something like that. I have one more game. Can we play the game? Hangman. Do you see it? About this we make conclusions after statistical studies. You want to take a guess? A, no. E. Look at this. I, I, I. Somebody got it. Yeah, yeah, yeah. So it is a simple word, population. So actually in the next class, we will do a crossword also. There are some very nice, interesting crosswords, and then it will be even more interesting. So it is population. All right. So in the remaining time I would like to talk about how some of the data collected led to model construction or inferences. Okay. A little bit about it.
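The population estimate described above is a simple scaling argument, and it can be written out as a short calculation. This is a minimal sketch: the function name `estimate_population` is made up for illustration, and reading the 13,200 figure as the total deaths recorded in the bills for the year is an assumption based on the textbook's account, not something the lecture states explicitly.

```python
# Sketch of a Graunt-style population estimate.
# Assumption: 13,200 is the number of deaths in the bills for one year.
deaths_in_sample = 3      # deaths observed among the sampled people (from the lecture)
people_in_sample = 88     # number of people in the sample (from the lecture)
total_deaths = 13_200     # deaths recorded for the whole city (assumed reading)


def estimate_population(total_deaths, deaths_in_sample, people_in_sample):
    """Scale the city-wide death count by the sampled people-per-death ratio."""
    return total_deaths * people_in_sample / deaths_in_sample


population = estimate_population(total_deaths, deaths_in_sample, people_in_sample)
print(f"Estimated population: {population:,.0f}")
```

The whole estimate rests on the sampled ratio of 3 in 88 being representative of the city, which is exactly the point about careful sampling made earlier in the lecture.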
It is all taken from chapter one in this book. Okay. I thought that on the first day I will restrict myself to what is in the book, the first chapter. So Graunt used the bills of mortality data to estimate the percentage of the population dying at different ages. Is this useful? Whom does this help? Whom does this help? Supposing I know this. Yeah. Insurance companies. Yeah. The answer is correct. Can you explain how? Yeah. Stand up and tell. Somebody said insurance companies. Please stand up if you want to say more. Because people died at different ages, insurance companies have to pay different amounts of claims. Like those who died at higher ages, 80, 75, they will have to pay more claim for their deaths. And those who died at earlier ages, they have to pay less claims. So depending upon average expectancy rate and the health, companies will decide their interest rates. Excellent. Excellent. Terrific. Terrific. Yeah. I mean, of course you would have heard of this word pension, right? The pension is for a lifetime. So if a person doesn't live long, then the government has to give less pension. Government or insurance company. Even in India, insurance is getting completely deregulated. So the insurance companies are interested in this. Okay. So look at the mortality table. Okay. Age at death is 0 to 6, number of deaths. So 36 percent of the children died when they were small. Okay. This was the statistics in the early 1600s. Is that okay? So those selling annuities liked it. So this is precisely this. What is an annuity? A one-time lump sum amount is invested. Of course you may do it over your service period. During service you go on contributing some money, and the pension starts the moment you retire. Okay. The annuity is received once a year. It is profitable to the insurance company if the insured does not live long. So as I mentioned earlier, our pension comes along these lines.
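To see why a mortality table like the one above matters to someone selling annuities, it helps to turn deaths per age band into the chance that a person alive at one age reaches a later age. The sketch below does that. Only the 36 percent figure for ages 0 to 6 comes from the lecture; the remaining deaths-per-100 values are the ones commonly reproduced for Graunt's table and should be checked against the textbook. The function names are hypothetical.

```python
# Deaths per 100 births in each age band (band starts at the listed age).
# Only the first value (36) is quoted in the lecture; the rest are assumed
# from the commonly reproduced version of Graunt's table.
band_start_ages = [0, 6, 16, 26, 36, 46, 56, 66, 76]
deaths_per_100 = [36, 24, 15, 9, 6, 4, 3, 2, 1]


def survivors_by_age(ages, deaths):
    """Return {age: number still alive at that age, per 100 births}."""
    alive = sum(deaths)
    table = {}
    for age, died in zip(ages, deaths):
        table[age] = alive
        alive -= died
    return table


def prob_survive(table, from_age, to_age):
    """Chance that someone alive at from_age is still alive at to_age."""
    return table[to_age] / table[from_age]


survivors = survivors_by_age(band_start_ages, deaths_per_100)
p = prob_survive(survivors, 26, 66)
print(f"Alive at 26: {survivors[26]} per 100, alive at 66: {survivors[66]} per 100")
print(f"Chance of reaching 66, given alive at 26: {p:.2f}")
```

An annuity that starts paying at 66 is only paid to the few who get there, which is the sense in which the seller profits when the insured does not live long.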
And I did a search. Do we have an annuity plan? We indeed have. Given the plan is a profits deferred annuity pension plan. Okay. We do have this. Now Graunt's work inspired Halley. Halley of Halley's comet, the same person. Halley computed the odds that a person of a certain age would live to another age. If somebody is of age 30, what is the probability that he will live until 70, live until 80, live until 40, whatever. He calculated the statistics and he convinced the people. So this is an important thing. He convinced the insurers that the premium should depend on the age of the insured. Okay. Especially if you are going to, you know, pay for a long time, you better know. So this was the first time such a thing started happening. So because of all these things, data collection became popular. It is a profitable thing for people to know about this. I showed London's thing. So that started in the late 1500s. Then Paris also started collecting bills of mortality in 1667. And by 1730, most of Europe recorded the age at the time of death. By the way, this was not there earlier. For example, see here, they only had the number of deaths recorded. They did not record the age. So that is the reason why this statement is made. By 1730, most of Europe recorded the age at the time of death because it would help do things like this. That data could be used for many useful things. Okay. Until the 18th century, statistics meant the descriptive science of states. That is statistics. In the 19th century, it became identified with numbers. It was seen as the numerical science of society, at least in Britain and France. The reason is that lots of data were available at that time.
That is the reason why the definition itself got changed. So I am coming back to the reference I made earlier. Until the 19th century, the entire population was used. They used to count everybody, not a sample. Okay. Now, Francis Galton analyzed hereditary genius through regression and correlation analysis and he started a research center, and Karl Pearson started inferring from data. So he started inferring from data. This is around the time when sampling became popular. Before that, they would count everybody. Everybody. It was not even known that you can get away with a small sample. Okay. In fact, I was discussing this with my colleague, Professor Sahana Murthy, and she was asking, is it possible to do sampling to know the attendance of people? Okay. Instead of taking attendance from everybody, can I take a small sample and do that? What do you think? No. Why not? Why not? Why not? Remember, the most important necessary condition to be fulfilled for sampling to work is that it has to be random. So I cannot take one group here and say this is a random sample. By the way, there are many different ways of sampling. One is to put the population into different clusters, randomly put them in different clusters, and take a whole cluster or randomly pick. So there are many different ways of doing it. I mentioned sampling itself is a tool. So in this case, if you want to, you can say things like, you know, what is the average weight, or what is the height? What is the eye color? I mean, things like that you can possibly come up with. But if you want to say something about the attendance of every person, you can possibly say what the general attendance is. Okay. How many people normally sit here and how many people are sitting here today? You know, things like that you can do. But you cannot come up with each person's attendance. At least I don't know.
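The two ways of sampling just mentioned, picking people at random and randomly forming clusters and then taking a whole cluster, can be sketched in a few lines. Everything here is illustrative: the population of 500 heights is simulated, and the function names are hypothetical, not part of any course material.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: heights in cm of 500 students (simulated data).
population = [rng.gauss(165, 8) for _ in range(500)]


def simple_random_sample(items, k, rng):
    """Pick k distinct items so that every item is equally likely to be chosen."""
    return rng.sample(items, k)


def cluster_sample(items, n_clusters, rng):
    """Randomly assign items to clusters, then return one whole cluster."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    clusters = [shuffled[i::n_clusters] for i in range(n_clusters)]
    return rng.choice(clusters)


def mean(values):
    return sum(values) / len(values)


srs = simple_random_sample(population, 50, rng)
cluster = cluster_sample(population, 10, rng)

print(f"Population mean height:    {mean(population):.1f} cm")
print(f"Simple random sample mean: {mean(srs):.1f} cm (n={len(srs)})")
print(f"Cluster sample mean:       {mean(cluster):.1f} cm (n={len(cluster)})")
```

Both samples estimate an aggregate such as the average height from 50 people instead of 500. Neither can say whether a particular person is present, which is why sampling does not replace taking individual attendance.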
It will be nice if we can answer this in a more positive way. Anyway, so this is the first time that they did sampling. Okay, before that they were counting everybody. And more interestingly, this person by the name of Gosset, who was Pearson's student, developed the famous t-test. You will study all of this in this class. He used to publish under the name Student because he was doing this in his spare time. This Gosset was a chemist. He was working for the Guinness Brewery. In his spare time, maybe in the evenings, maybe in the company itself, they thought that he was measuring something, but he was actually calculating some statistics. But he didn't want his company to know that he was doing statistics. He didn't want his company to know that he was doing research. They'll say, why are you doing research? Okay, so he was publishing under the pseudonym Student. Okay, there are various definitions of statistics. It's given in the book. So this book calls it the art of learning from data. Okay, now the last two slides, I have about two slides. I want to just go through a plan to cover my portion. That is the first six hours. Out of that, the first hour is already over. I'll have five more hours. I plan to go through case studies. One of them is on gambling, which I talked about. It's a real story, as I mentioned. The people who worked on this, who applied this, figured out that it is something called the hypergeometric distribution. So using that, they actually calculated some numbers and it helped them enormously, as we will see. That is one. The other case study that I want to talk about is the Russian election. Recently held, I think it was held in 2006 or something. And through some statistics, people found that there was massive rigging. Okay, but people didn't care about it because even before that they had said that they didn't believe that elections in Russia would be free and fair.
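For reference, the hypergeometric distribution named above for the gambling case study gives the chance of seeing exactly k marked items when drawing without replacement. The lecture does not describe the game itself, so the card example below is a stand-in chosen only to show the formula; the function name and the numbers are assumptions.

```python
from math import comb


def hypergeom_pmf(k, population_size, marked, draws):
    """P(exactly k marked items in `draws` draws without replacement)."""
    unmarked = population_size - marked
    # k must leave a feasible number of unmarked items in the draw.
    if k < max(0, draws - unmarked) or k > min(draws, marked):
        return 0.0
    return comb(marked, k) * comb(unmarked, draws - k) / comb(population_size, draws)


# Stand-in example: 52 cards, 4 aces, a hand of 5 cards.
deck, aces, hand = 52, 4, 5
for k in range(0, 5):
    print(f"P({k} aces in {hand} cards) = {hypergeom_pmf(k, deck, aces, hand):.4f}")
```

The probabilities over all feasible k add up to 1, which is a quick sanity check when adapting the function to a different game.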
Okay, so nobody bothered about it. It was rigged. Okay, the statisticians actually proved that. That is another example I would like to cover. There is another one on sampling. It's a very interesting thing. It happened, I think, not more than two years ago. Great Britain wanted to find out the number of foreigners employed. So this episode talks about how the counting was done in a ferry, or in ferries, to find out about people who come to London or England by ferry. It talks about how somebody in a blue jacket would stand somewhere. And then you keep on punching one, two, three, and at every tenth person you go and say, okay, tell me who you are, and so on and so forth. Okay, so things like that. So there are some case studies I have in mind. And then we will conduct a survey. As soon as Moodle is up and running, it will come up. I would want you to participate in that. And then hopefully we'll be able to ask some questions that are relevant to whatever is happening in the hostel. Things like accommodation, tum tum, bathrooms, and so on and so forth. It is to collect data. There are lots of people here. So we can collect some useful, interesting data that can be used in the rest of the course as well. We can actually ask questions like, is this correlated with something else? Can we draw a scatter plot? Can we conclude something from the data that you have given? Are two things completely uncorrelated or is there some correlation? So we should be able to try out some of these things. You yourselves may be interested in some of those. And then I talked about this collaborative wiki that you will do on sampling, census, case studies, history, geography and so on. There are lots of things. So we'll train you in how to use the wiki on Moodle. How many of you have tried the wiki on Moodle? Anyone? Does anyone have your own wiki? If not in Moodle, have you contributed to Wikipedia? Anyone? Yeah, I see some hands. Anybody else? So in this course, you will actually do that.
Adding your wiki is extremely easy. We will of course play a lot more hangman. We will play crossword games. I plan to give crossword games also in the tutorial and maybe in the quiz also. How many of you would be interested in this role play? Yeah, I see some hands. How many people know what role play is? How many people don't know? It is to play the role here. Maybe a group of 10 people, 15 people, 20 people, take a story and perform it. And then I will of course invite proposals from people who are interested. We can take up any topic, let us say in the book or in a case study, that is relevant to this course and then play it. It is a small act. So I am thinking of something like maybe 3 or 4 groups can actually come and perform. Do this for 10 minutes, spend 5 minutes in summarizing what is learnt or what is explained through that role play. Is that okay? How many people would be interested in this? I see more hands now. Excellent. Terrific. Thanks. So in the next class, I will give more details of course administration. I have not told you all the details. In fact, we can have the remaining time for answering questions that you may have. More details of course administration, information on the survey that we are planning to do, a detailed course outline. So we have come to the end of this class. Thank you for your patience. See you in the next class.