Good afternoon and welcome back. The idea of this entire module is to show you how to represent your scientific data. You will get data whether you are working in science, engineering or any other field, and most people are often confused about what type of plot to use, what type of error bars to use, what the best way to represent the data is, and how to avoid misleading figures and graphs. So, in these four modules I will try to give you some idea about that. Let us first start with the types of variables and data, because this is very important for understanding everything that follows. So, what are variables? Variables are quantities whose values vary from one measurement to the other. For example, if I ask about the gender and age of the people in this class, gender will be a variable because it will be different for different people, and age will also be different for different people. Variables can be of two types: qualitative or quantitative. You call a variable qualitative if you cannot associate numbers with it, and quantitative if you can associate numbers with it. So, is this clear? Now, what is data? Data are nothing but the values of the qualitative or quantitative variables that belong to one set of measurements. Your data, again, can be of two types: qualitative data, where you cannot assign any numbers, or quantitative data. So, let us first talk about the quantitative variables. Just to remind you, quantitative variables are the variables with which you can associate numbers. A quantitative variable can be a continuous variable or a discrete variable.
For example, if I ask you to count the number of petals on a flower, you are going to get whole numbers: 1, 2, 3, 4, 5, 6 and so on. But if I ask you to measure the height of all the students in this class, you are going to get non-integer values, so the data is going to vary continuously. So, if the data can take decimal values and not only whole numbers, it is continuous data; otherwise, it is discrete (integer) data. Continuous variables are usually measurements. For example, if you measure height, it will not be exactly 1, 2, 3 or 4; the same goes for the weight or length of different people. Discrete variables are usually counts, for example the number of petals on a flower or the number of fish in a pond. All of these are going to be integer numbers. And what are qualitative variables? You get qualitative data whenever your observations fall into separate categories. For example, you ask about the color of the eyes of the people in this class. Different people will have different eye colors: someone will have brown, someone black, someone green and so on. So, in this case the observations fall into different categories. Take exam results: they are pass or fail. Take socioeconomic status: you can divide it roughly into three classes, lower, middle and upper. Now, as I told you, quantitative data can be of two types, integer or continuous. Similarly, qualitative data can be of two types: nominal or ordinal. You call qualitative data nominal if you can only associate names with the categories. However, if you can put the categories in a certain order, if you see a natural ordering in your data, then you call the data ordinal.
So, to summarize: your variables can be of two types, qualitative and quantitative. Quantitative data can be integer data or continuous data. Similarly, qualitative data, to which you cannot assign numbers, can be either nominal or ordinal. It is ordinal if there is a natural ordering and nominal if there is no natural ordering.
So, let us do some brief exercises. This is your first exercise, and here you have to fill in the blanks. Based on the lecture, can we take two or three minutes and try to finish this exercise? What is the first one? The first one will be qualitative, because you cannot assign numbers. The second one will be quantitative. Continuous variables are usually... No, continuous variables are usually measurements; an example is height. Discrete variables are usually counts. Yes, they are counts. When observations fall into separate, distinct categories, they give rise to... They give rise to qualitative data. That is right. The last one: ordinal data has a... There is a natural ordering. So, if you find a natural ordering in your data, then you can call it ordinal.
Second exercise. Here it is slightly different: a type of variable is described to you, and you have to find out which two of these options apply to that set of data. Let us talk about the color of eyes. Which two options are correct for the first one? Qualitative and... Nominal. Nominal, right. Second one: quantitative, because you can associate numbers, and continuous, because you are not going to get exact whole numbers. Let us move ahead. Third one: it is quantitative and integer. Very good. What about the fourth one? Qualitative and ordinal, because you can put the categories in a certain natural order. Well, then there are some home assignments that you will have to do.
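As a compact reminder of the classification used in these exercises, here is a minimal sketch that tags a few of the examples from the lecture with their kind and subtype. The dictionary name `VARIABLE_TYPES` and the helper `describe` are illustrative names, not something shown in class.

```python
# Illustrative lookup table (names are assumptions made for this sketch).
# Each example from the lecture is tagged as (kind, subtype).
VARIABLE_TYPES = {
    "eye color": ("qualitative", "nominal"),              # names only, no order
    "socioeconomic status": ("qualitative", "ordinal"),   # lower < middle < upper
    "number of petals": ("quantitative", "discrete"),     # a count
    "height": ("quantitative", "continuous"),             # a measurement
}


def describe(name):
    """Return a one-line description of the variable called `name`."""
    kind, subtype = VARIABLE_TYPES[name]
    return f"{name}: {kind}, {subtype}"


for variable in VARIABLE_TYPES:
    print(describe(variable))
```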
We can skip this for now due to the paucity of time. In the homework you have similar questions that you will have to complete, and based on the lectures you can complete them. So far we have learnt about the different types of variables and data. Now we will try to learn how to use different types of plots to present your data, how to make good figures and how to avoid misleading figures. So, let us start. When you are doing a measurement, you can get continuous data or you can get discrete data. First let us try to understand how we can present discrete data. A very simple and effective way of describing discrete data is counting how many observations fall into the different categories. Let us say there is category A, category B, category C and so on. You find out how many of them belong to category A, how many to category B and how many to category C. The number you find for each of these classes is called the frequency of that class, and the overall thing is called a frequency distribution. You can also calculate the relative frequency; we will see what that is in a moment. Once you have found the frequencies, you can plot the frequency distribution. For example, let us say you have this data on the number of people in different disciplines in a college: physics 70, chemistry 85, mathematics 30 and so on. So, you have counted how many observations fall into the different categories. If you then have to find the relative frequencies, what do you do? You take the total of all of these frequencies and divide each frequency by that total. Sometimes people also like to show the percentage for each category; in that case, you multiply the relative frequency by 100.
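The arithmetic just described can be sketched in a few lines. Only the three counts quoted in the lecture are used here; the remaining disciplines ("and so on") were not given, so the totals below are for these three alone and are not the totals on the slide.

```python
# Counts quoted in the lecture; the other disciplines were not stated,
# so this sketch works with these three categories only.
frequency = {"physics": 70, "chemistry": 85, "mathematics": 30}

# Total number of observations across all categories.
total = sum(frequency.values())

# Relative frequency = frequency / total; percentage = relative frequency * 100.
relative_frequency = {name: count / total for name, count in frequency.items()}
percentage = {name: 100 * rel for name, rel in relative_frequency.items()}

for name in frequency:
    print(f"{name:12s} {frequency[name]:4d} "
          f"{relative_frequency[name]:.3f} {percentage[name]:5.1f}%")
```

The relative frequencies always add up to 1 and the percentages to 100, which is a quick check that nothing was miscounted.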
Normally, if you have to plot the frequency distribution, you want to show all these counts. In that case, you use something called a bar chart, and this is how a bar chart looks. On the horizontal axis you have the different categories: physics, chemistry, mathematics, biology and so on. On the vertical axis you have the frequency. Sometimes people also show relative frequency or percentage, but that is not a very good thing to show on a bar chart; for that you use a different kind of chart, which we will come to. The important thing is that there should be gaps between the bars, because all of them are different categories, and you can put the bars in any order. There is an alternative way of displaying this data called a Pareto chart, and it is homework for you to find out what it is. Now, I told you that once you have found the frequencies, you can show the frequency distribution using a bar chart. However, if you want to show the relative frequencies or percentages, it is a good idea to use something called a pie chart. This is how a pie chart looks. Your total number of observations corresponds to the whole disc, and each slice represents a proportion of the total. It is the same data, but if you have to show the relative frequency or percentage for each category, you use a pie chart. So: a bar chart if you want to show the frequency distribution, and a pie chart if you want to show the relative frequency or percentage. Normally, these are used to describe just one set of data. But often when you are doing science or engineering, there is a need to compare two sets of data. You would like to see whether there is a correlation between the two sets of observations that you have made.
In those cases, a different kind of plot is usually used. For example, you want to test the hypothesis that the number of caterpillars on an oak leaf is related to the size of the leaf, for a sample of 100 leaves. Obviously, in this case there are two different types of measurement: one is the size of the oak leaves and the second is the number of caterpillars. So, there are two different sets of observations, and we would like to see if there is any correlation between them. The second example: is the number of sparrows in a particular village determined by the number of houses in the village? Again there are two sets of measurements, number of houses and number of sparrows, and we would like to see if there is any correlation between them. The third one: is the length of the upper arm bone related to the length of the upper leg bone in a group of 100 students? So, obviously, there is a need to see if there is any correlation in this data. Whenever you want to study this kind of relationship, you make something called a scatter plot. Before you make a scatter plot, it is a good idea to think about what your independent variable is and what your dependent variable is. Anyone? The definition of independent and dependent variable? Yes: take y = x + 1. Here y is the dependent variable and x is the independent variable, because the value of x determines the value of y; it is not the value of y which determines the value of x. So, first thing: whenever you have two sets of measurements, find out which is your independent variable and which is your dependent variable. Once you have identified them, you plot the independent variable on the x axis and the dependent variable on the y axis. For example, this is a scatter plot where on the x axis we have the altitude of a place and on the y axis we have the temperature of the place.
Now, you can clearly see that as the altitude increases, the temperature decreases, and the relationship is more or less linear. In this case, altitude is the independent variable and temperature is the dependent variable, because it is the altitude of the place which is going to determine the temperature. Temperature cannot determine how high the place will be. So, let us go back and think about the dependent and independent variables in those three examples. In the first example, which is the dependent variable and which is the independent variable? Is the number of caterpillars a dependent or an independent variable? The number of caterpillars is the dependent variable and the size of the oak leaf is the independent variable, because it is the size of the oak leaf which will determine how many caterpillars can live on it. The number of caterpillars will not decide the size of the leaf. What about the second example: which is dependent and which is independent? The number of houses is the independent variable and the number of sparrows is the dependent variable, because it is the number of houses that will determine how many sparrows can live there. It is not the sparrows which make the houses. The third one is interesting. No, I am talking about the third one. Yes, in this case it is not obvious which is the dependent variable and which is the independent variable. Whenever you have such a situation, you can treat either one of the variables as the independent variable, and you then have to take the other one as the dependent variable. So, let us consider another scatter plot, where somebody has tried to see the relationship between the number of fish in a pond and the pond size. Here the person made two measurements: one was the area of the pond and the other was the number of fish.
And you can see that this relationship is more or less linear: as the area of the pond increases, the number of fish increases. But there is one point which is off this trend, and such points are called outliers. Whenever you make a scatter plot and you get an outlier, you should think about why you got it, and whether it is due to an error. So, in this case, what you have to do is go and study that point in detail. Now, the question is why one can get this kind of outlier, one data point which is far above the others. In this case two measurements were made: one was the number of fish and the second was the area of the pond. Obviously you used some technique to determine the area of the pond. It is possible that you underestimated the area of the pond, which is why the point sits so high, or that you somehow overestimated the number of fish. Is there any third reason for getting this data point wrong? Just think about what the other possibility could be. In this case I am measuring the area of the pond. It is possible that the area measurement was right, but in one of the ponds, like this one, the level of the water was somewhat higher, and that resulted in the larger number because the pond could support more fish. So, this could be one of the reasons. You have to go and study that point in detail, and obviously, if I have got that hint, I would like to correct my measurement. Instead of measuring the area of the pond, what will I measure? I will measure the volume of the water in that pond. So, sometimes outliers can also help you to correct errors and guide you towards the correct measurement. So far we have seen how to present discrete data; now let us talk about continuous data. This is an example of continuous data: the body weight of about 100 individuals in a class.
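Going back to the pond scatter plot for a moment: one simple way to locate the point that sits farthest from a roughly linear trend is to fit a straight line and look at the largest residual. The sketch below does this with an ordinary least-squares fit. The pond areas and fish counts are made-up numbers for illustration, not the data on the slide, and the function names are assumptions. Finding the farthest point is not a formal outlier test; it only tells you which point to go back and study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def farthest_from_trend(xs, ys):
    """Return (index, residual) of the point farthest from the fitted line."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    index = max(range(len(xs)), key=lambda i: abs(residuals[i]))
    return index, residuals[index]


# Hypothetical measurements: pond area (independent) and fish count (dependent).
pond_area = [10, 20, 30, 40, 50, 60]
fish_count = [21, 39, 62, 140, 101, 119]  # the fourth pond is far above the trend

index, residual = farthest_from_trend(pond_area, fish_count)
print(f"pond {index} (area {pond_area[index]}) is {residual:.1f} fish off the line")
```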
So, let us say you have measured the body weight of about 100 individuals and you want to present this data to your supervisor. Obviously, you will not give him the data like this; you would like to show it in some other form, because just by looking at this list of numbers I cannot make out anything. Whenever you have continuous data you have to think about two things. First, what is the range of the data, that is, what is the minimum value and what is the maximum value? You find the range, then you divide this range into a number of equal-width bins, and then you find out how many observations fall into each bin. For example, let us say my data goes from 60 to 120. I will divide this entire range, 60 to 120, into different bins: 60 to 80, 80 to 100 and 100 to 120. Then I will count how many observations fall into 60 to 80, 80 to 100 and 100 to 120, and once I have done this I can represent the entire thing as something called a histogram. So, a chart which shows the frequency distribution of a continuous variable is called a histogram. Here, on the axis, you generally have the ranges of the bins: 64 to 66, 67 to 69 and so on. These are called the bins, and you count how many observations there are in each one. Sometimes, instead of showing the start and the end of each bin, people show the bin centers. So, if you have discrete data you can use a bar chart or a pie chart, and if you have continuous data you have to use a histogram. Based on the lecture, let us try to do this quick assessment. The first one is bar chart. Second one? No. Whenever you have to show relative frequency or percentage, you use a pie chart. Third one? No. You have to use a scatter plot. D: a scatter plot can help you find the outliers.
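The binning step behind a histogram, using the 60 to 120 range and the three 20-wide bins from the example earlier in this segment, can be sketched as follows. The list of weights is invented for illustration, and putting a value that sits exactly on a bin edge into the higher bin (with the maximum going into the last bin) is one common convention, assumed here rather than taken from the lecture.

```python
def bin_counts(values, start, stop, width):
    """Count how many values fall into equal-width bins covering [start, stop].

    A value on an interior edge goes into the higher bin; a value equal to
    `stop` is counted in the last bin. Values outside the range are skipped.
    """
    n_bins = int((stop - start) / width)
    counts = [0] * n_bins
    for value in values:
        if not (start <= value <= stop):
            continue
        index = min(int((value - start) // width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical body weights, in the same units as the 60 to 120 example.
weights = [61.5, 72.0, 79.9, 80.0, 85.3, 99.1, 100.0, 104.2, 118.7, 120.0]

counts = bin_counts(weights, 60, 120, 20)
for i, count in enumerate(counts):
    low = 60 + 20 * i
    print(f"{low} to {low + 20}: {count}")
```

These counts are the bar heights of the histogram; unlike a bar chart, the bars touch because the bins share their edges.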
E: histogram. F: the independent variable is plotted on the x axis, or horizontal axis, and the dependent variable is plotted on the y axis. In the second assessment, a data set is described to you and you have to find out which is the most appropriate type of plot to use. If you have to show the body weight of 500 students in a class, which one will you use? You will use a histogram. Second one? Bar chart. Third one? No. No. You will use a scatter plot, because you want to see the relationship between two sets of data. Fourth one? Scatter plot. Fifth one? How many of you think that it should be a bar chart? I am talking about numbers, not frequencies, so in this case the bar chart will probably be the most appropriate one. Sixth one? Pie chart, because you have to show the proportion. Seventh. What about the eighth one? No. No. Histogram. Bar chart. Bar chart. This one is slightly tricky; it depends on how the measurement was done, but let us try to see. In this case there could be two possible answers. No, not a scatter plot. One possible answer is a bar chart, if your numbers come out as whole numbers; otherwise it can be a histogram. Tenth? Pie chart, yes. So, here I think all of us can do this exercise very quickly. Just keep in mind that the total has to correspond to 360 degrees, which is one full circle. So, in this case, can you please make a pie chart: 1800 for blood group A, 900 for blood group B, 450 for AB and 450 for O. Yes, you have to make a pie chart, and it has to add up to 360 degrees. The total count here is 3600, and that has to correspond to 360 degrees, right? A circle has 360 degrees. Has anyone done it? Yes, sir. How many of you agree that this is the correct pie chart, and how many of you think that it is not? I would also like to see the wrong answers, or the right answers; we do not know yet. Whoever thinks that this is not the correct one is welcome to come ahead. Is this the right one?
Yes, because if you look at the total number of counts, it is 3600, and we know that the total has to represent one full circle of 360 degrees. Therefore this one (1800) corresponds to 180 degrees, this one (900) has to correspond to 90 degrees, and these two (450 each) have to be 45 degrees each. So the slices show the correct proportion for each of these numbers. If the numbers are somewhat complicated you have to use some software. Otherwise you can do the same exercise by hand: you calculate the percentages and use something which can draw the angles. In the second example, what is given to you is data from a company, where the first column has the years of experience of the employee and the second one has his salary. In this case, obviously, there are two sets of data, and I would like to see if there is any relationship between them. So, can you please make the appropriate plot to represent this data? Years of experience and salary: 1, 30,000; 2, 31,000; 3, 32,000; 4, 33,000; 5, 34,000; 6, 35,000; 7, 41,000; 8, 47,000; 9, 54,000; 10, 61,000; 11, 68,000; 12, 75,000; 13, 82,000; 14, 89,000; and 15, 95,000. Let me repeat. In this case we have obtained two types of data from the company. One is the number of years of experience of the employee, and the second set of data is the salary. So, what is the correct type of plot to make here? I want to see if there is any relationship between salary and number of years of experience. A scatter plot. So, think about what the independent variable is and what the dependent variable is. Okay, so we will move ahead now.
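For the blood-group pie chart discussed at the start of this segment, the slice angles follow directly from the counts: each angle is the count divided by the total, times 360 degrees. A minimal sketch of that calculation, using the counts from the exercise (the variable names are illustrative):

```python
# Counts from the blood-group exercise.
counts = {"A": 1800, "B": 900, "AB": 450, "O": 450}

total = sum(counts.values())  # the full circle corresponds to this total

# Angle of each slice in degrees: (count / total) * 360.
angles = {group: 360 * count / total for group, count in counts.items()}

for group, angle in angles.items():
    print(f"blood group {group}: {angle:.0f} degrees")
```

The angles add up to 360, matching the 180, 90, 45 and 45 degrees worked out in class.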
So, in this case the independent variable will be the number of years of experience, because the number of years of experience is going to decide the salary. Salary will not decide the number of years of experience. So, on this axis it has to be the number of years of experience, and on this axis there has to be the salary. If you make this plot, it will come out something like this. Here the person is joining the company and here the person is retiring. The second question was: assume that the employee joins the company at the age of 30 and retires at the age of 50. What do you conclude from the nature of the plot you have made? Yes, please. So, obviously, if you make this plot you can find out what the relationship was between the number of years of experience in the company and the salary. You can very easily figure out that if I join this company, in the next five years or so my salary is not going to increase much. Then there is going to be a rapid growth period where my salary will increase a lot. And then again, in the period when I am about to retire, my salary is not going to increase much. So, depending on that, you can make your decision. In the home assignment, what will be given to you is again a set of data, and again you have to show this data using the appropriate plot. I think we are at the end of this module. Let's go to the next one. What I am going to discuss now is the anatomy of figures and tables. This is very important, and I have collected some plots here, most of which have not been drawn the way you should ideally draw them to communicate your data better. That is the reason I am conducting this session. So, let's see. The first thing we are going to look at is how to make good figures.
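To see the slow start and the rapid-growth period of the salary plot in numbers, one can compute the raise from each year to the next. The sketch below uses only the fifteen values dictated in class; the flat period near retirement that the lecturer points to on the plot lies beyond those fifteen years, so it does not appear here.

```python
# Salaries dictated in the lecture for 1 to 15 years of experience.
salary = [30000, 31000, 32000, 33000, 34000, 35000, 41000, 47000,
          54000, 61000, 68000, 75000, 82000, 89000, 95000]
years = list(range(1, len(salary) + 1))  # independent variable (x axis)

# Raise from each year to the next: salary[i + 1] - salary[i].
raises = [later - earlier for earlier, later in zip(salary, salary[1:])]

for year, increase in zip(years[1:], raises):
    print(f"year {year:2d}: raise of {increase}")
```

The first five raises are 1,000 each, after which they jump to 6,000 or 7,000, which is the change of slope visible on the scatter plot.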
Before we look at a good figure we should first look at a bad figure. So, this is a bad figure. Why do you think that it is a bad figure? It is not telling you which data is which; in other words, there are no legends. What is the other problem? Scale. The scale has not been chosen appropriately, so the figure is not utilizing the entire area which is available to you. Whenever you are sending your paper for publication, you have to pay page charges, and if you put in a figure like this, you are not utilizing the entire area which is available to you. Then there has to be an appropriate number of tick marks. Now, the same data has been plotted here on the right-hand side. The important thing to note is that you don't always have to start your origin at (0, 0). You can always choose an appropriate origin, which need not be (0, 0). Then you put in enough major tick marks, so that you can say that at this point this is the value and at that point that is the value, and in between the major ticks you can also put an appropriate number of minor tick marks. And you always have to provide the legends: what is the first curve, what is the second curve and so on. So, ideally, this is how you should be making a figure. You have to take an appropriate origin, which need not be (0, 0). You have to put in the y axis with a proper label and unit, and similarly a label and unit on the x axis. You have to put in all the figure legends, and you have to show one for each of the data sets. You have to put in all the major tick marks, and then you can put in an appropriate number of minor tick marks. The idea is that you should not put too much clutter on the axes, because then it becomes very hard to see what is there.
Now, the other thing is: please do not use figures unnecessarily. What happens is that people have done the measurements, and then they put in every type of figure they have generated. If your result is obvious, and it is something which you can explain using one or two lines of text, for example "I measured the dependence of this quantity on temperature and found it to be linear", you can just say that in the text. You do not have to put in too many figures. Why? Because most journals have a page limit, and if you put in too many figures, they will ask you to cut the length of your paper. The other thing is that whenever you put figures in a journal, you have to pay the page charges, which is going to cost you. So always try to avoid a figure if you can replace your result with one or two lines of text. This is another type of figure, where we have shown histograms, and here you will also see that there are error bars on each of the data points. This is an example of another figure. In this case again there are proper axes, labels, everything. Especially when you are putting in error bars, it is important to specify what type of error bar you are using, because error bars can be of different types; you are going to learn about them in the next module, later today. So you always say what kind of error bar you are using, and it has to be stated in the figure caption. And this is how your table should look. You have to have a column title, and then you have to draw the lines which separate the column titles from the rest of the data. You put all the data in the table body, and if there is some data point which requires further explanation, you can put that in the footnotes. And the captions for the table have to be at the end of the table.
Normally, this is how they are done. The other important thing is this: suppose you have some data, and you can convey the same information to your readers using a figure; then please use the figure instead of a table. Why? Because tables are just a bunch of numbers, and people don't like to look at a bunch of numbers. People always like to get a visual impression of the thing. So if you can replace your data with a figure and convey the same information to your readers, please do that. That is one good thing. The other bad thing about tables is that they take more space than a figure, so it is always a good idea to avoid them in order to reduce the page charges. Yes. And now come the misleading graphs. Before we go to the misleading graphs, let me tell you about a few graphs which I have collected. Can all of you see this? So, this is a graph paper. Normally, if you are making a graph, please don't write anything here in the margin. You have to draw one axis here, and then you have to draw the other axis somewhere here. Then, somewhere in a box, you say that on the x axis 1 centimeter is equal to this much and on the y axis 1 centimeter is equal to this much. Then you put in your axis labels with proper units, everything, and at the top of the graph you also give the title of the graph, saying what your graph is showing. That is always good practice. Always encircle your data points, and if required, if you can join them, join them with a curve. So, let me tell you about the misleading graphs, and how people can make misleading graphs. The idea is not to tell you how to cheat. The idea is to tell you how to avoid cheating.
So, in this case it is a pie chart where item C has a proportion of 5 percent, A has a proportion of 11 percent, D has a proportion of 42 percent, and B has a proportion of 42 percent. The same plot has been made in Excel using a 3D perspective. That means the object has been made three-dimensional and you are looking at it from a different angle. And see what is happening here. It appears that item C, which was about half of the 11 percent of item A, almost looks the same as item A. So, using a 3D perspective can sometimes lead to a misleading graph. In this case, if somebody just glances at the figure and doesn't look at the numbers, he will get the impression that the amount of item C is the same as item A. What is the other kind of misleading graph? This is the original data. The axis goes from 0 to, I think, 12000 or so. This is a bar chart where the data for different categories have been plotted. If you use the original scale, you don't see a major difference between the categories. But somebody can truncate this axis, which is very easy to do in most plotting software. If somebody truncates it at, let's say, 9000, so that the axis starts from 9000 onwards, the same data would look like this, and in this case you see that there appears to be a significant difference. So, whenever you see a histogram, be careful whether the start of the scale has been shown or whether it is truncated. If it is truncated, the data has probably been manipulated, and it is probably giving you the wrong impression. This one is the correct one, because you are showing all the data, and if you look at the relative importance of each of the categories, it is more or less the same.
But in this case everything has been blown up because of the truncation of the axis. So the differences which appeared to be very minor here appear to be major ones here. The other way people can mislead you is by changing the aspect ratio of the plot, that is, the ratio of the dimensions of the graph. For example, this is the original graph. What one can do is scale the graph: reduce its length by a factor of two and increase its height by a factor of two. In this case the curve which looked like this, where the increase appeared to be slow, can appear to be very rapidly increasing. Similarly, look at the other one. Again somebody has scaled it. What has he done? He has made this axis twice as long and he has shortened this axis. In this case, whatever the rate of increase was, it will appear to be slower. So sometimes people can mislead you by scaling the axes; just be careful while you are looking at the data. So, let's try to complete this online assessment quickly. The first one: figures should not be used unnecessarily, as they take space. Next: what usually takes more space than figures? Tables take more space than figures. Changing the ratio of graph dimensions can lead to? Yes. And then D: truncating the axis of a bar chart can give you misleading information. Now the second test. No, this is false. You need not choose the origin at (0, 0) all the time. Second one: footnotes should not be used in a table. No, you can use footnotes in a table. If there is some data point which requires explanation, you can use footnotes for that. So this statement is false, yes. Third one: minor ticks should not be numbered. That is true. Fourth one: use tables rather than figures. This is false. Okay, so just read this statement.
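The effect of truncating the value axis of a bar chart can be put into numbers. A bar is drawn from the start of the axis up to its value, so the drawn height is the value minus the axis start. The sketch below compares the apparent ratio of two bars on a full axis and on an axis truncated at 9000, as in the example. The two values are hypothetical, chosen only to sit in the 0 to 12000 range mentioned in the lecture, and `apparent_ratio` is an illustrative helper name.

```python
def apparent_ratio(value_a, value_b, axis_start):
    """Ratio of the drawn bar heights when the value axis starts at axis_start."""
    return (value_a - axis_start) / (value_b - axis_start)


# Hypothetical values for two categories (not the data on the slide).
low, high = 10000, 11000

full_axis = apparent_ratio(high, low, 0)     # axis starts at 0
truncated = apparent_ratio(high, low, 9000)  # axis truncated at 9000

print(f"axis from 0:    taller bar looks {full_axis:.1f} times the shorter one")
print(f"axis from 9000: taller bar looks {truncated:.1f} times the shorter one")
```

With these numbers a real difference of 10 percent is drawn as a bar twice as tall, which is the "blown up" impression described above.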
So the experiment described to you is that you have a charged capacitor and you have connected a resistance to it. I hope most of the people here know what a capacitor is. When you connect a capacitor through a resistance, it discharges, right? And then the charge on the capacitor decays with time. So you had a certain resistance, you allowed the capacitor to discharge, and you looked at how the charge was decreasing with time. Then what did you do? In the second experiment you started with the same capacitor at the same charge level and again allowed it to discharge, but through a different resistance. So the first time you are doing it with resistance R1, and the second time with resistance, let us say, R2. It is the same experiment with the resistance changed. You have to answer this one. And what are you recording? You are recording the value of the charge Q as a function of time, right? So in this case you will put the time on the x-axis and the charge on the y-axis. And it is discharging, so it looks something like this, decaying from some value. The second time you change the resistance and you get the other plot. So time should be shown on the x-axis, and the charge on the capacitor should be shown on the y-axis because that is the dependent variable. Then what should be used to indicate the values of the resistance? Because from one set of measurements to the other you are changing the value of the resistance, right? So for the first one you will say, this is my R1, and for the second one you will show it like this, this is my R2. What are these called? Legends. You will be using legends. Third one: values between major tick marks are shown using minor tick marks.
So like here, if you have one measurement, then the next measurement you did at five and the next one at ten, you can probably put four minor tick marks in between: one, two, three, four and then five, right? In your home assignment roughly the same kind of data is given to you: somebody has measured the current flowing through a device at different values of the voltage, for a first device and a second device, and you have to use this data to make a plot. So in this case you have to show the two sets of measurements, one for each device, on graph paper. Whenever you are plotting on graph paper, the important thing to note is that you must always draw the axes. Don't use the white space to write all the values. Suppose this is your graph paper area; you do something like this. Then here you say what your x-axis is, with a proper label, units, everything. Here you say what your y-axis is. If you are using graph paper, you state the scale here: x-axis, one centimeter is equal to so many volts or whatever you have; y-axis, one centimeter is equal to so many amperes of current. And here you can also put the title of the graph if you have space, like "Variation of current with voltage in devices A and B". Now let's come to the most interesting part of this lecture, because this is not normally taught to many people, so I think it will be a good idea to learn about it. What I am going to discuss is error bars. One of the reasons I do this is that most researchers and teachers do all the hard work to get the data and they can make the plots, but they are often unsure how to use error bars. So I will try to discuss some basic types of error bars.
I will also discuss how they can help you communicate the data and support the correct interpretation of your data. Now, error bars can be of various types. They can show the range, the standard deviation or a confidence interval. So there are different types of error bars, and it is not the case that you can choose any one of them all the time. You have to use different types of error bars for different interpretations. The important thing to note is that different types of error bars give you altogether different information, and whenever you are showing an error bar you should always say what type of error bar you are using, otherwise it is useless. Error bars, if you use them properly, can do one of two things. They can give you information describing the data, and these are called descriptive error bars. Or they can tell you whether the conclusions or inferences you are deriving from the data are justified, and these are called inferential error bars. So error bars can be of two types: descriptive, which describe the data, and inferential, which help you judge whether the conclusions or inferences drawn from the data are justified. Now what are the descriptive error bars? The descriptive error bars are the range and the standard deviation. What is the range? Suppose you have some data, something like 1, 2, 3, 4, 5, 6, 7, 8 up to 100. The difference between the minimum and the maximum of your data is called the range of the data. So the range basically describes the spread of the data. How spread out is your data?
What is the minimum of the data? What is the maximum of the data? The other kind of descriptive error bar, the standard deviation, describes the typical or average deviation of each of your data points from the mean. The range is easier to calculate; there is also a formula to calculate the standard deviation, and roughly speaking the standard deviation gives you the average difference between the data points and their mean. For example, here the mean, standard deviation and range are given for a measurement where there were 5 data points: 1, 2, 3, 4, 5. I took the mean, and the mean is somewhere in the middle of the data. The range goes from the minimum of this data to the maximum, and the standard deviation shows you the deviation of the data points from the mean. If you increase the number of measurements, which is sometimes called the sample size, your range will become larger but the standard deviation will remain more or less the same, because this is a property of the standard deviation. If you are using the standard deviation, then for roughly normally distributed data about two thirds of the data points lie within the mean plus or minus 1 standard deviation, and about 95 percent of the data lie within the mean plus or minus 2 standard deviations. And whenever we do an experiment, we repeat it many, many times and then we take the mean of the results, right? Why do we do that? To minimize the error. Any other answer? We repeat our measurements many times, and the number of repetitions is the sample size. The whole idea is that you want to get the best approximation for the true value of the quantity. The true value of the quantity is something which we would get only after an infinite number of measurements.
Since we cannot do infinite measurements, we use the mean, which gives us an estimate of the true value of the quantity. So our mean is our best estimate for the unknown true value or true mean. And if you repeat your experiment more and more times, the standard deviation of your experimental results tends very closely to the true standard deviation, which you would get after an infinite number of measurements. The standard deviation of the experimental results will be approximately equal to the true standard deviation whether your n is small or large, and this is why the standard deviation doesn't change much with the sample size. The other kind of error bar is the inferential error bar. The range and the standard deviation are descriptive because they describe the data: what the difference is between the minimum and maximum of the data, and what the typical deviation of each data point from the mean is. But inferential error bars can help you to derive conclusions. For example, in biology and some other fields it is very common to compare two different samples. You have a control experiment or a wild type organism, you do some new experiment, and you want to see how big the difference is. For example, I have wild type mice, I make some genetic changes in other mice, and I want to see how different the results are. Sometimes people know that if they do a certain experiment, which is called the control, they get a definite, known result; then they make some changes and want to see how different the results are. So to make inferences from the data, or to judge whether results are significantly different or whether the differences are due to random fluctuations or chance, we use a different type of error bar, and they are called inferential error bars.
And these inferential error bars are the standard error, which people sometimes also call the standard error of the mean, and the confidence interval. So the most commonly used inferential error bars are the standard error (or standard error of the mean) and the confidence interval, and there can be different types of confidence interval. There can be a 95% confidence interval, there can be a 90% confidence interval. We will see what that means. If you plot the mean of the data with a standard error or confidence interval error bar, it gives an indication of the region where you expect the mean of the whole set of possible results to lie. That means you have done your measurement and you have found some mean. The mean is your best estimate of the true value of the quantity, and if you plot the standard error or confidence interval with the mean, you are telling your readers: look, whatever mean I have calculated, it is only going to vary within this region. So the plausible values of the true value are also going to be in this region. The mean can be only in this region; it is not going to be here. So the mean with a confidence interval or standard error indicates the region where your mean is going to lie, or where the true value can lie. Here again is a plot which shows how the confidence interval and standard error change when you increase the number of samples. In this case the measurement was done for three, rather five, samples. I think this is wrong. So for five samples the person has calculated the standard deviation, then the confidence interval and then the standard error. Now in this case the sample size is larger, for the same kind of experiment. Here you will see that the standard deviation is more or less the same, the confidence interval reduces, and the standard error reduces as well.
The person probably forgot to put the standard error labels here. If you increase your number of samples again, both the standard error and the confidence interval will decrease. So as you repeat your measurements many, many times, your mean gives you a better and better approximation of the true value, and you are reducing the amount of error your mean can have, the standard error of the mean. Basically you are narrowing the region where the mean of the entire set of data will lie. Earlier the mean could fluctuate like this. As you increase the number of samples, the range within which it can vary is going to reduce further. So these are the take-home messages from this module. There are two types of error bars. One is called the descriptive error bar and the other is called the inferential error bar. The range is an example of a descriptive error bar, and it tells you the amount of spread between the extremes of the data. How do you find it? You take the highest data point and the lowest data point and take the difference between them. That gives you the range of the data. The standard deviation is again a descriptive error bar which describes the data. Roughly speaking, it is the typical difference between the data points and their mean. The formula to calculate the standard deviation is the following: you look at the deviation of each point from the mean, x minus m, where m is the mean and x is each data point; you square that, sum over all the data points, divide by n minus 1, and then take the square root because you have squared everything. So this is the formula to find the standard deviation. One important thing is that if you increase your sample size n, the number of observations, your standard deviation is not going to change much.
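The range and the standard deviation described above can be written out as a short Python sketch. The function names `data_range` and `sample_sd` and the five data values are assumptions chosen for illustration; the standard deviation follows the formula stated in the lecture, with n minus 1 in the denominator.

```python
import math


def data_range(values):
    """Range: difference between the highest and the lowest data point."""
    return max(values) - min(values)


def sample_sd(values):
    """Standard deviation: square root of the summed squared deviations
    from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    m = sum(values) / n  # m is the mean
    squared_deviations = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


# Hypothetical data set with five points, for illustration only.
data = [1.0, 2.0, 3.0, 4.0, 5.0]

print("range:", data_range(data))
print("standard deviation:", round(sample_sd(data), 3))
```

Python's standard library offers the same calculation as `statistics.stdev`, which can be used to check a hand calculation.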
Similarly, the standard error is an inferential error bar. It helps you to draw conclusions from the data, and it is a measure of how variable the mean will be if you repeat your whole study many times. The formula to calculate the standard error is simply the standard deviation divided by the square root of n. Again, the important thing to note here is that if you increase your sample size n, your standard error is going to reduce. By repeating your measurements many, many times, you are reducing this error, right? The confidence interval, most commonly the 95 percent confidence interval, is again an inferential error bar, and it tells you a range of values which you can be 95 percent confident contains the true mean. The formula to calculate it is the mean plus or minus a critical value multiplied by the standard error. This critical value is fixed for a given confidence level and sample size, and the number will be given in your assignment. If you have many data points, it comes out to be roughly 2 for the 95 percent interval. So roughly it is the mean plus or minus 2 times the standard error. Next are the exercises based on this lecture. First one: range and standard deviation are examples of descriptive error bars, because they describe the data. Second one: standard error and confidence intervals are examples of inferential error bars, because they help you to make inferences from the data. Error bars that give information about whether conclusions or inferences are justified: those are again inferential error bars. D is again descriptive, right? About two thirds of the data points lie within the mean plus or minus one standard deviation. F: we can use the mean as the best estimate of the unknown true value. Now the true and false statements. First one: standard error increases with sample size. Is this true? It is false, because n comes in the denominator, right?
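The standard error and confidence interval formulas from this module can be sketched in Python as follows. The function names and the eight data values are assumptions made for illustration. The default critical value of 2.0 is the rough figure the lecture quotes for a 95 percent interval with many data points; for a small sample the appropriate critical value is larger and should be taken from the assignment or a table.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two independent observations")
    return statistics.stdev(values) / math.sqrt(n)


def confidence_interval(values, critical_value=2.0):
    """Mean plus or minus critical_value times the standard error.
    The default 2.0 is only the rough large-sample value for 95 percent."""
    mean = statistics.mean(values)
    half_width = critical_value * standard_error(values)
    return mean - half_width, mean + half_width


# Hypothetical measurements, for illustration only.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]

se = standard_error(data)
low, high = confidence_interval(data)
print(f"mean = {statistics.mean(data):.2f}, SE = {se:.3f}")
print(f"approximate 95% interval: {low:.2f} to {high:.2f}")

# n sits in the denominator: for the same standard deviation,
# quadrupling the sample size halves the standard error.
sd = 1.0
for n in (4, 16, 64):
    print(f"n = {n:2d}: SE = {sd / math.sqrt(n):.3f}")
```

The loop at the end illustrates why the statement "standard error increases with sample size" is false.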
Second one: the standard deviation roughly gives you the average or typical difference between the data points and their mean. Is this true or false? This is true, right? Third one: the mean of the data with confidence interval error bars defines the range of values which are most plausible for the true mean. This is true. D: when comparing results from two groups, say wild type and mutant, to see if they are different, descriptive error bars should be used. This is false. In this case you have to use inferential error bars. In your home assignment you are given a coffee machine, and you know that a coffee machine will not pour exactly the same amount of coffee every time. There are going to be some small differences. Somebody has taken coffee 15 times, and every time there is a difference in the amount he has got. Using this data you have to calculate the mean, the standard deviation, the standard error and the 95 percent confidence interval, and you have to write the mean with the standard error. So this will be the home assignment, and we will also give you the solution for it. If you have any problems with this you can get back to us. So this is the last module and we will try to wrap it up quickly. So far I have introduced you to error bars: error bars can be of different types, descriptive or inferential. Range and standard deviation are descriptive; confidence interval and standard error are inferential. But it is very important to use these error bars properly, and we will try to understand this using some examples. As I told you, error bars can be of different types, and therefore error bars are meaningless and misleading if the figure legend does not state what kind they are.
So whenever you are using error bars, please do write what type of error bar you are using, whether it is the range, standard deviation, standard error or confidence interval. Even confidence intervals can be of different types: 95% confidence interval, 90% confidence interval. So always write in the figure legend what type of error bar you are using. This is the first message. The other important thing to note is that you should be very careful when you are reporting data from replicate measurements and representative experiments. It is always a good idea to show error bars, but whenever you are reporting data from replicate or representative experiments you have to be careful, and we will try to understand what a replicate experiment and a representative experiment mean. One of the things scientists and researchers often do is account for natural variation by repeating their observations on many samples, because that takes account of the individual-to-individual or sample-to-sample variation. This is the way people try to rule out natural variation, to make sure that the different results they are getting are not simply due to the natural variation between samples or individuals. The number of independent samples, individuals, independently conducted experiments or independent observations is called the sample size, or small n. So you should always say how many times you have independently repeated your experiment, and that is called the sample size.
So whenever you have repeated your measurements and you are presenting data from a number of observations, always state your sample size, and be very careful about what sample size you report for a replicate or representative experiment; we will try to understand what that means. We will first try to understand the meaning of replicates. Replicates are nothing but repeated measurements of one individual in a single condition, or multiple measurements of the same or identical samples. So it is a good thing to use error bars, but you have to be very careful when you are reporting data from either a replicate experiment or a representative experiment. For example, suppose you are working in the field of biology and you want to test a hypothesis. All of you know that mutations can change many things. Suppose your boss wants to test the hypothesis that deleting a particular gene in a mouse changes the length of its tail, and he gives this task to you as well as to your lab mate. So both you and your lab mate are asked to do this experiment. What do you do? You choose option one. You take one wild type mouse and you measure its tail length 10 times. Then you take another mouse in which the gene has been mutated, the idea being that the mutation is going to change the length of the tail, and in that mouse you again measure the length of the tail 10 times. So what did you do?
You took one wild type mouse and one mutant mouse, and you measured the tail length of the wild type mouse 10 times and the tail length of the mutant mouse 10 times. Your lab mate does it in a slightly different way, and he takes longer. What does he do? He takes 10 wild type mice, and then he takes another set of 10 mice, mutates the gene in each of them, and measures their tails. So in option one, one wild type mouse, one mutant mouse, and 10 measurements of each tail. In the second option, which your lab mate chose, he took 10 wild type mice and 10 mutant mice in which he mutated the gene, and then measured the tails. Which one is the correct way of doing this experiment? Option two. Because if I choose option one, what can happen? All of us in this class have different heights and different body colors. In the same way, it is possible that the tail length of the other mouse which I chose was different just because of natural animal-to-animal variation, and not because of the mutation of the gene. So option one is not going to address the primary question, whether the deletion of the gene affects the tail length of the mice. If you want to be really sure about your measurements, you have to take option two: 10 wild type mice and 10 mutant mice, and then you compare their tail lengths. In that case you can account for the natural animal-to-animal variation. But if you are taking just one mouse of each type, it is possible that the difference you observe between the two is just because of natural animal-to-animal variation, right? Am I clear? So option two is the correct way of doing this experiment.
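The difference between the two options can be sketched by counting independent animals rather than individual measurements. In the Python example below, the function name `summarize_group`, the animal labels and all tail lengths are hypothetical values invented for illustration. Repeated measurements of one animal are averaged into a single value, so they do not add to the sample size.

```python
import statistics


def summarize_group(measurements_by_animal):
    """Collapse repeated measurements of each animal into one mean value.
    The sample size n is the number of animals, not the number of measurements."""
    per_animal_means = [
        statistics.mean(lengths) for lengths in measurements_by_animal.values()
    ]
    return len(per_animal_means), per_animal_means


# Option one: one wild type mouse, tail measured 10 times (hypothetical values, cm).
option_one = {
    "mouse_1": [7.1, 7.0, 7.2, 7.1, 7.0, 7.1, 7.2, 7.0, 7.1, 7.1],
}

# Option two: 10 wild type mice, each tail measured once (hypothetical values, cm).
option_two = {
    f"mouse_{i}": [length]
    for i, length in enumerate(
        [6.8, 7.4, 7.0, 7.3, 6.9, 7.2, 7.1, 7.5, 6.7, 7.0], start=1
    )
}

n_one, _ = summarize_group(option_one)
n_two, means_two = summarize_group(option_two)

print("option one: n =", n_one, "(10 measurements of one animal)")
print("option two: n =", n_two, "(one measurement of each of 10 animals)")
print("option two SD across animals:", round(statistics.stdev(means_two), 3))
```

Both options involve 10 measurements per genotype, but only option two provides a spread across animals from which error bars can be calculated.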
So option one cannot answer the central question, whether the gene deletion affects tail length, because n is equal to 1 for each genotype. The sample size is 1, and the variation you observe could be just because of natural animal-to-animal variation. To address this question successfully we must distinguish the possible effect of the gene deletion from natural animal-to-animal variation. Therefore option two is the correct way of doing this experiment, and you should always have n greater than 1. Any questions? So this was an example of a replicate experiment: I took one individual and measured its tail length 10 times, then I took the mutant and measured its tail length 10 times. Although I made 10 measurements, the sample size is 1; the other measurements were just replicates. So the sample size is 1 if I choose option one, where I take just 1 wild type and 1 mutant. In the other option the sample size is 10. So you have to be very careful when you are reporting measurements from experiments like this: in this case the sample size is 1, not 10, although you have measured the tail length 10 times. So this is an example of a replicate experiment. Now let us try to understand the meaning of a representative experiment, because as I told you, you have to be very careful when you are using data from replicate experiments and representative experiments. Sometimes people in the lab do an experiment many, many times, and then they show their best data. They have all the measurements, but at the end of the day, because of constraints such as the size limit of the journal or various other things, they show only one set of data.
So what is the person doing? The person has probably done the measurement many times but is showing the data from only one measurement. Data from one measurement means the sample size is 1. And if you are showing the data from only one experiment, don't show error bars, because in that case the error bars will be misleading: your sample size is 1. If you do 10 experiments and show the data from only one of them as a representative experiment, don't show error bars derived from all the other measurements, because you are effectively using a sample size of 1. That is called a representative experiment. So error bars should be shown only for independently repeated experiments and never for replicates. Data from a representative experiment should not have error bars because in such experiments the sample size is always 1. The other question is what type of error bars to use when you are comparing two different sets of results. As I told you, in biology it is very common to compare wild type versus mutant. You have some control experiment and then you want to see how much the result changes if you do something different. Whenever you want to compare things, it is usually appropriate to show inferential error bars, such as the standard error or confidence interval, rather than the standard deviation, which tells you the spread of the data. So use the standard error or confidence interval whenever you are comparing two sets of data, because the range and standard deviation only give you information about one data set. Based on this lecture let's try this quick exercise of true and false statements. Error bars are meaningful even if the figure legend doesn't state what kind they are. This is false. You always have to specify what type of error bar you are using. Error bars should be used with caution when reporting data from replicate measurements. This is true.
You always have to be careful if you are reporting data from a replicate measurement. Third: the number of independently conducted experiments is the same as the number of replicates. This is false. The number of independently conducted experiments is your sample size, which has to be greater than one. For replicates, n is just equal to 1. Fourth one: the sample size n is one for data from a representative experiment. This is true. Second exercise: when comparing a new experimental result with a control experiment, which type of error bar should you use? You should use inferential error bars, either the standard error or the confidence interval. B: scientists handle the wide variation that occurs in nature by performing? No, not replicates; a replicate is the repetition of the same thing. Independent experiments, because each additional individual adds to your sample size, right? Repetition of the measurement of one individual in a single condition, or multiple measurements of the same or identical samples, is called what? These are called replicates. In the home assignment a problem will be described to you. For example, you and your lab mate are both studying the same exciting new anti-cancer compound that is being tested on cancer cell lines using tissue culture. The question has been framed along the same lines as what I discussed with you, the wild type mice and mutant mice. So it is a similar type of question, and after going through it you have to answer: what is the sample size in your experiment? What is the sample size in your lab mate's experiment? And whose experiment has replicates? I told you what a replicate means.
And if you go through this, you will see that the description is written in such a way that your lab mate takes a longer time than you, but still your supervisor believes that your lab mate has done it in the correct way, and you have to write the explanation for that. Why does your supervisor think that your lab mate was right and you were wrong? So with this I would end my lecture. Thank you very much for your attention. If there are any questions you can ask me now, or you can email us.