I've already done my introduction, so we're going to go directly into this session. Today's session is an introduction to statistics, which is mostly study unit one of your study guide. It doesn't require any calculations or calculators, so it should be easy and straightforward to go through, because we are just going to introduce some of the concepts you need to know as you go through the entire module. The lesson for today is an introduction to statistical methods, so that we understand how to analyze data. We are also going to look at some of the characteristics of data, the types of variables and the levels of measurement. To understand statistics, you first need to make sure that you understand the basics. These are the building blocks: everything we talk about today, you need to remember all the time as you move along. You always need to remember what a population is, what a statistic is, what qualitative data is, what categorical data is, what ordinal data is, and so on. We're going to carry those concepts throughout. There will be some activities, because I like giving people work to do so that you learn by doing and always remember how to answer the questions when you get them in your assignment. I've got a lot of activities that we're going to do as a group. If time allows, you will also have a few minutes to answer them, and then we come back together as a group, engage and discuss the questions. Okay, so where does statistics fit in?
You need to understand that to make decisions, there should be a process, and this process should usually be a scientific process so that people cannot argue with whatever you are presenting. This scientific process always begins with asking a research question, and that research question is usually accompanied by assumptions, which are the claims you are making: the hypotheses. Those of you who are repeating this module will already understand where I am going with this. To answer the research question and test the hypotheses or assumptions you have, you need to collect information. Once you have collected that information, you cannot just collect it and leave it there. You need to analyze it to find insights, the information that will help you make a decision. If the data you've collected does not support your hypothesis, you can go back, revisit your assumption and maybe change or adapt it. Then you start again: you go and collect new data, and that is the cycle we follow. So where does statistics fit into all of this? My role here is not only to get you through the module; that is why these sessions are called statistical literacy. I want to help you understand that what you are studying has an impact on your day-to-day job and can help you in it. By the end of this session, I want you to fall in love with statistics and want to continue in this career, or at least want to use statistics. Statistics plays a role in so many things in life. There are key areas, or elements, of statistics, and you need to understand how they fit together.
The first part is data management, where we collect and manage the data, and there are people who play those roles. For example, I'm a business intelligence data architect at UWC, and my role is data management. But I also play a role in producing the statistics: analyzing the data, presenting it in tables and charts, turning it into statistical hypotheses, testing those hypotheses, checking correlations and so on, which is inferential statistics. I'm also responsible for presenting that information so that the executives can consume it and make relevant decisions. So my role spans all of this. Within statistics we mostly talk about areas two and three, but you can see that it encompasses multiple areas, including the data and the presentation. You cannot just manipulate the data, put it into a report and never share the results. And you cannot do your statistical analysis without seeing the data and understanding the types of data you have. All of that is included in the processes of statistics. So what is statistics? Statistics is the science of collecting, organizing, presenting, analyzing and interpreting data to assist in making more effective decisions. You source the data, you enrich it by putting it into nice graphs, calculating the mean and the standard deviation, and presenting that in tables, charts or a dashboard. Then you give it to people so they can make informed decisions. That is the process of statistics. Why do we study it? Data is everywhere. Statistical techniques are used by many government departments and organizations: hospitals, schools, and companies like banks and insurers. They wouldn't know and understand their clients and what to offer them if they didn't use the data they have.
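As a minimal sketch of the descriptive side described here (summarizing data in a table and calculating a mean and a standard deviation), the Python below uses only the standard library. The till items and daily sales figures are made-up illustration data, and the helper names `frequency_table` and `summarize` are hypothetical, not taken from the study guide.

```python
from collections import Counter
import statistics

# Hypothetical till data: items scanned at a shop on one day (made-up values).
items_sold = ["bread", "milk", "bread", "eggs", "milk", "bread", "sugar"]
# Hypothetical daily sales totals in rand for one week (made-up values).
daily_sales = [1200.0, 950.0, 1430.0, 1100.0, 1680.0, 2050.0, 1890.0]

def frequency_table(values):
    """Return (category, count) pairs, most frequent first."""
    return Counter(values).most_common()

def summarize(values):
    """Return the mean and the sample standard deviation of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)

if __name__ == "__main__":
    # A simple frequency table: the first row is the most bought item.
    for item, count in frequency_table(items_sold):
        print(f"{item:<6} {count}")
    mean, sd = summarize(daily_sales)
    print(f"mean = {mean:.2f}, standard deviation = {sd:.2f}")
```

The frequency table suits categorical data such as item names, while the mean and standard deviation only make sense for numerical data such as sales amounts.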
In our day-to-day lives, too, we use data to make decisions, like how much things are going to cost us this month and next month. Sometimes you add things up, divide, subtract and calculate averages; all of that helps you make a decision, and that is part of statistics. You use it in your day-to-day life as well. Statistics is also a method that helps people and companies make effective decisions. For example, suppose you work in an insurance company, and let's talk about KZN. KZN had the floods, right? First it was the looting, then the floods happened in KZN. If I were an insurance company and didn't know my clients and their areas, I wouldn't have prepared a risk cover for my clients to say: when you're building businesses, you can insure your business for this much so that you can recover the costs after your business is flooded. For a company to understand what the risks are and how much they will cost, it needs to collect information about the type of business and make some calculations. One of the calculations we're going to talk about is the coefficient of variation, which measures variability relative to the mean, so you can compare how much less predictable, and therefore riskier, one thing is than another. Companies use those kinds of calculations to make effective decisions. We also use statistics to help us predict the future. For example, those who work at weather stations or companies that monitor the weather predict the weather; those are the weather forecasts we always watch. They use statistical techniques, such as time series, to come up with those forecasts, and there are many other methods.
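The coefficient of variation mentioned above is usually defined as the standard deviation divided by the mean, expressed as a percentage, so that datasets with different averages can be compared on the same footing. A minimal sketch, assuming two hypothetical sets of monthly claim amounts in rand (invented numbers, not real insurance data); the function name is illustrative only.

```python
import statistics

def coefficient_of_variation(values, sample=True):
    """Return the CV as a percentage: standard deviation / mean * 100.

    Uses the sample standard deviation (n - 1 divisor) by default;
    pass sample=False to treat the values as a whole population.
    """
    mean = statistics.mean(values)
    if mean <= 0:
        raise ValueError("the CV is only meaningful for a positive mean")
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / mean * 100

# Hypothetical monthly claim amounts (rand) for two kinds of business.
shop_claims = [10_000, 12_000, 11_000, 9_000, 13_000]
warehouse_claims = [2_000, 30_000, 5_000, 25_000, 1_000]

if __name__ == "__main__":
    for name, claims in [("shop", shop_claims), ("warehouse", warehouse_claims)]:
        print(f"{name}: CV = {coefficient_of_variation(claims):.1f}%")
```

The two sets have means of a similar size, but the warehouse claims have a much larger CV, so their cost is far less predictable. The CV is generally only meaningful for ratio-level data with a positive mean.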
I don't know exactly what they use to forecast the weather, but they use these types of statistical techniques to estimate whether it's going to rain, whether it's going to be cold or hot, and so on. So statistics helps in predicting the future. Now, there are two branches of statistics, and you can stop me if you don't understand. The first one is descriptive statistics. Descriptive statistics are the methods you use to describe your data, by summarizing it in tables and charts and presenting it in a way that is more appealing and easier for users to consume. Descriptive statistics also include the methods of collecting that information: from surveys, from the administrative systems you have, such as CRM (customer relationship management) systems, which most companies have, or from your operational systems. Those who work at the tills, the cashiers, will understand: the items you put through and the barcodes you scan are captured by operational systems, such as point-of-sale systems, which record the operational data we use. You can summarize that information to understand the sales process: when you sell more, what the most bought item in the store is, and so on. You can also analyze that data and present it in tables and charts. The second branch of statistics is inferential statistics, which is the method used to determine something about the population from a sample. We're going to talk about what a population and a sample are later on. There are two methods in inferential statistics. The first is estimation, which we will cover when we do confidence intervals. The second is hypothesis testing. How does hypothesis testing relate to the hypotheses I spoke about earlier?
Remember, the hypothesis I spoke about is just a statement you make, which is the assumption. Hypothesis testing is also a process. It has six steps, and we're going to learn about it in Chapter 9, or I think Study Unit 9, that you are doing. There is a section on hypothesis testing, where we test the claim you have made and see whether we can accept that claim or reject it. So inferential statistics is the process of drawing conclusions or making decisions about the population based on a sample, and we're going to talk about this shortly. So let's introduce the concepts of population and sample. A population is the set of all elements or subjects that you are interested in studying. For example, everybody in South Africa is the population of South Africa. But if I'm interested in studying the Western Cape, I can define my population as everybody who lives in the Western Cape. So all the elements I am interested in studying become my population. But because the population is so large, and it is time consuming to go and get information about all of it, you create a subset and use that subset to test the claim and make a decision. That subset is called a sample. Because a sample is just a subset, it is small, which helps us: we are able to collect the relevant information we want, analyze it, present the data, and use inferential statistics to infer back. Once you test the claim, you can infer those results back to the population. When we collect information from the population or the sample and start calculating things like the mean and the standard deviation, those are what we call measures. When we calculate those measures from the population, we call them parameters.
If they come from a sample, we call them statistics. Statistics, as a subject, is the study of manipulating and summarizing data for decision making, and we use sample statistics to infer back to the population. So measures from a sample are statistics, and measures from a population are parameters. You need to remember those two things. Later on, when we do the analysis of data, we're going to talk about parameters and statistics: what the differences are and how to calculate them, because parameters and statistics have different formulas. Now, I spoke about collecting data. We often use "data" and "variable" interchangeably, but because data are units, I'm going to talk about variables: the data we collect comes from a variable. What is a variable? A variable is a characteristic that describes the subject of interest. For example, a person is an entity, a subject, and a person is made up of variables. You are not made up of data. You have data, but you are made up of variables, because the variables describe your characteristics as a subject. The color of your hair is a variable. Your gender is a variable. Whether you are married is described by the variable marital status. From those variables we get units, and those units are what we call data. For example, take the variable "color of my hair": its data units are brown, blonde, red and black, the hair colors of different subjects. The type of vehicle is also a variable; more precisely, let's use the make, or brand, of the car.
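To make the parameter/statistic distinction concrete, including the different formulas mentioned above: the standard textbook formulas divide the squared deviations by N for the population standard deviation, but by n − 1 for the sample standard deviation. The sketch below assumes a small, made-up population of club members' ages and draws a simple random sample from it; Python's `statistics.pstdev` and `statistics.stdev` use those two divisors respectively.

```python
import random
import statistics

# Hypothetical population: the ages of every member of a small club (made-up data).
population_ages = [23, 35, 41, 29, 52, 38, 47, 31, 26, 44, 58, 33]

# Parameters: measures calculated from the whole population.
mu = statistics.mean(population_ages)        # population mean
sigma = statistics.pstdev(population_ages)   # squared deviations divided by N

# A simple random sample of 5 members (fixed seed so the run is repeatable).
rng = random.Random(42)
sample_ages = rng.sample(population_ages, k=5)

# Statistics: the same kinds of measures, calculated from the sample.
x_bar = statistics.mean(sample_ages)         # sample mean
s = statistics.stdev(sample_ages)            # squared deviations divided by n - 1

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"statistics: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

The sample values `x_bar` and `s` will usually differ somewhat from `mu` and `sigma`; inferential statistics is about using the former to estimate the latter.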
The brand of a car will be VW, Toyota, Mercedes and so on. Those units are called data; they are the values associated with the variable. You will often see variable and data used interchangeably. Next, we're going to learn about the types of variables, because you need to understand the type of variable in order to manipulate it. To summarize your data, you need to know what type of data you have and what type of variables you are working with. There are two types of variables. The first is what we call a qualitative variable, also called a categorical variable; they mean one and the same thing. These are variables you can put into categories, like eye color or marital status. I can group the data that comes from them into different categories: brown eyes, purple eyes, red eyes, black eyes and so on. The other type of variable is quantitative, or what we call a numerical variable. These are variables that you can either count or measure. If a variable can be counted, we call it a discrete quantitative variable: the number of students in my class, the number of books, the number of modules I'm studying, or defects per hour, where you count how many defects come out of a factory. If you work in a factory and a bottle doesn't come out straight, or you are making cups and some of them have no handles or have chips, those are defects. I like going to those factory shops in Cape Town that sell the defects from Woolworths and so on. You buy them because the defects are not visible to the naked eye, but for a factory whose business is selling 100% guaranteed good products to its clients, a small defect is still a defect, so you go there and buy those defects.
So you can count how many defects you get in your factory in order to fix the processes. But I'm deviating from what we are discussing. Variables that can be measured are called continuous quantitative variables, such as your height, your weight and your age. Age is measured: you can measure age from the second you are born, in time, days, months and so on. Those are the types of variables. We also have what we call the levels of measurement, and they are ranked, because some are the lowest levels and some are the highest. The lowest levels come from categorical variables: nominal and ordinal. With nominal data there is no logical order, and the categories are said to be mutually exclusive, because you cannot belong to two categories at once. An example of nominal data is female and male: you cannot belong to both. Ordinal data consists of distinct categories that have an order, so there is a rank. For example, levels of education: pre-school, primary school, high school, college, university. Those are levels you go through, and each is higher than the one before. Then we move to the stronger, higher levels, which come from quantitative variables: interval and ratio. Both come from numerical measurements. With interval data there is no true zero. When you think about interval, think about temperature. With temperature, zero is just another number; it means it's another cold day, whether it's negative 1, negative 10, or 5, 10, 14 or 22 degrees. That is interval. Ratio data is also a numerical measurement, but zero has a true meaning. Zero means zero: it means there is nothing. A true zero exists.
When you weigh yourself and the scale reads zero kilograms, it means there is nothing there; a person has to weigh at least 20, 30, 40 kilograms. As a rule of thumb, if a variable cannot take negative values and zero means "none of it", it's ratio; if it can take negative values, think of it as interval. I'm almost done. So these are the levels of measurement. Nominal and ordinal go with categorical data. The first question to ask yourself is: is this numerical or is it categorical? If it's categorical, ask yourself whether it has an order. If it does not have an order, it's nominal. If it has an order, it's ordinal; the clue is in the name: order, ordinal. If it's numerical, it's interval or ratio. Ask: can it go into negative values, or does it only take zero and positive values? If it can go into the negative, it's interval. If it only takes zero and positive values, and zero means none, it's ratio. To conclude, let's look at the basic operations. Remember order: they can all be sorted. Even though nominal data does not have an order, here we're talking about basic operations. Can I sort the data? Yes, you can sort the data. This is not asking whether the categories themselves have a natural order, but whether I can arrange the values from lower to higher when I have them. Can I count, or create frequencies? Yes: I can count how many males and how many females I have, or how many people responded "strongly agree" as opposed to "strongly disagree". That is frequency. For interval data, I can count, not how many people have a temperature, but how many days had a temperature below zero degrees and how many days had a temperature above zero degrees, things like that. For ratio data, I can count how many people weigh a certain number of kilograms, or above or below it. Then there is the mode; we're going to discuss these measures later on. The mode is the value that appears most often, and you can find it for nominal data.
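The questions above can be sketched as a small decision function, together with a lookup of which basic operations each level supports. This is a hypothetical helper, not something from the study guide. One assumption to flag: instead of asking only "can it go negative?", it asks whether zero means "none of it" (a true zero), since that is what separates ratio from interval; the negative-values check is a quick way to spot a variable without a true zero. The operations table follows the usual convention that each level keeps the operations of the levels below it.

```python
from enum import Enum

class Level(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"

def level_of_measurement(is_numerical, has_order=False, has_true_zero=False):
    """Return the level of measurement by asking the session's questions in turn."""
    if not is_numerical:
        # Categorical (qualitative): does it have a logical order?
        return Level.ORDINAL if has_order else Level.NOMINAL
    # Numerical (quantitative): does zero mean "none of it"?
    return Level.RATIO if has_true_zero else Level.INTERVAL

# Basic operations per level; each level keeps those of the level below it.
_NOMINAL_OPS = {"frequency counts", "mode"}
_ORDINAL_OPS = _NOMINAL_OPS | {"median"}
_INTERVAL_OPS = _ORDINAL_OPS | {"mean", "add/subtract"}
_RATIO_OPS = _INTERVAL_OPS | {"ratio"}

OPERATIONS = {
    Level.NOMINAL: _NOMINAL_OPS,
    Level.ORDINAL: _ORDINAL_OPS,
    Level.INTERVAL: _INTERVAL_OPS,
    Level.RATIO: _RATIO_OPS,
}

if __name__ == "__main__":
    examples = {
        "gender": level_of_measurement(False),
        "level of education": level_of_measurement(False, has_order=True),
        "temperature (degrees Celsius)": level_of_measurement(True, has_true_zero=False),
        "weight (kg)": level_of_measurement(True, has_true_zero=True),
    }
    for variable, level in examples.items():
        print(f"{variable}: {level.value}, allows {sorted(OPERATIONS[level])}")
```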
You can also find the mode for ordinal, interval and ratio data. The median is the middle value; you can calculate it for ordinal data and for numerical data, but not for nominal data. The mean you can only calculate for numerical values. The difference: can you add and subtract? Yes, for numerical values you can. Can you calculate a ratio? Only when there is a true zero, so always remember that interval values do not support ratios: 20 degrees is not twice as hot as 10 degrees. Which level has an absolute (true) zero? Ratio. Interval, on the other hand, can go into the negative. So I am done. Any questions? We are 20 minutes in. Do we have any questions? Okay, then let's go. I do have a question. Yes? With all these discussions in the study guide of basic statistics, is all of this in unit one? Because I struggle to find where all the different types are. It's all in study unit one. Sometimes they are called scales of measurement, or levels of measurement. Thank you. Okay, so let's look at how the questions come in your assignment or your exam. We're going to look at some activities. Number one says: complete the following sentence: "______ is a statistical method that draws conclusions about the ______ based on the ______ computed from the ______." Think about it. We've got two branches of statistics: descriptive and inferential. So the first blank should be either inferential or descriptive. Which one is right? Think about it; most of the answers are in the names themselves. Descriptive describes the data; inferential infers. Which one draws conclusions? Descriptive only describes. So the description tells you what you need. Also remember that we introduced two concepts: the population and the sample.
Also recall which measures come from a population and which come from the sample: there is the parameter and there is the statistic. So based on that, and on what we've covered today, which option is it: A, B, C or D? I would go for D. You'd go for D? D is incorrect, because we are drawing conclusions, not summarizing. The only correct statement is C, because it says inferential: inferential statistics is a statistical method that draws conclusions about the population based on the statistics computed from a sample. That was in the first few slides we looked at. Moving on to question two: which one of the following statements is not a goal of descriptive statistics? A, B, C, D or E: which one is not a goal? Remember, inferential statistics is about estimation and hypothesis testing. So which one is not? D is the incorrect one, because D says estimating, and we know descriptive statistics is about summarizing. Visualizing and displaying are part of descriptive statistics, and so is reporting numerical findings, because we summarize the data by calculating numerical findings such as the mean and the standard deviation. Presenting is the same as displaying: presenting the data in the form of tables, charts and summary statistics. All of them describe what descriptive statistics is about. Question three: which of the following variables is not a categorical variable? Think about it: which ones can you place into categories? The options are height, gender, a score classified as low, average or high, and the choice of whether a test item is true or false. Which one of these can we not put into categories? C. I would say C. No. Remember, C's categories are low, average and high; those are categories. True or false is also categories. And gender can be put into male and female, which are categories.
And remember, with gender there is also a category such as "unknown", "unclassified" or "prefer not to say". Those are other options you can have when you create a questionnaire. You don't have to restrict people to just male and female; you can create other classifications, such as "not specified", "prefer not to say", "unknown" or "none". You can add those other categories. But height is numerical. So is categorical either one thing or the other, something definite? Yes: remember, it's anything you can put into categories. Right. And it doesn't have to be dichotomous, two things. It doesn't have to be yes or no, or true or false; other categories can be added. For example, to true or false you can add "I don't know" as another category. So the answer is A. Okay. For some reason this platform puts everybody in the lobby, so I might not be able to see everybody. Next question: which one of the following statements is incorrect with regard to a statistic? Remember what a statistic is. We have a population, whose measures are called parameters, and we have a sample, a subset of the population, whose measures are called statistics. So which one: A, B, C, D or E? A: a statistic is a summary measure calculated from a sample. B: a statistic is an estimate of a parameter. We're looking for the incorrect one, right? C: a statistic represents a property of a population and not that of a sample. D: a sample mean is a statistic; it's a measure calculated from a sample. E: a sample standard deviation is a statistic. Which one: A, B, C, D or E? Sorry, could whoever has their mic on please mute? Thank you. So A is correct, B is correct, D is correct and E is correct, because all of them describe what a sample statistic is.
A statistic is a summary measure that comes from a sample. We also know from inferential statistics that we use a statistic to estimate what the value of the population parameter could be. For example, if we calculate the mean from the sample, it's called the sample mean, and it's a statistic. The sample standard deviation is a statistic too, because it is also a measure calculated from the sample. So the only option that is not correct is C, which says that a statistic represents a property of a population. We know that a statistic comes from a sample. We are halfway through the questions. Question five: which one of the following statements is incorrect? A: a population is the complete set of objects in the study, while a sample is a subset of the population. This describes a population and a sample. Remember what we said: the population is all of the Western Cape. A sample would be if I picked 100 people from the Western Cape and studied them; they are a subset of the population. Once I have the results, I can feed them back to the population and say what the population looks like. So this is correct. B: a statistic is a property of a population, while a parameter is a property of a sample. We know it is the other way around: population goes with parameter, and sample goes with statistic. So the incorrect one is B, while all the other statements are correct. Data from a sample are in the form of a variable, which can be either numerical or categorical. Remember, we said there are two types of variables, numerical and categorical. Numerical variables are also called quantitative, and categorical variables are also called qualitative. Quantitative data are numeric and can be either discrete or continuous.
We said that discrete means they can be counted, and continuous means they can be measured. Qualitative data are categorical data, and they use labels to identify attributes. Labels like VW, Jeep and Toyota identify attributes. Question 6: which one of the following variables is not categorical? This one is for you to tell me: A, B, C or D. Can we put gender into categories? Yes. The name of an internet provider, like Telkom, Vumatel or Afrihost? Yes. Weight: can we put it into categories? No. So weight is the one that is not categorical. Marital status, whether you are married, single or widowed, can be placed into categories. Classifying an object as defective or non-defective gives two outcomes, two labels, so it can be classified too. Question 7: which one of the following statements is correct? Now we're looking for the correct statement, so you need to read each statement carefully and eliminate those that are not correct. A: gender, marital status and religion are examples of qualitative ordinal variables. Ask yourself what ordinal data is: data that has a logical order. Does marital status have a logical order? No. Religion doesn't have any logical order either. So that is not the correct statement, because the keyword here is ordinal. They are all categorical, but are they ordinal? No. B: the amount of money a person spends in a shopping mall is a discrete variable. Ask yourself: the amount of money is numerical, but is it discrete? Something I didn't mention when I spoke about measuring: any value that can also take a decimal is measured. Think about it. Discrete means counting. So if discrete means counting, what about the amount of money?
That is correct. No. Discrete means counting, and, as I said, anything that can be measured, which is continuous, can also take a decimal point. Money: can you count it or measure it? Anything that can take a decimal point is measured, which means it is continuous. Money has decimals, because we measure money in rands and cents. There is a decimal point, so money is continuous. Always remember that. Don't fall for the trap of thinking you are counting money because you put R100 on the table and say one, two, three. Remember, with cents we are measuring. So is discrete whole numbers, then? Whole numbers. Okay. Yes, probably we can put it that way: anything that can only take whole numbers is counted. I'm scared to say yes and scared to say no, but yes, whole numbers, because it has to be a whole unit, right? Yeah. So B is also not correct. C: the number of girls with blue eyes is discrete. Good. Yes, because you count them, one, two, three, four, five, until you have counted all the girls with blue eyes. So this is correct. D: the position one finishes in a race is a discrete variable. I think with this question we have two correct answers, because when you finish a race you can finish in position one, two or three. No, but you cannot add them, right? Yeah, you cannot add them. The numbers represent labels, which are categorical. Sorry, that's actually categorical. It is categorical. So this is incorrect, because the position is not a numerical value; it's a categorical (ordinal) value that represents the position you end up in a race. Yes. E, and I think our time is up: the number of times a mouse makes a wrong turn in a laboratory represents a continuous variable.
But that is a number, and a number means you can count the wrong turns; you cannot measure them. So E is not correct either, and only C is correct. Okay. I had three more questions. I'm going to share the PDF and where all the PDFs will go; I don't have a place for them yet, but I will share that. So this is another question you can look at on your own, and that was the last one. I think our time is up. I've shared this on the WhatsApp group. If you are not on the WhatsApp group, we can discuss after the session how you can join it. In the link I shared with you the links don't work, but in the PDF the links will work. This is the schedule for this month. Everything up until the 21st of April should probably enable you to complete your assignment one. I'm not sure when your assignment one is due, since different modules have different assignments. Otherwise, I will see you on Saturday from 10 till 11. Remember to always use the link to sign up for the classes, so that you have the dates in your calendar and get reminders and don't forget. I will also look for an alternative, so that you don't have to sign up for every session and can have one automated link that generates the calendar entries for the rest of the sessions. But for now, this is the alternative I found. Remember that you can also subscribe to the YouTube channel; I've shared the link on WhatsApp as well and will reshare it. Let me open the YouTube channel so I can show you. Oh, sorry, I don't have to click there. On the YouTube channel there is a subscribe button. When you subscribe to the YouTube channel, you will receive notifications of the videos and can watch them. Remember to like, share and comment.
There is also a button on there that says Join. When you join, you need to choose a tier. Don't choose the first two; those are just for people who want to support the content and give something. If you want to receive the videos, join, because this is the only way I can make some revenue for the time I spend with you. Remember, you pay as little as R79 for the whole month, because it's a month-to-month membership. So it's about R80 for you to view the recordings of eight sessions that I otherwise give you for free. So please make sure that you join, not just subscribe. And when you join, remember, not those first two tiers, the 999 and the 1999; those are just for people who want to support the channel with whatever little they can. If you want access to the recordings and the videos, you need to join at R79, R149, R349 or so; I can't remember the other one now. How much you want to support the channel is up to you, so that I also feel I should continue doing this and more people can benefit from it. But it's up to you. The free sessions are also on there: if you go to Playlists and look for the module STA 1610, you will find all the videos for the previous STA 1610 sessions. You can get hold of me via email at info at pambilianalytics.co.za, or via WhatsApp. Only WhatsApp: if you call that number, you won't get me. You can also visit the website, and there is the YouTube channel link: www.youtube.com. Or you can find me personally, Elizabeth Lizzie Boy; it's one and the same YouTube channel. Thank you for coming, and I will see you on Saturday. Are there any questions? Yeah, Lizzie, I'm going to phone you, just to ask you to explain all these financial options. What do they mean?