Welcome to our session on the introduction to basic concepts and techniques of statistical research. Today we're just going to touch on the basics. We're not going to go into detail explaining a lot of things; the aim is to get you to understand the concepts used when you do quantitative research, as well as some of the techniques that you need to be aware of when you embark on your studies or when you are doing your research, whether it's part of studying or not. Our session is scheduled to end at 12, but we might end early depending on the engagement and how fast we go through everything. The recordings will be shared on our YouTube channel. You can follow us and subscribe so that you can gain access to the recordings.

So who are we as Pambili Analytics? Our vision is to build a community, mostly of youth and women, who have the digital and data skills needed to become data-driven people or leaders within whatever industry they are operating in. Our mission is to bridge the data literacy gap, especially in numeracy and digital literacy, but more so the analytical skills gap, by training, mentoring and coaching individuals and preparing citizens to enter a digitally driven global environment. We know that with the rise of 4IR, everybody needs to have the skills that will make them compete and be able to prosper in this digital era. Some of the services that we offer are training in business intelligence, data analytics, research and market research, depending on the research question that you have, whether it is for your studies or is a business-oriented question. Mostly, where we operate is skills development and training in digital and data literacy.

Your facilitator for today is me, Elizabeth Boy. I've got more than 20 years' experience in data analytics and management information as well as business intelligence.
I have been tutoring for more than 13 years, offering quantitative numeracy and tutoring in research, statistical and numeracy skills. I'm currently also a Siyaphumelela coach, which is geared towards addressing issues that deal with student success, because we know that this is a challenge. Many students go into university but struggle to complete their degrees in minimum time or minimum time plus one, and some eventually drop out. That is a loss and a burden for the person who went to university or for their family, and as a country we are losing those intelligent people who could help us develop our economy, because when students drop out it means they've got limited opportunities. It's not that they cannot thrive without the qualification, but we're trying to make sure that we keep students in higher education so that they can leave higher education with a qualification at least. I'm also a two-time entrepreneur award winner. I was the 2022 top Western Cape business entrepreneur of the year, and I was the national champion for the business leverage, and last year I was one of the top 10 finalists in the MTN Women in Digital Business program, which is still ongoing; the winner will be announced this year.
In terms of experience and tools, I've got diverse experience in using tools on the business intelligence side, on the data analytics side, as well as in visualization, as you can see from all the types of tools that I've used across my years of experience. I've worked in multiple companies (I know this might not be good, because it might be telling you how old I am), from different sectors, as you can see, including higher education, insurance, credit bureaus and financial services, so I've got experience, but not in health. So yeah, that's me. I'm also the founder and managing director of Pambili Analytics, which is the company that is running this course.

What are we going to go through today? It's going to be an engaging session, and you are more than welcome to stop me anytime and ask any question if you don't understand. We're going to look at some basic concepts, especially statistical concepts, as well as the techniques needed for you to undertake statistical research, or what you may call quantitative research. The learning objectives for today will cover those basic concepts, explaining the terminology behind them. We're going to look at the difference between descriptive statistics and inferential statistics, and at the difference between a population and a sample, because when you're doing research you need to understand those terms as well.
Then we're also going to learn about sampling techniques. I've only curated and selected three of those; there are many sampling techniques that you need to know, but for today we're going to concentrate only on the probability sampling techniques, and I will also emphasize the importance of using them. Lastly, we're going to talk about how to use hypothesis testing in your research, because the base of everything you need to do is to test your theory and your assumptions. We're going to go through that, but also how to start thinking about what it is that you want to test, and we will go through a decision tree that will help guide your thinking about how you formulate your research, your research questions and your hypotheses.

So let's dig in. Before we dig in, are there any expectations that you have from the topics that I'm going to cover today? Is there anything else you expected that I am not covering and that you think we might include, or are you good with the topics that we have?

I'm good.

Okay, cool, so we're going to continue. I just want to get these right. When we talk about quantitative research, or research in general, it's a scientific process, because there are steps that need to take place. With the scientific process, you first pose a research question, and it's very important that you understand what your overarching research question is. You cannot just start your research without knowing what it is that you want to test or prove, because it's all about proving something, proving a phenomenon that is happening. So what is the question that you want to ask yourself? This will help you to better understand what it is that you need to be studying. To understand that phenomenon you need to generate possible explanations, because those possible explanations will become the assumptions that you
want to test. From those assumptions you might also want to create some predictions, because you might want to look at whether one thing is related to another, or you might want to look at forecasting, looking at the future. But most importantly, you will need to collect data in order to test those explanations or to answer that research question. To do that, you need to identify the type of data that you will need to answer the research question. I know that I'm introducing some terminology that you might not be aware of, but we will unpack all of it, especially the terms I have highlighted. It's very important to understand the type of data that you need, because then you will be able to ask the right questions and formulate them in a way that helps you answer the research question.

After you have collected your data, you need to start analyzing it, and to analyze it you really need to understand some of the techniques for analyzing data. For example, do you just want to describe your data, summarizing it by putting it in nice graphs and charts or calculating some measures? Or do you want to do predictions, where you then need to look at techniques including correlation, regression, the t-test and all that? You need to understand those terms and techniques in order to know what it is that you are looking for to answer your research questions. When you analyze the data, you need to make sure that the analysis supports your hypotheses, so there should always be a link between your data analytics and your hypothesis testing. You cannot say you're going to test a correlation but then only
do a bar chart and pie charts; those do not show a correlation, because a correlation needs a scatter plot and so on. That is why it is very important for you to understand the type of data that you have. To create a scatter plot you need numerical data, so if you only have categorical data you cannot use it to create your scatter plot.

As you can see, all of this is a cyclical process, which means that at every level you go through you can always go back and refine. There's nothing that says it is cast in stone; you can always go back, iterate and change based on what you see in the data. If you do your data analytics and you find that the data you collected does not actually answer your research questions, you might go back to your research questions and start redefining them so that they align with the data that you collected. But if you created a proper research question and put in place the proper mechanism to collect your data so that it is able to answer your research question, then when you get to the data analytics process you don't have to go back to your hypothesis. All you need to do is validate that your data analytics and your hypothesis relate, that you are able to test the assumptions that you created.

Once you have done your data analytics, the next step is to share your results, write your conclusion and discussion, and make your decision out of it. That is not what we're going to be discussing for the purposes of today. Today we're going to be dealing with the hypothesis, the data collection and the data analytics processes, not the initial theories that you have, nor the discussion and the conclusion. That will be another discussion on its own, and all these
parts we are also going to touch on and expand in more detail in our other research sessions, so that you are able to do your research or complete your studies understanding all the concepts in depth.

Okay, so let's move on and look at the key components or key terminology, unless you have a question based on what we just discussed. Is there any question?

No, not at the moment.

So what are the terms or research concepts that you need to understand when you undertake research? First, you need to understand what it is that you want to measure. Measurement is a way of allocating numbers according to a rule in order to quantify some construct, and a construct can be any variable or any concept that you want to derive. For example, let's say you want to test people's depression score, how depressed everyone is. To quantify that construct called depression you will need to allocate a number to it; sometimes you can allocate not a number but a label to a construct. But in order to measure something, you must make sure that you are able to operationalize that measure. For example, if I'm going to measure the level of depression that everyone is going through, then I will assign a score, to say those who have low depression will have a score between zero and three, those with mild depression might have three to five, and so on. After that I will create some measure from it and calculate how many people have depression. Or let's put it this way: I will ask different questions that will help me to create those measures. For example, there might be five-point scale questions that relate to how people are behaving or how they are feeling, and all of those questions,
when I summarize them, add them together or average them, create one measure called a depression level. That is what we mean when we say you are operationalizing those measures.

Now, we've been talking about a construct, and I said it describes the thing that you want to measure. Sometimes we call those things variables, and a variable is just a characteristic that describes an item. Some characteristics are visible and some are not; some we can observe and some we can measure. Depression level, for example, we will be able to measure, because we will create a measure for it. But a characteristic like the color of my eyes you can observe, because you can look at my eyes and say they are brown or black, or the color of my skin, you can say it's black or white, depending on how we define the color of the skin. Or let's use something different: cars. Some characteristics of a car that we are able to see are the color of the car, the model of the car, and the type of car, whether it is a sedan, a van, an SUV or a coupé. When something is visible and you are able to observe it, we call it a manifest variable. When it is invisible, like whether I'm happy or not, or whether I'm stressed or not, which you cannot see, we call those things latent variables. So when you create a question, you need to be able to say what type of variables you are going to ask people to respond to: are they going to be manifest variables or latent variables? That will also help you, because if your research question is about finding out the stress level of individuals working at a company, you will need to use some questions
that relate to those latent variables, and you can also use some variables that are manifest, because those variables will be visible, for example gender, race, education level and so on.

Okay, so we've been talking about the variable, and we said it is a characteristic that describes an item or an individual, and it can be observed or it can be measured. When we start observing things and we record the values that come from that, we are referring to what we call data points, or data. Sometimes we use the words variable and data interchangeably, but data is actually the values from those variables: the set of individual values associated with the variable. For example, like I said, if I look at the type of car I'm driving, the type is a variable, and the data would be a coupé, a van, a sedan. Those are the data values, because they are associated with the variable that I am looking at. Some books use data and variable interchangeably, but for the purpose of this session we are going to talk about data rather than the variable, although we are talking about both at the same time, because data is just the values associated with that variable. I hope you understand that.

So far we have only talked about data being values, but there are actually two types of data that you can separate. The first one is what we call quantitative data, and quantitative data is data that is numeric in nature. It has two characteristics: you can count it or you can measure it. The minute you count that
data, it becomes what we call discrete data, because you are counting it. The number of children I have, for example: I can count them, so that is discrete data. The minute I take a measuring tape and measure my height, I call that continuous data. You need to understand those two types, because the minute you understand them you will know how to use them and what measure or statistical method you can apply to them. That is not for the purposes of today; you just need to know that under quantitative, or numerical, data there are discrete and continuous data points.

There is also another category of data, which is called categorical data. That is data that you can put or classify into categories. For example, the types of schools we have: we've got independent schools and government schools, or CAPS schools and so on. Or, in higher education, we also classify schools in terms of quintiles. We say quintiles one, two and three are the under-resourced schools and quintiles four and five are the well-resourced schools, because quintile one and quintile two are non-fee-paying schools the majority of the time, because they rely on the government; quintile three pay fifty-fifty or something of that sort, so they've got support from government but they also charge fees; and for quintile four and five, even though those schools do get government support, the majority of the time parents pay the fees. Those we call the high-level or well-resourced schools. Because I'm able to classify those schools into those categories, the variable that I use, the type of school, which has the values quintile one, two, three, four and five, holds categorical data. Now, when you understand
those things, you also need to understand the levels of measurement of the data. When we look at numerical data (remember, numerical data is what we can measure or count), it can be classified under two scales. The first is ratio, where the distance between the numbers is known and a true zero exists. For example, when I weigh myself on a scale, if I've got zero kg it means I don't weigh anything; I don't exist. So zero means nothing: it has an actual meaning, there is nothing before zero, it starts from zero and goes up. The second is interval. Here you are also able to find the distance between the numbers, and that distance is known and constant, but zero has no special meaning; zero is just another number. Take temperature, for example: we can go below zero to minus four degrees Celsius, minus 20 degrees Celsius and so on. Zero is just another temperature. It means it's cold, but it's not colder than minus 20 or minus 14 or minus four. So it's not an absolute starting point; it's just another cold temperature.

Then, when we look at categorical data, which is data that we can classify and place into categories, you need to understand that it has two levels or scales of measurement. The first one is ordinal, which means you can place the data in order, ranked from lowest to highest. For example, in a questionnaire we might say, rate our service from one to five, where one means poor and five means excellent. That means you are able to rate the service level from the lowest to the highest point, so you are able to classify and put all the categories in order of
rank. For example, shoe sizes can also be put in order, because you've got size one, size two, size three, size four, size five, and so on. Likewise clothes: small, medium, large, extra large, double XL and so on. They've got levels in terms of how you order them. Another example is the one we use here, the levels of education: preschool or kindergarten, primary, secondary, and nowadays we've got foundation phase, FET and so on. We can order those levels because you need to progress through them; you cannot start from primary and go straight to university, you have to go through all the levels.

Then we also have what we call nominal. With nominal there is no order. Take the color of my eyes: blue, green, yellow, red, whatever the eye colors are, there is no order; one does not precede the other, and you can list the categories in any order. Or we can use the South African provinces: you cannot say which one is above the other, because you cannot put the provinces in order unless we are measuring something else. These categories also need to be mutually exclusive, which means one category cannot fall into another. When I talk about the color of my eyes, I will not have brown and black eyes at the same time, unless my two eyes are different, and then I've got an anomaly, something out of the ordinary, because you normally have two eyes that look exactly the same. Similarly with the color of your skin, you cannot say you are black and white at the same time; you only fall into one category. That is what we call mutually exclusive: you cannot fall into the other categories.

Are there any questions?

Well, you are simplifying it. It makes sense how you are elaborating, thanks.

Cool. Oh, welcome! I didn't know that we've got another person joining. Because I'm sharing my screen, I'm not able to see who is online.
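To make the four levels of measurement concrete, here is a minimal Python sketch. All the values are made up for illustration and are not from the session; the point is only to show which kind of summary each level supports, and how averaging five-point questions can operationalize a latent construct such as a depression level.

```python
from statistics import mean, median, mode

# All values below are invented purely for illustration.
eye_colour = ["brown", "blue", "brown", "green", "brown"]  # nominal: categories, no order
service_rating = [1, 3, 5, 4, 3]                           # ordinal: 1 = poor ... 5 = excellent
temperature_c = [-4.0, 0.0, 12.0, 20.0]                    # interval: zero is just another value
weight_kg = [58.0, 71.0, 64.0, 90.0]                       # ratio: zero means "nothing"

# Nominal data can only be counted, so the mode is the natural summary.
most_common_colour = mode(eye_colour)

# Ordinal data can be ranked, so the median (middle value) is meaningful.
middle_rating = median(service_rating)

# Interval data has known, constant distances, so the mean is meaningful.
average_temp = mean(temperature_c)

# Ratio data has a true zero, so "times as heavy" comparisons make sense.
weight_ratio = max(weight_kg) / min(weight_kg)

# Operationalizing a latent construct: average one respondent's answers to
# several five-point questions into a single score (hypothetical answers).
item_responses = [2, 4, 3, 5, 1]
depression_score = mean(item_responses)

print(most_common_colour, middle_rating, average_temp, depression_score)
```

The choice of one summary per level is a simplification: anything allowed at a lower level (such as the mode) is also allowed at the higher ones.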
Welcome, and if there are any questions while I'm explaining, you are more than welcome to stop me and engage on that topic as well.

Okay, so moving on to the next thing that we need to learn. We've learned that you need to understand the type of data that you have, so now we know that we've got categorical data and quantitative data. When you have that kind of data, you need to know how you are going to summarize it after you have collected it. You developed your questionnaire, you have your data from the primary or secondary source that you used to collect it, and you have classified it: this is numerical data, this is categorical data. How am I now going to analyze this to answer my research question? That's the next step. There are two types of analysis that you can do. You can do a descriptive analysis, which is just a method of describing and summarizing your data in terms of its location, its dispersion and its distribution. You can summarize it by means of tables and charts. When I talk about tables, I'm talking about a cross tabulation, a list or a contingency table. When you attend one of the later sessions or go to our YouTube channel, you can learn more about what we mean by the types of tables: what a contingency table is, what a list table is and what a summary table is. We also have the frequency distribution table, but that is a discussion for another day. You can also summarize and visualize the data in graphs, using your bar charts, pie charts, histograms, scatter plots, Pareto charts and so on. Nowadays some tools have nice
visualizations that Excel doesn't have. There are other tools out there that you can use, like R, SAS or Python, and nowadays we also have Tableau and Power BI. You can use them to create those nice graphs, and the visualization types within these platforms and tools keep improving, because people have become so creative in creating them.

The other method that you can use to analyze your data, if you're not visualizing it or putting it in tables, is what we call summary statistics. Those are measures that you calculate, for example if you want to understand the average, the maximum value, the minimum value, the middle value, or the number that repeats the most. You use those descriptive statistics, or summary statistics, to describe your data. Later on we can discuss what we mean by those descriptive measures. We can look at the location of the data by means of the average, the number that repeats the most, which we call the mode, or the middle value, which we call the median. We can also use the box plot, where we look at the smallest value, the highest value, the 25th percentile and so on. Or we can look at dispersion, where we look at how far apart the values are from the mean by means of your standard deviation, your variance or your coefficient of variation, to look at the risk levels or the spread of your values away from the mean. Or we can look at other measures, like probabilities. There are many other measures: you can look at the distribution of your data, looking at things like z-scores and skewness measures, which tell you whether the data is positively skewed or negatively skewed. That will also help you when you do your data analytics, because the type of method you will need
to use to analyze skewed data is something you need to know. You cannot use a method that assumes a normal distribution, for example an ANOVA test, when your data is negatively skewed or not normally distributed; there are certain tests you cannot do. You need to be able to understand that, and later on we can discuss it, or you can watch the YouTube videos where we explain in detail the type of analysis you need to do on the type of data that you have, based on what you found with your descriptive statistics. So that is how you can summarize, present and describe your data.

Then we have what we call inferential statistics. Inferential statistics is where we infer, or make predictions, from your data. This method will tell you something about your population based on your sample. Later in this session we will discuss what a population is and what a sample is, but inferential statistics is where we use the sample data to infer what the population should look like. That is why for today's session we're only going to pick the probability sampling methods: when you use inferential statistics and you are going to infer the results back, you need to make sure that the sample you are using is a true representation of your population. It mimics the population. It doesn't have to be exactly 100% like your population, but it should at least align with your population in terms of proportions or weighting. That is why we need to use probability sampling. If you don't, you will be using non-probability methods, like convenience sampling and all that, and then the results you get out of the analysis you cannot infer back to the
population; they only relate to the group of individuals or items that you have studied. You cannot generalize your results. But when you use probability sampling methods, you can generalize your results to your population, or to another population, especially if that population looks similar to your sample.

The types of methods you will use for inferential statistics are more around estimation, like regression, where you estimate the next point, or forecasting, or where we estimate the population weight or mean from the sample weight or mean. Or you can use hypothesis testing. Later on we will discuss what hypothesis testing is, but you can use it to test the claims or assumptions that you want to make about your population based on your sample. So, like I said, inferential statistics is the process of drawing a conclusion or making a decision about the population based on the results you get from your sample. Are there any questions? Is that clear?

Yes, so far so good.

Okay, cool. We spoke about inferential statistics and I kept on talking about the population and the sample, so how do we define those two terms? A population is the set of all individuals or elements that you are interested in studying. Studying the whole population might be costly and time-consuming, because it might take you forever. For example, in South Africa we've got almost 60 million or more people. Even Statistics South Africa cannot reach each and every person in South Africa; that is why they use weightings, and we will talk about the difference between a population and a census. So it's very costly and time-consuming, because the population is huge, and you cannot reach everyone. That is the population, and when we do some measures, like the descriptive analysis of some
of the measures (I need coffee or tea or something, my stomach is grumbling). Let's say you got some research funding that allows you to take two years to reach all corners of South Africa, interview everybody, and find out everything you need about the population of South Africa. Then you come back and do your analysis. The measures you calculate from the population, for example the average, the mode, the median and the standard deviation, we call parameters. Now I'm getting too statistical, too technical, but you need to understand that parameters are what we measure on a population. That is why you will not be talking about parameters anytime soon, even when you are doing your research, unless you're referring to parameters in tools like Python and R, when they ask you to change this parameter. That is not what we're referring to here; the parameters here are measures that come from a population.

Since the population is big and challenging to study, we need to sample from that population and create a small group of individuals or items that is manageable, that is not going to take us forever to analyze. That is what we call a sample: a sample is just a subset of your population. That is why we're going to discuss the sampling methods, in terms of how you select that sample. Once you have your sample, your subset, that small group, the measures you calculate from it, like your mean, your mode, your standard deviation and all that, are what we call statistics, and that is the term you're going to keep hearing throughout statistics. That is why, when we say we are
inferring what we've learned from the sample data about the population, this is what we mean: we're using the sample statistics, the sample measures, to estimate what the population would look like. That is why the term statistics will go on and on, because every time we do an analysis we are using statistics. I hope that clarifies what a statistic is and what a parameter is, and why we say statistical research: it's because we're using the sample, we're not using the population. Okay, so how do we then draw this sample? You can draw it using many methods, but I'm only going to concentrate on three probability sampling methods. Like I explained previously, there are probability sampling methods and there are non-probability sampling methods. The non-probability sampling methods are like your convenience sampling; there are other methods, it's just that for today my mind is not there yet. With the probability sampling methods, what we're trying to do is take every decision we come to using the sample statistics and infer those results back to the population, and we've got three methods that we can use to do that. The simple one is called simple random sampling. It is a method of selecting a sample so that every possible sample with the same number of observations is equally likely to be chosen. In other words, every member of that population has an equal chance of being selected to be part of the sample. So let's say, for example, you are studying at the University of Pretoria. All students at the University of Pretoria are classified as the population, but you're only interested in learning about those students who are studying a particular program. Then you're going to select
only from those who are doing that program. We can narrow our population from being the whole University of Pretoria to being that program, so now our population is everyone at the University of Pretoria who is studying in that program. Let's assume that there are thousands of students doing that program. In order for me to select a sample of 20 students from that program, I need to make sure that every person in that program has an equal chance of being selected. I go and select one person, then another person, then another, until I have my sample. That is simple random sampling: everyone has an equal chance of being included in the sample. Another good example is if you have ever participated in a draw where they say, everybody who is here, buy a raffle ticket and put your name in the hat, and someone puts their hand in the hat and pulls out a name. That is what we call simple random sampling, because the population is everybody whose name was in that hat, and everybody had an equal chance of being selected, but we only selected that one person. So nowadays, when you are watching TV and there is that draw they do for the Nedbank PSL thingy, you must always remember simple random sampling: everybody who is in that bowl had an equal chance of being selected and paired with another team. That's simple random sampling. So it will not go away from your mind, and you will be teaching your children as well going forward. I'm just joking. Then we have what we call a stratified random sample. With a stratified random sample everybody also has an equal chance of being selected, but you need to make sure that you first divide the population into mutually exclusive groups. So, for example, if you know that you're going to be classifying people in terms of race, language and so on, you need to group them into those groups,
subgroups. So, for example, I need to make sure that I select black people, white people, colored people, Indian people, Chinese people, whatever groups you have. You take all of them and place them in different groups, and from those groups you go and select your sample. If you are going to use proportionate sampling, for example, let's say you've got a population that is 60 percent African, 20 percent colored, 10 percent here and 10 percent there. Then 60 percent of your sample will be selected from black people, 20 percent from colored people, 10 percent from white people, and so on, and that will create your sample. Let's say you wanted 20 people. What is 60 percent of 20? Now I need to use a calculator; I'm a statistician, but I don't know these things by heart because I didn't train my mind to be like that. So I need 20 people and 60 percent of them from black people, which means 12 people will be from the black people group. I'm going to go there and select 12 people, one after the other, until I have 12, so everybody in the black community would have had an equal chance of being selected. Then we have what we call cluster sampling. Cluster sampling is almost like stratified random sampling, in that you also group your people, this time into clusters. For example, if I live here in Cape Town, I can say everybody in the City of Cape Town is one cluster, everybody in the Karoo is one cluster, and everybody in Stellenbosch is one cluster. Depending on how many people I need, I'm going to select 10 people from the City of Cape Town, 10 people from Stellenbosch, 10 people from the Karoo, and 10 people from wherever, until I make up my sample. So it looks almost similar, but with stratified sampling you use proportions as well. Okay, so you do
have other sampling methods, like, for example, the k-th method and convenience sampling. Convenience sampling is whatever the researcher prefers, and it is a non-probability sampling method, because the researcher can just walk down the street and say, I want to interview this person, and then I see that one and I'm like, that one doesn't look friendly, so I pass them and go to the next one. The k-th method you do systematically; it's what we call systematic random sampling. If you're doing it by house, you start from the first house, then you select every fifth house in the row: you go to the fifth house, then the next fifth, and so on until you have collected your whole sample. Like I said, there are so many types of sampling methods, but these three are the most used and relied-on methods, especially if you want to use probability sampling, because stratified random sampling can keep your sample proportionate to your population, so you get the ratios right, and when you get the results you are able to generalize them back to the population. Are there any questions? We are on a roll, and since I have been talking a lot, and I don't like to talk by myself, are there any questions? I usually have some questions and activities that we go through, but for today I thought, no, let me not scare you, so that you can participate in future events. Are there any questions? Let me ask at the end; I have a question, but it's on qualitative research, about sampling, so I'll ask it at the end. Okay, all right. We're almost done; by half past I should be done with my slides, and then we can have any conversation we want for the next 30 minutes until 12 o'clock. So let's get on with it, we are almost there. So we've been
talking about what statistical research looks like. Remember, we started by explaining that you need to formulate your research question. Your research question will come with some assumptions that you want to test in order to answer it, and those assumptions are what we call hypothesis statements, let's put it that way. So we need to hypothesize that question, and maybe create multiple hypotheses, because in quantitative research, or any other research, usually they will say: have a research question, and under your research question have your research objectives, and those research objectives should have hypothesis statements. Or they can have sub-research questions, or sometimes they say have a research aim, the overarching one, and then the sub-research questions, things like that. So depending on the structure where you are studying, you must follow what they ask you to do. But usually when you create your research question you will have your research objectives, and your research objectives will be linked to your hypotheses most of the time, or you create your research question and sub-research questions, and those sub-research questions will have hypotheses so that you can answer them. Right, so how do we do hypothesis testing? Hypothesis testing is just a method of testing a view or an assumption that you have. If you've got a claim, how do we prove that claim? That's hypothesis testing. The first thing you always need to remember is to state your hypothesis statements, and there are two statements that you need to state and test: there is a null hypothesis and there is an alternative hypothesis. There are always two sides to a coin, something like that. Or let's use the one that we use in law, where we say you need to be
proven guilty or not. Let me not use that, because I'm going to butcher it, but you're either guilty or not guilty, so how do we prove that you are not guilty? Something like that; that's what hypothesis testing is about. So you've got your null hypothesis, which is what the researcher is claiming to be true, and you have your alternative, which is the opposite of that claim. You always have those two sides, like a head and a tail: you want to prove that the coin is landing on a tail, and the other side says no, it's going to be landing on a head, something like that. Then you also need to define what type of method you're going to be using. The way you pose your hypothesis statements can influence what will happen at a later stage, so you need to be very careful when you state them. When we go into quantitative statistical research in detail, you will learn what you need to be stating in the null and alternative hypotheses, and what you are trying to prove. For example, is it a one-directional test or a two-directional test? Is it less than, is it greater than, is it equal or not equal? Those kinds of things you need to understand. We do discuss this in detail in some of the videos on YouTube, so you can go and follow, and at a later stage we will have a session where we discuss hypothesis testing in more detail, so let's not go into detail now. Once you have your hypothesis statements developed, you need to define what the decision method will be. Like I said, the directional and non-directional statements determine how you're going to make your decision: are you going to be using a one-directional test or a two-directional test? That will help when you're making a decision, because we
have these statistical tests that you do, and they produce some calculated values, called the p-values or the critical values, and we use them to make those decisions. In order to know whether you are doing a one-directional test, that is, a one-tailed test, or a two-tailed test on both sides, you need to understand those decision methods well in advance, before you even go into your analysis, so that when you get to your conclusions you know what you are concluding about. Then you need to understand how to do the calculations. You will go and collect your data, but you need to know which statistical technique, I'm going to call it the technique or test statistic, which is a calculation or formula, you will need to use: are you doing an ANOVA, a t-test, a z-test or an F-test, things like that. So you need to know what it is that you're going to be calculating, and there are some assumptions that you need to be aware of as well. If I'm working only with sample data, with no population data, then I must know that I'm doing a t-test and that this is the formula to use in that t-test. Whether I'm using Excel or another program, I need to be sure that I am selecting the right statistical test. Once you have calculated the statistical test, it produces some summary statistics or measures, and you use those measures to make your decision whether to reject your null hypothesis or to accept it. Usually we say reject or do not reject the null hypothesis. Then you make your conclusion and go back to your research question and say, this is the answer, but in a statistical research manner. Once you get to this point, you also need to look at what other people have said before
and relate to that: are your results supported by previous researchers or not? That is part of the conclusions you would be writing in your research papers or research articles, but for the purposes of this session we're just going to concentrate on making decisions based on the information that is in front of you. We will have a session that delves deep into hypothesis testing: how do we do it, how do we calculate it, how do we interpret the results, and how do we make decisions out of it. But there are steps that you always need to remember, and if you didn't follow them when you go to test that hypothesis, it means you are missing some of the steps in between. So now you know what hypothesis testing is all about: you have your research question and you have stated your hypotheses. But before you do that, look at your topic. It might say, I want to find the relationship between student performance and race, something like that; it's just an awkward research question that I'm posing. The minute I ask, is there a relationship between student performance and race, the keyword there is relationship. Student performance, is it categorical data or numerical data? Race is categorical data. So if I'm going to be using categorical data, then I also need to know which statistical test I need to be doing. That is what we are coming to at this point: what you are going to be testing, the research question you have, has an implication for which statistical test you select, and, like I said, assumptions need to be made. For example, if you want to test whether there is a difference for one group, say the performance of males, you also need to ask yourself: do I have the population standard deviation, or I don't
have the population standard deviation? When you are able to answer that question, you will know which statistical test you're going to be using, regardless of whether you're using R, Python or Excel. You need to ask yourself: am I testing the difference for one group? Am I given the population standard deviation? If I know what the population standard deviation is, which is the measure of dispersion, then I need to do a z-test, which means I'm going to be using a z test statistic. If my population standard deviation is unknown, and I'm only using my sample data and my sample statistics, which means my sample standard deviation, then I'm going to use a one-group t-test. Now, if my research question says I need to find the difference between two groups, for example males and females, or two groups of those who are doing first level, something like that, I need to ask myself: are these two groups dependent or independent? What do I mean by dependent and independent? Dependent groups influence one another; one person can be in more than one group. For example, if I'm looking at students who are doing modules in this program, they can be doing different modules in that program at the same time, so those are dependent groups: one person can be in multiple groups, and the groups depend on each other. Independent groups do not have any bearing on one another. They are distinct: one group does not have any characteristics of the other group, and one person cannot be in two groups. Take males and females: if I'm using the gender identities of both groups, a person
identifies as male and another person identifies as female, so those are two groups. If I have two independent groups, I need to know that I am looking at the difference of means between those two groups, and the type of test statistic I'm going to be using is the one for independent groups. If I have two dependent groups, I'm going to be using the test for paired or matched groups. These are what we call the before and the after: I can do a test before and a test after on the same thing. The example I always use is that I can test the performance of my staff before they go for training. If I work in a call center, I take the ratings from the survey that we got before they went on customer service training, then they go on the training workshop, and I take the ratings after that. So I have before and after, and I can compare the two, and that is what we call dependent groups. In health they usually talk about the controls and the placebos, something like that, but that's not the same thing. The independent groups would be those who receive the drug and those who don't receive the drug. But if I look at the same people before they take a drug and after they take it, and I want to see whether their behavior changed, whether there is a difference before and after, then you are working with dependent groups. You need to know which ones you are using, because if you look at the statistical tests you will see that they are different. Even if both are t-tests, the way you calculate them is different, which means the results you get will be different. Now, when we talk about the
relationship, so is there a relationship between two variables, the thing you need to ask yourself is: am I looking at numerical data or categorical data? Remember that numerical data can be classified as ratio or interval. If I'm looking at two numerical variables, then I'm going to use what we call Pearson's r. We all know those correlation numbers that we get, 80 percent correlated, 20 percent correlated, negatively or positively correlated; those are the ones we're going to be looking at. If I'm looking at two categorical variables, then I'm going to be looking at what we call a chi-square test for independence. What is missing here is not usually used by those who are in educational research or psychological research, or I think also in health, but in things like insurance and finance they can use it. It's what we call a goodness-of-fit test, which only looks at one variable; we look at that one variable across groups, and we use the goodness-of-fit test to find that relationship. But if I'm looking at two categorical variables, then I'm going to be using the Pearson chi-square test. What we have just spoken about here are parametric tests. With parametric tests we're making assumptions, usually that our population is normally distributed, and therefore we can use them. There are other tests that you can run, where you cannot generalize the results as well, which we call non-parametric tests. For example, there is a Wilcoxon test that looks at the difference between two groups, which is almost similar to the test for independent groups, and we also have the Wilcoxon signed-rank test, which looks at before and after. But the results you get from those tests you cannot generalize. So, for example, if you know that you used a convenience sampling
method, then you cannot use any of these methods; you will need to use those non-parametric methods that I referred to. There is also Spearman, also for non-normally distributed data. We will discuss those in detail in other sessions as well, but these are just some of the concepts, methodologies and techniques that you need to learn and know in order to do quantitative statistical research, where you want to make assumptions about your population and infer back to it. On that note, I think that was my last section when it comes to the content, and I think we are right on time: I said we would finish at half past 11, and we are at that point. Remember also that I do have a YouTube channel. It's not a company YouTube channel, but I'm using it interchangeably as both. Remember to go there and subscribe. You can also join the channel as a way of supporting me, because running free sessions like this for two hours doesn't pay the bills, and I do appreciate the support that people give. So you can subscribe, or join the YouTube channel and pay a small subscription fee. Some videos are free to watch; for example, today's session video will be free, I will upload it and everybody can watch it, but some videos are only available to those who join and pay the subscription fee. Feel free to go and subscribe to my YouTube channel. It has grown, and I really thank those who subscribe to my channel; they keep me going. Also, do comment on the videos and give me suggestions about what other topics you might be interested in in future. If you want to get hold of me, or us at Pambili Analytics, these are our contact details. You can also find us on the website, email us, or WhatsApp us. Pambili Analytics, web success and data analytics methods, that's where we are, and we support you. So let's have a discussion, a Q&A session. I'm going to stop sharing so that I can see you. Are there any
questions? Can I see? On that note, if there are no questions, adieu, see you next time. Bye.
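The sampling arithmetic from the session (60 percent of a sample of 20 is 12 people) can be sketched in Python. This is only a minimal illustration under stated assumptions, not material from the session itself: the function names, the group labels A to D, and the population of 1,000 people are hypothetical. Only the 60/20/10/10 split and the sample size of 20 come from the talk.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member has an equal chance."""
    rng = random.Random(seed)  # seed is only there to make demos repeatable
    return rng.sample(list(population), n)


def proportionate_allocation(strata_sizes, n):
    """Split a sample of size n across strata in proportion to their size.

    Rounding can make the parts sum to slightly more or less than n for
    awkward proportions; this sketch does not correct for that.
    """
    total = sum(strata_sizes.values())
    return {name: round(n * size / total) for name, size in strata_sizes.items()}


def stratified_sample(strata, n, seed=None):
    """Take a simple random sample inside each stratum, sized proportionately."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportionate_allocation(sizes, n)
    return {
        name: rng.sample(list(members), allocation[name])
        for name, members in strata.items()
    }


# Hypothetical population of 1,000 people in four groups (60/20/10/10 percent).
strata = {
    "A": [f"A{i}" for i in range(600)],
    "B": [f"B{i}" for i in range(200)],
    "C": [f"C{i}" for i in range(100)],
    "D": [f"D{i}" for i in range(100)],
}

allocation = proportionate_allocation({k: len(v) for k, v in strata.items()}, 20)
sample = stratified_sample(strata, 20, seed=1)
print(allocation)
```

With these assumed group sizes the allocation works out to 12, 4, 2 and 2 people, which matches the 12 people worked out by hand in the session.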
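The questions the session suggests asking before picking a test (difference or relationship, one group or two, population standard deviation known or not, dependent or independent groups, numerical or categorical data) can be written down as a small lookup. This is a rough sketch of those rules of thumb only; the function name `choose_test`, its parameter names, and the returned labels are assumptions made for illustration, and it does not run any test or check any assumption such as normality.

```python
def choose_test(goal, groups=1, sigma_known=False, paired=False, data_type="numerical"):
    """Return the name of the test suggested by the session's rules of thumb.

    goal        -- "difference" or "relationship"
    groups      -- number of groups compared (1 or 2), used for "difference"
    sigma_known -- True if the population standard deviation is given
    paired      -- True for dependent (before/after) groups
    data_type   -- "numerical" or "categorical", used for "relationship"
    """
    if goal == "difference":
        if groups == 1:
            # Known population standard deviation -> z-test, otherwise t-test.
            return "z-test" if sigma_known else "one-sample t-test"
        if groups == 2:
            # Before/after measurements on the same people are dependent groups.
            return "paired t-test" if paired else "independent-samples t-test"
        raise ValueError("this sketch only covers one or two groups")
    if goal == "relationship":
        if data_type == "numerical":
            return "Pearson correlation"
        if data_type == "categorical":
            return "chi-square test of independence"
        raise ValueError("data_type must be 'numerical' or 'categorical'")
    raise ValueError("goal must be 'difference' or 'relationship'")


# Example: call-center ratings before and after customer service training.
print(choose_test("difference", groups=2, paired=True))
```

Cases the talk only mentions in passing, such as ANOVA, the goodness-of-fit test, or the non-parametric alternatives, are deliberately left out of the sketch.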