Hello everyone, and welcome to another episode of Code Emporium, where we're going to talk about probability distributions. There are certain distributions we study in academia and others we actually use in industry, and there are some differences between them. I wanted to highlight what we use in industry and give concrete example use cases for every distribution we cover. We're going to talk about five of them, in detail. Before we get started with the video, if you could do me a favor and hit that like button and subscribe for more videos like this: when you like the video, more people get to see it, and the channel grows. Also, do check out our Discord server, linked in the description below; we're talking about some amazing things there, so let's create a community together. With that, let's get back to the video.

Coming up at number one, we have the normal distribution: the standard bell curve that everybody knows and loves, and probably one of the most studied distributions in academia. In industry, though, we don't have extremely apparent applications of the normal distribution; instead, it's super important as a hidden concept. For example, take hypothesis testing. Hypothesis testing is a framework for A/B testing where we state a hypothesis, gather and collect data, try to prove or disprove that hypothesis, and then come to a conclusion of either rejecting or failing to reject it. In order to reject a hypothesis, we use statistical tests, and these statistical tests sometimes come with an assumption of normality. So what exactly is normally distributed, or what is the requirement in these statistical tests to be normally distributed?
The main assumption here ties into the central limit theorem. Essentially, it states that the distribution of sample means is approximately normal. The central limit theorem relates the population to the samples we take from it. So if we have a population of 500,000, take a sample of size 1,000, compute the mean of that sample, and repeat this procedure again and again, say 400 times, to get 400 means, then the distribution of those means should follow a normal distribution. This typically holds as long as the sample size is at least around 30, so it's not too strict a condition. However, certain statistical tests make an additional, much stricter assumption: the population from which we are sampling should itself be normally distributed. The idea behind a lot of these tests is to make sure a single sample is representative of the population, and when comparing populations and samples, it's easiest to compare their means. So we approximate the population mean with the sample mean, and the sample standard deviation then quantifies how erroneous our estimate of the population mean is. Now, in order to simplify the mathematics for all of this, we need the sample mean and the sample standard deviation to be independent of each other, and this is only true for normally distributed data. That's why we see this assumption when doing something like a t-test, and in this way the assumption of normality becomes a cornerstone of statistical testing.
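The sampling procedure above can be sketched in a few lines of Python. The population here is a hypothetical skewed, exponential-like one, chosen just to show that the sample means still come out clustered around the population mean even when the population is not normal:

```python
import random
import statistics

random.seed(42)

# A decidedly non-normal population: 500,000 exponential draws.
population = [random.expovariate(1.0) for _ in range(500_000)]
pop_mean = statistics.fmean(population)

# Repeatedly draw samples of size 1,000 and record each sample's mean.
sample_means = [
    statistics.fmean(random.sample(population, 1_000))
    for _ in range(400)
]

# The 400 sample means cluster tightly around the population mean,
# with a much smaller spread than the population itself,
# even though the population is heavily skewed.
print(round(pop_mean, 2))
print(round(statistics.fmean(sample_means), 2))
print(round(statistics.stdev(sample_means), 3))
```

Plotting a histogram of `sample_means` would show the bell shape directly; here we just confirm numerically that the means concentrate around the population mean.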
Another case where we see the assumption of normality is in linear regression, where we assume that the distribution of residuals follows a normal distribution. What this means is that in machine learning training, we have a label and a certain algebraic expression of features that we compute, and we want that expression to be as close to the label as possible. However, the prediction is typically not going to be exactly equal to the label, and we are left with some small amount of error. This error arises because we don't have all the features needed to exactly compute the label, or because of randomness inherent in the system, or for other reasons too. What we need here is that if we were to compute the error for every single training sample and then plot the distribution of those errors, they should have some sort of central tendency and follow a normal distribution. If this assumption is not met, then we won't be able to accurately draw a line that actually fits the data with linear regression. I hope it's pretty clear now how we actually use the normal distribution in industry: it's not always 100% apparent, but it's still a super important concept that you need to know.

Coming up at number two, we have a relative of the normal distribution, and that is the log-normal distribution. This is typically used to model data where we have a lot of samples with a small value and a few samples with a large value, and we actually see the log-normal distribution far more obviously in practice than the typical bell curve.
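The residual check described above can be sketched as follows. The data here is synthetic, assuming a hypothetical true line y = 2x + 1 plus normally distributed noise, and the fit is ordinary least squares for a single feature:

```python
import random
import statistics

random.seed(0)

# Synthetic data: a true line y = 2x + 1 plus normally distributed noise.
xs = [random.uniform(0, 10) for _ in range(1_000)]
ys = [2 * x + 1 + random.gauss(0, 0.5) for x in xs]

# Ordinary least squares for one feature: closed-form slope and intercept.
x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

# Residual = label minus prediction. For a well-specified linear fit,
# the residuals center on zero and look roughly bell-shaped when plotted.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(round(slope, 2), round(intercept, 2))
print(round(statistics.fmean(residuals), 3))
```

In practice you would histogram the residuals (or use a Q-Q plot) to eyeball the normality assumption; if the residuals are skewed or fan out, the linear model is misspecified.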
For example, take an e-commerce platform where we plot the distribution of orders across all products: you'll see a lot of products with a small number of orders and a few products with a lot of orders. Example number two: on social media, plot the distribution of users' subscriber counts and see what distribution it actually follows, and you'll find that a lot of users have a small subscriber count while a few users have a very large one. Example number three: we can also model the distribution of household income. A few households make a lot of money, but a lot of households make comparatively little, so we get a long-tailed, log-normal-like distribution. So you can see that the log-normal distribution can be used to approximate many different types of data, and it's also quite apparent in practice. More technically, we can tell whether something behaves like a log-normal distribution by taking some samples, taking the logarithm of those values, and plotting the distribution of the results: if the data is log-normal, you'll see a normal distribution, a bell curve. Clearly this distribution is super versatile and can be used in a host of applications to approximate many kinds of data across varying fields. I'll link some more resources down below on where you can find the log-normal distribution.

Coming up at number three, we have the uniform distribution. The uniform distribution comes to mind when I talk about randomness: even though all distributions exhibit some form of randomness, the uniform distribution allows us to select every sample with equal probability.
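The log-transform check just described can be sketched like this, using hypothetical heavy-tailed data drawn with `random.lognormvariate` (the parameters 3 and 1 are the mean and standard deviation of the underlying normal, chosen arbitrarily for illustration):

```python
import math
import random
import statistics

random.seed(7)

# Heavy-tailed, log-normal-style data: many small values, a few huge ones.
# The underlying normal has mean 3 and standard deviation 1.
values = [random.lognormvariate(3, 1) for _ in range(100_000)]

# The raw data is badly skewed: the mean sits well above the median.
print(round(statistics.fmean(values), 1), round(statistics.median(values), 1))

# After taking logs, the skew disappears and we recover the bell curve:
# the mean comes out near 3 and the standard deviation near 1,
# matching the underlying normal.
logs = [math.log(v) for v in values]
print(round(statistics.fmean(logs), 2), round(statistics.stdev(logs), 2))
```

With real data (order counts, subscriber counts, incomes) you would do the same thing: log-transform, then histogram and check for a bell shape.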
A good place where I've used this at work is in Monte Carlo simulations. Let's say we wanted to simulate a game of bingo, where every player is given a five-by-five grid containing random numbers between one and ninety, and a caller draws numbers from the bingo ball and calls them out in random order. To generate the called numbers, you would sample from a uniform distribution without replacement, and to generate the five-by-five board for each player, you can sample from another uniform distribution: the probability of any one number between one and ninety being called is the same, and the probability of any number occupying a given square is also the same. There are many other uses for the uniform distribution, but I find it pretty useful in this context, at least at work.

Coming up at number four, we have the beta distribution. The beta distribution is a distribution over probability values: if you sample from a beta distribution, you get a number between zero and one that typically corresponds to a probability. In industry we tend to use this for modeling metrics such as click-through rate or purchase conversion, which lie between zero and one or can be represented as percentages. Let's now talk about a super practical example of how we would use the beta distribution, specifically in Bayesian testing. Bayesian testing is another framework for implementing A/B testing, kind of like hypothesis testing, but in this case we compare two probability distributions based on prior data: we have prior knowledge of how a system works, we collect experiment data, we update our beliefs to reflect what we have seen during that experiment, and then we make a decision.

In our example, let's say I work at YouTube and I have designed an amazing new recommendation system, and I want to test its effect on click-through rate: we are going to choose the algorithm that definitively has the higher click-through rate. With this, I can create the experiment and obtain a distribution of click-through rates for our old system and a distribution for the new system, both of which are now beta distributions, and then we can compare them and decide whether or not we should go with the new algorithm. So the beta distribution becomes super useful in this context.

The final distribution that is super useful to know is the chi-squared distribution. For this one, if we were to take a standard normal distribution, sample from it, square each of those values, and then plot the distribution of the squares, that distribution would follow a chi-squared distribution (with one degree of freedom; summing the squares of k independent standard normals gives a chi-squared with k degrees of freedom). This is really useful in the machine learning context for feature selection. Specifically, for this context, we want to run a test of independence. Say I run an e-commerce platform, let's say the online side of Target, and I want to build a fraud classification system. I have a hunch that whether or not a person has diapers in their cart could be indicative of whether or not that person is committing fraud, so I want to test this out. In this case, I have a fraud label, which is binary, and a diaper feature, which is also binary: you either have or don't have a diaper in your cart. We make some observations over time, conduct a chi-squared test, get a chi-squared test statistic and a p-value, and based on that we can see how dependent or independent they are. For example, if we were to reject the null hypothesis that the two variables are independent of each other, then that means that there might be some sort
of dependence going on, and we could potentially use diapers in our feature set. Granted, correlation is not equal to causation, but that diaper feature could still be indicative of fraud and would warrant some further investigation.

Another place where we can use this is the goodness-of-fit test. Essentially, you use this when you're questioning how a certain system behaves. Let's give an example. Say I work for the Department of Education, and we are going to conduct some SAT tests: standardized, multiple-choice tests where each question has an A, B, C, and D bubble. The assumed hypothesis, or expectation, is that the probability that the answer to any given question is A, B, C, or D is 25% each, equal throughout. However, I have a hunch that maybe this is not the case: what if these questions are framed so that answer C is actually more probable than the others? I can test whether this is true with another chi-squared test, in the form of a goodness-of-fit test. Essentially, we have the expected proportions, 25% each for A, B, C, and D, and we have the observed proportions from the actual tests, which might come out to something like A at 20%, B at 25%, C at 35%, and D at 20%. We combine them to get a test statistic and a p-value using the chi-squared test, and then we can decide whether our hypothesis of a good fit is to be rejected. Both the goodness-of-fit test and the independence test are broadly similar, because they both compare categorical values, so they come under the same umbrella; but they're also very flexible and can be used in many applications in many ways.

And that's all we have for now. I hope you all enjoyed this video, and I hope a lot of this information makes sense from an industry standpoint of where you would actually use these distributions. This is a non-exhaustive list: there are definitely more distributions that you should know and that you'll probably pick up as you continue to work, but I hope this was still super useful in discussing the details of these five main distributions. Thank you all so much for watching, be sure to leave a like and subscribe, and join the Discord server down in the description below, because we're gonna have a lot of fun. I will see you all in the next one. Bye bye!