Hello everyone and welcome to another episode of Code Emporium, where we are going to talk about some mathematics that you need as a data scientist. I currently have two years of full-time data science experience, and I'm hoping this video can share some insight into how we actually use mathematics in real projects in the industry. In this video we are going to break mathematical concepts down into five major tiers. I'll start by mentioning the five concepts, and then for each one we'll go into details of how we would actually use it, with examples and sub-concepts. Before getting into the video though, please do give it a like, because the more you like, the more this video spreads to everybody in the world, and the process just continues. So please give it a like, subscribe if you like this content, and also join our Discord server, because we're going to be talking about some amazing things over there and we would love to have you. With that out of the way, let's get back to the video. Our first topic of mathematical interest is probability. Under probability, let me mention a few concepts that we actually use, starting with distributions. A distribution gives us the probability of occurrence of every item of interest, and these distributions come in many forms. The normal distribution can be used to model data with a central tendency. The log-normal distribution can be used to model data with many samples that have small values and a few samples that have large values. The uniform distribution can be used to model data where the probability of picking every item is the same. And like this, there are so many more applications too.
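To make those three shapes concrete, here's a minimal numpy sketch; the parameters, sample sizes, and seed are just illustrative choices, not anything from the video:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed for reproducibility

# Normal: data clustered around a central tendency (mean 0, std 1)
normal_samples = rng.normal(loc=0.0, scale=1.0, size=100_000)

# Log-normal: many small values and a few large ones (right-skewed)
lognormal_samples = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

# Uniform: every value in [0, 1) is equally likely
uniform_samples = rng.uniform(low=0.0, high=1.0, size=100_000)

# The sample statistics reflect each distribution's shape:
# the normal mean sits near 0, the uniform mean near 0.5, and the
# log-normal's mean exceeds its median because of the long right tail
print(normal_samples.mean(), uniform_samples.mean())
print(lognormal_samples.mean() > np.median(lognormal_samples))
```

Plotting histograms of these three arrays is a quick way to build intuition for how differently they spread out.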
So it's super important to at least understand these distributions at a high level, and then probably get into the details of how they relate to each other as well. The next topic in probability that I think is super important is maximum likelihood estimation. The broad framework of MLE says: suppose we're given a model and some training data. We want to estimate the parameters of the model such that the probability of seeing that training data is the highest, and we do this with maximum likelihood estimation. This is typically used during the training part of machine learning, and it works behind the scenes, where maximum likelihood estimation is used to derive a certain loss function that we wish to optimize, or in this case minimize. So if you've ever wondered how the loss functions for certain machine learning algorithms actually appear, then learning maximum likelihood estimation will help you understand where they come from. It is a super important topic to know, although we do not use it directly as much in day-to-day work. The next topic of interest is Bayesian testing. Bayesian testing is a framework for conducting A/B tests that builds on Bayes' theorem. It's an approach to A/B testing where we start with a certain assumption about how a system works, we conduct an experiment and gather some data, and then we update our prior beliefs to form a new understanding of the system. We can then use this new understanding to make decisions, for example on whether or not to make certain product feature changes. So it is super useful in the context of A/B testing, and I do encourage everybody to learn about it. Coming up on concept number two, we have statistics.
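As a rough sketch of that prior-update-decide loop, here's one common way to run a Bayesian A/B test with a Beta-Binomial model; the conversion counts are entirely made up for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical experiment data (invented numbers):
# variant A: 200 conversions out of 1000 visitors; variant B: 230 out of 1000
conv_a, n_a = 200, 1000
conv_b, n_b = 230, 1000

# Start from a flat Beta(1, 1) prior and update with the observed data;
# the posterior for a conversion rate is then Beta(1 + hits, 1 + misses)
post_a = stats.beta(1 + conv_a, 1 + n_a - conv_a)
post_b = stats.beta(1 + conv_b, 1 + n_b - conv_b)

# Estimate P(B's true rate > A's true rate) by sampling both posteriors
rng = np.random.default_rng(seed=42)
samples_a = post_a.rvs(size=100_000, random_state=rng)
samples_b = post_b.rvs(size=100_000, random_state=rng)
prob_b_better = (samples_b > samples_a).mean()
```

Here `prob_b_better` is the updated belief that variant B truly converts better, which is the kind of quantity you'd use to decide whether to ship the feature change.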
And for this, we're going to segue from Bayesian testing into another very famous A/B testing framework, which is hypothesis testing. Essentially, hypothesis testing is a framework for A/B testing where we state a hypothesis, gather some data, and then try to show how ridiculous that hypothesis is or is not. Based on that, we make certain product-level decisions. There are a bunch of statistical tests that come under this umbrella. For example, we can use the t-test to take two samples and compare their means. We have the chi-square test, which is used as a test of independence or a goodness-of-fit test, to see whether two categorical variables are in any way independent of each other. We also have non-parametric tests, which make fewer assumptions about the distribution of the data, such as the Mann-Whitney U test or the Wilcoxon signed-rank test. Learning these types of statistical tests, and what they actually do and do not prove, is super important for testing anything in data science, so I strongly encourage you to learn more about them. The next topic of interest here is the central limit theorem. Essentially, the central limit theorem is a hidden concept within the confines of A/B testing. It relates a sample to a population: the population is a group of interest, and the sample is a subset of that group. The core of what the central limit theorem states is that the distribution of sample means is approximately a normal distribution, and this is a requirement in many statistical tests, including the t-test. Understanding the central limit theorem will help you see where the derivation of the t-test comes from, where we get a test statistic, and how we get p-values. Coming up on our next concept, we have the law of large numbers.
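Here's a small scipy sketch of both ideas above: a two-sample t-test on invented data, plus a quick check that means of samples drawn from a decidedly non-normal (uniform) distribution still pile up in a bell curve, which is the central limit theorem at work:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)

# Two made-up samples, e.g. page load times under variants A and B
group_a = rng.normal(loc=5.0, scale=1.0, size=500)
group_b = rng.normal(loc=5.5, scale=1.0, size=500)

# Two-sample t-test: is the difference in means plausibly just noise?
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Central limit theorem in action: the means of repeated samples from a
# *uniform* distribution are themselves roughly normally distributed,
# centered on the population mean of 0.5
sample_means = np.array([rng.uniform(0, 1, size=50).mean() for _ in range(5000)])
```

A tiny `p_value` suggests the difference in means is unlikely to be chance alone, and `sample_means` will cluster tightly around 0.5 even though no single draw is normal.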
When I think of the law of large numbers, I typically think about convergence. So let's say we have a die. What is its expected value? It is the sum, over each side, of the probability of getting that side times the value of the side itself, and that expected value turns out to be 3.5. As you perform simulations of rolling that die, once, twice, a hundred, a thousand times, you will see that the average of all the rolls taken up to that point eventually converges to 3.5. What the law of large numbers states, more technically, is that the mean of a large number of trials converges to the expected value of the system. This is super useful, especially when we're carrying out Monte Carlo simulations. In fact, I've created a video on how I simulated a game of bingo with Monte Carlo simulations, and the law of large numbers inherently comes into play because of convergence, so please do check that out for more details. Another concept of interest is confidence intervals. Confidence intervals come into play whenever we're making estimates of anything. For example, we could be estimating the coefficients in a regression model, or estimating the output of the regression itself and how confident we are that it is close to the true value. Take the first case, the coefficients. When we estimate a coefficient, we can construct a confidence interval, which you've probably seen if you've used something called statsmodels: we predict the coefficient to be some value, but there is, say, a 95% chance that it lies in the range between A and B. That range between A and B is the confidence interval, and the narrower it is, the higher our confidence typically is.
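The die example can be simulated in a few lines; the seed and roll count here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Expected value of a fair six-sided die: sum of (value * probability)
expected_value = sum(face * (1 / 6) for face in range(1, 7))  # 3.5

# Simulate many rolls; the running average drifts toward 3.5
rolls = rng.integers(low=1, high=7, size=100_000)  # high is exclusive
running_mean = rolls.cumsum() / np.arange(1, rolls.size + 1)

print(expected_value)        # 3.5
print(running_mean[-1])      # close to 3.5 after 100,000 rolls
```

Plotting `running_mean` against the roll index makes the convergence visible: it is noisy at the start and flattens onto 3.5 as the number of trials grows.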
Also, whether both ends of a coefficient's interval share the same sign is a useful signal, for example for feature selection when reading a statsmodels summary. For the second application I mentioned, the output of a regression model, if we want a confidence interval we need to determine an upper and a lower bound. This is typically done with something called quantile regression, which I've illustrated in pretty good detail in another video. It shows that we made this prediction right now, but the value can range between A and B, with, say, a 95% chance that the true value lies in that range. That overall becomes the confidence interval, and it's pretty useful for getting to know the uncertainty of certain estimates. So it's a very important concept to know, and I do encourage you to learn it. Coming up on topic number three, we have calculus. Calculus is the study of how things change. A lot of what we use in calculus is not necessarily rooted in day-to-day work, but it is required for understanding how, for example, certain loss functions are created. I mentioned maximum likelihood estimation before: if we wanted to derive a loss function from the initial maximum likelihood statement, we would need a lot of calculus in between. So I've put it in here as a requirement to understand and to learn. Although you're not going to be using calculus directly in your day-to-day work, you still require it to understand how loss functions are created, how optimization works, and how things change in general. And so I've mentioned it in this list. Coming up on topic number four, we have algebra. Algebra is a huge umbrella over just about anything containing variables, but there are several subtopics here of quite high importance. For example, matrices and vectors.
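Circling back to confidence intervals for coefficients for a moment, here's a minimal numpy/scipy sketch of fitting a line by least squares and building the 95% interval around each coefficient, the same quantity a statsmodels summary reports; the data is synthetic with a true slope of 2.0:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)

# Synthetic data: y depends linearly on x (true slope 2.0, intercept 1.0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(0, 1.5, size=200)

# Ordinary least squares for y = b0 + b1 * x, via the design matrix
X = np.column_stack([np.ones_like(x), x])
coef = np.linalg.lstsq(X, y, rcond=None)[0]

# Standard error of each coefficient from the residual variance
resid = y - X @ coef
dof = len(y) - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

# 95% confidence interval: estimate +/- t-critical * standard error
t_crit = stats.t.ppf(0.975, df=dof)
lower, upper = coef - t_crit * se, coef + t_crit * se
```

The fitted slope `coef[1]` lands near 2.0, and `(lower[1], upper[1])` is the "range between A and B" described above; a narrower interval means a more precise estimate.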
Computers tend to process information in the form of matrices and vectors to take advantage of parallel processing and make millions and millions of operations happen much faster. This is how data is typically input to any model, and it is also how your machine learning models process it. So understanding matrices and vectors, and how matrix and vector calculations are performed, like multiplication, addition, and even matrix calculus, can be super useful. Another point of interest here is linear regression. I've put it as its own bullet point because I encourage everyone to take some data, say a two-dimensional set of points, and try to actually fit a line through it, using the mathematics to work out what the equation of that line would be. This is a fundamental baseline for how you can expect other machine learning models to work, although with other models it's going to be really hard to do the derivation on paper. With linear regression, I think it's a great start. Another topic of interest here is graphs, and by graphs I mean the visual Cartesian coordinate system. We can use graphs for so many things, from data visualization to visualizing how the loss decreases over time during training. In terms of data visualization, some of the charts you would need are line charts, box plots, scatter plots, correlograms (which are like heat maps), and so many more. Having all of these data visualization tools in your arsenal will help you visualize any kind of data that's out there and explain data in the form of pictures, which is just so much easier for stakeholders to understand, and even for you to understand internally. So please do learn that as well. Coming up next, we also have evaluation functions.
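The on-paper line-fitting exercise above can be checked numerically with the closed-form least-squares formulas; the five points here are made up:

```python
import numpy as np

# A small two-dimensional set of points (invented for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares fit for y = slope * x + intercept:
#   slope = cov(x, y) / var(x)
#   intercept = mean(y) - slope * mean(x)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(round(slope, 2), round(intercept, 2))  # 1.96 0.14
```

Deriving these two formulas by hand, by setting the derivatives of the squared error to zero, is exactly the kind of exercise that makes the more complex models feel less magical.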
So for example, if we're solving classification problems, then to evaluate how wrong or right our model is, we have metrics like precision, recall, and the F-score. For regression models, you would use something like the mean absolute error or the mean squared error. Learning these metrics will help you determine the performance of your models, so it's super useful to learn. Coming up on the fifth and final topic of today's video, we have optimization algorithms. Optimization algorithms essentially determine how we learn. A big prerequisite for understanding optimization is functions themselves: functions can have maxima and minima. Typically when we're learning in machine learning, the loss function starts at some arbitrary point, and we have to figure out a way to minimize that loss as much as possible in order to find a global minimum. So understanding functions, and the concepts of maxima and minima, is all intertwined with understanding optimization algorithms in general. A good first start is understanding how gradient descent works. From there, you can tweak it a little to get stochastic gradient descent, then mini-batch gradient descent, Adagrad, Adadelta, RMSProp, Adam, Nadam. There are so many out there, and you can keep tweaking these learning strategies to get new and potentially even better ones, depending on your problem. I made a video on optimization algorithms that describes everything I just mentioned and explains the differences between these optimizers, so I do encourage you to check that out. And that's all I have for you today. Just note that the five mathematical concepts I mentioned are definitely non-exhaustive, and topics like probability and statistics especially turn up everywhere.
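As one last illustration, the plain gradient descent that all of those optimizers build on can be sketched on a one-dimensional toy loss; the function, learning rate, and step count here are arbitrary choices for demonstration:

```python
# Minimize the toy loss f(w) = (w - 3)^2, whose gradient is
# f'(w) = 2 * (w - 3) and whose minimum sits at w = 3.
def gradient_descent(start, learning_rate=0.1, steps=100):
    w = start
    for _ in range(steps):
        grad = 2 * (w - 3)         # gradient of the loss at the current point
        w -= learning_rate * grad  # step downhill, scaled by the learning rate
    return w

w_final = gradient_descent(start=0.0)
print(round(w_final, 4))  # 3.0
```

Swapping in noisy gradients (stochastic/mini-batch) or per-parameter adaptive learning rates (Adagrad, RMSProp, Adam) is exactly the kind of tweak to this loop that produces the other optimizers mentioned above.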
They are used in so many more places; I could just go on and on, honestly. You've probably also noticed that in this video there's a lot of overlap between some of the concepts and where they're used. But I hope this still gave you a good idea and a good starting point for where to focus your time, depending on what you want to learn in mathematics, so that you can become a data scientist really soon, or just continue to be one if you already are. All of this is super fun, and I hope you get started learning. Thank you all so much for watching today, and if you liked the video, please do drop that like. Please subscribe, ring that bell, and join us on Discord, because we're going to create an amazing community with you. And until next time, I'll see you soon. Bye!