In a previous video, I went over the exact maths you need to land that entry-level data science job. In general, there are three areas you should be aware of: linear algebra, calculus, and statistics. Now, in my opinion, statistics is probably the most useful one to fully grasp, as it's the one you'll use most day to day, and machine learning even comes from statistical learning theory. With that in mind, I wanted to dedicate a whole video to a detailed and thorough statistics roadmap, so you can see how to go about learning it. Now, obviously, statistics is a massive field with a lot of active research going on, so you can't learn all of it. However, if you have a solid grasp and intuition of all the things I will discuss in this video, then you'll be in a really solid position at the start of your career. Let's get into it.

Wikipedia defines a statistic as "any quantity computed from values in a sample which is considered for a statistical purpose". Basically, a statistic is something that summarizes information about data, a sample, or a population. And so the first thing I think anybody in data science should be aware of is the different summary statistics out there. In general, summary statistics cover four key things: location, spread, shape, and dependence. And the key summary statistics you should know are mean, median and mode, variance, standard deviation, kurtosis, skewness, and the different correlation coefficients. Most of these are taught in junior or middle school in most countries, so you're probably aware of what they are already. However, if you're not too familiar, don't worry: at the end of the video, I have a whole resources section where I detail textbooks, online classes and courses to learn all the statistics discussed in this video.

As data scientists, we will frequently present our results and findings in a visual way.
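As a quick sketch of those summary statistics, here's how you might compute them in Python with NumPy and SciPy; the sample data here is made up purely for illustration:

```python
import numpy as np
from scipy import stats

# A small, made-up sample to summarize
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = np.mean(data)                          # location
median = np.median(data)                      # location, robust to outliers
mode = stats.mode(data, keepdims=False).mode  # most frequent value
variance = np.var(data, ddof=1)               # spread (sample variance)
std_dev = np.std(data, ddof=1)                # spread, same units as the data
skewness = stats.skew(data)                   # shape: asymmetry
kurt = stats.kurtosis(data)                   # shape: heaviness of the tails

# Dependence: Pearson correlation between two made-up variables
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
corr = np.corrcoef(x, y)[0, 1]

print(mean, median, mode)  # 5.0 4.5 4.0
```

Notice the `ddof=1` argument: it gives the sample (rather than population) variance, which is the one you'll almost always want when working with data.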
To do this well, we need to be aware of the different types of plots and graphs in order to pick the right kind of visualization for the task. Now, there are many visualizations out there, but the key ones I think you should know are bar charts, line graphs, scatter graphs, pie charts, histograms, box plots, heat maps, and contour plots. You can do fancier things with all of these, such as a stacked bar chart, but once you know the basics, those are quite easy to do.

When people think of statistics, they often think about probability distributions. Probability distributions are just a way of measuring how frequent or likely an event is to happen from some sort of statistical process. Their primary use in data science is to understand and model the relationship between the target variable and its features. This is really important because understanding the shape of that distribution will dictate the exact model you choose for your task. The most important distributions for you to know are the normal distribution, Poisson, gamma, exponential, chi-square, and the t-distribution. There are of course many others, but these are the ones you'll probably use most frequently, and the ones that show up a lot in industry.

Probability theory encompasses the whole mathematical and theoretical framework around how probability works. This area is quite large and is often referenced on its own when being compared to statistics. Anyway, the key things you should learn in this area are random variables, state spaces, population, sample, and standard error, maximum likelihood estimation, the central limit theorem, and the law of large numbers. Like I said, probability theory is quite a big field, but the list I gave you above covers the most useful concepts you should know for a career in data science.

How do you know if a result that you got from a study is indeed significant or not?
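Before we answer that, here's a minimal sketch of those distribution and probability-theory ideas in Python. The distribution parameters are arbitrary, chosen only to demonstrate the law of large numbers and the central limit theorem:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

# Draw samples from a few of the distributions mentioned above
normal_sample = rng.normal(loc=0.0, scale=1.0, size=10_000)
poisson_sample = rng.poisson(lam=3.0, size=10_000)
exponential_sample = rng.exponential(scale=2.0, size=10_000)

# Law of large numbers: the sample mean approaches the true mean
print(exponential_sample.mean())  # close to the true mean of 2.0

# Central limit theorem: means of many exponential samples look roughly
# normal, even though the exponential distribution itself is heavily skewed
sample_means = rng.exponential(scale=2.0, size=(5_000, 50)).mean(axis=1)

# The skewness of the sample means is far smaller than that of the raw draws
print(stats.skew(sample_means), stats.skew(exponential_sample))
```

This kind of simulation is a great way to build intuition: rather than memorising the theorems, you can watch them kick in as the sample size grows.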
Well, the way you do this is through hypothesis testing. Perhaps the most common hypothesis test, or at least the one we hear about most, is the A/B test, which is used by pretty much every company out there. To fully understand hypothesis testing, you need to know the following: confidence intervals; p-values and significance levels; alternative and null hypotheses; one- versus two-tailed tests; test statistics; and the z-test, t-test, and chi-square test. These concepts will pretty much cover all your bases, but the general thing you need to know is exactly how a hypothesis test works and which test to use in different scenarios.

The first algorithm most data scientists learn is linear regression. Now, despite machine learning being a relatively new field, linear regression is actually quite old and originates from the early 1800s. Linear regression is part of a bigger mathematical field called regression analysis. So as well as linear regression, you need to know multivariate linear regression, polynomial regression, generalized linear models, generalized additive models, and ordinary least squares estimation.

There are two schools of thought when it comes to statistics. The first one is frequentist, which is the one generally taught in schools and is the way we approach most statistics problems. However, there's a second one called Bayesian statistics. Now, even though Bayesian statistics is taught less, it's actually more in line with how we think as humans. And Bayesian statistics has been applied in a really positive way to a lot of the optimization and machine learning packages out there, so it's something we should definitely be aware of. The key things you should know for Bayesian statistics are marginal, joint and conditional probabilities, Bayes' theorem and Bayes factors, credible intervals, conjugate priors, and Bayesian regression.
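Going back to hypothesis testing for a moment, here's a minimal sketch of an A/B-style two-sample t-test in Python. The two groups are simulated, with a true effect size chosen purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Hypothetical A/B test data for two groups of users.
# Null hypothesis: both groups share the same mean.
group_a = rng.normal(loc=10.0, scale=2.0, size=500)
group_b = rng.normal(loc=11.0, scale=2.0, size=500)  # true effect of +1.0

# Two-sample (independent) t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b)

alpha = 0.05  # significance level chosen before running the test
if p_value < alpha:
    print(f"p = {p_value:.4f}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f}: fail to reject the null hypothesis")
```

The key discipline is picking the significance level before looking at the data, then letting the p-value tell you whether the observed difference is plausible under the null hypothesis.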
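And for regression analysis, ordinary least squares can be sketched directly with NumPy. The "true" coefficients below are made up so we can check that OLS recovers them from noisy data:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic data: y = 3 + 2x + noise (coefficients chosen for illustration)
x = rng.uniform(0, 10, size=200)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=200)

# Ordinary least squares: solve for beta in y ≈ X @ beta,
# where X has a column of ones for the intercept term
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

intercept, slope = beta
print(intercept, slope)  # close to 3 and 2
```

In practice you'd reach for scikit-learn or statsmodels, but seeing OLS as a single linear-algebra solve makes it much clearer what those libraries are doing under the hood.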
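To make the Bayesian side concrete, here's a minimal sketch of Bayes' theorem with a conjugate prior: a Beta prior on a binomial success rate updates to a Beta posterior in closed form. The observed counts are hypothetical:

```python
from scipy import stats

# The Beta distribution is the conjugate prior for a binomial likelihood,
# so the posterior is available in closed form (no sampling needed).
prior_alpha, prior_beta = 1.0, 1.0   # uniform prior over the success rate

# Hypothetical observed data: 30 successes out of 100 trials
successes, trials = 30, 100

# Bayesian updating: posterior is Beta(alpha + successes, beta + failures)
post_alpha = prior_alpha + successes
post_beta = prior_beta + (trials - successes)
posterior = stats.beta(post_alpha, post_beta)

print(posterior.mean())            # posterior mean ≈ 0.304
lo, hi = posterior.interval(0.95)  # a 95% credible interval
print(lo, hi)
```

Note that the credible interval has the intuitive reading people usually want: given the data and prior, there's a 95% probability the success rate lies in this range, which is exactly what a frequentist confidence interval does not say.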
Now, stochastic processes are probably not an essential requirement for landing that entry-level data science job. However, a lot of the topics inside stochastic processes you will probably use at least once in your data science career, so if you have the time now to dedicate to learning them, then I would highly recommend it. If you're unfamiliar with what a stochastic process is, it's basically a sequence of events where each event is modelled as a random variable. This can be applied to many different aspects of life: it can be used to model water molecule movements or even the stock market, so it really touches a lot of areas. The key things you should learn in this area are the Markov property, Markov chains, hidden Markov models, random walks, geometric Brownian motion, and Itô calculus.

Now, there are endless resources you could use to learn everything I discussed in this video. I mean, there are literally volumes of textbooks dedicated to each section. However, we just need to know the general gist and the main intuition behind these concepts; we certainly don't need a PhD level of understanding. In terms of textbooks, my go-to would be Practical Statistics for Data Scientists, mainly because it's really easy to understand and it's specifically designed for data scientists, with examples in Python. Particularly for Bayesian stats, I really recommend the Think Bayes book by Allen Downey. This is the book I used to learn Bayesian stats, and I really, really recommend it. If courses are more your cup of tea, then I recommend these two courses that I'll link on screen here and in the description below from Coursera. To be honest, it doesn't really matter which course you pick, as long as the course covers the topics I've discussed in this video.

Now, I hope this video gave you a really good understanding of the statistics you need to land that entry-level data science job.
If you want to hear more from me, then check out my free email newsletter, Dishing the Data. I send it every Monday, and it's full of tips and advice from my personal experiences as a data scientist. If you enjoyed this video, make sure you click the like and subscribe buttons, and I'll see you in the next one.