Welcome to coding and data science. I'm Bart Polson. In this series of videos, we're going to take a little look at the tools of data science. So I'm inviting you to know your tools, but probably even more important than that is to know their proper place. I mention that because a lot of the time, when people talk about data tools, they talk as though the tools were the same thing as data science, as though they were the same set. But if you look at it for just a second, that's not really the case. Data tools are simply one element of data science, because data science is made up of a lot more than the tools that you use. It includes things like business knowledge, it includes meaning-making and interpretation, and it includes social factors. So there's much more than just the tools involved.
That being said, you will need at least a few tools. So we're going to talk about some of the things that you can use in data science, if they work well for you. In terms of getting started, there are the basic things. Number one is spreadsheets. They're the universal data tool, and I'll talk about how they play an important role in data science. Number two is a visualization program called Tableau. There's Tableau Public, which is free, there's Tableau Desktop, and there's also something called Tableau Server. Tableau is a fabulous program for data visualization, and I'm convinced that for most people it provides the great majority of what they need. And while it's not a tool, I do need to talk about the formats used in web data, because you have to be able to navigate those when doing a lot of data science work.
Then we can talk about some of the essential tools for data science. Those include the programming language R, which is specifically for data. There's the general-purpose programming language Python, which has been well adapted to data. And there's the database language SQL, or "sequel," for Structured Query Language.
Then, if you want to go beyond that, there are some other things that you can do. There are the general-purpose programming languages C, C++, and Java, which are very frequently used to form the foundation of data science, and a lot of high-level production code is going to rely on those as well. There's the command-line interface language Bash, which is very common as a very quick tool for manipulating data. And then there's the sort of supercharged wildcard: regular expressions, or regex. We'll talk about all of these in separate courses.
But as you consider all the tools that you can use, don't forget the 80/20 rule, also known as the Pareto principle. The idea here is that you're going to get a lot of bang for your buck out of a small number of things. I'm going to show you a little sample graph here. Imagine that you have 10 different tools, and we'll call them A through J. A does a lot for you, B does a little bit less, and it kind of tapers down until you've got a bunch of tools that each do just a little bit of what you need. Now, instead of looking at the individual effectiveness, look at the cumulative effectiveness: how much are you able to accomplish with a combination of tools? Well, the first one is right here at 60%, where tool A started. Then you add on the 20% from B and it goes up, then you add on C and D, and you keep adding smaller and smaller pieces. By the time you get to the end, you've got 100% of effectiveness from your 10 tools combined.
The important thing about this is that you only have to go to the second tool, B. That's two out of 10, or 20% of your tools, and in this made-up example, you've got 80% of your output. So 80% of the output comes from 20% of the tools. That's a fictional example of the Pareto principle, but I find that in real life it tends to work approximately like that. So you don't necessarily have to learn everything, and you don't have to learn how to do everything in everything.
Instead, you want to focus on the tools that will be most productive, and specifically, most productive for you. So in sum, let's say these three things. Number one, coding, or simply the ability to manipulate data with programs and computers, is important. Number two, data science is much greater than the collection of tools that's used in it. And finally, as you're trying to decide what tools to use, what you need to learn, and how to work, remember the 80/20 rule: you're going to get a lot of bang from a small set of tools. So focus on the things that are going to be most useful for you in conducting your own data science projects.
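The made-up 80/20 example from earlier can be sketched in a few lines of Python. This is only a rough illustration. The 60% for the first tool and the 20% for the second come from the example; the labels A through J, the values for the other eight tools, and the function names `cumulative_effectiveness` and `tools_needed` are assumptions, chosen so that the ten tools add up to 100%.

```python
from itertools import accumulate

# Individual effectiveness (in percent) of ten hypothetical tools, A through J.
# Only A = 60 and B = 20 come from the talk; the other eight values are
# invented for illustration so that the total is 100.
effectiveness = {
    "A": 60, "B": 20, "C": 6, "D": 4, "E": 3,
    "F": 2, "G": 2, "H": 1, "I": 1, "J": 1,
}


def cumulative_effectiveness(scores):
    """Return (tool, running total) pairs, in the order the tools are given."""
    return list(zip(scores, accumulate(scores.values())))


def tools_needed(scores, target):
    """Return how many tools, taken in order, reach the target percentage."""
    for count, total in enumerate(accumulate(scores.values()), start=1):
        if total >= target:
            return count
    return None  # the target is never reached


cumulative = cumulative_effectiveness(effectiveness)
for tool, total in cumulative:
    print(f"{tool}: {total}%")

needed = tools_needed(effectiveness, 80)
share_of_tools = 100 * needed / len(effectiveness)
print(f"{needed} of {len(effectiveness)} tools ({share_of_tools:.0f}%) reach 80% of the output")
```

With these assumed numbers, the running total hits 80% at the second tool, which is two tools out of ten. Real tool sets will not split this neatly, so treat the figures as a picture of the idea and not as a measurement.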