The next step in our data science introduction and our definition of data science is to talk about the data science pathway. I like to think of it this way: when you're working on a major project, you've got to take one step at a time to get from here to there. In data science, you can take the various steps and put them into four general categories. First, there are the steps that involve planning. Second, there's the data prep. Third, there's the actual modeling of the data. And fourth, there's the follow-up. There are several steps within each of these, and I'll explain each of them briefly.

First, let's talk about planning. The first thing you need to do is define the goals of your project, so you know how to use your resources well, and also so you know when you're done. Second, you need to organize your resources. You might have data from several different sources, you might have different software packages, and you might have different people. That gets us to the third one: you need to coordinate the people so they can work together productively. If you're doing a handoff, it needs to be clear who's going to do what and how their work is going to fit together. And then, really to state the obvious, you need to schedule the project so things can move along smoothly and you can finish in a reasonable amount of time.

Next is the data prep. Think of it like food prep: you're getting the raw ingredients ready. First, of course, you need to get the data, and it can come from many different sources and be in many different formats. You need to clean the data. And the sad thing is, this tends to be a very large part of any data science project, because you're bringing in unusual data from a lot of different places. You also want to explore the data, that is, really see what it looks like: how many people are in each group, what the shapes of the distributions are like, what's associated with what. And you may need to refine the data.
And that means choosing variables to include, choosing cases to include or exclude, and making any transformations to the data that you need. And of course, these steps can bounce back and forth from one to the other.

The third group is modeling, or statistical modeling. This is where you actually create the statistical model. So for instance, you might do a regression analysis, or you might do a neural network. But whatever you do, once you create your model, you have to validate the model. You might do that with a holdout validation, or you might even do it with a very small replication, if you can. You also need to evaluate the model. So once you know that the model is accurate, what does it actually mean, and how much does it tell you? And then finally, you need to refine the model. So for instance, there may be variables you want to throw out, there may be additional ones you want to include, you may want to again transform some of the data, and you may want to get it so it's easier to interpret and apply.

And that gets us to the last part of the data science pathway, and that's follow-up. Once you've created your model, you need to present the model, because it's usually work that's being done for a client, whether that's in-house or a third party. You need to take the insights that you got and share them in a meaningful way with other people. You also need to deploy the model, because it's usually being done in order to accomplish something. So for instance, if you're working with an e-commerce site, you may be developing a recommendation engine that says people who bought this and this might buy this. You need to actually stick it on the website and see if it works the way you expected it to. Then you need to revisit the model, because a lot of times the data that you worked on is not necessarily all of the data. And things can change when you get out in the real world, or things just change over time.
And so you have to see how well your model is working. And then, just to be thorough, you need to archive the assets: document what you have, and make it possible for you or for others to repeat the analysis or build on it in the future.

So those are the general steps of what I consider the data science pathway. And in sum, what we get from this is three things. First, data science isn't just a technical field, and it's not just coding. Things like planning and presenting and implementing are just as important. Second, contextual skills, like knowing how it works in a particular field and knowing how it will be implemented, matter as well. And third, as you got from this whole thing, there are a lot of things to do. If you go one step at a time, there'll be less backtracking, and you'll ultimately be more productive in your data science projects.
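
To make one of the modeling steps a little more concrete, here's a minimal sketch of the holdout validation mentioned in the modeling group: you fit the model on one part of the data and check it on a part it never saw. This is only an illustration under stated assumptions. The data are synthetic, the model is a one-predictor least-squares regression chosen for simplicity, and the names `fit_line`, `r_squared`, and `holdout_split` are hypothetical helpers made up for this example, not functions from any particular library.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def holdout_split(rows, holdout_fraction=0.25, seed=0):
    """Shuffle the rows and set a fraction aside; returns (train, holdout)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_holdout = int(len(rows) * holdout_fraction)
    return rows[n_holdout:], rows[:n_holdout]


# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = random.Random(42)
data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 1.0)) for x in range(100)]

# Create the model on the training rows only.
train, holdout = holdout_split(data)
train_x, train_y = zip(*train)
slope, intercept = fit_line(train_x, train_y)

# Validate the model on the rows that were held out.
holdout_x, holdout_y = zip(*holdout)
score = r_squared(holdout_x, holdout_y, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} holdout R^2={score:.4f}")
```

The point of the design is that the holdout rows play no part in fitting, so the score on them is a fairer check of how the model might do on data it hasn't seen. In a real project you would swap in your own data and whatever model you actually built.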