Welcome to how to know if your data is ready for data science. This is the second video in the series, Data Science for Beginners. Before data science can give you the answers that you want, you have to give it some high quality raw materials to work with. Just like making a pizza, the better the ingredients you start with, the better the final product. So in the case of data science, there are some ingredients we need to pull together. We need data that's relevant, connected, and accurate, and we need to make sure we have enough of it.

The first ingredient we need is data that's relevant. Take a look at the table on the left. We met seven people outside of bars in Boston and measured their blood alcohol level, the Red Sox batting average in their last game, and the price of milk in the nearest convenience store. This is all perfectly legitimate data; its only fault is that it isn't relevant. There's no obvious relationship between these numbers. If I gave you the current price of milk and the Red Sox batting average, there's no way that you could guess my blood alcohol content.

Now, check out the table on the right. This time we measured each person's body mass and counted the number of drinks they've had. The numbers in each row are now relevant to each other. If I gave you my body mass and the number of margaritas I've had, you could make a pretty good guess at my blood alcohol content.

The next ingredient is connected data. Here's some relevant data on the quality of hamburgers: grill temperature, patty weight, and the rating in a local food magazine. But notice the gaps in the table on the left. Most data sets are missing some values. It's common to have holes like this, and there are ways to work around them. But if there's too much missing, your data starts to look like Swiss cheese. Looking at the table on the left, there's so much missing data that it's hard to come up with any kind of relationship between grill temperature and patty weight.
This is an example of disconnected data. The table on the right, though, is full and complete, a great example of connected data.

The next ingredient we need is accuracy. Here are four targets we'd like to hit with arrows. Look at the target in the upper right. We get a tight grouping right around the bullseye. That, of course, is accurate. And if you look at the next target, on the lower right, oddly, in the language of data science, our performance on that target is also considered accurate. If you were to map out the center of these arrows, you'd see that it's very close to the bullseye. The arrows are spread all around the target, so they're considered imprecise, but they're centered around the bullseye, so they're considered accurate. The target in the upper right is both accurate and precise, so that's a good job.

Now, look at the upper left target. Here, our arrows hit very close together, a tight grouping. They're precise, but they're inaccurate, because the center is way off the bullseye. And, of course, the arrows in the bottom left target are both inaccurate and imprecise. This archer needs a bit more practice.

Finally, ingredient number four. We need to have enough data. If you think of each data point in your table as being a brush stroke in a painting, then if you only have a few of them, the painting can be pretty fuzzy. It's hard to tell what it is. If you add some more brush strokes, then your painting starts to get a little sharper. And when you have just barely enough, you can start to see the pattern and make some broad decisions. I look at this and I think, is this somewhere I might want to visit? It looks bright, the water looks clean. I decide yes, that's where I'm going on vacation.

As you add more data, the picture becomes clearer and you can make even more detailed decisions. Now, I can look at the three hotels on the left bank.
I say, of those three, I really like the architectural features of the one in the foreground. I'll stay there, right on the third floor.

With data that's relevant, connected, and accurate, and with enough of it, we have all of the ingredients we need to do some high quality data science.

Be sure to check out the other videos in Data Science for Beginners from Microsoft Azure Machine Learning.
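
If you'd like to try the target example yourself, here's a minimal sketch in Python. It isn't from the video. The function name `accuracy_and_precision`, the made-up arrow positions, and the choice to measure precision as the average distance of each arrow from the center of its group are all assumptions for illustration. The idea it follows is the one described earlier: accuracy is about how close the center of the arrows is to the bullseye, and precision is about how tightly the arrows are grouped.

```python
import math


def accuracy_and_precision(hits, bullseye=(0.0, 0.0)):
    """Return (bias, spread) for a list of (x, y) arrow hits.

    bias:   distance from the center of the hits to the bullseye
            (a small bias means accurate)
    spread: average distance of each hit from that center
            (a small spread means precise)
    """
    n = len(hits)
    # Center of the group: the mean x and mean y of all hits.
    cx = sum(x for x, _ in hits) / n
    cy = sum(y for _, y in hits) / n
    bias = math.hypot(cx - bullseye[0], cy - bullseye[1])
    spread = sum(math.hypot(x - cx, y - cy) for x, y in hits) / n
    return bias, spread


# Hypothetical arrow positions, with the bullseye at (0, 0).
tight_on_target = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
wide_on_target = [(3.0, 0.0), (-3.0, 0.0), (0.0, 3.0), (0.0, -3.0)]
tight_off_target = [(5.1, 5.0), (4.9, 5.0), (5.0, 5.1), (5.0, 4.9)]

for name, hits in [
    ("tight, on target", tight_on_target),
    ("wide, on target", wide_on_target),
    ("tight, off target", tight_off_target),
]:
    bias, spread = accuracy_and_precision(hits)
    print(f"{name}: bias={bias:.2f}, spread={spread:.2f}")
```

In this sketch, the first two groups both have a bias near zero, so both count as accurate, even though the second one is spread widely and so is imprecise. The third group is tightly packed but centered far from the bullseye: precise, but inaccurate.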