 videos that I have, but if I do, it's fine. I'll explain what they were. So thanks so much for coming today. My name is Heather Shapiro. I'm a technical evangelist at Microsoft. And you might wonder what a technical evangelist means. I like to say we're geeks who speak. We're developers without a product team. So we go and we speak to communities, startups, students, and try to get them interested in technology, specifically for students. So when we talk to middle schoolers or high schoolers, how can we get them interested in technology in general and then show them what Microsoft products are available that they can use as well? So my background is in statistics and computer science. I graduated last year from Duke with a double major in comp science stat. So that's been my focus as a tech evangelist to show all the different machine learning and data science tools that are available. So thanks so much for having me here at PyGotham. It's really exciting. I'm going to talk today about it's an end to boring data. So how can we use visualizations in Python and what tools are available to create more interesting data sets or more interesting meetings that you can use for your teams? So the things we'll cover today are why data visualizations are important. We'll look at a case study of New York City restaurant ratings. We'll look at what libraries in Python work best for different types of graphs. But first, why do we need data visualizations? I like to make this interactive, so if anyone wants to shout out why we might need data visualizations? Exactly. Yeah, exactly. So if we look at this Dilbert cartoon, Linda's giving Colin a spreadsheet that has lots of numbers. And he's like, oh, it's so boring, my head hurts. And I found that often in meetings that that happens to me, where my manager just shows tons and tons of data, and it makes no sense to me. And I'm just more of a visual learner, and I think that is a lot of people as well. 
Also, it's just like too many numbers. Like you said, Excel doesn't need more than 256 columns. Like, why is it available? And then just so many spreadsheets, you just want to get rid of that. It's just not enjoyable. And then we've also all probably seen like the visualizations gone wrong, where there's just like way too many variables, and you have no idea what's going on, or there's like no title, or it's just like all over the place. So there are different things that we can take into account when we're creating visualizations and make them better so that people can actually understand what's going on. The main goal is that if you look at a visualization within like 10 seconds, they can tell what's going on. And you can make an assumption about your data. So what it provides, it helps the visual learner. So if you don't like to look at numbers, and if you learn better with like pictures or graphs, that helps you. Like you said, it makes sense of like tremendous amounts of data, puts it into an understandable format. It helps you walk through a problem. So when you're doing like quantitative analysis, you can do exploratory analysis and just look at all the different things quickly. You don't have to look at the numbers right. You don't have to look at the numbers. You can just look at a graph and it'll show you where things are outliers, where things are going wrong. And it can tell a story in seconds. So what makes a good chart? If you keep it simple, so the charts that we saw are really all over the place. And you don't want that. So you use colors that are easy to read, easy to see, that don't like contrast each other too much. You want your data to be sorted so that it's not like A, Z, B, all over the, and not in order, or like not in the right date, time, order. You want them to be easy to compare as well. So if you've ever seen like charts and they'll change the axes and then everything is all off and you're like, wait, but these look similar. 
And then you realize like all the numbers are completely different. And they're doing that to kind of like mess with you. That's what a lot of like reports will do. They'll change the axes so that it looks better for them. And so those are things that you have to look out for. So I'm going to talk about some New York City restaurant ratings. So if you've seen in the past few years, you have all these restaurants have letters on the outside of their stores, delis, cafes all over. If you're anything like me, if I'm really hungry or I'm hangry, I'll walk in and I'm like, eat all the food and I don't care what the ratings are. I'll just like go in. But that doesn't always work because you get sick sometimes. And then you're like, OK, like maybe I should have thought about this more. And if anyone watches Parks and Rec, let's see if this works. Hold on, I'll try. Thanks for coming, man. Of course. This bowling alley has my favorite restaurant in Pawnee. Really? You're not scared to eat here? When I eat, it is the food that is scared. So we can all be like Ron Swanson. Sometimes you do need to look at the data and there's things like, yelp, that will help you. And you can look at the ratings. But if you're like me and you really like looking at data, there's these opportunities to look at it. So if you've ever looked at New York City open data, has anyone ever seen this website? Oh, a lot of you. OK, cool. So I have it open. So New York City open data has over 1,500 data sets that are available. There's government data. There's business, housing, development. There's things for, let me make it a little bigger. There's education and things for how many people have called 311 and all the different information like that. So there is a dashboard with all the releases. So you can see MTA data, city bike data. And so what they have is actually restaurant inspection results. So I looked at this. It's all open and free. And that was what I wanted to use today. 
And actually all of this is on GitHub right now. So if you go to Heather B. Shapiro on GitHub, that will be where this is in the IPython notebook. I couldn't put the data set in there because it's too big for GitHub. So you'll have to download it on your own. But everything else you can follow along in here. So what New York City open data provides, it gives you the restaurant, the borough, building, everything. And then it tells you what kind of cuisine it is, inspection date, what was wrong with it, if it was critical, score. It's just a lot of numbers. And you can look at this, but there's over 400,000 restaurants in here. And it's updated daily. So it's kind of difficult to read. And the good thing about open data here is that you can make charts if you want. So you could look at, let's say, score, grade, and score. And they're not the best graphs. So you can look at all of this. And it allows you. So it's pretty cool that this is all open. And it's just available at your fingertips. And you can just download whatever you want. But we can make better visualizations that are more succinct using Python. So if you wanted to download it on this website, you can go to export and then download a CSV or JSON, all these different options. So if we go back to the presentation. Yeah, so they give you data sets with all different variables, so like name, borough, building, what was the grade, score, and that information. So I have a quick video that explains how they go through and do this New York City restaurant rating. It's actually pretty new. So here, Deli just started. Can everyone hear? Recently, the New York City Health Department started giving all restaurants, bars, delis, and bakeries letter grades after routine sanitary inspections. 
What we're doing with restaurant letter grading is taking a score which was hard for people to interpret in the past and putting it in a simple form from the letter grade A, B, or C, and in a place where people can see it before they choose to dine in that restaurant. We started the restaurant letter grading program for two reasons. First, to give people information that they want to have about the restaurant inspections. Second, to provide an incentive for restaurants to have better food safety practices. We know that some thousands of people each year in New York City go to hospital emergency rooms with food-related illness. We first got the idea of restaurant letter grading from Los Angeles, and after they instituted putting letter grades on the front of restaurants, they saw about a 20% reduction in food-related illness. That's a big health benefit that we think that we can do here. When our inspectors go into restaurants, they inspect everything from whether or not they have a sink for people to wash their hands, to the temperature of the refrigerators, to where the food is left out, whether there appears to be good preparation surfaces, to whether there might be evidence of mice or cockroaches. And establishments with scores less than an A have the opportunity to improve their grade. If a restaurant gets a B and has to post a B, we'll be reinspecting in about six months. If they get a C, we'll be inspecting in about four months. The first restaurant to get an A in New York City was Sparks Deli. This is out of 24,000 restaurants. I was the first one. We love this place because it's clean, it's meticulous. It always has a tremendous variety of wonderful foods and hot meals. And it's obvious this is a clean place to anybody that comes in here. There's a lot of restaurants you walk into and they don't look desirable to eat in there. You have to, oh, am I gonna be okay after I eat this? 
Most citizens, when they pass by a place, they don't know what's going on in there. I want my customers to be okay. I want them to be happy when they leave here. And I don't want them to be sick in the evening either. These guys deserve an A, you know, because they have people cleaning up and mopping around all the time. This showed that any restaurant or corner deli can have good food safety practices. So we were very proud of them. So that's just like a little information of like how they go through the restaurant ratings. So now you know a little bit more of like what goes into that letter grade, because if you look at the scores, they're pretty confusing. When you look at the scores on the outside, it actually goes up to like 108. And so when you see like a 95, you're like, oh, it must be out of 100. And it's pretty confusing because it's not out of 100. So you think it's better than it is. So that's kind of why they went for this like grade system. So in Python, so what are the steps that we have to take? So we have to load the modules, load the restaurant rating data, try to figure out how we can understand the data and then visualize it. So some of the tools that I use, these aren't all the visualization tools. These are just ones that I wanted to test out, see which ones I liked, which ones I didn't like. So these are just a few of them. So we have pandas, Matplotlib, Basemap, Folium, Seaborn, Bokeh, and Plotly. So to get started, you'll need pandas. So has anyone worked with pandas before? Well, probably, okay, cool. So pandas is a Python library that provides data analysis features. It's built on top of NumPy, SciPy, and to an extent, Matplotlib. And it allows you to have data frames and series similar to what you'd expect in R, Excel, or MATLAB. It just shows you all the data in a table that you can index. And I'll show you some examples. So I'm kind of gonna go back and forth between the slides and code. 
So if anyone's ever worked with IPython or Jupyter notebooks, it makes it really easy for presentations that you can just run things in real time, show what's going on. And for the sake of time though, it's all already loaded, just in case, but I'll rerun other things. So you'll need to import NumPy, pandas, Matplotlib to be able to do some of these basic features. But to read a CSV, you can just do pd.read_csv, and then all of a sudden you have this restaurant data. So we can see that now it looks like a dataset that we're used to seeing. It's not just random data. In the past, you would have to go through every line in the CSV, put it in. So pandas really takes away that extra hurdle that you need to go through to understanding data. You can figure out how many are in it. So right now in this dataset, there's 450,000, which is hard to compute in Python. So I kind of went through and looked at only Manhattan, because that's where I live. So I wanted to see what are the restaurants in my area, or how are the scores and grades related? Are there ones that have scores that are really high and are given a good grade? So in scores, the lower the score, the better in this sense, actually. So the way that they did it, the number of infractions that they have adds to your score. So if you have 100, that means that you have really bad cleanliness in your store. So with pandas, you can look at the columns that you have. You can look at the value counts. So you can see that for the different grades, there's 65,000 A's, 13,000 B's, C's, Z is that they haven't been graded, and P is that they need to be regraded. So they had a critical flag, and now have to go back and get graded again. So this doesn't add up to the 450,000. So there's a bunch that aren't graded, so I removed those. Removed all the ones that are null. And then we can see score. There's about 84,000 scores in there, so it still doesn't reach that 450. 
So there's a bunch that have no data in them that we don't really need to look at right now. So 84,000 is a huge sample set that we can look at and still get pretty good results. So I removed all the null values, and we can see the mean is about 13 for score, but the highest is 131. So some stores that we're going into have a lot of critical flags that are going on. And then we just have to, oh no, it's always when the demo gods, it's actually, that just orders the levels, so I'll figure out that. But when you remove all these different rows, you have to look at the indexes because it kind of keeps all the numbers already in there, so it would have been 4,500 all in different rows, so you have to reset that index for them. And then you can see the different types of violations. So the ones that came up most common was non-food contact surface improperly constructed, unacceptable materials used, facility not vermin proof, all these different things came up a lot. So if we want to look at visualizations though, we can use Matplotlib, which is more of a MATLAB-like plotting framework, and then pandas kind of builds on top of that. So in Matplotlib, you have long things of code that you have to write out, you kind of have to write every single aspect of it, and you'll get figures like this, histograms that show you the average score. So it's interesting, good or bad, that there's so many low scores. It could be that they're not rating properly, or do you think every single restaurant has no infraction? So you can look at it in different ways. But then if we want to do it in pandas, there's kind of a shorthand, where you just say you want to look at that score, put it into a histogram, say the title, and you get the exact same thing. So you can do that and you get the same thing, and then you can also just change the bins. So that's how many brackets you want. And it creates really easy plots. 
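A minimal sketch of the load, clean, and histogram steps described above. The column names (`BORO`, `GRADE`, `SCORE`) and the handful of rows are assumptions standing in for the real export, which is too large to include; with the downloaded file you would start from `pd.read_csv` instead of building the frame by hand.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import pandas as pd

# Hypothetical stand-in rows; with the real export you would call
# pd.read_csv on the CSV downloaded from the open data site.
restaurants = pd.DataFrame({
    "BORO": ["MANHATTAN", "MANHATTAN", "BROOKLYN", "MANHATTAN", "MANHATTAN"],
    "GRADE": ["A", "B", "A", None, "A"],
    "SCORE": [10.0, 22.0, 12.0, None, 7.0],
})

# Keep only Manhattan, drop rows with no grade or score,
# then renumber the index so it runs 0..n-1 again.
manhattan = restaurants[restaurants["BORO"] == "MANHATTAN"]
graded = manhattan.dropna(subset=["GRADE", "SCORE"]).reset_index(drop=True)

grade_counts = graded["GRADE"].value_counts()   # how many of each grade
score_summary = graded["SCORE"].describe()      # mean, max, and so on

# pandas shorthand for a Matplotlib histogram; bins is the number of brackets.
ax = graded["SCORE"].plot(kind="hist", bins=5, title="Inspection scores")
ax.set_xlabel("Score")
```

Resetting the index with `drop=True` discards the old row numbers rather than keeping them as an extra column, which is usually what you want after filtering.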
You just have to say, I want to look at this column, make a histogram, you can make a bar plot, you can make a pie plot, that shows all the different things, and you just have to say MRS grade, to get the value counts for each one, dot value counts, and then plot, kind equals pie, you could do kind equals bar. And if we wanted to see all the critical flags, there's about 45,000 critical flags that are tagged in this data set. So that's something interesting. And we can look more into which ones, and of what level they are. So we have things like Seaborn, which is another platform that you can use that is built on top of Matplotlib, but it creates more sophisticated graphs that look more professional, they're more appealing to the eye, and you just have to do one line that's like strip plot, or there's violin plot, you say what you want to compare, and you get graphs like this. So they just make them much more visually appealing, and these are things that you couldn't do solely in Matplotlib or pandas, and it does it for you. So if you wanted to see you have this strip plot, it's just basically a bar chart or scatter plot, but you can also do it where all the values don't lie on top of each other, which makes it really easy, more easy to see, and you don't want to, there's so many values, there's like 80,000 that if you just see this one line, it kind of gets deceptive. Then you can also put this like critical flag in there and have legends that say, okay, for every grade, what's the score, and then were they critical, were they non-critical, or did they just not get a flag at all? So if you look at A, there's a bunch of critical flags in there, which is kind of interesting because why do they still have an A, or will that change? It just gives you a lot of information that if you're looking at the numbers, solely you can't really tell as much. 
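A small sketch of the Seaborn strip plot described above, with jitter so the points do not sit on top of each other and the critical flag as the legend. The column names (`GRADE`, `SCORE`, `CRITICAL FLAG`) and the rows are assumptions made up for illustration, not the talk's actual data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical inspection rows for illustration only.
inspections = pd.DataFrame({
    "GRADE": ["A", "A", "A", "B", "B", "C"],
    "SCORE": [7, 10, 12, 20, 25, 38],
    "CRITICAL FLAG": ["Not Critical", "Critical", "Not Critical",
                      "Critical", "Critical", "Critical"],
})

# One call: grade on the x axis, score on the y axis, colored by critical flag.
# jitter=True spreads points sideways so overlapping scores stay visible.
ax = sns.stripplot(data=inspections, x="GRADE", y="SCORE",
                   hue="CRITICAL FLAG", jitter=True)
ax.set_title("Score by grade")
```

Swapping `sns.stripplot` for `sns.violinplot` or `sns.boxplot` with the same `x`, `y`, and `hue` arguments gives the other chart types mentioned.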
So when you see it this way, you see that there's, yes, there's a lot that are non-critical, but there's also a lot that are, and they still have that ranking. Then you can also see, for those that have a C grade, their score is low, or some of them have scores that are less than 20, which is usually what they give for an A, so why are they a C? So it's just different things that you can look into, and it shows you more information about what your data is, what your data has, what's going on. Then you can see box plots, you just have to say box plot, grade, score, and then you can change the hues, and then if you wanted to look at a bar plot, you can see the mean score for all of them. So it is good that for A, they're all low because that's what you wanna see, you wanna see that there aren't any mistakes and scores, and that for the most part, these people have no infractions. But like I said, what we saw before is a lot of them have critical scores or critical grade, so why is that there? Other things that we can do also with Seaborn are change the color palette. So you have options like this, so you can change it, and there's all different ones, you can make it six, and you get different colors, so that they just make it much more visually appealing. So then other things that I wanted to look at were mapping. So what tools are available for mapping? We have the address, we have the street, we have the house number, and then we have location, so what can we do with that? 
So before I go into the tools that we used, I had to convert all the addresses to lat/long, and that proved to be somewhat challenging because some of them are, you'll have First Avenue, written as one, and not with the ST, or you'll have first written out, or you'll have the First, actually, and then so there's all different ways that it could be written, so I kind of had to go through and figure out all the different options that were available, change them, and so when I did that, basically I went through, and you can use this inflect package, allows you to convert numbers to the actual word, so one would go to O-N-E, and then you can change one to First, and then you have to go the number one to First, so there's a whole layer of things that you have to go through. Then I went through, and for every word in the street, look at it, see if it matches one of those number words, and change it, so then you get, so now all the streets are named properly, so you have East, Sixth Street, West Fortieth, and went through and it did it for all of them. Then you have to find all the geocoding for these, so due to rate limits, it was very hard to do in Python, I was able to write code that would do it, but I would have, would have had to wait weeks probably because there's so much data, and you can only hit like 45, you can only hit the API 45 times in like an hour, so that really proved difficult, so what I did was I created a sample set that has the same seed every time so that I could just keep doing this, so the sample only has 100 in it, and when you set the seed, it just means that every time you do it, you'll get the same data, so if you set the seed to 10, every time it's the same thing. 
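A rough sketch of the two steps above: rewriting numeric street names as ordinal words, then drawing a repeatable sample. The talk used the inflect package for the number-to-word step; this sketch swaps in a small hand-written lookup (`ORDINAL_WORDS`, a hypothetical name) so that it runs with only pandas installed, and the `STREET` column and its rows are made up.

```python
import re
import pandas as pd

# Hypothetical lookup covering 1-10 only; the talk used inflect instead.
ORDINAL_WORDS = {1: "First", 2: "Second", 3: "Third", 4: "Fourth", 5: "Fifth",
                 6: "Sixth", 7: "Seventh", 8: "Eighth", 9: "Ninth", 10: "Tenth"}
# Matches "1", "1ST", "6th" and similar: digits with an optional suffix.
NUMBER_TOKEN = re.compile(r"(\d+)(?:st|nd|rd|th)?", re.IGNORECASE)

def normalize_street(street):
    """Replace number-like words with ordinal words; numbers outside the lookup stay as they are."""
    words = []
    for word in street.split():
        match = NUMBER_TOKEN.fullmatch(word)
        number = int(match.group(1)) if match else None
        if number in ORDINAL_WORDS:
            words.append(ORDINAL_WORDS[number])
        else:
            words.append(word.capitalize())
    return " ".join(words)

addresses = pd.DataFrame({"STREET": ["1 AVENUE", "1ST AVENUE", "EAST 6 STREET",
                                     "WEST 40 STREET", "BROADWAY"]})
addresses["STREET_CLEAN"] = addresses["STREET"].map(normalize_street)

# Same seed, same rows: random_state makes the sample repeatable.
sample_a = addresses.sample(n=3, random_state=10)
sample_b = addresses.sample(n=3, random_state=10)
```

Numbers outside the lookup, such as 40 here, pass through unchanged; a fuller version would need a complete number-to-ordinal conversion, which is what inflect was used for in the talk.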
So I wrote these all out to a text file, and I used an online geocoder that you could just put all the addresses in and it would do it, so it did it pretty quickly in like 30 seconds for all of them, and then I just read them as the addresses and get the lat longs, so now you can see for all of these I have the latitude longitudes for all of them, and this is something that is lacking in a lot of the mapping tools that I've seen, is just that you can't just put in an address, you have to have the whole latitude longitude for them, so in other languages there are options for that, but in Python, I haven't seen any tools that will allow you to just do that easily. Okay, so the different options that you have for mapping are Basemap, it was pretty hard to install, so I wouldn't necessarily suggest it, maybe it was just for Windows computers, but there were a lot of prerequisites and the documentation wasn't there, and then when I finally did get it, it was kind of, you don't really get much granularity from it, it doesn't do streets, it doesn't do cities, it just does land versus water, and it's good for, if you wanna do heat maps for the whole world, it's good for that, but when you're looking at this granular data of just cities and you want street by street, it doesn't really work so well. But I found Folium, which allows you to visualize data on a Leaflet map, and it has built-in tile sets from OpenStreetMap, MapQuest, all these different things, and it allows you to get interactive. So we have, basically what I did was I put in the latitude, longitude of New York City, and then for all of the lat longs, if it had a grade of an A, gave it one color, grade B, different color, and you get graphs like this. 
So you can cluster it, zoom in, see all of them, so you can see that has a score of 20 and a grade of B, and you keep going and you see all this really granular data and you can zoom out, and you don't have to have it clustered, so if you wanted to get rid of that, change it. So all you have to do is say you want it to be clustered and then add those markings to that cluster and it does it all for you. So it was really easy to use, really easy to install. So now you can see all of them on the plot. Clustered data just makes it easier to zoom in because there's so many, even with just 100 data points, it's really hard to see. So, but they do give you that flexibility. The other thing that's really cool about it is that they allow you to save it as an HTML. So if you have a website or a blog, you can get the HTML code for it and put it, this looks a lot, but it has all the data in it for you and you just have to plug that into your website. I haven't gotten it to work with WordPress because you're not, this JavaScript functionality there is a little challenging, but with other WordPress, with other sites like just HTML, CSS, you can just do that easily. So that's a really cool function of just like if you wanted to make your websites more interactive, visual, it's a really easy tool to put in. Then there are other ones like Bokeh, which give you more interactive charts and you can do like histograms like this, but you can like move it around and you can also, you can also make colors, so you can make it really easy like that. 
Bokeh allows you also to save the HTML code, make it more usable for you in like the web interface and it was a little bit more challenging than some of the other tools because there's so many different versions that every time I search something online, I would get brought to a different version page and then they'd be like, wait, this isn't the right one, like go to this one and like I was like searching for hours to try to get to the right, just even how to do a histogram. So while it does give you really, really nice graphs, it doesn't, the documentation was just like really all over the place and maybe if you have a lot of patience, you can go through that, but it just wasn't as user friendly, but it does allow you to like save them, you can move them around, then you can also do things like you can have tabs of different graphs, you could add widgets, sliders, so it does give you that more functionality than the other ones and it's much more user friendly for web interface. You could do chart, you could do maps, but it was also like not very granular and much more difficult to use. Then there's things like plotly. Has anyone ever worked with plotly before? Few people, yeah. So plotly is available in a bunch of different languages. You have R, you can use it in like MATLAB and it allows you to create visualizations that are linked online. So you can make a free account at plotly.com, you can make a free account there and it allows you to get like 100 requests per hour and there are pricing tiers that you can use but for this I just did the free one and when you do that, so you can create all these different visualizations. So the extra layout code is really just so that I can make it look pretty but really to just do it, you had to do like histogram X equals the grade. So we get graphs like this that we can see are pretty interactive. 
You can look at just certain parts, you could zoom out but then the cool thing is that you can edit these charts online and then they give you a quick and easy way to like share it or export it. So if you wanted to just easily embed it into your website, they give you the iframe for that. So it makes it so much easier and it was a tool that I really enjoyed working with because you get all the support that you would normally want. So we can make graphs like that, different density scores. So these are the same of the scores that we saw before but now it's just more interactive. So these are depending on what you're doing, if you just need a basic graph, you might not need Plotly. If you just wanna show like for a presentation, okay, here, here's a static plot then something like this isn't necessary but if you're trying to make an interactive tool or create a blog, these are more useful. And then yeah, so you can create different types of histograms and you just like look at certain parts of data, zoom in, zoom out, you can compare data. So you can see all this at like a really granular level. You can keep going in and see. So if it was like a scatter plot, you could see that. But then I tried to do it as a map and they don't have city data either. So this is the best you can really do for a map. So I could keep zooming in and you can kind of see it. So you get all the same information but it's not as useful as the Folium one where you could see the streets. You could see everything that's going on. This is just the total overview and there are so many A's that you still don't really see that much differentiation between them. And it could just be that I chose a hundred of them out of 84,000, so it just could be that sample but for loading purposes, that was another thing is when I did try to do it with a lot more data for all these, it takes a really long time. 
So you have to, sampling is definitely useful and usually the rule of thumb is like more than 30 is a good sample but obviously the more amount of data that you have, like the better it is. Are there any questions before I like do some closing stuff? Yeah. So I, it was batch geocoding, sorry, hopefully this Wi-Fi. It was find latitudeandlongitude.com and then you can do batch geocode. And so I just put it into a text file with all the different street addresses and I could just copy that whole hundred lines into this and it would geocode them line by line. And so I saved that as a text file or saved it as a CSV actually and then I just had to import that back into pandas. So it was much easier than me going through and waiting the hours on end to geocode it on my own because there is geocode library called geocode or geocoder and there are other ones that are based on like Google Maps. It's just the API limits are really low and if you want to pay for more, you can get that but I didn't want to pay for more. So what's the address here to UN Plaza? Oh, cool. And so that will give you ability to, oh, cool. Awesome. Yeah, the thing, there are so many of them that if you just like search online, you can find them. You can see, yeah, so we'll do all of them in batch geocode and yeah. So it was just a lot faster and it's free so there are probably limits but for a hundred it let me do it in about 30 seconds and that for my timing I was like, this is perfect. Any other questions? It depends, I guess if you're like, because a lot of these are based on JavaScript so they're using it in the back end anyway but I guess it depends if you were starting in Python versus if you're starting and just if you have data in like and you're making a website. 
So if you have the ability to use Python and like that's what you started with then it's useful but I've also done it the other way where I didn't want to put all my data into Python and like manipulate it and I kind of just created the graphs through JavaScript. So it really just depends on what your preferences are and where you started and what you're doing. Okay, so I was wondering if you had done anything with ggplot, I know yhat has ported it from R to Python so I was wondering if you had any thoughts on that package? Yeah, I've used ggplot a lot within R so my back like I started with R from school. ggplot is a little more complicated to get the syntax down so that's why I didn't want to talk about it here too much but the cool, the nice thing about ggplot is like you do, you can change everything really. You can change like all the colors, you can change all the axes, the text, it gives you so much functionality, it's just really hard to get the syntax down so it takes a little bit of time to learn that and then otherwise it's a really good tool, it just takes a bit of time for a learning curve. Any other questions? Yep, so that's folium. So folium does that for you. You just have to say you want a marker cluster and put all of your data points into that cluster and it does it. So it was really easy. I literally just put the data points in and it made everything for me. I just had to say what lat long to start at, how far I wanted to zoom in and then it does everything for you. Cause then if you even, if you go back to it, put the clusters back on. If you zoom in and out, it changes the clusters for you so that functionality is already there and it just makes it really user friendly so you zoom out, you can keep zooming in except I'm in the wrong spot. You can continuously zoom in and you even see all the restaurants that are there or other locations and just cool spots that are around. 
So you see it in such like, such fine detail and all you have to do is import folium. Like I didn't import anything else. I just said folium map, marker cluster and then for each marker, each lat long put it on a point and it does that. Are there any other questions? I'd say folium, folium and plotly. It really depends on like what you're trying to do because like for me, colors matter a lot as well. So like just the use of Seaborn, if I don't need an interactive plot it makes them like much more visually appealing but folium like was so easy to use and plotly was so easy to use that if I just wanted to put in a website I could and it just made, the documentation was really great and made it very user friendly. What do you mean? So Bokeh will do that but it was very difficult to figure out how. So they show you all these examples but none of them actually work anymore. So you try to do it and then the libraries have changed. So you could in some of them if you did have like a scatter plot you could narrow it down to just like those two like two plots on that and then go into more detail about them. So you can do that. It's just I couldn't figure it out but if other people know that would be great. All right, so just like, so now you can go put them in your blogs. Yay. So just like some more closing thoughts based on like what we talked about and all the different libraries. So pandas, if you're doing data science you need to use pandas. It's basically, it's just so easy and it's simple for plots. You do have to be willing to learn like matplotlib to be able to customize it but if you're just trying to do any analysis pandas is using SciPy, is using NumPy. It's using all of the statistical packages that you need and it makes it a lot easier. Seaborn supports like more complex visualization but also requires matplotlib knowledge of just like how to customize and make different changes and the color schemes are like a really nice bonus. 
Basemap was really hard to install for choropleth maps or heat maps. It could be really useful if you're looking at everything from really far out, but it wasn't very robust and, oh, I have a typo, whoops. Not high granularity for maps. So you can't get city data. You can't even get state data really. You could just get land, water. And like I said, folium had really great documentation for mapping. The one thing that I wish you could add was more interactive widgets. So adding a slider, adding a legend that you could select, and buttons. Bokeh is kind of overkill for simple scenarios because the main things you can change are just going back and forth on what you select. So if you just need to make a plot I wouldn't suggest Bokeh, but if you're doing things for a website it's really useful. And then plotly, if you want to make a free account, those are the most interactive graphs, and you can save them offline and create rich web-based visualizations. But it wasn't good, like we saw, for mapping. So it really depends on what you're working towards and what your end goal is. Question. Have you played with Tableau at all? Do you like it? Yes, I've used Tableau. I used it a few years ago. I used it when it was first coming out on Macs, because it wasn't there for a while. So there were a lot of bugs to it and it kind of ran slow, but for mapping and a lot of visualizations I thought it was really good. It's not free either. So you can get a free tier, but to do the more intricate things it was more difficult. So things we covered. How you can look at visualizations in Python. How to walk through a data problem. So now we've seen this New York City restaurant ratings. You can be a little bit more skeptical because some of the A's had critical ratings. Some of the grades were higher or lower.
So maybe we didn't figure out which restaurants were worse than others, but it kind of makes you understand what's going on more when you go to these restaurants and you see these scores, you see these ratings. What does it actually mean? We looked at what libraries are useful and for what. And great ways to update your blog. So if you need resources, my website is microheather.com, my GitHub, Heather of you Shapiro. Then you can find other New York City open data, data.gov. If you are using a work computer or you can't install Python on your computer, we do have with Microsoft the Data Science Virtual Machine in Azure, and you can get a free trial for that. It basically comes with Python, R, all the different tools that you'll need, and has it set up already. So if it's something that you don't wanna install, you can use virtual machines. If you're interested in just learning about machine learning or data science, we also have Azure Machine Learning, which is a drag and drop tool that's in the browser. And basically you have data sets, you can upload your own data sets. You don't need to know any statistics really, you just have to drag it and it'll apply the algorithm for you and you can see which algorithm is best, so it's a really good learning tool. And then we also have Channel 9 and Microsoft Virtual Academy, YouTube, all those different things. There's plenty of videos on it. So if you wanna contact me, my Twitter is microheather. My email is hapiro at Microsoft and thank you. Woo. Thank you.