Hello everybody. It's a pleasure to be here. Today I'm going to talk about maps, but first let me say a bit about myself, because my name is quite hard to pronounce, so I'll just introduce myself. I am Guillen Rurambajaste. These are my Twitter and GitHub accounts. When I was younger I wanted to be a hacker, and that's why I studied telecommunications engineering. But for the last five years I've been more focused on learning Python and data science, and on trying to invent my own artificial intelligence algorithms. I'm currently working at Madrid source, which is a really good startup, but I am from Mallorca, and I'm really proud to be one of the organizers of the PyData meetups that we hold there. Okay, now it works better. Can you hear me well? A few months ago, Bial Estella, who's a friend of mine, found insideairbnb.com, a website that periodically scrapes Airbnb and makes the data available to the public. He wanted to study how Airbnb influences the real estate market, but he had no easy way to visualize his findings. You can see here how Inside Airbnb collects data sets from all over the world, mainly from North America and Europe, although there are also some data sets from Australia and Hong Kong. Each of these data sets contains 92 columns with a very wide range of features: how the information was scraped; listing features such as the daily price of a rental and how many bedrooms or bathrooms it has; location features like the GPS coordinates that are displayed on the map on the Inside Airbnb web page; the booking requirements an owner sets for renting the listing; and information about how the people who rented the listing reviewed it. We will be using data from both Mallorca and Barcelona, which contain about 14k and 17k rows (listings) respectively.
So that is within the range of what a browser can handle easily; this could not be considered big data. This is one of the visualizations on the Inside Airbnb web page, which shows how the Airbnb listings are distributed in Barcelona. Here, for example, you can see in green the listings that correspond to a whole apartment or house, in red the ones corresponding to a single room, and a few here in blue that correspond to shared rooms. If you go to the website, you can see the same representation for Mallorca. It actually works quite well when you have enough RAM, and it has good tooltips and filtering options. But it is written in JavaScript, and this is EuroPython. Plotting maps is easy, but it can be a real pain in the ass if you don't know how to do it. So I wanted to make a tutorial for people who want to plot geographical locations but have no idea how to start, so that I can at least save them a few hours if they just clone the repository. So today we will talk about how to plot Google Maps using Bokeh; how to work with shapefiles, which are a data format for geographical data; how we can plot the shapefiles using HoloViews and GeoViews; and how we can make cool representations of big data with Datashader. So let's get our hands on the notebooks. These notebooks are meant to be used with any data set available on Inside Airbnb, but as I am from Mallorca, I will mainly be using the data set from the island. So, how many of you know Bokeh? Please raise your hand. Okay, there are a lot of people here. The ones who don't know it, please check the documentation; it's an awesome package for making any kind of visualization. And the ones who already know it will find it extremely easy to work with Google Maps.
It is basically like building a regular figure object, but with additional options that indicate how we want our map to be displayed. We need to input the latitude and longitude of the center of the map, and the zoom level, ranging from one, which shows the whole world, to about 20, which is the maximum zoom you can get on Google Maps. You also need a Google Maps JavaScript API key, and you need to state which layout (map type) you want the map displayed with. For example, we have the roadmap layout, which is the map we all know. We also have the satellite layout, which is more like Google Earth, and the hybrid layout, which is like the satellite layout but with an additional layer. Well, let's wait for the internet to load. Let's get the cable. This is the last layout, the terrain layout, which is almost like the roadmap layout but highlights terrain features. And the one I wanted to show you before, the hybrid layout, is just like the satellite one but with an additional layer that shows roads and place names. Once we have created our plot, we go to the first example, in which I will try to recreate the plot from Inside Airbnb. Now that we have the roadmap layout, we need to embed data on top of it. This is done by creating a regular scatter plot in which we assign visual properties to each point. First, we select a suitable color map to mimic the plot we are replicating: a diverging green-white-red color map, from which we take three colors. Then we only need to create a scatter plot whose X and Y coordinates are the longitude and latitude of each listing.
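The scatter-plot step can be sketched in plain Python. This is a minimal illustration, not the notebook's code: it assumes each listing is a dict with the Inside Airbnb columns latitude, longitude and room_type, and the hex colours in the hypothetical ROOM_TYPE_COLORS mapping are arbitrary picks that only loosely follow the green/red/blue scheme of the original map. It builds the x/y/colour columns a map scatter needs (x is longitude, y is latitude) and computes a default map centre as the mean coordinate.

```python
from statistics import mean

# Hypothetical colours per Inside Airbnb "room_type" value, loosely
# following the green / red / blue scheme of the Inside Airbnb map.
ROOM_TYPE_COLORS = {
    "Entire home/apt": "#2ca02c",  # green
    "Private room": "#d62728",     # red
    "Shared room": "#1f77b4",      # blue
}


def build_scatter_columns(listings, default_color="#7f7f7f"):
    """Turn listing dicts into column lists for a map scatter plot.

    x is longitude and y is latitude, the order a map plot expects.
    Unknown room types fall back to `default_color` (grey).
    """
    columns = {"x": [], "y": [], "color": []}
    for row in listings:
        columns["x"].append(row["longitude"])
        columns["y"].append(row["latitude"])
        columns["color"].append(ROOM_TYPE_COLORS.get(row["room_type"], default_color))
    return columns


def map_center(listings):
    """Return (lat, lng) of the mean listing position, a simple default centre."""
    return (
        mean(row["latitude"] for row in listings),
        mean(row["longitude"] for row in listings),
    )
```

With Bokeh, these columns could be wrapped in a ColumnDataSource and drawn on a figure created with bokeh.plotting.gmap and bokeh.models.GMapOptions(lat=..., lng=..., map_type="roadmap", zoom=...). That part needs a real Google Maps API key, so it is left out of the runnable sketch.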
And we get a pretty cool plot which, given how little code it took to build, looks pretty similar to the one we had here. Of course, it can be improved with filtering and more elaborate tooltips, but I think it's pretty good. We're working with a number of points that a browser can manage easily, but sometimes it comes in handy to aggregate data over regions. Since we cannot trust the region data embedded in the data set, we have to download a shapefile with the geographic data of all the regions in Spain. A shapefile is a vector format for representing geographical data. Each shapefile can contain many records, and each record contains both a geometry and a set of attributes. The geometry is handled with Shapely, a Python library for working with planar geometric objects, and the attributes are stored as a Python dictionary. As we are getting all the regions in Spain, we want to filter out the ones that do not belong to Mallorca. Looking at the record attributes, we find this NUTS 3 code, which turns out to be a standard way of identifying geographical regions in Europe. Here you can see all the NUTS level 3 codes in Europe, and it turns out that Mallorca is assigned the code ES532. So we only need to iterate through our records and store the ones that belong to Mallorca. Now we have, on the one hand, all the shape records corresponding to the island and, on the other, a lot of GPS coordinates. If we want to match one with the other, we have to think of some way to do it. What will we do? We will use Shapely to create a Point object from the GPS coordinates of each listing.
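The filtering step can be sketched as follows. This is a hedged illustration: it assumes the records have already been read into (attributes dict, geometry) pairs, and the attribute names NUTS3 and NAME are hypothetical, since the real field names depend on which shapefile is downloaded. If the file is read with pyshp, such pairs could be built from shapefile.Reader(path).shapeRecords(), using each item's .record.as_dict() and .shape (as.dict() is available in recent pyshp versions).

```python
# NUTS 3 code of Mallorca, as stated in the talk.
MALLORCA_NUTS3 = "ES532"


def filter_by_nuts3(records, code=MALLORCA_NUTS3, field="NUTS3"):
    """Keep the (attributes, geometry) pairs whose NUTS 3 attribute equals `code`.

    `field` is a hypothetical attribute name; check the real shapefile's
    field list before relying on it.
    """
    return [(attrs, geom) for attrs, geom in records if attrs.get(field) == code]


def region_names(records, name_field="NAME"):
    """List the (hypothetical) name attribute of each kept record."""
    return [attrs.get(name_field) for attrs, _ in records]
```

Keeping the geometry untouched next to its attributes means the filtered list can be handed straight to whatever does the plotting or the point matching later.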
Once we have created this point, we can calculate the distance between it and each of the regions we have stored. When we find one whose distance is zero, we store the name of that region in a column of the data frame. This way, we can aggregate the data by the regions we just calculated. Now that we have the shapefiles of the regions we are interested in plotting and the aggregated data, we will use HoloViews and GeoViews to plot it. Please raise your hand if you know what HoloViews is. No one? Oh, great. And GeoViews, have you ever used it? Okay. They are two really cool libraries that make really complex visualizations easy, and they can use Matplotlib, Bokeh or Plotly as a backend. GeoViews is an extension of HoloViews that also works with geographical data. So we load HoloViews and GeoViews and create a Dataset from the aggregated data frame. Now we only have to pass the shapefiles we want to plot, the dataset, and a series of attributes that define the visualization. In this case we choose "on", the common key on which the data is joined, a key from the shapefile attributes and a column in our data frame; "value", the data frame column used to assign a color to each region; "index", a list with the names of the columns shown in the tooltip; "group", which sets the name of the plot; and finally "crs", the projection we are using, in this case PlateCarree. Once we have done that, we can use cell magics to set the visual options of the plot: for example, we can remove the axes, use a hover tool, set the size of the plot, or choose how the colors are mapped.
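The talk matches listings to regions with Shapely by looking for a region at distance zero from each point. The sketch below swaps in a plain ray-casting point-in-polygon test so that it runs without dependencies; it is a substitute technique, not the notebook's code. It assumes each region is a single ring given as a list of (lon, lat) vertices (no holes, no multipolygons), and the names point_in_polygon, assign_regions and count_by_region are hypothetical. With Shapely, the equivalent check would be polygon.contains(Point(lon, lat)), or polygon.distance(point) == 0 as in the talk, which also counts points lying exactly on a boundary.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the ring given as [(x, y), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Each crossing to the right of the point flips inside/outside.
            if x < x_cross:
                inside = not inside
    return inside


def assign_regions(listings, regions):
    """Add a 'region' key to each listing; `regions` maps name -> polygon."""
    for row in listings:
        row["region"] = None
        for name, polygon in regions.items():
            if point_in_polygon(row["longitude"], row["latitude"], polygon):
                row["region"] = name
                break
    return listings


def count_by_region(listings):
    """Number of listings per region, skipping listings outside every region."""
    return Counter(row["region"] for row in listings if row["region"] is not None)
```

The counts returned by count_by_region are the kind of per-region aggregate that is then joined to the shapes and used for colouring.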
Here we can see how we started with a scatter plot of lots of points and were able to map them into this shape plot, where each region is colored according to the number of listings it has. We can do more with this, like aggregating the data and coloring by different properties. Here we have a comparison, perfect: the number of listings in each region of Mallorca against the median price per day in each region. For example, here, Palma de Mallorca, the capital, has the most listings but also turns out to be the cheapest. So if you have a lot of money and want to be alone, just go all the way to the northern part of the island. And now, before moving on to Datashader and big data, I wanted to show you something cool that my friend B.L. Stella did. He used the Barcelona data set and various other data sources to figure out how many Airbnb listings there are compared to the total number of homes in Barcelona. Airbnb usually states that it cannot influence the real estate market because its listings are a really small proportion of the total real estate market. That is true unless you group the data by neighborhood. So he first created a plot of all the neighborhoods in Barcelona, then merged the different data sources and calculated the proportion of Airbnb listings relative to the whole real estate market. And we find that, okay, it's true that if we take the whole city, the proportion of Airbnb listings is really low, but in the center of the city, for example in El Barri Gòtic, 14% of all the homes can be rented on Airbnb, and in the neighboring neighborhoods the proportion is about 10 or 11%. So it really has a big impact if almost one in ten homes in your neighborhood is occupied by tourists through Airbnb.
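Both comparisons above reduce to small aggregations, sketched here under stated assumptions. The code assumes each listing already carries a region (for example from the point-matching step) and a numeric price; Inside Airbnb's raw price column may need parsing into numbers first. The housing-unit totals must come from other data sources, as in the Barcelona analysis, and the numbers in the check below are made up purely to exercise the functions.

```python
from collections import defaultdict
from statistics import median


def median_price_by_region(listings):
    """Median nightly price per region, ignoring listings with no region."""
    prices = defaultdict(list)
    for row in listings:
        if row.get("region") is not None:
            prices[row["region"]].append(row["price"])
    return {region: median(values) for region, values in prices.items()}


def airbnb_share(listing_counts, housing_units):
    """Percentage of each neighbourhood's housing stock listed on Airbnb.

    Neighbourhoods with no housing units are skipped to avoid dividing by zero.
    """
    return {
        name: 100 * listing_counts.get(name, 0) / total
        for name, total in housing_units.items()
        if total > 0
    }
```

Using the median rather than the mean keeps a handful of luxury villas from dominating a region's colour.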
Well, now that we've finished with HoloViews and GeoViews, let's talk a bit about Datashader. How many of you know Datashader? Have you ever heard this word before? Whoa. Nice. Datashader is a really good package meant to make rendering big data easy. There are a lot of examples in the form of Jupyter notebooks that you can check here, but I will explain a bit of its inner workings. It mainly consists of a three-step pipeline that turns big data into images. How is this accomplished? First, there is a projection step, in which we define an image container. This image is treated as a sort of two-dimensional histogram and divided into bins. In the next step, the data is aggregated in a meaningful way, such as a count, or any aggregation function you want, in order to map the data into the bins we created in the first step. And finally, we choose how the visual properties of each bin are displayed. Using these three steps, we get an image that contains all the data we wanted, aggregated. For example, to show a Datashader representation of the listings in Mallorca, we first define a canvas as the projection, then we call cvs.points with the listings, the X and Y columns, and the function we will use to aggregate the data. Finally, we pass the aggregated data to the shading step and select the color map, in this case hot, and indicate that we want the colors mapped logarithmically to the bins. I also chose a light gray background so it's easier to see. And this is what we get. To better understand how this binning process works, let's compare a regular scatter plot with a Datashader image. One moment... okay.
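The three-step pipeline can be illustrated with a dependency-free sketch. This is not Datashader's implementation, only a toy version of the same idea: project points into a width x height grid of bins, aggregate by counting, then shade counts to 0-255 intensities on a log scale. In Datashader itself the corresponding calls are roughly ds.Canvas(plot_width=..., plot_height=...), cvs.points(df, "x", "y", ds.count()) and tf.shade(agg, cmap=..., how="log"); the function names below are hypothetical.

```python
import math


def project(xs, ys, width, height, x_range, y_range):
    """Step 1 (projection): map each point inside the ranges to a (col, row) bin."""
    (x0, x1), (y0, y1) = x_range, y_range
    for x, y in zip(xs, ys):
        if x0 <= x <= x1 and y0 <= y <= y1:
            # Clamp so points exactly on the upper edge land in the last bin.
            col = min(int((x - x0) / (x1 - x0) * width), width - 1)
            row = min(int((y - y0) / (y1 - y0) * height), height - 1)
            yield col, row


def aggregate_count(bins, width, height):
    """Step 2 (aggregation): count the points that landed in each bin."""
    grid = [[0] * width for _ in range(height)]
    for col, row in bins:
        grid[row][col] += 1
    return grid


def shade_log(grid):
    """Step 3 (shading): map counts to 0-255 intensities on a log scale.

    Empty bins stay 0, which a renderer would draw as the background colour.
    """
    peak = max(max(line) for line in grid)
    if peak == 0:
        return [[0] * len(line) for line in grid]
    scale = math.log1p(peak)
    return [[round(255 * math.log1p(count) / scale) for count in line] for line in grid]
```

The log scale is what keeps sparse bins visible next to very dense ones: with a linear scale, a bin holding two points beside one holding thousands would be almost indistinguishable from empty.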
As you can see, in the scatter plot a high density of points is just represented as a blue mass, whereas Datashader changes the color according to the density of points. And if we zoom in, we can see how the data is mapped into bins. Each of these bins is created with a heuristic, and each one is colored according to the number of points that fell into it. Of course, this pixelated effect is aesthetically unpleasant. So, if you set this dynamic attribute to True, the whole Datashader pipeline is recalculated each time, and when you zoom in, the bins automatically adjust to the data included in the image. Keep in mind that the coloring also takes into account only the points included in the current image, not the whole data set, so when you zoom in you can see how the map changes according to the points that are displayed. Now that we know how to display big data with Datashader, let's see how it can be overlaid on top of a map. Unfortunately, here we do not have Google Maps, so we cannot use the Google map plot like we did before, and we have to find some other way. In this case, we need a few steps, because we will be using something called tile sources: basically a class that, if you pass it a proper URL as a parameter, downloads an image each time it is called. So in this case, by creating an ArcGIS tile source with WMTSTileSource and passing it this URL, we later get an image of the map. The downside is that it uses a different cartographic projection, so if you want to use OpenStreetMap instead of Google Maps, you have to transform the coordinates from one projection to the other.
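The coordinate transformation can be sketched with the standard spherical Web Mercator formulas: x = R·λ and y = R·ln(tan(π/4 + φ/2)), with λ and φ in radians and R the WGS84 semi-major axis. This scalar version is a hedged illustration, not the talk's DataFrame function; Datashader also ships datashader.utils.lnglat_to_meters, which can be applied to whole columns at once. The Mallorca bounding box below is a rough, hand-picked example and not taken from the talk.

```python
import math

# Semi-major axis of the WGS84 ellipsoid in metres; Web Mercator (EPSG:3857)
# treats the Earth as a sphere of this radius.
EARTH_RADIUS_M = 6378137.0


def lnglat_to_meters(lon, lat):
    """Convert one longitude/latitude pair (degrees) to Web Mercator x/y (metres)."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def bbox_to_ranges(lon_min, lat_min, lon_max, lat_max):
    """Turn a lon/lat bounding box into (x_range, y_range) tuples in metres."""
    x0, y0 = lnglat_to_meters(lon_min, lat_min)
    x1, y1 = lnglat_to_meters(lon_max, lat_max)
    return (x0, x1), (y0, y1)


# Rough, hand-picked bounding box around Mallorca (illustrative values only).
x_range, y_range = bbox_to_ranges(2.3, 39.2, 3.5, 40.0)
```

Working from a lon/lat bounding box is one way to reduce the trial and error of finding the display ranges: pick the box in degrees, then convert it to the metre ranges the map plot expects.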
In this case, we have created a function that takes a DataFrame as input and transforms the longitudes and latitudes from the PlateCarree projection to the Google (Web) Mercator one. Creating a map plot with Datashader is quite similar to the way we did it with Google Maps, but instead of the center of the map, you have to define the ranges of geographical coordinates you want displayed. It takes a bit of tweaking and it's not easy, but once you have found the right numbers, it looks like this. This map is not as cool as the Google Maps one, but at least you can still zoom in, and the whole pipeline is recalculated each time. And the more you zoom in, the cooler the map gets. Keep in mind that even though these are just 17k points, the time it takes to aggregate and render the data is pretty much the same from this size up to 100 or 200 megabytes. Building an image with big data, like the NYC taxi data set included in the examples I showed you before, or the census data set from the United States, is actually quite fast. It's a bit painful to install all these examples, but there is a really nice tutorial on how to do it. It takes a while to download all the data sets, but you should totally check them out, because it's totally worth it. But what are the downsides of using Datashader to work with big data in the notebook? Unfortunately, interactivity does not work very well unless you use the options included in Datashader to build widgets, but then you don't have total control over what you are plotting.
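The "recalculated on every zoom" behaviour can be illustrated with a small sketch. As far as I understand it, HoloViews' datashade operation with dynamic=True re-runs the pipeline with the plot's current x/y ranges; the hypothetical aggregate_view function below mimics that by binning only the points inside the current view. The view ranges can be in any consistent units, such as the Web Mercator metres used for tile maps.

```python
def aggregate_view(points, x_range, y_range, width, height):
    """Count the points that fall inside the current view on a width x height grid.

    Only points inside x_range/y_range are binned, so both the bin size and the
    maximum count (what the colours get normalised to) depend on the zoom level.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if x0 <= x <= x1 and y0 <= y <= y1:
            c = min(int((x - x0) / (x1 - x0) * width), width - 1)
            r = min(int((y - y0) / (y1 - y0) * height), height - 1)
            grid[r][c] += 1
    return grid, max(max(line) for line in grid)
```

Zooming into a dense cluster shrinks the bins and lowers the peak count, so the remaining points are spread across the colour map instead of being lost in one saturated bin.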
For example, I tried to build a dashboard here that shows how aggregating data works with Datashader. If you just use this image, it works quite well, but if you want to change how you aggregate the data, for example aggregating by the maximum price of the listings that fall inside each bin, you get errors that you cannot silence until you get it right and the image is recalculated. Also, please do not use the reset tool, because it messes everything up. Well, I could talk a bit more about GeoJSON data and show you some examples, but I would really like you to ask questions. So I will finish my talk here and let you ask whatever you want. I'm not really an expert on this topic, so if you think of a difficult question, I won't be able to answer it and you can laugh at me. So, thank you very much for being here; it's been a pleasure talking about maps. Thank you. I would really like you to ask questions. I can keep talking if you want, because it's only been about 27 minutes and I have more material, but please, come on, participate a bit. There are a lot of people in here. So, what would you like to know about plotting data on maps? Hands up, please. Stand up. That's better. Thank you very much, it was very useful. My question is about the shapes: when you painted the shapes of, I think, the different regions of Mallorca with the density of points in each area. I had to do that at world scale, and in the files I got, the precision was very high and the data was massive, so it was impossible to plot anything. I mean, for the shape of the United States, I don't know how many points there are, but if you have one point every two meters, it's huge, and I never managed to plot anything with that data.
Do you have any way to compress this, or did you have to use anything like that? I think if you're using shapefiles, you are screwed, because I also tried to do that with all the regions in Spain, which is not a lot, and it was totally impossible: I got a lot of lag and the browser just cannot handle so many points. So you have to resort to Datashader, and the good thing is that it can be used to plot data from all over the world. It actually has an example that uses satellite data and shows really well how all this works. But yeah, unfortunately, if you are using shapefiles, you are screwed, so you have to figure out some other way. You could aggregate the data and try to create an image with Datashader from the aggregated data, but it won't be easy. So maybe you should contact James Bednar, the main maintainer of these packages, and ask him how to do it. They answer really fast on GitHub: when you open an issue, it only takes a few hours until they respond. So, first, a question. It's a Python conference, so you presented a Python solution. Is that the best technology to display this kind of information? I did not understand you, because when you talk into the mic there, it reverberates a lot. Can you please come down and ask me? It's a Python conference, so you presented a Python solution. Is that the best technology to display this, or is there some alternative? It depends on what you want to do. If you're using the notebook, you can achieve almost anything, but if you are a real pro and have a lot of investors worried about how you're displaying the data, maybe you should use JavaScript, because it will be easier to do really handmade JavaScript tricks that are specific to the kind of representation you want.
But if you want to build something a bit more general really easily, just fall back to HoloViews, because it's really easy and it takes only a few lines to do something really complicated. For example, take the plot here: a few years ago I tried to do something similar by plotting all the patches, based on an example in the first versions of Bokeh about the census of the United States, and it was 70 lines of code that had to be really carefully thought out, because it was very easy to mess up. So this is getting easier and easier every day. Right now the most pro option is JavaScript, but if you wait a couple of months or years, you will get the same things here in the notebook. This can also be used with the Bokeh server, so if you don't like notebooks, you can try to do it there. It's a pain, but it can work. Come down, I can barely hear you. Yeah, I would definitely use this to prototype the visualizations, and if you need something really specific, you can resort to JavaScript. But even with things like the tooltips, Bokeh lets you play a lot: you can embed arbitrary HTML code inside them. So almost anything can be accomplished, but yes, if you are a pro, go to JavaScript. For the next question, can you come down? It's much better. We need the question on the video recording, and he also needs to understand you, so anyone with questions, please come down. Can you show us an example of how to visualize time series data? Yeah, I don't know how to load it here, but there are a few examples. For example, here there's a visualization, let me zoom in a bit, of the population of some cities. You just input a properly formatted data frame, and as I told you before, there are custom widgets that can be used with HoloViews, so you can use them to visualize how things evolve over time on top of a map.
Depending on the specific kind of visualization you want to do, I can point you to some resources, but here you can find a few things about how to do that. Also, in the HoloViews examples, I think there was... well, by merging these two examples you should be able to work with data on maps and make cool dashboards. We can talk later about it if you want. Hello, thank you for an inspiring talk. My question may be unrelated to geodata: is there any convenient way to create a custom map, for instance from a TIFF texture, to get a custom map view for some game data, like game level data? I mean, is there a way to create a custom map, load it, and visualize the data over it? Yeah, what kind of custom map are you referring to? For instance, you have an in-game level and player activity data that you need to visualize, so you have an actual geometry with some coordinates. Is there any way to load such a level into these tools? Yeah, there are many different ways to deal with these things, but there's a package called Cartopy, which allows you to get images in whatever coordinate representation you need. Let me check for an example. Here there are different types of visualizations using Cartopy; for example, here you have the whole world map plotted with Matplotlib. Taking into account all the weird projections, I'm not sure if it works with Bokeh; maybe you have to do it by hand, converting from one coordinate system to the other and then plotting them as a scatter plot. But Cartopy has not only everything you need to switch from one coordinate system to another, but also stored visual representations of maps, so, in a similar way to what we did with the Google map plot, you can select how you want your maps to be displayed. Wait a moment, let's see if I can find...
Yeah, here you have a lot of different map representations, so it's quite possible that you'll find the one you need, and if you don't, write to James Bednar and bother him. Okay, can you please come down for questions? Do you have one? No? You want me to talk to you about GeoJSON data? No, that's just a joke, I won't. Is it possible to integrate OpenStreetMap with Bokeh, for instance? It should be possible. I have not tried it, because I think Google Maps are way cooler, at least at the point they are at now, but I can look into it and tell you later. I'll try to search for it; I have never done it myself before. Okay, next question. Any questions? Okay, can you come and speak into the microphone, directly to him? Otherwise it's difficult for him to understand. Yeah, the mic is for the people on YouTube, but if you use that one I cannot hear you well from here. If you put data into Bokeh, what do you get? Is it some kind of JavaScript? I didn't really understand. I'm not understanding your question. I saw those Bokeh calls, something like this, and a map mysteriously appeared. What do you get if you put data into Bokeh? Is it some kind of JavaScript that you can put on a website? Yes. Well, if you want to study all this Airbnb data thoroughly, first go to my friend's repository, this one; they did an in-depth study of how Airbnb data affects the market. And if you want to plot it outside the Jupyter notebook, you have to use the Bokeh server. I don't like it, because it's really difficult to configure, at least for me, but it can work really well if you know what you're doing. So you could be using Datashader, HoloViews and GeoViews with Bokeh as a backend, using the Bokeh server to render the plots, and then sending them as HTML and JavaScript to a regular website.
Because Bokeh uses JavaScript to render in the browser, a Bokeh plot served by the Bokeh server looks like regular HTML code that you can embed in your website. Last chance for a question? Yeah, you can find all these notebooks on GitHub, so when you want to plot maps, just clone the repo and copy-paste, and if you want to know the findings from my friend's analysis of the Airbnb data, go to this repository. Mine will be in my... what is this one? My username on GitHub is ... and the project is Insight Airbnb, and there you will find everything I explained here. Please come down to the side and speak into the microphone, directly to him. I wanted someone to ask me this. I did not forget to put the data into the repository: it was too big for a GitHub repo, so I uploaded it to my Google Drive account. I don't know the link right now, but I will put it in the repository, okay? It's not here, see? I have all the data in the zip file that I will commit to my repository when I finish talking, in about 10 minutes or so, okay? Last chance. Otherwise, take him for a coffee or something. It's been a pleasure having you as an audience, because this is the most questions I have been asked in my entire life, so thank you very much, everyone.