Thanks for the introduction. I'm talking about geodata and open source. I specifically wrote "open source modules" because there are some non-open-source modules doing the same, so we do it the open source way. You see some fancy 3D pictures on the first slide. I have to disappoint you: today we stay 2D. It will be a very basic introduction to processing geodata.

But what exactly is geodata? It's quite simple. There are some standards, ISO/TC 211 etc., but the most important thing is that it has an association with a location on the Earth. Okay, you could say another planet too, but let's stay on the Earth today.

There are many different geographic information systems available today. I want to highlight two: one is ArcGIS by Esri and the second is QGIS. Both can be extended using Python. However, I will not talk about geographic information systems today. I want to show you how to process geodata using Python in a Jupyter notebook. The difference is quite simple: in a geographic information system you capture, store, manipulate, analyze, manage and present data. In a Jupyter notebook I can do exactly the same, but today I want to focus on the manipulating and analyzing part, and a little bit on the visualizing part.

So what do we have today for modules? There are tons of modules available, but the two most important open source libraries (I say libraries because they are not Python, they are written in C++) are GDAL and GEOS. GDAL is the Geospatial Data Abstraction Library. It's almost 20 years old now and it can do two things very well: it reads raster data, actually images, and it reads vector data. And GEOS is the Geometry Engine Open Source. It's also written in C++.
Actually, it's a port of a Java engine called the Java Topology Suite, ported to C++. GDAL does have Python bindings, but basically we have functions, functions, functions, and it's not the Python that we love here. So recently some nice Pythonic libraries were introduced: one is Rasterio, which reads raster data, and the other one is Fiona, for vector data.

But before I come to the libraries I want to show you what exactly GDAL does. It's a little bit small, but I can make it bigger. This is the official page of GDAL, the C++ library with the Python bindings, and you see here there are some 155 different raster formats supported and 95 vector formats. If you open the list you see there are tons of these raster formats. Some you may know a little bit, for example GeoTIFF. TIFF is an image format, but it can be extended for geodata, so you have certain information in there to place the image at the correct position. It's the same for vector formats: there are tons of vector formats available. Maybe you've heard of KML or other ones. You see there are really tons, and this library can abstract them all so you can read them. With Rasterio, which was created by Mapbox, you have a Pythonic way to access all these kinds of formats.

Then we have Shapely, which is based on GEOS, the Geometry Engine Open Source. And from these libraries and some more, like NumPy etc., there is a library called GeoPandas. I'm sure you know pandas, and there is an extended version of pandas for reading geodata. May I ask who of you has already used GeoPandas? Also quite a few. Not a third, but maybe a fourth.
Okay, so I don't want to stay with slides; I want to go to the Jupyter notebook. You can download it here. You can take a picture, or I will put it on Twitter later, or on the EuroPython page. What you need is to install all these modules. It's quite easy: most are in the conda repository, and one you have to get from conda-forge, Folium. It should work with Python 3.6. I didn't try it on Windows recently. I know it works with 3.5 under Windows, so if you use Windows, maybe you have to switch to 3.5. It doesn't work yet with 3.7. Because I like f-strings, you have to adapt it a little bit if you want to bring it to 3.5.

So let's go to the Jupyter notebook. You go to GitHub, download it, and then you get this one. I will zoom in a little bit; it's quite small. Can you read this? Yeah, I think it's okay at the back. Okay, you can install these things, but I assume you already installed them when you open the Jupyter notebook. Oh, by the way, you really need the notebook. It doesn't work properly in JupyterLab yet, at least on my machine; I had some problems with images.

So let's look at what we get in Shapely. Shapely, based on GEOS, supports several geometries: points, line strings, linear rings, polygons, and multi-line strings, multi-points and multi-polygons. The difference is that a multi-point, for example, just has several points, so you have one abstraction for all of them. So let's build a polygon, for example.
So I create a very simple polygon: from shapely.geometry import Polygon. You could do the same with points, line strings, etc., but I decided to use a polygon. Then you give the coordinates in this list of tuples and, very important, the first and the last coordinate are the same, because the polygon is closed. So you have to specify that. Then we can do some operations, for example area and length of the polygon, and you get this.

One nice thing: if you are in Jupyter, you can just display it within the notebook. But I have to say, this is just for debugging purposes. You can also display it using matplotlib and others, but this is simple: just give the result and it will be displayed in the notebook. So I create a second polygon. You also see that the scale is not quite consistent between these two. As I said, it's just a debug output.

So we have two polygons, and then I could make, for example, a union of the polygons. You see you have a new polygon as the output; you can display it, you can access its coordinates, etc. You could also do an intersection of both. Okay, that looks a little bit strange, but if you look at it, it's the inside part, the part the polygons share. So it's a triangle, the intersection. And one very nice thing is the symmetric difference: you take the difference of the two polygons and you get a new polygon back, which looks like this. So Shapely is really quite advanced. And of course, from these results, since each is a new polygon, you can get the length and area again.

Okay, so maybe you want to see the output. There is a format called WKT, well-known text. It's actually a standard from the Open Geospatial Consortium, actually part of a standard, but I'm not going into details here. It lets you use text to represent all these shapes.
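The polygon operations described above can be sketched roughly as follows. The coordinates are made up for illustration (two overlapping squares, not the shapes from the notebook), so the results differ from what was shown on screen.

```python
from shapely.geometry import Polygon

# Two overlapping squares (illustrative coordinates, not from the talk).
# The first and last coordinate of each ring are identical, so the ring is closed.
poly1 = Polygon([(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)])
poly2 = Polygon([(2, 2), (6, 2), (6, 6), (2, 6), (2, 2)])

# Basic measurements
print("area:", poly1.area, "length:", poly1.length)

# Set-like operations each return a new geometry
union = poly1.union(poly2)
common = poly1.intersection(poly2)
sym_diff = poly1.symmetric_difference(poly2)

# The results are geometries again, so area and length are available
print("union area:", union.area)
print("intersection area:", common.area)
print("symmetric difference type:", sym_diff.geom_type)

# Well-known text (WKT) representation of a result
print(sym_diff.wkt)
```

In a Jupyter notebook, leaving a geometry such as `union` as the last expression of a cell renders it as a small debug image.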
So in the case of this symmetric difference you get back a MultiPolygon, because it's more than one polygon, and here are the coordinates. The type of this, by the way, is just a string, so it's really serialized as a string. You can also load it again from this string. So if you type in this polygon as a string, or you have some file you can import it from, you can use wkt.loads ("loads" with an s, for string) and you have the polygon back. So it's a pretty easy way to dump and restore a polygon from text. By the way, Shapely doesn't support many formats, actually just these serialization formats. You can also use GeoJSON with some tricks, but basically the best way is using these well-known text representations.

There are also some very interesting binary operations available. You can check if a point, for example, is within the polygon, or you can check if a polygon is within a polygon. You can check: does it intersect, does it cross, is it equal, does it touch, and so on. It just gives back True or False. So I ask whether polygon one intersects polygon two; polygon two intersects polygon one is the same in this case. Yes, they intersect. And is it within? No, it's not within. And is polygon one equal to polygon one? Yes, of course it's equal. So you can do these kinds of tests.

So I'm leaving Shapely for now. We will come back to it later. Let's go to Fiona. Fiona, as I said before in the graphic, is based on the GDAL library, and it can do more than just operations: it can read formats. You saw these almost 100 vector formats before; we can load from them quite easily. You just import fiona and open the data file. I used an Esri shapefile; it's a very popular format for vector data. I read it (I could also write a new one), and then we can iterate through these datasets.
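As a small sketch of the WKT round trip and the True/False tests described above, again with made-up coordinates rather than the notebook's polygons:

```python
from shapely import wkt
from shapely.geometry import Point, Polygon

# Illustrative polygons (not the ones from the talk)
poly1 = Polygon([(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)])
poly2 = Polygon([(2, 2), (6, 2), (6, 6), (2, 6), (2, 2)])

# Dump to well-known text: the result is a plain Python string
text = poly1.wkt
print(type(text), text)

# Restore the geometry from the string ("loads" = load from string)
restored = wkt.loads(text)

# Binary tests return True or False
checks = {
    "poly1 intersects poly2": poly1.intersects(poly2),
    "poly2 intersects poly1": poly2.intersects(poly1),
    "poly1 within poly2": poly1.within(poly2),
    "poly1 equals restored": poly1.equals(restored),
    "point within poly1": Point(1, 1).within(poly1),
}
for label, result in checks.items():
    print(label, "->", result)
```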
Let's just look at the first one with next. I iterate through the whole dataset, so with next I always get the next one. It's an airport dataset from Natural Earth; they have a public domain shapefile with many airports in the world. The first airport in it is based in India, and you see there are many attributes together. So we don't only have the position. Like we had a polygon before, now we have a point, and we have its coordinates: this is the coordinate of the airport. It's WGS 84 (we'll come to that shortly), so it's a geographic coordinate. So it's an airport, it's a small airport, the name is Sahnewal, or however it's pronounced. There is an abbreviation, and there's an IATA code, this one you know from booking flights. Does anyone know the code of Edinburgh? EDI, okay. Yeah, we'll come to that shortly. And you see you can just access it: it's basically a dictionary, so you can access these attributes, the properties, the standard way you access dictionaries.

We actually have two ways to open this airport file. One is creating a list from this collection: you see, I wrap the open in a list and get a list of all airports. The big problem with that is, if you have a huge dataset, you will run out of memory. So the second way, which we already did, is to just iterate through it using a for loop. I used a with statement so I don't have to close the dataset: the airport shapefile as c, c like collection. Then I loop over each airport in this collection and check if the IATA code in the properties is Edinburgh's, and if it is, I print the name, the coordinates and the Wikipedia link. Yeah, the Wikipedia link is also stored in this dataset.
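A rough, self-contained sketch of the loop described above. Since the Natural Earth shapefile is not bundled here, the example first writes a tiny stand-in GeoJSON file. The two features, their attribute names (`name`, `iata_code`, `wikipedia`) and their values are assumptions chosen to resemble the airport dataset, not a copy of it.

```python
import json
import os
import tempfile

import fiona

# Stand-in data: two hypothetical airport records with approximate coordinates
features = [
    {"type": "Feature",
     "properties": {"name": "Edinburgh Int'l", "iata_code": "EDI",
                    "wikipedia": "http://en.wikipedia.org/wiki/Edinburgh_Airport"},
     "geometry": {"type": "Point", "coordinates": [-3.37, 55.95]}},
    {"type": "Feature",
     "properties": {"name": "Zurich Int'l", "iata_code": "ZRH",
                    "wikipedia": "http://en.wikipedia.org/wiki/Zurich_Airport"},
     "geometry": {"type": "Point", "coordinates": [8.55, 47.46]}},
]
path = os.path.join(tempfile.mkdtemp(), "airports.geojson")
with open(path, "w", encoding="utf-8") as f:
    json.dump({"type": "FeatureCollection", "features": features}, f)

# Iterate feature by feature instead of building a list of everything,
# so large files do not have to fit into memory.
matches = []
with fiona.open(path) as collection:
    for airport in collection:
        props = airport["properties"]
        if props["iata_code"] == "EDI":
            coords = tuple(airport["geometry"]["coordinates"])
            matches.append((props["name"], coords, props["wikipedia"]))

for name, coords, link in matches:
    print(name, coords, link)
```

With the real shapefile, only the path passed to `fiona.open` would change; the attribute names should be checked against the file's schema first.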
So I run this. See, it was quite fast. You get a coordinate, Edinburgh International Airport. You could go to the Wikipedia page and look for more information there.

So we saw coordinates, but you know, the Earth is not flat. Unfortunately; it would be much easier. So we have different coordinate systems on the planet; actually every country has at least one coordinate system. But for this dataset a global reference system is used, WGS 84. You can check which reference system is used with collection.crs, CRS for coordinate reference system. You get EPSG 4326. EPSG stands for European Petroleum Survey Group. They created numbers for every coordinate system, and 4326 is WGS 84. You can check it here; it's a nice web page, epsg.io, and you can just add a slash and the number and you get the information. So it's the WGS, World Geodetic System 1984, used in GPS. And you see the coverage: it's the whole planet. I'm not going into details on projection systems today due to the time restrictions of this talk. Oh, I'm already behind schedule. If you are interested you can check out pyproj, with which you can transform from one coordinate system to another.

So let's do something. We open a dataset again, a different one: the admin 0 countries shapefile from Natural Earth. Admin 0 means it has all countries of the world, but it also means, for example, that we only have Great Britain as a whole; we don't have Scotland and England separated. So it's admin 0 countries. We can look at the first one again.
You see this time we have a MultiPolygon for every country in this dataset. We can again access some information, for example the name; the name is also available in different languages. You can check the continent, you can check the population and the population year. And again I loop through all elements in this collection and check if the name is United Kingdom, and then I just print these things. Okay: population, for 2017, and it's a MultiPolygon.

So let's do a quick example. We have two datasets now: the airport dataset and this countries dataset, the admin 0 dataset. So what we can do is say: okay, give me all airports within this UK polygon. It's quite amazing; it's just this. Okay, first we check if this is the UK. I think, yeah, it looks correct. By the way, before, I stored the geometry from this dataset, so I have it here. I have to create the shape from this geometry and then I can look at it. This is basically the same as the Shapely polygon; it actually is a Shapely polygon, so Fiona and Shapely can interact. So in this short code sequence I open the airport file, iterate through all airports and check if the position of each airport is within the UK; then it prints. See, it's quite fast. We can output it. You may see that not all airports are here. This is because the dataset is not really complete, so that's a problem of the dataset, not of the algorithm. But I don't think that in another language you could do this in one, two, three, four, five... okay, with the import, six lines of code. So it's really amazing.

Okay, let's quickly go to Rasterio. It's for raster data, so you have an image file, for example a GeoTIFF. You open the dataset. There are many parameters; basically, when you start, you only need to say that you want to read and give the file name. So it looks like this. I use the Blue Marble dataset of NASA.
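The airports-within-a-country filter described above boils down to turning each GeoJSON-like geometry into a Shapely object and calling `within`. The sketch below uses in-memory stand-ins for what Fiona would yield: the "UK" geometry is just a crude bounding rectangle and the airport records are hypothetical, so only the mechanics carry over to the real datasets.

```python
from shapely.geometry import shape

# Stand-in for the country geometry read via Fiona: a rough rectangle
# around the UK (an assumption; the real data is a detailed MultiPolygon).
uk_geometry = {
    "type": "Polygon",
    "coordinates": [[(-8.0, 49.9), (2.0, 49.9), (2.0, 60.9),
                     (-8.0, 60.9), (-8.0, 49.9)]],
}

# Stand-ins for airport features (names and coordinates are approximate)
airports = [
    {"properties": {"name": "Edinburgh"},
     "geometry": {"type": "Point", "coordinates": (-3.37, 55.95)}},
    {"properties": {"name": "Heathrow"},
     "geometry": {"type": "Point", "coordinates": (-0.45, 51.47)}},
    {"properties": {"name": "Zurich"},
     "geometry": {"type": "Point", "coordinates": (8.55, 47.46)}},
]

# shape() converts a GeoJSON-like mapping into a Shapely geometry
uk = shape(uk_geometry)

inside = [a["properties"]["name"]
          for a in airports
          if shape(a["geometry"]).within(uk)]
print(inside)
```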
It's the whole planet inside, and this is already all: just open, and then I can access all attributes. Name; okay, you know the mode, we read it; count is how many raster bands you have. We have RGB in this dataset, so we have three raster bands, and I can also get the indexes: one, two, three are the indexes to access them. I can get the width and height of the file. I can look at the coordinate reference system again: it's EPSG 4326, so it's WGS 84.

And then we have something very special: the affine transformation. It's basically a matrix with which you can take pixel coordinates and transform them to the coordinate reference system. So, for example, if you transform (0, 0), that's the top left of the image, it has the coordinate minus 180 and 90 degrees. So these are geographic coordinates. You can also do the inverse affine transformation, just with the tilde in front of the affine. So you get this transformation, and if I transform (0, 0), that's basically the middle of the image in a geographic reference system, I get pixel 1,800, 900. Now we can compare it to the width and height, and we see it's in the middle of the picture. We could also see where the Edinburgh International Conference Centre is located in pixel coordinates. So I take the geographic coordinate of here and print my pixels, and you see this pixel represents this conference centre. We can also see the bounds, where the image is located. That's not very special.

What I want to do now is read the RGB data and display it using matplotlib. I want to read it into a NumPy array. We saw there are three bands, red, green and blue, and I have to read them separately. This is because images can have more than three bands, spectral images for example, or only one band, like elevation files. But now we have RGB.
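To make the affine transformation above concrete, here is the arithmetic written out in plain Python. It is a sketch of the math only, not Rasterio's API: the grid layout (3600 x 1800 pixels, 0.1 degree per pixel, top left at -180/90) is inferred from the numbers mentioned in the talk, and the function names are made up. In Rasterio, the equivalent calls would be multiplying the dataset's transform, or its inverse written with a tilde, by a coordinate pair.

```python
# Assumed grid layout for a whole-world image in EPSG:4326
PIXEL_SIZE = 0.1            # degrees per pixel (assumption)
WEST, NORTH = -180.0, 90.0  # geographic coordinate of the top-left corner


def pixel_to_geo(col, row):
    """Forward transform: pixel (col, row) -> (longitude, latitude)."""
    lon = WEST + col * PIXEL_SIZE
    lat = NORTH - row * PIXEL_SIZE  # rows grow downwards, latitude shrinks
    return lon, lat


def geo_to_pixel(lon, lat):
    """Inverse transform: (longitude, latitude) -> pixel (col, row)."""
    col = (lon - WEST) / PIXEL_SIZE
    row = (NORTH - lat) / PIXEL_SIZE
    return col, row


top_left = pixel_to_geo(0, 0)    # the image corner
centre = geo_to_pixel(0.0, 0.0)  # lon 0 / lat 0 lands in the image centre
print(top_left, centre)
```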
So it's quite easy. With dstack I can put them into a representation where I have RGB together. That's the easiest way, and then I can just plot it using matplotlib. We see this takes a while; it's a big image. Yeah, okay. The red dot would be at this position, so it seems to be correct if you look at it.

So let's go to GeoPandas. I still have time, I see, more or less. GeoPandas combines all this. But maybe you ask yourself why I showed you Shapely and all these things. The problem with pandas and GeoPandas is that if you have really huge datasets which don't fit into memory, you have to do this without pandas or GeoPandas. And with geographic data you sometimes have terabytes of data, so you can't really keep everything in RAM.

So first we open a dataset. It's a CSV file; I use pandas for that. It contains all cities with a population greater than 5,000, in UTF-8. It's comma separated. Its origin is GeoNames; it's also an open dataset. We look at the first five rows. We have the name of the city, we have the name in Latin-1, so without special characters, and then we have the name of the city in different languages, the geographic position, and other interesting information: where it is, the time zone and so on, when the record was created. So you see, that's normal pandas output. First I want to reduce it, since we are not interested in all the information. So I take the columns 1, 4, 5 and 14 and give them some names. Now it's much easier to read: I take the name in UTF-8, and latitude, longitude and population.
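The column reduction described above might look roughly like this. The two rows are stand-ins built in memory (only the Edinburgh population figure comes from the talk; the coordinates and the second row are illustrative), and `make_row` is a hypothetical helper used only to produce a file-like input with enough columns. For the real file, the path, separator and encoding arguments would need to match the download.

```python
import io

import pandas as pd


def make_row(name, lat, lon, population):
    """Build one header-less CSV row with 15 columns (hypothetical helper)."""
    fields = [""] * 15
    fields[1], fields[4], fields[5], fields[14] = (
        name, str(lat), str(lon), str(population))
    return ",".join(fields)


raw = "\n".join([
    make_row("Edinburgh", 55.95206, -3.19648, 464990),
    make_row("Basel", 47.55839, 7.57327, 164488),  # illustrative values
])

# Keep only columns 1, 4, 5 and 14 and give them readable names
df = pd.read_csv(io.StringIO(raw), header=None, usecols=[1, 4, 5, 14])
df.columns = ["name", "latitude", "longitude", "population"]

# Quick sanity query
edinburgh = df[df["name"] == "Edinburgh"]
print(edinburgh)
```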
So that's the whole dataset now. Let's do a query just to see if everything is correct: Edinburgh is within this dataset, and we have a latitude, longitude and population. I'm not sure if it is correct: 464,990, probably.

So let's switch to GeoPandas. The main difference with GeoPandas is that you have a column called geometry, and in this geometry column the geometric information is stored, actually as a Shapely polygon or whatever. We can just take this data and convert it. We create a GeoDataFrame, and for every row we create a point from its position, longitude and latitude, zipped together. So we actually have a Shapely point in this geometry column. So a GeoDataFrame has a geometry column, and this consists of these Shapely points. Let's look at it. We see it's basically the same; we just have a geometry column, and that's basically all. So we have a GeoPandas data frame. You could leave the latitude and longitude in, but I drop them. I can display it again quickly: gdf.head(). You see it's without latitude and longitude; they are redundant, since we have the geometry here.

What we can do now is just call the GeoDataFrame's plot, and we see matplotlib output. I should use a semicolon. These are all cities in the world. I think you can imagine the continents; there are no big cities in the water.

We can export that dataset to a shapefile, for example. Just gdf.to_file with the file name and the driver, Esri Shapefile. You could also export it as GeoJSON and some other formats. And, very important, encoding UTF-8, especially if you use Windows. And then you see it in your file system.
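A minimal sketch of the conversion described above, from a plain pandas data frame with latitude/longitude columns to a GeoDataFrame with Shapely points. The rows are illustrative stand-ins, and setting the CRS to EPSG:4326 is an assumption based on the coordinates being geographic.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Stand-in for the reduced cities table (illustrative values)
cities_df = pd.DataFrame({
    "name": ["Edinburgh", "Basel"],
    "latitude": [55.95206, 47.55839],
    "longitude": [-3.19648, 7.57327],
    "population": [464990, 164488],
})

# One Shapely point per row; note the order: x = longitude, y = latitude
points = [Point(lon, lat)
          for lon, lat in zip(cities_df["longitude"], cities_df["latitude"])]

# Latitude/longitude become redundant once the geometry column exists
gdf = gpd.GeoDataFrame(
    cities_df.drop(columns=["latitude", "longitude"]),
    geometry=points,
    crs="EPSG:4326",
)
print(gdf.head())
```

Exporting would then be a call along the lines of `gdf.to_file("cities.shp", driver="ESRI Shapefile", encoding="utf-8")`, where the file name is a placeholder.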
This takes a while, writing the shapefile. Okay, so this really takes a while if you are not on power. You would see it in the Explorer. I'm not showing it due to time, but you would see the cities shapefile is in there. You could open it with QGIS, for example, or another software.

So I stored this GeoDataFrame; I call it cities, just to remember: cities, that's my dataset, kept for later. And we can do some queries on the cities dataset. Let me create a new GeoDataFrame called big cities: I want all cities with a population greater than 10 million. It's this one. You see it's not 100% correct here; it says Shanghai has the biggest population. Again, it's just that this dataset has these numbers. We see the 10 biggest; I sorted them ascending. It's correct.

Okay, let's load a data file again. We take GeoPandas and read_file, and we load the admin 0 countries we used before with Shapely... Fiona, sorry. Then we look at the data. You see I can actually open this shapefile and I get GeoPandas output. Again it's a little bit too big; I just want to reduce it to the name, population and polygon. We could also plot it. I'm not doing it; you just call plot and you see the whole world.

What I want to do now is take the United Kingdom again, now with GeoPandas, or pandas, notation. I look for the country name United Kingdom and we see this result. Now let's plot it. This will use matplotlib, and we see that's the result. You also see it's not high resolution. It's on GitHub; I used the reduced version of the polygons so we don't have to download for hours.

So let's make another query. I make the query rest of the world, which says: all countries where the name is not United Kingdom. And I plot both together. I plot the UK in red and keep the axes that plot returns, then I pass those axes to the rest-of-the-world plot and display it in blue. So you see it actually works.
You can have several datasets together like this. You could have 100 different datasets and plot them. You can try it yourself: take the country you're from, or your favorite country or whatever, and color it in your favorite color.

And now we see we can actually access the Shapely object: you use iloc[0] and access the geometry, and you can actually access the Shapely polygon from within pandas. Now you can also do some more things. You can check if a city is within. So I take this cities dataset; remember, I stored it as the GeoPandas cities. It's all cities with a population greater than 5,000, and I say I would like to have all cities within this UK geometry. Let's display the first five. You see this takes a while: it goes through all these cities. And you see UK cities: it's all cities whose position is within this UK geometry.

Okay, so now let's go to the last interesting module I want to show today. It's Folium. Maybe you know Leaflet.js; it's a mapping library. It's also used for OpenStreetMap: if you open OpenStreetMap you see this map, and it's based on Leaflet. Folium is a Python version, accessing this Leaflet.js. So what you can do is import folium and create a map. The location is the center of the map; I use the Edinburgh conference centre and say zoom level 17, and what it creates is just a map. Yes, the internet is working, great. You see the center of the map is our conference centre. You can also save this map; we'll see that later. Really, in three lines of code you have a map, so it's really easy.

The interesting thing is that you can combine GeoPandas and Folium. So again we have this UK polygon. In Folium we have a function GeoJson, and to this function you just give the GeoPandas data frame and add it to the map.
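The cities-within-a-country query described above can be sketched with two tiny stand-in GeoDataFrames. The country shapes are crude rectangles and the city coordinates are approximate, so this only shows the mechanics: select a row, pull its Shapely geometry out with `iloc[0]`, and use it as a mask via `within`.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Stand-in countries: rough bounding rectangles (assumptions, not real borders)
countries = gpd.GeoDataFrame(
    {"name": ["United Kingdom", "Switzerland"]},
    geometry=[box(-8.0, 49.9, 2.0, 60.9), box(5.9, 45.8, 10.5, 47.8)],
    crs="EPSG:4326",
)

# Stand-in cities with approximate coordinates
cities = gpd.GeoDataFrame(
    {"name": ["Edinburgh", "London", "Basel"]},
    geometry=[Point(-3.19648, 55.95206), Point(-0.1257, 51.5085),
              Point(7.57327, 47.55839)],
    crs="EPSG:4326",
)

# pandas-style selection, then access the Shapely geometry of the first match
uk_geometry = countries[countries["name"] == "United Kingdom"].iloc[0].geometry

# Boolean mask: True for every city whose point lies within the geometry
uk_cities = cities[cities.within(uk_geometry)]
rest_of_world = countries[countries["name"] != "United Kingdom"]

print(uk_cities.head())
```

For plotting both selections on one figure, the axes returned by the first `plot` call can be passed to the second one through its `ax` argument.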
So if we do that, we see our nice polygon is now on this map. Maybe you try that with the original JavaScript library. It's possible too, but it's double the number of lines. So it's really very easy to add this polygon.

And now maybe you want to say: I want to color it differently, every country in a different color or something. That's also very Pythonic here. You just use GeoJson and you can give it a lambda function, or whatever function, that takes the feature, and you can access this feature and set, say, the fill color. You could also write a function here and use the feature again, but I don't want to; I have too many lambda functions already. So I keep it easy and just say green. But if you have a function, you can say: if this feature is country whatever, then color it green or blue or black, whatever. Everything here can be a function too. Then I add it to the map again, and you see I set fill color green, outline color black and a dash array, and it looks like this.

So next: you remember the big cities, the 10 largest cities in the world. What you can do with Folium is create a marker, and iterate through all of them with the apply function of GeoPandas: you take every record and create a marker. You can access the data. So I just do it now, and you see the 10 biggest cities are on the map. You can of course combine everything, markers and polygons, and at the end you can just say save the map: my_map.html. It should be here somewhere... develop, EuroPython... it should be here, and this HTML should be located... oh yes, my_map.html. So I open the web browser, and this is the result.
So this is just one HTML page with this map. I could do the same with the UK polygon. So you can really create HTML of your map, and you could also alter this HTML and extend it using JavaScript. Okay, that's basically what I wanted to say.

One more thing: if you are interested in geodata and Python, I'm organizing the GeoPython conference again next year, and since yesterday I know the date. It's from June 24 to June 26, 2019, in Basel, Switzerland, in a brand new building. Our campus has a new building; it will be finished at the end of this month, hopefully. Actually people are already moving in, the restaurant in there is already open, the first five floors are ready, and we are moving in in the middle of August. But about the conference: it's basically about Python and geodata, but we extend it to computer vision, remote sensing, image processing, machine learning and much more. And it's really easy to reach with public transport. We even have our own train since this year. Okay, so it's time for questions. Thank you very much.

So, do we have any questions?

I'm very excited to see all the manipulations on the shapefiles. It's very exciting that you can do all that heavy lifting so easily. I'm very much interested in this topic. Can you recommend some books which can get us into the geodata processing world, to get some pointers?

There are some books about this topic, geoprocessing; one is from Packt Publishing. It's nice to have books, but the problem is that this is evolving so rapidly that in two months there are new features. So it's really difficult. I think the big time of books is... I don't want to say over, but the best resources are still online. Check the repositories.
I made links to everything, and they often have examples. For Folium, for example, they have tons of examples, with marker clusters and everything you need. So I would look there first, and if you really want a book you can find one from Packt Publishing, for example; they have one on geoprocessing. There are other ones, just Google. Manning has one geoprocessing book. I have it too, but it's already one or two years old and it's just not up to date. That's a big issue.

Hi, thanks for the talk, Martin. Very interesting. By the way, I was at GeoPython in Basel in 2016 and it was awesome, so I recommend it to everybody; it's a very good conference. My question is about the text format, WKT I think. Could you talk a little more about it: whether it's compact, whether it's becoming a standard, or whether it's a good GeoJSON alternative for getting geographic information in a compact way?

You mean the file format of geodata? Okay, there are several. You saw, for example, WKT, well-known text. It's ASCII, of course; it's huge, you can't really store that. There's also well-known binary, which shortens it; I didn't go into that. And then of course there are tons of vector formats with and without compression. ASCII is still very popular in the geo world, even for raster data, because it's easy to interchange. But your question was about the most compact format?

I hadn't heard about that format before, so I was wondering.

You mean shapefile or well-known text? Okay, so the Esri shapefile is quite common in the geo industry. It's a very old format. It supports UTF-8 too, and many things, so it's a very popular format.

I was wondering about the well-known text.

Oh, the well-known text, okay. This one is from the OGC, the Open Geospatial Consortium.
It's part of another standard, OpenGIS. The best way is to Google it or go to Wikipedia, "well-known text"; there you'll find tons of information about this format. It's very popular. It's used for different things, also for coordinate reference systems, for the definition of a coordinate reference system. It's quite popular, but a big issue with it is that it's ASCII and takes space. For that, use well-known binary.

There's a question down at the front.

Hi, thanks very much. I was wondering how well these libraries, for example Rasterio, work with data where the georeferencing is not an affine transform but, for example, ground control points or tie point grids. And also, what's your take on NetCDF and xarray, for example, for multi-dimensional geodata?

So first, if you don't have the georeference, you have to do it yourself, that's clear. You can add it: for example, if you have a TIFF file, you just create an ASCII file, a TFW file, and you specify it there. As for the formats, as I said, there are tons of formats. I can't say this is the best format. If there were a best format, there would be only one format. I think that describes the problem: every company has its own format for certain purposes, even if it's only marketing. So we have more than 200 different raster formats, and if you look, for example, at 3D formats, it's even worse: every 3D software has its own format. So it's a big issue. The only way is how GDAL does it, using an abstraction layer: if you have a new format, you just write that part of the abstraction and then you can access it.

Thanks. Any other questions? No? Yep, go ahead.

Thank you for the talk. Just because there are no more questions, I wanted to say there is the Geobuf format, which is based on protobuf, for two-dimensional things. I think the most compact format actually is from Mapbox, and it's Geobuf.

One more format. Yeah, but it's nice.
Yeah, they say it's the most compact format, of course, and it's one more format. But I think it's good.

Yeah, exactly.

But the big problem is data delivery: who supports it now, and so on. It's not that easy. Especially if you access data from a service: there are many open data portals and they offer GeoJSON, so you have to use GeoJSON; you have no choice.

Exactly, which is important. Yeah, but I often see gzip encoding from the web server; using that with GeoJSON is very popular too. It solves some things. It's not perfect, but it's a way.

Sorry to interrupt with one more format. I'm just thinking of the talk by Peter Hoffmann about Dask and Apache Parquet, which is very compatible with pandas. I just wondered if you knew whether GeoPandas could extend to that, so that you don't have to open very massive shapefiles, because I think it had a very clever scheme for ordering and selecting by features and columns, so you could quickly access things.

GeoPandas is open source; everyone here could write something. You could put a ticket on GitHub and then see what happens. That's all I can say, but thanks for the input.

Okay, thanks. I think we'll leave it there and thank Martin again for his talk.