public means of bike transportation in the city. The system is composed of some stations, and these stations might have bicycles on them or maybe not. So knowing where the stations are, how many bicycles there are and how many spots there are to park a bicycle is really important: you need awareness of the location of the stations and of their availability. That was just to set up the context.

So, hello, I'm Lluís Esquerda, currently working at Scrapinghub, and I'm going to talk about how I managed to grow a project from Barcelona to the whole world.

For the people who already knew what bike sharing is, I want to share this map with you. This is a real-time map of the world. These are all the systems I currently support in my project, and, I don't know if you can see it, but every time there is an explosion on the map it means that somebody has just taken a bicycle or just left one. It's real-time-ish. So, as you can see, some guy there just left a bicycle, somebody took one over there, and that's across the whole world, or the whole world as CityBikes can see it: we cannot see the systems that are not on this map, because they are not yet supported.

So what's this about, in reality? Many years ago I had the not so novel idea of mixing smartphones with these bike sharing systems. There were already some applications; fortunately for me, in Barcelona there was no such Android application for the Bicing system, so I decided to create one. The first problem I encountered was that this information was not available anywhere. So I looked on the internet for, you know, "Bicing API", "Bicing information", and that was the first time I saw the term scraping. I will refer to the original library that inspired the project, because if it was not for this library I would probably not be here, and that's also what got me
into Python. The main idea was this: there was a website for the Barcelona system, and there was a map, and you had to take this information from the JavaScript inside the map, so it was kind of hidden. Then I understood why some of the apps that were around were really slow: all the scraping was happening on the smartphones, and at that time smartphones were not really that powerful, so the apps were really slow, they would break and they would crash.

So I created OpenBicing, which was the first part of the project. It had an open API and it was also an Android application, released under, I think, a [license name unclear] license. That was for Barcelona. Then I realized that in Paris the problem was exactly the same, even though the website was even more complicated to scrape, so I created an exact clone of the same project and just renamed it OpenVelib. Then I realized that this was also the case for Dublin. In fact, Dublinbikes was using the exact same system that Vélib' was using, so it was really easy to add: just three more lines of code and I had another project. Then London started their Cycle Hire project, which launched with open information that you could freely access, and I decided I also wanted to be there, so I created yet another clone of the same project.

At this point it became really clear that this was not the best way to keep working: it's not easy to maintain separate projects at the same time. So I decided to call it CityBikes, and CityBikes was the composite of all the cities that I started finding around. At this point it was mostly the same Python library I had been using for Bicing, just with more and more networks kind of stuck together. It still felt a bit like not doing really good work on it, but from the outside it was good: there were many cities around.

So this is what CityBikes is composed of. The library was
renamed to PyBikes. This is the main part of the whole project. On PyBikes, other people or myself add new scrapers for all these bike sharing websites, and when a scraper is added to PyBikes it suddenly gets added to the API of CityBikes. So PyBikes is, in a sense, a Python library that gives you access to bike sharing information without depending on my project at all, but it's the actual library we are using to extract all this information.

Then this is called giardo. It's not really something; it's just the system I'm using to manage all the tasks on the queue, because all the scraping must be processed somehow. I just wanted to show a bit of the stack of the project, not too much. Maybe the only novel thing is that this is using RQ instead of Celery, which is the main task processing system used in the Python world.

Then there is the API of the project. Fortunately for everybody in this room, this part of the project does not have a logo. Since the project started, I mean, I was not really intending to build an API, but I knew that when you start sending information through the internet to a mobile device it's almost impossible to protect it, so you had better make it open from the start. So the API kind of happened by itself, and it turned out to be the most important part.

And in the end there is also the Android application, which unfortunately I'm not spending much time on anymore. That's because of a realization I had: the Android application really does not produce any kind of new information or new content, whilst by providing the API we are offering something that can be used to create maps. The main idea is that maps are useful representations of reality, but they do not have the power of creating something new. That's quite important to realize, because it's the point where you start understanding how
wrong some city councils are about a lot of things, and how things should be done in a different way. For instance, since publishing this API there is a Pebble app for getting your bike sharing information, there is a Google Glass app, and there is Citymapper, which, as far as I know, is using the CityBikes API for Barcelona, São Paulo and two or three more cities, I don't know exactly. Basically these are cities where they have problems accessing the information, so by providing it they can support them.

These are two apps; one is GPL licensed and the other one is MIT. It's good to have applications that are open source or free software using the API, because they are continuing the work I started and I don't really have to support all these platforms. There is even one app called CityBikes; I have not even been in contact with the guy, he just decided to put the name there. There is an app for Windows Phone. I mean, there was no way I was going to support all of these platforms; it would be a waste of time.

Back to the growth thing: I would like to say that it's been a good, calm and steady thing to do, but... We hear many words around, like smart cities and open data, and we are really content and so happy about all these terms, but in the end they do not mean anything if you are not actually providing what people need. Some cities are starting to provide open data, and that's a good thing, but there are going to be a lot of cities where maybe it will take 10 years for them to even know that they should be doing it, or maybe nothing pushes them to do it, so maybe they will never do it.

And that's where I met the evil part of the project. It's called PPPs, which stands for public-private partnerships. Basically this relation happens every time a private company is going to offer a public service for a government or for
a city council. When the city council signs a contract to work with these guys, the information usually stays with the private party and not with the public one. So somehow we could consider that these companies are blackmailing city councils over information they should already own. Let's say you are a city council, you have just opened a bike sharing system and you say, "Okay, I want to build an Android app." You cannot, because you do not have the rights to access this information, but the company is obviously really happy to provide you with an app, which will obviously be overpriced.

So I like to think that CityBikes could somehow be a key contributor in this fight, in the sense that the project kind of forces companies and city councils to get their act together. By presenting a reality that is not there yet, at some point they are forced to do it, just because they have no other option. The information is already on the net and people are using it, there are many applications using it, so they are kind of forced to release it.

So I just read this, and obviously I'm going to misquote it here; obviously the article does not have anything to do with the theme of this presentation, but I think it's a nice quote: I think we should develop our software and choose our licenses for the future we want to see, not the present that we are stuck with. This has been a major part of what the project is about. It's not just about going around scraping websites and putting information out. It's more about trying to create an image of how the world would look if all of this information was free. And I'm not even talking about bicycles right now, but about any kind of information that is not sensitive to user privacy and that can be useful for anybody to build things. I think this map could kind of be that metaphor. I mean, we are
seeing the world, we are seeing people use bicycles everywhere, and the only way to do that is if you just release all this information. So that's mainly it.

Okay, so now I have time for questions. How am I on time?

Pretty early, quite good. We still have probably 10, 15 minutes.

15? Okay. So what do you say if we add the Bilbao bike share system to CityBikes?

I've been looking around and it looks like they've been publishing this, so this information is open. Let's just go to... I mean, no, not this one. So they release this information for free. What I found out, and it's quite funny, is that they also have an Android app that is not using the open data feed, and that does provide a bit more information. But anyway, for the sake of doing things the way they should be done, I'm just going to use the public feed. Okay, it's here. It's in CSV, XML or WMS. So what sucks less of the three? XML, I guess.

I'm just going to show a bit of how easy, or maybe how hard, I don't know, it is to add a new system to PyBikes. So these are the spiders, and then there is another data folder that maps meta information to these spiders. In the case of Bilbao, because they are providing an open data feed that we are probably never going to be able to reuse, since this is a format they are only going to release for Bilbao, I have to create a new file. We'll call it Bilbao. And because I don't really know what I'm doing, I'm just going to look at other spiders at the same time.

So there is a base class that inherits from BikeShareSystem, so in this case it will be Bilbao, which is a bike share system, and we can give it some parameters. We can add some meta information there. There are some things that are already being published on the API, and it's important that they appear here, like which company is running the system, or maybe they have a license or something, so this just
goes here, and it goes along with the API.

By the way, the idea of PyBikes is basically this: you import pybikes, you call pybikes.get with whatever bike sharing system you want, let's say Bicing, you obviously have to put it somewhere, and then you update it and suddenly it has stations on it. So, for station in bicing.stations, print station, for instance. And it's like that; you don't have to do anything else. This is not a wrapper around my API. So if anybody here is doing data science or things like that in Python, give the library a try, because it's quite easy to use and it contains all the resources you need to create whatever you want related to bike sharing information.

10 minutes.

Okay. So I think it's better that we just go to question time. If you are really interested in knowing how a system is added to the library, we can talk about it later or I can show it to you. Anyone who wants the microphone?

Thanks for the talk, it's awesome. Did you get into any legal issues with the companies that own the data that you're scraping? Did you get, like, a cease and desist?

Yeah, I think I completely forgot to add that slide, and it was quite an important part of it, so for that I'm sorry to the audience. Yes, we've got into some trouble sometimes. But the good thing is that, by publishing this information and having this small army of developers writing stuff for it, the only ones that are going to look bad in the papers are always going to be them, so usually they try to avoid any kind of conflict. There was a conflict with JCDecaux in France, where they sent me a cease and desist letter for all the damages the project was causing them, because they were actually planning on selling all this information to any company that wanted to run analysis on it. The amount of all the losses I was causing them was valued at around, I don't know, I think it was around 30,000 euros per year. But then a new law was passed in France where
all public infrastructure, or private companies providing public services, are almost forced to publish this information, and they even released a license for it. So in the end it worked out okay. Then there was another cease and desist from Brisbane. There was another guy, from Sweden, who had a kind of partnership with Clear Channel to be the sole developer of the app. I think they made a deal with him that he would make the app for free and in exchange he would be the only one exploiting this feed of information. I exchanged some really angry words with him on Twitter and through email, and it really did not get fixed, so I just decided to forget about the city, and it's not on the map anymore.

Before, I was talking about what a map is, what information is and what it means to create something. So, for instance, if we go to BiciMAD, which is the bike sharing system in Madrid, right now they've gone as far as creating a PNG map so that scrapers will not get this information. So yeah, well, no, what would be easier is to break their Android app and actually access their API, but the last time that happened all hell broke loose and many vulnerabilities were discovered on their systems. So I think I'm letting this case rest for at least a year or two, until they forget who I am. But yeah, these have been the problems so far.

Another funny problem I have had so far is getting people who use my API to properly reference the project, because if they don't, the message of the project gets completely lost and people just start assuming that this information is already freely available when it's not. We are going to great lengths to create this database of information, just to provide them with a source when there is no source. But it's really important that they say where the information comes from, because if they don't, the information is just there, so we don't have to fix anything because it's already open and we are in the future.

Thank you for the
talk. My name is Vojtek. It was really great to learn about the bicycles, but your message is also that there is some extra data about the cities, like open data and other systems that provide not only information about bicycle stations. You mentioned there is also some other kind of information. Could you give me an example of what else you could scrape from the data that is provided by city councils, etc.?

I don't actually think I said that. If I said that, I was misusing my own words. You mean, while I was pretending to write the system for Bilbao, I said something about the company and everything?

No, no, no. I mean, the data that you are presenting in the wrappers is mainly about bicycles, but what I understood is that the message is that this data should be open, and sometimes city councils are already providing some APIs about the cities, not only about the bicycles. So my question is: is there any other piece of information, I don't know, like the routes of the buses, that you could track in this way?

Thank you for the question. There are many people... lately there have been lots of maps showing real-time tracks of underground trains, buses and everything, and people love this kind of map. So this information is available, I think, in the UK at least, and in some parts of the US. Cities are usually only able to publish rubbish information that people cannot really create anything with, like, I don't know, the public traffic webcams. So usually there is not much to get from open data portals, and that's a problem. In fact, I've been discussing it with them, because they started out really excited about the hype of the open data thing, they completely believed everything that was said, and then nothing happened. Nothing happened, first because it was [unclear], and second because there are companies really interested in selling big data solutions, so suddenly
they tell the city councils that they have a big data problem and that they need a solution for it. That's not true, but it still makes good money. So in the end some city councils have started just releasing different apps, and we must consider that every time they release an app there is going to be a public feed behind it. But what I was trying to convince them of is that they should really stop making apps, because they really suck at it. What they should be doing is just providing information. And obviously not just that: every time they sign a contract with a private company, there should be somebody there who understands these things, so they are able to get a fairer agreement than they would otherwise get.

Just yesterday, while looking at how I would scrape, well, just integrate, the bike sharing feed from Bilbao, I was looking for the name of the company providing the service, and there was no way of finding it. In the end I arrived at a PDF explaining the tender for the project, so the company was there. But that company is not actually the company providing the service; it's a consultancy that is subcontracting different companies to build the application and to provide the feeds. Public projects like these are really ugly, and we all know it. So yeah.

Hey, have you tried having a look at OpenStreetMap data to see if you can get locations for bicycle stations from there?

In fact, I tried to contribute back all the stations I have to OpenStreetMap, but the thing is I could only contribute the ones that come from a source that is not liable to prosecution. What I find really interesting right now in OSM is integrating their information. I do not need the information for the whole world, just for the actual cities I have, so I can get bicycle paths or I can get the altitude of the stations. So maybe I can somehow enhance the
API, because right now the good thing about the API is that it's a cohesive syntax for an object, let's say for a station, and that's what developers like: they write an application for their city and suddenly they have support for the whole world. That's a good thing, but in the end what my API is offering is just a reformulation of the original information. So what would be really cool would be to enhance this information, and it's something I'm looking into.

Thank you. So we have one minute for a really short question, if there is any. Okay.

Hi, one technical question: where do you run all the scrapers? Do you run them on Amazon Web Services or something like this, or is it your own server somewhere in a data center?

I have two Linode virtual machines and then a dedicated OVH server, which does not have much CPU power but lots of disk and RAM. I'm trying to stick to the rule of do it yourself, so I want to do all the parts of it. That's a problem because it's difficult, but I really wanted to learn. So these different servers are interconnected in a VPN, they also have Tor exit nodes on the inside, and then there is Redis somewhere, a database somewhere, and then there is the serving part, which is done with nginx.

Okay, so let's thank the speaker again.
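
To make the "cohesive syntax for a station" idea from the Q&A a bit more concrete, here is a minimal sketch in Python. It is not the actual PyBikes or CityBikes code: the two feed layouts, the field names, the sample data and the `Station` class are all made up for illustration. The only point is that differently shaped city feeds (one XML, one JSON) can be reformulated into a single record type, so an application written against that record works for every city.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict


@dataclass
class Station:
    """One uniform station record, whatever the city's feed looks like."""
    name: str
    latitude: float
    longitude: float
    bikes: int  # bicycles available right now
    free: int   # empty spots to park a bicycle


def from_xml_feed(text):
    """Parse a hypothetical XML feed: <stations><station>...</station></stations>."""
    root = ET.fromstring(text)
    return [
        Station(
            name=node.findtext("name"),
            latitude=float(node.findtext("lat")),
            longitude=float(node.findtext("lng")),
            bikes=int(node.findtext("bikes")),
            free=int(node.findtext("slots")),
        )
        for node in root.iter("station")
    ]


def from_json_feed(text):
    """Parse a hypothetical JSON feed that uses different key names."""
    return [
        Station(
            name=item["label"],
            latitude=float(item["position"]["lat"]),
            longitude=float(item["position"]["lon"]),
            bikes=int(item["available_bikes"]),
            free=int(item["available_docks"]),
        )
        for item in json.loads(text)
    ]


# Made-up sample data standing in for two different cities' feeds.
XML_SAMPLE = """<stations>
  <station><name>Plaza A</name><lat>43.26</lat><lng>-2.93</lng>
    <bikes>4</bikes><slots>11</slots></station>
</stations>"""

JSON_SAMPLE = """[{"label": "Quay B", "position": {"lat": 53.35, "lon": -6.26},
  "available_bikes": 0, "available_docks": 20}]"""

# Both feeds end up as the same kind of object.
stations = from_xml_feed(XML_SAMPLE) + from_json_feed(JSON_SAMPLE)

if __name__ == "__main__":
    for station in stations:
        print(json.dumps(asdict(station)))
```

Under these assumptions, supporting a new city only means writing one more small parser function; anything that consumes `Station` records stays untouched, which is the "write for your city, get the whole world" effect described in the talk.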