My name's Roberto, and this is the last talk — I know you're tired, I know you wanna go bar hopping, so I'm gonna do my best to keep you awake. I work for a company called Critical; we do custom cryptographic work and software development, and we love Django, myself and the company. We've been backers of many high-profile Django projects, like the schema migrations Kickstarter and High Performance Django by Peter Baumgartner, and we are gold sponsors of the Django REST framework Kickstarter. I've been doing software development for 29 years. My first claim to fame came from helping reverse-engineer the Nintendo, so if you ever got fired for playing sloppily made pixelated games on your cell phone, that's on you, not on me. In the Django world, I'm known for a few projects, and the most recent one with a lot of visibility is Mayan EDMS, a document management system built entirely on Django. So, how I got into this mess. In 2013, I was appointed Director of Software Development for the Government of Puerto Rico, overseeing the creation and use of software in the government. One of the projects I was handed: the Governor of Puerto Rico had just signed an executive order requiring all government agencies to start sharing data electronically, but we had no infrastructure to do that. And this is the scenario I was given. At the time we had 142 government agencies, each of them creating and accumulating data in completely incompatible formats, with no way to share it. Some of the more forward-looking agencies did try to fix the problem on their own, but because there was no policy and no oversight, the end result was the same: everybody just kept wasting money building completely non-interoperable interfaces and data exports.
So pretty much this was my reaction to what was going on. I realized we did not understand the problem, so the first thing we did was make a checklist: what do we need to make this happen? Okay, we need a universally compatible export tool that can take any government data and export it into new formats like JSON and XML, regardless of the original file format. Then we realized that does not exist in the universe, so we had to create it ourselves. We were a brand-new experimental software development department for the government, so let's honor our name — let's start developing. And this is what we came up with: Libre. It's actually a backronym for an engine to free up government data. "Libre" means free, so it was also kind of a political statement. This is the eagle-eye view of what the platform actually manages to do. We can take completely heterogeneous data sources, regardless of format, in the place where the data originates, and just by writing a simple description of how the data is structured, we can import government data, we can version it, and we can start hosting open government data from the same product — because infrastructure is another big problem in the government. Very few government agencies have good infrastructure; most of them will collapse as soon as they get 100 concurrent users. We also had to create a unified query language, because our users are now more technical. The public is more technical. Some people do just want to see an infographic, but most of our users now want access to the data itself, to do statistical and mathematical analysis. So we had to come up with a unified way for our new clientele, the developer clientele, to filter and select what they wanted.
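The "simple description of how the data is structured" idea can be sketched in a few lines of Python. This is not the actual Libre descriptor format — the descriptor shape, field names, and sample data below are made up for illustration — but it shows the principle: one declarative descriptor per source, one generic importer for everything.

```python
import csv
import io

# Hypothetical descriptor: maps raw source columns to clean field names
# and Python types. Each incompatible source gets its own descriptor.
DESCRIPTOR = {
    "fields": [
        {"source": "NOMBRE", "name": "name", "type": str},
        {"source": "PRESUPUESTO", "name": "budget", "type": float},
    ]
}

def import_rows(csv_text, descriptor):
    """Turn a heterogeneous CSV export into uniform dictionaries."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            f["name"]: f["type"](raw[f["source"]])
            for f in descriptor["fields"]
        })
    return rows

# Made-up agency export: Spanish headers, numbers stored as text.
sample = "NOMBRE,PRESUPUESTO\nEducacion,1200000\nSalud,985000\n"
records = import_rows(sample, DESCRIPTOR)
```

Once every source is normalized into the same dictionary shape, versioning, querying, and re-exporting can all be written once instead of per agency.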
And we had to support as many output formats as we could, not just the original file format. We had to support JSON for JavaScript development, we had to support XML — so Django REST framework was crucial in this part. Now completely outdated government data can be used in a lot of different scenarios. This is how Libre fits into the whole ecosystem. All government agencies keep producing data the way they know how, and we can drop in this tool without them having to change anything internally about how they produce it — and yet the data can now be shared and used by all the government agencies, or the general public. We can start turning very ugly stuff like this — I hate spreadsheet files; they have no kind of validation, they don't tell you anything — into beautiful stuff like this, which a software developer can use without having to worry about importing the file. We can turn completely ugly stuff like this — this is a shapefile, which is a very bad name for a file format, because it's not a single file, it's a distribution of files; you can blame ESRI for that — into this. The tool also had to support geospatial capabilities, which is another big topic in the government. The government produces a lot of geospatial data, but it's sometimes produced in outdated formats. In Puerto Rico we had most of our maps in the state plane projection based on NAD 27, a datum standardized in 1927. Even when we moved to NAD 83, from 1983, it still was not interoperable, because it's a state-centric system. With this tool, we can convert all that data completely transparently into WGS 84, which is a geocentric, world-centric datum. That's how Puerto Rican data can now be plotted in tools designed to work around the world. Basically, we are modernizing government data.
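The multi-format output idea — one canonical internal representation, many renderings — can be sketched with just the standard library. In Libre the real work is done by Django REST framework renderer classes; the record below and the tag names are invented for illustration.

```python
import json
import xml.etree.ElementTree as ET

# One canonical representation: a list of plain dictionaries.
records = [{"municipality": "Caguas", "population": 142893}]  # made-up figure

def to_json(rows):
    """Render for JavaScript developers."""
    return json.dumps(rows)

def to_xml(rows, root_tag="rows", row_tag="row"):
    """Render the same rows for XML consumers."""
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

json_out = to_json(records)
xml_out = to_xml(records)
```

The caller picks the renderer; the data pipeline never cares which format was requested — which is exactly what a DRF `renderer_classes` list gives you for free.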
For developers, for example: they can now take a legacy shapefile from the government and, using the Libre platform, we can render a map, and a developer can just capture the map in an iframe and incorporate geospatial capabilities into their software without having to write code — just by capturing a rendered map from a query they issued to the platform. And you can start doing stuff like this from data that originally came from outdated shapefiles that were just accumulating bit rot on a government server, and from Excel spreadsheets. This is the kind of reaction we started getting from the developers. But the administrators started hating it, because they already have this amount of work, and DevOps are very stressed people — they have the weight of the infrastructure on their shoulders — and having them create descriptor files for the files they were going to import into the platform was becoming another obstacle to adoption. That's where the Django admin came to the rescue, and on top of that, django-suit allowed us to create this new web interface where a person with no knowledge of Django can describe the format of the file they are going to import. So now the DevOps person — who is sometimes the only technical person on a government agency's staff — can use this tool to start exporting their agency's data without having to be a data scientist or a software developer. And we started getting this reaction: "software development is so easy, I want to become a developer myself." That's how successful the conversion to the Django admin was. The platform got very popular in Puerto Rico, so we had a company — whose name starts with something meaning "very small" and ends with "soft" — tell us: no, you are reinventing the wheel, we already fixed this problem, we have tools that create web services from all our databases.
Now, which ones of you work with web services? Which ones of you like working with web services? Nobody — see? The problem with web services is documentation. Have any of you tried to reverse-engineer a complex type from a web service without documentation? It's not possible. So you're very dependent on documentation, and web services have become a way to promote vendor lock-in. Another problem is standardization. That's why we jumped from WSDL 1.1 to 2.0, and there's a 1.2 and a 1.1.3 draft that never made it to the public because they were completely non-interoperable. And the tools that generate WSDL files — Web Services Description Language files — sometimes create description files which are not even interoperable between one vendor and another. It's like trying to assemble an IKEA table with bad instructions: bad things tend to happen. So web services were out the door, and the tool had to be REST-centric. Even if people didn't like it, we had to do it, because there's a beautiful thing about REST: REST and JSON are self-documenting. Even if you have no documentation, you see this and you know it's a dictionary, a key-value pair, and however cryptic the keys and values are, you still have a rough idea of what this is and how to operate on it. And because we are using Django REST framework, from the same solution we can now re-expose the data in different formats using Django REST framework renderers. And I love this — this is why our company is a gold sponsor of Django REST framework: the browsable API allows developers to start playing with the data, exploring it and getting used to it, even if there's no documentation for it. So what about a unified query language to be able to access all these completely different data sets? This is the same reaction we got from the company whose name starts with M. They said: no, we already solved that.
"There's something called SQL that's used for accessing data — why do you want to re-create the wheel?" Because of stuff like this. This is the source code from an actual government website — I'm not gonna say the name; shout out to our government — and they're concatenating an SQL statement in JavaScript, trusting user input and not doing any kind of sanitization or checks. I talked to the developer from that company who did this, and I was going to ask him if he knew about SQL injection and sanitization, but I said, no, I'm gonna ask him an even more interesting question: do you know who Bobby Tables is? He said no. So that was my answer. SQL was out of the question too, because SQL is not a standard — it's the Structured Query Language. Whoever told you that SQL stands for "standard query language" was playing a really cruel joke on you. It does not. This is an actual Stack Overflow question I asked while creating the platform, because I wanted to know how to limit the number of results in a result set — and it turns out even that simple thing is not standardized across databases. So we ended up creating our own language: LQL, the Libre query language. Now, another problem with data exporting tools is that you need software, a server, and a client — so you still get that element of vendor lock-in. So what we did was create a RESTful query language: the URL is the query, and it gives you filtering, selection, and slicing of the data. Here we have an example. This is a shapefile, the polygons of the municipalities of Puerto Rico. The URL is kind of small, but if you look at it, I'm passing two predicates: give me only the shapes, the polygons, whose municipality-name property contains the fragment "gua", ignoring case. So I get Guaynabo, Aguas Buenas, Caguas, and so on.
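A toy version of the URL-as-query idea, assuming a hypothetical `field__operator=value` syntax loosely inspired by Django's lookup syntax — not the actual LQL grammar, which the talk doesn't spell out:

```python
from urllib.parse import parse_qsl, urlparse

# Hypothetical operators; a real engine would support many more
# (ranges, geometry predicates, slicing, etc.).
OPERATORS = {
    "icontains": lambda field, value: value.lower() in field.lower(),
    "exact": lambda field, value: field == value,
}

def run_query(url, rows):
    """Treat the URL's query string as the query: parse predicates, filter rows."""
    predicates = []
    for key, value in parse_qsl(urlparse(url).query):
        field, _, op = key.partition("__")
        predicates.append((field, OPERATORS[op or "exact"], value))
    return [
        row for row in rows
        if all(op(str(row[field]), value) for field, op, value in predicates)
    ]

municipalities = [{"name": "Aguas Buenas"}, {"name": "Caguas"}, {"name": "Ponce"}]
matches = run_query("/api/municipalities/?name__icontains=gua", municipalities)
```

Because the whole query lives in the URL, any HTTP client is a full client — no SDK, no vendor tooling, no WSDL.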
And instead of getting just the data, I'm telling it: take that and render it into a Leaflet map. This is a very nice feature of Django REST framework: renderers can also give you maps or charts or tables — they don't have to produce only numbers or serialized data. This is another example of a simple query. These are the crime points from the police department, filtered just for one crime code, aggravated assault. This is the kind of thing we can now filter and analyze just by rewriting a simple URL. Because we were using Django and Leaflet, we started incorporating Django's templating system into Leaflet's popup markup, and now from Django we can create customized maps — we built a map builder. And we can start doing stuff like this. I can take a shapefile from one government agency and do this: the whole universe of crimes in Puerto Rico, filtered by the result of querying a polygon of a municipality from the Puerto Rico Planning Board. Basically, this is a join between two completely different data sets from two completely different government agencies — a municipality-centric query. And this is the URL that produces it. There's no code; it's just one URL. It looks complicated, but as you'll see in a moment, it's actually just four elements, and the first two alone can produce the map — the last two are just cosmetic markup. The first element tells the engine what data I want to work with: the crime data. Then I tell it to filter all the crimes where the geometry of the crime — in this case, a point — falls within another geometry, and the bracket is a subquery marker: the encompassing geometry is the result set of a simple query to the Planning Board asking just for the polygon of the municipality called Arecibo.
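The "crime points within a municipality polygon" join the URL performs boils down to a point-in-polygon test. A minimal ray-casting sketch with made-up coordinates — real GIS code would use a library like Shapely or GeoDjango, and real polygons are far more complex:

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray crosses."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height
            # x coordinate where the edge crosses the ray
            cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < cross:
                inside = not inside
    return inside

# Made-up geometry: a square "municipality" and two crime points.
municipality = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
crimes = [
    {"type": "theft", "geom": (1.0, 1.0)},    # inside the polygon
    {"type": "assault", "geom": (9.0, 9.0)},  # outside the polygon
]
inside = [c for c in crimes if point_in_polygon(c["geom"], municipality)]
```

The subquery in the URL supplies `municipality`; the outer query supplies `crimes`; the filter is this test applied per point.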
And the JSON path then slices the properties of the geospatial feature, giving me just the map points, which are passed to the geometry filter. So this is basically typecasting at runtime, from the URL. The next element tells the engine to render a map instead of giving me the data points, and — because otherwise the map wouldn't show what I'm filtering against — I'm also passing context to the renderer: please paint the outline, so I know what I'm filtering on. Because knowledge of the language was a bit of a barrier, we also created a query builder for the tool, where you can start experimenting with filtering data — you have a preview at the bottom — and you can do things like producing the result set as a list of dictionaries, already processed so it can be plotted in a chart. You don't have to do any post-processing, in JavaScript for example; you can take the output as-is into something like D3.js and start plotting charts. And once you have the data you want, all you have to do is copy-paste the query string. And we can start doing stuff like this. This is an egocentric, self-centered result set of the same crime data: I'm asking the engine to show me all the crimes within a radius of where I'm standing, and the query is even simpler. I have the same police crime data, but instead of filtering by the result set of a polygon, I'm filtering by a point — and because points don't have area, I'm applying a buffer, which in this case, at this projection and zoom level, is 0.1 degrees of arc, which correlates to roughly 10 miles.
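The radius query can be approximated in plain Python with the haversine formula. The coordinates below are rough, illustrative values — not the real police data — and a production engine would buffer in the projection, as the talk describes, rather than compute per-point distances:

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS 84 points, in miles."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

here = (18.4655, -66.1057)  # roughly San Juan
crimes = [
    {"type": "assault", "at": (18.44, -66.07)},    # a few miles away
    {"type": "car theft", "at": (18.01, -66.61)},  # roughly Ponce, far away
]
nearby = [c for c in crimes if haversine_miles(*here, *c["at"]) <= 10]
```

This is the "am I in danger where I'm standing" query: one point, one buffer, one filter.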
So I'm basically telling the engine: give me all the crime that has happened within a 10-mile radius of where I'm standing, to see if I'm in danger of being mugged, assaulted, or killed. And this is a good place, for example, to throw a party, because the only thing that has happened is, as you saw, aggravated assault — just a fist fight. And if I'm gonna park my car, I know it's a good spot, because no car theft has happened there in the timeframe this data set covers. We can also do this — this is called feature analysis. I can see how crime behaves in relation to a geographical feature. In this case, it's PR-22, Puerto Rico's biggest highway, and it has been criticized that there isn't enough preventive police patrolling there. We did this simple analysis, and we can see immediately that to the south of the highway there's basically no crime happening in the time span of the data, which is two years. So there is something happening to the north of the highway that's causing a bigger crime rate. I cannot tell you what it is, but now I can give you the observation so you can ask the right questions — and that's the start of the scientific method. I give you the observation; now you explain why it's happening. Before this, we had no idea it was even happening. The query to do this is basically the same, but instead of a circle or a polygon, we are creating a polygon at runtime around that point, as a square. So now for the nitty-gritty details of how the tool works. I cannot filter or hand users the data every time they request it — that's a very heavy operation. We took a page from DNS: this is WORM, write once, read many times. All the processing was moved to the import phase, and this is the import phase. The first thing we built was a scheduler, because I cannot trust the government agencies to give me the data.
We basically have to go and get it forcefully — that's how great they are. The next step is an acquisition layer, where I tell the engine how to get to the data. Once the engine has the data, there's a data drivers layer, which tells it how to understand and process the data: is it an Excel file, a REST API, a shapefile? Then we serialize the data to be able to store it in a database, because it's binary data, and for this specific implementation we chose base64-encoded pickles so they could be stored in the database. Base64-encoded pickles — this is your cue to squirm and get nauseated. It's not glamorous, but for this particular implementation we just wanted to get the data out. No MongoDB, no fancy infrastructure — just code for one order of magnitude beyond your worst-case scenario. See, you never thought you were gonna get a business lesson at a Django conference. And this is the read part, where we process requests for the data. It's mostly cookie-cutter stuff — Django REST framework does most of it. We make sure that the user accessing the data has access to it: maybe we want to restrict this specific data set to government employees, or maybe it's completely public data. Then we pass the request to our own custom engine, where the query is split into its parts: filtering, grouping, aggregation, and segmentation. The data is then deserialized from the database and rendered in whatever format the user is asking for, and we pipe that into the response object that Django REST framework provides. And now — this is the sad bookmark of the presentation — that's where the project died. After 12 years, I became tired of the hate. If you do software development in a place where nobody's technical, you get a lot of hate; if you work in the government, you get even more. You get hate explicitly, implicitly, and secretly.
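The squirm-worthy serialization step is easy to reproduce with the standard library. A minimal sketch — and note the hedge the talk itself implies: unpickling is only acceptable here because the engine itself wrote the data; never unpickle untrusted input.

```python
import base64
import pickle

def serialize(obj):
    """Encode any Python object as ASCII text, safe for a plain database column."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")

def deserialize(text):
    """Inverse of serialize(). Only ever call on data this engine produced:
    pickle can execute arbitrary code when fed untrusted bytes."""
    return pickle.loads(base64.b64decode(text.encode("ascii")))

# Made-up row resembling an imported geospatial feature.
row = {"municipality": "Arecibo", "geometry": [(0.0, 0.0), (1.0, 0.5)]}
stored = serialize(row)          # what goes into the text column
roundtrip = deserialize(stored)  # what the read path gets back
```

Not glamorous, as the talk says — but it turns arbitrary binary driver output into something any relational database can hold.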
Your boss hates you secretly, because you are showing that he's not prepared for the job the way you are. Your coworkers hate you implicitly, because you are the software developer, this wizard of technology, and yet you refuse to fix the coffee machine. And to the public in general, you are just a government employee, and they're gonna hate you anyway. So after 12 years of being a lightning rod for hate, I decided to move on. The company where I work now is very open to community projects, and we are actually hosting our own copy of Libre, serving public government agency data on our infrastructure. So basically, we are doing the government's job. And with this data hosted, we can start doing really cool stuff like this — for example, creating dashboards from completely disparate data. This is the section for the data sets of the electric power agency. It doesn't have much interesting data on its own — how many clients they have, how much energy they've sold; you can see it in the table. But when you plot it like this, you start seeing patterns. You start seeing correlations. You start seeing behaviors that should not be happening. For example, in this chart — excuse me, because it reads from right to left; at the time we hadn't even implemented ordering in the engine, that's fixed now — you see that the power company has lost two-thirds of its industrial clients, and yet its revenue from industrial income never decreased. That's not supposed to happen. So we started seeing stuff like this. This, for example, is the dashboard of the health department. Puerto Rico is a tropical island; we have a lot of mosquito-borne diseases. Sadly, some people do die, but these diseases are preventable — it's about making sure people get the help they need at the right time. So there's a lot of money allocated to awareness.
But look: when we plotted this and added the asthma data, the problem of asthma on the island completely overshadows the problem of mosquito-borne diseases. And when you look at the budget being allocated for asthma research and awareness, it's just a fraction of what the mosquito-borne disease awareness programs are getting. When you plot diabetes, it completely crushes the problem of asthma — even though both are chronic diseases, diabetes is a really big problem on the island. And when you add hypertension, something very interesting happens: the behavior of hypertension on the island almost directly correlates with the behavior of diabetes. A statistician will balk at this and say correlation doesn't imply causation, but you cannot deny that something is happening there. The government of Puerto Rico also had a very big problem with the town centers: people are leaving them because of technology — now they have Netflix and so on — and the central government wanted to start providing free wifi in public spaces. They were already starting to allocate a few million dollars, until we got this map up: the map of all the municipalities that were, on their own initiative, already giving free wifi in their town centers. So the problem was already being fixed, and people were already being drawn back into the public spaces. Just this map saved a few million dollars in budget. And this is the same crime map, like I said, created using just an iframe, with three filters: municipality, time, and type of crime. And when you start running it, you get time-based crime maps, and you see that crime is more organic than you think. Crime behaves very differently by time of year, and even, excuse me, by time of day. We saw that most crimes peak at 2 a.m. — and yet house theft peaks at 9 a.m.
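The hypertension/diabetes observation is just a Pearson correlation between two time series. A self-contained sketch with made-up monthly case counts — not the real health department figures:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative monthly counts only: both series trend up with similar dips.
diabetes = [310, 325, 340, 360, 355, 370, 390, 400]
hypertension = [290, 300, 318, 335, 330, 348, 362, 375]

r = pearson(diabetes, hypertension)  # close to 1.0 for these made-up series
```

An `r` near 1.0 is exactly the kind of "you cannot deny something is happening there" signal the dashboards surfaced — the statistician's caveat about causation still applies.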
and 12 p.m. — usually the times when the working class is out of the home. So even something as simple as sending employees home for lunch in different time brackets would have reduced the problem of house theft. This was a very interesting data set: the solid waste department's data. They gave it to me — they were very nice; this was one of the few government agencies that really cooperated with the effort. And they said: this is really worthless, but we're gonna give it to you anyway — just put it out there, people are gonna find a way to use it. And they did. This is a project from a hackathon. It won: a mobile web app created by three university students in less than 24 hours, called GeoTires. The scenario for the application is: you're doing internal tourism on the island, and suddenly you get a flat tire in a place you don't know anything about. The application gives you a map, using our technology, of all the places where you can go fix your car before you become stranded, with the metadata so you can call the places, and if you click, it gives you the route to get there as soon as possible. All this from data from a government agency that has actually since disappeared — that's how unimportant the government thought it was — from data that even the agency producing it thought was worthless. Now we have a product that solves a real social problem. And this is a snippet of the code. You can see the name of the company in the middle, and you can see they are actually feeding the application from the public instance we are hosting: they get the latitude and longitude via JavaScript from the user's browser, and they just filter on it. Thank you.
These efforts got noticed by one great, awesome government agency, the Institute of Statistics, and they contracted us: they have a massive amount of information, a massive amount of data, and very few tools to get at it. So we got in contact with them, and the stuff that has been happening with that data is amazing. I'm gonna try not to get us all killed — I did sacrifice my last copy of Microsoft Office to the gods. These are the maps I just showed you, all built with open source software: the open source Libre engine and the Cards BI open source dashboard application I created with Django. You can see the behavior of the Puerto Rico electric grid for the last 10 years — the peaks and the valleys of how usage behaves in Puerto Rico — and you can start predicting which months of the year the grid is most likely to collapse. The power company didn't know this: they were sending out their brigades in June and July, and when they saw the data, we realized September and October are usually the worst months. So they were paying overtime in two months when nothing was happening, and they didn't have enough brigades in the months of the year when the grid was actually collapsing. We also saw a very interesting curve. The electric company likes to make everybody uncomfortable and blame the problems of the grid on the people: you are wasting too much electricity, please turn off your lights. But this chart demonstrates that Puerto Rico actually consumes less electricity now than 10 years ago — about 1.3 million kilowatts less. So why does the grid continue to collapse? It's not because of usage; it's because of a lack of maintenance. So now we can start shifting blame, and actually point fingers. That's just the open source version; this is the commercial version, which we recreated from scratch. It doesn't use base64 anymore.
Now it uses a more sane solution, and the kinds of projects we are doing with it are much more interesting — like this one. Exodus is a very serious problem in Puerto Rico: people are leaving the island and not coming back, at an alarming rate. This is Bureau of Transportation data combined with census data, and this is the comparison chart of how many people are leaving the island, their destinations, the airports they're using to leave, and the final aggregation: about 5.2 million people left the island in 2013, and only about 5.2 million came back — a net difference of minus a thousand residents, on an island of only 4 million people. You can see the problem now. And with stuff like this, I can start predicting the touristic peaks — at what times of year the government needs to prepare enough accommodations to receive tourists — to see if that can help offset the problem. And we can start doing things like this. There's a big problem in Puerto Rico, a political issue: the argument that there are more toxic emissions in areas with lower economic income. This is a great project to start examining that. It is a map — I'm getting some latency there — that correlates the amount of toxic emissions, which companies are emitting them, whether they're emitting the toxics they are registered for, and the income level of the area compared to the mean gross domestic product of the island. You can start doing experiments like this to see whether that theory holds on the island. And you can see that some companies are throwing these nasty chemicals into the atmosphere just a few miles behind your backyard. Nobody knew this until we did this. So that pretty much is my wrap-up. If you have questions or comments, please be kind.