So we are happy today to have Howie, who will talk about open data APIs in Singapore, and I will let him do his own introduction. There's more space up here if you have a hard time seeing, so it's a good time to move closer.
Hi everyone, so this talk is going to be about exploring Singapore's open data APIs. A bit about myself: my name is Howie, and I was previously a software engineer at Dropbox in the States. Now I'm at a small firm called Bright Technology Services. This is a consultancy in Singapore, and we do data science and software engineering consulting. So if you have any projects that need to get done or training that needs to happen, you can come and look for me after the talk. I have handles on GitHub and Twitter, and I have a technical blog that some people may or may not have seen. This talk is actually based on a blog post I made a while ago, in which I talked about how to use the LTA APIs to access the Singapore bus network information. If any of you want to look at it later in its more verbose form, you can Google for it and it'll turn up.
So what's open data? Open data is like open source: the idea is that the data is available to anyone without any sort of payment or contract or prior agreement. In the old days, if you were a big company and had some kind of valuable data, you'd keep it as a valuable asset, you'd sell it to people for lots of money, you'd use it as a bargaining chip if you wanted to negotiate with someone. With open data it's different. You just make the data available for free. Like open source, you cannot sell it for money, but in exchange you may get other people building things on top of your data that you did not need to build yourself. Other people build applications, tools or functionality that use your data, which you yourself do not need to build.
So like many other governments, Singapore has a bunch of new open data initiatives, especially with the new Smart Nation stuff that's happening. I'm not directly involved with that, but I've spent some time poking around with the different online data sets that are available. So here are some. There is the data.gov.sg website. This provides a small number of real-time data APIs, for example taxi availability, PSI, PM2.5, as well as a bunch of non-real-time, more aggregate statistics on their main website. Here you can find things like the birth rate or things like domestic exports. These aren't really APIs, so to speak. They're just data sets that get published once a year with a small number of data points, to say, oh, this year the birth rate was 1.2 or this year the birth rate was 1.3, and every year there will be a new data point that gets published.
Apart from that, LTA has a separate data site which has a bunch of both static and dynamic data. It has geospatial data, basically shapefiles that describe all the roads, all the bus stops, all the MRT stations in Singapore. So if you want programmatic data about what the shape of the network looks like, a lot of it is available here, as well as real-time data. For some reason the documentation for their real-time data is in a PDF, which is a bit awkward. But apart from being in a PDF, it's actually quite good documentation. It's very thorough and has all the things you'd want, except that because it's in a PDF it's not indexed by Google.
And you have others. NEA publishes real-time APIs for things like two-hour forecasts, one-day forecasts and four-day forecasts. The library board has RSS feeds on books, and OneMap has a bunch of interesting APIs based around location. So you can do things like geocode or reverse geocode using the OneMap API: given this lat/long in Singapore, what is there? Or given this thing, where is its location?
So OneMap has a bunch of those APIs that are also freely available. So that's the whole range of what they provide. As I said, some of them, like this one, are open data but aren't very interesting to programmers. This is the kind of spreadsheet, or the kind of PowerPoint presentation, that you put on a poster or show your boss. It's the kind of thing that you do once a year, but you don't really care about it if you're a programmer, because anyone can look at it, read it and understand the data. Similarly, you have lots of small-scale data, like an Excel spreadsheet of the year-by-year headcount of each ministry. It's kind of cool, but it's updated once a year and generally there's not much you can do with it.
Things like the NEA weather forecasts are probably more relevant because they're real-time. It's something you can actually build an application on top of. It's not just a report that gets published once a year, but a tool that you can integrate into other tools or other applications. You can make a weather app using it, you can make a website using it, and you can do other things using this real-time data. The geospatial data is also interesting because it's these huge shapefiles with lots and lots of data points describing the whole geometry of Singapore, like the shapes of the roads and the shapes of all the public transport networks. So you could build interesting experiences with that. And of course there's the real-time data, like the bus network arrival times. So given this bus stop, I don't remember what bus stop this is, you can see when bus number 15 is going to arrive next. It doesn't say how long until it arrives, it gives you the estimated arrival time. So it gives you the arrival time in GMT or UTC, and you can use that to calculate how many more minutes it will be.
So for this presentation, I'm going to go through a few short demos of things you can build with open data APIs that are kind of neat. There are three things. We'll make a bus timing board similar to the ones you see at bus stops around Singapore. We'll make a bus trip planner that will tell you, given one bus stop and another bus stop, which buses you can take to go directly, or with transfers, from the first to the second. And we'll make one of these neat taxi maps you may have seen online, showing where all the taxis are in Singapore using the APIs that tell you the lat/long of every taxi. I put these examples on GitHub, so if any of you want to go and poke around yourself, it's github.com. The Python files aren't actually meant to be run directly. You can run them, but they're meant to be copied and pasted interactively so you can see what the results are. So it's more of a guide rather than a one-off script.
So let's start with the bus timing example. This makes use of real-time information that's available from the LTA DataMall. To get started, you need to request API access. This doesn't cost anything. It's just so that if you start flooding the API with too many requests, they'll know who to call to ask you to back off. Or if you start doing weird things with it, they'll know who it is. But anyone can sign up for free and they don't do any sort of validation, so there's really no reason not to sign up other than the fact that you have to read the captcha. If you sign up, you get the API key, which is just a base64 string you can use to identify yourself. After that, you can go into the API documentation to see what you can do. So these are the main APIs that they provide to the public. There are probably others that aren't as well documented, but these are the primary ones. So we can look at what they provide. They have bus arrival times.
They have information about what bus services are running, what their routes are, and where the bus stops are. You have information about the taxis, ERP rates, and estimated travel times. All of that is something you can get just by making an HTTP request. For example, if I want to look at, say, traffic incidents, which is one of the simpler APIs that they provide, you would make an HTTP request to, let's make this bigger, datamall2.mytransport.sg/ltaodataservice/TrafficIncidents. You don't need to pass in any arguments for this one. It's just all incidents for the whole of Singapore, and it gives you information about what the incidents are. So we can look at that.
For this demonstration, I'm going to be mostly using the IPython console, since you can play with the data interactively. So to start off, let's make this bigger. Cool. This is the key that you're meant to get from signing up for the DataMall. I saved it in a file, so I can just load it. And if you use the Python requests library, which some of you have probably used before, it makes the GET request very easy. So here we are making the GET request to the traffic incidents URL, and we are passing in an account key header with the API key that we are using to authenticate ourselves. If we run this, it runs pretty quickly, and you can see we get a list of traffic incidents around Singapore. For example, if we wanted to extract all the messages, we could look at what's available: there's this odata.metadata, which we maybe don't care about, and a value, which we probably do care about. So it's value, and this looks like it's a list. So we could say item message for item in that, and this gives us a nice list of all the incidents that are happening right now. Some of these endpoints are paginated. I'm not sure if this one is.
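As a rough sketch of the request just described, the snippet below uses only the standard library rather than the requests library from the demo. The base URL, the `AccountKey` header name and the `Message` field are assumptions taken from the talk and the DataMall conventions as I understand them, and the function names are made up for illustration.

```python
import json
import urllib.request

# Assumed base URL for the LTA DataMall service (as read out in the talk).
BASE_URL = "http://datamall2.mytransport.sg/ltaodataservice/"


def build_request(endpoint, account_key):
    """Build (but do not send) a GET request carrying the account key header."""
    return urllib.request.Request(
        BASE_URL + endpoint,
        headers={"AccountKey": account_key, "accept": "application/json"},
    )


def fetch(endpoint, account_key):
    """Send the request and decode the JSON body into a dict."""
    with urllib.request.urlopen(build_request(endpoint, account_key)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def incident_messages(payload):
    """Pull the human-readable message out of every item under 'value'."""
    return [item["Message"] for item in payload["value"]]
```

With a real key, `incident_messages(fetch("TrafficIncidents", key))` would correspond to the list comprehension typed into the console.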
So if you look at the incidents value and look at how long it is, I guess right now there are only 20, so it wouldn't be paginated. The LTA API paginates if there are more than 50 items. We can see that if we make a request for something else. For example, if instead of incidents, I make a request to bus stops. Let's rename this. Cool. So again, this is a dictionary which has metadata we can ignore, and it gives you 50 bus stops. Each bus stop has a bus stop ID, the description, which is what you'd find on the various bus apps, as well as the latitude, longitude and the road name. The latitude and longitude tell you where in the 2D space of Singapore this bus stop is.
If you want to get more than 50 items, you need to pass in the skip parameter. This is something that you probably won't notice if you try this yourself until you bump into it: all the requests give you 50 items back and you wonder why there are only 50 bus stops. But it is, in fact, well documented. If you look for skip, it's here somewhere deep inside the PDF that you're meant to read, and it says that it only gives you back 50. So we can do that: params equals $skip, let's say 50. I don't think it gives you any indication that there's more. Let's see. So stops keys: odata.metadata. Did I get that right? Metadata. No, I don't think it gives you any indication of how to get more. You can just keep doing it until you get zero back, and that's how you know you've gotten everything. Yeah. So the skip is absolute from the start. So if you want to actually get all the data, you probably need to write a while loop like that, which just keeps fetching and fetching until you get it all down. So we can run this. And now if we fetch all with, say, bus stops, you can see it will fetch them 50 at a time.
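The while loop described here can be sketched as follows. It is a minimal version under stated assumptions: `fetch_page` is a hypothetical callable that takes the absolute `$skip` offset and returns one decoded page with its items under `value`, so the loop itself stays independent of how the HTTP request is made.

```python
def fetch_all(fetch_page):
    """Keep requesting pages until an empty one comes back.

    fetch_page(skip) is a caller-supplied function (hypothetical here) that
    returns the decoded JSON dict for the page starting at offset `skip`.
    """
    items = []
    while True:
        page = fetch_page(len(items))["value"]  # skip is absolute from the start
        if not page:
            # An empty page is the only signal that everything has been fetched.
            return items
        items.extend(page)
```

Because the response carries no "next page" marker, the loop always makes one extra request that returns nothing.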
So while this may be annoying if you're doing it manually, if you're a programmer, you just need to wait a bit longer, and in the end you still get all your stuff out. It's downloaded. If you make more than a few million requests a day, they will call you up and ask you what you're doing. Yeah. So I guess that's why they ask you for your contact number: if you start DoSing their servers, they can tell you to back off. Yeah. So now if you look at Out[22], let's call this all stops. You can look at all stops, and I guess you have data for 4,860 bus stops. From there, you can actually use the data to do more interesting things, now that you have all of it down on your computer. A lot of this data, especially the non-real-time things, is actually relatively small and doesn't change often, so it's not a problem to download it and cache it somewhere if you don't want to keep waiting.
So for example, if you look at bus stop number 0, which is Hotel Grand Pacific, we could extract fields from it if we want to. And we could write a helper function that searches for bus stops. So if you do find stops, say, orchard, it can give us information for all the different bus stops with the substring orchard in their name.
In order to build our bus arrival time monitor, which is this fellow, we need to get the information about bus arrivals, which is available under this endpoint. Let me make this bigger. So to hit this URL you need to pass in the bus stop code, which is mandatory. You can pass in a service number if you want. If you don't pass in a service number, it'll just give you back all the services for that bus stop, which is probably fine. And you can ask for the ETA in Singapore time or GMT. That doesn't really matter; in the end, it's the same. So let's do that. If we go back here.
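A search helper along the lines mentioned above might look like this. The `Description` field name is an assumption about the bus stop records, and matching case-insensitively is a choice made for this sketch rather than something the talk specifies.

```python
def find_stops(all_stops, text):
    """Return every stop whose description contains `text` as a substring."""
    needle = text.lower()
    return [stop for stop in all_stops if needle in stop["Description"].lower()]
```

Called as `find_stops(all_stops, "orchard")`, it would return the full record for each match, including the bus stop code needed for the arrivals endpoint.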
So let's define this fetch function, so I don't need to keep typing in the account key header. OK, then we have a snippet of code like this, which will look for any stops with the name opposite Orchard Station, take the bus stop code from that stop, use that to fetch the bus arrivals, pull out the services, and then display the service number and the next estimated arrival time. This is something we can run piece by piece. So for example, if we call the find stops we defined earlier with opposite Orchard Station, it only returns one stop. We then fetch it. In this case, the bus stop code is 09022. I think that looks right. And that gives us a bunch of data. Let's assign this to a variable, arrivals. This gives us a list of all the bus services that are arriving at this bus stop. If we look at, for example, the first one, you can see that it is service number 106, and the next estimated arrival is this time in GMT, which is a bit hard to understand. But if you look at the minutes, which is probably the most useful part right now, you can see that it's arriving about eight minutes from now. Similarly, they sometimes provide information about the next two buses. This time it seems they didn't.
If we run this snippet of code, it will do the same thing for every service, not just for the first one. It will give us a list of the service numbers, which are on the left, and the estimated arrival of the next bus, which is on the right. So I guess that the NightRider services haven't started yet. 36B, 36A, 174E, maybe they are not active right now. And the rest of them all have timings available just for the next bus. Parsing this into something more reasonable is a lot easier if you have a library, like pytz or one of the other datetime libraries, to do it for you.
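A standard-library-only way to turn an arrival timestamp into minutes could be sketched like this. It assumes the timestamp looks like `2017-03-01T12:08:00+00:00` and is already in UTC, so the offset suffix is simply cut off; the function name and the optional `now` argument are illustrative.

```python
from datetime import datetime, timezone


def minutes_until(timestamp, now=None):
    """Minutes from `now` (naive UTC) until an ISO-style UTC timestamp.

    Returns None for an empty timestamp, which is what inactive services
    are assumed to report in this sketch.
    """
    if not timestamp:
        return None
    # Keep only 'YYYY-MM-DDTHH:MM:SS' and drop the '+00:00' suffix.
    arrival = datetime.strptime(timestamp[:19], "%Y-%m-%dT%H:%M:%S")
    if now is None:
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    return (arrival - now).total_seconds() / 60
```

Passing `now` explicitly makes the helper easy to check against a known time; in interactive use it can be left out.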
To keep the example more or less pure standard library, I'm just going to use a helper function that does the parsing with a bit of string mangling and a strptime call. This will take the time string, parse it into a UTC time, take the difference from the current time, and then convert that to minutes. So I can call this on the arrivals I had earlier. Some of the data points are weird, and that's always a problem with these kinds of open data APIs: the data points won't always give you what you want. But most of them are reasonable arrivals for the next few buses. So 106 is in six minutes, 101 is in nine minutes, 14 is in four minutes, and so on. You can try it for other stops too. For example, if I swap this for Kallang Station and do this again, you get the bus arrival times for Kallang Station.
So that's more or less the core of what is happening with these bus arrival indicators. While they have a bunch of electronics and a bunch of networking to make it work, in the end all they're doing is taking data from the publicly available API and displaying it on a screen. And you can take the same data from the same API and display it on any screen you want, whether your screen is a terminal, as in this case, or whether you want to make a mobile app, or a chatbot like Bus Uncle that you can ask what time the next bus is coming. It really doesn't matter. And it turns out that fetching data is very easy once the data is open like this. I didn't need to have any sort of prior agreement with LTA to give me the data. It's just there and anyone can take it.
Okay, so that's the first mission accomplished. The next mission, which will take a bit longer, is to build a bus trip planner. If you've used Google Maps or, in the old days, gothere.sg, you put in a start location and an end location and it will give you a bus route, including any transfers.
For example, there's one transfer here that will take you to the final location. Of course, the fancier ones will have walking distances and MRT trips, but we will ignore that for now. The basic piece of information that we need in order to make this work is which buses go from what stop to what other stop. This information is available through their bus routes endpoint, not bus services. We can query both of them and see what they return. So if we fetch all the services, this is just calling the same fetch all function with bus services. There are, I guess, 600-something bus services, and each data point from this endpoint only gives you the frequency, who's operating it, and the start and end bus stops. It doesn't tell you all the stops in between. This is still kind of neat. For example, if we want to, we could use this to see how many bus services each of the operators is running. We can see that SBS Transit is running 350 and Tower Transit is now running 45. But it's not useful for our route planning.
To do the route planning, we need to use this bus routes endpoint. If I fetch that down, you'll see that this takes a while, and once the data is all down, you'll see why it takes so long. This is the reason why I ended up getting a call from LTA telling me not to make so many API requests. So this is definitely something that you want to cache if you're actually making use of it. I think it's something like 20,000 data points, so this should finish in another 30 seconds to a minute. Basically, the problem is that it gives you one data point for every single bus service stopping at a single bus stop. It's kind of like a denormalized database schema where every pairing of a bus service and a bus stop is a single entry.
So because there are 4,000 bus stops and 600 bus services, there are a lot of data points in this particular table. Are we done yet? Okay, we're done. So look at all route stops. Look at, for example, the first one. You'll see that each piece of information says bus stop number 4141 has a stop from service number dash S49 and it's the first stop in the sequence. That's why there are so many data points in this table. It also tells you what direction the bus is going, because the buses go in both directions, to and fro, and when the first bus and last bus are, and all the other metadata which you may or may not use. So this is a bit of an awkward data structure, but it gives you all the information you need. You just need to process it into something more useful.
For example, if I want to make a bus trip planner, presumably I want to be able to directly say, given a bus stop, what other bus stops can I get to with a direct bus, and what other bus stops can I get to from those, and so on and so forth. So I want to know, from one bus stop, what all the immediate connections are. In this case, it takes a bunch of code to get that. The first goal is to aggregate this into a set of routes which say, given, say, bus number 174, what are all the bus stops along that route, for both the outbound and inbound directions. That is relatively straightforward with a for loop and a bunch of if statements. It's not particularly complex code. If we run that and look at all routes, let's see what the keys are. We'll see that each key is, for example, 163M1, which is the bus service ID, 163M, and the direction, which is direction one. Some of them are direction two if it's going backwards. And if you look at what's actually inside that entry, 163M1, I forgot the closing quote.
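The aggregation step can be sketched roughly as below. The field names (`ServiceNo`, `Direction`, `StopSequence`) are assumptions about the bus routes records, and keying the result by a `(service, direction)` tuple is a choice made for this sketch; the demo appears to use a combined string key instead.

```python
from collections import defaultdict


def group_routes(route_stops):
    """Group flat route-stop records into ordered stop lists per route.

    Returns a dict keyed by (service number, direction), where each value is
    that route's records sorted by their position along the route.
    """
    routes = defaultdict(list)
    for entry in route_stops:
        routes[(entry["ServiceNo"], entry["Direction"])].append(entry)
    for stops in routes.values():
        stops.sort(key=lambda entry: entry["StopSequence"])
    return dict(routes)
```

Sorting by sequence means the input records do not need to arrive in route order.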
We now have a list of every bus stop that bus 163M stops at in the outbound direction. So we can say bus stop code for stop in that. Oops, messed up my brackets. These are all the bus stop codes, and if you want to, you can feed them into the bus stop data to get the actual description of each bus stop. So now we have the ability to go from a bus service ID and an inbound or outbound flag to all the bus stops along that route.
The next thing you'd want if you're trying to do a graph search to find a path from one bus stop to another is to find, for each bus stop, what other bus stops you can get to directly. Because if you want to do a breadth-first search or a Dijkstra or something, you always start off with: given where I am now, where can I go from here? That's also relatively straightforward using just a bit of Python. So this snippet of code here, let's make this bigger, takes the routes that I've just calculated and computes a dictionary. Each entry in the dictionary is keyed by the bus stop code, and for that bus stop, it gives me an array. What's in that array? That array is filled with pairs of a service and a bus stop that you can go to. So for every bus stop, I can say, well, 101 can take me here, 174 can take me there and there and there, and maybe bus 36 can take me there. That's what ends up being in this all stop connections dictionary. If I run this and look at all stop connections, this is rather big. Let's interrupt that and just pick a single bus stop. I believe this is the opposite Orchard Station bus stop, and after running this snippet of code, we now have a data structure that tells us, from Orchard Station or from any arbitrary bus stop, what bus stops I can get to and what bus numbers will take me to those bus stops.
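One way to compute such a connections dictionary is sketched below. It assumes `routes` maps a `(service, direction)` key to that route's records in travel order, each with an assumed `BusStopCode` field; both the input shape and the function name are illustrative.

```python
from collections import defaultdict


def build_connections(routes):
    """Map each stop code to the (service, later stop) pairs reachable directly."""
    connections = defaultdict(list)
    for (service, _direction), stops in routes.items():
        codes = [stop["BusStopCode"] for stop in stops]
        for index, origin in enumerate(codes):
            # A bus only takes you to stops further along the same route.
            for destination in codes[index + 1:]:
                connections[origin].append((service, destination))
    return dict(connections)
```

Listing every later stop per route is quadratic in route length, which is why the full dictionary is large when printed.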
These are only for direct buses; so far it doesn't include transfers. Now that we have a dictionary that can tell us where all the direct buses from any bus stop can go, it's a matter of doing a breadth-first search to find how to get from one bus stop to any other bus stop, which may or may not be direct. Has anyone here not used a breadth-first search before? Breadth-first search basically says: I start from my bus stop, I go to the ones next to it, then the ones next to those, and eventually I find the bus stop I wanted and look at what path it took to get there. So you basically start off with a queue of bus stops to look at. In this case, I'm just using the Python queue module, which has a queue. I'm starting from opposite Orchard Station, which is 09022, and apart from the bus stop ID, I'm also going to put in two other placeholder variables, which I'll explain in a minute. For most graph algorithms like this, you often want to keep a seen set so you don't traverse the same thing unnecessarily multiple times, and we're going to have a result variable to put the result in.
After that, you basically have a while loop which says: while there are still bus stops to look at, I take the current entry at the head of the queue and take its stop ID. If it's the stop ID I want, and I believe that's the stop ID for Anglo-Chinese School (Independent), which is where I went to high school, then I'm done. I've found my result. If the stop ID is one I've seen earlier, that means I don't need to look again, because the first time I came here I already processed the stop and all its possibilities. Otherwise, if I haven't seen it before, I mark it as seen, go through every connection, and put each connection into the queue.
And in this case, the two auxiliary variables I'm putting in the queue are the service taken to get to that bus stop and the information about the current bus stop. That means when I get to the end, current will not just include the final bus stop but also all the bus services and bus stops along the way that it took me to get there. So I can run this, which is currently hard-coded from Orchard to Anglo-Chinese School, and now if you look at what is in our result variable, we see that it starts off with this bus stop 09022, which is Orchard. None, None means I didn't take any bus service to get there, because it's the first bus stop I'm starting off from. Then I took 174 to get to 11091, and I took 48 to get to 40069, which is the final stop I wanted to get to.
So this result is a bit hard to read, partly because it's in this strange nested format and partly because the bus stop IDs aren't really readable, but you can easily process it. This is a bit of a while loop that will convert it into a more normal array-of-tuples format. And from there, let's see, all stops, have I done that yet? Since I have all the bus stops, I can make a dictionary that lets me look up any arbitrary bus stop by ID. Let's say 40069, which is Anglo-Chinese School, and I can use that to pretty-print which bus stops this route is taking. So here, for every stop ID and service, it says take the service to the description of that bus stop, given the stop ID. And let's see. So this says: from opposite Orchard Station, take 174 to Tulip Garden, and take 48 to Anglo-Chinese School. So that's a route.
And we can check it online if we don't believe it. So if you go to, like, LTA bus routes, or let's see, LTA bus 174. I believe they should have an online thing. Oh, here it is. So bus 174 stops at before Orchard Station and then ends up at Tulip Garden. And similarly we can check bus 48, which stops at Tulip Garden and ends up at Anglo-Chinese School.
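The search and the unrolling of its nested result can be sketched together as follows. This version uses `collections.deque` instead of the `queue` module mentioned in the talk, and it wraps the hard-coded start and goal into function parameters; `connections` is assumed to map a stop code to a list of `(service, next stop)` pairs.

```python
from collections import deque


def unroll(node):
    """Turn the nested (stop, service, previous) chain into a list of legs."""
    legs = []
    while node is not None:
        stop, service, node = node
        legs.append((service, stop))
    legs.reverse()
    return legs


def find_route(connections, start, goal):
    """Breadth-first search for a route with the fewest bus rides."""
    # Each queue entry is (stop, service taken to reach it, previous entry).
    queue = deque([(start, None, None)])
    seen = set()
    while queue:
        current = queue.popleft()
        stop = current[0]
        if stop == goal:
            return unroll(current)
        if stop in seen:
            continue  # already expanded via an equally short or shorter path
        seen.add(stop)
        for service, next_stop in connections.get(stop, []):
            queue.append((next_stop, service, current))
    return None  # no route exists
```

The first leg always has `None` as its service, matching the None, None entry seen for the starting stop.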
So if you don't believe it, it does actually work. This is a relatively naive, not very clever solution. It does a breadth-first search, so it finds any bus route which has the smallest number of transfers. It doesn't take into account things like how long each direct bus ride in that route is, nor does it take into account how many stops there are along each ride. But those sorts of things are relatively easy to graft on. If you look at the all route stops array, for example, it actually tells us how far along its bus route each stop is. So if we wanted to do our search using Dijkstra's, for example, and make the cost of each edge the distance travelled, we have the data there. You can just feed it in. Similarly, if we wanted to count how many bus stops there are along each section of the route and use that as a weightage, for example because each bus stop takes a fixed amount of time for a bus to stop at, we can also feed that in using our all routes information.
So you can try other ones. For example, if you tried to go from Boon Lay to Changi Airport, which is kind of far, just feed it into our code. Let's see. Find stops. Oh, Boon Lay. Let's use Boon Lay CC and Changi Airport. So that would be bus stop number 21439 to stop number 95011. We can plug that into our breadth-first search. If this was real code, you'd put it in a function and reuse it. But since this is not real code, I'm just going to edit the hard-coded constants. So it's 21439 to 95011. Then we run that. It takes a moment to run because it's Python. And we look at the result. It's a slightly longer route. We feed it through the same processing we did earlier, to first convert it into a nicer data structure and then print it out. You can see it tells you to take 240 to Boon Lay Interchange, take 187 to Woodlands Interchange, and then take 858 to Changi.
Probably not the fastest route in terms of distance covered. But I guess it's one of the fastest routes according to the number of transfers: there aren't any one-transfer routes that go from Boon Lay CC to Changi, so you have to transfer twice. Cool. So that is how you make a bus trip planner using the open data. It's kind of neat because I didn't really need to have any interaction with anyone. All the data is available online. I just need to solve a captcha and I get access to it. The documentation is all online, so I just need to read the documentation. And once I've done that, I can do whatever I want with all the data. If you want to augment this, the bus route distances are online, the number of bus stops is online, the MRT station information is all online. And there's also an API to calculate walking distances, so if you want to include a walking distance metric in your search, you could do so.
So that's it for the bus trip planner. I guess the last interesting demo is going to be the taxi map. Has anyone seen this kind of thing online? It's like someone's cute demo on his blog or on Facebook. All this information is publicly available, and you can make one of your own as well without too much effort. While the previous two demos both used the LTA DataMall API, for this one I'm going to use the data.gov.sg taxi availability API. The same data is also available from LTA, but this one gives it to you in a more convenient format. For data.gov.sg, you also need to sign up for an API key. You just sign up for an account, putting in your email and your phone number, same as for LTA. Then you make a request to this endpoint and get back the lat/long of all the taxis in Singapore at the time you made the request. So let's try that. Let's see. Over here I have a similar script file.
I need to start off by getting an API key, which I've already gotten, and putting it into a variable. After that, it's a matter of making the GET request to api.data.gov.sg/v1/transport/taxi-availability. You need to pass in the date-time you want. They also have historical data. I'm not sure how long they keep historical data, but you can ask for a few weeks ago or a few months ago, and it's all there. You pass in your API key, and when it comes back, you get a big array which you need to extract. So this should be features. And then let's look at, I guess, the first one. You want the geometry, and then you want the coordinates. This gives you the latitude and longitude of the 4,600 taxis that were on the road at last midnight GMT, which was approximately, I think, 10 a.m. this morning or 8 a.m. this morning. You can ask for different times as well. So this is a lot more convenient than paging through the LTA API batch by batch. It just gives it to you all at once. It gives you a lot less, I guess, miscellaneous information, but if you want to look at where the taxis are, this is all you need to know.
To explore this, we can, for example, find out what the maximum or minimum longitude is, which is information we will need later if we want to draw a pretty picture, to know where the bounds of the picture are. If we want both the max and min of longitude and latitude, you can do that with a simple loop that maxes and mins all of them together, respectively, for x and y. And then we can see that x goes from 103.6 to 104.0. I think I mixed up the min and max. Oh, well. y goes from 1.2 to 1.4. That's the latitude above the equator. So since we want to fix our box as something we can display in the terminal, I'm going to have a width of only 40 pixels for the heat map.
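The extraction and bounding-box steps can be sketched as below. It assumes the response is GeoJSON-shaped, with all taxi positions under the first feature's `geometry.coordinates` as `[longitude, latitude]` pairs; the function names are illustrative.

```python
def taxi_coordinates(payload):
    """Dig the list of [longitude, latitude] pairs out of the response."""
    return payload["features"][0]["geometry"]["coordinates"]


def bounding_box(coords):
    """Return (min_x, max_x, min_y, max_y) over all coordinate pairs."""
    xs = [x for x, _y in coords]
    ys = [y for _x, y in coords]
    return min(xs), max(xs), min(ys), max(ys)
```

Here x is longitude and y is latitude, matching the ranges read out in the talk.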
So it won't be really high def and high production value like this; it'll just be a really crude, small little heat map showing you where the taxis are. So given the width is 40, you can do some math to figure out what the height should be, given that you know the width and height of the whole bounding box. It turns out in this case, the height is 23. You can make a grid of zeros. Let's start with a smaller width and height. Let's start with 20. So width 20, height 10, grid of zeros, 20 by 10. And then it's just a matter of taking all the taxi coordinates. I've already shown you how we need to query into the JSON data structure to get the coordinates out. So we take all that. We calculate which x cell and which y cell it should be in the grid of 20 by 11. And we simply increment that cell by 1. So if you run this and then you look at your grid, it looks a lot different now. And it's not displaying very prettily. But you could force it to display prettily and you end up with a grid that looks like this, which is kind of cool. You can kind of see where Singapore is, right? This diamond shape here, that's Singapore, and there's a hole of zeros here where there are no taxis because that's Upper Peirce Reservoir. This is cool. And then of course the rest are all zero because there are no taxis in the Straits of Johor. I believe it covers all the taxi companies. Not Uber? Not Uber, I guess. It should cover things like GrabTaxi but I don't think it'll cover GrabCar and those. Yeah. It also doesn't provide any information about the individual cars, because of privacy reasons. So it's purely useful as an aggregate visualization. You can't see which taxi companies are crowding in which areas or which cars are going on which routes, even though that might be useful. So now we've gotten a crude map of Singapore. We can make it slightly bigger so it looks good on the screen. Let's do that. OK, so slightly bigger.
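The binning step described above, where each taxi coordinate is mapped to a cell and that cell is incremented, could look something like this. It is only a sketch under stated assumptions: `build_grid` and `format_grid` are illustrative names, coordinates are taken to be (longitude, latitude) pairs, and the bounding box is assumed to have non-zero width and height.

```python
def build_grid(coords, width, height, bounds):
    """Count how many coordinates fall into each cell of a width x height grid."""
    min_x, max_x, min_y, max_y = bounds
    grid = [[0] * width for _ in range(height)]
    for x, y in coords:
        # Scale into 0..width and clamp so points on the max edge stay inside.
        col = min(int((x - min_x) / (max_x - min_x) * width), width - 1)
        # Row 0 is printed first, so flip y to keep north at the top.
        row = min(int((max_y - y) / (max_y - min_y) * height), height - 1)
        grid[row][col] += 1
    return grid


def format_grid(grid):
    """Right-align every count in a fixed-width column so rows line up."""
    return "\n".join("".join(f"{n:>4}" for n in row) for row in grid)


# Three made-up points: two in the north-west corner, one in the south-east.
points = [(103.6, 1.4), (104.0, 1.2), (103.6, 1.4)]
print(format_grid(build_grid(points, 4, 2, (103.6, 104.0, 1.2, 1.4))))
```

Flipping the row index is a design choice for printing; without it the map would come out upside down, since latitude increases northward while terminal rows increase downward.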
And the next thing to do would be to make this more readable, because this is not very easy to read despite the information being there. For example, you can see that there's one spot with 315, which is Changi Airport. All the taxis queuing up outside terminals 1, 2, and 3, that's 315 of them. So there are a few things to do to make it readable. One is you probably want to get rid of all the zeros because they don't tell us anything. Another is you want to color it so you can see at a glance where the high density and low density areas are. Getting rid of the zeros is relatively easy, but I'm just going to do both at the same time. So for terminal colors, you can print color in the terminal. Most terminals support 16 colors, which is red, green, blue, yellow, magenta, and a few like that. They also support 256 colors, and some of them will support true color, which is about 16 million colors. So for the first pass, I'm going to be making use of the 16 colors, or just five or six of them, which is red, green, orange, blue, and purple. So basically how this works is you print out this magic incantation. Oops, caps lock. If you print out this magic incantation followed by hello, in that case it turns everything white, so that's not very interesting. I guess I want to print out another color plus hello plus white, so it goes back to white. That allows you to print out colored text in the terminal. So if you're writing command line tools, it's sometimes useful for making the output easier to read or skim. And there are libraries to do this if you don't want to be printing out magic strings yourself. So I'm going to define what these colors are. And I'm going to say that the colors I'm going to use, in order, are blue, green, orange, yellow, and red. So the high density cells are going to be red, and the lower density ones are going to be blue.
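The "magic incantation" described above is an ANSI escape sequence. A minimal sketch of the idea follows; the specific codes chosen for orange and yellow are assumptions (the basic palette has no true orange, so code 33 stands in for it and bright yellow 93 for yellow), and this version resets to the terminal default where the talk switches back to white.

```python
# ANSI SGR escape sequences: ESC [ <code> m
RESET = "\033[0m"   # back to the terminal's default colors
BLUE = "\033[34m"
GREEN = "\033[32m"
ORANGE = "\033[33m"  # dark yellow; many terminals render it orange-ish
YELLOW = "\033[93m"  # bright yellow
RED = "\033[31m"

# Low density first, high density last, as in the talk.
COLORS = [BLUE, GREEN, ORANGE, YELLOW, RED]


def colorize(text, color):
    """Wrap text in a color code and reset afterwards."""
    return color + text + RESET


for color in COLORS:
    print(colorize("hello", color))
```

Resetting after every string keeps the color from leaking into whatever the terminal prints next.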
Next, in order to properly spread out the colors so that they are more or less evenly distributed on our taxi map, we need to calculate percentiles. So we can say the lowest 20% of the grid cells are going to be blue. The next 20% are going to be green, and so on through orange, yellow, and red. Doing that in Python is relatively easy. It would be easier with something like NumPy, but doing it ourselves is trivial as well. So I collect all the cells which are non-zero. I sort them and enumerate them. So I list the enumerated cells. Now basically every cell will get a ranking. So for example, Changi Airport with 315 taxis is the highest in the list, with ranking number 261. And at the top, you have all the cells with only one or two taxis, and those are ranked lower. From there, you can invert the mapping to say, given each value, what would the ranking be? Previously I had a list of ranking to value, but you can easily invert it in Python to go from value to ranking. So now I have this rankings dictionary, and you can see that 315 is at the top with rank 0.99 and the cells with one taxi are at the bottom with rank 0.1. Once that's in place, you can use it to color the cells based on their ranking. So this is a slightly more verbose way of printing it, but what I'm doing is adding a zero check to print out blank strings, to make sure we don't have the zeros cluttering our field of view. And for the non-zero cells, I'm going to calculate the index into our color array using the ranking of that cell, and use that color to color that number. So if we run this, we get a much easier to read view of where the taxis are in Singapore. We can see that, for example, this 105, this 151, this 351, these are all red. So the CBD area around Chinatown has a lot of taxis. Again, there's a hole in the middle. Lim Chu Kang has very few taxis. There's no one booking out at that time.
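The ranking and coloring steps described above can be sketched as follows. This is an illustration under assumptions, not the speaker's code: the function names and the four-character cell width are made up, the color codes are one possible choice for blue, green, orange, yellow, and red, and ties take the highest rank among equal values, which matches the talk's figure of 0.1 for cells holding a single taxi.

```python
RESET = "\033[0m"
# Blue, green, orange (dark yellow), bright yellow, red: low to high density.
COLORS = ["\033[34m", "\033[32m", "\033[33m", "\033[93m", "\033[31m"]


def rank_values(grid):
    """Map each non-zero cell value to a rank in the range 0 <= rank < 1."""
    values = sorted(n for row in grid for n in row if n > 0)
    # Later duplicates overwrite earlier ones, so ties take the highest rank.
    return {value: i / len(values) for i, value in enumerate(values)}


def render(grid, rankings, colors=COLORS):
    """Blank out zero cells and color the rest by percentile rank."""
    lines = []
    for row in grid:
        cells = []
        for n in row:
            if n == 0:
                cells.append(" " * 4)
            else:
                # rank < 1, so the index is always within the color list.
                color = colors[int(rankings[n] * len(colors))]
                cells.append(color + f"{n:>4}" + RESET)
        lines.append("".join(cells))
    return "\n".join(lines)


demo = [[0, 1, 1], [5, 0, 315]]
print(render(demo, rank_values(demo)))
```

Ranking by position in the sorted list, not by raw count, is what spreads the colors evenly: one cell with 315 taxis would otherwise push almost every other cell into the lowest color.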
And the last thing to do in this demo is to make use of the more fine-grained 256 colors in order to get a better visualization of where the taxis are, because in this case it's still kind of unclear: Changi Airport with 315 taxis is the same color as, say, this spot with only 25 taxis. The way the colors work in the terminal is somewhat arbitrary but well-documented. Essentially, you end up with a magic string that looks like this. You have the magic string that says this is a color, and the index is an encoded version of the red, green, and blue for that color. That's how it works for the 256 colors. So the index is 16, plus blue from 0 to 5, plus green from 0 to 5 times 6, plus red from 0 to 5 times 36. So it's a basic encoding of the RGB, plus 16 because it needs to not clash with the initial 16 colors that were already implemented. You can see over here, this is a function that's scaling it from 0 to 5. And before that, we are also converting from hue, saturation, value to RGB. Hue, saturation, value lets you control the color separately from the brightness and the saturation. In this case, we are always going to have the saturation and brightness be one and one, so it's always going to be bright colors, and we're just going to change the actual color based on the ranking of the taxi cell. So if we run this, you get a much more, I guess, pretty, fine-grained version of what we saw earlier. You can see that now only the truly hot areas are red. And now there's a nice gradient of oranges and yellows that slowly bleeds out into greens, blues, and purples where there are very few cabs. And if I just re-render this slightly larger, I should just take all this and paste it in. Yeah, so this is an ad hoc taxi heat map built in the terminal using the open data APIs. It's kind of pretty. I think it's kind of pretty.
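The 256-color encoding described above, index = 16 + 36 × red + 6 × green + blue with each component from 0 to 5, can be sketched like this. The mapping from rank to hue is an assumption made for illustration (rank 1 maps to red, rank 0 to purple), as are the function names; the talk does not spell out the exact hue range used.

```python
import colorsys


def scale(component):
    """Scale a 0.0..1.0 color component to an integer from 0 to 5."""
    return int(component * 5 + 0.5)


def cube_index(red, green, blue):
    """Index into the 6x6x6 color cube that follows the first 16 colors."""
    return 16 + 36 * red + 6 * green + blue


def rank_to_escape(rank):
    """Turn a rank in 0..1 into a 256-color foreground escape sequence."""
    # Assumed hue range: 0.0 is red (hot), 0.75 is purple (cold).
    hue = (1.0 - rank) * 0.75
    # Saturation and value fixed at 1, so only the hue changes.
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return f"\033[38;5;{cube_index(scale(r), scale(g), scale(b))}m"


# Print a small gradient from cold to hot.
for step in range(11):
    print(rank_to_escape(step / 10) + f"{step / 10:.1f}" + "\033[0m", end=" ")
print()
```

Working in hue, saturation, value means one number (the hue) sweeps through the whole rainbow, which is simpler than interpolating red, green, and blue separately.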
Maybe my sensibilities aren't good. OK, so that's the taxi map. So I guess in the last 40 minutes or so, I've built three small demo applications for you using Singapore's open data APIs. One is the bus timing monitor using LTA's bus arrival time API. Another is the bus trip planner, which uses the open data available about the Singapore bus network, also from LTA. And the last one is the console version of the taxi heat map, which is built using the data.gov.sg APIs. So I guess what's the point of all this? At least to me, the point is that open data is nice because it lets anyone, no matter who they are, make use of your data to do interesting things. In the old days, when this was not open, you needed to be a company and do a deal with LTA to get access to the data in some way, and you then needed to have a big team that would productionize it into some kind of application. But nowadays, because all this data is open, you can have one guy with 40 minutes of time make use of the data to build a bunch of interesting applications. So whether you're a small team in a small company or a big company, or whether you're a student working on a school project, anyone is able to make use of these open data APIs in order to make cool applications. And all of these applications are totally within the grasp of someone doing a school project. They're not difficult, and they are not a lot of code. So for example, the taxi map visualization which we drew in the terminal, this one, is 100 lines of code with basically no libraries, except for the requests library. And the bus arrival time indicator plus the trip planner is only 200 lines of code, including a bunch of experimentation and exploration code. So it's really easy, much more so than if it was a closed proprietary data set that you had to do a deal to get access to. So these are a subset of the Singapore open data APIs that the government provides. I'm not working with them, I was just fiddling around.
That's how I learned about all these things. Apart from these, there are a few more. This list is not comprehensive, but there aren't a huge number of APIs available, unfortunately. If any of you have worked in companies trying to expose an API to third parties, you'll know it's a lot of work to do it properly. So hopefully over time, more of these will get opened up. But for now, this is what's available, and this kind of restricts what you are able to do with the data. So that's exploring Singapore's open data APIs. Hope you enjoyed the talk. I still think this is really pretty. So the two scripts which you can use to follow along are available on GitHub. And as I mentioned, they're not meant to be run, but they're meant to be walked through yourself, pasted into a terminal to see what happens. They're all Python 3, which is the future. Yeah, so I work at Bright. We do data science and software engineering consulting. So if any of you have data science or software engineering work that needs to get done, or need training, come look for us after. And that's my email. Thanks. I guess any questions? Just one question. Yeah? Have you worked with URA data? I have not really worked with URA data. What do they provide? They provide all the property transactions. Okay, interesting. For three years. Okay. So the problem is only that they only provide three years. Okay. And if you apply to their REALIS, they give you 10 years. So you pay, you get 10 years. Okay. If you don't pay, the public service, I think, is only three years. And they provide it in CSV. Okay. It's an API. Okay. I guess that's reasonable. You generally don't need it to be real time. Right, right, right. Yeah. I guess the whole thing is in the process of slowly opening up. Like when I first tried to get the bus data, I was told it was proprietary and they could not give it to me. I just wanted to know where the bus stops were, and back then it was all proprietary.
So I assume over time, they'll probably open up more. Yeah. Any other data that you work with in real estate, or just... Not really. I know the OneMap API has some real estate data. Like the OneMap API will tell you how much people living in different places earn and what kind of jobs they do and that sort of thing. Like from the census. Yeah. How about we have a talk on combining URA and LTA data? Right. By mapping out, like, property prices. How close to the MRT... MRT, yeah. That's right. Yeah. So this data is available. Yeah. Cool. Any other questions? No. Going once, going twice. Yeah. Any other comment? Data.gov.sg is running on CKAN, which is a project in itself. I believe so. Is it like an open source data platform? Yeah. Cool. I guess we are done then. Thanks.