Hello, everyone. The topic of my talk is exploring seasonal insights from Singapore weather station data.
First, a little bit about me. I'm Chin-Wi, a data engineer at SD Engineering. My background is in aerospace engineering and computational science. I also contribute to pandas: if you look at the latest documentation for pandas 1.0, I contributed to it. In my free time, I volunteer as a mentor at Big Data X.
Now, Singapore. We all know the coordinates: about 1 degree north of the equator, somewhere around here. In many places, a year is split into four seasons. There's spring, with all the sakura; summer, when people go to the beach and sun tan; autumn, with beautiful autumn leaves that feel very romantic; and winter, which is very cold but still pretty fun. But what about Singapore? Do we have sakura? Do we have maple leaves? Do we have snow? Maybe the only snow we have is at Snow City. So our four seasons are: hot, very hot, extremely hot, and rainy. So boring. Or maybe not. Why not discover how many seasons we have through the data?
First, we have to extract the weather data. Because I'm a data engineer, I don't want to rely on the aggregated data that the government releases; I want the raw data, so I extract it on my own. I went to data.gov.sg and looked at the APIs. There's an API for real-time readings at minute-level resolution, so you get a new reading every few minutes. That's about as granular as it gets. It's a real-time API, and it's open government data, so I don't need to pay for it. Very good, right? But then...
So why go through all this trouble when we have services like Weather@SG to tell us today's weather? Or nice apps you can download, like this one, Qion's app, which tell you the current wind direction, whether it's going to rain, and so on. But those give data at a single point in time. What I'm actually looking for is to scrape data from a specific weather station, say a whole month of it, to analyse. So how do I do that?
First things first: when we use an API, we play around with it to see what output we get. There's a curl command to ping the server and see what response comes back, there's the website, and there's the API page that documents how to extract the data. The documentation looks okay, right? So I tried one day, and this is the result I got.
Let's count the levels. There's one set of curly brackets, then another set, and my data is nested all the way inside. For example, I can see the station ID in one place and the reading I want in another. And it's not just flat JSON: it's JSON with more JSON inside, a nested JSON format. So things aren't so simple anymore.
Then I thought: pandas has read_json, so it should help me. But this is what I got. Look at the table: it has an index, a readings column, and a timestamp column. I want the value for one weather station, but it's buried in the readings column, still nested as JSON. What to do? Since each cell of the readings column is itself JSON, why not parse that JSON too? It turned out not to be so straightforward, so I decided to publish my code on GitHub.
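As a sketch of that parsing step, pandas' `json_normalize` can unnest a list of records while copying a parent field onto each row. The payload below is a small hypothetical example shaped like the nesting described above (an `items` list whose entries hold a `timestamp` and a `readings` list of `station_id`/`value` pairs); the real response carries more fields, and station `S50` and all values are made up.

```python
import pandas as pd

# Hypothetical payload mimicking the nesting described in the talk.
# Station "S50" and all values are made up for illustration.
payload = {
    "items": [
        {
            "timestamp": "2019-05-01T00:00:00+08:00",
            "readings": [
                {"station_id": "S24", "value": 27.9},
                {"station_id": "S50", "value": 27.1},
            ],
        },
        {
            "timestamp": "2019-05-01T00:01:00+08:00",
            "readings": [
                {"station_id": "S24", "value": 27.8},
                {"station_id": "S50", "value": 27.0},
            ],
        },
    ]
}

# Building a DataFrame directly keeps each "readings" list as one opaque cell.
naive = pd.DataFrame(payload["items"])
print(naive.columns.tolist())  # ['timestamp', 'readings']

# json_normalize emits one row per reading and copies the parent timestamp
# onto each row via `meta`.
flat = pd.json_normalize(payload["items"], record_path="readings", meta=["timestamp"])
print(flat)
```

Filtering `flat` on `station_id == "S24"` then gives one station's time series directly.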
It might not be perfect, but at least it does the job. One of the main libraries I use to get the data is the requests library in Python: raw HTTP isn't very human-friendly, so we need a library to help. Yes, I know, people say you shouldn't just import libraries anyhow, but sorry, I need to get the job done.
Currently, the code supports the air temperature and rainfall APIs. I also did a bit on humidity, but in the interest of time I won't go into that. There are about 17 weather stations in total, and the code can scrape data for a continuous time range and for a specific weather station, which is the output we as data professionals are more interested in.
Now some design considerations. First, connections can be very slow, because we are making HTTP requests. To make sure we still get the data over a slow connection, I use a retry mechanism from the retry library. Second, the API may be working and return status code 200, but with no data inside, so I have to handle that edge case too. When there is no data, the result is simply empty, not even NA, so I return an empty DataFrame with the same columns it would have had if there were data.
Then there's the main problem: converting the nested JSON into a pandas DataFrame. First, I extract the desired station and its readings from the JSON; then I concatenate them back with the timestamp, matching each reading to its weather station. That's the approach. Remember that in the pandas output, the readings are stuck in the middle.
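Here is a minimal sketch of those two design considerations, under stated assumptions. The talk's code uses the `retry` library; to stay self-contained, this sketch swaps in a small hand-rolled `retry` decorator instead. `readings_to_frame`, `COLUMNS`, and the payload shape are hypothetical names, not the project's actual code. The point is that a healthy-but-empty response turns into an empty DataFrame with the usual columns rather than an error.

```python
import time
from functools import wraps

import pandas as pd

# Assumed output schema for one station's readings (hypothetical).
COLUMNS = ["timestamp", "station_id", "value"]


def retry(tries=3, delay=1.0, backoff=2.0, exceptions=(ConnectionError, TimeoutError)):
    """Minimal hand-rolled retry decorator, a stand-in for the `retry` package.

    Retries only on the given exception types, sleeping `delay` seconds before
    the first retry and multiplying the wait by `backoff` each time. With
    `requests`, pass `requests.exceptions.RequestException` instead, since it
    does not subclass the built-in ConnectionError.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    time.sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator


def readings_to_frame(payload, station_id):
    """Flatten a nested response into (timestamp, station_id, value) rows for one station.

    A successful response with no items or readings yields an empty frame with
    the same columns, so later concatenation still works.
    """
    rows = [
        {"timestamp": item.get("timestamp"), "station_id": r.get("station_id"), "value": r.get("value")}
        for item in payload.get("items") or []
        for r in item.get("readings") or []
        if r.get("station_id") == station_id
    ]
    return pd.DataFrame(rows, columns=COLUMNS)


# Usage with a fake fetcher (no network): it fails twice, then returns an
# HTTP-200-style payload that simply has no items.
attempts = []


@retry(tries=3, delay=0)
def fake_fetch():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("simulated slow connection")
    return {"items": []}


empty = readings_to_frame(fake_fetch(), "S24")
print(len(attempts), empty.shape, empty.columns.tolist())
# 3 (0, 3) ['timestamp', 'station_id', 'value']
```

Retrying only on transport errors is deliberate: a 200 response with no data means the API is healthy, so retrying it would not help.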
So I extract them and get the readings for the weather station I want. Then I merge in the timestamp column, which gives me the timestamp, station ID, and value columns.
Let me show a bit of a demo of what's actually happening. This is the command-line version. First, we enter a start date in a specified format, and then the number of days from that date. So if I want data from the 1st to the 7th, I key in 7 days. Then it asks which API to scrape from: air temperature or rainfall. Maybe relative humidity? Let's try scraping air temperature.
While this is running, I'll show a bit more of the code. There are about 17 weather stations that measure air temperature, and the one I'm interested in is Changi Airport. The code for the Changi weather station is S24, so that's what I'll key in. After that, I'll use the scraped data for a time series analysis of the weather station data, to find trends and patterns.
Let's check the status first. It's a bit laggy, but never mind. Let me show you the internals of the code I use for scraping. First, there's the retry decorator. Then there's one of the edge cases I encountered: sometimes when I keyed in a date range, say in April 2019 with a public holiday somewhere in between, there was no data, and the application would just freeze for some time. When I debugged it, I realised that sometimes there's simply no data even though you get status code 200.
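The command-line prompts boil down to two small pieces of logic: turning "start date plus N days" into a list of dates (7 days from the 1st should cover the 1st through the 7th inclusive), and mapping a menu choice to an API. A sketch under assumptions: the YYYY-MM-DD input format, the base URL, and the names `endpoint_for`, `date_range`, `scrape_range`, and `fake_fetch_day` are all hypothetical. The `fetch_day` argument stands in for a per-day scrape that extracts station S24's readings as described earlier.

```python
from datetime import datetime, timedelta

import pandas as pd

# Hypothetical menu -> endpoint mapping. The base URL is an assumption based on
# the data.gov.sg v1 real-time environment APIs; check the current docs.
BASE_URL = "https://api.data.gov.sg/v1/environment/"
ENDPOINTS = {"1": "air-temperature", "2": "rainfall"}


def endpoint_for(choice):
    """Dictionary dispatch in place of a switch statement or if/elif chain."""
    try:
        return BASE_URL + ENDPOINTS[choice]
    except KeyError:
        raise ValueError(f"Unknown option {choice!r}; choose from {sorted(ENDPOINTS)}") from None


def date_range(start, days, fmt="%Y-%m-%d"):
    """Return `days` consecutive date strings from `start`, inclusive (format assumed)."""
    first = datetime.strptime(start, fmt).date()
    return [(first + timedelta(days=i)).strftime(fmt) for i in range(days)]


def scrape_range(start, days, fetch_day):
    """Call `fetch_day(date_str)` per day (days >= 1) and stack the per-day frames."""
    return pd.concat([fetch_day(d) for d in date_range(start, days)], ignore_index=True)


# Hypothetical stand-in for one day's scrape of station S24 (placeholder value).
def fake_fetch_day(day):
    return pd.DataFrame({"timestamp": [day], "station_id": ["S24"], "value": [27.5]})


print(endpoint_for("1"))  # https://api.data.gov.sg/v1/environment/air-temperature
print(date_range("2019-05-01", 7)[-1])  # 2019-05-07
print(scrape_range("2019-05-01", 7, fake_fetch_day).shape)  # (7, 3)
```

With a dictionary, supporting another API is one more entry rather than another `elif` branch; on Python 3.10 and later, a `match` statement is another option.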
Status code 200 means the problem is not that you can't access the API; the API is healthy. So you have to cater for the edge case where the API is healthy but simply has no data.
Then there's the switch mechanism, where you choose whether to use air temperature or rainfall. In C we have switch statements, but in Python we don't, so right now I have to keep using if-else. Imagine I had 5 APIs to handle: this is what it would look like.
Let me check the status again. After scraping 7 days, the station IDs are shown. As mentioned, I'm scraping the Changi weather station data, which is S24. At this stage, all the weather station data from 1st May to 7th May has been scraped: all the JSON has been pulled from the API, and now it's extracting the data for that particular station from the nested JSON, as I showed earlier. This will take quite a while, and since it doesn't affect the rest of my workflow, I'll show another notebook in the meantime.
This next one is about extracting seasonal trends. As I mentioned, the analysis period runs from about 2017 to December 2019, and the objective is to see whether any seasonal trends can be observed over those 3 years. After I got the data, I plotted the raw data, which actually looks like a mess. The plot shows that the temperature stays within a very narrow range over the 3 years. If you look at plots for other countries, you'll see a much more pronounced cycle, but here the cycle does not seem very clear.
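Before looking for cycles, the scraped rows need to become a time-indexed series. A small sketch, assuming the scraper's output has `timestamp`, `station_id`, and `value` columns (assumed names) with ISO-8601 timestamp strings; the values are made up. Dropping duplicate timestamps is an added precaution, in case overlapping requests return the same reading twice, not something the talk describes.

```python
import pandas as pd

# Hypothetical long-format frame shaped like the scraper's output; values are made up.
scraped = pd.DataFrame({
    "timestamp": [
        "2019-05-01T00:01:00+08:00",
        "2019-05-01T00:00:00+08:00",
        "2019-05-01T00:01:00+08:00",  # duplicate row, e.g. from overlapping requests
        "2019-05-01T00:02:00+08:00",
    ],
    "station_id": ["S24", "S24", "S24", "S24"],
    "value": [27.8, 27.9, 27.8, 27.7],
})

# Keep one station, drop repeated timestamps, parse them, and sort by time.
series = (
    scraped[scraped["station_id"] == "S24"]
    .drop_duplicates(subset="timestamp")
    .assign(timestamp=lambda df: pd.to_datetime(df["timestamp"]))
    .set_index("timestamp")
    .sort_index()["value"]
)

# A quick numeric check of how narrow the raw range is.
print(series.describe())
print(series.max() - series.min())
```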
So to decipher the cycles, we have to do some magic. After resampling, the trends become a bit clearer. You can see some ups and downs: the trough is around December and January, the temperature rises a little in the middle of the year, and it fluctuates again towards the end of the year. For rainfall, it's not so clear.
To extract the seasonality, we have to analyse by month. So I grouped the data by month and made a box plot. If you look at the month-wise box plot, at the mean and median temperature for each month, you can see lower temperatures in December and January and increasing temperatures towards the middle of the year.
How about a yearly view of air temperature? This is 2017: the median is around 28. 2018 looks a little different.
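To make the resample-then-group steps concrete, here is a sketch on synthetic data (not the Changi readings), so the numbers it prints say nothing about Singapore's actual climate. It assumes a pandas Series indexed by timestamp; the 30-minute frequency and the sine-wave shape are made up, chosen only so the month-wise and year-wise views have something to show.

```python
import numpy as np
import pandas as pd

# Synthetic 30-minute air temperature series, NOT real station data:
# a weak annual cycle (coolest around December, warmest around June),
# a daily cycle, and noise.
rng = np.random.default_rng(0)
idx = pd.date_range("2017-01-01", "2019-12-31 23:30", freq="30min")
day_of_year = idx.dayofyear.to_numpy()
hour = idx.hour.to_numpy() + idx.minute.to_numpy() / 60
temp = (
    27.5
    + 1.0 * np.sin(2 * np.pi * (day_of_year - 80) / 365.25)
    + 2.0 * np.sin(2 * np.pi * (hour - 9) / 24)
    + rng.normal(0.0, 0.5, len(idx))
)
series = pd.Series(temp, index=idx, name="air_temperature")

# Resampling to daily means averages out the daily cycle and most of the noise.
daily = series.resample("D").mean()

# Month-wise view: pool all years by calendar month (the basis for a box plot).
monthly_median = daily.groupby(daily.index.month).median()

# Year-wise view: one median per calendar year.
yearly_median = daily.groupby(daily.index.year).median()

print(monthly_median.round(2))
print(yearly_median.round(2))

# With matplotlib installed, the month-wise box plot could be drawn with:
# daily.to_frame().assign(month=daily.index.month).boxplot(column="air_temperature", by="month")
```

Grouping by `index.month` pools the same calendar month across all three years, which is what a month-wise box plot needs; grouping by `index.year` gives the yearly view.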
You can see a bit of deviation, but the median doesn't change much. But if you look at last year, it's getting hotter: there's an increasing trend over the 3 years, so unfortunately there's some sign of global warming.
Next, look at the daily rainfall from December 2016 to December 2019. In December 2019 we had a lot of rainfall, and the running rainfall shows certain very short bursts in 2018 and 2019. This suggests that the drier months are getting drier and the wetter months are getting wetter. And if we associate rainfall with temperature: no rain means it's going to be hot, so please be mindful of your energy consumption.
Another interesting fact is the monthly number of rain days. It seems we don't get as many rain days nowadays: 2019 has fewer, but if you look at December in 2017 and 2018, there were a lot of rain days. So it suggests that we have fewer rain days, but the rain is getting heavier.
Let me check: the scraping is done. So, after seeing all those patterns, how many seasons do you think Singapore has? Two seasons? Any other answers? Yes, it's about two seasons.
Basically, that's it. The main takeaway: if your data is not easy to get, just build a tool to make your life easier. Feel free to take a picture. You can reach out to me at any of these, and my project is over here. It's something I'm developing on and off, so I might make some changes.
Oh, and one last announcement: do you all know about the Global Diversity CFP Day? I'll be there. If you want to speak at a conference and you need help, come along. Okay.
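Returning to the rainfall analysis above, the rain-day argument comes down to two monthly numbers: total rainfall and the count of rain days, whose ratio is the average rain per rain day. A sketch on synthetic data (not the scraped readings), assuming daily totals in millimetres. `RAIN_DAY_MM` is an assumed threshold for what counts as a rain day; the 0.2 mm value is a placeholder, so check the definition your data source uses.

```python
import numpy as np
import pandas as pd

# Synthetic daily rainfall in mm, NOT real data.
rng = np.random.default_rng(1)
days = pd.date_range("2017-01-01", "2019-12-31", freq="D")
wet = rng.random(len(days)) < 0.4  # roughly 40% of days get some rain
rain = pd.Series(
    np.where(wet, rng.gamma(shape=0.8, scale=12.0, size=len(days)), 0.0),
    index=days,
    name="rain_mm",
)

# Assumed threshold for counting a day as a "rain day"; use your own definition.
RAIN_DAY_MM = 0.2

month = rain.index.to_period("M")
monthly = pd.DataFrame({
    "total_mm": rain.groupby(month).sum(),
    "rain_days": (rain >= RAIN_DAY_MM).groupby(month).sum(),
})
# Fewer rain days with a similar monthly total means heavier rain per rain day.
monthly["mm_per_rain_day"] = monthly["total_mm"] / monthly["rain_days"].where(monthly["rain_days"] > 0)

# A 30-day running total makes short, intense bursts stand out.
running_30d = rain.rolling(window=30).sum()

print(monthly.tail(3).round(1))
```

Comparing `rain_days` and `mm_per_rain_day` across years is one way to check the "fewer but heavier" pattern on real data.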