Thanks for the introduction. Yeah, I'm Douglas. I work just down south of Edinburgh [inaudible], so this is a brilliant opportunity for me to come and talk, and thanks for sticking around on a sunny Friday afternoon. For this talk I'll introduce myself, say what I'm doing here, and talk through a case study of [inaudible: a stretch of the recording here was not transcribed intelligibly]

... so I needed something nice to process the output from it, and that's where Python came in. I did use IDL for a while, and then I ran away from it like wildfire. Python's been much nicer to me. So my work is mainly within my research group, as a sort of data wrangler person. I heard the term 'research data engineer' the other day, and maybe I fall into that category, but it's just a title, isn't it? So I'm somewhere in the middle of all these things; hopefully not jack of all trades, master of none, but you could argue that.
Right, so a brief introduction to air quality. Now, if you saw the keynote this morning, this was touched on. Basically, all it is is a measure of how polluted the air we breathe is, so 'air quality' and 'air pollution' are used synonymously and interchangeably. In this case I'm specifically talking about pollution with direct health effects: nitrogen dioxide, ozone, and particulate matter, which is basically just soot. This is not, in this case, greenhouse gases like carbon dioxide and methane, because these affect climate and not health directly. All of these are generally emitted from traffic, but also from natural sources such as fires. Interestingly, there was actually a fire on Blackford Hill, which is just near the observatory in Edinburgh, yesterday, luckily put out by the fire brigade before it got too far. Maybe it was a stray cigarette or something, but you could smell that smoke in central Edinburgh, and so that was having an impact on the air quality.

Just to bring the tone down a bit, this has been in the news an awful lot recently. A quick Google brings up lots of stories about how this affects your health and the impacts it's having, and it's pretty horrible. It's becoming more and more at the forefront of people's minds, which is a good thing generally. However, we need to monitor this, because just saying it's there, or smelling it, doesn't give us much information. So what we need is some way to get this monitored air pollution into an accessible form, which is where Python comes in.

This is a little quote up in the top corner that I heard in a talk yesterday by someone called Alex Jacob, and I thought it was perfect: data only has value when it's relevant. That's so true, because you get a number from a monitoring station and it is meaningless to most people. Who cares if ozone down the road is 300? What does that mean? To make that measurement into something useful, you need to spend time and energy gathering the data, knowing where it is, processing it, and
putting it into a usable form. To most people this is daunting, because they don't have the right skill set, let alone know where to look. Even for people with the skill set, it's time-wasting. The reason I started this job is that my boss wanted to free up some of his time: when he's writing grants, he doesn't want to spend a day trying to plot air quality somewhere. He wants a quick, easy thing so he can do it in half a minute and get on with the rest of his day. And with this hurdle in the way, for most people it's too much. It's out of sight, out of mind: I don't want to bother with that.

So what we need is something to combine the data collection, the analysis, and then the visualization in a way people can understand. Ideally, a set of tools that are accessible and understandable by anybody, from school children to academics. That is a broad range of people, but it's an ambition. And so the solution is why we're all here: Python. We finally get on to Python.

First steps first: getting the data. Now, this has been not too bad. For this case study I'm using air quality data from Defra, which is a UK government department, currently owned by Michael Gove; I don't know whether that's good or bad. There are over 150 sites currently working in the UK, with maybe another 200 that were previously working and have been shut down for various reasons. Each of these takes hourly measurements of all sorts of pollutants, so there's an awful lot to deal with, especially since some have been going since 1975. So there are a lot of measurements, but in the grand scheme of things, all the measurements ever taken only add up to a few gigabytes. So it's not big data, but it's messy and annoying and hidden away. It's there, but, yeah, not many
people use it.

This is a little plot of where all the stations are, spread across the UK. The nearest one to us here is just by Arthur's Seat. This is a picture of the one in Edinburgh; it's the green box there. You'll be glad to know that Edinburgh is generally pretty good for air pollution, according to this. However, you've got to consider, as was alluded to in the keynote this morning, that this air quality station is right next to a park. It's set away from the road and it's not in a busy area at all. You start putting one of these by the road outside and you get a completely different picture.

So there are all these stations scattered around the UK and, annoyingly, Defra doesn't have a nice neat spreadsheet of where they all are, like a CSV file listing their coordinates. So I need to find every bit of information about these sites so I can start using them properly and usefully: coordinates, how long they've been going for, what pollutants they measure, codes like the European site codes, all that sort of thing.

And this is where Python finally comes in, with data scraping, which has been an incredibly useful tool. I've been using Beautiful Soup, which is a great module for parsing HTML. Basically you request a web page from Python using requests, put it into Beautiful Soup, and it parses it all out for you so you can search for the bits you want. So you say, all right, for this site in Aberdeen, give me all the bits of information you've got about it, and you get a nice table out. Defra have been great, and you could email them and ask for this sort of information, but this is a very quick method of getting a lot of the information you need. On the website each one of these sites has its own web page, so you'd have to go on and look, go on and look, but you can do that in a loop, no problem; just flick through them all.

So now I've got all these sites, I need to get the pollution
data from them. Again, this is another thing that is not made easy by the government. Every site has its data available as a CSV, so you can just go to a particular URL and it's there. However, you need to know that URL, and that's not readily available. I managed to get it by finding someone else's code, someone who did some work with them a couple of years ago, going through their R code and finding the URL that they use. So it's a simple task if you know the URL. The problem is you need to know the site code and the year for each web page. So, for instance, for Edinburgh the site code is ED3, and if you want 2018 you'd have to use those.

The data isn't in any particularly useful structure. You want data from 2018 in Edinburgh? Great, it's all there. You want specifically carbon monoxide from the past five years from Edinburgh, Aberdeen and Glasgow? You're talking about 15 web pages there, each with its own information, a lot of which is useless to you because you only want carbon monoxide, say, not nitrogen or whatever. But it's there and it's available, and that's good: we have some data to play with.

So the next step is analysis, which is the fun bit that I enjoy, and of course I use pandas. However, I am ashamed to say I came to pandas quite late in the game. I was quite stubborn: everything I did just used NumPy, and that worked, so why bother changing anything? But then a quick Google of 'I want to read this CSV off a web page, what's a quick way of doing that?' Pandas. Oh, I'll try that. One line. Oh, that was easy. Oh, that's a nice data frame with a time series. Oh, this is really nice. I wish I'd spent a couple of hours, a couple of years ago, teaching myself pandas; I don't even want to think about how much time it would have saved me. So it's a bit of a lesson in not being so stubborn in your code. Things like filtering or resampling are such powerful tools in pandas, and it makes them so quick; it's fantastic. And there are
also great tutorials and documentation out there. Stack Overflow, which I basically owe my PhD to, is full of pandas. You want to do anything: pandas, pandas, pandas.

So we've got this data in pandas and we can do all sorts to it. I guess the next step is we want to visualize it, and I use Plotly. Now, for a long time I used Matplotlib, which has been great, but then I discovered Plotly, and this provides very simple ways of doing it. This is a very small snippet of code that'll make that graph on the side. That's all you need, and it'll make you an interactive plot that has features like hover and zoom, and you can change colors really easily. If you're thinking about interacting with people, then giving them something they can change, not manipulating the data, but choosing how they want to see it, becomes a lot more interactive and a lot more personal than just having a static graph that shows whatever you want to show. And it's great because it makes things incredibly simple. What these three are isn't important, but you can make very simple plots: subplots, bar plots, wind roses. There are a few modules out there now that do wind roses, but Plotly just seems miles above the rest.

So far I'm in comfortable territory. This is what I've done for the best part of six years, I'd say: some sort of data analysis for something. However, the next step for me is putting it online, and this is going very much into the unknown. Doing something like this really highlights how you can think you know Python and then find you really don't.

After a little Google and a search around, I went with Django. It seemed like a good framework. It's a huge framework with lots of documentation and lots of tutorials, which is great, but also a little daunting for somebody who's never used it before; it's like, oh God. But as I say, there are lots of tutorials, and I know it's not aimed at
me, but I found that the Django Girls tutorial on how to set up a website using Django is great for anybody starting off with this. I know there are other frameworks out there, and a lot of this was very much in an 'I'll try it, I'll see' light: I don't know if this is the right thing for me, but I'm going to go with it and see how far it goes. Django is very popular, with lots of documentation and tutorials, but as far as I could tell it's not really designed for the sort of website I want to make. It's mainly focused on blogs and that sort of thing. But give it a go.

I might be preaching to the choir here, but basically Django will create a lot of files for you from a template. These files include things like urls.py, which is a list of the URLs you want your website to respond to, so your actual website name goes in there. That then calls views.py, which processes things and renders web pages, and then there's models.py. It just sets all this out for you, which for a beginner in Pythonic websites is ideal. It's click and play, basically, and then you can start spending the next couple of weeks breaking it day after day.

So, again: you type in your website address (this is the name of the website, by the way) and that heads to urls.py, which says to views.py, hey, this person wants to visit this page, do something. That then says to the other module, models.py, they want something from this website: process some data, get something from a database, do something. That says, fine, here you go, back to views.py, which renders it with some HTML and CSS and makes you a pretty website. And hey presto, the website is born.

Now, this is a very simple static website at the moment, with some buttons on it you can click, but it was fairly simple to do, and the amount of Python tutorials out there really helps with that. But as I say, Django is a great framework, but for what I
wanted to do, which was to have lots of people interact with this website, change graphs and play with them, I didn't find it the easiest thing for creating multiple instances or interactive pages, especially without reading scary words like JavaScript.

So I discovered that Plotly, who do the nice graphs, also do something called Dash, which is another framework. This is taken straight from their website: it says you can build analytical web applications with no JavaScript required, so that's two thumbs up from me. It's built on Plotly.js, React and Flask, and it ties them together so you can have interactive things like dropdowns, sliders and graphs. Whack that together with your analytical code and you can make something that looks good pretty easily, and I thought, this is ideal.

Dash creates these apps, which could be standalone websites by themselves. In my case it's not, and I'll explain a little bit later. Every time the website is loaded, a new app instance is created, so you get one per user. They do what they want and it doesn't affect anybody else. Each app has a layout, because basically in Python you say: I want this, then this; I want a drop-down menu, then some descriptive statistics, a plot, a selection menu you can click, all of that. Then you click them and it calls these callbacks, which are Python functions with decorators. So you click this, and the decorator goes, oh, someone's said they want this colour bar to be yellow; it changes this, sends it back and updates the page. And it's brilliant.

So with a bit of wrangling I managed to put my Dash app in my original Django framework, with a lot of help from people on forums. The Django framework sort of holds everything together and the Dash app sits inside that. And that was all the hard work, basically: getting the data, processing it, displaying it.

So, quickly on to the website, if it wants to... Yeah, great. I'll just type it in, maybe. So you end up with something that's not showing on this screen. That's great.
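
The callback idea described in the talk (a decorator registers a Python function, and the framework reruns it when a page control changes) can be sketched with the standard library alone. This is only a rough illustration of the pattern, not Dash's implementation or API: `MiniApp`, `callback`, `set_value` and the element ids below are all hypothetical names made up for the sketch. In Dash itself the corresponding decorator is `@app.callback`, which takes `Output` and `Input` objects.

```python
from typing import Callable, Dict, List, Tuple


class MiniApp:
    """Toy stand-in for an app that reruns functions when a control changes.

    Hypothetical illustration only; this is not Dash's API.
    """

    def __init__(self) -> None:
        # Current value of every page element, keyed by element id.
        self.state: Dict[str, object] = {}
        # Maps an input id to the (output id, function) pairs that depend on it.
        self._callbacks: Dict[str, List[Tuple[str, Callable]]] = {}

    def callback(self, output_id: str, input_id: str) -> Callable:
        """Decorator: register a function that recomputes output_id from input_id."""
        def register(func: Callable) -> Callable:
            self._callbacks.setdefault(input_id, []).append((output_id, func))
            return func  # the decorated function itself is left unchanged
        return register

    def set_value(self, element_id: str, value: object) -> None:
        """Simulate a user changing a control, then fire the dependent callbacks."""
        self.state[element_id] = value
        for output_id, func in self._callbacks.get(element_id, []):
            self.state[output_id] = func(value)


app = MiniApp()


@app.callback(output_id="plot-title", input_id="resample-dropdown")
def update_title(frequency: str) -> str:
    # In a real app this is where the data would be resampled and replotted.
    return f"Nitrogen dioxide, {frequency} mean"


# Simulate the user picking "weekly" from the drop-down menu.
app.set_value("resample-dropdown", "weekly")
print(app.state["plot-title"])  # prints "Nitrogen dioxide, weekly mean"
```

The point of the decorator is that the analysis function stays an ordinary Python function; the wiring between a page control and that function lives in one line above it.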
So this is a simple website. It's not that pretty to look at; it's a work in progress. But Dash is providing all of this. So this is a selection tool to get the data you might want. We can define it by region, let's say Central Scotland since we're here, and I want all urban sites. It's thinking. This is using quite a cheap server at the moment, so you have to wait. Eventually, when that's finished spinning around, it selects all the sites in Central Scotland that are counted as urban. So you can click Edinburgh St Leonards, which is the nearest one. We'll stick with a time series, select any of the variables it measures, let's look at nitrogen dioxide, and click submit.

Now this is calling the data and plotting it up. And this is what's good about Dash: you can hover over the data to get different points, you can zoom in to look at more points, you can download it if you want. Let's reset the axes. With Dash you can also make these interactive things. You click 'weekly', and this is saying, all right, someone wants to resample this data every week, and there's a pandas method that easily does that: just resample by week. Or you might want to stick it in a line graph instead. And you can just add more plots onto these. So we have, say, a histogram, where you can change the number of bins to 50 if you'd like. This is an example of the average concentration over one day, so it's taken the whole time series and averaged it into a single day. We can split that into weekdays, Monday, Tuesday, et cetera, and you can see peaks at rush hour. So there are all these really useful tools, and it makes a nice website you can play with.

The problem is there's too much data, really. It's time to use a database. Previously, that website was calling the Defra website every time someone put in a request [inaudible], and, as I said before, with the way they've structured their data this is
just not feasible. It's calling it every time. That's fine for a small amount of data, it won't really take more than a second maybe, but as soon as you start getting decent amounts it takes a very long time, and eventually it's going to crash. So better data management is needed, and that's where Django comes back into its own again. Using Django, it's really simple to integrate a SQL database. I basically just copied all the data Defra had and whacked it into this database, which Django now calls, and it leaves Defra alone. Sort of, because it needs constant updates: Defra updates every day, so I just have a worker process in the background saying, oh, it's morning, go collect some new data and bring that down. And now any combination of millions of data points is available. You want every three o'clock on a Wednesday outside Aberdeen? Brilliant, it'll do it for you, no problem.

So that's where it's at at the moment. It is still early days, but it's been a good learning curve, and there are developments I'd like to make. There are many, many bug fixes that need to be done. It's quite easy to go on that web page and break it. It doesn't work on a mobile, for instance, and it doesn't really work on Internet Explorer. But, you know, it's a work in progress.

I'd like to integrate more data with it, so more stations: a lot of European stations and a lot of council stations. This picture here is a new sensor for CO2, even though I said I wasn't talking about it, but we could include it, on top of Blackford Hill at the observatory there, so that data could be available soon. I'm also talking about satellite data and models, although then you're going from gigabytes for entire decades to terabytes per day, so your data management starts getting a lot more, er, interesting, I suppose. And also I'd like to get more feedback from users, you know, the people who are using this: is it actually useful? I've made some plots, and I've shown you simple ones, but what would be really useful? You know, comparisons against
different things. So that's where we're at at the moment.

I'd like to finish with lessons I've learned from doing this and from going into the unknown. The first one is: just jump in. I spent a long time being like, no, that doesn't quite fit what I'm doing. But you'll never find the perfect tutorial, and it's better to start with something very imperfect and build it up than to waste your time trying to find something better.

In that sense, be adaptable. I started with Django; it didn't quite do what I wanted. I went with Dash, I looked at some other things, I went back to Django. There's no point belligerently sticking with things. And so don't be scared to make the wrong choices. I started a lot of websites and thought, this is just not right, it's not what I wanted [inaudible].

But take your time to learn new things. In my case pandas is what I wish I'd learned sooner. I suppose that applies in any walk of life, not just Python.

Don't get bogged down by the little things. With writing this website, I found it a lot better to quickly do something that makes you feel like you've achieved a lot, and then you can say, all right, I'll play with the colors of the bar plot later, or the spacing; that doesn't really matter right now. What I want is to get something going and get excited about it. But with that in mind, keep an eye on what you're trying to do, because you end up looking at these small things again and asking, well, what is it I'm trying to do? Is spending a week going over this bit of code that resamples actually useful for anyone, or is it just something I want to do?

Also, don't reinvent the wheel. This might be a case for academics especially, because I know people, myself included, are always hesitant to use other people's code. It's always a bit scary putting your faith in results from something that
is almost a black box. You put this data into a Python module or a website and it comes out the other side; what's it going to show? If you know what it does step by step, that's good, but you're going to waste a lot of time doing that, and at some point you've got to trust people. You can't redo everything.

And lastly, go for a walk. Take time. I've found if I get stuck, that's the best way to deal with it, really: just go out. And especially with air quality, if I go outside and it smells, I think, I'd better do something about that. So that's me. Thanks for listening.

More tech problems, isn't it? I've not planned on being, but I'm living in Edinburgh so I could be. I don't know, I've not looked at what the sprints are. I could just repeat the question back, yeah: he asked if I was at the sprints.

Yeah. So that question, if you didn't hear, was about using data from the Scottish Government, and in Edinburgh Friends of the Earth have lots as well. Yeah, there's so much data out there to be used. Defra is just a starting block, but I have a few friends who work at the Scottish Environment Protection Agency specifically looking at air quality, and they have a lot more stations available, so I'd love to use it. The more the better in terms of educating. As I say with Edinburgh, you go on the Defra website and look, and it's going to show you it's fine, but sometimes it's not. Over the past few years, lots of the local monitoring stations in Scotland have been placed along roads, and Scotland's been breaking legal limits along these. So this is the motivation behind trying to better monitor, better check these results and visualize them. So I'll speak to you later, but, yeah, the more data the better [inaudible]. Thanks.

Thanks for a great talk. If you're using data from various different sources, like you were just saying, do you think that what you collect from different types
of stations will be comparable with each other? Will there be sort of technical variations in those?

There is a problem with that. Whether things are directly comparable, people argue about. Different types of instruments might have different calibrations, and, as you saw in the example on the website briefly, you can select by different environment. So Edinburgh is considered urban background, but you might get urban traffic, which is on a road. With these things you can say one's more so than the other, but they're not directly comparable. You can't do it easily, but it's doable; there are ways around it. But, yeah, it's not just this number versus this number, basically.

There was another... Yeah, this is the last question.

It is open source, and yes, I would accept a pull request. It is a mess right now.

Last quick question, and this is a question from friends of friends, so this is a real case. The parents at a primary school are convinced that the air quality around the school is bad for the children, but they don't have a way of convincing the local authority, the local council, that the air quality really is that bad. So do you have any suggestions, or any toolkits, that a citizen or the parents of the school can use to collect data and then convince the authorities that this is a problem?

It's difficult, because there are a lot of people who think that, and I would argue rightly so. There are lots of groups, I know from the university side, and I imagine there are from the commercial side as well, that are actually looking for ways to test their instruments and gather data. There was one recently from the University of Birmingham, where they did a study around schools within Birmingham. They brought in some monitoring stations, and it didn't cost the school anything, didn't cost the parents anything; it was a research project done by Birmingham. But then they fed
it into the community and got the community involved in showing the results, and those results actually led to a no-traffic zone being put around their local school. So they are out there. Unfortunately, I don't do anything directly measurement-wise, but I could write down a few places you might be able to look afterwards.

Okay, that's all the time we have [inaudible], so let's thank the speaker.