All right, well thank you very much for giving me the opportunity to talk remotely today. It's really nice to have that option, especially when people aren't able to attend in person. I wasn't able to be there myself, so apologies for that. Right, so as people said, I'm Rachel Tunnicliffe, from the University of Bristol, but of course we work with lots of different collaborators, as many of these projects do. I'm going to talk today about OpenGHG, which is a community platform for greenhouse gas data analysis and data science. First, a bit of background on the motivation for this project, and what we use greenhouse gas measurements and data for. The most obvious is a plot like this, which you might have seen before, showing the rise in carbon dioxide in parts per million. This is from the Mauna Loa Observatory, part of the NOAA Global Monitoring Laboratory, and you can see a very steady, perhaps even accelerating, rise over the last few years. So we measure greenhouse gas data: we want to look at it, see what it's doing, and look at the global averages associated with this information. But we can also use the measurements we've been making to try and understand more, and I'll talk a little about the kinds of models we need to run as part of that. We want to understand things like natural fluxes: fluxes that we could be influencing through man-made activity, but that are also responding to temperature changes. It's really important to understand the feedback mechanisms there. And we also want to think about how we can characterize our anthropogenic, that is man-made, emissions.
So there are various methods we can use to characterize the amounts of emissions we think we're producing. For instance, you can use inventory methods, which is what tends to be reported to the UNFCCC. But as well as that, you can use measurements to try and work out what these emissions are, and for that you have to run some models, which I'll talk about in a bit. So evaluating national emissions reports is a really important thing we can do when we're measuring greenhouse gases. One of the challenges associated with this, and it's a good challenge to have in a sense, is that we have a range of independent networks and projects. I've included a few examples here. We've got national-scale networks; we've got international scales, like the ICOS network across Europe; and we've also got global-scale networks. On top of that, there are different types of measurements. Not just measurements from tall tower stations or flask measurements: we've also got things like satellite measurements. As we've heard in talks today about remote sensing and Earth observation, you've got that in the world of greenhouse gases as well. It gives you a different type of measurement, because when you're measuring on the ground you're measuring what's coming into your sensor, whereas when you're measuring from a satellite you're measuring a whole column of air. So you need additional models and information about how to tie those back together and make comparisons. And you can see this gives you a wide range of different spatial and temporal scales.
And then finally, there's the idea that in order to interpret this information, to get those top-down estimates I mentioned, we can run these sorts of models. What this is showing is what's called an air history map. The idea is that it shows you where the air came from before being measured. Our measurement sites are the blue points here, and this is going backwards in time: it's telling us where that air travelled from, because that tells us where these gases are being emitted, which is really what we want to know. If we know our emissions, we can work on mitigation and try to improve our emissions going forward. Okay, so I've used the phrase "top-down" a few times. We have these two different ways of characterizing emissions. One is the inventory method, the bottom-up approach, where you do your counting: you look at the various sources and decide what you think goes into your national totals. But there's also the idea of using measurements to inform emissions estimates. What you can do is take your estimated emissions and create these air history maps, the same as the video I just showed on the previous slide, and use those to create modelled-measurement comparisons. You can say: if these were the true emissions, how well does my model compare to the measurements? And one of the things we do is run that information through various approaches, such as Bayesian inversion methods, to try and see if we can improve upon it.
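To make the top-down idea concrete, here is a minimal sketch of that kind of Bayesian update, in Python with NumPy. All the numbers, the toy footprint matrix, and the prior and measurement uncertainties are invented purely for illustration; this is not OpenGHG code or the actual inversion setup used in the talk.

```python
import numpy as np

# Toy analytical Bayesian inversion (all numbers illustrative).
# y = H @ x + error, where H plays the role of the "air history"
# (footprint) sensitivity linking emissions x to measurements y.
H = np.array([[1.0, 0.5],
              [0.2, 1.0],
              [0.8, 0.8]])          # 3 measurements, 2 emission regions
x_prior = np.array([10.0, 5.0])     # prior (e.g. inventory) emissions
B = np.diag([4.0, 4.0])             # prior uncertainty covariance
R = np.eye(3) * 0.25                # measurement uncertainty covariance
x_true = np.array([12.0, 3.0])      # "true" emissions used to simulate data
y = H @ x_true                      # simulated measurements (noise-free here)

# Gaussian posterior mean: x_post = x_prior + K (y - H x_prior),
# with gain K = B H^T (H B H^T + R)^-1
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
x_post = x_prior + K @ (y - H @ x_prior)

# The posterior should fit the measurements better than the prior did
prior_misfit = np.linalg.norm(y - H @ x_prior)
post_misfit = np.linalg.norm(y - H @ x_post)
print(x_post, prior_misfit, post_misfit)
```

Real inversions work on high-dimensional emission fields with carefully constructed covariances, but the improve-the-prior-using-the-measurements step has this same shape.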
Can we improve these emissions estimates so as to improve the comparison between the models and the measurements? So, to characterize the typical workflow we go through when putting together these top-down emissions estimates: we go through a data acquisition step (I've talked about the networks we have and the satellite data); we go through modelling steps, which is what I've been describing with these air history maps. In our group we use a model produced by the Met Office called NAME, but other groups use other models as well, so there's a question of how you do that and how you compare the outputs. There's also the data-model comparison, where you need to do things like correcting for drift, and if you're using multiple sources of data, checking how they're calibrated against each other and whether they're on the same scale. Especially with satellite and ground-based data, we need to be very careful about how we compare the two. And then at the end you come out with an inverse flux estimate: you pull all these things together to produce that output. I've highlighted the considerations and challenges that come with this: the need for easy visualization; lots of memory-intensive steps; how to standardize these diverse datasets; how to handle these different inputs. Largely this has been done with lots of manual steps, as many of these things are. That's how science works: sometimes you do a proof of concept and think about how to automate it later. And it's largely done on a case-by-case basis. So we'll look at a specific area.
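As a small illustration of the drift-correction step mentioned above, one common approach is to fit the instrument-minus-reference difference with a simple trend and subtract it. The series, numbers, and purely linear drift below are synthetic; real corrections are considerably more involved.

```python
import numpy as np

# Minimal drift-correction sketch (illustrative numbers, not real data).
# Suppose an instrument slowly drifts relative to a calibrated reference:
t = np.arange(10.0)                       # time (e.g. days)
reference = 400.0 + 0.1 * t               # reference CO2 series (ppm)
drift = 0.05 * t                          # linear instrument drift
measured = reference + drift              # drifting instrument readings

# Fit the measured-minus-reference difference with a straight line,
# then subtract the fitted drift from the measurements.
slope, intercept = np.polyfit(t, measured - reference, 1)
corrected = measured - (slope * t + intercept)

print(np.abs(corrected - reference).max())
```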
We'll use the methodology to come up with some emissions estimates, and then our work will move on, essentially. We're not continuing to generate that output on an ongoing basis, in the way you might want with this sort of work. So what we've done is try to think of a good way to deal with these challenges and pull this methodology together, and that's the basis behind OpenGHG. I've characterized this in two parts. First, standardization: pulling together these different data sources, with prior knowledge of how to interpret the information. As I said, there are global networks, regional networks, and different standards associated with those. At Bristol we also run the DECC network, which is the UK network, so there are internal standards there too. We want to be able to pull together different data formats. Some of these products are also made available in really nice, accessible ways, like the CEDA archive and the ICOS Carbon Portal, and we've been building tools to pull and interpret information from those as well. So we pull from these external archives rather than storing the data ourselves: the purpose is not for us to store the data, but to allow people to bring this data into comparison. And then, as well as the measurements, there are the additional products I've been talking about: things like air history maps, emissions inventories, and maps of things like the global scale of emissions. Being able to pull that information together is another really key part of what we've been doing. How do you connect these pieces of information to ultimately build a pipeline for producing these estimates?
And the way we've built this tool is very much based on open source. The previous talk mentioned Jupyter notebooks, and that's something we have at the heart of what we're doing as well; reproducible science is obviously really important. So we've pulled together lots of open source tools, and this is an idea of how we characterize that. It's a Python-based library, so there's open access to all of this, and it's fully open and accessible: if you wanted to, you could go and have a look at it on GitHub, where all the code is available. We have been making releases as well, so there's a version you can download: if you happen to be using Python, you can install it with pip or conda, for example. We've also built this to be extensible: we want it to be something the community can contribute to. There's the idea of standardization, where we take these individual data sources and add them into our object store (I'll explain what that means in a minute), and also the idea of transformation, where we might have various databases or formats we want to interpret information from and then feed into the overall system. Sorry, there's a question coming through on the chat, but I'll carry on for a moment. So, going a little into the structure of the way we've designed OpenGHG, to emphasize the points I've been making: as I said, you can go and look at this yourself if you're interested in how we've built it. The idea is that we have this front end here, which handles how we standardize these diverse data formats and store the data.
That all gets put into what's called an object store. Essentially that's a database, but one of the really nice things about an object store, and this is a cloud-based technology we've adopted here, is that it's not based on a folder structure. It's very much based on keywords and searchability. When you add data, it's stored in a relatively flat way (there's a little bit of hierarchy in there), but everything is tagged, and tagged in a relatively standardized way. That means that when you search the database, you can find things that are connected. That's also a really important part of how we're building the aggregation idea: to connect these different things together, we need overlapping keywords to understand how to group them. So the idea is that you can retrieve data in this fully searchable way. Then there's everything around the front end: we can retrieve, compare, aggregate, analyze, and create interactive visualizations, which I'll show on the next slide. There are plotting tools as part of that, which we're continuing to develop as we go forward. And there's the aggregation, the pulling together of different pieces of information, so we can make the comparisons we need for the sort of science we're interested in. So this is a quick demo. This is actually a Jupyter notebook, to come back to what's been discussed before, and it's demoing how we would compare data from different global measurement networks: one of them we've loaded into our database, and another we've pulled from an external archive.
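The flat, keyword-tagged storage idea can be sketched in a few lines of plain Python. This is a toy illustration of the concept, not the real OpenGHG object store or its API; the species, site, and network tags below are just example metadata.

```python
# Toy sketch of a "flat, keyword-tagged" store: every dataset is held with
# a metadata dictionary, and retrieval is a search over those tags rather
# than a walk through a folder hierarchy.
store = [
    {"species": "co2", "site": "MHD", "network": "AGAGE", "data": [410.1, 410.3]},
    {"species": "ch4", "site": "MHD", "network": "AGAGE", "data": [1900.2, 1901.0]},
    {"species": "co2", "site": "TAC", "network": "DECC",  "data": [411.0, 411.2]},
]

def search(**tags):
    """Return every stored dataset whose metadata matches all given tags."""
    return [d for d in store if all(d.get(k) == v for k, v in tags.items())]

co2_results = search(species="co2")    # all CO2 data, any site or network
mhd_results = search(site="MHD")       # everything measured at one site
print(len(co2_results), len(mhd_results))
```

Because everything is found by shared tags, grouping related datasets for aggregation falls out of the same search mechanism.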
Here you can see a lot of the nitty-gritty of the metadata associated with this data: what it is, what it means, what scale it's on, who the data owner is, and so forth. And then you can see that we can plot it up and make comparisons between these different sorts of data, which is often something people want to do. This is all done through something called Plotly, a nice interactive toolkit that integrates really well with Python. You can zoom in, play around with the data, and export images as well. So for that individual science, people can go back and have a look at this, for instance. And then another quick demo, to emphasize the aggregation idea. Here we've got some methane measurements. Methane is a really key and important greenhouse gas that's been in the news a lot recently; it was a focus of one of the recent COP meetings. This was looking at methane measurements and comparing them to potential waste emissions, waste being one of the sectors: landfill and so forth. We take the predicted emissions, take the measurements, and compare them. You can see there's an offset between the two, because the waste sector doesn't account for everything that adds up to the total methane, but you can see roughly what the contribution would be for the things we were interested in. And one more thing, since I mentioned COP on the previous slide: here we've got an idea of how we have used OpenGHG for outreach, and will continue to do so. This is something we put together for COP26, which was in Glasgow. It was great.
A bunch of people came together and put some sensors around Glasgow, and what we did was create an interface that made the live data available, so anyone attending, or anyone interested, could come and see how these different points around Glasgow were changing day to day. This is the carbon dioxide. The idea is that we can obviously modify this for other projects, and we're very interested in doing so for a lot of the networks we have as well. This was built with OpenGHG at the back end. And just to show some of the spin-off science we've managed to get as part of this product: this is a paper published this year by Elena Fillola, a PhD student in our group. Really great work she's done on using machine learning tools to improve the computational side of what we do. I talked about how there are lots of memory-intensive steps in this pipeline, and one of those intensive steps is creating these air history maps. They're really important for the work we do, because they give us the connection between the measurements we make and the potential emissions. But it can be very computationally expensive. As we get more data, especially satellite data, that's great, but it also means that if we're running the model for every single point, we quickly run into big computational challenges, or have to downsample the data, which we don't want to do. So the idea of this project was to see if we could use machine learning tools, using a kind of interpolation.
So we would still run our model, but we'd include some meteorological parameters and train the machine learning tool on a subset of the data we had run, in order to produce predicted air history maps. And obviously we wanted to make sure we weren't losing too much by doing that. So one of the things Elena put together was this comparison of the emulator-generated footprint, as we call it (this air history map), versus the true NAME-generated footprint, and you can see it does a relatively good job. I think this is a really nice piece of work to highlight, and it's obviously something we want to integrate into the tool: she used it for this study, but we can do more with it as well. The last thing I wanted to mention is what we're going to be doing with OpenGHG going forward. There's a really nice project launching at the moment where OpenGHG is going to be used within an automated pipeline. I'm sharing a few images here that you've seen throughout the presentation, so hopefully there's some familiarity there. The idea is that we want to take this entire pipeline I've been describing, creating these top-down emissions estimates to compare against national inventories, and do it in an automated way. I mentioned how we currently do this on a case-by-case basis: we tend to run it once, and lots of manual steps are required if we want to spin it back up again. What we want to do is take all these steps, with OpenGHG at the core of it, to let us produce these emissions maps and outputs on a regular basis. We're still designing what that's actually going to look like, and thinking about the various inputs for it.
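The train-on-a-subset, predict-the-rest pattern behind the footprint emulator can be sketched as below. The real work used proper machine learning models trained on meteorology to predict NAME footprints; here an ordinary least-squares fit on synthetic data stands in, purely to show the shape of the approach.

```python
import numpy as np

# Toy stand-in for footprint emulation: instead of re-running the dispersion
# model for every data point, fit a cheap regression from meteorological
# inputs to footprint values on a training subset, then predict the rest.
rng = np.random.default_rng(0)
met = rng.uniform(size=(200, 3))          # e.g. wind speed/direction, stability
true_weights = np.array([2.0, -1.0, 0.5])
footprint = met @ true_weights            # pretend footprint sensitivity values

train, test = slice(0, 150), slice(150, 200)
# Fit on the subset we "ran the model" for ...
weights, *_ = np.linalg.lstsq(met[train], footprint[train], rcond=None)
# ... then emulate the footprints we never ran
predicted = met[test] @ weights

print(np.abs(predicted - footprint[test]).max())
```

The payoff is the same as in the real project: the expensive model runs only on a subset, and the cheap emulator covers the rest, with an accuracy check against held-out true runs.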
We need to think a lot about how we can automate these manual steps and make sure we're still producing high-quality output. There's lots to think about with that: we need to make sure we're integrating different sorts of models, and integrating lots of different data. But the idea is that overall, the product we've produced is going to be very much at the heart of this. It will give us the searchable database, the aggregation of the data, everything we need to go through the whole pipeline and produce this comparison. And that's everything from me, so thanks very much for listening. Shall I stop sharing now? I'll read out the questions.

Do we have any questions on the chat? No? Any questions from the floor? Anybody?

Okay, I have a couple of questions around measuring the greenhouse gases [partly inaudible], on both points, for national and international reporting. And the second question is about the mention of international organizations that are measuring greenhouse gases for nations. I want to ask: if you are using your model, or your data system, with campaigns, for example [inaudible] or the GHG Protocol, and at the end of the day you find your dataset has some outliers, how are you going to handle that?

So I think, Rachel, the first of those was: how are you going to use this tool to measure greenhouse gas fluxes under different conditions. Is that right? Yes. And the second was: how are you going to deal with discrepancies between different measurements from different sources? Okay. Yeah.
So I think one thing to highlight with this tool is where it sits in that pipeline we've been talking about. We do work quite closely with people who make measurements, or at least one of the communities making measurements, specifically in the UK, and they want to be able to use the tool to feed in their raw data and go through their calibration levels, to understand what the data looks like and check that it looks sensible. But for a lot of the networks we deal with, we sit after that stage: when you're measuring from different inputs, the standardization happens before the data comes to us. We generally deal with Level 2 products, so Level 2 products from satellite data, and validated data in a lot of ways as well. In a sense it depends on how you want to use the tool, because we're not going to make prescriptive decisions about whether your data looks right; you have to decide whether your data is doing the right thing. We just want to provide the tools for you to make those comparisons and check that things make sense. There's lots of calibration that goes on: in the DECC network particularly, they do lots of work on calibration; they scour the data and make sure it looks sensible. And in terms of different scales, these things are known: the calibration scales are known in advance. So if you want to compare different datasets, then as long as we know the calibration scales, it's easy enough to make them comparable with each other.
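Putting two datasets onto a common calibration scale often comes down to applying a known conversion between scales. A minimal sketch follows; the scale names and the multiplicative factor are made up for illustration (real conversions are published by the calibration laboratories).

```python
# Sketch of putting two datasets on a common calibration scale.
# Scales are often related by a known multiplicative factor; the factor
# below is invented for illustration, not a real published conversion.
SCALE_FACTORS = {("SCALE_A", "SCALE_B"): 1.0003}

def convert_scale(values, from_scale, to_scale):
    """Convert measurements between calibration scales via a known ratio."""
    if from_scale == to_scale:
        return list(values)
    factor = SCALE_FACTORS[(from_scale, to_scale)]
    return [v * factor for v in values]

ch4_a = [1900.0, 1905.0]                      # methane values on SCALE_A
ch4_b = convert_scale(ch4_a, "SCALE_A", "SCALE_B")
print(ch4_b)
```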
And when it comes to comparing against column data, from remote sensing for instance, this is why we want to run these air history maps: we can look at the measurements knowing there might be some sort of bias between them, because that's a known issue. There's been a lot of work on understanding the differences between the way things are measured on the ground and the way satellite data is measured. But when we're running the models, when we're estimating emissions, we can do things like allow for a bias parameter between the two, which can be solved for as part of the inversion. We can then use that to understand what differences there might be. That would also characterize differences in the models: not just a genuine bias in the data, but a potential bias in the way the model has been run for these two different types of inputs. So I think the tool eventually sits at the heart of all of those things, but there obviously has to be input from other people to make them work.

Your first question was for Rachel, yes? Okay. So, Rachel, we had a workshop on data ownership and recognition earlier. When federating lots of datasets, how do you ensure they get properly cited?

It's a really good question. I think one of the things I wanted to emphasize is that we're not trying to act as a data store. We're not trying to be the one-stop shop that everyone goes to in order to get the data. So in terms of citations, when you're pulling from these external archives, all of that information is going to be included; ICOS, for instance, is really good with this sort of thing.
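Solving for a measurement bias inside the inversion can be done by adding one element to the state vector and a matching column to the sensitivity matrix. Here is a toy sketch with invented numbers, using a plain least-squares solve in place of a full Bayesian inversion.

```python
import numpy as np

# Sketch of solving for a satellite/ground bias inside the inversion:
# augment the state vector with one extra "bias" element and give the
# corresponding column of H a 1 for every satellite measurement.
H = np.array([[1.0, 0.5],
              [0.2, 1.0],
              [0.8, 0.8],
              [0.5, 0.5]])
is_satellite = np.array([0.0, 0.0, 1.0, 1.0])   # last two are column retrievals
H_aug = np.column_stack([H, is_satellite])      # extra column absorbs the bias

x_true, bias_true = np.array([12.0, 3.0]), 2.5
y = H @ x_true + bias_true * is_satellite       # satellite points are offset

solution, *_ = np.linalg.lstsq(H_aug, y, rcond=None)
emissions, bias = solution[:2], solution[2]
print(emissions, bias)
```

Recovering the bias jointly with the emissions, rather than correcting it beforehand, means any model-side offset between the two measurement types is characterized at the same time.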
So when we're pulling information from ICOS, all the information about how to cite the data will be there. Similarly, the NOAA network has what's called an ObsPack that they release, and again there's really good metadata in there. So more than anything, we'd say: if you want to make sure you're citing data properly, go back to the original source. Don't use us as the place to establish how and where the data came from; use us as a way to look at and visualize the data, but always make sure you're citing it from the correct external resource, or the paper associated with that data. As much as possible, we try to make sure that metadata is retained and that information is always stored, so that when you grab these datasets, they'll carry information about where the data came from. But I would definitely say people need to make sure they're doing this properly and citing the data from the correct places.

There's a question from Lou Dowrick: thank you for a great talk; do you employ common metadata, for example community-recognized vocabularies, to promote interoperability?

Yes, that's a really good question. One of the things we've really been trying to do is adhere to the CF (Climate and Forecast) conventions. We're going through that at the moment, trying to make sure it's applied correctly in lots of places, because there's lots of detail around the terminology, the way the attributes are stored, and so on. So yes, we very much do try to adhere to those standards, and in a lot of the projects we work on, we work with other communities who are really big on making sure these standards are correct.
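A very small example of the kind of metadata check this implies: verifying that a variable's attributes carry the fields a CF-style convention expects. The required-attribute list here is deliberately simplified for illustration and is not the full CF conventions.

```python
# Toy check that a dataset's attributes carry the metadata a CF-style
# convention expects (the required-key set is a simplified example only).
REQUIRED = {"standard_name", "units", "long_name"}

def missing_cf_attrs(attrs):
    """Return which expected CF-style attributes are absent, sorted."""
    return sorted(REQUIRED - attrs.keys())

good = {"standard_name": "mole_fraction_of_carbon_dioxide_in_air",
        "units": "ppm",
        "long_name": "CO2 mole fraction"}
bad = {"units": "ppm"}

print(missing_cf_attrs(good), missing_cf_attrs(bad))
```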
And so yes, it's definitely something we've thought a lot about, and it's at the heart of what we want to do, making sure we're adhering to this terminology. I can't guarantee we always get it 100% right, but we're definitely trying.

Okay, another question, from Matt Bright: will you use cloud resources for people to run analysis, and if so, which platform will you use?

That's another really good question. When we initially started this project, we were hosting on the Oracle Cloud. That was for the original two years of funding, and that tenancy has ended, because these things tend to be bought up front. We now have a tenancy on the JASMIN cloud, which we're in the process of moving over to at the moment; that's taking a bit longer than we hoped. So in terms of hosting, the JASMIN cloud is where we want to be. In terms of actually running a lot of the tools, we need to talk to the JASMIN team and decide what's feasible, because the JASMIN cloud sits outside the main JASMIN infrastructure, where the batch computing sits. On the cloud resource itself there's limited ability to run the big, intensive jobs, but lots of things, visualizing data, looking at the data, doing this aggregation, are all totally feasible on a cloud platform. If you wanted to do some of the bigger modelling steps, then we'd probably think about linking up with HPC resources if possible. So JASMIN is going to be the home for this going forward, following the end of the Oracle Cloud tenancy. There is a slight downside, in that JASMIN tends to be for UK-focused collaborators, and obviously we'd like this to be something that's successful internationally.
So it's something we'll need to think about a little, whether we're limiting ourselves by hosting this on JASMIN. But I think that long term, hosting somewhere like JASMIN is better, as we've found, than a commercial cloud tenancy, so that seems like the best approach for us going forward.

I have a question, I think it's come from the chat. Yara asked: is it possible to use this tool to compare local areas in the UK? I guess there's a resolution question in there.

Yeah, for sure. It depends on what data you have available to you. If you're talking about local areas in the UK, there are some fantastic datasets that exist. Some of the people we work with at Cranfield have these beautiful datasets to do with land data, but that's primarily focused on certain areas of the country. So you'll get some areas that are really well mapped, like London, for example, but you won't necessarily have that really good mapping across the entirety of the UK. But we also have things like a model called UKGHG, which Pete Levy runs, at UKCEH I think. That's a really great model, trying to do this at higher temporal and spatial resolution over the UK specifically, and that's something we've been integrating as much as we can. At the moment it's an R-based model, still run externally from OpenGHG; we don't run that model ourselves. But if that model's output were accessible, so if you ran it, produced the output, and put that into the tool, then yes, absolutely, you could use it, if it's at a resolution you're interested in, to do things like comparing different areas: you could define the different areas you're interested in looking at.
We don't have the ability to do that out of the box at the moment, but that's why things are designed to be scalable and extensible.