Thanks very much. Okay, so data quality has been fairly important to me and my team in the work we do, so I'm just going to start off with some of the background, because I think it's all relevant. I mentioned my unit; this is us, the Habit Research Technology Unit. The team is about 20 to 25, depending on interns and things, and we've delivered quite a lot of solutions around research support, specifically in the health domain.

[Brief pause while screen sharing is sorted out.]

So, one of the things we do is we've developed a repository which has data for about 2 million patients in it. It's called Patron, under Data for Decisions, and of course, once you start looking after a large amount of data, things like quality become a challenge and a concern, because we want people to use the data for research. The slide's a little bit old, but we've got quite a lot of different research groups that actually use this data for research, and it's GP data, basically, on 2 million patients. A while back, we did a piece of research and we saw that in the general practice domain around Australia there were about 106 different datasets of this nature, some bigger than others and some better than others. But to be honest, we felt people's understanding of the word quality was poor to nonexistent, and the irony is that a lot of these resources are being used for research, yet people either don't really understand the quality of the data or they work from different definitions of quality.

So this got me interested in how I could change things at a national level, and there is a national organisation in the health space, the Australian Health Research Alliance, which represents all of the major universities, research institutes and so on, over 90% of all academic and research teams in healthcare, and most of the hospitals as well. (Marcelo, I wonder if you could put yourself on mute? Thank you.) Under the banner of the Australian Health Research Alliance, we were lucky enough to be able to form what is called the Transformational Data Collaboration, which I lead nationally. In the health domain, the collaboration has the goal of looking at data quality assessment around the kind of data repositories I'm talking to you about here today. But we have very related activities that are probably of interest to you here as well: terminologies, mappings and vocabularies are a big part of what we're doing, and we're just kicking off some work now with the ARDC around a terminology curation platform that we're working on with CSIRO. I see Melanie Barlow's here, so I'm sure Mark Milling knows about this one, and Catherine Brady.

The other thing I want to discuss is the OHDSI OMOP Common Data Model, which is a standard data model for health data for research.
The reason we're interested in this is that when you look into medical records, they sometimes have thousands of tables in them, and if you're trying to analyse the data for research, how do you do that? So this project that we're doing with OMOP is looking to convert different hospitals and areas, whole-of-Queensland public health, to a much simpler model that allows things to happen for research. We're also doing work on Victorian emergency admissions data and on our own Patron repository of two million patients.

When we started really getting into analysing the quality, we did some research, and in the health domain probably the most widely cited, most widely used framework for looking at quality is what's commonly known as the Kahn framework; this is the paper from 2017. That year, at the Australian Medical Informatics Association, it was recognised as the most important paper of the year because of the impact it has in terms of research, because it's finally someone looking at a framework for how we can assess quality properly. This is just a quick snapshot of part of the paper, where it's looking at different areas of verification and validation around different quality attributes.

The framework's been used in this common data model space, so I want to tell you a little bit more about this health data standard and the way we can create a common data model around this international consortium, Observational Health Data Sciences and Informatics, or OHDSI, whose common data model is known as OMOP. This is effectively what it means when you convert a health database into this OMOP representation: it converts thousands of tables into potentially 17 tables, which makes the data much easier to analyse. Along with that, you've got things like standardised vocabularies and concepts. It tries to really pack it down into something more understandable.

I'm just going to quickly show you the EHDEN portal to give you an idea of what this can do. In Europe, they have a whole list of places here, 101 places throughout Europe, and many millions of patient records all in the same format for research. This tool lets you analyse quite quickly information about these different health datasets so that you can determine which ones might work for you in your research programme. Here I can search through patients by country; I can even look at things like what diseases people have got. It's going to be a bit slow, so I might just leave it there on this particular thing, but essentially what you can do is analyse down at different levels in the data. Here we are, it's going faster now; the computer is a bit slow. So that's for chronic kidney disease. There we go. What this has immediately done is told me, out of all of these international datasets, which ones have got data on it, and it's listing them here, and I can then choose, analyse a little bit more data and decide who to collaborate with. So I'll leave the dashboard there, but effectively what it's meant for us is that we can quite quickly analyse our data of two million patient records, quite visually, through open-source tools, because it's in a standard format; we've not had to write this, it's just there for us.

Back to quality, though, because the problem is that it's one thing to have lots of databases, but the issue is: is it worth anything?
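Before turning to quality, it may help to make the common-format point concrete. Below is a minimal sketch of the kind of query that becomes possible once data is in the OMOP representation, for example asking which patients have a chronic kidney disease diagnosis. The table and column names (condition_occurrence, concept, condition_concept_id) are standard OMOP CDM names, but the connection details and search pattern are purely illustrative; this is not the EHDEN portal's own code.

```python
# Minimal sketch: a query expressed once that can run against any OMOP-shaped
# database, regardless of which source system the data originally came from.
import sqlite3  # stand-in; in practice the CDM usually lives in PostgreSQL, SQL Server, etc.

def count_patients_with_condition(conn, condition_name_pattern):
    """Count distinct patients whose condition's standard concept name matches a pattern."""
    sql = """
        SELECT COUNT(DISTINCT co.person_id)
        FROM condition_occurrence AS co
        JOIN concept AS c ON c.concept_id = co.condition_concept_id
        WHERE c.concept_name LIKE ?
    """
    (count,) = conn.execute(sql, (condition_name_pattern,)).fetchone()
    return count

# Illustrative usage (the database path is hypothetical):
# conn = sqlite3.connect("omop_cdm.sqlite")
# print(count_patients_with_condition(conn, "%chronic kidney disease%"))
```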
Now, because OMOP has managed to compress the data down into 17 tables across the whole health domain, the OHDSI consortium has actually created a tool that runs over 3,000 tests across your dataset to see if it actually conforms, so that other people can run similar queries on your dataset and reach comparable outcomes. You can see we have obviously got a few failures out of the many thousands of tests, but in most cases it's actually because it's data that's not of relevance to us and would not be uploaded into this format. This gives us a very good indication of how well our system is conforming, and hence how well people could actually use our data.

The problem, though, is that it's all very well showing data quality in a single system, but what it doesn't tell us is what mistakes we've made getting the data into that format, and this is really the crux of it and what we're going to look at now with our White Bandicoot system. So we actually looked at Kahn, and we looked at advancing the idea: how can we build tools to implement this, not on a common data model, but on any database? That's a much harder thing to achieve. This was just some of the work around the attributes we thought we had to look at in terms of that sort of data quality. We also wanted our tool to be able to inform people, when they write their research output, according to the standards that are expected in academic research outputs: can we actually help people understand the data and the quality attributes so that they can report them appropriately?

So, the sorts of mistakes we are seeing in a GP record. We've got an example from Medical Director here, and there's a reason button and a procedure button that GPs can click when you're sitting in front of your GP, and the problem is both of the screens look identical, so the GP sometimes makes a mistake, entering a reason as a procedure or a procedure as a reason. These are the sorts of things we can see. But also, if you just pull data out from the underlying data tables, it looks like here we've got a nausea diagnosis on the 22nd of November 2018, but in fact the visit date was recorded as the 21st, and you can actually see, when you look here at the bottom of the screen at what the GP wrote, that this nausea actually was on the 21st. So this is an example where, if you just mine the underlying data tables, you've really got to understand the relationship between tables and how things really work, and it's that level of detail we're really interested in finding out.

So with White Bandicoot we wanted to come up with a tool that was based on an international standard framework, that allowed us to look at data quality in the original tables, and that also allowed people to compare databases in terms of the output. Here, if you look at an original health system, an electronic medical record, it's got its underlying database, but in moving through research domains you often transform your data. So what I'm really interested in is what happens at each of these stages, and can we document the quality; that's effectively what we're trying to do with White Bandicoot. So what I'll do now is hand over to Joel, who's just going to give us a quick demo of some of the screens in White Bandicoot.

Yeah, thank you. I'm just going to share my screen. So I'm just going to give a quick overview of the workflow for White Bandicoot.
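Before the demo, it is worth making the nausea example above concrete: it is essentially a cross-table consistency check, the kind of thing that can only be expressed once you understand how the source tables relate. A minimal sketch follows, using hypothetical table and column names rather than Medical Director's or White Bandicoot's real schema.

```python
# Minimal sketch of a cross-table plausibility check: flag diagnoses whose
# recorded date disagrees with the date of the visit they are linked to.
# The "diagnosis" and "visit" tables here are illustrative assumptions.
import sqlite3

def find_date_mismatches(conn):
    """Return (diagnosis_id, diagnosis_date, visit_date) rows where the dates disagree."""
    sql = """
        SELECT d.diagnosis_id, d.diagnosis_date, v.visit_date
        FROM diagnosis AS d
        JOIN visit AS v ON v.visit_id = d.visit_id
        WHERE date(d.diagnosis_date) <> date(v.visit_date)
    """
    return conn.execute(sql).fetchall()

# Illustrative usage (the extract path is hypothetical):
# conn = sqlite3.connect("gp_extract.sqlite")
# for row in find_date_mismatches(conn):
#     print(row)
```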
I'm going to go through connecting to a database, creating a query, then running that query, getting some stats from it and importing them into our web tool. You can see here that I've connected to one of our databases, and I'm going to load up this synthetic database generated with Synthea. I'm just going to create a project, and you can see that our White Bandicoot tool has now loaded the tables and the fields within this dataset.

Can I just note at this point that this doesn't need to be health data; this could be anything? Yeah, exactly, the health domain becomes irrelevant from here on.

Now what I'm going to do is create a SQL query to get the completeness for this first table, allergies. It's just going to check the number of non-null values within each of these fields. So I'm going to copy a query in. This is what's called a CDM query, which means we write it once and it replaces fields within the query based on what field and table we're running it for. So I'm going to check this CDM query, give it a name, completeness not null, set it as a completeness query, and also set it to run on all table fields for the allergies table. Let me check everything. Yep. Then I'll save it. The tool is going to create this, and now I can go to the fields and you can see that it's created these queries for each of the fields. So what I'm going to do is just run them.

While that's happening, I'll just mention this is a bit of a prototype; it doesn't look too good right now. But the key thing is getting the functionality to a point where it's able to run Kahn-compatible queries around things like completeness and compliance. That's what you're seeing here. Yeah, exactly.

So you can see for a lot of these that we have full completeness: 619 out of 619 records are non-null, and that's just because this is a synthetic dataset. But if we go down to the stop field, you can see that only 10% of these are non-null, and that's just because, for allergies, it makes sense that a lot of them don't have an end date. So what I'm going to do is mark this as having issues, and I'm just going to write a little note saying allergies often don't have an end date. I'm also going to tick this checkbox for each of them, saying that it is the primary query, so they'll show up in the web tool when we export them. Now that we've done that, I'm going to go down here; the project that we created is selected, and I'm just going to export the data.

While Joel's doing that, I'll just talk through a bit more about what's happening here. When you run data quality measures against your database, you have to run them within your data enclave. So what we have is a tool that is going to be open source, which means people can see the source code and gain some confidence about running this data quality tool within their enclave. But what we want is for people to be able to publish their data quality, and beyond that, we want to allow people to put free-text entries in, so that when they run the tool it can actually generate an entire data book for your database, including the quality attributes. So what Joel is showing here is that he's just exporting a standard JSON feed; if that means anything to you, it's basically a format for data transfer.
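For readers who want to see the shape of the idea, here is a minimal sketch of the "write once, substitute the table and field" completeness check and a JSON export, loosely mirroring the demo. The template, function names and output structure are illustrative assumptions, not White Bandicoot's actual code or export format.

```python
# Minimal sketch: a templated non-null completeness check applied to every
# field of a table, with results dumped to JSON for a downstream dashboard.
import json
import sqlite3

# Placeholders are substituted per table/field; in a real tool the identifiers
# would be validated against the database's own metadata before formatting.
COMPLETENESS_TEMPLATE = (
    "SELECT COUNT({field}) AS non_null, COUNT(*) AS total FROM {table}"
)

def completeness_for_table(conn, table, fields):
    """Run the templated non-null check for every field of a table."""
    results = []
    for field in fields:
        sql = COMPLETENESS_TEMPLATE.format(table=table, field=field)
        non_null, total = conn.execute(sql).fetchone()
        results.append({
            "table": table,
            "field": field,
            "non_null": non_null,
            "total": total,
            "completeness": non_null / total if total else None,
        })
    return results

# Illustrative usage against a Synthea-style allergies table:
# conn = sqlite3.connect("synthea.sqlite")
# stats = completeness_for_table(conn, "allergies", ["start", "stop", "patient", "code"])
# with open("quality_export.json", "w") as fh:
#     json.dump(stats, fh, indent=2)
```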
And he's now uploading that onto the web tool.

Thanks. So I'm just going to create this Synthea dataset that we just exported. I have some old data here, but I'm going to load this new project. I'm going to select the project here, and we exported the allergies table, so I'm going to select that as the table we want to look at when we generate this report. Then I'm going to generate it. And you can see here that it has the data. Most of the data, like we saw, was at 100%, but we have one field at 9.5% and we've marked it as having specific issues. If we want to drill down into that, we can look at the fields: we can see the completeness for each of them, we can see that this is the field with specific issues along with the note that we added, and then we can also see this in graph form, or in table form, and so on. But yeah, that's pretty much all I wanted to show you as a simple demonstration of the workflow for the tool we've made. Is there anything you want to add, Triggy?

That's fantastic, thanks for that. I'll quickly switch back; there are just another two or three slides and then we're finished. Sounds good. Terrific. So let me share my screen again.

Right, so effectively what you've seen here is being able to run the White Bandicoot tool against a database, and we want to allow different people to run these sorts of assessments on a database, or even on different versions of it. Interestingly enough, we're seeing that we've come up with something of a standard here; potentially another data quality assessment tool could do the same thing. What the system is doing is outputting data to a dashboard, and we've got a lot of work to do, both on the tool itself and how easy it is to use, and also on the dashboard. You've seen already that we've taken our Patron database and converted it to a common data model, using that extract-transform-load process. The model there, with OMOP, doesn't actually carry the sort of information we're generating, not the free-text content, so we're even interested in feeding back into this international community, through this tool, around things we can do to add further description for people assessing the datasets.

I think the final thing here that's of real interest to me, and maybe to the group here, is the standardisation of how you transmit information about quality to, for instance, a standard dashboard. Certainly one of the more recent things we've been doing is looking at things like DDI. It doesn't quite fit completeness and accuracy and so on, although in version 3.3 there has been a look towards that sort of area. So what we're doing is continuing to develop White Bandicoot while we find out all of the things that need to be transmitted across to our web tool. At that point, once we know the full extent of the information, we see ourselves being able to, for instance, start working with the likes of DDI to make sure we can support these within standards, so that instead of our custom format we can move to a standard format. So that's it, that's our whirlwind tour of quite a lot of stuff. Thanks for listening.
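To illustrate the standardisation question raised at the end, here is one way a quality "feed" for a dashboard might be structured as JSON, carrying metric results plus the free-text notes described in the talk. This is purely a hypothetical example, not White Bandicoot's real export format and not a DDI serialisation; it only shows the kind of information that would need a home in a standard.

```python
# Hypothetical example of a quality-metrics payload for transfer to a dashboard.
import json

example_feed = {
    "dataset": "synthea_demo",            # hypothetical dataset name
    "generated": "2024-01-01T00:00:00Z",  # placeholder timestamp
    "measures": [
        {
            "table": "allergies",
            "field": "stop",
            "dimension": "completeness",   # e.g. a Kahn-style quality dimension label
            "value": 0.095,
            "has_issues": True,
            "note": "Allergies often don't have an end date.",
        }
    ],
}

print(json.dumps(example_feed, indent=2))
```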