linkable data, bodies and research and health surveillance. So Douglas is going to talk about Research Terminologies Australia, or the Rosetta.

Okay, thanks very much. I'll not do much more in the way of introductions to myself, but suffice to say I'm just a real enthusiast in health data science and actually making things happen in terms of research, and that's my area.

The first thing I want to do really is introduce to you an organisation you might not be familiar with, the Australian Health Research Alliance. This alliance is a mechanism for bringing together all the advanced health research translation centres and centres for rural innovation across Australia, and certainly I've seen this as a real potential vehicle for trying to bring a voice to things like national projects, specifically in the health arena, that you can see here. Our challenge is always the competitive nature of research and really trying to bring things together.

So under the auspices of AHRA I was lucky enough to be able to lead a new national systems level initiative about how we can bring forward things at a national level around data quality assessment, terminologies and mappings, and common data models. So this doesn't mean reinventing the wheel; it really means trying to take what's out there and think, well, how can we actually build on what is there to do something better.

The health domain of course has for many years had a huge amount of work in this vocabulary space, SNOMED Clinical Terms being a prime example. Of course this is our national terminology. We pay internationally for access to this, and we've got a system that's been developed by CSIRO which acts to curate the rules under which we're allowed to use SNOMED Clinical Terms, in Ontoserver.
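As an aside for readers of the transcript: Ontoserver is reached through a FHIR terminology API, as the talk goes on to mention. The following is a minimal sketch, not the speaker's own code, of how a client might build a standard FHIR `CodeSystem/$lookup` request for a SNOMED CT concept. The base URL and the function name `build_lookup_url` are assumptions made for illustration; the sketch only constructs the URL and makes no network call.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with a real FHIR terminology endpoint.
BASE_URL = "https://terminology.example.org/fhir"
# Canonical FHIR system URI for SNOMED CT.
SNOMED_SYSTEM = "http://snomed.info/sct"


def build_lookup_url(code: str, base_url: str = BASE_URL) -> str:
    """Build a FHIR CodeSystem/$lookup GET URL for one SNOMED CT code."""
    query = urlencode({"system": SNOMED_SYSTEM, "code": code})
    return f"{base_url}/CodeSystem/$lookup?{query}"


if __name__ == "__main__":
    # 44054006 is the SNOMED CT concept for type 2 diabetes mellitus.
    print(build_lookup_url("44054006"))
```

A real client would send this as an HTTP GET and read the returned FHIR `Parameters` resource; authentication and licensing requirements depend on the server and are not shown.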
Huge in the space in healthcare, of course, is FHIR as well. Now as a researcher, having things like this, having this interoperability in the health systems, is a great thing, but I would say that there are some limitations in it, in that it's vast. If you're actually going to sit down and try to implement FHIR, it takes quite a lot of knowledge to do so. That's not necessarily surprising, but certainly it is one of the limitations in the implementation.

And this is a real world example. This is a GP computer system I came across a few years ago in Australia, and this lets you see the referential integrity across the tables. There's thousands of tables, and in fact this particular one was developed in Russia and all the documentation's in Cyrillic, so I was struggling a little bit with this one, I'll tell you. So data is not necessarily conformant; that's the problem.

As a researcher I'm really interested in the vast array of data that's out there. If we look at this example, I just looked through some tables from two GP computer systems around diagnostic past history, and the first thing to know about this is that 40 percent of all the data in these tables, which are where the main coded data resides, is not coded at all. So there are no codes, and what you're seeing here is lots of examples of spelling mistakes or abbreviations and things. So if you're going to try and define, right, what do I need to do for a national indicator around type 2 diabetes, you can see there's an awful lot you might want to pick out of here, but there are some things you wouldn't. So these are the challenges in that curation. I think we've heard a few times today that it is not always a single definition; sometimes you're having to curate different definitions for slightly different purposes. But how do you actually achieve that?

When we looked at this further, we were actually looking at what happens around national returns as part of national indicators, and what we
found is that, okay, there's 40 percent of data that's not coded, but of course people have more than one piece of information in the record, so we can often pick up coded data or more recent information. What we're finding is, like, COPD, chronic obstructive pulmonary disease, thank you, we can see that's missing 12.1 percent. So that's basically meaning that national indicators are undercounting by over 12 percent now.

So what can we do about that? This is where I've been thinking, is there a way that we can bring data together in a way that could be community curated? Because the problem is there isn't one source of truth. We've got different vendors, we've got people that are spending a lot of time doing these mappings, and then they're very protective about them because of the effort they've put in to create them in the first place. I mean, we even have very simple examples: here we've got the METEOR codes for sex, and yet probably every single system I encounter has got a different set of abbreviations for sex. So where can I go to find that out, or do I need to reinvent the wheel every time?

Where I'd like to get to, and where I'm working, is in converting to the OMOP common data model, and it's when you try to do these things that all these issues arise. The advantage of a common data model here is that on the left we've got all of the health concepts down to about 15 tables. So we're not talking about that diagram that you saw before, and we're not talking about the complexity of FHIR and its structures; it's much closer to what researchers can actually deal with. And of course it's about a common data structure but also a common vocabulary, which is really important, and this is based largely on SNOMED but not entirely. So one of the challenges we still face in healthcare, in some of the mappings and things we do, is that SNOMED isn't necessarily always the destination, although it's very commonly there.

This is just, if you get copies of these slides later, these are some of the tools that allow you to
convert to this common data model. So there's lots of stuff out there in this open international community, and over two billion records have been converted to this over the years, so it's been very useful in things like COVID research.

So that's all our challenges. I look at the Rosetta Stone here: they had some ideas how to fix this in 196 BC for other areas, so what can we learn? Really what I'm about here isn't necessarily new vocabularies but how we can do community curation and mappings and things like that. On the left here we've got the idea of mappings that we, for instance, have generated in primary care, and it's a diminishing return: when you've got a hundred million records with lots of information in, how far do you go mapping things like free text? So you can continually evolve these things, and potentially, if you've got the right tools in, other groups can help you, simple as that really. In this example we're just seeing if a Western Australia researcher can actually build on what we've done, because they need to map specific things around their research questions, and that helps move towards greater mapping efforts.

Michael Lawley is not here today. He is the source of all knowledge in terms of Ontoserver and Snap2SNOMED at CSIRO, although he's going to be here for the rest of the week for the workshop, so please don't push me too hard on questions around Ontoserver. But these are really key in our thinking. We've been working through this with the ARDC and with CSIRO, thinking, what can we do?

I'll just take you through these tools a little bit. Ontoserver, as I said earlier, underpins our national SNOMED capability, so that needs curated. But what I love about this model is that lots of other people can run instances of Ontoserver. It's got the syndication idea and you can do your own things in this. So immediately I'm thinking, well, does that mean that the ARDC could run its own copy of this for doing things around
vocabularies and terminologies as well? Not set to replace what's already there, but working towards how we advance the cause over the next few years.

One of the key things about this is this API connection, this FHIR link. What this means is that if you've got a third-party software supplier, they can get regular updates to the underlying terminologies here, not just SNOMED but anything that's on here. So I'm thinking, all right, does that mean then, if I've got some custom mappings for type 2 diabetes and we're wanting to get all the third-party vendors to be able to connect into that, could we use Ontoserver? Michael says yes; I'll take his word for that. So this is fantastic, but it's also got a whole range of use cases, including even things like validating FHIR. You can read them in here; in the interest of time I'll not go through all of this in detail, because I want to get on to Snap2SNOMED.

Ontoserver has been taken up internationally. The UK, for instance, are hugely invested in using Ontoserver. So I think this is another reason why I'm so interested in this platform: it's become a bit of an international de facto standard. It's recognised as a really huge contribution into the terminology space, but specifically around SNOMED. And what's happened is they've wanted to build better tools for doing the automated mapping to SNOMED, and this opens up some great doors as well.

Actually, if I fix on the needs here first, some of the things they've got in here: ensure traceability, so it's like provenance, what are the different versions of any terminologies or things we put up here; access control. So a lot of the things that I've been hearing talked about today are actually built into what they've been doing with Snap2SNOMED. But of course Snap2SNOMED is just fixed around converting to SNOMED, largely, at this time, but you see down
the bottom here they've got things like export to FHIR ConceptMap, which is on the roadmap for next year. So there's just a whole lot of convergence that's really going on here.

Just quickly looking at some examples. Here we've got a map of common diseases, and what's happened is the automated tool has looked at whether these mappings are equivalent or inexact, so it tells you about that, but it also tells you who has reviewed things, when, and whether it's accepted. So this just lets you see that a little bit better. Like any working tool, the user interface is very important, but this is the sort of thing that I'd really like to be able to work on with non-coders, getting people in the community to think, all right, how can we get consensus around some of this stuff? But the great thing is the huge, massive start that's been made on Ontoserver and Snap2SNOMED around a lot of the key technical underpinnings of all of this. It's really fantastic.

And there you are, that's the automap. I'll tell you how you complete your... and I've had quite a few researchers working with this and we get some really great feedback on it. Some of these automapping tools are not so good, but this one actually seems to be working extremely well.

So I'm just about to finish up here. Really, where to now is that we've got this real excitement in working with the ARDC and CSIRO to think, can we actually do something around community curation in health terminologies? So to take that forward and say what the ARDC is going to do next, I shall hand over to Ramin. Thank you all.

Thanks, Dougie, thanks very much. Look, can everyone hear me okay? I can't see. Okay, thanks.

These are the three areas that I'll briefly touch on during this part of the presentation: collaboration, vocabulary service extension, and a thematic research data commons. And so, in collaboration with University of Melbourne and CSIRO, ARDC will extend its existing
vocabulary service to support researchers in the health and medical domain, a service which will be woven into the fabric of a digital health research infrastructure.

Dougie described a vision for enabling shared access to health and medical terminologies and mappings, particularly resources that just may not be available through existing services: access to these resources and tools for their management, a discovery portal enabling access by people and software, and a service for supporting this community resource. ARDC provides a vocabulary service to the Australian research sector comprising a registry, a discovery portal, tools for managing vocabularies, and access points for people and software.

And so why does it need to be extended? Well, as Dougie mentioned, the health system and health research domain use specific international terminologies such as SNOMED, with specific interoperability standards such as FHIR. The ARDC Vocabulary Service will be extended to cover that content and those protocols, to promote data standardisation, integration, aggregation and interoperability in health research.

Naturally there are variations in the implementation of such broad standards between jurisdictions, sectors and services, and Dougie touched on this earlier. And so the new ARDC capability will also allow mapping between concepts and interpreting in these different contexts. As a national open research infrastructure, the ARDC capability will enable reuse of these resource-intensive mappings by other research groups or health services. They'll no longer be siloed.

ARDC will adopt and adapt the mature CSIRO-developed Ontoserver and Snap2SNOMED applications and stand them up in a national infrastructure service framework for access and use by researchers across Australia. The instance will be optimised for research sector usage, so it'll be flexible and customisable, but it'll be completely compatible and interoperable with the Australian Digital Health Agency's National Clinical Terminology
Service. And the ARDC Vocabulary Service will be woven into the fabric of a digital health research infrastructure: a weaving together of themes, resources and technologies; a fabric that has nationally focused capabilities that strengthen and support the broader digital research infrastructure system. That system is reflected here by the horizontals, such as people and policy, and data and services. It is a fabric that also has a deep focus on identified national challenges and opportunities. These are the verticals, examples such as HASS, People and Planet.

And so, in returning to where I began, collaboration, service extension and digital health research infrastructure: ARDC is well placed to operate this national information infrastructure and, in so doing, promote greater use of national and international standards, develop national consensus on health research semantics, and build greater national cohesion and coordination to support national scale research programs. Thanks.
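The community-curated mapping idea running through both halves of the talk (free-text entries mapped to a terminology, each mapping flagged as equivalent or inexact, with a reviewer and an accepted status) can be sketched as a small lookup. This is an illustrative sketch only, not the Snap2SNOMED or ARDC data model: the `Mapping` fields, the reviewer identifiers, and the table entries are all hypothetical and not clinically validated. The two SNOMED CT codes used are 44054006 (type 2 diabetes mellitus) and 73211009 (diabetes mellitus).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Mapping:
    target_code: str   # code in the target terminology
    equivalence: str   # "equivalent" or "inexact"
    reviewed_by: str   # hypothetical reviewer identifier
    accepted: bool     # whether the reviewer accepted the mapping


# Hypothetical curated map: normalised local text -> mapping.
CURATED = {
    "type 2 diabetes": Mapping("44054006", "equivalent", "reviewer-a", True),
    "t2dm": Mapping("44054006", "equivalent", "reviewer-a", True),
    "niddm": Mapping("44054006", "equivalent", "reviewer-b", True),
    "diabetes": Mapping("73211009", "inexact", "reviewer-b", False),
}


def normalise(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def map_entry(text: str) -> Optional[Mapping]:
    """Return the accepted mapping for a free-text entry, if any."""
    mapping = CURATED.get(normalise(text))
    return mapping if mapping is not None and mapping.accepted else None


def coverage(entries: Sequence[str]) -> float:
    """Fraction of entries that have an accepted mapping."""
    if not entries:
        return 0.0
    mapped = sum(1 for entry in entries if map_entry(entry) is not None)
    return mapped / len(entries)


if __name__ == "__main__":
    records = ["T2DM", "  Type 2   Diabetes ", "diabetes", "diabtes type 2"]
    print(f"accepted coverage: {coverage(records):.0%}")
```

Entries that stay unmapped (misspellings, unreviewed terms) are the ones a shared service would let other groups pick up and extend, rather than each team rebuilding the table from scratch.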