Good afternoon and welcome to the Australian National Data Service webinar about data linkage and the Australian Health Thesaurus. I'm Kate LeMay and I work at the Australian National Data Service, where I work a lot with health and medical data. We work to make Australia's research data assets more valuable for researchers, research institutions and the nation. Today is the second in our current lot of health and medical webinars. We had three health and medical webinars earlier this year and then we've had two more this month. Last week's was on patient views on data sharing and today's is on data linkage and the Australian Health Thesaurus. Every single time I have done any event with ANDS since I started nearly two years ago, when we get feedback, data linkage is always something that people ask to know more about. So we're very lucky to have Dr Trisha Johnston from Queensland Health to talk to us about data linkage. She's been the Director of the Statistical Analysis and Linkage Unit, Statistical Services Branch, in Queensland Health since 2009. She's worked with health data both at the Queensland Ambulance Service and within Queensland Health since 2000 and has extensive knowledge and experience in statistical analysis, linkage, interpretation and reporting of health data to support the development of policy and systems to improve the quality and efficiency of health services. We also have James Humphrey here to talk about the Australian Health Thesaurus. He's the Information Manager at Healthdirect Australia. James is responsible for managing content on the Healthdirect Australia websites. This includes the content production team, the information partnership team and, for the purposes of this presentation, the information management team, which manages the terminologies and ontologies used by Healthdirect. So I'd like to pass over now to Trisha to speak to us about data linkage.
Hi, thank you very much for having me today.
So I'll be talking from the Queensland perspective and trying to fill in a little bit from the national perspective where I can. So apologies to people in other jurisdictions, this is mainly about Queensland. My topic today is accessing and using linked health data. I'm from the Statistical Services Branch within the Queensland Department of Health, so just a little bit first about that branch. We have a broad role in data collection, analysis and processing within the department. We collect, process, analyse and disseminate statistics about the health of Queenslanders and their use of health services. We also have a role in developing statistical standards and maintaining a data dictionary for a lot of the data items that are collected in Queensland data collections. We also play a central role in data linkage, so the data linkage unit is located within the Statistical Services Branch, and in the provision of data for health services policy, planning, management, monitoring and evaluation and, of course, research. There is also a data custodian role within the Statistical Services Branch for some of the data collections within Queensland Health: those relating to hospitalisation and perinatal data and also, by proxy, death registration information.
So today I'll be talking about what data linkage is, how linked data are used and how to apply for access to linked data. I guess the thing that people talk about when they talk about linked data is that data are collected and they sit in silos. For example, for any given hospitalisation or hospital event, we might have ambulance data, emergency department data, admitted patient data and, for some people, death registration data. Currently, particularly where you're looking across hospitals, there's no unique identifier to join that information together. And then within any given hospitalisation, you'll have some more data collections that sit in isolated silos.
So you might have pathology, pharmaceuticals, intensive care unit, operating room, mental health and perinatal data, to name a few. Outside of that hospital event, there are other data collections that are isolated as well: things relating to notifiable conditions, vaccinations, outpatient services and registry information, so things like cancer registries, stroke registries, trauma registries and a number of others. And then there's primary care information, which sits outside of the state jurisdiction, so relating to GP visits, MBS and PBS. And then there are other things like aged care data. Then you can take it further and look at things like education, police and justice, child protection, surveys about issues that relate to health, census data, et cetera. So there's a lot of information out there that could be quite useful if joined together, but it's sitting in a lot of silos.
The reason that we do linkage is that across all of these data collections, or most of them, there's not a unique identifier that can say that information relates to a single person across collections. So we use data linkage. It's a process that uses person-level identifying information, things like name, date of birth and address, to determine which records within a data source or between data sources relate to a particular individual. When we talk about linkage, we're usually referring to probabilistic matching, so we're using probability to work out which records within different data collections relate to an individual. The strength of that is that not all of the information needs to be available, and the quality of all of those identifiers doesn't need to be perfect, to allow us to make a match between the data collections. So why do we link? Basically because if we combine all of this information, it provides much richer information than the individual collections by themselves.
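To make the idea of probabilistic matching above a little more concrete, here is a minimal sketch in Python. It uses a Fellegi-Sunter style agreement weight, which is one common formulation; the talk does not say which method or software Data Linkage Queensland uses. The field names, the m/u probabilities and the three records are all made up for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not from the talk):
# m = P(field agrees | the two records belong to the same person)
# u = P(field agrees | the two records belong to different people)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "date_of_birth": (0.97, 0.001),
    "postcode": (0.80, 0.02),
}


def match_weight(rec_a, rec_b):
    """Sum log2 likelihood ratios over the fields present in both records."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            # A missing identifier adds no evidence either way.
            continue
        if a.strip().lower() == b.strip().lower():
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


# Fictitious records: the second has a misspelt given name and no postcode.
hospital = {"surname": "Nguyen", "given_name": "Anh",
            "date_of_birth": "1980-02-14", "postcode": "4000"}
death_reg = {"surname": "NGUYEN", "given_name": "Ann",
             "date_of_birth": "1980-02-14", "postcode": None}
stranger = {"surname": "Smith", "given_name": "John",
            "date_of_birth": "1975-07-01", "postcode": "4000"}

likely_same = match_weight(hospital, death_reg)
likely_different = match_weight(hospital, stranger)
```

In this sketch the first pair still scores well despite the misspelt name and the missing postcode, which is the strength described above: identifiers do not all need to be present or perfect. A real linkage system would also choose thresholds for accepting, rejecting or manually reviewing candidate pairs.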
So within health we can use that information to look at patients across facilities. We can look at following up cohorts. We can check that people haven't died when particular organisations, like health or researchers, are wanting to make contact with an individual, so that they're not contacting an individual's family and causing more pain. We can also link data to reduce the need to collect additional data, which can be quite expensive.
Within Australia, in the health context, data linkage was identified as an important research tool by the National Collaborative Research Infrastructure Strategy back in 2005, and they allocated funds to progress the development of linkage infrastructure within Australia. So the Population Health Research Network, or PHRN, was formed and nodes were created representing each jurisdiction in Australia to set up linkage infrastructure. Within Queensland, in the Queensland Department of Health, we have Data Linkage Queensland. There's the CHeReL, the Centre for Health Record Linkage, within New South Wales, and they also link ACT data. They were set up before the funding for PHRN was in place but they're also part of that network. Again, Western Australia had been operating a linkage unit for a long time prior to that, but they're also part of the PHRN network. There is SA NT DataLink to conduct linkage for South Australia and the Northern Territory. In Victoria, there's the Centre for Victorian Data Linkage, and Tasmania has the Tasmanian Data Linkage Unit. Nationally, the Australian Institute of Health and Welfare (AIHW) has a linkage unit that links health-related data. The ABS also does some linkage for non-health-related data collections, and some linkage to health. And within Queensland, the Queensland Government Statistician's Office also does some linkage of non-health-related data collections. So as I said, the AIHW does some linkage of health-related data collections.
So the reason to go to the AIHW rather than to a state-based linkage unit is for linkage of those national data collections, so MBS, PBS and Residential Aged Care, or if you need information about people within a jurisdiction who might have had a service event, or who might have died and been registered, in a different jurisdiction, because they can link to, for example, the National Death Index. Whereas within a jurisdiction, we would link to the death registration data within that jurisdiction for people who are registered there.
Within Queensland Health, we have two main forms of data linkage. First we have production linkage, where we're creating a master linkage file that has enduring linkage between core health data collections. I'll tell you what collections we're linking in a minute. Our production linkage is done in near real time, so we're doing linkage every two weeks to try to be as up to date as we can for each collection where possible. Our master linkage file contains about 45 million records at this stage. The other part of linkage within Queensland is request linkage. We are able either to provide data from our master linkage file or to do a customised linkage where a data collection isn't within our master linkage file, and that's for both research and government requests.
The data collections that we have included within our master linkage file, and the time periods, are on the screen here. The big one that we get most requests for is the Queensland Hospital Admitted Patient data. We have both public and private hospital records included there. The time period for which we have names and addresses on our data is different for the two types of hospital, so we can link a lot further back with our public hospitals than we can with our private. We have names and addresses on our public system going back to 1995, so we'll be linking back to that period. Currently we're linked back to 2001.
In private hospitals we only have names and addresses from July 2007. We also have emergency department data and our perinatal data relating to all births that occur in Queensland. We have death registration data. We're linking that back to 1995 as well and currently it's back to 2001. We have birth registration data, and elective surgery and outpatient waiting lists, which relate to public hospitals. We also have some internal programs, so the Surgery Connect program, which is about care in private hospitals that's contracted by public hospitals. We have Queensland Ambulance Service data that has just recently been joined to our master linkage file, and we'll be going back to 2008 with that. At this stage just two months of it have been linked. In progress we have notifiable conditions, vaccination, and non-admitted patient data for public hospitals, so outpatient clinics. We're going to be linking air retrieval data, so that includes Flying Doctor and CareFlight information, and we are in negotiation with the education department to link some of their collections so that people can look at things like NAPLAN results and the AEDC.
We have also conducted linkage in the past to other ad hoc data collections, and some of those I've listed on this slide: registries (cancer, Pap smear, trauma, stroke, suicide); other government agencies (corrections, transport and main roads, mines and natural resources); linkage to other Queensland Health data collections, where community mental health is a common request we receive, as well as pathology, pharmacy, operating room and ICU; and then other cohorts that have been provided by clients.
So how are linked data used? Within Queensland Health we get a lot of requests. Probably about 60% of our requests come from within the department, and they're around things like trying to understand the prevalence of diagnoses, diseases and co-morbidities.
Planning, so looking at the number of people who are using services, the number of episodes per person, readmission rates, patient transfers and patterns of where patients are going between hospitals. We also have requests for monitoring and evaluation purposes, so looking at service use and patient outcomes and trying to improve allocation and recruitment of funds across different services. Clinically, there's a lot of use of linked data at the point of patient care, but that comes from a different system that's just linking within hospitals. That's looking at things like previous service use by a patient when they actually arrive at a hospital, and things like allergies and drug information that have already been recorded in the system. That's part of the electronic medical record, and we also have a system within Queensland called The Viewer, which is the interface where clinicians are able to view that information.
For research I've just included a few examples. We receive over 100 requests per year for projects. We hold a data linkage symposium every year and it is possible to video conference into that, if people in other jurisdictions are interested, or to come along. It's actually being held next Wednesday, so if you look on our website there are details of that. These presentations are all available on our website, so there's more information about them there. There are things like looking at the relationship between mental illness and offending, and cardiovascular health in people who are hospitalised for burns. Looking at Indigenous status on data collections where there's not good coverage of that particular data item, so the example here is cervical screening data.
Looking at vaccination programs and at outcomes for patients who have been vaccinated, and then one very interesting presentation from Alfonpo's last year that looked at the cost effectiveness of a homelessness intervention, which linked a lot of different data collections from across different sectors, including health.
So how do you access linked data in Queensland? For access to confidential data in Queensland we have legislation called the Public Health Act, and you need to fill in a form that relates to that. You also need ethics approval. I'll give you a link in a minute to our website, which has links to the different areas and departments that have more information about that process. It's important to note that Queensland Health remains the data custodian of all Queensland Health data, and that data can't be shared or published except in aggregate form. Data can only be used for the purposes outlined in an approved request, so all of that information goes into a Public Health Act application and also into the ethics protocol. What you're able to do with data that you obtain from Queensland Health is all detailed in the Public Health Act application. If you need to access the same data set for another project, or a different researcher would like to, what you'd need is just an amendment to that Public Health Act application and ethics approval or, if the project is very different, a new application might be required.
So this is the Queensland Health website relating to data linkage, and we've got lots of resources there for people to understand how to apply for linked data or linkage services within Queensland. We've also got some resources that people find quite useful, so we've got a nice table there that lists all of the commonly requested data collections that people are accessing from us.
It's got a column with contacts for applying for that linked data, and then some resources as well, some manuals and forms that relate to each data collection, to help you understand its scope and coverage. For example, one of the forms for our Admitted Patient data collection, and we've done these up for several of our data collections, covers the commonly requested data items from that collection, what they look like and a bit of information about them. Those forms are all available on our website. Okay, that's all from me. Thank you very much, and please, if you have any questions, let me know.
Thank you very much, Trisha. We're just going to pass over to James now and he's going to present to us about the Australian Health Thesaurus.
Okay, thanks Kate. Hi everyone, I'm from Healthdirect Australia, so I'll start by talking a bit about who we are. We're a government-funded organisation. We're not-for-profit, and we're actually a COAG company. We've been going since 2006 and we're owned by the federal health department and the state health departments of every state except Victoria and Queensland. We were set up to develop a range of digital health and telehealth services. We began as the National Health Call Centre Network, for people to contact after hours if they're having trouble with any health issues. We became a digital organisation maybe five or six years ago and we provide digital services and other health information and advice to the Australian population. Some of our websites and services are the main healthdirect website, which is a general health website. We also have the Pregnancy, Birth and Baby website. We have the Carer Gateway for people who identify themselves as carers of people who have a disability or who are chronically ill. We run the healthdirect After Hours GP helpline, which is another contact centre.
We run the My Aged Care website on behalf of the federal government. We have the National Health Services Directory, so you can find the GPs, specialists and emergency departments in your local area. We run the Quitline and Get Healthy services in New South Wales as well.
I'm here to talk about the Australian Health Thesaurus, which we manage. It's a thesaurus of medical, health and human services related concepts. It covers human services as well because we manage the aged care and carer websites. It reflects the current Australian health and human services environments. You can see on the right-hand side I've got some statistics there: we have over 5,000 concepts within our thesaurus. It was originally developed by the Department of Health and Ageing many years ago and we inherited it about four or five years ago. The thesaurus is based on MeSH, the Medical Subject Headings, which is maintained by the US National Library of Medicine. We regularly update the thesaurus. We look at all the user analytics to see what people are searching for on our website. We look at current news developments, such as the Zika virus when the outbreak occurred in Australia a couple of years ago. We had no concepts on Zika virus, so we added that in. We also do environmental analysis of certain domain areas such as aged care, which we did when we set up the My Aged Care website. We looked at all the websites in Australia that focus on aged care and at all the concepts that they're using. The main thing about our thesaurus, though, is that it's consumer focused. It's not aimed at clinicians or health professionals. It's aimed at consumers, so we try to keep that focus in the concepts.
On this slide here, on the left-hand side, you can see the main concept schemes within our thesaurus. We've got anatomy, chemicals and drugs, diseases and disorders, equipment and supplies, facilities. So there are quite a lot of different concepts there.
If you click on one of those concept schemes you can go through the hierarchy, and on the right-hand side you can see I've gone down through diseases and disorders, digestive system diseases, and I've now selected the liver cancer concept. Clicking on the concept shows us the other information about that concept, and here you can see what's called the SKOS view, the Simple Knowledge Organization System view, which is basically showing you the broader, narrower and related concepts. Also on the right-hand side you can see we have the preferred label and alternative labels, and the alternative labels are the synonyms. The idea for our websites is that if people search, in this case, for hepatic cancer, they can find content that has been classified with the concept liver cancer.
So the main reason we have our thesaurus, as I just mentioned, is classification. All that content is classified with the thesaurus. It also helps with relevance and ranking in our search results. We use it for auto-suggestion as well, so once you start typing your search in the top field you'll get a list of suggestions to select from. We can display other contextual content on the website as well, so a video on asthma will appear on an article about asthma because they're both classified with the same concepts. And we also use it to manage our medicines data, and this is what I really want to show you today because it shows how we are linking our data.
We've set up a medicines catalogue and we've looked at all the publicly available medicine data sets that we could find in Australia. We're starting off with our own Australian Health Thesaurus, which we are using as our control list. We've also looked at the Australian Medicines Terminology. This is the national standard for naming conventions for all medicines in Australia, and it is managed by the Australian Digital Health Agency.
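The synonym lookup described a moment ago, where a search for "hepatic cancer" finds content classified with "liver cancer", can be sketched roughly as below. This is only an illustration of the preferred-label and alternative-label idea from SKOS, not Healthdirect's actual implementation: the `Concept` class, the helper functions and the tiny concept list are assumptions, and only the liver cancer example and its place under digestive system diseases come from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A simplified SKOS-like concept (hypothetical structure)."""
    pref_label: str
    alt_labels: list = field(default_factory=list)  # synonyms
    broader: list = field(default_factory=list)     # parent concepts


# A tiny illustrative slice of a thesaurus.
CONCEPTS = [
    Concept("Liver cancer",
            alt_labels=["Hepatic cancer"],
            broader=["Digestive system diseases"]),
    Concept("Asthma"),
]


def build_label_index(concepts):
    """Map every preferred and alternative label (case-insensitively) to its concept."""
    index = {}
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            index[label.casefold()] = concept
    return index


def resolve(term, index):
    """Return the preferred label for a search term, or None if it is unknown."""
    concept = index.get(term.strip().casefold())
    return concept.pref_label if concept else None


LABEL_INDEX = build_label_index(CONCEPTS)
resolved = resolve("hepatic cancer", LABEL_INDEX)
```

Content tagged with the preferred label can then be retrieved whichever synonym the user typed, and the same index could be scanned by prefix to drive auto-suggestion.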
We also have data from the Therapeutic Goods Administration through the Australian Register of Therapeutic Goods. That's a register of all the drugs that are sold in Australia. We also have data from the Pharmaceutical Benefits Scheme and from Guildlink, which is the commercial arm of the Pharmacy Guild of Australia. From them we get pill images, and we're going through a process with them at the moment to get their consumer medicine information leaflets. That's the leaflet in the packet when you buy a medicine over the counter, and we want to get that in HTML format so that we can present the data on our website in a much more consumer-friendly way. We're also getting data from DrugBank, which is a great Canadian government initiative. It's got great information about medicines and chemicals.
You can see on this slide what we do with our thesaurus. This is how we start the linking process with these other data sets. I've selected the concept amitriptyline, which is an active ingredient, and we've created a whole new custom schema which we call clinical relationships. On the right-hand side, in that column, you can see we have added the Australian Medicines Terminology ID, and a bit further down we've also got an ID for DrugBank, so it's just a mapping to those concepts in those data sets. You can also see there's a reference here to the Beers criteria, and that's not about the fact that this can be used to make beer. It's an internationally recognised list of medicines that are inappropriate to prescribe to older people. In this case this is one of those drugs, and we've got the Boolean value of true here. A bit further down we've also got a pregnancy category, which we get from the TGA, and this one has a pregnancy category of C. We're not actually using that at the moment but we do intend to use it on our website soon. So every month we update what we call a terminology service. It's a database.
We import all the data from all of those data sets into our terminology service, from the Australian Health Thesaurus, the TGA, PBS, the AMT, DrugBank and Guildlink, and we've developed the relationships between all those data sets so that when a user searches for a medicine on our website we can dynamically pull all that data into a web page for them. On this page you can actually see how those relationships work. I showed you where the AHT had that reference to the Australian Medicines Terminology and DrugBank. We can see that those relationships are here, with DrugBank and the AMT. The AMT actually has seven different data sets within it, from medicinal product, which is a list of the active ingredients used in a drug, and trade product, which is the actual brand name of that, and you can work your way through these relationships to come down to the containered trade product pack. In that data set it could be Panadol 20 milligrams, 20 tablets in a blister pack. That data set has a reference to the Australian Register of Therapeutic Goods ID, which the TGA manages, and Guildlink also uses that ID as well. On the left-hand side there you can see the PBS link, where there's a reference to that ID in the medicinal product unit of use data set. So we can link up all of these data sets and, as I said before, the use is that if somebody searches on any medicine we can pull the data from all of these data sets to show information to people.
And so what does it look like to a user? I can show you from our site here. This is a page on the drug Endep, and as I scroll down the page you can see the information on the page.
Remember when I talked about the Beers criteria? Because we had that true flag in the data, we've pulled up this warning here to say that if you're over 65 years of age there may be specific risks and recommendations for use for this medicine, so please consult your health professional or pharmacist. As I scroll down here, people can select the type of packet that they have, in this case whether it's a 10 milligram, a 25 milligram or a 50 milligram pack. That's coming from the AMT, the Australian Medicines Terminology. One of the bits of data we have from the TGA is a PDF version of the consumer medicine information leaflet, and we have a link to that, so if it exists they can click on this "read leaflet" link to read the consumer medicine information leaflet, which will tell them all about the side effects of the drug, when they can use it, when they can't use it, that type of thing. Then we have other information here as well coming from the TGA, and also images coming from Guildlink. So we've got the dosage form and the route of administration, and we've got information about the pack, about how to store the drug and the lifetime of the drug. And if that drug is available on the PBS then there's a link here as well to go to the PBS site.
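The way a medicine page is assembled from the linked data sets, as described above, can be sketched as a chain of identifier lookups. This is a simplified illustration under stated assumptions, not Healthdirect's terminology service: every identifier, file name and field name below is made up, the AMT is collapsed into a single pack table, and the function `build_medicine_page` is hypothetical. Only the overall shape (thesaurus concept, AMT ID, ARTG ID shared by the TGA and Guildlink, a Beers criteria flag and a pregnancy category of C) comes from the talk.

```python
# Thesaurus concept with its "clinical relationships" (all IDs are invented).
AHT_CONCEPTS = {
    "amitriptyline": {
        "amt_mp_id": "MP-0001",      # link to the AMT medicinal product
        "drugbank_id": "DB-0001",    # link to DrugBank
        "beers_criteria": True,
        "pregnancy_category": "C",
    }
}

# Simplified stand-in for the AMT pack-level data, carrying the ARTG ID.
AMT_PACKS = [
    {"mp_id": "MP-0001", "strength": "10 mg", "artg_id": "ARTG-101"},
    {"mp_id": "MP-0001", "strength": "25 mg", "artg_id": "ARTG-102"},
    {"mp_id": "MP-0001", "strength": "50 mg", "artg_id": "ARTG-103"},
    {"mp_id": "MP-0002", "strength": "500 mg", "artg_id": "ARTG-201"},
]

# TGA and Guildlink records are both keyed by the ARTG ID.
TGA_BY_ARTG = {
    "ARTG-101": {"cmi_pdf": "cmi-101.pdf"},
    "ARTG-102": {"cmi_pdf": "cmi-102.pdf"},
}
GUILDLINK_BY_ARTG = {"ARTG-101": {"pill_image": "pill-101.png"}}

BEERS_WARNING = ("If you are over 65 years of age, there may be specific risks "
                 "for this medicine. Please consult your health professional.")


def build_medicine_page(name):
    """Follow the ID links from the thesaurus concept out to each data set."""
    concept = AHT_CONCEPTS[name]
    packs = []
    for pack in AMT_PACKS:
        if pack["mp_id"] != concept["amt_mp_id"]:
            continue
        artg_id = pack["artg_id"]
        packs.append({
            "strength": pack["strength"],
            # Leaflet and image are optional: show them only if they exist.
            "cmi_pdf": TGA_BY_ARTG.get(artg_id, {}).get("cmi_pdf"),
            "pill_image": GUILDLINK_BY_ARTG.get(artg_id, {}).get("pill_image"),
        })
    return {
        "name": name,
        "warning": BEERS_WARNING if concept["beers_criteria"] else None,
        "packs": packs,
    }


page = build_medicine_page("amitriptyline")
```

Keeping the thesaurus concept as the single entry point means each external data set only needs one mapping maintained against it, which matches the "control list" role described for the thesaurus.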
Okay, going back to the slides. The thesaurus can also be used by others. We have a public version of it that can be accessed by individuals and organisations, so it can be used for research purposes. Just remember that it uses consumer language, that it's aligned to medical, clinical and government standards and systems, and that it forms a bridge between those three different domains. It could be used for surveys, for interviews and for transcripts. I've got the link for it coming up soon, but this is what the public version of the thesaurus looks like. You can do a search for a concept, you can select from the A to Z list, or you can click on any of the concepts and just drill your way down to find the concept you need. This is only showing that SKOS view that I showed you before, with the broader, narrower and related concepts, but you can find some other information there as well. There's also a visual version of it, so that you can see in a visual format what the narrower, related and broader concepts of that concept are.
Here are some links which you might find useful. The first one is our main healthdirect website. We also have a link there to the medicines pages, where you can start searching and see how all that data comes together. I've got a link here which shows you general information about the thesaurus, and for those who actually want to get into the thesaurus and look at it themselves, that's the link down the bottom: thesaurus.healthdirect.org.au/aht. Okay, and that's it, so thank you.
Thank you very much, James, for your presentation. I'd just like to thank Trisha and James for coming along to our webinar today. It's been really informative and we really appreciate you taking the time to do that. Thank you everyone for your time today, and we'll see you next time we have some more health and medical things to talk about.