Yep. Okay. So, what a great... Am I speaking there? It's not for here. Good, good to pretend that it's booming. What a great venue. Thank you very much to the Australian Data Archive for organizing this. I don't know if you can see it online, but it's a magnificent sort of theater, almost, with a great backdrop. I do remember the old days when I would go and visit Steve at the other end of the campus in a little cottage that predated the university, so 1920s or something like that. There was a fireplace and a kitchen in the middle of their office, and it was cold in the middle of winter. For those of you who know the ARDC, we're actually still down there in those little cottages. In fact, one of them burnt down and was replaced with a demountable, and that's where we are. But anyway, it's good to be up the big end of campus here. And thanks very much for all the logistics and the beautiful resources that you've applied to this symposium.
My name is Adrian, Adrian Burton. I work at the Australian Research Data Commons. Now, right there we have a little bit of a mismatch. I'm going to be talking to you about the context of the Australian research environment, although, by putting out the call for vocabularies, we have participants from government, from industry, and from well beyond Australia. So I'll stick to what I was asked to talk about, but we'll try and ground it in some more fundamental concepts that do apply well beyond research and well beyond Australia, obviously.
What a great welcome we had from Paul: what a generous welcome, what a beautiful, meaningful welcome. And so, in reciprocation for the welcome, we acknowledge the living legacy and heritage of the great First Nations of Australia that really make this a unique place.
I've broken the cameras, as usual. So, I'm from the Australian Research Data Commons, as I said. We are part of a national infrastructure strategy, NCRIS, the National Collaborative Research Infrastructure Strategy.
I can see Natalia here, and a number of other participants from the national research infrastructure system. These are 24 national-level facilities where we're providing really national-scale services, assets and infrastructure to support leading-edge research. So when we're talking about the infrastructure in the title of the talk today, this is infrastructure to support leading-edge research, stuff that couldn't be done by any individual research group or university, or even jurisdiction. Some of it can only be done by national infrastructure. You've got sensors, sensor networks, laboratories, supercomputers and a lot of equipment that has to be deployed at national scale. Our particular remit is digital and data infrastructure at the national scale. Now, of course, digital and data is in all those facilities; they're technology enabled. We're looking at what needs to be done at the national level to support this transformation of research.
So as part of that we do everything from policy, skills and training to cloud storage, as you might expect, but we have a program called national information infrastructure. That's what I'm going to talk to you about: the knowledge infrastructure that we bring, the things that can only really be done at the collaborative national scale. Thanks. That's not what I thought. That's what I thought.
So, that infrastructure generally falls into these three categories: catalogs, identifiers and vocabularies. That's the context, as far as we're concerned, for providing these services at the national level. We run a national consortium for the minting of DOIs with DataCite for models, software, data sets and a number of non-traditional research output types. We run a national catalog to find research data. And we have this vocabulary service that I'll be talking to you about a bit more. So this is the category of national information infrastructure.
It's not in any single university's or any single nation's remit to provide a global identifier system for data sets. It just can't be done. So that's what this national and global infrastructure is here to do: to support the work of our scientists, the work of our research sector, with the information infrastructure that's required and that can only be done at that collaborative level. What's the point if the ANU has a brilliant way of identifying data sets if it's different from Melbourne's, which it obviously would be, has to be? It doesn't make any sense if I'm referring to a data set and the British and the Japanese have different ways of doing it. So these are infrastructure systems that are beyond the individual research groups and institutions, and they're part of a national or international system. That's the kind of work that we do: the founding of the ORCID consortium and its running in Australia, and running these national-scale discovery services across the whole sector and synchronizing them with discovery services around the world.
And then in vocabularies, I keep saying to our staff, we're trying to reduce the number of vocabularies in Australia. We're trying to increase the number of data sets, increase the number of things identified, but with vocabularies we're actually trying to get people to use semantic standards and have them adopted and adapted. And so we have a vocabulary service as the first step in a set of semantic services.
I'll start with catalogs just to give you a little bit of... Hello, how are you doing? Catalogs, to give you the context, and it will build into why we have vocabularies, in one sense. So, a lot of this stuff I've talked about is research system information. We're talking about the whole of research, in multiple different nations and multiple different organizations. People often look at it as: I want information about data sets, so, for example, the catalog I gave you there.
What we're looking at there are the outputs. So you're looking at it as a system, a kind of program view, and there are inputs, activities, outputs, outcomes and impacts. Well, let's have a look at those. What do we mean by an input? There's money: grants, investment. There are organizations. They're inputs into this research system. You've got facilities, some of the national facilities and other infrastructure facilities. You get your people as part of the input, and then they start to work on the activities, which are the projects. Outputs we all know: journal articles, but as the world becomes more nuanced, also data sets and models. Some of the semantic artifacts we talk about today would normally have been considered by-products, and today they're being considered as outputs. The outcomes would be when it is used and when it is applied to government or industry, and then in the end the impact would be measured by prosperity. What have I got there? People and the planet.
That's a way of looking at some of the artifacts and actors in a research system, and quite often this information infrastructure that we're talking about today is trying to talk about all of those things, to talk about what's happening in the research system. And of course, as I was just saying, that happens all over the world, in multiple government agencies, in multiple research institutions, in industry and across a number of service players. So when you're trying to bring together information, as we were asked to do, for example, in Research Data Australia, we were asked to bring together the information about data outputs from research. Bring it together from all these different sources, in our case across Australia, in the case of other catalogs worldwide. And then take that information about the outputs and bring it all the way up to the beginning of the process and make it a facility for input into new research.
So the function of the catalog at this point is to take those outputs and make them available as new inputs, so that the people and the projects can use them in new and innovative ways. Why am I talking about the research system? Well, quite often that's the context that we get in order to run this kind of knowledge infrastructure. And here we were asked to provide that. And people said, well, I don't just want a list of data sets. What papers were they related to, what projects did they come from, who are the people, what facilities did they use, what organizations were behind it, and where did they get the money? So already, just trying to bring together information about data sets from all these multiple information sources... I'm ahead of myself. What should I be saying now, Steve? Yes, that's exactly where.
So we were trying to bring together not only information about data sets and software, etc., but all the linked other bits from the research business process. Now, even just bringing together one item from all these different sources has its challenges, so let's talk about that. That's why we start to get into this knowledge infrastructure. Let's practice at that. Did you do earlier. So, thank you for bearing with me with that context; now we're getting back into the meat and potatoes of knowledge infrastructure.
What were the challenges that you get when you try to do that? I remember very clearly Melanie and some of the developers at ARDC coming into my office and saying, Adrian, we've got all the researchers from all over into this big national Research Data Australia, but I've got, I think it was, 800 references to John Smith, and I have no idea whether they're the same. This is 2006, 2007, something like that, 2008.
So we have no idea whether it's the same person, or how we bring them together. Each of the mentions of the data set, from different places around the institutions, had a different reference to a person. So I said to them, I remember quite clearly, well, you're going to have to do something about that in the short term, by data cleansing and topic modeling and bringing stuff together. And in the long run, we will look to work with the sector to fix up the inconsistent referencing and the duplication of information. So in 2009 we decided to pour money into the National Library to create a national identifier for people in Australia. And one year later, people in the journal world came up with ORCID, and so our local initiative to have precise referencing for researchers in Australia was overtaken by an international system, which was great. We were ready to join up.
And so now, fast forward: when we get information about a person from different research institutions, even government facilities and research facilities, we can just say, don't tell me that it's Brian Schmidt or B. P. Schmidt or Professor Schmidt, just give me the ORCID. That's a precise reference, and we can run a distributed information system across the whole research system by having consistent referencing and by getting rid of the siloed duplication, where all the information about Brian Schmidt had to be copied into all these different systems across the place.
So that was actually the segue, which I probably should have done a little second ago, from where we started in catalogs to saying, well, actually, you need to have identifiers. We need to be able to persistently identify the different players in the research system. So what do persistent identifiers bring to that problem?
It's a consistent way of referencing some of these artifacts and actors in the research system. Remember, that's why I brought up that paradigm we're using, the business process of research: there are different artifacts and actors involved there. A DOI is for persistently referencing the journal publications, the data sets, the software and, increasingly, grants. That's the business that they're in. ORCID is there for persistently identifying the people. We've got the emergence of ROR and a few others for organizations. So those things that everyone knows about, the very famous international persistent identifier systems, are designed for this problem, for talking about the research system.
That's right. I'm getting used to being pretty subtle in this talk. I thought we were going to have to evacuate. That's pretty nice. Now where on earth was I? I was right in the middle of the flow. Do you remember? I have no idea where I was there. Yes, persistent identifiers, the famous ones that we know, ORCID and DOI, everyone knows about them. They are for this purpose of precisely identifying actors throughout a very complex system.
I'm not sure whether I kept the slides or not. Yes, so I'll go through this very quickly, because it's a very Australian context: the thing called Excellence in Research for Australia. It's like the REF in the UK. It's looking at, well, I'll show you what it's looking at: how many publications there are in Australia, basically. If you follow the news in Australia, that used to be done as a survey, and now they're saying we're going to do it as a new, data-enabled process for collecting that information. So that will be based on consistent use of DOIs and this kind of aggregation across the system.
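To make the point about consistent referencing concrete, here is a minimal sketch of what aggregating person mentions looks like with and without a persistent identifier. It is only an illustration under stated assumptions: the record layout, the source names and the ORCID values are invented placeholders, not real identifiers and not how Research Data Australia is implemented.

```python
from collections import defaultdict

# Hypothetical person mentions harvested from different sources.
# The ORCID values are made-up placeholders, not real identifiers.
records = [
    {"source": "university_a", "name": "Brian Schmidt", "orcid": "0000-0000-0000-0001"},
    {"source": "university_b", "name": "B. P. Schmidt", "orcid": "0000-0000-0000-0001"},
    {"source": "facility_c", "name": "Professor Schmidt", "orcid": "0000-0000-0000-0001"},
    {"source": "university_a", "name": "John Smith", "orcid": "0000-0000-0000-0002"},
    {"source": "agency_d", "name": "John Smith", "orcid": "0000-0000-0000-0003"},
]


def group_by(records, key):
    """Group the source of each mention under the value of the chosen key."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["source"])
    return dict(groups)


# Grouping by name splits one person into three and merges two people into one.
by_name = group_by(records, "name")

# Grouping by identifier gives one group per actual person.
by_orcid = group_by(records, "orcid")

print(len(by_name), "groups by name;", len(by_orcid), "groups by ORCID")
```

Grouping on the name string gets it wrong in both directions at once, which is the 800 John Smiths problem; grouping on the identifier does not need any data cleansing step.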
Engagement and impact is another thing that we have here in Australia, and the audit of that is about bringing together and trying to identify those artifacts of the system. One that everyone probably knows is the h-index. What are they doing with an h-index? They've got all the DOIs for all the journals in the world. They're related through Crossref; that's why Crossref exists, because it's the cross-referencing between all those publications, linked to who the author is, and then out comes a pretty dodgy algorithm at the other end. But we love or hate the h-index.
What I'm trying to say here is that the tools to make that kind of global information system about research were invented by the journal publishers. And they had to, because if we're talking about all the research institutions, all the papers, all the researchers in the world, you're talking hundreds of millions of publications. I think it's tens of millions of researchers, and lots of these. So if you've got all that mixed up in a big system, they had to have a way of precisely identifying what publication I am talking about and what researcher. So they were the ones who invented DOI and ORCID for that purpose. The identifiers that we use come from that paradigm.
Now, vocabularies: you might be asking what on earth this has all got to do with vocabularies. Okay, it's knowledge infrastructure more broadly, but what does it have to do with vocabularies? Well, as it happens, quite a lot and not much at all. We'll get to that. So go back to this example, and again to my poor development team. We constructed, remember, Research Data Australia, the aggregation of all the data sets and algorithms and software across all these organizations in Australia, made available again as an input for research and linked to all the other outputs. Then we did the user testing and the requirements.
Well, we had brought all of that together because it was part of policy to put data sets in the context of their outputs and outcomes, to put them in a proper context, and people wanted to know which data set was created by which people. That was a very good use case. However, when we did the actual user testing, the researchers said, oh no, I'm not searching by ARC grant, I'm not searching by an ORCID identifier. I'm searching for the soil map of the ACT. Melanie is here; Melanie was involved in some of those interviews where we said to the researchers, okay, what are you actually looking for? And they're all ideas. They're all concepts. None of them were the artifacts of the research process that we were talking about before, the grants and the institutions and the people. Well, the people to some extent; I'm being a little bit poetic there.
But these are real. We've also had Ming Feng from our team, whom you should know; if you've never met Ming Feng, you should. She's been doing analysis over the actual queries. When people come to it, we've got the logs that we're constantly analyzing. So people are looking for recordings of the Latin language. They're looking for spatial biophysical intertidal and subtidal data sets. That's what's making the scientists run and, not surprisingly, it's the ideas. Our sector is not an ordinary production sector. It's the ideas which are at the core.
So, again, the developers are saying, well, it's still not easy. Take the concept of intertidal, coming to us from industry, research facilities, institutions and governments. It's exactly the same problem we had with Brian Schmidt. It comes to us in all these very... In fact, I've probably got a slide. I should keep up with myself, otherwise... How am I going for time, Steve?
So we had exactly the same problems trying to bring together ideas across all of that distributed system. You get the same kind of thing: the concept of intertidal is siloed in organizations. It might be defined, or it may not be defined, or it may be poorly defined. It's certainly not connected to the other ideas. And even if we were talking about the same idea, being able to actually say that precisely by referencing it was not possible. So, funnily enough, that's why I say they are related, but not exactly.
So a vocabulary service, from our point of view, was an initiative to have the concepts out there, shared as a community resource, not siloed back in the institutions or in the tables of databases all over the place; to have them precisely defined and connected to other references; and to provide consistent referencing. Now, my model breaks down a little bit there, because the consistent referencing is done with IRIs. We've got Richard up the back here; everyone should know Richard as well, from our team, who is working on both our vocabulary service and a new persistent IRI service to support linked data URLs. So the three-part model that I introduced at the beginning still holds, but they're not necessarily separate things.
I've introduced the vocabulary service in the context of the story of the catalog. And certainly vocabularies, terminologies and precise semantic referencing do have relevance for data discovery or other resource discovery across the research system. But it really comes into its own in data integration. We did talk about data commons at the beginning; I'm going to refer to them sporadically throughout the talk. Here's something we're doing with the social science data commons program in Australia. It's a napkin diagram from the God's Cafe seminar series that Steve and I were having every Tuesday over coffee.
We were talking about a problem that is relevant right across research, but here I'm just giving it some context in social science. Think about a big national data set, for example a census, or the census in Australia, and then think about a local data set, let's say a survey of Cabramatta. There are scientific concepts that are being surveyed in both. Do we know whether we can integrate the two? If I've asked a question about employment or social attitudes or something, do I know that that's the same one that was in the national data set? Well, possibly, but not always. Quite often the ideas that are in these national surveys, and I've just given the census as an example here, I'm not saying anything necessarily about the Australian census, are hardwired back into some big database system as the column headings or keys or something, and the definition that they're using is never really exposed for anyone else to use.
Same thing on the local side: a bright spark researcher says, I'm going to ask a question about that and it's going to be great. And then their idea goes hardwired into the survey. And if a third party were to come in and ask whether they're talking about the same thing, they probably wouldn't know, nor would the two parties.
So the idea of a vocabulary service in this context is to provide an independent point, a community-supported reference, where the research repository, the software provider of the survey tool and the researcher doing the survey, as well as the national body, are all working with community-enabled concepts. And then we can know that if I've asked a question about employment, my data set is going to be so much more valuable, because I'll immediately be able to draw conclusions from population trends.
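The census and local survey scenario can be sketched roughly as follows. This is a minimal illustration, assuming a shared vocabulary that publishes each concept under an IRI; the `https://example.org/...` IRIs, the column names and the definitions are all hypothetical, and real services would typically publish the concepts in a standard such as SKOS rather than a Python dictionary.

```python
# A tiny stand-in for a community vocabulary: concept IRI -> label and definition.
# All IRIs and definitions here are invented for illustration.
VOCABULARY = {
    "https://example.org/vocab/employment-status": {
        "label": "Employment status",
        "definition": "Whether a person is employed, unemployed or not in the labour force.",
    },
    "https://example.org/vocab/age-group": {
        "label": "Age group",
        "definition": "Age of a person in five-year bands.",
    },
}

# Each data set maps its own hardwired column names to a shared concept IRI.
# None means the concept was never published anywhere it could be referenced.
national_dataset = {
    "EMP_STAT": "https://example.org/vocab/employment-status",
    "AGE5": "https://example.org/vocab/age-group",
}
local_survey = {
    "q7_job": "https://example.org/vocab/employment-status",
    "q9_attitude": None,
}


def shared_concepts(columns_a, columns_b):
    """Return {concept IRI: (column in a, column in b)} for concepts both reference."""
    by_concept = {iri: column for column, iri in columns_a.items() if iri}
    return {
        iri: (by_concept[iri], column)
        for column, iri in columns_b.items()
        if iri in by_concept
    }


for iri, (col_a, col_b) in shared_concepts(national_dataset, local_survey).items():
    print(VOCABULARY[iri]["label"], "->", col_a, "and", col_b)
```

The column names `EMP_STAT` and `q7_job` share nothing as strings; the only thing that tells a third party they are the same concept is that both point at the same published reference.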
And so when it gets to the research repository, they'll know what to do with it, because they know what that concept is. So that's the concept that we're building within the social science data commons, and there are quite a few projects around that at the moment.
Here is another one. Dougie, thank you. We've got a people data commons, which is to do with health. And here's an example where people are using standard terminologies. The examples here, I just stole this from a Google search, are SNOMED CT and the NCI Thesaurus. But the kind of use case that we're working through with Dougie is that New South Wales Health and Queensland Health have both implemented standards, but in slightly different ways. And then every time a health research study has to happen, the researchers are doing the mappings themselves. You can't convince Queensland to have an official mapping to New South Wales, or New South Wales to have an official mapping to Queensland. It's none of their business, but the researchers have to keep doing it. So part of the work we're doing with Dougie is to look at whether we can make these mappings something that could be part of our vocabulary services. So already we're moving into new areas around that.
For both of those, I've got to say, once you've got it on a slide it looks like ARDC are doing that and have solved it. No, we are working on the scenarios with both those communities: social science was the earlier example, and here we're working with the health studies research community.
Now, changing gear here. When we were talking about data integration in the social science example, you've got a domain that's working on a particular problem, and they can work together and say, gee, it'd be great if we all collaborated. What happens when you get to impactful research? Remember, that was the title we were given: knowledge infrastructure for impactful research.
Impactful research almost always starts with a problem and doesn't obey discipline or administrative boundaries. It says, okay, what are the social determinants of health? And then the poor researcher has got data coming in from education, health care, the urban environment and employment. We found the same in our bushfire program, in order to understand development risk. What's the risk of developing a suburb in the north of Canberra? Well, that depends on fuel loads and fire behavior. It depends on the environment itself. It depends on air quality when there are fires, and then on the different health repercussions of the fire. So in order to ask just a simple question, what's the impact of having a suburb in that spot, which is fire prone, the poor old researcher has got data from a number of different domains. Remember, Steve is getting this sorted out for a social scientist doing research within social science and integrating data with other big social science things, but now that researcher is ending up as part of a multidisciplinary community.
Data integration at this level, and I'm using real shorthand here, quite often starts with saying, well, it's happening in this place and at this time. So there's an initial data integration between, let's say, the air quality data set and the health data set by asking: when was it, and where was it? What was the air quality like at that point, and what were the health outcomes, the hospitalizations or asthma? So the initial linkage is done on time and place. But then, is that the data integration done? No.
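As an aside, the initial linkage on time and place can be sketched in a few lines. This is a minimal illustration only: the field names, the place label and every value below are invented for the example and are not real air quality or hospital data.

```python
# Invented illustrative rows; not real measurements or admissions.
air_quality = [
    {"place": "North Canberra", "date": "2020-01-05", "pm25": 180.0},
    {"place": "North Canberra", "date": "2020-01-06", "pm25": 35.0},
    {"place": "North Canberra", "date": "2020-01-07", "pm25": 12.0},
]
hospitalizations = [
    {"place": "North Canberra", "date": "2020-01-05", "asthma_admissions": 14},
    {"place": "North Canberra", "date": "2020-01-06", "asthma_admissions": 6},
]


def link_on_time_and_place(left, right):
    """Join two lists of rows where both the place and the date match exactly."""
    index = {(row["place"], row["date"]): row for row in right}
    linked = []
    for row in left:
        match = index.get((row["place"], row["date"]))
        if match is not None:
            # Merge the two rows; the shared keys carry identical values.
            linked.append({**row, **match})
    return linked


linked = link_on_time_and_place(air_quality, hospitalizations)
for row in linked:
    print(row["date"], row["pm25"], row["asthma_admissions"])
```

The join itself is the easy part, and it only works here because both sides happen to write place and date the same way. Nothing in the linked rows says what counts as an admission or how the air quality reading was taken.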
Then you've got correlates in two totally different domains. Health data is collected, and the concepts in health data are managed, in a totally different way to the way that the air quality people collect their data. So you've got a decrease in air quality and an increase in hospitalizations, and you've got researchers trying to work across those two different areas now. So, if we go back to right at the beginning, we're now trying to get concepts not just from different organizations, but from different conceptual communities. And all the stuff we said in the beginning about making the assumptions clear, the definitions and the references, applies at a degree of difficulty that's just a lot greater. We can't just rely on the assumptions: oh, when I said hospitalizations, surely you understand what I mean. No, we've got to bring those out into artifacts that will allow scientists from different domains to take those concepts and work with them as well. So it's all the more important to take care of our scientific concepts in these semantic services. I think we'll hear more about this from Simon tomorrow, Simon Hudson, but the importance of semantic infrastructure to support this kind of interdisciplinary research can't be overstated.
I should stop there, but I'll taper off instead of stopping at the big bang of bushfires and world peace, etc. We'll go back into some knowledge infrastructure stuff. As part of the knowledge infrastructure, we need to curate the knowledge, so there's another level of service and infrastructure required, because, first of all, we're talking about words and concepts and things, and they are as slippery as mercury.
So what is anger? We're trying to take what is a slippery, quite subjective communications concept and use it as a precise scientific tool in a semantic infrastructure, which is okay. But if we just leave it, it's going to be like mercury and it'll go all over the place; the assumptions about the concept will change. And so it needs definition. It needs leadership and governance from a group of leaders within research and government to curate it: this is a concept, this is what we mean, and you can use it in public administration or research. It's not like a telescope. You put a telescope out and it doesn't start changing on you, morphing and turning into something else, the way that human concepts do.
For example, here, that's what knowledge was a certain time ago. So knowledge also grows. That's why it needs to be curated. Our knowledge organization systems need to deal with growth in science, with whole new fields of science that just appear over time, and with the normal tendency of a concept to be clear in my head but not exactly as clear in your head. So all this just means that governance in our kind of infrastructure is very, very important. It's about community, about defining our community, and having the governance over the semantic content. So what does that mean for infrastructure? There's a role in the infrastructure to help to maintain, to be really prosaic about it, versioning, profiles and the different artifacts that curate the knowledge over time.
And then the last one. Of course, things get more and more complex. We started with vocabularies, but there's a complexity of semantic approaches that are portrayed in this graph here.
We started with the vocabulary service because that's a very concrete list of things that come into everyday science. And of course, those lists tend to want to be coalesced into more comprehensive world views, where the lists become classes of things and those classes are then related to each other in a much more comprehensive view, either an ontology of the whole world or a model for talking about a broad area of science. So we have a vocabulary service. And part of our challenge now is: as the information science becomes more and more embedded into the rest of the research sector, how should our vocabulary services evolve? What would this look like in the future, where we're not just talking about vocabulary services but semantic services? That's why we are part sponsors of this symposium today: for us to be informed about where the scientific initiatives are going, what information science is required to support those, and what the new directions are for vocabulary services that could more broadly support the semantic needs of our leading-edge research. Thank you very much.