Okay, I'll get into the project in just a second, but it's the Longitudinal Spine of Government Functions, that's the project name, and essentially we're linking functional classifications of government for a few purposes. Okay, I'm going to progress. There we go. So, a very quick outline. I'm going to talk about the goals of the project and the sources of data, and that's when I'll bring David in, because the National Archives is one of the sources. Then approaches to concept mapping, and then a couple of other interesting bits about mapping governance and some technical systems that we're using. I've got a bunch of additional slides at the end of the presentation if anybody is interested in more detail about, in particular, term versioning and how to actually create RDF from non-RDF sources. Oh, and I should mention that with me in the room here in Brisbane, not that you can perhaps see him, is Jake, who's a student from Griffith University doing all things Semantic Web. Okay, so very quickly, the Longitudinal Spine project's goal is to improve queries about federal government structure and the functions of government structural units. Some example queries that are possible now, but that we want to make more effective, quicker to execute, and so on, are queries like this. After a particular change in government, list all the matters and their responsible portfolios. A matter is a functional application of government according to the Administrative Arrangements Orders, which are a government-issued instruction on the structure of government. A second query might be: find all the government portfolios responsible for a particular matter over a 10-year period. So again, this is entirely possible now, but it is very difficult and takes a lot of manual work.
And then finally, another kind of example query is: find the National Archives of Australia's record series, or multiple series, for agencies carrying out the function Veterans' Entitlements, which might also be called War Pensions in a different vocabulary, or perhaps a narrower term, Seamen's War Pensions. So there are different ways of referring to that same term, and we want to get all the results. So who's involved? Currently, the project is a Platforms for Open Data, DIPA, project, all these acronyms, but they're spelled out there. The project is the technical partner and the leader of the project. The clients and partners are the Department of Finance and the National Archives; both are interested in the results of the project, and both hold significant sources of data that have been dealt with in the project. Why is this a vocabularies project and a semantics project? Well, some of the functions in government are already listed as semantic vocabularies, and there are a lot of term overlaps and definitions that need alignment and so on. Okay, so very quickly, the framing of the data used in this project: we've got government entities, so these are agencies, they have different names in different datasets, but, you know, units of government, and they're assigned government functions. The functions can be legal functions, they can be policy, there's a whole range of ways you might establish these functions and describe them. Both the government entities and the government functions change over time, and that just makes things difficult, but that's the way it is.
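To make that framing concrete, here is a minimal sketch in Python of entities holding functions for a period of time, together with the second example query (portfolios responsible for a matter over a window of years). The class name, field names, portfolio and matter labels, and dates are all invented for illustration; they are not the project's actual data model or data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Responsibility:
    """One portfolio holding one matter for a period (end=None means still current)."""
    portfolio: str
    matter: str
    start: date
    end: Optional[date] = None


def portfolios_for_matter(records, matter, window_start, window_end):
    """Return portfolios responsible for `matter` at any point in the window, ordered by start date."""
    hits = [
        r for r in records
        if r.matter == matter
        and r.start <= window_end
        and (r.end is None or r.end >= window_start)
    ]
    return [r.portfolio for r in sorted(hits, key=lambda r: r.start)]


# Invented sample data, for illustration only.
RECORDS = [
    Responsibility("Portfolio A", "Matter X", date(2001, 1, 1), date(2007, 12, 2)),
    Responsibility("Portfolio B", "Matter X", date(2007, 12, 3), date(2013, 9, 17)),
    Responsibility("Portfolio C", "Matter X", date(2013, 9, 18)),
    Responsibility("Portfolio A", "Matter Y", date(2001, 1, 1)),
]

print(portfolios_for_matter(RECORDS, "Matter X", date(2005, 1, 1), date(2014, 12, 31)))
```

The point of the sketch is only that both the entity and the function carry dates, so every query has to test for overlapping periods rather than simple equality.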
Now, the datasets that we're using in particular in this project: there's the Australian Government Organisations Register, AGOR, that's Department of Finance; the collection of Administrative Arrangements Orders over time, which I guess are issued by Prime Minister and Cabinet, I'm not quite sure who owns them per se, but the National Archives have got them all listed as PDF documents, and we've text mined those; the Commonwealth Records Series database, which contains information about Commonwealth persons and agencies, their changes over time, and the records that they've produced, so you can find things like all of the documents that John Howard made, and also all the documents that pertain to particular functions over time; and then finally, budget papers, which include instructions on how government spends money in functional units. Okay, so many of the data sources that we're using are themselves already public. For instance, the AGOR information is publicly available at directory.gov.au, and on the left there you see a web page for CSIRO as described in the directory.gov.au system, and on the right, in the purple text box, you can see that you can get a machine-readable export of that information from data.gov.au. There are other ways to access the other datasets, but most of the information we deal with is public already. Now, the vocabularies of government functions: there are things like the Australian Governments' Interactive Functions Thesaurus, AGIFT, from the National Archives. There are a couple of international ones, so COFOG is an international government functions classification vocabulary.
Some internal ones, like the Bureau of Statistics' Government Purpose Classification, and then the National Archives also have additional systems such as the CRS Thesaurus, and then there are things which are not always thought of as vocabularies of functions, but sort of are for our purposes, such as government expenses listed by purpose. And then there are some derivative ones like COFOG-A, which is an Australian Government version of COFOG. Okay, now I'm going to hand over to David at National Archives, and the way I'm going to do this is I'm going to bring up his presentation and see if I can click to the next slide for him as he tells me to. All right, thank you, Nick. Once I can, David, I'll let you know when we're on screen. Got a problem? I think we're getting it, yes. Now, so Richard, oh, sorry, not Richard. Rowan, can you just tell me if you can see David's presentation on the screen? Yes, yes, that's fine, Nick. Okay, great. All right, David, we're looking at the first slide, so fire away. Okay, so you should see a slide for the CRS system, with a very basic box diagram. Is that the one you've got, David? It is now, yes. Excellent. Okay, so one of National Archives' fundamental aims is to make the National Archives... Just one moment, David. In the Canberra office, we're just seeing the first slide. All right, let me reshare. I know, I've seen this issue before. Okay, I'll stop and reshare. Rowan, let me know if... Yes, we can see a Record Search screen now, the CRS system. Great. Excellent. Okay, good. All right, so one of National Archives' fundamental aims is to make the national archives accessible to the public. To support that, we chronicle the structure and functions of the Commonwealth of Australia by documenting the government at a snapshot in time, and then the changes to those structures and functions over time. And then we link it all together by recording the entities of creation.
That's the government agencies, organisations and Commonwealth persons. We link these to the records that they create, in accumulations that we call series. They're effectively information management units. So we've got series, and they're made up of individual items. They can be digital objects, they can be records, they can be files, pretty much anything, really. This underpins one of our fundamental archival practices, which is the documentation of provenance to authenticate the information we look after in the public interest. Now, the CRS system, which is more broadly known as the Australian series system, is able to cope with frequent and extensive administrative change by keeping these entities, as you see in the diagram, separate: the entities of creation are kept separate from the units of information that they create, and those entities are then linked together over time. This is actually quite different to how archives and public record offices operate elsewhere across the globe, and we're slowly bringing the globe with us. It's been 50 years in the making so far and we're still getting there. So this means both the agencies and their records can be linked across time, and we can accurately document the changing relationships between the agencies, between the information management units, the series and the items, and between the agencies and the series. So that's a very broad and brief description of the entire system. There are a couple of links provided in the presentation to more information if you're interested. So one of the simplified containers, as you see in that box, is function of government, and this has been represented in a number of ways by National Archives, each evolving for quite different reasons. Okay, so on to the next slide, Nick. CRS Thesaurus. Okay, you're there. Cool. Right, so the thesaurus is pretty much a standard thesaurus which describes government functions.
It was developed to aid searching methodologies and tools for these provenance and records linkages. It was first introduced in 1991 as part of the finding aid suite that we had at the time, and it was last updated in 1999. It's a list of what were then contemporary, so mid-90s, broad and narrow terms that reflect the major functions and activities carried out by Commonwealth Government agencies from 1901 to the mid-90s. Okay, that's all I was going to say about the CRS Thesaurus broadly. You'll see on the slide there how it's accessible to the public through our search engine, called Record Search. You can search on terms there, and they link to agencies when you click on the relevant terms, and from there you can link to the records themselves. Way to go. There is also a list of these functions in summary form, and I'll provide a link to that, available on our website. That's pretty much it, really. The main strength of that thesaurus is that it provides some historicity to what it's describing, so it goes all the way back to 1901, as we found later, and I'll get to that with AGIFT. There are a number of terms in there which are effectively no longer used in government, but that's valuable because of how government works. You get language shift, linguistic shift, and the boundaries of what functions actually are shift as well, so even though it's dated, it's still very useful. Okay, on to the next one. Records authorities, agency-specific functions. Okay, so part of our work at National Archives is regulating the Archives Act permissions that allow government agencies to retain or destroy government information. We call these permissions records authorities, and they include the retention and maintenance of national archives records as well. On the slide there, you'll see an example of one records authority. At the bottom of that page, you'll see the records covered.
That's practically a functions list for that agency at the time it was issued to that agency. The functions have been used to classify agency business from around 1999, and we revised that methodology from around 2007 by effectively simplifying it, but we still have these functions for each of these records authority permissions. These functions are tied to the agency that the records authority is issued to at the time of issue. So whenever there's a machinery of government change, and with more changes in government over time, we tend to lose a bit of track as to which agency is responsible for what. This is the nature of how we do the permissions under the Archives Act. The business relevance of records authorities is dictated by the speed of change of the agency's business, and of course significant change triggers the review process to update the permissions that are embedded in the records authority. You'll also see on that slide there are some links to AGIFT terms. This is the place where we link records authority functions to top-level AGIFT terms. So that's a useful linkage to be made in terms of what AGIFT is for, but I'll get there in a second. So the third dataset that we have is AGIFT itself. That's on the next slide, Nick. Okay, so I'm assuming some of you will be more or less across what AGIFT actually is. I've provided some links that give more explanation about that, but this is one of our more recent products. It was initially developed in 1999, and for those of you who were around then, that was part of the Howard Government's initiative to make everything digitally available for government services. So there was a thing called the Australian Government Locator Service, AGLS, which is now an Australian Standard. It was made an Australian Standard in 2010, and that had classifier elements for subjects and functions. There wasn't any particular distinction made between those two, but they were separate entities. AGIFT is the functions part of that.
For those of you who are familiar with the library world, they developed at the time something called TAGS, the Thesaurus of Australian Government Subjects. Anyway, that was pretty much as old as the CRS Thesaurus. So getting back to AGIFT, it describes the high-level business functions carried out across Commonwealth, state and local governments in Australia, and provides standard terms for government agencies to use as part of the AGLS system. So it's a resource locator aid. It improves the discovery, visibility and accessibility of online government resources. It was significantly revised in 2005, and we provided it in a much more consumable form online a couple of years ago. National Archives joined the Longitudinal Spine project as a means of not so much keeping these vocabularies current, but being able to link them together and across other government vocabularies to enhance our existing accessibility roles across time. In this space, in terms of changes over time, National Archives have realised we're pretty much the only government agency that actually pays attention to the changes over time and has actually documented them in some shape or form. That's particularly the CRS system. These vocabularies that hang off it to a certain extent are what we have, but we know they are limited, and we want to be able to link them across government. This spine project is providing a very exciting means of enabling that across the data systems right across government. We're starting with the Finance datasets, but there are many more out there. Our principal source of authority for most of this stuff starts with the AAOs, that's the Administrative Arrangements Orders. Again, they've been manually analysed because there was no other way of doing it. The spine project is starting with the AAOs to get them more machine-consumable and make them more accessible. This is really good, and linking our vocabularies across the spine is going to help us in our work and across government as well.
What we would like to do is maintain these datasets that the spine project is generating, to continue to make them available for reuse in a machine-usable form, which, of course, they're not at present because they're just sitting on our website, apart from AGIFT, which is available in a number of different formats, as you'll see when you follow the links. That's pretty much all I wanted to say at this stage. Back to Nick. Thanks, David. I'm going to do the switcheroo back to my presentation, folks. Let's see. I have to stop sharing and reshare very quickly, but it should be back in just a moment. Okay, so, Rowan, please yell, scream, or otherwise tell me whether you can see my presentation. Yeah, we can see it, but not in presentation mode. All right. That will be kicking in about now, hopefully. Yeah, that looks good. Thanks, Nick. Okay, so I'll just zoom back to where we were. All right. Okay, so you've got an overview of an organization there that's got a lot of vocabularies, some of which are in the forms that we like to use and others which are not. But before we get, very briefly, into some technical details about fiddling around with formats, let's talk about approaches to concept mapping generally. Okay, so take vocabularies that are stored in a semantic form, and the particular semantic form that we mean is SKOS, which I think is a format familiar to all. When I say semantic form, I mean a form where the individual elements of the vocabularies have identity and there are named relations between the parts. So if we've got two vocabularies in a SKOS form or similar form, we can assert mappings directly between them; you could call it deliberate mapping. So we can say, you know, defence in one functional vocabulary is equivalent to, or is an exact match of, defense in another one, or something like that.
And when we do that, we get inferred mappings as well, where, you know, subterms of defense in one become subterms of whatever the defense equivalent is in the other. For the kinds of vocabularies we're dealing with in Longitudinal Spine, this is the main way we're going to do mapping, at the high level, I should say, because we're only talking half a dozen or a dozen vocabularies of tens to hundreds of terms. They're not vocabularies of many thousands of terms. So we have, or we will make, deliberate mappings between these vocabularies. The second way we might do vocabulary mapping is a more interesting way, and a way Long Spine is set up to operate. Where we have datasets, such as collections of government organizations, that are classified using both classification X and classification Y, we can start to infer mappings between those vocabulary terms. David showed a little bit of that before. There was a deliberate mapping between the lower-level records disposal authority functions and the higher-level AGIFT terms. So that's a deliberate mapping, but the dataset that contains the total collection of agencies and their record series and so on contains both those deliberate mappings and these sort of indirect mappings, where different classification systems have been used to classify the same dataset over time. And so you can see how we could say that, you know, 80% of the time, defense in the blue column equates to defense and other things in the red column. That kind of mapping is emergent from the data when we've collected it in the way we're planning on doing. Okay. Now, mapping governance. All right. So the sources of the data that we're using, the vocabularies themselves, are fairly uncontroversial. As I indicated before, most are already published. Most have single owners.
So the National Archives have this dataset, Finance have this dataset, et cetera. But the mappings between these datasets are potentially contentious, because people can have different interpretations of how terms map. And these are necessarily multi-owner things when you're doing a mapping, well, not necessarily, but often they are, where you've got two datasets, you're doing a mapping between them, and each dataset has a different owner. So what we're doing, and we're doing this in many projects, is publishing mappings between standalone datasets as individual datasets themselves. We call those link sets. So they're specialized datasets that just do mappings. Link sets are used instead of mappings within a vocab. So we don't have a vocabulary of terms and then within it mappings like, you know, this term matches a term in another vocabulary. Instead, the vocabularies are kept standalone. That's not always the case. Some of them already contain mappings. That's fine. But when we're creating new mappings, we put them in separate link sets. Then the users of the project's data are able to include or exclude individual link sets from analysis as they go. So they can say, I want to do such and such a query, oh, and by the way, I prefer to use these link sets rather than those ones, something like that. Now just a small indication as to where these link sets come from. The LocI project, another linked data project, which is not specifically vocabulary-based, has a concept of link sets too, which link spatial datasets together. Here you see LocI's overarching model. If you could be bothered to follow all the arrows, you would see that it's got datasets and it's got link sets. Link sets are kinds of datasets. It doesn't say it in words, but the link sets are for mapping between other datasets. And we're reusing that same link set concept.
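As a rough illustration of the link set idea, the Python sketch below keeps two vocabularies standalone, holds mappings in separately named link sets, and lets a user choose which link sets to apply. It also applies the inference described earlier in the talk (narrower terms of a matched term end up under its match) as a simple project-level rule; that rule is an assumption of this sketch, not something SKOS itself entails. All term names, link set names, and the `narrowerThan` label are invented.

```python
# Standalone vocabularies: term -> list of narrower terms (invented terms).
VOCAB_A = {"a:Defence": ["a:Army", "a:Navy"]}
VOCAB_B = {"b:Defense": ["b:Procurement"]}

# Link sets are separate datasets that contain only mappings,
# as (subject, relation, object) triples. Names are hypothetical.
LINKSETS = {
    "linkset-1": [("a:Defence", "exactMatch", "b:Defense")],
    "linkset-2": [("a:Navy", "exactMatch", "b:Procurement")],  # a contentious mapping
}


def mappings(selected):
    """Collect asserted mappings from the chosen link sets only."""
    return [m for name in selected for m in LINKSETS[name]]


def inferred_narrower(selected):
    """For each asserted exact match A = B, place A's narrower terms under B."""
    inferred = []
    for subj, rel, obj in mappings(selected):
        if rel == "exactMatch":
            for child in VOCAB_A.get(subj, []):
                inferred.append((child, "narrowerThan", obj))
    return inferred


# A user who distrusts linkset-2 simply leaves it out of the analysis.
print(mappings(["linkset-1"]))
print(inferred_narrower(["linkset-1"]))
```

Because the mappings live outside the vocabularies, excluding a contested link set changes the query results without touching either owner's data.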
We have different types of information that we're linking between compared to the LocI project, but the functional place of a link set is the same. And what's inside a link set? I'm not going to explain this in great depth, but you see here a series of four little chunks of information. Each of those is a link, expressed in an odd way, between two items, with qualifications on that link. And that's what link sets contain: long lists of these kinds of mappings. Now this particular example is from the LocI project, but this is what a link set looks like. There's a written description of what it is, and there's a bunch of files which we've published which actually contain those mappings. The LocI project has published nine or so link sets. There's a set of them that you can see there. The Longitudinal Spine project is going to publish a certain number of link sets. We're not quite sure how many yet, but it's wherever we've got an assertion of mappings between two functional vocabularies or datasets. Okay, now I'm going to whip through this very quickly. If people are interested, I can stay on the line and just go to town on the technical information, and I've got a series of slides after the end of this presentation that contain more technical goodies. So, a quick note on vocabularies that are not themselves already in RDF or semantic form. We just go through a little process of getting them there. We've just done this with the CRS Thesaurus that David mentioned, where we've taken the concepts, which are available online but are also available in a relational database table, and we've dumped them into Excel. We've used formulas to fiddle around with them and to extract bits and pieces. We've dumped that to a file, and then we've validated and normalized that file. Again, if you want to see the actual steps laid out, I can show them at the end.
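The talk describes doing this conversion with Excel formulas; as a stand-in, here is a minimal Python sketch of the same idea, turning tabular rows into SKOS concepts written as Turtle text. The column names, the sample rows, and the `http://example.org/vocab/` base IRI are assumptions for illustration, and the validation and normalization steps mentioned above are not shown.

```python
import csv
import io

# Invented sample export: one row per concept, with an optional parent id.
CSV_TEXT = """id,label,broader
1,Defence,
2,War pensions,1
"""

BASE = "http://example.org/vocab/"  # placeholder base IRI, not the project's


def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_turtle(csv_text, base=BASE):
    """Emit one skos:Concept per row, with a prefLabel and optional skos:broader."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for row in csv.DictReader(io.StringIO(csv_text)):
        parts = [
            f'<{base}{row["id"]}> a skos:Concept',
            f'    skos:prefLabel "{escape(row["label"])}"@en',
        ]
        if row["broader"]:
            parts.append(f'    skos:broader <{base}{row["broader"]}>')
        lines.append(" ;\n".join(parts) + " .")
    return "\n".join(lines) + "\n"


print(rows_to_turtle(CSV_TEXT))
```

The output would still need to be loaded into an RDF parser or validator before being trusted, which corresponds to the validate-and-normalize step in the talk.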
We're using a single triple store, which is an RDF database, to contain all the datasets and vocabularies implemented in the project. This makes querying easy. We're using the GraphDB product. There are many different open source and commercial triple stores out there. GraphDB has a free edition and it also has a commercial version. We just try different products from time to time, and in this project we're using GraphDB. That's what it looks like. This is a graphical interface to manipulate or manage the data. What you see there is a series of individual graphs of information that are stored within the database. You can think of a graph as like an individual schema within a total database. If you look carefully, you can see things like AGIFT, CRS, et cetera in there. We're making a query UI to grant technical access to all of the information that Long Spine generates. It looks something like this. You can drop SPARQL queries in there and access all the data in all the graphs that we just saw in the last couple of slides. We're developing example queries now so that people can know the kinds of queries we think they might want to run, and of course they can innovate on top of that and make their own. And finally, we're now moving into building specialized clients for exploration and visualization of the data. This is just one that comes out of the box with GraphDB, but it's the kind of thing that Richard spent a lot of time building for ARDC to let people click through complex structured data. Future work: data mining across our datasets and vocabularies to try and establish some of those statistical links that will emerge; making example queries; client development; and then ultimately tools to allow other people to create link sets across government functions and organizational units, so that we don't have to do it for them every time and they can do it themselves.
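As a sketch of what those statistical links might look like, the Python below counts how often a term from one classification co-occurs with a term from another on the same entity, and proposes a candidate mapping when the share reaches a threshold (80% here, echoing the figure used earlier in the talk). The entity names, terms, and function name are invented, and this is only one plausible way of doing such mining, not the project's method.

```python
from collections import Counter, defaultdict

# (entity, term in scheme X, term in scheme Y); invented for illustration.
CLASSIFIED = [
    ("agency-1", "x:Defence", "y:Defense"),
    ("agency-2", "x:Defence", "y:Defense"),
    ("agency-3", "x:Defence", "y:Defense"),
    ("agency-4", "x:Defence", "y:Defense"),
    ("agency-5", "x:Defence", "y:Security"),
    ("agency-6", "x:Health", "y:Health"),
]


def candidate_mappings(rows, threshold=0.8):
    """Propose (x_term, y_term, share) where y_term covers at least `threshold` of x_term's uses."""
    counts = defaultdict(Counter)
    for _entity, x_term, y_term in rows:
        counts[x_term][y_term] += 1
    candidates = []
    for x_term, y_counts in counts.items():
        total = sum(y_counts.values())
        for y_term, n in y_counts.items():
            share = n / total
            if share >= threshold:
                candidates.append((x_term, y_term, share))
    return candidates


for x_term, y_term, share in candidate_mappings(CLASSIFIED):
    print(f"{x_term} -> {y_term} ({share:.0%})")
```

Candidates produced this way would be suggestions for a human to confirm and publish as a link set, rather than assertions in their own right.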
Okay, so that's the formal end of my presentation.