I'm not actually going to be too technical; I'm still looking at it from the business perspective, so it's only when you hear Andrew and Xiaobin that it will really get technical. This is what I'm going to talk about for the next 20 minutes or so: ISO 2146, the standard that our schema is based on, a bit about how it works, what's in it, and how it can be used to share metadata with ANDS. We've already talked about this, obviously. This is our goal: a great big global commons with information and data in it. We're mostly interested in the data part, and how we're doing it at the moment is by sharing metadata. The reason that we have a little model here is that I was asked to draft a first information model for ANDS, which as yet is still a work in progress, but this was where I started. So inputs, activities, outputs, which is a modelling 101, system design 101 type thing. I thought this is really our focus, what we're interested in. This, on the other hand, is Sally's personal picture of the ISO 2146 information model. As you can see, it's got the four main entities that the standard says it's interested in, which I'll tell you about. So you can see down there, there's the collection, which is our key focus; the activity that caused the collection to happen, so that would be a program or a project; the parties, the people or organisations involved with either the activities or the collections; and the services, excuse me, through which the collections become available. So those four entities are described in that standard, which ANDS has used as the basis for our schema. And it's very important to notice the arrows between them: those relationships are the most important part of this schema, I think, from one perspective. And the reason this model was chosen was we've heard people talk about their particular discipline projects, their data capture projects.
Every single one of those will have either some discipline-specific way of organising information, or some custom one, or some proprietary one, depending what software they have. There are lots of standards as well for organising metadata; we've heard some of those mentioned. So there's SDMX in the statistical world, DDI in the social sciences, and Dublin Core is used a lot in library and archives areas. Obviously, ANDS can't possibly accommodate every single one of those specific schemas and methods of organising information. So what we needed was something that could lie across the top and extract what it needed from those other ones, but without replicating them. And this was the standard that we went with. And in this little picture, which I hope you can see, I've tried to take my system design 101, the three green bits, and sort of munge them in with the red bits, which came from the ISO 2146 standard. So you can see our research activity and their activity fit together. Their collections and our research outputs roughly fit together if we redefine that slightly. Where we run into a few grey areas is in the research inputs, which aren't really going to fit neatly into the ISO 2146 information model. And you can see up on the top left there the public sector administrative data sets. Obviously the government, the public sector, creates a lot of data for itself, for its own purposes. They're not actually always doing research, although sometimes they do. If they do do research, it fits down there in the green bits; but if they've got data that's been created not for a research purpose but for an administrative purpose, that's still of interest to researchers in Australia. So ANDS does have a program of trying to describe that sort of data. So that's that top orange one.
Other research inputs have also come up as things that people would like us to be able to describe, specifically the instruments that are generating a lot of data at the data capture end. Another issue that's come up, and I think was even mentioned a bit today: analytical tools, derived data sets, things that are happening inside that research activity box. From our perspective they're not exactly outputs and they're not exactly collections. So where exactly they do fit I don't really know, and that's my unofficial view. In some ways they could possibly be seen as services, and Nick's been doing a bit of work on looking at what the exact scope of a service might be, and whether some of those things could fit in there. So most of those orange bits, except the public sector data, don't really fit in my sort of unofficial information model, but certainly the research activity and research outputs, in terms of data, certainly do. So I just thought I'd spell out the standard name there. You can see that the standard we've based our schema on was not originally directly focused on our current focus, so we've had to adapt it and implement a schema that focuses specifically on our requirements, rather than implement that standard exactly as it is. So if you go to that standard you'll see it's extremely generic, so it doesn't really describe exactly what we're doing, but the links between the entities are one of the most important aspects. RIF-CS, which as we said stands for Registry Interchange Format – Collections and Services, is an XML schema, and it's not a model that you would probably want to use for a data store; it's more the format we want the stuff that comes out of the data store to arrive to us in. A bit blurry; that should be okay in your packs anyway. It shows the four entities again, collection, service, party and activity, and all the different kinds of relations that can be described between them.
This is really the major benefit of this schema over using something like Dublin Core, which would not allow us to describe all these relationships in this way. I should tell you more about what they are, I suppose. The collections we talked about earlier, research data collections: they might be images, they could be fossils, they could be mud cores, anything that's of value for research but not publications. We have activities, as we said: projects, programs, courses of study. We have parties: that would be researchers, and it could be universities, research institutions. And we have services, originally implemented by ANDS thinking primarily of web services, but as I said we've got Nick looking at exactly how that should work best for us in the future. I guess our basic scenario that we are attempting to describe would be that a researcher gets funded by a funding organisation to do some particular project. They carry out that project within some research institution. At the end comes a big pile of data and probably a publication, and then the collection becomes available through some service. So we'd want to be able to describe all of the things that happen in that scenario, and we'd also want to be able to link off to the publication, although we ourselves aren't describing that. Our interchange schema is just basically a little pile of metadata describing an object, and the object might be a collection, a party, an activity or a service. Monica will talk to us in a minute about the whole party description thing. Originally that was designed to be described within our system; it's now looking like there'll be a far more extended way of describing parties that also involves the National Library doing authority control work and so on. Monica will tell us all about it.
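To make that scenario concrete, here is a minimal sketch of what one of those "little piles of metadata" might look like as RIF-CS-style XML, built with Python's ElementTree. The element names (registryObject, key, collection, relatedObject) follow the published RIF-CS schema, but the keys and values are invented for illustration; this is not a validated record.

```python
import xml.etree.ElementTree as ET

# Sketch of a RIF-CS-style registry object describing a collection and
# relating it to the activity (project) that produced it. All keys and
# values here are hypothetical examples.
root = ET.Element("registryObjects")
obj = ET.SubElement(root, "registryObject", group="Example University")
ET.SubElement(obj, "key").text = "example.edu.au/collection/1234"
ET.SubElement(obj, "originatingSource").text = "http://example.edu.au"

coll = ET.SubElement(obj, "collection", type="dataset")
name = ET.SubElement(coll, "name", type="primary")
ET.SubElement(name, "namePart").text = "Mud core survey data"

# The relationships between the four entity types are the heart of the
# model: this collection is the output of a particular research activity,
# identified by that activity's own registry key.
rel = ET.SubElement(coll, "relatedObject")
ET.SubElement(rel, "key").text = "example.edu.au/activity/5678"
ET.SubElement(rel, "relation", type="isOutputOf")

xml_string = ET.tostring(root, encoding="unicode")
print(xml_string)
```

A party or service record would have the same wrapper (registryObject, key, originatingSource) with a party or service element in place of the collection.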
This is the basic metadata that we're collecting, which your systems are going to have to be able to push out the end if you're going to give us the discovery metadata we want. So the name of the thing; its identifiers; any locations that are relevant, which could be addresses, the location of a metadata record, the location of a data set; very importantly, relations with other kinds of entities as described by their metadata records; spatial and temporal coverage and subject, which are extremely important for discovery and will also, especially the spatial coverage I think, be very important for trying to link data sets together in the future; and good descriptions, including rights information, so copyright or any other rights that apply, and links to related information such as publications. Now obviously some of that would have to be manually created. Some of it might come out of the project proposals that people put in when they get their funding. There are quite a few sources for that information, but it's still a bit of a trick to bring it all into one space. We also need a bit of metadata about the metadata record itself: the group, which we use for display purposes in Research Data Australia; a key, which obviously is pretty critical because that's the thing that identifies the metadata record, so that when you harvest, the new one can replace the old one; and a few bits of information to tell us a bit about the record itself, where it came from, how old it is, what's happened to it. I can also constrain the information by saying what language it's in, or the dates from and to that it was valid. It's all pretty standard stuff, and then because it's an XML schema there are also a couple of elements to do with defining the structure of the information within the document.
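Why the key matters at harvest time can be shown in a toy sketch: a record arriving with a known key replaces the previous version, while a new key adds a new record. The dict shape here is invented for illustration and is not the actual ANDS registry structure.

```python
# Toy model of key-based replacement at harvest time. Records are plain
# dicts with a "key" field; the registry is keyed on that field, so
# re-harvesting a record updates it in place rather than duplicating it.
registry = {}

def ingest(record):
    """Insert or replace a metadata record, keyed on its registry key."""
    registry[record["key"]] = record

ingest({"key": "example.edu.au/coll/1", "name": "Survey data v1"})
ingest({"key": "example.edu.au/coll/2", "name": "Instrument logs"})

# A later harvest of the same key replaces the old record with the new one.
ingest({"key": "example.edu.au/coll/1", "name": "Survey data v2"})

print(len(registry))  # 2 records, not 3
```

This is exactly why the key has to stay stable across harvests: change the key and you create a duplicate instead of an update.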
So if you think of XML, a big document has a thing to mark the start and the end, and then within it, it's got the individual records, which also have to have starts and ends, and I'll show you that when I do the Research Data Australia demo a bit later. One of the first things that has to happen, if RIF-CS is going to be provided to ANDS, is that somebody's got to sit down and look at what metadata you've already got and map it to what we want to have, and this document here, which is on our website, tells you about the semantics of all the elements and the vocabularies. This is where we're going to keep updating all the guidelines about how the vocabularies should be applied, and there would be a full web version, except I haven't quite finished it yet, so at the moment it's available as a PDF, but I'm very hopeful that it's close. Furthermore, our schema, which has been stable for about two years, is now about to undergo some minor enhancements. The spatial and temporal coverage used to be in with the addresses; it's going to have its own element. The related information element, which just lets you point to something outside, will now also have fields that let you say what it is that you're pointing to; so say you're pointing out to PubMed or something, you'll be able to say it's a publication. We'll have a new citation element so that you can put a full citation in for the dataset, and that ties in with what Nick was talking to us about earlier, the citation and the DOIs. And we are aware, of course, that by changing the schema we're changing what people who've already implemented the schema have to do at their end, and so we've got a lot of transitional arrangements in place, and our website's got a whole lot of information about that.
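That mapping exercise can be sketched as a simple crosswalk table. The local field names on the left are entirely hypothetical (yours will differ), and the RIF-CS-side names are rough shorthand for the elements described in the guidelines document, not exact XML paths.

```python
# Sketch of a metadata mapping exercise: crosswalking hypothetical local
# field names (left) to the RIF-CS elements they roughly correspond to
# (right). Both sides are illustrative shorthand only.
CROSSWALK = {
    "title":        "name",
    "handle":       "identifier",
    "url":          "location/address/electronic",
    "keywords":     "subject",
    "abstract":     "description",
    "licence":      "rights",
    "project_id":   "relatedObject (isOutputOf)",
    "time_period":  "coverage/temporal",
    "bounding_box": "coverage/spatial",
}

def map_record(local_record):
    """Rename a local metadata dict's fields to their RIF-CS equivalents,
    dropping anything the crosswalk doesn't cover."""
    return {CROSSWALK[k]: v for k, v in local_record.items() if k in CROSSWALK}

mapped = map_record({"title": "Mud cores", "keywords": "geology", "internal_id": 7})
print(mapped)  # {'name': 'Mud cores', 'subject': 'geology'}
```

The real work, of course, is deciding the right-hand column: that's the sitting-down-with-the-guidelines part, and it only has to be done once per data source.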
On the whole question of how the schema should be changed, and who's allowed to say that it should change, and so on: we've had a lot of people who want to be involved in that process, and ANDS is in the process of setting up an advisory board that will bring all the community input into that process. The basic schedule is that we would only change it annually, not willy-nilly all over the place, because we realise people have to schedule and plan and allocate resources and so on. But I think it's probably reasonable to say that the schema as it was originally developed was, not exactly a prototype, but it wasn't as much informed by experience as it is now. So as we get a lot of people starting to use it, including you people, it's quite likely that the model will need to adapt, and therefore the schema will as well.

Here's a nice pictorial version of what actually happens. You can see in the middle there's our ANDS collections registry, our database (I should have done a server picture, but that's what we got). Research Data Australia, our portal, looks at that for its information. And then down the bottom there are three ways that this RIF-CS information is getting into our database. Over on the left (yes, the left) is Publish My Data, which is for individual researchers, particularly if you had people who were at the end of their career and had to do something straight away. Our preference, however, is to use the things over on the right-hand side, which are institution-based; obviously we'd much rather deal with 50 research institutions than 100,000 researchers, and the chances of getting good quality stuff out are much greater as well. So on the institutional side there's either the machine-to-machine service, or there's also the manual data entry option which Andrew referred to earlier, and that's also available in our sandbox environment for you to play with and probably break. No, you won't break it, trust me. I'm only looking at this from a business perspective, so I
apologise to the people that are technical, because this is obviously something you already know. I created this little picture of how harvesting works for ANDS because I was trying to understand the log entries and error messages that I kept getting when I tried to do stuff, and to work out what was going on. So over on the left you've got our data source, where you've got a little data source administrator sitting there. That data source administrator is the person who determines the whole harvesting process: when it will happen, what will be harvested, and by what methods. So that person configures it; that's step one. Step two: our database instructs the harvester according to whatever the data source administrator told it. The harvester application goes away, gets the metadata from back over there, takes it back to ANDS, and then it inserts that metadata into the registry database, from which Research Data Australia is fed. So the error messages that I was trying to interpret came from each of these six places, but I think it's still quite helpful to see that the data source administrator controls the harvesting process. And this here is the basic process of setting up the feed, from a business level obviously, and where you people are all working is way back here on the left, way to the left, before this arrow even starts really, because first of all we're assuming there is metadata, and that's really where you are. Once there is metadata, it has to be mapped, turned into RIF-CS, exposed for the harvester to get, and then it gets inserted into the registry and made discoverable. So this probably is not the perfect slide for you guys, because I should have another box at the left which says "arrange to capture data and metadata". ANDS has various services for helping you to provide us with this RIF-CS XML: Register My Data, as I discussed before, at the machine level, and Publish My Data, more for the individuals; the identifier services that Nick described; and we also
have a whole lot of web services that you can use to work with your metadata records, and here are some links, which will obviously be in your packs. How am I going? Fast? Okay, so that's the end of that. Does anyone want to ask me anything?
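The harvesting flow described a moment ago (administrator configures, registry instructs harvester, harvester fetches the provider's RIF-CS, records land in the registry that feeds Research Data Australia) can be sketched end to end. Every name here is invented for illustration, and the fetch is a stand-in rather than a real harvest method.

```python
# Sketch of the harvest flow: the data source administrator's configuration
# (step 1) drives everything; the registry instructs the harvester, records
# are fetched from the provider and inserted into the registry. All names
# and the fetch function are hypothetical.
def run_harvest(config, fetch, registry):
    """Steps 2-6 of the flow, given the administrator's config (step 1)."""
    log = []
    log.append(f"instructing harvester: method={config['method']}")  # step 2
    records = fetch(config["provider_url"])                          # steps 3-4
    for record in records:                                           # step 5
        registry[record["key"]] = record                             # step 6
    log.append(f"inserted {len(records)} records")
    return log

# Stand-in for the harvester's fetch of RIF-CS records from the provider.
def fake_fetch(url):
    return [{"key": f"{url}/coll/1", "name": "Example collection"}]

registry = {}
config = {"method": "OAI-PMH", "provider_url": "example.edu.au"}
log = run_harvest(config, fake_fetch, registry)
print(log)
```

Seen this way, the six places an error message can come from fall out naturally: the configuration, the instruction, the fetch, the provider's response, the insert, and the registry itself.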