We talked about collections a little bit earlier, and I'm going to go into a little more detail, because the collections that come out at the end for discovery purposes are something you need to think about in software design. From our perspective we're really interested in collections because of discovery: we want to be able to look for things and find them. In your context they may also be useful for managing the data, and they may also be handy for administrative purposes. So we need to think about what collections are useful for, and that depends on the context. They're just a way of grouping things for your own purposes or, in this case, for our purposes. Our goal is shareable data sets, so we need to think about how we can aggregate things at a meaningful level. Some of our earlier sessions we called boot camps or gumboots, so we have a bit of a fetish going here.
The issue of what a collection is is actually not all that straightforward. This is a definition that has been used; I've just abstracted out the bits that are slightly useful. You can see it's too generic to be of practical value when you're actually implementing something at a software level. Basically anything can be an aggregation. This one's out of the Dublin Core Collection Description Application Profile, but again it isn't going to help us as much as we would have hoped, I think.
We've discovered from working with our early adopters that there isn't actually a single right way of saying what a collection is. So if people were hoping to get a prescription today of what a collection is, I'm afraid that won't be happening. The biggest problem people have had is usually working out what level of aggregation is most useful, both in their own context and for ANDS.
Sometimes those things first appear to be in conflict, but I think mostly they've been resolved after a bit of discussion with our ANDS business analysts, who've worked through the issues with people. Especially these days, if you're talking about data that's in data sets, which I think mostly you guys would be: once they're in a big data store of some kind you can slice and dice them any way, and any one of those slices can result in a collection. Does ANDS want a description of every possible one of those? Well, no, I don't think we do. It's probably going to depend a lot on your subject matter experts, who'll be able to guide you in saying what's a meaningful aggregation. Sometimes there'll be discipline conventions: you know, we always aggregate them in this way. Sometimes it might be part of a policy in your institution: we want to collect them in this way for management purposes, or for funding the storage, or some other issue. The level of granularity that you choose, how big or how small the aggregation is, is always going to have to depend on practical issues, I think, and on what works inside your context. But from the ANDS perspective we're hoping that what comes out the end will be collections that can be described in a meaningful way so they can be discovered, because if you can't discover it you can't ever get it for reuse. Obviously you've got to find it first.
So these are some of the possible ways that you could bring things together to form collections. I won't go into them all. Obviously the one that we like is the nice brightly coloured one at the top: some kind of intellectual theme, subject or topic. We'd like the collections to be brought together in a way that can be described as being about something. So this is a collection of data about something: about a particular research question, about a particular thing that we're looking into.
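As a rough sketch of the slice-and-dice point, and not anything ANDS prescribes, the Python below groups the same handful of data set records on two different bases. The record fields (`theme`, `instrument`), the function name, and the sample values are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical data set records; field names and values are illustrative only.
DATASETS = [
    {"id": "ds1", "theme": "protein structure", "instrument": "beamline A"},
    {"id": "ds2", "theme": "protein structure", "instrument": "beamline B"},
    {"id": "ds3", "theme": "mineral samples", "instrument": "beamline A"},
]


def group_into_collections(datasets, basis):
    """Group data set ids into candidate collections keyed by one field."""
    collections = defaultdict(list)
    for ds in datasets:
        collections[ds[basis]].append(ds["id"])
    return dict(collections)


# The same records, aggregated on two different bases.
by_theme = group_into_collections(DATASETS, "theme")
by_instrument = group_into_collections(DATASETS, "instrument")
```

Here `ds1` lands in one thematic grouping and one instrument grouping, so the same data can sit in more than one collection. Which of those groupings is worth describing is still a call for the subject matter experts.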
I think from the point of view of ANDS and discovery, that thematic grouping is probably the best possible result. You can see all of these others; they may well be relevant in your local environment, and it may well be that the same data set or file belongs in more than one collection, depending on the context it's in. So the whole idea of a collection is a very flexible one.
Also, I think an issue that you guys may find particularly difficult is that from the ANDS perspective we're looking at discovery way, way down the other end of the data lifecycle. You're working at the beginning, where you're first capturing the data, getting excited about automated metadata and that kind of thing. A whole lot of stuff happens to that data, years pass, and way down the end is where we're interested, where people are going to be discovering it for reuse later, quite possibly people from outside the discipline entirely. So what kind of collection is going to make sense from those people's perspective? We're not funded for telepathy, and we also aren't issuing crystal balls, unfortunately. So it's going to be quite tricky for you to try and guess what kind of collection might be valuable in the future to people who may not even be in this exact discipline, and obviously that doesn't always apply, depending on the nature of the data.
Somebody told me about an example of ships' logs from whaling in earlier centuries, where people have gone back through and pulled out all the information about where whales were caught, and from that you can say, well, there was no ice in this location if a whale was there. I'm sure the people who wrote those ships' logs, or who've curated them in some museum, had no idea that this would be a climate change data input way down the track. You can't always predict what sort of uses the data may later be put to.
This is just a brief interlude into metadata quality issues.
Once you've decided what basis you're going to aggregate your collections on, please give each collection a name that means something in relation to that collection, give it a description that is consistent with the name, and in the description please say why you've created the collection as it is. What was it about those things that made them the same, that meant they're in this collection? Sometimes we've seen descriptions that are of the project or of the publication, but we also need to make sure there's a description of the collection itself. Why is this stuff collected into this grouping? Of course the reason we want that is for discovery, as we will keep saying over and over. I'll talk more about some of these other things this afternoon because we're a bit short of time at the moment.
Now I'm going to whip through this, and apologies to the TARDIS people, who probably didn't know I was going to be demonstrating this stuff. I'm doing this because I wanted to show the different levels of metadata that a collection might have and then show how that's been represented in Research Data Australia. TARDIS: we've had people talk to us about it before. It's all about crystallography. Here's the top level of that repository. It's got a couple of experiments there, and experiment is the top category or collection level in this repository. Here's a whole lot of metadata about one of those experiments. I've cut out a bit of the description so I could fit the good bits in down the bottom. You can see it's got a persistent handle and it's also got some data set information. There are two data sets and 127 files inside this collection, as well as this metadata of course. Drilling down a bit, here are a couple of the data sets, and you can see there are more levels down below this.
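To make those levels concrete, here is a minimal sketch of the kind of hierarchy just shown: an experiment at the top, data sets under it, files under those. The class names, field names, handle value, file names, and the 100/27 split of the 127 files between the two data sets are all assumptions for illustration, not the repository's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    name: str
    files: List[str] = field(default_factory=list)


@dataclass
class Experiment:
    """Top-level grouping, treated here as the collection."""
    title: str
    description: str  # should say why these data sets belong together
    handle: str       # persistent identifier; placeholder value below
    datasets: List[Dataset] = field(default_factory=list)

    def file_count(self) -> int:
        """Total number of files across all data sets."""
        return sum(len(ds.files) for ds in self.datasets)


experiment = Experiment(
    title="Example crystallography experiment",
    description="Placeholder: why these data sets were grouped together.",
    handle="hdl:EXAMPLE/0000",  # hypothetical, not a real handle
    datasets=[
        Dataset("dataset-1", [f"image_{i:03d}.img" for i in range(100)]),
        Dataset("dataset-2", [f"image_{i:03d}.img" for i in range(27)]),
    ],
)

summary = f"{len(experiment.datasets)} data sets, {experiment.file_count()} files"
```

The collection-level record only needs the name, description, identifier and counts; the per-file detail stays further down the hierarchy.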
Going down into the data sets, we're getting into the nitty-gritty now, so there's a whole lot of stuff about the actual imaging process, about which I'm absolutely ignorant, but if I was going to reuse this data, obviously this is the stuff I'd need. Does this information need to be available for discovery purposes? That's a very tricky question. It would depend who you were and what you were looking for. From the ANDS perspective we're trying to make it possible for anyone who might be interested in this topic, even if they don't yet know they are, to find it, but obviously a person who's an expert in this area is going to want this low-level metadata.
This is how this record has been created in Research Data Australia. Again I've cut off the top bits so I could fit it all on. In this case the fabulous guys over at the Synchrotron have created a collection which equates to the level of their experiment, so they've aggregated up to an experiment and we're calling that a collection. You can see down here that they've added extra metadata, some subject headings, and links that go back to the actual source at that URL there. I'll talk about this more this afternoon, but I think it's interesting just to look at what they had in their store and then what we are putting in our store. There are obviously many more layers in the local store, and I think that's the kind of issue you're probably all totally familiar with, and you're wondering why I'm telling you it again.
I thought what we might do now is a little bit of an exercise, which Margaret's going to hand out, so you don't have to listen to my not particularly healthy voice right now. There are two case studies on these handouts, so what I'd like you to do is talk to the friend next to you, or the whole row if you like. Just choose one of them; there are questions on there about how you would aggregate these up into collections that we would be interested in. I'll go back to the beginning.
Two case studies: pick one of them, and please look at the question of what you would want to describe in Research Data Australia. How many data sets, and at how high or how detailed a level, would you want to describe in order to make them useful for discovery? Due to other people going so long, Nick and Andrew, we'll only have five minutes to really do this, and then we'll have a five-minute sum-up at the end.
Okay everyone, sorry to cut short this interesting little exercise, but I'm sure you won't want to be held back from your lunch either. Okay, how many people looked at the first one, the plant phenomics exercise? Okay, so how many of you thought one data set or one collection was going to be adequate to describe that? Okay, would anyone like to say how many they thought would be required at minimum? I know there's not a lot of information on one sheet.
Lots.
A data cube.
Okay, so that's of course true, and I'm thinking from the point of view of discovery: if a person has access to the data cube they're home and hosed, but what if they don't know that it exists? How will they find it? So would anyone else like to comment on the plant one? How many collections did you think you'd want to describe, apart from lots?
And people who looked at the social science data archive one, which is actually even more clear-cut, I think. How many people looked at that one, the second one? Okay, how many people thought one description would be adequate for that? And yet the Australian Social Science Data Archive has in fact got it under one, which is what you're reading on this sheet of paper here. You can see that those sub-collections there could easily be seen as useful collections in their own right, out of the context of this, but maybe rights issues would prevent you using them in that context. The collection isn't fixed, you know; it can evolve through time, and as other people mix and match and mash up, new collections happen.
I don't think ANDS has totally got a position on how we want to handle that. The idea of being able to have a big gigantic data cube of every possible piece of data that we can query across, I think that's a technology thing really. At the moment we're stuck in a relatively old-fashioned way of doing what we're doing, because that's a tried and true piece of technology that will get us started. But personally, and this is an unofficial view, I don't think that's our end game; that's our starting point, really, what we've built up to now. I'm glad you enjoyed the exercise anyway. It does show the difficulties involved in describing collections and working out what a collection actually is. So my wish was...