Today I was just going to run over what we're doing at Deakin in this space of integrating storage with description and discovery, as I've loosely described it. We've got a fairly loosely coupled ecosystem to handle this at Deakin, which is great: it's flexible in design, but as I've said it causes a lot of confusion in practice, and I've had a lot of trouble getting researchers engaged with the fabric because it is quite confusing, as you'll see with a diagram I'll present a couple of slides later. So the focus at the moment is on disambiguating some of that, clarifying what the tools are for, and showing how they can actually assist, rather than inhibit, publishing data and using the storage.

In the describe space we implemented ReDBox and Mint under Seeding the Commons and other ANDS-funded initiatives, and we call that Research Data Footprints, to describe the footprint of your research. For the discovery layer, that repository isn't what we present to the world at large: we feed it into our Fez/Fedora research repository, which is called DRO, Deakin Research Online, and that's what Research Data Australia harvests for the individual records. The actual data that may be shared in an open way is made visible through a very simple, very basic portal called the Deakin data portal, which is basically just an Apache server on top of the data itself; I'll show a demo of all these things later on to expand on those screenshots. When we were implementing this metadata repository we also implemented a research data storage system, which allows researchers to provision storage themselves.
We didn't have any strict requirements on that, so anybody can create a bucket to store data, but it is aligned with that data portal: when a researcher is ready, they can publish the data itself and it will link those things together, and that's what allows it to be exposed through the data portal.

So how does this all fit together? This is the diagram I was talking about just before. We've got various components, and I'm sure most of you would be familiar with some of these systems in play, but basically the research management system is the source of truth for project and party data around researchers, and that feeds the repository I was just talking about. With the storage system, you can create storage and choose to link it to a project or not. We're quite comfortable with that, because we understand that the actual process of writing a grant can generate a bit of data before a successful outcome, so we didn't want to dissuade people from using the central storage we've got on offer. Really it was also a tactic to stop people buying external hard drives and storing data locally on their machines, so having that resilient storage in our data centre was a pretty key point for the service. The rest of it is pretty familiar to most of you: we mint DOIs against every data set that's created and expose that through to this fabric down the bottom.

So it is a bit of a quagmire and does cause a bit of confusion. The presentation layer, which is our focus at the moment, is limited in that it's just a bucket of data presented as a list, so the benefit to the researcher is limited. Our focus now is on how we can better make people aware of the storage that is available, how it is intended to be used, and how we can better display some of the data that people are generating.
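The DOI-minting step just mentioned can be sketched as below: a minimal, hypothetical payload builder in the shape of the DataCite REST API's "dois" attributes envelope. The prefix, landing-page URL, and record values are illustrative assumptions, not Deakin's actual integration.

```python
import json

def datacite_payload(doi_prefix, title, creators, publisher, year, url):
    """Build a DataCite REST API request body for a new dataset DOI.

    Field names follow the DataCite "attributes" envelope; the prefix
    and landing-page URL below are hypothetical examples.
    """
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "prefix": doi_prefix,
                "titles": [{"title": title}],
                "creators": [{"name": c} for c in creators],
                "publisher": publisher,
                "publicationYear": year,
                "types": {"resourceTypeGeneral": "Dataset"},
                "url": url,          # landing page: the repository record, not raw files
                "event": "publish",  # makes the DOI findable on creation
            },
        }
    }

payload = datacite_payload(
    "10.99999",                      # hypothetical DOI prefix
    "PNG interview audio",           # title drawn from the metadata record
    ["Example, Researcher"],
    "Deakin University",
    2015,
    "https://dro.example.edu.au/view/EXAMPLE",  # hypothetical landing page
)
print(json.dumps(payload, indent=2))
```

In practice this body would be POSTed to the DataCite service by the library's workflow step, not by the researcher.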
At the moment I'm getting a lot of people creating storage containers, or collections, and just backing up their whole hard drive to them, with really no description and no delineation in how they're describing things. It's really identified to me that there's pretty poor practice out there in terms of how people structure what they're doing, and that's where our library staff are helping out a lot, in one-to-one or small-group discussions about how to better describe and manage data in the broader context.

What I was also going to say is that we've got a portal at Deakin called DeakinSync, and we're looking to provide some context there around what researchers are doing with storage. One of the ideas is, if they've got a successful grant outcome, to present them with the option of creating storage, if we know they haven't already linked storage to that project. Because we've got all that metadata, we can leverage quite a lot: with that portal we can provide a lot of value and direct researchers there, to say, you may want to be creating some records, because we can see the project has been running and is near the end of its life cycle, or, at the earlier stages, to create storage to put the data in that you're planning to generate with that project.

The other options we're looking at in the presentation layer are discipline-specific or quite aggregated systems that let you display data for various different disciplines. We're only just starting to look at how we can integrate these into this platform, or this ecosystem, so things like Omeka, for the different disciplines that may want to create collections, manage them themselves, and use that as their presentation layer rather than just a bucket with an Apache index on top of it.
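The DeakinSync idea above, prompting researchers based on grant outcome, linked storage, and project life-cycle stage, is essentially a small decision rule over project metadata. A minimal sketch, assuming hypothetical field names rather than the real research management system's schema:

```python
from datetime import date

def storage_prompt(project, today=None):
    """Decide what to surface for a project in a researcher portal.

    A sketch of the prompting idea described above; the keys
    'grant_successful', 'linked_storage', and 'end_date' are assumptions.
    """
    today = today or date.today()
    if project["grant_successful"] and not project["linked_storage"]:
        return "offer-storage"        # early stage: suggest provisioning storage
    if project["linked_storage"] and today >= project["end_date"]:
        return "suggest-data-record"  # near end of life: suggest describing the data
    return "none"

# New grant, no storage linked yet: prompt the researcher to create some.
print(storage_prompt(
    {"grant_successful": True, "linked_storage": False, "end_date": date(2016, 6, 30)},
    today=date(2015, 1, 10),
))  # → offer-storage
```

The value of the rule comes entirely from already holding the project and party metadata centrally, as described above.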
FigShare and MyTardis are being considered around image data: FigShare being quite general, and we're looking at FigShare for Institutions and how that could potentially play a part, or Mediaflux; we're still really investigating all those different options.

So that's the wider ecosystem. I didn't want to go into too much on that, and really wanted to show you how it all functions. This is the ReDBox system we have, and most people would have seen that in the past. It allows you to create the data descriptions, as we're all well aware. What I wanted to show you here was the process we go through for each of these, and how the DOIs are linked into the data portal side of things. The process is: they create a metadata record, and then when they're ready to publish the data they click publish in the store, which I'll show you in a minute. The links for that come back in here, and it publishes this data portal link; you may be able to see the URL down the bottom of the screen, which keeps those two things in check. Then when you go to view that data collection you can see it on the data portal. Which one was that? The interview data for some Papua New Guinea audio interviews. So we replicate the metadata from that Footprints record and show the contents here, available to download if you want.

But it's very, very basic. There's no packaging of the data, which would be really ideal, and there's no thumbnail view, so in that first example you're downloading 800 megabytes before you can actually understand what it's all about. Extracting the metadata of that MPEG file, in this case, is not really done at this point, and that's where I'm wanting to get some improvements, to present that better.
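The linkage just described, where the portal page replicates the descriptive metadata from the Footprints record and carries a URL tying the two systems together, can be sketched as a simple mapping. Field names, URL shapes, and the example record are illustrative assumptions:

```python
def portal_entry(record, files, base_url="https://data.example.edu.au"):
    """Replicate a metadata record into a data-portal listing entry.

    A sketch of the record-to-portal link described above: the portal
    page carries the record's metadata, a DOI that resolves to the
    repository record, and per-file download links. All URLs are
    hypothetical, not the real portal's.
    """
    collection_url = f"{base_url}/collections/{record['id']}"
    return {
        "title": record["title"],
        "description": record["description"],
        "doi": record["doi"],          # resolves to the repository record, not the files
        "portal_url": collection_url,  # the URL that keeps the two systems in check
        "files": [
            {"name": name, "download": f"{collection_url}/{name}"} for name in files
        ],
    }

e = portal_entry(
    {"id": "png-interviews", "title": "PNG interview audio",
     "description": "Audio interviews, Papua New Guinea fieldwork",
     "doi": "10.99999/example"},
    ["interview01.mp3", "interview02.mp3"],
)
print(e["portal_url"])  # → https://data.example.edu.au/collections/png-interviews
```

The packaging and thumbnail improvements mentioned above would layer on top of an entry like this rather than change the linkage itself.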
The data store is this system here. It's just a web application that hooks into the corporate storage we have available, and what we've done is provide four collection types. Researchers can create activities, link those to a project, and then create these buckets to store things. They can create a traditional network-attached file share, which is these little yellow icons, and they can create any number of those; there's a nominal limit of 10, but they can create any number, and they are unlimited in size, so they can put as much data in there as they like. We're using Isilon storage for that now, which means snapshots are taken three times a day, with one snapshot at the end of the day kept for three months, so they've got complete ability to restore files and manage their data very flexibly. There's another type called a publishable file share: when they're ready to publish data they can create one of those. It's no different in terms of the technology, but it allows you to hook into the Footprints record, and then that little data portal link happens.
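The snapshot policy just described (three snapshots a day, with the end-of-day one retained for about three months) implies a predictable set of restore points. A small sketch of what that set looks like; the exact snapshot times and intraday retention are assumptions, not the real Isilon configuration:

```python
from datetime import date, datetime, time, timedelta

def restore_points(today, days_back=90):
    """Enumerate restore points under the stated policy: three intraday
    snapshots for the current day plus one end-of-day snapshot retained
    for roughly three months. Times are illustrative assumptions.
    """
    points = []
    # Today's three intraday snapshots (hypothetical schedule).
    for t in (time(10, 0), time(14, 0), time(18, 0)):
        points.append(datetime.combine(today, t))
    # One end-of-day snapshot per day for the previous ~90 days.
    for d in range(1, days_back + 1):
        points.append(datetime.combine(today - timedelta(days=d), time(23, 59)))
    return points

pts = restore_points(date(2015, 6, 1))
print(len(pts))  # → 93 restore points: 3 today + 90 daily
```

The point of the sketch is simply that researchers can recover a file from any of roughly ninety daily states plus the current day's intermediate ones.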
The other one there, with the little star, is an icon for a product called Syncplicity. We're providing a Dropbox-like service because we've got a lot of researchers working with external parties who have a lot of issues sharing data externally, so they can now use this service to do that. It's using our own on-premise storage with a sync-and-share platform on top, so it gives them unlimited storage (unlimited in the sense that you need the storage on your local computer for it to really function), and it has been taken up quite rapidly, because people really want that capability without having to pay for a Dropbox account and use that storage. The other collection type, which I don't have in this demonstration activity, is a wiki space: we've got a Confluence wiki instance they can use for collaborative work internally. So the store has really gone from storage, as in storing data, to a store as in where you buy things, and that's going to expand: we'll be providing a whole lot of other services through this research store, so blog engines and Omeka instances and a whole lot of different things will be provided through this one portal for researchers, all tied together under this activity or project banner.

A particular example I was going to show you is a Pacific sea star. Mark has some sequence data that he's produced and he wanted to make it open, so he's gone ahead and published it: he's created the Fez/Fedora research repository record through our Footprints system, and then he wanted to share it with the world. Originally he was working with the library and they stored the objects within the repository, which wasn't great, so now they're provided through the data portal, and you can download the gigabytes, or megabytes in this case, of files. One thing I'm advising researchers is to really be descriptive about what that is. I'm sure people in his discipline understand what all those different file formats are, but it doesn't really have an overview or read-me file; it could be described better, so we're working with them on that. It's presented with that hook-up through that link there, and I think that's also available here, so you can be taken straight to that record. All the DOIs map through to our research repository, so Footprints really is just a collection gateway that links those things together and allows the record to be curated as accurately as possible. So that's really all I was wanting to cover off today.

Can Chris talk more about the publishing function? Absolutely. We call it publish, but really it just formalises the link between the two systems. They create a publishable file share, which is just a network-attached storage location; everyone should be fairly familiar with network-attached storage, this is a network drive, so they would just have a folder like this to store things. Let's just say this "workshops" one, for example, is where they would structure their data within that space. That's completely offline; it's not exposed to anyone other than themselves. Then when they're ready to publish the data (this is UAT, so I might get some errors) they can click a publish button. Good, it's ready to go. So for this particular folder, which is fictitious because it's UAT, when they're ready to publish they literally do that: it will then look at all the Footprints records and provide a list of the ones that haven't been published, and they can just choose one. In this example I've already published against this other one, but this one here I could potentially do, and then I can provide global access, to say anyone can get access, or I could restrict it to an AAF member, so in a way you could limit
access down to anyone who's a member of the AAF, the Australian Access Federation, who could see it. So that's sort of semi-open in terms of the collection. Then within a few minutes that collection would be exposed through the data portal I showed you before: you would see it appear here, or, if I logged in and it was restricted, more would be exposed once I'm logged into the system. Anyone in Australia can log into this data portal, as you can see, and then see that. So that's how that's working.

All right Chris, there's a whole bunch of other questions coming in. One asks: what's the maximum storage space a researcher can request; is there a maximum? None, it's unlimited. Well, I'm sure they'll all like that one! Our IT manage the growth and the capital acquisition that has to happen, and they deal with that as it goes, so yes, it's completely unlimited.

The next question probably ties into that: what was the cost of the implementation; do you have data storage costs? There's no explicit cost; it's covered under our central capital expenditure on storage, so it's just factored into all the storage the university buys, and there hasn't been an explicit cost for this particular service. At the moment we're up to about 100 terabytes, with another 60 at another site, so nearly 200 terabytes is what we're looking at: not overly large, as we don't have any astrophysicists with a petabyte in their back pocket, so it's probably relatively small compared to most institutions, but it is covered under that. They provision it under systematic procurement throughout the year, so they're always negotiating a new price for that storage, which means I don't have to worry about it, which is actually a luxurious position to be in.

That probably ties into a couple of questions which sort of meld together. One asks about use of storage by users external to Deakin, since most collaborations are now national or international: is it possible for external users to use it? And there's another, very similar, which asks: is this service going to be available to researchers at other universities, and are there storage size limitations? The first part is covered under that sync-and-share service: a Deakin identity can provision it and share it with colleagues they're working with at other institutions, but there are limits, because if you're synchronising to your own computer you need the hard drive space on your own computer. For the traditional network-attached storage, any Deakin identity can access it because they can create a VPN connection, but external people can't. The way that's traditionally been handled at Deakin is that we make those collaborators a visitor to the university, and then they get access to the storage. That's a little bit cumbersome, but most people know how to work around it and follow that process. And on the last part: no, there wouldn't be the ability for non-Deakin people to create the storage space in the first place; it really has to be instigated from Deakin's side.

Another one: are researchers able to mint DOIs by this publishing method? So in the Footprints system, that's where the DOI is minted, and it's done by the library: when they're performing quality checks on the description, they perform the step of minting it. It's done implicitly, in that the workflow of a metadata record is curated by the library and they're the ones actually doing it, but it's effectively a business-to-business transaction that happens on every one of those records. So the researchers themselves don't mint DOIs, but the library does it on their behalf.

And probably the last one, so we can keep to Paul's time: is all the data stored on Deakin infrastructure, or is it stored on national infrastructure, either this or a local eResearch provider? Yes, it is all on Deakin infrastructure. Amongst our four main campuses we've got two data centres, and the data is stored within those data centres and replicated across the two. So we haven't engaged with the RDSI-provisioned storage; it's all purely on-premise, which our researchers like, because it means, particularly if it's sensitive data, they can tick a lot of boxes in terms of the compliance they need to ensure.
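The two publish options described earlier, global access versus restricting a collection to AAF members, amount to a simple access check on the portal side. A minimal sketch, with field names that are assumptions rather than the portal's real schema:

```python
def can_view(collection, user=None):
    """Return True if a portal visitor may see a collection.

    Mirrors the two options described earlier: 'global' collections are
    visible to anyone, while 'aaf' collections require a logged-in user
    with an AAF-asserted identity. The schema here is illustrative.
    """
    if collection["access"] == "global":
        return True
    if collection["access"] == "aaf":
        return user is not None and user.get("aaf_authenticated", False)
    return False  # unknown access levels default to hidden

print(can_view({"access": "global"}))                            # → True
print(can_view({"access": "aaf"}))                               # → False (anonymous visitor)
print(can_view({"access": "aaf"}, {"aaf_authenticated": True}))  # → True (AAF login)
```

This matches the behaviour described in the demo: an anonymous visitor sees only the open collections, and more appear once they log in through the federation.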