The title of this session is Long Term Data Sustainability, but if you're expecting answers then you're going to be disappointed. I'm sorry, I don't know that there are answers. There are things we can do, but really this session is just about exploring some of the issues that I think you all need to consider when you're looking at how long, and whether, you should keep data. So I'm not going to give you answers, solutions or approaches; I'm just going to bring up some of the issues that I think you all need to get your heads around, things you have to think about during the business of creating, managing or caring for data. There'll be a bit of time for questions at the end, but if you have any questions during the talk, if anything I've said is not clear or you want me to repeat something, just interrupt; I'm happy to take things on the fly.

This is just a brief overview of the topics I'm going to talk about in the next 35 minutes or so. And I should, I don't know if 'warn' is the right word, but I'm an archivist, so I do have an archival perspective on what preservation and long-term retention mean. I have particular archival concepts that I've been working with for the last 20 years. So part of what I talk about will be things that are familiar to me as an archivist, which may not be familiar to you as data curators or data creators, but I believe a lot of archival insights and practices can be applied to managing and preserving data over the long term. So the talk won't be purely archival, but I'm just telling you, some of the things I'm talking about come from my archival perspective.

This is where we're starting from: why I'm talking about data sustainability. Have you had the talk about the Code yet? Has that been on? So you're all familiar with every word of the Code and you're all ready to apply it? Great. Well, as you would know then, Part A, Section 2 says, in an introductory statement, that policies are required that address, among other things, the retention of research materials and data beyond the end of the project. So it's encouraging you to think in the long term, not the short term. Now, I know there are some issues about implementing the Code in academic institutions, but all the same, it is a guide to the sorts of things you need to think about when you're undertaking, being responsible for, or managing the outputs of research. And one of the things you have to think about is taking a longer-term view of the data that's created.

I'm putting this up to show you that this isn't something ANDS has suddenly thought up as an idea you all need to think about. It's not just ANDS that's concerned about these things. The people who wrote the Code (the ARC, the NHMRC and Universities Australia) have also thought about issues such as long-term retention of data. And there is now some understanding in the ARC, much more explicitly, that data needs to be kept long term, past the life of the research projects that create it. This appears in the funding agreement for Discovery Projects, which is the only one that has this section in it at the moment, but I'm sure it's going to be extended to the other sorts of funding arrangements the ARC and the NHMRC have. So they are also very explicit about research projects, research teams and their organisations maintaining data for the long term.
As I say, it says 'must': institutions must ensure researchers look after and store safely any data, specimens or samples. Yes, Andrew. You did say to ask questions as you go through? Isn't this like a speed limit without a speed camera? They say it, but I've never heard of any researcher not getting their next ARC grant because they didn't comply with this.

No, I agree, but it is early days. It is a sign that things are moving in the right direction, from ANDS's point of view and from my point of view as an archivist. So the ARC is actually saying, explicitly, you've got to look after it and keep it. You can't just chuck it away or not think about it.

How long has this wording been in the agreement? It's in the 2010 and 2011 agreement drafts that are on the ARC website. I think 2010 is the first year it appeared. Is that right, Margaret? I think in 2009 it said 'should', or something like that; this is the first time it's said 'must'. Okay, so now it says 'must'. It says other things too, lots of other things obviously, but essentially this little section is about making sure the data lasts beyond the life of the project. So they're saying you must deposit it somewhere, and the arrangements may have to be acceptable to the ARC. You can put it in an institutional data store or data repository, and I'm sure that would be acceptable to the ARC, but they suggest you could also deposit it in museums (I'm not sure why museums, perhaps because they're not talking just about data, they're talking about specimens and samples as well) or in archives, the social science data archive, for example, which is another repository where you can deposit social science data. They also ask you to explain why, if you decide not to deposit it, and that explanation has to appear in the final report of the project. So they are getting more rigorous about what they expect of you as far as preserving data goes.

So really I wanted to give you some background on why we're talking about these sorts of issues, why you need to consider them as data curators or data custodians. What do we mean by sustainability? I've put that question up here, along with the sorts of things I think we need to achieve, what sustainability means. It's about maintaining data for the long term, and what we mean by long term is obviously open to discussion. The ARC and the Code both see long term as being beyond the life of the project, but they don't have any views on whether it's for 100 years, 20 years or 1,000 years. In my view, the things we need to achieve when we're maintaining data over the long term are these. It must be findable: there's no point keeping it if no one can find it. It must be accessible, so people can get hold of it. They must be able to use it. And the data must retain its integrity. So whether you keep it for five years or ten years or 200 years, the data, in my view, needs to have these characteristics after those periods of time, otherwise you've just wasted all the resources that have been put into maintaining it. And the other issue about sustainability, which I hope we can discuss during this talk, is the issue of what we keep: whether we keep everything or throw lots of it away.
And as I said, I don't have any answers, but I think it's an area that will need a lot of discussion over the next few years to try and work out consistent approaches that everybody can accept. When we talk about sustainability, all we're really talking about is preserving data over the long term. Sustainability is just another word; it's the way ANDS talks about preservation, and it's not quite as loaded as 'preservation', which I believe has been overused a lot in the last decade or so. So at ANDS we talk about sustainability, but we're still talking about preservation, essentially. I'm going to cover each of these areas in a little more detail: what I mean by findable, accessible, usable, and integrity.

When I say something has to be findable, first of all, you have to keep it. Otherwise it won't be findable, obviously. That's very basic, but you've got to keep it for it to be findable. You have to store it, and you have to have some sort of consistent, structured way of storing data. You can't just leave it up to researchers to keep it on CDs or their thumb drives and do whatever they like with it. That's not really storage in the way ANDS wants you to think about it, because if we leave it up to researchers and they leave it on a CD stuck in a cupboard, or on a thumb drive stuck in a briefcase, then it'll disappear. Storage has to be more systematic and structured than that.

The third thing I want to mention is that the data must be described to be findable. Now, I know a lot of people don't like talking or hearing about metadata, but it has to be raised. If you don't describe data, no one will be able to find it. You have to have a basic level of discovery metadata so people can actually find what they're looking for. If they're looking for data about biological resources in the Barrier Reef Marine Park, you've got to describe it, or people won't be able to find it. As I said, I'm not telling you what you have to do, and I've not given you solutions to these issues, but these are the issues you have to think about when you're thinking about keeping data findable over time. It's not enough for the people who created the data to know how to find it. If the data is going to be any use to future scholars, or scholars outside the specific domain of the researchers, then it has to be described. Otherwise it'll be lost; it'll be useless information. Are you all with me so far? I'm not going to talk a lot about metadata, though I could, because I've spent a lot of time in the last 15 years working on metadata standards, but I do want to tell you that you have to think about it. You can't ignore it. It's difficult and sometimes very complex, but you've got to think about it. I don't know why I keep pointing at that one.

Data also has to be accessible, so it's not enough just to store it and describe it; people have to be able to access it. It has to be available, somewhere people can get at it. It can't be stuck on a CD in someone's drawer. It could be findable through a portal, but it's not accessible if it's stuck on a CD in someone's drawer. So when I say available, I mean it has to be in a data store of some sort that people can connect to somehow, which means that the data sets, or the descriptive information, have to have some sort of location. It can't just say it's on Fred's CD in his second drawer, because people can't connect to that.
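To make the description and location points concrete, here is a minimal sketch of a discovery-metadata record, loosely modelled on Dublin Core-style fields. The field names, identifier and URLs are illustrative assumptions, not a schema the talk prescribes.

```python
# A minimal sketch of a discovery-metadata record. The field names and the
# example values are illustrative assumptions, not a prescribed standard.
import json

record = {
    "title": "Biological resources of the Barrier Reef Marine Park, 2008-2010",
    "creator": "Example Research Group",                        # hypothetical
    "subject": ["marine biology", "biodiversity", "Great Barrier Reef"],
    "description": "Survey data collected during reef transect studies.",
    "date": "2010",
    "identifier": "hdl:102.100.100/12345",                      # hypothetical persistent ID
    "location": "https://data.example.edu.au/datasets/12345",   # hypothetical data store
    "rights": "CC-BY",
    "format": "text/csv",
}

# Serialising the record (to JSON here, or XML in practice) lets a portal
# harvest and index it, which is what makes the dataset findable.
print(json.dumps(record, indent=2))
```

A record like this, harvested into a portal, is what lets someone outside the original research group find the Barrier Reef data at all.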
You've got to have some location information and ideally a persistent identifier of some sort, so it won't break: if I look at it this year, I can go to the same location in 20 years' time and find it again, if it's been kept. So you have to think about how you locate things, and what sort of identifiers you give the information or data sets you're keeping.

The other issue I want to talk about briefly under accessibility is the whole area of privacy, IP rights, ethics and copyright. I don't know if these issues have raised their heads yet in your repositories or institutions, but there are strong groups in society who are very concerned about privacy issues with data and other sorts of digital information: how you can release that sort of information and make it available to people, and how you manage copyright. Now, I think you've had a talk about Gov 2.0, so you know the Australian government is moving towards issuing information under Creative Commons licences. These are the sorts of moves people need to think about, I believe, when they're looking at how they make their data accessible or available to the wider public or the wider research community. But you're also obviously going to run into issues of privacy and what ethics committees think. I think all your institutions have their own ethics committees, who all have their own particular requirements about how you manage research projects, especially those involving personal information. So those are going to be issues you have to deal with when you're thinking about making data accessible. You've got to think about what sort of privacy and ethics issues are involved in releasing, say, medical information to the world.

Why is data that isn't online not considered accessible? How would I get it? How do I know who to speak to? Well, yes, I suppose I shouldn't be suggesting that it has to be online to be accessible; someone could keep it on a CD in a drawer. But if I'm a researcher in the UK and I want to look at it, what do I do? Okay, it involves a lot more effort. But it doesn't mean it's not accessible. But how do people find out about it? Okay, I'll take that point; maybe I'm being too black and white. There are degrees of accessibility. There are degrees of accessibility, and almost everything could be made accessible in some sense: you can always find someone to ask, so maybe it's a question of degree, of where you draw the line. It's not an efficient way to go about it, but in principle, yes, we can make everything accessible. Though people can effectively keep data hard to get by forcing everyone through personal information requests. Part of the aim of all this is that we actually make access easy, not just possible.

Yeah, and the idea of the Australian Research Data Commons, which is the ultimate aim of the work ANDS is doing, is to have data out there, available for people to access, without having to go and speak to somebody they happen to know, or have heard about, in an institution, and have them copy it off a CD or a thumb drive and send it to them. That's putting a lot of barriers in the way of access and doesn't really, I believe, meet the sort of vision the government and a lot of academic institutions have about making data accessible to the public more widely than the research groups who create it. Yep.
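As a toy illustration of the persistent-identifier point made earlier: citations carry the stable identifier, and only a resolver entry changes when the data moves. The identifiers and URLs here are hypothetical.

```python
# A toy resolver: persistent identifiers stay stable in citations, while the
# resolver table is the only thing updated when the data is relocated.
# All identifiers and URLs are hypothetical.

resolver = {
    "hdl:102.100.100/12345": "https://data.example.edu.au/datasets/12345",
}

def resolve(pid: str) -> str:
    """Return the current location for a persistent identifier."""
    return resolver[pid]

print(resolve("hdl:102.100.100/12345"))

# Twenty years later the repository is migrated. The citation in an old
# paper still works, because only the resolver entry was updated:
resolver["hdl:102.100.100/12345"] = "https://archive.example.org/ds/12345"
print(resolve("hdl:102.100.100/12345"))
```

The design point is simply that the thing you publish and cite never changes; the indirection absorbs every future move of the data.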
So you're curating not just the data that you've collected, but also specimens or samples. Yep. And by virtue of the fact that some of those are objects that can't be replicated, you're going to have to maintain some sort of pointer to where that physical location is.

Yeah, sure, I accept that. There are often going to be physical outputs, if you like, of the research process that have to be kept somewhere. But still, the idea of making them accessible is to put them somewhere public, where people can get access to them. It shouldn't be that you have to know somebody who can talk to the person who's got the key, who can open the door to let you in. That sort of thing really puts barriers in the way of access.

Do you know if that's affected by the changes to FOI? As far as I know, no. The new FOI amendments don't really go into that much detail about data; essentially their aim is to change the culture of government, particularly. There was the Queen's University Belfast case, though, the tree-ring one, where the obligation was on the institution to convert all that data into a digital form for the recipient. It wasn't enough to just give the requester access to the old floppies. Yeah. So I think there is going to be more pressure on institutions to actually provide data in ways that make it more accessible to wider communities than it would be if it were just kept in someone's drawer or locked in a filing cabinet.

The third characteristic I listed was usability. The data has to be usable. It's not enough to find it, and it's not enough to store it somewhere people can get access to it. If they can download the data but can't use it, then it's not fulfilling the purpose of sharing data. The sort of thing you need to think about is how we store the data, whether we store it properly. Well, I say 'properly', but I'm not saying there's one particular approach that is the proper approach. You have to have consistent and reliable ways of storing the data, so you have to think about things like media migration. If you're keeping data on digital tapes, you have to think about migration. You can keep data on CDs and DVDs, not that I would recommend it, but if you do, again you have to think about media migration. The San Diego Supercomputer Center some years ago estimated how long it takes to migrate data from one generation of digital tape to a new generation, and they reckoned that at somewhere around 200 petabytes (I can't remember the exact figure, but it's in that region, not bigger) you would run out of time: you couldn't finish migrating from one version of digital tape to a new version before that new version itself became obsolete. So there are issues like that you have to think about. I worked on an experiment where they were migrating five petabytes of data to new tapes, and there are experiments bigger than that; the transfer takes a long time.
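A rough back-of-the-envelope version of that migration arithmetic, with all numbers illustrative assumptions rather than measured figures:

```python
# Back-of-the-envelope tape-migration arithmetic in the spirit of the SDSC
# estimate. Every number below is an illustrative assumption.

PB = 10**15                  # bytes in a petabyte (decimal)

archive_bytes = 200 * PB     # size of the archive to migrate
drive_rate = 160 * 10**6     # bytes/sec per tape drive (roughly 2010-era LTO)
drives = 10                  # drives dedicated to the migration in parallel

seconds = archive_bytes / (drive_rate * drives)
years = seconds / (3600 * 24 * 365)
print(f"~{years:.1f} years to copy 200 PB with {drives} drives")  # ~4.0 years
```

With these assumptions, a 200 PB archive spends roughly four years just copying itself on ten dedicated drives; if a tape generation is only current for around five years, most of the generation goes on the copy, which is the point the SDSC figure was making.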
So that is probably going to be a big issue when we have these huge data sets, if we keep them on digital tapes and the formats and the hardware change. They change at a much slower pace than people really thought they would, but still, there are issues about the time it takes to do that migration that you need to consider. You can't just think, oh, I'll migrate it and that's all right. It's not that simple.

The other thing you need to think about when you're looking at usability is what formats you store data in. Again, I don't have answers, and don't look to me for answers, but there are things you can think about: how quickly a data format might become obsolete; if it's proprietary, how quickly the vendor might stop supporting it, and how long the support program will run for. Think about moving to open formats. I'm not saying this is practical in all cases, or even in most cases, but it's something you can think about: whether you can use open formats for storing the data, and whether the software needed to query and access the data is going to be available in 20 or 200 years' time. That's the big issue, whether the software and hardware will still work. And there's also another approach, which is to think about emulation: creating software emulators that let you actually use the data in the future. There are no proven answers to these questions. The digital preservation community at the moment is mostly focused on migration strategies of various sorts, though there is quite a lot of work being done on emulation approaches as well. So really all you can do is keep an eye on the literature, understand what's happening and see where developments are going.

Yes? When you consider what you're talking about for the large data sets, presumably emulation becomes more attractive, because conversion of data formats is actually pretty expensive. It takes a long time to work through the data and convert it, and then to validate the converted content against what was originally there.

Yeah. There are issues that still haven't been resolved, even though we think we've been dealing with this for quite a while, in the region of 15 to 20 years, because we don't have enough experience with things like large data sets and trying to keep them usable over long periods of time. That's why I say there aren't any answers, just particular approaches people are looking at and developing, which may change soon, or in the next five or ten years. At the moment the best answer people have come up with is that you need a combination of approaches: there are various ways you can do migration, but it's probably likely that you'll need some sort of emulation strategies as well to keep data usable.

Excuse me. The last characteristic I want to talk about is integrity. When I say integrity, I don't just mean bit-level integrity, although that's really the core of it. You need to make sure there are no unauthorised changes to the data. It is possible for stray gamma rays to change bits, although it's not very common, and there's also the possibility of malicious damage. So you need to be able to recover from those sorts of situations.
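A minimal sketch of the checksum-based fixity checking about to be described, assuming a simple file-based repository layout (the paths are hypothetical):

```python
# Minimal fixity-checking sketch: hash every file at ingest, then re-hash on
# a rolling schedule and compare, so silent bit changes can be detected and
# the damaged copy restored from a redundant one. Paths are hypothetical.
import hashlib
import json  # used in the commented ingest example below
from pathlib import Path

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

def build_manifest(data_dir: Path) -> dict:
    """Record a checksum for every file in the store at ingest time."""
    return {str(p): sha256_of(p) for p in data_dir.rglob("*") if p.is_file()}

def verify(manifest: dict) -> list:
    """Return the files whose bits have changed since ingest."""
    return [p for p, digest in manifest.items() if sha256_of(Path(p)) != digest]

# At ingest (hypothetical path):
#   manifest = build_manifest(Path("/repository/dataset-12345"))
#   Path("manifest.json").write_text(json.dumps(manifest))
# On each rolling check, any file returned by verify(manifest) is restored
# from one of the redundant copies rather than trusted as-is.
```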
So you need to be able to ensure that the bit-level integrity of the data is not compromised, which means thinking about backups, how much redundancy you have, how many copies you keep. There's an approach developed in the US called LOCKSS: Lots Of Copies Keep Stuff Safe. Their chief scientist, David Rosenthal, has written a paper on bit preservation, essentially arguing that the issue of preserving the bits has not been solved. If any of you come across it, I would say take it with a grain of salt, because the issue he's talking about is really storage media reliability: whether things like RAID stores can keep data safe, can keep its integrity. Now, no one has said those problems have been solved. But we do know what to do to preserve the bits, and you need more than one copy. That's a given; one copy is not enough. You also need some mechanism for finding out whether the bits have actually changed. One mechanism is checksums, hash values if you like, with algorithms to check them and applications that will do rolling checks, to see whether the bits have been changed, so you can recover the data if it's been damaged or altered somehow.

I've listed bit-level preservation there separately because you may think that if you're just keeping the data in the form it was created, that's enough. That often may be enough, but you'll need to make sure you've got redundant copies. You may also want to have different versions available for access: you may not want the original copy available to the wider public. You've got to think about keeping an unchanged, safe, bit-level copy of the data, so that if you need to do things with it in the future, you can. But that sort of strategy really depends on what your overall approach to preserving the data over the long term is. It may not be necessary to think about bit-level preservation separately from thinking about the number of backups and how much redundancy you need. But it is an issue you need to consider, even if it's only to decide you don't need a separate strategy for it.

Selection. This is really, I think, a big issue that hasn't been addressed in the research community. The archives community has dealt with it for a long time, but I think it's an issue the data research community really has to start thinking about very seriously. IDC worked out, not just estimated, that in 2007, 281 exabytes of information were created. They estimate that in 2010 it will be 1.2 zettabytes; a zettabyte is 1,000 exabytes. That's a hell of a lot of information. Someone has worked out that it's the equivalent of 72 stacks of books, each 93 million miles high. That's how much data will be created this year.

Can we keep everything? Anyone got an opinion, a view? Can we keep everything? Technically yes, financially no. I don't think we need to keep everything. I don't think it's technically yes either. I'm not asking should we; I'm asking can we. Well, you run out of physical storage space ultimately. Yes, and there have been estimates that somewhere around 2012 to 2014, the world will not be able to produce enough storage to capture all the information it wants to keep, let alone all the information that's created. A large proportion of that 1.2 zettabytes for this year is data we're not going to keep: things like telephone calls, mobile calls and so on. Not necessarily stuff we want to keep.
But there is a finite limit, as you said, to how much storage can be produced in any one year, and the world will run out of the ability to store information if output keeps increasing at the rate it is at the moment. We can, at the moment, keep everything, we think. But should we? If we do, there are other problems. If you think, okay, we'll keep everything our research teams produce, you've got to think about how you actually find it, store it and make it accessible, because things like the metadata issue, how you describe it all, rise by orders of magnitude. There's no point keeping it if it can't be found and used, and you've got to have the metadata to make that happen if you're keeping everything. The costs involved in creating all that metadata are going to be astronomical, unless we can develop really efficient, automated ways of doing it, which still look like they're a long way off.

Okay, perhaps we can keep everything, but should we? Anyone think yes, we should? Okay, so you all think no? Enlightened. But then which data do we want to keep? There's the selection problem, the classification problem. One suggestion is that we're not keeping it only for humans to use; machines might use it, so perhaps we don't have to describe it the way we would for ourselves. But would that suggest we keep everything all the same? Even for machine-based use, we still have to make decisions about data classification, about what we're actually going to interrogate. And there's the classic argument, which I've used myself: let's keep everything, because we don't know what we're actually going to need. The same argument applies to machine-based use as well. Yep.

For the moment that is arguably a good policy for research, though: short term, let's keep it for five years, until we understand what the reuse is. That is going to increase the cost, though. Even though there's this mantra around that storage is cheap, it's not really true when you're looking at huge data sets. It is actually a real problem, because properly curated and maintained storage is expensive, not because of the hardware, but because of the human effort needed to curate it. And, to second this gentleman's point, you probably have to consider the probability of reuse in making that decision. Yep. There may also be some sort of cost metric: what it would cost to reproduce the data versus the cost of the storage, and also the cost of making the decision itself.

So if we accept, as I think we all do, that we can't keep everything and we have to choose, that raises the big issues of who's going to do the choosing, when it gets done, and what criteria are used. Should it be left up to the researchers? To the institutions? To university information managers? To the government? Well, the government could do it, but they're going to make you do it, I think. A lot of those people will have a role, depending on the type of data and the use of the data. Yeah, that's right.
A researcher is not going to let someone else come along and tell them what data they're going to keep and what they're going to toss. Yep. But particularly where we want to reuse data across disciplines, the researcher doesn't necessarily know what's most interesting. That is why I want to see a plan, a data management plan, and a data manager. I care about the researcher's data; I care about their plan for what to keep and what to toss.

Yeah. But we don't know who's actually qualified to make those decisions. Even within one discipline, we often don't know what we're going to use data for 20 years from now, and if someone asks, is there value in it, the answer is, I don't know either. I think we can probably make some reasonable guesses. I think that's all we can do: make reasonable guesses, be good enough to make them and humble enough to learn from our mistakes. That's a good point. As an archivist, I have to say that deciding which records to keep based on what you think their future research value will be is probably the least useful approach, because nobody can tell. It's absolutely impossible to judge future research needs. I think the same is going to apply to research data just as it applies to records. I could be proved wrong, and I'd be pleased to be proved wrong, but I don't think I'm going to be.

Not many datasets have a long half-life of citation, though. If we keep data for a short, reasonable period, we can probably make a judgment then about how far it's going to keep being used. Yes, we'll get things wrong, but at least we'll get the obvious ones right. Yep. And there is some literature about how you choose, about what criteria you should use. It's been suggested, for example, that if the data comes from a repeatable experiment, you don't need to keep it, whereas if it's data produced by some unique one-off event, like a volcanic eruption, you do need to keep it, because you can't reproduce it; you can't get that data back. The data is obviously not reproducible because it's so unique to the time it was captured. So there are various ways of looking at this that have been talked about, and I'm not saying any of them is right. You might want to make decisions based on what you think the government's research priorities are, which would be as good a way as any.

And it varies by discipline. When we were doing the needs analysis discussions with a number of researchers late last year, for the ANDS services roadmap, Mark Ragan, who works at the Institute for Molecular Bioscience at UQ and is on the ANDS steering committee, made the comment that for a whole series of genomics data, it was cheaper and easier to keep the sample and re-analyse it than to keep the data from the last analysis. The sequencing machines get cheaper all the time. Yeah, and that's always something that's likely to happen.
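A toy version of that keep-versus-regenerate trade-off; every number here is a hypothetical assumption, not a figure from the talk:

```python
# Toy comparison behind the genomics example: store the analysis output for
# N years, or keep the sample and re-analyse later. All numbers hypothetical.

storage_cost_per_tb_year = 100.0   # curated storage, $/TB/year (assumed)
dataset_tb = 50.0                  # size of the analysis output (assumed)
reanalysis_cost_now = 20_000.0     # cost to re-run from the sample today
cost_decline = 0.7                 # re-analysis assumed ~30% cheaper each year

years = 10
keep_cost = storage_cost_per_tb_year * dataset_tb * years
regen_cost = reanalysis_cost_now * (cost_decline ** years)

print(f"store for {years} years:     ${keep_cost:,.0f}")   # $50,000
print(f"re-analyse in year {years}:  ${regen_cost:,.0f}")  # about $565
```

Under these assumptions, keeping the sample and re-analysing later wins easily; the answer flips if the sample itself is costly to store, degrades over time, or the event can't be re-sampled at all.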
The machines you use to analyse the data, or produce it, are going to get better and better and quicker and quicker, and maybe you can throw away a lot of stuff because you'll have better ways of getting it in five years' time.

The other question you have to think about is when this selection is going to happen. If you leave it to the end, it's going to create a lot of problems, because if you don't think about it up front, you're not going to get the right sort of metadata captured and you're not going to get things stored properly, if you wait five years. I'm not suggesting you have to make all the decisions up front, but you certainly have to consider, early, which of the data produced by a particular research project will be kept, or might be kept, once the project's finished. So the issue of when is not a trivial one. You can't just wait till the end and then make a decision. Did you want to say something, Greg? No, sorry, I thought you had your hand up.

Now I need to wind up quickly. Again, I've brought up the Code because it does actually say things that institutions have to do, and I'm not going to dwell on all of these because I've covered some of the broader issues earlier on. The Code says you have to retain research data and primary materials, provide secure storage, and identify ownership, and then it talks about the privacy and ethics issues. So the Code sets out things institutions have to do. I've tried to summarise what I think needs to be done at the institutional level: it's really to develop the policies, procedures and infrastructure that let these things happen. I'm not saying the institution itself has to do all these things, but it has to set the institutional policies, procedures and infrastructure that allow them to happen: that allow the data to be created, captured, kept and stored; that enable the creation of appropriate and sufficient metadata; that allow researchers to manage their data and keep it stored properly; and that then allow them to publish that data, or make it available, more widely, without a lot of barriers to access by a wider community than has generally been allowed to get access to scientific or research data. So if you want to think about persuading your institutions to work on some of these things, that'll be a great start.

The Code also says researchers have to do things. There are two other paragraphs, but I've only got this first one: retain research data and primary materials. The Code specifies the reasons for doing this: to allow reference and use, and also to answer any challenges to the data or the research methodology. Keep it so it can be used by other researchers. Now, that's a slightly different use from the first ones I've talked about, which were really just letting people see what you've got. You need to keep it for at least as long as your institution specifies, which implies that institutions have to have policies about how long you keep data for. And all data doesn't have to be kept for the same amount of time. Some could be kept for a thousand years, some for 20 or 30.
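A minimal sketch of what such an institutional retention schedule might look like in code; the data classes and retention periods are hypothetical, not drawn from the Code:

```python
# Minimal retention-schedule sketch: different classes of data are kept for
# different periods. Classes and periods are hypothetical examples.
from datetime import date

RETENTION_YEARS = {
    "clinical-trial": 25,           # hypothetical period
    "published-results-basis": 15,  # data underpinning publications
    "unique-observational": None,   # None = keep indefinitely
    "working-data": 5,
}

def disposal_due(data_class: str, created: date):
    """Earliest disposal-review date, or None for permanent retention.

    (Sketch only: ignores edge cases such as 29 February.)
    """
    years = RETENTION_YEARS[data_class]
    if years is None:
        return None
    return created.replace(year=created.year + years)

print(disposal_due("working-data", date(2010, 6, 1)))          # 2015-06-01
print(disposal_due("unique-observational", date(2010, 6, 1)))  # None
```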
The Code also requires the use of secure and safe disposal practices, particularly for data that has privacy or ethics issues. If you're not keeping it, you need to make sure it's disposed of, however you do that, in a way that people who aren't authorised can't get access to it. So you don't chuck your hard drive on a tip somewhere without wiping it at the bit level. Again, I've just got a little summary of the sorts of things I think this requires of researchers. It's really the cultivation of good practice in these things. Now, I'm not saying we know all the answers; I can't tell you what good practice is in all these areas yet. That's one of the things ANDS will be doing over the next two to three years: trying to give you best-practice guides to these issues that you have to think about when you're curating research data.