Good afternoon. I think we're going to get started for the sake of time. I'm Margaret Hedstrom. I'm the Associate Dean for Academic Programs and a Professor at the School of Information at the University of Michigan. I'm also the Project Director and PI of SEAD, Sustainable Environment Actionable Data, a recently funded NSF DataNet project that started on October 1st of this year. Also joining me from our team is Robert McDonald. He is the Associate Dean for Libraries at Indiana University and Associate Director of the Data to Insight Center, and Robert is senior staff on the project. I'm going to start by giving you a little overview of our inspiration and aspirations, some of the thoughts motivating our approach to the DataNet call for proposals, and then Robert's going to talk more about how we are going about developing infrastructure to support integration of data and long-term preservation and access. For those of you who are not familiar with the NSF DataNet program, it was a very broad call that came out. It seems like it was a decade ago, but it was actually five or six years ago. One aspect of it was looking at new types of organizations that would integrate infrastructure development, libraries and archives, computer and information science, and domain science expertise to build the capacity for reliable digital preservation and access for science and engineering data over decades-long time spans. Part of the idea was to continuously anticipate and adapt to change in both the technologies and users' needs and expectations. And one had to be pretty forward-thinking to anticipate that it would take three or four years to get an award approved by the National Science Foundation. The call also asked awardees to engage in research to drive the leading edge forward and to serve as components of what is envisioned to be an interoperable network of partners providing preservation, access, and other data services to scientists. 
I am going to address a little bit about the new form of organization. I'll say a bit about reliable preservation and access, anticipating changes in technology, and maybe just a bit about being an element in a larger network. I am not going to discuss engaging in research. This is in part because, in the course of the second round of DataNet proposals, the awards were vastly reduced. The original awards were to be $20 million; they've now been reduced to $1 million a year for the first two years, and if we develop a successful prototype, there may be $2 million a year for the following three years. As a consequence, all of the research that we had intended to fund directly as part of this program is either not funded, or people are going out looking for alternative sources of funding for it. The SEAD DataNet project has a series of partners. It is based at the School of Information at the University of Michigan, with participating co-PIs and senior staff at Indiana University, in the libraries and the Data to Insight Center; at RPI; and at the University of Illinois at Urbana-Champaign, where one of our co-PIs, Praveen Kumar, a computational hydrologist who is leading the domain engagement piece of SEAD, is located, along with some participation from NCSA. There is also participation at the University of Michigan from ICPSR, the large social science data archive. I hope that some of you have at least passing familiarity with the DataNet program. I just want to mention how I see SEAD fitting in with other projects that are working toward building the DataNet infrastructure. I think our unique contribution is trying to address domain-driven needs and requirements. Well, I would be a liar to say we didn't come into this with some preconceived notions about good ways to go about developing this infrastructure. 
But we're very much letting the scientific community, and in our case that is sustainability scientists, drive what it is we build, what services we provide, how we evaluate them, and in many ways how we engage more and more people beyond the original set of researchers we'll be working with. Secondly, we are looking to serve scientists and researchers in the long tail, and I will say a bit more about that in a moment, but Cliff Lynch's remark that not all data is big data resonated very much with me, because our orientation in SEAD is to scientists who are dispersed, who have very valuable but often small data sets that are very heterogeneous, and who have not been served or addressed much by efforts to look at long-term preservation and access to data. Another contribution is that, to the extent possible, we're trying to integrate existing tools, technologies, and services rather than building something new from scratch. For example, although there's a lot of emphasis in the DataNet call on long-term preservation, our approach is to build a way of passing data that is of long-term value into institutional repositories and subject repositories like ICPSR, rather than trying to build a whole new repository infrastructure. As I mentioned, we are working with sustainability scientists, and in particular we were forced to put regional, and I would say subdomain, constraints on at least the initial set of scientists we're working with. So our interests are mainly in issues of sustainability of water and water resources in the upper Midwest and the Great Lakes region. We chose sustainability science because it creates some very interesting problems where researchers need to integrate data, looking at it on multiple scales from multiple perspectives. 
We have multiple data types, and our goal is to provide a service that makes it possible to add value to those data sets, in part by making them easily discoverable, and in some cases by making it possible to combine them in new ways or to do computation across heterogeneous data. So one of the data challenges we're facing is heterogeneity. We will encounter everything from some pretty complex GIS systems to hydrology models to streaming data to data sitting in spreadsheets to standard numeric files. And because sustainability science looks at interactions between humans and natural systems, there's a whole set of social and economic data that comes along with it. We're talking about multiple scales of granularity, for example data that's captured at a very fine-grained point versus data collected on the basis of an entire region. It is inherently multidisciplinary and, as I said, involves many small data sets. One of the things that we are going to focus on, in fact, are derived data sets, where someone may have taken data from a very large data collection, selected data about a particular aspect of that data set, combined it with some other collections, and created derived data that has had a lot of time and effort invested in making it a useful resource for this community. And we're really driven by looking at the economics of the long tail, the data out there in the long tail. As I mentioned, these are small and derived data sets, heterogeneous, from many, many sources. Some of this data is coming from state and local governments. Some of it is coming from individuals out in the field. Some may be coming from large standardized surveys. And these data become valuable when you can discover related data of relevance and combine it to address a particular problem. And so that's the area in which we are trying to make a contribution. 
Now, in terms of our overall vision, one of the things that we're trying to do is develop a place between the researchers and their labs and the repositories, where, with very low barriers to entry, researchers can deposit data without strong metadata requirements. We will secure that data for them. We will let them control the initial access to it. And using this idea of both social media and active social curation, as others become interested in the data and discover it, we will provide capabilities for things like commenting, recommending data, and possibly adding metadata to it. So we want to leverage social media; we're in the process of installing an instance of VIVO for our first set of researchers. We are hoping to move data curation upstream in the data lifecycle, not necessarily by expecting researchers to make their data conform to long-term preservation requirements, but by getting a better mutual understanding of what it takes to curate data like this and what their specific needs are. We want to involve domain scientists in setting priorities for the way SEAD evolves, and also, I think, for what kinds of data we actually invest in. There's been, to my mind, too little discussion about what data we should be focusing on. And there are, I think, illusions that we'll just save everything because storage is cheap. I think things like mandated data management plans almost push people toward just saving data, and not necessarily toward focused investment in all of the things that might be necessary to make that data useful, preservable, and accessible. And then there's taking advantage of existing infrastructures, as I mentioned. I think I've covered most of these points about active and social curation. And I would say we are taking a measured approach to the idea of outsourcing or crowdsourcing some of the value-added activities. We're not expecting armies of schoolchildren to want to mark up streamflow data from the Illinois River. 
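To make the active social curation idea concrete, here is a minimal sketch of the kind of record it implies: a dataset deposited with little or no metadata, with access controlled by the depositor, that accumulates comments, ratings, and community-added metadata over time. All of the class and field names here are illustrative assumptions, not an actual SEAD schema.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """One community contribution attached to a dataset."""
    author: str
    kind: str          # "comment", "rating", or "metadata"
    body: str
    rating: int = 0    # only meaningful when kind == "rating"

@dataclass
class DatasetRecord:
    """A deposited dataset with no required metadata up front."""
    dataset_id: str
    depositor: str
    access: str = "private"  # depositor controls initial access
    annotations: list = field(default_factory=list)

    def annotate(self, ann: Annotation) -> None:
        self.annotations.append(ann)

    def mean_rating(self) -> float:
        ratings = [a.rating for a in self.annotations if a.kind == "rating"]
        return sum(ratings) / len(ratings) if ratings else 0.0

# A depositor uploads with minimal metadata; others enrich it later.
rec = DatasetRecord("streamflow-il-2010", depositor="hydrologist1")
rec.annotate(Annotation("colleague", "metadata", "units: cubic feet per second"))
rec.annotate(Annotation("reviewer", "rating", "useful calibration set", rating=4))
print(rec.mean_rating())  # 4.0
```

The mean rating is one simple way such contributions could feed the "equivalent to peer review" idea: an aggregate community signal about which data is worth investing in.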
But we do think that within a limited community, where there's ongoing research and an ongoing interest, other researchers with expertise and a need for this data may make those kinds of contributions. And importantly, the idea of people reviewing and commenting on and rating data has a lot of potential, I should say, as at least an experiment with an equivalent to peer review in the data world. Just quickly, our status: we started the 1st of October, so we're at the beginning of our third month. And we hope to have, we plan to have, we will have a working prototype 18 months into the project, which hopefully will pass muster and then allow us to expand out. I was going to say how I thought we might expand, but as I said, we're going to let the users help us figure out how it's going to evolve, so I'm not going to make any more comments on that. This is just a list of some of our key personnel. And my time is up, so I am going to just acknowledge NSF and turn it over to Robert to talk a little bit more about the specifics. Thank you, Margaret. Just a quick switch here with my PowerPoint and I'll be back on track. All right, quickly through this. This takes you to our wonderful new website; if you want a QR code for that, I've got one at the end. What I'm going to focus on here is a little more detail about the lifecycle support, the actionable data services, what we mean by actionable within the SEAD ecosystem, and how that's integrated with a curation and preservation infrastructure. And I just want to thank all of my colleagues here, especially Jim Myers at Rensselaer, Beth Plale at IU, and Bryan Beecher at ICPSR. We spent a long time working on this framework, and of course retooling it every six months for a couple of years. 
And now, of course, we're retooling it again to really mesh with, like Margaret said, the functional specifications that we have from our science partners. So just a little bit about the challenges that everybody knows are out there right now. Managed data storage and services are not cheap; they're quite expensive when they're managed. Now, of course, you can go get a terabyte of disk space, put something on it, and not manage it, and that's very cheap, very affordable. But the FTE cost of managing that piece, and managing it within the enterprise at scale, is the component that we're going to be dealing with here. And as our partners at ICPSR know, begging for metadata doesn't work. Actually paying for it did work in a big project for them, but of course we're not paying for metadata here. So we're going to have to use, much like Cliff was talking about earlier, the best of our machine intelligence agents and processes to try to sift through that metadata, enrich it over time, and capture as much as possible directly from the data source of the experiment. The other big piece that Margaret talked about is that the long tail is not standardized. We're kind of working on the fly here, in an agile process, toward what we build to glue together the types of community-source software that we want to use to put our final framework together. We know that the data models we're working from will evolve over time, and we also know that every six months the cyberinfrastructure picture changes. And at the bottom here, we had this piece originally in some of our slides where we said, you know, if you build it, they may not come. Well, we all know that if you build some giant thing and you don't build the community first, or build it and leverage all of it as you go, you're not going to have enough users, because it's not going to meet the needs of those users. 
So that's a key component of what we're really working on here with SEAD: trying to do that with our scientific communities, with Illinois River basin scientists. Now, a little more detail about what we're talking about here. I found it interesting because, when I put these slides together, I thought, oh, this is not going to be new for anybody, or people aren't going to be that interested in what I'm talking about today. But then I heard Cliff's talk, and it was a refreshing opening for a lot of what I'm talking about, because of our social networking component. And by social networking component, it's easy to look at it and say, oh, you're talking about Facebook. I'm really talking about science of science, and how you can do analytics across the right type of data with science-of-science approaches: looking at co-authorship and co-funding, and how you actually put that together in a way that could be linked data across repositories. That's one of the interesting things we want to try here with our repository partnerships. And so at this point in my talk, I wanted to ask: who here has an institutional repository at their home institution? Raise your hands. Well, that's great. How easy is it for you to take some of your data in that repository and move it to another repository? It's still hard for me, but I've got my hand raised; I know how I could do it. That's going to be the next step in terms of how we interoperate together as repositories. And as you saw there, almost everybody here had some kind of institutional repository. Another question: how many of you are publishing data in that institutional repository now? All right, we've got a few folks. That's great, because that's the key piece we're going to be focused on here. I think there are other elements of linked data and sharing that can happen there. 
But the piece that we're most interested in right now is getting that data published and put out there within our federated repository working group, so we have a permanent place for that data to live, and in particular so that we share the infrastructure from our institutions. I'm going to get into a little more of that later on. But the big piece that we're seeing there now, and it was interesting, I've heard the micro-citation concept explained a few different times now by people who work in the areas of chemical information and big pharma. And at a talk one day I heard somebody compare it to this: what if your scholarly communication model evolved in such a way that every software update you pushed to GitHub was a micro-citation and gave you credit for that in terms of a publication? That's exactly what we're talking about here: how we're publishing our data within SEAD, how we want to move that forward for the long term, and tying it to the social networking components that we're going to use. In particular, we're going to use an open source product, VIVO. There are other pieces out there that are open source, and there are other pieces that are vendor-based, like Collexis, which is now part of SciVal. And I thought that was of interest from Cliff's opening statement in terms of the metrics piece, because I did a five-minute lightning talk on metrics not too long ago and I've had more requests for a follow-up on that, just because it seems like that's at the point of interest now for research institutions. But if you take a look at this, this would be like an open data model from a chemical information component. And then this is an actual live one that we have up in our VIVO instance at IU that's mapping Katy Börner's publications to a scientific map of disciplines. 
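The co-authorship analytics just described can be illustrated with a toy example: given publication records of the sort that might be harvested from researcher profiles across repositories, count how often each pair of researchers appears together. The records below are invented for illustration; a real science-of-science pipeline would work over linked data rather than an in-memory list.

```python
from itertools import combinations
from collections import Counter

# Invented publication records standing in for harvested profile data.
publications = [
    {"title": "Sediment transport in the Illinois River", "authors": ["kumar", "myers"]},
    {"title": "Streamflow data curation",                 "authors": ["kumar", "myers", "plale"]},
    {"title": "Linked data for repositories",             "authors": ["mcdonald", "plale"]},
]

# Count co-authorship ties: each unordered pair of authors on a paper
# adds one edge weight to the collaboration graph.
coauthor_counts = Counter()
for pub in publications:
    for pair in combinations(sorted(pub["authors"]), 2):
        coauthor_counts[pair] += 1

print(coauthor_counts[("kumar", "myers")])  # 2 (they share two papers)
```

The same pair-counting idea extends to co-funding (pairs of investigators on a grant), and the resulting weighted graph is what gets projected onto a map of disciplines.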
And one of the things we want to do with our science-of-science component of SEAD is to actually be able to do that with the content of our data sets across disciplines, with that kind of automated mapping. And this is nothing that took work; it was all built into the processes that are there in VIVO. So some of the key SEAD questions that we really have to ask ourselves moving forward, especially in this 18-month prototype, are: what could SEAD capture, and when, and how much could it capture at the point of origin for some of these scientific experiments? And how can SEAD provide direct value to our data producers, users, and curators? That's going to be make or break for us: whether we can get that functional specification directly from our users built into what we're putting together, what I call enhanced glue for the types of open source tools we're assembling for people to use this way. And how can we use robust web services and social computing to lower barriers and reduce and realign costs for that mass crowdsourced curation? Just a little bit more about the repository. The three pieces that I'm reiterating a little here today are the active curation component, the social networking component, and what we've been calling the virtual repository, which is a thin-layer repository that enables temporary storage of data on its way into our federated repository infrastructure. And that's exactly what we're after here: to leverage existing resources. A couple of years ago I used to give a talk where I would ask people how many of them counted their data in terabytes, and I'm sure most of us could raise our hands at that now. Everything that I'm talking about now really needs to be thought of in petabytes. 
And we have to be looking at leveraging existing infrastructure. One of the key tenets of what we put together for SEAD has always been taking a huge chunk of infrastructure, from IU, from NCSA, from UIUC, and now with Rensselaer on board with their Computational Center for Nanotechnology Innovations, and seeing how that could provide, if you will, a cyberinfrastructure layer for the data throughput into the repositories for the long-term preservation of the content. And you'll see that the repositories we're working with are Michigan's Deep Blue, IU ScholarWorks, ICPSR's repository, and UIUC's IDEALS repository. Just a little bit more; of course, we have to have a few technical pictures here. This is the layer-cake view. If you cut it straight through the middle, you'd see there's a network of data producers who are putting their content in through our active content repository and taking advantage of our social and scientific networking components. That then moves into our virtualized archives, which eventually ends up with the data in its home areas: the institutional repository and the managed storage infrastructure that we're taking advantage of. A little bit more about that. We think of it as having this active workspace that is really able to collect data at the source of the experiment, work with it in the most granular format, and then bring that back into our repository areas. And like I said, toward petascale: we'll see an Internet2 upgrade pretty soon, and we'll see its bandwidth jump from 100 gigabits per second to 8.8 terabits per second. There's a terabit component again. And once we see that, we're going to have to think really hard about how our next-generation data systems sit on that network. They cannot sit in the middle of our institutions, behind many, many firewalls, and expect to have data moved around well. 
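Some back-of-envelope arithmetic shows why that bandwidth jump matters at petabyte scale. This sketch assumes full, sustained link utilization, which real wide-area transfers never achieve, so treat the numbers as upper bounds on throughput, not predictions.

```python
def transfer_hours(size_terabytes: float, link_gbps: float) -> float:
    """Idealized transfer time: size / bandwidth, no protocol overhead."""
    bits = size_terabytes * 1e12 * 8       # decimal terabytes -> bits
    seconds = bits / (link_gbps * 1e9)     # divide by link rate in bits/s
    return seconds / 3600

one_petabyte = 1000.0  # in terabytes

# At today's 100 Gb/s backbone rate:
print(round(transfer_hours(one_petabyte, 100), 1))   # 22.2 hours

# At the upgraded 8.8 Tb/s aggregate rate:
print(round(transfer_hours(one_petabyte, 8800), 2))  # 0.25 hours (~15 min)
```

In practice firewalls, disk speeds, and protocol overhead dominate, which is exactly the speaker's point: next-generation data systems have to sit on the network edge, not behind many layers of institutional firewalls, to get anywhere near these figures.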
They can't sit there if you're going to use some kind of wide-area file system that, through the throughput of the network, makes the data appear as if it's close to computational resources, like we're going to see set up under the NSF XSEDE program. And moving that data will hopefully go from around a 10-day prospect to a 25-hour type of prospect. Now, while our data in SEAD doesn't exactly fit with this, our data causes other types of problems, because it's lots of small files, which is hard on hierarchical storage systems and on compute power. We also want to federate our repositories, and lots of repositories will have lots of small files; that will end up being at a better scale once we start being able to share a little bit better. So, within our 18-month prototype, we're going to target these key components of active and social content curation. We'll have a pilot active content repository using VIVO. We'll have an exemplary service for data ingest, discovery, reuse, and curation. And within that, we'll also have our virtual archive for long-term access. We'll have our data model in place, our protocol design and development, and our pilot federated repository infrastructure in some sort of test in that 18-month period. And just a little bit to leave you with. I'm not going to read all of these, because some of them I've already covered in my slides, but to re-emphasize the key components: to have community input for agile development of what we build as services, and to have an active curation layer that actually pulls in data as close to the experiment as possible, and is able to derive from that information about who created it, what they're publishing, and how that ties in with the rest of the scholarly communications stack. 
And then we'll leverage existing institutional resources for long-term access rather than trying to build those resources ourselves. We're going to tie together the institutions' existing resources that I talked about in new ways. And I think we'll see, especially with this Internet2 upgrade, different types of data systems emerging at some of these cyberinfrastructure powerhouse places, because they're going to build new types of systems that will be about data brokering, whether that's moving data from one cloud to another or getting the data close enough to a computational system. The other big piece here is the sustainability and resource growth, partnership, and collaboration that we want to engender with what we're doing. A lot of that derives from the fact that our core cyberinfrastructure team has worked a great deal on software development, on long-term sustainability models for it, and on how to build communities for that type of software. The tools that we're using here already have communities around them that use the software in other ways than what we're doing with SEAD, but for the long term we hope to give back to those communities with what we actually build from our scientific community input. And with that, I think we're ready for questions, and I hope we'll have really good questions and good input. We had a recent panel of DataNets and I walked the floor kind of like Phil Donahue, so I hope you're not going to make me do that in this big hall today. Questions? Yeah. Well, if you could go to them, it might be great, because they're recording it. Thanks. So, you mentioned, I think Margaret mentioned, that there's still an intention for the DataNet projects to form some kind of network, hence the name DataNet, but could you say anything at all about how you think you'll be collaborating with the other current and recently funded projects? 
Is there anything happening with that, or any discussion within NSF about how that might work? Is this on? Yes. I think there are two mechanisms that are in place, maybe not operational yet. In late January there is going to be a meeting in Indiana of the PIs of the DataNets and the INTEROP projects, and that's going to be the first time that we've been formally brought together. It's a challenge. The call itself asked how you were going to interoperate with these other DataNets, but you couldn't know what they were, because no one knew yet. Now that we know the final set and what the goals are, I think we have some concrete basis for moving forward. The other possibility is smaller proposals for Research Coordination Network awards, which I am just beginning to understand. But certainly, with the expectations of collaboration and working on interoperability on very much trimmed-back travel budgets, we're going to have to get some additional resources to have some workshops. I should also add that I think it would be unfortunate if the focus was only on the DataNets and the INTEROP projects. There's going to be interoperability on numerous levels, and the piece that we're hoping to contribute is interoperability between our service and institutional repositories, but I could see interoperability in lots of other dimensions as well. So those are the two initial concrete things that I see, and maybe Robert has some more ideas. Well, I also see another interoperability layer there from the next generation of data utilities. We're going to see that they'll be for much larger data, big data, and will help it either move around or get closer to HPC resources. But then finding a way for that derived type of data to find long-term homes in our repositories is, in a lot of ways, the only way to really leverage the resources we have out there. 
Just one other comment on this, and it's moving to a slightly different subject, but I think what's interesting, and Cliff mentioned this in his opening remarks with regard to the data management plan requirements, is that nothing has really shaken down as to whether universities will be providing these services to their researchers, whether domains will be providing them, so you'd have sort of topical repositories, or whether they might be federally funded services of some sort, or commercial. We're looking at this in the sense that VIVO has been deployed on an institutional level; we're trying to deploy it for a domain, and that's an interesting experiment in whether you can move out of the box of the institution and focus on a science-driven domain, using this to tie a community together. Likewise, the institutional repositories are institutionally based, but we're going to try to mix things up quite a bit, not just among those institutional repositories and ICPSR, which is not an institutional repository but a topical one, but also potentially with other institutional repositories. Is that Jeremy? Jeremy. I was wondering, with what you were saying about interoperability with the repositories, with a very different kind of data, what issues you might be encountering or expect, and what steps you've taken to prepare those repositories to handle that kind of data and ensure it can be preserved? So, one of the very first things we're doing, and we'll be starting on it next month, is working with a particular group of sustainability scientists at the National Center for Earth-surface Dynamics at the University of Minnesota. They are an NSF STC that is sunsetting, so they have data, and the infrastructure that's been government funded to collect and work with this data is going to go away. 
Part of what we're trying to do is extract a lot of information about their data types, their data models, and the data standards that they use, to get an understanding of that variety and heterogeneity. I look at the active content repository as kind of a soft landing zone for data that scientists may still be using a little bit, but that they're looking to give up and hand off to someone else to take care of. And it's in that active curation portion of the project, I think, where the real specifics of the integration and interoperability issues at the data level are going to be exposed. Then, on the repository end, although we're much less specific about it because we have this looming 18-month deadline, interoperability between publications and the data that they rest on, or that they represent, is going to be another interoperability problem. So I look at interoperability pretty broadly. Hi. Thanks for the presentation; it was great to hear about your project. I was wondering if you could talk a little bit about your plans in the prototype for public access and discovery of what's in the repositories? Well, there will actually be a component of that, but it'll probably be a lighter-weight piece now than some of the other parts that I mentioned around the actual ingest of data, the active curation component, and the social networking component. I think more of our discovery layer will eventually be derived from that social networking component, but a lot of it will, of course, take a little bit of work in gathering that data first and then being able to do something with it in some kind of faceted approach, and then letting you dig into it based on, say, who published the data and how that works. 
But that eventually will be there; the core elements we're working on in that 18-month timeline will probably be less focused on that and more on how we get the data we need for the network components, for the actual researchers, and how we tie that to the repository components. I think for the prototype, access to the active content repository is going to be pretty restricted to a specific designated community, because this is data that may not be ready for prime time yet and may have all kinds of other restrictions on it. When it passes the barrier from the active content repository into institutional repositories, certainly the public access dimension will very much come into play. And we have questions about what the policies need to be around the active content repository, because if you make it too open, it's a real disincentive for people to put their data out there: it's not really clean, or they might get scooped, or there are proprietary or confidentiality concerns. So, as I said, it's a kind of intermediary place to hold the data, to do some value-added activities, and to get a sense from the community itself of who thinks this data is valuable and worth taking to the next step. A different question, then. Coming back to this relationship between disciplinary services and institutional infrastructure: in earlier runs at the DataNet program, sustainability was a big part of the projects, and I wondered if your project has any components of that still, and if not, how you're thinking about that, since this has been a big problem for us for a long time, trying to serve disciplinary needs with institutional funding. So, I guess one aspect of this being the long tail is that this data is dispersed among multiple institutional repositories and requires relatively little curation after it leaves the active curation process. 
From a cost perspective, an assumption I have is that it hides in there with lots of other little data sets. I don't mean that in quite such a covert manner, but part of our assumption was that if institutional repositories are going to be part of the infrastructure, then we have to count on being able to rely on them. Now, we're also talking about redundancy, so that we're not dependent on a single institutional repository. But at this point, we had an entire faculty person devoted to looking at sustainability models, and the funding for that is gone. I would say that until we get through the prototype, we don't have focused resources to validate or challenge any of the assumptions we have about institutional repositories willingly and readily taking this on. Now, I will say that what I hear from the other direction is that many institutional repositories are interested in getting into the data business but are having some difficulty figuring out which data, which types of data, and how to get the data, and I see us greasing the wheels in that regard.

So I think I mis-asked that question, because I'm not doubting that institutional repositories will continue to play a role in this. It's your active repository, the front end.

Oh, the front end.

That is a centralized service that your project is building and running. It isn't a distributed, federated thing, and it's aimed at a discipline. So I don't quite see yet how that fits into this ecosystem we're talking about, and whether you're planning to work on the sustainability of that piece of it. You see?

Yeah.
And I think some of that comes into play with the committed resources we do have from IU, and with what we've done in recent years: MOUs with other places like the Texas Advanced Computing Center and with Rensselaer on how we swap out data resources that are committed for the long term for this type of activity. Now, that being said, how big does that have to get before the institution says, oh, we can't do any more of that, right?

Excuse me. Yeah, thanks for the clarification. I guess I would say one other piece of this is that for the curation work, adding metadata, finding errors, and improving documentation, we are looking at a model of micro-contributions by scientists themselves as they use this data. Whether that is just a pipe dream or something we can actually enable remains to be seen. But I think you're pointing to a very important potential weakness in the fact that we have a grant to fund something that's supposed to be sustainable.

My question, in a sense, follows on from that, but it also touches on something I think Robert brought out in his introduction about the issues of moving content between repositories. In a sense, when sustainability goes wrong, as we've seen with a number of domain systems, you're forced into the problem of moving content around between different technical infrastructures and different organizational infrastructures. Now, I'm aware that in your project you are looking at moving content between the active curation domain and the institutional domains. What wasn't clear to me is whether you're also going to test out moving content between domains that are similar in function but maybe different technologically or organizationally, say moving between institutions. There's been a very small amount of work done on this, but I think not nearly enough, and we could do with a lot more knowledge.
Yeah, I think that's something we would like to have as a real opportunity, especially with the upcoming Internet2 upgrade I mentioned, because everyone always says, well, this is an answered question. Well, not if you really want the data moved in any normal amount of time, right? Or unless you write it to disk and actually ship it through FedEx to do it reliably every time. That's what I was talking a little about before. One of the interesting parts about when we started this DataNet was getting the CIOs from the institutions on board, because they understand that this coming piece, for them, is going to be more about where these resources are located in terms of the network than anything else, because of what you're talking about. And in fact, IU has committed a certain amount of data capacity for this activity, to grow over a certain period of time, and I see us using that with what we have now with our Data Capacitor, and even moving toward our next-generation system, which has yet to be put out there. Its whole modus operandi will be about brokering data movement across networks at real scale. So I think there's a lot of opportunity for finding out more about what really needs to be there and making sure we're able to do some of that with this project.

Those are interesting points, I mean the technical ones about the network, but I'm also thinking about how institutions do disappear. They merge, they split, and responsibility for domain repositories comes and goes as well. We're then faced with challenges that are partly technical but partly very, very different, in trying to move content between entirely different management domains with different concerns. And often people haven't planned for that from the outset: they've thought about the changes that happen within their institution, but not the changes that happen when somebody else needs to take it on.

Yeah.
ICPSR celebrated its 50th anniversary in October. And the University of Michigan will celebrate its 200th anniversary in 2017. So I don't know. As institutions go, I think universities and university libraries are pretty stable things, certainly relative to, say, cloud services, or a start-up data center that worries later about what to do with the data when the project ends.

At the risk of dominating the conversation: I don't disagree that, on average, universities are long-lived institutions, and ICPSR is also a great example of a long-lived data repository. But it only takes a few to change, and in the UK alone I can give you enough examples of institutions disappearing, going bankrupt, and funders changing their minds about which main repositories they want to support, that it's a real problem. We've seen failures in the past, and it would be good to know how to avoid those failures in the future. Sometimes we will have to move stuff, and I'd like to be more confident that we can do it without pain. And I'm interested to know whether your project is going to help us answer that.

I was wondering if you are addressing any of the usability issues behind data preservation. One of the inherent things with data is that it's much more than bits and bytes: data sets come with readme files and contextual information, and interpretation is very domain-specific and complex. So you could imagine that a data set would be readable 50 years from now, but would not make any sense to those who want to use it in a scientific sense.

Where I think we might make a little headway in that regard, and what makes a domain like sustainability science interesting for addressing that challenge, is that you have a hydrologist, a soil scientist, and an urban planner, and they're all looking at what will happen if we increase the amount of impervious surface in this river basin by 25%.
They all need data that you can somehow integrate to get a reasonable answer, or set of parameters, for that kind of problem. I don't think we can start solving these problems by saying we're going to make this very specialized data understandable to anybody else. But in a pairwise, step-by-step way, if we can get a soil scientist to understand the hydrologist's data, then maybe the next step is that we can get an economist to understand the model that the urban planner, the hydrologist, and the soil scientist put together. Whether that is scalable I don't know, but I think it begins to address one piece of what could otherwise be an intractable problem.

I think we have time for one more question.

Okay, well, thank you very much for your participation. It's been very helpful.