 I am just delighted to see everybody here, and let me welcome you to the fall 2009 CNI membership meeting. I am really pleased that you all were able to get here and join us, that we seem to have had the weather mostly behaving, that everybody was able to navigate the journey, a special welcome to our international participants. I know that traveling internationally is not getting simpler these days, and I particularly am grateful for the effort you've all made to be here. We actually have a lot to talk about today. This has been a strange year, but one that's actually had some quite remarkable things taking place, both the good and the bad. I'm going to comment on a few of those as they affect our agenda particularly, and talk a bit about what we've accomplished in the last year, and some of the things I see coming in the next year. I'm going to aim to go for about an hour, and that should leave us with a good deal of time for conversation at the end. I puzzled a great deal about where to start this. There's always plenty happening, the question is which thread to pull on first. That question actually as it sometimes does, answered itself as I looked at various newsworthy events that have been taking place in the last couple of days. Who would have imagined seeing a call for public comment from the Office of Science and Technology Policy on behalf of the White House about open access to federally funded scientific and other scholarly research? I certainly didn't expect to be reading that last week, and I think it's quite a remarkable thing to see a call like that come out. It really to me indicates just how profoundly the thinking not just of a few people within the academy, but of government and of the society more broadly, just how big that shift has been over let's say the last decade or so. 
The whole way we think about the sharing of scholarly research and knowledge and who should get to see it and on what terms, and the way that the information technology environment and the network has become absolutely integral to that whole thinking about dissemination. To me, that was kind of a watershed moment. When you put that in conjunction with the steady drumbeat of open access mandates that are being adopted at institutional levels, as you see the discussions which to me are perhaps even more important about open data and about sharing of data and expectations about data sharing, I think we see a really substantial shift in opinion here. I also just want to note there that it's not all government and higher ed. There are lots of pieces of this puzzle. For example, as you will hear in one of the breakout sessions today, the National Academies has launched a data board to kind of organize the broader U.S. policy dialogue about scientific data in particular. And that again seems to me to be an important step. One of the things I'll just note before moving on from open access, which is something certainly that CNI is interested in, but perhaps more for its implications than its essence. There are other organizations that we collaborate with who are much more focused on sort of the philosophical essence of open access. I note that there is less talk about open access as an economic panacea. It's much more about doing something that's right than doing something that's going to magically solve an otherwise intractable financial conundrum. And I think that's an important thing to note because what it means, at least the way I think about it, is that we're going to have to return to seriously grappling with the roots and causes and implications of the financial conundrums in the system of scholarly communication. 
The other thing that's worth noting is that open access and particularly things like open access mandates are going to start having a whole set of rippling infrastructural implications. They are already, for example, reshaping the way a number of institutions are starting to think about their repository strategies and not just their strategies, but indeed the costs and staffing and other implications of those strategies as we go forward. And that takes me to the second of the sort of large observations, which is economics. One of the other things I didn't really think I'd be reading in the paper was the various articles talking about how Harvard has evidently basically stopped work on the Allston Science Campus, that they are putting that kind of on hold, as I understand it. That's a pretty substantial strategy change that's driven by economic developments. And as I've watched economically driven developments over the past year, one of the things I keep thinking about is how quickly are various economic stresses going to be forcing cultural and institutional changes and where are they going to appear? And I've been looking for these in a whole bunch of places. I've been looking for these in terms of faculty activity, in terms of publishing activity, in terms of libraries, in terms of information technology and information technology strategies, in terms of broader institutional positioning. Are we really going to see fundamental retrenchment by higher ed institutions where they eliminate degree programs, eliminate professional schools or departments or things like that? How much of that sort of thing is happening? And it's very hard to get a feeling for this because things happen both faster and slower than you expect them to. I mean, you know, at first you expect these sort of dramatic instantaneous actions, but in fact things play out over a couple of years. As some of you know, I have some long historic ties to the UC system, and particularly to UC Berkeley. 
And I had a very strange moment there a couple of months ago where on the one side, I'm watching the state just massacre the UC budget in dreadful ways, but at the same time they collected another Nobel Prize for the UC Berkeley campus. And the thing that's scary about that is it takes 20 years to grow Nobel Prizes. So, that's a very backwards looking indicator that I fear people will seize on and say, well, you see, they'll muddle through, things are fine. So, you know, I think in those things we see lessons about how difficult it is to understand the timescale implications of some of the decisions that aren't being made. One of the things, though, that is very clear to me is that we are seeing in these economically stressed times a real serious emphasis on new thinking about interinstitutional collaboration. And again, there are limited numbers of things I can specifically point to and tell you about that are operational, but there are an awful lot of conversations I know that are underway. And there's a lot of very good thinking and analysis about how to rationalize things like print collections on a national level. I'm thinking of some of the fine work, for example, that Ithaka has done in this area that helps us think about how we can really rationalize things at scale if we can learn to collaborate at the necessary scale. I think those developments are very promising. One of the other fascinating things I took note of this year was Elinor Ostrom receiving the Nobel Prize in Economics. A few of you may be familiar with her work. I'm indebted to Charlotte Hess some years ago for introducing me to some of the work she did. Basically, what she's been studying throughout her scholarly lifetime is how to manage commons of various kinds, how to manage public and shared spaces. 
Doing something very different from the kind of economics that have been honored by a whole series of Nobel Prizes over the past decade or so that are basically about the magic of markets and of various kinds of quantitative mechanisms for understanding them. I think that's a particularly appropriate kind of thing to see at this stage as we reassess how we can best deploy resources across the education and research enterprise. I've been spending a lot of time this year also, as have a few of my colleagues in the audience, on various things around sustainability. Notably the slightly pretentiously named Blue Ribbon Task Force on Sustainable Digital Preservation that has been sponsored by a number of funders to really take a look at funding models for various kinds of scholarly resources across time. This has been very interesting. It's amazingly hard. I'll just mention a couple of conclusions that I personally have come to out of this. And I don't want to suggest any of these as the conclusions of this group or any of the other groups I've been involved in looking at this. The first is that there is a place for markets in here and sometimes that works. The second is that there is a place for circles of gifts and sometimes that works and that we would be very well advised collectively to spend a little time understanding how to make that work better because often we fall back on market mechanisms because we don't know how to make circles of gifts work or in some cases because we're just determined to prevent them from working. I mean, it's quite striking. Many institutions, particularly state institutions, can buy all the things they want. What they're not allowed to do is make contributions to a common good easily. That's the sort of thing we've really got to take a look at. The last thing that I really have come away with as a strong conclusion from this is that there really is also a place for public goods in this and that we don't like to come to that conclusion. 
There are a lot of people who are very uncomfortable saying that but the honest truth is there are some things that it really does make sense to treat as common public goods. One of the things that I've been wrestling with a great deal had at least one of its starts in some of the discussions about usage policies for bibliographic records and the challenges that OCLC has been facing trying to come up with a strategy there. Again, I want to kind of distance my remarks from anything specific that OCLC or its advisory groups want to do and simply reflect that the nature of a lot of content resources is really starting to change now as we move much more deeply into a networked environment. We have a form of older thinking that dates back to the 70s, the 80s, the 90s where you have these centralized database services and of course the network allows you ever more sophisticated ways to reach those things, to search them in some cases, to do cross-searching between resources. But what's happening now, I think, is that we're starting to see certain kinds of knowledge resources literally diffuse into the infrastructure. I'm thinking of names, of gazetteers, of authority files or things we once called authority files, and really what they are is databases of names. I'm thinking of biographical kinds of things, factual biography that might be attached to these. I'm thinking of chemical structures. These don't make so much sense anymore as centralized databases. They're things that are diffused and linked and used as adjuncts to all kinds of computational and searching and annotation and matching activities. Some of this is, I think, very connected to the thinking about linked data that's becoming popular in the web world, although I don't want to narrow this observation to that specific technology. I think that sort of linked data thinking, though, is a manifestation of this. 
I think that when we look at this kind of diffused intellectual resource, it really does take on an infrastructural quality, and it becomes very, very tough to figure out how to finance this as anything other than a sort of a commons. It's a very big challenge, and I think it's one that is very much going to be upon us in the next years as we move more and more into this kind of diffused and integrative data environment. We're already seeing a lot of signs of this in all sorts of scholarly work. I mean, it's not just the sort of big files that serve many, many different application areas, like, say, a gazetteer of place names, but also much more specific kinds of things that serve scholarship: species names, grammars, these kinds of tools. So I think we need to do some serious thinking there as we look at the sustainability picture. But I want to move on from a few of these sort of broad reflections and talk a little bit more specifically about CNI, about developments on our agenda, about our program plan and how our work fits within these kinds of really broad frameworks. And I'm going to try and step fairly rapidly through this because there really is a lot I want to talk about. One thing I should say, I should just take a couple of minutes and say that CNI itself, at least in my view, remains a strong and vital organization. It's an organization that I think continues to carry out a number of really essential roles. We're an organization that continues to be financially, I think, sound, although we have, like everybody, been watching the financial picture very carefully, and we've held our dues flat for the last year. One thing I do want to share with you is, like all other membership organizations, we budgeted and prepared for the prospect that we would perhaps lose a few members this year due to economic pressures on those institutions. 
Indeed, we have lost a couple of members and lost them generally with wishes that they will be back with us just as soon as they can in the hopes of slightly better days. What we had not planned for, and I just want to share with you, is new members in this time of stress. I'd like to welcome six new members who've joined us during this year. The University of Texas at San Antonio, the University of Idaho, Loyola University in Chicago, Pepperdine, the University of North Carolina, Charlotte, and Gale Cengage Learning. All of those organizations have joined us, even in these difficult times, to work with the rest of our members in advancing our agenda of digital content in the support of teaching, learning, and scholarship. I think that's just a remarkable thing. I also want to note just quickly that recognizing the financial pressures that many of our members are under, especially around areas like travel, we have done a couple of things. We took some video from the last meeting, and we've put that up on the net, and that has been well received. We are capturing a couple of the plenary sessions, this one, Bernie Frischer's closing plenary, and a couple of selected breakouts on video. One of the things I should warn you about is that means if you ask a question later, or if you ask Bernie a question, and you're in this room, someone's liable to come running up to you with a wireless microphone, or I'll try and repeat the question, and you might get to be on TV too. We're not taking kind of random audience scans, but I just thought I should mention that. So we are trying to get at least some bits of this meeting besides, of course, all of the documentation from the breakout sessions out to you so that you can view them and share them with your colleagues out on the net. The other thing we've started is something called CNI Conversations, and this is a thing we're doing about once a month. It's an hour to an hour and 15 minute phone call. 
We send out information on this to our member representatives, so those of you who are member reps should be getting this. Those of you who need to hear about it and haven't, talk to your member reps. Basically, this is just an opportunity for me to catch you up with a few recent events and have some conversation. And again, it's also a good way to connect with not just the two member reps who typically represent our members at this meeting, but larger numbers of people within our member institutions. That's been pretty successful so far. We've been putting up after the Conversations audio for public access, audio recordings, and those are available through the CNI website. So I commend that as another good vehicle for staying in touch in between meetings. I would warn those of you who are looking forward to the one on Thursday that I think we're probably going to spend most of that summing up this meeting for at least a few highlights of this meeting for those who couldn't be there. So the December one may be the least exciting of the lot for those of you here. In terms of program, you've got in your book, I mean in your registration packet, the 2009-2010 program plan. And I'm not going to take you through all of it, but I am going to highlight a few areas there. I would say just a couple of things about the program plan generally. The first is it's sort of a statement of intentions as of November 2009. I have a feeling that like all plans it will become adjusted between November 2009 and November 2010. So take it for what it is. It's a view of our thinking at a specific point in time, and our thinking is not going to stay static for a year. The other thing I'd say is please use this to the extent it's helpful as a communications vehicle. 
Not just from CNI to its member reps, but as a way of helping others in your institution or beyond your institution understand what we're doing and why it's significant and how it might connect to other activities that are underway. And please let us know if you need additional print copies of that. Now just to remind you, our major programmatic work kind of falls into three not totally discrete areas. One that we think of as dealing with content, another with organizations and institutional practice, and a third with technology and infrastructure. And I've got a few developments and things that I want to comment on in each of those areas. I'll start with the area of digital content. And one of the things that I want to note there that I think is very significant is the way ideas about data curation, which came, of course, out of the thinking about e-science, e-research, and cyberinfrastructure, are now starting to become much more sharply delineated from ideas about digital preservation. Certainly the tools and capabilities to preserve digital material are an important part of the toolkit for data curation. But the areas really have somewhat different focus. The discussions about digital preservation historically have dealt with how do we keep things and keep them in forms that would be usable and interpretable for a very long period of time. That's the essential digital preservation challenge. And it was of course recognized as we set out probably in the early 90s to really seriously grapple as a community with digital preservation. It was recognized that there were some questions about, well, how do you decide what to preserve and what not to preserve and how long to keep it, and that sort of thing. But those were always kind of kept at the periphery of the discussions. The heart of it was how do you organizationally, legally, technically preserve material for long periods of time in ways that its meaning is conserved. 
Data curation started out as sort of, well, one of the things we need to digitally preserve is scientific and scholarly data sets. But it's rapidly taken on a very different character where we think now about the goal of this activity as being reuse often, not merely preservation. We think about content life cycles. We think about what you need to do in order to facilitate discovery and reuse. We think about educational programs to train people, to work with scientists and scholars to curate data in order to achieve these kinds of goals. This is really, to me, the emergence of a new discipline in some sense that is now separating itself off, although continuing to be linked in complicated and fruitful ways with our more longstanding concerns about digital preservation. I think that is a very powerful development. I think it has enormous implications for stewardship and memory organizations of all kinds. And I think it's one that we need to reflect very carefully about. In the area of preservation itself, we are seeing also lots of action. Everything from the transition of the NDIIPP program at the Library of Congress from a sort of a project to a permanent program on one side, all the way through a lot of fairly serious and focused technical and engineering work on the longevity of storage media and their failure modes. You've got a taste of some of the thinking here in, for example, David Rosenthal's keynote at a recent CNI meeting, and if you haven't looked at that, that was one of the videos we made available, and I'd urge you to have a look at it. One of the ideas that I find very compelling and that CNI has been looking at a little bit in collaboration with CLIR and some of the folks at the Data Conservancy effort at Johns Hopkins is questions about the resilience of data preservation strategies. And I'll explain this a little bit both by analogy and by sort of high level example just to give you a flavor here. 
One of the things that's fairly clear is that when you put enough data together, there's no way you can put in enough redundancy so that over time you won't lose some. It's just the way things work. When you get a big enough collection of things, some of them will always be broken. That's one of the basic things in life. And we're starting to see this more and more in all kinds of systems that we develop. It's very interesting that this is becoming suddenly a thematic problem in high-performance computing because the path forward to ever more capable high-performance computational gear now is parallelism. So rather than building a processor that runs twice as fast, what we do is we build 100 processors for the price of one processor last year. And again, when you get enough of these together, some of them don't work. And this is producing enormously difficult challenges for software engineering and algorithm design. I commend your attention to some really frightening papers on resilient computing and the exaflop regime to see just how hard these problems are and how few good ideas we have for dealing with them. We have exactly the same problems showing up in storage, at least at a philosophical level. We are going to lose bits. And unfortunately, the general engineering approach has been, well, we will assume a perfect substrate and throw enough redundancy money into the storage substrate that it's perfect. And then we can go on with some sort of tidy and rational software engineering above it. And it doesn't look like that's quite going to be the case. What we really need is algorithms and data structures and things like that that begin to be robust in the face of limited failures. In other words, when something goes wrong, it would be much better to lose one image than a database of 600,000 images. 
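To put a rough number on that point about scale, here is a minimal sketch. It assumes every item fails independently with the same small probability over some period, which is a simplification; the function names and the one-in-a-million figure are illustrative assumptions, not measurements from any real collection.

```python
import math


def prob_any_loss(n_items: int, p_item: float) -> float:
    """Chance that at least one of n_items is damaged.

    Assumes each item is damaged independently with probability p_item.
    Computes 1 - (1 - p_item) ** n_items using log1p/expm1, which stays
    accurate when p_item is very small.
    """
    return -math.expm1(n_items * math.log1p(-p_item))


def expected_losses(n_items: int, p_item: float) -> float:
    """Average number of damaged items under the same assumption."""
    return n_items * p_item


if __name__ == "__main__":
    # Hypothetical figure: a one-in-a-million chance of damage per item.
    p = 1e-6
    for n in (1_000, 600_000, 100_000_000):
        print(n, round(prob_any_loss(n, p), 4), expected_losses(n, p))
```

Under these assumptions the chance of losing something climbs toward certainty as the collection grows, however small the per-item risk is, which is why the design question becomes how much is lost per failure rather than whether a failure happens.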
It's interesting that this actually underlies the sort of old, I don't know what to call it, it's sort of an engineering myth of digital preservation that one should not store things in compressed form. And if your storage substrate is fine and your compression is lossless, of course you should store things in compressed form. It's much cheaper. But the nature of compression, and this can be made more precise in various ways, is that you become much more vulnerable to individual bit errors. Losing a bit causes much more mischief in a compressed file than an uncompressed file. It's the difference between having a pixel flip in an image and having an image file that's basically uninterpretable. Those are the kinds of things that now are starting to surface as we think about digital preservation. And I think it's high time that these start getting greater attention. I also want to note that there is a big policy discussion waiting in the wings about digital preservation and cultural memory. And some of this ties to work like the work of the Section 108 Committee that looked at the special exemptions and permissions that accrue to libraries and archives for maintaining cultural memory, essentially. And how those need to be adjusted in the digital age. But it seems like it is time for a much broader thinking about this. About things that might be done to encourage the movement of material into the sort of public memory and public domain in digital form, or at least have representations of it in that form. So I think that there are some very significant discussions that are getting ready to happen there. And you can see many threads leading into that. The orphan works conversations about Section 108. One can imagine a number of issues around things like tax policy that might come into play here as well. So I think we should not overlook the need for continued attention on the legal and policy side of preservation. 
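The earlier point about compression and bit errors can be tried out directly. The sketch below is only an illustration under stated assumptions: it uses Python's standard zlib module as a stand-in for whatever lossless compression a repository might use, a synthetic byte pattern as a stand-in for an uncompressed image, and hypothetical helper names (flip_bit, bytes_damaged).

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def bytes_damaged(original: bytes, damaged: bytes) -> int:
    """Count positions that differ, plus any difference in length."""
    differing = sum(a != b for a, b in zip(original, damaged))
    return differing + abs(len(original) - len(damaged))


# Synthetic 200 x 200 one-byte-per-pixel raster (illustrative data only).
original = bytes((x * y) % 251 for y in range(200) for x in range(200))
compressed = zlib.compress(original)

# Case 1: one flipped bit in the uncompressed file touches one byte.
raw_damage = bytes_damaged(original, flip_bit(original, 8 * 1000))

# Case 2: one flipped bit in the middle of the compressed stream.
packed_hit = flip_bit(compressed, 8 * (len(compressed) // 2))
try:
    packed_damage = bytes_damaged(original, zlib.decompress(packed_hit))
except zlib.error:
    # zlib rejects the stream (corrupt data or failed integrity check).
    packed_damage = None

print("uncompressed: bytes damaged =", raw_damage)
print("compressed:   bytes damaged =", packed_damage, "(None means unreadable)")
```

In the uncompressed case the damage stays confined to a single byte, the flipped pixel. In the compressed case the decoder typically refuses the whole stream, which is the uninterpretable image file described above.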
Just to wrap up the sort of specifically content oriented things that are very much on our agenda. I want to mention two things. One is the emergence of a set of activities that sometimes go by the name of citizen science. There's a very fine report that Liz Lyon over at UKOLN prepared recently on developments here. And I'll put a pointer to that out on CNI-ANNOUNCE. But what we're starting to see in a number of fields is a whole new regime of data collection, which brings in not just experts and researchers and students and credentialed people, if you will, at universities and federal laboratories and things like that, but interested citizens. And actually, if you look at it the right way, this is also, of course, happening in the humanities. This is one way of reading the interest in genealogy and local history and similar kinds of phenomena, all of which are things that have gone on forever but have been vastly amplified by the networking capabilities of the web. And I think we're going to see those sorts of things feeding more into our thinking about the management of both scholarly and infrastructural databases going forward. The last thing in the content area that I just want to point you to is just a wonderful symposium that I was thrilled CNI could cosponsor along with our colleagues at the Association of Research Libraries. This took place in October and it dealt with the new world of special collections. It looked at the changing role of special collections in teaching and learning and cultural memory. It looked at the implications of digital tools for physical special collections and at least lightly touched upon some of the questions about special collections of the future that may contain large amounts of born digital material. There are some very nice papers that are either out today or out imminently on the ARL website from that conference. 
I did a sort of a summary piece and we left a few copies of it in pre-print just out by the registration desk, but I'd urge you to have a look at those proceedings and perhaps to listen to some of the recordings of some of the sessions which are available through the ARL website if you weren't able to join us for that meeting. It really was, I think, quite extraordinary and really helped to crystallize the key role that special collections are going to play going forward in this kind of networked world and some of the challenges that curators of those collections are going to face. I certainly have come away from this more and more persuaded that this is an area where we're going to see really high-profile technological innovation changing scholarly work on an ongoing basis. So that's a great place to hone your thinking about some of the developments here. Now, one other thing I want to mention that actually spans the infrastructure and content area is some of the work that's going forward on annotation in a network environment. And we'll see a plenary on this, I'm sorry, not a plenary, a breakout session on this at our meeting here. There's a whole team of folks working with funding from the Mellon Foundation on this and this is a really hard and important problem. It's something we've wanted to be able to do for years to be able to attach commentary to networked objects and to do it kind of independent of what silos those are in and do it across silos and to be able to share that material on controlled and uncontrolled bases. There have been probably a hundred research projects that have built annotators that achieved at least some of those objectives, but for one reason or another never got any uptake. There are a couple of things that have been done that are of more limited scope. For example, they only annotate web pages that have a certain vicious simplicity to them and that have foundered largely on social and organizational grounds. 
And we really need to solve this problem. This is a major, major barrier to network-based scholarly work of all kinds. I am very encouraged that this work is going forward and I think it's a very important one to keep an eye on and to try and encourage the success of. And certainly CNI has been trying to help move the thinking on this forward. Let me turn to just a couple of comments on the organizational kind of area. And where to start? I could start with repositories. We had an extraordinary executive round table this morning looking at repositories five years later. Actually, six years later would probably be more accurate. Trying to understand how thinking and roles of repositories have changed. I'm not going to do that. I'm going to simply note that there is a great deal of activity there and that it's being influenced by everything from eScience on one side to lecture capture in the middle to open access mandates on the other side. And we'll be writing some things about that and talking more about it. But it's clear that the landscape here is really changing in complicated ways. I do want to share a couple of comments about some of the component things there. One is that interface with eScience. It's clear that high-performance eResearch needs to be coupled with high-performance storage, access to databases, large databases at data rates that are commensurate with the rest of the computational system. It is very unclear what role repositories, at least as we think about them today, have with that. Because it's not clear that they can meet the kind of bandwidth and performance requirements. So there's a very interesting question about whether we're going to stage information in and out of repositories for doing serious computation on it or not. And that remains unanswered. I do want to note that related to that there is, I think, a complicated set of questions that are starting to emerge at the interface points between data and the networks. 
We have sort of waved our hands about network capacity for the last few years. We've talked about switched wavelengths and 10 gigabit backbones and, you know, owned fiber in the National Lambda Rail and all of that. I would note, by the way, that both the National Lambda Rail and Internet2 are going through sort of planning for technical refresh process cycles. And there is a session at this meeting about some of Internet2's plans in this area. But I think that we really are kind of at a decision point about understanding the realities of how much data we can readily move around, either for replication or from storage to computational nexus on today's networks. And when we have to resort to the old-fashioned shipping of disks or entire computer systems by FedEx. A side piece of this is the extent to which we need to regenerate and revitalize some of the thinking about the models of bringing computation to data, otherwise known as a database system sometimes. But those only work right now for some kind of constrained scenarios as opposed to bringing data to computation. One of the really challenging areas here, interestingly enough, that is starting to get some visibility is text mining in the humanities, where we have ferocious problems about how to normalize data in order to compute on it and how to talk about structuring computations that can be shipped to data. So there's some very interesting questions there that call for a more detailed exploration. In the area of scholarly communication, I just want to note one thing. We can talk a lot later about linking data to articles and things like that, and we've talked about some of those issues before. At the Digital Curation Conference in London two weeks ago, the opening keynoter rattled off a computation that I'd never done before that I have not independently verified, but that I found absolutely terrifying. Two new papers a minute in biology. Five new papers a minute across all of science. 
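As a rough check on what those per-minute rates would mean over a year, here is a small sketch. It takes the quoted figures at face value (they were not independently verified) and assumes papers appear at a steady rate around the clock for a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # steady rate, non-leap year (assumption)


def papers_per_year(papers_per_minute: float) -> float:
    """Scale a per-minute publication rate up to a yearly total."""
    return papers_per_minute * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, rate in (("biology", 2), ("all of science", 5)):
        print(f"{label}: about {papers_per_year(rate):,.0f} papers a year")
```

On those assumptions, two papers a minute works out to roughly a million papers a year, and five a minute to more than two and a half million.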
Those are astounding numbers, if you think about them, and to me suggest that we really have now successfully, by publish or perish and any number of other mechanisms, taken the scholarly communication system, especially around the journal, to the absolute breakdown point. And all of the things that this implies for text mining, for trying to sort out articles that are fundamentally contributions to a database as opposed to those that make more analytic kinds of arguments, the stresses of the peer review system, all of that comes into play here. And I just invite you, as you think about the future of scholarly communication, to think about that number, and I don't really know whether two is the true number or three is the true number, but to just think about that rate of information flow and what it implies for the ability to have actual usable collections of knowledge. I want to say a quick word about some of the conversations that are going on in the world of instructional technology. There's an old idea called learning spaces, old in this business being a few years at least, and one can of course argue that learning spaces were around before we called them learning spaces. We used to call them classrooms and things. And as you know, Joan Lippincott has done wonderful work on particularly informal learning spaces that complement those classroom spaces. There's a new set of ideas that are floating around now that are starting to subsume learning spaces where people speak of learning environments as sort of mixtures of physical and digital environments and behaviors that can facilitate protracted engagement and learning. And there's some very interesting work coming out of particularly the EDUCAUSE Learning Initiative program in this area. It seems to me that there are some very interesting questions about where information resources play in these, but there also are some very interesting deeper questions about how we organize teaching and learning.
It's very striking to me that the sort of unit of operation has historically been the class. You sign up for a course and we think about a learning space for a course or a learning environment for a course. I'm beginning to have nightmares about students signing up for five courses and you get five professors who've all developed these viciously engaging learning environments that each want to pull the student in for 20 hours a day, all operating in competition. And it seems to me that one of the longer term implications of this is that we're going to need to do some thinking about the transition from course as a unit of activity to discipline or degree program or undergraduate experience or something else that begins to let us think about how to use these technologically aided formulations in more effective ways. Closely related to that, of course, we're also seeing the emergence of lecture capture. And we're seeing that connect up to ideas about open educational resources and about opening up education at our institutions. We're also seeing it connected to speculation about the death of the lecture and the dearth of live bodies in the classroom when you capture all the lectures and this kind of material. One of the things that strikes me very much is that a captured lecture is not necessarily a substitute for a lecture. And certainly as it ages, it is very definitely not a substitute for a lecture. A captured 20-year-old lecture is actually a legitimate scholarly genre. And it is perfectly reasonable to ask people to watch them and think about them understanding that it was captured 20 years ago in the same way that you might ask them to read a journal article that was written in 1950, understanding that it has insights to offer and that it is limited in the sense that it doesn't accommodate what happened since 1950. So I think that what we're really going to see here is two things.
One is lecture capture directly in support of the actual course experience in the near term, but also perhaps the development of an additional genre of scholarly material where we're going to have to sort out the sort of conventions around how much we keep, when do we keep it, why do we keep it, and how do we understand its context and its limitations. I also see a very interesting set of conversations showing up again connected to some of this discussion about open education, about the future of textbooks, and about the construction of educational materials, particularly for fairly high-volume introductory courses, historically often lecture or lecture-and-laboratory courses. It's very interesting to me to see the emphasis that this administration is putting on community colleges as one of the very economical access paths into higher education and some of the investments that are going on in the creation of learning materials for some of the high-volume classes that take place in that arena as well as in K through 12 and more traditional university settings. So I think that we'll want to be watching some of those developments pretty carefully and they raise hard questions about roles, about what players should take the roles in the development of learning materials. Right now they take place largely outside of universities and are acquired largely in private transactions between students and publishers. This may move us much more into a site licensing model or something else. I'm mindful of time. There are, of course, lots more things we could talk about here. We could talk about mobile devices, for example. Again, this is an area where Joan Lippincott's been spending a considerable amount of time. I think it's an area that we are only beginning in the most superficial way to understand.
What's going to happen, I think, is that mobile devices will eventually have impacts that are going to be very different from thinking of them as miniaturized laptops you tote around and are going to be much more dependent on their abilities to capture images, to do geospatial locating, and probably to support other kinds of sensors that are going to get integrated into these. There is some fascinating speculative work going on now in areas like very large-scale citizen sensor networks or citizen journalism phenomena. These all tie into these sorts of developments. I think that this is a place where we probably really need to stretch our imagination on one side and not to get totally focused on short-term, easy wins on the other. I want to just close with a couple of quick comments about infrastructural things and particularly cross-institutional infrastructure. I see a lot of serious discussion about this going on, probably more serious than at any time I can remember. A good deal of it, of course, is driven by the economic situation, some of it being driven by new ideas about cloud-based computing. There was a very insightful comment that I want to share with you, and I probably should have asked John first, but John Wilkin said this a couple of weeks ago at a meeting I was at, and it's really been stuck in my head since. He distinguishes between shared problems and common problems, and we need to work on both of them, but they are actually different kinds of problems with somewhat different strategies, and I see both of those sorts of things going on inter-institutionally. Everything from investment in shared infrastructure, things like InCommon, which maybe should have been called InShared in hindsight, all the way through some of the thinking about cloud storage and above-campus level computational facilities. I'm very encouraged by some of the work going on to look at multi-campus scholarly virtual organizations that are discipline-specialized.
Many of you are probably familiar with the Bamboo work, for example, which is looking at this in some areas of the humanities. There are, of course, a whole pile of initiatives in the sciences and more on the way. It is interesting to me that in Europe, the European Commission, as it looks at its Seventh Framework Programme, is identifying these sort of multi-institutional collaborative environments as a very high and very significant funding priority within its e-science and e-research programs. So I think there's lots to be done there. Finally, I'll just identify some of the poorly explored common infrastructure. Names, again, to return to that question, are something that I think would benefit from some collective attention. We are going to need infrastructure components like gazetteers, like names. We talked some years ago when the internet first took off about a national geospatial infrastructure. Maybe we actually need to be thinking about ideas now like a national and global biographical infrastructure, for example. I think there are some very, very interesting questions to be asked there. So I've given you a sampling, and it really is a sampling of some of the things that CNI is tracking and trying to advance work on. I've tried to do this not in terms of CNI is doing X and CNI is doing Y, because really the much more important issue is what's happening in the landscape broadly, what directions do we want to move it in, what forces do we need to be aware of, and how can we all work together to make progress and move things in the directions that we want them to move in and avoid dead-end developments and redundant and poorly coordinated investments. So I hope that in this look at our program plan and where we're headed, I have given you some sense of the landscape as we're reading it from CNI, at least circa very late 2009. I thank you for your attention. I'd be delighted to take some questions or comments.
Michael Seadle from Berlin School of Library and Information Science from the University of Berlin. Fascinating talk, Cliff, partly because these are issues that we've been discussing, particularly in the curation, digital curation side. Within the European Union, we had a sub-sub-sub-group that we hosted in Berlin recently to look at how we train people for digital curation in the future. European stamp on that. It takes 20 years or so to build a Nobel Prize winner. It takes some time to build the people who can address the problems we've got here. And there's the danger always of saying, well, we'll take one from column A and one from column B. We need this computer science course. We need this library course. What you're talking about here is something much more integrative. And I wonder if you could just say a few words about the kind of thinking, the kind of mentality that we need to inculcate into the generation of people who are going to be dealing with some of these problems. Many of which I think, as you've been suggesting here, we don't really entirely even understand. We're beginning to look at data issues that we weren't looking at a few years ago. I think in particular this problem of damaged data and how we address it is something that 99% of my colleagues would block on, would never have thought about.
Gee, that's a small question. Let me try a couple of different pieces of it. Specifically around data curation, I think we are starting to see some beginnings of traction from some very good investments that IMLS in particular made a few years ago in trying to develop curricula for at least the first generation of data curators. And so there's been good work done in places like the University of North Carolina and the University of Illinois at Urbana-Champaign to start sorting this through. And some of those people now are going to get field experience with the DataNet grants.
And I think that's going to greatly refine the state of the art through a cycle of that. My guess is that what we're going to end up with is data curators who are going to need to have some substantial disciplinary knowledge in most cases, though not necessarily. And the question is, so how specific? I can believe we'll train biologist data curators. I'm not sure we will train data curators who are expert in the idiosyncrasies of zebrafish. There's some manageable, affordable level of specialization. I think in the humanities, it gets very interesting because it's less clear how much it's going to be by genre and how much it's going to be by discipline as we think about managing evidence and data in the humanities. And in the social sciences, again, it's probably less clear. Some of it in the social sciences seems almost methodology-based right now. So we've still got a bunch to learn about that. In terms of data recovery and data forensics and things like that, there's a whole field growing up there, but it's grown up largely outside of the academy. Some of it is today mostly in areas like law enforcement and intelligence and has very specific kind of goals. Some of it also is buried inside specialty consulting companies, like the people you send your hard drive to when you discover you didn't back it up and it just died, and you'll pay them whatever they ask. And they've built up a set of techniques which really probably need to be mainstreamed. There's some fascinating work going on in a few special collection settings and a few new media and literature programs talking about digital forensics as applied to special collections and literary materials, and I think you'll see more of that kind of thing. So those are, you know, a few pieces of what I see going on in that area. Yeah, I think that's probably all I want to say about that right now. Other questions?
Bill Arms from Cornell University.
Cliff, in your review of the information landscape, I kept noticing how often you use the term we without ever actually defining the term we. And I think most people here are very interested in how the organizations of which we're part actually map on to this landscape. And with the budget pressures, there's a lot of questioning about organizational structures. The impression I get is that, from the universities I know, something called a library is going to be around a long time. But most of the other organizational structures that support academic life are under question. Do you have any observations in your experience and broad knowledge?
That's an interesting question. I think that particularly as we look at the data curation, and there's that we again, the data curation challenges, there's a lot of unease there because I think you're right, there is a fair amount of confidence that libraries have been around for a long time and are going to be around for a long time. And as we look at attempts to set up freestanding disciplinary data curation centers and things like that, I think that there is a very legitimate and real fear about sustainability there. Those are key operating entities and as we've already seen, for example, through the situation with the Arts and Humanities Data Service in the UK, they can go away. I think that, you know, government has a tendency to persist as well. It doesn't always persist on the same kind of agenda, but it persists. So, you know, I think people are mostly at least modestly comfortable with the notion of government data management programs in some areas that are legitimately governmental. I think, you know, actually a beautiful kind of dual example is the wonderful stuff that the National Center for Biotechnology Information at the National Library of Medicine has done. I think there's a very high degree of confidence in the persistence there.
One of the issues that I think people are starting to grapple with, and it's high time, if not past time, is this business about open and community source and our reliance on various kinds of open and community source for various kinds of systems and services. And the fact that that doesn't always survive well by benign neglect; often, at least in the long term, there need to be some interested parties with a stake in it. And having interested parties with a stake certainly doesn't preclude operating as open or community source, but does seem to be important. I'm actually very encouraged by some of the consolidation in that area within the higher ed sphere. The growth of Kuali and the folding in of a number of efforts under that, where they've got a lot of expertise in running community source. The merger of Fedora and DSpace, I think, was a wonderful thing. And one development that I've been watching with tremendous interest following on from there is the establishment of these sort of solution communities within the merged DSpace Fedora organization within DuraSpace, where these, at least to me, seem like they could be homes for extensions or groups of tools that work in a specific discipline in conjunction with data resources. So I think we're starting to at least grapple a little more with some of those issues, but those concern me a lot. In general, I'm starting to feel like software is a bit of an underestimated problem. We don't spend anywhere near the amount of time curating software as we do talking about curating data. We don't archive it in a very organized way by and large. There are a couple of notable exceptions. When you look at the complexity of a full software environment for doing some kind of computationally intensive project today, and you start thinking about reproducibility, we tend to focus on reproducibility of the data, meaning if we get the data again, we can get the same result.
But getting the whole software system reassembled is monstrously hard. Often it isn't even documented. And we've actually seen some problems over the past few years where people have discovered, you know, errors in software and have had to sort of walk the cat back through the computations that were done with the bad software that in some cases gave rise to downstream products that had to be rerun. Good provenance systems are going to be very critical to managing that. So that's a long answer that I wish was even stronger because it is a big issue. Let's do one more question if there is one. There is not one. There is one.
Randy Frank at Internet2. This is somewhat of a homecoming meeting for me. I'm returning to this community after a 10-year absence. One thing I'm curious about, as one who was involved in things like JSTOR when they started up: one of the underlying assumptions back then was that by doing some of these digital library projects, the individual holdings at various institutions, you know, often these things stored in off-site storage facilities, we'd be able to finally get rid of them. I'm just curious, after 10 years coming back now, what's happened, and has the current economic environment helped, hindered, or has not much changed? Just sort of a catch-up for me.
Welcome back, Randy. And I should say, by the way, you know, Randy's return here reminds me there have been or are about to be several important leadership changes in the network world that he comes out of. Doug Van Houweling, after a decade of great service leading Internet2, has said he's going to step down. And on the other side at the National LambdaRail, we have another homecoming to this community in the form of Glenn Ricart, who has recently been appointed their CEO. So lots of homecomings in that world. But to answer your question, I think that economics are certainly pushing people to get comfortable with things that they've been taking time getting comfortable with.
It's a slow process, but for example, saying goodbye to dual print and digital versions of things, consolidating very lightly used print collections across institutions. All of these are things that 10 years ago sounded reasonable, but people said, well, you know, let's not be precipitous here. Let's make sure everything's okay. And now if it actually saves money, it can get pretty precipitous at times. I would say, though, that there's one particular sort of odd thing here, which is that storage space, whether it's off-site storage or on-campus storage, doesn't equate in a direct and straightforward way to financial savings. I mean, if you're getting ready to build a building and you can avoid building it, then it does equate to real financial savings. If you're paying rent on the off-site storage and you can stop paying rent, it's a real financial savings. But if you're just freeing up space that isn't immediately needed for something else and it's a sunk cost, it's a long-term financial savings, but not necessarily an immediate relief to an immediate budget hit. So this business of, for example, rationalizing print collections is important, but isn't necessarily, you know, the way to deal with the pain of this year's budget cut. I think we have run a little bit over time. I thank you for your indulgence and for joining us. I wish you a wonderful conference. And I just remind you, you might want to occasionally check the message board near registration for any potential schedule changes around particularly our breakout sessions. So welcome. Great to see you all. Thanks.