Welcome everybody to the opening of the fall 2012 CNI member meeting. I'm Cliff Lynch, the director of the coalition. It's my pleasure to welcome you all here. I think we've got a very stimulating and challenging day and a half planned for you. And we're here today to start it off. I want to plunge pretty directly into some review of the year and of developments. And then I'm going to say a few things about the program plan and how things are changing in response to these developments in terms of our priorities and what we're tracking. I am going to really try and keep this to about 40 minutes, which is going to be hard. And that will allow us a little time for conversation at the end. If we have to, I might creep a couple minutes into the break. We'll see how it goes. I want to start with just some sort of general comments on what's been going on. As you know, a lot of our focus, certainly not our exclusive focus, has been on research and how that's changing and on the interconnectedness of research and teaching and learning. We've not really talked a lot about classroom instruction, relatively speaking. I mean, certainly we've touched on that. You've heard some very important things about teaching and learning from folks like Phil Long and Gardner Campbell in recent meetings. But I think, in any kind of reasonable judgment, the story of the year really comes out of the world of instructional practice and particularly the emergence of these massive open online courses, or MOOCs, to use this hideous acronym. You can't pick up anything or go anywhere without reading about this. It's like big data only worse, except that it's really mostly focused on higher education right now. I'm not going to give you a lengthy talk about MOOCs. Certainly this has been well covered in fora like the recent EDUCAUSE meeting.
We have got a couple of breakouts on MOOCs, and I will say that those breakouts are really very much trying to get at the institutional setting around these rather than the sort of "I'm a faculty member and I just taught my first MOOC and let me tell you about the experience." So we're trying to go more generally than the individual faculty experience and less generally than the sort of "here comes an apocalypse for the entire system of higher education as we know it and the whole world is going to be restructured next week," and certainly there's a lot of that rhetoric going around. But I do want to make a few observations based on what I've been seeing here, and in particular I want to touch on a few connections that maybe haven't been touched on enough, at least in the things I've been looking at. So what we have going on here, I think, right now is a number of demonstrations that have suggested that there is at least a class of course that can be done very cost effectively and pretty effectively in terms of teaching using this technology and this framework. It's very interesting how this has grown out of earlier work on flipping classrooms and video capture of lectures, and I think that that connection and the potential continuity there hasn't been fully recognized. One of the things that hasn't happened much yet is the assemblage of MOOCs from multiple sources of lecture video. I think that's coming, and I think one of the at least footnotes we need to take away from here is that we really need to think a little bit more carefully and diligently about the management of the assets that our captured lectures represent and the potential reuse of them. One of the things that everybody is trying to understand right now is what areas work in MOOCs and what areas don't work so well in MOOCs, and, by the way, what's your definition of success here, which is kind of a tricky area.
If you look at a course that takes in 100,000 people and that 5,000 people successfully complete, that's not a very good completion rate by normal standards. On the other hand, numerically and in terms of cost-effectiveness, it's very good. So I think we're still trying to understand how we evaluate what works and what doesn't work, but certainly there is a lot of optimism and some considerable evidence that these work for certain kinds of relatively mechanical skills. They are tougher for certain kinds of more conceptual things. There is certainly a faction at least that believes that they will not work well at all for a fair number of humanities courses. This remains to be seen. I will just note in passing that if it turns out that these work for the very large humanities courses, this is going to have some very disruptive effects on the economics of the production of PhDs in the humanities, because right now we pay for a lot of those people by having them TA these big courses. That's, I think, much truer in the humanities, relatively speaking, than it is in some of the sciences, and we should be mindful of that. I think one of the things that is really fascinating from a technology point of view is that one of the keys, it seems, to finding areas where MOOCs work really well is whether you can do some machine grading of work, machine assessment of people's growing mastery of the subject as they turn in assignments and things like that. Now I know that there have been sort of middle grounds proposed, notably peer grading sorts of things, but the earliest ones and the ones that were really sort of slam dunks are the ones where you can do machine analysis of right and wrong answers. And actually you can even do a little more: you can characterize the kinds of wrong answers you're getting so that you can compensate for that and help clarify those points for individual students.
One of the things that this is setting off is a whole flurry of research about how we can be smarter about machine grading of things. I had a very interesting conversation, for example, with someone who was doing an introductory computer programming course in a MOOC setting. Now actually those are a lot like teaching writing in some ways. You're not just teaching people how to write a program, you're trying to teach them how to write a program that makes sense as well as doing the right thing. You're actually trying to teach them some things that are stylistic as well as how to translate an algorithm into code. And so there have been a lot of interesting experiments in ways we can adapt various kinds of industrial tools to help evaluate code. I think we're going to see more and more of this kind of creative reapplication of tools in various areas. The last thing I want to say directly about MOOCs is that these are throwing off massive amounts of data, and part of the issue here is who controls that data, because this is data that can be put to work to make the MOOCs better, to write more effective textbooks, to make teaching more effective. The idea is that you may see frequent misunderstandings, frequent in the sense that it occurs in one or two individuals out of a class of 100 every year, and an experienced teacher who's taught the course many times may come to realize this. But in a class of 100,000 those two individuals scale up to a very noticeable spike in your test results, and you can actually start doing things to compensate for it. So there are masses of data coming out of this, and a lot of the questions are about who controls that data and who gets to do what with it. One of the things that I have to say feels very strange in this area, as in so many other areas, is that nobody seems to have asked the students.
I can easily imagine if this goes forward a situation where learning records are amassed about individuals in the same kind of detail or perhaps more than medical records today are. At least in theory today you're supposed to be able to look at your medical records most of the time. There doesn't seem to be any such pact around learning data and learning dossiers about individuals and I think that this is an area that we probably need to do a little thinking about. The last point I want to make about MOOCs and this actually goes beyond the academic setting is that these started in academia, they're not going to end there. Some of you may be aware that Google decided it would be fun to build a MOOC platform and earlier this year they ran a couple of instances, an instance being about 50,000 students, through a how to do search good using Google course. That actually I think got into a number of things about searching and information literacy and related sorts of matters and they just did it because they wanted the experience, they figured it would help them improve their product and it would drive use of their product, makes perfect sense. I just invite you to take a minute and think about this technology being applied at scale as a way of teaching people about various things, potentially totally outside of the traditional academic setting. I think we should look to developments like that becoming considerably more commonplace in the next year or two. It's affordable as a way if you will of consumer education in a sense that taking traditional educational practices and moving them to the consumer world has not been by and large. I should say a word or two about library access in support of MOOCs. I think there are two things that are happening there. It's fairly clear that the sort of globally promiscuous admission to these large scale courses doesn't map well to the way institutions have licensed content for their communities historically. 
There are going to be two things I think that happen here. One of course is a greater drive towards open access material because if it's open access it bypasses a lot of these difficulties. The other though is going to be broader community licensing of materials. We're starting to see finally some fairly serious work on alumni access to content resources. You'll note for example that at this meeting we're going to see a talk from JSTOR on their program in that area and they are not the only player that's starting to make content available not just to current members of university communities but to the alumni groups. I think you're going to see a lot more pressure to do that and a lot more pressure as well to move to more rational personal licensing schemes, more affordable personal licensing schemes but even there I fear the transaction costs are going to be very challenging so I think that that alumni connection may prove to be quite important because it may be an enabler to let your alumni participate in various kinds of educational events taking place under all kinds of different auspices, not necessarily just your own institutions. It becomes a more significant added value. A word about e-textbooks because I think we're again starting to see some real traction there. We are seeing some attempts to do licensing of e-textbooks at scale and these are having some success. They look like they may have some economic payoff. They of course completely remap the relationships between universities, faculty, publishers, students and bookstores if there even is a place for bookstores that's left in here and I think in that way are pretty significant. I do think that one of the messages that we need to take away from both the MOOCs and the video lectures that underlie the MOOCs and also the e-textbooks is that we probably need to have a more deliberate strategy around the licensing and management of instructional materials. 
A lot of the work on e-textbooks has taken place through the IT community. It's been wonderful in the sense that they've actually been able to make some progress and make some progress pretty fast. Libraries historically in most institutions have tended to stay away from this area, saying it's not our problem. Yet libraries have also spent the last 20 years developing a very sophisticated understanding of licensing and licensing terms and some of the issues around privacy, around archiving and around related matters, which I think without question need to be brought to bear and exploited in these negotiations around e-textbooks. I also can't resist telling a story here that bothered me a little bit, and I just want to see if it bothers any of you. So right around the time of EDUCAUSE, about a month ago, there was a short piece in, I believe it was the Chronicle, about a new textbook platform that someone was hawking, presumably just in time to show it off at EDUCAUSE. The distinctive feature of this platform is that it would report to your teacher whether you'd done the readings every class, so that the teacher would be able to walk into class and know who's done the readings and who hasn't done the readings and get some handle on these lazy, unprepared students. And they quoted a few faculty who seemed to think, "Why, this is wonderful, I've just been waiting for something like this to deal with those lazy students who aren't doing my reading." Do you find this at all creepy, that you've got a textbook that tattles on you in detail? But of course we can ask the same questions about our MOOCs, we can ask the same questions about our learning management systems. Who are they talking to, and on what basis, and do the students even know? I think personally that this is an issue that is just waiting to really hit the front pages or, as my predecessor Paul Evan Peters used to say, when the net hits the fan.
I think that we really need some conversations some serious ones about privacy and informed consent around interactions with learning. There are of course all the good reasons in the world that you can enumerate for why you want to be able to do these things to help students succeed to know when the students are having trouble or not doing their work. But at the same time we've got to have I think some balance and some level of respect here. I'm going to transition at this point from discussions about instructional practice and its connections to the networked information world to just a few comments about the macro environment before I move on to some specifics around the program plan. I don't think I need to rehash for you the debates that are going on in the public policy sphere in the political sphere about how vocational or not vocational higher education degrees should be about whether we're still going to do humanities about things like whether there are really decent jobs out there for all of these STEM graduates that we keep saying we need to be producing. I think really all I can say there is that these things are under debate with an intensity that I've not seen certainly in the last 20 years and they bear some serious consideration. And I'd invite you to for example think about the roles of employers in training as opposed to the roles of universities in teaching to revisit some of those points I just made about MOOCs being applied in settings other than higher education. I think the world here can make it more complicated than we know. I think that if you look at science we see science and scholarship more broadly but especially science under a great deal of pressure. One of the things that seems to be slowly bubbling up is a crisis about the reproducibility of results in various scientific fields. People are running efforts to reproduce results in a systematic way and some of them aren't going very well. 
I think that this is going to potentially, if we don't get it under control, really create some problems with the public support of scholarly work and especially the public funding of scientific work, and I think if you look at where the biggest problems are, there's some evidence they may be around the biomedical and life sciences. I think we're seeing some fascinating phenomena in the publishing world that bear some consideration. One of them that I've been watching lately is the rise of PLOS ONE. I'm sure most of you are aware of this. What you may not be aware of is the size of that journal at this point and the proportion of the scholarly literature it's actually publishing, which is measurable, concentrated in one place. And the interesting distinguishing thing here is that this is about vetting for correctness rather than ranking. It doesn't make significance judgments so much as it makes correctness judgments, and it tries to do that quickly. So it offers a level of predictability that is not found in the experience of submitting to many journals. This is a very, very real change and I think one that suggests some interesting future developments for the scholarly literature, assuming that it continues to gain traction. The last two macro things that I want to just note both deal with e-books in various ways. One area deals with e-books and public libraries. Basically public libraries are being largely cut out of e-books, particularly mass-market books, at this point. It's a very problematic situation and it's one that over time I think will come to affect research libraries as well, who need these as an important part of the sort of broad cultural record that they hold on to. This is a very good example of where licensing can take us and the extraordinary power that the shift from first sale to licensing gives to rights holders. Now the other related area is around consumer rights broadly in intellectual property goods, and here we've had a very mixed record.
We've had some, I think, very encouraging court judgments that have supported the principle of fair use, and I know that many of you have been tracking those developments, and I think we can all take some encouragement from that. At the same time there are some very troubling things in the first sale area that suggest that we may see an ever increasing kind of limitation to first sale, and I think that the broad populace is starting to wake up to this a little bit. A very easy kind of catchphrase model on this is: is somebody going to be able to inherit your e-books, and if so, who and how? That's something that actually is starting to make sense to people as they spend hundreds of dollars a year acquiring e-books, and they remember that their grandparents gave them books, and it's not clear they're going to be able to give these to their grandchildren in any meaningful way. So I think that we are seeing some very interesting things that we've been thinking about in our community now emerging much more clearly into the broader public debate, and we need to be very mindful of these things. Now let me turn in the last chunk of time to a few specifics around our program plan and some of the things we track there. As you know, one of the touchstones of our work has been understanding changes in scholarly practice and particularly understanding the changes that are driven by information technology and the availability of large amounts of digital content. I think that that's a place that's worth returning to again and again because scholarly practice doesn't stay still. It continues to change, and I think that we can identify a number of new developments there that have sort of crept up on us in the last few years and that maybe haven't received enough attention. So one that I'm watching a lot is something that it would be, I think, easy to misclassify as big data. There's a tendency now to point at everything and say it's big data.
I think what's really happened, though, is that we've moved into a world where for many kinds of scholarship there is an abundance of evidence, whether it's a historian or a political scientist trying to examine records or whether it's an archaeologist trying to understand typical practice in the making of urns during a certain period of time and space. We used to have very few examples and studied them in great intensity. Now in many cases we have lots and lots and lots of examples. We want to know about averages and outliers. We want help in trying to make sense out of some important literary figure's quarter of a million email messages that they've given to some special collection. All of a sudden you see tools that do things like social network analysis on these collections, and various kinds of automatic search and clustering. I think that these are a bit different and indeed in some ways predate the ideas around big data, but are becoming very deeply embedded in a lot of scholarship now. And I think we also see the same kind of abundance in our efforts to cope with the scholarly literature. I'll just note a couple of things that I've come across in the past year or so that I'm finding a lot of people haven't looked at yet, that are examples to me of some potentially new scholarly environments that bear consideration. How many of you have seen MathOverflow? I see just a very small number of hands, and I might be sort of blind. Now this is a system where mathematicians and upper level graduate students can basically post questions and get answers. It's not really built to have lengthy conversational trails in the way that some systems are, but really more to frame a question, take a few comments on it and get an answer. It has a very elaborate system of ratings and rankings on it that allow for various kinds of self-regulation and crowd control.
There's a similar system called Stack Overflow that may be familiar to others, which is really much more focused on programming and is less academic and more engineering in character, I would say. Actually Stack Overflow is both the computer science instance and also the underlying platform. But I think if you look at systems like that, you are finding very sizable scholarly communities or practice communities growing up around these and starting to use them in very serious kinds of ways in their scholarly and professional activities. I think we're seeing experimentation in a number of other novel areas. A system I've been sort of staring at for a couple of years, trying to really figure out what I make of it, for instance, is Wolfram Alpha. If you've not looked at that, I would look at that. That's a new class of information system which has got some capabilities for encoding computational knowledge and is really quite new, quite novel, I think. I believe that we need to be very open to recognizing these kinds of new systems that are showing up in various scholarly communities, and I just mention these as a couple of examples in passing that I've been looking at quite a lot lately. There's every reason to believe there are lots more of them out there. I think that we need to always be mindful that scholarly practice does not stay still and that this creative environment that the digital world affords us continues to ripple change here. As far as data curation and research data management, I think that we're in a very interesting place right now. This is an area where CNI has been very active for a decade now, trying to look at what was coming, to alert our membership to the coming focus by funders on the importance of managing and reusing research data. I think we're there now with the first wave. We've certainly seen the NSF requirements. We've seen the NIH requirements. Other funding agencies are moving along these paths. I'd say a couple of things.
First, while we've changed the regulations and certainly are demanding data management and data sharing plans we're flying mostly blind here. We know very, very little collectively about what's in these data management plans. There's been some small pieces of very good work done on a couple of campuses, but we really need to know much more systematically what's being proposed. We need to know what effect this is actually having on funding decisions. We also need to know what kind of compliance is taking place, whether any of the people actually do what they say they're going to do in these data management plans. We need to know a lot more than we know today about reuse, about what gets reused, about what's useful to reuse. This is going to guide us in our preservation and retention decisions. There's a tremendous need now that we've made some first steps in policy to collect data so that we understand what's working and what's not, and we have some guidance for the next policy cycle, which I would presume would be a few years out. I think this is a place where we should all work together and work with the funding agencies to try and get a much better handle on what's happening. There are a couple of specific very sore points, and while I would say, you know, in general, probably CNI is going to be a little less active and a little more focused in this area, because as these requirements have hit and as everybody's come to understand the importance of research data management, a number of other, in many cases, much better resourced organizations have come in to help organizations work through many of the sort of tactical and implementation things here. I think, you know, in many cases our best contribution is to continue to work a little bit farther out towards the horizon, trying, as I say, to understand what the ramifications are of the actions we're taking. There are some specific areas that are still very problematic. 
One that I'd call out is anything involving individually identifiable data, whether it's out of the biomedical world or the social sciences world or the humanities world. Reusing this right now is very, very, very hard, and in fact the sort of traditions of IRBs seem to be very much at odds with the traditions of data reuse. There's a need for a rather difficult conversation, frankly, involving a lot of groups who historically haven't talked to each other very much to begin to sort this through. I'm very hopeful that we will be able to contribute at least something to catalyzing that discussion over the next year or two. I'd also come back and underscore an observation that I've made in previous years about one piece of the research data management puzzle being very much about risk management and research continuity. Given the investments that we make in research as a nation, as a world, given the sort of hosting responsibilities and stewardship responsibilities that universities have for that research, we do need to think about research continuity and risk management. And sadly, we have this year had another case study in this in the form of Hurricane Sandy. We had a situation some years ago with Katrina, and I think the institutions that were affected by that learned a great deal and taught all of us quite a bit about some of the issues around disaster recovery and research continuity. I would hope that we will have the same opportunities as everybody gets back on their feet from Sandy to study these issues anew. One of the things that, for example, is already very clear from early reports and early experiences is that it's very easy to take an exclusively IT-driven view of research continuity and just say, oh, if we can back up all the data, everything's okay. There's a lot more to research than data. 
There are cell lines and reagents and things like that that depend on freezers working and not being flooded, and various kinds of physical continuity, and indeed a fully articulated risk management program for research has got to look across all of these things and even the interplay between them. So I think that there's still lots to be done and lots to be explored around data curation and research data management. I think CNI's role in here, as I say, is going to become a bit more selective and a bit more horizon-looking as we move forward. I also will just note that we're starting, I think, in the development of initiatives like the Digital Preservation Network, DPN or "deepen," to see a new emphasis on interinstitutional strategies here, which I think is a very welcome and necessary complement and continuance of the institutionally based efforts that have dominated the landscape outside of disciplinary repositories up until this point. A couple of other comments. The mobile world continues to be a very interesting one. We're seeing lots of platform diversity there. We're seeing a whole new set of places where the consumer world and the higher ed world meet and cross and interact in sometimes very unexpected places. It's not just bring your own device and how we cope with that. It's really a much deeper set of questions, especially as a number of the vendors try and turn aspects of these devices into walled gardens of various kinds, a phenomenon that I don't think we've thought through enough. I think I'll make this my last note about major areas of opportunity and interest. I'm seeing suddenly a lot of progress in building a true, genuine web of interconnected knowledge. What I mean here is really something much deeper than linked data or the semantic web, something much more comprehensive, although certainly some of the tools that linked data and the semantic web provide are helpful in this area.
We're seeing, for example, conversations about interconnecting name collections, authority files, biography, encyclopedic resources like Wikipedia, geographic resources, cultural heritage resources in enormously complicated ways that really are almost without precedent. This is happening at scale. I think it's a very important set of developments. It's breaking down silos all over the place. These are hard conversations to have. They are messy. We are very weak in terms of the standards underpinning that's necessary to make some of this happen. Nonetheless, the amount of progress I'm seeing here is striking. A final piece of the puzzle, which I'm actually starting to develop some serious hope for, is the Open Annotation Initiative that Herbert Van de Sompel and a large number of his colleagues have been working on. That's proceeded down two tracks. We've had a number of briefings on it here and we'll have an update at this session. One is broad standards and the other is implementing projects. As you move into this web of knowledge, you really need to be able to annotate across objects and across silos. That's part of what's going to make this happen and make it effective. Annotation is a problem that people in the computer science world, people in the human factors world, people in the world of web things have struggled with for 20 years. It's a really hard problem, and it's got a number of social aspects and organizational aspects to it that tend to cause a great deal of trouble as well as the fundamental technical challenges. I have to say that I am starting to see some projects that give me considerable optimism that we may have made a real, notable piece of progress here in starting to deal with some of these issues, and I think it's going to be very important to track those projects and to understand some of the broader system ramifications of them. For example, where do you store annotations? How do you identify annotators?
How do you specify who you do and don't care about annotations from? Those are the kinds of things that have to be worked out at scale to really make this work, and I think it's not too soon to be thinking about some of those areas. So that is a very episodic kind of look at some of the things that are happening out there today and some of the ways we're shaping our program and the areas we're tracking most closely to respond to these developments. You have the 2012-13 program plan in your packets. It's also available now on the web. There is more detail and there are also some other activities that we discuss in there that I invite you to read about and be in touch with us about, but my time today is limited and I wanted to spend a little more time going into some depth in a few key areas rather than survey every bit of the landscape. I would be very pleased to take a couple of questions or comments on what's been happening, what we're doing, what we're not doing, what we should be doing. There is at least one microphone here that I can see glittering and maybe one further back. I don't know. And then after we finish that, I'll get an update on any breakout changes before we adjourn. The floor is open.
Tim Lantz from NISANET. MOOCs are a technical intermediary for a classroom. MathOverflow is very much like, it reminds me of, the afternoon teas at Princeton. It's almost discussional and it's self-regulating by expertise. It really is something new, to me anyway.
Yeah, I mean the analogy that's come to my mind as I've looked at it a number of times is really that it's sort of like being able to wander down the hall and talk to a series of knowledgeable colleagues about something, except that the hall is very long and goes all over the world. It feels to me like something new as well, and it will be very interesting to see whether some other disciplines pick up on this as well. Other comments, questions? We do have a couple minutes. Wow, okay.
Well, hearing none, wait. I'm sorry, it really is hard to see with these high-power lights.
Not to prolong the time between now and coffee too far, but I wanted to pick that up actually. As someone who's benefited greatly from Stack Overflow over the past few years, I've noticed recently that it is suffering what a lot of web resources over the years, internet resources over the years, this antedates the web, have suffered from: dilution through success. And I wonder whether there's, and there's the concomitant move of the rise of more sort of closed-wall resources again like Quora that can boost their ratings in Google and offer answers but you have to pay to get in. And I think there's this tension between the radical democracy of the web that invites everyone in and a flood of superfluousness, without getting political about it, that dilutes the conversation and makes it less interesting. That's a dialogue it seems to me that goes back and forth and I wonder why you think this one's any different.
That is certainly a dialectic that we've seen again and again in online communities, where you'll have an online community that does very well for a while and then too many people move in, especially people who don't know what they're doing, and then the people who do move out and go somewhere else. And that is a very difficult problem to manage. I would say that it has certainly not been managed successfully by any software system I've seen yet. There are probably a few communities who've sort of dealt with it by social norms somehow, and there certainly are strategies of erecting barriers around online communities to try and limit them to people who can contribute. But as you say, there are some very real tensions between that and wanting to facilitate open educational and research dialogue. 
I think that certainly if you look at some of the most recent generation, they're trying some new things there in terms of the ability of the communities to be more sort of self-policing about "that's a stupid question, go somewhere else," or perhaps not "that's a stupid question," but "that's an inappropriate question for the level of expertise we're seeking on this forum." Part of the trick is having somewhere else to send them to. And I think that that is perhaps an important factor, not just saying go away, but for people in your situation and your level of understanding, this is a better community for you to go to. But I agree, it's to some extent an inherent tension and one that is proving very challenging to manage by technical means. We have to get better at doing some of this by technical means. And I think we have made some progress since the early days of discussion boards, which showed that tendency you described in an absolutely vicious way, where these would thrive and then suddenly depopulate in a period of weeks as people moved on to the next system, or at least the founders moved on to the next system.
One more question. People are ready for a break. Joan, what can you tell me about schedule changes?
Okay. The numbers are... Okay. The session on massive open online courses has been moved to 1 o'clock tomorrow. It will take place in Federal A. But rather than today, it will be tomorrow at 1 p.m. Wikipedia and libraries will take place today at 2:30 p.m., right after this. And that session will also take place in Federal A. So those are the two changes we know about. Hopefully those will be the end of it.
And the really good news is that that means that both of these sessions are going to happen. And by the way, I would certainly commend both of them to your interest and your attention. With that, let me once again welcome you to our fall meeting. And I hope you enjoy all the breakouts. Thank you.