We're going to get started here. I'm Dean Kraft, and I'm talking today about the VIVO project, enabling national networking of scientists. My co-presenter is Val Davis. The VIVO project is, among other things, a very large collaboration, as you can see. There are seven schools involved and a number of developers and implementers at each one. In the talk, I will give a high-level overview of what VIVO is, and then I'll look under the covers and talk in more detail about how it works. Then Val will take over and talk about how we're actually implementing it at the social and organizational level, and then a little about what's ahead for VIVO. So as research efforts become more interdisciplinary, it can be hard to find collaborators, certainly outside of your own narrow area. This is a problem for a lot of disciplines, and for the biomedical area in particular, so there's a challenge in locating people to build large teams. VIVO is designed to help people find the collaborations that are really important in modern science, and it allows you to find things both within the institution and outside, across institutional boundaries. Scientists, administrators, students, faculty, and others beyond can get a sense of and explore the disciplines, researchers, and faculty at an institution. So what is VIVO? It's a semantic web application, a web application using semantic web technologies, that enables discovery of research and scholarship across an institution. It's made up of detailed profiles of faculty and researchers and captures many dimensions of faculty expertise and involvement; I'll talk about a number of those throughout. And it gives you powerful search functionality that lets you really explore, search, and parse out the information to find what you need. So a VIVO profile is detailed information about an individual drawn from what VIVO collects. It allows you to illustrate your own expertise and make people aware of it.
As compared to a standard web page description or a social networking site, a VIVO profile tends to have richer content drawn from a number of sources about the individual. And it's more explicit; I'll talk about some technical ways in which it's more explicit. So who can use VIVO? The goal of the NIH project, the National Institutes of Health VIVO project, is really to enable national networking and collaboration for biomedical researchers. And so a fundamental user group is really the faculty and researchers looking to find other people to work with. But at Cornell, it's been heavily used by prospective faculty and students, particularly when exploring a complex discipline like the life sciences. We have faculty and expertise scattered across a number of colleges, departments, and programs at the institution, and for somebody outside, trying to find whom to work with can be a real challenge. It's also useful for administrators in making connections. The Cornell Development Office was contacted by a major company looking for researchers in a particular area, and using VIVO, they were quickly able to pull up a number of people from different departments and actually make the translation to real funding dollars for the institution. VIVO can also serve as a real disseminator of information. It draws material from a variety of sources, and that lets you reskin and repurpose various cuts across that information to give specialized views of the institution. So I mentioned the life sciences: this is a view that gives you a portal onto the graduate programs in the life sciences and lets you explore, again, across departments, across the traditional web boundaries. Here's another example, a research portal highlighting research activities in the College of Agriculture and Life Sciences at Cornell. Another example: people looking for particular expertise in an area like climate change, or any area within the institution.
In fact, you can build a specific portal that will draw out that kind of information. And finally, here's a collaborative Cornell site that shows expertise in geographic areas across the world. So if you're looking for somebody to collaborate with in a particular country, again, you can get hold of that information and find out who might be available as a good partner. So how does it work? Let's look under the covers a little. VIVO is built by harvesting data from authoritative university sources, verified sources within the institution. This allows you to build up a fairly complete profile for an individual faculty member without their having to do anything, which, since faculty are very busy people, is a real advantage. It centralizes the information in a common format. I just showed you a bunch of skins across VIVO, and it gives you a pool of information that you can draw on for those purposes. So VIVO takes information from internal data sources, the sources of record of the institution, and from external data sources as well: publisher information and others from outside. And finally, it lets people edit and customize their profiles, and we'll talk a lot more about that. So, storing data in VIVO. I mentioned that VIVO is built on semantic web technologies. Information is stored using a format called the Resource Description Framework, RDF, and the data is structured as triples, where you have a subject, a predicate or relationship, and then an object. The relationships and the nature of the subject and object are specified in a shared ontology, which is essentially just a description of the meaning of those pieces and how they relate to each other. Let's look at a more detailed example from Cornell. So here we have a particular researcher, a faculty member at the institution, who is the author of a particular paper, who teaches a course at the institution, who is featured in a particular news article.
This person is a member of a department, is heading up a particular research program, works in a particular research area, and is a co-author with another faculty member who's also in the same research area. So you have people identified by these explicit relationships with unique and defined objects in the system. So there's only one crop management: if somebody puts crops management in for their department, or department of crop management, or something else, you don't have that kind of matching problem. These objects are firmly defined in the system. Once you have all this information available, you can then query and explore it. You can explore it by those individual objects, so you can look at everything about an event, a grant, or a person, and the system will allow you to display information that way. You can look at things by type, so a class of events or whatever; by a specific relationship, so you can look for grants with PIs that come from different colleges within the institution and just pull out those kinds of grants; and by various other combinations and facets, exploring by publication and co-authorship, by grant and co-PI, and geographically: many different cuts and explorations across the information. So here's an example from the site. You've got a single grant and linked information about the principal investigator, about the administrator of the grant, and the co-PIs, and these are all live links. You can click on them to explore further the network of relationships among all the pieces. Here's another example where you're browsing by a type, a class of things, in this case by seminar series. So these are all the seminar series at the institution and you can pick individual ones or explore that way as well. You can look at results for a single topic, in this case homeostasis, and you get results that are a mix of faculty, courses, and other things connected with homeostasis, and then in turn you can restrict based on facets.
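As an aside for the technically inclined, the triple structure and relationship queries just described can be sketched in a few lines of Python. The URIs and property names here are purely illustrative, not actual VIVO ontology terms:

```python
# A minimal sketch of VIVO-style RDF triples as plain Python tuples.
# The URIs and property names below are illustrative only.

triples = [
    ("vivo:person/jane_doe", "core:authorOf", "vivo:pub/soil-paper"),
    ("vivo:person/jane_doe", "core:teaches",  "vivo:course/CSS4120"),
    ("vivo:person/jane_doe", "core:memberOf", "vivo:dept/crop-management"),
    ("vivo:person/john_roe", "core:memberOf", "vivo:dept/crop-management"),
    ("vivo:person/john_roe", "core:authorOf", "vivo:pub/soil-paper"),
]

def objects(subject, predicate):
    """All objects related to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def subjects(predicate, obj):
    """All subjects related to `obj` by `predicate` (e.g. everyone in a department)."""
    return [s for s, p, o in triples if p == predicate and o == obj]

# Because "crop management" is one defined object, department membership
# matches exactly; there is no free-text "crops management" variant to reconcile.
dept_members = subjects("core:memberOf", "vivo:dept/crop-management")

# Co-authors: people who share a publication with Jane.
coauthors = {s for pub in objects("vivo:person/jane_doe", "core:authorOf")
               for s in subjects("core:authorOf", pub)
               if s != "vivo:person/jane_doe"}
```

The point of the sketch is that every query is just a filter over explicit, uniquely named relationships, which is exactly what makes the faceted exploration above possible.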
So you can restrict it to just the people involved or the events involved or whatever. So what are the advantages of using this ontology approach? It provides a key to the meaning of these objects: defined sets of classes and properties in a unique namespace. We're getting a little geeky here. It's embedded as RDF, so if you expose all of this data, it's really a self-describing set of information. Using the ontology lets you align RDF from multiple sources, so across multiple institutions you can draw information from outside. It also internally allows us to align the information as we draw from different databases within the institution, by mapping them to this common ontology. And once we have the standard ontology for our own information, it easily lets us map that to other ontologies in the outside world that are being used more broadly. And finally, you can have local extensions to the shared ontology that then roll up, so you may have a particular refinement of an academic position at your institution that you can just roll up into academic appointment in the common ontology. So with shared ontologies, the individual facts line up in ways that are easy to follow and easy to process. Without shared ontologies, you tend to wind up with a logjam: your individual facts are piled up on top of each other and it can be very hard to sort out the underlying relationships and information. So, to get even more geeky for a minute, we're adhering to the linked data principles. Sir Tim Berners-Lee articulated these as a set of principles for semantic web information. We use URIs (Uniform Resource Identifiers, much like URLs) as the names for things. They're HTTP-resolvable, which means you can stick them in a web browser and get some description of what the thing is. We use the standards RDF and SPARQL to exchange information, and we include links to other URIs so that people can discover more things.
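To sketch what "HTTP-resolvable" means in practice, here's a minimal Python example. The profile URI is hypothetical; the Accept header asks the server for machine-readable RDF rather than the human-readable page:

```python
import urllib.request

# Hypothetical VIVO profile URI; resolving it with an RDF Accept header
# requests a machine-readable description instead of the HTML page.
uri = "http://vivo.example.edu/individual/n1234"

request = urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})

# urllib.request.urlopen(request) would fetch the RDF description here;
# we only construct the request so the sketch stays runnable offline.
```

The same URI serves both audiences: a browser gets a profile page, while a harvester negotiating for RDF gets the underlying triples.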
So VIVO as a system at an institution enables authoritative data about faculty and researchers to become part of this overall cloud of linked data applications. And it's growing pretty regularly. You'll see there are a lot of science and biomedical things down here at the bottom; music and other things are up here. It's a large collection of interacting information. So there are some challenges with taking the semantic web RDF approach, and I'll go through some of them here. One is granularity levels. I mentioned academic appointments before. Those can be very different at different institutions, and as you write an ontology, what level of refinement do you want to put in? I mean, if librarians are faculty at one institution and not at another, how do you make everything line up? How do you come up with a common way of describing the information at a given granularity? That flows over into terminologies. You have to get everybody to agree on a common terminology for a lecturer or a department or a program or something else as you build up this common way of talking about things. Scalability: we know that VIVO works well at a single institution; it's been running at Cornell for a while. As we scale up to the national level, we expect to face some challenges. Fortunately, the underlying semantic web technology has a lot of oomph behind it and people are building more and more scalable solutions all the time, so we're hoping to leverage that outside work. Disambiguation is an issue. With publications and other things, you may have different author IDs or different spellings of an author name as you're trying to pull in common information. How do you disambiguate? Provenance is an interesting one. In the semantic web, as you saw, it was just a triple: subject, relationship, and object. There's nothing that says where that information came from. As you pull these facts from various databases and combine them in the RDF store, you can potentially lose the provenance.
You don't know who made the statement and therefore perhaps how authoritative it is. And finally, temporality: the semantic web by its nature really gives you a current snapshot. It shows you exactly the way things are now. As things change over time, there are ways that you can represent the fact that somebody's employment or position or whatever changed over time and keep all of that information, but it tends to make the ontology more complex and harder to work with. So that's a bit of a challenge. But, and this is the Hendler hypothesis, a little semantics goes a long way. If you can make these simple statements available, you really do get this network effect where you can combine information and gain a lot more from even simple statements about individuals and their relationships. So, the major components of VIVO: it's a general-purpose open source web application that leverages semantic standards. It includes an ontology editor, a data manager to manage the information about people within the system, and a display manager that lets you put it out there. That underlying technology actually is not specific to the VIVO project I'm talking about here; it can be used elsewhere for other things. The Chinese Academy of Sciences is making use of the underlying VIVO technology for several websites to connect scientists across China. Australian universities are looking at it as well, for similar approaches to networking their scientists, publications, and other things. The customizations for this application for the VIVO project I'm talking about include a specific VIVO ontology that's focused on faculty research and the organizations and structures around that, specific display theming, and then, as Val will talk about in a bit, the specific implementation and installation work that goes together with it.
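Returning for a moment to the disambiguation challenge mentioned above, a toy sketch of a string-similarity pass over author-name variants might look like this. This is only a heuristic, not VIVO's actual approach; real disambiguation would also weigh co-authors, affiliations, and subject areas:

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Rough similarity score between two author-name strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Variant spellings pulled from different sources that may or may not
# refer to the same author.
candidates = ["Smith, J.", "Smith, John", "Smyth, Jane"]
incoming = "Smith, J"

# Flag likely matches above a chosen threshold; anything below it would
# need additional evidence before being merged into one profile.
matches = [c for c in candidates if name_similarity(incoming, c) > 0.8]
```

Even this crude score separates plausible variants of the same name from a genuinely different author, which illustrates why the problem is tractable but never fully automatic.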
We're also, as part of the project, building additional new software that enables a distributed network of RDF by harvesting from these individual institutions and building across them. I'll talk a little more about that in a moment. So, the three functional layers of VIVO. At the top, the end-user piece is a search and browse interface with potential self-editing for individuals. The middle layer is for the curators of the system. They can set up the display themes and the navigation, and they do curatorial editing; so, for example, a librarian can put in information about publications for a particular individual to kickstart the system. At the bottom level there's ontology editing: the system lets you, within itself, without using an external ontology editor, manage the ontology and all these relationships and ways of describing information. And then at the bottom there's also data ingest, drawing from the authoritative sources at the institution, and data export. So what does the local data flow look like at a particular institution? You're drawing from local systems of record. At Cornell we have a PeopleSoft HR system, and we pull a lot of information out of that. There's a grants database, there are courses databases, and other sources for this. We're also pulling from external national sources, things like PubMed, where we get publications. We're in discussions with individual publishers about whether they would be willing to grant us permanent free access to the basic bibliographic information for our institutions, so that we can incorporate and redistribute it in VIVO. The flow then is into data ingest ontologies in RDF, which in turn is mapped into the VIVO ontology in RDF, where it can be interactively updated by curators or by individuals. And finally it's re-exposed as RDF and shared outside using RDFa, which is a format that embeds RDF in web pages.
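The ingest mapping step just described might be sketched like this, with hypothetical field names and ontology terms standing in for the real ones:

```python
# Sketch of the ingest mapping step: a row from a local HR system of record
# is translated into triples using (hypothetical) shared-ontology terms.

hr_record = {
    "netid": "jd42",
    "name": "Jane Doe",
    "dept_code": "CROPMGMT",
    "title": "Senior Lecturer",
}

# Local codes map to shared, uniquely identified objects; a local title
# rolls up into a broader class in the common ontology.
DEPT_URIS = {"CROPMGMT": "vivo:dept/crop-management"}
POSITION_CLASSES = {"Senior Lecturer": "core:FacultyPosition"}

def to_triples(record):
    """Translate one HR row into ontology-aligned triples."""
    person = f"vivo:person/{record['netid']}"
    return [
        (person, "rdfs:label", record["name"]),
        (person, "core:memberOf", DEPT_URIS[record["dept_code"]]),
        (person, "rdf:type", POSITION_CLASSES[record["title"]]),
    ]
```

Once every source is pushed through a mapping like this, records from HR, grants, and courses databases all land in the same pool of ontology-aligned triples that curators and individuals can then refine.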
So those display pages I was showing you before would have RDF tags embedded in them, and that data can be shared by harvesting RDF from the institution or potentially by issuing SPARQL queries to a SPARQL endpoint that we would operate at an institution. One of the main efforts of the VIVO project is to go from the local level to the national level. What we're looking to do is pull filtered RDF, perhaps not everything from an institution, out to a national, or possibly regional or specialized, indexing portal and system. For this project, Cornell and the University of Florida are going to build indexes across the entire VIVO collaborative. And once you've got the information into a common triple store, all together, you can potentially do analysis, visualization, and other exciting things. So the national networking piece draws, as I said, from these individual participating institutions and their individual VIVOs, builds up these RDF triple stores, searches across them, does visualization, and then networks in with other sites as well that might provide either VIVO-ontology-compatible information about their people or other RDF that we can map to it. A professional association, for example, could be either a consumer of information about faculty at an institution or, in turn, a provider of information back to the national VIVO. So, lots of possibilities, and all part of the linked open data web. So now, how do we implement it? I'll let Val take over here. So, in 2003 Cornell began development of VIVO, and in 2007 UF, the University of Florida, where I'm from, began their first implementation. In September of 2009, just this last year, Cornell, UF, and five additional institutions received $12 million in stimulus grant money from the NIH to enable national networking of researchers.
All seven of these institutions have VIVO currently installed; the data is imported, stored, and maintained by the local institution. Because that data is RDF-compliant, because it's linked open data, and because there's the common core ontology shared by all of those institutions, VIVO enables networking on a national scale. So now we have seven institutions with VIVO installed, we're moving towards full implementations at those sites, and we want to expand our partnership. We want to encourage adoption and participation outside of those seven institutions. There are a number of ways in which institutions and organizations can participate. The first and most obvious is you can be an adopter. So individual institutions such as Columbia or Northwestern, federal agencies such as the NIH, or consortia of schools such as the CTSAs can all adopt and implement VIVO. You can also be a data provider. VIVO enables national networking of researchers; there's a sister grant led by a Harvard team called eagle-i, and they enable resource discovery. Every resource within the eagle-i grant has the ability to have a person who creates or manages that resource, so that's a data link between those two systems. Data providers might also include your publishers or your vendors. So PubMed and your subscription databases such as Elsevier or ISI are potential data providers. And in addition there's Collexus, which you may know is a similar product to VIVO; they've done quite a lot of author disambiguation, they have a large amount of PubMed citations in their system, and it's a possibility that they would feed VIVO profiles with that data. And then of course professional societies such as the AAAS are curators of data which would be very useful for building out VIVO profiles. Participation can also include the development of applications.
So Dairy, Digital Vita, and other semantic web community members can build applications that reuse and repackage VIVO data, that pull that data from the local or the national level and repackage it. You can also be a consumer of the data: your major search engines such as Google, Bing, and Yahoo, again your professional societies, and then any producers or consumers of semantic-web-compliant data can participate in the VIVO initiative. But there are a number of challenges to facilitating adoption. First off, in developing VIVO, we have to balance the needs of the individual with the needs of the institution. At the University of Florida, many of the researchers want CVs, they want biosketches, they want collaboration tools. They don't necessarily want a VIVO that helps their administrators evaluate them. The institutions, on the other hand, might want a faculty reporting tool. And so finding common ground between those two needs is a very important step for VIVO. VIVO was originally designed for researchers, and the research community has been enthusiastic and willing to participate. That enthusiasm might vary based on whether a researcher is early in their career or well established, but generally speaking, the enthusiasm is high for VIVO. As we branch out to the biomedical community, and maybe more specifically to the clinical community, we're finding that their business model is very different, and so we need to better understand the challenges and needs of that community before we can really expect them to adopt VIVO. On the right-hand side of the screen, you see a graph that talks about adoption trends. At the very front end of that graph, we have about 16% of potential adopters who will adopt no matter what; if it's exciting new technology, they'll jump on board. On the latter half of that curve, we see the people who, no matter what kind of marketing you do, are always going to be late adopters of a technology.
So in the middle there, we have an early majority and a late majority, and these two areas comprise about 68% of possible adopters. These are the people that are crucial. We need to identify what makes them tick and then make sure that VIVO meets those needs and addresses their particular issues. We do feel that VIVO does that with the link to the local data sources, harvesting data automatically and not making local researchers fill out their own profiles initially. We think that that is a crucial step to meeting the needs of potential adopters. And so in the VIVO model, we choose to meet those challenges through the support and dissemination of VIVO through libraries. Libraries are traditionally a neutral entity on campus. We understand our user communities and the research environment in which we work. In my very first position in a library, I was there maybe about six months, and our IT and library were merged into one. Librarians are very adept at information management and are increasingly involved in IT decisions in their local communities. We are also subject experts. So whereas I'm 80% on VIVO right now, I'm also an agricultural science librarian, and I feel that I understand the needs of my agricultural science community, what drives them, and what kinds of policies affect my faculty and my administrators. Librarians also have a very strong service ethic, a tradition of providing academic support to our communities. And, maybe most interesting for me as somebody working on VIVO, we can also resolve many of the data integration problems endemic to legacy systems. Within my Institute of Food and Agricultural Sciences, it was the putting out to pasture of one of these legacy systems that brought VIVO to us in the first place; VIVO was presented as a possible solution to this issue, and that started conversations locally. So what do librarians do within VIVO? Well, we do content development.
So we identify what local data to grab and push into our VIVOs. We help refine the ontology: we look at the data, we have to understand that data, and then we identify whether the VIVO ontology will meet those needs and, if not, what ways we need to change the ontology to better hold and define our data. We also make suggestions for interface refinement. And again, we negotiate with the people on campus who hold the data. We identify what data it is that we need, and we tell them how we're going to use it and, in some cases, how we're not going to use it. All data that is stored in VIVO is publicly available, and the administration has very legitimate concerns that that data isn't used improperly. We also provide local- and national-level support and training. This means the development of documentation, website FAQs, presentations, and any kind of publicity and marketing materials; librarians have their hands in it. We are there helping to create those support materials. Through VIVOweb.org, we are liaisons to potential collaborators. We create a community of support through user forums, and we also provide a large amount of feedback, both on usability and in the creation of new use cases, which we then deliver back to the development team for greater feature development. And then lastly, we also do marketing through the development of the PR materials, and we do a large number of demonstrations of VIVO at conferences, at workshops, and through the website. So what's ahead in VIVO development? As of this week, this Friday, Release 1 will be deployed at our seven partner sites on our production hardware. Existing data within our previous versions of VIVO will be mapped to the local Release 1 ontology. And once that's done, we can then begin batch loading data from various sources on campus. So at UF, what we've done so far is we've already loaded our human resources and people data.
And once Release 1 is out, we will expand to grants, to courses, and to local events on campus. Release 1 will then allow us the opportunity to provide feedback for future releases: feedback on ontology development and on whether or not a local ontology needs to be built. We do not change the core VIVO ontology; we need to keep that as consistent as possible for national networking. But in some cases, where an institution has a very unique instance, you can develop a local ontology to help meet the needs of those unique types of people or data. We also will improve our usability, and VIVO will become open-source software under a BSD license, with the ontology available for download. Over the next six to 12 months, we will again expand our data ingest framework to include a greater number of sources. We will provide visualizations, both in-page and at the site level, and then we'll also bring improvements to the modularity and customizability of VIVO. We'll continue to provide user support material, developing it as we get feedback from the community and additional adopters, and we'll expand functionality to meet the developing use cases. So there are a number of things that we feel we're doing that will help drive future participation. First and foremost, we've already mentioned user scenarios. These user scenarios will help continue to develop the ontology and identify key features that our users want to see within the VIVO interface. We will do iterative user testing to improve usability, and we feel that, because of the nature of the seven schools that are within this grant, we have a large array of different types of administrators, scientists, clinicians, students, and staff. We think that VIVO provides a compelling proof of concept within the consortium. We will also be addressing interest from other institutions.
This is done primarily through VIVOweb.org, and on, I believe, August 12th and 13th, there will be a national VIVO conference. Anybody interested in participating in VIVO in any way can attend this conference. It will happen in New York City at the New York Hall of Science. And at that conference, or even prior, if you go to VIVOweb.org, you can request demos and meetings, attend workshops, view exhibits, and download and use the publicity that we've been creating. That publicity is designed both to increase our partnership and to help sell VIVO at the local level. So, say you've decided to become a VIVO participant: how do you sell VIVO to your local community? We have created quite a lot of publicity and PR materials to help you sell VIVO locally. We're also going to be doing quite a lot of network analysis and visualization. There are three levels of visualization maps that we will be providing. The first would be at the individual or investigator level. These would be in-page graphs: say, within my own personal profile, I could view my publication history, all sorts of collaborations that I've done with other people, say on grants or maybe co-teaching courses, and then view my co-authorship networks. Another level would be the departmental or institutional level, where you could view trends in your research grants or your publications, your collaboration networks, or topical alignment with base maps of science. And then the last level is the network level: patterns or clusters by geography, by topic, by funding agency, by institution. Really, if we have the data in there, we can probably visualize it. So future versions of VIVO will generate CVs and biosketches for faculty reporting. We'll also incorporate external data sources for publications and affiliations.
So this might be, say, your database aggregators, your major publishing citations from your databases. We would also be able to display visualizations of complex research networks and relationships, such as the one you see on the left-hand side of the screen, which is a co-authorship network. We would link data to external applications and web pages, and then finally we would realize the full potential of the semantic web. So with that, Dean and I both encourage you to go to vivoweb.org. Take a look. You can request a demo through the contact form. You can drop us your questions, your concerns, any interest that you have, and we're here for questions. Hi. I'm just curious, can you say something about the adoption on the seven campuses that this is being extended to? Who owns that decision? I'm Brian Skip from the University of Michigan, and we've got a bunch of different units on campus looking at a bunch of different products. There was an RFP that went out for an expertise database. We've got the med school using Collexus and finding that they have to invest heavily in correcting the data. We've got Elsevier staff, I shouldn't say peddling, but telling us about the wonders of SciVal, and Thomson has their product. All this stuff going on, and what I'm finding is different individuals on campus have different needs, different goals. VIVO is very interesting, but I'm just trying to figure out, on other campuses, who makes the decision to go with VIVO, in your experience? I can say that, well, first of all, at the participating institutions there's a lot of money thrown at them by NIH, so they may not be a fair example. They all agreed to take this on as part of the grant. At Cornell, before the grant came on, it was actually the provost's office and an assistant provost who saw the system, was excited by its possibilities at the institutional level, and really took it on then. I think if you're going to do it institution-wide, you need to get buy-in at that level.
You need to say, yes, here's a system that, first of all, will work across the entire institution. We've been talking here particularly about science because that's the focus of the NIH grant, but VIVO is not in any way science-specific. At Cornell, it supports all the disciplines and runs across the entire institution. It's a broad solution that can fit in anywhere. It will be completely open source and is fully integrated with the semantic web, to the extent that that is potentially of interest. You're not tied to a particular commercial provider. I don't know in detail a lot about the potential competing solutions, but certainly a number of institutions have found VIVO to be compelling and have picked it up. We will certainly be trying to make that case over the next 18 months of the grant as we roll out new capabilities and demonstrate the system more broadly. If you're interested, as Val said, in having us present to anybody at your institution on it, we can certainly do that. At the University of Florida, I think the institution sees the value of having one system for that one institution. At least at the University of Florida, the library was willing to take on the challenge of pulling data and managing a system such as VIVO. In some ways, within UF, if you have the backing of your administration, then I think it's sort of a no-brainer; it just kind of happens. The push can happen with the one organization that's willing to manage the resource. The other thing I will say is VIVO can also really serve as a lingua franca for a bunch of different systems. At Cornell, faculty reporting in some, but not all, of our colleges is done using Activity Insight, and we're just looking at building a standard way to harvest from Activity Insight into VIVO. Again, Activity Insight holds data that you want public, but also data that you want to keep private.
We have to be careful about sorting that out, but again, at the administrative level they don't want to force all the colleges to take on Activity Insight, so Vivo can be the common language and presentation channel across all these diverse systems.

Has NIH installed Vivo, or does it intend to? Are they talking about it?

They're talking about it now. I believe they're talking about it now. We're still... six months in. We're six months in. Nominally, what we've just done now is a release 1.0. I would not encourage anybody at the moment, outside of the participants, to put the system that we've got up today into production at their institution, unless they're willing to do a lot of extra work or take some arrows in the back. Six months from now we should have something that we could endorse for production release. I would also add that we have a contact management listserv, and we're getting daily emails from people who are strongly interested in adopting or participating in Vivo.

I'm Ann Dobson from UC San Francisco. Our CTSI has adopted, I think, a different product. My question is: are they going to work together? Because this has been implemented, we've collected lots of information in it, and we would like to share that.

Certainly, if you can expose information in RDF, and in particular if you can expose it in RDF in a way that we can map into the Vivo ontology, then you could certainly become part of the national network. Vivo in that sense is not exclusive; it really is part of the semantic web. Hopefully we can come to some slightly better ontological mapping than just friend-of-a-friend, but I don't know what you're using for an ontology.

It came from Harvard, I want to say.

It came from Harvard. It's not a semantic web ontology. We would have to look at how we could pull information out of that into a semantic web ontology.
I mean, essentially, Vivo and the Vivo network will happily work with anything that's RDF and can map to the ontology.

I wonder if you might just take a second to talk a little bit about the considerations of public and private presence in displaying faculty networks, communications, products, and all of these fashions, sort of the human element of Vivo.

So you're asking about building private groups and views of information, where you would share it among colleagues rather than share it entirely publicly?

I'm trying to understand the extent to which the individual researcher is revealed through this type of platform, not only within their community. You spoke a little bit about their presence becoming visible in different ways within their institution and across institutions, and that could have a lot of potential impact in terms of a number of decision-making processes, et cetera.

Yeah, I guess the assumption that we're making, and our provost is making, is that for our institution that level of visibility for faculty and researchers is good. You need to be a little careful about what kind of information you make available. Actually, one of the challenges of pulling stuff out of institutional databases is that you want to make sure you don't include grant information before the grant is publicly announced; there are lots of those sorts of timing issues. It is more public information and exposure, but at some level it's probably happening already, just not quite in such an organized way. I mean, the web crawlers are pretty good at digging out whatever is available as online information about people, so individual facts probably are fairly easily discoverable in any case. As for putting it all together: Vivo does make it easier to look across the networks, do comparisons, and gather that kind of information together across a broad swath of science, and I guess it's a mixed blessing.
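As a concrete aside on the RDF mapping discussed above: taking RDF from another profile system and translating it into the VIVO ontology amounts to rewriting triples from one vocabulary into another. The sketch below uses FOAF-style source predicates, but the target predicate names and prefixes are illustrative assumptions, not the actual VIVO ontology URIs.

```python
# Hypothetical sketch: translating (subject, predicate, object) triples from
# another system's vocabulary (here, FOAF-style terms) into VIVO-style terms.
# The predicate names are illustrative assumptions, not authoritative URIs.

# Mapping from source predicates to assumed VIVO-ontology equivalents.
PREDICATE_MAP = {
    "foaf:name": "rdfs:label",
    "foaf:mbox": "vivo:email",
    "foaf:currentProject": "vivo:hasResearchArea",
}

def map_to_vivo(triples):
    """Rewrite triples whose predicate has a known VIVO equivalent;
    triples with no mapping are dropped (a real harvester might log them)."""
    mapped = []
    for s, p, o in triples:
        if p in PREDICATE_MAP:
            mapped.append((s, PREDICATE_MAP[p], o))
    return mapped

source = [
    ("ex:researcher42", "foaf:name", "Jane Doe"),
    ("ex:researcher42", "foaf:mbox", "mailto:jdoe@example.edu"),
    ("ex:researcher42", "foaf:homepage", "http://example.edu/~jdoe"),
]

print(map_to_vivo(source))
# First two triples are rewritten; the homepage triple has no mapping and is dropped.
```

In practice this kind of mapping would run over real RDF with full URIs (for example via a triple store or an RDF library), but the shape of the problem is the same: a declared correspondence between the source vocabulary and the VIVO ontology.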
So, speaking of where information is gathered from: one of the early information-flow diagrams you put up there had an intriguing circle labeled census data.

Census? Oh, probably one of the linked data... I don't think I... Oh, the linked data cloud, was that the big...

Could well be, yeah. Did it have a zillion circles on it?

A zillion, yeah, approximately.

Approximately a zillion. Yeah, so I believe, and I don't know whether it's being made publicly available, I assume it's either statistical summaries or geographic information or some... It's probably not individual information that's linked in.

And so perhaps say something about how non-individualized information would fit into this model. Again, it seems to be person-oriented.

Yeah, so, again, Vivo is focused on individual profiles, so it's not that we're trying to expose only statistical information about an institution or a region, as you would with something like census data. It's more that you can do network analysis and other things once you have the individual data. The individual data itself is, as I said, generally available and shareable, although every time you do one of these network things you can do interesting kinds of analyses that you might not have been able to do before. I mean, there are all sorts of metrics now, scholarly metrics and other things, that take advantage of publication data and enable kinds of graphing and analysis that really weren't easily possible before. So Vivo does enable that sort of thing, and I think that's a good thing.

Hi. Recognizing that you had a big grant to implement this, can you say something about the estimated costs for an institution to maintain this service?

So, certainly, before we got the grant, Vivo at Cornell was running with about two and a half FTEs. There's the data-ingest portion, and there was some development.
I don't know how much of the development to allocate, because you wouldn't have that much in a production system once you've gotten it up and running. I suspect the running costs aren't that high, until, as usual, you upgrade some system, or change to a different system of record, or add Activity Insight, and all of a sudden you want to draw stuff from there, and now you've got to do some work to do the mapping. The regular data flow and operation of the system is not very intense. I don't know, what's your experience?

I might be able to be a little more specific, because at UF, when we decided to implement, we didn't have any funding. It was just two rogue librarians who saw the value of it, and the Institute of Food and Agricultural Sciences, who wanted it. I think we applied for a library mini-grant of $5,000, which paid for a $2,500 server; if your institution already has a server, then that's not needed. And then, because we needed proof of Vivo's value, we used the other $2,500 to have a student manually input a ridiculous number of publications and people profiles, because at that time we had no one who could assist us with data ingest. I would say that that's probably the biggest cost: you need to have somebody who's technical who can assist with the data ingest. Librarians, if you have somebody interested in it, can reach out to the data-source stewards and ask for the data; it's once they get the data in hand that's the problem. If you have somebody who knows the process for smushing data, it's not as difficult to take that data and push it into Vivo.
That's maybe sort of a long answer, but that's where Vivo is at right now. The system itself will be pretty stable, but the issue of mapping from your own institutional sources of record into the system is always going to be an issue. Even if you're running PeopleSoft and we've got PeopleSoft mappings, I'm sure our customized PeopleSoft is different enough from your customized PeopleSoft that there's going to be an issue there. So that's probably the biggest expense; curating the system itself and keeping it running is not a big one.

My response would be that in all these sorts of situations, in straitened economic times, one needs a pretty hard business case for introducing some sort of new system, even if it's open source, and for maintaining it. Hence my question.

Yeah. From the NIH's point of view, there's the ability to find collaborations, put together teams to do research, and find grant opportunities. For Cornell, we really have been able to connect people with funding; the system has worked for that. The system works at the level of, if the president is going off on some trip to Africa and wants to know what kind of African research and other things are going on at the institution, now he can find out. So the biggest economic benefit is on the funding and development-information side: the institution becomes more transparent to its own administration and to the funding agencies, as well as to corporations.

How do you handle author identification, and how does that work as the data moves up into the semantic web?

Yeah, we're working with some of the author-ID efforts now, the ORCID effort. I'm not an expert in this area, but we're certainly looking at how we can do standard disambiguation of authors and make sure that we come up with a common mapping. We actually had a workshop, a Vivo author-disambiguation workshop, where people from ORCID came; we had Barend Mons; we had quite a lot of people who are interested and knowledgeable in the area of author disambiguation. I don't know if Dean and I are really equipped to answer your question, but I would say that Vivo is equipped to help with author disambiguation. If you're talking about pulling in publications from PubMed, from ISI, together with our partnerships with database vendors and with BiomedExperts, the possibilities are there to assist in author disambiguation.

And I didn't go into this too much, but the underlying semantic web application will do inferencing. You can make sameAs declarations to map multiple different author identifiers and indicate that they're the same object; the system supports that, and you'll get a common mapping.

And then, in addition to that, we do have a team working on author disambiguation and coming up with algorithms. The algorithms take into account the author name, location, maybe an email address; there are a lot of different things that they take into account. The algorithms are never perfect; it's a very problematic area, and it has to be something more expansive than just algorithms. But I think the efforts are there, there are a lot of leaders in the field, there's certainly a lot of interest in this, and we're definitely talking to people about it. I think we're going to find that there's some real movement in that area in the next year or two.

Does Vivo include a reporting module? Are you thinking about writing one?
I don't think it includes an explicit faculty-reporting piece. Again, to the extent that you have the underlying data in the system, it's fairly easy to pull it out. There was talk about whether to use Vivo for faculty reporting within Cornell; as I say, the College of Agriculture and Life Sciences actually ended up going with Activity Insight, because that provided them what they needed. So we haven't yet built that capability, but it would be fairly easy to do so.

Maybe, at this point, there are more discussions about faculty reporting for an individual. So if you have a packet to put together for tenure and promotion, Vivo will make it easy for you, as an individual, to pull out that data and insert it into the packet that you're creating.

One more, related to the outcomes: the platform is built to facilitate collaborative processes, or opportunities therein. I'm wondering to what extent those types of collaborations have come about as a result of implementing Vivo on these campuses.

At this point we aren't far enough along to really point to them. We can come up with probably a few examples at Cornell, but we're very early in the process at the other campuses, so we don't yet have examples of collaboration across institutions coming out of Vivo. NIH is counting on us to be able to produce that.

Right. I was interested also to learn of student uptake and use. Can you describe a little bit more about any examples of that kind being facilitated?

There certainly have been examples where prospective students have used this to find, again, across Cornell, life sciences work across the Vet School, the Engineering college, the College of Agriculture and Life Sciences, really spread out. Particularly PhD candidates may not know all the opportunities and people they have to work with in building a special committee; at Cornell you can draw a special committee from all across the institution. So certainly I know that the life sciences initiatives that were the start of
Vivo have been pretty happy with the way it's enabled prospective students and faculty to find out about what's available at the institution. I haven't been involved long enough to come up with specific anecdotes for you, but I could probably get you examples if you wanted to get in touch.

I don't want to take too much of the time here, but is there an element of conversation that can be brought into this platform? By that I mean, I'm thinking about the collaborative processes I've seen evolving.

So, at this point, I think we talk about Vivo as an infrastructure on which you could build such a collaborative system. We're very interested in those opportunities, and in people coming to us with ideas for leveraging Vivo information in a collaborative system that they're working with, but we don't have anything yet, I think.

Thank you all very much. Thank you.
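As a footnote to the author-disambiguation discussion above: the sameAs mechanism mentioned there, mapping multiple author identifiers to one identity, can be sketched as a simple union-find over identifier strings. The identifiers below are made up for illustration, and real VIVO inferencing operates over RDF triples rather than plain strings.

```python
# Hypothetical sketch of using sameAs declarations to merge multiple author
# identifiers into one equivalence class. Identifiers are illustrative only.

class SameAsIndex:
    """Union-find over identifier strings: each sameAs assertion merges
    two identifiers into the same equivalence class."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Unseen identifiers start as their own class representative.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps lookup chains short.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def same_as(self, a, b):
        # Assert that a and b denote the same individual.
        self.parent[self.find(a)] = self.find(b)

    def is_same(self, a, b):
        return self.find(a) == self.find(b)

idx = SameAsIndex()
# Link a local profile URI, a PubMed author string, and an ORCID iD.
idx.same_as("vivo:individual/n1234", "pubmed:Doe-J")
idx.same_as("pubmed:Doe-J", "orcid:0000-0002-1825-0097")

print(idx.is_same("vivo:individual/n1234", "orcid:0000-0002-1825-0097"))  # True
```

The matching algorithms the speakers describe (name, affiliation, email signals) would be the step that proposes these sameAs links; once asserted, transitive merging of identities follows mechanically, as above.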