Thank you very much for coming. My name is Kate Wittenberg and I'm the Managing Director of Portico. My colleagues here, Tim DiLauro from the Data Conservancy at Johns Hopkins and Ken Rawson of IEEE, are going to tell you about a project that we've been working on this past year and that will be going on for a little more than another year. I will give you the overview and introduction to everything. Tim will talk in more detail about the technical work that we're doing. And then Ken will talk about an important component of our work, which is the community engagement and outreach that we're focusing on. And then we'll have time for people to ask questions and have discussion.

So we're calling this the RMap project, short for Research Map. The goal of this project is to engage with a much broader set of changes and developments that are happening within the scholarly communications environment. Our work is driven by the need to respond to a change in the whole definition of the article, the publication that results from research. In the past, the article has been primarily a text with some graphics, or data that was usually indicated by a citation. Increasingly, the primary unit of scholarly communication is becoming much more complicated. It's becoming a multi-part, distributed object that more often than not includes data and sometimes software. The elements of this new kind of publication may reside in different repositories, maintained by different institutions, employing different technologies. So the goal of our project is to maintain and preserve the connections among these various components of this increasingly complex scholarly object.

An important part of our project is the partnership at its base. We believe that our respective institutions represent important pieces of this whole picture, so it matters to us and to the project that we're already involved with each of these areas. The Data Conservancy, as most of you know, has tremendous expertise in the management of large data archives from multiple disciplines. IEEE has expertise in the management of data-intensive scholarly journal publications, perhaps more so than any other publisher. And Portico has expertise in digital preservation and in the workflow requirements of our publishers, who now number 275 across many different kinds. So we bring that expertise together.

The overall work plan for this project, which as I mentioned runs two years: in year one we have what we're calling the planning phase. During this time we're working to gather requirements and create use cases. We've held a workshop with stakeholders in the community, which was very useful and very important for gathering feedback at an early stage of the project, when we can make, and have made, changes in our planning, rather than getting feedback too late to make that sort of change. And we're refining the scenarios based on this feedback. By the end of this year we want to have enough of a specification to move into year two, which will be focused on development of a prototype: a system to identify, store, update, and retrieve relationships among publications and new forms of scholarly output, including, as I mentioned, data and software attached to the article.

The outcomes and deliverables for this work, we expect, will be a working prototype of the RMap tool.
Existing, and planned future, collaborative partnerships with the community, and by that I mean data repositories, publishers, libraries, and perhaps individual researchers, the different stakeholders for the work that we're doing. A system that supports emerging forms of digital scholarship and publishing as we move forward. And a plan for sustainability of the project beyond the grant funding that we've received from the Sloan Foundation to start this up and develop it during these first two years. We'll be talking and thinking more about the sustainability plans as we get a better sense of how the project itself is developing.

So now I'm going to turn things over to Tim, who as I said will talk more about the technology work behind this project.

Thank you, Kate. So I'd like to start with some key objectives that we looked at from the technology perspective. A lot of you are familiar with some of the bigger services in the scholarly communication environment: our bigger publishers, Crossref, now DataCite. One of the things that we wanted to be able to do was to support assertions into this environment from players who are not necessarily part of those big groups. We wanted to be able to include the results that come into RMap as part of the linked data environment. And while we do want to have input from these smaller players, or even individuals like authors eventually making assertions about the entities that they create, we also want to be able to leverage data that exists in established environments like those of publishers and Crossref, software repositories, data repositories, and the authors themselves. We also wanted to be able to deal with resources, like textual citations or named authors, that don't have any identifier associated with them, and at least provide some level of support for those within this environment.

So based on those objectives, we developed a data model, and this is a very simplified version of that model. I hope at least some of you can see it; if not, it will be in the slides. I'm going to walk through the elements of this data model a little bit and talk about some of its key components.

The first thing I want to talk about is the concept of a resource. I think we can all understand what that is from our understanding of the web. A resource is a thing that can be identified with a URL or a URI on the web. It's one of the web's basic building blocks, and it is a key entity for description and retrieval within the web. Within this environment, we are focusing on a model that supports linked data and is based on RDF, or at least on the concept of a graph that RDF supports. Resources form the subjects, and also the objects and predicates, in RDF triples, or in the relationships in the graph. And I'll also note that while resources are called out specifically, the other entities in this model are also resources, and they support the semantics of resources as well.

The next concept is the RDF triple, which is a building block of the semantic web. It is conceptually of the form subject, predicate, object, which is very similar to subject-verb-object in English, and it's the way resources are described in the semantic web.
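[To make the triple idea concrete, here is a minimal sketch using Python and the rdflib library. The article and author URIs and the choice of the Dublin Core creator predicate are illustrative assumptions, not identifiers or vocabulary from the project itself.]

    # Minimal sketch of an RDF triple with rdflib (pip install rdflib).
    # The URIs below are hypothetical examples, not real RMap identifiers.
    from rdflib import Graph, URIRef, Namespace

    DCTERMS = Namespace("http://purl.org/dc/terms/")

    g = Graph()
    article = URIRef("https://example.org/article/A2")        # subject
    author = URIRef("https://orcid.org/0000-0000-0000-0000")  # object

    # One triple -- subject, predicate, object: "this article has this creator".
    g.add((article, DCTERMS.creator, author))

    print(g.serialize(format="turtle"))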
We also introduce a new concept called a DiSCO, and we have nice icons, disco balls, to represent these entities in some of our sketches. A DiSCO is a Distributed Scholarly Compound Object, and the idea here is that if you're familiar with OAI-ORE resource maps, these entities are very similar to OAI-ORE aggregations. I have a little example that I can show. The idea here is that within the blue lines we have the compound object itself, and within the red line we have the entire DiSCO. The compound object represents an assertion, by whoever is asserting this DiSCO, of what they consider part of the object. The other two entities in this sketch are sort of the tentacles; you can think of them as tentacles into the semantic web, so that this compound object doesn't sit by itself, it has connections into the broader web. So this is very similar to OAI-ORE aggregations as described in a resource map, as I said, but without some of the constraints that OAI-ORE imposes. Our DiSCOs can, however, be expressed as ORE resource maps.

Just a little example of what this enables us to do. In this sketch, we have a couple of entities that are being held in a software repository and a data repository, and in the publishing system an article is being published; the article, A2, and the data set associated with it, D2, may have been built specifically for that article. You can see in this sketch that D2 was derived from D1, and D1 is sitting somewhere else, in a data repository. You can also see that D2 was the output of some software program, which is sitting in a software repository. So one of the things we can do with RMap: if each of these parties is pushing their DiSCOs into RMap, an identity provider like ORCID could come along later and, based on the identity of the creator C2, harvest information out of RMap without having to visit each of those software repositories, data repositories, or publishers.

A couple more entities within our model. We have the concept of an agent, which is a person, or a thing, or a group of things, an organization for example, that is responsible for some action that is happening. We make a distinction in our model between a scholarly agent, someone associated with the creation or modification of a scholarly artifact like an article or a data set, and someone who is performing actions within the RMap system itself; that kind of agent is called a system agent within RMap. For example, a user would be considered a system agent.

And finally, events. One of the things that we want to be able to do within RMap is capture provenance, to know when things happened and who did what within the system. From the event information we would be able to reconstruct the model, the graph, at any point in time, so that in the future we might be able to see its state at different times.

So next I'm going to talk about our APIs. We've decided to use REST APIs. In the past, in some of our projects, we've gone down the path of Java APIs, and we're going to do that underneath, but one of the lessons we've learned is that we need to expose information in a very lightweight way. The benefits of this approach are that we are programming-language independent and we don't need to be tightly bound to the underlying model. The model can change, often in ways that we don't have to expose; we don't have to break our APIs each time the model changes.
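[As a rough illustration of how a DiSCO-style assertion like the one in Tim's sketch might be expressed as a graph, here is a continuation of the rdflib example. The ore:aggregates and PROV predicates are borrowed from the OAI-ORE and W3C PROV vocabularies for illustration, all identifiers are invented, and the real RMap model may differ in its details.]

    # Hypothetical sketch of a DiSCO-like compound object as an RDF graph.
    # Identifiers are invented; the actual RMap vocabulary may differ.
    from rdflib import Graph, URIRef, Namespace

    ORE = Namespace("http://www.openarchives.org/ore/terms/")
    PROV = Namespace("http://www.w3.org/ns/prov#")

    g = Graph()
    disco = URIRef("https://example.org/disco/1")
    article = URIRef("https://example.org/article/A2")    # the published article
    d2 = URIRef("https://example.org/data/D2")            # data set built for A2
    d1 = URIRef("https://example.org/data/D1")            # sits in a data repository
    software = URIRef("https://example.org/software/S1")  # sits in a software repository

    # The compound object: the asserter says A2 and D2 belong together.
    g.add((disco, ORE.aggregates, article))
    g.add((disco, ORE.aggregates, d2))

    # The "tentacles" into the broader web: D2 derives from D1 and was
    # produced by software S1 (simplified; strictly, PROV relates generation
    # to an activity such as a software run, not to the software itself).
    g.add((d2, PROV.wasDerivedFrom, d1))
    g.add((d2, PROV.wasGeneratedBy, software))

[With assertions in this form, the ORCID scenario Tim describes reduces to a query over the accumulated graph rather than a crawl of each separate repository.]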
A few of the decisions that we've made about how we do this implementation: we will include versions within our API paths, so that over time, as we evolve our APIs, it will be very obvious to the users of those APIs that they have changed, and they won't have surprises when they go to dereference some object. We also decided that we're not going to be strictly bound by HTTP verbs within that model; we want to be flexible enough to do things that will make life easier for users of the APIs, and not be dogmatic about how those verbs are applied. One thing I didn't mention here is that we've also decided that any access through the APIs will be controlled at least by an API key, even for read operations. So some registration will be required, so that we can have some control over how people are using the system, especially once the system is full of triples and could be very expensive to operate.

This slide is just meant to give a little example of the APIs that we're supporting. You can see that they are focused on either retrieval based on resource identity, or on creation and modification of DiSCOs. The retrieval of resources will be driven by the set of active DiSCOs at any moment in time. DiSCOs can be ingested into the system, and they can be replaced in the system; as they're replaced, they become inactive. We have methods to get at the inactive resources as well, but primarily consumers would be getting resources related to active DiSCOs.

One approach that's often taken, or an initial approach that's often taken, in API development is to specify the paths of the APIs first. We started down that road, and what we found is that it's really important to talk about the behaviors of the APIs even before you start talking about what the resource paths look like. We didn't do that at the beginning, and we're circling back to do it now; the behaviors really should come first, and that's another lesson learned that we're taking away. So we're looking at the API paths; the data models associated with the APIs, which can be slightly different from the underlying data model; the serialization of those models, so if we're sending out something in an ORE model, we might serialize it as Atom, or RDF/XML, or JSON-LD, for example; and then the implementations associated with those.

This is another eye chart, but it's an example that shows some of the behaviors within a particular API. This one is for updating a DiSCO, and it articulates the behaviors themselves, what the requests look like, and what the responses look like. And this is one of the implementation flowcharts that's been developed for one of the API methods; it's in swim lanes. This one is kind of swim lanes without goggles on, so your eyes are really blurry and you probably can't see it. Here's a callout of a little chunk of that, which is maybe a little bit better, but you can see that there's quite a lot of work that goes into specifying the implementations of these.

For API coverage, as I mentioned before, our current focus is on APIs that populate and support access to the graph. Future focus will be on authentication and authorization; other administrative functions; supporting an inference engine, so that you don't have to know the exact property or relationship you're looking for but might be able to find it through broader or narrower terms; and operability of the system, that is, being able to keep the system running and to understand the state of the system and whether it's working properly.
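[As a client-side sketch of what calling a versioned, key-controlled REST API of this kind might look like, here is a hypothetical example using the Python requests library. The base URL, paths, header name, and media types are all assumptions for illustration, since the actual RMap API was still being specified at the time of this talk.]

    # Hypothetical client calls against a versioned REST API with an API key.
    # Endpoint paths, header names, and media types are invented for illustration.
    import requests

    BASE = "https://rmap.example.org/api/v1"        # version embedded in the path
    HEADERS = {"X-API-Key": "your-registered-key"}  # even reads require a key

    # Ingest a DiSCO by POSTing a serialized graph.
    with open("disco.ttl", "rb") as f:
        resp = requests.post(BASE + "/discos", data=f,
                             headers={**HEADERS, "Content-Type": "text/turtle"})
    disco_uri = resp.headers.get("Location")  # URI of the newly created DiSCO

    # Retrieve what the active DiSCOs assert about a resource, choosing a
    # serialization via content negotiation (for example JSON-LD).
    resp = requests.get(BASE + "/resources",
                        params={"uri": "https://example.org/article/A2"},
                        headers={**HEADERS, "Accept": "application/ld+json"})
    print(resp.json())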
And lastly, I'll just mention the technical team's activity: we've developed an initial data model, as I mentioned; we're specifying and prototyping APIs; members of our group are participating in the RDA (Research Data Alliance) data publishing working groups and interest groups; and we plan to have an initial prototype platform of the work that we're doing in March of next year. And now I'll turn it over to Ken for community.

Good morning, I'm Ken Rawson from the IEEE. I'm going to give you all a chance to take a deep breath after that technology deep dive that Tim was doing, back out a little bit, talk about the community aspects of this project, and go into a little more detail following up on Kate's overview.

The scholarly article is still the primary form of scholarly communication, and I think in this day and age that means printed journals and also PDF representations of articles on various repositories, publishers' websites, and the like. But that article is becoming more complex, following up on what Kate said, with many, many more connections and links, an emphasis on data, particularly in the hard sciences, large data sets, and relationships among all these pieces. It's becoming very difficult to get an understanding of the scholarly article as this complexity builds and grows, and we think that's where RMap can come in and serve a useful purpose.

This is a different visualization of a scholarly article, a different view from what Tim presented, and the idea here is that there are a lot of different interconnections and potentially interconnected pieces of an article. It gives you a visualization of the scope or footprint of an article, which can lead to some very interesting uses of this data.

In terms of the role that RMap can play in the STM community, it's the role of an aggregator, pulling together all these disparate pieces of information about scholarly articles in one place. This information is available in a lot of different sources and silos, but you have to go to all those different places to see it, and it's hard to get a sense of a complex article from lots of little bits and pieces. We think that by pulling this together and describing these relationships among the parts of an article, RMap is going to help enhance scholarly communication.

In terms of the RMap community, we've been talking to publishers, authors, funders, data repositories, and librarians, and we're beginning discussions with others: Crossref, DataCite, Dryad, Pangaea, Wiley, and ACM, among others at this point. So we're really reaching out and trying to engage the STM community and the scholarly community, to bring them into this project and have them see the value of it. And we really need to do that, because RMap is only going to be useful if people submit data to it and then use that data in interesting ways. The value is going to come from the community, and it's really important that everybody be involved. As Kate mentioned, we held a workshop in October in New York City; I think there were between 25 and 30 participants from all areas of the STM community, to get their feedback and their thoughts on this project, where it's going, and its usefulness.
Some feedback we got from that workshop is that they felt RMap should be a meta-service or a clearinghouse, which we think makes a lot of sense, capturing all this data in one place. Access to it, as Tim talked about, would be through APIs and various linkage mechanisms, so that people could pull this data and make it useful. They're helping us refine our use cases and define the goals of the project more tightly, and we're working on that; this has been very useful. And it was very clear that they didn't think we should replicate others' work, and we absolutely agree with that. We want to build on what other people are doing, pull that together, and integrate with others.

Some other things the workshop participants offered: they talked about the challenge of secondary data and inferred relationships, and we think that's really a sweet spot for RMap, because that's not really being done at all right now. Folks are talking about it, but I don't think anyone is actively working on it yet, and that's something we're really trying to do. They felt that RMap having an established publishing partner in the IEEE was important. And they thought that we might focus on the input side, in terms of software and research workflows, and really gather content about articles and scholarly communication.

So we're having an ongoing dialogue with the folks who participated in the workshop, and we want to reach out to other people; this is certainly not exclusive. This is an open, transparent project. We really need to engage more deeply with the STM community if this is going to be successful, and with your help we think it will be.

These are the team members working on the project: Sayeed Choudhury and Tim DiLauro from the Data Conservancy at Johns Hopkins, several folks including me from the IEEE, and an extensive team from Portico with Kate. We very much want to thank the Alfred P. Sloan Foundation for their generous support of this project, and we thank the workshop participants for really helping us with this. So we think this is a going concern. We're working very hard on it, but its value is going to be proved by the community. With that, I open the floor to questions for Kate, for Tim, for me, for all of us.

Can you just point out the website where people can get all the information?

So this is the project website. There are a number of documents on there; we update it regularly, and we invite you to go there periodically, because it is changing. Okay, well, thank you very much everyone, we appreciate it. Thank you.