Okay, I think we're going to go ahead and get started; it looks like everybody's gotten a seat. So, good morning. Can everybody in the back hear me okay? Good. Thank you. I'm Andrew Sallans, the partnerships lead at the Center for Open Science, and today I'm going to talk about improving integrity, transparency, and reproducibility through connection of the scientific workflow, or the scholarly workflow more broadly. There are three things I want everybody to take home from this conversation, all about the Open Science Framework, a platform that we build at the Center for Open Science. The first is that this is a free and open-source platform, something anybody can use; we try to reduce the barriers to entry as much as possible. The second is that it's designed to add efficiency to the workflow: to make users' lives easier and reduce the steps involved in common tasks. The third, and this is a big, important point, is that we want to bridge the workflow; we want to connect to lots of other tools and services across the workflow. Keep these three things in mind, and we'll cover each of them along the way. So, why is this an important thing to be talking about right now? I've got a snapshot of several news articles from scholarly communication outlets, some from large journals, some from the popular press, all talking about different problems in science: reproducibility, the publishing of false results and confirmation bias in publishing research, and "power failures" around sample sizes and power analysis.
These problems are getting lots and lots of discussion in many areas of science and scholarship more generally, all centered on the published literature, and everybody is looking for solutions. On one hand, what we're looking to do is address these types of problems directly. On the other hand, what we're trying to work on are barriers and incentives, the discrepancies between them. One way of thinking about this is norms and counternorms. In scholarship, or research in general, the norms outlined on the left here are the ideals, the things we want to work toward. Communality: open sharing. Universalism: evaluate research on its own merit. Disinterestedness: you're not looking for a particular outcome; you're interested in the question and the discovery. Organized skepticism: consider all new evidence, even against one's own work, so you could discover something and then disprove yourself. And quality. The counternorms are the opposites of these things. Secrecy, as opposed to openness. Particularism: judging work by the reputation of the person behind it rather than the merits of the work itself. Self-interestedness: research becomes a competition rather than a discovery process. Dogmatism: promoting one's own theories and findings rather than weighing new evidence on whether something is valid. And quantity over quality. The point is that these are the kinds of conflicts in incentives that generate barriers around these issues. As one way of looking at this, there was a study published in 2007.
It surveyed 3,247 mid- and early-career scientists funded through the NIH, asking what ideals people subscribe to on this norm-versus-counternorm question. The pattern is that people say they believe in one thing and then practice something different. In the first panel, the gray bar on the left is the norm, and this is one's own behavior: what do you actually practice? Nearly everybody says, yes, I practice the norm; only a small percentage admit to the counternorm. The second panel is perceived peer behavior, what does everybody else do, and the distribution shifts toward the counternorm. The third shifts even further: in the respondents' perception, others behave almost entirely by the counternorms. So the perception is: I believe in this, I do this, and everybody else does this other thing entirely. That presents significant problems. It's always somebody else's problem, never mine, and that is a difficult thing to overcome. So that's the context we're operating in: practical barriers and incentive issues, plus this me-versus-others perception, and we're trying to figure out how to address these things. Just a little more context before I get into the actual platform: who we are and what we're doing. COS is the Center for Open Science. We were established last year, in 2013, and we describe ourselves as a non-profit tech startup: we are an independent 501(c)(3), but we operate more like a tech startup, trying to be quick, agile, and iterative. We began with four leading foundation funders, so we are grant-funded right now.
At this point, we're somewhere between $14 and $15 million in startup funding; I don't have the exact figure, but we're in that range. We're in Charlottesville, Virginia, near but not part of the University of Virginia. Our team has grown very rapidly over this period as we build toward being able to sustain and support what we're creating: about 25 full-time people and about 20 interns at any given time. An important thing to underscore is that our team is mostly software developers and researchers; we bring researchers onto the team to provide expertise in the things we're trying to build and support. Our mission, as you see at the bottom, is to improve the openness, integrity, and reproducibility of research. We often focus on scientific research, but we think this applies to research generally, which is why I'm calling it scholarship today. So, let me get right into what the actual tool is and what it provides. The URL is down at the bottom: osf.io. If you're on a computer and want to follow along, feel free. We try to make it very accessible: all you have to do is go there and create an account. It's easy, and it's free. We describe it as a project management tool with collaborator sharing, a file storage system, lots of different things, all of which you'll see examples of as we go along. The first thing you do is create an account. Once you do, you can modify your account settings and update your information. We have a profile system built in, so there's an internal representation of your profile within the system: others can see that you're a researcher at a given institution, and you can add social media, employment, and education information.
So, you can basically build out your profile, and we try to connect this to lots of other similar tools and services. The social section will have your ORCID iD, your Twitter handle, your Facebook page, your LinkedIn page, your GitHub ID, all sorts of things that might be relevant. The next section is your dashboard. The black bar at the top is your main menu bar; it has My Dashboard, and Explore, where you can look at public projects. What you're seeing here is where you land once you have an account, and there are two main columns. The project organizer on the left is basically like a file tree, though a bit more complex than that: you can organize your registrations, your projects, your frequently used list, in whatever ways are useful to you. We keep this very flexible and let users specify how they want to organize things rather than imposing some structure on them. On the right side, we have Create a Project, a very quick onboarding method. All you have to do to create a project is type a title, which can change later; a description, which is optional; and a template, which is also optional. If you create multiple projects over and over with the same structure, you can use a template. Say you ran a lab and wanted data collection, data analysis, and publications organized in certain ways in every project; you'd just use a template for that. Once you hit Create, it creates that project, and we'll go on to see what that looks like. You land on this page. My new project is pretty bare-bones at this point, but there's a central menu bar in the middle: Overview, Files, Wiki, Statistics, Registrations, Forks, Sharing, Settings.
And I'm calling out a number of different features. We're on the Overview tab right now, which shows the entire project, including that project's file tree. Over here we have the Sharing section in the menu, and citations: everything in here gets a persistent, unique ID, and we generate citations automatically for people based on that. Down in the bottom right you see a log. Every single action in this system is automatically logged for the user, creating a history and tracking provenance, which is very helpful for understanding how research evolved, and actually very difficult to do if you don't have something like this. In the red box at the top right are Private and Make Public, and you can see that Private is the one selected right now. The default on every project is to keep it entirely private, and we never require that it become open; we allow users to operate privately as long as they'd like. But we also make it very easy to open things up if and when they want, at any point in time. So, those are some highlights; let's go ahead to the next section. The wiki. We have a wiki tool built in here. It's pretty simple at this point, but we're actually adding a lot of new features to it right now. Some highlights: you access the wiki from the menu bar, and under wiki history you can see the entire history of changes and look at differences. Over on the right, you can add a new wiki page or edit one. The next thing is components. Again, we do not impose structure; we let users choose how to structure their projects. The way we support that is through a thing called components, which you could think of as something like folders within a project. If you were on a project page here, you would see Add Component, and you could click that.
You would get this box over on the right, which asks you to name your component and choose a category. Some of these categories, now and in the future, will have certain activities associated with them, certain types of rules. The default, most generic one is Project. You can nest as many components as desired, which really gives users the freedom to structure their project however they like, creating as many branches of the tree as are needed to correctly represent how things relate to each other. Once you complete a component, you'll see down here that this is the top level, a new project, and the data analysis plan is a component of that project. If we were to go back to the top-level view, you would see a tree structure at that point. The next thing I want to cover is adding contributors. We call users on the system contributors; you could call them collaborators. What we're talking about here is adding new users to projects and giving them different types of permissions, because we have permission controls. From your menu bar, you find Sharing, then Add Contributors, and you click that button. You'll see Select From the Results: I'd type a name, say Courtney, get the list of results, choose the plus sign, and it's added over on the right side. Then we specify permissions; the main levels are read, read and write, and administrator. Then you submit, and that person is added. In this case, this is somebody who is already in the system, but you can also add people who are not; they'll be sent an invitation by email and can join that way. We basically put all of this control in the user's hands.
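The tiered permission model just described can be pictured as a simple ordered check. This is an illustrative sketch of the concept, not the OSF's actual data model; the class and field names are my own, and only the three level names (read, read and write, administrator) come from the talk.

```python
# Illustrative sketch of tiered contributor permissions (read < write < admin).
# This models the concept from the talk, not the OSF's real implementation.
LEVELS = {"read": 1, "write": 2, "admin": 3}

class Project:
    def __init__(self):
        self.contributors = {}  # username -> permission level

    def add_contributor(self, username, level="write"):
        if level not in LEVELS:
            raise ValueError(f"unknown permission level: {level}")
        self.contributors[username] = level

    def can(self, username, required):
        # A contributor may act if their level meets or exceeds the requirement.
        held = self.contributors.get(username)
        return held is not None and LEVELS[held] >= LEVELS[required]

project = Project()
project.add_contributor("courtney", "read")
project.add_contributor("andrew", "admin")
```

The ordering is what matters: an administrator can do anything a read-only contributor can, but not the reverse, and someone not on the project at all can do nothing.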
One other thing to call out here, though it's probably a little too far away to read: for this particular user, there's additional information below her name. That information is a product of the profile she created. She gets a richer representation of her identity in a contributor list because she completed her profile, which helps differentiate her from people with common names. As I mentioned before, privacy is an important thing for us. Even though we would like people to be open and transparent in their practices, we recognize that that is not the norm right now, so we try to support people where they are. That's really an underlying philosophy here: don't make people come to us; go to people, and provide support in the ways they want right now. So if your project is private and you want to make it public, we provide a warning that says: once this project is made public, there is no way to guarantee that access to it, or to the data it contains, can be completely prevented. That means once you go public, somebody might access it at that moment; you may go private again, but somebody might have already gotten it. We just make a very explicit statement warning about that. Continuing along, another thing of obvious importance, since it's what most people use, is uploading files. People use this as a file storage system. We have unlimited capacity on this right now; obviously there is a limit somewhere, but we do not impose one on people, and we have not had that abused yet, so please don't abuse it. It's free, we allow people to put in as much as they would like, and we want to support that; we're funded to support that. So we try to make that process extremely easy.
If you're looking at your file tree here, there's an actions column with two buttons. You click the button for the place you want to upload to, a modal comes up, and you pick the file from your directory. The alternative is to take that file and drag and drop it; we have a Dropzone integration, so it will just be added to wherever you dropped it. We try to make that extremely easy. I mentioned earlier that we do logging: every activity is logged, from adding a file, to adding or removing a contributor, to making a project public or private; all of these things get recorded in the history of the entire project, whether it's public or private. So if you had it private and you make it public, everybody can see what was done, which is really important for transparency of process. A second thing we think is a very high-value feature is versioning. This is a challenge for a lot of people and is often done very manually, so we do it in a more automated way. Basically, if you upload a file and then later drop a new version of the same file in, it creates a version history for you. Down in the bottom right you see version one, submitted by this user on this date, and version two by this user on this date. On the right are counters, the number of times each version has been downloaded, and download buttons. This is nice in the sense that it supports collaboration with automated versioning, and it also helps people understand what sort of impact they're having: they can see how many downloads each version has had. Above that there's a download button where you can just download everything at once.
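The automatic versioning just described, where re-uploading a filename appends to a version history instead of overwriting, can be sketched like this. It's a conceptual model only; the field names and per-version download counter layout are my assumptions based on the talk, not the OSF's storage format.

```python
from datetime import date

# Illustrative model of automatic file versioning: re-uploading a filename
# appends a new version entry rather than overwriting the previous one.
class FileStore:
    def __init__(self):
        self.files = {}  # filename -> list of version records

    def upload(self, filename, contents, user):
        versions = self.files.setdefault(filename, [])
        versions.append({
            "version": len(versions) + 1,
            "contents": contents,
            "user": user,
            "date": date.today().isoformat(),
            "downloads": 0,  # per-version download count, as shown in the UI
        })

    def download(self, filename, version=None):
        # No version argument means "latest"; counts are tracked per version.
        versions = self.files[filename]
        entry = versions[-1] if version is None else versions[version - 1]
        entry["downloads"] += 1
        return entry["contents"]

store = FileStore()
store.upload("analysis.R", "v1 contents", "erica")
store.upload("analysis.R", "v2 contents", "erica")
```

Uploading `analysis.R` twice yields two retrievable versions, each with its own download counter, which is the collaboration and impact-tracking behavior described above.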
So, the next feature and concept I want to introduce may be less familiar: registrations. Show of hands in the room: how many people have a sense of what I mean by a registration? Okay, that's what I was expecting. This is something that's increasingly emerging right now, especially in the social sciences and some life science areas. There are really two uses. The first is registering studies before they're done, to avoid changing hypotheses to accommodate outcomes. You have a hypothesis, you collect data, you do your analysis, you don't like the result you get, and then you say, well, I actually meant to have a different hypothesis. That's not ideal for quality in science. So a lot of disciplines are starting to adopt this concept of registration: at the very beginning, you register your claim someplace. You say, I'm going to do this study, this is the question I'm going to ask, this is the protocol and the analysis plan, and when I get to the end, I will have evidence that that was what I was intending to do. That way, if the results come out in ways you don't like, those are still the results; you can do a different study, but those are the results. We provide a registration mechanism to support that. On a project, there's Registrations up here; you hit New Registration, and it provides you with a form, for which there are various templates. We have an open-ended registration with several text boxes to complete. A user types in whatever is required for those boxes, and it creates this copy over here. You can't see it very easily, but there's a watermark behind this that says Read Only, and this is a frozen copy.
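That "frozen copy" idea can be illustrated as a snapshot that is deep-copied and checksummed at registration time, so later edits to the live project cannot touch it and tampering is detectable. This is a conceptual sketch under my own assumptions, not how the OSF actually stores registrations.

```python
import copy
import hashlib
import json

# Conceptual sketch of a registration: a deep-copied, checksummed snapshot
# of the project at one point in time, unaffected by later edits.
def register(project):
    snapshot = copy.deepcopy(project)
    digest = hashlib.sha256(
        json.dumps(snapshot, sort_keys=True).encode()
    ).hexdigest()
    return {"snapshot": snapshot, "sha256": digest, "read_only": True}

def verify(registration):
    # Recompute the checksum to confirm the snapshot is unchanged.
    digest = hashlib.sha256(
        json.dumps(registration["snapshot"], sort_keys=True).encode()
    ).hexdigest()
    return digest == registration["sha256"]

project = {"title": "New project", "hypothesis": "X improves Y", "files": []}
reg = register(project)
project["hypothesis"] = "X has no effect"  # later edits don't touch the snapshot
```

The live project keeps evolving, but the registration still holds the original hypothesis and still verifies, which is exactly the evidentiary property registrations provide.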
So the whole point of a registration is that you can't change it afterward; otherwise, that would defeat the entire purpose. It's a completely frozen copy of the project at that point in time, and when you get to the end of the project, you can refer back to it as evidence of your intentions. People also use this another way, creating a registration at any time in a project. Say you just want evidence of where things stood: before you do the data collection, you create a registration; before you do the analysis, you create a registration; before you publish the next thing, you create a registration. If you just want an easy snapshot of where the project was at a point in time, that's a case for creating a registration, and people do use it that way sometimes. As I said, these are increasing in use. A number of journals are requiring them, and some journal practices are actually shifting toward making this a priority for review and acceptance. So it's increasing in awareness, popularity, and usage. The next thing: sharing work. I mentioned earlier adding contributors and the ability to control permissions. A different dimension of this is sharing in a private way. Say I do not want to add somebody as a contributor, which would elevate them to the status of collaborator; they're not really a collaborator, they just need to see my material while it's private. Perhaps a peer reviewer, perhaps a funding reviewer. What you would do is create a view-only link. You click Create a Link, which pulls up this box over here where you generate a new link to share the project. You can name it, describe it as, say, peer review, and you can choose to make it anonymous if you want. If you make it anonymous, it wipes identifying information from the log history and everywhere on the page.
That means every identity on the project. It does not go down to the file level and wipe identities out of files; that's obviously a risk to be aware of. But at the level of contributors, versions, the history log, and all of that, it wipes identities and shows them as anonymous. This can be extremely helpful for peer review, anonymous or blind peer review in particular. The second aspect is that you can choose which parts of the project to share this way: just particular components, or the entire project. And the last important point is that you can expire these links, which is critical. You might have a several-month period where you want somebody to have access, and then you don't want them to have access anymore, so you go in and delete that access. A critical topic to bring up here, of course, is unique and permanent IDs. Our philosophy, and I think a shared philosophy in this room, is that scientific content must be easy to cite and annotate. Without being able to consistently get to something, it simply is not going to be as reliable. Right now our strategy is GUIDs that we generate ourselves. We are exploring other options, whether we go with DOIs or EZIDs or other types of persistent identifiers, but right now we have our own local GUIDs that we generate. They look like these: osf.io/ followed by five characters. The characters are randomly generated; we do not let people specify their own, so no vanity URLs right now. My point is that these are three entirely different types of things that all have the same style of persistent ID: this one is a project, this one is an individual's profile, and the third is a file. In all cases there's persistent access to many different parts of the Open Science Framework and somebody's experience in it. All right.
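The five-character GUIDs can be pictured as short random strings minted over a fixed alphabet, retried on collision. The alphabet and generation scheme here are my assumptions for illustration only; the talk says the characters are random but does not specify the OSF's actual algorithm.

```python
import random
import string

# Illustrative GUID minting: a random five-character string over a fixed
# alphabet, retried on collision. The alphabet is an assumption; the OSF's
# real scheme may differ.
ALPHABET = string.ascii_lowercase + string.digits

def mint_guid(taken, length=5, rng=random):
    while True:
        guid = "".join(rng.choice(ALPHABET) for _ in range(length))
        if guid not in taken:
            taken.add(guid)
            return guid

taken = set()
guid = mint_guid(taken)
url = f"https://osf.io/{guid}/"
```

The same minting scheme serves projects, profiles, and files alike, which is the point made above: one persistent-ID style across very different object types.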
So the third thing I mentioned at the beginning, important for everybody to take away, is that we are trying to connect the workflow. I've just talked about all internal features; now a little about external ones. We have add-ons. Our model is: we have an API, other tools and services have APIs, and we want those APIs to connect, bringing value to our users and our service as well as to the users and product owners of other services. Right now these are the five we have in production: Dropbox, GitHub, Dataverse, figshare, and Amazon S3. If you're not familiar with these: Dropbox is general file storage; GitHub is for code; Dataverse, which I imagine a lot of people here know, is largely a social science data repository; figshare is a general-purpose repository, or at least one that began with figures; and Amazon S3 is a general-purpose industry data storage service. We chose these mostly because they had very accessible APIs and a fair amount of usage in certain areas of our user groups. So what happens is: when you're in a project and looking at your menu bar, there's an option for Settings up on the right. You go to Settings, and it provides a selection of the add-ons that are available. We have three categories right now: documentation, storage, and other. The storage section shows the five or six add-ons we have; the other entry on there is not really an option, actually, it's always on: that's OSF Storage, the internal storage. You choose the one you want to add, and down here it lets you configure it. We do this via tokens, so we do not keep credentials. Once you've authenticated, you assign it to a project: you pick a folder, or whatever the appropriate structure is within the other service, and that becomes available on this project.
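The token-based add-on model, where the platform holds an access token rather than credentials and each project links to one folder or repo in the external service, can be sketched as follows. The class and field names are illustrative only, not the OSF's API.

```python
# Illustrative sketch of the add-on model described above: the platform
# stores an access token (never a password) and maps each project to one
# folder or repo in the external service.
class AddonAccount:
    def __init__(self, service, token):
        self.service = service   # e.g. "github", "dropbox", "s3"
        self.token = token       # OAuth-style token; no credentials kept

class ProjectAddons:
    def __init__(self):
        self.links = {}  # service name -> (account, remote folder/repo)

    def configure(self, account, remote_path):
        # Attach one remote location per service to this project.
        self.links[account.service] = (account, remote_path)

    def list_services(self):
        return sorted(self.links)

github = AddonAccount("github", "gho_exampletoken")  # hypothetical token
addons = ProjectAddons()
addons.configure(github, "cos/osf-demo-repo")  # hypothetical repo name
```

Because the project stores a link rather than credentials, one teammate's authentication can expose the remote repo to the whole team, which is the scenario described next.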
So the value this adds: say I have a team using the OSF, but I happen to be the only one on the team who uses GitHub. Nobody else has a GitHub ID or wants one, but everybody wants to be sure they can see what's in there, even when it's private. I can authenticate my GitHub account with our project, and then everybody on the team can interact with that GitHub repo through the OSF, and the opposite. We do bidirectional sync as much as possible: I could add something on GitHub and it would show up here; I could add it here and it would show up in GitHub. That's true for pretty much all of these add-ons, and that will be the idea for most add-ons going forward. These are all storage, but we want to do a whole lot more than that; as I said, connecting the workflow is the philosophy here. So, some other examples to illustrate what we mean by "across the workflow." DMPTool: this is a service created to help researchers produce data management plans that are compliant with requirements from funders or other organizations. Right now that service ends at the creation of a DMP: somebody goes in, sees the requirements, completes various questions, gets a DMP out, submits that DMP to a funder, end of story. What we want to do is take that plan, move it into the OSF, and make it an active, living document, something people can use as a guide to creating their entire project, which ideally is the project's starting point. We think that would be a good addition for people. A second example is Databib and re3data, both catalogs of data repositories. The idea in this case would be to provide a data repository lookup service for somebody managing their data in the OSF.
They get toward the end of their project, and a journal or funder requirement says they now have to deposit that data someplace else. Where do they deposit it? They don't know. So the idea would be to have Databib and re3data be the lookup service that provides suggestions based on the type of data, the discipline, various criteria. The thing we'd like to extend on that is adding a checklist: the repository you've been suggested has these requirements for deposit. It accepts these formats, these types of files, these sizes; it requires these metadata fields. Basically, help researchers transition from however they're working now to a deposit-ready state, which we know is a major hurdle for people. A third example, similar to the other storage services but more preservation-oriented, is Databrary. Databrary already does sensitive video storage very well; they handle lots of video of different types of interactions with participants in studies. It's a hard problem, and one we do not want to solve ourselves, but we want to help people access that capability. So we would connect to their service: if you were in the OSF and had the appropriate rights in their environment, you would be able to work across both. Okay, that's the end of connections for right now. I want to show a couple of use cases from real people so you see what we're talking about in more detail. This is a real user: Erica, a PhD student in psychology at UC Riverside. Full disclosure: she worked with us as an intern for a semester, so she got fully on board with this process in that environment. The really positive thing is that she worked as a developer with us, even though she's a psychology grad student.
She saw the value this could bring to what she does, took it back, and her entire lab adopted it. She's not in charge of the lab, but her lab believed this would significantly improve how they were working. The tools they use day to day, not all of which we support entirely right now, are Qualtrics, Dropbox, SurveyMonkey, and R. We have Dropbox now, and we actually have an R library that lets you access public data in the OSF from within R, so you can pull data from the OSF and do your analysis in R. Qualtrics and SurveyMonkey are things we'd like to support in the future. The features she said brought the most value are version control, collaboration, and the wiki, which they use for all their lab notebook and meeting documentation. When they have their weekly lab meeting, they record things in the wiki for their project in the OSF, and it's helped consolidate all of these things. A second case, and this is a fair bit different: Rodrigo is a researcher in bioinformatics down in Mexico City. They have lots of different types of genetic data, so they use things like next-gen sequencers and pipelining software, and they write a lot of their own software. Their problem is: how does everybody on the team make sure they continue to have access to the software and the different parts of the project? The things they found extremely useful were version control, file sharing, the GitHub and Dropbox integrations, and public sharing. They really wanted to make this consolidated view of their work public and visible. They do this gradually; they're not making everything instantly visible, but they wanted to be able to make it more easily visible as it's represented across multiple services.
A third one, which some people here may be familiar with because of Norm, is the OSF in the classroom. This is Project TIER at Haverford College; I think they've presented here before, actually. The problem they're trying to solve is how to teach undergrads to make research reproducible and verifiable. They have various programs, workshops and the like, that teach those techniques, or at least help people be aware of and thinking about these problems. They find the OSF extremely useful for organizing documents and data; for the command files, as students write them and need to know how the analysis is being done; for the metadata, describing the parts of what they have; and for file sharing generally, across the team and across classroom groups. So those are a few examples. We have a lot of other use cases, but these are some of the more prominent, common ones. Now, I think I promised this in the billing, so let me put it up briefly: if you're interested, this is how the OSF is built. I'm not a developer on the team, so I won't go into a lot of depth, but our CTO is here at the meeting; he presented yesterday, and he could certainly talk about more if you're interested. We are a Python shop: it's all Python and JavaScript. We use Git for versioning internally, MongoDB with TokuMX for the NoSQL database, Ansible for automating the entire deployment process, and Elasticsearch for indexing. I list the OSF API because it's actually an important part of our infrastructure: we have an API that we use ourselves, so the core infrastructure interacts with the top-level infrastructure through our own API, and other services will obviously be able to interact that way as well.
We use Rackspace for storage and Linode for hosting. So that's what the environment looks like right now. Just to close out and provide context on how this connects to some other things: if you're interested in what we're doing, in collaborating or getting involved, this is what SHARE is built on. You may have heard about that yesterday, or in other venues, but the OSF is the infrastructure to support SHARE. So one way you could contribute is to add content into SHARE and partner on curation of content; that's a future direction we'll have lots of discussion about, I'm sure. A second thing we're just getting started on is an ambassador program, so these types of use cases can be shared at various institutions, so that more people know about this and more people can decide whether it works for them or not. A third thing that some people in the room may have already heard about: as a complement to this service, and to help support these broader goals, we are funded to provide training and consulting on reproducible statistics and open practices. Basically, we go around giving workshops and training sessions, and we have conversations with lots of people about how to make statistics more reproducible and the entire process more transparent, including the use of the OSF to support these things. We've got workshops set up over the next few months at Princeton, Yale, Duke, a whole series of places I'm blanking on right now. But we would love to offer these in more places, so if this is of interest to you, I'd love to talk about it. We usually do it through libraries as much as we can, since libraries are neutral places that are not solely tied to departmental interests. And the last thing: we are always hiring.
So if you or anybody you know would be interested in working on these types of activities, please see our jobs page and please encourage people to apply. It's a very active, growing environment, and it's exciting to be part of it; we'd love to have more people join. I'd be happy to take questions now, but if things come up later, these are the places to contact us: contact@osf.io, or on Twitter, @OSFramework. We'd love to talk more. So thank you very much for listening, and I'd be happy to take any questions.