Thanks, everybody, for this opportunity to talk about a project that I'm working on called OpenWorm. Back in the early days of open source development in the 90s, not really all that long ago, an influential essay was written observing how open source code development was being done at the time. There were basically two camps. The first was characterized as a cathedral. In this camp, people developed code within a closed group and put that source code out to the public at regular intervals, but the community of people building the code was largely closed. The second was characterized as a bazaar, a more distributed model in which the users could also be developers of the source code. This was, in fact, the way that Linux had been operating for some time. The essay, called The Cathedral and the Bazaar, was trying to figure out what the difference was between these two models. The main difference, as the essay describes it, is between a closed, hierarchical, top-down approach to developing code and a more distributed, decentralized one where every line of code was, in fact, exposed to the public as it was being written.
Well, Linux ended up being pretty popular. If you have an Android phone, you're running it in your pocket. And it's installed on about 1% of everybody's computers in the whole world. So a lot. And so I think it bears looking at what we in this field can take away from it as well. The author of that essay, Eric S. Raymond, observed that in the source code community, with enough eyes, all bugs are shallow. And I would posit, perhaps, for us in this community that, potentially, with enough eyes, all of the complexity of biology maybe is shallow.
Or at least that's the attempt, and it is the context in which I've undertaken the work that I'll talk to you about today. What we've endeavored to do is to create people-powered computational biology with a project that we call OpenWorm.
So what is OpenWorm? What I'll talk to you about today is structured in basically three parts. We'll talk about computational biology and informatics. We'll talk about open science and social media, and the approach we're using there. And we'll talk about using best practices in open-source software engineering.
OK. So the long-term goal of the OpenWorm project is to create a digital simulation of the organism C. elegans, cell by cell, up to behavior. Now, some folks think this is either terribly uninteresting or terribly over-complicated, depending on who you are. So let me walk through a few of those points by way of introduction to C. elegans itself.
The first thing is its behavior. C. elegans actually does have some interesting behaviors for such a small organism. It is microscopic. It's about as long as a hair on your head is wide. But it does things like search for mates. It avoids predators rather deftly. It has social behaviors. It actually knows other worms and groups up with them. But I think one of its strengths for biology is that it was the first multicellular organism to have its genome completely sequenced. That's one of its claims to fame. And we know a lot about its cellular anatomy as well. We know every cell division from a single egg out to the full adult. It has a nice property called eutely, which means that the cell count is conserved among all individuals that have the wild-type genome. And for the purposes of neuroscience, we know that it has exactly 302 neurons, 95 body wall muscle cells, and a total of 959 cells in its body. And it's very, very well conserved.
So from this perspective, it's actually rather appealing. And for neuroscience as well, it is the only full-organism connectome that we have to date down at the EM level. Mitya Chklovskii, who you heard from yesterday, has actually been involved in pushing that connectome further. So it is continually being improved; it's a work in progress.
All right. We had to break down that big task into something a little bit shorter for the medium term, and it happened to be more relevant to neuroscience. In the medium term, our goal is to reproduce a specific database of behavior with a three-dimensional neuro-mechanical model of C. elegans. And this is the part where I talk about that.
So here's what this guy looks like. There's a C. elegans under a microscope, happily crawling around in an agar dish. Superimposed on top of it is an outline that comes from a computer vision algorithm being run on it. This is used to get a very precise description of its behavior, which is very useful, because we can mathematically decompose the behavior of C. elegans into things like principal components. Researchers have found that approximately four principal components, or eigenworms (I love that), four eigenworms can describe 90% of the variability of C. elegans movement under these conditions. That's really nice, because for a modeling project like this it gives us the ability to build a behavioral classifier. And this has actually been done in some follow-up work, where they can detect the difference between a wild type and any mutant C. elegans just based on the statistics of its movement. That's really important, because it means we can use the same approach on a simulated worm. Does it act like a wild type? Does it act like a mutant? Does it act like none of the above? So that's a really nice constraint.
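To make the eigenworm idea concrete, here is a minimal sketch of the decomposition described above, not the researchers' actual pipeline. It assumes each video frame has already been reduced to a row of tangent angles along the worm's midline, and it uses synthetic postures in place of tracked worms. The function names `eigenworms`, `project`, and `synthetic_postures` are illustrative choices, not names from the talk.

```python
import numpy as np


def eigenworms(angles, n_components=4):
    """Return the top principal components of posture data.

    angles: array of shape (n_frames, n_segments), one row of midline
    tangent angles per video frame (an assumed representation).
    """
    centered = angles - angles.mean(axis=0)
    # Rows of vt are the principal axes; s**2 is proportional to variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt[:n_components], explained[:n_components]


def project(angles, components, mean):
    """Express postures as coordinates along each eigenworm."""
    return (angles - mean) @ components.T


def synthetic_postures(n_frames=500, n_segments=48, seed=0):
    """Made-up stand-in for tracked worms: four smooth modes plus noise."""
    rng = np.random.default_rng(seed)
    s = np.linspace(0.0, 1.0, n_segments)
    modes = np.stack([np.sin(2 * np.pi * s), np.cos(2 * np.pi * s),
                      np.sin(4 * np.pi * s), np.cos(4 * np.pi * s)])
    weights = rng.normal(size=(n_frames, 4))
    noise = 0.05 * rng.normal(size=(n_frames, n_segments))
    return weights @ modes + noise


postures = synthetic_postures()
components, explained = eigenworms(postures)
scores = project(postures, components, postures.mean(axis=0))
print("variance explained by 4 components:", explained.sum())
print("per-frame coordinates shape:", scores.shape)
```

The per-frame coordinates in `scores` are the kind of low-dimensional summary a behavioral classifier could be trained on. Postures extracted from a simulated worm could be passed through the same `project` call, so that real and simulated movement are compared in one coordinate system.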
And a lot of what I'm going to show you today about our approach is really about infrastructure building and setting the stage for this. We're still in the middle of that process. But this is a very nice constraint for us: we can take, essentially, snapshots of a simulated worm, feed them into a classifier like this, and give ourselves a metric that we can use. So we always know if we're doing better, if we're getting closer to simulating the worm or not. This is a very nice feature.
OK. I said that this is a cell-by-cell description. In a very crude feed-forward model of the way the nervous system works, we know that we have a flow from sensory neurons to motor neurons that eventually impact muscles. And of course, the whole nervous system is in service of creating motor behavior. That impacts a body, which then bounces around in a world, and that ultimately results in sensory feedback that comes back in. So it's very tantalizing, for an organism of this size that we understand as well as we do, to try to close this loop entirely in simulation, for the purpose of understanding not just whether the behavior of some simulated squiggles looks like the squiggles of the real thing, but whether the cells are, in fact, doing approximately what they should be doing in this scenario.
We've presented work in the past on reproducing the C. elegans nervous system in silico. We've had some posters on this before. Basically, this is showing a walkthrough of combining the cellular anatomy of the neurons, in their positions within the anatomy of the worm, with its connectivity and some knowledge about neurotransmitters, doing all of this using NeuroML, an open language, and posting it all online. And this is actually even available to click through.
This movie here is drawn from an online site, Open Source Brain, where you can go through and click and see this as well. So we have the start of a nervous system, certainly not finished, certainly not complete. We don't claim that it's yet operating or functioning as well as we'd like it to. But it's a place for the bazaar to take over.
Now, frequently, one of the concerns or criticisms about C. elegans neuroscience is that it's been hard to do electrophysiology, and that has been true classically. But recently, with advances in optical imaging (shown here is some calcium imaging of a neuron in the head of the worm), we're getting a lot better at being able to see the dynamics of neurons. And with the advent of new voltage-sensitive dyes as well, this is only going to get better. Just a few months ago, there was a Nature Methods paper that showed calcium imaging of the entire worm's body as it was moving around. So there are very exciting developments for us as we make better and better models and have better constraints here.
OK, so that needs to connect to a body. That was on the other side of the screen. We've made an effort in the project to break down the physical anatomy of the worm's body into a particle-based system, where we're able to describe the soft and elastic tissue of the worm in a way that allows us to build virtual muscles and have them pull on a body. So this is what that looks like. In December of last year, we were able to do a kind of first boot-up of this physical side of the system. Now, this is just a simple sinusoidal input going into the muscle cells of the C. elegans. But what it's doing here is sweeping away liquid in the back. As you can see here, a thin sheet of those blue particles is a sort of liquid gel.
And that is enough to reproduce at least a minimal amount of swimming and crawling behavior that starts to approximate what we can see in the real worm. With this kind of setup, we're now poised not only to close the loop of connecting the nervous system to the body, but also to close a loop with the data sets: to extract the skeleton from here the same way that we can with the real worms, compare, and make sure that we're getting an accurate metric for improvement.
OK, so now switching over a little bit to the open science and social media side of the talk. This book here, Reinventing Discovery, I'd really recommend to anyone curious about open science. It takes some of these ideas of open collaboration and applies them to the scientific realm.
On the open science side, OpenWorm is an international open science community. Here are some at-a-glance statistics of places where we're active and things that we have. We do a lot of our work on GitHub, so we really are putting our code out in the public sphere for anybody to see. There are a bunch of different repositories, and there are a lot of different things going on within the project all the time. We also host our meetings online using streaming tools like Google Hangouts. I think our total collection is now about 100 different meetings, not only of our own development, so that other folks can see what's happening in the community, but also journal clubs. You'll hear from Shreejoy and Rick, who have both done OpenWorm journal clubs in the past. Those then get archived for all time so folks can see what's happening in that community.
And then a couple of vignettes. There's a lot going on in the community, but there are two things that I wanted to highlight that are especially relevant to us here. INCF graciously sponsored students from the Google Summer of Code project.
This is where Google funds what are essentially internships for students from around the world to work on open source projects. We worked with two students this summer who successfully completed projects, and they're listed up here: one working on Geppetto, which I'll talk about later, and the other working on a project related to the semantic side of understanding C. elegans anatomy. That's one way in which we've been, I think, increasing the group of people coming into this area. People who otherwise would not have interacted with computational neuroscience or neuroinformatics now have a way to come in because the code is open.
A second thing that we're excited about is that we used the crowdfunding platform Kickstarter to reach about 800 people who backed us. Some of you are actually here in the room. If you are, thank you. Here the goal was to take the YouTube video versions of what you see and create an interactive application on the web that lets everybody play with the model without installing anything. One of the things we realized is that even just putting the code out still doesn't let people interact with applications. So we wanted to ask the broader community if they'd be willing to sponsor us to put that up in a more robust way, host it on cloud services, and let everybody have access to it that way. That's currently in production and should be released next June.
OK, and so switching over to the open source software engineering side of things, which is the third part of the talk. When we started the project back in 2011, it was a nice collaboration between folks coming from the scientific community and folks coming from the software engineering community who maybe didn't know so much about biology.
But as they got to understand the problem, they understood more and more the thing that we've seen repeatedly over the course of this Congress and that is a very common theme: multiple scales, multiple algorithms, and multiple timescales. And they set about building a code base, a platform, that could address that. This has started to take on a bit of a life of its own, which is why my slides change color here, as a project that we're now calling Geppetto. It is a sub-project underneath OpenWorm, but it is also the source of other collaborations with other folks, because the idea seemed to catch on and become popular even outside of neuroscience and outside of OpenWorm.
The idea is that it is a web-based simulator that on the front end has advantages like allowing URLs to link to models. Here's an example of that, where we're running a simple Hodgkin-Huxley model, but it's being done on the web. You can check this out on your own laptops at live.geppetto.org. It's also able to run different algorithms at the same time. So you can click through and see, for example, simple examples of that particle-based simulator that's able to simulate the physics, and combine them all together in a single platform.
So why should you care? Access to simulators in a browser can aid collaboration and let lots of folks see what's going on. It allows you to visualize your models without installing any software. And it allows you to work with complex systems, as I said. But what I think is even more important about this, and the reason I wanted to highlight it here, is more than what that particular application is doing, because, again, it's just one of many code bases that we're working on under OpenWorm. What matters is the way it's being developed.
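For readers who have not seen one, the "simple Hodgkin-Huxley model" mentioned above can be sketched in a few lines. This is a generic textbook version with standard squid-axon parameters, integrated with forward Euler. It is not Geppetto's code and not a C. elegans neuron model, and the function names and the current-step protocol are assumptions made for illustration.

```python
import math

# Standard textbook squid-axon parameters (not C. elegans values).
C_M = 1.0                               # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3       # maximal conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387   # reversal potentials, mV


def _ratio(x, y):
    """x / (1 - exp(-x / y)), using the limit y when x is near zero."""
    if abs(x) < 1e-7:
        return y
    return x / (1.0 - math.exp(-x / y))


def rates(v):
    """Opening (alpha) and closing (beta) rates of gates m, h, n at v mV."""
    return {
        "m": (0.1 * _ratio(v + 40.0, 10.0),
              4.0 * math.exp(-(v + 65.0) / 18.0)),
        "h": (0.07 * math.exp(-(v + 65.0) / 20.0),
              1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))),
        "n": (0.01 * _ratio(v + 55.0, 10.0),
              0.125 * math.exp(-(v + 65.0) / 80.0)),
    }


def simulate(i_amp=10.0, t_stop=50.0, dt=0.01, v0=-65.0):
    """Forward-Euler integration; returns membrane voltages in mV."""
    v = v0
    # Start each gate at its steady state for the initial voltage.
    gates = {k: a / (a + b) for k, (a, b) in rates(v).items()}
    trace = []
    for step in range(round(t_stop / dt)):
        t = step * dt
        i_ext = i_amp if 5.0 <= t < 45.0 else 0.0   # current step, uA/cm^2
        m, h, n = gates["m"], gates["h"], gates["n"]
        i_ion = (G_NA * m**3 * h * (v - E_NA)
                 + G_K * n**4 * (v - E_K)
                 + G_L * (v - E_L))
        # Update gates and voltage from the values at the start of the step.
        for k, (a, b) in rates(v).items():
            gates[k] += dt * (a * (1.0 - gates[k]) - b * gates[k])
        v += dt * (i_ext - i_ion) / C_M
        trace.append(v)
    return trace


trace = simulate()
print("peak membrane potential (mV):", max(trace))
```

Forward Euler with a small fixed step is used only to keep the sketch short; a production simulator would normally use a more robust integration scheme.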
The aim is to create this bazaar approach (not bizarre, though maybe it is a little bizarre) to open science. One part is that all of the code bases, as much as possible, are under regular tests. That means that every time a new commit or line of code is checked in to the system, it reruns a whole battery of tests, and the running of those tests is also public. Everybody can see when our tests pass, and everybody can see when our tests fail. That's very important for ensuring reliability.
The other part is this, which is making not just the code open, but the process of developing the code open. This is an example of a Kanban board, where each one of those little cards that you see captures either a bug, a feature request, or an enhancement that somebody coming to this platform wants to build into it. And through regular meetings every couple of weeks and through monthly releases, where every month a new version of this code is put out, the whole process of putting this together is much more distributed, much more collaborative, and much more the bazaar than the cathedral.
So that's where I want to leave you: thinking about whether what we're doing in computational neuroscience or neuroinformatics is a bit more of the cathedral, which, remember, still described something where content was being released to the public, and what the ways are that we can start to make it more decentralized, more of a bazaar.
So hopefully what has come across today is our approach, all the way from using real behavioral data, trying to close the loop of the nervous system, and attaching that to a physical model of the worm body, to the open science approach, where we're putting everything up on GitHub, with Google Summer of Code and Kickstarter as examples, and finally carrying that forward with an open source development project, where we featured Geppetto and an open development process on GitHub. So those are the folks involved in the project. It is the work of many folks, and you can find out more about the project at openworm.org. Thank you.
OK, so we can take a few questions. If we focus on the talks for now, then, as I say, we can discuss the more general philosophical points, maybe at the end. Any specific questions for Stephen?
I was wondering how many people you were at the very beginning of the OpenWorm project, because I think there are many good projects out there that simply lack the critical mass to get the visibility and the kind of power that you need to maintain a project like this. So how did the OpenWorm initiative start, and how did you manage to get such a big community?
It began with four people who met on Skype on a regular basis. It started by connecting. I think the most important thing that we did at the beginning was that we started with people who already had a burning interest in trying to do this simulation of C. elegans, rather than finding people who had no interest in that and trying to get them to work together. So there was a nice confluence of interest there, from the folks who were building the physical models, the folks on the neuroscience side, and the folks on the software engineering side.
And then we followed best practices in building open source projects, like putting a website up and building a mailing list, and grew it organically from there.
So first I must say I'm very, very impressed that you've managed to build this kind of open community around this. And as I understand it, whether you're a researcher or a student or a non-scientist, if you can contribute, you're welcome.
That's right.
Have you had any sort of negative pushback from more senior researchers, or from people's bosses, or from funding people, saying that you shouldn't be this open, that you should keep the data close? Have you had any problems like that?
So far, so good. We haven't had negative pushback. We've gotten one publication out. There's a second one that's currently in review. That hasn't yet been a concern. We have a few more that are in the pipeline. So far no, but obviously I'd welcome any suggestions or tweaks.
Quick question. First of all, what are the entry points for people who are interested in this? For example, if you have someone who sees this open project and wishes to contribute, how do they get involved? I mean, it's good that it's all open and whatnot, but there need to be organized entry points for people with different skill sets. So how does that process work? And then my other question is, when you go to publish, how many of those names do you have to put as authors?
Yeah, good question. So for entry points, there's the website, and depending on people's interests, they come in in a variety of ways. Often I'll ask them to schedule a time with me so I can understand what they're interested in and funnel them into a part of the project where they're likely to succeed to start off with.
But other, more decentralized ways that folks can get in are just by looking at the milestones and the issues that we post on GitHub, because we have all of our little mini projects posted there. So folks can come in and see, well hey, maybe I can help out with this piece of the project. We also do a lot to put documentation up so that folks can read, digest, and understand it, and we're constantly trying to make that better. So docs.openworm.org, for example, has a whole list explaining our modeling process and all that. Those are some entry points, but I think often the most effective is just getting that person synced up with another sub-project that's already going on so they start talking to people. And the second part of your question was, what was the second part of your question?
Co-authors.
Yes. So this has been an interesting thing. As manuscripts get developed, often what we say is that in order to qualify as an author, you need to make some substantial modification to the paper itself. You need to have seen the paper, you need to know that the paper is happening, and you need to have participated in its writing in some way, and that'll get you on the author line. Otherwise, if you've contributed some code that's related, that'll be an acknowledgement, so we'll put you in the acknowledgements section. So far that's worked out all right.
Yeah, about three or four years ago, there was a talk at SFN about how difficult it was to actually model C. elegans because the circuitry would change depending upon the hormones. So I'm wondering to what extent the model of C. elegans in OpenWorm is not just one model, but maybe a variety of models under different conditions, and how you incorporate new scientific data into the information that needs to be modeled.
So you're talking about neuropeptides and the challenge of dealing with that.
And I don't actually know. I think this is a place where the field as a whole is limited. There are not a lot of published computational models, for example, that deal with peptides anywhere, I think. There may be a handful of them. So we want to incorporate those into the space as well. But on the broader question of data, we do work with declarative descriptions of models like NeuroML, and we've been working to have a cleaner pipeline from data to models, so that you can have models that are generated from data. Then, once the data are improved or facts are added, that drives right into, say, a NeuroML model, or into a different model of physics, and that kind of thing. We're really trying to automate that whole process. But yeah, I'd say we are not any further ahead than the field as a whole in actually doing the modeling. We're trying to take the best of what's happening in the field and expose it so that we bring more people into it.
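The data-to-model pipeline described in that last answer can be pictured with a small sketch. This is an illustration of the general idea only, not OpenWorm's tooling. The connectivity table is made up, and the XML element and attribute names (`network`, `cell`, `connection`) are hypothetical placeholders that do not follow the real NeuroML schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up connectivity table; cell names and counts are placeholders.
CONNECTIONS_CSV = """pre,post,kind,count
cellA,cellB,chemical,3
cellB,cellC,gap_junction,1
"""


def build_network(csv_text):
    """Turn connectivity rows into a declarative XML description.

    Element and attribute names here are illustrative only and do not
    follow the real NeuroML schema.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    root = ET.Element("network")
    # One <cell> element per distinct neuron named in the table.
    for name in sorted({r["pre"] for r in rows} | {r["post"] for r in rows}):
        ET.SubElement(root, "cell", id=name)
    # One <connection> element per row of the table.
    for r in rows:
        ET.SubElement(root, "connection", pre=r["pre"], post=r["post"],
                      kind=r["kind"], count=r["count"])
    return root


model = build_network(CONNECTIONS_CSV)
print(ET.tostring(model, encoding="unicode"))

# When the data improve, regenerating the model is one function call.
updated = build_network(CONNECTIONS_CSV + "cellC,cellA,chemical,2\n")
```

Because the model is derived from the table rather than edited by hand, adding or correcting a row and rerunning `build_network` is the whole update step, which is the property the answer above is after.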