So I wanted to start off today with a little poetry. This poem was written by a very famous American poet, and it turns out that she went to my high school, so I feel a particular attachment to it. She wrote something in 1939 that I think was rather prescient: "Upon this gifted age, in its dark hour, rains from the sky a meteoric shower of facts. They lie unquestioned, uncombined. Wisdom enough to leech us of our ill is daily spun, but there exists no loom to weave it into fabric." And I think this is really one of the great challenges of the coming decade: to build that loom. And by a loom, I don't mean just the technology, just the machinery, but also the societal, the cultural, the human aspects of how we bring these facts together. How do we weave the measurements, the daily work of neuroscientists, into a coherent picture?

As you know, in neuroscience we study very many different brains: zebrafish, drosophila, mouse, cat, rat, monkey, human. And the challenge is that we need to learn from each of those systems in order to build an understanding of the human brain. It's really that that excites us, right? We're interested in what drives the uniquely human aspects: what underlies our creativity, our passion to discover, to go to the moon, to entertain, and what underlies our emotions.

At the same time, there is a broad range of disorders and diseases that afflict the nervous system and therefore affect us, affect who we are. And the challenge we face today is that we simply don't understand how any of these diseases actually manifests at the cellular level or the systems level. What is happening to the brain in many of these disorders? We simply don't understand how to treat them. The treatments that we have found, we don't understand how they work, and seventy percent of those treatments don't work in fifty percent of people. The pharma companies are leaving; they're giving up on the brain pretty much wholesale. It's too complex, too risky, too expensive. On top of that, two studies came out showing that in the biomedical fields, seventy-five percent of the big findings (these are the discoveries that people get tenure for, the Science and Nature publications that the pharma companies would take and go work on) were not reproducible. So we've got some challenges here, some pretty fundamental challenges, to building this loom.

We're also at a bit of a special time in science. The last thousand years have really been about science being empirical: collecting observations, describing them, trying to derive insight from them. In the last few hundred years the theoretical branch emerged, building mathematical models, and in the last few decades the computational branch emerged to facilitate running simulations and analyzing complex models. And today, pretty much all domains of science are undergoing a transformation to what's called data-intensive science, or e-science, where they take advantage of information technology to manage data, to process it, to analyze it, to build models, to run simulations. This is happening broadly across all of science.
And our challenge at INCF is to help transform neuroscience into an e-science, a science that can use information technology to facilitate multi-scale data integration: taking data from the molecular level, the neuronal level, the synaptic circuit, the macro-circuit, the whole brain, cognition, behavior, and even the clinic, and facilitating the integration of all that data. Now, we're not going to solve all of that at once, but what we want to do, again, is build the loom that can help scientists put these pieces together.

It's not a new problem. Thirteenth-century Buddhists wrote a story about this: the parable of the blind monks and the elephant. Each monk came to the elephant and could describe just a piece of what he encountered. One found the tail and described it as a rope. The trunk was described as a python, the tusk as a saber. Each of them, encountering just a piece of the elephant and describing it in his own terms, meant that none of them together could see that it was an elephant. Now, in neuroscience we're not that badly off, we know there's a brain there, but we're not putting the pieces together.

And we're measuring more and more data all the time, at many different scales, from the subcellular to the cellular to tissue to whole brain, and in many different modalities, protocols, and methods. We're capturing so much data, and it's growing all the time. In fact, there are large-scale initiatives planning to gather even more. We're entering a new decade of the brain. You may know that the 1990s were declared the Decade of the Brain in the United States, and in Japan it was the launch of the Century of the Brain, so I think they understood a little better the scale of this. But now we've had the Allen Institute commit virtually a billion dollars over the next ten years; the Human Brain Project was just awarded about a billion euros over the next ten years; and the BRAIN Initiative in the United States was just announced, which will be at least 100 million dollars for 2014 and presumably much more over the coming decade. And we had the president of a country stand up and actually talk about hundreds of billions of neurons and trillions of synapses, and point out that we can identify galaxies light years away and study particles smaller than an atom, but we have not unlocked the mystery of the three pounds of matter, the one and a half kilos, that sits between our ears.

We also have to recognize that no one group can do this alone. The U.S. is not going to solve understanding the brain on its own; the Human Brain Project and the European efforts are not going to solve it on their own. This must be a global, collaborative effort.

So what does data integration mean, bringing this data together and weaving it? One example is building a model neuron: taking morphological data, gene expression data, and electrophysiology, building a model that can predict the electrical behavior of a neuron, and running simulations to analyze it.
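To give a flavor of what the simplest such model looks like in practice, here is a minimal leaky integrate-and-fire neuron in Python. This is a toy stand-in for the far richer, data-constrained models just described, and every parameter value here is illustrative rather than fit to data.

```python
import numpy as np

# Toy leaky integrate-and-fire neuron: a minimal stand-in for a
# data-constrained model neuron. All parameter values are illustrative.
dt       = 0.1        # time step (ms)
tau_m    = 20.0       # membrane time constant (ms)
v_rest   = -65.0      # resting potential (mV)
v_thresh = -50.0      # spike threshold (mV)
v_reset  = -65.0      # reset potential after a spike (mV)
r_m      = 10.0       # membrane resistance (MOhm)

t = np.arange(0, 200, dt)           # 200 ms of simulated time
i_inj = np.where(t > 50, 1.8, 0.0)  # step current of 1.8 nA after 50 ms

v = np.full(t.shape, v_rest)
spikes = []
for k in range(1, len(t)):
    # Integrate dV/dt = (-(V - V_rest) + R*I) / tau_m with forward Euler
    dv = (-(v[k-1] - v_rest) + r_m * i_inj[k-1]) / tau_m
    v[k] = v[k-1] + dv * dt
    if v[k] >= v_thresh:            # threshold crossing -> emit a spike
        spikes.append(t[k])
        v[k] = v_reset

print(f"{len(spikes)} spikes; first at {spikes[0]:.1f} ms" if spikes else "no spikes")
```

In a real workflow the parameters here would of course be constrained by the measured morphology, gene expression, and electrophysiology rather than chosen by hand.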
Another example is modeling microcircuits: taking information about the cell types within a volume of the brain, their distributions, their electrophysiology and morphology, and also synaptic connectivity and dynamics, and using that to predict network dynamics, understanding fundamentally the electrical basis of neural microcircuits and whole-brain circuitry. Now, with the data that's coming at whole-brain scale, it may be possible to integrate data across these many levels at the scale of the whole brain, in order to start developing some insight into whole-brain dynamics, and of course this is the goal of some projects.

Another aspect is multimodal brain atlasing. If you can bring together different scales of measurement, including structure at various levels of detail and the functional characterization of whole brains, into a common space, you can analyze that and come up with a new way of characterizing the organization of the brain, one that isn't just based on traditional anatomical boundaries due to cell density or other landmarks, but one that really integrates both structural and functional aspects. (A small sketch of what registering data into a common coordinate space looks like appears below.)

So INCF was launched with the idea that we need an organization that can act globally to facilitate this integration. It was the Global Science Forum of the OECD that launched this discussion and came up with the plans to create INCF. In 2005 the plans were endorsed, and on August 1st, 2005, INCF was launched. The mission is to coordinate and foster international activities in neuroinformatics; to contribute to the development and maintenance of databases and infrastructure to support neuroscience; to make everything that we build or contract freely available; and to work closely with government, academia, and industry to make sure that this is as fruitful as possible for the advancement of neuroscience. Some of our activities are to promote best practices in neuroscience in relation to informatics and to advocate open access and data publication. We work on the scientific, technical, and sociological issues around data sharing and integration, and we work with government policy makers, funders, publishers, and scientists to promote data sharing, collaboration, and open access.

The secretariat, as I mentioned and as Stan mentioned, is hosted here at Karolinska. There are fifteen staff currently, responsible for coordinating the scientific programs and the node activities for each of the member countries, managing the infrastructure developments, and organizing an annual neuroinformatics congress. We actually have many of them present today; if the INCF staff can stand up, you can put some faces to names. You see, it's over half the audience. They're here, you'll see them around campus, so feel free to talk to them and ask them questions as well. There are currently seventeen INCF member countries, including Japan, South Korea, India, most of Europe, and the United States, and we just recently had Victoria, Australia join. So not quite the full continent there, but we're planning on getting the rest of Australia by 2015.
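Coming back to the multimodal atlasing idea from a moment ago, here is the promised sketch of registering point data into a common reference space with an affine transform; the matrix values are invented for illustration, not taken from any published atlas registration.

```python
import numpy as np

# Toy affine mapping from a lab's native image coordinates into a shared
# reference space (e.g. a standardized atlas space). The 4x4 matrix is
# invented for illustration; real registrations are estimated from
# landmarks or image intensities.
affine = np.array([
    [0.05, 0.00, 0.00, -5.70],   # scale + translate x (voxels -> mm)
    [0.00, 0.05, 0.00, -4.00],
    [0.00, 0.00, 0.05, -3.00],
    [0.00, 0.00, 0.00,  1.00],
])

def to_reference(points_vox: np.ndarray) -> np.ndarray:
    """Map Nx3 voxel coordinates to Nx3 reference-space coordinates (mm)."""
    homo = np.hstack([points_vox, np.ones((len(points_vox), 1))])
    return (affine @ homo.T).T[:, :3]

# Two recorded cell positions in native voxel coordinates
cells_vox = np.array([[120, 80, 60],
                      [200, 90, 75]])
print(to_reference(cells_vox))
```

Once two data sets are expressed in the same reference space, spatial queries and comparisons between them become straightforward; that is the point of a shared standard.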
So the INCF programs are really a core part of the process of engaging with the community, and this is how our activities are determined. We hold community workshops, which can be proposed by members of the community. Those workshops bring together expert scientists from around the world (they don't have to be within a node; anybody from around the world on a particular topic whom the organizers think can contribute) to define specific activities and make specific recommendations to the governing board on what products, services, standards, and guidelines are needed to attack specific problems. They bring those recommendations to the governing board, and the governing board makes a decision about allocating funding to those activities. It also appoints an oversight committee of expert scientists to oversee the deployment of task forces, which again bring in scientists from the community to work on particular problems and challenges. And if they're unable to achieve that work through volunteer effort alone, we can issue targeted contracts to get the work done quickly, the idea being to really accelerate these developments.

We currently have four programs: digital brain atlasing, multi-scale modeling, ontologies of neural structures, and standards for data sharing. I'll go very quickly through some of the products that have been developed and made available from these programs.

For example, the Scalable Brain Atlas: this is a service that you can embed in your website to provide images of brain regions, and it allows you to hook in services to look up information. The Waxholm Space is a standardized coordinate system for the mouse brain: if you register your data to it, you can use the digital atlasing infrastructure to make queries to other atlases and other systems like the Allen Institute's, and do transformations between Paxinos and Watson and the Allen standard atlas, for example. So it gives an infrastructure for doing spatial integration.

In the program on ontologies of neural structures, Gordon Shepherd has led an effort to bring together forty scientists from around the world to define neuronal classes and properties and register them in a public online wiki. There is also work on defining pan-mammalian reference structures, the delineations and definitions of brain regions for the Waxholm Space, and a common upper mammalian brain ontology.

The program on multi-scale modeling has developed standards for describing mathematical models of neurons, synapses, and networks in a way that is independent of the simulator used to run them, which gives you a tool for exchanging one common model description between different simulators. The connection set algebra is a way of describing connectivity in large-scale neural networks (I'll show a small sketch of the idea in a moment). The multi-simulation coordinator (MUSIC) is an API and library that allows coupling, at runtime, of different simulators specialized for different types of simulation; they can communicate while the simulations are running and exchange information to coordinate them. And there's a computational neuroscience ontology for naming and categorizing classes of models.
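To convey the flavor of the connection set algebra idea, here is a hand-rolled sketch, not the actual CSA library's API: a connection set is treated as a predicate over (source, target) index pairs, and algebraic operators combine simple sets into more complex connectivity.

```python
import random

# Hand-rolled sketch of the connection-set-algebra idea (not the real
# library's API): a connection set is a predicate over (source, target)
# index pairs, and operators build new sets from old ones.
def all_to_all(i, j):   return True
def one_to_one(i, j):   return i == j
def exclude_self(cs):   return lambda i, j: i != j and cs(i, j)
def intersect(a, b):    return lambda i, j: a(i, j) and b(i, j)

def random_set(p, seed=42):
    """Connection exists with probability p; cached so answers are stable."""
    rng, cache = random.Random(seed), {}
    def cs(i, j):
        if (i, j) not in cache:
            cache[(i, j)] = rng.random() < p
        return cache[(i, j)]
    return cs

def realize(cs, n_src, n_tgt):
    """Enumerate the concrete connections a connection set describes."""
    return [(i, j) for i in range(n_src) for j in range(n_tgt) if cs(i, j)]

# Random connectivity at 10% probability, with self-connections removed
net = exclude_self(random_set(0.1))
print(realize(net, 10, 10))
```

The appeal of the algebraic style is that the same compact expression describes the connectivity whether the network has ten neurons or ten million; the connections are only instantiated when needed.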
The program on standards for data sharing has two task forces, an electrophysiology task force and a neuroimaging task force. Both are concerned with all the issues of metadata and data formats, but also, fundamentally, with provenance tracking and reproducible results. How do you ensure that enough information is captured at data acquisition, and then along every step of your analysis, so that by the time you come to a figure in a publication you can trace back through that whole process and know exactly how to reproduce it? (I'll show a minimal sketch of what such a provenance chain might look like a little further on.)

We also have training activities. We've had a hackathon with the Allen Institute, and courses on statistical modeling of neuronal data and on modeling across scales of analysis. This year we'll have a short course in Havana, Cuba on neurogenomics and brain disease, and also on multi-scale integration of imaging data. We also have a very active project with the Google Summer of Code: we help match projects to students, and Google pays for it all, so it's a very nice collaboration with Google.

The question is how we approach integrating neuroscience data, because I think that if we share data alone, without thought to its integration, it's not very interesting. I can hand you a hard drive of data, but you won't know what to do with it, and no meaning will be derived from it until you can relate it to your data. We want to integrate this data, and we want to build a culture that can do this, so that we can ask new questions, use the data to teach, replicate results, analyze, visualize, model, simulate, understand the data, and build a new model for publishing.

To start along this path, we've been building a collaborative research infrastructure for neuroscience: to enable sharing of data, publishing data, preserving it, federating it (making multiple sources of data accessible through a common interface), and searching over that data. We want to provide the capabilities of Google, but for neuroscience data, with ease of use, making sure it can integrate into a neuroscientist's lab, making the data accessible for analytics, and also supporting the building of data-analysis workflows, provenance tracking, and so forth. The first step in this is an international federated data space; I'll explain what that is in a moment. The next step is a set of services built on top of that, including a semantic community encyclopedia for standardized vocabulary, digital atlasing services using the Waxholm space and registering data to standardized coordinate spaces, and data access, query, and provenance layers, that is, services for accessing the data. On top of that, the community can build workflows of various types to visualize, analyze, model, and simulate, and specific applications for each subdomain of neuroscience. In neuroimaging, for example, one could build a specific application for searching for brain images and doing analysis on top of that.

In general, data sharing is difficult, and we get a lot of questions. Where do I put my data? A number of funding agencies are now requesting that you have a data-sharing plan or that you make your data available; well, how do you do that? It's not always obvious. How can I share data in a collaboration? I don't want to make it public yet, but I want another group to have my data set, and it doesn't work easily in Dropbox. How do I make that data available? Where can I back up my data if I want an off-site backup, maybe with a collaborator or somewhere else in the world? And also, what's the place you go to? If somebody says there's data out there, data shared on some topic, where do you go? Where do you look?
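Before we come to how we address those questions, here is the minimal provenance sketch promised earlier: each derived artifact records the step, parameters, and inputs that produced it, plus a content hash, so a published figure can be walked back to the raw acquisition. The structure here is invented for illustration; real systems would build on richer standards such as W3C PROV.

```python
import hashlib, json
from dataclasses import dataclass, field

# Minimal illustration of provenance tracking: every derived artifact
# records what produced it, so a published figure can be traced back to
# the raw acquisition. Invented for illustration only.
@dataclass
class Artifact:
    name: str
    content: bytes
    step: str = "acquisition"        # the operation that produced it
    params: dict = field(default_factory=dict)
    parents: list = field(default_factory=list)   # upstream Artifacts

    @property
    def digest(self):                # content hash for integrity checking
        return hashlib.sha256(self.content).hexdigest()[:12]

    def lineage(self, depth=0):
        print("  " * depth + f"{self.name} [{self.step} {json.dumps(self.params)}] {self.digest}")
        for p in self.parents:
            p.lineage(depth + 1)

raw    = Artifact("session01.dat", b"...raw traces...")
filt   = Artifact("filtered.dat", b"...filtered...", "bandpass",
                  {"low_hz": 300, "high_hz": 3000}, [raw])
figure = Artifact("figure3.png", b"...png bytes...", "raster_plot",
                  {"bin_ms": 5}, [filt])
figure.lineage()   # walk back from the figure to the raw data
```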
So we wanted to make all of that easier. We're modeling it on the idea of a Dropbox for scientists: you should be able to drag and drop any type of data, text, or images, and you have control over access to that data. So we held a workshop (this is part of that whole INCF process) that brought together big-data scientists: people from CERN, from synchrotrons, from the Allen Institute, from all over, who already had experience dealing with these issues. And they came up with a specific set of recommendations that told us how to move forward and build the first step in an infrastructure for data sharing. We call it the INCF Data Space. It allows you to make your data public, to collaborate, to make backups, and it gives you the whole infrastructure for doing that.

The first use case is basically that you want to take a collection of electrophysiology traces, images, maybe a cell reconstruction, maybe a model, a folder's worth of data; you drag and drop it and you can make it public worldwide. Another use case is collaboration: say one site is generating large amounts of data, big data, and you want to replicate it to another site. The data space gives you the infrastructure to do that, at high speed, in parallel, and internationally. So for our Japanese node, wanting to make sure their infrastructure is replicated to another continent should there be another earthquake or nuclear disaster; or the European mirror of the Allen Institute data that we provide, where we can use the infrastructure for replicating the data between those sites. And in Europe particularly, there are a number of national infrastructures that provide storage to the scientific community, but there's generally no easy way to access them; if we get them hooked up to the data space, we can easily provide access to those resources.

This is what it looks like when you log in through a web browser: you log in, and you see the different machines that are around the world. What this is doing is hooking up data resources, data repositories from all around the world, to a common interface. We're not bringing all the data together in one central location; we're making it all accessible as though it's one big virtual hard drive, and you can drag and drop your data into or out of the browser. The web is one way to access it; there's also a Java client, which gives you high-speed parallel transfers, and for advanced users there's a strong command-line interface for accessing all the properties and replicating and synchronizing data.

This is built on existing technology; we basically contracted to get it all assembled and refined for our needs. We deploy it on the Amazon cloud, so the metadata for where the data and the servers are is replicated in four zones around the world. That makes it very fast, with low latency for searches and queries, before you get routed to the specific data servers, wherever you are in the world. And what's important is that at one level it gives us this Dropbox-like functionality, but fundamentally it gives us a whole layer to manage all of the future requirements: very large-scale data management, workflow tracking, provenance tracking, arbitrary metadata annotation, all of the things we need for the next generation of applications.
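As a cartoon of that "one big virtual hard drive" idea, the sketch below maps each logical path to replicas on physical servers around the world; the paths, host names, and functions are all invented for illustration and are not the data space's actual interface.

```python
# Cartoon of a federated data space: one logical namespace, with each
# logical path mapped to replicas on physical servers around the world.
# Paths, hosts, and this API are invented for illustration only.
catalog = {
    "/dataspace/lab-a/ephys/session01": ["server-eu.example.org",
                                         "server-jp.example.org"],
    "/dataspace/lab-b/atlas/volume.nii": ["server-us.example.org"],
}

def replicate(path: str, target: str) -> None:
    """Register (and, in a real system, copy) a replica at another site."""
    replicas = catalog.setdefault(path, [])
    if target not in replicas:
        replicas.append(target)

def resolve(path: str, near: str) -> str:
    """Pick a replica, preferring one in the caller's region."""
    replicas = catalog[path]
    regional = [r for r in replicas if near in r]
    return (regional or replicas)[0]

# Mirror a Japanese data set to the US, then access data from either region
replicate("/dataspace/lab-a/ephys/session01", "server-us.example.org")
print(resolve("/dataspace/lab-a/ephys/session01", "jp"))
print(resolve("/dataspace/lab-b/atlas/volume.nii", "eu"))
```

The design point is that users see one namespace while the bytes stay where they were produced, which is what makes cross-continental mirroring and disaster recovery a catalog operation rather than a migration project.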
But we first want to give scientists an easy-to-use infrastructure for sharing their data. So there are a variety of ways to join the data space. First, we provide five gigabytes of storage for free for those of you who have data sets to make available. If you already have an individual data server running, we provide software packages that install onto it and hook it up to the Amazon cloud to make that data available. If you're a larger organization with your own set of users to maintain, you can create your own zone in the data space, and we support you in doing that so you can manage both federation users and your own private users. You can also install a server that runs solely in the Amazon cloud, if you're willing to pay the cost. Raphael Ritz is here, so feel free to contact him for further information. We also have a competition going on right now: you can win $10,000 (American dollars, but that's sort of close to euros). Give us a good, shining example of a large-scale, powerful, reusable data set, data that others can access and use productively, let us make it a publicized case, and we'll give you $10,000.

As for next steps for the data space, we're working to build in even more capabilities: making it even easier to use, even more Dropbox-like, with integrated document previews. One of the future things we can do is add plug-ins so you can visualize, for example, your brain images remotely, right in the browser, so you don't have to copy the data out; we start to add services on top of this that are specific to different data types. Ultimately we'll also add capabilities to do analysis where the data sits: as the data gets larger and larger, you don't want to move it to analyze it; you want to run your service where the data sits. That's where we're headed. And of course we'll also support tagging with semantic tags, linked-data principles, community ratings, and collaborative analytics. We're an associated partner with EUDAT, a European project to build a data federation and a collaborative research infrastructure, and we're contributing our experience to their group as well as testing some of the services they've developed.

Another service we're developing is built on an existing site, Neurolex.org, which was developed at NIF, the Neuroscience Information Framework, in San Diego. It contains a large number of ontologies for neuroscience: standardized vocabulary, classification, and relationships for brain structures, diseases, and so forth. We're going to take this to the next generation: we'll build it on top of semantic wiki technologies and turn it into a community encyclopedia, which we call Knowledge Space, that will maintain living review articles. It will link to the latest data, models, and literature, and be a tool for maintaining these standardized vocabularies and keeping the community coming back to them. And those are the vocabularies that will be used to annotate the data in the data space, so that it's searchable.
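For instance, annotating a data set with such vocabulary terms might look like the sketch below, which uses the rdflib Python library to emit linked-data triples; the dataset URI and the vocabulary terms are placeholders, not real Knowledge Space or NeuroLex identifiers.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

# Sketch of semantic tagging with linked-data principles. The dataset URI
# and vocabulary terms are placeholders, not real Knowledge Space or
# NeuroLex identifiers.
VOCAB = Namespace("http://example.org/neuro-vocab#")
data  = URIRef("http://example.org/dataspace/lab-a/ephys/session01")

g = Graph()
g.bind("vocab", VOCAB)
g.add((data, RDF.type, VOCAB.ElectrophysiologyRecording))
g.add((data, VOCAB.brainRegion, VOCAB.Neocortex))
g.add((data, VOCAB.cellType, VOCAB.Layer5PyramidalCell))
g.add((data, VOCAB.species, Literal("Mus musculus")))
g.add((data, RDFS.label, Literal("Whole-cell recording, session 01")))

# Serialize as Turtle; triples like these are what make the data searchable
print(g.serialize(format="turtle"))
```

Because the tags are shared URIs rather than free text, a query for every recording from a given cell type can span data sets from labs that have never coordinated directly.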
We also have a program in what is an emerging area, clinical neuroinformatics. This came about through a partnership with an organization called One Mind for Research, based in the United States. It was launched by Patrick Kennedy, the nephew of John F. Kennedy, and it was conceived as a new moonshot, a moonshot for the brain: an effort to reduce the burden of brain disease and improve its treatment. They have a vision of building a neuroinformatics platform for a common, shared understanding of disease, one that's accessible not only to researchers but to patients, clinicians, technology partners, and patient advocates, with different views on the clinical data for these different audiences. And we have partnered with them specifically to coordinate the development of a platform for a large-scale study of traumatic brain injury that was just recently funded under FP7. This is a large-scale, multi-site, multi-country study: seven years, more than 5,000 patients. It's really about providing a standardized informatics platform for the clinical data acquisition, for managing the brain imaging, the biomarker data, the high-resolution ICU data, and so forth, so that it can all be brought together and analyzed. This will also be made open source and open access for other groups wanting to deploy similar infrastructure for their studies. And we'll use the data space, for example, to manage that data: the large-scale imaging data, biomarkers, genetics, and high-resolution ICU data.

As I mentioned before, the digital atlasing infrastructure enables spatial integration of neuroscience data. One of the things we see coming in the future is atlases that consist of large amounts of data that remain distributed around the world but are registered both spatially and semantically and become searchable, so you have an interactive atlas for a given brain, for data, models, and literature. These are some of the identifiers we see as useful for bringing that data together: aligned ontologies for describing the experimental protocol, the metadata, the animal, and the brain structures; space, so we have a common reference system; time, both in terms of development and in terms of behavioral and stimulus events; genetic identity, if you're looking at populations of genetic variants; and, for patients, a globally unique identifier, so that if a patient participates in one study and also in another, you have the possibility, without personally identifying that patient, of bringing the data together and understanding comorbidities, for example. (A sketch of one way such an identifier could work follows below.)

For data integration, what we really want to provide is the ability to link your data through standard object models, which give you the informatic ability to access many different types of data, but link them to ontology, to standardized names and standardized spatial locations. If you have a favorite neuron, say a layer 5 pyramidal cell, you've got a standardized name for it, the locations where these cells occur are described in standardized coordinate spaces, and that is all linked to various data types, including morphology, electrophysiology, ion channels, gene expression, protein distribution, and also models and literature.
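Returning for a moment to the patient identifiers just mentioned: one way such a privacy-preserving identifier could work, sketched below under invented assumptions, is to derive it from a keyed one-way hash of stable personal attributes, so that two studies holding the same key can link records without ever exchanging names. This is a simplification for illustration, not the scheme of any particular clinical platform.

```python
import hashlib, hmac

# Sketch of a privacy-preserving globally unique identifier: a keyed
# one-way hash of stable personal attributes. Two studies using the same
# key derive the same identifier for the same person without exchanging
# names. A simplification, not any real platform's actual scheme.
SHARED_KEY = b"consortium-secret-key"   # placeholder; held by a trusted party

def patient_guid(first: str, last: str, birthdate: str) -> str:
    normalized = f"{first.strip().lower()}|{last.strip().lower()}|{birthdate}"
    return hmac.new(SHARED_KEY, normalized.encode(), hashlib.sha256).hexdigest()[:16]

# The same person enrolled in two different studies yields the same GUID,
# so their records can be linked to study comorbidities.
print(patient_guid("Anna", "Svensson", "1980-02-17"))
print(patient_guid("anna", "SVENSSON", "1980-02-17"))   # identical output
```

The one-way property matters: from the identifier alone you cannot recover the name or birthdate, yet records from independent studies still line up.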
So this is what will happen after you get your data into the data space: you'll use the community encyclopedia and the tools for annotating and curating that data into entities that can then be shared, exchanged, and analyzed, where you can search for data that was generated according to a specific protocol on a particular species, and you can look at the community rating: how well was that data set annotated and curated by the community? That makes these pieces of the puzzle something you can bring together, analyze, integrate, visualize, and model. And we're using the latest W3C standards for describing semantic relationships and provenance, to make the data and its relationships to other data searchable, and to track provenance and analysis, as you saw with the task forces for neuroimaging.

Now, there's also a group we're working with, Sage Bionetworks, which is really in the clinical space. They've developed a platform for taking those objects I'm talking about, the curated data, and sharing not only the data but also the analyses done on them. You can share those with other groups within a social-network type of environment, and what they're seeing now is coalitions of collaborating scientists that take one model that's been developed, fork it, start building new models, and, as a collaborative community, produce new and even better results, more prognostic disease models. What's important is that this system keeps track of everybody who contributed at every step of the way, so the attribution will give you credit: if you developed one key analysis that sits in the middle of that chain, you're going to get credit for it. So it's a whole new way of thinking about working collaboratively, sharing, and getting credit, and we see this really growing as an opportunity for neuroscience.

Related to that is using the same type of system to produce reproducible digital publications. We see, within the next five years, the ability to go to a journal article and, for every figure, open it and get the full provenance chain: you can trace back the whole analysis to the original data and actually re-run that analysis, changing a little parameter in the analytics to see how much it affects the final outcome. This is coming. There's also a relationship with the Whole Brain Catalog at UCSD, which has built a system for taking these data, visualizing them in 3D, and navigating them, like a Google Earth for the brain.

Importantly, I see INCF really in a role of helping to link these large-scale brain initiatives. We're already working with One Mind for Research; we will be partners on the Human Brain Project, coordinating some of their infrastructure; we're already working closely with the Allen Institute; and we're already discussing with some of the leaders of the BRAIN Initiative how we'll be able to facilitate data sharing and access through our work on infrastructure and standard vocabularies. So this is really the future for INCF, and these are some of the partners that have helped us along the way to where we are now.

We also have an annual congress, and this year you're very fortunate: it's here on this campus, August 27th to 29th. We'd love to see you there, and we're going to have a special session on these large-scale brain initiatives, so it's really your opportunity to learn firsthand from Tom Insel of the National Institute of Mental Health on the Obama BRAIN Initiative, from Karlheinz Meier, a co-director of the EU Human Brain Project, and from Clay Reid of the Allen Institute on their work on Project MindScope.
So I would definitely welcome all of you to come. And since we're also being webcast, and some of you here are INCF community members, I want to make sure I emphasize our deep appreciation for all of the work, effort, and volunteer labor that the members of our community put into making INCF successful. We're very interested in learning from all of you about ways in which we can help and facilitate your work. So thank you; I'd be glad to take any questions. Don't be shy. We'll get you the microphone first, so the webcast people can hear you.

Q: It's a question about the information technology and how it can be used for robot technologies, in combination with brain interfaces and so on.

A: Right. By being able to bring the data together, build models, and learn some principles about the brain, those principles can be used in building neuromorphic technology, controllers for robots, and that type of thing. So there's potential there, and we do have some partners working in that space that we're glad to work with, but we don't have any specific program activities on that at this point.

Q: Could you develop a little bit more the relation to the Human Brain Project?

A: Sure. As many of you may or may not know, the Human Brain Project is really about building informatics platforms for managing and integrating data about the brain in order to build models and large-scale simulations. There are a number of different platforms among the key deliverables for that project. One of them is a neuroinformatics platform, and that includes things like being able to federate data and have standardized vocabularies, and we'll be working closely with them on such efforts to make sure they're integrated with community standards as well.

Q: And the ultimate aim?

A: The ultimate aim is really multi-fold. For the Human Brain Project there are three aims. One is, first of all, to integrate all of our knowledge about the human brain in a common place and be able to build models and simulations of it. Another is to develop new ways of understanding diseases and identifying ways of developing treatments. And the third is to extract principles from these models and simulations to derive new brain-like technologies, for neuromorphics and neurorobotics.

Q: And ultimately to understand the brain?

A: Ultimately, very much so. All of this is very much about understanding the brain, but in many different aspects: how the brain functions, what principles can be derived for computing, and also how to treat disease.

Q: The more or less standard way of dealing with data has been that you perform a meta-analysis of some field where experiments have been done, data collected, and so on. And this goes in two stages. First you have to collect the data, and you have to see whether the data are compatible with each other, use the same terminology, the same methodology, and so forth. I guess this is the stage where you're going to help, right? To collect data on this standardized kind of platform. Then the next step is: what do you do with this data? Are you going to make a model? Are you going to extract some kind of information? Is your help coming there as well?
A: Absolutely. First of all, at the level of being able to bring that data together and do the meta-analysis, having standards for the way the data is described, the way the brain structures are described, and the data formats is already going to help. But also to build models: if you have an infrastructure where, informatically, you can make a query and get the data back with sufficient annotations, you know you can run an analysis script on it, extract some features, and use those features to build a model; and then, when you build your model and run a simulation, you can ask what data is also available to compare the results against. So it's part of that whole process, and it also means managing those workflows so that everything is reproducible, so that you can share it with somebody else and they can re-run it easily.

Q: And to what extent is this kind of analysis protected? Because obviously you want to share, but not necessarily with the whole universe.

A: We take the position that you can decide completely whether you want to make it public or keep it private. We give you full control. Some people think we should force everybody to make it all open; we think we should give people the time to decide when. Other questions?

Q: It seems that there should be an advantage in having some standards for doing the experiments, so that results can be compared. Who will develop those?

A: I don't think many people are going to accept a standards committee sitting down to decide a standardized protocol. But what I do think is this: if a group makes their data available, annotated with the protocol they used, a description of that protocol, then if you want to build another data set that can integrate with theirs, you can follow the same protocol. And by being able to share many different data sets that can be integrated, you'll know which protocol to follow in order to build a compatible data set. I think that should only be done as long as it's useful, as long as it's valuable to keep integrating the data together. You'll still need innovation, you'll need new protocols, you'll still need the ongoing exploration and innovation that the scientific community is known for. But the value is that we can have the data, and we can search for the data that has been captured by the same methods. Other questions? All right.

Q: A simplistic way of formulating it is as a kind of Google Brain, so that you can easily move between levels of description, from the molecular and cellular to the disease-oriented or more global nervous function. I think it's an instrument to facilitate exchange between the different neuroscience communities. You have, after all, so very many different disciplines in the neuroscience community, from linguistics to psychiatry to structural biology, and it's very important to facilitate the interaction between these groups, so one can realize why a defect in an ion channel would give a given symptom, disease, or cognitive effect, and so on.

A: Good. All right, well, thank you very much for coming.