Let me start with the question we have: why does the Human Brain Project need such big computing and data analytics infrastructures? I formulated this question and will try to answer it today — why we need these infrastructures and why we want to build them in a co-design process together with neuroscience. My talk is structured in the following way. I will first give a very short introduction; we already heard something about the Human Brain Project yesterday from Richard Frackowiak. Then I will concentrate on our subproject, the HPC and big data analytics subproject, and the corresponding infrastructure to be built. I will give some exemplary activities from the last year — the project has now been running for one and a half years — describing what we actually do to build the infrastructure, both in terms of research and in terms of providing services to the community. Then I will introduce our concept for forming the infrastructure: building and establishing computers, data repositories, networks and parallel global file systems, and the services on top of that. Establishing all of this is one thing; the other question is whether it is really adapted to the needs of the Human Brain Project and of the wider community in the long run, and our concept to guarantee that is to introduce co-design projects together with neuroscientists. Then I will try to answer the question I posed at the beginning of my talk: why do we need this concentration on supercomputing, on HPC and on extreme-scale data repositories? I have three examples: one is image registration, another is cellular architecture, the third is fibre architecture. I will also touch on simulation requirements. You will see that all of these are now reaching requirements — in compute capability, data management capability and communication capability — that are already at the high end of current technology.

First, the Human Brain Project in a nutshell. What is our commitment? We want to work towards a unified understanding of the brain and its diseases, and of course there should be an impact on IT. One should remember that this project was started and is funded through the ICT Directorate, Directorate-General CONNECT, of the European Commission. There is the brain-inspired aspect of IT and the neuromorphic aspect, and, from my point of view, guiding the future of supercomputing and of big data analytics through the requirements of the neuroscience community also plays an important role. From the European Commission's point of view, what they have set up is something very new: a so-called flagship project. There was a long selection process; I remember that at the very beginning there were, I think, 49 expressions of interest, which were condensed down to 20, then to 9, then to 6, and finally to two flagships. One is graphene; the other is the Human Brain Project. This sits in the Future and Emerging Technologies section of EU funding. It is a collaborative research effort, and it is interdisciplinary, between neuroscience and many other science and engineering fields. It is planned for 10 years, but of course its impact should last much longer. So it is not just some project.
It started out as a research project, but you will see that it has become much more than that. As for its funding: from the EU side there is research funding in the range of 0.5 billion euros, and another 0.5 billion euros is expected to come from the member states — this is under negotiation, and different mechanisms to achieve it are under consideration. Today we have more than 400 scientists from over 100 institutions and, of course, many countries; you see all the logos here. This is just to give you the perspective in which this project — and maybe one should not say project, it is more than a project — is developing.

We have three parts. First, the scientific aspect from the side of neuroscience: the idea to reverse-engineer the human brain, with different foci of consideration — the mouse brain and the human brain, and of course others, but these are the major ones — data integration from all sorts of data, and the integration of cognitive and theoretical models into the research. Simulation plays a major role, and the idea is not just to end up with a finely running, performing virtual human brain; it is much more to use simulation as a research tool, to bridge perhaps unknown scales between the structure that comes from post-mortem brains and functions that might only be available at much lower resolution. One can learn through simulation during this process of understanding — that is the idea we have in the Human Brain Project.

That was the first aspect; the second, very important one is to create a research infrastructure that will serve the needs of neuroscience. Sean Hill, who chaired the session yesterday, is also heading the Neuroinformatics subproject, where all the data federation is carried out. There is another subproject on simulation, and all these subprojects will transform into infrastructures; Jeanette Hellgren will also give a talk. The HPC and Data Analytics subproject is headed by me and by Thomas Schulthess from Switzerland. The idea is that we create all these infrastructures — and of course we cannot just tell people "these are the infrastructures you need"; we want to learn from neuroscience which infrastructures are needed. It was always like this: I come from theoretical particle physics, and I think that field very much influenced the development of computers with very high compute capabilities in the past, in order to perform simulations of quantum chromodynamics and so on. For the future we need guidelines, and one of those guidelines is certainly this developing field of neuroscience. As we heard yesterday, many countries participate in this research — the US of course, but also China, Japan, Europe and more — and it is not by chance that all of those countries also build and use computers and simulations extensively. We can very much expect that this will influence the future direction of computing. Then there is, of course, the idea that we contribute to new computing technologies through brain-inspired, neuromorphic computing; there is a neuromorphic computing subproject that should develop into an infrastructure in the future. These infrastructures are, of course, at different stages of development and maturity.
The maturity of the HPC infrastructure may be the furthest advanced today; the others will follow. Finally, the idea is to have a neuromorphic facility serving the needs of neuroscience problems but also of other types of problems, where you access those facilities by being granted compute — or, I do not know whether one should call it compute, by being granted neuromorphic processing time on those systems.

Medical informatics: yesterday we heard the interesting talk by Richard Frackowiak, who gave a quite comprehensive overview, also going into technical detail concerning these activities. There is certainly a connection, though you might ask what the difference is between the medical informatics big data analytics and what I will talk about. From the point of view of federation, medical informatics is a federated activity in the sense that, at more and more hospitals, computing will be done and data repositories will be created, producing anonymized, aggregated data streams that can then be used and integrated in larger analyses more centrally. That will be the connection, and it is something we will build up and create in the Human Brain Project. And finally there is the idea of closed-loop simulations in the field of virtual robotics: the brain needs a body, and the body needs a brain. Really intelligent robots, of course, need to be steered by intelligent brains. This is a long-term vision, but we are starting on it already now, and this is the reason we have a virtual robotics subproject.

Now to the third point about the Human Brain Project, which was not foreseen when we started: it has become a funding intermediary for the EU, integrating new groups through open calls. I think that is a very important point. We did not just start with a neuroscience project — certainly, that is the research part — but a project of this size necessarily develops into an infrastructure; there is no way around that, it is simply the path such a project has to take and will take. And in order not to be stuck with the people already in the project, in order to be open to the outside, a mechanism had to be introduced to create new research within the project: we call these open calls, and there was recently an open call for expressions of interest in which four projects will be chosen to become part of the project, coming into the Human Brain Project just like the others before. This new activity is led by Katrin Amunts, and I think it will be a very successful next step; it will show the outside world how flexibly this project acts and how it tries to rejuvenate itself, to become younger, forward-looking and focused on the real goal. So this is my view of the project as it stands today. We will have these three fields to concentrate on, but the structure that we set up in the past will certainly still be reflected within the three parts. Maybe one word on why and how I became interested, as the head of an Institute for Advanced Simulation — that is actually the title of the institute I am in, in Jülich — with a lot of different types of simulation activities.
The reason is that we have funding programmes in our parent organization, the Helmholtz Association, named after the famous physicist Hermann von Helmholtz. Two funding programmes come together here. One is the programme led by Katrin, called "Decoding the Human Brain", at Forschungszentrum Jülich; she is its speaker. The other is "Supercomputing and Big Data", hosted at two big centres, Forschungszentrum Jülich and the Karlsruhe Institute of Technology; I am the speaker of that programme. We came to the conclusion that the brain — or, at this level, information processing — is a joint vision of both programmes, which should first of all lead to a better understanding of human brain organization and should also bring more to computing and big data, especially when we address extreme scale. So we see it from both sides: why do we need extreme-scale computing, and how do we do extreme-scale computing — these are the two sides of the same coin. We therefore decided on an allied partnership between the programmes, started with a sort of pilot called "Supercomputing and Modelling for the Human Brain". We were generously — I really must say this — funded by our research organization to take part in this activity and also to play a role in the Human Brain Project. That is our basis, and with this activity we want to help drive the FET Flagship Human Brain Project. So that is the motivation coming from our centre. These are the people: you see Katrin Amunts and many others you might know — Markus Axer, Sonja Grün, Abigail Morrison, who is responsible for NEST — as well as Roland Eils from the German Cancer Research Center, Wolfgang Wurst from the Munich centre, Alan Evans, whom you all know, Felix Schürmann from the Blue Brain Project, Karl Zilles and more.

And finally, to show you a picture of a computer — maybe I have one or two here. These are big systems. I hope they become smaller in the future, by integrating functions and better information-processing technologies that we can learn from the research you do. I must say I was extremely impressed yesterday by both science talks: one was research on small-scale brains, which was very interesting, and the other on large-scale brains, which is closer to where I am and where I keep learning more — this is where we want to go. One can really feel that these are fields with huge impetus at this time, and from all the activities they have triggered in the countries I mentioned before, I expect the impact on the future of computing to be huge.

Let me now come to my second topic, which goes into more detail about the High Performance Analytics and Computing Platform of the Human Brain Project: what we do, and how we learn to express better and better what we want to do. It is not as if we simply started and that was it; we are adapting to the needs and to how all this evolves. I am extremely optimistic that this project is on a very good track and is internally flexible enough to really adapt to the needs of neuroscience and to what is required. What is our mission? I should say it is a flexible mission; we will certainly not stick exactly to these statements for the next 10 years.
These statements will of course also be fed by what research brings back. Building and running the hardware and, especially, the software infrastructure in the Human Brain Project is one of our most important activities. This concerns simulation, but simulation is not just running NEST on its own — it is much, much more: it starts from ab initio simulations, goes via molecular dynamics and reaction-diffusion simulations up to the cellular ones, to the point-neuron simulations with NEST and NEURON, and up to the theoretical models. Then data analytics, in all its aspects — I will concentrate on what Jülich is doing in this respect, where I believe the most important challenge for making progress lies nowadays — and, of course, understanding supported by visualization. Using big computers as easily as a laptop might be the goal, and this is a goal that could influence much more than just brain research, since it would be useful for a lot of other types of research as well.

We want to enable simulations up to the size of the human brain. Many people of course discuss why this is necessary; there are investigations, for instance from the NEST people, telling us that certain averaging processes that you need in order to get proper signals require going to larger and larger systems, especially for human-brain-scale simulations. These simulations are certainly multi-scale. They will not be covered by just one code; there will be several codes, and we have to close, or rather transcend, the border lines between the different scales. Even within one code there will be different scales from a purely geometrical point of view, different timescales, and so on. It will be massively parallel at an unprecedented level, and data intensive at an unprecedented level — we do not have the technology today to do that; it is simply impossible right now. And, as I said before, it should be interactive. We will enable the analysis of massive neuroscience data sets. With big data analytic workflows we come to an issue many people talk about: workflows. But if you go for automation of analysis, then — what is this? Something wrong here? First time in my life that it did this; it just broke down. I have to restart; sorry for that, I hope we can continue without interruption. — Workflows that allow you to use different programs and to go back and forth between the different steps of an analysis, for instance of a registration problem, have to be created and have to be made stable. You see, stability is a big issue with computers. And we want to push co-design together with neuroscientists in co-design projects; that is also one of our missions.

What is our roadmap? The best way to express it is through the level of maturity we expect the infrastructure to reach. At month 30 there is a first version of the high-performance computing platform; I will show you what this means, what the ingredients of this platform are and what you can do with it. Our goal for the next 30 months is to arrive at a sort of pre-exascale level, in data and data communication as well as in compute performance and capabilities.
From the point of view of simulation this is a goal we will probably have difficulties achieving: a system with something like 20 petabytes of memory, which would already allow a lot of neurons to be integrated, and with a compute performance of altogether 50 petaflops — for exceptional types of simulations only, of course; this is not the standard case. And this should continue: this here is the ramp-up phase, then comes the first SGA — we call it a specific grant agreement — and by the end of the second phase, the final phase, we should have developed the extreme-scale, let us call it exascale, infrastructure. I am not sure it is really important to know the exact number of flops, but it should be an infrastructure that truly enables simulations that are not possible today.

How do we go forward to do that? Our goal is to serve, first of all within the project, several subprojects, and these subprojects are of course reflected in activities in the community. At this time, in the ramp-up phase, we have one activity that has to do with the future of supercomputers: the PCP, the pre-commercial procurement. It is one of those EU instruments for fostering companies, and on the other hand it gave us a good opportunity to convey the needs of the community to companies. Usually companies do not react, but here companies like Cray, IBM, Dell and others reacted strongly to our needs — to the molecular dynamics simulation benchmarks and the benchmarks from the cellular simulators, from NEURON and NEST — and adapted their strategies. So I am sure that, through the small PCP projects, we have already influenced the future direction of their research a little. The second activity is important: I will show you results from mathematical methods. Mathematics usually does not appear much, but in HPC every 10% of improvement through new mathematical algorithms and methods is, so to say, real money that you save. The same holds for programming models and optimization technologies adapted to the needs of the neuroscience people — this is not just free research; it always has the component of being guided by those requirements. Then interactive visualization, especially with examples and real problems from the field. The management of very large data sets on a computer, and their integration, is an important issue; here we have direct contact with the medical informatics subproject through Anastasia Ailamaki, a computer scientist working in the field of very efficient and very large databases. Integration and operation of the platform itself is an issue, and probably the most important one, so that all of this works together across several countries, four major sites and a lot of smaller ones. User support and community building are certainly important; here we have contact with our education programme. And of course the whole thing also has to be managed. It is interesting that we are connected to a network created by PRACE, the Partnership for Advanced Computing in Europe. As you can see, these are the major sites that carry the HPC infrastructure: here you see Jülich, here EPFL, you have CSCS in Lugano, we have the computer in Bologna — where is it on the map? This one, I guess —
the Bologna supercomputer — and we have Barcelona and, as you can see here, CSCS. These are the major sites, but all of those sites have satellites. Why is this interesting for the Human Brain Project? Because this partnership has already established a European network with global file system support, so data exchange at the level of hundreds of terabytes is possible, and it is very secure: the communication is all IPsec-protected, and the network is pre-configured with fixed routes — there is no rerouting — so even very sensitive data can be transported over it. And, as you see, it supports a global parallel file system, so if you do something on the machine in Bologna it is very easy to integrate activities on the machines in Jülich. We will thus finally have an EU-wide multi-petabyte data repository at hand for the Human Brain Project. A final word on the main contributors to this work: as you see, a lot of people are working in the project, coming mainly from those sites, and some of them you might recognise — for instance, as I said before, Anastasia is in the project. This is the side of the Human Brain Project made up of the people doing the infrastructure activities to build it up.

What is the present status? We have had the first platform release. What is in this release? The supercomputers are now available, and this is not a triviality: usually supercomputers are not available across national boundaries. Within the Human Brain Project you can access those computers through proposals — always peer-reviewed, that is a prerequisite — but it is now possible for a member of the project from another country to submit a proposal in Jülich, and the other way around, using for instance the big Blue Gene/Q system I showed in the first picture, one of the largest computers worldwide, I think still in the top 10. We have cloud storage available. Then we have a system that gives you web-based, but very fast, access to those services, and access that allows you to use it as a sort of platform into which you can integrate any type of analysis tool you would like. And finally this is also the basis for our own Collaboratory software platform that we have just created in the Human Brain Project. We have high-fidelity visualization available, not only on site: it can be developed and run on visualization workstations or platforms, and it is compatible with larger systems, with caves or power walls if necessary. We do research on the software side, on mathematical methods, performance evaluation and programming models, and this research is not simply chosen by ourselves; there is an active process together with the neuroscientists. We have established HBP user IDs and single sign-on — all for the comfort of the users — and we are now providing the basic middleware for the Collaboratory we are building, where everybody with access to Human Brain Project services can have a very convenient portal for using all the tools we are developing for the benefit of the community. And, as I said, one other element is the pre-commercial procurement process, which has evolved very far and now also influences the course of development of large systems at powerful companies.
Now we are also changing the structure of our subproject somewhat; we are adapting to the needs. Simulation technology and methods will be a work package here, in place of the pre-commercial procurement, directly adapted to the needs of the codes, so several of the methods that have been developed, in particular the mathematical ones, will be carried forward here. We will concentrate on data-intensive supercomputing, especially also including visualization. The other work packages stay more or less the same, but we will also include the simulator engines, as planned — NEST, NEURON, et cetera. More will come in: we believe that through our open calls we will, in the course of time, have more simulator engines in the project. And we will have methods and big data analytics as a new work package, since this is becoming the major bottleneck — the computational requirements here are becoming the major bottleneck in research.

Let me now come to my third topic, exemplary activities. One of them, for instance, answers the question of why we need mathematicians. There was recent work on the inclusion of gap junctions in point-neuron simulations; in one of the talks the gap-junction problem was also mentioned. The problem in the simulation is that gap junctions are difficult to include, because you have a step-wise time evolution of your spiking activity, and the response of the gap junction is always wrong if you simply march along the time steps of the simulation, because it is not properly resolved. What people had done so far always introduced factors of additional complexity and, especially, of wall-clock time. With the help of mathematicians, a waveform relaxation scheme has been introduced: an iterative process that can be carried out within the time steps without increasing the wall-clock time by more than maybe 10%, often less. It has been implemented in NEST, so NEST now contains an accurate handling of gap junctions. That is one good argument, and we expect that in the future many more such good, new ideas will be brought into the simulation software by the mathematicians. (A small illustrative sketch of the waveform-relaxation idea follows a little further below.)

Next: why do we need HPC experts — and I mean HPC here in the sense of having optimal implementations of programs. We have a long history of performance optimization all over Europe: in Barcelona, we have very good experts in Jülich, and we have experts in all the other countries as well. Two of the tools we develop to understand performance bottlenecks are Paraver, which is good for detailed, local performance analysis, and Scalasca, which is a scalable analysis tool. We have found and made massive improvements for NEURON and NEST. If you grant supercomputing time for NEURON or NEST — today perhaps in the range of one or two million euros per year — and you improve the codes by a factor of two, you either save that money or you get a much better opportunity to use the codes. That is a very well invested activity, because it will also help us in the future; that is the reason we need those people in the project.
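Here is the promised small sketch of the waveform-relaxation idea. It is purely illustrative and is not the NEST implementation: just two hypothetical leaky "neurons" coupled by a gap-junction current, integrated repeatedly over a short time window, each sweep using the other neuron's voltage trace from the previous sweep, until the traces stop changing. All names and parameters are made up for the illustration.

```python
import numpy as np

# Waveform relaxation (Jacobi style) for two leaky "neurons" coupled by a
# gap-junction current g*(V_other - V_self). Each sweep re-integrates the
# whole window using the partner's voltage trace from the previous sweep.
tau, g, dt, T = 10.0, 0.5, 0.1, 5.0          # time constant (ms), coupling, step, window
steps = int(T / dt)
v1 = np.full(steps + 1, -70.0)               # initial guess: constant at the initial value
v2 = np.full(steps + 1, -55.0)

for sweep in range(20):                      # waveform-relaxation sweeps over the window
    v1_new, v2_new = v1.copy(), v2.copy()
    for k in range(steps):                   # explicit Euler inside the window
        v1_new[k + 1] = v1_new[k] + dt * (-(v1_new[k] + 70.0) / tau
                                          + g * (v2[k] - v1_new[k]))
        v2_new[k + 1] = v2_new[k] + dt * (-(v2_new[k] + 70.0) / tau
                                          + g * (v1[k] - v2_new[k]))
    change = max(np.abs(v1_new - v1).max(), np.abs(v2_new - v2).max())
    v1, v2 = v1_new, v2_new
    if change < 1e-9:                        # traces have converged
        break

print(f"converged after {sweep + 1} sweeps, V1(T) = {v1[-1]:.3f} mV")
```

The point of the scheme is exactly the one made above: the additional accuracy comes from a few cheap iterations over the same window, not from shrinking the global time step, which is why the wall-clock overhead stays small.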
Why do we need computer scientists? We need them because this project is huge; you cannot have individual solutions for everybody. You need unified solutions in order to give everybody the opportunity to access any of the tools. This goes through a basic piece of software we call UNICORE — probably the only software of this kind that has survived from the era of grid computing. It is the access software for our computers and supercomputers; all our big systems can be accessed through UNICORE. You do not feel UNICORE, because in the context of what we do it is more or less middleware. Finally, users will go through a Collaboratory. This does not mean they are forced down a single path; they can also go through the Collaboratory and open an SSH connection to a computer. But everything goes through a mechanism that is inherently safe and cannot be compromised — nobody will be able to compromise access or anything else on this system. And through the Collaboratory you will have access to all of the services we offer on the system. That is why we need computer scientists in the project to help us.

And why do we need visualization specialists? Let me show you with some examples: simulation of neural networks, interactive exploration of details, navigation of neural networks, abstract representations, exploration of relational data, large volumetric data sets, navigation in physiological data, pipelines and architectures, and, as I said before, high-fidelity platforms. Let us start with the simulation of neural networks: one of those codes is RTNeuron. Some tools are integrated from other sources, some we develop ourselves in the project. I am not an expert here, but those who are will see what you can do with these types of visualization. This runs for a while, so I will cut it short here; if you want it, you can get it from me. This other one, for instance, is a tool developed for NEST simulations. You see here the visual system — I do not know from which species, but Katrin might — where you can look into details that you can also show in table form at any stage of a simulation, in an interactive manner. Then you have navigation in neural networks: an abstract representation, which can be used later, for instance in connection with RTNeuron, to highlight certain parts of a network. You get a schematic representation of the network; this is NeuroScheme. One can explore relational data: you go from different compartments and see which connections go where; this is also used in connection with NEST and NEURON simulations. There is exploration of very large volumetric data; one example — there are several others — is Livre, with which you can handle, I think, gigabytes of data in a parallel manner. The same holds for ParaView; what is shown here is PLI data, in the context of the polarized light imaging activities at Jülich together with the Aachen people. There is navigation in morphological and physiological data; I think this one has been integrated from an outside source into our tools. All of these are tools that you can get through the Collaboratory and use for your activities. Let us go on. There can be combinations: on the one hand RTNeuron and on the other NeuroScheme, so you can select regions of interest through the navigator on the right-hand side and project out certain activities in the network. There is ParaView, where you can look into very large data sets as they reside on the system: nothing goes through a bottleneck, you do not move the data off the machine, you only see the visualization. And you have high-fidelity platforms.
Of course, if you go to exascale it is not possible to read all the data out; the I/O will simply not allow it anymore. And caves — I do not need to show you caves; you know how to use them. For larger audiences it is also important to have display walls. All this technology is compatible with use on your laptop as well as on 3D visualization workstations. Okay, that is the technology side. Of course, we also need good teachers. We do a lot of workshops, education workshops and summer schools — I have taught at some of those, as you can see — at several locations, and more will come. We have planned a Human Brain Project PhD curriculum with five syllabi: ICT for non-specialists; brain medicine for non-specialists; neuroscience for non-specialists; research ethics and societal impact; and intellectual property rights, translation and exploitation of research. And we have a female coaching and mentoring programme. You can find the training courses and the workshops on the web. So you see, a lot of activities are going on in the Human Brain Project, and all of this already exists in the HPC and analytics platform. But, as I said, this is how we started, and it is the work of one year.

Now the question is: can we really develop the HPC and analytics activities, for the benefit of neuroscience, as well as all the other types of infrastructure, into world-class infrastructure? The idea is to go through co-design projects. Let me sketch the four currently planned co-design projects. The first is the development of a whole mouse brain model and the related mouse brain atlas: a simplified model simulation of the whole organ, initially based on the NEST code. You see that this will be exactly the point of contact with the high performance analytics and computing platform. The source will be the multilevel mouse brain atlas; we will involve SP5, neuroinformatics, as well as the mouse data; and finally closed-loop neurorobotic experiments have to be performed, so SP10 will be involved as well. So there is a huge interaction in the project through this co-design, and it will influence the different subprojects, especially neuroinformatics as well as simulation. The second is mouse-based cellular cortical and subcortical microcircuit models. Here we will not use NEST; here we will use NEURON as the tool to integrate morphological and physiological data. Again simulation will be important — this is actually SP6 — mouse data will be important, SP5 will be important, of course also the neuroinformatics platform, and the simulation of microcircuits on supercomputers will again involve the high performance computing platform and influence its direction. The third is the multi-level human brain atlas; now we come to structure, connectivity and function. This will be human brain data — this is our subproject two, led by Katrin. A prototype of the human brain atlas is being created in the neuroinformatics subproject, and the efficient, HPC-supported big data workflows will influence activities in the HPC and analytics platform. The fourth is visuo-motor integration in the human brain: the idea is to use multimodal, top-down models of sensorimotor integration with a focus on the somatosensory and auditory systems. Again human brain data will be important; theory will be involved; the cognitive side will be involved.
This still has to be determined and defined, because it is now being built up through our last open call. A match to the bottom-up simulations is planned; simulation and HPC will again be involved, and of course feedback will be given, involving SP10, robotics. These are the planned co-design projects by which we now try to go a step further in adapting the platforms, from inside the Human Brain Project, to the needs of the community. The next step on top of that will be co-design projects with the outside world. So this is the first step; we will see what our experience with co-design is, and then we will come to co-design projects with the community as a next step, to further adapt the infrastructure to the needs of the full community and not just the inner part of the project.

Now let me finally answer the question I posed at the beginning: extreme-scale challenges — or some extreme-scale challenges, since here I can more or less only draw on those I am directly confronted with. Let me start with cellular architecture at different spatial resolutions; the plots are courtesy of Katrin's people and of herself. You see what is visible at a resolution of half a millimetre, and that is not very much if you compare it with one micron. It seems, at least from visual inspection, that one will finally be forced to go to the one-micron level of resolution and beyond in order to see all the details, all the fine-scale structures. One can of course discuss whether all of this has to be incorporated in simulation — that is not the point here. But from the point of view of experimental research, I very much liked the quotation from Freeman Dyson that was made yesterday — I do not know by whom. Freeman Dyson is, again, a theoretical particle physicist, and a very famous one; he made major contributions to quantum electrodynamics. But instead of saying "I am the great guy, I invented a theoretical method", he said that what he invented, in some sense, only explained things that already existed — the really new things are the ones you find through experiment. And the better you resolve, the more new land you will certainly find.

Just to see this from a more abstract point of view: scales, instruments and structures. If you have problems you want to address, in vivo and post mortem, then you need — the yellow part here — different types of technologies in order to get results, and these different technologies at some stage come down to the one-micrometre level. Compare one micrometre with 20 micrometres: since we live in a 3D world, this means a factor of nearly 8,000 — nearly 10,000, 10^4 — in the sheer amount of data you have to handle. That is difficult. This holds if you go to large-scale networks; with smaller systems you do not have these problems today, although for specific types of analysis you might still have very high requirements on data handling and computation. (I will spell out this factor in a small back-of-the-envelope sketch in a moment.) Going on, let us discuss the image registration problem a little — still the cellular architecture. You can see here a movie from the BigBrain; this is at the 20-micrometre level, recreating a 3D picture from 2D slices. It is interesting that this is possible at all; in fact you cannot distinguish between the different directions after the reconstruction.
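Here is that back-of-the-envelope sketch: a few illustrative lines showing where the factor of roughly 8,000 — nearly 10^4 — in data volume comes from. The only inputs are the 20 µm and 1 µm voxel sizes mentioned above; nothing else is implied.

```python
# Back-of-envelope: how much more data does 1 µm isotropic resolution
# produce compared to 20 µm?  (Illustrative arithmetic only.)
coarse_um = 20.0   # current isotropic voxel size in micrometres
fine_um = 1.0      # target isotropic voxel size in micrometres

linear_factor = coarse_um / fine_um          # 20x more voxels per axis
volume_factor = linear_factor ** 3           # 8000x more voxels in 3D

print(f"linear factor : {linear_factor:.0f}")
print(f"volume factor : {volume_factor:.0f}  (~10^4 more data to handle)")
```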
It was also given an award: MIT Technology Review listed it among the 10 technological breakthroughs of 2014, which was a very high recognition for neuroscience research of this kind. The problem was that it took a 10-year effort to create the first BigBrain — Katrin can tell you much more about that; I will come to the automation. Now the group is trying to do this in a more automated manner, using parallel computing technologies and methods for the analysis. Things are distributed, and you have workflows in which you can go back and forth between the different levels if necessary; you can also intervene manually. You do not need to see the details of the complete pipeline of this type of analysis, only its complexity, with all the feedback loops needed to do this properly. This is done on a machine — and I will come to the complexity in a moment, just to tell you the story of why you need HPC — in fact you could do it on a quite small machine, and I can show you why. The machine they use was installed in 2009 and is called JUROPA: an eight-core-per-node system with about 300 teraflops and quite a large memory; it was complemented by another system, so strictly speaking this is one combined system, but that does not matter here. Altogether, I think, more than 3,000 nodes work in this system. Actually they do not use the 3,000 nodes; they use about one fiftieth of the machine to perform this research, and I can explain why. This is at the 20-micrometre level, and I told you about the factor of 10,000 when going to the one-micrometre level. The typical job size, as I said, was one fiftieth of the system, used to perform 512 parallel evaluations of the slices.

Now we go down to one micrometre: we have this confocal laser scanner producing big TIFF images, 30 gigabytes per slice, one petabyte per brain. You can go to single-cell analysis — but what does that mean? First of all, we have a new machine now, so let us make the comparison in terms of the new machine, the big system JURECA with 45,000 cores. It is a general-purpose cluster computer, not a high-performance computer in the sense of a Blue Gene: a system that allows any type of research, but with a very fast parallel communication system and very fast access to the parallel file system. One node in general has 128 gigabytes of main memory; nodes with 256 and 512 gigabytes also exist, but only a few. On this system you have access to 281 terabytes of memory and 2.245 petaflops of peak performance. Now let us do the comparison. Working on the 20-micrometre slices — actually, on the previous system the analysis could only go to 40 micrometres, and there was a reason for that: BigBrain 2 at this stage was only done at the 40-micrometre level and will go down to 20 micrometres, because it was hampered by local memory — 512 cores were used in one analysis run, eight cores per node, so altogether they used only 64 nodes, which is the one fiftieth of the machine. The reason was that the per-core memory of JUROPA was exhausted, and 512 slices was exactly the number of slices you need to go through to see how the different distortions behave, so this was quite convenient for the people operating the runs.
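Just to spell out that job sizing with a tiny, purely illustrative calculation — the 512 parallel evaluations, 8 cores per node and the rough figure of 3,000 nodes are the numbers quoted above; nothing else is implied:

```python
# Rough job-sizing sketch for the 20/40 µm registration runs described above.
slices_in_flight = 512     # parallel slice evaluations in one run
cores_per_node = 8         # JUROPA-class node
total_nodes = 3000         # "more than 3,000 nodes" -- rough figure from the talk

nodes_used = slices_in_flight // cores_per_node     # one slice evaluation per core
fraction = nodes_used / total_nodes

print(f"nodes used       : {nodes_used}")                   # 64
print(f"machine fraction : {fraction:.1%}  (roughly 1/50)")  # ~2%
```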
Now go down to one micrometre. The actual resolution was 20 micrometres; going down to one micrometre means a factor of 400 in a 2D slice. One section then requires 600 gigabytes, and the memory of JURECA is 5.3 gigabytes per core — there are the larger nodes, of course, but take only the standard ones. So one section alone requires about 110 cores, which with this smaller per-node memory means five nodes. You therefore have to go to several levels of parallelization in order to do it; otherwise you cannot do it at all — that is the problem. And we are not talking about one section: all sections together would be 10,240 virtual sections at the one-micrometre level, but only 400 sections already saturate the full 281 terabytes of memory of the machine. That is not much. You would like more sections — 512 times 20 micrometres would be the real physical distance you would like to cover in one run — but you cannot, because in one run only 400 sections fill the full machine. And as for the compute time: you have to do this 25 times, because only then do you get the full brain, so you need 25 times the full system, and 2,500 times more CPU hours. The cost of computing will then dominate the whole research — that is what happens here.

Let us go on — image analysis, image registration, now for the fibre architecture. What we have learned is that the cost of computing will dominate the research. Can we do better? We can certainly do better if we include GPUs on this machine; this will be done in the next step and we believe it can help, although the problem was actually dominated by memory, not by compute. One should remember polarized light imaging: the idea is to get the fibres disentangled. The current resolution is 60 by 60 micrometres with a scan time of 15 minutes per section; the resolution one would like is 1.6 by 1.6 micrometres, with a scan time of six hours per section. This is also a limitation today — the read-out system is actually much too slow — and we expect improvements there in the future, but it is the first bottleneck. What actually happens in the registration? Again you need to make changes that were originally manual; what you would like is an automated process, and this can be done by simultaneous registration and comparison against the reference pictures you have from the block-face images. There are 1,700 images here, and this we now do on GPUs: we have 400 GPUs on the Jülich GPU cluster, and about 20 hours of computation to do it. Typical processing steps, for instance segmentation, are done in an automated way by seeded region growing (the textbook idea behind that algorithm is sketched just below).
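For readers who have not met it, here is a minimal, purely illustrative sketch of seeded region growing in Python — the textbook idea only, run on synthetic data; it is not the pipeline used at Jülich, and the function name and parameters are made up for the illustration.

```python
import numpy as np
from collections import deque

def seeded_region_growing(image, seed, tolerance):
    """Grow a region from a seed pixel, adding 4-neighbours whose intensity
    stays within `tolerance` of the running region mean (textbook version)."""
    h, w = image.shape
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    region_sum, region_count = float(image[seed]), 1
    queue = deque([seed])

    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not region[ny, nx]:
                mean = region_sum / region_count
                if abs(float(image[ny, nx]) - mean) <= tolerance:
                    region[ny, nx] = True
                    region_sum += float(image[ny, nx])
                    region_count += 1
                    queue.append((ny, nx))
    return region

# Tiny usage example on synthetic data:
img = np.full((64, 64), 10.0)
img[20:40, 20:40] = 100.0                       # a bright square patch
mask = seeded_region_growing(img, (30, 30), tolerance=5.0)
print(mask.sum())                               # 400 pixels in the grown region
```

The real segmentation runs mentioned in the talk parallelize work of this kind over many sections and, increasingly, over GPUs, which is where the run-time numbers below come from.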
But segmentation as such is not what I want to show you; what I want to show you is the improvement in segmentation efficiency. If you look at the run time of such a code and take one CPU only, the run time is about 290 days. You can do better, of course: use more CPUs and the run time goes down — and this is all highly optimized, it is not just a bad implementation. Go to GPUs and you gain a big factor of 20, and you maintain that factor; communication does not play much of a role here. So starting from roughly 295 days of computation, going to GPUs alone reduces this to 15 days. With run times that long it is still not a workable tool; you need hours. And you can come down to hours: the idea is to go with CPUs to 5 days and with GPUs to 5.6 hours, starting from 295 days. That is the improvement people could get, making this a viable tool, and now, through better GPUs and better usage of the GPUs, they can go down to 1.2 hours for this analysis. That is why you need HPC in the field of polarized light imaging. And in fact this is not the only thing: with the polarizing microscope you finally need about 4 petabytes per brain. Again, this is not feasible today — in reality we have data repositories of 20 petabytes online, yes, but this is at the limits of current technology. Maybe in two, or four to five, years this will really be possible, and we are on the way to do it; so we are at the limits of next-generation HPC technology. I do not know how we can do much better — another factor of 10 seems impossible to me today, maybe one has to become more efficient — but at this stage we know that one micrometre is reachable and feasible through HPC technology.

The last challenge I want to pose concerns simulation. Again you have many scales, and this means different types of problems: molecular dynamics problems, reaction-diffusion problems, neuron-based and mean-field problems — for instance subcellular proteomics, neurotransmission and receptors, the cellular connectome, functional and cognitive models — and different types of codes. Exemplary codes: GROMACS is well known, STEPS is well known, then NEURON and NEST, and of course there are special solutions at the top. If we count the memory required to handle a full neuron at this level, it is very large: about 1 megabyte per neuron for the code NEURON and 0.1 megabytes per neuron for NEST. This determines, more or less, what we can do on current supercomputers. Concerning molecular dynamics simulations, one example is the multiscale work done by Paolo Carloni, looking at kinase signal transduction. You know this is a very important type of approach — there was a Nobel Prize two years ago for the multiscale modelling techniques invented in this area. They use a multi-level approach, and he is working here on the ab initio type of simulation; we have to provide the machines for that, and we do. Of course it becomes increasingly easier to simulate, so parts of this are not done on supercomputers but on small-scale machines; here, however, for the ab initio part, you need the largest systems available — this is done on the JUQUEEN Blue Gene/Q machine that we have in Jülich. Another type of simulation is done with NEST. You see again the different problem elements involved: if they want to simulate a one-centimetre point-neuron region of cortex, it already covers 10^11 neurons with 10^4 synapses per neuron, and of course they also include long-range connections — at least they emulate those long-range connections with NEST — and finally they visualize it online. The requirements for that: as I said, 0.1 megabytes per neuron, and 10^11 times 0.1 megabytes is a lot. They did scaling experiments some time ago and can already scale very far on several types of systems, for instance on the K computer as well as on the Blue Gene/Q computer in Jülich, on JUQUEEN; meanwhile we have better data points and can go up even further. What you see is that NEST is memory bound, not compute bound, and today, with point neurons, one can get to about 1% of the actual number of neurons of a human brain — whatever exactly that means.
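To make that memory argument explicit, here is a tiny, purely illustrative calculation using only the two numbers just quoted — 0.1 MB per point neuron in NEST, and 10^11 neurons as the order of magnitude of a human-scale network; the 300 TB machine figure is illustrative and does not refer to a specific system.

```python
# Memory-driven sizing sketch for point-neuron simulations (illustrative only).
mem_per_neuron_mb = 0.1          # NEST figure quoted above
neurons = 1e11                   # order of magnitude used in the talk

memory_pb = neurons * mem_per_neuron_mb / 1e9        # MB -> PB
print(f"memory for {neurons:.0e} point neurons: ~{memory_pb:.0f} PB")   # ~10 PB

# Present machines offer a few hundred terabytes of main memory, i.e. on the
# order of a percent of this requirement -- which is why NEST runs are memory
# bound rather than compute bound.
machine_tb = 300.0                                    # illustrative figure only
print(f"fraction on a {machine_tb:.0f} TB machine: {machine_tb / (memory_pb * 1e3):.1%}")
```

This is only order-of-magnitude arithmetic, but it shows why the memory roadmap discussed next, rather than peak flops, is the limiting factor.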
This is just in order to understand what will be possible at all. From the computer science point of view, there is something else: they have monitored the development of the software, and it is very interesting to see how much work, how many versions, codes and code elements go into this over time. If you are interested, the animation runs for about five minutes; you see the branching of codes, the branching into new directions, the different elements they have, and so on — just to show what it means to conduct such a huge software development effort. It is quite fascinating, but I will stop it here because it does not give you new information.

Now let me come to the final slide, the memory challenge. This is just a rough estimate: if we take our present computers and want to go further, we have rodent brains and we want to reach the human brain; we know the region we have to cover, and we think the currently planned HBP systems have to follow this roadmap. We will certainly end up in the range of 20 petabytes for the next step, and we should reach 100 petabytes in 2022 or 2023. It is unclear whether that is possible; this is one of the other research aspects of the HPC and analytics platform, to get hierarchical memory technologies realized on such systems. So you see, we believe that we can reach that and can provide the machines, but it will require innovation, and it will lead to innovation in HPC and computing. I hope that what this guy said is not true — that it will take several hundred years to disentangle what the human brain is with the current efforts. I believe one might be much faster, and I wish you all the best in achieving that. I hope that we can help a little bit. Thank you very much.