Okay, my name is Juan Fernando, and first of all I would like to thank Eric, and Sean, who is not here, for inviting me to be a speaker at this workshop. It's been a pleasure to be here because I've learned a lot from the presentations so far. I want to give a little bit of my background: I'm not a neuroscientist, as some of you already know. I obtained my degree in computer science from the Technical University of Madrid in 2005; it was the old degree, so it can also be considered a master's in computer science. I finished my PhD in visualization last year at the same university, and since then I've been working as a postdoc at the Supercomputing and Visualization Center of the UPM within the Cajal Blue Brain project, collaborating with the Blue Brain Project, a collaboration that started back in 2006. My expertise is in visualization and computer graphics, and that involves C++, OpenGL, libraries for visualization, and also combining C++ and Python and the convoluted things that can be done with that.

So here's an outline of what I want to present. I'm afraid I'm not going to be able to present applications as impressive as the previous speakers did, because the pieces of software I saw in the previous talks were pretty amazing. I will talk about what we've been doing in this Cajal Blue Brain / Blue Brain Project collaboration, which basically means the application we use for visualization, called RTNeuron. Most of the talk will be about the challenges implied by moving to multi-scale modeling, and the scenario in which this multi-scale modeling will have to be accomplished, which is the exascale computing scenario. Finally, I will talk a little bit about a software architecture we are starting to develop to cope with the visualization prototyping challenges all this poses.

I want to start with something that has nothing to do with neuroscience at all, but has everything to do with visualization: this magnificent image by a French civil engineer, Charles Minard, designed back in the 19th century. It depicts the army of Napoleon on its march from France to Russia and back, and how the number of soldiers dwindled, from the 400,000 initially in the army down to the little that came back from Moscow, which is the black line. What you see here is a lot of information in a single image: you have the direction in which the army was going and the number of soldiers; you also have the dates and the temperature during the return to France in the curve below; and you can see details such as, at the crossing of the river in this corner (I can't read its name from here), half of the army couldn't get across. So a single image can convey a great deal of information if the visualization is well designed.

So how can visualization help in scientific discovery? Scientific discovery has moved to simulation-based research in all areas. It started first with engineering and physics, and biology has come latest to the party, but it's pretty clear that a lot of biological research now depends on simulations. Visualization can be a fundamental tool, first of all, for the debugging and validation of a simulation.
Then, what to me is the most important use case, although the least exercised in most cases, is the discovery of new knowledge; and then, of course, the dissemination of scientific results. This is what I call the three D's of visualization in scientific discovery: debugging, discovery and dissemination. And since we are moving into this simulation-based research era, the data management, analysis and visualization community needs to collaborate, tightly coupled, with the simulation communities in order to deliver the high quality tools that scientists need to actually comprehend the huge datasets being generated nowadays.

I said that scientific insight should be the primary goal of visualization, but instead, most of the time, it's used as a debugging and presentation tool. Scientific discovery relates to what the literature calls visual analysis. By visual analysis we mean that visualization is not only the rendering of pretty pictures: you actually want to answer scientific questions, and to do that you have to be able to configure and customize the visualization for the needs of the scientists, and different tasks have to be performed to transform a massive dataset into a picture that can answer a question. A recent taxonomy of such tasks was presented in ACM Queue by Heer and Shneiderman; I removed two of the tasks because I thought they weren't as interesting as the rest. This taxonomy has three categories: data and view specification, view manipulation, and process and provenance. You could say the most important ones are at the top, the data and view specification tasks, which are visualize, filter, sort and derive. By visualize we mean visualization in the usual sense: choose a visual encoding for the dataset and render it. Then, if you actually want to understand a massive dataset, you also have to be able to filter the data, to focus on the features of particular interest; sort the data, to expose patterns that are not readily visible, for example using clustering and things like that; and derive additional values from the data, like statistics and statistical analyses.

View manipulation has to do with the interactive manipulation of the data: selecting the items you want to operate on, and navigating through the data, which means not only 3D navigation in a 3D rendering, but also being able to choose between the different levels of detail of the data, coordinating the views when you want different representations of the same data, and making selections of subsets so that those selections are applied across the different views of the data.

In the last category we have annotation, which was already mentioned in the presentations before; sharing for collaborative viewing, which was also mentioned (I don't remember the name, but it was the presentation before mine, with the collaborative annotation tool); and guiding users through the analysis process, which matters because sometimes the visualization, and the dataset too, can be really complex, so it's not easy for a user to know where the interesting features are. If you can provide algorithms that pre-select interesting or important features of the dataset and show them directly to the user, that's a real advantage.
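Purely as a toy illustration of those first-category tasks (the dataset and numbers below are invented for this example, not taken from the talk), filter, sort and derive might look like this in Python:

```python
# Toy example of the "data specification" tasks from the taxonomy:
# filter, sort and derive applied to invented per-neuron spike counts.
import numpy as np

rng = np.random.default_rng(0)
spike_counts = rng.poisson(lam=5.0, size=1000)   # spikes per neuron in 1 s

# filter: focus only on the most active neurons
active = spike_counts[spike_counts > 10]

# sort: expose the tail of the activity distribution
ranked = np.sort(active)[::-1]

# derive: summary statistics to annotate the final view with
mean_rate = spike_counts.mean()
print(f"{active.size} active neurons, mean rate {mean_rate:.1f} Hz")
```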
So, coming to this BBP / Cajal Blue Brain collaboration, what has involved me so far has mainly been developing tools and a software architecture for the visualization of the detailed cortical column simulations. Well, I'll say two software products, but actually it's three; three software products have come out of this collaboration. The first is the one I'm going to present here, RTNeuron, a C++ standalone application which is, among other things, a renderer for the Hodgkin-Huxley simulations. There is also Espina, which I won't talk about because I wasn't involved in its development; it has been built primarily within Cajal Blue Brain, and it's a tool for the segmentation and registration of EM stacks, in particular to extract the synaptic clefts and the shapes and features of the dendritic spines. And there is the BBP SDK, a C++ library used to access the data from which the circuit is built and the simulation data produced by the BBP simulations. It's a library that is wrapped in Java and Python, and I won't say more about it than this.

When it comes to visualizing the cortical column simulations, from a visualization point of view the datasets have these characteristics. We have hundreds of different morphologies, with an average of around 4,500 segments each, and from these morphologies, whose extraction in the laboratory Dan already talked about, we can generate a membrane mesh, a triangular mesh; those meshes have on average around 150,000 triangles, where "on average" takes into account how many times each instance appears in the final circuit. Regarding the simulation data: a one-second simulation of the 10k cortical circuit with a membrane voltage report at a tenth-of-a-millisecond time step produces as much as 100 gigabytes, which is not that much compared with what HPC simulations out there are producing, but you have to consider that this is a rather short simulation.
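As a back-of-envelope check of those numbers (a sketch assuming float32 samples; the resulting per-neuron compartment count is inferred here, not a figure from the talk):

```python
# Rough consistency check of the ~100 GB voltage report quoted above.
neurons = 10_000                 # 10k cortical circuit
duration_s = 1.0                 # one second of simulated time
dt_s = 1e-4                      # report time step: 0.1 ms
frames = int(duration_s / dt_s)  # 10,000 report frames
bytes_per_sample = 4             # assuming float32 samples

report_bytes = 100e9             # ~100 GB, as quoted in the talk
per_neuron = report_bytes / (neurons * frames * bytes_per_sample)
print(f"~{per_neuron:.0f} recorded compartments per neuron")  # ~250
```

So the quoted size is consistent with recording on the order of a few hundred compartments per neuron, considerably coarser than the ~4,500 morphological segments.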
When we first faced visualizing this dataset, we found that the visualization packages out there, like VTK, Amira or AVS, do not provide features to directly visualize it, because it's quite domain-specific. So you either use the scripting or API capabilities of that software, or you develop something of your own, and that's the path we took. RTNeuron is the application that was developed to do the post-mortem visualization of the simulations. Its features: it provides high-performance, high-quality rendering of this particular dataset; levels of detail, that is, different representations that use less geometrical complexity to show the data depending on the distance from which you are viewing the neurons; it's domain-specific; and it supports parallel rendering using Equalizer, a library developed by Stefan Eilemann, with which we can run the application in a cluster environment, splitting the scene into pieces, or the screen into tiles, so that each node in the cluster renders part of the dataset to speed up the final rendering of the images. It also supports different alpha blending algorithms from the computer graphics literature.

Alpha blending is a complex rendering problem in computer graphics because, for every pixel, you have to sort all the pieces of geometry that fall on that pixel before you can compose the transparency. That sorting operation has to be done at the pixel level, and it becomes prohibitive if you want to do it for very complex scenes while still running in real time. RTNeuron also has movie production capabilities, so you can set up camera paths, set the simulation playback at different rates, and output the images to make a movie for a presentation. And I said it's a standalone application, but it can also work as a library, with a CORBA interface for remote control and for customizing what is displayed.

I don't want to go into much detail here about how the levels of detail are rendered, but I wanted to show what they look like. For the somas we can use spheres which are not tessellated spheres; that's an important feature: they are ray-cast spheres, so you only have to send a center and a radius to the graphics card, which then expands them into a quadrilateral that is ray-cast, giving the appearance of a sphere with correct depth. For the branches we can use what I call pseudo-cylinders, which are also geometry partially computed on the GPU: you send the GPU just the segment and the width of the branch, and a quadrilateral aligned to the screen is generated on the GPU and shaded accordingly. That reduces the geometrical complexity by a lot, because you don't have to tessellate.

When you say it's done on the GPU, do you just mean that these are done by OpenGL, that you're not having to do any specific programming?

No, it's OpenGL, yeah, with GLSL, so it's completely standard.

Would it work for WebGL in much the same way?

For WebGL I would say yes, but WebGL is a different standard, so it might have constraints that make it impossible to use the renderers I'm using. But I'm not doing anything really fancy with regard to the features that are needed, and it's in the shading language, not in the API part, so I think it would probably work with WebGL.

This is another level of detail. I chose this neuron because it has this artifact here, where this branch should be wider: for the soma I'm using the tessellated mesh, and for the rest, what I'm using is, again, geometry generated on the GPU. I'm sending segments and widths as before, but this time, instead of generating a quadrilateral oriented to the screen, the result is a conical frustum with spherical caps, so it actually has depth, and it's ray-cast: you generate something like a screen-aligned bounding geometry of the tubelet and then you ray-cast it to compute the final shading and whether there is a tubelet at the point being ray-cast. And these are the meshes generated by the algorithm that Sébastien Lasserre developed. So now I want to show some videos of how the visualizations look in action.
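As an aside for those curious about the impostor technique: below is a CPU sketch in Python of the per-pixel computation that the fragment shader performs for the ray-cast spheres described above. The actual RTNeuron GLSL code is not shown in the talk, so this is only an illustration of the idea:

```python
# Per-pixel ray/sphere intersection, as a sphere impostor's fragment
# shader would do it: recover the hit point (for depth) and the normal
# (for shading), or discard the pixel when the ray misses.
import numpy as np

def raycast_sphere(origin, direction, center, radius):
    """Return (hit_point, normal) or None if the ray misses the sphere."""
    d = direction / np.linalg.norm(direction)
    oc = origin - center
    b = np.dot(oc, d)
    c = np.dot(oc, oc) - radius * radius
    disc = b * b - c                     # discriminant of the quadratic
    if disc < 0.0:
        return None                      # pixel outside the silhouette
    t = -b - np.sqrt(disc)               # nearest intersection distance
    hit = origin + t * d
    return hit, (hit - center) / radius  # unit normal for shading

# Eye ray straight at a unit sphere: hits at (0, 0, 1) with normal (0, 0, 1).
print(raycast_sphere(np.array([0.0, 0.0, 5.0]), np.array([0.0, 0.0, -1.0]),
                     np.array([0.0, 0.0, 0.0]), 1.0))
```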
This is a visualization of 1,000 neurons. You can see the membrane voltage of the somas, color coded according to the hyperpolarization or depolarization state. Here you can see the same column, but instead of the membrane voltage, what I'm depicting is the spiking state: a soma is black if it's not spiking and white if it is. And here we have the same with some alpha blending applied; the alpha value depends on the voltage, and hence on the spiking state, and you can see how this looks: in these kinds of images it's pretty easy to see how the excitation propagates across the column. Here is the same for a mesocircuit comprised of around 300,000 neurons, if I remember correctly. This video plays back slowly because it was captured in real time; these are not frames that were rendered offline and assembled into a movie afterwards. So this is the performance you can expect from the ray-casting algorithm for spheres on a good, though not really high-end, graphics card nowadays.

OK, this one is a pre-rendered video, because this is something that couldn't be done in real time with the full column. The voltage-dependent transparency works fine in real time with somas only, because there are fewer geometrical pieces to process, but with the full circuit it becomes more complex, so this was rendered offline. The thing here is that for the axons we are applying the same color value as for the somas, and that's why you see all this white. So instead of rendering the axon with the same membrane voltage as the soma, we have another technique that actually shows how the spike propagates along the axon. (Sorry for this... yeah, here we go.) This is only 100 neurons, but you can see that when a neuron spikes there is an action potential traveling along its axon.

So this is the current state, but it has limitations, because so far we've been using this more as a rendering engine than as a visual analysis tool, which should be the final use case. That means we've mainly used it for circuit debugging, for example debugging the circuit construction and checking visually that the touches being detected are where they should be, and for public presentations. Things that should be improved: in quality and performance, we should provide better anti-aliasing for some of the representations, and better performance overall for the alpha blending. And CORBA has proven not to be a good choice for the communication interface of the application through which external GUIs could be built. For those reasons we are currently refactoring the application to wrap it in a Python library: we are removing the CORBA API and providing an API that will let users customize the visualization in a much more powerful way than is currently possible.
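A minimal sketch of the kind of voltage-dependent color and transparency mapping shown in those videos; the colormap, voltage range and alpha ramp here are assumptions for illustration, not the ones RTNeuron actually uses:

```python
# Map a membrane voltage to an RGBA color where depolarized (near-spiking)
# compartments become brighter and more opaque, as in the videos described.
import numpy as np
import matplotlib.cm as cm

V_REST, V_SPIKE = -65.0, 0.0          # assumed voltage range, in mV

def voltage_to_rgba(v_mv):
    t = float(np.clip((v_mv - V_REST) / (V_SPIKE - V_REST), 0.0, 1.0))
    r, g, b, _ = cm.coolwarm(t)       # color from a diverging colormap
    alpha = 0.05 + 0.95 * t           # resting cells nearly transparent
    return r, g, b, alpha

print(voltage_to_rgba(-65.0))         # resting: faint blue
print(voltage_to_rgba(-10.0))         # close to spiking: opaque red
```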
The opportunity of this Python wrapping is that we can have a high-performance rendering engine with all the flexibility that Python provides. As I was saying, this will allow faster and easier customization of the visualization, so the output can be tailored to the scientific question the scientist is asking. It will also let us leverage existing software, for graph plotting for example: we could combine the underlying data access library, the BBP SDK, with the rendering engine to select one neuron and then plot its membrane voltage with matplotlib. You can also develop GUIs much faster using PyQt, for example, and it will enable the reuse of the code snippets that users build for their different tasks.

This is a mock-up of how we would like this to look. It's actually what the interface of the application that used CORBA to connect to RTNeuron looked like, but in the end we want to do this with Python instead: we'd have a Python console here, the rendering here, and you could select neurons and see their voltage plots here, change the color map, and have special widgets for simulation playback and things like that. There is also this image, which I produced by hard-coding the color maps in the rendering engine itself, but it would be possible to do it with a little scripting. What it shows is a layer 5 pyramidal cell, with its dendrites in red, and the axons of all the neurons projecting onto it, where the color of each axon encodes the distance from the synaptic connection to the soma of the target neuron: deep blue means the neuron projects close to the soma, and a lighter, more transparent blue means it projects to the top of the apical dendrites.

So that's where we want RTNeuron to go in the short term, but we also want to keep pushing it forward to cope with the challenges the BBP is facing in its short and long-term modeling and simulation goals. On the modeling side, we want to deal with more geometrically realistic circuit reconstructions, which implies having the spines and the boutons on the meshes, and also moving to unique morphologies, because in a real piece of tissue you cannot keep replicating the same neurons over and over; you have to have unique models. We also want to couple the rendering with the multi-scale simulations, so that reaction-diffusion simulations at the synapse level are also considered in the visualization. On the computing side, we want to consider how exascale computing is going to affect the computer architectures of the future, because that will have an impact on how the software technology and the tool chain for the analysis and visualization of the simulations have to be built, apart from all the research we want to keep carrying out on generic visualization strategies.
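To make the matplotlib/PyQt point above concrete, here is a mock-up of the kind of session this Python wrapping could enable. The module and function names (rtneuron, display_circuit, soma_voltage) are hypothetical placeholders for this sketch, not the actual API:

```python
# Hypothetical sketch: wire a selection in the 3D view to a matplotlib plot.
import matplotlib.pyplot as plt
# import rtneuron                          # hypothetical Python binding

def on_neuron_selected(gid, report):
    """Callback fired when a neuron is clicked in the rendered view."""
    t, v = report.soma_voltage(gid)        # hypothetical report accessor
    plt.plot(t, v, label=f"neuron {gid}")
    plt.xlabel("time (ms)")
    plt.ylabel("membrane voltage (mV)")
    plt.legend()
    plt.show()

# view = rtneuron.display_circuit("10k_column")    # hypothetical calls
# view.selection_callback = on_neuron_selected
```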
So I have a question: is this fine-tuned just for the Blue Brain Project, or is it something that can be used with any dataset?

Not really, it's fine-tuned for the Blue Brain. Most of the algorithms themselves are independent; for example, the algorithm for rendering the somas is independent of the data. But the data access layer relies on the BBP SDK library, so if I want to render a particular circuit, it's not using a URL or anything like that; it goes through the BBP SDK layer.

So I was just looking for it. Is it there in any of the repositories?

Public repositories, you mean?

SourceForge or GitHub?

No, it's not in a public repository.

Is that the plan?

That's the million dollar question. In order to release this as public software, first we have to be able to support it, so we need manpower, at least one more person, because so far it's only me doing the whole development of the engine, and that's something that has to be discussed inside the project. Personally, I would like it to be open source at some point, but that's a decision I cannot take alone, because the IP involved in the code is really complex: it involves not only the BBP but also my university and Cajal Blue Brain, so that would have to be cleared up first.

And part of the reason for asking is that there's something like an 80% overlap with the project that we've been doing, which I mentioned very briefly in my talk.

The one you mentioned before, you said?

Yeah, so we tried to do due diligence and check that we weren't replicating somebody else's efforts, and it seems a shame that that has in fact happened anyway, which is why it's nice to have these things out there, easily findable, so that one can do coordinated rather than parallel development.

Yeah, I think there's also a new round of...

Yeah, we checked that out, and both of those are sitting out there; both of them are freely available, and those people are speaking to each other, learning from each other, exchanging ideas. They have similar requirements for loading in morphologies, loading in datasets; they are out there talking, and hopefully they'll come up with a common format for saving morphologies, membrane potentials, whatever other data, which both of them can view. Maybe neuroConstruct can save it, or NEURON can save it, and they can visualize it. And having your tool out there, freely available, would mean you could be part of that process as well.

Yeah, now that you mention simulation output, I have a slide regarding that, because I think that's something that could be standardized.

Another aspect in that domain, if we may enter the discussion a little bit already: when it comes to hardcore rendering engines and such, there is another domain where this is receiving a lot of interest outside science, and that is gaming engines. I know some people in the modeling community are looking into things like Blender or Panda3D, which are somewhat sophisticated tools in the Python world in that regard; or, at the more microscopic imaging level, again within science, there are libraries like Mayavi. Have you ever heard of those? Have you looked into them? Do you see any connection points there?
Yeah, I know them. Mayavi I haven't used myself. Blender I have used, and I wouldn't say Blender targets the same kind of rendering style and rendering capabilities that we were targeting with RTNeuron. For Mayavi it could be the case, but you still have a domain-specific component that forces you to program part of the rendering yourself. For example, the mapping of the simulation onto the meshes: you can do it in very different ways. You can apply a color per vertex; that's something any rendering engine can do. But if you want to do it fast, so that the simulation updates quickly on screen and you get interactive or even real-time playback, you have to do something more specific: you have to store the simulation values in a GPU buffer and then fetch them using special indices stored on the geometry, which is a bit unusual. And if you want to develop level-of-detail techniques specific to neurons, or culling, so you don't dispatch geometry that is not in view, that's also specific to this problem.
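The GPU-buffer indexing scheme just described can be sketched on the CPU with a numpy gather standing in for the texture or buffer fetch that the shader would do; all names and sizes here are illustrative, not RTNeuron's actual data layout:

```python
# Per-vertex indices gather each frame's voltages from a flat buffer, so
# only the (comparatively small) simulation frame has to be re-uploaded
# per timestep, while the geometry stays untouched on the GPU.
import numpy as np

n_compartments = 2_500_000            # whole-circuit report frame size
n_vertices = 150_000                  # vertices of one neuron's mesh

# Static mapping, built once: which compartment drives each mesh vertex.
rng = np.random.default_rng(0)
vertex_to_compartment = rng.integers(0, n_compartments, size=n_vertices)

def per_vertex_values(voltage_frame):
    """voltage_frame: one report timestep, shape (n_compartments,).
    On the GPU this gather is a buffer/texture fetch in the shader."""
    return voltage_frame[vertex_to_compartment]

frame = rng.uniform(-80.0, 20.0, n_compartments).astype(np.float32)
print(per_vertex_values(frame)[:5])   # scalars ready for color mapping
```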
So, continuing with this: the implication of multi-scale simulations from the visualization perspective is that many different techniques can be involved in a multi-scale simulation at the tissue level. We have techniques for the simulation of particles and molecules; we have the classical Hodgkin-Huxley differential equations; even Navier-Stokes if you try to simulate the flow of blood in the blood vessels; and you can also have data derived from the analysis, like local field potentials, which are different in nature from the entities that were simulated. That means increased data sizes on all sides. As I said before, we will have unique cell geometries, because if we want to go to the tissue level we will need unique geometries for the cells; we have more complex mathematical models, which means we will have to refine our levels of detail; and we have new kinds of simulation datasets, the most obvious being molecular positions. So there is no one-size-fits-all solution for visualization: we see no single rendering engine that can deal with all the particularities of the different rendering techniques required. For example, you have to be able to combine volume rendering with mesh rendering; there are pieces of software that do each independently, and for volume rendering you probably don't want to replicate that effort, but you need to integrate them in a single visualization in order to answer questions, which is the final goal of all this.

And regarding exascale computing: I have the impression that most people here are running their simulations on workstations or small clusters, but I think exascale computing matters for where we are going, because it will also shape, in some sense, the computer architectures; not necessarily at the consumer level, because that's driven by the mobile industry, but at the workstation and small cluster level it will definitely have an impact. The main point I want to highlight from this slide is that, in order to reach the exaflop while staying within the power consumption constraints, the way forward seems to be computing cores that are only slightly more powerful than today's, but far more concurrency, while at the same time memory capacity is not going to grow that much.

What is the meaning of... sorry, "swim lane"?

Sorry, I didn't explain that. I took this data from a scientific report from a Department of Energy workshop held last year, which referred back to another workshop in the Advanced Scientific Computing Research series they organize. What they call swim lanes are the two different choices the hardware manufacturers are considering: on one side, more powerful cores, which is this one, with less concurrency, well, fewer nodes in the end; or less powerful nodes but much more concurrency. In either case the total concurrency of the system is much higher than before. That means that if you have code that works fine on one processor right now, and you expect it to run much faster, say four times faster, in three years, you will probably be wrong unless you parallelize it: single-core computing power is not going to grow that much, so we have to go parallel even if we are never going to run on an exascale machine.

So, as I said, we are moving to massive parallelism, and that means not only many nodes, but also many more cores inside a single node, like what Intel already offers as a prototype, the Intel MIC, or like the GPUs, so to speak. Also, in a supercomputer environment, I/O will be much more costly than now; there is a relative slowdown compared to the computing power. And the power limit, this is important, will constrain data relocation: instead of moving the data to where you want to process it, it will be better to move the code you want to run to where the data is. In summary, for visualization this means we need closer collaboration between the simulation and the analysis and visualization communities, because in-situ processing, doing the analysis and visualization in the same memory space as the simulation, will become much more important than it is right now; some expect that by the time exascale is here, 80% of the processing will have to be done that way. That's the expectation in the visualization community, at least; I don't know if it will actually turn out like that. Anyway, post-mortem analysis is still going to be important, so there is still room for defining standards for simulation reports; new techniques will have to be developed to cope with how the datasets are going to grow, and to manage the I/O efficiently. In the visualization community it's pretty commonly accepted that the current technology is not going to scale to this scenario, so software like VTK, ParaView or VisIt won't make it to that scale as they are, because they have concurrency problems that need to be addressed; this is not going to be a single-team effort. And since we will have more computational power, we are also going to have more simulation datasets. Multi-scale simulation is one particular case of that; we get higher dimensionality; and if you are running a stochastic code you are probably interested in running many simulations and then analyzing the ensemble of results, so new visualization techniques have to be developed for that as well.
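The warning above, that code which is not parallelized will not simply get faster, is essentially Amdahl's law: the serial fraction of a program bounds its speedup no matter how many cores you add. A small illustration with invented serial fractions:

```python
# Amdahl's law: speedup = 1 / (s + (1 - s) / N) for serial fraction s
# on N cores. Even 1% serial code caps the speedup near x100.
def speedup(serial_fraction, cores):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for s in (0.5, 0.1, 0.01):
    print(f"serial fraction {s:>4.0%}: x{speedup(s, 1000):5.1f} on 1000 cores")
# serial fraction  50%: x  2.0 on 1000 cores
# serial fraction  10%: x  9.9 on 1000 cores
# serial fraction   1%: x 90.9 on 1000 cores
```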
So, if we want interoperability between tools, simulation engines, and analysis and visualization frameworks, we have to specify the entities that participate in the simulations being performed. That means we need ontologies and taxonomies, and also APIs; for example MUSIC, which I first learned about on Wednesday, is exactly the kind of thing we need in order to exchange data between the simulation and the analysis and visualization. And regarding post-mortem analysis, as I was saying apropos of your comment, I think it would be interesting to define file formats or metadata standards for simulation results too, but taking into account the capabilities that HPC simulations need. On the metadata side, you need a mapping to the static data structures, so you can refer one particular position of a simulation array back to the entity that produced that value. You also need a scalable way of writing and reading the file format you define; if possible, it would also be good to have compressibility of the dataset; and, most important to me, the file format should be randomly accessible and queryable, to enable query-driven visualization. For that, I don't think XML is the best choice for the final storage: the hierarchy you define in XML is fine, but the format itself precludes random access, and it doesn't make life easy for visualization algorithms that need to query data randomly at the user's request. (Yeah, you can send me the links later, I'm interested.)

I'm going to skip this slide because I've already mentioned most of its content: we have to combine different rendering engines, different entities, and also the different plots that the user may want to see. For that we are working on a prototype architecture to deal with multiple views, multiple rendering engines and runtime configuration. It's very similar, as you were saying, to Mugli; when I saw Mugli yesterday I thought: well, that's precisely what I'm doing right now. Here you have the architecture: a C++ backend with the rendering engines and the parsers for the data (I put HDF5 because that's what we use in the BBP, but it could be any other data format you can imagine); the rendering engines sit on top of OpenGL; and then there is Qt, with one part on the C++ side and another part on the Python side for the GUI. There is a callback mechanism so that when you perform an action on the rendering engine, like making a selection, a callback goes up to user land, so to speak, to trigger an action, for example: grab this data and plot the membrane voltage of this neuron. I don't know if what you did for Mugli is more of the same, but what we are targeting here is multiple views and different rendering engines at the same time, so at some point we would like to be able to show a VTK rendering and an OSG (OpenSceneGraph) rendering in the same window, with matplotlib graphs on top, with transparency. So far this is designed for off-screen rendering, and blending the images of the different rendering engines will be possible if they output to GPU buffers in the proper way.
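As a sketch of the random-access, queryable report layout argued for above: a chunked HDF5 dataset indexed by (frame, compartment), read here with h5py, lets a viewer pull one time window for a handful of compartments without scanning the whole file. The dataset name, shape and chunking below are invented for illustration:

```python
# Toy voltage report: 10,000 frames x 2.5M compartments, chunked so that
# HDF5 only materializes the chunks actually written or read.
import h5py
import numpy as np

with h5py.File("report.h5", "w") as f:
    volt = f.create_dataset("voltages", shape=(10_000, 2_500_000),
                            dtype="f4", chunks=(100, 1024))
    volt[0, :10] = np.linspace(-65.0, -60.0, 10)   # write a few samples

with h5py.File("report.h5", "r") as f:
    # Query-driven access: frames 100..199 of compartments 5000..5249 only.
    window = f["voltages"][100:200, 5000:5250]
    print(window.shape)                            # (100, 250)
```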
As a conclusion: currently we are doing the rendering and visualization of these electrical simulations of the cortical columns with RTNeuron, but it's not sufficient as it is, because we need better visual analysis of the results; we have to integrate the multi-scale simulations, first of all the synapse-level molecular simulations that Daniel Keller is performing; and there are all the implications related to exascale computing. The current work goes along two converging lines: making a Python library out of RTNeuron, and writing a generic software architecture to easily prototype different visualization techniques. At some point we would like these to converge, with the Python API of RTNeuron becoming a plugin of that software architecture. And finally, I would like to acknowledge all the people who are in some way involved in this work, from Cajal Blue Brain and from the Blue Brain Project. Thanks, everybody. Thank you.