I'm going to talk about a project which I initiated some years ago and which has been an open-source project since March; it's called Neurokernel. The focus of our work is on fruit flies, and the reason is simple, somewhat similar to what Steve mentioned here about C. elegans, but there are some differences. One difference is that while the network of neurons in the fly brain is fairly small, maybe 100,000 neurons, some people believe maybe 150,000, so somewhere in between, the fly has a range of complex behaviors which are very interesting and very well documented in the literature. In addition, of course, there is a very powerful toolset of genetic techniques which can be applied to the fly, and this was one of the reasons why I actually got into this field.

This slide very briefly tells you what we are trying to do. On the right-hand side you have the real fly, on the left-hand side the model of it, and we would like to emulate it. The emulation I'm going to talk about today is on the brain level, but there are people who are interested in doing more than that, perhaps even building a robot insect. Now, there is a lot of data available in the literature about the fly, and I'm showing here some data which came out of Chiang's lab in Taiwan, whereby a mesoscale model of the fruit fly brain has been, and is still being, established. On the left-hand side you see a depiction of the brain; what you see in green in the middle is the antennal lobe, so there are two lobes, left and right, and then on the sides, say, the medulla. There are a lot of neuropils, roughly 40, in the brain, and this has been established using genetic techniques at a mesoscale level of abstraction. So one has a very good sense of the neurons and their morphology, but one doesn't know too much about synapses and connectivity. On the right-hand side, however, you see the mesoscale connectivity; these are the tracts.
So this picture gives you a sense of one point of view in looking at the architecture of the brain. Another point of view is what Mitya talked about yesterday, whereby one basically uses electron microscopy. Of course this produces a lot of data; the work is much more tedious in a sense, much slower, but the lab he led came up with a connectome for both the lamina and the medulla, and so this is something very interesting for us. It's a subsystem, if you like, and I'm going to show you some data about how this is used in building an architecture. Finally, there is a lot of electrophysiology being done in the fly, and I'm showing you an example from my lab. This is electrophysiology done on the olfactory system: we are recording here from olfactory sensory neurons. You see on the top a time-dependent waveform with various concentrations, and in the middle a set of trials shown in different colors. The reason is that these experiments are, contrary to intuition, extraordinarily difficult, because odor delivery is actually an art. This is the first demonstration that one can provide odor delivery with one-percent precision, so that the data which comes out of the neurons is precise, and this is an example over there.

So essentially there are various levels of attacking the problem from a neurobiology standpoint, and there is a lot of data, and now we ask the question of how to go about modeling. The first step is neurobiological modeling in terms of the neuropils; in our abstract language these are local processing units (LPUs), which consist of local neurons and projection neurons. Projection neurons extend their axons to other LPUs, whereas local neurons are local and their axons do not leave the neuropil. There are roughly 40 LPUs, and they provide either spiking information, i.e., spike trains, or graded potentials, which for instance is the case in vision.
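The LPU abstraction just described, projection neurons visible at the ports and local neurons hidden inside, can be sketched in a few lines of Python. This is a toy illustration of the idea only, not the actual Neurokernel API; all class, method, and port names here are my own invention.

```python
class LPU:
    """Toy local processing unit: only its ports are visible from outside."""
    def __init__(self, name, in_ports, out_ports):
        self.name = name
        self.in_ports = list(in_ports)    # fed by other LPUs
        self.out_ports = list(out_ports)  # driven by projection neurons
        # local neurons and internal connectivity stay hidden in here

class Pattern:
    """Toy interconnect: routes output ports of one LPU to inputs of another."""
    def __init__(self):
        self.routes = {}

    def connect(self, src, out_port, dst, in_port):
        # Only the published port interface is checked; internals are opaque.
        assert out_port in src.out_ports and in_port in dst.in_ports
        self.routes[(src.name, out_port)] = (dst.name, in_port)

# Two LPUs that abide by the port API can be wired up without either
# one knowing anything about the other's internals.
lam = LPU('lamina',  in_ports=['R1'], out_ports=['L1'])
med = LPU('medulla', in_ports=['L1'], out_ports=['Mt3'])
pat = Pattern()
pat.connect(lam, 'L1', med, 'L1')
print(pat.routes)  # {('lamina', 'L1'): ('medulla', 'L1')}
```

The point of the sketch is the separation of concerns: a lab can change everything inside an `LPU` without breaking any `Pattern` that connects to it, as long as the ports stay the same.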
Now, so far this was my hat as a neurophysiologist, if you like; now I'm going to put on my hat as a computer engineer, and so I start with requirements. This is the starting point in building an emulator. So what are the requirements? One is scaling: we have to be able to scale the system as time passes, and the scaling has to be not just in terms of the modeling getting more and more detailed, but also the resources on which we are running the system might get larger, and therefore we have to support scaling across hardware platforms. And we believe, and this is a point of view coming out of neurobiology, that the most complex object we can build in computer science is an operating system, and most likely the fly brain is more complex than an operating system. So at the minimum what we have to do is apply operating system concepts to attack the problem. Here is a difference, if you like, from what has been done in the past, because we are looking at all of this as an operating system, and the message I have as a result is that there are three important things in this field: programming models, programming models, and programming models. Similar to what people say in real estate, location, location, location, we say everything is about programming models.

So what I'm going to discuss is programming models at various levels of abstraction. In particular I'm going to talk about the architecture level, but I'm also going to talk about programming abstractions on the LPU level. So, starting with the LPU: in the abstract language we are using, the LPU is an abstraction which has input and output ports. The projection neurons extend their axons to these output ports, or are connected to the output ports, if you like, and then there are local neurons which belong only to the LPU domain and don't know anything about anything outside the
LPU. So, for simplicity, in particular for those of you coming to this from biology: this looks like a chip. Pick your favorite chip, say an Intel chip; an Intel chip has input and output ports and has logic in it, and the logic here comes from the local neurons. Now, if we have chips and we would like to build, say, a computer board or a PC, that requires defining some interfaces, standard interfaces, so that these things can be put together, and this is what drives us towards a programming model. So what I have here is two LPUs, LPU 0 and LPU 1, and I'd like to interconnect them. I look at LPU 0: it consists essentially of a switch matrix, which is in gray, and then I have in blue the projection neurons, which extend their axons to the output, and I have the local neurons, and somehow they are interconnected through the switch fabric inside. It's a switch fabric if you are a computer engineer, connectivity if you are a biologist, I guess. And then there is a need for an object which interconnects the LPUs, and this here is a pattern, which is also a switch fabric: it provides connectivity between input and output ports. So fundamentally what we have here is a set of toys, if you like; this set of toys is LPUs and patterns, and what we have to do is match them together any way we want to, so that we come up with an architecture of interest, in this case for the fly brain.

So the programming model, on the level of the architecture, is one of plug and play. Again, the way to think about it is: you have a computer board, you have chips on it, you have a bus, you can read and write on the bus, and the reads and writes are according to an API. So what we have to do here, as a result, is publish an API which people have to abide by, and then the design of the LPUs is completely independent, which means that various labs can design their own LPUs according to their own criteria. They don't have to follow any rules,
if you like, but they have to abide by the APIs, and that allows us to do cooperative work. So you see, the collaborative model, and this is the main message I'd like to give you today, is one based upon communication interfaces, upon APIs. The APIs are between LPUs, if you like, or between LPUs and the patterns; more generally, a set of interfaces which allow you to separate the innards of the LPU from the outside world, and this is essential in program development. It allows collaborative development and refinement of emulations, it allows researchers to leverage additional GPU resources, and it enables in vivo validation of Neurokernel.

Now, how did we go about this? We set up a website called Neurokernel, here's the HTTP address, and it's based on RFCs. In the language Steve presented here, we are going forth with a bazaar; we are following the Linux model, the bazaar, and we essentially invite the community to submit RFCs to the website, RFCs together with code. RFCs in the computer science community are as good as papers; actually some people think they're better, more valuable, because they have direct impact. So we are asking people to submit RFCs and code, which can then be discussed. This of course is publicly accessible; RFCs might be superseded by new RFCs because of improvements, and at the same time they're going to have some lasting validity, because presumably there's going to be some agreement about the best RFCs, and so the architecture can be moved forward. So here, quickly, is the website, and if I go to documents there are requests for comments there, and there are two RFCs. So we published RFCs, and this is essentially swimming against the stream in neuroscience, because we are publishing the stuff, we were not waiting for approval, if you like, and we are sharing the information with other people. It's exactly how Linux was developed; it's exactly how the internet
was built. We think that this is a model which is very valuable, because when you submit your RFC your name appears not only in the code but also on a document, so it's also clear, in terms of publications, who publishes what; names are associated with the work. And here is a start for this.

Now, my message is, as I said: interfaces, open interfaces, open APIs; this is about openness. Now, the picture is more complex. The model which I described before actually lives on the application plane; this is what the Python programmer deals with. But from a systems standpoint, and now I'm talking pure computer engineering, this has to be mapped onto computational resources, and we feel that today, especially for small labs, the most price-competitive solution is to use GPUs, in an architecture whereby the GPUs appear at the bottom and operate on the fastest timescale, then there's a CPU layer which operates more slowly, and on top you have the LPUs. So you see, this is a classical model for operating systems, in the sense that the underlying infrastructure, CPUs and GPUs, provides a set of services; in other words, the system is an extended machine which on top provides the services. This is where you run your brain emulation, this is where the LPUs are, this is where the patterns are. At the same time it provides the second feature, which is resource allocation, and this resource allocation is guaranteed through the CPUs, which control the resource allocation on the GPUs. This is of interest in particular if you want to deal with things like real time, and with interconnecting your architecture with the actual fly.

Now, in terms of specifics: on the architectural level, I said the issue is to open up interfaces, publish them, work with these interfaces, and define the LPUs any way you want to. We started looking into the LPUs, and specifically our interest is in LPUs associated with the vision system, and we
are going to spend some time on it, although, strictly speaking, the olfactory system is closer to me. So in the vision system what we have is five LPUs: it starts with the red part, which is the retina, and then come the lamina, the medulla, the lobula, and the lobula plate. These are LPUs, and now, in a sense, the question from a computer engineering standpoint is how we can deal with this. We need programming abstractions, we need a programming model; we have to be able to actually manipulate this somehow. A flat structure where you have a lot of neurons is too low-level for us; we need circuits to manipulate. We feel, and I think this is the main message, that as soon as you look at programming abstractions you need something higher-level than neurons.

So here, going quickly through it, I start with the retina. The retina consists of ommatidia, sort of facets; each one has six plus two photoreceptor neurons, six black-and-white and two color, and they're organized in a beautiful hexagonal geometry. What you see here on the left is a photoreceptor; it has 30,000 microvilli, and this is where the transduction process takes place. In an implementation that means you actually have 30,000 groups of 20 to 30 differential equations which you have to run if you have light or some video coming in, and then the output is driven by a Hodgkin-Huxley-type neuron model, whereby everything is analog, everything is graded potentials. Now, if you look on the right, there is a circle denoted by A; this is, sort of, the input to the next level, to the lamina, to a cartridge, and the input to this cartridge comes from the R1 to R6 neurons in its neighborhood. There are roughly 800 ommatidia, so 800 times 8 neurons, roughly; that's the numerical complexity at the level of the retina. When you go to the lamina you get this circuit, with the inputs coming from the retina and some outputs; for this connectivity we use the data published
by Janelia, and you see, we now started looking at a programming model whereby the ommatidia and the cartridges are tubes, and later on there will be columns in the medulla; these tubes are the programming abstractions. They contain circuits which do local processing, and that is what is represented there. This is very natural, because we have a visual space which is essentially sampled through this circuit, but at the same time there has to be some lateral connectivity, because images move across the visual space, and so there is connectivity among these abstractions. So while you can look at them as independent in the first instance, as a whole you see that the programming model consists of the cartridges as objects and then rules of composition among them, and the rules of composition then help determine whether you are going to recognize an object better or not. The next level is the medulla, and the medulla again has a similar structure in terms of these abstractions and the interconnects between them; there are a lot of different types of neurons involved, the number of neurons is different, and so on.

Now I'd like to give you a demonstration so you get a sense of what we're doing. Here's the picture of the retina: this is a half sphere, and you see the hexagons; those stand for the ommatidia. As I said before, six black-and-white neurons are in there plus two for color, and the black-and-white neurons feed the lamina directly. What is interesting for us here is to look at what happens if there is an input, an image, in this case a set of bars moving across the screen. So think about a flat screen with a bar moving across it; one has to map it onto this hemisphere, and you're going to see on the left the hemisphere moving, and then we look at the output of the retina. Specifically we are looking at the R1 neurons, and how many are there? There are 768, because there are 768 ommatidia.
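The rules of composition among neighboring cartridges mentioned above can be made concrete with hexagonal-grid coordinates. The sketch below is my own toy formulation, not anything from the Neurokernel code; it uses axial coordinates to enumerate the six laterally connected neighbors of a cartridge:

```python
# Axial-coordinate offsets of the six neighbors on a hexagonal lattice.
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def neighbors(q, r):
    """The six cartridges laterally connected to cartridge (q, r)."""
    return [(q + dq, r + dr) for dq, dr in HEX_DIRS]

print(neighbors(0, 0))  # [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

# With 768 cartridges, one per ommatidium, and 6 neighbors each
# (ignoring the hemisphere boundary), there are at most
# 768 * 6 // 2 = 2304 undirected lateral links to compose.
print(768 * 6 // 2)     # 2304
```

A composition rule is then just a circuit attached to each such (cartridge, neighbor) pair; the cartridges stay independent objects while the lattice supplies the wiring between them.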
On the right-hand side you'll see the simulation start a little bit later, because there is a natural delay. So what you see on the left is what the retina sees, and what you see on the right is what the rest of the brain sees, in other words what is behind the retina. And here is the simulation; these are of course graded potentials, and the output pretty much follows the input, and of course you have that six times, so if you think in terms of super-resolution, the quality is going to be much better. This is the first movie, and now the next one: here we are showing the entire system, where we also include the medulla, and we are picking some medulla neurons. You see, from the top, we have the visual input, then we are looking at what the output of the retina provides in terms of information, then we're looking at the lamina, and we are sampling various neurons in the medulla and showing them. So what is the message here? We think that this is not about simulation, it's not about differential equations; it's about information. We are essentially conveying information, and we think that things are shifting towards this, whereby we have a circuit and we have to demonstrate its I/O capability, what it can do, and that's exactly what you're seeing here.

Since Angus is hinting that it's the end, let me stop here. However, I would just like to say that we believe the fundamental challenge we have is what has been discussed before: validation, biological validation. The issue is not validation in the computer science sense; it is validation in the biological sense, and that requires basically connecting the emulator to the real thing in the lab and seeing whether there is any convergence between the two. Thank you.

You described these local processing units as modules which you can independently contribute to. At the same time you had
different layers where the actual processing occurs, like the GPU layer. So I guess that means that the API for such a local processing unit is not that slim, but in fact must have a rather rich set of methods to describe such a local processing unit, and you're not completely free to divide up how you do the computation, right?

Yeah, so the message I have here is this: I believe that the priority for us, from a neuroinformatics standpoint, is programmability, is function. I have seen this over many years in the internet; what won out was functionality. We can take care of performance; there are a lot of geeks in this room, myself included, and I don't think it's a problem. But it's true that since this is an operating system, you have two functions: one is programmability and the other one is resource allocation, and you have to take care of both. The issue for us here, I believe, is to hide everything which has to do with resource allocation and focus on programmability. Now, the way I went about this was, first of all, to move from CPUs to GPUs, from traditional CPU clusters to GPU clusters, and the reason is that in no time I got a performance advantage, so I did not have to focus on performance; I could focus on functionality. And I believe that this is where the action is, and I'm a big proponent of this, because essentially the history of computer science tells us that while there is a lot of focus on performance, we do this because we are geeks, not because Moore's law isn't taking care of it. In the presentation Felix gave, I guess it was Tuesday, basically what we have seen is that computing power is going up, so I'm not too concerned about it; I feel that we can take care of this. That's not to say there's a free lunch, I'm not suggesting that, but I am suggesting that if we want to make a difference, it's all about biology. That's the
feeling I have, and this is where I'm coming from. Obviously I'm very much influenced by the fact that I'm looking at data in the lab and I would like to be able to explain it, and now, for vision, I have demonstrated a running system. It's executable, executable in the sense of I/O, and I can change it any way I want to, because I have a programming model for doing so.

No, no, I very much believe in introducing more interfaces in computational neuroscience. If we compare to traditional computational science, they typically divide the work into more independent pieces of software than we do; in computational neuroscience we typically have rather monolithic software. So I do believe that we need more interfaces. Okay, thank you. Are there any other questions? Yes?

Somewhat along the same lines: are you aware of some of the more recent asynchronous task-based programming models the computer scientists have come up with, like ParalleX, or what the Barcelona Supercomputing Center does, StarSs? Because essentially they look much more generic than the Neurokernel model you're proposing, in the sense that they actually allow programmers to describe any sort of function and data dependencies and then allow the runtime to schedule it, whether on the CPU, GPU, or whatever other heterogeneous hardware you have. So in that sense it seems that there's at least duplication, or do you think there's no...?

I'm very familiar with this. I spent 20 years in computer networks, and I actually built this; I built network programming into networks, essentially. The problem is, if you look at the internet, you have at the high level, say the packet-switch level, an asynchronous system which is mapped onto a synchronous hierarchy, because at the low level you have a clock. Those problems are already there in the internet; we have the same problem here,
because unfortunately we don't have hardware which is clock-independent, so the GPU is driven by a clock, and we have a counter, and so on. At the high level we think about, say, spikes, and spikes are completely asynchronous events, and then we have to map that down onto hardware which is clock-based. We don't have a choice; maybe if some folks develop some new hardware we can avoid this, but there's not going to be a way around it, not in the next 10 years, in my opinion. And by the way, I don't even think it's a good idea to avoid it, because essentially there is a community of probably one million engineers out there developing all this stuff, and we should just suck it up and use it. This is what I'm banking on, also in this GPU development.

Remember, I had this picture where at the low level I had GPUs. That means basically that I want direct memory transfer between GPUs. It's not working properly yet, because the hardware has not been designed for that, but it's going to work properly. Typically, from the GPU you have to go up to the CPU and then down to a GPU again, and that slows you down. What you want is memory transfer from GPU to GPU; in other words, you want to look at GPUs like routers in the internet: they talk directly, and the CPU layer just appears as a control layer which takes care of connectivity and doesn't care about the details. So it's a point of view.
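The asynchronous-to-synchronous mapping described in this answer, continuous spike times pushed down onto clock-driven hardware, amounts to binning spike times at the hardware timestep. A minimal sketch of that idea (the function name and the timestep are my own choices for illustration):

```python
def bin_spikes(spike_times, dt, n_steps):
    """Map asynchronous spike times (in seconds) onto a clocked grid:
    out[k] is 1 if at least one spike fell in tick k of width dt."""
    out = [0] * n_steps
    for t in spike_times:
        k = int(t / dt)       # which clock tick the spike falls into
        if 0 <= k < n_steps:
            out[k] = 1
    return out

# Three spikes binned at a 1 ms hardware timestep:
print(bin_spikes([0.0003, 0.0011, 0.0042], dt=1e-3, n_steps=5))
# [1, 1, 0, 0, 1]
```

Spike-timing information finer than dt is lost in this mapping, which is exactly the clock-granularity trade-off the answer refers to: the simulator's counter, not the spike, defines the resolution.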