Hi everyone! It's nice, and a little bit terrifying, to come to such a big conference to present this talk. I'm here to talk about GeNN, GPU enhanced Neural Networks, which is a graphics-accelerated neural network software framework. My name is James Turner, I'm a PhD student of Thomas Nowotny, and I work with Thomas and Esin Yavuz on GeNN.

First things first: why not single-core, why not CPU neural network simulations? Well, there's only so fast you can make a single processor core. So, as we've already heard earlier today, a lot of people are adopting graphics processing units, because they have many more cores than central processing units. The good thing about GPUs is that, thanks to the video game industry, you get a lot of bang for your buck: they're driven by a very big market, and thanks to that you get very high-performance chips. With GPUs you can run very large network simulations at real-time performance. However, it can be quite hard to program these simulations. It's a non-trivial problem to ensure that all the resources on the graphics card are used 100% of the time. And because GPUs are optimised for throughput rather than latency, memory operations are quite expensive, which is another thing you have to worry about and program around.

The typical user doesn't care about any of this; neural networks are all they care about. So, enter GPU enhanced Neural Networks: GeNN. It's a C++ source library for generating GPU code for neural network simulations, using a code generation approach; I'll get on to the advantages of that. It's fast, flexible and, as I'll hopefully show in the next few slides, easy to use. And it's open source and cross-platform: it's available on Mac OS X, Linux and Windows.

The benefits of code generation: a lot of things can be calculated and fixed up front, rather than being recalculated over and over during the simulation, which buys some performance. Also, the user only has to deal with defining the important things, like neuron models and synapse models; you don't have to worry about how much memory should be used. Another good thing is that the code is automatically optimised for the hardware the user is running on and for their network configuration.

So now I'm going to show you a short tutorial of how you might define a simple network. First, you define your neuron models. You declare the variables of your model and the type of each variable (a real number, an integer, et cetera), and you declare your parameters. The simulation code for your model neuron is given as a C++ string, as is the threshold condition: the condition which must be met for a spike to be propagated. You also define your pre-synaptic (weight update) models. For example, I've implemented a simple graded synapse model here, but you can also do regular pulse-coupled synapses, STDP (spike-timing-dependent plasticity) rules, et cetera. Again, you declare your variables and their types, your parameters, and your simulation code as a string. Finally, you define your post-synaptic models; a post-synaptic model describes how spikes are converted into input current to the post-synaptic neuron. Again: variables, types, parameters and the simulation code as a string. A rough sketch of what such a definition might look like follows below.
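To make the tutorial concrete, here is a minimal sketch of a custom Izhikevich-style neuron model. It is loosely based on GeNN's legacy C++ interface as I remember it from the GeNN 2.x era; the member names (varNames, varTypes, pNames, simCode, thresholdConditionCode, resetCode), the header name and the global nModels registry are assumptions that may not match every GeNN version, so check them against the documentation of the version you're using. Weight-update and post-synaptic models are declared in the same style.

```cpp
#include "modelSpec.h" // GeNN's model-specification header (GeNN 2.x era; name assumed)

// A custom Izhikevich-style neuron model, declared in the legacy GeNN style.
neuronModel n;

// Variables and their types
n.varNames.push_back("V"); n.varTypes.push_back("scalar"); // membrane potential
n.varNames.push_back("U"); n.varTypes.push_back("scalar"); // recovery variable

// Parameters, fixed at code-generation time
n.pNames.push_back("a");
n.pNames.push_back("b");
n.pNames.push_back("c");
n.pNames.push_back("d");

// Per-timestep simulation code, written as a C++ string; the $(...)
// placeholders are substituted by the code generator
n.simCode =
    "$(V) += DT * (0.04 * $(V) * $(V) + 5.0 * $(V) + 140.0 - $(U) + $(Isyn));\n"
    "$(U) += DT * ($(a) * ($(b) * $(V) - $(U)));\n";

// Condition which must be met for a spike to be propagated
n.thresholdConditionCode = "$(V) >= 30.0";

// What happens to the state after a spike
n.resetCode = "$(V) = $(c); $(U) += $(d);";

// Register the model and remember its index for later use
nModels.push_back(n);
const unsigned int MY_IZHIKEVICH = nModels.size() - 1;
```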
Now, GeNN comes with a number of pre-defined models which you can use, such as Traub-Miles and Hodgkin-Huxley neurons, Poisson processes, graded synapses and spike-timing-dependent plasticity, or you can define your own, so it's quite flexible.

Once you've defined your models, you connect everything together. You write a function, modelDefinition; inside it, you set your model name as a string and your integration step size, DT. After that, you add your neuron populations. A neuron population gets a name, the number of neurons in the population, and the neuron model you defined earlier. Once you've added your neuron populations, you define your synapse populations. A synapse population takes a name, the source and target populations, the pre-synaptic (weight update) model defined earlier and the post-synaptic model defined earlier; synaptic delays are supported as well. Parameters are passed in the form of plain arrays and C++ vectors.

The idea of GeNN is that it's very minimal, very bare-bones. GeNN only generates the functions which simulate a single time step and copy the data to and from the device; everything else is left to the user. So the user can define input patterns, define conductance matrices, save and analyse the data online, perhaps even pipe the output into another process. It's very minimal, very flexible. You use the generated functions to first copy your data to the accelerator device, then call the time-step function to integrate a single time step of the simulation. This is as opposed to a lot of other simulation software, where you would say "integrate my simulation for 10 seconds" and it wouldn't stop until those 10 seconds are finished. Here you get a lot more control: you integrate a single step and, if you want, you can copy data back or do whatever you like after every iteration. After that, as required, you copy the data back to the host. The first two sketches below show what this wiring and the main loop might look like.

We also give a few connectivity options. Synaptic conductance matrices are conceptually two-dimensional, but they're actually stored as a one-dimensional array: a flattened two-dimensional matrix. Typically you'd have a dense representation where, for each presynaptic neuron, you store the connection to every post-synaptic neuron, whether it exists or not. This can be quite wasteful if there aren't many synapses within a synapse group, so we also offer a sparse connectivity scheme where only the connections that actually exist are encoded. There's a small performance cost for indexing this sparse array, but you can save a lot of memory this way and run much bigger models; the third sketch below illustrates the difference in indexing. You can also change the way spikes are evaluated. Typically, the synapse kernel's number of threads is the number of post-synaptic neurons; however, if you have a synapse group with a lot of fan-in from a presynaptic group, it can be more efficient to parallelise over the presynaptic neuron group rather than the post-synaptic one. GeNN doesn't calculate these settings automatically yet, so it's up to the user to do a bit of experimentation and determine which settings work best.
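First, here is a minimal sketch of what the wiring might look like, again loosely based on the legacy GeNN 2.x-era interface from memory. model.setName, model.setDT, addNeuronPopulation and addSynapsePopulation existed in that interface, but the exact argument list of addSynapsePopulation changed between versions, and MY_SYNAPSE and MY_POSTSYN here stand for the indices of a weight-update and a post-synaptic model registered in the same way as the neuron model above; treat this as an assumption-laden illustration, not a definitive listing.

```cpp
// Illustrative wiring of a small network (legacy GeNN style; argument lists
// varied between versions, so check your version's documentation).
void modelDefinition(NNmodel &model)
{
    model.setName("tutorial"); // model name, used for the generated code
    model.setDT(0.5);          // integration step size DT, in ms

    // Parameters and initial variable values are passed as plain arrays
    double nParams[4] = {0.02, 0.2, -65.0, 8.0}; // a, b, c, d
    double nIni[2]    = {-65.0, -13.0};          // V, U

    // name, number of neurons, neuron model, parameters, initial values
    model.addNeuronPopulation("Exc", 8000, MY_IZHIKEVICH, nParams, nIni);
    model.addNeuronPopulation("Inh", 2000, MY_IZHIKEVICH, nParams, nIni);

    double sIni[1]     = {0.05}; // initial conductance g
    double sParams[1]  = {0.0};  // weight-update parameters (placeholder)
    double psIni[1]    = {0.0};  // initial post-synaptic state
    double psParams[1] = {1.0};  // e.g. a decay time constant

    // A synapse population links a source ("Exc") and target ("Inh")
    // population, combining a weight-update model with a post-synaptic model.
    model.addSynapsePopulation("ExcInh", MY_SYNAPSE, DENSE, INDIVIDUALG,
                               NO_DELAY, MY_POSTSYN, "Exc", "Inh",
                               sIni, sParams, psIni, psParams);
}
```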
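Second, a minimal sketch of the user-side simulation loop built on the generated functions. I'm assuming the generated-function names from GeNN 2.x-era code (allocateMem, initialize, copyStateToDevice, stepTimeGPU, copyStateFromDevice, freeMem) and an assumed header path; whether stepTimeGPU takes arguments has varied between versions, so verify against your installation.

```cpp
// Minimal user-side simulation loop (generated-function names and the header
// path are assumptions based on GeNN 2.x-era generated code).
#include "tutorial_CODE/definitions.h" // generated by GeNN for model "tutorial"

int main()
{
    allocateMem();       // allocate host and device state
    initialize();        // initialise variables and connectivity

    // ... user code: set input patterns, conductance matrices, etc. ...

    copyStateToDevice(); // push everything to the accelerator

    for (int i = 0; i < 2000; i++) {
        stepTimeGPU();   // integrate exactly one time step of size DT

        // Because we control every step, we could pull spikes here,
        // analyse online, or pipe output into another process.
    }

    copyStateFromDevice(); // pull the final state back to the host
    // ... user code: save and analyse the results ...

    freeMem();
    return 0;
}
```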
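Third, to illustrate the dense-versus-sparse trade-off, here's a generic sketch of the two layouts. This is the standard flattened-matrix and compressed-row idea, not GeNN's exact internal data structures.

```cpp
#include <vector>

// Dense: a numPre x numPost matrix flattened into one contiguous array;
// every potential connection is stored, whether it exists or not.
float denseLookup(const std::vector<float> &gDense, unsigned int numPost,
                  unsigned int pre, unsigned int post)
{
    return gDense[pre * numPost + post]; // direct multiply-add indexing
}

// Sparse (compressed-row style): only the synapses that exist are stored.
struct SparseConnectivity
{
    std::vector<unsigned int> rowStart; // numPre + 1 entries; synapses of
                                        // neuron i live in [rowStart[i], rowStart[i+1])
    std::vector<unsigned int> postInd;  // postsynaptic target of each synapse
    std::vector<float>        g;        // conductance of each synapse
};

// Iterating over one presynaptic neuron's synapses costs an extra indirection,
// but memory scales with the actual synapse count, not numPre * numPost.
float sparseSum(const SparseConnectivity &c, unsigned int pre)
{
    float total = 0.0f;
    for (unsigned int s = c.rowStart[pre]; s < c.rowStart[pre + 1]; s++) {
        total += c.g[s]; // c.postInd[s] identifies the target neuron
    }
    return total;
}
```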
Okay, just to demonstrate a few of GeNN's capabilities, here are a few benchmarks. The first is an Izhikevich pulse-coupled neural network model: a population of 80% excitatory and 20% inhibitory neurons, 1,000 random connections per neuron, random conductances and parameters, and random input for every neuron at every time step. This is the result. The graph shows the throughput of spikes, that is, the number of spikes per second, times 10 to the 6. You can see the CPU trace at the bottom stays fairly flat, and actually gets worse as the number of neurons increases. On the GPU, we reach, in this particular simulation, up to 40 times 10 to the 6 spikes per second; the curve begins to trail off once the device is fully saturated. You can see it's a lot more than the CPU implementation achieves.

Another place GeNN excels is when you have complicated neuron models. The previous example used a very simplistic Izhikevich model neuron; if you instead use a Traub-Miles Hodgkin-Huxley neuron, you can get real performance gains compared to a CPU implementation. This model was developed by Thomas Nowotny et al.; it's a model of insect olfaction, using Poisson process inputs and Traub-Miles Hodgkin-Huxley neurons. Here, again, we plot the throughput of spikes. Because it's not as active a model as the Izhikevich network, the throughput isn't as high; however, the speed-up over the CPU implementation is much greater. The top trace is the CPU implementation, which completes a 5-second simulation in around 10 to the 4 seconds of wall-clock time; the lower trace is the equivalent GPU simulation, at around 10 to the minus 2 seconds. Because the model is so computationally expensive, GPUs really excel here.

GeNN can be used as-is, standalone from the command line (that's bash on OS X and Linux), or it can be used as part of a bigger simulation package. We have a backend to SpineCreator, a neural network GUI package, and we also have an interface to the Brian 2 Python simulator, thanks to Dan Goodman and Marcel Stimberg.

Some work in progress: we will soon have an OpenCL implementation. Right now GeNN only has a CUDA implementation, so we're limited to NVIDIA GPUs; with OpenCL we'll be able to use AMD devices, Intel devices such as the Xeon Phi, and even FPGAs. We'll soon have multi-GPU support with load balancing, such that neurons that are nearby within the model will be topologically nearby in the hardware. Also, all numerical simulations have a certain amount of error, and these errors interact in strange ways on parallel hardware; as part of my work, I'll be investigating the numerical error of simulations on GPUs, as well as the error introduced by the integration method.

I'd like to thank Marcel Stimberg and Dan Goodman for their Brian 2 interface, Alex Cope for the SpineML interface, and Alan Diamond for his help with testing and for his neuromorphic classification algorithm added to GeNN. Our project is available at github.com/genn-team/genn; we're under active development and always looking for suggestions and help. Thank you very much.

[Moderator] Thank you very much, James. Nervous PhDs present perfectly. Questions for James?

[Audience member] Great, thanks for that. I think that code generation is absolutely the way to go, and we need people to use GPUs.
My worry is that models then don't have much interoperability. So I'm wondering, in GeNN, if two independent people create models and they then want to link them, how would that work? Is there a pipeline for that?

[James] To get completely reproducible simulations on parallel hardware, you'd have to serialise some of the parallelised instructions; that's the only way you can get completely comparable results. But some of the work I'm hoping to do is on upper and lower error bounds, so that two runs of the same simulation on parallel hardware should agree within very tight bounds. Obviously, without serialising instructions you can't get completely reproducible results, but you can get very close. Thank you.

[Audience member] It's wonderful to see that a GPU can give such high performance for simulation, but I wonder whether that's because of the connectivity. Here you apply all-to-all connections, which is quite homogeneous, but as we know, real neural connectivity is not all-to-all: it depends on different factors, spatial and otherwise, which makes it much more heterogeneous. In that case, it might take a long time to communicate between different cells of the GPU. So, considering the realistic case, is it still efficient to use a GPU, or does it behave more like a standard CPU? I wonder what your opinion is.

[James] There is still a performance advantage to be had. I don't actually have the exact figures to hand, but I believe these benchmarks used sparse synapses. No, we don't use all-to-all a lot of the time, and even when we use the sparse implementation there's still a considerable performance advantage over CPUs, especially where larger models are concerned.

[Moderator] I think we're going to have to stop now. There are opportunities for more questions in the round table.