Hello everybody. Welcome. It is September 26, 2023. We are here kicking off a new stream series at the Active Inference Institute. This is Morph Stream 1.1. Today we have David Kappel, and this session and the stream series are facilitated by Sarah Hamburg. We're going to have an overview first, presented by Sarah, then David will share some work on neuromorphic computing, and then we'll have some time to discuss. So thank you both for joining, and Sarah, over to you for the first presentation, and also to introduce yourself if you like. Yes, that's a good idea. Thank you very much, Daniel. So my name's Sarah. I'm a neuroscientist specialising in intelligence, currently working in the field of neuromorphic computing at Sheffield Hallam in the UK. I'm going to give you a high-level overview of what neuromorphic computing is before we hear David's exciting talk in the first edition of this new series. Just to let you know, if you're watching on double time in the future, I talk quite fast, so you might not want to watch me on double time. This QR code I put here will take you to a paper which I thought was a really nice introduction to the field. Neuromorphic computing can be defined as computing systems that are designed to mimic the structure and function of the nervous system. This doesn't have to be the human nervous system; the field actually takes inspiration from all sorts of animals and insects, although the definitions online don't necessarily acknowledge that. Some people are quite open about what constitutes neuromorphic, while others would prefer that "neuromorphic" be reserved for hardware instantiations of biological-like neurons, which are sometimes referred to as non-von Neumann computers. And I think that's the definition the paper behind the QR code uses. So what I think is really interesting is a little bit of the context.
So our current von Neumann computer architecture was also inspired by neuroscience; in particular, McCulloch and Pitts' 1943 neuron model inspired von Neumann's First Draft in 1945. So neuroscience has a long history of inspiring computer science. This also includes reinforcement learning, which is based on theories about learning and decision making from behavioural psychology, based on rewards and punishments; and Hebbian learning principles, "cells that fire together, wire together", from 1949, became foundational for unsupervised learning. So, in order to understand the why of neuromorphic computing, I really wanted to explain what's so great about the brain. Here's some inspiration from lightbulbs. I'm going to ask you a question, and I just want you to think about it for a second. In terms of lightbulbs, how much energy do you think the brain uses? Do you think it's more or less energy than the bulbs lighting the room that you're in? If you're in the future, by all means pause this if you want to do some in-depth calculations, but I'm going to skip to the answer. The answer is here in the pink circle: it's 20 watts. That's the equivalent of one modern-day energy-efficient lightbulb, so that's probably what's above me now, basically, in my room here. This QR code should take you to quite an interesting paper on power consumption in the brain, if you're interested in that. It works out at about four bananas a day to power your brain, and this is calculated, by the way, based on the calorie intake that the brain needs. For context, the fastest supercomputer in Europe, I think it's called LUMI, in Finland, has been called "exceptionally green", and its power consumption is 8.5 million watts. That's around half a million lightbulbs, while your brain uses just one. So then the question is: what does your brain do with that one lightbulb, or four bananas? Apparently it does 1,000 billion calculations per second.
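The arithmetic behind these figures can be checked in a few lines. This is just a back-of-the-envelope sketch: the 20 W and 8.5 MW values are the talk's round numbers, and the calories-per-banana figure is my own assumption for illustration.

```python
# Back-of-the-envelope check of the energy figures quoted above.
BRAIN_WATTS = 20.0               # brain power, as quoted in the talk
KCAL_PER_BANANA = 105.0          # typical medium banana (assumed value)
LUMI_WATTS = 8.5e6               # LUMI supercomputer, as quoted in the talk
SECONDS_PER_DAY = 24 * 3600
JOULES_PER_KCAL = 4184.0

# 20 W running continuously, converted to dietary calories per day.
brain_kcal_per_day = BRAIN_WATTS * SECONDS_PER_DAY / JOULES_PER_KCAL
bananas_per_day = brain_kcal_per_day / KCAL_PER_BANANA

# How many 20 W "brains" (or efficient lightbulbs) LUMI's power draw equals.
bulbs_equivalent = LUMI_WATTS / BRAIN_WATTS

print(f"Brain energy: {brain_kcal_per_day:.0f} kcal/day "
      f"(~{bananas_per_day:.1f} bananas)")
print(f"LUMI draws the power of ~{bulbs_equivalent:,.0f} lightbulbs")
```

Running this gives roughly 413 kcal/day, about four bananas, and 425,000 lightbulbs, consistent with the "four bananas" and "around half a million lightbulbs" figures in the talk.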
There are lots of other massive estimates out there; this wasn't even the largest by several orders of magnitude. Estimates are obviously very speculative, but they're all massive, and they all tend to be based on the number of neurons, their connections and their firing rates. But I think it's really important context that supercomputers can't actually yet match the complexity of skills or the adaptability of the human brain. We actually excel way beyond supercomputers when it comes to things like complex decision making and learning from experience. So how does your brain compare to AI? I mentioned that modern AI is already brain inspired. However, artificial neurons are highly simplified. They don't capture the complexity of biological neurons or networks, not even close. Individual neurons are actually more like networks themselves. Research suggests that modelling one biological neuron requires a five- to eight-layer-deep artificial neural network made of around 1,000 artificial neurons. This QR code should take you to the paper for that. You have 86 billion neurons in your brain. They work together to form a highly energy-efficient, low-latency supercomputer that works just above room temperature off the equivalent of about four bananas a day. So hopefully I've given you a sense of how amazing your brain is, as if you didn't know that already, and how it's already been used to inspire the, I guess, fairly basic AI that we have now compared to human intelligence. Next, I'm going to explain how key features of the brain are being implemented to catalyse our next generation of AI and technology through the field of neuromorphic computing, which is why you're all here. So, traditional von Neumann computers have physically separate computing and memory units, shown here on the left. During computation, data must transfer backwards and forwards, really fast, so there's a bottleneck, essentially, for speed and energy.
Whereas in neuromorphic architectures, which are shown here on the right with the help of DALL·E, computing and memory occur in the same place; they're said to be co-located. Essentially, individual neurons perform computation, while memory is represented by the strength of the connections, the weights between neurons, so the synapses. Chips like this might be created with components like memristors, for example, which can emulate synaptic weights. This architecture improves speed and reduces energy consumption, and what's really interesting is that it enables massively parallel processing, meaning that multiple problems can be worked on at the same time. This architecture is particularly important for various use cases, but also because we are reaching the end of Moore's law, the trend of being able to make transistors tinier and tinier so that ever more fit on a chip. And it's also important because humanity needs to massively reduce its energy consumption against the backdrop of creating ever more powerful AI. Artificial neurons typically use continuous activations, shown on the left; they're always on. Neuromorphic neurons, on the other hand, are said to be spiking: they're on or they're off, which is shown here on the right, similar to an action potential. The benefits of this are, again, power efficiency, and also applications where timing is important, given that they are event driven. Essentially, they have spatial and temporal dimensions, which enables added spatiotemporal encoding and processing of information. You might be wondering a bit about GPUs, which also enable parallel processing. Research suggests that GPUs are suitable architectures for deploying spiking neural networks, which I think makes this a really interesting time for the field, given how high-end GPUs are becoming ever more pervasive. So, the brain learns by changing the strength of the synapses between neurons.
This is based on pre- and post-synaptic firing patterns. I think David's talk will go into a lot more depth on this, but there are many different types and patterns of plasticity across the brain, depending on the types of synapses, such as excitatory-to-inhibitory or excitatory-to-excitatory. The neuromorphic field is working to leverage these rules because of the benefits for on-chip learning, and also for applications such as pattern recognition and edge computing. Edge computing is quite a huge use case for neuromorphic computing because of its event-driven nature and low energy usage. And this QR code will take you to quite an interesting paper on STDP that I found. So, what neuromorphic solutions are available now? You might think this is all theoretical, but there are actually many different solutions out there, which I'll give you a really high-level overview of. The Human Brain Project has created several large-scale neuromorphic computers, including SpiNNaker, which is this one at the bottom. This board is maybe the size of my face. It runs in real time, and it's comprised of multiple general-purpose ARM microprocessors. There was also BrainScaleS, which is an accelerated analogue architecture that runs at a thousand times real time. The board next to the blue one is an actual credit-card-sized version of BrainScaleS, which they've recently made, which I thought was pretty cool. And then there are also some big players in the space. This blue one here is Intel's Loihi chip; they're on to Loihi 2 now. That's their neuromorphic chip, and they have an open-source software framework for it as well, because they really want to catalyse the open-source community to get involved. Neuromorphic sensors also exist. This little blue thing in the middle is actually a neuromorphic camera. It's maybe like this big.
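The STDP idea mentioned here can be sketched with the classic pair-based rule: the weight change depends on the timing difference between pre- and post-synaptic spikes, with exponentially decaying windows. This is a minimal illustration; the amplitudes and time constants below are assumed values, not ones from the talk.

```python
import numpy as np

# Minimal pair-based STDP rule (illustrative): potentiate when the pre spike
# precedes the post spike, depress when it follows.
A_PLUS, A_MINUS = 0.01, 0.012     # learning-rate amplitudes (assumed)
TAU_PLUS, TAU_MINUS = 20.0, 20.0  # decay time constants in ms (assumed)

def stdp_dw(delta_t):
    """Weight change for delta_t = t_post - t_pre (in ms)."""
    if delta_t > 0:   # pre before post -> causal pairing -> potentiation
        return A_PLUS * np.exp(-delta_t / TAU_PLUS)
    else:             # post before pre -> anti-causal pairing -> depression
        return -A_MINUS * np.exp(delta_t / TAU_MINUS)

# Causal pairings strengthen the synapse, anti-causal ones weaken it,
# and the effect shrinks as the spikes get further apart in time:
print(stdp_dw(+10.0), stdp_dw(-10.0), stdp_dw(+50.0))
```

The sign and decay structure of this window is what gives STDP its characteristic two-lobed shape, which comes up again later in David's comparison with biological measurements.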
So they aim to recreate how our nervous system senses stimuli, such as light. For example, in a neuromorphic camera, which is the one here, each pixel works independently with microsecond resolution. Hopefully my GIF will work. Oh, there we go. So you can see each pixel working there, which is pretty cool. Compared to traditional digital cameras, they have improved performance with motion and lower power consumption. There was also a neuromorphic nose recently, by Intel, which was pretty cool. It could learn the scent of a chemical after just one exposure, and then it could identify that scent even when it was masked by others. And then finally, this is a humanoid robot called an iCub. What you can do is integrate neuromorphic sensors, such as the camera, and neuromorphic chips, maybe SpiNNaker or BrainScaleS, into a humanoid device like this, or other devices like a drone, and from that you can create embodied neuromorphic systems. This is something we work on at the Smart Interactive Technologies research lab in Sheffield in the UK. This slide just highlights some of the potential applications of neuromorphic computing, which I thought were quite interesting when you think about it: the understanding of context, pattern recognition, advanced sensing, few-shot learning, generalising across tasks, complex decision making, explainability, and brain interfaces. All these skills are really beneficial when you're thinking about human-centred, real-time applications in dynamic environments, things like self-driving cars, for example. And personally, I think that neuromorphic systems are also likely to be the future substrate of brain-computer interfaces. I'm probably a bit biased because I'm a neuroscientist, but they're low energy, they're real time, and they also have architectures which match our own hardware.
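The per-pixel, event-driven behaviour of a neuromorphic camera can be sketched in a few lines: instead of sampling full frames at a fixed rate, each pixel emits an ON/OFF event whenever its log intensity changes by more than a contrast threshold since its last event. This is a simplified model of how such sensors work; the threshold value is an assumption for illustration.

```python
import numpy as np

THRESHOLD = 0.2  # contrast threshold on log intensity (assumed value)

def pixel_events(intensities):
    """Return (time_index, polarity) events for one pixel's intensity trace."""
    events = []
    ref = np.log(intensities[0])          # reference level at last event
    for i, val in enumerate(intensities[1:], start=1):
        diff = np.log(val) - ref
        if abs(diff) >= THRESHOLD:        # enough contrast change -> emit event
            events.append((i, +1 if diff > 0 else -1))
            ref = np.log(val)             # reset the reference
    return events

# A static pixel produces no events at all (hence the low power draw),
# while a steadily brightening pixel produces a stream of ON (+1) events:
print(pixel_events([100, 100, 100, 100]))
print(pixel_events([100, 130, 170, 220]))
```

This is why event cameras cope so well with motion: pixels that see nothing changing stay silent, and pixels that do see change report it with very low latency.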
So I do think we'll soon see the BCI field being catalysed by neuromorphic systems, particularly maybe hybrids of hardware and wetware, so potentially even containing people's own brain cells, which you can actually grow just from a hair cell. A particular focus of our work is designing AI which learns in a similar way to a human: it has an innate sense of curiosity, and it learns through interacting with the real world. In the 1950s, Alan Turing said: "Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child's? If this were then subjected to an appropriate course of education one would obtain the adult brain." This is very much the philosophy behind the neurodevelopmental approach to AI and neuromorphic computing, which I just wanted to highlight. There are some challenges in the field. These are very high level, but I'll just give you a little bit of an idea. Training spiking neural networks is more complex than training traditional neural networks. Designing hardware which actually implements spiking neural networks and STDP at large scale is also said to be fairly challenging, and so is developing algorithms which can effectively leverage these technologies; hardware STDP, for example, is an ongoing, active area of research. So, if you're here, you're probably interested in active inference, and I wanted to highlight this. Actually, someone put one of these studies on the Discord today, which is pretty cool. A couple of recent studies have combined neuromorphic computing with principles of active inference. Active inference comes from neuroscience, and I would argue that it lends itself very well to neuromorphic architectures. In a recent paper on embodied neuromorphic intelligence, the QR code on the top right here, which didn't actually mention active inference,
it was suggested that a real breakthrough in neuromorphics will happen if the whole system design is based on biological computational principles, with a tight interplay between the estimation of the surroundings and the robot's own state, and decision making, planning and action. Some of those themes might sound quite familiar to people interested in active inference, and I would suggest that active inference is well placed to meet these requirements. I just want to highlight a couple of recent studies here. In this one on the left, Gandolfi et al. recently demonstrated plasticity and rapid unsupervised learning in a neuromorphic system using active inference principles. The authors suggested that their experiments could be adopted to implement brain-like predictive capabilities in neuromorphic robotic systems. And then there was the DishBrain paper, which some of you may be familiar with, by Kagan et al. This was a hybrid wetware-hardware neuromorphic system that the authors claimed was embodied. The system showed rapid apparent learning of the game of Pong, using the free energy principle for learning, and the authors claimed that the system exhibited synthetic biological intelligence. So the field implementing active inference principles in neuromorphic systems is very nascent, and the idea behind this Morph stream series is to create a space and a community to share knowledge, ideas and expertise to catalyse the field. I think some really exciting technological leaps are probably going to come from this area. So thank you for listening to my quick run through Neuromorphic 101. And next up, we're going to hear from David. So David, over to you if you want to introduce yourself, please. Thank you, Sarah. I will share my screen. Can you see my screen now, the presentation? Yeah. Okay. Perfect. Okay. Hello. My name is David Kappel. I'm a researcher and group leader at the Institute for Neuroinformatics at Ruhr University Bochum.
I'm leading the group on sustainable machine learning, and we have a very strong focus on neuromorphic computing. That's why I'm here today. I'm going to start with a very similar motivation to Sarah's, which was really a great inspiration for this talk, I think. Probably most of you have seen this interesting recent result. I don't mean Germany winning the basketball championship, but this really big leap in artificial intelligence that we have seen in the last years, especially the last two or three years. This is a picture generated from a prompt by a DALL·E network, and it's really amazing, because it would have been considered science fiction only two or three years ago. This was essentially made possible by a neuromorphic approach, namely deep neural networks. And these deep neural networks have become huge now. But this also comes with a caveat. The flip side is that, among other problems these models may have, they consume huge amounts of energy. Models like DALL·E or ChatGPT, as Sarah already mentioned, consume energy budgets that are comparable to houses or cars. Training ChatGPT a single time takes approximately a gigawatt hour. That would be roughly 300 tons of CO2 emissions, which comes down to many times the lifetime emissions of a typical car. This comes with two problems, obviously. It makes training these models accessible only to a very small number of very large players, essentially the big tech companies. And secondly, and maybe even more importantly, this is not compatible with a planet with limited resources. If the growth rate of AI continues like it did in the last years, it will account for 13% of global energy consumption by 2030, and it will basically overtake the transportation sector in another five years or so. So this raises the question: does sustainable machine learning exist at all?
And obviously, since I'm working in the group on sustainable machine learning, I believe it does. And why I think it does is because we know a system that is very efficient and is still probably better than these AI models, which is the human brain, which consumes, as Sarah mentioned, around 20 watts, or four bananas a day. So it's many orders of magnitude more efficient than the AI models we have today. But so far, we don't really know how these biological networks work, and especially not how to train them. Basically, our goal is to transfer mechanisms from machine learning, so we have this nice picture here: we start from the machine learning side, where we already know our way around. We have these models that are wonderful and give us really impressive results, but they are not efficient, and we want to transfer them to a new, efficient generation of AI. Our idea is to use inspiration from neuroscience to make this transfer faster, and possible in the first place. Okay. So actually, biology is a great source of inspiration, and it keeps coming around the corner with very surprising results. One of these results, which I stumbled upon a couple of years ago, concerns the reliability of synapses in the brain. As you probably know, neurons in your brain are connected via synapses. There is a paper from 2019 where they could actually identify individual synapses and trigger them to make a synaptic release, a single transmission. But if you look at these measurements, you see that this is really covered in noise. If you average over this and zoom out, you see these typical synaptic traces, which is the average white line here, but below that you see this huge jitter. The traces really go several standard deviations up and down. And this is actually very surprising, given that neurons are probably the single most costly cell type in your body.
In terms of energy consumption, they really consume quite a bit of energy compared to other cells in your body. So you would expect that these transmissions that are communicated between neurons, which are very costly, should be highly reliable. These results are therefore very counterintuitive, and they have been puzzling neuroscientists for quite a while now. And then there is a second puzzling observation, which is that the morphology of neurons looks somewhat like this. This would typically be a pyramidal neuron in your cortex. You see that, for a cell, this is actually quite big and elongated; it can be up to a millimeter long in the human brain, which means that if a synapse fires somewhere up here, it has a very hard time communicating with the cell body, which is down here. The electrical signals that are produced here may travel down, but the synapse up here has no way of measuring the actual voltages at the cell body. And the cell body is the interesting place, because that is where the action potentials are formed. So the synapse would really like to know what is going on in the cell body, so that it can make predictions about how the neuron will behave and how it interacts with the world. And this is another very puzzling, open problem in neuroscience: how this communication between the cell body and the synapse actually works out in single neurons. It is known that action potentials can travel back up, so the synapse sees this kind of binary variable when the neuron spikes, but it cannot actually measure the membrane potential down here. Only the most prominent electrical signals can back-propagate like this, which suggests that the synapse has only very sparse information about what's going on in the cell body. And most models of synaptic plasticity don't cover this at all. So we were wondering: how does this interaction work?
How can the synapse produce useful learning signals given this sparse information about this important state of the neuron? Our idea was that these two observations, the high levels of noise in the synapse and the large distance between the cell body and the synapses, which gives these high uncertainties, are actually two sides of the same coin. Our hypothesis was that we could take the same models that we already know from the behavioural level, of how an agent can act and perform in an environment with high uncertainty, and just apply them to every synapse. Every synapse should utilise this same model. And this model would immediately suggest that synaptic transmissions should be noisy, and that these levels of noise express uncertainty about the environment. We can then use this model to derive learning rules and compare them side by side with biology. And this is the first thing I want to show you. So I'll just give a very quick introduction to the free energy model, because some of you might not be that familiar with it. This is essentially a model to describe a situation like this: you have a person interacting with some environment. Here I assume something very simple: the person tries to throw a ball at some target. We assume that we are good at solving such tasks, and that we are also good at solving them when there are high levels of uncertainty. So the person may receive some visual feedback, but a lot of this feedback may be hidden; you can imagine this going on behind some wall. The person might still want to predict the trajectory of the ball flying towards the target, so that they can make an accurate action. So we will assign some variables to these states. We have essentially this feedback that the person can observe.
And we have this unobserved state of the ball flying here, which we call u. The person doesn't have direct access to it and would only see parts of it, for example when the ball appears at the edges of the occluded region. To describe this behaviour, the person would have an internal description of this trajectory, so an internal model of this state u, and this model would then be updated to match the observed feedback. This can be described very nicely in the beautiful mathematical framework of the free energy principle. The idea is that you establish a model of your internal state, essentially how the internal states and the state of the environment interact, and a model of the feedback, so how the states and the feedback you observe interact. You can then write down a loss function that measures the distance between this model of the internal state and this model of the feedback and the external state. By minimising this distance between the two, you can solve all sorts of behaviourally relevant problems, for example learning, but you can also use this for other things, like figuring out what good actions are, so making inference about both the internal states and the actions. And this is, in a nutshell, the free energy principle. This object here happens to be what is known as the variational free energy, which coincides with the free energy of statistical physics, and this is where the framework gets its name from. But you see that everything is probabilistic here: you have two probability functions, q for the internal model and p for this interaction between states and observations, and you have a distance measure between them that you want to minimise.
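In standard notation, with observations $o$, hidden states $u$, an internal (recognition) model $q$ and a generative model $p$, the quantity just described can be written as:

```latex
F[q] \;=\; \mathbb{E}_{q(u)}\!\left[\ln q(u) - \ln p(o, u)\right]
      \;=\; D_{\mathrm{KL}}\!\left[\, q(u) \,\middle\|\, p(u \mid o) \,\right] \;-\; \ln p(o)
```

Minimising $F$ with respect to $q$ pulls the internal model toward the true posterior $p(u \mid o)$ while bounding the surprise $-\ln p(o)$ of the observed feedback, which is exactly the "distance between the internal model and the feedback" described above.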
So now, if you look at the neuron and the synapse and how they interact with each other, we find a very similar picture. A single synapse, which we have here in green, has an internal state, which we simply model as the synaptic weight. Based on this, when the synapse is triggered by a presynaptic spike, it generates a post-synaptic current. This then propagates to the soma, which is our external state, which we cannot directly observe, because it's too far away from the synapse. But we can see a feedback, which is this back-propagating action potential, the binary variable that tells us whether the neuron has spiked or not. So this is exactly the same framework, if we write it down like that, and we can just use the same mathematics to solve it. To solve it, we only have to make a couple of assumptions. We have to write down a model for this part: the model of how the feedback and the external state interact. But we have very good models for this; it has been studied for many years. Here you see how a model neuron typically behaves. You see the membrane potential of a leaky integrate-and-fire neuron, and you see that it is just going up and down. The neuron receives a lot of presynaptic input, and maybe also noise, and eventually, at some point, it hits a threshold and generates a spike. That is the spike that travels to the downstream neurons and also back to the synapse. And then it resets. Right? Now, we can write this down mathematically as a very simple differential equation, but the synapse doesn't have access to this state. It's again behind this wall; it only sees the spike events. But for the simple case of a leaky integrate-and-fire neuron, we can actually solve this analytically. We can write down the posterior distribution of membrane potentials given the spike times.
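The leaky integrate-and-fire dynamics just described can be illustrated with a minimal simulation: the membrane potential leaks toward rest, integrates input, and is reset after crossing threshold. All parameter values below are assumptions for illustration, not values from the talk.

```python
import numpy as np

def simulate_lif(input_current, dt=1.0, tau=20.0, v_rest=0.0,
                 v_thresh=1.0, v_reset=0.0):
    """Return (voltage trace, spike times) for a given input current array."""
    v = v_rest
    trace, spikes = [], []
    for t, i_in in enumerate(input_current):
        # Leaky integration: decay toward rest plus the injected current.
        v += dt / tau * (-(v - v_rest) + i_in)
        if v >= v_thresh:          # threshold crossing -> emit a spike
            spikes.append(t)
            v = v_reset            # reset after the spike
        trace.append(v)
    return np.array(trace), spikes

# Constant supra-threshold drive produces regular spiking:
trace, spikes = simulate_lif(np.full(200, 1.5))
print(f"{len(spikes)} spikes, first at t={spikes[0]}")
```

With constant input the potential climbs toward its steady-state value, fires once it crosses threshold, resets, and repeats, which is exactly the "goes up, hits a threshold, spikes, resets" picture on the slide.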
And what comes out of this is a so-called stochastic bridge model, or, in the case of a leaky integrate-and-fire neuron, an Ornstein-Uhlenbeck bridge. This can be written down analytically. I mean, it's not simple, but it's doable, and we can then use this model directly to again write down this free energy function. We also make an assumption here about how the synapse actually produces post-synaptic currents and how they are integrated in the neuron, but that is also given by the leaky integrate-and-fire neuron and the stochastic inputs that the synapse generates. For simplicity, we assume Gaussian synapses that draw a Gaussian random variable and inject it into a leaky integrate-and-fire neuron. Then all these ingredients can be solved in closed form, and we can derive learning rules that minimise this free energy function. If we do that, it has a bunch of nice properties, because this Ornstein-Uhlenbeck bridge is completely determined by the back-propagating action potentials, so basically the times of the post-synaptic spikes that arrive back at the synapse. The shape that we get here only depends on two neighbouring post-synaptic spikes, which means that we automatically get a learning rule that looks like this: a learning rule that only depends on the difference between two post-synaptic spikes, which we call delta t, and the difference between the post-synaptic spike and the actual input that is triggered at some point on the pre-synaptic side. And we can basically make a lookup table and just compute what update the synapse would need to make so that it learns optimally in terms of this free energy principle. And this is the shape that we get out.
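The Ornstein-Uhlenbeck bridge mentioned here, an OU process pinned to known values at two post-synaptic spike times, can be sampled with a standard construction: simulate a zero-start OU path, pin it at both ends, and add the analytic bridge mean between the two endpoint values. This is an illustrative sketch, not the paper's implementation; the parameters are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

def ou_bridge(a, b, T, theta=1.0, sigma=0.5, n=200):
    """Sample one OU-bridge path from value a at t=0 to value b at t=T."""
    t = np.linspace(0.0, T, n)
    dt = t[1] - t[0]
    # Zero-start OU path via Euler-Maruyama: dx = -theta*x dt + sigma dW.
    x = np.zeros(n)
    for i in range(1, n):
        x[i] = (x[i-1] - theta * x[i-1] * dt
                + sigma * np.sqrt(dt) * rng.standard_normal())
    # Pin the path to zero at both ends, then add the bridge mean
    # interpolating between the endpoint values a and b.
    c = np.sinh(theta * t) / np.sinh(theta * T)
    pinned = x - c * x[-1]
    mean = a * np.sinh(theta * (T - t)) / np.sinh(theta * T) + b * c
    return t, mean + pinned

t, path = ou_bridge(a=-1.0, b=1.0, T=5.0)
print(path[0], path[-1])  # the path hits both pinned endpoint values
```

Between the pins the path fluctuates, and the fluctuations are widest midway between the two spikes, which matches the intuition that the synapse is most certain about the membrane potential right around the post-synaptic spike times.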
You see that there is a strong dependence on the post-synaptic firing rate, but there is also a dependence on the typical STDP that Sarah mentioned before: the relative positioning of the pre- and post-synaptic spikes. So, in a nutshell, this model can now be split into essentially two pathways. There is the immediate response: whenever a pre-synaptic spike triggers an action in the synapse, we draw from this Gaussian distribution and inject it into the neuron. And then there is the post hoc update, where the synapse looks up in this Ornstein-Uhlenbeck bridge what the optimal output it should have generated would have been, so what the optimal action would have been. It then compares the actual action with this optimal action according to the free energy principle and generates a delayed response, which is an update of the synaptic weight. Okay, and importantly, this internal model is only implicit; it's encoded, so to say, into this spike-timing-dependent plasticity rule. So how do these learning rules look, and how do they compare to biology? Actually, the fit is quite nice, given that this is derived really from first principles, without fitting any parameters. This is the measurement in biology, the classic Bi and Poo result, very old in vitro work where they injected pre- and post-synaptic spikes and measured the resulting weight change in the synapse. And this is the rule that is predicted by our model. You see that, at least in a first-order approximation, it gives us very similar shapes. This also makes sense, because the synapse wants to change the most when it's close to the post-synaptic spike times, because this is where it knows the most about the state of the post-synaptic neuron.
The free energy principle actually suggests these kinds of cones with almost no assumptions, essentially. But because we have not just the first-order spike-timing-dependent plasticity rule, but also this dependency on the post-synaptic firing rate, we can compare this to other results as well. This here is the older work by Graupner and Brunel. This is a model, but a very detailed one, describing plasticity based on the pre- and post-synaptic firing rates at the synapse. And this is what our model predicts: if we inject random pre- and post-synaptic spike trains with different rates, our model predicts this shape, which again is not a perfect match. But given that this is a very idealised model, at least the main features, that low firing rates on the post-synaptic side lead to depression and higher rates to potentiation, are reflected in it. Okay, I assume I still have 10 minutes, right? Okay. So I'll give a quick intermediate summary, and then I want to show some other work where we apply this to an actual machine learning model. What we have seen here is that synapses are actually very stochastic, and this was a big puzzle. We suggest that the synaptic noise is the synapse's way of reporting its own uncertainty about its environment, where the environment is actually the post-synaptic neuron, and that the synapse interacts with this post-synaptic neuron in the fashion of the free energy principle. I think that's a very nice way of describing it. If you're interested in more, there's a paper, a preprint out, where you can read up on all of this. Okay. So how does this connect to neuromorphics? We are not actually doing neuromorphic hardware; we are doing neuromorphic algorithms. So we try to bring these inspirations into actual machine learning models.
And we thought that this might be a good angle of attack for a problem that is well known in machine learning. So I just drew here a very simple convolutional neural network, with several convolutional layers and then maybe some dense layers, as you would have in your machine learning algorithm. And the way this is trained, as many of you know, I guess, is through end-to-end error backpropagation. So the idea is that you have a training set which has inputs and targets. For example, in a classification task, these could be pictures of cats and dogs, and you would have targets which are class labels, so to say. In the output layer there are artificial neurons, and one of these neurons may be active for cats and one may be active for dogs. And in your training data, you have exactly these labels, generated by humans who were sitting down doing this by hand. Then during training, you show these examples to the network by propagating the inputs all the way from the input layer to the output layer. The output is then compared to these hand-labeled targets. The mismatch between the two is backpropagated through all these layers back to the input, and all the synaptic weights in between, inside these layers, are then updated accordingly, so that after doing this many, many times, the network becomes good at telling apart cats and dogs. So this algorithm has a problem. It works great in practice, and it's the foundation of all the models we have talked about, like the large language models behind GPT, but it is quite inefficient. The problem is what is known in the literature as the locking problem. So suppose you split this network up into blocks, which I already did before. This split is arbitrary, but for implementing this efficiently as a software algorithm, it might be interesting to do. 
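The end-to-end loop just described can be sketched on a tiny two-block linear network; the comments mark where the locking constraint sits. This is a generic illustration with hypothetical dimensions and toy data, not the talk's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8)) * 0.1   # block 1
W2 = rng.normal(size=(8, 2)) * 0.1   # block 2
x = rng.normal(size=(16, 4))         # toy inputs
t = x @ rng.normal(size=(4, 2))      # toy targets from a hidden linear map

mse = lambda: float(np.mean((x @ W1 @ W2 - t) ** 2))
loss_before = mse()

lr = 0.05
for _ in range(3000):
    h = x @ W1                 # forward through block 1 ...
    y = h @ W2                 # ... block 1 is now idle ("locked") ...
    err = y - t                # ... until the output error exists ...
    dW1 = x.T @ (err @ W2.T)   # ... and has been sent all the way back.
    dW2 = h.T @ err
    W1 -= lr * dW1 / len(x)
    W2 -= lr * dW2 / len(x)

loss_after = mse()
```

Block 1's update (`dW1`) structurally cannot be computed until the full forward pass and the full backward pass are done; that serial dependency is the locking problem.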
Now you would ideally want these blocks to run in parallel, so that you can show the first example to the first block and train it while the second block is doing something else. But this is not really possible with end-to-end backpropagation, because of the locking problem: the activation of the second block depends on the activation of the first block. So you have to propagate all the way to the end, then compute the error, and then backpropagate. Only when this is done can you start the next epoch, where you show a new batch of examples. And you see that during all this time, the thread that runs the first block would be idle and has to wait essentially all the time. This obviously makes it very inefficient. Now our idea was to use what we had learned from the earlier model, about how synapses communicate over these long distances via the free energy principle, and apply it to a deep neural network as well. Again you have this generation of inputs to some output, but what is missing to make the free energy principle applicable is the feedback that you always need. The idea was to add a very lightweight feedback network. So each of these blocks in the deep neural network is accompanied by a feedback block that locally generates a target. In the simplest case, and this is very recent work, we only used linear blocks so far, so these are single linear layers. We generate these outputs here in the feedback blocks and then use the free energy principle to derive a local loss that allows us to train both the feedback weights that we have here and the weights in the forward network. So it's essentially the same idea. We have these outputs, which we now interpret as parameters of a probability function. 
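A loose sketch of this idea, with hypothetical shapes and my own simplifications: plain squared-error local losses stand in for the derived free-energy bound. Each hidden block regresses onto a target produced by a small trainable linear feedback block, so no gradient ever crosses a block boundary.

```python
import numpy as np

rng = np.random.default_rng(1)
dims = [4, 8, 8, 2]                    # three blocks: 4->8, 8->8, 8->2
Ws = [rng.normal(size=(dims[i], dims[i + 1])) * 0.1 for i in range(3)]
# One lightweight linear feedback block per hidden block: it maps the
# global 2-d target to a block-local target of the right width.
Fs = [rng.normal(size=(2, dims[i + 1])) * 0.1 for i in range(2)]

x = rng.normal(size=(32, 4))
t = x @ rng.normal(size=(4, 2))        # toy regression targets

def mse():
    h = x
    for W in Ws:
        h = h @ W
    return float(np.mean((h - t) ** 2))

loss_before = mse()
lr = 0.05
for _ in range(3000):
    h = x
    for i, W in enumerate(Ws):
        out = h @ W
        if i < 2:
            local_t = t @ Fs[i]        # feedback block generates a local target
            # Block-local updates only: nothing propagates across blocks.
            Ws[i] = W - lr * h.T @ (out - local_t) / len(x)
            Fs[i] = Fs[i] - lr * t.T @ (local_t - out) / len(x)
        else:
            # Last block sees the real target directly.
            Ws[i] = W - lr * h.T @ (out - t) / len(x)
        h = out
loss_after = mse()
```

Because every update depends only on a block's own input, output, and feedback target, the blocks could in principle run and update concurrently, which is the point of the construction.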
So we can apply this probabilistic framework, and all the rest basically rolls out the same way. We assume that these outputs are essentially the internal states of the model, and we have the observations given in the inputs and the targets. Now P would be this feedforward network and Q would be a function that contains features of both the feedback and the feedforward network, and we try to minimize the divergence between them. The nice thing is that, and I don't have time to go into the details now, if you write this out, you see that this log term decomposes into local linear terms that give you these local losses. So you can minimize a block-local loss function between each forward block and its corresponding feedback block, and you can actually do this in parallel. Maybe the picture is good to see here. You do have a bit of overhead because of the feedback block; these would be the two execution times of the feedforward block and the feedback block. But in principle they can run in parallel. Once a forward block is done, the next forward block can start propagating through the network, but simultaneously the first forward block, because it has already received a target, can start updating its weights. And when it's done, it's free to operate on the next epoch. So there is no locking anymore in this framework. Okay, I'm pretty much done, and I'm also out of time, I think. So how does this perform? Because we changed the learning algorithm, we have to go back and check whether this still gives us the same performance. As I said, these are the first results we have now, and at least on mid-scale datasets like CIFAR-10 this seems to perform very well. So we have tried it for standard architectures, Fashion-MNIST with ResNet-18 and ResNet-50, mostly, so far. 
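The scheduling gain can be seen with a back-of-the-envelope count, assuming a unit cost per block per batch. This is a deliberately crude model of my own (real overheads such as the feedback blocks and weight updates are ignored); it only shows why removing locking changes the scaling.

```python
def end_to_end_steps(blocks, batches):
    # Locked end-to-end backprop: each batch occupies the whole pipeline
    # for a full forward sweep AND a full backward sweep before the next
    # batch may enter, so there is no overlap at all.
    return batches * 2 * blocks

def pipelined_steps(blocks, batches):
    # Block-local targets: classic pipeline behavior. After a fill
    # latency of `blocks` steps, one batch completes per step, and each
    # block's update overlaps with the next batch's forward pass.
    return blocks + batches - 1

print(end_to_end_steps(4, 100))   # 800
print(pipelined_steps(4, 100))    # 103
```

So for 4 blocks and 100 batches the idealized pipelined schedule needs 103 block-steps instead of 800, and the advantage grows with depth.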
For small datasets like Fashion-MNIST, applying three splits to ResNet-50, we get basically the same performance as standard backprop. As networks get deeper, you see that the problem we have now is actually overfitting: because we have these local targets, the smaller blocks seem to overfit to some extent. This is not so severe for tasks up to CIFAR-10, so there we get quite close already. But if you go to really large tasks, there's still something missing. For single splits we're getting there, but we're not reaching all the way up to backpropagation. Still, it's interesting to see that you can apply this principle to these standard machine learning algorithms as well. Okay, this is my second summary. So we found that deep neural networks are surprisingly good at generalizing over probability spaces; that is how this work started. Our idea was to explore this and to utilize it to distribute learning in the same fashion as in the first project I showed you, and to solve this credit assignment problem by adding these feedback networks. That's basically it. Then I want to acknowledge my co-workers and my students. I have two very good PhD students, Carlisle and Kaplan, who is now here in Bochum and works on this topic. The first project I showed was work I did together with Christian Tetzlaff when I was in Göttingen. And the second project is work I did closely with Christian Meyer and Anand Sutomoni. And yeah, thank you. Thank you very much, David. That was absolutely fascinating. I think it's incredible how closely the model matched biology, considering you derived it from first principles; I think that was really cool. I did have a question about the last bit you spoke about, the convolutional neural network. 
And, you know, you said traditionally it has to go all the way end to end, which is really inefficient, and then you showed the results that you got. Did you look at energy consumption with yours as well? Not yet. So we are actually currently working on that. It's actually not so easy to implement these things in standard machine learning toolboxes. Carlisle, the PhD student in Dresden, is currently looking into this. He has an implementation now and is evaluating how well we can make use of this parallelization in practice. We are actually quite confident that the parallelization gain should be there. The question is how much you save in terms of energy, because for the smaller-scale models that we use now, ResNet-18 and ResNet-50, the effect might not be that huge. Once we ramp this up to really larger models, the effects should be bigger. But yeah, this is ongoing work. Very cool. Thank you. And then I was just wondering as well: this local error backpropagation, is that something that other people have tried with these convolutional neural networks, or is this quite a new way of implementing it? There is a bunch of approaches that do this. The closest, I guess, is target propagation, which has been proposed and which essentially uses random feedback weights to backpropagate here. So those feedback weights would not be trained. As far as I know, this works nicely for small-scale problems, but it doesn't perform that well beyond those; even for CIFAR-10 it already starts breaking down, because these random feedback weights are just too coarse an approximation, I think. And this is the first, okay, maybe I have to be careful, I think this is the first method that allows you to train these feedback weights that is not a contrastive method. There is a bunch of methods that use a contrastive step; you've maybe seen this forward-forward algorithm and all these things. 
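The fixed-random-feedback scheme mentioned in this answer can be sketched as follows. This is the generic "feedback alignment" flavor of the idea, with hypothetical sizes and learning rates of my own choosing, not any specific published implementation: the error is routed backwards through a fixed random matrix `B` instead of the transposed forward weights, so the feedback pathway is never trained.

```python
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.normal(size=(4, 8)) * 0.1
W2 = rng.normal(size=(8, 2)) * 0.1
B = rng.normal(size=(2, 8)) * 0.5     # fixed random feedback weights, never trained

x = rng.normal(size=(32, 4))
t = x @ rng.normal(size=(4, 2))       # toy regression targets

loss = lambda: float(np.mean((x @ W1 @ W2 - t) ** 2))
loss_before = loss()

lr = 0.02
for _ in range(5000):
    h = x @ W1
    err = h @ W2 - t
    W2 -= lr * h.T @ err / len(x)
    W1 -= lr * x.T @ (err @ B) / len(x)   # error routed through B, not W2.T
loss_after = loss()
```

This works surprisingly well on small problems, which matches the answer's point: the untrained feedback is only a coarse approximation, and the talk's contribution is a non-contrastive way to train the feedback weights instead.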
But what they always have to do is send in the actual input data and then a kind of anti-input, which is usually generated artificially by distorting the input, and then they have to use both pieces of information locally to do the updates. So the network has to keep the entire input and the responses in memory, and this makes these approaches a bit harder to parallelize. The nice thing here is that we derive an upper bound to this variational free energy loss that can be spelled out completely by forward propagation. And that, I think, is new. That is the new bit of this one. That's awesome. Wow. Yeah, a few comments. One piece between the two of your talks that was at least a new distinction to me was the difference between neuromorphic hardware and neuromorphic algorithms. So it's not just about new hardware or wetware, though that would be great to see. It's almost like there's this intermediate or bridge step: using the algorithms on the hardware we have today. Like Sarah mentioned, with spiking neural networks, which are amenable to GPUs, or just using standard CPU multi-core scheduling approaches, you can already do more with what we have using neuromorphic algorithms. So it's not just a materials science topic. There's also a lot at the really micro scale that we can learn related to noise processing and scheduling, and then also at higher levels of abstraction, probably learning from biomimicry and cognitive systems more generally. But that was a distinction for me. Yeah, maybe to add: I think the problem becomes more pressing now as these neuromorphic hardware devices become more mature. And they usually cannot really shine on the standard machine learning algorithms, because those are really optimized for GPUs. So you need to think a little bit. 
So you have to take one step back and think again about the algorithmic side to really use them to full capacity. And as you have seen, we collaborate with Professor Mayr, who is building the SpiNNaker chip in Dresden, but there are also other approaches, like the Loihi chip from Intel that Sarah mentioned, and so on. And they are really looking into this now, also from the algorithmic side. I find it really useful, kind of like a mindset or a mental framework, when I'm thinking about computers or AI. I suppose because I'm a neuroscientist, I always have to translate it into how the brain works, how the information processing works in the brain. When I came to computer science and AI after neuroscience, I found myself naturally translating it. I feel like the framework is just a really useful way of understanding computation at the end of the day, because our brains, as I said, are just these massive supercomputers. I'm constantly reading papers on computer science or whatever, and you can conceptualize anything really as: well, how neuromorphic is this? Then you start thinking about how you could tweak it so it's slightly more neuromorphic, and whether that is going to give you the gains that we get with the brain. Is it going to give you some extra parallel computation, or some energy efficiency? So yeah, when I was looking at definitions, I think it really depends who writes the definition. Because it's such a dynamic area at the moment, the definition has been changing and will keep changing. But for me, neuromorphic is more of a mental framework that I look at things through and conceptualize with. Yeah, I think it's also not very well defined. 
I mean, you also mentioned that artificial neural networks are a neuromorphic concept, if you want, and they were from the first day. And it's actually a big success story, right? If you look into the 90s or so, when support vector machines and these alternative models came up, none of them have outlived the neuromorphic approaches. So it's actually very nice, but still there is this community that thinks there are more features from the brain that you need to put in to get to the real thing. So I think it's not a very well defined term, actually. And I think with your research, yours is almost the smallest level that I've seen people look at it on. I don't know if you've seen anything else, but we're not just talking about a cell-level view of free energy and active inference; we're actually talking about a subcellular structure. And then you think, well, how small does it go? Are we eventually going to talk about subcellular structures like mitochondria using free energy in a similar way, with compartments? So I guess it'd be interesting to get your thoughts, David: you talked at the start about how synapses are compartmentalized. Do you see different instantiations of this in different compartments within one synapse, almost? Are you wanting to look at that granular level? Or is it more now taking what you've learned from this and putting it back into how we can make the AI more efficient? Yeah, we are going much more in the direction of seeing how we can build this back into AI models. So one has to be a bit careful when using the free energy principle, because it's such a powerful, general framework that you can apply it to basically anything. 
And it will not necessarily come up with a useful result in the end, just from applying it. This basically started as a side project; it was my COVID lockdown project. I was just curious whether you can actually solve this, because I thought the synapse is maybe simple enough. When you go into the papers, at some point they have to make some approximations; they usually do some mean-field treatment, so they go for the first moments, and then you can solve these systems. For more complex objects, even for neurons, it's hard, actually. If you go to the neuron or network level, it's very involved math. But a synapse is simple enough that you can actually do this and spell everything out, if you make the right assumptions, and really just derive these things. And that was kind of just a game I went into. And then it turned out to work quite nicely, I think. I'm not sure about things like mitochondria; I'm sure you could apply the same principles, but I'm not sure whether the results you get would be meaningful or would help you in any way. That's always the risk: you invest so much time, and in the end you get some results and you don't know. That was also something that I thought was quite interesting: that the synapse was the agent. It's really easy to think, oh, we'll make an agent-based model of a neural system. First off, that tends not to include glia or non-neural cell types, but it's almost a doubly unquestioned assumption that the cell would be the agent. But then it was a great transition from the person throwing the ball over the wall as an action-centric approach, where you only have partial visibility of the consequences. And that is the exact scenario the synapse finds itself in. It could have been set up differently, so that a neuron is the agent; we're building maps, not territories. 
And so then, just like you said, the free energy principle is a principle for everything, and so just making principled statements about things is table stakes. I guess my question for you is: what does make it useful? In your learning and tinkering around with these models, what differentiated situations where you applied the free energy principle or active inference and felt it was providing a contribution to your research direction, versus ones where you played around and it turned out to be tautological? I think the free energy principle makes sense in a context where you have incomplete information. So for example, in the synapse case, the question we started with was the problem that the synapse has to solve: it has incomplete information about the state of the soma. That's the assumption of the model at least, and also what we get from the experimentalists: the synapse essentially only sees this back-propagating action potential, so it sees a single binary variable about the state of the soma. So essentially, this is a problem of incomplete information. The second ingredient is that you need an agent, some form of agency. I think if you apply the free energy principle to a system without agency, so if something is not interacting with an environment in a closed loop, then it becomes really sketchy. And I think even this model is on the edge in that respect, because these models don't really have agency, but they at least produce an output, so you can still think of this as an interaction with an environment. But as soon as you lose that, I think there would be simpler models that can just give you the same thing. 
And actually, in the synapse case, the agency in the model is only this adding of noise, because the synapse is triggered presynaptically and then uses its internal state to add the right amount of noise, which is probably already the minimum agency you could imagine. Sarah, do you want to ask a question, or should I? Yeah, it was more just a comment. I think it's quite interesting: people talk about biology, and some people say it's not a real science because it's all messy and noisy. But I think this work is really interesting because, like you say, the synaptic noise is actually reporting uncertainty. So in that sense, it's probably quite accurately reporting on the messy world, rather than the biology itself just being messy. But that's just what I was thinking about. Yeah. I was curious as well: you said this was your lockdown project, but I'm just interested in how you came to use the free energy principle, how you came across it. Was it something you were quite familiar with already, or were some of your network or peers talking about it? Or did you stumble across it in a paper? When I was doing my PhD, I was interested in variational methods and probabilistic methods, and then I started reading about this. So I read a bunch of Karl Friston's papers, and I found them interesting. My PhD supervisor always encouraged me not to go in that direction. And after I finished my PhD, I thought, okay, now I can do what I want, so I tried it out. So yeah. 
And then do you think it would be worthwhile, as a next step for you or for the field, to try to implement this on maybe some of the more analog chips being built in the space, the analog neuromorphic chips, which I know can have stochastic synapses and things like that? Do you think it'd be worthwhile trying to implement it on hardware, or what are your thoughts on that? I mean, the triplet rule that comes out of the first work I showed, I think it would be interesting to implement that. The nice feature is that it should in principle be self-stabilizing, because it's really mimicking the dynamics of the cell membrane. So if the neuromorphic hardware's synapse and neuron models match up very well, the model should give you this nice self-stabilizing feature, so that neurons really don't go into some epileptic state or so, and you get this for free from the model. That's what we saw in the simulations, at least. But in the simulations, of course, we had full control over the dynamics matching up in the right way. That is probably a bit more tricky for hardware, but it's probably solvable. So it would be interesting. What you get for it is a purely event-based rule, right, which only uses pre and post spikes, which is nice. Very cool. Thank you. Daniel, if you have questions. Well, there's a great principle there: if you can design the neuromorphic algorithm so that it harnesses a material feature, like the actual leaky permeability of a membrane or actual spatial proximity, an analog feature that isn't virtualized, then it's already an adjacency into future hardware. So that's one great point. And then to Sarah's point about biology almost not being a science, there's a famous quotation: there will never be a Newton for a blade of grass. 
Because some people say biology is different, more like history: whether you approach it from a development, ecology, or evolution perspective, biology is a historical science, not a "real" science. And that reminds me of the cross-country shirt that says, our sport is your punishment. So it's like, well, no: your noise is biology's signal, and that's how it happens. My question was about this tension between, I guess, neural and computational ways of looking at the resources associated with computation. From the von Neumann paradigm, we have a lot of shared reference points: CPU cycles, RAM capacity, and all these kinds of things. And even in your introductions, you conveyed, well, this is how many CPU cycles it goes through, or this is how many parameters would have to be stored, or something like that. However, that's referencing another paradigm. So what do resource or capacity descriptors look like outside that space? Okay, power consumption is something you can put into a box and measure with a bomb calorimeter; that's kind of low-hanging fruit. But beyond the sheer energy or caloric requirements, what can we say that is analogous to the way we talk about the processor or the RAM or the hard drive of a computer? I mean, yeah, I'd have to think about that one some more. I do think there were some interesting comments on that in the paper on the slide I showed about the brain and energy. There was a paper I linked to; I'll have to get the reference and let you know what it is, because the QR code has gone now. But that had some interesting ideas, I think, on what you're getting at there. But yeah, I'd have to defer to the paper. And how do they describe what is being designed? So they say it has this many of this type of component, and then that might do nothing, though. 
So how do they describe or evaluate these different designs or algorithms? I think it's all different depending on the use case. That's what I've found: the language is different depending on whether it's written by someone with more of a neuroscience background or an engineering background, and you kind of get used to it. Sometimes the terms are more interchangeable than others, but I do think the terminology is something that needs to be looked at a lot more closely in this space, because I think that will help everybody working in it to be a little closer to the same page. Cool. Well, any other thoughts or questions? David first, and then Sarah, I'm also very curious what direction this series will go. But first, David, are there any other closing comments or directions you want to share? Not really. I would say thanks for having me today. It was really a pleasure to discuss with you. Thank you, David. It was amazing to have you on. I think your work's absolutely fascinating, and I think it's going to have lots of benefits in the future for implementation, which is always nice to see as well. Yeah, so what was the question? Where do I see the series going? Hopefully we can have a new guest each month. I think it'd be kind of cool next month to have someone who's building hardware, so maybe someone on the BrainScaleS team or SpiNNaker team or something like that would be pretty cool. But really, I just want to have a space for people who are interested in this intersection to meet people, see talks, and reach out to others who are also working in the space, because it's pretty niche, but I think it's pretty important. Actually, having said that, David, could you let everybody know, if they wanted to reach out to you, what's the best way for them to do that? Cool. I'm not very active on this Discord channel, so maybe email is still the best way to reach me, I guess. Cool. 
Do you want to give your email? Oh, I think my email should be easy enough to find, but you can also give the email out there. People can check the papers and then in the active inference institute Discord, there's the neuromorphic channel. All right. Thank you, David and Sarah. Really cool to see Morphstream kick off its developmental trajectory this way. So till next time. Thank you. Bye. Bye.