Thank you very much. Well, it's a pleasure to be here, and thank you for inviting me to this beautiful place. It's not too loud, right? Okay, so, many years ago (let's see if I can make this computer work), about the time I met Yann, actually, which was about 25 years ago, we were playing with things like robotic catching, and by the time we were done we were pretty good at it. It was actually pretty hard for the robot not to catch a ball. And again, this is mostly to date things: about 25 years ago.

But we also realized we would have to get a little organized if we wanted to develop, say, a basketball player, which actually we had no intention of doing, because even getting this to work reliably on lots of objects required about a hundred thousand lines of code at the time. So we figured we should get a little organized.

Of course, the usual place to look for ideas is nature. If you think about the brain, it has about 10^11 neurons, and about 10^4 connections per neuron. These are big numbers, and they have a number of implications. First of all, since this is a workshop which has a lot to do with graphs: it means the brain is a very sparse graph, because you have 10^11 nodes and only 10^4 connections per node. So it's an extremely sparse graph.

The other thing is that if you take these 10^15 connections and do a very quick calculation, it means that from the time a baby is born to the time she goes to college, she creates about a million new connections per second. This is just back-of-the-envelope arithmetic: you get about a million new connections per second. And of course it's much more than that, because actually there's lots of pruning.
So in a sense you have this very big dynamic system, with lots of feedback and so on, an extremely active system. Another number, by the way, is that it's using 20 watts. It's not using a power plant or a dam to do its computations; it's using 20 watts. And the other number is that it's doing all these things, which we are now trying to reproduce and perhaps excel at with deep learning, with hardware which is desperately slow. A neuron works about 10 million times slower than a transistor. It's as if you were told: budget cuts, suddenly your hardware is 10 million times slower, have a good day. But that's exactly what the brain is doing: working with this desperately slow hardware.

One of the main properties that brains and biological systems have is that they are the result of evolution, and evolution is something that works by piling up things that work, both through evolution and through development. I'll go through a few examples quickly. You have this in the immune system: you have innate immunity, which is old and fast, and you have adaptive immunity, the basis of vaccination and so on, which is much slower and involves some kind of memory. So the immune system is actually a very cognitive thing, if you think about it. Similarly, if you think of motion control: in vertebrates, or at least in some classes of vertebrates, motions are obtained by combining elementary force fields.
And just a few of them. I won't get into the details, but basically you have motor primitives and motor synergies, and you combine just a few of them to get all of the motions; in the case of the leg of the frog, about just four motor synergies.

In the brain there's also a lot of feedback. A lot of what we see today in learning systems is, at least at run time, very much feed-forward. But there is feedback, as we know. This is kind of an old drawing, but there are millions of connections feeding back from the cortex to inner nuclei, to the thalamus. The thalamus is the place in the brain through which all the information that eventually reaches the cortex has to go, all the sensory information, except for smell, because smell is a very old sense; even bacteria use chemical sensing to move up chemical gradients. But apart from smell, all sensory information that reaches the brain eventually has to go through the thalamus. On top of this drawing I put two more recent pieces of information, one from Sherman and Guillery and one from Rodney Douglas and his group in Zurich. If you look at just the connectivity of the thalamus, at its anatomy, the connections that come from the senses are only 5%. So the data is very sparse. And similarly, if you take a neuron in the cortex, most of the information that comes to it is from neighboring neurons, very little from the thalamus. In other words, you have this huge dynamic system, but the input is very, very sparse. And by the time it arrives at the cortex, it is really sparse.
So it's a little as if the natural state of the brain were something like dreaming, and from time to time you constrain it with very sparse reality; but the whole dynamics is there, whether you constrain it to reality or not.

And of course in genetics you have similar things. As you know, 98% or so of the human genome does not directly encode protein sequences, and a lot of what doesn't encode proteins is used precisely for regulation and control.

Similarly, we have all of these beautiful and very powerful mathematical techniques, wavelets and so on, which are used all the time on the internet and elsewhere to transfer data. Still, at least last I checked, there is no known computer algorithm that can reliably listen to a jazz quartet and extract the trumpet. It seems really easy, right? Any two-year-old child can do that: once you've told her what the trumpet is, she can sing back the trumpet part. But there's no computer algorithm that can do that at the moment. People are trying, but this may have a lot to do with the fact that the auditory system is not this kind of simple hierarchy; it's a very strongly feedback-connected structure.

Similarly (and this is a long introduction, just so as not to do only math), the brain uses a lot of prediction.
Largely because, well, one of the reasons is that it is dealing with these desperately slow components. So of course it's doing prediction when catching, as our robots did. But it's also doing prediction when crossing the street, because presumably you do that differently in Paris and in London, or you should. Your mother may have told you that it's important to wake up at regular hours. She may not have known why, but it's because hours before you wake up, your brain starts preparing your body for waking up. That starts hours before, and if you keep changing the hours you wake up, after a while your brain decides it's not worth it.

Similarly, conditioning: all of conditioning is based on prediction. The placebo effect: 42% effectiveness, or something like that, of a sugar pill as compared to a so-called real drug. Unless there's something really magical about sugar, this is something that should be studied much more. Wagner operas I don't have time to talk about, but I'll be happy to answer questions about them. And illusions: illusions are also a reflection of the fact that the brain is using very quick mechanisms to do prediction, and sometimes you can fool them. In a sense, it's a feature. Some people have been working on illusions to try to understand the brain.

One of my favorite such illusions is from my colleague Ted Adelson at MIT. Anybody looking at this picture will see this as a gray square and this as a white square. But of course, if you photoshop this and just get rid of everything around A and B, you'll find out they're exactly the same. That's because your brain is making all these assumptions about what's going on in the scene.
And so you interpret A as gray and B as white, but of course they're not. And it's a feature, because nobody said the brain should be a photon detector; the brain is meant to interpret scenes. So it's a feature.

Of course, synchronization is also something which is used everywhere in the brain: coincidence detection, mirror neurons, binding. We'll come back a little to that. And you're probably all familiar with these kinds of pictures, which are still pictures but can actually make a lot of people sick. That's a reflection of the fact that your visual system and the brain form a dynamic system: you see all these things moving when none are.

So let's get more into the math, if I may say. This is not profound math compared to a lot of what's done here; these are rather simple things. We were interested in building theories that allowed us to deal reasonably simply with questions of nonlinear dynamic behavior, simple nonlinear dynamic behaviors, the effect of feedback, and particularly with the idea of being able to pile up things that work and still have something that works.

There's something in control and dynamic systems which is of course very well known, which is Lyapunov theory, and Lyapunov theory is a sort of virtual physics, more precisely a sort of virtual mechanics. At some point I had this extremely brilliant student, Winfried Lohmiller, who came to work with me, and we started wondering whether, instead of doing virtual mechanics, we could do something like virtual fluids. We ended up with what we call contraction theory, which turned out to have a lot of past history in various forms, but let me present it to you the way we played with it.

So we need a definition of stability. We have this nonlinear dynamic system: the time derivative of the state x is equal to some function of the state and time, x' = f(x, t). Here we assume ODEs.
We could do this with PDEs as well, to some extent. By definition, we'll say this system is contracting if any two solutions converge to each other exponentially. In other words, if you start from any two arbitrary initial conditions in this nonlinear system, the solutions end up doing the same thing, exponentially fast. And the theorem is that this is true if and only if the Jacobian of the system, the linearization of the system everywhere, is negative definite in some metric. So it's a little weird at first that we can deal only with the Jacobian, because it's a nonlinear system; but it's not the Jacobian at one point, it's the Jacobian everywhere. The Jacobian everywhere has more or less the same information as the entire function; it's just a different way of seeing it. So that's the definition, and that's the theorem.

Let's maybe get just a little more technical. What does "in some metric" mean? It means there exists an invertible transformation Theta(x, t) such that M = Theta-transpose times Theta is positive definite; that's the metric. And then there's something we call the generalized Jacobian, which is basically the Jacobian of the system computed in this metric. The proof, at least the proof of the sufficient condition, is extremely simple, just a few lines. To some extent it's Riemann coming to the rescue of Lyapunov: basically, what you do is a sort of Lyapunov proof for differential displacements, and then you take a path integral at fixed time. You compute the length of a path, and the length of the path shrinks exponentially, because every little element shrinks exponentially.

The style of doing things with contraction is a little different from the usual style, in the sense that you don't have to play with an error signal. So, for instance, you don't
have to. You can, but consider for instance the Lorenz attractor, which is definitely not a contracting system. Suppose you're trying to build an observer for the system, in other words something that would measure, for instance, x and would reconstruct the other two states y and z. Suppose you're very lazy, so you just copy the last two equations, putting hats on y and z to mean that the hats are the estimates that the observer generates. So these are just the last two equations, copied. What can we say about this observer?

Well, if we pick an identity metric, in other words we just look at the Jacobian: the Jacobian is this. Now x is a function of time, of course, but the Jacobian is obviously negative definite, because once you take the symmetric part you just get the diagonal. So this system is contracting, with a rate which is the minimum of 1 and beta. And of course it has a particular solution: the real system, because it was a copy of the real system. So the real system is a particular solution, and this observer therefore converges to the real system with a rate which is the minimum of 1 and beta. Because it's a contracting system, any two solutions converge to one another; we know one solution, namely the real system; so this observer exponentially converges to that solution. It's a one-line proof, if you want. And it has no error signal.
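As a sanity check, this "lazy" Lorenz observer is easy to simulate. The following is a minimal numerical sketch, not the speaker's code: it assumes the standard Lorenz parameters (sigma = 10, rho = 28, beta = 8/3) and forward Euler integration, and all variable names are mine. The observer just copies the last two Lorenz equations, driven by the measured x(t), and its error should contract at rate at least min(1, beta).

```python
import math

# Standard Lorenz parameters (chaotic regime).
SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0

dt, T = 1e-4, 5.0
x, y, z = 1.0, 1.0, 1.0      # true state
yh, zh = -20.0, 40.0         # observer estimates, deliberately far off

t = 0.0
while t < T:
    # True Lorenz dynamics.
    dx = SIGMA * (y - x)
    dy = x * (RHO - z) - y
    dz = x * y - BETA * z
    # "Lazy" observer: the last two equations copied, with hats,
    # driven by the measured signal x(t).
    dyh = x * (RHO - zh) - yh
    dzh = x * yh - BETA * zh
    x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
    yh, zh = yh + dt * dyh, zh + dt * dzh
    t += dt

# The observer Jacobian [[-1, -x], [x, -BETA]] has symmetric part
# diag(-1, -BETA), so the estimation error shrinks at rate min(1, BETA) = 1.
err = math.hypot(yh - y, zh - z)
print(err)  # starts near 44, ends below 44 * e^{-5}, i.e. well under 1
```

Note there is no error signal anywhere in the observer: convergence follows purely from contraction plus knowing one particular solution.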
It has just the fact that the observer is contracting and that you know a particular solution.

So, as advertised, these contracting systems, if you consider them as Lego blocks, have very nice aggregation properties. Parallel combinations with positive coefficients, negative feedback, series and cascades, hierarchies, translation and scaling, and combinations of all of the above all preserve contraction. In other words, once you have the contraction properties of the dynamic building blocks, you can combine them in pretty much any way you want using simple rules, and automatically the entire system will be contracting, and therefore will verify the property that any two solutions tend exponentially to each other.

At that time we said, well, maybe nature is using this idea too, because it certainly helps if, when you take two things that work and connect them, the result is at least stable. So we said it's probably a good idea, for modeling natural objects at least beyond a certain scale, to have this property. We didn't call it that way, because the word hadn't yet been invented, but it's an instance of what's now called evolvability: the notion that there may be certain things that are done by biology to make evolution easier, and this could be one of those.

I'll skip the proofs of these combination properties; they are very simple. For instance, for hierarchies you have this notion of a composite variable: if you have, say, a second-order system, you can use a composite variable which is a weighted sum of position and velocity, and control that variable instead. You reduce the order of the problem, but you have an equivalent problem, and that corresponds to creating a hierarchy of contracting systems.
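A minimal numerical sketch of the cascade/hierarchy property, with made-up toy blocks (the systems and gains here are my own choices, not from the talk): each block below is contracting on its own, with a constant negative Jacobian, and the cascade inherits the property, so two arbitrary initializations of the whole hierarchy collapse onto the same trajectory.

```python
import math

def step(state, t, dt):
    # A cascade (hierarchy) of two contracting blocks:
    #   x' = -x + sin(t)   (contracting at rate 1)
    #   y' = -2y + x       (contracting at rate 2, driven by x)
    # Each block is contracting by itself; the cascade preserves contraction.
    x, y = state
    return (x + dt * (-x + math.sin(t)),
            y + dt * (-2.0 * y + x))

def run(state, T=12.0, dt=1e-3):
    t = 0.0
    while t < T:
        state = step(state, t, dt)
        t += dt
    return state

# Two arbitrary initial conditions for the whole hierarchy...
a = run((10.0, -7.0))
b = run((-4.0, 9.0))
# ...end up doing the same thing, exponentially fast.
gap = math.hypot(a[0] - b[0], a[1] - b[1])
print(gap)  # tiny: the initial gap has shrunk by roughly e^{-12}
```

The same pattern extends to feedback, parallel, and hierarchical combinations: check contraction of each block, then the combination rules do the rest.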
Motor primitives, in robotics or in frogs: again, if the basic motor primitives are contracting, then any combination of these motor primitives with time-varying coefficients will be contracting. Entrainment: if you take a contracting system which is driven by an input which is periodic in time, then the system will tend towards a state which is periodic in time. There are robustness results, and so on. You can have alternate-norm versions: this was with the Euclidean norm, but you have equivalent criteria with the 1-norm and the infinity norm, which have to do with diagonal dominance.

Suppose you have a network of dynamic systems. Using a very standard result, you can replace this network by a directed acyclic graph of strongly connected components; this is a completely standard algorithm to take a big network and make it look like a cascade of strongly connected components. Since hierarchies of contracting systems are contracting, if each of these strongly connected components is contracting, so is the entire result. And they could all be contracting in different metrics, and so on.

Getting back to biology and evolvability: there was this paper, completely independent work of course, by Gerhart and Kirschner in 2007, which talked about facilitated variation. The idea (and these are real biologists) is that in a sense you have core processes which are highly conserved in evolution, things like DNA transcription or DNA replication, things like sexual reproduction, and so on. These core processes you see in lots of animals, and evolution is mostly targeting the way you combine these core processes.

In terms of philosophy this is very close to what we were saying, because we say, well, you have these kinds of core dynamic systems, and then you can combine them any way
you want, and you will still have a contracting dynamic system, and you can play with the combinations to get the functionality that you want.

Similarly, people like Uri Alon have been looking at the fact that if you look at networks in biology, you don't find random networks; you see some very specific so-called network motifs. His group found those in a number of natural networks. If you wonder what these network motifs correspond to: of course, a motif embedded in a network behaves differently from the motif by itself, so studying the motif by itself is not particularly helpful. But what you can show is that these specific motifs are precisely the ones which preserve contraction best; in other words, which, given the stability properties of the elements, optimize the stability properties of the result. For instance, if you plot the contraction gain, or the contraction loss, against whether you have a motif (something that you find very often) or an anti-motif (something that you find very rarely),
you see a very strong correlation between finding a motif often and having a minimal contraction loss as you combine these things.

Similarly, if you have time delays, you can, under very simple conditions, say that if you have two contracting systems, there are specific kinds of couplings which preserve contraction under the time delay, and these couplings have to do with the metric. For instance, if you take two linear stable systems, which are obviously contracting, and you couple them using what's called a PD controller, namely a controller which has something proportional to the position error and something proportional to the velocity error, this is not contracting in the right metric, and as a result the system will be unstable. If, however, you just use a D controller, something just proportional to the error in velocities, then that is contracting in the right metric, and therefore the error will be stable. So you can take these systems and, based on their contraction properties, this gives you constraints on the architecture to make sure that the overall system is contracting. You can see this in this kind of more general diagram that by now you see all the time, with feedback: this result basically tells you that when you have these kinds of predictive feedback hierarchies, feedforward and then feedback, stability demands on the overall system specify some very specific kinds of connections, if you want to be stable in the presence of delays.

Now, these questions (related to Yann's talk yesterday) turn out to apply directly to questions of optimization, continuous-time optimization. Because if you think about this condition that the Jacobian is negative definite, you can equivalently write it as a matrix condition on the metric.
This matrix condition on the metric, as we said, implies that any two trajectories will tend exponentially to each other. More technically, it says that the length of any geodesic with respect to this metric will decrease exponentially at rate alpha.

Now, think of the simplest gradient descent. If you take a gradient descent flow and you take its Jacobian, you find the Hessian. So this gradient descent is going to be contracting if the Hessian is positive definite, and the minimum eigenvalue of the Hessian will tell you the contraction rate. That's fairly obvious: just take a gradient, and take the Jacobian to get the Hessian; you don't need a metric, just an identity metric.

What's a little more fun, which in a sense was floating around when we thought about contraction but we never said explicitly, is that this is directly applicable to everything you can say about the natural gradient. So think about the natural gradient, very much a continuous-time version of what Ian showed us two days ago: x' equals minus the inverse of a metric times the Euclidean gradient. The natural gradient direction is the direction of steepest descent
according to distances measured on the space equipped with the Riemannian metric. So that's the gradient in that steepest direction. As you know, a function is alpha-geodesically convex (g-convex) if it's alpha-convex in the length parameter along any geodesic curve, or equivalently if its geodesic Hessian is larger than alpha times M, where the geodesic Hessian is the Euclidean Hessian plus some Christoffel terms.

So the point is: if you take this natural gradient descent, then a function is going to be alpha-strongly g-convex in the metric M if and only if its natural gradient system is contracting at rate alpha in the same metric. In the same metric. In other words, if your function is strongly g-convex in the metric M, the natural gradient flow is going to be contracting in the same metric M. This comes from a very obvious, purely algebraic computation, which, if you see it from the tensor point of view, is even more obvious; but let's just do the algebraic version. The point is that this matrix here, the one you're trying to make negative definite for the system to be contracting, is precisely minus two times the geodesic Hessian. So the geodesic Hessian is one-to-one related to this contraction property, and therefore the natural gradient will be contracting in this same metric.

You can apply this also to the non-strict contraction case, the case where you don't have strict convexity and you may have lots of local minima and so on. There are lots of results known about this, from Rapcsák and others, but the fun thing is that you can prove all these results purely from a dynamics point of view. A very simple dynamics point of view.
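The equivalence between strong g-convexity in a metric M and contraction of the natural gradient flow in that same M can be illustrated numerically. This is a toy sketch under my own assumptions (the function, the constant metric, and the step size are all made up for illustration): with f(x) = 2*x0^2 + 0.5*x1^2 and M = diag(4, 1), the natural gradient flow x' = -M^{-1} grad f(x) is simply x' = -x, so distances measured in the metric M should shrink at exactly rate alpha = 1. With an identity metric this reduces to plain gradient descent, whose contraction rate is the minimum Hessian eigenvalue, as stated above.

```python
import math

# f(x) = 2*x0^2 + 0.5*x1^2, metric M = diag(4, 1).
# Natural gradient flow: x' = -M^{-1} grad f(x) = -x, i.e. f is
# 1-strongly g-convex in M and the flow contracts at rate 1 in M.
M = (4.0, 1.0)

def grad_f(x):
    return (4.0 * x[0], 1.0 * x[1])

def nat_flow(x, T, dt=1e-4):
    # Forward-Euler integration of the natural gradient flow.
    for _ in range(int(round(T / dt))):
        g = grad_f(x)
        x = (x[0] - dt * g[0] / M[0], x[1] - dt * g[1] / M[1])
    return x

def m_dist(a, b):
    # Distance between states, measured in the metric M.
    return math.sqrt(M[0] * (a[0] - b[0]) ** 2 + M[1] * (a[1] - b[1]) ** 2)

x0, y0 = (30.0, -10.0), (-5.0, 25.0)
d1 = m_dist(nat_flow(x0, 1.0), nat_flow(y0, 1.0))
d2 = m_dist(nat_flow(x0, 2.0), nat_flow(y0, 2.0))
rate = math.log(d1 / d2)   # empirical contraction rate per time unit, in M
print(rate)  # close to 1, the strong g-convexity constant
```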
So, for instance: if the natural gradient dynamics of f are semi-contracting in a certain metric M, then f is g-convex, every stationary point of f is a global optimum, and any geodesic between two optima is composed of global optima. That's a property of geodesically convex functions, but it's also something that you can prove very easily using these dynamic tools, simply by taking two equilibria of the system and a path between these two equilibria, initializing the path any way you want. Basically, how this path transforms through the flow of the system will tell you immediately that f(x1) will tend towards f(x2), and therefore the values at these equilibria must be the same; and therefore, if one is a global optimum, they will all be global, because there's one global optimum somewhere. So you can prove that very easily using these dynamic tools as well.

Similarly (I won't get into the details, especially because I'm running out of time), you can play with tools like the Bregman divergence, or its specialization to the KL divergence in the case of probability distributions, or in the case of positive definite matrices, and you can prove all sorts of things. Typically, the Bregman divergence is convex in the first argument, but not necessarily in the second. You can show, using these simple dynamic-system tools, when it is geodesically convex in the second, simply by playing the same games
we just did. So, for instance, the discrete KL divergence is g-convex in q, and similarly this one is going to be g-convex in the matrix Q. Basically, you can use this formula here, which is a purely algebraic formula, to derive all these questions of convexity, quasi-convexity, partial convexity, and so on, purely from a dynamic-system point of view.

Now, the other thing is that this formula is also true when the metric depends on time. In other words, it applies to non-autonomous systems as well. So all these combination properties we explained for contracting systems, negative feedback, hierarchies, and so on, apply immediately to these natural gradient systems, in any metric. For instance, you can have a hierarchical natural gradient: if you have a natural gradient with metric M1, and, in a completely different state and dimension, a natural gradient with metric M2, then, assuming each of them is strongly g-convex, the overall system will be globally exponentially contracting. And that's of course backprop. That's backprop, except it's backprop in a concurrent version: basically you have a series of gradients, or natural gradients, but they all run at the same time. And you show that the entire thing is contracting, with a rate which you know explicitly from the contraction analysis, in this case the rate of the slowest subsystem. Of course there's nothing special about two systems; you could have ten, or 200, as we were shown the first day.

Similarly, you can do negative feedback between two gradient systems. For instance, you could have something like that in adversarial learning, where you have very much a game-theoretic kind of setup. If you have negative feedback between these multiplayer games, then, assuming that f1 is strongly g-convex in x1
and f2 is strongly g-convex in x2, and you have this negative feedback, you're going to converge to a unique Nash equilibrium. So this could in principle be applied to these adversarial settings. And incidentally, Ian's talk on Monday also suggested very specific kinds of natural gradients and metrics, which would apply directly here as well.

Similarly, if you have primal-dual kinds of optimization (it's very similar to the negative feedback case), then you can do geodesic primal-dual, by putting metrics and metric inverses in the computation. You can give very simple conditions: if the Lagrangian is g-convex over x and g-concave over lambda, then the entire thing will be contracting, with the block-diagonal metric diag(Mx, M_lambda). And these are all two-line proofs, so you can play very easily with these primal-dual optimizations using these metric tools.

Now, this is related to your question here. We've been talking about contraction: any two trajectories end up doing the same thing. In particular, if the system is autonomous, not time-varying, it means they tend toward an equilibrium; in general, they tend toward a unique trajectory. Now, how about things like synchronization? How do you deal with oscillators, or how do you deal with running multiple optimizations in parallel and sharing their results? The basic idea, at least in our context, is what we call partial contraction. Suppose, for instance, you have two systems.
Let's say two oscillators, and you want to show that they synchronize if you couple them in a certain way. Neither of the oscillators is contracting, because once they're on the limit cycle, they don't catch up along the limit cycle. But you can show that they synchronize if you can exhibit a third system, a mathematical system, which is contracting and has these two oscillators as particular solutions. Similarly, if you have a system whose solution you want to show tends, let's say, to zero, you can also play with virtual systems this way.

For instance, since we were just talking about the optimization setting: suppose you have a natural gradient, but with a time-varying and state-dependent learning rate P(x, t). When does this work? Well, you can build this virtual system, a purely mathematical system: y' = -P(x, t) M(y)^{-1} grad f(y). Think of it this way: its Jacobian is basically the usual Jacobian you would have without P, just multiplied by P(x, t), because for this virtual system P(x, t) is just an external factor. Therefore this system is contracting. But of course it contains x and the minimum of f as particular solutions, and therefore x tends exponentially towards the minimum of f. You see, the subtlety is that we didn't take the Jacobian of the original system, which would have been a mess; we took the Jacobian of this virtual system, which is basically the usual Jacobian times P(x, t). So this shows that if f is alpha-strongly g-convex in M, then this system converges exponentially to the minimum of f, with rate alpha times the minimum of P.
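The virtual-system argument for a time-varying learning rate can be checked numerically. This is a toy sketch under my own assumptions (the rate schedule p(t), the function, and the horizon are hypothetical; the metric is taken as identity): f(x) = x^2 is 2-strongly convex, and p(t) is bounded below by 0.5, so x should converge to the minimum at rate at least 2 * 0.5 = 1.

```python
import math

def p(t):
    # Hypothetical time-varying learning rate, bounded below by 0.5.
    return 1.0 + 0.5 * math.sin(3.0 * t)

def grad_f(x):
    # f(x) = x^2: 2-strongly convex, minimum at 0.
    return 2.0 * x

# Flow x' = -p(t) * grad f(x).  The virtual system's Jacobian is the
# usual one (-2) multiplied by the external factor p(t) > 0, hence
# contraction at rate at least 2 * min(p) = 1.
x, t, dt = 12.0, 0.0, 1e-3
while t < 10.0:
    x -= dt * p(t) * grad_f(x)
    t += dt
print(abs(x))  # far below 12 * e^{-10}: exponential convergence to the minimum
```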
Similarly, in biology this has a long history; Bonnie Bassler is extremely active in this field at the moment, among many other biologists. There's this idea of quorum sensing, which we can also think about from the point of view of optimization. The idea of quorum sensing is the following: a bacterium (assuming it's a bad bacterium) by itself doesn't have much of a chance to do anything to its host. So to do things to its host, it multiplies. It grows, and at some point there are enough bacteria that they do have a chance against the power of the immune system, so they switch on different behaviors and, in coordination, try to attack the host. The question is: how do they know how many they are? How do they know there are enough bacteria to have a chance against the immune system?

So let's take a simplified version of this. Suppose we start with a number of dynamic systems, not necessarily contracting, and we couple them all-to-all. This is equivalent (this is just algebra) to creating a common quantity, the sum of the xj, from each of the xj, and feeding it back to everybody. In other words, if you're trying to create an all-to-all connection, that's of order n squared; but actually you can create it by making a common signal and feeding it back to everyone, which is of order n. And it's the same thing, you haven't lost anything. That's exactly what the bacteria do: they put in the environment a chemical called an autoinducer, and then measure the total amount of that chemical. This way they know how many they are, this becomes a synchronization mechanism, and it's exactly equivalent to all-to-all coupling. Now, how do we show under which conditions
Well, these are the real systems, i = 1 to n. We pick a virtual system of the same dimension as one of them: ẏ = f(y, t) − k n y + k Σᵢ xᵢ, where n is the number of systems and k is the gain. We see immediately that, if we look at the Jacobian of this system in an identity metric, the sum term disappears when you take the Jacobian, and you just get the Jacobian of f minus k n times the identity. Therefore this system is contracting if k n is large enough — meaning either the gain is large enough, for instance because the environment is confined enough, or you have given them more time to multiply, so n is larger. So this is contracting if k n is large enough, and as soon as it becomes contracting, you are guaranteed that all the individual elements will synchronize towards one another, because they all happen to be particular trajectories of this contracting virtual system. And of course you can use exactly the same thing in optimization: if you use many gradient or natural gradient systems, and you create this common variable and send it back to everybody else, then you immediately have an algorithm which, in order n, creates all-to-all coupling and guarantees synchronization above the threshold. And just for fun, because it has been a long day, I can show you something we did about ten years ago with Patrick Béchon and Aldebaran Robotics. It's kind of silly — well, not silly, but it was really a toy thing we did very, very quickly.
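Before the demo, the contraction threshold just described can be checked in a few lines. This is my own toy choice of f (scalar, with f′(x) = 1 − 3x² bounded above by 1), so that with k · n = 5 the virtual system's Jacobian f′(y) − k n is uniformly negative and the network must synchronize.

```python
import numpy as np

# Quorum-sensing synchronization sketch (hypothetical scalar dynamics):
# n systems  x_i_dot = f(x_i) - k*n*x_i + k*S,  with S = sum_j x_j.
# The virtual system y_dot = f(y,t) - k*n*y + k*S has Jacobian
# f'(y) - k*n; here sup f' = 1 and k*n = 5, so it is contracting
# and all x_i must converge to one another.

def f(x):
    return x - x**3          # f'(x) = 1 - 3x^2 <= 1

n, k, dt = 10, 0.5, 1e-3
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=n)     # spread-out initial conditions

for _ in range(5000):                  # integrate for 5 seconds
    S = x.sum()
    x = x + dt * (f(x) - k * n * x + k * S)

print(x.max() - x.min())   # spread collapses: all systems have synchronized
```

The pairwise differences contract at rate at least k n − sup f′ = 4, so after a few time constants the spread is numerically zero — and the coupling cost grew only linearly in n.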
So we had a bunch of Nao robots; they had been taught to dance, and then you push one of them down on the floor, and it has to get back up and get back in sync. We did that about ten years ago using exactly this quorum-sensing idea, where the server was used as the quorum variable: they all send their positions to the server, and it sends them back to everybody else. This is what it looks like — okay, without the music. So this is Patrick. You put one of them down, gently, and the other ones stay synchronized through this quorum sensing, and then the fallen one catches up with what everybody else is doing. Of course the demo is nicer when you have, like, a thousand, because the point is that they all still synchronize despite the disturbances — it's a feedback system, so it's very robust. And this is the entire math; again, it's a one-line kind of proof.

These oscillator synchronizations have a bunch of non-intuitive properties, and I'll skip some of them because of time. One of them, of course, is that you can get global synchronization from local interactions. Another is this leader-following property, which you can show. Suppose you take, let's say, a million oscillators — or perhaps optimizers — and you couple them, perhaps locally, so that they get global synchronization, or through a quorum-sensing mechanism. Now suppose you drop in there one more element which is just like the others: it sends information locally to the others, or perhaps to a quorum variable, but doesn't get any feedback. So you have your million oscillators, and you add one oscillator which connects to the others, perhaps locally, but doesn't get any feedback. What can you show?
Well, you can show that everybody still synchronizes — but of course, since the new element couldn't care less about the others, they synchronize to it. In other words, you have these million oscillators, you add a million-and-first oscillator — a very small change in the system — and suddenly everybody's phase becomes his. In the Bush era, and probably even worse today, this would just be called the leader: the system that doesn't listen to the others, but that everybody has to follow. You have just this one very sparse change, and because everybody still synchronizes, they have to synchronize to him.

The same math shows something else. Say you have a million oscillators, all globally synchronized, and you say: wait a minute, I don't like this synchronization, because it happens to be called epilepsy, or it happens to be called problems in the power grid, or something like that — I want to get rid of this oscillation. What can I do? Well, you can take your million oscillators, take two of them at random, and add one inhibitory connection — one connection just like the others, but with a minus sign — between these two oscillators, and everybody will stop. The reason is that before, you had a virtual system which was contracting; by adding this connection, the real system itself now is contracting, and the real system, being autonomous in that case, tends towards an equilibrium point. So you can show this — that's why I was saying it's non-intuitive. Here is an example with FitzHugh-Nagumo oscillators, a simple model of the Hodgkin-Huxley oscillator. We take one variable and plot it; I think it was a thousand oscillators. They synchronize very quickly from random initial conditions; you put in this one inhibitory connection, they stop; you remove the connection, they resynchronize.
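The leader-following property itself is easy to reproduce with toy dynamics (my own scalar example, not the talk's million-oscillator simulation): one "leader" runs free and broadcasts its state, each follower adds a feedback term strong enough to make it contracting, and every follower then converges to whatever trajectory the leader happens to follow.

```python
import numpy as np

# Leader-following sketch (hypothetical dynamics, chosen for simplicity):
# the leader runs x_dot = f(x, t) with no feedback from anyone; each
# follower adds k*(leader - self). The error e = follower - leader obeys
# e_dot = f(x_f) - f(x_l) - k*e, and since f'(x) = 1 - 3x^2 <= 1,
# choosing k = 5 contracts the error at rate at least k - 1 = 4.

def f(x, t):
    return x - x**3 + 2.0 * np.cos(t)

k, dt = 5.0, 1e-3
leader = 1.5
followers = np.array([-2.0, 0.3, 1.0])   # arbitrary initial conditions

for step in range(5000):                 # integrate for 5 seconds
    t = step * dt
    new_leader = leader + dt * f(leader, t)          # no feedback in
    followers = followers + dt * (f(followers, t)
                                  + k * (leader - followers))
    leader = new_leader

print(np.max(np.abs(followers - leader)))   # essentially zero: all follow the leader
```

Note the asymmetry: the leader's dynamics are untouched, so the synchronized phase is entirely his — the very sparse change that decides what everybody does.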
So in other words, the synchronization is very robust to changes in parameters — and we'll come back to that if we have time — but it's very fragile to changes in topology: just adding this one inhibitory connection completely changes the behavior. And from a philosophical point of view, think of what we said before about the thalamus and so on. We were saying: you have a system which works, but a very small, very sparse change in the data changes the behavior a lot. Here we're showing that a very sparse change either in the input — in the case of a leader, which plays the role of the data — or in the topology of the connections will completely change the behavior of the system.

Now of course, when we talk about synchronization, it applies to optimization as well. For instance, if you have a system with multiple equilibria, synchronization means they'll always vote for the same equilibrium. And all the algebra — again, I skipped some of it, but it's just a few lines — all the algebra is very, very simple, because it's basically just basic algebra on positive definite matrices. So I don't have time to talk about that; I'll skip these 15 minutes. Any questions? Yeah. So, maybe — does anybody know what this is? This is from The Rite of Spring.
Okay, fine. Maybe I'll just show you this briefly. Suppose you have three similar systems which you're trying to synchronize — very easy to do with a common leader-like input. And it's not a question of symmetry, because for instance if you add a feedback from one of the systems back to the center, the yellow systems will still synchronize, because they still receive a common input from the center. So it's not a question of symmetry; it's a question of what's called input equivalence — there's lots of earlier work by Golubitsky and others that inspired that idea. So it's input equivalence, and the yellow systems will synchronize. Now suppose you take the yellow systems and connect them to green systems of completely different dimensions and dynamics. Because the yellow systems synchronize, they serve as a common input to the green ones, and therefore the green systems — which may be far away from each other in the network — will globally, exponentially, robustly synchronize without ever talking to each other, and with completely different dynamics: this one could be dimension 10 and that one dimension 2000. And you can change a lot of things about the system and this will still be true: all the yellow systems synchronized, and all the green systems synchronized. Of course there's nothing special about two colors — you could have ten.
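The cascade idea can be sketched with scalar stand-ins for the "yellow" and "green" systems (my own toy dynamics, not the talk's): two yellows are coupled so they synchronize; each drives its own green system with different dynamics; the greens never exchange any signal, yet they synchronize because, once the yellows agree, the greens are contracting systems receiving a common input.

```python
import numpy as np

# Cascade synchronization sketch (hypothetical scalar dynamics):
# yellows y1, y2 are mutually coupled with gain k > sup f', so they
# synchronize; greens g1, g2 each just filter their own yellow,
# g_dot = -2*g + y, and never talk to each other. Since -2 < 0 the
# green dynamics are contracting, and a common input forces agreement.

def f(y, t):
    return y - y**3 + 2.0 * np.cos(t)   # sup of f' is 1

k, dt = 5.0, 1e-3
y1, y2 = 1.8, -1.2
g1, g2 = 4.0, -3.0

for step in range(10000):               # integrate for 10 seconds
    t = step * dt
    y1n = y1 + dt * (f(y1, t) + k * (y2 - y1))
    y2n = y2 + dt * (f(y2, t) + k * (y1 - y2))
    g1 = g1 + dt * (-2.0 * g1 + y1)
    g2 = g2 + dt * (-2.0 * g2 + y2)
    y1, y2 = y1n, y2n

print(abs(y1 - y2), abs(g1 - g2))   # both differences essentially zero
```

The greens here are first-order filters only for brevity; the point of input equivalence is that they could be any contracting systems of any dimension, as long as the two copies are identical and receive the same (synchronized) input.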
And this was originally motivated for us by the question of binding in the brain: the fact that, as you listen to somebody giving a talk, some parts of your brain process visual information and some parts process sound, and the visual information itself is divided into edges and colors and so on, and similarly for the sounds and their onsets. All of these computations are done in different parts of your cortex, and you're not conscious of them, but you have to time them: you have to know that these different parts are bound, that they are talking about the same time instant. And this would suggest a mechanism to do that. From a contraction point of view, instead of tending towards a unique trajectory, you now tend towards a linear subspace — and, since I won't have time to talk about it, you can do the same thing tending towards a manifold. You get very simple conditions based on Jacobians, and the same combination properties. I'll skip the details, except to say that it's robust convergence to a subspace: if you put small errors in the dynamics, instead of converging to the subspace you converge to some boundary layer around it.

This may be one of the reasons why nature is using spikes — action potentials. There are lots of good reasons to use spikes, like mixing continuous and discrete computation, but also this one, from the point of view of timing. Suppose this cartoon shows the spikes of two systems plotted on top of each other, so they are synchronized. Now suppose you move the parameters of one system, so they desynchronize: originally they were synchronized, you mess with one system, they desynchronize.
You see that if you're using spikes, desynchronization means a huge error on the trajectory, because spikes are very big and then zero. But what this robustness result says is that a huge error on the trajectory can only be explained by a huge error on the parameters. Therefore — and you can try this in MATLAB any way you want — if you take two models of spiking oscillators which synchronize, and you mess up the parameters of one of them by 60 percent, they still spike at exactly the same place. This robustness result says that spiking gives you enormous robustness of timing with respect to parameter variation, because you would need huge parameter variations to justify the huge trajectory errors created by spikes that do not synchronize. And that's a numerical example — I think it's an 80 percent parameter change, and they still spike at exactly the same time.

Now, remember I talked about these predictive feedback hierarchies. You can build them with contracting systems, but you can also build them with oscillators.
In a sense, you have very simple properties that tell you, when you have a system like this — composed not just of basic dynamic systems but of spiking oscillators — when the overall system will be one big synchronized spiking system. All right, I don't have much time, so just to say that synchronization protects from noise, and you can compute that explicitly. In other words, you can show the following. Say this is the output of one oscillator; if you drive this one oscillator with a lot of noise, then you get this other output. Now suppose that, instead of one oscillator, you take ten oscillators and couple them so that in the absence of noise they would synchronize. Then this, in red, is what you get. In other words, the fact that you've coupled them so that they synchronize gives you, in these completely nonlinear systems, the same noise-averaging behavior that you would have in a linear system — and in a nonlinear system, in general, you don't have noise averaging.
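A toy illustration of this noise-averaging effect (my own example and parameter choices, not the talk's simulation): a single noisy nonlinear system drifts away from the clean trajectory, while the mean of n copies coupled through their sum — so that, noise-free, they would synchronize exactly — tracks the clean trajectory much more closely, roughly as if the noise had been averaged down by the coupling.

```python
import numpy as np

# Noise averaging through synchronization (hypothetical dynamics):
# compare one noisy copy of x_dot = f(x,t) against the mean of n copies
# coupled by k*(S - n*x_i), which would synchronize exactly without noise.

def f(x, t):
    return x - x**3 + 2.0 * np.cos(t)

n, k, sigma, dt, steps = 20, 0.5, 1.0, 1e-3, 20000
rng = np.random.default_rng(2)

clean = 0.5                      # noise-free reference trajectory
single = 0.5                     # one uncoupled noisy copy
coupled = np.full(n, 0.5)        # n coupled noisy copies
err_single = err_mean = 0.0

for step in range(steps):
    t = step * dt
    sq = np.sqrt(dt)
    clean += dt * f(clean, t)
    single += dt * f(single, t) + sigma * sq * rng.normal()
    S = coupled.sum()
    coupled += dt * (f(coupled, t) + k * (S - n * coupled)) \
               + sigma * sq * rng.normal(size=n)
    err_single += (single - clean) ** 2
    err_mean += (coupled.mean() - clean) ** 2

rms_single = np.sqrt(err_single / steps)
rms_mean = np.sqrt(err_mean / steps)
print(rms_single, rms_mean)   # coupled-network mean tracks clean more closely
```

The comparison is statistical (the seed is fixed here), but the mechanism is the one in the talk: coupling strong enough to synchronize makes each unit effectively see the averaged noise.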
Another way to see this: if you take the pure system and look at the mean, you get this green curve. If you look at the mean when each of them has a lot of independent noise, you get this other curve, which looks clean but has nothing to do with the signal. But if you take the red curve — which is when the systems are coupled so that they would exactly synchronize in the absence of noise — then you basically recover the noise-averaging properties of linear systems, in these completely nonlinear oscillators driven by input noise.

You can also play with multiple-timescale optimization very easily. I don't have time to talk about it, but basically you can use these ideas of combinations of contracting systems to give very simple conditions for multiple-timescale systems like this, which depend on an upper and a lower level: when is it sufficient, for the overall system to be contracting, that each of the subsystems is contracting when you consider the other two as external variables? I'll skip that completely. There are applications to putting together neuromorphic chips, robotic turtles, micro-satellites on the space station, computer graphics, motor primitives in robots, distributed adaptation, SLAM — which I won't have time to talk about — and arbitrage, which I won't have time to talk about either, but I'll be happy to answer questions.

Last point: once you know how to play with stability — and I think we now have, at least for reasonably smooth systems, a rather general and comparatively simple way to play with stability of nonlinear systems — you can play a lot of games with controlled instability.
So for instance, you may wonder: suppose I'm trying to solve a graph coloring problem. Under which conditions does the most basic idea — just take winner-take-all units and have them fight with each other — actually work? You can show under which conditions it works, quite generally, and I won't have time to talk about this except to say that you can play with notions of controllability of networks, which are directly graph-theoretic notions, and you can also play with notions of observability. And if you think about this idea of facilitated variation — this notion that you have core processes built up by evolution, and that evolution then targets mostly how you put them together — then you see that understanding this from a mathematical point of view, at least loosely, or having a nice analogy if you want, involves both these tools of contraction and an understanding of what it means to choose the connections. And choosing the connections is not controllability of nodes; it's controllability of links.
We had this Nature paper on the controllability of systems, which had to do with node controllability; but if you look instead at link controllability — which is work that Vicsek and his colleagues did — then link controllability is the tool you need to describe how you are going to pick the right connections to get whatever extra complexity you want. So, I'll be around anyway, I'll be happy to answer questions; most of the papers are here, and of course this is work with lots of students, who are all listed here.

[Audience question: in aeronautics, and control theory in general, designers try to make airplanes stable but only marginally so, because the less stable they are, the more controllable — the more maneuverable.] Yeah — thanks for asking. So, controlled instability: take the example of an airplane. Not a jumbo jet — when you take an A380 it's not so clear — but when you take a military aircraft, the center of mass is very close to the center of lift, so the plane is either unstable or very close to being unstable, and you rely entirely on the control system to make it work. Again, the idea is that when a military aircraft wants to make a maneuver, you basically throw it into instability and then you catch it. You don't want to do that with a passenger jet, but you do it in military aircraft. [Follow-up] Well, to some extent, yeah — you don't actually throw passenger jets into instability — but the A380 is definitely very close to that. If you think of a 747, the nose was far in front, to put the center of mass well ahead of the center of lift; and of course it got stretched and stretched, and the A380 is two decks, so the nose is nowhere in front.
So, controlled instability — you can also use it to find a path of least resistance. Lots of people are playing now with robotics and deep learning to open doors with robots, but opening a door with a robot is a one-degree-of-freedom thing — maybe two degrees if there's a handle and then you pull the door open. That can actually be done purely with one line of code, because basically you're just trying to create an instability, and the system will by itself find the direction in which it can move. So that's another example of controlled instability. Constraint satisfaction, which I just showed you, is another. And if you think of it — expansion and pruning, this million new connections per second I was mentioning — it's just what you get if you do the scaling, but actually it's much more complicated than that, because not only do you have connections being created, which is the unstable process, you also have lots of connections being pruned, and you have the interaction between the two. So yes, instability, I think, is a major mechanism used in aircraft, but still grossly underused in general, and particularly in robotics.

One last question. [Audience question: listening to your talk, can your theory of contraction explain the success of deep learning strategies?] Okay, so that's what we're trying to understand. First of all, you have to realize — that's why I was pointing out the 20 watts — that we are still very far from the brain, and we are still very far from understanding it from a theoretical point of view.
We are still very far from the couple of examples a child needs to learn what a lion is, as opposed to the millions of examples that a deep learning system might need. But in a sense — and I'm not saying that by the end of this you will know the answer either; I'm just trying to help here — you have these tools which can deal with convex optimization; deep learning is not convex, but they deal more generally with g-convex optimization, which is more general. You also know that stochastic gradients perform a form of filtering — spatial filtering — and in a sense, if you take a function which is rather messy and spatially filter it, it's going to be much easier to get to a good minimum. People are actually starting to do in deep learning exactly what we did in quorum sensing before, which is to coordinate these different algorithms. And of course Amari and others have worked a lot on natural gradient optimization for learning. So I'm just saying that having these really comparatively simple mathematical tools to do all of this should help a lot — and especially this mapping between natural gradient and contraction, allowing you to use all the tools from contraction — all the combination properties, all the synchronization properties — absolutely for free. I suspect that should