This is the setup of the problem I want to discuss today, and hopefully I can convince you that it's not too unrealistic and maybe has some interest. We're going to look at the easiest possible perceptron, the linear perceptron, represented here by the circle, which is trying to learn random associations between correlated inputs, represented here as signals that evolve in time, and correlated outputs. Just to set things up: the correlated inputs are represented by this matrix x_i^μ, where you can think of μ as an index of time or of patterns. The inputs have a certain mean, which can depend on the presynaptic index, so we can think of these as the firing rates of the neurons, and the σ_i are the standard deviations. We will then choose the correlation structure of the ξ_i^μ, which are the mean-removed inputs, and do the same for the correlated output: the output will have its own mean and its own standard deviation, and we decide whatever correlation structure we want. All of this boils down to solving a regression problem, and I wrote the loss function here; in general it depends on the input x, on the output y, and on a possible regularization γ, which is an L2 regularization.
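To make the loss concrete, here is a minimal numerical sketch (variable names and toy sizes are mine, not from the talk): the loss is the mean squared difference between w·x^μ + b and y^μ over patterns μ, plus an L2 penalty of strength γ.

```python
import numpy as np

def regression_loss(w, b, X, y, gamma=0.0):
    """Quadratic loss of a linear perceptron with bias and L2 penalty.

    X : (P, N) inputs, P patterns / time steps, N synapses
    y : (P,)  prescribed outputs
    w : (N,)  synaptic weights, b : scalar bias current
    """
    residual = X @ w + b - y          # processed input minus target
    return np.mean(residual**2) + gamma * np.sum(w**2)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = rng.standard_normal(10)
y = X @ w_true + 1.0                  # realizable targets with bias 1
# the true weights and bias give zero loss when gamma = 0
print(regression_loss(w_true, 1.0, X, y))
```

With γ > 0 the same function simply adds γ‖w‖² on top of the fit error.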
So what you have to do is basically sum over all patterns, and you can think of this as summing over time; then you take the scalar product between the synapses w and the input x, you add some sort of bias, which you can think of as a constant current (it can be an excitatory or an inhibitory current), and you want this processed input to be very close to the prescribed output. What we want is for some of these inputs to come from excitatory fibers, which means those weights are constrained to be positive, while the blue ones are constrained to come from inhibitory fibers, so they get multiplied by negative synapses. So we take w_i to be greater than zero for i up to f_E N, where f_E is the fraction of excitatory fibers impinging on the neuron, and the remaining weights are negative. Now it's my job to convince you that this is a somewhat interesting problem, and the starting point will be training a recurrent neural network. We start from what I consider the hydrogen atom of theoretical neuroscience, the so-called SCS model, the Sompolinsky-Crisanti-Sommers model. You can think of it as a continuous-time version of a spin glass: you take a random network, you draw Gaussian recurrent weights J_ij, you scale the variance accordingly, and the dynamics of this network will chaotically wander around a very large state space. This has been studied to death, and we know many of its properties from dynamical mean-field theory. What we can do with this sort of reservoir, which we will use for computation, is try to read something out from the random network. We set up some readout weights w, indexed by the readout number; I'm just representing one here. If we let the network evolve over time, we will see some sort of spontaneous activity.
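A minimal simulation of such a random reservoir, under toy parameters of my own choosing: rate dynamics dx/dt = −x + g J tanh(x), with Gaussian J of variance 1/N, integrated with Euler steps, plus one random linear readout of the spontaneous activity.

```python
import numpy as np

rng = np.random.default_rng(1)
N, g, dt, steps = 200, 1.5, 0.05, 2000

J = rng.standard_normal((N, N)) / np.sqrt(N)   # Gaussian weights, variance 1/N
w_out = rng.standard_normal(N) / np.sqrt(N)    # random readout weights

x = rng.standard_normal(N)                      # initial condition
readout = np.empty(steps)
for t in range(steps):
    r = np.tanh(x)                              # rates
    x += dt * (-x + g * (J @ r))                # Euler step of the SCS-type dynamics
    readout[t] = w_out @ r                      # project activity onto the readout

# the spontaneous activity stays bounded but keeps fluctuating
print(readout[-5:])
```

For gain g above one the network is in the fluctuating regime, and the readout trace is the kind of irregular signal the projection onto w produces.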
When projected onto this readout w, the spontaneous activity will look something like this. Now say we want to transform this random network into what is called a functional network: we want the network to do something, maybe to respond to some stimuli in some way. The very popular FORCE learning algorithm learns just the readout weights, these weights w, and then has some feedback from the readout to the actual neurons; you can think of this as a rank-k perturbation, where k is the number of readouts, of the initial completely random connectivity matrix. If you use a very fast least-squares method, like recursive least squares, you can train these weights, and after learning the network is able, for instance, to self-generate a very complicated readout. So what about the recurrent weights? Of course there are many approaches, all the tools we know from gradient-based learning, and gradient-based learning in this case would be a way to solve the credit assignment problem: given an output z_k which has to equal the prescribed output f_k, we have to assign a certain time course to each neuron over time. One way to completely circumvent this problem, which is otherwise solved by gradient descent, is to start over and invent a second network, which we will call the teacher network. This is a non-trained, completely random network which receives as input the very output that we want the student to produce: we feed this output into a completely unstructured, non-learned network. Then we take the target currents that this teacher network generates and we match these currents, which we read out from the teacher, with those of the student. So what the student has to do is that each neuron in the student has to reproduce the time course of its target current.
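The readout training step can be sketched with plain recursive least squares (this is only the regression core, not the full FORCE loop with output feedback; names and sizes are mine): the inverse correlation matrix P is updated online with a rank-1 correction, and the weights with an error-driven step.

```python
import numpy as np

def rls_fit(R, f, alpha=1.0):
    """Recursive least squares: fit w so that w @ r_t tracks f_t, one sample at a time.

    R : (T, N) activity vectors r_t, row by row;  f : (T,) target readout.
    P starts as (1/alpha) * identity and tracks the inverse correlation matrix.
    """
    T, N = R.shape
    w = np.zeros(N)
    P = np.eye(N) / alpha
    for r, f_t in zip(R, f):
        Pr = P @ r
        k = Pr / (1.0 + r @ Pr)        # gain vector
        e = w @ r - f_t                # error before the update
        w -= e * k                     # error-driven weight update
        P -= np.outer(k, Pr)           # rank-1 update of the inverse correlation
    return w

rng = np.random.default_rng(2)
R = rng.standard_normal((500, 20))
w_true = rng.standard_normal(20)
f = R @ w_true
w = rls_fit(R, f)
print(np.max(np.abs(w - w_true)))      # small: RLS recovers the readout
```

After the full pass, w equals the ridge solution with penalty alpha, which is why the method is so fast in online settings.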
These currents are represented here as h_i^S, the current generated by, or impinging on, neuron i in the student network, and it is basically a row of the connectivity matrix J times φ. This φ can be whatever: in a rate network it can be what we call the rate activation function; in a spiking network it can be the postsynaptic potential, where we integrate the spikes of the presynaptic units. It has to equal the target current h_i^T generated by the teacher network, which receives as input, through random weights u_{ki}, the actual output f_k that we want. By circumventing the credit assignment problem in this way, we factorize the problem, at least along neurons: each neuron has its own targets, and in parallel each has to learn its own time course. Of course we can rephrase this as a regression problem. We have a loss function, which in principle is just a sum over neurons of a loss function for each neuron; h_i^T is the current in the teacher and h_i^S is the current in the student, and we want to learn the J_ij, and possibly a bias, in order to match the two, with some form of regularization. I don't want to give you the idea that this only applies when you take the teacher, use it to generate the targets, and then throw it away and use the student. This can be used in general: it has been used by Christopher Kim, for instance, and very recently by the group of Kanaka Rajan, in the case where you actually have recordings from neurons in a circuit. You can have recordings at the level of single neurons, or at the level of populations, so the targets are not generated by a random network but are actual recordings, and you want to construct a student which reproduces that activity. And you can do different things.
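Because the loss is a sum over neurons, learning the recurrent weights factorizes into N independent ridge regressions that share one design matrix. A sketch, with a synthetic random teacher standing in for real targets (all names and toy sizes are mine):

```python
import numpy as np

def fit_student_rows(Phi, H_target, gamma=1e-3):
    """Target-based learning factorized over neurons.

    Phi      : (T, N) presynaptic activities phi(h_j(t)) of the student
    H_target : (T, N) teacher currents h_i^T(t), one column per neuron
    Returns J of shape (N, N): row i solves its own ridge regression.
    """
    T, N = Phi.shape
    G = Phi.T @ Phi + gamma * np.eye(N)          # shared Gram matrix
    # one linear solve per neuron; all neurons share the same design matrix
    J = np.linalg.solve(G, Phi.T @ H_target).T
    return J

rng = np.random.default_rng(3)
T, N = 300, 30
Phi = np.tanh(rng.standard_normal((T, N)))
J_teacher = rng.standard_normal((N, N)) / np.sqrt(N)
H_target = Phi @ J_teacher.T                      # currents a teacher would produce
J = fit_student_rows(Phi, H_target)
print(np.max(np.abs(J - J_teacher)))              # student recovers the teacher rows
```

Each row of J is exactly the linear-perceptron problem from the beginning of the talk, one per neuron.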
Maybe you have some measurement of the currents themselves, or maybe you just measure the activity and you want to infer the currents; that is what Kanaka Rajan, for instance, is doing to infer the differential effect of different areas in her network-of-networks approach. Okay, so what happens if we remember something about the brain, namely that the brain respects Dale's law? The coarsest division among neurons in the brain is that we have excitatory and inhibitory neurons, so we have to consider a student which is divided into two populations, an excitatory population and an inhibitory population. Of course we can play the same game: we have a teacher which is itself divided into an excitatory and an inhibitory population, and it receives as input the same readout that we want to read out from the student. This opens up a number of problems: it is not easy to generalize recursive least squares to the case where the synapses are sign-constrained, and the suppression of chaos in these networks, in the balanced or unbalanced regimes, is complicated. We actually studied this a lot, and there is by now a fairly complete dynamical mean-field-theory description of this problem. So let's jump right to the spoiler: it is actually possible to solve this problem. You can construct models which have this layer of biological plausibility, in the sense that they are composed of two different populations. For instance, here I'm showing results for a network of, I guess, 400 units, 200 excitatory and 200 inhibitory neurons, that just oscillates over time and produces a self-sustained nonlinear oscillation.
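As a toy illustration of the Dale's-law constraint (numbers mine): the sign of each weight is fixed by the type of the presynaptic neuron, which means column-wise sign constraints on the connectivity matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
N, f_E = 400, 0.5                      # e.g. 200 excitatory, 200 inhibitory neurons
n_E = int(f_E * N)

signs = np.ones(N)
signs[n_E:] = -1.0                     # presynaptic type: +1 excitatory, -1 inhibitory

# magnitudes are free; Dale's law only constrains the sign of each column
J = np.abs(rng.standard_normal((N, N)) / np.sqrt(N)) * signs[None, :]

print(J[:, :n_E].min() >= 0, J[:, n_E:].max() <= 0)   # True True
```

Any learning rule for such a network has to keep the weights inside this sign-constrained cone, which is exactly what makes the recursive-least-squares generalization nontrivial.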
Here is also a sort of functional network which receives two different signals separated by a delay period Δτ, and then responds with a bump after exactly the same delay period; this is the actual response of the network, and this is the input. So we can do that, but at some point we sort of ask ourselves: the dynamical regime in which a rate network or a spiking network operates strongly depends on the kind of solution we obtain for this regression problem. We are asking each neuron to independently solve a linear perceptron problem, we get a resulting connectivity matrix, and the structure of that connectivity matrix feeds back on the actual dynamics of the network; the internal dynamics will depend strongly on the structure of the learned connectivity. So now I hope I have motivated the fact that we can consider a linear perceptron task with excitatory and inhibitory constraints as a toy model of target-based learning: these targets can come from a random network, or they can be actual recordings, for instance. So let's add some more ingredients, just to formalize the problem mathematically. We said that we have correlated input and correlated output; we take the number of patterns (the time length, if you want) to scale with a proportionality factor α with N, the number of synapses. The output is correlated through this matrix C^y, and for the input we consider two different kinds of correlation. The first one I call the ensemble covariance, I don't know why I gave it this name, but the idea is that there is no correlation among neurons, so that ⟨ξ_i^μ ξ_j^ν⟩ carries a δ_ij, and you have only the correlations that have been called semantic correlations, or correlations in time if you want.
The second one is the sample covariance: we assume that we measure something, we measure the targets for instance, and we do PCA. To do PCA we just take out the mean, we normalize, and the ξ can be written via an SVD as U S Vᵀ. Let's assume that we know S, so we know the spectrum (S², if you want, gives the variances of the PCA), but we don't know anything about the actual PCA directions, so we will average over U and over V. This is the model. Just two words on the method. The method I used was the usual replica method, maybe we can talk about it afterwards, but basically you write a free energy, and the free energy is a sort of generating function for all the important quantities you want to measure, once you average over the distribution of the input and the output. Technically speaking, if you consider inputs which are uncorrelated across neurons but correlated in time, then for a stationary process the covariance matrix is Toeplitz, or, if you take periodic boundary conditions, it is actually circulant. In the end, all you have to consider is the power spectrum in Fourier space, and the regression error ε (the density of the regression error, if you want) is just an integral over λ, where you can think of λ as the square of the singular values, or the eigenvalues of the covariance matrix, or in other words the Fourier spectrum. So a given Fourier spectrum acts on the structure of the weights and on the actual regression error that you get. In the second case, say we measure, we do PCA, and we have a certain spectrum for the singular values.
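The circulant step is easy to check numerically (the toy autocovariance is mine): a circulant covariance built from a periodic autocovariance c(τ) is diagonalized by the discrete Fourier basis, so its eigenvalues are exactly the power spectrum of c.

```python
import numpy as np

P = 64
tau = np.arange(P)
# toy stationary autocovariance with periodic boundary conditions
c = np.exp(-np.minimum(tau, P - tau) / 5.0)

# circulant covariance: C[mu, nu] = c((mu - nu) mod P)
C = np.array([[c[(mu - nu) % P] for nu in range(P)] for mu in range(P)])

eigvals = np.sort(np.linalg.eigvalsh(C))
spectrum = np.sort(np.fft.fft(c).real)     # DFT of the autocovariance = power spectrum

print(np.max(np.abs(eigvals - spectrum)))  # ~0: same set of numbers
```

This is why, for temporally correlated inputs, the mean-field regression error reduces to an integral over the Fourier spectrum.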
We don't care about the directions U and V, so we integrate over the Haar measure, because the matrices of left and right singular vectors are basically orthogonal matrices; we integrate over the orthogonal group, we get this complicated integral, and in the end the result is that Δq, which describes the variance of the distribution of the weights, and ε, which describes the actual regression error, depend, in general, on the Stieltjes transform of the eigenvalue distribution. So in the end everything depends only on the eigenvalue distribution of the covariance matrix, which we either fix or measure, and then we can use this mean-field regression even when N is not going to infinity but is just reasonably big; even for N equal to 100 these predictions work very well. I think that, methodologically speaking, it is very interesting to see that for i.i.d. output you can write everything in terms of a quantity which comes up again and again in random matrix theory, and which has been studied from different perspectives, in the TAP equations in the context of spin glasses and in the context of random matrix theory: basically the integrated R-transform, which is closely connected to what people in information theory call the Shannon transform. So you have these quantities which in the end depend only on the spectrum of the covariance matrix, on the PCA that people in neuroscience do every day. So much for the method; what are the results? The important thing is that the optimal solutions of this regression problem are balanced weights, so the notion of a balanced network comes to mind.
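As a sketch of how these spectral quantities are evaluated in practice (the comparison and tolerances are mine): the empirical Stieltjes transform g(z) = (1/N) Σ_i 1/(λ_i − z) of a measured spectrum is a one-liner, and for white data it can be checked against the asymptotic Marchenko-Pastur formula.

```python
import numpy as np

def stieltjes(eigvals, z):
    """Empirical Stieltjes transform g(z) = mean(1 / (lambda - z))."""
    return np.mean(1.0 / (eigvals - z))

rng = np.random.default_rng(5)
N, P = 200, 400                             # alpha = P / N = 2
X = rng.standard_normal((P, N))
eigvals = np.linalg.eigvalsh(X.T @ X / P)   # sample-covariance spectrum

z = -1.0                                    # away from the support, g is smooth
g_emp = stieltjes(eigvals, z)

# asymptotic Marchenko-Pastur transform for ratio lam = N / P, unit variance
lam = N / P
g_mp = ((1 - lam - z) - np.sqrt((z - 1 - lam)**2 - 4 * lam)) / (2 * lam * z)
print(g_emp, g_mp)
```

Even at N = 200 the empirical transform sits on top of the asymptotic curve, which is the sense in which the mean-field formulas work at finite N.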
A balanced network is one in which each neuron receives a very strong excitatory current and a very strong inhibitory current, but the two balance. The situation is a bit different from the sparse case, but the basic idea is that the currents balance at the level of the mean, so that the net current is of order one, with fluctuations which are also of order one; this is what gives you Poissonian spiking, for instance, in the case of spiking networks. What we see is that at the level of a single neuron you get basically the same scaling: if the optimal external bias is of order √N, then the average excitatory weight and the average inhibitory weight are of order 1/√N, such that when they get multiplied by the average activity, the average excitatory input and the average inhibitory input cancel each other. The contribution from the means gets canceled up to order 1/√N, and this gives you what you want in the output, the average output. And this only works if you optimize over the bias too: if you don't learn a bias, you get into trouble. The total current, as a function of N, is always of order one, but this contribution, which I'm calling h̃, which naively seems to grow as √N, is instead balanced by correlations between the rates and the deviations of the weights from their averages; so you don't get a balanced network, and balanced networks have some very interesting dynamical properties. So already in this very simplistic model you can see that optimizing or not optimizing over the external current gives you a solution which is balanced or not.
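A toy numerical illustration of this balance scaling (the construction is mine, not the optimal solution of the regression): with weights of order 1/√N whose excitatory and inhibitory means cancel, each population's current grows like √N while the net current stays of order one.

```python
import numpy as np

rng = np.random.default_rng(6)
results = {}
for N in (100, 400, 1600):
    n_E = N // 2
    # O(1/sqrt(N)) weights whose excitatory and inhibitory means cancel
    w_E = (1.0 + 0.3 * rng.standard_normal(n_E)) / np.sqrt(N)
    w_I = -(1.0 + 0.3 * rng.standard_normal(N - n_E)) / np.sqrt(N)
    r = np.abs(rng.standard_normal(N)) + 1.0   # O(1) positive rates
    h_E = w_E @ r[:n_E]                        # total excitatory current
    h_I = w_I @ r[n_E:]                        # total inhibitory current
    results[N] = (h_E, h_I)
    print(N, round(h_E, 2), round(h_I, 2), round(h_E + h_I, 2))
```

The separate currents grow with √N while their sum stays O(1), which is the balance condition the optimal bias makes the regression solution satisfy.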
I think the second interesting result is that there is a sort of gauge symmetry once you take out this contribution h̃. Once you learn the optimal current, there is a gauge symmetry which implies that, no matter what the ratio of excitatory to inhibitory neurons is, and for whatever correlations in the input and output, the optimal capacity of this perceptron is always 0.5. Here, for instance, this curve is for γ equal to zero, where γ was the regularization parameter: at zero regularization you can learn up to α = 0.5, half a pattern per synapse, and past that point the regression error starts increasing linearly; these other curves are for increasing values of the regularization, for which the transition is smoothed out. This gauge symmetry also impacts the distribution of the weights: if you look at the optimal distribution of the weights, once you optimize over the external bias, you see this very strong delta function at zero, and again, whatever the ratio between excitatory and inhibitory neurons, half of the synaptic weights are always zero, half of the excitatory and half of the inhibitory, and the shape of the rest of the distribution tends to be a Gaussian with variances that you can control. I'm running out of time, so I'll skip this, and maybe we can talk about it afterwards. Just to conclude: I showed you that target-based approaches are effective for training recurrent networks, and that we can understand the structure of the resulting connections inside these recurrent networks using very simple models and methods that come from the analysis of feed-forward networks. And that's it, thank you very much.
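A quick way to see the delta function at zero numerically (a sketch using projected gradient descent and toy sizes of mine, not the replica calculation): solving the sign-constrained ridge regression pins a finite fraction of the weights exactly at zero.

```python
import numpy as np

rng = np.random.default_rng(7)
N, P, gamma = 100, 40, 1e-3            # alpha = P / N = 0.4, below capacity
n_E = N // 2

X = rng.standard_normal((P, N))
y = rng.standard_normal(P)

signs = np.ones(N)
signs[n_E:] = -1.0                     # first half excitatory, second half inhibitory

w = np.zeros(N)
b = 0.0
lr = 1e-3
for _ in range(20000):
    err = X @ w + b - y
    w -= lr * (X.T @ err / P + gamma * w)   # gradient step on the regularized loss
    b -= lr * np.mean(err)                  # learn the bias as well
    w = signs * np.maximum(signs * w, 0.0)  # project sign violations back to zero

frac_zero = np.mean(w == 0.0)
print(frac_zero)                        # a finite fraction of weights pinned at zero
```

The weights that want the wrong sign end up exactly on the constraint boundary, which is the finite-N shadow of the delta function at zero in the mean-field weight distribution.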