Jean-Claude: Thank you very much, Laurent, for the introduction. So, as Laurent said, this is the title of my presentation: beyond the statistical perspective on deep learning — which is in fact the usual one nowadays — the topos-theoretic point of view, invariance and semantic information. I just wanted to mention that there is an arXiv paper, a joint work with Daniel Bennequin, which has been available since last night. So this is effectively joint work with Daniel Bennequin, and I would also like to thank Laurent Lafforgue, who opened this fantastic world of toposes to me in 2017, and Olivia Caramello, who helped us a lot to understand some of the notions I will introduce later. So, just a brief introduction to AI and machine learning for our mathematician colleagues. I should say that I am not originally a mathematician; I come from the electrical engineering world, which in general is quite far away from toposes, and it is through AI — artificial intelligence and machine learning — that I was confronted with toposes and the related mathematical notions. So what is machine learning? Very briefly, it can be divided into three basic tasks. The first one is called supervised learning, which is generally used to perform classification or regression. That means, for example, that you want to recognize whether or not there is a cat in an image. You use what are called labeled input data — images that someone has labeled as cat or non-cat — and these are used to train the machine. So you train the model with these labeled input data, and finally you test it with new images, hoping that it will recognize a cat even in images that were not part of the training data. Regression concerns more continuous problems. Then there is unsupervised learning, where you have no labeled data, only raw data, and the idea is to perform grouping, dimension reduction or discrimination. You don't know that what you are looking at is a cat — that word is completely unknown to you — and what you want is to discriminate, for example, cats from non-cats: cats will lie on some manifold and non-cats outside of it, and you want to understand patterns and discover the outputs. Then there is a third one, which is maybe closest to the way animals or humans behave, reinforcement learning, where an agent interacts with its environment by performing actions and learning from errors and rewards. It is a trial-and-error method. In some sense, reinforcement learning can be considered a kind of supervised learning, since when you get rewards or errors there is something that tells you it is an error, or that gives you a reward; that is why it is quite close to supervised learning. In this presentation we will essentially consider the supervised learning case, because it is maybe the simplest one to understand in terms of toposes, but the other ones can be understood in the same way.
Okay, so in order to perform these machine learning tasks, the most popular and most successful way is to use neural networks. So what is a neural network? Here is an example of what is called a fully connected deep neural network, which is the simplest case. You have input data here, at the input of the neural network, and you have layers: the input layer, a first hidden layer, a second hidden layer, et cetera, and at the output what is called the output layer. I'm sorry it's a little bit small, but the output of a neuron, y_j, is computed as follows: you take a linear combination of the inputs x_i using what are called weights, w_ij — and these weights will be learned from the training data — plus what is called a bias; so instead of being linear it is in fact an affine transformation. Then to this number you apply a nonlinear function φ, called an activation function, which could be a sigmoid, a hyperbolic tangent, a rectified linear unit (the third possibility), or some other, more exotic activation function; depending on the problem, you choose the one that is best suited. In order to train your network — that is, to compute the weights and the biases — the most popular algorithm is called backpropagation; it dates back to the 1980s. What you do is compute a loss function, which can be based on the Kullback–Leibler divergence, on cross-entropy or mutual information, or on other possible losses — some are simply based on Euclidean distance. The variables of this loss function are the weights and the biases of the neural network, and the idea is to find its minimum, using the labeled training data, by gradient descent. Thanks to the chain rule for computing the partial derivatives, the gradient calculation becomes very efficient and can be done layer by layer. On this figure you can see a schematic view of backpropagation; I don't want to spend too much time on that. So you have seen a neural network based on fully connected layers, but other architectures exist for dealing with more specific problems. For example, the convolutional neural networks — I will come back to this architecture later, because it is really related to a word in my title, invariance. The idea is that, instead of allowing a fully general family of linear transformations on the edges from one layer to the next, you restrict to more specific linear transformations, in this case convolutions. So you have some layers which are convolutions followed by max pooling, which is just a kind of restriction, and at the end you have fully connected layers. I will explain later why this architecture is used; convolutional neural networks are basically used for computer vision tasks, so essentially for image processing. Then you have recurrent neural networks, where what you have are vectors, and the idea is to use a kind of loop.
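To make this concrete, here is a minimal sketch — my own illustration, not anything from the slides — of a tiny two-layer network with a tanh activation: one forward pass, then one backpropagation step in which the chain rule propagates the gradient layer by layer. The layer sizes, the squared-error loss and the learning rate are illustrative assumptions.

```python
# Minimal sketch: forward pass y = phi(W x + b) and one backpropagation step,
# assuming a tanh activation and a squared-error loss.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)            # input layer (3 neurons)
t = np.array([1.0])               # target label

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # weights/biases, hidden layer
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # weights/biases, output layer

# Forward pass: each neuron computes an affine combination then a nonlinearity.
h = np.tanh(W1 @ x + b1)
y = np.tanh(W2 @ h + b2)
loss = 0.5 * np.sum((y - t) ** 2)

# Backward pass: the chain rule propagates gradients layer by layer.
dy = (y - t) * (1 - y ** 2)       # dL/d(pre-activation) at the output layer
dW2, db2 = np.outer(dy, h), dy
dh = (W2.T @ dy) * (1 - h ** 2)   # dL/d(pre-activation) at the hidden layer
dW1, db1 = np.outer(dh, x), dh

# One gradient-descent step on the weights and biases.
lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
```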
If you unfold this loop, you obtain this kind of architecture. It can be used when you have time series, or for example in natural language processing, where sentences can be considered as a kind of time series. But basic recurrent neural networks are not good: they cannot be trained efficiently, because when you apply gradient descent the gradient rapidly vanishes, which means the loss function does not become very low and you get too many errors. The idea is therefore to use other kinds of cells in these recurrent settings, called long short-term memory cells, LSTM cells. I will not spend too much time here either; as you can see, the idea is to have not only short-term memories but also long-term memories, so that the network can be trained more efficiently. Okay, so after this very brief introduction to machine learning and neural networks, let's go into a topos-theoretic view of deep neural networks. Very briefly, it is possible to model a neural network using Grothendieck toposes, and it can be done in several steps. The first step is based on the architecture of the DNN, the deep neural network, which will constitute the base site, the base Grothendieck site. Our way of doing this — take, for example, the case of a chain, that is, the fully connected deep neural network we have just seen — is to consider that each layer is an object of some site. The feed-forward functioning of the network, once it has been trained, then corresponds to a covariant functor X from the category generated by this graph to the category of sets; you can already smell the Grothendieck topos here, of course. The map X^w_{k+1,k} from X_k to X_{k+1}, which corresponds to an edge here, encodes the learned weights between layer k and layer k+1, and it corresponds to an arrow in this category C^op(Γ), where Γ is, of course, the graph of the neural network. The weights themselves are encoded in a covariant functor, the blackboard W, from the category C^op(Γ) to the category of sets: at each layer L_k we define W_k as the product of all the sets W_{l+1,l} of weight matrices between consecutive layers from L_k onward, and to the edge that goes from layer L_k to layer L_{k+1} we associate the natural forgetting projection from W_k to W_{k+1}. Then the Cartesian product X_k × W_k, together with these maps, also defines a covariant functor, the blackboard X, and the natural projection from X to W is a natural transformation of functors. What is interesting is that, for supervised learning, which is the central case we consider here, the backpropagation algorithm can be represented by a flow of natural transformations of the functor W to itself. And in the category C(Γ), X, W and the blackboard X become contravariant functors to the category of sets; that means they are presheaves over C, i.e. objects of the presheaf topos Ĉ.
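As an illustration of the chain case — purely my own toy bookkeeping, not the paper's formalism — one can regard the feed-forward functioning as the composition of one map per edge, and the weight "functor" at layer k as the tuple of all remaining weight matrices, with each edge acting by the forgetting projection:

```python
# Toy analogue of the chain site: X composes the per-edge maps,
# W_k is the product of the remaining weight sets, edges act by forgetting.
import numpy as np

def layer_map(W, b):
    # X^w_{k+1,k}: X_k -> X_{k+1}, the map attached to one edge of the chain
    return lambda x: np.tanh(W @ x + b)

sizes = [3, 4, 2]                                  # layers L0, L1, L2
rng = np.random.default_rng(1)
weights = [(rng.normal(size=(m, n)), np.zeros(m))  # one (W, b) per edge
           for n, m in zip(sizes[:-1], sizes[1:])]

def X(k_from, k_to, x):
    # "Functoriality" here is just composition of the edge maps.
    for W, b in weights[k_from:k_to]:
        x = layer_map(W, b)(x)
    return x

def W_at(k):
    # W_k = product of the weight sets from layer k on.
    return tuple(weights[k:])

def forget(w_k):
    # The edge L_k -> L_{k+1} acts on weights by forgetting the first factor.
    return w_k[1:]

x0 = rng.normal(size=3)
print(X(0, 2, x0))                                       # full feed-forward pass
assert all(p is q for p, q in zip(forget(W_at(0)), W_at(1)))  # forgetting projection
```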
So this is the case of a chain, which is quite simple: in this setting everything lives as objects and natural transformations in the topos of presheaves over this simple site. Now, if you have something a little different from a chain — if we consider the general case — the situation becomes a little more delicate, and the functioning and the weights can no longer be defined by functors on C(Γ). What we have done is a canonical modification of this category. For example, if you have this kind of situation, where many different modules in the graph converge to this object, the small a, then we have to perform a surgery, because taking this graph as a site will not work at all. The idea is to introduce new objects between all these A′, A″, etc., and the object a: a capital A* and a capital A, with arrows that go from A* to A and from the small a to the capital A. These form a fork, with tips at A′, A″, etc., and with a handle formed by A*, A and a. If we reverse the arrows, we obtain a new oriented graph without oriented cycles, and the category that replaces C(Γ) is now the category C(bold Γ), the opposite of the category freely generated by this bold Γ. The main structural point — that the projection from the product of A′, A″, etc., to each of its components is respected — can now be interpreted by the fact that the presheaf becomes a sheaf for a natural Grothendieck topology J. On every object X of this new category, the only covering is the full slice category C/X, except for objects of the type A*, where the covering is made by the arrows of the type A′ → A*, A″ → A*, etc. In this way we can handle essentially all the structural scenarios that occur in neural networks, even modular neural networks where many networks are connected to other ones. Okay, so this is the structure, but the structure is not enough. We now have to consider a second stage, which is what we call the pre-semantics, and there we will see that Grothendieck toposes alone are not enough to characterize all the neural networks that are used now — and perhaps also some new ones that may emerge in the future. Let's start with a simple example, the example of convolutional neural networks, the one I showed you on a preceding slide. They are used for image processing, and images can be assumed to be, by nature, invariant under planar translations: if you have an object in an image and you shift it, it is of course still the same object. The idea is to exploit this invariance in order to learn much more efficiently: if you are able to take this translation invariance into account, then you have far fewer weights to learn.
And that also means that you need much less training data to make the neural network learn. In this case, this is imposed by requiring a large number of layers to accept a non-trivial action of the group G of 2-d translations, and a large number of connections between two layers to be compatible with the action of this group. That means that the underlying linear part, when it exists, is made of convolutions with a numerical function on the plane; this is how the action of the group G of 2-d translations is taken into account. Of course, this does not forbid that in several layers — for example the last ones — the action of G is trivial, in order to obtain characteristics that are invariant under translations; in that case those layers can be fully connected. Some other groups have also been considered in the literature, together with their convolutions. DNNs that analyze images are nowadays always constructed in the same way: several channels of convolutional maps with max pooling, all of which are then joined with a fully connected DNN in order to take a decision. This looks like a structure designed to extract translation-invariant characteristics, and it is in fact what happens in the visual areas of animal brains, so it is really a copy of nature. What is also interesting is that experiments show that in the first layers, kernels resembling wavelets form spontaneously to handle contrast and color, and opponent kernels form for the color channels. So these convolutional neural networks are a very interesting tool for image processing. Okay, so let's go back to our topos-theoretic interpretation. As we have seen, we need to take this group invariance into account. The topos-theoretic manner to encode the situation consists in considering the contravariant functors from the category C of the network — the one we have seen, which takes the structure of the network into account — with values in the topos of G-sets; indeed, the actions of this group G on sets are exactly the objects of the topos of G-sets. The collection of these functors, with their morphisms, forms a category which was shown to be itself a topos by Giraud in 1972 — and we thank Olivia for having pointed us to this fantastic work of Giraud. It is equivalent to introducing a category F, fibered in groups isomorphic to G, over the category C, and it satisfies the axioms of a stack. In this case there is a canonical topology J on F — the coarsest one such that the morphism from F to C is continuous — and the ordinary topos E of sheaves of sets over this site (F, J) is named the classifying topos of the stack; it is naturally equivalent to the topos of functors we have just seen. The general theorem is much more general than that: it does not concern only groups, but extends to any stack over C, and it says that the category of covariant functors from C to the toposes of the fibers is equivalent to the classifying topos of the stack. In this respect, nothing is seriously changed compared to a group if the group is replaced by a groupoid.
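Here is a tiny numerical illustration of the invariance point — my own sketch, not something from the talk: a convolution commutes with (cyclic) 2-d translations, whereas a generic fully connected layer does not, which is exactly why sharing the weights of a convolution kernel encodes the translation invariance of images with far fewer parameters.

```python
# Translation equivariance of convolution vs. a generic affine layer.
import numpy as np
from scipy.ndimage import convolve

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))

def translate(x, dy, dx):
    # cyclic 2-d shift, standing in for a planar translation
    return np.roll(np.roll(x, dy, axis=0), dx, axis=1)

conv_then_shift = translate(convolve(image, kernel, mode="wrap"), 2, 3)
shift_then_conv = convolve(translate(image, 2, 3), kernel, mode="wrap")
print(np.allclose(conv_then_shift, shift_then_conv))   # True: equivariance

# A generic fully connected map W x has no reason to commute with the shift.
W = rng.normal(size=(64, 64))
fc_then_shift = translate((W @ image.ravel()).reshape(8, 8), 2, 3)
shift_then_fc = (W @ translate(image, 2, 3).ravel()).reshape(8, 8)
print(np.allclose(fc_then_shift, shift_then_fc))        # False in general
```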
And we can likewise consider a category F fibered in groupoids over the category C, or its associated stack. For our own purposes we have also considered posets, and posets fibered in groupoids, instead of groupoids; I think Daniel will introduce them, but this will not be part of my talk. What is interesting is that with groupoids, beyond the convolutional neural networks with groups, we obtain an interpretation of the long short-term memory (LSTM) RNNs and also, for example, of what are called attention networks, which are very powerful networks; it is a generalization that is very interesting for us. Then we have the languages. We now have to consider another fibration over F, denoted A in this case, which chooses an adapted language and a semantics over every object of the architecture, and over every context in its internal categories. The objects, made of an object U of the architectural category C together with an object of the fiber F over U, represent the specific contexts in the layer represented by U, and each of them possesses a reservoir of logic in the classifier of parts Ω_U. The transmission of the possible logics between layers and contexts, for a morphism α, goes in both directions. Daniel will explain these logical functors in more detail: the covariant one, π_* (with a lower star), and the contravariant one, π^* (with an upper star), which come respectively from the right adjoint F_*α and the left adjoint F_!α, the adjoint extensions of the pullback defined by the functor F_α; you will have a more detailed explanation in the next talk. They give rules for transforming the formulas, or assertions, available at one layer into formulas or assertions at another connected layer, backward or forward depending on which one we consider: π^* is a kind of projection, while π_* is a section of π^*; one goes toward the output theories, while the other enriches them by considering other possibilities. Again, this will be explained in the next talk. Now, just before considering this concept of information, I would like to briefly show you the results of some basic experiments, and I would like to warmly thank Xavier Giro, who performed all of them. The first ones were done with small networks. We were inspired by a result of two Greek neuroscientists, Moschovakis and Neuromyotis, who analyzed what happens downstream of what are called the motor equivalence cells, and found that the neurons that come afterwards were in fact performing Boolean propositional calculus. We wanted to see exactly what happens if those neurons are replaced by artificial neural networks. So we modeled the output of the motor equivalence cells by activation signals distributed according to a von Mises probability distribution, with an activator a that can take three values: E, corresponding to an activation of the eye of the monkey — these experiments were done on monkeys; H, corresponding to the hand; and EH, corresponding to both eye and hand.
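To give the flavor of this setup, here is a rough sketch in code — not Xavier's actual experiment: the class-specific von Mises parameters, the hidden size and the way the angles are fed to the network are illustrative assumptions on my part.

```python
# Sketch of the experiment: von Mises-distributed activation angles for the
# three classes E (eye), H (hand), EH (both), fed to a tiny tanh network.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
params = {"E": (0.0, 4.0), "H": (np.pi / 2, 4.0), "EH": (np.pi / 4, 2.0)}  # (mu, kappa): assumed values
n = 300
X, y = [], []
for label, (mu, kappa) in params.items():
    angles = rng.vonmises(mu, kappa, size=n)            # von Mises activation signal
    X.append(np.column_stack([np.cos(angles), np.sin(angles)]))
    y += [label] * n
X = np.vstack(X)

# Three layers as in the first experiment: input, one hidden layer, output.
clf = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    max_iter=2000, random_state=0).fit(X, y)
print(clf.score(X, y))

# Sign analysis of a hidden neuron, in the spirit of the talk:
# does its sign depend only on the class, or also on the angle?
hidden = np.tanh(X @ clf.coefs_[0] + clf.intercepts_[0])
for label in params:
    signs = np.sign(hidden[np.array(y) == label, 0])
    print(label, "constant sign" if len(set(signs)) == 1 else "sign varies")
```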
We used very small neural networks. The first experiments had three layers: an input layer L0, an output layer L2, and just one hidden layer L1 — the numbers shown are just the numbers of neurons per layer. Then we also tried four layers and five layers to see what happens. The activation function was the hyperbolic tangent. Very quickly, here is what happens. On these circles you can see the activation of one neuron in some hidden layer, for cell one, cell two, cell three and cell four; this is the way they are encoded. The blue curve represents the response of the neuron when the input is the eye, the red one when it is the hand, and the green one when it is both eye and hand. When the curve is dashed, it means the sign of the output of the hyperbolic tangent is minus one; when it is solid, it is plus one. As you can see here, when the curve is red the sign of the response depends on the angle, but when it is blue or green it does not. So we cannot deduce any logical behavior from the red curve, but we can deduce one from the blue or green curves. If it is blue, that is E, we can say that E implies the sign minus one, and EH implies minus one as well. If we contrapose those two implications, we obtain that plus one implies the hand: if the output of this cell is positive, then it was the hand that was activated at the input. We can see the same for cell four: only one curve does not change sign, the blue one, so E implies plus one here, and if we contrapose this implication, minus one implies hand or both. So you can see that such a neuron already performs Boolean propositions. With three hidden layers, the network generates complete triplets of cells; a triplet is then sufficient to conclude in every case, because each cell of the triplet yields an implication of the form plus one implies E, plus one implies H, or plus one implies EH (and similarly with minus one). From the behavior of the three cells of the triplet we can conclude in every case: when a configuration implies the false, it means that this configuration simply never happens, and for the other configurations we can always conclude without any ambiguity.
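Written out explicitly, the deduction above is just a contraposition — in my notation, with s the sign of the tanh output:

```latex
\big(E \Rightarrow s=-1\big)\ \wedge\ \big(EH \Rightarrow s=-1\big)
\ \Longrightarrow\
\big(s=+1 \Rightarrow \neg E \wedge \neg EH\big)
\ \Longleftrightarrow\
\big(s=+1 \Rightarrow H\big),
```

and for cell four, E ⇒ (s = +1) gives by contraposition (s = −1) ⇒ (H ∨ EH).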
What is interesting in this experiment is that we used two different encodings of E, H and EH at the output. In the first one, we encoded these three activities on an interval, with E at one end, H at the other end and EH in the middle; the problem is that in this case it was very hard to make logical cells appear. The second encoding respects, in some sense, the group of symmetries of the problem — you can basically exchange E and H — and with it the logical cells appeared much more easily. This shows that the action of this symmetry group is very important here; it is the symmetry. Then we also did some experiments with more than three classes. We considered what we call the logical information ratio, which is the number of decidable logical propositions at each layer divided by the number of logical propositions that can be generated in the theory. You can see that when you go from the input layer to the output layer, this logical information ratio increases, and at the end you can basically decide everything in the theory. We have also done some experiments based on predicate calculus. Very quickly, in these we considered two or three bars of different colors, on an interval — a line — or possibly on a circle, a line modulo something, which we also tried. For two bars, the questions asked were: are the two bars disjoint? Is one bar included in the other? Or do they intersect while the shorter one is not included in the longer one? Compared to the input layer, where you only sense a region of this interval or of the circle, the propositions involved here are predicates, not just propositional calculus; with three bars, the same questions are asked, of course with more possibilities. Here are the first results. A bar can simply be encoded by its center and its length. If we train with bars of respective lengths five units and three units, and we test with the same lengths, then — let's use this figure rather — the testing looks almost perfect; but that is because, testing with the same lengths, we are not asking the neural network to generalize in any sense. If we do ask it to generalize — still training with lengths five and three, but testing for example with lengths four and six, and also exchanging the bars, so that the longer one in training becomes the shorter one in testing — then, as you can see, the results are not bad, but they are quite blurry here, here, here and here; not bad, but a little worse than in the preceding case. What is also interesting is that with fewer than three layers, the network does not use logic to perform the task, it uses Fourier analysis; from three layers on, it is a logical analysis that is performed. Another interesting point: disjointness and inclusion-only are the most frequent outputs, and in order to decide inclusion-only, most neurons, instead of being trained to decide it directly, eliminate the two other possibilities — probably because inclusion-only is more difficult from the point of view of predicate calculus than the two other cases, but this is just a hypothesis.
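A sketch of how such a two-bar data set could be generated — my own code with assumed conventions (interval length, center-and-length encoding, label names), not the experiment's actual script:

```python
# Two bars on an interval, encoded by (center, length), labelled by the three
# questions in the talk: disjoint, one included in the other, intersect without inclusion.
import numpy as np

def label(c1, l1, c2, l2):
    a1, b1 = c1 - l1 / 2, c1 + l1 / 2
    a2, b2 = c2 - l2 / 2, c2 + l2 / 2
    if b1 < a2 or b2 < a1:
        return "disjoint"
    if (a2 <= a1 and b1 <= b2) or (a1 <= a2 and b2 <= b1):
        return "included"
    return "intersect_only"

def make_dataset(lengths, n=1000, line=20.0, seed=0):
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(n):
        l1, l2 = lengths                           # e.g. (5, 3) for training
        c1 = rng.uniform(l1 / 2, line - l1 / 2)    # keep the bar inside the interval
        c2 = rng.uniform(l2 / 2, line - l2 / 2)
        X.append([c1, l1, c2, l2])
        y.append(label(c1, l1, c2, l2))
    return np.array(X), np.array(y)

X_train, y_train = make_dataset((5.0, 3.0))        # train with lengths 5 and 3
X_test, y_test = make_dataset((4.0, 6.0))          # test with lengths 4 and 6, exchanged
```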
If we enrich the training, then we observe a remarkable logical behavior: just by using the outputs of two neurons, we can basically answer the questions that are asked. Of course, it requires quite a high generalization power from the neural network, but it is not bad at all. There is also a very nice relation between the weights of the last layers and the logic: it is as if the weights were performing the proof. For instance, if you consider the histogram of the deductive power of the weights applied to the quantized activities, over all possible triplets of neurons at this layer — here with six objective functions, that is, three colors — you see that the weights are distributed almost everywhere; but if you select only the interesting triplets, then the weights have a much narrower distribution. That is really because in this case they basically perform the proof from, for example, the last hidden layer to the output layer. We had more, but I will have to skip it because I do not have much time. Okay, so now let's go to part four, which concerns the notion of semantic information. If you want to define what semantic information is, you need to understand how semantics appears when, for example, you use a neural network. First of all, the semantic category that we will consider is quite a general one. Artificial intelligence is connected to the real world, the one we are perceiving, so the languages have to be as general as possible; they cannot be only the languages currently used, for example, in mathematics — they have to be richer than that. As suggested by Lambek and Scott, a good caricature of semantics is the interpretation of a language in a complete category; a topos is of course a good example, but here we aim at being more general, and what we propose is a biclosed monoidal category. This is a category such that, for any triple of objects X, Y and A, there exist two exponential objects, one on the left and one on the right, and natural bijections such that certain equivalences are satisfied. What does it mean in terms of language? If X, Y, etc. are the meanings of something, then the tensor product is the composition, and the exponentials on the left and on the right are respectively the conditioning of A by the presupposition of X, or by the post-supposition of Y. The arrows in this category are associations of meanings, invocations, et cetera. Of course, if the two exponentials commute, we recover the classical case of toposes. In this setting, theories are collections of objects and arrows such that, if A belongs to the theory T and there is an arrow from A to B, then B has to be in T; and the two actions of an object A — the one on the right and the one on the left, given by the exponentials — will be named conditioning.
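For reference, the natural bijections behind the two exponentials are, in my notation, the usual ones of a biclosed monoidal category:

```latex
\mathrm{Hom}(A \otimes X,\, Y)\;\cong\;\mathrm{Hom}\!\big(X,\; {}^{A}Y\big),
\qquad
\mathrm{Hom}(X \otimes A,\, Y)\;\cong\;\mathrm{Hom}\!\big(X,\; Y^{A}\big),
```

naturally in X and Y; one exponential conditions by a presupposition, the other by a post-supposition, and, as just said, when the two coincide we are back to the classical case of toposes.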
And these conditionings will be essential to define the notion of semantic information, which Daniel will define in more detail in the next talk. Now let's see what we call data; in machine learning, in fact, we will see that they are not really just data sets — they are much more than that. To see this, consider the case of supervised learning. We have input data ψ, which are elements of a big set X of possible data — basically all the possible data that can be seen at some point by the neural network. And then you have to deliver as output some theories T_out, belonging to a set of theories Θ. In the classical setting of machine learning, a neural network is seen as a parameterized set of functions F_W — parameterized by the weights W — from X to Θ, which associates a theory to any datum ψ; for example, you have data and you have to say whether "it is a cat" is true or false — that is a very simple example. There is a universal approximation theorem, by Cybenko in 1989, which says that continuous maps from a compact subset K of a numerical space — here, the space of input data — to another numerical space can be approached uniformly on any compact subset by a standard neural map with a fixed nonlinearity of sigmoidal type; sigmoids are an example of activation function. Basically, this shows that the neural network works well for interpolation. But the problem is that this holds on a compact subset; what happens outside the compact subset? In theory, even with a low probability, some completely new input can occur, and in that case you have absolutely no guarantee that you will find the right theory corresponding to these data. So what about extrapolation? It is related to what is called generalization in machine learning, which is the ability of the network to extrapolate. In order to get that, a theorem of analysis is not enough at all; we need something else. We need to capture the essence, the structures of the data with respect to the goal, which is the task or the question asked to the network. What we want is that a small set of data X_0 — the set of data used for training the network — can be considered representative of the learning problem, so that training on this data set alone is sufficient to know what will happen over the whole set of possible data. And the approach of deep learning is what I presented to you in the second part: to construct an architecture, expressed by the Grothendieck site of the network; a stack, constructed by using fibrations in groups, groupoids or more general categories; and languages, in a fibration over this stack of layers of neurons, which together are able to extract the structure of a domain from a sampling X_0. The relations between data and theories then go through properties, and for that we need invariance by the action of a group, or of something more general: a groupoid, or an even more general category.
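As a reminder, the universal approximation theorem mentioned a moment ago (Cybenko, 1989) can be stated as follows, for a sigmoidal nonlinearity σ:

```latex
\forall f \in C(K,\mathbb{R}),\ K \subset \mathbb{R}^{n}\ \text{compact},\quad
\forall \varepsilon > 0,\
\exists N,\ \alpha_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^{n}:\quad
\sup_{x \in K}\Big|\, f(x) - \sum_{i=1}^{N} \alpha_i\, \sigma(w_i \cdot x + b_i) \Big| < \varepsilon .
```

This is exactly the interpolation guarantee on a compact set; it says nothing about inputs outside K, which is the extrapolation and generalization issue just discussed.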
Here, in fact, we introduce the action of something much more general than the action of a group on a set, or of a groupoid on a set: the action of a category G on another category V. The category G acts on the category V when a contravariant functor F from G to V is given. Because, when for example a group acts on a set, we need to consider elements, we also need to define elements in the category V: an element of an object V is simply a morphism φ from some U to V. Now the definition of the action: G acts through this functor F from G to V, and we take V = F(A). The orbit of φ under the slice category G/A is then the functor from this slice category to the corresponding slice category of V which associates, to any morphism A′ → A, the corresponding element of F(A′) in V, and, to any arrow from A″ to A′ over A, the corresponding morphism from U → F(A′) to U → F(A″). In this way, the theory of toposes, stacks and languages extends the notion of actions of categories, and of their morphisms, to the action of a fibered category F on another fibered category. From group equivariance — which is represented, for example, in the structural properties of CNNs — we pass to categorical equivariance. In particular, given a sheaf of categories from C to Cat — for example a stack in groupoids, or in some other categories — which we consider as a structure of invariance, and another sheaf M that we consider as a structure of information flow — for example possible theories, or the information spaces that Daniel will define afterwards — and given an object Q of C, an action of F on M is a family of contravariant functors satisfying a nice commutation relation. This is a vast generalization of group equivariance, and it allows us to consider much more general structures on neural networks, in order to take many structural aspects into account, so that we will be able to generalize much better than what is currently done. Okay, sorry, it is getting a little bit late. We have made a hypothesis of invariance enlargement: it means that between the input side and the output there exists a kind of layer, which we call a maximal invariance layer, that contains basically the full invariance of the problem. And since it is late I have to be very quick. The correspondence from the input data to the output theories Θ will be said to be justified if there is a language L_X — external, or coming from some supervisor — wider than the language of the output, the language of the question, and broader than the language of the input (which is basically very simple: many objects, for example the pixels of the image, but essentially no morphisms), these two being the languages respectively adapted to the question at the output and to the encoding of the input; and if this correspondence factorizes through the language L_X by a collection of expressions of the following type: the aspects A, B, C of ψ in the language of the input, expressed by sentences in the language L_X, characterize the proposition in the language of the output.
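To fix ideas, the commutation relation mentioned above reduces, in the familiar case of a group G acting on sets, to ordinary equivariance of the layer maps — a standard fact, stated here in my own notation:

```latex
F(g \cdot x) \;=\; g \cdot F(x) \qquad \text{for all } g \in G,\ x \in X,
```

which, for G the group of 2-d translations and F a convolution, is exactly the translation equivariance used in the convolutional networks of the second part.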
In this setting we can then propose a kind of theorem of semantic coding. We have two fibered categories, at the input and at the output — two semantic sheaves of languages, respectively at the input and at the output, given by these fibered categories — and we assume that on F, a central category, which will be the place of maximal invariance, we have a language A that can provide the justification for the mapping σ from the input to the output. Then, to prove that every justified problem can be realized by a triple (C, F, A), we must realize this F and A by a stack and languages over a site C that is given by a neural network architecture, and the invariance under F will be isomorphic to the maximal enlargement of the stack. For now this is just a hypothesis, a kind of conjecture that we hope to be true. Okay. The first notion of semantic information was introduced by Carnap and Bar-Hillel in 1952. What they had were elementary propositions; in their example there were three subjects a, b, c, individuals, persons, that could have two different attributes: M for male or F for female, and Y for young or O for old. The elementary propositions are statements in which you say, for example, that a is young and male, b is young and female, and c is old and male. The combinatorics, in this case, is the Boolean algebra of all possible combinations of these elementary propositions. What is interesting is what the information could be in this case. By considering certain shapes in certain spaces — I will show in which way — we can see that there exists a Galois group G of the language, generated by the permutations of the n subjects (here three), the permutations of the values of each attribute, and the permutations of the attributes that have the same number of possible values. Here the group of subject permutations is the symmetric group S3; there is the transposition of the values of the age attribute, the transposition of the values of the gender attribute, and the exchange τ of the two attributes, which is possible since both have two values. The group generated by these transpositions and τ is of order eight: it is simply the dihedral group D4 of all the isometries of the square, where the stabilizer of a vertex is a cyclic group C2 of type σ or τ, and the stabilizer of an edge is of type σ_A or σ_G, as denoted on the slide. So the Galois group of the language is G = S3 × D4. Notice that the language L here is a sheaf over the category G, which plays the role of the fiber F, and C has only one object U_0 in this case. There are four types of orbits, and what is interesting is that each of these types generates a new proposition in this bigger language: for example, type one can be translated into "all the subjects have the same attributes", and similarly for the other types.
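As a concrete illustration of the Carnap–Bar-Hillel setting, here is a small sketch of mine using their classical measures — the logical probability m, cont = 1 − m and inf = −log₂ m — which are not spelled out in the talk but are the standard definitions from their 1952 work:

```python
# Enumerate the state descriptions for three subjects a, b, c, each with a
# gender attribute (M/F) and an age attribute (Y/O), and compute the classical
# Carnap--Bar-Hillel semantic information of an elementary proposition.
from itertools import product
from math import log2

subjects = ["a", "b", "c"]
states = list(product(product("MF", "YO"), repeat=3))   # (2*2)^3 = 64 state descriptions

def m(prop):
    """Logical probability: fraction of state descriptions where prop holds."""
    return sum(prop(dict(zip(subjects, s))) for s in states) / len(states)

# "a is young and male, b is young and female, c is old and male"
prop = lambda w: w["a"] == ("M", "Y") and w["b"] == ("F", "Y") and w["c"] == ("M", "O")

mp = m(prop)
print("m =", mp, " cont =", 1 - mp, " inf =", -log2(mp), "bits")
```

For this elementary proposition m = 1/64, so cont = 63/64 and inf = 6 bits — a single scalar value, which is precisely the limitation addressed next.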
This is why we need to consider information measures — semantic information measures — that are not only scalar quantities, as is the case for Shannon's information measures, but that take their values in a space. Okay, so maybe I have to stop here. Sorry. Thank you very much.