Yeah, exactly. In fact, I hesitated regarding the question mark, because at the beginning I was not sure at all. I mean, there are probably many aspects of toposes that could be very interesting right now for what we are doing, but there are other aspects for which, for now, I have no idea what they could bring to our activity at Huawei, essentially in wireless networks. So, in fact, this talk is essentially a way of trying to identify, among the topics covered by these wireless networks, the topics in which toposes can bring some value. At the beginning I had very few of them, and then, by thinking, by looking at the literature, many other ones came to my mind, and I think it's just the beginning; probably in the near future we will be able to identify other very interesting topics where toposes could bring solutions to some of our problems. Okay, so let's start with the problematics that we have in wireless networks and that could be impacted by toposes. So first, what is a wireless network? What is wireless communication in general? In fact, it is something that uses a lot of very diverse tools. For example, information theory, essentially Shannon theory, and signal processing, because in these wireless communication networks we have to transmit bits: these bits have to be converted into signals, and then, of course, at the receiver these signals have to be processed in a way that minimizes, for example, the error probability or some other measure. We also have optimization; there are many points where optimization can be useful, and I can give you an example. In wireless networks, the wireless network is a cellular network, and the problem is that each cell interferes with the neighboring cells, right?
And the problem, of course, is that if everybody starts to talk loudly, you will impact the signal-to-noise ratio of the neighbor, who will start to speak even louder, et cetera, and of course it will diverge. That's why there are procedures, called power control, to avoid these problems, and we have to find the optimal power control solution, so it's an optimization problem that can be centralized or decentralized. We also have problems related to data fusion: for example, if we have sensors, we want to use them to localize terminals or to do something else. And then we have graphs. Graphs can be used to study the cellular network; they can also be used for channel coding, and I will give you some examples afterwards. And finally (maybe I forgot some of them, sorry for that), at least in my list, there is channel modeling, which is very important because this is the basis of the wireless network: in order to transmit or to receive the transmitted signals efficiently, we need to know the behavior of the channel, and we need to model the channel in some sense. Okay, so you see, there are quite a lot of technical areas impacted by wireless. So now, if I just copy the definition of a Grothendieck topos, what can be the relation between this and that? This is not obvious at all, and this was the problem I was faced with. Which of these items can be impacted by this? It's not easy at all to answer directly. But just using the definition, we can first try to find some relations between some of these topics, maybe all of them, you will see. For the first one, for example, Daniel has shown in his talk the relation that we can find between information theory and toposes. For the other ones, there can be some relations, and in fact I will start by finding some relations between these points and sheaves.
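The decentralized power control problem mentioned above can be sketched numerically. The talk only names the problem, so the specific algorithm below, the classic Foschini–Miljanic iteration, is an assumption for illustration: each link rescales its power by the ratio of its target SINR to its measured SINR, using only local information, and if the target is feasible the powers converge.

```python
import numpy as np

# Distributed power control, sketched with the Foschini-Miljanic iteration
# (an illustrative choice; the talk names the problem, not this algorithm).
# Each link scales its power by (target SINR / measured SINR).

G = np.array([[1.0, 0.1, 0.1],   # G[i][j]: gain from transmitter j to receiver i
              [0.2, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
noise, gamma = 0.01, 2.0         # receiver noise power, common target SINR

p = np.ones(3)                   # arbitrary initial powers
for _ in range(200):
    interference = G @ p - np.diag(G) * p + noise   # everything except own signal
    sinr = np.diag(G) * p / interference
    p = gamma * p / sinr         # purely local update: own power, own SINR

# After convergence every link meets its SINR target exactly.
print(np.diag(G) * p / (G @ p - np.diag(G) * p + noise))
```

The update is fully decentralized, which is exactly the centralized-versus-decentralized trade-off mentioned above: no base station needs to know the full gain matrix G.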
Okay? Moreover, there is another constraint we have: we need to implement algorithms, we need to perform computations, and so we need to use computational and algorithmic mathematics. That means that, for example, the sites or the topological spaces that we can use will be quite restricted: we need to find the ones on which we can compute some interesting quantities, like homologies, etc. And we also need to find sheaves that give rise to computations that make sense for us. For example, what we can do best, in terms of computation, is linear algebra, and that's why we will mainly focus on these restrictions. Okay. So, first of all, I needed to analyze why we could need sheaves, or, more deeply, toposes. In fact, the property of sheaves that we need the most is this one: from local to global. We have local data, and the question is how to reconstruct the global picture. That's why the computations we will mainly be faced with will be computations of cohomology groups, etc., and also, sometimes, the derivation of criteria for setting some cohomology groups to zero in order to remove the obstructions. Okay. For example, our local data can be local probabilities; they can be LLRs, log-likelihood ratios, which are related to probabilities, for example for channel coding; they can be local interferences, if we consider a cellular network and we want to go from the local interference picture to something more global. And in each case, we need to derive the right sheaves from there. For the last one, I have to confess that I don't have too many ideas for now. Okay. Let's start with one of these applications, which is the channel. The channel is the basis for us. So, the first topic, I would call it learning the channel, and I can tell you why.
Suppose that we have, for example, a cellular network, with here a cell with its base station (you see the antennas), and a mobile terminal that communicates with this base station. I consider in this example the uplink. In general, if we use a simple model, we can describe what we receive on all the antennas; this is why there is this word, MIMO, which means multiple input, multiple output: multiple antennas at the transmitter, multiple antennas at the receiver. In fact, this model is the simplest and the most commonly used. So, what we receive on all the antennas is modeled as a vector y, which is some matrix H times what is transmitted by the terminal, which is also a vector, plus some noise, as you see. This noise component can also incorporate interference coming from the neighboring cells. So, as you can see, here the channel is characterized by this matrix H, all right? The problem is that we have to recover the signals that have been transmitted, which are embedded in the vector x. And of course, if we don't know x and we don't know H, then we are in trouble. So... No, no, it can... it's a rectangular matrix in general, yeah. So, the idea is that we have to know, in some sense, the matrix H. What we do in this kind of network is transmit what we call pilot sequences, which are sequences of known symbols, onto which we project the coefficients of the channel matrix so that we can estimate these coefficients, all right? And at the end, our new problem is just to detect what has been transmitted in x, once we know, or have estimated, the channel coefficients embedded in this matrix H. Okay. This is what happens in the multi-user case.
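The pilot-based estimation step just described can be sketched in a few lines. This is a minimal, noiseless sketch of the y = Hx + n model; the sizes and the pilot design (orthogonal complex-exponential rows) are illustrative assumptions, not the system's actual parameters.

```python
import numpy as np

# Pilot-based MIMO channel estimation: known pilot symbols are transmitted,
# and the receiver recovers H by least squares. Noiseless toy sketch.

n_rx, n_tx, n_pilots = 4, 2, 8

rng = np.random.default_rng(0)
H = (rng.standard_normal((n_rx, n_tx))
     + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)   # true channel

# Orthogonal pilot rows, so that X_p @ X_p^H = n_pilots * I.
t = np.arange(n_pilots)
X_p = np.exp(-2j * np.pi * np.outer(np.arange(n_tx), t) / n_pilots)

Y_p = H @ X_p                          # received pilots (noise omitted here)

# Least-squares estimate: H_hat = Y_p X_p^H (X_p X_p^H)^{-1}
H_hat = Y_p @ X_p.conj().T @ np.linalg.inv(X_p @ X_p.conj().T)
print(np.allclose(H_hat, H))           # exact recovery in the noiseless case
```

With noise added to Y_p, the same formula gives the least-squares estimate, and the estimation error grows with the noise level, which is why longer pilots help and why they eat into the data budget, as discussed next.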
So, you see, you have exactly the same kind of expression, but then we have the sum of the two components coming from the two terminals. Now, 5G is in fact using what is called massive MIMO, which means that at the base station we have a lot of antennas, many, many antennas; it can be 64, 128. Okay. And what is called CSI, channel state information, that is, the knowledge of the channel, of the H matrices, lives in this case in a high-dimensional vector space, because of massive MIMO. So, the problem is that in order to estimate all these coefficients, we need to send a lot of pilots, all right? And if we send pilots, then we have less room for sending data. That's why one possibility could be this: even though the CSI lives in a very high-dimensional vector space, what is sure is that, for a given cell, the set of all possible CSI doesn't fill the whole space, but rather lives on a manifold whose dimension, if we have many, many antennas, can be much smaller than that of the whole original space, all right? And so the idea here would be, instead of using this high-dimensional vector, which requires a lot of pilots and so prevents the transmission of high-rate data, to use this manifold of lower dimension, all right? In this case, it will reduce the signaling and also reduce the feedback. The feedback can be necessary, for example, if you want to transmit in the downlink, from the base station to the terminals: the base station has to know the channel which goes from the base station to the terminals, and which is accessible only at the terminals, at least in some cases, let's not be too general. In this case, learning this manifold, knowing what this manifold could be, would be very valuable for us, okay?
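The claim that the CSI concentrates on a low-dimensional set can be illustrated on a toy model. Everything below is an assumption for illustration: a 64-antenna uniform linear array with half-wavelength spacing and a far-field line-of-sight channel, so each CSI vector depends on a single parameter, the angle of arrival. A PCA then shows that samples from a narrow angular sector occupy far fewer than 64 dimensions.

```python
import numpy as np

# Toy illustration of the low-dimensional CSI manifold (all parameters assumed).
n_ant = 64
thetas = np.linspace(0.20, 0.30, 500)      # narrow sector of arrival angles (rad)

# Steering vectors: a(theta)[k] = exp(j * pi * k * sin(theta))
k = np.arange(n_ant)
A = np.exp(1j * np.pi * np.outer(np.sin(thetas), k))   # 500 CSI samples in C^64

# PCA via SVD: how many directions carry 99% of the energy?
s = np.linalg.svd(A, compute_uv=False)
energy = np.cumsum(s**2) / np.sum(s**2)
eff_dim = int(np.searchsorted(energy, 0.99)) + 1
print(eff_dim)                              # far fewer than 64 dimensions
```

In this sketch the underlying manifold is literally one-dimensional (parametrized by theta), and the effective linear dimension stays small because the sector is narrow; a real channel has more parameters, but the same concentration effect is what the talk proposes to exploit.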
Then another problem, I mean another application related to the channel, is fingerprinting. There are some people in our lab who have implemented a fingerprinting approach in order to obtain the localization of the terminals by using machine learning. The idea is to use the CSI, so the channel knowledge at the base station, in order to determine the localization of the terminal, okay? So, what is the idea? The idea is that by using supervised machine learning we train a neural network (it can be extreme learning, deep learning, there are many possibilities) in order to learn a given function phi, which defines a manifold here as well, and this function is in fact a function of the CSI and of the position of the terminal, okay? So, we train a neural network in order to learn this function, at least partly, and then, once this stage is finished, we can find the localization as a function of the CSI, all right? The problem, in this case as well as in the other one, is that the manifold we have to learn is in general learned with noise: there can be noisy observations, and we can have small variations of the channel, because the channel behavior also depends on the obstacles that the waves find on their path, and these obstacles may be fixed or moving, like cars or human beings, yeah? So, this is another example where we need to learn some manifold which can be impaired by noise, by some clutter, by some small changes of the channel, all right? And for these two cases, in fact, we can propose to use persistent homology to solve these problems, at least partly. Okay? Another thing, which is quite amazing: this morning I discussed with Daniel these belief propagation algorithms on graphs, and I didn't know that there were some solutions based on the computation of homologies, so that's good news for us.
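The fingerprinting idea can be sketched in its simplest possible form, replacing the neural network by nearest-neighbor lookup in a fingerprint database. The synthetic map from position to CSI below is a stand-in assumption, not a channel model; the point is only the structure of the method: learn (here, store) position/CSI pairs, then invert the map at query time.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fingerprint database: CSI vectors measured at known positions.
n_points, csi_dim = 200, 16
positions = rng.uniform(0, 100, size=(n_points, 2))     # (x, y) in meters

def csi_of(pos):
    """Toy stand-in for the true position -> CSI map (an assumption)."""
    freqs = np.arange(1, csi_dim + 1)
    return np.cos(freqs * pos[0] / 40.0) + np.sin(freqs * pos[1] / 40.0)

database = np.array([csi_of(p) for p in positions])

def localize(csi_measured):
    """Nearest neighbor in CSI space: the crudest possible learned phi^-1."""
    d = np.linalg.norm(database - csi_measured, axis=1)
    return positions[np.argmin(d)]

true_pos = positions[17]
est = localize(csi_of(true_pos) + 1e-6 * rng.standard_normal(csi_dim))
print(np.allclose(est, true_pos))
```

The fragility discussed above is visible here: with larger measurement noise or with a time-varying csi_of (moving obstacles), nearest-neighbor lookup degrades, which is exactly why one would want the persistent, noise-robust structure of the manifold rather than raw samples.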
Okay, especially we have this problem of making belief propagation work on graphs, essentially in channel coding, okay? So for example in 5G, the fifth generation of cellular wireless networks that we are developing right now: the channel codes that have been adopted for 5G (I mean, for what has been done right now, but maybe there will not be so many changes in the next releases) are what are called LDPC codes, in order to encode the data bits, and polar codes, in order to encode the control bits, the control channels. Data is the information that is sent, and control is the bits that are useful in order to make the network work, okay? These two families of codes are quite different. LDPC codes are decodable by using belief propagation on a graph which is called a Tanner graph, and polar codes are decodable using another kind of algorithm, which is called successive cancellation, and apparently it's not related to belief propagation, not yet, let's see. What we know about the decoding of LDPC codes is that the Tanner graph of LDPC codes, on which we have to run belief propagation, is loopy: we have cycles, and in this case we know that belief propagation doesn't work, I mean, we cannot make it work well, or it's something that is very painful, very complex, okay? Oh, sorry, yeah, sure, sorry for that. So you have a graph, and the idea is that you have some probabilities on the vertices of this graph, right? And some vertices can correspond to some constraints that will change the probabilities: you start with unconstrained probabilities and then you take into account these constraints, right? And the idea is to do what is also called message passing on the edges of the graph, so that you take into account locally all these constraints, right?
And at the end the idea is to have a global understanding. That means the optimal thing to do would be to take into account all these constraints at once. If you have a tree, then it's quite easy to do; but if you have cycles, then we get some issues, and it's much harder to obtain the optimal solution in this case, right? But for LDPC codes, in fact, what we do to decode them is to run belief propagation as if the graph were a tree, because what we know is that when the length of the LDPC code goes to infinity, the Tanner graph tends to be a tree. Of course we are not using infinite-length codes, so we use belief propagation as if it were a tree; it is therefore suboptimal, but we design the code in such a way that the gap to optimality is kept small, all right? Okay, so in this case, let's say that we don't really need to run belief propagation on a loopy graph. But there is another case, which is polar codes. Very quickly: polar codes are based on what is called a kernel. The simplest kernel, the original one, is called the Arıkan kernel; Arıkan is the inventor of polar codes. It works in this way: u1 and u2 are two bits, all right? And then x2 = u2 and x1 = u1 + u2, okay? Then x1 and x2 are sent into a given channel that can introduce errors or erasures, or can add Gaussian noise, for example. And so, in fact, x1 and x2 will see exactly the same channel, right?
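The effect of applying this kernel recursively can be sketched numerically for the binary erasure channel (BEC), where the polarization formulas are exact and standard: a BEC with erasure probability e splits into a worse channel with erasure 2e - e² and a better one with erasure e².

```python
# Polarization of a binary erasure channel under the Arikan kernel.
# Recursing n times on BEC(e) yields 2^n synthesized channels with
# erasure probabilities built from e- = 2e - e^2 and e+ = e^2.

def polarize(e, n):
    """Erasure probabilities of the 2^n synthesized channels after n steps."""
    chans = [e]
    for _ in range(n):
        chans = [c for e_i in chans for c in (2 * e_i - e_i**2, e_i**2)]
    return chans

chans = polarize(0.5, 10)                 # 1024 channels from a BEC(0.5)
good = sum(1 for c in chans if c < 0.01)  # nearly perfect channels
bad = sum(1 for c in chans if c > 0.99)   # nearly useless channels
print(good / len(chans), bad / len(chans))
```

The average erasure probability is preserved at every step (e+ + e- = 2e), so as n grows the good fraction approaches the capacity 1 - e; information bits are then placed on the good channels and frozen bits on the bad ones. At n = 10 the split is still visibly incomplete, which is the slow-polarization problem discussed next.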
And the outputs of the channel will be y1 and y2. But then, in this case, the question is: the information bits u1 and u2, which channels will they see? Because of course x1 and x2 see the same channel, but u1 and u2 will not see the same channel. So here is how it works. Okay, this is the matrix form, and let's do a capacity analysis of what happens here. Suppose that I(X; Y) is the mutual information between X and Y; it can carry an index because it's exactly the same channel we have here. If we consider this vector channel now, the mutual information is the sum, because these two channels are independent, and so this is I(X1, X2; Y1, Y2). Then I replace: what I want to know is what happens if, instead of having x1, x2, you have u1, u2. Because in this case you can recover u1 by the addition of x1 and x2, and then, once you have recovered u1, u2 can be recovered from the observation y and the knowledge of u1; then you get this kind of equality, I(X1, X2; Y1, Y2) = I(U1; Y1, Y2) + I(U2; Y1, Y2 | U1), right? And in fact what happens is that this first term is smaller than the symmetric mutual information, which is the capacity in the case of symmetric channels, and the second term will be larger, okay? And now, polar codes are formed by considering the n-th Kronecker power of this kernel, of the Arıkan kernel, okay? And in this case, what happens is that for a given fraction of the bits, the first term will tend to 0, and for the rest it will tend to 1, and the fraction of bits for which the mutual information tends to 1 will be exactly the capacity of the channel when n goes to infinity, all right? This phenomenon is called polarization, channel polarization, and this is why these codes are called polar codes, all right? But then the problem is that this Arıkan kernel is polarizing, but too slowly, too slowly with respect to the length of the polar code. So the idea is that we want
to find some better polarizing kernels; of course their size will be bigger than 2, which is the size of the Arıkan one. A general kernel will be a square matrix of size L which is invertible over the finite field of size 2, GF(2), okay? And okay, let's forget this. What we know is that kernels with a better scaling exponent exist if the length is more than 8, and there is another exponent, the error exponent; these two exponents in fact characterize the way the channel polarizes with respect to the length of the code. For example, this kernel of length 16 is known to be the best kernel of length 16 in terms of error exponent, okay? So now, if we want to find the decoding equations of a big kernel, then for each bit that is decoded there corresponds a Tanner graph, and this Tanner graph may have cycles. This is for example for length 8: for decoding the third bit, this is the corresponding graph, and as you can see there are cycles, and this is a very simple one. If you take, for example, the kernel that I showed you, the eBCH kernel of length 16, then for decoding the bits in the middle, the Tanner graph will really be terrible, something very, very hard to decode. And the way to decode it is by running belief propagation on this kind of Tanner graph; and we cannot do the same as for LDPC codes, we cannot change the code, because this is not a code, this is the kernel, which is fixed. So we have to find a way, and what I was thinking, even before talking with Daniel, was that belief propagation is exactly this local-to-global problem, so there should be some cohomology behind it, and I was happy to learn that it's true, so I will be happy to read what you have done on this topic. All right, now we have identified a couple of problems. I have also identified some publications which apply sheaves to the areas that I listed at the beginning of
my talk, and I can show you a couple of examples. First one. Sorry, before starting: as I told you, we have to use topological spaces on which we can compute things like cohomology groups, and in this case one good choice is to use simplicial complexes. For those who don't know what it is: what is a k-simplex? This is the concrete point of view on simplicial complexes. For example, a 0-simplex is a vertex, a 1-simplex is an edge, a triangle for a 2-simplex, a tetrahedron for a 3-simplex, etc. And a simplicial complex is a finite set of simplices that satisfies these two properties. Okay. Now, in order to apply this, for example to be able to compute cohomology on these simplicial complexes in a way that will be efficient for our applications: this is a very restricted way of defining a presheaf on such a complex, but it is the one which will be useful for us. So, a presheaf with values in the category of vector spaces (or it can be modules), and we define of course the restriction maps, which we will write this way, F(a, b), where a and b are two faces, and of course we have consistency under composition. We also have to define morphisms between presheaves or sheaves: such a morphism assigns to each face a linear map, satisfying this equality, okay? And then we define a quantity, the incidence number [b : a], which equals 0 if a is not a face of b, and otherwise equals plus or minus 1 depending on the orientation. This quantity will be useful in the computations that follow, essentially for the coboundaries, etc.
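The earlier remark that what we compute best is linear algebra can be made concrete here: for a small simplicial complex with constant coefficients, homology reduces to ranks of boundary matrices. A minimal sketch on the hollow triangle (three vertices, three edges, no filled face), which has one connected component and one loop.

```python
import numpy as np

# Betti numbers of a tiny simplicial complex via boundary-matrix ranks.
# Example: the hollow triangle, with b0 = 1 (one component), b1 = 1 (one loop).

vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]

def boundary_matrix(k_simplices, km1_simplices):
    """Signed boundary matrix sending k-simplices to (k-1)-simplices."""
    D = np.zeros((len(km1_simplices), len(k_simplices)))
    for j, s in enumerate(k_simplices):
        for i, omit in enumerate(s):
            face = tuple(v for v in s if v != omit)
            D[km1_simplices.index(face), j] = (-1) ** i
    return D

d1 = boundary_matrix(edges, vertices)   # edges -> vertices
r1 = np.linalg.matrix_rank(d1)

b0 = len(vertices) - r1                 # dim ker d0 - rank d1 (d0 = 0)
b1 = len(edges) - r1                    # dim ker d1 - rank d2 (no 2-simplices)
print(b0, b1)
```

For sheaf coefficients, the same recipe applies with the scalar entries ±1 replaced by the restriction maps weighted by the incidence numbers just defined; the matrices get block structure but the computation stays linear algebra.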
So, this is our coboundary, which is defined this way: a formal sum over all k-simplices, and you see the restriction map appears here. Sorry, okay, let me go on. So we have the classical relations between the kernel of d_k, the cocycles, and the image of d_{k-1}, the coboundaries, and because of that, of course, we have B^k, which is a subgroup of Z^k, and in this case the k-th cohomology group will be defined in this way, as the quotient group H^k = Z^k / B^k, okay? And in this case we get a sequence of linear maps, and the idea of this cohomology group is that only the elements of Z^k that are not already coboundaries are worth mentioning, okay? And now let's apply it to sampling theory. This example comes from this reference, from Michael Robinson, and he proposes a sheaf interpretation of the sampling theorem. In fact, he starts with a sheaf of vector spaces on an abstract simplicial complex, and then he uses a sheaf morphism between two sheaves: F, which is a sheaf on this simplicial complex, and S, which is what is called a sampling sheaf and corresponds to the operation of sampling; it is associated to the sampling morphism from F to S. Then he defines an ambiguity sheaf A, in which the stalk A(a), where a is a face, is simply the kernel of this map. And the sheaf-theoretic sampling theorem that he derives is this: the global sections of F are identical with the global sections of S, which means that we can reconstruct this just from this, if and only if the k-th cohomology group of the ambiguity sheaf is 0 for k = 0 and k = 1. And from there he can recover the classical Nyquist–Shannon sampling theorem of band-limited functions, for example, and some other results coming from graph theory or from quantum graphs, if I remember correctly. Okay, so in fact, in
signal processing, we can, by using sheaves, generalize the results that we had at the origin. Second example: network coding. It has been developed in this paper by Ghrist and Hiraoka. Very quickly: they construct sheaf cohomologies in which H^0 is equivalent to the information flows that we have in the network, and many practical problems may be solved by using these cohomologies, such as the max-flow bounds, network robustness, and some other ones, right? And the final example, persistent homology, which I think will be very valuable in our case if we want to learn the channel: to denoise what we measure, to remove the small variations in order to keep only what is persistent. So it's still simplicial complexes, okay? In this case, now, we define an elementary k-chain in this way, and of course we have two different orientations, which are here represented by the sign plus or minus, and this is a k-chain. In the case of persistent homology, what is used is essentially coefficients in Z, but it can be any abelian group. And then we define the boundary operator in this way; now, of course, we are in the reverse direction compared to cohomology, so we go from k to k-1. And in the same way as before, we have the k-th homology group, which is the quotient group of the kernel, so the cycles, by the boundaries; and we also have Betti numbers, which are useful in persistent homology, defined as the ranks of these homology groups. Okay, so, coming from there, what is persistent homology? Suppose we have a cloud of points in a space; in our case it will be the measurements that we make. From there we need to find structure: we need to remove the noise, the small variations, the clutter, etc., and find some structure inside this cloud of points. Okay, so how to do it? What would theoretically be the most acceptable would be to build a Čech complex in this way: you draw some balls of a
given radius around the points, and then you form a simplex S if the balls of S have at least one point in common, all right? The problem is that computationally it's very hard; it's something for which it will be hard to compute the homology groups. So instead of this, what we do in practice is rather to consider Vietoris–Rips complexes, and in this case we form a simplex if the balls, now considered pairwise, have at least one point in common, okay? So for example, this one is a 2-simplex for the Rips complex but not for the Čech one, okay? And then we consider a nested sequence of topological spaces X0, X1, etc., and this is done, in our case, just by considering varying radii, from the smallest one to the biggest one, and we obtain homology groups for each of them, okay? And then, in fact, we want to identify when a homology class is born and when it dies, all right? So in this case, for example, this one is born here and dies here, and the births and deaths of the homology classes will be very important, because with them we can construct barcodes. And long bars have a long life, which means that long bars are the persistent features, all right? So, for example, if we use the Rips complex: this is our cloud of points, you see they are more or less on a torus, a donut, and these are the complexes we obtain when the radius increases. And for the same thing, on the x-axis, parametrized by epsilon, you have the homology groups H0, H1 and H2; as you can see, this one is persistent, yeah. Sorry, how long does it have to live?
This I don't know exactly; it depends probably on the application, it depends mainly on the application, okay. And I found something related to the topos of persistence. In fact, it's a kind of generalization of the topos of sets, in which, instead of having just sets, we have sets indexed by time. These are given by the barcodes that I have shown to you, which correspond to time-indexed sets, and on these time-indexed sets we can construct a persistence Heyting algebra of intervals, ordered by inclusion, and the sheaves on this algebra P encode barcodes, all right? And, sorry, a very simple example. Here, this 0 corresponds to the time and this one to the dimension of the homology. So here, for example, as you can see, H0(0) is essentially the free group on x and y quotiented by 0, so it is isomorphic to Z + Z, and H1(0) is just 0, because you have only 2 vertices, so 2 points. Then at time 1 we have this, and we can compute H0(1), which is this, and it is in fact isomorphic to Z, and H1(1) can be computed, and we can see that it is isomorphic to Z as well, etc.
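These pointwise computations can be sketched in code: give each simplex a birth time, and compute the Betti numbers of the subcomplex alive at each time. The filtration below is a toy assumption loosely modeled on the example just described: two points at time 0 (so H0 of rank 2), joined into a single loop at time 1 (H0 of rank 1, H1 of rank 1).

```python
import numpy as np

# Pointwise Betti numbers along a filtration: each simplex has a birth time,
# and at time t we look at the subcomplex of simplices born at or before t.
# Toy filtration (an assumption): two points, then a third point and a loop.

filtration = {
    (0,): 0, (1,): 0,                    # two vertices born at time 0
    (2,): 1,                             # a third vertex at time 1
    (0, 1): 1, (0, 2): 1, (1, 2): 1,     # edges closing a loop at time 1
}

def betti(t):
    """(b0, b1) of the subcomplex alive at time t, via boundary-matrix rank."""
    verts = sorted(s for s, b in filtration.items() if len(s) == 1 and b <= t)
    edges = sorted(s for s, b in filtration.items() if len(s) == 2 and b <= t)
    if not edges:
        return len(verts), 0
    D = np.zeros((len(verts), len(edges)))          # boundary matrix d1
    for j, (u, v) in enumerate(edges):
        D[verts.index((u,)), j] = -1.0
        D[verts.index((v,)), j] = 1.0
    r = np.linalg.matrix_rank(D)
    return len(verts) - r, len(edges) - r           # no 2-simplices here

print(betti(0), betti(1))
```

Tracking how these ranks change as t sweeps through the birth times is exactly the barcode information, and the gluing described next assembles these pointwise groups into the global, time-indexed picture.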
We can compute the homology groups of all these guys, and so, by gluing, in fact, we can glue the pointwise homology groups together consistently in order to extract global information. Of course it is a toy example, and the global information is this one: for t in the interval [0, 1], we have H0 equal to this, changing to H0 = Z when t goes from 1 to 5; and for the first homology group, for t from 1 to 2 we have H1 = Z, changing to H1 = 0 when t is in the union of these 2 intervals. This is a very simple example of one topos which is related to persistent homology and which may be useful for us. And that concludes my talk, thank you very much.

In fact, there are algorithms for computing this; of course, maybe they have to be simplified, because in general it is very heavy. But there are now some ideas of combining persistent homology, in fact topological data analysis, with neural networks. For now I have just seen one paper, and it's not very easy to understand what they are doing, but this will be the next step, because apparently using topological data analysis plus machine learning seems to be very efficient.

Are there questions? Maybe one: does this method give a hint on what you told us before, that the CSI lives on a small-dimensional manifold? Yes, the idea is to try to characterize these manifolds, in fact, thanks to the homology groups; usually it's not really exactly the homology group that comes out of the spectrum, it's not really topological.

Any more comments? It was very interesting that some people revisit the sampling theorem of Shannon; my question is, have you seen some results on compressed sensing in toposes, which could also be linked to the fact that maybe you want to, from a couple of samples, try to find a global representation of your information? No, I haven't seen that for now, but I haven't really searched in that direction.

Okay, if there are no other questions, we'll take a 15-minute break and then
we'll have a short discussion, like half an hour, so come back at 4 and we'll do it from 4 to 4:30.