Okay, we can start. Tanya, with your last lecture — thank you very much for joining us. Yes, thank you. How are you? Is it okay? Yes, we can see the slides. Alright, so today is the last lecture, and I think I saved the best for last, so we will hopefully have an enjoyable lecture. I prepared lots of figures and a kind of summary, so please remember to ask questions as much as possible. To summarize the last few lectures: we discussed that for binary responses there is an information-preserving population vector, and if it is a logistic function of a linear argument, then there is a simple readout. It is always linear in neural responses, but it can also be linear in terms of the argument of the logistic function. Then we said that because of this logistic nonlinearity, when added across many neurons, you have a compression: the infinite space of inputs is compressed into a finite range that depends on the number of neurons. One way of thinking about this compression is as a hyperbolic representation. In the neural code you have two to the n values, but because the logistic function is continuous, you get a topology on the space of representations. And yes — I see there is a chat question about lectures four and five; I will upload those lectures later today, after this one. So: the optimal hierarchical code, and then a representation of hyperbolic space. The plan for today — so that's optimal, or predicted to be optimal, and the number of neurons is correlated with the curvature, or with the size of the hyperbolic map when measured in units of curvature. This can be a little confusing, because sometimes we talk about the radius of the space and sometimes about curvature, but they are actually the same quantity: either you have a space of unit radius where the curvature is large, or the curvature is one but the space is large.
And if the curvature is less than one, it means the space is not very hyperbolic. So the plan for today is that I will show you evidence of hyperbolic geometry in various instantiations of biology. One of them is produced by plant volatiles, so you can think of it as metabolic signals and a communication channel between plants and animals. Then we will talk about hyperbolic geometry in human perception. We are working on neural data, but we do not yet have concrete figures for hyperbolic geometry there. And the third part will be hyperbolic geometry in mammalian gene expression, so within the cell. That's the plan for today. One of the questions people often ask: we think that these nonlinearities — the logistic function that I described — exist in neural circuits, but as you know, within a cell many of the activation functions for intracellular molecules, transcription factors and so forth, also have this logistic nonlinearity. And many of the signals are organized hierarchically, and for any hierarchical network, hyperbolic geometry provides a good continuous approximation. So that's another justification, or motivation, for why we look for hyperbolic geometry, or use it, in various places. In the first part I will talk about distances between molecules. You can also think of this as distances between neurons — there are many papers on distances between neurons, for example based on correlation — and the thought was to use similar distances between molecules. In the case of molecules it is a long-standing problem, and one approach, which we also sometimes follow, is to define distances according to physical-chemical properties of the molecule. Each molecule is described by a number of descriptors: how long is the carbon chain, what is the molecular weight, does it have a ring, and so on — you can have a thousand descriptors.
And the question often arises how to find a parsimonious description of all of these measurements. You can apply the tools I'm discussing today to these descriptors, but in this talk we use a statistical definition of distances. What do we mean by that? Any natural source, such as a strawberry, produces many molecules at the same time, and you can have many different samples. Just as in neuroscience, where we have many different stimuli and can compute correlations between neural responses across stimuli, we can now think of each molecule as a neuron and compute correlations between their abundances across different samples. In this particular data set, the data comes from the food industry — they are interested in making the tastiest strawberry. Maybe in Italy the fruit is better, but here the fruit is a little bit wooden. Should I mute it or unmute it? Yeah, I'll mute it. Okay. So what we have are measurements of the abundances of various molecules. With the commercial strawberry — there are various evolutionary arguments you can make, but you might notice that it doesn't smell as nice as a wild strawberry, one that doesn't grow in a field. Some say that's because the strawberry in the field is going to be picked and eaten anyway, so it doesn't need to invest any extra energy into smelling nice. So here is the problem. This study measured different genetic varieties of strawberries and the abundances of molecules. From this table you can take the measurements of abundances and compute a correlation, with the understanding that a stronger correlation means the molecules are part of the same pathway, or, if not, that they belong to two different pathways activated by the same transcription factor.
In other words, a stronger correlation means the molecules are somehow more strongly coupled and have a smaller distance. You can use other distances: we can talk about an information distance between molecules, and we can also look at the Euclidean distance between these abundance vectors. One question that can be raised: if we are looking for hyperbolic geometry but we have measurements along thousands of components, is it okay to use the Euclidean distance between these components? That's a point for discussion, and I will show you some evidence that it is okay, and why that might be. So in our case, stronger correlation means smaller distance. Any questions? Because usually this is the slide where the definition of distances is not clear. Any questions? Carlos? Hi. Could you... sorry, I didn't understand between which variables the correlation is being calculated? Yes. The correlation is computed between these two variables. This is a trace for one molecule as a function of samples — think of samples in the natural world — and this is another molecule; you take the correlation between these two traces. In other words — as we discussed for natural stimuli in vision, I take a camera, walk around, make a movie, compute the correlation between two pixels, and plot a power spectrum. For natural stimuli in auditory processing, we take a microphone and go across various environments — there are data sets like that — and I can compute distances between, say, different frequencies or different moments in time. Now, what's the analog for the olfactory world? One possibility is to also walk around and record whatever smells you experience; these data sets are an approximation to that. Rather than walking randomly, they say, well, let's look at this fraction of the natural world.
So in this case, the strawberry. And we can compile other data sets that represent a compilation across many different food sources — wines, cheese, beer, meat; we will talk about mouse urine, so all kinds of environmentally salient signals for mice — and then compute correlations between molecules. Is that okay? So the distance is computed between molecules. You can also — yes? I have a question. If you take a strawberry, it will have lots of chemicals in it — also proteins and all types of metabolites, because it's made of cells. So are you taking the strawberry itself, or are you taking the odor, and how are you defining the odor? I guess it should be what is released by the strawberry into the air. Yes — in practice it means that there is a sample of strawberry; in this case they mash it up in a food processor, put a sample into a gas chromatograph, and measure the abundances. Okay. So these are all volatiles. There are also studies — unfortunately the data set was very small — where, for these strawberries, they can also measure the sugar content and the acidity. With other foods I had a small data set where they measured concentrations of omega-3s and various other nutrients. And this is, I would say, even more interesting, because in the case of the volatiles it is a true communication system: the strawberry is producing volatiles, and part of those volatiles are there to attract the animal to eat it. The animal has to figure things out — it is not that interested in the volatiles themselves, but in the volatiles as an early detector of what's inside. Sometimes food stays in the refrigerator for too long, and I smell a sandwich and ask, should I eat it or not? That's early detection, and you know through experience — or at least I know — that if it smells a certain way, then maybe I shouldn't eat that sandwich.
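The statistical distance just described can be sketched in a few lines. This is a hedged illustration with a made-up abundance table (rows are molecules, columns are samples; the numbers are random placeholders, not real strawberry data), treating each molecule like a "neuron" and correlating abundances across samples:

```python
import numpy as np

# Hypothetical abundance table: rows = molecules, columns = samples
# (e.g. different genetic varieties of strawberry). Values are invented.
rng = np.random.default_rng(0)
n_molecules, n_samples = 6, 20
abundance = rng.lognormal(mean=0.0, sigma=1.0, size=(n_molecules, n_samples))

# Pearson correlation between every pair of molecules across samples:
# C_ij / sqrt(C_ii * C_jj), which np.corrcoef computes directly.
corr = np.corrcoef(abundance)

# One common monotone conversion to a distance: stronger correlation means
# smaller distance; exp(-c) also stays positive for negative correlations.
dist = np.exp(-corr)

assert corr.shape == (n_molecules, n_molecules)
assert np.allclose(np.diag(corr), 1.0)
```

The choice `exp(-corr)` is only one option; as discussed below, the topological method is insensitive to which decreasing function is used.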
We also learned that some of these smells are not from the strawberry itself; they can be from a fungus or bacteria that ferments the strawberries. In ongoing work we are also studying green strawberries, ripe strawberries, and rotten, overripe, fermenting strawberries. And you can tell a difference depending on whether the fruit is fermenting naturally or fermenting in a predefined way, where it has been pasteurized, sterilized, and a specific yeast was added. So some of these smells are from other species. Thank you for the questions. Any other questions? So, because the distances are between molecules and you have many molecules, we now want to create a map of them — I would like to have coordinates for molecules just as we have coordinates for space. For space we have, say, x, y, and maybe some abstract quantity which we don't know. In olfaction, we start with abstract quantities, so this is an example of creating a map for reasonably abstract quantities. Now an analogy — I'm not sure, maybe people can correct me. I heard that at some point someone gave a talk in Paris, and the title was "How to make a suit with one measurement." All kinds of tailors came, and he opened his presentation by saying, "I'm going to approximate the human body as a sphere" — and they all got up and left. I don't know whether that's true, but that's what I heard, and of course he was working on projections, on how to put a curved surface on a flat map. In our case, imagine that you are given distances between cities on Earth, and based on these distances you should be able to figure out what the geometry is. If the distances are within Europe, then we can say they are consistent with the flat-Earth hypothesis, but otherwise it's not going to fit. So we are trying to do a nonlinear dimensionality reduction in two steps: first figure out the rough properties of the space — what is the curvature, positive or negative?
What is the dimension? And once we know the properties of the space, we can place points on it specifically to respect the measurements that were taken. I will discuss several methods today. Some of them are well established, such as multidimensional scaling, which we adapt to the hyperbolic geometry case; the others are more specialized, and they have their advantages and disadvantages. In our case we have measurements between molecules. This is a sample segment from a matrix of six molecules, but in reality it's about 80 molecules, and in the case of genes, thousands of genes. The other matrix is created by putting points randomly on various surfaces and seeing whether the statistics of distances you get match the statistics of distances from the measurements. In this first study we used a topological method, but as I mentioned, there are other methods that we will discuss with respect to other data sets. The topological method is related to persistent homology; this particular algorithm is from a publication from Vladimir Itskov's group, with Giusti as the first author. One could compare the two matrices just by subtracting them, but that can be problematic if there are nonlinearities in the individual measurements of distances — and the advantage of this algorithm, and why we liked it, is that it avoids this. It is a non-metric method: it thresholds the matrix at a given level and assigns connections if the distance is less than the threshold value. For a given threshold you convert the distance matrix into a network, and then this network is evaluated according to how many holes of different kinds there are. By a cycle, or a hole — look here, we have these nodes 5, 3, 1, 6, and they are not fully connected: 1 and 5 are not connected, and 3 and 6 are not connected.
So this part here will be a cycle; when 3 and 6 are added to the network as we lower the threshold for what constitutes connectivity, this cycle will disappear. This is an example of the so-called Betti curve: they plot the number of cycles as a function of the edge density, meaning the threshold. Low density means a very high threshold — a very strong correlation is required for a connection — and as you lower your criterion, the number of cycles first increases, because the network gets more and more connected, and then decreases, because the holes fill in. So the general shape of the curve is that it rises and then goes down, and you can have cycles of different dimensions: this one is a cycle of order one, but you can also have something like a hollow pyramid, which is a cycle of order two. This method is rather sensitive to the distribution of distances: if there are any hubs in the network, the network will grow and fill in in different ways, and the shape of the curve will be different. Any questions about this part? Oh, there is a question. Can you repeat what these matrices are and how they are computed? Okay — the matrix on the left is computed from the data. It doesn't have to be strawberry data; it can be any kind of data set that you have.
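The Betti-curve construction just described — threshold the distance matrix, build a network, count non-contractible cycles, sweep the threshold — can be sketched directly. A minimal illustration, not the published algorithm: it counts first-order cycles (beta 1) of the clique complex, filling in any triangle whose three edges are present, with boundary ranks computed over GF(2); the toy distance matrix (four nodes on a square) is invented for the demo:

```python
import numpy as np
from itertools import combinations

def gf2_rank(rows):
    """Rank of a set of GF(2) row vectors given as integer bitmasks."""
    basis = {}
    rank = 0
    for row in rows:
        cur = row
        while cur:
            lead = cur.bit_length() - 1
            if lead in basis:
                cur ^= basis[lead]      # reduce by the existing pivot
            else:
                basis[lead] = cur
                rank += 1
                break
    return rank

def betti1(dist, thresh):
    """First Betti number of the clique (flag) complex whose edges are
    pairs with dist <= thresh: b1 = (#edges - rank d1) - rank d2."""
    n = len(dist)
    edges = [(i, j) for i, j in combinations(range(n), 2) if dist[i][j] <= thresh]
    eidx = {e: k for k, e in enumerate(edges)}
    # boundary of an edge = its two endpoint vertices (mod 2)
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    # any triangle whose three edges are present gets filled in
    tris = [t for t in combinations(range(n), 3)
            if all(p in eidx for p in combinations(t, 2))]
    d2 = [sum(1 << eidx[p] for p in combinations(t, 2)) for t in tris]
    return len(edges) - gf2_rank(d1) - gf2_rank(d2)

# Toy distance matrix: four nodes on a square, side 1, diagonal 2.
D = np.array([[0, 1, 2, 1],
              [1, 0, 1, 2],
              [2, 1, 0, 1],
              [1, 2, 1, 0]])
curve = [betti1(D, t) for t in (1, 2)]
print(curve)  # → [1, 0]
```

At the tight threshold the square's four edges form one hole (beta 1 = 1); at the loose threshold the diagonals appear, every triangle is filled, and the hole dies (beta 1 = 0) — the rise-and-fall of a Betti curve in miniature.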
In principle it could even be economic data — it's just your variables. In this particular case we are talking about molecules, so the entry is the correlation between molecule 1 and molecule 2 across various samples; that's the experimental matrix. The other matrix is where I'm trying to find an embedding for my experimental points: in this particular method we try different geometries, of different dimensions and different curvatures. For example, for a sphere: I put the same number of points as I have in the data randomly on a sphere, I evaluate the distances between the points, and I get a matrix. Then we compare these matrices and ask whether we can detect statistical differences between the two. And then we take — yes? Is the matrix on the left obtained from a correlation like this — I don't know if you can see the blackboard — like x_i times y_i? Yes, a correlation, maybe normalized by the variance, but other than that, yes: this value is C_ij divided by the square root of C_ii times C_jj. With this distance, as you might notice, you can have a few problems: for example, it can be negative, and it can also fail to respect the triangle inequality. These are various issues that we ran into. Something like this, no?
Yes, I think so — something like that. So the distance can be defined that way; I think we even took e to the minus correlation. The advantage of the topological method is that, because it is based on rank ordering, you can transform the distances through a monotonic function and you will have the same result. You can use the distance d_ij, or exponentiate it, e^(d_ij) — sometimes that helps with negative values. So d_ij = e^(-c_ij), for example, would work, and it could even be e^(-c_ij^2) — you can have any function here, and, as you are saying, if this function is decreasing, then the ranking of the distances will be the same: essentially the inverse ranking of the correlations. That's one of the advantages of this topological method, because we don't really know how to define distances — they are defined in a somewhat abstract way. If we use a topological method, then we are invariant to this: we can find a geometry, and then, with the later methods that I will describe, once you have found the geometry independently of the metric — of a precise definition of distances — you can play around with the definition of distances and find the one that works best for a specific metric embedding. That's one possible logic, but I'm open to other suggestions; those are the thoughts so far. Any other questions? I'm still a bit lost on how you construct the geometric model. If I understood right, the second matrix is obtained from points randomly sampled in the geometric space — but how do you construct this space from the data? For this approach it's a little bit of a discrete fitting to the data. You say, well,
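The invariance claimed here — any decreasing function of the correlation yields the same rank ordering of distances, and hence the same thresholded networks — is easy to check numerically. A small demo with invented correlation values:

```python
import numpy as np

rng = np.random.default_rng(1)
c = rng.uniform(-1, 1, size=15)   # toy correlation values (no ties)

d1 = 1 - c                        # one candidate distance
d2 = np.exp(-c)                   # another candidate: exp(-c)

# Both are strictly decreasing functions of the correlation, so they
# induce the same ranking of pairs; a rank-based (topological) method
# cannot tell them apart.
assert np.array_equal(np.argsort(d1), np.argsort(d2))
```

Note that a non-monotone choice such as e^(-c^2) would break this equivalence when correlations can be negative, which is why the decreasing-function condition matters.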
the curvature: if we are looking for a space of constant curvature, there are only three options — it can be positive, negative, or zero. So we say, well, suppose we look at Euclidean spaces, which is what Giusti's paper considered: we put points randomly in Euclidean spaces and see whether we can match them to the data. Maybe I will give you another view — this is also from his figure. The central panel shows Betti curves for a matrix where the distance values are selected at random; the one on the left is for distance matrices from what they call the geometric model. They consider Euclidean spaces of different dimensions — the dimension is shown here — and fill the cube with random points. What they showed in their paper is that the behavior of these Betti curves is very different, so you can map what you have and compare it with the data to see which one fits better. It's a kind of discrete fitting: you have a discrete number of possibilities, Euclidean spaces of different dimensions, and you see which of these Betti curves fits the data best. After a while a student in my group developed an intuition for this; he would look at experimental Betti curves and say, this is hyperbolic, this is not hyperbolic, this is this radius, not that radius. For random matrices these Betti curves grow with the order, while for the geometric model they decrease — that's one of the signatures — and the amplitude of the first peak depends on the dimension. In that paper they applied the method to hippocampus data, and incidentally they found that it does not fit the random matrix model but fits the geometric model, and that the geometry it fits is very high-dimensional — about 80 dimensions. Now, my interpretation — I haven't looked at their data, but my hypothesis — is that the data is actually hyperbolic, and you need lots of Euclidean dimensions to describe hyperbolic data. This is a somewhat tangential comment, more of a discussion point. Does that answer the
question? Yeah, almost. So let me show you another graph; maybe it will help, and if not, we will circle back. In our paper we were interested in hyperbolic spaces, so we did the same thing as Giusti, but now the points are placed on a hyperbolic space — initially randomly, and then we move them around when we can't quite fit them. You can see the behavior of hyperbolic relative to Euclidean: for example, the peaks of these curves are much closer together than what is shown for the Euclidean space — and I think this is in three dimensions. In this particular method the optimization over spaces is not done in a continuous manner: you just have a table of curvatures and a table of dimensions, and you discretely, by hand, change the parameters of the space — curvature and dimension — put some number of points in it, evaluate the distance matrix, generate the Betti curve from the matrix, and repeat this process many times. Then you have a distribution of Betti curves, and we try to match the expected range of Betti curves to what is observed in the data. Now, back to the audience: which aspect should we discuss more? What wasn't clear for me, I think, is the construction of the curves, with which you compare the distributions. The construction of the curves? So let's talk about the construction of the curves. You have a matrix; let's threshold it. We rank-order the entries: some distance will be the smallest and some will be the largest. First the threshold is set at the smallest distance, so only two nodes are connected and everything else is disconnected. Because only two nodes are connected, there are no cycles. Then you say, well, let's lower this threshold; at some point two more nodes become connected, and it can be in a sequence, or it can be two disjoint nodes that become connected,
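The "geometric model" step for the hyperbolic case — scatter points randomly in a hyperbolic ball of chosen dimension and radius, then compute the pairwise distance matrix — might look like the following sketch. The sampling scheme (radial density proportional to sinh^(d-1)(r), inverted numerically) and the distance formula (cosh d = cosh r_i cosh r_j − sinh r_i sinh r_j cos θ, curvature −1) are standard, but the function names and parameter values are ours, not taken from the paper:

```python
import numpy as np

def sample_hyperbolic(n, dim, radius, rng):
    """Points drawn (approximately) uniformly from a hyperbolic ball of the
    given radius (curvature -1), in native polar coordinates (r, direction)."""
    # radial density ~ sinh(r)^(dim-1): invert its CDF on a fine grid
    grid = np.linspace(1e-6, radius, 10_000)
    cdf = np.cumsum(np.sinh(grid) ** (dim - 1))
    cdf /= cdf[-1]
    r = np.interp(rng.random(n), cdf, grid)
    # uniform directions on the unit sphere
    u = rng.normal(size=(n, dim))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    return r, u

def hyperbolic_distances(r, u):
    """Pairwise distances: cosh d = cosh r_i cosh r_j - sinh r_i sinh r_j (u_i . u_j)."""
    ch = (np.cosh(r)[:, None] * np.cosh(r)[None, :]
          - np.sinh(r)[:, None] * np.sinh(r)[None, :] * (u @ u.T))
    return np.arccosh(np.clip(ch, 1.0, None))

rng = np.random.default_rng(2)
r, u = sample_hyperbolic(200, dim=3, radius=5.0, rng=rng)
D = hyperbolic_distances(r, u)
```

Because hyperbolic volume piles up near the boundary, most sampled radii sit close to the maximum, and most pairwise geodesics pass near the center — exactly the narrow, large-distance distribution described later in the lecture.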
and then you lower it further and further, and at some point percolation can happen: you can have closed paths. That's the first cycle, and this is what's plotted here on the y-axis — the number of cycles for a given threshold (edge density). So this would be an example of a cycle, because it's not fully connected; if I add the 3-to-6 connection, then 5, 3 and 6 collapse — the loop can be contracted to a point, so it's no longer a cycle. But in this case it is a cycle, because I cannot contract it to a point. Okay, so you consider as cycles only loops that have no diagonal, essentially? Yes. Topologists talk about different kinds of loops, or holes, in various dimensions. This would be dimension one; then think of a hollow pyramid — that is a cycle of order two. There might not be any individual lower-dimensional cycles — they are all filled in — yet there is a two-dimensional hole inside; if you fill that in, that hole also disappears. And you can have higher-dimensional cycles, so you can compute many, many orders — but you don't have to compute all of them. If we truly match the geometry in terms of the predictions for the first- and second-order cycles, then we are okay, and it will match the rest of the orders. In practice, these higher-order cycles take exponentially longer to compute — the CPU time grows exponentially with the order — and at the same time the variability in them increases, so they become progressively less useful for comparing data against various geometries. We need — oh, okay, I have a question: why does the edge density behave differently in different dimensions?
I think it's because in higher dimensions distances concentrate — for example, in Euclidean spaces the distances become close to constant, so there is not as much diversity in the distances. So when I put points in different dimensions, the distance matrix changes. It also changes in the hyperbolic case: picture the hyperbolic space — most of the points are near the edge, and the geodesic between them goes through the center, so most of the distances sit near one value, with a narrow distribution around it. For the Euclidean case there is a derivation that can be done: if you take many random variables and compute the distance between them — I think the variance settles to 1 over n. Do I remember that correctly? Do you remember? If you take random points in Euclidean space, I think the variance of the distance is 1 over n, and the expected distance — maybe it decreases; in one dimension I think it should scale as 1 over n, in two dimensions maybe also 1 over n. That might be a nice problem for the exam — we can start thinking about it; I will do the derivation before the exam, and then we will figure out how the expected distance between two random variables x_i and x_j scales, if the variance along each dimension is proportional to sigma, because the distance is normalized. If there is a break I will try to work out the exact scaling — is that okay?
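The scaling the lecturer is trying to recall can be checked with a quick simulation (assumption: coordinates drawn i.i.d. uniform in the unit cube, one natural choice). The mean distance grows like sqrt(dim/6), while the spread relative to the mean shrinks roughly like 1/sqrt(dim) — which is the concentration of distances the answer appeals to:

```python
import numpy as np

def dist_stats(dim, n=2000, seed=3):
    """Mean and relative spread of distances between random point pairs
    drawn i.i.d. uniform in the unit cube of the given dimension."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(rng.random((n, dim)) - rng.random((n, dim)), axis=1)
    return d.mean(), d.std() / d.mean()

for dim in (2, 10, 100, 1000):
    mean, spread = dist_stats(dim)
    # mean / sqrt(dim) approaches sqrt(1/6) ≈ 0.408 as dim grows,
    # while the relative spread shrinks toward zero
    print(dim, round(mean / np.sqrt(dim), 3), round(spread, 3))
```

So in high-dimensional Euclidean space almost all pairwise distances look alike, which is why the Betti curves, plotted against edge density, change shape with dimension.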
Basically, when you add more variables, the expected distance between them changes as a function of dimension. More questions? Then: you can evaluate distances between these curves according to a number of measures. The easiest one is the integral of the curve — in this case just the total number of cycles, called the integrated Betti value — but you can also try to match the exact shape, which is a more sensitive measure. So in this particular case we are back to our strawberry data, and as I mentioned, for spaces of constant curvature — it doesn't have to be constant, but this is the first approximation — the curvature can be positive (spherical spaces), zero (Euclidean), or negative (hyperbolic). Hyperbolic is interesting because it would imply the kind of maximally informative coding with on-off devices, and it is also congruent with a hierarchical network. When you do the data evaluation, you find that you can rule out a uniform distribution in Euclidean space and a uniform distribution on the sphere, while a semi-uniform distribution in hyperbolic space is a match. It turns out that the distribution of points was not quite uniform but was concentrated toward the boundary — we think there is some cyclical variation, perhaps due to the circadian rhythm — and that best matches the data. So that's the summary of this data set. If you want to see the Betti curves, they are these dashed lines, with variability around them, and you can see how well one can fit them with the hyperbolic space versus the Euclidean space: with Euclidean space, the peak position differs from the data, and the curves decrease much faster. That's the paper the data comes from — if you're curious, the code is publicly available and the data set is publicly available, so you can run the analysis. And as I mentioned, there are many other data sets: this is the strawberry data set, then the mouse urine data set, the blueberry data set,
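The "integral of the curve" comparison mentioned here is just the area under the Betti curve; a shape-sensitive variant integrates the absolute difference between the two curves. A toy sketch — both curves below are invented functional forms purely for illustration, not fits to any data:

```python
import numpy as np

rho = np.linspace(0.0, 1.0, 101)                  # edge-density grid
data_curve = 40 * 6.75 * rho * (1 - rho) ** 2     # invented "data" Betti curve
model_curve = 35 * 6.75 * rho * (1 - rho) ** 2    # invented "model" Betti curve
drho = rho[1] - rho[0]

# integrated Betti value: the easy summary (area under the curve)
ib_data = data_curve.sum() * drho
ib_model = model_curve.sum() * drho

# more sensitive: mismatch of the full shape (L1 distance between curves)
shape_gap = np.abs(data_curve - model_curve).sum() * drho
```

Two curves can share the same area yet differ in peak position, which is exactly the Euclidean-versus-hyperbolic discrepancy described in the lecture — hence the shape-based comparison.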
the tomato data set. In each there are about 80 different samples and a variable number of molecules — about 80 for the strawberry; blueberry had the fewest, about 40 molecules. What I'm showing in the top graph is the comparison between the data (triangles) and the best fit in hyperbolic geometry. You can see, for example, that here it doesn't match in the third order — mouse urine doesn't match in the third order — and here it almost matches in the Euclidean case (we will discuss why), and also in the tomato. The differences are statistically significant, but when you do not have a lot of points, it becomes more difficult to distinguish the hyperbolic fit from the Euclidean case. Questions about this graph — and also whether there are previous questions that need to be followed up on? I think what you are describing is topological data analysis applied to these data sets. I just put on Slack some introductory slides on topological data analysis that also explain how you derive Betti curves in a more detailed fashion. I think what you have here is essentially a comparison of these Betti curves — I mean the integrals of these Betti curves — for the network that you get by thresholding the correlation matrix of the data, against those obtained from randomly drawn points in a d-dimensional space with a certain curvature, right? So we have another question. Can you maybe show us the formula of the curves — the Betti curves? The formula? Yeah. They are just counts, so there is no formula. What are the parameters beta in the different curves? You mean beta 1, 2 and 3 — the order of the Betti curve, or the parameter of the space? Yeah, I meant beta 1, beta 2 and beta 3. So beta 1 would be a cycle like this — the number of cycles of this kind. 1, 2, 3 is not a cycle; 3, 2, 4 is not a cycle; 4, 3, 5 is not a cycle; but 5, 3, 1, 6 is a cycle, and 5, 4, 2, 6 is also a cycle.
So there is a code that goes through and counts the number of cycles in a given network. You can also have cycles that are two-dimensional: think of this ring as a one-dimensional ring, topologically equivalent to a circle; but you can also have a cycle that is topologically equivalent to a sphere, and so on in higher dimensions — those give the higher-order Betti numbers. Is that okay? You have a cloud of points in d dimensions and you look at holes: holes of dimension 1, holes which are spheres — or, say, equivalent to spheres up to deformation — or equivalent to hyperspheres, et cetera. Is this more clear? So when you have these points and you connect the points which are closer than a certain distance, you get a geometrical object; this geometrical object has a certain topology, and the topology is characterized by holes, and there are holes of different dimensions. I think the problem is that I never took a course in topology, so I don't have any intuition for these things. Yes, this is a subject that should probably be a course of its own, on topological data analysis, but you can look it up. So there is another question here. Yes. If we define it that way, at some point we have a fully connected data set, because each data point will be connected to another data point, and so on. I'm trying to imagine the example you just gave: we can say this is a graph that closes on itself with some kind of topology. My question, to be more precise, is: how are the graphs topologically different from one another with regard to the hole count? Tanya, did you understand the question?
Maybe rephrase? The way I understood it: we have a given network, and you can start contracting these cycles, and at some point you can't — then you count how many non-contractible cycles there are, and that is the characterization of the network. Then you lower the threshold and see how many cycles there are now; some of them can be removed, and so the number of cycles first increases, then decreases. So can we draw a correspondence from the number of cycles to the number of holes, or is it not...? At my level of analysis they are the same — cycles are holes. So, now the fun with the data sets. For the strawberry and the others, what we found was that a three-dimensional space fit all of them, with approximately the same radius, or curvature. My interpretation — well, we can talk about it — is that these are data sets of similar complexity, and this will become clearer in a moment. We start with, say, a three-dimensional tree like this, and as I mentioned, one way to visualize hyperbolic space is to compress this three-dimensional space into a sphere — just as with the Poincaré disk, where the infinite plane is compressed through a hyperbolic-tangent-like nonlinearity into a circle of radius one, which is unattainable; or you can have a certain radius with the curvature set to one. Same thing here in three dimensions: we visualize the space as a sphere. I told you the space is not spherical, but we visualize it with a sphere, which you can think of as the envelope of a tree. And this is the visualization: once we know the dimension, we can place the actual points into that space, and each point here is a molecule. So now we have our map of molecules in this space. And to show you that this is not only strawberry — this is some other, common data set; I think it was Scandinavian berries. Anyway, now the distance between them: as you know, the
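Once the curvature and dimension are chosen, "placing the actual points onto that space" is a multidimensional-scaling problem in hyperbolic geometry: move the points around to minimize the mismatch (stress) between embedded and measured distances. This is a hedged sketch, not the paper's algorithm — the target distances here are synthetic, and a generic optimizer (scipy's L-BFGS-B with numerical gradients) does the moving:

```python
import numpy as np
from scipy.optimize import minimize

def pairwise_hyp(x):
    """Hyperbolic (curvature -1) distances between points given in native
    coordinates: radius = ||x||, direction = x / ||x||."""
    r = np.linalg.norm(x, axis=1)
    u = np.divide(x, r[:, None], out=np.zeros_like(x), where=r[:, None] > 0)
    ch = (np.cosh(r)[:, None] * np.cosh(r)[None, :]
          - np.sinh(r)[:, None] * np.sinh(r)[None, :] * (u @ u.T))
    return np.arccosh(np.clip(ch, 1.0, None))

def stress(flat, D, dim):
    """Sum of squared mismatches between embedded and target distances."""
    E = pairwise_hyp(flat.reshape(-1, dim))
    i, j = np.triu_indices(len(D), 1)
    return np.sum((E[i, j] - D[i, j]) ** 2)

rng = np.random.default_rng(4)
dim, n = 2, 8
true = rng.normal(scale=1.0, size=(n, dim))   # synthetic ground-truth points
D = pairwise_hyp(true)                        # target distance matrix

x0 = rng.normal(scale=0.1, size=n * dim)      # random initial placement
s0 = stress(x0, D, dim)
res = minimize(stress, x0, args=(D, dim), method="L-BFGS-B")
```

Since the targets are genuinely realizable in 2D hyperbolic space, the optimizer drives the stress far below its starting value; on real data, the residual stress is itself a diagnostic of how well the chosen geometry fits.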
distance between molecules: you have to go inside the tree and then out. So this is an example drawing of a geodesic, and we can see it now in 3D. Now we can talk a little bit about this space. As I mentioned, these are all volatile molecules, and we find that they are all positioned close to the edge of the space. What this method gives you is that you are observing the leaves of a tree, but by estimating the curvature and dimension you can estimate, on average, the branching process that is inside the tree and is unobservable to us. So if we could measure the non-volatile content of a strawberry, for which there are no data sets yet, or I hope there will be in the future, of various sugars or acids or other nutrients, then those molecules would be positioned inside this sphere. The reason we get this embedding is that our data is limited to volatile molecules, which are all, in a way, leaves: they do not give rise to other molecules, because, being volatile, they are end products of their reactions. Any questions?
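As a concrete aside on these ball visualizations: geodesic distances in the Poincaré ball model (curvature −1) can be computed with the standard closed-form formula, which shows why points near the edge, like the volatile molecules here, end up very far from each other through the interior. This is a generic sketch of the textbook formula, not the lecture's code:

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance in the Poincare ball model (curvature -1).

    Points live strictly inside the unit ball; distances diverge as
    points approach the boundary, which is where the leaves of the
    underlying tree are placed.
    """
    u, v = np.asarray(u, float), np.asarray(v, float)
    diff2 = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * diff2 / denom)

# two points near the boundary on opposite sides: the geodesic passes
# through the interior, so the distance far exceeds the straight-line
# Euclidean separation of 1.8
print(poincare_distance([0.9, 0, 0], [-0.9, 0, 0]))   # ~5.89
# near the origin the metric is almost flat (scaled by ~2 at curvature -1)
print(poincare_distance([0.1, 0, 0], [-0.1, 0, 0]))   # ~0.40
```

The same formula underlies the geodesic drawing in the slide: moving a point radially outward by a small Euclidean amount adds an ever larger hyperbolic distance.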
So now, is there a question? No questions. So now we can analyze this problem as communication. As I said, olfaction is an interesting channel of communication between plants, which cannot move, and animals, which can. There are all kinds of stories about how, when a plant senses that it is being eaten by a herbivore, it emits a volatile that attracts a bird that will eat the thing that is eating it. So here is an example of the communication. So far, these are maps derived exclusively from the statistics of strawberry data, and now we add human perception. This is a combined data set of strawberries and tomatoes, based on the overlapping set of odors, so there are more points, and the color shows how much people like the strawberry or tomato. The red axis is the one that points towards the most preferred kind of strawberry, and one can see the topography that emerges for human rankings. It didn't have to be present in the data, but I think there is some logic to our perception, and that's the reason for this mapping. So of these three axes, the red one points to the most highly ranked, the perfect strawberry, and then there are many axes you can define: you can think about the acidity of the sample, you can think about the boiling point of the molecule. Because the space is low dimensional, you can now make predictions for how pleasant something will taste or smell based on, for example, the acidity of the sample and the boiling point of the molecule. So here you have data from both strawberries and tomatoes, and the pleasantness is the same vector for both?
Well, we computed one vector, because it's a joint map, but in the graph on the left you can do a cross-validation, meaning the axis is computed using one subset of odors and then you predict pleasantness for a novel set of odors that were not used in computing the pleasantness axis. Of course the pleasantness axis will fluctuate depending on which molecules you include in the data set, so in reality there is an error bar on this axis. Even the distances between molecules change depending on the context: the distance between two molecules within the strawberry data set is different from the distance between the same two molecules in the tomato data set, and here we just average the results. They are also different between green strawberries and red strawberries. The distances are empirical distances that fluctuate depending on the environment. In other words, this map is not static; there is some dynamics to it that one could study in the future, with additional data, asking how the map deforms as we change the data set. Also, some of you may know there was a series of debates about how many odors humans can perceive and what the dimensionality of this space is, and the computation for the number of odors depended on, among other variables, how big the dimensionality of the space is. My point of view, my contribution to the debate, is that I think the space is low dimensional, but you can have more or less resolution depending on the human. So people who rank wines or other products can still have a low-dimensional map, but with finer gradations within this map, so you can increase your number of distinguishable states. And we can talk about, for example, different species: the mouse has more receptors than the fly, and the elephant even more than the human, so depending on the number of receptors that you have, you will have higher resolution, but the space can still be two or
three dimensional. Any questions about this? Any debates on this point of view? The alternative argument is that the olfactory space is high dimensional; that's a disclaimer. So when you read papers, you can think about this lecture and whether you agree, or whether there is new data, whether the space is high dimensional, or low dimensional but with high resolution within the low-dimensional space. The same thing actually happens in the visual domain; there are papers on hyperbolic visual perception. Do you know that you have hyperbolic perception, or how it can be tested? I think I mentioned in the beginning of this series of lectures that you can do these alley experiments, asking how a room should be shaped in order to be perceived as having straight walls, and it actually has to be curved to be perceived as straight, provided you do not move. When you move you can figure out the distances, but based on the two eyes alone, perception within several meters of the head is hyperbolic. Any questions about the olfactory map? Maybe I have a question. These are maps which have to do with physical properties of the different samples. Does this map, or this metric, correspond to perception, in the sense of the response of the olfactory system: if two odors are close in this space, should the response of the olfactory system also be similar? This is a little bit of ongoing work, and I do not have a definite answer. What we find in preliminary studies is that you can define distances between molecules based on abundances in the natural world, as in this case, through correlations across different samples; or you can define distances between molecules based on their physicochemical parameters; and I can also define distances between molecules based on neural activation, which can be receptors in the nose or higher-order neurons. What it seems, from preliminary evidence, is that the distances based on chemistry are more closely associated with
distances at the receptor level, but the distances based on abundances in the natural world are more closely associated with higher-order neural responses, and in this case with measured perception. I think this is because the goal of perception is most closely aligned with natural abundances: the final goal is to figure out what is happening inside the sample, not the particular chemistry of the molecule. So that's preliminary evidence and thoughts in that direction. I think that's that, so we can talk a little bit more about human perception now. Okay, so human perception. For a long time, even right now, the characterization of human perception has been based on descriptors: you rate how fishy or chemical an odor is, and so on, and then you have various categories, and the categories themselves, as you see, are a little bit hierarchical. So now we transition from the space of natural odors to a data set of actually mostly unnatural odors, ranked according to these human descriptors, and we now assign distances to the odorants depending on how similar or dissimilar human observers ranked them. And here is a caveat I should address: here the correlation distance doesn't quite work for us, so we had to switch to Euclidean distances between the human rankings along these descriptors. The issue is that one odor can be ranked as 100% fishy while another is ranked as 0% fishy, and yet their descriptor profiles can still be very correlated. In the previous analysis with correlations we actually took the absolute value, and the absolute value doesn't work here, so a closer approximation was a Euclidean distance across many components. And here is some visualization from previous papers which indicated that this perceptual space is curved; this is the work by
Alexei Koulakov, and they even say that it's a potato-chip geometry, which is a hyperbolic geometry. And here's another paper: if you plot a flat representation of PC1 versus PC2 of these points, you can even see the hyperbola here. So now, motivated by these studies and also by our general interest in hyperbolic geometry, we look at the topological signatures of this data set. Here are Betti curve 1 and Betti curve 2; the data splits a little bit by Betti curve. I'm showing you the fit using Euclidean space, the data is here, and this is a 3D hyperbolic space. We fit the dimension of the space in terms of the integrated Betti value 1 and then, using the same parameters, make a prediction for Betti curve 2. One can see that the data for Betti curve 2 is much noisier than for Betti curve 1, and it also has lots of these disjoint peaks. It turns out that whenever you have multi-peaked Betti curves, it means the data is not uniformly distributed but rather clustered, and that leads to these multi-peaked curves. I'll show you the fit to the space, but in terms of the integrated Betti value, here you can see that the 3D space fits as well as higher-dimensional spaces, and the deviation increases as you go below this dimension, so the 3D space is the lowest dimension that fits the data. By comparison, the Euclidean space, even if it fits Betti curve 1 in terms of the integrated Betti value, does not fit Betti curve 2. So that's the motivation, and here is the visualization. I told you that here the distribution of points is very different from the previous case: we no longer have points distributed on the surface, but some are more central and some are further away. The central points, you can think of them as generalist odors, odors that are in many sources. For example, you can have one molecule that is in all dairy products, and another molecule
that is only in a specific food source. So that's the visualization of molecules in that data set, and you can see there are many more molecules that need to be sampled in order to fill the other parts of the space. Any questions about human perception? Is the reason why Betti curve 2 has these spikes that you have very few data points? No, I would say the number of data points is not small, it's actually 127 or so, but the data is more clustered, it's not uniformly distributed, and this leads to the multiple peaks. So now, very briefly, in the remaining 8 or so minutes: this was actually published, it's hyperbolic geometry in mammalian gene expression, but I wanted to focus on one slide. We can talk about more slides, but the main points are to highlight the differences with the topological analysis, say what to do with multiple variables, and give you an intuition for when hyperbolic space fits the data and when it doesn't. We use a different method here; some of you might be very familiar with multidimensional scaling. This is the method we used here, and the advantage of this method is that it is a metric method, not a topological one, and it's very fast, whereas Betti curves are computationally expensive: you can only compute them for fewer than about 150 points, whereas there are many thousands of cells, or even more points in neural data. So that's one point. In this case it's a metric method, and the idea of multidimensional scaling is that you are again working with distances. Suppose you have a set of points and you evaluate distances; then, pretending you don't know these points, you want to reconstruct the positions of the points based on their distances alone. That's multidimensional scaling, and regular multidimensional scaling can work very nicely. You can do the same thing in hyperbolic space: you take a set of points, you evaluate hyperbolic distances, and then
you move points around in order to get the match. That's an illustration of hyperbolic multidimensional scaling. But there are two versions of multidimensional scaling: one is a metric method, the other is a non-metric method. The metric method attempts to straighten what is called the Shepard diagram, the plot of initial distances versus distances after the embedding: metric MDS gives you a straight line. The other one, non-metric, only asks for a consistent rank ordering; it says, I'm fine with a curve rather than a straight line, as long as the rank ordering of distances is preserved after the embedding. It turns out that you can use this non-metric method as a way to detect differences in geometry. It has a few steps in the logic, and I'll show you how it works. Imagine that you have data taken from Euclidean space and you embed it in another Euclidean space; then you will have a nice match between distances before and after the embedding. Now take these Euclidean points and push them into a hyperbolic space of a certain dimension. A regular metric multidimensional scaling method will force a straight line, but the non-metric method will produce a curve that is tight but not straight, and we use the deviation of this curve as a signature to detect the mismatch in geometry between the original data distances and the embedding distances. So it's multidimensional scaling; it's non-metric because we are looking at Shepard diagrams that are not penalized for deviating from a straight line; the distances are evaluated with the hyperbolic metric; and then the second derivative of this Shepard diagram gives you the difference in curvature between the true space and the embedding space. This is a lot of information, but any questions about that? To follow up: if you take data from hyperbolic space and put it in a hyperbolic space, you will have a straight line, and then
from hyperbolic space to Euclidean, the Shepard diagram will curve, but in the opposite direction. So you can use the convexity, not to overuse the word curvature, as an indicator of the mismatch in curvature between the intrinsic space and the embedding space. Then I will just tell you the conclusions, maybe two slides. If we look at the geometry of gene expression, what we find is this: we take lots of different cells and lots of genes, and the cells are diverse; but if you measure cells only with respect to a small number of genes, or the cells are from a similar body part, the same organ, then the geometry will be Euclidean. I think that makes sense, because you can always approximate hyperbolic geometry locally with Euclidean geometry. So I think I will stop here and ask for final questions, and then I will put up a concluding slide. We discussed today examples of hyperbolic geometry in plant volatiles, human perception, and, in this visualization that I only showed in the conclusions, gene expression. So, approximately, yes: you see natural odors from plants and animals, human perception, gene expression, and this is, if you have heard of t-SNE, our version of hyperbolic t-SNE, which we think works better for biological signals. And then I think I'm ready for your questions. Do I have questions?
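To make the Shepard-diagram diagnostic concrete, here is a small illustrative construction of my own, not the lecture's code: points whose true geometry is the Poincaré disk are "embedded" by simply reading their coordinates as flat, and the resulting Shepard-style diagram bends; the sign of a fitted quadratic term plays the role of the second derivative mentioned above.

```python
import numpy as np

# True geometry: Poincare disk, curvature -1. Take points at
# increasing radius from the center; along a diameter, the true
# distance from the center is 2 * arctanh(r), while a naive flat
# embedding (reading the same coordinates as Euclidean) reports r.
r = np.linspace(0.0, 0.85, 20)
d_embedded = r                  # distances in the flat embedding
d_true = 2.0 * np.arctanh(r)    # true hyperbolic distances

# Shepard-style diagram: true vs embedded distances. Matching
# geometries would give a straight line; a positive quadratic term
# means the diagram bends upward, i.e. the data is more hyperbolic
# than the embedding space (the sign flips in the reverse case).
quad = np.polyfit(d_embedded, d_true, 2)[0]
print(quad > 0)   # True
```

Running the same fit in the other direction (hyperbolic distances on the x-axis, flat on the y-axis) gives a negative quadratic term, matching the lecture's point that the Shepard diagram curves in opposite directions for the two kinds of mismatch.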
So maybe I'll start: what is the meaning of the dimension that you extract from this method? I mean, in human perception and natural odors you get three dimensions, so what does it mean that there are three coordinates, three relevant variables? Yes, I would say so. We think it may also be an indication of the branching ratio: in terms of a continuous map, the dimension is how many independent variables I need to characterize the space, and you can also make a connection with the branching ratio of the underlying tree. And can you say something about what these three dimensions, these three variables, are? One of them, the third, radial dimension, roughly separates derivative products from master regulators. For the other two dimensions we have freedom in defining them; there are many options and they are interdependent. In this case we chose acidity and, I think, molecular boiling point, but you can find other dimensions that are maybe more relevant. In the case of strawberry, and fermenting strawberries, one of the angular coordinates is a fermentation angle, and you can shift it depending on admixtures in the yeast. So the interpretation of the coordinates, what the best coordinates are, remains to be specified, because that depends on the particular question, either in relation to the neural system or to a specific biological problem such as fermentation: what are the parameters of fermentation? Because the space is low dimensional and there are thousands of different variables to look at, one can define many axes, but the question is which of them are more useful for a specific biological question. I have two questions following the question on the meaning of the number of dimensions. What kind of consequences, or, I don't know, what could we say about the fact that hyperbolic geometry is the one that fits better
the data? I would say, with some of the newer methods that we are trying to develop, you can, for example, use a Bayesian information criterion to determine the optimal dimension and optimal curvature based on multidimensional scaling. So it's a statistical statement, saying: if I embed points in this space versus that space, I have so many deviations, and this is my noise model, and the Bayesian information criterion tells you that the optimal dimension is this and the optimal curvature is that. In multidimensional scaling you can vary the curvature, and for some of the analyses that we do, you can run the analysis and it tells you that the curvature is 10 to the minus 2, in which case the space is approximately flat. My second question was, if you could comment: you said it helps to improve the visualization of the data, so I suppose it's better to know the hyperbolic geometry before constructing the visualization? Yes; you can see it on this slide. This is the t-SNE visualization method, and based on this analysis of gene expression, we think the data is locally Euclidean but globally hyperbolic, so we add to t-SNE a hyperbolic large-scale constraint. Then you can look, for example, at the embedded distances versus the data distances, compared to other methods which in this case have local hyperbolic geometry but no large-scale constraint, and you can see that there is a more faithful embedding. I also have this analysis for gene expression, where this is the quality of the local embedding versus the clustering coefficient, and you can have various versions of hyperbolic t-SNE, and this is another method that we work on, so you can have quantitative measurements of the quality of the visualization. One last question, sorry: is the geometry a consequence of the statistics of the data, or a consequence of the data itself? I don't know, the statistics of the data is a
consequence of the data. Sorry, my question is: will different sets of data have different geometries, or will all data that are trying to maximize information have the same kind of geometry? That is the long-term goal of what we are trying to establish, but it hasn't been established, because right now we are only looking at a finite number of data sets, and in some cases you find hyperbolic geometry while in other cases the curvature is less. The implication is that if you have a hierarchical system and you would like to maximize information in the presence of binary on-off states, then there are reasons to expect hyperbolic geometry based on optimality arguments, and one can then verify that in a number of cases, which at present are limited, but that's the overall roadmap for future work. Okay, thank you. Any other questions? If not, then I think we can thank Tatiana for this nice set of lectures. Thank you very much, and to answer your question, I will upload lectures 4 and 5, and I will communicate with Mattia regarding your exam. Yes, thank you very much. Okay, so have a nice weekend. Bye bye.