Thank you. Okay, cool. This has been a really cool morning. It's really great to find out the parallels between what I planned to talk about and what other people have already talked about, and nothing was planned, so this is all emergent.

To introduce the speakers: I'm Chen Ling, a PhD candidate at UC Berkeley in the computational biology program, and my research is mainly focused on making tools to analyze single-cell RNA sequencing data, so a lot of the examples I'm going to give today are related to that research. And Diamantis Salis, who is an R&D engineer at Cosmo Tech, is going to join us via video soon to talk about the evolutionary biology side.

Okay, so: biology and complexity science. Asking me to talk about biology is kind of like asking me to talk about the entire world, but one point I want to make before we even start is just how relevant biology has been to the development of many quantitative sciences in the past, and why I think it's really relevant for developing complexity science in the future.

This is a quote from Claude Shannon on information theory, right when information theory was being invented: "The establishing of such applications is not a trivial matter of translating words to a new domain, but rather the slow tedious process of hypothesis and experimental verification." This is funny, because information theory has been applied to absolutely every scientific field we can think of, but Shannon himself says that applying it isn't just a matter of translating words, however helpful that is; it's a matter of applying those principles and theory to concrete problems. That is really relevant to what you were just saying: we need a systematic way of thinking, where we put the concrete problems we want to solve into a much bigger system and then find an effective way of reducing them back down to something we can tackle.

So why is biology a suitable system for this way of thinking? First of all, it's very multi-scale: we can go from something as small as a molecule to something as big as evolution, which encompasses the entire Earth, and there are many levels in between that are semi-independent but never totally independent. Second, biological systems are highly nonlinear; we already know a lot about this, since people have studied metabolic networks and gene networks for a really long time. And finally, there are a lot of very well-motivated problems in biology. I think this is why people tend to think of biology as a playground for quantitative scientists to apply their methods to, but really, a lot of quantitative methods were motivated by biology problems. One famous example is regression: people never thought about correlating two variables together before they started looking at human height, which is a really simple problem.

Okay, so before we delve into the details,
I just want to introduce single-cell RNA sequencing for those who don't know it. This is a really new technology that started around 2010. Before that, people knew a lot about transcriptomics, but when we looked at the transcriptomics of whole tissues, we couldn't distinguish whether the proportions of individual cells were changing or whether each individual cell was changing. The technology works as follows: you have single cells in a test tube, and you merge each of them with a single droplet that carries unique barcodes for each cell and for each gene. In the end you get a matrix where every row is a cell and every column is a gene.

There are many challenges with this technology, mainly because it's really high dimensional: we can measure around 20,000 genes for each cell, most of them are zero, and there's really high noise, because each cell contains a tiny amount of RNA from which we're trying to extract a lot of information. What's lucky for us is that, as it turns out, cell biology has really high redundancy of information. People have shown this by measuring a hundred random quantities from each cell, rather than meaningful units such as gene expression, and they recover almost the same reduced space as when they measure the more meaningful information. The signal is also very robust to a lot of perturbations. That's why a lot of these methods work.

Okay, so the first example. My talk is going to be really different: I'm going to jump through a lot of short examples of different ways you can compress a really complex system down into a problem you can actually resolve. The first example uses a variational autoencoder. It was really good that you introduced neural networks before this, because I didn't have time to do that. Essentially, you have a model that tries to learn a reduced space for single-cell RNA sequencing data despite all the technical difficulties. One thing that's really different between an autoencoder and what we were talking about before is this: when you train a neural network that is supposed to give you some outcome, you have to have training data with the expected outcome for at least some of your examples. But in biology we kind of want to go in without knowing anything about the system. The autoencoder comes in as a really useful method here, because you have an input, represented by x, and the output, instead of being some y, is x'. Going in, you don't need to know what the output should be, because all your network is trying to do is minimize the difference between your input and your output. What's really useful is that you can have a middle layer, which we call a code, that has a much, much lower dimension than the initial input. Because you learn a network that effectively reproduces your input, you know that this code effectively represents all the variation you care about in your input, without your ever having to know what the underlying relationships are. We're going to come back to actually learning about the relationships later, but this is what the essential model looks like. scVI is the method we're working on, and it has a lot of properties that address complexity-related problems.
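To make that concrete, here is a minimal sketch of a plain autoencoder in PyTorch. It is illustrative only: the layer sizes, the mean-squared-error reconstruction loss, and the random mini-batch are assumptions made for the sketch, and scVI itself uses a more elaborate probabilistic loss.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Compress ~20,000 genes per cell into a small code, then reconstruct."""
    def __init__(self, n_genes=20000, n_code=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_genes, 128), nn.ReLU(),
            nn.Linear(128, n_code),            # the low-dimensional "code"
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_code, 128), nn.ReLU(),
            nn.Linear(128, n_genes),           # x', the reconstruction of x
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training only ever compares x to its own reconstruction x',
# so no labels (cell types, outcomes) are needed anywhere.
x = torch.rand(64, 20000)                      # a fake mini-batch of 64 cells
optimizer.zero_grad()
x_prime, code = model(x)
loss = nn.functional.mse_loss(x_prime, x)      # minimize the x vs x' gap
loss.backward()
optimizer.step()
```

After training, the code is the compressed representation you would hand to biologists and ask whether the space looks the way they expect.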
So first: in biology we have a really nonlinear mapping between gene expression and the cell states we're trying to work with, and one way scVI addresses this is by using a neural network. By nonlinear I mean the following: we have two cell states and a bunch of genes, but the way the genes determine how the cells behave isn't independent. Gene one and gene two might have to act together to promote one state, and these three other genes might have to work together to promote the other state. We have a lot of experimental measurements, but all of them have their own biases, so we wanted to make something that essentially learns the relationships by itself, and that's why the autoencoder and neural network approach was really useful. It's not supervised, because during the training phase I'm never telling it what the cell type is. During the prediction phase, I look at the codes it provides me and ask biologists: do you think this space represents what it should? Fitting this neural network is also a very over-parameterized, under-determined problem, so there are many different solutions that could give you answers that make sense, and there's a lot of interesting saddle-point theory that is relevant here, but I won't go into it too much.

Another difficulty we were presented with is the high noise of the data and the fact that the data we observe are probabilistic; this is another layer of complexity on top of our model of single cells. One really naive example: suppose we have two cells that are completely identical, with all of their properties drawn from the same distribution. When I look at the data, they could still look really different; one could be here and the other could be over there, but that's just a property of my distribution. It has nothing to do with how different those two cells are.
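Here is a tiny simulation of that thought experiment; the gamma-Poisson (negative binomial) count noise and all the numbers are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "identical" cells: the very same true expression program for 20 genes.
true_expression = rng.gamma(shape=2.0, scale=5.0, size=20)

def observe(truth, dispersion=0.5):
    """One noisy observation of the same truth: a gamma-Poisson mixture,
    i.e. negative binomial counts with mean equal to `truth`."""
    rate = rng.gamma(shape=1.0 / dispersion, scale=truth * dispersion)
    return rng.poisson(rate)

cell_a = observe(true_expression)
cell_b = observe(true_expression)
print(cell_a[:8])
print(cell_b[:8])
# The two count vectors can look wildly different, with plenty of zeros,
# even though both cells were drawn from exactly the same distribution.
```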
So we're going to take a generative approach to this. One way of thinking about a generative model, as opposed to a predictive model, is that a predictive model models the conditional probability P(y | x), that is, the probability of your outcome given your data, while a generative model models both the outcome and the input at the same time, as a joint distribution. Here is the probability model we ended up using. We have a number of variables, and each variable feeds into another variable via an arrow; an arrow from the latent variable z to ρ means that ρ depends on z. What we have is the latent space, which is the biological cell space that we chose to represent in an arbitrary-dimensional space; ρ is the true expression level that we care about; x is the data we observe; and there are two extra variables feeding in. One is the batch, which is the technical variance we want to get rid of, and the other is the scale, which accounts for the cell size. Combining this model with a neural network ends up giving us some really powerful results, because now we have structure: the function is still highly nonlinear and really free, but we channel that freedom into learning a model whose structure we know, and we can interpret a lot of these variables later on.

All right, that was example one. Example two is metabolic networks and single cells. Earlier on we took a simplification that gets rid of all the prior knowledge we have about what the network in the cell actually is; this is another approach, where we make the known structure of how the genes are related the focal point. This is a metabolic network: every dot is a metabolite, a chemical in a cell, and every edge represents an enzymatic reaction that leads from one chemical to the other. It's similar to a map where you want to get from point A to point B. Depending on how fast each road is and how many people can be on it at the same time, you can optimize the system so that the most material flows from point A to point B, or you can study the system and ask what's possible and what's not possible, and so on.

If you want to use this network to study the states of the cell, one approach people are taking is flux balance analysis. Basically, we can quantify how thick each edge is by looking at how much of the corresponding enzyme there is in each single cell, and from that we can predict what the metabolic state of the cell is: is it doing more glycolysis, or is it metabolizing lipids for its energy (the keto diet), and so on. This is a pretty complex model, because in practice there isn't really one objective function you can optimize; you can't just assume that your cell wants to replicate, because not all the cells in our body want to replicate, so there are heuristics we apply here. It also relies on flux balance analysis's assumption that everything is at equilibrium, which we know is not true. But this is the best people are doing with single-cell RNA sequencing so far, and I encourage you to think more about what we can do here. This is one example of how we can take another slice at the problem.
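Here is a minimal sketch of what such a flux-balance computation looks like, using scipy's linear-programming routine. The three-reaction toy network, the flux bounds, and the "maximize export" objective are all invented for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network:  -> A -> B -> C ->   with reactions [uptake, r1, r2, export].
# Rows are metabolites (A, B, C), columns are reactions, entries stoichiometry.
S = np.array([
    [1, -1,  0,  0],   # A: produced by uptake, consumed by r1
    [0,  1, -1,  0],   # B: produced by r1, consumed by r2
    [0,  0,  1, -1],   # C: produced by r2, consumed by export
])

# "Road capacities": per-reaction flux bounds, which in the single-cell
# setting could be set from measured enzyme abundance (numbers made up).
bounds = [(0, 10), (0, 5), (0, 100), (0, 100)]

# Steady state (S @ v = 0) plus an objective; linprog minimizes,
# so negate the coefficient on the export flux to maximize it.
c = np.array([0, 0, 0, -1])
result = linprog(c, A_eq=S, b_eq=np.zeros(3), bounds=bounds, method="highs")
print(result.x)        # optimal fluxes; r1's cap of 5 limits the whole chain
```

The questionable part, as noted above, is the objective: real cells don't all maximize the same thing, which is why the choice of the objective vector is a heuristic.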
The third slice we're going to take at a very similar problem is to use, instead of the metabolic network, the transcriptional network, which is basically how genes regulate each other. This particular paper addresses the problem of development: how do cells start from a kind of homogeneous state and diverge into very different states, and how do we measure that potential for divergence? The way they did it actually brings information theory back in. They took the protein-protein interaction network and measured the strength of the edges from each protein to the others, and the assumption is that the more entropy there is in this network, the more potential the cell has to diverge into different states. They tested this experimentally and actually found that they can predict the potential for differentiation just by measuring the entropy of this network.

The fourth example is another project I worked on, and that's the simulation perspective. We have introduced a lot of really complicated methods and models, and I think one of the ways to test whether our assumptions actually make sense is by doing simulations: coding out all the mechanistic processes that you think are happening, and eventually seeing whether your model replicates the same behavior as your observations.
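As a minimal sketch of that simulate-and-compare loop, here is a Gillespie-style birth-death simulation of mRNA counts for one gene; the mechanism (constant transcription, first-order degradation) and all rates are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_mrna(k=10.0, gamma=1.0, t_end=50.0):
    """Gillespie simulation of one gene: transcription at rate k,
    degradation of each transcript at rate gamma."""
    t, n = 0.0, 0
    while t < t_end:
        birth, death = k, gamma * n
        total = birth + death
        t += rng.exponential(1.0 / total)          # time to the next event
        n += 1 if rng.random() < birth / total else -1
    return n

# Simulate many independent "cells" under the assumed mechanism...
simulated = [simulate_mrna() for _ in range(500)]
print(np.mean(simulated), np.var(simulated))       # both near k/gamma = 10

# ...then compare this simulated count distribution against the observed
# single-cell histogram; a mismatch says the assumed mechanism is wrong.
```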
Okay, example five. I'm moving a little bit toward the field of immunology, and this is a really cool study about predicting receptor diversity using maximum entropy. For those of you who aren't familiar with immunology: we all start with the same genome when we're born, but we have a lot of immune cells that each produce a unique antibody. The way this is done in the body is that a kind of genetic modification happens in each of your T cells and B cells. You start with a bunch of possible fragments, and most of them get cleaved out: there are V elements, D elements, and J elements, one draw is taken from each array, and you end up with one protein made up of three elements. That is what produces the diversity in your body to make antibodies. The D region in particular is very diverse, because it's not just the combinatorics; there is also genetic mutation happening on top of that. These researchers were particularly interested in whether you can predict the sequences of D, but the region is so diverse and occupies such a large space that you can't possibly model the entire space. So they came up with a reduction of that space using theoretical physics, thermodynamics, and maximum entropy. Basically, you can write the maximum entropy distribution as a function of the effective energy of your sequence, where the symbol σ represents the sequence of amino acids, and you can write the energy down as a sum of different sources. The first term is just a function of the length of your sequence, the second is a function of each individual residue of your sequence, and the third is the energy from pairwise interactions between residues: from the first residue to the second, to the third, and so on. This is a simplification of what actually happens, but they were able to use this model to predict the actual diversity they see in nature, which I thought was really cool.
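In symbols, the decomposition they describe looks roughly like this (the notation is reconstructed from the verbal description, so treat it as schematic rather than the paper's exact formula):

$$
P(\sigma) = \frac{1}{Z}\, e^{-E(\sigma)}, \qquad
E(\sigma) = E_L(L) + \sum_{i=1}^{L} e_i(\sigma_i) + \sum_{i<j} e_{ij}(\sigma_i, \sigma_j)
$$

Here $\sigma = (\sigma_1, \dots, \sigma_L)$ is the amino-acid sequence, $E_L$ depends only on the length $L$, the $e_i$ terms score individual residues, the $e_{ij}$ terms capture pairwise interactions, and $Z$ is the normalization constant.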
Okay. Well, three or four or five minutes would be good. Okay, thank you.

The closest thing to a time machine, as far as I know, is looking into fossils. Getting into such a time machine and going a few billion years into the past, one of the most impressive things we would notice is the complexity of life, no matter how you measure it: in terms of different cell types, size, or whatever. At some point, life on Earth was just blobs, very complex ones, but still just blobs. Look around you today: you see wonderful flowering trees, you see buildings, the products of a complex civilization. So what we see is that things used to be less complex and are now more complex. Another simple observation, in most of the cases we can actually look at, is a rapid initial increase in diversity: we have a great expansion in the possibility space. One famous example to think about is how the body plans of metazoans expanded initially, and then we had fewer additions.

So we can go to the next slide, about the mechanisms. I have here two figures. The left figure is a classical one where the x-axis is some measure of complexity (it's very schematic) and the y-axis is just pure numbers. In the past, in our time machine, we had low complexity and small numbers. Today, in the lower panel of the graph, we see a big tail, with a schematic diagram of a human standing on the right side of the graph representing that tail; it's just a random walk with a barrier. So one proposed mechanism for this explosion of complexity is simply numbers: we have more than we had, we tend to focus on the tail of the distribution, and so we observe more complex forms, and that's it.

Not everybody agrees with this view of how innovations of complex forms emerge, and that's what the right-hand side is showing. Here, back in time is in the lowest panels and now is in the upper panels, and it shows two different ways of increasing complexity, with the same axes: the x-axis is a measure of complexity and the y-axis is numbers. This suggests a statistical test of whether what we observe is just growth in numbers, in which case we would see a larger tail, or whether the minimum of the complexity measure also grows. A third test, not shown, that is also implemented quite often on some data, is the parent-offspring ratio: if you look back in the fossils at parents and offspring, you can measure whether offspring in general have a tendency toward growth of complexity, whichever measure you use, or toward diminution, and then you can look at averages.

Wrapping this up in the third slide: there are many opposing views regarding how complexity evolves, why we have open-ended evolution, and how qualitatively different new forms, what we would call innovations, evolve. I just list a few here: the role of modularity, diversity, and robustness, and the role of environmental influence. But I wanted to think about this a bit more broadly, going into artificial systems beyond biology. We can now measure how a piece of code has been growing; think of the Linux operating system, which has been here for many years and represents thousands of hours of work. We can again study complexity with different, very specific measures, make the same statistical measurements, and see whether the emergence of complex forms, of increased complexity, is more of a passive trend of growth in terms of volume, an active trend, or a mixture of both.

And with this I would like to close my three minutes and let you think about your own system, your own area of expertise: what have been the important novelties in your own system, how did complex forms emerge, what possible pattern could they follow, and how could we make these statistical tests? That is probably what we can do in the discussion section. I hope I was helpful.
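For the discussion, here is a minimal sketch of the passive-trend null model described above: an unbiased random walk in "complexity" with a reflecting lower barrier. All parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

def evolve_lineages(n_lineages=1000, n_steps=500, floor=1.0):
    """Each lineage takes unbiased steps in complexity, but cannot
    drop below a minimum viable complexity (the barrier)."""
    c = np.full(n_lineages, floor)
    for _ in range(n_steps):
        c += rng.normal(0.0, 0.1, size=n_lineages)
        c = np.maximum(c, floor)       # reflecting lower barrier
    return c

complexity = evolve_lineages()
print(complexity.max(), complexity.min(), complexity.mean())
# Passive-trend signature: the maximum (the tail) keeps growing while the
# minimum stays pinned at the barrier. Under an active, driven trend the
# minimum would rise too; that is the statistical test described above.
```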