Okay, the next speaker is Deok-Sun Lee of KIAS. Welcome.

Can you hear me? Right, so thank you very much for inviting me to this nice workshop, 장봉, 정효, and Antonio Salani. I'm Deok-Sun Lee of KIAS. Today I want to introduce a simple model for the co-evolution of cellular metabolism and the phylogenetic tree of species. The motivation is that we wanted to explain the origin of some empirical features that we found when we analyzed a large-scale dataset of cellular metabolism across many species.

Before telling you my research problem, I'd like to first introduce the background. This shows you the metabolic reactions. In our cells, we have many thousands of chemical reactions that convert chemical compounds. When you collect all of them, you see a very complicated map like this, a map of biochemical pathways. It can also be represented by a network like that. This is a bipartite network connecting metabolic enzymes and reactions and chemical compounds. The left one is from E. coli and the right one is from humans. Although cellular metabolism is expected to be almost identical across species, when you check the details you see that it differs from species to species. We wanted to understand that difference.

So we downloaded the data from the BioCyc database. We used this version, and these are the numbers: it covers over 5,000 species and over 10,000 reactions. The starting point of all of our research is this species-reaction matrix. Each matrix element is 1 or 0, indicating whether reaction r is present or absent in species s. The main quantities we are interested in are these two. Both are obtained by summing the elements in a row or in a column: R_s is the number of reactions in species s, and S_r is the number of species containing the specific reaction r in their metabolism.
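As a small illustration of the two sums just described, here is a minimal Python sketch. The 3-by-4 matrix is made up for the example and is not BioCyc data; the function names are likewise hypothetical.

```python
# Toy species-by-reaction presence matrix (hypothetical values, not BioCyc data).
# Rows are species, columns are reactions; 1 = present, 0 = absent.
A = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
]

def reactions_per_species(matrix):
    """Row sums: the number of reactions in each species."""
    return [sum(row) for row in matrix]

def species_per_reaction(matrix):
    """Column sums: the number of species containing each reaction."""
    return [sum(col) for col in zip(*matrix)]

print(reactions_per_species(A))  # [3, 2, 3]
print(species_per_reaction(A))   # [3, 2, 1, 2]
```

Dividing each column sum by the number of rows would give the fraction of species that contain each reaction.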
The number of reactions is quite similar from species to species, so its distribution is almost Gaussian. On average, 1,500 reactions are found in a metabolism. The standard deviation is large, but the distribution is still close to normal. The interesting thing is that when you check the distribution of S_r, that is, how many species have a specific reaction r, it follows a power law. When you fit the data to a power law, you find that the exponent is 1.0. The slope is almost too exact to believe. So it's a power law with exponent 1. Those things were identified in the analysis of the empirical data.

So our question is this one: why is such a clear power law observed in the metabolic network? We checked the metabolism across species and it follows a power law, so we expect some principle to underlie that power law. But I don't know, would you find this finding interesting? One possible answer to the question would be that this distribution simply reflects the different biochemical importance of different reactions for living organisms, and the importance of different reactions happens to be distributed in that way. In that view, the power law just reflects widely varying degrees of importance of biochemical reactions, and it is no surprise, right? But of course I don't think so. We thought that this distribution comes from the specific way metabolism has evolved over a long time scale on Earth. If we think about the evolution of metabolism, then that could be the answer, but we need to be more quantitative in digging into this idea. That's the main theme of this work.

So we need to check the previous studies on models of metabolism evolution. There are many models, as far as we can check, and these two are the most famous ones.
In the left one, the main idea is that the goal of the metabolism is to produce a material A. If A is available in the environment, then everybody is happy. But if there is no more A, then we need to produce it inside the cell from B. Suppose we then run out of B: we need to produce B to make A, so we use the available resource C to produce B and then A, and so on. Chemical reactions are added one by one; it's a backward evolution. In the other model, in primitive times long ago, one enzyme catalyzed all the reactions, so it had a broad specificity, but it took a long time to catalyze every reaction. Then, as time went on, by mutation and duplication, many enzymes arose that act on selected reactions. Following this idea, if we draw the metabolic network, then it changes in this way.

Given that the structure of the metabolic network is characterized by a broad degree distribution, people were interested in proposing metabolism evolution models that can reproduce the broad, power-law degree distribution, and this one is based on that idea. In the study on the right, the authors introduced the organism degree, which is exactly equal to the number of species having a given reaction r, S_r in our study. They found that the organism degree is broadly distributed, but they didn't plot it on logarithmic scales. Then they proposed a network growth model in which they consider one metabolic network that expands by recruiting reactions one by one from a large pool of reactions. They were able to reproduce many properties, including the fact that at the center of the metabolic network you are more likely to find reactions with a large organism degree. The way they define the center is based on network properties like betweenness centrality and so on.

Also, Maslov and Sneppen, who are statistical physicists, were interested in explaining a specific scaling between the number of transcription factors and the genome size. There
is a quadratic scaling. So they introduced a similar model, which was published before the previous one. They proposed a toolbox model. It's also similar: reactions are taken from the pool and the metabolic network expands. The main idea is that they add a sequence of metabolic reactions and assume that this pathway segment is activated by the same transcription factor. That way they were able to explain the quadratic scaling. Also, Borenstein and colleagues were interested in explaining the human microbiome. The main difference in their idea is that they consider the interaction between the metabolisms of multiple species to explain the human microbiome. Right, so this is a brief introduction to the previous studies.

So let me first tell you the result of our model. The main difference of our model is that we consider both network expansion and speciation, that is, both the growth of the network and the growth of the species tree. This is the animation. Circles are species, and every species contains a metabolic network. New species are born, and the networks are also expanding. So this is our model.

Before telling you the details of our model, I would like to advertise that this model explains a lot of empirical features. Here, yellow circles are from the simulation of our model and blue triangles are empirical data, and you see their agreement. This one is the Gaussian distribution of the number of reactions in a species. This is the distribution in question, the power-law distribution of reaction popularity. I forgot to mention this: the popularity f of a reaction is simply the number of species containing reaction r divided by the total number of species, so f ranges between 0 and 1. The lower two panels were generated to demonstrate the predictive power of our model. It reproduces well the power-law distribution for compounds in the metabolic network. And the last one is the characteristics of
the species tree generated in our model. We measure the distance between species based on how similar the two species are regarding their sets of metabolic reactions. We use the Jaccard index; I'll explain it later. Among these, we will focus on this feature, since our original motivation is to understand and explain why this power-law distribution emerges.

Why such a power law emerges can be understood by considering a much simpler model than the one presented in the previous slide. So please think about this toy model. Circles are species and squares are reactions. Initially, imagine that there is just one species, A, having one reaction, number 1. At the next time step, this species obtains another reaction, number 2. At the same time a new species, B, is born, and it inherits reaction number 1 from A, so A is the parent of B. Simultaneously, species B obtains a new reaction, number 3. This is repeated. At time equal to 2, A obtains another reaction, number 4, and gives birth to another new species, C. C has reactions 1 and 2 and additionally has reaction 5, and so on. These yellow reactions are the newly recruited reactions at a given time. And this is the situation at time equal to 3. You see that reaction number 1 is found in all the species. On the other hand, reaction number 2 is found in 4 species, number 3 is found in 4 species, and reaction number 8 is only in species A.

This can explain the origin of the power-law distribution. We can compute everything. The number of species doubles and the number of reactions doubles at every step, so S(t) and R(t) can be obtained exactly. We can also compute how many species will have a given reaction r: it is given by an exponential function of tau_r, the birth time of the reaction. The earlier a reaction is born, the larger S_r will be for that reaction. Popularity is obtained by simply dividing it by the total number of species, so the popularity of a reaction decays exponentially with its birth time, or first recruitment time. Here, reaction number 2 was
recruited at time 1. So if we look at the situation at time equal to 3, then the time interval is 2, so 2 to the minus 2, that is, one fourth, is the popularity of the reaction.

The popularity distribution can be computed in this way. This is the definition of the popularity distribution: we check for every reaction whether it has popularity f. Then we change the variable to the birth time tau, and this is the Jacobian. We know how many reactions are born at a given time: R(tau + 1) minus R(tau) gives the number of reactions born at time tau, and it is 2 to the tau in this toy model. The popularity f decreases with tau, so the derivative of tau with respect to f, this Jacobian, grows as 2 to the tau. So the birth-time distribution behaves as f to the minus 1, and the Jacobian also behaves as f to the minus 1. Their product gives us f to the minus 2. That explains why we see a power-law distribution. This summarizes the result.

This model is satisfactory in that respect, but it has problems. The exponent is 2, not 1, sadly. Also, every reaction is recruited just once. Of course it is inherited, but, for instance, reaction number 4 is recruited at this time by species A and is then never again recruited by any species at any other time. Also, in this model, speciation and metabolism expansion (here the metabolism is considered as the set of reactions) occur simultaneously, so their time scales are set to be equal. And if we consider the network structure of metabolism, then the recruiting of a reaction will be more selective: a reaction should be able to maintain a non-zero flux when it is added to the metabolic network. That should be considered. We want to address these points further.

So this is an improved model. In the previous slides I showed the result of running this model. We consider a universal pool of reactions. Here, RSA is the standalone reaction set: these reactions can be activated even if they are inserted into the metabolism
alone, because they use substrates that are externally available.

This summarizes the main steps of the model; this is the algorithm. The beginning is the same as in the toy model: one species having one reaction exists. Then, at every time step, every species undergoes one of three transitions: expansion, rest (which means doing nothing), or speciation. Which of expansion, rest, or speciation occurs is decided depending on the reaction selected for insertion. At every time step, every species selects one recruitable reaction from the universal pool and then examines whether the species already has a reaction similar to the candidate reaction. As for the meaning of similarity, similar reactions are reactions sharing substrates or a product. Suppose there is no similar reaction in the current metabolic network of the considered species. Then the new candidate reaction is just added to the metabolism. Suppose instead that the new reaction has a similar reaction already in the metabolism. Then either rest or speciation happens, with speciation occurring with probability mu. This is the only parameter of the model. In speciation, R_new replaces the similar reaction and a new species is born.

Please see this one. This is the parent species, and the new reaction is this one. This new reaction replaces this one. The parent species is still there, but a new species is born. Comparing these two metabolic networks, this one is replaced by this one, and this one was dropped because it cannot be activated anymore. Right? So mu controls the speciation rate. We repeat these steps until we obtain over 5,000 species, the same number of species as in the empirical data. I showed you the result in the previous slide. This is another snapshot of the simulation of this model.

All right. So how many minutes do I have?
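The expansion, rest, and speciation steps described above can be sketched in Python as follows. This is a minimal sketch under loud assumptions and not the speaker's implementation: each reaction is a hypothetical (substrate, product) pair of integer compound labels, the universal pool is simply every ordered pair of distinct compounds, a reaction counts as "recruitable" when its substrate is external or already produced by the species, "similar" means same substrate or same product, and an offspring left with no active reaction is discarded.

```python
import random

def prune(reactions, external):
    """Keep only reactions that can be activated starting from external nutrients."""
    avail, kept, changed = set(external), set(), True
    while changed:
        changed = False
        for s, p in reactions:
            if (s, p) not in kept and s in avail:
                kept.add((s, p))
                avail.add(p)
                changed = True
    return frozenset(kept)

def step(species_list, pool, external, mu, rng):
    """One time step: every species expands, rests, or speciates."""
    newborn = []
    for i, sp in enumerate(species_list):
        avail = external | {p for _, p in sp}
        # Assumption: recruitable = not yet present and substrate available.
        candidates = [r for r in pool if r not in sp and r[0] in avail]
        if not candidates:
            continue  # nothing to recruit: rest
        new = rng.choice(candidates)
        # Assumption: similar = shares the substrate or the product.
        similar = sorted(r for r in sp if r[0] == new[0] or r[1] == new[1])
        if not similar:
            species_list[i] = sp | {new}  # expansion
        elif rng.random() < mu:
            old = rng.choice(similar)  # speciation: new replaces a similar reaction
            child = prune((sp - {old}) | {new}, external)
            if child:  # assumption: discard offspring with no active reaction
                newborn.append(child)
        # otherwise: rest; the parent is left unchanged
    species_list.extend(newborn)

def simulate(n_compounds=12, n_external=3, mu=0.02,
             target_species=20, max_steps=10_000, seed=0):
    rng = random.Random(seed)
    # Hypothetical universal pool: every ordered pair of distinct compounds.
    pool = [(s, p) for s in range(n_compounds)
            for p in range(n_compounds) if s != p]
    external = frozenset(range(n_external))
    standalone = [r for r in pool if r[0] in external]
    # Start from one species holding one standalone reaction.
    species_list = [frozenset({rng.choice(standalone)})]
    for _ in range(max_steps):
        if len(species_list) >= target_species:
            break
        step(species_list, pool, external, mu, rng)
    return species_list

species = simulate(mu=0.2, target_species=10)
print(len(species), sorted(len(sp) for sp in species))
```

The pool here is tiny, so the output is only useful for checking that the three transitions behave as described; reproducing the distributions shown in the talk would need the real reaction pool and the stopping rule of over 5,000 species.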
Eight minutes, all right. From now on, what I want to do is a calculation similar to the one I did for the toy model. I'll compute the number of species and the number of reactions as functions of time to derive how the popularity distribution behaves as a function of the popularity.

In this model there is one parameter, the speciation rate, and the results change depending on this parameter. But please look at this line. We know that the mean number of reactions per species is empirically 1,500. We ran our model for these four values of mu, and among them mu equal to 0.02 gives the most similar number of reactions per species, 11,100. Right? So from now on I'll present the results with mu equal to 0.02.

One remarkable feature of this model is that it shows a crossover behavior in the popularity distribution. When mu is equal to 0.02, the crossover is not so clear. But when you use a larger value of mu, for which the metabolism size is smaller than the empirical one, you see two regions: a slow decay and a fast decay of the popularity distribution. In the slow-decay region the exponent is 0.7 or 0.8, and in the fast-decay region the exponent is 1.5 or 1.9. To me it looks like the larger exponent is close to 2, the wrong exponent of the toy model, and 0.78 is close to the empirical value. When we plot the simulation together with the empirical data, the slow decay looks similar to the empirical one. So my point is that there is a crossover behavior. Let me denote the crossover scale of the popularity by f star. f star also depends on the parameter mu, and with a value of mu as small as 0.02 the crossover is not so clear, so the agreement with the empirical data is better.

The first quantity is the number of species as a function of time. It was 2 to the t in the toy model, and it's similar here. We can write down the equation for the number of species.
Here mu is the speciation rate. As for alpha, in our model a new species appears only when the candidate reaction has a similar counterpart already existing in the current metabolic network. So alpha is the probability that the candidate reaction has a similar reaction already in a species. If it is constant, then we can expect that S grows exponentially. Right? And this is the result: it really grows exponentially, and the exponent is measured as 6.31. Here we introduce the normalized time: t tilde runs from 0 to 1, obtained by dividing by the total simulation time. Right? And we divided the time axis using the crossover popularity value: f star was 0.4 with mu equal to 0.2, at t tilde equal to 0.4. Right? Such an exponential growth was observed for different values of mu.

Next is the number of reactions, and this is the equation for the evolution of the number of reactions. It is also expected to grow exponentially, but there is a constraint. 1 minus alpha is the probability of expansion, and mu alpha is the probability of speciation. If either of the two happens, then a new reaction will be recruited. But R(t) counts the number of distinct reactions, and a newly recruited reaction may not be brand new. So we introduce beta to indicate the probability that the new candidate reaction is really brand new. [unintelligible] As in the previous slide, it is given by this factor times beta times S. Our expectation is that beta is almost constant at early times, and at later times beta decays as S to the minus one, so the product is almost constant. So this distribution will be almost constant in the late-time regime. [unintelligible] Blue is the early time and red is the late time. [unintelligible]
[unintelligible] In the late-time regime, the birth-time distribution is almost constant. So if you remember this formula, then P(tau) is almost constant in the late-time regime, so it cannot give the f to the minus 1 contribution. That's why, in the early-time regime, we see that P(f) is given by f to the minus 2, and in the late-time regime P(f) goes as f to the minus 1. [unintelligible] Of these two quantities, the birth-time distribution and the derivative, this Jacobian goes as f to the minus 1. These exponents are 0.7 [unintelligible], and these exponents essentially determine this slope, 0.766, etc. So the exponent of the popularity distribution is essentially determined by this Jacobian. Okay?

And lastly, to test the predictive power of this model, we checked the properties of the species tree generated by our model. We measure the distance between species based on the dissimilarity of their sets of reactions: we define the distance as 1 minus the Jaccard index of their sets of reactions. Its distribution depends sensitively on the parameter mu. Luckily, we found that when mu is quite small, such as 0.02, the distribution overlaps with the empirical data. This black one is the empirical one, and the yellow one is from the simulation. So that is a plus for our model.

Right. So in this work, we wanted to understand the origin of the power-law popularity distribution. We found that the birth time of a reaction essentially determines its popularity, and that the birth-time distribution together with the dependence of popularity on the birth time essentially determines the statistics of popularity that we observed in the empirical data analysis. All right. Thank you very much for your attention.
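The change-of-variables argument summarized above can be written out as a short derivation. This is a sketch for the idealized case only: it assumes the toy-model forms (popularity decaying as 2 to the minus tau, and 2 to the tau reactions born at time tau), and for the late-time line it additionally assumes that popularity still decays exponentially with birth time. The exponents measured in the simulation (around 0.77) differ from these idealized values.

```latex
% Assumed toy-model inputs: f \propto 2^{-\tau} and
% R(\tau+1) - R(\tau) \propto 2^{\tau} reactions born at time \tau.
\begin{align}
P(f) &= P(\tau)\,\left|\frac{d\tau}{df}\right|, \\
P(\tau) &\propto 2^{\tau} \propto f^{-1},
\qquad
\left|\frac{d\tau}{df}\right| = \frac{1}{f \ln 2} \propto f^{-1}, \\
P(f) &\propto f^{-1} \cdot f^{-1} = f^{-2}
\quad \text{(early-time regime, as in the toy model)}, \\
P(\tau) &\approx \text{const}
\;\Rightarrow\;
P(f) \propto f^{-1}
\quad \text{(late-time regime, Jacobian only)}.
\end{align}
```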
Any questions?

Can you tell us what the rate of new reactions being recruited is? I mean, this.

Right, so mu is the rate of giving birth to new species. Mu equal to 0.02 means that when you expand your network 50 times, you have a chance to give birth to a new species. So yes, it's the rate of speciation, right?

Any other questions? If not, let's stop here, and let's thank Professor Lee.

Right.