Okay, okay, so I've got to stay here frozen in place. I wasn't quite sure what to present at this meeting, but especially given what Stephanie will be talking about later this afternoon, where she's presenting some results about how looking at the archaeological record can inform our current thinking about how to go forward, I thought it might be worth presenting some results from several years ago now, with my great collaborators Michael Price, Tim Kohler, Brendan Tracey, Hajime Shimao, and Jaeweon Shin, looking at a data set that was actually collected by somebody else, Peter Turchin, who's somewhat of a controversial figure, and analyzing it a bit more carefully. There were comments made this morning about, you know, the old platitude that all models are wrong, but some are useful. There are some important consequences of taking that kind of message to heart that I'll be presenting in this talk. And if there's time remaining at the end, I'd like to basically point at a bunch of approaches, a bunch of mathematical tools, that Stephanie is also involved with, which we think would provide a way to go much, much more sophisticated in terms of these analyses, and could also be applied far, far more widely.

Okay. I'm actually more comfortable if people ask me questions during the talk, so long as it doesn't get out of hand. So if there's anything here that's not very clear, please let me know.

First off: Peter Turchin, with a group of about 51 other archaeologists, got together about five years ago to create a data set, by collecting data that extends over ten thousand years, all six inhabited continents, all civilizations, and to start analyzing it. Ultimately Peter does have motivated reasoning — he has a hypothesis about what is driving history that he wants to use this data to actually confirm — and he's recently come out with a paper in Science Advances, using a somewhat more sophisticated and more updated version of this data set, where that's one of the major lessons he's drawing. But credit where credit is due: this data set that he curated and brought together is an amazing thing, and they did a very good first statistical analysis. But there's a lot of extra things one can discern from the data, and that's what we'll be talking about today.

So first I'm going to go into what this data is, and then later on I'll be talking about how we got more information out of this data. Here's the PNAS paper. As you can see, as I promised, it's about 50 — I think 51 — archaeologists. These are people who had expertise in different specific civilizations in different times. One of them is an expert in the Khmer civilization, for example; another one is an expert in ancient Peruvian civilizations; another is an expert in Latium, the Latium plain; and so on and so forth. There was a consensus reached on how exactly to quantify a set of about 51 characteristics of the civilization that is each person's expertise. Every single one of them came up with these numbers for the civilization that's their expertise, and then the data was brought together. These are the civilizations — there were a total of 30 that they ended up putting together into this data set.
You can see here their locations, all across the planet. What they did was take those original — what they call Natural Geographic Areas — and there were lots of crude attempts to correct for statistical biases in how they formed this data set in the first place. They threw out a lot of data to try to make there be things like uniform coverage. To a Bayesian, to someone who's anal retentive about such things like I am, you never, ever, no matter what, throw out data. All data is at worst useless; if you're careful in your statistics, it'll never be something that you don't want to know about. But in any case, to be quite honest, it makes sense for a first approach, and they did it. So they settled on those 30 Natural Geographic Areas, which I just showed you, and hundred-year time slices, from the beginning of agriculture — as far back as 9600 BCE, depending on the actual polity in question — to very, very close to the modern period. And what you have here, which is very, very important, is longitudinal data. It's not cross-sectional; it's longitudinal. You can watch individual civilizations over time. The reason that's so important is that different civilizations, so to speak, had different birthdays. If you just take a cross-section of a group of people who have very, very different ages and try to discern something about the dynamics of an individual human's lifetime, you're basically screwed: you want to be able to track individual humans. And that's what they were doing in this data set, which they called the Seshat data set. There's one other thing that they did.
I think it's on this next slide. Yes: they took the 51 different variables, and then — here's one of the first hacks — they collapsed them down to what they called nine complexity characteristics. These were the nine variables that they then sent into their subsequent statistical analysis, and you can see here what those nine variables are. So, for example, in infrastructure there were a bunch of binary variables: do you have markets or not? Do you have food storage or not? But a lot of these variables actually had a small number of numeric values. For example, it could be ports — how big was your port? Irrigation — different ways of quantifying the sophistication of the irrigation — and so on and so forth. They had substantial problems that they had to finesse, because not only did they have to choose which data to ignore, as I mentioned, they also had to deal with missing data. For no civilization can you actually fill in all 51 variables for all of these one-century time slices. So they used a technique that's standard in statistics but is, frankly, junk, called imputation. And what I'll be pointing to at the end of the talk, hopefully, is how you can do fitting of data sets with Langevin dynamics in a way that completely circumvents that issue.
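To make the imputation worry concrete, here's a minimal numpy sketch — invented numbers, not the Seshat pipeline: filling missing entries with the observed mean systematically shrinks the variance that any downstream analysis (such as a PCA) would see.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for one Seshat-style variable: 200 observations,
# roughly 30% of them missing.
x = rng.normal(loc=5.0, scale=2.0, size=200)
mask = rng.random(200) < 0.3          # True where the value is "missing"

observed = x[~mask]

# Mean imputation: fill every gap with the observed mean.
imputed = x.copy()
imputed[mask] = observed.mean()

# The imputed series understates the true spread of the data, because the
# filled-in values carry zero variance of their own.
print(round(observed.std(), 2), round(imputed.std(), 2))
```

The imputed standard deviation is always smaller than the observed one, which is one way imputation distorts the covariance structure a PCA is built on.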
So anyway. Okay, so here's the way that they started. I can carp about it, but truth be told, it's a very reasonable first pass at what you'd do with this data set, all these decisions that they made. Then they got a result which, frankly, I was 100% convinced had to be wrong — every historian knows that this is junk — but it was true. They did just a principal component analysis, the dumbest possible thing you can do, on the entire data set in the space of the nine complexity characteristics. And they found that the first principal component — for people who don't know, PCA is just fitting a single Gaussian to your data, and the first principal component is the major axis of that Gaussian's ellipse — which is actually about equally loaded on all nine complexity characteristics, so it's not just ending up being one of them, explains over 70% of all the variability. So you can just look at that one direction in nine-dimensional space. This is like a historian's version of that T-shirt with a picture of the Milky Way galaxy saying "you are here": you can just point to where you are on that one axis and say, this is where your civilization is. And civilizations all moved from low PC1 values — low values of that principal component — to higher ones. Absurd. This is like the revenge of Toynbee or something like that.
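That "one dominant axis" result is easy to reproduce on synthetic data. A minimal numpy sketch — the nine variables here are invented stand-ins for the complexity characteristics, not actual Seshat values: when all nine share a single latent factor, PC1 soaks up most of the variance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy version of the Seshat setup: 400 polity-century rows, 9 variables
# that all load on one latent "complexity" factor plus noise.
n, d = 400, 9
latent = rng.normal(size=(n, 1))
data = latent @ rng.uniform(0.8, 1.2, size=(1, d)) + 0.3 * rng.normal(size=(n, d))

# PCA via SVD of the centered data matrix.
centered = data - data.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)

print(explained[0])   # fraction of variance on the first principal component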
Everybody knows this can't be true. So the first thing that I did was try to dive into the data and find mistakes — but it's legit. Here, as I say, is the loading of the first seven principal components, again in the nine-dimensional space. Here are some cross-validation kinds of things that they did. One of the rather powerful natural experiments — economists would talk about instrumental variables — is something called the Atlantic Ocean. You can compare the New World to the Old World, and in the times this data is collected from, you're pretty sure that they were completely independent. And you can actually see that the same phenomenon is going on in the New World as in the Old World. So this is actually human civilizations: the experiment was done twice. Okay, let's see what we've got here. Yep — that's, as I was just saying, the held-out data for North America. And for those of you who are history junkies, you can spend hours and hours looking at the actual data and what it's showing. So here are just some examples to whet your appetite. Here are the years, and here is that first principal component that explains almost everything, and here is Upper Egypt and the Middle Yellow River Valley, and as you can see there are obviously fluctuations. And here are some other civilizations: Susiana, the Kachi Plain —
I think that's actually the Kyoto area — Latium, which more or less contains where we are right now; it certainly contains Rome — Sogdiana, the Oxus River civilization, and so on. And as you can see, there are many major fluctuations downward. Whether you call them fluctuations or not, there are many dips downward in PC1 with time. So it's not that PC1 exactly equals time. But nonetheless it does seem to be something that you can basically view as a measure of the complexity of a civilization. You know, sociologically you are not supposed to say that one society is more developed than another, but to be quite honest, the PC1 value plays that role a lot. And here's just how they go with time. All right. So now I'm going to take those results of Peter's — so I went through this paper, confirmed that, boy, it really does seem to be legit — and ask: how can we take it further? So, principal component analysis. Here is an example. It's a simple sine wave, and if you were to do a principal component analysis — so you try to fit that data with a single Gaussian — you will find that the first principal component is this red line and the second one is the green line. This red line is going to explain almost all the variability, so you'd be pretty sure that, yep, this is just a single Gaussian. Notice that you're missing a hell of a lot of important structure. So, okay, given that: do we actually maybe have this kind of structure underlying the PCA, underlying the data set? Yes, we do. This is PC2 — the second most important principal component — and this is PC1. It actually looks just like that sine wave. This is the data; we've got a sliding window here.
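The sine-wave cartoon above is easy to reproduce. A minimal numpy sketch on synthetic data: for points along a sine wave, PC1 explains nearly all the variance — so a single-Gaussian summary looks fine — even though the oscillation, the interesting structure, lives entirely in PC2.

```python
import numpy as np

rng = np.random.default_rng(2)

# Points along one period of a sine wave, lightly jittered — the situation
# sketched on the slide.
t = np.linspace(0, 2 * np.pi, 500)
pts = np.column_stack([t, np.sin(t)]) + 0.05 * rng.normal(size=(500, 2))

centered = pts - pts.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)

# PC1 dominates (here roughly 95% of the variance), yet all of the
# oscillatory structure is hiding in the small PC2 direction.
print(explained)
```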
That's what these error bars are reflecting, and you can see, across all civilizations — this is our history, folks — we seem to be going down in PC2 as we grow in PC1. Then we hit this point — a hinge point, we called it — and you now start increasing, and then you start going down again. Okay. Well, what's going on? Notice this is all model-free, really, in a certain sense. Or, to put it another way — to point again to that earlier talk — this is taking the original model, which is a single Gaussian, and saying: well, not only is it wrong, it also might be a little bit misleading. Let's go into the data a little bit more. But it's still model-free in the sense that there's no — well, I'll show you — so far there's no underlying thesis about the way human cultures work. This is just raw collection of the data. Sorry — David? [Audience: How much variance does PC2 explain — how much does it explain, after PC1?] Yeah, let's see. I don't know the number offhand; I'd go back to — right there. So this is the amount that's explained: PC1 is up there, just under 70%, and PC2 is actually very, very close to a bunch of the other PCs. All right. So at this point I've not said anything about humans, really. This is just raw time series analysis. This could have to do with populations of frogs in a pond or what have you — and that's not completely a joke, by the way; I've been talking off and on about applying some of these techniques to ecosystem modeling. But now I'm going to actually dive into some social science kinds of hypothesizing about what's going on here. So, well, let's look at those nine complexity characteristics.
Here's what they are for the top two principal components. This is the population of the polity, the territory of the polity, the population of the capital, the number of levels of the government, the government type, and so on. And then over here is the infrastructure, how developed it is; writing — there are various kinds of writing, all the way up to what I think was listed here as the most sophisticated, alphabets — and how widespread texts are; money — there are of course many kinds; the most, quote, primitive money is barter, and you can go all the way up to fiat money, thanks to Yuan China — actually, I think it was the Tang, but in any case. And so here actually is the answer directly to Matteo's question: 77% of the variability, and 6%. Now, if you look at what these are, I'm going to very, very crudely refer to these ones here — these have to do with the information processing in the polity, in the civilization — and these ones have to do with just how damn big it is, how fat that polity is. This one is sometimes called, in archaeology, scale — it's called a scalar value, not to be confused with what everybody else would mean by the word scalar. And this one — expanding a little bit beyond what the data necessarily justifies — we're going to refer to as the computational capabilities of the civilization: how much information processing it can actually do, viewing a civilization as a computer, in essence. Okay. So, given that there are these two groups of values here: notice, by the way, that PC2 flips from having negative loadings on the first components to having positive loadings over there. Moreover, because PC1's loadings are all positive, and by definition PC2 must be orthogonal to it,
we know it has to be the case that some PC2 components are negative and others positive — so as PC1 goes up, some PC2 components are going down and others are going up. And if you use that and apply it to this particular data set: here is the actual sum of the PC2 components, and here is just looking at the negative-loading ones and the positive-loading ones. So what we're seeing here — to interpret this for you — is that up until this point right here, the size characteristics of the polity are increasing without much change in the information processing capabilities. You're getting bigger and bigger and bigger, but you're still a dumbass. Eventually you get up to a hinge point, a threshold, and all of a sudden you say: wait, now I'm actually pretty big. The causality, who knows — it might be that you can't grow bigger without developing more information processing capabilities, or maybe only now are you able to — but whatever it is, you tend to stop growing physically and instead start developing greater and greater information processing capabilities. Your computer stops adding more and more slow memory, and now it's getting a bunch of GPUs added to it. Until eventually you get up to a sufficiently sophisticated information processing capability, and now you can start growing and getting fatter and fatter again. That's us. That's humans, folks. Who knows what it would be with some alien civilization, but that seems to be us. And who knows why this is happening — but this is just what the data seems to be saying, with minimal modeling. There are many, many other patterns in this data. This is actually the data shown as time series: this is PC2, this is PC1. It's color coded — the color is just to try to distinguish the trajectories of the different polities. But there are all kinds of properties in this data set. First of all, notice that there are huge regions of PC1–PC2 space where nobody is. Why? What's wrong with those regions of PC1–PC2 space?
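A hinge point like the one described above can be located with a very simple segmented fit. This is only an illustrative sketch on synthetic data — not the method used in the paper: fit y = a + b·x + c·max(x − h, 0) by least squares over a grid of candidate hinge locations h, and keep the best.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic PC1/PC2-style data: PC2 falls with PC1, then bends upward
# after a hinge at PC1 = 6 (all values invented for illustration).
pc1 = np.sort(rng.uniform(0, 10, 300))
true_hinge = 6.0
pc2 = -0.5 * pc1 + 1.2 * np.maximum(pc1 - true_hinge, 0) + 0.2 * rng.normal(size=300)

def hinge_fit(x, y, candidates):
    """Least-squares fit of y = a + b*x + c*max(x - h, 0) over candidate hinges."""
    best = None
    for h in candidates:
        X = np.column_stack([np.ones_like(x), x, np.maximum(x - h, 0)])
        coef = np.linalg.lstsq(X, y, rcond=None)[0]
        sse = np.sum((X @ coef - y) ** 2)
        if best is None or sse < best[0]:
            best = (sse, h, coef)
    return best

sse, h_hat, coef = hinge_fit(pc1, pc2, np.linspace(1, 9, 81))
print(h_hat)   # lands near the true hinge at 6.0
```

Grid search plus linear least squares is crude but transparent; with a clear slope change and modest noise, the recovered hinge sits within the grid resolution of the true one.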
Also notice: here's PC1, and time tends to be going left to right. What seems to be happening — and this can actually be quantified now, with techniques I'm not going to show here — is that societies all start to spread apart, but then they start to — to use a biological term — canalize. Over here it seems like everybody's getting on the same highway, we're all going in the exact same direction, and we're converging to the same spot. And, interestingly, these are the two hinge points. So this is the one where you now start developing information processing capabilities, and everybody goes off in all which directions; and then, once you have sufficient information processing capabilities, everybody starts homogenizing. And of course — there have been many, many things remarked in this talk about things like loss of language diversity worldwide and so on and so forth — that's all part and parcel of this kind of trend in human civilization, in that case at the global scale. Okay, but other things to notice. These patterns hold mostly for Old World societies — those are the blue ones. For New World societies it's not so clear that they do. This might be because — to use very, very loaded language — New World societies never actually got as developed as Old World societies: they never got over here in the PC1 space. But that's to be determined. These are the kinds of things one can start to actually tease out of the data. [Audience: You asked for questions during the talk, and you just said New World societies never got as civilized as Old World ones. What about things like the Aztec and Maya societies?] Fair enough — let me be more careful in my language: the New World societies that are recorded in the Seshat data set. So this is actually things like Oaxaca, and I think Monte Albán. The Inca were not in here; Tenochtitlan is not in here; Teotihuacan is not in here. I think the Maya were — the Maya were, in fact.
Yes, I'm sure of that — but that's about the extent of it. [Audience: Based on the data that you have, and given that you're seeing that dissimilarity, how representative do you think these patterns genuinely are of overall trends, if they're not representing all stages of development in these societies?] Very, very good, and this gets to the issue that I was emphasizing before: you never, ever throw out data. They did, because they wanted to get essentially uniform coverage over the world, and they also wanted to pick what they — going in, before even analyzing the data — viewed as early-stage, middle-stage, and late-stage civilizations. So they just did that as a first pass to try to de-bias it, because obviously there's a hell of a lot more data in, say, Latium than there is in, say, something like sub-Saharan Africa. So, completely — and there's a lot I'll try to point to at the end of the talk. How much time do I have — 20 minutes? Okay, I'll try to point at the end of the talk — there are some techniques that have been developed in machine learning. Let me actually point to it now. Do you know what Fokker-Planck dynamics is? Langevin dynamics, stochastic differential equations? Okay — stochastic processes. There have been some techniques developed in machine learning recently, fully Bayesian approaches, for fitting Fokker-Planck dynamics — a stochastic process — to a data set where you can have missing data.
You can have missing fields, you can have all the data in there or not — there are no biases that come in by having extra data points. This is very, very powerful machinery. And very, very importantly, the Fokker-Planck dynamics — the associated stochastic differential equation — is not varying in time, but it is allowed to vary in space. So it's allowed to vary across this entire system here. And that is the proper way to address your question, and a whole hell of a lot else as well. But that's not here, and it's also not in this paper where we did this initial analysis. So that's probably more than you wanted to know. [Audience: Just one other question. Why do you call this second component "information"? It looks like it's a measure of the amount of interaction or exchanges that are occurring.] Well, I don't know — look, one could view it that way. I mean, these are certainly — if you do want to view a society as a computer, and insert here all the clichés about how every scientific era likes to view physical systems in terms of whatever the sexy thing is in that scientific era, so viewing societies as computers is maybe a silly thing to do — but nonetheless, if you do want to do it, this makes sense as the information processing capability. If you have an alphabet, first of all, your writing system is going to come with much, much greater literacy, and in general you're going to have many, many more kinds of text. Early, pre-alphabetic writing — at first only scribal classes could ever read or write it, so it was a very, very limited part of your actual society. Moreover, in many early societies with very early scripts, most of what was written was actually just bookkeeping. There was almost nothing in terms of what we would consider either the sciences or the arts, because it was very, very practically oriented.
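To illustrate what "fitting Langevin/Fokker-Planck dynamics to a time series" means in the simplest case, here's a sketch on synthetic data. It uses a crude binned (Kramers-Moyal-style) estimator, not the fully Bayesian machinery referred to in the talk: simulate an Ornstein-Uhlenbeck process, then recover its state-dependent drift from the sample path alone.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulate dx = -theta * x dt + sigma dW with Euler-Maruyama.
theta, sigma, dt, n = 1.0, 0.5, 0.01, 200_000
x = np.empty(n)
x[0] = 0.0
noise = rng.normal(scale=np.sqrt(dt), size=n - 1)
for i in range(n - 1):
    x[i + 1] = x[i] - theta * x[i] * dt + sigma * noise[i]

# Kramers-Moyal estimate of the drift: drift(x) ~ E[dx | x] / dt,
# binned over the state space.
dx = np.diff(x)
bins = np.linspace(-1, 1, 21)
centers = 0.5 * (bins[:-1] + bins[1:])
idx = np.digitize(x[:-1], bins) - 1
drift = np.array([dx[idx == b].mean() / dt for b in range(len(centers))])

# A linear fit of the estimated drift recovers a slope near -theta.
slope = np.polyfit(centers, drift, 1)[0]
print(slope)   # should sit near -1.0
```

The point of the Bayesian versions of this idea is that they do the same recovery of state-dependent drift and diffusion while handling missing fields and short series gracefully — which naive binning, as here, cannot.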
That's not completely true — for example, Chinese calligraphy actually originated in divining the future by looking at cracks on the shells of tortoises and things like this. But by and large, the more sophisticated the writing, the far greater the extent to which writing can be used by society as a whole to develop itself. Some people say that Gutenberg's printing press is one of the most important inventions ever, because it basically unleashed the early European version of what we would now call the internet revolution: once you had the printing press, literacy was much more widely distributed, many, many more texts could be written, and you didn't have to have these silly monks in monasteries writing things down very, very tediously. This was a similar thing. And that's true for money, certainly, as well. One of the major uses of money, economists will tell you, is to essentially be a Hayekian construct that allows the market to be the computer, and so on. If you don't have fiat money — if you've just got barter money — you're not really going to get there. So it does make sense.
I think, that is, to view these this way. Whether that's the reason they're important or not, who knows, but they are actually variables that enable the information processing abilities of the society. [Audience: I've got a lot of questions around this that can wait for later. You stated that information spreading often comes in when societies reach a certain size — I think of the Victorian era, when we started to evolve ways of dealing with disease. But what it makes me think of is that some of these ancient societies collapsed because of what we think may be related to disease and pandemics — there's evidence to suggest that for some of the Latin American civilizations, and the Khmer civilization. Has it ever been examined whether there is a lack of information transfer when they hit that significant size — whether you actually see other factors coming in that could lead to the extinction of those societies?] So, there are many things that can happen to a society that don't have to do directly with these variables — volcanic eruptions, pandemics coming through, the Justinianic plague, take your pick; Thera erupting, maybe or maybe not having to do with the crops of the Minoan civilization. One of the things that the stochastic process modeling techniques provide is the following. I can take a data set, hit it with a stochastic process fit, and then say: okay, look at this particular event right here — how probable would it have been under this long-term stochastic process that seems to govern the dynamics of human civilizations? If the answer is "very, very improbable," then that is great evidence that you've got an external perturbation on your system. That's not the subject of this talk, but going to stochastic process modeling, where everything is probabilistic, is extremely powerful. There are lots of things you can do with it, some of them addressing the very question that you just raised. Unknown unknowns: which big glitches
seem to be due to unknown unknowns, in the sense of not being recorded in the data set? Okay. All right. [Inaudible question.] Probably the way that they normalized their numbers — I don't know, I'd have to go back and dig into it. Whenever you dive into data sets, you always find there are going to be some glitches. And I should say that in that recent Science Advances paper they have developed it further. There are now more than 51 variables, and they got a lot of pushback, mostly from archaeologists — there were big fights about the way they were collecting their data and so on, and basically they won those fights — but nonetheless they've modified a lot of how they are assigning numbers in, like, version two. Let me actually give you an example of a fight, in the small amount of time I have left. This is an interesting topic in, almost, the sociology of science. There has been this hypothesis kicked around — Joe Henrich and others are among the major proponents — that a necessary condition for a society to get sufficiently complex is that you have to have a religion based upon moralizing high gods. It's tied up with people who believe in things like the Axial Age — which, by the way, is junk; Peter Turchin actually has a great paper that basically takes that to shreds. The notion of why a moralizing high god would be necessary is the following — it kind of makes sense as a cartoon. Go back to the standard deities, say the Greco-Roman deities: Zeus, Apollo, all of them. The dudes, and a couple of the dudettes, were all really just powerful humans in all ways, shapes, and forms, and for you to get them to look kindly upon you and to benefit you, you just had to offer up enough sheep and cattle, and maybe get yourself sufficiently stoned in the Pythian caves, and this, that, and the other. And that's how you went through life and did your religion.
Moralizing gods — the Abrahamic religions, notionally some of the Hindu deities (if you want to actually say Hinduism is a religion; potentially you could view Buddhism this way too) — the moralizing gods said: no, no, no, none of that. For you to actually get the benefits of the religious realm, you've got to be moral and good. Why might this make any kind of difference? The cartoon is that, well, if everybody is now being incentivized to be good, rather than just chopping up enough of their firstborn sheep, that means they are much more likely to actually engage in reciprocal altruism. They are going to be much nicer to people that they don't happen to personally know, if they think that they're being watched by the big dude in the sky, who's going to really slam them if they're not nice to people just because those people happen to be from a different tribe. That's the cartoon. And so, as I say, Joe Henrich and many others were saying that this is in fact true: you need to have moralizing high gods before you can have a jump in complexity in civilizations. Peter — I understand why he did this; I would have as well, or I would have possibly — was very, very eager to mine the Seshat data set just after it came out. So he actually came out with a Nature paper — Nature, not Nature Human Behaviour — which later they had to retract, but for other reasons besides this one. What they tried to do was use the Seshat data set to address this question: are moralizing high gods a precondition for societies to have a jump in complexity? The way they tried to do it is, for each society in Seshat, they did a simple time series analysis of the PC1 values to find the large jumps, interpreted those as jumps in complexity, and then asked: did they precede the onset of moralizing gods? And they concluded, in fact, that moralizing gods always occur after a jump in complexity.
So the causal arrow, they said, if there is one, goes in the other direction. Okay. Well, we've got other measures of complexity here besides just PC1. We realized that we can actually use these hinge points as measures of complexity — so let's see what that says. Because there's actually a major statistical flaw in what Peter et al. were doing in that Nature paper. In order to carry out their program of doing this time series analysis, you need to have PC1 data for a society from before the onset of moralizing gods to after. To get that — and here's them throwing out data all over the place again — they excluded societies from their analysis where they didn't have such data. This is very statistically biased, because societies where the moralizing-god onset is much more recent are much more likely to satisfy those conditions. It was realizing this that led us to say: well, maybe we can use the hinge points instead. So, as I was saying, the alternative we investigated was to see whether the PC1 of a society had reached its first hinge point — that's how we assessed social complexity — and see where the moralizing-god onset falls compared to that. Anybody here want to make predictions of what we found?
Basically, moralizing gods are irrelevant. There's too much other stuff going on. Not completely irrelevant — presumably they have some role to play — but here's what we found. Remember, the first hinge point is right about there. We found that there are some onsets of moralizing gods before the first hinge point, and some of them after. You know, maybe you might want to say that there are more after than before, but you would have to be careful about your statistical biases and about when you can actually measure the onset of moralizing gods. To a first pass: no — it's a nice cartoon, but it hasn't really got anything to do with anything. Okay. So, many, many other things to go through — I should probably start to finish up. Okay, let me skip this. There are, no surprise, some statistical artifacts we found, having to do with their use of imputation. One thing that I should point out is that there's actually a very subtle kind of stochastic process here. In this kind of data set there are actually two stochastic processes going on. One of them is just the dynamics: if I'm a civilization and I'm here in my nine complexity characteristics, where am I likely to go in the future? That's the dynamics stochastic process. But — especially because, remember, this is longitudinal; that's a big power of this data set, rather than just being cross-sectional — there's another very, very important stochastic process, which is: when were you born? And, presumably, if you look very early, the probability of any given civilization being born then is very, very small, and it grows with time. These two stochastic processes come together to distort the kind of data sets you see. We found that, in fact, it looked like there were two Gaussians. Remember, PCA assumes a single Gaussian.
We thought, well, my god — if you look at the data, it looks like there are actually two Gaussians, and maybe it's a mixture model. But then, when we really dug into it, we saw that because of these two stochastic processes going on, you could get a pretty crude fit to it by just modeling the birth process plus a simple Markov chain for the dynamics. It's these kinds of issues — the stochastic processes going on here are very subtle. So where do we go from here? That's pretty much the end of what I've got to present. As I mentioned, some collaborators and I have been talking about applying this to other data sets, like microeconomic data and so on. Why is any of this relevant to this conference? Based upon what we experienced in analyzing Seshat, we've now taken these tools that have been developed in machine learning, which allow you to do a much more first-principled, much more powerful stochastic process analysis of a data set, where you can do things like detect whether there was a perturbation from outside the system. Another kind of thing that you can do is ask very loaded questions: is it a good model to say that India right now is just where Japan was 30 years ago? We can actually start to address those kinds of questions. You can actually apply these kinds of techniques that we're developing to macroeconomic dynamics.
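The two-process story above — a birth-time distribution plus post-birth dynamics — is easy to illustrate. A minimal sketch with invented parameters, not fit to Seshat: the pooled cross-section of polities born at different times is far more spread out than any single trajectory's diffusion, which is exactly the kind of structure that can masquerade as a two-Gaussian mixture if you ignore the birth process.

```python
import numpy as np

rng = np.random.default_rng(6)

# Two stochastic processes shape the cross-section: (1) when each polity is
# born (later births more likely), and (2) a drifting random walk afterward.
T = 100
n_polities = 200
# triangular(left=0, mode=T, right=T): birth probability grows with time
births = rng.triangular(0, T, T, size=n_polities)
ages = T - births

drift, noise = 0.08, 0.1
# Complexity at the final time slice: accumulated drift plus diffusion.
final = drift * ages + noise * np.sqrt(ages) * rng.normal(size=n_polities)

# The pooled cross-section mixes young and old polities, so its spread is
# far wider than the diffusion scale of any one trajectory.
print(final.std() > noise * np.sqrt(T))
```

Here the heterogeneity in birth times, not any genuine bimodality in the dynamics, is what inflates and distorts the pooled distribution — the same confound the talk describes.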
You can also apply it to climate dynamics. Basically, it's very agnostic modeling, fully Bayesian, where you don't put any interpretation in. And yes, every model is only an approximation, no model is exactly correct, but there is a lot of power in using some of these more recent techniques to do the analysis, rather than conventional delay-embedding time-series analysis, where you would need massive amounts of data to get anywhere.

Thanks, David, very interesting talk, spanning the whole arc of history. First of all, I was really glad to hear that you're thinking of applying this to ecosystem data as well, because our work in ecosystems is that they go through stages of growth and scale increase, then development and a focus on information, and I see that fitting very nicely. Which brings me back to those nine characteristics: none of them were natural-resource based, and that seems like a problem, unless it's just implicit that a city couldn't start unless it already had something like that. It would be nice to try to consider that.

Oh yeah, it's a big problem. There are many, many ways to go forward with this whole approach, and people should be joining in. That's one kind of thing that could come out of a meeting like this: have people with data sets in all kinds of fields related to sustainability throw them all together into one big data set, then fit stochastic processes to it and see what the hell comes out. I fully expect that, just like the hinge points, which we didn't anticipate, there will be things we don't anticipate about climate and human-behavior interactions. Almost guaranteed.
There are things that nobody in this room is thinking of that are actually very important, and such a data set would uncover them for you. Something I should emphasize about what's so powerful about these new techniques: ask anybody who's an expert right now in nonlinear time-series analysis based on delay embedding (people may be familiar with ARMA models, or autoregressive models, which are the linear version of this). To do any kind of reasonable fit in even just a nine-dimensional space, you would need, it depends, but on the order of hundreds of thousands of data points, because it amounts to fitting a nonlinear dynamics nonparametrically. These recent techniques instead amount to assuming you've got a Markovian dynamics that is allowed to vary across your space. In the Seshat data set, none of the time series contained more than several tens of data points. So especially when you're data-poor, or really anything other than hugely data-rich, these new techniques are extraordinarily powerful. They should be used in essentially all sciences, I would think, rather than, or in addition to, standard nonlinear time-series analysis. That's on top of the fact that they can tell you things like external perturbations, and you don't need imputations to deal with missing data.

Okay, just as a follow-up question to this one: all your analyses depend on this predetermined, pre-specified set of nine complexity characteristics, right? Can we imagine a society developing to a point, with such information-processing capacity, that you have many more complexity characteristics? Basically, can we even study or talk about the emergence of a new complexity characteristic?
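To give a feel for the data-hunger point, here is a minimal Python sketch on synthetic data (not Seshat): even for the linear AR(2) special case of delay-embedding methods, coefficient estimates from series of around 30 points, the length scale of the Seshat series, are far noisier than from long series; a fully nonparametric nonlinear fit would be hungrier still.

```python
import numpy as np

rng = np.random.default_rng(1)
PHI = np.array([0.6, 0.3])  # true AR(2) coefficients (stationary)

def simulate_ar2(n, sigma=1.0):
    x = np.zeros(n + 50)  # 50 burn-in steps, discarded below
    for t in range(2, len(x)):
        x[t] = PHI[0] * x[t - 1] + PHI[1] * x[t - 2] + sigma * rng.normal()
    return x[50:]

def fit_ar2(x):
    # Least-squares fit of x[t] = a*x[t-1] + b*x[t-2] + noise
    X = np.column_stack([x[1:-1], x[:-2]])
    coef, *_ = np.linalg.lstsq(X, x[2:], rcond=None)
    return coef

err = lambda n: np.abs(fit_ar2(simulate_ar2(n)) - PHI).sum()
err_short = np.mean([err(30) for _ in range(200)])   # Seshat-length series
err_long  = np.mean([err(3000) for _ in range(20)])  # data-rich regime
print(err_short, err_long)  # short series give much noisier estimates
```

Even in this best-case linear setting, tens of points leave the coefficients poorly pinned down; that is why a method that shares statistical strength through a structured Markovian model, rather than a generic delay embedding, matters so much in the data-poor regime.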
Or are we sure that these nine are it, that the list is just done?

So, I've done a lot of work in my day on machine learning for customers, business customers or engineering customers, like when I was at NASA. There are several mantras. One of them is: you never throw out data. Another is: you're not going to get very far without using domain experts, quote-unquote, people who really know what the things mean. Don't be full of yourself, you nerd scientist; these are people to exploit, they really know what they're talking about. Sometimes they're wrong, but you've really got to understand them. In essence, that's what was done here: it was domain experts who came up with those nine complexity characteristics. You could alternatively have just worked in the 51-dimensional space directly; they didn't want to do a PCA in that 51-dimensional space. But what you should probably be doing is not PCA; you should be doing what's called manifold learning. You could even imagine using what's called topological data analysis. If you were to do something like manifold learning to find the lower-dimensional manifold that everything reduces to, the important dimensions, and then look at the dynamics there, one could very easily imagine that certain things are emergent. You would need to be sure that whatever is emerging is at least somehow captured in the data set. There was nothing in these data sets that said: does this civilization have the internet or not?
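As a toy illustration of why PCA can mislead here, consider a hypothetical Python sketch with synthetic data (not the Seshat variables): points on a curved one-dimensional arc embedded in nine dimensions need two principal components to capture their variance, even though the intrinsic dimension is one. Manifold-learning methods such as Isomap are designed to recover that single underlying coordinate.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: a 1-D curve (half circle) embedded in 9-D, plus noise.
t = np.pi * rng.random(500)  # the single intrinsic coordinate
X = np.zeros((500, 9))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
X += 0.01 * rng.normal(size=X.shape)

# PCA via SVD of the centered data.
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
explained = s**2 / np.sum(s**2)

print(explained[:3])  # PC1 alone misses the arc; PC1 + PC2 capture it
```

PCA reports two significant dimensions because it can only fit flat subspaces; a manifold learner parameterizing the arc by its length would report one, which is the kind of collapse-of-dimensionality question at issue here.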
If you don't have that in your data set, you're not going to be able to see the onset of the internet. But nonetheless, you could very easily imagine things like these complexity characteristics, which are a weird kind of statistical hack, changing with time.

One could imagine asking: how sensitive are these results to methodological changes? If you change the imputation method, or use a different way of standardizing some variables, or exclude some variables rather than others. When you study these data sets you have to make tons of methodological choices. Do you have a sense of how robust the end results are to variations in these choices?

So, Peter used cross-validation, and I am a very firm believer in it; I would say that cross-validation really is the scientific method, quantified and made formal. He used it with things like the Atlantic Ocean, as I mentioned, so that is one version of testing exactly what you're saying. One of the advantages, the way we view it, of having these nine complexity characteristics is that if you throw out one of your 51 variables, you'll still have nine complexity characteristics. Also, the fact that PC1 was almost equally loaded on those nine complexity characteristics means that even if you got rid of one of them, PC1 would still be aiming in pretty much the same direction. That being said, no civilization had a complete history without any gaps in the data. Archaeological data is just very, very holey, so to speak, in a couple of senses, and so they did use these ugly things called linear imputations, very hideous, and they weren't aware of it, but that introduced statistical artifacts.
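The "Atlantic Ocean" style of cross-validation can be sketched in Python on hypothetical stand-in data (one latent factor driving nine standardized variables, not the real Seshat tables): fit the first principal component separately on each hemisphere and check that the two directions agree.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-in data: nine standardized characteristics driven
# by a single latent "social complexity" factor, plus noise.
n = 120
latent = rng.normal(size=n)
X = np.outer(latent, np.ones(9)) + 0.3 * rng.normal(size=(n, 9))
old_world = rng.random(n) < 0.5  # a split analogous to the Atlantic Ocean

def first_pc(data):
    centered = data - data.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[0]

alignment = abs(first_pc(X[old_world]) @ first_pc(X[~old_world]))
print(alignment)  # near 1.0: the PC1 direction transfers across the split
```

If the structure found on one side of the split did not reappear on the other, the single-factor story would fail the test; that is the cross-validation-as-scientific-method idea in miniature.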
You can see them right here. This was when we were trying to figure out why there were two Gaussian peaks, so we did a really deep dive into their data, which they had pre-processed with these imputations, a very ugly way of trying to fill in missing fields, and we noticed these streaks. We concluded that the streaks are not themselves directly relevant, but nonetheless they are there. One of the advantages, to keep on evangelizing, of using these new stochastic-process models is that they would be very robust against these kinds of things. It's Bayesian all the way through, so not only do I give you a stochastic process, I also tell you how confident you should be in the parameters of that process.

David, just quickly. I wish we didn't use the word complexity in any of this, but that's an aside. What you're finding is this peculiar collapse of dimensionality, right? And is that a sort of operational definition of a civilization, where all its various factors become correlated?

In essence, that's what would come out if they had used, or what we should use, or anybody who's got the resources (to be blunt, which I right now don't) to go after this should use: something like manifold learning, where one would see exactly what you're pointing to. You would see that all of the data really is not spread out. Recall that I said one of the big mysteries way back here was: how come there are all these gaps?
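The streak artifact from linear imputation is easy to reproduce; here is a Python sketch on synthetic data (not the actual Seshat tables): interpolated points are exactly collinear, so a gap shows up as a perfectly straight, noise-free run that real trajectories never have.

```python
import numpy as np

rng = np.random.default_rng(4)

t = np.arange(50.0)
x = np.cumsum(rng.normal(size=50))  # a noisy "complexity" trajectory
observed = np.ones(50, dtype=bool)
observed[15:35] = False             # a 20-step gap in the record

# Linear imputation: fill the gap by straight-line interpolation.
filled = x.copy()
filled[~observed] = np.interp(t[~observed], t[observed], x[observed])

# The imputed stretch has zero curvature and zero scatter; real data don't.
gap_curv  = np.abs(np.diff(filled[15:35], n=2)).max()
real_curv = np.abs(np.diff(x[:15], n=2)).max()
print(gap_curv, real_curv)  # ~0 vs O(1): the imputed run is a perfect streak
```

Projected into a low-dimensional view such as a PCA plane, those perfectly collinear runs appear as the visible streaks, and a model fit to the filled data will partly be fitting the imputation rule rather than any civilization's dynamics.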
Well, if you look at things in nine-dimensional space, there are presumably much larger regions; this is going to be some kind of weird sheet in that nine-dimensional space, and that is exactly reflective of the collapsing of all these different features down to only one underlying variable.

Here's another way of putting it. The physicist in me, and there is still a little bit of one, wanted to take this dynamics and, to be honest, try to model it as though it were actually a physics system, where you use what's called a Helmholtz decomposition to ask: can we view this as something like a gravitational field, with all societies being moved, with noise, according to that gravitational potential? It turns out it doesn't quite work, but that's essentially what it would amount to. What would be so cool about that kind of thing, which relates to what you were saying, is that we would then be discovering, this would be like Kepler, the laws that govern human-society dynamics purely from the data. It would then be some kind of Newton, further down the road, who would figure out the why. Though actually, I guess Tycho Brahe would be a better analogy than Kepler. We would be discovering, by doing exactly what you're saying, what the dynamical laws are that seem to govern humans, in a certain sense.

Okay, thank you very much, David. Thank you. I think we need to move on.
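The Helmholtz-decomposition test mentioned in the discussion can be made concrete with a small Python check on toy two-dimensional drift fields (hypothetical fields, not ones fitted from Seshat): a drift that comes from a potential, the "gravitational field" picture, must be curl-free, while a rotational drift is not.

```python
import numpy as np

# Toy 2-D drift fields on a grid. A field that is the gradient of a
# potential must have zero curl: dFy/dx - dFx/dy == 0 everywhere.
xs = np.linspace(-1.0, 1.0, 41)
X, Y = np.meshgrid(xs, xs, indexing="xy")
h = xs[1] - xs[0]

def mean_abs_curl(Fx, Fy):
    dFy_dx = np.gradient(Fy, h, axis=1)  # d/dx along the x-axis
    dFx_dy = np.gradient(Fx, h, axis=0)  # d/dy along the y-axis
    return np.abs(dFy_dx - dFx_dy).mean()

# Gradient field from the potential V(x, y) = x^2 + y^2:  F = -grad V.
curl_grad = mean_abs_curl(-2.0 * X, -2.0 * Y)
# Rotational field (no potential exists for it).
curl_rot = mean_abs_curl(-Y, X)

print(curl_grad, curl_rot)  # ~0 for the potential flow, ~2 for the rotation
```

Applied to a drift field estimated from trajectory data, a large rotational residual is exactly the signal that the "gravitational potential" picture doesn't quite work, which matches what the speaker reports finding.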