Thank you, Victor, for the introduction and for the invitation. I'm very happy to be here, and I guess it's great timing to present this paper on demographic change and development from crowdsourced genealogies in early modern Europe, right after the discussion we just had. I very much agree with all the points that were raised, but I think we can get pretty nice insights from genealogies, especially in periods without censuses. Okay, so in this paper I study demographic change and development using a novel crowdsourced genealogical data set to study populations in the past and in periods without censuses, especially in the 18th century. I do that with millions of observations, or sometimes hundreds of thousands depending on what I look at, from publicly available genealogies. Second, I do a very careful evaluation of sample selection, and I show that sample selection is somewhat limited, at least in the 18th and 19th centuries in Europe, which is a time when parish records are widely available for people to search for their ancestors. I do that by comparing the crowdsourced data with census data whenever available, and I show that there is a very similar evolution in both types of data. Finally, initially this was really a data paper, but I also wanted to document some stylized facts and to reconstruct things that we already know, just to validate the data. As an illustration of what can be done, I document some stylized facts. At the European level, I find large aggregate gains and large demographic changes. At a more disaggregated level, I find important changes in the 18th century, and I can document some somewhat novel stylized facts with this data. Okay, so let's go into the data, with an overview of where it comes from.
So this is individual-level data from genealogies, and basically it consists of all publicly available profiles on the website geni.com with geocoded place of birth and death and intergenerational links. The data was scraped by Kaplanis et al., computer scientists, in a paper published in Science in 2018, and it relies on individuals reconstituting their family tree by searching through scanned records of birth, marriage, and death, and publicly uploading the data online. You know, they think the data is private, but it's actually public, and then these guys, Kaplanis et al., just go and scrape everything, and they also did a careful matching in order not to have duplicates in the trees. So what do I do with this? First, I match it to current NUTS European regions, because the original geocoding was a huge mess. Importantly, I also match it to the town-level, time-varying Bairoch data on historical urbanization, simply because this is the best available proxy of development at a very granular level that is available comprehensively over time and space in Europe during that time period. This is an important step in order to match the data to other data sets, to look at the effect of some town-level or regional-level variable on demographic outcomes, which is something people could do with this data. Second, I generate a measure of completed fertility from vertical and horizontal lineages, and I will come back to that in a second. Finally, because some countries are overrepresented in the data, especially Scandinavian countries that have such great record sources, observations are reweighted to account for the overrepresentation of some countries.
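To make the reweighting step concrete, here is a minimal sketch. The talk does not say which weighting scheme is used, so this assumes simple post-stratification weights (a country's population share divided by its share of the sample); the function names, country codes, and numbers are all hypothetical.

```python
from collections import Counter


def country_weights(sample_countries, population_shares):
    """Weight per country = population share / sample share (an assumed scheme)."""
    counts = Counter(sample_countries)
    n = len(sample_countries)
    return {
        country: population_shares[country] / (count / n)
        for country, count in counts.items()
    }


def weighted_mean(values, countries, weights):
    """Mean of values where each observation carries its country's weight."""
    total_weight = sum(weights[c] for c in countries)
    weighted_sum = sum(v * weights[c] for v, c in zip(values, countries))
    return weighted_sum / total_weight


# Hypothetical toy sample: Sweden is overrepresented relative to its population.
countries = ["SE", "SE", "SE", "FR"]
fertility = [4.0, 4.0, 4.0, 6.0]
pop_shares = {"SE": 0.2, "FR": 0.8}  # made-up population shares

w = country_weights(countries, pop_shares)
unweighted = sum(fertility) / len(fertility)
reweighted = weighted_mean(fertility, countries, w)
print(unweighted, reweighted)
```

In this toy case the unweighted mean is pulled toward the overrepresented country, while the reweighted mean equals the population-share average of the country means.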
Okay, so it's great because we've already talked about that, but I will go a bit more into detail. The alternative to using the genealogies is lineage reconstruction, or family reconstitution, pioneered by Louis Henry in France and by Wrigley and Schofield in England. It's already been evoked here, but basically it consists of family reconstitutions in rural parishes from birth, marriage, and death records. They went to one specific parish and they would reconstitute the life histories, the life trajectories, of each individual that lived in that parish at some point in time. I should say here that I'm going to talk about the issues, but there are many great points about this that were raised by Lionel right before. Still, there are some issues. First, it is a very tedious thing to do: it is costly and very time consuming. It is imperfect because of the nature of the data. And finally, it is not scalable and it is not representative. Again, there are many great points with this type of data, but we should keep those issues in mind. So first, there is very poor handwriting in the records, and they're hard to decipher. We've already seen a parish record; here you can see some others from the 18th century. Basically it's messy, and it's hard to understand what's there. Users of genealogical websites will not necessarily do better than demographers, but they have an incentive, because they're looking at their own ancestors.
Second, records provide very imperfect information, and they need a lot of extra work to be cross-validated. We've already talked about that: names can change, age or date of birth is not always provided, and dates are rounded. So basically, if you see Jean Dupont in one parish record, then it's going to be very hard to match it to the birth or marriage record of that person. And, very importantly, there is selection and representativeness: only small rural villages can be studied, and therefore there is no spatial variation. That is an issue because if you want to look at the effect of some variable on demographic outcomes, on fertility or on life expectancy or whatever, having individual-level records is great and you can do a lot of stuff, but if you don't have any spatial variation you're going to be very limited. And finally, relatedly, migration is not taken into account, and that creates completeness issues. So essentially, genealogical data is crowdsourced lineage reconstruction. Just a quick summary of the sample. There are two samples that I'm going to be looking at. The first one, the main sample, consists of all individuals who were born or died in Europe in this data. There are roughly 10 million observations after 1400, among which about seven million are in the 18th and 19th centuries. I'm going to focus on the 18th and 19th centuries because these are the periods during which selection into the sample is limited. I'm not going to say there's no selection, but selection is limited. The second is the fertility sample, which consists of individuals in the main sample with a recorded fertility and for whom the genealogical tree has a fully recorded horizontal lineage. I'm going to give more details on that, but basically I'm going to impose a little trick in order to select those people for whom I think the observed fertility is their actual
level of fertility, or at least close to it. For these guys in the fertility sample, I have about 800,000 observations after 1400, among which 500,000 are in the 18th and 19th centuries. Okay, so going back to fertility, which I'm going to focus on a lot in this talk. An important issue is that fertility not only requires knowledge of the vertical lineage, the direct lineage, but it also requires knowledge of the horizontal lineage. When people look at their ancestors, most of them are actually interested only in their direct ancestors. They only look at the direct lineage, the vertical one, and they do not necessarily go along the horizontal branches. And if they don't go along the horizontal branches, to their great-great-cousins and so on and so forth, then you will just observe a fertility of one. Of course this will be biased, because this will not be the actual level of fertility of these people. So what we want is for the individuals looking at their ancestors to actually carefully record the horizontal branches of their lineage. I deal with this issue by defining the fertility sample, which is a sample for which there is good reason to believe the horizontal lineage of a person has been recorded. I impose one criterion on the data. Basically, I look at one individual i, and if at least one ancestor of i in the four preceding generations, excluding i, is recorded with a fertility that is strictly greater than one, that is, different from one, then I will just assume that for that individual the observed fertility will be his true level of fertility. So basically, if this couple is recorded with two children in the data, then I will assume that the individual who recorded the tree carefully recorded the horizontal branches, and therefore whatever is
the fertility level of individual i in the data, this will be his true fertility level. Okay, so this is a map showing the spatial distribution of places for which we have at least one observation in the data. This is not so great, because in France, for example, you have 36,000 towns and they're all very small, whereas if you go to Sweden, towns are much larger. But basically what you can get from that is that in southern Europe the data is not as good, and the further north you go, the better it gets. We'll see that when we look at sample selection too, which is what we do now. Okay, so the main thing you should remember is that selection into the sample is limited. I will show you that selection into the sample is limited based on observable measures: adult life expectancy, urbanization, and fertility are things that we can look at in this data. So I compare the crowdsourced data with representative data, again whenever available, for each country. I have 30 countries in the data, and I do that in a systematic manner. For life expectancy, I use the Human Mortality Database, mostly based on censuses, and compare it to the crowdsourced genealogies. For urbanization, I look at the Bairoch data and census data to compare them to the genealogies. And finally, for fertility, I look at the Coale and Watkins data. In this table I show the correlation for each region in Europe. I'm not showing it for 30 countries; I am aggregating the countries into regions: British Isles, Central Europe, Eastern Europe, France (which will be just one region, because I'm French), Northern Europe, and Southern Europe. Basically, this is showing, for each dimension, urbanization, adult life expectancy, and fertility, what the correlation is at the country level with the representative data. This is just showing correlation, but we're also
interested in levels, right? So there's a very high correlation for all these regions, but also in levels we find very similar results. Again, my goal here is not to say that there's no selection. Of course there's a lot of selection, and there are thousands of ways selection could operate, but it's actually somewhat limited. So here, at the level of Europe, this is looking at mortality, with the Human Mortality Database in pink and the crowdsourced data in blue. Okay, still at the level of Europe, this is the Bairoch data in pink again, and in blue this is showing the share of people that were born in a town that was coded as urban at the time of their birth. As you can see, there's a little overestimation of urbanization in the genealogies, but actually this is mostly due to the fact that Bairoch is underestimating urbanization. Bairoch is simply looking at whether towns were above or below 5,000 inhabitants, and basically he's missing a lot of towns that are close to 5,000 inhabitants, and so he's underestimating urbanization. When we look at census data, this is the urbanization rate we get, and it's actually very similar. So here this is in France, and looking at census data we get a very similar pattern over time; the census data is only available from 1793. Okay, and finally this is fertility in Europe, comparing the genealogies to the Coale and Watkins data. Here this is the marital fertility index in Coale and Watkins. It makes sense to compare fertility in the crowdsourced data to the marital fertility index because, in the crowdsourced genealogies, fertility is conditional on having at least one child, so we could guess that most of these people were married. Okay, and then we can do it at the sub-European level for each region: for British Isles, for Central Europe and Low Countries, for Eastern Europe, for
France, Northern Europe, Southern Europe. So here this is looking at adult life expectancy at age 30, then urbanization in all of these regions, and fertility in all of these regions. As you can see, in Southern Europe it's not so good, and in Eastern Europe too, but we can learn a lot from this data. Okay, so fertility is initially why I was looking at this, and so I'm going to do an even more careful evaluation of selection by looking at fertility. On top of census data from Coale and Watkins, I leverage two additional sources from family reconstitution in France and in England. You can see France in the left panel and England in the right panel. We've already seen the marital fertility index of Coale and Watkins, which is in pink, and the crowdsourced data, but then I add two different sources. The first one is complete family reconstitution in a limited number of villages, so this is the Louis Henry and Wrigley stuff. In France it will be complete family reconstitution in 40 different rural parishes, and in England this will be the family reconstitution of 26 parishes from Wrigley et al., and so this will be the dash-dotted line. Second, I also look at an extraction of aggregate statistics in a large number of towns, including cities, which is only available in England and is a more representative source. This comes from Wrigley and Schofield in England. And so, as you would expect from the Louis Henry and Wrigley data, which again is a great source, there is first a kind of underestimation of fertility, and it's somewhat noisy data, which we can guess is not necessarily a representative sample of France as a whole, because it's only small rural parishes. So this is what you get here, in France from Louis Henry and in England from Wrigley et al. But then when you look at the aggregate extractions from the 400 parishes in Wrigley and Schofield, you get a very
similar time evolution to the genealogies in England, which is reassuring.
Guillaume? Yes? Sorry, sorry. So Lee has a clarifying question, but his mic doesn't work, so he asked whether this is total or marital fertility that you're plotting on your graphs.
Oh yes. So in the genealogies this is completed fertility, so it is somewhat close to marital fertility or total fertility. It's a different measure, so it's conditional on...
It can't be close to both. I'm sorry, it can't be close to both marital fertility and total fertility, because lots of people don't get married.
Yes, I don't know if they got married.
I mean, marital fertility and total fertility have quite different denominators.
Yep. No, so I don't know if they got married or not in the genealogies. I only look at the level of fertility, and it's conditional on having at least one child. So yeah, it's a very different type of measure, but in any case it looks similar to marital fertility when you compare it to marital fertility. Somewhat similar, yes.
Okay, so then I document some stylized facts with this data. I do that at the European level and also at a more disaggregated level, to capture some changes not necessarily seen in aggregate-level statistics. Basically I ask: what is the evolution of human mobility, fertility, and adult mortality over the long run in Europe, and are known facts found in the data? Okay, so at the level of Europe, the first stylized fact is that in the early 19th century there was a rural flight into urban centers. This was already known, but we can document it in a pretty nice way. Here this is looking at the log distance from birth to death in the data. I'm plotting the median log distance from birth to death over time, and you can see a big jump starting from the 1820s. This is for Europe as a whole; this is the mean, and this is the 75th percentile. So this is basically capturing the rural flight and urbanization. The second stylized fact, also already known, is that
in the late 19th century there was an important decline in fertility in Europe. So this is fertility in Europe as a whole, and basically it went from about five children per woman to three children in the early 20th century. The third stylized fact is that in the mid 19th century Europeans experienced an unprecedented increase in adult life expectancy. Again, this was already known, but here this is showing life expectancy at 30 in the data, and we can see the effect of the two world wars. Probably there's a bit of an overrepresentation of persons that died in World War II in this data, but this is just confirming things we already knew. Okay, then at a more disaggregated level, and here I'm going to focus on fertility especially. The fourth stylized fact is that the decline in fertility in France took place more than a century before the rest of Europe. This is already known, but I find the decline in fertility slightly earlier than previously thought. I find that in France the decline in fertility took hold in the 1760s, which is 15 years before what was previously estimated by Louis Henry, who finds that the decline in fertility started in 1776. So this is before the French Revolution and more than 100 years before the rest of Europe. Here you can see in this graph the number of children over time in the genealogies in France and in England and Wales, and you can see a very striking difference, with a very early decline in fertility in France. This has been a puzzle for a long time. In a different paper of mine, which will be one of my job market papers next year, I argue that de-Christianization played a major role in this decline in fertility. Okay, so the fifth stylized fact is that in Europe there was a process of cultural diffusion of the decline in fertility from France. Again, this was a known fact, but here I'm able to observe the entire process of diffusion. So Spolaore and Wacziarg, in a recent paper
that's forthcoming in the Economic Journal, look at the effect of linguistic distance from France on fertility across regions in Europe. They argue that places that are more culturally distant experienced a later demographic transition, and places that are culturally close to France experienced an earlier demographic transition. The best-known example is Belgium, in which the French-speaking parts experienced a much earlier decline in fertility than the Flemish-speaking parts. So here I'm just replicating Spolaore and Wacziarg with my individual-level data. This is the same specification, with decade fixed effects and country fixed effects, just showing the effect of linguistic distance from France on fertility across regions in the data. In the paper they show the process of diffusion after the 1830s, because this is when the data is available; the Coale and Watkins data only starts in the 1840s. So basically they show that linguistic distance has a positive effect on fertility, but they can only show it starting from the 1830s. If you can see my pointer, they can only show this part, and then once everyone has adopted the norm of lower fertility, the effect of linguistic distance goes to zero. Here I'm able to show the entire process of diffusion. I'm able to show that before the decline in fertility in France started, again in the 1760s, there is no effect, or a slightly negative effect, of linguistic distance from French on fertility in Europe. Then the effect of linguistic distance increases over time, it reaches a maximum in the 1850s to 1870s, and then it declines. So I'm able to observe the entire process of diffusion. Okay, and finally, stylized fact number six, and this one is completely new: there was a weaker intergenerational persistence of fertility as early as the 18th century in Europe. Okay, so basically this figure is using the fact that we have intergenerational matches, and
so this is looking at the coefficient on the log fertility of parents in a regression of the log fertility of individuals in the data. This is the intergenerational coefficient. In this regression I have decade fixed effects, accounting for the time variation, and I have country-by-decade fixed effects, accounting for all of the time variation for each different country. What I find is very surprising: I find a weaker intergenerational persistence in Europe after the 1770s, at a time when aggregate fertility did not decline. In the second time period, starting from the 1850s, there is a decline in fertility taking place. The second decline in fertility takes place at this time, and so it's not surprising that there's a decline in intergenerational persistence, because there's a change in the education or income fertility gradient, and there are also a lot of distributional changes taking place at the time of the decline in fertility. Tom Vogl has a paper showing exactly that: when there's an aggregate-level decline, intergenerational persistence breaks down. However, what's very surprising is that very early, in the 18th century, starting from the 1770s or 1750s, there is a weaker intergenerational persistence, going from 0.45 to roughly 0.25. This is very surprising because here the time variation is accounted for with decade fixed effects in the regression, so this is not due to any time changes. One hypothesis, and I'm not going to go further than hypothesizing, is that this may be because of distributional changes in preferences or in income following the Enlightenment or the Industrial Revolution. But again, in this paper I'm not going to have any sort of explanation for that. Okay, so these were some of the stylized facts that we can document using this data, but you can do much more. Basically, this is a novel historical individual-level data set, cleaned and ready to use, or at
least it will be whenever, if ever, I publish this paper, hopefully soon. I do a careful country-by-country evaluation of sample selection, which is very important, comparing the data to other sources: to the Coale and Watkins Princeton project data and to family reconstitutions. And finally, I document some stylized facts using the genealogies. So I guess you can scan this little thing. Thank you.
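The fertility-sample rule described in the talk (keep an individual with a recorded fertility only if at least one ancestor in the four preceding generations is recorded with strictly more than one child) can be sketched as follows. This is a minimal illustration under assumed data structures, not the paper's code: the `parents` and `n_children` dictionaries and the toy identifiers are hypothetical.

```python
def ancestors_within(person_id, parents, max_generations=4):
    """Collect ancestors up to max_generations above person_id, excluding person_id."""
    found = set()
    frontier = {person_id}
    for _ in range(max_generations):
        next_frontier = set()
        for pid in frontier:
            for parent in parents.get(pid, ()):
                if parent not in found:
                    found.add(parent)
                    next_frontier.add(parent)
        frontier = next_frontier
        if not frontier:
            break
    return found


def in_fertility_sample(person_id, parents, n_children, max_generations=4):
    """True if the person has a recorded fertility and some ancestor within
    max_generations is recorded with strictly more than one child."""
    if n_children.get(person_id) is None:
        return False
    return any(
        (n_children.get(a) or 0) > 1
        for a in ancestors_within(person_id, parents, max_generations)
    )


# Hypothetical toy tree: child -> list of recorded parents.
parents = {"child": ["mother"], "mother": ["grandma"], "lonely": ["lonely_parent"]}
# Hypothetical recorded number of children per person.
n_children = {"child": 3, "mother": 1, "grandma": 2, "lonely": 2, "lonely_parent": 1}

print(in_fertility_sample("child", parents, n_children))   # grandma has 2 recorded children
print(in_fertility_sample("lonely", parents, n_children))  # only a fertility-1 ancestor
```

The idea is that an ancestor recorded with more than one child signals that whoever built the tree also followed horizontal branches, so the individual's observed fertility is taken as the true one.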