Okay, we're live. Give me just a second to wait on the system and get everybody moved over into our session. I'll wait for the number to go up and make sure we've got everyone. I think we do now. All right, great. So, marching onward. Thank you, everyone, for being with us for the fourth talk of the day, the fourth of five. I have the distinct pleasure of introducing Nicola Bertoldi, who has just started a postdoc at UQAM with Christophe Malaterre, in the CIRST group there, and who also did some of this work, I know, at the IHPST in Paris. So, without further ado, let me turn it over. Nicola will be talking with us today on "From the Archive to a Computational Conceptual Map: A Distant Reading of Biometrika". So please, Nicola.
Thank you, Charles, and thank you, Luca, for organizing this, and thank you all for taking the time to listen to me. Unfortunately, I fear that my talk will not be as marvelous as some of those that preceded mine, because I'm going to present a very inchoate idea: the research project I'm currently working on here in Montreal under the supervision of Christophe Malaterre. This project deals with the analysis of the archive of an important scholarly journal, Biometrika, which was founded in order to foster the study and, more generally, the organization of a new field of knowledge: more precisely, the field of biometry, the application of statistical methods to the study of different biological phenomena, with a keen eye on variation. In this talk I'm going to present a brief outline of my research project, and then I'm going to give you some background as to why Biometrika matters, in my opinion.
As for the outline of the project, as reflected in the title of my talk, my idea, which sprang from the PhD thesis that I wrote and defended in Paris, is to study a particular scientific community, the community of the biometricians, by taking an approach that could be described as a distant reading of Biometrika and of its archive. I took this formula from Franco Moretti, and by distant reading I mean the search for hidden structures within a corpus, in this case the archive of Biometrika, which could be used not only to understand how the intellectual field of biometry was structured, but also to understand how the community that was active around this journal was itself structured. Biometrika was founded in 1901, and for my research project I am working on its archive from its inception until 2013. It's a relatively small corpus, which encompasses 100 volumes, 296 issues, and a sum total of 7,678 articles. However, it's a corpus that can lead to some very interesting avenues of research, because it's relatively well organized, annotated, and OCRed, and it's ready to be explored. Also, its size notwithstanding, it's a corpus that encompasses a very large range of subjects and research topics, which mirrors the way in which the community of the biometricians evolved and transformed into what we could call modern statistics, modern applied statistics. But what does this corpus look like? At the moment, I'm working on metadata that I was able to acquire, through my supervisor, from the JSTOR website. And as you can see, the metadata I'm working on take the form of XML files with a sort of tree-like structure.
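Editorial sketch: the slides and the speaker's own code are not part of this transcript, so the following is only a minimal illustration of the kind of processing such metadata invites, namely turning one tree-structured XML record into one row of a table. The tag names (`article-meta`, `article-title`, `contrib`, and so on) follow JATS-style conventions and are assumptions, as are the invented sample record and the helper names `text_or_none` and `record_to_row`.

```python
import xml.etree.ElementTree as ET

# Invented sample record. The tag names are JATS-style assumptions,
# not the schema shown on the speaker's slides.
SAMPLE_XML = """
<article>
  <front>
    <journal-meta>
      <journal-title>Biometrika</journal-title>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>An Example Title</article-title>
      </title-group>
      <contrib-group>
        <contrib><string-name><given-names>A.</given-names><surname>Author</surname></string-name></contrib>
        <contrib><string-name><given-names>B.</given-names><surname>Writer</surname></string-name></contrib>
      </contrib-group>
      <pub-date><year>1901</year></pub-date>
      <volume>1</volume>
      <issue>1</issue>
    </article-meta>
  </front>
</article>
"""


def text_or_none(node, path):
    """Return the stripped text found at `path` under `node`, or None if absent."""
    found = node.find(path)
    if found is None or found.text is None:
        return None
    return found.text.strip()


def record_to_row(xml_string):
    """Flatten one XML metadata record into a dict (one future table row)."""
    root = ET.fromstring(xml_string)
    meta = root.find(".//article-meta")
    if meta is None:
        # Fall back to the whole tree if the assumed block is missing.
        meta = root

    # Collect one "Given Surname" string per contributor.
    authors = []
    for contrib in meta.iterfind(".//contrib"):
        given = text_or_none(contrib, ".//given-names")
        surname = text_or_none(contrib, ".//surname")
        authors.append(" ".join(part for part in (given, surname) if part))

    return {
        "journal": text_or_none(root, ".//journal-title"),
        "title": text_or_none(meta, ".//article-title"),
        "authors": authors,
        "year": text_or_none(meta, ".//pub-date/year"),
        "volume": text_or_none(meta, "volume"),
        "issue": text_or_none(meta, "issue"),
    }


rows = [record_to_row(SAMPLE_XML)]
print(rows[0])
```

A list of such rows can then be handed to a data-frame library, for example `pandas.DataFrame(rows)` if pandas is installed; the sketch itself uses only the standard library.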
Within this structure, I'm looking at the different blocks that make up those trees in order to extract relevant information with respect to the journal or with respect to particular articles. For instance, I'm looking at titles, at authors, et cetera. At the moment, I'm merely trying to build a data frame, with the aim of transforming this archive into a structure that could lend itself to further analysis. To take an even closer look, here you can see how the metadata are organized. And here is the code that I'm writing in order to generate the data frame. Unfortunately, as I was saying at the beginning of the talk, it's a very inchoate project, and I'm just at the first stage: I'm trying to organize the information that I'm extracting from the metadata. And here are some other parts of the code. I'm not a professional programmer, so this is a bit of a challenge for me. I'd say that trying to write code to store and organize the information in a data frame is giving me the opportunity to reflect on the way in which I approach the texts as a researcher. So here's some more detail. Those are the kinds of items of information that I'm interested in at the moment, and I'm going to develop the code further as my dive into the archive progresses.
So, to go on to the second part: why Biometrika? As I was saying previously, the history of this journal is tightly related to a major event in the history of the life sciences, more precisely to the controversy between the Mendelians and the biometricians. The Mendelians were biologists who, at the beginning of the 20th century, rallied behind the experimental geneticist William Bateson, who had concluded that the results of Gregor Mendel's experiments on the hybridization of different variants of a given character supported the hypothesis of discontinuous variation.
The biometricians, on the contrary, were largely Darwinian, and they aimed to show that the application of the modern methods of statistics to the study of variation could vindicate Darwin's insights into the genesis of variations and into the process of evolution by natural selection as the accumulation of small, continuous variations. This controversy was officially settled, let's say, in 1918, when R.A. Fisher published a seminal paper, "The Correlation between Relatives on the Supposition of Mendelian Inheritance", which proved the consistency of Mendelism with Darwin's hypothesis of natural selection. However, the seeds of the reconciliation between those two opposing views were already present in earlier works and even in Mendel's original paper itself. That's why this controversy has been a source of bewilderment for scientists, historians, and philosophers of science. And it's precisely for this reason that David Hull referred to this controversy as an inexplicable embarrassment. Biometrika played a fundamental role in this controversy. It is precisely against the backdrop of the heated debates between Karl Pearson, Raphael Weldon, and Francis Galton on the one hand and William Bateson on the other that the journal was founded in 1901, with a twofold objective. On the one hand, the journal's founders, who were precisely the foremost biometricians of the time, aimed to foster the collection and interpretation of statistical data in various fields of biology. On the other hand, they held a theoretical and perhaps even ideological commitment to statistics as a theoretical tool capable of settling scientific controversies such as the one around the causal power of natural selection.
As Karl Pearson's son and biographer noted, the main aim of the biometricians was to transform statistics into a branch of applied mathematics and to apply this new insight to the study of biology, in particular to the study of biological variation, in order to build a real science of variation and of evolution. This view of the aims of biometry was indeed consistent with Karl Pearson's broader epistemological and ontological views. He regarded all physiological hypotheses on heredity as subsidiary to statistical laws and biometric models. He considered the former to be completely atheoretical in nature. This view of modeling squared well with his personal epistemological and metaphysical convictions, which could be broadly labelled as a form of scientific phenomenalism. In The Grammar of Science, one of his most important works, Pearson argues that the only true objective facts are phenomena, which are the things constructed by the human faculties of perception and thought. More precisely, Pearson asserts that what we call external objects are nothing but constructs that are comprised of two kinds of sense impressions. On the one hand, there are immediate sense impressions, which the body receives through the senses. On the other hand, there are the effects of past sense impressions, which are stored in our memory and which are added to present impressions through a process of physical association in order to produce a full-blown sensory experience. Such impressions are, properly speaking, the only objective constituents of reality. This is why, according to Pearson, the reality of a thing depends upon the possibility of its occurring in whole or part as a group of immediate sense impressions. Therefore, Pearson regarded thought as the process through which objects are constructed in the mind, a process which is ultimately elicited by immediate sense impressions, yet proceeds by automatically associating one memory with the following one.
In other terms, according to Pearson, the mind proceeds from the direct, physical association of memory to the indirect, reflective, mental association of notions, i.e. from percepts to concepts. It follows that, just as objects, as percepts, only exist to the extent to which they can be reduced wholly or partially to groups of immediate sense impressions, a concept can be deemed scientifically valid only if it is self-consistent and deducible from the perceptions of the normal human being. Furthermore, just as the universal validity of perceptions relies on the homogeneity of the perceptive faculties of all normally constituted human beings, all valid conceptual inferences are those which could, although not necessarily would, be drawn by every logically trained normal mind if it were in possession of the conceptions upon which the inference has been based. This is why, according to Pearson, there could never be any universally valid knowledge without the existence of a canon of normal perception and a canon of legitimate inference, which shall ensure that the outside world of phenomena, the processes of association and logical inference, as well as the inner world of stored impressions and conceptions, must be practically the same for all normal human beings. It is precisely in those respects that Pearson's views fit perfectly well in the category of scientific phenomenalism. On the one hand, Pearson contended that the only objectively existing things, and thus the only subject matter of knowledge, are phenomena, which are constructs from sense impressions. On the other hand, he was also convinced that the objective existence of such phenomena is only ensured by the canonical rules of perception, observation, experimentation, and reasoning that constitute precisely the grammar of science.
Therefore, the question arises as to what extent and in what way those philosophical and epistemological commitments had an impact not only on Pearson's own research, but on the wider community of biometricians. It is precisely in order to answer a question like that that I took it upon myself to develop this project and to look into Biometrika's archive, in order to try to build a sort of conceptual map on the basis of the information that I will be able to extract from the data I'm gathering. But what, then, is a conceptual map? I'm using this concept in the same way as James Griesemer defined it in his works on the rational reconstruction of scientific theories, and in particular on the rational reconstruction of evolutionary biology and population genetics. Griesemer has put forth an approach that he deemed to be pragmatic, the aim of which is precisely to identify conceptual charts or conceptual maps, understood as connections between conceptual nodes, that underpin the way in which scientists present their own theories as possible definitions of theoretical structures, definitions that aim to specify the peculiarities of the particular kinds of systems of objects to which the theories themselves are ultimately related. In other words, the function of those definitions that are underpinned by conceptual maps is to determine what might be called the empirical domain associated with the theory that the philosopher aims to rationally reconstruct. This empirical domain is, in turn, tightly related to a conceptual domain, which consists of the principles, laws, concepts, models, and other elements that the theory in question relies on in order to explain the way in which the objects that make up this empirical domain evolve and change in time.
In conclusion, one of the main objectives of my project consists in trying to bridge the gap between the empirical domain of biometry and the conceptual domain of this same science, by looking at the way in which those two aspects of biometry were articulated by the members of the scientific community of the biometricians. In order to do so, I intend to analyze the archive of Biometrika in order to extract from it a conceptual map that could be imagined to underpin the whole presentation of biometry, as it can be understood by reading Biometrika both distantly and closely. Before leaving time for questions, I wanted to thank Philippe Huneman, who supervised my PhD thesis in Paris, and Christophe Malaterre, who is my current supervisor and who helped me get the funding and the resources necessary for the development of my project. And I also wanted to thank you all for your time.
Fantastic, thanks so much. Some of you in the audience may or may not know that I have also worked on this exact same period of the history of biology, so I could talk about this for a very long time. I'll start with a question while I wait for questions to arrive from the chat. This is a hard question, I know, but it's a kind of broad worry that we might have about doing digital-humanities-type work on this period. Actually, I'm very interested to see that we have a couple of talks, yours and one coming up tomorrow or Thursday, where people are applying these methods to older data, to more difficult data.
Of course, it's lovely: we've seen in the talks so far in the conference the kinds of processing you can do when you have very high-quality inbound data. But I know it's much harder to extract things like citation networks and whatnot from these much older sources. Obviously, I anticipate that you're going to be thinking about topic modeling, but I'm wondering, just in general, how you are thinking about surmounting that problem of dealing with an old data set, and how that is framing your thinking about the project. I think that's a really interesting challenge for those of us who want to look beyond or behind the last 20 years or so, since articles have been born digital. So I just wonder what your thoughts on that problem of historical data might look like.
Well, I could say that there are two kinds of barriers, or more precisely two kinds of challenges, that this kind of data poses. There is the technological, practical challenge, because it's necessary to have the time, the technology, and the manpower to process those data, in particular when the documents in question are old books or old articles. But there is also another challenge that is particularly interesting, and it's something I came across while working on the metadata from Biometrika: the problem of structures, in particular with respect to the way in which data are catalogued and gathered. For instance, I realized that the structure of the metadata for Biometrika is not really consistent across time, which forced me to look a little bit closer at the articles I was trying to extract information from.
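Editorial sketch: one simple way to make this kind of structural inconsistency visible is to list which tag paths occur in the records of each period and compare the sets. The code below is a minimal illustration under stated assumptions, not the speaker's method: the two records, their tag names, the period labels, and the helper names `element_paths` and `paths_by_period` are all invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def element_paths(root):
    """Return the set of slash-separated tag paths that occur in one record."""
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, f"{path}/{child.tag}"))
    return paths


def paths_by_period(records):
    """Map each period label to the set of tag paths seen in that period."""
    seen = defaultdict(set)
    for period, xml_string in records:
        seen[period] |= element_paths(ET.fromstring(xml_string))
    return dict(seen)


# Two invented records whose author markup differs (hypothetical tag names).
RECORDS = [
    ("1900s", "<article><title>T1</title><author>A</author></article>"),
    ("2000s", "<article><title>T2</title>"
              "<contrib><surname>B</surname></contrib></article>"),
]

by_period = paths_by_period(RECORDS)

# Paths that appear in one period but not in the other flag a schema change.
only_early = by_period["1900s"] - by_period["2000s"]
only_late = by_period["2000s"] - by_period["1900s"]
print(sorted(only_early))
print(sorted(only_late))
```

Paths that show up on only one side of the comparison point to the places where an extraction script needs a fallback, or where a closer look at the underlying articles is warranted.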
So I'd say that the first challenge is the more obvious, but the other one should not be overlooked.
Sure, yeah, that's a nice point. Thanks. A question is coming in from Eugenio Petrovich: could you tell us more about the conceptual maps? Are you planning to generate them by quantitative or digital methods, or by more classic close-reading methods, or maybe a combination of the two? How are you seeing that process?
Well, I'm a sort of neophyte in the field of digital humanities, and my intuition is that both approaches, distant reading and close reading, are necessary, and that they constitute two dialectical opposites that are both needed in order to really understand the dynamics and the structures behind the documents. At this stage of my research, I plan to start with some, let's say, basic topic modeling on the archive as a whole, from 1901 up to 2013, maybe up to later issues. One of the problems I'm anticipating is that the topics I would be able to identify risk being not very informative as to the real content of the documents and as to the real structure of the conceptual map that I want to extract from them. For instance, I'd like to find a way to identify different kinds of concepts that are more or less closely related to the empirical domain I was referring to. In particular, I'd like to discriminate between mathematical models, representations of phenomena, and representations of data as a sort of intermediate step between phenomena and models, in order to look into the way in which biometricians conceived of the relation between those three macro-areas of a possible conceptual domain of biometry. By doing so, I'd also like to test, let's say, a null hypothesis, which consists in positing that Pearson wielded an important influence on the development of biometry, and therefore that Pearson's epistemological and philosophical convictions should be reflected in the presentation and the conceptual map of biometry that I anticipate I will be able to extract from Biometrika.
Great, thanks. I'll help myself to a more historical question, which builds on something you just said. I wonder how you're thinking about... sorry, I'm coming up with this probably too quickly on the fly. This is related to these questions about the long scope of the history here. Another thing that I think probably happens over this period is that not just the content of the articles changes, that is to say, the kinds of science that get done in a journal like this, but even the social nature of what publishing in a journal is, what it means to be printing an article. You're starting not that far after the birth of the contemporary journal system and then running into today's hyper-digitized, h-index-powered world: a very different kind not just of scientific understanding but of understanding of what it means to be publishing in a journal, of what your editor does. Briefly, I know you know this, but for people who don't: there are notes in the archives showing that people would send papers to Pearson for publication in Biometrika, and he would say, "This is great. I'm absolutely happy to print this. I'm going to rewrite it for you, though, before we put it in the journal." And you're like, wait, scientific publication did not mean exactly what scientific publication means today. So I wonder.
Obviously, a lot of close reading has to happen behind that, but I think it'll be interesting to see how you can trace that, whether that's a signal that you can see emerging out the other side of the data in the project. And that could be applicable not just to the Biometrika case but to questions about scientific publishing more broadly. I think it's an interesting point.
Are you wondering about the possibility of identifying a sort of personal style, rhetorical style and conceptual style, across papers?
That might be one way that it would show up, but I guess what I mean is that that's one facet, of the many I can think of, that might be visible in terms of a change in what it meant to be writing and publishing a journal article over the scope of your project. That could be another kind of knowledge, another layer of knowledge, that an analysis like yours could produce.
Yeah, and Biometrika is very interesting from this point of view because we could say that, since its inception, it was a militant journal. Pearson was wary of Bateson's influence, and he feared that he would not be able to publish as he wanted in other journals. Therefore he set up this venture, which was not only a publication but also a real act of affirmation of an idea of science.
Yeah, that's great. I think it'll be very interesting, and I'm really excited to see where the project goes. I don't see any further questions. I'm going to do that thing that I do where I stall for a second to give the tape delay a chance to catch up on the broadcast. Again, my apologies, I am not used to working off of live; that's a bit of a weird feeling. But failing that, we're very near to time in any event, so why don't I just thank you very much for this introduction to your project. I'm really excited to see how it unfolds, as you already know. So thanks a lot, and with that, we'll be back in a little more than 10 minutes for the last talk of the day. Let me take advantage of this brief moment: very soon now, I'm going to go update the conference website to post the link to the Zoom that we'll all be headed for at 7:15 p.m. European time. I forgot to look up the translation into other time zones, but I'll post that Zoom link very soon, so keep your eyes open for that. I'll see you guys back here in just a few minutes. Thanks again.