So we are live. Wait just a second for the settings to move around. And everybody is slowly coming across. Just another couple of moments while people move in from the other session tab. And that's probably long enough. So yes, hello and welcome to the final talk of the day. It's my distinct pleasure to introduce to you the interdisciplinary team of Moti Mizrahi from Florida Tech and Michael Dickinson from the University of Illinois. They are going to be walking us through philosophical reasoning about science, a quantitative digital study. So I'm really excited to see what you guys have for us. Take it away.

Thank you very much, Charles, and Luca as well, for organizing this great conference. I've been amazed by the work that people are doing and what I've seen so far. And I'm ashamed to say that we don't have fancy visualizations for you, so forgive us for that. But we are definitely happy that Charles and Luca have organized this kind of conference, because in philosophy we've had this kind of experimental turn at the beginning of the millennium. So maybe it's time for a digital turn, right? And the use of more computational and corpus-based methods in philosophy. I definitely think that that's long overdue.

So with that in mind, we're going to be talking about philosophical reasoning about science. It's kind of a metaphilosophy of science project. And we started thinking about it because if you open any introductory book on philosophy of science, like Samir Okasha's Philosophy of Science, you'll see philosophers of science saying that induction is at the foundation of science, right? Okasha even points out that for most philosophers it's obvious that science relies heavily on induction; you don't even need to argue for it. It's that obvious. If you look at the Stanford Encyclopedia of Philosophy as well, in several entries about philosophy of science, about scientific realism, about the problem of induction like this one, you will see the same kind of sentiment being expressed: that we rely on inductive reasoning not just in everyday life but in science as well, and that it's at the foundation of the scientific method.

In a related study that I did before this one, I looked at the way in which scientists talk about confirmation and hypothesis testing in scientific journals. That was also a corpus-based study, and the findings suggested that, yes, there's mostly an emphasis on this kind of inductive talk when they talk about confirmation and hypothesis testing, but interestingly, it's not in all the sciences: it's mostly in the life and social sciences, not so much in the physical sciences. But that's a different project. Today we would like to focus on philosophy of science itself and ask whether inductive inference is as much at the foundation of philosophy of science as it is supposed to be at the foundation of science. So do those who study science, the philosophers of science in particular, rely on inductive reasoning as much as the people they study supposedly do? These are the questions that we were interested in.

The corpus that we have is from JSTOR, and in particular it consists of articles from the following journals: the British Journal for the Philosophy of Science, which is, of course, a flagship journal in philosophy of science, and we also have the other flagship journal, Philosophy of Science itself.
And in addition to that, we have these journals as well, all of which publish articles in philosophy of science, some of which also publish in history and philosophy of science, and some in the history of philosophy of science. So it's a broad spectrum of what the discipline of philosophy of science looks like, at least as far as journal publications are concerned.

Now, what's nice about studying reasoning and arguments in a corpus is that in philosophy you have a fairly established way of doing that. Namely, if you look at any logic textbook, or any introduction to logic or critical thinking or reasoning, these textbooks will tell you that the way to find arguments in text is to look for indicator words or markers of arguments, right? We heard some of that today from one of the keynote speakers, about markers of arguments in texts. So that's a fairly established way of doing it as far as logic, critical thinking, and argumentation are concerned. These are just two examples from logic textbooks, where the authors talk about indicator words like therefore, since, because. If you see these words in a text, you know that, probably, there's an argument being made there. Of course, this is not a foolproof method of finding arguments. And similarly, in this introduction to informal logic, the authors say that the word therefore is a conclusion marker or conclusion indicator, and then you have reason markers or premise indicators, words like since and so on.

So if that's the way to find arguments in any text, then of course that's the way to find arguments in journal articles as well. And that's exactly what we did. We took these indicator words and divided them into types of arguments. For abductive arguments, or explanatory arguments, or inferences to the best explanation, we have indicators like best explains, the best explanation for, and so on. For deductive arguments, we have words like absolutely, certainly, definitely, and so on. And for inductive arguments, we have indicators like likely, probably, and so on. And here you have some examples from philosophical texts of how these indicators are used in the context of an argument that's being made.

So we took these indicators for argument types and we paired them with general indicators for arguments. The argument markers that we saw before were words like therefore, hence, it follows, we can infer that, and so forth. When you combine an indicator for an argument type (abductive, deductive, or inductive) with an argument indicator (words like therefore and hence), you get indicator pairs that will identify in a corpus not only that there's an argument being made, but also what kind of argument it is: deductive, inductive, or abductive. This combination gives you 36 indicator pairs per argument type: 36 for abductive arguments, 36 for deductive arguments, and 36 for inductive arguments; a small sketch of how such pairs can be enumerated follows below. And the other thing we did is to conduct these searches in our corpus with different allowed distances between those indicators. So in one kind of search, we allowed three words between an argument type indicator and an argument indicator. For example, if we're searching for therefore and best explains, you could have up to three words between these two indicators.
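To make the pairing concrete, here is a minimal R sketch of how such indicator pairs might be enumerated. The indicator lists shown are illustrative stand-ins (six of each kind, yielding 36 pairs), not the authors' actual lists.

```r
# Illustrative indicator lists; stand-ins, not the study's full lists
argument_markers  <- c("therefore", "hence", "it follows that",
                       "we can infer that", "thus", "consequently")
inductive_markers <- c("likely", "probably", "plausibly",
                       "in all likelihood", "chances are", "probable")

# Cross every argument marker with every type marker: 6 x 6 = 36 pairs
pairs <- expand.grid(argument = argument_markers,
                     type     = inductive_markers,
                     stringsAsFactors = FALSE)
nrow(pairs)  # 36
```

The same crossing would be repeated with the deductive and abductive indicator lists to get 36 pairs per argument type.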
And you're going to get a hit if there are three or fewer words between these two indicators. We did the same thing with six words and ten words between the two indicators. This method is something I have used already in another article, published in Metaphilosophy, where with my co-author Zoe Ashton we looked at this idea that philosophy is a priori, a discipline that requires abstract thought that you can do from the armchair. We used a similar methodology to test whether, if philosophy is indeed an a priori discipline, we should find more deductive arguments being made in philosophical journals than inductive or abductive arguments. So this method has been used before, but for this particular study we have scaled it up significantly. And to explain that further, I'm going to turn it over to my colleague Mike, who has really been doing all the hard work. He's going to explain a little more about our methodology.

Hi everybody, I just want to say thanks for listening today. I am formerly a data librarian, and I worked with Moti at Florida Tech before leaving for the University of Illinois. That's how this collaboration came about. And without further ado, I'll tell you a little bit about our text mining methods. The main tool we used is the R language, in conjunction with RStudio, a graphical user interface that makes R easy to use. We also used a sort of language within a language known as regular expressions. Regular expressions allow us to specify really interesting and diverse patterns when searching through text. And finally, we did a little bit of work in the Windows command prompt. I also want to mention the main R packages we used, which are stringr, dplyr, and readtext. All of these are pretty well-known packages and pretty straightforward to use once you start learning.

Before I go into the methods, I'd like to say just a little bit about the documents in our corpus, if you could advance the slide. We're working with a pretty large corpus of philosophical articles and book chapters: approximately 435,000 full-text articles and chapters taken from JSTOR. With each full-text article comes an XML file containing the corresponding metadata for that article. Next, I'd like to tell you a little about how we go about importing 435,000 (and ultimately close to 950,000) documents all at once. That is with the readtext() function, which comes from the readtext package. All you have to do is assign the result of readtext() to an object, with the folder path as input, and the readtext() function will navigate to that directory and essentially load in all of its contents. When it finally finishes, readtext creates a data frame with two columns. The first is the doc_id column, a unique identifier for each document, which by default is the file name. The text column holds a single character string for each article, containing the full text of that article. When you have multiple items in the folder, you get a character vector, so you end up with a single data frame with two columns in which each row contains the full text of one article. A minimal sketch of this import step follows below. I will note that it does take quite a while to load that in and to search on a specific pattern.
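As a minimal sketch of that import step, assuming a hypothetical folder path and plain-text files (the actual corpus location and layout will differ):

```r
library(readtext)

# Load every text file in the corpus folder into one data frame.
# The path below is a hypothetical placeholder for the JSTOR dump.
corpus <- readtext("C:/data/jstor_corpus/*.txt")

# One row per file, two columns:
#   doc_id - unique identifier for each document (by default, the file name)
#   text   - the full text of the document as a single character string
str(corpus)
```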
When you're dealing with 435,000 documents, I would hit run in R on my local machine and it would take about 30 minutes. I did move to a newer machine with a solid-state drive, and that cut the time down to about 20 minutes each time we would load the data in, but it is worth noting that this puts a heavy load on a local machine.

Moving on, I'll talk about how we would search for a specific pattern. We used the str_detect() function from the stringr package. With str_detect() you input the string vector that you're looking for a pattern in, and then the specific pattern or a regular expression; we'll talk in a moment about regular expressions. It's also important to note that this function matches substrings, so it effectively ignores punctuation. As you can see in the example, we have the sentence "This new revelation definitely proves Hume's argument." If we use str_detect() on that string and look for "proves", it will return TRUE. I will note that if we just looked for the term "argument", it would return TRUE as well: even though there is a period at the end there, the match ignores that punctuation. And we can move on to the next slide, please, Moti.

We used a regular expression in our research because we were trying to specify a really precise pattern, so in place of a plain pattern we use the regex() function in R. Regular expressions are sort of a language within a language, and they're actually pretty tricky to use. If you look at the bottom of this slide, you'll see I used RegExr, an online tool that helps you build regular expressions. I've actually never met anyone who's so good at regular expressions that they can just write them on their own; I think everybody pretty much has to use a tool like this. But I can tell you a little about what this expression does without going into every detail. You can see our indicator pair in the expression: we have proves, and then we have probably, and then you have the braces that specify zero and six. So what we're essentially looking for here are the term proves and the term probably, and they can be within six words of each other. When we're searching across the ten-word or three-word range, we change that six to a ten or a three. You'll notice in the middle there's a bar, and then probably starts again; it's essentially the same expression backwards. What that indicates is that the regular expression does not place any preference on the order the terms come in: probably could come within six words before proves, or proves can come within six words before probably.

And with that, we're going to move on and talk a little bit about how we converted this to numeric data. Once we have searched for the pattern, str_detect() returns a vector of logicals: TRUE if the pattern is present and FALSE if the pattern is absent. Then we just use the str_replace() command, which is also in the stringr package, to replace the TRUE values with 1 and FALSE with 0. It is important to note that the result will still be a character vector, so we do have to convert it to a numeric vector using as.numeric(). From there, we can sum the matches, and that yields the count of total matches for each indicator pair across each word range. A sketch of this detect-and-count step follows below. And we can move on. There is more that we did, and I'm just going to go through this a little more quickly.
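Here is a minimal sketch of that detect-and-count step for one indicator pair at the six-word range. The exact regular expression and recoding the authors used may differ in detail; corpus is the data frame from the earlier import sketch.

```r
library(stringr)

# One indicator pair: the argument marker "proves" near the inductive
# indicator "probably", in either order, at most six words apart
pair_pattern <- regex(
  "\\bproves\\W+(?:\\w+\\W+){0,6}probably\\b|\\bprobably\\W+(?:\\w+\\W+){0,6}proves\\b"
)

# Logical vector: TRUE/FALSE per article
matches <- str_detect(corpus$text, pair_pattern)

# Recode the logicals as "1"/"0", convert to numeric, and sum the matches
hits <- as.numeric(str_replace_all(as.character(matches),
                                   c("TRUE" = "1", "FALSE" = "0")))
sum(hits)  # total matched articles (sum(matches) would also work directly)
```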
It's also important to note that while we generated that count data, we also took the matched articles from the full master corpus and created our own separate CSVs and data sets for them. We were able to do this by binding those logicals back onto the data frame and then filtering for TRUE in the logical column that had been attached to it. You can see how we did that with the filter() command here, which is from the dplyr package. dplyr is really great for cutting up, manipulating, and filtering data, and it's definitely an essential package if you're using R. From there, we did save those CSVs, as I mentioned. We then had to import all of those CSVs, bind them by word range, and add the argument type in a new column, so that all of the deductive indicators are labeled as deductive, all the inductive ones as inductive, and so on. What that ultimately results in are three master data sets, one for each word range: three words, six words, and ten words. Those data sets contain the document ID and the full text like they did before, but also the argument type. And once we have the argument type, we are able to attach our metadata, and we can proceed, Moti.

Oh, I will note a caveat and a limitation of this approach. Because of the way we approached this, it is possible for a single article to be counted in our list of matched articles more than once. As you can see with this example, if an article contains the terms follows and absolutely, as well as hence and definitely, and both pairs are within ten words of each other, that article would appear twice in the list of ten-word matches. It's also important to note the one really serious limitation of str_detect(): it will only tell us the presence of a pattern once per article. So if follows and absolutely were to appear together more than once in a single article, our approach was not able to count that.

Once we have that master data set with the full text and the argument type attached, we are able to attach our metadata by joining, using a simple left_join() command. This is where that doc_id column becomes really important, because the metadata documents have the same file name, which readtext then translates into the doc_id. That is how we're able to connect the appropriate metadata file to the appropriate full-text file. In order to do this, I did use a simple command in the Windows command prompt, a wildcard rename (ren *.xml *.txt): you put the file type you're starting with and then the file type you're converting to, which in this case was text files. It is possible, I will note, to work with metadata and XML directly in R; I just personally think it's a little more difficult to interpret your results and what you're working with. It's a lot easier to read what's happening when you just convert to text. And if you could advance the slide, Moti.

In order to extract the specific metadata that we needed, we had to identify the specific tags, in this case the journal title tag. If you haven't ever used XML: every XML file is a bunch of little containers for different pieces of information about a work. In this case, the journal title tag is that container, and a rough sketch of this filter, join, and extract pipeline follows below.
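As a rough sketch of that pipeline under illustrative assumptions (the folder path, the journal-title tag name, the argument-type label, and pair_pattern from the earlier sketch are stand-ins, not the authors' exact code):

```r
library(dplyr)
library(stringr)
library(readtext)

# Keep only the articles that matched an indicator pair,
# and label them with the argument type that pair belongs to
matched <- corpus %>%
  mutate(hit = str_detect(text, pair_pattern)) %>%
  filter(hit) %>%
  mutate(argument_type = "inductive")  # label varies by indicator set

# Metadata XML files, renamed to .txt and loaded with readtext();
# their doc_id values line up with the full-text doc_id values
meta <- readtext("C:/data/jstor_metadata/*.txt")

# Pull the journal title out of its XML container, e.g.
# <journal-title>Philosophy of Science</journal-title>
meta$journal <- str_extract(meta$text,
                            "(?<=<journal-title>)(.*?)(?=</journal-title>)")

# Attach the journal title to each matched article by doc_id
matched <- left_join(matched, select(meta, doc_id, journal), by = "doc_id")
```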
And we used another regular expression to locate that container and then extract the information from it, using the str_extract() command, which is also from the stringr package. As you can see here, we have the greater-than sign that closes the start tag on one side, then the parentheses containing the period, star, and question mark (a lazy capture group), and then the end tag, so that we were able to specify exactly the right part to pull data from. This could be done on just about any piece of metadata that you want. We did perform this technique on other pieces of metadata, like the date and the author name and things like that, but for this study we really needed to focus on the journal title. Once we have those titles extracted, we can then bind them back to the corpus, and from there we were able to filter it down to the specific journals that we have been working with. And I believe that's it for me, so I'm going to let Moti take over again. Thanks, everybody.

All right, thank you, Mike. So as you can see, I was right: Mike was doing all the hard work. I'm going to tell you about the results of the study. As Mike pointed out, we have three data sets: one for our three-word searches, one for six-word searches, and one for ten-word searches. We basically found the same patterns in all of them, though, as you'd expect, the three-word searches have fewer hits than the six-word searches, and those have fewer than the ten-word searches. What you can see here is the proportion of argument types by journal. You see that in some journals, like the British Journal for the Philosophy of Science, there are more deductive arguments than other types, but in others, like HOPOS and the PSA, there are more inductive arguments than deductive arguments, and deductive arguments are the less frequent of the two.

We wanted to see if those differences are significant, of course, so we ran some tests. As far as the BJPS is concerned, the difference between deductive and inductive arguments is significant. As far as the HPLS is concerned, the difference between inductive and deductive arguments is also significant. For HOPOS, the difference was not significant, and the same applies to the JGPS: no significant difference between deductive and inductive arguments, although deductive arguments are slightly more frequent there. For Philosophy of Science, significantly more inductive arguments than deductive arguments, and for the PSA, significantly more inductive arguments than deductive arguments as well. In that respect, it's important to note that the PSA papers now appear in Philosophy of Science as part of the journal. I don't remember exactly the year they were merged, but we still looked at earlier PSA papers, going back to 1975, I believe.

All right, and as I said, in the six-word data set we get a similar pattern. Again, BJPS has more deductive arguments, but HOPOS, the PSA, and Philosophy of Science have more inductive arguments. And again, of course, we want to know if those differences are significant, so we ran the same tests as for our three-word data set. And we got similar results: the patterns are the same, and the results of these statistical tests are very much the same. Again, BJPS: significantly more deductive arguments than inductive arguments.
HPLS, again: significantly more inductive arguments than deductive arguments. HOPOS: significantly more inductive than deductive. No significant differences for the JGPS and Philosophy of Science. And for the PSA: significantly more inductive arguments than deductive arguments.

And lastly, our ten-word data set. Again, as you'd expect, the proportions are higher, a lot more hits for our searches, but the patterns are pretty much the same: BJPS, again, with more deductive arguments, the others with more inductive arguments, and abductive arguments the least frequent type. The results of the statistical tests are pretty much the same as well. In the BJPS, deductive arguments are significantly more prevalent than inductive ones. HPLS, again: significantly more inductive than deductive. HOPOS, again: inductive arguments significantly more frequent than deductive. No significant differences between deductive and inductive arguments as far as the JGPS and Philosophy of Science are concerned. And finally, for the PSA papers, inductive arguments are significantly more prevalent than deductive arguments. So, pretty much the same results in our three-word, six-word, and ten-word data sets, which shows that the results are pretty robust: we get the same patterns in all three data sets.

So what can we say based on these results? I think we can say that, yes, philosophers of science do rely on induction, but it doesn't seem to be as foundational to philosophy of science as it is supposed to be to science, or at least as foundational as philosophers of science tell us it's supposed to be for science. And that's because of these kinds of mixed results. We did find that articles published in HPLS, HOPOS, and the PSA contain significantly more inductive than deductive arguments overall, but on the other hand you have the BJPS, where you have significantly more deductive arguments than inductive arguments. And as we saw, as far as Philosophy of Science and the JGPS are concerned, there are no significant differences between the proportions of deductive and inductive arguments. So these kinds of mixed results don't support the idea, I think, that inductive reasoning is as foundational to philosophy of science as it's supposed to be for science itself.

Another interesting thing about our results, I think, is that abductive arguments, abductive reasoning, inference to the best explanation, don't seem to be all that frequent in philosophy of science. Even in our ten-word data set, it was roughly less than 10% of articles that contained such arguments. I think that's also interesting and somewhat surprising, because in addition to the idea that induction is at the foundation of science, philosophers of science also tend to think that science relies heavily on abductive reasoning. This is a quote from Anjan Chakravartty, who said that it's ubiquitous in scientific practice. And of course, McMullin has this book, The Inference That Makes Science, and he's talking about abductive inference in particular. So if it is the inference that makes science, and it's so ubiquitous in scientific practice, it doesn't seem to be so ubiquitous in philosophy of science. It doesn't seem to be the inference that makes philosophy of science.
And so it's maybe pointing to a way in which philosophy of science is different, as far as reasoning is concerned, from science, and maybe even from everyday reasoning. Because in addition to the idea that explanatory reasoning is foundational to science, or is "the inference that makes science" in McMullin's terminology, many philosophers have pointed out that IBE is ubiquitous not only in science but in everyday life. This is a quote to that effect from a fairly recent collection of essays on inference to the best explanation by McCain and Poston. They say that explanatory reasoning is quite common, and that it's not only common in science but virtually ubiquitous in everyday life. In fact, and this is from Igor Douven's SEP entry on abduction, it's so routine and automatic that it sometimes goes unnoticed. So again, based on our results, it doesn't seem like explanatory reasoning, or inference to the best explanation, is that ubiquitous in philosophy of science. We think that's another interesting finding from this study. And that's all we have for today. So thank you very much for listening to us, and thank you, Charles and Luca, for inviting us to give this talk.

Fantastic. There are lots of questions and I will get to them momentarily, but I think you'll be happy to know your talk has inspired furious discussion in the chat, because now we're all thinking: that was such a nice demonstration of how we could reproduce an analysis like this on our own. And now we're thinking, do we need a shared code repository for people who are all interested in this kind of stuff, or a way to aggregate GitHub repository links or something? So the chat is bubbling away thinking about reproducibility and collaboration. Right off, that's completely awesome. Let me get into the questions.

First from Chris Green, who asks: how many significance tests did you run in total, and of those, what proportion came out significant? Was there any correction for family-wise error, or was it all at 0.05? So might we have a lot of Type I errors turning up as significant here?

Yeah, that's a good question. Our threshold was 0.05. And since it was clear just visually that abductive arguments are not all that frequent, we only tested the difference in proportions between deductive and inductive arguments, because that's where they seem close. So, of course, it's possible that some such errors, Type I errors specifically, might creep in. But in some cases, like the BJPS, for example, the results are pretty substantial; there's clearly a significant difference there.

Great, thanks. Next question coming in from Stefan Heßbrüggen, who says: this isn't a question but a friendly objection, speaking as a historian of philosophy. HOPOS and HPLS are going to contain a lot of papers in the history of philosophy, and we historians are prone to hedging, using expressions like "probably" a lot, but that doesn't necessarily turn our work into inductions, does it?

Good. I mean, that's a very good question. Of course, phrases or terms like probably or likely and so on can also be used as markers of hedging, right?
So I think one way to address this, and this is what we did in this study, is to pair words like probably and likely with argument indicators, words like therefore, hence, it follows, and so on. Of course, you might still get some false positives: cases where the word probably is being used as a hedge, not as a marker of an inductive argument. That's always possible. But if you pair it with an argument indicator like therefore, then I think that significantly reduces the chance that it's a hedge rather than a marker of an inductive argument.

Nice, nice, thanks. Next question from Eugenio Petrovich, who asks: how can you distinguish argumentative patterns belonging to the philosophical paper from argumentative patterns that the philosophers are reporting on, from the scientific studies that they discuss or the historical sources that they cite?

Yeah, that's a good question, and we didn't do that for this study. And Mike probably could talk more about this than I can, but if there's a quote in a philosophy of science paper, let's say one published in Philosophy of Science, and it's a long quote from a scientific article that has the expression "therefore likely" somewhere in there, that's going to be counted. We didn't account for that. Still, I would assume those cases are going to be quite rare, but of course I shouldn't just assume something like that. So if someone can suggest a way to differentiate quotes from a different article within an article, that would be awesome. And maybe Mike, you would like to say something about that too.

Yeah, no, you pretty much nailed it: we didn't consider that. And it's really interesting to think about how you would get the machine to be able to identify that, because the full text as we receive it is a jumbled mess; it's not formatted or anything. So maybe there's an indication where there is a long quote that doesn't have quotation marks around it or something. I don't know how exactly you would be able to look for that kind of pattern, but like you said, Moti, I'd be really interested to hear people's ideas.

Yeah, in my experience with JSTOR data, not even inline quotes with quotation marks are going to be reliable, because a quotation mark is a favorite mistake of the OCR process. Quotation marks start showing up everywhere they're not supposed to be. So yeah, it's going to be rough, but it's an interesting thing to try to think about.

A question coming in from Alex Musnuk, who asks: is there a possible alternative explanation for the finding concerning the lack of abductive reasoning? Is it possible that abduction is just much more difficult to model and capture in text, so there's just a difficulty in finding the signal?

Yeah, I mean, I think that's possible. If you look at the phrases that we used for abductive reasoning: unlike the deductive and inductive indicators, which are single words (absolutely, certainly, likely, and so on), the abductive indicators tend to be pairs or triples of words, like makes sense of and best explanation for. So that could be one way in which searching for abductive arguments is more difficult and more complicated than searching for deductive and inductive arguments.
So yeah, that's definitely something to think about. And again, if people have any ideas, we are certainly open to suggestions.

Great, thanks. Next question coming in from Christophe Malaterre, who says: a very inspiring talk, thanks. Did you look at publication years to do a diachronic analysis? He says: I find the difference between BJPS and Philosophy of Science somewhat puzzling, but it could be explained from a diachronic perspective. Philosophy of Science starts in the 30s, whereas BJPS starts in the 50s, when the discipline was undergoing professionalization. So maybe their difference has to do with changes in how we write philosophy, and a way to test this would be to take a diachronic view of each journal.

That is an excellent idea, and definitely something to do next. We didn't do it for this talk, obviously; if we had, we would have presented it. But it's definitely something to do next: to look at the timeline and see whether there are changes in the types of arguments being made across the years of publication.

Yeah, definitely. And we're running out of time, but I do want to push in this last one in a similar spirit of friendly suggestions. Eugenio Petrovich also says: thank you, very interesting. It would be interesting to investigate with this method the role of the same inductive, deductive, and abductive forms of reasoning in other disciplines studying science, like the more empirical ones, such as STS and scientometrics, and to try to compare that with philosophy of science.

Absolutely, yeah, that's a great suggestion. So thank you. We have a lot of work to do.

And with that, I unfortunately have to call it there for time, because we are over for the day. So thanks very much. I encourage you, by the way, to go look at the Crowdcast chat on replay, because it's been popping off, as they say. So thanks very much for the talk, and we will see everybody back here tomorrow for the final day of DS Squared 2021, with exactly the same schedule as today, actually. Looking forward to seeing you all once more tomorrow. Thanks again very much. We'll see you all soon. Bye-bye. Bye.