As of right now we're live, but it takes a few seconds for these settings to propagate through Zoom. Okay, that's coming in now... yeah, I think we're good. Let's wait just a second to let people come in from the last session. Numbers are climbing, up to 50 now. All right, there we go, I think that's everybody. Welcome back for the second-to-last talk of the evening, day one. Very exciting. We're very, very happy to have with us Mel Andrews, who's doing some really cool work right now on philosophy and machine learning and formal approaches to biology. This is actually stretching in the direction of our next talk, which will be about philosophy of mathematics, so we're going on a little formal tangent for the end of the evening. This talk is on machine learning and the scientific method. So without further ado: Mel Andrews of the University of Cincinnati, the floor is yours. Take it away.

Thank you. So I'll be talking about machine learning and scientific methods. Specifically, I'm going to look at what the literature on scientific modeling can tell us about how machine learning behaves in science. Here's the map of what I'll be talking about today. The inputs, what I'm bringing to the table, are machine learning, the philosophy of science literature on scientific models, and my experience over the past few years with one model in particular, the free energy principle, which has its own complicated history. I'm going to look at descriptive and normative aims in understanding the role of machine learning in science, and I hope this gives us a broad-brushstroke picture of machine learning in scientific practice.
First, machine learning, a definition: computational techniques for big-data analysis that generate a statistical model of the data on which they are trained, given the instructions of an algorithm. Now, there's not a lot of analysis of machine learning in scientific practice in the philosophy of science literature so far, but it's been addressed a little by the likes of Cornell and Kambachner, Kathleen Creel, Marta Halina, Sabina Leonelli, and Carlos Zednik. Next, scientific models. Model-based philosophy of science, as a literature, embraces the messiness, the disorderliness of science, but it has sometimes done so at the cost of being overly permissive about science. There's a big emphasis on pluralism, and that sometimes leaves important questions open. Most of the ink spilled in this literature so far has gone to giving typologies of models, often attempting exhaustive typologies of scientific models, and to arguing over the ontology of models and their representational status: how it is that models represent systems in nature. Lastly, the free energy principle. This is a formal model that's still relatively new. It's currently being used in neuroscience, psychology, cognitive science, and biology, and it's built up from machine learning methods. The original core of the framework comes from a method known as learning with noisy weights, or ensemble learning, from Hinton and colleagues in the '80s and '90s; you can see it perhaps best articulated in MacKay's work. Some of the methods involved in the FEP, the free energy principle, also come from physics. So there's an interesting historical trajectory here: techniques are developed in physics, brought through machine learning, and then brought into a model in biology or neuroscience.
So they've gone through multiple fields, multiple domains, and many different interpretations to get where they are today, and this method has generated a lot of controversy and confusion. I think that's because there's a lot of what's been called reification going on, and I'll say more about that. Descriptive and normative aims: there's a mapping project and an anatomy project we can take from the modeling literature, a taxonomy of models and the variety of epistemic virtues at play in different models. Hopefully this lets us answer the questions: how does machine learning operate in scientific practice, and how does machine learning help us predict, explain, understand, and manipulate target systems in nature? But there's also a normative aim. We want to be doing not just mapping but preparing ourselves to intervene more cleanly. I hope also to get from the philosophy of science literature on scientific modeling the tools needed to do diagnostics and surgery when we see that machine learning methods are going awry, when they're somehow diseased. How do we identify problems in scientific modeling with machine learning, and how do we go about analyzing and ultimately rectifying them? To that end, I'm going to look at a problem unique to scientific modeling known as reification. On the anatomy side, I'll draw a few distinctions very briefly, just to have them in the background. There's a distinction between models and target systems; a distinction between theory models and data models; a distinction between structure and construal; and then there are various lower-order typologies of models. Models are taken to be abstract structures that enable investigation into real-world systems, known as target systems.
Often this is said to be done by representation: we take a model to be representative of a target system, although some people have a more pragmatic conception of what models get us, what purchase they give us on target systems. Modeling necessarily involves idealization, abstraction, distortion, simplification, coarse-graining, and the black-boxing of irrelevant variables. The famous quote here is not from a philosopher of science but from a statistician, George Box, though it's representative of how the philosophy of science literature takes models to be useful: all models are wrong, some models are useful. This is a very Wimsattian position; his famous phrase for it is "false models as means to truer theories." Now, there's also a distinction between theory models, or theoretical models, and data models. A theoretical model, a model of the phenomenon, represents a worldly process or phenomenon; it's informed by theory; it's conceptual. Data models, on the other hand, are what we get when we take the raw data, so to speak, that we've collected and clean it up: we remove outliers, we remove errors of various kinds, and we find a rough first-pass way to organize it. That's what a data model is; we do, for example, cleaning and curating. Now, this is tentative, my first-pass attempt at applying what I've learned from this modeling literature to machine learning, but I think we can see an analog of this data model/theory model distinction in how machine learning is used in science. On the data model side, machine learning in scientific practice can, for example, sift through data, discarding what's uninteresting.
For example, at CERN's Large Hadron Collider, there are machine learning models sifting through particle collision events and keeping only the one percent most likely to be interesting to us. Machine learning can also classify data; that will sometimes fall on the theory model side, but in general it falls on the data model side. Data models can find simple patterns in data (linear regression, for example, is a data-model sort of thing we might do), and we can also generate simulated data on the basis of existing data. If we were generating simulated data on the basis of theoretical considerations, that would be more of a theory-model move; but if we're looking at existing data and generating further cases from the patterns there, that's a data-model sort of thing. On the theoretical model side, machine learning systems can be taken to be representative of target systems in the world: literally and straightforwardly, or approximately, with some idealizations, or analogically. Next, structure and construal. The idea here, in Michael Weisberg's typology, is that we have physical, mathematical, and computational models, where "physical," "mathematical," and "computational" refer to the structure of the model, what the model is composed of. But what a model consists in is both the structure it's instantiated in and a scientist's interpretation or construal, and it's this interpretation or construal that relates the model to a target system. This is what draws the comparison between the model structure and some system or systems in nature. Perhaps this breakdown is best seen in the case of what we call model transfer.
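The "simple patterns" role of data models mentioned above, fitting a linear regression, can be sketched in a few lines of Python; the measurements here are invented for illustration, not data from the talk:

```python
# Ordinary least-squares fit of a line: a "data model" that summarizes a
# pattern in the data without committing to any theory of its source.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

xs = [0, 1, 2, 3, 4]                   # invented measurements,
ys = [1.1, 2.9, 5.2, 6.8, 9.1]         # roughly following y = 2x + 1
slope, intercept = fit_line(xs, ys)
```

The fitted line summarizes the pattern (a slope near 2, an intercept near 1) but, as the talk stresses, it says nothing by itself about why the pattern holds.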
Model transfer is our term for when we develop a model in one domain, in one discipline, for one purpose, related to one target system, and then the structure is imported into a new domain or discipline and related to a new target system with a new construal. For example, the Lotka-Volterra model is a system of nonlinear differential equations. It was first used in physical chemistry by Lotka, where it's an oscillator model of a type and represents chemical concentrations. But it was later imported into population biology and ecology, where it's used as a predator-prey model; instead of representing chemical concentrations, it represents populations of organisms. It's known as model transfer when the structure, the system of nonlinear differential equations, is brought from a context of physical chemistry into a context of population biology. There have been a few lower-order typologies of models that have been really helpful to me in understanding the free energy principle over the past year. I've been thinking about the FEP for several years now, and my real breakthrough with it came from thinking about it in terms of this philosophy of science literature on modeling. That's what finally enabled me, I think, to get to the heart of it. For that reason, I think looking to that literature to understand scientific modeling with machine learning techniques in general is going to be a really fruitful avenue, and a good first step towards understanding machine learning in scientific practice. There are a few types of models that I think accord particularly well with the roles machine learning can play as theory models in science. The first is exploratory modeling. There might be a domain that's new to us, where we don't have a lot of evidence and we don't have robust theories built up.
In order to get a foothold there, to get in the door in that new area of investigation, exploratory modeling doesn't build in a lot of assumptions. It casts a wide net and enables us, with minimal assumptions, to work out what sort of systems we're dealing with, what the parameters are, what's relevant and what's not. Then there's targetless modeling. Weisberg looks at target-directed modeling, generic modeling, and targetless modeling; targetless modeling is modeling where you don't have a specific target system in mind when you go about the modeling process, and I think this is another good candidate for understanding machine learning. Baran Duran has a typology specifically for computer simulations of life-mind continuity theories, and he gives an account of generic conceptual modeling there that I think fits really well. And Tari Shmarra has this notion of models functioning as guides to discovery. I think all of these are good avenues for understanding machine learning in scientific practice. But on to the normative side. There's a notion of reification in the modeling literature; it has so far been rather scantily diagnosed, but there's been some discussion, in Levins and Lewontin's work, of what they call reification, conceptual reification, or pernicious reification. Reification occurs when we make category errors in wielding or interpreting scientific models. The dialectical biologists Levins and Lewontin tell us that abstraction becomes destructive "when the abstract is reified and when the historical process of abstraction is forgotten, so that the abstract descriptions are taken for descriptions of the actual object." There are many distinct components to modeling; the idea is that modeling is going to help us relate theory to the world.
But models themselves, we've seen, break down into a structure and a construal, and our access to the world is always mediated by data: we don't have access to the world out there directly, just the data we gather from it. So in scientific modeling practice we have to be careful not to confuse theory, structure, construal, data, and world, and yet we seem to do so all the time in scientific practice. This is the chief pitfall of scientific modeling: we confuse one level of this process with another. I'm going to highlight six primary types of reification that I think we commonly see in modeling. The first is the classic case of mistaking a representation for the world. Korzybski gives us the famous line I'm sure we're all familiar with: the map is not the territory. When we have a map of the world and we confuse the properties of that map for the world out there, for the territory we're traversing, that's this first-order type of reification. It's where the existence, qualities, or results of a model are taken as facts about the target system. Levins tells us that models, while essential for understanding reality, should not be confused with that reality itself. The second-order type of reification is when the existence or qualities of a model are mistakenly attributed to a theory. I've noticed this happening a lot with the free energy principle, where the formal structure of the FEP, that is, the maths, is conflated with a theory of brain function or biological self-organization. Ramstead and colleagues tell us that systems are alive if and only if their active inference entails a generative model. But what "active inference" and "generative model" refer to there is maths, and what it is to be a living system, what life is, what the dynamic of life is, is a theoretical construct.
The idea that the maths is going to tell us, in and of itself, whether systems are alive or not is the category mistake. The maths can't tell us that. We can build a conception of what it means to be alive up from that maths, but it doesn't flow as a direct consequence of the formal techniques. Karl Popper made a similar mistake in the 20th century. He developed his conception of the theory of evolution by natural selection from looking at mathematical models. In the 20th century, especially its first half, there was a surge of mathematical models of evolutionary processes, and Popper developed his logical conception of the theory of evolution by natural selection from this kind of secondary literature, from these mathematical models in particular. And he said: it must be the case that evolution by natural selection is not a scientific theory but in fact a metaphysical research programme, because it's structured tautologically. According to these formalisms, we cannot differentiate realized fitness from the abstract idea of adaptiveness; therefore it must be that evolution by natural selection, Darwin's theory, is a tautology. If he had developed his notion of evolution by natural selection by reading Darwin himself, he wouldn't have come away with this. But he imputed the qualities of the mathematical models to the theory of evolution by natural selection. He later recanted and said he was mistaken, but it was an interesting debacle of the 20th century. The third type of reification involves taking some features of an interpretation of a model and confusing them with the structure. We might relate a model to a target system, land on an interpretation, and then claim that some metaphysical conclusion follows analytically from the maths, from the structure.
Basically, to have philosophical conclusions, metaphysical conclusions, theoretical conclusions, we need a construal: structure plus construal, structure plus interpretation. It's common to try to pull metaphysics from maths, but maths alone does not get us metaphysics or ontology; it needs the addition of an interpretation. Similarly, it's common to try to pull facts or knowledge from the model alone, or from the results of the model alone, without relating the model or its results back to gathered data, to measured data. That doesn't work. We always need the contribution of data to be able to say we have knowledge of a natural system. A simulation alone, unless you're plugging in data, does not give us knowledge. We always have to relate our models and their results back to something we've measured in order to have knowledge of the natural world. This is a very common mistake in evolutionary biology: we see it all the time, where modeling work is done and knowledge is claimed without checking against data. The fourth type of reification is when, in interpreting or applying the results of a model, we forget to de-idealize; we forget to acknowledge or compensate for distortions we've introduced in the modeling process. For example, when we build a scale model of a bridge, some of the properties of that model scale linearly; they scale up to the actual engineering feat we're trying to accomplish. But other properties, some structural properties of the material for example, do not scale linearly, and we're going to be in big trouble if we assume that everything scales linearly.
There are going to have to be scaling coefficients at play to make sure everything works out when we go from the scale model to the building of the real architectural system. Likewise in biology: in doing evolutionary modeling we often make simplifying assumptions. We imagine an infinite, unstructured population, for example, or we imagine discrete, non-overlapping generations. And we know these are not true of real populations. Real populations in nature are finite, they are structured, and they have overlapping generations. So when we do that modeling work and then relate it back to real systems in nature, we have to remember that we've made simplifying assumptions and parse those out, so that we get something realistic, something that corresponds to what we observe in nature. The fifth form of reification arises particularly in cases of model transfer. This is when the construal of a model in one domain is misattributed to the construal in another domain. We develop a model in one domain, bring it into another domain, relating it to a new target system, but we falsely carry over aspects of the construal from the first domain, inappropriately, into the second. Take annealing. In forging a sword, to get the right lattice structure, to make sure the blade is robust and won't break, we can't drop the temperature all of a sudden. We have to bring the temperature down slowly so that the lattice structure forms free of the kinds of abnormalities that would make it more susceptible to shattering. But there's also simulated annealing, now in machine learning. Say we want to learn a feature space, and this feature space has a bit of a tricky topology.
We find that, using simple backprop, simple gradient descent, we tend to land in local minima. So we do simulated annealing, where we play with the "temperature" of the system, and temperature corresponds to how big a jump the model is taking, the magnitude of the steps it's taking in gradient descent. We play with the temperature, quote unquote, to get the machine learning model to find a global minimum, an optimal value. But that temperature is analogical. In simulated annealing we're not referring to the real temperature of the system. And it's common, you see it all the time, for people to actually think things like: the physical computational system we're using, we're actually lowering the temperature on it. This is a kind of mistake you see modelers make all the time. And this kind of mistake is very common in the free energy principle literature, where we have models borrowed by analogy from the physical domain into a statistical context, and we take them to have their original physical meaning: we take "energy" to refer to actual energy, or "temperature" to refer to actual, physical temperature. So a construal in one domain is mapped onto a construal in a new domain inappropriately. There are appropriate, metaphorical mappings; reification happens when that mapping is somehow pathological, when it leads us astray. The sixth and final form of reification is when data are naively fed to a model, or the output of a model is naively interpreted, such that the modeling effort doesn't elucidate but in fact obscures causal facets of the target system.
And this, I think, is probably our biggest worry about machine learning in science. It's really common in large-scale statistical models. Lewontin, for example, gave a really close analysis of ANOVA, the analysis of variance, in 1974. Often what we're doing with this model is partitioning variance; it gets us correlations. It doesn't get us fundamental underlying causes, and yet we're often trying to use the method to get at causes; we're often interpreting the results of ANOVA causally without license. In applying machine learning we see really pathological reification in this respect: we have huge correlations in what we've observed, and we take these to get us at some underlying causal factor that doesn't exist. We have phenotypic variance and we think it's getting us at genotypic variance, when we know genes not to work like that. So I think paying attention to reification is going to be really important with machine learning, for a number of reasons. Machine learning systems are in some sense complex and not widely understood. What machine learning systems are harnessing is, in some sense, very rudimentary statistics, dumb statistics, which is not difficult to wrap your head around. But what we're using machine learning to do is data analysis at a level the human mind simply cannot reach; it's doing things we cannot wrap our heads around, because we just cannot process data of that magnitude. So it's going places that, in some sense, we can't. And yet we can employ machine learning very conscientiously; we can understand what it's doing well enough to debug it.
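The worry above, that variance partitioning yields correlations rather than causes, can be illustrated with a toy confounder; the setup and variable names here are hypothetical, not from Lewontin's analysis:

```python
import random

rng = random.Random(42)

# A hidden common cause z drives both x and y; x does NOT cause y.
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

def pearson(a, b):
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / m
    va = sum((ai - ma) ** 2 for ai in a) / m
    vb = sum((bi - mb) ** 2 for bi in b) / m
    return cov / (va * vb) ** 0.5

r = pearson(x, y)
# r comes out strongly positive, yet reading it as "x causes y" would
# reify the statistical model: all of the shared variance traces back
# to the unmodelled common cause z.
```

The correlation is real and reproducible; the category error is attributing it to a causal relation between the measured variables rather than to the structure that generated them.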
Some of our not understanding machine learning comes down to its often being proprietary: in a lot of contexts, the machine learning systems we're using are not available to us because they are corporate-owned. Another problem is that we might have a system that's a black box; this is where we talk about explainability, understandability, and transparency in machine learning systems. We might have a system where we know only the inputs and the outputs, and the underlying rationale for how the system has gone about categorizing or otherwise dealing with the data is unknown to us. It may also be that data is mishandled or of unknown provenance: we don't always have control over where the data is coming from and what's been done to it, and if the data has been improperly gathered, improperly handled, or improperly fed to the model, we get all sorts of problems. We also know machine learning systems to be very sensitive to noise and to artifacts; they're easily fooled, and it's easy to lead a machine learning system astray. In all these senses, machine learning in science is ripe for this kind of reification. We don't always know what's going on under the hood. Maybe we're trained as astrophysicists or population biologists or what have you, but we're not trained as programmers or statisticians or machine learning engineers in the way necessary to really get what's going on here. And even if we do have a good high-level understanding of what's going on, the system may simply be unintelligible to us, doing things we cannot penetrate. It may be latching onto artifacts we don't know about.
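To make the earlier simulated annealing discussion concrete, here is a minimal sketch; the double-well function and all parameter values are illustrative choices of mine, and "temperature" here is exactly the analogical quantity the talk describes, controlling step size and the chance of accepting a worse move, not any physical temperature:

```python
import math
import random

def f(x):
    # An invented double-well objective: a local minimum near x = +2
    # and a deeper, global minimum near x = -2.
    return (x ** 2 - 4) ** 2 + x

def simulated_annealing(f, x0, t0=5.0, cooling=0.99, steps=2000, seed=0):
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        # Propose a jump whose size shrinks as the "temperature" cools.
        cand = x + rng.gauss(0, max(t, 0.05))
        fc = f(cand)
        # Accept worse moves with probability exp(-(fc - fx) / t),
        # which is what lets the search escape local minima early on.
        if fc < fx or rng.random() < math.exp(-(fc - fx) / max(t, 1e-9)):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling
    return best_x, best_f

best_x, best_f = simulated_annealing(f, x0=2.0)  # start in the shallow well
```

Started in the shallow well, plain greedy descent would stay put; with the cooling schedule the search can cross the barrier and settle in the deeper well near x = -2. Nothing physical cools anywhere, which is the point of the talk's fifth form of reification.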
So, in conclusion, I think we can learn a lot from this philosophy of science literature on modeling. I haven't argued a strong philosophical thesis here; that wasn't really my intent. My intent is more to open up a conversation about machine learning methods in science, about what the tools we already have at our disposal, the modeling literature for example, can do for us here, how they can elucidate machine learning in science, and what the pitfalls will be, what we need to watch out for in going about machine learning in science. I would like to see the philosophy of science literature take up this gauntlet in a strong way, both in terms of getting ahead on classifying machine learning in science and, most importantly, getting ahead on figuring out how to identify problems with machine learning, figuring out how to debug machine learning. I think understanding the perils of reification is really important, and I hope that delineating what I take to be the main types of reification in scientific practice with models will give us some headway.

Fantastic. Great, thank you so much, very cool stuff. Questions are rolling in, so let me go ahead and get right to it. Let me start with a question from Brady Fullerton, who says: really enjoyable talk, thank you. And then asks: how would you address concerns over whether machine learning practice must strictly resemble models of the scientific method? For instance, you discussed machine learning as an exploratory model, but what do you think of the temptation to see machine learning as needing to involve hypothesis testing? Is it a mistake to commit oneself to hypothesis testing in machine learning, when the model of exploratory machine learning is perhaps more suitable?
Yeah. I don't want to say hypothesis testing is out, or no longer useful to us. But I think this literature on model-based philosophy of science tells us there's a lot more going on in scientific practice, in the scientific method, than simple hypothesis testing. A lot of what we're doing is heuristic; a lot of what we're doing is fashioning broad nets, casting them out into the world, seeing what we draw in, figuring out what sticks, and figuring out how to deal with it from there. And I think, more and more, with the increased role of machine learning in scientific practice, we're going to see that science is quite like that, and not like we traditionally conceive of it.

Okay. Next question, coming in from Stefan Hesperugen, who asks, picking up on your mention near the end of categorizing machine learning: do all the alleged "bugs", I like that, bugs in quotes, that you see in machine learning apply indiscriminately across the various forms of machine learning, so unsupervised, supervised, or deep learning? Do you think there's more taxonomizing to be done there?

Oh, absolutely, yeah. And this is why I think it's very timely that we as philosophers of science turn to trying to understand machine learning as quickly as we can: there's such a plethora of types of models, each coming with its own epistemic pitfalls. Do I see, just thinking on it right now, that some of these forms are more susceptible than others? I haven't thought that far yet.
But it's something I hope to be thinking about.

Cool. One more question I see here, from Yomantin. Hey Yon, how are you? It says: interesting talk with lots of promising ideas. I'm curious if Mel wants to relate the talk to the connection between prediction, understanding, and explanation. Can we get any type of explanation from machine learning models? Do these models explain anything beyond their ability to predict?

That's really interesting, yeah. So there is a robust literature on explainable AI, and there is a robust literature on explanation as an epistemic goal in philosophy of science. As far as I know, no one has drawn that connection yet. Is that true? Has Carlos Zednik drawn that connection? I don't think anyone has; I'd love to see that. I think there's, again, a really fruitful area of exploration there.

Great. We are out of questions and essentially out of time, so I think this is the perfect place to wrap up. Thank you so much for a fantastic talk, and we will be back in