All right, it's past five, so I think we should get started. So hello everyone, and welcome to what is already the fourth seminar in our machine learning and physics seminar series. Our guest today is Nathan Kutz, director of the AI Institute in Dynamic Systems and the Yasuko Endo and Robert Bolles Professor of Applied Mathematics at the University of Washington. He is also a senior fellow of the eScience Institute and an adjunct professor of physics, mechanical engineering, and electrical and computer engineering. Among his many interests are notably neuroscience, fluid dynamics, and nonlinear waves and coherent structures. Today he will give us a talk on yet another fascinating subject: the potential for deep learning models to directly encode physics. The title of his talk is "The Future of Governing Equations." Thank you very much for joining us, Nathan. The floor is yours. All right, thank you. Thank you all for having me, and I'm hoping to just share some thoughts. The title is more of a self-reflection, so I don't want to overburden anybody's mind too much, as if I have something super intelligent to say here, except that I have a lot of reflections on this, because we're starting to think a lot about what data-driven methods can do for us from a physics modeling perspective. And it does raise a lot of questions about how we should think about our traditional modeling approaches when we have the power of these neural networks and deep learning algorithms and machine learning in general. So what I wanted to do is start thinking a little bit about this and frame it around what I'm really interested in, which is a target use of neural nets for solving physics problems.
And this is just a very rough sketch: almost everything I'm going to do is going to involve something like this, which is taking data and learning a coordinate transformation to put the data into a representation where I can easily think about the dynamics, okay? Oftentimes the measurements you take aren't in the right measurement space, though oftentimes we do know what coordinates we should be in. But if you take a system and you're agnostic to that, how does this thing learn a good coordinate system and a representation of the dynamics? I always like to couch this in terms of a problem very well known to all of us, which is celestial mechanics. What you're seeing here in the foreground is the retrograde motion of Mars, and over here on the left is the retrograde motion of Saturn. These are the classic physics problems; for millennia, this was physics. How do you start thinking about the motion of these heavenly bodies, and how do you do a prediction or a forecast of where these planets are going to be in the future? The first successful theory of this actually came out of Alexandria, Egypt in the second century AD, from Claudius Ptolemy. In fact, the Ptolemaic dynasty was founded by one of the generals of Alexander the Great who took over Egypt, and Alexandria at this time was the intellectual capital of the ancient world. Out of this came the doctrine of the perfect circle, which says that from the Earth's perspective you can essentially think about the motion of the planets as circles upon circles. All you have to do is specify the radii of these circular rotations, and you can actually have a really good predictive model for the motion of the planets. One way to think about this is that it was a very early version of a Fourier transform: put the motion of the planets in terms of circles on circles.
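To make the "circles on circles as an early Fourier transform" remark concrete, here is a tiny sketch. All radii and frequencies below are made-up illustration values, not Ptolemy's actual parameters; the point is only that summing two rotating circles (a truncated complex Fourier series) already produces retrograde loops like those seen in the Mars and Saturn data.

```python
import numpy as np

def epicycle_position(t, radii=(1.0, 0.35), freqs=(1.0, 5.2)):
    """Apparent position = deferent + epicycle, each a circle with its
    own radius and angular frequency (all values here are invented)."""
    return sum(r * np.exp(1j * w * t) for r, w in zip(radii, freqs))

t = np.linspace(0, 2 * np.pi, 2000)
z = epicycle_position(t)  # complex number: x + i*y

# Retrograde motion shows up as the apparent angle temporarily
# decreasing: the angular velocity changes sign along the orbit.
theta = np.unwrap(np.angle(z))
dtheta = np.diff(theta)
print(np.any(dtheta < 0) and np.any(dtheta > 0))  # mixed signs -> retrograde loops
```

Adding more circles (more Fourier terms) refines the fit, which is exactly the sense in which the Ptolemaic model was a truncated Fourier representation.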
This theory lasted about 1500 years, and it really broke down as mounting evidence came to the forefront. Copernicus pushed us towards the heliocentric universe, but it was really the work of Galileo and Kepler that solidified this new coordinate system: instead of all the heavenly bodies revolving around the Earth, in fact our solar system revolves around the sun. And this change of coordinate system was foundational and fundamental, because once you have the right coordinate system, along comes Newton, and Newton is able to propose, in this coordinate system, an F = ma inverse-square law for the motion of the planets, okay? And it turns out we had this theory for several hundred years before some inconsistencies were noted, and as we developed it further we found that F = ma wasn't quite right; in fact there are general relativistic effects that Einstein, with improved data, was able to build a model for. So every time we improve our data collection, we are able to make progress in our theoretical constructs, and I think that's just part of the historical legacy we've inherited. So let's come back to this system. Now that we think about data-driven methods and collecting data, I want to consider two models that were built out of this, and walk through them, because it's going to be important for framing this idea of a governing equation that we want to follow up on.
So one model was exactly what Ptolemy did, and here it is. Going back to the paradigm of some kind of discovery architecture, I want to automate this process: I give you the input data, which is actual measurements of the night sky, and what I want to happen is a coordinate transformation, learned dynamics in a new latent space, and then the ability to come in and out of that coordinate system. What the Ptolemy model does is something like this: the coordinate transformation takes what is actually 2D data from looking at the night sky and projects it into a model of circles on circles. So this is where you see the circles-on-circles theory developed. This is a representation of how you might construct a predictive model for planetary orbits. Again, this was a highly successful model, and it was only through improved data that we were able to move away from it to a second model, the one we have adopted as closer to the truth, which says: I want to come from the data and learn a coordinate transformation into the heliocentric coordinate system, where I can write down my F = ma model. Okay, so both of these allow you to do prediction, and both of them allow you a reconstruction of the data, but there is actually a big difference with the F = ma model, because the F = ma model gives us our governing equations. We often think about that as the truth; the ultimate truth in any system is to write down the governing equations for that system, because somehow then you've written down the DNA of that problem.
So let's come back to these two models with Kepler and Newton, because this gets us right into the heart of data science, where we have to start making choices and decisions about what we want to do long-term with our data, or how we want to use it. In some sense, what Kepler did is take Tycho Brahe's data after Brahe passed away, and eleven years later he was able to publish his Astronomia Nova — or perhaps it was the Tabulae Rudolphinae, one of those — where he actually published his elliptical orbits, right? He had to do all these hand calculations to get there. If he had had MATLAB or Python, this would have been an afternoon of work with the modern regression algorithms we have. But what he did is exactly that: he regressed onto a model. So this is an interpolation. What Newton did is propose a mechanistic model, F = ma, and what that gives him the ability to do is not only reproduce the orbits of Kepler, but if you start to imagine what it would take to launch a rocket from the Earth and put it around the moon, F = ma generalizes to that situation. This is why we love governing equations: I can propose trajectories I've never seen, nothing like anything I've observed, and I can model them, like putting satellites in orbit. Kepler cannot model that, because he has never seen a rocket take off from the Earth; he doesn't know about escape velocities, Hohmann transfer orbits, none of that, because he has not observed any of it in his data. So this idea of having underlying governing equations is really important for us. By the way, this fundamental distinction between an interpolative model and an extrapolatory model is really present in our modern-day thinking about machine learning. So here are what I would say are the two grand challenge problems we have today in autonomy.
One is a robot, one is a self-driving car, and in some sense they inherit very different portions of this legacy. The robot is based upon Newtonian mechanics; it is imbued with F = ma physics, classical mechanics. This is one of the graduate courses we still take today in physics: learning how masses, angular momentum, all of these kinds of concepts play out in making this robot work. Whereas the self-driving car is a bank of sensors. The bank of sensors is what allows us to teach this car: you label all the data and tell it, you know, don't hit people, drive between the lines, stop at a stop sign. You give it rules; it has no physics knowledge. So one way to think about this is that the robot inherits our traditional physics point of view, where I give it governing equations, and the self-driving car essentially just inherits labeled data. What's important about the second one is that you're trying to make the self-driving car an interpolation problem, whereas in the physics-based problem you're trying to give the robot the capability to extrapolate, okay, with its physics rules. So the legacies of Newton and Kepler, I feel, are exemplified beautifully in these two modern-day examples. So I want to come back to a more general mathematical architecture: what does it mean to start building models from data, and what are the machine learning possibilities for us? Here's the mathematical architecture. I'm going to take measurements of a system; those measurements are called y, and I'm trying to measure some state space of the system, x, which I might not know. So for instance, I take measurements y, and by the way, there's noise on that measurement, which I'm going to characterize here. There's some measurement model, and there's some state space which I may or may not know, okay?
It depends upon your scenario. The dynamics of the state space itself — in other words, the governing equations of the system — is given here by dx/dt = f(x). And again, I'm going to assume there's a very good chance, in some of these more modern complex systems, that I don't necessarily know the dynamics f. I will talk later about what happens if we have partial knowledge of f, or we actually know f, and what the possibilities are then. But here the assumption is: I'm just going to take measurements, and from the measurements alone I would like to reconstruct the state space x, reconstruct the governing equations f, and learn the parameters. This is a terribly ill-posed problem. The only way you can solve it is to make it well-posed, which means you have to put extra constraints on this ill-posed problem in order to get an actual solution. So the question is, what are the constraints? In machine learning parlance, that comes out as the idea of imposing regularizers, okay? Regularizers are essentially constraints — in optimization language, things that you would like to see enforced in building a model. So what we're actually going to do here is enforce this as our constraint, and for us the ultimate goal is this idea of interpretable, parsimonious models. You know, I have a bunch of physics books on my bookshelf, and I'm sure you all do too. And if you look at them, it's remarkable: you look at things like Maxwell's equations, Schrödinger's equation, Navier-Stokes — all of our governing equations are typically pretty small. They fit on one line of a page, and only a portion of that line. Why is that? It's usually due to this idea of dominant balance physics, but also, when we do these derivations, we get these simple laws, and we often get a parsimonious, interpretable representation of the physics.
So this is the idea we're going to impose here. These are old ideas, going back to William of Occam, about imposing parsimonious models: impose a model that's just as complex as it needs to be, but no more complex than that. Same thing with Pareto. You've probably heard of a Pareto frontier, which is this idea of how to build a model that is as parsimonious as possible, not overly complicated, and yet still gets good performance. So this is going to be, for us, the ultimate physics regularization. That's just an opinion, but it's my opinion, and it might be yours as well, because if you come from physics, you believe in the power of these really amazing governing equations we have. And I want to enforce that through this kind of regularization. Okay, so I want to use this to do this model discovery trick: take data and discover models from it. For the last few years I've been thinking about discovering governing equations, but more recently I've thought more broadly about this, and this is why the title is a little self-reflection about what the future of those governing equations is. I want to give an example and walk through my thinking around it. So here's the paradigm I want to talk about: Burgers' equation. Burgers' equation comes from right around the 1950s, even a little earlier than that. What it is, right, is essentially a wave equation where the speed of the wave depends upon the amplitude. So this is going to create a shock structure, and what was added to Burgers' equation was a diffusive regularization: as the shock starts to form, this diffusion term becomes important and helps regularize the behavior of the solution. So let's call this the truth. This is, let's say, the governing equation of a system, and in some sense, for us in physics and math, this is ultimately the truth.
If I have this model, I have the blueprint for understanding everything. But let's talk about what you would do if I gave you this model. You'd say, oh, this is Burgers' equation, and what you'd really like to do is understand its behavior, so you would try to solve the equation. Let's take a path to solving it. One path — and this was done by Cole and Hopf independently, in 1950 and 1951 — is to say, look, there is this transformation, which we now know as the Cole-Hopf transformation, such that if you go to a different coordinate system, this nonlinear PDE becomes linear, and there it is. So one could ask the question right away: was the original a good governing equation? Because this one seems to be better. Why? Because it's linear. And in fact, for that linear equation, I wouldn't stop there. The whole reason I like linear equations is that I can write down a solution. So for instance here, if I assume some domain of length L with zero boundary conditions, this is your eigenfunction expansion solution. Of course, we learn these all throughout physics, right? We learn this from waveguides, where we have modes — electromagnetic waves trapped in waveguides — to quantum states. These eigenfunction expansion ideas give a representation of the solution. And then the question is: all right, I have the solution — by the way, the sum technically goes to infinity here, but really I would never do that; I would go to some finite N. So you see, taking this path, there are at least four representations possible for me. I could represent this physics in the original Burgers PDE, which is nonlinear, or in some transformed variable, which is much nicer. But even then, the only reason I like linear systems is that I can write down the solutions. And by the way, if you look at this, I still haven't told you what the solution looks like.
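For reference, the chain of representations being described can be sketched as follows (my reconstruction; signs and conventions may differ from the slides):

```latex
\begin{align*}
&\text{Burgers' equation:} && u_t + u\,u_x = \nu\, u_{xx} \\
&\text{Cole--Hopf transform:} && u = -2\nu\, \frac{v_x}{v} \\
&\text{linearized (heat) equation:} && v_t = \nu\, v_{xx} \\
&\text{eigenfunction expansion on } [0,L]: && v(x,t) = \sum_{n=1}^{N} a_n\, e^{-\nu (n\pi/L)^2 t} \sin\!\left(\frac{n\pi x}{L}\right)
\end{align*}
```

Each line is a different representation of the same physics: nonlinear PDE, linear PDE, and an explicit solution form once the sum is truncated at finite N.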
In order for me to actually produce what the solution looks like, I have to simulate it. So I actually have to — or I can just add these solutions together to produce, for instance, this as my solution, okay? Given some initial data, I can compute the a_n coefficients here, and there's the solution. In other words, for me to actually give you what the field u looks like, I have to do some computation, because this here is just an abstract representation, just as the governing equation is. There's an alternative path. Very rarely do we have something like a Cole-Hopf transform to linearize the system and allow us to do this eigenfunction expansion. An alternative is to represent this by something like a numerical scheme, and here's probably the easiest numerical scheme available to us: finite differences in space and an Euler step in time. This little model here is essentially an approximation to the PDE — an approximation you can bound, by the way, in terms of how you discretize delta x and delta t. Remember, if we collect data, it's at finite delta t and finite delta x, so this is not necessarily a bad representation of what we might actually do in a real system. So this is just an alternative. I can iterate on this for every position and time point, march it forward into the future, and again produce the solution. So here's my point, or the question we need to ask: what should be learned in machine learning? Right? I've given you three representations of the same thing. The full PDE — I can give you that and say, here is the truth, this is the governing equation. But here's another truth, which is the solutions themselves. This is a manifestation of the PDE, but I have to work to get it, and it's actually pretty high dimensional, in the sense that maybe spatially I have n points and there are m time points.
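A minimal sketch of this finite-difference path, under assumed illustration values for the viscosity and grid (not taken from the talk): centered differences in space, a forward-Euler step in time, applied to Burgers' equation on a periodic domain.

```python
import numpy as np

# u_t + u*u_x = nu*u_xx on a periodic domain, the "easiest scheme":
# centered finite differences in space, forward Euler in time.
nu, L, n = 0.1, 2 * np.pi, 128            # illustration values
dx = L / n
dt = 0.2 * dx**2 / nu                      # respect the diffusive stability limit
x = np.linspace(0, L, n, endpoint=False)
u = np.sin(x)                              # initial condition: one sine wave

def step(u):
    ux  = (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)       # u_x
    uxx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2  # u_xx
    return u + dt * (-u * ux + nu * uxx)                    # Euler update

for _ in range(200):                       # march forward in time
    u = step(u)

print(np.max(np.abs(u)))  # amplitude decays as diffusion smooths the steepening wave
```

Note the point made in the talk: this stepper is itself just another representation of the same physics, with an error you can bound in terms of delta x and delta t.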
So this matrix is n times m, whereas the equation is a very small number of character strokes — maybe 10 or 20 bytes of data encodes the whole equation — whereas the solution is much larger, because it's a representation of the whole field. But I could also learn a stepper, or I could even go directly and learn the solution form. So when you do machine learning, you have options. Normally, when we write down a governing equation, we're still going to solve it. So the question is: if I'm going to go solve it anyway, maybe I could just learn the solutions directly instead of the governing equations, right? These are the kinds of things we think about a lot in terms of what to do under learning paradigms. So let's come back to this, because what I hope I've illustrated is that you have options when you frame this mathematical problem. When I come to this dynamics layer, I start to ask questions. Should I really learn f? Or — you know, what I did with Burgers is transform from this coordinate system x to another where the dynamics was linear, and then to another where I could just write down the solution as an eigenfunction expansion. Maybe I could just go from measurements directly, skip learning the governing equations, and simply write down this expansion in terms of these eigenfunctions, right? That's the kind of thing we start to think about in this learning process. All right, so here's what I'm going to do. I'm going to start this process and start learning different things.
The first thing I'm going to do is say: I have a nonlinear system, and I'm going to see if I can do what we did with the Cole-Hopf transform — instead of learning the governing equations, learn a linear model in some equivalent, learned coordinate system that makes everything linear. That's maybe what we're really saying here. So how do we build a linear model? This is work with Bethany Lusch, and the idea Bethany brought to the table is to say: look, let's consider the pendulum, and I don't mean the linear pendulum — I mean theta double dot equals minus sine theta, the full nonlinear pendulum. That's a harder problem than one might think. What I want to do is say: that's a nonlinear dynamical system; I can take these measurements, and I want to move into a new coordinate system where the dynamics is linear. In other words, y_k is the state at time t_k, and when I go from y_k to y_{k+1}, I want that map to be a linear operator. So I want to learn a transformation to a new coordinate system where the dynamics is linear. This is actually a pretty hard problem, until you realize that in order to make it work, you have to parametrize the linear operator. The pendulum is two-dimensional — you write it down in terms of theta and theta dot — and I know that's true. One of the things you have with a neural net is that you can always go towards infinity, make a really big neural net, but then you've lost a really key part of this, which is: wait a minute, isn't this thing really low dimensional? The minute you go to high dimension, I feel like you've lost, because you've given up on this idea of interpretability. So the model should not only be parsimonious in its representation, it should also be parsimonious in dimension.
And I would like to restrict to the right dimension of the system and build a parsimonious model. So what we're doing here is saying: keep this two-dimensional, but if I want to learn a linear operator, I have to let it depend parametrically on the frequency of the oscillation, which varies with amplitude. This allows us to take the full nonlinear pendulum, theta double dot equals minus sine theta — where, again, you can see some of the more exotic oscillations here — and here you are in the phase plane of theta and theta dot, and you can move yourself into a coordinate system where all the dynamics is linear. There's a linear operator that takes your solutions forward in time, okay? All the way out to the saddle, which is the pendulum sticking straight up. So this is one way to start thinking about what you can do with a neural net: I've preserved the dimensionality and I've made this nonlinear problem linear. We can make this a little fancier by taking a harder problem. This is a partial differential equation: flow around a cylinder. So it's Navier-Stokes, and this flow around the cylinder produces von Kármán vortex shedding. If you take snapshots of this and you look at a low-dimensional embedding — which you can get through the singular value decomposition — you find there are three dominant modes that dictate the dynamics. Now this is very low dimensional, three dimensional, and you can take that three-dimensional system and embed it in a linear system. So you can turn the flow around a cylinder, which is a fully nonlinear dynamical system, into a linear system. Notice what I'm doing, right? I am enforcing this. I am saying: hey, look, I like linear systems; just find me a coordinate system under which this dynamics is linear. And I'm able to do that. Finally, I'll ramp this idea up significantly. This is the work of Craig Gin. Craig was a fantastic postdoc.
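The snapshot-plus-SVD step described here can be sketched with synthetic data. This toy field is made up, not an actual cylinder wake; the point is only that stacking snapshots into a matrix and taking the singular value decomposition exposes how few modes dominate.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 200)   # space
t = np.linspace(0, 10, 400)          # time
T, X = np.meshgrid(t, x)

# Synthetic field: three dominant space-time modes plus a little noise,
# loosely mimicking the low-rank structure of vortex shedding.
field = (np.sin(X) * np.cos(2 * T)
         + 0.5 * np.sin(2 * X) * np.sin(2 * T)
         + 0.2 * np.cos(3 * X) * np.cos(4 * T)
         + 0.01 * rng.standard_normal(X.shape))

# Each column of `field` is a snapshot; the SVD reveals the rank.
U, s, Vt = np.linalg.svd(field, full_matrices=False)
energy = s**2 / np.sum(s**2)
print(np.sum(energy[:3]))  # the first three modes capture nearly all the variance
```

The columns of U are the spatial modes; keeping the leading three gives the low-dimensional embedding in which the dynamics is then modeled.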
He kind of took over after Bethany, and he did something that I'm still surprised he was able to do, which is that we considered something like the Kuramoto-Sivashinsky PDE, which generates spatio-temporal chaos — a very difficult system to characterize, again because of the spatio-temporal chaotic behavior it has. And I didn't think we had any shot at this, but what he did was say: look, let's take training data, simulation data, from this, and let's learn a coordinate transformation to a new latent representation v, where I want the dynamics in this new representation to be linear. And in fact, this works. So what he discovered is a linearizing transformation for the Kuramoto-Sivashinsky PDE — in other words, a transformation equivalent to Cole-Hopf. Cole-Hopf took a nonlinear PDE and made it linear, and what Craig did is take a deep neural net and learn the same kind of linearizing transformation for a much more complicated system. This is very much like the inverse scattering transform. There are only a few PDEs that we know of that you can actually perfectly linearize: Burgers via Cole-Hopf, as well as the completely integrable PDEs like KdV and nonlinear Schrödinger. But it seems to be something you can actually do if you have sufficient training data. And again, we've moved from the original governing-equation truth, which is some nonlinear PDE about which I have only qualitative knowledge — I have to actually simulate it to see the behavior — to a linear system in v, where I can produce the eigenvalues and eigenfunctions of a matrix and learn quite a bit, okay? So that's the kind of thing we're looking at here. I could also just go backwards a little bit in time, and what I mean by that is: let's go back to the idea of Claudius Ptolemy.
Ptolemy just said: look, if I have some ultimately persistent motion, I'm going to represent it as circles on circles — in other words, oscillations at different frequencies. And this is how he represented the motion of the planets. We can do something similar: for any long-time persistent dynamics, one representation that's actually advantageous is Fourier modes, cosines and sines, okay? So we're going to do that here, and this is work with Henning Lange. In other words, if you remember, in my Burgers solution you could take the Burgers equation through Cole-Hopf, turn it into a heat equation, solve the heat equation, and write down a Fourier representation. Why don't I just learn the Fourier representation to begin with and call that my model for the system? In other words, learn the solutions directly. That's the idea here: skip learning the governing equations or a linear model, and just learn, in some sense, the solution representation. And that's exactly what Henning did. He said: look, I'm going to look at some of these quasi-periodic behaviors that are long-time persistent. For instance, here's some time series data, and it doesn't look very sinusoidal. Obviously you can fit it to sinusoids, and that's one way to do it. But if you don't want to be constrained to a periodic boundary condition, you can't just do an FFT, because FFTs only have prescribed frequencies and require a periodic box. What I'd like to do is learn the frequencies and not be constrained to a periodic box. We can do this through gradient descent on an objective function like this. But more than that, we can use a neural net to take a time series like this, make it as sinusoidal as possible, and then fit my model. So again, it's a coordinate transformation, but now the coordinate transformation is happening in the time variable.
I am warping time to make the signal look sinusoidal. Then, of course, I can learn frequencies and do an amazing job fitting to Fourier modes. So again, think about what I've done here. This is like saying: I'm just going to take the sky data that Kepler or Ptolemy had and learn how to transform it directly into a set of frequencies. Skip learning all the steps in between, and just learn the solution. That's what this does, right? Because that's actually what I'd do if I had a problem I could solve — I'd say, here's the solution to the problem — and that's what this is doing. So how well does this work? I want to highlight the power of this Fourier-basis way of thinking, which is also connected to the Koopman operator. It beats all these modern machine learning methods — LSTMs, GRUs, echo state networks, all of the highfalutin machine learning algorithms — because we're using this physics concept underneath it. And you can see here in the table, the performance is significantly better than other statistical machine learning methods, by using this trick of exploiting the solution structure we've known about for a long time from physics. Here's a different example. I can take a full simulation like this, which is very high dimensional — I'm running a full PDE on it, so it's very expensive — but I can take snapshots of it, look at the low-dimensional structure, and look at what the time variable is doing in these low-dimensional structures. Then I can do this trick that Henning developed: learn a model for that time dependence, and warp time to make it sinusoidal. And what you're looking at here is my solution. In other words, just like I can give you a Fourier representation of a solution, that's exactly what I'm doing for this whole PDE.
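A stripped-down sketch of the "learn the frequencies by gradient descent" idea, with a made-up signal, initial guess, and learning rate. This is not Lange's actual algorithm, just the core trick: treat the frequency as a continuous parameter (not an FFT bin), solve for amplitudes by least squares, and descend on the frequency.

```python
import numpy as np

t = np.linspace(0, 20, 500)
y = 1.5 * np.sin(1.37 * t + 0.4)   # "data": one sinusoid at a non-FFT frequency

w = 1.2                            # initial frequency guess
for _ in range(2000):
    # For the current w, the best amplitude/phase follow from least squares.
    A = np.column_stack([np.sin(w * t), np.cos(w * t)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = A @ coef - y               # residual
    # Gradient of 0.5*||r||^2 with respect to w (coefficients held at optimum).
    dA = np.column_stack([t * np.cos(w * t), -t * np.sin(w * t)])
    w -= 1e-6 * r @ (dA @ coef)

print(round(w, 2))                 # recovers the true frequency, 1.37
```

A real implementation would need care with initialization (the loss in w has local minima far from the truth), which is part of what the learned time-warping helps with.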
So now I can evaluate the solution at any point in time, because I have an expression for the solution, right? In other words, I've learned the solution, not the PDE, okay? And the question starts to turn on its head: should I be learning governing equations, or should I just learn the solution form? Because ultimately, even if you learn the governing equations, the first thing you're going to do is try to get a solution out of them. This just says: skip the governing equations, go right to solutions, and I have it, okay? There are alternatives. The alternative I want to talk about here is, instead of linear models, we could try to learn governing equations directly. So let's learn nonlinear dynamics. We've developed this method called the sparse identification of nonlinear dynamics (SINDy). The idea is: I give you time series data — I don't necessarily know what the model was, I just give you time series data, let's call it X — and you know it came from some model x dot = f(x); you just don't know what f(x) is. If I give you X, you can produce X dot. And then f(x) — what is it? Well, you don't know. However, you can posit a library of potential right-hand-side terms. f(x) could be lots of different things, so let's make a library of all the things I think it could be. Here's an example where we put in a bunch of polynomials up to degree five; this is a matrix Theta(X). So ultimately what I'm going to solve here is Ax = b, nothing harder than that — an over-determined Ax = b. And to solve it, since it's over-determined, what I'm going to do is impose a solution with the sparsest loading of these library terms.
In other words, this here is the x I'm solving for: the weight or coefficient in front of every library term. And what I know is, when I do physics, there are usually only a few terms that matter. So I'm gonna promote a sparse solution, and by doing so, these dots tell you the nonzero coefficients, and they're exactly the Lorenz system. In other words, you gave me the time series, we did the sparse regression, and what it said is, here are the terms that matter, and they're exactly the terms that produce the equations. You can do this for prototypical problems like this flow around a cylinder. Again, you take the data, you take snapshots, you do this regression procedure, and you write down the dynamical system for the evolution in this low-dimensional space. And by the way, this took about 30 years to derive; this was done by Noack and coworkers in 2003, to really understand essentially the normal form that dictates the bifurcation for this von Kármán vortex shedding. And you just discover it as part of this Ax equals b regression architecture. You can also use this for PDEs, and Sam Rudy, one of our grad students who graduated a while ago, did this. Here's a bunch of canonical PDEs we often study in applied math and mathematical physics, and for every one of these, if I just give you spatio-temporal data, you can discover the governing equation. So there you have it. Including: if I measure the vorticity of a flow field, I can give you back, discover, Navier-Stokes. So there's no pillbox derivation, there's nothing like that. This is actually discovering what we think of as our governing equations. But of course, you can still ask, yeah, but once you have this governing equation, what are you gonna do with it? Well, I'm gonna say I have the truth, but then I'm gonna have to figure out how to solve it. So I'm gonna either do numerics or find some analytic techniques.
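To make the SINDy step concrete, here is a hedged, self-contained sketch (not the published code): simulate the Lorenz system, build a quadratic polynomial library Theta(X), and solve the over-determined Ax = b problem with sequentially thresholded least squares. Exact derivatives stand in for numerically differentiated data to keep the example clean.

```python
import numpy as np

def lorenz(s, sigma=10.0, rho=28.0, beta=8/3):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

# simulate with RK4 to build the time series X, with exact derivatives
dt, n = 0.002, 5000
X = np.empty((n, 3)); X[0] = [1.0, 1.0, 1.0]
for k in range(n - 1):
    s = X[k]
    k1 = lorenz(s); k2 = lorenz(s + dt/2 * k1)
    k3 = lorenz(s + dt/2 * k2); k4 = lorenz(s + dt * k3)
    X[k + 1] = s + dt/6 * (k1 + 2*k2 + 2*k3 + k4)
Xdot = np.array([lorenz(s) for s in X])

# candidate library Theta(X): polynomials up to degree two
x, y, z = X.T
Theta = np.column_stack([np.ones(n), x, y, z, x*x, x*y, x*z, y*y, y*z, z*z])
names = ['1', 'x', 'y', 'z', 'xx', 'xy', 'xz', 'yy', 'yz', 'zz']

def stlsq(Theta, dXdt, lam=0.1, iters=10):
    """Sequentially thresholded least squares: the sparse solve in SINDy."""
    Xi, *_ = np.linalg.lstsq(Theta, dXdt, rcond=None)
    for _ in range(iters):
        Xi[np.abs(Xi) < lam] = 0.0             # kill small coefficients
        for j in range(dXdt.shape[1]):          # refit the survivors
            big = np.abs(Xi[:, j]) >= lam
            Xi[big, j], *_ = np.linalg.lstsq(Theta[:, big], dXdt[:, j],
                                             rcond=None)
    return Xi

Xi = stlsq(Theta, Xdot)  # sparse coefficients: exactly the Lorenz terms
```

The nonzero entries of `Xi` land on exactly the seven Lorenz terms, with the correct coefficients.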
And so again, what I was saying previously: even if I give you this, maybe the next step in the process is to find a coordinate transformation to make it linear, which in fact you can do for KdV with the inverse scattering transform. Then you can write everything down in terms of a set of eigenfunctions and eigenvalues in this new coordinate system. Now, in this model discovery, I actually gave you the state space, but oftentimes you have to discover the coordinates and the dynamics at the same time. So here's what Kathleen Champion did, and I'll give you a very simple example, because in some sense this is where I really like what Kathleen did. She said, look, oftentimes you don't know what the right coordinate is. So how about we do the following? We come back to the architecture I started with at the beginning: give it input data, and its objective is to learn a coordinate transformation into a new variable set; let's call it the latent space. And then the question is, what do you want to have happen in that latent space? Well, from this model discovery point of view, in the latent space I want to discover the governing equations. So what I can do is impose that SINDy architecture right there in the latent space. So the model discovery is gonna go the following way. I'm gonna take data and say, find a coordinate system where you can find some parsimonious representation of the physics, versus what we did previously, which was find a coordinate system where you can write down a linear model. So now I'm gonna allow it to be a nonlinear model. And the canonical example is this: I give you videos of a pendulum. There it is, we've attached it to a wall. In other words, when you look at this, because you have physics training, you'd say, look, yes, it's a pendulum, and the way I would write down the governing equation is in theta and theta dot.
But what this has is just the video. It doesn't know there's a theta and theta dot, nor does it know the governing equations. So the goal here is, first, that coordinate transformation has to learn how to parametrize the coordinates correctly, which it can do by learning theta and theta dot. And then once it's learned that, it has to learn the governing equation in that coordinate system. And in fact, that's exactly what Kathleen was able to do. She was able to take a video of physics, without being told there's a theta and theta dot; it first learns there's a theta and theta dot, and then it learns that the parsimonious dynamics are given by this model. So really, I was super pleased with what Kathleen did here, because this gets us to what we tend to do in physics, which is what we've always done: if I can just write down the right coordinate system, I can solve the problem. This is what we did when we changed to the heliocentric universe. In fact, this is exactly it: I observe the night sky, I transform the coordinates into a heliocentric coordinate system, then I build my model there. Same thing with the pendulum. Okay. So there's one last thing I want to talk about in all this model discovery, which is, how do we actually make use of all the physics we know? So far I've been saying we model these systems as if we didn't know anything. And of course, you wouldn't be at this seminar if you didn't know anything. You've studied physics or math, or you have background knowledge of a lot of systems. So how do we bring known physics into a modeling paradigm? In fact, I'm a big proponent of the idea that you should use all the knowledge at your disposal to build better models, okay? And so the discrepancy modeling framework starts out with this idea.
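The joint coordinates-plus-dynamics discovery above can be sketched schematically. This is emphatically not Kathleen Champion's actual implementation: the shapes, single-layer encoder/decoder, library terms, and random untrained weights are all made up to show only how the pieces of the objective fit together. A real version trains `We`, `Wd`, and `Xi` jointly by gradient descent on the weighted sum of these losses.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy shapes: 64-pixel "video frames" in, a 2-dim latent (like theta, theta dot)
n_frames, n_pix, n_lat, n_lib = 100, 64, 2, 5

# hypothetical single-layer encoder/decoder weights (untrained, random)
We = rng.normal(0, 0.1, (n_lat, n_pix))
Wd = rng.normal(0, 0.1, (n_pix, n_lat))
Xi = rng.normal(0, 0.1, (n_lib, n_lat))     # SINDy coefficients, learned jointly

def encode(X):
    return np.tanh(X @ We.T)

def decode(Z):
    return Z @ Wd.T

def library(Z):
    """Candidate right-hand-side terms posed in the latent space."""
    z1, z2 = Z.T
    return np.column_stack([np.ones(len(Z)), z1, z2, np.sin(z1), z1 * z2])

X = rng.normal(size=(n_frames, n_pix))      # stand-in for video frames
Z = encode(X)
Zdot_fd = np.gradient(Z, axis=0)            # finite-difference latent derivative

# the three pieces of the joint objective: reconstruct the frames, make the
# latent dynamics match a sparse SINDy model, and penalize non-sparse Xi
loss_recon = np.mean((decode(Z) - X) ** 2)
loss_sindy = np.mean((Zdot_fd - library(Z) @ Xi) ** 2)
loss_sparse = np.sum(np.abs(Xi))
```

Minimizing the weighted sum forces the encoder to find a coordinate system in which only a few library terms are active.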
So instead of discovering a model from scratch, you start with partial knowledge of your physics. You might know that there's some kind of idealized Hamiltonian or Lagrangian, or that there are some conservation laws or some symmetries. So you might know this is an imperfect model, but there's some truth to it: it has these symmetries, this conservation law. So here's your partial model of the physics. The problem is the partial model doesn't quite explain what you see. You're missing some physics, and you have to learn that. So how do you do that? There we go, I'm a little slow here, there we go. The idea is I then have to learn the missing physics. So think about this a little bit, right? I have an imperfect model; I learn a discrepancy. So what is the missing physics? Well, the nice thing about this SINDy architecture is that it's trivial to use it in the context of discrepancy physics. I just have to take this f of x and move it to the left-hand side. Remember, I solved Ax equals b with SINDy, and b, the right-hand side, was just the derivatives of the time series. But now it's the derivatives of the time series minus the physics I know; that's b. Now you look in a library for what the missing physics might be. Same architecture, super clean, and it allows you to start building better models. Now, why is this important? Well, in the modern world, what we're looking at in engineering systems, especially physics-based systems, is these digital twin concepts. What you're looking at on the left is a CAD emulator of a real robot. The problem with the digital twin is that that robot on the left, the emulator, is actually not accurate enough to represent the robot on the right, because it's an idealized, Platonic model, right? Perfect physics. Whereas here, there's sticking in the joints, and after you oil it, maybe the physics changes a little bit.
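The move-the-known-physics-to-the-left-hand-side trick is easy to sketch. Here is a hedged toy example (the system, coefficients, and library are invented for illustration): the "true" dynamics are a damped oscillator with an unknown cubic stiffness term, the known partial model is the linear oscillator, and regressing the residual b = x dot minus f_known(x) against a library recovers exactly the missing cubic term.

```python
import numpy as np

def f_true(s):
    """True dynamics: damped oscillator plus an unknown cubic term."""
    x, v = s
    return np.array([v, -x - 0.1 * v - 0.5 * x**3])

def f_known(s):
    """The physics we think we know: just the linear damped oscillator."""
    x, v = s
    return np.array([s[1], -s[0] - 0.1 * s[1]])

# simulate the true system with RK4 to get the "measured" time series
dt, n = 0.01, 4000
S = np.empty((n, 2)); S[0] = [2.0, 0.0]
for k in range(n - 1):
    s = S[k]
    k1 = f_true(s); k2 = f_true(s + dt/2 * k1)
    k3 = f_true(s + dt/2 * k2); k4 = f_true(s + dt * k3)
    S[k + 1] = s + dt/6 * (k1 + 2*k2 + 2*k3 + k4)

# residual b = (derivative of the data) - (known physics); here the exact
# derivative stands in for a numerically differentiated measurement
resid = np.array([f_true(s) - f_known(s) for s in S])
x, v = S.T
Theta = np.column_stack([x, v, x**2, x*v, v**2, x**3, v**3])
names = ['x', 'v', 'xx', 'xv', 'vv', 'xxx', 'vvv']
Xi, *_ = np.linalg.lstsq(Theta, resid, rcond=None)
Xi[np.abs(Xi) < 0.05] = 0.0     # one thresholding pass for sparsity
```

The only surviving coefficient is the `x**3` term in the velocity equation, with value -0.5: exactly the missing physics.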
So what you need to do, for our precision manufacturing friends who use this, is put sensors on here and tell this CAD drawing: here's the physics you're missing. And if you can build that in, now this digital twin will be an accurate representation of the real physics itself. Okay, let me see if I can, okay. All right, some thoughts to finish this up. So far, I hope I've convinced you that there are options around what you learn, right? As I've said, you can just learn a linear model. Going back to that Burgers' equation, one thing to do is just learn the Cole-Hopf transformation and then the linear model, and linear models are nice; if you have a linear model, you can just do a decomposition. And I showed you how you can learn the solution form directly, in terms of, let's say, formulas. Or you can learn a governing equation. Or we could just leave this all behind and say, well, when I go to multi-scale physics problems, which are very difficult, maybe I just untether myself altogether from this interpretability angle and say, I wanna learn some physics. If I'm doing a multi-scale system, it's very difficult to model, but I could just learn flow maps. In other words, there's some multi-scale physics, and this is work with Yuying Liu, who said, look, I have some physics processes that are all interacting on different time scales. I could simulate this, but of course I'm having to simulate the fastest scales to resolve it, so it's very expensive. And it's not even clear I have that much understanding, except that it's very sensitive to data, and I don't really know how coarse-graining happens. We're not very good at this. But what I can do is say, how about this: why don't I just learn a flow map for fast time scales with some step size delta t, a flow map for medium scales, and a flow map for large scales?
So I'm just gonna take training pairs and learn: how does the state go from this time to this time? And you build a very simple feedforward network to learn that. If you have enough training data, it starts to learn that mapping forward in time. And you can learn lots of different time scales simultaneously and then glue together your solution with these. So here are just some pictures I wanna show you. If you learn these multi-scale flow maps, here's some ground truth. Some of these are actually governed by equations of physics, like the Kuramoto-Sivashinsky. But some of them are just things like music scores, okay? So here's a prelude and fugue, a cylinder flow, and here's just a movie, a video of a flower growing. This flow map architecture doesn't really care what you show it, whether it has physics or not. This prelude does not have physics, right? It's a music score; someone made it up and people play it. Same thing with the video: you're just watching this flower grow. But our method is able, by doing this multi-scale decomposition, to do a beautiful job of taking in the time course and learning how to forecast the future state, whereas if you try to do this with standard LSTMs or echo state networks or clockwork RNNs, your modern high-end time-stepping algorithms, they just don't work very well. They're not accounting for the multi-scale nature of the physics, and they fail. So even if you're gonna ditch governing equations, or even solution forms that you like, you'd still do better by taking into account some of the physics properties, like the multiple time scales that you know are there in the physics. You build that in and you get yourself better representations. Final thoughts: a governing equation is like the DNA of a system. If you give me governing equations like Navier-Stokes, you can express them in one line, right?
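The flow-map idea can be sketched with a hedged toy example. The actual method fits feedforward networks; here, for a linear two-timescale toy system (invented for this sketch), a linear least-squares fit plays the role of the network, learning one flow map per step size from training pairs and then gluing coarse and fine steps together for a forecast.

```python
import numpy as np

def step(s, dt):
    """Exact flow over dt for a stiff toy system: fast decay + slow rotation."""
    fast = s[0] * np.exp(-50 * dt)
    c, sn = np.cos(dt), np.sin(dt)
    return np.array([fast, c * s[1] - sn * s[2], sn * s[1] + c * s[2]])

rng = np.random.default_rng(1)

def learn_flow_map(dt, n_pairs=200):
    """Fit a map F with s(t+dt) ~ F s(t) from training pairs.

    A feedforward network plays this role in the actual method; linear
    least squares is the minimal stand-in for this linear toy problem.
    """
    S0 = rng.normal(size=(n_pairs, 3))
    S1 = np.array([step(s, dt) for s in S0])
    F, *_ = np.linalg.lstsq(S0, S1, rcond=None)
    return F.T

F_fine = learn_flow_map(0.01)     # small-step map for the fast scale
F_coarse = learn_flow_map(0.64)   # large-step map for the slow scale

# forecast 1.29 time units by gluing scales: two coarse steps + one fine step
s0 = np.array([1.0, 1.0, 0.0])
pred = F_fine @ (F_coarse @ (F_coarse @ s0))
truth = step(s0, 2 * 0.64 + 0.01)
```

The composed prediction matches the exact flow: most of the horizon is covered cheaply by the coarse map, with the fine map filling in the remainder.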
Or even if you typed it on your computer, it'd be like 20 characters. But if I say, yeah, but I wanna see the flow field, Navier-Stokes tells me nothing about what the actual flow field looks like in terms of spatio-temporal coordinates. I'd actually have to simulate that system to generate for you big data, which is a specific instance of a flow field. So in other words, the governing equations are like DNA. And what does DNA do? DNA can produce an entire human being, but only after 18 years do you have this adult person who can vote, right? You have to wait 18 years to get this person from the DNA. Same thing with Navier-Stokes: you have to simulate it for many hours on a computer, maybe, or less time, depending on what you're actually doing, and then you get out a full spatio-temporal field. An architecture that we've been developing, which I think is a really nice blend between governing equations and the ability to just generate flow fields, is this neural implicit flow representation. This was Shaowu Pan. And the idea is to say, look, here's what we're gonna do. We're gonna train on data, and we're gonna have two networks. One is what's called ShapeNet, which takes x, y, z locations in 3D and maps them to the solution u itself, from some spatio-temporal PDE. But we're gonna parametrize this by a second network with external factors, time being one of them, but also parameter dependence. In other words, this second network is gonna learn how to parametrize the spatial field. So we can actually use this. What's interesting is that this allows us full spatio-temporal compression, which is parametrized here through some squeeze-down into a latent representation. And this is very similar to what's called DeepONet and neural fields, which were developed out of Brown and Caltech separately.
But now, by parametrizing time over here, what we have is a much more flexible framework. Once you have this encoding, you can say, you know what, I wanna see the flow field at this parameter value and this set of times, and you can just generate it right through here. You put this in, and it'll generate that flow field very fast compared to doing the full simulation itself. And it's a very compressed representation. And just to show this, we've done this for full multi-scale, spatio-temporal turbulence data. We can take these massive turbulence data sets, like the Johns Hopkins turbulence data sets, which if you wanna put them on your computer are just huge. And we can learn the representation for the flow physics in terms of a very compact representation, which is this neural net. Now from this neural net you can generate flow fields. Instead of doing a massive simulation or storing all this data, this becomes a paradigm which replaces a governing equation, because a governing equation can't reproduce this unless I actually simulate it, which is expensive. But this is somehow a hybrid: bringing in the DNA of the governing equations and yet having a representation of the full high-dimensional flow physics, all built into one. All right, so I'll conclude here with some thoughts. Ultimately, I never backed away from the idea that parsimony is a really important concept we wanna go with: how do we encode things in some representation that's interpretable? But my hope is that what I showed you is that there are a lot of different things you can learn from the data measurements. You could learn the governing equations.
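Structurally, the two-network idea can be sketched as a tiny hypernetwork. This is only a schematic with made-up layer sizes and untrained random weights, not the actual neural implicit flow implementation: a "ParamNet" maps the external factors (time t and a parameter mu) to the full weight vector of a small "ShapeNet", which then maps a spatial location to the field value u.

```python
import numpy as np

rng = np.random.default_rng(2)

# ShapeNet: a tiny MLP (x, y) -> u whose weights are *produced* by ParamNet
n_in, n_hid = 2, 16
n_weights = n_in * n_hid + n_hid + n_hid * 1 + 1   # all ShapeNet parameters

# ParamNet: maps external factors (t, mu) to ShapeNet's weight vector.
# Untrained random layers here -- a structural sketch, not a trained model.
W1 = rng.normal(0, 0.5, (32, 2)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.5, (n_weights, 32)); b2 = np.zeros(n_weights)

def param_net(t, mu):
    h = np.tanh(W1 @ np.array([t, mu]) + b1)
    return W2 @ h + b2

def shape_net(xy, w):
    """Evaluate the field at a point using the generated weight vector w."""
    i = 0
    Wa = w[i:i + n_in * n_hid].reshape(n_hid, n_in); i += n_in * n_hid
    ba = w[i:i + n_hid]; i += n_hid
    Wb = w[i:i + n_hid].reshape(1, n_hid); i += n_hid
    bb = w[i]
    return (Wb @ np.tanh(Wa @ xy + ba) + bb)[0]

# query the field u(x, y; t, mu) at any point, time, and parameter value
w = param_net(t=0.3, mu=1.5)
u = shape_net(np.array([0.2, -0.4]), w)
```

After training, querying a new time or parameter costs one ParamNet pass plus pointwise ShapeNet evaluations, which is the fast generation and compression the talk describes.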
You could go one step further and learn a coordinate system in which those governing equations are linear, or you could go one step further still and say, well, if it's a linear model, I would write down some eigenfunctions and eigenvalues; maybe I just learn those directly. So there are a lot of different possibilities for learning a representation beyond the governing equations, because even if I give you governing equations, you're still gonna have to solve them. Maybe for some of our problems the possibility is to just give you the solution directly: here are the solutions to my problem, these eigenfunctions and eigenvalues, and I glue them together and I get a solution, versus I have a PDE and I've gotta do all this work of solving it to get what I actually wanted in the first place, which was a solution. So I'll stop there and take questions, happy to answer anything you guys have. Yes, thank you very much, that was a very interesting talk. So indeed, as Nathan mentioned, if anyone has any question, don't hesitate to raise your hand or ask it out loud. But otherwise, I actually have a very tiny question. In the model you showed, the one Kathleen I think developed, where you retrieve the governing equation, I was just wondering how do you... Oh yeah, so this is Kathleen's stuff, you said? Yes, yes. Yeah, so let me come right to here. Yes, something like this, okay. Yeah, I was just wondering, how do you encode the output of this network? How do you know you have this right, the purple arrow, where the output would match this equation? Do you give it like a large dictionary? Yeah, what we do is we have a SINDy layer here, and what we're enforcing is saying, look, you can build any model you want, x dot equals f of x, and f of x is coming from a library, but what I'm gonna force you to do is build a model where you can only use a few terms. So it's forced to parsimony by my restriction of imposing sparsity.
So it's forced to say, okay, if you didn't tell me that, I could build you anything to fit the data, but now you're saying I can use only a small number of terms. Okay, that's a big restriction, and then what it has to do is find a coordinate system that allows that. This is the magic of what Kathleen did: she could actually take a video of a pendulum, give it no knowledge, and it actually learns, okay, in order to encode this in some parsimonious way, I have to learn this coordinate system here, and then it learns this model. So we were like, wow, Kathleen, you did something amazing, because that's exactly what we want to do. I like to call this GoPro physics, right? You know how you have these GoPros and people do all these cool adventures and film them? Well, that's what we're doing now with physics. You just film it: I've got a video, I've got this high-speed camera, massive resolution. What do you do with the video? Normally you spend a bunch of time processing it. What this is saying is: take the video, feed it directly in, film the physics phenomenon, and see if it can learn how to write down a coordinate system and governing equations directly from the video, versus you having to do all the work. Yeah, interesting. So to go to more practical terms, this is an autoencoder; you're not using a variational autoencoder in this case? No, it's an autoencoder: in, and then we want to be able to come back out here with a decoder, yeah. Yeah, okay. And you know, there's a lot to this, right? Like, how do I pick the dimension here? How do I know, if I don't know ahead of time, how big this should be? And typically what you're doing here is, the way I like to say everything is in machine learning is, here's a nice way to say it: oh, we hyperparameter-tuned this.
In other words, I play around with the dimension; if I can make it smaller, I do, until I lose performance, and that's where I cut it off. Another way to say it, which is the more cynical way, and probably the more correct way, is: I don't really know what I'm doing, but I like to play around until it works. And if you're in computer science, you call that hyperparameter tuning, which all that means is, I didn't really know anything, so I played around until it worked. I'm just trying to be a little bit honest about what those terms mean, I think. Yeah, that's it. Thank you. But I also therefore have a final question regarding this. That means you've trained this architecture on this very specific example and retrieved a rather general equation of a pendulum. What happens if you feed it a very different kind of pendulum, or the same pendulum with a different kind of video, or a different feed? Is it still able to generalize? Okay, so this gets to a really interesting concept. Let's take the pendulum and say, here's a pendulum, and I've doubled its length. And I give you a new video with the pendulum at double the length. By the way, this is equivalent to: I give you the orbits of Mars, train it, but now I give you Jupiter. So it turns out, if you learned this model, this model in some sense encodes a generalizable physics, right? This is where it encodes F equals ma. And what you know is, well, F equals ma for Mars and Jupiter is the same, just with different coefficients. So I can still use this model and learn new coefficients, but the coordinate system I have to relearn. I can't use the Mars coordinate system for Jupiter.
So the coordinate transformation itself has to change. If you double the length of the pendulum, you can fix the model right in the middle layer, but you're gonna have to relearn how to get to that coordinate system, because with a double-length pendulum, both systems are theta and theta dot, but I have to encode them differently from the video. So the coordinate is not generalizable, but the F equals ma is, which of course is what we know. In fact, as a comment, for those of you who took engineering courses like statics and dynamics, two of the basic courses that all engineers take: the first one is the sum of forces equals zero; the second one, the sum of forces equals ma. And most of the exercises throughout those courses are about constructing the correct free-body diagram. In other words, the correct coordinate system, very much like this problem. This is constructing the right coordinate system, and then I can just impose the physics in it. All right, very interesting. Does anyone else have a question? Otherwise, I'm gonna close this meeting down. All right, well, let me thank you again for this very interesting talk, and I wish everyone a good day or good afternoon, depending on where you are. Thank you very much. Well, thank you guys, and thank you so much for having me, and feel free to follow up with any questions if you have them. Have a good day. Bye bye. Yeah, bye bye.