Okay, welcome back everybody, good afternoon. We will continue — we actually just finished the Dirac medal ceremony for this year, and our idea was to combine that with a nice lecture online. It's a tradition to announce the Spirit of Salam award on this occasion. This award was constituted by Salam's family, and the idea is to select one or more individuals each year — a scientist, associate, visitor, or staff member — who has gone beyond the call of duty, sacrificing their time to further Salam's ideas and vision. This year the Salam family again received several prominent nominations, and I'm pleased to inform you that they have given this year's award to Mohamed Hassan from Sudan, for his lifelong dedication to building sustainable science in the developing world through research, education, policy, and diplomacy; to Gregorio Medrano Asciansio from Spain, for his efforts and vision to create an institute that would overcome geographic and religious barriers to promote unity through science; and to Hilda Cerdeira from Argentina, for her dedication to ensuring that scientists, no matter where they live and work, have equal access to prestigious science literature. Ahmad Salam, a friend of ICTP and son of Abdus Salam, is announcing this year's Spirit of Salam awardees in a pre-recorded message, in parallel on YouTube as we speak; it will be kept on the ICTP website, and you are invited to watch it.

In view of the time, we will move on with David's lecture. As all of you know, David is a distinguished cosmologist, currently the director of the Center for Computational Astrophysics at the Flatiron Institute and one of the leaders in observations of the cosmic microwave background. Today he will tell us about determining the universe's initial conditions. Thank you.

So I actually should note that, in a sense, I have a new job coming soon: I'm about to become president of the Simons Foundation, which, as some of you may know, plays a role in supporting institutes throughout the world and supporting research in math and physics. For those of you who are students — I think I described this when I met with some of you earlier — I feel in the midst of one of those transitions in life that many of us have gone through at various stages, from undergrad to graduate student, or graduate student to postdoc, or postdoc to faculty, where at each stage you discover that you have an enormous number of new things to learn. That's where I am now.

So today's lecture is going to be different. Up to now I've talked about work we have done — work that I've been part of, and work that others in my community have done — on the microwave background. This lecture is going to look forward and talk about work that we are trying to do: very much work in progress, very much an ambition of what we want to accomplish and where I hope at least part of the field of cosmology will go in the coming decade. Our ambition — the summary of the talk — is basically this figure here. We now have very high quality observations of the microwave background radiation that tell us both about the initial conditions and about the intervening universe. We have ever-improving maps of the galaxy distribution.
Here we're showing the data from the latest version of the Sloan survey, but looking ahead to the future, with surveys like DESI and Euclid and Rubin and Roman coming, we will have very high quality maps of the three-dimensional distribution of galaxies. In addition, we will have maps of the distribution of neutral gas through 21-centimeter and Lyman-alpha forest observations, with projects like the Square Kilometre Array. We'll be tracing out the hot gas with projects like eROSITA, which Rashid Sunyaev and others have been involved with and which is really starting to return very exciting data.

The theme of this talk will be: can we go from that data back to the initial conditions in the universe, and infer not only the power spectrum of the fluctuations but the actual initial conditions? A key piece of that is: can we go from initial conditions to observations? There are two pieces of this, in a sense. Can we do the forward modeling correctly — go from initial conditions and capture the physics of not just gravity, but star formation, feedback, the effects of active galactic nuclei that drive powerful winds into the intergalactic medium, the cooling and evolution of gas and the formation of stars? Can we at least describe them on large scales statistically, so that we can go from initial conditions forward to galaxies and make comparison with the observations?

And today is a day when, in giving a talk like this, we should hope that Professor Salam would enjoy the talk, and certainly honor his birthday. In doing this we also honor other pioneers of astronomy whose surveys are driving things forward: Vera Rubin, who was a pioneer in the discovery of dark matter — the Rubin telescope, formerly known as the LSST, is going to survey the positions of billions of galaxies over its lifetime. At the same time, we're planning to use the Roman space telescope, formerly known as WFIRST. That's a telescope I've played a significant role in, though I think my most important role has been advocating for restoring its budget in the Senate and the House after it was cut four times by the Trump administration. As many of you know, because of their lack of support for this mission, we've now replaced Donald Trump. That's perhaps not the leading reason, but one of the many impacts of the recent change in the United States is that I'm now pretty confident that later this decade we will be in a situation where we survey much of the sky at high resolution, with images of billions of galaxies, and with spectroscopy.

One of the motivations for today's talk is knowing that data is coming — also knowing that we're going to have data like I talked about earlier in the series, with detailed maps of the large-scale distribution of mass and the large-scale distribution of galaxies — and wanting to think about how we infer information from it. Right: we know where the mass is, we know where the stars are. We want to ask, can we learn more about our basic model? At a minimum, can we determine some of the basic parameters of this model — the density of matter and baryons, the expansion rate, the initial conditions, things like the neutrino mass and the properties of dark energy, the effective number of neutrino species — and perhaps detect physics beyond this model? How can we best extract information? With observations of the microwave background, we knew what to do for parameter estimation.
We know that the field is very close to Gaussian, and we know that the optimal statistic is the power spectrum. There's a reason why, in the previous two talks, I kept putting up lots of power spectra: it was the right thing to measure. When we look at the large-scale distribution of galaxies, we're looking at a field that has become highly non-Gaussian, and we want to know: what is the right way to get the maximum amount of information from this? What is the optimal statistic? What's the best way to go from observations to theory, to compute? Right now the approach being taken in almost all of these projects — and I'm part of the very large science teams for Euclid and Roman and Rubin — is primarily to measure the two-point function on large scales and make the approximation that things are Gaussian. But when we look at these big surveys, I think we can do better.

In fact, back in 1996, when we were putting together the WMAP proposal — it was then the MAP proposal — Chuck Bennett, who led the mission, turned to me as the lead theorist on the project and said: we work very hard to make these maps, and all you theorists want to do is measure two-point functions. Is there something beyond the two-point function? That actually was some of the motivation for the work I did soon afterwards with Eiichiro Komatsu in his thesis, where we looked at three-point functions and tests of non-Gaussianity. But in the case of the CMB, that was about it. With the large-scale structure data, I think we can do more, and trying to think how to do that is really the theme of what we want to do.

A lot of the things I'll talk about today draw on analyses of the set of simulations led by Francisco Villaescusa-Navarro — here's a picture of the team; Paco's own picture is not on the slide, since he gave me the slide, so I have to apologize to exactly the person who led all this. We've assembled a pretty big team of people here in New York and also in Princeton, but really throughout the country, plus a number of collaborators throughout Europe, looking at these large simulation suites and trying to use them to learn what information we can extract from the surveys. The idea here: we've done about 4,000 separate simulations. These are not the largest individual simulations, but it's the largest total volume. People have usually taken the approach of doing one single big simulation of the universe with the best possible parameters; we were interested in understanding the dependence on parameters — basically, to be able to do things like measure the Fisher matrix. Compute a large-scale structure statistic, but compute that statistic on 1,000 different realizations, 2,000 different realizations, each with different cosmological parameters, so we can explore the relationship between initial conditions and data, see what we can infer, and effectively measure the information content of these surveys (a minimal sketch of this Fisher-matrix bookkeeping follows below). What these figures show is a whole bunch of different statistics of the large-scale structure: what can we get from two-point statistics, from void statistics, from marked statistics, from things like the bispectrum?
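To make that concrete, here is a minimal sketch of the Fisher-matrix bookkeeping just described — a hedged illustration, not the collaboration's actual pipeline. It assumes you already have arrays of some summary statistic (say, a power spectrum or marked power spectrum binned in k) measured on each simulated realization; all names are ours.

```python
import numpy as np

def fisher_matrix(stats_fid, stats_plus, stats_minus, dtheta):
    """Fisher matrix for a binned summary statistic, estimated from simulation suites.

    stats_fid   : (n_real, n_bins) statistic at the fiducial cosmology
    stats_plus  : (n_par, n_real, n_bins) statistic with parameter i stepped up
    stats_minus : (n_par, n_real, n_bins) statistic with parameter i stepped down
    dtheta      : (n_par,) parameter step sizes
    """
    # Covariance of the statistic, estimated from the fiducial realizations
    cov_inv = np.linalg.inv(np.cov(stats_fid, rowvar=False))

    # Central-difference derivatives of the mean statistic w.r.t. each parameter
    deriv = (stats_plus.mean(axis=1) - stats_minus.mean(axis=1)) / (2.0 * dtheta[:, None])

    # F_ij = (d mu / d theta_i) C^{-1} (d mu / d theta_j)
    return deriv @ cov_inv @ deriv.T

# Marginalized 1-sigma error on parameter i is then sqrt((F^{-1})_ii):
# errors = np.sqrt(np.diag(np.linalg.inv(fisher)))
```

Running this once with the two-point function alone and once with the two-point function plus, e.g., marked or void statistics is exactly the comparison behind the error bars discussed next.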
And as you see in these plots — and you can delve into these papers for more detail — as you add more ways of measuring the field beyond the two-point function, there's additional information, which means we can extract more science about things like the neutrino mass or the density of baryons. For example, we found that with the marked power spectrum we can measure the neutrino mass with error bars that are half the size. That's actually a pretty big deal: it's the difference between a three-sigma and a five-sigma detection. Or, equivalently, ask how big a survey you need to do: if you have a technique that does a factor of two better with the same data, that's equivalent to getting four times as much data, and getting four times as much data gets pretty expensive when we're talking about these billion-dollar projects. So I like to say: a little theory and simulation can save you a billion dollars here and there, and that starts to get to be real money with these big projects. We really do want to think about whether there are ways of inferring things that are more powerful than the standard techniques, and the work we've done with these simulations points in this direction. We've written a series of papers, up here on the arXiv, that have come out over the past year or two. The generic conclusion is that there's just a lot of information in our measurements of large-scale structure beyond what we measure in the power spectrum. That information is there on scales smaller than 30 megaparsecs, and it's kind of waiting for us.

The challenge, I think, is that we know when we get to these smaller scales that there are uncertainties in what astrophysicists like to call baryonic effects. What we mean by that is all of the rich nonlinear physics of gas cooling and turning into stars, and those stars pumping energy out in winds — this shows an image of winds flowing out of a galaxy. These winds are driven both by star formation — stars form in bursts — and by black holes: all galaxies have large supermassive black holes, with masses from a million to a billion solar masses, lurking in their centers. These black holes often have accretion discs that drive powerful jets, and those jets affect the regions around them. Those are complex, effectively stochastic processes; lots of people are working on them and doing great work, but they are not fully understood and very difficult to model. And certainly at the computational level we will never have the dynamic range in a simulation — at least not in our lifetimes — that would let us go from the scales of cosmology to the scales of stars and simulate them correctly in a way that captures all the physics. That's been a limitation that has led people to shy away from looking at smaller scales and looking beyond the two-point statistics.

One way I like to think about this: ultimately, what we'd like to do is recover the initial conditions, and this equation tries to capture what we want to get at. We want to ask: what is the likelihood of the data — whether it's the CMB data or the large-scale structure data — given the initial amplitudes? If I think about simulating the universe as a big box that represents the volume we see, what's the amplitude of all those initial modes? We don't fully know how to capture the astrophysics, but we can often parameterize it by a handful of parameters that capture most of what we care about in terms of how it affects large scales.
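As a hedged reconstruction of the equation being described (the notation here is ours, not lifted from the slide): with $\delta_{\rm ic}$ the initial mode amplitudes, $\alpha$ the handful of astrophysical nuisance parameters, and $\beta$ the observational systematics, the likelihood of data $d$ given cosmological parameters $\theta$ would read

```latex
\mathcal{L}(d \mid \theta)
  = \int \mathrm{d}\delta_{\rm ic}\,\mathrm{d}\alpha\,\mathrm{d}\beta\;
    P\!\left(d \mid \delta_{\rm ic}, \alpha, \beta, \theta\right)\,
    P\!\left(\delta_{\rm ic} \mid \theta\right)\,P(\alpha)\,P(\beta)
```

Marginalizing over $\delta_{\rm ic}$ is what makes this the enormous integral described next.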
How much energy is converted from gas into the supernova explosions that drive winds in star-forming galaxies? What's the velocity of those winds? How much energy is put out by the AGN? We want to marginalize over all of that. And whenever we deal with real data, we have observational systematics, and we need to model them too. Effectively, what we would like to be able to do is evaluate this integral. And while it's straightforward to write down, it's pretty challenging, right? This is an integral over 10^11, 10^12 dimensions, if we're thinking about, say, working at a resolution of a few megaparsecs to capture the volume of the Rubin survey. For each evaluation we have to ask: given the amplitudes, given the astrophysics, given the observational uncertainties, what does the model predict, and how does that compare with the data? We have to forward-model everything. There have been a number of attempts to do this over the last decade or so, with increasing promise — I've referred to a couple of key papers here. But what we want to do is do this in a way that not only captures the gravitational dynamics, which is what previous work has done, but actually tries to marginalize over the astrophysics. The approach we're trying to take is to exploit recent advances in machine learning, which have enabled people to solve a lot of hard nonlinear problems — or at least approximate them — more accurately.

I've already mentioned these challenges: we're working at this huge volume; every evaluation requires going from initial conditions to a galaxy distribution; we're going to need to marginalize over uncertainties in astrophysics; and we'll need to project into the observational plane. Now, as I stressed at the beginning of this talk, this is very much work in progress. I cannot yet tell you that we can do all of this; what I'm going to convey in the rest of the talk is a sense of the progress we've made. As we've been working on this problem over the last couple of years, I've become increasingly optimistic, as we see advances on various pieces of it, that we'll be able to do this — that by the time we reach the end of the decade, when we have the data from these large surveys, we will have the mathematical tools, the simulation techniques, and the statistical techniques we need to more fully exploit the data. In the program we've had at Flatiron we've got, as I mentioned, a large team of people involved, but the real leadership has come from Shirley Ho, the group leader for the cosmology and machine learning group, and Francisco Villaescusa-Navarro — Paco — who has really led the simulation team. And we've already seen real progress on analytical mapping of dark matter from initial conditions to today, on going from dark matter to galaxies and gas, and real progress on marginalization over baryonic physics.

For those of you who haven't worked with machine learning, let me give you my very high-level summary of what these techniques let us do. I think of machine learning — particularly the techniques we've been using, like convolutional neural nets — as efficient and effective ways of approximating functions in high-dimensional spaces. A modest way of describing what has generated tremendous excitement: this is a good way of doing interpolation.
One classic machine learning problem: you're given, as a training set, millions of images of cats and millions of images of dogs, and you train the neural network to differentiate cats from dogs. Let's think about what that means. If I give you an image of a cat, say taken with a 10-megapixel camera in color, that's a 30-million-dimensional space, and you can characterize each image in your data as one point in that 30-million-dimensional space. So you can imagine this space peppered with a bunch of blue points for cats and red points for dogs, and what the neural net is good at doing is fitting a surface in this high-dimensional space, if you like, that separates cats from dogs. It lets you do that classification effectively. The convolutional neural net does this by representing the image with a series of local functions that identify different pieces of it — you don't have to tell it "look for this feature"; it learns that from the data. It will fit this nonlinear function, but it's a challenging minimization problem if you think of it as fitting a function in a high-dimensional space.

One of the key tricks here is a technique called stochastic gradient descent, where you use only a fraction of the data in each attempt to update the weights and compute the gradient. This stochastic gradient technique is noisy; as a result, as you do your optimization and go down the surface, you don't get stuck on local saddle points and minima, because you've got a little bit of heat — it's a bit like simulated annealing, or the way a crystal or a metal forms as you work your way down to the minimum. You can also include symmetries of the function in the representation. These are the basic ingredients of these tools, and people have used them to go from observables to the probability of rain, to classifying cats and dogs. The first success of this, in some ways, was classifying handwritten numbers: when you write a check, the check image is analyzed with a neural net — it's not read by a person anymore. And we want to know if we can use this for cosmology.

The structure of these networks: we start with input data, we have some intermediate layers, and at each point in the network we represent the value as a sum of weights applied to a function of the data, plus a constant. The form of the function varies; it's often a simple nonlinear function — something like a tanh is one example; the sigmoid function is another popular one. What we want to do is fit these weights, and the functions are often kind of local — local manifestations of functions on the data. You'll hear talk about networks having multiple layers, based on the number of these nonlinear mappings you go through in the process. This is the basic form of what we're using.

What got me interested in this, almost three years ago now, was some work that Shirley Ho did with her thesis student at Carnegie Mellon, Siyu He, that trained a network to replicate the results of an N-body simulation. The idea here: you start with initial conditions and first use the Zel'dovich approximation — an effective analytical approximation — to go from initial conditions to an estimate of the final conditions. Then, for training, you start with the same initial conditions and use an N-body simulation to compute the final conditions (a minimal sketch of this kind of training loop follows below).
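As a sketch of the kind of training setup just described — written in PyTorch, with the stochastic-gradient step made explicit. The network class, the paired dataset, and the epoch count (`UNet3D`, `za_nbody_pairs`, `num_epochs`) are hypothetical placeholders of ours, not anything from the talk.

```python
import torch

# Hypothetical ingredients: a 3-D U-Net and a dataset of paired fields --
# the Zel'dovich-approximation estimate as input, the full N-body result
# for the same initial conditions as the target.
model = UNet3D()
loader = torch.utils.data.DataLoader(za_nbody_pairs, batch_size=8, shuffle=True)

# Plain stochastic gradient descent: each step sees only one mini-batch,
# so the gradient is noisy -- the "heat" that helps escape saddle points.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for epoch in range(num_epochs):
    for za_field, nbody_field in loader:
        optimizer.zero_grad()
        prediction = model(za_field)                    # network's guess at the N-body output
        loss = torch.mean((prediction - nbody_field) ** 2)
        loss.backward()                                 # gradient from this mini-batch only
        optimizer.step()                                # noisy downhill step
```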
And then you train this network — this particular form is called a U-Net — to find the weights that let it learn how to go from initial conditions to final conditions. You train it not on one simulation but on many simulations, so that it learns: basically, the Zel'dovich approximation gives you something that looks close to the simulation, but it makes things too fuzzy, and the network improves on the analytical model. You can think of this as providing an analytical map — we're just computing weights in this function. So we have an analytical, differentiable and, as a result, in some forms even invertible map that lets us do this.

To compare how this did against the standard, state-of-the-art analytical techniques: this shows, on the right, what you get with the 2LPT approximation that's often used in analytical models — the errors, the difference between what the analytical theory gives as an approximate description of the nonlinear evolution from perturbation theory and the actual data. You see that there are deviations: in these big red regions, where you have clusters, the analytical theory makes things that are too fuzzy, and the final positions are often off by five megaparsecs. I remind you, the scale of a cluster is about a megaparsec, so you're not getting the dense objects right. On the other hand, this neural net computes the positions accurately, and the errors are now far below a megaparsec. So you can train a network to do much better than the best analytical theory in capturing the nonlinear evolution.

One way to quantify this is to look at the power spectrum you predict versus the power spectrum you get from the simulations. In the plot on the top, the truth — what actually comes out of the simulations — is shown in green, the machine learning prediction is shown in orange, and blue is what you get with our best analytical technique. The plot below shows the transfer function and the correlation. You can see that down to fairly nonlinear scales, machine learning does a pretty good job of reproducing the results of the N-body simulation.

Now you might ask: if we have an N-body simulation, why are we doing this at all? Why do we need machine learning, and why do we want to train this? In this case, it's really just a matter of computational time. Once you've trained the neural net, it's 60 million times faster to evaluate the predictions using the neural net than to actually run the full N-body code. And once you can run 60 million times faster, a set of statistical techniques suddenly becomes available.

This has been applied not only to go from initial conditions to where the dark matter is, but to go from where the dark matter is to where the galaxies are. For this work, what we've done is start with some of the state-of-the-art hydrodynamic simulations — the IllustrisTNG simulations, led by teams under Lars Hernquist at Harvard and Volker Springel at Max Planck, who have made their best simulations available — and we use those to train on how to go from dark matter to galaxies. There's some very complicated function that maps from dark matter to galaxies. In standard approaches, people linearize this and approximate it with a parameter called bias, saying that the galaxy density fluctuations are equal to some constant times the matter fluctuations.
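In the notation usually used for this — with $\delta_g$ and $\delta_m$ the fractional galaxy and matter overdensities and $b$ the bias parameter — that linearized approximation is simply

```latex
\delta_g(\mathbf{x}) \simeq b\,\delta_m(\mathbf{x})
\quad\Longrightarrow\quad
P_{gg}(k) \simeq b^2\,P_{mm}(k)
```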
Well, that's a pretty good approximation on very large scales. On small scales it really does break down; we'd like to know this nonlinear function, and we'd like to improve on the predictions. The thing that does improve on the predictions is the N-body simulations, and this is recent work led by Renan Alves de Oliveira, a visiting graduate student here at Flatiron from Brazil, working with Yin Li, Shirley, and myself. You can see we've now been able to push the scales on which we can reproduce the results of the N-body simulations down to much smaller scales: our neural nets are now good down to wave numbers of one — that's really down to the scale of a megaparsec.

To put this in perspective, if you haven't thought about these scales, and to talk about it in terms of how we handle the data: the current approach, when most people analyze large-scale structure data, is to only use observations down to wave numbers of about 0.1 inverse megaparsec, corresponding to length scales of about 30 megaparsecs. The feeling is that on scales smaller than that, things are nonlinear and we'd have to do simulations to capture the nonlinear physics. Here you can see that we can use the neural nets to make accurate predictions down to wave numbers of one inverse megaparsec. Why do we care? Why is that a big deal? Well, if I can take my observations and use the data down to a wave number k of 1 rather than 0.1, then with 3D data — where the number of independent modes grows as the cube of the maximum wave number — there's a thousand times more information in our observations. So this gives you a sense of the potential information we've been leaving on the table, which we've not taken advantage of, and which we may be able to get with machine learning.

One of the interesting features of the neural nets in this application, for studying the growth of structure, is generalizability. I've talked about machine learning as learning how to do interpolation, and those of you who've done some numerics know that interpolation is usually safe and extrapolation is usually pretty dangerous: if you have information about a function inside a certain region, you extrapolate beyond it at your own peril. What's been interesting to see is that we can train the neural net with one value of Omega matter and run it quite successfully over a pretty wide range of Omega matter values. What the network seems to be learning — and this is very much a speculation, because we're trying to test this hypothesis now — is how nonlinear gravity correlates the phases of the modes, regardless of the amplitude, in a scale-invariant way. That, I think, is the information encoded in the network, and I think that's why it lets us not just work in the training-set region but extrapolate beyond it.

So far we've talked about dark matter, using dark matter simulations and making use of the publicly available IllustrisTNG. In order to take the big step and actually use these techniques to compare with galaxy observations, we need to marginalize over uncertainties in astrophysics. Now, this is a phrase that sometimes makes my friends who work on galaxy formation a little unhappy — they're very eager to understand the very rich physics of galaxy formation.
But as a cosmologist, I just want to step back and marginalize over the uncertainties we don't understand. And it's a very different approach to thinking about the baryonic physics. Rather than attempting to get the best-fit model to the current set of data — which is the approach usually taken in the field: what's our best estimate of things like star formation feedback? — let's look at the uncertainties as we vary the parameters of the star formation physics.

So what we've done in these CAMELS simulations — and it helps to be director of a computational institute if you want to do this — is burn 8 million CPU hours, generate 200 terabytes of data, and run 4,000 simulations: 2,000 hydrodynamic simulations and 2,000 N-body simulations. The hydrodynamic simulations use two different ways of describing galaxy formation: one, the AREPO IllustrisTNG code developed by the Hernquist-Springel group; the other, the GIZMO SIMBA model developed by Romeel Davé and his collaborators. Both codes are used across all these simulations, so we can check which of our conclusions are sensitive to the code we use. The plot I'm showing here shows the effects of baryonic physics: the power spectrum in the model with hydrodynamics compared to the power spectrum in the N-body simulation. Since we're covering the plausible range of feedback, one way of looking at this is: what are the effects of hydro, and what are the uncertainties? If you look at this plot, you'll see that hydro effects do become very important on small scales — at wave numbers of 10 and above there's an enormous difference between models — but once we get to large enough scales, the hydrodynamic uncertainties become small. And the hope is we can actually use observations to at least constrain things on the very smallest scales, and gain more confidence in moving to larger scales.

Having this big set of simulations lets us apply it to a whole bunch of different problems. A paper we put out late last year, led by Leander Thiele, a Princeton graduate student, looked at applying this to our microwave background observations: how do we go from our initial conditions, through the dark matter, directly to predictions of what we should see in a microwave background experiment — predictions of what the electron pressure should be? What is the thermal Sunyaev-Zel'dovich effect? What's the kinetic Sunyaev-Zel'dovich effect? What's the large-scale distribution of electrons? We've trained the neural net to go from the dark matter simulations to match the predictions that come out of our various hydrodynamic simulations, and to be able to marginalize over those uncertainties.

One of the technical challenges for this effort: if you look at the box, almost all of the hot gas sits in very few voxels of the simulation — it's all in very dense regions. This shows the PDF of the hot gas in the simulation, and you can see that in the big-box simulations almost all the voxels are at very low pressure. We're interested in making a map of the integrated pressure, and almost everything we observe is coming from the voxels whose pressure — normalized by the variance — is above one sigma. That's everything in the plot above one, 10^0, on this axis.
And you can see — this is one of those moments where I wish I had a screen I was pointing at, to make sure you're seeing it — that there are very few points above one in that solid blue curve. Almost all the map is empty space as far as the pressure is concerned. So we need to develop techniques that let us do a fit where a relatively small fraction of the data gets most of the weight. To do that, we use an analytical mapping as a first approximation. One of the things we've found in fitting these neural nets, which works as a very good technique: use all the physics you know. Rather than going from initial conditions directly to your predictions with a neural net, use an analytical theory that goes from initial conditions to an approximation of what you think the answer will be, and then train the neural net to improve on that approximation. The approximation we make, for example, treats everything as spherical. With that spherical approximation, we know what to expect if things were spherical and in equilibrium everywhere; that actually gets you 80, 90 percent of the way there. And then we train the neural net to learn the difference (there's a schematic of this residual-learning pattern at the end of this passage).

We can also use the neural net for something I won't talk much about but that we've really just started to delve into — this is work that's part of the thesis of Miles Cranmer, one of our students in the Astrophysics Department at Princeton — which is symbolic regression. We can actually train the network not just to give us a fitting function with all those coefficients buried deep in the network, but to give us an analytical expression that captures what it is doing. The hope is that we can learn some new physics from this, and Miles has been applying it to a wide range of other problems. Some very interesting work he's doing with Shirley Ho and Drummond Fielding here is applying this to turbulence, getting better closure relationships for turbulence than the classical closure relations — they've found they can get a much better fit than some of the classic relationships. So that looks very promising, as an aside.

Going back to our tSZ maps: this shows, first, in black, what comes out of the simulation; in orange, our machine learning network; and the blue curves show an analytical theory and a recalibrated analytical theory. You can see the analytical theory isn't bad — it captures much of what's going on. This is a log plot, so the fact that it's off by 50 percent means you wouldn't want to use it to match observations, but you can pretty easily calibrate a fitting function to go from the analytical theory to the data; it's a relatively smooth recalibration. And not only can we fit the power spectrum, we can pretty accurately fit the one-point function, even though that one-point function is dominated by a handful of regions in the tail.

We've also — this is not work I'm a co-author on — our group here at Flatiron has also looked at how we go from dark matter to galaxies, and trained networks to do this effectively. We're hoping to apply this neural network to the now-available Sloan data. Here's some more from the CAMELS simulations: this just shows, for a number of our simulation pairs — we're looking at simulations from both IllustrisTNG and SIMBA — the dark matter and the gas.
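Here is the schematic of that residual-learning pattern flagged above — a sketch under stated assumptions, where `analytic_model` stands for whatever analytical first approximation is available (for the pressure maps, the spherical, in-equilibrium estimate just mentioned) and `net` is the trained network; both names are illustrative.

```python
def residual_prediction(input_field, analytic_model, net):
    """Use all the physics you know: analytical baseline plus a learned correction.

    `analytic_model` and `net` are callables (hypothetical here): the first is
    the physical approximation, the second the trained neural network.
    """
    baseline = analytic_model(input_field)   # analytical estimate: ~80-90% of the answer
    correction = net(input_field)            # the network only has to learn the remainder
    return baseline + correction

# During training, the loss compares residual_prediction(...) against the
# hydro-simulation truth, so the network is fitted to the much smoother
# difference between the analytical theory and the simulation, rather than
# to the full, highly non-Gaussian field itself.
```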
A few words more about the simulations, to give you a sense of what we're doing with them. They're relatively small boxes — for those of you who are experts in this, these are 25-megaparsec-per-h cubed boxes, with a resolution of 100 kiloparsecs — which lets us run large numbers of them. Realistically, if we want to apply this someday to, say, Euclid data, we would want to run much bigger boxes. One of the things I think we understand better now is how to quantify the cosmic variance due to the effects of having small boxes. The most important aspect of that is that with small boxes we have relatively few clusters. We have found we can actually quantify this variation — with the same cosmological model, just sampling small regions, region to region — in terms of the mean electron temperature in the box, because that captures the properties of the rich clusters. So we actually see the simulations we're doing now as exploration: these are not the final, definitive simulations we'll use a decade from now to analyze the data, but they're teaching us what we need to know and what we can extract.

So here are, as I mentioned, temperature maps, all drawn with the same cosmology and astrophysics. The way we've done these simulations: sometimes we keep the cosmology fixed and look at many realizations, all with the same astrophysics; sometimes we keep the initial conditions fixed and vary the astrophysical feedback; sometimes we keep the astrophysics fixed and vary the cosmology; and sometimes we vary everything at once. So with our 2,000 simulations we've got a whole bunch of different ways of capturing and understanding how things like — in this case — the electron temperature depend on the amplitude of fluctuations, what we call sigma-8, and on how much energy we put in from supernovae or the active galactic nuclei, how big those winds are.

So let me step back — I may skip a few slides for time — and just give you our vision. We want to sum over thousands of simulations; we want to sum over thousands of astrophysical models. One of the things we're also working on is techniques for super-resolution imaging. There's been a lot of work in the machine learning community where you give the network a low-resolution picture and it gives you back a high-resolution image. Effectively, what we want to do is train on a few high-resolution, very expensive simulations and learn how to capture the small-scale physics with the neural net — remembering always that this is an approximation — so that we can run relatively inexpensive simulations and yet have output as if they were high resolution. We're working on that problem.

Another big problem we're working on is how to apply this to likelihood-free inference. Given a bunch of simulations — now that, using machine learning, we can go from initial conditions to observations — we have predictions; how do we compare those predictions to the actual data? The way I think of this: we have a likelihood function — that function in the high-dimensional space that depends on the initial conditions — and we're now sampling that function at a bunch of places with each of these realizations. And remember, this may be a 10^10-dimensional space, and even with machine learning running millions of times faster, we're only going to be sampling it at tens of millions of points. So we're going to need to find a representation, and the neural net looks promising for fitting this function, at least locally around the minimum in that space (a sketch of this likelihood-free inference pattern follows below).
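As a sketch of that likelihood-free (simulation-based) inference step — following the documented pattern of the `sbi` Python package, though the forward model and data vector here (`forward_model`, `x_observed`) are placeholders of ours, not anything from the talk, and the parameter ranges are purely illustrative.

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

# Prior over, e.g., (Omega_m, sigma_8); the ranges are illustrative
prior = BoxUniform(low=torch.tensor([0.1, 0.6]), high=torch.tensor([0.5, 1.0]))

# Draw parameters and push them through the fast, ML-accelerated forward
# model (hypothetical) to get mock summary statistics
theta = prior.sample((10_000,))
x = torch.stack([forward_model(t) for t in theta])

# Train a neural posterior estimator on the (parameters, mocks) pairs ...
inference = SNPE(prior=prior)
posterior = inference.build_posterior(inference.append_simulations(theta, x).train())

# ... then evaluate it at the real data vector to draw posterior samples
samples = posterior.sample((10_000,), x=x_observed)
```

The point of the 60-million-fold speedup mentioned earlier is precisely that generating the tens of millions of forward-model evaluations this kind of inference needs becomes feasible.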
We then want to be able to use that to determine the cosmological parameters. And there are lots of pieces to this that we're working on. I've talked about going from the matter field to lensing predictions; I've shown you the matter field going to thermal SZ and kinetic SZ. We're also working to compare with the kind of X-ray data that Rashid Sunyaev and others are capturing. Another piece of the problem — and this is something that Christina Kreisch and Miles Cranmer, two of our graduate students, have been leading — is how we go from physical space to redshift space. How do we capture what are called redshift-space distortions — the fact that the 3D positions we observe for galaxies are affected by their velocities? That's another nonlinear map we're working to capture, and it's the same approach: we take analytical theory and use it as the first step, then train the network to learn how to improve on the analytical theory, and we're seeing promising results there. For those of you who have delved into this world of neural nets: for that work we're actually using what are called GNNs, graph neural nets, which seem to be better-suited tools for that particular problem.

I'm getting close to eleven, so let me quickly zoom through this and show what we've inferred. This is just fitting for some global parameters, like the star formation rate, in the CAMELS simulations, and then applying it. What you're seeing here is what actually comes out of one particular realization of those 2,000 — the star formation history as a function of redshift, in the dashed curve; the black is the fit. You can see that we can pretty quickly and pretty accurately capture highly nonlinear things that depend on cosmological parameters with this neural network. With this, rather than run a simulation, if someone wanted to fit the star formation rate to observations as a function of these parameters, we can do it. So this looks promising, and, as I mentioned, we're trying to go beyond it: instead of fitting the neural net parameters, we actually fit functions — we teach the network to explore a function space, with some constraints of sparsity, and find fitting functions for the star formation history.

Let me go forward to the conclusions. What do I hope you take away from this talk, and from all the talks? Cosmology has big open questions — profound questions that are telling us about physics beyond the standard model — and we're at a wonderful stage in the field, where the observations keep improving. We've had tremendous progress in the microwave background, going from COBE to WMAP to Planck, and I hope I've conveyed that we're now going beyond Planck as we work towards ACT and the South Pole Telescope and the Simons Observatory and CMB-S4, and, from space, hopefully with projects like LiteBIRD, which may succeed Planck as the next space mission. On the ground and in space, in the coming decade we're going to have a tremendous amount of optical and infrared data tracing the large-scale structure of galaxies. So we've got this wealth of data coming to answer these questions. What I've tried to convey today is that I think it's a very interesting problem to go beyond the standard two-point techniques that have been used up to now for analyzing both microwave background and large-scale structure data, and to extract more information from that data.
And I believe these advances in machine learning are giving us a new tool that can let us analyze data and model highly nonlinear processes in novel ways. So I'm hoping that a few years from now we will be even further along in this project, and we'll be able to tell you that this dream of applying machine learning techniques in cosmology has become a reality — and perhaps this will become the standard tool everyone uses for their analyses. So let me stop there.

Thank you, David. So, do we have time for a couple of questions? I do. Okay, so Matteo Marsili has a question. If Walter can allow him to ask — please, Matteo, go ahead.

Okay, hello, can you hear me? Yes. Okay. So, this is a very nice talk, and the question is this: you are using neural networks trained on simulations which are essentially based on the physics that we know. How can this discover new physics? This is a little bit related to the issue of priors in Bayesian inference — the big formula that you were showing. When you are doing inference in such a high-dimensional space, the priors may be important, and in this case your priors may be very much centered on the physics that you know. So the general question is: isn't there a problem that these machine learning approaches may be blind to new physics?

So this is, I would say, a profound problem, and I would answer the question in a couple of different ways. One is: even if you're doing simulations of physics — even if you have the right model — what about the uncertainties, both in your neural network and in your simulations? One of the things we've been working on, with some statisticians, is first just capturing the uncertainties in the neural nets. The second part of the problem is: what about the uncertainties in your ways of representing the baryonic physics? We're sticking our toe in the water by simply using two suites of hydrodynamic simulations. Our hope is that the different approaches people use differ from each other roughly as much as they differ from the true universe — that the difference between simulation technique A and simulation technique B is comparable to the difference between those and the data. That's all a first step toward your question; I'll get to it more fully in a moment. Our broad plan is to train all this, get it working on simulation techniques A and B, then work with colleagues using a different set of approaches for simulations and analyze their data as if it were reality, to get some measure of the biases in that approximation.

Then there's the new physics we can explore. If we know what new physics to look for, we can run analyses including that new physics and impose proper priors. But this is a problem not only in applied machine learning, but in any statistical technique. Even when we're doing classical things, like fitting the microwave background spectrum to the Lambda-CDM model, when we write down a likelihood function we're basically fitting something that looks not so different from a chi-squared: you're fitting a model to the data and determining the parameters of the model given the data. It doesn't really tell you whether there's a better model out there that provides a better fit to the data. What we often do is look for anomalies in the best fit, and then use those anomalies to identify where there's new physics.
So if you take a model without a cosmological constant, without dark energy, and you try to fit the microwave background, there is a best-fit Omega-matter-equals-one model for the microwave background. If you plot that best-fit model against the data, already by eye you notice that it fits somewhat, but it clearly has systematic deviations from the best model. So I think what one has to do is go back and forth: ask, for a given model, what is the best fit, and then ask, does that best fit really characterize what's going on? And this gets to kind of a philosophical question of what's the best way to do science like this. Something I've thought about — and this is actually how we're approaching it — is to take our data from, say, the microwave background experiments and galaxy surveys, work our analysis techniques to fit those, and then say: okay, that's our best-fit model — what would it predict for the eROSITA X-ray survey? That's a different way of looking at the universe, right? We're looking at the hot gas. If we've got the right model, it should probably fit that as well; but if we're not capturing something going on in clusters, it might show up in a more noticeable way there. So that's one way of getting at new physics when we don't know what new physics is there.

If we have two alternative models — one with interacting neutrinos, one without — it's straightforward to compare the two and ask which is the better fit. What's harder is: you've got model A, and you want to ask, is there something beyond model A? We can ask the same question of the data coming out of the LHC, right? The standard model of particle physics does a very impressive job of fitting lots of data, but how do we know there's not a better model than that? What are the unknown unknowns lurking there? There are a bunch of different ways of approaching this, but there is not, as far as I know, a systematic way of discovery. So that's a profound question — and yet another reason why I wish I were in Trieste; this is an excellent one to continue tonight after dinner, over a glass of wine: how do we discover things, what does the history teach us, and what should we be doing differently?

Okay, so thank you, David. I think we should conclude here. Thank you very much for these lectures. Maybe, Atish, you want to say some final words? Atish, you're muted. I had a question for you for dinner. All the time. No, but I think this question is really interesting. For example, for planetary motion — I mean, have people done this kind of thing for a simpler system, like planetary motion? If you've given it the data for eight planets, can you predict what will happen if you throw in twenty planets?

Okay — actually, we have two interesting papers on this that I want to describe very quickly. One — and this is some nice work that Miles Cranmer did with another graduate student at Princeton — they gave it the data for planetary motion and fit it with a graph neural net, saying: there's some interaction law; figure out the interaction law between planets. And the network fit recovered an inverse-square law. Now — and it's a subtle thing — it's actually hard, taking the planetary motions, to notice the precession of the perihelion of Mercury. Would the network have discovered that?
We haven't pushed it that far, but we were happy to see one-over-r-squared come out. The next problem we did — and we have a pair of papers on the arXiv that a number of us co-authored — was looking at the stability of planetary systems. So I can take a system like our solar system and ask: is it stable over long times? What's the timescale on which it falls apart? We gave the network a series of numerical integrations of three-planet systems — a star and three planets — and trained it. The network learned the resonance coupling rules. Now, we knew some of this; again, we took advantage of this technique of using what we know — after 400 years of working with dynamics, we understand that the instability of planetary systems is driven by resonance overlap. For those of you who know this stuff, the Chirikov resonance-overlap criterion: you've got some three-to-one resonance in the planetary system coupling to a two-to-one resonance there, and that drives the instability. The network learned to predict the instability, and it did remarkably well — much better than any existing analytical method — given some initial conditions. So we can now give it a thousand years of orbit integration and it can predict what will happen over the next billion, at least statistically — and statistically is the best you could do, since it's actually a chaotic system in some parts of phase space. And what's fun — and this gets at new physics, because the dominant term is the three-planet interaction — is that we've given it five- and seven-planet systems and it accurately predicts their stability. So we're able to do that.

Could we learn the precession that way? We haven't gone in that direction, and it's an interesting question — it's hard to do right, because we know the answer. If you had the data on the precession of the perihelion of Mercury, how would you use a neural net on it? Thinking back on this historically, some remarkable analytical work was done by mathematicians and physicists — the boundary between the two was pretty loose in the late 19th century — who actually predicted all the terms from the planets that contribute to the precession of the perihelion of Mercury. In Mercury's precession, the terms due to Jupiter and the Earth and the other planets are much bigger than the general-relativistic term. If the theorists hadn't done their perturbation theory accurately — which they did; it was, I think, a great triumph that's not talked about enough — we wouldn't have picked up the anomalous precession of the perihelion of Mercury. To put this in modern terms: you're looking at deviations in the anomalous magnetic moment, you've got all those terms that you know how to predict from standard theory, and then you pick up the deviation. So I think of machine learning not as a way of discovering the deviation, but simply as a computational tool that, for some problems, lets us compute the terms that we know, better. So let me end with that.

Okay. No, but surely we should continue this — I hope you will come to Trieste, and we will. Well, as I mentioned, I will be visiting. I'm getting back — my phone buzzed; it told me yesterday that I do not have COVID. Good. And I'm getting vaccinated today at three o'clock. So as soon as Italy is vaccinated and ready, open to visitors, I'm there. Okay, perfect. I will be completely vaccinated in three weeks, so I'm just waiting as well, I believe. Okay, great.
You're an important person in Italy — get the country vaccinated, open it up. Okay, great, thanks. So thanks, David, for this wonderful set of lectures. I really look forward to seeing you here, you know. Let's all hope so. Let's all hope so. And once again, I thank the Kuwait Foundation for the Advancement of Sciences for these lectures and their support of ICTP, and, in memory of Abdus Salam, wish him a happy birthday. Thank you. Thank you. Thank you.