Hi, everyone. Thank you. I'm Eirini. I'm Regina, and we're super happy to be here. This is our first EuroPython and we love it so far. We are both PhD students at the Institute of Astrophysics of the Canary Islands, where we work a lot with Python, using it for something like 80% of our day, every day. Our field is the formation and evolution of galaxies, and we use Python to see how well we seem to understand it so far and, in a sense, how well we understand the universe. Before we answer this question, which is kind of a big question, we have to take a step back and ask ourselves how well we thought we understood the universe some years ago. Only 100 years ago, the answer is apparently: not well at all. Many astronomers around the 1900s, the 1920s or something like that, thought that the Milky Way, our galaxy and all the stars that comprise it, constituted the entire universe. As we know now, this is certainly not accurate. There was a big debate in those days about what was going on, because they could actually observe other galaxies, but not very well. They looked like clumpy structures, and astronomers didn't have the means then to observe them in detail. This debate was resolved when we got the first observational evidence that other galaxies exist. This guy here, maybe you know him, Hubble, was the first to manage to measure the distance of the Andromeda galaxy from our galaxy. He did so by observing the brightness of a certain type of variable star that gives us a very good measurement of distance. He found that the distance to these stars in the Andromeda spiral nebula, as they thought it was back then, is actually much bigger than the size of the Milky Way itself.
So apparently it cannot be part of the Milky Way, right? It has to be something outside our galaxy. Just as a fun fact, on the left is how Hubble used to observe the Andromeda galaxy, with the telescopes they had back then, and on the right is how we can observe it now. So we can see that we have evolved a lot since then. By continuing to study galaxies and learning more about their distances and velocities, we found more and more interesting results. We found that galaxies that are further away from Earth seem to be moving away from us at a much higher velocity than galaxies that are closer to us. What this means is that the universe is not only much larger than we initially thought, but it's actually expanding. To understand this better, scientists concentrated on creating a cosmological model, which is just a way of trying to explain what you observe and the behavior of matter as we see it. The one that is most widely accepted right now is called the Lambda CDM model. Oh, I can use this one. Okay, cool. What the Lambda CDM model tells us is that the whole universe has just three basic ingredients. About 70% of it is dark energy, this mysterious energy that we are not really sure what it is. But we know that this is what drives the expansion of the universe, and it's mostly associated with the vacuum. Then about 25% is, again, some dark material that we call dark matter, which again we cannot observe. This is why we call it dark: it doesn't interact with light, but it interacts through gravity. It's what causes all the clumping that we see and all the structures forming in the universe. And then there is this tiny little bit, only 5%, of stuff that we can actually observe.
So it's like everything that we are made of: the Earth, the stars, the galaxies, etc. Okay, cool. So now that we have this cosmological model, scientists were like, okay, maybe we can try and simulate the whole universe, right? So how do you go about that? Around 10 years ago, I think the first cosmological simulations were around then, they were like, okay, we have to define some ingredients in order to simulate the universe, but of course in a reduced volume. For that, what we need is some physical equations for how matter behaves. Then we of course need the effect of gravity. And then we need some smaller-scale physics, because we cannot simulate the universe in its whole extent, simulating every individual star. We have to stop at some point, right? So we stop at a specific resolution, and for everything below that we define some recipes that we think govern how the small-scale things work. Of course, to run that we need some huge supercomputers, and we need a lot of time: these cosmological simulations take months to run. Here on the right we see a cartoon of how a cosmological simulation looks in time. You will see in a little bit that, in the beginning, all the matter looks randomly distributed and homogeneous. But as time progresses, we start seeing structures and everything becomes more clumped. This is actually a very good result from cosmological simulations: they do an amazingly good job of simulating the large-scale structure. What we get from cosmological simulations looks very much like what we get from observations and what we think the cosmic web looks like. So this is an amazing result already. But what can we do now when we go to smaller scales?
And by smaller scales, because we're talking about the universe, I just mean galaxies. Not really small scale, but in that context, yes. So here I'm plotting a simulated galaxy and a real galaxy, and I guess you can guess which one is which. Or not, I don't know. The screen is not very good, but probably you can guess. The simulated galaxy is the one on the left and the real galaxy is this beautiful M81 galaxy. While we can still tell them apart, it's mainly because we haven't added any observational effects to the simulated galaxy. Still, cosmological simulations do an amazing job of recreating galaxies with the physical properties and shapes that we know exist in the universe. So for me, that's already amazing. And why do we like cosmological simulations so much? They give us a great capability that we didn't have before. The reality is that we can only observe galaxies at one very specific point in time, right? No matter how long we wait, even our whole lifetime, the way a galaxy looks in the sky is not going to change. We would have to wait an infinite amount of time; not infinite, but infinite on our scale. Cosmological simulations, on the other hand, track the formation of galaxies across multiple snapshots. So along with how the galaxies are now, we have how they used to look in the past. We track them across their whole evolution. That's super nice because we can, for example, use machine learning and train models on data from simulations, where the history of the galaxy is available. Then, if we find out that they work quite well, we can apply them to observations and infer things that we didn't know before, because we cannot observe them, right? But to make sure that we can do that, we first have to quantify and make sure that these cosmological simulations are actually trustworthy.
So to do that, we have to make two comparisons. First, we have to compare different cosmological simulations with each other, because there's not only one cosmological simulation out there right now. Multiple teams created their own cosmological models separately, and obviously they defined different small-scale physics, so this might create differences. And then, of course, we need to compare how simulations look in comparison to real galaxies. We're going to focus first on this part, simulations versus simulations. In order to do that, we have to go a step back again and get an idea of how galaxies actually evolve. The evolution of galaxies, as we know now, is bottom-up. They start from smaller-scale systems, and then they continuously merge with other galaxies and keep creating larger structures. So it's a hierarchical evolution. On the left, you can see how a merger between two galaxies looks. Of course, I don't think I need to say that this is a simulation; this is not something that we can observe. From that process, we can classify stars into different categories. We have the in-situ stars of the galaxy, which are stars that were present in the galaxy before the merger event and were created there from gas already present in the galaxy. And we also have ex-situ stars, which were accreted, or let's say stolen, from other galaxies. So if we manage to measure how many in-situ stars are present in a galaxy, we get a pretty good idea of how much it has merged across cosmic time. To test between the two different cosmological simulations, we use a very simple setup. We will use machine learning, and we will use some of our favorite Python libraries, because it's EuroPython, I have to say that. As inputs for the model, we will use images of galaxies, of how we see them now.
So this is simulation data, but it is data we can actually get from telescopes, or close to what we can get from telescopes. As an output, we want to see if, with what we can see of the galaxy now, we can predict something about its cosmic past. As an indicator of the evolution history of a galaxy, we will just take the fraction of ex-situ mass in the galaxy. Then we will train on one cosmological simulation and see if we can predict on the other cosmological simulation, or if the differences in the small-scale recipes that these cosmological simulations enforce do not allow us to predict. To evaluate that, first we need to plot how the ground truth looks. This is the ground truth for the two simulations, one on the left and the other on the right. As you can see, the ex-situ fraction in a galaxy correlates strongly with the stellar mass, in the sense that more massive galaxies have accreted more mass from other galaxies. This makes sense in the hierarchical context that we described before: the more a galaxy merges, the more massive it gets. But still, this relation is somewhat different between the two simulations. This is how well our neural network model does in predicting the ex-situ fraction just from images of how galaxies look today. This is within a fixed simulation, so you train on one cosmological simulation and you test on the same simulation. We see that we have quite nice results, right? This means that we can actually infer a property of the history of a galaxy just from how it looks now. So this is already nice. But now, thank you, when we go across cosmological simulations, so we train on one cosmological simulation and we test on the other, we find this bias. This probably means that the small-scale differences that we described before actually mess this up for us.
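The train-on-one, test-on-the-other setup described above can be sketched with toy data. Everything here is an assumption made for illustration: the two "simulations" are synthetic arrays, a single stellar-mass feature stands in for the galaxy images, and a least-squares line stands in for the neural network. It only shows how a shift in the relation between simulations appears as a bias in the cross-simulation test.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_toy_simulation(n, slope, offset):
    """Toy stand-in for one simulation: log stellar mass -> ex-situ fraction."""
    log_mass = rng.uniform(10.0, 12.0, size=n)
    frac = slope * (log_mass - 10.0) + offset + rng.normal(0.0, 0.02, size=n)
    return log_mass, np.clip(frac, 0.0, 1.0)

# Two hypothetical simulations whose mass/ex-situ relations differ by an offset.
x_a, y_a = make_toy_simulation(2000, slope=0.30, offset=0.05)
x_b, y_b = make_toy_simulation(2000, slope=0.30, offset=0.15)

# "Train" on simulation A only (a straight-line fit replaces the network).
train = slice(0, 1500)
test = slice(1500, None)
coeffs = np.polyfit(x_a[train], y_a[train], deg=1)

def mean_bias(x, y):
    """Mean of (prediction - truth); close to zero means unbiased."""
    return float(np.mean(np.polyval(coeffs, x) - y))

bias_same = mean_bias(x_a[test], y_a[test])   # held-out galaxies from A
bias_cross = mean_bias(x_b, y_b)              # galaxies from B
print(f"bias within A: {bias_same:+.3f}")
print(f"bias A -> B:   {bias_cross:+.3f}")
```

In this toy case the held-out test within the same simulation is nearly unbiased, while the cross-simulation test inherits the offset between the two relations.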
Across simulations, we cannot predict this aspect of a galaxy's merging history from how it looks now. So this result doesn't make us feel very confident. But this doesn't discourage us. We do not lose faith in cosmological simulations as a tool, because we can play around a little bit with the inputs, discard some of them and use others. And we find that if we only train with inputs that are more closely related to gravity, we are actually able to predict across simulations quite well. So we managed to find features that are independent of the small-scale differences between simulations, and they're actually robust. This means that we might be able to use them to go and predict on actual observations, which is nice. So can we do more? Can we look at how galaxies look now and, instead of predicting a single number for each galaxy, predict, let's say, the origin of every star that we see? That would be nice. So let's see how we would go about doing that. For that we will use a more complex convolutional neural network: a conditional variational autoencoder. A variational autoencoder, simply put, is a version of an autoencoder, a model that compresses images into a low-dimensional space and is then able to reconstruct them just from this low-dimensional space. You can also make it conditional: you factor into the encoding process and the decoding process some conditions that you want, and you are still able to reconstruct the results. The nice thing about the variational autoencoder is that the latent space is well behaved, so it has good properties. That means that you can use this model as a generative model as well. So you can completely remove the encoder part.
And during inference you can just sample from the latent space and create new images that follow the distribution of the inputs that the model was trained on. In our case, we use as conditions the images of how galaxies look right now. The ground truth, and the reconstruction that we're trying to achieve, is the information about the evolution in a 2D image, so some part of their evolution history. And of course, since even the conditions are 2D images, we need a convolutional neural network to compress them. During inference we just ask the model: okay, I know you have been trained on simulation data. From what you know of this latent space, I want you to produce the evolution history of a galaxy, or how it would look in a 2D image, given that I give you as input how this galaxy looks now. And this actually seems to work quite well. Here on the top I have the observable inputs. These are three different galaxies from the simulations, and the maps depict properties of how each galaxy looks right now. On the bottom we have how the ground truth looks and how the model predicts it. For all three galaxies the result looks quite nice, I think. If you want more info you can just scan this QR code; it will take you to the publication. From this work, what we've learned is that we can use cosmological simulations together with machine learning, and they can help us unveil some part of the evolution history of galaxies. But before getting too excited about that, we still want to make sure of the other part that I mentioned before: that simulations and real galaxies are actually closely related. Right. So for that I will hand over to Regina. Thank you.
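The conditional variational autoencoder described above can be sketched as a single forward pass in NumPy. This is not the speakers' model: the layers are plain linear maps with random, untrained weights instead of convolutions, the images are flattened toy arrays, and the sizes `N_PIX` and `N_LATENT` are assumptions. It only shows where the condition enters, how the loss is put together, and how generation works once the encoder is dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PIX = 16 * 16   # flattened toy image size (assumption)
N_LATENT = 8      # latent dimension (assumption)

def init_linear(n_in, n_out):
    """Random weights and zero bias for one linear layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

# The encoder sees target + condition; the decoder sees latent + condition.
W_enc, b_enc = init_linear(2 * N_PIX, 2 * N_LATENT)
W_dec, b_dec = init_linear(N_LATENT + N_PIX, N_PIX)

def encode(target, condition):
    h = np.concatenate([target, condition], axis=-1) @ W_enc + b_enc
    return h[..., :N_LATENT], h[..., N_LATENT:]   # mean, log-variance

def decode(z, condition):
    return np.concatenate([z, condition], axis=-1) @ W_dec + b_dec

def cvae_loss(target, condition):
    mu, log_var = encode(target, condition)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps          # reparameterisation trick
    recon = decode(z, condition)
    recon_term = np.mean(np.sum((recon - target) ** 2, axis=-1))
    # KL divergence between N(mu, sigma^2) and the standard normal prior.
    kl_term = np.mean(
        -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=-1)
    )
    return recon_term + kl_term

def generate(condition, n_samples):
    """Inference: no encoder, sample z from the prior N(0, I)."""
    z = rng.standard_normal((n_samples, N_LATENT))
    cond = np.broadcast_to(condition, (n_samples, N_PIX))
    return decode(z, cond)

observed_now = rng.random((4, N_PIX))   # toy "how the galaxy looks now"
history_map = rng.random((4, N_PIX))    # toy "evolution history" map
loss = cvae_loss(history_map, observed_now)
samples = generate(observed_now[0], n_samples=5)
print(samples.shape)
```

Sampling several latent vectors for the same condition gives several candidate history maps for one galaxy, which is what makes the model generative rather than a single-valued regressor.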
OK, so from an observational point of view, I'm working with integral field spectroscopy. This type of data is just two spatial axes and one spectral axis, with all the information of the galaxies. So we can think of it as having, at every wavelength, an image of a galaxy. The cool thing about this data is that we can derive physical properties. The 2D maps that she was showing, the kinematics, the stellar ages and the stellar chemical composition, can all be derived from this data. The only thing is that we have to make a lot of assumptions about how the stars are formed, how the light propagates through space and through the instruments, and so on. So for the comparison we really need to make our simulations look like observed galaxies, and this is what I worked on. We take a galaxy from the simulation and run it through a Python pipeline where we put in all these ingredients. From the astronomical side we have star formation, stellar evolution, stellar emission and dust absorption. On the instrumental side we have the full description of the instrument that we're using to observe the galaxy, so the resolution, the sampling and so on. Then we get an observation-like data cube. This pipeline usually takes several hours to run for one galaxy, just as a comment for now. To compare with observations we need to cover a variety of different types of galaxies, so we really need many galaxies. This is because, even though simulations do reproduce galaxies well, in that they generate galaxies that look like the galaxies in our universe, they are not reproducing exactly the galaxies that we are observing. We are not generating an Andromeda galaxy. So we need a lot of simulated galaxies. We did this for 10,000 galaxies, and you can see the results of the comparison with observations in a paper that I show here in this QR code.
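The data cube layout mentioned above (two spatial axes and one spectral axis) can be pictured as a 3D NumPy array. The sketch below uses random toy values; the axis order (wavelength, y, x), the wavelength grid and the array sizes are assumptions for illustration, and reading a real cube from a file is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

n_wave, n_y, n_x = 50, 32, 32
wavelengths = np.linspace(3600.0, 10000.0, n_wave)   # toy grid, in Angstrom
cube = rng.random((n_wave, n_y, n_x))                # toy flux values

# "At every wavelength we have an image": pick the slice closest to 6563 A.
idx = int(np.argmin(np.abs(wavelengths - 6563.0)))
image_at_wavelength = cube[idx]          # shape (n_y, n_x)

# The other view of the same data: the spectrum of one spatial pixel.
spectrum_at_pixel = cube[:, 16, 16]      # shape (n_wave,)

# Summing over the spectral axis collapses the cube into a single image.
collapsed_image = cube.sum(axis=0)       # shape (n_y, n_x)
print(image_at_wavelength.shape, spectrum_at_pixel.shape)
```

Maps such as kinematics or stellar ages are derived per pixel from spectra like `spectrum_at_pixel`, which is why they end up as 2D images.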
But maybe the interesting thing for you is that this data is all public, so you can go and check it out on your own and trace back the history of the simulated galaxies. To further compare the spatially resolved properties of galaxies, because what I showed before was only the integrated properties, we use contrastive learning, which is a self-supervised learning algorithm. In particular I use convolutional neural networks, because I'm using as input the same type of data that she's using, so 2D maps. I'm not going to explain much about this, but if you're interested we can talk about it later. The only thing that you need to know is that it's useful for extracting meaningful representations, by applying transformations that we want the representations to be invariant to. For example, we don't care about how the galaxy is oriented in space, because it's just a projection effect, so we just rotate the image, or things like that. Here I'm showing a projection of the representation space that we obtain, marked in green, and I'm comparing it with the components obtained via a linear decomposition, just principal component analysis. We can look first at observational effects, which I use to color-code this plot. Each galaxy is a dot, color-coded according to the effect. We can see that PCA shows a smooth transition for apparent size, for example. This is something that we don't really care about, because it's not intrinsic to the galaxy, and it means that the PCA space is correlating a lot with this property. We don't want that. In the other case, our representations, we see that the distribution is quite arbitrary. And we see the same for orientation. Now let's look at physical properties, the things that we do care about, for example how the galaxy rotates, that is the galaxy spin, or the age of the galaxy.
Here we see a smooth transition in our representations, but not so much in the PCA. This is a proof of concept done on observed galaxies, but we want to take it further. First, let's see some examples of what we can do with this representation space. On one hand, we can do clustering to see what the average properties are in different regions of the representation space. We can also pick one galaxy that we're interested in and look for similar examples within the representation space. We can also use it to find galaxies that are far away from all the rest in the representation space, which might be weird or rare galaxies that we're interested in studying a bit further. My next step is to use this representation space as a common ground to compare the simulations with the observations. Because this representation space won't be affected by the observational effects, we can actually check whether the two sets of galaxies live in the same space or not. So, as a conclusion, we can say that we mostly understand the broad properties of galaxies. We can replicate those trends with simulations. We know that all these ingredients, gravity, supernovae, star formation and active galactic nuclei, are part of how galaxies evolve. And if we try to form a prediction of a galaxy at a given stage, we mostly recover something that makes sense. It looks fine. The kinematics and the mass distribution are fairly well recovered. But the flavor of this cake galaxy still needs some tweaking, and this is mostly related to the chemical composition, which is tied to star formation. So more detailed comparisons, like what Eirini was showing or what I was showing, need to be done to actually constrain the physics and the models behind the processes that regulate galaxy evolution, to get the proper recipe to form this cake galaxy.
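Two of the uses of the representation space mentioned above, looking for galaxies similar to a given one and flagging galaxies far from all the rest, can be sketched with plain distance computations. The embeddings below are random toy vectors with one planted outlier; the dimensions, the Euclidean metric and the k-th-neighbour outlier score are assumptions, not the method used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy representation space: 500 galaxies with 16-dimensional embeddings.
embeddings = rng.normal(size=(500, 16))
embeddings[0] += 8.0                     # plant one obvious outlier

# Pairwise Euclidean distances between all embeddings.
diff = embeddings[:, None, :] - embeddings[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
np.fill_diagonal(dist, np.inf)           # a galaxy is not its own neighbour

def most_similar(query_index, k=5):
    """Indices of the k galaxies closest to the query galaxy."""
    return np.argsort(dist[query_index])[:k]

# Outlier score: distance to the k-th nearest neighbour.
k = 5
kth_distance = np.sort(dist, axis=1)[:, k - 1]
most_isolated = int(np.argmax(kth_distance))

print("neighbours of galaxy 10:", most_similar(10))
print("most isolated galaxy:", most_isolated)
```

With real embeddings, the neighbours returned by `most_similar` would be candidates for "galaxies like this one", and the galaxies with the largest `kth_distance` would be the rare objects worth a closer look.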
So why are we here today? Basically, as astronomers we can't just go and measure a galaxy with our hands. We can't set up our own experiment. We just have data that lands in our hands, and we need to think of ways to derive physical meaning from it. This is where astrophysics and Python meet, because basically what we are doing is data processing and modeling. So we use all these libraries and even more. Something quite nice also is that most of the code in astronomy is written in Python, at least today. And we have dedicated libraries for that, like Astropy, which is super nice; it covers a wide range of areas in astronomy. Something else that is super nice about this community is that lots of data is publicly available. Anyone can just download the data. If you're interested, just let us know and we'll give you tips. But it's mostly: pick a survey, look for the survey's data access, and you've got it. And the same with the simulations. So yeah, if you're interested, just let us know. Thank you.
Thank you. Great talk. And we have some time for questions if you can come to the. Yes.
Yeah, hi. Thank you for a really nice and interesting talk with lovely visuals. You said that in the simulations you have the time evolution of the galaxies. When you observe galaxies, each one is obviously fixed in time, but some of them are going to be old and some are going to be young. Can your machine learning models predict the ages of galaxies that are observed?
We can derive the ages through cosmology, through models, using the images that you get from telescopes. There is a way to derive the ages from there, by taking the chemical composition of the stars and things like that. But still, these models are not very accurate. And we could, I think people have already done that, or I'm not sure.
We can, of course, train a machine learning model to look at how a galaxy looks today and predict the age, and then compare with the other techniques we already have. I'm not sure if someone has already done that with machine learning.
I think that it's possible to do with simulations and machine learning, but I don't think that the result will be directly applicable to observations, because of all the problems that we have in determining those things from observations. So that's one of the big unknowns so far in galaxy evolution, I think. I mean, a rough idea, yes.
Hello. Thanks for the nice talk. It was really good. Very interesting. I'm wondering, you said that you are using PCA. Does that mean that you have a high-dimensional feature space, or what are your features typically?
Yeah, so I didn't really use PCA. It was just for comparison with what we get with the other method. The input is the input maps in 2D, so we just flatten them. These are kinematics, age, metallicity, and an image of the galaxy itself. So about five features, I guess.
It's more for visualization then?
Five maps.
So what's a map?
So 2D. It's one dimension, two dimensions, and then five stacked maps.
OK, thanks.
Let's give Eirini an original question.
Can I have one more question?
Yes, please.
More of a general one. I'm wondering how important you think coding knowledge is in the natural sciences in general?
That's a very nice question. From what we see, it would be very nice if we got some coding lessons before going into the natural sciences, because we have so much data that we are actually forced to learn coding ourselves. And sometimes maybe we don't learn the best practices or something like that. So in a sense, we manage to make things work. But of course, if we got some input, maybe from you, we would write much more efficient and much cleaner code.
Thank you.
Of course, super important.
Hi, thanks for the talk.
I was wondering if you only trained in a purely data-driven way, or if you also experimented with encoding knowledge from physics in the network or in the model, via the loss function, for example.
It's certainly an option. Yes, some people do these other things. We didn't do it, but some people do.
OK, so truly data-driven. OK, thanks.
So I have a follow-up to the second-to-last question, which was about getting help from developers or software developers via IEA. Is the work you are doing open source, so that people can see what you're working on and maybe recommend best practices or something like that?
We can only answer for ourselves, I guess. But yeah, after the publication, we try to put everything in a GitHub repo and have it publicly available. Yes.
Cool, thank you.
In my opinion, that's the best practice, I mean, to make your data public so that people can actually use it for the next work and even check that everything is working fine. But it's not always what happens.
Actually, we have time for one more question. OK, let's give Eirini and Regina a big applause. And thank you very much.