Valeria Pettorino, and she will tell us about exploring the universe with Python. Please welcome Valeria.

Good morning everyone. Thanks a lot for being here, and thanks first of all to the organizers, and to those watching live on YouTube, for the opportunity to talk here. I will discuss how we can use Python in cosmology to unveil the universe, and in particular the dark universe. I'm Valeria Pettorino and I'm a physicist, so let me briefly introduce myself. I work in astrophysics and cosmology: the study and understanding of the origin, the evolution and the content of the universe. I work in particular for two space missions financed by ESA, the European Space Agency, and NASA. The first one is the Planck satellite, which was launched in 2009 and whose cosmology data we released last year; the other one is this beautiful one, the Euclid space mission, which will be launched in 2020. I'll tell you more about that afterwards. I've also been working a lot on communication: for two years I've been in charge of internal communication for the Euclid space mission and of public outreach. I'm also very much interested in data science: separately, I've worked for some time on a health-care project for a startup in London, and on an IoT project, but that's a different story. I mention it because I've recently also become an ambassador for the S2DS program, so before we go to cosmology let me tell you about this program. S2DS stands for Science to Data Science. It's the largest data science bootcamp in Europe: a five-week program that happens twice per year, once virtually and once in person in London, and it really aims at joining the academic community with data science experience.
The ambassador program in particular aims to build a network between scientists from academia and the data science community outside it, and they support talks, also partly covering expenses if one wants to organize an event. I'll actually be moving to Paris in a few months, and I'll probably organize a data science workshop there next year. So if you're interested in taking part, or in being part of this community, please contact me or just look up the ambassador program on the web.

Okay, so let's look at cosmology now, and let's first understand which distances, which scales, we are talking about. Human beings are roughly of dimensions of a meter. If you go down to smaller scales you reach the fields of interest of chemistry, atomic physics and nuclear physics, down to 10^-15 meters, and at even smaller scales the domain of particle physics, where the Large Hadron Collider at CERN is working, or the very small distance variations detected by the gravitational wave experiments we heard about yesterday. But now I would like to bring you in the other direction, to very large distances: beyond human beings, beyond Earth, beyond the Sun, into the domain of astrophysics and cosmology. We start our journey across the cosmos from our planet Earth, one of the planets in our solar system, this blue dot right there; and the whole solar system sits here, at the edge of a spiral arm of our galaxy, the Milky Way. So here we are at about 10^21 meters, but we can actually look much farther than that.
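To make the jump in scales concrete, here is a tiny back-of-the-envelope sketch (round illustrative numbers of my own, not values taken from the slides):

```python
import math

# The range of scales mentioned above, in round numbers: from the
# 1e-15 m of particle physics up to ~1e21 m, the scale of the Milky Way.
# Illustrative values only, not measurements.
scales_m = {
    "particle physics":   1e-15,
    "human being":        1.0,
    "Earth-Sun distance": 1.5e11,
    "Milky Way":          1e21,
}
span = math.log10(scales_m["Milky Way"] / scales_m["particle physics"])
print(f"scales in this talk span {span:.0f} orders of magnitude")
# → scales in this talk span 36 orders of magnitude
```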
In fact, this is a picture (I don't know if it's too bright to see anything) taken by the Hubble Space Telescope, financed by NASA, in which every single point is a galaxy like our Milky Way. And we can go even farther. For that I'd really like the lights, if possible, to be even lower, because otherwise you won't see anything of the next slide; and after all, it is a dark universe. Okay, that's already much better, thanks. That is again a picture of all the galaxies around us, but we can go even farther. If you imagine that you are somewhere in the center of this video and you fly away from these galaxies, then these are all galaxies observed by the Sloan Digital Sky Survey collaboration, which observed about one million galaxies, placed in space where they were observed. As you go farther and farther from our galaxy, you see that they don't fill the whole space uniformly: they form a web, with voids, places where there are no galaxies, and filaments, places where there are lots and lots of galaxies. All of this is really observed by the Sloan Digital Sky Survey, and it is what is called the cosmic web.

In addition, we have known for a long time that the universe is expanding, in the sense that the distance between galaxies, space itself in between galaxies, is stretching. For a long time the expansion decelerated, going slower and slower due to gravity, which tries to pull things together. Then, suddenly, about five billion years ago, the expansion started to accelerate, faster and faster. This was discovered only in 1998, and it was a huge surprise, one that earned three people the Nobel Prize in Physics in 2011 for the discovery of
the acceleration of the universe. Right now the universe is in this phase of accelerated expansion. Since 1998 there have been several experiments, and a lot of data coming from different experiments, from the ground and from space, different collaborations looking at different things, and they all seem to point towards the same surprising picture of the universe: a universe which is mainly dark. Atoms, ordinary matter, all human beings and basically all the stars account for at most five percent of the total energy budget of the universe. The rest is basically unknown. We know that about 25 percent of it is in the form of dark matter, a form of matter that still feels gravity and acts like the glue that forms galaxies and keeps them together. Even more mysteriously, about 70 percent of the energy budget of our universe is in the form of dark energy. Both are dark in the sense that they do not emit light; we haven't actually detected the particle of dark energy, or of dark matter, yet, but we know that dark energy is responsible for this accelerated expansion of the universe. Not understanding 95 percent of the universe is, as you notice, almost embarrassing, so this is the major challenge at the moment and for the next generation of experiments. And this is the cosmic vision: to really have the big picture, to understand the 95 percent of the energy that surrounds us. But it is also a big data challenge that joins a lot of different communities together. There is already a new generation of experiments, among which the next one to be launched is the Euclid space mission, which are going to use different probes to scan the sky and slice it at different epochs in time: they're going to observe, for example, the shapes of billions of galaxies at different epochs. And this is a huge challenge from the technological point of view, because you have to
predict and build the new technology that gives you the resolution to discriminate among all the possible theoretical models that can explain dark energy; to actually build the detector; to transfer the signal and compress it, the whole signal-processing challenge; to reconstruct the shapes of the galaxies and compress the data that comes from space; and then to interpret it by comparison with theoretical models, in order to finally, all together, test gravity and fundamental physics at very large scales, just as people test forces and interactions at very small scales, for example at the LHC at CERN. And I would like to stress that this is not the work of a single astronomer, a single person who writes strange equations on a blackboard or looks through a telescope somewhere: this is really an enterprise, work that involves huge collaborations. I'll tell you something about the two that I'm in. The first one is Planck: a collaboration of about 100 scientific institutes in Europe, the US and Canada, involving about 500 people. Within Planck I've been leading the analysis that compares the data from the satellite to theoretical models that predict dark energy and theories beyond general relativity, so modified gravity. The other mission, Euclid, is more than twice as big: at the moment it includes about 1,300 people from 120 labs in 13 European countries, plus NASA and Berkeley Lab in the US. For Euclid, apart from working on the communication, I'm in charge of the whole forecasting activity: determining a reliable pipeline that can tell us how well Euclid will perform in discriminating among different theories.

Okay, so let me tell you in a bit more detail what we actually observe and how we actually
analyze the data, and of course where we use Python in all of this. I'll do that in particular for Planck. Planck was launched in 2009 and collected terabytes of data. It was sent 1.5 million kilometers away from Earth, orbiting around the second Lagrangian point, on the opposite side with respect to the Sun, and it scanned the entire sky twice per year. The spacecraft spins at one rotation per minute and traces circles in the sky, observing the radiation in all directions at different frequencies. It contains two instruments, one at low frequencies and one at high frequencies, and the high-frequency instrument needed a very complex cryogenic system that cooled the whole detector down to 0.1 kelvin, so it was literally the coolest place in the universe for a while. It observed the radiation in all directions, and what you see here includes the emission from our own galactic plane, along this line, which for us is actually a foreground: we remove it, because we don't want to see the light from our own galaxy. What we want to see is something much more challenging: the cosmic microwave background, light that was emitted 13 billion years ago. This is a map of it, one of the main outputs of the Planck collaboration. It is microwave radiation, and the different colors correspond to different temperatures, tiny differences in the temperature of this radiation. The mean temperature is about 3 kelvin, so it's very, very cold; that's why the detector had to be even colder than that. What we are actually interested in are these tiny differences in temperature when we look in different directions, all these hot and cold spots, and we have such an amazing resolution on this map that we can understand
how this light traveled down to us, and from there understand the evolution of the universe and reconstruct its content. It's really similar to a map of the temperature on Earth, where you would go up to 40 degrees or even higher, as in Bilbao recently, but on the sky, at around 3 kelvin, so minus 270 degrees centigrade. You really see tiny differences, of one part in 10^5, in something that was emitted 13 billion years ago, and that gives you constraints on the parameters that describe your universe (the amount of dark energy, of dark matter, the expansion rate) at the percent level, which I find almost astonishing. Most of the analysis actually goes into the processing of the data: into getting rid of all the other sources, the individual point sources, for which we have catalogs and which we just remove; removing the radio emission from the Milky Way (the Milky Way is really annoying for us); removing all the dust emission, again from our lovely Milky Way, which is of course of interest in itself for other communities that study it; all in order to unveil the cosmic microwave background. And this was the result; you might have seen this map, it was advertised on the front page of basically all newspapers. What we actually get, of course, is not the map itself (something terrible happened to the slides there). What we get is time-ordered data. That's an example of three minutes of raw data from the satellite, and most of the analysis is really in processing this data. For that we use several clusters all over the world; the main data processing centers are in Italy and France, both for Planck and for Euclid, and they collect terabytes of data. For the next generation of
experiments, we really expect, also from radio telescopes, on the order of terabytes per minute of arriving data. So all this information comes from the satellite, arrives at the mission operations center in Germany, is transferred to Italy and France, where the data processing centers are, and is then transferred to the whole community, again around the world in different institutes. Different groups clean up the data and extract these maps, and there are challenges between the groups to understand which one performs better. We then project the maps onto spherical harmonics, to identify the dependence on the different angles at which we are looking. For this whole process there are, as was actually mentioned in the previous talk, lots of different codes by different people, written in different languages: for the extraction of the maps, for example, many of them are in IDL and use HEALPix (IDL, unfortunately, is not even open source); there's lots of Python, lots of C/C++, some MATLAB. From all of that, from terabytes of data, we can extract the power spectrum. On the y-axis you see again the temperature perturbations, the temperature differences at different scales, as a function of the spherical harmonic multipole, so as a function of the angular scale: this side is very large angular scales, and this side is very small, tiny angular scales. Then there is a whole processing step where we try to compare this with theoretical models and fit them, and the fit I'm showing you is exactly the one that corresponds to the pie chart I showed you before. The thing is (I can probably show you this here; you can find it online) that depending on the amount of atoms, of dark matter, of dark energy that you put in, you get different kinds of predictions, different kinds of
curves. For example, if you have 100% atoms, only atoms, then you get this kind of curve which, as you can see, does not fit the data. In order to fit the data you need to decrease the amount of atoms and add dark matter, and as you see all the predictions change; there are other parameters too, for the expansion, for when reionization takes place, for the initial conditions; and finally, if you have about 70% dark energy, then you can actually match the data. Now, obviously we don't do it this way: we have to analyze the whole region of parameter space and use several tools. There is a whole collection of tools available on the NASA website, which I'll show you here; these codes are all open source and available, and you can play with them. For the future missions, in particular, the whole ground segment and also the forecasting activity that I lead have chosen Python as their recommended language, so most of it will be in Python. For the actual interpretation, at least of the Planck data, you of course need simulations: we use Markov chain Monte Carlo to compare the data with the whole parameter space of the predictions of the theoretical models, and we use a Bayesian analysis, so we build chains that reconstruct the posterior distribution, the probability of that model given those data. We have several tools for that, but there is one in particular that I want to mention because it's written in Python and it's open source: it's called Monte Python, a Monte Carlo code written in Python. You can find it on GitHub, there's documentation, and the main developers are Benjamin Audren and Julien Lesgourgues, plus many, many others. This, for example, will also be used for the forecasting activity in Euclid.
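The Markov chain Monte Carlo idea behind codes like Monte Python can be sketched in a few lines. This is a toy Metropolis-Hastings sampler for a single-parameter Gaussian posterior (my own illustration, not Monte Python's actual implementation; the numbers are made up):

```python
import math
import random

# Toy Metropolis-Hastings sampler: a chain whose stationary distribution
# is the posterior P(theta | data). Here the "posterior" is a Gaussian
# with mean 0.7 and width 0.05 (think: the dark energy fraction); the
# real codes sample tens of parameters against real likelihoods.
def log_post(theta):
    return -0.5 * ((theta - 0.7) / 0.05) ** 2

random.seed(42)
theta, chain = 0.5, []
for _ in range(20000):
    proposal = theta + random.gauss(0.0, 0.05)   # symmetric random step
    delta = log_post(proposal) - log_post(theta)
    if delta > 0 or random.random() < math.exp(delta):
        theta = proposal                          # accept the move
    chain.append(theta)                           # record current state

burned = chain[2000:]                             # discard burn-in
mean = sum(burned) / len(burned)
print(f"posterior mean ≈ {mean:.2f}")             # should be close to 0.7
```

The real analyses run many such chains in parallel for days, with proposal steps tuned to the covariance of the parameters.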
Now, all this requires dealing with complex data, and also combining data that comes from different sources, from different experiments, which sometimes look at different things, at different parameters. You have to deal with several free parameters: the ones that describe your cosmology (the amount of matter, the amount of dark energy, the expansion rate and so on), on the order of 10 parameters per cosmological model, plus about 10 to 100 parameters that describe the instrument and all the systematics involved. So we need to sample parameter space very efficiently, and there are different possible samplers in use, also integrated in Monte Python. For a long time people have used, and are still using, a code called CosmoMC, which was written in Fortran 90 and part of which has by now been swallowed into Python; Monte Python is a more recent alternative, for the moment written in Python 2. It is of course much more concise than the previous code, it allows running much more stable Monte Carlo chains for days while investigating parameter space, and it has a much more modular structure. It has to interface with different codes: codes that deal with the data from different experiments, different samplers in parameter space, and the codes that solve the actual equations that describe the universe from the Big Bang down to us. All these modules, sometimes written in different languages, are integrated within Monte Python. Here is a sort of schema of the modularity of Monte Python: this part, for example, is integrated with CLASS, a code in C that solves the whole evolution of the background; everything at the top comes from different data sources; and then there are the different samplers here on the right-hand side. Monte
Python has recently also been put on Binder: if you go, for example, to the link at the top, you can see part of the CLASS GitHub repository transformed into IPython notebooks, and you can play with it; it includes examples and a repository of previous results. What is not yet optimal in Python, at least for this project, for us, is mainly that it's slow for what we need to do for the forecasts, in some parts. The previous code, written in Fortran, is integrated with OpenMPI, and one can run lots of different chains simultaneously and run everything on a grid, so that you investigate a large fraction of parameter space very quickly. This is not yet integrated into Monte Python: Monte Python uses mpi4py, but of course it would be much more useful to have something like OpenMPI. So if you have any ideas on how to improve that, we need input from the data science community. Python is used a lot, instead, in all the codes for the analysis of the chains and for plotting the posterior credible regions: all the regions in parameter space that identify how big each parameter can be. That's one example, and that's another example, of the plots we usually look at, sort of 3D plots produced with Python, also combining different experiments. This, for example, is one of the results I obtained when comparing data from Planck with general relativity: this cross corresponds to the model represented by standard general relativity, and you see that while Planck, the blue contours, is still roughly fine with general relativity, there is some tension when you combine Planck, so information from the early universe, with information from the late-time universe, with other probes, from surveys of
galaxies. Basically, you just have to look at the red contours, which combine different data sources from different experiments, information from the early-time and the late-time universe: their combination prefers theories which modify gravity with respect to general relativity. Of course this is only at the 3.5 sigma level, so it's not what you would call a detection, but it is something that will be very interesting, something we will be able to probe with the future generation of experiments, which will have much higher resolution.

In addition, we can produce maps like this one: a full-sky map of the polarized emission from the dust of the Milky Way. It looks like an impressionist painting, but it's really the polarization of the light emitted by the dust of the Milky Way, and it matters because, in a way, it's a foreground. The point is that gravitational waves, which we heard about yesterday, can also leave an imprint on the polarization of the CMB, this light that gives us a picture of the early universe. This indirect detection of gravitational waves through the CMB has not happened yet; we haven't seen it. But for the first time we have a full-sky map of other sources that can mimic the same kind of signal, and in the next months there will be new data, from the ground and from balloons, looking at the polarization of the CMB and trying, again, to detect gravitational waves in this way too.

So there is really a revolution coming in the next 5 to 10 years to unveil the dark universe. It's a huge challenge, a technological challenge, a big data challenge: we have terabytes of data coming per day at the moment, and terabytes per minute in future radio telescopes. There have been a lot of
investments already from national funding agencies, from ESA and NASA, to understand this problem, and again it's a big data challenge, and we want to join different communities to get the best scientific return. It's not just about one person working somewhere in some office; it's about joining expertise, because this will, in a way, and that's a bit drastic, determine the future of our universe: whether everything will be destroyed, whether we will expand forever, or whether we will collapse again through gravity. And this depends on how much of this dark energy there is in the universe. So overall we really want to make sure we look at the big picture and join expertise from different fields to understand exactly what we are actually observing. Thank you.

Moderator: So, questions?

Q: First of all, thank you, it's an excellent talk, really exciting. The background radiation picture you put up is not uniform. Why is that? And to me it looks like clouds in the sky, almost fractal. Does that mean anything?

A: Okay, that's a very good question. Let me show you the map again. If you look in different directions it's mainly isotropic and homogeneous, in the sense that it has a mean temperature, more or less everywhere, of about 3 kelvin. But what we are really interested in is exactly the anisotropy, the differences in temperature. This is what is mapped here: tiny, tiny differences in temperature with respect to the mean. These tiny differences are due to the fact that in the very early universe there were very tiny density perturbations, very tiny differences in space. It's really about the initial conditions of the universe: just after the Big Bang there was a phase of very fast expansion, called inflation, in which very tiny differences in density, in which some matter was more in some place and less in
some other places, were stretched to macroscopic scales, and this affects the temperature of the radiation that was emitted at that time. So what we really see here is a picture of the initial conditions of the universe, a picture of the universe as it was 13 billion years ago. That's the farthest we can go, up to now.

Q: That's surprising, isn't it? Sorry, it's surprising because, how can I put this, we would expect things to be uniform unless we had additional information. In some sciences, for example, we use probability, or we assume equality, homogeneity, isotropy, if we don't know otherwise. So I guess, does this say that people are working to find out why there were differences in matter density? It seems surprising to me.

A: Yes. Usually, when you solve the evolution of the universe, you assume homogeneity and isotropy, and then you treat these as linear perturbations around a mean homogeneous and isotropic background. And yes, I think that's pretty amazing.

Q: One more question. You mentioned before that you were looking for ways to accelerate some inner loops and computations. Have you tried Numba, for instance, or any other solution like Cython?

A: Cython, yes. It's already used, for example, to wrap modules in Monte Python which deal with the data and the likelihoods, and the codes which actually solve the evolution of the universe, so that's already used a lot. The main problem is that the region of parameter space is really huge, especially if you want to test models beyond general relativity, which are still absolutely allowed by the data, so it's really the process of sampling that should somehow become faster.

Moderator: Okay, thank you. If you have more questions, I invite you to contact Valeria directly outside. Thank you very much.
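On that closing exchange about speeding up inner loops: besides Cython and Numba, a common first step in Python is vectorizing likelihood evaluations with NumPy, so the per-proposal loop leaves the Python interpreter. A toy illustration (my own sketch, not Monte Python code; the data values are made up):

```python
import numpy as np

# Toy chi-square log-likelihood evaluated for many parameter values at
# once. Vectorizing over proposals removes the Python-level inner loop
# that Cython/Numba would otherwise be used to compile away.
data = np.array([0.9, 1.0, 0.94, 1.04])   # fake measurements
sigma = 0.1                               # assumed measurement error

def loglike_one(theta):
    # Scalar version: the kind of inner loop that dominates runtime.
    return -0.5 * float(np.sum(((data - theta) / sigma) ** 2))

def loglike_many(thetas):
    # Vectorized: one call evaluates the whole batch of proposals.
    r = (data[None, :] - thetas[:, None]) / sigma
    return -0.5 * (r ** 2).sum(axis=1)

thetas = np.linspace(0.5, 1.5, 101)
batch = loglike_many(thetas)
# Both versions agree; the vectorized one is the fast path.
assert np.allclose(batch, [loglike_one(t) for t in thetas])
print(f"best theta on the grid: {thetas[np.argmax(batch)]:.2f}")
# → best theta on the grid: 0.97
```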