Okay, I think we are live, yes. So hello everyone and welcome to another session of the Latin American Webinars on Physics. I'm Joel Jones from the PUCP in Peru and I will be your host today. This is webinar number 143 and we're having Miguel Romero as a speaker. Miguel did his PhD at the University of Southampton in the UK, followed by a brief postdoc at the same university. He then abandoned us for a couple of years, working as a data scientist and machine learning engineer at a startup in Cardiff. But fortunately for us, he came back to academia as a postdoc at LIP in Portugal. He's currently back in Southampton as a visiting researcher, while also working part time as a data scientist in industry. So if you think that you were busy, think again. Miguel will join Durham University as a postdoc in October, and today he will tell us about artificial intelligence and machine learning applied to particle physics. We are very happy to have him as a speaker. Before we begin, as usual, let me remind all viewers that you can ask questions and make comments via the YouTube live chat, and these questions will be passed on to Miguel at the end of his talk. Okay, so Miguel, if you could please share your screen, you're ready to go.

All right, thank you very much for the nice introduction. Indeed, I'm going to be talking today about artificial intelligence and machine learning for collider and beyond the Standard Model physics. This is the rough outline: I'm going to start with a very brief introduction to machine learning and its context in HEP, and then we will move to two lines of research that I've been conducting — one on collider physics, with a very specific focus on jet quenching by the quark-gluon plasma, and the other one, which is closer to my heart and something I'm quite keen on progressing, which is to use machine learning and AI for BSM model building, namely applied to parameter space scans. So, machine learning in HEP. Before we start with machine learning in HEP, let's talk about what machine learning and artificial intelligence are. Artificial intelligence is the quest of creating machines that think and act intelligently. Alan Turing very quickly realized that a machine, in order to exhibit some type of intelligence, has to learn from experience. His seminal paper, "Computing Machinery and Intelligence", has seven sections, and the seventh is about learning machines. Machine learning is then a subfield of AI that concerns itself with how a machine can learn from experience or, the way we see it in a more pragmatic way, how a machine can learn to perform tasks. I think the most intuitive way of thinking about it is to think of machine learning as a different type of programming, nothing more. In classical programming, a programmer inputs rules into the code: you write code, you compile it, and you have a binary that processes inputs, which is data, to give you answers. So in classical programming you input the rules, you create a program, and the program transforms inputs into answers, into outputs. In machine learning you have a different paradigm of computing: you already have a lot of data, which contains the answers, and you do not know the rules that produce the answers you are seeing in the data. For example, these can be labels of pictures.
This can be whatever type of labelled data you might have lying around. So what you do is develop a machine learning algorithm that will try to derive the rules from the data, and once it has produced the rules, that is the equivalent of having been compiled: you then have a new program that is able to process new data to produce answers. So machine learning is a different paradigm of programming, where the rules are not inputted by the user but are found out by the machine. Now, it's very important to understand the current scope and capacity of machine learning. The current paradigm of learning is that of statistical learning. This means that machines can learn functions over the distribution of the data. What does this mean? Your data is by definition finite, and by being finite it has a compact support over the variables. Therefore the only functions that the machine will be able to learn are within those bounds, the compact support of your data. As a result, machine learning algorithms tend to be incredible interpolators and not so good extrapolators. Another feature, or bug, is that the current paradigm of learning requires large amounts of data, far more than what a biological intelligence needs — up to some discussion on that subject. In HEP there has been a huge surge of interest in machine learning and AI applications, and I'm not going to review all of it; it is literally impossible. Thankfully for us, during the last three or four years of this resurgence, someone decided to create a living review that collects the references into a topical organization. So if you are interested in a specific topic, you can go to this website, press Ctrl-F or Cmd-F, use a keyword of something you're working on, and chances are that someone has already written a paper on that subject. Obviously it might not have been exactly your idea, but it is a very good way of doing a very quick review. So I'm not going to cover everything that the community is doing, but I can still self-promote my own work. Apart from what I'm going to be talking about today, I've also been working very closely with experimentalists, namely the ATLAS collaboration, on how to develop machine learning algorithms that could be generic discriminants for new physics — things that would not be trained on specific signal hypotheses — and also playing around with clustering of collider events. And I am a PI for a project on quantum machine learning in HEP. Apart from AI and machine learning, you may also have noticed that another technology that has gained a lot of interest both in industry and in academia is quantum computing, and quantum machine learning is nothing else than performing machine learning tasks on a quantum device; there is a paper about to come out from that project. Now I'm going to talk about some of the work that I've been doing more recently. I'm actually going to start with our first work on the matter, and then progress to our most recent one and something that we have in the oven. So I'm going to talk about the collider studies for jet quenching. First of all, what is jet quenching? When you collide heavy ions at really high, relativistic energies, you can, for a very brief moment in time, create something called the quark-gluon plasma.
The quark-gluon plasma is a colorful state of matter, very hot and dense, where quarks and gluons are actually free, and so they are the relevant degrees of freedom — it is like a plasma. This was first observed at the Relativistic Heavy Ion Collider, and now at the LHC as well. During the brief time that the plasma exists, jets produced by hard scattering processes within the plasma have to traverse it for a short period of time. This leads to jet quenching, which is the modification of jets by the quark-gluon plasma. Therefore jets can be used as probes of the quark-gluon plasma: you can think of this as using the LHC as a microscope for the quark-gluon plasma, where the jets are what the microscope is picking up. Obviously this is difficult. Here we have two heavy ions colliding, seen from the side — they look like pancakes because of Lorentz contraction, right? But if you look along the beam line, you have the quark-gluon plasma being generated, and some hard-scattered partons from the nucleons will create jets. Eventually the quark-gluon plasma starts to cool down and hadronization starts to take place. At the end of the day you only have hadrons, and the hadrons form the jets that deposit signals in your detector. So in order to study the quark-gluon plasma at colliders, it is very important to be able to isolate the jets that were modified by the quark-gluon plasma. Now, it is important to understand that not all quenching, not every interaction between the jet and the quark-gluon plasma, is made equal. As the jet traverses the quark-gluon plasma, there is four-momentum and color exchange with the medium, and so at very early stages the parton shower is changed, modified. Later stages of the jet will, in principle, have imprinted in them the changes to the early history of the jet. So the very intuitive picture that you normally have for hard scatterings in proton-proton collisions, like in this picture, breaks down, because what you have is something more like the right-hand side: you have a collision and then some jets traverse the quark-gluon plasma for longer distances or longer periods of time and others do not. You can also have jets that traverse just the rim, some the center, et cetera. So you have this capacity of using quenched jets as a multi-scale probe of the quark-gluon plasma, because all of these modifications are not equal. Some of the jets, even in heavy-ion collisions, will be very vacuum-like — they will not interact with the quark-gluon plasma — some will interact a bit, some will interact a lot. So you don't have a discrete, binary yes-or-no of whether it interacted; what you have is vacuum-like jets and jets that interacted by different amounts. And what do we actually have at experiments? Experimentalists can only see jets. You have here two event reconstructions, at ATLAS and CMS, of back-to-back dijets reconstructed very cleanly — the pile-up and so on has presumably already been removed, so you have two very clean jets. Additionally, people approach this problem with global observables of the jets.
For example, mass distributions; and in fact, that's how jet quenching by the quark-gluon plasma was first observed, through something called the R_AA, which measures the difference between the pT distributions in heavy-ion and in pp collisions. However, because, as I said, the initial history of the jet happens inside the quark-gluon plasma in heavy-ion collisions, the branching patterns will change, and so the jet will modify its history as it develops and hadronizes. In principle, there will be information about the interaction with the medium imprinted in the jet substructure. So we need to start looking inside the jet — not only at the jet as a whole, with its global properties, but at the constituents of the jet and its substructure. And then the big question is: is there more information inside the jet that the current state-of-the-art substructure observables do not capture? This led us to our first work on this topic, with Liliana and Guilherme, who are our two QCD theorists and phenomenologists, Huta and Nuno, our two ATLAS experimentalists, and Felipe, who was doing a masters at the time with Guilherme. And we used deep learning. So, another jargon word that I have to introduce: what is deep learning? Deep learning is just a subset of machine learning, a subclass of machine learning algorithms that trains something called neural networks. I'm not going to introduce them here — I have backup slides if you need a bit of an introduction to what deep learning is — but they are very powerful machine learning algorithms, and you must have heard about neural networks and deep learning all over the news, things like ChatGPT, diffusion models, et cetera. All of the most powerful modern machine learning models are based on deep learning. The idea was borrowed, or followed up, from Schwartz and collaborators, which is to use the calorimeter deposits of the jet as images. If you think about a collider experiment, it's a cylinder with calorimeter cells where the jet deposits its energy in the hadronic calorimeter. So you actually have a natural grid, you can treat the grid as pixels, and you can unfold the cylinder into a square image, right? This is what they did initially in a study to try to separate gluon-initiated from quark-initiated jets. And just as an ordinary image has three color channels — blue, green and red — they do something similar: they use three channels. As the first channel they use the momentum of the charged particles, as the second channel the momentum of the neutral particles, and as the third channel the multiplicity of the charged particles. Obviously, not all of this information would be accessible in an experimental setting, but it's still a phenomenological study of how one could do this. They use something called a convolutional network, which is basically the off-the-shelf deep learning architecture for image processing — essentially everything in computer vision with deep neural networks is done with convolutional networks. What you have is that you feed in the image and then you create some convolutional maps, which act like filters whose receptive fields parse the image, and you do this progressively until you have a vector representing the image. And then you try to classify between a quark and a gluon.
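To make the jet-image idea a bit more concrete, here is a minimal sketch — not the code from any of the papers mentioned — of how such an image could be built from a list of jet constituents with numpy; the grid size, jet radius and channel choices are illustrative assumptions.

```python
import numpy as np

def jet_image(pt, deta, dphi, n_pixels=33, half_width=0.4, normalise=False):
    """Build a pixelated 'jet image' from constituent kinematics.

    pt, deta, dphi: arrays with one entry per constituent, with deta/dphi
    measured relative to the jet axis.  Returns an array of shape
    (2, n_pixels, n_pixels): channel 0 is the summed pT per cell, channel 1
    the particle multiplicity per cell (one could equally split channels by
    charged/neutral particles, as in the quark/gluon study described above).
    """
    edges = np.linspace(-half_width, half_width, n_pixels + 1)
    pt_img, _, _ = np.histogram2d(deta, dphi, bins=[edges, edges], weights=pt)
    mult_img, _, _ = np.histogram2d(deta, dphi, bins=[edges, edges])
    image = np.stack([pt_img, mult_img])
    if normalise:  # normalised variant: each channel sums to one
        image /= np.clip(image.sum(axis=(1, 2), keepdims=True), 1e-12, None)
    return image
```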
And by doing so, under the same conditions, they got state-of-the-art discrimination between quark-initiated and gluon-initiated jets. Wait, I went one too far. So we continued this idea for our specific case, which is to try to discriminate jets that were produced in vacuum from jets that interacted with the medium. How did we do this? We produced samples of Z plus jet events — these are simulated samples — where the Z and the jet are back to back. The only difference between the two samples is whether the medium is there or not, so whether there is vacuum or not. Everything else is the same: the same hard scattering processes, the same underlying Monte Carlo event generator, the same hadronization, and all the analysis steps are exactly the same. So you prepare a sample that plays the role of proton-proton collisions and a sample that plays the role of heavy-ion collisions. You're going to see that the nomenclature changes a bit, between vacuum, proton-proton or unquenched, and medium, PbPb or quenched jets — I'm sorry about that, but the community is still not completely settled on what to call what. So we prepared these two samples, and for some of the global jet variables you can see that there are differences between medium and vacuum jets. We do know that jets which traverse the quark-gluon plasma tend to lose energy, and you can see that for the medium the pT is shifted to the left compared to the vacuum. They also tend to lose constituents — again, medium shifted to the left compared to the vacuum sample. For the Z plus jet sample we also have a very nice variable, x_JZ, which is the ratio of the pT of the jet over the pT of the Z. The Z is colorless, so the quark-gluon plasma is completely transparent to it — the Z traverses the quark-gluon plasma without interacting — but the jet does not. So in vacuum you would expect this to be peaked at one, almost a Dirac delta. You don't get a perfect Dirac delta because there are jet reconstruction errors and so on, and there is radiation that can fall into the jet. For the medium you can see an obvious suppression of the pT of the jet relative to the pT of the Z, so there is a shift to the left. We are not going to use this variable as an input; it is, let's say, not a ground truth, but our indication of quenching or not. Okay? So this is a very clean phenomenological setup for this study. What did we do? We did the images, something very similar to what the other guys did, but we only used two channels: the pT and the multiplicity at each cell of the calorimeter. We then used two variations of the images: one which is not normalized, so the pixel actually contains the absolute magnitude of the pT that landed in that calorimeter cell, or the number of particles that landed there, and another which we normalized, so that for both channels the sum of all the pixels equals one. This is to try to understand to what extent you can still derive information which is independent of the pT, because notice that the pT is quite discriminant by itself; we want to understand if there is discrimination beyond the pT. Then we also had something called the Lund plane coordinates. Once you have the jet, which was produced by the jet clustering, you can then recluster it.
We use what's called a Cambridge-Aachen reclustering, which gives you a clustering sequence ordered by angle. And QCD, namely the parton shower, is angular ordered, so this gives a very principled ordering for QCD processes at the level of the parton shower. The Lund plane coordinates are then the values of, for example, pT and delta R for each of these splittings. You have here an example of what a Lund plane coordinate sequence would look like: you're going to have different points, and different distributions between vacuum and medium. Finally, just to understand how much the pT of the jet and the number of constituents of the jet contribute to the discrimination, we also trained a tabular discriminant on just these two variables. For the images we used a convolutional network, just like Schwartz et al. You can see here that we have two channels: the pT — here you can see the energy deposits of the jet in the different calorimeter cells — and the number of constituents. We then have four layers of convolutions and finally, once the image has been turned into a single vector, we try to guess whether it came from the vacuum sample or from the medium sample. We do this for the normalized and the non-normalized images. For the Lund plane we use a recurrent neural network. A recurrent neural network is a network that reuses its weights: you can give it time steps, as if there were time steps, and it reuses the same bits of information over and over again as it tries to understand what is happening. For this network, we present the jet as the splittings of the Cambridge-Aachen reclustering sequence, via the coordinates in the Lund plane. It can eat jets of different lengths — you can have jets with only a few constituents or very big ones, it's irrelevant, because in the end it compacts everything into a single prediction of whether the jet is from the vacuum or the medium sample. And then, as I said, just a very simple tabular network for those two global variables, the jet pT and the number of constituents. What is important to understand is that different architectures embody distinct assumptions and biases about the data. Images assume that the information is encoded in a grid — which for a calorimeter is the case — with a compositional bias. How? Because the filters, the receptive fields that traverse the image, build features from the local to the global aspects of the image; that is the compositional bias. The Lund plane representation assumes that there is significance to the order, so that the variables at step t depend on the variables at step t minus one, and we ordered them as a Cambridge-Aachen clustering sequence because there is a QCD motivation for the angle-ordered splittings of a Cambridge-Aachen clustering sequence. Whereas for the tabular data, there is no assumption besides the fact that the information is there, and that's it. So what do we get? We get good discrimination. It's a very difficult problem — we were already more or less prepared for a difficult problem, because this has been a big discussion in the QCD community. What we find is that we get very good discrimination.
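Before moving to the results, here is a minimal PyTorch sketch of a two-channel convolutional classifier of the kind just described (vacuum versus medium). The number of filters, layers and the image size are illustrative assumptions; this is not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class JetCNN(nn.Module):
    def __init__(self, n_pixels=33):
        super().__init__()
        # Two stacked convolution + pooling blocks acting on the (pT, multiplicity) channels.
        self.conv = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        flat = 32 * (n_pixels // 4) ** 2
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(flat, 64), nn.ReLU(),
                                  nn.Linear(64, 1))  # single logit: medium vs vacuum

    def forward(self, x):           # x: (batch, 2, n_pixels, n_pixels)
        return self.head(self.conv(x))

model = JetCNN()
logits = model(torch.randn(8, 2, 33, 33))   # a batch of 8 toy two-channel jet images
# Training would minimise nn.BCEWithLogitsLoss() on labelled vacuum/medium images.
```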
This is what's called a ROC curve, a receiver operating characteristic curve. Basically, once you have the trained model you have the distribution of its output, and for different cuts, different thresholds on the output, you can compute the false positive rate and the true positive rate that you are getting. Each of those points goes in this plane, and you connect them with a line — that is the ROC curve. A perfect classifier would have area one, it would just be a square, and a random classifier has an area under the ROC of 0.5, which is this dashed line. What you can see is that the best discriminator is the unnormalized image, closely followed by the Lund recurrent network. But you can see that the global one, which only has the pT of the jet and the number of constituents, has a ROC curve area of around 73%, so a lot of the discrimination actually comes from the absolute scale of the pT and the number of constituents. You can also see that the normalized images still have an area greater than 50%, so there is something being learned that goes beyond the information about the pT and number-of-constituents scale of the jet. To check that this is actually separating modified from unmodified jets, we then performed a cut at different acceptance efficiencies. For example, here we have some of the observables we talked about, we put all the jets together, and then we cut at different levels of acceptance of medium-like jets. What you can see is that the vacuum-like jets, the jets which did not pass the cut, always follow the distribution of our vacuum sample. This means that what is being rejected is vacuum-like, whereas what passes the cut is even more different from the vacuum than the medium sample was before: all the open blue symbols that you see here are further away from the vacuum than the medium sample. So we are indeed able to remove vacuum-like jets from a sample of heavy-ion collisions. Now, we went directly to deep learning, but you could have asked: wait a second, maybe you should first have studied the jet substructure observables that you already talked about. And in fact, in hindsight maybe that should have been the first paper, but it was the follow-up. This was a paper by me, Guilherme and Marco — Marco van Leeuwen, whom you might recognize as the current spokesperson of ALICE. We had this question: what can the jet substructure observables actually give us in terms of separating quenched from unquenched jets? This is a very recent paper, I don't know if you noticed — it's from this month — and I've only put together a couple of slides so that this presentation doesn't go too overboard. We also have the code and the data available; this is something we were very careful about, making sure the entire paper is reproducible, with instructions on how to reproduce our results. Here we did a survey of a comprehensive set of jet substructure observables. Some of them are high level, global observables — for example the number of constituents, the mass of the jet, et cetera — but also substructure ones.
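Coming back to the ROC curves shown a moment ago, here is a minimal sketch, using scikit-learn, of how such a curve, its area, and a cut at a chosen medium-acceptance efficiency can be computed from classifier outputs; the toy arrays are placeholders for the real model scores and simulation labels.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

labels = np.array([0, 0, 1, 1, 1, 0, 1, 0])                    # 1 = medium, 0 = vacuum (toy)
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.3])   # classifier outputs (toy)

fpr, tpr, thresholds = roc_curve(labels, scores)   # one (FPR, TPR) point per threshold
print("area under the ROC curve:", roc_auc_score(labels, scores))  # 0.5 random, 1.0 perfect

# Pick the threshold giving (at least) 80% acceptance of medium-like jets,
# then apply the cut to select the medium-like population.
target_eff = 0.8
cut = thresholds[np.argmax(tpr >= target_eff)]
medium_like = scores >= cut
```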
Among the substructure observables we have the angularities, defined as shown on the slide, which more or less try to capture the girth — how the constituents inside the jet are distributed around the jet axis, in the plane transverse to the jet axis — but also N-subjettiness, which is an attempt at measuring how many subjets the jet has, the jet charge, and then variables that you get from grooming procedures. For soft drop grooming there are some intuitive ones: the radius of the splitting at which the soft drop condition is first satisfied, the momentum fraction at that splitting, and the number of such splittings. We also studied a more recent grooming approach, dynamical grooming, by Alba Soto-Ontoso and collaborators, which is quite non-intuitive: basically there is a quantity that you can compute along the jet, and once you find its maximum, that is where you say the jet has to be split, and you can then extract not only the value of this kappa but also the r and the z at which that split occurs. These are quite high level, and there hasn't been that much work on them because the approach is actually very recent, from 2020. I also have to say I'm very, very thankful to the authors for providing their code and for allowing me to use it in this analysis, okay? So what did we do? We did an extensive study with three different machine-learning-based analyses: one studying the linear correlations, also using principal component analysis; another using an autoencoder, which tries to capture non-linear relations between the observables; and finally, trying to understand which combinations of these observables would maximize the discrimination between the quenched and unquenched samples. Here you can see that I'm not using vacuum and medium, I'm using quenched and unquenched — different collaborators use different names. The sample is very similar to our previous case, but in this case it is actually more difficult to discriminate because, as you can see here, the distribution of the pT is the same for both cases. We actually used the dijet sample, and once you apply the cuts, the distribution of the momentum is the same. So here we completely factored out the dependence of the discrimination on the momentum; we are really just interested in the impact of the substructure observables on discriminating between the two cases. Our first discovery was that the vast majority of these observables are all highly correlated. As you can see here, you can cluster them through their correlations — this is basically a clustering along the correlation matrix of these observables (a minimal sketch of this kind of clustering follows below). Unsurprisingly, the phi and the rapidity of the jets are uncorrelated with anything else — that's normal, we already knew that — but then almost everything else is basically an angularity-type observable, and the charge of the jet also stands out separately. These patterns are very similar for both cases: some variables become less correlated with the main bulk of variables once you turn on the medium, but generally speaking most of the variables are highly correlated. What we also found is that you basically saturate the discrimination potential by using just a pair of observables.
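Here is the promised sketch of how observables can be clustered by their correlations, using pandas and scipy; the toy data frame stands in for the real per-jet observable table, and the column names and threshold are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy stand-in for the per-jet observable table (one row per jet, one column per observable).
rng = np.random.default_rng(0)
girth = rng.random(1000)
df = pd.DataFrame({
    "girth": girth,
    "angularity_2": girth * 0.9 + 0.1 * rng.random(1000),  # strongly correlated with girth
    "jet_charge": rng.normal(size=1000),                    # uncorrelated
})

corr = df.corr()                           # linear (Pearson) correlation matrix
dist = 1.0 - corr.abs()                    # highly correlated observables -> small distance
np.fill_diagonal(dist.values, 0.0)
Z = linkage(squareform(dist.values, checks=False), method="average")  # hierarchical clustering
groups = fcluster(Z, t=0.3, criterion="distance")                     # cut the dendrogram
for name, g in zip(corr.columns, groups):
    print(f"{name}: cluster {g}")
```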
To check this we trained a full BDT; the BDT serves as a reference for how much discrimination you get if you just throw all of the observables at it. So we trained a boosted decision tree with all these observables and we got a ROC area of around 70% — notice that this is a lot smaller than in our previous paper, because this is a more difficult problem. And once you reduce to just some of the variables, even just cutting directly on some of the most discriminant ones, you basically get exactly the same rejection efficiency, for the same cuts, as the BDT would give you. In fact you can see, for example here, that once you cut on some of these variables you get exactly what you would get from the BDT. We were quite surprised. Another thing that you notice: here we have the histograms for unquenched and quenched — vacuum and medium — for two observables that we identified, the r_z, which is just the mean aperture of the jet weighted by the pT fraction, and the kappa, which is that quantity from the dynamical grooming of Soto-Ontoso and collaborators. And you can see something interesting, which is that there is a population migration along the distribution: you start with vacuum here, and once you have the medium it moves over here. What we found is that the correlations between observables actually seem to survive the presence of the medium, which is something we were not completely expecting. So the population migrates along these lines, but you can still see here where you could then focus to find jets that are more medium-like than vacuum-like. What am I working on now? Still with the same collaborators, and also with a student, we are trying to go beyond the two papers that we did before. In the two previous papers we had different biases: in the deep learning architectures we had the bias of the image, which is a compositional bias from local to global, and the bias of the recurrent neural network on the sequence of Lund plane coordinates; in the last paper we had the bias of the observables themselves — when you use high-level observables proposed by a theorist or a phenomenologist, you are assuming they are interesting just because they were proposed by a human being. So we tried to go beyond that. What is the least biased study that we can do? There is this architecture in the machine learning literature called a transformer, which uses a multi-head attention mechanism inside a transformer block. It comes from a seminal paper called "Attention Is All You Need", and it is the backbone of all the large language models that you have seen in the news lately, like ChatGPT. And it is actually invariant under input permutations. What does this mean? It means that you can feed this neural network the jet as a set of its constituents, without any specific ordering. In fact, for natural language — for ChatGPT and so on — they actually have to add a positional embedding so that the transformer has information about the position of things. But here we don't want that. We want to train a neural network that takes the jet as a set of its constituent four-momenta — so we're talking about the pT's and the distances to the jet axis — and tries to separate vacuum-like from medium-like jets, on the same dijet data set that we saw before, the one which is very difficult.
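Here is a minimal PyTorch sketch of such a permutation-invariant transformer classifier acting on the jet constituents; because no positional encoding is used and the constituents are pooled by averaging, the output does not depend on their ordering. All sizes are illustrative assumptions, not the architecture of the study.

```python
import torch
import torch.nn as nn

class JetTransformer(nn.Module):
    def __init__(self, n_features=3, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)   # single logit: medium vs vacuum

    def forward(self, constituents, pad_mask=None):
        # constituents: (batch, n_constituents, 3), e.g. (pT, d_eta, d_phi);
        # pad_mask: boolean (batch, n_constituents), True for padded entries.
        h = self.encoder(self.embed(constituents), src_key_padding_mask=pad_mask)
        if pad_mask is not None:
            h = h.masked_fill(pad_mask.unsqueeze(-1), 0.0)
        return self.head(h.mean(dim=1))  # average pooling keeps permutation invariance

model = JetTransformer()
jets = torch.randn(8, 50, 3)                  # 8 toy jets, up to 50 constituents each
mask = torch.zeros(8, 50, dtype=torch.bool)   # True would mark padded constituents
logits = model(jets, pad_mask=mask)
```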
These are preliminary results, but they are incredibly good results, okay? We have, for the first time in the community, evidence that it is possible to completely isolate modified jets. When Guilherme and I started doing these papers a few years back, he was convinced, or at least very suspicious, that it was not possible to completely remove vacuum-like jets from a sample coming from heavy-ion collisions. And in this work we actually have the proof, or at least the first evidence. Obviously there are many caveats: this is simulation, it is not at the same level as an experimental analysis, it is a phenomenological study. But even at the phenomenological level we can completely isolate modified jets. For comparison we also used a BDT, but this time around I optimized the BDT, so instead of 70% it has 71% of area under the curve, and you can see that our transformer jumps to 84%. This is a huge difference, because it gets exponentially more difficult to approach the top-left corner of the ROC. This is such a good result that we are double-checking everything, so it might take some time — even though I already have plots with LaTeX fonts, we still have a few weeks of work ahead — but hopefully within the next two months we will put this on the arXiv. That is our expectation, before the summer at least.

Okay, and now I'm going to completely change topic. For those of you who like collider physics more than BSM model building, you have already got the most out of this talk; for those of you who prefer model building and don't care much about collider physics, the rest of the talk is for you. My PhD was actually on model building and string phenomenology, so this is a lot closer to my heart. And whereas in the previous line of work I was collaborating with QCD theorists and phenomenologists who were interested in data-driven methods and approached me, this one is something that actually came from my own ideas. For those of you who are used to creating BSM models and then having to validate them against a very large array of constraints, this is what you normally do. You sample some point from your parameter space, you have your computational routine — for example SPheno, micrOMEGAs, CalcHEP, whatever you need — that computes some observables, and then you check against the experimental constraints: either the point passes the constraints and you keep it, or it doesn't and you throw it away. This can be computationally time consuming, as you all know, and for highly constrained problems the sampling efficiency can be prohibitively low, to the point that eventually you give up on using all the constraints and start simplifying your problem: you start using alignment limits, you start looking around corners of the parameter space where you already know some points exist, you start using fewer constraints, and then you say that in future work you will address them, even though you're not completely sure how, et cetera. A minimal sketch of this traditional workflow follows below.
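This is roughly what the usual rejection loop looks like; `compute_observables` is a placeholder standing in for an expensive chain such as SPheno plus micrOMEGAs, and the parameter names, ranges and cuts are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
bounds = {"m0": (0.0, 5000.0), "m12": (0.0, 5000.0)}   # hypothetical parameter ranges

def compute_observables(point):
    # Placeholder for the expensive spectrum/relic-density calculation; here it
    # just returns toy numbers so the loop runs end to end.
    return {"mh": 120.0 + point["m0"] / 1000.0, "relic": point["m12"] / 5000.0}

def is_valid(obs):
    # Illustrative cuts standing in for the real experimental constraints.
    return 122.0 <= obs["mh"] <= 128.0 and obs["relic"] <= 0.12

valid_points = []
n_trials = 100_000
for _ in range(n_trials):
    point = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
    if is_valid(compute_observables(point)):
        valid_points.append(point)

print("sampling efficiency:", len(valid_points) / n_trials)
```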
So some people have tried to solve this using machine learning. Prior to our work, the main ideas were to reduce the overhead of the computational routine, either by trying to predict what the observables will be for a given sampled point — you train a machine learning regressor that, given a parameter space point, predicts the observables so that you don't have to call the computational routine, because even for relatively simple models like the CMSSM or the pMSSM, SPheno plus micrOMEGAs can take almost a second per point — or, being more ambitious, by trying to predict immediately whether the point is going to be valid or not, given what has been seen before. Another approach, which was explored by Hollingsworth and collaborators, is to try to generate new points from the good points that you have already found: you have a collection of good points from an earlier scan, and you train a generative model — in their case a normalizing flow — to produce more valid points before presenting them to your computational routine. However, all of these approaches have the same shortcoming. If you remember from earlier in the talk, the current machine learning paradigm is statistical learning: you can only learn from the data that you have. So all of these approaches require large amounts of training data, and more specifically, large amounts of valid points. For example, for a regressor, if you don't have enough points it will not be able to map the parameter space to the observables correctly, because you might not even have full coverage of the parameter space. For a classifier, the same problem: without full coverage of the parameter space it may guess wrongly. And for resampling, if you have only been able to find valid points in a sub-region of the parameter space, the generative model will only resample points from that sub-region; it will not be able to sample new points elsewhere, because again that would be outside the compact support of the data it was trained on. So all of these approaches share the same problem, and for highly constrained scans it becomes computationally prohibitive to get enough valid points for these methods to work. For example, the paper that uses the generative normalizing flow starts off with a million valid points — and you could argue that if you already have a million valid points, you have already solved the sampling problem, right? So the question is: how do I approach this problem without a large amount of data, and more specifically, without a large amount of valid points? This is where we made our first contribution, which came out last year and was finally published in PRD, and our code is available for whoever is interested in these things: exploring the parameter space using machine learning and AI algorithms. The key moment for me was when I realized that we just need to change the sampling. All of the other approaches, in the end, still sample from a uniform prior; they do not actually change the way that points are proposed to the computational routine and the constraints. And I did not want to replace the computational routine and the constraints, because they are an oracle of truth — they are the ground truth for me, and I don't want to get rid of them.
So let's try to change the sampling itself. How do we do this? The idea is that once you have a point and you can compute an observable, you can measure more or less how far it is from being valid. For example, take an observable with two bounds, an upper bound and a lower bound. If you pass it through the function max(0, lower bound minus observable, observable minus upper bound), which has this shape, then whenever you are inside the bounds this is zero — the point is valid, it's fine — and if you are outside, it's invalid, but you have a notion of a linear distance of how far away you are. I should also say that it's irrelevant whether it is linear or not, it just has to be monotonic; we did studies on that. So I have a notion of how bad a point is, and these two statements are equivalent: the set of all valid points for a certain model is the set of points for which this constraint function is zero, but it is also the set of points that minimize this function. So finding valid points is the same as minimizing this function. And luckily for us, in the machine learning and AI literature there is a wealth of what are called black-box optimization algorithms. In this case I treat the computational routine and the constraints as a black box: the only thing I can do is provide parameter points and get the value of that function back, and then I let an optimization algorithm try to find the points that do exactly that — minimize that function. Very quickly: we did the CMSSM and the pMSSM, and I'm only going to show the pMSSM case here; you can look at the other results in the paper. We did Higgs mass only, and Higgs mass plus dark matter; we used micrOMEGAs for the dark matter and SPheno for the spectrum, and we imposed both constraints by passing them through that C function. In this paper we simply summed them; we did some studies to see if rescaling would change our results, and it didn't, but as you're going to see, that is what we are working on now. So what we are trying to do is minimize this function, which is the sum of that max-of-zero constraint distance over both constraints. Our upper and lower bounds are here: because it's a supersymmetric model, the theoretical uncertainty on the Higgs mass is larger than the experimental one, so we actually allow quite a wide window for the Higgs mass, and for the dark matter relic density we use the usual bound. We wanted to compare different black-box optimization algorithms, so we went to the literature; there are many classes of them, and we identified three classes and one example from each. There is a Bayesian optimization algorithm called the Tree-structured Parzen Estimator; there is a genetic algorithm, which has a fantastic name, the Non-dominated Sorting Genetic Algorithm II; and there is a non-genetic but still evolutionary algorithm with another great name, the Covariance Matrix Adaptation Evolution Strategy, CMA-ES. These algorithms are all different from each other in the way they work and the way they explore, and we wanted to assess how those differences impact the final results. It's important to note that all of these algorithms are sequential: the point suggested at a certain step depends on the points that came before. This is where the effective, or emergent, intelligence comes in, and it's why these algorithms fall under AI in the literature.
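Here is a minimal sketch of the validity measure just described: for an observable O with bounds [low, high], C = max(0, low − O, O − high) is zero inside the bounds and grows linearly with the distance outside them, and the total loss of this first paper is simply the sum over constraints. The bounds and observable names below are illustrative numbers, not the ones used in the paper.

```python
def constraint_distance(value, low=None, high=None):
    """max(0, low - value, value - high): zero inside the bounds, linear outside."""
    c = 0.0
    if low is not None:
        c = max(c, low - value)
    if high is not None:
        c = max(c, value - high)
    return c

def total_loss(obs):
    # Illustrative bounds: a wide Higgs-mass window (theory uncertainty) plus an
    # upper bound on the dark matter relic density.
    return (constraint_distance(obs["mh"], low=122.0, high=128.0)
            + constraint_distance(obs["relic"], high=0.12))

print(total_loss({"mh": 125.1, "relic": 0.11}))   # 0.0 -> valid point
print(total_loss({"mh": 119.0, "relic": 0.30}))   # 3.0 + 0.18 -> invalid, with a distance
```

A point is valid exactly when the total loss is zero, so finding valid points is the same as minimizing this function.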
It's important to say that two of them have a learning component — the Bayesian one and the evolutionary one — whereas the genetic one does not, so the genetic one is not a machine learning algorithm. But most importantly, none of them requires any data to start off with. So you have your SPheno plus micrOMEGAs pipeline ready, you can just plug these in, even on your very first run, and let them go and fetch points for you; they adapt their search dynamically. For our methodology, we used a Python package called Optuna, which already has implementations of these algorithms, and for each scan — CMSSM or pMSSM, with only the Higgs mass or with Higgs mass plus dark matter — we did 500 independent scans, which we call episodes, each one doing a total of 2000 points sequentially. So in the end each scan comprised a million points. We then compared the samplers with different metrics: not only the efficiency, which is the fraction of those million points that are actually valid, but also the Wasserstein distance against a uniform distribution, which tries to measure the coverage of the parameter space — in that case less is better — and the mean Euclidean distance between the points that were provided, again to quantify the coverage of the parameter space, because it does not suffice to have valid points, we want to cover the parameter space. So these are the results, the distributions of the valid points for the different samplers, for the Higgs mass and for the dark matter relic density. You can see that the red one is the one that differs the most from the distribution you get from the random sampler, whereas the Bayesian and the genetic ones are not that far off from what you would get — but it's not the same, and it's okay for it not to be the same. We are not trying to obtain a posterior distribution using likelihoods; that is what you use a Markov chain Monte Carlo sampling study for. Here we are trying to find the regions of the parameter space which are valid, so it's okay for the distributions not to match what you would get from an MCMC or from the random scanner. In terms of scatter plots, here you can see some of the features that the different samplers produce. The TPE, which is the Bayesian one, seems to be the closest to the random sampler in terms of density and spread of the valid points. And then you start seeing things you would expect if you know how the algorithms work. For the genetic algorithm you start seeing these stripes; these are related to what are called schemas in genetic algorithms, and it basically means that there are some traits of the population that survive across multiple generations if they are good. So what this is saying is that in the genetic algorithm sampling, once it fixed the value of one of the parameters, that value probably survived multiple generations; these vertical lines probably represent an entire episode, all of the points along that run.
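To give an idea of what one such episode looks like in code, here is a minimal sketch using Optuna and the three samplers mentioned above (plus random sampling as the baseline). The toy loss stands in for the SPheno plus micrOMEGAs chain and the constraint distances, so the numbers it produces mean nothing physically; CMA-ES additionally needs the `cmaes` package installed.

```python
import optuna

def toy_loss(m0, m12):
    # Stand-in for the computational routine + constraint distances: zero inside a
    # small "valid" pocket of the plane, growing linearly outside it.
    return max(0.0, abs(m0 - 1200.0) - 50.0) + max(0.0, abs(m12 - 800.0) - 50.0)

def objective(trial):
    m0 = trial.suggest_float("m0", 0.0, 5000.0)
    m12 = trial.suggest_float("m12", 0.0, 5000.0)
    return toy_loss(m0, m12)   # zero means the point is "valid"

samplers = {
    "random":  optuna.samplers.RandomSampler(),
    "TPE":     optuna.samplers.TPESampler(),
    "NSGA-II": optuna.samplers.NSGAIISampler(),
    "CMA-ES":  optuna.samplers.CmaEsSampler(),
}
for name, sampler in samplers.items():
    study = optuna.create_study(direction="minimize", sampler=sampler)
    study.optimize(objective, n_trials=2000)          # one episode of 2000 sequential points
    valid = sum(t.value == 0.0 for t in study.trials)
    print(f"{name}: efficiency = {valid / len(study.trials):.3f}")
```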
Then the CMA-ES works in an interesting way: it effectively behaves like a Gaussian that moves around the parameter space looking for points. So the CMA-ES will always give you these loci of points which look like brush strokes — you can see them here — meaning the Gaussian, the normal distribution, got there, found a lot of points, and then moved on. We turned on a restart option so that it wouldn't get stuck in a local minimum, and because of that it will give you a lot of points in certain regions that the random sampler would not give you, but they are nonetheless still valid points. In terms of efficiency, the differences are completely paradigm shifting. For example, for the most constrained case, which is the CMSSM with the Higgs mass and the dark matter relic density, the random sampler has an efficiency of around 0.1%, whereas the evolutionary strategy gets to 43% — more than two orders of magnitude above — and you see something similar for the pMSSM. So the CMA-ES is highly efficient; then comes NSGA-II, and then the TPE, which is not as efficient as the others but still provides an efficiency more than an order of magnitude above random. Also notice that once the random sampling efficiency is already around 0.1%, you're basically very close to saturating the efficiency, so there will not be much more to gain — you would have to go to a more difficult problem in order to see further benefits from the different algorithms. This is how they work: remember that each independent run was 2000 sequential points, and the width here is a bootstrapped one-standard-deviation band over the 500 independent scans. This is very interesting: here you have the average loss, and for the random sampler the average loss is, obviously, on average constant along the entire run for all four cases, whereas all the others very quickly decrease the average loss at the next step. This is a rolling average over the 50 previous trials, and you can see the convergence to the minimum happening for all the intelligent scanners. And in terms of efficiency, it is basically the complementary view of this analysis: you see that the average efficiency saturates very quickly, very close to one, for the evolutionary strategy, and also in some of the cases for the genetic algorithm; the TPE always has higher efficiency than the random scan, but it stabilizes quite quickly. So this is for you to have an idea of how this works.

Moving forward with this idea, what have we been working on? I don't know how long I've been talking, but I'm almost done, so this will be less than one hour. With Verne, and now with one of his students, Andreas, and my student Fernando, we are following up on a more difficult problem, because after seeing what I presented so far, you could ask: what about a really difficult problem? And hold on to your seats — I have two very difficult problems to show you. One of them is based on work that Verne has done with collaborators on a model which has a leptogenesis and g minus 2 solution, and it's also a dark matter model. This is a very interesting problem because you can see the contention between the things your model is trying to do. On the one hand, you want to explain the g minus 2, which is shown in this plot.
You don't want these fermions to be too light, otherwise the loop suppression is going to be too strong, so you want to have them a bit heavier; but if they are too heavy, they also enter the neutrino mass loop, and then you ruin the neutrino masses. So then what you do is say: okay, in that case I'm going to keep these states light but increase these couplings, so that the loop contribution increases. But once you do that, you induce charged lepton flavor changing currents, for which you have bounds from experiments. So there are many contending forces between the different constraints that you're trying to implement. What did we actually do? Let me also say: we have around 63 parameters, because we are actually using the full complex parameters, and we are using twenty-three constraints — the Higgs mass, the dark matter relic density, the neutrino data, including mixing angles and CP violation (that's why we have complex parameters), but also the flavor violation bounds. So this is a very difficult problem, and we are approaching it in a different way: as a multi-objective optimization problem, because we have that many different constraints. As you recall, in our previous paper we only had two constraints and we summed them together, and there was this question between us of whether we should put them on the same scale so that one would not dominate the other, et cetera. A way around this is not to sum them at all, but to optimize all of them jointly. There is this notion of a multi-objective problem: you go to the objective space, you place each point according to where it falls, and you try to minimize all of these objectives. You end up with something called a Pareto front, which is the set of points that are better than all the points inside this region but are not better than each other — they are equally good amongst themselves: some are better on one of the objectives but not on another, et cetera. That is the Pareto front, and what you try to do, using a genetic algorithm — in this case NSGA-III, the evolution of NSGA-II, which is specifically crafted for many, many objectives, and we have 23 of them — is to genetically push the front towards better and better points; and for our case we know the good points are the ones where all the objectives are zero. Also, because we wanted to make this even more difficult, we got rid of the Casas-Ibarra parameterization. In their paper, in fact, Verne and collaborators used the Casas-Ibarra parameterization which, for those of you who don't know it, means you use the low-energy neutrino data as inputs, rotate back to your BSM parameters, and then those BSM parameters only have to satisfy some perturbativity or unitarity bounds, but you are already working with the correct neutrino masses, angles and so on. We gave up on that. We are actually doing the more difficult thing, which is sampling the BSM parameters directly and then checking the low-energy neutrino data. A minimal sketch of this multi-objective setup follows below.
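To make the multi-objective formulation concrete, here is a minimal sketch with Optuna: each constraint distance is returned as its own objective and a genetic sampler evolves the Pareto front. Optuna's NSGA-II sampler is used here as a stand-in for the NSGA-III algorithm mentioned above, and the two toy objectives are illustrative — the real problem has 23 of them.

```python
import optuna

def toy_objectives(trial):
    # Two stand-in constraint distances (the real scan has one per constraint);
    # each is zero when the corresponding constraint is satisfied.
    x = trial.suggest_float("x", -5.0, 5.0)
    y = trial.suggest_float("y", -5.0, 5.0)
    c1 = max(0.0, abs(x - 1.0) - 0.1)   # e.g. a "Higgs mass" window
    c2 = max(0.0, (x * y) - 0.5)        # e.g. an upper bound on a relic density
    return c1, c2

# NSGA-II stands in here for the NSGA-III sampler used in the actual study.
study = optuna.create_study(directions=["minimize", "minimize"],
                            sampler=optuna.samplers.NSGAIISampler())
study.optimize(toy_objectives, n_trials=2000)

pareto = study.best_trials   # current Pareto front; fully valid points have all objectives == 0
valid = [t for t in pareto if all(v == 0.0 for v in t.values)]
print(len(valid), "valid points on the Pareto front")
```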
So far, what I can tell you is that for random sampling we did a one-million-point test, and to be honest we believe the random sampling efficiency falls below that, below one in a million. I don't have plots because on this one the students are doing most of the heavy lifting; they are working on it right now, but they have already obtained convergence. So we already have NSGA-III working on this problem and finding good points after a reasonable number of evaluations — we have essentially solved the sampling problem for this one. So for very difficult problems this approach still works. And in fact, I'm working on another one which you could say is probably even more difficult, depending on how you count. It has 16 parameters: it's a three Higgs doublet model, a 3HDM. So we only have 16 parameters, but we have 60 constraints, and the constraints include the oblique S, T, U parameters, boundedness from below of the scalar potential, perturbative unitarity, and so on. We also include the LHC Higgs coupling constraints, so we have both the signal strengths, the mu's, and the kappas from ATLAS, and we are now running with the most recent constraints. The random sampling efficiency for this is at most one in one billion, so you would need about one week on 16 cores to get around 10 points. So what people do — and you can go to the original paper and see that this is exactly what was done — is go to alignment limits. For you to have an idea of why this is such a difficult problem, here I flagged the points cumulatively: points that are good with S, T, U, points that are good with all the mu's, points that are good with all the kappas, as well as b to s gamma, et cetera. And you can see that some of these constraints — for example unitarity and boundedness from below — are highly perpendicular to each other, so the effective dimensionality of the valid parameter space is of almost zero measure. And you know that trying to find a measure-zero region by random sampling is basically impossible. This is why it is such a difficult problem. And what I can tell you is that our method solves this problem. We went beyond the alignment limit. Here I have around 10 to 20 points of random sampling — you can see them, very sparse small points — and we also made a scan along the alignment limit, here in alpha one and beta one; the alignment limit fixes a relation between these angles and also degenerates some of the masses. And our method converges so quickly that the problem is no longer getting points, but exploring the parameter space. Whereas the random scan takes around a week on 16 cores, you can get points like these in minutes on your laptop. This is a paradigm shift in sampling. These are preliminary results, and this one will take a little bit longer to produce because there are many moving parts, but you can see that within minutes you can explore everything. Here we have different variations of the algorithms which already include extra exploration, and you can see, for example, that you can even force the algorithm to go beyond the alignment limit, which would be a very nice complementary scan to this one. And then you can start saying, well, in this exploration I don't have a lot of high values of beta.
You can then scan slices along tan beta, or beta, so that you completely cover the parameter space. It is so fast to get points that you can start doing these variations. What we are doing now is trying to understand whether what we get outside of the alignment limit is different in terms of physics, of phenomenology — of what you would look for in experiments. And look here, for example: with alignment limits you normally get these very awkward cuts in the valid parameter space, these hyper-cones where gamma two cannot be greater than 50% of gamma one, whereas our scans completely cover all the possible points for gamma one and gamma two. So what we are trying to do now is push this further. The reason this plot has a lot more white than that one is that there are around five to ten times fewer points here than there; this was just an even quicker study.

All right, I'm going to conclude, because this was a lot to go through. In recent years, HEP has seen a resurgence of interest in both AI and machine learning applications. Most of these have been carried out in experimental contexts — we all know that experimentalists use a lot of machine learning: nowadays b-tagging is done with neural networks, you have BDTs in analyses, so experimentalists already use a lot of machine learning in their workflows. For them the novelty was not the tools themselves, but the resurgence of interest. However, I hope you are convinced that there are a lot of possibilities for phenomenology and theory that we have to start exploring for real. Moving forward, the future of our field in phenomenology and theory will pass through AI and ML. As two examples that I've been working on: for collider phenomenology, in the very specific case of trying to identify jets quenched by the quark-gluon plasma, we have shown that you can get state-of-the-art discrimination using neural networks without using any type of high-level observable, and that there is more information inside the jet that is not contained in the jet substructure observables the community has been using so far. So there are more theoretical QCD studies of what is happening in the quark-gluon plasma that need to be done. We also provide evidence that there are unique fragmentation patterns imprinted by the quark-gluon plasma, because we were able to completely isolate modified jets from the sample. These are very important results for QCD theorists and phenomenologists. On the BSM side, I hope you are convinced that we were able to solve the random sampling efficiency problem for highly constrained models, and this opens the possibility of finding new regions, and of studying, from a phenomenological point of view, regions that haven't been studied before. That is the type of follow-up we are doing now: our next two papers are not so focused on the efficiency, because we already know we have solved the efficiency problem, but on the coverage of the parameter space and what it means for phenomenology. More generally, it is also very important to say that things are moving very fast. Academic and industry research on AI now moves really, really fast: for example, ChatGPT and Stable Diffusion are less than a year old, and you already have GPT-4, and Stable Diffusion is already being surpassed by others.
So we can only imagine what's in store. So for example, here I asked ChatGPT last year: in your opinion, what is the next great application of artificial intelligence and machine learning to high-energy physics phenomenology and theory? And almost all of it is just platitudes about experimental applications, which is useless for us. But then it says something interesting: additionally, AI and ML techniques could be applied in developing new theoretical models, which is the second half of my talk, and understanding the underlying mechanisms of physical phenomena, which is the first half of my talk. Obviously, the disclaimer here is that I do not use ChatGPT to decide what I'm working on. And then, for the paper that we have on quantum machine learning, the student was so excited about all of this that he went to Stable Diffusion to create the icon. So this image here was generated by an AI, okay?

So AI and ML applications for HEP theory and phenomenology are still relatively in their infancy when we compare with our experimental colleagues, but there's a very, very exciting future ahead. And last but not least, it's actually a lot of fun to develop and implement AI and machine learning for phenomenology and theory.

And so, just some institutional propaganda. I know this is more for Latin America, but it might be interesting to participate in this year's SUSY, okay? SUSY happens every year, and SUSY is always preceded by pre-SUSY, which is a summer school covering many topics. And this year I'm very thankful to have been included in this list of superstars, including Fernando Quevedo, Monica Taylor, Stephen Martin, et cetera. I'm going to give a quick two-hour course on machine learning for SUSY model building, so basically going through some of the things that I talked about today. And that's it. Yeah, that's it. Thank you very much.

Super, thank you very much for the very nice talk. So before we go to the audience, maybe we can have a round of questions by the participants. I don't know if maybe Nicolas and Roberto have any questions, or maybe I can start. Okay, so let me start. One thing regarding the first part of your talk: you were comparing jet properties when the jet was in vacuum and when the jet was going through a medium. But I was wondering what would happen if the jet is produced near the boundary of the medium, right? And you have it going from medium into vacuum. Could that happen? And what would be modified in that case?

So that can happen, for sure. This is what this image is showing here, right? So for example, in this dijet event you can have one of the jets being produced at the rim of the quark-gluon plasma. So what this means is that quenching is not a binary characteristic of the jets; quenching is somehow a continuous effect. So how you measure it, how you quantify it, how many degrees of freedom, that is the open question, right? But what we do know is that, in fact, inside the sample of heavy-ion jets, so jets that were produced in heavy-ion collisions, some of them interacted a lot and some of them did not interact as much. So the ones that didn't interact as much would then be vacuum-like; those jets would appear as vacuum jets, right? Whereas with vacuum you have a definite category, because you can always prepare proton-proton dijet events, okay?
So the LHC produces billions, trillions of dijets every year. So we have a candle, right? We have a candle for what a vacuum jet is, but we do not have a candle for what a modified jet is. So what we claim is that these methodologies are removing vacuum-like jets from the heavy-ion collision jets. But inside what is left, there will be jets that were modified a lot and jets that were modified less.

So the question was more or less in the direction of: would it make sense, or would it be possible, to simulate jets that have both components? So this has been an ongoing question. So I'm not a QCD theorist, I think that is clear by now, right? I'm a model builder. And I've had this discussion with the people who work on these simulations. And what I can tell you is this: there are many Monte Carlo simulators for jet quenching, for heavy-ion collisions, et cetera. They all seem to use different physics to motivate it, and they're all calibrated against experimental data in the end, against some observables. But there's a bit of a split in preferences, and my collaborators and I have been working a lot with JEWEL. So I have raised this question: is this something that can be done in JEWEL? JEWEL does not do that, so that would not be possible in JEWEL. I think there are other simulators where it might be possible, but then they fall short in other aspects, so our studies would not be possible there. But in principle that would be very interesting, and actually it's something that we're very interested in: having a vacuum jet and seeing what would happen if we pass it through the medium, which is exactly your question. So that is something we are very keen on exploring at some point; we have many back-of-the-envelope ideas, to-do lists of things for future projects, and that one is there, I can tell you that. For the time being we don't have that type of data, simulated data of that kind.

Because I was thinking that, like, GEANT does that kind of thing, but okay, maybe for other media, like blocks of material. Yes, the thing is that GEANT is for the interaction of radiation and particles with ordinary matter. So, for example, experimentalists will run GEANT to simulate detector interactions, right, the way that the event interacts with the detector. So that is done, but this is exotic matter. Modeling the quark-gluon plasma itself is an area of research; that's why there are so many generators of the quark-gluon plasma and jet quenching. So when you pick one of them, you're actually more or less subscribing to a certain choice of physical processes to model the quark-gluon plasma, because we do not have a formal way of doing it. Non-perturbative QCD is a paramount example of something which is formally well-defined but computationally intractable. So all of these quark-gluon plasma simulators rely on different approaches, okay?

Right, okay, great, thank you. Any other questions? I have a couple more, but let's see. Here comes one. Oh, Miguel. So, Miguel, very nice talk. I mean, it's very impressive how useful all this machine learning modeling could be. My question is regarding the BSM part, the model scans.
Because as I understood it, and you can tell me if this is correct, you have to start with a process of training, tuning the machine learning, so that it becomes more efficient at looking for good points afterwards.

So, that's not our approach. Our approach learns as it goes.

Yes, yeah, but in that sense, when you were estimating, for instance in another part of the talk, the efficiency of the algorithm, do you know more or less how it grows as you start to explore? Because I think that in the early stage of the search in the parameter space it's almost like a standard Monte Carlo. It has to become more efficient, but do you know more or less after how many points it starts to deviate from the standard Monte Carlo case?

So one of the things that we've learned from this paper is that the metric of global efficiency is not that meaningful, because instead of 2000 points, if I had a million points, almost all of them would be valid. So your question is: how many points do I actually have to burn through, to use Monte Carlo terminology, before I get to the minimum? So here you can see that it's very quick, right? If this is 500 points, you need around 100 points until you start getting good points. And remember that the random sampling efficiency is 0.1%, so it converges at least 10 times quicker, okay? What we have now is a lot more than that. For example, I actually do not have the plot for this, but you can get valid points here after a thousand tries, whereas the random sampling efficiency is below one in a billion here. So we're already talking about a six orders of magnitude improvement, and six orders of magnitude is the difference between a second and a month, right? So it's a big thing.

Cool. So, a follow-up question in that respect. I mean, one of the nice things when one is doing a search with a standard Monte Carlo over the parameter space of a BSM model, with dark matter or whatever, is that you can parallelize it. I mean, you can run the same process on many, many computers at the same time and then gather all the good points. In this scenario with machine learning, I guess you can also run it on many computers, but then the learning part of each different run, is it possible to merge it again into a kind of common knowledge?

So, I mean, this is as parallelizable as a Monte Carlo chain, because a Monte Carlo chain is sequential, so you can have multiple of these in parallel. And in fact, these are multiple parallel runs. I imagined that was the idea, yes. So that's the thing: each one of them can be run on a single core, so you run multiple of them in parallel. They are independent among themselves, so there is no communication. Would it be possible to have a common oracle to keep track of some information? Possibly, but that's not the focus of my work at this stage. At this stage I'm trying to completely explore the three-Higgs-doublet model with 2022 LHC data and to study what is left beyond the alignment limit, because that's what the literature has not looked into, right? Things like that will come in time. There are already two Python packages that try different approaches for scans, which are not like ours; they use the earlier types of approaches.
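And just to make the parallelization point concrete, here is a minimal sketch, not our code: several independent runs launched with Python's multiprocessing and gathered at the end, where run_scan is a hypothetical stand-in, here doing plain random sampling, for one single-core scan loop like the CMA-ES sketch earlier.

```python
# Sketch of trivially parallel scans: independent single-core runs, no
# communication between them, with the valid points gathered at the end.
# run_scan is a hypothetical stand-in: plain random sampling plays the role
# of one real scan run here.
import numpy as np
from multiprocessing import Pool

def run_scan(seed, n_samples=100_000):
    """One independent run; returns the valid points it found."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(-5.0, 5.0, size=(n_samples, 2))
    # Stand-in validity check; in a real run this would be "constraint loss == 0".
    mask = (np.abs(points[:, 0]) <= 1.0) & (points[:, 1] >= 0.5)
    return points[mask]

if __name__ == "__main__":
    seeds = list(range(16))            # e.g. one run per core
    with Pool(processes=16) as pool:   # fully independent workers
        results = pool.map(run_scan, seeds)
    all_valid = np.vstack(results)     # gather the good points at the end
    print(f"gathered {len(all_valid)} valid points from {len(seeds)} runs")
```

Merging what each run has learned, the common oracle idea, would need something on top of this, but gathering the valid points is just a concatenation at the end.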
Now, even though the idea of the PhD project of Fernandes, my PhD student, was to create a Python package for that, I'm not in a rush to get to that point. Eventually we're going to get to a point where this is distributed in a way that those problems will be addressed. Because notice that I have very little motivation for fine-tuning and tweaking things like that when I can get thousands of valid points in minutes on my own computer. So I'm promising you hyper-efficient parameter scans on your laptop. So yes, can this be made parallel and put on a cluster? Yes, eventually, one day. But that's not the game we're playing right now.

So my last question regarding all this, also because many of the people following the webinar are doing PhDs and come from different backgrounds: for most of these studies, were you using frameworks based on Python, or, I don't know, Mathematica, or some other package?

So it's Python. Yeah, I use Python for virtually everything. I know that Mathematica has some machine learning implementations and so on, but Mathematica is very heavy for long numerical studies; there, I think, it falls short. I like Python. Python itself, the language, is not very fast. However, you have to think of Python as an interface to many good libraries. So when you use TensorFlow, PyTorch, NumPy, Numba, these call low-level compiled routines that run almost as fast, or as fast, as if you had written C++ code from scratch. So when you do a lot of these things in Python, you are using Python only as an interface language to all of those packages. If you're doing heavy computation in pure Python itself, then you're using it wrong. Just like you can do a lot in Bash, but you only use Bash for some things, right? Python is basically the same. So yeah.

Cool, thanks. Okay, great. So we're running out of time, but I still wanted to ask a couple of questions regarding the second part of the talk. First, a quick one regarding Casas-Ibarra. I guess that you abandoned Casas-Ibarra because it was already being taken care of by the learning, right? But was there any other reason? Because in principle, with Casas-Ibarra you don't have to care about the neutrino data. So what was the reason you said, okay, let's just not use it and let the machine learning take care of the neutrino data?

So the motivation was as simple as this: we finished the other paper, which came out in January, and Verna said, we need to try it on a very difficult example. That was it. So by giving up the Casas-Ibarra parameterization, we are just making the problem more difficult on purpose, okay? So, for example, in the Higgs case, in this one, we are using an equivalent parameterization for the Higgs sector, where you basically put in the physical masses and then rotate back into the BSM scalar parameters, right? In this case we didn't complicate it; we could have, but we didn't. So it was just a matter of seeing if it was possible. Because parameterizations like Casas-Ibarra are very powerful, but we also want to show that maybe in the future we will not have to look for parameterizations like that, if machine learning and AI can reliably deliver the valid points directly. Okay. And the last question.
Could you go to slide 42, where you did your scans on the MSSM, the pMSSM? I was a bit surprised that on the right, where you have your, I don't know how you pronounce it, CMA-ES, right, there are those blobs at very large mu that are not really noticeable in the other ones. I don't know if there's an explanation for those blobs, but what is happening there?

It's not a physical explanation, it's an algorithmic explanation. So one thing that you'll notice is that minimizing this loss function makes your problem very different from doing an MCMC scan with likelihoods. With this loss function, the points in the global minimum are all the same as far as the algorithm can see, because all of them yield zero for the loss function it is trying to minimize. If you were doing something like an MCMC, what we call valid points would still have different, non-vanishing likelihoods, because you have some Gaussian profile or something like that. So if you were doing an MCMC, you would have very few points in this region, because it would not be favored by the likelihood. But for the CMA-ES, the points that are here or in that blob are the same; they are as legitimate as they can be, because it's just a region of the parameter space that it found that minimizes the loss function. So the Gaussian of the CMA-ES will spread out in that region and then eventually it will restart, which is exactly what you're seeing here. So basically it found a region there, it spread out, found a lot of valid points, concentrated there, then it was restarted and went elsewhere.

So restarting the CMA-ES is something that I'm quite focused on, though not in the paper with Verna and the students, because there we are using the multi-objective approaches and similar algorithms. But for the 3HDM, this is a CMA-ES variation. So here you can see these trails of the CMA-ES, the CMA-ES actually going around a region that's already valid. What I was able to do here was to motivate it to do extra exploration within the global minimum, and so you can see the CMA-ES actually going around and finding points in the global minimum.

Okay, that's very nice. Yeah, so the question is how you get these algorithms to explore, because they tend to be, the terminology in the literature is, eager: they're very eager to find the minimum. Whereas what we want is not only to find valid points, but to find all of them, or at least to try to have the best possible coverage of the parameter space, right? So I'm now fighting the fight of exploring the rest of the parameter space; I'm no longer fighting the fight of finding valid points, because those are very easy for me to find nowadays.

Yeah, excellent. Okay, super. So I guess that's it, since we are out of time. Thank you very much, Miguel, for the very nice talk. And to all viewers, please join us in a couple of weeks, as we will have David Velasco giving a talk. So thank you very much, everybody, for being here, and we'll see you next time. Bye-bye.