So, first of all, I would like to thank the organizers for bringing me here. It has been a wonderful experience and a very nice environment. I'm actually one of those domain scientists who, after a few years, has realized the need to use new techniques that have been around for a while. So if you find something too low level, it is because machine learning is not the primary expertise of the group. Since everybody here is so young and so into new methodologies, I wanted to bring back a very old question, okay? It was posed at the Royal Society before the 20th century: we don't have enough nitrogen, so how are we going to feed the whole population if we don't have enough nitrogen to grow our crops? And what happened is that it started a race among different scientists to find a way to fix nitrogen from the gas phase into molecules that we could bring to the fields, grow more crops and eventually sustain more people. So basically you are here because there were these two people who were screening 20,000 different materials, okay? And I want to draw your attention to the fact that big data has been around for a while, and has been in heterogeneous catalysis for a while. This process is still giving us enough food; it consumes about 1% of the total energy and generates 1.5% of the CO2. But remember where it comes from: 20,000 compounds. Now, we are facing a few challenges at the turn of the new century, which is not really the turn anymore. You know that we are generating all this CO2 and all these plastics, and eventually we will need to refurbish the full chemical industry as we understand it.
And one of the problems is that we will have different inputs, not the ones that we had before. So basically we will need to refurbish the whole chemical industry. We would like to take CO2, water, air and electrons and use those to make our new chemicals, or basically start using all our waste and generate new chemicals in different ways. So materials for catalysis is one of these sciences where we quest for properties (activity, selectivity, stability) and then ask how to prepare the materials that have them. Actually, the way we end up working is a little different. We would like to go this way, but we end up synthesizing lots of materials, characterizing them and trying to infer the properties from that. This is what our structure-activity relationships are based on. Most of the things that you hear in heterogeneous catalysis, electrocatalysis and so on are based on knowing the structure of the material in order to be predictive, okay? But it would be nice to go the other way around. So what does activity look like for a typical catalytic problem? Well, activity looks like these two central panels here. This is a typical volcano, where you have the top of the activity in one region and less activity for the catalytic materials on either side. And then you also have this other problem here, which resembles a little bit what Francesca showed for the activity of her materials: if you are looking for selectivity, and selectivity is producing just one compound out of the many that you can produce in a chemical reaction, then you are facing a cliff. Both types of problems are represented here in two dimensions, and you see that activity is easy and selectivity is a pain.
For activity, you see that I have one term here. This is what is called in our terminology a descriptor, okay? Typically it is a binding energy, and it is the parameter that tells you what the activity is going to be. So if you are able to tune this parameter, eventually you will get to the top of the volcano, which is what you are aiming for. Okay, so these are the basics. Now the problems start. Well, one more thing first. The reason why these volcanoes appear is that there is a lot of symmetry between the properties of different intermediates on surfaces, okay? And by symmetry I mean that you cannot optimize the different intermediates on the material independently, typically because our materials over the years have been very, very simple. We have been using metals, mostly metals, and when things got too complex we were adding oxides and things like that. But actually, there is not a single catalytic formulation in industry that is not more complex than a metal, okay? So this is our academic version of how things should look. And as you can see, there are plenty of these dependencies here. The transition state, which controls the rate of a given process, depends on the energy difference. And you can find very many of these relations: some relate thermodynamic properties to other thermodynamic properties, some relate kinetic properties to thermodynamic ones, or even to coordination. All these rules put so many symmetries into your system that it doesn't work anymore. The ultimate consequence is what you can see here: on one side, the adsorption correlates one way with the descriptor, and the coverage, which is the amount of this particular species that you have on the surface, correlates the other way.
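The two opposing trends described here can be sketched numerically. The snippet below is a toy illustration, not the speaker's model: it assumes two straight lines in a single descriptor, one rising and one falling, and takes the activity to be the lower of the two. The function name, the slopes, the peak position and the arbitrary units are all assumptions made for the example.

```python
def volcano_activity(descriptor, rising_slope=1.0, falling_slope=-1.0, peak=0.0):
    """Toy volcano: the activity is the lower envelope of two opposing linear trends.

    All parameters are hypothetical and in arbitrary units.
    """
    rising_branch = rising_slope * (descriptor - peak)    # one trend grows with the descriptor
    falling_branch = falling_slope * (descriptor - peak)  # the other trend shrinks with it
    return min(rising_branch, falling_branch)


# Scan a range of descriptor values (e.g. a rescaled binding energy).
descriptors = [x / 10 for x in range(-20, 21)]
activities = [volcano_activity(x) for x in descriptors]

# The best material sits where the two lines cross.
best = max(zip(activities, descriptors))[1]
print("top of the volcano at descriptor =", best)
```

Tuning the descriptor toward the crossing point is what "getting to the top of the volcano" means in this picture; moving away in either direction is penalized by one of the two lines.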
So if you draw the envelope between these two lines, this line here and this line here, this is the origin of your volcano, okay. So now that we know the basics of how people have been addressing these problems, let's see what we were traditionally doing. People were testing materials in an experimental lab, and we were doing some DFT calculations, basically getting the reaction path. The reaction path is just a list of intermediates and the barriers that connect them, not only for the direct path but in principle also for the non-selective paths that I was explaining before. On top of this we were building our beautiful volcano, trying to see what the descriptor of the energies was. Now, the problem comes because life is much more complex than that. Catalyst structure is a problem because you have defects, modifiers, promoters, subsurface atoms, impurities, you name it. You have a lot of molecular complexity that comes from the fact that you have large molecules, so all these three-dimensional structures are difficult, including chirality, aromaticity, rigidity; we have heard many of those today. And if you go to electrochemical environments, for instance, you have even more complexity. One part comes from the external energy source that you are applying, be it the electric potential or light, but then you can have many other effects, cations and so on. So the whole point is that we can no longer survive in this very easy life of a small bunch of metals and calculating our reactivity there; we need to take in all this complexity. And this is where machine learning comes in, at different stages, to try to understand or to extract information for different things. So the first thing is having a database. And I have heard everybody here has a database. So here is ours.
The good thing about our database, which contains only computational data, is that it stores data that comes not only from materials but also from molecules. Our systems lie on a continuum between the two. The reason is that we have seen that many of the databases describe materials or describe molecules as separate issues, and we think there is a lot of information at the interface between these two things that could be important. So, a few characteristics: it is built for a chemical purpose, and it is project oriented. We basically have a central node, so no one can say that we are storing their data, and so on. And actually, I want to say that we managed to commercialize this, which is very surprising from my point of view. So basically we have different areas. You can have your private area and a public area, and this is very good for companies that are trying to use it. And while this is a typical pipeline, what is important is that we normally upload our data for a system when, for instance, we have submitted a paper, and when the paper is accepted we release the complete database that is associated with it. Of course, we are using lots of codes, and there are the traditional issues related to formats and ontologies, which I will not dwell on. And yes, we have been using the typical Chemical Markup Language to make all the translations. OK, that's enough. So it looks like this. OK, so as I was saying, one of our most important tasks when we are trying to do catalyst discovery is to see what the descriptor for a particular property is. And here we started in a very, very crude way. We started doing principal component analysis. And this is the routine. At that time, what we were thinking was: OK, we can calculate lots of properties of very simple fragments.
At that time this meant carbon, hydrogen and oxygen, and fragments up to C3 containing all of these. And what we tried to see was how many principal components we needed in order to represent the adsorption energies properly. We ended up with two types of descriptors, the metal descriptors and the molecular descriptors. And of course, because this is principal component analysis, we had to map them afterwards to see if we could find any meaning in the descriptors that we had found. So this is what we did. And actually, reproducibility is nice, the test is very nice. But we found two things. The first principal component is something that has been known in the literature for a very long time: the d-band center. There was a lot of modeling done in the 90s and the beginning of the 2000s based on the idea that the main descriptor for adsorption energies on metals is just a single parameter. And we managed to retrieve that. For me, that was the first signal that we could do something useful. The second signal that we could do something interesting for us is that we were also able to identify what the second principal component was. It is very much related to the redox capacity, that is, the capacity to either donate density to or receive density from another object. OK, these are the two properties from the metal. For me, what is very, very important is that we understand the chemistry behind it, because the chemistry is what I can convey to my synthetic colleagues so that they go on and look for more. Of course, we did some testing of how it worked outside our system. And considering the simplicity of the model, it really was very important and very interesting for us.
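The question of how many principal components are needed can be made concrete with a small sketch. This is not the group's workflow: the table of "adsorption energies" below is synthetic, built on purpose from two hidden metal properties (here called `metal_factor` and `redox_factor`, both hypothetical placeholders), and the PCA is done with a plain power iteration so that the example needs only the standard library.

```python
import math


def center_columns(rows):
    """Subtract the column mean from every entry."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and eigenvector of a symmetric matrix by power iteration."""
    p = len(matrix)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:
            break
        vec = [v / norm for v in new]
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(p))
                for i in range(p))
    return value, vec


def explained_variance(rows, n_components):
    """Fraction of the total variance carried by each of the first components."""
    cov = covariance(center_columns(rows))
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))
    fractions = []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        fractions.append(value / total)
        # Deflation: remove the component just found before looking for the next.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    return fractions


# Synthetic data only: rows are metals, columns are adsorbed fragments.
metal_factor = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]    # stands in for a d-band-like property
redox_factor = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]   # stands in for a redox-like property
loading_a = [1.0, 2.0, 3.0, 4.0]
loading_b = [1.0, -1.0, 1.0, -1.0]
energies = [[m * a + r * b for a, b in zip(loading_a, loading_b)]
            for m, r in zip(metal_factor, redox_factor)]

print(explained_variance(energies, 3))
```

Because the synthetic table was generated from two hidden factors, the first two fractions add up to essentially all of the variance and the third is numerically zero. With real adsorption energies the fractions would tell you how many descriptors are worth keeping, and the components would still have to be mapped onto chemical properties by hand, as described in the talk.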
One of the other problems, also mentioned by Riann, is complex reaction networks. When you start to investigate activity, we have exhausted most of the simple reactions. Ammonia synthesis has been known for many years; we all know all the steps by heart. But when you come to systems with lots of possible intermediates, the annotation of the intermediates and of the proper transitions between them becomes important. So we annotated, for the first time, the whole decomposition network of ethylene glycol. And that was a pain. You can see that it ends up in a reaction network as complex as this one here. OK, so you get a reaction profile, but you cannot make any sense of this reaction profile. The reason is that it needs to be combined with microkinetics so that you can really say what the major product is. So since we have so much DFT data, we lose the capability of saying what the real outcome is if we don't couple it to microkinetics. Now, the situation is even more complex for reactions in the electrochemical context. This is just copper, and copper is known to be the catalyst for electrochemical CO2 reduction. Let's see what people have been doing here. In electrocatalytic CO2 reduction, what we would like to have is higher products, let's say more than three carbons coupled together, so that we can form these long-chain systems. But actually, the system is a little more complex than that. Bagger and colleagues realized that they could classify, and this is a classification problem, the properties of the different metals just by separating the adsorption energy of CO and the adsorption energy of hydrogen. Very easy problem.
You take all the metals, all the reactivity, and you end up with all these groups. And actually, the only one that is able to form carbon-carbon bonds is copper. The rest give mainly C1 products or hydrogen. And this is not what you want, because then you are putting your expensive electrons into producing something that you don't want, or want less. Now, the problem is that if you want to go to C3s, and you can actually get C3s, you need to do a bunch of things. First of all, you need to change the configuration of the cell. This means that the traditional way catalysts were searched for is not good enough for this particular problem, which means that you need to pass a lot of current. The second thing is that if you do this, you end up with this very nice list of products. Even as a chemist, I cannot draw all the structures here. So basically you can generate absolutely everything. But I want to point out a couple of things. These are the main C2 products: ethylene, which has a carbon-carbon double bond, and ethanol, which has a carbon-OH group, so it is an alcohol, OK? So in the C2s you can generate both types of products. But if you go to the C3s, the molecules containing three carbon atoms, then the one that contains the carbon-carbon double bond doesn't exist. Or rather, you cannot generate it on this material under these conditions, OK? So in order to figure out what was going on, we had to generate all the structures going from CO2 all the way to C3s, and we have since been going even larger: generate all the combinations, use molecular graphs for this, eliminate the redundancies and label all the individual steps in our reaction networks. It is also possible to do bond-breaking detection, that is, figuring out which bond will break or which will form.
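The enumerate, deduplicate and label workflow can be illustrated on a deliberately small case. The sketch below is not the code generated by the group: instead of general molecular graphs it uses the ethylene glycol skeleton (HO-CH2-CH2-OH) and only counts how many hydrogens remain on each heavy atom, so only C-H and O-H bond breaking is considered and C-C or C-O cleavage is ignored. The names `MAX_H`, `SITES` and `canonical` are assumptions of the example.

```python
from itertools import product

# Toy representation: number of H atoms left on O, C, C, O of HO-CH2-CH2-OH.
MAX_H = (1, 2, 2, 1)
SITES = ("O-H", "C-H", "C-H", "O-H")  # bond type broken when an H leaves each site


def canonical(state):
    """The skeleton is symmetric, so a state and its mirror image are one intermediate."""
    return min(state, state[::-1])


def enumerate_intermediates():
    """Generate every hydrogen distribution and drop the symmetric duplicates."""
    states = product(*(range(m + 1) for m in MAX_H))
    return sorted({canonical(s) for s in states})


def bond_breaking_steps(intermediates):
    """Label every elementary step as (reactant, product, broken bond)."""
    steps = set()
    for state in intermediates:
        for site, count in enumerate(state):
            if count == 0:
                continue  # nothing left to remove on this atom
            child = list(state)
            child[site] -= 1
            steps.add((state, canonical(tuple(child)), SITES[site]))
    return sorted(steps)


intermediates = enumerate_intermediates()
steps = bond_breaking_steps(intermediates)
print(len(intermediates), "intermediates,", len(steps), "elementary steps")
```

Even this small skeleton produces more steps than intermediates, which is the same imbalance, at a much smaller scale, that makes transition-state calculations the bottleneck in the full network. A real implementation would need a proper graph canonicalization to remove redundancies for arbitrary molecules.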
And then you end up with the kind of code that we generated, which is this one here, but my favorite feature ends up being that one. If you go to CO2 reduction up to C3 products, you have a problem, because you have 500 intermediates, and in principle 500 intermediates you can handle by DFT, but you have 2,500 transition states linking them. And this is a huge problem, because for each calculation that you do for an intermediate, the corresponding transition-state calculation is at least five to ten times more expensive, OK? So basically, this is not doable anymore. What we did was to generate the whole graph, calculate the intermediates by DFT and then use a series of filters that allowed us to simplify the reaction network down to the point that we wanted. What you can see here is just a piece of the reaction network. This molecule here is the one containing the carbon-carbon double bond, for those in the room who are not chemists. And this is the propanol. So remember, we are getting this one, but this other one is the expensive one, the one that we would like to have. And actually it comes down to a problem with the kinetics, with how accessible these different paths are. Basically we are stuck with an intermediate somewhere here that goes all the way to propanol. And the only way to understand how this reaction works was to go to our colleagues from NUS and ETH. What they did was to start from two different molecules; they are not even intermediates, they are related to the intermediates. And when they started from the molecule related to the intermediate that we told them, they ended up seeing some propylene, which you can find here in the experiments.
So the only way to interrogate such a complex reaction network is to feed in one of the intermediate products and see how it goes. And this is as much experimental information as you can get for these complex reaction networks, which I would say is really a challenge. We have also been working on other types of weird things, and this one is kinetics: trying to generalize performance equations. Again, what we did here was to take DFT data, and then we built a full machinery that I will try to show better here. So these are the results that were produced at ETH. As you can see, this is a bunch of metals on the same support, silica, and you see some patterns for the conversion, that is, how much is produced of the product that you want. And in the second plot, how much selectivity you are getting to the different products. So this is one of the typical problems of selectivity: maybe you have good conversion, but then you have a selectivity that doesn't work for you. OK. If you do this with traditional DFT, what you end up doing is calculating this reaction network, which is very cheap. Now, the problem is the following. Even if we are sure that these materials are more or less the same size when prepared, and this is a crucial problem, what we are not sure of is whether they transform under true reaction conditions. By transforming I mean that if you are generating some carbon, which is one of your undesired products, the carbon can go inside the lattice, and then you have a carbide. And then your calculations are wrong, because they were done on a metal when what you really have is a carbide. So when you have this type of phase change and this type of modification of your materials, an approach based only on the DFT profiles doesn't really account for the whole thing.
So what we did here was to see if we could get some part of the information from the experimental data and couple it to some descriptors that were obtained by DFT. So this kind of hybrid data is what we tried to use. And for this, we used an algorithm developed by my colleagues Marta Sales-Pardo and Roger Guimerà. It is basically a Markov chain Monte Carlo that goes and tries to find the best equation for your problem. So it tries to find equations in terms that we can interpret as humans. If we go here, they start with a constant, and they have learned all the different functional forms from Wikipedia, the most typical formulas from Wikipedia, and basically they keep on adapting the different terms. You can see this evolving over time and it looks very, very nice. Except it has a problem, and the problem is dimensionality. When we put in all the intermediates that we had calculated via DFT, we found that it was not possible to reproduce any selectivity or any conversion pattern. And the reason is that the data is so linearly related that, during the search, the algorithm was just swapping different variables in and out of the main equation. So basically, this was taking us nowhere. But as you have seen, we had a way of sorting this out, which was our small PCA. So what we ended up doing was to see what happened if we included our PCA, so that we could map the different metals and the different intermediates and use just two adsorption energies as descriptors for these materials. And then you end up with this equation for the conversion, which is a very, very simple one. Of course, it is only valid in these domains, but basically it contains the full chemistry for these particular systems.
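To give a feel for what "a Markov chain Monte Carlo that tries to find the best equation" means, here is a heavily simplified sketch. It is not the algorithm from the talk, which explores arbitrary expression trees: this toy version walks over a fixed list of five hypothetical one-parameter formulas in a single rescaled descriptor `x`, fits the coefficient by least squares and scores each formula with the Bayesian information criterion. The templates, the scoring choice and the synthetic "conversion" data are all assumptions made for illustration.

```python
import math
import random

# Hypothetical candidate formulas of the form y = a * g(x).
TEMPLATES = {
    "a*x": lambda x: x,
    "a*x**2": lambda x: x * x,
    "a*exp(x)": lambda x: math.exp(x),
    "a*x*(1-x)": lambda x: x * (1.0 - x),
    "a/(1+x**2)": lambda x: 1.0 / (1.0 + x * x),
}


def fit_and_score(name, xs, ys):
    """Least-squares fit of the single coefficient, scored by BIC (lower is better)."""
    g = [TEMPLATES[name](x) for x in xs]
    a = sum(y * gi for y, gi in zip(ys, g)) / sum(gi * gi for gi in g)
    sse = sum((y - a * gi) ** 2 for y, gi in zip(ys, g))
    n = len(xs)
    bic = n * math.log(max(sse, 1e-12) / n) + math.log(n)  # one fitted parameter
    return a, bic


def metropolis_search(xs, ys, steps=200, seed=0):
    """Metropolis walk over the formulas; returns the best one visited and its coefficient."""
    rng = random.Random(seed)
    names = list(TEMPLATES)
    current = names[0]
    _, current_bic = fit_and_score(current, xs, ys)
    best, best_bic = current, current_bic
    for _ in range(steps):
        proposal = rng.choice(names)
        _, proposal_bic = fit_and_score(proposal, xs, ys)
        delta = proposal_bic - current_bic
        # Always accept improvements; accept worse formulas with a small probability.
        if delta <= 0 or rng.random() < math.exp(-delta / 2.0):
            current, current_bic = proposal, proposal_bic
        if current_bic < best_bic:
            best, best_bic = current, current_bic
    return best, fit_and_score(best, xs, ys)[0]


# Synthetic data: a volcano-shaped response with a small deterministic wiggle.
xs = [i / 20 for i in range(21)]
ys = [2.0 * x * (1.0 - x) + 0.01 * math.sin(7.0 * x) for x in xs]

best_name, best_a = metropolis_search(xs, ys)
print(best_name, round(best_a, 2))
```

With a single clean descriptor the walk settles on the volcano-shaped formula. If several nearly collinear descriptors were offered instead, many formulas would score almost equally and the chain would keep swapping variables, which is the failure described in the talk and the reason for reducing the inputs to two descriptors first.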
And if you keep on running this Bayesian machine scientist for longer, you will get other equations, but many of the motifs that you find in the primary structure are still there. So it means that it is possible to obtain formal equations for these problems that are based on one or two descriptors, in the very same way as we would have for the volcano plots, but this time in a completely automated manner. Of course, you can try to go for selectivity. I have to say that this is much more complex, but something can be done. The second source of complexity in catalytic systems is just the structure. As I was telling you before, in the old days we used to have very simple structures that were metal based. The complexity here comes, for instance, with materials like these. This is the copper that I was showing you before. If you take copper or copper oxide, because you cannot really have copper, it oxidizes almost as soon as you look at it, and you put it under potential, you will end up with structures like these ones here. Basically, your system is reorganizing all the time. You have some copper atoms here and then some oxygen leftovers. We don't know how many, and we couldn't find out how many from these simulations. But in the end, what we were looking for was whether there were predominant patterns. And this relates to the discussion that we were having earlier today. To me, the identification of patterns is one of the bottlenecks where I will need more support from the machine learning people, because we are not really very good at that. And of course, if we know what the patterns are, then we can look at the reactivity of these different patterns, see whether they are repeated in the different preparations, and see how this works. But when you discuss structure, you end up needing some characterization.
And actually, in any catalyst preparation, where the most money is spent is characterization, and this is mostly done in academic labs. I want to show you a proof of concept that we have developed in the group, which is very similar to what someone was presenting with experimental results before. We have been trying to understand the chemistry of single-atom catalysts. The single-atom catalyst is a very easy concept: you try to use as little as possible of the expensive material. In this case, the expensive material is platinum, iridium or so. And many of the scaffolds that we use are based on carbon, so they are really cheap. You can burn whatever and you will have a carbon scaffold. Maybe not the one you want, but that's a different problem. Now, when it comes to this, you end up with a bunch of problems. We would like to use these as sustainable catalysts for organic synthesis. The reason is that if you are synthesizing something for pharma, you want your metal to stay where the metal should be and not go into the drug, because then your drug cannot pass the FDA or the European requirements. Many organometallic catalysts, many molecular catalysts, contain metals, and sometimes they get incorporated into the final product, even at ppm levels. And this is really, really bad. So one of the strategies is to grab the atom, keep it in the lattice and use its reactivity while it stays on the lattice. However, identifying these isolated metals on the scaffold is really very difficult. One of the ways we do it is through high-resolution transmission electron microscopy. The workflow is as follows. You have the image acquisition, which is as complex as it can get. And what we are trying to do here is to automate the image analysis.
And then we would like to extract more, because from this very expensive characterization we were only getting qualitative information in the past. If this is automated, we can get much more than we used to get before. So how do we do that? Our colleagues in Zurich in the NCCR were able to generate these kinds of scaffolds with a lot of metals, which is good, because you can have lots of reactivity. But you end up with 15 metals and five different types of scaffolds, and then you end up with this collection of galaxies; I would say that is the proper name for this. So you can have basically everything. And of course, there is a problem, because for each image an educated person needs 30 minutes to label it. This is where the bottleneck of the whole thing is. There is a second bottleneck, which is the perception bias. Each microscopist will label the image in a slightly different way. Slightly different, not completely different. There are areas, like these bright areas that you can see here, where the microscopists don't believe that their eyes have enough contrast, so they will not label in these particular areas because they are not good at it. So you see lots of human labor, training, and some biases intrinsic to perception. What we did here was the following. We contacted the group at BSC in Barcelona. And this is why I think Sasso's comment this morning, saying that we need to sit down and spend some time together to see what we really need and how we can convey the needs of the different groups, is really important. It was really important for us here. They are the experts in image recognition. What we did was sit with them. They came up with a very nice and cheap solution, which is a convolutional neural net that works really nicely and identifies all the atoms in an image in about one second. It even spots atoms where the microscopist doesn't.
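One common way to quantify how closely such a detector agrees with the manual labels is precision and recall. The sketch below is an assumption about how the comparison could be done, not the procedure used by the group: it greedily pairs each predicted atom position with the nearest unused labeled position within a pixel tolerance. The coordinates and the tolerance are made-up values.

```python
import math


def detection_scores(predicted, labeled, tolerance):
    """Greedily match predicted atom positions to labeled ones within a tolerance.

    Returns (precision, recall). Each label can be matched at most once.
    """
    unused = list(labeled)
    true_positives = 0
    for px, py in predicted:
        best_index, best_dist = None, tolerance
        for i, (lx, ly) in enumerate(unused):
            dist = math.hypot(px - lx, py - ly)
            if dist <= best_dist:
                best_index, best_dist = i, dist
        if best_index is not None:
            unused.pop(best_index)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(labeled) if labeled else 0.0
    return precision, recall


# Made-up pixel coordinates for one image.
labeled = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0), (40.0, 40.0)]
predicted = [(10.5, 10.0), (20.0, 19.5), (35.0, 5.0), (30.0, 30.2), (60.0, 60.0)]

precision, recall = detection_scores(predicted, labeled, tolerance=1.0)
print("precision:", precision, "recall:", recall)
```

Note that a detection with no matching label counts against precision even if the network is right and the human missed the atom, so these scores measure agreement with the annotator rather than agreement with a true ground truth.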
And actually, these are some of the parameters that describe how robust this methodology is, in terms of what is called precision and recall. Basically, this is a nice way to see whether the convolutional neural net was working fine. Of course, we only ran this for one set of materials, the platinum-based carbon nitrides. And one of the reviewers, who was very clever, asked as a first question: what happens if you replace platinum with another metal? So we took exactly the same material, but prepared with iron instead of platinum, and actually the network can identify its single atoms, too. And I think I'm running over time, so I will wrap up fast. This also allows us to do something that the microscopists would complain about doing, which is recording the positions, I mean, making a 2D map of the positions of all the atoms. So the degree of information that we are getting from these samples, in terms of what the distribution is, is much higher. Is this a random distribution? We can test better models if we have these maps. And we can also calculate to what extent the different metals can interact. And with this, I would just like to conclude. As you have seen, machine learning can affect the whole field of catalysis at different levels. And we have challenges in changing the language and in how to transfer the different understanding that we have. And with this, I would like to thank the group, and thank you for your attention. Thank you.

We have time for several questions.

Maybe just to get it started. You started your talk by talking about these databases you have with ioChem-BD, and then you gave us a lot of stories about how you used your own data sets in some of these predictive models. Do you have any efforts to automatically use things from ioChem-BD, or things that weren't generated in your group?
Have you leveraged other random things that people have put into the database in any of these efforts, or has it all been your own students generating these things?

So for instance, there is the automatic network recognition, and people are subscribing because the data sets are nice to upload and easy to handle. But still, we don't have people contributing to developing tools, which is a pity.

A couple of slides ago, you had this convolutional network trying to detect the elements there, right? So I guess there's some object detection going on. But for that, you'd need a lot of labeled data, right? Somebody would have to sit down and draw boxes around the atoms. Did you have to do that, or did you use some other technique?

No, we had to do that, because you still rely on the truth of the microscopists. So you still need to train on images that were annotated manually. There is no other way so far. But if you come up with one, I'm happy to hear it.

But you also said that you found some atoms that the microscopist hadn't.

Yes.

So who's right in that case?

Yeah, that's a crucial problem. That's a crucial problem if we are trying to use these automated things. There is no ground truth. And since there is no ground truth, there is some room for discrepancies. We will have to face this, but it will happen with many techniques: when we are trying to do this smart characterization, we are contrasting the results with references, and we assume that everything in the references is right.

Any other questions?

Thank you very much for this amazing talk. I found this single-atom catalysis extremely interesting, when you said that you could just replace platinum with iron, I think. So can you in principle do this with any metal, or are there some limits?

So the contrast in this image comes from the square root of the atomic number.
OK, so in principle, platinum and iron are not as far apart as you can get for metals in the periodic table. Now, if you are asking me about carbon and nitrogen, we would need to combine more than one technique. Microscopy is not good for this.

Thanks for the beautiful talk. Have you ever tried any unsupervised learning, like the one Alessandro Laio presented, to capture density peaks in, I don't know, color, for example? Or do you intend to do so in the future?

Not yet. We will try to do that in the future. This is more of a characterization project, and this was the easiest thing that we could do. But certainly there are many other techniques that need to be implemented, and some of them are very much related to density, so they will come.

OK, let's give her another round of applause. Thanks, Maria.