So, the previous speaker was talking about synthetic data sets, and here I am. In this presentation I want to tell you about our chemical machine learning exercise, where we aim to answer the question of how we can treat systems with a large number of chemical elements, in our case 25. As a toy system we hopped on the trend of high-entropy alloys, which are novel metallic alloys, very appealing to the scientific community due to their unique properties. Why are these systems interesting for us to study? Because you can control their properties by doping with small concentrations of various chemical elements, and they are quite challenging to model because of their chemical diversity.

So what has been done so far in the literature? There are many nice works about high-entropy alloys, and we can categorize them along three major lines. First, there are first-principles calculations on lattices or lattice models, but most of them treat four-, rarely five-component alloys and cover very narrow phase spaces. Another problem is that there are no data sets available in the community that would give us insight into different phase stabilities, help to predict new phases of high-entropy alloys, and facilitate combinatorial search. And of course, having multiple species in your system is very challenging, because descriptors usually scale badly with the number of different chemical elements.

I'm not going to bore you with this slide; I'm sure you've seen it from different angles. I just use it to guide you through my talk. It's going to be a complete story, where I start from the data set and how we created it, then features, models, predictions, and so on.

First of all, we looked at all the elements that are typically used to create high-entropy alloys. As you see on the slide, the most popular ones are the d-block transition metals, so we focused on those. We also wanted to see whether we can cluster elements by their chemical nature, so for now we took a group that shares some similarities in chemical nature. Based on these 25 elements, we created structures on either an FCC or a BCC lattice, each containing from three to eight randomly chosen chemical elements. Then we either keep the atoms on the ideal lattice positions or displace them, so we have both on-lattice and off-lattice structures. We generated one million structures for each category; on the right-hand side of the slide you can see a 2D representation of the randomly generated data. Then, with farthest point sampling (FPS), we ensured uniform coverage of this randomly generated pool, and all the selected structures were recomputed with a plane-wave code using non-spin-polarized DFT, with around 36 to 48 atoms per structure. We also want to propose this data set to the community as a possible benchmark for how methods scale with the number of chemical elements in a system; a minimal sketch of the generation and selection recipe follows below.
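To make that recipe concrete, here is a minimal Python sketch of the generation and selection steps, assuming ASE for structure building. The element pool, lattice constant, cell sizes, and the features used for FPS are illustrative placeholders, not our exact settings.

```python
import numpy as np
from ase.build import bulk

# Illustrative 25-element d-block pool; the exact list is an assumption.
ELEMENTS = ["Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn",
            "Zr", "Nb", "Mo", "Ru", "Rh", "Pd", "Ag", "Cd",
            "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au"]

rng = np.random.default_rng(0)

def random_alloy(lattice="fcc", rattle=0.0):
    """Decorate an FCC or BCC supercell with 3-8 randomly chosen elements."""
    seed = bulk("Cu", lattice, a=3.6, cubic=True)    # placeholder lattice constant
    atoms = seed.repeat((2, 2, 3) if lattice == "fcc" else (3, 3, 3))  # ~48-54 atoms
    species = rng.choice(ELEMENTS, size=rng.integers(3, 9), replace=False)
    atoms.set_chemical_symbols(rng.choice(species, size=len(atoms)))
    if rattle > 0.0:                                 # off-lattice variant
        atoms.rattle(stdev=rattle)
    return atoms

def fps(features, n_select):
    """Greedy farthest point sampling over per-structure feature vectors."""
    selected = [0]
    dist = np.linalg.norm(features - features[0], axis=1)
    while len(selected) < n_select:
        selected.append(int(np.argmax(dist)))
        dist = np.minimum(dist,
                          np.linalg.norm(features - features[selected[-1]], axis=1))
    return selected

# Small random pool; the real pipeline generated one million structures per category.
structures = [random_alloy("fcc", rattle=0.1) for _ in range(1000)]
# Stand-in features: composition vectors (counts of each element per structure).
feats = np.array([[s.get_chemical_symbols().count(e) for e in ELEMENTS]
                  for s in structures], dtype=float)
selected = [structures[i] for i in fps(feats, 100)]  # subset to recompute with DFT
```

In the real pipeline the FPS features would come from a density-based descriptor of the structures rather than from bare composition vectors.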
Now, here are some properties of the data set; these are the atomic concentrations represented in it, and for every element they are more or less the same. You might wonder why you don't see any differences: it's because we mostly target low concentrations, since we want to model doping-related effects at small concentrations, and there are many elements in the same frame. On the right-hand side are the displacements with respect to the crystalline lattice. For now they are not that large, but we are also planning on adding more strongly displaced, more radical structures.

So now we have our data set, and we also need to describe our data to pass it to the model. Since I come from a lab that identifies itself with the SOAP family, I am using SOAP descriptors: here we have our atomic-density-based descriptor. As you can see, this descriptor treats every chemical element separately, which makes it scale badly with the number of chemical elements. And it doesn't help that, to cover all the combinatorial permutations of different species, you also need a large data set.

So what do we do? We might ask ourselves: can we use the similarities between different elements, the similarities in their behavior? This question was discussed in an article by my former colleagues, where they showed that if you reduce your chemical space and project all the elements into it, you can actually see that they group by their chemical nature; hydrogen, for example, turns out to sit not with the first group of elements, the alkali metals, but closer to the halogens. They came up with the idea of creating a matrix, a kernel, which compresses your chemical space and provides you with pseudo-species that can be used to represent your real species. They showed on the example of the elpasolite data set that it actually works: that data set has 35 elements in total, and there they used compression down to four pseudo-chemical elements and still got quite high accuracy. The problem with that approach was that, while it works very well, it takes roughly two weeks to converge using 6k training points, which is maybe not so good. So we adopted this approach and re-implemented it in a GPU-friendly manner using PyTorch; now it runs much faster.

Let me quickly guide you through this chart. As an entry point, you initialize the coupling matrix, where all the chemical compression happens, with the differences in electronegativities between the different elements in your system, and you initialize the model weights at random. As input from your data set you also have the density expansion and the energies. At each iteration you compress your density expansion with the current coupling matrix, compute the power spectrum, predict energies, forces, and the loss, and then, with a gradient-based optimization algorithm, you backpropagate and optimize the coupling matrix and the weights simultaneously. The degree to which you want to compress is set through the shape of the coupling matrix. The important point is that, differently from the approach presented by our colleagues, we use simultaneous optimization of the coupling matrix and the model weights, which also helps to bring the error down.
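Here is a minimal PyTorch sketch of that loop, with toy tensor shapes and a linear readout standing in for the real model; the electronegativity values and the form of the "power spectrum" are illustrative stand-ins.

```python
import torch

torch.manual_seed(0)
n_atoms, n_elements, n_pseudo, n_radial = 48, 25, 4, 6

# Toy stand-ins for one structure: density-expansion coefficients and a target energy.
coeffs = torch.randn(n_atoms, n_elements, n_radial)
e_ref = torch.tensor(-1.0)

# Coupling matrix, initialized from electronegativity differences (placeholder values).
chi = torch.rand(n_elements)                               # "electronegativities"
u = (chi[None, :] - torch.linspace(0.0, 1.0, n_pseudo)[:, None]).requires_grad_(True)

n_feat = n_pseudo * n_pseudo * n_radial * n_radial
w = (0.01 * torch.randn(n_feat)).requires_grad_(True)      # random weight init

def predict_energy(c_dens):
    c = torch.einsum("pe,aen->apn", u, c_dens)             # alchemical compression: 25 -> 4
    ps = torch.einsum("apn,aqm->apqnm", c, c)              # toy "power spectrum"
    return (ps.reshape(len(c_dens), -1) @ w).sum()         # sum of atomic energies

opt = torch.optim.Adam([u, w], lr=1e-2)                    # u and w optimized together
for step in range(200):
    opt.zero_grad()
    loss = (predict_energy(coeffs) - e_ref) ** 2           # energy-only loss; in the real
    loss.backward()                                        # model, forces come from gradients
    opt.step()                                             # w.r.t. atomic positions
```

The key design choice is simply that both `u` and `w` are handed to the same optimizer, so the compression is learned jointly with the regression weights.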
And here is what we have achieved so far; these are the predictions on the data set I discussed just before. This is still work in progress, and we hope, of course, to get more points on the learning curve. You can see that for a quite diverse data set, where the standard deviation of the energies is around two electron volts per atom, we managed to achieve an accuracy of 13 meV per atom. Also, using this energy-only model, we predicted the forces; the error is shown on top of this scatter plot. And if you toss in 200 training points with forces, you can drop the force error even lower, to 240 meV per angstrom.

With this I would like to conclude and discuss a little bit our future steps, what we want to do next. As I mentioned before, we want to expand our data set: we will add structures with larger displacements and structures containing more than eight elements simultaneously. If you have any valuable input, any wishes, or if you are potentially interested in this data set, come over.

Then, as I told you, we optimize the coupling matrix in our training loop. We want to investigate whether this coupling matrix can become a sort of input parameter, in the sense that you can transfer the matrix to another system and already have the coupling of the elements in your model; a minimal sketch of this idea follows below. Of course, the features and the setup of the model would have to be the same, but it could still be a nice result. We also want to see whether we can get any interpretability out of this coupling matrix, whether it gives us some insights about chemical nature. And with respect to results, our next, less ambitious goal is predicting phase stability and running NVT simulations and some other kinds of MD.

So, thanks a lot, and thank you for your questions.
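As a hedged illustration of that transfer idea, continuing the toy PyTorch model from before: the previously optimized coupling matrix is frozen and only the weights are retrained on the new system. The shapes and the loading step are assumptions.

```python
import torch

n_elements, n_pseudo, n_feat = 25, 4, 576   # shapes matching the toy model above

# Stand-in for a coupling matrix optimized on the original data set;
# in practice this would be loaded from a previous training run.
u_pretrained = torch.randn(n_pseudo, n_elements)
u_pretrained.requires_grad_(False)          # frozen: acts as a fixed input parameter

w_new = torch.zeros(n_feat, requires_grad=True)   # only the weights are trained now
opt = torch.optim.Adam([w_new], lr=1e-2)
# ...then the same training loop as before, with u_pretrained held fixed;
# the features and the model setup must match the original training.
```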