Actually, it was Kath and Sweeter from 2009 that I... I'm sorry. Can you come here? Can you hear me? Good.

So I'm going to talk about computational screening, as did the first talk this morning, but in this case I will focus on solar energy materials. You have already heard some of the ideas behind computational screening, but I've put three points here. The first one is: what is actually the problem that you want to solve? That is, what are the properties of the dream material for whatever application you have in mind? Then, in order to get any further, you need to know what we can actually compute that is relevant for this material. And finally, since the number of possible materials is so huge, you usually have to limit yourself somehow to a part of materials space. I think these are the main components, and they are the ones I'm going to address.

The problem that I'm going to focus on is so-called light-induced water splitting. The idea is to take light from the sun, make electron-hole pairs, and use those to split water into hydrogen and oxygen. In this case we use a tandem cell: we have two semiconductors, one taking the high-energy part of the solar spectrum and the other the low-energy part, and the reason you do this is efficiency. It's much more efficient than having a single material.

So why do we want to do this? We want to produce hydrogen as a fuel, to save the world. And in some way I'm not even kidding. There is a really serious problem with the use of fuels, so we have to do something in the not too distant future.

The devices that people work with now are really quite complicated, and I'm not going to go into any details because you are not all that interested in water splitting. But the fact is that we have these two semiconductors, we probably have to protect them from the water with some protection layers, and we also need some catalysts to get the chemical reactions running.
And it turns out that some of these components we actually know how to make, but one real challenge is what the large-band-gap semiconductor should be. Silicon will work pretty well as the low-band-gap one, because the optimal band gap there is around 1.1 eV, but the large-band-gap one should be around 1.8 eV, and that's the material that we would like to find.

So I will discuss three different approaches to the question of materials space. One is to map out a particular class of materials, and there I'll look at the perovskites. The second is to focus on known materials from the ICSD, as we've also heard about earlier today. And the last is to take a look at machine learning and how to use that to guide the search in materials space.

We've earlier done some work on oxide and oxynitride perovskites, and for many reasons they were not optimal for this application. One thing was that the position of the valence band maximum was too low, and that's something that sulfides generally do better. So in this project we decided to concentrate on sulfide perovskites. We have a screening funnel where we go through different properties. There was a question before, I think about getting Tc, about the order in which one should do the different screening steps: the point is of course to have the more demanding calculations at the end, so that you do not have to do that many of them, because you've already discarded many of the materials.

The way we approach this is that we take 53 different metal atoms, so we have ABS3 with A here and B there. Then, as an initial screening, we take a small 5-atom unit cell, which is allowed to distort, and see how many of these are in fact semiconducting. So can they have a gap at all? Could I take, excuse me, a bit of water here.
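
The enumeration of ABS3 compositions and the cheapest-first ordering of the funnel can be sketched in a few lines of Python. This is only an illustration: the metal labels, the cost ranks, and the toy predicates are hypothetical placeholders, and whether the study allowed A and B to be the same metal is an assumption (53 × 53 = 2809 ordered pairs, which is at least consistent with the "something like 3,000 materials" quoted later in the talk).

```python
from itertools import product

def enumerate_abs3(metals):
    """Return every ordered (A, B) pair; each pair stands for one ABS3 composition."""
    return list(product(metals, repeat=2))

def run_funnel(candidates, steps):
    """Apply (name, cost, predicate) steps from cheapest to most expensive.

    Returns the survivors and the number of candidates left after each step.
    """
    history = [("start", len(candidates))]
    for name, _cost, keep in sorted(steps, key=lambda step: step[1]):
        candidates = [c for c in candidates if keep(c)]
        history.append((name, len(candidates)))
    return candidates, history

# 53 placeholder labels stand in for the 53 metals (hypothetical names).
metals = [f"M{i}" for i in range(53)]
compositions = enumerate_abs3(metals)
print(len(compositions))  # 2809

# Toy predicates on the label indices; real steps would call DFT calculations.
steps = [
    ("defect tolerance (expensive)", 3, lambda ab: ab[0] != ab[1]),
    ("has a band gap (cheap)", 1, lambda ab: int(ab[0][1:]) % 2 == 0),
    ("stability (moderate)", 2, lambda ab: int(ab[1][1:]) % 3 == 0),
]
survivors, history = run_funnel(compositions, steps)
for name, remaining in history:
    print(f"{name:32s} {remaining}")
```

Sorting by cost means the expensive predicate is only evaluated on what the cheap ones let through, which is the point made about ordering the screening steps.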
So after this step, where we've identified these, we look up in the ICSD which are the most common structures for ABS3 materials, and then we simply pick the six most common ones and investigate these materials in all six structures. In this way, with the different structures for a given material, we can calculate the binding energy and the heat of formation, and we can calculate the band gap, which is shown here.

Here you can see an example where we have different structures with different energies, but then we have a point here which is, just as Stefano talked about, the convex hull. This is the energy you get by combining binary components in the most efficient way. In this case you can see that the convex hull is lower in energy, so this material will not work; it will be unstable. And we have these error bars, which we get from using the mBEEF functional, so that we have an estimate of the uncertainty on the predicted energies. In some cases, like over here, you see the convex hull is up there, so the material might be stable, but you can see that there are several different crystal structures that we cannot really distinguish between. Then it's quite important what the band gap is for these different structures, because if you have a competing structure that is metallic and you get just a little bit of that into your compound, then it's terrible, but if it has a fairly high band gap it might not be that important for the performance of the material. These are the kinds of details you have to go into, I think, if you really want to find new usable materials.

Then we can calculate the band gaps. Again, here are the different materials, and the different symbols denote the band gaps calculated for the different structures. So we can focus on this green window, which is relevant for water splitting, around 2 eV for the band gap, and we also collect materials that could be relevant for photovoltaics, with a band gap around 1 eV.
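
The convex-hull comparison can be illustrated with a deliberately simplified sketch. It assumes a single composition variable x between two binary end members, whereas a real ABS3 analysis needs the full multi-component hull; all energies below are made-up numbers in eV per atom, and the function names are hypothetical.

```python
def lower_hull(points):
    """Lower convex hull of (x, energy) points, returned sorted by x."""
    lowest = {}
    for x, e in points:
        lowest[x] = min(e, lowest.get(x, e))  # keep the lowest energy per composition
    hull = []
    for p in sorted(lowest.items()):
        # Drop the previous vertex while it lies on or above the chord to p.
        while len(hull) >= 2:
            (x1, e1), (x2, e2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - e1) - (e2 - e1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def hull_energy(hull, x):
    """Linear interpolation of the hull energy at composition x."""
    for (x1, e1), (x2, e2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return e1 + (e2 - e1) * (x - x1) / (x2 - x1)
    raise ValueError("x lies outside the composition range of the hull")

def energy_above_hull(x, energy, competing):
    """Candidate energy minus the hull of the competing phases at composition x."""
    return energy - hull_energy(lower_hull(competing), x)

# Made-up competing phases: two end members and one mixed phase above their tie line.
competing = [(0.0, -1.00), (1.0, -1.20), (0.5, -1.05)]
print(energy_above_hull(0.5, -1.15, competing))  # negative: below the hull
print(energy_above_hull(0.5, -1.02, competing))  # positive: the hull is more stable
```

A negative value means the candidate lies below the energy of the best combination of competing phases; in practice the value would be compared with the error bar on the calculated energies before calling a material stable or unstable.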
We calculate the band gaps with GLLB, which includes the derivative discontinuity and therefore gives a much better estimate of the band gap than PBE DFT. We also try to address mobility by calculating the effective masses. That's of course not in any way a complete description, but it's better than nothing. Here you have the electron and hole masses, and we concentrate on the materials where you find masses that are below one electron mass.

The last step, which is maybe the most unusual one here, is that we try to study the defect properties. The way we do that is by looking at the density of states when we create vacancies, and what is important is whether the vacancies introduce states in the band gap, as you see here, because these might act as scattering centers that destroy the properties of the material. So these materials, which we call defect sensitive, we then remove from the screening procedure.

In the end we end up with, in this case, about 15 different candidates among these sulfides. And then, as was also discussed before, what do we do? Luckily we have some experimental colleagues nearby who are interested in exactly water splitting and these kinds of materials, and we discussed with them that they should try to look at lanthanum yttrium sulfide as a candidate. They've done that, and here is the X-ray result from the synthesis. You are not meant to look at any particular things here, except that the conclusion is that the material does actually have the predicted crystal structure. Also, the band gap is around 2 eV, as measured here by spectroscopic ellipsometry. And finally, the photoluminescence shows a nice peak close to the band gap, which is some indirect indication that you don't have too many problems with states in the band gap.
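
The effective-mass criterion can be made concrete with the parabolic-band relation m* = ħ² / (d²E/dk²). The sketch below estimates the curvature by a central finite difference at a band extremum; the function names are hypothetical, and requiring both the electron and the hole mass to be below one electron mass is an assumption about how the criterion was applied.

```python
HBAR2_OVER_ME = 7.619964  # hbar^2 / m_e in eV * Angstrom^2

def effective_mass(energies, dk):
    """|m*| / m_e from band energies (eV) at k0 - dk, k0, k0 + dk (dk in 1/Angstrom)."""
    e_minus, e_zero, e_plus = energies
    curvature = (e_plus - 2.0 * e_zero + e_minus) / dk**2  # d2E/dk2 in eV * Angstrom^2
    return abs(HBAR2_OVER_ME / curvature)

def light_carriers(m_electron, m_hole, limit=1.0):
    """Assumed criterion: both masses below `limit` electron masses."""
    return m_electron < limit and m_hole < limit

def parabolic_band(k, edge, mass, sign):
    """Model band: conduction (sign=+1) or valence (sign=-1) edge with a given mass."""
    return edge + sign * HBAR2_OVER_ME * k**2 / (2.0 * mass)

dk = 0.01
conduction = [parabolic_band(k, 2.0, 0.4, +1) for k in (-dk, 0.0, dk)]
valence = [parabolic_band(k, 0.0, 1.5, -1) for k in (-dk, 0.0, dk)]
m_e, m_h = effective_mass(conduction, dk), effective_mass(valence, dk)
print(round(m_e, 3), round(m_h, 3), light_carriers(m_e, m_h))
```

For a real band structure the three energies would come from the calculated bands around the conduction band minimum and valence band maximum, and the mass generally depends on the direction in k-space.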
So at the moment they're trying to actually incorporate this material in a water splitting device. That's not that easy, but we'll have to see what comes out of this.

There are of course many limitations in this approach: we only have a particular composition, we only have a few structures, we do not have a very accurate calculation of the band gap, the mobility is only addressed through effective masses, for defects we only consider neutral vacancies, and so on. So there's room for lots of improvement. But still, in the end, we started out with something like 3,000 materials and we end up with a short list of 15, so you have a huge reduction in the number of materials that you have to worry about.

A different approach from looking at a particular class of materials is to take a starting point in some of the available databases. What we have done here is to start with the ICSD materials, as they are already calculated within the OQMD database. That has a lot of advantages. One of the most difficult issues to address is stability, but these materials have actually been synthesized, so that's at least not the major concern to start with. Also, some of the properties have already been calculated within the OQMD, so using that is a way to speed up the search.

The first thing we address here is the abundance of the elements, and the question of whether you have a monopoly market for these elements. If these are meant as energy materials that should save the world, they will have to be used at very large scale, and that's the reason to focus only on elements which are expected to be available at a reasonable cost. So here is the screening funnel, where we start by picking out these elements, and then we get something like 70,000 materials. The band gap with PBE is already calculated in the OQMD, so we can use that to do a pre-screening of the materials and say we'll only focus on the ones which are known to be semiconducting and to have a band gap less than 2 eV, since PBE is
severely underestimating the band gap, so we expect that in this case you wouldn't miss materials which actually have, say, 2 to 3 eV of band gap in reality. This brings down the number to around 1,600 materials. Then we calculate the band gap with GLLB, and here you can see the connection between the GLLB and the PBE band gaps: as expected, the GLLB gap is somewhat larger. So we can focus on the region which is relevant for water splitting and the region relevant for photovoltaics. Then, as before, we can calculate the effective masses, and here, below this curve, you see the fraction that actually obeys this criterion. And again we take the same approach with the defect sensitivity, and in this way we end up with 74 candidate materials. So out of the whole of the ICSD, this is the rather small class of materials that we find.

We have a long list of the properties of these materials. There are of course many known materials among them, which was also addressed by Stefano: you have to check whether the procedure actually behaves the way you expect for materials which have already been investigated as solar energy materials. And then we can point to some particularly interesting candidates. For example, this strontium sulfide is another material that is now being experimentally investigated to see how it will perform.

At the end here I will say something about some of our attempts at using machine learning. Of course there are many ways of doing machine learning: different fingerprints for representing the materials, different methods such as kernel regression, neural networks, and so on. The first of the two questions that I am going to ask here is: can we predict material properties without knowing where the atoms are? The point is that the kind of screenings that I have been talking about mostly use standard density functional theory calculations, and what is really time consuming is to do the structure optimizations for all these new materials. But if we make a machine that can predict, say, the stability and
the band gap for a material only if we know where the atoms are, it is no help, because it means that we have to do the DFT calculation first in order to find the positions, and then we don't need the machine.

The other question is this: if we are considering very large materials spaces, then even though you might have an efficient machine, it is not possible for the machine to run through all the combinations. If you have 10^10 or 10^15 different materials, this is not possible. So we will need some ways of directly identifying interesting candidates instead of just trying them out.

I've put one example of this here, not related to bulk materials but to organic solar cells, where we are interested in what are called PCBM-based blended polymer solar cells. I have written down here what PCBM means, because I would not be able to remember it if you asked about it. The way this works is that again you generate electron-hole pairs. I am not going into any kind of details here; I think the main part is the fun with the machine learning. The crucial quantities that you are after are the position of this LUMO level, so that you can transfer the electron to the acceptor, and the band gap for the light absorption.

We are looking into a class of donor-acceptor molecules where you have different possible acceptors, different possible donors, say backbones, and then different side chains, here indicated X and Y, which can have a number of different possibilities. So in total this space will be something like 10^14 different molecules in principle. We have done B3LYP calculations for about 4,000 of these, but you would not like to do this to a much larger extent. You can do machine learning on this using all the atomic coordinates of the molecules and some of the standard fingerprints, and it works very nicely. But again, how do you know the coordinates for molecules that you haven't calculated? So the approach that we have
taken here is to represent the molecules by strings, or rather by grammatical production rules. I mentioned that you have an acceptor and a donor part and different side groups, and that is turned into a string that can be produced with some particular grammatical rules. So in this case we do not have any specification of the atomic coordinates. Let me just state the end result: if you do this instead of using the coordinates, your predictions of the band gap are maybe 40% worse. OK, so it's better to have the coordinates, but it's not that crucial. This kind of approach has been used in the Aspuru-Guzik group, also using SMILES, for some other molecules. So this is a question of how we represent our systems, in this case without any atomic coordinates.

The other point was how we make predictions without running through all the different combinations. One way of doing this is the variational autoencoder, where you train a neural network so that you have an input, in this case the strings, and the neural network maps this into a much lower-dimensional vector space; in our case here it's a 32-dimensional vector space. The way that this is trained is so that you can optimally reproduce the input string again in the output. So you're trying to compress the information into a vector space and then out again, so that you don't lose very much information by doing this. One of the advantages is that you can then move around in this 32-dimensional vector space. It's still large, but because it's a vector space you can also do things like taking gradients to actually optimize your searches.

Just to show one result for this latent space: if you do a principal component analysis, then these are the first two components of this 32-dimensional space, and here you can see the data points from the training set. What you can also see is that if we now from the data set have some interesting molecules
with appropriate LUMO values and HOMO-LUMO gaps, they are indicated by the yellow points. So the point is that you can now actually move around: you train another neural network to produce the numbers for the band gap and for the LUMO, you train that on this space, and then you can move around in the relevant part of the space and suggest new molecules without having to run through billions and billions of molecules. You can see the result here: this is the training set, and this is the relevant region, in this case in terms of the optical band gap and the LUMO energy. Here we have predicted 100 new molecules by this approach, and you can see that they now lie close to the region that we are interested in. I mean, this is kind of a test; it is in no way useful at the moment for these solar cells, but I think it's fun to see these different approaches that can be used for finding materials.

So at the end, some pretty naive considerations. When we move to the exascale, we can simply do much more in terms of computer power: we can screen more materials, we can do better calculations, and we can calculate new properties which are much more demanding than what we can do today. A trivial point that I'd like to make here at the end, however, is this: when we are able to do this, we might be able to calculate not a thousand materials but a million or a billion. But if what you are seeking is one percent of these, you will end up with very many candidates, and you cannot go to the experimentalists with ten thousand candidates and say, could you please try these out. So I think a very important challenge will be to find better and more descriptors which are relevant for the real material properties, so that one can really narrow down the materials that are interesting to investigate further.

And here are some of the people that have done this. In particular, Korina Kuhar and Mohnish Pandey have done the two screening studies, of the sulfide perovskites
and of the ICSD/OQMD part. OK, thank you.
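
As an illustration of the string representation described in the machine-learning part of the talk, the sketch below builds molecule strings from a toy set of grammatical production rules, so that a molecule is fully specified by a sequence of rule indices and no atomic coordinates are needed. The grammar, the fragment names, and the side groups are hypothetical placeholders and not those of the actual study.

```python
from math import prod

# Hypothetical toy grammar: each non-terminal maps to its list of production rules.
GRAMMAR = {
    "MOLECULE": [["ACCEPTOR", "DONOR"]],
    "ACCEPTOR": [["A1", "SIDE"], ["A2", "SIDE"]],
    "DONOR": [["D1", "SIDE"], ["D2", "SIDE"], ["D3", "SIDE"]],
    "SIDE": [["(H)"], ["(F)"], ["(OCH3)"]],
}

def expand(symbol, choices):
    """Expand `symbol` leftmost-first, taking one rule index from `choices` per non-terminal."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emitted as-is
    rule = GRAMMAR[symbol][next(choices)]
    return "".join(expand(part, choices) for part in rule)

def count_strings(symbol):
    """Number of distinct strings the grammar can produce starting from `symbol`."""
    if symbol not in GRAMMAR:
        return 1
    return sum(prod(count_strings(part) for part in rule) for rule in GRAMMAR[symbol])

# A molecule is encoded as the sequence of rule indices used to produce it.
rule_sequence = [0, 1, 2, 0, 1]
print(expand("MOLECULE", iter(rule_sequence)))  # A2(OCH3)D1(F)
print(count_strings("MOLECULE"))                # 54
```

The rule-index sequence, rather than the final string, is the kind of fixed-vocabulary input a network such as an autoencoder could be trained on; with more fragments and side-chain positions the count grows multiplicatively, which is how spaces of the size quoted in the talk arise.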