So yeah, thanks a lot for agreeing to be on the panel. This turns out to be the last event in this workshop, so I think it's a good way to both conclude and look into the future. Let me ask one question first to all the panelists, and then we'll open it up to the floor and to Zoom. I was interested to hear your opinions on what you think is the biggest achievement that machine learning has brought to the field in the last few years. You all work on slightly different things, so maybe if you want to start, Zach, and then we go from right to left.

Convincing people to learn statistics. I think, if nothing else, it has made it cool to learn statistics and backpropagation and all the things we now take for granted. It was quite hard to convince people to do this ten years ago, but now all of a sudden everyone wants to learn. So I think that's already a big help, even if we call it machine learning.

I think the impact of machine learning has been to encourage us to rethink a lot of the problems in the field where manual parametrization or intuition has been the norm, like in density functional theory and force fields. What computers have learned about interpreting images and text basically told us that a lot of the manual work can be automated with these high-dimensional models. So now all of these approximations can be rethought and redone much more accurately, and perhaps automatically. That encourages a lot of people to think that all these difficult problems can maybe be solved.

Yeah, together with these two things, I think the openness of our data is also a very good point. We are really able to do these things because we are more transparent in the way we work, and this is going to be good for all of us.

I don't really feel like I'm in a place to say; I feel like I'm more on the other side.
But what I hope to continue to see happening is more sharing of code, so that we can have quicker iteration. I think it's already happening; I definitely don't want to say that machine learning has done that, but it's happening more and more, and that's the only way we can make good progress, I think.

Yeah, so as experimentalists, I think it gives us the opportunity to connect simulations and experiments, and we have seen some of that. Especially when there is no direct connection between the experiment and the simulation, machine learning can help to bridge this gap, and this is very helpful.

Okay, thanks a lot for this initial assessment. Maybe we take questions now from the audience, including Zoom; just raise your hand. You can ask a specific question to an individual panelist or a general one. Andre, for example?

Okay, I have two questions. First, we've seen over this conference that lots of people are using more and more complex models, like neural networks with more and more layers and more complex architectures. So my thought is, because I'm not a machine learner yet: is there a way we could build something like an active-learning AI that would architect these machine learning models for you, so that they would become more efficient by themselves? That's my first point. And the second point: in catalysis, for example, we've seen lots of improvements getting close to the accuracy we want, but all these models are still mostly in vacuum, while the real chemistry happens at ambient conditions or in water. How do you see the field getting over this, and over the extra dimensionality of having the solvent present, and how will it be in the future? Thank you.

Let's take them maybe one at a time, because they're quite different questions.
So, any thoughts from the panelists on using active learning to make models more compact?

I think you might be talking about AutoML-type methods, like neural architecture search or other automated ways to find models that work. I feel like that's already starting to happen a little, and it has especially happened on the image and language side, but I'm not aware of too many great tools for atomistic potentials. It sort of feels like it has to happen, unless everyone in this room becomes an expert at the latest and greatest every single time. So maybe that's the solution: we all just get really good at it, or maybe someone comes up with a clever strategy to find the right representation or make these decisions automatically. I don't know what will work.

Any other thoughts on this point? Rianne?

I think for the image domain it took quite a long time for the AutoML setting to become useful; maybe it's actually still in the process of becoming useful. So I think we'll probably first hit the point where we have lots of people with the magic skills to figure out the right architecture, and then maybe we'll get there. It's a super hard problem, but it would of course be super useful.

Then on the other aspect of complexity, I think there's a great opportunity in what you mentioned: fusing computation and experiment in joint machine learning models. Any other thoughts?

To me, that's really the only way to go further: being able to integrate information that comes from different layers. So not only our traditional DFT, or the energies we can supply from the new potentials, but also trying to encode parts of the information from, in catalysis, the very detailed characterization that we are getting. Characterization is very expensive, so you want to make the best use of it. I think models with multiple layers will be needed in the future.
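To make the neural-architecture-search idea mentioned above concrete, here is a minimal sketch of its simplest flavor, random search over a small architecture space. Everything here is invented for illustration: the search space, and a toy validation score standing in for what would really be an expensive train-and-evaluate run per candidate.

```python
import random

# Hypothetical search space: depth and width of a feed-forward network.
SEARCH_SPACE = {"depth": [2, 3, 4, 6], "width": [32, 64, 128, 256]}

def toy_validation_score(depth, width):
    """Stand-in for an expensive train-and-validate run.

    Real AutoML would train each candidate on data; here a made-up cost
    rewards a target capacity and mildly penalizes parameter count.
    """
    capacity = depth * width
    return abs(capacity - 256) + 0.01 * capacity  # lower is better

def random_search(n_trials, seed=0):
    """Sample architectures at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        cand = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        score = toy_validation_score(cand["depth"], cand["width"])
        if best is None or score < best[0]:
            best = (score, cand)
    return best

score, arch = random_search(n_trials=50)
print(arch)  # e.g. a small network whose capacity lands near the sweet spot
```

Real tools replace random sampling with smarter strategies (Bayesian optimization, evolutionary search, weight-sharing supernets), but the loop structure is the same.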
Just to comment on model complexity, because I think a very important point was raised: models are now getting very accurate for a lot of properties, but in practice they are also getting slow. For practical applications like molecular dynamics or materials design, they need to become much faster and consume fewer computational resources, both for training and for inference. So I think the next step in this whole game is sparsification, or reduction of model complexity; whether it can be done automatically or somehow by hand, I don't know. But there's a lot of evidence that we are overdoing things with gigantic neural networks or very complicated descriptors, and every time you try to simplify things, it still works. So there's a lot of opportunity in sparsifying and simplifying models to make them much faster, and maybe they'll also become more transferable, who knows. But I think this is where the field probably needs to go.

Let's take another question from the floor.

Thank you. My question was already touched on a little in the first talk: we produce a lot of data, but I feel we are actually quite ineffective at sharing it with the community. So maybe some thoughts on how data, and maybe also pre-trained models, can be shared with the community in the most effective way.

Do you want to start, Zach? I mean, you're already doing it.

We've certainly been generating data, but purposefully we haven't been working on the problem of how to share the data or integrate it with others', because I look around and it's a monumental task: it's very long-term, and it needs a lot of resources. In a lot of cases, I think the only way to get everyone to do it is for it to come top-down from funding agencies or someone else saying how to deposit things, like what happened with the NIH and the Protein Data Bank.
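The sparsification idea mentioned above, in its simplest post-hoc form, is magnitude pruning: zero out the smallest weights and keep only the largest. This is a toy sketch on a plain list of numbers; real frameworks (e.g. `torch.nn.utils.prune` in PyTorch) apply the same idea per layer on tensors, usually followed by fine-tuning.

```python
def magnitude_prune(weights, keep_fraction):
    """Zero out the smallest-magnitude weights, keeping roughly
    `keep_fraction` of them (ties at the threshold are kept)."""
    ranked = sorted((abs(w) for w in weights), reverse=True)
    n_keep = max(1, int(len(weights) * keep_fraction))
    threshold = ranked[n_keep - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

w = [0.9, -0.02, 0.4, 0.003, -0.75, 0.05, 0.31, -0.0001]
pruned = magnitude_prune(w, keep_fraction=0.5)
sparsity = pruned.count(0.0) / len(pruned)
print(pruned)    # [0.9, 0.0, 0.4, 0.0, -0.75, 0.0, 0.31, 0.0]
print(sparsity)  # 0.5
```

The empirical observation the panelist describes, that "every time you try to simplify things, it still works", is exactly what such pruning experiments test: how far `keep_fraction` can drop before accuracy degrades.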
It really required the NIH to step in and say everyone has to deposit, in order to get everyone on the same page. And just in my own group, I don't see any way that I could do anything on the scale of OQMD or NOMAD or these other centers. So I feel like there are already people working on it; it's definitely a hard problem. I feel like it's getting better, and hopefully it becomes easier, but I don't think it's something any one academic group can do. It feels like it has to happen at a larger scale: industrial, national lab, or worldwide.

Nastya, do you have any thoughts on sharing experimental data?

Yeah, sure. I think it's an incredibly challenging task, because there are so many different groups and so many different working approaches. You probably have to come up with some sort of powerful ontologies: individual groups should have their own ontologies that are then merged into bigger ones, and that gives you a layer through which you could understand the data from different sources, maybe. This will be something the community should work on in the coming years, probably.

Nuriya, you're running a data platform too. Do you have any thoughts about including models as well?

Yeah, a little bit. As both are saying, with the funding mandates we all have, it is very difficult to allocate the money to do this in a systematic way that will be good for the whole community. The only real way to do it is to either enforce it, as the protein people did many, many years ago, or to help us in doing it. Even for data creation: many of you know that if you have data in a European project, you have to plan how you will support this data for a bit longer than the duration of the project, but that support cannot be covered by the project itself.
So this generates lots of problems at different levels: not only the practical problem of fast sharing, but also the practical problem of finding the money to keep the databases alive. This has to be sorted out in some way so that we can find a way forward. And I think that researchers need to get recognized for the data they are putting into databases. This should count as an achievement, just like a manuscript or a paper does. The same goes for code; when I say data, I also mean code. These are constituent parts of what we are doing, and you should get rewarded for them.

Okay, let's take a question from Zoom.

Yes, this is a question from the chat: what is your position on machine learning models for direct property prediction, and whether these can be joined with, for example, machine-learned potentials, since up to now they have been quite separate entities?

Maybe it very much depends on which property you're trying to predict. Some properties are very smooth in structure space, for instance the energy. If you're predicting something different, for instance a kinetic property or a conductivity, very tiny changes in structure can produce very large changes in the property itself, so learning it directly could be extremely difficult. There you need a direct method for predicting, say, electronic, thermal, or ionic transport, and that's where a machine learning model accelerates the computation that itself gives you the property. Some properties can probably just be mapped directly from molecular structure to the output, but it very much depends, I think. So I don't think there's a single answer there.
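One common way the two kinds of model discussed here get "joined" is through a shared representation: one descriptor of the structure feeding both a potential-style head and a direct property head. This sketch is purely illustrative; the descriptor, both heads, the weights, and the "band gap" target are all invented stand-ins, not any real trained model.

```python
def descriptor(structure):
    """Toy shared representation of a 'structure' (here just a list of
    scalar coordinates): a bias term plus two summary statistics."""
    n = len(structure)
    mean = sum(structure) / n
    spread = sum((x - mean) ** 2 for x in structure) / n
    return [1.0, mean, spread]

def head(features, weights):
    """A linear 'head' on top of the shared descriptor."""
    return sum(f * w for f, w in zip(features, weights))

# Invented weights standing in for two trained heads on one backbone.
ENERGY_W = [-1.0, 0.5, 2.0]  # smooth target: an ML-potential-style head
GAP_W = [0.3, -0.2, 1.5]     # direct property head (e.g. a band gap)

s = [0.0, 1.0, 2.0, 4.0]
feats = descriptor(s)
energy, gap = head(feats, ENERGY_W), head(feats, GAP_W)
print(energy, gap)
```

The panelist's caveat maps directly onto this picture: a shared backbone helps only when the property varies smoothly with the representation; for transport-like properties, the head would have to be replaced by an accelerated physical computation.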
I also want to comment, because many of the predictions we are trying to do, many of the objectives we have been discussing, are related to thermodynamics, and you still have the whole burden of the kinetics sitting in a second layer that we haven't really addressed, or not much, in many of the presentations we have seen today.

Okay, a question from the floor.

My question is about how the methods developed in this community can be used by non-experts, real experimentalists; how experimental chemists would, for example, use DFT. And my question specifically is: how much do you think the fact that there seems to be a new, better method very often inhibits people adopting our models? There's a lot of change, and for example the leaderboards that we have. Is that just in the nature of a field that is still very much developing, and are we just waiting for the AlphaFold that will blow everything out of the water and make it very clear for people to adopt these methods? Or how do you see that general problem?

I think there's more and more interest in how to do really high-throughput and distributed inference across really wide spaces, right? We saw one talk today with something like a billion different crystal structures; that's amazing. I think that's going to get faster and faster. There are things that Google and Facebook and Amazon and others have already set up, so you don't need to do all of that from scratch: taking a PyTorch model and running it highly parallel is something that should be possible. And when I think about what has happened on the DFT side, things like the Materials Project have the same problem, where there's always a new recipe or functional or something, and they don't just have one entry; it increments over time as they change their settings and update.
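The high-throughput screening pattern described above is embarrassingly parallel: one trained model mapped over many candidate structures. A minimal sketch, with a made-up scoring function standing in for a real PyTorch forward pass (a "structure" here is just a list of numbers):

```python
from concurrent.futures import ThreadPoolExecutor

def predict_energy(structure):
    """Hypothetical stand-in for one forward pass of a trained model.

    In practice this would be e.g. a PyTorch potential evaluating a
    crystal structure; here it is a trivial sum of squares.
    """
    return sum(x * x for x in structure)

def screen(candidates, n_workers=4):
    """Map the model over all candidates in parallel, then rank them."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        scores = list(pool.map(predict_energy, candidates))
    # Indices sorted by predicted score (lower = better, say).
    order = sorted(range(len(candidates)), key=scores.__getitem__)
    return order, scores

candidates = [[1.0, 2.0], [0.5, 0.25], [3.0, 0.0]]
order, scores = screen(candidates)
print(order)  # [1, 0, 2]
```

At a billion-structure scale the same shape survives, with the thread pool swapped for batched GPU inference across many workers; the point is that no per-candidate coordination is needed.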
And so it's a hard problem, right? It took a lot of software engineers and others at the Materials Project to make it happen. But I think that's a really nice demonstration of what we should have for the predictions of these models. I don't think it's going to be experimentalists running models and having to learn simulations and everything else; that's just a lot to ask of someone, months or years of effort, and I'm not sure it's the best use of their time. But we should have the outputs of these things curated and put online, so that anyone can look up what the latest, greatest model says the energy of some surface or crystal or whatever is, in the same way that we just go to the Materials Project and ask what is stable.

Any other thoughts on this?

Yeah, there is also the issue that the cheaper a technology gets, the more people get attracted, and the more of a black box it is, the more you can get results that are not really helping you. This is a danger, and it is something we need to be aware of: there is the potential to use all these methodologies in the wrong way, precisely because many of them are so cheap. But this has happened before. As for the models, yes, they will get to the point where there is a consolidation around a standard, and maybe the standard will change over time. But if you think back to when the DFT codes, the plane-wave codes, were developed, there were many initiatives, and at the end of the day only a few codes have survived. I would expect there will be some kind of reduction over time of all the things we are exploring.

Related to that, maybe a question from my side. We've seen in many other presentations reference datasets, mostly from image processing, that helped a lot to get that technology to progress, right?
But it seems to me that the materials science community doesn't yet have such benchmark datasets and benchmark cases that are understandable to a contributor without too much domain knowledge. I think what you're building up now, Zach, with the challenges is the first example of that, and okay, your collaboration with Facebook already helps. But do you see, in your respective fields, the emergence of such reference datasets and reference challenges that people can contribute to?

There are already others in the community. Anubhav Jain comes to mind with Matbench, where you can go online and they have everything from small datasets up through Materials Project-scale data for predicting properties, and they track models over time; we go there quite often to see what the best model for crystal structure energies and other properties is. The other thing I would mention is that the ML community has already addressed some of these challenges with sites like Papers with Code, where the datasets and the models are tracked over time. I don't think most materials scientists have done a good job of getting their work into Papers with Code and tracked properly; when you look, the leaderboards there are pretty sparse. But those seem to be the approaches. Do you have other thoughts?

Yeah, I agree, probably something like that. All the data just has to be shared a bit better on similar platforms, so I think Papers with Code is a great way, or doing it through these challenges. I mean, how many of you have gone to Papers with Code before? A couple, right? But these are the things we should be thinking about: how do we really make it easy to compare the datasets and the models? But there are so many different tasks, right? You also mentioned, right?
That it's quite important to explain to non-domain experts what the task is. But out of all the talks we've seen here, if you think about what types of datasets you'd need to solve all of your problems, that's actually quite a lot of different datasets. So I think it's important to consider that if we just throw 10,000 really small datasets at people, nobody's going to use them. Some organization in that respect is probably going to be needed.

And do you see potential in experimental data generation for such curated datasets, for general machine learning improvement, or even for challenges like these or similar?

Yeah, sure. I've actually seen a few examples where the code and the data were shared, and in our papers we mostly shared the code and data too. There are also efforts by the community to share datasets for benchmarking. These are mostly relatively small datasets, but once you make them public, people can add to them and also refine the code, and something like this could evolve; I think it will evolve in the future.

Do we have questions from the floor or from Zoom?

I think my question is a bit more general. If you think about the beginning of deep learning, all the hype, all the young students who were super excited to do deep learning, they were quite motivated by the examples, right? For example, with images you can generate new cat images or whatever. And then there was this whole game of asking how the model works, with a lot of visualizations around what you can tweak and how things work. I come from computer science, and materials science as a field is quite interesting to me, the kind of work that's happening.
But I also feel that if I now try to talk to my computer science colleagues, it's really hard to explain to them what exactly is going on, and I feel there needs to be some sort of outreach program which explains the kinds of interesting problems we are solving here in layman's language. So do you feel there's maybe a gap? Of course this is somewhat apart from what we are doing here, machine learning for materials, but I think there's a need for general outreach that we, as machine-learning-for-materials scientists, could do: showing applications of what's happening and generating more interest among lay people. That, in combination with the datasets we just spoke about... I feel that's kind of missing. What are your thoughts and opinions on that?

I'll comment on that. You want to solve climate change? You need materials. You want to solve the problems we are having with plastics? You need materials. You want to solve the energy-harvesting problem? You need materials. You want to store hydrogen efficiently? You need materials. And those are just a few examples.

I think that's great, but a machine learning person needs slightly more concrete information than "we need materials". This kind of education is important, right? If you, as a machine learning person, enter a new field, it's like: oh my god, this looks so scary, I don't understand any of the nomenclature, I have no idea what you're talking about. Okay, there are many workshops, right? I don't know if you've been to the ML-for-science workshops at NeurIPS or ICML or ICLR, but as much as we try, they're still sometimes quite inaccessible for people with no domain knowledge at all, right?
So a slightly clearer translation into a language that machine learning practitioners can understand: this is the input of my model, this is the output, this is what I want to predict, and this is why it's important. I think these things could sometimes be improved, because, for instance, I could write what I think is a great paper about a generative model for materials, and then you'd look at me like: that's totally useless, what did you do, right? So I think we should have these types of discussions a bit more.

Absolutely; that was just a flavor of a few things, if you're interested.

I would also go one step beyond that. Getting collaborators interested in your problems is an old problem, right? We've all struggled with this, and it's not just getting ML people interested in our problems; it's getting physicists or mathematicians or statisticians or whoever else. A little bit of it is motivation: climate change is a good one, and I think the best one to hook people with, because there are obviously a lot of targets and needs, ESG targets and other things, that make companies very motivated to solve these problems. But it also needs to be an interesting problem, and a hard problem; something they agree is actually hard on their side, and cool and fun, and pushes the boundary somehow. My impression is that one of the reasons so many companies are interested in this space right now, of all the times this could have happened, is that large graph models are quite interesting and hard and moving very quickly. If we said that the only thing needed to solve climate change was another image recognition problem, there would be some people interested in that, but I don't think we would have the same sort of engagement. So it's often on us to do the translation and find problems that are hard and interesting, and sometimes people
will just look at you and say, that's something I could solve, but it's not actually a hard problem, and you just have to live with that; that's okay. You go and talk to other people, or you just download the codes or the reference methods or do the standard approach. Not every time is there going to be the spark that says the ML people or the math people or whoever have to work with you and help you on your problem, and that's okay.

Thank you. We're already coming to the end of this panel discussion; it's relatively short today. But I wanted to ask each of you one concluding question. We started off with the achievements of machine learning, and now I'd like to ask you what you think is the largest challenge we're currently still facing. Just one; I know there are many in the specific area you're working in. Let's start at the other end, so Lars, if you want to.

Yeah, maybe from the experimental side: if you do machine learning, you often need input and output values, and in that case it means you have to have some final conclusion about your results. Often, if you look at experimental data, this is subject to the personal understanding of the data, and maybe also to the background of the person, so there's not always one final conclusion. This is maybe also an issue with benchmarks on experimental datasets: if two people perform a phase classification, they could basically come up with different results, and this might be an issue and a challenge.

Okay, thank you. Rianne?

Yeah, so the one thing I'm worried about, I think a little bit related to what was just said, is that we're going to do machine learning on synthetic data only. For instance, we say: oh great, we managed to fit the DFT functional, but actually the DFT functional isn't really that good in the first place, right? Then I feel like we haven't really done anything useful. But I think it's still pretty hard
to integrate with experimental data, which you would probably need because of that.

A big challenge, I think, is training: training for everybody, and making sure that we can have careers for people with different backgrounds who provide different contributions to the value chain in science. If we are just thinking about academia, we are really missing clear career paths for mixed profiles. We are trying hard, but it will be very important that everybody understands the role of the different people on a scientific team, adding all this complexity, because our problems are highly complex, and there are many people who can understand and translate them in very different ways.

I think the biggest challenge is probably formulating problems that can then be solved with machine learning. We have the techniques in the machine learning world, but using them in an intelligent way for predicting properties is the question. Some problems are very data-driven: you start with a structure and try to predict some property. But oftentimes these problems can benefit from incorporating physical priors, exact constraints, or symmetries, which clearly makes a lot of models much more efficient and more interpretable. So the question is how to transform the problem you are trying to solve. For instance, in describing quantum states: density functional theory is not good enough, and we can make progress in making it better, so how do you learn a density functional, and what do you input into the learning algorithm? Or in force fields: how do you incorporate exact symmetries so that your force fields are not brute force? These kinds of problems, connecting what we know in physics and chemistry with machine learning, are where I think a lot of work has to be done.

I think if we want these methods to have an impact, the models also have to be robust: robust to different structures, robust to different inputs, robust to all the different things we ask of them. And I feel like in my own
group, we've developed tons of tools where it's easy to have a demonstration system where the results are good and you can publish, but then the edge cases crop up, and you have to hammer them down and figure them out, and that's really hard to do. I think that's going to become more of a problem. I'm coming at this also from a chemical engineering background, where there's been this community of people doing systems engineering, optimization, and control, where if you want a control action for a chemical plant, that plant is processing tons and tons of material that is flammable and destructive, and everything could potentially blow the plant up. We don't have quite those same problems with DFT; no one is going to die if our DFT simulation goes off course, and that's good. But robust optimization and robust control and all these other things are very established there, and I think we need to learn some of these things from other fields, because it's really hard to get something that works all the time, where someone actually trusts it and says this is as good as a student doing it by hand.

Thanks again for agreeing to be on the panel. This concludes the panel discussion now, and I hand over to my fellow organizers for the concluding remarks.