All right, so after those phenomenal talks, we just want to take a little bit of time to assess where we are and get feedback from you. We'll have plenty of opportunities for that, but this is half an hour of discussion time to really talk about where we are, and I'm just going to give you a couple of slides to orient you. As David mentioned this morning, we've accomplished a lot in this first year already. We've released and validated the first version of the SMIRNOFF format. We've spent a lot of time on infrastructure: you've just heard from Jeff about the toolkit, which is really the way you will interact with our force fields to apply parameters and export them to other codes. There are always ways we can improve that, so feedback on how we can make that easier is critically important. That's why I think the one-on-ones will also be very important: to understand how these tools fit into your workflows, and how we can make the process of using these force fields much easier for you.

We've been very excited about all of the work that Daniel Smith and his colleagues at MolSSI have done on QCArchive. How many of you have played around with QCArchive, out of curiosity? Anyone? Raise your hand. Okay, there are just a few of you right now. It's super cool: you can go to the webpage, qcarchive.molssi.org, and interactively through your browser play with it through one of the online Jupyter notebooks, download data, look at torsion profiles. It's really great, and I encourage you to do that. On the toolkit: how many of you have actually used the toolkit already? Some of you, okay. So for some of you, the one-on-ones today will probably be your first time actually using the toolkit for something like your application. And then, hidden behind the scenes, there's of course been a ton of work on things like the property estimator, an effort that Simon Boothroyd has really led.
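To make the parameter-assignment idea concrete, here is a toy sketch in plain Python. This is not the actual OpenFF toolkit API; real SMIRNOFF parameters use SMIRKS substructure patterns, and the element-set matching below is only a stand-in. The idea it illustrates is real, though: a SMIRNOFF-style force field stores parameters as an ordered list, and the last rule whose pattern matches wins, so more specific rules placed later override generic ones.

```python
# Toy SMIRNOFF-style bond-parameter assignment (illustrative only).
# A rule "matches" if its element set is contained in the bond's element
# set; real SMIRNOFF uses SMIRKS substructure queries instead.

# Hypothetical rules, ordered generic-to-specific: (pattern, k, r0)
# with k in kcal/mol/A^2 and r0 in Angstroms (made-up values).
BOND_RULES = [
    ({"C"}, 600.0, 1.50),        # generic: any bond involving carbon
    ({"C", "O"}, 680.0, 1.41),   # more specific: carbon-oxygen bonds
]

def assign_bond(elem_a, elem_b):
    """Return (k, r0) from the LAST matching rule, or None if no match."""
    assigned = None
    for pattern, k, r0 in BOND_RULES:
        if pattern <= {elem_a, elem_b}:
            assigned = (k, r0)   # later matches override earlier ones
    return assigned

print(assign_bond("C", "O"))  # (680.0, 1.41) -- the specific rule wins
print(assign_bond("C", "C"))  # (600.0, 1.5)  -- only the generic rule matches
```

The "last match wins" resolution is what lets a force field author add a more specific parameter without disturbing the generic fallback above it.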
These are things you will maybe use sometimes, if you want to build new parameters in-house using private data; for right now it's just powering our various workflows. And then there's been a huge amount of work, of course, done by Lee-Ping Wang's lab, and especially Yudong, on refitting torsions and refitting a lot of the other parameters. So these are the main tools. They're all online, and everything we've done has been openly developed. You can follow along in the issues, you can watch the repositories if you want to see all the traffic, and you can star the repositories if you like them. Everything we've done is open. Sometimes that's too much information for folks, but at least we've hopefully given you the opportunity to be as scientifically engaged as you'd like, both through the Slack and through the infrastructure development.

Just to orient you on where we are: our overall goal here is not to produce just one force field and then stop. It's to build and automate the tools that allow us to rapidly produce new generations of force fields that increase coverage and increase accuracy, so that we can continue to move toward more interesting parts of chemical space and really power the next generation of biomolecular modeling. We've started off with minimal coverage, but that's going to rapidly increase as we bring in more computing power and expand the chemical spaces we're dealing with. We've got things like patent sets from some of the pharma partners, and thank you very much to those of you who have provided those. For those of you who haven't, we'll be reaching out to you to try to get that information, because we'd really like to cover the molecular spaces that you're actually using. And then we're also excited about new chemistries in the future as we move on.
The physical property data, as you've heard, we've started very minimally, but that can be rapidly scaled up, especially as we get to including mixtures of solvents, things like that. And we're very excited about including thermodynamic data like that, because it's something people haven't done before, and it's going to really lead to larger improvements. Then, finally, there are a lot of other things we'll be talking about today: how to get rid of some of the older infrastructure and the limitations of charge modeling, and how to improve things. We've started with just a refit of this SMIRNOFF force field, but we're going to expand the parameter types in the coming iterations.

Just to give you an idea, this is our plan from a year ago. We had broken things out into three generations, and we've essentially now automated everything from generation one, which means it's going to be easy for us to completely refit these things over and over again, in a mostly automated way, as we go through the different generations. There's a lot left to come in terms of increasing accuracy; we're very much trying to prioritize by what's easiest first, and that's what we've tackled. But I think there are a lot of really exciting things yet to come. And like I said, we're in this world of being AMBER bio-compatible, so everything we're trying to do right now is compatible with the AMBER biomolecular force fields, and you can use your favorites. There's been a lot of discussion recently, and we've also been talking with people from Simmerling's lab about the new AMBER force field (is it ff14SB or ff19SB? It's ff19SB, the latest one, sorry), which we're also quite excited about. So we've been talking with them as well. Eventually we will have to consistently refit biopolymer force fields at the same time in order to achieve greater accuracy. So there's a lot of data that we could use in force field parameterization, obviously; this is just a subset of it.
There are also a lot of exciting things going on around possibly, in the future, releasing a small subset of the CSD as public data, which could be a useful tool, but that would be a whole research project on its own as well. We've started, at least, to automate a subset of the things that we think are going to be the most important and that inform a variety of different force field aspects.

So here's a list of discussion topics that came up from the earlier questions, but I really want to open it up. What other questions might you have, or other concerns or ideas? What are your thoughts so far on where we are in this effort and where we're going? If nobody else has anything to add, we can also go through this list. I'll just let people jump in as they want. Where's the microphone, please? Okay, great, so we'll need you to run around.

Well, let's just go through a few of these, though. One of the questions that came up is: how do we avoid an explosion of different force fields? One obvious answer from David is that the Open Force Field effort itself is aiming to achieve greater coverage and greater accuracy over a very large chemical space, and we're going to be putting out different generations that should consistently improve the state of the art. So you would want to use the latest Open Force Field version of the force field, for example, and that would be the best. At least if you want the... yes, can I just send it over to Julia? Though we do want to empower other people to build their own force fields, especially with whatever data they have on hand. Right, yes.

You're talking about it improving, but if you have a list of 20 or 50 properties, not all of them are going to improve. In fact, some of them are going to get worse, and so it becomes difficult to claim that you're definitely moving forward. And I've seen the DFT zoo, right? I've been right there. So I would just caution you about that, and ask you to think a bit more about it.
Understanding where we are is extraordinarily important. That's why the benchmark and assessment set has to represent a broad variety of use cases, representative of what people, especially in industry, are really hoping these force fields can do. And I will have a full breakout session on protein-ligand benchmarking, which I think is going to be a large focus for many of the folks here. Can you send the microphone over there?

It's really important that we figure out, especially with limited functional forms, how we want to balance things against each other: which property are we willing to compromise, to get slightly wrong, in order to get another property to improve? Part of this is finding surrogates that we can use in parameterization that speak to the more complicated things. Mike Gilson's idea of using host-guest thermodynamics as a surrogate in parameterization, and really trying to get that right as a representative of protein-ligand binding, for example, will be one way we can ensure that we're doing well on the systems we would like to do well on. But this has to be set by a community of people that really come together with our best understanding of what data has to go into it. So this is where we might need to form some sort of advisory board on what the right pieces of data to include in force field generation are, and how they represent what we really care about. This is supposed to be a community-driven process, where people provide input about what data is most valuable and what the right way to model that data is. We're hoping to at least be clear and open about that process, and also to document all of our choices, so that if we learn something is not working out, we can change it and fix the assumptions that are flawed. John has mostly clarified this in the time it took me to get the microphone.
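One way to picture the compromise being described: a fit minimizes a weighted objective over properties, and the weights are exactly where "which property are we willing to get slightly wrong" gets encoded. The following is an invented toy, with made-up numbers, units, and property names; it is not the actual OpenFF fitting machinery.

```python
# Toy weighted objective for a multi-property force-field fit.
# All values below are fabricated for illustration.

def objective(predicted, reference, weights):
    """Weighted sum of squared relative errors over named properties."""
    total = 0.0
    for name, ref in reference.items():
        rel_err = (predicted[name] - ref) / ref
        total += weights[name] * rel_err ** 2
    return total

# Hypothetical reference (experimental) and model-predicted values.
reference = {"density": 0.997, "dHvap": 10.5, "binding_dG": -5.2}
predicted = {"density": 1.010, "dHvap": 10.1, "binding_dG": -4.0}

# Upweighting binding free energies deliberately trades bulk-property
# accuracy for binding accuracy; the optimizer follows the weights.
print(objective(predicted, reference,
                {"density": 1.0, "dHvap": 1.0, "binding_dG": 5.0}))
```

A surrogate such as host-guest data would enter a fit like this as another weighted term, standing in for the protein-ligand properties that are too expensive to evaluate inside the optimization loop.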
But another way to think about it is: what is the best force field given a set of data you want to fit? There's never going to be a single best force field given the functional form. But we can say: given this set of data that we're trying to fit (we're going to restrict it in temperature range, restrict it in pressure range, eliminate metals), what is the best that we can do with a given set of experimental data? That, I think, we can do. And there's also going to be a lot of cool science in terms of what functional forms are needed, and what's sufficient, once we start having data sets specified for given use cases.

I would agree, coming from the semi-empirical space. With AM1, PM3, PM6, PM7 and all of those, you kind of know what they were parameterized for, and then you use those semi-empirical parameters (you could even call them force fields) for similar types of systems and similar types of properties.

We do realize this won't work for everybody, right? If somebody has a particularly different use case, they should be able to grab our data, ditch the data they don't need, augment it with data they would like, and then train for their case. That fit will serve their use better than ours would. So that's the other path.

So, another question: should we be fitting pairwise Lennard-Jones parameters, or using combining rules? This is something that... oh, I'm sorry, Christopher has another point.

While we're talking about the zoo, I think it's going to become a real question what we do with our own private zoos. When we have individual companies making their own subspaces of torsion fits within their own walls, because of their own proprietary fragments, we will enter a world where we have some publicly released force field, like the next iteration of the Open Force Field, and then: is that consistent with your own private zoo of torsions, which you made for the last version?
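Stepping back to the Lennard-Jones question raised a moment ago, here is a minimal sketch of the standard combining-rules alternative, Lorentz-Berthelot. The sigma/epsilon values in the example are illustrative placeholders, not taken from any released force field.

```python
import math

# Lorentz-Berthelot combining rules: derive cross-interaction
# Lennard-Jones parameters from the per-atom-type ones instead of
# fitting every pair independently.
#   sigma_ij = (sigma_i + sigma_j) / 2     (arithmetic mean)
#   eps_ij   = sqrt(eps_i * eps_j)         (geometric mean)

def lorentz_berthelot(sigma_i, eps_i, sigma_j, eps_j):
    sigma_ij = 0.5 * (sigma_i + sigma_j)
    eps_ij = math.sqrt(eps_i * eps_j)
    return sigma_ij, eps_ij

# Illustrative, made-up values in Angstroms / kcal/mol.
sigma_ij, eps_ij = lorentz_berthelot(3.50, 0.066, 3.12, 0.170)
print(round(sigma_ij, 2), round(eps_ij, 4))
```

Fitting pairwise parameters directly would replace those two lines with an independently optimized sigma_ij and eps_ij for every pair of atom types, which is more flexible but multiplies the number of parameters the data has to constrain.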
You made your own private zoo thinking that your torsion terms were very, very general, but actually you only parameterized them for a certain chemotype subspace. How are we going to get that research back into the mainstream? The fragments are proprietary, but does each company want to keep their force field parameters proprietary? I see that as two tectonic plates meeting and causing volcanoes.

The hope is that we can always share data rather than force fields, because you can always combine data in a way that builds a better force field. This is part of the reason we've been working with the pharma partners, to see if there are data sets that speak to parameter sets of interest and that maybe aren't proprietary. Partition coefficient data sets would be highly valuable, for example, because we don't have large curated sets of those, but they exist inside industry. Data sets like that would really help the overall effort achieve higher accuracy more broadly.

I realize we're running low on time, so I just want to make one more point; we can come back to some of these discussion questions later. I think the effort has good answers for some of these already, and some of them we'll hear more about later on. One thing I want to point out is one of the biggest questions: how do we ensure that we get the same force field transformed into all these different simulation packages? This is a big question we have to wrestle with. Some folks in the audience have spent many years tilting at this problem, where there's such heterogeneity in even how people conceive of what a functional form means, as they implement it, that it's often impossible, or very, very difficult, to get exactly the same numbers, as Julien Michel and David Mobley found out in just computing hydration free energies.
I don't have the data from the SAMPLing challenge here, but in SAMPL6, when we were trying to benchmark the rate at which different free energy methods converged in different packages for computing host-guest binding free energies, what we found is that none of them agree on what the free energy is, and that the differences are large and significant. So this is something we need to take very seriously. It's not clear that just throwing more effort at parameterization is going to fix that problem. Instead, we need a way to get the same force field specification, the exact same conceptual meaning, into each of the different simulation packages.

Fortunately, we have MolSSI sponsorship for a workshop on molecular dynamics software interoperability. This originated from Mathieu Chavon, with the idea that we want to be able to take the output of simulations and transform it between different packages and analysis suites. But if we can't even simulate the same system, that's also a big problem. So we're hoping this can be a major focus of that workshop: how do we get the same force field into each of these different simulation packages? And maybe there's an opportunity to have a single representation that everybody can pledge support to, which would make our lives considerably easier as well. This is something that's not funded by the Open Force Field Initiative itself, but it will become important if you really want to make sure that the force field works the same way in different simulation packages.

With that, I think we're up against the coffee break. Are there any last-minute burning questions? Or should we... I mean, part of why we're here is to meet each other and have good scientific discussions about how things are going to be built.

There's a lot of dependence, of course, on the experimental data.
And we know from the work of Peter Guthrie that this is a non-trivial task: gathering it all together, curating it, making sure it's reproducible. You're fitting a lot to experimental data; how do you guarantee the reliability or reproducibility of that? If you want to use ligand binding data, that depends a lot on the assay, and is it reproducible across labs? So if you're going to use that kind of data, how do you guarantee that it's valid?

That's a fantastic question, and one we could spend several entire days on. For now, we're starting with a very specific concept of what we want to use in terms of physical property data. We're specifically using the ThermoML Archive, which has been curated by NIST from several journals that report thermodynamic data. The advantage is that this data has been captured in a semi-automated way that includes information about the uncertainties of the experiments, the measurement techniques, and all of the associated metadata, including the original records.

It has no ligand binding data, though. There's no ligand binding data, yes; that comes once we get to proteins. And even this data has challenges, of course. We're working closely with NIST, referring to their internal models, to see if we can get consistent results for their expected uncertainties and error bars for all of those measurements; that's an ongoing process. For protein-ligand measurements, this is of course the Wild West. We've been working with Mike Gilson's group, and BindingDB has a subset of host-guest measurements, I'm sorry, of protein-ligand validation data sets. That's something that could be useful in the future. There are also data sets that have come in through D3R and have already been cleaned and examined; some of those have their own drawbacks in other ways. But it's something we really need a working group to focus on: how do we know it's reliable? How do we use IC50 data in this case or in that case?
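As a toy sketch of the kind of curation step those reported uncertainties enable (the records, field names, and threshold below are invented; this is not the actual ThermoML schema or any OpenFF API): with a per-measurement uncertainty attached to each data point, you can automatically drop points whose reported uncertainty is too large to meaningfully constrain a fit.

```python
# Hypothetical curated property records with reported uncertainties.
records = [
    {"property": "density", "value": 0.9970, "uncertainty": 0.0005},
    {"property": "density", "value": 1.0100, "uncertainty": 0.0900},  # too noisy
    {"property": "dHmix",   "value": -0.350, "uncertainty": 0.0100},
]

def filter_by_uncertainty(records, max_rel_unc=0.05):
    """Keep only records whose relative uncertainty is within threshold."""
    kept = []
    for rec in records:
        if abs(rec["uncertainty"] / rec["value"]) <= max_rel_unc:
            kept.append(rec)
    return kept

print(len(filter_by_uncertainty(records)))  # 2 (the noisy density point is dropped)
```

Assay-dependent binding data is exactly where this breaks down: IC50s from different labs often lack comparable uncertainty estimates, which is why that curation needs a dedicated working group rather than a simple threshold.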
Can we combine data from different laboratories? Fortunately, we have a postdoc, David, in the back, who is based at Janssen in Belgium, and he'll be spearheading the effort to curate and collect these protein-ligand data sets. There are a lot of data quality issues that we'll have to work through with that. So he's the Peter Guthrie of BindingDB data. Indeed.

Right, with that, let's have some more coffee, and we'll reconvene in half an hour, at 11. We just have a slide, sorry. And apologies to those of you in Europe; we can't send you home.