All right, it's my pleasure to kick off the second day of our Open Force Field meeting. I'm just going to be talking about some of the forward-looking aspects of the collaboration. We have a great day lined up for you today. We're going to hear from our guest speaker, Olexandr Isayev, about some of the really cool stuff going on with machine learning force fields, because we think there's a bright future for these, possibly melded with physical force fields. But I'm going to tell you about some of the technologies that we've been working on that will take us through year two and beyond. I first want you to think about how there are other fields we should be paying close attention to and trying to learn from when we can. Our field is not a very modern one in some sense. Computational chemistry started quite some time ago, and a lot of the tools we use are from the earliest days. Certainly, we're still using GAFF from 1999. That was when I started grad school. So I want to challenge us to think about what the future could look like for force field science. Look at machine learning, for example; apologies for the top bar, I don't think I can move that. What happened in the case of machine learning, where there's been a sudden, huge increase in interest? What this shows is the number of Google searches for machine learning, and it's increased enormously over the past two years. And what's driven this kind of increase? Well, part of it is simply due to the fact that, if you look at this, there's a huge number of tools that have come out, some of which are big, open tools that allow people to be very productive. These open software ecosystems have come out, there's lots of money behind TensorFlow, lots of people are using it, and it's powering a lot of research. What it does is provide useful levels of abstraction that enhance productivity. And now that these easy-to-use tools are deployable in many different settings, they find new uses everywhere. It's under the hood for many different things. People can put these TensorFlow models onto embedded devices or inside of cars, which is really cool. So what is it that's behind these abstractions that makes people productive? Here's an example. How many of you have used TensorFlow before? OK, so it's very familiar to a lot of you. This is the Keras API, or the 2.0 API. In just a few lines of Python, you can grab a curated data set that the field has standardized on as a standard way of benchmarking these models. You can define a new type of model by collecting a few different abstractions of layers, composing them in a novel way that maybe nobody has ever done before. You can explain to it how you want to train it and what your objective in training it is. And then you can fit it and use it, all in a few lines of Python. Imagine if we could do this for our own field. I mean, we could do something like that; this is totally possible. We're curating these data sets. We can make them programmatically available. We are coming up with abstractions for physical forces where we could put them together in a composable way, where you might want to try Drude oscillators, or you might want to add in a certain kind of bond charge correction, and you might want to combine them in a novel way.
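For those who haven't used it, a minimal sketch of what that Keras (TensorFlow 2.x) workflow looks like is below; this is just the standard MNIST example, not anything specific to the slides.

```python
# Minimal sketch of the Keras workflow described above (TensorFlow 2.x):
# grab a standard benchmark dataset, compose a model from layer abstractions,
# declare the training objective, and fit -- all in a few lines of Python.
import tensorflow as tf

# Grab a curated benchmark dataset the field has standardized on.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Compose a model from reusable layer abstractions.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Declare how to train it and what the objective is.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Fit and evaluate.
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
```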
And then you could explain to it how you want to train it, possibly using Lee-Ping's fancy approximate Hessian-based methods. You could explain to it how you want to penalize different deviations from experiment. You could fit it and evaluate it in a few lines of code (a purely hypothetical sketch of what this might look like follows at the end of this passage). I'm not saying that we're definitely going to do this, but would something like this be valuable to the community? I think this is something that we should be thinking about: what are the levels of abstraction that would allow us, in our daily jobs, to be very productive? So what we've been doing is focusing on automation so that we can easily scale our effort and go through rapid cycles of parameterization to ask different questions about what matters and what doesn't, what helps and what doesn't. We've concentrated on what we identified a while ago as the easy aspects: fitting valence terms using fixed valence types. But we're working our way through all of these different aspects to try to get to the point where we can ask really complicated questions, such as whether polarizability helps if you train on the same data set. Nobody's ever done that kind of experiment before, where you use the same data, you train things in a consistent way, and you ask: is polarizability worth the extra cost? How much accuracy does it deliver for problems that I care about? Our toolkits and our infrastructure will allow us to ask these kinds of questions. This is the rough idea, and this would be generation three, of how we're moving forward here. Part of our idea of going through rapid iteration cycles was that originally we were aiming to release a new force field every three months. That pacing may change; we're still discussing that right now. But the idea is that we could do some science in parallel with all of the automated infrastructure that's central to our effort, and once the science was ready, we could merge that in and benefit from it, because we've constructed it in the right kind of modular way. So we're working on things like off-center charges, off-atom charges, for example, that might make it in sometime soon after the toolkit infrastructure supports it. And then I'll talk a little bit more about the Bayesian fitting framework. I'm not going to talk about the physical property surrogate models, but there's a lot of work going on there too. I mentioned yesterday that there are a lot of experimental data and a lot of quantum chemical data types we can tap into. And so we've been designing our infrastructure in a way that's very modular. So if you want to explore a new kind of data set, it's very easy for an individual researcher to examine that data set, come up with a likelihood model for it, and then plug it into our infrastructure in a way that we can all benefit from, without having to change the underlying infrastructure that we've built. There are a lot of things we can do as we start to scale up. We fit to a very modest amount of data in this first force field iteration, but there's an enormous amount of data we can tap into. And by design, we've structured our toolkit, or not the toolkit, but the infrastructure, so that we can tap into or generate very large data sets. With the help of Daniel Smith's QCArchive from MolSSI, we're starting to look at generating very large data sets, things like small molecules from the PDB, which Alberto has generously contributed some fragments of.
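And for contrast with the Keras example, here is a purely hypothetical sketch of what a Keras-like, composable force-field fitting workflow might feel like. None of these class names (ForceField, HarmonicBonds, and so on) exist in the current toolkit; they are illustrative stand-ins for the abstractions being described.

```python
# Purely hypothetical sketch of a composable force-field-fitting API.
# The names and the fit() behavior are illustrative stand-ins only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class HarmonicBonds:
    params: dict = field(default_factory=lambda: {"k": 500.0, "length": 1.09})

@dataclass
class PeriodicTorsions:
    params: dict = field(default_factory=lambda: {"k": 1.0, "periodicity": 3})

@dataclass
class BondChargeCorrections:
    params: dict = field(default_factory=dict)

@dataclass
class ForceField:
    terms: List[object]   # compose whichever physical abstractions you want to test

    def fit(self, dataset, objective="weighted least squares"):
        # A real framework would dispatch to ForceBalance-style or
        # Hessian-based optimizers here; this only shows the workflow shape.
        print(f"Fitting {len(self.terms)} term types to {len(dataset)} records "
              f"with objective '{objective}'")

dataset = [("torsion drive", "CCO"), ("optimization", "c1ccccc1")]   # stand-in records
ff = ForceField(terms=[HarmonicBonds(), PeriodicTorsions(), BondChargeCorrections()])
ff.fit(dataset)
```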
We could go up to things like Enamine REAL, which has 11 billion molecules. We've been talking with people like Katarina about getting their patent data sets of different molecules. And if we pooled them from all of the different partners, there might be over a million molecules there, and even more if we fragment them. We're thinking about allowing partner-submitted data sets, possibly through a simple API. So, for example, you could just have a thing you submit: it runs on a SMILES file, it will fragment everything for you and submit that to the QCArchive, and we can balance that with the other critical calculations that we're running (there's a rough sketch of this below). Also, we heard about the ThermoML archive. We've used just a very small fraction of it, a tiny fraction of a percent. There's a huge amount of data we can exploit that has a lot of coverage of different types of chemical moieties. And as everything goes on, everything is getting more efficient. Psi4 keeps coming out with faster releases so that we can grind through more quantum chemistry. The QCFractal ecosystem has become faster, more responsive, and scalable to very large data sets. Things like OpenMM keep getting faster and exploiting faster GPUs. Our clusters keep getting upgraded. And the property estimator keeps getting more efficient as it becomes able to draw on reweighting. We're also looking into increased access to computational resources. I just wanted to mention that the fragmentation for small-molecule quantum chemistry, which is all being done by Chaya Stern, is becoming more ready for production. So we're going to start our first fragment-based quantum chemical data sets pretty soon. This is very much inspired by all the work the Pfizer scientists have done, and I hear that their paper is coming out quite soon. It uses the Wiberg bond order to make sure that we don't cut across bonds in a way that might disrupt the torsion of interest. And this was a close collaboration with Christopher Bayly in trying to figure out how to use these Wiberg bond orders to inform where we cut chemical environments. So this idea of automation and modularity is really helping us make progress and be able to go through rapid iterations. Apologies for the data points here. The basic idea is that we would like to make regular version improvements, and again, the pacing is something that we have to discuss. But every cycle we go through, we can expand the data set sizes, because everything is becoming more efficient and we're able to use more data. We've been thinking that as we curate a benchmark set, which requires human effort, that set can become a training set for the next iteration, so that we don't need to go through two stages of data curation as we do this. We were talking yesterday in the breakout group about exactly what kinds of data we'd like to see in these benchmark sets that then become the new training sets, so that's still up for debate. But I think this strategy will allow us to move quickly through these cycles once we get up to pace. We're also looking at ways in which we can do more in terms of Bayesian inference in the future, not just local minimizations. I'll tell you a little bit more about that too. And eventually, we'd like to be asking questions like: which functional forms and which choices of things like Lennard-Jones mixing rules are justified by the data? So, I mentioned modularity is very important too.
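To make the partner-submission idea above a little more concrete, here is a hedged sketch of the kind of thing such an endpoint might do with a SMILES file. The fragmentation shown uses RDKit's BRICS decomposition purely as a stand-in (the real fragmentation effort is Wiberg-bond-order based), and submit_to_qcarchive is a hypothetical placeholder rather than a real API.

```python
# Hedged sketch of a partner submission workflow: read a SMILES file,
# fragment each molecule, and queue the fragments for quantum chemistry.
from rdkit import Chem
from rdkit.Chem import BRICS

def submit_to_qcarchive(fragment_smiles):
    # Placeholder: a real implementation would create QCFractal dataset entries.
    print(f"queued for QM: {fragment_smiles}")

with open("partner_molecules.smi") as handle:          # one SMILES per line
    for line in handle:
        mol = Chem.MolFromSmiles(line.strip())
        if mol is None:
            continue                                    # skip unparseable entries
        for fragment in BRICS.BRICSDecompose(mol):      # crude stand-in fragmentation
            submit_to_qcarchive(fragment)
```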
One of the cool things about the QCFractal ecosystem is that we can easily define new distributed quantum chemical workflows that include several different steps and are then deployed widely across multiple different systems. And this has been super powerful, because we've not only run torsion drives, we've run a lot of 1D and tried a few 2D scans, and there are also different optimization data sets we've created where you enumerate different conformers and then store every point along the minimization from those optimizations, so that we can get more global information about conformer energies, for example. That's something we'll probably play with and expand so that we can get electrostatic potentials and do things beyond Hessian calculations as well. I mentioned the physical property data sets and the physical property ideas, where you can define a new physical property, like speed of sound, and maybe a new data set or data source, so that the infrastructure can automatically curate it and interpret what data is available. And this gives us access to very large data sets that we could use for parameterization. Right now a lot of our work is in selecting the most valuable data to actually curate, but we're thinking about expanding beyond this into other data sources. For example, there was the discussion about CCSD potentially releasing a small fraction of their data set as a public data set. We've also been talking about using the property estimator for benchmarking and assessment as well, because it can automate and distribute these calculations. So once we have open source tools for doing this, they can be distributed among many different GPUs as well as CPUs. One early example is that David Slochower has integrated pAPRika so that we can actually do host-guest thermodynamic calculations in a distributed manner. The toolkit itself can be expanded as well. We've built it in a modular way such that new parameter types, such as Drude oscillators or point polarizable dipoles or off-site charges, can easily be integrated without having to touch any of the central infrastructure; a rough sketch of that plugin pattern is below. This is really exciting because it allows us to explore a lot of different things, including possibly bringing in ML models, without having to touch the central infrastructure. And that will allow us to keep expanding and exploring new research ideas as we move forward. There are a bunch of different extensions to the infrastructure that we've been talking about making over the next year or two. One of the earliest ones is probably being able to use the Wiberg bond order to interpolate valence terms, and I'll say one slide's worth about why that's really important. We've been talking about off-atom virtual sites. We talked a lot about alternative Lennard-Jones mixing rules, and possibly using coupled torsions, which we haven't explored much yet. But there's also support for things like alternative nonbonded functional forms, where if we find that there's a slight modification to Lennard-Jones, for example, that would deliver increased accuracy, we can start to experiment with that and then work with package developers to eventually get that into their molecular simulation packages in the future. And then it would be really exciting if we could also bring in support for ML potentials as well. So, as I mentioned, the Wiberg bond order is something that could be very useful for assigning valence parameters.
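Here is an illustrative-only sketch of the plugin pattern just described, where new parameter handlers register themselves rather than requiring edits to core code. The registry and class names are hypothetical, not the toolkit's actual API.

```python
# Illustrative plugin-registry pattern: new parameter handlers register
# themselves, so the central infrastructure never needs to be modified.
PARAMETER_HANDLERS = {}

def register_handler(tagname):
    def decorator(cls):
        PARAMETER_HANDLERS[tagname] = cls      # core code is never edited
        return cls
    return decorator

@register_handler("VirtualSites")
class VirtualSiteHandler:
    """Adds off-atom charge sites to matched chemistries."""
    def create_force(self, topology, system):
        ...                                     # build the corresponding force here

@register_handler("DrudeOscillators")
class DrudeHandler:
    """Attaches Drude particles for polarizability experiments."""
    def create_force(self, topology, system):
        ...

print(sorted(PARAMETER_HANDLERS))               # ['DrudeOscillators', 'VirtualSites']
```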
Chaya has found, in work with Christopher Bayly, that as you look at the change in Wiberg bond order for a central torsion bond, the torsion barrier seems to scale with it. In fact, it's so striking that if you look at a few different related molecules, for example, and plot the Wiberg bond order, the torsional barrier height seems to change linearly. What's even more exciting is that this seems to be a universal phenomenon. If you compute this for many different related series of compounds, as you change the Wiberg bond order, the slope of this change in barrier height seems to be almost universal, which is extraordinarily exciting, because that could lead to a great reduction in the number of types we need to accurately treat torsion barriers in a general way (a toy numerical sketch of this interpolation idea appears after this passage). We're still exploring the ramifications of this, and there are some cool papers being written right now, but this is something that we very much want to play around with because it could bring a drastic reduction in complexity along with increased accuracy. Christopher is very happy about that, for those of you on Zoom. We also need access to more compute resources as we scale up. We've used a very modest set of compute resources so far. We have, for example, 1,000 cores at MSK grinding away on quantum chemistry, and we only have about 200 GPUs. UC Davis has been grinding away on 1,000 cores and can use up to 100 GPUs; they're mostly leading the refits to quantum chemical data. We're integrating the other academic sites as well, but it would be very cool if we could access the big iron too, in a way that made sense. So we're looking into that. We've been thinking that we could use AWS or Google Compute Engine, and there are maybe academic grants we can get for this too, so we're still investigating that. One thing we've also been looking into, since we have essentially unlimited quantum chemical data sets we could grind through, is whether we can get the QCFractal ecosystem running on Folding@home. We're basically awaiting some developer dollars to actually do that, so we're trying to seek that from different sources.

Ross asks: how feasible is it for companies to contribute to the computation of quantum chemical data? We've thought about that, and maybe Daniel Smith has as well; I'll hand the microphone to him. Yeah, so I think the biggest issue there is getting behind your firewall, being able to compute those, and then validating those results. I think it's a relatively low amount of work for us now that we have a stable ecosystem that we feel can be stood up behind your firewall. It would mostly be a matter of trying to figure out how to chunk up data sets for you to run and then basically getting a blob of data sent back to us. So I think it's relatively straightforward at this point. We'd be interested in that.

So I'm going to take a little diversion to talk about some of the future parameterization infrastructure and ideas that we're trying to take from Bayesian inference. We may not jump directly into full Bayesian inference, but we may take a small diversion into Bayesian-like methods. One of the things we're aiming for is to really automate everything we're trying to do, so that we no longer need chemical wizards like Christopher who can tell us exactly what the bond length of a particular nitrogen should be, but instead we can discover this from the data in a fully automated manner.
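As a toy illustration of the Wiberg-bond-order interpolation idea referenced above, the snippet below linearly interpolates a torsion barrier height between two reference bond orders. The numbers are made up for illustration, not fitted values from the study.

```python
# Toy sketch: if barrier height varies roughly linearly with the Wiberg bond
# order (WBO) of the central bond, a torsion type fit at two reference WBOs
# can be interpolated or extrapolated to a new molecule.

def interpolate_barrier(wbo, wbo_ref=(1.0, 1.2), k_ref=(1.5, 4.0)):
    """Linearly interpolate a torsion barrier height (kcal/mol) from the WBO."""
    slope = (k_ref[1] - k_ref[0]) / (wbo_ref[1] - wbo_ref[0])
    return k_ref[0] + slope * (wbo - wbo_ref[0])

for wbo in (0.95, 1.05, 1.15):
    print(f"WBO {wbo:.2f} -> barrier ~ {interpolate_barrier(wbo):.2f} kcal/mol")
```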
And so we're really aiming for something that can automatically select the appropriate functional forms and systematically improve our force fields, in a way that can also quantify the uncertainty in our force fields and our predictions, and tell us, if we have a limited budget for gathering new data, what new data is going to be the most valuable for moving forward. And there are a lot of challenges with the way current force fields are parameterized. One of them is that we may have many local minima in parameter space; that's something we haven't even started to deal with yet. There are a lot of other issues too that we've kind of glossed over. We have to come up with all of these data weights about how much you want to penalize deviations from experiment for one class of data versus another. And we've barely scratched the surface on overfitting: we're not even keeping a validation set out to monitor whether or not we're overfitting in this case. So there are a lot of concerns we have to address. Just to illustrate one of these things: I was talking with Julia and Bill yesterday about the banana-shaped parameter landscape and how there could be an entire nonlinear manifold that gives approximately equal fits. It might give you the same predictions for the training set, but it might have vastly different predictions for whatever application you actually want to use this for. So why just pick one set of parameters, and why not actually assess the uncertainty across that manifold? Here's an actual example with methane GB radii. If you look at the C versus the H radius for models that fit data like hydration free energies pretty well, the good fits lie on this weird nonlinear manifold, and there's no reason to prefer any particular point along that manifold, but some of them might generalize better than others. On the other hand, if you look at the deviations in any individual direction, you'll find that it's very sensitive, so you can't just change one parameter and hold the others fixed. So Josh in my group has done an experiment also looking at fitting GBSA models, because it's a very inexpensive surrogate: you can compute hydration free energies from just a few samples. These are actual free energies of transfer, though, and he's matching against the FreeSolv data set, which has about 700 compounds. If you start by sampling from the prior for all of these different types, these are radii and scale factors, and you just minimize using a standard kind of minimizer like you would find in ForceBalance, here's where you end up. The little dots show you where you get stuck; you don't move anywhere. You can see that these runs, and we're not even seeking a global minimum here, end up stuck in very different kinds of models with very different radii in that case. If you start somewhere near the basin where you think there's a global minimum, which we've found by exhaustive Langevin dynamics, the same thing happens. It's very rugged on every scale. You still get stuck in a number of very different places in model space that are not at all equivalent. In this case, they're pretty similar in error, which is great. So we're hoping we can seek those free energy basins, but there's still a very rugged landscape that we have to deal with, and that's something we're not dealing with right now.
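Here is a tiny toy demonstration of the local-minima pathology just described: a standard local minimizer started from different points on a rugged surface ends up stuck in different minima, much like the little dots in the slide. The objective is a Rastrigin-like toy function, not the actual GBSA objective.

```python
# Start a standard local minimizer from several random points on a rugged
# toy landscape and watch it get trapped in different local minima.
import numpy as np
from scipy.optimize import minimize

def rugged_objective(x):
    # Rastrigin-like surface: a shallow bowl with many local minima.
    return np.sum(x**2 + 10.0 * (1.0 - np.cos(2.0 * np.pi * x)))

rng = np.random.default_rng(0)
for start in rng.uniform(-3.0, 3.0, size=(5, 2)):
    result = minimize(rugged_objective, start, method="L-BFGS-B")
    print(f"start {np.round(start, 2)} -> minimum {np.round(result.x, 2)} "
          f"(objective {result.fun:.2f})")
```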
The other thing, of course, is that we'd love to be able to predict not just a number, but a number with an uncertainty that gives you some confidence that the binding free energy you're predicting is going to be meaningful or worth acting on, or, if there's some problematic functional group in the compound, we'd love to be able to tell you whether or not that's going to cause uncertainty in our predictions. So, thinking beyond year two, we'd really like to be able to deliver a tool that could give you an assessment of uncertainty. And the way we do this is through Bayesian inference, where we have a prior, which tells us something about physical parameters, and a likelihood function, which tells us how likely we are to see some experimental data or some quantum chemical data given the parameters. We can put that together into reasoning about the posterior, and we can just sample from that. And the cool thing is that the likelihood function is something you can easily multiply. That means it's very decomposable and very modular, so that when you plug in a new type of data, the log likelihoods just get added up; simple as that (there's a small sketch of this below). You can do all sorts of cool things, provided you have a way of computing whatever your experimental measurement is from your simulations, which we already put into the property estimator. And then you need an error model, which says: given that I've observed this density, and I know what the true density is, what's the probability that the deviation arose from experimental error? That's something the ThermoML archive provides through the uncertainty measurements that are reported automatically. The cool thing is that as you add more data in this Bayesian approach, if your model space is big enough, you always get better. You can quantify how much you reduce the uncertainty in your parameter space, or the uncertainty in other predictions, through information-theoretic means. And the really cool thing is that even if you don't know some parameters, like things related to the experimental error measurement process, you can just infer those away as long as you have some sort of model for them, and you can propagate that uncertainty all the way through. You can even reuse prior rounds of inference. So if you have a parameter set and you'd like to be able to say something about how the internal data at your institution helps condition it, you can very rapidly reweight it, or reselect a subset of parameters that includes the effect of your internal data. You're essentially selecting new parameters that are more consistent with your internal results. And the cool thing is that this is exactly the same machinery you know from statistical mechanics; there's an isomorphism between the two. If you want to understand that better, there's a great book you can read by Jun Liu from Harvard, Monte Carlo Strategies in Scientific Computing. All the slides will be online so you can find the link later. So we've been using different methods from molecular dynamics to explore parameter space in this prototype model, and I just wanted to show you one quick example, back on the GBSA landscape. There are a lot of things you could do that are very familiar. We use Gibbs sampling and replica exchange, for example, or expanded ensembles. Langevin integrators we use all the time for sampling and for free energy calculations.
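A minimal sketch of that modular log-posterior structure is below: each data type contributes its own log-likelihood term, using the reported experimental uncertainty as the error model, and adding a new data source just adds another term. The numbers and predictor functions are illustrative placeholders.

```python
# Each data type contributes an independent Gaussian log-likelihood;
# plugging in a new data source just adds another term to the log posterior.
import numpy as np

def log_prior(theta):
    # Weak Gaussian prior keeping parameters in a physically sensible range.
    return -0.5 * np.sum((theta / 10.0) ** 2)

def gaussian_loglik(predicted, measured, sigma_exp):
    # "Given what I predict and the reported experimental uncertainty,
    #  how probable is the observed deviation?"
    return -0.5 * ((predicted - measured) / sigma_exp) ** 2 - np.log(sigma_exp)

def log_posterior(theta, predict_density, predict_hvap):
    lp = log_prior(theta)
    lp += gaussian_loglik(predict_density(theta), measured=0.791, sigma_exp=0.002)
    lp += gaussian_loglik(predict_hvap(theta),    measured=8.9,   sigma_exp=0.1)
    # A new data type = one more log-likelihood term added here.
    return lp

theta = np.array([0.3, 1.2])
print(log_posterior(theta,
                    predict_density=lambda t: 0.79 + 0.01 * t[0],   # stand-in surrogate
                    predict_hvap=lambda t: 8.5 + 0.4 * t[1]))       # stand-in surrogate
```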
You can use the same techniques in sampling from parameter space and show that you can really explore parameter space quite broadly and very rapidly seek good solutions that are often much better than what you would get if you just minimized. So this is something that can easily be brought into the ForceBalance side of things as well as into future iterations of how we do parameterization. As I said, conditioning on data reduces uncertainty. If you look at this visually, for example, in this experiment that was done by Michael Shirts's group, if you're thinking about using densities and enthalpies, they support different Lennard-Jones parameters. This is the high-probability region of epsilon and sigma if you're using the densities of methane at different temperatures, and here's what you get using the enthalpies. You can see that they're very different from each other, but their combination gives a very well localized set of parameters. So we'd like to keep combining more types of physical measurements in a way that will really constrain parameter space. I will skip that because we've already talked about predictions. So there are a lot of decisions that this approach can help us make in the future. Like we were talking about Lennard-Jones mixing rules: we could try things, and certainly there have been papers about which ones fit quantum chemical calculations better, but which method is really best justified by the data? This is a case where the statistical nature of the measurements we put in will help guide our choices between different discrete options. We don't have to rely on something that isn't statistically sound anymore; we can use the statistics of the data set to really say how much we prefer one versus another in terms of gambling odds. The Bayes factors turn out to be how you would spread your bet among different decisions if you had to play over and over again against a house for money. The same thing goes for atom types. I just wanted to highlight one aspect of this, though you can obviously use it for a lot of different discrete choices: how we choose atom types is actually quite difficult. I'll skip over the math, but the basic concept is that if you have a mixture of Gaussians where all you see is the combined distribution, and this is often what we see, for example, for distance distributions for bond types, you can fit a number of Gaussians even if you don't know how many components there are. These Bayesian methods allow you to figure out that there are three components and what the parameters for those are (there's a small illustration of this idea below). This is just a graphical visualization of that process at work. It's just Monte Carlo sampling, the same thing you would do for a grand canonical simulation where you have to insert and delete particles, except instead of particles we're inserting and deleting parameter types here. So there's a very statistically rigorous way to figure out that there are three components and what their parameters are. Camila Zanette and Caitlin Bannan from David Mobley's group have tried this idea for coming up with new atom types and new parameter types, and now there's a great publication that's out; you can find it on the Open Force Field website. SMARTY and SMIRKY are the names of the algorithms right now.
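Before getting into how the typing moves work, here is a small illustration of the mixture-of-Gaussians point above: given a bond-length-like distribution generated from three hidden components, an off-the-shelf Bayesian mixture model recovers how many components the data supports. This uses scikit-learn's variational approximation rather than the reversible-jump Monte Carlo described in the talk, and the data are synthetic.

```python
# Let a Bayesian mixture model with more components than needed decide how
# many the data actually supports (extra components get negligible weight).
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
lengths = np.concatenate([
    rng.normal(1.09, 0.01, 500),   # e.g. C-H-like bond lengths
    rng.normal(1.39, 0.02, 500),   # aromatic C-C-like bond lengths
    rng.normal(1.53, 0.02, 500),   # single C-C-like bond lengths
]).reshape(-1, 1)

model = BayesianGaussianMixture(n_components=10, max_iter=2000, random_state=0)
model.fit(lengths)
kept = model.weights_ > 0.05
print("components supported by the data:", kept.sum())
print("their means:", np.round(model.means_[kept].ravel(), 3))
```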
But the basic idea is that if you have a parameter type, you can either delete a child parameter or you can create a new child parameter, where you try to change an atom or change a bond or add something to it. One simple way to do this is to take the parent type and add a bunch of SMARTS decorators to construct a new child type that's more specific than the parent (there's a toy sketch of this kind of move at the end of this passage). When you do this process, it looks like you can create a typing tree, a whole SMIRNOFF force field typing tree for a particular parameter type, that tries to elaborate on the chemistry that might be relevant. And then you can accept or reject these moves in a way that is consistent with the data. So they used this and worked out some really cool ways of basically trying to crack open Christopher Bayly's head and generate a computer algorithm that will at least come up with almost the same kind of type elaboration scheme, by matching hand-created types on different data sets. And the cool thing is that you can actually do this; it discovers chemistry on its own. It's a little bit inefficient right now, but that's something we can fix. So Josh in my group tried to apply this again to the GBSA example that I just showed you, where he's starting with the parent types and parameters and trying to come up with child types that have slightly different parameters than their parents. And if you run this process, what you see is that it actually discovers chemistry. Here it's sampling the Born radii; here's the increase in the log posterior; here's the amount of time spent at different numbers of types, and it doesn't increase without bound. It kind of saturates at some point and then comes back down. But the cool thing is that at the end of this process, you end up with an elaborated vision of the chemistry, where it does things like distinguishing nitriles, which is really cool, and it understands that sulfonyls have to get different types. So it's something that will allow us to do this type elaboration in an automated way. It's still a little ways off from being used for the entire typing tree, but I think we want to bring in these ideas sooner rather than later, because it's going to give us a way of automating the decisions about which parts of chemical space we should be exploring. I don't have much time because we started a little bit late, so I will try to wrap up very quickly, but I wanted to say that we're thinking, down the line past year two, about a second-generation Bayesian force field, where we've started with this idea of elaboration and rediscovery of atom types. We're sampling over GBSA types and typing rules right now. We're thinking about experiments where we can determine how many Lennard-Jones types we actually need to fit the data, because this is something where there's a lot of uncertainty: we've inherited these Lennard-Jones types from the very early days of our force fields, so we could probably do much better there. And then we discussed a lot yesterday about the mixing rules and what types of functional forms to use. And like I said, these ideas about using something like Langevin dynamics to avoid local minima are something that we can easily put into practice in the near term. And then I think the killer app for binding free energy calculations is quantifying the major source of uncertainty in our predictions, which is the force field. That's something that we've never been able to incorporate into these calculations before.
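Here is a highly simplified toy sketch of the kind of type-creation move described above: propose a child type by appending a SMARTS decorator to a parent pattern, then accept or reject with a Metropolis criterion. The decorator list and the scoring are placeholders, not the published SMARTY/SMIRKY algorithm.

```python
# Toy type-elaboration move: decorate a parent SMARTS to propose a more
# specific child type, then apply a Metropolis accept/reject step.
import math
import random

DECORATORS = ["X3", "X4", "+0", "+1", "H1", "a", "A"]   # SMARTS primitives to append

def propose_child(parent_smarts):
    # e.g. "[#6:1]" -> "[#6X4:1]" : a child more specific than its parent.
    decorator = random.choice(DECORATORS)
    return parent_smarts.replace(":1]", decorator + ":1]")

def accept(delta_log_posterior):
    # Standard Metropolis criterion on the change in log posterior.
    return random.random() < math.exp(min(delta_log_posterior, 0.0))

parent = "[#6:1]"
child = propose_child(parent)
delta = 0.7          # placeholder: would come from refitting with the new type
print(child, "accepted" if accept(delta) else "rejected")
```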
Coming back to that uncertainty question: we think we can put together something that will very rapidly reanalyze data you already have and give an estimate of what that force field error is going to be. The question is, what will this framework look like? Will it be something that goes into ForceBalance? Will it be something we build on top of the existing infrastructure we have? This is something we're still discussing. But one example, just to take advantage of the ecosystem I mentioned before: TensorFlow has added TensorFlow Probability, which used to be called Edward, where Dustin Tran has put in a lot of really cool things that let us use a lot of the machinery we already know about. Replica exchange, Markov chain Monte Carlo, Langevin dynamics: all of this is available within the TensorFlow Probability ecosystem (a minimal example is sketched below), and we could build on its ability to support distributed computation. There are significant advantages in training students on something like TensorFlow that many of them already know, because then they can do other things with it as well. And there are synergies with these machine learning potentials if we make it easier to compose physical force fields with models that come from machine learning. So we're talking about what framework is actually going to be best for supporting this kind of thing. We're going to hear next from Olexandr talking about ANI and friends, but now there's a proliferation of different machine learning potentials with really intriguing results about their accuracy in predicting things and how they can use experimental data to better learn different aspects of the force field. So we can think about maybe hybrid models that get rid of our terrible expansions for valence terms and torsions and include physical long-range interactions that are better justified, fitting to both QM and physical property data, which as far as I know has still not been done; maybe Olexandr will tell us otherwise. But we need to standardize how we get these into the molecular simulation tools, and that's the real challenge. So we're going to be talking about this at the MolSSI interoperability workshop as well. There's one other cool thing I want to show you, and then I'll stop and take some questions. The basic idea here is that we're trying to think about the infrastructure changes we'll need to get rid of some of the problems, or hiccups, we've run into over the past year. One of them is that we love AM1-BCC, but while OpenEye's implementation is fantastic, it's not available to everyone. So on the RDKit path we have to enumerate different conformers and use AmberTools' sqm, which is extremely fragile, for computing AM1, and it still takes about 15 seconds per molecule, which is a lot of time if you wanted to scale up to 11 billion molecules. There are also differences between the AM1-BCC charges from the OpenEye path and the RDKit path. And it's certainly not going to scale to biopolymers if we want a consistent charging scheme between small molecules and biopolymers. So there are ways we've been looking into getting better AM1 implementations from different sources, but that will take some time. We definitely want to use the Wiberg bond orders, as I mentioned before. And maybe there's a way to replace all of this with a simpler machine learning model.
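As a minimal example of the TensorFlow Probability machinery mentioned above, the sketch below samples a toy two-parameter "posterior" with Hamiltonian Monte Carlo. A real force-field posterior would replace the stand-in target_log_prob with simulation-backed likelihoods; this only shows that the MCMC building blocks are there.

```python
# Sample a toy posterior over two parameters with HMC from TensorFlow Probability.
import tensorflow as tf
import tensorflow_probability as tfp

def target_log_prob(theta):
    # Stand-in posterior: an anisotropic Gaussian over two parameters.
    return -0.5 * tf.reduce_sum((theta / tf.constant([0.1, 2.0])) ** 2, axis=-1)

kernel = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=target_log_prob, step_size=0.05, num_leapfrog_steps=5)

samples = tfp.mcmc.sample_chain(
    num_results=1000,
    current_state=tf.zeros(2),
    kernel=kernel,
    num_burnin_steps=500,
    trace_fn=None)

print("posterior mean:", tf.reduce_mean(samples, axis=0).numpy())
```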
On that note, Yuanqing Wang, who's here in the back somewhere, has been experimenting with different ways of using machine learning, in this case graph convolutional networks, which automatically encode the chemical equivalence of different atoms, in his package gimlet; I guess there's a link here. So it's certainly consistent with SMIRNOFF. The idea in these graph convolutional networks is that you pass messages from atom to atom according to functions that depend upon the atomic features, which might include the element and other aspects like electronegativity. You pass messages to the neighbors over a few rounds, so it's still a neural network, but you're structuring it based on the molecular graph. Everything is still consistent, but otherwise it's just like a neural network kind of model. The difference is that we use this physically inspired model, which is what Mike Gilson actually turned me on to: if you expand the energy for moving charges around, you can define what's like an electronegativity and a hardness, where electronegativity is how much an atom wants electrons and hardness is how much it resists taking on additional charge. And you can put this together into a simple minimization scheme where, given the hardness and the electronegativity predicted for every atom in your molecule, you can solve for the partial charges using the constraint that the total charge has to be unchanged. Rappé and Goddard used this early on in their charge equilibration scheme, and Mike Gilson described this really cool VCharge scheme, which is basically a graph convolutional network. So Yuanqing just said, let's use more modern graph convolutional networks and message passing schemes and see what a modern method will do on this (there's a small numerical sketch of the charge solve after this passage). We just started with Sereina Riniker's very high-quality QM charges; these are conformation dependent, but we're excited to do conformation-independent AM1-BCC soon. And the cool thing is that it works really well and it's also quite fast. The R-squareds, apologies for the lack of error bars here, are high and the RMSEs are quite small. We don't know what the impact is on free energies yet, but this would allow us to predict the charges that we should use for later bond charge corrections, perhaps, in a very fast way that's also scalable to entire biopolymers. So this seems very exciting. The cool thing is that it's 500 times faster and it's entirely portable, because anywhere you can run TensorFlow, you can run this model as well. In fact, you could even export it and put it into different libraries that don't even need TensorFlow. So, yes. Yes, there is a particular issue with the way that SciPy computes this particular statistic; Yuanqing will have to explain what this weird thing is.

So there are two ways to define R-squared. The first is that you can simply define it as the ratio of explained variance to total variance. The second way is just the square of r, the correlation coefficient from your linear regression. If you use the second definition, then you get a sort of free boost on top of it, so I chose the first one. But if that is too confusing to everybody, I guess I'll switch.

We've already talked about how the standard in the field is to use the fraction of variance explained, so I think that's something we should definitely do, and also get the 95% confidence intervals. But it's at least exciting.
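Here is a small numerical sketch of the charge-equilibration step just described: given per-atom electronegativities and hardnesses (which the graph network would predict), minimizing the charge energy under the total-charge constraint has a closed-form Lagrange-multiplier solution. The numbers below are illustrative, not outputs of the gimlet model.

```python
# Minimize E = sum_i (e_i*q_i + 0.5*s_i*q_i^2) subject to sum_i q_i = Q.
# Setting dE/dq_i = e_i + s_i*q_i = lambda for all i gives q_i = (lambda - e_i)/s_i,
# with lambda fixed by the total-charge constraint.
import numpy as np

def equilibrate_charges(e, s, total_charge=0.0):
    """Closed-form charge equilibration given electronegativities e and hardnesses s."""
    lam = (total_charge + np.sum(e / s)) / np.sum(1.0 / s)
    return (lam - e) / s

e = np.array([5.3, 7.9, 7.9])        # e.g. one carbon-like, two oxygen-like atoms
s = np.array([10.1, 13.9, 13.9])
q = equilibrate_charges(e, s, total_charge=0.0)
print(np.round(q, 3), "sum =", round(q.sum(), 6))
```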
So with that, I think we're coming up to the end; is it the break that's next? Okay, so I just want to end on one particular slide here, which is that we've been talking more about sustainability for this initiative too, because I think there's a lot of science left to be done beyond a two-year effort. In fact, as David Mobley mentioned, our initial idea was to run for between two and five years, depending upon how much funding we got, and we're well below the five-year funding level. But we're still on a bold five-year mission to deliver what we originally envisioned. There's so much science to do that it's very exciting to think about continuing this effort and enabling synergies with other researchers who want to use our infrastructure, especially researchers outside of our own community. We're trying to engage with force field developers outside of the Open Force Field Initiative itself. So we formed a scientific advisory board, a few members of which are attending, including Bill and Julia from IBM Research, to advise on how we can maximize the impact in other, more traditional force field communities. And I think that's going to become very important for us. But we also need funding to keep the effort going. Besides the consortium, we've had some small success with NSF funding for some aspects of this, including for Michael Shirts. A lot of us have kicked in a lot of institutional funding to support this effort; it's now in the hundreds of thousands of dollars at the very least. We've been very lucky to benefit from MolSSI software fellowships for our students and postdocs. There's also an NIH focused-technology R&D R01 we've submitted to allow us to expand into biopolymers and heterogeneous systems, which we think is really important so that we can consistently parameterize things in the future. There are also some other philanthropic efforts that can support infrastructure, like specific open source software, in one-year chunks, so we've been thinking about maybe using that for some of our key infrastructure resources. But we'd also like to talk more about other funding sources that might help the effort continue, because, as I said, there's a lot of science to be done and I'm very excited about all aspects of it. So with that, I'll take any questions and then we can have a coffee break.