Hi, everyone. Today we'd like to talk about our upcoming neural network model for assigning self-consistent partial charges. Up until now, OpenFF force fields have largely covered the small-molecule space. Chapin just introduced our upcoming biopolymer force field, Rosemary, which will apply to substantially larger molecules, and he has largely focused on generating and refining valence parameters, which can be applied efficiently to molecules through pattern matching using SMIRKS patterns. The parameter shown on the screen, for example, applies to the backbone torsion outlined in teal.

There is one part of our force field that we can't do this with: our electrostatics terms. We use AM1-BCC as our charge model, which aims to reproduce the electrostatic potential surface around a molecule at the HF/6-31G* level. It does this in two steps: first, we calculate atomic charges with AM1, a semi-empirical method, and then we apply bond charge corrections from a library based on the bond types. This is cheaper than RESP, but it still scales pretty poorly with molecular size.

As Chapin introduced, the first release of Rosemary will use library charge templates that assign charges by residue. These templates get around the two main disadvantages of AM1-BCC: firstly, the poor scaling with molecular size, and secondly, the fact that the charges you get from AM1-BCC depend on the conformer you feed into the AM1 calculation. As Chapin has said, he averaged charges over multiple conformations to get a multi-conformer representation. "ELF10" here refers to the method of selecting those multiple conformers, and every time you see it in this presentation it just means multi-conformer charges.

This works well for canonical proteins, but the problem with charge templates is that you're necessarily limited to the scope of what has been templated, and OpenFF eventually wants to address more general problems in the protein and biopolymer world, like post-translational modifications, covalently bound ligands, and other non-standard residues. So one of the major projects we've been working on is developing a neural network model that assigns AM1-BCC charges to molecules of arbitrary size. This project was started by Simon Boothroyd, it builds on a lot of previous work by Ed Sheen-Wing and Josh Holton, and it follows the same general scheme as Espaloma charge. The general idea is that we use a graph convolutional neural network to generate continuous atom representations from input features. We use these to predict relative electronegativities and hardnesses for each atom, and from there we use the charge equilibration method proposed by Gilson to assign partial charges that sum to the total molecular charge.

While neural networks offer a lot of flexibility and we could train them to reproduce anything really, our initial goal is to fit a model that assigns AM1-BCC charges and could be used in Rosemary as a drop-in replacement for the tools we're already using. We actually have a couple of these tools to choose from: the OpenFF Toolkit has two implementations of AM1-BCC available to users, one using the OpenEye backend and one using the AmberTools backend. Both of these are generally accepted in the biomolecular simulation community as valid implementations of AM1-BCC. It turns out, though, that these implementations can give different results, and for both parts of the AM1-BCC calculation.
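(As an illustration of the two backends mentioned above, here is a minimal sketch using the OpenFF Toolkit's public API. The SMILES string is arbitrary, and it assumes a recent toolkit version with both a licensed OpenEye installation and AmberTools available.)

```python
# Minimal sketch: assign AM1-BCC charges with each OpenFF Toolkit backend and
# compare the results. Assumes openff-toolkit with OpenEye and AmberTools.
import numpy as np
from openff.toolkit import Molecule
from openff.toolkit.utils.toolkits import (
    AmberToolsToolkitWrapper,
    OpenEyeToolkitWrapper,
)

molecule = Molecule.from_smiles("c1ccccc1C(=O)N")  # arbitrary example molecule
molecule.generate_conformers(n_conformers=1)       # use one shared conformer

charges = {}
for name, wrapper in [
    ("openeye", OpenEyeToolkitWrapper()),
    ("ambertools", AmberToolsToolkitWrapper()),
]:
    molecule.assign_partial_charges(
        "am1bcc", use_conformers=molecule.conformers, toolkit_registry=wrapper
    )
    # In recent toolkit versions partial_charges is a unit-wrapped array; .m
    # gives the plain NumPy magnitudes in elementary charge.
    charges[name] = np.array(molecule.partial_charges.m)

difference = np.abs(charges["openeye"] - charges["ambertools"])
print(f"max per-atom difference: {difference.max():.3f} e")
```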
So OpenEye and AmberTools can give different AM1 charges, and the way they assign bond charge corrections is also slightly different. So even for a relatively simple molecule like the one on the screen, the charges you get out from the two backends can vary substantially.

Right, so back to the project at hand. Our goal is a drop-in replacement for the current toolkit backends, and now we have to decide what that means. Because we currently fit our parameters to OpenEye multi-conformer charges, we can rephrase our goal more specifically: we want a neural network that assigns AM1-BCC ELF10 charges using OpenEye as the reference. And the way we decide that we're close enough is that we want the neural network to achieve a similar deviation from OpenEye as AmberTools does.

We can use a few different benchmarks to measure whether we've met that goal. Firstly, we can compare charges directly to the reference and look at the charge RMSE. We can also use the single highest difference between the charges assigned. Secondly, we can look at more physical comparisons, such as the electrostatic potential generated by the charges around the molecule, and for multiple conformers. These checks are pretty fast and easy, so they let us iterate quickly through different models. And just to go back to the box plot here, because you will be seeing more of it throughout the presentation: this box plot shows how AmberTools compares to the reference OpenEye charges, and because these are all deviations, lower, or more towards the left, is better.

On to the neural network itself. There are several factors that go into actually creating a neural network, such as the actual layers and components of the network, the data that we use to train it, and the features that we use as inputs. I won't go into the first point too much, the actual structure of the model, but basically we did a grid search over various combinations of hyperparameters for different feature sets, and we wound up with this general default model architecture showing the best performance.

For our training data, we draw from a wide variety of sources, like the ones shown on the top left. We combine them all together, and then we filter molecules to within a certain size and a certain number of rotatable bonds. We then expand the dataset by enumerating up to two additional forms of each molecule. And then for each of these molecules, we generate conformers and assign OpenEye charges. After that, we try to generate a diverse but balanced training set that has at least four examples of each atom and bond environment, mostly following the procedure laid out in Bleiziffer and Riniker's paper. And then we partition it roughly 80/20 into training and validation sets. The test dataset is more straightforward: we just take our existing industry benchmark set and the SPICE machine learning dataset, and then we assign both OpenEye and AmberTools charges. So we wound up with a dataset that looks a lot like this, with a decent spread over the elements.

We also looked at a number of different input features, but in the interest of time I won't go into this too much. Instead, I'll focus on showing our latest model and answering one key question that we had, which is: what should we train our model to actually produce? Because we can get AM1-BCC charges in a few ways. We can fit directly to predict AM1-BCC charges.
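(For concreteness, here is a small NumPy sketch of the kind of per-molecule metrics behind the box plots, the charge RMSE and the single largest per-atom deviation; this is purely illustrative, not the actual OpenFF benchmarking code, and the example charges are made up.)

```python
# Illustrative per-molecule charge comparison metrics: RMSE and the single
# largest absolute per-atom difference against the reference charges.
import numpy as np

def charge_rmse(reference: np.ndarray, predicted: np.ndarray) -> float:
    """Root-mean-square deviation between two sets of per-atom charges."""
    return float(np.sqrt(np.mean((reference - predicted) ** 2)))

def charge_max_difference(reference: np.ndarray, predicted: np.ndarray) -> float:
    """Largest single per-atom deviation from the reference charges."""
    return float(np.max(np.abs(reference - predicted)))

# Made-up charges for a four-atom fragment, in elementary charge units:
openeye = np.array([-0.52, 0.18, 0.17, 0.17])
ambertools = np.array([-0.49, 0.16, 0.17, 0.16])
print(charge_rmse(openeye, ambertools), charge_max_difference(openeye, ambertools))
```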
We could also train the neural network to predict AM1 charges and then assign our own bond charge corrections on top later. And finally, we can look at adding additional BCCs on top of the charges already predicted, and see whether that gives us better results.

So back to the feature set. In general, we found that models have mostly similar performance past a certain minimum number of features, but I'll just focus on model nine here, which uses: the atomic element, the atom connectivity, whether the atom is in a ring, the total bond order of all the bonds for each atom, whether an atom is aromatic under the AM1-BCC aromaticity model, the period of the atom's element, and the group of the atom's element.

So how does it do? Firstly, let me explain what's going on in the box plots here. The top three rows are the neural networks that I introduced previously. The first one is a neural network trained to produce AM1-BCC charges. The second one applies additional bond charge corrections on top of the first one. And the third one is a neural network trained to produce AM1 charges, with bond charge corrections applied on top. Because we generated our own conformers for our reference charges, I also wanted to compare what happens if we just let OpenEye generate the conformers for the multi-conformer charges; that's the fourth row in the box plot. And at the very bottom we have AmberTools, which is kind of our goalpost: it shows the performance of AmberTools charges compared to the reference charges.

So when we compare charges directly to OpenEye, we see that every neural network we have performs better on average than AmberTools. But we also see outliers and extreme values, which could be concerning. When we compare between the neural network models themselves, the models trained directly to AM1-BCC, the top two, seem to do better than the one trained to produce AM1 charges with BCCs applied afterwards.

The story is a bit different if you compare the electrostatic potentials generated by the charges. Here, on average, our neural networks actually do worse than AmberTools, but they have fewer extreme values. And here we see that the best model is the one trained to AM1 charges with bond charge corrections applied afterwards. That being said, it's not quite clear how concerning these differences are: the difference between the highlighted orange model and AmberTools is only 0.14 kilocalories per mole.

So we can look at some other physical and automated benchmarks. Another metric is to compare to the QM electrostatic potential. Since AM1-BCC is fit to the ESP at HF/6-31G*, we can calculate the ESP at that same level with a QM calculation and then compare the QM ESP and our charge-generated ESP directly. This graph plots the root-mean-squared error to the QM ESP across multiple conformers and molecules; this was performed for a smaller subset of the test set, a selection of the industry benchmark set. Unsurprisingly, the leftmost grey line is RESP charges, and they do the best here. What's interesting is that while the neural networks seem to perform somewhat worse than AmberTools here, you do see a clear difference where the model trained to produce AM1 charges with BCCs applied later performs better than the ones trained directly to AM1-BCC charges.

We also compared hydration free energies with the different charges in TIP3P water, and we did so for a spread of molecules.
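(As a rough sketch of what the QM ESP comparison involves: the classical ESP of a set of point charges is evaluated on a grid of points around the molecule and compared against the HF/6-31G* reference. The code below assumes atomic units throughout and pre-computed grid points and reference values; it is not the actual benchmarking implementation.)

```python
# Sketch: ESP of point charges on a grid, and its RMSE against a QM reference.
# Assumes atomic units: charges in e, distances in bohr, potential in hartree/e.
import numpy as np

def point_charge_esp(grid: np.ndarray, coords: np.ndarray, charges: np.ndarray) -> np.ndarray:
    """ESP at each grid point: V(r) = sum_i q_i / |r - r_i|."""
    # Pairwise distances between grid points and atoms: shape (n_grid, n_atoms)
    distances = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=-1)
    return (charges[None, :] / distances).sum(axis=-1)

def esp_rmse(qm_esp: np.ndarray, grid: np.ndarray, coords: np.ndarray, charges: np.ndarray) -> float:
    """RMSE between the point-charge ESP and a reference QM ESP on the same grid."""
    return float(np.sqrt(np.mean((point_charge_esp(grid, coords, charges) - qm_esp) ** 2)))
```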
So not just ones that were outliers on the benchmarks, but molecules taken across a spread of the benchmarks. On the x-axis here we have molecules simulated with neural network charges; on the y-axis, molecules simulated with the reference OpenEye charges. In general they align well, but I will pull a few examples out and have a look at them specifically. The molecule with the highest difference in hydration free energy unsurprisingly scores poorly across all three metrics, especially the ESP, and this one here as well. That being said, just having a large difference in electrostatic potential doesn't always mean that the hydration free energy is affected that substantially. The same applies for some of the more concerning molecules from the charge RMSE and max-difference benchmarks, where for this molecule it basically didn't affect the hydration free energy at all. And the inverse also kind of applies: having very similar electrostatic potentials doesn't necessarily mean that you get very similar hydration free energies. So here it's a 0.23 kilocalorie per mole difference in ESP compared to a difference of over two kilocalories per mole in hydration free energy.

Since this network is meant to apply to the biopolymer domain, we also wanted to have a look at performance on proteins specifically, and especially those with modifications, since that is the primary use case for this network. So I generated a dataset of peptides between one and five residues long, with and without post-translational modifications, which were enumerated. The results are a bit disappointing: we see generally higher differences from the reference charges than on the test set. If you look at just the peptide bond, the charge RMSE metric is worse, but the maximum-difference and ESP metrics do improve. As to why this might happen: we only had fewer than 2,000 molecules with a peptide bond in the training set and fewer than 700 in the validation set, and we specifically excluded larger molecules from training and testing. So it might be that we just need more coverage of the protein domain to get better results. We do know that here again, training the neural network to AM1 gives better results than training to AM1-BCC. But overall, it looks like we have some work to do to improve.

So to sum up the story so far: on average, our neural networks perform better than AmberTools on direct charge comparisons, but they do have high outliers. That being said, in the free energy comparisons we saw that high deviations from the benchmark don't always correspond to large differences in hydration free energies. On the ESP benchmarks, our models perform worse on average, although only by up to 0.2 kilocalories per mole, and they also have much lower extreme values. And finally, unfortunately, we have relatively poor performance on proteins.

To get back to our earlier question of which model we should focus on: using a neural network trained to produce AM1 charges and applying BCCs on top afterwards seems to be the way to go, especially if you look at the physical benchmarks. But we have a couple of things ongoing to keep improving and assessing all three of our models. Firstly, we're retraining with more peptide data. We also want to generate more QM peptide datasets for ESP comparisons. So far, all the models shown here were fit to charges directly, but we do want to explore fitting to other properties, such as the electrostatic potential, and incorporating multiple objectives into training.
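(Purely as an illustration of what a multi-objective training target could look like, here is one way a loss could combine a direct charge term with an ESP term in PyTorch. The weighting, tensor shapes, and the precomputed inverse-distance matrix are assumptions for the sketch, not the actual training code.)

```python
# Illustrative multi-objective loss: a charge-fitting term plus an ESP term.
import torch

def combined_loss(
    predicted_charges: torch.Tensor,   # (n_atoms,)
    reference_charges: torch.Tensor,   # (n_atoms,)
    inverse_distances: torch.Tensor,   # (n_grid, n_atoms), 1 / |r_grid - r_atom|
    reference_esp: torch.Tensor,       # (n_grid,) reference ESP on the same grid
    esp_weight: float = 1.0,           # hypothetical relative weighting
) -> torch.Tensor:
    # Mean squared error against the reference charges.
    charge_term = torch.mean((predicted_charges - reference_charges) ** 2)
    # Classical point-charge ESP on the grid, compared to the reference ESP.
    predicted_esp = inverse_distances @ predicted_charges
    esp_term = torch.mean((predicted_esp - reference_esp) ** 2)
    return charge_term + esp_weight * esp_term
```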
This has been slow to get started because of some issues with memory, but it is now running. When it comes to assessing the performance of our models, the proof is really in the pudding, so we also want to set up some more simulation benchmarks, such as protein-ligand benchmarks. And finally, we're exploring the possibility of having warning systems for possibly bad charges. If we can warn users when the predicted output might be inaccurate, they can choose whether to fall back to a conventional way of assigning charges. One way we found we could do that is to compare the predictions of multiple neural network models. The average of several neural network models doesn't necessarily perform better than a single model, but the standard deviation between them can be indicative of how much the charge RMSE can vary, so something like this could function as a potential warning flag for users (a rough sketch of that idea is included at the end).

And finally, we would like to make the model more efficient. Right now very little effort has been made towards that, and it's actually substantially slower than using protein library charges, although overall it is much, much more efficient than using OpenEye or AmberTools.

So the plan is to release a force field with neural network charges as Rosemary 3.1. Given the results so far, we're confident that we will be able to do so soon after the initial release of Rosemary with library charges, and we're looking forward to the expanded workflows that it will enable. So with that, let me just thank everyone involved in OpenFF, especially Branching, Simon, Josh, and Chapin. And just a reminder that we do have a prototype model available in the OpenFF toolkit now, if you would like to play with how it works. Thank you very much for listening.
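(Referenced above: a rough sketch of the ensemble-disagreement warning flag. The per-atom standard deviation across several independently trained models is compared to a threshold; the 0.05 e cutoff and the `predict_charges` interface are hypothetical names for illustration only.)

```python
# Sketch of an ensemble-disagreement warning flag: run several independently
# trained models and flag a molecule when they disagree too much per atom.
import numpy as np

def flag_uncertain_charges(models, molecule, threshold: float = 0.05) -> bool:
    """Return True if ensemble disagreement suggests the charges may be unreliable."""
    # Stack per-atom predictions from each model: shape (n_models, n_atoms).
    predictions = np.stack([model.predict_charges(molecule) for model in models])
    per_atom_std = predictions.std(axis=0)
    return bool(per_atom_std.max() > threshold)
```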