Okay. I think we can make this relatively short, since I'm the only one between you and lunch and a lot of these concepts and ideas have been discussed already. So, the next steps. We talked a bit about what was done for this first release (Parsley), and the next steps really are benchmarking the force fields that are produced and bringing in data for future parameterization: what sorts of data come next.

The general strategy we're going to be using throughout this effort is to parameterize on cheaper quantities, quantities that are cheaper to generate by simulation, and benchmark on more expensive quantities. Then gradually, over time, as our ability to calculate things improves, we shift data from benchmarking into parameterization, and bring more complex things, more data, into the benchmarking set. And this is a key point: whatever we do here, and we've been talking a lot about these other things, we need this experimental property data to make sure the force field actually works.

So, the initial physical properties for the first release. There weren't that many physical properties used. Simon did a great job of getting this together in a short amount of time, using densities and heats of vaporization near room temperature from ThermoML. We're only optimizing existing Lennard-Jones types; not all the SMIRKS types corresponding to Lennard-Jones get changed, only the ones covered by at least two heats of vaporization and two densities, so we're not overfitting. It's really just 30 compounds and 58 total data points, so there wasn't that much data involved at the beginning, because we were just making sure the machinery would work, and it was an immense amount of work to make sure that it did work.

On the uncertainties used in this: an important part of physical property prediction is using uncertainties properly. If it's a type of data that has a large uncertainty, you don't want to over-emphasize it. In Bayesian inference that comes in through your error model, but even in a plain optimization it's essentially equivalent to a regularization: you penalize missing a value more if its uncertainty is low, and if you don't know the value very well then the penalty is smaller. That can be mapped over directly. For now, we're taking average uncertainties for density and heat of vaporization overall, for reasons I'll come back to.

So what other sets of data could we use? Densities and heats of vaporization have been used for decades to improve force fields, but what do we want? More complex measurements that are still sufficiently easy to compute. Really, the hypothesis is that getting the composition dependence of properties right in binary and ternary liquids will go quite a way toward getting protein-ligand interactions right. Because really, an atom is an atom and a hydroxyl group is a hydroxyl group; as long as you take into account the chemical environment it's bonded to and get the thermodynamics right, it shouldn't matter whether it's in a small molecule or a protein. And these mixture calculations are significantly easier to converge in simulation and significantly easier to make precise in experiment.
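To make the uncertainty weighting concrete, here is a minimal sketch of the kind of objective being described, assuming a simple weighted least-squares form (the actual objective used in the fit may differ in its details):

% Hypothetical uncertainty-weighted objective (illustrative only).
% y_i^exp is measurement i, y_i^sim(theta) the simulated value for
% force field parameters theta, and sigma_i the uncertainty assigned
% to that point (currently an average per property class).
\chi^2(\theta) \;=\; \sum_{i=1}^{N} \frac{\left[\, y_i^{\mathrm{sim}}(\theta) - y_i^{\mathrm{exp}} \,\right]^2}{\sigma_i^{2}}

A point with a small sigma_i contributes a large penalty for the same deviation, which is exactly the "penalize it more if the uncertainty is low" behavior described above.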
So, really, thermodynamically, if you have the Gibbs free energy as a function of temperature, pressure, and the amounts of all the various components, that is sufficient to characterize all the thermodynamics. The point is that you can't come up with a thermodynamic observable that isn't determined by it: if you know this function, all the thermodynamic properties are set. So we want to be trying to characterize this function.

It turns out the change in Gibbs free energy with pressure is just the volume as a function of temperature, pressure, and composition, and we can get that from densities of mixtures, excess volumes, and excess densities; these are all just different ways of measuring the volume as a function of T, P, and composition. Usually we think of the derivative of the free energy with respect to temperature as being the entropy, but if you recast it and change variables, the temperature dependence can be written in terms of enthalpies as well, which we can get from enthalpies of mixing, enthalpies of vaporization, and heat capacities of mixing (that last one is actually a second-derivative property, but it still gives us information on the temperature dependence). And of course the change of free energy with respect to composition gives us chemical potentials, which are either tabulated directly or expressed as solvation free energies or activity coefficients; those are all just different ways of expressing the chemical potential. A few other things: the dielectric constant is related to the change in Gibbs free energy when an electric field is applied, and the change with respect to interfacial area gives the surface tension. So really, if we can understand the Gibbs free energy as a function of the things we vary, that provides a guide to what the experimental data is telling us.

ThermoML has a wide range of fluid mixture data collected from the literature. As you can see, these are numbers of data points in the thousands; you've seen this graph before. There's a lot of data we can look at here: 30,000 enthalpies of mixing over a large number of compounds, binary heat capacities, heat capacities as a function of temperature. So there's a lot of data to draw on.

That's one source. Other physical property data we're planning on using: host-guest binding affinities. The Gilson lab has extensive experience synthesizing, measuring, and simulating these systems, and the key is that you can functionalize these host-guest systems and then test with a large range of different hosts with different chemical functionality. So it's a probe of not just whether I can get a single host-guest system right, but whether I can vary chemistries and get the differences between chemistries right. And ligand binding affinities. Actually, I would mention that this fits into the scheme from before: host-guest binding is just the chemical potential of a ligand when you have one host plus whatever solvent, and ligand binding affinities really are chemical potentials of a system with one protein plus solvent. So we're in the process of setting up semi-automated protein-ligand binding free energies; "we" meaning mostly David Hahn and Vytas Gapsys, along with the experience their group and the rest of us have with free energy calculations, using pre-prepared systems set up via the Gapsys route.
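For reference, the derivative relationships just described can be written out explicitly; these are standard thermodynamic identities (Gibbs-Helmholtz for the enthalpy form), not anything specific to this data set:

% Derivatives of the Gibbs free energy G(T, P, {n_i}, A):
\left(\frac{\partial G}{\partial P}\right)_{T,\{n_i\}} = V, \qquad
\left(\frac{\partial (G/T)}{\partial (1/T)}\right)_{P,\{n_i\}} = H, \qquad
\left(\frac{\partial G}{\partial n_i}\right)_{T,P,n_{j\ne i}} = \mu_i, \qquad
\left(\frac{\partial G}{\partial A}\right)_{T,P,\{n_i\}} = \gamma

Here V is the volume (densities, excess volumes), H the enthalpy (heats of mixing and vaporization, with Cp as its temperature derivative), mu_i the chemical potential of component i (solvation free energies, activity coefficients), and gamma the surface tension; the dielectric constant similarly comes from the response of G to an applied electric field.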
This will not be ready as part of the benchmarking for the October 1 release, but it should be going soon after that. For context on timelines: David, how long ago did you start? Honestly, it's been less than a month, because when I was in Zurich a month ago he was still there. So, yeah.

Basically, the other data sources. ThermoML does not have the older, simpler data; they started collecting data in the 90s, and a lot of things like "what is the density of ethanol as a function of temperature" were measured so long ago that people don't bother to redo them. And one issue with ThermoML is that while all the data is public, some of the uncertainties, and some of the data analysis and curation, are not entirely public. We've been working with them for a while to nail down exactly what we can release, which is why, for the existing data set, we're using average uncertainties rather than the uncertainties curated per experiment. This has been complicated by the fact that the Thermodynamic Research Center directorship changed last month, but hopefully we can get it worked out relatively soon. Right, it's the curation. Well, it's the curation that they're doing on the public data, which they think is extremely valuable. So the data itself is public; it's the uncertainties. Exactly. Yeah. Once we get that straightened out, it will be much clearer exactly what we can use for the uncertainties for regularization. But the data itself, the values, those are fine; there are no issues there.

For simpler systems, I'm talking with Brigham Young University and AIChE, the national chemical engineering organization, about the DIPPR database. This is an interesting database: extensive evaluated correlations for pure fluids. There are no mixtures there, but we're discussing getting access to a subset of the validated data used as input. Some of it is already nearly free; there are just some license things we want to nail down. We could use it for about 60 fluids now, with more potentially available. Here's an example for benzene from the almost-free data set; it's for non-commercial use, and we have to figure out whether we can use it properly within Open Force Field, so we're working that out as we go. For benzene they've got all of this: you can pull up temperatures, pressures, and for every single value they show exactly where they got it, and they've curated it to some extent, noting what they think looks good and what should not be used. So in the worst-case scenario, we've at least got the list of literature there that we can mine, and we don't have to do the curation ourselves.

So, physical property benchmarking for Parsley. These are the current plans, though the exact choice of molecules is still to be decided in the next week, since people have been rushing to get the initial fits done. For pure fluids: about 40 diverse molecules, each at two different temperatures, since getting the temperature-dependent properties right is important; dielectric constants (where they're greater than 12), densities, and heats of vaporization. For binary mixtures, the idea is to take about 15 molecules and look at the pairwise interactions where the data exists, at two compositions and one state point, for excess volumes and heats of mixing, picking molecules that are well represented in the ThermoML database.
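Since the binary-mixture benchmarks are framed in terms of excess volumes and heats of mixing, here are the standard definitions for a mixture at mole fractions x_i, where the starred quantities are pure-component molar properties:

% Excess volume and enthalpy of mixing for a mixture at composition x.
V^{E}(x) = V_{\mathrm{mix}}(x) - \sum_i x_i\, V_i^{\ast}, \qquad
\Delta H_{\mathrm{mix}}(x) = H_{\mathrm{mix}}(x) - \sum_i x_i\, H_i^{\ast}

Both quantities vanish for an ideal mixture, so they directly probe how well the cross interactions between components are balanced against the pure-liquid interactions.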
If there's time, running the FreeSolv hydration free energies; I don't know where that actually is at the moment, sorry. Some host-guest calculations, but no protein-ligand binding calculations by the October 1 deadline. So that is what is planned for benchmarking.

Other future data: partition coefficients between solvents. That data is not in ThermoML, and while there's a lot of it out there, getting it properly curated will be a challenge. Relative solubilities, same thing: not in ThermoML, would be very useful, curation is the problem. Speed of sound ends up being a very sensitive measurement of a lot of properties. And then using X-ray data, crystal strain energies: there are problems, since it's not quite as easy to use crystal data because exactly what the observable is is more complicated, but things like strain energies could be tested, and it's something we should think about. NMR data. And new simple liquid data: do we need to collect some more measurements in order to fill in the chemical gaps?

And yeah, we're having a breakout session after lunch, so if you have particular data sets you want us to start thinking about and figuring out how to incorporate, we'll be talking about that then. Okay.