Okay, so as mentioned, I want to give a brief update on what the QCArchive team has been up to over the past six months: specific updates with Open Force Field, plus some general infrastructure updates that may be interesting to this particular crowd. To begin, I thought we'd go back over the goals of the QCArchive project, to see where we're coming from and why we're moving in certain future directions. First of all, fundamentally, we are a platform for the generation and storage of massive amounts of training data, or quantum chemistry computations in general, for either machine learning or benchmarking. Effectively, it's always about organizing and storing quantum data at scale, making sure all the relations are there so we can always find adjacent quantum data, and then presenting that data in a way that's as simple and easy to use as possible. That's fundamentally what we're after. The third point we talk about a lot is that, ultimately, we want QCArchive to become a place where you can mine quantum chemical data to inform day-to-day molecular analysis and design. Obviously this is a much taller order than the first two points, which are fairly straightforward, but I believe we've made some progress in this area and I'd like to share that with you today. And finally, like all Open Force Field software infrastructure, we are an open-source ecosystem, both guided by the community and developed by the community: these days we've had over 35 external contributors across the entire software stack. So, to focus on the generation of machine learning data: we view the machine learning pipeline very simply, like so. First, you choose the molecules and training sets for which you want to calculate energies and properties.
You compute those energies and properties, combine them with your featurizations, and then finally do your analysis and potentially publish papers, models, and data. Looking at this pipeline, we asked: which parts can we actually help with, which parts can we standardize and harmonize to make the pipeline easier to run? Our answer was the following boxes. First, making the generation of quantum chemistry data across different methods, basis sets, and even programs as seamless as possible. In the top box here, we have the computation of ωB97X with the Psi4 program on the QM3 dataset, and also, for comparison, the ability to compute, for example, TorchANI or ANI-1x on the exact same dataset. In this way we can homogenize the ability to run arbitrary computations on these large datasets. In the bottom right-hand corner, we can then get these properties back, for example as a pandas DataFrame or a CSV file, which allows very easy labeling of the data. Of course, featurization tends to be extremely bespoke these days, so you can combine this with your own bespoke workflows for different kinds of featurizations. The two other areas where we believe we can help are publishing the computational model so it can hook into a larger ecosystem, and publishing the data in a standardized format in a place where it's easily available to the rest of the quantum chemistry and adjacent communities. In particular, our enhancements for this machine learning workflow have radically improved performance for very large molecule datasets, so handling million-molecule datasets is no longer a problem for the machine learning side of things.
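The labeling step described above, getting computed properties back as a tabular object for downstream featurization, can be sketched roughly as follows. This is a minimal illustration with invented molecule IDs and energy values, not real QCArchive output or its actual client API:

```python
import pandas as pd

# Hypothetical records: each row pairs a molecule ID with energies (hartree)
# computed by two different backends on the same dataset.
records = [
    {"molecule": "mol_0001", "wb97x_energy": -76.412, "ani1x_energy": -76.409},
    {"molecule": "mol_0002", "wb97x_energy": -40.501, "ani1x_energy": -40.498},
    {"molecule": "mol_0003", "wb97x_energy": -56.532, "ani1x_energy": -56.530},
]

df = pd.DataFrame(records).set_index("molecule")

# Per-molecule differences between the two methods become a single column op.
df["delta"] = df["wb97x_energy"] - df["ani1x_energy"]

# Export for downstream featurization / training pipelines.
df.to_csv("labels.csv")
```

Once the data lands in a DataFrame like this, joining it against a bespoke featurization is an ordinary pandas merge.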
We now have the ability to export these very large datasets in HDF5 format. We've also added a bunch of tutorials and examples using QCArchive to help you get up to speed quickly and generate this data easily. And we've looked deeply into storing wave function information: if you want to store, say, orbitals or densities or even Fock matrices, we can now do that as well and present them in a very similar way, as a pandas table of NumPy arrays. If you go to the link at the top, you can explore all of these examples and more, showing you how to get up to speed quickly with interacting with and generating quantum chemistry data. There are two other things we've been working on, which I'll talk more about later: one, we have a new quantum chemistry machine learning database where you can deposit all of your data, as mentioned, and we'll demo that; and two, we've made it easier to add new machine learning models to QCArchive, using something like TorchANI as an example of what they should look like. So that's the primary machine learning workflow. I'd also like to talk a little about workflows with Open Force Field. Here we have a different set of computations: with something like Open Force Field we are of course not interested in single energies or single gradients, but in more complicated quantities. We saw earlier, for example, the torsion drive computations and their results within QCArchive; so far Open Force Field has computed about 4,800 of them.
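The "pandas table of NumPy arrays" presentation of wave function data mentioned above can be pictured like this. The payloads here are random placeholder matrices, not real orbitals or Fock matrices, and the layout is an illustration rather than QCArchive's actual schema:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def fake_fock(n):
    # Placeholder for a stored Fock matrix; real Fock matrices are symmetric,
    # so we symmetrize a random matrix to mimic that property.
    a = rng.normal(size=(n, n))
    return (a + a.T) / 2

# One row per molecule; each cell holds a raw NumPy array.
wfn = pd.DataFrame(
    {
        "fock": [fake_fock(7), fake_fock(5)],
        "orbitals": [rng.normal(size=(7, 7)), rng.normal(size=(5, 5))],
    },
    index=["water", "ammonia"],
)

# Retrieval looks like any other pandas lookup; the cell is a NumPy array.
f = wfn.loc["water", "fock"]
```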
A new and upcoming capability we've been looking at, for example, is workflows to handle nitrogen inversion centers, which required a new service called a grid optimization. This effectively does a constrained scan over, for example, whether a given nitrogen center is inverted or not, and provides the resulting energy surfaces. So far we've run about 1,300 of them, and I expect we'll get an update on the results of those computations fairly soon as well. We've moved over to Docker-based images and conda-environment-based images for containerizing Open Force Field computational resources, ensuring that the distributed computing framework always gives back reliable results with the exact same versions, and that there are no issues with, say, dependency updates and the like. That in turn has given rise to a pretty high level of uptime and compute scalability. I believe Jeff will add some information about how Open Force Field's computational requirements have scaled up by five- to ten-fold over the past couple of weeks to meet the demands of the upcoming force field. All told, this is about 11 million gradient evaluations, with another 51,000 Hessian computations computed so far for Open Force Field, to give you an idea of the total scale of compute involved in this project. I believe this came to something like 2 million core-hours, so it's becoming a decent amount, and we've ensured there's enough headroom that we can scale up by another order of magnitude pretty easily. The other thing to point out here is that Open Force Field is a QCArchive sponsor. Everything we do is open source, but sponsorship really helps guide the efforts of QCArchive in terms of capabilities and the data we put online.
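At its core, the grid optimization service described above evaluates (in reality, runs a constrained geometry optimization at) each point of a fixed grid over a constrained coordinate, and returns the energy surface. Here's a toy sketch of that loop, with a made-up double-well function standing in for a nitrogen inversion surface; the 55-degree minima and the potential itself are invented for illustration:

```python
import math

def toy_energy(angle_deg):
    # Made-up double well: minima near the two pyramidal geometries
    # (about +/- 55 degrees from planar), with a barrier at planarity (0 deg).
    x = math.radians(angle_deg)
    return (x**2 - math.radians(55.0) ** 2) ** 2

def grid_optimization(grid):
    # One constrained evaluation per grid point; the real service would run a
    # full constrained geometry optimization at each point instead.
    return {angle: toy_energy(angle) for angle in grid}

surface = grid_optimization(range(-90, 91, 15))

# Local minima on the grid correspond to the two (inverted / non-inverted)
# nitrogen configurations.
minima = [a for a in surface
          if all(surface[a] <= surface.get(a + d, float("inf")) for d in (-15, 15))]
```

The resulting `surface` dict is exactly the kind of energy-versus-coordinate data the service hands back for analysis.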
One thing I do want to point out, which I think is neat and combines benchmarking of force fields with benchmarking of machine learning models, is that we've been extending QCEngine to include ML energy evaluations and force fields as well. For example, here's a geometry optimization input for the Psi4 program at a given level of theory, in this case ωB97X-D (we should have included a basis set here, but it gives you the idea), and of course you get the geometry back out after the optimization. If we want to switch to something like TorchANI, it's a single-line change: instead of calling Psi4, we compute with TorchANI using the ANI-1 method. And something we've been adding, with a lot of work by Open Force Field scientists, is the ability to run SMIRNOFF through this exact same framework. This will give us a perfect one-to-one comparison: we always run the exact same workflow for all the different backends, which ensures a very apples-to-apples comparison when we start benchmarking things like SMIRNOFF99Frosst and similar force fields that are coming out. So this is coming soon, and we're pretty excited about this ability to automatically generate benchmark data, not only for quantum methods but for different force fields as well. I also want to talk about some general improvements. One thing that was a little scary, but also really needed, is that we actually had our first drive failure on the old hardware, and this caused a complete database loss. Fortunately, all of the backups worked flawlessly, and we were able to restore within about 20 hours.
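The "single-line change" idea above can be sketched as a dispatch on the program name. The backends here are toy stand-ins, not the real Psi4, TorchANI, or SMIRNOFF harnesses, and the energies are fabricated; the point is only that the surrounding workflow never changes:

```python
# Toy backends standing in for real harnesses; each maps the same input
# specification to a result. Energies are invented placeholder values.
def _psi4_like(spec):
    return {"energy": -76.40, "program": "psi4", "model": spec["model"]}

def _torchani_like(spec):
    return {"energy": -76.38, "program": "torchani", "model": spec["model"]}

BACKENDS = {"psi4": _psi4_like, "torchani": _torchani_like}

def compute(spec, program):
    # Identical workflow for every backend: only the program string changes,
    # which is what makes the resulting comparison apples-to-apples.
    return BACKENDS[program](spec)

spec = {"molecule": "water", "driver": "energy", "model": {"method": "wB97X-D"}}
a = compute(spec, "psi4")
b = compute(spec, "torchani")
```

Adding a force field backend is then just one more entry in the dispatch table, which is the design property that makes benchmarking across quantum methods, ML models, and force fields uniform.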
When we did this restore, we migrated to a new server, which we'd been planning to do anyway, and the new server has about five years of scalability built into it. Most importantly, the primary database is now resilient to individual drive failure, so we can lose a couple of drives on the new server without anything going down and without having to do a full backup restore. On top of this, the incident highlighted that 20 hours is probably a little long, so we've instituted hourly, nightly, and monthly backup strategies to further limit data loss and improve recovery time. Before, we lost about six hours of compute and it took about 20 hours to come back up; in the future we should only lose about an hour of compute, and recovery should likewise be on the order of an hour. We bring this up because it highlighted how important this kind of thing is, making sure the backups and safeguards are in place, and our first real test of that guarantee worked without a hitch, so we're very happy about that. On the software side, we've been making basically everything faster: the ability to add tens or hundreds of thousands of molecules at the same time is now much quicker, by about a factor of 50, so all kinds of creation operations within QCArchive are much faster. In addition, we've been adding custom queries to the database, so that you can, for example, get all the final molecules in a torsion drive in a much shorter amount of time as well. Overall, a lot of performance enhancements. Another thing we've been focusing on is insight into what the current server is doing: we're logging lots of information about the computational resources and the current server state.
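A rough illustration of why batched creation is so much faster: submit all rows in one statement and one transaction instead of a round trip per row. This uses sqlite3 purely as a stand-in for the real database, and the factor-of-50 figure from the talk refers to QCArchive's own server, not to this toy:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE molecule (id INTEGER PRIMARY KEY, name TEXT, hash TEXT)")

# 100k synthetic molecule records, the scale the talk mentions.
rows = [(f"mol_{i:06d}", f"hash_{i:06d}") for i in range(100_000)]

# Batched path: one executemany inside a single transaction, rather than
# one INSERT-and-commit per molecule.
with conn:
    conn.executemany("INSERT INTO molecule (name, hash) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM molecule").fetchone()[0]
```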
For example, we can see that the Open Force Field compute resources currently have this many cores and this many failures over time, and we can look at this in great detail so that we can give feedback to these groups much faster. To help with the amount of data we're starting to log and store, there's going to be a new server-side dashboard, so you'll have a graphical way of navigating this kind of state and maintenance information. If you want to create, say, new users, everything can be GUI-based rather than CLI-based as it is right now. And I'll say this is a very small fraction of the software enhancements we've been working on over the past six months; I think we've had a combined 200 pull requests from a few dozen contributors during this period, so things are moving quite rapidly, which is great. Now I want to come back to a new thing we've been working on: how can we use this archived chemical data to inform day-to-day molecular analysis and design? Again, this is obviously much more complicated and more nebulous than the previous two goals, but I want to demonstrate two ways we've been approaching it. First is the ability to build different kinds of web applications. For example, this is the web application for browsing machine learning data: you can look at effectively every machine learning quantum chemistry dataset there is. We've curated them all and put them into a homogeneous format where you can get HDF5 or CSV-like files, and we've made them searchable, so as the number grows you can order them in different ways and figure out what the common chemical elements are across them.
We've made agreements with the generators of these datasets so that the datasets are published online at QCArchive concurrently with being exported to something like Zenodo or Figshare. So we're able to serve as the central resource, the central place for finding all these datasets, if you're interested in that. Another thing we're doing is interactive ways of plotting and looking at various statistics. In this particular case, we're pulling out a variety of curated datasets. In the quantum chemistry world, for example, it's fairly popular to take one of about 30 well-known sets of molecules or reactions from the literature and run different kinds of DFT or basis set comparisons on them. What we've done is gone through and recomputed all of these that we could find, to make sure they're on a standard footing, with a whole lot more metadata about what they actually describe. This web app lets you interactively choose a dataset and then build out the methods you want to display: bar or violin plots, grouped by different methods or basis sets, all explorable interactively. We think this is a really cool direction for something like QCArchive, because we can continuously drill down and deliver what has historically lived in papers, now online and accessible in a couple of clicks. Not really shown here, but you can also explore all the molecules in a dataset: instead of an esoteric name like S22, you can actually see what's in there, all the molecules that comprise it. And the thing I think will be really neat, once we get to it, is comparing time to solution.
For example, here we have different basis sets, and going from def2-SVP all the way up to aug-cc-pVTZ is going to be dramatically more costly, perhaps by a factor of 10. What we hope to do is give you a plot of accuracy versus performance, so you can make much more informed decisions beyond just which method is the most accurate, because accuracy is not the only metric people care about. I want to put this slide up one more time just to note that we continuously have more partners we're interacting with, we're seeing an increased number of downloads across the software stack, we're getting even more computation from groups like Open Force Field, and even more external contributors. So things are definitely growing at a pretty steady pace. Finally, I'd like to thank everyone involved with this project; it's such a huge group and collaboration of people who really make this project work. A big thank-you to the community, thank you to Open Force Field for working with us, and of course to MolSSI for sponsoring the project in the first place. Thank you for your time.
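The accuracy-versus-performance idea above amounts to picking the Pareto frontier from (error, relative cost) pairs: keep a method only if nothing else is both more accurate and cheaper. The numbers below are invented placeholders, not measured benchmark values:

```python
# Hypothetical (basis, mean error in kcal/mol, relative cost) triples.
candidates = [
    ("def2-SVP", 2.5, 1.0),
    ("def2-TZVP", 1.1, 3.0),
    ("aug-cc-pVTZ", 0.6, 10.0),
    ("bad-combo", 3.0, 5.0),  # dominated: less accurate AND costlier than def2-SVP
]

def pareto_frontier(entries):
    # Keep entries for which no other entry is at least as accurate and at
    # least as cheap (and different), i.e. the informative trade-off points.
    frontier = []
    for name, err, cost in entries:
        dominated = any(e2 <= err and c2 <= cost and (e2, c2) != (err, cost)
                        for _, e2, c2 in entries)
        if not dominated:
            frontier.append(name)
    return frontier

best = pareto_frontier(candidates)
```

Plotting error against cost and highlighting this frontier is exactly the kind of view that lets users pick a method for their compute budget instead of defaulting to the most accurate option.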