All right, so we've heard a lot about QCFractal, but I want to give a brief overview of its architecture, how it works, and some of its use cases and goals. As mentioned, I'm a MolSSI software scientist, and QCFractal is a MolSSI project. Open Force Field is sponsoring a lot of features within it, like the torsion drives, some constrained optimizations, and the like.

The overall design goal is: how can we curate quantum chemistry data at scale? Very similar to what Simon was talking about, can I stop caring about the exact input and output files and parsing it all into some sort of CSV, and instead simply request computations at scale and have them come back in an organized fashion? That's really what we're after (there's a rough sketch of what that looks like below). Of course, once we do this, we want the ability to access the data from anywhere on the globe, which means we have central databases and clients that can pull data down, and we're also spinning up websites, so you can simply go online and grab the data chunks you want through canonical web resources.

A couple of other things to consider: can we use not just one supercomputer at a time, but dozens or hundreds of supercomputers simultaneously? That's another design goal, because we want to support large organizations like Open Force Field, where we have managers running quantum chemistry computations at different facilities across the US. And the scale we're talking about is storing not tens of millions but billions of quantum chemistry results. The data involved is actually quite small as long as you're not storing wave functions, so being able to correlate data at that scale is what we're after as well.

Finally, we're trying to remove the middleman. Instead of "I have a graduate student who queues up the computations, runs them on a supercomputer, formats everything, and then tries to give it back to me," can we skip all that? Can we just say, at a very high level, "go off and compute these things," and have everyone in a group or a community able to access that data simultaneously?

On top of this, we support quite a few really fun use cases, and Open Force Field is one of them. As mentioned, we want to do all these torsion drives, constrained geometry optimizations, partial charges, ESPs, wave functions: broad requests for an incredible diversity of data, which shapes the kinds of things we store and how we organize and correlate them. Most quantum chemists will say, "I want to search for the ammonia dimer" or something like that, but in the biomolecular space we really want to search based on SMILES or InChIs, so that requires us to revamp how we query and organize data as well. As of today, and I think these numbers have been shown, we've been generating quite a bit of data with this framework. Right now we're limited purely by core counts, and the nice thing is that anyone here can contribute to these computations by spinning up managers locally.
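To make the "request computations, get organized results back" idea concrete, here is a minimal sketch using the legacy QCPortal client (the Python client for QCFractal). The call signatures follow the legacy API as I understand it, and submitting to the public server requires an account with write access, so treat this as illustrative rather than definitive:

```python
# A minimal sketch, assuming the legacy qcportal package; signatures
# differ in newer releases, so check the docs for your version.
import qcportal as ptl

# Connect to the public MolSSI QCArchive server (read-only without
# credentials; submission requires write access).
client = ptl.FractalClient()

# Build a molecule once; the server deduplicates identical molecules.
mol = ptl.Molecule.from_data("""
O  0.000  0.000  0.000
H  0.757  0.586  0.000
H -0.757  0.586  0.000
""")

# Queue a B3LYP/def2-SVP energy through Psi4. The response holds the
# ids of the newly queued (or previously completed) records, which
# anyone in the group can query later from anywhere.
response = client.add_compute(
    program="psi4",
    method="b3lyp",
    basis="def2-svp",
    driver="energy",
    keywords=None,
    molecule=[mol],
)
print(response.ids)
```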
It would actually be pretty trivial, from a server-side and task-throughput point of view, to recompute everything in just a couple of days, given the submission throughput and enough parallelized cores. So if you ever want to contribute, get in touch and we'll show you how to spin up some managers and just pull from our central queue of tasks.

The other thing we look into a lot is how we integrate with Jupyter notebooks: how do I visualize and get at all the data in a pipeline-and-API sort of fashion? That's what we're really focusing on at the moment: can I pull all this data down? I'm not going to have time to show it because we're running a little behind, but you can actually get every single torsion drive Open Force Field has run in about six lines of code (there's a sketch of what that looks like below). You install one thing, you pull the data, and it's right there for you to explore as you wish.

The other thing we're working really hard on is web applications: can I simply go to a website and do some interesting things? One example for Open Force Field would be a web application where a molecule goes in, it runs through the Open Force Field toolkit, and you get back a parameterized molecule. We're doing a lot in the machine learning and data visualization space with web apps as well; we've actually engaged a group through MolSSI to build something very similar to this that helps us out. So not only through Jupyter notebooks, but hopefully through web applications as well, you can browse the data.

Also worth saying: you can engage us in a huge variety of ways. First of all, I'd really suggest checking out and viewing our data. As I mentioned, you can also help us compute. If you want to compute for Open Force Field, that's great; if you want to compute for your own use cases in an open fashion, we can support that as well. So look through these things and see how you can engage.

The other interesting thing is that we have to be careful to delineate between QCArchive, which is the main MolSSI instance that everyone can go to, and the infrastructure behind it. A lot of the use cases we've heard are not just "can I run this at MolSSI at scale," but "how do I use this exact same architecture on a local machine or behind a firewall?" All of the software is completely open source. Everything you can do at MolSSI, and every computation you can run there, you can also run on your own without our primary central server. What this means is that if I want to do these torsion scans, I can spin this up on my local supercomputer and run the torsions in exactly the same fashion Open Force Field does, and get all these parameters out of the box. So: a completely open-source software stack.

The other thing I think is really important to these data efforts is some sort of schema. Effectively, can I come up with a canonical input and output format for the whole field? As it turns out, quantum chemistry is very lucky compared to molecular mechanics: the kinds and diversity of inputs are much, much smaller in scope, so this is a much more tangible and approachable project.
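As a rough illustration of that "six lines of code" claim, here is what pulling an Open Force Field torsion drive collection looks like with the legacy QCPortal client. The dataset name below is hypothetical, and method names have shifted between QCPortal versions, so take this as a sketch:

```python
import qcportal as ptl

# Connect to the public QCArchive server.
client = ptl.FractalClient()

# See which torsion drive collections exist, then grab one.
# "OpenFF Example Set" is a placeholder; substitute a real name
# from the listing.
print(client.list_collections("TorsionDriveDataset"))
ds = client.get_collection("TorsionDriveDataset", "OpenFF Example Set")

# Pull the records for the default specification; each record is a
# full torsion drive, keyed by the entry's indexed molecule.
ds.query("default")
td = ds.df["default"].iloc[0]
print(td.get_final_energies())  # energy at each scanned dihedral angle
```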
And the other thing that's nice about this is that it'll automatically spit out all kinds of things, like what are my charges, what are my bond orders, et cetera, without me having to go through and parse quantum chemistry output by hand. Part of this ecosystem is abstracting that away.

Finally, we also have something called QCEngine. Getting a schema to actually work, and getting a lot of quantum codes to pick it up, is a very long process. To shortcut that, we wrote QCEngine, which wraps all these different quantum codes, or codes that act like quantum codes, so we can run them and get schema input and output from them up front. What this means is that if I want to run force fields with something like RDKit, or hopefully OpenMM pretty soon, I can do that in this fashion, and I can run my higher-level codes on top of it. My geometry optimizer takes this input and output, so I can just switch out different backends. Maybe it's a force field, maybe it's semi-empirical, and we have TorchANI up here, so if I want machine-learning force fields, or if I want quantum mechanics, it's all a matter of switching out a couple of lines and swapping the backend underneath. What this also means is that an even higher-level thing, say a torsion drive, which requires an optimization program, which in turn requires some sort of backend, can be mixed and matched however I want. If I want to rerun all these torsion drives with something like TorchANI, it's just a matter of switching out a couple of things. It makes for a very composable ecosystem; there's a sketch of this backend swapping below.

As Simon points out, this client-server compute architecture is becoming really popular, and it's what we've built as well. You have central servers, you push compute up to them, and then you can use third-party clients or engage our REST APIs directly, depending on how you want to access the data. With our API clients you get reproducible pipelines, such as a torsion drive: torsion drives actually contain every single piece of input information in them, along with every single piece of output information, so you can go back and reproduce any of them as you want. Or, more important than simple reproducibility, you get the ability to tweak these: if I want to go back to a torsion drive that was already run and change a couple of parameters, that's just a matter of changing a couple of dictionaries at a high level and rerunning the entire procedure.

The other thing we do is collect millions and millions of computations in some sort of sane fashion. We have these things called collections, which are flexible ways of organizing data. For example, one of them is called a torsion drive dataset, which, unsurprisingly, organizes thousands or millions of torsion drives in a SMILES-to-torsion-drive sort of fashion. I can always go back and ask, "hey, was this torsion drive ever run?" and automatically get the data right there. So, very flexible layers, and we can talk more.

For distributed compute, this is where we can take in lots of other programs and computations: we effectively have the central server with a central compute queue.
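To make the backend swapping concrete, here is a minimal QCEngine sketch. It assumes qcengine, qcelemental, and the named backends (Psi4, RDKit, TorchANI) are installed; the input model class was renamed across versions (ResultInput became AtomicInput), so adjust for your install:

```python
# A minimal sketch of QCEngine's schema-based backend swapping.
import qcelemental as qcel
import qcengine as qcng

# One molecule, defined once, in the common schema.
mol = qcel.models.Molecule.from_data("""
O  0.000  0.000  0.000
H  0.757  0.586  0.000
H -0.757  0.586  0.000
""")

def energy(method, basis, program):
    """Run the same schema input through a different compute backend."""
    inp = qcel.models.AtomicInput(
        molecule=mol,
        driver="energy",
        model={"method": method, "basis": basis},
    )
    return qcng.compute(inp, program)

qm = energy("b3lyp", "6-31g", "psi4")    # quantum mechanics
ff = energy("UFF", None, "rdkit")        # classical force field
ml = energy("ANI1x", None, "torchani")   # machine-learning potential

# Every result comes back in the same schema (energy, properties,
# provenance), with no program-specific output parsing.
print(qm.return_result, ff.return_result, ml.return_result)
```

One level up, the same swap works for procedures: in legacy QCEngine, qcng.compute_procedure can run a geomeTRIC optimization whose gradients come from any of these backends, which is exactly the mix-and-match that the torsion drives rely on.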
On top of that central compute queue, we spin up what we call managers on arbitrary physical resources. As long as they're connected to the internet, they can pull down and execute tasks on different supercomputers simultaneously. As Simon points out, there's Dask in the task-queue stack; there are also backends much better suited to XSEDE and leadership-class supercomputers, like RADICAL and Parsl, which allowed us, for example, to scale up on Theta. I think we ran 150,000 cores there simultaneously, while connected to about a dozen other supercomputers, and it all just kind of worked. So there's a lot of fun you can have with these, and the infrastructure itself is not the bottleneck at all; it's just a matter of getting enough cores to feed this kind of thing.

Again, as mentioned, it's really important to note that MolSSI hosts the main QCArchive server, but all the software is completely open. You can spin this kind of infrastructure up on your local computers, and it doesn't have to talk to us at all (there's a sketch of that below). Or, if in the future you want to give us data, everything's in the same format, so it's incredibly easy to push it out to serve as a data resource for the entire community.

I just want to point out that our mission takes a sort of grassroots approach, where we work with a ton of computational molecular science community codes. We have all these different software layers that feed into additional downstream codes; things like QCEngine are being picked up by lots of compute projects in the community. We use tons of cyberinfrastructure from the NSF. Private databases are of course supported, which I think is very important, and we also have our community database for all this open data that anyone can access. We use tons and tons of technology from the community, again, to visualize and plot this kind of data. And then finally, going all the way up, we're looking at gateways and portals for this data, and we work with journals and other large projects like ours, such as the Materials Project, to help build out these global ecosystems.

There are more people on this slide who have contributed, so let me simply say: thank you, basically, to everyone here. I think probably at least half the people here have touched the project in one way or another, so get in touch if you want to work on additional data.
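Since the "runs entirely on your own machines" point comes up a lot, here is a hedged sketch of standing the whole stack up locally with no connection to the central QCArchive. FractalSnowflake is the self-contained demo server from legacy QCFractal; import paths, constructor arguments, and method names have changed across versions, so check the documentation for your install:

```python
# A self-contained local server plus a small compute pool; nothing
# here talks to MolSSI's central QCArchive. API details follow the
# legacy QCFractal release and may differ in your version.
from qcfractal import FractalSnowflake
import qcportal as ptl

server = FractalSnowflake(max_workers=2)

# The client talks to the temporary local server exactly the way it
# would talk to the public QCArchive.
client = server.client()

mol = ptl.Molecule.from_data("""
O  0.000  0.000  0.000
H  0.757  0.586  0.000
H -0.757  0.586  0.000
""")

# Queue a force-field energy through the local pool, wait for the
# workers to finish, then pull the result back in the same schema the
# public server uses.
resp = client.add_compute("rdkit", "uff", None, "energy", None, [mol])
server.await_results()
result = client.query_results(id=resp.ids)[0]
print(result.return_result)

server.stop()
```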