Hi everybody, welcome to the BioExcel webinar series. Today's webinar is number 42, and it is about what is new in GROMACS 2020. The presenter is Paul Bauer from the KTH Royal Institute of Technology in Stockholm, Sweden. I'm your host, Alessandra Villa, and I have two co-hosts: Julien Sindt from Edinburgh, and Rossen Apostolov from the KTH Royal Institute of Technology in Stockholm.

First I want to introduce today's presenter, Paul. He previously worked on enzyme catalysis with Lynn Kamerlin, where he also started to get familiar with, and involved in, the management of software. He then moved to Stockholm as a researcher in the group of Erik Lindahl at Stockholm University and the KTH Royal Institute of Technology, and after a while he took over the position of development manager of the GROMACS software. Now he will tell us about what is new in GROMACS 2020. I will now switch the presentation over to Paul.

Okay, thank you Alessandra for the kind introduction, and please don't be concerned that it still shows Alessandra speaking: we are running all of this from her laptop so that you can actually understand what I'm saying during the presentation. I'm going to give you a hopefully brief but also informative overview of the recent changes in the GROMACS 2020 series: some of the new features we've added, changes we've made, functionality that got deprecated, and also a bit of an overview of what we're planning to work on in the future.

So, just a quick question first: what do we use GROMACS for? Of course we use it to do biomolecular simulation of things like the membrane structures or complex structures of fatty acids that are shown here. What we would normally expect it to be used for is simulating a protein interacting with its environment; shown here is an example of a membrane protein receptor embedded in a membrane bilayer, with ions and small molecules surrounding it.

One good thing we have with GROMACS is extensive documentation, and that hasn't changed for the 2020 release. Basically everything I'm telling you about now you can read about in our release notes for the 2020 release, and you can also browse our manual to see more details on the things that have changed. If you haven't watched them, I would also invite you to watch the previous webinars for the 2019 and 2018 releases, so you get an idea of how we have proceeded with changes over the last few years.

With GROMACS we have a quite strict release schedule, where we plan one major release every year, and then over the year there are point releases to keep the software up to date and get bug fixes to you, so we don't let you use versions where we know of issues that may cause problems. Currently there are two supported major branches of GROMACS. One of them is the 2020 branch, for the current release; here we do all our general bug fixes, and fixes get in here first. We also still support the 2019 release of GROMACS, where there is going to be one more supported point release that will happen in about a month, if we get the timing right; well, not a month, in a few weeks actually. There we only do critical fixes, where we know that the bugs we found would affect the validity of a simulation, or that they may cause major havoc for you when you want to set up a simulation. We may actually keep 2019 open for fixes a bit longer than usual, and I will tell you about this later.
In the fourth quarter of this year we will start to work on the 2021 version, with the usual cycle: we plan to do a few beta releases during October and November, a release candidate hopefully in December, and then the full 2021 release in 2021, as it should be.

Also, just a bit of showing off for the project: you can see from the statistics of citations that GROMACS is still relevant and still being used, and hopefully it also shows that the newer versions are finally getting more citations than the older versions; some of those are really old, going back to the first published release from 1996. We are also planning to work on a new GROMACS paper, so that we can inform you in this way as well about the new developments, and give you a more up-to-date reference to cite in the future.

In GROMACS we have several projects going on all the time. This is a non-exhaustive overview of them; there are actually many, many more small projects, but we can't show all of them, otherwise this slide would be way too crowded. The main ones are: BioExcel, the European-funded project that helps us work on software sustainability and general code maintenance; a Swedish software engineering institute that helps us work on parallelization and HPC capabilities; work with hardware vendors, once the code is done, to improve the scaling and the capability of the code on different hardware, especially accelerators; and the collaboration with the National Institutes of Health in the States, where we work on APIs and on modernization of the simulator part, and I will tell you more about this later.

Let's start with the main project, the European HPC Centre of Excellence BioExcel, which hopefully brought you to this webinar. Here, software development is actually just one part of the work being done on promoting HPC software. There is also work on making it more usable and on the development of workflows; we work on training and on providing user support; and there are also plans for software consultancy. GROMACS is one of the major partners, but the other partners should not be forgotten: HADDOCK, a general-purpose docking software, and the quantum mechanics engine CP2K, which we want to integrate with the main molecular mechanics code to give a modernized QM/MM interface in GROMACS.

So, now to the new GROMACS. This is basically a copy of our release notes with the major highlights. The main new thing in GROMACS is that you can now use cryo-EM density maps, or density maps in general, to guide simulations, meaning that you can use this to fit structures into cryo-EM densities, or explore how a structure would need to move to get from one density into another. We also improved the Python gmxapi. We are now at version 0.1, where we provide full functionality for all command-line commands, so you can now run everything directly through Python (a minimal sketch follows below); we also included some updates to the interface between Python and the core data structures in GROMACS. And, maybe only for some people, but I think something important: we now give fuller support for the CHARMM force fields, where we can now support virtual sites constructed along a line of two atoms, something that had been kind of working before but was not officially supported. Now we can actually fully support it and can also ensure that it works as intended.
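As a rough illustration of the gmxapi point above, here is a minimal sketch of driving a command-line tool and a run through the Python interface. This is only a sketch, assuming a gmxapi 0.1 installation; the file names are placeholders, and the exact call signatures should be checked against the gmxapi documentation.

```python
import gmxapi as gmx

# Wrap a regular command-line tool (here: grompp) as a Python operation.
grompp = gmx.commandline_operation(
    executable='gmx',
    arguments=['grompp'],
    input_files={'-f': 'md.mdp', '-c': 'conf.gro', '-p': 'topol.top'},
    output_files={'-o': 'run.tpr'})

# Feed the resulting tpr into an MD run; gmxapi tracks the dependency,
# so grompp runs before mdrun without explicit ordering by the user.
tpr = gmx.read_tpr(grompp.output.file['-o'])
md = gmx.mdrun(tpr)
md.run()
```

The same dependency mechanism is what the workflow support discussed later builds on: outputs of one operation become inputs of the next, and the API manages the resources.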
The updates to the integrators also allowed us to improve the simulation capabilities a bit, meaning that you can now use accurate pressure coupling with the Parrinello-Rahman scheme together with velocity Verlet, something that, again, was not possible before but has now been made possible.

We also made a major improvement to performance that I'm going to spend more time on later: the ability to run a close-to-complete simulation step on the GPU, using only the GPU device, which allows us to accelerate runs significantly. Another, more minor, thing is that we improved the PME offload capabilities. You may know from the previous webinars that it was already possible to do the PME calculations on the GPU with CUDA on NVIDIA hardware. We have now improved this so that you can also run it with general OpenCL code, on NVIDIA, on Intel and on AMD GPUs, so that we get closer to feature parity.

Now, the main feature I want to quickly talk about, as mentioned before, is density-guided simulations. This is the work of Christian Blau, one of the researchers at KTH. He implemented density-fitting code that allows you to do accurate fitting of a structure into, for example, cryo-EM densities, in relatively short time and in a physically correct sense: you get physically correct behavior that is just guided by the forces derived from the difference between your actual experimental map and the map obtained from the simulation. As an example of how this looks: you have a density map, which can be anything; you put your protein next to it, and when running the code you can make it fit inside without much trouble. You can also define which regions you are more interested in, use it for fitting only part of the molecule, use it for fitting small molecules, or use it for interpolating between different structures. (A small configuration sketch follows at the end of this part.)

It also has to be said that we finally removed something from GROMACS that had been deprecated for several years now, and that is the group-based cutoff scheme for particle-particle interactions. I am mentioning this here, and actually have a full slide for it, because it affects simulations, and probably affects a lot of simulations at this point. If you have any simulation that used group-based cutoffs, tpr files generated from such MDP input files will no longer work. The functionality has been removed and it is not coming back, as the Verlet-based cutoff scheme is easier for us to maintain and more accurate, and running a single code path and getting the same simulation behavior from all kinds of setups is, I think, preferable. The problem is that with this change a few simulation setups have been temporarily or permanently disabled. We hope to actually get some of them back with the 2021 release. What is now disabled is simulating under vacuum conditions, but it should be possible to add this again soon; this is planned for 2021. We also no longer support user-supplied tables for the short-range interactions; this is just because we need to update the support for this in the code base, but we hope it won't affect people too much. Switch functions that could only be used with the group-based cutoffs have also been removed, because they are not physically correct and we don't want to promote this kind of usage. The same applies to membrane embedding: g_membed is currently deactivated, but will be activated again as soon as possible, and will probably be reimplemented in terms of the test particle insertion code. The QM/MM support has currently been removed, because it also relied on the group scheme and needed to be reworked anyway, with a general look at the QM/MM interface. Another thing that had been deprecated for a while and has been removed together with the group-based cutoffs is the generalized reaction field; we are working on a more specialized implementation at the moment, and support will follow later.
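Coming back to the density-guided simulations described above, here is a minimal sketch of the MDP options involved. The density-guided-simulation-* option family is taken from the 2020 documentation, but the exact names, defaults and sensible values should be checked against the reference manual; the map file name and force constant below are placeholders.

```
; Density-guided simulation sketch (values are illustrative only)
density-guided-simulation-active                     = yes
density-guided-simulation-group                      = Protein
density-guided-simulation-similarity-measure         = inner-product
density-guided-simulation-reference-density-filename = map.mrc
density-guided-simulation-force-constant             = 1e9
```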
We also have some new requirements for the code. The main one is that from 2020 every compiler you want to use needs to support C++14. This is any reasonably modern compiler: Clang 3.6, GCC 5.1 and ICC 19, off the top of my head. So that shouldn't be a big issue for people. If you are unfortunate enough to work on a supercomputer system where they don't support this kind of compilers, I would ask you to complain to the administrators of your supercomputer, because we need this to actually make our code easier to maintain. Another minor change is that we need CMake newer than version 3.9.6. Again, that shouldn't affect many people, because CMake is far ahead of this version right now. This was needed so we could modernize some of our CMake code that had been difficult to maintain, and it should make life easier for everyone. (A short build sketch follows at the end of this part.)

Another thing that you may have seen, if you are used to messing with the source files or trying to build different GROMACS versions with modifications: GROMACS now checks whether the version that you downloaded and built from the source files is the same as the one that we used to generate the source tarball. You will get a warning during compilation, and the build will be marked as modified, if you have changed any of the source files. This is to help us make sure that if you have an issue, we know it is in our code, not in any modified code. It should also encourage the people who distribute modified versions of GROMACS, for example with PLUMED, to use the ability to mark their version as a changed version, so that you always know what you are actually using.

Some more announcements, just to warn people. If you are still using a 32-bit architecture, GROMACS will still run; it will compile, it will run. But we cannot ensure that it will continue running in the future, because we cannot support it and we don't have the hardware for it. We deprecated one part of the free energy code, the soft-core power 48 function, that was not used by anything; it is deprecated in 2020 and will be removed in 2021 to make maintenance easier. And we will also stop supporting Armv7, because we have maintenance issues with the compilers there, and it becomes difficult to test on the hardware that we have here, and so on.

Now, enough of BioExcel-related projects and deprecations; on to some other things that were happening during the 2020 release cycle. One of the major coding projects we had was a collaboration with NVIDIA to improve our GPU code performance and our GPU code paths, and it resulted in something like what is shown on this slide.
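Relating to the build requirements mentioned above, here is a minimal sketch of checking the toolchain and configuring a GPU build. Treat the flags as assumptions rather than a recipe: GMX_GPU=ON is the 2020-era CMake switch as I recall it, and the install guide should be consulted for your particular setup.

```bash
# Check that the toolchain meets the 2020 requirements
cmake --version   # needs to be newer than 3.9.6
g++ --version     # needs C++14 support, e.g. GCC 5.1 or later

# Configure and build with GPU support (flags are illustrative)
mkdir build && cd build
cmake .. -DGMX_GPU=ON -DCMAKE_CXX_COMPILER=g++
make -j 8
make check        # run the test suite before using the binaries
```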
Before, in GROMACS 2019, if you wanted to get good performance on strong GPUs with big CPUs, the way to run a simulation was to offload as many of the simulation tasks as possible to the GPU, meaning the non-bonded calculation, the PME calculation and the bonded calculation, and to only perform the integration and the constraining of coordinates on the CPU. From GROMACS 2020 this changes: you can do all of those calculations on the GPU, meaning that you no longer have to perform copies of positions and forces between the device and the host (the device being the GPU, the host the CPU), so you save time there, and using the GPU for the update and constraint calculation also improves the scalability of the code.

What this means, if you are someone interested in performance and profiling and you know how to use the NVIDIA profiler: this is what the GPU path looked like before, with the update on the CPU and the rest of the calculation on the GPU. You have a GPU path and a long CPU path that was wasting time and leading to under-utilization of your GPU devices. You have those H2D copies of positions and so on; those are the copying of positions, and later forces, from the CPU to the GPU and then back to the CPU for the needed modification of them. But this is all internal. Now, if you change this, you can perform all of those modifications on the GPU instead of on the CPU. So you get better utilization of the GPU kernels and reduce the CPU load by a bit, but not that much. But then you can do all of this on the GPU, meaning that you perform the integration with the leap-frog algorithm, and then the separate constraining of water, on the GPU, and all the other tasks on the GPU as well, meaning all of the time you had wasted before on the CPU, with the GPU idle or not fully utilized, is gone.

Of course, you don't always have code that can run only on the GPU. We have some cases where the CPU is needed, for example if you have special forces or special interaction types that are not supported in the GPU code, for example the CMAP correction for CHARMM. But you can use the time that the GPU is busy and the CPU would otherwise be idle to calculate those forces, copy them back, and then perform the rest of the calculation again on the GPU, again increasing the overlap between the two calculations and making sure the CPU is utilized as well as the GPU, which improves performance. This is especially important if you think about more complex setups, like running only part of the PME calculation on the GPU, or the whole PME calculation on the CPU. All of this is possible with different setups, but for now we wanted to make clear that you can use a single code path for the GPU instead of specialized code for different cases.

All of this would be meaningless if I didn't also show you some numbers for performance. These were gathered by us for some, I have to admit, very biased cases, because in all of those cases we have very strong GPUs with a lot of performance, relatively big CPUs, and we also run only one simulation on each GPU. Of course, if you have strong GPUs, you may want to run more than one simulation on one GPU. This is possible, but the performance story becomes a bit more nuanced, so I won't go into the details. In general, what you can see: if you run the complete code on the GPU (in this case, look for the dark green line; this is PME, non-bonded, buffer ops and update on the GPU), you get about 30 to 50% performance improvement over the previous code path, where you would only run the PME and non-bonded calculations on the GPU, with version 2019.3.
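For reference, the full-GPU-step setup described above maps to mdrun offload flags along these lines. This is a sketch: the -nb, -pme, -bonded and -update options match the 2020 release as I know it, but the input name and thread counts are placeholders to adapt to your own run.

```bash
# Run non-bonded, PME, bonded and the update/constraints all on the GPU
gmx mdrun -deffnm run \
    -nb gpu -pme gpu -bonded gpu -update gpu \
    -ntmpi 1 -ntomp 4   # one rank per GPU, a few CPU cores for the rest
```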
That, version 2019.3, was the current version when those performance measurements were taken. Note that this holds both for the case where you have no special forces left on the CPU, as with the AMBER force field, and for the case where you do have special forces, as with the CHARMM force field, sorry. It means that in general we can improve performance a lot, even for relatively large systems, and even more so for small systems where you want to do high-throughput simulations.

So, let's go on. Another project we have, as I mentioned, is the one with the National Institutes of Health. Here we are working on our APIs, the Python bindings and the C++ API, and also on simulation correctness and simulation setups. One thing, as I mentioned before, is that we are now at version 0.1 of gmxapi, with complete support for all GROMACS command-line utilities and all possible combinations of commands, and also with the ability to set up dependencies and workflows in the API: you can define the inputs, the simulation to run, and then the analysis that should be done on the finished simulation, and the API will do the management of the resources for you. The work here is still ongoing, because we of course need to improve the C++ API that is exposed to Python, and also improve the usability of gmxapi: make sure that when you run gmxapi you are not just running a wrapper around the command-line programs, but actually have all of the data structures exposed at the API level, as they should be, and also make it more agnostic to what kind of input you provide.

Another thing within the NIH collaboration is that we developed a new way to actually do the integration in GROMACS, with a modular simulator approach. This is just a simple showcase of how the modular simulator is set up and how it advances through its different steps. If you want to read more about this, I would highly invite you to read the documentation we have for it in the code, and the manual; there you will see different graphs showing how these simulations work. Basically, what the modular simulator does is let you decompose a simulation step into the individual tasks that you want to do in that step, so they are no longer bound to a single fixed description of what is supposed to be done during the simulation. You can define, for example, that you want to do multiple bonded force evaluations before you do a non-bonded one, easily making it possible to do multiple-time-step simulations. This is also aimed at providing, in the future, the ability to do something like Monte Carlo pressure coupling, because it will make it easier to save the different states and decide between different paths that should be taken during the simulation, without just ending up with a giant branch of if-statements in the code to decide which branch is taken in the end. It will also ensure that our code is as modular as it should be, as the name of the simulator suggests, because it means that we need to write all the individual tasks that make up these simulation steps as modules that can be called and executed independently of the rest of the code. That is one of the major challenges we have right now: disentangling the different parts of the code. But we are working on this to make the code more maintainable and more usable, and to make it possible for external contributors, or people who want to implement an integration algorithm that we haven't thought of, to do so within this modular integration scheme.
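To make the decomposition idea above concrete, here is a toy sketch, in plain Python and not GROMACS code, of a step built from independently schedulable task modules. The class names and the per-step schedule are invented for illustration; the point is only that the schedule can vary from step to step instead of being one hard-wired loop.

```python
class ComputeForces:
    def run(self, step):
        print(f"step {step}: compute forces")

class Integrate:
    def run(self, step):
        print(f"step {step}: propagate positions and velocities")

class ApplyConstraints:
    def run(self, step):
        print(f"step {step}: apply constraints")

def schedule(step):
    # Build a per-step task list. Expensive tasks could be scheduled
    # only every Nth step, which is how ideas like multiple time-stepping
    # or Monte Carlo moves become expressible without branching inside
    # one monolithic MD loop.
    return [ComputeForces(), Integrate(), ApplyConstraints()]

for step in range(3):
    for task in schedule(step):
        task.run(step)
```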
Instead of having to work their way through our current main MD loop and just hack changes in there, it will be easier for them to write their own module, and this module will then be run as part of the integration.

We have a lot of long-term plans, and observant people who compare this slide with the slides from the 2018 and 2019 webinars will see that a lot of them don't change. That is because we still have to work on them; they are not done within a few months, but take a long time to actually work out completely. One of the main things we have planned for the coming 2021 release is support for multiple time-stepping. It will hopefully allow us to improve performance again, by changing how often some forces are calculated, and also to move away from the current virtual site setup, which is not always physically correct and can lead to other problems. We are still working on the modernization of our programs, but we are getting closer to finalizing this: having all parts of the programs modernized and modularized will make easier, plugin-based combination of the modules possible. The encapsulation of the lower levels and the API work go hand in hand here, and we also work together in a different project called NB-LIB, which is working on providing a general API and library for non-bonded calculations that can be called by other programs as well, and that we want to incorporate as the non-bonded calculation engine in GROMACS itself.

Something that has already partially happened, with the work on the density-fitting code by Christian, is that we now have the ability to do extensible force calculations, with modules providing forces: instead of, again, having to add force calculations in the middle of the MD loop, we have the ability to add a new force calculation module, or anything that provides forces on atoms, directly as part of the code. This will hopefully also be exposed through the API in the future, making it possible to write your own force module without having to worry about hacking into the main code.

Our projects also involve the modernization of our test setup, which we are actively working on and which I myself work on, in the hope that we will soon be able to switch our current testing infrastructure and our distribution infrastructure to a more container-based setting. That would make it possible for people to just take a container that we use for testing and test their own setup on their own, and it would also make it easier for us to investigate bugs and test performance in a way that is reproducible and not tied to the particular systems that we have standing here in Stockholm.

Something that is also very much on our minds, for usability, is that we want to change the current legacy input and output formats to modern formats that are more in line with what people would expect, like the JSON or YAML files that are used all over the place in different setups; for inputs, and maybe also for outputs, so that we have a human-readable format that is easy to change and extensible, for example if you want to store additional information in a structured way, but that can also be used to efficiently and effectively set up a new simulation by just reading it in. (A purely hypothetical sketch of such an input follows below.) Yeah, this brings me close to the end, and I think I am early, but it means we have more time for questions.
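Since no such format exists in GROMACS yet, the following is a purely hypothetical sketch of what a structured, human-readable input of the kind described above could look like; every key name here is invented for illustration only.

```yaml
# Hypothetical structured simulation input; not a real GROMACS format
simulation:
  integrator: md
  nsteps: 500000
  timestep: 0.002        # ps
temperature-coupling:
  type: v-rescale
  reference: 300         # K
output:
  energies: every 1000 steps
  compressed-coordinates: every 5000 steps
```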
And yeah, I don't think I have got everyone on the slide, and this slide is old and probably needs updating at some point, but I want to thank all the current members and past members of the GROMACS team who helped develop the software and bring it to the point that it is at now, and my predecessor, whom I took over the job from, who helped to actually steer the project to its current state. And before I come to the point where you can hopefully ask me all the questions that you want to, I just want to mention that we are currently, as part of BioExcel, running a survey about what you want in GROMACS for doing QM/MM calculations: what the major pain points are right now with the GROMACS implementation, how we should change the code, and what we can do to make GROMACS more usable for you. And with that, I will hand over to the hosts so we can go to the Q&A session.

Thank you very much for that talk, Paul. So we have got at least one question from Maximilian; I'm sorry, I'm going to butcher your surname, but I believe it is Minger. I will unmute Maximilian, and if Maximilian wants to ask their question, that would be great. And if they are not there, I will ask the question for them. Maximilian? Yes. Would you like to ask your question to Paul? Okay, yeah. So you mentioned that you also have the Python API, and I was curious: do you also plan to support single-point calculations, so that you can simply say, I want energy and forces computed, using that API? This is something we would like to expose and are working on exposing. The thing is, you can already do it, of course, by just doing a zero-step calculation. But we want to make this easy, so you don't need to go through the pain of having to set up MDP files and so on for this. But this is something we want to expose. Okay, thanks.

The next question we have is from Matthias Machado, whose microphone is unfortunately not working at the moment, so I will ask the question on their behalf. The question is: is it possible to use cryo-EM guided molecular dynamics simulations to improve the packing of lipids in protein-membrane complexes? This is a difficult question. It is possible, if you have a good density for your lipids, to fit them in there. If you have the lipids as one group in your simulation, as an index group, it should be possible to use the cryo-EM code to fit them into the density you have for them. But for this I would recommend that you ask the question on the user mailing list, because there I think Christian will be able to answer you more precisely. Thank you very much.

The next question we have is from Arthur Zalewski. Arthur, I have unmuted your microphone if you would like to ask your question. Hi, Paul. I wonder, what is the reason for the exchange of the MiMiC API, which was recently introduced into the upstream code, in favor of the CP2K quantum API? What was the reason for the change? I have to say this is more on the management level. We are still working with the MiMiC people to support the current API, but we wanted to extend it, because we think that CP2K will support more kinds of calculations that users want to do. But this is more on the management level and less on the code level. Okay, thanks.

The next question we have is from Gabriel de la Jora. I will unmute their microphone. Gabriel, if you would like to ask your question. Thank you. Thanks for a nice presentation. We are currently stuck on the previous version, 2019.4, because we use PLUMED patching.
Essentially, I have seen in your new slides that there was a great improvement in performance because of the greater use of the GPU. What would be the best configuration, installation and scripting to use less CPU and more GPU? So, I also have the text of your question here. In general you can use, for example, in your setup, something like four cores per GPU, and specify which GPU device you want to use. But I have to give a warning, because I don't think that the new GPU code path will work with the PLUMED patch. I am pretty sure it won't work, because it changes some of the calculations, and I am not sure it would interact with the way we do the special force calculations such that it would get the correct coordinates from the GPU while doing the calculation. I can ask the GPU developers we have if there are any issues with that. But in general, for your node setup, you could, as I said, use maybe four cores per GPU and then run one simulation on each GPU, so that in the end each GPU is handling one simulation. Thank you. Cool.

Our next question is from Sahin Khan al-Baslan. I have unmuted your microphone, Sahin, if you would like to ask your question. Sahin, can you hear us? Sahin is possibly not there, so I will ask the question on their behalf. The question is: can we run simulations and computations of metallic compounds and crystalline systems? The problem with metallic compounds is that they are very badly described by general molecular dynamics force fields. It is possible to do calculations with them, I think, in LAMMPS, with special force field descriptions, but in general GROMACS is not good at running metallic compounds, because general molecular dynamics force fields are bad at describing them. For crystalline systems, it all depends on whether you have a proper description of your system in terms of a force field. If you have a force field description that can be expressed in terms of electrostatics and van der Waals interactions, and the force field is good enough to give physical behavior, then you are going to be able to run it in GROMACS. Cool, thank you very much for that answer.

Our next question is from Aniket Magarkar. Aniket, I have unmuted your microphone. Can you hear us? Yeah, I can hear you. Can you hear me? Yes, we can. Would you like to ask your question? Yes. Thank you so much for the beautiful presentation. My question is: will there be future support for PME on the GPU for free energy calculations where you have charge perturbations? Yes, those changes are currently being merged for the 2021 release. Okay, so by the end of this year or the beginning of next year, you will be able to do this with GROMACS. Okay, thanks a lot. Great.

At the moment that is all of the questions we have. If anyone would like to ask any more questions, please do so now, or indicate that you would like to do so. Probably the easiest way is that I will put everyone's hands down, and if you raise your hand, I will wait for you to ask your question. There doesn't seem to be anyone asking questions, so in that case: thank you again, Paul, for the very good talk, and thank you everyone for joining us. Thank you so much. We have further webinars at the end of March; we are in contact with Brinda Vallat to give a talk about the PDB. And we have other initiatives going at the moment, including the QM/MM survey that BioExcel is currently running.
You can find out more about all of the other activities that BioExcel does by going to bioexcel.eu.