Hello everyone and welcome to the BioExcel webinar series. My name is Rossen Apostolov and I will be today's host. As you know, GROMACS is one of the most widely used applications for molecular dynamics simulations, and for a long time there's been an interest in developing an API for controlling and extending it. Luckily, we have with us today Eric Irrgang from the University of Virginia, who will present his work on the development of this API. Before we start the main presentation, I have to tell you that this webinar is being recorded, and the recording will be posted on the BioExcel website and on our YouTube channel, where you can watch it again later or share it with your colleagues and friends.
I'd like to give you a very brief overview of BioExcel for those of you who have not attended our webinars before. BioExcel is the center of excellence for computational biomolecular research. It's a distributed consortium of 11 partners in Europe. We work in three main directions. One is the development of three main codes used for molecular modeling and simulations. These are GROMACS, which you know very well; HADDOCK, for docking and integrative modeling, which some of you might have used for drug design, for example; and CPMD, where we work on hybrid QM/MM simulations. BioExcel also develops different workflows and packages that improve the productivity and efficiency of researchers. We also do a lot of training, and we provide consultancy services to promote best practices for the usage of these applications and the developed workflows. We're running a number of interest groups on different topics in the wide area of biomolecular research. Some of these might be of particular interest to you, so we welcome you to join any of them. You can visit our website. We have support forums. We share most of our code. We have a chat channel, so feel free to contact us. At the end of today's presentation, we will have a Q&A session.
So during the presentation, you can use the control panel of GoToWebinar to type in your questions while the presentation is going. At the end of the talk, I will give you the microphone. I will let you speak to Eric directly and ask your questions. If you don't have a microphone, I will read the question on your behalf. And of course, after the webinar, you can always join us for a discussion at ask.bioexcel.eu. With that, I'd like to present Eric Irrgang from the University of Virginia, our speaker today. Eric completed his undergraduate studies at the University of Texas and did his PhD in materials science and engineering at the University of Michigan. After that, he joined the Kasson lab as a postdoctoral fellow, and he's been working on the development of interfaces for extensible molecular dynamics simulations. He applies, and is a proponent of, best practices for software engineering. His current project is also supported by a fellowship from the Molecular Sciences Software Institute in the United States. So with that, I would like to welcome Eric and give him the opportunity to start the presentation. Hi Eric.
Hi, thanks. Yes, I'm a postdoc under Peter Kasson at the University of Virginia, and I want to talk to you folks today about a project that was begun under an NIH grant, in an international collaboration between the University of Colorado, the University of Virginia, and KTH Stockholm, and that continued under a Molecular Sciences Software Institute fellowship. I recognize the names of a lot of the attendees here. I think we've got the right audience. But I think people have different ideas of what they are looking for in an API for GROMACS, and sometimes the difference is that people are thinking at rather different levels of access to GROMACS. My current project is addressing interaction with GROMACS at a few different levels that we're trying to keep discrete, and I'll explain that more in a moment.
The bio: I guess I gave a little bit of my background before the Kasson lab, but the group I'm in right now is focused on biomolecular science questions. And as we all know, the most interesting and most useful topics are really hard. We are, like many people, working to combine experimental processes and data with advanced computational techniques to try to tackle problems in infectious disease. The Kasson lab is particularly focused on the biophysics of infection processes, and we apply large-scale simulation, data processing, and machine learning to develop new methods for these tricky questions, as a lot of folks here probably do. And like many labs, over the years we've developed a lot of our own in-house software and tools while also trying to cobble together community tools with varying levels of compatibility. We try both to benefit from the community tools and to contribute back to the community, but we do end up, of course, with a lot of modified source code that's hard to maintain in different forked versions of GROMACS, for instance. So what we tried to do was put some effort into abstraction layers, so that as our graduate students develop their methods or their workflows, they can do so without producing a lot of stuff that's hard to maintain or hard to contribute. They can concentrate on the work that's most important to their research while we look for the right abstractions to interact with other tools independently, without sacrificing performance.
So what that looks like is this: we found some use cases that together addressed research needs in our group while allowing us to explore the beginnings of a toolkit. The goal is an easy-to-use, scriptable Python interface to complex GROMACS-based workflows, using Python as a front end to machinery that interacts mostly at the C++ level and integrates with the native GROMACS code to avoid sacrificing performance. I feel like I hardly need to motivate the topic to this audience, but bear with me a moment. Just looking back at the past BioExcel webinars, I see other cases where people have had to modify GROMACS source code and then have a hard time keeping it maintained or integrated, and other projects trying to tackle the workflow issues. Sometimes these are integrated tasks and sometimes they're not. This talk from earlier this year particularly caught my attention. I think I remember that during the Q&A the author described how the code extended GROMACS, but it sounded like there was some trouble keeping it maintained. There are a lot of great methods being implemented in this sort of project, and other folks are concentrating on workflow management in various ways. There was a recent talk about the Common Workflow Language, and another project is working to integrate different tools and help connect all these pieces. So in short, I'm trying to make GROMACS more flexible and more free, in a way, while keeping it fast. To set the stage with a few sample scenarios, the pattern that I've been seeing over and over falls into a few general situations. There are adaptive workflows, for instance, which I think can be schematically represented something like this.
You start with some sort of dynamic setup, do some simulation, then have to manage trajectories, generate new inputs, and launch additional simulations, which ends up being a big management issue. If that's also coupled with complex analysis tasks, you can end up doing a lot of work both to shuffle data around and manage the tasks, and also to integrate the analysis code with the simulation for performance reasons. That can end up being a problem.
For the first research project that we applied ourselves to here, we started with some work that had been done by a graduate student in our group. The work was based on the GROMACS 2015 release, I believe, and it was getting harder and harder to continue to develop it, and it wasn't getting any closer to something that we could really contribute back to the community. To describe that work a little bit: we were working with spectroscopists who were able to get spectroscopic data describing an interesting part of the conformational ensemble of a biomolecule, and we wanted to understand the parts that couldn't be probed experimentally, so we wanted to sample better in simulation. This is similar to problems that other people have tackled, but there wasn't at the time anything integrated with GROMACS to perform this restrained-ensemble simulation, so we looked for a way to migrate that code.
Just to flesh this out a little bit, the idea in this sort of workflow is that we have an experimental distribution of some measurable. In this case it's the pair distance between some spin-labeled residues. We have the experimental data for the part of the conformational ensemble that we're interested in. The simulation uses the historical information from the simulation data and the reference data from experiment to iteratively refine a bias potential that, once it converges, allows us to focus on the interesting part of the conformational ensemble. So
by the time we finished this proof of concept, running a simulation of this sort consisted of really just a handful of Python commands to drive it, plus some C++ code to implement the additional forces. I've talked about performance a bit. One of the key things is that the Python front end largely sets up a description of the work that's going to be performed, and then allows the work to be dispatched all together. It connects pieces that can all be implemented in C++ directly to each other and flexibly dispatches the work to some other sort of scheduler.
What that ends up looking like is that, with the first command here, you start to describe this work graph, which includes an operation to open some input file and attach that to a description of the molecular dynamics simulator. Then the researcher, or someone who's exploring our sample code repository, can load a module that's completely separate from GROMACS. It's built against the GROMACS installation, but it's a separate chunk of code. The Python interface allows you to get a Python handle to the code that implements this restraint and attach it to the simulation, schematically at least. Then once the workflow is done, once the graph of interconnected computing components is fully described, it can be dispatched to run.
As for the way this works, on the right side I've just shown it schematically. The Python commands build up a simple data structure that's easy to interpret as a directed acyclic graph of operations and interdependencies. It's as simple as possible because it's easier to separate the user interface and the implementation this way. We also hope that, with it being as simple as possible, it can be very clearly described in a specification, so that if someone wants to use the computing infrastructure we've developed but with a different front-end tool, it would be easy to write a connector, or, alternatively, if the Python interface is helpful but someone
wants to attach additional code, there's not a whole lot of parsing they need to do. Importantly, we think of this as a middleware layer that makes it easier for us to write connectors to different work dispatchers, either workflow management software or resource management software, that perhaps some of you are already actively developing.
So the end result, again in our specific research case, is that the researcher has written some code that can calculate the forces to be applied, and that's all it does. It takes atom positions as input and calculates a force according to the potential that the researcher has expressed in a few lines of C++. It uses a couple of additional resources to allow an ensemble of simulations to share their historical data, share their statistics, and collectively update the bias potential they're applying, to continue for another simulation segment, and then to do this repeatedly until convergence.
Out of the box, in the Python package that we've developed (the version from the manuscript is gmxapi 0.0.4), the basic functionality is to be able to load a single simulation or a batch of simulations and easily run it, with some additional options to configure the runtime environment as you would with mdrun command-line flags. We've intentionally tried to separate the user interface and management stuff from implementations of any interesting methods. For instance, the cool part of the method that was developed is in a separate repository as sample code. This is a simpler restraint, just for illustration purposes. It's also in the repository, but it's there for demonstration, really. This is what it would look like to write a simple chunk of code that calculates the harmonic force between two particles that aren't necessarily a short-range non-bonded pair, just an arbitrary pair of particles somewhere in the simulation, which is a facility that might
not seem that interesting, but we can build on it a lot, and it's not something that was previously possible in GROMACS without extending some pretty extensive and intricate infrastructure that is in GROMACS, which would require branching the code and recompiling. So the key points here are, again: you get a handle to the description of the simulation you want to run, starting with a standard input file, then get a handle to a custom chunk of external code, attach it to the simulation, and run it.
The current version is 0.0.6. It's available on GitHub right now. There are three pieces. This URL goes straight to the Python package, which currently depends on a forked copy of GROMACS that is also linked there. We are working to make that fork less and less necessary and to get the primary distribution of GROMACS caught up with the functionality that we've introduced, because the whole intention here is that this would ultimately be completely part of the GROMACS project, and the official releases should work with the published functionality. But we may have to keep forks active for a while to make the latest and greatest features available.
Some of the things to call attention to in the current release of the Python package are a completely CMake-driven install, which should be easier than in the package that was published; improvements to the sample code, which is in that third repository; and the ability to run either uncoupled or loosely coupled ensembles of simulations that can reduce data collectively and update. We've also added the ability for plugin code to issue a stop signal to the simulation in which the code is running, for situations where the plugin code may or may not be applying forces but is doing some sort of analysis on the fly and looking for some sort of convergence condition, for instance.
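The two plugin capabilities described above, a harmonic force on an arbitrary pair of particles and a plugin-issued stop signal, can be sketched in plain Python. This is only an illustration under stated assumptions: the talk says the real plugins are written in C++ against GROMACS, and the class name `HarmonicPairRestraint`, its methods, the toy driver `run_until_stopped`, and all parameter values are hypothetical, not part of gmxapi or GROMACS.

```python
import math
from dataclasses import dataclass


@dataclass
class HarmonicPairRestraint:
    """Hypothetical stand-in for a compiled restraint plugin (not the gmxapi API)."""
    r0: float                 # equilibrium pair distance
    k: float                  # spring constant
    tolerance: float = 1e-3   # stop once |r - r0| drops below this

    def calculate(self, pos_a, pos_b):
        """Return (force_on_a, force_on_b, energy) for U = k/2 * (r - r0)**2."""
        delta = [b - a for a, b in zip(pos_a, pos_b)]
        r = math.sqrt(sum(d * d for d in delta))
        energy = 0.5 * self.k * (r - self.r0) ** 2
        if r == 0.0:
            # Direction is undefined for coincident particles; apply no force.
            zero = (0.0, 0.0, 0.0)
            return zero, zero, energy
        # F_a = -dU/d(pos_a) = k * (r - r0) * (pos_b - pos_a) / r
        scale = self.k * (r - self.r0) / r
        force_on_a = tuple(scale * d for d in delta)
        force_on_b = tuple(-f for f in force_on_a)
        return force_on_a, force_on_b, energy

    def should_stop(self, pos_a, pos_b):
        """Convergence check that the plugin could use to issue a stop signal."""
        return abs(math.dist(pos_a, pos_b) - self.r0) < self.tolerance


def run_until_stopped(restraint, pos_a, pos_b, step=0.01, max_steps=1000):
    """Toy driver (not molecular dynamics): move along the force until stopped."""
    for n in range(max_steps):
        if restraint.should_stop(pos_a, pos_b):
            return n, pos_a, pos_b
        f_a, f_b, _ = restraint.calculate(pos_a, pos_b)
        pos_a = tuple(x + step * f for x, f in zip(pos_a, f_a))
        pos_b = tuple(x + step * f for x, f in zip(pos_b, f_b))
    return max_steps, pos_a, pos_b


restraint = HarmonicPairRestraint(r0=1.0, k=10.0)
forces = restraint.calculate((0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
steps, final_a, final_b = run_until_stopped(restraint, (0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
```

The point of the sketch is the division of labor: the restraint only maps positions to forces and reports whether its condition is met, while the driver decides how long to run, so the run length is set by convergence rather than by a fixed number of steps.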
In that case you wouldn't necessarily be running for a fixed number of time steps, but would stop on achieving convergence. The workflows that we're developing right now use a mixture of these: exploratory trajectory segments that run until some sort of convergence condition, refining parameters that are then used in other plugin code, and launching branches of simulations with those parameters. So we're trying to enable some interesting workflows, and we're expecting gmxapi 0.0.7 to be based on functionality in GROMACS 2019. We would like compatibility to the extent that simulations run from the mdrun command line and from the Python interface have exactly the same behavior, because they're running on the same underlying code base.
So what else is in the future? I mentioned what I think is a much-requested feature, to be able to stop an MD simulation from code implemented externally to GROMACS. We consider that to be a special case of a much broader set of data-flow features, where the nodes in the work graph have distinct input data and output data that can be bound from one node to another through a simple expression at the Python level. That's then captured in this intermediate data structure, and all of the different C++ code can be properly bound when the work is dispatched. To reiterate, people have been doing this sort of thing for a while with Python wrappers or even shell scripts, trying different ways to collect output data and make it easily accessible as input data. What we're trying to do is allow that to be abstractly represented in Python, or in an intermediate middleware API, while allowing for an implementation that doesn't necessarily rely on the file system and doesn't rely on simulations totally tearing down and starting up again, not just within the same HPC job, for instance, but even within the same process, I mean the same CPU process.
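As a rough illustration of the idea described above (Python calls that only record a simple, serializable description of the work, which is interpreted as a directed acyclic graph and bound together at dispatch time), here is a minimal sketch. It deliberately does not use gmxapi; the dictionary layout, the function names `add_element`, `execution_order`, and `dispatch`, the operation names, and the file name `topol.tpr` are all assumptions made for the example, and the lambdas are toy stand-ins for compiled operations.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def add_element(graph, name, operation, depends=(), params=None):
    """Record one node of the work graph as plain, serializable data."""
    graph["elements"][name] = {
        "operation": operation,
        "depends": list(depends),
        "params": dict(params or {}),
    }
    return name


def execution_order(graph):
    """Order elements so every node comes after the nodes it depends on."""
    predecessors = {name: el["depends"] for name, el in graph["elements"].items()}
    return list(TopologicalSorter(predecessors).static_order())


def dispatch(graph, operations):
    """Run elements in dependency order, binding upstream outputs to inputs."""
    outputs = {}
    for name in execution_order(graph):
        element = graph["elements"][name]
        inputs = {dep: outputs[dep] for dep in element["depends"]}
        outputs[name] = operations[element["operation"]](element["params"], inputs)
    return outputs


# Describe the work first; nothing runs until dispatch() is called.
graph = {"version": "sketch", "elements": {}}
tpr = add_element(graph, "tpr_input", "load_tpr", params={"input": "topol.tpr"})
restraint = add_element(graph, "restraint", "harmonic",
                        params={"sites": [1, 4], "k": 10.0})
add_element(graph, "md_sim", "md", depends=[tpr, restraint])

# Toy stand-ins for the compiled code that a real dispatcher would bind.
operations = {
    "load_tpr": lambda params, inputs: {"file": params["input"]},
    "harmonic": lambda params, inputs: {"restraint": params},
    "md": lambda params, inputs: {"ran_with": sorted(inputs)},
}

serialized = json.dumps(graph, sort_keys=True)  # could be handed to another tool
order = execution_order(graph)
results = dispatch(graph, operations)
```

Because the graph is plain data, a different front end could produce the same structure, or a different dispatcher could consume it, without either side needing to parse much. Here outputs are passed between nodes in memory inside one process, which mirrors the stated goal of not depending on the file system for data exchange.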
How does this look? I want to spend a moment talking about the lower-level API and implementation, because some folks are going to be particularly interested in how the plugins work. We've provided this Python package that uses pybind11 for Python bindings, but our goal is not to insist that anyone should have to derive from our Python classes or depend on our Python module in order to interact with this API. Plenty of people already have great Python interfaces that they're happy with, and maybe they implemented bindings with some other package like SWIG, or maybe even directly with the Python C API. So we try to make sure that the binding protocols for the C++ objects that researchers write, and for the C++ interface in GROMACS, are simply and clearly specified, so that anyone can implement bindings however they want. They're free either to use the Python package that we've provided to glue everything together or to implement their own stuff. So this is just an example of how it's implemented in our package, and if you've got your own preferred method of Python bindings, hopefully it's just as simple to do it your favorite way, without any dependencies you don't want.
This is a chunk of code out of the Python package, and this is the corresponding chunk of code out of the sample code that we've provided in our sample restraint repository. What this does is just declare the Python side of the simulation and the plugin code, so that Python can tell the compiled objects about each other and get them speaking to each other through the C++ interface. Once that's done, the Python C API is used to start with the Python API and let that turn into a C++ conversation, and then the C++ objects know about each other and can sit there and wait for the later API call that tells the simulation to start. The researcher doesn't need to worry about that, necessarily. That's more information for someone who's already thinking about how their
package with Python bindings might interact with ours. But I think it also illustrates to some degree how we've tried to separate code development responsibilities in a way we think is appropriate. The researcher doesn't have to think much about how their plugin gets its Python bindings or how it talks to GROMACS. Instead, if someone wants to implement their own plugin, like the restrained-ensemble plugin for instance, the main thing they have to do is just implement a calculation function with one of the documented call signatures. At this point there's a little more boilerplate to copy and paste than I would like, but it really is just boilerplate. It's not a whole lot of custom code, and we're working so that in future versions there's less and less of that boilerplate, with more stuff moved to headers that are in the upstream packages like GROMACS rather than in the sample code or the researcher's code base, and with more templating and such, so that when GROMACS or other packages are updated, the researcher may have to recompile their code, but any API changes that were unavoidable are hidden. They just get dealt with by the headers.
I think there's going to be a lot of time for questions, but just to recap a bit: we built a collaboration with the goal of providing different sorts of API access for people with various API needs. We found a research question that required as many of those use cases as we could find, to make a sample implementation and give us an excuse to develop the most core features, which we released as gmxapi 0.0.4 in March. The Bioinformatics Applications Note is in early-access publication now, waiting for an actual issue to appear in, and it was co-authored by me, Jennifer Hays, a graduate student in our lab, and Peter Kasson. The biggest part of the project was started under a National Institutes of Health grant that established a collaboration between Pascal
Merz and Michael Shirts at the University of Colorado Boulder, Peter Kasson's group in Virginia, and the GROMACS team over at KTH in Stockholm, of which Mark Abraham has been my close contact and collaborator. I also want to thank Jessica Nash, who is my counterpart at the Molecular Sciences Software Institute and who's a great resource, along with the other resources that I've had as a result of the fellowship. Here's the GitHub URL. Like I said, there are three repositories, but you should be able to find them all through the main entry point there. We hope that this project is of some immediate use to you, but also that it clarifies some of the directions in which we're expending our energy and the sorts of functionality that we hope will be more natively accessible in official GROMACS releases in the future. Before I say any more, I suppose I should find out what you folks are most interested in.
Yeah, thank you Eric. This is Rossen again. So it's very great software, and we're looking forward to starting to try it. We have a question from someone. Let's see if we can get audio. Hi, can you hear us?
Hello?
Yes.
Which algorithm is used for this simulation here?
Which algorithm is used for the simulation?
There are different algorithms for simulations, but in particular, which algorithms are used in this specific case?
The main descriptions and the figures I showed are from an implementation of an algorithm based on that of Benoît Roux, which was simply called the restrained-ensemble method, and it's more completely described in the work by Jennifer Hays, who adapted it for GROMACS for her research.
Thanks Eric. I was wondering, if users start trying out the code and playing with it, what's the best place to find help if they have questions? Will it be through GitHub? Or is private mail the way to communicate?
You're welcome to send me a private mail, but the issue tracking system on GitHub is pretty on top of that, I think.
So anyone's welcome to open an issue, and I'm totally open to suggestions of other communication channels if that doesn't seem preferable. As we move towards more integration with GROMACS, I expect the conversation will probably move in the direction of the GROMACS issue tracking system. Right now, either the issue tracking system on GitHub or email is totally fine. There is documentation you can build in the project, which is also built and linked to from GROMACS.org. Again, a lot of the interesting stuff that we're trying to enable with gmxapi has sample implementations in that separate sample restraint repository that's linked from the main repository. That repository in turn has some sample scripts, some sample Jupyter notebooks, and documentation in the code itself, and there should be links to Docker images if you just want to get something going quickly and poke around, try the examples, or see what it looks like.
Thank you. We have another question from Ludovic. Let's see if we can get an audio connection. I guess his audio is not working. So the question is: are there any plans to build a visual workflow on top of gmxapi?
My personal interest is both in an object-oriented interface and in a nice, simple procedural Python interface built on top of that. But like I said, I'm trying to make it as easy as possible to integrate with whatever other efforts people want to make at any level, whether it's a high-level interface or some part of the implementation, and I'm trying to make things work well with that. I'm not aware of any visual workflow tools that I've had a chance to look at in particular yet, but if someone has a favorite workflow manager or something like that, then I would love to try to make it as easy as possible to integrate and leverage the efforts of the different projects.
There are big opportunities in that space for extending. If I could expand on that for a moment: I think the past BioExcel webinars presented projects that do a great job and really thoroughly go into one level of interaction with the molecular simulation workflow. It may be that there's one level that a project is particularly excited about or particularly good at compared to other projects, but it seems like everyone is forced to deal with more layers of the software stack than is maybe productive. I think the biggest thing that we might accomplish in the context of those other projects is to make it easier to apply those more general packages to GROMACS without having to wrap the command line or work with a branch of GROMACS, and still get native performance, without spending a whole lot of time just on integrating some package with GROMACS instead of on the higher-level or advanced workflows that the projects are really focused on. I think we can provide a good Python interface at several levels to the functionality that we give access to, and I hope it's useful to people, but like I've said, I definitely don't want to compel people to use the interface if they have a better one in mind or one that suits their purposes more. And yet if there's a chance to use both the Python interface aspects that we've provided and the Python interface of a package to be integrated, then that would be great too. We're trying to make the interoperation between different sorts of bindings as easy as possible, and the data exchange as high-performance and easy as possible.
Thanks. We have one more question, from Adam. Can you hear us? Adam?
Hi, yes. Thanks, Rossen. So my question is a general one. Do you think that in the future, as the GROMACS source code changes, it's likely that there will be updates required to keep this API in sync? Because it's not just about building a command line or building a parameter file and then running the program.
It seems to be deeper than that. Is that correct?
Yes, exactly. So that's the problem we're trying to solve, essentially. We're not the first people to make a Python interface that lets you steer a GROMACS simulation, and we're not the first people to steer a GROMACS simulation, but in conjunction with developing this Python package we are working specifically with the GROMACS project to develop the appropriate GROMACS-level C++ API that we can build on, and to get the necessary infrastructure into the core GROMACS installation. So my goal for the coming year, and we're in the process of integrating some of the GROMACS infrastructure that we've developed, is that you would be able to install GROMACS and then install the Python package, and if you then upgrade GROMACS, at most you have to recompile the Python package. Moving forward, ultimately the idea would be to have the most basic sample bindings in the GROMACS core project so that they're always in sync. We're trying to move as much of the implementation as possible down to the C++ level, so that it's infrastructure that's shared with the command-line tools and the other parts of the GROMACS project. The maintenance nightmare is what we're trying to avoid and trying to tackle. And then, like you say, there's the other part: the plugin code definitely operates at different levels.
There's the Python side, where for a plugin we want to provide some tools so that the Python interface is basically generated for the researcher, who then just works on the C++ side. For the C++ code, we're introducing abstractions between the plugin and the computation that supports the molecular dynamics simulation. If you have a task that's as simple as calculating a force based on a vector, then that doesn't inherently depend on any details of GROMACS. So where we can, we are trying to abstract out these different types of calculations so that we can provide the most stable API on the GROMACS core, and to the extent that there's instability there, we can also provide adapters, probably in the form of template headers. So in the worst-case scenario, the researcher would just recompile the same plugin that they've already written against the new copy of GROMACS, and with the updated template headers it would just recompile and work with the new version. The compatibility will improve over time, but already we were able to migrate code that was tightly coupled to GROMACS 2015 into something that has no appearance of being coupled to any particular GROMACS version, and we eliminated a lot of lines of code. This is for that restrained-ensemble potential that I mentioned. In fact, right now it's not built against the most current version of GROMACS, but as we continue to migrate that infrastructure and get things updated, I don't think the researcher's code is going to change. I think it's basically going to be: install the new version, recompile, and that's it.
Great, thank you.
Thank you, and we have another question from Ludovit: does the GROMACS API provide any utilities or helpers to prepare input topologies? He's specifically interested in setting up large simulations, like for instance the HIV capsid and even larger systems.
That's a great question, and I wish the answer were yes, absolutely.
With the 0.0.7 release and upcoming work for the rest of this fall, I'm going to be paying a lot more attention to input preparation and tools to set up the simulation. Because we want to be able to better help people run adaptive workflows and such, it's actually pretty essential that we have more public, easy-to-use, stable tools to manipulate inputs. So the same code that's being developed to connect different parts of a simulate-and-analyze type of workflow, or an adaptive workflow, can be used just for general simulation preparation. But no, it's not in gmxapi yet. There are people who are working on such tools with an awareness of the possibility of integrating soon with gmxapi, so I hope that we see more of that this fall, but right now we have to rely on other tools.
Thank you Eric. These were all the questions we had until now, so before we finish, can we go to the next slide? I'd like to tell our audience that we have two more webinars coming up in the next few weeks. On the 4th of October we will have Marc Baaden, who will show us a pretty cool way of doing 3D visualization of biomolecular structures, so you're welcome to come and enjoy the presentation. A week after that we have Caitlin Bannan, who will present the SMIRNOFF format, which is part of the Open Force Field Initiative, a new take on helping with all the different disparate formats that are being used for molecular structures. Everybody is welcome to join us for these. I'd like to thank everybody who joined us today for this talk. We are looking forward to following the developments of gmxapi, and we will have another presentation in the near future. Thank you. That's all for today. Bye.