Hello everyone and welcome to the next edition of the BioExcel webinar series. My name is Rossen Apostolov and I will be today's host. As you know, GROMACS is one of the most widely used applications for molecular dynamics simulations, and it's my pleasure to have with us today Paul Bauer, who is working on many aspects of GROMACS usability. He will tell us more about how you can contribute to the GROMACS code and documentation.
For those of you who are new to our webinar series, I'd like to give a very brief overview of BioExcel and what we do. BioExcel is a center of excellence for computational biomolecular research. We are almost three years old by now. Our three main lines of activity are, first, improving the performance, efficiency and scalability of several key applications that are extensively used in biomolecular research. These are GROMACS, which you know very well; HADDOCK, which is one of the most popular applications for docking and integrative modeling; and CPMD, which is extensively used for hybrid QM/MM simulations of, for example, enzymatic systems. In addition to the key applications, we also work on improving the usability of these codes and on coupling them with various other tools. We do this by devising efficient workflows that let researchers execute, in one go, the complex pipelines needed by their research, and we work with key workflow platforms such as Galaxy, COMPSs, KNIME, Apache Taverna and a few others. Our center also puts a lot of effort into consultancy and training, and we promote best practices in the training of end users, not only in the applications but also in how to write code, so software engineering best practices and others.
So I'd like to introduce our presenter today, Paul Bauer.
He received his PhD in computational chemistry last year at Uppsala University while working with Professor Lynn Kamerlin on computational understanding of enantioselective enzyme catalysis, and although the main part of his doctoral work was also experimental work, he started working on software development there, where they were developing in-house simulation software. After his PhD, Paul took a position as a researcher and scientific programmer in Erik Lindahl's group, and for the last year he has been contributing extensively to GROMACS, working with the infrastructure and improving the documentation. He is the best person to tell us how to contribute to GROMACS, why to do it, and what the best way is. So welcome, Paul.
Thank you, Rossen, for the introduction and for giving me the opportunity to share this talk with the people who are interested in it as part of the BioExcel webinar series. During the talk I will go through a few general questions about GROMACS: how to contribute code and especially why you should contribute if possible. I will give you an introduction to the GROMACS documentation, which has extensive information on how to contribute and what you have to consider before contributing. I will go through a few examples from the last year that I think show very well how internal and external contributors used our existing documentation to provide us with new functionality and updates to our software. I will finish with a few slides of general advice for users, contributors and the like on what they should consider and do if they want to contribute to projects in general, and on what is generally a bad idea because it leads to hassle for the developers and also for the people who are contributing.
So, to the question of why to contribute code. This is something many people ask themselves when they see a large software project, especially one that has been established for several years and has a few core developers who know best what to do. External people then think: there are main developers working on it, they know what they are doing, so I shouldn't bother. Also, in the case of complex codes, and GROMACS is one of them (there are even more visible examples of this, for example the Linux kernel), new people often think they don't have enough expertise to actually do something in the code and that they would just be a drag on others. It can also be that they have tried to contribute to different projects, or even the same project, before and got harsh reviews of their contributions. The reviewers didn't like the implementation or didn't like what had been done, and the contributors feel that their effort was wasted and their contributions were not acknowledged, and they think that after a few years of nothing being done they shouldn't even try anymore.
In my opinion this is a set of wrong assumptions, because there are always good reasons to contribute to whatever project you can think of. It could be just fixing a very small issue that you found while using the software. Maybe it's just a typo in the output, a missing comma in the documentation, or a hassle in the user interface where you have to supply extra options you don't need. It could also be that you are an expert in this kind of development or in the application and you know how to improve it. If you know of a performance improvement and you can prove it, then of course you should contribute it at some point.
It could be that you're working on new ways to analyze data produced by the program or by other software, and you want to include this analysis method in the software as a simple part of the code. And of course it's always possible that you want to improve the documentation because you found something unclear that was explained to you but has not been updated in the documentation itself; then of course you can contribute and fix this.
So how do you do this, and what should you think about when you decide that you want to contribute to any project? First, as general advice, you always need a good idea of what you want to do, what you are planning with your changes, and what those changes should look like. You should have worked out the requirements of whatever you want to implement or change, and also reduced the scope of your changes as much as possible, because it's always more difficult to implement something that is itself big, that is maybe hundreds or thousands of lines of code changes, and that does everything you can think of at once. That is usually not what is needed. What is needed is a test case that shows that the feature works as it should in a very simple case, without any additional work such as performance improvements. And of course, whenever you want to change something, you should have tested beforehand that it actually works, that it does what it's supposed to do and nothing else, and that it doesn't interfere with the rest of the program, for example with its performance.
If you want to contribute to GROMACS, there are a few central places that I will introduce to you. The first is gerrit.gromacs.org. This is our code review server, which is also linked to our integrated testing infrastructure.
We have a Redmine issue tracker, redmine.gromacs.org, that is used to keep track of what we want to do with the code: bugs that we know of or that users have found, tasks to improve aspects of the code, and features that should be implemented. Other important resources are the user mailing list, where general questions about GROMACS are usually asked by people who use it. Questions that have already been answered can be found by searching the archives of the mailing list. We also have a developer mailing list that is focused on questions actually related to code development and on what should be changed for new versions of the code. There you can also get information about new versions, new features, new releases and our plans for the future.
As a short overview, what you can see right now is a picture of our code review site from yesterday morning. You see that we're currently working on a few changes to the code that are public and can be reviewed by external and internal people, allowing them to vote and judge the code for correctness, for adherence to our coding standards, and for whether it follows the general vision of the project. This server is linked to our automated testing system, which allows us to verify each submission and see whether it passes our tests on different build systems, for example with different versions of compilers and different versions of auxiliary software such as CUDA, and on different operating systems, to make sure it works on Linux, Mac and Windows in the end.
Another central piece of infrastructure is our issue tracker. This is Redmine.
You'll see that all issues filed by users are kept track of and can be searched and investigated by people who want to know, for example, which bugs have been fixed. You also have this information in the release notes in the documentation, but sometimes it can be important to see what was actually done. The advantage of the issue tracker is that it also links to the changes in the repository and on Gerrit, allowing us to quickly cross-reference changes with issues, features and tasks. This makes it possible to see when a task is completed and the issue can be closed, or whether a bug was fixed recently, which can also be checked by the users who reported the bug.
After you have gone to these pages and informed yourself about what you want to do, if you still think that your project idea is good, it's usually a good idea to contact one of the core developers on the mailing list. You can find those developers as simply the most active people on these platforms. You can contact them directly if you have an idea, but in general it's also a good idea to post it to the developer mailing list to see what the consensus is among other people who are not core developers but are invested in the progress of the software. After that, the discussion usually moves over to Redmine with a new issue. There you can get important feedback on how the project should be tackled, and some ideas on, for example, how to improve your general concept or how to reduce the scope of the problem you're working on.
After the discussion there comes the housekeeping part. You need to have your own test repository, and you need to be able to build the software.
And of course you need to do the necessary task of preparing tests for what will be changed by the new feature you're implementing, to make sure, as I said before, that it achieves what it's supposed to. This is the point where you usually upload a change to Gerrit, and this can also be the point where you have to prepare for some of the frustration that is inherent to shared software development. A change uploaded to Gerrit can be viewed by everyone on the server. Code review there tends to be very on point, to keep up our software standards, but also helpful. We try to help new people and also core developers to improve their code, to always get something better out of it, and to reduce the scope for future bugs that stay hidden because of how something was implemented. It is quite possible for a change, especially one that involves core parts of GROMACS, to go through a large number of review rounds, as many as 60 or 70 within a few weeks, if multiple people are interested and invested in your change and want to make sure it is up to our standards in the end. It can even be that, if some people think your change is not up to standard or is not the optimal way to implement a new feature, it is initially rejected. Then you have to see from the feedback you got how you could restart your implementation. You could then upload a new change that takes those considerations into account and goes through further rounds of code review before it is actually accepted.
It's also important to remember that core developers are humans and can also be frustrated by their work. More on what you should and should not do to reduce frustration on all sides will come later.
But just a few main points. It's usually not a good idea to upload large changes to the code without any context. If we see that someone uploads 20,000 lines of code without any reference to Redmine, then even if it is the most beautifully laid out code with beautiful documentation, it likely won't be accepted, because we need to have a discussion about it first. Another bad thing is to not have tested your changes beforehand and to rely on our testing infrastructure to find out whether everything works. This means that we have to go through the changes again, and during all the time that your change is being tested, other people cannot test their code. Also, we have extensive information about coding style and coding standards, and new code is always checked against them so that there is not much noise about style. You will have to change your coding style to the standard that we have agreed on, so you can start with it from the beginning.
Everything that I just talked about you can read in our documentation. The GROMACS documentation is basically your source for everything you need to start your own development or your own work with GROMACS, and you can go through it to learn more about the background, about the physics behind GROMACS, and about the kinds of models it implements. The GROMACS documentation has recently been reworked to be, hopefully, more accessible to users. It contains information on how to compile the code and what it can do. But it has always been a work in progress, always at the stage of "it can be improved, it should be updated more, but there's usually not enough time for it". Currently, the documentation is available as an online web page where you can find information about the code.
You have the release notes for the different versions, the installation guide, the user guide, some how-to guides for beginners, and further sections such as the reference manual, the developer guide, and even the code documentation from Doxygen. You also have most of this available as a PDF, but we think it is better to have it as a web page that can be cross-linked easily.
In the documentation, the different sections are split up by topic. The user guide provides information for end users, people who are new to simulation software or new to GROMACS itself. It gives you an idea of what the software can do, answers some frequently asked questions, and gives you extended information on how you should manage simulations and what you should think about before stopping them. The reference manual is the Holy Bible of simulation with GROMACS, because it explains everything that is in it: what kinds of assumptions are made, what kind of physics is modeled and how, what different methods are available, and how those methods are connected. The developer guide, on the other hand, contains more information on what you can do if you want to implement your own tools or get an idea of how to contribute to GROMACS. There you find the "Contributing to GROMACS" section, which I think is probably of most interest to the people who joined today. There you have the checklist for contributing to GROMACS. It also gives you some ideas for what to do if, for example, you have a change that is not going to be accepted, or you think there is not enough interest in it: you can fork the GROMACS repository and keep track of our current stable versions, but keep your code in a separate repository. I hope you can all see the links here; you can also find them at the end.
You can find the user guide, the reference manual, and the other sections of the manual that are released for the current beta version all online, and you can browse them if you're interested. As I mentioned before, you also have full code documentation for everything that is currently documented in GROMACS, along with the other documentation material. You can also get all of this directly from manual.gromacs.org, not only for the current beta version but also for all the previous stable versions, and for the current in-development version, which is just the daily build from the master branch.
Now to some examples of how code can be contributed to GROMACS. I will start with a shameless plug, because I'm going to show my first project related to GROMACS, which was a rework of the documentation. Initially, documentation in GROMACS was handled as two separate pieces. You had the online documentation, which was the user guide, install guide and developer manual, and there was the reference manual as external documentation, a PDF that contained the physics. This kind of split made it difficult to relate the different sections, because you always needed to have both of them at hand separately. It also meant additional work was needed to keep them in sync, so changing something in the user guide, for example on managing simulations, was more prone to errors. There were also several other pages from previous versions of GROMACS that contained information that was not kept up to date and was difficult to access.
My process to change this was initially to learn how our infrastructure works, how, for example, our automated tests and the code review system work, and how a build environment for the documentation should be set up to test that the documentation actually builds.
The following parts are what you would expect: see that the new changes to the documentation can be accepted, and upload the change to the server that tests it, followed after a while by approval from the other developers. The result, which you saw in the section on the beta version, is that the format of the documentation got unified. Now it is all available as RST markup or, for anyone not familiar with it, reStructuredText markup. This allows us to provide the HTML documentation that we now have on our page and also to build the PDF version from the same source material, meaning that we only have to keep track of one source in the end. Another advantage of one unified documentation is that we can now cross-reference between all the different parts and make it easy for users to see what, for example, a specific simulation method means in terms of the physics explained in the reference manual. We were therefore also able to deprecate some of the old web pages that were out of sync, and we also plan to have the documentation, as well as the source code, assigned unique digital object identifiers, which make it possible for users to always refer to the version of the documentation that came with the release they used, without any ambiguity.
Another example, this time from an external contributor, was some updates to the gmx cluster tool. Discussions started on the developer list and later on Redmine with someone who saw that there was some missing functionality in the tool that he thought should be easy to fix and could be of a lot of use to other people. The discussion on Redmine led to him uploading a patch to get this working without any trouble.
The other developers then helped to clean up this patch and reviewed it, leading to it being included in the latest release.
Another example was updates to hardware detection. A student contacted us and the developer team about a project to get started with software development, for example by looking into performance. Among the developers it was then decided that one part that could be updated at the moment would be the hardware detection library, hwloc, which is used to detect what kind of system GROMACS is running on in order to get the best possible performance. The student worked on getting to know our code review system and automated testing, first by working on some smaller issues, again in the documentation, and was then able to successfully upload a patch, agreed on by the different developers, to support a new version of the hardware detection library while keeping support for the older version. It actually led to some bugs being identified and fixed by the developers, and the contributor is still active in the community and helps both with working on new features and with coding. This shows that it is actually quite easy to get integrated into the developer community and be active in the project.
What all the different examples had in common was that the people who wanted to contribute got involved in the community, which means actually helping with the software development process. It means not just uploading code to Gerrit, fixing things and improving documentation, but also helping to review other people's changes. It means spending time going through code that you know and think you understand, or think you can help with, and suggesting how it can be improved, or saying: this works as it should, it should be included.
It also means that people can be active on the user and developer mailing lists to help others with issues running the program or with developer questions. A very important thing is also that you shouldn't start by trying to fix the interesting thing you're working on, but maybe fix something small first, to get to know what is in the code base, how the different systems work, how code review usually works, and how to get code accepted quickly. Another thing is that it's always good to be active on our issue tracker and code review server, because it shows that you care about the state of the program. If someone just uploads a change and expects us to maintain it in the future after it has been accepted, it's very likely that it will be deprecated in a short time, because no one will have the knowledge and time to continue working on it. It's always good to care about other people's code, trying to find errors, trying to find possible improvements, and trying to prevent bugs before they happen.
What should be mentioned, and what is always an important part of being active in the community, is that code review and discussion on Redmine, Gerrit, the user mailing list and the developer mailing list should be used to keep up our coding standards, to keep up the aim of the project, and most importantly to keep the physics correct, so that people can continue to do awesome science with GROMACS. But you need to stay civil and kind. You need to be able to show people how to improve their code without being abusive to them or dismissive of their ideas. If you find an issue with someone else's code, you should point it out to them. You should say "this can be improved", not "this is just wrong".
Just as important are the things that definitely shouldn't be done, and most of the time this comes down to assuming that others are always available for whatever you want to change. Time for developers is finite.
You can only work 24 hours, and then there's the night. Developers cannot spend all day reviewing changes for one project; they have other work commitments. Going through and reviewing a large change takes time. It cannot be done within 30 minutes while something else is happening or while you have downtime from other projects. It takes a lot of time to understand what is happening and also to get an idea of what was meant to be done in general. Another problem is that our testing infrastructure also has limited resources. For example, right now a single pass of verification on the current development branch takes about 22 to 24 minutes before all systems are free again. This means that if someone uploads several different changes that should all be checked by our testing system, then for more than three hours no one else can do anything. So it's not a good idea to just upload many changes at once, even if they are small and can easily be reviewed, because you will block other people from contributing at the same time. This also means that you should definitely check on whatever hardware is available to you that your code will pass, so that the testing infrastructure is not used to build the binary, pass in some cases, and then fail in most of the others, wasting everyone's time in code review. It's also very important to listen to the advice you get from other developers, especially from the senior developers. If they think that your code can be improved, then it probably can, and you should take it into consideration, get up to date with new coding styles and different constructs in C++, for example, and try to implement their suggestions as well as you can.
Another thing that sometimes comes up is that you see extremely large changes uploaded at once, and changes of 20,000-plus lines are a bad idea, because they make review difficult.
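As a rough check of the testing-queue figures quoted above (22 to 24 minutes per verification pass), the blocking time can be worked out with a few lines of Python. This is only a sketch: the batch size of ten changes is a hypothetical number, since the talk does not state how many uploads it had in mind, and it assumes the changes are verified one after another on the shared machines.

```python
def minutes_blocked(num_changes: int, minutes_per_pass: float) -> float:
    """Time the shared test machines stay busy if each change is verified in turn."""
    if num_changes < 0 or minutes_per_pass < 0:
        raise ValueError("inputs must be non-negative")
    return num_changes * minutes_per_pass


if __name__ == "__main__":
    batch = 10  # hypothetical number of changes uploaded at once (not given in the talk)
    low = minutes_blocked(batch, 22)   # lower per-pass figure quoted in the talk
    high = minutes_blocked(batch, 24)  # upper per-pass figure quoted in the talk
    print(f"{batch} changes keep the queue busy for "
          f"{low / 60:.1f} to {high / 60:.1f} hours")
```

Under these assumptions, a batch of ten changes already exceeds the three hours mentioned in the talk, which is why testing locally before uploading matters.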
You can see how large an uploaded change is, and a 20,000-plus-line change will probably scare away even the most hardened developer who is used to reviewing a lot of code at once. This kind of change also hides a lot of complexity, because you usually forget what was done in the first file by the time you are going through file 25. This means that a subtle bug, hidden in some strange combination of input, method and assumption, will be very hard to find. This kind of change also means that probably a lot of parts of the code are touched and modified at once, making it more and more difficult to understand and validate what is happening in the new change. This will lead to bugs, and the work may need to be done again. So if you want to make large changes, you should, as I mentioned in the beginning, reduce the scope as much as possible and then add more and more aspects to the code after it has been validated to do what it's supposed to do, after corner cases have been identified and properly accounted for, and after it has been shown that there is no negative impact on the rest of the code.
This concludes the presentation part, and I want to thank some people who are involved in my work. These are of course the project leads for GROMACS, especially Erik Lindahl, Berk Hess and Mark Abraham in Stockholm, as well as everyone at SciLifeLab who is working on GROMACS. And of course I want to thank all developers and contributors to GROMACS and all the users who continue to do awesome science with it, which shows that what we are doing is worthwhile. With this I want to thank everyone, and we will go over to the questions and answers session.
Thank you, Paul, for the very thorough explanation. Now we have the questions and answers session open. Please use the questions tab in the control panel to write your questions. We don't have any at the moment, so please use it.
So, the GROMACS website is also under new development; when can we expect to see a new one?
The main website still has some of the older information that is difficult to migrate from it. We still keep it, but in general we think that most of the different pages should redirect to manual.gromacs.org, because we now have all the information about the project centralized there and easily accessible for the different versions of the program: download links for every one of them, and links to all of the development infrastructure.
Yes, a lot of effort is being put into the code, and the infrastructure requires even more. So if there are questions from the audience, please use the questions tab. We have a question from Adam: Thanks for the clear explanation of how to contribute. If I already know the area in which I want to contribute, what would you advise is the best way to learn about the parts of the code that are most extensible, to learn about the architecture of the code and the parts that can be changed without modifying the central part of the code, which is difficult to test? In your opinion, what areas of the code or what aspects of it are most accessible for improvements and modifications?
One area is of course the documentation. If people want to change something, then to get started it is always good to look at the documentation and change something there. Another area that in my opinion is quite easily accessible is the framework for analysis methods, because we also provide a template that people can modify to implement their own method and to learn how the analysis works. Otherwise, to find out how different parts of the software look, I think the documentation is very structured and informative for getting started on the different aspects that are of interest to you.
Okay, thank you. Thank you, Adam. We have a question from Gisun Li, and sorry for the pronunciation. Let's see if we can get an audio connection. Hi, can you hear us?
Yeah, I can hear.
Oh, great.
So you're welcome to ask your question.
Yeah, I want to know whether GROMACS will consider supporting multi-body potentials such as EAM, used for simulating metals.
What do you mean by multi-body potentials? I don't think I am that familiar with this kind of approach, I'm sorry. Do you mean different potentials, like three-body?
Yeah, because if I want to do some simulation of a material, sometimes GROMACS does not support the potential, so I cannot use it. So I'm wondering if GROMACS will consider supporting these potentials.
I don't think there are any plans right now to support this kind of potential in GROMACS. There is work on providing better support for QM/MM approaches, which may provide something closer to this kind of approach, but for a direct implementation of multi-body potentials I don't think anything is planned at the moment.
Okay, thank you.
Actually, it's not easy to implement new potentials straight away without affecting performance, as they touch on the very core parts of the code, so it's not easy at the moment to extend GROMACS in this way. Codes like LAMMPS, for example, support different potentials, but this comes at the expense of performance. In GROMACS there is support for user-defined potentials using tables of interactions and forces, so it is possible to have your own user-defined potentials that you can use for simulating the system.
Okay, thank you. But these you need to tabulate?
Yes, they need to be tabulated, and they usually cannot reach the kind of performance that is available in GROMACS for the already implemented potentials.
Okay, thank you.
You're welcome.
Do we have other questions? We still have time. Please use the control panel. Well, if there are no more questions: what would be the best way to get in touch with you all? Just drop an email.
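On the tabulated user-defined potentials mentioned in the answer above, a minimal Python sketch of generating such a table might look as follows. It assumes the seven-column layout r, f, -f', g, -g', h, -h' that, to my understanding, the GROMACS reference manual describes for tabulated interaction functions, filled here with plain Coulomb (f = 1/r), dispersion (g = -1/r^6) and repulsion (h = 1/r^12) terms. The function names, the 0.002 nm spacing and the 0.04 nm inner cutoff are illustrative assumptions, so check the manual for your GROMACS version before using anything like this.

```python
import math


def table_row(r: float, r_min: float = 0.04) -> tuple:
    """Return (r, f, -f', g, -g', h, -h') for f=1/r, g=-1/r^6, h=1/r^12."""
    if r < r_min:
        # Hypothetical inner cutoff: keep entries finite where 1/r^12 would blow up.
        return (r, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
    return (
        r,
        1.0 / r,           # f(r)
        1.0 / r**2,        # -f'(r)
        -1.0 / r**6,       # g(r)
        -6.0 / r**7,       # -g'(r)
        1.0 / r**12,       # h(r)
        12.0 / r**13,      # -h'(r)
    )


def build_table(r_max: float = 2.0, dr: float = 0.002) -> list:
    """Tabulate the rows from r = 0 up to r_max in steps of dr (both in nm)."""
    n = int(round(r_max / dr))
    return [table_row(i * dr) for i in range(n + 1)]


def format_table(rows: list) -> str:
    """Format the rows as whitespace-separated text, one distance per line."""
    return "\n".join(" ".join(f"{v:.10e}" for v in row) for row in rows) + "\n"


if __name__ == "__main__":
    rows = build_table(r_max=1.0)
    print(format_table(rows[248:252]), end="")
```

A custom potential would replace the three functions and their derivatives in `table_row`; the derivative columns have to match the function columns analytically, otherwise forces and energies become inconsistent.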
If somebody is hesitating, maybe a new user who is a little unsure whether they have actually found a bug or whether something needs to be changed, I suppose everybody is welcome to contribute?
The thing is, if you think you found a bug, then you should let us know. If you think something is unexpected in the software, then most likely it is unexpected behavior, and even if it's not, we may have to explain better why the software behaves in a way that you are not expecting. It really helps to get an outside view of how information is presented as well. We have this problem as developers that we see what we are working on and what we are used to, but we don't always know how people use the software.
You tend to develop a blind eye for some things, I think.
That's right. So if people use it for interesting science that we didn't think about, they need to let us know, or if something doesn't behave as they think it should.
That's true also for all other community projects.
It's always true. But GROMACS has a big user community, so it has a big opportunity to improve quickly.
All right, since we don't have other questions, we can close today's presentation. As I said, we will upload a recording of the webinar to the BioExcel YouTube channel and to the website, so you're welcome to watch it again or forward it to your friends and colleagues. Coming up next in the BioExcel webinar series we have two presentations. One is on a high-level framework, COMPSs, and its Python variant PyCOMPSs, which is used for the development of parallel applications; it will be presented by Daniele Lezzi from the Barcelona Supercomputing Center. And especially for the GROMACS users among our listeners today, in November, in about a month's time, we have a presentation by Michael Gecht, who will talk about MDBenchmark, a new system for easy benchmarking of your codes. That is something very important for everybody who is using large-scale resources in particular, and it can help you find the optimum way to run your simulations without using too many compute cycles. Everyone is welcome to register and join us for these webinars. I'd like to thank all the participants for joining today and Paul for the great talk, and we will see you again next time. Thank you, and that will be all for today. Bye.