So we had two slanted approaches to the whole thing. One was going one by one in our heads through the pipeline, in chronological order, and figuring out what the things are that keep us from having a very streamlined simulation process. There we figured there are things like structure repair, as we call it: all kinds of things like adding hydrogens, filling in missing pieces. It's not very sanitized, not very easy to describe, not very easy to follow through. So that is something we would like to see action taken on. We figured it's relatively difficult, just because there are so many different ways of fixing things up and not very good documentation.

So do you envision just one tool that everybody would use, or a common way for many tools to be used on the same thing?

No, I think this goes back to everything we had today: it's mostly about ontology. It's not so much about how you implement things. Do we have a way to say that I added a hydrogen over here? Do we have a way of saying, OK, the structure was incomplete but I did these few fix-ups? Then we shouldn't care about which tool does the actual job.

So it's an ontology for the operations you performed on it?

Yes, exactly.

You would presumably also need a common format for representing the thing that has whatever items you already put in it, right?

Yeah. I mean, we didn't discuss that so much. I would still prefer to say structure repair: the way of describing what you've done is the first thing to do. I'm just taking notes. This is my group. Then there's another thing.
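The "ontology of operations" idea could be sketched like this. It is only a toy model: the operation names, the vocabulary, and the record fields are all invented here, not any real standard; the point is that a structure carries a machine-readable log of what was done to it, independent of which tool did it.

```python
import json

# Toy vocabulary of structure-repair operations (invented names).
REPAIR_VOCABULARY = {"add_hydrogens", "add_missing_atoms",
                     "add_missing_residues", "remove_alternate_locations"}

def record_repair(log, operation, target, tool=None):
    """Append one repair operation to a structure's provenance log.

    The controlled vocabulary is the ontology; the `tool` field is
    informational only, because the ontology is about *what* was done.
    """
    if operation not in REPAIR_VOCABULARY:
        raise ValueError(f"unknown repair operation: {operation!r}")
    log.append({"operation": operation, "target": target, "tool": tool})
    return log

log = []
record_repair(log, "add_missing_residues", "chain A 45-52", tool="toolX")
record_repair(log, "add_hydrogens", "all", tool="toolY")

# The log, not the tool, is what travels with the structure.
provenance = json.dumps(log, indent=2)
```

With a record like this, "the structure was incomplete but I did these few fix-ups" becomes something a downstream tool can read and check, rather than a sentence buried in a methods section.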
Then, following up on this: even before that, ligand and molecule parameterization standards are lacking a bit, especially if you don't use a standard force field but slightly modify your force field. There we lack even the language to describe these things, and a way to make sure that when we simulate, the force field is actually what we think it is. One case, for example, is things like changed ion parameters. Part of the community just patched the defaults of Amber 99 because sodium chloride was crystallizing. It's just nowhere documented; you just had either the patched or the non-patched force field, which is horrible for any kind of reproducibility or a smooth simulation workflow. So that kind of thing: you say, OK, I parameterized my molecule, I used a certain force field, and these things are described in a better, more sanitized way; and we also start using databases for all the ligand parameters more consistently.

So when you say standards, do you mean a standard way of describing how you derived the force field parameters, or a standard way of describing how you apply the parameters, or what?

The first and easiest thing, I think, would be provenance: that we keep the source of the data and the model in our simulations. Then we can build on the tools we already have for the setup. Things we already have to make this happen are, for example, CHARMM-GUI for setting up simulations, and for setting up free energy calculations there is, for example, something called pmx. There are other things. For the workflows themselves we have BioSimSpace, a workflow language being developed, and we also have OpenMM, for example. These are all tools that are already in place. We have workflows.
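The "keep the source of the data and the model" point could look something like the sketch below. The field names and the class are invented for illustration; the Amber 99 ion patch is the example from the discussion. The idea is simply that a patched and an unpatched setup must be distinguishable from the record alone.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ForceFieldProvenance:
    """Record what was simulated, not just which file was loaded.

    Keeps the base force field, where its files came from, and every
    deliberate modification made on top of it.
    """
    base: str                        # e.g. "amber99"
    source: str                      # where the parameter files came from
    modifications: List[str] = field(default_factory=list)

    def modify(self, description):
        self.modifications.append(description)
        return self

    def summary(self):
        mods = "; ".join(self.modifications) or "unmodified"
        return f"{self.base} ({self.source}): {mods}"

ff = ForceFieldProvenance(base="amber99",
                          source="shipped with package X, v1.2")
ff.modify("replaced NaCl ion parameters to stop spurious crystallization")
```

A record like this is exactly the difference between "we used Amber 99" in a paper and a reproducible statement of which Amber 99 was actually run.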
Also, we have our MD software packages themselves, and those are, I think, both existing tools and at the same time a source of concerns and difficulties. Because, and we just briefly checked, there are 60 or more of these software packages; we listed just a few where we thought, OK, these are kind of important. The landscape is so full that it makes it hard to take action. If we want to describe workflows and streamline simulation setup, we would need some comprehensive way of describing what these packages can do, what they require, and what they produce. So we would need some way of listing commonalities between software, listing overlaps; it's an abstraction problem, I think. Along with that difficulty come all the different formats. Assume we have 60 different file formats: that makes an N-squared problem for interconversion, which we all know. So we have an N-squared problem for translation between file formats, and that is hardly maintainable. Things will go out of scope, things will not be implemented properly, some transformation or translation will break. That is something we see as a problem. Yeah, if you always go through one representation, exactly; this is what we have now, but we need some kind of common object model to avoid the N-squared complexity. So, as we actually put it here: you need that type of common object model, so that you are sure you can do the all-to-one translation, and you know that you have a one-to-one correspondence when you do the translations.
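The arithmetic behind the common object model is worth spelling out: with N formats and pairwise converters you need on the order of N*(N-1) translators, while a shared hub representation needs only one reader and one writer per format, i.e. 2N. A minimal sketch (the `System` class and the two toy "formats" are invented for illustration, not any real package's API):

```python
class System:
    """Minimal common object model: the one shared representation."""
    def __init__(self, atoms):
        self.atoms = atoms

# Each format registers exactly two functions: a reader and a writer.
READERS = {}
WRITERS = {}

def register(fmt, reader, writer):
    READERS[fmt] = reader
    WRITERS[fmt] = writer

def convert(data, src_fmt, dst_fmt):
    """Any-to-any conversion through the hub: 2N functions, not N^2."""
    system = READERS[src_fmt](data)
    return WRITERS[dst_fmt](system)

# Two toy text formats, just to make the shape of the design concrete.
register("csv",
         lambda text: System(text.strip().split(",")),
         lambda sys_: ",".join(sys_.atoms))
register("lines",
         lambda text: System(text.strip().splitlines()),
         lambda sys_: "\n".join(sys_.atoms))
```

Adding a 61st format to this design means writing two functions, not 120; and the one-to-one correspondence the speaker asks for is exactly the guarantee that reader and writer for each format round-trip through `System` without loss.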
That goes in a different direction, but what we figured was extremely important is that we start to modularize our tools on the one hand, so that no tool is a monolith, and on the other hand that we are able to layer our tools and mix them accordingly. That means a very clear distinction between tools that transform input data, tools that change input data, and tools that bundle functionality. We put this under two keywords: modularization on the one hand, and alongside it stratification, so that we have a clear layering of responsibilities.

Alongside this modularization need, and the historical baggage we see coming with these good old tools, is something that is again related to standardized structure repair: tools that try to be auto-magic. We have this graph sketched up here: on the x-axis is the magic and on the y-axis is catastrophic failure. For a long time you can just do lots of magic, do all the wizardry you like, and not much happens; but then there is a critical transition, and suddenly you very likely just see catastrophic failure once you cross this critical magic boundary. We have this happening in our efforts at the moment.

Then, just in our collection of tools, two more to mention: MDAnalysis is something we should use as an existing tool. And this ANI-1 that was brought up: there is a whole set of people fitting high-level quantum mechanics with machine learning models, so a lot of the new types of interesting potentials for small molecules or proteins are these machine learning potentials that can run in things like TensorFlow or PyTorch. So the question is how we make things interoperable with those kinds of neural network or deep learning based methods.

Cool, all right, thanks. Who's next? Let me go over here. We'll start with
the one you think is the easy one, as was pointed out to me; I even get this wrong all the time. Step one: it would help if our tools came with correct documentation that was tested live. I've gotten the `-p 8888:8888` wrong myself. But actually, documentation for our tools is generally rubbish, because we all don't like writing documentation: it happens after you've written the software and everything is cool. Related to that, we need to develop more training material, to develop and share best practice. Again, one of the things that comes up is how to run simulations: what is actually the right way to run them? The tool is not the place that defines the right way to run them; it's the community that should define the right way, and when they do, we get into writing training material and sharing best practice, which I think is very important and maybe a little more difficult.

Do you envision that as something a human would read and then somehow implement, or as something that would be a pipeline that someone could run, like your example notebook?

I think humans will come up with the protocol, but I don't know what the best way to represent it is, because a protocol is more than just data: it's actually executable as well.

It seems like a workflow. You have a protocol as an object in your example; is that what you're thinking about, sharing these sorts of things by running the protocols? Or do you mean writing down all of the steps specifically in a checklist, so someone could come in, understand it, and implement it? The latter I would call training materials, which you can deliver as training events; and if you do not have the training events, you can run the MD thing using those protocols.

I think they will eventually end up being things like objects, because it's not just data, it's executable as well: objects that represent protocols, and then having a mechanism where we can share objects which represent
protocols. And then we could do this in YAML, we could do this in other things. That's really what we had next: make it easier to define and share protocols, which I think is important but heading towards more difficult.

What's really difficult but really important is that we are really bad, with our tools, at telling people why they failed. Typically, when you run a parameterization and it doesn't work, it crashes, which is a really useless way of telling you it failed. Or it might "work" but have done something badly wrong, like parameterizing your NH2 group in a dodgy way because your calculation didn't converge; but hey, it still gives you charges. So in our workflows we are really bad at reporting errors, and at reporting errors in a way that we can catch them. We need some way of standardizing, or at least documenting, how a tool reports errors. That is quite difficult: the log files are terrible, and normally PhD students only look at them at the end, when they make the figures for the paper. I've done that.

I think it's really important that we improve the interoperability of our tools; that fits in with the modularization and stratification. Ultimately we should move away from the culture of a PI and their group having their monolithic tool that does everything. We all live in that world: Gromacs, Amber, CHARMM, and blah blah blah. We should move to a world where you don't even know what software is being run, because it's all hidden in a Python script; the algorithms should be pluggable, something you pick up or put down. And we should have some mechanism of tracking provenance of code, so that the person who wrote pandas can see it's having massive impact. pandas, written by one individual, is being used by more scientists and researchers, and having more impact, than any MD package, and yet the person who
wrote pandas was almost jobless; they've only just been picked up by NumFOCUS. So we need ways of breaking this "I am the person who wrote the software" culture and making our tools interoperable, and to get that you need a culture change. We need a culture change towards community. I'd actually say we should get used to interoperable tools: we shouldn't be making a brand new tool to do something; the owner of the software should not be the only one who develops it; and we should be happy if, having started a piece of software, you no longer work on it in the future. One of the things I say to people who want to write new software is: before you start, create your exit plan, because you don't want to be developing and maintaining that software in five years' time, and you don't want to be developing it when you're 60. So when you start software, what's your exit plan? Be happy for your software to be developed by other people.

Does this require the carrot or the stick?

Both. If you look at a funder's software strategy, they're very much saying: we want interoperable software, we're not going to fund the monolithic model anymore. Here's the carrot: we'll give you money if you manage to make interoperable software, and that's measured in practice. And that's what we're trying to do in the calls themselves: the call is not "write a new piece of software to do it", the call is "how do you make it interoperable" or "how can you add the functionality to existing codes". The fact that we got BioSimSpace funded, which is ultimately just a load of file format converters, was one of the first things that came out of this: we didn't produce new software, it's just file format converters. As for the stick, the stick I think is the community, which you nudge. Ultimately we shouldn't have this balkanisation. You know, as a PhD student, if you're
told by your supervisor to use this particular package, you will do everything in that one package. The classic one: is anyone from Arieh Warshel's group here, before I offend anyone? I really like Arieh, we get on relatively well. The problem with that group is that they have their own stack for everything, and if you've been in the group you will be in that stack and you will never go out of that stack. You have this whole parallel molecular simulation world existing in that stack, and it's fantastic, because the methodology is developed 20 years beyond where we are; obviously we don't know how it all works yet, and there's a whole career to be made deciphering the papers and translating them. But we should push back and say: this is something I should be able to do in other tools; I should be able to choose the tools I'm using; and I should be pushing towards tools which are more open by design, making those decisions early on in the project.

So are you suggesting that people start to change where the power sits? That's quite important, because ultimately, as came up before, it's culture change: you're told by your supervisor to use a package because they used it when they were a postdoc 20 years ago. That's why in this field we are still doing the same things we were doing when I was a PhD student in the 90s. We are still using text based interfaces and logging in with SSH, even though we have graphics now, which are more expressive. Why are we using text consoles when we have graphics?

Data sharing and cleanup: we need to share our data, and we need to engage in a process of cleaning up the tools and cleaning up the data we're producing, because ultimately there is a reproducibility crisis: we just run a simulation and publish the result
immediately if something interesting happened in the simulation, and we won't run it again in case it doesn't happen again. As I say in my software engineering course, I try to get people to do testing, and I get them to do it by pointing out that at the moment the way we do testing in science is: you run your script, and if it all looks right, it worked; if it doesn't look right, you fix the script until it looks right. That's how some scientists do testing. But there have been major, major retractions now in Nature and Science and elsewhere, where scientists have had their careers destroyed because the software was wrong, because it wasn't tested. We need to get people to actually do unit testing, regression testing, performance testing, and to make it so we can take all of this data being shared and do something with it, i.e. run it with lots of different tools, so we can see which tools work and which tools don't, and so we can regression test as we develop new versions of tools.

Getting at the discussion I had just now, and I apologize for this: we have a fear of writing software that depends on other people's code. "Other people's code is a cesspit that I don't want anywhere near my code." We're terrified of depending on other people's code, so we end up putting everything into one monolithic thing. The discussion on having a single file format kind of revolved around this: we want a single file format that we can write ourselves, so we don't have to depend on a third party, but we'll give the single file format out as a third party tool for other people to use. That's a fundamental contradiction. You have to accept that we're going to have dependencies: to do any real work you're going to build on top of other people's code, so we need dependencies. We have to accept that. I think it's quite difficult for people to accept, but it is important. But then, when we accept that we have
to depend on other people's code and other people's software, dealing with dependency management is extremely difficult. I hate dependency management. I hate Conda now. Conda was, like, you know, Star Wars episode 3: "you were the chosen one". There was a meme with Conda for that, because Conda was meant to solve all of our dependency management problems, but when Conda installs something it breaks everything else and changes the version of Python, for crying out loud. But we need to fix this. If we in the field could fix dependency management, and all got together and said "let's make proper Conda packages that actually respect dependencies and report the things they depend on, and let's curate a standard set": if Bioconda actually worked like that, it would make things so much easier, because then I could do conda install gromacs, conda install amber, conda install all of these tools, and it would work. That would be my promised land. That is why installing BioSimSpace is such a pain in the neck: what we had to do is conda install everything, lock the versions down, create a binary package, and then you unpack it on your machine and never run conda update or conda install in that environment ever again. I'm going to volunteer, mostly, to help out on this, because I know that my former postdoc is happy to wade in and resolve some of these issues, and I think some of the lessons could be shared a bit more broadly there.

And then: your programs crash too much. Sorry, but they crash. They're not robust, they're not sufficiently modular (I think we said that before, under stratification), they're not interoperable. These are all real problems, and it's really annoying. The D3R challenge: we would have finished it probably last Tuesday, but we're running the free energy calculations and they just randomly crash. This is the only field where we accept randomly crashing
software. It's like, would you accept that from your Airbus autopilot? And that's a significantly more complicated piece of code than any MD package. Though, to be fair, some of them do have to be rebooted: one aircraft had to be rebooted every couple of hundred days or so, otherwise it would fall out of the sky. I think it was the 787. Yep. I'm disappointed; so much for software engineering in the aviation industry, and I'll now be flying Airbus.

The other thing, about methods: if we could find a way to actually publish protocols and share protocols, we could just say "we used standard protocol X", like the Jameson group protocol X for setting up ligands, and almost give a DOI to it. If we can do things like that, you can also start thinking about automating the writing of methods sections, which would be really useful. Difficult, but very, very useful.

There's protocols.io, an existing tool that does that, I think. The other existing tools are HTMD and BioSimSpace, though BioSimSpace is not an existing tool, it's an existing development; I'll keep saying it's not complete, we are still halfway through the project. What we're quite keen on is tools being developed collaboratively. At the moment we're writing the wrappers around those tools ourselves, but what would be really cool is if the people who develop a tool, like a tool to protonate a protein or a tool to parameterize a ligand or whatever, would come to us. We're very happy to talk to you and help you write the wrappers around your tool, so that we deal with all the file format support, you only deal with one file format, and yet your tool can still plug into everything else. That's hugely important.

And data publication. I can't remember who raised it, but it would be really good if we actually published our data. The
reason I came back: I've spent too long in my career working in this field, and I left it for a bit. As a PhD student you were always given something from a previous PhD student and had to reproduce it, or you had a result in a paper and had to reproduce that. Just getting to the idea that somebody would actually share their starting structures and their input files: back in the early 2000s people were not doing that. People were not sharing their starting structures and their input files, and they still hold on to them, thinking that somebody is about to steal them at any moment and rerun the simulation before them. As I said, all of our input files and starting structures for the D3R challenge are on GitHub now. As software developers, when you're coding, it goes live onto the internet now; it's not waiting for publication so that three months later they'll publish it. You're doing it live.

Can I just also say, I think that's also really important, because if you share a bunch of files and call them file1, file2, file3 with nothing else, that doesn't say much; there's not much point in doing that.

Agreed; if you want to reduce friction, that matters. One thing that has been mentioned already that really reduces friction is when things are documented, and documentation is somewhere on the board here. But documentation is only good when you already know what you're doing, so training is important as well, and training is up there too; there's a big overlap between the yellow notes. This is one of
the things: all of this documentation and training would reduce the inertia there is within a group to use one specific tool. A specific tool is used because the PI said so, but also because there is expertise: people know how to use that tool and do not know how to use the other tool. You would need to lose three weeks learning the other tool, and then hope for the best that you are using it properly. So we are not looking enough at the other tools, the ones that exist already, and that is partly because documentation is generally lacking. There have been efforts to address that, for example a giant database cataloguing a bunch of tools, and it sounds useful; but that's not training. We know a tool exists, we know it has many lines of code, but that does not tell me how to use it properly, and I don't know whether I used it properly until I have been trained.

So there are many tools, and we should have more tools, but that requires tools to be general enough that they only do their thing and they do it well. That would be highly facilitated if we had a common data model, and we go back to the first topic of this morning: we do not have this common data model that is agreed upon yet. But we do have tools to convert between file formats, tools that can produce input for many simulation engines; we have acpype, which can convert topologies from some formats to some others. All of this would just be easier if we could agree on what a bond is. We have all these tools, and they all deal nicely with what they know, but we don't all do the same thing; if it were always the same simulation to run, we wouldn't need that many of them. Instead we run simulations with stuff that is slightly weird, not covered by the tools, like a weird post-translational modification: suddenly your protein is not a linear polymer
anymore, it's a branched one, and pdb2gmx breaks. So we need to be more flexible, to handle these non-generic aspects that break the assumptions of the existing tools. Which goes back to documentation, because often a tool breaks because its assumption is not documented. How many tools assume that your atoms have an element? Doing coarse-grained simulations, I do not have atoms with an element in the periodic table; I have to lie to OpenMM. We have all these non-generic cases that exist, but nobody knows about them; well, the tools do not know about them. And if you are not careful, you end up with something that appears to be right but does not account for that weird molecule that was there in your file and got removed magically. So we need sanity checkpoints, a way for a tool to say "I don't know what I'm doing here". Actually, this has been mentioned already: this is the error reporting. Error reporting, yes, which I'd phrase as "meaningful error messages" or something; we can refine it at some point.

And finally, one of the problems we have is that the people who write the tools and the people who use the tools are less and less the same people. So when I hit a strange case, I need to report it in a meaningful way to the person who is writing the tool; and the developer, going back to documentation, needs to say: hey, I made this assumption, be careful with your stuff. This is a feedback loop between developer and user which we need, and it's not completely defined yet. And I think I'm done.
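The optional-element point and the sanity checkpoint could be sketched together like this. Everything here is invented for illustration (the class, the bead, the error text); the design point is that the data model admits element-less particles as first-class citizens, and a tool that needs elements fails loudly instead of silently dropping or guessing.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Particle:
    """A particle in a common data model.

    `element` is Optional on purpose: a coarse-grained bead has a
    name and a mass but no entry in the periodic table, and a tool
    that needs an element must check rather than assume.
    """
    name: str
    mass: float
    element: Optional[str] = None  # None means coarse-grained bead

def total_mass(particles):
    # Works for atomistic and coarse-grained systems alike.
    return sum(p.mass for p in particles)

def elements_or_error(particles):
    """Sanity checkpoint: refuse to guess when elements are missing."""
    missing = [p.name for p in particles if p.element is None]
    if missing:
        raise ValueError(
            f"no element for {missing}: coarse-grained input? "
            "This tool assumes atomistic particles.")
    return [p.element for p in particles]

water = [Particle("OW", 15.999, "O"),
         Particle("HW1", 1.008, "H"),
         Particle("HW2", 1.008, "H")]
bead = [Particle("W", 72.0)]  # a MARTINI-style water bead, no element
```

Here the error message is itself part of the feedback loop: it names the undocumented assumption ("this tool assumes atomistic particles") instead of removing the weird molecule magically.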
Unusually among the groups here, we identified things that were important without requiring the frame shift that all of the discussions so far have. Among the things that we think are easy ones: we can have validation suites for force fields. As a developer I frequently get requests like "can we have CHARMM36?" Yes, I'd love to; please show me the GitHub repository with all the inputs and outputs and the values I have to reproduce. What we have in GROMACS at the moment are the things that were contributed ages ago, and I can't say "yes, this absolutely implements something faithfully" unless there is a test case for the force field. You don't just need tests for software; you need tests for your physical models as well, and that comes in a lot of forms. Finding what is meaningful for validating a force field is not trivial, but there are some easy ones. We should have free energy cycles that have to sum to zero. We should be able to reproduce, within the limitations of the force field, the things it was parameterized to, like densities; these we won't reproduce exactly, but at least we should agree, between code versions and between codes, on some limits beyond which we go and investigate a problem. But initially what we need is just to reproduce the single-point energy: that's the best one, because otherwise you're involving things that have nothing to do with the force field, the choice of barostat, the choice of thermostat. Those are important, but that's a capability test of the code rather than of the force field; it comes back to usability, because people don't know what to choose and are not equipped to evaluate what to choose. In terms of validation, you just want to be able to show that you can reproduce the energy that's reported for those molecules. Then I think a
nice example was between Amber versions: we had results change silently, somewhere between Amber 8 and 9, or 9 and 10, when they changed the way they interpreted their own torsions. A completely fine choice in itself, but the way we found out was that it suddenly impacted results, and it was completely undocumented; release notes, some people read them, some don't. The point is not to blame Amber; the point is that if they had an internal test suite, it would have been clear to them: we no longer pass our internal tests, we have now updated our regression tests. And we would have been able to see: oh, they've updated their regression tests because they changed their interpretation; good, then we need to change our interpretation too. As a community, when we hear someone publish a new force field, we should ask them: where is the test suite? Without one, it won't faithfully be available in your favorite tools. And "test suite" should have a specific meaning that we agree upon.

We found something of more importance that is also quite easy: evaluating the emerging data models for molecular dynamics. Are they fit for the sort of simulations that the community wants to be able to do? Are we able to agree upon simulation protocols that can be implemented within these data models? Is there something that we, as tool and application developers, could actually implement as a higher-level thing than the ad hoc file formats we currently have? People have worked hard on data models, and we should give them feedback, because there are very few people working in this data model space. The straw men they are proposing, the fact that they've got a well-defined data model, is something that we should either criticize, if we can, or agree with, if we can. That is a very easy thing to do as an application developer: once I'm aware of what the data model is, I can say yes, I can implement that; good, rearrange the
stuff, done, and now we have something that's very easy to build tooling around, for example. We thought it was important, but a little more difficult, to improve the user experience around producing parameters for ligands, particularly for atomistic force fields; there are people contributing in that space. There are relatively few people contributing for coarse-grained force fields, where parameterizing ligands we thought was actually more difficult, because few people are working on it and how to get it done is less well established. (We are working on it. Good, glad to hear it.)

So, we have in GROMACS a tool, pdb2gmx. Jokingly, we rephrased one of the frequently occurring requests that we get as tool developers: "please just let me run my simulation; I want to give you a PDB code and then I want to run a simulation." I would jokingly describe that as a PDB-code-to-TPR tool. My response to that is: if I can do that for you, what do you as the scientist do? Why aren't you doing something more honest with your time? Because there is a whole bunch of questions that we as scientists need to think about: what protonation states make sense, what the experimental protocols actually translate to in terms of something we can simulate, and so on.

I think there's a really important point there, because as we make these tools more and more accessible, people who are not thinking too much about this stuff will want to use them; so that goes back maybe to training or other aspects. Sure, absolutely: I could build that PDB-code-to-TPR tool, and then we'd just have a black box which does everything for you automatically. But what's wrong with the black box if you know how it works? If you don't have the knowledge of how important each step is, it's not the same argument; you need to know how to
You need to know how to run the protocol, but you don't learn anything from setting up a simulation; you learn something from analyzing the simulation. Well, you do learn something from the simulation crashing. I think it's ridiculous to imagine the scientist as an artist who is somehow shaping something out of the PDB file to reach whatever conclusion; that's not quite true. Presumably we believe, as a field, that there is a best way to do something if you're after a particular question, and that there should be a standard way of making the right decisions along the way that gives you a robust simulation. So in principle we could write this, provided you got some expert information. What is the expert information that you need to put in so that the result is not a random guess; what would that be? That's what you mentioned in your example of a methods section: it could be something about the experiment you're trying to reproduce, though maybe only certain experiments are amenable to that kind of simulation right now. And speaking of protonation states: if people trust the tool to set those states, should we take that choice over, and would that make the code more usable? No, but nobody can force you to do good science if you are not willing to; there are, however, good tools that can help you do good science faster and more efficiently, at least in the domains and problems they cover. People should then know about these tools, because many just don't, and when the tools are out there they should be shared. Going back to this: most people probably just want to run a few nanoseconds of the protein in water, at some pH and 0.15 molar salt or whatever creates physiological conditions, and that alone would make a lot of people happy. But one of the reasons people pay ridiculous amounts of money for the tools they use in pharma is because they
have really good tools to set up systems from PDB codes, and they've got the protocols. The commercial suites put a lot of effort into taking a PDB and preparing it, and into presenting the choices in the GUI so that where the user has to make a choice it's "okay, user, which do you want?" and you just pick and choose. It makes it really simple. At the moment, in the academic space, we really haven't tied these protocol tools together for protein setup at all. We don't have a good tool that does loop modelling, because no one has really put effort into building missing residues; it's an abandoned thing. I wish we had an open-source loop modeller, because at the moment Modeller is the only thing you can use; it's the closest thing, and it's that protocol thing again. Is there anything in the tools category that does the modelling for you for free? Yes, there are a lot of loop-modelling programs; certain ones have been developed in Oxford, and I won't shout out names or teams. The general problem with loop modelling is that the regime where there is a significant quality difference is very narrow: for very short loops everybody gets it right, they just close the loop; for many missing residues nobody gets it right; and there is a regime somewhere in between where some tools do better than others, but it is relatively narrow. For that reason it's not that important to have the best loop-modelling tool ever, but I would love to have a script for "oh, I have six missing residues." We have this implicit knowledge within the community of what will work and what will not work; there are actually protocols which we teach our PhD students, saying you do this, then you do this.
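The three loop-length regimes described above are exactly the kind of implicit community knowledge that could be encoded in a shareable script. A minimal sketch, with placeholder thresholds that are illustrative rather than community-agreed numbers:

```python
# Illustrative encoding of the loop-modelling regimes discussed above.
# The cutoffs (3 and 12 residues) are assumptions, not established values.
def loop_modelling_advice(n_missing_residues: int) -> str:
    if n_missing_residues <= 3:
        # very short loops: essentially every tool closes these correctly
        return "any tool: short loops are easy"
    if n_missing_residues <= 12:
        # the narrow regime where tool quality actually differs
        return "tool choice matters: compare candidates"
    # long gaps: no tool is reliable, so reconsider the model
    return "do not trust automated loop modelling; rethink the setup"

print(loop_modelling_advice(6))
```

Even a rule this crude, once published and agreed upon, would turn "ask your PhD supervisor" into something a workflow can check automatically.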
It's just about finding ways of encoding that in shareable protocols, and then you learn a little about what you've been doing when you can say, okay, we really do have six missing residues and I'm happy with it, or you know that you should not continue the workflow. Yes. I'm saying let's share those protocols for doing protein setup and ligand setup, so that rather than the hidden in-group way we build the ligand and the hidden in-group way we prepare a protein, it happens in a public arena. One challenge to think about from a user's point of view: I'm very much inspired by the idea of using other tools, but in GROMACS pdb2gmx is a case in point. It has actually caused a whole lot of problems over the last few years, because it is a guessing solution that tries to fix things up for you, which is a very different kind of problem. However, the general challenge for a user is of course being able to use some set of tools to prepare a system and then actually run it. It would also be a frustration for users if they are repeatedly told, "you can't do this, you need to find extra program one and two and three and four and five and six and seven." On the other hand, Linux distributions really hate it when you aggregate and pull tools into packages, because suddenly you're going to have fourteen installations of the same thing on the system. So maybe we need to find some way of grouping tools to make it easy to pull them down, so that they can be maintained externally but, when we actually install, you will have them installed. This is what my hope was with Fonda: when we originally built it, for tools that already existed it detected them, and otherwise it did a conda install; one little lab did that beautifully. So we realised that conda could do the heavy lifting, and that if we fixed the dependency-management problem, the rest would follow.
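The detect-then-install idea is simple to sketch. The following is a hedged illustration, not the actual behaviour of any named tool: it checks whether an executable is on `PATH` and, if not, constructs (but does not run) a version-pinned conda install command. `ensure_tool` and the pinning convention are assumptions for the example.

```python
# Sketch of dependency detection: reuse an installed tool if present,
# otherwise build the conda command that would fetch a pinned version.
import shutil

def ensure_tool(name: str, version: str, channel: str = "conda-forge"):
    """Return None if the tool is already on PATH, otherwise the
    conda command line that would install a pinned version of it."""
    if shutil.which(name) is not None:
        return None  # already installed; avoid a 14th copy on the system
    return ["conda", "install", "-c", channel, f"{name}={version}"]

print(ensure_tool("sometool", "1.2.3"))
```

Pinning the version is the part that pays off for reproducibility: the workflow records exactly which build of each external tool it ran against.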
Then the user shouldn't see that a tool doesn't exist; it just gets pulled in behind the scenes. We talked about that before, and it makes it easy to start new things. I can attest that this can work: we do it all the time with our tools, which pull down some forty dependencies that we don't have to write ourselves. It reduces the maintenance burden on us, because we can use well-developed, well-maintained tools and pin them if there are any problems. So I think it's something that can be solved as a community. Now I want to give Mark an opportunity to finish his points. Something that we talked about, particularly from the GROMACS point of view, is that we need to work on separating our analysis tools from the preconditions analysis tools need. Do I need molecules made whole? Do I need to cluster things? That is separate from how I do the actual post-processing. This matters because, as a community, we can then move away from "I have to fix PBC before I can do anything else with it." It is about separating the data-model aspects: okay, I have an analysis, it requires these preconditions; something needs to establish my preconditions; they might already be true; my user might know they're true, or might have given me enough information to establish the precondition. All of this will enable us to implement analysis tools in a transferable way, so that we build up a data model that actually works and develop analysis tools that actually announce what their preconditions are. We have something like five different variations of the -pbc implementation in GROMACS, and I assume other people have similar skeletons in the closet; we need to work on this. We also upgraded the need for documentation to the need for stellar documentation. Yes, that's true.
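The earlier point about separating analyses from their preconditions (molecules made whole, PBC fixed, trajectory clustered) can be sketched as follows. All names here are illustrative, not an existing API; the point is that each analysis declares what it needs, and one shared runner establishes any precondition that does not already hold, instead of every tool carrying its own private -pbc fix-up.

```python
# Sketch: analyses announce preconditions; a single runner establishes
# the missing ones from a shared registry. Names are hypothetical.
ESTABLISHERS = {
    "molecules_whole": lambda state: state | {"molecules_whole"},
    "centered":        lambda state: state | {"centered"},
}

def run_analysis(name, preconditions, state):
    """Establish any missing preconditions, then 'run' the analysis."""
    for pre in preconditions:
        if pre not in state:
            # one shared, tested implementation per precondition,
            # instead of five private -pbc variants
            state = ESTABLISHERS[pre](state)
    return f"{name} ran with state={sorted(state)}"

print(run_analysis("rmsd", ["molecules_whole"], set()))
```

Because preconditions are data, the runner can also skip work the user has already done, which is exactly the "it might already be true" case above.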
The single most valuable thing we found in the GROMACS project for making sure that we have tests to go along with the code, and documentation to go along with the code, is that nobody can change the code until other people have reviewed it, and that this is the community expectation. For that, everybody, including the PIs who run the project, has to go through the same hoops: you can't change the code unless you have tests for your code and documentation for your code, for both developers and users; otherwise the change doesn't go in. In the more difficult category, we need general, well-designed input and output descriptors. This comes back to some of the other, more difficult issues we've been tossing around: we don't have a good way of describing a chemical species in a machine-readable way that allows us to understand, oh, what is in this PDB; I need to go out and model these residues because they're missing; I need to go and choose a protonation state. All of these things are very difficult for us to grapple with at the moment. We also don't have general, well-designed output formats; all sorts of things need to get improved in that space, some of which we talked about earlier today. And in the long-term category, we need to start evolving high-level APIs, such as data models. BioSimSpace has started to develop a data model, and as a tool developer I would very much like to be able to build upon that, in terms of expressing a pattern that we won't need to write shims around: the data should be in memory, and we should not need to write intermediate files to disk. Alright, thank you so much. Well, okay, look at the most important slide, going from easy to hard; at least we have something emerging. It starts with better docs and training, and how we get there is anybody's guess. You've been good about enforcing that you have to have
documentation with your code, but are there other ways we can encourage our community to do that? Then we have modularity; we have a common data model; we have APIs pulling from up there, as Mark just mentioned; we have escape from dependency hell; better error messages; and finally, up at the top, we need a culture change that rewards modularity, which people said was hard. All these things are possible technically: it's possible to write a common data model, it's possible to write good error messages, and there are technical schemes that facilitate that. It's also possible to resolve dependency hell if we work at it. But what is the societal way we can make sure that this is what people want to do, or that people are encouraged to do it? People in this room are certainly motivated, but is there a way we can facilitate or encourage that development through some change that we can make?