[inaudible] ...the mdp format, for example. It has parameters for minimisation and it has parameters for dynamics, so it's a start. It probably doesn't have everything, but it allows you to describe the parameters of the simulation. The fact that we're meeting under the BioExcel banner tells us something: the initiative for BioExcel is already in place, so that's obviously a very good thing that is helping to develop the field. MDAnalysis is an analysis tool; MDTraj is another, which is no longer supported, but MDAnalysis is a way forward for future analysis, and there is certainly still a lot happening in that zone. Eric mentioned this POS white paper that is being discussed; is that out yet, Eric? No, I think it hasn't appeared yet, but it should be out in a week or so. I mean, that's an essential initiative that has got both the key players within the field and also the publishers on board as well.
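To make that concrete: an mdp-style description is essentially a flat key-value map of simulation parameters. A minimal sketch in Python, assuming a handful of standard GROMACS mdp keys; the values are illustrative, not a recommended setup:

```python
# A minimal, hypothetical rendering of the kind of key-value parameter
# description an mdp file provides (keys follow GROMACS mdp naming).
import json

mdp_params = {
    "integrator": "md",      # leap-frog molecular dynamics
    "dt": 0.002,             # time step in ps
    "nsteps": 500000,        # number of integration steps
    "tcoupl": "V-rescale",   # thermostat choice
    "ref_t": 300,            # reference temperature in K
}

# Because it is a flat key-value map, it serialises trivially:
print(json.dumps(mdp_params, indent=2))
```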
And we also wonder whether there should be some sort of benchmarks put forward as well, which can be discussed. In terms of actions, we need to identify what the needs within the community actually are; that can be something we discuss over the next few days and then come up with our own white paper. There are readers in place, but I think GROMACS needs an mmCIF reader. I don't know exactly how, but obviously you can convert back to PDB format before then going on to use HADDOCK. We need to talk to the various PDBs, including the wwPDB, just to get some sort of consensus on this required format.

The idea was there; also, just to comment on that one thing: Eric was saying it would be cool if you could go to a PDB entry and find all the simulations that have been started from that entry. But for that, I think it's important that we talk to them: if you were to integrate this data, what kind of format would you like to support? I don't think that goes into the file formats themselves, right? Provide information alongside the file formats to make it easier for people to mine everything and start creating these links for people. But if we come up with a format where they have to change all their machinery to be able to use it, that's going to be a huge effort on their side and they're probably not going to do it. Whereas if they can say "provide us JSON", then it's fine, and they know they can just integrate it. Yes, I've actually written that down as well, so that it links directly from the PDB to a sort of simulation information.

We need to talk to the journals as well, just to get an idea of the sort of benchmarks, or otherwise the integration of all the relevant parameters, when it comes to publishing a paper. A couple of key things that Lerwick mentioned as well: reproducibility of the data, and that's certainly something that we should act upon; and also looking at other formats such as YAML and JSON. There are probably a couple of things I missed, but I think that's enough there. Good, thank you. So we might move on to Team Yellow. Can you elect a spokesperson to tell us about your findings? Shall we go over to the new board, since it's similar to what you've already talked about? Move the board to the front, yeah. [inaudible] OK, we'll hold it for a minute or two.

So let's talk about what makes it possible. Actually, the boundary between what makes it possible and action is kind of thin. It needs to be extensible, OK? It needs to be something where we start with a core set of elements, but that is not a closed set: you should be able to add things when you think they're needed. OK? I don't think it already exists... it feels to me more like an initiative rather than something that we already have. Doesn't that make it an action? It doesn't make it possible; it's something that is needed to make it possible, so it's an action. Initiatives are what already exists, is that correct? So we already have this key-value tree format, JSON. That already exists; something we could build on. I think that's why I thought it was more an action than really a positive.
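As a concrete sketch of the kind of record meant here (the field names are hypothetical, not an agreed schema), something as small as this would already let an archive link an entry to the simulations started from it:

```python
# Hypothetical JSON record linking a PDB entry to a simulation,
# small enough for an archive to integrate without changing its machinery.
import json

record = {
    "pdb_id": "1ABC",                     # entry the simulation started from
    "simulation_doi": "10.0000/example",  # placeholder: where the data lives
    "engine": "GROMACS",
    "engine_version": "2019.3",
    "force_field": "CHARMM36",
    "trajectory_length_ns": 500,
}

print(json.dumps(record, indent=2))
```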
That JSON exists is a positive thing. Yes, absolutely; it is a positive and not an action. We have existing formats, and JSON exists as a format. So this file format, and that's again an action item, needs to be something that is hard-wired in the software. You need to come with the idea that you can take the file format and use it directly in your core software. So that goes together with buy-in from the key software packages: it cannot come just from the user point of view, it needs to come also from the software providers. But actually, to convince the core software packages to do this, it needs to be shared ownership. You cannot just say to someone, just do it and that's it. It needs to find its own priorities within the initiative, so it needs to be shared ownership between the different partners.

We need to have a portable library API, like the one we could find for the IHM format with its Python API. Ultimately, you would have to extend it to C, the common denominator, to allow users to easily write and read this format. And we need a way to define and validate these kinds of data, like the API there is for the IHM mmCIF, with a dictionary against which to validate the file whenever we want to add something. A clear definition of the metadata is a big part of the problem; software people need to do things, but actually, once you have this definition of the metadata, it doesn't have to come from the software teams themselves. It needs to be a self-contained file, in the sense that it describes one simulation.

So we discussed a lot about this one-file-or-not concept: should we have, on one side, the parameters or something like that, and on the other side just the coordinates, or should we have everything in one file? The best way to achieve what we want is to have a file where everything is in there. OK, it might raise some issues in terms of performance and the size of the file, but that shouldn't be a limit we impose right now, first because of the technology that exists: we may be able to overcome these limitations more easily over time, since technology can absorb extra compute cycles, but humans make mistakes.

Can I just say that, given the actions are again quite large, I'm a bit concerned about the workload for software developers. As was said earlier in the morning, we need to really refine exactly what is achievable in the near future, rather than just making a wish list of whatever is possible. Let's take that as an action item after we've collected everything. What makes it complex: different priorities for the different people involved, different needs, different priorities; the diversity of the data we need to store and share; storage space. So I'm wondering what the MVP is, or what will have the most impact. The what? The MVP, the minimum viable product; so what would it be? I think it's a lot of work. It's a lot of work. On the initiatives, I think we already said MDAnalysis and MDTraj are two examples of efforts that have been made to have something generic enough to read and write different formats, so we can take inspiration from those initiatives. And the IHM family I can actually... OK. What's the last one? The IHM mmCIF initiative.
The one that Alex did, right? So let's go with... [inaudible] ...I have the next one, you're doing the... [inaudible] So, listening to everyone, something that kind of struck me is that what you are actually trying to do is build ontologies. There are actual ontology initiatives that try to interface from quantum mechanics up to the actual mechanics of a car. It's a very complicated thing, a very long-term thing, but just know that these things are out there and people are at least trying them, so it's worth at least seeing what they do. I think the way these things will eventually work is for every little field to have its own ontologies, which will slowly grow together, hopefully, over a very long time. So I don't know if this is a positive thing, given the time investment compared to the payoff, but at least it deserves to be on the boat.

And on that, going on from ontologies, the other kind of neat technology that you start thinking about is the semantic web. There you have things that are not necessarily glued to a single schema; but how do you create schemas that can interoperate, how can they all connect together? Again, I don't know if it's really a negative or a positive, simply because it's an extremely complicated idea and not something that I would recommend doing right off the bat, but, again, it should be on the boat and something to be aware of.

On that, a big positive is that there are similar initiatives in other fields. In quantum materials there are four or five; in quantum chemistry, mostly actually one; in things like cryo-EM there are tons of specs and schemas coming out. So there are a lot of initiatives that are relatively similar, and good ideas that you can pull from there. Most of those are the pink ones, actually. Where's that one? Oh, that's below. So "initiatives already exist in other fields", and then ontology we had as a negative, but I guess it's on the boat. Yeah, just a second: I think this one is very particular to the actual parameters, like an ontology for parameter space, whereas these are general ontologies that stretch the entire spectrum. That there are prior ontologies is very important; we have problems with it.

Another one I have as a positive is that there's a clear benefit, although I have put a question mark against this, because I think everyone thinks it's a clear benefit simply because we clearly want this a lot. But why do we want it a lot? What are the actual outcomes we're looking for? That is something I think we should try to answer as well. So I'm going to put this kind of just slightly above the water. Let's see. I think one negative is that in a lot of ways MD programs are fundamentally not interoperable with each other. There are a lot of commonalities, but there's still a lot of functionality in each one that's specific to that single program, so I think this is something we should be aware of. And in the same vein, they also have completely different data models: how they actually think about their data, how they organize their data, is completely different between programs, depending on whether you're on GPUs or general HPC x86. It's completely different between programs. Do you have an example of that? Does anything come to mind? Sure.
So I think this goes back to: if you have a Fortran program, you usually want really large, contiguous arrays; if you have a C++ program, you usually go for more hierarchical, kind of object-like models. So fundamentally they're organized differently, and how you organize, say, your coordinates is actually different between programs. Sure. And then when we come to exchanging things, in principle that implementation detail should go away. It should go away, but people very much care that they can read data directly into their program. I'm not sure why, but it's something that we see across the board: people are extremely concerned with their internal representations being the same as the external representations. A good example is that in a lot of the ways people write coordinate files, it's all the X's, then all the Y's, then all the Z's, because that's what's coming from the program; whereas if we take the PDB, it's atom, X, Y, Z, atom, X, Y, Z, atom, X, Y, Z.

So essentially what you're talking about here is a data model. The PDB data model is that we have frames, and inside frames we have ATOM and HETATM records, and inside an atom we have an atom name, a residue name, an X, Y, Z coordinate, and so on. But that data model can be represented on disk in many different formats, and the format in which the data model is actually represented on disk really doesn't matter, because it's the data model, how bits of data are interrelated with one another, that is what we're seeing in the PDB. What happens is that when I read it into a program, the program will extract the bits of that data model that it supports and then arrange them in memory in the way the program needs to run most efficiently. Now, the issue we have in the field is that we tend to write programs that go directly from the disk representation straight to the application representation, and we don't actually think about the data model that is represented in that file format.
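A minimal sketch of that separation, with illustrative names rather than any package's actual API: the in-memory model knows nothing about serialization, and interchangeable writers render the same model to different on-disk forms:

```python
# Illustrative data model, decoupled from any on-disk format.
from dataclasses import dataclass
from typing import List
import json

@dataclass
class Atom:
    name: str       # e.g. "CA"
    residue: str    # e.g. "ALA"
    x: float
    y: float
    z: float

@dataclass
class Frame:
    atoms: List[Atom]

def to_pdb_like(frame: Frame) -> str:
    # Simplified fixed-width records in the spirit of PDB ATOM lines
    return "\n".join(
        f"ATOM  {a.name:<4s}{a.residue:<4s}{a.x:8.3f}{a.y:8.3f}{a.z:8.3f}"
        for a in frame.atoms
    )

def to_json(frame: Frame) -> str:
    # The same model, serialized a completely different way
    return json.dumps([a.__dict__ for a in frame.atoms])

frame = Frame(atoms=[Atom("CA", "ALA", 1.0, 2.0, 3.0)])
print(to_pdb_like(frame))
print(to_json(frame))
```

The point is that both writers consume the same model; supporting a third format means adding one more writer, and nothing else changes.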
What we really need to be doing, rather than asking whether it should be in JSON or XML or NetCDF or whatever the final format is, is actually saying: what is the data that we need to represent? How does that data relate? What type of data is it, arrays of floats or doubles? How does that data get packaged up with other data? So we have data for a molecule, but then there's also data for parameters, then we have data for running simulations, and so on. Again, what are those bits of data and how are they interrelated? And once you've built a data model that says how all the bits of data are interrelated, then how you've mapped that data model to disk, who cares, really? One file, a hundred files, a thousand files, who cares? Because ultimately it's that data model which is the thing you then want to take into application space to run applications. If everyone is working from a common data model, then we can support lots of formats underneath, and on the application side it's the data model they actually interact with as well.

And I am quite biased in saying this, because this is what the BioSimSpace project is: we basically have an extremely flexible data model that relates parameters, force fields, molecules, molecular systems, how you run simulations, and so on, and we deliberately do not write this data model to disk. What we have are interconverters that effectively enable you to read and write any type of molecular data, any file format, into the data model, and then the data model writes the input files for the individual programs to run them. Effectively, MDAnalysis does that for trajectories. Cool, I'll look forward to reading more about that so we can get the tools in place.

Sure; that actually covers a lot of the sticky notes, that internal and external representations should very much differ. Underneath the initiatives we have, again: think models and not formats. I think a lot of people are thinking, go for JSON or something like that, and I would argue that this is kind of the wrong way of thinking about all these things. Instead, what you should really be thinking about, again, is: what kind of key-value-array sort of thing do you have? What kind of fundamental structure do you have? How do you collect the various objects that live in these kinds of ideas? A lot of people say, you know what, let's do JSON because JSON is great, and that's true, but JSON in a lot of ways is actually a really bad format: as you've seen, it's extraordinarily slow, the numbers are not serialized in any binary way. Instead there are all kinds of things like MessagePack or Parquet, and these can actually represent arrays, and the thing is, you can convert between them and JSON and YAML and whatever you want extremely easily. So I wouldn't think about JSON itself; I'd think about "JSON" as in the key-value-array structure. That's what you should think about, not JSON.

And, as was pointed out, people are doing this data-model idea already; it's actually pretty common. MDAnalysis and MDTraj have mostly had a project for interconverting different kinds of trajectory formats that has internal data models as well, so you can find a lot of people trying to centre on a representation; we should be taking ideas from them. And then the final action that we really want to state is that, in all of this topic we talked about, what we're really trying to do is figure out where the data model split is:
what is trajectory and what is metadata? I think that isn't very clear, because a lot of people have, in the trajectory, what you would think of as metadata, and vice versa. So the question is more: how do we conceptually organize this data? Thinking about that will actually dictate a lot of what we think about in terms of file formats, how we actually write these things, and how we handle them. That is something I actually had to think about. And is that going on the board? Identification of needs. Yes. The nice thing with data models is that it's not up to the software programs; it's up to the field to say, this is the data and this is how we need to arrange it. And once you've got the data model, that's actually quite simple for a software program. A data model, I've already seen attempts at that, trying to get software programmers to go along and work together, so it's a problem for us.

We don't have that much, and we had a little bit of a challenge, because at our table there was nobody who was actually developing file formats or these kinds of programs, so we were thinking a bit more about what we, as users, would like to have. So, first of all, we want it to be storage efficient. We don't know how to really do this; people are already trying out different kinds of file formats. [inaudible] So we have "storage space". Yes. We also agreed that, for us, the different existing file formats are not really that much of a problem, so we don't aim for having just one file format, because there are so many tools which easily convert between them. But we would rather like all the formats to carry some standard content, so that every file format in these workflows holds the same kind of information that can be extracted from it; so that, in the end, the conversion tools, like you said before, have something solid to go on. So we have competing data models that are currently expressed in file formats; is that one of the ideas?
Yeah. OK. Yeah. The metadata, like the JSON file, which is more or less... So you have an existing file? Yes, exactly; it was mentioned in the talks, something like this, which includes the information on where exactly these structures are coming from. In the end it would be: it was started from this PDB file, then information on how it was processed. That's absolutely a wish list, but it would already be great if you could have, with some files, the information on where you got the data from, how long your trajectory files are, and these kinds of things.

Also, a thing which we would like to have is something like a streaming feature, so that you don't have to load the complete trajectory to stream information. For example, like on YouTube, you can just jump to a certain frame and look onward from there; something like this, so that you don't have to actually load everything. That's something which we would really like, and I think so far, with all the tools, we have to load it at least once. Well, there's NetCDF; that's why you would use NetCDF, because you can just jump to any frame. OK, I guess it's a container file. There is progress in this field; obviously not everybody has made one of these, but I guess that's actually something useful.

So, what makes it complex: we have to agree on all the information which should go into these files. I guess that's sort of the salient point. Yeah, or what makes it complex to get there. So there are models and formats and types of information; that's a point. On that point, what about post-analysis? Do we also want a file format which gives us something out of the analysis that is the same for everything? Yeah, I think there was a point that this is something that's hard to distinguish; it's not always clear what is trajectory and what is analysis, because when we define a state, it might have something to do with analysis running during the simulation, upon which the simulation decides when it is ready to stop; all these stop conditions, for example, can be in the state. It can be hard to draw the line, because then I don't think just about the simulation, I think about the scientific workflow; there's another level in there, and you need to have a processed result.

Actually, NASA has done a lot of work on this, and what they've done is come to a three-level model which I think is generally applicable. The top level is "simulation": the data that is actually produced by the running simulation. The middle level is what they call "campaign", and a campaign is all the simulations, all the things which revolve around them; the complete campaign is a complete experiment. And then they have "archive", and an archive is a campaign that's finished and has now gone to cold storage. And they have different formats and different things for a running simulation, a live running campaign, and then going to archive, because there are actually different uses and needs at those three levels. Who is this? NASA. So that's a good... Chris, can you push it to the left?
Yes. They actually have a very good paper where they describe this, because they run a wide range of simulations. Yeah. Like it was mentioned in the talk, something like the JSON file format or something like that; and like we already discussed, it has pros and cons. But sure, it already exists, and there are in general tools for readability, both human readable and machine readable. Though you could also argue that you don't actually need all the file formats to be human readable, if you have good tools that give you the output in a good way that everybody can read. So where did you have it? Action. Action.

Alright, so thank you everybody for that large collection of questions and ideas. We have about half an hour, no, less, 15 minutes left, and we haven't yet talked about how we move forward from this set of ideas. We didn't really reflect on it: there are a lot of things that need to be done by software people, but we as a larger community need to be involved in setting up the boundary conditions for those, at least some of them.

My impression is that there are roughly two opinions on which direction we should go. One direction is that we have a standard file format in all the programs that we are using, and these programs give up their own formats, so then everybody would be using the same formats and there would not be these program-specific formats anymore. The other direction is that programs keep their formats and then we have converters. My feeling is that I have heard arguments for both directions. But sometimes it's alright: you can have a program that can read both formats. I think it's by far unlikely that all programs will give up the ability to read their native formats right away, but being able to read this format too provides an easy solution, and you don't have to convert. Ultimately, if the new formats deliver value to funders or PIs or students or users, then they will get adopted by the software that has not yet adapted to be able to read them. So that's, I think, the path to getting things done, rather than trying to come up with some mandate that says you can't publish unless you do this.

Can I go back to data models and look at what needs to be done? I think for data models there's also a lot of work that has been done so far; it's just largely neglected because nobody really cares, right? There are people who have developed data models. For example, there is actually a pretty good model to start with, I think, so you don't have to start from scratch: you can already parse data from all the major MD software packages. Konrad Hinsen has developed several data models, and there are also general data models sitting out there. And what you would call each field, I think, doesn't really matter, because the structure exists out there. So I don't think there is so much effort in actually building the software; it's more about reaching consensus about what kind of content you should be able to put in, how you want to hold it, what the variable names are. I think that's where the work begins, and it starts right there. Is this the same as the ontology you were mentioning?
Yes, essentially it's connected, because ontologies are about semantics: knowing which word means what, and agreeing that this word actually means that particular thing, so that we don't have different definitions of, I don't know, what a thermostat is, or something like that. We need to make sure that every word maps one to one, essentially, because there's no space for ambiguity. It all gets to a point where people think that existing file formats are too slow to evolve, because there's the burden of some committee process to get new keywords accepted; they decide that key-value stores are really wonderful, they throw away the existing file format and jump onto key-value stores, and then over time they end up with different meanings for the keys and different data types for the values, and then they suddenly realise they need to create a committee that standardises the key names and standardises the data types. That's the thing: ontology is something a community can do, agree as a community on the final ontology, on the meanings of things and how they're going to relate. The solution is not generic key-value stores, because generic key-value stores are just everybody coming up with their own names and everybody coming up with their own values, and as soon as different groups start writing those to disk, then we've got chaos, because we all mean different things.

I think one of the things that came out, and something we've really struggled with in BioSimSpace, is this naive view that all of our tools are interoperable when they're not. You know, if I take the same input from my MD package, it will not run in another MD package. There isn't a standard definition of what we mean by the terms, the integrators, or any of those things, and this is why these initiatives fail as soon as you get past the coordinates, the velocities, the force field parameters; and even with the force field parameters there's disagreement about what they actually mean. What is TIP3P water? It's actually different between packages. So what I'd like to see is not so much just a community vocabulary; I would like the community to begin to evolve what the ontology for this area is. What do we mean by TIP3P water; what is TIP3P water in terms of the force field parameters; how it works, how it interacts with SHAKE; what is a protein, how do we define these kinds of things; what is the ontology of formats; and how is it all discussed?

Not so trivial. I think everybody agrees that's a good thing to aim for, but we need to start somewhere where there's enough common ground, and probably something where there's enough incentive to work together that you could do something cool that you couldn't do before. I don't know if that's, say, taking an MD setup or something like that and being able to export it in this common data model format, so that it would then be usable in all of our simulation codes right out of the box; that would be useful, for example. But there might be something else that would be a minimal subset of things we could do, and then some cool thing we could do with it immediately. The thing I'd love to see is basically more take-up. I mean, I have nothing to do with the MDAnalysis project or the MDTraj project, but they recognise that it doesn't matter what the trajectory format is; it's what we're doing with the trajectory that's the interesting thing. It doesn't matter, you know, you'll just load up
whatever trajectory format you've got. And as a community, we can actually support and develop MDTraj or whatever it is; if we can support and develop MDAnalysis, and we all begin throwing stuff into that from our own individual analysis tools, that would be great. But that doesn't help with debugging and running things from one software package to the other, and that sort of thing. I'd say those are two very different parts, right? Coordinates and all these analyses, that's a fairly simple problem. It's like two or three orders of magnitude harder to specify all the parameters with which a simulation should be run; and, as a case in point, there is not a single package that provides the ability to cross-reference parameters and start simulations in different packages, simply because it has to be exact. It's not "I can interpret it"; it has to use exactly the same words, exactly the same sentences. On the other hand, I think the good thing is that it is well defined, because all of these programs have implementations, and the implementations are exact. That is the issue, sure, but the implementations are concrete, and when we compare them we might realise that they differ; then we have to use different words for them, or fix the implementation.

So I suggest we finish off by reflecting on what things we individually might be able to contribute in future: contributing some words for our white paper; can I contribute to the evolution of these ontologies; can I recognise existing efforts, such as the ones mentioned; can we build those into data models that are suitable for some minimum viable product relative to biomolecular simulations, or biomolecular calculations more generally, thinking broader than MD; can we then set up some boundary conditions so that the people who are implementing analysis suites and simulation packages and docking programs and web servers and all these kinds of things have the ability to refer to a community document and say, OK, these are the things my users need, now I can make sure that I get that done, and that it's consistent and scalable and detailed.

Can I also add one more thing? How do we ensure that whatever we do doesn't go the same way as so many projects, where it lives for two years, the money runs out, and all we can say is: it was so nice, what happened to it? It's no longer supported. We are responsible for being the caretakers and stewards of useful, interoperable software tools for our community. No, but I would say that's right, because none of us has funding for more than the next four or five years, and then it might be gone. If we're going to do this, we can't assume that there is funding available; we will have to take this on as a long-term engagement even when we don't have money. Short term we might be able to fund it, but if we want this, we can't count on securing funding for it, which is the scary part. But we still need people to do it, right?
Yes, but it will have to be an engagement that we prioritize in our groups even when we don't have dedicated funding for it. Otherwise... funding fluctuates. No matter what funding we find for this, there will be a point in time where that specific funding is no longer available, and then it will die if we pin it entirely on funding. But equally, one of the problems in the field is that we tend to produce new tools from scratch; we get rewarded for producing new tools from scratch, that's how grants work. Whereas when it comes to writing a new analysis thing for MD, if we just put it into tools like MDAnalysis, we would begin to build a proper open-source community, which makes it more immune to individual funding fluctuating. And then, in terms of sustainability: we're very good at producing tools, less good at producing tests. GROMACS is a very good example of a sustained piece of software, but most software that we've got tends to be written in a way where it's not able to be picked up and put down, because it doesn't have tests, and very often lacks proper documentation and a proper community management model. We need to add those things onto the software projects we make, so that when the original group moves away, another group can come in and immediately keep building in the same framework. That's how all successful open-source projects basically did it: they didn't have continual funding from the government, they did go up and down, but they put the community layers for sustainable software in place. So when people think about this, think not just about what tools we need, but about how they can be sustained and how to build them.

So, yeah, we need people to contribute words to this, we need people to think about ontologies, we need people to then think about how to convert those into software. Those are going to be challenges for us all, for those of us with time and funding right now to do the primary work here. I'm happy to contribute to those efforts, and I hope some of you are able to as well, in the different areas where you have different kinds of expertise. I'm not the right person to be specifying what the data model is, because I don't run simulations; I don't want to see inside MD packages. It's no good just saying, oh, you're the software guy; I'm not going to do it all for you, I don't have the time or the amount of funding, and neither do you.

That's true for funding. Here we are, mostly, as BioExcel, a European organisation; we think that at least this consortium can give something to start with, a bit of money to try to do something, to start something. Or can we identify at least some sources that can be contacted to try to fund that, such as the British equivalents? As individuals, we can personally, first of all, try to write a couple of lines there, because obviously everything we've focused on revolves around defining data models, the importance of building data models, and how you build them in a simple way; so we can certainly contribute time for people to write bits and pieces like that. The same on the BioExcel side: we can certainly find people time to help define things and do better implementations. But there's also a danger there, right: if you have two or three places with funding for it, it easily becomes a silo that only those groups are involved in, and we need to think about sustainability. How do we make this a
shared ownership that we all do together? At some point in the nearer term there might be some funding for it too, but I think we need to start by thinking about how we involve the whole community. It's not fun to hear, but it's unlikely that we will find grants that will be able to support everybody, because those grants don't exist; collaborations of two, three, four groups, yes, but 25 groups, no.

Does there also have to be something... we see many benefits of having this common data model, but does there have to be something particularly interesting here, tying different codes or things together? Yeah, that's what we have to put into the BioSimSpace grant: specifically building it so you could run different workflows with different underlying packages, and do force-field comparisons on that. So this comparison could be explicit; yeah, we're looking towards this big energy and accuracy comparison, or performance comparison. Or the accuracy: what's the best method to apply for a prediction of binding? I find accuracy far more interesting than the performance, yes, because, again, we can start a comparison of free energy calculations right there; there are a ton of small settings. So are there things that we know tend to improve accuracy? That's essentially what you get; and there is kind of a question of what the right reference is, what a given parameter actually means, at a much more concrete level.

That is side-tracking a little bit. For a code like GROMACS, I think one very useful thing is to separate the internal data representation and the external data representation; that means splitting off things like XTC file reading into libraries, rather than having them seen as being part of GROMACS, as has already happened with TNG. But there is lots of file handling going on that is very much intertwined with what the program does internally; that might be a smaller, independent package we could immediately benefit from. Actually, we already have the XDR files, which get reimplemented all the time for each package, so that would be great.

One thing I would like to see is actually for the packages to document their data models. I mean, GROMACS is very good at that, I think; because I work with a lot of MD packages, GROMACS is by far the best, because the manual actually defines 90% of what's in the file, not in terms of the data model, but in terms of what the lines should be, and most of the time it's correct; there are times where it's not, maybe 51%, but I think that's not enough for me. AMBER is terrible at defining what their data format is; they have nothing to say about what the data model is, except for the NetCDF trajectory spec, which is brilliant; everything else is terrible. The CHARMM PSF file format, oh my god. But here's the thing: for each individual software package, we can actually define a document that says, this is the schema, this is the grammar for this format, this is the data model; and with those documents it might be so much easier for us to build these tools. And we actually need to look at the file formats in terms of their ontologies, so that you can then find commonalities.

We want several things, though. We want both the low-level, serialized format, and there again there could be multiple things, multiple realizations of the same data model, which is what we want people to use; but if we want them to use it, we also need a common implementation that's well documented, that is easy to link against, and that can maybe easily stand
alone somewhere, with a Python API as well, to easily manipulate it from there. That's the great thing about things like NetCDF: the NetCDF for Python implementation is very easy to use, or you can grab it into your Fortran or C, and there's so much support for developers that it's easier to use it than to roll your own reader or writer, which I presume is what happened with these XTC files: every one of the packages wrote their own. And for NetCDF there's basically a pandas equivalent; some people have heard of pandas, it's just so easy to use, a single install line, and then you have things like pandas, a generic data analysis platform which you can interface with numpy and matplotlib, and you can then do analysis of any kind of NetCDF data, because it's already in that data-frame format. I would love to see the field say: yes, we're all just going to put our array data straight onto NetCDF; everyone's life would be so much easier. And again, that's a data format they've built; it really doesn't matter exactly what the format is, if we can at least all agree on the data model, and I think that would be phenomenal. We have a bunch of priorities in the development of the TNG library that worked out these couple of different levels, and that should be able to be generalized to wider community needs. So is it then Voyager or DS9? [laughter] Your turn next time, Rachael? [laughter]
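To illustrate the jump-to-any-frame point made above, a minimal sketch using the netCDF4 Python package; the file name and the variable layout (the AMBER NetCDF trajectory convention mentioned earlier) are assumptions about the file at hand:

```python
# Minimal sketch: read one frame of a NetCDF trajectory without loading
# the whole file. Assumes an AMBER NetCDF-convention trajectory, i.e. a
# "coordinates" variable with dimensions (frame, atom, spatial).
from netCDF4 import Dataset

with Dataset("traj.nc", "r") as ds:
    n_frames = ds.dimensions["frame"].size
    # Slicing reads only the requested frame from disk:
    frame_500 = ds.variables["coordinates"][500]   # shape: (n_atoms, 3)
    print(n_frames, frame_500.shape)
```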