So I will not speak that much about an MD-based approach, because that's not our core business. We do MD, but we use rather standard approaches. Instead I want to speak a little from the perspective of the integrative modeling field, because there has been quite some discussion over the years about how to archive these kinds of models, and multi-scale representation has also been something people have been thinking about. So what are we talking about? Well, these days in structural biology it's very common that no single technique is going to give you all the answers. To combine data, you use simulations in combination with all kinds of experimental data to get to your model. One of the nice examples is the work of Andrej Sali, who has been modeling the nuclear pore complex in several stages. In there you will find atomistic representations, you will find things that are coarse-grained at something like the Martini level of coarse-graining, but you will also find proteins represented as simple spheres or shapes, based on whatever knowledge is available. So it is a truly multi-scale kind of effort, and for a long time there was no real way of representing those data in a systematic manner. The representation was very much tied to the software that was used, and visualization of these kinds of models was also problematic. The developers of Chimera have been very much involved in this effort: can we define a file format, a description of the data and the system, that also captures these representations? The software should know that it is reading a coarse-grained structure and should be able to visualize it accordingly; it is no longer simply spheres that correspond to atoms. If you look at the field and the related software, covering integrative modeling but also docking and flexible docking, there is a big issue regarding the data: there is no real consensus on the data format.
I think most software in the field is still using the old PDB format as the de facto consensus. Some of it can handle mmCIF as well, which is officially the new format of the PDB. There is no efficient way of storing or sharing large data sets; that is not really arranged. And different software will share different things, because it depends on what you are doing. If you are doing rigid-body docking, you deal with rigid molecules, and the only thing you need to store in principle is the transformation matrix, a rotation and a translation, and you can regenerate the entire set. That is very compact. If you look at publications, there may be a PDB file attached in a supplementary directory, and that's it. The full data are almost never shared with the community, and it is very rare that you actually find the settings people used to do the docking; at least in the docking field that is not common. We run into these problems ourselves. We participate in experiments like CAPRI, where you have to do docking, scoring, and so on, and for the scoring part we get PDB files from all over the community and discover that not all PDBs are equal. Some standard software does not produce proper PDB format, and that can give you trouble. Much software actually truncates the records right after the coordinates, so you get no occupancy, no B-factors, and none of the fields coming after that, which also contain useful information. Because of that we started developing pdb-tools, which does simple manipulation of PDB files but also simple validation, and we use it in our portals as well. The main usage of HADDOCK is through a web portal where people submit data.
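To illustrate the rigid-body point above: a docking pose can be stored as just a rotation matrix and a translation vector, and the full model regenerated by applying them to the unchanged input coordinates. This is a minimal sketch in plain Python (not HADDOCK code; the function name is invented):

```python
# Illustrative sketch: a rigid-body docking pose is fully described by a
# 3x3 rotation matrix R and a translation vector t, so only R and t need
# to be stored; the model is regenerated as x' = R @ x + t.

def apply_pose(coords, rotation, translation):
    """Apply a rigid-body transform to a list of (x, y, z) tuples."""
    out = []
    for x, y, z in coords:
        out.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
            + translation[i]
            for i in range(3)
        ))
    return out

# 90-degree rotation about the z axis, then a shift by (1, 0, 0).
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
t = [1.0, 0.0, 0.0]

print(apply_pose([(1.0, 0.0, 0.0)], R, t))  # [(1.0, 1.0, 0.0)]
```

Twelve numbers per molecule instead of thousands of coordinates, which is why the rigid-docking case compresses so well.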
But we have to do heavy validation of everything submitted to the portal. People have, for example, submitted Word documents: they edit the PDB file in Word, not realizing that a PDB file is a plain text file, then they upload the Word document, and of course it doesn't work, and they ask us to please fix it. But doesn't the PDB support the PDB format anymore? Well, it is still there; you can still download it from the site, even though the official format is mmCIF. On our portal we should also support mmCIF, as input and as output, if we want to do things properly. But a lot of software out there is still not ready to handle mmCIF, so what you see now is that people take mmCIF files, convert them back to the old PDB format, and feed that to their software. That is perfectly fine if you are looking at small objects, but if you have more than 9999 residues in your system, you run over the format limits, or you have to start playing with chain identifiers and all of that, which is doable but awkward. In terms of data processing, the server has been operating for many years now, and there are two stages: when people submit their data, and when they get their data back on the results page. We provide a complete data structure, based on a directory structure of PDB files, but since the start we have also been providing a parameter file, a self-contained text file that contains the parameters of the run. Even though users don't see all the options (in the simple interface the portal just asks for two PDB files and a list of residues, and that's it), they get this file, which they can download before the run starts. It contains perhaps 500 parameters that they could in principle change, plus the coordinates and the restraint data.
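The truncation and numbering problems above come from the PDB format being fixed-width: occupancy sits in columns 55-60 and the B-factor in columns 61-66, right after the coordinates, and the residue number gets only four columns, hence the 9999 limit. A small sketch of a parser that tolerates truncated records instead of crashing:

```python
# Why truncated PDB ATOM records lose information: the format is fixed-width,
# with occupancy (cols 55-60) and B-factor (cols 61-66) directly after the
# coordinates (cols 31-54). Cutting the line after z silently drops them.

def read_atom(line):
    rec = {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "resseq": int(line[22:26]),   # only 4 columns: caps out at 9999
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }
    # Tolerate truncated records rather than raising an exception.
    occ = line[54:60].strip()
    bfac = line[60:66].strip()
    rec["occupancy"] = float(occ) if occ else None
    rec["bfactor"] = float(bfac) if bfac else None
    return rec

full = "ATOM      1  CA  ALA A   1      11.104   6.134  -6.504  1.00 20.00"
cut = full[:54]  # truncated right after the z coordinate
print(read_atom(full)["bfactor"])  # 20.0
print(read_atom(cut)["bfactor"])   # None
```

This is the kind of defensive parsing a validation layer like pdb-tools has to do before a PDB file from the wild can be trusted.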
So that is a self-contained file, and we always say: if you want to save one file, save this one, because there is also an option in the portal to upload this file as a single file and rerun the simulation. We also advertise it: if you want to provide supplementary material from your modeling, give this file, because someone else can take it, come back to the server, and hopefully get the same results, unless we have changed things on the server or you ran a different version. It is a simple text file, and we are now moving to a JSON version of it in the new server that Mikael has been working on. Also, HDF5 was mentioned just before by Eric: in the context of a deep learning project we are also storing data in HDF5, so maybe that is something to consider for the future. One of the problems we have on local systems is that we generate lots of small files. It is not like MD, where you have a few very large, possibly compressed files; we produce hundreds of thousands of small files, and at some point this puts stress on our systems, so we have to think of ways to reduce the number of files in the future. Now, about data sharing: there are a few repositories you can go to, but few of them will accept terabytes of data, and that is one problem in the field. There was one initiative, published in 2016, the SBGrid (Structural Biology Grid) Data Bank in the US. They were funded to provide a data repository, and the first aim was to store the primary images from X-ray crystallography, so what the detectors are actually recording.
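A JSON version of such a self-contained run file might look roughly like the following. This is a hypothetical sketch, not the real HADDOCK parameter file: the field names are invented, and the real file carries hundreds of settings plus the full coordinates.

```python
# Hypothetical sketch of a self-contained run description in JSON.
# Field names are invented for illustration, not the HADDOCK schema.
import json

run = {
    "run_id": "example-run",
    "parameters": {            # ~500 tunable settings in the real file
        "ambig_restraints": True,
        "num_models": 1000,
    },
    "restraints": ["active residues: A 12, A 45"],
    "coordinates": {           # embedding the inputs makes it self-contained
        "molecule1.pdb": "ATOM ...\n",
        "molecule2.pdb": "ATOM ...\n",
    },
}

text = json.dumps(run, indent=2)   # the one file a user would save or share
restored = json.loads(text)        # a re-run can start from this file alone
print(restored["parameters"]["num_models"])  # 1000
```

The design point is that one round-trippable file captures inputs, data, and parameters together, which is exactly what makes a run reproducible by someone else.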
The PDB is the archive for the structure and also the structure factors, but the structure factors are already interpreted data derived from the images, and SBGrid wanted to collect the raw images. There is a very long list of supported data types there, and they also accept structural models. We have been using it to publish our data sets: when we do some new work, create a benchmark, and demonstrate what we can do on a given system, we upload our data there. The most recent data set we uploaded is a membrane protein docking benchmark, where we provide all the input files, including the self-contained input file that I showed you, but also all the models that come out of the docking. This particular data set lists a storage requirement of 1.3 terabytes; that is uncompressed, and they do compress the data. So we are able to put our models on this repository. I don't know if they will start accepting trajectories; that is something we would have to ask. But at least they do accept our models. It is not as if the whole docking community is dumping its models there, but for us it is a good way of doing things, because we can indeed put a terabyte of data there and it is fine.
There is not much metadata associated with that. They use rsync and provide a script to download the data, but the metadata are rather limited. You do get a DOI, so when we publish the paper related to the work we provide a DOI to the data set, but that in itself is not really machine searchable: you will find the data set, the publication, and the lab name, but you will not find the details of what is in the data set, and there is no prescribed structure for the data. We provide our own directory structure, which is related to the way HADDOCK works, but other software might provide something completely different. For us that makes sharing easier, because no format is imposed. Okay, so now for integrative modeling: since a few years there is a new site from the PDB, which is harvesting integrative models. The PDB itself will only accept, say, purely experimentally determined structures, although there are some models in there, and some of those models come from HADDOCK. There is a dictionary that has been defined, with a freely accessible GitHub site where you can see what has been defined in the data model in order to allow the deposition of those integrative models. For example, defining that a molecule has a given coarse shape is something that needed to be added to the model, because the PDB's mmCIF format does not handle these kinds of weird particles. What also went in there is what we need for HADDOCK, so we have been adding things or requesting additions, but there is much more than that. For example, I want to be able to say that the mutation of residue X is important for the binding; that is information that will go in there. And there is a website where we can now actually deposit these kinds of hybrid, integrative modeling structures, and we also try to connect there all the data associated with the model.
This came together because a task force from the worldwide PDB was set up to think about what is needed, and there was a meeting about three years ago at the EBI, where I was actually present, at which people from different corners of the field thought about what needs to go into this data model. It is similar to what we are doing here, at a smaller scale. The recommendations from that were published: we need to archive the models, the experimental data, and the metadata. We need a flexible model representation, so that you can mix atomistic with coarse-grained with whatever weird shape you want to describe. We need methods for validation, which is not simple. We should build a federation of model and data repositories; I will say a little more about that. And we want to establish publication standards: what should people report when they publish these kinds of models? So what is captured in the model? You have sequence and chemical information. You have links to structural and experimental data repositories: the file does not capture everything, so for data that live in a repository you just link to them, but the link is part of the file itself, of this integrative modeling mmCIF file. You describe the chemical entities, the small molecules: if you have a cofactor in your system, it will be described there. It also describes the starting models; there was a discussion about which PDB entry you used as a starting point, so that information will be there. And the spatial restraints that you used to build the model: if you have cross-links from mass spectrometry, they are going to be described there. [Question from the audience:] One of the problems the Open Force Field Initiative has had is that none of these formats that describe the chemical molecular components describe the chemistry.
They describe the connectivity, but they don't really tell you whether this is a double bond or a single bond. Do any of these file formats have a path forward, that you know of, that will describe that? [Answer:] I believe in mmCIF there was a way of describing the chemistry to some extent, but this is work in progress. This integrative modeling extension of mmCIF is constantly evolving: the format itself is not changing, but fields are added to represent new things as we need them. It started very much from the work of Sali, so everything that was there initially described what was coming out of Andrej Sali's Integrative Modeling Platform. Then we started contributing as well, and we said: in our model we need to be able to describe a set of arbitrary residues that are part of the interface. That was not in Sali's model, because they were doing things differently, and it has since been added. So this is very much progressing; the data model was not fully defined from the start, and as the need arises we add things. It could be a starting point for something else, where you say: we need to add these additional entries. This integrative modeling mmCIF format is somewhat decoupled from the PDB one: the PDB is very hard to change, there is a whole procedure for that, while this is much more flexible. It is a different repository, but there is a connection...
so it is evolving, basically, and you can look at what is in there already. There is also a software library that supports the dictionary, with APIs to it. Currently there are connections to the EMDB for cryo-EM, the SASBDB for SAS data, the BMRB for NMR data, a database for cross-linking mass spectrometry data, and a database for FRET data as well. The idea is not to store everything in this one repository but to link as much as possible to existing ones; if nothing exists, it is okay to add new entries. The example would be: here is a mutation, which comes from this table in article X, and you can describe that in the model. So we have started, and Mikael has actually done the work, to enable support in HADDOCK for this format, because we want to facilitate the deposition of models in this PDB-Dev archive. The new HADDOCK portal is now creating these kinds of files, and the idea is that the output of a docking run is typically not one model but an ensemble of models, which can be associated with energies, with clusters, and with the top models of those clusters. We want to capture all of that in one file that we provide to the user, so that the user can add a little information to the file and then go on to deposit it, because the deposition system right now is not like the PDB, where you are guided through all the fields you need to enter; it is not yet at that stage. So we should support this mmCIF file as input, and we should support it as output for all those models. We have now built it as output of the server. There are different options; at this time it is still one file per model, per structure, but ultimately we want to add one download option where we capture all the models that we present to the user at the end, which could be the different clusters, with all the associated energies and clustering statistics. One file should contain everything, and hopefully that
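To make the dictionary idea above concrete, here is a toy writer for an mmCIF-style loop describing coarse-grained spheres, loosely modeled on the sphere records in the integrative modeling extension. The field names are simplified for illustration and are not the exact dictionary items:

```python
# Toy mmCIF-style loop writer for coarse-grained spheres (one sphere can
# stand in for a whole stretch of residues). Field names are simplified
# illustrations, not the official dictionary categories.

def spheres_to_cif_loop(spheres):
    """spheres: list of (seq_begin, seq_end, x, y, z, radius) tuples."""
    lines = [
        "loop_",
        "_sphere.seq_id_begin",
        "_sphere.seq_id_end",
        "_sphere.Cartn_x",
        "_sphere.Cartn_y",
        "_sphere.Cartn_z",
        "_sphere.radius",
    ]
    for s in spheres:
        lines.append(" ".join(f"{v:g}" for v in s))
    return "\n".join(lines)

# One sphere standing in for residues 1-10 of a coarse-grained domain.
print(spheres_to_cif_loop([(1, 10, 0.0, 4.5, -2.0, 8.0)]))
```

The real dictionary and its supporting library handle far more (entities, assemblies, restraints, provenance), but the loop structure is the same mechanism by which mmCIF accommodates non-atomic representations.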
will also allow you, in ChimeraX, to visualize a complete docking run in one go, because ChimeraX also works very much with this format to visualize clusters, statistics, and all of that. So there are opportunities. For integrative models there is PDB-Dev as a repository, funded from US grants, and this time PDB Europe is also involved, so there is storage there. If you think of MD, we probably also need a different model to share the data, perhaps a federated one, because you need to involve the data infrastructure providers, the people who are willing to store the data, not only the data generators. In our case we use SBGrid; I don't know how long they will be happy to accept models, maybe at some point they will say no more, we will have to see. There is also the European Open Science Cloud; we are European, so maybe there are opportunities there, because it provides networking, compute, and data storage behind that as well. We are working in the European Open Science Cloud and have a service-level agreement. These days we are only using compute there, but we have, for example, 250 terabytes of dedicated storage. Of course that is not going to help you as a community, that is only a few data sets, so that is peanuts, but maybe to organize the data: astronomy is using those resources, different communities are using them, so if you have a good use case, say you would like to operate some server or portal to collect the data, maybe we can make a case for it. And that's it. I just want to acknowledge a few people here. First, the colleague who is really leading this integrative modeling mmCIF format; she is the one to contact if you want to have your double bond described in the format, because it is required if you decide to build on it. Then Helen Berman, the former head of the RCSB, and John Westbrook, who is the boss of mmCIF when it comes
to the PDB, pretty much. These are all PIs from the worldwide PDB. For the UK there is Sameer, who is involved in this initiative as well, although most of the integrative modeling initiative is happening in the US. Andrej Sali is very much leading this, Tom from the Chimera team was very much involved, and the RCSB on the other side was also part of it. Mikael did all the work on our side. The idea is that by providing ready-to-deposit files we promote the sharing of the data. So we will do it. OK.