Thank you so much, Andrea. And hi everyone, good afternoon. This is the second call for 2023, and this time we have our first actually invited speaker in the series of Argos Community Calls, who will be talking about an Argos integration in a platform that supports research objects. His name is Raúl Palma, and he's head of data analytics and semantics development at PSNC, the Poznań Supercomputing and Networking Center in Poland. Raúl, thank you very much for joining us; we look forward to hearing more about what you did with Argos and how you did it. Yes, okay. Thank you very much, Eli. So hello everybody, I'm very happy to be here as well and to join you in these community calls. Just to comment first that the work I am going to show you here is part of the activities we are currently doing in the RELIANCE project, which is one of the INFRAEOSC-07 projects, the set of projects enhancing and extending the EOSC Exchange. One of the activities in this project, related to the management of the research lifecycle, is the connection with the Argos data management platform. I will start with the motivation for this integration. You know the research data lifecycle very well: it is a continuous and dynamic process that involves different stages, from the initiation of a research project through its development and after its finalization. There are many stages, as you see in this diagram, which I'm sure you are very familiar with: planning, collection, processing and analysis of the data, preservation, sharing, and then reuse. So what was the idea for us in connecting this with our work? The main challenge is to support this lifecycle from the perspective of the various stakeholders involved: researchers and data providers, but also any other type of stakeholder connected to the management of the data; those who not only define how the data is handled during and after the project, but who are also responsible for the production, generation and collection of the data, its processing and analysis, its preservation, and for making sure the data is discoverable so it can be reused later on, where possible. Our goals were basically to support the creation and maintenance of machine-actionable DMPs that help these stakeholders actively and dynamically manage and maintain the data management plan, but also the related data itself, together with the plan, from a single point; to enable the linking of these resources, and not only the data itself but also other resources that might be relevant for the data management plan, like the software used to analyze or produce the data, or the publications derived from it; to enable FAIRness assessment of the data management plan and related resources, particularly the datasets; and to make sure everything is discoverable in EOSC, particularly in the research graph, or scientific knowledge graph as it is called. So that was basically what we wanted to achieve. And that's where the ROHub platform comes into play.
So ROHub started as a prototype solution around 2010, more or less, as a proof of concept: a holistic solution for the management of research objects. I will go into the details of what research objects are, so you will understand better. The ROHub platform then evolved from the proof of concept into a testing phase in 2014-2019, which was the first real testing in some research communities, and it has been a production product since 2020, now integrated in EOSC as well. Basically, ROHub provides storage, lifecycle management and preservation of scientific outcomes. It enables sharing and making these resources available to others, publishing and releasing resources with DOIs, and it also allows the discovery and reuse of pre-existing scientific knowledge. It is a reference platform because it natively implements the research object model and paradigm. Of course, it supports different stakeholders, but our primary focus is always on scientists, researchers, students, and any kind of actor in science. And it is used as a backbone for a wealth of research-object-centric applications and interfaces across different scientific communities. In order to understand the platform and the whole concept better, we need to go a little bit into the details of what research objects are. This definition of research objects started around 2009 and then evolved over the later years into what is now called RO-Crate. The main goal of the research object was to account for, describe and share everything about your research, including how things are related to each other. You can see this as a box where you put everything that is connected to your research. So let's see how this works in practice. The research outcomes and related resources are normally spread across different repositories; they have their own metadata, of course, but in different ways: workflows are in repositories like WorkflowHub, software is normally in GitHub, models are in other specialized repositories, presentations in SlideShare, articles in PubMed, and other structured data, for example, in Zenodo and many other organizational repositories. That's more or less the situation, but in order to make the research FAIR, we need access to all of these resources. This is where the research object comes into play: it gives an integrated view over all of these fragmented resources using PIDs and metadata that is human- and machine-readable. You can see all of these connections and the box itself as a contextualized graph, because in the end the research object is a graph connecting the different resources and making them accessible from a single point, based on a single permanent identifier. The research object has its own metadata as well, of course, but it can be managed and evolved in its own right, so it can be packaged, transferred, accessed and reproduced if appropriate. That's more or less the high-level view. So, a little bit more inside what a research object is: this is how it looks, particularly in the new paradigm based on RO-Crate.
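To make this concrete, here is a minimal sketch of what an RO-Crate metadata file along these lines can look like, built as a Python dictionary and serialized to JSON-LD; the structure it shows is the one walked through next. All identifiers, names and files are illustrative placeholders, not taken from a real crate; only the @context and conformsTo URLs come from the RO-Crate 1.1 specification.

```python
import json

# A minimal RO-Crate metadata file ("ro-crate-metadata.json") expressed as
# a Python dict. Everything except the spec URLs is a placeholder.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {   # the metadata file descriptor, pointing at the root dataset
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # the research object itself: an aggregation of resources
            "@id": "./",
            "@type": "Dataset",
            "name": "Example research object",
            "author": {"@id": "https://orcid.org/0000-0000-0000-0000"},
            "hasPart": [
                {"@id": "data/observations.csv"},  # file inside the crate
                {"@id": "https://doi.org/10.5281/zenodo.0000000"},  # external, by reference
            ],
        },
        {   # a contextual entity identified by an ORCID iD (placeholder)
            "@id": "https://orcid.org/0000-0000-0000-0000",
            "@type": "Person",
            "name": "Jane Researcher",
        },
        {   # per-resource metadata for a file aggregated in the crate
            "@id": "data/observations.csv",
            "@type": "File",
            "name": "Raw observations",
        },
    ],
}

print(json.dumps(crate, indent=2))
```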
So you have a metadata file, which holds structured metadata about the research object itself, and the content, meaning all the resources that are there: files, directories, links to resources. The links are really important because, as I said, the resources can just point to the location where they are physically placed, like the repositories I mentioned at the beginning. Of course, you can also put resources physically inside the research object, but let's say the most common usage is to connect resources that are spread around different locations. If we look at this from the implementation point of view, in the end we have identifiers everywhere: for all the contextual entities, like authors, organizations or projects, we use identifiers such as ORCID iDs or ROR IDs. And for the individual files and resources, we have a set of metadata describing each of them, and they also have their own identifier, which can even be a DOI, because an article or a dataset has its own DOI, or it can just point to a location, in GitHub or in Zenodo or some other place. So this is how it looks. And as you see at the top, there is a high-level permanent identifier for the research object itself. As you see, this is based on the linked data approach, and on some basic and simple principles for how to package this information and make it accessible in both a human- and machine-readable way. In summary, RO-Crate provides a practical, lightweight approach to packaging the research object entities together with their metadata. It aggregates files and any content that is accessible and addressable via URI, with contextual information to aid decisions about reuse: the who, what, when, where and how of the resources connected to the research, which is what we want to address in the metadata of the resources inside the research object. They are web-native, machine-readable, human-readable and search-engine-friendly, and they feel familiar: this is nothing completely new, we build on existing work. And it is extensible and incremental, so you can always add additional metadata particular to your research community, and this is done using profiles: you can define an RO-Crate profile for a particular community or a particular kind of application. And this is an open community effort; it started, as I said, around 2009 or so, and there are a lot of people behind it now. So, based on that, let me go a little into the functionalities of ROHub, for you to understand the management of research objects. Basically, ROHub enables you to manage high-quality research objects that can be interpreted and reproduced in the future. This means you are able to assess, via ROHub, the quality of a research object based on different dimensions or criteria.
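As a rough illustration of this kind of dimension-based assessment, a completeness check along the lines of the checklist described next might conceptually look like the sketch below; the field names and scoring rule are hypothetical, not ROHub's actual criteria.

```python
# An illustrative completeness check (not ROHub's actual implementation):
# a community lists the metadata fields a research object must carry, and
# the score is the fraction of those requirements that are satisfied.
REQUIRED_FIELDS = ["name", "description", "author", "license", "hasPart"]

def completeness(ro_metadata: dict) -> float:
    """Return the fraction of required fields present and non-empty."""
    satisfied = [f for f in REQUIRED_FIELDS if ro_metadata.get(f)]
    return len(satisfied) / len(REQUIRED_FIELDS)

ro = {"name": "Example RO", "author": "0000-0000-...", "hasPart": ["data.csv"]}
print(f"Completeness: {completeness(ro):.0%}")  # Completeness: 60%
```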
For example, one of these is something we call the checklist, which allows you to say how complete a research object is with respect to the requirements of a particular community; and now we also have FAIRness services connected to that, I will show you a bit on that in a second. You can reference, share and preserve your studies, campaigns, observations, all of these connected resources, including, as I said, internal or external ones, and you can also point to other research objects, so you can have nested research objects. You can collaborate via ROHub with other colleagues: you can have permissions to say who can modify a research object, who can edit it, and whether the research object is, for example, private, that kind of thing. You can manage the evolution; this is very important. Similar to source code paradigms, you are able to produce releases at certain points in time, which we call snapshots. And you can also fork a research object, in order to facilitate reuse and the acknowledgement of the work that has been reused. You can, of course, publish with DOIs, allowing citation. You can also monitor and follow research objects, so you get notifications, for example, if something has changed in a research object or its quality has changed. And you can build reputation, because there is the possibility to rate and favorite research objects, and of course to find related research from other people in a similar domain. Then we have some added-value services. There is semantic enrichment: we have connected services that enable the extraction of metadata from scientific texts and scientific content, and this metadata is then also represented as machine-readable metadata and used for searching and discovering research objects. This metadata is also used for recommendations, as you see here, with something we call the collaboration spheres. Then you have the social impact information, the lifecycle I mentioned before, which allows you to publish in other services as well, as I will show in a second, the quality assessment I mentioned, and the integration with EOSC itself: ROHub is connected to the EOSC AAI via the EGI Check-in service. So how does the connection with EOSC look in general? This is how it looks. All the services that you see here are somehow connected or onboarded in EOSC: you have data cubes from the ADAM platform, you have the text mining services (these two come from RELIANCE), but also EGI Notebooks, EGI Binder, Zenodo, B2SHARE, B2DROP, and of course Argos, which we will now go into in detail; then you have the OpenAIRE Research Graph and EGI Check-in. This is a high-level overview of the architecture; I don't need to go into details here, just to say that it is based on Django and React, with two modules: the backend and the frontend portal are different components. Sorry, I have something calling. Okay, so, going into the details of the integration of ROHub and Argos. Just give me one second, because I think somebody's calling; as I said, I am in a hotel, wait a second. Yes, no worries. Sorry. No problem, you can continue. Yeah, okay, so now let's go into more details of the integration with Argos.
So what we have now. Argos can export DMPs in different formats, including XML and JSON; these are some of the available export formats for data management plans. ROHub can now import those DMPs, based in particular on the XML version, but the JSON version can also be imported; I will explain later why it is like this. This import generates a research object in ROHub, as you can see here: this is a data management plan we have in Argos, and based on the import process we have generated a research object that includes all the information about the data management plan. So let me show you a bit more on this. The imported research object includes, on the one hand, all the information from the DMP in human-readable form, as you saw: the description, the funding, the grants, all of this information. And all of it is also machine-readable right away: the whole information is aggregated in the research object in machine-readable format, reusing standard vocabularies. In particular, as we will see, there are existing ontologies and vocabularies specifically for DMPs, and for more general terms we reuse other well-known vocabularies that are used in many different contexts. But apart from all this metadata, we have the datasets themselves. The datasets can be physically in the research object or, of course, by reference, so the research object just points to the location where the data is placed, for example a repository in the organization or wherever; but you can also put the data in the research object itself. So the research object will have not only the data management plan with all its metadata, but also the datasets themselves. Of course, if a dataset does not yet exist at the moment of the import, the research object just adds a placeholder, a folder where later on you will be able to upload the dataset, or a link to the dataset, when it becomes available. That's how the research object is generated. Additionally, when new versions of the DMP are created, they can also be propagated to the research object: a new version of the DMP will create a new version of the research object as well. Basically, before importing the new version, we create a snapshot; this is done automatically by the import, or rather update, process: a snapshot of the research object is generated before the new version is imported, and then all the new information coming from the new version replaces the existing metadata in the research object. So you will also have access to the lifecycle of the versions of the DMP via the research object.
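A minimal self-contained sketch of this versioning behaviour, assuming a hypothetical ResearchObject class rather than ROHub's actual API; the key point is the order of operations, snapshot first, then replace.

```python
import copy

class ResearchObject:
    """Hypothetical stand-in for a ROHub research object."""

    def __init__(self, metadata: dict):
        self.metadata = metadata          # live DMP annotations
        self.snapshots: list[dict] = []   # frozen earlier versions

    def create_snapshot(self) -> None:
        """Freeze the current state before a new DMP version is imported."""
        self.snapshots.append(copy.deepcopy(self.metadata))

    def import_dmp_version(self, new_metadata: dict) -> None:
        """Snapshot the research object, then replace its annotations."""
        self.create_snapshot()
        self.metadata = new_metadata

ro = ResearchObject({"title": "My DMP", "version": 1})
ro.import_dmp_version({"title": "My DMP", "version": 2})
print(len(ro.snapshots))  # 1: the first version remains accessible
```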
Now, a bit more on the workflow and implementation details. First of all, how does this work? The first thing we need to deal with is templates, because Argos is based on the use of templates: you have templates, for example, for Horizon Europe or for the CHIST-ERA programme, so the DMPs are based on templates that reflect the requirements of a particular programme or funder. That's more or less how it works. So what ROHub does is, first, process the Argos template, which is in XML. Then ROHub extracts and returns the questions, with the IDs of the answers, in JSON form, like here. Of course this is empty, because we don't yet know how this will be mapped to the particular terms I was mentioning before. Then it is the role of a domain expert to map the answers to predicates using those existing vocabularies, as I said: the DMP ontology, Dublin Core terms or schema.org. Ideally, if no predicate is found in those vocabularies, a new predicate is created; that's not a problem, but we try to be compliant and to reuse terms from those existing vocabularies. Then this field mapping, in JSON, is imported into ROHub. Of course, this is something that is done once, and these are admin-only operations: this is not something we expect the users to do. It is done by administrators with the help of domain experts. Once this is done, we can support DMPs based on these templates. This is when you can import the DMP itself, as generated by Argos. The operation expects the DMP in XML; that's the main source for the import. But optionally, you can provide the URL of the DMP, so that we can also generate a reference from the research object to the DMP in Argos; that's something optional you can add here. And you can also provide the JSON of the DMP. Why? Because we found out that there is some information, some metadata, that is not present in the XML but is present in the JSON. However, the XML is still more complete than the JSON overall. There is a bit of work we had to discuss with Argos on this, but that's more or less the situation: if you import both, then you have almost everything that is in Argos. Then you are able to retrieve the folders that correspond to the datasets for a given research object, and you can also get from ROHub the machine-readable metadata of a particular folder, that is, of a DMP dataset. That's more or less it. Then, this is how the update works. You first go to ROHub and say: I want to update the DMP, I have a new version, and you can import it. But there are some scenarios to consider: for example, the new version has a new dataset, or an existing dataset is updated, or an existing dataset is deleted. The first two are fairly straightforward to manage. But for the last case, there is a constraint: the dataset has to be removed from the research object before we can apply the update. Or, if the folder is empty, it's not a problem; it is automatically discarded. The one thing we didn't want is to silently lose existing data that is already there in the research object. The user has to be aware that if he removes something, there may already be existing datasets connected to the research object, so he needs either to delete those from the research object or to make sure the new version also includes them. Once these conditions are met, as I said, a snapshot is generated and then the annotations of the research object are replaced. That's how it works.
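Putting the two pieces together, the sketch below illustrates what the admin-side field mapping and the import call could look like; the question IDs, predicate choices, endpoint and parameter names are assumptions for illustration, not ROHub's documented API.

```python
from pathlib import Path
from typing import Optional

import requests

# Map Argos question/answer IDs to predicates from existing vocabularies
# (Dublin Core terms, schema.org, the DMP ontology). IDs and choices here
# are invented for illustration.
FIELD_MAPPING = {
    "question-12": "dcterms:description",  # "Describe your dataset"
    "question-13": "dcterms:license",      # "Under which licence will it be shared?"
    "question-14": "schema:contactPoint",  # "Who is the contact person?"
}

def import_dmp(rohub_url: str, dmp_xml_path: str,
               argos_dmp_url: Optional[str] = None,
               dmp_json_path: Optional[str] = None) -> requests.Response:
    """Send an Argos DMP export to a (hypothetical) ROHub import endpoint.

    The XML export is the main source; the Argos URL (used to add a
    back-reference from the research object to the DMP) and the JSON
    export (which carries some fields the XML lacks) are optional.
    """
    files = {"dmp_xml": Path(dmp_xml_path).read_bytes()}
    if dmp_json_path:
        files["dmp_json"] = Path(dmp_json_path).read_bytes()
    data = {"dmp_url": argos_dmp_url} if argos_dmp_url else {}
    return requests.post(f"{rohub_url}/import/argos", files=files, data=data)
```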
So now, what are we working on, and what is the future work? There are still many open tasks, many things we want to do with this integration. The first is to propagate changes in ROHub back to Argos, going the reverse way; that's something we haven't done yet. Then, of course, exporting existing research objects to Argos: you already have many research objects in ROHub, so can we generate, or at least pre-fill, the Argos DMPs with information from ROHub? We have already done an initial test on that; we are able to do some export: we can export the research object to XML and then import this XML in Argos. So this is what we are working on now; there is still a lot of testing to be done on this side, but it will be something very interesting as well. Of course, we need to define templates in Argos aligned with the DMP requirements, especially in our case at the national level in Poland: we have many requirements, especially from the national funding agency, NCN. We have a first version of that as well, but it needs to be validated, discussed and adjusted in collaboration with this agency. We also want to extend the Argos GUI to enable direct export to ROHub: instead of doing this manual export and then import, it would be great to have, right there in Argos, an export to ROHub, with the whole process done automatically for the user. That's really something we want to do as a priority, because it would be a really nice thing to have. And the same for the ROHub GUI, to enable export to Argos. Another thing on our to-do list is to provide automatic suggestions for the mappings I showed you before: for the JSON that is generated with the questions and the answers, the idea is that we can automatically provide some suggestions based on some kind of NLP. That would be really great as well; this is something we have some ideas about and would like to do, but it is further in the future. We also need to integrate national repositories with ROHub, to allow searching for and connecting existing datasets to the DMP, or adding new datasets from the DMP into these repositories. That's also something we want to do. Of course, we need to do a lot of improvement, testing and evaluation at the national level, and alignment with the funding agency systems as well, because there are systems in the national funding agencies, and we would like to produce or export the DMP information in alignment with whatever those national systems can import, if possible, or at least make it easy to transfer into those systems, to avoid duplicate effort from the researchers. And if there is any possibility to use the Argos APIs, that is something we are open to discuss as well. And yeah, some clarifications on issues we found, and why we are using the two files I mentioned at the beginning. We realized some things: in the XML, the funder, grant and project IDs are normally Argos's internal IDs, while in the JSON file you get information pointing to, for example, Zenodo IDs, which are discoverable. So this is something we need to figure out. Also, dataset descriptions are missing in the XML, and contact data is missing from the XML but is in the JSON file. And there is a problem with the profile IDs, because they are not the same for each user, so we cannot rely on those IDs.
Some additional data is missing too, for example some fields that have no XML counterpart. So there are some things we have found and have tried to solve, at least partially, but of course we still need a lot of discussion and interaction with the Argos team to make sure we can overcome and solve all of those issues. And this is just a list of resources you can use regarding ROHub: where to find the documentation, tutorials, the helpdesk and so on. That's all from my side, so thank you very much. Thank you very much, Raúl, that was a great overview of everything: the research objects concept, ROHub and the integration. I can assure you that we are already working on the API to facilitate better integration; thank you for sharing that with us as well. I see some questions in the chat. One is by Paulette: why import the datasets themselves into ROHub? References should be sufficient. As I said, you can have references; the research object is an aggregator of references. You can of course include the dataset itself, but the main idea is to have references to the datasets; that's the main idea. Thank you. And then there's a question: is any work being done on further developing the RDA common standard to fit the evolving needs of scientists, their institutions, repositories, funders, etc.? So I'm not directly involved there, but I can comment, because I'm co-chairing this group, the interest group for active data management plans in RDA: there are no ongoing activities around this at the moment, but we are considering what an extension of the current common standard could look like in the future. So we are exploring this idea and how it could look. I understand that this RDA common standard is also the one behind the DMP ontology, right? Yes, yes, exactly, the one I mentioned. Then Diego Mugave asks: thank you Raúl, can we possibly have your presentation? Yes, we are going to share everything; it is going to be linked on our Argos community call page. We will add it there as well, so after the presentation you can find everything there. And again, Paulette says that's why she doesn't see why to import data. As I said, we don't import the data; we just add references to where the data is. And the idea here, of course, is not only that you will have all of this information in the same place with all the metadata: you can also connect other resources that are related to the DMP in the research object. And I didn't go into the details of what else you could do once you have everything connected there. But, for example, if you have software, notebooks or whatever in the research object, you could directly open these in EGI Notebooks and start working with the data that has been produced in your DMP. Once you have everything in the research object, there are many other things you could do. That's the main idea.
And of course, as I said, we also have services for assessing FAIRness, for example, already connected to ROHub, giving you the FAIRness not just of the research object but also of the individual resources, that kind of thing. Yeah, I think you're in agreement; you're talking about the same thing Paulette mentioned. Any other questions? Or I can ask one question maybe, to give you a bit more time to type, or please feel free to unmute and have a discussion with us. My question is: you mentioned that you have a national instance in Poland. Can you say a bit more about that, and whether ROHub is used in other countries or other institutions? Yeah, so, as part of all of these activities at the national level, we deployed Argos to be used in Poland. We actually did the whole translation into the Polish language, so now everything is also readable for the researchers here in Poland. And this is, of course, connected, although the instance of Argos is not connected directly to ROHub for now; it is done via the export and import, as I said. When we say we want to improve the Argos interface and GUI, we would of course need to do that at the national level first, but hopefully this can also become an option in the general instance of Argos at some point. The idea is that we would have this connection directly to ROHub, and ROHub has a main instance, the general instance; we didn't have the need for different instances of ROHub so far. We also give researchers the possibility to put their data there at the Polish level, and everything is translated into Polish as well, of course. So the two main languages of ROHub are now English and Polish. Okay, let's see. Maybe I can contribute to have Greek also, or someone else can contribute. Yeah, yeah, of course. We have ROHub in other languages, and we can add the translations for Argos too: there are some main configuration files for this. Okay, so you have everything and then you just translate it from English. Okay, that's good to know. Thank you. Are there any other questions? Please feel free to unmute. Okay, then, if there are no other questions: where can we find you, Raúl, if we would like to explore this idea of localization, or of using ROHub in general? Yeah, sure, we would be very happy about that. The first thing that would be really nice to see is your input and feedback on how the research object is generated from the DMPs in Argos. So I would invite you, when this is a bit more tested, maybe in some days, when the latest changes I was just mentioning are released (they are going to be released soon), to give your feedback, because you are working on the DMPs basically all the time, so you know very well how this should look from the research object perspective, and any kind of opinions and recommendations would be really useful. Thank you for this open call for people to provide their input. Thank you. Yeah, thank you for all the nice comments in the chat as well. It was very informative.
Thank you so much, Raúl, for being here and joining our community call. Make sure to add your links, the ones you have in the presentation, so everyone can reach out to you. And we'll come back, yes, we'll of course continue our collaboration and discussion offline. Thank you very much everyone for joining. I hope this was interesting; it was something different this time, and I hope it was interesting to you as well. Talk to you next month. Bye bye.