My name is Marc Portier and I'd like to welcome you to my home for this webinar on the work zones in open science. Thanks for showing up, and a big thanks also to Open Belgium for bringing us together today. The slides are available and free for reuse under Creative Commons, but I do appreciate your attribution and your determination to share alike. I have a limited amount of time and way too much content, so let us quickly get some slides out of the way. Since a year and some days I have been working at VLIZ, which is the Flanders Marine Institute. We are 20 years old and based in Ostend on the Belgian coast. Our mission is to enable and advance marine research in all possible domains, and that ranges from biology, geography and ecology, and even includes history and medicine, all the way up to psychology. We do this to advance society in general, and therefore we target various audiences that we see as important stakeholders in and around the benefits of and responsibilities for our beloved seas. Narrowing down to my role, I only have time to mention the department I work for, which is the VMDC. In English, that acronym expands to the Flanders Marine Data Centre. Our focus is on data, data management, and the design of data systems to support it. We participate in a great many projects, all with different levels of geographical coverage and outreach. Most notably, I think, we publish a number of data products for the marine research domain. Those range from something like the ScheldeMonitor, which shows our local expertise and connection, scale up to the European level with projects like EurOBIS, which tracks marine biodiversity, and even span the globe with the reference databases we manage and govern for marine species and marine regions. And concerning open science, we are active on the Flemish level inside the Flemish Research Data Network and the Flemish Open Science Board.
And in Europe, we participate in projects of the European Open Science Cloud. Within the VMDC, together with three colleagues, I make up the open science team. We make sure the organization as a whole realizes and keeps furthering its open ambitions. So what we are doing is implementing the FAIR principles, advocating to and supporting research groups, and adapting the data systems to be open by design. In doing so, we very much like and embrace open source and linked data technologies. And if this sounds appealing to you, keep an eye on the /jobs page of the VLIZ website, because we are working towards an extra job opening in the team. And this is me, and the various ways to contact me or find me. Ah yes, this slide. In some form or another, it has been in every public speaking slide deck I've given since 2009. Honestly, honestly, and honestly: the web is totally awesome, on so many levels, and I think we should regularly have parties just to remain consciously aware of that. It turns out this slide keeps being relevant, so I keep it in, and it surely will be relevant today. We will come back to this fabulous web. Right, now that all of that is out of the way, here is the actual agenda for today. I see this as six main blocks: the first one lays down some introduction and overview of what open science is, and then come five actual identified work zones. That could turn out to be too much, but let us be ambitious. Finally, this is 2021 and we do this online, so I don't hear nor see you. The point is that we would very much like your feedback; even more than only your feedback, we actually want to reach out and recruit some of you to collaborate on some of these tasks. To achieve that goal, I would like to ask you to use the co-writing space we have foreseen, which should have been shared in the chat as well. And at the end, you might prepare yourself to open up your own mic for our audience participation part.
Also, some of my colleagues are around, so they might interact on the chat while I am talking. As an experiment, I made an effort to include QR codes for all the links I mention, so if you have a device with a QR scanner around, you can follow up on those as well. And a last tip: my voice is mostly not going to read out what is on the slides. I expect you to be able to do that for yourself, and to allow my comments to loosely provide additional insights around that structure. So, on with the show. When I talk about open science, I almost always end up doing some FAIR bashing. If you haven't heard about it, the FAIR acronym is the first thing you learn when coming to the open science field. It's a bit like a rite of passage or a painful tattoo, something like a required code to become part of this club and to sign into its belief system. The letters spell out the expectations for all data and publications: all of them should be made findable, accessible, interoperable and reusable. Don't get me wrong, it is a very smart acronym and a very suitable top-level shopping list of the things to cover. However, it is also becoming its own purpose. Most importantly, in my personal opinion, it is too often translated into a simple checklist, one that allows people to declare "we have done enough" and so reach the end of their responsibility; they just need to check all four boxes as quickly as possible. So instead of taking that route, I want to invite you to a more open and unbound way of thinking about open science. This is what we take as the guiding user story for the work we do: to provide a relevant slice of the global research data set, as simply as a Google search. We actually have this packaged in our Jira as bug number one for the complete team, and we believe it's probably going to take us until we retire to get it done. We see this as having two parts. The first part is about the natural way of finding the data.
If you go to Google today, you will be able to have it answer quite convoluted questions. An example: who is the actress playing the wife of Stephen Hawking? Or: who is the director of that movie? This kind of natural way of finding is precisely what you would like to achieve for the data lookups in the realm of scientific work. On the left side of the slide, you see a number of examples from the marine domain of how such searches could be expressed. In contrast, on the right side of the slide, we have listed all the steps that are involved today to actually answer those questions. And if you look carefully, semi-transparent behind the list in the background, you see a big pie chart, which brings me to my next slide. It turns out that no less than 79% of the working day of a data scientist is wasted on those steps, all of them avoidable overhead. If this 79% could be reduced to 18%, it would shrink their working day to only one third: 21% of real work plus 18% of remaining overhead makes 39%, roughly a third of the original day. If we turn that around, we could make every full-time hire of one data scientist count as three. And we believe it is precisely this kind of web-scaling that open science should aim for. Society is funding the research, so logically it keeps the right to expect a high return on investment. Open science is often sold on high ethical motives, but we should not be shy to claim economic goals as well. We should make, and deliver on, a promise to reduce the cost of doing research. But open science doesn't stop at targeting the science community alone. With the second part of our ambitious bug, we want to stress the need to target the general public. The research community needs to push its quality information out on that same web everybody has learned to use. It has to be available there, and it has to be made high-ranking. At the bottom, you'll find my catchy Twitter quote on the subject. If you agree, scan the QR code and give it some of your Twitter love.
And remember, this event is using the hashtag #OpenBelgium21. So now we have a challenging job, and we see two parts in it; in our heads, both parts lead us to the web. First as an example of easy and natural search, but also as the platform to reach everyone out there. So we believe the backbone of this fantastic web, especially the principles of the semantic web, is the foundation for open science. Which brings us to what we call the IKEA observation slide. It should explain to you the subtle difference between semantics and metadata, and it also shows the connected unity of this whole trinity: it is always data, metadata and, never forget the third one, semantics. We need them all, and they should be kept together. I mentioned the smart Google search earlier, but you might remember this scenario; it's less than a year ago. You have booked a flight, and suddenly your email client receives a confirmation email and successfully assists you in adding that event to your calendar. And you have to understand: this is not an effect of some spooky artificial intelligence or any big-brother stuff. I know it looks like they know about your plans, but in reality this effect is driven by clear and straightforward semantic annotations inside the message. There are just some schema.org terms hidden inside the HTML of your email, and those allow your mail client to understand part of the information that you are reading. Magical effects like these are easily achievable when we add semantics to the data; that is what the phrase "making data machine-actionable" really is about. And then going back to research: I think, maybe even worse than in other domains, the attention for semantics has been absent, and allow me to put that in context. The required understanding of meaning and intentions has not been neglected; rather, it has simply been kept implicit, assumed to be known among the smart peers.
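To make that booking-email effect concrete, here is a minimal sketch of the kind of annotation involved. `FlightReservation` is a real schema.org type used for exactly this purpose; the concrete booking details below are invented for illustration.

```python
import json

# A minimal schema.org annotation of the kind hidden inside a booking
# confirmation email. FlightReservation is a real schema.org type; the
# concrete values below are invented examples.
reservation = {
    "@context": "https://schema.org",
    "@type": "FlightReservation",
    "reservationNumber": "RXJ34P",  # hypothetical booking code
    "underName": {"@type": "Person", "name": "Marc Portier"},
    "reservationFor": {
        "@type": "Flight",
        "flightNumber": "SN123",    # hypothetical flight
        "departureAirport": {"@type": "Airport", "iataCode": "BRU"},
        "arrivalAirport": {"@type": "Airport", "iataCode": "LIS"},
        "departureTime": "2021-03-15T09:40:00+01:00",
    },
}

# Embedded as JSON-LD inside the HTML of the mail, this is all a mail
# client needs to understand the message and offer "add to calendar".
html_snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(reservation, indent=2)
    + "\n</script>"
)
print(html_snippet)
```

No artificial intelligence is guessing anything here: the producer of the email states the semantics explicitly, and the consumer simply reads them.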
But in reality, in most cases, you have on the one side a knowledgeable producer of the data, talking via some intermediate data system to an unknowing, random, external consumer. It is therefore essential that all insights and understanding, all semantics, are captured and made explicit. To conclude, all of this shows that the access part of open science is only the tip of the iceberg. You might remember that Richard Stallman of the Free Software Foundation at some point clarified that the "free" in free software should be read as in "free speech", not as in "free beer". In the same way, I think the "open" in open science really refers to "open-ended", not to open cans. Open is about taking up the extra work. The data is not released only so others can repeat your own findings and experiments. Instead, it needs some extra work: it must be prepared so it can be rehashed, reassembled and recombined in future new contexts. Its original meaning should be kept, but its reuse should allow solving new problems. And here we also come back to targeting audiences outside the inner circle of the research community. The shared data should function as an open invitation to three new classes of players. One is the scientists from other domains. Then you have the citizen scientists and the general public. But we should also target machines: brains that are wired totally differently than we are, and that could show us some new perspective on the data. In one poetic quote, we call this rule zero of open science: if you're doing it only for the scientists, you are doing it wrong. So at last, we are ready to get into the work zones. Number one is about vocab management and lookup. Just to make sure we are all on the same page: vocab is just short for vocabulary, and that word should make you think about dictionaries.
I should actually now explain that RDF is the language of the semantic web and introduce you to its grammar, but we don't have time for that, and we don't need it. It is enough to realize that any language relies on having good dictionaries: lists that hold and explain all the terms in use. So hold on to this dictionary image for a moment — for the Flemish among us, think of the Dikke Van Dale — and allow me to make just two adjustments. The first is conceptual: unlike natural languages, which can be messy, these vocabularies make each term totally unambiguous and clear by itself. That means that no extra context is required, and also the other way around: no added context could change the meaning. So when they are used in a statement, there should be no confusion about how to read and interpret it. The second adjustment is a smart technical trick: the terms themselves are spelled out like web addresses. And yes, that makes it very modern-web fashionable, but there is logic to this fancy madness. A lot can be said about it — namespacing, DNS management and whatnot — but for today's story we keep it at appreciating only one specific property of these URLs. It is known as the follow-your-nose property: you can just type these terms into a web browser and end up at a page that gives you the explanation. So doing a dictionary lookup in this system is just as simple as using your web browser. Finally, to make it totally concrete, here are three examples from the marine science domain. The first example shows a universally accepted term identified by Marine Regions. This particular one is numbered 2350 and is in fact referring to the North Sea. The second one is about my favorite marine species, which people sometimes call the horseshoe crab. But the correct worldwide way to talk about this beautiful creature is to use http://lsid.tdwg.org/urn:lsid:marinespecies.org:taxname:150511.
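The follow-your-nose lookup can be sketched in a few lines. The point is that the very same term URI serves humans and machines: real resolvers honour HTTP content negotiation, so asking for HTML gets you the explanatory page, while asking for Turtle gets you RDF. This is only a sketch of the request preparation; the actual network call is left commented out.

```python
from urllib.request import Request  # stdlib; actual fetching is left out

# The example term URIs from the slide, spelled out as plain strings.
NORTH_SEA = "https://marineregions.org/mrgid/2350"
HORSESHOE_CRAB = "http://lsid.tdwg.org/urn:lsid:marinespecies.org:taxname:150511"

def follow_your_nose(term_uri: str, machine: bool = False) -> Request:
    """Prepare a lookup of a vocab term: same URI, different Accept header.

    With machine=False a browser-style HTML page is requested (the human
    'dictionary lookup'); with machine=True we ask for RDF (Turtle) instead.
    """
    accept = "text/turtle" if machine else "text/html"
    return Request(term_uri, headers={"Accept": accept})

req = follow_your_nose(HORSESHOE_CRAB, machine=True)
print(req.full_url, req.get_header("Accept"))
# Actually dereferencing would be: urllib.request.urlopen(req).read()
```

The same trick works for any vocab term: dereference the identifier itself, and the web delivers the definition.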
And last on the slide, you see that piece of marine lab equipment that looks a little bit like a photocopier. It is in fact used to detect and measure plankton species in water samples, and it's known as the ZooScan. The really good thing is that the unambiguous identifier for it has been coined by our British colleagues. So I hope this gives you a feeling for how exposing our own data sets using these terms is actually the way we ensure they can be understood and compared with the rest of the world. Now that you understand what vocabs are, if you didn't already, you hopefully also feel how important it is to know all the words and all the dictionaries, and to use them correctly. And this is precisely what this first work zone is about: making sure people get tools that help them assign the correct vocab terms in the context of their work. The slide lists only two services you can find on the web to search for the terms that are available; many more are around, and they are very useful. But these listings are very general: they are not tuned to the subset shared between the members of your research team. Also, they might not be tuned to match your own language, nor include local search terms or even local slang. And thirdly, there is no easy way to integrate these services with the applications your team uses. So we put all of that into one use-case diagram, and this is what we'd like to see. We introduce here some vocab admin, a role that governs the vocabs your team should be using. At this level, popular local search terms, translations or team slang could be added to make the terms findable. Next, and that's the green section, we consider how custom applications could use widgets that do lookups directly into the selected lists of searchable terms. And by using this setup, we aim at reaching the end goal: having actual scientists apply the correct terms to the data they are sharing.
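The vocab-admin idea can be sketched as a toy in-memory service: canonical term URIs enriched with translations and team slang, so a lookup widget can match whatever a user types. The alias lists here are invented examples; a real service would sit behind the widgets mentioned on the slide.

```python
# A toy in-memory version of the 'vocab admin' idea: canonical term URIs
# enriched with translations and team slang so a lookup widget can match
# whatever a user types. The alias lists are invented examples.
VOCAB = {
    "https://marineregions.org/mrgid/2350": {
        "label": "North Sea",
        "aliases": ["Noordzee", "the pond"],  # translation + team slang
    },
    "http://lsid.tdwg.org/urn:lsid:marinespecies.org:taxname:150511": {
        "label": "Limulus polyphemus",
        "aliases": ["horseshoe crab", "degenkrab"],
    },
}

def lookup(query: str) -> list[str]:
    """Return the URIs of all terms whose label or aliases match the query."""
    q = query.lower()
    return [
        uri for uri, term in VOCAB.items()
        if q in term["label"].lower()
        or any(q in alias.lower() for alias in term["aliases"])
    ]

print(lookup("noordzee"))
```

Whatever the user typed, what gets stored in the data is the one unambiguous term URI.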
And remember rule zero of open science: we're not doing it only for the scientists. It comes down to this: all of this actually becomes even more useful in a context of citizen science. With tools like this, we can assume members of the general public are assigning the proper semantics to the data they are producing, making it more available for the researchers. Finally, this slide introduces a proposed set of building blocks to get the job done; it is just one idea. Talking in depth about this is part of the invitation of today, but not here and now; I have four more zones to cover. The second one goes back to the trinity of coexisting data, metadata and semantics. It centers around the search for a unifying semantic research data package. We already talked about the wasted efficiency of our data scientists, but allow me to revisit the topic. When scientists go out to grab existing data sets, they do so full of hope and dreams. They're dreaming about finding information; dreaming that whatever they find will be transparent and clear to reuse; dreaming it will be complete and not start a difficult search to grab all related parts on cloud services; dreaming it will be interoperable with their own systems and workflow, so they can actually get to work. However, little surprise, we already said this: if they actually do find data, it is most of the time packaged into some form or format that looks like a treasure box, only willing to disclose its value to the original owner. And the more striking part of that observation is the other side: the very same people are also daily working towards sharing the artifacts of their own research, but are packaging them in similar closed boxes, and they do this despite the best of their intentions. To my feeling, it's not due to a failure on their part; they just lack the tools to do a better job in this area.
So, even further: despite the uptake of RDF, semantics and knowledge graphs, I honestly do not believe that the two-dimensional table approach towards data is going away soon. Think about it: scrolling, sorting, plotting, comparing, calculating, even scripting, you name it — all data manipulation today assumes data being in rows and columns, rather than in mind maps or graphs. The good news is that we know how to map the one onto the other, but the tooling that makes that as accessible as the common spreadsheet is simply not there. Another challenge is that more and more data sets are, in essence, scattered around in a number of places. Consider big data systems: you can't contain all that data in your package. Consider remote repositories holding images or DNA sequences: again, too big to be contained inside your package. And then you have the workflow systems in the cloud. On top of that, you keep on having a lot of local stuff: your mapping files, actual instrument measurements, what have you. And keeping track of that complete combination, keeping track of all the relations between all of that, always turns out to be solved by some invention on the spot: a highly custom, personal fit, mostly without any documentation, and absolutely without any formal semantics. The result is that we introduce obscure layers of wrapping that turn that well-intended glass box into the dreaded treasure box. I know that up to now I've mostly been talking about the lack of tools, and yes, this slide does so as well when it suggests there should be an ecosystem of supporting tools. But the elephant in the room is that we don't have a common data package standard to capture and share the three parts of the trinity. Well, that's not entirely true: there are a number of standards emerging, and I'm choosing today to share my personal favorite in this field. It's called Research Object Crate, in short RO-Crate. And I like it for a number of reasons.
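The claim that we know how to map tables onto graphs can be made concrete in a few lines: every cell becomes one (row, column-property, value) statement, once someone has said which vocab term each column means. The column-to-property mapping and the sample row below are invented examples; the two property URIs are real Darwin Core and schema.org terms.

```python
# A sketch of mapping the familiar rows-and-columns view onto a graph:
# every cell becomes one (row-id, column-property, value) triple. The
# column-to-property mapping and the sample rows are invented examples;
# the property URIs themselves are real Darwin Core / schema.org terms.
COLUMN_MAP = {
    "species": "http://rs.tdwg.org/dwc/terms/scientificName",
    "lat": "https://schema.org/latitude",
    "lon": "https://schema.org/longitude",
}

rows = [
    {"id": "obs-1", "species": "Limulus polyphemus", "lat": 51.2, "lon": 2.9},
]

def table_to_triples(rows, column_map, base="https://example.org/obs/"):
    """Turn each row into triples: the row becomes a subject, each mapped
    column a predicate, each cell an object."""
    triples = []
    for row in rows:
        subject = base + row["id"]
        for column, prop in column_map.items():
            triples.append((subject, prop, row[column]))
    return triples

for triple in table_to_triples(rows, COLUMN_MAP):
    print(triple)
```

The hard part is not this conversion; it is getting the column-to-term mapping captured at all, which is exactly what the tooling should make effortless.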
The first is simplicity. We all know how to structure files and folders, and then group them in an archive like zip or put them on a versioning system like git. So let's just do that. Second, it embraces semantic web technology. It has this ro-crate-metadata.json file, which is a JSON-LD file sitting next to the tabular CSV files you put in the package, and through that JSON-LD you can add semantic statements about them. For instance, you use the CSVW (CSV on the Web) vocabulary to express what the meaning of the fields in the CSV is. Third, it doesn't care if the described data is in your data package itself or in some external repository, so it deals with this modern reality of working half in the cloud and half on local data. Fourth, the backgrounds of the people behind it, the people making the standard, are a mix of alpha and beta sciences. So, by design or by coincidence, I don't know, this group is forcing itself to be explicit about the meaning of what goes in. So do have a look at it. The link on top brings you to their website and the specification, and the links at the bottom are a fast introduction slide deck and a YouTube movie delivering it; I can really recommend the latter for a fast entry. Of course, we all know XKCD cartoon 927 on standards. I am a fan of XKCD, and every fan knows that Randall Munroe is always right. I think that holds in this case too: when it comes down to standards, we all have our own pet peeves, and we are more than happy to translate those into motives to stick to what we have already been using. Definitely the middle panel here, the one universal attempt to replace them all, is totally recognizable. Most of the time, however, the resulting pile of standards resembles the image on the left, which is the ancient Indian model of the universe: half of a sphere, carried by four elephants, supported by a big turtle.
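The ro-crate-metadata.json idea can be sketched as follows. This is a stripped-down illustration, not a fully conformant RO-Crate: the crate name and CSV file are invented, and the CSVW terms are shown with a compact `csvw:` prefix rather than a full context declaration.

```python
import json

# A stripped-down sketch of an ro-crate-metadata.json file: a JSON-LD
# document sitting next to a CSV file in the package, describing that CSV
# and (via CSVW terms) the meaning of one of its columns. Simplified
# illustration only, not a fully conformant RO-Crate.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Plankton counts demo crate",  # invented example name
            "hasPart": [{"@id": "counts.csv"}],
        },
        {
            "@id": "counts.csv",
            "@type": "File",
            "encodingFormat": "text/csv",
            # CSVW-style statement of what a column means:
            "csvw:tableSchema": {
                "csvw:column": [
                    {
                        "csvw:name": "species",
                        "csvw:propertyUrl": "http://rs.tdwg.org/dwc/terms/scientificName",
                    }
                ]
            },
        },
    ],
}

print(json.dumps(crate, indent=2)[:120])
```

Note how the data (the CSV), the metadata (the Dataset description) and the semantics (the CSVW column-to-term statements) travel together in one package.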
And below that turtle, turtles all the way down: layers upon layers that never actually achieve completeness nor consistency. But allow me to replace cynicism with optimism. There are in fact useful ways to compare standards and objectively describe their properties. The first link here points to the design principles written down by Tim Berners-Lee about the web, I think roughly 20 years ago. It's definitely the kind of web archaeology you should go find. I dig it up every three years or so, and each time I get extremely inspired by it. It also introduced me to the old Chinese wisdom on the slide, claiming that the usefulness of a teapot comes from the fact that it is empty. And you have to think about it, right? The emptiness is the thing; the emptiness of the teapot is what allows it to hold your tea. I think this probably is the first incarnation of the modern marketing slogan "less is more". To be honest, that modern version never really made sense to me, but this original one, this one I feel. Anyway, back to standards. Just like teapots, they allow broader usage by limiting themselves: by being concise and focused, by removing all assumptions, by allowing extensions and evolution, by embracing collaboration with other good solutions and standards. This is the kind of thing that made the web as fabulous as it is, and to my feeling, it's the same kind of meta-properties I can recognize in the design of the RO-Crate standard. Anyway, with a semantic package standard like RO-Crate, we can go back to our work zone. So actually, whatever glass-box standard definition we end up using, we will need to face the remaining challenge: we will need to revisit all our treasure boxes from the past and convert them into their transparent counterparts.
At VLIZ, we have drafted a process that switches between, on the one side, automated detection of the fields in a package, and on the other, a human expert assistant who narrows down and actually decides, assigning the correct vocab term to describe those fields — which indeed links back to work zone one. For that, we sponsored last year an Open Summer of Code 2020 project. It was called ShimDoc; we can argue about the name, but it made a first attempt at and prototype of precisely this. In all honesty, it needs quite a bit of extra love and attention, but the original design plans and prototype are available and linked from this slide. So if you have any wild thoughts or wild ambitions in this area, we are very open to discussion and further tinkering. Remember: leave your notes in the shared document. And just to conclude, think about having a system that can convert these treasure boxes into glass boxes. If we have such a system, we can rethink our repositories and archives for science data as well; we can actually give them this extra level of smartness. If you look at the current repositories, they are content with adding a limited set of metadata fields, and those then get attached as a meager discovery label to the treasure boxes. But with this new approach, helped by some semantics discovery, we could have the repository equipped to do the conversion into glass boxes. And by doing so, we would make the packages discoverable and interoperable on the semantic level of the data, and not only on the level of the metadata. I think that's really important. Actually, the same line of thinking can bring us even a stage further, if we adopt the ideas of work zone three, which I called the machine-actionable DMP.
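The two-step detect-then-decide flow described above can be sketched very simply: an automated step proposes a vocab term per detected field, and a human expert confirms or overrides. The suggestion table and field names are invented examples; a real implementation (like the ShimDoc prototype) would be far richer.

```python
# A sketch of the two-step flow: automated detection proposes a vocab term
# per detected field, and a human expert confirms or overrides. The
# suggestion rules and example fields are invented.
SUGGESTIONS = {
    "species": "http://rs.tdwg.org/dwc/terms/scientificName",
    "lat": "https://schema.org/latitude",
}

def propose(field_names):
    """Automated step: guess a term for each field, None when unknown."""
    return {f: SUGGESTIONS.get(f.lower()) for f in field_names}

def confirm(proposal, overrides):
    """Human decision step: expert overrides win over automated guesses."""
    return {f: overrides.get(f, term) for f, term in proposal.items()}

auto = propose(["Species", "lat", "depth"])          # 'depth' stays unknown
final = confirm(auto, {"depth": "https://schema.org/depth"})  # expert fills the gap
print(final)
```

The machine does the boring bulk work; the human only handles the cases the machine cannot decide, which is exactly the division of labour the work zone aims for.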
Again a term I have to introduce, because I myself had never heard about DMPs before starting at VLIZ. DMP stands for data management plan, and the top-level way to look at one is to consider it a contract. They list the expectations and agreements, all of them concerning the handling of data, and they are shared between all stakeholders of any specific research project. At this stage, many organizations and funding agencies in research require you to have these, and I think that testifies to the growing importance of data in research. You'll find here a link to just one example, but if you need to build one for yourself, there are services out there that help you cover all the needed aspects. The platforms here typically guide you through a set of templates and questions, so you don't forget anything. Definitely, these DMPs are important and useful. They're also formal, and more and more they follow these template structures, but still they are free-form text, and so they are only understandable to fellow humans. So again, people have been thinking about applying the trick of semantics to these documents too. The big question is: can we have these structured so machines can read them too? The hope is that such efforts will lead to automated processes, robots if you like, that help you out in applying the rules of the DMP. For us, to fit it into this talk, which is definitely about tools and support, we read that ambition to automate more as a way of assisting and enabling, and not only as a way to automate policing and validating, which is the classical and easy approach. So here is our thought: we should think about having a platform for building automated data management assistants.
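A machine-actionable DMP makes such assistants straightforward to build. The sketch below reads a tiny DMP fragment and turns its agreements into concrete to-do items; the field names loosely follow the nesting of the RDA DMP Common Standard (dmp, dataset, distribution, license), but the content and the derived tasks are invented examples.

```python
# A sketch of the 'automated data management assistant' idea: read a tiny
# machine-actionable DMP fragment and turn its agreements into concrete
# to-do items. Field names loosely follow the RDA DMP Common Standard
# (dmp > dataset > distribution); the content itself is invented.
madmp = {
    "dmp": {
        "title": "Demo marine project",
        "dataset": [
            {
                "title": "CTD measurements",
                "distribution": [
                    {"license": [{"license_ref": "https://creativecommons.org/licenses/by/4.0/"}]}
                ],
            },
            {"title": "Interview notes", "distribution": []},
        ],
    }
}

def assistant_todos(madmp: dict) -> list[str]:
    """Derive simple, human-readable tasks from the DMP 'contract'."""
    todos = []
    for ds in madmp["dmp"]["dataset"]:
        if not ds["distribution"]:
            todos.append(f"'{ds['title']}': no distribution yet - plan where to publish it")
        for dist in ds["distribution"]:
            if not dist.get("license"):
                todos.append(f"'{ds['title']}': distribution lacks a license")
    return todos

for todo in assistant_todos(madmp):
    print("-", todo)
```

The same structure could just as well drive a validating robot; the point of the work zone is to aim the automation at assisting first.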
These would essentially learn from the DMP what needs to be done inside the project, and then produce a handy user interface to help you with it. As such, it could become the helpful guide nudging you into doing the right things from the start. It will definitely tie in with work zone two, because I think in the end it will produce RO-Crate packages, and those will hold the data trinity. It will probably also reuse the vocab lookup widgets from work zone one. In a scenario like this, I think we should be able to completely avoid the treasure-box trap, by never getting into it. Anyway, without limiting your own imagination about such a platform, this slide offers some personal suggestions on what it should try to achieve. It's an assistant, so in my mind it should be close by: it should have a desktop-like user interface, but it should also embrace the best of the web and be browser-based. So I think some kind of localhost service would be best. After all, it also needs to seamlessly integrate with the cloud platforms listed in the DMP. Internally, it might be totally relying on knowledge graphs, but it definitely should have a natural presentation of the data that looks like tables and spreadsheets. And finally, yes, it should support a set of API hooks, so it can be tied into scripting languages, but also be used in workflow engines. Good, I'm glad we already made it here, and I hope you are too. We have two more to go. I realize I'm stretching your attention and not even giving you a break, so let's do that now. Okay, let's take 15 seconds of gymnastics: actually stand up, use it to stretch a bit, drink a glass of water, bend your neck, loosen up. Okay. The good news is that the remaining two zones are quickies: I only have some early principles and ideas, and I hope you will take care of the details. Okay, everybody ready?
This is the final stretch. Work zone four is about linked data publishing. Now, the first three zones hopefully brought some uplifting positive vibe — okay, we've covered all these nice things we can do with semantics; I hope I did — because now is the time to tune that back just a little bit. I'm already sorry for that. The sad truth is that only working with vocabs and data standards is really not enough to achieve interoperability. We cannot overlook how these vocabs and standards get applied in web services and protocols. On this slide, you see a number of systems mentioned that are used in our marine research domain. And the observation is that despite the common belief in using shared vocabularies, even adopting them in some semantics-aware data packages, and all of us using this wonderful web, they still end up not being interoperable out of the box. We still face the fact that inside our own domain, subsets of data are closed up and thus hidden from other subsets. Of course, they can be converted, but that always requires additional coding, and that implies cutting some corners. The outreach to other domains is limited in an even bigger way, because many of these standards only make sense inside our own community. Speaking for myself, only WFS was known to me; all the other ones were new, and I have been active in the web domain for the last 20 years. So there really is stuff that is only used inside this community. Personally, I think we have been focusing on the wrong preposition. And I mean preposition as a play on words, because I want to target both the preposition, in Dutch het "voorzetsel", as well as the similar-sounding proposition, the guiding idea, het "voorstel". I think all of us developers have spent tremendous and well-intended efforts to link a vast number of existing data systems onto the web. And in many cases, we have been adding layer upon layer upon layer, stacking turtles, to achieve this.
Often, though, we have neglected to really embrace the web design principles and let them influence our legacy backend systems. Those systems, and the layers on top of them, still do not conform to the beautiful design properties we need. Going back to the less-is-more Chinese teapot, this is my conviction: we should aim at removing layers, aiming for fewer layers. We should stop publishing data onto the web and instead seek ways to have our data be truly in the web, following the nature of how the web works. One practical example I see is about blending away the arbitrary differences between data sets and data services. People don't even question the clear difference between both; nobody even wants to think about them being the same thing. But we have to realize that any distinction between the two only lives on the side of the producer of the data, because in either case the consumer view is identical. The consuming scientist just sees a generic data provider: an accessible URL. He or she doesn't care about that URL holding parameters or not; it is just there to produce some response. And in both cases, the response should have this glass-box transparency we are expecting. So the transparency rules we talked about should be applied to our services as well as to our data sets; they should be equally semantically described. I only mention here Hydra CG as one possible standard. Now, sketching the blueprint for this complete work zone is a challenge, and frankly a lot bigger than what we can chew on in the time we have left. I just have this list of elements that I think are part of the solution. And I have to admit, when adding all these links to the slides, it did kind of feel like this QR-code ambition was starting to look a little bit silly. But the real silly thing here is that all these pieces of work are coming from the same single research group, based in Ghent.
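The dataset-versus-service blending can be sketched from the consumer side: one generic provider, where a bare URL yields the full data set and a parameterised URL yields a filtered view, both in the same self-describing, glass-box shape. The URLs, fields and response layout below are invented for illustration; a real setup would describe itself with a vocabulary such as Hydra.

```python
# A sketch of blending away the dataset/service distinction: the consumer
# just dereferences a URL, with or without parameters, and in both cases
# gets a response of the same self-describing shape. URLs, fields and
# response layout are invented examples.
DATA = [
    {"region": "North Sea", "species": "Limulus polyphemus"},
    {"region": "Baltic Sea", "species": "Gadus morhua"},
]

def respond(url: str) -> dict:
    """One generic 'data provider': a bare URL yields the full data set,
    a parameterised URL yields a filtered view - same glass-box shape."""
    base, _, query = url.partition("?")
    rows = DATA
    if query.startswith("region="):
        wanted = query.split("=", 1)[1].replace("+", " ")
        rows = [r for r in DATA if r["region"] == wanted]
    return {
        "@id": url,       # the response identifies and describes itself
        "source": base,
        "count": len(rows),
        "member": rows,
    }

print(respond("https://example.org/obs?region=North+Sea")["count"])
```

From the consumer's chair there is no "dataset" and no "service" here, only a URL that produces a transparent response.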
So I really have to urge you to go and check out their work, because all of it is highly inspirational and ready to be used. Most notably, and to some extent much like the work zones I am suggesting today, all these elements live close to the actual users: they assume some close-by assistance, and they take for granted a more peer-to-peer, distributed nature of the web. They actually have working technical answers to the big federated search question. And all of that is often in big contrast to the very centralized approach of many of the EOSC projects, for instance. Those, in some way, introduce a central hub rather than provide open source code that allows anybody to set up their own hub nodes. They draw you toward new big-investment cloud infrastructures and often neglect providing the tools for local and distributed participation. And this allows me to make another observation. You see, the fact of the matter is that a computer science department like the one I'm mentioning here is not directly involved in any of the ongoing open science projects, none that I've seen funded by the EU. Maybe it's a funny story, but last year I was at my first conference in the domain of life sciences, so that's doctors, and I ended up listening to smart medical doctors contemplating computer architectures that are actually a challenge even for the engineers. And it made me think of a room filled with programmers deciding: yes, we are going to find the cure for cancer, but we're not going to talk to any of the medical professionals. So maybe we should actually extend rule zero. Remember, it says open science is not only for the scientists. Well, I am convinced the open science platform should not be entirely built by only the scientists either. Right, one more to go. The fifth and last zone, just to keep me in time, is named usage tracking and metrics. For this,
I have to come back to my honest, now really honest, appreciation of the FAIR acronym. After all, it is the only torch lighting a path for all followers of the open science procession. It is no coincidence, in my mind, that this acronym landed on the word fair; somebody crafted it to land on the concept of fairness, on being fair. Really: just shuffle the order, or replace findable with something like discoverable, which would be the more techy variant of the word, and the result doesn't end up being a word at all. Such a result would also not appeal to what people call the better angels of our human nature. But this one does, and that is truly useful. We should not joke about it. It is useful because it taps into one of the four main ways to drive human behavior. And that number four comes from the book shown on this slide, which I can really recommend; it covers intellectual property rights and the ongoing cat-and-mouse play with digital piracy. But here is the exercise: let us apply these four influencers of behavior to the open science topic. After all, we hope to convert all stakeholders into active contributors. The first one, law and enforcement, points me to the funding of research projects. There you have some leverage to push independent research groups into adopting new rules, a rule like "you must have a DMP from now on" or "you must follow the FAIR principles". The second influencer is architecture, and the previous four zones, I think, all fall in this area. The idea is to develop tools and techniques so that what I would call just doing research naturally feels like the same thing as doing open science: all the rules applied out of the box by just using the correct tools. There is some work to do there, but I think it can be done. Third, I already mentioned we have the magic of the FAIR word: it sells the idea that the open science way is the morally right way.
But the last element is the one for this work zone, and it is the important missing one. The question is: what are the tangible effects and payback streams? The scientists adhering to this new set of rules see the effort and the cost, but they hardly see any gain. And noting that the currency in academia is counted in publications and citations, there is naturally ongoing work to extend that approach: people are searching for a similar count to give some valorisation and appreciation to data production and data sharing. Opendatametrics.org looks like a web domain, but it is in fact a book; if you follow the link, you end up at it. It gives a very good understanding of the current state of affairs around data citation and usage tracking, and I definitely recommend reading it. It is very complete, but it still left me craving for more. The point is that the tracking we need is a lot more complex than the classic track-and-trace of orders and physical packages. It is about digital media, and we know digital media: in digital media you have lossless copies, and they are cheap and thus abundant. We also see mashups; that's all the rage in science too. There is a lot of repackaging. The point is that data sets are not only published and redistributed via repositories; they also get loaded into aggregated services, and those re-fragment and regroup all that data. And yes, we all know assigning a DOI to every data set is an important first step, but new questions are spontaneously bubbling up. To what level must any possible fragment be identifiable on its own? Should data services attach full provenance trails to any service response they provide? You could go even further: when a response is made up of fragments, should there be a provenance trail for each of those fragments? So there's a lot to be thought about.
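The per-fragment provenance question can be made concrete with a small sketch. Everything here is hypothetical: the DOIs are made-up placeholders, and the field names are only loosely borrowed from the PROV vocabulary, not a standard profile. The idea is that an aggregating service assembles its response from fragments of upstream datasets, and every fragment carries its own minimal trail, so the whole response stays traceable.

```python
from datetime import datetime, timezone

def fragment(rows, source_id):
    """Wrap some data rows together with a minimal provenance record:
    where the rows came from, and when they were retrieved."""
    return {
        "rows": rows,
        "prov": {
            "wasDerivedFrom": source_id,  # loosely inspired by PROV
            "retrievedAt": datetime.now(timezone.utc).isoformat(),
        },
    }

def assemble(*fragments):
    """Regroup fragments into one response without losing any trail."""
    return {
        "data": [row for f in fragments for row in f["rows"]],
        "provenance": [f["prov"] for f in fragments],
    }

# Hypothetical response built from two upstream sources.
response = assemble(
    fragment([{"species": "Solea solea"}], "doi:10.1234/example-one"),
    fragment([{"species": "Gadus morhua"}], "doi:10.1234/example-two"),
)
```

A consumer of `response` can cite each upstream DOI individually, which is exactly the kind of usage signal that could later flow back into data metrics.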
And yes, I think we need some shared practice of publishing statistics, again in a semantic way, something that can then be openly harvested by anyone. And I obviously see a bridge to the DMP assistant from work zone three: you can envision an assistant that, when you use it to obtain, download, and include data in your research, automatically adds the usage tracking statistics inside the package and then publishes them together with the data. So there's a lot to think about. Actually, a funny coincidence: when I was looking for an image to put on this slide, I just entered the keyword tracing, and I got this as the second result. You have to admit this is an interesting contemporary interpretation of the track-and-trace problem, when you think about viruses and exposure. It's just a vague idea, but maybe that's the new kind of tracing that could open our minds to finding a better solution in this area. Good, that's it, we made it. Thanks for sticking around; I almost, but not completely, ended on time. I hope you're still here. From my side, I'm really eager to shut up now and listen to you. So if you can open the mics, please do.

Most people joined this call in listen-only mode, so if you want to talk, you have to reconnect to the audio and do the echo test. Marc, there was already a question in the chat about Frictionless Data.

Yes, yes. That's the typical candidate that gets mentioned as a counterpart to, well, in competition with RO-Crate. I hope they find a way to match. I have looked into Frictionless Data. They also have this great vision of being a tool that stays close to the researcher and helps out. What I like less about it: actually, at the beginning of their history, they were collaborating with the CSVW group, and then they split off. So they have kind of chosen the non-semantic route, assuming that was way too complex for people.
Maybe it's just a matter of timing; RO-Crate came maybe a little bit later, I don't know. They are now in a time where something like JSON-LD exists. And JSON-LD, I don't know if you know this famous quote, but there is an RDF guru saying: I'm really fed up with RDF, I'm not going to use it anymore, it's a pipe dream; from now on, I'm only using JSON-LD. The joke being that JSON-LD just is RDF: it is a serialization of RDF. But the joke shows that JSON-LD is mostly approached as being JSON, and not as being linked data. So I don't know. Maybe if JSON-LD had been around when Frictionless Data started, it could have been more aligned with semantics all over. Maybe some combination, some marriage, is possible in the future. I like Frictionless Data, but I kind of like RO-Crate more. Anybody else? More questions?

No more questions in the chat at the moment, just a lot of thank-yous and "top presentation" and that kind of comments. But if I understand it correctly, Marc, you want people to write their suggestions in the document, right? The one you put the link to up top?

Yes, yes, yes. If you didn't have spare bandwidth during the talk, that document stays available, so you can come back to me. Also, people can find my email in the slides; you can contact me, and I'm definitely open for more discussion on this. Good. Sorry for going over time. And if nobody else is speaking up, I think I'd like to call it a day and stop the recording.

Marc? Yes? Oh, there is Lukas?

I will try my luck. Yeah, now I found the unmute. I wanted to ask you: you are showing us a great framework that looks applicable, and you are presenting yourself as a person with knowledge who could be contacted about how to apply it. I wonder, when there are EU-funded projects, and you listed that you are involved in many initiatives and projects, is there any mechanism to ensure that they apply this framework?
That they are really not trying to reinvent the wheel, but that there is some incentive mechanism in the whole system, so that all actors are in to apply what you just presented, because it makes so much sense.

Well, thank you for that compliment. I have been up and around in open stuff for the last 20 years; I used to have my own open source company, so that's how it all started, working at Apache and such. And actually, in a number of ways, I don't think we should expect distributed approaches from a highly centralized organization, for one. And I also don't think we have to wait for top-down decisions. I see a lot of good and interesting things happening on the floor, and maybe we could just assemble those into a bottom-up counter-answer. The good thing about top-down is that it is a great way of distributing taxpayer money toward the correct people, and I have to say, last year I have only met either smart people or extremely smart people. So I think the distribution mechanism should remain top-down, but I really think the real solutions will come bottom-up, from actual people on the floor seeing which problems arise and finding smart solutions for them. So yeah, maybe I was a little bit critical with my observation that they are not including computer science enough, et cetera, but I am quite optimistic that the intellectual potential at the grassroots level is there, and we will definitely reach solutions in a number of years, or days. Good question, Lukas. Thank you. Anybody else? Anybody struggling to get the mic working? You could also put it in the chat.

There's also a question from Bruno, Marc. He asks in the document: do you think that public administrations funding scientific research and innovation could or should enforce the use of their results through RO-Crates? And what would be the smallest step in that direction?
Well, as in my previous answer, I don't really believe in enforcement. So the first way to modify human behavior in my list of four, law and enforcement, I don't think that's the one with the longest-lasting effect. It could be a smart way to kickstart things, but not the way to make it persist and keep it useful in the long run. Sure, local investments could be added on top of the more European, more centralized ones; I definitely agree there. And I don't know, was there another element to the question? Where was it?

It's in the Google Doc, and he also asks: what can we as public administrations offer to foster the adoption?

Fostering the adoption is: just adopt, just start doing it. My previous job was in local government, so I definitely see how we could collaborate on a number of these smaller elements, which could then be rejoined. I don't know, maybe having an RO-Crate solution is over the top for local governments, but elements like having a vocabulary search are definitely useful for them as well. DMPs, again, are very research-oriented, but having semantic frameworks, and maybe RO-Crate, could be a useful addition; I have to think about it. It could in fact be used there as well, because the same problem exists there: you have data, you have metadata, but people forget to add the semantics. And that is definitely something a package structure like RO-Crate could help with. I hope that answers the question.

Okay, I see that Leonard added that, to his understanding, Frictionless is not yet adopting linked data, but still has an open issue for it. So yeah, there you go. Good. Close to the one-hour mark. Shall I stop the recording?

Yeah, we could still leave it open, but since nobody is speaking up anyway, I think we can just stop the recording and close the session. Thanks.

My name is Marc Portier.