Hi, I'm Carly Heidemann, and today I'm going to be talking about the research identifier ecosystem. I hope to convince you that a DID is a PID with benefits. This is work in progress that is being incubated at the Trust Over IP Foundation, and it's informing my work at AgriFood Data Canada at the University of Guelph.
In the research community, researchers need to be able to unambiguously identify objects. Start with single objects: which particular instrument was used for the measurements, which data set resulted from which research activities, or which methodologies and vocabularies were used. These are all things that benefit from having an identifier.
We can take that idea further with compound objects. A project compound object, for example, would reference the people, the outputs, and the funders that were all part of a particular project. We can do the same thing for a publication. A lot of research information is disseminated via publications, each written by a set of authors who describe the methodology, any code and instruments that were used, and the data sets that were generated. All of this can be expressed as a single object. Finally, there are versioned objects: a publication can have a version one, version two, version three, and so on. It keeps the same name, but you can follow its history.
Once we're able to identify these research objects, we can link them together via their metadata. In the example figure here, from Project FREYA, we know that A and B are related and that B and C are related, so we can assert a relationship between A and C. In research we call this the PID graph: a network of the relationships between the objects that are generated in research.
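To make the A, B, C example concrete, here is a minimal sketch of a PID graph as a plain adjacency map, with a breadth-first search that recovers the chain of links between two objects. The identifiers and links are made-up placeholders, and the function names are hypothetical. This is an illustration of the idea, not how any particular PID service stores its graph.

```python
from collections import deque

# Hypothetical metadata links between research objects. Every identifier
# here is a made-up placeholder, not a real PID.
LINKS = [
    ("A", "B"),  # e.g. a person linked to their paper
    ("B", "C"),  # e.g. the paper linked to a data set
    ("C", "D"),
    ("E", "F"),  # not connected to the objects above
]


def build_graph(links):
    """Turn pairwise links into an undirected adjacency map."""
    graph = {}
    for left, right in links:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph


def find_path(graph, start, goal):
    """Breadth-first search: shortest chain of identifiers, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_graph(LINKS)
print(find_path(graph, "A", "C"))  # ['A', 'B', 'C']
print(find_path(graph, "A", "E"))  # None
```

The path from A to C exists only because both links go through B, which is the sense in which the relationship between A and C is inferred from the graph rather than stated directly.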
Here we can see A, B, and C from that example, and I've extended the graph. The idea is that by following the graph we can jump from, say, a person to their paper and its methodology, and from there find a related paper and related data sets.
There are many benefits to being able to traverse the PID graph. We can generate insight about the research that is being done. We can follow funding to see which objects were generated as a result of a particular grant. We can find out what the community values, meaning the things it references constantly. We can explore new relationships, such as finding new collaborators who are working on related topics. We can discover new objects for research, like a new data set, assemble new sets of related things, and ultimately perhaps identify gaps in knowledge.
So we need to uniquely identify the object, find the description of the object, and ultimately trust the results. What research needs is a globally unique identifier that is globally resolvable to something useful, within a secure ecosystem. In research we call these things PIDs, persistent identifiers.
In the current state of PIDs, you have an identifier such as a ROR, a RAiD, or a DOI. You plug that identifier into a resolver, which usually searches some kind of database to find the unique record, and from the record you can find the metadata and links that were originally associated with the identifier. This kind of system needs a centralized database and centralized control of it. For the majority of persistent identifiers there's a predetermined schema, and there are real limits on what can be identified. If we want to get to the PID graph, where we can unambiguously connect from object to object, we need to be able to describe many different types of objects.
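The current-state flow just described (identifier, then resolver, then database record, then metadata and links) can be sketched in a few lines. The class names, the identifier, and the URL below are all hypothetical, and the single fixed `Record` type stands in for the predetermined schema. It is a simplified model, not the API of any real PID provider.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """The one schema the central operator has decided on (assumed fields)."""
    title: str
    url: str
    related: list = field(default_factory=list)


class CentralResolver:
    """A single database under one operator's control."""

    def __init__(self):
        self._database = {}

    def register(self, identifier, record):
        # In a real service only privileged, logged-in users reach this step.
        if not isinstance(record, Record):
            raise TypeError("only the predetermined schema can be stored")
        self._database[identifier] = record

    def resolve(self, identifier):
        try:
            return self._database[identifier]
        except KeyError:
            raise LookupError(f"unknown identifier: {identifier}") from None


resolver = CentralResolver()
resolver.register(
    "example-pid/0001",  # made-up identifier
    Record(title="Example data set", url="https://example.org/data/0001"),
)

record = resolver.resolve("example-pid/0001")
print(record.title)  # Example data set
print(record.url)    # https://example.org/data/0001
```

Anything that does not fit the `Record` shape is rejected, which mirrors the limit on what can be identified when one schema is fixed in advance.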
What I'm proposing here is an improvement on persistent identifiers: decentralized identifiers. This is a new concept that is still being developed. You can see that many of the persistent identifier pieces are here, just rearranged. We still have an identifier, and we still use a resolving service to find the record associated with it. But here the records can live on, for example, some kind of distributed ledger, and from the record we ultimately pull up the associated metadata and any links related to the object we are referencing.
There are many benefits to this kind of decentralization. We have a robust decentralized network. There's transparency, trust, and traceability, and with that traceability and the associated digital signatures we get auditability. It's also a very extensible system, with lots of opportunity for innovation: people can create new things in the ecosystem while maintaining high trust.
A key point is that this is a governed ecosystem. For research identifiers we would be looking at building a system that is very carefully governed. That governance is yet to be established and would involve many stakeholders. It will have to answer questions such as: how much of the ecosystem is directly governed and how much is open? Who is allowed to write schemas? Who is allowed to register identifiers? Should the ecosystem limit where identifiers can resolve to? What is the role of endorsements? All of these choices can be tuned, and one of the key considerations is the funding model: funding is often at a national level, but the need for a system of globally resolvable identifiers is an international one. So governance has all kinds of choices to make about who can join, delegation, and so on.
And this is ultimately going to dictate the functionality of the decentralized identifier ecosystem.
Now, what about trust? How do we trust the results? In the current state, we trust the centralized authority that controls the records in the database. A number of privileged people are able to log in and edit records, and we trust the centralized authority to manage that access properly. In decentralized systems such as the DID system, trust is built on asymmetric cryptography, which relies on keys. Anyone can independently create a private/public key pair. When you sign an identifier with your private key, anyone can mathematically confirm that signature using the public key that you have made available. So conceivably anyone can create and sign identifiers, but no one else can create an identifier with your signature, because that requires your private key. This gives us a new way to trust: we're moving beyond passwords and central authorities who do the vetting, to private keys held in digital wallets with the matching public keys available on the ledger.
Here are the basic parts of the decentralized identifier ecosystem that I'm describing. The first is what we call the DID document. That's the object that is signed by the creator and represents the research object. It holds the links to downstream resources, such as the object's associated metadata or the URL where you can find the database. In this example the DID document is located on a distributed ledger, but the W3C has a definition of decentralized identifiers, and of course there are different ways you can store this kind of information. The DID itself is just a short string. I've given an example here of a made-up research identifier, and that would be the kind of text you might include in your publication, for example.
You could say that the data set is available at "did:res:" followed by the rest of the identifier. When somebody looks up that DID, they plug it into a resolving service to find the DID document, and then trace it back to find the metadata and any external links. All of this is of course predicated on private/public keys, so it also involves wallets in which users hold their private keys. It may also involve something I haven't mentioned yet, verifiable credentials, which are credentials that let you do certain things within the ecosystem.
Let's test out some use cases to illustrate the possibilities for research identifiers. One of the first things to ask about is interoperability with the existing system. We can use a decentralized identifier to point to an existing identifier. In this example my object already has a PID, say a DOI for a paper. I can create a DID that just resolves to that PID, and it will be cryptographically signed by me so that anyone can confirm that I was the person who authored it and who controls it.
The first thing I do is create my public/private key pair. I keep my private key in my wallet, and I publish my public key onto the ledger of the decentralized identifier ecosystem. That entry says, in effect, "I claim to be Carly Heidemann, and this is my public key", so that when you find something I have signed you can check it and confirm that I was indeed the author. Next, I create the DID for my existing paper. The DID is just a reference downstream to the existing DOI, URL, and so on, and I sign it with my private key. Somebody else can then look up that DID and confirm, using my public key, that I was indeed the author of that paper's DID.
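The steps in this use case (create a key pair, publish the public key, sign a DID document that points at an existing DOI, let someone else verify it) can be sketched with the Python standard library alone. To stay dependency-free this uses textbook RSA over small, publicly known primes, which is NOT secure and is only a stand-in for the signature scheme a real DID method would specify. The `did:res:` identifiers, the DOI, the document layout, and all function names are assumptions made for illustration.

```python
import hashlib
import json

# TOY KEYS, NOT SECURE: two publicly known Mersenne primes, chosen only
# so the example runs without third-party libraries.
PRIME_P = 2**89 - 1
PRIME_Q = 2**127 - 1
PUBLIC_EXPONENT = 65537


def make_key_pair(p=PRIME_P, q=PRIME_Q, e=PUBLIC_EXPONENT):
    """Return (public_key, private_key) for textbook RSA."""
    modulus = p * q
    # Modular inverse via three-argument pow (Python 3.8+).
    private_exponent = pow(e, -1, (p - 1) * (q - 1))
    return (modulus, e), (modulus, private_exponent)


def digest_as_int(document, modulus):
    """Hash a canonical JSON form of the document to an integer below the modulus."""
    canonical = json.dumps(document, sort_keys=True).encode("utf-8")
    return int.from_bytes(hashlib.sha256(canonical).digest(), "big") % modulus


def sign(document, private_key):
    modulus, private_exponent = private_key
    return pow(digest_as_int(document, modulus), private_exponent, modulus)


def verify(document, signature, public_key):
    modulus, exponent = public_key
    return pow(signature, exponent, modulus) == digest_as_int(document, modulus)


# 1. The author creates a key pair; the private key stays in their wallet.
public_key, private_key = make_key_pair()

# 2. The public key is published under the author's identifier (a dict
#    stands in for the ledger here).
ledger = {"did:res:author-example": public_key}

# 3. The author creates and signs a simplified DID document that points
#    downstream to an existing (made-up) DOI.
did_document = {
    "id": "did:res:paper-example",
    "controller": "did:res:author-example",
    "alsoKnownAs": ["https://doi.org/10.0000/example"],
}
signature = sign(did_document, private_key)

# 4. Anyone can check the signature using only the published public key.
author_key = ledger[did_document["controller"]]
print(verify(did_document, signature, author_key))  # True

tampered = dict(did_document, alsoKnownAs=["https://doi.org/10.0000/other"])
print(verify(tampered, signature, author_key))  # False
```

Verification uses only the ledger entry and the signed document, so no database login or central operator is involved in the check. A real deployment would use a vetted cryptography library and the key formats defined by the chosen DID method.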
The important thing here is that I didn't need a central authority to secure and trust my decentralized identifier.
Let's come back to one of the key ideas I introduced at the beginning of the talk: creating different types of identifiers. In the systems we have right now, that's hard to do. If we want to create an identifier for a person, we need an international collaboration that decides what kind of information belongs with the identifier. In this example the result is known in research as an ORCID iD, or ORCID, and the organization runs its own database that holds all the entries. Then we come along and say that now we'd like an identifier for collections or projects. So the RAiD identifier gets its own international collaboration, which sets up a database, builds a resolver, secures the funding, and so on. As things currently exist, it's difficult to spin up new types of identifiers, so we don't see much adoption of different identifier types. But if we go back to that figure of the PID graph and being able to follow resources, we can see that it would be really beneficial to give things unique identifiers and describe them with appropriate metadata. With the decentralized identifier ecosystem, it can be very easy to create a new identifier type in the system, with the high level of trust that goes with it.
Here's another use case. Say I am a society, the Canadian Society of Microbiologists (CSM), and my membership has identified a need. We want to create a new microbiome identifier type so that people can give a microbiome research object, like a sample, a decentralized identifier and describe it using this new community schema. This example also shows a different variation of governance.
Perhaps the governing body grants the CSM, as an organization, a credential that gives them the right to publish schemas in the ecosystem. In this kind of governance model, schemas can only be registered by specific parties that have been granted that right. Because the CSM has that permission, they write a schema, create a decentralized identifier that points to that schema, publish the DID document, and sign it with their private cryptographic key.
Now let's use that new microbiome schema to mint a new identifier. Alice has a microbiome freezer sample. She's a member of the Canadian Society of Microbiologists, or perhaps she just appreciates their work, and she wants to give her freezer sample an identifier and describe it using the CSM microbiome schema. The first thing Alice needs to do is find the CSM microbiome schema. Perhaps the CSM has published on its web page a list of the schemas that it supports, endorses, or has authored, and that is how she enters the ecosystem. Having found the identifier for the schema, she looks it up and confirms that the CSM were the ones who digitally signed the schema DID. She trusts that she has found the correct schema because she looks up the CSM's own decentralized identifier, looks at who signed the schema, and confirms that it was them. Then she can follow the links and find the actual schema. Next, Alice is going to write descriptive metadata for the sample that she wants to give an identifier to.
She composes that metadata according to the schema she found, and she references the schema she used to write it. Then she creates an identifier for the metadata record she has just written and signs that identifier with her private key, so that anyone can look it up and confirm that it was indeed Alice who authored that microbiome sample record and registered the decentralized identifier.
Another use case is the ability to create a compound object using a schema, and to have that compound object reference other identifiers. For example, we find the schema for a publication object, and it says which features the object needs to contain, such as the authors, the data sets, and so on. You write that metadata for the publication and give it an identifier so that people can find it. Internally, the publication can then say who its authors were by referencing the specific author DIDs or PIDs, and it can reference the data set identifiers as well. So you can write compound objects that contain references to other objects.
With this system we have interoperability with existing systems of identifiers, and it's very easy to add new functionality. That means we can create a flexible system that enables creativity. Researchers can control the different aspects of the ecosystem that they build, and since everything is signed there is full provenance, which should ultimately enable enhanced reproducibility in science. There is also a lot more to the technology than what I have discussed in my little examples here: open wallets, key management, KERI for those in the know, revocation lists, endorsements, licenses, and so on.
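The schema-driven metadata and compound-object use cases described earlier in this section can be sketched as a small validation step: a metadata record names the schema it follows, the schema lists the required fields, and some of those fields hold references to other identifiers that must themselves resolve. Everything here is hypothetical, including the `did:res:` identifiers, the `required` and `reference_fields` keys, and the in-memory registry that stands in for a resolver. Signing is left out to keep the focus on the structure.

```python
# Hypothetical registry: identifier -> stored object. A real system would
# reach these through a resolver rather than a local dict.
REGISTRY = {
    "did:res:schema-publication": {
        "type": "schema",
        "required": ["title", "authors", "datasets"],
        "reference_fields": ["authors", "datasets"],
    },
    "did:res:author-1": {"type": "person", "name": "Example Author"},
    "did:res:dataset-1": {"type": "dataset", "title": "Example data set"},
}


def validate(metadata, registry):
    """Return a list of problems; an empty list means the record fits its schema."""
    schema = registry[metadata["schema"]]
    problems = [
        f"missing field: {name}"
        for name in schema["required"]
        if name not in metadata
    ]
    # Every reference inside the compound object must resolve to something.
    for name in schema["reference_fields"]:
        for reference in metadata.get(name, []):
            if reference not in registry:
                problems.append(f"unresolvable reference: {reference}")
    return problems


publication = {
    "schema": "did:res:schema-publication",
    "title": "Example publication",
    "authors": ["did:res:author-1"],
    "datasets": ["did:res:dataset-1"],
}
print(validate(publication, REGISTRY))  # []

incomplete = {
    "schema": "did:res:schema-publication",
    "title": "Draft",
    "authors": ["did:res:author-9"],
}
print(validate(incomplete, REGISTRY))
# ['missing field: datasets', 'unresolvable reference: did:res:author-9']
```

Because the publication stores identifiers rather than copies of the author and data set records, following those references is exactly the kind of hop that builds up the PID graph.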
The next step after all of this discussion would be to start making governance decisions about how to put all the possible parts together to make the ecosystem of research identifiers.
Parts of this ecosystem, and the technologies I'm talking about, are being built right now. I'm not the first person to come up with the idea that decentralized identifiers are great examples of PIDs for research. I didn't see it myself, but Markus Sabadello presented Kaliya Young's work at the 2019 PIDapalooza. PIDapalooza is a conference about research identifiers, and I guess they needed to make a topic that is admittedly a little dry a bit more interesting. This figure is from Markus's presentation, where he compared decentralized identifiers to all kinds of other identifiers that are currently in use. I've added a note here that DIDs are unique in being the ones that are cryptographically verifiable.
Another example of this technology in use right now is GLEIF, the Global Legal Entity Identifier Foundation. They are using Authentic Chained Data Containers based on KERI, and in this example GLEIF is a root of trust for identity. GLEIF was founded after the 2008 financial crisis to give business organizations identifiers that help make financial transactions more secure. They are now setting up ways to do this with cryptographically verifiable LEIs, Legal Entity Identifiers.
Here in Canada we have the start of another decentralized identity ecosystem. Some of the provinces are contributing to a Canadian Hyperledger Indy network called CANdy. The idea is that Canadian government entities can be given identifiers and will be able to issue digitally signed verifiable credentials. The governance for this is readable on GitHub, and the link is down below.
So you can see some of the efforts currently under way on different parts of the tooling for an ecosystem of persistent identifiers. As I said before, the W3C has now officially recommended DIDs and VCs. Canada is also looking towards digital trust and identity, and is certainly referencing these standards as well.
I hope I have convinced everyone here that DIDs are indeed PIDs with benefits. The benefits are a robust decentralized network, transparency, trust, and traceability. Everything is digitally signed, which means we have the possibility of auditability. The system is very flexible: all the components can be remixed and extended to new use cases, and ultimately there's an opportunity for more decentralized innovation. I really like this ecosystem and the technologies that are part of it, because it means we can start to become a bit more creative with how things get mixed. People can create new things within the ecosystem and still maintain a high degree of trust. Thank you very much.