Hello everybody, and welcome to the presentation on how to foster data sovereignty for OpenStack clouds based on self-descriptions. So what does data sovereignty mean? It means that you, as the owner of the data, have control over your data: you can decide in a self-determined way where and how your data is processed, where it is stored, and how it is transmitted. To do so, you need some information about the properties of the provider, the service, or the resources, such as the data centers that host the service. This information is currently not there, or if it is written down somewhere, it is not written in a transparent, machine-readable way. And this can be solved with self-descriptions.

You may know the term self-description from Gaia-X. Gaia-X is a European cooperation which aims to foster data sovereignty. It defines standards, concepts, and architectures to establish data sovereignty. We are inspired by that concept: we reused the self-description and extended it a little bit to foster data sovereignty.

So what do we want to do? I will briefly explain what a self-description is. Then I will show you how you can use self-descriptions to foster data sovereignty. I will show you self-descriptions for OpenStack-based clouds. I will show you the tool stack, and I will finish the presentation with a conclusion.

What are self-descriptions? I already said it: self-descriptions are machine-readable, standardized formats to describe the properties of clouds, cloud providers, and cloud services. They are written by the cloud provider. They are based on linked data principles. They are serialized in JSON-LD. They can be visualized as an RDF graph. They are comparable because it is a standardized and machine-readable format. And you can also sort and query self-descriptions with, for example, openCypher, SPARQL, or SPARQL-star. A self-description is based on a self-description schema.
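To make "machine-readable and comparable" a bit more concrete, here is a minimal sketch in Python. It is not the Gaia-X schema: the `ex:` vocabulary, the `did:example:` identifiers, and the attribute names are assumptions made up for illustration, and the plain Python filter only stands in for what a real SPARQL or openCypher query would do.

```python
import json

# Hypothetical, heavily simplified self-descriptions in a JSON-LD-like shape.
# The "ex:" vocabulary and all identifiers are illustrative assumptions,
# not the real Gaia-X schema.
CONTEXT = {"ex": "https://example.org/schema#"}

SELF_DESCRIPTIONS = [
    {
        "@context": CONTEXT,
        "@id": "did:example:offering-a",
        "@type": "ex:ServiceOffering",
        "ex:name": "Cloud A",
        "ex:dataCenterCountry": "DE",
    },
    {
        "@context": CONTEXT,
        "@id": "did:example:offering-b",
        "@type": "ex:ServiceOffering",
        "ex:name": "Cloud B",
        "ex:dataCenterCountry": "FR",
    },
    {
        "@context": CONTEXT,
        "@id": "did:example:offering-c",
        "@type": "ex:ServiceOffering",
        "ex:name": "Cloud C",
        "ex:dataCenterCountry": "DE",
    },
]


def offerings_in_country(descriptions, country_code):
    """Return the sorted names of all service offerings hosted in a country."""
    return sorted(
        d["ex:name"]
        for d in descriptions
        if d.get("@type") == "ex:ServiceOffering"
        and d.get("ex:dataCenterCountry") == country_code
    )


if __name__ == "__main__":
    # Every description has the same shape, so it can be serialized,
    # compared, and filtered mechanically.
    print(json.dumps(SELF_DESCRIPTIONS[0], indent=2))
    print(offerings_in_country(SELF_DESCRIPTIONS, "DE"))
```

Because all descriptions follow one vocabulary, the same filter works for every provider, which is the point of a standardized format.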
And what is a self-description schema? In a very simplified way, a self-description schema is the vocabulary which is available to write self-descriptions. It consists of two elements. The first is a taxonomy of classes, as you can see here. You have a top-level class, which is the Gaia-X entity, and there are three base subclasses: a participant, a service offering, and a resource. A participant is an entity who offers or consumes services, that is, a provider or a consumer. A service offering is what the name says: it is the service which is offered. And a resource is a building block for a service offering which is not available for order. You need this building block in order to describe your cloud service in more detail. This could be, for example, the data center where the service is hosted. It could be the data itself. It could be the software. All these artifacts can be important for customers, so you need a way to describe these entities.

The second element is the attributes: each class has a set of attributes. The attributes can be optional or mandatory. An attribute can be mandatory because of trust guidelines: there is a special document from Gaia-X, the Trust Framework, which defines the attributes that have to be there in order to be Gaia-X compliant. An attribute can also be mandatory because of technical guidelines. For example, a technically mandatory attribute is the service endpoint to access a service. These properties are defined by a special sub-working group of Gaia-X, the Service Characteristics working group.

Here you see the self-description schema for a legal person. A legal person is a special subclass of participant. What you can see is that we have three attribute groups which are mandatory: a legal address, a headquarter address, and a registration number. And the registration number is further divided into five different types.
So currently, Gaia-X supports the LEI code, the VAT ID (value-added tax ID), a local registration number, the EORI number, and the European Unique Identifier (EUID) as a registration number. And you have an optional attribute: a relationship, so that you can model the parent and sub-organization relationships between legal persons.

Here you can see a self-description for Cloud&Heat as a legal person. It is visualized as a very simplified RDF graph. What you can see is the subject in the middle, which is Cloud&Heat. Cloud&Heat has two registration numbers, a local one and a VAT ID. And you also see the two mandatory address attributes: the headquarter address, which is in Dresden in Germany, and the legal address, which here are the same. But there are organizations where the headquarter and legal address differ.

How about data sovereignty? How can we use self-descriptions to make it easy for consumers to decide which cloud service to choose? To look at these concepts, we first have to clarify how we identify subjects or entities in Gaia-X. This is done with W3C decentralized identifiers, DIDs for short. What is a DID? A DID is a URI that identifies an entity, and an entity in Gaia-X is a participant, a resource, or a service offering. The DID also associates a DID document with that entity, and this DID document contains more information about the entity. The DID is always controlled by someone, and in our case this is a participant. If the DID identifies a participant, it is that participant; otherwise it is the participant who provides the service offering, the participant who maintains the physical resource, or the copyright owner of a virtual resource. So Gaia-X distinguishes two types of resource: a physical one, which is a resource you can touch, for example a data center, and a virtual resource, which could be data, software, a license, or something like that.
The DID document expresses metadata, cryptographic material, services, and verification methods for the DID subject. And the DID subject is a participant, a service offering, or a resource. There is one part of the DID document that is important for self-descriptions: the service part. What is a service? A service is an endpoint that you can use to retrieve more information about the participant, the service offering, or the resource the DID identifies. We use these services to retrieve a self-description. So when you have a DID, you have a DID document, and in the DID document there is a reference that tells you how to retrieve the self-description with details about the entity identified by that DID.

The DID document is stored in a distributed DID registry, and you can retrieve it with a DID resolver. There are publicly available DID resolvers out there. You type in a DID and you get the DID document. As I already told you, in the document there is a reference to the self-description. That is how you get the self-description for a Gaia-X entity, for a participant, a resource, or a service offering: as soon as you have the DID, you can retrieve the DID document. The DID is a URI, as I told you, and from the URI you cannot always deduce the entity which is behind it. That is why we have an onboarding process where the DID is matched to the entity in the real world, for example the DID of Cloud&Heat to the company Cloud&Heat, which is identified by that DID.

How do we achieve data sovereignty? I told you that we have, on the one side, the provider with the services and the resources, and the provider describes the resource or service offering with the self-description. On the other side, we have the consumer, and the consumer needs the information which is in that self-description. The consumer also knows how to retrieve the self-description, but there is no trust between these two parties.
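The lookup chain described above, from a DID to its DID document to the self-description endpoint, can be sketched in a few lines of Python. This is only an illustration under assumptions: the in-memory dictionary stands in for a distributed DID registry and resolver, the `did:example:` identifier and the URL are made up, and the service type `SelfDescription` is a hypothetical name. Only the property names `service`, `id`, `type`, and `serviceEndpoint` come from the W3C DID data model.

```python
# In-memory stand-in for a distributed DID registry (assumption for this sketch).
DID_REGISTRY = {
    "did:example:cloudandheat": {
        "id": "did:example:cloudandheat",
        "service": [
            {
                "id": "did:example:cloudandheat#self-description",
                "type": "SelfDescription",  # hypothetical service type
                "serviceEndpoint": "https://example.org/sd/cloudandheat.json",
            }
        ],
    }
}


def resolve(did, registry=DID_REGISTRY):
    """Play the role of a DID resolver: return the DID document for a DID."""
    if did not in registry:
        raise LookupError(f"unknown DID: {did}")
    return registry[did]


def self_description_endpoint(did_document):
    """Return the endpoint of the first self-description service, if any."""
    for service in did_document.get("service", []):
        if service.get("type") == "SelfDescription":
            return service["serviceEndpoint"]
    return None


if __name__ == "__main__":
    document = resolve("did:example:cloudandheat")
    print(self_description_endpoint(document))
```

A real client would then download the file behind that endpoint; the sketch stops at the URL so that it stays runnable without network access.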
So how does the consumer know that the things a provider is claiming about itself or its services are true? This is solved with W3C verifiable credentials. How does it work? When you have the provider and the consumer and they don't trust each other, you need a third party that both parties trust. This is a conformity assessment body. What does this entity do? It gets the claims from the provider, for example the legal address, or the ISO certificate, or that a service or a data center is hosted in Germany. All these claims are sent to the conformity assessment body, which is responsible for checking the integrity, the honesty, and the completeness of those statements. Then it issues a so-called verifiable credential back to the provider. The provider stores it and can use this verifiable credential to prove a specific set of properties.

This concept is not new. We already have it in the physical world as well. Imagine you want to prove to another person that you are over 18. How do you do that? You show your ID card, which is issued by the government. The government is a trustworthy third party; it is a conformity assessment body. Because you have this tamper-proof ID card, and the other person trusts this ID card, you can prove that you are over 18. It is the same in the digital world with W3C verifiable credentials. The self-description is then just a set of verifiable credentials. So the provider decides which verifiable credentials to put in a self-description and then publishes this self-description.

How about data sovereignty on the provider side? It may be that the provider has some information it does not want to share with the entire world, but only with a closed user group. For example, a provider does not want to publish the entire architecture and software stack of its services unless there is an NDA signed.
To also give the provider the possibility to decide with whom to share which data, we have developed the concept of a self-declaration. A self-declaration differs from a self-description in two points. The first point is that the self-declaration is complete: it contains all information which is available about an entity. The second is that the self-declaration does not contain the full verifiable credentials. So the self-declaration does not list the full address, but just a reference from which you can retrieve the address. When the client has the self-declaration, it can use that link, go to the storage of that verifiable credential, and authenticate itself. Then, if it is allowed to get this information, it receives the information. So for each verifiable credential in the credential store, the provider can now define a fine-grained usage policy: who can access this information and who is not allowed to.

What does this look like in a nutshell? As I told you, we start with issuing verifiable credentials. The conformity assessment bodies issue the verifiable credentials to a credential storage, which is the Organization Credential Manager. Then the provider defines the self-declaration and decides which credentials to include in that self-declaration for, for example, the organization, the service, the data center, the software stack, whatever. Then this self-declaration is stored in a self-declaration storage. This is just a publicly available endpoint where you can download the file with the references.

The consumer knows the DID of the entity and wants to know more about that entity. So the consumer takes the DID and puts it into the DID resolver. The DID document is retrieved. In the DID document, there is a link to the self-declaration storage where the consumer can download the self-declaration.
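The idea of a self-declaration that only holds references, with a per-credential access policy in the credential store, can be sketched like this. Everything here is a simplified assumption for illustration: the dictionary layout, the reference names, and the `did:example:` requesters are hypothetical, the policy is a plain allow-list, and real verifiable credentials would be signed and the requester would be authenticated cryptographically.

```python
# Hypothetical credential store: each entry carries its own access policy.
# "allowed" is None for public credentials, otherwise a set of requester DIDs.
CREDENTIAL_STORE = {
    "vc-legal-address": {
        "claims": {"legalAddress": "Dresden, DE"},
        "allowed": None,
    },
    "vc-software-stack": {
        "claims": {"softwareStack": "details shared under NDA"},
        "allowed": {"did:example:partner"},
    },
}

# The self-declaration is complete, but it lists only references to the
# credentials, not the credentials themselves.
SELF_DECLARATION = {
    "subject": "did:example:provider",
    "credentialRefs": ["vc-legal-address", "vc-software-stack"],
}


def fetch_credential(ref, requester, store=CREDENTIAL_STORE):
    """Return the claims behind a reference, or None if access is denied."""
    entry = store[ref]
    allowed = entry["allowed"]
    if allowed is not None and requester not in allowed:
        return None
    return entry["claims"]


def visible_claims(declaration, requester):
    """Collect every claim the requester is allowed to see."""
    result = {}
    for ref in declaration["credentialRefs"]:
        claims = fetch_credential(ref, requester)
        if claims is not None:
            result.update(claims)
    return result


if __name__ == "__main__":
    print(visible_claims(SELF_DECLARATION, "did:example:stranger"))
    print(visible_claims(SELF_DECLARATION, "did:example:partner"))
```

In this sketch every requester sees which credentials exist, because the references are public, but only authorized requesters see their content.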
And in the self-declaration there are the links to the credential store, from which the consumer can download the verifiable credentials if authorized to do so.

How about the self-description schema for OpenStack-based clouds? The schema is very complex, so I cannot present everything. I focus on three important parts: the jurisdiction of the provider of the cloud service, the physical location of the data center where the cloud service is hosted, and the interoperability of the API of the cloud.

The jurisdiction of the cloud provider I already showed you. We have a service offering, and a service offering has a mandatory association to the participant who provides that service offering. The participant can be a legal person or a natural person, but we are focusing on legal persons here. The legal person has these mandatory properties, the headquarter address and the legal address, and from those two pieces of information you can deduce the jurisdiction. You can figure out which law may apply to the data which is processed or stored in that cloud service. Here you see the self-description for the OpenStack cloud that Cloud&Heat is providing. There is a reference to Cloud&Heat, and this, as I already showed you, is the legal and headquarter address of Cloud&Heat.

How about the physical location of a data center? A data center is modeled as a physical resource in Gaia-X. We have an aggregation between service offering and resource so that you can define which resources your service offering is based on. The physical resource is a subclass of resource and has a mandatory attribute, location, which points to the country, or more precisely the country code, where the data center is located. When we look at the Cloud&Heat OpenStack cloud, we have an aggregation to the data center, and the location address points to the country code, which is Germany in that case.
So the data center of Cloud&Heat is located in Germany, and thus German law applies to the data which is processed there.

Interoperability of OpenStack clouds: interoperability is also very complex, so I focus on two parts, the VM images and the flavors which are provided by the OpenStack cloud. To model an OpenStack cloud, we have a special subclass called Virtual Machine. The Virtual Machine is a subclass of service offering and has two mandatory attributes: a code artifact, which points to a virtual machine image with a lot of properties, and at least one server flavor. You can define the hardware requirements of that flavor, for example the memory requirements, the network, the CPU properties, and the GPU properties, and the same also for the VM image. This is a subset of the interoperability description of Cloud&Heat's OpenStack cloud. For sure, we have more than two images and more than two flavors, but for the sake of readability I present just these two to visualize what such a self-description can look like. So here we have images: we have the size of the image, the CPU architecture, the RAM size, the license, the copyright owner, and so on. And the same for the flavors. That's not rocket science.

How about the tool stack? We need some tools in order to make these things happen, and there are already some tools out there. We have a portal. The portal is a web-based user interface to create credentials and self-declarations manually. So it is just a form where you type in all the information, and then it outputs the appropriate JSON files. There is a generator, a self-description generator coming from the Sovereign Cloud Stack. This is a Python script, or a bunch of Python scripts, which generates credentials for OpenStack clouds automatically. So what does this tool do?
It takes a normal tenant login, calls the OpenStack cloud via the API, and retrieves technical information such as the flavor sizes, the VM images, the properties of the VM images, the volume types, and so on. We also need a storage where we store the credentials. This is the Organization Credential Manager. This credential manager also provides an interface to access these credentials and to define the access policies. And we need a self-description storage. A dedicated self-description storage does not exist currently, but any web server where you can download a file can serve as one. So it's not rocket science.

What are our next steps? I told you that most of the parts are aligned with Gaia-X, but some parts are not. So our next step will be to align all these parts with Gaia-X and also to contribute the additional concepts upstream. For example, the self-declaration part is not yet Gaia-X compliant, so we will bring this idea upstream. What we also want to do is implement a proof of concept of what I just described, so that you get from a DID to a self-declaration and from the self-declaration to the self-description. We do this as part of two projects: the TELLUS project, which is a German research project, and the Sovereign Cloud Stack. And we also want to extend the self-description schema. Right now, sustainability and security features are not supported, and at least from our point of view these are very important features which should be included in the self-description of a cloud service. That's all. And now I'm happy to answer your questions.

Can you use a mic?

You said that for the self-descriptions the main focus is, let's say, machine readability and things like that.

Exactly.

At the same time, you illustrated quite well the complexity of these models. Does it work in practice?
If the structure, the hierarchies, and the interrelationships are that complex, then I would assume that it might be quite difficult for machine processes to find the information.

Honestly, I cannot answer the question because there is no practical study at the moment. We are just at the beginning. But it is simply the same as linked data, and linked data works, as far as I know, because the self-description is nothing else than a big linked data file.

So maybe a second question, if possible. You said that you also want to extend the Gaia-X federation framework with your extensions. So who is responsible for that? Who will accept or reject this?

What do you mean by the Gaia-X framework? Maybe I go back. Do you mean the self-description schema? Or do you mean the upper part, that we want to align the concepts with Gaia-X?

The upper part. Who decides that?

That is the technical committee. Gaia-X is divided into several working groups, and Cloud&Heat is part of those working groups. So we bring this into the technical committee, and then they decide yes or no.

OK, thanks.

Other questions? That seems not to be the case.