Thank you, everybody. I'm very happy to be here, because this is my 10th EuroPython, and it's really stunning to be with all of you again, old and new friends together, after two tough years. Today we will speak about self-explaining APIs. I work for the Italian Digital Transformation Department, and I will show you how to design schemas that simplify API mash-ups and interoperability. First I will explain the concept of controlled vocabularies, then how to use them to create interoperable REST APIs based on contract-first schema design. At the end I'll show how a central data catalog for semantic interoperability (a lot of words, I know) supports this approach. But don't worry, this is not a talk about semantic web theory, and to the semantic web folks: please forgive me, I will try to make things understandable. In Italy we want to simplify API mash-ups, but it's not easy, because we are a lot of people, we have a lot of agencies, and every agency publishes its own datasets or services through APIs. The hard part is that all of this should have some common meaning, some common ground, and this is not easy. Let's see a simple example. What is semantics? Semantics is the study of meaning, and it is important to be sure that a message is understood. Here we can see two different API messages, but they are not very clear. In the first case we don't know whether it is a full name or just a first name, and if it is a full name, which part is the given name and which is the family name. In the second case, we know that the person earns four million of something, but we don't know what that something is, so if we have to exchange this message with another country that has a different currency, this message can be problematic to integrate or mash up. The solution is controlled vocabularies. Controlled vocabularies are a computer science tool that uses URIs to disambiguate terms. It is very simple.
The first part of the URI is the name of the vocabulary; for example, this one is the DBpedia vocabulary. Then there is the term, in this case dog, and then there is a definition: you see the rdfs:comment field, which is the conventional field in vocabularies for a definition written in human-readable language. So vocabularies contain a collection of terms and define concepts and relationships in a specific domain: healthcare, finance, whatever. They are validated by a designated authority, which is not necessarily a public authority. For example, your own company could have a vocabulary defining its different job titles, so that when hiring managers have to hire people, they can use well-specified job positions instead of inventing them. And vocabularies are formally described: we have languages for that, using the text/turtle media type or its JSON counterpart, JSON-LD, which is a W3C specification. All those languages are completely isomorphic, so you can switch from Turtle to JSON-LD and get exactly the same information. Complex vocabularies are called ontologies, but that is not the focus of this talk. Codelists are the simplest form of vocabulary: they are simple lists of terms, like the job title example I mentioned. So let's see how to create a very simple vocabulary. Here we have a vocabulary made up of four terms described in Turtle. First, I declare the URI namespaces so I can write things in a more concise way: instead of writing w3.org/2000 and so on, I just write rdfs. Then I define the terms using one or more sentences. A sentence is a triple, made of a subject, a predicate, and an object. So I say that a Person is a natural person. It is described, you see, with the rdfs:comment predicate, which means the definition is human-readable, not machine-readable.
A Person has a givenName, and givenName is the given name of a person. The same goes for registered family. As you can see, family is a complex term: for different countries, or for different communities, family can mean something different. Even in the same country, the term family can have different meanings for different agencies. In this vocabulary, a registered family is a group of people tied together according to a very specific Italian law. For a service produced by another agency, the term family could have a different meaning; in that case it is not an it:RegisteredFamily. You can see that `it` is w3id.org/italia/onto/CPV. This means that worldwide, I can identify a registered family with a unique URI that is valid everywhere. If Italy interacts with another country and we use this term, they can see its exact meaning, and another country or another agency can use a different URI to define a family. So, three terms for now. Now we define another term, isChildOf. isChildOf is the child-parent relation, and you can see I have another sentence defining isChildOf, saying that it applies to Person. At this point I have a very clear definition of what a person is, what a given name is, what a registered family is, and what isChildOf means, all in a very small vocabulary: every term is well defined in this file. And I can use Python to process these kinds of files. The library is rdflib, and a vocabulary is interpreted as a graph, because I have entities that are related together: subjects and objects are related by predicates. So I parse those files into a graph, and then I can translate this information from the Turtle format to the JSON-LD format, which is completely isomorphic. There are other ways of serializing this information, for example XML, but we are not interested in XML here. In this case, you can see that I have a context.
The context says that the `it` string means that long URI, and then I have a graph that is made of a list of triples. I have isChildOf, which has a comment and a domain, and you can see that the domain has an @id: this means there is another entry in the graph whose @id is it:Person. Let's make another example, which is very interesting and very useful. I can define vocabularies not only for concepts, like the Person concept, but even to describe datasets. This allows me to provide a lot of information in a dataset, and this information doesn't need to be linear: it can be a graph. This is a vocabulary based on the SKOS and Dublin Core standards, which provide keywords and predicates for creating more and more vocabularies. The Turtle syntax supports internationalization using language tags. For example, here you can see the country Italy: this is the subject, the identifier is the ITA string, and I have two labels, but I could have more, one in Italian and one in French. The same concepts can be expressed in a more concise way: sentences with the same subject or predicate can be shortened using semicolons and commas. For example, I can just write France with an identifier and two preferred labels. Vocabularies can even relate terms: for the Czech Republic, I can say that it replaced another entry in the vocabulary, Czechoslovakia, and the same for the Slovak Republic. Conversely, I can say that Czechoslovakia has been replaced by the Czech Republic and the Slovak Republic.
Looking at all this, you can understand how vocabularies improve quality. If in my service I say, OK, I use a three-letter ISO code to identify a country, and I say that I'm using this EU vocabulary, then with those three letters I have not only the information of which country it is, but also the localization of the country name in all the languages of the European Union, and a lot more information besides: for example, whether the country uses the euro currency, or whether it has been replaced by another. This is very important for registry information, because if you're a citizen of the Czech Republic and you were born before, say, 1980, you were not born in the Czech Republic: you were born in Czechoslovakia. So you can use this vocabulary to map back the old information about countries, and all you need to store in your dataset is the three-letter code. This is very helpful. Vocabularies are stored in graph databases: you can use Virtuoso, you can use Amazon Neptune, and you can query all this information using the SPARQL protocol. Here I have the vocabulary we saw before, and I make a query where I specify a list of predicates that should match. I say the URI should be in the scheme of the EuroVoc country vocabulary; eurovoc and skos are resolved using the namespaces I introduced before. Entries should have a concept, via skos:prefLabel, and an identifier, and I am interested in the English labels and not the others. With this I extract a very simple table with the URI, the concept (Italy, France, and so on), and the identifier. So I can load this information into a graph database and extract very simple views of this complex information. I have explained what vocabularies are; now we'll see how we can use them in a very simple way.
So I can use vocabularies to describe data. This is a very simple example: it's not a country, it's me, identified by a mailto: URI. You can see this is my mail: it's me, worldwide. I am described by five sentences (I updated my slides). The first says that I am a Person according to the Italian vocabulary for persons. Other predicates state my given name and my family name, again according to the Italian vocabulary. OK, a family name seems simple, but our friends from Iceland, for example, have a patronymic or matronymic, so the concept of family name in Iceland is different from the one we have in other European countries. Or, for example, this is my given name, but is it the same name I had when I was born, on my birth certificate? Maybe I changed my name over time. As you can see, when you design services for millions of people, there are a lot of other cases that may happen and that you may have to take into account. In this case, I am stating that this is the given name I have now, not the one I had at birth, and that this is my family name, not a patronymic or matronymic. This is my birthplace, according to the EU vocabulary. And I have an identifier: OK, I picked my mail, but it can even be different from the one I use as the subject. Applications can use all this information, and all the linked information, to automate interoperability checks and other logic. For example, they may check whether the country where I was born still exists, or whether it has been superseded by other countries. Those are all checks you can do if your data is linked through vocabularies. And the nice thing about linked data is that it has many dimensions: it is a graph.
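The supersession check mentioned above can be sketched with a tiny in-memory slice of the "replaced by" relation; the table below is hand-written for illustration, while a real application would read it from the published vocabulary.

```python
# A tiny, hand-made slice of the countries vocabulary: the "replaced by"
# relation from the talk. Real data would come from the EU authority table.
REPLACED_BY = {
    "CSK": ["CZE", "SVK"],  # Czechoslovakia -> Czech Republic, Slovakia
}

def current_countries(code: str) -> list[str]:
    """Map a possibly superseded country code to its current code(s)."""
    return REPLACED_BY.get(code, [code])

# Someone born in Czechoslovakia maps to two possible current countries.
print(current_countries("CSK"))  # ['CZE', 'SVK']
print(current_countries("ITA"))  # ['ITA']
```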
And there are people who spend their lives curating this information. But you can project it down to lower dimensions, so that people who are not aware of all this complexity can still use it, because they may just be interested in a list of country names and localized names, so that a dropdown in a web application shows "Italia" instead of "Italy", for example. There are specifications that allow you to project data onto those dimensions. There is JSON-LD framing, which projects this kind of data into a very simple JSON object that you can use to make queries and produce, for example, CSV. And there is CSV on the Web, another specification, which allows you to interpret CSV information as linked data. The important thing is to build on specifications. Let's see JSON-LD framing, using the Python PyLD library. The code is quite simple. It loads the European country vocabulary from the URL published by the European Commission, loads it into a JSON object, and then makes a projection, which this specification calls framing. It selects all subjects that have a given type, in this case all subjects of type skos:Concept (a technicality, but that's OK), and it shows only the fields you see there: country code, versionInfo, and the label in English. Those fields do not exist yet: this is the shape I want the JSON object to have, and we will see it in the next slide. Then I have a context. The context takes the RDF information on the right, so the @id, the identifier, the versionInfo, the preferred label localized in English, and maps them to the specific fields. And the nice thing is that the same context object can be used to convert the simplified JSON object back to the original semantic form. Let's see it, because it's very simple. On the left, I have the vocabulary.
On the right, I have the JSON. In the context, I say url is @id, so it takes the @id from the subject of the vocabulary and puts it into the url field. Then it takes the skos:prefLabel predicate, takes just the IT localization, and puts it in label_it. It takes versionInfo and puts it into the versionInfo field. Since I'm not specifying anything about the euro currency or the currency adoption date, that information is simply left out. In this way, I have a very simple projection of this very complex information that can be handed to web developers who have no knowledge of all the vocabulary complexity we are discussing now, but who can use it, for example, to populate web forms or APIs. So the challenge when we work with vocabularies is making this information accessible. We can build platforms that publish data in different formats, so that people can use it directly to create APIs or fillable online forms. I have this linked data information that is complex, maybe boring, maybe not comprehensible. OK, but I can create a platform that, through the framing I showed before, produces a JSON API. Or I can produce CSV, so you just pick the fields you want and get tabular data. Or I can produce a JSON Schema. Imagine I want to build an API and populate a field that should be constrained to only the countries in this vocabulary. This is a JSON Schema, you can see it, that provides an enumeration of all the values contained in that vocabulary. People are not supposed to understand how the vocabulary works, but you can write an API that wraps it and exposes it as a JSON Schema, so that somebody building an API can say: just reference this JSON Schema URL and you get that vocabulary for free. Another thing you can do is embed it into Frictionless Data, a specification that provides metadata for tabular data.
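Wrapping a codelist as a referenceable JSON Schema can be sketched in a few lines; the codes below are a hand-picked sample rather than the full vocabulary, which in practice would be extracted from the published data (for example via a SPARQL query).

```python
import json

# Illustrative country codes; in practice they would be extracted from
# the published vocabulary rather than hard-coded.
codes = ["ITA", "FRA", "DEU", "CZE", "SVK"]

# Wrap the codelist as a JSON Schema that any API can reference via $ref.
schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Country",
    "description": "Three-letter country code from the EU countries vocabulary.",
    "type": "string",
    "enum": sorted(codes),
}

print(json.dumps(schema, indent=2))
```

An API designer then only needs the schema's URL; the vocabulary machinery behind it stays hidden.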
So we are mostly working on enabling people to use, in a simple way, this data that can seem complex and not completely understandable. Now let's shift to semantic APIs. We want APIs to be able to reference concepts and vocabularies, to provide a complete, machine-readable description of the exchanged concepts. If I send a payload, I want a machine to be able to validate it not only for its syntax, but also for its semantics. So how can you build semantic APIs? Semantic APIs should be built using the same vocabularies. When different APIs use the same vocabularies, there is this feature we have seen before, the JSON-LD context, which allows mapping JSON properties to vocabulary terms. Here we have two API payloads: the first in Italian, the second in English. How can I know that they map to the same person? I can write a context for my API. This context says that the nome property maps to the given name in the Italian vocabulary on w3id.org, that cittadinanza maps to the citizenship concept, and that its values use the European country vocabulary. This means that whatever is in the value, say the ITA string, should be appended to the base URI given in the context. If the other API does the same work with given name, citizenship, and so on, both payloads can be mapped back to the same vocabulary. So you can see that I can transform them back and see that the user has the Italian-vocabulary given name Mario, and a citizenship that resolves to the full URI of Italy in the European country vocabulary. So the work to be done is to design APIs that can interoperate across different ecosystems. Imagine, for example, you have to integrate an API that works in the finance sector with another API that works in the registry sector, or in another financial sector where the regulations are different.
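The context mechanics can be illustrated without a full JSON-LD processor by hand-rolling the mapping. The field names follow the slide (nome, cittadinanza, and their English counterparts), but the exact property URIs are written from memory here, so treat them as assumptions; a real implementation would use a JSON-LD library's expansion algorithm.

```python
# Two payloads describing the same person with different field names.
payload_it = {"nome": "Mario", "cittadinanza": "ITA"}
payload_en = {"given_name": "Mario", "citizenship": "ITA"}

# Each API ships a context mapping its own fields to vocabulary URIs.
# The URIs follow the pattern described in the talk but are illustrative.
CPV = "https://w3id.org/italia/onto/CPV/"
COUNTRY = "http://publications.europa.eu/resource/authority/country/"

context_it = {"nome": CPV + "givenName",
              "cittadinanza": (CPV + "hasCitizenship", COUNTRY)}
context_en = {"given_name": CPV + "givenName",
              "citizenship": (CPV + "hasCitizenship", COUNTRY)}

def expand(payload: dict, context: dict) -> dict:
    """Hand-rolled 'expansion': map fields (and coded values) to URIs."""
    out = {}
    for field, value in payload.items():
        term = context[field]
        if isinstance(term, tuple):        # (predicate, base URI for values)
            predicate, base = term
            out[predicate] = base + value  # append the code to the base URI
        else:
            out[term] = value
    return out

# Both payloads expand to the same semantic statements.
print(expand(payload_it, context_it) == expand(payload_en, context_en))  # True
```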
You should go through your payloads and check whether the concepts you are using in your APIs are the same. For example, in some cases you may use the concept of a legal person, and in other contexts the concept of a natural person. They may not map onto each other in different ecosystems, and this means that if you are creating a financial application that only works with natural persons, it's OK, but if you are creating one that should work both for natural persons and for legal persons, for companies, maybe you need to tweak your application before integrating, before mashing up; otherwise you may end up with inconsistencies. So, how can this enable interoperability in cross-border services? The basic idea is that the European Commission defined a core vocabulary for a person, identifying a common subset of person attributes. On the left you have a registry record in Italy, with a given name, a second name, a surname, and a country. Some of those fields map to the European vocabulary, which is w3.org/ns/person. So I can map some of these fields: the second name, which would map to an alternate name, has no mapping, but for this subset I can transform this person record into a person record that can be mapped in all other European countries. And the same can be done by the other countries. This means I have a basis for creating interoperable services, so that if you move to Finland, for example, or to Ireland, the basic registry information is available all across Europe. The problem now is that we have different specifications living in two worlds. The first is the linked data world, Turtle and JSON-LD, which is used by semantic web specialists and is very complex for web developers and service developers to understand. The other world, the one of web developers and API developers, is the OpenAPI world.
The problem is: how can I bridge those two worlds, which have different requirements? When I design a service that should be available to 60 million people, for example, I have to scale for billions of requests, so I cannot convey the complete semantic payload every time, with all the specificity of the service. On the other hand, if I have to convey this payload to another country to create an interoperable service, because I want to attend a French university while my records are written using Italian schemas, how can the French university's web service understand those schemas? We try to bridge the gap. We leave agencies the freedom to define their own JSON Schema, so they can freely define the fields they want when providing their services, but they should do it in a way where fields map consistently: the meaning of the fields should be consistent with the Italian ontologies, with the Italian vocabularies. So when you say, for example, given name, it's not the patronymic, it's not the name you had at birth; it's the name you have now, after any name change (because you didn't like your old name or surname), the name currently in the Italian National Registry. And the agency that provides the service should provide semantic information in the form of a JSON-LD context. The JSON-LD context is the thing we saw before: an object where every JSON field is mapped back to a URI in a vocabulary. The country should be mappable, the given name should be mappable, the surname should be mappable. If I have this kind of information in the schema, I can design the integration before starting to develop.
So it's an exchange of information that does not happen at runtime, not while the mash-up is ongoing, but when two organizations design their APIs: they will check the context, check whether the semantics of those APIs are the same, and then they will be able to create a mash-up that is semantically coherent. This can also be used in the integration phase, because you can write tests that, if you rely on a vocabulary, download the vocabulary during the test and check whether the information you provided is coherent, for example, with the job titles your organization decided on, with the list of countries your organization intended, or whether the web developers of your UI used the same localization labels provided by the vocabulary. For this, we filed a draft RFC that you are welcome to check, and we even stubbed some interfaces. For example, we implemented a slightly modified Swagger Editor where, while you design your API, if you type the URL of the Person class, it will make a query against the SPARQL endpoint that stores that vocabulary and provide you with all the properties stored in the graph database. In this case, the web developer doesn't need to know about vocabularies. They just need to know that there is a model class for Person and that they can use its properties in different ways, in different orders, in different shapes, but each property they expose in their own schema must map back to one of the original properties of the class. And in some cases, if there are vocabularies for specific properties, you can import those, either in the form of lists or in the form of OpenAPI schemas. All this happens at design time, and catalogs ensure that API design is consistent within a given ecosystem.
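A design-time test of the kind described might look like this; the allowed-countries set is a hard-coded stand-in for the codelist that a real test would download from the catalog.

```python
# In a real integration test, this set would be downloaded from the data
# catalog's codelist endpoint; here it is a small hard-coded stand-in.
ALLOWED_COUNTRIES = {"ITA", "FRA", "DEU", "CZE", "SVK"}

def validate_citizenship(payload: dict) -> None:
    """Fail fast when a payload value is not in the agreed vocabulary."""
    code = payload.get("citizenship")
    if code not in ALLOWED_COUNTRIES:
        raise ValueError(f"{code!r} is not in the countries vocabulary")

validate_citizenship({"citizenship": "ITA"})   # passes silently
try:
    validate_citizenship({"citizenship": "XXX"})
except ValueError as err:
    print(err)
```

Running such checks in CI means semantic mismatches surface during design and integration, not in production.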
So, in Italy, we are building the National Data Catalog for semantic interoperability. It's a long name, but that's it. We already have a set of controlled vocabularies that you can get from their URIs; the sources are on GitHub, and those vocabularies are aligned with the European authority tables. Authority tables are very interesting if you have to plan services with thousands of producers and consumers working independently, because this way you don't need people to sync up among themselves: they can always rely on those authority tables. The National Data Catalog for semantic interoperability will make it possible to find reusable vocabularies and ontologies, to share semantically interoperable schemas for public services, and to ensure that APIs have a correct meaning and can be mashed up together. We are also working on a number of semantic specifications for interoperability. We are registering the YAML media type, because it has not been formalized yet, and in this work we are providing security and interoperability considerations; it's very interesting, I suggest you read it. There is YAML-LD, an ongoing W3C specification that allows expressing all this information in YAML instead of JSON. Then there is a specification to bridge JSON Schema and OpenAPI, so that you can formalize all these concepts better; there too, you can find interoperability and security considerations. If you are into this kind of stuff, please drop me a line. There is a lot of community work ongoing with the OpenAPI community, the IETF, and the W3C, and we are working on all those tables together. Then there is a very experimental work we are doing to bridge REST APIs and linked data; this work is ongoing on GitHub. And we are even trying to bridge these things with Frictionless Data.
That is the specification that allows bridging CSV, Excel, and other data exported in not very semantic formats into a well-understandable REST API ecosystem. Well, I think I'm done; I finished quite early. If we have some time, I can show you a couple of demos, but first I think it's better to take some questions. Yeah, so we'll do the mic, and we'll have a Q&A session; in the meantime, before the session ends, maybe you could do some demos. OK, I can show you a couple of specifications. OK, questions first. So, you've mentioned that you've designed the whole system for around 60 million users. I guess that's not the real number of users connecting to your system, so what's the real workload, and what is the demand compared to what you've designed the system for? Well, we actually designed the system for 60 million users because... There are 60 million Italians, right? Yes, and we provide services to all of them. And the system being designed is actually even wider, because 60 million is just the users: we also have 10 million companies, and we have vehicles. The goal of our work is not just to design the single systems. Every agency, for example the Ministry of the Interior, designs its own system, such as the national population registry, OK? The problem is that you have to make all this information interoperable and integrable with all the other agencies that, for example, hold information on vehicles, or companies, or the invoicing system that processes every single invoice issued in Italy. When an invoice is processed, you need to ensure that the sender and the recipient are existing, living persons, for example. So in terms of workload, we are speaking of systems of thousands of APIs that are interconnected.
And the challenge is not really operational, because if you just want to face these things operationally, OK, there are best practices for delivering single services. The point is that if every agency designs its system in isolation, then when you have to create services for citizens that need to integrate the API of the population registry with the API for companies and the API for fiscal information, for example, those services are not harmonized. If every single agency designs its services optimizing for its specific workload, you will have very efficient vertical services, but you are not allowing an API ecosystem to grow, and you are not allowing, for example, local agencies to build services. One of the problems in Italy is that you have, let's say, 400 central agencies that are big, but you have 8,000 municipalities, and they are very close to the citizens. Not every municipality has a big budget for building large services, but they may be able to mash up at least basic services to provide a customized user experience for their citizens, for example for some social services. So the challenge is quite different: it's not just workload, it's optimizing the workload of the country, not just the workload of a single agency. For a single agency it's complex, yes, really complex, but it's doable; here we're talking about a whole country, I mean. That's a huge scale. Yeah, the problem is to create something that can serve not only the verticals but also the locals. A region should be able to mash up APIs that are provided by major ministries or nationwide agencies. That is the challenge. I don't know if... I understand I may not have answered your question, but I think that... Yeah, but you've described the matter of...
So thank you, Roberto, for the talk and let's give him a round of applause. Okay, thank you. So our next session will be after the coffee break.