So hello everyone, welcome to our session about CloudEvents discovery. Clemens and I will give an introduction to what we've been up to regarding CloudEvents discovery over the past years. First, maybe a short look back: CloudEvents 1.0 was finalized in October 2019, and at that time, as the core spec was stable, we met and discussed what to do after this core specification. Of course, in parallel to this the normal work was also ongoing, like additions to the specification, additional protocol bindings, and of course the valuable work on the SDKs and so on. But we decided in 2019 that discovery and subscription handling would be the next things to look at.

So we now have a few draft specs in our repository: one for subscriptions, and one I here called "registry". In fact that one is exactly about discovery, and why it's called registry I will explain over the course of this talk. We also have some specifications that emerged as, you could say, side effects of the other two, like pagination, which is about handling larger result sets from API calls and paging through them, and CESQL, which is a SQL dialect of its own, just for creating CloudEvents filters for event subscriptions.

But today it's about discovery. So what is discovery? I think in the beginning people had various expectations of this, and I would summarize it under three questions. First: what events are around in my context? "Context" of course means a lot of different things for our participants. It could be a product, a landscape, or just a specific service; that's deliberately left open to the use case at hand. And once you have found an event and know essentially what it looks like, what the source, the type, and those attributes are, you might wonder what the payload looks like.
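The CESQL filters mentioned a moment ago can be pictured with a small sketch. This is a hypothetical subscription object; the field names, the sink URL, and the expression are invented for illustration and are not taken verbatim from the draft specs.

```python
# Hypothetical subscription carrying a CESQL filter. CESQL expressions
# evaluate over CloudEvents context attributes such as 'type' and 'source'.
subscription = {
    "id": "sub-42",
    "sink": "https://consumer.example.com/events",  # where matched events go
    "filter": {
        "dialect": "cesql",
        "expression": "type = 'com.example.customer.created' AND source LIKE '/crm/%'",
    },
}

def describe(sub: dict) -> str:
    """Summarize a subscription for logging purposes."""
    return f"{sub['id']} delivers [{sub['filter']['expression']}] to {sub['sink']}"
```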
So: what is the schema of the payload? That is also something that has applications beyond eventing and messaging, because schema definitions are widely used for code generation and also for validation purposes. And once you have those two questions covered, you might be interested in a specific landscape and look there for endpoints that are ready to consume or produce your events; that's then the endpoint registry.

But it starts with the event definitions. Event definitions we just see as something that can be summarized in a group, a definition group, and as I said in the beginning, this context, this group, can mean a lot of things; it's really up to you what you want it to be. The other thing you see at the bottom is the rough structure of such a definition. It has some of the expected attributes like ID and description and so on, but you also see something like "format". Here it says CloudEvents, but we realized that this can also be used by people who maybe don't use CloudEvents but just plain messaging, with messaging protocols like AMQP. So this can be extended, and there are already some definitions for other plain messaging formats. The other things you see in this sample at the bottom are the attributes section, which I will explain a bit more on the following slide, and a link to a schema; that's already touching the schema part.

So what can you say in a CloudEvents contract? It's all about attributes. That's what we have here, and in addition to what is specified in the CloudEvents core specification you can go further, like defining what the event type of your event is supposed to be, which is what we have here, for example, for this "customer note added" event. Actually, the first part here could also be left out, as it's just repeating the constraints
we already have in the specification: an ID is always supposed to be a string, and it is a required attribute anyway. But for source and subject here we have URI templates, and you can see that, for example for source, there's one variable field; the source is always made of those segments in the path, with CRM, customers, and the region. For subject it's even a bit more interesting, because in the spec it's an optional attribute, but here, for this event, it's made required. Also, originally it's just a string, we don't constrain it further in the core spec, but here it's a URI that contains a UUID.

For schemas we also have this grouping idea, and you can store all kinds of schema documents. We are not really fixed on a specific language; we already define the usage of JSON Schema, XML Schema, Avro, and Protobuf, I think, but it's really open to be used with any schema definition language. What we have here, in addition to the event definitions, is that we also allow having multiple versions of a schema in parallel.

Endpoints are again just an extension of the plain definition group, in the sense that they are allowed to add configuration data for technical things like protocol settings and so on. They define a usage, and that's one of the things I'll explain on the following slides. They also allow linking to definition groups, so that already-defined events can be reused. In some environments you might have hundreds or thousands of endpoints that all refer to some predefined events, so repeating them over and over again would not be very efficient. One more thing that will also be explained on the following slides is the channel.

So let's first look at the usage types of endpoints. We have consumer endpoints.
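An attribute section with constraints like the ones just described might look roughly like this. This is a sketch: the key names only approximate the draft spec, and all values are invented.

```python
# Sketch of per-attribute constraints for one event definition. 'source'
# and 'subject' use URI templates; the {...} segments vary per instance.
event_definition = {
    "id": "customer.note.added",
    "format": "CloudEvents/1.0",
    "metadata": {
        "type":    {"value": "com.example.crm.customer.note.added", "required": True},
        "source":  {"value": "/crm/customers/{region}", "required": True},
        # optional in the core spec, but made required and narrowed here:
        "subject": {"value": "urn:customer:{uuid}", "required": True},
    },
}

# List which attributes this definition makes mandatory.
required_attrs = sorted(
    name for name, rule in event_definition["metadata"].items() if rule["required"]
)
```

So much for the attribute constraints; now back to the endpoint usage types.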
We have producer endpoints, and also subscriber endpoints. Let's start with the consumer endpoint. That one is sometimes also referred to as the pull model: you have a consumer that wants to consume events and therefore looks for that consumer endpoint, and in the first step it initiates a connection to this consumer endpoint. Then, once the connection is established, the events can flow. Typical examples of this are pub/sub models, where you subscribe to a topic, for example, but it could also be realized with an HTTP GET call.

The other direction, the push model so to speak, is where you use a producer endpoint. You go out and look for an endpoint where you can send specific events to, and here the producer also initiates the connection and then sends events over that connection. I guess the simplest example of this would be a webhook.

And for subscriptions there is the subscriber endpoint. There you have a consumer, or some party, actually, that uses that subscriber endpoint to create an event subscription: a filter plus the specification of a producer endpoint to send the events to. That can, for example, be used in combination with webhooks.

For channels: I said that there's this channel attribute in the endpoint definition, and it can be used to correlate which endpoints belong to the same channel. Imagine you have something like a Kafka topic or any other queuing system. Then you would have a producer endpoint on the inbound side and a consumer endpoint on the outbound side, and through this channel field you could discover that they are attached to the same channel and correlate them.

So maybe you already saw that there is some commonality between those three registries we have: it's always this hierarchical setup. There are some differences, but there is also a common core, and that is to have groups of metadata that are stored in those metadata resources. We can just define them in single files, and that's one part of the
specification, but you then also define a standardized API to access them. And this is then the new repo on the block, you could say. As this goes beyond just CloudEvents (you can define arbitrary message formats in there, you can store schemas, and it can even be extended to store other resources), we call it the xRegistry, the extensible registry, and we are currently in the process of moving the specification for this part into exactly this repository.

Some guiding principles we had when discussing this over time: it should be possible to start small, with your hello-world sample, for example. There you would just put some event definitions together, maybe with a schema also inline, and maybe even the endpoint description, all in a single file. We currently foresee the .cereg extension for this, and you could just manage this together with the source code in a repository. One step further would be to pull out the endpoint definition and have your deployment scripts or your infrastructure create the endpoint description on the fly when you deploy, and have that endpoint link to the statically provided definitions you have in your project.

But of course you could also do something more advanced. Maybe you're up to some kind of enterprise setup where you need a central registry and a lot more governance and organization: people want central control over the events and schemas that are defined, also for a more controlled lifecycle and versioning. You usually need this if you need interoperability between departments of the same company, but sometimes even to the outside, towards vendors or customers. And with evolving event infrastructures, you might also have the need for federated event discovery, so maybe even discovery services that in turn exchange discovery data among themselves. So, where are we right now?
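A "start small" single file could look roughly like the following, sketched here as a Python dict standing in for the JSON document. The key names approximate the draft and may differ from the final xRegistry spec; all ids and values are invented.

```python
# Minimal registry document: one definition group plus one schema group
# with an inline JSON Schema for the payload. Names are illustrative.
minimal_registry = {
    "definitiongroups": {
        "hello-world": {
            "definitions": {
                "greeting.sent": {
                    "format": "CloudEvents/1.0",
                    "metadata": {"type": {"value": "demo.greeting.sent"}},
                    # relative reference into the same document:
                    "schemaurl": "#/schemagroups/hello/schemas/greeting",
                }
            }
        }
    },
    "schemagroups": {
        "hello": {
            "schemas": {
                "greeting": {
                    "format": "JsonSchema/draft-07",
                    "schema": {
                        "type": "object",
                        "properties": {"text": {"type": "string"}},
                        "required": ["text"],
                    },
                }
            }
        }
    },
}
```

A file like this can live next to the source code in a repository and be split into separately deployed resources later.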
Just a disclaimer first: what I explained to you is the current status of our discussion, of what we have, and it is still work in progress. So changes may happen, but that is also your opportunity. If you would like to join us, if you have ideas or use cases to present, don't hesitate: you can join our calls or just open a GitHub issue, whatever suits you best, and tell us.

We also have some more challenges we are currently discussing in the group, like defining CloudEvents for xRegistry itself, so that we can have events when a schema changes, when there is a new schema version, when an endpoint changes, things like that. There are also cases where event definitions need to be enriched by someone who points to them, because there are more specific attribute definitions, or maybe custom attributes added to an existing event definition. And sometimes endpoints also add additional attributes, like in the example here, a partition key. If you think of a gateway that pushes events to a Kafka topic, that gateway might add the partition key as an additional field, and the producer originally did not even know about it. So that's also an interesting case.

But all these very sophisticated examples might be a bit hard to discuss right now, so it's better to get some practical experience first and also learn from proof-of-concept implementations, and that's exactly what Clemens has done extensively over the past months, I would say, and he wants to show some of this. So, handing over.

Thank you, Klaus. So I'm going to go away from the slides;
I'm going to switch into VS Code for you, because that's more fun, and I will show you a few samples and sample documents to illustrate what we have here in terms of definitions. So there is an API, and there is a document format. I'm not going to show much of the API here today, because we can basically infer from the document structure what that API looks like. What we did is we created a symmetry between the documents and the API. That means if you go to the root of the registry and then to /definitiongroups, you're going to get exactly the content that sits here underneath "definitiongroups". If you go to /definitiongroups/ followed by "Contoso CRM Events", you're going to get that object. So we basically have a resource graph, you can traverse it using the URIs, and we have a pure REST service. What's richer in the API than in the document format is, obviously, that you can store many, many of these documents, and we have filtering, pagination, etc. But basically the structure that you see here in JSON is the same structure you will get out of the API, and we find that super important. Why? Because we think this registry will, in most projects, start small.

So I have a minimal file here where we have effectively a definition, you know, a few events (I'll go into details on those in a moment), and then we have the schema groups, all of that in one file. Another example: I'll just pick this again; you see that the file is fairly large. So I have an HTTP endpoint, and that HTTP endpoint points to the set of definitions that we're defining here. Those definitions are CloudEvents, and so this is the "customer created" CloudEvent. It has this type, we require a time, we have a particular source URI that needs to be defined, and it's using a schema, in JSON Schema, and it's pointing to that definition inside of this document via a relative URL. So if we go and scroll down to the bottom,
that's where we find those things. So you'll see that within a single document we can store schemas, be that JSON Schema, be that a Protobuf schema, be that whatever schema you like, an Avro schema, etc. We can store event definitions, meaning constraints on top of CloudEvents, defining exactly what those events are. And we can have endpoints, which then refer to those definitions. Very practically speaking, what you can do is define, for a messaging channel if you will, exactly what's allowed or what can be expected on that channel. That's what that channel concept is that we have. We don't say it's a queue, we don't say it's a topic, we take no stance on this. We're simply saying: here's a thing you can go and send events into, and here's the contract for it; and here's a thing you can go and get events out of, and that is what the contract is. That's what we're doing with these definitions.

The xRegistry is the underlying base structure that you can then extend with further things. Our project colleague Doug Davis has been doing the work to validate the abstractness of the spec by building an API registry with it. There we're using the same underpinnings as the schema registry, and we can embed effectively OpenAPI documents and AsyncAPI documents into our format here, and those can then refer, with their schema references, into the schema registry. So we have a very universal model here for modeling metadata. I'll show you a few more examples, because we've been mostly talking about CloudEvents, so let's go back to this, just so that we have context.
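Before the protocol examples, here is a quick sketch of the channel correlation idea from earlier: two endpoint records that share a channel identifier. All ids and names are invented.

```python
# Two endpoints attached to the same underlying topic or queue. The
# shared "channel" value is what lets a client correlate them.
endpoints = [
    {"id": "orders-in",  "usage": "producer", "protocol": "kafka", "channel": "orders"},
    {"id": "orders-out", "usage": "consumer", "protocol": "kafka", "channel": "orders"},
]

def correlate(eps: list) -> dict:
    """Group endpoint ids by their channel field."""
    by_channel = {}
    for ep in eps:
        by_channel.setdefault(ep["channel"], []).append(ep["id"])
    return by_channel
```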
So this is an event that is defined as a CloudEvent; here is an AMQP example. We have an AMQP endpoint, which is an AMQP queue, and here, instead of defining a CloudEvent, the format is AMQP 1.0. What you'll find is that the metadata is representing, effectively, the message attributes of an AMQP message: the message properties, the time-to-live that sits in the header of the AMQP message, custom application properties. Which means we can use this to create a contract for AMQP, which is something that doesn't exist today. You can now go and take these documents, or a link to the API, associate them with a queue, and all of a sudden you can go and inquire at that queue: what are the messages that you're expecting, or what are the messages that I can expect when I come to you? Which is pretty amazing.

I can do the same thing with HTTP. So this is the CloudEvents version of HTTP... wait, sorry, MQTT. MQTT also has a particular set of metadata: topic, QoS, retain, user properties, etc. So we can also represent that here. Basically, for all messaging formats that have particular metadata models, we can represent them all in a single registry here. That is all very powerful.

And there's a reason I'm doing this in code: because I have a little tool called ceregistry. Hm, something's wrong... okay, it was just thinking for a moment. So I called this tool (let me make this a little bigger), and it gives me back a number of templates that it has. So what I can do from here is say "ceregistry generate", and then say I would like to have a producer; for the language I'm going to pick C#; I want to put this into the output, tmp 01; the project name shall be CRM; and the definitions shall come from samples, message definitions, Contoso CRM. So that's the Contoso CRM file, the example that I just showed you. It goes and thinks a little bit, and out comes a project.
What it just dropped is effectively an event producer that takes all these definitions. It knows that there's an endpoint, an HTTP endpoint, in the file, so it creates a factory method for that. We have that for C#, we also have that for Java, we also have that for TypeScript. And then there's effectively a typed method for every event that's in there, and basically everything that's defined as fixed is inserted here into that file, and you only supply the extra data that is defined for the event. So this generator has enough information to always create a correct, interoperable CloudEvent. The data class here also stems from this project: basically it runs through the JSON Schema and then generates the correct class. That's something we have all built into this tool.

This tool, however finished and fabulous it looks, is a prototype. What we built it for is to just prove out that we can generate these things and that they work. So I'm going to go into output 02; we're going to do some Java. There drops the Java project. So now I have the event producer created for HTTP, so there's the factory method.
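The pattern the generator emits (a factory per endpoint, a typed method per event, with all fixed attributes baked in) can be transplanted into a small Python sketch. The event type, the URI template, and the function name are invented; this is not output of the actual tool.

```python
import uuid
from datetime import datetime, timezone

def create_customer_created_event(region: str, data: dict) -> dict:
    """Typed producer method: fixed attributes come from the definition;
    the caller supplies only the variable parts (region, payload)."""
    return {
        "specversion": "1.0",
        "type": "com.example.crm.customer.created",      # fixed by the definition
        "source": f"/crm/customers/{region}",            # URI template expanded
        "id": str(uuid.uuid4()),                         # unique per event
        "time": datetime.now(timezone.utc).isoformat(),  # 'time' required here
        "datacontenttype": "application/json",
        "data": data,
    }
```

The caller then hands the resulting event to whatever transport binding the endpoint prescribes.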
So this is exactly the same pattern, now in Java, and we can do the same thing for TypeScript as well. What that proves is that the format is good enough for generation. Now, there are obviously tool chains that you may already have and that you may want to use. So I can go and create a producer... no, no, don't do that; I'm going to pick the "language" OpenAPI. There comes the project. What it did now is take our definition and transcribe it into an OpenAPI document that you can then create a client from. So this basically creates a message schema for each of these messages, with the correct expressions, basically the correct methods and the correct messages for HTTP. This is effectively implementing the HTTP binding for CloudEvents, and we now have enough metadata to drive that code generator.

And then, just for fun (we're going to do this in 04), we can also generate AsyncAPI, because we are a level below. AsyncAPI is also a standard, also in the Linux Foundation, that defines effectively contracts for one-way transports, but usually for correlated request/response kinds of exchanges, similar to OpenAPI. And since our definitions are effectively a level below (we're simpler, but more precise, more down to the protocol), it's easy for us to write an AsyncAPI rendering, if you will, of that contract. So we can generate OpenAPI, we can generate AsyncAPI, and we can generate code, all based on those definitions.

As I said, this is all prototypical, and it's actually not even checked into the repo yet, just because we don't have the xRegistry repo yet; this lives in a repo that I use to prove these things out, but we'll put it into that common repository. Just to give you a quick look at how this works: there's a template directory in my project. It's Python;
this is all written in Python. So there's a template directory, and in this template directory you will find these projects. Here's my producer for C#, and the way this is done is using Jinja, the template engine, and I have written a fairly extensive set of extensions for Jinja that can be used in here. So this is effectively the project file, which, you know, picks up the package references, etc., and here's my producer file. So with this we effectively have a code-generator framework, if you will, that's super easy to use. You don't have to write any code; you simply write a new template, drop it into that directory, and then you can generate code for whatever you want, based on these definitions. If you want to, you can also steal that code and write your own code generator for your own metadata format.

So we have templates for, as I said, AsyncAPI; there's one that generates queries for Azure Stream Analytics; we have, as we saw, OpenAPI; and we generate Python and TypeScript code. Basically, the principle here is that the code generator always generates complete projects, which means you can create a package, compile it, and then refer to it and use it effectively as your client for your web service, or to consume events from, in that particular format. You always get a fully typed experience out of this. And that is all made possible because we now have a formal language with which we can define these contracts for messaging and eventing, with a clear eye on CloudEvents.

One last thing, and this is probably the most extreme example that we have. If you have any contact with the manufacturing industry: there is an MQTT-based standard called Sparkplug B, and Sparkplug B is used for machine-to-machine interaction, for collecting sensor information from a number of sensors. This
specification here, this document, is a formal definition of Sparkplug B in our format that is more formal than any of Sparkplug B's own documents, because there is no way today to formally define an MQTT message, and there is no way today to formally define an MQTT endpoint. This is it. So you have here an edge node producer for Sparkplug that has a particular topic format and that refers to a will message and to a will topic; these are all options you can define here. You have a node consumer, so you can see that these endpoints are effectively roles that the parties take in this MQTT protocol definition. So these are all the roles, and each of them has a particular assigned topic in this topic tree.

You'll see that these all refer to definition groups. So what they do here: this application producer knows about producing or consuming three kinds of messages, and those are then defined further in here. There are the so-called birth, death, and data messages (DDEATH, DBIRTH, etc.; I'm not going to explain the protocol to you). Some of them are retained ("sticky") messages, some of them are just telemetry messages, and we can define all of that in here. And you'll see the schema URLs, and the schema URLs point to the schema section, which in one case defines JSON Schemas. There is no formal JSON Schema for those things in the actual Sparkplug specifications, so that is something we did here. And in the other case it refers to the official Protobuf document, which is external; we don't embed it, we have a link to it.

So this is a formal way to define fairly complex protocol relationships: with AMQP, with CloudEvents, with MQTT, with Kafka, whatever you want. It's a formal language for messaging contracts. And that's what we have. Questions? The lady with the microphone is up here. Yeah, okay.
Go ahead.

Hi, first of all, I think it's a great idea and it fills a gap. But why did you decide to go with a .cereg file and not a regular JSON or YAML structure? Then you could use JSON references and other things. So why do we have an extension; why isn't it just a regular JSON file?

That's something that's not set in stone yet. We have mostly used this extension to separate it out, for integrating with tooling. So in the repos that we have there is a nascent extension for Visual Studio Code, and to trigger an extension, to make sure you have the right document so that it can go in and start the wizard, the code-generation wizard, you need to be able to tell what kind of document it is. That's why it is like it is. It is JSON, it's being registered as JSON, and these are all JSON files, so this is not non-conformant; we just have a special extension that helps to tell the tooling that this is the contract file you want. Not set in stone; as I said, it's just normal JSON, we're just storing it here with the .cereg extension, and since we're moving to xRegistry, obviously that name is still going to change. So whether it's more practical to use .json or .cereg or whatever the extension ends up being is something that is yet to be seen. And since YAML is a superset of JSON, you can convert these documents into YAML, and the parser understands those too.

Because we already have basically an HTTP server in place that has a lot of JSON Schemas for our objects, we could then just reference those and migrate.

The spec, I'm not sure, but I don't think the spec says anything about the extension.

Thank you. You're welcome. There was a question up here, up in front.

Thank you. Yeah, this is great.
I agree. One question: is the schema you mentioned about the headers of the event, or does it also apply to the schema of the payload of the event itself?

So those are the two sections: the schema registry is for the payload, for the payload data, and the message definitions are for the metadata of the message. The message definitions are effectively a further set of schemas, if you will, which are specific to the respective transport that you use, and they allow you to constrain only the metadata. The payload is then always whatever you want, because the general principle in messaging, across all products, is that a message is a binary blob with some metadata on it. The binary blob is formatted according to the schemas, may they be Protobuf, may they be Avro, may they be JSON or XML, whatever, and we can store all those schemas in the schema registry portion; the metadata of the message is defined in the message definitions section.

A question regarding the formats and the attributes, the metadata attributes schema. There is an intimate relationship between the metadata attributes and the format. Do you have a plan for a format schema, in order to use it for validation? That's one question. And another question: there is sometimes also an intimate relationship between the channels and the messages, in terms of authentication and the like, and maybe a format metaschema would be a good bridge between those two.

So, the metaschemas for MQTT, AMQP, CloudEvents, Kafka, and NATS, I think, and HTTP: those metaschemas exist, basically, as a set of extensions to the core registry model.
So there's a notion of a message, and then there are these formats that you see here. This format definition here is backed by a JSON Schema which defines what's valid within this metadata section, and we have those sub-schemas for this section for all those protocols. There's a fairly rich one for AMQP, because that has a lot of metadata sections; the one for HTTP covers the headers and the query section, etc. So we have that for all of those. So you can formally validate a message based on these definitions, and it's meant for that purpose: that you're able to validate messages even as they come along, whether they conform with the spec. So it's not just meant for code generation, but also for metadata validation.

And to the other point: we have gone as close to the endpoint definitions as we think we can. Authentication is so varied that many, many projects have already sunk the ship trying to get security done the same way across many, many projects, and we want to avoid this. There are several of us in the project who have suffered through WS-Security in the web services days, and we would like not to repeat that mistake of reinventing yet another metadata language for security. So we deliberately stay far away from it.
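To make that validation point concrete, reduced to a toy: a per-format metaschema check could look like the following. The real registry backs each format with a full JSON Schema; this hand-rolled sketch only checks required keys and types, and the field names are invented.

```python
# Toy per-format metaschema for MQTT-style message metadata.
META_SCHEMA_MQTT = {
    "topic":  {"type": str,  "required": True},
    "qos":    {"type": int,  "required": False},
    "retain": {"type": bool, "required": False},
}

def validate_metadata(metadata: dict, meta_schema: dict) -> list:
    """Return a list of violations; an empty list means the metadata is valid."""
    errors = []
    for key, rule in meta_schema.items():
        if key not in metadata:
            if rule["required"]:
                errors.append(f"missing required field: {key}")
        elif not isinstance(metadata[key], rule["type"]):
            errors.append(f"wrong type for {key}")
    return errors
```

Now, back to the authentication question.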
So, for these code generators here, what I chose as a strategy is that I basically have an interface, the credentials interface, and you can define whatever the credentials are and then pass them down into the transport implementation. But the definition doesn't need to get into the business of defining what the authentication story is, because that is, certainly for now, while we're starting up, too hard.

You made that a deliberate choice; a wise choice.

We made it deliberately sparse, because we all know that's a rathole that would sink the ship, and so we stay out of it. We know that there are practical ways to deal with it, and eventually we can probably get to a solution where we can define this, but this is not the time yet.

Thanks for the talk. It seems this scheme is so extensive that I could describe files and file formats with it. Should I do that, in case, for example, I have some private file format and different languages that use the same file?

Yeah, so, I agree with you. Let's see; we'll pick this. So here's my Protobuf schema registry piece. You can use the registry format for schemas, for instance, to create an overlaid metadata file, or metadata API, for a data lake, where you are effectively registering all the schemas for your data lake in that registry, and then you find some organizational way in the data lake to refer to that schema registry. Because there's always this problem if you have a data lake and you store schematized data, like Protobuf written into files: where do I put the schemas, right? How do I manage that? How do I manage the versions of schemas? Where do I keep them all? So this is also meant for just organizing serialization schemas that then point to files in the data lake, etc.
So this is all very intentional, and this is also the role of the schema registry. When you think about data streams: they start somewhere, where a producer creates an event, let's say telemetry; it runs through a real-time pipe and then lands in a data lake somewhere, where it's being stored, right? The consumer on the other side, who reads the data from the data lake, no longer has any messaging business. They don't need the message definitions, but they still need the schema. So our notion is that the schema registry is the piece that will be shared across all of those different interested parties which want to get at that serialized data. And it's more and more that you have Parquet and Avro and Protobuf and all those different formats that this data in data lakes is being held in.

More thinking about, say, MQTT: it's not Protobuf, it's binary.

Well, MQTT is just a transport.

As another example: MP3 or JPEG, for instance. Can I use this to describe those?

Well, if you want to use this to refer to and organize your files, you can probably do that too, right? But we are making things simple by having a set of groups, and contained in those are our resources, so we don't allow endlessly deep paths.

Yeah, and your thoughts on that: should I do it?

Should you do it? You have to decide that yourself. Maybe one last question, and the rest can come to you in person. Is that okay? That is okay.
Yes, great. How do you see this in relation to AsyncAPI and OpenAPI over time? Because this seems to be able to potentially replace those, more or less.

We intentionally leave a gap between what OpenAPI does, what AsyncAPI does, and what we do. What we do here is formalize messaging paths, if you will, with that core set of specs; plus, obviously, we have this universal registry model, which is added value completely independent of all the messaging aspects. What AsyncAPI and OpenAPI do is create effectively a correlation contract. AsyncAPI says: I send a message over here, and then I'm going to go get the response back over here. So it's taking "API" quite literally, right; there's always a relationship being built. We ventured into that for a little while, as we were talking about the contract, and then we decided to stay out of it, at least for the time being, because we think that contracts are actually far more complicated than this simple request/response story, and we believe that justifies deeper thought a little later. So we'll start with this. For instance: if you have a scatter-gather pattern, which means you send one message in and then you have nine parties responding to you, that's a legit contract. If you have one message that you send to someone and they answer you with a hundred messages or a thousand messages, that's a legit contract in asynchronous messaging.
There are so many more variations of contracts than a good request/response that we think we need a contract definition language that does messaging justice. But we're going to start with this, where we can basically just label channels with metadata, and for a while you have to figure out yourself what those contracts are. Then we can probably figure out, as an extension, the next layer, a contract language; and maybe AsyncAPI is the project that wants to pick that up. Thank you.

All right. Thank you.