Hello. Today, I'm going to give you a short overview as well as a demo of Project Alvarium. Project Alvarium is seeking to be the newest at-large project within the LF Edge umbrella organization, and we'll talk a little bit about the problem space and what it's trying to solve. My name is Trevor Conn. I am a technical staff engineer at Dell Technologies in the Office of the CTO. I'm also a member of the LF Edge Technical Advisory Council, a board member, and a founding member of Project Alvarium.

So, to describe the Alvarium concepts: modern applications are extensively distributed, and we no longer think of data as residing in a single centralized silo. Data originates at one part of the network and traverses that network, perhaps being mutated, filtered, and so on, all along the way. What Alvarium seeks to do is create metadata that attests to verifiable authority at the origin of data, but also through all those hops as it traverses the network, as well as through any mutations that may occur. We do this by creating metadata at what we call trust insertion points, which I'll describe in a minute. At each one of these insertion points, there is a series of factors that we can evaluate in order to determine that data is being created, modified, and handled in a way that conforms to best practices or policy. Each one of those factors can be independently weighted, because some things are more important than others. The trust from these factors is then rolled up into what we call a confidence score. That confidence score is an objective measurement of the level of trust that you can have in how the data was handled. The score may be used in various scenarios, like governing system behavior, sending out alerts for things that go wrong, notifications of a possible attack, or even just flagging applications that have been deployed and are misconfigured.

So by trust insertion points, we mean the various hops that data goes through.
Data, which originates at the edge, might go straight to the cloud, but it might go through any number of hops in between on its way to its final resting place. It could go through gateways, through regular compute in a data center, perhaps on-prem, finally making its way to the cloud; maybe it doesn't even go to the cloud at all. At each one of these hops, there will be applications that facilitate the business and that consume the Alvarium SDK, which allows us to capture the annotations that eventually roll up into the overall confidence score.

So this is a simple example just to illustrate the point. You see in the lower left-hand corner down here we have some sensor data coming in. We don't really think of the Alvarium SDK residing directly on the sensor. It's more likely going to be on the gateway that the sensors are attached to, but if you have a rich compute device, like a camera or something like that, there's no reason that it couldn't. In any case, the gateway receives the sensor readings and, through the SDK, is able to annotate certain aspects of its environment relative to the creation of that data. Does the gateway have a TPM? Did the gateway go through secure boot? Are the individual readings being collected being put into a distributed ledger for immutability and traceability? The level of confidence at this point is simply yes or no: is this criterion satisfied or not? When we move to the next hop, which is the edge server, there are also applications running there with the SDK embedded in them, and you may add additional annotations related to encrypted communication. You know, was the data handed off over TLS? Does the data have a signature on it that can be verified through public/private key pairs? And then finally, moving into the cloud, maybe we add another annotation on top of that that uses a hash or a checksum to determine whether or not the payload has been tampered with.
These annotations are stored in a separate data store from the business data. So the business applications are writing their data to one place; the Alvarium annotations are being stored in another place, and that could be a distributed ledger. It could also be a database sitting on the other side of a pub/sub. If all of these checks were to pass, then numerically we would have a score of six. But as I said before, if you want to independently weight some of these factors relative to others, then you'll want to think about that confidence score more as a percentage, because that will reflect certain things being more important.

So the demo that I'm going to show you today essentially involves three different workstations. These are simulating the roles of a gateway, a core compute node, and then a cloud compute node. The data is going to originate over here at the gateway on workstation two; it's going to then move to workstation three, where it's going to be transformed; and then that transformed piece of data will be moved over to workstation four. Now, you can see at the top that on workstations two and three, the business applications will be writing their data to a database, just like you would have a database supporting any number of business applications. The annotations, on the other hand, are going to be written to an IOTA stream, and then there's going to be a subscriber watching that stream, waiting for the annotations to come in, which will eventually calculate the score and persist that data into an Alvarium database.

So you might be wondering, well, how can I relate the application data to the Alvarium data? Right now, we're looking at application data as a document. So think of a sensor reading that comes in in JSON format: you can turn that into a hash very easily, and all the players in this network understand that we're using, say, the SHA-256 algorithm to do hashing.
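The document-hashing scheme just described can be sketched in a few lines of Go. This is an illustration, not the actual SDK code; the function name `documentKey` is hypothetical, but the idea is exactly as above: every party hashes the raw document bytes with SHA-256 and uses the hex digest as the shared key between the application database and the Alvarium database.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// documentKey hashes the raw document bytes with SHA-256. Because every
// participant applies the same algorithm to the same bytes, the hex
// digest works as a shared lookup key between the business database
// and the Alvarium annotation store.
func documentKey(doc []byte) string {
	sum := sha256.Sum256(doc)
	return hex.EncodeToString(sum[:])
}

func main() {
	// A sample sensor reading in JSON form, as described in the talk.
	reading := []byte(`{"sensor":"temp-01","value":21.5}`)
	fmt.Println(documentKey(reading)) // key into the Alvarium database
}
```

Note that the key is only stable as long as the document bytes are unchanged, which is the point: a tampered record no longer hashes to its annotations.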
Then, when I go to fetch records out of the application database, assuming they haven't been tampered with, I can hash those using the SHA-256 algorithm and use that hash as the key to go find the annotations and the score in the Alvarium database. So that's how we link the two together.

The other thing that I want to show you here is that I've got these: PKI yes, TLS no, TPM no, and so on. This reflects the annotations that are going to be captured at each point of the process. So for workstation two, we're going to capture two of them: a PKI annotation, which says that the data going out has been signed, and the fact that there's no TPM; none of these boxes have TPMs on them. And finally, we have an extra annotator for TLS. I've set up a simple REST endpoint with no TLS certificate, and so that's going to fail. But then when the data is passed to workstation four, there is a TLS certificate there, and so that one's going to succeed.

So, our philosophy in building annotators (and these are the only three we have right now; I'm sure there will be more in the future) is that they should not be satisfied by virtue of a developer passing a Boolean flag into the annotator. The annotator should be written in such a way that it can capture information from its environment, from its context, that then allows it to say whether or not it meets the criteria.

On top of that, here are the actions that are going to be taken at each point, which I think I've already briefly touched on, but this will make it more explicit. The data is going to be created over here on the left on workstation two, that's the gateway. Then it's going to be passed to workstation three; that's going to result in a transit annotation. Then it's going to be mutated, changed into something else, and that mutated version is going to be handed off to workstation four, and we'll see another transit annotation over there.
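To make the annotator philosophy above concrete, here is a minimal sketch in Go. The interface and type names are illustrative, not the actual alvarium-sdk-go API; the point is that the annotator inspects its environment (here, the presence of a TPM device node) rather than trusting a Boolean passed in by the developer.

```go
package main

import (
	"fmt"
	"os"
)

// Annotation records whether one trust criterion was satisfied.
// These types are a sketch, not the SDK's real definitions.
type Annotation struct {
	Kind        string
	IsSatisfied bool
}

// Annotator evaluates one aspect of the environment around the data.
type Annotator interface {
	Annotate(data []byte) (Annotation, error)
}

// tpmAnnotator decides for itself whether a TPM is present by probing
// the device node, instead of accepting a flag from the caller.
type tpmAnnotator struct{}

func (t tpmAnnotator) Annotate(_ []byte) (Annotation, error) {
	_, err := os.Stat("/dev/tpm0") // hypothetical check: device node exists?
	return Annotation{Kind: "tpm", IsSatisfied: err == nil}, nil
}

func main() {
	a, _ := tpmAnnotator{}.Annotate(nil)
	fmt.Printf("%s satisfied: %v\n", a.Kind, a.IsSatisfied)
}
```

A TLS or PKI annotator would follow the same pattern: look at the actual connection state or verify the actual signature, then report pass or fail.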
Okay, so that's basically the flow that we're going to go through. I set it up so that this is only going to execute once, because if I just turn it on and let it rip, it's too much to keep track of for something like this. So this is somewhat contrived, but I think you'll see the point.

All right, let's get into the demo. You can see here I've got workstations two, three, and four. I've also got my sort of personal workstation here, and that's what I'll eventually be running the dashboard on. The very first thing I need to do is stand up the IOTA Tangle. The way I have this set up, I'm using version 1.1.0 of the IOTA Streams library in order to put the stream on top of the Hornet tangle. I have two nodes participating in the tangle: one of them is a coordinator, and one of them is just kind of a secondary. Then, as part of the stream, there's going to be what we call the author console; that's going to register the stream, and then all of the parties that want to publish and subscribe are going to register themselves through that author console. It kind of ties the two ends of the pipe together.

All right, so here goes: starting the coordinator. And now I will start the secondary. Okay, so those are both up. Now I'm going to come down here and start the author. Okay, so now I essentially have an address up and ready to go, to be able to publish information on my stream. The other thing I have to do is start some applications that support the back-end process: the subscription, the calculation, the population across databases, and so forth. So I'm going to go do that now. Okay, so all that's up and running.

Now we're ready to actually kick off our business applications that consume the Alvarium SDK. I'm going to start workstation four first. It goes through this keyload operation in order to connect to the stream provider, and here you can see it's connected successfully.
Sometimes it has to try that a few times, so we may see some retries in here. Provider connection successful. Workstation two now. This is really the trigger for the whole process. This application is going to create the data; the other two are sitting over there, they haven't received anything yet, so they're not taking any action. When I start this, that will start the whole chain of events running.

All right, there's our message. So let's just see real quick if we can go over here and maybe capture one of these things; it doesn't always work out the way I want it to, but let's just go see. This is actually visualizing the tangle. I may have been too late; it looks like we're just getting milestones. Now, you can see here that the other applications have received their information, so I may have just been too late to actually capture those.

Well, let's go look at the dashboard. Okay, so there are our two pieces of data. Remember, we created one, and then we mutated it, right? So there are two pieces of data, and you can see they have two different confidence rankings. Now, this is sample data, so there's just gibberish in here. All I'm trying to capture is that, hey, this is a thing, right? This would be a sensor reading or something like that. So, this one has 57% confidence and this one has 71% confidence; let's drill into that.

If I click on this, I can see the annotations here. You'll recall I said that there were going to be two coming off of workstation two, and then three coming off of workstation three, and the only ones that are going to pass are the PKI annotations. Let's see if I can get back to it. Right. So, essentially, what we were just looking at is this create and this transit, and both of those PKI annotations passed. And we're saying, okay, well, there are five there, and two passed. So shouldn't that be 40%?
What was it, 56%? 57%? The reason it's 57% is because the PKI annotation just happens to be weighted more in the static policy that I'm using. Because there's more weight given to the PKI annotation, that's why we have a higher level of confidence.

So let's look at this one; this is the mutated piece of data. Right. So, on workstation three, we have two here: TPM and PKI, because we do a mutation. Obviously, the data is being sourced on that box, so there's no reason to do a TLS evaluation. Following from that, we have the handoff to workstation four, and you can see here that, yes, the TLS handoff succeeded, the signature was validated, and we signed the data going out. Because of all of that, we now have a confidence score of 71%. Again, remember, the PKI annotations are weighted higher.

So, what you see here is the Alvarium database, which is everything that goes into supporting this one column, and then the application database, which is all of these other fields. In this way, we can provide a dashboard to show how the level of confidence is measured against every piece of data in the database that you're interested in.

So functionally, that's the demo. But what I'd like to do is very quickly show you how the SDK is used. From a Go perspective, in your go.mod you would import the SDK, alvarium-sdk-go. We have a logging provider as well; that's a very slim dependency, and it doesn't import any external packages. Then let's look at the creator. Like I said, I generate a piece of sample data, and then I store it in the database; this is Mongo sitting behind that. And here's where it gets marshaled, right? It's marshaled into bytes, and then we hand it to the SDK and say, okay, create. Now, when the application was bootstrapped, in the configuration there's a series of settings for the SDK, and I can tell it, you know, which annotators to use, what algorithm to use for hashing.
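The weighted scoring described above can be reproduced with a small worked example. The weights here are an assumption that happens to be consistent with the demo, not the actual policy: if the PKI annotations carry a weight of 2 and the other checks a weight of 1, then five annotations with only the two PKI checks passing yield 4/7, or about 57%, rather than the unweighted 2/5 = 40%.

```go
package main

import "fmt"

// annotation is an illustrative stand-in for an Alvarium annotation
// plus its policy weight; these are not the SDK's actual types.
type annotation struct {
	kind      string
	satisfied bool
	weight    float64
}

// confidence returns the weighted percentage of satisfied annotations.
func confidence(anns []annotation) float64 {
	var passed, total float64
	for _, a := range anns {
		total += a.weight
		if a.satisfied {
			passed += a.weight
		}
	}
	if total == 0 {
		return 0
	}
	return 100 * passed / total
}

func main() {
	// Five annotations on the first document; only the PKI checks pass.
	// Assumed policy: PKI weighted 2, everything else weighted 1.
	anns := []annotation{
		{"pki", true, 2},
		{"tpm", false, 1},
		{"tls", false, 1},
		{"pki", true, 2},
		{"tls", false, 1},
	}
	fmt.Printf("%.0f%%\n", confidence(anns)) // 4 of 7 weight units passed
}
```

This prints 57%, matching the dashboard figure in the demo under the assumed weights.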
In a real deployment, you don't have to use keys on the file system, obviously; you can get those from other places, but for the purposes of the demo, that's what I did. And then here's where I configure the stream. This provider is essentially the author console that I was telling you about before, and the tangle setting is the node in the tangle that I'm going to be writing to. For purposes of the demo, I'm not writing to the coordinator; I'm actually writing to the secondary node, and then the two of them sync up.

So then, going back to the main, I read all of that out of configuration; you can see right here I'm going after the annotators node within the SDK config. There's a factory that allows me to get instances of the annotators, which I then pass into a constructor for the SDK. That SDK is fed to what I call the worker, which does the actual work of creating the data. And then down in here... whoops, that's the wrong worker... down in here, that's why I'm able to call create. And then this is where I pass it off to the next hop in the chain.

One other thing that I would like to describe: in our repo for the SDK, we actually do have some documentation around how you can use this. The SDK interface itself is very, very simple. You construct it, and then you call create, mutate, or transit, you know, whatever is happening to the data. One thing that's interesting about mutate, which unfortunately I don't have the ability to show as part of this demo, is that we're working on the concept of lineage for data. Recall that I said you create the data on workstation two, it gets passed to workstation three, and then somehow it's filtered or transformed; we call that a mutation. We have a pointer from the mutated document to the original document. Both of those documents are being written to the application database, and within the Alvarium database we have the ability to track that that mutation happened.
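The lifecycle just described can be summarized as a call sequence across the three hops. The interface below only mirrors the shape named in the talk (create, mutate, transit); the signatures and the logging implementation are assumptions for illustration, not the real alvarium-sdk-go API.

```go
package main

import "fmt"

// Sdk is a sketch of the three-method surface described in the talk.
type Sdk interface {
	Create(data []byte)             // data originates on this node
	Mutate(oldData, newData []byte) // data transformed; lineage links old to new
	Transit(data []byte)            // data received/forwarded unchanged
}

// loggingSdk is a stand-in that just reports what each call would annotate.
type loggingSdk struct{}

func (s loggingSdk) Create(data []byte) {
	fmt.Println("create:", len(data), "bytes annotated")
}
func (s loggingSdk) Mutate(oldD, newD []byte) {
	fmt.Println("mutate: lineage", len(oldD), "->", len(newD), "bytes")
}
func (s loggingSdk) Transit(data []byte) {
	fmt.Println("transit:", len(data), "bytes annotated")
}

func main() {
	var sdk Sdk = loggingSdk{}
	reading := []byte(`{"sensor":"temp-01","value":21.5}`)

	sdk.Create(reading)  // gateway (workstation two)
	sdk.Transit(reading) // hop to workstation three

	transformed := []byte(`{"avg":21.5}`)
	sdk.Mutate(reading, transformed) // transformation on workstation three
	sdk.Transit(transformed)         // hop to workstation four
}
```

Each call corresponds to one of the annotation events shown earlier on the dashboard: create, transit, mutate, transit.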
So eventually, what you could do is write a visualizer that says, for a given document, show me not only the confidence for this target document but also all of its former revisions. Then you could have a history of all the machines and all the different checks that went into creating this entire lineage of data, which I think is pretty cool.

I think that's probably it. Let me wrap up here real fast with just a few pieces of information in case you'd like to know more. Our official GitHub organization is right there: project-alvarium on github.com. We have a few different repos that we're actively working on, including the Go SDK; I work a lot on that. We also have an example application that is a single-process application, but it leverages some of the Go concurrency primitives, goroutines and channels and things like that, to demonstrate each one of the SDK methods in a sequence. So take a look at that; it's pretty interesting. We're doing the same thing for Java, because we have some use cases, some internal customers, who are interested in having this capability with a Java SDK. But we're certainly open to porting the SDK to additional languages, and if you'd like to be a contributor in that regard, we'd certainly like to have that discussion.

On Twitter we have an account, Project Alvarium, where you can find various announcements and things like that. And then the resources that I'm using in this demo I have under my personal account on GitHub. They're public repos, of course, and you can see there the ONES demo 2021 repo has the services that I deployed to workstations two, three, and four, as well as an example stream subscriber, which shows you how to subscribe to the IOTA Tangle and pull a message out. One thing I'll say about the IOTA Tangle, or I should say the stream provider: we use a C bindings library that the IOTA Foundation provided in order to have interop with the Tangle. Everything IOTA does is written in Rust.
And so, in order to have some integration there, we need some C bindings so that we can go back and forth between Go and Rust. And that's it. Thank you very much for watching this talk. I hope you found it informative, and I hope it sparks your interest in the project. We would love to hear from you if you are interested in contributing. Thank you very much.