Hello, I'm Brian Mason, and welcome to this talk on NatsSync for KubeCon 2021. I want to acknowledge my two co-workers who have been instrumental in helping me create this presentation, and of course our major code contributors to the NatsSync project itself, John and Raghu. Thank you both.

Here's a quick agenda of what we'll talk about today. Obviously, we're going to talk about the problem we're trying to solve and the design we came up with to solve it. We're going to do a demo, because everybody loves demos. We're going to talk about where we're going in the future with this project and how you can get involved and help if you're so inclined.

The problem we're trying to solve is basically this: we love NATS, and we love the asynchronous messaging model. But we want to extend that paradigm to sending a message from a public cloud — something running up in a cloud environment — down to a device or some software running on a private cluster or on-prem, in somebody's house, in the enterprise, in a data center, protected by a firewall. NATS doesn't really solve that problem, and that's OK; it doesn't need to. I think it would be overkill for most situations if it did try to solve it. So what we're really in need of is a secure, scalable, multi-tenant NATS cluster exchange. We also want this to scale down for simplified development, so we can keep that simple view of just using NATS and have all the complexity of that secure, scalable, multi-tenant environment hidden from us, or at least taken care of out of our view. In short, we want it all.

We took a quick look around to see if there was a solution already out there, and of course we found some — this is not treading on new territory — but they didn't work with NATS. Other messaging and queueing products had solutions, but we wanted to stay with NATS because we really like NATS. So what we wanted to do is share what we've built and see if you want to join in and help us with it.

Let's jump into the design of the solution we came up with. Start with your basic NATS environment. We like it: some service emits NATS messages to a NATS cluster, and they're received by another service out there. It's a great model. We just want to extend it so this works across clusters and across private networks. So the distributed design is fairly simple. You've got a cloud side — imagine a service running up in AWS or Azure or Google. Then you've got something in somebody's house, or on a device behind a firewall, or in a private cluster, or running in a data center. You want to be able to emit a message here, have it magically cross over to the cluster running in that private area to be received by some other service, and vice versa. Of course, this goes across the internet. It's a nice model. The reality is that once we add in the internet, we're going through firewalls, and we've got — maybe not literally, but figuratively — fire-breathing dragons out there ready to stomp on your data, steal it, and do other things you would not like. So to make this get through the firewalls, the client side really needs to make an outbound connection. We can't go inbound onto the private cluster; no one's going to open up a port for us to do that. So the connections have to be outbound, and we use a hanging poll for that. So this mythical direct line goes away.
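To make the mechanics concrete, here's a minimal sketch of the hanging-poll idea in Go. This is not the actual NatsSync client — the endpoint, subject, and client ID below are made up for illustration — but the point is simply that the only connection is an outbound HTTPS request from the private side, and anything fetched is replayed on the local NATS cluster.

```go
// A minimal sketch of the hanging poll, assuming a hypothetical /messages
// endpoint on the sync server; the real NatsSync client is more involved.
package main

import (
	"io"
	"log"
	"net/http"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Local (on-prem) NATS cluster that the local services listen on.
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// Long client timeout: the server holds ("hangs") the request until it
	// has messages for us or gives up.
	httpClient := &http.Client{Timeout: 2 * time.Minute}
	pollURL := "https://sync.example.com/messages?clientID=client-42" // illustrative only

	for {
		resp, err := httpClient.Get(pollURL) // always outbound, never inbound
		if err != nil {
			time.Sleep(5 * time.Second)
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		if len(body) > 0 {
			// Replay whatever came down onto the local cluster
			// (real subject handling and decryption omitted).
			nc.Publish("southbound.client-42.replay", body)
		}
	}
}
```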
All the traffic travels through the sync client up to the sync server, and the sync client then requests messages to come back down and be replayed on the local cluster. Pretty simple. When we do this, we also need to add in an auth server, because when we register the sync client with the sync server, we really need some sort of authorization. We'll talk in more detail about that in a minute. Of course, all the data is encrypted — not only via the HTTPS mechanism that we obviously use to connect from the sync client to the sync server; the data is also encrypted separately. We'll dive into that deeper as well. And of course, this is all multi-tenant. The server tracks multiple clients, we've got multiple clients registered, and data cannot cross between clients, through basic multi-tenant security. However, we want to keep the illusion for development that you're just a client talking to another client on a single NATS cluster. This is the unicorns-and-rainbows vision. We like it, and we want to keep it for development.

Here, let's dive down a little bit into the weeds. Obviously, to get the messages going to the right private cluster, we need some sort of addressing scheme, and NATS makes this really easy because of the selectable subjects that you have. We follow a simple naming pattern for our subjects that allows the clients to come in and get the messages they want and to send messages back up to the server. The pattern we're using right now — we want to change this in the future — is: first, the direction the message is going, either northbound or southbound, southbound meaning down to the client and northbound meaning back up to the server. Then the client ID, which for southbound is the ID of the target client; for northbound there's a well-known ID for going back up, but we wanted to keep the client ID position in the pattern. And then any data specific to the service — this varies to whatever you want it to be. It's what you would normally have for a NATS subject when you're not trying to add in cross-cluster synchronization. The conversation pattern is that the client attaches up to the server and says, hey, here's my client ID, do you have any messages for me? And by the way, here's some messages for you.
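Here's roughly what that subject pattern looks like from code. The prefixes and client ID below are placeholders — the exact tokens the project uses live in the repo — but the idea is that the sync components can select on the prefix and client ID while the application keeps its normal NATS subject at the end.

```go
// Sketch of the <direction>.<clientID>.<service subject> pattern; names are illustrative.
package main

import (
	"fmt"
	"time"

	"github.com/nats-io/nats.go"
)

func southbound(clientID, appSubject string) string {
	return fmt.Sprintf("southbound.%s.%s", clientID, appSubject)
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Close()

	// The sync server can select everything headed for one particular client...
	nc.Subscribe("southbound.client-42.>", func(m *nats.Msg) {
		fmt.Println("to sync for client-42:", m.Subject)
	})

	// ...while the application publishes almost exactly as it would on a single
	// cluster, just with the routing prefix in front of its normal subject.
	nc.Publish(southbound("client-42", "thermostat.setpoint"), []byte("70"))
	nc.Flush()
	time.Sleep(100 * time.Millisecond) // give the handler a moment in this toy example
}
```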
Let's dig in a little on this client ID — where does it come from? The client ID is a random, unique ID. It's generated by the server and given to the client. Before any client can have a conversation and exchange messages with the server, it needs to register with the server. The way it does this is that there's a small API on the client that users can call when the client is installed on the private cluster, where you say: register yourself with the server I told you about when I deployed you. So when the client is deployed, you deploy it and pair it up with the server — it's just a URL, and I'll show it to you when we go to the demo — and you go to the client's API and say register. When that happens, the client will generate a public/private key pair. It's going to store the private key, and it's going to take the public key, whatever metadata you've told it to associate with the client — which is completely user-defined — and whatever auth data you've sent it to authorize its access to the server; that's all part of the API call, and I'll show you that in the demo. Then that's sent up to the server. The server, of course, double-checks the auth data, and if it's good, it will authorize the request. It will save the public key, the metadata, and the generated client ID into the storage service, and it will return the client ID to the client, to be used for the rest of the conversation between that client and server. That's where the client ID comes from.

Let's dig into that authentication we talked about. The system has a pluggable authentication model. The auth service is actually user-defined. We have a sample one that we use for testing and demoing, and it just uses a static, user-defined password. We might expand this in the future; that's one of the things we'd like to add — variations on this auth service. Basically, the server takes the auth data sent to it by the client, wraps it in a NATS message, and puts it out on the NATS cluster. Hopefully somebody's listening for that auth request — in the demo you'll see our auth server running out there — and it will answer the auth request with either a yea or a nay, sent back via a NATS message. The server and the auth service are completely decoupled, like you would do with any other messaging system, and NATS allows this auth service to be swapped out. Users can make their own; we have a sample, like I said, and hopefully we'll make some more in the future.

So after we get registered, we can start exchanging messages. A message is put on NATS, and if it's got the right pattern — that is, it's got the northbound or southbound direction and a client ID — we'll select it and sync it across the network. When we do that, we actually encrypt the data. Now, I can hear most people saying: don't you just use HTTPS to exchange the data between the client and the server? Yes, sure, we have an HTTPS connection, but we didn't consider that good enough. For one thing, we wanted the transport protocol to be pluggable and not depend on HTTPS itself. We wanted the messages individually encrypted and bidirectionally authenticated using cryptography, so we wanted to handle the encryption ourselves. And again, we really didn't invent any new encryption schemes; we're using stuff that, if you've read any book by Bruce Schneier, you will recognize. It's nothing we've invented — it's best practices. We also wanted to make sure the data was encrypted at rest. Right now we actually don't have any specific spot where data sits in persistent storage. We had prototypes where that was the case; we don't now, but if we do put that in — or if NATS starts doing persistent stores, which I think they're working on — the data would still be encrypted at rest there.

So we have a basic public key infrastructure. The server generates a public/private key pair when it starts. One thing I forgot to mention during the registration process: besides the client ID generated by the server, the server's public key is also returned to the client at registration. If you recall, I also said that the client's public key is passed up to the server. So both the client and the server have each other's public keys after registration. We take the message data — the data that was put onto NATS and taken off — and we generate an AES key randomly for each message, and we encrypt the message with that key. Then, of course, we take the generated AES key and we encrypt it with the client's public key. We also take a hash of the plain data and we sign that hash with the server's private key. This pattern probably looks pretty familiar; this is your basic bidirectional authentication.
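For those who like to see it in code, here's a rough sketch of that per-message envelope using Go's standard crypto packages. I'm using AES-GCM here for brevity; the real packet carries its own IV/salt layout and the version header described next, so treat this as the shape of the idea, not the project's exact wire format.

```go
// Rough sketch of the per-message envelope: fresh AES key per message, key
// wrapped with the client's public key, plaintext hash signed with the server's
// private key. Illustrative only.
package envelope

import (
	"crypto"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
)

type Envelope struct {
	WrappedKey []byte // one-time AES key, encrypted with the client's public key
	SignedHash []byte // SHA-256 of the plaintext, signed with the server's private key
	Ciphertext []byte // the message itself, encrypted with the one-time AES key
}

// Seal is what the server conceptually does before handing a message to a client.
func Seal(plain []byte, clientPub *rsa.PublicKey, serverPriv *rsa.PrivateKey) (*Envelope, error) {
	key := make([]byte, 32) // fresh 256-bit AES key per message, thrown away afterward
	if _, err := rand.Read(key); err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	ct := gcm.Seal(nonce, nonce, plain, nil)

	wrapped, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, clientPub, key, nil)
	if err != nil {
		return nil, err
	}
	h := sha256.Sum256(plain)
	sig, err := rsa.SignPKCS1v15(rand.Reader, serverPriv, crypto.SHA256, h[:])
	if err != nil {
		return nil, err
	}
	return &Envelope{WrappedKey: wrapped, SignedHash: sig, Ciphertext: ct}, nil
}
```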
We pack all those pieces into a packet: a version header, the encrypted AES key, the IV and salt and some other details which I won't dive into here, the encrypted hash of the message, and of course the encrypted message itself. All of this is sent over the wire to the client. What you can see is that once this message is encrypted, not even the server can decrypt it at that point. We don't keep the AES key around — it's thrown away. The server started with the plaintext message, but it can't decrypt the encrypted version, because the key was encrypted with the client's public key. The only party who can decrypt this message is the client itself. So that's how we get security: if somebody intercepts this message, we don't care — it can't be used by anybody else. And the client knows it's from the server, because it's got the hash of the message signed by the server's private key. The client can decrypt that hash, see that it matches the hash of the plaintext data, and know that this data is from the server. So we've got good bidirectional authentication going on here. For people interested in the numbers, we're using RSA 2048 and 256-bit AES keys. And again, this can of course be changed out and modernized over time: because of the version header, it can always be backwards compatible, and we can add in updated schemes — maybe elliptic curve key pairs instead of RSA — so this can go forward in the future for more security.

One thing we had to consider is that even with the HTTPS in place, once the client is registered, we really don't pass the auth tokens anymore. We didn't want to keep those tokens around; we wanted to get rid of them. But we still need a way to verify that the entity requesting messages for client ID X is indeed that client. The way we do this is — remember, we've exchanged public keys at this point — we do a cryptographic challenge. Every time the client comes up and asks for messages, there's a little piece of signed data that goes up with that request, which the server can then verify with the public key for that client, and know that this is indeed the right client asking for this client ID. So we have a check there to make sure we don't have any spurious clients coming in and asking for data that doesn't belong to them — or, worse, trying to post a piece of data that doesn't belong to them, which would be a message that could be acted on by some subsystem; that would be bad. So we have this cryptographic challenge to make sure that doesn't happen.
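Here's an illustrative version of that challenge in Go, again with made-up details: the client signs its ID plus a timestamp with its private key, and the server verifies the signature against the public key it stored at registration. The actual challenge contents in NatsSync may differ.

```go
// Illustrative per-request challenge; not the project's exact format.
package challenge

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"strconv"
	"time"
)

// Client side: sign the client ID plus a timestamp so the proof can't be replayed forever.
func Make(clientID string, priv *rsa.PrivateKey) (payload, sig []byte, err error) {
	payload = []byte(clientID + "|" + strconv.FormatInt(time.Now().Unix(), 10))
	h := sha256.Sum256(payload)
	sig, err = rsa.SignPKCS1v15(rand.Reader, priv, crypto.SHA256, h[:])
	return payload, sig, err
}

// Server side: look up the stored public key for that client ID and verify the signature.
func Verify(payload, sig []byte, clientPub *rsa.PublicKey) bool {
	h := sha256.Sum256(payload)
	return rsa.VerifyPKCS1v15(clientPub, crypto.SHA256, h[:], sig) == nil
}
```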
One other thing we have to do is store data. We actually aren't storing messages anywhere other than in NATS memory at this point, but we do of course have client registrations, with the client's public key, the client ID, and the metadata associated with that registration. All of this needs to be kept somewhere. So we have in the code an interface that represents a data store, and we have two implementations of that interface right now: a file-based store, which is handy for debugging and testing — I wouldn't use it in production necessarily — and a Mongo database store. That's probably more of what we'd use in production, where you can have a highly available Mongo setup. The file store is great, though, because, for example, for the demo I'm just running the file-system-based key store and it's storing everything in the container. So if I want to tear the whole system down and bring it back up, I don't have to worry about any persistent storage. It's great for doing demos, it's great for doing testing; Mongo is for production. Again, this is a pluggable model. It's not a particularly difficult interface, and we could add more. We had Redis at one time, but we got tired of maintaining it and we weren't using it, so we dropped it. And again, we don't store messages anywhere other than NATS right now.
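To give a feel for how small that pluggable-store contract is, here's a hypothetical Go interface along the lines of what was just described. The names are mine, not the repo's; the point is that the file-based and Mongo stores both sit behind one small interface, so the demo can run the throwaway file store in the container while production swaps in Mongo without touching the rest of the code.

```go
// Hypothetical shape of the pluggable registration store; the real interface in
// the repo has its own names, but the idea is the same.
package store

type ClientRecord struct {
	ClientID  string            // generated by the server at registration
	PublicKey []byte            // PEM-encoded client public key
	Metadata  map[string]string // user-defined metadata sent at registration
}

// RegistrationStore is implemented by the file-based store (handy for demos and
// tests; data lives in the container) and by the Mongo store (for production).
type RegistrationStore interface {
	Save(rec ClientRecord) error
	Get(clientID string) (ClientRecord, error)
	Delete(clientID string) error
}
```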
So with this system, how does it scale? I will be honest: we haven't done a ton of scale testing on the system. We've done some, and it performed well in our scale tests. In the test we did, we had a three-node AWS cluster, and we were running a replica set where NATS and the sync server were running on all three nodes. It scaled well. Hopefully by the time we do the Q&A session I'll be able to find the numbers from the scale test, or have better numbers to give you; but as I have to record this presentation over a month before KubeCon, I don't have those numbers handy, and I apologize for not having them in the recording. Hopefully I'll have them for the Q&A. We've tested on both x64 and ARM instances up at AWS. Interestingly, when we did the ARM testing — we were playing around with ARM for cost savings, and indeed ARM instances can be cheaper, and they performed pretty well — we saw one issue where ARM started to bog down, because we do a lot of encryption, as I mentioned earlier. The ARM instances at the time we did this testing, which was back in March of 2021, did not have encryption co-processors, so the x64 instances that did have the co-processors performed quite a bit better. The ARM instances still performed pretty well, just not as well. But in all honesty, with our simulated load — I think we were simulating about a thousand different clients — we really didn't see much load showing up on the x64 instances where we had a three-node replica set. We would see the load bounce a little bit, but nothing of significance popping up. So again, we're going to get some more scale testing in on this, but we think it's going to scale fairly decently for us. This was all fronted by your basic AWS load balancer, and we were simulating multiple clients, of course. Like I mentioned, we've tested on x64 and ARM. Linux containers are how we ship it, and that's really the only way we've tested it so far — other than popping it onto a Raspberry Pi and making sure it works there, because of course we have to make sure it works on a Raspberry Pi.

For samples, for our testing, and for our own fun, we have of course made some apps. The best use, in my personal opinion, would be coarse-grained messaging apps, where you'd send messages from the server down to the client saying, do this action for me, and it would be carried out by the application running on the client side. But for testing, we've created two apps. One is an Echo app, which we generally deploy with the client system, because it allows us, from the server, via a little command-line app, to post a NATS message so we can check connectivity all the way down to a particular client. We can echo a message out to a client by giving it the client ID, and we can trace the activity down to that client. And for the Echo message, which is a known message defined in the code, we've actually added logic into both the sync server and the sync client to send back a response, letting the Echo app know that it's reached the sync server and then the sync client. And then ultimately it reaches what we call the Echo proxylet, which is a service we run on the client — we call all the services that run on the client proxylets. So we have an Echo proxylet, and we'll talk in a minute about the HTTP proxylet. Apologies for that name. So with the Echo message we can check: did it reach the server? Did it reach the client? Did it finally reach the Echo proxylet? That makes sure the connection is going all the way through the entire system — the server-side NATS, the sync server, the sync client, and the NATS on the client side — to a receiving application. And again, we'll do a demo of that. The HTTPS proxy is a fun little tool we created that basically takes HTTPS traffic: it serves as an HTTP/HTTPS proxy, takes the requests, and bundles them as NATS messages to be replayed onto the client network. In practice — and again, we'll show you in the demo — you can simply use this as a proxy and run curl commands or kubectl commands targeting a system on a remote network, and they just pass on through. This is like a security nightmare for your IT security people. If you play with the system, which I hope you do, you may not want to demo this particular item in front of your IT security people. I did, and it didn't go well.
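Before the demo, here's a bare-bones sketch of what an Echo-style proxylet amounts to on the client network: subscribe on the local NATS cluster and answer whatever arrives. The subject is a placeholder; the real Echo proxylet, and the response plumbing back through the sync pair, are more involved.

```go
// Bare-bones echo responder; subject names are made up for illustration.
package main

import (
	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Close()

	// Reply to echo requests that the sync client replays onto the local cluster;
	// the response goes back on the request's reply subject.
	nc.Subscribe("southbound.*.echo", func(m *nats.Msg) {
		m.Respond(append([]byte("echo: "), m.Data...))
	})

	select {} // keep the proxylet running
}
```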
So let's get into the demo — demos are fun. What we're demoing is basically the two messaging samples we built: the Echo client talking to the Echo proxylet, and the proxy server talking through the NatsSync system to the HTTP proxylet on the client network. Again, this is the simplified view — what I call our unicorns-and-rainbows view of the world — where there's no sync machinery visible; it's just one client talking to another. It's simple. But of course this is the reality we're working with: as we send an Echo message, it's going to come to the NATS cluster, be picked up by the sync server, be passed over to the sync client, land on another cluster on the on-prem side, and finally reach the Echo proxylet. And as you recall, the sync server and the sync client recognize an Echo message and will send back a response, so you can see how far it's gotten along the line. When it finally gets over here, we're happy. We'll also demo using the HTTP proxy server to run kubectl commands against a cluster sitting behind a firewall, hidden off on — in this case — my Docker Desktop.

So here's the layout. We have a multi-node cluster up on AWS running both the NATS software and our sync server. I believe for the demo I've got the replicas set to one, so even though it's multi-node, only one instance of NATS and only one instance of the sync server are running; I just didn't need to scale it up. It's fronted with a load balancer — just your basic AWS load balancer — that takes traffic to the sync server. And we have, running on my desktop, the sync client, again with a single-node NATS, and we're running both the HTTP proxylet and of course the Echo proxylet as well. Up here, the Echo client is a command-line client that posts to NATS; that gets picked up by the sync server and travels all the way down, and we'll do that demo.

So let's go to what I've got running here. I've got two command lines. Let's first look down here: this is my server — this is AWS, running Ubuntu — and I've got my Docker Desktop up here. You can see I did a quick logs on the sync client. It's just kind of sitting here, hanging out. It's running, but it doesn't have a client ID because it's not registered, so it's just sitting and waiting for one. Down here we've got the server running, and I'm going to show you some aliases I've set up, so when I type commands you can see what they are. I have s aliased to kubectl in the server namespace, and k aliased to kubectl; up here on the client I've got the same k alias to kubectl, because I can't type, and c aliased to kubectl -n client. So we can do c get pods: we've got the Echo proxylet, the HTTP proxylet, NATS, and the sync client running here. And down here I can do s get pods: we've got, again, NATS running up in the server area, the HTTP proxy server, the actual sync server, and of course we have an auth server running out here. This auth server is our simple auth server, which runs with a static password that can be set when you fire it off — it's just an environment variable. Again, not a production item; this is for testing and demos.

So we're going to go to localhost. I've got the sync client exposed on node port 3281 so I can access it, and mostly I want to access this registration endpoint, because we need to register the client with the server. When I fired off the client, I gave it the server URL to talk to — it's in a config map. So let's kick this off. Let's give it some metadata: let's give it a user — none of this is mandatory, but it makes sense to describe your client — and let's give it a use; we'll just call the use "demo". We don't need that bit anymore, get that out of there, add a comma, and we've got to put our auth token in there. The auth token for my specific demo running up there is the answer to life, the universe and everything. So if I typed my JSON right — let's see, kick it off — and indeed, I must have, because it registered. So at this point we've downloaded the server's public key, we've uploaded the client's public key, we've authenticated this request, and we have a client ID, or location ID — same thing; we renamed it to location ID and I still call it a client ID — to do our communications with. I need that ID because I need to use it on the Echo client.
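For reference, the registration call in the demo boils down to something like the following. The endpoint path and JSON field names here are illustrative — check the repo's README for the real ones — but this is the shape of it: metadata plus an auth token in, a location/client ID back.

```go
// Illustrative registration call against the sync client's local API;
// the path and field names are assumptions, not the project's documented schema.
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	body := []byte(`{
		"metaData": {"user": "demo-user", "use": "demo"},
		"authToken": "the answer to life, the universe and everything"
	}`)
	resp, err := http.Post("http://localhost:3281/register", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	// On success the client stores the server's public key and returns the
	// generated client ID (a.k.a. location ID) used for all later exchanges.
	fmt.Println("status:", resp.Status)
}
```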
So let's first test connectivity from the server down to the client. Let's look at this little run-echo shell script. It basically runs the Echo client, and it's got the URL to our NATS. Again, we only talk to NATS — all of the software we write only knows about NATS. The fact that there's NatsSync in the middle is irrelevant to it. In fact, we can test all of this without the entire NatsSync infrastructure; we can just do direct NATS-to-NATS communication. We don't need all this stuff to do our testing and development. But here on the server, we've got NATS exposed on a local node port, and we need the client ID. So let's run this guy, and we're going to send an Echo message down to this client. Indeed, it went off, and you can see we can trace it back: it hit the NatsSync server, it hit the sync client, it hit the Echo proxylet, and it came back up all the way and returned some basic spec version for us. Total time, 124 milliseconds. That's not too bad for an echo. If it were a ping echo, that would be pretty awful, but for passing through an entire messaging system — to NATS, the sync server, the sync client, and coming back around across the internet to my desktop — I can live with 124 milliseconds. Full disclaimer: the AWS region I'm using is us-west-2, which is in The Dalles, Oregon, and I am running in Portland, Oregon, so the internet connection doesn't have far to go.

And let's do it twice, just to make sure it worked. That's cool. So let's do a demo of something more interesting: let's run some kubectl commands against my desktop from the AWS instance up on the internet. This is the thing that will horrify your IT security folks. I've got a kubeconfig here — this is the kubeconfig from my Docker Desktop, and if you've ever looked at a Docker Desktop kubeconfig, they're all pretty standard. Please don't copy my certificate and steal it from me, although I'll re-initialize it after the demo. So I've got to put in a proxy URL. The only change I'm making to this kubeconfig — this is one I just pulled from my desktop and uploaded to the instance — is adding a proxy-url line. This is standard Kubernetes; there's nothing abnormal I've done here. We're putting in a proxy URL which carries the client ID as the user. This is just how we chose to do our sample proxy server: we had to have some way to identify the client ID, and we didn't want to invent any special headers, so we put it in as the user. It comes in on the auth header, so we can use it. The proxy is running on localhost up at AWS, and I've got it exposed on a node port there. So that sets kubectl up. We should be able to just do a kubectl command using that kubeconfig — we'll do a get nodes — and that should easily identify it as the Docker Desktop if we hit the right spot. Oh, there we go. I should probably put some logs up here so you can see the activity. So let's do a log on the HTTP proxylet here — yeah, let's follow it. There we go; a little space in there so we can see it. And let's do that get nodes again. You can see all the packets have come through and been replayed on the client network. Again, this is what will horrify your network security folks when they realize you just plopped in a piece of software that violates all their network policies and lets you access all your local resources from some random AWS instance. Let's do one more command, just to have a little fun: let's set a namespace of client and then do a get pods, and we see all our client pods running on that network. There we go. And just for comparison, it's the same list. So that is a demo of using a NATS system, synchronizing data to another NATS system on a private network behind a firewall, and being able to have control of that system — which is really handy for the Internet of Things world.
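If you want to see that trick outside of kubectl, here's what it looks like from a plain Go client. The host, port, client ID, and target URL are placeholders; the point is that the client ID rides in the proxy URL's user field — the same thing the proxy-url line in the kubeconfig does — and everything else is ordinary HTTP proxying.

```go
// Sketch of driving the HTTP proxy path from plain Go; all names and ports are placeholders.
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

func main() {
	// The client (location) ID travels as the "user" part of the proxy URL.
	proxy := &url.URL{
		Scheme: "http",
		User:   url.User("client-42"),
		Host:   "sync-proxy.example.com:30081", // load balancer / node port for the proxy server
	}
	client := &http.Client{
		Transport: &http.Transport{Proxy: http.ProxyURL(proxy)},
	}

	// The request is wrapped as a NATS message, synced down to the private
	// cluster, replayed there by the HTTP proxylet, and the response comes
	// back the same way.
	resp, err := client.Get("http://private-service.local/healthz") // placeholder target
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```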
Let's go back to the slides and talk about the future and where we want to take this project. I checked the box when submitting this talk saying we'd love to have this become a CNCF incubation project. Short of that, we've got some other work we need to do. Remember how I talked about the messages having either a northbound or southbound prefix on them? That made a lot of sense when we started working on this, but as our thinking has matured, we've realized that's sort of silly. Let's just add a NatsSync prefix that allows the system to identify the message; then we can use the client ID to decide whether it needs to stay local or gets passed up to a peer, the peer being either a client or a server. I think that will enhance the system. This will actually allow us to send messages from one client to another client, because if we pass the messages up to the sync server, the sync server will see, hey, that's not really for me, put it onto its NATS system, and then it will get picked up and sent back down to another client. That allows not only northbound and southbound communication, but peer communication. I think it'll be a nice add-on. It's not something we needed when we were doing the project, but I think it will make a nice addition going forward.

We also save a lot of metadata with the client ID, but we haven't really provided particularly good APIs for searching that metadata to find a client ID. We need to add that API and figure out a good way to do that metadata search. One of the goals we talked about for the encryption system was having the data be encrypted at rest; that's one of the big reasons why we did our own encryption. However, right now we don't have any data at rest, and even if NATS starts storing data — which I hope it does start doing some persistent storage — the message isn't encrypted until it hits the sync server or the sync client. So we'd like to push the encryption back a layer and have helper libraries that let the NATS clients post the message onto NATS already encrypted. It wouldn't be hard to do; we just need to build the libraries and make the system smart enough not to re-encrypt the message. It wouldn't be the end of the world if it did get re-encrypted — it's just a lot of wasted processing power. Another item we'd like to tackle going forward: right now we're doing straight HTTPS REST calls. Nothing wrong with them, but they're not the most performant thing in the world, so we'd like to switch to gRPC for the client-server message exchange. We also want to add some APIs to help mobile clients. We have a couple in there now to help a mobile client post, get, and exchange messages for phone-based control, but those need to be enhanced. Those are all items we've thought of going forward; hopefully other people come up with more clever ideas and put some issues in the issue log for us. Another thing we'd like to do in the future: we've looked around, and there are some industry standards for IoT. We've mostly ignored them, other than acknowledging their existence. It would be good if we started working with these standards. Again, our goal was to write some software for NetApp, which NetApp then allowed us to open source, so the industry IoT standards weren't important for us at the time. But I think for an open source project, to make this mature and give it some staying power, we need to start embracing the industry standards.

Which brings us to the next topic: getting involved. Again, everyone who's worked on this project so far works for NetApp. NetApp recognized that a messaging system is not core to our business — we don't want to be in the business of making messaging systems; we do data management — so NetApp has allowed us to open source this project. Thank you, NetApp. We license it under Apache 2. When you open source a project at NetApp, you basically have two choices. You can say this is going to be an official NetApp open source project, such as NetApp Trident, which is obviously key to Kubernetes storage and key to NetApp; that is a NetApp open source project, NetApp supports it, and NetApp spends a lot of money supporting it. This particular project we did not open source as an official NetApp project. It's written and supported by NetApp engineers, but NetApp does not have its name behind it; it's just an open source project that we've put out into the community. It's a minor distinction, but it's important that I point out that this is not a NetApp-supported project.
You cannot call NetApp technical support and get support on this project like you can with Trident, for example. Although, again, everyone who's worked on this so far has been a NetApp employee, and we're happy to answer questions and give people support — it's just that you can't call NetApp directly. You have to email us, the developers, whom you can find on GitHub. This is hosted on GitHub; here's the URL for it. The images are hosted up on Docker Hub, and again, there's the URL for those. As we add to this, we'll get the READMEs into better shape so you can find the code and where everything is hosted. The proxy server is hosted in a different project up on GitHub. It's just something I personally threw together for fun; it's not part of NatsSync specifically, although it's open sourced as well and anybody can grab it — it's just not directly tied to the NatsSync project. But it's still under theotw — the OTW, "the one true way," our open source hosting area. That's not a NetApp thing; it's my personal space where I put my open source projects.

So, things we could use help on: testing — we could really use some help with testing. We'd love people to try it out and propose new topics. You can add issues on GitHub, you can fork the project, and you can email us and get added to the project. We're pretty much open to however people want to work. Obviously nobody's going to have direct commit access to the trunk or the main branch; everything will still be reviewed, it's still going to go through security scanning, and it's still going to go through a peer review before anything lands in the trunk. But everybody is welcome to come help out.

I appreciate you taking your time to come and listen. We're going to have a live Q&A session here after the talk. I hope you find this interesting. If you love it, check it out, try using it, and let us know. If you think this is a waste of time and we totally missed some other really cool piece of software we should have been using, let us know about that too. And if you use it and find bugs, please add issues — I'm pretty sure the issue tracking is open to the public; if not, shoot me an email (my email is up on GitHub) and we'll make sure it gets opened. Again, I appreciate you listening, and let's go into the Q&A session now.