So I spend my time looking at computing systems for potential security and operational issues, and ways to improve uptime. Typically it's after companies have had issues and they're trying to figure out how to best re-architect their systems. And I've been finding service mesh to be an efficient way to go about doing that. I'm also finding a lot of people don't know about service mesh, so that's why I put together this presentation, really to help solve that issue. All of this content is up on GitHub, so if you're interested in the walkthrough I'm doing today, you can pull it all up on GitHub, download it, and run it in your own infrastructure, or you can run it on Packet's infrastructure (they're a bare metal service provider), so you can run this whole workshop on their infrastructure. And I put the whole presentation up on YouTube as well, so if there's something you didn't catch today you can go and watch it there. Okay, so a quick agenda of what we're going to be doing today: I'm going to talk a little bit more about why I put together this presentation, and then we're going to go through a sort of quasi-live walkthrough of a deployment of a service mesh, a deployment of a microservice, and then the steps involved in actually securing that microservice and what's involved with that. So we're going to see some physical infrastructure being turned on, we're going to take a look at that physical infrastructure, how it's configured, the security issues with it, how it can be exploited, and then the service mesh: how you turn it on, how you configure it, how you cluster it all together, and then what you have to do to change your service to use that service mesh. So you'll see all of that in action.
And then towards the end we're going to take a look at some of the resilience that the service mesh can provide; we're actually going to be killing some of the physical infrastructure, some of the nodes there. Okay, so let's go back to the agenda, top item: what made me decide to put together this particular presentation? When I go out and take a look at computer systems, oftentimes I see old legacy systems where companies might not have the source code anymore; they just have binaries or object files, so they can't go and spend the time to re-architect, or they don't have the staff on board to take a historical legacy application and add authentication, authorization, and encryption into the application. So they're looking for ways to add on that capability without having to go through a whole new development cycle, which would be very expensive. Talking to security teams, I often find that they aren't aware of service mesh and what it can provide, so it's a bit of an education process. A lot of times people think of service mesh and they immediately think of containers: I need a container infrastructure, I need to virtualize my environment. And that's really not the case; there is technology out there, so in this example we're going to be taking a look at HashiCorp's service mesh implementation, Consul, and how it does not require you to go and containerize your application. So with that, like I said, I only had two slides, so we will switch over and walk through the real implementation. Okay, so what you're going to see here is the exact code base that I have up on GitHub, so you can see all the scripts and Terraform and everything else that's used to deploy this physical infrastructure and the service mesh on top of it. I use the term zero trust; for those of you that aren't familiar with zero trust, the idea is that your
application, whatever workload you're running, should not have any implicit trust in the underlying infrastructure. The physical infrastructure you're using, maybe it's in a remote data center; you don't own it, and you can't really trust the service provider that owns that hardware and manages it for you. And the network connecting those systems, maybe it's running across the internet, or you don't know where that cable goes, so you can't implicitly trust it. So the notion of zero trust is that you're not relying on, you don't have any trust in, that underlying infrastructure. Sure, you have trust that it's going to keep on running, but you really don't want to have to trust that your data is going to be secure going across the wire. The microservice we're going to use here is something I call the Fortune Cookie Service. People that are familiar with old school Linux commands will know the fortune command: you can type it on the command line and it will just give you back a random quote. In this example, I've taken that and turned it into a network service, so you connect to a TCP port and it's just going to spit back a random fortune. That's going to be our sample microservice that we're going to use and secure. When I go out and I'm talking to my clients about the type of services they need to secure, it could be something similar to this, or it could be a much larger manufacturing environment where they're running robots, they're running programmable logic controllers, PLCs, that are interconnected and don't have any authentication or authorization on top of that, so they're looking to go through and secure that.
So first things first, we're going to go and deploy some physical infrastructure. We talked real quick about the Fortune microservice; up there is actually the command that we're running. We're just running it on port 8181, it's a TCP process, and all it does is run the fortune command, so it's a real simple Fortune Cookie Service. To connect to it, we're just using netcat; we're connecting to FCS, that's the Fortune Cookie Service, that's what FCS stands for. We're going to start up a bunch of instances of it, so that one happens to be instance 00; we connect to port 8181 and then you see we've got a fortune back in response. A typical initial deployment without a service mesh looks like this: we run a Fortune Cookie Service and then we use a Fortune Cookie Consumer, so FCC is the consumer. The Fortune Cookie Service is once again listening on port 8181, the Fortune Cookie Consumer connects on port 8181, the server doesn't know anything about the client that's connecting, it's just accepting connections on 8181 and sending a response back, and the response that's being sent back is unencrypted. So we've got unencrypted traffic and we have an unauthenticated service running; those are the two things that we want to correct. The physical infrastructure we're running on, the t1.small, those are Packet bare metal server types, so we'll talk about them in a little bit. Okay, so let's take a look at how that is vulnerable. Here we have a little sample of a tcpdump where we've gone and taken a look at the traffic flowing back and forth between the Fortune Cookie Service and the Fortune Cookie Consumer. By tapping into this unencrypted traffic, we can actually see the data going across the wire, so that goes back to the issue of this traffic being unencrypted.
If you can't trust the underlying network, right, going back to zero trust, we want to make sure that anyone who taps in and views our traffic going between our services can't see it, so obviously we need to go and correct that somehow. Okay, so let's talk real quick about the infrastructure we're going to deploy. We're going to deploy consul-00, 01, and 02, so we're going to have three Consul servers; Consul is HashiCorp's open source service mesh implementation. We're just deploying on the smallest physical infrastructure I can get from Packet, that t1.small, an x86 machine, and then we're going to initially deploy one consumer and one server, FCS-00 and FCC-00, and one of them is just going to have port 8181 open. We're using Terraform to deploy it, so once again, this is all up on GitHub if you really want to dive into the details; it just details all of the information, like which data center to deploy into. We're initially going to deploy this into a single data center, so we're just going to use the New York data center. We'll talk a little bit later about how you could deploy this across multiple data centers and how the service mesh supports that. So we will go through, and for the next step, we're going to kick this off. Okay, so I just did a git pull of everything from GitHub. If we take a look at the variables here, we can see the number of Consul servers to run, the number of Fortune Cookie consumers and Fortune Cookie servers to run, and the type of hardware. We're running three Consul servers to start off with. The reason why is because it runs in a clustered environment, and you want to have an odd number of servers.
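As a rough idea of what those Terraform variables might look like, here is a minimal sketch; the variable names are illustrative and the exact names in the GitHub repository may differ:

```hcl
# Hypothetical variables.tf fragment; names are illustrative.
variable "consul_server_count" {
  description = "Number of Consul servers (keep this odd for quorum)"
  default     = 3
}

variable "fcs_count" {
  description = "Number of Fortune Cookie Service instances"
  default     = 1
}

variable "fcc_count" {
  description = "Number of Fortune Cookie Consumer instances"
  default     = 1
}

variable "plan" {
  description = "Packet bare metal server type"
  default     = "t1.small.x86"
}

variable "facility" {
  description = "Packet data center to deploy into"
  default     = "ewr1" # the New York metro facility
}
```

Scaling up later is then just a matter of bumping a count and rerunning Terraform, which is exactly what happens in the resilience section at the end of the walkthrough.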
And so now we've gone and kicked this off, so at this point Terraform is contacting the API, talking to our bare metal cloud provider and requesting five servers in total: three of them are Consul servers, and then one of them's a Fortune Cookie Consumer and one of them's a Fortune Cookie Server. So it's gone and made the request; it takes a couple of minutes for the machines to boot up, so we might speed through this, but I want to talk a little bit about what's happening here. Terraform is smart enough to know that you want five machines, so it's doing all five machines in parallel. As the machines get turned on, we pull back some data and then we also go and install the Consul software, so that'll be our service mesh implementation, and we also turn on the Fortune Cookie Service. At the conclusion of this, we're going to have five servers in total; three Consul servers are going to be running, but they're not going to be clustered together, they're not going to know about each other, so at that point we'll have to go through and configure that. You can automate it in Terraform to have it go through and cluster, but I have that turned off at this point, just so we can manually go through and see the steps of what's involved in clustering the environment together. So at this point, the machines are up and we're copying some software onto them; there you see it's copying zip files for the Consul server. We're also installing Vault; Vault is a key management system, and the next workshop in this securing-bare-metal series revolves around key management. Okay, so like I said, the end result is we'll have the five machines up and running; it's still copying software onto them. So Vault is installed on each machine? Yeah, in this example the end result is to have an environment that'll use Vault to do key management.
You need the key management components if you want to go through and encrypt the control plane traffic between the Consul servers. In this example here, we aren't going through and encrypting that and doing that level of key authentication, just for the sake of time, but yes, the default installation that I have set up installs both sets of software. Okay, so at this point Terraform has replied back with the IP addresses of the five machines running: three Consul servers, the Fortune Cookie Consumer, and the Fortune Cookie Server. So we'll keep this running. We're just doing some pings, and then we also did a netcat to port 8181 on the Fortune Cookie Server, just to verify that it's running. Now we're going to log into the cloud provider and take a look at the infrastructure that's been deployed, so we can learn a little bit about the network configuration, the IP addresses that have been configured, and a little bit about the hardware environment. Like I said, we've got five machines running, so we'll see the five machines here. They're all the same hardware type, and they're all running in the same data center. Each of them has been set up with a number of different IP addresses: we get two IPv4 addresses and then we also get an IPv6 address. We're not using the IPv6 here. Of the two IPv4 addresses, we get a management one, which is on a private network, and then we also get a public, internet-accessible one. We're going to use that private IP address for the control messaging between the Consul servers, for the service mesh data transfers; that private one doesn't leave the data center. And then we'll use the public IP address for external communications. So consul-00, 01, and 02 are all identical; what you saw pulled up there was the out-of-band console to get terminal access to the server. Okay, so that's our physical infrastructure, it's up and running.
Like I said, we've got three of these Consul servers up and running, they're all identical, and now we're going to log into them and join the service mesh together. So there are the three IP addresses that are assigned to it: one of them is private, one of them is the public IPv4, and then there's the IPv6 address. We need that first IP address because we're going to tell the other two Consul servers the IP address of the first one so they can cluster together. I just have a little startup script that starts up the Consul server; it uses the private IP address, we just pull it out there, and then we start up the service mesh binary on this first one. That config directory, in there there are just a number of configuration files. The first machine we're starting up in bootstrap mode; that tells the first instance that there are no other instances running right now, so don't try to look for a leader. So that's the first one, started in that special mode. Okay, so while that one starts up, it did an election and elected itself leader, because we're in bootstrap mode and there's only the single one. Okay, and then in the second screen, we're starting up the second one. That second one is going to panic a little bit, because we're not running in bootstrap mode so it doesn't know about the others. But then what we're going to do is tell it to join onto the first one: we run the consul join command and give it that private IPv4 address of the first one. And then we can see that the logs on the first one updated and said that the second one joined, and we can see in the logs of the second one that it's gone ahead and joined. And then we're going to go and start up the third one.
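The server configuration behind that bootstrap mode is a small JSON file using standard Consul server options; this is a minimal sketch with an illustrative private address and data directory, not the exact file from the repository:

```json
{
  "server": true,
  "bootstrap": true,
  "datacenter": "ewr1",
  "bind_addr": "10.0.0.7",
  "data_dir": "/var/consul"
}
```

The second and third servers would run the same file with `"bootstrap": false`, then be pointed at the first server with `consul join 10.0.0.7` (substituting the real private IP), which is exactly the manual step shown above.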
So consul members is going to list all the machines that make up our cluster at this point, and we can see there's a .7 and a .9, both of them up and running. That DC column is which data center the member is in. Ideally you would run instances in multiple data centers; the service mesh is going to be smart enough to know, if you have copies of your microservices running in different data centers, which ones to connect to, to connect to the ones locally within the data center. And we're going to do something different when we start up the third one: we're going to give it the IP address of the second one. So it's going to do the join, it's going to send a message to that second one and say, hey, I want to join. The second Consul server is going to say, hey, wait, I'm not the leader, and gives it the IP address of the first one, and then it goes and messages with the first one. So our second and third service mesh instances are just in a follower mode right now, and any requests that come into them get forwarded along to the first one. Okay, so at this point I killed that first service mesh instance we were running, because it was running in bootstrap mode, and now I'm going to restart it in regular mode, because now that we have all three instances running, we don't want to run that first one in bootstrap mode. So it's gone through, and one of the interesting things you see there, it says consul: new leader elected: consul-01. When we first started up, 00 was the cluster leader because we started it up in bootstrap mode, but when we killed it and started it back up again, that second instance became the leader, okay? So at this point, we've got our three Consul servers running and we're basically ready; we can go through and start securing our service. Okay, so let's take a look at what we're going to do here.
So once again, we've got the Fortune Cookie Server and the Fortune Cookie Consumer. We're going to start up an agent; the agent is the same binary, but we give it a slightly different configuration file to tell it to run in agent mode, and it's going to go talk to the service mesh. The service mesh keeps the directory of all of the services that are running, and it's going to register our service onto the service mesh; then at a later point, we're going to run a sidecar to go along with it. So let's go ahead and see this in action. We're going to log into the Fortune Cookie Server and start up Consul, but we're going to run it in agent mode. The difference between agent mode and server mode is that the server keeps track of all the services that are running, while agent mode just sends information back up about the services running on this particular instance. There we have a configuration file called fortuneservice.json, and all it does is list out that we're running this Fortune Cookie Service on port 8181; this is how the agent tells the service mesh about the service that we're running. As you can see, on the left we're running with server set to false, which means don't run the Consul server, as opposed to on the right, which is the Consul server, where we're telling the binary to start up in server mode. Okay, so on the left here, we're starting up the Consul agent, and then we're going to join it into the mesh, so we run a consul join, and we're tailing the log file on the first one just so we can see what happens. So at this point, our Fortune Cookie Server has joined the mesh and registered its services.
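That fortuneservice.json is just a standard Consul service definition; a minimal sketch of what it might contain (the service name here is assumed, the repository's may differ):

```json
{
  "service": {
    "name": "fortune",
    "port": 8181
  }
}
```

Dropping a file like this into the agent's config directory is all it takes for the agent to register the service with the mesh when it starts up or reloads; the application itself is untouched.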
So now when we list all the members of the service mesh, we can see the Fortune Cookie Service node is now running alongside; all this does is identify the node, the physical infrastructure, that's now connected. And we're going to do the same thing on the Fortune Cookie Consumer. The Fortune Cookie Consumer wants to use the service mesh because it needs to find out where services are running. With Consul, you can make an API request, just a curl request against the HTTP API, and ask where the services are running, or you can do a DNS query and it will reply back with information about where the services are running. So we're joining the consumer into the service mesh, and at this point we've got all the Consul servers running, and we've got the consumer and the server all joined into the mesh. Okay, we can go through and make some requests of the service mesh and ask about the services that are running, and about the nodes that are running. Here we're just using the CLI; you can make the same request through the API, and you can make the same request through DNS, so there are a bunch of different ways to get the information. We're asking about the services running on the Fortune Cookie Server, so it replies back that there's a service running called fortune, and then we can see that there are no services running on the Fortune Cookie Consumer. So now we've got the servers, we've got the whole environment connected. We now want to actually start encrypting our traffic, to make sure that the traffic can't be read by anyone across the wire. So what we're going to do is make some changes to our Fortune Cookie Service. Right now we have it bound to the public IP address on port 8181; we're going to tear that down, and instead we're only going to bind to the loopback.
So by binding to the loopback, the only processes that are going to be able to make that connection are processes running on that local machine. The Consul agent is going to be running on that same machine as a sidecar process, so it'll be able to connect to the loopback and connect to the service, and it'll now be our front door: any request will have to come in through that Consul sidecar. It handles the authentication and the encryption, makes the request through the loopback on that same port to get the answer, and then sends it back across the secure connection. We run through here and connect back onto the Fortune Cookie Service. We take a look at the services that are running, and we've got that Fortune Cookie Service. We're going to start up a new service, the encrypted Fortune Cookie Service. There you see the unencrypted service; all we're saying is that it runs on port 8181, and the name of it, and that's used to register the service with the service mesh so we can see it in action. There's still value in registering an unencrypted service: the service mesh does health checks, so it makes sure the service is running, but at this point we don't have any authentication or encryption. So we're going to tear down the Fortune Cookie Service and copy in the new JSON file instead. It says that we want requests for this service to go through the sidecar, and the sidecar is that Consul agent that we're running. So we removed the unencrypted Fortune Cookie definition and put in the secure version of it. All I changed is the JSON file that describes it; we're not changing the underlying application, that legacy application, that microservice you're already running. We're just telling the service mesh about it. So now we need to tell it to reload the configuration files.
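The secure version of the definition adds a Connect sidecar stanza to the same service registration; a minimal sketch of what that JSON might look like, with the service name and loopback binding assumed from the walkthrough:

```json
{
  "service": {
    "name": "fortune",
    "port": 8181,
    "address": "127.0.0.1",
    "connect": {
      "sidecar_service": {}
    }
  }
}
```

After swapping in a file like this, `consul reload` picks up the change, and `consul connect proxy -sidecar-for fortune` starts the proxy process itself, which is the fortune-sidecar-proxy you see appear in the service listing next.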
So now if we take a look, we can see there's this fortune-sidecar-proxy service, and that is the encrypted one that external consumers can use, which we can then make connections to. So we need to go through and start these things up. We're starting the sidecar, that fortune-sidecar-proxy, and telling it this is the sidecar instance for the service fortune. And that fortune service right now is only bound to the local loopback address. We can see the fortune sidecar process starting up, and it's listening, and it's externally accessible. Then on the Fortune Cookie Consumer, the client now also needs to go through a sidecar to make a request, so we're starting up the same kind of process, a proxy on the consumer side. The request goes from the consumer to the proxy, that sidecar running on the consumer, which makes a connection across the wire to the sidecar on the server, which authenticates it, decrypts it, and then sends the connection over to the actual fortune server, okay? So you see port 9191; that's the port we decided to run our consumer-side proxy on. In the upper right-hand corner, we went through the proxy; you see we made a netcat connection to port 9191, so it went through that whole workflow of connecting through the sidecar on the consumer, across the wire encrypted, and back to hit the machine on the left-hand side. On the left here, we're just seeing the logs spitting out the message before it goes across. Now if you noticed here, that "you will have a long and healthy life" didn't show up on the right-hand side there. Do you want to speculate why there was a request there that we didn't originate? It was a health check, right? The service mesh is periodically making a health check, and that's why we saw that request come into the server; it wasn't us actually making a request from the client side.
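The consumer side can be expressed the same way; a hypothetical registration (service name and port assumed from the walkthrough) that binds the upstream fortune service to local port 9191 on the consumer machine:

```json
{
  "service": {
    "name": "fortune-consumer",
    "connect": {
      "sidecar_service": {
        "proxy": {
          "upstreams": [
            {
              "destination_name": "fortune",
              "local_bind_port": 9191
            }
          ]
        }
      }
    }
  }
}
```

With that in place, the consumer just does `nc 127.0.0.1 9191` and the sidecar transparently discovers a healthy fortune instance, encrypts the connection across the wire, and hands back the response, which is what you see happening in the upper right-hand corner.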
So you can see right now we're bound to all interfaces, because our bind address is 0.0.0.0, so we're going to change that to the loopback. Before, anyone could still connect from the internet onto it; by changing the bind to just the loopback, we're securing our application so it's not accessible to the rest of the world. For the next step, we're going to spin up some additional Fortune Cookie servers and show how we can get some service resilience. We're going to get up to three servers running, and then we're going to go through and kill them. Okay, so we're going to talk about service resilience; that's the other part. We're going to scale up: we're going to update our Terraform configuration file and tell it we don't want a single Fortune Cookie server anymore, we want three. We change the configuration file, we rerun Terraform, Terraform figures out the changes that are needed, and it spins up the additional hardware. And at this point I've gone and turned on the automation that actually does the joining, so we don't have to do that joining by hand like you saw me do; you can automate it in Terraform. It pulls out the IP address of that first Consul server and passes it as a parameter to all the other instances that are spun up, so you don't have to run those commands by hand. I am going to speed up so we don't actually have to watch the hardware being spun up; you can just take my word for it that it gets spun up. Okay, so at this point we can see we have three Fortune Cookie servers running. They are all joined onto the service mesh, because we did a request for all the members and we see that they're all there.
We can go through and make requests to the unsecured ports to see those additional instances; the first one is secure because we did that one by hand, but the second and third ones are not secure, they haven't been unbound from 0.0.0.0. So now I'm going through and enabling that secure service on them. We're going to skip through that, because you've seen it already, and get to the point where we start killing servers. So we're removing the bind from all the interfaces down to just the single loopback, and then we're starting up the sidecar proxies. We're now going to start making some requests, and then we will go through. So let's go back a little bit. Okay, so at this point we have three servers running, and we're going to kill one of them and validate what happens. I'm doing a shutdown and halt; this is physical hardware, so I'm shutting it down and it goes away. At this point, the service mesh, doing its health checks, is validating: hey, which servers are things still running on? It's going to notice that one of them is down, and it'll remove that one from the service mesh, but our netcat keeps on running just fine; we still keep getting responses from the remaining two servers. So now we're going to connect to the second machine and kill that one, and then go back and keep on making our requests through the service mesh. It takes a second for the health checks to update, but there we go: we can see that it's now making requests to the last remaining one. And then we will go and kill the last machine. At this point, all of the Fortune Cookie servers are dead, so unfortunately the service mesh can't help us anymore and we don't get a response back.
Okay, so real quick, what we've gone through here: we've taken a microservice, our Fortune Cookie Service, and we haven't made any changes to it, we haven't had to recompile it, we haven't added any features, no development time. We've turned on a service mesh, we've clustered the service mesh together, and we've turned on the sidecar, which introduced the encryption of the data across the network. So at this point we've basically resolved that issue of someone being able to tap in and see that fortune going across the wire; that's no longer visible, because the traffic is being encrypted. Obviously this is the basics of using a service mesh, but hopefully it's opened up your eyes to how easy it is to add authentication, encryption, and service resilience to an existing application you might be running. Any questions I can answer? Yeah, oh, there's a mic. "How does Consul implement gateways for external traffic going into the mesh?" So, external traffic coming into the mesh. You mean if someone isn't running a proxy, a sidecar proxy? "Yes, exactly, like other gateway implementations." Yeah, so then I think you're sort of outside the realm of a service mesh, because with a service mesh the idea is that both the consumers and the services are communicating through that service mesh. You could still use the capabilities: you could still register the services through the service mesh, and the DNS capabilities would provide the health checking and everything, but you're not going to get that client- and server-side certificate authentication between them. So I would look at this more as a solution for when you own both the consumers and the services, sort of east-west traffic, rather than traffic coming in externally from the public internet in general. Anyone else?
"Yeah, so you showed that the sidecar proxy can get you secure transport of your application's data, but is the whole TLS trust chain and everything also validated?" Yeah, so there are client-side and server-side certificates, so it's authenticating both the client and the server. We touched at the beginning on Vault; Vault is a key management solution, and that's sort of the next part of this, the distribution of keys. So yes, we are encrypting traffic; we did not go through and turn on the authorization rules in this example, just to limit the scope. The next level would be to go through and define that things that are Fortune Cookie servers can talk to Fortune Cookie consumers, but they can't talk to other consumers. You'd go through and define those relationships, and as you turn instances on and off, the dependency on IP addresses goes away; you just say that Fortune Cookie servers and consumers can talk to each other, and the service mesh keeps track of where those are running. But yeah, that would be the next step: going through and defining those authorization rules. "Do you have any pros and cons versus, for example, running Kubernetes on bare metal with Istio? Because it sounds to me like it's a similar solution." Yeah, they're both service mesh implementations. The real point I was showing here is that you don't have to containerize your application, you don't have to run it on top of Kubernetes, to get a service mesh. A lot of people think service mesh and they immediately turn to Kubernetes and think, I need to use Istio or something else like that. And the idea is that you can do this with a legacy application without having to go through and spend that development effort. "So you didn't really compare them, you just used Consul and..." Yeah, it's the same high-level construct; both of them have the same capabilities.
So I wasn't trying to say Consul is the solution; it's more of a general, hey, this is what a service mesh can give you. But yeah, there are a bunch of different service mesh implementations. "Thank you. Just a quick follow-up: you picked Consul. Were there any particular features that you needed from Consul, or is this applicable to all kinds of service meshes, like Istio or other ones?" So I picked Consul because I didn't need the dependency on Kubernetes and running in a containerized environment. This was an example of how I can run this directly on top of bare metal without an additional layer of software infrastructure, and without having to go through an extra development cycle; for a lot of projects I get into, they just want encryption, they want authentication, they want some of these features, but they don't want to go through the process of moving to a containerized solution. All right. Thank you, John.