Good afternoon, everyone. We're going to go ahead and get started. I'm here with Spike Curtis — you should have seen him yesterday in the keynote — and I'm Dan Berg. We're going to take you through a session on security and the service mesh.

So what's the problem we're all trying to solve? We're moving toward highly distributed systems, and those systems, especially in the cloud, are complex. They're complex to monitor, complex to manage, and complex to secure. Why is that? Well, we're dealing with all kinds of problems. In our code we're dealing with network problems — latency and failures in the network itself. How do we deal with that? We're building retries and fault handling directly into our code. We're dealing with authentication and authorization, but as developers we're building that into each of our services, into our code, so every single service has its own mechanism and its own complexity for dealing with it. How do you even observe what's going on in these highly distributed systems? Even understanding the services — how they communicate, where the failures in the network are — is very difficult and very complicated. Fault tolerance is another issue: in a highly distributed system, you have to assume that every single component will fail eventually. So how do you build in that fault tolerance? Again, as developers, you're building that into your code, into your logic, and it's very inconsistent. And at the end of the day, you have to deliver these changes. Operationally, you have to deliver into these highly distributed environments — multiple clusters, multiple clouds — and you need to reduce the risk as you deliver. That's very complicated to do.
So how do we do this? What is the service mesh we do this with? What we use, and what we work on, is Istio. Unless you've been in a vacuum during this conference, this is probably the 100th time you've heard of Istio — though if you just arrived this morning, it may be the first. Istio is a service mesh that allows us to connect, manage, monitor, and secure all of the services in your environment. So why do we have Istio? Why do we have this service mesh? Ultimately, it's to deal with the problems we just discussed. In the end, it's about resiliency and efficiency: how do you ensure that each of your services can communicate, communicate well, and deal with failures in your environment? Because everything is so distributed — you have a lot of microservices, a lot of services in your environment — how do you manage the operations around them? Instead of configuring each of your services individually, we want to move to a policy-driven approach so that it's much easier to operate. You need visibility into your environment, so Istio provides fleet-wide visibility into all the services that are available and the policies for managing them. And you definitely have to have security by default, distributed down to the endpoints. With Istio, we want to ensure that all of your services are secure from day one. A lot of this is taking the complexity we talked about earlier out of your code and pushing it down into the mesh, down to the endpoints.

So let's take a look at Istio. What is the architecture? Most of you have probably seen this, so we'll go through it very quickly. Istio is divided into two main sections: there's a control plane and there's a data plane.
The control plane is the APIs and the management of Istio. It has three main components. Pilot manages the routing in your environment — the routing between your services. Mixer is responsible for collecting the telemetry and metrics, and also for managing policy — the fleet-wide policy definitions and enforcement. And Istio Auth is responsible for security: identity management and securing your services with mutual TLS. All of that is used to control and manage the functionality in the data plane. The data plane is where your applications actually run — where your containers and pods run in your environment. The way Istio works is that we inject a sidecar, built on the Envoy proxy, and it provides the control point we use for communication from one service to another. It's all programmed by the components in the Istio control plane. So that's the general architecture; we don't want to go into much more detail on it. What we want to do now is talk about the key things for securing with Istio. We're not going to do a general Istio session here; we're going to focus on how to secure this platform. I'm going to turn it over to Spike to take you through more of that detail.

All right, awesome, thanks Dan. So what I'd like to do is spend a little time talking about some of the security features that are available in Istio today. Then we'll do a quick demo, talk about some vulnerabilities and how Istio can help you mitigate them. And then I'm going to hand it back to Dan, and he's going to talk about the roadmap for the future. So, my favorite — the big mainline security feature we've gotten into Istio — is the ability to set up mutually authenticated TLS connections for all of the workloads in the service mesh.
What that means is that you can turn this on and, without making any changes to your application code, encrypt the data and authenticate both ends of the connection. We do this in a really standard way: it's standard TLS, and we're authenticating not only the server side but also the client side, so it's interoperable with other implementations of TLS. If you've ever thought about implementing TLS in your infrastructure, you probably know that the difficult piece is not actually support for the TLS protocol — that's pretty easy; it's built into basically every library that communicates over the network. The tricky bit is bootstrapping trust — getting those security certificates into each of the workloads. That's where we've spent a lot of time in Istio: building out the infrastructure to deliver those secrets so that you can authenticate both ends of the connection. We do that with a component called Istio CA, and it is compliant with SPIFFE, a set of standards that we're currently developing along with the rest of the SPIFFE community. Basically, this is a way to standardize how we identify workloads in our environment. Each workload gets a SPIFFE ID; we mint a certificate and put it into the workload. In Kubernetes today, we're using Kubernetes secrets to mount those certificates, and for other workloads, like VMs, we'll have a node agent that securely delivers the certificates to the workload. Once you have those certificates, Envoy is responsible for bringing up the encrypted connection and authenticating both ends. We also do a secure naming check: when you make a connection to another pod in the mesh, we know what service you're trying to connect to.
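As a concrete sketch of the identities just described: an Istio SPIFFE ID is a URI encoding the trust domain, the Kubernetes namespace, and the service account of the workload. The namespace and service account name here are hypothetical:

```
spiffe://cluster.local/ns/default/sa/summary
```

Because the service account is embedded in the identity, this is also what the secure naming check described next can compare against.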
We examine the Kubernetes API to know what service accounts are actually running that service, and we make sure that the pod you're connecting to is one of the pods running your service. That's what we call the secure naming check: making sure you're talking to a legitimate authority for the service you're trying to reach. Istio also has an ingress service, and as a sort of standard feature of ingress, we can terminate TLS. This is a bit of a work in progress for us, because Istio today allows you to insert a single certificate into the Istio ingress and serve everything with it. We're moving toward having more than one service exposed by your cluster, and for that we need TLS SNI — that's coming later, and Dan will talk a little more about it on the roadmap. So mutual TLS gives you the encryption piece and the authentication piece — figuring out who you're talking to. To build a secure system, you also need an authorization component: not just knowing who you're talking to, but being able to answer the question, should I be allowing this request? In the Istio architecture, we use Mixer for this. When a request comes into an Envoy, before we hand it to the workload serving that request, we make a check against Mixer, passing all the attributes of the incoming request. Mixer passes it on to one of several backends. There are built-in backends like whitelists and blacklists, and a bunch of different authorization backends are being built as we speak. Once the check comes in, Mixer renders a decision and returns it to Envoy. If it's okay, Envoy passes the request on to your service, which serves up the response; if not, it returns a 403. And we do a lot of caching here, so it's not like this happens on every single request.
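For a flavor of what a built-in Mixer check backend looked like in Istio releases of that era, here's a sketch using the listchecker adapter — a handler holding a whitelist, an instance naming the request attribute to check, and a rule wiring them together. The label names and values are hypothetical, not from this demo:

```yaml
# Handler: a static whitelist of allowed values
apiVersion: config.istio.io/v1alpha2
kind: listchecker
metadata:
  name: source-whitelist
spec:
  overrides: ["customer", "summary"]  # allowed caller apps (hypothetical)
  blacklist: false
---
# Instance: the request attribute to check against the list
apiVersion: config.istio.io/v1alpha2
kind: listentry
metadata:
  name: source-app
spec:
  value: source.labels["app"]
---
# Rule: run the check for requests destined to the database
apiVersion: config.istio.io/v1alpha2
kind: rule
metadata:
  name: check-database-callers
spec:
  match: destination.labels["app"] == "database"
  actions:
  - handler: source-whitelist.listchecker
    instances: [source-app.listentry]
```

If the source's `app` label isn't in the list, Mixer fails the check and Envoy returns the 403 described above.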
There's a cache that your local Envoy keeps: when Mixer returns an answer, if another request comes in with the same attributes that were used to render that decision, Envoy just makes the same decision without passing it on to Mixer. Mixer is also used for telemetry. The reason I'm bringing telemetry up in a security talk is that, in addition to the operational concern of being able to debug your application, observability is super important from a security perspective. You have to know what's going on in your cluster, and you have to be able to see into it, to have any chance of detecting bad actors if they do slip in. Mixer supports Stackdriver and plain output to standard I/O, and a bunch of additional logging adapters are being built right now — so hopefully your favorite logging backends will be supported with Mixer before too much longer.

We also want to talk about egress policy. Egress in this case means egress from the service mesh itself. For connections within the service mesh, we have that mechanism of mutual TLS and authentication to know who you're talking to, but you will often have services that want to make external API calls to take advantage of other services. If you're running in IBM, maybe you want to call the Watson services; if you're running in GKE, maybe you want to take advantage of some Google Cloud features. So you want to control what the things in your mesh are making calls to: enabling the things your application actually uses, and disabling the ability for your pods to make connections anywhere else. The reason you care about that is that attackers might be able to slip code into your pods through a number of vectors — maybe by corrupting your build system, maybe by corrupting public images you use, or maybe even by phishing your system admin.
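In Istio releases of that era, this kind of egress allow-listing was expressed with an EgressRule resource; here's a rough sketch, with a hypothetical external service host:

```yaml
# Allow workloads in the mesh to call one external API over HTTPS;
# destinations not covered by an egress rule stay blocked.
apiVersion: config.istio.io/v1alpha2
kind: EgressRule
metadata:
  name: watson-egress
spec:
  destination:
    service: "*.watsonplatform.net"   # hypothetical external host pattern
  ports:
  - port: 443
    protocol: https
```

Everything the application legitimately needs gets an explicit rule like this; anything else a compromised pod tries to reach is refused.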
When these things come up, they often try to make a connection out to some command-and-control server, or to exfiltrate your data — and you can shut that down with egress policy.

All right, let's do a quick demonstration. What I'm going to show is a cluster running Kubernetes, with Calico installed in the cluster as well as Istio, and then an application that looks like this. This is going to cover mutual TLS, and it's also going to cover authorization, and the reason you need both of these things to secure what you're doing. So let's see — let me get to full screen. I've got my application running in the cluster, and it has a very simple microservice architecture. There's a front-end pod called the customer UI — that's what my customers actually connect to and see on the web. That connects to a single microservice in the backend, which summarizes the person's account. Then I have a database, which persists all the account information. Not shown in the demo, there's also a teller UI that the teller uses when I go to the bank and either withdraw or deposit money; that creates transactions, which are processed in a microservice that updates the database. It's called YAO Bank — Yet Another Online Bank — and it's very simple, as I said: it just shows me my account summary, my current balance. I have this running in Istio with mutual TLS enabled. What that means is that if I grab the IP address of the database — even though I do have network connectivity to the database from my laptop, so I can ping it — if I try to make a request to the database (and notice that I have to use HTTPS here, because I have mutual TLS set up in my cluster), what I get back is a handshake failure.
That's because the HTTPS listener in front of that database expects an HTTPS connection, and it expects the client to present a certificate proving it's part of the service mesh. My laptop doesn't have one, so I get a handshake failure and the connection gets shut down. So we're already doing pretty well here: we're able to encrypt the data and also prevent people outside the cluster from making connections to any of the microservices running inside the mesh. But there are still some vulnerabilities we want to deal with that mutual TLS alone is not going to help with. One problem is that we're not doing any authorization, only authentication: we're basically asking, is the peer I'm talking to a member of the cluster, and if so, we allow the connection. That means if an attacker is able to take over one of my pods by exploiting some vulnerability in it, they'll be able to make connections anywhere else in my cluster and use that pod as a base of operations to start launching additional attacks — and I don't want that. Instead of actually exploiting a vulnerability in my cluster — since I'm not a hacker — I'm going to cheat and use kubectl exec to show what it would look like if I had. So I'm going to kubectl exec into the customer pod, and even though I didn't exploit a vulnerability, from here on out this is basically what it would have been like, from what we know of the Equifax attack. The attackers there exploited a vulnerability in Apache Struts, used that to install a web console, and then used that web console to launch attacks on the rest of Equifax's network. So I'm going to launch an attack against the database from here — and notice that I only need to use plain HTTP, because the Envoy sidecar will happily encrypt and authenticate this connection for me.
I'll add recursive=true and pipe the result through Python's JSON tool so you can see what information I get back. Just from being able to exploit one pod, I'm able to connect to the database, and I can get names, account numbers, and balances. I basically owned this database from one exploit. That's definitely bad. Another class of attacks: even if I can't completely exploit a pod and get remote code execution, it's often easier to convince a pod to tell you something you shouldn't know — to have it read out portions of its memory or read files from disk. The Heartbleed vulnerability from a few years ago is a good example of this: there was a vulnerability in OpenSSL, and if you had a server running a vulnerable version, attackers could convince it to read out portions of memory, and that was used to steal certificates and secrets. To simulate what that might be like — again using the Kubernetes API instead of finding a real vulnerability — I wrote a little program that goes to the Kubernetes API and gets the secrets for a service account, and I'm going to grab the secrets for the summary service account. When I do that, I get both the key and the certificate chain, as if I had stolen them. Now if I repeat that connection attempt and provide the key and the certificate, I'm able to connect to the database — and I can do something even more transgressive. Since I already know account names and numbers, I'm going to change my balance. There — that's a million dollars. Oops, I actually mistyped my account number; there we go. So now when I run that and go back to my bank account, you'll see I'm able to just modify my balance at the bank. Again, we don't want that. The way we're going to deal with both of these attacks is the same: by applying authorization policy.
In this case, we're going to use Calico application layer policy, and I'll show you what the policy actually looks like — it's pretty simple. We're applying network policy, and just like in Kubernetes network policy, we use selectors to select which pods the policy applies to. This first policy applies to the customer microservice, and then I list a set of rules: I'm going to allow any HTTP method on any request coming in to the customer microservice, no matter who it's from. For the summary microservice, since that's not exposed to the world, I don't want anything except things inside my cluster to talk to it — and in particular, only the customer UI microservice, because I know that's the only consumer in my cluster. So I'm going to allow connections only from the customer service account. And for the database, I know the summary microservice is the one talking to it, so I'm only going to allow connections from that service account. I can select service accounts either by name or by labels, if I have a lot of service accounts or would rather control policy via labels. Once I apply that policy, let's repeat some of these attacks. I'm actually going to drop the JSON formatting, because I don't get a JSON response back anymore — I get 403 Forbidden. Istio does the check with the service account that's provided, and the customer microservice is not allowed to talk to the database, so I get 403 Forbidden. Now I'll also repeat the attack where I've stolen a certificate — and that fails too. You might wonder how, because I really did steal that cryptographic identity. Well, Calico is able to enforce at multiple layers: not just looking at the cryptographic identity, but also monitoring the Kubernetes API, so it knows which pods are being run by which service accounts and what IP addresses those pods have.
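The policies narrated above would look roughly like the following in Calico's policy model — a sketch only, with resource names, selectors, and service account names inferred from the demo narration rather than taken from the actual demo files:

```yaml
# Database accepts connections only from the summary service account
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: database-policy
  namespace: default
spec:
  selector: app == 'database'
  ingress:
  - action: Allow
    source:
      serviceAccounts:
        names: ["summary"]       # hypothetical service account name
---
# Summary microservice accepts requests only from the customer UI
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: summary-policy
  namespace: default
spec:
  selector: app == 'summary'
  ingress:
  - action: Allow
    http:
      methods: ["GET"]           # application-layer rule on the HTTP method
    source:
      serviceAccounts:
        names: ["customer"]      # hypothetical service account name
```

The selectors pick the pods the policy applies to, exactly as in Kubernetes network policy, while the `serviceAccounts` and `http` clauses carry the identity- and application-layer conditions described in the demo.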
So it's able to verify against the network layer as well: because I'm trying to make a request from a place that isn't valid for that service account, Calico can shut it down. Let's jump back to the presentation. One of the things I want you to take away from this is not only that authorization is really important, but also that the way we've built security policy into Istio is based on Kubernetes ideas. We're using service accounts for identities. All of the Istio configuration is stored as custom resources, so you can look at it with kubectl, and you can use all your favorite Kubernetes API tools to inspect the config. Your CI/CD systems that are designed to talk to the Kubernetes API and push config into Kubernetes work with it as well. And authorization policy is defined the same way: Calico policy, which I just showed, is based on Kubernetes network policy, and we're also working on Istio RBAC — so if you don't like the network policy paradigm and would rather have a role-based access control system, you can use Istio RBAC, which is designed to look like Kubernetes RBAC. And with that, I'm going to hand it back to Dan, who's going to talk about the roadmap.

Thanks, Spike. That was pretty amazing, right? Keep in mind, Spike didn't have to write or change any of his code. All those security features were made available in the mesh, and the mesh provides a way to program them and have them apply to all the endpoints, making developers much more productive. There's already a lot available in Istio today for securing your services — it's pretty rich out of the box — but there's a lot more work we want to do, more features we want to add. Again, we want services to be highly secure by default. So here are some of the things we're working on in the near future. This first slide is all about what we're doing in 4Q — and we're already in 4Q.
You saw the snow yesterday, so clearly we're in 4Q. First, around mutual TLS: one of the problems right now is that when you're in the mesh, you're in the mesh — you're communicating with mutual TLS between your services. But what if you have to call a service that's not in the mesh? It isn't participating, so it doesn't have the capability to do mutual TLS, and today those calls will fail. What we're working on, and will be delivering very soon, is the ability to enable or disable mutual TLS for individual services, so a service in the mesh can call a service outside the mesh, with mutual TLS disabled for that invocation. That allows those external services to be reached. The other aspect: one of the key things with Istio is that we want to allow you to incrementally adopt portions of the mesh and its capabilities along the way — as you grow and mature and adopt more of Istio, you turn on more features. One of those might be mutual TLS. So you might have a cluster where Istio authentication is not turned on — you're just using Istio with no authentication. We want to provide the ability to turn it on with zero downtime, which is pretty impressive if you think about it: you've got all your services out there running with no authentication, and we're going to enable you to just switch it on, with zero downtime, and mutual TLS will be on across the mesh. A huge one for us right now is SNI for the ingress — Spike commented on this earlier. It gives us the ability to define ingress rules into the mesh using multiple different host names; today we're limited to a single host. And authorization policy choices: we want to expand the choices you have for authorizing the identity of services — and eventually, more importantly, users.
Right now we basically have service accounts, but we want to move toward role-based access — RBAC as a way to authorize your services within the mesh itself. Spike touched on that a little: moving toward Istio RBAC, with service roles and service role bindings defined in an RBAC style. OPA, the Open Policy Agent, is another technology we're looking at, as well as Calico unified policy — two other ways of doing authorization within your mesh. On the Open Policy Agent, there's actually a talk this afternoon that Lemon, my colleague on the Istio security policy team, will be giving — if you're interested in Open Policy Agent, definitely check that one out. And lastly on this slide: basically drinking our own champagne — Istio on Istio, using Istio to manage the Istio control plane itself, so managing the Pilot, Mixer, and CA components. Moving forward, looking into next year, one quarter out, we're looking at some other capabilities as well. Around mutual TLS — I mentioned it a little — more work on interoperating with non-mesh services, services that are not in the mesh. A key one is to let you work with clusters and services that are not contained within a single cluster: you have multi-cluster deployments and services running outside the cluster, and we want to enable authentication with a common identity across cluster boundaries. That work is ongoing. And user authentication: right now, in the examples we were showing, you're authenticating using the identity of your services — service accounts, plus the other capabilities I mentioned on the other slide with RBAC and the rest — but that's still the service account. We want to enable you, as a customer of Istio, to use your users' identities for defining policies and authenticating within the mesh itself.
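For a flavor of the Istio RBAC direction mentioned above, here is roughly the shape the service role and binding resources took as the feature landed — a sketch, with hypothetical service and service account names:

```yaml
# A role granting read access to the database service
apiVersion: config.istio.io/v1alpha2
kind: ServiceRole
metadata:
  name: database-reader
  namespace: default
spec:
  rules:
  - services: ["database.default.svc.cluster.local"]  # hypothetical service
    methods: ["GET"]
---
# Bind the role to the summary service account's identity
apiVersion: config.istio.io/v1alpha2
kind: ServiceRoleBinding
metadata:
  name: bind-database-reader
  namespace: default
spec:
  subjects:
  - user: "cluster.local/ns/default/sa/summary"       # hypothetical identity
  roleRef:
    kind: ServiceRole
    name: database-reader
```

The design deliberately mirrors Kubernetes RBAC: roles describe what may be done to which services, and bindings attach those roles to workload identities.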
So it's your users' identity — coming in via JWT in this case. Going even further with authentication and the different tools around it, we want to leverage the various identity management systems from the cloud providers. Almost every cloud provider has an identity management system; we want to be able to integrate those into the service mesh and use those identities as well. And lastly, leveraging external certificate authorities: much like what the Kubernetes community is doing with secrets, we want to enable external authorities to manage the certificates. So that's basically our roadmap. If there are other things you're interested in or would like to see, please get in contact with us. There are multiple ways to reach the folks working on Istio: we have Istio user groups, we have a Twitter account — definitely tweet at us if there's more you'd like to see in the security of the mesh. Reach out, get involved. Go to the istio.io site, start using it, and check out all the features showing how you can secure your services. But definitely get involved and give us feedback. Thank you, everyone.