Hello, I'm Andy. I have a lot of slides, so I will attempt to do this in as clear but as speedy a way as I possibly can. So, what's going on? I'm Andy. I'm a technical milliner — you can tell I'm a build fanatic and an advocate of continuous everything. I'm a founder of Control Plane: continuous infrastructure and security practices with a focus on containers and Kubernetes. We are, of course, hiring cloud native hackers and engineers; if I can get to the end, I will give you more details. And I've done a little bit of everything, from development, database administration and operations to pen testing and architecting.

I want to talk about network security in Kubernetes. So, what is this? It is a way to ensure private and trusted communications across potentially untrusted networks with malicious actors, like the internet. Why do we need it? Because we can't trust anything. The internet was built with trust between everything. But, unlike the origins of the internet, we're not a bunch of academics running trusted workloads any more. We think about runtimes and we write software to run as if it were a day boating on the lake, when in reality there are pirates everywhere. We should be designing our systems for the most hostile conditions they could possibly face. Resilience is key.

So, we'll look at how Kubernetes does it. It uses a lot of standard components, but sometimes in a slightly different way to the manner they're used on the internet. For example, on the public internet, self-signed certificates are bad; we'll look at why they might be good. We'll look at how encryption works, how we establish trust, and finally how we determine identity from the protocols that keep Kubernetes secure and our workloads safe. And if there is one takeaway from this, it is: encrypt everything, everywhere.

We are at the bleeding edge of a revolution that is already permeating traditional enterprise systems. These systems have high compliance and high audit requirements, and breaking away from traditional network security patterns is really difficult; there are many layers to a security onion. So, cloud native applications give us an opportunity and a problem. It's difficult to encrypt everything, but it's also difficult to encrypt workloads that are constantly churning, restarting and being rescheduled to unexpected or unpredictable places in our infrastructure. The SPIFFE project is looking to solve some of these issues. The subject is broad and deep, so we will cover some key technologies and hopefully leave you enough pointers to go and investigate the parts that interest you more thoroughly. So: network security 101, Kubernetes API components, TLS in general, mutual authentication in particular, CNI and network policies for applications, and finally bootstrapping identity for dynamic workloads with SPIFFE.

So, what do we want from network communications? We want privacy and we want trust. Private means confidential; trusted means we have integrity and authentication on the communication. Non-repudiation sometimes makes an appearance here too — that means the sender cannot later deny that it was really them communicating. We won't worry about that one for now.

So, human communications: trusted and local. This is like whispering between two people. We know who is talking to us and nobody else can hear us. This is like a private, air-gapped network. We believe there's nobody else there and so we take no extra steps to verify the identity of the other party.
This is okay in a very small number of situations, but in general it is not. If somebody else sneaks into the room or is hanging from the eaves, our security is broken. We need it to be private as well as trusted.

So, how does this work on the internet? Communicating on a public network is like shouting across a crowded street. We see the speaker — their IP. We hear their voice synchronised with the movement of their lips and hear the words. Is that enough to trust somebody? Well, in human terms right now, yes — although in the future, deepfakes mean that in practice that may not hold — but this only works when both parties are local. What happens when we need to communicate remotely? You may remember Postman Pat from your youth. We use the postal service, or these days the internet. So, with the postal service we want to trust the stamp on the letter as a guarantee of sanctity. Nobody is supposed to open this, right? Wrong. The UK in particular has a long history of postal interception, and of interfering with communications in transit in general: preventing them from reaching their destination or spying on their contents.

So, we need to get our privacy. What can we do when our communications path is untrusted? We use encryption, and on the internet that encryption is what we know as HTTP Secure: SSL and TLS. And back we go in time to 1991, when there were no certificates. Neither was there GeoCities, but for those of you old enough to remember it, here's a trip down memory lane. We tried for a few years and released SSL version 2. It was difficult, it was slow algorithmically and computationally, and it was broken. So out came SSL version 3. More turmoil. And finally, TLS: backwards incompatible, everything before it deprecated. We're now on TLS 1.3, and there are many, many attacks against these protocols and against specific implementations of them. So, as always, keep your systems patched.

Is it really that simple, Will Smith? Sadly not. A single mistake anywhere in configuration, implementation or usage can compromise the integrity of our encryption, leading to the oft-cited aphorism: don't roll your own. There's no greater challenge for a cryptography nerd than an "unhackable" cipher — see Blu-ray. Cryptography is just a mathematical puzzle with some strict assumptions. And with that warning, on to Kubernetes and securing the API server.

There are a lot of encryption options on the API server. We use TLS everywhere, but how does this work? We ask for a set of data to be signed, and a certificate guaranteeing that data is produced. This is the CSR, the certificate signing request, and we can then use the trust we have in the certificate chain, from the signer down to the leaf certificate, to validate an X.509 certificate at runtime. So, we have the root certificate. The root certificate authority is generally kept offline and safely secured. The root signs an intermediate, or more generally a signing certificate, which finally signs the leaf or end-entity certificate — which is what we get on a website, on a workload, or on the API server. And what's that? That is a load of X.509 certificates in a chain of trust, which is the basis for TLS.

Okay. So, what are these concepts? And how can we establish new encryption keys across a network that we do not trust, because it's not already encrypted? We use public key cryptography. So, what is this? Simply, it's a way to send messages that only the intended recipient can read. We use the public key to lock the mailbox.
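To make that mailbox analogy concrete, here is a minimal sketch in Go, using RSA-OAEP purely for illustration — it isn't how TLS itself moves data (TLS agrees a key with Diffie-Hellman, as we'll see shortly), but it shows the public-lock, private-unlock property:

```go
// Minimal sketch of the "locked mailbox" idea: anyone with the public key can
// encrypt, but only the private key holder can decrypt. RSA-OAEP is used
// purely for illustration.
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"
)

func main() {
	// The recipient generates a key pair and publishes only the public key.
	priv, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		panic(err)
	}
	pub := &priv.PublicKey

	// Anyone can lock a message into the "mailbox" with the public key...
	ciphertext, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, pub, []byte("database password"), nil)
	if err != nil {
		panic(err)
	}

	// ...but only the private key can open it again.
	plaintext, err := rsa.DecryptOAEP(sha256.New(), rand.Reader, priv, ciphertext, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plaintext))
}
```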
The public key can be shared — given to everybody — because it's meant to start the conversation, not end it. Then the message is encrypted with that key and put into the box. And finally, the private key is used to unlock the box and decrypt the message. The public key cannot decrypt a message that it itself has encrypted; only the private key can do so.

So, how does this all work? Some quick glossary descriptions. Symmetric encryption is both keys being the same: you use one key to open and close the same door. This wouldn't be any good on the internet, because the server would have to share its private key with every client who wanted to talk to it. So, asymmetric encryption: the keys are different. This is more like it. This is how PKI works. We can base this on elliptic curves — essentially curves that don't cross and that loop around the plane — and we use certain properties of the elliptic curve to make a far more difficult problem space to brute force. These are what elliptic curves look like. Finally, Diffie-Hellman is a way of generating a shared secret between two people in such a way that the secret cannot be seen by observing the communication. This is how SSH works, and it has now been integrated into the bootstrap phase of TLS. So you're not sharing the secret during the key exchange; you're creating a key together, separately. And have we actually seen these guys? There we go. These are the two gentlemen, Diffie and Hellman, whose cryptography has kept us safe since 1976, fighting US export-grade classifications on cryptography and doing a great service for all of us in the meantime. Moving on. There's more info on how this bootstrapping in TLS actually works from DNSimple, explained with cats — what more could you possibly ask for? And this is the original crypto, not this shit. Don't tell the other room.

So, let's move this back to Kubernetes land. The API server presents a certificate when kubectl makes a request to the API. It's usually self-signed. This sits in your kubeconfig (~/.kube/config) on your local machine, which typically contains the public portion of the root certificate for the API server's certificate, specified and used in place of the system default certificates. kubectl uses the public key of the API server to start the Diffie-Hellman exchange; they generate their shared secret, derive a session key and begin to talk. This is now secure. Exactly the same certificate path validation as before.

So, this is what a public key is made from. SSL or TLS certificates are the public part of a server's X.509 key pair. A certificate contains a public key and an identity — a host name, an organisation or an individual — and is either signed by a certificate authority or self-signed. What does a certificate look like? Well, here is a certificate spec from X.509 version 3. We recognise this as the encoded version of a certificate, and we can decode it very easily to see its contents. How do we know that we can trust it? We're back to certificate path validation.

And there's a question here of self-signed certificates. On the public internet, self-signed certificates are bad. They are generally the sign of a man-in-the-middle attack: we don't trust them, and it looks like somebody has potentially intercepted and re-proxied our connection to a server, then presented the communications back to us. The difference here is that, with Kubernetes, we sign with a certificate authority that is generated within the cluster.
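That path validation step is mechanical enough to sketch. Below is a minimal Go illustration of verifying a leaf certificate against a cluster CA, which is the same idea kubectl applies via the certificate-authority-data in your kubeconfig; the file names and the DNS name here are assumptions for illustration rather than real API server values:

```go
// A minimal sketch of certificate path validation against an in-cluster CA.
package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"os"
)

func main() {
	// Load the cluster root CA certificate (public part only).
	caPEM, err := os.ReadFile("ca.crt")
	if err != nil {
		panic(err)
	}
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		panic("no CA certificates found in ca.crt")
	}

	// Load and parse the leaf (end-entity) certificate presented by the API server.
	leafPEM, err := os.ReadFile("apiserver.crt")
	if err != nil {
		panic(err)
	}
	block, _ := pem.Decode(leafPEM)
	leaf, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		panic(err)
	}

	// Walk the chain from the leaf back to the trusted root.
	if _, err := leaf.Verify(x509.VerifyOptions{Roots: roots, DNSName: "kubernetes.default"}); err != nil {
		fmt.Println("certificate path validation failed:", err)
		return
	}
	fmt.Println("certificate chains to the cluster CA")
}
```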
The private keys for that certificate authority do not leave the cluster, and as such there is one place to control your key material. If root on the API servers, where those certificate authority private keys are stored, is compromised, you have bigger problems than the encryption in transit of the cluster. So, arguably, this is a new pattern, and it's better than having a centrally, organisationally controlled certificate authority which you have to manage not only offline in some sort of cold storage, but also bring into the cluster for regeneration and management of keys at rotation time. So the new cloud native way is: generate your certificates and keep everything, all key material, inside the cluster. It's nice, it works.

We do this in Kubernetes land with two certificates: one for the server, so we prove its identity, and another one for the clients. We've mentioned this a couple of times already; it's just the TLS negotiation twice. There is a second certificate exchange between the client and the server after the initial handshake. This establishes a two-way trust mechanism that guarantees the identity of both parties. So, on the internet you would ask for the certificate of the server and you would get back the google.com host name. With mutual TLS, you also provide a second certificate with your identity encoded into it. So both parties can identify each other, and that authentication can then be used as the basis of authorization (there's a minimal sketch of this below). This is a good thing.

So, where are we? We have private communications: confidential, encrypted. We have trusted communications: we have integrity given to us by the encryption — the cipher guarantees we get the correct output with the correct inputs. And finally trust: we have authentication from client certificates.

So, back to the Kubernetes API server. Do we trust the network? Frankly, no. And why should we? If a malicious actor gets inside the perimeter, they have access to everything behind it. Phrases such as "perimeter" and "DMZ" should raise flags in cloud native architectures. BeyondCorp from Google is designed as a response to these newly emergent threats. It is a project that basically says: trust nothing, validate everything, and run your servers as close to public as you possibly can, because that's the best way to bring the pain forward and be sure that your configurations are secure. This is called zero trust, and it means that our systems continue to make some guarantees of safety even when some components are compromised. Obviously the nature of that safety is entirely dependent on the system and the type of data it stores, but contrast this to traditional perimeter-based approaches: one compromise and the whole system might be hosed. Just ask Equifax.

So, if you want to run your own zero trust server, all you need to do is put an identity-aware proxy in front of it, or in the same pod as it. There are multiple examples of this online and it's fine for simple applications — it works nicely. But for complex microservices with deep transitive dependencies and crazy call graphs, you'll probably have to run some sort of one-time token service to prevent replays, pass request context around, and handle a number of other non-trivial application layer concerns. Fortunately, they're all described in the book Zero Trust Networks by SPIFFE author Evan Gilman. If you want to dig deeper into the future of network security, this is a great place to start. We will examine some of these concepts in detail shortly, but first, back to the API server. What does zero trust really mean in this context?
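Here is the minimal mutual TLS sketch promised above, in Go. The file names, the port and the use of the certificate's CommonName are illustrative assumptions; the point is the ClientAuth setting, which is what turns ordinary TLS into mutual TLS:

```go
// A minimal sketch of mutual TLS: the server presents its own certificate AND
// demands a client certificate signed by a CA it trusts.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// CA used to verify client certificates (e.g. the in-cluster CA).
	caPEM, err := os.ReadFile("ca.crt")
	if err != nil {
		panic(err)
	}
	clientCAs := x509.NewCertPool()
	clientCAs.AppendCertsFromPEM(caPEM)

	server := &http.Server{
		Addr: ":8443",
		TLSConfig: &tls.Config{
			ClientCAs:  clientCAs,
			ClientAuth: tls.RequireAndVerifyClientCert, // this is what makes it *mutual* TLS
		},
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// The verified client identity from the handshake can be used as
			// the basis for authorization decisions.
			cn := r.TLS.PeerCertificates[0].Subject.CommonName
			fmt.Fprintf(w, "hello, %s\n", cn)
		}),
	}

	// Server certificate and key presented to clients.
	panic(server.ListenAndServeTLS("server.crt", "server.key"))
}
```

A client would then connect with its own certificate (for example loaded via tls.LoadX509KeyPair) and the cluster CA in RootCAs, so the same verification happens in both directions.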
In this context, it means that we mutually authenticate the TLS connection with client certificates — between the kubelet and the API server, and between the API server and the clients that connect to it. It means we know the identity of the server and of the client, and that we have our private and trusted criteria fulfilled. Zero trust is a lot more than this, but we're adhering to one of its principles: restrict the blast radius of a potential compromise in our network, where some container or server has an unauthorised user and their intentions are malicious.

Let's have a look at the API server components and their communication. This is an awesome diagram from Lucas Käldström, the kubeadm maintainer. It's in more or less every talk I ever deliver, and it shows us the network communication paths and protocols for the various components of the system. So let's think about this: what could go wrong? Well, a container has an unauthorised user. How likely is that? Well, what about if somebody roots a privileged container? Admit it, everybody is running some, and with a few caveats that is game over. Privileged containers are the worst thing to happen to computing. Or they get onto a load-balancing box that's proxying or transiting packets to the API server — with some caveats, game over. Or there's another container in the same pod as us. Remember, the pod is a shared network namespace; they share the same network adapter. We can then sniff their traffic under certain conditions. Game over. Defence in depth is a friend here. This is similar to how traditional breaches occur: an attacker gains a foothold within a system and starts to look for other chinks in the armour, to pivot and continue to escalate their privilege. We can fix this. For more on sealing up those gaps in Kubernetes, this talk goes into a lot more low-level implementation detail. Thank you for the cat. This relentless focus on automation allows engineers to automate and refine the business's concerns: velocity, security and performance, in some order.

So, now that we're sure the control plane is able to communicate in hostile waters, what about the applications running on it? Our data is the golden treasure trove that attackers are keen to retrieve. If we leak our database keys over the wire, we may as well not have bothered encrypting our control plane traffic. So, what's next? Static endpoints are easier to encrypt as they tend to stay in one place — or, more importantly, on one IP or domain. That can then be used as an X.509 identity. But that's all very well if your infrastructure has a static IP as the front door. What about dynamic resources? Containers break the coupling between IP and identity that has traditionally been used for firewalling and network security. Containers churn, they get rescheduled, they are dynamic. So, we need a component with a holistic view of the whole cluster to run our firewalls for us.

Our cloud-native firewalling is network policy. Kubernetes is a complex mix of abstractions, network types and providers, and so enforcement of network policy should be deferred to the orchestrator, which has a holistic view of the whole system. This is what a network policy looks like (reconstructions are sketched below). They are applied to pods by label. Of course, labels are "loosely typed", if you want to put that in inverted commas — in Kubernetes, there is no verification of labels. As a security feature, they are the loosest possible way to enforce anything. But unfortunately, we deal with what we have.
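Since the slides themselves aren't reproduced here, the following are hedged reconstructions of the two policies being described; the namespace and names are illustrative:

```yaml
# Default deny: an empty podSelector matches every pod in the namespace, and
# listing both policyTypes with no rules denies all ingress and egress.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-app          # assumption: policies are namespaced
spec:
  podSelector: {}            # empty selector = all pods in this namespace
  policyTypes:
    - Ingress
    - Egress
---
# Allow DNS egress only: port 53 on both protocols (UDP and TCP).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress-only
  namespace: my-app
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```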
With an empty pod selector, the policy selects every pod in the namespace and denies everything: a default fail-closed. It should really look like this. I detest this API — I think, for something so important, it is not very clear. The second example only permits egress for port 53 traffic, on both protocols, for some reason. Of course, maintenance of policies as applications change is very important. And this is just for layers 3 and 4; there is no filtering on DNS names, because DNS is inherently nondeterministic — we could be load balanced, we could be behind a VIP, there could just be round robin going on. So, that was left out for Kubernetes' purposes. Istio does away with this problem because Istio lives at layer 7, so you can perform egress filtering with an egress gateway at layer 7 with Istio, which is something we will touch on briefly later.

So, here is an overly permissive network policy, of course, and we have a pattern for testing these. How are you going to know if something so loosely typed and loosely verified actually breaks? You SSH onto the node, you enter the same network namespace, and you aggressively parallelise nmap to whitelist and blacklist the endpoints that your application should and should not be able to reach. This is a meta-pattern for testing network policy: start with a default deny and build out your tests based on the Kubernetes deployment names, producing simple TAP-compliant output. And there are other ways to firewall things in Kubernetes, notably NeuVector, which operates at a different level to network policy. Whatever you do, do something — unless you are a creative agency with no state in your application at all.

So, we also have encrypted container network interface plugins. These do some good if you're using one, but they only fulfil part of our requirements: this is blanket symmetric encryption on all traffic. We have no identity and thus no authentication, and we are using a single key. A public API is an attack surface, so we need to do something else to fix this problem. Enter SPIFFE.

SPIFFE is a set of open source standards for securely identifying software systems in dynamic and heterogeneous production environments. SPIRE is the reference implementation: a toolchain for establishing trust between software systems across a wide variety of hosting platforms. Concretely, SPIRE exposes the SPIFFE Workload API, which can attest running software and issue SPIFFE IDs and SVIDs. An SVID is a SPIFFE Verifiable Identity Document, and it is the foundation of the identity that we then use to bootstrap our network encryption. This in turn allows two workloads to establish trust between each other — for example, by establishing a mutual TLS connection — and we do this by attesting individual workloads. This is an example of a SPIRE attestor for Kubernetes; an attestation policy describes the properties that a workload must exhibit to conform to the policy, gain the identity and have the certificate generated for it. It's typically a mixture of process attributes — the contents of /proc for a PID, sorry — and infrastructure attributes: attesting a VM on a cloud provider, for example, by using the metadata API to pull back the instance ID and type.

So, SPIFFE is a standard for how an application can retrieve an identity programmatically and, importantly, generate short-lived credentials on the back of that identity. And just as importantly, the API allows a workload to retrieve a trust bundle, which is the set of public keys that can be used to verify these self-signed certificates.
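As a rough sketch of what a workload does with that trust bundle — standard library only, with the file names and example SPIFFE ID as assumptions; a real deployment would normally use a SPIFFE/SPIRE client library rather than hand-rolling this:

```go
// Verify a peer's SVID against the trust bundle, then read the SPIFFE ID out
// of the certificate's URI SAN and use it as the peer's identity.
package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"os"
)

func main() {
	// Trust bundle: the CA public keys delivered by the Workload API.
	bundlePEM, err := os.ReadFile("trust-bundle.pem")
	if err != nil {
		panic(err)
	}
	roots := x509.NewCertPool()
	roots.AppendCertsFromPEM(bundlePEM)

	// Peer SVID: the short-lived X.509 certificate presented by the other workload.
	svidPEM, err := os.ReadFile("peer-svid.pem")
	if err != nil {
		panic(err)
	}
	block, _ := pem.Decode(svidPEM)
	svid, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		panic(err)
	}

	// Path validation against the trust bundle, exactly as with any other X.509 chain.
	if _, err := svid.Verify(x509.VerifyOptions{Roots: roots}); err != nil {
		panic(fmt.Sprintf("SVID does not chain to the trust bundle: %v", err))
	}

	// The SPIFFE ID is encoded as a URI SAN,
	// e.g. spiffe://cluster.local/ns/default/sa/frontend (illustrative).
	for _, uri := range svid.URIs {
		if uri.Scheme == "spiffe" {
			fmt.Println("peer identity:", uri.String())
		}
	}
}
```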
So, we're self-signing with a private CA, and we need to share the public key of that CA so that workloads can validate the TLS certificates that we have minted and handed out to our applications in the cluster. This means applications, libraries and tools can retrieve credentials that automatically identify them, without needing any secrets co-deployed.

A SPIFFE ID looks like this: there is a trust domain, which in Kubernetes is your cluster name, and a workload identifier. It's encoded into an X.509 certificate, just like we saw earlier, using the certificate's extensions to encode further information. Here is a certificate spec; the X.509 extensions here are what SPIFFE uses. We know that we can trust this — we're back to certificate path validation. It's exactly the same technology used in a slightly different manner, and we can see how further selectors can be used to identify all sorts of workloads. The node and workload attestor plugins here bootstrap the identity process, and can do so across multiple different deployment types. While Istio implements a rudimentary first-cut version of SPIFFE — I'm sorry — SPIRE implements a whole lot more, and this technology is actively being pursued outside of the Istio project. It is important to note that SPIRE, and SPIFFE in particular, is not the network encryption itself. It provides the identity on which we can hang all our network encryption and generate certificates, but there is nothing in the specification that describes what the verifiable identity document should be used for.

Very simply, it looks like that: the SPIFFE ID is inserted into the SAN, the Subject Alternative Name. This is the Istio implementation. Envoy has the public keys of the certificate authority injected into it — Istio uses its Pilot component to push those changes over Envoy's API; it's all API-driven configuration — and as such, Envoy is able to verify the authenticity of those certificates. Security tightening will continue in this project; Istio is still nowhere near where it ultimately promises to be. So, yes, a couple more minor things. Secure naming is then used — secure naming is extracting that SAN and using it as the basis for routing decisions. On top of that we then put RBAC at layer 7: HTTP verbs, paths, and we can also... there's something else that hangs off that that escapes me right now. And there's plenty more.

So, recapping. We have end-to-end encryption: private, mutually trusted communications between the API server in Kubernetes and its callers. We now equally have encryption in Kubernetes for the values in the etcd store, which we didn't used to have. You must turn on etcd encryption on the API server if you have the chance (a sketch of that configuration follows below). And in Istio's case, we are actually minting certificates and bringing them to Envoy. Control Plane have some threat modelling on Istio and Envoy from a DevSecOps meet-up in London recently, which you will be able to find on the Twitter stream. Essentially, there are attacks on Envoy from compromised pods; if you want to talk more about that, we've got lots of interesting stuff to discuss afterwards.

So, the takeaway from this is: do encryption. It's great. Encrypt everything, and while you're there, you may as well do it everywhere. Incredibly, I've managed to go fast enough. The obligatory hiring notification.
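Before the hiring details, one practical footnote on that etcd encryption recommendation: a hedged sketch of what the API server's EncryptionConfiguration can look like — the provider choice and the key here are illustrative assumptions, and the file is referenced from the kube-apiserver with --encryption-provider-config:

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets                  # encrypt Secret values at rest in etcd
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>   # e.g. head -c 32 /dev/urandom | base64
      - identity: {}             # fallback so existing plaintext values can still be read
```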
If you would like to come and work in London on difficult problems to do with network security, application security, the provenance of artefacts and third-party code risk, then we are a very nice, small team of people and we offer some great benefits. Please hit me up afterwards — and any title that you want, you can more or less have.

So, with that, in conclusion: network security is very important. X.509 and TLS keep us safe. Network policies are mandatory — you really must be running them in your Kubernetes clusters, otherwise you're allowing traffic to route all over the place. On that topic, make sure you also run something to block off your cloud metadata APIs (there's a sketch of one way to do that at the very end). Cloud-native applications have far more opportunities to be secured than what came before. A container is a per-process, granular, high-fidelity hanger for us to put security policies on; we can wrap security around a single process. So, although containers are fast and loose from some angles, they also offer a far greater opportunity to lean on things like seccomp. It's there to give you wings — and encrypt everything, everywhere. Thank you very much.

We can do one question. So, banks love using IP addresses as the only source of identity for firewalling — oh, yes — and have you had any success convincing people that IP addresses are not a particularly good identity, or is that mindset stuck? Yeah, so this is wrapped up in the whole business of breaking that traditional enterprise mindset. It's difficult. The way this works best for us is when an enlightened VP says: we've deployed loads of Kubernetes, what do we do, how do we fix this? It's still... the audit requirements and the change control needed to get changes through for peering requests or subnet allocations make it super, super difficult. So, much success? No. But the beginnings of changing the mentality, absolutely.

There's one further problem, if people have any interest in this: SPIFFE — well, Scytale — are currently looking at federated trust domains, because TLS libraries do not check the federated aspect of some of the extensions to X.509 well at all. Ideally, for a banking scenario, you would have multiple Kubernetes or Istio or just plain TLS trust domains that are able to verify each other without being able to cross-generate each other's certificates — an obvious requirement. However, it's not as easy as that, and the project is very much looking for help. So, if federation of trust domains works, then ultimately we can move away from issuing IPs as identity, like we do in the CNI, and do this at a wider infrastructure level, because we're layer 7 everywhere. More help is needed. Good question though, thank you. Thank you.
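As a closing footnote to the "block off your cloud metadata APIs" advice above, here is a hedged sketch of one way to do that with a network policy. The namespace is illustrative, and 169.254.169.254 is the common link-local metadata address on the major cloud providers:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cloud-metadata
  namespace: my-app
spec:
  podSelector: {}                   # all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32  # cloud metadata endpoint
```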