Hello. My name is Samuli Kusilap. Together with Amy Suoriko, we are going to present securing a network virtualized with containers and Kubernetes. I work at Ericsson as an implementation architect in the CTO office, focusing on security, and Amy will introduce herself. We are going to talk about isolation, then about container signing and especially Notary v2. Amy will cover the CIS benchmarks, operationalizing a service mesh, and securing managed apps in an SMO.

Okay, isolation is my main topic for my part. First, let's have a look at the figure, which is inspired by a Google blog post. I think this is a pretty good overall figure, and in the interest of time I'm not going to explain all of the Kubernetes terminology, so straight to the point. I often hear, and maybe you have as well, statements like "containers are not as secure as VMs." That statement usually refers to isolation, and I think there is often a too optimistic, or maybe a bit unrealistic, assumption behind it. We will come back to that statement at the end of this isolation part.

So let's look at isolation in more detail, starting from the container, the innermost part here. As the figure says, a container gives some isolation, but not very much. Then there is the pod, and as shown here you can have multiple containers in a pod. You can have multiple pods in a namespace, a namespace can span many nodes, and of course you can have multiple nodes in one cluster. That is the architecture. Going from container to pod adds some more isolation, and the namespace adds some specific aspects of isolation, for example around the service account. Then we come to the node, which is basically a virtual machine or hardware, so it has stronger isolation. And finally, the cluster level obviously gives the strongest isolation.

Let me also briefly mention some of the improved-isolation options on this Kubernetes stack. There are microVMs, meaning running containers in lightweight virtual machines. The limitation is that these are not really mainstream at the moment; there are some specific solutions, and we see them for example in certain public clouds, like AWS Firecracker and Google gVisor. The next point, on the right side, is that you could of course build something on top of Kubernetes. As an example, Red Hat OpenShift has multi-tenancy built on top of Kubernetes and actually provides tightened isolation and security controls. However, from an average CNF point of view it may pose some challenges: deploying a CNF in OpenShift with tight security controls and multi-tenancy turned on can be difficult. If installing the CNF requires Kubernetes admin privileges, for example, that might or might not violate the policies of that particular OpenShift deployment.

Then I would like to briefly present some of the ongoing activities in this area. In CNTT, the Kubernetes reference architecture, we have had quite some activity, and it is still ongoing. For example, there are now some documented gaps on multi-tenancy and workload isolation with Kubernetes.
I mean gaps in the sense that Kubernetes itself does not provide as high a degree of isolation as, for example, a VM or bare metal. There is also a pull request, which at least a few days ago was still open, describing best practices for workload isolation with Kubernetes; one input to that was a recent OpenDev conference and its conclusions. I can also mention very briefly that we have similar proposals or conclusions, for example in CNFV and from a telco regulator. These are preliminary and not yet public, but basically they say that you should not isolate workloads in different trust domains by means of Kubernetes and container isolation; instead you should use VM- or bare-metal-based isolation between different trust domains. The Google blog post the figure comes from states the same thing: in certain cases you should really rely on VMs or bare metal. And then there is this one, learn Kubernetes.io. Essentially, these references can be summarized as: in certain cases you should deploy multiple clusters for isolation, meaning isolation between trust domains.

The right side here is not about security, but I wanted to mention it briefly because it is related; it points to the same thing, that you may need to deploy multiple clusters. In this case it is simply that a typical CNF, whatever that is, might include some cluster-wide software, meaning that the CNF has been integrated during its development with cluster-wide software such as a service mesh, an ingress controller, or a logging framework. Or the CNF may need to define some global resources, such as custom resource definitions, CRDs. If you pick a number of CNFs that include this kind of software or these properties and you want to deploy them in a single cluster, you might very well end up with conflicting versions or different configurations of that software. And as you may see, software lifecycle management within that single cluster could also become very complex if you somehow manage to deploy these different CNFs with their dependencies on the cluster-wide software. For more details, I have a link here to a CNTT issue on this topic.

Finally, very briefly, I wanted to mention container signing and especially Notary v2. As of now there are some signing solutions for containers but no industry standard, for example Red Hat simple signing and Docker Content Trust. These are open-source solutions, but they are not supported by every registry; you see them used more in specific deployments. Notary v2, then, I see as having the potential to become an industry standard. It had its kickoff last year, and as you can see, many big players in this area are behind the project. The goals mentioned here also show they are aiming high; in the end it really aims to be something supported by all registries, and there is also a link to a scenarios document. To ensure that the telecom-related use cases are also covered, telecom players should join this project, and as of now that is not yet the case, so I hope to see that happening.
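As a small illustration of what registry-backed signing gives you today, here is a rough sketch of enforcing Docker Content Trust when pulling an image. The image name is just a placeholder, and this only works against registries that actually serve the signed trust metadata.

```python
import os
import subprocess

def pull_signed(image: str) -> bool:
    """Pull an image with Docker Content Trust enforced.

    With DOCKER_CONTENT_TRUST=1 the Docker CLI refuses to pull tags that do
    not have valid trust metadata in the registry, so an unsigned or
    tampered image fails the pull.
    """
    env = dict(os.environ, DOCKER_CONTENT_TRUST="1")
    result = subprocess.run(
        ["docker", "pull", image],
        env=env,
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(f"refusing unsigned or untrusted image {image}: {result.stderr.strip()}")
        return False
    return True

if __name__ == "__main__":
    # Placeholder image name; replace with a CNF image from your own registry.
    pull_signed("registry.example.com/cnf/app:1.2.3")
```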
And now over to Amy. Thank you. I'm Amy Suoriko, and I'm from AT&T. I'm now going to turn to the nuts and bolts of hardening the containerized network environment. I'm going to spend a few minutes talking about some industry best practices, specifically the Center for Internet Security, or CIS, Docker and Kubernetes benchmarks. I'm then going to talk about operationalizing a service mesh network, and I'll end the talk with an example of how to use the various techniques we've covered to secure the use of managed applications within an Open RAN, or O-RAN, deployment.

The Center for Internet Security has a set of practical and also testable benchmarks for securing both Docker and Kubernetes, and the benchmarks include scoring metrics, so you really can measure how well you're doing. As you can see in the table on the left, each of the benchmarks has a little more than 100 controls to implement, which granted seems like a lot, but the good news is that none of them are particularly difficult to implement or to write tests for. Furthermore, once you've implemented the patterns and tests, you can reuse them in all of your deployments. And finally, there are commercial tools available that implement some of the tests.

The Docker example is from the container image and build controls: containers shouldn't run as root unless they have to perform a task that can only be done as root. By default, containers run with root privilege, and they also run as root within the container, so that needs to be fixed. To meet the benchmark, containers should always set their user to a non-zero UID. Then, to test that the user is set during build, include a blocking test in your build pipeline that checks that the container does not have a UID of zero, or that it is on an exception list of containers that have to have root privilege; we know that happens. This test can also be executed periodically in the runtime environment to make sure the UID didn't change.

The Kubernetes example is the enforcement of rate limits on the control-plane API server. This prevents the cluster from being overwhelmed by placing a limit on the number of events the API server will accept during a given time slice. It's an important control because a misbehaving workload could overwhelm the server and make it unavailable, which is basically a denial-of-service attack. It becomes even more important in multi-tenant clusters, where just a couple of misbehaving tenants could impact overall cluster performance. Like the non-root UID, rate limiting is not turned on by default, and as in the Docker example it is a straightforward configuration; the test that checks it just makes sure that the --enable-admission-plugins argument is set to a value that includes EventRateLimit.

Our recommendation is that virtualized networks using Docker and Kubernetes should try to meet all of the benchmarks, so they would have a perfect benchmark score. Where you have containers that need weaker controls, document why, include them in some type of whitelist that you use during testing, and periodically review that whitelist to make sure those exceptions are still needed.
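To make those two checks a bit more concrete, here is a rough sketch of the kind of blocking pipeline test I mean for the non-root UID control. The registry name and the exception list are placeholders, and a real pipeline would also need to resolve named users declared in the image to numeric UIDs.

```python
import json
import subprocess
import sys

# Placeholder exception list: images with a documented need to run as root.
ROOT_ALLOWLIST = {"registry.example.com/cnf/needs-root"}

def image_user(image: str) -> str:
    """Return the USER configured in the image metadata ('' means root)."""
    out = subprocess.run(
        ["docker", "image", "inspect", image],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)[0]["Config"].get("User", "") or ""

def check_non_root(image: str) -> bool:
    repo = image.split(":")[0]
    if repo in ROOT_ALLOWLIST:
        return True
    user = image_user(image)
    # An empty USER, 'root', or UID 0 all mean the container starts as root.
    return user not in ("", "root", "0") and not user.startswith("0:")

if __name__ == "__main__":
    failed = [img for img in sys.argv[1:] if not check_non_root(img)]
    if failed:
        print("images running as root:", ", ".join(failed))
        sys.exit(1)  # block the build
```

And in the same spirit, a minimal sketch of the EventRateLimit check, written against a kubeadm-style static pod manifest. The manifest path is an assumption and will differ per distribution; a packaged CIS scanner would normally do this for you.

```python
import sys

# Typical kubeadm location; other distributions keep the API server flags elsewhere.
MANIFEST = "/etc/kubernetes/manifests/kube-apiserver.yaml"

def event_rate_limit_enabled(manifest_path: str) -> bool:
    """True if --enable-admission-plugins includes EventRateLimit."""
    flag = "--enable-admission-plugins="
    with open(manifest_path) as f:
        for line in f:
            if flag in line:
                plugins = line.strip().split(flag, 1)[1].split(",")
                return "EventRateLimit" in [p.strip() for p in plugins]
    return False

if __name__ == "__main__":
    if not event_rate_limit_enabled(MANIFEST):
        print("EventRateLimit admission plugin is not enabled")
        sys.exit(1)
```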
I'm now going to turn to the service mesh architecture. Service mesh is very popular these days, a lot of people are working on it, and it provides some really interesting security controls. One of the products that you've probably heard of that implements it is Istio.

So what is a service mesh? It's a dedicated infrastructure layer that facilitates service-to-service communication within a containerized architecture. It takes the logic governing service-to-service communication, moves it out of the individual services, the application containers, and abstracts it into another layer of infrastructure. We call this the service mesh container, which is this yellow box. I'm only going to focus on externalizing the interface security controls, such as authentication, encrypted communication or TLS, your RBAC controls, event logging, and certificate management. There are a number of other controls the service mesh enables, but those are not security related.

The diagram on the slide is an abstraction of a service mesh architecture, and you should focus on the service mesh container, sometimes called a sidecar: it implements all of your security controls. Just as a note, I intentionally left out the pod network and service network components that are part of each node, to keep the picture a little simpler. You can see in this picture that TLS is no longer managed by the application; instead, it's managed by the service mesh container. Additionally, that container has the RBAC in it, and it has all of your logging in it. What this really does is remove all of that work from the application development team and externalize it. It also gives you a very uniform implementation of your interface security, because all of your services can reuse the same service mesh container. It's straightforward to deploy with Istio and other service mesh implementations, because they typically include some type of internal certificate authority with automated provisioning, a user store for managing RBAC, and logging capabilities. Plus they typically integrate with a lot of open-source products; one example is Keycloak for your user management, which would also cover RBAC, and some type of ELK stack for log management.

All right, so you've got this great service mesh, you've worked on it in the lab, it's great, and now you go to run it in your production environment. So I've skipped a slide. The problem is that you're going to have to integrate that service mesh with your enterprise security platforms; you can't just use the out-of-the-box services. What does that entail? You need to start with your certificate management: you need to select the CA that manages your certificates. It might be a public CA, or it might be an internally managed private CA, which is pretty typical in an operator environment. In either case it is a CA with a root of trust, so you have to make sure that the CA roots and intermediates are installed within the service mesh container. Plus you have to make sure that you can automatically install an actual client certificate within the service mesh container, and to do that latter step you need some type of certificate management protocol, such as CMPv2 or ACME, integrated with the mesh so that you can automate your certificate enrollment.
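As a small sanity check on that certificate integration, here is a rough sketch of verifying that what a mesh ingress or sidecar actually presents chains up to the operator's root of trust. The hostname, port, and CA bundle path are placeholders, and in a mesh enforcing mutual TLS the handshake may also require a client certificate, so treat this as an illustration rather than a complete test.

```python
import socket
import ssl

# Placeholders: your mesh endpoint and your operator CA bundle (roots and intermediates).
MESH_ENDPOINT = ("mesh-ingress.example.com", 443)
OPERATOR_CA_BUNDLE = "/etc/certs/operator-ca-bundle.pem"

def presents_operator_chain(host: str, port: int, ca_bundle: str) -> bool:
    """True if the TLS handshake succeeds using only the operator CA bundle.

    If the mesh is still using its out-of-the-box self-signed CA instead of
    the operator CA, certificate verification (and this check) will fail.
    """
    ctx = ssl.create_default_context(cafile=ca_bundle)
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except (ssl.SSLError, OSError) as exc:
        print(f"handshake against {host}:{port} failed: {exc}")
        return False

if __name__ == "__main__":
    presents_operator_chain(*MESH_ENDPOINT, OPERATOR_CA_BUNDLE)
```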
And finally, you do need to make sure that this is all integrated with your corporate identity lifecycle management platform, or ILM platform, so that you don't inadvertently configure a certificate through some kind of rogue process.

Now, speaking of ILM: large companies, especially operators, typically have mature ILM and access control systems, so you're going to need to integrate with those, and that's your RBAC part. You need that integration to make sure that your RBAC rules are correctly managed and are not granting privileges to applications that shouldn't have them. In some cases your RBAC rules will remain resident within the service mesh implementation; in other cases they will be externalized to some type of centralized access management platform, and that requires another integration point. The good news is that ILM and access control integrations are typically via LDAP, so it's a straightforward integration.

The third consideration is integration with centralized logging. The service mesh is great because it simplifies the collection of transaction and event logs. However, when you're running a production network, you need to get all of those transactions out to your integrated log management platform, because that's where your operations people do all of their monitoring and alerting. Again, this is a pretty straightforward integration because it uses a standard protocol like Syslog; I'll show a tiny sketch of what that forwarding can look like in a moment.

And the last thing I want to bring up about the work you have to do isn't an integration: it's your planning, integration, and training. You've got to account for several months for that on your initial service mesh deployment. I also want to point out one thing that might be a bit of a gotcha with a service mesh in this environment: network functions have to expose both a management interface and their functional interfaces, and typically those two interfaces are exposed on different networks. So you have to make sure that your service mesh deployment accounts for that and can connect to different networks.
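Here is that sketch: a minimal example of shipping mesh access events to a central Syslog collector from Python. The collector address, facility, and the event format are assumptions, and in a real deployment you would more likely configure the mesh or a log shipper to do this forwarding rather than hand-rolled code.

```python
import logging
import logging.handlers

# Placeholder collector address; in practice this is your central log
# management platform (for example the Syslog input of an ELK stack).
SYSLOG_COLLECTOR = ("logs.example.com", 514)

def make_mesh_logger() -> logging.Logger:
    """Logger that forwards service mesh access/audit events over Syslog (UDP)."""
    handler = logging.handlers.SysLogHandler(
        address=SYSLOG_COLLECTOR,
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
    )
    handler.setFormatter(logging.Formatter("mesh-sidecar: %(message)s"))
    logger = logging.getLogger("mesh-access")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger

if __name__ == "__main__":
    log = make_mesh_logger()
    # Example access event as a sidecar might emit it.
    log.info("src=orders dst=payments method=POST path=/v1/charge status=200 mtls=true")
```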
I'm going to end with a proposal and an example of how you can take all of the technologies we've been talking about and use them to secure an O-RAN deployment in which the O-RAN Service Management and Orchestration, or SMO, is augmented by a managed application called an R-app. Just a little bit about R-apps: they can be developed by a CNF vendor, by a network operator, or even come out of an open-source community, and they represent a very powerful paradigm shift, because they allow an operator to add new capabilities to their O-RAN, or radio access network, independently of their CNF and NF management providers. However, that ability does introduce some new security risks. If we take Kubernetes, the service mesh, and some of the other techniques, those can really help you effectively isolate these managed applications and reduce the overall risk, and container signing can prevent malware from being included as well. As a side note, when you pull in one of these managed apps it's really important to be sure they actually perform as you expect, but that's outside the scope of this talk.

So the diagram on the right, or sorry, on the left, is the O-RAN community's proposed architecture, including an R-app in the non-real-time RIC, the RAN Intelligent Controller, which is part of the Service Management and Orchestration. I want to note that these R-apps can leverage artificial intelligence and machine learning techniques, along with other data and analytics from the RAN, and their purpose is to help manage the RAN. In this architecture on the left, you can see that the R-app is meant to communicate only via something called the R1 interface. Thus the R-app's access to the SMO, to RAN mediation and control, to data, and to any external machine learning or AI is always moderated by the non-real-time RIC. This architecture provides the foundation for the secure use of R-apps.

The diagram on the right is a proposal for how we isolate these R-apps by deploying each one within a service mesh, and that service mesh is only going to expose the R1 interface. Furthermore, the SMO and the non-real-time RIC will also each be isolated by a service mesh with well-defined interfaces. The interfaces between them are represented by the double arrows, and then they have interfaces by which they talk to the other pieces of the RAN. As you can see, the non-real-time RIC mediates all of the requests from the R-app, and the SMO mediates all of the requests from the non-real-time RIC, so you have a lot of layers of security in place.

Now, turning back to what else we expect from the R-app. First, it has to be packaged as a hardened container, following the CIS Docker benchmarks, and it needs to be signed by an authority that the operator trusts. Each managed R-app also has to be able to run within the hardened Kubernetes environment, which goes back to the Kubernetes benchmarks. Make sure that each R-app runs within its own cluster, separated from the non-real-time RIC, the SMO, and any other R-apps in your environment. And finally, the R-app's communication has to be restricted to only that R1 interface. This is very high level and there are going to be lots of details in getting it done, but the service mesh with these highly restricted interfaces, a mediating non-real-time RIC, the enforcement of the Docker and Kubernetes benchmarks, and container signing really provide important layers to safely augment the SMO functionality with a managed R-app.
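As one way to make the "R1 only" restriction checkable, here is a rough sketch that inspects the Kubernetes Services in the R-app's namespace and fails if anything other than the expected R1 port is exposed. The namespace name, the port number, and the idea of using Services as the check point are all assumptions for illustration; in practice you would also constrain this with mesh policy and network policy.

```python
import json
import subprocess

# Placeholders: the R-app's namespace and the port on which R1 is expected.
RAPP_NAMESPACE = "rapp-example"
R1_PORT = 9080

def exposed_ports(namespace: str) -> set[int]:
    """Collect every port exposed by Services in the given namespace."""
    out = subprocess.run(
        ["kubectl", "get", "services", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    ports = set()
    for svc in json.loads(out.stdout)["items"]:
        for p in svc["spec"].get("ports", []):
            ports.add(p["port"])
    return ports

if __name__ == "__main__":
    extra = exposed_ports(RAPP_NAMESPACE) - {R1_PORT}
    if extra:
        raise SystemExit(f"R-app exposes more than the R1 interface: ports {sorted(extra)}")
    print("only the expected R1 port is exposed")
```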
Thanks a lot. Samuli and I can now answer some questions. I think there is one question, Samuli, and I think it's for you. The question is: which features are missing from Notary v2, and which ones should we work on? Yeah, right. I actually typed one answer there. Basically, I don't have a very concrete or direct list of what is missing. Notary v2 is at quite an early stage, and there may still be somewhat different directions within the initiative, but one thing that came to my mind: the target solution, as it looks now and as I estimate it will turn out, could be a bit too complex for private registries, for example in a registry service provider environment. But I really don't have very concrete items to list here. And then one really quick thing: we need to get these slides uploaded somewhere; I think two different people have asked about that. Yes, we will get the slides uploaded. We actually have a new question as well, but so far it just says, hi, Samuli and Amy, so I don't know if there will be something more. Okay, we are anyway going to continue the conversation on the cloud native networking channel number two on the Slack workspace after this session, and we'll close here pretty soon. Okay, John has a question, maybe for you, Amy, about the SMOs; I don't know if you can still take it. On SMOs, let's see. The question was: an R-app might want to inject functional changes into the SMO, inject a CDA, CDS, or add a change to the policy framework; how will R1 work in this case? There's probably a longer answer than we should give here. I will continue trying to get onto the Slack channel, but John, I'll keep a copy of this.