Good afternoon, everybody. My name is Lin Sun. I'm here to talk about Cilium and Istio, pushing the boundaries of what's possible for zero trust. I know this is the last session of the day, so if you are a little bit tired, I understand. Feel free to stretch a little bit before we get started. A little bit of introduction about me. By the way, I'm standing on wheels on this speaking podium. There is an elevated step, so if I fall, please come up on stage to rescue me. I'll try not to fall; you can see there's a height difference here. I work for a small company called Solo.io, where I'm leading the open source organization. I'm one of the founding members of the Istio project, and I've made a lot of contributions. I'm also a Technical Oversight Committee member and Steering Committee member. I actually wrote two books about Istio: Istio Explained and, most recently, Istio Ambient Explained. So how many of you have heard of Istio before? Okay, how many of you have heard of the new thing we're doing in the community called Istio ambient? A few of you. Okay, cool. We're going to introduce that briefly. A little bit more about me: before I joined Solo about two years ago, I worked for a giant technology company called IBM. Right before I left IBM, I went to our corporate directory and took a screenshot of how many patents I had, which are in IBM's name, but I was a co-inventor. That was a little over 200. So that's a fun fact about me. A little bit more about our company, Solo.io. How many of you have actually heard of our company? Wow, thank you. It was a small company when I joined, like 25 people; I was something like the 25th employee, very cool. Right now we have about 160 to 170 people. It was founded by Idit. We're focusing on the API gateway and application network connectivity space.
One thing I really like about what our company does is we offer free trainings and certification tests through our academy program, where we provide a lab in a cloud environment, and you can run through the labs, do your quizzes, and get a badge. If you're interested in that, there's the QR code to scan, and it's completely free. With that, let's go over today's agenda. We're going to talk a little bit about CNI and Cilium first. Then we're going to dive into some background around service mesh and Istio. Then we're going to talk about best practices for implementing a zero trust architecture: what is zero trust, and how do you integrate Cilium and Istio together? And we're going to do show and tell through a demo. So that's the flow. Before we get started with Cilium, I'd like to talk about the Container Network Interface, which is CNI. How many of you are using a CNI today? Well, I would expect more than a few hands. CNI is the component between the container runtime and the network implementation. CNI plugins are called whenever you deploy a pod into Kubernetes, when your pod gets created or when your pod gets removed. The CNI plugins are responsible for setting up the network configuration for your pods, such as routes, and they're also responsible for assigning an IP address to your pod. Examples of CNIs out there: Cilium is one of the most popular ones, Calico as well, and different cloud providers typically have their own CNI. Most Kubernetes distros have their own CNI, and some support more than one. So that is CNI. Then let's jump into Cilium. So what is Cilium? Cilium is a cloud native solution for providing, securing, and observing network connectivity between workloads, typically running in Kubernetes, but it also supports workloads running on VMs and in other environments. What's most interesting about Cilium is that it uses a revolutionary technology called eBPF. So when a lot of people think about Cilium,
they think about eBPF, because that's really where Cilium initially shone compared with the other CNIs. So what is eBPF? How many of you know what eBPF is? All right, excellent. eBPF, the extended Berkeley Packet Filter, runs in the kernel space to process packets. What's nice about eBPF is it provides a sandbox environment that allows you to extend your kernel and run your program in that sandbox. It's designed for fast and safe processing, and it's supposed to scale much better than a user space program. If you look at the traditional iptables path without eBPF, for example, there's the node iptables. By the way, there are two iptables boxes here. The first one on the top is the node-level iptables. And typically, in the case of a service mesh with sidecars, there's also an iptables setup within the pod namespace to configure the traffic redirection between the container and the sidecar. So typically the network flows through the virtual ethernet between the pods and also through the iptables on the node. With eBPF, it kind of short-circuits that, right? Instead of going through iptables, it calls sandboxed programs inside the kernel directly. So it's a bit more efficient. What Cilium does is leverage this great functionality that eBPF and the kernel provide to let you connect your workloads through layer 3 networking and also the typical overlay networking model. First of all, Cilium implements the CNI, the Container Network Interface we talked about. It also has its own enhancements on top of the standard Kubernetes network policy: Cilium has the Cilium network policy. Cilium also provides a very nice visualization tool called Hubble, which allows you to visualize what's going on within your pods and your network.
You can also configure encryption. This is not enabled by default, but you can configure encryption with IPsec or WireGuard. So that's all possible. The architecture of Cilium is pretty straightforward. The Hubble relay serves the UI, the Hubble UI component. There is the Cilium CNI plugin that implements the CNI. The Cilium agent is deployed as a DaemonSet running on every single one of your Kubernetes nodes. And the Cilium operator sets up everything at install time and makes sure every other component is operating and running. Let's spend a second talking about network policy, because in order to use any CNI, you have to kind of know what network policy is. How many of you know what network policy is? Okay, so most of you already know network policy, excellent. In a nutshell, it controls traffic at layer 3 and layer 4. There are three components of a network policy, right? A pod selector, to select which pods your network policy applies to, and then typically an ingress policy and an egress policy, and those are optional, so you could have one or the other or both together. This is a simple example of a network policy applied in the default namespace. It applies to the pod that has the label role: db, and it has this particular ingress configuration to say: I'm allowing incoming traffic from these IP blocks, these namespaces, and the pods that have this particular label, on this particular TCP port, 6379. And it also has a similar block to control outgoing egress traffic. So network policy, I think, is pretty straightforward, so I'm not going to spend more time on it. One interesting thing about network policy is that if you actually go to the Kubernetes documentation and read what's intentionally not in network policy, you'll find it's only for layer 3 and layer 4, right? There's no cluster-wide support, and anything TLS-related is not really supported in the policy spec.
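As a rough sketch, the kind of policy just described looks something like this. The labels and CIDRs here are illustrative placeholders, not taken from the slide:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-ingress
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db                   # which pods this policy applies to
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - ipBlock:
            cidr: 172.17.0.0/16  # allow from this IP block
        - namespaceSelector:
            matchLabels:
              project: myproject # allow from these namespaces
        - podSelector:
            matchLabels:
              role: frontend     # allow from pods with this label
      ports:
        - protocol: TCP
          port: 6379             # only on TCP 6379
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.0.0/24
      ports:
        - protocol: TCP
          port: 5978
```

Note that everything here is layer 3 and layer 4: IP blocks, label selectors, protocols, and ports. There is nothing about HTTP methods, paths, or TLS, which is exactly the limitation discussed next.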
It doesn't allow you to do node-specific configuration in your policy, and it doesn't support deny policies. So there are a bunch of limitations. What Cilium brings to the table is: hey, on top of the network policy, which we love, we also provide our own CRD called CiliumNetworkPolicy, which provides enhanced ingress and egress policies and also supports deny policies, and then you can also use the observability UI to visualize network flows and troubleshoot network problems between your pods. So that's a quick rundown of CNI, with Cilium as one of the most popular examples of a CNI out there. Let's move on to talk about service mesh. How many of you knew what a service mesh is before attending this session? Excellent. In a nutshell, a service mesh is a programmable framework that allows you to connect, secure, and observe microservices. Typically, a service mesh has a sidecar architecture, right? When your application A calls application B, the traffic goes through the sidecar, which captures all the incoming and outgoing traffic and mediates the traffic between your components, your applications. Then you, as a platform engineer or operator, program what the sidecars need to do through the service mesh control plane, typically using declarative YAMLs. For example, in Istio, users deploy Istio policies to the Kubernetes cluster, and the Istio control plane watches what the user or operator deploys and correspondingly programs the sidecars to mediate the traffic. This is the CNCF landscape. I don't know if you've noticed, but Istio is a CNCF graduated project now. It's certainly the most popular service mesh out there; there are thousands of production customers referenced on Istio.io. Last year we started the donation to CNCF as an incubating project, and recently we graduated. And what's really interesting in Istio is ambient service mesh, which was launched in 2022.
Essentially, it proposes a new deployment model for the data plane that allows you to run your application without sidecars. So let's talk about ambient mesh briefly. This is a new architecture we are working on, where the proxy actually runs outside of your application pod. What's the benefit of that? When you have the proxy running outside of your pod, then the moment you include your application in the service mesh, you don't have to restart your application. And whenever the proxy has a CVE, you don't have to restart your application pod. So that's really cool. From a resource utilization perspective, you could potentially have hundreds of pods on the same node sharing the same proxy. One thing Istio does here that's really interesting is we separate layer 4 and layer 7. For layer 4, we believe the CVE attack surface is very small because it's focused on very limited functions. So we have this multi-tenant proxy, which we call the zero-trust tunnel, or ztunnel, which is the orange box here that mediates the traffic for the entire node, regardless of whether you are running application A or B or C. You might be wondering, what about layer 7 traffic, right? Envoy, for example, the most popular proxy out there, is not designed for multi-tenancy. In that case, how do you mediate layer 7 traffic so that you minimize the blast radius and one noisy tenant doesn't impact the other tenants? With that design requirement in mind, we designed the waypoint proxy, which controls traffic for a particular tenant scope that you feel comfortable with. That tenant scope could be a namespace or could be a service account, whichever you feel comfortable with. So you can deploy a waypoint proxy, which is an optional component, only needed if you need layer 7 processing.
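For context, in the ambient releases of that era a waypoint was declared as a Kubernetes Gateway API resource with the `istio-waypoint` gateway class; istioctl could generate one for you. A rough sketch of what that looked like, with the caveat that the exact annotations and scoping mechanism have changed across Istio releases:

```yaml
# Sketch of a waypoint proxy declaration (ambient mode, circa Istio 1.19).
# The name is illustrative; check your Istio version's docs for the
# current way to scope a waypoint to a namespace or service account.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: namespace-waypoint
  namespace: default
spec:
  gatewayClassName: istio-waypoint   # tells Istio this is a waypoint
  listeners:
    - name: mesh
      port: 15008                    # HBONE tunnel port used by the mesh
      protocol: HBONE
```

The point of the resource is simply to say "deploy a layer 7 proxy scoped to this tenant"; the ztunnel handles layer 4 for the whole node without any per-tenant configuration.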
You deploy that waypoint proxy to control the traffic coming into your pods. All right, so this is the announcement we made for ambient mesh. If you're interested in learning a little more about ambient, I encourage you to check out Istio.io. There's a blog about ambient introducing the new data plane mode for Istio without sidecars, and I'm actually one of the co-authors of that blog. So that is service mesh, and Istio, the most popular service mesh out there. Now let's talk about zero trust a little bit, and how these two connect to each other. Traditional security is typically focused on edge security. What does that mean? You think your edge is secure, right? Everything inside the perimeter, you tend to think it's safe. So you wrap everything up with a firewall, and if everything inside is maybe a monolith, it's not a big problem in that case. You think it's safe, right? A privileged user can get through a VPN and access what's in your data center. But a hacker, if they actually get past the firewall, could potentially access your applications. This may not be a huge deal if you're running a big monolithic application. However, as people move to the cloud and to microservices, this model based on the edge, the perimeter, as your security boundary really brings increased complexity and a more valuable attack surface, right? As you break things into microservices, every single connection could cause problems. If you trust everything, and a hacker gets past the firewall, then they essentially get into every single communication between your services, which is not ideal. So that's the increased attack surface. So what is zero trust, essentially? How many of you actually know what zero trust is? How many of you are implementing zero trust in your organization? Excellent. So by default, you don't trust anything. If I say anything wrong, please correct me.
And that applies inside or outside of your network, right? You verify everything that is trying to connect to a system before granting access. You want to give the least privilege: deny everything by default and only grant privileges when needed. That's what zero trust is. You want to make sure you can verify the identity, and then you grant least-privilege access. What's also interesting is defense in depth. How many of you have heard of the concept of defense in depth? Excellent. It's a concept commonly used in security where you layer multiple defenses together. In this case, for example, a CNI provides layer 3 and layer 4, and a service mesh like Istio provides layer 7. So if you have a security vulnerability, let's say in Istio, maybe in your Envoy proxy, well, your layer 7 may be a little weak, but potentially your layer 3 and layer 4 can pick up the slack, right? This is where layering, defense in depth, can really help. In this example, your sidecar proxy may be compromised, or your application container could be compromised, or maybe you have a security vulnerability in the cluster. In those cases, if one layer fails, you fall back to the other layer, which can still enforce your policies. Another example: Istio also allows you to control external service access. For example, your microservice needs to reach out to Amazon or Google services, or that MongoDB service. Istio provides two modes here: you can allow any external access, or you can configure registry-only, where only the services in the registry can be accessed. But both of these have security problems. Allow-any obviously has security problems. The way registry-only works is that we help you program the Envoy config to allow access to the external service.
However, if a hacker exploits one of the Envoy CVEs and gets into the Envoy proxy, they could potentially reprogram the Envoy proxy and allow access, so that's not really very secure at all. So what we recommend, with defense in depth in this case, is to use network policy. For instance, we could use network policy to say your application A is only allowed to call pods within the cluster, so there's no access outside of the cluster. Then whenever you need to call an external service, in this case weather.com, you always have to go through the egress gateway. And from the egress gateway, you can have the gateway initiate TLS origination for you. So instead of calling http://weather.com, the gateway would initiate https://weather.com for you. In this case, you would layer the network policy and then layer the Istio policy on top of it to implement this secure egress control architecture. With that, I'm going to jump into the demo. The first thing I'm going to do is start preparing the environment. What this does is deploy a virtual machine in the cloud. We contract a third-party company called Instruqt; we're basically paying this company to provide a cloud environment for us. As the environment is standing up, I'm going to go through the demo scenario briefly with you, so you know what I'm going to demo. First of all, we're going to deploy a Kubernetes cluster after the environment is up and running. Then we're going to install Cilium, and then we're going to install Istio, just making sure we have all the key infrastructure there to help us build this zero trust architecture with defense in depth. And then we're going to deploy some test applications. Really simple applications.
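The egress pattern just described roughly follows Istio's standard egress-gateway TLS origination setup. A condensed sketch, with weather.com as the slide's example host; a Gateway resource binding the egress gateway's port 80 listener with ISTIO_MUTUAL is also required but omitted here for brevity:

```yaml
# Register the external host in Istio's service registry
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: weather
spec:
  hosts: ["weather.com"]
  ports:
    - number: 80
      name: http
      protocol: HTTP
    - number: 443
      name: https
      protocol: TLS
  resolution: DNS
---
# Route in-mesh traffic for weather.com to the egress gateway,
# then from the gateway out to the real host on port 443
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: weather-via-egress
spec:
  hosts: ["weather.com"]
  gateways: ["mesh", "istio-egressgateway"]
  http:
    - match:
        - gateways: ["mesh"]
          port: 80
      route:
        - destination:
            host: istio-egressgateway.istio-system.svc.cluster.local
            port: { number: 80 }
    - match:
        - gateways: ["istio-egressgateway"]
          port: 80
      route:
        - destination:
            host: weather.com
            port: { number: 443 }
---
# TLS origination: the gateway receives plain HTTP but speaks HTTPS upstream
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: weather-tls-origination
spec:
  host: weather.com
  trafficPolicy:
    tls:
      mode: SIMPLE   # originate TLS toward weather.com:443
```

Combined with a network policy that blocks direct pod egress, the egress gateway becomes the only path out of the cluster, which is the defense-in-depth point being made.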
The sleep application is essentially just a curl container that allows you to run commands, and the httpbin application is something you can call that prints out some responses. There's also the external site, httpbin.org, which you can reach and which gives you responses. It's basically useful for testing, so that's what we're using. The first thing we're going to do is deploy a network policy for our environment. That's not enough, though, because it helps control some things, but it doesn't enforce at layer 7, and it doesn't let you control the egress traffic. So then we're going to layer Istio on top by deploying Istio policies. We're going to deploy a zero trust policy to deny everything. And then we're going to configure the sleep pod so that whenever it calls httpbin.org, the traffic is routed to the egress gateway, and the egress gateway performs TLS origination on behalf of the sleep pod. All right, let's see if we have the environment up and running. Fingers crossed; I think it's running now, and I'm hoping the wifi here is good. It's perfect. I feel like I'm missing the command prompt. So, oops. I thought it would be easier, but it seems it's still transferring the data. In the meanwhile, does anybody have any questions? As I wait, maybe I should open my hotspot for this, because I don't think the network is that great. All right. I'm going to hit the refresh button one more time, and I'm going to get on my mobile phone to see if it's better. Just bear with me as I refresh this. Doesn't look too good. Let me exit out quickly and see if it's better. All right, I did get the command prompt here. So what I'm going to do is deploy a Kubernetes cluster using kind. This is just a really simple test cluster.
Basically what I have is three worker nodes, and one thing you may find interesting is that I actually disabled the default CNI; the default CNI in kind is kindnet. The reason I disabled it is because I want to install Cilium on it, right? I don't want two CNIs. Sometimes when you have multiple CNIs, if they don't chain correctly, it can actually cause trouble. So that's why I disabled it. Right now it's standing up the kind cluster for me. The next thing I'm going to do, and let's walk through it quickly as we stand up our cluster, is download the Cilium CLI. The Cilium install instructions are pretty straightforward. The only thing I would highlight is that when I installed Cilium, I found out the CNI plugin actually has a different version than the Cilium control plane and agent. That was a little bit confusing when I tried it. So when you install it, as long as you understand the versions are not necessarily the same, it should be pretty straightforward. Right now I'm just going to remove the binary that I installed, so you can see I have the CNI. Yeah, this is what got me when I first tried to install Cilium: the CNI plugin is at something like version 0.5 or 0.15. Now, what's happening next? By the way, can you see? Is the font big enough for people in the back? It's good, perfect, thank you. So we're going to install Cilium. You can see I'm installing the latest version of Cilium, which is 1.14. I'm disabling the layer 7 proxy because I don't really need it, I'm installing into kube-system, and I'm setting the IPAM mode to Kubernetes. And I really like the cilium status command; it shows one of my nodes is up and running, I believe. You can see it takes a little bit of time, like 40 seconds, to set up Cilium typically. So it's going to take a little bit of time.
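The cluster setup just described can be expressed as a kind config along these lines. The file name is illustrative; the node counts and disabled default CNI match what was described:

```yaml
# kind-config.yaml: 1 control plane + 3 workers, kindnet disabled
# so that Cilium can be installed as the CNI instead
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true   # don't deploy kindnet
nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker
```

You would create the cluster with `kind create cluster --config kind-config.yaml`; until a CNI is installed, the nodes will report NotReady, which is expected.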
In the meanwhile, I'm going to go ahead and enable Hubble. Remember, we talked about Hubble, the observability UI provided by Cilium that really gives you visibility into what's going on in your networking, the pods, and the communication between the pods. Now if we run the cilium status command, everything is good, awesome. The next thing we're going to do is install Istio. We're going to first download the Istio 1.19 binary, and then we're just going to install the demo profile, because it has everything I need, including the egress gateway. By default we don't deploy an egress gateway, because we don't know if you need external traffic control, but the demo profile has both ingress and egress gateways, so it has everything I need. One thing about the istioctl command: it runs a little slower than the cilium install command, and the reason is that when istioctl install finishes, everything is actually up and running. So when you see the checkmarks... oh sorry, I think I missed a command. Let me go ahead and install the samples as well. The reason I install the samples is because they have Kiali, Grafana, and all these other components that help me show visibility. If I do kubectl get pods, you can see all my Istio components are running, and right now it's trying to bring up the add-ons, which are the observability components. All right, so that's essentially this particular lab, just to bring everything up at the start. With that, I'm going to move forward to the next section, where we'll talk about deploying the sample applications and applying network policy on top. Fingers crossed the network is still with me here; you never know when the network will stop working. Yeah. All right, so first we're going to wait for the command prompt. Then the first thing we're going to do is deploy a Cilium network policy, named lockdown-namespace-egress.
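For reference, the istioctl demo-profile install mentioned above can equivalently be declared as an IstioOperator resource; a minimal sketch (the metadata name is illustrative):

```yaml
# Equivalent of `istioctl install --set profile=demo`:
# istiod plus both ingress and egress gateways
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: demo-install
  namespace: istio-system
spec:
  profile: demo
```

The demo profile is intended for labs and demos like this one; for production you would normally start from the default profile and enable the egress gateway explicitly.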
Inside this policy, we essentially say: for the default namespace, whichever endpoints match the default namespace label, we're going to deny access to "world", which means anything outside of your cluster, and we do want to allow access inside the cluster, which is why you have both the egress block and the egressDeny block. Let me click refresh, and hopefully it will come up. I actually stumbled a little on this when I first tried a deny policy with Cilium, because it was not obvious to me that you have to specify both egress and egressDeny. Essentially, you have to explicitly state what's allowed and what's denied. Okay, it looks like it's still transferring the data. I'm going to hit refresh and see if that helps. Fingers crossed. I don't know why I have to refresh again. All right, it looks like it's running. So let's go ahead and deploy the Cilium network policy, which applies to the default namespace, as we just discussed. The next thing we're going to do is deploy the sample sleep and httpbin applications, and let's make sure, oops, sorry, let's make sure the applications are running; wait until they are running. Next, we're going to try to access httpbin from the sleep pod. That works; that's expected. Now remember the deny policy we set, right? If you curl httpbin.org from sleep, remember we set the deny policy to not allow anything outside of the cluster, which is what "world" means in a Cilium network policy. So this actually hangs. What I'm going to do is Ctrl-C to exit out of it. We talked about Hubble, right? So let's set up port forwarding, forwarding port 12000 to the Hubble UI. Now if I launch the Hubble UI, I'm going to hit the refresh button; fingers crossed it's working.
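A sketch of what that demo policy looks like; the name matches the one used on stage, while the exact selectors are assumptions, since the full manifest wasn't shown:

```yaml
# Allow egress inside the cluster, explicitly deny egress to "world"
# (Cilium's entity for everything outside the cluster)
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: lockdown-namespace-egress
  namespace: default
spec:
  endpointSelector: {}     # all pods in the default namespace
  egress:
    - toEntities:
        - cluster          # allowed: anything inside the cluster
  egressDeny:
    - toEntities:
        - world            # denied: anything outside the cluster
```

This illustrates the gotcha mentioned above: the egress allow block and the egressDeny block both need to be spelled out for the intended allow-inside/deny-outside behavior.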
When I click on the sleep pod in the default namespace, you can see that the Cilium CNI enforced the network policy and dropped all the packets going outside the cluster, which is captured in Hubble. So very nice, right? So we've applied the layer 3/layer 4 policy through the Cilium network policy, and we said: hey, sleep and httpbin in the default namespace, you can't access anything outside of the cluster. The next thing we're going to do is apply some layer 7 policy. Fingers crossed the next lab is going to load so we can do the demo. All right, looks like it's going to load; hopefully this time we get a little bit lucky. So now we are going to add our applications to the mesh. With ambient, I wouldn't need to restart the pods, but unfortunately we're still working on the Cilium integration with ambient, so I will demo using sidecars for now. What I'm doing is labeling the default namespace for injection, and then doing a rolling restart of the pods. Assuming my pods are running, you can see they're coming up. This is an extra step we're hoping to eliminate with ambient; you wouldn't need to restart the pods, so ambient really simplifies operations. It's just that right now, with the Cilium integration with ambient, we don't have Cilium network policy enforcement; there are some technical challenges we're working through in the community. All right, our pods are running, so let's run some tests. You can see the first test continues to work, and the only difference is that with the Istio sidecar, we propagate some of the headers we need for tracing, which is why you see the x-b3 headers. The second command fails, and it actually fails with a much better error, because the sidecar proxy knows it can't connect to the upstream, which is httpbin.org, within a certain timeframe, and it times out, right? So what we're going to do next is enforce our zero trust, right?
First, across the entire mesh, we're going to enable strict mutual TLS, so any applications in the Kubernetes cluster have to have strict mutual TLS before they can talk to each other. Second, remember our zero trust principle: we said we're going to deny everything and allow nothing by default, right? So this is essentially an Istio authorization policy saying: I don't want to allow anything. And now we're going to gradually enable least-privilege access, right? We're going to enable access from sleep to httpbin, and only allow GET access. So let's go ahead and run some tests. Okay, the first command still works, right? Even with zero trust in place, it still works because we do allow GET access. The second command uses the DELETE method, and in this case it fails with a 405, method not allowed, because the Istio policy was enforcing it. And the third command continues to fail, because we haven't configured anything to allow sleep to reach outside of the cluster. So let's go ahead and control the egress traffic. The first thing we're going to do is configure that any request to httpbin.org gets routed to the Istio egress gateway, and that traffic needs to be Istio mutual TLS; then from the gateway, we're going to originate a TLS connection to call https://httpbin.org. The other thing is that because httpbin.org is not a service in Kubernetes, in order to register the external service in Istio, you register it with a ServiceEntry. And last but not least, let's configure the egress gateway to say: hey, I want the traffic coming to me on port 80 to be Istio mutual TLS. That ensures the traffic from sleep to the egress gateway, sorry, from sleep to the egress gateway, uses mutual TLS. With that, let's go ahead and run the same command.
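The three zero-trust steps just walked through (mesh-wide strict mTLS, deny-all, then a narrow allow) correspond to Istio resources along these lines; the policy names and the sleep service account path are illustrative:

```yaml
# 1. Mesh-wide strict mutual TLS (root namespace = applies mesh-wide)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
---
# 2. Deny all: an empty AuthorizationPolicy spec allows nothing
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: istio-system
spec: {}
---
# 3. Least privilege: only GET from the sleep identity to httpbin
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-sleep-get
  namespace: default
spec:
  selector:
    matchLabels:
      app: httpbin
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - cluster.local/ns/default/sa/sleep
      to:
        - operation:
            methods: ["GET"]
```

Note that the allow rule keys off the workload's SPIFFE identity (its service account), which is only trustworthy because step 1 forces mutual TLS; that's why the ordering matters.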
You can see now we can access httpbin.org/headers, and everything is good. Istio currently has a bug around removing these headers; it's already fixed, but we haven't had a build with it yet, so I couldn't demo that. So let's generate some traffic: a bunch of traffic from sleep to httpbin.org and also from sleep to the httpbin service. Then let's also enable access to Grafana and Kiali. We talked about Hubble for observability, right? In Istio, we have our own dashboards, which let you view layer 7 data, a little more than what you can see at the network layer. For example, I can enable the traffic animation in Kiali and see the traffic flowing, and I can see the mutual TLS icon. If I click on any of these links here, I can get HTTP metrics like response time and request time. We also have nice dashboards provided by Istio. If you go to the Istio dashboard here, you can see the Istio mesh dashboard, with all these metrics available for you, like request volume, P90 and P99 latency, and success rate. All of this is available because you enrolled your applications into the mesh. All right, I think that's the end of my demo, so let me wrap up; we're getting close to the end of the session as well. What we talked about is zero trust: trust nothing by default, right? And defense in depth really helps you enhance your security posture, with network policy from Cilium or another CNI and layer 7 policy from a service mesh like Istio. We definitely highly recommend you layer them together. It's part of the best practices we recommend in the Istio community, and I think a lot of security companies recommend it too. So with that, I'd like to see if you have any questions. Do we have time for questions or no? Yes? Okay, yeah. Does anyone have any questions? Did I put you all to sleep? No? All right, thank you. I'm glad you're here.
So how problematic is it to transfer from a default CNI to, let's say, Istio at this higher CNI level? Can you repeat the question? How complicated is it to migrate from a default CNI to Istio? Okay, that's a great question. Like I was mentioning, Istio is not a CNI; we don't implement the CNI spec. We don't participate in your pod's IP address configuration as your pod gets added, and we don't configure the pod's networking in that sense. Istio is designed to be complementary to your CNI. Istio's sidecar today works with any CNI, and we're trying to achieve the same with the new ambient, sidecarless architecture; it's our intention for it to be compatible with any other CNI as well. In fact, we recommend you layer your CNI with a service mesh so you can achieve defense in depth. Does that answer your question? Well, my question was more: if you have, let's say, a Kubernetes cluster with a default setup and default networking, how complicated is it to implement a service mesh with a new CNI and Istio? Okay, so the question is that not only do you want to implement Istio, you're also interested in maybe migrating from your default CNI to a preferred CNI. That's it. Okay, I would recommend you do it step by step. First, I would ask why the default CNI doesn't work for you, right? You want to look at that first and make sure you have a reason to move to a different CNI, because migrating CNIs can be a little tricky, with potential downtime. The way a CNI migration normally works is you cordon a node, move the pods running on that particular node to other nodes, update or migrate the CNI on that node, and then do it one node at a time. So it can be a bit complicated, and you have to plan it very carefully. So definitely ask yourself that question.
Do you really need to migrate to a different CNI? And then for the service mesh, I would also ask: what benefits are you trying to get out of the service mesh? Which particular features do you need? Figure that out before you include your services in the mesh, and I would recommend you do it gradually, right? Pick the feature that's most important for you and start right there as you enroll your applications into your service mesh. Thank you. Yeah, thanks for that great question. Any other questions? Well, if not, thank you so much for attending, and enjoy the rest of OSS Summit and beautiful Bilbao. Thank you so much.