Hello everyone and welcome to our session, the tale of meshing Kafka. My name is Ariel Schupper. At the time of the recording I'm at Portshift, but by the time you hear this session I'm going to be part of Cisco. Hello from me as well. I'm very happy to be part of this event today. Thank you, Nicholas. What are we going to talk to you about today? We'll tell you who we are in a little more detail. We'll talk about Kafka usage in general, and in Kubernetes specifically. We'll talk about the security architecture, or how we secure Kafka clusters. We'll talk about Kafka and Istio, or any type of service mesh. And then Nicholas will talk about Marlowe's architecture and how they secure their deployment, and he will share a great demo with us. A little bit about myself, who am I? My name is Ariel Schupper. I'm a principal product manager at Cisco. I used to be VP of product management at Portshift, a cloud native security vendor. Before working at Portshift, I was head of serverless solutions at Aqua Security, and before that I led the cloud security solutions at Check Point Software Technologies. On the open source side, I'm part of the Kubei project that we created at Portshift, and also a member of the Istio security working group. A little bit about Portshift: we were founded in 2018, then acquired by Cisco at the beginning of November, the beginning of this month. We focus on a cloud native security platform, and more specifically on integrating it with service mesh, any type of service mesh. And that's it about me. I'm Nicholas Mussolos. I've worked in telecommunications, currency exchange, and commercial companies, and now I'm glad to be part of two organizations as a DevOps engineer. The first one is Marlowe Navigation, and I'm also trying to start my own startup business, RTX Direct, which provides cloud native services.
Marlowe is a global commercial ship management company with offices in over 10 countries, over 1,000 shore-based employees, and over 13,000 employees on board at any given time. So let's go ahead with Ariel. Thank you, Nicholas. Let's get to the details and talk a little bit about microservices communication. So we are moving to microservices, and microservices usually look like this: when we work on distributed applications, we start breaking down the monolith into small components, and we want each of them to communicate with the others. So we create a nice communication schema. But eventually, when the cluster grows and there are more elements, it can quickly turn into that famous diagram showing the Lyft microservices communication before they started the Envoy project and before they moved to service mesh. The idea is that when we have lots of microservices and a lot of communication, and I'm not going to debate whether or not service mesh is the right way to solve it, but with lots of microservices communicating with each other, we can use a mesh-like synchronous mechanism where everyone can talk to everyone synchronously. But we can also use asynchronous communication, or message passing, for which there are multiple options; Apache Kafka is just a popular one, but there are many other options for event streaming between services, which is not synchronous. Now, we'd like to talk about Apache Kafka as a popular event streaming mechanism. Apache Kafka became a very popular tool: it was donated by LinkedIn to the open source community in 2011 as a message queue and quickly turned into an event streaming platform. So we're not just performing simple computation actions on messages; we can run multiple actions on multiple messages simultaneously, and we can also maintain persistency by keeping the messages.
There are many benefits to Apache Kafka and to the way it is used today for asynchronous communication. Now, our focus today, and Nicholas will describe it a little more when he talks about architecture, is on an open source distribution of Kafka for Kubernetes called Strimzi. Strimzi's goal is to simplify the process of running Kafka in Kubernetes. It provides the relevant container images, and Strimzi also created dedicated operators that can run the Kafka cluster, simplify these operations, and make them more cloud native by automating a lot of those things. The Kafka components run as a cluster, for availability purposes, and as I said, Nicholas will explain more about the benefits of using it. Now, when we talk about Kafka and the different security challenges, how do we secure this environment? It's important to know that Kafka, by default, lacks a certain level of security. The default configuration allows any user to read, write, publish, or subscribe to any or all of the data. So you can publish or subscribe to any topic and get full exposure. The communication is in plain text: if you don't specifically go and configure TLS between your users and the brokers, the communication is in plain text, so anyone who can intercept it in the cluster gets exposed to all the communication, with no need to decrypt anything. Users can delete data. And in some distributions, secrets or credentials are stored in plain text, so you want to restrict who can access the ZooKeeper location. So by default there are many security challenges, but given the maturity of the product, a lot of them can be taken care of, can be handled.
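To make the plaintext-by-default point concrete, here is a small Python sketch contrasting a default client configuration with a hardened one. The parameter names follow the conventions of common Python Kafka clients, and the file path and credentials are placeholders invented for the example, not values from the talk.

```python
def default_config(bootstrap):
    # Out of the box: plaintext transport and no authentication at all.
    # Anyone who can intercept cluster traffic sees every message.
    return {
        "bootstrap_servers": bootstrap,
        "security_protocol": "PLAINTEXT",
    }


def hardened_config(bootstrap):
    # TLS for encryption in transit, plus SASL/SCRAM for user authentication.
    # Paths and credentials below are placeholders.
    return {
        "bootstrap_servers": bootstrap,
        "security_protocol": "SASL_SSL",
        "ssl_cafile": "/etc/kafka/ca.crt",
        "sasl_mechanism": "SCRAM-SHA-512",
        "sasl_plain_username": "my-user",
        "sasl_plain_password": "***",
    }
```

The point of the sketch is simply that nothing in the default dictionary protects the data; every security property has to be opted into explicitly.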
More specifically, talking about Strimzi Kafka and looking at the main security building blocks: for authentication, user authentication can be managed. Kafka listeners use authentication, so they can ensure secure client connections. Strimzi supports different authentication options, and there's a dedicated User Operator that can simplify some of them. In practice not all of them are always in use, but at least the foundation, the infrastructure to use them, is available. For authorization, the native authorization in Kafka uses the simple ACL authorizer. It's based on the authentication of users: once users are authenticated and identified, you can define access control lists specifying which user can access which resource. Encryption in Strimzi is with TLS, so the communication is always encrypted between the main control elements, like between the brokers, the ZooKeeper nodes, the operator, and the exporter. Encrypting the user traffic, the user communication with the brokers, is something that requires the user's intervention, enabling the TLS option. Now, talking about security in Kubernetes: Kubernetes offers a rich set of security mechanisms. There's a lot of investment and a lot of work there, also coming from the maturity of Kubernetes and its deployments. There are multiple options for security, for deployments, for services, for policies, but also for authentication and authorization, and a powerful role-based access control. And when a service mesh is used, it provides flexible options to offload a lot of the authentication and authorization decisions; the encryption is a completely different experience when using a service mesh. So all in all, the security posture of the cluster is much higher.
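As a rough illustration of the ACL model just described, here is a toy Python sketch of default-deny, entry-matching authorization. It mimics the idea behind Kafka's simple ACL authorizer, but the rule set, names, and helper are invented for the example and are not real Kafka code.

```python
from collections import namedtuple

# One access-control entry: an authenticated principal may perform one
# operation on one named resource.
Ace = namedtuple("Ace", ["principal", "resource_type", "resource_name", "operation"])

# Example rule set (placeholder principals and topics).
ACLS = [
    Ace("User:alice", "topic", "payments", "READ"),
    Ace("User:alice", "topic", "payments", "WRITE"),
    Ace("User:bob", "topic", "payments", "READ"),
]


def is_authorized(principal, resource_type, resource_name, operation):
    """Allow only if an explicit ACL entry matches; everything else is denied."""
    return any(
        ace.principal == principal
        and ace.resource_type == resource_type
        and ace.resource_name == resource_name
        and ace.operation == operation
        for ace in ACLS
    )
```

The key property, matching the talk, is that authorization is only meaningful after authentication: every rule is keyed on a verified principal, and anything without a matching entry is denied.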
Now, this leads us to think that when we deploy Kafka in Kubernetes and with Istio, we'll be in a much better place, because even though Kafka's own security controls don't match, if we use it with Kubernetes we can get the benefit of all the existing tools. Istio especially could contribute seamlessly to the security level, because it works by offloading the traffic, without touching, modifying, or changing anything in the workload. So this would make us think the security status can be much better. Now, what we discovered, also together with Marlowe, is that Istio and Kafka are not the best mesh, or I would say they are not a match made in heaven. And it stems from multiple reasons. When you come to think about it, Kafka and ZooKeeper were both designed to have all the required resources available at startup time. In current Kubernetes versions, a sidecar like the Envoy proxy can still become available only after the pod is already running. The result, for example in ZooKeeper, can be unstable operation of the quorum members formed with the brokers: if a broker tries to communicate with ZooKeeper and Envoy is not ready or not available, the broker will crash. We also saw that a ZooKeeper installation binds to the pod IP, but Envoy uses localhost for forwarding traffic, and the result can be connection refused errors and some other challenges that we had to work around. Unfortunately, this leaves us in a situation where, on the one hand, we do deploy Kafka and could benefit from all those great tools, but the reality is that they cannot be used. Now, the way to move forward and really secure Kafka and benefit from what Kubernetes and service mesh can offer is to make some small changes; some are already in the works, and some can still be done. So what are the requirements?
What do we need in order to make Kafka and Istio a perfect match? In order to create what we call a Kafka-friendly Istio, or a Kafka-friendly service mesh, something that will let us benefit from all the security controls in an automated and smooth way for existing users, but more importantly also for future workloads which are deployed and want to connect to the Kafka cluster, there are two critical elements that need to be fixed to reach this Kafka-friendly level. One of them, which is already in the works, was supposed to be part of Kubernetes 1.18 or 1.19, but I believe it will definitely be included in the next version: making the sidecar a first-class citizen in Kubernetes. Now, what do I mean by first-class citizen? Making sure that when a sidecar container like Envoy is deployed with every pod, that sidecar is up before the regular containers come up, and that it is shut down only after all the other containers have terminated. This will assure that the challenges we discussed on the previous slide will not happen, because Envoy will always be there, so both the ZooKeepers and the brokers can establish their communication without worrying about it. But that's not all. There is a little bit more tweaking required, and thanks to the flexibility of Envoy and Istio, this is something that has already been possible since Istio 1.5. And that is adding special detection for Kafka traffic, either by enhancing the current Kafka filter, because today Envoy supports a filter for detecting Kafka traffic but slightly more needs to be done, or by creating a new filter or a new proxy for Envoy, which, again, post 1.5 is much simpler to do.
Now, as I said before, there are infrastructure issues which are being handled in Kubernetes, already part of the release plan and probably going to be included in the next version, but there are also some modifications to Envoy which are required to make it Kafka-friendly. If we achieve that, we can run the Envoy proxy alongside all Kafka elements, whether brokers, ZooKeepers, subscribers, or producers. We can use a mechanism similar to what we have today with HTTP: a proxy that allows us to parse the Kafka stream and send it for authentication and authorization, so we can authorize any request. This can be done either by enhancing the current filter or by using the new WebAssembly toolkit, which lets us create customized filters for Envoy almost on demand. And then we can use Envoy to invoke authorization for every message based on its layer 7 properties. We can authorize it, we can create policies that do the authorization, and we can of course cache the results, so it's not going to happen for every request, only for a new connection or a new service. This will allow us to benefit from a full authorization mechanism that lets users define the rules, and Envoy can enforce them, just like we do today with our HTTP communication. Now, Envoy can also pass authentication information to the Kafka authorizers. And Envoy can seamlessly encrypt and manage all the certificates of the cluster, of the brokers, of the users; there's no need to work on TLS certificates for the users. We can also take it a step further and handle all the ingress and egress communication, managing it with the service mesh policies. So with the right filters and some limited help, we can really reach a very Kafka-friendly Istio that will allow us to benefit from all the inherent security that's included.
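The caching idea described above, authorize once for a new connection or identity and reuse the decision afterwards, can be sketched in a few lines of Python. The remote authorizer here is a stand-in callable, not a real Envoy or Istio API, and the tuple key is a simplification of what a real proxy would cache on.

```python
class CachingAuthorizer:
    """Sketch: cache authorization decisions per (identity, topic, operation)."""

    def __init__(self, remote_check):
        self.remote_check = remote_check  # callable(identity, topic, op) -> bool
        self.cache = {}
        self.remote_calls = 0  # exposed so the caching effect is observable

    def allow(self, identity, topic, operation):
        key = (identity, topic, operation)
        if key not in self.cache:
            # Only a previously unseen request reaches the remote authorizer.
            self.remote_calls += 1
            self.cache[key] = self.remote_check(identity, topic, operation)
        return self.cache[key]
```

A quick sanity check of the design: two identical requests trigger only one remote call, which is exactly what keeps per-message authorization from hurting throughput.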
So how is the structure going to look? We're going to inject the Istio or Envoy proxy into the consumer pods, the producer pods, ZooKeeper, and the Kafka broker. The Istio control plane will make sure that everything is encrypted; Envoy will encrypt the traffic, get the certificates and of course rotate them when needed, and encrypt the traffic as soon as it leaves the consumer or producer containers. When it reaches the broker, the broker will forward the traffic for authorization, so it can verify the authorization results. Every new authorization request will be verified and cached, so this can be maintained and the traffic will not impact performance. So what's it going to look like? Instead of using the regular authentication, we can use Istio-based authentication: the Envoy proxy will extract the application or microservice identity and forward it to the authorizer, just like it does today, only the decision will be made locally and maintained locally. Authorization will work the same, so we can authorize based on users and specific attributes and use the Envoy proxy for that as well. We get much more flexible, more granular options, setting the rules and caching the results. Encryption can be spread across the entire cluster: instead of keeping it only for the control elements, we can run it across the whole cluster and encrypt all the traffic between the microservices and the brokers, or even the internal traffic. Everything can be managed, and rotating certificates is going to be a much easier and simpler task. So all in all, by making those changes we can make Istio much friendlier to Kafka, and once we deploy them together, we can use all those mechanisms which today are not very advanced in Kafka.
We can use the Istio mechanisms and bring Kafka to the same level of security as any other regular workload running in Kubernetes. We can really benefit from both worlds together. But that brings us to the question: what do we do until then? Until we have a Kafka-friendly Istio, what can be an intermediate solution? And here we'll talk about what we are using today, which is the Open Policy Agent. We use the Open Policy Agent to do all the microservices authorization. In a nutshell, the Open Policy Agent is a popular tool that decouples policy decision-making from policy enforcement. When we use Open Policy Agent with Kafka, we need to use an OPA plugin inside Strimzi Kafka, and this plugin can redirect or make authorization requests to the OPA server. So when OPA is used with Kafka, the Kafka authorizer calls through the plugin to the OPA server to evaluate the policy based on the input from the authorizer. The input is the same set of information; nothing changes. Then on the OPA server, people can define their policies, and OPA can evaluate any request against the policy and respond to the authorizer with a decision on whether the request is allowed or not. Decisions are cached by the authorizer to make sure performance is not affected. So this is Open Policy Agent. In Portshift, we use the same architecture and the same plugin to let people define their authorization rules. The OPA plugin forwards the request to Portshift, and Portshift acts just like the OPA server: based on the predefined rules, it verifies which users can access which topic, which broker can be accessed. We are focusing on topics. All the communication is going to be authorized, and the microservices are going to be authenticated based on their runtime properties: the namespace they are deployed in, their source of origin.
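To give a feel for the authorizer-to-OPA exchange just described, here is a Python sketch of building the request body and reading back the decision. OPA's Data API really does wrap the query in an `input` field and the decision in a `result` field, but the field names inside the input document vary by plugin version, so treat this particular shape as an approximation rather than the plugin's exact contract.

```python
import json


def build_opa_request(principal, operation, topic):
    """Build the JSON body an authorizer plugin might POST to OPA's Data API
    (e.g. /v1/data/kafka/authz/allow). Field names inside "input" are an
    approximation of what a Kafka authorizer plugin sends."""
    return json.dumps({
        "input": {
            "session": {"principal": principal},
            "operation": {"name": operation},  # e.g. "Read", "Write"
            "resource": {"resourceType": "topic", "name": topic},
        }
    })


def parse_opa_response(body):
    """OPA wraps the policy decision in a "result" field; default to deny
    if the field is absent."""
    return json.loads(body).get("result", False)
```

The default-deny in `parse_opa_response` mirrors the talk's point: if the policy (or the server) gives no explicit allow, the authorizer should refuse the request.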
We can use those customized entities and verify that our users are authenticated, and then the traffic encryption is done independently. So that's how our architecture looks. The main piece is the OPA plugin getting the information from the authorizers, calling Portshift, and getting a decision back. Istio is not really in place in the Kafka broker, it's not in the Kafka pods, but it is being used around the Kafka pods. Again, this is the current version, which still needs some modifications to get the full-fledged experience and enjoy all the benefits. So this is all from my side. I'll hand over to Nikolas. Nikolas, please take us through what you do at Marlowe. Sorry, I was stuck on slide 21. So I will talk about the Marlowe story and what happened on the practical side of things, what Marlowe is currently using. We are using a legacy system that was built over 25 years. It kept growing and became complex, and if I recall correctly, it reached its end of life support in 2003. Not only that, but the server-side application does everything: it receives requests, executes domain logic, retrieves and updates data from the database, and responds back to the client. Modularity within the application is typically based on features of the programming language. Even a small change to the application requires that the entire monolithic system be rebuilt and redeployed, and it gets difficult for a change not to affect the whole system. As it has grown over the years, it has become difficult and complex to maintain. And in order to scale the application, we would simply create more instances of that process; it is usually not possible to scale the individual components independently. So, moving to cloud native microservices and Kubernetes. With Kubernetes-native deployments, we get a view of different metrics, such as CPU usage and RAM usage.
The horizontal pod autoscaler gives us the ability to scale the number of instances in the replication controller or replica set based on those metrics. We can also use the same metrics, alongside health checks, to vertically scale our infrastructure when needed, even though we prefer scaling our infrastructure horizontally as well. And with the introduction of GitOps, deployment configs, and Helm charts, we save a lot of time, we have better versioning, and obviously much easier rolling updates. So, a bit about architecture. This is a small representation of how our system works. We still need to get data from our legacy system, since we are still in development. We have stateless apps, in blue, that issue commands and events, but we still need to keep the state somewhere, so we decided to use Kafka and other storage services, such as Elasticsearch, to do that. Given all that, and given that we needed to move toward a hybrid environment at some point, which is a lengthy subject that I don't want to get into today, we decided to move from managed Kafka services to Strimzi. Strimzi gave us freedom in so many ways. For example, leveraging all the Kubernetes concepts I already talked about, we are able to use GitOps to easily deploy a ton of Kafka clusters with a few clicks. We parameterize a lot of the configuration. It's pretty secure in comparison with other services, and it has a great community that is always willing to help. The problem with such environments and traditional firewalls is that they need skills to get configured well, especially on cloud native environments and complex networks. Imagine having to secure a whole Kafka cluster using regular expressions while having almost zero visibility into the requests being made. Furthermore, firewalls need a lot of people to get configured, which reduces the agility and slows the acceleration of development, and that's bad for everyone.
We also sometimes need to grant or block access on different layers, such as on the microservice level and, more importantly, on the topic level. Also, given that in these coronavirus days a lot of us need to work from home, the need to secure the environment for specific IPs came up, and as we moved forward with configuring and securing Strimzi, we realized we needed something more sophisticated than managed cloud services. That's where Portshift jumped into the game to save our lives. The Portshift solution gave us visibility into what requests are made, with tables and nice graphs. We were able to set up rules in a very user-friendly environment, and we are able to add rules for specific microservices and specific topics. Moving forward, I will show a small demo of how easily this stuff can be done. In that demo, I will be using Argo CD, which is a great tool for GitOps. Another great tool is Lenses, which is great for visualizing data in Kafka, among other things, but that's what I will show today. Obviously I will use OpenShift, where we run our Kubernetes cluster, Strimzi, and our security solution, which is Portshift. So let me share my screen. With Argo, I will create a new app. I will give it an application name, Kafka demo. I will select the default project, select auto-create namespace, choose my Git repo, choose the root path, and choose my cluster and the namespace name I want. After creating and syncing that, after some time we will have a fully workable Kafka cluster on OpenShift. So let's see what happened. We have a broker, a ZooKeeper, Lenses, the operator, and Kafka Connect. Let's head to Lenses to see what's happening here. We have some topics, three topics with mock data created. Let's see what's inside. Some sensitive info about our customers, some customer information, and some IBAN numbers. All of these are mock data, don't worry. And let's head to Portshift to see what we can do.
We can click on Policies and then Connection Rules, and start adding rules. I will select by pod and put the Lenses pod name. Then next, I will select by Kafka. Here I can see my brokers and the cluster, and I will select the broker that we just created. Next, I select Kafka again, and now we can see all the topics that are used in this broker. I will select just the IBAN numbers and the sensitive information, and I want to block access for reading and writing on those topics. Next, we just give it a name and then block the access. When I press finish and apply the policy, I head back to Lenses, and after some time I lose access. So I cannot access those topics, but I still have access to the customer information that I do want access to. So, back to Portshift. I will add another rule: I want to block access for every IP in the world, so I will put 0.0.0.0/0. Easy as that. Next, I choose a name, Lenses in this case, then any protocol, and then block. I click finish and apply the policy. Back in Lenses, we need to wait a bit for it to take effect. Let's see what happens. The block doesn't take effect on the first refresh; let's see the second one. Now it's blocked. Let's try again to be sure. Blocked. But now I want to be able to work from home, so I will add my IP to the allow list. Let me select my IP; I will blur it out for now so you can't see it. I'm adding my IP with /32 for just one IP, again with the same name. Then next, any layer 7 protocol, but now allow. I add a name and finish. And because I want this rule to have higher priority, I can easily move it up. So let's apply the policy, go back to Lenses, and refresh. Now we have access to Lenses, but we still don't have access to the topics we blocked, so that's good. Let's head back to Portshift to see the rest of the goodies it gives us. Here's the dashboard, with some risky workloads, events, connections, how many pods are running, and permissions. Here's the navigator.
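The rule ordering in the demo, a one-address allow placed above a block-everything rule, can be sketched with Python's stdlib `ipaddress` module: first match wins, just like moving a rule up in the policy list. The 203.0.113.x and 198.51.100.x addresses below are documentation-range placeholders, not the presenter's real IP.

```python
import ipaddress

# Ordered rule list: the /32 allow sits above the catch-all block, so one
# home IP gets through while the rest of the world is refused.
RULES = [
    ("allow", "203.0.113.7/32"),   # placeholder for "my home IP"
    ("block", "0.0.0.0/0"),        # everyone else
]


def evaluate(source_ip):
    """Return the action of the first rule whose CIDR contains source_ip."""
    addr = ipaddress.ip_address(source_ip)
    for action, cidr in RULES:
        if addr in ipaddress.ip_network(cidr):
            return action
    return "block"  # default deny if nothing matches
```

Swapping the two rules would make the catch-all match first and lock out the home IP too, which is why the demo moves the allow rule up before applying the policy.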
That's a nice diagram of what's happening right now. We can expand it and see all the external connections and all the internal connections: Kafka Connect connecting to the brokers, Lenses connecting to the brokers, and some IPs from the outside connecting to Lenses. Let's go to CI. We can see all our images here, with their vulnerabilities and how many there are in every image. We can click on a critical one; we have a lot of information here, such as in the description, and we can click the exact CVE to go to the CVE page and see what's happening with it. So that's nice as well. Then we can go to the runtime view, where we can see all the workloads running on our cluster and some more information, such as whether each one is allowed. And then we can see all the connections happening right now. As you can see on the right side, we can even see which topics are accessed by which workload. And finally, the risk assessment of our system. We can select the scan time or schedule it, select the severity to report, and if we want, we can deselect the namespaces we don't want right now. After we click save, the scan starts, and after some time, which I cut out, it finishes and shows the vulnerabilities. So that's it from my side. Back to Ariel to summarize the presentation. Nikolas, thank you for this great demo, and thank you for showing everything. Now, just to summarize everything: Kafka, even when deployed in Kubernetes, does require dedicated security tools. It does not benefit from the inherent security mechanisms when deployed as-is. Istio is an ideal candidate to do it, to achieve this level of security, but some of the work required to make it happen is not available natively, out of the box. Some of it will be addressed in the future, when sidecar containers get better treatment, and some requires a little more tweaking, adding a dedicated filter in Envoy that can detect and manipulate the traffic.
But with small fixes, I'm sure that in the near future we'll be able to use Istio, to really benefit from all the security mechanisms, and to bring the Kafka cluster to the same level of security, which will allow everyone to use it freely and securely in Kubernetes or in OpenShift. In the meantime, the open source Open Policy Agent can be used to customize a lot of the work. Nevertheless, I'm sure in the near future we'll be able to benefit from meshing Kafka and make a service-mesh-friendly Kafka. So that's it. Thank you very much for joining. Thank you, Nikolas, for a great demo, and thank you all for joining. We'll be available in the next few minutes for a few questions from the audience. Thank you.