Thank you so much, Candice. Thank you for having me here today. Today we're going to talk about how to configure high-performance streaming services on top of Kubernetes.

A little bit about me, for those of you who don't know me: I'm a developer advocate currently working at Redpanda. My previous experience was very heavily in Java. I did a bunch of SOA work back in the day, where I did a lot of Java programming, and I have a lot of experience with IBM MQ messaging queues as well as WebSphere. Then I moved on to open source, because I like the community. I like interacting with people, especially the smart people in the community. That's why I started using JBoss, and then I got introduced to Apache Camel. I did a bunch of work on Camel and became one of the people who talk about Camel on the internet. When you do integrations, it's unavoidable that you end up doing a lot of asynchronous work, and when Kafka first took off, that's when I started working with Kafka. Now I'm with Redpanda. For us it's about building the live data stack, where everything moves really fast; that's our vision, and I'm happy to be helping people build that stack.

Today we're going to talk a lot about networking on top of Kubernetes. Feel free to drop in any questions and I'll try to answer as many as possible. I'd also love to hear your thoughts afterwards. It would be great if we can connect on LinkedIn; let me know what you think about today's session. Was it hard? Was it too simple? I'd love to adjust my content next time.

One note: since I'm talking about streaming services on top of Kubernetes and all the networking-related content, I'm going to cover some basics of Kubernetes networking, and it applies to both Redpanda and Kafka. This is not just for Redpanda; if you're using Kafka already, most of it will be relevant to you.

First, a quick poll so we know a little bit about you. Where are you in your data streaming adoption? Are you just looking, or currently looking into adopting? Are you using it in development, or in production? I want to hear your level, and it would be good to have some veterans here to discuss things with me too.

All right, shameless plug: a little bit about Redpanda. Redpanda is a drop-in replacement for Kafka. You can take Kafka out of your data stack, put Redpanda in, and it will just work, because the entire protocol and API are exactly the same. So what's the difference between Redpanda and Kafka, if they're both streaming services? It's the implementation. I'm a geek, so I like to geek out on the fundamental differences between the two pieces of technology. Redpanda is written in C++, so there aren't a lot of virtualization layers on top of everything. Kafka, if you look at how it was implemented, is great technology, great ideas, great everything, but the implementation is based on the JVM, so Java and Scala.
That means everything has this virtualization layer of automated memory management: everything goes through the JVM, which introduces a layer of management you need to take care of, things like garbage collection. And when data gets written to disk, it goes through the page cache provided by the operating system, which is another layer of caching and management. Redpanda, by contrast, is written in C++, so it doesn't go through those virtualization layers; it controls memory directly. It's built on the Seastar framework, which recruits as much of the CPU's power as possible to work on streaming your data. Other solutions share the CPU across a lot of different applications, so you end up doing a lot of context switching at the CPU level, and that switching costs resources too. Redpanda was designed with the hardware in mind from the start, and that's why it's a much faster streaming platform out of the box. That was my shameless plug, but I do like to geek out on technology.

Let's get back to our topic: networking and data on top of Kubernetes. Start with the fundamentals: why don't we just use HTTP? Kubernetes is famously good with HTTP services and HTTP protocols, and not very good at dealing with other protocols. So why use something else? The reason is how the communication fundamentally works. HTTP operates at layer seven, and in the older versions of the protocol (not HTTP/3, but the older ones) you have an initial handshake and then a lot of request/response round trips. Each time the client issues a request, it has to wait for the response to come back from the server before proceeding to the next one. That's a lot of time lost when you have a large amount of data. The Kafka protocol, and by the way Redpanda uses exactly the same protocol, works differently. It sits on TCP, layer four, and allows multiplexing, meaning you can keep sending data without waiting for each response to come back, which is a lot more efficient given the amount of data you're trying to transfer. You don't have to wait for the ack before sending the next piece, because you know you have to transfer everything over anyway.

The other difference is wait time. Request/response is synchronous by nature: you issue a request to a web server, the server handles it and comes back with an answer. That's the typical way HTTP works.
With streaming services, you don't need to wait for everything to be confirmed before sending the next packet. What happens after that doesn't really matter from the sender's perspective; the sender only cares about getting the data over to the other side. So it's an asynchronous style of communication: the streaming platform, Kafka or Redpanda, is the middle point that temporarily stores your data, and the other applications, your microservices, connectors, or data pipelines, can pick it up later. Doing things asynchronously lets you scale a lot better and lets you broadcast to different consumers.

We're still on the basics: how do you scale with these two types of protocol? For HTTP, the typical approach is to add a layer-seven load balancer, an application-level load balancer doing round robin or sticky sessions to send each request to a server. The client doesn't know how many servers are behind it, so if you're trying to really balance out the load, it's hard, because the client has no idea how busy things are on the other side. That's how HTTP scales. With the Kafka protocol and streaming protocols like it, your client connects to every single available broker that holds the data. The client is responsible for doing the load balancing, so the client needs to know a lot. That gives you more flexibility in how you distribute the data and how you consume it. It's a fundamentally different idea, and it's what introduces a bit of complexity when it comes to networking inside Kubernetes.

Now, data ingestion. Remember when I talked about scaling out from the broker's perspective; this is what happens. Say you have three brokers, and a client starts streaming data into all three. It connects to every broker, but it's not going to write everything into one single place, because that would be a single point of failure. Instead it chooses different partitions to hold different pieces. For a particular topic, say records zero to 100 get written here, 100 to 200 go over there, and 200 to 300 go over here. The client knows the ranges, and records that fall into the same range go to the same broker, so everything gets evenly distributed across the brokers. But we still have to think about overcoming single points of failure: if something goes wrong with one broker, that slice of your data is lost, and having your data distributed everywhere but missing a piece is not very good when you're trying to catch up on your data.
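To make partitions and replication concrete, here is a hedged sketch using the stock Kafka CLI (Redpanda speaks the same protocol, or you can use its own rpk tool); the broker address and topic name are placeholders:

```sh
# Create a topic with 3 partitions, each kept on 3 brokers
kafka-topics.sh --bootstrap-server broker-0.example.com:9092 \
  --create --topic orders --partitions 3 --replication-factor 3

# See which broker leads each partition and where the replicas live
kafka-topics.sh --bootstrap-server broker-0.example.com:9092 \
  --describe --topic orders

# Roughly equivalent with Redpanda's CLI (flag spellings vary by version):
# rpk topic create orders -p 3 -r 3
```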
The way all the streaming platforms handle that is replication. When the producer writes data into a broker, that broker initiates replication, and the data gets replicated to the others, depending on the replication factor you set. A replication factor of three, for example, means two extra copies of the data get sent somewhere else in the cluster. (A cluster is just multiple brokers forming a group that serves the same thing.) So the data gets distributed over to the other brokers, and the same goes for the other partitions. All the brokers end up holding data, but per partition, a different broker becomes the leader; the leader is where the data gets written, and where the data gets read from. That's basically how it all works, and because of this nature, plus the nature of the protocol, things get a little more complex when you set it up.

So when we set up a client on the Kafka side or the Redpanda side, a client trying to access data from your streaming platform, you point it at a bootstrap server. A bootstrap server can be any broker inside your cluster. You're just telling your client: this is one of the brokers you can connect to, here's the address, go do your job. The client starts up and contacts broker A, because I set broker A as its bootstrap server, and broker A gives the client a set of metadata about the other brokers available in the cluster. Now the client knows: there are three brokers, so I need to contact all three and establish a connection to each. Those connections stay alive, and the client sends data over them according to the partition it's writing to. That's how the client knows about everything in your cluster.

So everything's great, except that we're doing a lot of replication internally. If everything is within your network, it's fine: inside your intra-network, every broker has an internal IP address, they all know each other and where they're located, so they can replicate to each other. If your client is inside the same network, that's also fine, because it just connects directly to one of your brokers. The problem is when your client is outside the network, outside your company, somewhere on the public internet, trying to access your brokers. It can't use the internal IP address; handed one of those, the client has no idea where to go: "I don't know where you are." So you need to expose an external network interface to the public, so that the client knows how to contact these brokers from outside your network.
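Concretely, that external address is the advertised address the next part explains. In stock Kafka it's the listeners/advertised.listeners pair in server.properties (Redpanda's equivalent setting is advertised_kafka_api); a minimal sketch with a hypothetical domain:

```sh
# Sketch: what the broker binds to vs. what it hands back to clients
cat >> server.properties <<'EOF'
# Bind on all interfaces inside the network
listeners=PLAINTEXT://0.0.0.0:9092
# Advertise a publicly resolvable name (hypothetical domain)
advertised.listeners=PLAINTEXT://broker-0.mydomain.example:9092
EOF
```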
That's when you have to set the advertised address, so that the address handed back to clients is the external one instead of the internal one, and the client can then connect to the broker you have inside your network. That's the important part. Many people, when they're first introduced to a streaming platform, don't know what advertised addresses are, so they can't connect to the broker, and the root cause is always about where the brokers are and how to locate them. They didn't realize the client was getting this advertised address from the bootstrap server; if you have it wrong, your client won't be able to connect. That's the whole story behind it.

So far, that was networking with a bunch of VMs or bare-metal machines inside your network, which is fine. But Kubernetes adds another virtualization layer on top of your virtualization layer. Running a streaming service now means running containers inside an environment that's managed by a Kubernetes cluster. So how do I get my client, which is outside this Kubernetes cluster, to connect back into it? We have to learn about Kubernetes networking. It is not that easy; it took me a while to figure things out, though if you map it back to a normal network setup it's easier to understand. So I want to share my understanding of how it works, and hopefully that helps you too.

With Kubernetes, you have a number of nodes. Nodes are VMs or machines running your Kubernetes workloads; worker nodes that just run things. Your container, a running container image, runs on whichever node it's assigned to, inside what's referred to as a pod. You can have multiple containers in a pod, but we're not going to talk about that today; the point is that every container lives in a pod, and pods are the unit Kubernetes spins up and down. These pods, and the containers in them, are ephemeral: they can go down and come up at any time. If something goes wrong, Kubernetes destroys the pod and brings it back up again. That's why Kubernetes is great for stateless applications: it doesn't really care about state, and if something goes wrong it just brings the workload back up and it works. Remember the old advice when you had problems with your computer: just restart it. That's the idea. The problem is stateful workloads. If you want to remember what was going on before things went down, you have a problem, because once everything restarts, it's all lost. Same thing here with networking: if this container stops and a new one comes up, it has a new IP. So if my client is connecting to this pod directly, the IP keeps changing, and that means constant configuration changes on my client side. Probably not great. And in the Kubernetes world, we have a lot of pods running inside the cluster at once.
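You can watch that churn yourself; a minimal sketch, with placeholder pod names:

```sh
# Note the pod's current IP (names here are placeholders)
kubectl get pod my-app-7d4b9c-xyz12 -o wide

# Kill it; the controller reschedules a replacement
kubectl delete pod my-app-7d4b9c-xyz12

# The replacement pod comes back with a different IP
kubectl get pods -o wide
```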
These are all pods, so how do I locate everything? How do I let Kubernetes know, and how do I let everyone else know, where things are? This is where it gets interesting. What you do is create a Service. A Service assigns a virtual IP for your pod inside the Kubernetes network, a virtualized IP that can be located and used within the cluster, so everyone knows what it is. When the Service is created, the Kubernetes API server, at the management level, records: I have this new pod running on this node, and it's reachable through this virtual IP. The API server then talks to kube-proxy. Kube-proxy is a Kubernetes agent installed on all of your nodes, and its job is very similar to iptables in Linux terms: it keeps a mapping of addresses, so it knows where everything is and how to reach it. It maps the virtual IP address to the actual pod on the node, along with settings like ports. So kube-proxy becomes like an address book on each of your Kubernetes nodes. Every time a request arrives at the node, kube-proxy looks it up and says: oh, you're looking for this particular service? Fine, this is its virtual IP and this is exactly where it lives. On top of that you can add networking policies, and then the traffic gets forwarded to your pod.

Everything's great inside the Kubernetes cluster. But if my client is outside the cluster, I can't use the internal IPs or internal addresses to locate my services. So what do I do? You've got two options. The first option is a NodePort. With a NodePort, every one of your nodes gets an external-facing public address exposed outside of Kubernetes, and that public IP is what your client uses to access your pod. When you ask for a NodePort service, Kubernetes attaches the service to a port on every node that runs it, chosen randomly or assigned by you, from the NodePort range, which by default is 30000 to 32767. Then you can use the external IP of a particular node, plus that port, to reach the service. The other way is a LoadBalancer service. With a LoadBalancer, Kubernetes creates the service and attaches it to an external load balancer, and each external load balancer has its own direct IP.
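Here's a minimal sketch of such a Service; the name, labels, and ports are hypothetical, and flipping the type field is all it takes to move between the two options:

```sh
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: broker-0-external      # hypothetical name
spec:
  type: NodePort               # change to LoadBalancer for option two
  selector:
    app: redpanda              # must match the broker pod's labels
    pod: broker-0
  ports:
    - name: kafka
      port: 9092               # service port inside the cluster
      targetPort: 9092         # container port on the pod
      nodePort: 31092          # must fall within 30000-32767
EOF
```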
The Service then says: anything coming in from that particular external load balancer goes to this service. Kube-proxy looks it up, sees that this traffic is coming from the load balancer and which service it's attached to, and relays the traffic over to the pod. That's basically how things work in Kubernetes. And if you think about how things work in the cloud, it's not a lot different. I'm taking AWS as the example, but things are similar for Google's GKE and for AKS; it's mostly the naming side of the story that differs. They have a router, they have their own cloud load balancers, they just name things differently, that's it. With a NodePort, traffic coming in from the route gets routed directly to the instance, in AWS terms the EC2 instance the node is running on, and through the node's port it reaches the service. With a LoadBalancer, every time you establish one, it creates a cloud load balancer with your provider and attaches it to the route, so every call to that load balancer gets forwarded into your Kubernetes cluster. And that is the basics of Kubernetes networking. I hope that helps.

Now, this is what Redpanda looks like when you deploy it on top of Kubernetes. Like every other data-related project, Kafka, databases, anything that needs to remember state, you always use StatefulSets. The reason is that we need to pick up from wherever everything was, so we need stable identities. When we use StatefulSets to deploy our brokers, each broker gets a default, stable name. If broker one dies, I need to restart broker one, reattach broker one to its configuration and its storage underneath the hood, and reattach it to the service it was originally exposed through, so things can come back up as they were. You can't just randomly start up another set of pods and PVCs, because they'd be completely empty and it would take forever to catch up.

Redpanda's approach is very similar to Kafka's here. Kafka has Strimzi; Redpanda has its own operator. It does the same kind of thing: you define a custom resource, describing how you want your streaming service deployed on top of Kubernetes, and the operator takes over and deploys everything for you. You don't have to define the StatefulSet, the PVCs, the pod deployment, any of it yourself. It just does it for you.

So I'm going to do a quick demo of how we do things in Redpanda; it's very similar to what you would do with Strimzi. I've already defined a NodePort configuration. This configuration is a custom resource, an instance of the CRD, where I say I want three brokers installed on my Kubernetes cluster, exposed externally through NodePort, and I want a domain name for them. And that is my own domain name.
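For orientation, that custom resource looks roughly like the sketch below. This is purely illustrative: the apiVersion, kind, and field names are approximations and depend on the operator version, so check the Redpanda operator documentation before copying anything:

```sh
kubectl apply -f - <<'EOF'
# Hypothetical shape of a Redpanda cluster custom resource;
# field names are approximate, not authoritative
apiVersion: redpanda.vectorized.io/v1alpha1
kind: Cluster
metadata:
  name: demo-cluster
spec:
  replicas: 3                         # three brokers
  configuration:
    kafkaApi:
      - port: 9092
        external:
          enabled: true               # expose outside the cluster
          subdomain: mydomain.example # served under my own domain
EOF
```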
The rest of the CR is me turning off all the TLS, the security stuff, to make things simple; if you want to hear more about security, we can talk about it next time. Then there's a little bit about storage, like how much storage space I want for each of my brokers. That's about it, plus adding a console so you can see things happening. That's how I deploy my cluster. Let me move things over; Zoom is taking a lot of my space. Okay, perfect.

So this is the cluster I just deployed. You can see I have three running brokers, and it's an empty cluster; nothing is in there yet. I deployed this ahead of time, because it takes about three or four minutes to deploy everything, but this is what happens: it deploys the different pods, and, the interesting part, the services it creates. As you can see, it actually creates a NodePort service. Let's take a look at that NodePort service. First of all, it got assigned a cluster IP, which is where it's located inside the cluster. And then something fun: the node port mapping. You know how in streaming services we always expose different ports, the admin port and the actual port that does all the transfers of data; those are the two fundamental ones. Internally these are exposed on their specific ports, but for external access each one is now exported on a different port, the port attached to the publicly available IP address.

If you want to find out about all the nodes and their IP addresses, here's how. I have three worker nodes in my Kubernetes cluster, running on these three IP addresses; these are the machines' external IPs, the ones exposed to the internet. And you can see that all my Redpanda brokers are running on different nodes, because I set the affinity to deploy the brokers across different nodes. So they're all spread out, each with its node port assigned and created.

Now what I can do is add this to my DNS. This is my DNS server. I like to hide these configuration details from my clients, so my clients don't have to change anything; everything stays underneath the hood. So I always use my DNS server to hide it all, and it just forwards to these different nodes. So now I can talk to the cluster by name instead of raw IPs. But let's think about what we talked about: the advertised address. Let's go get it. We can do a quick curl asking a broker: okay, what information do you have?
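The inspection steps from the demo look roughly like this; the namespace is a placeholder:

```sh
# What did the operator create? Find the NodePort service and its mapping;
# e.g. 9092:31092/TCP means internal port 9092 is exposed as node port 31092
kubectl get svc -n redpanda

# Which external IP does each worker node have?
kubectl get nodes -o wide

# Which node is each broker pod scheduled on?
kubectl get pods -n redpanda -o wide
```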
The broker returns all the node configuration the client needs to know, and we can see it. There are two advertised addresses for the client. One is internal-only: if you're in the same Kubernetes space, this is the URL you'll be using. And if you're connecting from outside, the other one is the URL you'll be using. That's how my client knows what to talk to, and how it gets forwarded to the brokers in my Kubernetes space. That is how it works.

So I can do a quick rpk topic create test, which creates a topic inside my cluster, and I can start producing. And if I refresh, I can see the "hello" that went in. This is genuinely connecting to the cluster through the NodePort, TCP packets and all. That's how we do things with NodePort. And remember the part where I talked about setting up DNS: this is where I set my domains, because these domains are returned by the broker. The broker does need to know your domain in order to return the right addresses.

Load balancers are very similar to what we just did. I've already set up a load balancer configuration over here. From the custom resource perspective, the only thing that changes is the load balancer stanza; I'm switching from NodePort to LoadBalancer. So I'm going to apply this change: kubectl apply the LoadBalancer version of the file, which uploads the new configuration to my Kubernetes cluster. My operator detects the change, looks at the configuration, decides what changes it needs to make, and starts applying them.

Let's take a look at the result. If we look at everything that's deployed now, things are a little different. Not only do I no longer have those externally exposed NodePort services, I now have three LoadBalancer services created on my behalf. The operator knows I need load balancers, and this is how it creates them: each of these services says, yes, I need a load balancer, and gets one attached. So what is this load balancer? Simple. This is my AWS account, where my EKS is running. This is what I had before: just one load balancer, for the NGINX ingress controller I'd started, which I was using to expose my console and the other HTTP things. But now I should have three more, created for me. If I click refresh, there you go: three other load balancers. So when I asked for LoadBalancer services, it created the cloud load balancers for me, and each one has a publicly accessible address.
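The demo commands, roughly; the bootstrap address is a placeholder, and rpk flag spellings vary a bit between versions:

```sh
# Point rpk at the externally advertised bootstrap address
export BROKERS=demo-0.mydomain.example:31092

rpk topic create test --brokers "$BROKERS"                   # create the topic
echo "hello" | rpk topic produce test --brokers "$BROKERS"   # produce one record
rpk topic consume test --brokers "$BROKERS" --offset start   # read it back
```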
So what I'm going to do is get all the services first, because I want to map all these addresses in my DNS so I don't have to change anything on the client side. I'll just go ahead and change my DNS to forward to the new URLs I got from AWS, these cloud load balancer addresses. Do that quickly, save, and it updates my records. It takes a couple of minutes for everything to update on the DNS side, so while we're waiting, I think it's important to talk about the types of load balancers your cloud provider gives you.

The cloud providers give you different types of gateways and load balancers, and as you can see, the ones I got here are what AWS calls classic load balancers. A classic load balancer supports both layer seven and layer four, and if you look at the documentation, it's actually not that efficient for traffic like streaming services, where we operate at the layer-four level. To be more efficient, it's better to spin up a network load balancer instead of the classic one that supports both: the network load balancer supports just layer-four communication, so it will be quicker and generally better. The way to do that on AWS is to add an annotation on top of your service, and it will spin up a network load balancer. For GKE and AKS it's very similar; it's just different names, and the provider will know what type of load balancer to spin up. I can share this code with you later on my Git, but it's very simple.

So what happens now: do I have access to my cluster? Let me see. Yes, now I'm talking to my cluster through the new load balancers. I should have shown it while it wasn't working, but you'll have to trust me on that. All right, so that's how the traffic works internally and externally with load balancers. And if you look at the configuration, it tells you we could also change to using this URL, but we're not going to do that this time because of time concerns; I can make a video if you want to dive a little deeper. So that's a bit about load balancers.

Another question I get asked often is about CNI plugins. These CNI plugins have gotten really popular over the last year or two, and people often ask me: what kind of CNI plugin would be better for streaming services? So I actually did some testing, a quick benchmark of which is faster. You might be surprised; maybe I tested it wrong, and I'd love to hear your feedback, but under load it actually made no difference whether I introduced a CNI plugin or not. I think that's because of how CNI works underneath the hood. Yes, it gives you faster lookups, it allows better networking policies, and it lets you see more of what's happening in the networking space.
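Before going deeper on CNI, here is what that AWS annotation looks like in practice; a hedged sketch with hypothetical names (annotation keys have evolved across controller versions, so verify against your cluster):

```sh
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: broker-0-lb                  # hypothetical name
  annotations:
    # Ask AWS for a layer-4 network load balancer instead of a classic ELB
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: redpanda
    pod: broker-0
  ports:
    - name: kafka
      port: 9092
      targetPort: 9092
EOF
```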
The CNI plugin is another layer where, depending on which plugin you use, some of them will replace kube-proxy. Kube-proxy is probably one of the points where things get slower, because of the way it works: it uses iptables, and with iptables, every single time a new connection comes in, it has to look through all the entries in the tables, go through everything, so in the worst case it's pretty slow. You can use IPVS instead, but it's similar: it still has to look things up, and it's hard to add richer policies on top of it. The CNI plugin adds a layer where, every time something happens on the networking side, the plugin can step in and do other things. One of the things it enables is faster lookup, so you're not just relying on that iptables scan; it also lets you do a lot more rerouting and so on.

But here's the thing: the lookup cost only hits you on the first lookup. Once your connection is established, it doesn't need to look things up again. So for very stable connections, like streaming services, it doesn't actually make a big difference. That's my read on why it didn't make things faster in my test: you just have stable connections with data flowing in and out, and the CNI isn't in the hot path anymore. These plugins are good to have; if you have the capability and you want them, go ahead. But in terms of making things faster, I don't think a CNI makes a huge impact on streaming services. It's a different story if you have a lot of services on your nodes and a lot of hops for HTTP, like ingress HTTP services and a bunch of sidecars in a service mesh. But here we're talking about very stable streaming connections that don't really change once established. So from my perspective, I don't think it helps a ton.

All right, some connectivity checkpoints. When you have a problem connecting to your services, make sure you think about your advertised addresses first. I have a blog post from a couple of months ago, "What is the advertised Kafka address?", so if you want to know more about this, please go ahead and read it. The last section is about Kubernetes, and most of the commands I shared today are in there, showing how everything works and explaining a bit about Kubernetes and how worker nodes work. It goes back to basics; I like to work from the basics, so I talk through how everything works from the beginning. I strongly recommend reading it if you have problems.

Another common problem I see is VPCs. People deploy their containers and their Kubernetes inside a VPC, and they have clients elsewhere trying to connect to it, but they forgot to connect the VPCs. There's no peering, no linking, nothing, so the networks simply aren't connected. Make sure your VPCs are connected, through VPC peering, linking, or a transit gateway. And try not to go through the NAT gateway, because that goes out to the internet and comes back again.
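A few quick checks I'd run against these checkpoints; hostnames and ports are placeholders:

```sh
# Is the Kafka port reachable on the advertised external address?
nc -vz broker-0.mydomain.example 31092

# Is the admin port reachable too?
nc -vz broker-0.mydomain.example 31644

# Does the bootstrap broker hand back addresses the client can resolve?
rpk cluster info --brokers broker-0.mydomain.example:31092
```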
Why waste that traffic, and the money that goes with it? It's slow, too. And if you're on the same Kubernetes cluster, think about which address you're using: the local service address is more efficient, so make sure you know it. Another problem: when people first set up their Kubernetes, they forget to open up the ports for the node port. If you're using NodePort and the port isn't open, no traffic can go in or out of that node, that EC2 instance, so you cannot connect. Don't forget to take a look at that. And then, obviously, think about all the ports that are exposed, because they each mean a different thing. For instance, you have the admin port, which does all the admin stuff; the Kafka port is usually the one you actually want; and because Redpanda also provides capabilities that aren't included in Kafka, you get an out-of-the-box HTTP proxy, so you can talk to your streams over HTTP if you want, and a schema registry as well. Just make sure you're connecting to the right port. These are very common things I see people ask about in the community.

That's not all the FAQs; let's talk performance. There are performance impacts to take into account when you're designing, and when you're trying to figure out how much bandwidth your network needs. When you write data into your brokers, call X the amount of data you're ingesting. The catch is replication: depending on the replication factor, that is, how many copies you want duplicated to the other brokers, you add traffic. If your setup sends two extra copies, that's two more streams of data getting replicated from this broker to the others. The more you replicate, the more bandwidth you consume. And then, something people don't really think about much, there's consumption. When you consume data from your brokers, I know it's tempting to have as many consumers fetching the data as possible, but the more consumers you have, the more outgoing data you have. So this is how you can estimate the bandwidth you'll need to satisfy everyone getting data in and out: the data transmitted in, plus the replicated copies, plus one copy per consumer. I've seen customers with thousands of consumers; that means you need seriously good bandwidth to support it. Just saying.

Then there are the configurations you can control. In the streaming space, your producer and your consumer are responsible for a lot of things, like determining which partitions they write to, and how many concurrent in-flight requests you allow per connection. The more you allow, the more data gets transferred over at once.
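A quick worked example of that estimate, with made-up numbers. If the brokers ingest X, replicate with factor r (so r - 1 extra copies move between brokers), and serve c consumers, the total traffic crossing the network is roughly:

$$ B \approx X + (r-1)X + cX = X(r + c) $$

With X = 50 MB/s, r = 3, and c = 10, that's 50 × (3 + 10) = 650 MB/s, thirteen times the raw ingest rate.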
You can see the fluctuations in your bandwidth consumption when more data gets transferred at the same time. The things you can control to make it more efficient: increase the batch size, since the bigger the packet, the better; and the longer you wait, the bigger the packet gets, so you can control it both ways. The other one is compression. You can compress the data, zip it up, so there's less data to transfer over the network, but that introduces extra processing on your producer and consumer side, because they need to compress and decompress it. So, depending on your needs, decide: should I compress, for lower network usage, or not compress, so my producers and consumers don't have to do as much work?

Consumers are similar to producers, just on the other side of the story. The bigger the packet it fetches, the more efficient it is, and the longer you're willing to wait, the bigger the fetch. And then there are things like timeouts: if your consumer hasn't done anything for the last however-many seconds, just disconnect it. Don't waste the connection; it's not doing anything anyway. Think about all these different strategies so you can minimize the usage.

Other things can impact usage too. If you're running benchmarks or things like that, it's going to affect the network. Some people go crazy on observability metrics; if you do, of course there's more data going out, getting scraped by your observability tools, and that counts as bandwidth as well. And speaking of bandwidth: tiered storage. Kafka doesn't do this, but Redpanda does. Remember how Kafka retention works: after, say, seven days, it deletes everything, and that data is gone forever, depending on your configuration. Redpanda lets you keep that data somewhere much cheaper, an object store bucket like S3, and when you're offloading it to the S3 bucket, that uses some of your network too. So take that into account as well.

Some takeaways from the performance perspective. I went a little fast on that side; let me know if you want to hear more depth. Things to think about: the number of producers and consumers, and the ratio between them. Don't have too many; don't over-provision your consumers, I would say. Use packets as large as possible, please. Make sure you set your timeouts so idle connections get closed. Does the replication factor actually matter? It does, because it takes up some of your internal network capacity. And remember the producer and consumer configurations we talked about, the batch size, the packet size, and all that; those are the things you can tune (there's a sketch of them below). After that, I want to talk about a limitation, not just on EC2 but on every single cloud: they will all promise you capacity limits.
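Before getting to those limits, here is a sketch of the tuning knobs above as stock Kafka client properties (the same client settings work against Redpanda, since the protocol is the same; the values are illustrative, not recommendations):

```sh
cat > producer.properties <<'EOF'
batch.size=131072                         # bigger batches, fewer packets
linger.ms=50                              # wait up to 50 ms to fill a batch
compression.type=lz4                      # trade CPU for network
max.in.flight.requests.per.connection=5   # concurrent unacknowledged requests
EOF

cat > consumer.properties <<'EOF'
fetch.min.bytes=65536          # don't return tiny fetches
fetch.max.wait.ms=500          # but cap how long we wait to fill one
connections.max.idle.ms=60000  # drop idle connections after a minute
EOF
```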
They will say something like "up to 25 megabytes per second" as the limit, but it's not always 25 megabytes per second all the way, and different instance types have different available bandwidth. So take into account what kind of instance you have, and whether it has guaranteed bandwidth or "up to" bandwidth, because it can be very good on Monday, amazing on Tuesday, and terrible on Wednesday. You never know. So think about it, and take a look. Also, when they promise bandwidth, it doesn't always mean all the way to the public space; sometimes they limit the bandwidth from public IPs, from external subnets, and so on. And it also depends on how you set up your HA strategy. A lot of people like to spread across different availability zones, different data centers, for availability: if something goes wrong, they're covered. But when you're replicating across them, or your clients are communicating across them, it all matters. Connecting to a different data center takes longer. It's not going to have a huge impact, but you will notice the difference. And spread the load: if one consumer is consuming a huge amount of data and you're hitting the streaming limits per connection, you can spread it around, for example two consumers in the same consumer group sharing the load. These are things you can do to make running your network a lot easier.

And that is all from me. I hope we have some time for Q&A, but here are some resources available from the Redpanda side. We have Redpanda University: my colleagues and I created courses there where we teach the fundamentals of streaming services and more, so go take a look. We have the best documentation ever, I think; we have a very good documentation team that puts things down properly. There's also a lot of information in the blog posts. Let me know any feedback, either on Slack or by reaching out to me on LinkedIn; I'm happy to hear your thoughts. And our code is all available in our GitHub account, so if you want to see how Redpanda is made, how it's written, and all the new updates to the code itself, that's the place.

So let me go ahead and take a look at the questions. "How do you use Docker Compose to interact with Redpanda, transitioning between Linux and Windows?" To be honest, I'm not a Windows-savvy person, but I assume that if Docker Compose works the same way as it does on Linux or macOS, it will be similar. In my blog post there's a section on Docker where I talk a little about how to use it. In Docker you have the Docker network; if you're talking to things inside it, you look at the address bound to the Docker network, and to reach the brokers from outside, you attach an advertised address, an address outside the Docker network, to your brokers. That's how you write to it. Sorry, I'm not a Windows expert, but I'd love to hear your thoughts.
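For reference, a minimal single-node sketch of that Docker setup; the image tag and flag spellings vary by Redpanda version, so treat the details as approximate:

```sh
cat > docker-compose.yml <<'EOF'
services:
  redpanda:
    image: docker.redpanda.com/redpandadata/redpanda:latest
    command:
      - redpanda
      - start
      - --kafka-addr=0.0.0.0:9092              # bind inside the container
      - --advertise-kafka-addr=localhost:9092  # what clients are told to dial
    ports:
      - "9092:9092"
EOF
docker compose up -d
```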
If I get my hands on a Windows machine, I'll try it and let you know. "Any issues running metrics on ARM processors?" No, we run on ARM, and I haven't had any problems with ARM whatsoever. "Can Mojo be used with Redpanda?" I don't know what Mojo is, I'm sorry. I'll Google it and post about it later on my LinkedIn account. But that's all we have for today. Thank you very much, I hope to see you next time, and I hope you found this useful. Thank you.

Thank you so much, Christina, for your time today, and thank you everyone for joining us. As a reminder, this recording will be on the Linux Foundation's YouTube page later today. We hope you join us for future webinars. Have a wonderful day.