your own cloud gaming PC. So before we go ahead, let's set up the environment for it, let's set up a preface. Why exactly do we need a cloud PC? We already have thin clients, like the Lenovo ThinkCentre, that small boxy desktop, and Dell has the OptiPlex, all those systems. Thin clients are always available to us. We have RDP, we have VNC; all those protocols already exist. We have TeamViewer and AnyDesk, applications that let you access systems remotely; they already exist. And we also have GeForce NOW and other cloud gaming services. So why do we need to set up our own? What's the motivation?

Well, first, for thin clients you have to buy external hardware. Why can't we just use our existing hardware? We already have phones that are good enough, we already have laptops that are good enough. Why do we need external hardware? RDP and VNC have too high latency, so is it possible to reduce the latency enough that we can at least play some games? TeamViewer and AnyDesk exist, but can we switch to an open source solution, something that we know does not spy on us and does not leak our data? GeForce NOW, PlayStation Plus, all those services exist, but again, you need an account. We don't know their terms of service, how deeply they encroach on us, and how they will change over time. We don't know any of that. So why don't we settle for something where we know exactly what it is and where the data goes?

And I say, why not? Why should we not have a cloud PC? We already have the power. We have the openness that we can upgrade it however we want; it's our hardware at the end of the story. For security, I can host it anywhere I want. If I want to host it at my home, I can. If I want to host it at a data center, I can. Wherever I feel it will be secure, I can host it. It's my hardware. For accessibility, we don't know if GeForce NOW or any of these services might go down, even momentarily, right when we want to use them. If it's our own system, we know how it works, we know whether it will be stable, we know everything about it. So why don't we host it ourselves and make it accessible enough that it's ready in two minutes? All we need is internet connectivity, and you need internet connectivity for any of the existing solutions anyway. So why not get a few extra nice things on top of that internet requirement?

So I'll go through a story, and here are the milestones I'll cover. The first milestone is how to use Remote Desktop Protocol to access a PC over LAN, and over time I will cover various protocols. The second half of the talk covers the optimizations needed on top of the tech stack that Tim is currently using. The rest of the story looks like this: we'll try to access Remote Desktop Protocol over the internet, then we'll go over NVIDIA GameStream and how to use that in our own self-hosted system. After that we'll explore Sunshine, which is a newer application, and then the optimizations Tim needed to make it reliable and accessible over the internet at all times.

So this is Tim. Tim loves to game.
Tim — right now it's 2015, and Tim has recently started his summer vacation. He wants to game, but for some reason there's construction going on in his room, so he cannot game. Wouldn't it be nice if Tim could just play games? So Tim tries it out. He sets up Remote Desktop Protocol: he installs an RDP client application on his phone and accesses his desktop from his backyard, and he's able to play some games. It's not great, because he gets around 10 FPS, but at least he's able to do it. So that's the current status: he can access it remotely and he gets around 10 FPS at 1080p.

Now let's explore how to access it over the internet. It's now 2019. It has been around three years, Tim has finished school and is now in the last year of college, and he has successfully bagged an internship. But that internship is a bit too far away, hundreds of kilometers from his hometown, and he cannot bring his desktop with him. He's now employed at Nom Nom Core — it's a real company, you can look it up. He goes there, it's his first day, he does great. A few months later, once he has settled into his new job, he tries to set up Remote Desktop Protocol again. He accesses his own computer remotely, this time over the internet, and he's able to do it. How? The answer is port forwarding. He forwarded the Remote Desktop Protocol port, port 3389, then used his own public IP address, and he was able to access it. Simple. But again, it reduces the quality a lot more, and the status is that he gets around 2 FPS. So over the internet with Remote Desktop Protocol: 2 FPS at 1080p.

Now let's explore the next chapter, which is NVIDIA GameStream. The time is 2020, and we all know what happened in 2020. Tim came back home, explored a few Reddit posts, and figured out that there is something called NVIDIA GameStream that had been available all along on his own system. He explored it and realized he could just enable this little GameStream switch and use an application called Moonlight. What's that application? Well, NVIDIA is not entirely good at heart: the GameStream service only works with Shield devices, meaning the Shield tablet or the Shield TV, those kinds of things. So how can Tim use his existing hardware with NVIDIA GameStream? The answer was Moonlight. Moonlight is an application that mimics Shield hardware, and you can use it to let your Android device, your iOS device, any existing device — it has a bunch of clients — stream your desktop from your NVIDIA hardware using NVIDIA GameStream. He tried it and he could get reasonably good frame rates: around 50 FPS at 1080p with the HEVC encoder, within his LAN. So that's the status now: he can access his system remotely and gets a respectable 50 FPS at 1080p using NVIDIA GameStream.

The next chapter is accessing NVIDIA GameStream over the internet. This time it's 2021, things are slowly opening up, and Tim has got a full-time opportunity. So he again needs to move, and this time he's prepared: he has set up NVIDIA GameStream over the internet. For doing that, he again forwarded a bunch of ports.
The list of ports is this one. I know it's a long list of ports, but we are still developing, and Tim is not a high-value target or a high-value asset, so it's not like somebody is targeting him. So it's okay. He exposed a bunch of ports and he could access his system remotely with a respectable frame rate over the internet. That's the status in 2021.

It worked well for about a year, and then 2023 happened and we got some news: GameStream got its end-of-service notification in 2023. This was around the same time Google Stadia also shut down — I'm not sure if those two things are related. Apart from that, people were also talking about Moonlight, and a lot of people on Reddit were complaining, but they were also discussing, which is quite rare on Reddit. Here you can see one keyword in the second line of the post: Sunshine. This is what caught Tim's eye, and he got to thinking: what exactly is Sunshine? He explored it and found out there's an application called Sunshine, and if you use Sunshine you can expose far fewer ports and get better, if not the same, quality as GameStream. Sunshine is an application that provides roughly the same features as GameStream: you still expose some ports, you still get your stream from your desktop to your Moonlight clients, it works the same as GameStream, and it's completely open source. So Tim set it up and he could get better than NVIDIA GameStream frame rates, around 55 FPS at 1080p over the internet. That was the state once he set it up.

The next thing is optimizations. What do I mean by optimizations? The first thing is that Tim always needed his desktop to be running — always turned on, or at least on standby. Why is that? Well, Sunshine works only when your computer is fully turned on. That means you need the monitor to be on and you need the machine itself to be on; at least these two things are mandatory. Why exactly? Because Windows is Windows, and Windows does not like headless systems. If you want a screen to be streamed, you at least need a monitor that is actually presenting that screen. So if you cannot provide a desktop monitor, the next best thing you can do is plug in an HDMI dummy plug. If you plug in that HDMI dummy plug, it works. That's the first thing — that way you don't need a monitor, which is around 50 to 100 watts of power saved every hour.

The next thing is restore on power loss. To make sure his computer was always turned on, Tim set up this one feature so that his computer was always available, always on. But it's not the best thing, because then you have your hard disk spinning all the time, your computer turned on all the time, and it reduces the life of your hardware a lot. So how to fix that? Right now the status is: the system is always accessible because of that switch, and you don't need any extra peripherals because of the dummy plug. But how do we make sure we don't sip power when we don't need to use the computer? The answer is Wake-on-LAN. Wake-on-LAN gives you a way to turn on your computer from within your LAN — that's important, within your LAN. But Tim was not within his LAN.
Tim was miles away, thousands of kilometers from his home, but he still needed to somehow trigger Wake-on-LAN so that he could turn on his desktop at home. So he bought a Raspberry Pi and set up a script that could trigger Wake-on-LAN and turn on his computer. That solved one problem: Wake-on-LAN doesn't work from outside the LAN, so he set up a proxy device inside the LAN, which was the Raspberry Pi.

The other problem was that it doesn't work after a cold boot. What do I mean by a cold boot? I'm not talking about the shutdown button, or restarting and then turning it off. I'm talking about disconnecting from power. If you disconnect your machine from power and then reconnect it a few seconds later, there was no power in the system for a moment, and Wake-on-LAN does not work. This is not universally true, because every motherboard manufacturer has their own implementation of Wake-on-LAN — it might work better on your system — but on Tim's system it didn't work after a cold boot. So Tim had to hack around it. What Tim did is a bit weird, and I can only explain it through words (there's a rough sketch of the script after this chapter). Tim set his computer to shut itself down within two minutes every time it turns on — so, every time power is restored. For example, if the electricity goes out for a few minutes and then comes back, the computer turns on by itself, and if nothing happens it turns off within two minutes. Those two minutes are enough for anyone to log in and cancel the shutdown. But obviously that means Tim always has to be on his toes when turning on his computer. He already has a proxy device, right? So why can't he just script it? That's what he did: he set up a script that keeps SSH-ing into his computer, and every time the script ran it would just cancel the shutdown, and it was all good. That's how he solved the cold boot problem.

The next thing is that you still have a bunch of ports exposed. For that, he set up a WireGuard VPN. Setting up WireGuard meant he did not have to expose ports; he could enter his own LAN through the WireGuard VPN. That way he could easily reduce his attack surface. So at this point the status quo is: securely accessible remotely over the internet, around 55 FPS, always accessible, only consuming power when used, and without any extra peripherals.

The next thing was that around the start of 2024, Tim's ISP decided it was time to call it quits. It was a local vendor, and he got a new ISP. Up to this point, everything had been going great with the previous ISP — when I say everything was going great, he had a static IP address, and that's why exposing these ports and accessing his own IP address was simple and easy. But the new ISP put him behind a bunch of NATs — not just one NAT, a bunch of NATs. At this point he needed something so that he could access his services without fighting through these NATs, and he found out about Tailscale. He set up Tailscale on his phone and on the desktop, and he was able to securely access his own desktop.
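Going back to the Raspberry Pi wake-up flow Tim scripted, here is a minimal sketch of what that could look like. This is an illustration under assumptions, not Tim's actual script: it assumes the `wakeonlan` utility is installed on the Pi, that the desktop runs an OpenSSH server on Windows, and that the two-minute auto-shutdown is scheduled separately on every boot (for example, a startup task running `shutdown /s /t 120`); the MAC address, hostname, and user are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: wake the gaming PC and cancel its two-minute auto-shutdown.
# Assumes `wakeonlan` is installed on the Pi, the PC runs Windows OpenSSH,
# and a startup task schedules `shutdown /s /t 120` on every boot.

PC_MAC="AA:BB:CC:DD:EE:FF"   # placeholder: MAC address of the desktop's NIC
PC_HOST="gaming-pc"          # placeholder: hostname or LAN/Tailscale address
PC_USER="tim"                # placeholder: Windows user with SSH access

# 1. Send the Wake-on-LAN magic packet on the local network.
wakeonlan "$PC_MAC"

# 2. Keep trying to SSH in; `shutdown /a` aborts the pending shutdown on Windows.
for _ in $(seq 1 30); do
  if ssh -o ConnectTimeout=3 "${PC_USER}@${PC_HOST}" 'shutdown /a'; then
    echo "PC is up and the auto-shutdown was cancelled."
    exit 0
  fi
  sleep 5
done

echo "Could not reach the PC in time." >&2
exit 1
```

The magic packet only works from inside the LAN, which is exactly why the Raspberry Pi stays there as a proxy; once Tailscale enters the picture, the SSH half can just as well target the machine's Tailscale address.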
If you want to set up Tailscale: the Tailscale application — the node logic — is entirely open source, but the coordination server is not. The coordination server is the bit that connects any two devices together. There is an open source implementation of the coordination server called Headscale. It is not affiliated with Tailscale, but it's available separately as an open source implementation, so if you want to try that out, you can set up Headscale on your own systems and use that. So Tim found an application called Tailscale that could punch through NATs. At this point the status quo looked like this: his system was securely accessible over the internet, it was always accessible, only consuming power when used, without any extra peripherals, and it even works without a static IP address.

So at this point, this is the workflow: he pushes a button on his phone to turn on the PC, it triggers a script on his server, he accesses the PC securely, logs into it, plays games, logs out, and then pushes a button to turn it off. Simple. That's it. If any of you have any questions, now would be the time.

Yeah, definitely. Tailscale internally uses WireGuard, so you can easily secure all the communication. It's completely secure, and it's still applicable in an office environment. Yep, you can do that. Thank you. Yes. The PiKVM is not the best solution here, because PiKVM exposes the screen attached to the Raspberry Pi itself, but Tailscale enables you to stream directly from your desktop without the server being a middleman. When you're using Tailscale, compared to plain WireGuard: in the WireGuard case all the traffic goes through your Raspberry Pi and then comes to you, so it adds a bit of latency, but with Tailscale you are streaming directly from your desktop to your mobile client, so there's no middleman. That's the first thing. And the second thing, about AMD GPUs: yes, Sunshine supports all GPUs right now. As long as it has an encoder, it works, and all GPUs have encoders at this point.

Nice. So the first thing is that the whole tech stack here is open source, so even while we are talking about it, it's still improving, because it's entirely open source and people are still working on it. Moonlight, the client application; Sunshine, the server application; Tailscale, the networking application — all of these are open source, so even while we're talking, somebody is already improving them. Personally, I cannot think of any good additions apart from the things that are already in beta within Moonlight and Sunshine — for example, support for newer encoders, support for editing your bitrate (if you are on a low-bandwidth network you can reduce the bitrate), and you can also configure what resolution you want, so you get exactly the quality that works for you in that particular situation. There are a lot of configurable options when you're using Moonlight and Sunshine, and you can optimize it as per your requirements. Does that answer your question? Okay, what's the challenge that you're facing exactly? Okay, so is that a bandwidth problem — you have a low-bandwidth connection — or is it something else?
Okay, so yeah, you can edit the bandwidth in the Moonlight application, and that way you can configure exactly what quality works for your particular scenario. I can just try it out right now. Yeah, multi-user: Windows does not let you do that. Windows does not let multiple users be active at once, but you can do it using virtual machines. If you pass your GPU through into a virtual machine, and then you log into the host and, for example, your brother logs into the virtual machine, the two of you can make it work on the same desktop. Virtual machines are pretty efficient these days because of hardware virtualization, and passing through a GPU removes a lot of overhead at this point, so you should be able to do that.

You can see the desktop right now, and before I log into it, I'll just show you the configuration — what options we have in Moonlight. You can see there are a bunch of options available. For example, you can set the resolution. You can set the frame rate, whatever frame rate your device supports — on my mobile device it even shows me options for 120 FPS. I can set the bitrate, what bitrate I want per second. There are a bunch of other options; for example, I can set the frame pacing — do I want the lowest latency or the smoothest experience — all those things. There are also options related to gaming, for example emulating a gamepad if you're using a non-standard gamepad, and using your touchscreen as a trackpad. The standard controls, the standard options you expect from a remote desktop client, are all already available. And there are performance stats, so if I just enable this — that's my desktop at home. At the top you can see it's currently running on my mobile phone, through mobile data — no, it's on the Wi-Fi, the college Wi-Fi — and it's giving me around 10 FPS because the screen is static. I need to figure out how to bring up the on-screen keyboard; once I can show a keyboard, I can log into it. Right now I'm running on the college Wi-Fi, which does not have a lot of bandwidth, so there's a bit of latency because of bandwidth and other things. Yeah, it's fine, it's secure enough: nine, seven, one, three. So it's running right now, and at the top you can see it's around 50 FPS on the college Wi-Fi at 1080p. It's a static screen now, that's why the FPS is low. This particular Grafana dashboard is from my server, the Raspberry Pi that I have at home. And obviously I can game as well, that's completely doable. Yes, that's my desktop back at home. And what do I need — games, yeah. I don't play Doki Doki Literature Club, but I do play Spider-Man. Where is it? Yeah, so now the connection is probably lost because of the college Wi-Fi; if I disable the college Wi-Fi it should work better. I just disabled the Wi-Fi. Yeah, now it's much more responsive. Tailscale is working — a lot of people are showing up.

So that was it: setting up your own cloud gaming PC. If any of you have any questions, I'll be at the KDE booth, so I can talk about self-hosting, setting up your cloud gaming PC, or KDE. Okay, thank you.

Thank you, and good morning everyone. I know it's a bit too early to talk about all these technologies, and that too in a comprehensive way, but let's try it.
I don't know if this will work — okay. So a little bit about me: I'm a maintainer of Apache APISIX, as you can see on my T-shirt. I used to work on some CNCF projects — CNCF being the Cloud Native Computing Foundation — and I also try to help contributors through the GSoC and LFX mentorship programs. Today we'll talk about API gateways, Kubernetes gateways, and service meshes. From what I understand, people are often confused about these three technologies and often think they are interchangeable, and there are a lot of reasons for that confusion. Today I want to clear up that confusion and give you an idea of all three technologies, how you can actually use them, and where you can actually use them.

So first of all, why is there so much confusion? Just like this confusing slide, we can dissect it one piece at a time. First, a lot of keywords that get mentioned with these three technologies — like canary deployments, rate limiting, and service discovery — are used to describe all three. And all three technologies use reverse proxies; they are built on top of reverse proxies, so that adds to the confusion. Some API gateway projects have their own Kubernetes gateways and their own service mesh projects, which also adds to the confusion — Apache APISIX is an API gateway, it's also a Kubernetes gateway, and it also has a service mesh, so it makes things a lot more confusing. There's also the other way around: some service mesh projects have their own API gateways or Kubernetes gateways. And finally, there are a lot of articles and YouTube videos where people compare all three technologies and say one is better than another, which is completely wrong, and that adds a lot of confusion.

Let's start simple. Let's start with an API gateway. When you boil it down to the bare minimum, an API gateway is just a reverse proxy with a lot of capabilities. An API gateway sits between your clients — which can be a web UI, an Android app, or anything — and your application. What it does is accept requests from the client, do some processing, forward them to your backend application, take the response back from the backend, and give it back to your client. As shown in this diagram, you have your different clients — your web apps, your iOS apps, your Android apps — and you have your APIs on the other side, and the API gateway sits between them and acts as a proxy, taking requests from the client applications and forwarding them to the APIs while doing a lot of stuff: traffic control, authentication, security, monitoring. So: a reverse proxy with a lot of capabilities. An example of an API gateway is Apache APISIX — and I of course work on Apache APISIX, which is a completely open source project under the Apache Software Foundation. There are other open source API gateway projects too, and there are API gateway offerings from all the major cloud providers as well.

The example I will use to demonstrate the capabilities of API gateways and the other technologies is canary deployments, which is a very common use case for an API gateway. With a canary release, what you're trying to do is roll out a new version of your application without affecting your users.
So what you do is roll out the new version to only a small subset of your users: 95% of your users stay on the old version, and you test your new application with only 5% of your users. Then you gradually increase this percentage until all of your traffic, all of your users, are on the new version. As shown in this example, you have your clients, an API version one and an API version two, and your API gateway — in this case Apache APISIX — routes 95% of traffic to v1 and some traffic, 5%, to v2. You can configure APISIX to do things like this.

So let's see this in an example, let's see some actual configuration. I believe you are able to see this. This is an example of a canary deployment configuration for APISIX. What you basically have here is a route. You don't have to understand APISIX configuration to follow this, but basically you are configuring your backends here — your APIs — and you are giving one a weight of 95 and the other a weight of 5. So this configuration sets up the canary deployment. This is how it is done in APISIX, and it's done similarly in other API gateways as well (a rough sketch of such a route is shown below).

Moving on to Kubernetes gateways: now that we have this understanding of API gateways, we can think of Kubernetes gateways as just Kubernetes-native API gateways — API gateways that work natively in Kubernetes. What do I mean by that? Basically, you'll be able to manage an API gateway similarly to how you manage a pod or a service in Kubernetes: you create a manifest file, use kubectl apply, and you should be able to configure an API gateway that way. So Kubernetes gateways are API gateways that are native to Kubernetes.

To use API gateways in Kubernetes, Kubernetes currently provides two APIs. The first is the Ingress API, which is the old API but is still being used, and the other is the newer Kubernetes Gateway API. Both of these are official Kubernetes APIs that can be used to configure API gateways. But Kubernetes itself does not have an API gateway implementation — Kubernetes only has pods, services, those sorts of abstractions. When we move to the Kubernetes environment, our APIs are now pods and services inside a cluster. So when you look back at the API gateway picture, we had multiple APIs; in Kubernetes land, your APIs are usually pods and services running inside a cluster, and you have an Ingress or a Gateway that routes external traffic into your Kubernetes cluster. But like I mentioned, Kubernetes does not have an implementation of these APIs, and that is where ingress controllers come into the picture. APISIX is an API gateway; APISIX plus the APISIX Ingress Controller is a Kubernetes gateway. While Kubernetes does not ship an implementation, API gateway projects can bring their own implementation into Kubernetes. So if you want to use the Kubernetes Gateway API or the Kubernetes Ingress API, you have to use it with an implementation like Apache APISIX. What this ingress controller does is take Kubernetes configuration — written using the Kubernetes Ingress or Gateway API — and convert it into APISIX configuration, the configuration we just saw. Basically, the ingress controller is a translator. All right.
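For reference, the plain APISIX canary route described earlier can be created through the APISIX Admin API with a weighted upstream. This is only a rough sketch under assumptions, not the speaker's exact slide: the admin address and port, API key, hostnames, and route ID are placeholders, and field names can vary slightly between APISIX versions.

```bash
# Sketch: a 95/5 canary split as one APISIX route with weighted upstream
# nodes, created via the Admin API. Hostnames, route ID, admin address,
# and the admin key are placeholders.
curl -sS -X PUT "http://127.0.0.1:9180/apisix/admin/routes/canary-demo" \
  -H "X-API-KEY: ${APISIX_ADMIN_KEY}" \
  -d '{
    "uri": "/api/*",
    "upstream": {
      "type": "roundrobin",
      "nodes": {
        "api-v1.internal:8080": 95,
        "api-v2.internal:8080": 5
      }
    }
  }'
```

In practice, APISIX's traffic-split plugin is often used for canaries instead, since it can also match on headers or other rules, but weighted upstream nodes are the simplest way to show the 95/5 idea.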
Now let's look at our example, which is canary deployments. Similarly, you have your clients and you are trying to route traffic between two versions of your service; the only difference is that we are routing traffic inside a Kubernetes cluster. Let's see how we can do this with APISIX.

First, through the Ingress API. What we have here is the canary deployment configuration defined through the Ingress API. You can see the kind is Ingress — this is an official Kubernetes API — and you have configured some backends, the application you want to route traffic to, and you configure the canary release in the Ingress API through custom annotations. You can have custom annotations like this, which set the weight. But one problem people had with the Ingress API was that they were stuck using a lot of custom annotations. They wanted something more Kubernetes-native, something easier to work with. So what people came up with was custom CRDs. This is a configuration that is specific to Apache APISIX — as you can see, it is provided by Apache APISIX, it is installed as a custom resource, and this type of custom resource is called an ApisixRoute. Basically, you are defining a configuration that is specific to Apache APISIX. It looks more or less the same as what we saw before: you are configuring different backends and giving each a weight with which you want to route traffic. It's pretty human-readable, but it can be read by APISIX as well. The problem is that it is specific to Apache APISIX, so if you want to use some other ingress controller, you have to manually change your configuration. That might be simple for one or two configurations, but if you have a lot of them, you are stuck with one particular vendor — there is vendor lock-in.

And that's where the Gateway API comes into the picture. This is a relatively new API; it just reached the v1 stage. This slide shows an older configuration, but the v1 version is now available. What the Gateway API lets you do is configure ingress in a vendor-neutral way. As you can see in this configuration, there is nothing specific to Apache APISIX, so you can take this configuration and use it with any implementation of the Gateway API. The configuration looks more or less similar: you have multiple services with weights and you are routing traffic between them.

And finally, let's talk about service meshes. We talked about API gateways and Kubernetes gateways, and they were similar in a lot of ways, because both work across the application boundary: they deal with traffic from your clients to your application, the north-south traffic. Service meshes are fundamentally different in that they care about inter-service communication. A service mesh does not care about the communication between your client and your application; it is concerned with how the different services within your application communicate with each other — the east-west traffic. This diagram will make more sense: typically, microservice applications have different services, and what a service mesh does is abstract the network layer out of the application.
So as an application developer you only have to care about building your APIs, building your different microservices, and you can leave the networking to the service mesh. Typically a service mesh does this through sidecar proxies: you have proxies that sit with your application, with each of your services, and handle the networking. This layer of proxies is called the service mesh. Each service communicates only with its own sidecar proxy, and these sidecars communicate with each other and form the mesh. You can also have a service mesh control plane that controls this network, and it can run inside or outside the Kubernetes cluster.

There are a lot of popular service meshes out there, and these three are all open source: the first is Istio, the second is Consul, and the third is Linkerd. All three are open source and all three follow this sidecar model. But you might have heard about eBPF, which is quite popular, and eBPF has enabled a new type of service mesh to evolve; one such service mesh is Cilium. The traditional service meshes we looked at before have one sidecar for each of your application services, but Cilium does things differently. Cilium uses eBPF to do the networking at the kernel level, so instead of having a proxy for each of your services, you can use a single proxy and go through the kernel directly. That makes it much more efficient, and the future looks quite promising for eBPF-based service meshes; we'll have to wait and see how it turns out.

An interesting pattern is to use service meshes and API gateways together. Service meshes handle east-west communication between your services, as we saw, but you also want to handle the communication between your clients and your services. For that you need gateways, and you can get the best of both worlds: gateways to handle the north-south traffic and a service mesh alongside them to handle the east-west traffic. This is an example of that: the ingress gateway handles traffic into the cluster and the egress gateway handles traffic out of the cluster, while the service mesh handles communication between your services.

Service meshes also support canary deployments, and this is quite similar to what we saw before. Instead of splitting traffic coming from external clients, you split traffic from one service to another in a canary fashion. The configuration looks similar too — this is how it looks for Istio. You can set up weights and configure the different services to route those weights to. The problem with the configuration I just showed you is that it is specific to Istio, but we want the freedom to use any service mesh. If we want that, we need a standard API for it, and that was the intention of the Service Mesh Interface (SMI) project. It is a set of CRDs that defines an API for configuring service meshes: you should be able to use the Service Mesh Interface API to configure any service mesh. You define what your service mesh should do through this API, and any service mesh can implement it. But this project kind of failed; only a few projects participated, and it gradually became less adopted.
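As an aside, the Istio-specific canary configuration mentioned above is typically expressed as a VirtualService with weighted routes. The following is a minimal sketch: the names, namespace, and weights are placeholders, and a DestinationRule defining the v1 and v2 subsets is assumed to exist separately.

```bash
# Sketch: an Istio VirtualService splitting in-mesh traffic 95/5 between two
# subsets of the same service. Names are placeholders, and a DestinationRule
# defining the v1/v2 subsets is assumed to exist.
kubectl apply -f - <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-canary
  namespace: demo
spec:
  hosts:
  - api            # the in-cluster service receiving the traffic
  http:
  - route:
    - destination:
        host: api
        subset: v1
      weight: 95
    - destination:
        host: api
        subset: v2
      weight: 5
EOF
```

The Gateway API work discussed next aims to express this same kind of split without a mesh-specific resource.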
So the SMI project is no longer there, but the good news is that it merged into the Gateway API project. There is a new initiative called the Gateway API for Mesh Management and Administration (GAMMA), which tries to bring service meshes into the Gateway API project — to create an API within the Gateway API that also lets you configure service meshes. So basically, you will be able to use one single API to configure your gateways as well as your service meshes. One of the key enabling factors of this initiative is that Istio, the most popular service mesh project, plans to use the Gateway API as its default API. So in the future you'll be able to use one single API to configure all of your Kubernetes networking needs. And as I mentioned, the Service Mesh Interface project also joined the GAMMA initiative. You can easily configure canary deployments through this as well, and it looks like this: it's similar to the Gateway API configuration we saw, but the only difference is this thing called parentRefs, where instead of your parent being a Gateway, it is another Service. So instead of client-to-service communication, this describes service-to-service communication. That's the only difference — basically, you will be able to use a single API to cover most of your needs.

So the question is: what should you use? And I can only give you a very diplomatic answer — it depends. It depends on what you are trying to do. If you are trying to set up traffic routing, authentication, all those things for your client-to-service communication, then an API gateway might be what you need. If you are trying to do this in Kubernetes, then you might need a Kubernetes gateway, with the Kubernetes Gateway API. But if you are trying to set up communication between your services — if you want a networking layer for your services because you have a lot of microservices — then a service mesh is what you need. And if you need both, then you might need a service mesh as well as a gateway. It really depends on what you are trying to do.

I'm giving this talk because I wrote an article with the same title, and it was quite popular, and I realized that a lot of people are unaware of these technologies and most people think they are interchangeable. If there are any Chinese speakers, you can check out the Chinese version as well, which was translated by one of my friends. So if you have any questions, feel free to ask them now, or you can scan this QR code, which will take you to the blog post so you can revisit what I covered in this talk later. Thank you.

What is the performance implication of having so many sidecars and a gateway in the whole system? I would imagine it adds a lot of latency to the whole network and the microservices, especially when there are a lot of microservices talking to each other through sidecars. — Yeah, there are performance implications to using a service mesh. There is actually a project called Service Mesh Performance, which tries to benchmark how performance changes with service meshes. But yes, as the number of services increases, the performance impact also increases, so I think it's a trade-off.
If you want more capabilities in your network, if you want to do more granular things in your network, you might want to use a service mesh, but there is a performance trade-off. That's another reason why sidecar-less service meshes, like the eBPF service meshes, are becoming more popular: they perform better than sidecar-based service meshes. So yes, there is always a performance implication here.

So we're going to move right on to the next session. Today we'd like to welcome Mr. Han Pham, a cloud engineer from Viettel Solutions. As you know, Viettel is one of the leading technology companies in Vietnam. He's a highly skilled and dedicated cloud engineer, especially in Kubernetes, and he's going to talk about designing, implementing, and maintaining Viettel Kubernetes Engine. In the meantime, there's a QR code for translation on the wall if you need some extra help with languages. Okay, so everyone, please welcome Mr. Han Pham.

Hello, everyone. My topic today is a migration operator and mig-controller: tools to migrate workloads between Kubernetes clusters. Let me first introduce myself: Pham Hong Thang. I'm working as a cloud solution engineer at Viettel Cloud, and my main focus is on Viettel Kubernetes Engine. We are currently developing a migration service which will migrate workloads from other places to Kubernetes, and today I'm going to share with you the idea behind part of it. This presentation has four parts. First, I'm going to introduce what Kubernetes migration is. Second, I will introduce how we can migrate the workloads. In the third part, I'm going to show you two open source tools that I use for the migration process. And in the last part, I have developed a migration operator to manage the lifecycle of the process.

So let's jump into the introduction. What is Kubernetes migration? There is a lot of theory and detailed answers in the Konveyor project — it's open source, and I have a link down below — but today I'm just going to talk about one definition: migrating workloads between Kubernetes distributions. For example, I want to migrate from GKE or EKS to Viettel Kubernetes Engine. Why, and what should I do to migrate the workloads?

There are four reasons here to change your Kubernetes cloud provider. The first is the difference in features and price — this one is easy to understand. The second is infrastructure: for example, in Vietnam, Viettel has the best infrastructure for storage and networking, so if you move your Kubernetes engine to Viettel Cloud, the latency may be lower. The third reason is government policy: the Vietnamese government has a lot of policies to encourage domestic cloud providers to develop features. And the last one is human resources: if you are working at an international company, of course you want technical support, but if you are working in a Vietnamese company, then a Vietnamese speaker is your best choice. There are some other use cases for the migration service. The first is multi-cluster: if you have multiple clusters in multiple regions, you might want to migrate your applications across clusters. The second use case, changing regions, is already covered by multi-cluster. And the third is when you want to upgrade the cluster version, for example from 1.24 to 1.28 — you might use the migration service for that. So what do we actually migrate?
I have split the workloads into three parts, and basically you have to migrate these three parts separately, but I'm not going to talk about that much in this presentation — you can contact me later for the details. The first part is built-in resources. This is the easiest part of the migration; most of the time you're not going to run into any problems migrating these. However, the second part is a little harder to migrate, since container images and data are actually stored outside the cluster in some remote storage. And for the third part, you have to choose the custom resources and custom resource definitions carefully. For example, if you deploy your cluster on Google Cloud, there is a group of custom resource definitions that Google developed to manage their clusters, and those CRDs are useless anywhere outside a GKE cluster.

I have two migration approaches for you to choose from. The first one is direct migration. The concept is very simple: you expose the clusters over the internet, and if the two clusters can communicate with each other, you can use tools to clone and transfer the data between the clusters. The advantages of this approach are that it is really fast and it has a lot of supporting tools, even open source tools you can find on Google. But if you work in an organization that has a lot of sensitive data, I don't recommend this approach, because it carries a lot of security risk: everything is exposed over the internet, so the channel can be attacked easily, and if you have an unstable network, data can be lost. The third drawback is that it's a one-shot process: while you are migrating there is some downtime, and you have to wait until the migration process is completed.

So I have another choice, which is indirect migration. This approach uses a backup-and-restore mechanism to migrate workloads. As you can see here, this is the source cluster: you back up all the resources to a backup storage, and then you restore them in the target cluster. In this approach the two clusters are not connected directly to each other, because there is a backup storage in the middle, and most backup tools have a more secure way to transfer data. So the advantages are that it lowers the risk compared to direct migration, and it has less downtime with incremental backups — in Kubernetes, if you use incremental backups, you might not have downtime until the last backup. As for the disadvantages, it has fewer supported tools: for Kubernetes, I only found one stable tool, which is Velero, which I will introduce in the next part. The process is complex, with many steps you have to monitor, and it might be an overkill solution if you only have stateless, simple applications.

So let's move on to the open source tools that I use. The first one is Velero. Velero handles the backup and restore logic. As you can see on the right-hand side, we have installed Velero into the two clusters, and then it backs up and restores via a repository in the middle. It is an open source project from VMware Tanzu, and it also supports data migration, as I mentioned in the previous part: Restic and Kopia are plug-ins for Velero to migrate data, even if it lives outside the cluster. However, Velero has limitations. It handles the backup and restore process, but we still need to manually create the backups and restores.
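To make that manual step concrete, driving Velero by hand for a one-off migration looks roughly like this. It's a sketch under assumptions: Velero is already installed in both clusters with a shared object-storage backup location, file-system backup (Restic/Kopia) is enabled for volume data, and the namespace and backup names are placeholders.

```bash
# Sketch: a manual Velero-driven migration. Assumes Velero is installed in
# both clusters pointing at the same backup storage location, and that the
# `myapp` namespace is the workload being moved.

# On the source cluster: back up the namespace, including volume data.
velero backup create myapp-migration \
  --include-namespaces myapp \
  --default-volumes-to-fs-backup

# Wait for it to finish and check for errors.
velero backup describe myapp-migration --details

# On the target cluster: restore from that backup.
velero restore create myapp-migration-restore \
  --from-backup myapp-migration

velero restore describe myapp-migration-restore --details
```

This hand-driven sequence is exactly the kind of thing the tooling discussed next tries to automate.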
Velero also has no migration workflow; basically, it just handles the backup and restore service. But then I found an open source tool for migration called mig-controller — I have a link up here, you can check it out. It has a lot of features: it automatically creates the backups and restores (when you want to back up and restore, you just apply a configuration and it creates them automatically), it has a migration process that can check for failures in the middle, and it also has a rollback feature. When you migrate to another cluster, things might not be compatible with the old version, so the rollback feature is very useful. However, the process is still not fully automated, and because I'm working at a cloud provider, we want a fully automated product. It also has no discovery service for customers to check whether their workloads are up or not. And it cannot handle some special cases — for example, storage class conversion: if you migrate from another cloud provider to Viettel Cloud, you have to change the storage class used to store data, and mig-controller cannot handle that.

So I had to develop something on top. Here, mig-controller is deployed in the target cluster and controls the backup and restore workflow of Velero. Still, I had to develop an operator to manage the whole lifecycle. The migration operator has three main functions. First, it manages the lifecycle of mig-controller and Velero across multiple clusters — we don't work with only two clusters, we work with multiple clusters, so this is a very important factor. What does that mean? It means we install mig-controller and Velero inside the clusters automatically, and after the migration completes, we clean up all the resources we created. The second function is managing the migration process: when users run a migration, it manages the process and can notify and alert users about it. And the third is communicating with the frontend, because we are a cloud provider, so of course we need to interact with the UI. So how does it deploy the instances into the clusters? We have the kubeconfigs of the two clusters, or maybe a service account token, so the operator can discover resources and then install the Helm releases — we use Helm releases to deploy the mig-controller and Velero instances into the clusters. After that, we apply some mig-controller custom resources to automate the migration process, and we also monitor the migration process. After it completes, we delete the Helm releases and clean up the other resources for the customer.

So that's the end of my presentation. Do you have any questions? Hello. I see all the trade-offs between the open source approaches, and I have a question: which approach do you prefer? Yes, about the two approaches: the direct approach is suitable for people who want to study or who have a simple application; most enterprises nowadays use indirect migration, mainly because it's the more secure way. Yeah, thank you. Any other questions? Okay, so thank you very much.

Next we're going to have a very interesting talk. I would like you to welcome Hrittik. He's currently a CNCF ambassador and a platform advocate at Loft Labs, and he previously worked at various startups helping them scale their content efforts. So everyone, please welcome Hrittik. Yeah, so hi, welcome to my talk. It's about open source dev containers.
So before I start my talk: has anyone here heard about dev containers? Anyone? Okay, one person, that's a good number to start with. A bit about me: I'm a platform advocate at Loft Labs and I'm also a CNCF ambassador. I'm coming all the way from India for this conference and I'm very excited about it.

Before we start, I want to talk about the traditional ways — how things were before dev containers. Raise your hand if you have faced this issue: you have a project which you want to run on your machine, but your office laptop sucks. Anyone who has faced this in their company or in college? Okay, a couple of hands over there. This is a very common problem, because we are advancing very fast on the software side of things and the hardware kind of lags behind, and you can't always upgrade your laptop. Apart from that, there were other limitations as well. For example, as you are getting onboarded into a new company, you need to go through all the documentation, all the steps, all the Makefiles to make sure your project runs, and even after that you might not be doing the steps correctly, or there might be a fault in the steps. You can't declare the state of your development environment — there is no declarative setup there.

Next is isolation. You might be running Python version 3, or sometimes Python version 2, and even within that you might have different isolation requirements — you don't want projects to mess with each other's environments. There are Python virtual environments, venv, but a lot of languages don't have something like that, and beyond that you might have different database versions and a lot of other things. So there is a requirement for isolation. Next, as we were discussing, there are different dependencies and different versions, and when you have different dependencies and versions it becomes very complicated. There is Docker for your production workloads, but the question is how you can use that for your development environment as well.

One of the last main concerns was portability. Suppose you are running your workspace, your development environment, locally, and you want more power — more hardware, more GPUs, more TPUs, something like that. How can you take that environment and push it to the cloud so that you can access more powerful hardware there? You can't just go and create a VM and repeat all those setup steps, because then you need to manage all the steps again; you want to automate that. And all of this boils down to repeatability: you want to reproduce your infrastructure and your working environment as fast as you can. The benefit is that whenever there is a new team member in your company, they can get going easily. Apart from that, you can scale your infrastructure up or down as needed: when you need cloud power, you have it; when you are doing small operations locally, that is okay, but when you need something more powerful, that is also there for you. And all of this boils down to one common thing: developer experience. As developers, one of the most important things we do is build things, and the goal is to build things instead of figuring out a lot of other things.
So the main focus is to build things, and the point of automation here is to make sure you are building things without any obstacles. So what is the solution? Before we get to it: has anyone here heard about Terraform? Yeah, a lot of hands. So what is Terraform, actually? It is a way to declare and deploy infrastructure instead of clicking buttons. It's self-documenting: in a .tf file you can write down all those things, so you don't need a wiki page for the state — you can deploy and define your state inside a file. And it helps with onboarding: if you want specific infrastructure on your cloud, it is as simple as deploying that infrastructure with a deploy command.

So I want to introduce something similar, but for your developer environments. This is called dev containers. It's a simple JSON file — it looks complicated, but it's very simple if we go through it. As you can see, we are using a Python image to create a 3.11 workspace, and then we can add extensions as well. Apart from that, we can specify the ports we want to open up for our workspace — these ports can be opened up through the cloud or on your local Docker. And lastly, you can add a post-create command. So whenever you create a new dev container from this JSON, you define the state of your workspace, those steps happen automatically, and you end up with a workspace with Python 3.11 built in. This is an open standard used by GitHub Codespaces and by DevPod as well.

Let's understand the architecture a bit, because whatever we do, the architecture is one of the most important things. There are two types of containers: production containers, and what we are talking about here, development containers. Production containers have a minimal footprint because we care about security there — as you see, we just push the compiled application into them. But in dev containers we focus on a bit more: we focus on four layers — the OS and runtime, the compilers, the debuggers, and your extensions, VS Code and all those things. So you get an environment in which you can develop, instead of an environment you ship. It's like a virtual machine in which everything is there, but with a smaller memory footprint. So, in simple terms, dev containers are how you define your coding environment.

So how can you use this? Here we want to use something like DevPod. DevPod is a way to go and create your workspaces — workspaces are your coding environments — and you can just go download it and use it. The advantage of something like this, which is open source, is that you can connect it with Google Cloud, Amazon's cloud, anything, and you can use any IDE you want — VS Code, a JetBrains IDE, anything like that. And there's a lot of portability: for example, sometimes you are using it on GCP, and then you can use the same thing locally as well. So the friction of running all those commands to create a workspace goes away. And yeah, this is the star growth right now — this snapshot was from April 7th — and we have seen massive adoption in how people are using this and creating a lot of workspaces. Next: how can you use DevPod?
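Before that, here is roughly what the devcontainer.json walked through above might look like. This is a sketch following the Dev Containers spec, not the exact slide: the image tag, extension ID, port, and post-create command are illustrative placeholders.

```bash
# Sketch: writing a minimal devcontainer.json like the one described above.
# The image, extension, port, and command are illustrative placeholders.
mkdir -p .devcontainer
cat > .devcontainer/devcontainer.json <<'EOF'
{
  "name": "python-workspace",
  "image": "mcr.microsoft.com/devcontainers/python:3.11",
  "customizations": {
    "vscode": {
      "extensions": ["ms-python.python"]
    }
  },
  "forwardPorts": [8000],
  "postCreateCommand": "pip install -r requirements.txt"
}
EOF
# A DevPod workspace can then be created from the repository containing this
# file (for example with `devpod up <repo-or-path>`; check the docs for flags).
```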
So DevPod ships as a small desktop application which you can download, and you can create workspaces there. Workspaces are your dev environments, as we discussed — environments in which you develop — and they contain your whole toolchain as well as your coding requirements: your language, your databases, and everything. You define all of that using the devcontainer.json we talked about at the start. And then there is something called providers. You have created the workspace, but where do you want the workspace to live? That's where the concept of providers comes in. You can have the workspace on AWS, on GCP, on your local machine, on a Kubernetes cluster that you host, via Terraform, or anywhere you want. There are two types of providers — machine providers and non-machine providers — but I won't go into the details because that's not important here. In summary, with DevPod you can just focus on your provider and your workspace and run it anywhere there's a machine. And how do we provision a machine and a workspace? You talk to the DevPod CLI or the DevPod app, and it goes and creates a machine on your provider so that you have a working workspace. And how does the cloud communicate with DevPod? This is your local machine, this is the cloud, and it works as a simple SSH connection, which is encrypted, so it's secure as well. The best part is that you can run this inside your intranet as well, so you don't need access to public IPs, which improves security.

So, in summary, what are the use cases for something like this? First, fast onboarding for your teammates. For example, whenever we join a company, the first thing we do is go through the documentation and all those steps to make sure all the systems we have built work on our local machine. With automation on that front, you can just go and click and get the environment. And not just for your company, but for open source projects as well — the CNCF is very interested in this, because we want new contributors to come to the projects, so people are using dev containers so that contributors can just use the environment the project defines. Next is cost savings: there are a lot of tools in the market right now, and having your own VM is way cheaper than those tools, because you are just cutting out the abstraction layers. And again, as I said, you can have the cloud power when required.

So I'll show a small demo here. If you are interested, you can download DevPod, add your provider, and launch dev containers. This is a QR code for the DevPod website — I'll wait a couple of seconds if you want to scan it. I've also kept a couple of DevPod stickers over here, so after the talk you can collect them. Okay, so this is the demo. This is the desktop application, not the CLI, and you can just go and add any Git repository here which complies with the dev container standard. For the sake of simplicity, we'll take a simple Go sample app, and then you connect your provider. The provider can be any provider, as we discussed — here we're talking about GCP — and you need to add your project and the zone. Apart from those, you can add more specifications here.
For example, if you go into the cloud options, you can decide your machine type: if you need a machine with a GPU, you can select that; if you need a simple machine for simple workloads, you can do that too. Also very important: you can inject credentials from your local machine into the workspace. For example, your local Git and Docker credentials can be injected into the workspace, and you can define the network settings you need. Then, when you click through here, you can select from a lot of IDEs. We are always expanding the number of IDEs we support, but there are a couple of JetBrains IDEs and VS Code, and also Jupyter notebooks and Fleet. Then you give the workspace a name. There is also a concept called prebuilds, where you build your containers ahead of time and reuse them. Once you have done that, you can watch the Google Cloud console and see a VM coming up; this is the VM your workspace will run in. As you see, a container comes up on the portal and pulls all the image layers; the image layers are Golang layers, because we have a Go image. Let's fast-forward. And now we have a working workspace with Go 1.19.1, as we declared in our desired state. Here you can just use the workspace for your development, and also forward ports so you can access a web UI if your app defines one. To delete it, it is as simple as going over here and deleting the workspace; once you do that, the VM your workspace was running in is deleted as well, along with your local instance. So, in conclusion: DevPod defines your workspace and manages the infrastructure for it, so whenever you create a workspace it takes care of creating the VM as well. Second, you can set an inactivity timeout: if you are not using the machine for a couple of hours, it shuts down so that you save on cloud cost. And when all the workspaces on a machine are deleted, it deletes the machine too. Lastly, check out DevPod as an open source solution with providers, code editors, and more options for customization. These are a couple of tweets I found a week ago of people using it and speeding up their projects. Last but not least, if you have any more questions you can ask me or join the Slack, and don't forget to collect the stickers over here. And this is me, with my QR code if you want to connect on social media, and I'm open for questions if anyone has any right now. So, since you mentioned we can connect to multiple clouds here, is there any way to identify the pricing for that? Because when we are going to launch the VM into the cloud, we should know whether there is any pricing involved, whether it's free, anything like that; is there any way? So the thing is, we can't see that in the web UI over here; the reason being that generally, when someone uses a cloud, they have a predefined quota there.
So if your company assigns you a predefined quota that counts toward their machine list, I don't think we show that over here, but you can probably see it on your own console, right? So that is it. Any other questions? Yeah. Another question about the files: are they synced locally, or is the repo only cloned, so that once the machine is decommissioned the files are also deleted? And is there any way to stash the changes locally, or is that in the plan? Okay, okay. And a follow-up: does this get decommissioned on your cloud as well? Maybe you can mount it, yeah, only if you are using it at the next session; if the cloud is ready, do you have more insight into that? Any other questions? Okay, so where's your booth? Oh, you don't have one? Okay. Thank you so much, your talk was very interesting, and if you have any questions, feel free to contact the speaker. Okay, thank you so much. One or two minutes, okay. Hi, xin chào. Hello, everyone. Today my topic is about AI. You know, AI is very hot and popular right now; you use ChatGPT, Llama, and the other large language models. I actually finished my slides just last night: on April 6th I added a page, because at NVIDIA GTC, the GPU Technology Conference held last month in America, they showed some checkpoint and restore work for GPUs. So this topic is very hot, and if you have a computer science background, I think you will find it very interesting. Okay, let me introduce myself. My name is Luan Jianhai, and I have been an open source developer for many years. I worked on the Linux kernel for five years, and on hypervisors, QEMU and Xen; you can find my name and my mails on the KVM and Xen development mailing lists. I was a developer at Oracle for five or six years, working on security. Since then, I have been an openEuler developer. Yesterday afternoon, in the main hall, I showed some other openEuler projects, such as A-Tune, which does OS profiling and tuning, and also A-Ops and the SysCare project. Those are projects I worked on before; now I have found a new field that interests me, so I am working on AI container live migration. You may ask me why the name is AI container live migration and not just container live migration. Can you help me translate that? Why is it not just container migration? I think it is called AI container migration because a normal container only has CPU and RAM, but an AI container also has a GPU or another accelerator scheduled for it, maybe a Huawei NPU or a Google TPU, for AI training or inference. So maybe we start the slides. This is the first chapter: container OS. I created an interesting project in openEuler called KubeOS. KubeOS is like AWS Bottlerocket.
If you don't know Bottlerocket from AWS, maybe you have heard of Red Hat's CoreOS. CoreOS, Bottlerocket, KubeOS: it is all the same idea. These OSes are based on an upstream OS, but they are built as cloud native OSes. Why cloud native? Because a cloud native OS is very small, more secure, and easier to manage and migrate, which makes this kind of OS very popular. This slide is an introduction to openEuler; you can read it, and I showed it yesterday afternoon, so I won't repeat it. Let me just introduce my project. KubeOS is a container OS, like Bottlerocket or CoreOS, but based on openEuler, and it is ABI-compatible with it: ABI means application binary interface, so if you build an application for openEuler, it can just run on KubeOS inside a container. That is the background; I won't go into detail. KubeOS is used on Huawei Cloud on many instances; if you are interested in the project, you can find it and try it. I also won't go into detail on the KubeOS architecture and technology. So the next chapter is: why do this at all? Do you know why we do live migration? If you have studied computer science to some level, you know about virtual machine live migration. But why don't AI applications just use VM migration? If you use virtual machine migration, you have to move all the GPU parameter data as well, and a GPU can have a lot of memory, maybe 192 GB, so the migration data is very large and migration takes a very long time. So maybe we can use container migration instead and transfer much smaller data. That is why I use container live migration. Here are the use cases. The first is spot instances: we can just buy spot instances, and if an instance dies, or the machine needs to be repaired, I can migrate my container to another machine. The second is affinity: if you have two containers that are closely related to each other, we can reschedule them onto one machine. The third is resource fragmentation: if a customer requests resources and one machine does not have enough, I can migrate a container away to another machine so that the request fits. And the last is simply that the scheduler gets more opportunities: we can reschedule containers whenever it helps. There is another use case: LLM training. Do you know how long it takes to train a model like GPT-3.5? I will tell you: three months, maybe three to five months. If an error happens partway through, the past computation is wasted, and that is very, very long.
So maybe we can do checkpoints, like a log: if something fails, you do not have to redo everything you did over the past three months; you can restore from the checkpoint and redo only the lost part. This slide compares training runs with one thousand cards and with two thousand cards, showing the training time and the fault errors: at that scale faults happen often and cost a lot of time, so the question is how to recover cheaply. Next, some history of container live migration. CRIU, checkpoint/restore in userspace, has existed for years; Kubernetes has recently added CRIU-based container checkpointing, and Google has used this kind of technology with its own container stack. ChatGPT's training also uses container checkpointing, and the related paper was released around 2020; two years ago Google and Microsoft published work on this, and AMD has released related technology as well. This year at GTC, NVIDIA also presented GPU checkpoint technology; if you are interested, you can find the pages and a demo on the public web. Now, my solution for AI container live migration. After the checkpoint, I do not launch a new container; the container runtime and the operator create a replacement and restore the container there. That is my solution. On the CPU side, the first step is to separate and merge the CPU and NPU state and use checkpoint/restore for the CPU part. For the GPU there is separate technology; we can go through the details after the meeting. For the NPU, this targets the Huawei Ascend card. So, end to end, you can do AI container live migration. Finally, let me show a demo. Here are two GPUs. My training container is running on the first GPU; this is the workload, and you can see it in the GPU logs. The second GPU has no workload running. Now wait a minute: the AI training container is stopped and checkpointed, and the container is migrated to the other GPU. Please wait... here, in this log, you can see the workload restored on the other GPU. The migration is complete. If you are interested in the technology, you can connect with me by email or in person after the meeting. Okay, thank you very much. Thank you very much, Mr. Jianhai. So everyone, feel free to come and talk to Mr. Jianhai at the openEuler booth by the stairs. Okay, thank you very much. No, I have my talk after this, my talk is next; I cannot stay, because I have to leave immediately afterwards. I'm just a bit nervous introducing Mr. Dominik. Sorry. Hello, hello. Hello. It's easier than I thought it would be.
Can you translate a question for me? Yes, because it's fantastic. I want to ask the audience who was born after 2006. You'll ask that later, during the talk? Later, during the talk, if you can help me translate that question: whether we have anybody in the audience who was born after 2006. Okay, hello everyone. For the next talk, please welcome Dominik; he's going to talk about distributed async await, a new programming model for the cloud. All right everyone, please welcome Dominik on stage. Thank you very much. Let's see if this is set up well. Oh, that's okay, I think we're good; I'll advance the slides manually. Thank you very much for being here. Again, I'm Dominik, Dominik Tornow. I'm the CEO of Resonate, and at Resonate I work on distributed systems, and within distributed systems on distributed async await, a programming model for distributed systems. I started my professional career in 2006. If I may ask: is there anybody here who was born after 2006? Nobody. Okay. All right. So, yeah, I started my career in 2006; that is almost 20 years ago. And in 2006, Amazon launched the very first cloud services, EC2 and S3, and cloud services changed the software engineering landscape forever. When I entered the field, the dominant way of thinking about software systems was thinking in layers. We had a top layer consisting of a set of clients, typically web browsers. Then we had a middle layer, or middle tier, a much smaller set of servers, typically web servers. And we had one database. Now, technically, this is a concurrent, distributed system. However, as a developer, I don't really have to think about it as a distributed system: every request is processed individually, and every request is well isolated by the web framework itself and by the database. So as a developer you can almost think about this system as if it were a sequential system. Now, with the rise of the cloud and service oriented architectures, applications are no longer confined to three layers. Today we think about systems as a set of independent, interacting agents that span multiple machines and often span the entire globe, multiple geographical locations. Today's applications are distributed applications, and we have to think about them as distributed applications. But there is a catch: while distributed systems are thriving, the developer experience of distributed applications is broken. We have to implement orchestration, choreography, event sourcing, CQRS, command query responsibility segregation, and I don't even understand what that is. The transactional outbox pattern, event-driven systems, reactive systems, sagas and compensations, client-side discovery, server-side discovery, rate limiting and retries. And sometimes we even write application logic. So the developer experience of distributed systems is broken. Now, everybody says distributed systems are complex. Everybody says distributed systems are hard to build. And every project also promises that you can focus on your business logic. But what does that actually mean? So today let's look into the reasons why distributed systems are complex, why distributed systems are hard, and what focusing on your business logic actually means. From the 1950s to the 1980s, that was the era of simplicity. Developers were tasked with writing applications for single-machine, single-core platforms.
Platforms that were able to do exactly one thing at a time. This era gave rise to the quintessential programming model: the sequential programming model, also called direct style. In the sequential programming model, functions call other functions in a linear flow of control, and when one function calls another function, the caller has to wait until the callee returns. A delightful developer experience: the code is simple to write and, most importantly, simple to understand. Then the 1990s to the 2000s were the era of rising complexity. Now developers were tasked with writing code for single-machine, multi-core platforms: platforms capable of doing multiple things at the same time. This shift from single-core to multi-core platforms also necessitates a shift from sequential programming to concurrent programming. Now, what was the initial solution? I'm sure we all know: the initial solution was that we just added threads. Let's just add threads to the sequential programming model, easy, right? Unfortunately, it turns out that adding threads to the sequential programming model is not a simple addition; in fact, it is a major disruption. Functions under a linear flow of control are simple and delightful, but functions and threads, essentially functions under a non-linear flow of control, are complex and dreadful. So what is the underlying issue? The underlying issue is that we did not elevate concurrency to a first-class citizen of the programming model. We just tried to bolt something onto an existing model, hoping the two would seamlessly integrate. It turned out they did not. Eventually, to improve the developer experience, the ecosystem implemented concurrent programming models: programming models where concurrency is a first-class citizen. There are multiple examples. One is the actor model, which is the theoretical foundation for languages like Erlang and Elixir. Another is communicating sequential processes, the foundation for languages such as Golang. But there is one model that has been widely adopted across ecosystems and across several major languages, and that is async await. Async await has become the foundation for languages like TypeScript, JavaScript, Python, Rust, C#, F#, Kotlin; I am sure I am forgetting a few. So that is a very popular programming model. Async await resembles the developer experience of the sequential programming model to the extent possible: for the majority of cases with async await, the majority of concurrent tasks look sequential. Now, from the 2010s to the present day, we are in the era of exploding complexity. Developers are tasked with writing code for multiple machines; developers are tasked with building distributed systems. And this shift, same as before, from single machines to multi-machine platforms, necessitates a shift from concurrent programming to distributed programming. And what do we do? Well, we add networking. We add remote procedure calls, we add message passing. Bottom line, we add networking to the concurrent programming model. But just as simply adding threads to the sequential programming model was not a minor addition, simply adding networking to the concurrent programming model is not a minor addition either. It is a major disruption.
So once again, we want to elevate the developer experience of distributed programs, and the answer is distributed programming models: programming models where distribution is a first-class citizen. Distributed async await resembles the developer experience of concurrent programming models to the extent possible, so for the majority of cases with distributed async await, the majority of distributed tasks look concurrent. This slide is especially for the students; if anybody wants to take a picture, this is the time. It is a summary of the transition from sequential programming to distributed programming. Sequential programming is characterized by a total execution order and total failure. When we move to concurrent programming, we add partial execution order; we add non-determinism; and we still have total failure. When we move to distributed programming, we have partial execution order and we have partial failure: some of your machines may fail, some of your network messages may be dropped, while others continue. That is one of the most difficult situations to deal with. With this framework in mind, we can articulate the key requirements that a distributed programming model must address: it must provide coordination for partial execution order, and it must provide mitigation for partial failure. And this is where distributed async await comes in. Distributed async await extends async await and makes distribution a first-class citizen with a delightful developer experience. Distributed async await provides coordination and failure mitigation across machines, so you can focus on your business logic. Okay, now we've said it again: you can focus on your business logic. What does that even mean? Every single project out there promises that you can focus on your business logic, but nobody ever tells you what that actually means. For that, let's define two supporting concepts. First, semantic signal: semantic signal refers to the aspects of the code that directly contribute to expressing the solution to your problem. It is the necessary and meaningful part of your code, the part that addresses the intention behind the program. And then there is semantic noise: semantic noise refers to the aspects of the code that do not directly contribute to expressing the solution to the problem being solved. It is a necessary but utterly meaningless part of your code that does not address the intention behind the program. Now, when we write, for example, a prototype, we typically see a lot of semantic signal in our code: most of our code deals with the problem we actually want to solve. But over time, the more production-ready our code becomes, the more semantic noise enters the scene. And if we push further, into concurrent and distributed systems, the semantic noise becomes louder and louder, to the point where eventually I cannot even see, when I look at the code, what the application is meant to do. So our goal is to reduce the noise and boost the signal: we want a very high signal-to-noise ratio, so that the code tells its own story, the code tells you what it does. Now let's go over the three programming models in a little more detail. First is the sequential programming model.
The sequential programming model is characterized by functions calling other functions in a linear flow of control, and functions compose recursively. That is a remarkable abstraction: even our largest systems, the largest systems on this planet, are composed recursively from the smallest building blocks, from functions. When a caller invokes a callee, the caller transfers control flow to the callee and suspends until the callee terminates and transfers control back to the caller. It's a dead simple programming model leading to a delightful and equally simple developer experience. That code basically needs no explanation: we are taking one step at a time. The problem is that there is no opportunity to do multiple things at once, and concurrent programming models mitigate that issue: they allow us to express that we want to do multiple things at once. The core abstraction that async await adds to functions is the abstraction of promises. Async await decouples the caller from the callee via a promise. Functions are the universal abstraction for computation, and promises are the universal abstraction for coordination. A promise, depending on which language you are looking at, is sometimes called a future, sometimes an awaitable, sometimes a deferred, and sometimes even a task; it's always the same thing. A promise is a representation of a future value. A promise is either pending or completed, or, if you want to be a little more specific, it is either resolved, success, or rejected, failure. In async await, when a caller invokes a callee, when one function calls another function, the caller does not transfer control flow to the callee. Instead, the caller receives a promise and resumes execution, and the callee starts executing at the same time. Eventually, when the caller needs the result of the callee, the caller awaits the promise. If the promise is still pending, the caller suspends until the callee terminates and completes the promise; if the promise is already completed, the caller resumes with the value of the promise right then and there. This is the code we were looking at before, and it looks very similar: the goal of async await was to resemble the sequential programming model as much as possible. However, now we actually have the chance of parallelism, the chance of doing multiple things at once. And if we wish, we can express even more complicated control flow, where we have the possibility of doing every single step at the same time; if you have a multi-core machine and enough cores, it can actually happen in parallel.
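To make that contrast concrete, here is a small illustrative TypeScript sketch (not taken from the talk's slides): the sequential version waits for each call before starting the next, while the async await version receives promises immediately and only suspends when it awaits them.

```typescript
// Hypothetical helper: pretends to fetch a user profile from some service.
async function fetchProfile(id: string): Promise<string> {
  return `profile-of-${id}`;
}

// Sequential style: the second call does not start until the first has finished.
async function sequential(): Promise<void> {
  const a = await fetchProfile("alice");
  const b = await fetchProfile("bob");
  console.log(a, b);
}

// Async await style: both calls are kicked off first; the caller receives
// promises and keeps running, and only suspends when it awaits a result.
async function concurrent(): Promise<void> {
  const pa = fetchProfile("alice"); // promise received, caller resumes
  const pb = fetchProfile("bob");   // second call starts without waiting
  console.log(await pa, await pb);  // suspend here only if still pending
}
```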
Last is the distributed programming model, distributed async await. Distributed async await extends async await with one core abstraction. So far, promises were ephemeral: they lived in the process, in the memory of the machine; they were only accessible from that one process, and their lifetime was limited by that process. When the process terminated or crashed, the promise was gone. So what we do now is lift promises out of the confines of one process and store them outside of the process, in durable storage, say in a database. With that, promises become an abstraction for coordination not only on one machine, but across machines, across processes. And in order to do that, we give a promise not just a pointer but an actual URL. A promise is a REST resource that is accessible from anywhere: it is accessible from multiple processes and is therefore a universal abstraction for coordination in distributed systems. And once again, the developer experience of distributed async await resembles the developer experience of async await as much as possible. Now we can do a short case study and look at how distributed async await mitigates partial failure; that's one of my favorites. Let's take something super simple. I'm sure all of us have played with AI and large language models at this point, so let's look at a really small example. Locally on my machine, a developer web application could be, for example, Node.js with Express, and I'm using Ollama and the Ollama API to implement a summarizer: I can send it some text and ask the LLM to give me a one-sentence summary. And my client is nothing else but curl, right there in my terminal: local machine, client is curl. Technically, this spawns multiple processes; every single one of those terminals is its own process, so technically this is a distributed application. But it doesn't really feel like one: I'm the only one using it, so there is only ever one request in the system, and, let's be honest, it all still runs on one machine. So while it is a distributed application, it doesn't quite feel like one yet. But eventually we deploy our application, and now we have multiple clients connecting to the web application. Now we are in territory that feels a little more like a distributed application, because I have multiple clients and therefore multiple requests in the system; but on the back end side it still feels fairly linear, fairly straightforward. The problem, as anybody who has tried this knows, is that Ollama, or any large language model really, can be a little resource hungry, so this setup will probably not stay like this for very long. Since large language models are computationally demanding, we must spread them across machines. Now we have a situation that already does look like a distributed system, and it gives us one more headache. Web applications are usually really simple when a request takes only a few milliseconds: if anything crashes in between, I can simply retry. What's the harm in retrying? Instead of 250 milliseconds I pay 500 milliseconds, and if that happens once in a while, that's not too bad. However, something like Ollama is computationally a little more expensive: it may take multiple seconds, and it's not only computationally expensive, it is actually expensive, because in production you may run this on a GPU cluster, and you are paying for that time. So a request enters the system, the client opens a connection to the web application, and the web application opens a connection to the Ollama summarizer.
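For concreteness, the kind of summarizer endpoint being described might look roughly like this; this is a sketch only, the route name and model are assumptions, and it assumes a local Ollama instance serving its REST API on the default port 11434:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// POST /summarize with { "text": "..." } returns a one-sentence summary from a local LLM.
app.post("/summarize", async (req, res) => {
  const { text } = req.body;

  // Ollama's generate endpoint; the model name is an assumption.
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3",
      prompt: `Summarize in one sentence: ${text}`,
      stream: false,
    }),
  });

  const result = (await response.json()) as { response: string };
  res.json({ summary: result.response });
});

app.listen(3000);
```

The curl client from the talk is then just a POST of some JSON text to this endpoint.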
Now here we actually have a problem, and that problem is a strong dependency that is very common in the async await and gRPC world. If the client crashes, it drops the connection to the web application and to the Ollama summarizer. Therefore, wherever Ollama is, maybe five seconds into its work, that work will be abandoned for no good reason, because the only thing that actually had a problem was the client in the first place; but the connection on the web application and the connection on Ollama will drop. With distributed async await and durable promises, promises that have a URL, an identity, and integrity, that work is not thrown away: if the client drops and the request gets retried, the client simply joins the execution that is already in flight. The web application and the Ollama summarizer are none the wiser. And of course, that doesn't only work for the client itself; it also works in the middle of the stack. If the client is fine and the Ollama summarizer is fine, but the web application drops, the connection crashes for whatever reason, then once we retry, with distributed async await we can again join the computation. And of course, the same works for the Ollama summarizer itself: if that one crashes, it can also be automatically retried, and it doesn't have to be routed to the same machine; it can be routed to a different machine. And with distributed async await, you get all of that with one line of code. Now, if you're interested in my slides and some references, and of course the project itself: Resonate is an open source project that has a JavaScript and TypeScript SDK and a Golang backend implementation. And for anybody who is also interested in the theory of computation, the theory of threads, the theory of developer experience, I added some of my favorite papers as references in this GitHub repository. And with that, thank you very much, thanks for being here, and if anybody has any questions, I'm more than happy to answer. The question is: I know there are a lot of concurrency models, and one critique of async await is that it colors the function; actually, that's a common complaint from many people. Some languages, for example Go, opt to go with goroutines instead. So what do you think are the differences between, let's say, async await and goroutines in terms of doing this in a distributed way? So, async await and goroutines address a concurrency problem, but they do not address a distribution problem. We can compare them in the sense of concurrency, and there are multiple concurrency models out there, like the actor model, like communicating sequential processes, like async await, or also manually via threads. When it comes to the concurrency-first programming models, actors, CSP, async await, I do have the strong opinion that concurrency-first is strictly better than sequential plus threads. However, there is a fair amount of debate, and one point is "do not color my function"; there is a fair amount of debate about which one is the better model. I personally enjoy async await because it resembles the sequential programming model, whereas actors and Golang do not resemble the sequential programming model: they require you to rewire your brain, to think about programming differently. Async await explicitly tries to let you think about programming the same way. However, at the end of the day it is still a style question, and probably also a question of what problem you are trying to solve. So while I don't have the strongest feelings among the concurrency-first programming models, I don't want to say that this one is strictly better than that one.
The only thing I want to say is that the concurrency-first programming models are strictly better than the sequential programming model plus threads. The gentleman there had a question. That is a fantastic question, and I would also like to direct you to this: we have a specification, a durable promise specification, so that you can also implement your own durable promise infrastructure if you wish, and the specification addresses that problem specifically. What we do with promises is guarantee linearizability. So if you have two clients acting concurrently, let's say one client tries to resolve a promise while the other client tries to reject the same promise, then since it happened concurrently, independently, we do not know which one is going to win. However, we guarantee that everybody sees the same result. A durable promise will never lie to you. That is encoded in the specification, and it's also encoded in our Golang backend. In our Golang backend we employ deterministic simulation testing, and we subject our implementation to basically days of testing condensed into minutes of runtime, to make sure that each and every combination is actually tested and that linearizability is a guarantee you can rely on. For the next question: first, I would very much like you to visit my Twitter handle; I was actually tweeting about this this morning, because I'm still jet-lagged and couldn't sleep. You are correct, there are queues: there are queues in the background, but the queues are not a first-class developer abstraction. The first-class developer abstraction, the application-level view, is that of an asynchronous function. However, in the back end we have a distributed event loop, and the distributed event loop does use queues and event notifications to distribute work across machines. Now, a very important point, and also one of the papers I linked in the references: "A Note on Distributed Computing" from 1994. The gentlemen at Sun Microsystems were far ahead of their time, and they argued that we should never try to make a distributed system look like it was not a distributed system; it's a futile effort. One of the reasons is networking behavior and networking latency. That's why with distributed async await you actually have to be explicit: you have to say, run in the same process right next to me, or run on a different machine, because we believe this is not a decision that can be made at the platform level. It's a decision you have to make at the application level, because there are inherent trade-offs in leaving a process boundary that are irreducible; they cannot be abstracted away. Thank you. Absolutely. There is a similar facility to, for example, what you saw in Temporal: you can use encryption to encrypt the data. But another point is that the durable promise model is not a centralized model. You can deploy multiple durable promise servers, and you can also deploy these promise servers in private networks. Then your computation, your sensitive computation and sensitive data, can run inside a private network and never leave that network. It's not a centralized platform; it is a decentralized platform, and it gives you the opportunity to mix and match. You can deploy multiple promise servers: some promises that need wider reach but carry less sensitive data live in one place.
Some promises live in an area that has less reach, for example confined to a private network, but carry very sensitive data. You are absolutely right, and once again, you should definitely visit my tweet feed; I tweeted about this this morning. You are correct, there is the question: what is the difference from a job scheduler? The difference is that we don't expose jobs, job scheduling, and notifications to the developer as the main abstraction. We are not event driven; we are imperative. We let the event loop translate sequential-looking code onto a job framework. Yes, exactly. The best example is a database. A database has a write-ahead log, but we are not writing directly to the write-ahead log; we could, but we are not. Instead we use SQL, and SQL is an abstraction on top of the write-ahead log: the database writes to the write-ahead log on our behalf. That's basically what distributed async await does: it talks to the job framework on our behalf. So, the question is whether other platforms have something like this, where you can suspend the computation, then replay it and resume it somewhere else. In comparison, other platforms have a proprietary programming model. For example, other platforms have workflows, and a workflow is a sequence of steps, of actions; they have local actions and remote actions. We do not have a proprietary programming model. We have functions and promises that call other functions and other promises, and we try to be as true to the universal programming model of functions and promises as possible: no proprietary programming model on top of that. I'm sorry, okay, interesting; oh, we have to wrap it up. Thank you very much for being here. Next we are going to have Zahn speaking about an OSS decentralized and optimized networking solution for real-time services. Hello everybody. I'm a member of the team behind it, and today I'm very happy to be here to introduce our open source project, which focuses on real-time services and which we are trying to build with a decentralized architecture. First, about me: I come from Vietnam and I have about 10 years of experience with networking and media servers. I mostly worked on private, closed-source systems; two years ago I left my company and switched to open source projects. We founded a group, so if you are interested in open source and in decentralized streaming, maybe you can join us. About us: we are an independent team of engineers with a passion for streaming, for decentralization, and for networking. Currently we have three members in Vietnam and some other contributors from a few countries around the world. So maybe you have a question about why we started this. The story goes back three years, when I created a media server for an online learning platform: an ed-tech solution with students mostly in Vietnam and Japan, and teachers from the US, the Philippines, and Japan. And as everybody probably knows, the internet connection in Vietnam is not always good; every year there are times when the connection is very bad. So we needed to solve this problem in the media server, because the users come from all over the world and we need to support low latency and high throughput across many zones. We tried a lot of solutions. The first was the typical architecture that everybody tries first: we used centralized servers in Vietnam, the US, or Japan, and it did not work well, because sometimes the cross-region connection is bad.
Maybe we route each user to the closest server, but when users come from many zones, from many countries, there is no single location that is good for everyone. We also tried something like cascading between the servers, but we had the same problem with the server-to-server connections: some providers are good for the US but not for other regions, and we could not find any provider that is good for every zone in the world. From that point we tried another approach: an overlay network, overlaying the data path between the servers. Instead of sending directly from server to server, we can relay through another hop, and we can implement our own smart routing mechanism based on RTT and other statistics. In our testing we found that a decentralized solution can solve this problem. It also improves the high availability of the system, because a decentralized architecture has no central point, so the system keeps working even when some nodes go down; the strongest example is the Bitcoin network, which cannot really be taken down. Of course, there are other solutions doing similar decentralized work, but they mostly focus on permissionless operation, performance, and scalability, not on latency. The most popular library you will find is libp2p; when we tried to create a multi-zone network with libp2p, the most popular option, the problem was latency. When you have two nodes in Asia and you want those two nodes to talk to each other, most of the time the traffic ends up being routed through the US, something like that, and back to Asia, which increases the latency a lot. For that reason we tried to create another solution focused on latency and performance, with a few principles: we use our own custom heuristic for routing, which I can explain in a moment; each node relays data in a smarter way than sending directly; and it still supports a permissionless network. And because I don't want it to only support streaming, it can be applied to other industries, for example IoT, proxies, and VPNs, something like that. We try to build it as a library, at the protocol level, so it can be adapted to other industries. The most important thing, the core idea of the network, is routing. There are three points to the routing technique. The first is meaningful node addresses: we embed the location of the node inside its address. With that, it is easy to route from one node to another node without something like a lookup service or a centralized server that stores the locations of the nodes. Second, because we focus on a decentralized network, each node only has a limited view of the network, not the whole network. And third, instead of calculating the full routing path, each node only determines the next hop; maybe a packet gets lost sometimes, but we try to send it as fast as possible. Let me explain the routing technique: it is a very simple idea, similar to how we travel in the world. Imagine a network where each node has a collection of information about how to go to every country in the world; for example, everybody here knows how to get to the US: first, we go to the airport, that is the first step. And within my own country, I know how to get to every city, and within my city, I know how to get to every place in that city. Yeah, this is a simple idea.
It's the same when we travel from somewhere in the US to West Lake in Hanoi, Vietnam. For example, we start in New York: we go to San Francisco, then to Ho Chi Minh City, because there is a direct flight from San Francisco to Ho Chi Minh City. Then we need to reach Hanoi, so we go to Da Nang and then to Hanoi, and once we are in Hanoi, we know how to get to West Lake. It's a very simple idea that looks like the real world. When we implement it, we try to minimize the overhead: the node address used for transferring data is only 32 bits, and we split it into four layers. The first three layers encode the location, down to an area of only around 30 square kilometers. Because there are four layers and each layer is one byte, each node only needs a routing table with a maximum of around a thousand entries, and each layer's table is just a simple array. So we can find the next hop with an O(1) lookup, which is really fast, and the space is only O(n), where n is the number of connected neighbors, so the memory footprint is very small. Because each node only has a limited view of the network, we need to update it continuously; I chose an interval of one second. Every second, a node broadcasts its routing table to all of its neighbors, and the broadcast contains only a small part of the routing table: the sync message only carries entries that did not come from that neighbor, and it only keeps the fastest paths, not all of the paths. This would also allow optimizations such as batching the updates when there is a lot of traffic, but that is not implemented right now. About the architecture: we use a structured network, and a node only needs to connect to a few other nodes, not to all of the nodes in the network. Going back to the example of traveling in the real world: I only need to know how to get to every country, how to get to every city in my own country, and how to get to every place in my own city; I don't need to know every address in the world. We also have a flexible discovery mechanism: currently we use a seed list, and it can also be configured manually, which is very useful for private networks. For example, when we build a streaming media system, we usually have two types of nodes: gateway nodes and media server nodes. Most of the time, the media server nodes only need to connect to the gateway node inside their zone, and then all of the gateways connect to each other to provide a fully connected network. So it can be set up with only a simple, not very complex, automatic configuration. About features: because it is a network layer, I focused on a few very simple ones. It gives you key-value storage, pub-sub, node aliasing, and service routing as core features; on top of those, you can implement services with more complex logic. And we have some unique features focused on latency.
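Purely as an illustration of the layered addressing idea described a moment ago (this is not the project's actual code), a 32-bit address can be split into four one-byte layers, and a node only keeps a small next-hop table per layer:

```typescript
// Split a 32-bit node address into four 1-byte layers,
// e.g. [region, area, sub-area, node-within-area].
function layers(addr: number): [number, number, number, number] {
  return [(addr >>> 24) & 0xff, (addr >>> 16) & 0xff, (addr >>> 8) & 0xff, addr & 0xff];
}

// One small next-hop table per layer: at most 4 x 256 entries per node.
// Index = layer, key = destination byte at that layer, value = neighbor address.
type NextHopTables = Array<Map<number, number>>;

// Walk the layers from the outermost (most significant byte) inward and
// forward to the neighbor registered for the first layer that differs.
function nextHop(self: number, dest: number, tables: NextHopTables): number | undefined {
  const s = layers(self);
  const d = layers(dest);
  for (let i = 0; i < 4; i++) {
    if (s[i] !== d[i]) {
      return tables[i].get(d[i]);
    }
  }
  return undefined; // destination is this node itself
}
```

With at most four tables of at most 256 entries each, the lookup is a constant-time operation, which matches the O(1) next-hop claim above.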
Now, for the pub-sub feature: we have a network and we want to publish data from one node to all of the other nodes. Maybe it is an origin server, like a CDN origin, that has to send to all of the other nodes; but that increases the bandwidth of the origin server, and maybe the server goes down or the bandwidth is simply not enough. So instead, like most CDNs, we build something like a minimum spanning tree between the publisher and the subscribers, and the tree gives an optimal way to deliver the data: the publisher only sends to two relays, the relays send on to the next nodes, and together they cover all of the subscribers. Because each node only sends to a small set of its neighbors, it reduces the network traffic a lot. Next, the node alias and service feature: it looks like DNS, but it is more lightweight, and it is the way we implement custom logic. We use only one byte for the service ID, so it is very small, because currently I am focused on private networks, and the service list is also synced between neighbors. With that we have a unique feature: we can route to the fastest node running a service. For example, we have service one running on several nodes, and node three wants to send to service one. It has two options: it can send to node two or to node four. But sometimes node two and node four are both down; at that point we can send to another zone, if we have configured it that way. That gives us more scalability and also high availability. We also have another very useful feature: we can broadcast to all of the nodes running a service, for example for a ping or a health check. From node one we can send to all of the nodes running service one, and the routing table takes care of delivering it in an optimal way. Currently, it is implemented and open on GitHub, and it is only the first version; we want to keep improving both performance and safety. Right now we focus on performance and on making it easy to test, so we chose a sans-I/O architecture, where we decouple the protocol logic from the runtime, the I/O, everything that is not on the protocol path. That lets us implement more runtimes: currently I have only two, polling-based ones using epoll and kqueue, but in the future we could apply network acceleration such as DPDK, AF_XDP, or eBPF to make the routing even faster, and we designed the architecture for maximum performance of the service. In detail, we have some core features that are fixed, and you can implement services on top of those features. There are two ways to add custom logic: you can create a service inside the network and embed it inside the node, or you can talk to it externally over the messaging layer; you can also code directly against the core features if you don't want to build a service. To test the performance and the optimizations, we created a network with 31 nodes in four zones around the world, as you can see here, and we used the VPN feature of the network. We used ping, the very popular tool, to test the latency between the nodes, both over a direct connection and over our VPN, and it gave us a very interesting result: we can optimize a lot of connections. Here we collected the ten most optimized routes. For example, from Sydney to Viettel in Vietnam, the public connection has a latency of over 200 milliseconds.
Over our network it takes about 100 milliseconds, roughly a 42% improvement, which is huge for something like a streaming server or an IoT system. We have some use cases. The first is a reverse proxy for smart homes: we collaborated with Lumi Home in Vietnam to use it as a reverse proxy for their IoT system, and it turned out we could cut latency by about seven times, from around one second down to something like 100 milliseconds, and also reduce maintenance cost, because a centralized network can go down, while here nodes can fail without interrupting the network. It also reduces infrastructure cost: you don't need a centralized server or an expensive networking accelerator, and some home devices can even act as relays; when there are a lot of nodes in the network, some private nodes can be used as relay-only nodes. This is public on GitHub, and if you are interested, we can discuss it more. The other use case is a decentralized streaming server, which is the main focus, although currently I work on both. It is different from other open source streaming servers, which work in a centralized way; we are trying to build it decentralized, something like Cloudflare Calls, which was announced publicly a while ago. It is a large multi-zone media server where users can connect to whichever node is closest, and the network does the rest of the work. A room can run across multiple zones, and we can support an unlimited number of users. We support mostly WebRTC and RTMP, and the protocol list will be extended. It is already being used by some customers in Vietnam and in Japan. The last part is only an idea, not finished yet: we are trying to reuse the routing technique to create a network between the users themselves, working like pub-sub, where we build a minimum spanning tree from the publishing user to the other users. From the internet and from my experience, peer-to-peer connections succeed about 70% of the time, so if we have a network with a lot of users, we can route data from one user to another user without relaying through any server. That reduces the cost of the servers and improves the scalability in the number of users without spending a lot on infrastructure. I have a demonstration of integrating it into a streaming server. We have a network with several nodes; it has two zones, in Hong Kong and Vietnam, and a conference between three people, where each person connects to a different server. The network takes care of how the servers talk to each other to create the session, and you can see the ping graph and also the set of connections. When a connection is interrupted, the routing takes care of finding another optimal path, so the conference is not interrupted. Yeah, thank you very much; it was a very fast talk. If you have any questions? Any questions? Thank you. I think it's a great improvement, and I have a question: you said that it decreases the maintenance cost, but I don't actually understand why, because previously we had a central server and we had to maintain that, and this time we have multiple nodes. I mean, in the smart home use case you said there is less maintenance cost, but I think there are more places to maintain. Am I understanding it wrong? Please correct me, thank you. Thank you very much. About the maintenance cost: yes, maybe we have more servers, but each server works like it is serverless, so it is allowed to go down.
And if any server goes down, we can start it in another zone without any configuration. The nodes don't hold any kind of database or persistent data, so if we have auto-scaling across multiple zones, it works automatically, without anything done manually by hand. The key point here is that the server is simple: it only forwards, it doesn't run complex logic, so if some server goes down, it doesn't affect the other nodes. This is the main idea. Any more questions? Okay, thank you. Hi, everyone. So next we have a talk by Souven on rethinking cloud native security with zero trust. Good afternoon, everyone. Hi, good afternoon. Okay, so once again, good afternoon to all of you, and good day to all of you joining us online. Today we are going to talk a bit about rethinking cloud native security with zero trust. Now, usually when I speak, I speak a lot of words per minute, so if this is too fast for you, please let me know and I'll slow it down, okay? And before we start, can I know how many university students we have? Raise your hands? University students? No? Okay, we have a couple of them, awesome. Okay, so let's start. My name is Souven, and I'm a software engineer at WSO2, working on the Identity and Access Management team. That's my email if you want to ask me any questions or clarifications after the session. Today I'll start with a brief introduction to DevOps and cloud, then walk you through the traditional perimeter-based security model, and then the main highlight of the talk, zero trust, followed by how it works, different implementation strategies, and finally limitations and adoption. And then you'll have your time to take the mic and ask me questions. All good? Awesome. So let's start with a little introduction to DevOps and cloud, beginning with the significance of cloud environments today. In today's world, businesses are increasingly relying on cloud computing to stay competitive and agile. By moving to cloud computing, organizations gain many benefits, and you know those already: scalability, flexibility, cost efficiency, and, the nicest thing about it, you don't have to run your own infrastructure; you can provision it on the cloud as you go, whenever and however you need. As a result of this whole cloud transformation, businesses can innovate faster, deliver services more efficiently, reach global markets very easily, and so on. Now, talking about DevOps a bit: DevOps is what actually enables organizations to achieve the goals I mentioned earlier and to deliver products at high velocity. Of course, for your enterprises, better DevOps means faster time to market, you can ship your products very fast, and it improves your product quality, and of course customer satisfaction: the happier the customer is, the more business you gain, and so on. Now, with this whole transformation of the cloud and DevOps scene, the threat landscape grows as well. As organizations increasingly use DevOps and cloud services, the threats against these cloud services also increase. Cyber attackers constantly try to hack them and carry out other sorts of malicious actions against these cloud services, and they often cause huge disruptions to your organization.
Now, these modern concepts we have these days, hybrid cloud, multi-tenancy, and all these complex methodologies, also bring complex security risks, which we should address very carefully or our business will run into disruptions. To give some statistics about the cloud landscape: last year, in the US alone, there were more than 2,000 cyber attacks, not to mention the Johnson Controls incident and the zero-day vulnerability that Cloudflare mitigated. Most of these attacks happened in the cloud space, and you can see how serious the disruptions were.

Now, that's cloud and DevOps in brief. Let's talk about the traditional security model we've been following so far. We call this the perimeter-based security model. As the diagram shows, we have two services in this enterprise, inventory and billing, and both services are protected by a boundary. If you are outside that boundary, your external traffic goes to the firewall, and the firewall makes sure the request is legitimate, validates it, and then passes it to the relevant service. Now, the limitation of this perimeter-based security model is that there's a single point of failure, which is the firewall. The firewall is the only thing between your trusted and untrusted networks, and if the firewall fails, all your internal traffic and internal services are exposed and vulnerable. The other limitation is insider threats. In this perimeter-based model you trust everything coming from the inside and trust nothing from the outside, so if there are malicious actors, intentional or unintentional, within your network, your network is not secure. Even one compromised set of credentials can make your system vulnerable to outside attacks. Those are the limitations of that security model.

As a solution, we have the zero trust architecture: never trust, always verify. Let me briefly walk you through the concepts of zero trust. Like I said, zero trust is based on the principle of never trust, always verify. With this model we always assume that threats can originate from inside or outside; malicious actors or harmful attempts can occur on both sides. Therefore zero trust advocates verifying identity continuously, verifying requests coming from different devices continuously, and doing all these validations and verifications regardless of location. By doing so, zero trust shifts the existing security paradigm from a network-centric one to a data-centric one, because zero trust advocates protecting your data regardless of where the request originates. This whole architecture is based on a couple of principles. If you Google the principles of zero trust you may come across very different interpretations, but this is the set I usually follow; these are the most prominent ones: assume breach, continuous monitoring and risk assessment, verify every access request, and so on. We'll discuss them in detail now. We'll start with assume breach. Assume breach is a pessimistic approach where you always assume an attack will happen, instead of an attack might happen. You plan as if there will be a cyber attack, rather than merely expecting that one might happen.
Now, this principle of always expecting a breach transitions your defense strategy into a more active posture rather than a passive one, because you proactively take measures to eliminate attacks instead of doing something only after an incident happens. With that, when you assume breach, you need to monitor your network constantly and verify all the access requests that come through it; like I said earlier, regardless of the origin, inside or outside, you have to verify and keep monitoring. This verification includes verifying the identity of users, the devices they're accessing from, and the applications involved, before they access any of the resources in your enterprise. Multi-factor authentication can be a very crucial part of this, because when you combine multi-factor authentication with risk-based prompts, it gives you an even more secure way to authenticate your users. And by constantly monitoring your traffic, user behavior, system logs and all this observability-related data, organizations can mostly detect an attack before it happens and take proactive measures to mitigate it.

Now, the least privilege principle is another principle of zero trust, and what it means is giving users only the privileges they actually need. For example, let's say I need to perform some kind of read action on some service. To perform that action I only need read access, so I should only get read access: not write, not edit, not delete, nothing else. By limiting user privileges and access rights, organizations reduce the risk of internal threats, because I only have permission to do exactly what I need to do and nothing more (there's a tiny sketch of such a check right after this section).

Then micro-segmentation. As the diagram says, it's somewhat self-explanatory: this principle divides and groups the services in your organization into different segments. In this example, segment 1 has two services and segment 2 has another three services, and all of these segments sit inside the enterprise boundary behind the firewall. You can do this grouping based on your business requirements; a couple of examples are grouping by workload characteristics, the type of work they do, sensitivity levels, trust boundaries, and the list goes on. Each of these segments is protected by its own access controls and firewalls, so that even if one segment gets breached, the other segments are not affected. It's a kind of threat isolation. I'll explain how to implement this a little later on. So those are the principles of zero trust, and what zero trust means, in very brief terms. To recap what I've said so far: zero trust is a principle that advocates, sorry, not trusting anything, but verifying everything. You guys are good. And I have discussed a couple of principles of zero trust; can you name a few? Micro-segmentation, monitoring, sorry? Assume breach, yes. Least privilege, of course. Yes, multi-factor authentication, OK, awesome.
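To make the least-privilege idea concrete, here is a minimal sketch of the kind of check a service could perform. It is only an illustration: the scope names and the `allowed` helper are made up for this example, not taken from the talk or from any particular product.

```go
// Hypothetical least-privilege check: a request is allowed only if the
// caller's token carries the one scope the operation needs, nothing broader.
package main

import "fmt"

// allowed reports whether the granted scopes contain exactly the required
// scope. There is deliberately no "admin implies everything" shortcut:
// read access never grants write or delete.
func allowed(granted []string, required string) bool {
	for _, s := range granted {
		if s == required {
			return true
		}
	}
	return false
}

func main() {
	granted := []string{"inventory:read"} // what the token actually carries
	fmt.Println(allowed(granted, "inventory:read"))   // true
	fmt.Println(allowed(granted, "inventory:delete")) // false: never granted
}
```

The point of the design is that nothing is implied: holding `inventory:read` says nothing about write or delete, so even a leaked read-only credential has a limited blast radius.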
Now, these are very conceptual ideas, so let's see how you can implement these zero trust principles in your organization. To do that, we use something called a cell-based architecture. This is a decentralized architecture made for cloud-native applications, and with it, this is how you can implement zero trust. Like any other enterprise, you have the enterprise boundary, the outer limit of your organization, and inside this boundary you have different cells. By the first look of it, can you see a principle of zero trust in this diagram? The last one we talked about: the services are grouped. That's micro-segmentation, yes. When you micro-segment in the cell-based architecture, we call each segment a cell. In this diagram we have two cells; cell one has two services and cell two has another two services, and each cell has its own firewalls and access controls. On top of each cell we have something called the cell gateway, and the cell gateway is responsible for the access control, verifying identities, and all of that. And on the enterprise boundary, for external traffic, we have an external gateway, something like a firewall: an external request will first come to that point, and after sanitizing it, it will be passed into the internal network.

Now I'll explain how network traffic flows through this whole thing. Let's assume service one wants to talk to service two in cell one. First, service one of cell one talks to the cell gateway of cell one. The gateway makes sure the request is actually coming from service one, applies the access control rules, validates scopes, and does whatever else your business needs. Then the gateway forwards the request to the destination, which is service two. Service two knows this request is authentic because the gateway has validated it, so it performs its business logic, does its calculations, and returns the response back to the gateway. The gateway knows the response came from service two and forwards it back to service one. So the cell gateway verifies the traffic in both directions. Now what happens if service one of cell one wants to talk to service one of cell two? First, service one of cell one talks to its own cell gateway; that gateway talks to the cell gateway of cell two; the cell gateway of cell two talks to service one of cell two, which performs the operation, and the response follows the same path back. You can see both cells have their own gateways and their own validation logic, so a request is validated multiple times before reaching the actual endpoint.

So what happens to outside network traffic? It goes something like this. We have an external source outside our enterprise boundary, and on the boundary we have an external gateway. This external gateway has all the sanitization logic and validations for outside traffic. The request first hits the external gateway, which performs its validations, determines the destination of the request, and forwards it to the cell gateway of the cell that hosts the target service. The cell gateway again validates it and forwards it to the relevant service. Once it reaches the service, the service does its work and sends the response back along the same path.
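As a rough illustration of the traffic flow just described, here is a minimal sketch of what a cell gateway could look like in Go. The service address, port, and token check are placeholders invented for the example, not details from the talk; a real gateway would delegate verification to an identity provider and enforce the relevant scopes.

```go
// Minimal sketch of a "cell gateway": every request is verified before it is
// forwarded to a service inside the cell.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Destination service inside the cell (hypothetical address).
	target, err := url.Parse("http://service-two.cell-one.svc:8080")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)

	gateway := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// "Never trust, always verify": even traffic from inside the
		// enterprise boundary must present a credential.
		if !tokenIsValid(r.Header.Get("Authorization")) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		proxy.ServeHTTP(w, r) // forward only after verification
	})

	log.Fatal(http.ListenAndServe(":9090", gateway))
}

// tokenIsValid is a placeholder; a real gateway would verify a JWT signature,
// expiry, and scopes, or call out to a central identity provider.
func tokenIsValid(authz string) bool {
	return authz != ""
}
```

The essential shape is the same in both directions: nothing is forwarded into or out of the cell until the credential on the request has been verified.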
That's how zero trust can be implemented using the cell-based architecture. OK, now that you know what zero trust is and how you can design a simple architecture around it, how would you actually implement zero trust in your organizations? Do you guys have any ideas? It can be anything. A good start, yes, we can do that: we can have a gateway and do this identity-based stuff. I'll tell you, because we don't have much time. You have two main methods. If you are an existing company with some sort of enterprise architecture, you can migrate your existing components one by one into the new zero trust architecture. When you're doing this, you need to design the new architecture based on zero trust, do a feasibility assessment, take an inventory, and make sure this is something your organization can actually do, something that's actually feasible for you. Then you migrate the legacy components one by one: the IAM stuff, network segmentation and micro-segmentation, and so on and so forth. And most importantly, you need to make sure your organization is compliant with standards and that you have a proper governance framework over the whole architecture. That's the first method. Otherwise, if you are starting fresh, you can use a platform as a service that has zero trust built in already, so that you don't have to do it yourself. Instead, you simply buy a subscription to an existing service that has zero trust built in; this way you don't have to do any of it, and you can just add developers instead of building platforms. So those are the two methods for implementing zero trust in your organizations.

Now, talking about adoption and the challenges with zero trust, there are a couple of limitations to this approach. It can be complex: in this example I took a very basic scenario, but when you're actually dealing with your enterprise and business requirements, the use cases can be quite complex. Maintaining a good user experience might also be difficult, because the constant validations and prompts can affect the user experience, so you have to think twice and make sure this is something that doesn't break it. Then, of course, legacy systems and applications are difficult to migrate, and organizations usually keep them as they are because they are stable systems. And there are skill gaps: before moving to any of these concepts, you need to make sure your developers are well educated about them and understand the concepts deeply. About adoption: in 2022 there was a US executive order encouraging organizations to adopt zero trust, and following that, the US Department of Defense issued a white paper on zero trust, encouraging organizations and introducing some sample implementation strategies they can use to implement zero trust. Yeah, that's it from me about zero trust, and if you have any questions, I'm very happy to answer. So when you showed the cell-based architecture, what sort of proxies or tech are you using? Is it like a service mesh, something related to that? Yeah, that's a great question. The implementation is actually up to your organization and your business requirements, so if you want to use a proxy, you can do it in your gateway.
And if you want to use some kind of service mesh, you can use the service mesh capabilities for observability, to monitor all the requests and trace them. So yeah, once again, it's up to your organization to decide what technologies you put in; this is the conceptual part, that you have to do this and this, and then you design the architecture yourself. Any other questions? OK, seems none. When you showed the cell-based architecture, can you explain the difference between the cell-based architecture and microservices? And why apply cell-based architecture with zero trust instead of plain microservices with zero trust? Sorry, I don't follow, can you repeat the question? Microservices? Yeah. Thanks for the question. He's asking about the difference between microservices and the cell-based architecture. In the cell-based architecture, you're actually grouping microservices: these things here are microservices, and in the cell-based architecture what you're doing is looking at the similarities between these microservices and grouping them together. I hope that answers your question.

Yes. So I want to ask: normally we would have a central identity provider in the system, right? Is this cell gateway the same thing as that central identity provider, or is it separate from it, so the gateway has to call the identity provider again just to verify whether the request is authenticated or not? Yes, actually that's a very good question. When you look at this diagram, you can see the cell gateways' functionality looks a little redundant, right? That's because this is a very conceptual picture. In your actual implementation there can be one central identity provider, and you can forward all the requests to that same thing to do the validations, authorizations, and authentications. So it's up to your organization to decide whether you're using a decentralized identity provider or anything else; like you said, we can use a central identity provider as well, and it will depend on your organization's capability, cost effectiveness, and things like that. Yeah.

So basically, if every request has to go through authorization or authentication, that adds more latency, more request time. Yeah, that's what I said: if you want, you can authenticate all the requests here, or you can simply trust some of these requests and skip that authorization part to save the latency. Actually, I'll show you. Yes, if each request comes from an identity you already trust, you can apply that trust to it; there are different ways to go about it, it's not prescriptive. It's just one option, that you can also remove the micro-segmentation and just use that. Yes, I think I can show you an example of that. So this is something like you asked, right? In this architecture you trust all the network traffic within the cell, so you're not authenticating it again; if you want to communicate from here to here, you go through the common gateway and get it authorized there. Yeah, this is another way of doing the implementation.
The one I showed you earlier was more of a pure concept. Any more questions? Yes, of course. OK. Thank you so much, folks, for listening. Only one more? Yeah, go ahead. Thank you, sir. Good to see you again. Thank you, sir.

Next, speaking about securing container images with Wolfi. Hello, everyone. Today's talk will be about securing container images using Wolfi. I'm Tuan Anh, and I lead the tech platform team at VV Bank. OK, first, a little bit of context, and let me check something: have you ever used a container before? OK. Do you know what a container is? A container is a very simple concept: it uses the kernel features cgroups and namespaces. Namespaces control which resources a process can see, and cgroups limit how much of each resource it can use. A container uses those features to isolate our application.

So the context for our team is that we run a tech platform team supporting all the development teams, and all of their software is developed on top of some base image. What does that mean? Each team starts from a base image. That base image can start from a traditional OS such as Ubuntu or Debian, or they use an official image on Docker Hub such as Python or Node.js, whatever runtime they use in their app. Each team builds its own base image, and the problem is that when there's a vulnerability, they have to fix it themselves. Now, I want to solve that problem once and for all: development teams shouldn't need to care about the base image at all. The technology platform team will build and maintain these base images to keep them at zero known vulnerabilities, and the teams only care about their application and the dependencies they use. These images should be easy to extend in case a team has a special request; extending them shouldn't be difficult or require long maintenance. And I don't want maintaining these base images to become a burden on my team: it should be low maintenance and, as much as possible, 100% automated. In the end, developers shouldn't care about the base image, shouldn't care about fixing vulnerabilities in the base image. That's the problem I have.

OK, let's start with the base image. What are the options? The first option is to use a traditional OS: you write a Dockerfile FROM Debian or Ubuntu and customize it, so you don't need to build everything yourself. The second option is to use the official runtime images on Docker Hub, which developers build on instead of a raw traditional OS. The third option, invented by Google, is distroless. Distroless is still built from a traditional OS, but Google builds it with Bazel and follows the concept of a minimal attack surface: whatever your runtime actually needs goes into the base image, and everything else stays out. Some other solutions build a container-optimized OS; Google, for example, has gone down that route as well. OK, so why build a new OS? That's complicated, right? Why not leverage an existing traditional OS as the base image? Well, we've had traditional OSes like Ubuntu and Debian for a long time, but they were built for servers and for desktops; they never had containers in mind, and when containers came along, they had to adapt.
Traditional OSes are designed for desktop and server users, with a lot of tools that a container workload doesn't need. And they have their own release cadence. For example, Debian ships its own nginx package tied to a Debian release; nginx upstream doesn't care, it just says, I have a release, my version is 1.0, and the fix will be in 1.1, while Debian decides when to include that version. That release cadence is what worries me: in my experience a release can be three months away, and for those three months we'd be running with that bug. We shouldn't do that, right? I don't want to do that; that's what I want to fix. The second kind of option works for some of your users: for example, if you've built your app on Google's distroless, you don't need to extend it much, and distroless works for you, then you'll just use it.

So why did they release Wolfi? Wolfi takes a little of the concept from each of these. It's a lot like distroless; there's a fun fact that the team that built Wolfi is the same team that built distroless at Google. So it's like distroless, but it's based on a rolling release, like Arch Linux, rather than like Ubuntu; Ubuntu has a release every year, right, whereas with a rolling release, when there's a new upstream version you get it almost immediately. The package ecosystem is built almost the same way as Alpine, but the libc implementation is not the same: it uses glibc. This is very important, because the software we run is built against glibc, so if we use glibc it will be much better for compatibility. It's also a little different from distroless in the build system: distroless uses Bazel, Google's build system, which is built around Google's requirements; it's quite specific, very powerful, but very difficult to use and extend. If you want to add a package, it's very complicated. That's the reason Wolfi doesn't use that build system. So they take the idea of distroless, but instead of basing it on a traditional release they chose a rolling release; they have packaging similar to Alpine, but instead of a libc using musl, they use glibc.

The second option was to choose an official image. The downside of an official image depends on what it's based on; usually people choose Debian. Usually it's not reproducible: all of these images are built with Dockerfiles, and the problem with a Dockerfile is that when you run a command, the result is almost never reproducible. When I talk about reproducibility I mean that when you build the image now and build it again later, it has to be the same; not just "the binary is still version 0.1.0", it has to be bit-for-bit the same. That's reproducibility. And all of these official images have their own release cadence; it's not the same release as upstream, so for a fix, we don't know when it will arrive. With a traditional OS, a package fix works like this: first the fix lands upstream; then the OS starts moving it into its own release, usually through a testing stream first, then the stable stream. So from the moment the package is fixed upstream until it's actually released to you, there's a whole release schedule in between. These images also carry a lot of vulnerabilities and have very limited SBOM support. But the point is, it's very easy to use: they publish the image, the workflow is friendly, and everyone knows how to use a Dockerfile, right?
So it has a lot of cons, but there are pros too. One more thing I want to say is that Google's distroless works for some users if you don't need to extend it too much: if your demand is only that, you don't need to add too many packages or customize distroless too much, and it works for you, that's good. But it's based on a traditional OS, Debian, so the release-schedule problem is still there, right? And there are a lot of issues in the distroless repository about the thing I want to do, extending it, and the distroless maintainers don't have an official answer for this. What do they recommend? You list all the packages and their dependencies yourself, and in some complicated cases the dependencies look like package A needs B, C, D; you generate the dependency list and build your own image. That's what they recommend, so if you have to customize it, distroless isn't very comfortable.

So, can I build my own container image? Yes, you can; it's actually quite easy. If you build from scratch, and I'm not saying you use a Dockerfile to build it, if you use Bazel it's only a few steps. Let's go back to the problem: what would an ideal base image look like, in a perfect world? What do you want? First: minimal. I want to take the concept of Google's distroless, minimum attack surface: what you don't include, you never have to fix. Only what you need goes into the image; what you don't need stays out. The next thing I want is that it's easy to extend; this is to fix the distroless problem. I'll have a base, and extending from my base should be as easy as possible. It has to be reproducible; this is a security property. Reproducible, as I said: if you build it today and build it a year later, it has to be identical. And it has to have SBOMs and attestations. If you give me an image and tell me it includes A, B, and C, and how it was built, then I can build everything again myself and verify it; that's trust, right? So it has to have SBOMs and attestations, where the attestation describes how it was built. This is my goal; that's what I want to build.

This is Wolfi. This is a minimal Wolfi image definition, and it looks very simple: which repository the packages come from, which keyring is used to verify the signatures of the packages, a simple base layout for the structure of the image, and the entrypoint. That's the minimum. It's declarative and reproducible: you lock the packages and you can rebuild it, and it will have the SBOM and attestation included. And SBOMs are not all the same; with a traditional container image, when you have an image with an SBOM you can see roughly which versions are there, but the SBOM quality is different. With Wolfi, they build everything from source, so they know exactly which components go in and generate the SBOM from that; the SBOM quality is the best because it's built from source. If you want to extend Wolfi, you write another declarative YAML definition: you define a new package and how to build it. You don't hand-run make install or strip binaries; this is the way to include a package: simple, declarative, and reproducible, built from source, with the best SBOM quality. Here's an example: the official nginx image on Docker Hub versus the Wolfi-based one. When I scanned the official nginx image for this slide, it had 166 CVEs. The first one is in APT; does nginx really need APT at runtime?
With the Docker approach, the whole APT stack is included in the base. Why should I have to worry about vulnerabilities in APT when I don't even need it? With Wolfi, they only include what is needed, and when a vulnerability does appear in something you need, it can be fixed. Because say I have an nginx 1.0 image and I hand it to a software team: the vulnerability count will increase day by day, so the value of a scan is only valid for a short time. And getting a clean build is just one step in the process of securing your workload; next, you have to keep the workload updated, and you need automation around that. When a project team uses the image, their repository is on GitLab and I enable auto-update: every time the base image is updated, if there's a fix, a request is opened against the project, the project team reviews and tests it, and the maintainer accepts it, bringing the image back to zero CVEs as usual.

I believe the key to zero CVEs is the ability to stay as close to the latest versions as you can. You don't have to run latest, and I don't recommend running latest; I recommend running stable. But you need the ability to follow the latest as closely as possible. In production we have a lot of layers of verification: for example, when a project team wants to update their image, they run it, I run the tests, I build the app. Those are my layers of protection, so I can move quickly. Because if you don't keep updating today, then when a critical CVE shows up, you won't be able to. That's the idea: stay as close to zero CVEs as possible. Any questions?

I have a question. Is there anything like Docker here? In a Dockerfile there is a base image, but in Wolfi, as you can see, there is no base image. What does it build the image from? OK, back to the question: what does a container image actually contain? Do you know what a tar file is? What does the image have inside? Essentially a filesystem layout. So the simple way this image is built is that it takes the packages you declared and lays them out into the base layout, which is very simple, an OS layout with usr/bin, usr/lib and so on, and produces the tar file. There is no base image here; the output is an OS image built from packages that you can push and run in your cloud. Can you explain how compatible it is with the base you had before? After building it, how compatible is it? The image is almost fully compatible. The second part of the question is about the workflow: how is it possible to be that compatible, and how complicated is it? It's not complicated. Take Alpine: Alpine chose musl as its libc implementation, which is very minimal and secure, but software built against glibc is not compatible with it. You can use Alpine and drop a glibc build in and it will work, but Wolfi made the decision to use glibc as the libc to accommodate the ecosystem in a better way. And the other part was about the output: the output of the build is a normal OCI image, so for most of the things we had adapted to the Google distroless images, you can use this as a drop-in replacement. Then that's good.
Building the image at zero CVEs is just step zero, right? After that you have to keep monitoring: I scan the images continuously. There are a lot of approaches to manage this. For example, if you scan a minimal image on its own, the scanner may not know what's inside it; with Wolfi they provide the SBOM as a file alongside the image, and the base layout is declared, so even though it's a fairly small image, the scanner knows exactly what components it contains and the scan quality is much better. Once you have that CVE observability over everything, you scan every day, and when you see a fix has landed upstream it triggers a rebuild and a review; after it's pushed to the registry, you can roll it out. For the updates, you can use the Renovate bot, for example, or write your own bot; the logic differs a little, but the idea is the same: when there's a new image, the bot opens a request, you review the new version, and if you have CI set up, the tests run on that request, and the team just approves the workflow. The key is to know your workload, and to have enough layers of testing that when the bot opens a request, the maintainer is confident the build and the tests prove it's safe to merge. That's what lets you follow the latest versions, and that's why I try to have a lot of layers protecting the environment.

I'd like to ask if you have any advice on prioritizing this for running services. For example, a vulnerability can only be exploited when the vulnerable part of the service is actually exposed, so should the team analyze that first? There's some variability there. OK, the idea here is: when a new CVE comes out, one approach is to analyze it first and see whether it actually affects what you're running. The other approach is, if a patch is available and we have a lot of other tests, we apply the update and don't have to spend time on the analysis; if it passes, we run it. And we have other layers protecting the environment. OK, if you have any more questions, I'll hang around a little bit; time's up. I'll send you a link so you can take a look at the service and the enterprise support, and you can ask your questions afterwards.

For our last talk, we have Pratik Jagrut, who is speaking about harnessing eBPF with next-gen security tools for Kubernetes. Hello everyone, my name is Pratik Jagrut. These are my social handles, and if you want to connect with me you can just scan the QR code.
So today I'm going to talk about eBPF and Kubernetes: how eBPF technology can be used to increase the security of Kubernetes clusters. Does anybody here know about eBPF? Can you hear me? Sorry, should I repeat that? So today I'm going to talk about eBPF and Kubernetes, and how, using eBPF, we can increase the security of Kubernetes clusters. Does anybody here know what eBPF is? This is the general definition from the eBPF documentation: it's a technology that allows you to run sandboxed programs inside the kernel. The kernel is a very delicate part of any operating system, and if you mess up there, the whole system crashes, so it's hard to run anything custom inside the kernel unless it's upstreamed or shipped as a kernel module. Before eBPF there was BPF, the Berkeley Packet Filter, a technology for filtering network packets at the kernel level, but nowadays it does much more than packet filtering: it has been extended and is used for security, network packet filtering, observability, and a whole lot of other things.

So why do we need eBPF? Has anybody here ever written kernel code, or maybe contributed to the kernel? Then you know that developing kernel code is really hard. First, if you have an idea, you need to convince the community that it will be widely used by many other people and companies. Second, you need to be a good C programmer, because when you add any code to the kernel you need to use the right programming model, follow the security rules, and make sure your code doesn't hurt the overall performance of the kernel. And third, the kernel code base is huge; if you go to GitHub and look at the Linux kernel, it's a universe, always expanding. The development process is like any other software development: if you have a few feature items, you discuss them with the community, then it goes through a design phase, coding, submission, and so on. And even after release, it's often three to four years before a feature actually reaches production, because almost no company runs the latest kernel version in production. Those are the issues with the legacy way of adding kernel functionality, and that's what eBPF addresses: it makes the kernel programmable. If you have a custom feature, say a tracing or packet-filtering tool, or just a program that lists the processes running on a machine, you can do it with eBPF without going through kernel development.

So how do we write an eBPF program? First of all, an eBPF program is event driven. In the kernel, things happen through events: if a new process is started, for example, an event fires, and you can attach your program to that event; whenever that event fires again in the future, your program runs. These events are hooks, or probes, also called attachment points. An eBPF program can be attached to kprobes (kernel probes), uprobes (user-space probes), tracepoints, networking hooks for load balancing and similar use cases, and perf events for performance metrics. eBPF programs basically live in two spaces: user space and kernel space; a rough sketch of the user-space side follows below.
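As a rough illustration of what the user-space half might look like in Go, one of the languages the talk mentions, here is a loader sketch using the cilium/ebpf library. The object file name `trace_execve.o` and the program name `handle_execve` are assumptions for illustration, not artifacts from the talk.

```go
// Sketch of the user-space half of an eBPF setup: load a compiled BPF object
// and attach one of its programs to the execve tracepoint.
package main

import (
	"log"
	"os"
	"os/signal"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
	"github.com/cilium/ebpf/rlimit"
)

func main() {
	// Lift the memlock rlimit so eBPF maps can be created on older kernels.
	if err := rlimit.RemoveMemlock(); err != nil {
		log.Fatal(err)
	}

	// Load the compiled kernel-space program (written in restricted C).
	coll, err := ebpf.LoadCollection("trace_execve.o")
	if err != nil {
		log.Fatal(err)
	}
	defer coll.Close()

	prog := coll.Programs["handle_execve"]
	if prog == nil {
		log.Fatal("program handle_execve not found in object file")
	}

	// Attach to the tracepoint that fires whenever a process calls execve.
	tp, err := link.Tracepoint("syscalls", "sys_enter_execve", prog, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer tp.Close()

	log.Println("eBPF program attached; press Ctrl+C to exit")
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, os.Interrupt)
	<-sig
}
```

The kernel-space half referred to here would be the small restricted-C program described next, compiled into the object file that this loader opens.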
User-space applications can be written in Go, C++, or Python, whereas the kernel-space program has to be written in C, and a restricted C at that: it does not have all the features of the usual C runtime. Whenever you write a kernel-space eBPF program, it goes through several stages. First it goes through the verifier, which checks that your program is actually safe to run: that you haven't left a dangling pointer or anything like that, and that the program will not crash your kernel. The JIT is the just-in-time compiler, and maps are for storing your data: whenever you collect data in the kernel it needs to be stored somewhere in some kind of data structure, and in eBPF we use maps for that. From there the program gets attached to its hook at runtime. Some projects built on all of this are Cilium and Falco, which are the most widely used eBPF projects in the Kubernetes community.

This is a basic example of how you can write the kernel-space code. This SEC annotation is predefined by the eBPF libraries; you pass it the hook, the attachment point, in this case the syscalls tracepoint for execve. So whenever any process is started on your system, that tracepoint gets triggered, and the function I've defined here is the one that gets called when it fires.

So how does this help with Kubernetes? Kubernetes is just a cluster of nodes, and every node has its own kernel, so using eBPF you can increase the runtime visibility of what's happening. These are the categories where you see people using eBPF: container security, threat detection, and real-time monitoring, and there are a bunch of projects using eBPF for observability and security monitoring in Kubernetes. So I'll just give a short demo here of one of these security projects, called Tarian. It's a very new project, not known to many people; it's one of my projects, and I'll give you a live demo. Can you still hear me? This is my Kubernetes cluster up and running; I have all the namespaces that get created by default, and then these are the pods run by the Tarian system. Thank you so much, sir.
These are the pods that are up and running for the Tarian system, which my project needs. Then, to show this, there's an nginx application that I'm demoing as a stand-in for a production-level application we're running. Now I'll add a constraint, using the Tarian add-constraint command. With this I'm telling my cluster that only three processes are allowed to run in these pods: pause, the Tarian pod agent, and nginx. Any process other than these is a malicious process and should not run. I'm adding it now, and you can see that it was successfully added. As for the match labels: if you know Kubernetes, you know how labels and selectors work, so whenever any process runs inside a pod with these labels and doesn't match the constraint, it should be flagged as a malicious process. Here I'm adding a watch on the events, and now, if I run any other process using kubectl exec, it should still run: I run a sleep process here, which was not in the allowed list. Sleep was not allowed, and still I tried to run it, and it ran in the container, and now it has been reported as a violation, a container running a process that might be malicious. So that was a small demo of the project I've been working on. That's all from my side; it was a really short and small talk on eBPF. Do you see the kernel having any Rust support for eBPF in the future? I don't actually work with Rust; I'm a Go developer, so I mostly know the Go community and what's happening there. Maybe we can watch the community channels for updates. Thank you guys, thank you so much.