Okay, so, as announced, we will be exploring how to use casync for container image distribution. First, my name is Agon, I live in Berlin, I work at Kinvolk, and we are basically a team of developers. We like to work on container things, both on Kubernetes and plain Linux containers. Today I will talk about the existing distribution mechanisms for container images. I will mention a couple of them, and I will explain the problem I'm trying to solve: optimizing network traffic, trying to transfer less data over the network. I will mention two different ideas, and I will focus on casync. Then I will explain my experiments and show some numbers and some graphs about what I've got. So, first, I'm an rkt developer, so that's what I know best. In rkt, there are two mechanisms supported for downloading container images. There is, of course, the Docker way: connecting to a Docker registry and getting the Docker image from there, using the Docker distribution protocol. And rkt supports the ACI discovery protocol as well. I don't mention OCI here because, for now, OCI is about the image format; it's not really about distribution. So, the way we get an image from the Docker Hub, or any Docker registry, works like this. Here I have a Kubernetes node running rkt, and when I fetch a Docker image, rkt uses the docker2aci tool internally to support the Docker distribution protocol automatically. It downloads all the layers of the Docker image from the Docker registry. In this model, you can configure your Kubernetes cluster to use either the Docker Hub or your own registry, if you want to. The ACI discovery protocol is different. When you specify an image, for example, in this case, coreos.com/etcd, rkt will first go to the web page coreos.com/etcd and look at that page.
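The first discovery step can be sketched like this. The domain and the URL template below are made up for illustration; only the `?ac-discovery=1` query and the template variable names come from the appc discovery spec:

```go
package main

import (
	"fmt"
	"strings"
)

// discoveryURL builds the page rkt fetches over HTTPS for an image
// name like "coreos.com/etcd"; the page is expected to contain
// <meta name="ac-discovery" ...> tags pointing at the artifacts.
func discoveryURL(image string) string {
	return "https://" + image + "?ac-discovery=1"
}

// renderTemplate expands an ac-discovery URL template. The variables
// {name}, {version}, {os}, {arch} and {ext} are the ones defined by
// the appc discovery spec.
func renderTemplate(tmpl string, vars map[string]string) string {
	for k, v := range vars {
		tmpl = strings.ReplaceAll(tmpl, "{"+k+"}", v)
	}
	return tmpl
}

func main() {
	fmt.Println(discoveryURL("coreos.com/etcd"))
	// A hypothetical template advertised in a meta tag:
	tmpl := "https://cdn.example.com/{name}-{version}-{os}-{arch}.{ext}"
	fmt.Println(renderTemplate(tmpl, map[string]string{
		"name": "etcd", "version": "v3.2.0",
		"os": "linux", "arch": "amd64", "ext": "aci",
	}))
}
```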
Inside, there should be meta tags that explain how to get the image. So, it's more decentralized, but it relies on DNS to get the image. In this example, the meta tags say to go to another server to download the image. So, rkt will download an ACI file, which is actually a tarball. That tarball contains the manifest and the rootfs, that is, all the directories and files of your container. Optionally, your image can refer to another image: if it has a parent image, rkt will repeat the process. With both of those models, there are two issues I can see with wasted network bandwidth. In this diagram, I have several Kubernetes clusters, all of them configured to take images from the Docker Hub. So we see it's a quite centralized setup. The first issue is: if you update your Kubernetes application to the next version, all the nodes of your Kubernetes clusters will, at around the same time, download the next version of the container image from the same place. That can put a lot of pressure on the centralized registry. The other issue is that when you upgrade your application from one version to another, the two versions of the same container might not differ that much. Maybe there is some change in a binary, but most of the data files will be the same. So it seems quite wasteful to download the full image. For these two problems, the centralized registry on one hand and the small difference between two versions on the other, there are two different strategies people have thought about. The first one is to use BitTorrent. There is an example of people doing that with quay.io, which is a registry for Docker images, and for ACIs as well. You can check their blog post, where they describe a way to use BitTorrent so that there is less pressure on the registry. And in rkt, there was some discussion about using BitTorrent as well, but nothing was merged or finished. The other way is using casync, which is the one
I will focus on for this talk. So, the motivation for using casync is to only download the changes between two versions. Here I have two versions of one image. The difference between version one and version two of the image is only one small piece that changed: it goes from orange to green in the diagram. In this case, I would like to only download that small part and not re-download everything when I upgrade. So, the idea is to cut the image into chunks. But there is a problem: if we do it naively, without thinking too much, and the image gets bytes inserted or removed, cutting into fixed-size chunks will not really work, because the content of every following chunk shifts, so their hashes will be completely different. To fix this problem, casync does something different: variable-size chunking. The size of each chunk varies depending on the content. It uses a rolling hash that gives a probability of either taking more data into the chunk or stopping there. This way, it is more likely that the following chunks are not affected. So, here, in the orange chunk, I remove some bytes, but casync still manages to produce the same chunks for the rest of the image, so it only has to download a small part. So, what's the full process for using casync? First, when you have the container image and you want to casync it, you need to serialize the data. You can do that with a tarball, or with casync's own serialization of your directories and files. When you have your whole container image serialized, you split it into chunks of variable size. Then casync hashes each chunk, and the chunks are put into a content-addressable store. Each chunk is compressed, so they can be downloaded individually. When the container runtime wants to download the container image, it uses an index file to know which chunks to download, and it does basically the same process in reverse. So, what I did was a proof of concept to try that in rkt.
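The serialize, chunk, hash and store pipeline described above can be sketched with a toy chunker. This is not casync's actual algorithm (casync uses buzhash and much larger average chunks, around 64 KiB); the constants here are made up, and the "rolling" hash is a crude stand-in where old bytes simply age out of a 32-bit value:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const (
	minChunk = 64   // never cut before this many bytes
	maxChunk = 1024 // always cut at this size
	mask     = 0xFF // cut when the hash's low bits match the mask
)

// chunk splits data at content-defined boundaries: the cut points
// depend on the bytes themselves, not on absolute offsets, so an
// insertion early in the stream only disturbs nearby chunks.
func chunk(data []byte) [][]byte {
	var chunks [][]byte
	start := 0
	var h uint32
	for i := range data {
		// Crude running hash: older bytes shift out of the top bits.
		h = h<<1 + uint32(data[i])
		size := i - start + 1
		if (size >= minChunk && h&mask == mask) || size >= maxChunk {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
			h = 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

func main() {
	data := make([]byte, 4096)
	for i := range data {
		data[i] = byte(i*31 + 7)
	}
	for _, c := range chunk(data) {
		// Each chunk would be stored under its crypto hash, compressed.
		sum := sha256.Sum256(c)
		fmt.Printf("%x %d\n", sum[:4], len(c))
	}
}
```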
So, what I want is: I do an rkt fetch of version one of my image, and it will initially have to download everything; but when I download version two, I only need to download the difference, because I will have cached the chunks in the local chunk store. I integrated that into the way the ACI discovery protocol works: I'm just recognizing the meta tags in the HTML, and if I have an image with a specific extension, the index file for casync, then rkt knows that it has to use casync to download the chunks, and it will use the default store directory, which is default.castr. There is a pull request for that, but it's just a proof of concept; the implementation is quite basic, just to try things out. I have a list of to-dos that I would like to try, or maybe some of you are interested in trying them out. There is now a library for casync in Go. Since rkt is written in Go, that could be more practical than calling the casync binary. There is no garbage collection for the chunks yet, so if you use it for a long time, the chunk store will keep growing; that's not very practical yet, so we need some implementation in casync for that. Another idea is to use FUSE with casync: there is a FUSE option to be able to mount the container image immediately and download the chunks on the fly as they are needed. So, that was a bit of theory; does it actually work in practice? To know whether I can actually save bandwidth with that idea, I ran some experiments, and I have some graphs from testing different images to see if it works. The first image I tried was the registry image available on the Docker Hub. I downloaded a lot of different versions, from the quite old 0.6.1 to 0.7.1. The blue line is the size of the compressed registry image: the size I would have to download if I were not using casync. I compared that to what happens if I use casync to download all the versions, one after the other, starting from the first. So, on the first download, using casync doesn't save anything.
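The caching behaviour described at the start of this part, downloading everything for version one and only the difference for version two, amounts to a set difference over chunk IDs. The IDs and the store representation below are made up for illustration:

```go
package main

import "fmt"

// chunksToFetch returns the chunk IDs listed in an image's index
// that are not yet present in the local chunk store.
func chunksToFetch(index []string, store map[string]bool) []string {
	var missing []string
	seen := map[string]bool{}
	for _, id := range index {
		if !store[id] && !seen[id] {
			missing = append(missing, id)
			seen[id] = true
		}
	}
	return missing
}

func main() {
	// Made-up chunk IDs for two image versions sharing most content.
	v1 := []string{"aa11", "bb22", "cc33", "dd44"}
	v2 := []string{"aa11", "bb22", "ee55", "dd44"}

	store := map[string]bool{}
	// Fetch of version one: the store is empty, everything is downloaded.
	for _, id := range chunksToFetch(v1, store) {
		store[id] = true
	}
	// Fetch of version two: only the changed chunk is missing.
	fmt.Println(chunksToFetch(v2, store))
}
```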
It's actually a bit worse, because I don't have anything in the cache yet. But after the third version, I save quite a lot of network bandwidth and don't have to download so much. That's simply because the content didn't change that much, so there is not much to fetch. Sometimes I see that I actually have to download everything again, probably because it's a major version, or a lot of things changed, so there is not much to win. I tried with other images. With Ubuntu, I tried different snapshots, and I see that with this image as well, after the first version I can save some network bandwidth, and with BusyBox as well. It looks not too bad. Then some other software, and with this one I didn't get what I expected: here I see that I can use the same method, but it actually seems worse most of the time. I had a look at the Prometheus image to try to understand what's happening there. The image is based on BusyBox. BusyBox is quite small, so the problem is not there. Mainly, the Prometheus image is just two big binaries of 16 MB, something like that. Between each version, there are some lines of code changed, not that much, but still, compiling completely changes the binaries, so chunking doesn't help very much. In casync upstream, there are some people trying to make it better for this kind of situation, by recognizing the ELF format of binaries to decide where to cut the chunks. So that's about it. With this, I can save network bandwidth for a lot of images, but it's not a solution to the whole problem. And properly integrating it would be a lot of work; what I did was just a proof of concept. Are there any questions?

Question: Why is it worse on the first download?

Answer: I have a guess; I'm not sure if it's really right. When downloading without casync, the whole image is compressed in one go. When using casync, each chunk is compressed individually.
So probably the compression algorithm is not as effective when it only compresses small chunks individually. That's my guess, but I didn't check.

Question: I wonder how you deal with the fact that there might be bytes missing in the middle, because, as far as I understand, the chunk size is fixed, right? Okay, but I mean, it's not fixed based on the content, or... So, trying to understand: how do you detect that a chunk is going to be smaller this time?

Answer: It's based on a rolling hash over the contents of the data. We decide roughly how big the chunks are going to be, but it's not an exact size; it's around 64 kilobytes. Every time we advance by a byte, we update the rolling hash, and that gives us a probability of cutting the chunk at that position.

Question: You use a rolling hash, essentially. Okay, because I thought you were using a crypto hash, like adding a byte and rehashing to find the window, and that seemed expensive. So you're using both a crypto hash and a rolling hash?

Answer: The rolling hash is to decide where to cut. Once we have cut the chunk, we address it in the store using a crypto hash.

Question: Hi. I'm also curious how well this works with, for example, btrfs with compression on. And not only compression, but also things like encryption and so on. How well does the chunking work with compressed or encrypted blocks?

Answer: Do you mean when the image is encrypted?

Question: For example, if you have btrfs with encryption on top, using blocks or something like that. How well does the chunking work with things like that?

Answer: I guess it would not work. That's why I do the serialization on the uncompressed tarball, not on the filesystem directly, because I don't know how it would be possible to do that with compression.
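The rolling hash discussed in these questions can be sketched like this. Casync itself uses buzhash; this Rabin-Karp style polynomial variant is just an illustration, and the window size and base are made up. The key property is that sliding the window by one byte is O(1): the outgoing byte's contribution is subtracted and the incoming byte added, so the hash always depends only on the last few bytes, wherever they sit in the stream.

```go
package main

import "fmt"

const (
	window = 48  // bytes the hash covers (made-up value)
	base   = 257 // polynomial base (made-up value)
)

// pow = base^(window-1), precomputed for removing the oldest byte.
var pow = func() uint64 {
	p := uint64(1)
	for i := 0; i < window-1; i++ {
		p *= base
	}
	return p
}()

type rollingHash struct {
	h   uint64
	buf [window]byte // ring buffer of the current window
	n   int          // bytes processed so far
}

// roll slides the window forward by one byte in O(1).
func (r *rollingHash) roll(b byte) uint64 {
	old := r.buf[r.n%window]
	r.buf[r.n%window] = b
	if r.n >= window {
		r.h -= uint64(old) * pow // drop the byte leaving the window
	}
	r.h = r.h*base + uint64(b) // bring in the new byte
	r.n++
	return r.h
}

func main() {
	data := []byte("some stream of container image bytes, long enough to fill the window")
	var r rollingHash
	for _, b := range data {
		r.roll(b)
		// A chunker would cut here when the hash hits a chosen pattern.
	}
	fmt.Printf("hash over the last %d bytes: %d\n", window, r.h)
}
```

A chunker built on this cuts wherever the hash matches some bit pattern, then names the finished chunk with a crypto hash, matching the two-hash split described in the answer above.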