My name is Stefano Babić. I am the author and maintainer of the SWUpdate project and of the update agent. My talk is about delta incremental updates and which open-source projects are available to do this, with the focus of course on SWUpdate: which ways we have in SWUpdate to do an incremental update, what we had in the past, but mainly the new delta handler that was introduced with the last SWUpdate release.

First of all, an overview of SWUpdate. SWUpdate is an update agent. It is integrated into your software, so it becomes part of your distribution. It has different ways to update a system: there is an integrated web server, you can update from the cloud with a connection to a fleet management server, or you can do a local update. There are security features: software is signed, can be encrypted, and so on. SWUpdate is not fixed to one strategy to update the system. One common strategy is the symmetric one, sometimes called A/B or dual copy, and the focus for incremental updates is really this strategy. So what I will show you is adapted to, or simply works with, an A/B approach.

The reason for an incremental update is the increased size. Our projects are becoming much more complex: it is common to have software of 100 to 150 megabytes, and you can also reach a gigabyte of software. Even if our internet connection is often very quick, at least at home, maybe with fiber, there are devices that work on low-bandwidth networks like GSM, so 9.6 kilobits per second. And even if you have a good connection, you have to put your software somewhere in the cloud, or maybe on some service, and when all your devices start to update, they generate a lot of traffic. That means in the end you, as the manufacturer, have to pay a lot of money to the cloud provider. So the reason for incremental updates is usually to save money.

There are different ways in SWUpdate, some of which we had in the past. For example, we can split between OS and application. If we provide a separate update package just for the application, we get some sort of incremental update. This works because there is no project where all features are implemented in the first release, so the application is updated much more frequently than the OS. The big issue is consistency, because as release manager you have to check that your new application is still compatible with the libraries provided by the OS. In SWUpdate there are hooks to provide some consistency checks, but this is something you have to write on your own, so it is project specific.

Splitting the update into application and OS is otherwise quite straightforward. In SWUpdate there is a file called sw-description: it is a meta-description of the release. You can put just the application there and describe how to install just the application; of course, this is really project specific.

What about updating the whole filesystem with some kind of delta? This was done with the librsync handler, which applies a delta file created from a specific version. This is the weak part of this handler, because you have to know which version is running on the device; if you have many devices, you may have to provide many delta files. The usage is also quite straightforward: in the sw-description you set the type for this handler, rdiff, and you have to declare the source, that is, the running software against which the delta is applied.
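To make this concrete: a minimal sw-description entry for the rdiff handler could look like the sketch below. This is my own illustration, not an example from the talk; the device paths are placeholders, and the property name rdiffbase is how I recall the SWUpdate documentation naming the base artifact, so verify it against the handler documentation of your release.

    software =
    {
        version = "2.0";
        images: (
            {
                /* delta created at build time with the rdiff tool */
                filename = "rootfs.rdiff";
                /* librsync-based handler */
                type = "rdiff_image";
                /* standby partition to be written */
                device = "/dev/mmcblk0p3";
                properties: {
                    /* running partition the delta is applied against */
                    rdiffbase = ["/dev/mmcblk0p2"];
                };
            }
        );
    }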
There are some drawbacks to this approach. First of all, we have the rdiff tool to create the delta, but it needs a specific base version. What about when a customer has put a device into a closet for three years and then it is back on the network? It should be updated as well. The other big issue is the size of the delta: with librsync the size of the delta increases, of course, with the number of changes, but there is also a threshold, and past this threshold the delta becomes even bigger than the original file to be installed. So the best usage for this handler is when we have frequent updates, some kind of rolling release, so we know that all devices run roughly the same version and we keep the size of the changes small. Then we have a small delta. But this was also a reason to try to find a better approach.

Starting from this, I wanted to look for a way to compute a delta that is independent of the type of the source, because we don't update just software: we also update manuals, videos, everything on the devices. It should be independent. We have embedded devices, so we cannot use a lot of resources; we cannot use a lot of memory just to apply a delta. And what was important for me: after applying the delta to the running software on the device, the new software on the device must be identical to what I have built on my build system. If I compute the hash on the device itself, I should get the same result as when I compute it on the output of bitbake (see the sketch below).

Starting from this, I began to think about which actors are involved. I have to get something from a server, so what is the role of the deployment server? A simple way could be: the device sends just its version to a server; the server has all possible versions available, computes the delta based on the version or some metadata, sends this delta to the device, and the device applies it. This has a lot of drawbacks. First of all, the server must have knowledge of a lot of versions, and maybe must also know the internals of these versions. Second, I have to change something on the server, so if I have a different server, say a hawkBit server or some other server, I need a different implementation, and that is very, very bad. So starting from this, I tried to find a different approach.
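Coming back to the reproducibility requirement above: a minimal sketch, entirely my own illustration, that hashes a partition on the device with OpenSSL so the digest can be compared with the one computed on the build artifact. The device path is a placeholder, and in practice you would hash only the length of the image, not the whole partition.

    /* hash_partition.c - print the SHA-256 of a block device or file.
     * Compile with: gcc hash_partition.c -lcrypto
     * Compare the output with the digest of the image built by bitbake. */
    #include <stdio.h>
    #include <openssl/evp.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <device-or-file>\n", argv[0]);
            return 1;
        }
        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }

        EVP_MD_CTX *ctx = EVP_MD_CTX_new();
        EVP_DigestInit_ex(ctx, EVP_sha256(), NULL);

        unsigned char buf[64 * 1024];
        size_t n;
        while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
            EVP_DigestUpdate(ctx, buf, n);   /* stream the data, no big allocation */

        unsigned char md[EVP_MAX_MD_SIZE];
        unsigned int mdlen;
        EVP_DigestFinal_ex(ctx, md, &mdlen);
        for (unsigned int i = 0; i < mdlen; i++)
            printf("%02x", md[i]);
        printf("\n");

        EVP_MD_CTX_free(ctx);
        fclose(f);
        return 0;
    }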
So what about if this delta is not computed by an external server but by the device itself? The device knows everything about the running software, so it can compute what is missing. It is not really a delta: the device computes what is missing, just that part.

Comparing the two approaches: if I go the server way, I need a lot of files on the server. This can be difficult in some setups, for example when there is a consortium of multiple vendors and it is not allowed to store a lot of files for each vendor. And if we generate files, we have to provide security, so maybe we have to sign these delta files, which is also bad. In the device approach, the server stays untrusted: if the server is compromised, the device detects this and rejects any update. The advantage of the server approach is, of course, that the device has to do less, which means the update should be faster. With the second approach, running on the device, we gain independence from the version: each device may compute a different delta. If we have ten devices with different versions, we have ten different deltas, but in the end they will all get to the same target version. We can update from any version to any version; this is very good. We are also running on the device itself, so we do not have to send a lot of information, which would also generate traffic, and we can compute any kind of hash on the device itself. Of course, the device has to do more, so there is more load on the CPU; we have a lot of tools in Linux to mitigate this, like cgroups, nice, and so on, but of course the update will be slower, that's sure.

There are some open-source projects for deltas. I skip the first ones, because all of them require starting from a fixed version, and I go directly to the casync approach, because this is more or less the direction we want to go; I will say what is good in casync and why I could not use it. (This is taken from the casync website.) What is good in casync: from my block device it generates a serialized stream. This is very good: everything is a stream, which allows me to avoid any temporary copy. Then, casync works with a rolling hash, which means the source is analyzed and split into chunks. The chunks are dynamic in size: a chunk is closed when a pattern is found or when the maximum size of a chunk is reached (a small sketch of this chunking follows below). So the source is split into multiple chunks, which means, for example, that if I add something in the /etc directory, some chunks will be different but all the rest will stay the same. That's very good. casync then creates what is called an index (caidx) with the hash of every chunk. This is also good, because I need a cryptographic comparison to be sure that what I install is correct. The last part creates most of the problems: casync creates what is called a store (castr), where each chunk is stored in a separate file, and this must be somewhere on the network, in the cloud. I started testing this, and the thing is, since each chunk is in a separate file, I need an HTTP GET request for each of them. Sometimes the connection cannot be reused, so for each HTTP GET request a new SSL connection must also be set up. Checking just the number of transferred bytes, only the payload is reported; but checking the interface statistics, you see the whole transfer, and the overhead is very high. This is very bad; it is not what I want to reach, not at all.
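To make the chunking idea concrete, here is a minimal sketch of content-defined chunking with a buzhash-style rolling hash, in the spirit of what casync (and zchunk, below) do. This is my own illustration, not code from either project: the window size, the boundary mask, and the table seed are arbitrary values, and only the 128 KiB maximum chunk size echoes the default mentioned later in the talk.

    /* chunker.c - content-defined chunking with a rolling hash.
     * A chunk is closed when the hash matches a pattern (h & MASK == 0)
     * or when the maximum chunk size is reached. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define WINDOW    48            /* bytes in the rolling window (illustrative) */
    #define MASK      0x1FFF        /* boundary pattern: ~8 KiB average chunks */
    #define MAX_CHUNK (128 * 1024)  /* maximum chunk size */

    static uint32_t T[256];         /* per-byte random values for the buzhash */

    static uint32_t rotl(uint32_t v, int n) { return (v << n) | (v >> (32 - n)); }

    int main(int argc, char **argv)
    {
        srand(0xC0FFEE);            /* fixed seed: both sides must use the same table */
        for (int i = 0; i < 256; i++) T[i] = (uint32_t)rand();

        FILE *f = fopen(argc > 1 ? argv[1] : "/dev/stdin", "rb");
        if (!f) { perror("fopen"); return 1; }

        unsigned char win[WINDOW];
        uint32_t h = 0;
        long chunk_start = 0, pos = 0;
        int c;
        while ((c = fgetc(f)) != EOF) {
            h = rotl(h, 1) ^ T[(unsigned char)c];
            if (pos - chunk_start >= WINDOW)   /* window full: drop the oldest byte */
                h ^= rotl(T[win[pos % WINDOW]], WINDOW % 32);
            win[pos % WINDOW] = (unsigned char)c;
            pos++;
            long len = pos - chunk_start;
            if ((len >= WINDOW && (h & MASK) == 0) || len >= MAX_CHUNK) {
                printf("chunk at offset %ld, size %ld\n", chunk_start, len);
                chunk_start = pos;
                h = 0;               /* restart the hash for the next chunk */
            }
        }
        if (pos > chunk_start)
            printf("chunk at offset %ld, size %ld\n", chunk_start, pos - chunk_start);
        fclose(f);
        return 0;
    }

Inserting one byte into the input shifts everything after it, but the boundaries are defined by the content, so only the chunks around the insertion change; this is exactly why adding a file in /etc leaves most chunks identical.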
casync is a complex and, as we will see, monolithic project: it is meant to be used as it is. There is no library, and it was not thought to have a library. There is no authentication. And casync forks itself to download the data from the network, which breaks the security established in SWUpdate: there is privilege separation in SWUpdate, which means that when something must be fetched from the network, a process with lower privileges is started, and this cannot be achieved with casync.

Looking past these drawbacks, there is zchunk. This is a project developed by Jonathan Dieter; he lives here in Ireland. It is used in the Fedora project to download the packages. Very interestingly, it uses the same approach as casync: it uses chunks, it uses a rolling hash to define chunks of dynamic size, but it defines a new, self-contained format: there is a header with all the meta information, and then there are the chunks. This is much better and fits my use case more or less. Interesting also is that Jonathan found exactly the same, and I can read from his blog that he also started with casync; we both found that casync is close to what we needed, but casync was thought for another use case. Anyway, I did not need to start from scratch, but I had to change something in zchunk. It makes absolute sense to make the changes first and push everything mainline, because nobody wants to work with a fork. So I started working on zchunk to see what I had to change. First, it should be more library-friendly: there are a lot of assertions in zchunk, but SWUpdate cannot just exit; the library should report runtime errors, and these runtime errors should be evaluated and handled by SWUpdate. The most important thing is uncompressed hashing: zchunk only hashed the compressed chunks, but on the device I have the rootfs, and of course the rootfs is not compressed, so I can only get the chunks and hashes from uncompressed data. I therefore added meta information for uncompressed data, so that in the zchunk format both are covered. And of course I had to extend the API to make the library callable, let's say, from SWUpdate. After fixing some assertions, Jonathan was very kind, merged everything, and it is now in the zchunk mainline.

Just to show: the zchunk format is a header and then a list of chunks. There is some other meta information; most important for this application is the index of chunks, where for each chunk we have a hash for both the compressed chunk and the uncompressed chunk.

Then I have to bring this header to the device. Starting from the build, I have, let's say, a rootfs. I transform the rootfs into the zchunk format. I take the header, which as we have seen is just at the beginning of the file, and I put the header into the SWU, the update package for SWUpdate; that means it is signed. Then I transfer it, for example via the fleet management server. Because I also need to download the missing parts, the rootfs in zchunk format is also put somewhere in the cloud. It can be on the same server, or on another server; the only requirement is that it is reachable by the device via a URL.

In SWUpdate, a specific image is managed with a handler, so I had to implement a new handler: the delta handler. The information for the handler is where I have to install on the device, the source, which is the running software, and the URL where the device gets the file in zchunk format.
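As an illustration of what such an entry could look like: a minimal sketch in sw-description syntax, my own reconstruction rather than an example from the talk. The attribute names url, source, and chain follow my reading of the SWUpdate delta handler documentation; treat all values as placeholders and check the documentation of your release.

    software =
    {
        version = "2.0";
        images: (
            {
                /* zchunk header, shipped inside the signed SWU */
                filename = "rootfs.header";
                /* the new delta handler */
                type = "delta";
                /* standby copy in the A/B scheme */
                device = "/dev/mmcblk0p3";
                properties: {
                    /* full file in zchunk format, stored in the cloud */
                    url = "https://server.example.com/rootfs.zck";
                    /* running copy, used to compute what is missing */
                    source = "/dev/mmcblk0p2";
                    /* handler that receives the reassembled stream */
                    chain = "raw";
                };
            }
        );
    }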
In principle a handler is expected to install everything itself, but if I did that here, I would have a problem: installing on an eMMC is different from installing on UBIFS, different from installing on NOR flash, and so on. In SWUpdate there is a handler for each of these specific storages, and if I did all of this directly in the delta handler, I would duplicate a lot of things. So the solution is that the delta handler does not install: it creates a stream, and since everything in SWUpdate can be streamed, I can forward this stream to another handler. I can chain the handlers: the stream is passed to a handler we already have in SWUpdate. We could even take this stream and send it to another device running SWUpdate, because we have a handler for that too.

Starting from this, what happens is: the header arrives on the device. The device calls the zchunk library and creates a header for the running software, so for the running software I also have a header with the chunks and their hashes. I can then make a comparison: I have a downloaded header and a source header, and I simply iterate over all chunks of the destination and check: can I find a chunk in the source with this size and this hash? If I can find it, it is exactly the same chunk. There is a further structure per chunk, which also contains other information like the offset in each of the two files, so I know whether I can simply copy the chunk or I have to download it. Very good: at this point I have a way to find the delta on the device itself. I just have to find out how to download the missing parts.

The way to do this is byte range requests. In the end, this is what we already do when we watch a video on YouTube and move the cursor to skip something: it is exactly the same. The good thing for my use case is that I can put multiple ranges, so multiple byte ranges, into a single HTTP GET, so I can overcome the problem we had with casync. In an ideal world, I would have just one HTTP GET request containing all the chunks I want to download. In the real world this is not possible, because to avoid denial of service all web servers, Apache, nginx, or whatever, have a maximum number of byte ranges per request. That means in my handler I have to split this and create multiple HTTP GET requests: when I reach the maximum, I create a new one, and I have to account for all of these requests to the server. What I need from the server is really just the chunks, so bytes, and the server answers in exactly this way: it sends a 206 Partial Content, and the content type of the answer is a multipart byte range. So the last building block I need is a parser: I have to parse all of this stuff and extract just the chunks. Let's say this completes the set of building blocks I need, and I can put it all together.

The final step: I have the header that was put into the SWU, which is the header of the new software. I run the zchunk library on my source and find which chunks are different. I need another actor, because the delta handler should not start downloading things itself: to maintain privilege separation there is a downloader, a server at the end, which just gets byte range requests from the delta handler and returns the answer from the web server, or an error if it cannot download. The delta handler, of course, has to synchronize the two streams and recreate the new stream.
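Putting the comparison loop and the range batching just described into one illustrative sketch, entirely my own and not SWUpdate code: destination chunks are matched against the source by size and hash, and the missing ones are coalesced into HTTP Range headers, with a deliberately tiny per-request limit so the splitting is visible.

    /* delta_plan.c - decide per chunk: copy from source or download,
     * then batch the downloads into Range headers. */
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    #define HASH_LEN   32   /* SHA-256 of the uncompressed chunk */
    #define MAX_RANGES 4    /* per-request limit, tiny here on purpose */

    struct chunk {
        uint64_t offset;    /* offset in the zchunk file, used for the download */
        uint64_t size;
        uint8_t  hash[HASH_LEN];
    };

    /* Is an identical chunk available in the running software? */
    static const struct chunk *find_in_source(const struct chunk *src, int n,
                                              const struct chunk *want)
    {
        for (int i = 0; i < n; i++)
            if (src[i].size == want->size &&
                memcmp(src[i].hash, want->hash, HASH_LEN) == 0)
                return &src[i];
        return NULL;
    }

    static void plan_update(const struct chunk *dst, int ndst,
                            const struct chunk *src, int nsrc)
    {
        char header[512] = "Range: bytes=";
        int ranges = 0;

        for (int i = 0; i < ndst; i++) {
            const struct chunk *hit = find_in_source(src, nsrc, &dst[i]);
            if (hit) {
                printf("chunk %d: copy from source offset %llu\n",
                       i, (unsigned long long)hit->offset);
                continue;
            }
            /* missing: append "first-last" to the current Range header */
            char range[64];
            snprintf(range, sizeof(range), "%s%llu-%llu", ranges ? "," : "",
                     (unsigned long long)dst[i].offset,
                     (unsigned long long)(dst[i].offset + dst[i].size - 1));
            strncat(header, range, sizeof(header) - strlen(header) - 1);
            if (++ranges == MAX_RANGES) {   /* server limit reached: flush */
                printf("GET with %s\n", header);
                strcpy(header, "Range: bytes=");
                ranges = 0;
            }
        }
        if (ranges)
            printf("GET with %s\n", header);
    }

    int main(void)
    {
        /* fabricated data: only destination chunk 1 exists in the source */
        struct chunk src[] = { { 0, 100, { 0xAA } } };
        struct chunk dst[] = {
            { 0,   100, { 0xBB } }, { 100, 100, { 0xAA } },
            { 200, 100, { 0xCC } }, { 300, 100, { 0xDD } },
            { 400, 100, { 0xEE } }, { 500, 100, { 0xFF } },
        };
        plan_update(dst, sizeof(dst) / sizeof(dst[0]),
                    src, sizeof(src) / sizeof(src[0]));
        return 0;
    }

A real implementation would track compressed and uncompressed offsets separately and use a hash table instead of the linear search; the structure of the decision, copy versus download, is the point here.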
While doing this, we have at this point all the hashes: hashes for the source chunks and hashes for the downloaded chunks, so every chunk is verified against the correct hash (a small sketch of this check follows at the end of this part). In this way we know exactly what we have created: the created stream is exactly what we have built on our build server, and hashing it on the build server will produce the same hash we have on the device. The created stream is then forwarded to the chained handler, and as I said, because this is generic, it can use any supported handler in SWUpdate. It can even use a custom handler: if someone has a custom handler in Lua, this is also possible, because there is a stream and all the handlers are using streams. That means I can use any supported handler. I have mainly used it on a couple of projects with eMMC; I know from the mailing list that someone used it with UBIFS, because they sent patches; but you can also send the stream to another embedded Linux device running SWUpdate via the swuforward handler. So really, you can use all the handlers we have in SWUpdate.

Just to check what is happening on the device: of course, we run in verbose mode and enable the logging of what this handler is doing. SWUpdate reports for each chunk whether it will be downloaded or simply copied from the device, and it reports the total size against the downloaded size, so you get a feeling for how many bytes were downloaded. The same can be done at build time, but of course you need a starting point: you need to know which release you are starting from. You can use the zck tool; there are some extensions I added to support uncompressed hashes and to use the SHA-256 hash instead of the hash zchunk used before when creating the zchunk format. There is also a tool I developed that is not officially in the zchunk suite; it is in the test directory of zchunk, and you can use it. It does exactly the same thing SWUpdate does: it calls the library and reports back how many chunks would be downloaded or copied, and the total bytes that would be downloaded if this were running on the device.

What is still open? As I said, this was already part of, I think, the 2021.11 pre-release, but it is part of the last release, 2022.05. It was mainly commissioned work and was used with Debian. What is missing is the support to generate the zchunk file and to build the header into the SWU, so it is not a lot of stuff, but it is still open; and let's say it is already working this way. I suppose that in the future there will be some optimizations. For example, on the device the creation of the header from the source takes a lot of time, because the whole software is read and analyzed to get the chunk sizes and the hashes. This can be a problem when an update is triggered, so another way could be, for example, to compute it when SWUpdate is started and keep it until an update is triggered; this can save some time.

A bonus of this kind of update: let's say that on your device something was changed in some way, maybe the filesystem is starting to get corrupted, or something changed due to some bug in the software. That just means that the delta generated on this device is different from that of a device without this problem, but in the end both devices will get exactly the same software.

Quite at the end, I just want to say: this delta handler, the zchunk handler, is more suitable for the dual-copy approach. The good thing is that there are no changes on the server side.
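Coming back to the per-chunk verification mentioned above: a minimal sketch, my own illustration rather than SWUpdate's implementation, of how each chunk, whether copied from the source or downloaded, can be checked against the digest taken from the signed header before it enters the output stream.

    /* verify_chunk.c - compare a chunk against its expected SHA-256 digest.
     * Compile with: gcc verify_chunk.c -lcrypto */
    #include <stdbool.h>
    #include <string.h>
    #include <openssl/evp.h>

    /* Returns true if the chunk content matches the expected digest. */
    static bool chunk_ok(const unsigned char *data, size_t len,
                         const unsigned char expected[32])
    {
        unsigned char md[EVP_MAX_MD_SIZE];
        unsigned int mdlen = 0;
        if (!EVP_Digest(data, len, md, &mdlen, EVP_sha256(), NULL))
            return false;
        return mdlen == 32 && memcmp(md, expected, 32) == 0;
    }

    int main(void)
    {
        /* with a wrong expected digest the check must fail */
        unsigned char wrong[32] = { 0 };
        return chunk_ok((const unsigned char *)"hello", 5, wrong) ? 0 : 1;
    }

A failed comparison aborts the update, and this is why the server can stay untrusted: the expected digests travel inside the signed SWU, so a compromised server can at worst deliver chunks that the device rejects.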
I have talked previously with the hawkBit developers, and they told me: we are developing a server for all kinds of devices, for embedded Linux or microcontrollers or whatever, so we cannot implement something that is specific, for example, to embedded Linux, or specific to having SWUpdate on board; this should be done outside the server. So this was also an argument against doing something on the server side. What you need is this additional file: for each artifact you want to update with the delta mechanism, you need to store one additional artifact. But as I said, you can put it on the fleet management server if you have one, or anywhere in the cloud; the only thing is that the device must have access to this file. And of course, at build time you can know how many bytes should be downloaded. OK, this is my last slide, so I am ready now for some questions.

[Question: did you look at other delta tools?] Yes, I did; maybe I was too fast when I skipped that, but I did. I checked, for example, xdelta; I checked bsdiff; I also checked Google's Courgette, which is made for Chrome. The problem, starting with Google: they do excellent work, but it is really for ELF files. They know it is a program, so they can treat the parts related to the ELF format specially, and then the delta is even smaller. In my case this does not work, because I have programs, I have configuration files, I have videos, I have something else, so this approach does not work. My problem with xdelta is that, as it is, it generates or applies the delta in memory, which means I would need a huge amount of memory just to apply the delta, and this is not compatible with an embedded device. This is also the reason for the change in how zchunk works: instead of applying in memory, it is not really applying at all, it is just writing to the destination, so it does not need so many resources. The main point, really, as you see when you run this, is that it takes a lot of time to calculate the delta, because the running software must be read completely to compute the chunks.

[Question: are you restricted to a read-only filesystem?] No, I am not restricted. You need an A/B scheme, and I am not restricted, but the problem is: if you have a read-write filesystem and the filesystem changes between computing and applying the delta, that is a problem. So it can be read-write, but at least between computing the delta and applying it, the source should stay the same. It is not restricted, though, and as I said, if a filesystem gets corrupted it still works; it will just cause more to be updated.

[Question about the chunk sizes] You mean the chunk sizes? The way it works is with this buzhash, a rolling hash; it is a mathematical approach that is also used in a lot of backup programs, really to find just the parts that changed. Think also about a filesystem: there are a lot of zeros in ext4, for example, and what this algorithm searches for is really a pattern; when a pattern is found, it says the chunk is closed. So a chunk can be very small, or up to the default maximum chunk size, which is set to 128 kilobytes. This can also be changed, but then you have to change it in zchunk and in SWUpdate as well, so it must be exactly the same on both sides. But if you just add a file, you will find that some chunks changed and the rest are identical.

[Question about local updates] If it is a local update, you do not need to save bandwidth, so a full update is quite OK. Ah, OK, a 20-megabyte file: it is possible; a URL is a URL, so it can also be a URL to a local file, so it can be used that way as well. [Question: could you send the delta itself, plus a small SWU describing it?]
Not really, because in that case you are defining the delta on the build system, and then maybe it is a good thing to use librsync instead. Exactly, and that already exists, so both are available. You are the integrator, so you decide yourself whether you want to do a full upgrade or do it with zchunk. One difference compared with librsync: with librsync you can reach the point where the size of the delta is bigger than the original, and this is bad, very bad. With zchunk, the worst case is that all chunks are different, and all chunks are compressed with zstd; the file I deliver can also be delivered compressed with zstd, so in the end I have just the sum of the chunks, and it is roughly the same as the whole artifact. So the difference is not so big; there is a time difference, of course, compared with a plain update, yes.

[Question about how long it takes] Which processor do you have? i.MX6? I have tested on i.MX6, but it depends on your delta. No, the download depends on the delta; the computation depends on the size of the root filesystem, so I cannot say in general. The only thing you have to imagine is that the whole filesystem must be read, and this is what takes time. You can just run a dd on your device and see: the time for reading the whole filesystem you then already know, and on top of that running the rolling hash takes some time. I have no idea right now which part dominates, but the major part is reading back all your software. Let's say it cannot be calculated in general; for a specific project it should be tested. Whether it is an issue depends; in most cases it is not an issue, because with the A/B approach it runs in the background, and if it takes some minutes more, or even more, it does not matter. I do not know if there are projects that really require it to go faster. I think time is up, but that was the last one.