Good morning, good afternoon, or good evening, wherever you are joining this talk from, either in person or virtually. Welcome to the talk "io_uring in the automotive world". I am TVR Prasad. I work as a solution architect at KPIT Technologies. I have more than 12 years of experience in Android and Linux based IVI and cockpit systems. I follow the AGL mailing list regularly to keep myself acquainted with the latest happenings, and I play around with my Odroid C2 and DragonBoard 410c in my free time.

Before we move further, a few words about KPIT. We at KPIT have expertise in all the domains of automotive: 10,000 plus KPITans passionate about mobility and technology, 500 plus vehicle production programs, millions of cars running our software, and a global presence. That speaks volumes about the work we do at KPIT.

Now let's look at the agenda for today. First, we'll start off with an overview of io_uring. We'll follow that up with where io_uring can actually add benefits in automotive. We'll also look at the challenges which are hindering the adoption of io_uring in automotive. Then we'll follow that up with questions. Here are a few abbreviations and acronyms.

Before I get into the talk, I want to thank Stefano Garzarella for letting me use his slides from the talk he gave at KVM Forum. After seeing the talk, I liked the slides and thought I couldn't do any better, so I requested Stefano to permit me to use his slides.
Thanks, Stefano, for letting me use your slides.

So now let's look at the primary goal. With cockpits becoming the buzzword in the automotive industry, container based cockpits and VM based cockpits have come into existence, so handling I/O effectively to extract maximum performance from the hardware is crucial in the cockpit world. With the ever-increasing number of ECUs and the new buses that these ECUs are connected on, handling I/O effectively becomes even more important. And with the mitigations that have been put in place for Spectre and Meltdown, system calls have become expensive, which is one more reason to handle I/O effectively.

Is there anything in Linux that can help us achieve this primary goal? Thankfully, io_uring helps us achieve it significantly.

So what is io_uring? io_uring has been making headlines in the last couple of years in the Linux world. In fact, io_uring has been touted as one of the leading innovations in the Linux world in the last decade. A lot of experts believe io_uring and eBPF will revolutionize programming in Linux in the next decade. Every morning I wake up to see io_uring set new benchmarks or break existing ones, and every kernel release adds new features to io_uring. So io_uring is definitely making headlines.
Let's look at what io_uring is. Before we get to io_uring, we also have to look at the traditional Linux I/O model. As you see in this diagram, the application requests the kernel using system calls like open, read, write, etc., which go to the VFS and the page cache, which decides whether the request needs to go to buffer based I/O, character I/O, or whatever it is, and then processes the I/O. The I/O that you see here, the read, write, etc., is synchronous I/O. Asynchronous I/O also exists, but it is not used extensively apart from the storage world. Threads are used for simulating asynchronous behavior, but the system calls themselves are not really asynchronous. epoll, poll, and select based I/O is also used extensively.

So, as you see, the system call performs a context switch into the kernel world. The kernel handles the system call and returns the result to the application, and the application actually waits in the same thread for the system call to complete. So, as you see, Linux has traditionally had synchronous I/O.

Now let's look at what io_uring does differently. io_uring is an asynchronous I/O interface that was added to Linux in kernel version 5.1. It is not only for block-oriented I/O; you can use io_uring for file and network based I/O as well. There is a pair of rings shared between the application and the kernel, or the user space and the kernel, called the submission queue and the completion queue. Three new system calls have been added to make use of io_uring: io_uring_setup, io_uring_register, and io_uring_enter. We would recommend using liburing instead of the system calls directly, as it hides the intricacies of using the system calls.

With this, let's now look at the io_uring operation in detail. Whenever an application needs a system call to be performed, the application produces an SQE, which is called a submission queue
entry, filling in the details of the system call: the opcode, which refers to the system call that needs to be executed; the flags, which describe how the system call needs to be executed; and the parameters to the system call, like the file descriptor, address, offset, etc. User data is more or less like a cookie, which helps relate the submission queue entry to the completion queue entry. Once all the fields in the submission queue entry are filled in, the submission queue tail is updated and the application invokes the io_uring_enter system call.

Once this is done, there is a context switch into the kernel. The kernel consumes the SQE, updates the submission queue head, and processes the operation requested in the opcode. The kernel then produces a completion queue entry, fills in its fields, like the result of the system call and the user data that was passed in the SQE, and updates the completion queue tail. Once this is done, the application consumes the completion queue entry and updates the completion queue head.

So, as you see, io_uring uses a pair of rings between the kernel and the application: one ring to request the operation that needs to be performed, and a separate ring to give the result of the requested operation back to the application.

Now let's look at the performance benefits that io_uring brings and how it gets them. io_uring provides something called resource registration. Whenever a file descriptor is passed to a system call, the kernel's internal reference to that file descriptor is acquired on system call entry and released on system call exit. Similarly, whenever a user buffer is passed to a system call, the pages corresponding to that buffer are pinned on system call entry and unpinned on system call exit. This happens for every system call that is
done. To avoid these unnecessary operations, io_uring provides a way to pre-register the buffers and file descriptors with the ring. This way, the need to pin and unpin the pages and to acquire and release the file references for every system call can be avoided. Please note that when the resources are registered, the _fixed variants of the functions for preparing the SQE need to be used. For example, the _read_fixed and _write_fixed functions (io_uring_prep_read_fixed and io_uring_prep_write_fixed in liburing) are the ones to use with buffers that are already registered with io_uring; otherwise the operations would be the plain IORING_OP_READ and IORING_OP_WRITE. This is one of the ways in which io_uring reduces the cost of every system call.

io_uring also provides what are called linked commands. To understand what a linked command is, let me give an example. Consider a web server running on an IoT device, which needs to accept an incoming connection request, receive a request, process the request, send the response, and close the connection. So there are four system calls that get executed here: accept, receive, send, and close. io_uring provides a way to link all of these system calls together and execute just one system call in place of four. The accept, receive, send, and close operations can be chained and handed to the kernel in one system call, which reduces the number of system calls. Please note that a subsequent SQE is picked up only if the previous SQE is successful. For example, the receive SQE is picked up only if accept is successful; similarly, the send SQE is picked up only if receive is successful. If, for example, accept fails for some reason, the remaining operations in the chain are not executed and the failure is reported back to the application. This brings tremendous performance benefits in light of the Spectre and Meltdown mitigations. The number of system calls is
reduced, and the I/O is definitely going to be performant.

io_uring also provides what is called polled I/O. In SQPOLL mode, a kernel thread is used to poll the submission queue for any new submission queue entries. This helps in very low latency use cases: instead of the application making a system call to let the kernel know about a new submission queue entry, the kernel keeps polling the submission queue to see whether there are new entries. Mind you, this definitely has an impact on CPU usage, but as I mentioned, this is for cases where latency is of prime importance.

Last but not least, one of the important features that io_uring supports is LSM auditing, which means that io_uring objects can be used with mandatory access control. Labels can be applied to io_uring objects, and SELinux and Smack rules and policies can be written so that mandatory access control allows only the processes which are permitted to access the rings to read from and write to them. This is very important in terms of automotive security.

So, as you see, io_uring has all that it needs to be adopted in the automotive industry. io_uring has been adopted by Android for its software update implementation in Android 13, where it is used for merging the snapshots. In the OTA world, Android also uses a user-space block device based on io_uring for the software update, and the fastboot utility also shows code pertaining to io_uring. This shows that Android has adopted io_uring for its use cases. Similarly, Rust and C++ have also adopted io_uring, and databases have started adopting io_uring for its performance benefits.

Now let's look at where io_uring can add value in the automotive industry. Are there any use cases where io_uring can add value?
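Before moving on to the use cases, the ring mechanics and the linked-command behavior described so far can be sketched with a small toy model. This is only an illustration under stated assumptions: it does not call into the kernel or liburing, and the names `ToyRing`, `SQE`, `CQE`, and `IO_LINK` are made up for this sketch (with `IO_LINK` standing in for the real `IOSQE_IO_LINK` flag). It shows three things from the talk: the user data cookie tying a completion back to its submission, one "enter" call covering several queued operations, and the rest of a linked chain completing with `-ECANCELED` once one link fails.

```python
import errno
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List

IO_LINK = 1  # hypothetical stand-in for IOSQE_IO_LINK in this toy model


@dataclass
class SQE:
    op: Callable[[], int]  # the "opcode": returns a result, or -errno on failure
    user_data: int         # cookie copied unchanged into the matching CQE
    flags: int = 0


@dataclass
class CQE:
    user_data: int
    res: int


class ToyRing:
    """Toy model of a submission/completion queue pair (not real io_uring)."""

    def __init__(self) -> None:
        self.sq: Deque[SQE] = deque()
        self.cq: Deque[CQE] = deque()
        self.enter_calls = 0  # how many times the "kernel" was entered

    def submit(self, sqe: SQE) -> None:
        # Application side: produce an SQE (advance the SQ tail).
        self.sq.append(sqe)

    def enter(self) -> int:
        # "Kernel" side: consume every queued SQE and post one CQE for each.
        self.enter_calls += 1
        consumed = 0
        chain_broken = False
        while self.sq:
            sqe = self.sq.popleft()
            consumed += 1
            # Once a link has failed, the rest of that chain is not executed.
            res = -errno.ECANCELED if chain_broken else sqe.op()
            self.cq.append(CQE(sqe.user_data, res))
            # A failure only carries over while the link flag continues the chain.
            chain_broken = bool(sqe.flags & IO_LINK) and res < 0
        return consumed

    def reap(self) -> List[CQE]:
        # Application side: consume the CQEs (advance the CQ head).
        done = list(self.cq)
        self.cq.clear()
        return done


ring = ToyRing()
ring.submit(SQE(lambda: 3, user_data=1, flags=IO_LINK))                  # "accept" succeeds
ring.submit(SQE(lambda: -errno.ECONNRESET, user_data=2, flags=IO_LINK))  # "receive" fails
ring.submit(SQE(lambda: 64, user_data=3, flags=IO_LINK))                 # "send" is skipped
ring.submit(SQE(lambda: 0, user_data=4))                                 # "close" ends the chain
ring.enter()
results = {cqe.user_data: cqe.res for cqe in ring.reap()}
```

In this model the application learns about the failure by reading the completion entries, matched by their user data, rather than from the return value of the submitting call.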
We have identified a couple of use cases where we believe io_uring could add value in automotive, and we have started gathering data about the performance benefits that io_uring can bring to these use cases.

The first use case is media indexing. Whenever a USB drive or an SD card with media content is plugged into an infotainment system, the media scanner or media indexer kicks in. It iterates through the directories to identify the media content, either through the MIME header or through the file extension. Once it identifies a particular piece of media content, it extracts the metadata, the album art, etc. from the file and stores that in a database. Typically, media storage devices, which are gigabytes in size and could hold anywhere from 1,000 to 5,000 songs, have to undergo this whole process. Let's say I have 5,000 songs in at least 1,000 directories. The 1,000 directories have to be traversed, each file needs to be opened, the metadata needs to be extracted, and all that detail needs to go into a database. A huge amount of I/O is involved in this process. All this can be done with io_uring in a more performant way, where the number of system calls is brought down by chaining operations such as opening the file and getting the metadata, and a lot of performance benefit can be reaped this way. Mind you, readdir and lseek are two system calls which a lot of open source media indexing components use, but these are currently not supported by io_uring. It would be really good to have them supported in io_uring. So this is one of the use cases where we believe io_uring can definitely add value.

Now let's look at a reference architecture for a container based cockpit before we look at the next use case. As you see here, we are considering a container
based cockpit running on a multi-core SoC. We have multiple containers, like the sysman container, the cluster container, and the IVI container. We also have an ASIL-B vehicle processor, which provides the CAN details over SPI or UART. There is also a safety monitoring island, which interacts with the Linux kernel over shared memory, typically through an RPMsg based interface.

When we want to do a software update for a container or hypervisor based system, there is definitely a lot of I/O involved to download, flash, and verify the image. And it is not only the primary system: the other ECUs connected to the IVI or the cockpit are involved as well, and those ECUs are connected on various buses. So there is a lot of I/O involved in doing a software update. Also, if A/B update is the preferred solution of the Tier 1 or OEM, the backup partitions also need to be updated after a successful reboot of the system. To make this statement clear, let's say the system boots up from the A partition and the software update updates the B partition. On booting successfully from the B partition, the A partition also needs to be updated. So a huge amount of I/O is involved in software update. If you look at a hypervisor based or VM based cockpit, the VMs could get updated; in a container based cockpit, the containers could get updated; the ASIL-B component could get updated; the safety monitoring island could get updated. io_uring can definitely play a significant role in software update, and that is the reason we have started evaluating the impact of io_uring on software update.

Now let's look at the next use case. As you have seen in our reference architecture, we are using a container based system, and we are using binder for inter and intra container RPC. So let's take a use case. When media is
playing on the IVI, the cluster is updated with the duration of the song and the now-playing metadata of the song, and a file list or browsing list for the songs is also displayed on the cluster HMI. For a song with a duration of 5 minutes, 300 messages are sent over binder to indicate the elapsed duration of the song to the cluster: every second, a duration update is posted to the cluster so that the HMI gets updated. This gives you an idea of the number of RPC messages that can flow between containers or within a container. If the RPC transactions are not optimized, there is going to be a significant cost in terms of I/O.

So we are focusing on optimizing the binder RPC transaction times using io_uring. A binder transaction typically uses a single ioctl or multiple ioctls to complete the transaction. With the addition of async ioctl support through IORING_OP_URING_CMD, we want to see how we can leverage this to optimize binder transaction times. Using the NVMe passthrough driver as a reference, we are trying to implement the io_uring command as an async ioctl in the binder driver and implement the binder transactions on top of it, to evaluate the performance benefits that we would get from using io_uring for binder.

Not only binder: io_uring can definitely play a significant role in other RPC mechanisms as well. For example, RPCs like D-Bus and Cap'n Proto, which use sockets as their transport layer, can also benefit a lot from io_uring. The buffers and the descriptors can be pre-registered with the ring, and the performance benefits can be evaluated. We know that Cap'n Proto and D-Bus use Unix and INET domain sockets as the transport layer, so the socket system calls can be chained together to extract performance benefits from io_uring. We are currently trying to evaluate the
performance of io_uring zero-copy send with Cap'n Proto. We will definitely publish the numbers as and when we have them. Beyond containers, we are also evaluating whether vsock sockets can benefit from io_uring. We are currently using vsock for RPC between VMs in a hypervisor based or VM based cockpit. We have our own RPC, which is based on sockets between VMs, and we are trying to evaluate the benefit that we would get from using io_uring with vsock.

Moving ahead, I want to show you a small snippet from a CAN over Ethernet or WLAN bridge, taken from a presentation given by Oliver Hartkopp at an AGL summit. As you see here, a raw CAN socket is opened, and a UDP socket is opened for the WLAN network interface. The usual bind and connect are already implemented. You have a while (1) loop here, which reads a message from the CAN socket and writes the same message to the WLAN socket. This is a simple CAN over WLAN bridge. As you see, read and write are two system calls that are needed for this bridge to be realized.

The same can be realized with one system call using io_uring. Let us look at that. As you see here, in this example the while loop gets an SQE, a submission queue entry, prepares the read request, and sets the IOSQE_IO_LINK flag, which indicates that the operation is chained with the subsequent operation, meaning that the two operations need to be done together, in order. The user data is set. Then a new submission queue entry is obtained, the write request is prepared, and its user data is set. With one io_uring_submit, with just one system call, you submit both these requests to the kernel. The previous code snippet, which took two system calls per message, now takes just one. Now consider the amount of traffic that you would have on CAN. If the number of packets received on CAN is 100, you would have
200 system calls with the previous snippet, and with the current snippet you would have just 100 system calls. Mind you, there could also be other optimizations which are not shown here for lack of space; the code could be optimized further to link more of those operations and end up with just a couple of system calls.

We are also looking at various other use cases. One of them is an eAVB based use case, where we have telematics systems connected over Ethernet. There are telematics calls and lip-sync audio use cases with eAVB, and we are trying to evaluate whether io_uring can add value there. We also have a couple of cockpit related use cases, in both hypervisor and container based cockpits, which we will take up after the current set of use cases is evaluated. There are some use cases around camera where we believe io_uring can definitely play a significant role. As I mentioned, we will take up all these use cases once we have the current set of use cases evaluated and adapted to the latest io_uring.

Now let's look at the challenges involved in adopting io_uring. One of the major challenges is that io_uring requires the latest and greatest kernel versions, and very few SoC vendors support the latest kernel versions or provide a BSP with an upstream version of the kernel. This is one of the major hindrances to adopting io_uring in the automotive industry.

The second challenge is that the dependent open source libraries also need to adapt to io_uring. To explain this, let me take a use case. We have our media indexing engine, which is based on TagLib. TagLib also needs to adapt to io_uring for our media indexing engine to use io_uring effectively. Similarly, other open source implementations, like SQLite, need to adapt their implementations to io_uring, so that the middleware and the application components built on top of them
reap the benefits of io_uring.

Last but not least, developers are still not comfortable with the async I/O programming model that io_uring brings in. They are still more comfortable with the threading and synchronous I/O programming model. This is where the industry and the community need to work together to help developers get comfortable with the async I/O programming model of io_uring.

To understand the challenges in more detail, let me walk you through an example. I have set up an SSH session into my Odroid C2, which is running a Linux 5.9 kernel and an Armbian distribution. We have two versions of TagLib: one which uses io_uring and one which uses the traditional I/O model. Let's run an application called TagReader, which reads out the tags from a given MP3 file. Let me run TagReader on an MP3 file. As you see, it took around 30 milliseconds to read the tags from warning.mp3 using the traditional TagLib. Now let's look at the same statistics with the io_uring based TagLib. It took almost the same time; considering environmental factors, I would consider this to be the same.

But why are we not seeing a performance benefit with the TagLib that uses io_uring? To explain this, I will have to walk you through the source code of TagLib and also the strace of TagReader. Let's look at the source code of TagLib. As you see here, TagLib uses the function readFile, which is an fread based on the C FILE API, and the function readBlock uses readFile to read the data from the file. To extract the MPEG (MP3) data, as you see here, TagLib uses seek, which is an lseek underneath, followed by a read. Both of these are synchronous operations. As I mentioned earlier, lseek is not yet supported by io_uring, and the read here is a synchronous read, because the caller expects the data to be available as soon as read returns. So, with no disrespect intended to any of the
TagLib developers, TagLib was never designed for handling async I/O. To make this statement clear, let me walk you through the strace output of the TagReader application. As you see, the TagReader application uses lseek followed by read operations that are submitted through io_uring one at a time. To reap the benefits of io_uring, the read operations at various offsets could be chained, since io_uring lets us pass the offset as a parameter to read. So instead of doing an lseek before each read, a series of read operations at various offsets could be chained together and executed with one system call. This is how we can extract the maximum performance benefit from io_uring. But as I have said and shown, the current TagLib implementation is not designed for async I/O, which is what we would need in order to submit multiple I/O requests to io_uring at once. I hope this makes the challenge clear.

In conclusion, I want to say that automotive systems involve a lot of I/O, and io_uring helps make I/O performant. The automotive industry and the community need to come together to drive the development of new features in io_uring and trigger widespread adoption. For example, as I mentioned earlier, readdir and lseek are two operations which are not supported but are needed by various media indexing engines. The community and the industry can work together to make sure that these operations are supported by io_uring and upstreamed. So the collaboration between the automotive industry and the community is crucial. There are challenges with using io_uring, and those challenges can be overcome with the industry and the community working together.

Last but not least, io_uring is not the solution for everything. It needs to be used with care. These are the words that Jens himself mentioned in the Kernel Recipes talk he gave this year. With that, I want to say:
Let us know your io_uring use cases. Please write to us, and please share your feedback on this talk. Thank you for letting me present on io_uring. Thanks.
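As a footnote to the TagLib example in the talk, the difference between "seek, then read" and "read at an offset" can be sketched as follows. This is a minimal sketch and does not use io_uring at all: it uses Python's `os.lseek`, `os.read`, and `os.pread` on a temporary file, and the file layout (a 10-byte header and a tag block at offset 128) is a made-up assumption for illustration. The point is that a positional read carries its own offset and does not depend on the shared file position, which is the property that would let several such reads be queued as separate submission queue entries and submitted together.

```python
import os
import tempfile


def read_with_lseek(fd, requests):
    """Traditional pattern: one lseek plus one read per chunk (two calls each)."""
    chunks, calls = [], 0
    for offset, length in requests:
        os.lseek(fd, offset, os.SEEK_SET)
        chunks.append(os.read(fd, length))
        calls += 2
    return chunks, calls


def read_with_offsets(fd, requests):
    """Positional pattern: the offset travels with the read (one call each)."""
    chunks, calls = [], 0
    for offset, length in requests:
        chunks.append(os.pread(fd, length, offset))
        calls += 1
    return chunks, calls


# Hypothetical layout: a 10-byte header, padding, then a tag block at offset 128.
with tempfile.NamedTemporaryFile() as f:
    f.write(b"ID3-HEADER" + b"\0" * 118 + b"TITLE=warning")
    f.flush()
    fd = os.open(f.name, os.O_RDONLY)
    try:
        requests = [(0, 10), (128, 13)]  # (offset, length) pairs
        seek_chunks, seek_calls = read_with_lseek(fd, requests)
        pread_chunks, pread_calls = read_with_offsets(fd, requests)
    finally:
        os.close(fd)
```

Both helpers return the same data; the positional variant simply needs half as many calls here, and because each request is self-contained, the requests no longer have to be issued strictly one after another.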