Okay, so hello everybody. My name is Marek Novák and together with my colleague Dušan Červenka we will tell you something about asymmetric multiprocessing (AMP) in embedded Linux. So first of all, let me tell you something about myself and my colleague. I am the author and maintainer of the RPMsg-Lite library, which we will learn about later in the presentation, and my colleague Dušan is the author and maintainer of the eRPC library, which we will also learn about. We are both very enthusiastic about this, and we work at NXP Semiconductors. During our presentation we will introduce you to asymmetric multiprocessing as we understand it, then to remote processor messaging (RPMsg) in the Linux kernel, then to RPMsg-Lite, which is an implementation optimized for real-time operating systems on small devices, and we will end with the embedded remote procedure call (eRPC) library, which makes it easy to use AMP in your system. So first of all, motivation. Why use AMP when there is SMP? Thanks to AMP your system can of course be faster, because there are multiple cores; it can consume less power; it can be safer; and it can be more secure. Now I will explain how this can be. All of this is due to the fact that not all CPUs in the system are treated equally. The cores run completely independently of each other: they are not under the same operating system, they can run different operating systems, and they can have different architectures; there can be, for example, a dedicated DSP for data computation. And as they are completely separate, AMP can also be used in mixed-criticality systems, which means that on one core you can run Linux, which is not certifiable for automotive applications (not yet), and on the other core, the so-called AMP core, you can run an RTOS which will take care of the critical tasks in the system. So in order to make the system work, you need the two parts of the system to communicate somehow, and for this there is a shared memory and two inter-core interrupts.
One core is the so-called master, and most of the time it takes care of the life cycle of the other core, which means it enables the core's clocks and power, and the core starts booting. In some scenarios it is also able to change the firmware of the secondary core, but that depends on where the secondary core boots from, whether it is from RAM, from external flash, or from somewhere else. So this is the concept. Another important thing, and maybe this is getting a little bit specific to the protocol which is used, RPMsg, is that generally the master manages the shared memory. So it is the master's role to enable clocks for this shared memory and to initialize any data structures which are later used for the inter-core communication. So finally, RPMsg. RPMsg, or remote processor messaging, is a very thin layer on top of a transport layer, and it actually defines just a UDP-like header. The header contains a source address, which is the address of an endpoint on the side of the sender; a destination address, which is again like a UDP port, but this time on the side of the other core; and the address space of source and destination is always local to the core. This can be done like this thanks to the fact that RPMsg is a point-to-point protocol: between one pair of CPUs there is always one instance of RPMsg. Then there is a reserved field, which we will not talk about now, and then there is the length and the actual payload. And like this, this is one buffer transferred during communication. So in the beginning RPMsg used to have only one transport layer, which was, and still is, based on virtio. This is the most generic implementation, currently used by the majority of vendors which support RPMsg on their platforms. And initially the maintainer was... these days the maintenance is done by Bjorn Andersson from Linaro, and he did a lot of work on all the patches.
One important thing he did is that he split the core implementation of RPMsg from the transport layer, so that it is possible to have different transport layers: not only virtio can be used, but any custom way to transfer the data. Currently there are GLINK and SMD, which are ring-buffer-based implementations for multi-processor communication. These are unfortunately implementations mainly for Qualcomm platforms. So we are getting to the state of RPMsg in the Linux kernel. In the Linux kernel there is another thing, which is called the RPMsg bus; it is a virtual bus to which the RPMsg devices are registered. Either it is a virtio RPMsg device (if you check the sources of Linux, it is in drivers/rpmsg/virtio_rpmsg_bus.c), or it can be GLINK, which is in the same directory under a different name, or SMD, which again is in the same directory, but with a 'qcom' prefix. Once a device is registered, thanks to the probing mechanism present in the Linux kernel, it gets probed with one of the RPMsg drivers. There are several of them. One of them is called rpmsg_char, which is quite recent, and it actually exports the RPMsg device, or rather its endpoints. That refers to the UDP-like header I showed you before, with the source and destination addresses: one such address is exported to user space using a character device, and it appears under /dev/rpmsg. Then you can write your own script which accesses this device, and it will actually send the data through shared memory to the secondary core. So this is one way you can leverage AMP. Another way is to leverage it internally in the kernel.
So Qualcomm has SMD RPM, the Resource and Power Manager, which actually takes care of power management in the system. On the host side, in the kernel, there is just an API, and the actual work, setting a state or changing clock frequencies etc., is done by the secondary core; those commands are communicated through RPMsg to the other side. And last but not least, there is also another RPMsg driver, used, as you can see, by the Video4Linux framework. What it does is send not-yet-decoded frames or pictures to the other core; the secondary core then takes care of the decoding, and the raw data are sent back to the kernel and used there. So the expensive computation is not done here, but on the other side. So here we actually have offloading; and the SMD RPM case, power management, is a kind of critical part of the system, so it is, again, offloading to the other core, but it can also be regarded as a mixed-criticality kind of way to leverage RPMsg. So this is what is currently upstream. If I zoom in a little bit: here we have the RPMsg bus, and besides the virtio RPMsg device there is the virtio bus, which is, again, a virtual bus, and the virtio device is registered on it by a module called remoteproc. This module takes care of loading the firmware for the secondary core, and once the firmware is loaded, it parses a portion of the firmware; in this portion there is a thing called the resource table, in which there are entries for virtio devices. Once it finds such an entry, it registers a device, which then gets probed by the virtio RPMsg driver provided by the virtio_rpmsg_bus.c module, and then the rest of the story follows.
So another way to create the virtio device is to use a custom platform driver, which is not the most beautiful way to do things, but there is a problem, and because of that problem it can also be done like this. The problem is this: you may want your cores to start completely independently, which means you don't want Linux to start your secondary core, but you want an early boot of the secondary core, for example because it drives some device and you need a very fast response. This can be achieved with Linux, but you would need to tweak the kernel to make the boot time as small as possible (there is a lot of effort being spent in that direction, too). If you want another way of achieving this, you can leverage the AMP core: let it boot very fast and take care of the critical parts of the system, and then you boot the Linux kernel. The kernel starts slowly, while the AMP core is already taking care of everything, and during the Linux kernel startup this custom platform driver directly registers the already existing virtio device, which is already up. So nothing is taken from a firmware file; the firmware is actually not managed by the kernel in this case. So this is another use case. And now we get to the point where I will try to explain a little bit how virtio works, with regard to RPMsg. Virtio is quite complicated, but I will explain how it works and how we use it with RPMsg. Is anybody here familiar with virtio?
Just hands up, so I know... OK, so 27.5%. So, it's simple: virtio contains a data structure called a virtqueue, and each virtqueue is composed of two ring buffers. The magic here is that each of the ring buffers is a single-reader, single-writer ring buffer. In the beginning, one of the ring buffers, let's say, is full of free buffers. Another thing I would like to point out is that those ring buffers are actually used only to store pointers to the actual containers of the data: there is a pool of buffers, and the two ring buffers contain just addresses, just pointers to members of that buffer pool. So when one side wants to send a message, it reads four bytes out of this ring buffer, to keep it simple, which we can call allocation. Then the RPMsg header is written into the buffer, after that the actual payload is written into the buffer, and then the address of the buffer is written into the avail ring buffer. The receiving side reads those four bytes out of the ring buffer and learns the address of the buffer which was sent; so actually just the pointer was passed to the other side. Then it can do whatever is needed with the buffer, and once the other side is done with it, it writes the pointer back to the used ring buffer; you can call this the deallocation. This is quite an interesting way of communicating, because you don't need any interlocking during the communication, and it's efficient. With a plain data ring buffer it would not be efficient: you would need to copy the data into the ring buffer in order to send them, and then read them out byte by byte if you wanted to avoid any locking. This way it's very efficient, and that's why virtio is actually used for this multi-core communication. And one important thing: it allows for zero-copy. Actually, I will show the procedure here.

There is an interrupt here, but it is not mandatory; it can also work without the interrupt if the other core just polls. That might look like a bad approach, because it will consume a lot of power, but on some architectures it can be the way to go; for example, there are the programmable real-time units from TI, if you know about them, and the usage of interrupts is limited there. So again, I will just go through the process. Let's say Linux would like to send something to the remote core. It reads out of the used ring buffer, it writes the header and the payload, then the buffer gets enqueued into the avail ring buffer, which is the other ring buffer, shown in a different color, and then an interrupt is triggered and the other core is notified. It reads the address from the avail ring buffer and passes it to some callback in the interrupt context; then there might be some RTOS queue into which the pointer to the buffer is enqueued, so that it can be consumed in some RTOS task context. The interesting thing is that the buffers can be freed later, out of order. This would be complicated with a simple ring buffer; it wouldn't even be possible to do zero-copy there. Here zero-copy is possible, and buffers can be freed out of order, once they are processed. And when the free happens, again an interrupt is triggered to the other side, so that the other side knows: for example, if the pool is depleted, the master can wait for an interrupt, and once a buffer is freed the interrupt is triggered and it can unblock and continue. When the communication goes in the opposite direction, it is completely the same; if you compare the pictures, just the colors and the direction of the arrows change. It is done the same way, just with the other pair of ring buffers. So in total there are four ring buffers in the system; that is important. Now for RPMsg-Lite. As I said, it is a very small
library, constituted by the files you see here. There are some headers too, but those are not so important. There is rpmsg_lite, rpmsg_queue and the rpmsg name service. rpmsg_lite provides the send API and the API for creating and registering new endpoints, etc. rpmsg_queue, which is not mandatory (it can be disabled, or you simply don't have to use it), provides a blocking receive, which might be useful if you use an RTOS; if you don't use an RTOS, you can just use an interrupt callback, and that's it. Then there is the virtqueue, which you don't see here, and that is where the magic actually happens: a very lightweight implementation of the virtio protocol as I described it. Then there are two porting layers. One layer is for porting to the RTOS; basically it is just a very thin abstraction on top of a mutex, really very thin. And then there is the platform layer, which is used for porting to different platforms, so it holds the platform-dependent things: interrupt enabling and disabling, triggering, and so on. And the name service, I forgot about this one, is used for announcing services to the other side. When the system boots up, Linux doesn't know, or may not know, what firmware is loaded on the AMP core. So the AMP core may send a message to a fixed endpoint address, and at that endpoint address Linux looks for a packet formatted in a certain way: the name, the compatible string, is sent there, together with the address of the newly created endpoint on the AMP side, so that drivers in the Linux kernel can be probed dynamically; a new RPMsg device can be dynamically created this way. So the AMP core actually announces its capabilities. But this approach is not mandatory; again, it can be done in other ways. So as I said, there are two porting layers, it's quite simple, and it's BSD licensed, so you can go and use it. There is a FreeRTOS port and a bare-metal port, and a Zephyr port is in progress. If you'd like to look at the code, it's hosted on GitHub under NXPmicro/rpmsg-lite.

And now let me pass the ground to my colleague Dušan, who will tell you something about the embedded remote procedure call library. ...A question? The address? Yeah, you mean here, with the name service. Well, Linux needs to know to what address to send, or with whom to communicate. It could of course send some packet, but if it doesn't know that the firmware provides, for example, some transformation service, it would not know where to send messages, or what the purpose of the firmware actually is. So either there is some entry in the device tree, or it can be done this way, dynamically. There is no broadcast, no. Yes, yes, it's like UDP, there are no connections... yes, sorry; yes, on both sides it's like UDP. It's fire and forget: you just send a packet and you don't know what happens with it. But well, it is within the same SoC, so likely it will not be dropped or anything; it depends on the implementation, on what happens in the hardware system. So let me pass the mic to Dušan.

Thank you. Do you know something about RPC? It's... it's commonly known, so I will just quickly cover what RPC is, why we are developing eRPC, and how to use eRPC. So, what is RPC?
RPC is a mechanism to invoke a software routine on a remote system as if it were a simple local function. The remote system can be any CPU connected through a communication channel: servers across a network, or the CPUs of a multi-core system. For the user it looks like calling a local function from a library built into the application. RPC is a client-server, request-response type of communication. Here I have an example. The application on the client side calls a function; the data are passed to stub code, which can be hand-written or generated, and here the data are serialized, put on the transport and sent to the server side. The server deserializes the data in the server stub code and calls the appropriate function with them. If any return values are expected, these are again serialized, sent to the client, deserialized and returned to the application. That is the RPC concept.

So why are we developing eRPC when RPC solutions already exist? You could already see some presentations about them here. Many of these modern RPC solutions are aimed at high-performance distributed systems, typically communicating over a network. We are focusing on tightly coupled systems programmed in C, with small code size. For multi-processor setups we support SPI and UART transports, among others; for multi-core we support RPMsg-Lite, which was presented by Marek, or the MU (Messaging Unit), which uses registers to communicate between the cores. So what are the main requirements? As I said, small code size: the eRPC library is less than 5 kB. The main programming language is C; to make RPC calls you can also use C++ or Python, and in the future there may be others. We are not forcing users to use a particular API style. Stub code is generated, and it can run on bare metal, Linux or FreeRTOS. It's modular and simple, and it's open source.

So how do you use eRPC in an application? I can explain it on an example. Let's say on one core we have Linux, which is running a server or a media player application, and you have a radio transceiver and you want to communicate through it quickly. So you can have another core which talks to this radio transceiver, and you want to communicate between the Linux core and this second core. You can use eRPC for that, and this is what you need to do. As I said, stub code is generated, so first you need to define your interface, and you define it in an IDL file, using an interface definition language which we designed to be similar to C. But as you see, you need to provide some more information, to let us know how the data should be serialized, because it is not always obvious from the function declaration. Here is the IDL definition, and here is the generated code; this is just the header file, there is of course more code, but this is what the user actually needs to call in his application, so you can compare the two. When you have this IDL file, you generate your stub code, and that is everything you need to create your application. Then, in your application: on the client side you need to initialize eRPC, and then you can call your RPC functions. On the server side you also need to initialize eRPC, then register services (one service is one defined interface), and then you need to run the eRPC server. You can use the run function or the poll function; it depends on whether you want to handle just one RPC call and then run your other code, perhaps using a thread for executing the other user code, or whether you want to just handle RPC calls. And then you are done. So this is everything from me; if you have any question for me or for Marek, you can ask. Here we have our repositories for the projects; if you have any questions you can send us an email, and you can contribute if you are interested in these projects. Thank you for your attention.

So, if you put a pointer into the data? The data will be serialized this way, not the pointer; on the other side there will be a copy of the data.
But as we are mainly focusing on multi-core communication, there is the possibility to pass these pointers; of course, when you communicate over SPI this is not possible. This is a problem which is to be solved in RPMsg as well; eRPC can depend on RPMsg, but those are physical addresses.

You can define your own data types and use them in your functions; they will be declared, and those structures will also be serialized and deserialized as you declare them. Currently eRPC goes through the structure and serializes exactly its contents. We support input as well as output parameters. Those are very good questions. Any other?

I wanted to return to the previous question: currently we support annotations, as we call them, to add some more information, and there is a 'shared' annotation where you can pass the exact pointer address from one side to the other, so the data are not serialized, just the pointer. So that covers the more curious case, but it is at the development stage.

Well, there are quite a lot of things to consider in embedded RPC. For example, lately we have been implementing nested calls, so that you can nest calls back and forth. It's quite interesting, because then the aim of the game is to take some stack and really provide, for example, a socket API to the other side and let the whole stack run on the other core. When you do this, sometimes, by the design of the stack or the library, you need nested calls back and forth; so we support this as well.

Sorry? Yeah, the question is how we handle that. Again, this is something which is not yet well handled in eRPC. For NXP platforms we have something called the power manager, but this is mainly used on multi-core microcontrollers. In the case of Linux, where there is a microcontroller core and a Linux core, this is very hard to handle power-wise, and it is still to be solved.

There are regular meetings, calls held by the OpenAMP group each Thursday, our time from 9 pm, so if you are interested in the RPMsg project or in AMP in general, you can go and join those calls. If you send me an email, I can send you an invitation and we can discuss it; there are a lot of things to do. I am taking that as the answer to the question, but of course the aim is to handle all of this later; it is quite platform dependent, that is the problem.

Whether it sends error messages? Well, there is no checksum imposed by RPMsg, but the application can do some checksum in the payload data if it needs it, because the aim is to keep it generic. The problem with reboot or reset is quite platform specific as well. Now you are getting to a hot topic: it relies on virtio, so technically there can be back pressure. From the application you can send an acknowledge if you want and wait for it; that depends on the implementation. But if you don't use anything and you just deplete the pool, the master will not refill the pool, we will just run out of buffers and we will have to wait; back pressure, maybe call it like this. And yes, you can reset, but again this is platform dependent, it depends on the system.

There was a question? Yes, this is possible, and actually it is possible to run a server on both sides. You can have one on each side, you can have two instances; we have just been describing one. It is possible, and you can also have multiple instances if you have multiple cores in your system and you need multiple independent channels, running in independent places in the shared memory.

Because I have been talking about packets or frames, we need to consider that those are just data written into a shared memory, and the shared memory is an SRAM-like memory inside the system, so I don't think this can really happen. You mentioned it going over UART; you are talking about eRPC. In the case of eRPC, of course, there is an acknowledge, there is an upper layer on top of either RPMsg or, in the case of UART, there is a CRC, yeah, there is CRC16 or something like that.

So, any other question? Or maybe we are running out of time. Yeah, so how the SPI transport works: the transport is used by eRPC to take the serialized data and put them on the wire, so they are then sent by the SPI driver. As I said, it is a framed transport: for SPI, and also for UART, there is first a header which carries a CRC and the length, then the message data follow, and on the other side the CRC is checked before deserialization. If there are no other questions: in case you are interested, just go and contribute to those projects. So thank you.