I am Anshul Makar, and the topic of my presentation today is a dynamic scheduler over FreeRTOS. It is about dynamically uploading or replacing tasks that are running on an embedded system under FreeRTOS.

What is the agenda for today's presentation? I will first go through the problem that I am trying to solve, then brief you on the requirements and go into their details, then cover project progress: what phase it is at and what still needs to be done. That will be followed by a demo on an actual live board, and then questions from your side.

So, the problem that led me to develop this module. I was approached by a company that wanted a satellite to switch tasks dynamically, without any need for a system reboot and without any delay. The second requirement was that they should be able to maintain the software on the spacecraft in an updatable and healthy state, again without a complex reboot process and without stopping a task and starting it again. We will go into the requirements in the next slide.

We identified two approaches. One is to replace the task that is in execution on the fly, and the second is full image replacement. The first approach, task replacement, is fast, but there is a high risk associated with it: you are directly accessing the task's memory, so there can be safety concerns and concerns about attacks. Full image replacement carries less risk, but it is slow and the context is completely lost. Speed is one of the main criteria, and inefficient utilization of the bandwidth is another problem with this approach. So we went ahead with task replacement. Now, going into the details of the requirements.
Starting with the most important one: a reboot is expensive, and it leads to loss of the context of the currently executing task, and we don't want that. To give you an example specific to the satellite world, suppose a spacecraft is taking images of clouds or reporting weather data, and the earth station sends a message that there is a risk of collision. Now they want the spacecraft to stop taking images and immediately switch to a collision avoidance task, by altitude control. So it should start the altitude control task instead of taking images, and it should happen on the fly. It should not be the case that they first have to bring the complete system on the spacecraft down, reboot it, and then start the new altitude control task. That can be disastrous, so they want everything on the fly.

Second, the software running on the spacecraft needs to be maintained in a healthy state. For that, all the bug fixes need to be patched and new features added, again on the fly, without the need for a system reboot or a task reboot. System reboots and task reboots are very expensive operations where you lose the context, and all the other tasks are also shut down and brought up again. We don't want that. We want the software to be in a healthy, running state all the time. Applying bug fixes, patches, and task replacements on the fly at runtime ensures that our system is always in a healthy state and that the system on the spacecraft is doing the most important task required at that moment.

Another main requirement for going with the task replacement approach rather than
full system image upload was efficient utilization of bandwidth. The bandwidth between the earth station and the satellite is quite expensive and limited, and we have to make sure that it is efficiently utilized. Sending the full image or the full module from the earth station to the satellite is highly inefficient; it is going to be slow and expensive. We want only the part that needs to be updated, or only the new task, or only the instruction to switch to a new task, to be sent from the earth station to the satellite, and the rest should be handled by the main system running on the spacecraft. That ensures we are utilizing the bandwidth in the most efficient manner.

Those were the requirements that led to me developing this module, called the dynamic scheduler. So what were the design considerations while I designed this system? First of all, I designed it for the spacecraft industry, so the requirements were a bit different and more stringent; I will go into the details of how in later slides. Coming first to the basic ones: it is for the embedded world, and it will run on an embedded system on the spacecraft, so I have to ensure that its data and memory footprint is optimal.

Second, I have to ensure that the main OS binary is compiled independently of the application binary. This is important to understand from a design perspective. In my system, the main system binary, that is the operating system binary, here the FreeRTOS binary, should be compiled and loaded independently of the application binary. Application binaries can be developed at any of the ground stations. So we have the main platform system
running on the spacecraft, which runs FreeRTOS, and then the application binary can be developed at any of the ground stations. The application binary is the task that different companies, organizations, or clients want to execute on that spacecraft, and they will be developing it independently. Once the task is developed, they should be able to compile it independently of the main system binary and send it from the ground station to the satellite. That was the main approach.

I have to use FreeRTOS, and I have to make sure it is standard FreeRTOS with no changes to it, so that the application developers know which API their applications have to link against and what to expect once they upload their application binaries to the spacecraft running FreeRTOS. So it is a highly pluggable, plug-in architecture that I kept in mind while designing the system. And again, it has to have minimal performance overhead and minimal memory overhead. Those are the requirements for my design.

Coming to the implementation stage. At a simplistic level, I have a main system binary running on the spacecraft, the platform system. Application binaries are compiled independently by different vendors and sent to the spacecraft at runtime. The main system binary detects that an application binary has arrived; it can be a new task, a patch, or a bug fix for an existing task. Then the application binary is allocated, linked to the main system binary, and it starts its execution.

Going further into the details of the design, a task has two states. One is the user state; it consists of the heap, stack, and code, allocated in a virtual address space and directly accessible by the user. Then we have the OS state, which is allocated by the OS and is not directly accessible
by the user; it is accessible only via specific APIs or interrupts. Here I defined a new task state called the checkpoint state. What is a checkpoint state? Apart from the user state and the OS state, the checkpoint state is a state where the task is in a consistent state. While designing application binaries, developers have to make sure to define checkpoint states in their task, because the way I have designed the system, a task can be updated only if it is in a checkpoint state.

To give you an example, I define a checkpoint state as a state where the task's stack and heap are stable and there are no transactions in flight. Suppose a thread is running a for loop to read data from memory. You can't define a checkpointable state inside that for loop. Once the for loop is over, the memory is in a consistent state, the stack and heap are in a consistent state, and there are no transactions currently in execution, the user can define the checkpoint state at that point.

So when a task starts, it goes into an inconsistent state and does various operations, then it goes into a checkpoint state, where it is consistent. When I have to do a migration, I wait for the task to go into the checkpoint state. Once it is there, I freeze the state and call it a checkpointable object. Then I decheck it. What decheck means is that if an update or a patch is received from the ground station, I merge the old state of the task with the new state to form a new task state, and then the task starts to run again. The task that is now running has the new code, the bug fixes, or the new patch uploaded from the ground station to the spacecraft.

Going into the details of the migration, as you can see here. We have the task code; let's call it code in the v1
state. We have the task data; let's call it task state v1. So we have code v1 and task state v1, and it is executing. We create a checkpointable object, and at this stage we are ready to migrate to a new state. Then we have stage four: transform the task state. Then we have code v2 and task state v2; this is the merged state, the old task merged with the new task uploaded from the ground station. Then execution of the new task state starts.

Now coming to the components in my design. First of all FreeRTOS, the main system binary, is one of the important components. Then there is the ELF binary, another important component. I needed a way to completely control the application binary that has been loaded from the ground station; the system needs to understand what the requirements of the application binary are.

Let me put it another way. Earlier I mentioned that the requirements for this project, being in the space industry, are a bit more stringent. Inherently, dynamic patching or dynamic task replacement sounds a bit hacky, and it will not be allowed by ESA or other regulatory organizations, because in a way you are directly touching a process's memory and modifying it, which is not acceptable. So I needed to come up with a way to address their concerns.

When I was designing this system, one of the simplest ways was to use jump tables. You have a task running with functions defined, and an updated task arrives. Suppose the old task has a function A and a new function A comes in. You load the new function A, find its address, and whenever control in the old task reaches function A, you define a
jump from the old function A to the new function A. That is the jump table approach, and it is used in some server environments for live patching, but it won't be acceptable to the regulatory bodies dealing with the space industry. So I had to come up with a way to address their concerns, and here comes my approach of using the ELF format for application binaries. First of all, ELF is portable and cross-platform; it is a universal format that can be used on any board or target. Secondly, it allows me to understand the needs of the application binary. That is the reason I chose ELF.

Continuing with this, I wrote a task manager layer that is integrated with FreeRTOS. It is part of the main system binary but sits on top of FreeRTOS. It has three main functions: it is a memory allocator, a registrar, and a linker. This task manager layer allows me to completely control the requirements and behavior of the application binary, which helps me address the safety concerns surrounding this approach. By keeping complete control over the behavior of the application binary, I can ensure that it is not doing anything wrong.

When an ELF binary is loaded onto the platform on the spacecraft, the task manager layer parses the ELF binary and does all the allocation on behalf of the app. It keeps track of all the allocations and all the memory frees that happen, the stacks, the heap, everything. It registers the task in a splay tree or a red-black tree, and it links it with the main binary. So whatever the application binary is doing,
whatever memory areas it requires or has allocated, the task manager keeps track of it, and it is part of the main system binary, integrated with FreeRTOS. Once it has done all these things, it hands the task over to FreeRTOS, and the task goes into the FreeRTOS domain for its control. So here I got the benefit of both worlds. I don't have to write the complete operating system myself. I just wrote a layer on top of the operating system that gives me control of the allocation and linking aspects of the application binary, and then I hand it over to FreeRTOS, which can schedule it and provide whatever kernel resources the application binary needs. I got the benefit of both worlds by just writing a middle layer, a glue layer.

So what is the state machine for my system? At boot time, on the spacecraft platform system, FreeRTOS boots up along with the task manager. An ELF binary from the application developer is sent from the ground station to the spacecraft platform system and inserted into its memory. The main system binary, the FreeRTOS binary running on the spacecraft, detects the new task, parses the ELF binary of the application, allocates all the resources, links it to the main system binary, does the migration of the old task to the new task, and then starts the new task and hands it over to FreeRTOS, which takes control of it.

To recap: at boot time the system binary starts, and an ELF binary is inserted into the memory of the platform system. The main system binary registers all the tasks with the task manager. It is able to create new FreeRTOS tasks with standard FreeRTOS system
calls, and then the newly created task can be inserted into the FreeRTOS scheduling list and the old task can be merged with the new task. The new task is now in the OS control state, the FreeRTOS control state, and it starts its execution.

Explaining diagrammatically how things work: we have a created task. At first, when it is sent from the ground station to the satellite, it is unregistered on the system, so we need to register it. The task manager has a task register function, which registers the task. Then it parses the ELF binary and allocates the task using the task manager's task allocate function, so now it is in the allocated state. Then the task is linked, and it goes into the OS control state. After that it can be resumed, suspended, or whatever is needed.

So how does the migration happen? As I explained with this diagram: the task starts, inconsistent state, consistent checkpoint state, decheck, and the task starts running again. Here, the task is in an inconsistent state. We wait until the task reaches the checkpointable state and suspend the original task. The newly uploaded task or patch is allocated and linked. We allocate and copy the memory sections, so the stack and the heap of the old task are copied onto the stack and heap of the new task, so that it gets all the context of the old task. We update all the non-atomic pointer variables to point to the correct memory addresses, start the task, put it into the OS control state, and start the execution of the updated task.

What are the risks and unknowns? As this project is still in the development phase, I don't have the exact performance impact of the task management layer. How much time it takes to update a task is something I will be working on. What is the present state of the project? I have just completed the step where the stack of an old task can be replaced
by the stack of its patch. A task T1 is executing with stack S1, and if the updated task T2 comes with stack S2, then S1 and S2 are merged and the updated task starts. The next step will be to merge the heap, to update the heap of the existing task with that of the new task. The step after that will be to completely switch from task T1 to a new task T2. For example, as I mentioned, if the satellite is taking images at the moment and the ground station realizes that it needs to do altitude control, then the altitude control task or a switch instruction will be sent from the ground station to the satellite, and it should completely switch to the new task. That still needs to be done. Then I need to come up with the algorithmic complexity and memory footprint of this approach.

Now moving on to the demo. The main system binary is running on the target board, an STM32. The main system binary has FreeRTOS, the task manager layer, and a static task that outputs "altitude control, altitude control, altitude control". Then I will upload a new task over UART to the target board that will do AOCS. For some time, five or ten seconds, you will see "AOCS, altitude control, AOCS, altitude control", with both executing simultaneously. Then the updated task, AOCS, will migrate, the altitude control task will stop, their stacks will be merged, and you will see only "AOCS, AOCS, AOCS" going forward. That is the newly updated task. I can't show exactly how the stacks have been updated behind the scenes, but you will get a fair idea of how this works when you see the demo.

So this is the STM32. I will upload FreeRTOS and base task 1 to the board,
which I call the system binary. Then I will upload a replacement task over UART, run the newly uploaded task, and replace the base task with it. So this is my system. I will start it in debug mode; I have breakpoints set at some important locations. Here is main, and here it is waiting for the new task to be uploaded. I will go here and send the app binary, which is in ELF format, to the UART port. Here you will see the output; it is getting transferred through UART. The altitude control task started successfully. Now I will do this, and let's see: altitude control, altitude control. The task has not uploaded; let me do it again. So again I will start the runtime update. Altitude control, AOCS, altitude control, AOCS: both tasks are executing simultaneously. Now the runtime update process will start, the migration will start, and now you will see only AOCS. Earlier, for some time, it was altitude control, AOCS, altitude control, AOCS, and now it is only AOCS. So the newly uploaded task has been updated successfully and has started its execution going forward.

That's all from my side. Thank you, and please feel free to ask any questions.
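
The task lifecycle and the checkpoint-gated migration described in the talk can be sketched roughly as follows. This is a minimal illustration in plain C, not the speaker's implementation: every type, function, and constant name here (`task_t`, `task_advance`, `task_migrate`, `STACK_WORDS`) is hypothetical, no FreeRTOS calls are made, and only the stack copy is modeled (the heap merge and pointer fix-up mentioned in the talk are left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define STACK_WORDS 16  /* illustrative stack size, not taken from the talk */

/* Lifecycle states named after the talk's diagram (identifiers are hypothetical). */
typedef enum {
    TASK_UNREGISTERED,
    TASK_REGISTERED,
    TASK_ALLOCATED,
    TASK_LINKED,
    TASK_OS_CONTROLLED,
    TASK_SUSPENDED
} task_state_t;

typedef struct {
    task_state_t state;
    bool at_checkpoint;           /* set by the task when its stack and heap are consistent */
    int code_version;
    unsigned stack[STACK_WORDS];  /* stand-in for the task's stack contents */
    size_t stack_used;            /* number of valid words in stack[] */
} task_t;

/* Move a task one step along register -> allocate -> link -> OS control.
 * Returns false once the task is already under OS control or suspended. */
bool task_advance(task_t *t)
{
    assert(t != NULL);
    switch (t->state) {
    case TASK_UNREGISTERED: t->state = TASK_REGISTERED;    return true;
    case TASK_REGISTERED:   t->state = TASK_ALLOCATED;     return true;
    case TASK_ALLOCATED:    t->state = TASK_LINKED;        return true;
    case TASK_LINKED:       t->state = TASK_OS_CONTROLLED; return true;
    default:                return false;
    }
}

/* Migrate the old task's context into the new task.
 * Refuses unless the old task is running and at a checkpoint,
 * and the new task has been allocated and linked. */
bool task_migrate(task_t *old_task, task_t *new_task)
{
    assert(old_task != NULL && new_task != NULL);
    if (old_task->state != TASK_OS_CONTROLLED || !old_task->at_checkpoint)
        return false;
    if (new_task->state != TASK_LINKED || old_task->stack_used > STACK_WORDS)
        return false;

    old_task->state = TASK_SUSPENDED;          /* freeze the old task */
    memcpy(new_task->stack, old_task->stack,   /* carry the context over */
           old_task->stack_used * sizeof old_task->stack[0]);
    new_task->stack_used = old_task->stack_used;
    new_task->state = TASK_OS_CONTROLLED;      /* hand over to the scheduler */
    return true;
}
```

Returning `false` when the old task is not at a checkpoint mirrors the rule stated in the talk that a task can be updated only in a checkpoint state; a caller would retry later rather than force the migration.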