Okay, let's get started. My name is Mathieu and this afternoon we'll be talking about hardware assisted tracing on ARM SoCs using CoreSight and the open CoreSight decoding library. The goal of this presentation is not to show you how the technology works internally, but to go over how to use it. The idea is to give a general overview of the technology so that when you start using CoreSight on your platform you have an idea of what to look for, the terminology is fresh in your head, and you know how the pieces connect together. The emphasis will be placed on the integration with the perf core that we have done over the last year, and we'll also be introducing the open CoreSight decoding library along the way. The presentation will start with a brief introduction on what CoreSight is. From there we'll look at the pieces that are required to get CoreSight going on a platform. As I mentioned, we'll go over what the open CoreSight decoding library is, and we'll finish with scenarios for trace acquisition and decoding.

If you go online or look at the documentation, you will find that CoreSight is an umbrella technology that encompasses all the debugging and tracing needs of an SoC: single-step debugging, JTAG, IDE support, and, in there as well, hardware assisted tracing. The idea behind hardware assisted tracing is to record the instructions that a CPU is executing without impacting what's currently happening on that CPU. Everything that we have done so far is upstream, so all of the kernel part is already mainlined, and everything is found under drivers/hwtracing/coresight. Under there you will find drivers for the various IP blocks and the framework that glues everything together; as I mentioned, the integration with the perf core is also concentrated under that directory.

Hardware assisted tracing works by coupling an IP block called an embedded trace macrocell (ETM) with a CPU. There's typically a one-to-one mapping between an ETM and a CPU, and once the operating system has programmed the IP block, no further interaction is needed to keep tracing and recording going. You program the IP block, you launch the trace session, and from there the CPU is not even aware that traces are being collected.

As I mentioned earlier, the goal is to not impact what's happening on the system while tracing, but in order to do that we have to be mindful of the use case under consideration. For instance, if we have a trace scenario that involves a lot of DMA and we select a CoreSight IP block that dumps the trace data into memory using DMA, then obviously we'll have contention. There's always a way to make sure this doesn't happen, by selecting different pieces of the topology so that the current processing doesn't get impacted.

This is a fairly simple depiction of a CoreSight system, and yet it's quite accurate and representative of what people will find on their platform. On the far right we have the cores, each coupled with an embedded trace macrocell, and the colors represent the ID that each macrocell inserts into the packets it generates; there's a specific and unique ID per macrocell. From there the packets flow through the CoreSight architecture all the way to the sink, where they are collected.
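As a concrete pointer back to the kernel side mentioned a moment ago: assuming a 4.x source tree, the directory looks roughly like this. The exact file list varies between kernel versions, so treat it as a sketch.

    ls drivers/hwtracing/coresight/
    # coresight.c  coresight-etm3x.c  coresight-etm4x.c  coresight-etb10.c
    # coresight-tmc.c  coresight-funnel.c  coresight-replicator.c
    # coresight-tpiu.c  coresight-stm.c  of_coresight.c  ...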
So when it's time to decode these packets, the packet IDs are used to split out the different streams that were collected. We feed everything to the open CoreSight decoding library and in the end we get decoded traces. Program flow trace is a term that you will find often in the documentation; it refers to the format generated by the tracers. The idea is to record only the events in the code that move the instruction pointer around. So if you start executing here, and at some point the instruction pointer moves from this place to that one, the only thing you have to record is where you started and where the instruction pointer jumped. Everything in between can be inferred, simply because they are just sequential instructions. Waypoints are those events I just talked about: branch instructions, exceptions, return events. These are recorded and fed to the OpenCSD library, and in return we get back executed instruction ranges. These ranges, coupled with the original program image, allow one to easily pick out the path the CPU has taken through the code for the trace scenario.

When enabling CoreSight on a system, there are a few things to consider. First, everything that we have done so far is supported upstream. We don't have support for the CTI, the Cross Trigger Interface, nor for the ITM. The Cross Trigger Interface is an IP block that allows CoreSight devices to synchronize with one another, and the ITM is an older tracer specification. If you have an ITM on your board, simply look at the examples that we have for the more advanced tracers like ETMv3 and ETMv4, strip out the parts that are not needed, and quite quickly you could get a driver that supports the ITM. I didn't provide an ITM driver simply because I didn't have any hardware.

I currently maintain two platforms upstream for CoreSight: the Vexpress TC2 for ARMv7 and Juno for ARMv8. Between these two we are covering most of the cases found for CoreSight, so most of the topologies that you will have on your board are more or less covered, or at least enough examples are provided with these two platforms to show how to do things for your own platform.

When we're talking about CoreSight, everything is platform dependent, so different topologies will have different ways of configuring things. That's why we've decided to push everything to the device tree. In there you simply list the devices that you have using the bindings, and the graph bindings that were introduced for Video4Linux are reused here to tell the framework what kind of topology we have on a specific SoC. And when you have that, well, things should just work. But as usual, the devil is in the details. There are two things to really keep in mind when looking at CoreSight: clocks and power domains. CoreSight blocks are found on the AMBA bus, so your APB clock will have to be present in the device tree and manageable using the clock API. If that is there, no problem: the drivers will do the right thing, enabling the clock when they require access to functionality provided by the IP block and switching it off afterwards. The harder part is power domains, where we have IP blocks that are split over different power domains on the platform. Typically the funnel blocks and the sinks will end up in the debug power domain, whereas the tracers will end up sharing the same power domain as the CPU they are coupled with, and usually that ends up being a cluster power domain.
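For reference, the device tree bindings, including complete topology examples that reuse the graph bindings, live in the kernel tree; the path below is the one used by 4.x kernels.

    less Documentation/devicetree/bindings/arm/coresight.txt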
The problem is that when CPUidle decides to put a CPU into a deep sleep state, everything that CoreSight had going gets switched off as well. Synchronization on a power domain that is shared between a CPU and a device is definitely a problem, and right now in the Linux kernel it is not addressed. Lina Iyer has published a framework to do that; there's good work going on in that area and I intend to adopt that framework as soon as it's available and merged upstream. But for now I simply decided not to introduce any synchronization mechanism in the drivers that are upstream, simply because people would have to take it out before putting in their own synchronization mechanism. So right now the drivers simply assume that CPUidle is disabled, and I suggest you do the same if you are starting to work with CoreSight. By disabling CPUidle you will get going quickly and you will understand how the pieces fit together; from there, if you want to introduce your own synchronization solution, by all means. But as I mentioned, I intend to incorporate the work that is currently being done on power domain synchronization as soon as it's available.

So, if our clock is described and our power domains have been taken care of: the top part of the slide here, that's Juno booting. On Juno we have six CPUs, all with ETMv4 tracers. Only the tracers will tell you that they're alive; all the other blocks in the topology remain silent at boot time. If you don't have access to the device tree, or you want to know what devices were instantiated at boot time, simply look in sysfs: the CoreSight bus will list all of the devices that it's taking into account. And that doesn't go away; it's always available.

Once we had drivers and a framework, we quickly understood that we'd have to provide ways for people to access the technology by automating a lot of the configuration that is inherent to tracing with CoreSight. There are literally dozens of registers to set up for even the simplest session. Integrating with perf was easy for us: the framework was already geared towards tracing, and it allowed us to hide a lot of the complexity that is inherent to CoreSight. Everything that we could not fit into perf we simply pushed back to the open CoreSight decoding library, which we've also integrated with perf. So with a minimum of investment you can get trace acquisition and decoding using the perf tools; everything has been integrated properly.

How did we integrate with perf? Well, we simply represented our tracers as performance monitoring units. From there we can interact with the perf core, and perf is not even aware that behind the PMU there's a CoreSight tracer. That way we were able to reuse the very tight control that perf gives us, and also achieve zero copy between trace data acquired in the kernel and what gets sent out to user space. With regards to the PMU, users on a system don't have to do anything: when the framework boots, it will simply register the new CoreSight PMU with the perf core, and from there everything works seamlessly. The name that we have given to the PMU is cs_etm, and it can be found alongside all of the PMUs that were configured at boot time: /sys/bus/event_source/devices will list everything that perf knows about.
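A sketch of those checks, assuming standard sysfs paths and idle states that can be disabled from user space (run as root):

    # keep CPUidle from powering the tracers down, one state file at a time
    for f in /sys/devices/system/cpu/cpu*/cpuidle/state*/disable; do
        echo 1 > "$f"
    done
    # the devices the CoreSight framework registered at boot
    ls /sys/bus/coresight/devices
    # the PMUs perf knows about; cs_etm will be among them
    ls /sys/bus/event_source/devices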
And in there you will find cs_etm, under which, a bit of an anomaly that we've introduced, there are symbolic links between tracers and CPUs. We did that simply because, from a user space perf tools point of view, there's no way to know which CPU is coupled with which tracer. By adding a symbolic link we get that information quickly, and we can also easily access the configuration of each tracer.

Okay, so that basically covers everything people need to know about the kernel side of the solution, which leads us to OpenCSD, the open CoreSight decoding library. It's simple: it's a standalone library that allows anyone who has CoreSight traces to decode the streams. The traces don't even have to come from the Linux kernel or from our framework; any CoreSight traces from any SoC that conforms to the architecture can be dealt with using the OpenCSD library. It's a joint effort between Texas Instruments, ARM and Linaro. Right now it has fairly comprehensive support for ETMv3, ETMv4 and PTM. There's also support for the STM and the MIPI protocol generated by the STM IP block.

As I mentioned earlier, in order to provide people with an example of how to use the OpenCSD library, we've integrated it with perf. So not only are there examples in the library itself, but on top of that we've given people an example of how to use it in a real system. If you're looking for more information on that library, there's an in-depth presentation that was published on the Core Dump blog earlier this summer; the link is there. Mike Leach has written extensively on it. So if you're looking to spin off your own trace solution, or want more details on how things work within the library, simply refer to that Core Dump blog. It's definitely the right place to start.

So now that we have drivers and a framework, everything's been integrated with perf, and we are also able to decode the traces, it's time to put everything together. It's not hard, but there are a few things to keep in mind. Obviously the first thing is to get the library itself. One thing to mention here: integrating with perf, so recompiling the perf tools, is mandatory only on the system that will decode the traces. On a target used purely for trace acquisition, the perf tools don't have to be recompiled. So on the host where trace decoding will happen, simply get the OpenCSD library and stick to the master branch; this is where we have our latest and greatest code nowadays, and all of the stable revisions are tagged, the same way we do for the Linux kernel. The HOWTO.md is a file that describes everything someone has to know about a specific revision: not only how to compile it, but also what kernel revision it is synchronized with, and some of the use cases for trace acquisition and decoding. There are a lot of examples there.

The reason we have to keep the OpenCSD library in sync with a kernel revision is simply that not all of the perf tools patches have been upstreamed yet. Half of them have been upstreamed and we are currently working on the other half, and there's so much churn going into the perf tools subsystem that from time to time the solution breaks and we have to adapt things to keep the functionality that we provide. So you will also find kernel branches on GitHub; those carry all of the user space patches needed to support the integration that we've done with perf. We are looking to move away from that scheme when everything is upstream, but for right now that's just a reality we have to live with.
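As a sketch of the host-side setup, following the HOWTO.md of the day (branch names, paths and the exact make invocation may differ; that file is authoritative):

    git clone https://github.com/Linaro/OpenCSD.git
    cd OpenCSD/decoder/build/linux && make
    # tell the perf build where the compiled library lives, then rebuild
    export CSTRACE_PATH=$HOME/OpenCSD/decoder
    make -C tools/perf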
When integrating with OpenCSD, so when rebuilding the perf tools, look at the very bottom of the screen there: cs-etm-decoder. In the list of files being compiled, if you see this file it means that the perf build has correctly seen the environment variable CSTRACE_PATH. That environment variable should point to where the OpenCSD library has been compiled. If it is valid and the build scripts are happy, then you will get the cs-etm decoder; otherwise you will hit the stubs. And obviously, if you hit the stubs, the perf tools will still compile successfully, but you will not be able to do trace decoding. It's an easy step, and I mention it here because people tend to overlook it: simply set an environment variable that points to where the library has been compiled, and then make sure that you're hitting the right file rather than the stubs. That way you will be able to do trace decoding.

So now that everything has been integrated and we have our perf tools, we can proceed with trace acquisition. Because we have integrated with the perf core, the only thing we have to do is use the perf record command the same way we would for any other event. The event name, in this case, is cs_etm. Between the slashes go the options that are relevant for a specific PMU; in this case the only mandatory option is the sink. We specify the sink simply because on a typical CoreSight topology you will have more than one sink where trace data can be dumped, and if you don't tell perf where to send it, the session will simply fail because the CoreSight framework is not aware of where you want things to go. Once you have specified the sink, the specifics of what process to trace and how you want to trace it are given pretty much the same way as for any other trace event using perf. Once again, it's worth mentioning the listing of all the CoreSight devices on a system: if you don't know what kind of sink you have, or the name that was given to that sink, simply refer to the listing under the CoreSight bus and you will find the information there.

When we're talking about CoreSight and hardware assisted tracing, we have to be mindful that it generates a massive amount of data, and I definitely advise skimming out some of the data to concentrate only on the areas that are of interest. There are a few ways to do this. The first one is inherent to perf: using the u or the k modifier after the specification for a tracer will confine traces to either user space or kernel space. Even then there's a lot of information generated, which is why we decided to integrate with, and extend, the filter framework that was already present in perf. There are two kinds of filters we've decided to introduce: address range filters and start/stop filters. Looking at address range filters to start with: the first part of the command line is the same thing that we've seen before, then dash dash filter tells perf that we are going to use a filter; this is no different from what can be done for tracepoint filters. After that, between the quotes, you specify the filter that you want to work with; the command lines sketched below, reconstructed from the slides, show the shapes we'll walk through next.
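These are sketches only: the sink name is the ETR found on Juno, and the addresses, sizes, paths and workloads are illustrative rather than taken from a real session.

    # plain acquisition, user space only, trace data routed to the ETR sink
    perf record -e cs_etm/@20070000.etr/u --per-thread ./main
    # kernel address range filter; address looked up in System.map
    perf record -e cs_etm/@20070000.etr/k \
        --filter 'filter 0xffffff8008563780/0x60' --per-thread uname
    # user space address range filter; relocatable address from objdump
    perf record -e cs_etm/@20070000.etr/u \
        --filter 'filter 0x72c/0x40@/opt/lib/libcstest.so' --per-thread ./main
    # start/stop filter crossing from a library into the binary
    # (the stop address here is hypothetical)
    perf record -e cs_etm/@20070000.etr/u \
        --filter 'start 0x72c@/opt/lib/libcstest.so, stop 0x955@./main' \
        --per-thread ./main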
Because we have an address range filter here, the keyword is filter, and here we're talking about kernel space traces. The second parameter is the address and how many bytes you want to trace for. The address is typically found in the System.map that gets generated when the kernel is compiled: you simply look up the function that you want in there, take the address related to it, plug that in along with how many bytes you want to trace for, and voila, you have your address range filter.

For user space, things are a little bit different. Again we have our dash dash filter, but the filter specification is, as you see, a bit different: the address we have is a relocatable address instead of the full address. This will typically be what you find in the output of an objdump command. Then, once again, the range, so how many bytes you want to trace for, and the full path to the binary on the system. That path is then used to correlate with the address the binary was mapped at in RAM, and that information is fetched from the perf.data file that we'll see in a few slides.

An address range filter works this way: we start filtering here, we stop here, and everything that falls within that range gets traced. If the instruction pointer happens to go outside of the range, whatever happens there will not be traced. Now for start/stop filters, using the exact same example: if we start here and stop here, everything that happens both inside and outside of that range will be traced. What triggers the trace is the instruction pointer being equal to the start address, and tracing stops when the instruction pointer has passed over the stop address. So it's a lot more inclusive, but it also has the potential to generate a lot more trace data. The syntax: again we have dash dash filter; the start keyword tells CoreSight what address you want to start from, and the stop keyword follows the same form. The addresses are found exactly the same way for the kernel: go into the System.map and that's where you will find the information. Adrian Hunter at Intel is currently introducing a way to specify the addresses with symbols, so that you can simply use the name of the function; as soon as that work gets integrated, things will just work without having to deal with raw addresses.

Looking at the user space example here, the start/stop conditions take the same form: we specify, again, a relocatable address and the path. But I'd like to bring your attention to the fact that I decided to start tracing in one library and stop the tracing in the binary. Nothing prevents you from doing that; the framework is very flexible, and an example like this would simply work on a system.

Yes. Well, here it's a start/stop filter because of the start and stop keywords. Okay, so on the very bottom of the screen here, I start tracing at 0x72c in a library called libcstest, and the stop condition is in the main binary. So you start somewhere on the system and you stop someplace else. This simply highlights that you don't have to confine tracing to a single binary or a single library. And you can have multiple filter specifications as well: here there's only one, but if you put a comma you can specify as many as you have address comparators on your tracers.

Yes. Yes, a very good question. I haven't thought about that yet. The same thing would be useful for security as well.
So how do you specify these addresses for the secure world? You would specify them in the same way.

But does the ETM hardware have the capability to filter on addresses issued by something running at hyp, or is it something that you have to do yourself?

Repeat the question, please.

Who would be able to do the filtering? Is the ETM hardware able to filter directly on the addresses of instructions executed at hyp, or do you have to do that in software?

I think it has the capability. Yes. I understand. Absolutely. The security people have asked to see the exact same thing, but for the secure world. So yeah, that's a bit of a puzzle right now. I've been thinking that the current scheme is very likely flexible enough to support it; it's just a matter of making it do so.

Okay, so the limitations on filters: as I mentioned, you can only specify as many filters as you have filtering capability on your CoreSight tracers. This is important simply because every tracer will have a different number of address comparators depending on the implementation. This can be found at boot time in the information that the tracers expose via sysfs, or simply ask a hardware designer who worked on the system and you should have that information pretty quickly. Other than that, we simply don't allow start/stop filters and range filters in the same specification, in the same trace session. You can have as many start/stop filters as you want and as many address range filters as you want; you simply can't have them co-located in the same trace session. Other than that the scheme is very liberal; I haven't been able to break it yet.

Okay, so once again the integration with perf gives us the possibility to work with things that are already there. Anyone who has worked with perf will be familiar with the perf.data file. When you are starting to work with perf and CoreSight, if you want to know whether CoreSight is alive on your platform, simply dump the perf.data file: among other things you'll see all of the events, but you will also see the output of the packets that were generated. It will look something like this. The packets don't tell you much about what happened during a trace session, but at least they tell you that something has happened; otherwise you won't see any trace packets. So the perf.data file is always a good place to start, full of cool information along with the packets that were generated.

So, enough of theory; it's now time to look at an example. I've decided to simply show what we can do with CoreSight using a very simple example. On the right side we have a main. The main just calls one example function, coresight_test1, and prints the value that gets returned. coresight_test1 happens to be located in a library somewhere on the system, and the only thing it does is loop over a parameter five times and return the value; that's pretty much it. It's very simple, and yet fairly representative of what someone would want to do to debug something on a system. Here we'll concentrate on what's happening in the test function. For that, the first thing to do is grab the binary from the platform and submit it to objdump, and we see here that coresight_test1 sits pretty much at the address that we've seen before.
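A hypothetical rendering of that objdump output; only the function address is taken from the talk, the disassembly line is illustrative:

    $ objdump -d libcstest.so
    ...
    000000000000072c <coresight_test1>:
     72c:   d10083ff    sub sp, sp, #0x20
     ...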
So everything that we've seen before was based on this example, so that people can relate the earlier command lines to what is presented in the slides. 0x72c is the address of our test function. It's also interesting to note, because I will be using an address range filter, that the function goes on for about 0x40 bytes. With this information we simply go back to our target: in the middle of the screen we specify cs_etm and the sink that we want to use, we're tracing in user space, then the filter keyword, and from there the specification of our filter: 0x72c, we want to trace for 0x40 bytes, and the full path to the library on the system. Perf will go to work, it will program the PMU, tracing will start when the process is actually scheduled on the CPU, and at the end of the session you end up with a perf.data file; at the bottom of the screen we see that perf has picked up about 8K worth of data. It's that simple.

Typically what will happen is that we generate traces on the target and package all of the trace data for decoding on a host someplace else. When doing that, the perf.data file is obviously the first thing to package, but also the .debug directory. Why? Simply because under .debug perf will put all of the binaries and all of the information that pertains to the session we just went over. So here we have our kernel symbols, which is typically the content of System.map; the VDSO; the main binary for the session, with its full path; the libraries that we used, so the loader and libc; and finally, on the far side, we have libcstest, which is the library that I compiled for this example. All this is given to us by perf; the only thing we have to do is pick it up, package it with the perf.data file, and off to a host for decoding.

Once we've picked up the trace data and moved everything to a host, the first thing to do is probably to start with perf report. If you dump the output of perf report to the console using stdio, you will get a view of the hot spots that were recorded during the session. This is only an indication of what happened, because perf report uses the start address of each range; it won't tell you anything about the end of the range. If we have five ranges that started at the same address but did not end at the same place, they will be aggregated together and presented to you as a single data point. So it's a good indication of what has happened, but for more information you really have to write your own script or use the built-in script feature in perf.

That looks like this. By default, perf script will simply give you all of the address ranges that were collected during the session. In blue here, if you look at the addresses and go back to the objdump output that we had for the library, you'll see that we entered the function at 0x72c, some initialization was done, the loop ran five times, and we returned back to the main function. Because we told CoreSight to trace just this range, this is the only range that was picked up. So in blue we have the ELF, or relocatable, portion of the address, the same thing we've seen in the objdump output.
The most significant bits come from the offset at which the library was mapped in RAM when the program was loaded. That information can be gathered from the perf.data file: you look at the mmap events that were recorded for the session, correlate the address and the full path to the library, and by adding the two you have the exact address in RAM where a specific instruction was executed.

As an example of what can be done with perf script, we've decided to provide two scripts: the range script and the disassemble script. The first part of the screen simply shows how to call the range script; you need a few environment variables, which is why I decided to show this as an example. The range script prints fairly useful information: the beginning and end of the ranges, and again we see where the function started, where the initialization was done, the loop, and how the code got out of there and went back to the main function. These are, basically, the instruction ranges that were executed. If we want more information, and this is where the power of scripting and CoreSight come together, we can use the disassemble script. Both of these scripts are found on GitHub; I'm just showing here the way they're called, and the end result. The disassemble script was written by Tor Jeremiassen at TI in about two hours over lunch, and this is the end result: for each instruction range we have the file it was executed from, the CPU it was executed on, and all of the instructions, in assembler, that were executed for that range, and the same is repeated for all of the ranges that were executed. So by investing just a little bit, you can get a lot out of the traces that are generated. And this is just an example; there is a ton of other information in the sample blocks synthesized by perf that can be tapped into.

Obviously there are a lot of things that I don't have time to talk about. The first thing I'd like to highlight is that we've just seen an example of how to generate traces for user space; the same can obviously be done for the kernel. But if you are doing that, be mindful that the vmlinux file does not end up in the .debug directory. You will have to pick it up either from a build that you keep as a reference or from the system itself; it has to be part of the information that is collected in order to do trace decoding, and you have to manage it yourself if you are tracing the kernel.

Everything here was, by design, done the same way as for Intel PT. The idea is that if someone is using hardware assisted tracing on one architecture, they don't have to relearn it for another architecture; it simply follows the same framework and the same syntax. Meanwhile, the framework and the drivers themselves are heavily based on sysfs, so if you are looking at spinning off your own tracing solution, you will have to integrate with the registers that we expose in sysfs. On the kernel side, the whole solution is there. On the user side, a lot of perf tools work is going into 4.9, and the remaining parts, so perf report and perf script, will happen during later cycles.
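Before moving on, the decode side condenses to a short host-side sequence. A sketch; the two example scripts and the environment variables they need are documented in the OpenCSD HOWTO.md rather than spelled out here:

    # on the host: perf.data plus the .debug directory copied from the target
    perf report --stdio     # aggregated view of the ranges that were hit
    perf script             # default output: the executed instruction ranges
    # the range and disassemble scripts live in the OpenCSD repository;
    # set the environment variables listed in HOWTO.md before invoking them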
And as I mentioned, we are working on a CTI driver, so that should be coming at one point or another in a month or two. And with this, this is the team that has worked with me on CoreSight over the years. It is now time for your questions.

I would like to ask: do you have some real example, or case, where it was useful for you? More about use cases, and what it could be used for.

What it could be used for: basically, right now this provides a foundation to build things on top of. I know that it has been used at customer sites, by people we have been working with, for debugging things like a frozen system: the system stops, there's no output on the console, so what happened, where's the instruction pointer? It has been very useful in that area. Because of the massive amount of trace data that gets generated, we have to understand that this is a micro-level solution: if you end up generating too many traces, you're going to spend too much time looking at what happened. So you narrow down the areas that you think are problematic, and from there, using CoreSight, you can really see what the CPU has done for a specific problem. You might have a glitch in your rendering: why is it that the glitch happened? Did we run out of RAM, did we have a TLB miss, is the cache a problem here? So it's used to really pinpoint problematic areas that might be really hard to debug. For instance, if you're working in the scheduler and at one point or another something just stopped: what happened? Things like that.

And could it be used for, how to say, performance measurement?

Yes. Cycle-accurate tracing is an option that we do provide, and that could be used for performance measurement. Actually, I think there's a group at ARM that does just that: they use cycle-accurate tracing in their diagnostic suite.

Okay. And one more question: could it be used for non-Linux systems, like Cortex-M microcontrollers and so on?

Absolutely. Absolutely. Some people actually do that: they have an RTOS somewhere on their platform and they staple an ETM to it. So it doesn't matter: as long as traces are generated and the CoreSight topology is configured properly, you will end up with the traces in the sink, and these traces can be correlated among all of the IPs that you have on a platform. They don't have to run Linux; as long as the configuration conforms to the trace scenario, things will just work. That's one of the advantages of using CoreSight: you can do tracing and correlate traces on systems that don't necessarily run Linux.

Okay, thank you.

Yes, I have two questions. One is about the timestamping: the time you get, is that generic in CoreSight, or do you use the generic timer hardware?

So the CoreSight hardware will generate the timestamps that you find in the packets.

But is this something you have to configure, depending on, let's say, the SoC you're using?

There is some configuration that is specific to that clock. I don't know all the details in that area, but you can ask to have a timestamp generated in the stream every X number of cycles. That's the type of thing you can configure.

You get this out of the box? I mean... Yes.
For example, if you use an OMAP5, you get a selection of four hardware counters which you can configure, and depending maybe on the boot loader, or on the BIOS you're using, you will end up with different configurations for that.

So the clock, and the timestamp that is found in the packets, is based on the clock that comes into the IP blocks, and this is typically the APB clock. There are two clocks that can be used: the APB clock, as I mentioned, and another one that can come from just any regular clock that you choose to have on the system. At configuration time, in the drivers, you can choose which clock you want to use for synchronization; it's just information somewhere in the configuration of the CoreSight framework. In the bindings there's room for two clocks, so the APB clock and the one I just talked about, but right now the drivers don't pick up the second clock. That's an enhancement you would have to add, probably a device tree property that says to pick up the second clock rather than the first one.

Thank you, and my second question: currently we see that we are tracing code, somehow. But if you look at the overall system, you have an embedded system running multiple processes and threads and so on. So typically, when we are tracing, we use something like LTTng and we see all the parallel activity on the cores. It would be a real dream if you could use the instrumentation which is provided in the kernel for LTTng, or for any other kernel tracer, and have maybe part of it go to the hardware tracing unit. But I think this is a long way off, correct?

Correct. So the idea is to provide building blocks that people can use to build what you just described. Our main intention here was to provide a foundation so that people can converge toward a focal point in CoreSight development, rather than everybody doing their own solution.

The first thing I would propose is the scheduler: just getting all the scheduler activity traced somewhere in some hardware buffers, in these CoreSight modules, so you get an accurate scheduling trace of the system without interacting with it or interfering with it.

So if you want to trace the scheduler, CoreSight is probably not the best tool, simply because of the amount of data that gets generated. But if you're looking to trace the context switches, whichever process gets scheduled at this very instant, or what happens during the transition from one process to another, CoreSight would be ideal for that. And in the specification there's room for a TPIU, a trace port that connects to a port on the board and a decoder box on the side. It allows you to export data in real time, so that way you can generate a massive amount of data without impacting anything that you have on the system.

How big is the internal buffer memory for storing that?

It's a good question; that depends on the IP block that you're using as a trace buffer. It can range from, you know, 4 to 10K for memory that is embedded into the blocks themselves. There are also provisions, and drivers, that allow you to dump the trace data into memory, system memory that is.
So it can basically be as big as your system memory, but then you can't do anything else on the system. Which is why, if you envision tracing anything substantial, it becomes important to just use a trace port and send everything out in real time to a host that actually has the memory and the capability to handle all this.

Okay, thank you.

Thanks for the talk. Sorry if I missed it, but could you share your opinion about the security implications of such tracing technologies? It seems like a perfect rootkit: you just come to a running system, install a USB debugging tool, take the keys and go away without any trace.

Right. So to trace anything on the system you need to have root privileges. As soon as you have root, then you can use CoreSight, but as soon as you have root you can do a lot more damaging things than just using CoreSight.

There are some tracing technologies which don't even disturb the operating system that runs on the box. You just use some hardware support from the processor that runs the operating system, which will silently give you all the data about the running operating system and its memory. What do you think about it?

So what is the question? I'm not sure that I get where you want to go.

My point is that tracing technologies have very serious security implications. Is that considered when such tracing technologies are designed for different architectures?

Well, again, I'm not sure of the question. Did I consider security in doing this? It depends on what kind of security we're looking at. As I mentioned, in order to trace anything you need to have root, and right now we don't support tracing anything that happens in the secure world. Other than that, there's no security built in beyond what Linux provides in terms of the normal, generic security mechanisms.

Okay, thanks.

All right, have a good afternoon. Thank you.