So, hi everyone, thanks for coming. My name is Djordje, this is Nikola Prica, and we come from RT-RK, a company from Serbia. We have been working as a team on various projects in the field of compilers, debuggers, profilers and so on. Today we are going to talk about how far we can go with debug info for optimized code in LLVM, specifically about an improvement based on looking at function parameters' entry values at the call site. Since debugging optimized code is very challenging these days, we are working very hard to improve it.

RT-RK is a software company specialized in system software and embedded systems. As I said, we are part of the group working on compilers and tools from the open-source area, such as GCC, binutils, LLVM, luogit and so on. The group has been working on LLVM since 2010, and we have been working on debug-related issues in LLVM for the last two years.

Here are the points we are going to cover today. First, we will start with a general story about debugging software release products. Then we will describe a technique that can be used within a debugger for finding the actual values of function parameters, even when they are reported as optimized out. After that, we will point to the parts of our LLVM implementation that support this feature, and at the end we will show you some numbers confirming that we really do improve debug location coverage. So, let's start.

Software release products are compiled with some level of optimization; for compilers like GCC and LLVM Clang it is usually the O2 or O3 level. Such a product might produce a core dump file on an embedded system, and that file could be the starting point of our analysis of the problem. The first thing we usually do after loading the core file into a debugger on our local machine is to analyze the call trace from the crash. Through the optimization pipeline, we lose a lot of debug information.
In those call traces we see a lot of optimized-out parameter values. We also noticed in those debug sessions that even if a variable is alive at some point, it could have been clobbered earlier and reported as optimized out. Developers with more debugger experience are aware that some of those values can be found in the parent's frame, or in a frame above. Looking at this example here, we see that parameter x is reported as optimized out, but if we jump into frame two, we are able to print its actual value at the call site. If we can do that by hand, why not automate the process within the debugger? In order to do that, we need additional information from the compiler side. In this example we see the whole process automated on both sides, debugger and compiler, and we also see the entry values printed out, so those optimized-out values turned into actual values.

A little bit of motivation to listen to the rest of the presentation carefully: by implementing this feature in LLVM Clang, we saw the number of fully covered function parameters increase by up to 15%, which is a very good number.

There are several DWARF constructs introduced in order to support this feature. Does everyone know DWARF? I hope so, yeah. For those who are not familiar, DWARF is the standard for describing debug information on Unix-like operating systems: compilers generate it, debuggers consume it. To mention some history, this feature was initially introduced by Jakub Jelinek, and it has been implemented in GCC and GDB since 2011. Basically, two tags represent call sites and call site parameters, and a set of attributes describes them. There is also a DWARF operator, DW_OP_entry_value, that can be used within a DWARF expression to describe the actual value of a parameter. So, let's jump into a real example. Here we see a real call site entry printed from the debug info section. It says that the call lives at this address in memory, the DW_AT_call_pc.
That is the address of the call instruction in memory. The DW_AT_call_origin represents a reference to the debug information of the called function. The DW_TAG_call_site_parameter is a child of this particular call site, and it says that the parameter lives in register 5 and that at the call point it had the value 7.

Here is another example. If a parameter keeps an untouched, unchanged value through the course of the function, we can use DW_OP_entry_value to describe the actual value of that parameter at the places where it has no location information at all. Let's imagine that a parameter has its scope from address X to address Z, but only from X to X plus epsilon does it have location information. For the rest of its scope, we generate an additional entry-value location description.

Nikola will now continue with the implementation details.

Hello, everyone. I will present the entities that we introduced at the IR and machine IR level in order to carry information about call sites through the compilation process. I will also mention some of the concerns we have about certain parts of the implementation. How many of you are familiar with debug info metadata in LLVM? I will explain it briefly: it is inspired by DWARF tags, and it is used to represent source entities such as files, functions, lines, lexical blocks, variables, et cetera. Since debug info metadata is inspired by DWARF, and Djordje showed that call site information in DWARF is represented by the tags DW_TAG_call_site and DW_TAG_call_site_parameter, we decided to introduce DICallSite and DICallSiteParam metadata. Here we can see that the call instruction has an attached metadata node referencing the call site metadata.
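The call-site encoding walked through above can be sketched in `llvm-dwarfdump` style. This is a schematic reconstruction, not verbatim output from the talk; the address is made up, and register 5 corresponds to RDI in the x86-64 DWARF register numbering:

```
DW_TAG_call_site
  DW_AT_call_pc       (0x00000000004004f2)     address of the call instruction
  DW_AT_call_origin   (reference to the DIE of the called function)

  DW_TAG_call_site_parameter
    DW_AT_location    (DW_OP_reg5 RDI)         where the argument is passed
    DW_AT_call_value  (DW_OP_lit7)             value it had at the call: 7
```

A debugger that understands these entries can evaluate DW_AT_call_value in the caller's frame to reconstruct the parameter's entry value.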
We can see that DICallSite has the following arguments: scope, file, a reference to an array of DICallSiteParam nodes, line, and a reference to the called subprogram. A DICallSiteParam has an argument number, a reference to the variable's metadata, and an expression over that variable. If the called argument is a constant, we would have only a DIExpression representing the constant. This DICallSiteParam is used as a backup location for catching the parameter at entry; I will show the primary location later. With this metadata we essentially emulate jumping back into the caller's frame and printing a certain expression. We also need to mention that, in order to use the new DW_OP_entry_value for representing other variables, especially parameter variables, we need to verify that the parameter has never been modified in the function. To check this, we used a simple is-not-modified check and recorded that information in the parameter variable's debug metadata.

This approach has certain benefits and certain limitations. Regarding benefits: it resembles the DWARF tags and follows that idea; we get an additional backup for representing a call site parameter once the primary location is lost; and we are able to produce a DW_TAG_call_site_parameter that can itself contain a DW_OP_entry_value, which means we can look two or more frames up in search of an entry value. Regarding limitations: there is no support for representing expressions over multiple variables; to do so, we would probably need a new kind of debug metadata. Also, since we are emulating jumping back and printing a variable's location, we can do so only because there is a system for tracking variable locations in LLVM; there is no support for tracking a function's return value location, and that is the reason we cannot represent arguments that are themselves function calls. Also, there is no easy way to represent the address of a variable.
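As a summary of the metadata shape described above, here is a sketch of what the textual IR could look like. The syntax and field names are hypothetical reconstructions from the talk's description; the exact spelling in the actual patches may differ:

```
; hypothetical textual form of the proposed call-site metadata
call void @foo(i32 7), !dbg !15, !callSite !20

!20 = !DICallSite(scope: !5, file: !1, line: 14,
                  parameters: !{!21}, calledSubprogram: !9)

; a parameter backed by a variable: argument number, variable reference,
; and an expression over that variable
!21 = !DICallSiteParam(argno: 1, variable: !12, expression: !DIExpression())

; for a constant argument, only an expression encoding the constant
!22 = !DICallSiteParam(argno: 1, expression: !DIExpression(DW_OP_lit7))
```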
It is hard to distinguish in the debug metadata between a variable's address and a variable's value; it is not impossible, but it is pretty complicated. Also, in order to provide the reference from DICallSiteParam to the variable's metadata, we needed to change the pre-standard and stable DIBuilder interface: previously, creation and preservation of a variable's metadata was done in one function call, but we needed to separate the two.

The next pass that I am going to talk about is the instruction selection pass. It is important to mention this pass since it is the place in our implementation that could most likely be improved. We implemented a general algorithm that should work across architectures. The job of this algorithm is to recognize the copy instructions that forward function arguments to the following function call; we can see, for example, pseudo instructions like this. The process starts after target call lowering, whose result is a target call lowering info object. This object contains the sequence of SelectionDAG nodes that represents the call sequence. We iterate through this sequence and search for CopyToReg SelectionDAG nodes; these nodes should later be mapped to copy instructions. We then try to match the copied value with one of the function's input arguments. Such verification is required because there can be additional copy instructions inside the calling sequence: for variadic functions, for example, an additional register copy instruction is required, and some calling ABIs might load an additional value that is not a function argument. This algorithm could be lowered to the target-specific level, more precisely to the level where the call sequence is generated. Once we have matched these nodes, we preserve them in the instruction selection representation of the call site, and later we emit them as DBG_CALLSITE and DBG_CALLSITEPARAM pseudo instructions.
Our backend implementation relies pretty much on how the DBG_VALUE pseudo instruction is handled. That pseudo instruction is used to track a variable's location in registers, virtual or physical, in stack locations, or at some address. Here we can see the DBG_CALLSITE and DBG_CALLSITEPARAM instructions. For DBG_CALLSITE, the first operand is a boolean saying whether the referenced call is a tail call or not; the second operand is present only if the call is indirect, in which case it is the call's register location; and the last operand is a reference to the DICallSite. This could be implemented differently, but we chose to keep all information about a call site in one place. The DBG_CALLSITEPARAM instructions are attached as a bundle to the DBG_CALLSITE. The first operand of DBG_CALLSITEPARAM is the register that forwards an argument to the following function call; the second is a reference to the DICallSiteParam; and the remaining operands represent the location that is loaded into the parameter-forwarding register.

I will mention that this machine IR was produced with our LLVM 4.0-based compiler, and that I have stripped some of the instructions to make the call clearer. Here we can follow the behavior of variable c: it is indeed forwarded as the first argument of function foo, and its first value is the constant 4. After this function call, the result is returned in eax and moved to the ebx register, and later it is forwarded as a function call argument through edi.

In order to handle these instructions in the backend, we needed modifications in the prologue/epilogue inserter pass, register allocation, SplitKit, the virtual register rewriter and, most importantly for us, the LiveDebugValues pass. The job of that pass is to propagate DBG_VALUE pseudo instructions into successor blocks where the preserved location is not clobbered, that is, where the location remains valid. Naturally, it should also emit replacements for the DBG_VALUEs that represent parameters.
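The DBG_CALLSITE / DBG_CALLSITEPARAM bundle described above might print roughly like this. This is a schematic sketch following the operand order given in the talk; the concrete printed syntax in the patches may differ:

```
; DBG_CALLSITE: first operand 0 = not a tail call; the call here is
; direct, so there is no register-location operand; the final operand
; references the DICallSite metadata
DBG_CALLSITE 0, !DICallSite(...)
  ; bundled DBG_CALLSITEPARAM: $edi forwards argument 1, and the value
  ; loaded into it at this call is the constant 4
  DBG_CALLSITEPARAM $edi, !DICallSiteParam(...), 4
```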
Once those locations are clobbered, we emit a DBG_VALUE with the new DW_OP_entry_value expression. This pass also knows, for each basic block range, which variables are live at that point. We adjust the DBG_CALLSITEPARAM instructions here as well; by adjusting, we mean that we delete the instructions that are not valid. Instructions that are not valid are those that have neither a primary location nor a backup location, and an invalid backup location is one that references a variable that is not live at that point of the block. As for printing these locations into the object file, that is done in the DwarfDebug pass and handled much like the DBG_VALUE instruction, so there is nothing special to say here; it relies on pretty much the same structures as DBG_VALUE.

Djordje will now present the measurements that we have.

So, yeah, we will show you some numbers that confirm our improvements. To gather them, we used the locstats tool from the elfutils package. It looks at the scope of each variable and calculates the debug location coverage we need. For these slides we used GDB 7.11 and the SPEC CPU 2006 benchmark as test subjects. We noticed an increase in debug location coverage, and, very importantly, we didn't touch code generation: there is no change in the text, BSS or data sections, only in the .debug sections, as expected. Just to mention, for the O2 and O3 levels we noticed very similar results.

The first example is GDB 7.11; we built the latest release version of it. There we saw an increase in fully covered function parameters of about 15%, which is about 17,000 more debug variables with full debug location coverage, a very good number. The average coverage per variable increased by about 10%, with no change in code generation.
The only change is in the .debug sections, and in this case we measured a build time increase of about 2%. Another example: we built the SPEC CPU 2006 benchmark. It is a large suite designed to stress the processor, compiler, memory subsystem and so on; it is pretty standard. There we saw an increase in fully covered function parameters of about 8%, which is about 12,000 more debug variables with full coverage, and the average coverage per variable increased by about 4%. Again there is no change in code generation; the change, as expected, is only in the .debug sections, and build time in this case increased by about 1%. These improvements in debug location coverage are the most important result.

I will just wrap up the presentation. After we finished this implementation, we identified two main spots that require further discussion: there is a question about the usefulness of DICallSiteParam, and there is a question whether the algorithm in the instruction selection phase should be lowered to the target-specific level, which requires target-specific knowledge. And what does our implementation provide? Most importantly, correct debug data: we tried to be as strict as we could in order to achieve this. Our measurements also showed that we didn't touch the code generation process, which matters because our implementation takes part in the whole compilation pipeline. It also provides infrastructure for collecting information about call sites; we believe that we have touched all the necessary passes and that the current flow is sufficient, although it certainly leaves room for improvement. And it gives us the desired results: an improvement in parameter location coverage and new functionality in the debugger, namely working entry values.
These entry values in the debugger clearly provide a better user debugging experience for programs produced with LLVM. Before we finish, we would like to mention that we implemented this feature in collaboration with Cisco, especially with Ananth Sowda and Ivan Baev. Thank you for listening and thank you for your time.

I was curious about the slide where you showed the measurements: you showed -g together with an -f flag for parameter entry values. Is that -f flag added to the -g group, or to any of the -g level groups, or is it something I explicitly need to enable?

First, the question was: do we include this -femit-param option in the default -g option? For now, no. Why? We initially implemented this for an internal version of our LLVM 4.0 compiler, and we tested it within Cisco on large projects. We also introduced the new option in order to do all of that testing and make sure that we especially didn't affect code generation; as soon as we are sure that everything goes well, we will include this in the default -g. And just to mention, we are in the process of porting this onto LLVM trunk, the latest version; actually, we have already ported it, so we are finishing the testing phase, and as soon as we are done we will post the patches to LLVM. Yeah, thank you. It is supposed to be this month, so we expect this.

The next question is: where is the target-specific implementation? The target-specific part is in instruction selection, in the part for SelectionDAG node generation.
It concerns how some targets generate the call sequence, and we noticed that there are issues with matching those SelectionDAG nodes: for example, you could have a node whose value is wrapped in some extension, say zero extension or sign extension, and then you can't match the two nodes; you explicitly need to match the SelectionDAG node that does not carry the extension. Also, we implemented a salvaging function that recovers some locations: for example, when we lose track of the primary location in the virtual register rewriter, we go back up the instruction stream to search for the instruction that loads that parameter and try to interpret it; that salvaging handles the x86 LEA instruction, for example.

In the upstreaming process, are you going to implement that for all targets? Selectively, so you don't have to turn it on until a backend supports it? Sorry, can you repeat the question? In the process of upstreaming, as it stands now, it sounds like some parts will only work on x86; how are you going to approach the backend implementers to get them implemented, or will you implement it yourself?

So, the question is whether this only works for the x86 architecture. We fully tested it only for x86; we used the feature to build complex software for other architectures such as MIPS and ARM, but we didn't go into deep detail there to see where we lose information. Actually, when we started looking at this feature and calculated those measurements, we also calculated the numbers for GCC. Comparing directly doesn't quite make sense because the generated code is different, but honestly LLVM is still behind GCC in debug location coverage, since this feature has been standard over there since 2011 and has been improved in stages. So if your question was whether GCC still has better debug location coverage: it does, but with this feature LLVM will certainly have better coverage than before, and we are working to be at least level with GCC.
I had a related question. This is really useful work, because it's an area where debugging optimized code in LLVM is behind GCC and needs to be improved. Have you looked at the impact of this on the GDB regression test suite? It has some quite good corner cases for testing these things.

The question was whether we looked at GDB for some corner test cases. Thank you for that question; we didn't, and we will gladly look into it. Just to mention here that LLDB doesn't support reading entry values; I think that for now they will simply be ignored. But within GDB we tried out the debugging user experience with binaries compiled with this version of LLVM, and it works for a bunch of cases. So, thank you for the question.

The next question was whether this functionality is available only for DWARF 5. It was initially introduced as a GNU extension and used in GCC that way, but since DWARF 5 was released it has been part of the standard. For now, we internally generate the GNU extension forms as well, but it is supposed to be both; sure, we should support the GNU as well as the DWARF 5 symbols. Thank you for the question.

On these metrics: I'm not entirely sure what the metric "average coverage per variable" really means, but I'm assuming it's impossible to get 100%, because some variables will be completely eliminated and there's no way, even if you were perfect, to find them. Do you know what the theoretical maximum would be?

The question is what the theoretical maximum coverage percentage would be. We use this locstats tool, and it is not able to measure a variable's visibility from where it is defined to its last use; it can only measure coverage within the lexical block where the variable is defined. So these numbers just show that we improved something. A similar tool was also used by Jelinek in his paper to demonstrate this improvement. It would
be perfect if we had a tool that actually looks at the lifetime of a variable, but we don't have one; we use this tool only as a reference to measure the improvement. If someone knows of a tool that tracks variable liveness, please feel free to advise us. Thank you.