Can everybody hear me? Yeah. So this is about DWARF, and the room has emptied a lot, for some reason. I don't know. Not that I know. Well, maybe a little bit, but not much.

Okay, so who knows about DWARF pieces already? At least some people. One of the goals of this talk is of course to explain a little bit about DWARF pieces, so everybody who knows them already may be a little bit bored, but I cannot help that, because it's really necessary.

So I'm just jumping right in. DWARF has a concept of so-called composite location descriptions. Such a composite location description is a special location description for describing the location of a variable, like other location descriptions. A composite location description consists of one or more pieces, and each piece has a simple location description and a composition operation. There are two such composition operations, namely the piece and the bit piece operation (DW_OP_piece and DW_OP_bit_piece).

They have existed for a long time already, but they were not used very much at the time they were invented, because compilers didn't optimize that well back then. Now we are seeing more and more use of these operations by the compilers, so they have become more important.

So let me quickly explain why pieces are needed at all. One thing, of course, is compiler optimization that, for example, spreads around things within a struct. When you have a struct and the compiler decides that one field in the structure is used particularly frequently, in a hot loop or something, then this field might be allocated in a register, and some other parts of the structure might not be needed at all. They will maybe not be represented at all, or they are only represented at certain times in the program, things like that. And then reconstructing the whole structure is basically just a composition of individual things, right?
That's what the piece operations do; that's what "pieces" means. The same applies to bit fields, and that's why we also need a bit piece operation. It's not sufficient to be able to compose pieces of full bytes; we also need pieces that are smaller than a byte.

Another example is that an array may be spread across vector registers. That's a different type of optimization, where maybe a character array is spread across two vector registers or something. Then you would also have two pieces of this array and represent it that way.

Another possible optimization might be to optimize out high-order bits in a general-purpose register, so not represent them in the register itself, to avoid sign extension. I don't actually know whether the compilers do that right now.

Another reason, apart from compiler optimization, to represent composite locations is that some objects might be split naturally. For example, a long double value might be distributed across two floating-point registers on some architectures. Some architectures express it like that, and then the hardware can really deal with that data and do arithmetic within these two registers. Another example is that some ABIs pass small structures in registers, even if two registers are needed for that, and then the structure composed of these two registers needs to be pieced together like that.

So these are some reasons for pieces. Right now they already appear very frequently in optimized code anyway.

So what can a piece be taken from? A simple location, of course, and these are basically the types of simple locations that we can use for that: a register; memory; a DWARF stack value, so it can also be something that we have computed from some other stuff, right?
And an immediate value, which is basically a constant, so a literal. Or it could be empty, which means that this piece is optimized out.

Then I have an example here. This is a little cut off now, but well. So you have this structure here, and the first element of the structure is a character. In this example the compiler decided not to represent it at all, but the compiler knows that at this point in the program this character has the value 5. This can be expressed by this small DWARF code up there: literal 5, then take the stack value, and then take a piece of one byte of this. After that we have a three-byte alignment gap. We express it by just a piece operation without anything in front of it, which means that this alignment gap is not there; it's just optimized-out stuff. The next object would be an integer, and the compiler decided to put that into register 4, whatever register 4 is on this architecture; it's not a real example here. So this can be expressed by this code, and so on. It's not very difficult.

The last example here, with a short, might be a little bit interesting, because I use a bit piece here. You cannot see the parameters for that piece, but basically the first parameter of the bit piece is the size and the second parameter is the offset. The size would in this case be 16 and the offset 32.

Yes. Well, in theory, yes.

Did you intend to write that?

Yeah, yeah. I hoped that somebody would jump at this. Which it is. Yeah, right. I mean, they don't overlap. The integer is four bytes long and it's right-aligned, and the short is two bytes long and it's to the left of that in the same register. Theoretically possible. I don't know if the compiler would really... Yeah, it's possible that compilers might do this.
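The slide example just described can be written down as a plain list of pieces. This is only a sketch under assumptions: the `Piece` class and the `layout` function are invented for illustration, the location strings are mere labels, and only the pieces actually mentioned in the talk are listed (the slide had further entries in between that the recording does not show).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Piece:
    location: Optional[str]  # None models an empty location (optimized out)
    size_bits: int           # size of the piece in bits
    offset_bits: int = 0     # second operand of DW_OP_bit_piece, else 0


# Hypothetical model of the pieces mentioned in the talk.
pieces = [
    Piece("stack_value(5)", 8),  # DW_OP_lit5; DW_OP_stack_value; DW_OP_piece 1
    Piece(None, 24),             # DW_OP_piece 3 (alignment gap, no location)
    Piece("reg4", 32),           # DW_OP_reg4; DW_OP_piece 4
    Piece("reg4", 16, 32),       # DW_OP_reg4; DW_OP_bit_piece 16, 32
]


def layout(pieces):
    """Return (start_bit, end_bit, location) for each piece, in order.

    Pieces are simply concatenated: each one starts where the
    previous one ended within the composed object.
    """
    spans = []
    pos = 0
    for piece in pieces:
        spans.append((pos, pos + piece.size_bits, piece.location))
        pos += piece.size_bits
    return spans


for start, end, location in layout(pieces):
    print(f"bits {start:3}..{end:3}: {location or '<optimized out>'}")
```

The sketch only records where each piece lands in the composed object. Which bits of `reg4` the pieces refer to is exactly the placement question that the talk turns to next.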
Yeah, I haven't consciously seen it yet, but maybe.

Now another interesting question, because we just talked about the short value living in this general-purpose register at some point in the register: at what point in the register, right? This is the topic of piece placement. The placement of pieces is defined very vaguely in the standard, I would say. So this is a text from the standard here: if the piece is located in a register but does not occupy the entire register, the placement of the piece within that register is defined by the ABI. That is for byte pieces. For bit pieces the definition is a bit different: it says that the offset is from the least significant bit end of the register, whatever that means. And these two don't agree, so I don't know what is really going on here.

Yes. No, I mean, the discussion went on for many weeks, with many mails being exchanged, and there was no result.

No, I haven't seen the rationale. I have asked for it. Yeah, I don't think they have it. Maybe they still have it, but I don't think so. It's pretty old, right? I mean, DWARF pieces themselves are pretty old, and then there was a refactoring for the bit piece stuff, but even for that I'm not sure anybody still really knows the rationale.

Anyway, for memory pieces and implicit values and stack values there's also a definition. It's even a bit different here. Of course, with memory the situation is different, because what I expect a piece from memory to look like is that we always start from the address that we have been given; not backwards, but always on the positive side of that address. That's my expectation. I think everybody else expects it like that too, and the compilers certainly do, I think. But for implicit values and stack values it's not that clear.
I mean, for implicit values my preference would be to treat them exactly the same as memory, because implicit values look like memory. But that's also not what the standard says right now, as you can see. Whatever. I'm just showing you this because it's the only information we have about this, and it's a bit inconsistent.

The implicit piece placement is also something that is not really defined by the standard. The standard doesn't even say that it is defined by the ABI; it just says nothing about it. Namely, if you don't use a piece operation but just take a location and use it for a type that is smaller than the location itself, like when you have a register but you only use 16 bits of that register, which 16 bits are you using? It's not obvious in some cases. If you take a short from a general-purpose register, then it's pretty obvious that you will probably want the least significant bits of the general-purpose register. If you take a character array of three bytes, it's a bit less obvious. So I don't really know. You can also look at this with other objects, like stack values, vector registers, and floating-point registers. In all of these cases it's not completely clear what the placement looks like. But in many of these cases there is at least an implementation in GCC already that does something. It's not necessarily true that it agrees with itself in all cases, and sometimes not with GDB either.

Okay, whatever. One question that I had when looking at this was: if this is the case, and we can just take a location and implicitly use a smaller piece of that location, then what happens if we append an actual piece operation after that? Is that a no-op then? It's also not obvious. I think most people think that it might be, but there are actually counterexamples. For example, let's look at Intel's 80-bit floating-point registers. When storing them in memory there are actually 128 bits, and implicitly you usually take a 128-bit piece of that 80-bit register, which is actually too large, right? What happens now? So, yeah, there are some things that are not completely clarified yet.

Now let's look at some stuff that should be more obvious: the composition of pieces. If you have multiple pieces and you piece them together, it's obvious how to do that, right? If they are byte-aligned, at least, my feeling is that this is mostly obvious, because what you probably do is look at the memory representation of your value and just add the bytes at the end. But when it's only bit-aligned, say you have two bits that you need to append to your already existing pieces, where do you append these two bits? That's a bit less obvious. There are basically some assumptions that I would state about this: namely, that no byte alignment gaps result from such an appending-bits operation; that the next piece starts in the same byte as, or directly after, the last byte of the previous piece, and directly after only if not a single bit fits into that byte anymore; and that within a byte we use the ABI's bit allocation order.

What that means is basically just this: if you write such a struct, then you know the bit allocation order of your ABI. Obvious, right? Well, is it? What does it look like? On little-endian platforms, at least on Intel platforms, it's pretty easy: bit zero is the least significant bit and bit seven is the most significant bit. On big-endian platforms it's usually exactly the other way around. And there might be platforms which do something completely different.
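To make the bit allocation order concrete, here is a small sketch that packs consecutive bit fields into bytes without alignment gaps. The function name `pack_fields` and its parameters are invented for illustration. It assumes only the two orders named in the talk: allocation starting at the least significant bit of each byte (little-endian) or at the most significant bit (big-endian).

```python
def pack_fields(fields, bits_big_endian):
    """Pack (value, width) bit fields into bytes, in declaration order.

    Assumption for this sketch: with little-endian bit order, allocation
    starts at the least significant bit of each byte; with big-endian
    bit order it starts at the most significant bit.
    """
    total_bits = sum(width for _, width in fields)
    buf = bytearray((total_bits + 7) // 8)
    pos = 0  # running bit position; the next field starts right here
    for value, width in fields:
        for i in range(width):
            if bits_big_endian:
                # most significant bit of the field goes first
                bit = (value >> (width - 1 - i)) & 1
                shift = 7 - (pos % 8)
            else:
                # least significant bit of the field goes first
                bit = (value >> i) & 1
                shift = pos % 8
            buf[pos // 8] |= bit << shift
            pos += 1
    return bytes(buf)


# A 2-bit field holding 0b11 followed by a 3-bit field holding 0b101.
fields = [(0b11, 2), (0b101, 3)]
print(pack_fields(fields, bits_big_endian=False).hex())
print(pack_fields(fields, bits_big_endian=True).hex())
```

Because `pos` is never rounded up between fields, a field that does not fit into the current byte continues in the next one, which mirrors the assumption that appending bits creates no byte alignment gaps.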
I don't exactly don't even know Okay, so now that we somewhat understand Where pieces come from and how we compose them into a composite thing How do we actually use that in in GDP for example? We use that by of course by reading from such The thing and writing into such a thing right and what we actually read and write may not be the whole value But just part of the value because Because of struck members right we want to write to a struck member read from a struck member and the structure is itself composed of multiple pieces and We apply this read or write operation To some some of the pieces or whatever right? The value also doesn't need to be the value that we read or write doesn't need to be by the line because it feels right It also can spread multiple pieces When we want to write a subtract or Or whatever larger thing from From a strapped and Also doesn't need to be piece aligned. That's interesting because if we have a union and one part of the union is The one that has been optimized by the compiler to Load stuff into registers and things like these but the other variant of the union looks completely different and is not aligned with The values that are allocated into registers and and things like these then but in GDP We want to look at this now or read this or write this then we would like to be able to Ignore the piece alignment in the and in the one variant and Just do the right thing right whatever would happen in the real system as well And so what this means is we need a general bitwise copying function Yeah, everybody understand what that implies a General bitwise copying function really copies a number of bits any number of bits an odd number can be from an Odd number an odd bit offset whatever to another odd bit offset, right? 
So, sounds pretty easy. But, well, no, it's not. And maybe, because of the problems that we have actually seen in GDB, that's one of the reasons the vagueness in the standard has persisted for so long: because none of this has ever worked correctly. It was mostly completely broken up to last summer or so.

So this list of fixes that have been applied to GDB is from 2016 and '17. Most of the fixes were done last summer, in one large patch set where the whole piece handling logic was basically rewritten.

Just as an example, the copy_bitwise function was almost correct. It just sometimes corrupted bits. This happened on little- and big-endian systems. So basically whenever we really needed to do odd copying, bitwise copying, it was more or less broken.

Also, when writing piece values, there's a size-capping logic, where we determine when the value that we talked about, the sub-value, stops. That logic was also broken; it just calculated the wrong size.

And then, on big-endian systems, the DWARF stack value pieces were taken from the wrong end. So as soon as you actually needed something that was smaller than a stack value from the stack, it didn't work. You usually got zeros. This problem caused many fails in the GCC test suite, many hundred fails, which were basically ignored for many years. In GDB we didn't have any tests for that. That's also one of the things that has been fixed: we now have test cases for all of these problems.

Then the value parent's offset. That basically means: if you have a substructure in a structure, then you have a parent, and there's some internal GDB logic that deals with the offset within the larger thing. This offset was just ignored.
So if you actually dealt with a substructure and a member of that, then the offset calculation was completely wrong. Then there was another copy-paste error in the size calculation, which I won't explain now.

The transfer size is interesting. If you write a piece value, there was an optimization that checked whether we can copy the stuff bytewise: if everything is byte-aligned, then we can do this. But this check did not check the size; it only checked whether we start at a byte offset. So if the size was not byte-aligned, this was also broken.

What else? Memory pieces had the wrong buffer offset. So as soon as you needed an offset into a memory piece, that was also broken.

Then, bit fields are an interesting thing. If you assigned a value to a bit field from an integer in GDB, if you said a.x = 5, where a is a structure, x is a bit-field member of the structure, and 5 is the value that you provided yourself, then GDB represented the value 5 as a full-size integer, not just as however much the bit field holds. On little-endian systems the least significant bits of that thing were then actually transferred, but on big-endian systems the wrong bits were transferred, and you usually got zeros there.

Then, register pieces on big-endian targets were completely broken. They only worked in very special cases where everything was byte-aligned and very simple. In all other cases they were just broken. I mean, there were so many bugs, I cannot even list them here. This logic was completely redone.

And another interesting thing is that the compiler has obviously not emitted offsets for bit pieces very often, because they were always ignored by GDB up to last summer. The offset was just silently replaced by zero, for whatever reason. I mean, yeah, it was just ignored.
It's now respected, but I'm not sure whether the placement, which I talked about before, is actually done correctly, because I really don't know what is to be expected there.

Then there was also a bit and byte offset mismatch in a parameter to read_value_memory. This is part of respecting the piece offset: as soon as we respect the offset, actually passing it to an internal GDB function correctly is also something that was done wrongly before.

Okay, so that's just an overview of the bugs that have been fixed. There are still some open bugs that I know of, at least, and there are probably many more.

First of all, pieces from vector registers, on IBM Z at least and maybe on other platforms as well, are taken from the wrong side. They are taken from whatever the least significant bits of a vector register would be. I mean, they're at least taken from the highest-address bits, and that's not compliant with the ABI, because the hardware doesn't work like that: the hardware allocates floating-point sub-objects on the left-hand side, so in the lower-address bits, in the most significant bits, basically.

Another problem that is related, not exactly part of the DWARF piece thing, is that we have no support for unwinding parts of a register, which is needed now that some registers can maybe not be unwound completely. A vector register on Z, for example, cannot be unwound completely, but a floating-point portion of it can maybe be unwound. So the other 64 bits are undefined, and that cannot be expressed right now. The only thing that can be expressed is that everything is undefined, and that's bad.

And another thing, something that may complicate our logic quite significantly in the future, is if we ever want to support the endianity attribute of DWARF. That is not supported by GDB at all right now.
It's also not specific to pieces; it's just a general thing. We do not support that at all.

Hmm, I don't know. I would expect that Ada could emit it, for example. Yeah. So I would expect it to be emitted; I haven't checked. Yeah, it's not respected right now.

Yeah, so that's all I have. Any questions?

Yeah. Yeah, they were rejected. I filed two of them; they were both rejected. Yeah, I can send you the reason.

I'm curious about other toolchains: the same problems, different problems? And you said GCC sometimes didn't agree with itself.

Yeah, in the case of vector registers on Z right now, that's where GCC doesn't agree with itself. It actually emits bit pieces for things that should be left-aligned with the same offset as for things that should be right-aligned. So GDB cannot distinguish what GCC meant by that.

Buggy. LLDB, as far as I have seen so far, the last time I looked at least: LLDB's logic was just inspired by GDB's logic, and GCC's. So I think we are ahead of them right now with GDB, I think. I'm not sure. Is anybody from LLDB here, or does anybody know more about this?

I don't know. I've tried to get an answer from them. They have not responded. Well, that's not completely true. I mean, I talked with Ulrich, and Ulrich has the same question marks in his head as I have, but that doesn't help us very much. I wanted to get an answer from Jakub, for example. He didn't respond. I don't know why; probably because he thought that it's something he needs to devote more time to, and he didn't have a chance to do so.
I don't know. This is something that... Yeah. That might be, yeah, it might be possible, apart from the fact that it's currently broken. But we have to fix it somehow. Right, yeah.

Yeah, unfortunately, I think at the moment I'm the only one working on this, and so I would have to make the patch for the compiler to fix it.

Yeah. Well, some things could probably still be expressed in an architecture-independent way. But I really think that at the time when the DWARF standard, or at least these parts of the DWARF standard, were created, not all of the variants were known or taken into account. Some things didn't even exist at that time; vector registers, for instance, had only just started to come into existence. So that's my feeling, that these variants were not really taken into account.

I did, yes. There was discussion; they were rejected. There was a lot of discussion. It didn't come through. I don't know why. I tried everything I could.

Issues are actually on the DWARF standard. They're mostly rejected as quality of implementation.

Yeah. Maybe. In the debugging subsystem I created some patches, and the situation was not clearly described in the DWARF standard. So I created a code review on Phabricator at LLVM and included some debugging subsystem maintainers in there, and all of them are actually members of the DWARF standard committee. After discussion on the code review, they asked me to file a DWARF issue, and after that it was accepted. So first show them the code, show them why it should be implemented and why the standard is not very clear about that, and then they will point you to where it should go. So just try to tackle it from the other end.

Yeah, I think first the tools have to somehow agree. When you have the tools agreeing, then you can come to the standard saying...

Right. Yeah, I tried it that way. I mean, the problem is, nobody talks to me. That's my problem, right?
Jakub is basically the GCC guy for DWARF. He doesn't talk to me. So who should I talk to? Jason? Yeah, yeah, maybe that's not important enough, I don't know. But...

Where it's not clear what the...

Yeah. Yeah, but no architecture, no ABI does that, not even the x86 one. I mean, why would it? Yeah, but even then it's not what the compiler does right now, and it would also be bad, so we don't want that. It's not efficient, right? I mean, then we would have a lot of binaries that are broken right now if we changed it like that. We could then not express left-aligned pieces in vector registers with byte pieces; we would have to use bit pieces. But we cannot use bit pieces, because we don't know the offset. Do you understand the problem? Binaries are already...

Yeah, but the problem is, if you have a vector register and you want to take a piece from the left side, right? This is not the least significant end; on big-endian systems the least significant is this. Right, okay. So right now the binaries just emit DWARF piece operations that are expected to align on the left, and that is also what is best for this architecture in this case. Because if we didn't have that option, then we would have to express it as a bit piece from the right, from the least significant bit, but then we would have to provide an offset. But which offset do we provide? If we provide 64 as the offset, then it works for the vector register, but it doesn't work on old platforms that don't have vector registers, because the same register number has been reused. It has just grown, right? It has grown on the right-hand side. Okay, but that's a special...

So I think what we want is to preserve the current behavior and make the DWARF standard adapt to it. I've made a proposal; we can talk about that. Or, fine, okay, I guess we need to... So, thank you.