So I'm going to be talking about, well, the talk title says 16-bit char support in Clang and LLVM, but it's actually slightly more generic: non-8-bit byte support in Clang and LLVM. First, a bit of background. In C we have this thing called CHAR_BIT, which tells you how many bits your char has; the standard says that a byte contains CHAR_BIT bits. Generally this is taken to mean the number of bits in your native machine type, and therefore the size of the minimal addressable unit. It doesn't have to be, though: your C implementation can target a slightly more abstract machine where it doesn't map directly down to machine sizes. According to the C99 standard, and I think the older standards as well, CHAR_BIT is at least eight. At the very outset of C you could possibly get away with seven, and I believe there were some really weird architectures with less, but for the most part it's at least eight. POSIX says it must be exactly eight. And the general assumption among lots of people who aren't compiler developers or working deep in systems is that bytes are eight bits; everyone assumes that, and therefore assumes CHAR_BIT equals eight, and lots of code is written on that basis. But there are architectures where the char and the byte aren't the normal lovely eight-bit byte, and that matters to us. You wouldn't think it would matter in general, because all the machines everyone uses day to day, your ARM chips, your x86 chips, all have eight-bit bytes, and that's all lovely and nice.
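To make the CHAR_BIT point concrete, here's a small illustration of mine (not the speaker's): sizeof counts in chars, so the bit width of any object always scales with whatever CHAR_BIT the target defines.

```c
#include <limits.h>  /* CHAR_BIT: number of bits in a char, i.e. in a byte */
#include <stddef.h>

/* sizeof counts in bytes (chars), so the bit width of any object is
   sizeof(object) * CHAR_BIT, whatever CHAR_BIT happens to be. */
size_t bits_in_int(void) {
    return sizeof(int) * CHAR_BIT;
}
```

On a POSIX system this always sees CHAR_BIT as exactly 8; on the DSPs discussed below it could legitimately be 16 or 24.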
However, there are lots of places where you don't have eight-bit bytes, where your minimal addressable unit isn't eight bits. The specific case we ran into is DSPs, but some other domain-specific processors might be in the same situation. It's quite common for DSPs to have 16-bit bytes, that's very common, and 24-bit bytes are quite common in DSPs as well. I've heard of architectures with 10-bit bytes which are in use now, and I know of weirder historical architectures, but odd numbers of bits per byte are, I think, less common now. Now obviously, in order to support the C standard, you have to do something with your chars. You can't just pretend they don't exist and tell everyone not to use them; you have to support them in some way. It's very useful if you can use a native machine type for chars, for efficiency reasons, and because it's a bit more obvious what's going on and how it maps down to assembly and machine code. It also matters if you want to do string manipulation, use any legacy code, or run any generic benchmarks and tests: it's quite useful to have reasonably well-performing native char support. Workarounds may be functional and may obey the standard, but they're not much use if you're benchmarking, or if your programmers are using chars and then finding that, because you've put a workaround in, chars are three times less performant than just using integers for those memory accesses. That's not very helpful. So LLVM and Clang, in theory, have support for this. LLVM IR is really, really nice. Everything is basically bit-based: all of your integer types are defined in terms of bits, and all of your memory accesses are through pointers to things whose sizes are in bits.
All of the generic code deals in bits and doesn't deal with bytes in any real capacity, and therefore it doesn't bake in many assumptions about the size of bytes. The data layout string, which is used to define the sizes of your natively supported types, specifies everything in bits, which is useful as well. And in theory we have this thing in Clang, CharWidth, where you can set the size of your chars; that affects what CHAR_BIT means and how Clang generates IR for your target. In theory you can set it to whatever size you want. So LLVM is generally very nice, it gives you quite a nice interface and the IR is very nice, but there are some places where this breaks down a bit. One is that LLVM IR doesn't have a void type, so in lots of places an i8* is used in place of a void pointer, and they get materialized all over the place. That in itself isn't a problem: if you're using an i8* as a void pointer, you won't ever dereference it without first casting it to some type you can dereference. But it does mean we end up with i8* everywhere. There are also a few cases, for example SROA, Scalar Replacement of Aggregates, where optimizations will introduce bitcasts to i8 pointers. The really annoying one is that there are a lot of places in LLVM's code with a hard-coded divide by 8 or multiply by 8 to convert a byte size to a bit size or back.
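As an illustration of the data layout point (mine, not the speaker's), here is a hypothetical data layout string for a little-endian target with 16-bit pointers; every field is specified in bits, which is why the format itself has no trouble describing unusual targets:

```llvm
; e        little endian
; p:16:16  pointers are 16 bits wide, with 16-bit alignment
; i16:16   i16 has 16-bit ABI alignment
; n16      16 bits is a native integer width
target datalayout = "e-p:16:16-i16:16-n16"
```

The catch, as just described, is that the code consuming these bit counts still hard-codes an 8-bit byte when converting between bit sizes and byte sizes.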
So if you break that assumption and set CHAR_BIT so that your bytes are not 8 bits, you tend to trigger assertions and crashes. That's more beneficial than generating miscompilations, but it does mean these divide-by-8 cases need to be fixed. All of the intrinsics also use fixed-width types, or at least the key ones you need to do anything useful, like memcpy, memset and memmove: they all take i8 pointers. This maps quite clearly onto what the libraries expect for char; it's just an assumption that your char is i8, so whenever you call memcpy, memset or memmove, the pointers are all assumed to be i8*. So it's not a horrendous problem. It's not like you need to rewrite the entire compiler, but there are bits and pieces in various places where it doesn't quite work, and you run into a few crashes and errors. There's been quite a lot of previous discussion about this. You can see threads going back to 2009, where I think I first saw it mentioned, and there's more active discussion this month. There are definitely people maintaining out-of-tree patches: the top two mailing list posts are from just this month, I think one from Ericsson and one from someone working on DCPU-16 (I'm not quite sure what that architecture is), and they have sets of patches which probably do basically what my patches do at the moment. But when I started making my changes, I based them on some old patches from back in 2009 or 2014 or something. Across all of these mailing list posts, various solutions have been suggested. We tried a few of them; I'll go through a few now, and then what we settled on in the end. So the first, naive solution, which I mentioned earlier, is that you just set CharWidth.
I've just used 16 as the value here. So one option is to just set your CharWidth to 16, and in this case you would kind of hope that everything just works: Clang will give you the correct CHAR_BIT and generate IR as appropriate, LLVM won't care because LLVM just works in bits, so LLVM will handle it perfectly fine, it will get all the way through the backend, and you'll end up with the right values. Well, that doesn't work. It gets you some of the way there, in that you find all the assertions which break when you try to make this change, which is quite useful because it points out what you need to fix. Clang, even with this, still materializes i8 pointers in various places. I did look at the code more recently and there have been some improvements on this front, so Clang doesn't make quite as many of these assumptions as it used to, but some remain. And it doesn't help with the LLVM side of things, where LLVM has made assumptions of its own, like SROA being able to just materialize i8 pointers. So this doesn't work out of the box, and it definitely needs some changes in LLVM. I'll talk about those changes a bit later, because they become the correct solution, or at least the better one. The alternative solution, which is basically a workaround, is that even if your target has 16-bit bytes, you lie and say it has 8-bit bytes. However, you then set the char alignment so that all of your char-valued things, all of your memory accesses to chars, are aligned on 16-bit boundaries, or whatever word boundary you use. This minimizes any changes to LLVM: you don't run into the issues where LLVM has assumed the wrong size, has assumed 8-bit bytes. And in theory it should just work: you just align your bytes to 16 bits.
That means you end up with dead space: you have a byte, and then padding to bring it up to the 16-bit size, or whatever the native size is. Then, in your backend, because all of your addresses would still be assuming 8-bit bytes, if you want to get down to a word size of 16 bits you have to halve all of your addresses. That's pretty easy for global addresses: you just halve the constant value. It's slightly harder for a general pointer, because you might actually have to generate code to do a right shift by one, or whatever, to convert from a byte address to a word address. And then when you do a load or a store with your byte pointer, you actually load a word and then mask off the top byte, because you don't need it, and that gives you your 8-bit value from memory. There are some problems with this. One is that, while converting from byte addresses to word addresses is free in many cases, occasionally you do need to generate some code to do it, a shift, and that hits your performance. You also have to do this change quite late in your backend, because the DAG combiner and a few other optimizations make assumptions about the sizes of your loads and will optimize based on your pointers and what they're loading, and that can cause miscompilations. You also have to generate some code to mask off the unwanted parts: you do a word load, a 16-bit load, but you only want the 8-bit value and you're only operating on the 8-bit value, so you have to mask off the unwanted part or sign-extend to the full 16 bits and then operate on that. And there's a minor issue with padding, which is solvable.
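Here's a sketch (my illustration, with made-up names, not the talk's actual backend code) of the arithmetic this workaround implies on a machine with 16-bit words: byte addresses get halved into word addresses, and char loads become word loads followed by a mask.

```c
#include <stdint.h>

/* Memory is modelled as an array of 16-bit words. Under the workaround,
   every char occupies a full word, so the 8-bit byte addresses coming
   out of the middle end are always even. */

/* Convert a byte address to a word address with a right shift by one. */
uint16_t byte_to_word_addr(uint16_t byte_addr) {
    return byte_addr >> 1;
}

/* A "char load" is really a 16-bit word load followed by masking off
   the dead upper byte (or a sign extension, for signed char). */
uint8_t load_char(const uint16_t *mem, uint16_t byte_addr) {
    uint16_t word = mem[byte_to_word_addr(byte_addr)];
    return (uint8_t)(word & 0xFFu);
}
```

The shift in `byte_to_word_addr` is exactly the code that sometimes has to be generated at runtime for non-constant pointers, which is where the performance cost comes from.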
But it caused me quite a lot of confusion when I was trying to implement this, which is that Clang and LLVM insert i8 values when they want to pad something. However, those i8s have the same alignment restriction, so if you insert an i8 you end up with a 16-bit value being inserted, which doesn't actually help: it doesn't pad anything, it just leaves everything still misaligned. That caused me a few issues; it basically meant that whenever padding was inserted, the sizes would be wrong, and some special cases needed to be added to the compiler to handle this more elegantly and make the padding work. And once you've made these changes, if you do somehow end up with an unaligned address, you'll get crashes if you're lucky. If you're not lucky, and there's some case where an i8 pointer is materialized but you fail to actually halve the address to get a word pointer, you'll just miscompile and you won't notice. Another solution, one that's been suggested on the mailing list but which we didn't actually try, is to use a fat pointer. What this means, and I'm not well versed in how fat pointers work, but the impression I get, is that you make your pointer completely opaque: you can't look into the pointer and use its underlying representation, you have to treat it as an opaque box which you can only do loads and stores through. If you make your i8 pointer a fat pointer, you keep it so that you do actually use 8-bit bytes for loads. However, for every 8-bit byte you're loading, the pointer stores the word address, and then also an extra bit which tells you, when you load through that address, which bits you need to mask off. So essentially, you end up doing a 16-bit load, assuming 16-bit bytes.
You do a 16-bit load, and then you use an extra bit, which your pointer holds, to tell you which of those 16 bits you actually wanted, and you add some extra code to mask the rest off to get back to the original byte you wanted. David Chisnall has been working on this for the CHERI backend, for another purpose: they're actually using fat pointers there, and this is just one potential use of them. I believe that code is pretty stable; he's maintaining a patch set to do it out of tree. There is a performance problem, though. Your word address has to be 16 bits so you can address your full 64 kilowords, your full address space. However, within those 16 bits you also need an extra bit to tell you, once you've loaded a word, which byte of the word you wanted. So you essentially need 17 bits of pointer. The only ways to do that are either to add a second word, so a pointer becomes a word address plus an extra word telling you which byte, or to cut down your address space and say, actually, I've only got 32 kilowords, and use 16 bits: the top 15 bits are the word address and the bottom bit is the byte select. And then it looks suspiciously like a normal pointer with some extra arithmetic added to it. So the actual solution we ended up with, and which basically everyone on those mailing list posts has done or alluded to, is that you extend the data layout: in LLVM, you add an interface which tells you how wide your bytes are. So, going through this: in Clang you set the width of your chars to the actual natively supported byte size, you add this byte bit width to LLVM, and you allow your target to specify how big its bytes are.
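The "steal a bit" encoding just described can be sketched as follows (my illustration; all names are made up, and I've arbitrarily assumed byte 0 is the low half of the word): the top 15 bits of a 16-bit pointer hold the word address and the bottom bit selects a byte, at the cost of halving the address space to 32 kilowords.

```c
#include <stdint.h>

typedef uint16_t fatptr;  /* [15-bit word address | 1-bit byte select] */

fatptr make_fatptr(uint16_t word_addr, unsigned byte_sel) {
    return (fatptr)((uint16_t)(word_addr << 1) | (byte_sel & 1u));
}

/* A byte load is a 16-bit word load; the stolen bit then picks out
   which half of the word was actually wanted. */
uint8_t load_byte(const uint16_t *mem, fatptr p) {
    uint16_t word = mem[p >> 1];          /* word address */
    unsigned shift = (p & 1u) ? 8u : 0u;  /* byte select, low byte first */
    return (uint8_t)(word >> shift);
}
```

Written this way, the "suspiciously like a normal pointer" observation is visible: the encoding is just a byte address, and the extra arithmetic is the shift and mask after every load.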
Then, whenever you see one of these cases in the code where you've got a divide by 8 or a multiply by 8, or anywhere else bytes are assumed to be 8 bits, you find the data layout, make a call to work out what the byte bit width is, and make the appropriate changes to take it into account. Otherwise, a more generic change which helps is this: in many places in LLVM you get sizes in terms of bytes rather than bits, but you don't actually need to count sizes in bytes all the time, it's just a convenience. So the alternative is to change "get the size of this in bytes" to "get the size of this in bits" and alter the surrounding code to work entirely in bits instead of bytes. That's generally more generic and more generically useful anyway. Where you can't do that, you have to make an extra call to get the byte bit width. This is pretty tedious. It requires lots of small changes all over the place: everywhere the assumption is made, you have to break it, feed in the data layout, and get a call to this function in there. So, a few stumbling points I'll quickly go through, which I found when trying to implement this. First of all, whenever i8 pointers are generated: there's this call getInt8PtrTy, which is used a lot in LLVM and Clang. You need to replace it with getIntNPtrTy, make a call to the data layout to get the byte size, use that byte size in bits to get the correct pointer type, and basically replace all instances of getInt8PtrTy with getIntNPtrTy. This sometimes requires you to find the data layout from wherever it happens to be. The alternative would be to keep allowing getInt8PtrTy.
You could keep allowing it and then, somewhere in LLVM itself, expand those accesses into a wider load plus the appropriate maths to extract the original i8. However, just replacing all the calls seemed like the more elegant solution, so that's what we settled on; we didn't really investigate the alternative. Then there's sorting out the intrinsics: they all have i8* hard-coded as their pointer argument types. We've seen a couple of different solutions here. The intrinsics could just be made to use the "any pointer" types which LLVM has for its intrinsics, where the type is determined when you create the intrinsic instead of being hard-coded when the table of intrinsics is generated. What we did instead was add a new type, a byte type, which is really a pseudo-type that gets converted to whatever the correct type is when the intrinsic is created. Either way, at all the sites where you can create a memcpy or memset intrinsic, you have to make another call to get the byte size in bits and feed that in, so the intrinsic is built with the correct argument types. The other thing is all the hard-coded multiply-by-8s and divide-by-8s. In theory this is pretty simple: you just make calls to your data layout to get the byte size in every such place. The trouble is there are lots of headers, for example the ones containing these functions, like DataLayout.h, and the MVTs and EVTs, the value types used by the machine layers, all of the things which tell you how big stores are: they all hard-code a divide by 8 to get the size in bytes of things. So all of those need to be updated.
And all of the call sites need to be updated to either query the data layout, passing the data layout in so the correct sizes are used for the divides and multiplies, or the calls need to be removed and replaced with something more generic. This was the majority of the changes: changing all of these cases, feeding the correct byte sizes into all of these functions, and updating all of the call sites and all of the interfaces. It's pretty much a mechanical change, but one where you have no option but to manually go through each of the files, look at all the call sites, feed in the correct information, and account for the correct byte size. And there are lots like this, very annoyingly. I can't remember where this one lives; it's probably DataLayout... no, it's Type.h, actually. If you get the store size of your type, there's this very useful "get the size in bits", which is lovely: nice and generic, purely in bits. But then the code just adds 7 and divides by 8 to get it down to a byte size, which is very annoying. And this is in a header with very few outside dependencies, the generic Type header, which is very independent, so you need to feed something in to make that 8 and that 7 not hard-coded. A small added complexity: even though your architecture may use non-8-bit bytes, 16-bit bytes say, it's sometimes quite useful to keep your ELF and your DWARF representations in the 8-bit-byte world, so you have to handle that, and at some point convert from whatever your machine byte is down to 8-bit bytes. That's partly because the on-disk representation, certainly for DWARF, kind of assumes you have 8-bit bytes. So there are advantages to doing it this way.
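The generalization being described amounts to this (a sketch with illustrative names, not LLVM's actual interfaces): the hard-coded round-up `(Bits + 7) / 8` becomes a round-up division by a byte width queried from the data layout.

```c
/* Round a size in bits up to a whole number of bytes, for an arbitrary
   byte width; with byte_width == 8 this is exactly (bits + 7) / 8. */
unsigned store_size_in_bytes(unsigned size_in_bits, unsigned byte_width) {
    return (size_in_bits + byte_width - 1) / byte_width;
}
```

The mechanical part of the work is threading the `byte_width` argument (or the data layout it comes from) into every place that currently hard-codes the 8 and the 7.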
The main one is CHAR_BIT: in your C code you're not giving an arbitrary value of CHAR_BIT which doesn't actually match your machine, so it's a bit more predictable. And there's minimal performance penalty, whereas the other methods, like masking off parts of words and doubling the size of your pointers just so that you can address everything, are quite costly in performance; this doesn't really cost you anything. The disadvantage is mainly that it requires changes all over LLVM, and you have to maintain a lot of changes and keep hold of them. It's made rolling forward to the top of tree a humongous pain, because I have to maintain this and there have been lots of changes to the surrounding code in the meantime. And there's the other minor disadvantage that it breaks the assumption everyone holds that bytes are always 8 bits. But that's not as troublesome: for a lot of the systems you'd be programming here, for example a DSP, you're well aware that bytes are not 8 bits, so it's not such a bad thing for those. So, the status of this: we have implemented it in a production compiler for a customer of ours who has a DSP. There are no patches yet to make these changes generically. I know from the more recent mailing list talk that other people have similar patches and are thinking about making them generic and submitting them; I should probably post to the mailing list and see what status that's in. This production compiler is, I think, about six or nine months behind top of tree, so first of all I need to rebase against top of tree, which is going to be a pain, and then I need to tidy it all up and build it into a patch set. That's actually quite a large amount of work, I think.
The other thing is that I've only looked at the case we cared about, which is where CHAR_BIT is 16, where our bytes are 16 bits. We made no attempt to handle the other cases. So it probably won't work where... well, at a minimum, it won't work where your CHAR_BIT is not a multiple of eight, because of assumptions in the compiler about alignment and the like. The other really key thing, before this is ever going to be ready for prime time and getting in tree, is to write some targeted tests. We don't really have a way of testing this easily, because all of the targets in tree are sane and have sensibly sized bytes. I'm also not quite sure how to write targeted tests for this, but that's an exercise I'll go through when I've got this ready to go in. So the future work is to fix this all up and submit the patches for scrutiny, for people to pick them apart and tell me I've done everything wrong. Our plan is this: we have an experimental backend that we're trying to get upstream at the moment, and one of the changes I want to make is to give us a target with this kind of feature in tree, so I'm going to change our architecture to have 16-bit bytes. That gives us something to test against. One of the things I wasn't sure about is how we test this once it's in tree, and I'm hoping our backend might be the in-tree target for this feature if someone doesn't have a better alternative; I've not seen any suggestions for one, though maybe DCPU-16 could be. And then the further future work is to handle CHAR_BIT when it's not a multiple of eight, when it's 12 or 10 or whatever weirdness you have. So that's everything from me. Are there any questions?

The question is about the memset and memcpy intrinsics. You can't just use "any pointer", because that means you could, in theory, call memcpy from a char pointer to an int pointer. If you use the byte pointer, the generic byte type, then it has to be the minimum addressable unit, which is also what's set in the standard.
I think the standard says it has to be a char pointer. Yeah, I'm fairly sure it has to be a char pointer, which translates to the minimum addressable unit. Otherwise you may end up generating code that's not standard-compliant, or that doesn't even work. So yes, my assumption would be that the byte type is, I think, the right way to do it. I felt a bit bad about introducing it, because LLVM otherwise doesn't care at all about chars, and suddenly I've added something which is very dependent on C. But then again, no, it's not: it's the minimum addressable size, which is a computing concept, not a language concept.

It was pointed out that LLVM pointers are not really typed, so it might not make any difference, but that a byte type has little chance of being accepted upstream, so it's probably a good idea to wait for the opaque pointer type. Yes; I've not really looked at the opaque pointer work to see how it would change all of this, and I have a feeling it will have knock-on effects that mean I have to rethink things, so that's an exercise for when I've had time to look at opaque pointers and what they're going to change. And you're right, I already have one complex patch set; I don't want to have two complex patch sets. Yes, I did a quick pass, because I've updated most of the key interfaces and got this working, and it definitely works, but I've still got a few cases where I know there are hard-coded divide-by-eights which I haven't yet found and fixed, so I need to go back and review all the changes I've made.

Okay, so, yes: the question is basically what happens if you have pointers to i8, or rather pointers to char, in different address spaces which use different byte sizes. I don't think I can handle that. You'd have to tell whoever is designing the architecture to do something simpler.
Yeah, I did see that get mentioned in one of the mailing list posts: someone pointed out an architecture with two address spaces, one of which is word-addressed and one of which is byte-addressed, and they wanted that to somehow magically be handled elegantly. That seems so unbelievably painful that it can be an exercise for when it becomes a problem. Okay, yep. [Inaudible question.] Yes.

The questioner desperately doesn't want to repeat all this work, so do I have any idea when or if it will land upstream? No, I don't really know. Maybe 2019? It depends. It requires stepping on lots of bits of the compiler to make the changes, and I have a feeling people won't necessarily be happy with the specific changes I've made, so it's probably going to be a bit of a slog. Does it make the general case harder to handle? Yes, and I'm not quite sure, but I assume it would also slow things down a bit, since we have to pull in the data layout everywhere and query it to get this information. It's probably negligible, but... yeah. I'll try to land this as incremental changes: probably the first one is to add the interface to the data layout, and then gradually make the changes to the other bits of the compiler, on a step-by-step basis. One thing I haven't considered is what happens if you've got i32 as your byte size. I'm probably not able to handle things up to that size, because you can still create i16 pointers, and I assume I would still end up creating i16 pointers in some cases. I haven't really thought about it, but that might still be a problem: you end up with an i16 pointer but your char size is wider. Any more questions? Jamie? Yes, I will, once I've written this. So I'm going to put this in AAP and we're going to...
I don't know; I assume we might change AAP to suit this, but otherwise it will be a branch of AAP, which is our architecture, and my plan is to put the patches in there so that everyone can look at them and see them in a working architecture. Hopefully that will be a good example that it works, and that it doesn't affect everyone else too much, because obviously I don't want to make these changes, put them in the compiler, and break everyone's code or slow everyone down endlessly. Yeah. Actually, in the code I've written, rather than using "byte" I've used "char" throughout, and that adds its own confusion, so probably something like "minimal addressable unit" would be better. I used char, but then I realized that char in the C standard means byte, and then I've got this byte as well, and it doesn't really work. Honestly, it was partly about minimizing typing: I didn't want to write "get minimal addressable unit size" every time, so I took the simple route and just used char. Yeah, that's probably a good idea as well, because everyone does still assume that byte means 8 bits, so I'd prefer not to break that underlying assumption now. Okay. Thank you very much.