Shall we start? Yeah? Okay. Hi, all. I'm Mark Wielaard. I work for Red Hat. Among other things I'm responsible for Valgrind in Fedora, RHEL, and the Developer Toolset. The Developer Toolset provides newer versions of developer tools for RHEL, because RHEL is stable, so sometimes you want newer versions of tools on it. If anything is wrong with that, just bug me, or actually take a subscription to RHEL, or use Fedora for free and bug me.

This talk came from a discussion we had on the glibc list. It started with the sanitizer developers, who had some problems because GCC and glibc also do memory protection, by fortification of C library functions. The sanitizers would then either not see the call or have to intercept different functions. It is interesting that you now have different memory protectors that try to do somewhat the same thing on different levels. GCC and glibc do fortification of C library functions on a very low level; that's mainly what this talk will be about. Then there are the sanitizers: they compile in checks and they are more dynamic, whereas fortification is mostly static checking. And then of course you have Valgrind Memcheck, which does whole-program address space tracking.

Fortification. Most tools are now built with this enabled. You just define _FORTIFY_SOURCE to 2. You can also define it to 1, but that's not that interesting. When you define that macro it causes some lightweight checks. Interestingly enough, there isn't that much documentation about it. This is kind of it. Which was surprising, so I hope that with this talk I can explain a bit more what it really does and why you actually should use it. Partly why it is somewhat vague is that it really is just defining the _FORTIFY_SOURCE macro, and basically anything you compile can do something with it. So what we see isn't actually just these memory operations; glibc does a couple of other checks as well. So fortification is basically one somewhat complicated but very cool compiler trick.
There is a function, __builtin_object_size, that gives you the size of an object given a pointer. And then you have some C preprocessor tricks which take the output of __builtin_object_size and do the checks. So that's a really nice, magic function. Also, __builtin_object_size actually has documentation. Yes, here it is. But I found it hard to understand what it really does. It returns a constant number of bytes from the pointer to the end of the object the pointer points to, if known at compile time. And then you can give it a type, which is defined as 0, 1, 2 or 3, and that changes the behavior a bit.

The thing is that the size has to be known at compile time. If it isn't known at compile time then the function will return either 0 or minus 1. In the documentation it's always called minus 1, but it is SIZE_MAX of course, because size_t is unsigned. So depending on the type you can say: I don't know the number, or, well, pretend it's as big as can be represented. And it basically only works if you optimize, because then the compiler really can see the object behind the pointer. The glibc headers actually check this: if _FORTIFY_SOURCE is defined they complain if you don't use optimization. But in fact you don't have to; it will just return 0 or minus 1.

So, to simplify this a bit, if you are using this yourself, and I think you should, because if you provide a C library then I think it's really nice to provide fortified functions. There are so many things you can do wrong. Let's just try this out first. Just set the type to 1. Then it will return SIZE_MAX if it doesn't know how big the object being pointed to is, and otherwise it will return the largest remaining size of the sub-object the pointer points to. So in this example you have p, which just points there. An int is four bytes long, and p points at the start of b, so __builtin_object_size of p is four.
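The behavior described here can be sketched in a few lines of C. This is only an illustration under stated assumptions: the struct V, the variable var and all function names are made up and are not the layout shown on the slide. The address expressions are passed to the builtin directly so that GCC and Clang can fold them even without optimization; a pointer that arrives through a function parameter is normally reported as unknown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout, chosen only for illustration. */
struct V {
    char buf1[10];
    int  b;
    char buf2[10];
};

static struct V var;

/* Type 1: bytes left in the closest enclosing sub-object (buf1). */
size_t left_in_subobject(void)
{
    return __builtin_object_size(&var.buf1[1], 1);
}

/* Type 0: bytes left in the whole outer object (var), padding included. */
size_t left_in_whole_object(void)
{
    return __builtin_object_size(&var.buf1[1], 0);
}

/* Type 1 on a scalar member: just the size of that member. */
size_t size_of_member_b(void)
{
    return __builtin_object_size(&var.b, 1);
}

/* The object behind p is not visible here, so with type 1 the
   builtin normally yields (size_t)-1, i.e. SIZE_MAX ("unknown"). */
size_t size_behind_parameter(const char *p)
{
    return __builtin_object_size(p, 1);
}

/* Checks the relations against sizeof instead of hard-coded numbers. */
void object_size_demo(void)
{
    assert(left_in_subobject() == sizeof(var.buf1) - 1);
    assert(left_in_whole_object() == sizeof(var) - 1);
    assert(size_of_member_b() == sizeof(var.b));
}
```

Calling object_size_demo() from a main function exercises all three known-size cases. What size_behind_parameter returns depends on what the optimizer can see at the call site, so it is deliberately not asserted.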
The same for q, which points just inside this array: the remaining size of the array is nine. And this is nice. It doesn't really know the whole object size, but if you give it type 1 it can at least give you the largest sub-object size that is still available, so that's nine in this case. With the type you can do other tricks. Some of them I don't actually know when they are useful, so as a kind of homework exercise you can try types 0, 2 and 3. Type 0 is kind of interesting: it gives the remaining size of the outer object that's in there. That's sometimes useful. So r points to p or q, and q points one after the first, and the remaining structure is 24. "Is there padding involved?" Yeah, probably there's padding in between here. So now it doesn't give the minimum but the maximum. I've seen this used in glibc, for example if you have a read function that you give a buffer, and they think they probably know the whole structure layout and they want to fill it completely. The others give the minimum of the sub-object. What we will always use here is the one where you get the maximum of the sub-objects being pointed to.

Here is one example that's really nice and small and immediately shows why this is so useful and why I think everybody should use it. You have a checked version of get current working directory, getcwd. You give it a buffer and you claim the size of the buffer. So what glibc will do is, it has this function and says: I want to know what the buffer length really is, and if the size given by the user is bigger than what the buffer really points to, then I fail; otherwise I give you the current working directory.

How it works: I actually cleaned this up a bit, because the glibc header files are interesting to read. But basically it defines this new function. It redirects the original getcwd to an alias. It redirects the checked version to another alias called warn, which has an extra attribute, a warning attribute, that the compiler can see. And then the magic: your get current working
directory asks for the size of the buffer. If it doesn't know the buffer size then, okay, we just call the normal get current working directory. If we do know the buffer size, and we know the size at compile time, so it's actually a constant... let's see, why does this... oh yeah, that is the case where it is not. Okay. So if it is a constant, we are here: then we check the constant against the compiler's constant and we call the function with the warning. If everything is a constant, then, because this is defined as an inline function in the header file, just the warning function gets called and you get the warning at compile time, which is really nice. Otherwise, if we do know the buffer size but the size given by the user isn't a constant, then we call the checking function.

Nobody would write this code, but I have actually made this mistake myself, and this is why I think fortified functions are a really good idea, because, well, a compiler should work with stupid people like me. Here I thought I should give PATH_MAX to getcwd, and of course that should have been the buffer size, which is 16. The funny thing is that it just works, of course, because my home directory name is short, so it's perfect. If you run it under Valgrind, Valgrind will see: wait a minute, if you ran this somewhere else then the system call would overwrite some unaddressable bytes. And it even tells you that this is right after what you malloced, which is at line 9. Probably there was something before that. So this is really great.

But it could actually be better. If we compile this with _FORTIFY_SOURCE then the compiler already gives a warning. The warning message is somewhat confusing if you don't know which tricks are being used, which is kind of a bummer. I don't know if we could make the compiler smarter. The nice thing about this is that the compiler doesn't need any smarts; it just needs a header file which is smart. But this is actually better than having to run it under Valgrind. If you ignore
this warning (it's just a warning, and who compiles with -Werror? You should.) you can actually run it, and the buffer overflow is detected. The program is immediately terminated, because here glibc has deduced that something bad will happen: you will overwrite some memory that you're not supposed to. The only bad thing about it is that you don't see where you called what. So you would then like to, at least that is what I always do, run it under Valgrind. Hmm, we get the same thing, and that is that backtrace that aborts, which isn't so nice.

Why do we have this problem? The problems we see with fortification are that the warnings are not as good and expressive, but on the other hand it is much earlier. It only works when the compiler can statically deduce the buffer bounds, which actually is surprisingly often. But it blinds the other memory protectors, because the actual bad usage isn't done anymore. So the other memory protectors don't see it, even if they have much more information that they could give you about the address that is being misused. On the other hand, it might actually flag something that Valgrind Memcheck would miss, because it knows the object size. As we saw in the first example, if you have a large structure, Memcheck might think: yeah, well, there's addressable memory, I don't mind if you write over that. So that's nice.

But on the other hand, and that was the complaint from the sanitizer people, it might obscure tracking of memory usage, because the standard functions don't get called anymore. To be honest, I first thought, oh, that's a big problem, but Valgrind is mostly immune to that because it has this whole-program view of the world. Other sanitizers don't, or you have to rebuild your whole world with the sanitizers, including glibc. First, I'm not sure glibc can be compiled that way, and then you wouldn't want to do that in production, because it really does slow down your program. And worse, people have been
doing it in production. But the sanitizers come with their own libraries, which do various things during error reporting, and people quickly found out: hey, I can misuse those to overwrite files. Then you have your sanitizers working against you and introducing new security issues.

But it would be really nice to have the good without the bad, and at least for Valgrind we can, happily. So what should we do? First we should make Valgrind aware of what the address was. So instead of just aborting, which is what __chk_fail does, we provide the address that caused the problem. That's basically a simple change in __chk_fail: we say, okay, something bad happened, and it was there. And then we either override __chk_fail or we annotate __chk_fail to make Valgrind aware of the issue. First we make the memory non-accessible, or rather tell Valgrind that we don't think that memory should be accessible, even if Memcheck thinks it is. Then we ask: is that memory accessible? Well, no. And then we fail like normal. Amazingly, that just works, and we even get the original "that address was allocated at that line". So this is really, really nice.

Except, as soon as I suggested this... the glibc and GCC developers are conservative, let's call it that way. So: that bloats the code, you have to pass on an argument, you have to calculate the argument, every instruction is one too many. And yeah, they kind of have a point. They really want this to be always on, so anything that might get people to turn it off, they don't want.

And then they said: well, you didn't really do your homework, because it's not always an exact address; we use it for everything. Oh, right. So for example select takes file descriptor sets, which are really a bit set. If you try to put a file descriptor bigger than the file descriptor maximum in there, they generate a __chk_fail, which is correct
because they would calculate a bit outside the file descriptor set. But right, they are kind of right that you don't always know the exact address, or you have to calculate it, because there could be multiple buffers.

And this is actually a good point: Valgrind could give even better messages and hints. That is true, because here you see that, well, it's "unaddressable bytes found during client check request". Yeah, we could do better, and sometimes we do. We already have some overrides, for __memmove_chk and, actually, I believe it's only those four that we override. There we do, for example, overlap checking on the buffers. So yes, we could. But to be honest, most of them are really as simple as get current working directory, so there is not that much more we can do.

Wow, I'm going really fast. Yeah, okay. I had one other slide that's not here, so I'm slowing down and I'll try to do this one slide in 20 minutes. Sorry.

So the issue is that I haven't done most of the work. There are 75 check functions in glibc, and about 50 of them are like get current working directory. So instead of doing this presentation I could actually have done that. The other 25 are somewhat more tricky. If you have the snprintf family of checks, that actually sets a flag that gets checked by the normal printf code, so you can't really rewrite them completely as an override.

And there is a problem that GCC gets smarter, down in the compiler. GCC now has built-ins for this set of functions, and what glibc does is just: if _FORTIFY_SOURCE is on, then we alias strncpy to the built-in strncpy check, and GCC inlines it all. There isn't a function left anymore; only __chk_fail still gets called. So in that case we can override __chk_fail and produce at least a better backtrace, but without an address, and
that's interesting, because all these built-ins really work on a specific address that you would overwrite. That is bad. I really want to ask them to please make an exception and call something like a __chk_fail with an address, to make us happier.

I actually had one slide with conclusions. The conclusions were: if you maintain a C library, and it takes buffers and sizes separately, please think about adding fortification. It isn't that nice, but it isn't that much work, and it really saves lives. Well, lots of programmer frustration. And if you do, please add something like hints, client requests, for Valgrind. The glibc people obviously don't like that this generates code. It does generate code, but it generates a slightly complicated no-op, so it shouldn't really be that bad, and this really is on your slow path, when something goes bad. So do it. If you don't do that, then write a failure function that is specific about what the failure was. And finally, yeah, maybe I should work out these functions, maybe in the hackathon at the end, and then go back to the glibc and GCC people and say: oh, I have to write so much code, can you just have different failure functions to call, so that we can override them more nicely?

Okay, you haven't put up any signs, so I guess I still have 50 minutes? Yes? Okay, well, let's see if people have any questions. Two minutes.

"Just a remark before questions: this technology is quite old, about 12 years old in glibc, so it's mature and you can safely use it."

12 years.

"And I think that, well, __chk_fail is on the error path, so it's not a problem to put some code there."

Yes, I have no problem with, I mean, taking my... oh, okay, good. Well, this talk was a success, for me at least.

Okay. No, we probably have to introduce a different check function, because currently there are only two, I believe, __chk_fail and __stack_chk_fail, and there are some places where __chk_fail is called where it doesn't involve
an address, so probably we should have two. But maybe we can discuss that. Okay, good.

What was your question? Yeah. No, no, there are 75, depending on precisely how you count, functions that have been fortified in glibc. And my estimate is that 50 of those are like get current working directory, where it just has a buffer and a size and you do some checks on the buffer length and the size.

"And so the rest of the functions in glibc can't be fortified, or...?"

I don't know if they went to the max or far enough. I believe these are the most... I don't know.

Yes? So, to repeat the question: yes, Clang provides some of the same features. I think you're talking about the sanitizers, which are shared between GCC and LLVM. Those work differently. Here most of the checks are done in a way that can be checked at compile time, and the sanitizers add runtime checks. So yes, you could, but the sanitizers come with helper libraries that you link against, and what we could maybe do with Valgrind is have our own implementation of those helper libraries. I know Julian actually implemented UBSan support, to have Valgrind itself be UBSan checked. It was so easy that Julian doesn't even remember. The memory checkers are not as simple as the undefined behavior sanitizer, but I think it could be done. We have to see. I think it wouldn't be that hard, because LLVM and GCC share them, so there must at least be some documentation on them. So maybe it can be done.

Yes?

"Valgrind observes the malloc calls, but in the end Valgrind only knows that memory has been allocated; it doesn't know the type that is behind this malloc. Now, Valgrind can also read the debug info, which describes types. So what do you think about the idea of modifying the compiler to tell tools like Valgrind that this malloc call is allocating this type, so that the type would be
remembered by tools like Memcheck and could then be used to give better diagnostics?"

Yeah, so I did think about it. The question is: can we annotate malloc or memory allocation functions with the type, and maybe use the debug info too, which describes which pointers point to which variables, and variables have types. I couldn't make that work, because the IR works at too low a level. You don't know why you have a pointer, or why you do a...

"At the malloc call we will do, or have just done, you say: by the way, it is this kind of type."

No, no, in Valgrind... no, but then it gets very hard to keep associating a type. What you could maybe do: GCC now has support for MPX, the memory protection extensions from Intel processors, and what those do is they have an extra set of registers that describe the bounds of pointers. So maybe we could reuse something like that.

"My proposal is changing the compiler so that it says: here I see a call to malloc, and I know that I will assign the result to a pointer of that type, so at this place I inform tools like Valgrind or other tools that for this pointer, here is the type."

So you're proposing that the compiler generates something that tells tools like Valgrind or other tools what the type is, so then you track the type, and then when you report something you know what is being pointed to?

"Yes."

So my conceptual problem is, I can't really think about it, because types are not what we work on. We work on instruction streams, and I don't know...

"I'm confused why you're confused."

Okay. Ah, so in GCC you're compiling something, and there is a call to malloc, and you maybe know the type of the object you're allocating there. So GCC generates a piece of code alongside the malloc call to tell us what the type is. Yeah, okay. So yes, you are right. So why am I confused on the Valgrind side? Because I don't understand
on the GCC side how you would then pass on the type.

"In a way, but I don't think we can always get it right, of course, because depending on your language the type is just a point of view from your language. But what you can do is some static analysis to detect memory access patterns, and then you can detect whether the consistency is there. It can never work all the time, but I would imagine that, as GCC is compiling, there is the type support that it's looking at."

Yeah, so maybe it is more like what I said about the MPX extensions, because what they did there was output instructions that set up bounds for every pointer, based on the object sizes. Then you have it in the instruction stream, and that would be easier for Valgrind to see. And I also think abusing, or using, the sanitizer libraries might be an easier way. Okay, yeah.