Hello, test, test. All right. How's the sound out there? I can hear it echo, so that's probably more than enough. Hello. It looks like we are working. Amazing. The screen I can use the pointer on: is this pointer visible at the back? All right, then I may use it.

So, for those of you that just got here, I want to emphasize that there is a home game you can play, but it's entirely optional. Everything I'm going to reference when I speak will be on the screen, but it may be compressed a little bit, and you might like to play with it. So that's your choice. Okay, so we're at the one-minute mark. For those of you just coming in, I'll emphasize this is optional. Someone said "oh no", so I wanted to point that out. You can play the home game. You can not play the home game. You won't miss anything, but for some people it's fun.

Okay, and I just realized I said Valgrind. I don't use it in this talk. This repo I've used for some other talks on related subjects, but they're not exactly the same, so I don't actually use Valgrind. And someone commented on the title. SCaLE accepted the second of two linked talks, and so the title is slightly less salient, but I like to think it just makes me inscrutable rather than confusing. And I guess that's time.

So the title, as you see, is The Young Man and the C, Reloaded. What we're going to do is dive into the deep end of C and C++, and we're going to start with a very simple example. If you're playing the home game, this is where you find the code. But the only thing we're interested in is this one little function, will_overflow. Everything else is driver code. I'll show it to you, but there are two questions I want you to consider: what is the intended behavior of will_overflow, and what will the actual behavior be? Okay, so here's the code. Is that visible back there? Is that readable? Okay.
At one point I had bolded them to give the green letters a higher contrast, and then the problem was they seemed a little overdone. All right. So the only thing we're really interested in is the behavior of this function called will_overflow, and all it does is test whether its parameter n plus one is less than n, which of course can't happen in the integers, but of course it does happen in the modular integers. In other words, I'm looking to see exactly what it says: will this thing try to wrap? Everything else is just me using it to make a prediction, in this case on INT_MAX, and then we actually do the computation. I think I had to move it out of main or the optimizer would do stuff. And then the prints: all it's going to do is print the answer.

If you run the little test harness, which actually runs it under several different sets of compile flags, in every case it does exactly what you thought it would do: INT_MAX plus one wraps around to INT_MIN. I mean, what else could it do? And with the optimization shut off, it always predicts that. However, if we turn on a bit of optimization, -O1, it still predicts correctly with GCC, but it predicts incorrectly with Clang. Now if we turn on -O2, it fails with both compilers, which means the behavior of that code was both compiler dependent and optimization-level dependent, and that's not what we want from a programming language. So the question becomes: why?

Now for comparison, I could take exactly the same code. Notice that the only thing I've changed is that I'm passing in an unsigned, and of course I have to test it on UINT_MAX and see whether it wraps to zero. But other than that, it's exactly the same code. And what happens? Again, it always wraps, but in this case, with either compiler and any optimization level, the behavior is the same: it correctly predicts it. So the question is, why did it work now? Why did it fail before?
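For readers without the slides, here is a minimal sketch of the two functions being compared. It is a reconstruction from the spoken description, not the code from the talk's repository: the parameter name n and the name will_overflow_unsigned are assumptions, and the driver code is omitted.

```c
#include <limits.h>

/* Reconstruction of the signed test described in the talk.
 * Intended meaning: "would n + 1 wrap around?"
 * Caution: when n == INT_MAX the addition itself is signed overflow,
 * so the result cannot be predicted from the source. */
int will_overflow(int n)
{
    return n + 1 < n;
}

/* The unsigned analog. Unsigned arithmetic is defined to wrap
 * modulo UINT_MAX + 1, so this test is well defined for every input. */
int will_overflow_unsigned(unsigned n)
{
    return n + 1 < n;   /* n + 1 is 0 exactly when n == UINT_MAX */
}
```

Only the unsigned version can be relied on to report the wrap at its maximum value; the signed version is reliable only for inputs other than INT_MAX.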
All right, so what if I told you the first program actually wasn't C at all? At this point it's red pill, blue pill time. Just like Neo in the movie, I, like Morpheus, have to give you a choice. You can take the blue pill, and that means the talk ends and you can believe whatever you want to believe about C and about your compiler. But if you take the red pill, then the talk goes on and we explore just how deep this rabbit hole goes. Okay, I won't look if you want to sneak out.

Okay, so you're still here. You have apparently chosen to take the red pill. Now, the first thing you might say is, "Well, that first program had to be in C. I just stayed here so I could object. It had to be in C because the compiler accepted it." I want you to remember something: the compiler is a machine. The machines lie. They don't want you to know you're in their matrix.

What we usually think is that a programming language has a standard, and the standard defines exactly what the constructs of the language do. Isn't that what a standard is for? That's actually true for some languages, and reasonably close for others. It's probably exactly true for Haskell, because they try especially hard in that community. A very similar language called OCaml has, I believe, a hole or two, because they care more about performance than the Haskell people. However, C and C++ are terrifyingly different from any of those languages, and from pretty much any language designed after C that didn't just inherit it from C like C++ did.

So there are four other possibilities in the C and C++ standards. Besides actually defining what constructs do, they have four different ways to not tell you what they do. In order of increasing chaos and mayhem: you have locale-specific, which is at least opt-in and makes some kind of sense. You have implementation-defined, which is not opt-in.
However, at least it has to be documented. There's unspecified, which means they don't even have to document what happens, and I don't believe it has to be consistent. And you might think, well, that's as bad as it could possibly get, and that would be true in a sane language, or at least it would be true if the world were nice. But actually it's not true, because there's something worse. For unspecified behavior, the program does have a meaning. You may not know what that meaning is, and you may not be able to figure it out except by running it and hoping it stays consistent from run to run, but it has one.

There is a fourth category in the standard called undefined behavior, and you'll notice what it says about using it: "this International Standard imposes no requirements." Now, if you're not terrified, if you're not afraid right now, then you must have misunderstood. So I'm going to unpack that at some length, so you understand just how much worse that is than at least all of us that have gray hair and learned C a long time ago think.

So no means no, and there are ways people have put this. A famous Usenet formulation was that it's legal for the compiler to make demons fly out of your nose: the famous nasal demons formulation. The one I like best is one at least repeated by Chris Lattner, who's an important compiler writer: any undefined behavior in C gives license to the implementation to produce code that formats your hard drive. By the way, every time in this talk I say C, I pretty much mean C and C++, because on this issue there is no difference. All right, so now, maybe Chris Lattner won't actually go to the trouble of generating code to format your hard drive. Let me tell you what will happen.
Although you might read Ken Thompson's paper on trusting trust, because if you're a high enough value target, maybe he will. But the compiler will absolutely do whatever is fastest. And if you invoke undefined behavior, then sooner or later that will create a vulnerability in your code, and there are many volunteers on the net who will be happy to run code on your computer to format your hard drive, so he doesn't have to. And this is not a theoretical thing. Seacord works with CERT at Carnegie Mellon, and he points out that the majority of security vulnerabilities in C and C++ programs require or involve exploiting undefined behavior. And it's not like there's an edge case or two, as I think is the case in OCaml, because it lurks everywhere. I'm not even going to go through this list. If my "more than 200" here isn't right, I'm not going to count again, thank you. Some of these you knew. You knew you weren't supposed to access beyond the ends of a memory block, but you probably didn't understand the consequences of doing it, and that's what we're unpacking now.
All right, so the biggest question is how in the world C and C++ ended up in a situation where, if you invoke undefined behavior, your program actually doesn't have any meaning. The answer, almost the whole answer, is in this part of the C standard committee's charter: make it fast. And C++ inherits all the same things. Some of it is the low-level focus, but performance is really the monster, and it's the consequences of uncompromising performance that we're exploring.

When we program, we usually have some kind of picture of what we are using, and if we have any picture of a compiler, it's usually this one. There's a front end, which does things like parsing and type checking, and then there's a back end that we hope does a bit of optimization, sprinkles pixie dust on the code. What we think is that the front end discovers the meaning of the program line by line, just the way we think about it, and the back end then generates assembly with the same meaning, line by line. So we thoroughly think in terms of lines of C++, or if you're a little more sophisticated, maybe you think in terms of sequence points, but it's all the same.

The problem is that this leads us to think that undefined behavior, sure, it's there for performance, but what it's really letting C do is use single machine instructions. So addition does whatever the processor does: it's two's complement if the processor is two's complement. Null: you can dereference null with undefined results, because that way we don't have to put in null checks. The problem is that's not the compiler you have now. If you're old enough, it might be the compiler you had a long time ago, when we weren't very good at writing compilers.
All right. But that was because compilers were stupid then. They're not so stupid now. This is more like the compiler you actually have. Yes, there's still a front end and a back end, but all the optimization is in this middle end, and I have drawn it the way I drew it because this little tiny piece of the compiler in, I don't know, 1970 or whenever, has become the tail that wags the dog. This is most of the code in your compiler. The reason is pretty simple. Parsing, determining the meaning of the program, is a solved problem and will not grow without bound over the lifetime of the language. The same is true of code generation. But we have a theorem that says there's no such thing as a fully optimizing compiler. There's always an optimization left to make. So if you are doing your doctoral thesis in compiler technology, what do you think you're going to do? Yes, that's right. You're probably going to add more code to this.

So the front end still does discover the meaning of the program, essentially line by line. However, then it goes to the middle end, and that middle end is part of an ever-heightening arms race between PhD students and compiler writers. It's getting smarter and smarter, and it does amazing things to that program, and probably erases all knowledge of even what the original source lines were. Then what it produced is handed to the back end, and yes, the back end does produce assembly code with the same meaning as what it was given. But it gives assembly code with the same meaning as this amazingly transformed program, and the transformed program does not need to have the same meaning as what you wrote. Now, it does have to have a certain relationship, but the only one is the one imposed by the standard, and I want to go back and point out that no means no. The standard imposes nothing on a program which invokes undefined behavior. Really. Okay, so I want
to show you how a compiler exploits this, because it tends to seem very theoretical. So let's categorize all functions into three types. First of all, there are type 1 functions, which don't depend on undefined behavior at all. Those are uninteresting for this talk because the optimizer has to behave and do what you think it should do. Type 3 functions always depend on undefined behavior. That's also uninteresting, because the optimizer frankly should just remove them, remove the calls altogether, because the fastest way to do nothing in particular is to do nothing at all. So the only interesting case is type 2 functions, which may or may not invoke undefined behavior depending on the inputs or some other context, such as acquiring a database handle or something.

So the optimizer has a lot of license, but not complete license. What should the optimizer do? What are the requirements? First of all, if the inputs are such that there is no undefined behavior, then it needs to do exactly what the standard precisely defines, and it should be as fast or as small, or whatever we're optimizing for, as it can be for that case. However, if you pass it inputs that do invoke undefined behavior, first of all, there are no rules: the standard imposes no requirements. And optimization in that case is also meaningless, because part of the ethos of C and C++ is trust the programmer, and in particular, trust the programmer to never actually do that. What that means is that the optimizer should analyze and optimize a function based on the assumption that nothing ever invokes undefined behavior, because that gives the fewest constraints and the most latitude for speeding up the defined case.

Okay, so let's take an example type 2 function. This is our friend from a few minutes ago, will_overflow. Now that I don't have to squeeze it on there with the driver code,
I've formatted it reasonably. Now, what should the optimizer do with will_overflow? Remember, n plus one less than n is our test. Well, the C standard says that adding one to n is undefined if n is INT_MAX. It does not say that it wraps. So the optimizer should assume that n is never INT_MAX, because anything else would be not trusting the programmer. Therefore this test always produces false, because unless n is INT_MAX, the left-hand side is always the larger. So what it should do is produce the same code that it would produce for this function, which simply returns zero. And in fact, that's what it does. That makefile also preserves the assembly language generated during the compilation, and you don't even need to read x86. This thing takes a register and XORs it with itself. That's an assembly language idiom for loading zero, which now, or at some time in the past, was faster and/or smaller. And then it returns. Why? Because, as we can clearly deduce, EAX is the return value register on x86. So what it did, in fact, was return zero in the fastest possible way. You'll be glad to know that was optimized.

All right, so now you know what happened. will_overflow is a type 2 function, and I passed in an argument that invoked undefined behavior. That means it has no meaning according to the standard. It's not C, and in particular its behavior cannot be predicted from the source in that case. The unsigned analog was a type 1 function, because the standard actually says that UINT_MAX plus 1 is 0. It says it wraps in all cases.

Now you might think, well, okay, that was weird, but I've written C and it didn't seem to always do that. So I want to show you how bad it can be. What if we modify this slightly?
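As a rough sketch of that reasoning, the three forms under discussion can be written side by side. The names will_overflow_reduced and will_overflow_eq are hypothetical labels for illustration, the parameter name n is an assumption, and the last function is a guess at the "slightly modified" test that compares against INT_MIN.

```c
#include <limits.h>

/* The type 2 function: defined for every n except INT_MAX. */
int will_overflow(int n)
{
    return n + 1 < n;
}

/* What an optimizer may treat it as. For every input where the original
 * is defined (n != INT_MAX), n + 1 < n is false, so "return 0" has
 * the same meaning on all defined inputs. */
int will_overflow_reduced(int n)
{
    (void)n;    /* the argument no longer matters */
    return 0;
}

/* A variant that looks equivalent in the source: it still performs
 * n + 1, so it is still undefined for n == INT_MAX, and on all
 * defined inputs n + 1 can never equal INT_MIN. */
int will_overflow_eq(int n)
{
    return n + 1 == INT_MIN;
}
```

On every input for which the standard assigns a meaning, all three return 0. They can only appear to differ on the one input where none of them has a meaning.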
I'll do the test differently. I'll check whether n plus 1 is equal to INT_MIN. Now, within C that should always produce the same results, at least for the way I drove it. But what actually happens if we test it? Once again, it always does actually wrap. However, with GCC at -O2 it again fails to predict that it will wrap. Every other case succeeds. Somehow, before, GCC did slightly better than Clang. Now Clang is making the code work all the time and GCC is not. What I'm trying to show you is that when you get into the area of undefined behavior, the results can depend not only on the compiler and the optimization level. They can depend on details of the source that look completely equivalent.

In fact, you can get instability from release to release. I just randomly pulled a bug report out of the GCC Bugzilla. A Bacula developer was unhappy because G++ 6.0 started deleting his checks for whether this is null. So how many of you are C++ programmers and actually got a chill down your spine when I said testing whether this is null? Yes, that's right. The standard says that does not happen, and so the optimizer correctly recognized that and removed them. I don't know what he was up to, but it was not the right thing.

He also was unhappy about one that I hope is very well known at this point. It was deleting calls to memset. Why? Because in the C virtual machine, when you release memory back with free or delete, it's gone, and it doesn't matter that it used to contain your bank account information, because it's gone. But of course in the real world we like to zero it out. But if you zero it with memset, the compiler knows way too much about memset, and will look at that and say, oh, he memset it to zero, then he freed it, the memset had no effect, and so it'll remove it because it's faster. Okay, here's a free tip.
That's not in the talk. In newer Cs there's a memset_s, which the compiler is forbidden to touch. And if you don't have that available to you, write your own, stick it in a compilation unit of its own so the compiler can't do any interprocedural optimization, and trick the compiler into not pulling it out.

Okay, so some of the responses on the Bugzilla are quite enlightening. The developer was told, "You seem to be confusing 'it worked okay until now' with 'this code is valid'." Or, as someone else put it, it got smarter. That's what happened: the compiler got smarter, and his incorrect code, which had happened to be working, didn't work anymore because the optimizer was smarter now.

In fact, the standard permits time travel. Doctor Who loves C. The reason is that when it says it imposes no requirements, it really means none. So it is permitted, according to the standard, on finding undefined behavior, to travel back in time and change its mind about something it did earlier, and do something insane instead. And this actually happens. Because that's so bizarre, let me say there's a good reason for it. Well, the reason is speed at any cost, but if you want me to elaborate, the point is that the meaning of the entire program is retroactively undefined, and the reason is that it allows more aggressive instruction reordering, which can be a big win on a modern compiler. But it means that if you're hunting a bug which involves undefined behavior, printf, even good old printf, may fail you, because the error may not occur at the point in time where it looks like it should based on its position in the source.

All right. Having said that, what I want to do at this point is give you a slightly different language to say the same thing, because I think it's useful, and it's my talk. When we say a language is safe, I claim we mean something very specific, or we
should. It doesn't necessarily mean it's easy to use. What it means is this: every language creates a virtual machine. It creates an abstraction layer for you, because the reality underlying it is too hard. And if you think that it's not too hard because you like assembly: I was a physicist once, and I'll keep going down the stack of abstractions until you don't want to do it anymore, which will probably be sometime before quantum field theory. So what we really mean is that the virtual machine cannot be broken from within. Now, of course we can break it from without. If you're running a computation and I pull the power cord out of the wall, what happens will not be according to the standard, but that's okay, because of course I can break it from the outside.

C and C++ are not safe, in a strong and objective sense. Undefined behavior is exactly where the abstraction can be broken from within, and that means that C and C++ as abstractions are leaky abstractions. There are glitches in the matrix: you thought you were in the matrix, and then you hit a glitch where the simulation didn't work. And the fundamental reason why the source code is no guide to what happens is that source code only has meaning from inside the matrix, meaning inside the abstraction. And that is the terrifying reality of C++.

So I wasn't joking when I said the first program was not written in C. According to the standard it had no meaning, and so the compiler wasn't really obliged to do anything much. What we got was probably more than we deserved: it actually printed and terminated and stuff. Okay, so now you've swallowed the red pill. You know why, and you know why the compiler accepted it anyway: because it chugged along with the assumption, in good faith, that I would never pass it anything that would invoke undefined behavior, and I did. So remember, the compiler's a machine. The machines lie, and they may hurt you if you try to use
glitches in their matrix. So this leaves us with the question: well, now what? We have three options. In the movie, if you recall, Morpheus wasn't really right that there's no turning back, because of course Cypher got a second chance, didn't he, or at least it looked like it. So you can go back and take Cypher's choice: you can go back to writing naive C and do what seems to work. You can take the red pill and figure out how to cope with reality, as awful as it is. And there is a third option. Because of course the movie wanted you to make the same choice Neo did, naturally they would not bring to your attention that you had a third choice: you could stand up and walk out of the theater.

We can do that by abandoning C and C++ as individuals, but it's not practical as an industry. First of all, there's too much legacy code. More importantly, the way we avoid using difficult languages like C and C++ most of the time is by using them to write the critical parts that don't work in our more comfortable languages, like database servers, or native extensions to our languages so that we can get those little bottlenecks done. So this is not really practical for us as an industry. Also, it's against my principles to give up and let the machine win, but you know, that's me. And I guess I should point out that while we have very few suitable replacements, whenever we do have one, the industry steadfastly refuses to adopt it. So it doesn't matter, because apparently inertia is the key to technology.

The blue pill actually doesn't work either, unless you have very modest goals for security, reliability, maintainability, and so on, because this is where we are. The reputation of C and C++ as being difficult to program in and bug-ridden is really founded on this. It's not manual memory management, which I actually do a talk on and teach how to do systematically. For this, there
is no system. So we're left with what I'm going to call the red pill approach, which of course is that because I say so. And that is: don't try to know all 200 edge cases in C. Learn to avoid them instead. Effectively, you build your own abstraction layer by learning what kinds of programming habits will stay away from those dark corners and sharp edges. And you do them even when you don't need to, until they become habits, because what you will do at 3 a.m. on a deadline is whatever your habit is, not whatever a smart person would do at that time, because you're not a smart person at 3 a.m. At least I'm not. Then there are common values among your habits, and you internalize those. And then you practice defense in depth by using tools, which we don't do enough.

And finally, occasionally people object, "Well, but does it really, really work?" The answer here is you can't let the perfect be the enemy of the good. It's very hard to do this. Most non-trivial code bases do have undefined behavior in them, even high-quality ones. But you can systematically reduce it, and that really is what you do if you choose not to abandon C and C++, or else you abandon reliability entirely.

Okay, so the problem is that a long list of sharp edges and dark corners, I don't think, makes for a very good talk. I don't think that's the way to learn it. What I've done is, first of all, spend quite a bit of time making it clear to you why this really is an issue, so that you can read things with intelligence and interpret for yourself what the consequences are. Then I'm going to do a couple of extended examples of how I approach this, and then I'm going to give you some references at the end where you can look for more grubby little details on which jagged rusty edges can cut you. All right, so let's start with the example
I've already given you, and that's will_overflow. What I really want to know is not how I can fix this piece of code. I want to know how I can fix all the code I'm going to write in the future. To start with, I'm going to suggest that you make it a habit to prefer unsigned variables when you don't specifically need the sign. The reason is pretty obvious: their arithmetic is far better defined. A lot of people will loop on int. Why? There's absolutely no meaning to the sign. The number of elements in the array, or the size of the array, is a non-negative quantity. So you're actually violating the implicit contract with somebody else reading your code, that you chose your types to mean something like what they really mean.

Sometimes it's actually hard to do. I will say this: I have found that moving to unsigned, even if it required some fiddling, has tended to improve the quality of the code, and I uncovered places where I or someone else was a bit fuzzy about what they were assuming. But I'm not saying you do it for that reason. I'm saying you do it so that when I read your code, I don't say, "Hmm, did they do anything undefined with overflow?" Oh, no, it's unsigned. I don't have to look for that. I can go on and look for other things.
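A minimal sketch of that habit, assuming a hypothetical helper named sum_elements: the element count and the loop index are both size_t, so the types say "this is a size" and there is no signed arithmetic in the loop control to audit.

```c
#include <stddef.h>

/* Hypothetical example: add up an array of unsigned values.
 * count and i are sizes, so they are size_t rather than int.
 * The accumulator is unsigned too, so even an overflowing sum
 * wraps in a defined way instead of being undefined. */
unsigned long sum_elements(const unsigned *values, size_t count)
{
    unsigned long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

One caveat that follows from the same rule: a countdown loop written as `i >= 0` never terminates with an unsigned index, so such loops have to be restructured rather than just retyped.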
All right, so as a trivial example, of course, this is what quite a few people do, and it should not be done. It has probably gotten worse, because nowadays a lot of people learn to program in a nice, safe interpreted language that uses bignums for integers, and C uses bignums nowhere unless you go to a lot of work to use a library. The nice thing about this is that size_t is not only unsigned, but the standard guarantees it's big enough for any array that you can obtain in C. So actually we should have been doing that anyway, because the meaning we wanted to express was a size.

Now, sometimes it's not that simple, and the reason is that oftentimes somebody else, or we, has used a whole bunch of signed variables everywhere else in the code. And even though it's not undefined, it can sometimes be tricky to deal with overflow even for unsigned. Also, if you're dealing with both signed and unsigned quantities, it's a pain in the neck to do the conversions and comparisons correctly according to the standard, although they often are just implementation-defined, so at least they don't make your whole program meaningless, usually. But my point is that a lot of the reason is that someone started doing bad things, and then they made more things and more things signed so the compiler didn't tell them they were doing bad things, and that's what you have to reverse. So signedness tends to be infectious. At least get the good infection: try strenuously to use unsigned, and let that infect your code instead of signed, unless you actually need negative quantities. And my point, and I'll keep going back to this, is: it may be hard, but hard within the abstraction is always better than breaking the abstraction and being at the whim of the optimizer, okay?
And if you don't agree with that, please don't write C for me. Now, there are actually some nice things that didn't exist a long time ago, so those of you with gray hair probably have habits that don't include them. The C committee is not entirely made up of insane individuals who hate us. It just seems that way. There are some types which actually express things that we often do with types that don't express them, like ints which are large enough to hold any other integer, so that you can control the conversions and do things that make sense, or pass them to printf and know that they will work. If you need to do something non-portable like hash on addresses, you'll have to be non-portable, but remember, non-portable is better than undefined. Also, if you don't have one that already means what you want it to mean, you're still probably better off with something explicitly sized than int, so that at least it's consistent from machine to machine. And then please typedef it and document that, you know, int32 here doesn't mean the same thing as int32 over there. Thank you very much.

Now, one of the reasons you have a lot of signedness is that libc sucks. I'm sorry, occasionally a Unix graybeard gets unhappy with me, but it's true. libc sucks real badly, and here's an example of it: read returns a signed size_t, an ssize_t, and the only reason is so it can return minus one so that you know to check errno. Now, the problem with that is, of course, you want to compare it with the size of your buffer, and that leads you to want to keep the size of your buffer as a signed quantity so that the comparison all makes sense. And pretty soon everything is signed.
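One way to stop that spread is a thin wrapper around read that reports the byte count and the error separately. The following is a sketch under assumptions, not the code from the slide: the name read_bytes, its parameter names, and the choice of returning an errno value as the error code are all hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical wrapper around POSIX read().
 * Returns 0 on success, or the errno value on failure.
 * The number of bytes read is stored through bytes_read as a size_t,
 * so callers never have to compare a signed count with a buffer size. */
int read_bytes(int fd, void *buffer, size_t capacity, size_t *bytes_read)
{
    ssize_t result = read(fd, buffer, capacity);
    if (result < 0) {
        *bytes_read = 0;          /* no partial count on failure */
        return errno;
    }
    *bytes_read = (size_t)result; /* non-negative here, so the cast is safe */
    return 0;
}
```

The signed ssize_t exists only inside the wrapper. A return of 0 with *bytes_read equal to 0 still means end of file, exactly as with read itself.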
All right, so what you do with this is you wrap it. Aggressively wrap things that don't meet your standards, and libc is a primary candidate for wrapping the whole stupid thing, or lots of it. Notice what I've done here. Whatever you're using for error values, you can return. What I did is pass in the place to store how many bytes were read, because now that can be an unsigned thing that actually means what it really is, a size. And I have separated the results from the error codes, because what people are doing when they return a meaningful non-negative number or minus one, or when they return a pointer or null, is faking an algebraic data type, if you know what that is. If you don't, it doesn't matter, but don't fake them. If you don't have them, don't stuff things together and use variables to mean multiple things. For function parameters, this ends up meaning you shouldn't have in-out parameters, which is a rule that a lot of people know, but they're quite beloved in the C community because they might save four bytes on the stack. Remember, if you really mean "at any cost", then I can't help you. So the point is: wrap aggressively.

And in fact, a piece of advice I got. I love this title, I abuse it all the time: "Make the compiler your batman." I got this from a book written by, I think, an Australian, and it turns out he got a lot of responses to the first edition, because in British Empire speak this is British Army lingo for your assistant. But I like the unintended image of the compiler putting on a cape and cowl and skulking in the dark, striking fear into the hearts of cowardly bugs, so much that I abuse this at every chance. Okay, one of the other responses to that same bug
I showed you, that same bug report, was someone noting that there's an option to disable both of the things that bother you. Okay. Actually, it turns out that most of us spend a lot of time understanding our languages and apparently no time understanding our compiler, and we should probably not do that. When I say the compiler, I'm largely thinking of GCC and Clang. If you use a proprietary compiler, it's still probably true, but you'll have to look. But you can guarantee it with GCC and Clang, and this is a Linux conference, so I don't feel too bad about that. There are a lot of useful flags, okay, and you can do a lot of different things. Some of them will catch errors at compile time, okay? In fact, a lot of them. Some of them will cost you in some way, but a lot of these are actually free. They don't even disallow things you might have wanted to do; the only reason they're off is fabulous amounts of legacy code, because in 1977 somebody didn't realize that you shouldn't do whatever they were about to do. Okay, and so a lot of things are turned off. That's why.

How many of you use -Wall when you write in C? Okay, I'm a little disturbed it's not everybody, but it's a lot of people. How many of you use -Wextra? Okay, quite a few people. So, those of you that have gotten to -Wall and -Wextra, how many of you use any other flags besides those, ones that control the interpretation of the source? Okay, I see a few people. Thank you. You guys are in the advanced class.

What I really mean is that you can do that, but you can also put in runtime checks, which you may or may not decide you can afford. But let me give you a clue: you can afford them when you're running your test suite, can't you? Okay.
Or you can actually change the language into some dialect of C that you find more congenial than the one that ISO envisions. All right, so, you know, maybe the machines aren't entirely evil. Maybe if we ask politely they might be a little nicer.

So for example, here are some flags that are useful for this particular example I'm talking about. -Wstrict-overflow will warn you when the compiler has used the assumption that signed overflow cannot happen in order to optimize. You don't know if it's okay, but at least it warns you that you might want to look and make sure that that's what you meant. Of course you can use that even if you're optimizing so much that you're actually relying on this; it might not be a bad idea to check every once in a while. You can also have -fno-strict-overflow, which essentially tells the optimizer, yeah, don't get so crazy about optimizing that. I don't really know why you'd go halfway, because you can go the whole way: there's an option called -fwrapv, "wrap" with a w, which actually fully defines signed arithmetic to be what you probably thought it already was, which is two's complement. All right, and you can write in that. I'm quite sure that Postgres is written in that dialect. Okay, and you know what? It doesn't seem to have slowed down Postgres.

You know what I didn't put in the talk and I should have? Let me just give you a piece of philosophy. One of the nice things about writing in a fast language is you don't have to squeeze every cycle out of it, because you probably don't need ultimate performance where you might if you did it in a slow language like Python. So you don't have to go the whole way. All right, and in fact, you probably shouldn't unless the profiling says you have to.

So what about runtime checks? There's a similarly named -ftrapv, which will actually check at runtime and trap on signed overflow.
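To make the flags concrete, here is a small sketch built around the overflow check from the start of the talk. The function name `will_overflow_portable` and the file name in the comments are placeholders; the flags are the GCC options named above.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * The check from the start of the talk. In ISO C, n + 1 is undefined when
 * n == INT_MAX, so an optimizer is allowed to fold this to "false".
 * Building with -fwrapv defines the wraparound; building with -ftrapv
 * makes the overflowing addition trap at run time instead.
 */
bool will_overflow(int n)
{
    return n + 1 < n;
}

/* The same question, asked without ever performing the overflowing add. */
bool will_overflow_portable(int n)
{
    return n == INT_MAX;
}

/*
 * Possible builds (overflow.c is a placeholder file name):
 *   gcc -O2 -Wstrict-overflow   overflow.c   warn when the assumption is used
 *   gcc -O2 -fno-strict-overflow overflow.c  do not optimize on the assumption
 *   gcc -O2 -fwrapv             overflow.c   signed arithmetic wraps
 *   gcc -O2 -ftrapv             overflow.c   signed overflow traps
 */
```

The second function is the "eliminate the undefined behavior" route; the flags are the "define it" route.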
Okay, there's an -fsanitize option. I believe the sanitize family may have come from Clang, but GCC has them now: -fsanitize=signed-integer-overflow. Now that sounds promisingly named, doesn't it? Okay, so what it does is it prints a diagnostic, which you probably didn't want to do in production but might be awfully handy in testing. Okay, and you can even tell it, hey, try to recover from this and continue; maybe I could find more errors. I probably wouldn't use that to go to production, but you know what? I'm not your release manager.

I don't have time to talk about conversion, but there are some things that will help you, if you take my advice and prefer unsigned, when you do have to do some conversion things. There is a -Wsign-compare to warn you about comparisons that might produce incorrect results. You can at least check them, and if you know they're okay, maybe you can find a way to make the compiler shut up by, say, putting in explicit checks rather than just letting people, you know, figure it out for themselves somewhere down the line. There's also a -Wsign-conversion, which does the same thing for conversions. Okay? It's enormously helpful to have the compiler be your batman here, whether you think that means assistant or the Dark Knight.

I want to point out, when you change the dialect, even if it does not meet your goals or your policy to write to a dialect of C, it's still very handy, because if you are actually writing to ISO C, for most of these flags your test suite should produce the same results, now shouldn't it? So even if you're not writing to the dialect of C where signed quantities wrap, for sure, guaranteed, it shouldn't change the results of your program. Okay? You can do the same thing with more than one compiler, because they won't do the same thing, and they all need to give the same results if your code meets the standard.

So what are some habits that can let us do this all the time?
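As an illustration of the kind of comparison -Wsign-compare is about, here is a minimal sketch with hypothetical function names. The first function spells out what a bare `a < b` between an `int` and an `unsigned` actually computes after the usual arithmetic conversions; the second adds the explicit check the speaker recommends.

```c
#include <stdbool.h>

/*
 * What `a < b` means when a is int and b is unsigned: a is converted to
 * unsigned first, so -1 becomes UINT_MAX. Written as a bare `a < b`, this
 * is the comparison that -Wsign-compare reports.
 */
bool less_as_converted(int a, unsigned b)
{
    return (unsigned)a < b;
}

/*
 * The explicit check: a negative value is smaller than any unsigned
 * value, and only a non-negative a is converted.
 */
bool less_checked(int a, unsigned b)
{
    return a < 0 || (unsigned)a < b;
}
```

With these definitions, `less_as_converted(-1, 1u)` is false while `less_checked(-1, 1u)` is true, which is the sort of surprise the warning exists to surface.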
Well, one habit is maybe we should actually get to know our compiler and take it out for coffee, instead of, you know, putting in -Wall -Wextra and then hoping we never have to read the man page again. Choose those flags as carefully as you write code, because, you know, the compiler knows more about your code in certain ways than you do. More importantly, it doesn't matter whether it's right or not, because it wins. Okay? Remember, non-portable is still defined, and it's still better. There's no upside to being at the mercy of the optimizer.

Okay. Now here's another example. This is an extremely simplified version of an actual Linux kernel bug, so that's why the names are kind of odd. The only thing that matters here is I'm passed in a pointer to a struct and I'm going to have to do something. Okay, and what I'm going to do with this pointer is obtain another struct pointer. I'm probably pulling it out of this, but it comes from somewhere. And then there's a particular field in this that I want to do something with. All right. However, this could be null, and so I'm going to check whether it's null, and if it is, then I'm going to return. I'm guessing that that's what this function might have been doing, but it doesn't matter. I'm going to return, and then in the rest of this I'm going to actually use the value that dev points to.

You probably already see the problem. What is the problem? The programmer was probably thinking the following: well, okay, what if I receive null here for dev? Well, this value is undefined if dev is null. On the other hand, I'm not going to use that undefined value, so it does not corrupt my program state. I mean, that's the same reasoning you would use for an uninitialized variable, right? You should initialize them, but if you don't, they don't have to corrupt your program state. The problem is the optimizer is the one that wins, and it says, oh, this is a type 2 function.
I can optimize this on the assumption that dev is never null, okay, and then I can delete the null check.

Okay, so how are you going to fix this? Well, again, you have two possibilities. You can eliminate the behavior which is undefined according to the standard, or you can define it so it is predictable. Okay, say we want to eliminate it. The kernel is written in more or less C89, so I have to have all my declarations first. So I'm obviously going to put the declaration before I compute the value. And of course, I hope we're all adults here and we do know that you at least assign something; you don't just leave uninitialized variables lying around as a matter of habit, remember. Okay, then I test it, and now what happens is the optimizer comes to this and says, oh, I haven't done anything with this thing, I don't know whether it's null or not, I'd better keep the test. All right. The other way is, of course, if you're in C99 or C++. I personally would do this, but Linus does not like mixing declarations and executable code, so if it's the kernel you'll do the other. Either is fine. The thing that's not fine is not defining it at all. Okay?

And the bottom line here is a subtle point, but a profound one. I lied to you when I said, or guessed, what the author was thinking. I said this value is undefined, and the compiler doesn't think the value is undefined. It thinks the code is undefined, and there's a big difference. All right.

So how do you make a habit of this? Well, this is actually a habit that I came to at some point, and for more reasons than this, but this is certainly a big one. I make a habit of always laying out my code so that all values get sanitized at the earliest possible moment. And sometimes, you know, your brain tries to kill you, or tries to tell you, well, wouldn't it be better if I get my stuff and then I kind of collect all the tests at once? Yeah, it occasionally sounds attractive.
I don't do that anymore, because I want to couple them visually. All right, so in fact, the first thing I'm going to do in a function is sanitize parameters. Okay, then the next thing I'm going to do is acquire resources, and there are other reasons besides this talk that I do this. I will acquire resources and immediately sanitize and reality-check them before I go to the next one. And when I have all my resources, I'll do everything else that might have a non-recoverable error. And then at the very end, okay, now I know I can meet my contract, and I'll do that. But I'll lay out every function that way even when it doesn't matter, because I'm talking about habits.

So it kind of looks like this. I had to smudge everything, of course, but I just used the same example. It didn't sanitize the input, but generally you would. Okay, so: hey, if you passed me null and I'm not allowing null, all right, I'm just going to bomb out right now, whatever the error handling scheme is that you have chosen. Otherwise I will obtain a resource with that, and then I will check and see if I actually got it. If I didn't, again, I'm going to get out as soon as I can. Otherwise I'll acquire yet another resource, and I'll check that, and then I'll continue, and everything else, of course, will assume that I have resources. But I will lay it out that way even in cases where there's not that much going on and it would be clear anyway. Why uniformity? Because I'm making a habit, so that it'll still work at 3 a.m. Okay.

In fact, more generally, if you really want to think about the principle that underlies that, what I will do is first determine if the caller has met his contract, and if not, I will get out right then and there.
I won't try to interpret it anyway. Otherwise, I will first do everything that might cause me to fail to meet my contract, and if I fail to meet it, I will undo anything that needs undoing and I will get out. And only then will I do the rest of my work, when I now know I'm going to meet my contract. If every function is done that way, even when it doesn't seem to matter, I find that in the long run it pays off. Okay.

So there's another underlying value, right? Do common tasks the same way every time. If I loop through an array from zero to size minus one, and I test whether the index is less than size, I want you to do that, and not loop on whether the index is less than or equal to the last element. Why? It doesn't formally matter, but in C we have an idiom: we track arrays with size, not last index. Please do that so that we all read code rapidly, and that's really the underlying thing here. I'm trying to make something so that I can do it the same every time, everywhere. Okay, and I want to make it look correct. So I always want release code to sit next to... I'm sorry, I want sanitization code to sit next to the code that obtains the value, if possible. In other words, I want logically coupled code to be visually coupled, to the degree that I'm able to do that.

Okay, the other way to solve this, of course, is once again to define the behavior. This may be the best-named flag in all of Unix or all of computing: -fno-delete-null-pointer-checks. It just does what it says right on the tin, folks.
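The layout described above, and the null-check fix it leads to, can be sketched as follows. This is not the kernel code or the slide: the struct names, `poll_flags`, and the status values are hypothetical stand-ins, and the scratch buffer exists only to show a second resource being acquired and checked.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the structures on the slide. */
struct device { int flags; };
struct tun { struct device *dev; };

enum status { STATUS_OK = 0, STATUS_BAD_ARG, STATUS_NO_DEVICE, STATUS_NO_MEMORY };

/*
 * The buggy shape dereferenced dev before testing it:
 *     int flags = dev->flags;     undefined if dev is NULL
 *     if (dev == NULL) return;    so the optimizer may delete this test
 * Below, every pointer is checked right where it is obtained and before
 * its first use. (Building with -fno-delete-null-pointer-checks is the
 * "define the behavior" alternative.)
 */
enum status poll_flags(const struct tun *tun, int *flags_out)
{
    struct device *dev = NULL;  /* C89 style: declarations first, initialized */
    int *scratch = NULL;

    /* 1. Has the caller met the contract? If not, leave immediately. */
    if (tun == NULL || flags_out == NULL)
        return STATUS_BAD_ARG;

    /* 2. Obtain each resource and check it before obtaining the next. */
    dev = tun->dev;
    if (dev == NULL)
        return STATUS_NO_DEVICE;

    scratch = malloc(sizeof *scratch);
    if (scratch == NULL)
        return STATUS_NO_MEMORY;

    /* 3. Only now do the work, which can assume everything is valid. */
    *scratch = dev->flags;
    *flags_out = *scratch;

    free(scratch);
    return STATUS_OK;
}
```

Each check sits directly under the line that produced the value, which is the "logically coupled, visually coupled" habit in practice.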
Okay, what happened here is that this is what the kernel team actually does, and also, by the way, BIND does this. At least for the kernel, I don't think that the issue was they didn't want to fix that piece of code. It was that they expected that they probably had a thousand other sites where that had been done and it had always worked. But remember, you can't depend on that to stay that way forever. And so as a workaround they basically now write in a dialect where it's okay to do that.

So, one value, if I have to sum up everything, is the phrase "don't normalize deviance." I got that phrase from an analysis of the Challenger accident, and the point was that stuff is falling, you know, insulation is coming off the tank. It's not supposed to, we don't know why, but it never seems to hurt anything, so we'll take the deviant behavior and we'll just sort of define it to be normal. Don't ever do that. Eliminate the undefined behavior, either by getting rid of the code itself and changing it, or by defining it with a compiler flag. Okay, because again, and this is the thing I keep repeating: what matters most... yeah, "what matters most in life, Conan?" Well, what matters most in C is not being at the mercy of the optimizer.
All right. I'm basically done. I want to point out the only tool I really talked about was the compiler, and the reason is that that's low-hanging fruit. If you don't have a compiler, you're not writing C; if you are writing C, you have a compiler that will help you, if only we actually check the docs. And I'm ashamed to tell you how long I used C and never really looked for things like this. But there are many other things you can do. For one thing, there's a whole lot of other flags that you can choose, depending on just how much you're willing to keep your code base clean. Different compilers will find errors in different ways, just by the accident of the implementation, and you really would like it to be clean on every compiler you can lay your hands on. All right. Clang actually has a static analyzer project, and most static analyzers cost money because they're boring to write and hard. You already know, I hope... who here does not know what Valgrind is? Immediately go and google it, because if you write in C you want to know what Valgrind is. And there are a lot of things like that; it's not part of the talk. Actually, D and Rust are things that would save us trouble, but we can't do that because we're computer people and we don't change our languages.

That's it. I want to leave you with a couple of other quotes. Winston Churchill served in the Boer War and said, you know, nothing in life is as exhilarating as being shot at without effect. And that might be how C feels, but you might as well make lemonade out of your lemons and kind of enjoy the kind of control that you get, because you're going to suffer the sharp edges anyway. And that might even be why C is fine. All right, so that's it.
I'm done until I'm kicked out, which is probably immediately. Here are what I think are the most useful places to begin reading. It's hard to actually find these things documented the way I said it, but these two are blog posts, and both of them are extremely good. There's also an entire book called Secure Coding in C and C++, which comes out of one of the people at CERT at Carnegie Mellon. And I highly recommend all three of those to begin your journey. What I hope I've done, though, is made it so you understand why the journey is important, and so that you can intelligently interpret the many grubby specifics. And with that, I'm done. I did, by the way, put cards at the back, if you want to talk and we don't get a chance to talk at SCALE, or you want to pay me to do things that hopefully are sane, like this.

All right, any questions? Yes. Yeah.

Okay, so, well, all right. First of all, at some point, you know, you have to live within your policy even if it doesn't make sense. But what I would do in almost every case like this is, I only want to do it once, and I don't want casts sprinkled everywhere either. So what I will tend to do, I think with more ferocity than most people, is wrap things I don't ever want to see again, and I will do it at most once. If I have an impedance mismatch, what I really don't want is to fix that mismatch at every place I call it. So what I will do is essentially write a wrapper that has the interface I wish I had, and then I will get it right once. Okay? Now, if your reviewers can't accept that, then, well, it would be nice if you would get better reviewers, because I'm sorry, that is the right way to do it. But again, you can't eliminate every use of signed either; there are times when you need it. But there are an awful lot of times when you don't. I don't know if that helped you, but I would wrap them and I wouldn't put casts everywhere, because with casts you have to read up on the cast, and that's very problematic too.

Yes?
I would say it's not worse, but that is a matter... you know, I'm not your code reviewer. There is a reason why Go, which was designed by very old-fashioned graybeard C hackers, pretty much took all the implicit conversions out: they finally got a chance to walk it back after decades. Yes, implicit conversions are horrible. I have seen some PL/I, and I must say I was sort of amazed, but kind of glad I wasn't using it. But if you think about the age, everything older than C starts to get worse rapidly. C actually taught us many things that we weren't ever going to do again.

Other questions? You in the checkered shirt, I think you were up.

My opinions about pointer ownership? Well, as it happens... did I pay you to do that? As it happens, I actually have other talks that I've been known to give, and I have an entire talk, or section of a talk, where I try to systematically teach memory management in C, largely because I've never seen it done, and I learned it the way most people did: hey, I'm making fewer segfaults this month than last month, I must be getting better. And when I do that, I explain it entirely in terms of ownership. So my opinion is that if you're not thinking in terms of ownership, you're probably doing it wrong. You're certainly doing it wrong if no one can think about it in terms of ownership in C.

Okay, I saw you first. Yes, sir?

Well, I mean, the answer is it could point at that, but it doesn't point at that at compile time, and the compiler doesn't care. Right? So it just does whatever the compiler does, and generally in C, I mean, you know, it's actually explicit: trust the programmer, this must make sense somehow. The point, though, is that the problem again is not, you know, invalid data.
I mean, that's a problem in most languages. That would be the scariest thing, to not understand your program state. What's scarier is to not have a program state, because you have no well-defined program. Okay, and so the problem there was far deeper than "this might point at something," you know, something I can't touch. Or even, well, in main memory it's not going to trigger an interrupt, because you have a number which happens to be interpretable as an address. But it doesn't matter, because the point is that even well-defined mayhem is still better than undefined mayhem. But the compiler, honestly, says: yeah, it must have been okay; I trust him that when it was returned from that other function it won't have returned null, because if it did, that would be crazy.

Yes, sir?

Okay. So, the first part, let me hit the first one. So the first one was... yeah, I will repeat it. I wanted to listen, so tell me again; I wanted to hear all of them. Oh, the multiple returns. Okay, so the question was: I did a whole bunch of things like multiple returns, which your programming teacher in college... I mean, I'm exaggerating for effect, but my programming teacher didn't like them.

Okay, so, right. Well, you have invoked my undefined behavior, because now I get to rant for a second. What you have touched upon is a glitch in the matrix, where what teachers of programming say does not actually fit the needs of the industry. This is my opinion, and since that's the end of the talk I can be a little edgy. The fact is that, yes, there was a time when people energetically pushed the idea that every block of code should have a single entrance and exit, and since a function is a block of code, it should have one exit. That's the origin. Now, let me tell you what happened, and Dijkstra, Edsger Dijkstra, is the most famous exponent of this. Let me tell you what happened. Actually, let me ask you a question.
Do you know anything about chess? Okay, so do you know how you implement chess on a computer? Okay, well, the answer is you do it very differently than a human being does it. A human being is actually not very good at looking very far ahead on dozens and dozens and dozens of lines, but we're really, really good (well, not me personally, but a master chess player is very good) at recognizing lines which don't make any sense, by essentially pattern recognition. So we're not very good at following trees, but we're really good at pruning. Computers are essentially the opposite. Okay? The computers are really good at following many, many lines. So the point is that for a computer chess player you will get a lot more of your ability to play from things like look-ahead than a human will, and less from your position evaluation.

Why do I say that? Let me tell you the dirty little secret about that. The people who benefited the most from the idea that blocks of code should only have a single entrance and exit were people who wanted to prove theorems about code and people who were writing compilers. What they're really saying is: do a piece of what the compiler does, because if you look up basic blocks on the internet you'll find that that's pretty much what the compiler does internally. The real question for me is, as engineers, does this help us? And the answer is it does not, because we are good at pattern recognition.

Okay, so the bottom line is that you have two choices. You can have one exit and have code nested more deeply, and sometimes guard the whole thing with these big ifs, or you can bomb out early. And the way we actually understand code, I claim, certainly what I do, is I read down, and I like to know: okay, you know what, I can quit thinking about null, because I've just checked for it. So as far as I'm concerned multiple returns are not only fine, but there are three cases
where I will accept goto. Now, only three, unless you can convince me that I missed one. But the fact is that when you do exception handling in C, the idiom is probably to use goto. But that's not what this talk is about. The bottom line is that I do that because I actually think it's clearer. And, you know, at this point I don't usually rant about that, because that argument is so '70s, except you still get people who come right out of school and they don't want to do those things. And so I did. I don't know if that was the answer you were looking for, but I'm perfectly willing to do it.

Okay, so the second question. Oh, macros versus... well, templates are really only C++. So if you're in C++, of course, you can find a way to not use a C macro for almost everything, and you should. I mean, that's almost uninteresting, because even if templates are confusing, they still make some kind of sense within the type system. So yeah, I would use a template pretty much anywhere I could avoid a macro. You could probably find a case where that's not true. But in C it becomes harder because you don't have the templates.

Anybody else? Yes, sir?

Yeah. You know, it varies somewhat with the project. I can't remember the name right now; there's a very nice unit tester for C. Let me tell you instead what you should look for in a unit tester. All right, as long as you're doing the test on, you know, a real computer, the way a C unit tester should work is it should fork off before it runs every test, because in an unsafe language there's simply no other way to keep bugs from possibly screwing up whatever your tester is. Okay, and there's one that I just can't remember the name of. But the funny thing is that if you're writing for an embedded system and you want to run your tests on the embedded system, you probably don't have fork available.
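The fork-per-test idea can be sketched in a few lines on a POSIX host where fork is available. This is not the unit tester the speaker has in mind (he does not name it); `run_isolated` and the two sample tests are hypothetical. Each test runs in its own child process, so a crash or stray write in the test cannot damage the runner.

```c
#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void (*test_fn)(void);

/*
 * Runs one test in a child process. Returns 1 if the child exited
 * normally with status 0, and 0 if it failed, crashed, or could not
 * be started.
 */
int run_isolated(test_fn test)
{
    int status = 0;
    pid_t pid;

    fflush(NULL);               /* avoid duplicating buffered output */
    pid = fork();
    if (pid < 0)
        return 0;               /* could not fork */

    if (pid == 0) {             /* child: run the test, then leave */
        test();
        _exit(EXIT_SUCCESS);
    }

    if (waitpid(pid, &status, 0) < 0)
        return 0;

    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Two sample tests: one that passes and one that dies on a signal. */
void passing_test(void) { }
void crashing_test(void) { abort(); }
```

With these definitions, `run_isolated(passing_test)` reports success and `run_isolated(crashing_test)` reports failure, while the runner itself keeps going.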
There is another unit tester (okay, so I'm forgetting all the names) which is almost the un-unit-tester. It's like a header file with like five lines of C that you include, which actually lets you do it quite easily. So it turns out to be really easy to do, and although I'm kind of unhappy that my brain has quit working, it's actually very easy to find on Google. But what you're looking for is either one which is smart enough and has enough machinery to fork off the test, so that you have the process boundary between the code which might be writing crazy stuff to crazy places and the code which is supposed to notice that it did it, or you probably want one that's really simple, so you can run it on anything that can run C. Okay? You know, I really should have those names on the top of my head. I'm sorry.

Anybody else? So it looks like maybe we're done. Thank you so much for coming. I have cards at the back if you want one for something.

Okay, welcome everyone. Imagine that you're the owner of this website. You spend your weekends doing research for new recipes, engaging with people that come to your blog, posting comments all day, and you really care about it. It's your baby. It's everything you do outside work. However, there is something that worries you a lot. Every now and then you get emails from people saying that you are selling their information to corporations, that every time they go to your website they see advertisements and they don't know where they come from. And this is very puzzling for you, because you know that you don't have any advertisements on your website, and you know that you are not selling any information to anyone. So you start doing some research and bump into this blog post from Google Security that says: if you don't enable HTTPS, we're going to tell everyone that your website is insecure. And you start wondering what the hell HTTPS is. And then you also see comments on Twitter like this, where you discover that ISPs inject
information for their users to pay their bills, or, if you download torrents, your ISP will inject banners in the websites telling you to stop doing that. And it gets worse, because they might also inject advertisements completely unrelated to your website. You might be this site for a university, but a random ISP might inject jewelry advertisements into your website. What the fuck?

So at this point you're completely sure that you want HTTPS for your website. You don't want this to happen to all the nice users that go to check your recipes. You don't want people to see this on your website. So you do more research, and you land on this page. I want you to let the message sink in: "You want your customers to come back? Give us money." That's basically what it says. It sounds like a ransom to me. So you're like, I'm not going to pay a company for something I don't even understand. So you keep doing some more research, and you arrive at this other page, where it says "CDN, continuous deployment, and one-click HTTPS." And someone told you that this place gives you HTTPS for free. So it's like, oh, this is pretty nice. So I can have it for free, and it says that it's just one click, and besides that, Netlify has a built-in CDN, so my website is going to be faster. That sounds pretty cool. So you sign up, and when you try to configure HTTPS, you arrive at this page where it says "free on all Netlify plans." Just click on that button at the bottom and that's it. And this, for many people, sounds like magic. It's like, all right, I don't understand what it is, but I just clicked the button and it worked. So that's pretty cool.

And today I'm here to actually tell you that it's not magic. I'm here to tell you how it works. So at Netlify we use this certificate authority called Let's Encrypt. And Let's Encrypt was donated to the Linux Foundation. So it's free.
It's automated, and it's open. It's free in that you can get a certificate for your website, or for anything you want, for free. It's automated in the sense that it's basically an API. It doesn't even have a user interface. You cannot go to Let's Encrypt and download a traditional certificate file or anything. It's just an API. And it's open, so you can contribute. You can see the code. You can help them. And for many people this is a game changer, the fact that it's free. Anyone can get a certificate for free. And what is actually a game changer is that it's automated, besides being free. We have seen in the last year how platforms took advantage of this. People don't need to buy certificates anymore, and people don't really need to know what a certificate is, because at the end of the day they only want their websites to be secure. So platforms like WordPress.com just made all their websites secure for everyone that hosted with them. They didn't have to ask people to go to a third party, pay them for a bunch of files that they don't understand, and so on. They just integrated with Let's Encrypt and made all their websites secure. And we saw that again with Squarespace. They did exactly the same. They wanted to make all their websites secure, so they just did. Their users didn't have to care how the connections between Squarespace and the people that go to visit their websites are kept secure.
They just need to know that it's secure. And Etsy wrote a blog post in the same way, too: this is how we made all the websites for our customers secure, for free, without even asking them about anything. We just did it. And this is really the game changer.

At Netlify, we don't do this by default yet. It's our goal to make it the default this year. But so far, just by having a one-click button, we have seen that more than 60 percent of the people that put websites with custom domains with us have secure connections. And again, the goal this year is just to make everything secure, because there is really no point in continuing to serve websites in an insecure way.

Let's Encrypt is based on a draft standard called ACME, Automatic Certificate Management Environment. It's still a draft, so it changes every day. It's also open: the draft itself is being discussed on GitHub, so you can go there, see the draft, contribute to the draft, and fix everything you want to fix. There are a lot of typos in the draft. There are a lot of fixes for typos.

So I'm going to explain to you how ACME works. In essence, ACME is basically like a traditional certificate authority, but again, it's API-driven. There is also an open source implementation of ACME called Boulder.
It was also donated by Let's Encrypt. So if you want to see how the standard is being developed, you can see it, and you can also contribute to it. There are people from the Linux Foundation contributing, people from the EFF contributing. There are a lot of people involved in it. We're going to talk a little bit about the way that ACME transports messages. The main characteristic is that every request sent to the ACME authority is signed with a JSON Web Signature. That basically means that every request sends its content in a JSON payload which includes a protected header with information that we'll see later, an unprotected header with information that can be exchanged in an insecure way, the payload that you actually want to send, and a signature, so that the server can verify that the request is the original request and that it wasn't tampered with. It has replay protection. For every request, the server generates a nonce, and it keeps in memory all the nonces that it sends to the client. That way you can guarantee that an attacker cannot replay previous requests. All of this is inside the draft, so by default ACME is very focused on making this secure, too. It has request URI integrity. Inside the protected header, the client also sends the URI where it's supposed to send the request, so the server can verify that there are no people in the middle tampering with those requests. If the URL inside the protected header matches the URI in the request, then that's a valid request.
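The replay protection and request URI checks just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `NonceStore` and `accept_request` are hypothetical names, the store is a plain in-memory set, and signature verification is left out entirely. It is not how Boulder implements these checks.

```python
import secrets


class NonceStore:
    """Hypothetical in-memory record of nonces the server has handed out."""

    def __init__(self):
        self._issued = set()

    def issue(self):
        # Generate a fresh, unpredictable nonce and remember it.
        nonce = secrets.token_urlsafe(16)
        self._issued.add(nonce)
        return nonce

    def consume(self, nonce):
        # A nonce is accepted once; a second use looks like a replay.
        if nonce in self._issued:
            self._issued.remove(nonce)
            return True
        return False


def accept_request(protected_header, request_url, nonces):
    """Apply the nonce and URL checks (signature verification is omitted)."""
    if not nonces.consume(protected_header.get("nonce")):
        return False
    # The URL inside the signed header must match where the request arrived.
    return protected_header.get("url") == request_url
```

A request that carries an unknown or already used nonce, or whose signed URL differs from the URL it was actually sent to, is rejected by this sketch.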
Otherwise, the server just denies that request. And we're going to see a little bit about how the certificate management works. The first thing that you need to do to start issuing certificates is to sign up. So you need to create an account, and you just need to send a request to the API saying that you agree with the terms of service, and give a contact email, like with any traditional authority. Once you do that, you're done. You have an account with them and you can start issuing certificates. When you want to issue your first certificate, you basically send a request saying, I want a certificate for this domain. The domain is given as an identifier of type dns. We'll see what that means later. But you basically give the domain name, and the authority validates that. Now we'll see how it actually verifies that and how it issues the certificate. This is what's called the authorization, and the response is normally pending the first time you send the request. The authority has a bunch of queues where it processes millions of certificates, basically, so it's not a synchronous process. The first time that you send an authorization, the request goes to the queue and is stored as pending, right? It also gives you these three challenges. A challenge basically means: okay, you gave me a domain, now I challenge you to prove that you actually control that domain. There are three types of challenges: HTTP-01, TLS-SNI-02, and DNS-01. The HTTP-01 challenge means that, in order to prove that you control that domain, you have to provision a file at a path, and the authority will send a request to that path. If what's in that path matches
the SHA of the key authorization that you sent, then they assume that you control the domain. The DNS challenge is very similar, but it uses a DNS record. If you want to prove that you control the domain, you just create a TXT record under that domain, and again, if what's in that record matches the SHA of the key authorization, then that's valid, so you're good. And then TLS-SNI-02, which is again very similar, but it uses TLS negotiation. The authority will try to request a certificate from you using just a simple ClientHello, and if the certificate that it receives includes the key authorization, then you're good. If you're wondering why HTTP-01, DNS-01, and TLS-SNI-02, it's because the standard is designed to let those challenges evolve. The first version of the TLS challenge was TLS-SNI-01, obviously, but it was proven to be faulty, so they improved it and bumped the version to TLS-SNI-02, and the version that's in the standard now is TLS-SNI-02. However, Let's Encrypt still doesn't support TLS-SNI-02. It's still a work in progress that's going to be released as soon as the draft is a little more stable. Once you have proven that you have access to the domain and you control it, you can send them a certificate request. Basically, what you get back is a bunch of links to get certificate information and also the final certificate. When you have the certificate, you can just do whatever you want with it. You put it on your website, or use it in whatever way you want. Obviously, you don't have to do this manually or write all the code yourself. There are already a lot of libraries for this. The most well known is Certbot.
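Going back to the challenges for a moment, here is a rough Python sketch of how the values for HTTP-01 and DNS-01 are derived. It follows the ACME specification as later published (RFC 8555) and the JWK thumbprint rules of RFC 7638, which may differ in detail from the draft discussed in the talk. In that specification the HTTP-01 file holds the key authorization itself, while the DNS-01 TXT record holds its SHA-256 digest. The function names and the sample key values used with them are hypothetical.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    # URL-safe base64 without padding, as used throughout JOSE and ACME.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    # RFC 7638: hash only the required members, sorted, with no whitespace.
    required = {"RSA": ("e", "kty", "n"), "EC": ("crv", "kty", "x", "y")}[jwk["kty"]]
    canonical = json.dumps(
        {name: jwk[name] for name in required},
        sort_keys=True,
        separators=(",", ":"),
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, account_jwk: dict) -> str:
    # The challenge token joined to the thumbprint of the account key.
    return f"{token}.{jwk_thumbprint(account_jwk)}"


def http01_path(token: str) -> str:
    # Path the authority fetches; its body is the key authorization.
    return f"/.well-known/acme-challenge/{token}"


def dns01_record(domain: str, token: str, account_jwk: dict):
    # TXT record name and value: the SHA-256 digest of the key authorization.
    digest = hashlib.sha256(key_authorization(token, account_jwk).encode("ascii")).digest()
    return f"_acme-challenge.{domain}", b64url(digest)
```

In practice a client such as Certbot computes these values for you; the sketch is only meant to make the "key authorization" wording concrete.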
It's basically a command line application. It was donated by the EFF, and you just run Certbot on your machine and it does all the negotiation for you. It verifies the challenges and it downloads the certificate. Besides that, I think depending on your system it uses cron to schedule the renewal of the certificate, and so on, so you don't actually have to do anything. There are a lot of other options. I think at this point there are libraries in every known programming language. If there is none, it's probably because the language is not very well known. So I'm going to talk a little bit about how we provision certificates. There are three main points, basically. The first thing to explain is how we serve certificates for users when they go to visit a website. Basically, when the browser sends the ClientHello request to our servers, it reaches our CDN network.
We use Apache Traffic Server as a proxy cache, and it has a specialized plugin. When it gets the ClientHello request from any user, it goes to the origin servers, and if we have the certificate, we just give it back to the edge nodes. The edge nodes do all the negotiation with the client, and that's it, and then we start serving the website. If we get another request from another user to the same node, the node will directly return the certificate and everything is good. This is the happy path interaction. There are unhappy paths that we'll talk about a little bit later. The second part of this is when a user goes and says, yes, I want a certificate for my website, and clicks that button. This works in the way that I explained before. When our API gets the request, it will enqueue a job. Then we'll contact the authority, we'll issue an authorization for the domain, we'll get the challenge, and then we'll wait until the authorization is completed. If it's completed, then we download the certificate and we store it encrypted and secure for the next time someone tries to reach the website, and then we go and serve it. The way that we validate challenges currently is using HTTP-01, because at the end of the day it's as secure as TLS-SNI-01, so it doesn't really matter that much. And since we control the origin servers, we can just serve that path with the authorization content, and it's a fairly easy integration. So now I'm going to talk a little bit about what we've learned, and also the unhappy paths. Nobody knows how DNS works. And this is also kind of recurring in this talk, because if a person just wants to put content online, why do they need to know how DNS works? They just want to put their content online. Why do they need to know how servers communicate with each other?
And the problem is that if you want to host your website, and you have your name, and you give all the content to someone else, you actually have to point your name to their servers. So you need to know how DNS works. Let's Encrypt obviously uses DNS to resolve where the domain lives, and if the DNS is not completely propagated, they won't find the right server. At the moment we actually do a very bad job at telling people that their DNS is not propagated correctly. If you noticed, the screen I showed at the beginning with the button to provision the certificate is just a button, so you can click it at any moment, right? You can change your domain and click it, and obviously the DNS is not propagated at that point. So we're going to try to provision a certificate for a domain whose DNS points somewhere else, and then nothing is going to work, right? We're going to get stale certificates, or we're not going to get a certificate at all, and all those kinds of weird corner cases. The second problem is that Let's Encrypt has rate limits. They're actually fair rate limits, to avoid abuse. Basically they allow you to request 20 certificates per week, and each certificate can include a hundred names, so in a single certificate you can add a lot of names. The problem is that Netlify integrates with Git in a way where, let's say you have example.com and it lives in your Git repository, if you create a branch called staging, we're going to provision the contents of that branch on our servers, and we're going to allow staging.example.com to serve the contents of the branch. And if you have a certificate issued with us, we'll provision the certificate for that branch as well. This sounds pretty cool, but if you work a lot with branches, we'll end up trying to issue a lot of certificates, which can reach the limit pretty fast. And then besides that, Let's Encrypt keeps
track of the certificates that you request per week. So for instance, if I create my branch staging, I request the certificate with staging in it, and then I remove the branch, I'll request another certificate for the previous state of the certificate, without the branch name. That will count as a duplicate certificate, because I already asked for the same certificate with the same names. And again, that limit becomes actually not that fair when we request a lot of certificates. No, I don't know what happened there. All right. And third, obviously, the network is never reliable. We're dealing with a distributed system, so we can send requests to Let's Encrypt and they might never come back, or we might try to send a request and the request never arrives there. We're not doing the best work we could at actually keeping track of those kinds of things. And now, how we're going to make all this better. Obviously, you might already be thinking of ways to solve all these problems. The first one is that we're going to expose DNS information to our users, so before they can click that button, we're going to tell them: well, the DNS is not propagated, so it's not very useful for you to click right now. We'll let you know when the DNS is correct and you can actually issue the certificate. We're going to try to batch the requests that we send to Let's Encrypt, and we're actually already keeping track of the limits to make sure that we don't run into those limits and we don't get penalized. We've already implemented part of this, but we're going to have better delivery guarantees. Let's Encrypt gives you a unique URL per resource that you ask for, so if you try to create an authorization, that'll give you a unique URL.
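The batching idea mentioned above can be illustrated with a small Python sketch. It uses the figures quoted in the talk (100 names per certificate, 20 certificates per week) as defaults; the function names are hypothetical, and this is not a description of Netlify's actual system.

```python
def batch_names(names, max_names_per_cert=100):
    """Group domain names so each certificate request stays under the name limit."""
    # Deduplicate and sort so the same input always yields the same batches.
    unique = sorted(set(names))
    return [
        unique[start:start + max_names_per_cert]
        for start in range(0, len(unique), max_names_per_cert)
    ]


def within_weekly_limit(issued_this_week, batches, weekly_limit=20):
    """Check whether requesting every batch would stay within the weekly quota."""
    return issued_this_week + len(batches) <= weekly_limit
```

For example, 250 branch subdomains would fit into three certificate requests rather than 250, which is the kind of saving that keeps a busy repository under the weekly limit.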
We keep track of those kinds of things so we can replay those requests without actually trying to game the system. And finally, we're going to make it open source. The new system that we've designed is completely independent from Netlify, so if another company wants to start issuing certificates with Let's Encrypt, like WordPress or Squarespace or Etsy did, they wouldn't have to implement the whole system again. All in all, HTTPS has reached a point where there is no going back, and the sooner we can make the web 100 percent secure, the better for everyone. Mozilla has seen on their servers that more than 50% of the traffic goes through secure connections already. This graph is actually flawed, because it includes pages like Facebook, Gmail, Google, Twitter. This doesn't mean that more than 50% of the web is already encrypted. It only means that the pages that people visit the most are encrypted. And that's it. Thank you for listening. We have time for questions. All right. Yep. Yeah, the question is how we tackle browsers. At the moment, the major browsers already support their certificates, so there are no issues there. Only old browsers have issues, and we actually don't do anything there, really. The traffic that we see comes mostly from up-to-date browsers, so there's nothing that we can really do there. It needs to be a very old browser, and I don't think we have ever seen any issue from people saying that the certificate that we tried to serve is invalid or anything like that. They just reject your requests. Also, depending on the limit that you hit, that domain can be banned for a week. So we also talk with them a lot. So the question is, what's the life of the certificate? That's still true.
It only lasts for three months, so you have to constantly keep renewing it. It's actually more secure, because we revoke a lot of certificates. We see people that stop updating their websites, or even remove their websites, and we just kill the certificate as soon as they do this. But in the end, for us it really doesn't matter, because we have a scheduled task that every day takes the certificates that are going to expire in less than 30 days and tries to renew them. And for users it really doesn't matter, because they don't have to do anything. And again, we've never had an issue with renewals or anything like that. It's just that you have to do it every three months instead of every two years or whatever. Yep. Do I have any experience with Certbot and the way it works? As a user, I've used it. I don't know the internals well enough to tell you if it really works or not. I know it works for Apache or nginx. It really works. It's very easy because their configurations are fairly standard. I know, for instance, it doesn't support Apache Traffic Server. The way that Certbot works is, if you have nginx on your server, it will configure nginx to serve the challenges, and obviously it does a lot of things to make sure that Let's Encrypt can actually reach your server, right? Yeah. Yeah, the configuration for things like Apache or nginx is fairly standard, so it's nothing really complicated, and it's also open source, so you can just go there and see. The question is, do I have to do the three challenges or just one?
You have to do the one that you pick. The authority sends you the challenges and you pick one. Then you tell them the one that you picked. You tell them, use HTTP, and that's what they use. Yeah. The question is, is there a single CA on the back end? No, Let's Encrypt certificates are also signed by a third party. So it's not just their certificates. Their certificates are signed by someone else. The question is, is there a path for certificates without a public network, right? There are ways to do it. There are different ways to do it. For instance, Docker Swarm issues certificates with Let's Encrypt for their clusters, and they can be private clusters. It depends also on your configuration, but yes, there are paths to do it. The question is, is ACME capable of handling those pathways? It really depends on your setup. At the end of the day you have to expose something publicly so ACME can actually reach it and validate the challenge. But you don't actually have to expose your whole network. Normally, for those kinds of things, I think DNS might work better, because you don't actually have to expose servers. You just have to configure a DNS entry in your DNS records, and then you remove it. Another thing is that you don't have to leave the challenge there forever. If you validate with DNS, you can just put the record there, and as soon as the authority validates the record, you remove it. Actually leaving it there would probably be a security issue for you. And all the clients already handle that. If a client uses the DNS challenge, the way that they do it is you probably give them an API key for your DNS provider, and they will create the record, and as soon as the challenge is validated, they will remove the record. Is there any other authority that uses ACME besides Let's Encrypt?
Not as far as I know. Yes. Yes, you can. Yes. Actually, Boulder is very easy to run, right? They did a very good job. Well, if you don't want to run it as a production system serving millions of certificates, it's very easy to run. All right. All right. Thank you very much. Test, test. Can you hear me okay? Is that good? All right. Test, test. All right. Welcome. Can everybody hear me okay? Too loud? Too soft? Good. All right. Well, welcome to Debugging Hung Python Processes with GDB. My name is Brian Bouterse, and before we get into the talk, I just want to take a photo from up here, if I can just get everybody to smile and look like you're enjoying yourselves. So, thanks for coming, and thanks to all the organizers and volunteers who've helped to put SCaLE together. This is a great event. It's actually my first time out here, and I've really been impressed with the diversity and the different tracks that we've had here. So big thanks to everybody who puts this together. So, debugging Python processes with GDB. This is something I had to learn out of necessity. I did not seek out this topic. It found me, because one day I needed it. And I hope you never need this, but if you do, you can use these tools and techniques to get some insight into what your Python code is doing. There are a couple of different situations that you would want to do this in, and we'll go through those. But the thing I want to highlight here is that I'm no expert. I'm a Python developer. I've been working with Python for a long time, but I kind of just had to learn these techniques, so don't be afraid to try them and to experiment with them and play with them. We're going to have some really simple examples today. This QR code here does link you to the slides.
They will also be available on the page for this talk later today or tomorrow, and the same QR code is here at the end so that you can find the content later. So with that, let's go ahead and dive in. A little bit about me before we move right into the content. So I'm Brian Bouterse. I've been a Python user since about 2005. It's the language that I work in. It's the language that I play in. I really love using Python. Even more than Python, though, I love free and open source software, and I think that's just something that's shared by so many people here, so I won't tell you too much about my love for it, because I feel like you probably share that. I've been a principal software engineer with Red Hat since 2015, and since I've been with Red Hat, I've been working on this project called Pulp, as you can see on my shirt here, which is a software repository management project, and you can learn more about that at pulpproject.org. I won't tell you too much more about that, but I'll encourage you to go check the project out, and if you have any questions about it, I'm happy to answer them for you. I contribute to several open source projects, Kombu and Celery. Those are messaging libraries and applications in the Python community. So if you have asynchronous work or long-running jobs that you want to get done in a scalable and organized way, Celery, for instance, is a great option to look at. It's very popular, so probably many of you have already heard about it. So I'm a contributor to both of those projects, and of course a contributor to Pulp. So, enough about me. Why would you ever want to debug Python software with GDB? I mean, seriously. Debugging is hard enough. GDB probably doesn't make it any easier.
I mean, when you think about it. I've had some C experience, but I grew up with interpreted languages, and really got interested in programming when they had already taken hold. So I don't come from a C background. I come from a Python background, and that's maybe why I don't feel like a deep expert in this area. But there are some very good reasons why you might want to debug Python software with GDB. It's things like this. So let me do a couple of shows of hands here, a couple of questions. First, who has ever used GDB to debug Python before? Raise your hand. Oh, wonderful. All right, great. And I'm sorry. So who here has ever experienced an issue where somebody calls you up, or messages you on IRC, or sends you an email, and says, look, I've got this problem, I'm experiencing it right now, it's in the production environment, can you please help me? And then you say, yes, I will try, I'm going to try to reproduce it back in my developer environment. And you cannot reproduce it. You try hard, you ask your friends and you ask others and you get more information, and you still can't do it, because there are just some situations where you cannot effectively reproduce a bug back in your developer environment. So what are some reasons for this?
It's a very rarely occurring issue. In my case, the bug that I was chasing for quite a long time was occurring very, very rarely, maybe only once a month on systems that use Pulp in high volume. Secondly, it wasn't actually in our code. As it turns out, it was in one of the libraries that we were using, so our deep knowledge about how our application is written and works was not helpful to us. Also, the symptom was terrible. It's rarely occurring, but it stopped your Pulp system pretty much cold. It was a deadlocking issue, and when you experienced it, your system would be in a state where you knew it had occurred: no more tasks would process. The only thing you really could do was restart some of your processes, and then everything's fine again. So some people see something weird and they just restart the processes and keep going. But as a developer, I need to make sure that this is working great all the time. So cases where you have a defect and it's something that occurs rarely, or remotely, for instance, are situations where you would want to use this. Talking about the remote aspect of things for a moment: there are things like rpdb for remote debugging, right? But perhaps you can't connect to this other system where the application is running, because you just can't reach it, because it's firewalled off or it's behind some networks that you can't get into, for whatever reason. So remote debugging on that live system is also not an option. These are some of the reasons why you might want to use GDB to debug this, because GDB is a great tool for getting insight into what is going on in this Python application in this production environment. So, ah, yes. You can use it where pdb can't go. You can use it in remote applications where rpdb cannot go. For rarely occurring issues, that was the case that I was
experiencing. Also, I couldn't connect to these other places, so rpdb was not an option. And also deadlocking issues, places where you're just not sure if it's doing anything. Like, is this code even doing anything? You run the application in Python, and it's hard to get insight into what is going on inside this application right now. That's the kind of problem domain that this technique is very useful in. And like I said, I hope you never need it, because you should always be trying to use pdb and rpdb first. So, a simple poll here: who is debugging with pdb, for instance? I expected to see a huge amount of hands up. If you're not using pdb, you should be, or some other interactive debugger. If you're writing print statements and things like that and running your application, really consider switching, because pdb is just an excellent tool. So that's tool choice number one. All right. Use pdb. Say it with me. No. So who's using rpdb? rpdb, which is a remote version of pdb? Okay. So that's really great. It's just like pdb, but you can connect to a remote system, and the way rpdb works is... Actually, I'm going to show you real briefly. This is a talk on GDB, but I just want to show these real simply. So yeah, pdb. This is what you should be using. It's part of the standard library. It's the Python debugger. So please use it. rpdb is a package you can find on PyPI, and what you end up doing with rpdb is you put this line in your code, and when this line is reached, it halts your program and opens up a network listener, by default on port 4444. Then you use something like telnet to connect to that port, and you can interactively debug the whole stack. You can go up Python frames and down Python frames, check out the local state, look at the stack, and look at the variables that are defined. So this is really great. But again, if you have a rarely occurring issue and it's in production, maybe you can't reach it.
You can't do this. And secondly, you have to put the lines of code in there, right? So you stop the application and insert the lines of code, which you probably can't do on the production system anyway. But even if you could, when you start it back up again, if it's a rarely occurring issue like mine, it does you no good. Oh look, it runs great. So this is just a non-starter for debugging those kinds of rarely occurring issues. But use these tools. These are the preferred tools. Do not do the GDB stuff. I mean, you can, for fun. I think it's fun. So, the conceptual model here is that we're going to use GDB, which, I should have said this from the start, is the GNU Debugger, and we're going to use it to debug CPython. So a little bit about these boxes. The GNU Debugger is the debugger for debugging compiled languages of a variety of types. Most people just think of it as the thing you use to debug C with, and that is probably its most common use case, but you can use it to debug a variety of compiled languages. And as it turns out, CPython, which is the Python interpreter, is written in C. There are other interpreters out there. PyPy is a different one. It's just a different implementation that can read the Python code and provide the guarantees that the Python spec requires to make the language work correctly. So there are different interpreters, if you haven't thought much about that. There are others, too. But CPython is the reference implementation, and if you're just running Python on a system and you're not really thinking about what interpreter you're using, you're using CPython. So the idea here is that we're going to use the GNU Debugger, GDB, to connect to CPython, the interpreter, and we're going to, in a way, do what a CPython developer would do, right?
We're going to use a C-based tool to inspect the state of the interpreter itself, and through this we can learn all sorts of interesting, useful information about what your application is doing. You can see the box here, and I'll just describe it so everyone can understand. Above the CPython box is a Python code box, which is your application itself, and your application code is being fed into and interpreted by CPython itself. So that's the conceptual model. Any questions here before we move on? Okay, so this is a very simple example. This is example.py, and just to describe it a little bit: it imports os and time, and it defines two functions. One is foo, which displays the PID that the process is running as, and then foo calls bar, and all bar does is sleep for 30 seconds, to give me enough time to connect to it and show you the awesome. At the module level, the function foo is called, so just by running python example.py, it has this two-function call example, which then halts for 30 seconds waiting for me to show you something. So let's talk a little bit about GDB basics. You can start a program under GDB by running gdb with --args and then the path to the program, and then any arguments after that. When you run a Python program, you're generally using /usr/bin/python. When you type python, that's the binary that you're using, and so you can launch your program with GDB.
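Before going further into the GDB commands, here is roughly what example.py looks like, reconstructed from the description above. The names foo and bar, the imports, and the 30 second sleep come from the talk. The `seconds` parameter, the exact print message, and the `__main__` guard are assumptions added so the file can be imported without sleeping; the original calls foo directly at module level.

```python
import os
import time


def bar(seconds=30):
    # Sleep long enough to attach a debugger from another terminal.
    time.sleep(seconds)


def foo(seconds=30):
    # Show the PID so it can be handed to gdb with the -p option.
    print("my pid is", os.getpid())
    bar(seconds)


if __name__ == "__main__":
    # Running `python example.py` prints the PID and then sleeps inside bar().
    foo()
```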
You can say gdb --args /usr/bin/python, and then the first argument is the module that is the entry point for your application. So you can use GDB effectively to just start your application, at which point it's already connected to the CPython interpreter. I very rarely do that, though. The thing I do all the time is connect to a running process by its PID. gdb takes this -p option, and you specify the PID as its argument. This lets you start your application the way you normally do, and as it's running you can just say, oh, I want to see what it's doing right now, let me connect to it with the -p option and the PID. As soon as you connect to it, GDB halts the execution. This is the way that GDB works, like most debuggers. It halts it, and you can continue execution just by hitting c for continue. Then you can stop it again later with Ctrl-C and just see where you end up. You can then detach GDB from the CPython interpreter using Ctrl-D, and that's just disconnecting this link that you see back here. So you connect to it, and then you can detach, which will disconnect it, and your application just continues on as it did. So those are some basics. Let's have a little demo. How's the size? Is that good for everyone? So what I'm going to do is get into the right directory. Okay, so this is that same application that I described and showed a minute ago, which is example.py. It's the same code. So what I'm going to do is run example.py, and it shows me its PID, which is 8506. It's waiting for 30 seconds.
It's already in that sleep statement I'm then going to use gdb and the dash p option and i'm going to give it the 8506 call here and at this point we've made the connection between gdb and the c python interpreter And what you can see on the screen here is a whole bunch of um text output and Gdb has a lot of output again like i'm not an expert. I don't understand every little bit of it But what you're looking at mostly here is gdb reading these debug libraries So let's just take a minute and talk about compile languages and debugging them with gdb Uh when you compile your language you end up or when you compile your application in something like c You end up with a binary compiled version right and you can run that binary compiled version um What you'll also get with certain compile compilation options is a debug info package or um Asset artifact that also comes out of compilation at compilation time And this debug info library is basically a map that uh artifact that can be used by gdb to uh connect to the binary that you just compiled and uh have Important information about the compiled binary that you're inspecting so for instance You compile a function call and that function has a name right The binary doesn't store that name doesn't care what that function's name is it's a function And it makes sure that every time it's called it points to the same place right so if you're debugging it with uh Just without the assistance of a debug info library to help support your debugging You will have a very tough time getting any meaningful information because when you think about your your C code It You think about it in terms of the the source code itself and there are these function names and there are variable names But if you don't use this debug info library and you just uh try to debug it Just straight up with gdb without the debug info Then uh you're you're gonna get Not nonsense like you'll be able to see the values of variables for instance because the values 
are definitely there, and if you're really skilled at the art you can get a good understanding of it, but you won't get these nice pretty things, like names that match up with your original source code. So what you see here on the screen is gdb reading these debuginfo libraries that I've installed. When Python is compiled for your distro, or wherever you got your Python from, somewhere there are debuginfo libraries available for you. I'm running Fedora 24 here in a Vagrant VM, and Fedora, for instance, provides Python packages out of its normal repositories, which people are probably familiar with. There are other repositories that contain nothing but debuginfo libraries. They basically put all of those over there, and what you need to do is enable them, which is very simple. When you enable the debuginfo repository in Fedora, you can then dnf install something-something debuginfo, and boom, you'll have the debuginfo libraries available on your system. gdb will automatically find them. gdb knows the binary name and a very strict version number, and you need an exact match of the debuginfo library, because with any small change that debuginfo might not be appropriate for it. It sounds a little complex, at least it did to me when I first started, but that's the gist of it. You can install these quite easily. No matter what distro you're on, debuginfo libraries are somewhere, and you can figure out how to install them. Once they're installed, gdb just discovers them, and it knows which ones it needs because that information is packed into the binary. So when I'm debugging Python, since we just connected to this example.py process, it knows:
Oh, I need to read libdl, I need to read libutil, libm, libc, ld-linux, anyway, a lot of things. At the bottom you can see a couple of things. There are these lines here that say thread debugging using libthread_db enabled. That means it's threading aware. A little further up you can see the preamble for gdb. I'll show you the threading stuff later in the second example, which actually has a bunch of threads in it. At the bottom you can see this cursor, which is the gdb prompt, and this is how you get to interact with the source code.

Now, I'm using screen here, so I'm going to switch to the original place where I ran my code. Notice it's halted. Well more than 30 seconds have passed, and yet it's still waiting here, and that's because we've halted the interpreter and it's waiting for my input from gdb. One of the first things you'll do in gdb is look at the full backtrace. That's generally the first thing you want to do, and you do this with the bt command here. This is a backtrace, or as I'll put it, a C backtrace.
Frame zero is at the top, and that is actually the deepest call. If you're using pdb or rpdb, the order of the stack frames is reversed. With pdb and rpdb, the deepest call, the moment where you've halted execution deep down in your function calls, is at the bottom of the output, but here in C land it's reversed. You can see at the top that frame zero is a call into select. select is a kernel facility that allows you to efficiently wait for a certain amount of time without polling, and the kernel has efficient implementations to wake up when that select has completed. So this traces the call all the way down to the kernel, and you can see that call there at frame zero. This is a lot to look at, so rather than try to explain all of it right here, I pulled out snippets, and I'll highlight two cool things you can see from this output, and then we'll move on to some more practical usage of gdb.

This is a little snippet from that same output you just saw, and it is a single function call in CPython. These three frames, eight, nine, and ten, represent one Python call. In its internal implementation, CPython has these frames that it thinks of, and a frame is basically a single depth of the normal Python call stack. When you think about making a function call, internally CPython thinks of it as a frame, and that frame stores all the code associated with it and all the arguments that have come into it. It calls fast_function, which is part of the internal CPython implementation, and it calls call_function, which is another C-based function inside of CPython, and then it calls this PyEval_EvalFrameEx, which actually does the evaluation. And you can already see some interesting things here.
For instance, it says in frame 10 that that point in the Python code is at line 14 of my example.py. This is some of the raw information you can get out of here, but as you'll see in a little bit, there are much better ways to do this. I wanted to explain the internals, though, and this is actually a really cool way to learn more about CPython. If you're into that kind of thing, you can use gdb to basically ask, what is CPython doing when I do this, or when I do that? Like I highlighted a little earlier, this is a call all the way into the kernel, and gdb does not debug past that point. I mean, you can use gdb to debug the kernel, but that's a different topic altogether. So frame zero is usually going to be a function call into the kernel. You might happen to catch it while it's just running in user space, at which point frame zero would not be, but you have a pretty good chance that frame zero is a call into the kernel. So that is a look at the upper end of the call stack with bt.

Of course, at this point, the first time I ever experienced these things, I ran bt on the application I was debugging and I was like, oh my gosh, this looks crazy complicated. How is this ever going to be useful to me?
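The frame idea described above can also be seen from inside Python itself. The following is a small sketch, not anything shown in the talk: it walks the Python-level frames with sys._getframe (a CPython-specific call) and collects, for each one, the function name, the current line number, and the names of its locals. The function names inner and outer and the argument interval are made up for illustration.

```python
import sys


def python_stack():
    """Return (function name, line number, local names) per frame, innermost first."""
    frame = sys._getframe(1)  # start at the caller's frame, skipping this helper
    result = []
    while frame is not None:
        result.append(
            (frame.f_code.co_name, frame.f_lineno, sorted(frame.f_locals))
        )
        frame = frame.f_back  # step outward to the calling frame
    return result


def inner(interval):
    # Hypothetical innermost function with one local, `interval`.
    return python_stack()


def outer():
    return inner(10)
```

Each tuple corresponds to one Python call, which is the level of detail a Python-aware backtrace works at, as opposed to the several C frames per call in the bt output.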
And that's when I learned about these things called Python extensions for gdb. So thanks, David Malcolm, for contributing these to upstream Python. These landed a while ago, as in years. Even if you're on an operating system that's been out for a long time, like RHEL 6, the version of Python you're running has been compiled with these extensions. It's these extensions and the debuginfo libraries produced at compilation time that work together to make this debug experience practical, simple, and very useful. So thanks, David Malcolm. At the end of the slides there's a link to the ticket in Python's issue tracker, which has all the discussion and shows the commits that David Malcolm and others made to bring this about. So if you're interested in the tooling behind what you're going to see here, there's a link at the end of the slides.

So here are some of the commands that these Python extensions for gdb provide. We'll go through each of them quickly, and then we're going to see some nice demos of them in action. What each of these does is let us not think so much about the C-based implementation, even though we're connecting to CPython and using gdb. It lets us instead just think about it as Python. For instance, if you're using pdb and you show the call stack, you're going to see nice clear Python code in that call stack, right?
Well, guess what: if you use py-bt, you're going to see nice Python code too. It takes, for example, those three C frames from the C call stack and collapses them into a single one, it pulls out all the information that's meaningful, and it literally shows you the original Python source. That's what py-bt does. py-list will show you the source. That's like the list command in pdb, or l as I typically use, and it literally shows you the original Python source even though you're doing it through gdb. That is quite a feat. This is just such a great tool. py-locals will show you all the locals, so all the variables that are allocated in the current Python frame. And then py-up and py-down let you go up and down the call stack, and not the C call stack: you go up and down the Python call stack even though you're doing it in gdb, which I think is still quite a magic trick. So we're going to look at these in action.

py-list, for instance. Let's go back to our little demo here. That is part of the debugging; it comes with the debuginfo libraries when you install them for Python. If you're going to do any meaningful debugging of Python, you're going to want to install at least the minimum set of those. When you start gdb, it will tell you at the bottom, I'm trying to find these debuginfo libraries, because it knows the name and the version from the binary, and if it can't find them it's going to say, I can't find these. Different distributions do it slightly differently, but all of them will tell you exactly which ones you need to install, which is very helpful.
On Fedora, for instance, it will literally say, here's the command you run to install it. Once you install those, you will have all of these tools: the ability for gdb to more meaningfully inspect the CPython binary, and this extra tooling that comes around it to better interpret those stacks. The package name, generally they're called debuginfo libraries. Yeah, there's nothing additional. It comes with the debuginfo libraries, because that's part of the Python compilation. Does that make sense? gdb does not natively do it. If you look at David Malcolm's contribution to Python, 100% of the functionality he created for this was committed to Python. There was not a direct contribution to the gdb project itself. It's for that reason that I know that these tools we'll see come with the debuginfo libraries, and you're receiving them because those debuginfo libraries were produced from a compilation of Python that came after, and includes, David Malcolm's work. Yes, that is exactly correct.

Just to unpack that a little bit, this is what I used. This single line will enable your debuginfo repository and install the correct debuginfo for you, and this command was literally shown to me by gdb. Sometimes it has a hard time finding debuginfo libraries, and we'll come back to that at the end. It's a little gotcha. But even though it's a stumbling point that I'll talk more about later, generally you connect to your application with gdb, you watch what it tells you at the bottom, you try to do it.
You connect again, you do what it says again, and you just iterate as far as you can go with that. Your mileage may vary, but it's always worked out for me, and I suspect it will for you too. And also, if you have questions, I like that you asked that. If something is confusing, feel free to shout it out.

So you hit enter and you go through these CPython frames here, and you can see there are 19 of them. The other interesting thing is that you can see how Python itself starts up. There's a lot of preamble here in terms of how Python begins to execute, so you can use this technique to learn a lot about how Python works, which is way cool. But what I wanted to show you was this: py-list. Good, I caught it before it left. Thank you. This is super useful, because now you can take your running compiled version, where you have no idea what it's doing, and connect to it, and if you have your original source available, you can use this. And py-list isn't really doing that much, right? It's looking at the CPython stack, it's looking at your source code, it's figuring out where the instruction pointer is, and it's showing you that, which is marked there at line five with the caret. You can also do some other things. Ctrl-L? Yes, thank you so much, that's an excellent suggestion. So this is our py-list output. Let's play around with some of these other commands. Here's a py-bt. This is very much akin to the call stack you would see output from pdb or rpdb. You can see here that there are three Python frames, even though the call stack from the C backtrace was like 19 frames long.
This is just a nice simple three. That's the power of these tools and how they can help you quickly get insight into what your application is doing. You can also py-up to go up a frame, and notice it says I'm going up to frame seven. That's in the C stack. Okay, I'm going to go up one more. Now I'm up at frame 10. Remember, I said that a function call in Python is represented by three C frames, so you can see that by going up one Python frame you are going up three C frames. I haven't defined any variables here, so py-locals shows me nothing, but later we're going to see a better demo. Can everyone hear me? Okay, sorry. Later we're going to see an actual application, and the py-locals output there is going to be very interesting. Then, when I'm done, I can continue with the c command, and at this point it's back running, and then I can Ctrl-C again and stop it one more time. This is how you start execution and then arbitrarily stop execution.

Setting breakpoints is of course something you want to do with any debugger, right? But it's hard this way, and the reason it's hard is that the CPython interpreter code is interpreting Python code. It's building frames, it's calling and executing those frames, and it's doing all the memory management and everything else it needs to do. So the same code paths in CPython are being run over and over again as it interprets Python statements, right?
And so setting a breakpoint is tricky, and I don't have a technique for this. You would probably set it at the PyEval_EvalFrameEx call in the C code of CPython, but not only do you have to stop it there, you have to say, oh, I want to stop when this particular Python function is being called, and the only way you can know that is by the arguments being evaluated, or maybe the variable state of the PyEval_EvalFrameEx call, right? All this is to say that what you won't see in this demo are nice breakpoints to set, and that's probably a growth area for the tool set, which is why I linked to it at the end, because you too can contribute to make this happen.

With that, I'm going to disconnect from this with Ctrl-D. I'm going to say yes, I'm done, at which point it will detach, and that program will continue running for whatever time is left of its 30 seconds, and then you will see it exit. So you connect to it, you do some things, you detach, and it continues running just like it always did. That's one of the reasons this is such a great tool for connecting to production systems: it's safe. Assuming you can halt your application, and that's an okay thing to do, you can connect to it, do your information gathering, and then detach, and it just keeps on running, which is way cool. So we saw the py-bt example output here.

So, threads. gdb has great support for working with threads. You can use info threads to show you information about all the threads in your process. The example we look at next has a good number of threads in it. The current thread is marked with the star, and gdb is currently working on exactly
one thread. That way, when you list the bt output or the py-bt output, it's working on the current thread, the one that gdb considers active, not the one that's active at that moment in time. You can list the threads with info threads, you can switch between them with thread and the thread ID, and then you can list bt on that thread. Okay, let me switch to thread three. Let me list py-bt over there, or py-locals over there. Okay, now thread two. A lot of times, when I write little scripts to send to people to gather all this data and send it back to me, I use this thread apply all pattern, which is a normal gdb built-in, and it says apply this command to all the threads. So I'll say thread apply all py-bt, thread apply all py-list, and that's a really quick way to show all the information that's relevant about Python, in that simplified Python output, for the process you're debugging.

Core dumps become really handy with this technique, because you can say to the user who's experiencing the problem, please take a core dump. You can do this while it's running, it's safe to do, and the process will just continue running. It's basically a snapshot from when the core dump was taken. You grab the core by PID, the process ID, and gcore is the utility to do it.
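The data-gathering scripts mentioned above are not shown in the talk. As a rough sketch of the idea, the helper below builds a gdb command line that attaches to a PID in batch mode, runs the thread apply all commands, and exits. The helper name gdb_snapshot_argv is hypothetical; the gdb options used (-p, -batch, -ex) are standard, and the py-bt and py-list commands are only available when the Python gdb extensions and debuginfo are installed.

```python
def gdb_snapshot_argv(pid, gdb="gdb"):
    """Build an argv list that dumps Python-level state for every thread of `pid`."""
    commands = [
        "info threads",
        "thread apply all py-bt",
        "thread apply all py-list",
    ]
    # -batch makes gdb run the given commands and then exit, detaching
    # from the process instead of waiting at the interactive prompt.
    argv = [gdb, "-p", str(pid), "-batch"]
    for command in commands:
        argv += ["-ex", command]
    return argv
```

The resulting list could be passed to subprocess.run with the output redirected to a file that the user sends back. Attaching to another user's process would still need sudo in front of it.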
So you install gcore on the system and you say gcore with the PID, and you'll get this file, which will probably be pretty big. Core dumps are not that small, but they're really useful, and if you're really troubleshooting something you just need it. You get them to upload it somewhere and send it to you, and at that point you can connect to that core dump with gdb as well. You say gdb /usr/bin/python and then the core file they sent you, and you will be able to analyze it in just the same way.

Yes, it does. So the question was, does this functionality depend on your system matching the version and environment from where the core dump was created? The answer is yes and no. It's yes because the information in the core dump will name the binaries that it's trying to debug, like CPython and its version, and if your environment is different from where the core dump was created, gdb will tell you when you start it. It'll say, oh, I can't find the right debuginfo library, and then you'll say, okay.
Well, I'll just install it. But if it's from a different distribution, that's probably not going to work, because they won't have debuginfo libraries available for you. Most frequently I'm debugging on the same distribution that my user is on, or I'll find a box that has it, and I'll even find the same major OS version and even the same major.minor OS version. But there's one issue I've noticed, where a user's environment won't be fully patched and up to date. They'll be just a little bit behind in terms of their patch level, and then time passes, and new versions of the debuginfo libraries and new builds of Python are pushed out into the repositories, and the old ones are not available anymore. So there is this vintaging problem that I have run into multiple times. That's the bad news. The good news is that even though gdb is going to tell you that it needs a lot of debuginfo libraries, the truth is that if you can just get the Python one to work, that's probably all you ever need.

To show you a little bit about what I mean, with the application that we're going to debug in a little bit. Ctrl-L. Okay, so like I showed before, this is the single command to install the Python debuginfo. This is the one you want to get right. You want to find that old version, you want to compile it again yourself, you want to do whatever you can to get this thing right, okay? But gdb is still going to complain, and in fact it won't fully complain to you about all the debuginfo libraries it needs until it has this one right, because it actually uses this one to discover more binaries that it needs to understand. For instance, you can have modules that are written in C and that are runtime linked against your Python code, right?
So the Python interpreter itself is calling into another compiled library, and until you install this debuginfo library and attach to your application, gdb doesn't know enough to know that, oh, I'm reaching this point where now I need these other debuginfo libraries. So my general advice is get this one right and don't worry about the rest, unless you specifically have to debug down into one of those other calls. So it's a yes and no: your environment should try to match, but if it doesn't match perfectly, or gdb still complains, just keep going, and if for some reason you can't get what you need out of it, then maybe come back and work on it.

Yes, that's a great suggestion. The suggestion, a very good one, is to raise the ulimit, to make sure the ulimit is high enough that when you generate the core dump on the target system, the entire core dump will be generated. Core dumps are large and they use a good amount of resources while they're being created, so you need your ulimit high enough to have a valid core dump to evaluate.

All right. Similarly, for the application that I am going to show next, after I installed the main debuginfo for Python and reconnected to my application, it told me I needed all of these. You really don't need all of these, but that's the way gdb works: it sees a binary and it wants to have the debuginfo libraries for it. So I installed them. It still complains that some of them are missing. It's really not a big deal for me, because I can get what I need out of it. So with that, one quick little sidebar. Consider using strace if all you want to know is what is going on.
I mean, is it even doing anything? You can use strace to show the calls that are occurring between user space and kernel space. For instance, if you have a loop that is waiting every 30 seconds, like my previous example.py, you'll see that select call, and you can look in the man pages for select and see that its arguments give you some useful information. Oh, the timeout argument is 30 and it's a call to select. Well, I know my application. I know this thing is running in this loop. I can very easily and quickly get some information about what it's doing. Or maybe it's opening a file, or maybe it's experiencing an error. So strace is a great, really simple tool that you can use on a running Python application. You can use strace to analyze an already running one, and you can use strace to start your application as well.

Quick little strace demo. This is the simplest thing ever: strace python example.py. These are all calls from user space to kernel space. I'm not a deep kernel engineer or anything, so what I can show you is at the bottom. It says select, which I know is a call to have something wait, with tv_sec equal to 30. This is the Python code calling into the kernel and saying, kernel, please wait for 30 seconds, and after 30 seconds it will finish. So strace is a really simple way to get some information out, and that's really what this is about. It's about getting information about what the heck is going on so that you can find and fix the issue.

So here's a better demo. Like I said, I worked on a project called Pulp. Pulp has several processes, and each of them has specific uses and functions, so I'm going to give you a simple example of debugging Pulp with gdb.
And this is a more practical example, to show you something more meaningful. So these are some Pulp processes, and don't worry too much about the details here. Suffice it to say that this is the application I'm going to debug, and there are several processes that I could choose to connect to with gdb. You're going to connect to one of these processes, and when I run the application I know which one I want to connect to. The process ID that I want in particular, with this little grep command that I made, is process 7741. Now, this process is running as the user apache. I am user vagrant, and if you're trying to debug a different user's application, you need root access or special privileges to do that. So when I run gdb this time to connect to PID 7741, I'm going to use sudo, because otherwise I will get an error and I won't be able to connect.

All right. Like I said, Pulp is a software management and composition application. You can install a Pulp server, you can fetch packages with it, and you can compose those packages into different repositories, store old versions of repositories, and then host those for your environments. So consider going to check out Pulp. But the reason I'm telling you all this is that what I'm going to do here is run a sync from a remote RPM repository. I'm going to kick this off, we're going to see some output from Pulp on the command line, and then I'm going to switch to my other screen real quick while it's running. We're going to stop it with gdb and see where we land. All right, so this is Pulp doing its thing over our struggling network connection here. Okay, it's downloading things. Okay.
Let me connect to it. Boom. At this point, if I go back, this is just frozen, because my application has stopped just like example.py had stopped. And here you can see something I wanted to show earlier but couldn't: this New LWP. LWP is a lightweight process, which gdb thinks of as a thread, and that's effectively what a thread is in kernel land, a lightweight process. You can see here that it has four threads, and if I do info threads, you can see that there are five threads, each of them has a thread ID, and you can see the call that each of them is in. That's pretty much the useful information here. The star, which is on thread one, marks the one we're currently attached to with gdb, and you can say thread two, thread three, and switch between them. So, switching to thread two. Okay, switching to thread three. Now if you do py-list, you can see that this thing has stopped at a sleep statement, so thread three is right here at this point in the code. You can also use py-bt to show you the Python version of the call stack for thread three specifically. Between these two tools you can get a great idea of, hey, how many threads do I have? What are they doing at this moment in time? Where are they? You can also see py-locals, which is a lot more meaningful now. Well, this particular thread doesn't have a lot of useful stuff in it, but you can see these variable names. This variable interval is equal to 10. Oh, the name is pymongo_server_monitor_thread. That's useful: thread three is a pymongo monitor thread, part of pymongo, which is a dependency. So I can very clearly see, okay, thread three, I know what that's doing. I can see the call stack. I can see the instructions.
I can see the state and the variables and all that. I can also go up, which takes me here, and I can go up again, and I can go up one more time. This is me just walking up the stack, and that's really the long and short of what you can do with this. So here I've gone up the stack a little bit, and it shows you frame 15, and it's dense output, but because I walked up the stack, if I use py-locals here you can see all the variables that are allocated in this particular frame.

I'm almost out of time here, so I will talk about just one or two small things. We've covered the gotchas, which you correctly identified early on: you need debuginfo libraries installed, and you need them to be the same version as the one gdb wants. There's also this optimized out thing. When you compile Python it takes different options, and sometimes you'll see in the gdb output that it says optimized out. Usually that's not the part I need, but you'll see it in there, and depending on how optimized your compilation is, you might see more or less of it. So it's a gotcha, a thing to know. And then root is required to connect to other people's processes. So this is the last substantive topic here, and then we'll have just a minute for questions.
Do something proactive in your Python applications. There's something very great you can do, and it's this: you can put in an rpdb set_trace that is triggered with a signal. Now, I've been showing you all this complicated stuff that you might have to do. Here's something you can do proactively. You can put this code into your Python code, and it will bring rpdb into your application every time it runs, and then you can send a signal to your application and it will stop at that moment and open up a network interface for you to connect to with telnet. This does not solve things if you can't connect to that thing, so it is not a cure-all for all situations, but consider building it into your application. This is part of the normal rpdb feature set, so it's something you can do to help your future self.

Yes? That's a great point. The point was that this example would listen on all interfaces, and that might not be a good idea, so consider using localhost instead. That's a wonderful suggestion. I will likely be updating my slides. Yes, sir? So the observation was that this call will remove your ability to strace the process, and you're probably right about that. I think what you're driving at is that you're now overriding the use of that signal. Yeah. So this technique might reduce or completely eliminate your ability to use strace. You can probably attach it to a different signal, or write your own signal handlers, so that they stay out of each other's way a little better. Yes, that's exactly right, and actually Celery itself supports this functionality. Just as an example, SIGUSR1 and SIGUSR2 are signals that are free and available for you to trigger this kind of network debugging.

So these are the references, and these are the slides. We are pretty much at the end of the time.
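The slide code for the signal-triggered debugger is not captured in this transcript. The following is a standard-library sketch of the same idea rather than rpdb's own API: it registers a handler on SIGUSR1 (following the suggestion to pick a signal that stays out of the way of other tools) and runs a pluggable hook with the frame that was executing when the signal arrived. The name install_debug_signal and the hook parameter are hypothetical, and it assumes a POSIX system. In a real application the hook would start a remote debugger bound to localhost; here the default only prints the Python stack.

```python
import os
import signal
import traceback


def install_debug_signal(hook=None, signum=signal.SIGUSR1):
    """Run `hook(frame)` whenever `signum` is delivered to this process."""
    if hook is None:
        # Assumed default for illustration: print the Python stack at the
        # point where the signal interrupted the program.
        def hook(frame):
            traceback.print_stack(frame)

    def _handler(received_signum, frame):
        # `frame` is the Python frame that was running when the signal hit.
        hook(frame)

    # Returns the previously installed handler so it can be restored later.
    return signal.signal(signum, _handler)
```

After install_debug_signal() has been called at startup, sending the signal from a shell with kill -USR1 and the PID triggers the hook without stopping the process under gdb.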
Are there any last questions? Yes? The question was, have I ever used this to debug asyncio, or what was that, or greenlets? No, I have not done those things, but for anything that is interpreted with CPython, or any compiled interpreter, this should be successful. You may need more debuginfo libraries, but this should work with greenlets and with debugging asyncio code as well. Yes? So the bug that I found was in a dependency, not in our code. It was an error condition that would occur very rarely, which would trigger a syntax error, because that line was almost never interpreted and it had a small error in it, which would cause a thread to die without any announcement that it had died, which would bring the other threads to a halt. Yes? I did not. The question was that on some systems you might need to enable prctl, and I didn't quite follow the rest of it. Oh, yeah, the PR_SET_PTRACER flag. I have never had to do that. All right, thank you all so much, and thanks to the great organizers here at SCALE. We're out of time.

Test, test, test. Okay, why don't we go ahead and get started? How many have stopped by the arden booth in the exhibit hall already? Most everybody. So this is a birds of a feather session, so this is more of a discussion. We do have some slides in PowerPoint that we could go through, but I'd rather not do that. I'd rather have more interaction and cover any questions or detail that you didn't get from the arden booth. So I'll play it by ear, and if there are no questions we'll start doing a few PowerPoint slides.