All right. So before I begin: I added this joke, and I'm actually not sure how familiar people are with it at this point. The Office Space movie, with "PC LOAD LETTER". Is everybody familiar with this scene of the movie? Yeah, where they have the printer and people just don't know why it's broken, and then, you know, that montage of smashing it. In reality, this is programming to me. This is every day. We all go through these phases where we build stuff and we don't really know why it works, but it works. And at the same time things are broken, and for the same reasons we don't necessarily know why they're broken. There are many variables involved with programming. It may be running on different operating systems, who knows. There are many, many reasons why our programs don't work as they should.

So this is this beautiful headshot of me. So why am I here? First of all, thank you for bringing me here. I'm kind of excited to be at PyOhio because I am an Ohio native, but I don't live here anymore, so it's exciting to come back. I've been a software engineer in some sort of capacity for about 20 years. I say "in some sort of capacity" because probably the first solid 10 years of programming was me as a child not really knowing what's going on. In those first 10 years I probably learned about as much as somebody now would get through in a code camp in two weeks. But this has given me quite a bit of experience of dealing with and debugging things by myself, and learning how things work out of necessity, kind of forcing my way through things.

So, is anybody here familiar with Sentry? Okay, cool. So my day job, for probably the past six years or so: I've been a core contributor to this project, Sentry. I'm not going to go in and talk about it in a lot of detail, but this has been pretty near and dear
to my heart, based on a lot of my personal experiences of how I got to where I am. It's generally all about tools to help debug your software.

So before I begin, I want to ask a question. Who here does this as a professional career of some sort? I assume a good percentage of people, writing code. Okay. How many people would claim that they're good at their job? Okay. How many people would say that they're bad at their job? I'm pretty bad at my job.

And I'll say, as a personal anecdote, this is something that didn't come to me right away. Probably around the middle of my career I got pretty excited, and I felt that I was pretty good at what I did, to the point where I thought that I was kind of flawless, and the things that I wrote just did what I wanted them to do. And then, as I crested over this, I started realizing that everything I do is wrong. There are so many more ways that everything I do can break, and so many more variables in the world. You get into distributed systems, and that opens up a whole other world of things. And that self-esteem, in a way, has just plummeted, to the point where now I'm more self-aware: I consciously don't know what's going on.

This kind of drives behavior like writing tests. Writing tests is a really good example of dealing with that. You know that your code is gonna be bad, so you just litter tests everywhere to help assert that what you're doing is valid behavior. But tests are kind of flawed. Tests are never gonna cover 100% of your use cases. You're gonna have a function that adds two numbers together, but someone's gonna put in a string. Someone's gonna do something dumb with this that you didn't expect, and your tests are never gonna cover this, and your software is gonna break.

So I love this quote by Taylor
Swift. And to quote it verbatim: "If an exception happens in production and no one sees the logs, did it really happen?" We can define production as many different things. To a lot of us, that's probably shipping some web application. We ship it out there, someone runs it, maybe it's a shopping cart, someone tries to buy something, and it's broken. Or maybe it's just a script that we've given to our friend, and they run it and it doesn't work. So production can mean a lot of things. But if we don't experience this happening, or someone doesn't tell us that this has happened, we don't really know that it happened.

So this drives into the story that we've probably all experienced at some point, and if you haven't, you will. You'll give someone this software, and someone's gonna say, "This doesn't work." You're like, "Well, it does. I've run it here. It works." The problem here is you're missing a lot of context, and they're not necessarily sharing all of this context. They don't know what operating system they're running on.
They may have varying levels of technical capacity to even explain the problem correctly. But you're just left in the dark, saying, "Well, I fired it up on my machine and this works."

So to explain this process, we're gonna talk a little bit about computers. I was gonna do this in a little bit of x86 assembler, but I thought that a few diagrams would explain this a little more simply. For those that are not aware, and I don't expect this to be common knowledge, since we don't deal with this stuff on a daily basis, we're gonna go through, at a very, very high level, very quickly, and glossing over a lot of details, how we tell the computer that we want to open a file. And as we can imagine, opening a file has many problems: the file doesn't exist, maybe it doesn't have the right permissions. There are many reasons why this might not work.

So to quickly go through it: we feed this information into the computer. We're saying we want to open this file. Here is the file name that we want to open. This is the mode that we're going to open the file in. Now we're saying, "Hey yo, hard drive, here's the shit, give me what I'm asking for." And what happens is, you don't have a lot of direct communication with this external device. It's a separate thing.
It's not attached to... well, it's attached, but it has a wire. So you're communicating over some interface, and this interface ends up writing back a response code. And as we can imagine, this translates into higher-level programming languages as well. You have Python, you have JavaScript, whatever; they return these values that tell you what happened. So this is fundamentally what is happening with a computer. This is how we've gotten an error back that says we don't have a file.

So now we jump into something that we're potentially a little more familiar with, and we go into C. C is now providing a couple more abstractions over this. It's giving you this interface for opening a file. You could approach this at that same lower level again, but we're going to start glossing over things at a little higher level. So here, this is a broken program. Does anybody know off the top of their head what's going to happen if I try to run this and foo.txt does not exist? This is C. We're going to segfault. So we'll get something like this. And the problem here is that fopen did not respond with something that we expected. We're expecting this FILE struct and we actually got back NULL. So now we try to operate on this NULL, and the computer is like, "Uh, I don't know what to do with this," and it segfaults. So we have to build this guard around it by checking what the return value was, and then conditionally do this behavior. So this is some of the fundamental error handling that we do.

So if we move over to Go, this one's a little bit of a trick question. Does anybody know how this would behave in Go? Bingo, this one actually doesn't do anything, because the file object that is returned can effectively be no-op'd and worked on as if it were a file. It just doesn't actually do anything.
So in a way, it's kind of a good thing, but it's also an underlying part of just how Go functions. But Go has the same similarity, in that it returns an error, and you're supposed to handle this error in some way: checking, in this case, if it doesn't equal nil, you can return it, and if not, you proceed and read the file.

So if we move over to Rust, Rust is actually a little more interesting. In Rust, this is actually kind of hard to do. If you're not familiar with Rust, and I do not blame you if you are not, Rust at compile time tries to make sure... well, it absolutely makes sure that you are handling all of these cases. It will not let you get to a state where you can have a null pointer. It will not let you get to a state where it's just going to blow up. So in this case, if we notice the .unwrap() at the end, this is telling Rust, "I don't care." And if you are new to writing Rust, I promise you, you will be putting unwrap on everything, because it's a little verbose to work around things correctly. But if we go and run this code, we get this error, and we have the option to spit out this backtrace, which is pretty similar to something that we're used to in Python. But if we want to correctly handle this in Rust, we have to do something like this, which is similar in a way to how Go did it. Basically we have something analogous to a switch statement, with the Ok path and then the Err path. And we're required to handle this stuff for Rust to even compile. So that's kind of nice.

Now we move over to Python. We should all be pretty familiar with what's going to happen here, right? We're gonna try to open this file, and it's immediately gonna raise an exception, and we're gonna get something like this.
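A minimal sketch of that Python case, assuming foo.txt (the file name from the C example) does not exist. The temporary directory and the helper name `read_file` are illustrative and not taken from the slides. Left uncaught, the failed `open` would end the program with a traceback; it is caught here only so the sketch can show what was raised:

```python
import errno
import os
import tempfile


def read_file(path):
    """Open and read a file; open() raises if the path does not exist."""
    with open(path) as fh:
        return fh.read()


# A path inside a fresh, empty temporary directory is guaranteed to be missing.
missing_path = os.path.join(tempfile.mkdtemp(), "foo.txt")

try:
    read_file(missing_path)
    raised = None
except OSError as exc:  # IOError is an alias of OSError on Python 3
    raised = exc

# Show the exception class and whether the OS reported "no such file".
print(type(raised).__name__, raised.errno == errno.ENOENT)
```

Unlike the C version, there is no return value to forget to check: the failure interrupts `read_file` at the `open` call.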
This is what we're familiar with. So if you want to handle this in Python, we probably all know: we're gonna except this IOError, we're gonna pass, and then if things work correctly, we're gonna proceed with our operation.

So the key thing to point out here is that this is an exception, as opposed to an error. So what does this mean? Errors are basically that return value. It's something that the function returns, and we're able to introspect that return value. It also does not halt our program. We're able to have multiple errors that coexist. We could try to open 20 files at one time and then check all the errors later. There's nothing that's preventing us from moving on in code. Whereas an exception in Python, as we should all be aware, is fatal. An exception is something that requires you to handle it at this time. You cannot proceed with more code within this same call stack until you have explicitly said what you want to do with it.

So if we go back to this exception that we're looking at, let's start diving in and say: where does this information come from? What does it look like, and what does it do for us? So if we go back to the program we just ran, the first thing that we could do is: there is this main-level excepthook, sys.excepthook, and this is the highest level, when your program is going to crash, meaning you have this exception that you did not handle in any sane way. We can intercept this excepthook and do whatever we want with these values. At this point we could start debugging, or spit this out in different formats. So if we start out and just print these arguments that we get: we get the type of the exception. We get the instance of the exception.
We get this really cool traceback object. So we dive deeper. Now let's try to reproduce the output that we actually get from the console. Well, there's this convenient traceback.print_exception that we can throw in there that takes those same arguments, and that will give us this exact string.

So we've been intercepting this at the global level. But what happens if we want to effectively log this? We want to capture this exception, handle something gracefully, and move on, but we want this output. Well, we could do something like this. Inside of our except, we can access the three-tuple of sys.exc_info(), and this will give us back the exact same IOError type, the instance of the error, and this traceback. And to be verbose, we could do the exact same thing. So we have print_exception: here are the bits, give me the output. Except in this case our application won't actually explode. But now we know that it happened, and we have gracefully moved on.

So this exc_info is, in a way, abusing the fact that exceptions are these things that only exist one at a time. We can run sys.exc_info() at any point in our program, but if we run it outside of an exception, we get Nones. If we run it directly outside of the except, we also get None. So this is very important: exc_info relies on the fact that there is only one exception being handled at any point in time at runtime. So if we do it within our except block, we get the exact same output. So we have the exception type.
We have the exception instance, and we have this traceback. So let's take this traceback and start doing something with it besides just printing it out to our console.

There are other options for dealing with this. We can print this exception to a file. In this case we're going to use standard error, which is basically the same thing as we've been doing. But now we have something that's a little more sustainable. We could run this program, we could dump this to an error log, and now we have all of our errors, right? All of our problems are solved. We have all of the exception data. Until we look at this file. You have this, and if you're running some web application, something with a lot of data, this can just blow by. If you tail a file on anything with a high volume, it's nonsense.

So that was a contrived example, but let's look at an actual one. Here's a KeyError. From this we should be familiar with what these pieces are. If we break it into one individual traceback, we can see these are really the key pieces that we need. We've had this "things" variable of some sort. We have a KeyError of thing3. But we still don't really know why. We know that there was a KeyError, but we don't know why there was a KeyError. It should be there.

So now we go back to our story. We get this error log and someone says, "Hey yo, this ain't working." And you say, "Well, it should." And then you get the traceback and you're like, "Well, that should be there. I don't know, you're doing something weird. It's not my fault."

So this is another really interesting thing that we can do in Python, and I'll say I used to take this for granted. This is not a thing that is easily accessible in other languages. You can do it in Ruby, but in Ruby it's extremely expensive. But you can introspect.
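The capture-and-log step described a moment ago could look roughly like the following. This is a sketch, not the talk's slide: `error_log`, `log_current_exception`, and `lookup` are made-up names, and an in-memory buffer stands in for the error log or standard error.

```python
import io
import sys
import traceback

error_log = io.StringIO()  # stand-in for an error log file or sys.stderr


def log_current_exception(log):
    """Write the exception currently being handled to `log`, traceback included."""
    exc_type, exc_value, tb = sys.exc_info()
    traceback.print_exception(exc_type, exc_value, tb, file=log)


def lookup(things, key):
    return things[key]


try:
    lookup({}, "thing3")
except KeyError:
    log_current_exception(error_log)  # record it, then carry on

# Outside any except block there is no "current" exception to report.
outside = sys.exc_info()

print(error_log.getvalue())
print(outside)
```

The log now holds the same text the interpreter would have printed on a crash, but the program keeps running.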
There are these magic functions, globals() and locals(). globals() is kind of as you would expect: it will dump out a dictionary of the global values in your module. So you'll see everything that was exposed there, so you would see key and you would see things. Then locals() is local to that frame. If you're within a function, it'll tell you the variables that were in scope in that function. So if we look at this output, now we're starting to get something that's a little more useful. We can see that, oh, here was our things, it was an empty dictionary; key was thing, la la la. We can make a little more sense now of why we got this, because we have the actual values.

So we take this a step further, compounding on what we've been learning. We could start to do something like this. This goes into a much deeper topic of actual structured logging. A good friend of mine, Hynek, has a really good library called structlog. I would absolutely not do this. There are many reasons why just trying to JSON-dump all of this garbage will fail. Use a library that does this. Don't do that. But the value here is that once we get this into a machine-readable format, we can start doing other things with this data. We can shove it somewhere else. We can shove it into Elasticsearch. We can shove it into a service like Sentry, or anything else that can actually consume this data in a reasonable way.

So now, a slightly more complex program. We see that we're going to get a problem. Here we're trying to get a random number and then we're going to hit a KeyError. It's kind of contrived, but it shows that just having these local variables is not as useful. We see that our index here was eight, but we don't know how it got there. We only see that one scope.
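As a rough illustration of the locals() idea above, here is a sketch that records the local variables alongside a KeyError in a machine-readable form. The names `get_thing` and `reports` are hypothetical, and converting every value with `repr()` is a shortcut taken here to keep `json.dumps` from failing on values it cannot serialize; as the talk says, a real structured logging library is the better choice.

```python
import json

reports = []  # stand-in for wherever structured events would be shipped


def get_thing(things, key):
    try:
        return things[key]
    except KeyError as exc:
        # Snapshot the variables in scope in this function at failure time.
        frame_locals = dict(locals())
        report = {
            "exception": repr(exc),
            # repr() keeps the report JSON-serializable for arbitrary values.
            "locals": {name: repr(value) for name, value in frame_locals.items()},
        }
        reports.append(json.dumps(report, sort_keys=True))
        return None


get_thing({}, "thing3")
print(reports[0])
```

The report shows that `things` was an empty dictionary at the moment of failure, which a bare traceback would not.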
We only see that that one scope So this brings us to how we can, you know, kind of dive into this stuff a lot deeper So throughout this I've shown this trace back object. This trace back object is Extremely in depth of what you could do with it. Um So here is an example of taking, you know, we run this debug function And as we see we're going to actually extract this trace back object And you could walk up this trace back object through all of all of the frames in your in your stack At this point if we step through this we can get The frame of code and then we can actually extract this code object out of our frame And these frame trace back and code objects have all of these primitives that are needed for Uh constructing that textual stack trace that we saw at the beginning So now we can we can kind of Keep going further and start splitting this up into more and more Your useful bits and there's a lot of stuff that if you really want to dive into these code objects There's a lot of stuff in there. I think I don't need that variable there. That's a good observation Definitely don't need that variable. What is the while? Because so the trace back will keep looping back on so there's a tvnext of next frame And you can keep looping back on yourself tvnext. Oh, no the actual iterator next no So now if if we get this we have something like this, right? So now we have each of our our frames of our function We have you know now we can extract the actual function name. We can extract the line number and if we noticed We can get frame dot f locals This is pretty clutch Uh, you're so you're able to walk up every single frame of the stack trace and extract that dictionary of local variables For every single frame So now we get something like this and we you know now this is This is basically all of the information that we could potentially get out of out of that little primitive program So the next steps are you know, let's let's get a lot more information Right. 
So now we have the case where we've run this program and we have these variables. But there are still a lot of things that we don't necessarily know, that are implied by the environment in which this runs. This is all context. So once we've gotten to the point where we can structurally log this stuff, or do something extra with it, we can start collecting whatever information we would need to help us in this situation. In this case, we're going to log the arguments that were passed in through the command line. We're going to extract all of the environment variables. We're going to extract the host operating system information. And we're going to extract the time that it was run. What we're trying to do at this point is just get all of the information to help us actually fix this problem.

If we expand on this, we can pull this out to, say, a web application. A key thing for a web application would be knowing what user triggered this error. Maybe what HTTP endpoint did they use to trigger this error? What state was their session in when they triggered it? There's a lot of other information that we need to put together to actually get the full picture of what happened. So ideally, once we get to this stage, we can actually see: oh, you're running on a really old version of macOS. Why is your clock skewed five hours from what it should be? You can start learning a lot about this environment. And now we could take this report.
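A sketch of gathering the context items listed above (command-line arguments, environment variables, host operating system, and time of the run) using only the standard library. The function name `collect_context` and the dictionary layout are assumptions made for illustration, not the report format from the talk.

```python
import datetime
import json
import os
import platform
import sys


def collect_context():
    """Gather facts about the environment the program is running in."""
    return {
        "argv": list(sys.argv),
        # Environment variables often contain secrets; scrub before sending anywhere.
        "environ": dict(os.environ),
        "platform": {
            "system": platform.system(),
            "release": platform.release(),
            "python": platform.python_version(),
        },
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }


context = collect_context()
# Print only the platform section here to avoid dumping the whole environment.
print(json.dumps(context["platform"], sort_keys=True))
```

Attached to an exception report, this is the kind of data that answers questions like which operating system version the failing machine was on.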
We expand on it, we collect all this information. And I like to point out something really cool in the standard library. If you're trying to read a file and you want to get lines out of it, there is a linecache module built into the standard library that is used internally to render tracebacks. It keeps a very simple cache. It's not even an LRU cache; it's literally: when the cache gets to this size, flush and just restart. But the idea is that we can use this to our advantage to extract lines out of our file without hammering our disk. That's pretty cool.

And now at the end of the day, we have this really nice report of all this information. This is pretty simplified, but we can now email this to ourselves and say, "Oh, we saw this exception. Here is all of the information that we need to deal with it."

So, in summary: our jobs are pretty complicated. There are a lot of variables. There are a lot of ways that production systems differ from how we run things on our computer. The testing environments are very different. As I said at the beginning, tests don't cover a lot of stuff. If you've been shipping things into production since day one, you're going to learn that somebody wants to put in an emoji for their age, and you're going to hit these problems that you just absolutely never anticipated. And these aren't things that necessarily make us bad at our jobs. It's just really hard to understand and be able to anticipate all of these use cases. But it's important that we build and design things to be a little more defensive, so we can step in front of these and catch them. So the first time that we see someone do this... imagine someone does that. They put an emoji as their name. You can now reach out to this person and say, "What are you doing? Why'd you do this?"
You can actually be responsive to that, instead of waiting for this customer to be like, "This is broken," and then complaining about it, or yelling on Twitter and saying, "Hey yo, shit's broke." But no, you put an emoji as your name. Stop.

So the little sales pitch here is that this is what I do. I've been doing this for a long time, and we've been developing the software that helps us do this, all stemming from the fact that we're pretty bad at being software developers. And that's not anything to be ashamed of. There's always a trade-off with doing everything 100% right. I'm not shipping stuff to the moon, or something that mission-critical. I can deal with the occasional exception. So this has been a passion for me for, it's going on seven years at this point, building software that does this for you, all because I'm pretty bad at writing software, and I've kind of resonated with that. So essentially that's just what we do. Generally, you probably don't want to do all of this stuff yourself, but it's nice to know all this stuff is there in Python. This doesn't exist in other languages. So congrats.

So again, I don't really know what I'm doing, but if you have any questions about all of this, I live and breathe it, so I may or may not have answers for you. What about it?
Yeah, so in this world of what I've shown, you will be sending that. In the real world, there was that magical serialize of this report. If you were writing this, you would probably have something in the serialize that is generally scraping on patterns. You would say, "Oh, this looks like a credit card number." I'm just going to plow through this entire dictionary of the report, and if it's a credit card number, redact it, or anything like that. If you have sensitive keys... so using Sentry, for example, there are a lot of keywords that we'll use that just throw red flags: if it's auth, password, obvious things like that, which we'll try to strip out before sending it. Also because that information is generally not useful at the end of the day, unless it happened to be the fact that they were putting in a credit card number of emojis that caused your problem. There are pros and cons: either you collect the credit card numbers or you sacrifice some information. Sure, if it was your own and you were putting it into your own Elasticsearch.

Any other questions?

I will honestly say no. Python, for all of this world, is by far the best. Ruby has some pretty similar stuff that you can do, but Ruby has a very heavy performance impact at runtime from enabling this feature, so people generally don't. I don't know all the technical details of why Python does this for free. I assume they just absorb the overhead into the runtime, since there's not really a way to disable it, so it's a little more transparent.

Tangent to that, though, I'll say the flip side of this is other languages that don't have this stuff. JavaScript is one that stands out, and that was the original inspiration and the first talk that I gave about this. JavaScript is a lot more entertaining, I'll say, especially for a talk, because it's just a trash fire, trying to do this type of stuff.
So that talk is a lot more of just: you have to go through so much effort to get anything that is useful out of this. For example, when you get an exception in JavaScript, there's this nice convenient exception.stack, which is similar to the traceback. But in JavaScript, this is a string, and there's no way to introspect the string any other way. So you can plow through it with a regular expression and extract the pieces that you need out of it. But every browser does its own thing. Every browser has its own format, its own way of doing things, and they've evolved over time. There's definitely no way of getting local variables or anything like that, and even getting stack traces in JavaScript is not as trivial as this. There are a lot of complexities involved in that as well.

There are, yeah. So he asks for a little more information about Sentry, and whether there are any competitors out there. I'm not a marketing person, so I'm going to avoid the competitor question. There are competitors out there. I will say, almost objectively, we've been around the longest. That's not necessarily to say that it's a testament of quality, but we've been doing this for a really long time.

Yeah, so Sentry is fundamentally all of this stuff wrapped up with a really nice bow on it. Our goal is giving you the information so you can actually fix a problem as quickly as possible. And all of this was centered around context, and getting context. A stack trace by itself is not enough for you to always solve your problem.
So it's getting stack locals. It's being able to add your own metadata, stuff that we don't know about. Getting the HTTP request information, getting the query string arguments that were passed, getting the user information, and then tons and tons of other metadata. So our goal is being able to get you to where you look at it and say, "Oh, that's why," and maybe derive a pattern and say, "Oh, these types of people are all doing the same thing. Oh, they're all from some other country. That's a problem," or something like that. You can identify those types of characteristics, all this stuff to help you actually fix the problem.

We can talk offline about this. I mean, I'm generally trying not to be... He's asking about our product, about one of our tiers, and saying, what does 10,000 events a month mean? To be clear, I'm not here to sell Sentry, but I will talk with you about it after if you want. Anything else?