All right, so I'm ready to begin. I'm talking about an error management story for Rust. The people I think want to be at this talk are Rust programmers who actually want an error management solution, people who think about error management generally, because I'm gonna talk about error management somewhat in the large, and people who just like languages, because I'll talk about different languages, and languages are interesting in all their manifestations.
Okay, so who am I to be talking about this? Well, I have about three years of experience programming in Rust, many years of experience programming in Python, which I still do, and distant memories of programming in C. And also I'm the Stratis team tech lead. Even though I think I would be interested in error management anyway, I have to be more interested in it because of what Stratis does. We're glue code in Rust for the storage stack, and we can encounter errors because at the bottom of the storage stack are storage devices. So we're very interested in producing useful errors for users, that is, the people who are managing the storage. So I have to be interested in it professionally.
Okay, so what should you expect? Well, I'll talk about error management generally, and I'll talk about what's going on in Rust to try to address error management. I'll discuss what I've come up with so far. And I am sorry to tell you this is not an announcement of an error management crate in Rust that will solve all your problems. There is an in-progress PR, which has been in progress for a while, on and off. And in fact, I actively invite any sort of thoughts or suggestions.
So what's the problem? The problem is that Rust has no established mechanism or policy for error management, although it does have error handling primitives, and I'm gonna expand on what I mean by that statement in what follows.
Okay, so I really actually don't know of any language that has an established error management policy. So we're not really saying anything special about Rust. But I was talking about my talk with somebody who works at Red Hat, and they worked at Digital years ago, and they said, oh, this was all solved in VMS. So luckily, most of you don't look old enough to remember, so all this will be new.
So I'm gonna talk about Python, and this is to expand on the idea of error handling versus error management. I like Python, and Python has made me happy with new introductions in Python 3. First there's all the old stuff that kind of works pretty well. You have exception handling, there's the try, the except. You can catch exceptions by their types, you can identify them. And newish is the thing where, as part of the language, you can just chain the exceptions. So what's going on here: this is some chunk of code that I wrote at some point. You see the happy from keyword. What we've done is we've caught some error and we've chained it with our new error in our library. So now there's a succession of errors: there's one error and its child, which was the original error. And this is a library that is very fastidious, and it knows that it can't get any other errors besides the ones that are checked. So it's chaining it with the special error that says oops.
Okay, so Python has this nice mechanism, but it doesn't have any error management strategy, or anything enforced, or any mechanism really. And Python has a big problem, which is that if you are programming a Python library and you're calling into other libraries, you can kind of expect any exception at any time to be raised from any code that you call.
And so there's a lot of Python code in the world, obviously, and that's because when Python developers develop code, they sort of forget that this is a fact, otherwise they would be too depressed to continue. They just go ahead and write code, and they catch the exceptions they know about, and when more come flying through, they add some more code, and so forth and so on. So it's very reactive. Okay, and so I'm pretty fanatical. I'm in the lucky position of having written lots of little Python libraries, and so I define it as a bug if any exception gets through my library from lower code. That's a bug. If I raise an exception that I haven't defined, it's a bug. And I always chain exceptions. So those are the rules that I obey, and they work well in my world. But it's a little bit of work to think about that and design your exceptions and so forth, and lots of people don't even agree that this is what you should do and think I'm a little weird for developing my libraries this way.
Okay, so now there's C. This is a totally different language, and it's sort of a good example of a very different and also bad situation. In C you have error codes. You return the error code, and the easiest thing in the world to do, if you're a developer, is to ignore that error code. There's nothing to prevent you from doing that. Error codes don't tell you much either, and it also turns out that often there's no information about what they mean. And so my former colleague got four years of fun papers, many of them in good conferences, out of this fact. I left around here, that was when I graduated, but she took open source C code which was available. Most of it was systems code, which is kind of the thing that Stratis has to work with. She had a special love for file systems, apparently, and she used static analysis to show all the many ways in which error codes are lost or confused or ignored in systems code, and she did really well on that.
And you could argue, well, there are some tricks she did here. She's like, there are a thousand locations in file system code where error codes are just ignored. And the way to get an even better number is to analyze another file system, obviously, because hopefully there are a few more places. So you can argue that some of her numbers are sort of more about her cool analysis, but you can't say this isn't yet another way to say there's a problem.
Okay, so that was Python and C. And so I'll compare them and I'll say, well, C has something to tell you that there are errors, but there's not much else that it's got going for it. Python has this exception mechanism, which is really nice, but you can expect stack traces all the time in your running code, with different types. They just keep on coming.
Okay, and this is not a total aside, because it'll inform the rest of the talk. I'm showing a simple two-way table. Statically checked languages: we see the column, Rust and C are in that. And then type-sound languages: Rust and Python are in that. So basically what I'm gonna talk about is that Rust is in this particular spot here, and that's a good thing, because we'll find out that we know the types of errors that we have to deal with at compile time. And that's also a rough thing. It makes things hard for us because we have to make the code acceptable to the type checker, and that's harder to do for Rust than for C, and you don't have to worry about it in Python.
Okay, so Rust took another approach, a third approach, and this is kind of wordy. I left out a lot of annotations, but left a few in. This is a fundamental type in the Rust standard library. It's the Result type, and it's a typical parameterized type, so you can construct a Result in two ways. Either you construct Ok, and that will hold the value that your function is supposed to return, or you can construct Err, and that will hold some error type, okay?
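
As a rough sketch of the shape of that type, and not the standard library's actual source, the enum below mirrors Result under the made-up name MyResult so that it does not shadow the real one. The parse_percent function is a hypothetical example as well.

```rust
// A simplified stand-in for the standard library's `Result`.
// `MyResult` is a hypothetical name chosen to avoid shadowing the real type.
#[must_use = "this `MyResult` may be an `Err` variant, which should be handled"]
#[derive(Debug, PartialEq)]
enum MyResult<T, E> {
    /// Holds the value the function was supposed to return.
    Ok(T),
    /// Holds some error value describing what went wrong.
    Err(E),
}

/// Hypothetical example function: parse a percentage in the range 0..=100.
fn parse_percent(text: &str) -> MyResult<u8, String> {
    // The inner `Ok`/`Err` patterns here are the standard library's variants.
    match text.parse::<u8>() {
        Ok(n) if n <= 100 => MyResult::Ok(n),
        Ok(n) => MyResult::Err(format!("{} is out of range", n)),
        Err(e) => MyResult::Err(format!("not a number: {}", e)),
    }
}

fn main() {
    println!("{:?}", parse_percent("42"));
    println!("{:?}", parse_percent("142"));
}
```

The caller cannot get at the u8 without deciding what to do about the Err case, which is the property the rest of the talk builds on.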
So another thing you should notice is there's this little must_use annotation, which the compiler understands and, we'll see, will enforce. Okay, so here I am. I've written an incredibly simple function up here, and I called write_all on a file to write something. As we all know, when you write a file that may not work out; there could be an error, and that means that write_all itself is a function that returns a Result type. Now the compiler responds to this particular use of that function with a complaint. It says that you didn't look at the result of invoking this function here, and there's a warning because you must use that. So I didn't look at it in this code here. I ignored the possibility that it might fail. The compiler itself is gonna reprimand me. Of course this is only a compiler warning, and we can turn off warnings and ignore them if we choose to. But if people see us doing that, they will distrust us.
Okay, so I can get around the compiler in another pretty cheap way as well. I can match the Err and I can match the Ok, and then I can just choose to do nothing, whatever. And the compiler will be happy with that and so, interestingly, will the linter, okay? But if I do that, that's pretty flagrant behavior, and again that will be noticed, and it won't usually pass code review and things like that. So I'm pretending to be interested even though I'm not. Another thing I can do is I can say, okay, if there is an error, terminate the program. Stop right there, terminate it. That's what this unwrap does in the same situation. But again, if you're writing production code that is supposed to operate in difficult situations, people will notice this and object. Okay, another thing I can do is I can assert that this will always succeed, and you can all see that I'm lying here. But if you didn't read the string, you might think I had good judgment and knew what I was doing and had somehow proved that this thing would succeed.
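
The slides are not part of this transcript, so here is a minimal sketch of the evasions just described. It assumes a generic writer instead of a file so that it runs anywhere. The function names and the FailingWriter type are invented for illustration.

```rust
use std::io::{self, Write};

// Writing `out.write_all(data);` as a bare statement compiles, but the
// compiler emits an `unused_must_use` warning because the Result is dropped.

/// Matches both variants and does nothing with either: no warning is issued.
fn write_pretending<W: Write>(out: &mut W, data: &[u8]) {
    match out.write_all(data) {
        Ok(()) => {}
        Err(_) => {}
    }
}

/// Terminates the current thread with a panic if the write fails.
fn write_or_panic<W: Write>(out: &mut W, data: &[u8]) {
    out.write_all(data).unwrap();
}

/// Asserts success; the message is only as good as the author's judgment.
fn write_asserting<W: Write>(out: &mut W, data: &[u8]) {
    out.write_all(data).expect("writing can never fail");
}

/// Hypothetical writer that always fails, standing in for a bad device.
struct FailingWriter;

impl Write for FailingWriter {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::Other, "device unavailable"))
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() {
    let mut sink: Vec<u8> = Vec::new();
    write_pretending(&mut sink, b"one ");
    write_or_panic(&mut sink, b"two ");
    write_asserting(&mut sink, b"three");
    println!("{}", String::from_utf8_lossy(&sink));

    // The failure below is swallowed without a trace.
    write_pretending(&mut FailingWriter, b"lost");
}
```

Only the first variant hides the failure completely; the other two turn it into a panic, which is at least visible.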
So that's another way I can get around the compiler, but I could choose to do the right thing. And here I am choosing to do the right thing. I know that write_all returns an IO error, and so I say my whole function can return that exact same error, and I use the little question mark to say, if you encounter an error, return it, all good.
Okay, so that's my honest behavior, but it turns out that immediately there is a problem. The thing is, when I write functions, the functions they call might not all return results that have the same type of error. So I'm writing this function, and here I would get an IO error potentially, but down here maybe I'd get some other kind of error. That can happen. And the problem is, well, those are different types of errors, and Rust needs to type check stuff and it won't accept that. So what I ultimately have to do is define a super error, and I can convert every other error into the super error. So for my whole library, whatever, I define the super error, and any other error that I could encounter in the functions has an automatic way to be promoted to the super error, which I call "all error" here. And we have a gain here. We had to write all this boilerplate to do it, but at least we know every type of error that can be returned.
There's another problem here, which is very significant. What I've typically done is I have this little question mark here, and I say, okay, if I encounter this problem, just return that error, and it'll be subsumed into all error. So what I potentially have is a stack just like any other stack, with a lot of calls on the stack. The first function returns an error, and the next function looks pretty similar, with a question mark, so it returns the error, and the error just keeps being returned until you get to the bottom, and then you have the error coming out, and all you have is the error.
Okay, so how are we doing with Rust? Some things are good, some things are bad.
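
The super-error pattern described a moment ago can be sketched roughly as follows. AllError is an assumed spelling of the "all error" type from the talk, and its two variants and the write_doubled function are made up for the example.

```rust
use std::fmt;
use std::io::{self, Write};
use std::num::ParseIntError;

/// Hypothetical library-wide "super error" that every other error converts into.
#[derive(Debug)]
enum AllError {
    Io(io::Error),
    Parse(ParseIntError),
}

impl fmt::Display for AllError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AllError::Io(e) => write!(f, "IO error: {}", e),
            AllError::Parse(e) => write!(f, "parse error: {}", e),
        }
    }
}

impl std::error::Error for AllError {}

// The boilerplate: one `From` impl per error type that can occur.
impl From<io::Error> for AllError {
    fn from(e: io::Error) -> Self {
        AllError::Io(e)
    }
}

impl From<ParseIntError> for AllError {
    fn from(e: ParseIntError) -> Self {
        AllError::Parse(e)
    }
}

/// Parses a number from `text` and writes its double to `out`.
fn write_doubled<W: Write>(out: &mut W, text: &str) -> Result<(), AllError> {
    let n: i64 = text.trim().parse()?; // ParseIntError promoted via From
    writeln!(out, "{}", n * 2)?; // io::Error promoted via From
    Ok(())
}

fn main() {
    let mut out: Vec<u8> = Vec::new();
    write_doubled(&mut out, "21").unwrap();
    print!("{}", String::from_utf8_lossy(&out));

    // The caller learns what kind of error occurred, but not where.
    if let Err(e) = write_doubled(&mut out, "twenty-one") {
        println!("error: {}", e);
    }
}
```

The question mark calls From::from on the error before returning it, which is why the two differently typed failures can share one function signature.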
So long as everybody is writing stuff properly, any function that can have an error must return the Result type, and you can see that and deal with that, and that's good. In Rust, the compiler forces you to do fairly obvious work to ignore the fact that a method could have an error. So you have to do work to ignore it, which is the inverse of C, where you have to do work to observe it. The situation is still really bad in Rust. We have a good thing, which is that we know all the types of errors we can encounter and we can work with that, which is better than in Python. But typically we just end up unrolling the stack, just like we would if we raised an exception in Python, but in the end we don't even get a stack trace, because that error return is a normal return. So at the bottom we have some error, and the typical thing, of course, that you might encounter is that in the end you know that somewhere in your program there was an IO error, and that's not good at all.
Okay, so to summarize, things aren't so great in C, and I told you how I see that, and things are kind of mixed in Python, because the exception mechanism is okay but you get exceptions all the time in your running code, because you can't predict what exception types you'll have to deal with. And with Rust the situation is more complicated. You have a useful type to encode whether or not there was an error in your function. The compiler pays attention to that and forces you to deal with it or to get around it in some way. And you know what types of errors you're encountering, but usually the state of the matter is that in the end the error you get is just a bare error, with no information about where it was generated or any of those useful things that come with the Python stack traces.
Okay, so the Rust community can really be commended for even thinking this is a problem. Okay, so they tried. They tried to address it. error-chain was a Rust crate that came out in 2016, and I jumped right on that bandwagon.
I felt they'd solved the problem. Certainly from the name error-chain you can guess that one of their major focuses was chaining errors. And I'm all about that, as I've already mentioned, so that was great. But the state it is in now is basically abandoned, although there have been some efforts to sort of push it forward a bit, or at least to heal it to the point where it's still usable by the people who started using it in 2016. There was another crate called failure, which introduced the idea of error management, but it's in a sort of wallowing state at this point. This is not a comparison of these different earlier attempts, so I won't get into this in detail, but you can just look at their GitHub sites and see what they tried to do.
And in fact, there are problems within the Rust standard library. So if you follow this link here, it's called something pretty straightforward like fix the error trait. Now, the Error trait is a fundamental part of the Rust standard library, and about a year ago they realized it was established in such a way that it was very technical but it was unusable in certain ways they'd expected it to be usable. And this issue was opened a while ago. There are several little boxes to put check marks in, and there's definitely a check mark in one of those boxes, but not in all of them.
So the problem is that now we need something and it's not really there. So I'm willing to step into the void in order to construct something, at least for myself or for my project, and maybe subsequently it'll work out to be the thing that others can use. Okay, so one of the things that we think we really ought to have is backtraces. And I don't think I even have to go into any detail about why. That's obvious. Also the chaining of errors is, to me, very, very important.
A slight difference that I have with other people is that I think that the chain, essentially whether it's a left or right link, can be given some information. We also want to be able to log these errors in an intelligent way. And because we already have a big project, we want to be able to add this in gradually and possibly even, and this is really a more difficult thing, take it out if the community comes up with something full strength at some later time.
Okay, so this is my error here. And this is an actual struct, not a trait. And you can see that these things are showing up. Here's the backtrace that I demand we have. Here we have those children that we chain with. And here we have a thing to hold what actually characterizes the kind of error it actually was.
Okay, so just a brief aside, this is about the stack. You'll all know that Python is generally an interpreted language. The Python interpreter comes along and it manages its own stack. So when the Python interpreter generates a stack trace, that is not a problem for it. If it's generating a stack trace to show what was going on when an exception was raised, there's no difficulty, because it built the stack. It knows everything about the stack. So that's fine. Rust, of course, is a compiled language, or rather that's how we run it. So we compile Rust, we get an executable. And so this stack is not the friendly interpreter stack; this is just the machine stack. Okay. And so, as one would expect, getting information about the machine stack is a harder problem. But if you have been in GDB and you've typed in bt, you've gotten a backtrace. Same thing. So we know that that is a problem that people have tackled and solved. And although what you see is not as pretty as what the Python interpreter would give you, it's still the same general notion and has the same information, modulo a lot of tricky stuff that I won't go into in this talk.
And the good news is that there is a healthy, happy crate in Rust that will get the backtrace for you. So not abandoned in any way. It works. It will obtain a backtrace through well understood methods. Okay, so getting the backtrace is, modulo very much technical detail, more or less a solved problem. So then what should be the policy? What errors should have a backtrace? And what I go with right now is every error. One argument there is that if an error is returned, then an action might be taken because of that error, which will result potentially in another error with a new backtrace. And the new backtrace will not simply be a prefix of the previous backtrace, because new actions were taken and the stack was built somewhere else and then the error occurred. But I'm so worried that this might be a bad idea that I make the accessor function return an Option type. And it might be a bad idea for a few reasons. It might affect performance in some way, or we might find there are some circumstances in which it can't be obtained. I can't think of any yet, but I'm nervous enough.
Okay. The other thing is the specifics about the error. We want that to be detailed and useful, so we put it in a separate spot. What we're doing now is we say that in every library or other isolated thing that we're building, we make it just globally defined. That is, we don't try to segment it into different parts of the library or other thing and isolate them from each other, because that seems to be more overhead than it's worth. We just make it library wide, essentially. Lots and lots of things.
Okay, and then finally there's the child. This is the thing you chain. And I just thought about it and I decided that there are two different and interesting relationships. One is simply that the parent provides additional context for the child.
And you can imagine that in the usual sort of file not found case, you get that error back and some higher level function adds additional information that says, this is the file that we were looking for, and then later something says, okay, we failed to do this action, and so forth. The other relationship is that it just might happen after, and that's more like the scenario where there's some reaction to the error and a new action is taken, but that also results in an error. And I think that these are distinct enough that we should use essentially the left and right arrows to distinguish them.
Okay, and so you can think of it as being something like this. So this is my example. I wanted to ride around on my horse smiting people, but there was this missing nail, which led to the lost shoe, which led to the horse not working. But this is sort of an updated poem, so I ran to my Jeep, but that didn't work either. And so you see that this shows the previous link, because this doesn't further explain this problem. It's a new thing. But ultimately we give up and lose the battle. It's also a good reminder about the importance of small things, I guess.
Okay, and so in order to deal with gradually adding this into an already existing large program, we use that same trick that I was saying can cause some problems earlier. We make this a separate kind of error and we implicitly promote this to our super error like we promoted everything else, every other error that we had to deal with in our program, as we gradually add it in until it encompasses everything.
Okay, so for this one I could list all the things we've worried about and thought about, but that would be boring, so I'm just gonna say we're thinking about it.
Okay, so this is the end of my talk, and rather than just asking for questions, if you have any advice I'd actually welcome it. We have this PR up. I actually have a repo for this talk, so if you have any comments you can attach them as issues, because I update talks.
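
To make the error structure from the talk concrete, here is a minimal sketch with its three parts: a library-wide kind, a backtrace behind an Option-returning accessor, and a child joined by one of two link types. This is not the code from the PR. All names are hypothetical, the bracketed labels stand in for the arrows on the slide, and it uses the standard library's std::backtrace (which captures only when RUST_BACKTRACE or RUST_LIB_BACKTRACE is set) in place of the backtrace crate mentioned in the talk.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};
use std::fmt;

/// Library-wide description of what went wrong (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    MissingNail,
    LostShoe,
    HorseUnavailable,
    JeepWouldNotStart,
    BattleLost,
}

/// How an error relates to the error it is chained to.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Link {
    /// The parent provides additional context for the child.
    Context,
    /// The parent merely happened after the child, e.g. a failed recovery.
    Previous,
}

#[derive(Debug)]
struct ChainedError {
    kind: Kind,
    backtrace: Backtrace,
    child: Option<(Link, Box<ChainedError>)>,
}

impl ChainedError {
    fn new(kind: Kind) -> Self {
        ChainedError { kind, backtrace: Backtrace::capture(), child: None }
    }

    /// Builds an error that adds context to `child`.
    fn context_for(kind: Kind, child: ChainedError) -> Self {
        let mut err = ChainedError::new(kind);
        err.child = Some((Link::Context, Box::new(child)));
        err
    }

    /// Builds an error that occurred after `earlier` without explaining it.
    fn after(kind: Kind, earlier: ChainedError) -> Self {
        let mut err = ChainedError::new(kind);
        err.child = Some((Link::Previous, Box::new(earlier)));
        err
    }

    /// Returns the backtrace only if one was actually captured.
    fn backtrace(&self) -> Option<&Backtrace> {
        match self.backtrace.status() {
            BacktraceStatus::Captured => Some(&self.backtrace),
            _ => None,
        }
    }
}

impl fmt::Display for ChainedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}", self.kind)?;
        match &self.child {
            Some((Link::Context, child)) => write!(f, " [context for] {}", child),
            Some((Link::Previous, child)) => write!(f, " [after] {}", child),
            None => Ok(()),
        }
    }
}

fn main() {
    let nail = ChainedError::new(Kind::MissingNail);
    let shoe = ChainedError::context_for(Kind::LostShoe, nail);
    let horse = ChainedError::context_for(Kind::HorseUnavailable, shoe);
    let jeep = ChainedError::after(Kind::JeepWouldNotStart, horse);
    let battle = ChainedError::context_for(Kind::BattleLost, jeep);
    println!("{}", battle);
    println!("backtrace captured: {}", battle.backtrace().is_some());
}
```

The Jeep failure is linked with Previous because it does not explain the horse problem; it is a new failure that followed it.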
And then we had at one time a design PR, but of course that encountered reality, so it's now lagging a bit. And yeah, that is it, so.
So thank you for a very good talk. I've also been working in Rust for about a year, and this is a perpetual pain point. Just a couple of other comments and then a question. One of the things that's nice about the Python method is that errors don't end up as part of your ABI, so you can add additional errors, for example for additional cases, and it doesn't break every application that depends on you. I'm not sure we can really do anything about that in Rust, because having them in the ABI is also nice, but it does represent at least a pain point for using errors. Well, I have several questions. One is about your all error type: do you then have implicit parameterization for the Into trait so that you can use the question mark operator to implicitly convert all errors into all error?
Um, so I think the question is quite understandable, so, about the implicit conversion, but I missed some of it.
So let me back up and explain for people who don't know. There's another trait called Into in Rust which allows you to convert from one type to another, or actually the From trait is the primary one, but as long as you have this trait implemented for your type and you use the question mark operator, then Rust will automatically convert it to whatever kind of error you're trying to go to. And so my question is, you have the sub error section, right, where you have a parent and a, what was the other term? I forget, concurrent?
Constituent.
Constituent, yes, thank you. So it would seem to me like you could have a fairly good automatic conversion using the question mark operator, such that if an error occurs and your function is returning all error, Into would convert whatever error type occurs into an all error, listing the source as the parent type.
And that would allow you to chain, I think, by just using the question mark operator.
You know, this is, I think we will have to write this down.
Yeah, no problem. It's a very complex topic with a lot of subtle interactions.
Yeah, so basically I can't tell which direction you're converting in. So I think it would be better if we took this one offline.
Yeah, we can do that.
Because I'll just go around in circles.
My second question is, are you actually proposing this for a crate? Or are you proposing it upstream?
Not at this point. So basically, as Stratis tech lead, well, I feel it's necessary that we have a better situation than what we have. So this PR is not a separate crate; the idea would be to directly incorporate it into Stratis. We need to mature it before we can turn it into something that is a crate. I honestly sort of think that these earlier efforts, which are quite respectable, error-chain is very respectable, might have been a little bit rushed into generalization. And so, not only-
You want some time to work with it before you really turn it into something general.
I think it needs to be exercised because they-
How widely are you using this in the Stratis code?
So remember, that's a PR. We're experimenting with shoving it into more places. We started working with things that were very much at the leaves of the whole Stratis project, in stratisd. Now we're trying to pick off other bits, just interesting bits, and shove it in again. But I think why this process has to be so gradual is that, I didn't talk about the complexities here, but if you wanna chain a new error onto potentially a bunch of other errors, you rewrite the method. The way I do it is to make a closure, shove everything that was in the old method into the closure, and then chain the new error onto it. But that's a rewrite.
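
The closure rewrite described in that answer might look something like the sketch below. It is a guess at the general shape only, not the Stratis code. LowError, HighError, and sum_fields are hypothetical names, and the chaining is reduced to a single source field.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical lower-level errors that the old method body could produce.
#[derive(Debug)]
enum LowError {
    Parse(ParseIntError),
    Empty,
}

impl From<ParseIntError> for LowError {
    fn from(e: ParseIntError) -> Self {
        LowError::Parse(e)
    }
}

/// Hypothetical higher-level error that chains onto whatever went wrong below.
#[derive(Debug)]
struct HighError {
    message: String,
    source: LowError,
}

impl fmt::Display for HighError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} (caused by {:?})", self.message, self.source)
    }
}

/// Sums the comma-separated integers in `line`.
fn sum_fields(line: &str) -> Result<i64, HighError> {
    // The old method body moves into this closure unchanged, including
    // its early returns and its uses of the question mark operator.
    let body = || -> Result<i64, LowError> {
        if line.trim().is_empty() {
            return Err(LowError::Empty);
        }
        let mut total = 0;
        for field in line.split(',') {
            total += field.trim().parse::<i64>()?;
        }
        Ok(total)
    };

    // Every error that escapes the closure is chained onto in one place.
    body().map_err(|source| HighError {
        message: format!("failed to sum the fields of {:?}", line),
        source,
    })
}

fn main() {
    println!("{:?}", sum_fields("1, 2, 3"));
    if let Err(e) = sum_fields("1, x") {
        println!("{}", e);
    }
}
```

The cost mentioned in the talk is visible here: the function has to be restructured by hand, so the technique cannot simply be switched on across an existing code base.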
And so it requires actual developer application and thought to do that.
One last question, then I'll give up the mic. So no_std is also a thing in Rust, and I did notice you were using Box there, which would be fine with alloc, but if you don't have alloc, it's gonna be a particularly difficult situation for you. So that's something to consider in your generalization efforts, how things work with no_std and no alloc.
That's a good caveat.
This is the last talk of the day. They ask that everyone leave, not just the rooms, but the building, because we're setting up in that main area for the party. The recommendation is, there'll be food at the party, but it will be more of the-