Let's give Brett a warm welcome.

Good afternoon. Thank you all. You can hear me fine? Sounds great. How are you feeling? It's good to be here, right? I know it's a little bit of that post-lunch lull, and we're heading into the break, so you're thinking about your next coffee. I am too, after I get over the talk nerves. Since Josh introduced me so well, I won't belabor the point. If you want to follow along with the slides for this talk, for accessibility reasons or just because, I've got a QR code that'll take you right there. Just point your mobile device toward the front of the room and scan it, and you'll be able to head right there.

To give a bit of background: Josh mentioned that I work on a lot of technical projects. I love Python, and I work with Python a lot. One pattern I find myself implementing a lot is writing command line tools for people, and the people aren't always me, so I need to think a little bit about how to make them user-friendly and accessible. When I start, usually I start by trying to write a shell script. How many of you have written a shell script, or are at least familiar with the concept? Okay, a good number, so hopefully this will be a useful talk for all of you.

Shell scripts are really handy: when you have tools with compatible input and output formats, it's very easy to set them up to talk to each other seamlessly and get the useful result that you want. Typically we do that with shell pipelines. If you're not familiar with that terminology, we can walk through a few examples. Shell pipelines are literally indicated with the pipe character in between two commands, and it means: take the output of the command on the left and make it the input of the command on the right.
So here are some examples of pipelines you might see commonly. Here's probably the first one that everyone learns: take the ls command that lists files and pipe it into less, so that I can scroll through the list and actually read it at my leisure, rather than having it go whoosh all the way down my terminal real fast. That's a pipeline: we take the output of ls, which is just a plain text list of files, and make it the input of the less command, which then gives us an interface to scroll through it.

Another common one that's often taught to beginners: if you've got a tar archive, it's usually compressed in some way, right? So what we can do is run that file through the decompression tool, which might be zcat or bzcat or xzcat, and then feed the resulting output into the tar extraction command, or the tar list command, so that we can see what's inside that file.

But we can do more sophisticated things. It doesn't just have to be one command on the input and one command on the output; we can keep chaining. Here's a personal pipeline I use every now and then to publish lightweight webpages. There are command line tools called smartypants and markdown. markdown does what you think it does: it takes Markdown-formatted input and gives you HTML on the output, so that's how you render the Markdown. smartypants just adds nice typography to that: em dashes, proper curly quotes, things like that. So one pipeline I've written occasionally just takes a Markdown file, passes it through smartypants, passes it through markdown, and that gives me a nice chunk of HTML, right?
And then I can use the cat command, which concatenates files or inputs, to take an HTML header, that HTML chunk I got out of markdown, and an HTML footer, and combine it all into a single HTML page that's ready for serving. If you want to write blog posts but you haven't actually set up a blog yet, this is a great way to do it. Or at least it's convenient. So if you've got tools that you can chain together like this, you can do some very powerful things.

Here's an example of a kind of pipeline I've written in the past. I'm not going to step through every technical detail of it, but what it's saying is: go through a source directory recursively, find all the files with extension .pdf or .odt, so basically documents, and print out a list of them. Then we turn that list into arguments for cp, the copy command, and copy them into a destination directory. So what this is going to do is recursively search a source directory, find all the documents in it, and copy them all into a single target directory, whether for reorganization or backup or whatever purpose you need. That's a very high-level concept to express in just a single shell pipeline like this. This is a lot of power; you can do some pretty impressive things if you're handy with the shell.

But one limitation of the shell is that what these pipelines rely on is that the output of one tool and the input of another tool are formatted the same way. And in fact, there's a little bit of almost-cheating going on in this last pipeline. You see here this -print0 argument to find, and this -0 flag to xargs? Basically these two flags cause find's output to be compatible with xargs's input; we're kind of having to tell them to talk to each other correctly. And if you don't use these flags, bad things can happen if your PDF file names have spaces or newlines or other unusual characters in them, so it's important to include them.

And so one issue I run into when I start writing shell scripts like this is that occasionally I'll have an impedance mismatch in my pipeline: one tool will output YAML, and I'll need to do a little bit of manipulation or selection on that, and then feed it as input to something else. This is where I start really bumping up against the limits of shell scripting, because in shell scripts, arrays are awkward to work with, and hashes, dictionaries as we would call them in Python, are completely non-standard. So you get tasks where you think: if I could just parse this YAML in Python, it would be really easy. It would give me a dictionary, I could do my selection there with a dictionary comprehension, and then feed it as the input to another thing. That can be kind of difficult to do with shell.

So it would be really nice if I could take the power of these pipelines and have it in Python. Fortunately, you can. Here's how you can take a shell pipeline and write it in Python, and I'm going to use one of the examples from the previous slide, the tar extraction one. There's a subprocess module in the standard library.
You can import it right now. In it you can create Popen objects, and a Popen object basically represents a running process. I encourage you to refer to the documentation, because Popen actually takes a lot of different arguments to control exactly how you want to run the process. One thing you can do is take the different file descriptors that every process has, the standard input, output, and error, and determine how they get redirected. If you say you want to redirect something with the PIPE constant out of the subprocess module, what happens is that the Popen object you get back will have an attribute that is a file-like object you can read or write, depending on which file descriptor it is. So this is saying: start the xzcat command with whatever other arguments my script was given, and then give me its standard out as a file object.

So here's an example where I'm sort of mixing my shell command with my Python data structures: I'm taking a list of arguments that I got from the outside world, and I'm using them as arguments to xzcat. Now, this is very simple; this is something we could do in shell, we're just passing more arguments along. But instead of being argv, this list could be one I got from a YAML data structure, or JSON, or some other more sophisticated data manipulation that I did in Python. Any of those could just as easily go there, so this is already a good example of how I can mix and match these things.

Once I've got this standard out file object on my xzcat Popen object, to set up the pipeline all I have to do is start another process, in this case tar. There's the command for tar, and now again I'm going to set one of its file descriptors, but instead of setting it to a pipe, I'm just going to pass it the file object that I already have from the previous process. And this is literally what a pipeline is, right?
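To make that wiring concrete, here's a minimal runnable sketch of a two-process pipeline. Since a real compressed tarball isn't handy here, echo and tr stand in for xzcat and tar; the Popen plumbing is exactly the same.

```python
import subprocess

# First process: stand-in for `xzcat something.tar.xz`.
# stdout=subprocess.PIPE gives us producer.stdout as a readable file object.
producer = subprocess.Popen(["echo", "hello world"], stdout=subprocess.PIPE)

# Second process: stand-in for the tar extraction command.
# Its stdin is literally the first process's stdout -- that's the pipeline.
consumer = subprocess.Popen(
    ["tr", "a-z", "A-Z"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
)
producer.stdout.close()  # parent drops its copy; data flows process to process

output = consumer.stdout.read()  # b"HELLO WORLD\n"
producer.wait()   # first process in the pipeline...
consumer.wait()   # ...then the one reading from it
```

Note that Python never touches the bytes moving between the two processes; the kernel pipe connects them directly.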
We're taking the output of one command and making it the input of the next, and you can literally see that reflected here: this is the output of the first command, and it is the input of the second. And this is a very efficient way to do it. When you do it this way, Python doesn't deal with the data being passed between your two commands at all; it goes directly, so you're not introducing any overhead or anything like that to the data processing.

Popen objects are context managers, so we can put them in a with statement, and as usual, once the with block finishes, Python will wait for the process to finish, clean up all your file handles, things like that, and take care of all the cleanup for you without you having to worry about it. So if you wanted to do some additional interaction here, like printing out a progress bar or something like that, you could do it inside this block. But if you just want to wait for your pipeline to finish, you can just put pass in the body of the with statement, and you'll just wait for it to finish.

One slightly tricky thing to note here: the processes are in reverse order from the way the pipeline would be written, so we have the last process in the pipeline first, and the first process last. That's because, remember, context managers create a stack, so when the with block finishes and we unwind that stack, we're going to wait for this process first, which is what we want because it's first in the pipeline, and then this one. That part is slightly counterintuitive, but it makes sense if you think through how context managers work.

So that's pretty straightforward, right? That's a nice clean way we can write shell pipelines in Python. Now I have to make a quick aside here. Someone in the audience is chomping at the bit right now, and they want to tell me: you don't have to pipe xzcat into tar! Modern tar already knows how to decompress its input, and we don't have to run a second command.
And yes, you are correct. Absolutely, that is true. I want to work with this as an illustration because it's a relatively simple pipeline, and I think a lot of people are familiar with the tools involved. Odds are there are tools you often use in shell pipelines that you could use a technique like this for, but it's difficult to find a set of example tools that everybody can use as a reference point. That's why I've stuck with xzcat and tar: they're very common tools. But don't literally do this, please. You don't need to; just say tar x and you're good.

Okay. So when we're writing shell pipelines in a shell script, when we're chaining these things together, one neat feature of shell scripts is that right at the top you can say set -e. What this means, if you're not familiar with it, is that the shell checks the return code of every program that you run inside the script, and as soon as one of them fails, it stops the shell script and doesn't continue, unless you specifically added some error handling to deal with that case. This is a really nice safety feature to make sure that the shell script doesn't continue on after execution stops making sense. It's actually very similar to the way that we're very big on exceptions in Python stopping the execution of your program when something unexpected happens; set -e is really similar to that.

And so you might want to do this for the programs you're invoking from Python. You might want to be able to say: if one of these fails, I'm not going to be able to do anything else, so stop. And if you ask around, if you search around on the web: how can I do that?
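Here's a sketch of where this is heading: the whole trick for getting set -e behavior without giving up pipelines fits in a tiny subclass of subprocess.Popen whose wait() raises on a nonzero return code. (The choice of exception type here is mine; the idea is what matters.)

```python
import subprocess

class CheckedPopen(subprocess.Popen):
    """A Popen whose wait() raises on failure, like `set -e` in a shell script."""

    def wait(self, *args, **kwargs):
        returncode = super().wait(*args, **kwargs)
        if returncode != 0:
            # Reuse the exception that subprocess.run(check=True) would raise.
            raise subprocess.CalledProcessError(returncode, self.args)
        return returncode  # stay API-compatible with Popen.wait()
```

Anywhere a pipeline says subprocess.Popen, saying CheckedPopen instead is the only change needed.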
You'll find really quickly that they'll tell you the subprocess module has functions for that. You can just say subprocess.run with the check=True argument, or, if you're using an older version of the subprocess module, it has a function called check_call. What those do is wait for the program to finish and then check the return code, and if it wasn't successful, i.e. if it wasn't zero, the subprocess module will raise an exception for you.

But that's kind of tricky, right? Wait: we're waiting for the process to finish, but the whole point of a shell pipeline is that you've got two processes running simultaneously, one providing input for the other. We can't afford to wait for a process to finish. So how can we get the best of both worlds? How can we have our Popen objects, which let us pipe processes together, while still having error checking?

It's about a three-line subclass. So you will have to add this to your own script, but all you have to do is make a very quick extension to the wait method of subprocess.Popen. All Popen objects have a wait method, which does exactly what you think it does: it waits for the underlying process to finish and then returns the return code of that process. So what we can do is override that method to say: call the original wait method of Popen, so it returns the return code. If that's not zero, raise an exception, just like subprocess.run would. And if we were successful, just return the return code, like the original wait method would, so we stay API-compatible.

That's pretty simple; you can drop it into your script pretty easily. Then, for any pipeline where you want to be doing this kind of error checking, instead of saying subprocess.Popen you just say CheckedPopen, and that's it. That's literally the only thing that's different about this entire block of code. So you might be wondering:
well, wait, where does the error checking happen? It happens right here. Remember how I mentioned that the context manager waits for the underlying subprocess to finish? It does that by calling the wait method, so implicitly, your context manager is calling the error handling code that we added through this method. That lets you write your error checking code once, and then very easily adopt it into the rest of your script, where you're already writing these pipelines. So that's pretty nice.

So once we've got that, how do we report the errors that we catch with set -e or with our new wait method? In a shell script, you kind of don't have any error reporting, but it kind of works out, because normally, if a tool like tar or xzcat encounters an error, it'll print an error to its standard error file descriptor, the user will see that, and then your shell script will stop. So normally the last thing your user will see is the error message that came from the underlying tool, and hopefully that's helpful enough that they can do something with it. Unfortunately that's not always true, but it is a start, at least.

Now that we've adapted this in Python: let's say, for example, that we've got our tar extraction pipeline in Python, and the user gives it something that isn't actually a tar file. The output will typically look like this. Just like with a shell script, on standard error you'll get an error from tar explaining that it can't actually extract this thing. So far so good. But then you get about eight lines of Python traceback, from where we raised the exception when we did our error checking after the fact. So this does tell the user what went wrong, but the user can't really do anything with most of this information. Even if they're familiar with Python, even if they're familiar with their
script (i.e., it's future you), this is a lot of information to parse through to understand what actually went wrong: that you used the tool incorrectly, which is something you can do something about right now to fix.

So is there something we can do to improve this error reporting? There actually is. If you've seen a lot of these tracebacks, have you ever asked yourself: where does this printout come from? What prints out this traceback that tells me where the exception came from in my script? It's built into Python, but it's also customizable. What Python does is call a function, sys.excepthook from the sys module, to do exactly this kind of error reporting. When you start Python, this function is the one that prints out the traceback with all of this information. But if you want to customize it, you can just set a different function as sys.excepthook.

So here's an example of a custom excepthook you might write for yourself. Every excepthook takes three arguments: the exception type; the exception value, which in modern Python will normally be an instance of the exception type; and a traceback object that has the details of everything that happened leading up to this exception. In the simplest case, you can simply check the exception type and see if it's a kind of exception that normally indicates the user made a mistake, or that there's some problem with the user's input, and then generate a custom error message accordingly.

What this one does is look at a couple of different exception types. If it's CalledProcessError, we'll grab an error message that's just the exception string, but without the entire traceback, because that part is actually useful.
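As a concrete sketch, a hook along those lines might look like this; the exact messages and exit codes are my choices for illustration, not the slide's verbatim code:

```python
import subprocess
import sys

def friendly_excepthook(exc_type, exc_value, tb):
    """Turn 'the user made a mistake' exceptions into short, readable errors."""
    if issubclass(exc_type, subprocess.CalledProcessError):
        # Just the exception's message -- no eight-line traceback.
        print(f"error: {exc_value}", file=sys.stderr)
        sys.exit(1)
    if issubclass(exc_type, KeyboardInterrupt):
        # The user hit Ctrl-C; acknowledge and stop quietly.
        print("interrupted", file=sys.stderr)
        sys.exit(1)
    # Anything unexpected: fall back to Python's default reporting.
    sys.__excepthook__(exc_type, exc_value, tb)

# Install it; Python calls this for any uncaught exception from here on.
sys.excepthook = friendly_excepthook
```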
We can give the user that. If we get a KeyboardInterrupt exception, usually because the user hit Ctrl-C at the terminal, we'll just say: okay, you interrupted us. And if anything else happened: Python always has access to its original exception hook, its default one, as sys.__excepthook__, so in your custom excepthook you can call that directly to fall back to the normal behavior for cases you're not prepared to deal with. That's what I've done here. Then, if we did catch something we recognize, we just print the exception and exit 1. And then we actually install it as sys.excepthook.

There are a million things you might want to customize here. You might want to customize the error message; you might want to customize the handling; you might check whether or not a debug flag is set, and if it's set, not install this as the excepthook and just use Python's default one. The world is your oyster; do whatever makes sense for you and your users. But hopefully this is a useful illustration of the technique, both of overriding sys.excepthook and of common things you might want to do in it, like writing new error messages or customizing the exit code, so it's not just always 1. In this case it is, but you could set whatever you wanted.

So once we have that installed and we run again with the same bad input, the output from our script will look more like this. We'll see the error from tar, and then there will be another error, from your script specifically, that says: okay, tar returned exit status 2. If a user understands that they're extracting a tar archive, this is an error message that they can probably do something about. They'll think: oh, that thing I thought was a tar archive apparently isn't, I need to go back and check and maybe give it a different input. That's something most users will find a lot more friendly.

And so that's it. There are just a few
simple things that you can do in your Python script so that you can call shell commands, and even orchestrate them into pipelines, and get a lot of the user-friendly features of shell scripts as well. You create pipelines by creating chains of Popen objects, where the output of one Popen object is connected to the input of another. You can extend Popen objects to do things like error checking, and potentially other cleanup as well, to get shell script behavior like set -e. (Don't drop my water bottle.) And you can set your own excepthook, so that when you get exceptions from this kind of error checking, you can do more user-friendly error reporting: something that's more useful to non-programmers than the default Python traceback, which they wouldn't know what to do about.

So that's my talk. Thank you all very much. If you want to grab some of this code: all the code that's been on these slides is available, and it's got nice docstrings. It also shows you how to do Python 2 compatibility, and it's got a couple of extra features in there; I won't spoil the surprise. Go ahead and check out that code, and you can drop it right into your scripts and start using it today, hopefully. If you have any questions, I'll be around for the rest of the day, and there's my contact information at the bottom. Thank you all very much.

Thanks. Thanks, Brett.