How's everybody doing today? All right. So my name's Eric Tabanek. We're going to talk about building command line tools in Python. If this works, I'll control from my phone. If not, I'll do it from the computer. But yeah, so if anybody was here for the last talk, there were some really excellent sort of standards about reusing code and whatnot. I'm going to try and carry some of those themes forward, although not in quite as much detail.
So I work at a company called Game Changer. We build a sports scoring app for amateur baseball, softball, and basketball. And so I wanted to talk a little bit about one of my colleagues. So this is my good friend, Sean. Sean's a product marketer on our admin team. He essentially works with all the coaches that use our product. He sends a lot of emails trying to get them to come back the next season around, score more games, things like that. And so Sean will often send a bunch of emails. And a lot of times, he wants to test things out. So maybe he's testing a headline. Maybe he's using a different color. I don't know. He's going to send two different emails and try and figure out which one gets people to interact more.
So Sean will ping me on Slack and say, hey, I sent out these two emails. One of them, I sent 360-some emails, and 43 people clicked through. And the other one was 323, and 52 people clicked through. So which one performed better? Well, it's pretty obvious that the second one performed better. We got more clicks with fewer sends. But is that statistically relevant? Should we say that we should always send this, whatnot? So I have a stats background. There's a number of different ways to solve this. My personal favorite is using Bayesian statistics. So we're not going to actually get into the Bayesian statistics in this talk. This is worthy of probably a whole hour-long talk on its own, but very pro-Bayes.
So I start poking around the internet trying to figure out what's the best way to evaluate this, find some code. So I found this really awesome snippet of code online. I tried to give as much credit to the guy who actually wrote this as possible. And he actually wrote this really great blog post about how to run an A/B test using Bayesian methods. And at the end, here's some code to run it. So I'm like, sweet, that's what I need. All those formulas and whatnot look right. I'm going to trust you and go with it. So I take his code, and I copy and paste it. And I throw in the numbers that I need. And I adjust the alpha and beta levels because I don't really have any priors that I want to put in there. If you don't understand what that means, don't worry about it. And so I run this. And sure enough, it tells me that the probability that the A group was better than the B group, the one that we thought didn't work as well, was only 5%. So pretty solid evidence that B is better than A.
So yeah, all right. Good job. Let's all go have a beer. Unfortunately, I have another 15 minutes or so to talk. And I think if we were all to go have a beer now, you guys might not get so much out of it. I'm supposed to be talking about command line utilities.
Anyway, so not quite so fast. Sean's back, and he's done more. He sent a lot more emails this time to the B group because it did better. And he wants to know, is it still doing better? So I go and I open up this file that I saved somewhere on my hard drive and go and put in these new numbers. I probably should have updated my alpha and beta in this case because I did have some prior information, but whatever. And I print it out. And this time, A did a whole bunch better, like a lot better. And Sean's like, all right, that's cool. Like, did it do 10% better? And so I go back to that blog post. And I find they're like, oh, hey, yeah. There actually is a way to measure how much better it did, or at least give a probability.
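The blog post's code is not reproduced in this transcript, so what follows is only a minimal sketch of the kind of calculation being described, not the author's snippet. It assumes a uniform Beta(1, 1) prior (one reading of "I don't really have any priors"), uses 360 as a stand-in for the "360-some" sends, and estimates the probability by Monte Carlo sampling rather than whatever formula the post used. The function name, the `lift` parameter, and the seed are hypothetical.

```python
import random


def prob_b_beats_a(n_a, clicks_a, n_b, clicks_b, lift=0.0,
                   alpha=1.0, beta=1.0, draws=50_000, seed=0):
    """Estimate P(rate_B > rate_A * (1 + lift)) by sampling Beta posteriors."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    wins = 0
    for _ in range(draws):
        # Posterior click rate for each group under a Beta(alpha, beta) prior.
        rate_a = rng.betavariate(alpha + clicks_a, beta + n_a - clicks_a)
        rate_b = rng.betavariate(alpha + clicks_b, beta + n_b - clicks_b)
        if rate_b > rate_a * (1.0 + lift):
            wins += 1
    return wins / draws


# Sean's first question: 43 clicks from roughly 360 sends vs. 52 from 323.
p_b_better = prob_b_beats_a(360, 43, 323, 52)
print(f"P(B beats A) = {p_b_better:.3f}")
print(f"P(A beats B) = {1 - p_b_better:.3f}")

# Sean's follow-up: is B at least 10% better (relative lift)?
print(f"P(B beats A by 10%+) = {prob_b_beats_a(360, 43, 323, 52, lift=0.10):.3f}")
```

Under these assumptions the chance that A is the better email comes out as a small single-digit percentage, in the same ballpark as the 5% quoted in the talk; the exact figure depends on the prior and the true send count.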
So we say, hey, yeah, it's actually really strong that it was at least 5% better. And so Sean comes back to me, well, what about 10%? So I go back to this file and update it. And now I've got a couple files floating around with basically the same little code. The last line's different. We're breaking one of the sort of fundamental rules of writing code. If you were here for the last talk, this came up. Don't repeat yourself. Don't repeat yourself. Don't repeat yourself.
So someone else takes a look at this or says, hey, what are you doing? And I'm like, oh, well, maybe I should make this a little bit better. So I go and update this and create some functions that actually make this easier to use. And so now I've got some nice functions that encapsulate everything. And if I want to run this again for different levels, I can test 5%. I can test 10%. I can test 20%. And I'm like, hey, look, Sean, as you get past 10%, the probability drops off quite a bit. So it's probably about that much better. So this is pretty cool. This is in a pretty good state.
And you know what? I'm like, hey, these are some functions that other people might want to use. That blog post was nice. But you know what? Just like with that blog post, sharing is caring. And so I want other people to be able to use this technique, because it got one shot earlier. But Bayesian theory is a great way of doing this. The blog post talks way more about this. But it's a way better way of trying to understand these types of things. So I'm like, hey, let's make a package out of this. So this is Python. That's actually pretty easy to do. We're just going to release those functions. So we write a setup.py. This is another thing that could go on for its own talk about how to release stuff with pip and whatnot. I'm sure there's some great talks out there. So I'm not going to get into all of the dirty details here, but essentially we can do this. And we can build it. And we can upload it.
And now someone can go and pip install it. So someone else comes and says like, hey, I want to use that thing you built. I'm like, all right, great. So you can create a virtual environment. You can go and install it with pip. And then you just got to open up your Python interpreter. And you need to import the right function. If you're not sure exactly how to use that function, you may need to go to GitHub to look at a README or look at the source code. And now you have to put in all of these numbers that aren't very clearly labeled. And you'll get a probability. So unless you have intimately read this code, and granted, it's not super long, this is not super intuitive or super usable. There's a lot going on here. And so for the newcomer to this piece of code, which is really designed to do two pretty basic things that are very similar, there's a lot of context that's needed. There's sort of a heavy burden on the user to get this to work.
So what we'd prefer is to have a command line tool where we can run it. And it will spit out something and say like, hey, here's how you use this. You can run this A/B test. And you can tell it what's the population size, the success size, the second population size, its success size. And here's some other parameters that you can fiddle with if you'd like. And so we can input those. It's a quick hit of the up arrow and change one of the variables to figure out what things look like. So this would be super handy. And so this is Python. We can do what we want. And usually it's pretty easy. So let's build this thing. That's what Python is great for.
So if you search around for command line utilities and taking command line input for your script, the first thing you're going to come across is argparse. argparse is part of the standard library. It's quite powerful, but it's also quite low level. It can be fairly cumbersome to use. I would compare it somewhat to trying to make orange juice by squeezing the orange on your eye.
If anybody remembers this, you mean there's a better way? There is a better way. And that better way is called docopt. Docopt was introduced to me by a co-worker, and it's awesome. It's a descriptive language for creating command line interfaces. It was written first for Python, but now actually exists for a whole bunch of other languages. And so with docopt, we can essentially, in the docstring for our command line utility, define a command line interface. So here we only have one usage, and we can define the variables that are going to be dropped in. We can define optional variables, which are here represented in the square brackets. The docopt website obviously describes all of these rules and whatnot. We have another one that is optional, but there's an or in there. So you can only do one or the other, and otherwise it will raise an error. And we have a couple more optionals. When we print, like, help, it actually gives us this full docstring so that we can actually see it. We can give a fuller description of what these options and variables do. And the really nice thing is you just drop the docstring, which is available as __doc__ in your script, into the docopt function, and it returns to you a dictionary with all of these elements as keys and the values that were passed in by the user, or the defaults, as the values. So that makes it super easy to use.
Before we actually get into how this is used, I wanted to show a few examples of more complicated ones, because this one's pretty basic. So here is Dusty. Dusty is a tool that was built by some of my coworkers for managing your workspace environment through Docker containers. It's a great tool. If you have a bunch of services and whatnot, you should check out Dusty. It's a great way to set that up and have that running locally. So here's like the main Dusty command line tool. When you run the Dusty command, here's all the different commands that it allows you to do.
And so all of that's done through docopt. Here's one of the specific commands. So this is the bundles command within that. And then it has its own commands. So you can extend it to a whole bunch of sub-commands.
Here's another quick little tool that I wrote. So I will often find myself in the psql command line interface writing queries. And sometimes those queries can take a while to run. And then all of a sudden I'm like, you know what? I really want this table of data to be in a Google spreadsheet so I can send it to someone in marketing or finance or something like that. People who wouldn't really understand this output or what a query is or whatnot and probably want to operate on it somehow. And so to do this in Python is pretty simple. It's just some string replacement and clearing some white space and whatnot. It's fairly straightforward. But then to actually use this regularly and be able to copy from my terminal and then paste that in, a command line utility is perfect for that. So this just, boom, pops it out. And so in order to write the spec for how that works, we can use a file if we want or we can pipe in and out of it. And so this is just another example of a really simple command line utility built with docopt.
So back to our Bayesian probability calculator. So if this is our interface, this is the implementation to essentially run that. So maybe 20 lines or so, it's not super crazy. Essentially, we just pull out all the variables that we care about. They're all coming in as strings. So we convert them to the appropriate numerical types. We want to check if a relative or absolute effect was passed in. And if so, call the right thing and then print out the result. And that's about it. So if we do that, now we have a CLI. So we can call Python and our CLI and throw in the variables. I didn't put that docstring there at the top. But if you just called Python CLI, you get the docstring explaining how it works. So this is a bit cleaner. That's nice.
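The slide with the actual implementation is not part of this transcript, so the following is a rough sketch of the steps just described: pull the values out of the docopt dictionary, convert the strings to numbers, check whether a relative or absolute effect was given, and print the result. The usage text, the option names (`--relative`, `--absolute`, `--draws`), the file name `cli.py`, and the helper functions are all assumptions made for illustration. The sampling function is a Monte Carlo stand-in with uniform priors, not the talk's code.

```python
"""Bayesian A/B test (hypothetical interface, for illustration only).

Usage:
  cli.py <n_a> <success_a> <n_b> <success_b> [--relative=<r> | --absolute=<a>] [--draws=<n>]
  cli.py -h | --help

Options:
  -h --help       Show this message.
  --relative=<r>  Relative lift B must exceed, e.g. 0.10 for 10%.
  --absolute=<a>  Absolute lift B must exceed, e.g. 0.02 for 2 points.
  --draws=<n>     Number of Monte Carlo draws [default: 50000].
"""
import random


def prob_b_beats_a(n_a, s_a, n_b, s_b, relative=0.0, absolute=0.0,
                   draws=50000, seed=0):
    """Estimate P(rate_B > rate_A * (1 + relative) + absolute), uniform priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + s_a, 1 + n_a - s_a)
        rate_b = rng.betavariate(1 + s_b, 1 + n_b - s_b)
        wins += rate_b > rate_a * (1 + relative) + absolute
    return wins / draws


def run(args):
    """Turn a docopt-style dict (all values are strings or None) into a probability."""
    n_a, s_a = int(args["<n_a>"]), int(args["<success_a>"])
    n_b, s_b = int(args["<n_b>"]), int(args["<success_b>"])
    # Options that were not passed and have no default arrive as None.
    relative = float(args["--relative"]) if args.get("--relative") is not None else 0.0
    absolute = float(args["--absolute"]) if args.get("--absolute") is not None else 0.0
    draws = int(args.get("--draws") or 50000)
    return prob_b_beats_a(n_a, s_a, n_b, s_b, relative, absolute, draws)


def main():
    # Imported lazily so the rest of the sketch runs without docopt installed.
    from docopt import docopt
    print(run(docopt(__doc__)))
```

Keeping `run` separate from `main` means the conversion logic can be exercised with a plain dictionary, while `main` stays a thin wrapper that a script or console entry point can call.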
So now let's go back to our end user who's going to download this via pip. It turns out that you can actually run this with the module flag, python -m. And maybe that's a bit easier. I was being a bit hyperbolic here. But trying to run that script when you download it with pip is actually kind of a pain. It doesn't really live anywhere nice. I had to use grep to go and figure out where, within my virtual environment, this thing was actually stored. And this is a mess. This is actually maybe worse than having them throw open the interpreter and import the function and use it themselves. So this isn't very helpful. This is definitely not something that you're going to tell someone to do. Like, oh, yeah, just run this thing and find the Python file in your virtual environment. So that's not going to work.
This is the key to making it all happen. It's called an entry point. So back in our setup.py file, we can define an entry point for a console script. And so by adding this one little line that says console, four little lines that say console scripts. And then we give the tool a name. So this is a Bayesian AB test. And it says that entry is now at CLI main. And so it's going to call that function. So now, back out on our command line, once we're in our virtual environment and install via pip, all of a sudden now we have this tool. It's just called Bayesian AB test. So we can call it and we can get the docstring. And we can call it with the inputs and get all the right things. So this now is super helpful. We can either install it in a virtual environment and then go to that virtual environment when we want to use it. Or we could just install it sort of on our whole system. And whenever we need to use it, we just type the line and put in the four numbers. And Sean has his answer real quickly. He can Slack me and I can Slack him right back. It'll be a matter of copy and paste.
So yeah, CLI tools are awesome. They're awesome because they help make your code DRY.
They're awesome because they help make your code shareable. And they're particularly awesome because they turn your code from a script into a tool that's easy for you and others to use. So that's all I got. Thank you for your time. Thanks for coming. I think I have a few minutes for questions.
Yeah, there's some time for your questions. If you're sitting at a desk, please use the mic. You can turn it on with a silver button in front of the microphone and just be sure to turn it off when you're done. Otherwise, just be loud.
It seems argparse offers a lot of granularity. Like, you can control the types of your input variables and stuff, and it seems that kind of washes out with docopt.
Yeah, I believe so. There may be more advanced functionality in docopt that does allow you to specify that, but it wasn't immediately obvious when I've used it. So yeah, I think it's a difference between using something that's very powerful but can be a bit more robust to write versus something that cleanly integrates your documentation with the interface. Anything else?
Yeah, making a Slack bot out of it. That's actually a really great idea. Yeah, making a Slack bot that's a Bayesian A/B test. So then Sean doesn't even have to bug me at all. That's gonna be my next hackday project. Thank you. Any other questions? Well, thanks guys. That was a lot of fun.
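For reference, the console script entry point described in the talk could look roughly like the setup.py fragment below. The slide itself is not in the transcript, so the package name, version, command name, and the module layout (a top-level `cli.py` exposing `main`) are assumptions. Only the `entry_points` / `console_scripts` mechanism comes from the talk.

```python
# setup.py (sketch; package, module, and command names are assumptions)
from setuptools import setup

setup(
    name="bayesian_ab_test",
    version="0.1.0",
    py_modules=["cli"],            # assumes a single top-level cli.py
    install_requires=["docopt"],
    entry_points={
        "console_scripts": [
            # "<command name> = <module>:<function>"; pip generates an
            # executable with this name that calls cli.main().
            "bayesian_ab_test = cli:main",
        ],
    },
)
```

With a layout like this, installing the package with pip puts a `bayesian_ab_test` command on the PATH of the active environment, so nobody has to hunt for the script inside the virtual environment.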