Thank you, and hello there. My name is Jeroen and in the next 20 minutes, I'm going to try to convince you that if you have written some code, you should consider turning it into a command line tool. Here is what I have in mind. I'm going to say a few words about myself so that you know where I'm coming from. I'm going to briefly explain what the command line is, what tools are, and most importantly, why you should care about this. Then we're actually going to build a command line tool so that you can see how that process goes, and then I'm going to wrap up.

So a few words about myself. I have a background in computer science and artificial intelligence. I did my PhD in machine learning, and this was around 2010. Back then, I worked in Windows and I coded everything in MATLAB, and my code was not free at all. I wasn't thinking in tools, so to speak. After my PhD, I moved to New York to work as a data scientist at various startups. And that's where things changed. I got the opportunity to write this book, Data Science at the Command Line, and I'm currently working very hard on the second edition, which should come out somewhere in October. You can read both editions for free, if you're interested, at data science at commandline.com. And when I am not writing, I'm spending my time coaching and training others in a variety of topics related to data science.

So about that title. There are a couple of interpretations of the word free. There is free as in beer, there is free as in speech, but I'm not gonna talk about those things. I'm also not gonna talk about versioning code, packaging code, and distributing code. They're all important, yes, but they're outside the scope of this talk. I'm gonna talk about free as in bird, in the sense that your code is light and can go everywhere it pleases. So the command line is a stark and unforgiving environment, but if you know which spells to cast, it can offer unlimited possibilities.
So a while ago, in February, Nature argued that researchers should embrace the command line. They say that it can help you wrangle big files, that it can parallelize your experiments, and that it can automate your work. And yes, there are plenty of other reasons. Most importantly, it can save you from velociraptors. But I'm not here to convince you to use the command line. No, I want you to consider turning your code into a command line tool. So something that can be executed from the command line and that can interact with other command line tools.

And the real reason I am talking to you about this topic is that I believe that command line tools enable these, yeah, wider community ideals that I read here about CSVConf. So those are interoperability, hackability and simplicity. I took the liberty of not highlighting data, because there can of course also be interoperability regarding tools. So command line tools are simple. Well, most of them are. They usually do one thing and they do it well. They're hackable in the sense that you can combine tools, right? They all have text as a universal language. And there's also a sense of interoperability, in that you are able to leverage the tools and the command line in other places.

So for example, here's JupyterLab, right? I'm showing here some Python code. There's a Jupyter Notebook and there is a full terminal here at the bottom. All showing different ways in which you can use the command line within Python. Plenty of programming languages and environments have this ability. Also, R and RStudio allow you to use the command line and its tools.

So, yeah, you wanted to say something? I'm sorry, your slides aren't popping up. Oh, oh my, you've just been looking at my face. And that's the least important thing of all. I forgot to press this button. Oh my, and just the thing is that I worked so hard on these slides. So, how are we doing on time? I have plenty of time left, so let's just. You're good on time.
I was thinking, where is all the laughter? I mean, all these, yeah, I'm working on this book. Yeah, Data Science at the Command Line. Here we go, my company, free as in bird. This is what the command line looks like in case you have not seen it before. There's this Nature article, which you should be able to find if you Google for Nature and Bash. Now, this of course refers to, well, I'm not gonna explain it, you see the slides. These are the core values of CSVConf. There's value in repeating all this, of course. JupyterLab, R and RStudio. Well, I'm glad you told me, thank you for that.

And even Spark. Here I highlighted a sentence from the book Spark: The Definitive Guide, by the original author of Spark. And they say the pipe method is probably one of Spark's more interesting methods. I think that's quite a compliment coming from, well, the author of this 800 pound gorilla when it comes to wrangling a lot of data. And I think it's really interesting that they've decided to add the functionality to leverage a 50 year old technology.

So now I have sort of established what the command line is (tools, we're gonna look at in a moment), but also that it's in a lot of different places, right? Not only in different programming languages and environments, but it can be found everywhere from supercomputers to microcontrollers, and of course on your laptops. Even on Windows now, with its WSL, the Windows Subsystem for Linux, you can easily run a Unix command line on your Windows system. So it's everywhere and it's here to stay.

So about turning your code into a tool, right? All you have to do is follow these six easy steps. Now, I can imagine that right now you're feeling a bit like Alice tumbling down the rabbit hole. Oh, don't worry. Let's all just take the red pill and I'll show you how deep the rabbit hole goes. And let's go to, here we go. If you don't see the full screen, I now have JupyterLab open.
What I've learned about Crowdcast is that if you resize your browser, if you make it a bit wider, the aspect ratio changes and the entire slide, or my entire screen, should be visible. So in this demonstration, I'm going to use Python, but the same principles apply to other programming languages, right? If your weapon of choice is R or JavaScript or Java, the syntax will be slightly different, but the same steps can be taken. I've also chosen a very simple example so that we can focus on the process itself rather than the code.

So here we've got some analysis, right? What we're gonna do in this piece of code is open up a text file that contains Alice's Adventures in Wonderland, count the number of lines that contain the term Alice, and then print the result. Now, of course, it's up to you to use your imagination and think, okay, how does that apply to my code, right?

So the very first step, although it's a bit trivial at this point, is to copy this to a file, right? A regular text file. And that's what I have right here. That's the first step that you need to do. And this becomes more complicated, of course, when your code is scattered across many different files or notebooks and so forth.

Let's see now. Ah, shoot, where's my command line? Oh, there it's hiding. Let's move that over here. So now you should see the command line. Got a couple of files here, count.py. Now, let's first check that it's working, right? 401. That's the number of lines on which Alice appears in this text. And this is, again, a really simple command line tool. In fact, what we've just done is implement, quite poorly to be honest, grep. Should be right, but that doesn't matter. It's the process, like I said, which matters. So I'm now going to take you through the steps that are needed in order to turn this into a proper command line tool.
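A minimal sketch of what this first version of `count.py` might contain. The talk only shows the code on screen, so the file name `alice.txt` and the function name `count_lines` are assumptions made for illustration. What matters at this stage is that both the file and the search term are hard-coded.

```python
from pathlib import Path

FILENAME = "alice.txt"  # assumed name of the text file; not given in the talk
PATTERN = "Alice"       # the term to look for, hard-coded at this stage


def count_lines(path, pattern):
    """Return the number of lines in the file at `path` that contain `pattern`."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if pattern in line)


# Only run the analysis when the (assumed) text file is actually present.
if Path(FILENAME).exists():
    print(count_lines(FILENAME, PATTERN))
```

Note that this counts matching lines, not occurrences: a line that mentions Alice twice still counts once.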
By the way, if you have any questions, I unfortunately don't have the time to look at them now, but there are various ways you can get ahold of me after this session. So, okay, the very first thing we want to do, oh, not this file, that is the text itself. Now let's copy that over from the previous directory, that will speed things up a little bit.

So the very first step that we want to do, or the second step actually, after we've put things into a file, is to add some arguments, right? Because this piece of code does the same thing over and over again. And in order to make this tool more usable and more general, you have to think about, okay, what are some parts that can vary, that I would like to change? And so what I've added here is this import statement, and I'm using an argument that is passed here at the command line. The first element is always the name of the file. And so the second element, which in Python is element one, would be an argument. So now what I could do, well, I can test if Alice still works, but I could also try out other values. So we've already made our code a little bit more general.

But of course there are other things we can do. Now let's see, this piece of code only works on Alice, right? This code takes on the responsibility of opening up a file and reading that file. And it's the same file over and over again. Now of course you could turn that into an argument, just like the pattern is, yeah? So Alice, that's an argument. So we could also turn the file name into an argument, but another approach is to let your tool read from standard input. That's a standardized way of feeding data into a tool. And that's what this code is doing. Let me put them side by side. Guess if I do it like this. So now we have those two files side by side. And the second version here, the newer version, is even a bit simpler. It doesn't open a file, it just reads from standard input.
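The two changes described above, taking the pattern from the command line and reading the text from standard input, could be sketched as follows. This is not the code shown in the talk: the names `count_matching_lines` and `main` are hypothetical, and the work is put in functions only so that it is easy to try out.

```python
import sys


def count_matching_lines(lines, pattern):
    """Count the lines in an iterable of strings that contain `pattern`."""
    return sum(1 for line in lines if pattern in line)


def main(argv, stream):
    """Print how many lines of `stream` contain the pattern in argv[1]."""
    pattern = argv[1]  # argv[0] is the script name, argv[1] the first argument
    print(count_matching_lines(stream, pattern))


# Act as a command line tool only when a pattern was actually passed.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv, sys.stdin)
```

Saved as `count.py`, this could be run along the lines of `python count.py Alice < alice.txt`, where `alice.txt` is again an assumed file name. Because the tool no longer opens a file itself, anything that can produce text on standard input can feed it.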
So it has moved that responsibility to outside the tool. So now we have to read it in. This is one way you can do it. And then we can pipe that into our tool. Okay, so this works. But now that this tool is reading from standard input, anything is possible. We could even. This is your, sorry, this is your five minute warning. Thank you very much. That's actually a minute more than I had. I'm gonna make use of it.

So it can read anything. Any data that you feed into this, it can now handle. So what I'm doing here is I'm using the command line tool curl to download a file. This is the sequel to that book, Through the Looking-Glass. Oh, that took a long time to download a book. I can silence that one. Is it gonna take that long still? Anyway, you can see that Alice is mentioned on 465 lines in this book. So there's already a lot more possible now. In fact, it is just text. So we could even generate a list of numbers. Here I'm generating one through a hundred. And I could pipe that to the tool and say, I want you to count the number of lines on which a three appears.

So moving on, because this doesn't really feel like a proper command line tool, right? What I've changed here is I've added a single line. These first two characters are known as the shebang or hashbang, and this lets the command line, or the shell, know that this is executable. And what follows is the program which is responsible for interpreting this source code. Now, unfortunately, or no, that's not really unfortunate, we need one more step in order to make this work. And that is we need to change the permissions on this file. Otherwise we would get an error like, hey, you don't have the right permissions. So now what I can do is I could use it like so. And then I guess if you change this to just being count, because the command line doesn't really care about extensions, this already starts to feel more like a proper tool. Now, there's one final step.
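With the added first line, the tool might look roughly like the sketch below. The exact shebang used in the talk is not stated; `#!/usr/bin/env python3` is a common choice and is an assumption here, as is the function name.

```python
#!/usr/bin/env python3
# The line above is the shebang: "#!" followed by the program that should
# interpret this file. The interpreter path is an assumption, not from the talk.
import sys


def count_matching_lines(lines, pattern):
    """Count the lines in an iterable of strings that contain `pattern`."""
    return sum(1 for line in lines if pattern in line)


# Act as a command line tool only when a pattern was actually passed.
if __name__ == "__main__" and len(sys.argv) > 1:
    print(count_matching_lines(sys.stdin, sys.argv[1]))
```

On a Unix-like system, the permission change is typically something like `chmod u+x count`, after which the file can be run as `./count Alice` with text piped in. Fed the numbers 1 through 100 with the pattern `3`, this sketch prints 19.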
I'm gonna leave that as a take-home exercise. You'll be able to find it in my book. Let's wrap up, because I have a few closing thoughts here. So these steps, I'd say, are pretty easy. It's the thinking about what goes into this tool that is hard, right? Once you have a tool, even if you don't use it yourself, it will benefit others. At least, of course, if you want others to be able to use your tool. They would then, and perhaps you as well, be able to tap into the existing ecosystem of all these command line tools. So all this functionality of downloading files, of scheduling, of monitoring and even parallelizing, all becomes available. And then I haven't even talked about packaging and distribution, which is very important in itself.

So that's my talk. Thank you very much for listening. I hope I've been able to sort of convince you. If you need further convincing, or if you have any questions at all, you can leave a message on Slack or you can send me a tweet. I'm on Twitter here. And again, good luck. Thank you to the organizers of CSVConf. Enjoy the rest of your conference. And yeah, I'll...