Hi, everybody. I'm happy to be here today. My name is Grant, and I'll be presenting on what I think is a very cool utility for many of you who do data science work, or really any kind of code work. This is my first time attending, well, not attending, my first time participating in csv,conf. And it's my first presentation in quite some time, but I'm really, really looking forward to it. So with that, you should be able to see my slides relatively soon. Let me just get my speaker view up here. I'll take it from here. So yeah, I'm going to be talking today about using GitHub Actions to accelerate your data efforts. If you haven't heard of GitHub Actions before, don't worry, I will cover that. This is something that I'm doing on a personal note, not really related to my job, but I see so few people making use of it. It is, I think, an incredibly underrated tool. So I'm hoping that by sharing it with all of you today, maybe you'll have something new in your tool set.
So before we get into that, just a little bit about me. In case my video in the corner is too small, here's a slightly larger photo of me, although out of date, in case you really wonder what I look like. To give you a little more info on where and who I am: I am coming to you live from sunny San Diego, California, home territory of the Kumeyaay people. I've been here for a few years now, and I lived here prior to that, so I'm very, very familiar with this area and very comfortable here. Right now I'm a research software engineer at the Anti-Defamation League, one of the oldest civil rights organizations in the United States. That's where I do a lot of data science and machine learning work related to social media, particularly online hate speech, and especially how to make there be less of it online. But I'm not really giving a talk today about my work or my organization, so we won't be hearing about any of that terrible stuff.
I put this talk together on my own time, but after this talk I will be sharing a fellowship opportunity that some of you may be interested in over on my Twitter, so I encourage you to check that out later. And if you want to be able to contact me, here are a few of my socials. Twitter is one of the easiest ways to reach me. You can find me on GitHub. I'm aware my email may be blocked over in the lower left-hand corner, but you're welcome to find me by email as well. Prior to joining the ADL, I was a PhD dropout. I was in a PhD program long enough to drop out of it, at UC San Diego, where I was doing a lot of engineering and neuroscience. And before that, I graduated in computer science, philosophy, and engineering in my undergrad at UC Irvine, because I was interested in far too many things.
But without further ado, the first step to understanding how you use GitHub Actions in any of your data or coding efforts is to understand what a GitHub action is. Also, I apologize if you can hear the airplane noise in the background; it should pass quickly. So if you've ever been on GitHub, you've probably seen this toolbar before, and you may have noticed it has a few options that you rarely click on. In particular, you may notice that there's this button that says Actions, which is probably less used than the Issues and Pull requests buttons that you're much more used to. In short, a GitHub action is simply this: when a specified event happens, GitHub will do computation for you. Now that may sound a little strange, because when most people think of GitHub, they think of storage and version control and using it to maintain projects. Sometimes they go as far as using it for documentation and management. But about a year and a half ago, at the end of 2019, GitHub released these things they call Actions that take GitHub's capabilities quite a bit further.
And I think they're very useful for the kinds of things that many of you are probably familiar with, and for a bit more beyond that. GitHub Actions are meant to facilitate some of the more traditional elements of software development, like testing, automatically formatting code, continuous integration and deployment, all those usual things. But they're much more flexible than that, and that's what I'm going to be talking about a little bit today. Beyond that, one of the things that makes GitHub Actions very cool is that for private repositories you get about 30 to 50 hours of free compute time per month, depending on which of the free tier options you're in. You'll get more if you're on one of the paid options. And if you're using a public repository, you have a sort of unlimited amount of compute minutes per month, if I'm understanding GitHub's documentation correctly. So if you have a public or open source project, you can basically make use of these utilities as much as you like without much restriction. And when you factor in the ability to run any GitHub action for up to six hours at a time, or more if you play your cards right, they become a very powerful tool, especially if you have limited compute resources elsewhere, or you don't have a very powerful computer, or you don't have a lot of time in one place with a computer.
So, to go through the anatomy of it: even though they're called GitHub Actions, the top-level item behind them is actually called a workflow, and a workflow links several actions together. There are six pieces to the anatomy of GitHub Actions that you need to understand. The first is, as I already mentioned, events, and events can be chronological or asynchronous.
They can be initiated by a user or by an event online. The event triggers the workflow, which will kick in as soon as the event is detected by GitHub's servers. Workflows are sequences of jobs, so a single workflow can handle multiple jobs, which, as I mentioned before, may relate to testing, data collection, deployment, and several other things. Each job contains multiple steps towards completion. Each step is broken down into the eponymous actions. This is where the GitHub magic comes into play, and I'll give a little explanation of what these actions are as we get towards that. And finally, actions are executed by servers on GitHub's side called runners. So if you know these six vocabulary terms (events, workflows, jobs, steps, actions, and runners), you have a bird's-eye view of what GitHub Actions are composed of.
And so I'm going to give you a code example of what a GitHub action actually looks like in a repository once you have it initiated. All GitHub Actions are maintained in YAML format, so it should be easy to read even if you haven't used it before. I'm going to go through it bit by bit. The first bit is that, like I said, you have an event on which the GitHub action executes. In this case, this particular configuration is set to execute whenever there's a push to the repository that it is posted on. The workflow is basically the entire file, but it has a name. In this case, I'm borrowing from the introduction to GitHub Actions documentation over on the official GitHub website. The workflow is then broken down further into jobs. In this case, this workflow has only one job, which is to check a version number of some sort.
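For readers following along without the slides, the file being walked through here probably resembles the introductory example in GitHub's documentation. The sketch below is a reconstruction under that assumption, not the speaker's actual slide: the workflow name, the job name, the `bats` package, and the pinned versions (`@v4`, Node `20`) are assumptions taken from or modelled on that documentation example.

```yaml
# Workflow: the whole file, identified by its name.
name: learn-github-actions

# Event: run this workflow on every push to the repository.
on: [push]

jobs:
  # Job: a single job that checks a tool's version number.
  check-bats-version:
    # Runner: the GitHub-hosted machine the job executes on.
    runs-on: ubuntu-latest
    steps:
      # Action: copy the repository's code onto the runner.
      - uses: actions/checkout@v4
      # Action: install Node.js on the runner.
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      # Raw command: install a package globally with npm.
      - run: npm install -g bats
      # Raw command: run the installed tool to print its version.
      - run: bats -v
```

A file like this lives under `.github/workflows/` in the repository; the two `uses:` steps call published actions, while the two `run:` steps execute shell commands written directly in the file.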
And then the rest of it is broken down into steps, where the four steps involved here are checking out the repository's code, setting up Node.js on the runner, running the install with npm for the particular module that's hosted in the repository, and then finally running the code once npm has installed it. So you can break it down into each step, and you can see what the steps are by looking at the dash marks under the steps section of the YAML file. And then finally, the actions in question here are these first two steps in particular. When you see this "actions slash some name", that is using a piece of code that is hosted on the GitHub Marketplace to perform an action for you. The other two steps below those two are actually just running the code that is written in the YAML file. So you can choose between using things that other people have published, or you can just write the raw code and have GitHub Actions execute the code that you put right in your file. And then finally, the runner is specified a little further towards the top. You have a choice of many operating systems, including various versions of Ubuntu, macOS, Windows, and there are probably a few others. But for my purposes, I nearly always use the latest Ubuntu version, so that's the one I'm going with in this file.
And so what triggers a workflow, what actually is an event, and what can you actually use this for? You have a lot of options. One is that you can schedule timed events, so you can run a workflow every so often, or on certain dates, and so on. The syntax is very similar to cron jobs, if you've used cron before. So if you're familiar with that syntax, you'll be able to schedule a GitHub Actions job, no problem. You can also fire off manual events to trigger GitHub Actions, and that can be done through the user interface on GitHub itself.
If you go through the Actions tab that I showed you before, and you specify that a certain workflow is using this workflow_dispatch manual event, you'll be able to just hit a button and have code execute on call, which is really cool. And then finally, basically any GitHub event that you can think of can be used as an event to trigger a workflow. That includes the pull requests that I'm sure many of you are familiar with. You can set it so that it runs when a pull request is opened or assigned or synchronized or reopened, or on any other event there. You can have GitHub Actions run on a push to any branch of your repository, and you're allowed to select which branches trigger it and which ones do not. And then of course you can even have them triggered by issues being opened or edited or replied to or closed. So really, any attribute of GitHub that you can name, you can use as a basis for triggering a GitHub action. And there are many others that you can look up as well.
So what actions are available? There is a GitHub Marketplace, and as of this morning there are nearly 8,300 actions available to you in the GitHub Actions marketplace that you can freely use in any of your code. So I encourage you to go look through it. Any commonly used piece of your tech stack will probably have utilities in this marketplace. Some very common actions, which you'll see over and over again, are the checkout action, which is the one that gets your code from your repository onto the runner that's executing the action. And then you'll also probably want to pay attention to these two actions, upload-artifact and download-artifact. Those will let you pass assets and data between different jobs, so that you can preserve some of the output of your computation and reuse it later if you want to. That'll be very useful for some of your data maintenance and data processing efforts.
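As a rough sketch of how the triggers and the artifact actions described above fit together in one file: the workflow name, job names, cron schedule, file paths, and the placeholder shell commands below are all hypothetical, chosen only for illustration, and the `@v4` version tags are an assumption about which releases are in use.

```yaml
name: data-refresh-sketch   # hypothetical workflow name

on:
  # Timed event, in cron syntax: 06:00 UTC every Monday.
  schedule:
    - cron: '0 6 * * 1'
  # Manual event: adds a "Run workflow" button in the Actions tab.
  workflow_dispatch:
  # Pull request activity; note the event type is spelled "synchronize".
  pull_request:
    types: [opened, synchronize, reopened]
  # Pushes, restricted to the main branch.
  push:
    branches: [main]
  # Issue activity.
  issues:
    types: [opened, edited, closed]

jobs:
  collect:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder for real data collection: writes a tiny CSV file.
      - run: |
          mkdir -p output
          echo "id,value" > output/data.csv
      # Preserve the output so a later job can reuse it.
      - uses: actions/upload-artifact@v4
        with:
          name: collected-data
          path: output/

  process:
    # Run only after the collect job has finished successfully.
    needs: collect
    runs-on: ubuntu-latest
    steps:
      # Fetch the files uploaded by the collect job.
      - uses: actions/download-artifact@v4
        with:
          name: collected-data
          path: output/
      # Placeholder for real processing: count the lines in the file.
      - run: wc -l output/data.csv
```

Each job runs on its own fresh runner, which is why the files have to travel between jobs as an artifact rather than simply staying on disk.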
And so by way of example, since we have a few minutes left, I wanted to go through a fuller example of a GitHub workflow that I'm using for one of my projects that I work on in my spare time. In this case, we're going to be testing a piece of Python code. This is the full file. Don't worry, we'll go through it step by step. This is for a project that I affectionately refer to as the data kitchen, because it has many different culinary-esque tools, I guess you could say, for preparing and acquiring and processing data. And so it's going to be named the kitchen testing suite. This is just a moniker; it doesn't actually represent any piece of executed code. I want this piece of code to execute on both pushes and pull requests. For pull requests, I want it to execute on every pull request, but when it comes to pushes, I'm only going to have this executed whenever there's a new commit to the main branch. So people can commit new stuff to feature branches if they want, but it's only when they try to put it in the main branch that I am going to be doubly sure that it's passing all the tests.
So in the jobs, there's only one job in this workflow, which is to run the tests. And one of the things I'm going to make use of here that I think is really cool is this thing called a strategy matrix. What this does is, I'm specifying four operating systems and two versions of Python, because I don't just want to make sure it works for whatever the latest Ubuntu is or whatever the latest Python is. I want to have some sort of compatibility, because not everybody uses the same thing. And so by defining that here, I can make use of it later to actually run tests in various different environments. So now that I've defined a matrix of these items... Are we coming up on time? "You have five minutes left." Yeah, we'll be done in a couple seconds, sorry, in a minute or two.
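For readers without the slides, a workflow of the shape described in this part of the talk might look like the sketch below. It is not the speaker's actual file: the talk names neither the four operating systems nor the two Python versions, so the matrix values here are assumptions, as are the job name, the version tags on the actions, and the exact Pipenv commands.

```yaml
# The name is only a label shown in the Actions tab.
name: Kitchen Testing Suite

on:
  # Run on every pull request.
  pull_request:
  # Run on pushes only when they land on the main branch.
  push:
    branches: [main]

jobs:
  test:
    # Pick the runner from the current matrix combination.
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        # Assumed values: four operating systems and two Python versions,
        # which expands to eight separate runs of this job.
        os: [ubuntu-latest, ubuntu-22.04, macos-latest, windows-latest]
        python-version: ['3.11', '3.12']
    steps:
      # Put the repository's code on the runner.
      - uses: actions/checkout@v4
      # Install the Python version for this matrix combination.
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      # Raw commands: install Pipenv, install the project's
      # dependencies (including dev ones), then run the tests.
      - run: pip install pipenv
      - run: pipenv install --dev
      - run: pipenv run pytest
```

This sketch assumes the repository has a Pipfile that lists pytest as a dev dependency and does not pin a single Python version, since a pinned version would conflict with part of the matrix.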
So yeah, and then you can specify that this job will run on the operating system specified in that matrix by using the special syntax. That's a dollar sign and double curly braces with matrix.os in the middle. And then, moving on to the steps of this job. First is the checkout action I described earlier, which puts my code onto the runner so I can now use it. Then I'm going to have it set up a version of Python based on whatever part of the matrix it's in. In this case, I just pass matrix.python-version to the GitHub action setup-python, so that the runner will now have Python installed, specifically the version of Python I specify. Then I'm going to use pip in a raw command to install Pipenv, because I develop some of my tools in a virtual environment, and Pipenv is very helpful for that. And then once that is installed, I can run the tests by having Pipenv install all the items that are in the repository and then execute pytest. So that's a complete, actually useful GitHub workflow that I can use to make GitHub test any code that gets put into our repository. And you can do similar things to execute data science related jobs.
So that's just one example, and unfortunately that's all we really have time for. Here again is the full code. You'll be welcome to look at it after the presentation as well. To learn more about GitHub Actions, I highly recommend looking at the GitHub Learning Lab. They have many great interactive tutorials for many aspects of GitHub Actions. I particularly recommend the Hello World and continuous integration ones if you just want to get familiar. And then among other GitHub Actions data resources you might want to check out is this blog and repository from the person who actually inspired me to make this talk in the first place.
And then there's a little project where you can actually train and deploy a machine learning model using entirely GitHub Actions. It's a small model, but you can still use it, and you don't have to have any sort of powerful computer to get it started. You just need to write up the YAML files, and GitHub Actions will handle the rest. I also highly recommend looking into the debugging action, which is action-tmate, because if you do any amount of coding, you're eventually going to need to learn how to debug. So if you're wondering how you debug something that gets automatically triggered on somebody else's server, this is the GitHub action you're going to want to look at. So with that, I'll let you all go off on your separate ways, and hopefully you will find GitHub Actions useful. Thank you all for having me today, and I really appreciate the opportunity to present to you.
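As a note on the debugging action mentioned near the end of the talk, one common way to wire it in is as a final step that only runs when something earlier in the job has failed. This is a minimal sketch rather than anything shown in the talk: it assumes the action is the one published as `mxschmitt/action-tmate` with a `v3` tag, and the workflow name, job name, and `./run_tests.sh` script are hypothetical.

```yaml
name: debug-on-failure-sketch   # hypothetical workflow name

# Manual trigger only, so a debugging session is never opened by accident.
on: [workflow_dispatch]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical test script standing in for the real job.
      - run: ./run_tests.sh
      # If any earlier step failed, open a tmate session on the runner.
      # The job log prints connection details for an SSH or web shell.
      - name: Open a debugging session on failure
        if: ${{ failure() }}
        uses: mxschmitt/action-tmate@v3
        # Cap how long the session can keep the runner (and minutes) busy.
        timeout-minutes: 15
```

The time limit matters on private repositories, because the runner keeps consuming compute minutes for as long as the session stays open.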