All right. How to read code. All of these examples will be a bit different. I'm going to try to give you an Atom example. There's been a lot of VS Code. There's been a lot of PyCharm. My preferred editor is Atom, though I believe you could apply many of these ideas in any IDE or development environment you prefer. To get this kind of interactivity in Atom, I've installed a plugin called Hydrogen. Hydrogen basically plugs right into Jupyter and allows you to execute code line by line. Why am I doing this? Well, live coding is a little bit crazy, except this is going to be live coding with some training wheels. Everything has been pre-baked so that I'm not going to type. You can see I'm quite tall and I have to be down here. But my thesis in this whole presentation is that reading code is a lot like archaeology, where you're not just going to read it from top to bottom. You have to scrape and dig and figure out what's going on inside of these things. If you don't like the archaeology metaphor, perhaps surgery might be appropriate. So again, reading code is really trying to understand what a specific block of code is trying to do. So in these next 30 minutes I'm going to take you through an example of reading some code. And to make it a bit fun, I thought, hey, let's try to build a rock, paper, scissors bot. I had this idea because I'm teaching a full-time data science program in Canada, and my students really needed to be introduced to this concept of reading code. So I went online and tried to find a block of code that we could leverage and then retool, inspect, and see exactly what was going on. I chose rock, paper, scissors because humans are terrible at generating random numbers. We think we're quite good, but we're not. We have predictable patterns. I read this on Reddit a couple of days ago: in rock, paper, scissors, you should just always pick paper because men will choose rock.
And on top of that thesis, I kind of believe that, hey, maybe it's the case that someone's going to hit rock, rock, paper or paper, paper, scissors. There's going to be some type of predictable pattern. And in order to dissect this type of pattern, I thought that I could use a Markov chain to try and decide what was going on. Markov chains are quite complex, though. And so I wanted a starting point and found one by Googling. So this is a wonderful article guest posted by Ankur Ankan, who apparently has a book. I was reading through his blog posts and literally just grabbed his code and wanted to better understand what was going on. So all of these slides, or rather they're not slides, all of these examples are available at this GitHub link. So if you want to follow along, please grab it. And then if you want to read the post from which this is derived, you can go to this website. But the first step in reading any piece of code is to pretty much just copy and paste it. And so that's exactly what I've done here. Here is the block of code that I lifted. And straight away, it seems not too bad. I can sort of understand what's going on. Seems to be an initialization function, a next state method, a generate states method. Seems to be using NumPy. Not quite sure what this thing's doing just yet. We'll have to give this a better look. But when we're lifting code, I like to put it into Atom so that I can instantiate all of it and pretty much just run it and see what's happening. So provided in the article was this transition matrix that defined where you are and where you're going to go. It seemed a bit clunky, but I'm just going to try it. Let's see what happens. Stuff this into an instance of the Markov chain. Call it weather chain. And then define what the states would be. It seems like that's what the author was doing with this block of code. So I'm just going to run it and see what outputs might be generated from this.
There seems to be an argument in next state called current state. OK, that makes sense. What does a Markov chain do? It tries to grab your current state and predict the next thing. So it seems like if I just run this method call, OK. It's just generating some outputs. Oh, you see rainy, but most often it would hit sunny because that's where the transition matrix puts the most weight. So too, snowy will look like this. And then if I look at the next method that was implemented in this class, there's generate states. Seems like it might take a number. We'll see what's going on here. So all I've done is I've copied and pasted the code. I've run the pieces of code in my Atom editor with Hydrogen. And I haven't done anything interesting. I understand what I need to put into it. I understand what I'm going to get out of it. I don't really understand how it works just yet. I see it's using a dictionary. Don't really know what this piece is doing just yet. In order to better understand and read this code properly, I will drop it into a brand new text space. And I will try to do some surgery. Surgery that is going to be as minimally invasive as possible. And so what I've come up with is let's just grab the pieces from the script before, the transition matrix and the states. Load those up. And my version of minimally invasive surgery will actually be to chop off the head. So what I recommend when you're dealing with a class and want to better understand exactly what's going on inside of this thing is to take all of the code and literally just lop off the head. So if you do that, I'll just comment these bits out. And I'll replace them with something that wasn't intended for this. I'm going to grab Mock from unittest and build up this self object. This will allow me to keep all of the pieces of code pretty much exactly as they are. So I'm going to lop off the head, bring in Mock and self. So I'm going to bring in this object. Doesn't look like all that much.
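As a rough sketch of what lopping off the head can look like: the class and init lines are commented out and a Mock stands in for self, so the body lines run unchanged at the top level. The matrix values and attribute names here are illustrative assumptions, not necessarily the exact ones from the post.

```python
from unittest.mock import Mock

# Illustrative values; assumed for this sketch, not copied from the post.
transition_matrix = [
    [0.8, 0.19, 0.01],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]
states = ["Sunny", "Rainy", "Snowy"]

# class MarkovChain:                                  # head lopped off
#     def __init__(self, transition_matrix, states):  # head lopped off
self = Mock()  # stand-in object, so every "self." line below still runs

# Body of the (hypothetical) __init__, kept exactly as it would be written
self.transition_matrix = transition_matrix
self.states = states

# Attributes can now be inspected one at a time, outside of any class
print(self.states)
```

Because Mock accepts any attribute assignment, nothing in the method bodies has to be renamed while you poke at them.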
But it will allow me to proceed with business as usual. So now I can run each of these pieces of code. And everything will actually just work. So if I go look at self now, this is the amazing part about Atom and Hydrogen: you can actually just highlight bits of code and see exactly what's going on inside of these things. So once I've lopped off the head, I have brought in this self mock object. Now it's time for me to dig deeper and see what's going on in these blocks of code that maybe I don't fully understand just yet. This seems confusing to me. OK, that seems to be where all the heavy lifting is going on. I now want to inspect more closely and move a fine-tooth comb through all of these pieces. My silly little inspecting GIF. So right now we have states. This is sunny, rainy, and snowy. This is the transition matrix that matches up to this. I want to see exactly what each of these pieces is doing. So I'm going to run Mock, build this object. I'm going to see what this type of thing does. Actually just looks like it's turning it into a NumPy array and making sure that it is at least 2D. Cool. I'll put states into states. And now I want to see what this thing's doing. So right away I'm thinking that something doesn't sit right. We're trying to build up two different dictionaries that seemingly, I think, do the exact same thing. So it looks like one maps the states to integers and the other just maps the integers back to the original states. I understand why that is maybe happening. We're going to do things on top of a transition matrix. And a matrix can only accept integer locations. So I get that. But I think I'm going to have to come back to these pieces. They seem OK for now. The code works. I love code that works. Code that works is code that works. But I think we can do a better job. And I'm going to come back to these things. But this is just a big, long piece of code to effectively map, it seems, the states to integer positions and back. So I'll go grab a current state, Sunny.
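The two dictionaries described here can be sketched like this. The names index_dict and state_dict are assumptions based on how the talk refers to them; the point is only that one is the inverse of the other.

```python
states = ["Sunny", "Rainy", "Snowy"]

# State name -> integer position in the transition matrix
index_dict = {state: index for index, state in enumerate(states)}

# Integer position -> state name (the reverse mapping)
state_dict = {index: state for index, state in enumerate(states)}

# Going there and back returns the original labels
round_trip = [state_dict[index_dict[state]] for state in states]
print(index_dict)
print(round_trip)
```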
I'll stuff it back into next state. I'll run this piece and see that, if I actually run everything again, I perhaps did something out of order. There we are. Still giving me state outputs. But I think we need to now go deeper on this object. So again, with Atom and Hydrogen, reading code is pretty damn easy. I can select, or rather inspect, exactly what's going on from the inside out. So current state we know is defined as Sunny. I can go look at this and see, OK, that's Sunny. Well, going into the index dict, this is just a dictionary. I'm going to pass in Sunny and get back 0. OK, yeah, that totally makes sense. We are mapping Sunny to 0. And then I think what is happening here is this is your transition matrix. And with a little bit of subsetting, we can pass in rows and columns. This would just be the 0th row and all the columns. I would expect this block of code to just give me the first row back. And that seems like exactly what it's supposed to do. Looking at random choice, what I know about random choice is if I just drop this in, I can give it any sort of 1, 2, 3, and I believe it will just give me one of them back. Seems like this p is actually just setting the probabilities associated with each element of the array. So it seems like I'm just going to pass in the states, and it's going to map them to probabilities. OK, cool. I sort of know what's going on now. I hope you can follow as well. With these pieces, I'm now going to see what generate states is meant to do. I'll build this loop. And now I'll try to take apart the loop to better understand what's going on. I've already done this, but effectively, I've copied and pasted it. And then I'm going to grab the for i in range number. And just like, let's start at 1. See what that does. So we're going to build the future states object, set i to 1, run our next state. Oh, that's the thing that we just teased apart up here. So run the next state. I think we're just stuffing that into an object called ns.
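The inside-out inspection of next state can be sketched step by step. To keep this example free of third-party dependencies, the standard library's random.choices with weights stands in for NumPy's random choice with its p argument; the matrix values are illustrative assumptions.

```python
import random

states = ["Sunny", "Rainy", "Snowy"]
transition_matrix = [  # illustrative values, assumed for this sketch
    [0.8, 0.19, 0.01],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]
index_dict = {state: index for index, state in enumerate(states)}

current_state = "Sunny"

# Step 1: the dictionary turns the label into a row number
row_index = index_dict[current_state]

# Step 2: that row holds the probabilities of moving to each state
row = transition_matrix[row_index]

# Step 3: a weighted draw over the states, using the row as the weights
next_state = random.choices(states, weights=row, k=1)[0]
print(row_index, row, next_state)

# With all the weight on one entry, the draw is no longer random at all
always_rainy = random.choices(states, weights=[0, 1, 0], k=5)
print(always_rainy)
```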
Then we're appending that to the future states, and then setting current state to the state that we just built. So looking into future states now, oh, it's sunny. Actually, it seems like this i business isn't even used in the loop. So by breaking it apart, actually seeing what was going on, I now notice that, hey, I could probably replace that with an underscore to denote to me that this doesn't really matter. It doesn't matter which iteration of the loop we're in. It just matters that we're running the loop a whole bunch of times to generate a bunch of states. OK. So we've gone a little bit deeper. Now I want to make some changes. I saw that this doesn't feel all that right. And this is probably pretty verbose. I'd like to change this to make it more legible, so that others who are reading my code can better understand what's going on in here. So I'll move into adjust. And while I'm making these adjustments: I was talking about rock, paper, scissors, and I'm giving you sunny, rainy, and snowy. Those were the variables that the original post described. So I'm going to take sunny, rainy, and snowy. And in fact, I'm seeing this transition matrix now. And I don't even know how I would codify rock, paper, scissors into this. Because what I'm trying to do is build a rock, paper, scissors bot that would try and beat a human, exploiting the fact that a human probably isn't the best at generating random numbers. So it seems like this transition matrix is actually doing most of the work. How would I even build that in the first place? Because I'm probably going to be getting states that are like, oh, it was rock, paper, scissors, or, as a string of states, it was sunny, sunny, rainy, rainy, rainy, sunny, sunny, snowy. It's probably going to look like this. And so my transition matrix isn't actually all that super helpful. I'm going to need to figure out how I can move my states into something that looks sort of like this and back out the decision matrix.
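Going back to the loop that was taken apart a moment ago, here is a minimal sketch of it with the unused counter replaced by an underscore. The function names and the parameter name no are assumptions for illustration, and random.choices again stands in for the NumPy call.

```python
import random

states = ["Sunny", "Rainy", "Snowy"]
transition_matrix = [  # illustrative values, assumed for this sketch
    [0.8, 0.19, 0.01],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]


def next_state(current_state):
    """Draw the next state using the row that belongs to current_state."""
    row = transition_matrix[states.index(current_state)]
    return random.choices(states, weights=row, k=1)[0]


def generate_states(current_state, no=10):
    """Walk the chain for `no` steps and collect every state visited."""
    future_states = []
    for _ in range(no):  # the iteration number itself is never used
        ns = next_state(current_state)
        future_states.append(ns)
        current_state = ns  # the new state becomes the starting point
    return future_states


print(generate_states("Sunny", no=5))
```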
So as we move forward, let's just go to rock, paper, scissors, and replace sunny with rock, snowy with paper, and rainy with scissors. So I'll do that here. I'll build up the states. And effectively, when I reached this point in trying to read this code, I went to Google and googled how to build a transition matrix. Found this post. You can go look at it. But in the interest of time, I'm just going to drop it in, copy and paste it, and just run it. Because that was the first recommendation. So the code that I grabbed from Stack Overflow to build up one of these transition matrices looks like this. So immediately, not really sure what's going on, but I'll execute the code and try to get to an output. So it seems like the last bit is stored in probs. We'll see what probs does. And that looks sort of like a matrix, except it's been normalized across the entire thing. So that's not precisely what I want, but it gets me pretty close. So I'm going to backtrack a little bit and try and see what was going on in this whole window function business. So with window, it grabbed states, and it tried to get me closer to probs. So I'll execute this block of code. And oh, no, it's a generator. So I'll explode it with a list and maybe just inspect the first five pieces. And it's trying to match up, OK, what was going on here? Rock, rock, paper, rock, scissors. Oh, I see what it's doing. It's building rock to rock, rock to paper, just chaining the last piece to the next piece. Well, this seems like a big old function to just pair the place that you're at with the next place. When I was reading this, a light sort of went on. And I imagined that, hey, I could just take these two lists, stagger one of them, and then zip them up. This is just a vanilla Python function that comes batteries included. So if I staggered this list, zipped it up, and then peeked inside with list, I'd be able to replicate everything that this function did up here.
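The stagger-and-zip idea is small enough to show in full. This is a sketch of the replacement, not the Stack Overflow window function itself, using the five-move chain read out above.

```python
chain = ["rock", "rock", "paper", "rock", "scissors"]

# Stagger the list by one position and zip it against the original.
# zip stops at the shorter input, so the last move has no partner.
pairs = zip(chain, chain[1:])

# zip is lazy, so explode it with list() to peek inside
pairs = list(pairs)
print(pairs)
# [('rock', 'rock'), ('rock', 'paper'), ('paper', 'rock'), ('rock', 'scissors')]
```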
So by running the code and getting to the output, something went off in me, knowing that I could just reach for zip and replace all of that nonsense with code that looks like this. So we're still not there. We need to get to probabilities. It takes states. We're trying to move into a transition matrix so that we can build our Markov chain. So taking one step forward, we will, I think, go into counts and see what this was doing. So counts, it looks like, was just going from one state to the other and counting them up. Well, I know there's another Python tool for that in collections called Counter. And so what I can do is take our zipped-up staggered states, force them into a list, put that into a new object, and run Counter on top of it. By running Counter, I think we get rock, rock 5, rock, paper 2, pretty much the same output. OK, now with these pieces, it looks like this is just a dictionary with tuples as the keys and counts as the values. Maybe I can take this and run items on top of it, unpack this thing, and have x, y, count be captured in each of these places. And I'll just stuff it into a pandas DataFrame that will have those locations and these counts. So I've taken someone else's code from Stack Overflow, read through it, executed the code, seen what was going on. And now I'm almost there. It's still not perfect, but the original code wasn't perfect either. So with this, I'm going to not adjust but refactor my code and go over here. So running from the top again, we'll bring in rock, paper, scissors. And actually, before we get to the transition matrix creation thing, I think we need to solve this problem, because I wasn't thrilled with it when we started. We've got these two different things that are doing the exact same thing. Well, not the exact same thing, but the reverse of each other. I don't want them to be separate. I want it to be all packaged together. I want to be able to run it. And I want to be able to keep things clean.
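Counting the transitions with Counter can be sketched as follows. The chain here is a short made-up example, so the counts differ from the ones read out in the talk, and plain tuples are used in place of the pandas DataFrame.

```python
from collections import Counter

# A short, made-up sequence of moves (assumed for this sketch)
chain = ["rock", "rock", "paper", "rock", "scissors", "rock", "rock"]

# Each key is a (from_state, to_state) tuple, each value is how often it occurred
counts = Counter(zip(chain, chain[1:]))
print(counts)

# Unpack the keys and values into flat (x, y, count) records
records = [(x, y, count) for (x, y), count in counts.items()]
print(records)
```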
Well, as a data scientist, I know that in my toolbox I have something called LabelEncoder, which is imported from scikit-learn's preprocessing module. LabelEncoder is this overpowered tool, I think, that allows you to take, hey, our states that we had, our rock, paper, and scissors. We can fit states into this label encoder and actually run something called fit transform that will take all of these states and transform them into integers. Everything, I think, needs to be an integer because we're doing index location in a matrix and we can't just plop rock, paper, or scissors in there. So with this chain and with LabelEncoder, I believe now I can run a method called inverse transform. This will take the generated chain and map things back. This seems to be exactly what this block of code was trying to do. And just peeking at what was going on in this thing, it seems like it was mapping rock to zero and zero to rock. Well, what LabelEncoder does is exactly that. And there's a built-in inverse transform method. So I'll take this chain that's been transformed. I'll do the same bits that we did in the last script. So the staggered chain that was zipped together. I'll run Counter on top of it. And then I'll try to build out the matrix to pre-populate it. So to do this, I'll find out how many unique states there possibly are. Should just be three. And I'll quickly build out a matrix with a list comprehension to do that for me. So this would be 0, 0, 0, 1, 0, 2. I can now unpack this whole Counter object and then fill the matrix in with some vanilla Python. So this is now my matrix. I think we've got enough code going on at this point to wrap it up. So we will put it into a function called chain to matrix. This is the exact code above. And run this on top of our chain. So here now is the matrix. But we need to normalize it, so that I can pass it into the transition matrix choosing function. Each row needs to be probabilities that sum to 1. So is this normalized? No, it's not yet.
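A dependency-free sketch of these steps follows. A sorted list of the unique labels stands in for scikit-learn's LabelEncoder (which also orders its classes), and the function name chain_to_matrix is an assumed spelling of the name used in the talk.

```python
from collections import Counter


def chain_to_matrix(encoded_chain):
    """Count transitions in an integer-encoded chain (codes 0..n-1)."""
    n_states = len(set(encoded_chain))
    # Pre-populate an n x n matrix of zeros with a list comprehension
    matrix = [[0 for _ in range(n_states)] for _ in range(n_states)]
    # Unpack the Counter and drop each count into its row and column
    for (x, y), count in Counter(zip(encoded_chain, encoded_chain[1:])).items():
        matrix[x][y] = count
    return matrix


# A short, made-up sequence of moves (assumed for this sketch)
chain = ["rock", "rock", "paper", "rock", "scissors", "rock", "rock"]

# Stand-in for fit_transform: sorted unique labels, then label -> integer
classes = sorted(set(chain))  # ['paper', 'rock', 'scissors']
encode = {label: code for code, label in enumerate(classes)}
encoded = [encode[label] for label in chain]

# Stand-in for inverse_transform: integer -> label
decoded = [classes[code] for code in encoded]

matrix = chain_to_matrix(encoded)
print(encoded)
print(matrix)
```

The matrix holds raw counts at this point, so its rows do not yet sum to 1.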
But my normalize function can be pretty damn easy. It's just taking the sum of the entire row and dividing whatever is at each row and column index by that sum. So when I run this, here's just a couple of examples. Treat this as a row. These would sum up to 1, as with this. Because it's a matrix, I need to apply this function on top of everything. And so I'll just iterate through every single row, for row in matrix, do the normalizing process, append, and bring it back. So now I can take the entire matrix and stuff it into a transition matrix. That looks all right. So now we've gone from a process of chained-together states to pretty much what I need to generate the Markov chain. So now I'm just going to run it and take the function and just rebuild it with my pieces. Before, we had this whole big block using the index dict. But instead, I'm going to use my whole label encoder transform, inverse transform business. So I can peek inside and see these would be the classes. Here's the transition matrix. Let's run it. Hey, seems like it works. So now that I have most of the pieces ready to go, I'm going to package it up. This seems like something that would be useful to other people. So this is the code that I came up with. It's not all that different, save for this piece here. As I was building this presentation, I was like, oh, I've kind of gotten rid of NumPy. I've gotten rid of pandas. It'd be cool if I could just build this all in pure Python. And so I spent some time building out a label encoder for myself. It's not all that difficult. Maybe you can take the ideas that I've shown you in this presentation and peel this apart for yourself. I knew I wouldn't have time to fully explain it. But the rest of the functions, the convenience functions, the chain to matrix, the normalize matrix, they've all been presented. You've all seen them.
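The row normalization described here can be sketched in a few lines. The function names are assumptions, the count matrix is a small made-up example, and the sketch assumes every row has at least one count (an all-zero row would divide by zero).

```python
def normalize(row):
    """Divide every entry by the row total so the row sums to 1."""
    total = sum(row)
    return [value / total for value in row]


def normalize_matrix(matrix):
    """Apply normalize to each row of a count matrix."""
    return [normalize(row) for row in matrix]


# Made-up transition counts (assumed for this sketch)
counts = [
    [0, 1, 0],
    [1, 2, 1],
    [0, 1, 0],
]

transition_matrix = normalize_matrix(counts)
print(transition_matrix)
# [[0.0, 1.0, 0.0], [0.25, 0.5, 0.25], [0.0, 1.0, 0.0]]
```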
And now I've just taken the Markov chain that was given in the blog post and rewritten some of these methods to do exactly that. So now it's time to play with this thing. I've actually packaged it up for you. It's available on my GitHub or PyPI as Mark. So you can pip install it. And what this will allow me to do is bring in a Markov chain from Mark, take the chain that we've been playing with, rock, rock, paper, rock, scissors, build up that chain, and now predict next states, generate a whole bunch of next states, or just use something different. If you read the synopsis of this presentation, I made reference to Shakespeare and was like, oh, it would actually be kind of cool if I went to Gutenberg and grabbed a text of Shakespeare to see how well my Markov chain could do on top of this. Who needs GPT-2? So we'll build two functions to quickly go through this, take the entire text of maybe something like Othello, and generate some sentences using our pure Python Markov chain builder. And so I've been running a couple of these. They seem like they could be Shakespeare. There's a bunch of just nonsense. I'll let you read through a couple and see if they pass the sniff test. Most don't. I wanted to take something like Oz, so the Wizard of Oz, and see if I could do the exact same thing. Runs pretty quick. This is actually running through the entire book. I did it in 20 milliseconds with pure Python. Here are some Wizard of Oz Markov chain generated sentences. And they're not perfect, but they do all right. So in summary, what should you take from this? Reading code is a lot like archaeology. It's a lot like surgery. And if you're going to perform surgery, you need a tool. My preferred tool is Atom and Hydrogen. And I'd like you to not be scared of copying and pasting code. Make sure that you are giving credit where credit is due and making changes that are appropriate. But when you're copying and pasting code, in order to read it, get to an output as quickly as possible.
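To give a feel for the text experiment, here is a minimal word-level sketch. It is not the packaged library: both function names are hypothetical, the text is a toy sentence rather than a Gutenberg book, and it stores lists of following words instead of a transition matrix (drawing uniformly from such a list is equivalent to sampling from the normalized counts).

```python
import random
from collections import defaultdict


def build_transitions(words):
    """Map each word to the list of words that followed it in the text."""
    transitions = defaultdict(list)
    for current, following in zip(words, words[1:]):
        transitions[current].append(following)
    return transitions


def generate_sentence(transitions, start, length=8, seed=None):
    """Walk the chain from `start` for at most `length` words."""
    rng = random.Random(seed)
    word = start
    sentence = [word]
    for _ in range(length - 1):
        followers = transitions.get(word)
        if not followers:  # dead end: this word was never followed by anything
            break
        word = rng.choice(followers)
        sentence.append(word)
    return " ".join(sentence)


# Toy text, assumed for this sketch
text = "the cat sat on the mat and the dog sat on the rug"
transitions = build_transitions(text.split())
print(generate_sentence(transitions, "the", length=8, seed=0))
```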
See what the inputs are, see what the outputs are. That gives you a better sense of what was going on. If your code is encapsulated in some type of class, these things are hard to get into, except if you just lop off the head and use my little self mock object instead. Once you've done that, rerun all the attributes and the methods to make sure that you didn't break anything. And then start inspecting, with a fine-tooth comb, those blocks of code that you are sure are doing the heavy lifting. If something is especially complex, you should probably take it, do some inlining, see what each step is trying to do, and then peek inside of those things that are iterators by exploding them with some type of list. Loops can often be difficult to parse to see what's going on. And so de-looping the loops by just extracting the code seems to work. You should try to slim down and replicate your outputs from those borrowed bits of code. And any time you're working with a matrix, you should probably use LabelBinarizer or LabelEncoder. These things are just overpowered. They're really great. And then this should be obvious, but maybe it often isn't: make sure after you've made all these changes that things still work before you try to encapsulate them in a class. But that is my presentation. Here is the tool that you can use to play with it. All of the code is available at this repository. I think I have two minutes. And so I'll toss it over for some questions. Thanks for your attention.