My first RubyConf was RubyConf in San Francisco in 2009, and the closing talk of that conference was this monstrosity by Aaron Patterson and Ryan Davis. And if you haven't seen this talk, you should really go find it on Confreaks. It is a great way to spend half an hour, 45 minutes, because in this talk, Ryan and Aaron cover all of the crazy ideas that they came up with while riding the bus home from the Ruby class they were teaching in downtown Seattle. The fail bus, yeah, we rebranded that bus, the fail bus. And they share a whole bunch of perfectly reasonable code for perfectly unreasonable problems. And back in 2009, I was working at a startup, and my code was running on something that looked approximately like this. And now my code runs in some place that looks like this. So I was trying to figure out what talk to propose for this conference, and I was sitting on the bus, the fail bus, and I'm like, I should totally do another version of this talk. So I'm Aja Hammerly. The code for this talk is on GitHub. I'm thagomizer and it's in the stupid ideas repo. I tweet at @thagomizerrb, and my phone is over there, so you can totally tweet at me during the talk, because I love it when people do that and you won't distract me. And I blog at thagomizer.com. I used to blog very rarely at thagomizer.com, but now I'm paid to blog, so I blog there more often. I work for Google Cloud Platform. I'm a developer advocate, and we do let you run Ruby on our cloud, oddly enough. So cloud.google.com/ruby. I'm gonna be running office hours tomorrow during the afternoon break down on level two. And if you have questions about running your code on Google Cloud, or you just wanna know about Google Cloud, you can come talk to me. And because I work at a large company, unless otherwise stated, all code in this talk is copyright Google and licensed Apache v2, because lawyers. So now that all that's out of the way, time to get on with the stupid.
First idea: load testing. So the code in this section predates my time at Google, and it predates my time coming to Ruby conferences, because it's really old. But that means that it's my copyright and it's licensed MIT. And I wanna be real clear, this is really stupid load testing. I started my career in QA. And as part of QA at a small company, my boss would come and say, hey, can you load test this new feature? And I would say, okay, great. What parameters are you looking for? What's the ideal load you want this to handle? What kind of specs should I run it against? And they're like, load test it. Just do the load testing thing. And so my response to that was to take my relatively limited set of skills at that point and come up with something incredibly stupid. So I had a Rails log of the scenario I wanted to run, and that was from my local server. And I had Mechanize. And it turns out, with those two pieces and not a lot of forethought, you can make a perfectly reasonable load testing framework. So why Mechanize? Because it let me write out a script of code that was then going to run, and I could inject user IDs or association IDs into that script, so I wasn't rerunning the exact same scenario from my log over and over again. Also, Mechanize is actually pretty small, and so if I needed to load a bunch of code onto a bunch of load-generating machines, it was pretty easy to do that. So the big challenge of load testing is to make a whole bunch of computers do the same thing at the same time, pointed at the same server. And if you can do that, you have load testing. There's an analysis piece, but we're gonna ignore that for now, because this is stupid ideas for too many computers. So what's the code look like for this? Well, let's be clear, it's not good. In fact, this is some of the oldest Ruby code that I've written that I haven't deleted yet. And by it not being good, I really mean that.
I knew I had this code sitting around when I proposed the talk, and I went to go pull it up to put it on the slides, and this was the first line that I saw. So yeah, really, truly, the first line of that file. So here's the code. You're probably not gonna be able to read it, it's okay. There's not much here. That's just some preamble to get the Mechanize agent started up. That's me looping through the log file and matching against that horrible regex, and all that horrible regex does is pull out the action, the controller, and the parameters from the Rails log. And then that right there is writing out the code that will make Mechanize do a post to the server based on what the action and the controller were. It happened to be that the project I was doing this for only had two primary actions for our entire application, like 99% of our traffic went through those two, so I just switch. There's a case statement there, I think, or an if. But it's pretty flexible. You have the action, you have the controller, you have the parameters, so you could totally throw that stuff in for whatever your Rails app is. So that's great. Now I can programmatically run the website, but that's still not load. How do you turn that into load? Well, parallelism. You take your script, you do it on a bunch of computers at once for a very long time. Turns out very long time is actually pretty easy. You just put that around your script. So I've used this technique more times than I really want to admit. Such is the life of a QA engineer. You have a tool, you have a problem to solve, and you're always gonna be running at the last minute. This works great. I'm guessing that some of you booked your travel here on a website that I used this technique to test. So the second part is, how do I get a bunch of different computers running? Well, I used Bash.
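Before we get to the Bash part: here's a toy reconstruction of that generator, a sketch and not the original code. The Rails log format, the regex, and the URL scheme are all assumptions. It scans log lines, pulls out the controller, action, and parameters, and writes out a Mechanize script with the whole thing wrapped in a loop so one worker runs forever.

```ruby
# A toy reconstruction of the log-replay generator. The regex and the
# log format it expects are assumptions, as is the URL scheme.
LOG_LINE = /Processing by (?<controller>\w+)Controller#(?<action>\w+).*?Parameters: (?<params>\{.*\})/

def replay_script(log_text, host: "http://localhost:3000")
  # Turn each matching log line into a Mechanize post call.
  posts = log_text.each_line.filter_map do |line|
    next unless (m = LOG_LINE.match(line))
    %(  agent.post("#{host}/#{m[:controller].downcase}/#{m[:action]}", #{m[:params]}))
  end

  # Wrap the generated posts in a loop so one worker replays them forever.
  ["require 'mechanize'", "agent = Mechanize.new", "loop do", *posts, "end"].join("\n")
end

log = 'Processing by SearchController#create as HTML  Parameters: {"q"=>"RubyConf"}'
puts replay_script(log)
```

The generated script is what actually ran on the load machines; generating code as text is exactly the kind of stupid this talk is about.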
I wrote a Bash script that started up 30 versions of this script on a single computer. And the first time I did this, I did it with commodity hardware that we had purchased on eBay. So we had a bunch of pizza box servers, and I had them set on a rolling cart in a conference room, and I was in a conference room because they made too much noise and everyone was getting mad at me. And I managed to get the aluminum shells of those commodity hardware servers warm enough that I could have made a toasty cheese sandwich on them. But I didn't. And the second and third time we ran this technique, we used the cloud, and the biggest disadvantage of using the cloud to do this is no toasty cheese sandwiches. So how do you deploy that? Well, I normally call this section deployment, but really it was just low-scale hackery. And so basically what I did is I made a VM on a cloud provider. I got all my dependencies on there, got Ruby on there, put my code on there, and then I hit the make-more-like-this button a bunch of times. And then to start it up, I had to SSH into each of the servers to start the script up. And it turns out when you do that, you can't do it all at once, and so it takes a little while for you to get your whole load test suite running at full capacity. But I'm gonna call that a feature, because most real load testing frameworks ramp up the load slowly, so my manual load testing framework doing that was just fine. So why is this a stupid idea? First of all, you get no statistics or analysis. When we ran this, we had no idea if any of the agents were timing out. We had no idea what our maximum capacity was beyond what we could get from the server logs. We didn't know what kind of errors we were getting. Also, there are off-the-shelf tools to do this that I did not know about at the time. There are many of them. Apache Bench is a great example. So please don't ever do this unless it's for fun. So that was stupid idea number one. And this is stupid idea number two.
So I adored my boss, and he and I were sitting over coffee one day at our office, and he's like, I have this great idea for an interview question. Can we do sentiment analysis of Twitter using emoji? And when I started proposing this talk, I'm like, hey, can I steal your interview question for my talk? He's like, sure. So I wanna thank my boss for letting me steal his question. And I also wanna say that if you are ever asked this question by my boss, the solution I'm about to present is not the right one. You will fail the interview with this. So if you're not familiar, sentiment analysis is the process of looking at a big pile of text and deciding if the general feeling of the text is generally positive, generally negative, or neutral. And it turns out it's really hard, because we're human and we are not precise with language. For example, this sentence: sure, I'd love to. If I said this sentence to you with a smile on my face, it would be a positive sentiment. If your teenage son says it while rolling his eyes, it's probably not so positive. So it's hard to tell what the sentiment of this is. But we're not actually dealing with words, we're dealing with emoji. So we can probably divide these five emoji up into positive and negative. Maybe something like this: the heart, the thumbs up, the smiley face are positive; the pile of poo emoji and the horns guy, I think the name is the imp, are negative. And yes, I did partly propose this because I wanted to be able to put the poop emoji on my slides for a good reason for once. But you know, if you think about it, maybe the heart is more positive than the thumbs up. So you can also do something like this: put the emoji on a continuum and assign them points. And now I need a corpus of text to do my sentiment analysis against. So I'm gonna use tweets. And here's the code. First thing, I'm using the TweetStream gem, because this allows me to do live streaming against Twitter.
Here's me assigning points to a bunch of different emoji, just throwing it straight in a hash. This looks a lot better on this slide than it does in Emacs, because I can't actually see the emoji in Emacs. And then here's the code that analyzes a single tweet to figure out a numeric sentiment for it. I'm just going through that hash of sentiments. If a tweet includes a given emoji or emoticon, I increment a sentiment integer by the appropriate number of points. Pretty simple. But that's only good for one tweet. If I want to do a whole bunch of tweets, like the tweets that are coming out about RubyConf today, I need a lot of computers. And that's where Rinda comes in. Rinda is an implementation of the Linda distributed coordination language. It's part of the standard library, so I didn't even need a gem for this. This was awesome because it made it much easier to deploy. And Linda uses a shared tuple space to allow multiple processes, potentially on multiple computers, to communicate. And if none of those words made sense, I have a picture. So the tuple space is that cloudy thing. And a tuple, as far as Rubyists are concerned, is just an array, because we're not Pythonistas. And then we have a bunch of workers, and they can write things into the tuple space or read things out of the tuple space. So the server for Rinda is actually really super simple. That's all you need right there, four lines, and you have a tuple server that you can connect to. And I've made this slightly more complicated than it needs to be, because I'm passing in the server URI as a command line param. And to do this tweet analysis, I'm going to have three different types of workers, and I'm going to walk you through basically how this works. So the fetcher brings in a tweet. It writes it into a tuple that has the tweet symbol as the first element of the array, and then the text of the tweet as the second. And then it pushes that to the tuple space.
The analyzer pulls the tweet out of the tuple space, calculates the sentiment, creates a sentiment tuple with sentiment as the symbol and an integer as the second entry in the array, and writes that to the tuple space. And then finally the reducer reads all of the sentiment tuples and comes up with a total sentiment for the entire space. If you're familiar with MapReduce patterns, this probably looks very familiar; it's a MapReduce pattern. So here's the code. Here's the fetcher, and that's the kind of tuple it's going to write. And that's basically all the code you need to do this. I'm just writing the text of the tweet to my tuple space after I create it with Rinda. Here's the analyzer, or the mapper. Slightly more code here, because I have to pull something from the tuple space. That's that first line with the arrow. I'm doing tuplespace.take: anything that starts with the symbol tweet and has a string as the second entry. It's pattern matching. I really like pattern matching languages. This is why I like Prolog. And then the last line is writing the sentiment that I've calculated, using that code I already showed, to the tuple space. And that's what you're gonna get back. And then finally the reducer. Its job is to pull all of those sentiment tuples out of the tuple space, and then I'm just gonna write the total sentiment out to standard error. So you can see a tuplespace.take matching on the pattern, the symbol sentiment and a numeric, and then writing it out to standard error. So that's all great, but I actually have to make this go on something other than my MacBook Air, because it turns out that I can't actually do that much on my MacBook Air. And I was reading on Slack the other day that every conference someone had been to this year had a talk about containers. So clearly containers are the right solution for this problem.
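Before it goes anywhere near containers, the whole fetch, analyze, reduce flow can be sketched in a single process with Rinda's in-memory tuple space. This is a sketch, not the code from the slides: the real version talked to a DRb tuple server across machines, and the emoji point values here are my guesses, not the ones from the talk.

```ruby
require "rinda/tuplespace"

# One-process sketch of the pipeline: a tuple space, a fetcher writing
# [:tweet, text] tuples, an analyzer turning them into [:sentiment, n]
# tuples, and a reducer totaling them up. Point values are made up.
SENTIMENTS = { "❤" => 2, "👍" => 1, "💩" => -1, "😈" => -2 }.freeze

def sentiment(text)
  # Count repeats, so a tweet that is all poop emoji scores each one.
  SENTIMENTS.sum { |emoji, points| text.scan(emoji).size * points }
end

ts = Rinda::TupleSpace.new

# Fetcher: in real life this streamed live tweets; here it's a fixed list.
tweets = ["RubyConf 👍", "💩💩💩", "❤ Rinda"]
tweets.each { |text| ts.write([:tweet, text]) }

# Analyzer (the mapper): pattern-match a tweet tuple out, write a sentiment.
tweets.size.times do
  _, text = ts.take([:tweet, String])
  ts.write([:sentiment, sentiment(text)])
end

# Reducer: drain the sentiment tuples and total them.
total = 0
tweets.size.times { total += ts.take([:sentiment, Integer]).last }
puts total  # 1 + (-3) + 2 = 0
```

The `take` patterns are the same shape as on the slides: a symbol tag plus a class, matched against tuples in the space. Splitting the three loops into three processes pointed at a DRb server is all the distribution amounts to.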
So I'm using the Ruby latest container from Docker Hub, and I'm gonna run this with eight containers on five VMs. And this is kind of a rough architecture diagram. There's one fetcher, there's one server, there's one reducer, and there's a bunch of analyzers. And I work at Google, and so I know Kubernetes, because it's an open source project that started out of Google and is worked on a lot by the people in my actual office. And it takes care of managing all the container stuff, so I don't have to worry about that. And so I spun this up with Kubernetes, and it kind of looked like this. I don't get to pick which VMs the different parts go on. Each gray box is a VM, each colored box is a container. But it turns out I don't care. And this isn't a Kubernetes talk, so I'm not gonna talk about how that works, but if you would like to know the Kubernetes part, I gave this exact same talk with a Kubernetes focus on Tuesday at KubeCon, and that video should be online shortly, and I'm happy to talk to you about it after the talk. So this is the part where I get nervous. I'm on a team of people who are all professional speakers, and they all really like doing live demos, and I really hate doing live demos, but they've managed to shame me sufficiently that I'm actually gonna try to do a live demo today. So if you can get out your phone, or your electronic device of choice, and tweet an emoji at this hashtag, at the end of the talk, if I have time, we will see what the sentiment of this room is. I've got this code all running in Google Cloud right now at this hashtag. Just tweet with that hashtag in there somewhere. The last time I did this, it worked, but someone also filled their entire tweet with the poop emoji. So yes, it will count each one of them. Once it looks like everyone's gotten their emoji on, I'll move on to the next stupid idea I have for today. Okay, I'm seeing more eyes.
So the last stupid idea I have, before we go and see how many poop emoji you guys were able to tweet at that hashtag in the minute and a half I gave you, is Latin squares. So if you're not familiar, a Latin square is a very simple kind of logic problem, and the idea is that you have a grid, and in each row and each column a symbol can only appear once. So this is a four by four Latin square, and this is a seven by seven Latin square that was rendered in stained glass. This one happens to be at one of the colleges in Cambridge and was done in 1989. I saw it and I assumed it was gonna be much older, because it was in Cambridge, but it's still pretty cool. Latin squares were named by Euler, yes, that Euler, because the paper he wrote about this used Latin characters for the symbols. And they're simple. I could hand each and every one of you a piece of graph paper right now and say, draw me a five by five Latin square, and you could do it. But it turns out there's a lot of math, and I am pretty much incapable of giving a talk without some math in it. So let's do some combinatorics, because clearly 5:30 on the first day of the conference is combinatorics time. So a four by four Latin square. How many four by four Latin squares do you guys think there are? Someone shout out an answer. There are that many, less than a thousand. So let's make it slightly bigger. Six by six. Turns out there are that many, almost a billion. Oh no, the numbers get bigger, trust me. So nine by nine Latin squares. Nine by nine Latin squares are interesting because a nine by nine Latin square is kind of a degenerate, simpler version of Sudoku. It just has one less rule, right? So nine by nine Latin squares, there are that many. Which, if you need help, is that number in scientific notation. And to put that in perspective, there are fewer stars in the Milky Way than there are nine by nine Latin squares. But this isn't the biggest number I'm gonna show you. So first, I'm gonna show you some code.
I'm gonna use an approach like a sieve. If you guys have ever done the sieve approach to finding prime numbers, for example, it's the same idea. You throw a bunch of stuff in the top, and then you eliminate the stuff that's wrong. And whatever comes out the bottom is your solutions. And I'm going to abuse Array#permutation to do this. And the reason I'm doing that is because it's easy. I wrote this code up in about 10, 15 minutes. So here's the overall code to generate all the possible Latin squares, plus a bunch more. First line, all I'm doing is figuring out the size. I'm gonna default to six unless I pass something else in. This line calculates all the permutations of an array that contains all the elements one to n, or one to size. So if it was four, it'd be one, two, three, four, plus all the permutations of that array. And it stores that in an array called permutes. And this line creates all the permutations of the permutations of size n and prints them out to standard out, which means you end up with some n by n grids, which look something like this. Now this one is not a Latin square, because I have three ones in the first column. And it turns out that this algorithm generates this many items into the top of the sieve. So to simplify that a bit, if you're familiar with big O notation, it's big O of n bang bang, which is awesome. I laughed so hard when I figured that out, and it's horrible. And secretly I am actually still a tester, because awful things like this make me very happy. So if you run that with size nine, you're gonna get approximately that many items into the top of the sieve, which is a big number, turns out. It's about twice as many as we need, or a little more than that. So this is the body of the method that checks the solution. Setting up some variables here, the size of the solution, and then an example array from one to n.
That line right there checks that each row, when it's sorted, is equal to the example array, which is a fast and simple way to make sure that each and every number one to n shows up in that row. And then I use Array#transpose to rotate the array 90 degrees and do the exact same thing over again. And if it's correct, I'm gonna put the solution out to standard out. So how do I filter 10 to the 50th possibilities to find the roughly 10 to the 27th that are actually Latin squares? Well, clearly, scale. I have access to all these cloud computing resources, clearly I should do something stupid with them. I'm gonna use Rinda again. This is a tuple for a possibility. This is a tuple for a solution. And here's the server. It's exactly the same code as before. No, really, it's exactly the same code as before. And if I wanted to, I could run both these problems side by side using the same tuple space, as long as the tuples they were writing didn't collide. I chose not to do that for a variety of reasons, but I could. So here's the generator. Basically the same code as before, just connecting to Rinda, and then those last three lines are the lines I walked you through before, doing horrible things with Array#permutation. Here's the checker. The line I've highlighted is pulling the possibility out of the tuple space, and then I'm writing the solution into the tuple space, and I'm also putting it out on standard out. So, deployment. Turns out, the same as before, only I need more pieces. I actually need a lot more checkers than are even on this diagram. I ran this with 25 checkers, and it took about 24 hours for it to get through all 10 to the 50th possibilities. But again, I used Kubernetes, and it looked something like this. And most of you probably can't see this, but this is a screen cap of the logging console in Cloud Platform where I was running this.
And the nice thing about that is that it pulls everything from standard out and standard error into my logs. And so what this is, is a whole bunch of nine by nine Latin squares that it's outputting as 2D arrays. So this is the part that's scary. So back to the emoji. Let's see which side you guys are on. There we go. Oh, this is not good. So I'm gonna try to make this bigger for you, but it's okay if I can't. I'm gonna come sit here on the stage so I can look at the confidence monitor and see what the number came out as. As of two minutes ago, the sentiment was negative 2,334. That's how the live demo went, and it worked. So I'm gonna be thankful to the demo gods today. So this is a keynote, and I've actually got a little bit of time left. So I'm gonna go through some profound thoughts, because clearly a stupid ideas talk needs to have profound thoughts. So were any of the ideas I talked about in this talk useful? No, none of them were useful. If you have these problems, I would love to talk to you, because if you need to generate nine by nine Latin squares, I wanna know where you work. Were any of these ideas well engineered? Well, my original intention was to do these things as well engineered as Ryan and Aaron did in their talk, with full test suites and everything else. But if you go check out the repo on GitHub, you'll find out that I did not do that, because no, they weren't really well engineered. Were any of these the correct tool for the job? No. There are off-the-shelf tools for doing sentiment analysis. There are off-the-shelf tools for doing load testing. And there are numerous published algorithms for solving Sudoku or Latin squares efficiently, something that is much better than big O of n bang bang. Was it fun to hack these together? Yeah, it totally was.
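Fun enough that, minus the tuple space, the whole sieve fits in a few lines. Here's a condensed sketch of the generator plus checker, not the code from the slides, for sizes you can actually run on a laptop, along with a back-of-envelope check of that size-nine number.

```ruby
# Generator + checker, condensed: every ordered choice of n rows from the
# n! row permutations goes in the top of the sieve; grids whose columns
# also contain 1..n exactly once come out the bottom.
def latin_squares(n)
  target = (1..n).to_a
  rows = target.permutation.to_a           # every possible row
  rows.permutation(n).select do |grid|     # every ordered choice of n rows
    grid.transpose.all? { |col| col.sort == target }
  end
end

puts latin_squares(3).size  # 12, all the three by three Latin squares
puts latin_squares(4).size  # 576, the "less than a thousand" from the slide

# The size-nine search space: 9! possible rows, every ordered choice of
# 9 of them, i.e. the falling factorial P(9!, 9). My arithmetic, not hers.
rows = (1..9).reduce(:*)              # 9! = 362880
space = ((rows - 8)..rows).reduce(:*) # P(9!, 9)
puts space.to_s.length                # 51 digits: about 10 to the 50th
```

Each row is already a permutation, so the rows-sorted check from the slides is implied; only the transposed columns need checking here.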
Instead of do the simplest thing that could possibly work, I went for do the stupidest thing that could possibly work. And best of all, it was fast. All the Rinda code you saw, all the Ruby code, in total took me less than three hours. The deployment took a lot longer, because it turns out I don't actually understand networking all that well, but I did learn it and I got it working. And one of the things I noticed while working on this talk is that we're really spoiled now with regards to how much computing power we have access to. I work at a place that's famous for that, but I also carry around the equivalent of four processors in my bag most of the time, and that doesn't include my phone. And it's starting to make me question, what is a computer? Ah, whatever. Can a pile of disks in one data center and some processors in another data center be a computer? I'm starting to think so. And a lot of folks are talking about how the limits of what we can compute are getting bigger. The data center is the computer, and it makes it possible to do a lot of amazing things, and a lot of really stupid things. So the moral of this talk shouldn't be distributed systems are fun and easy. Really, that is not the moral of this talk. Distributed systems are easier thanks to tools like Rinda, and thanks to cloud computing, where I can spin up 100 VMs in a data center that's multiple states or countries away. But the real moral of this talk is, don't be afraid to act stupid and have fun. Enjoy your code. So thank you. I explicitly want to thank these four guys: Ryan Davis, Scott Windsor, Eric Hodel, and Brian Dorsey. Without them, this talk wouldn't have happened. Eric was the one who tried to help me finally understand how Rinda was actually working. Scott helped me understand how Docker containers talk to each other and how to get a container's IP, which is gross, in the official Docker way.
And Ryan helped me figure out a less gross and more Ruby-based version of this. And Brian Dorsey is one of the developer advocates on the Kubernetes team, and he was able to help me with the Kubernetes parts of this. And because it's me, I have stickers and information about Google Cloud, and of course I have dinosaurs, which I wasn't gonna break into until I finished my talk. If you have comments or questions, you can email me at thagomizer@google.com. I am inordinately proud of the fact that I managed to get that. Or you can tweet at me at @thagomizerrb. And I think I've got some time for some questions, if you have some. Questions, yes. I have not tried the emoji thing with TensorFlow. I actually wrote this talk a month and a half ago, and I did not know that TensorFlow was gonna be open sourced this week, but I'm going to. That's gonna be fun. TensorFlow is a machine learning library that Google open sourced on Tuesday of this past week. And it's out as Apache v2. You can download it. You can try to do your machine learning and computational analysis with it. Other questions? So the question was, does Rinda handle the parallelism for me? For example, when I say tuplespace.take, does it actually pull it out of the tuple space? And the answer is yes. Rinda handles all of that for me. It handles it in a way that makes sense if you understand that it was done in the simplest way that could possibly work, but it's a little bit obscure. Basically what happens is the client sends a message to the server saying, here's the pattern, and here's a port and a URI where you can send me something back. And it just does that over and over again, and it takes it out of the tuple space once it's been assigned to a specific client. Other questions? Awesome, thank you guys.