For the next 40 minutes or so, I'm gonna be talking about voice controlled home automation in Ruby. So if you wanna follow along, you can go to this link that's cut off on the screen. Also on Twitter, I just tweeted out a link. All these slides are available on GitHub, and all the code I'm gonna be showing and talking through is available on my GitHub too. One thing for this talk: if you've got questions as I'm going along, please feel free to shout out, raise your hand, try to get my attention. If something comes up that I probably didn't explain very well, just grab my attention and we'll try to take questions as we go.
So what this talk is: I've had a home side project that I've been working on, on and off, over the last few years, and what it is today is a voice controlled television remote. I use the Amazon Alexa and I'm able to control my TV with just my voice. So there's kind of two halves to this talk. First part, I'm gonna explain why and how I built an HTTP, web based TV remote. Then the second part, we're gonna talk about the Amazon Echo, how it works, how to build your own skills for it and then, more specifically, how to build a skill to control the TV that I've got. In the end, we'll tie it all together and I'll show a little video of all this working in action.
Quick introduction: I'm on Twitter, GitHub, and I live and work in Chicago at a company called DRW. We're a principal trading firm with offices in London, New York, Singapore, Montreal and Austin. Primarily, I write Ruby, JavaScript and Clojure, but occasionally, like Monday, I find myself writing programs in C. So all across the board in terms of technology stack. One quick warning before we begin. I know some people have hard feelings about JavaScript. There will be some JavaScript in this talk. It's not gonna be too bad, we'll get through it all together. Okay, so let's jump in the time machine and go back to February 2014.
Around this time, a lot of coworkers and programmer friends of mine were doing the DIY home automation thing. A lot of people were using Raspberry Pis or Arduinos, and so I decided I should probably have one of these, I don't know. So a Raspberry Pi is a very small computer that uses very little electricity. It's very inexpensive, $35 I think. And it runs Linux, so who wouldn't want an extra Linux server running in their apartment? But at the time I bought it, I had no idea what I was gonna do with it. Turned out it was a good purchase.
So not long after that purchase, March 2014, I got an email from the apartment building I lived in at the time. The email said that everyone in the building had to upgrade to TiVo, upgrade their cable service. But the benefit of doing so would be a lower rate. So at the time, I lived in this crazy world where I didn't record TV. Mostly I watched live sports, so that was kind of okay. But in order to save money, I figured why not, I'll give this TiVo thing a shot. So, kind of life changing, can't believe I lived without it for so long.
And of course, with great power, the power to record live TV, comes lots of buttons on a remote. So the remote pictured on the left is what I had before TiVo. And it's fine, it's a standard TV remote. Change the channel, change the volume, no big deal. The one on the right, kind of the same, but there's like 20 extra buttons. Honestly, to this day, some of these buttons I've never touched. If you look above the one, two, three on the numpad, there's colored buttons here that say A, B, C, D. I don't know what they do, I've never touched them. I don't think I ever will. I'm scared to find out what would happen if I actually did touch them. So I thought way too much about TV remotes when this crazy remote came into my life. And as a result of that, I tried the TiVo app. Now, maybe it's gotten better, I don't know.
Hopefully no one listening to this right now worked on this app, because I might say some not-nice things about it. So the picture on the left: the first thing you see when you open this app is a giant zoomed-in picture of your remote. Only, because it's so big, you can't actually press all the buttons without scrolling. So not only did I not like the remote in the first place, they wanted me to scroll around in order to use it. It didn't make a lot of sense to me. The other feature of this app, which was kinda cool, was a built-in TV guide with the remote buttons. So you can see what's on TV and then click it and get your TV to turn to that channel. That was cool, except for the whole scrolling thing and the app not caching any data. So anytime you scroll to see other channels, there's this long loading process that made it pretty much unusable.
So I kinda wondered, maybe there's a better way. And the hacker in me started thinking, maybe I could run a man-in-the-middle attack on the network traffic between the app on my phone and the TiVo device. Thankfully, I didn't go down that path too far. I started Googling around, and it turns out that TiVo has an API. So there's a TCP protocol for TiVo. Probably many of you have TiVos at home and had no idea this existed; that's why we're here. So basically, there's a setting you can turn on and it opens up a TCP port. You can connect to it with a command line utility like netcat, or rather telnet, not netcat, and it's super easy. And so you can just send commands. Basically anything you can do with the remote, you can tell your TiVo to do by sending a TCP command.
What do the commands look like? It's basically a command, a space, and then a parameter. And it's all just ASCII characters. Super simple to play around with. So some example commands: there's the FORCECH command, which changes your channel. You can do the TELEPORT GUIDE command, which will bring up the TV guide on your screen.
And a command that's not really as cool as it sounds: TELEPORT to live TV. So one thing to note about a TiVo controller is that when you press the volume buttons, it's not communicating with the TiVo, it's actually communicating through the IR signal to your TV. So despite having all this power through the TiVo TCP protocol, I still needed a way to talk to my TV.
Luckily, I had a Sharp Aquos, which also has a TCP protocol. I think it's fairly unusual for TV manufacturers to include an API like this, but I was lucky enough that my TV actually had a protocol. So similar to the TiVo protocol, you connect to the TCP socket and then you send commands and parameters. In this case, there are four-character commands followed by four-character parameters, so you might have to add some spaces to make sure everything lines up right. But you can send a power command to turn the TV on and off, you can change the input, you can change the volume, you can also ask for the volume, which is useful so you know what the previous volume was, and you can mute. I can't really think of too many other things you'd wanna do with the TV. And obviously, with the TiVo stuff, you can do anything else.
So at this point, as I'm discovering this stuff, and remember this was still back three and a half years ago, I finally found a use for my Raspberry Pi. I could write a little web app that maintained a TCP connection to both my TiVo and my TV, and then there it was, an HTTP-based TV remote. So that's exactly what I did. I grabbed Sinatra and I started writing some Ruby. For the TCP connections, I decided to use an EventMachine connection. EventMachine's a library I'm familiar with, it plays nicely with Sinatra to give you a nice evented model, and it handles the TCP life cycle for you. So I can give the EventMachine connection an IP address and a port and then a connection class, which I'll show you in a second. I've got different classes for the TV and the TiVo.
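To make the two wire formats described above concrete, here is a minimal sketch of how the command strings could be built. The helper names are hypothetical and not taken from the talk's code, the trailing carriage return is an assumption about the line terminator, and `VOLM` is assumed to be the Sharp volume command.

```ruby
# Hypothetical helpers; names are illustrative, not from the talk's code.

# TiVo: a command, a space, and a parameter, all ASCII.
# The trailing carriage return is an assumed line terminator.
def tivo_command(command, parameter)
  "#{command} #{parameter}\r"
end

# Sharp Aquos: a four-character command followed by a four-character
# parameter, padded with spaces so everything lines up.
def aquos_command(command, parameter)
  raise ArgumentError, "command must be 4 characters" unless command.length == 4

  value = parameter.to_s
  raise ArgumentError, "parameter must be at most 4 characters" if value.length > 4

  "#{command}#{value.ljust(4)}\r"
end

# inspect makes the padding and the terminator visible when printed.
puts tivo_command("TELEPORT", "GUIDE").inspect
puts aquos_command("VOLM", 24).inspect
```

Keeping the padding in one helper means every caller produces a fixed-width message, which is the part of the Sharp format that is easiest to get wrong by hand.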
And then the API, if you will, to my remote control is just a bunch of GET commands. They're all GETs, because that was kind of the simplest thing I could do. I didn't want to have to worry about using a POST. It's easy to test from the command line with curl. It's easy to test and use from the browser. So I've got the ability to change the volume up or down, or set the volume to a specific number. I can mute the TV, unmute, turn the power off. There's no power on button. I actually handle that when I change the channel. I sort of have this implied feature that if I'm changing the channel, I probably want the TV to be on and I probably want the input to be set to, in this case, one. And then the last command here at the bottom is kind of a pass-through to TiVo, so I can keep all that logic flexible, outside of this little pass-through. Any command and value I want to send to TiVo, I can do that with the /tivo command here at the bottom.
So, a quick look at what the TCP connection classes look like. For the TiVo, it's really simple. When I receive data, I'm just gonna log it. Turns out the TiVo connection never really gives you anything too interesting, so I don't need to keep track of any state or do anything as a result of the commands. Also, if you think about failure modes here, if I press a button on a remote and it doesn't go to that channel, I'm just gonna press it again, so I don't really care about error handling a whole bunch. It fails in a really good way. And then the second method here, unbind, is called when the connection is lost, and then I just want to reconnect to keep everything connected.
The TV connection class is a little bit more complicated because of maintaining the state of your volume. So if you think about any TV remote you've ever used, there's a volume up button and a volume down button. And they go in increments of one, one unit of volume, whatever that means.
And the way the Sharp Aquos TCP protocol works is you have to set a number for the volume. So in order to set a number, you have to know what the previous volume state was. So when the TCP socket opens, I start polling for the volume; that's this send_data command here. The reason I poll for it is that it turns out the Sharp Aquos doesn't always respond. So I've learned over time that if I just poll for it every second, eventually I'll figure out what the volume is, and then when I go to increment it, it'll be able to take that volume from its current state and add one to it. And then in the receive_data method here, I'm ignoring data that's an ERR or an OK. The only other data I get besides that is a number back, and that's my volume. So I store that in the volume variable. If I unbind, I want to reconnect again. Down here at the bottom is the code to make sure I'm sending commands and parameters that are the right length, and also maintaining the state of the volume.
So I've got the Sinatra app, it's running on my Raspberry Pi, and I built a little web app on top of that. So here's what my TV remote looks like, and it's great because I can pull it up on my phone, I can pull it up on my computer. Let's say I'm at home sitting on the couch, searching the internet, and I want to change the channel. Just open a tab in my browser, click a button, it's done. I don't have to reach for the remote, it's great. As you can see, I've got TV guide data in here. There are lots of places on the internet where you can get very accurate TV guide data. And the other thing to note about this remote: there's no channel numbers, right? I don't care if it's CBS or 602, I just want to turn on CBS. Why should I care about the numbers? So I got rid of numbers and just went with channel names. The other thing is it's very custom. The nine or 10 channels in this picture are the nine or 10 channels I tend to watch, usually the channels that show sports.
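The volume bookkeeping described in this passage (poll until the TV answers, ignore OK and ERR, then step from the last known value) can be sketched without EventMachine. Everything here is a hypothetical illustration rather than the talk's actual connection class: the class name, the injected `sender` callable, and the `VOLM` / `?` strings are assumptions.

```ruby
# Hypothetical sketch of the volume bookkeeping; not the talk's actual class.
class VolumeTracker
  attr_reader :volume

  # `sender` is any callable that writes a string to the TV's socket.
  def initialize(sender)
    @sender = sender
    @volume = nil # unknown until the TV answers a poll
  end

  # Ask the TV for its current volume ("VOLM" and "?" are assumed strings).
  def poll
    send_command("VOLM", "?")
  end

  # Ignore OK / ERR replies; any purely numeric reply is the volume.
  def receive_data(data)
    data.split(/[\r\n]+/).each do |reply|
      reply = reply.strip
      next if reply.empty? || %w[OK ERR].include?(reply)

      @volume = reply.to_i if reply.match?(/\A\d+\z/)
    end
  end

  def volume_up
    change_volume(1)
  end

  def volume_down
    change_volume(-1)
  end

  private

  def change_volume(delta)
    return if @volume.nil? # no baseline yet, so keep polling instead

    @volume = [@volume + delta, 0].max
    send_command("VOLM", @volume)
  end

  # Pad the parameter to four characters so the message lines up.
  def send_command(command, parameter)
    @sender.call("#{command}#{parameter.to_s.ljust(4)}\r")
  end
end
```

Injecting the sender keeps the state logic separate from the socket, so the same object could sit behind an EventMachine connection's `send_data` or a plain lambda in a test.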
So it's also easy to make customized, special remotes for certain times of the year. I don't know if there's any college basketball fans here, but in March, this is what my remote turns into. Four channels with all the games, and I'll never lose track of which team's playing on which channel. So that's the first half of this talk.
And so now, the second half, we're gonna play around with the Amazon Echo. So first off, I have to apologize. I could not get this device to connect. So we're gonna go to plan B, and I've had some issues with this, so we'll see how it goes. The Amazon Alexa is set up so that even if you don't have an official Amazon device, you can actually use the underlying voice services. And so there's a website out there called Echosim.io, and this lets you use Alexa even if you don't have an Alexa device, so we're gonna try to use that in place of the Alexa itself.
So just to begin, I wanna play around a little and see what this thing gives us. We'll start by seeing if Alexa wants to tell us what her birthday is. "Alexa, what's your birthday?" "My birthday is November 6th." Okay, so Alexa's got a little bit of personality. I think November 6th is probably the day that the device first went on sale. Let's try to find out what the weather is. "Alexa, what's the weather for tomorrow?" "Here's the forecast for tomorrow in Angeberg. Look for mostly sunny weather with a high of 58 and a low of 36." So I don't know what city that actually is. What I was expecting to happen with the actual device is we would get the weather for Chicago, and then I would tell you that we got the weather for Chicago because the device is programmed and set up to give us Chicago's weather. But let's see if we can get the New Orleans weather, and it might be more interesting. "Alexa, what's the weather in New Orleans?" "Currently, in New Orleans, Louisiana, it's 72 degrees with clear skies and sun.
Today, you can expect partly sunny weather with a high of 74 degrees and a low of 57 degrees." Okay, beautiful day, I can't believe there's so many of you inside listening to me.
Another thing that's kind of fun to play around with on Alexa is just random trivia facts. I like to try to understand their algorithms and throw out a random question to see what it knows and what it doesn't know. I kind of think about when you Google something and they give you that little box at the top that always has the exact right answer you're looking for. Those are the sorts of things that Alexa has had to be really good at answering. "Alexa, who was the 23rd president?" "The 23rd US president was Benjamin Harrison." Okay, so Alexa got that one. And then another thing that I always end up doing: I'm watching a movie or a football game and I want to know, like, how old is that person? I don't know why I always have this question, but it just always pops in my head, and Alexa's great for that. "Alexa, how old is Tom Brady?" "Tom Brady is 40 years old." All right. And then one more, and then we'll move on with the talk. "Alexa, tell me a joke." "What did the Buddha say to the hot dog vendor? Make me one with everything." All right, so Alexa's full of interesting jokes. The last time I gave this talk, I got an engineering joke, so that was great, but not as good today.
Okay, so a few facts about the Echo. In June 2015, they released the Alexa Skills Kit, and what this did was allow anyone to write a skill for the Alexa. I think this was the first mainstream, popular voice service that released an SDK, and I think that's one of the main drivers of why the Alexa is at the top of the food chain in terms of voice services, compared to the Google Home device and Siri. There are over 25,000 published skills, and over 15 million devices have been sold. And just to give you a sense of some of the things that I use Alexa for: I play Jeopardy every day.
You get six or 12 questions depending on the day. The shopping list is really nice. Let's say I'm in the kitchen and I run out of a spice. I just scream out, "Alexa, add cinnamon to the shopping list." Then there's an app that goes with the device. I'll be in the grocery store, pull out the app, and all those things I forgot I ran out of are just right there waiting for me. Timers are really nice. Again, in the kitchen, you're cooking, you've got like chicken juice on your hands, you don't want to touch anything. Just scream out, "Alexa, set a timer for 15 minutes," and you don't have to worry about it. The weather is another thing I ask it all the time. And then finally, my favorite skill for the Alexa is controlling my TV.
So what is this thing? What is it actually doing? This is a very high level overview, but the device sits there and it's waiting. It's waiting for the wake word, which is "Alexa". There's some other wake words you can set, but "Alexa" is the most popular one. And once it hears the wake word, it starts sending what it hears up to the Alexa Voice Service. If you think about this web page that I've been using, it's not sitting there listening for the wake word; I have to either click the microphone button or hold the spacebar, but it does the same thing. Once it captures the audio that you say, it sends it, that's number two in this picture, up to the Alexa Voice Service.
So when it gets to the Alexa Voice Service, Amazon does this amazing trick where they figure out what it is you just said. In this example I'm saying, "Alexa, tell Pandora to play music." "Pandora" is what's called the invocation name of the skill. So Amazon figures out what that skill is looking for, what utterances, as they're called, it's looking for. In this case, "play music" is the utterance. Then it takes all that information, turns it into a JSON message and sends it over to the skill server.
The skill server does its thing, we're gonna dig into that in a minute, and sends the response back. Then the voice service sends it back down to the device and you hear music.
So for building a custom skill, there's kind of two ways to do this. One is to use AWS Lambda. The other is to create your own self-hosted web service. Basically, you have to receive an HTTP POST. And so there's pros and cons, right? With AWS Lambda, you don't have to manage the compute resources, and you don't need to mess around with SSL certs. You get all the Amazon built-in access control security. It's gonna be free for a long time on the free tier. However, you're limited to a couple of runtime environments: Java, C#, Python, JavaScript. Notice I didn't say Ruby. So of course we're gonna figure out how to run Ruby in an AWS Lambda function. And so that's what we're gonna dig into.
So first, I'll start with kind of a hello world of Amazon Alexa skills. "Alexa, open the greeter." "Hello Ruby conference, I'm glad this demo worked." All right, so I'm really glad that worked. So this is the JavaScript I warned you about. This is an AWS Lambda function that gets invoked through the Amazon Alexa. And it's pretty simple, this is just a hello world example, but the function that's invoked gets two arguments. The first one is your event object. It's kind of like the request object, all the information about what Amazon heard someone say. The second object is context, which has a succeed method on it. So the simplest thing you can possibly do is to just call that succeed with, in this case, the response and its output speech and then some text. And that is what Alexa will say after you respond. So, pretty simple hello world. And so now we're gonna figure out how to do that in Ruby, and then eventually build up a language and build up a skill that actually does something a little more interesting than just saying a canned message.
So, using Ruby in Lambda. The first step is you need a pre-compiled Ruby binary. The reason is, as I mentioned, there are only four available runtimes for AWS Lambda. But when you upload your Lambda function, you can upload a zip with really whatever you want, and then you can shell out from that JavaScript function and execute anything you want. Of course, it's gonna have to be able to execute on the platform where the Lambda function is actually running. So for the Ruby binary I grabbed, I didn't build it myself. There's a great project out there called Traveling Ruby. It's got pre-compiled Ruby binaries for a number of different platforms. One platform is Linux x86_64, and that is what AWS Lambda runs on. So I could grab that Ruby binary and then shell out from that Lambda function to invoke Ruby code, and voila, I'm in Rubyland writing an Alexa skill. So: shell out, execute the Ruby, package up all the Ruby together with the JavaScript file, and configure your Lambda function to execute that JavaScript, which will then invoke the Ruby. All this you upload to S3 or through the Lambda web UI. It's relatively straightforward to configure it all.
So what does this look like? Now we've got our Ruby greeter. So let's do this one. "Alexa, open the Ruby greeter." "Hello from Ruby." All right, so we've got kind of split-screen code going on here. The top half is our JavaScript, and all it's doing in that Lambda function is exec'ing out to Ruby, running our hello_world.rb file and passing a callback in. That callback's invoked when Ruby's done executing, and then I'm able to access standard error from the Ruby process. I just parse standard error as JSON and call context.succeed. The Ruby code here is pretty simple. In this example, I'm just taking a Ruby hash, converting it to JSON and writing it to standard out. So that kind of ties everything together.
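As a rough illustration of the Ruby half described above, here is a minimal sketch of a script that turns a hash into the JSON Alexa expects. The method name is hypothetical, and printing to standard output for the JavaScript wrapper to pick up is an assumption about how the two halves are wired; the hash follows the standard Alexa response shape (a `version` plus a `response` containing `outputSpeech`).

```ruby
require "json"

# Build the smallest response Alexa will speak back.
# (speech_response is an illustrative name, not the talk's code.)
def speech_response(text)
  {
    version: "1.0",
    response: {
      outputSpeech: { type: "PlainText", text: text },
      shouldEndSession: true
    }
  }
end

# The JavaScript wrapper is assumed to capture this process's output,
# parse it as JSON, and pass the result to context.succeed.
puts JSON.generate(speech_response("Hello from Ruby"))
```

Because the only contract between the two languages is a JSON string on a pipe, the Ruby side can be run and checked from a terminal without Lambda at all.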
You can imagine taking that event object, the first argument to the Lambda function, turning that into a string and including that as an argument to the Ruby code, and that's what the next example we walk through is actually gonna do.
So now that we have the very simple hello world example going, we need to build a language for our skill in order to understand text. I wanna be able to say, "Alexa, tell the TV to turn on ESPN," and I need to understand that I said ESPN and not the OC. To do that, we need an interaction model, which is made up of what Alexa calls intents and utterances. So we're gonna go through an example of just having Alexa repeat what we say. We can have an intent called repeat name, and we're just gonna say "say blank", and that blank is what's called a slot, and we're gonna name it Name. And so an intent schema is something that you have to configure when building a skill. There's one intent here, it's called repeat name, and again it's got a slot, and then I'm actually telling Amazon to expect a US first name. The more specific you are when building up your language with Amazon, the better it is at actually matching the speech it hears and giving you text that makes sense on the other end.
So what this looks like when you're building a skill: you've got the name for your skill, in this case name repeater, and the invocation name is here, it's just "repeater" in this case. Then you drop your intent schema in one of the boxes, and your utterances get put there as well. So just a couple of steps and you're off and running. So let's try this out. Does anyone wanna throw out a name, and we'll see if Alexa can understand it? We might only get one part of that, but we'll see. "Alexa, tell the repeater to say Alexander Hamilton." "I heard you say Hamilton."
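The interaction model and the repeat behaviour just demonstrated can be sketched in Ruby as follows. This is a hedged illustration, not the talk's code: the intent name `RepeatName`, the slot name `Name`, and the helper names are assumptions about casing and naming, while `AMAZON.US_FIRST_NAME` is the built-in slot type for US first names and the nested `request.intent.slots` path follows the standard Alexa request shape.

```ruby
require "json"

# Intent schema with one intent and one slot (names are assumed spellings).
INTENT_SCHEMA = {
  intents: [
    { intent: "RepeatName",
      slots: [{ name: "Name", type: "AMAZON.US_FIRST_NAME" }] }
  ]
}.freeze

# Sample utterances map spoken phrases to an intent; {Name} marks the slot.
SAMPLE_UTTERANCES = ["RepeatName say {Name}"].freeze

# Dig the slot value out of the event; returns nil if anything is missing.
def slot_value(event, slot_name)
  event.dig("request", "intent", "slots", slot_name, "value")
end

# Takes the event as a JSON string, as if passed on the command line.
def repeat_response(event_json)
  name = slot_value(JSON.parse(event_json), "Name")
  text = name ? "I heard you say #{name}" : "I didn't catch a name"
  { version: "1.0",
    response: { outputSpeech: { type: "PlainText", text: text },
                shouldEndSession: true } }
end

# A trimmed-down, made-up event for trying the code locally.
sample_event = JSON.generate(
  request: { type: "IntentRequest",
             intent: { name: "RepeatName",
                       slots: { Name: { name: "Name", value: "Hamilton" } } } }
)

puts JSON.pretty_generate(INTENT_SCHEMA)
puts SAMPLE_UTTERANCES
puts JSON.generate(repeat_response(sample_event))
```

Using `dig` keeps the lookup from raising when a request arrives without the slot filled in, which can happen when the speech does not match cleanly.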
All right, so we only got the Hamilton, and that's because, if we go back and look, our utterance here is "say" and then the name, and it's just expecting one word. Had I put two words in there, we could have gotten both words back, but it kind of just ignored that Alexander part and matched on Hamilton. I don't know why it picked one versus the other, but that's why we only got one word and not two. But that's okay. So looking at the Ruby code, this is the Ruby that's being executed from the JavaScript Lambda function. The first argument to Ruby now is gonna be that event request object, and I'm just going deep into that object to pull out the name value, and then I'm plugging it in down here in the output speech in order to get a response back.
So now that we can pull out words that I've said to Alexa, I need a way to get that information back to the Raspberry Pi sitting on my home network. I could have opened up a port to my home network, but that seems very insecure and scary. So the way I solved this problem was by using AWS Simple Queue Service, SQS for short. It's another Amazon service, and it's just an easy way to pass messages from one place to another. I can easily send a message with an HTTP GET or POST. You can also use the AWS SDK to do that, and then same deal for receiving, that's in the AWS SDK too. So receiving a message: this is code that is gonna run on the Raspberry Pi. I'm using the AWS SDK gem, just configuring my queue name and a couple of parameters, with a wait time of 20 seconds. So it's kind of just doing a long poll all the time, and then I print the message out at the bottom.
Sending to a queue from Ruby in Lambda requires a couple of things. I can issue an HTTP GET or POST to send a message to the queue. However, AWS has this kind of complicated process of signing those requests. You sign it in order for AWS to know that you are who you say you are, that you have the right credentials to add a message to that queue, and that the message is new and not old.
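The long-polling receiver mentioned in this passage might look roughly like the sketch below. It is written against an injected `client` so it runs without the gem; in real use the client would be an `Aws::SQS::Client` from the AWS SDK for Ruby, whose `receive_message` and `delete_message` calls accept these parameters. The method name, the region, and the `QUEUE_URL` variable are hypothetical, and deleting each message after reading it is an assumption about how the original script behaves.

```ruby
# Hypothetical receiver sketch; `client` should behave like Aws::SQS::Client.
def receive_messages(client, queue_url, wait_seconds: 20)
  result = client.receive_message(
    queue_url: queue_url,
    wait_time_seconds: wait_seconds, # long poll instead of busy looping
    max_number_of_messages: 1
  )

  result.messages.map do |message|
    # Assumption: delete after reading so SQS does not redeliver it.
    client.delete_message(queue_url: queue_url,
                          receipt_handle: message.receipt_handle)
    message.body
  end
end

# On the Raspberry Pi this could be driven in a loop, for example
# (needs the AWS SDK gem plus credentials; region and env var are made up):
#
#   client = Aws::SQS::Client.new(region: "us-east-1")
#   loop do
#     receive_messages(client, ENV.fetch("QUEUE_URL")).each do |body|
#       puts body
#       $stdout.flush # flush so a downstream pipe sees each line promptly
#     end
#   end
```

A 20-second wait means an idle receiver makes only a few requests per minute while still reacting quickly when a message does arrive.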
So the signing process is a little complicated, and in order to help with that, you could pull in the AWS SDK. However, getting that gem into Lambda would require jumping through some hoops, so I kind of cheated. I found the simplest dependency I needed, which was the aws-sigv4 gem. That gem has four files and no dependencies. So what I did was I grabbed those four files, I threw them in that zipped-up package that I send to Lambda, and so now I've got aws-sigv4 and can sign my messages to send them to SQS. So that's this big blob, the send-to-SQS function. It takes a key, a secret, and a queue. Those things I've configured through environment variables. And then the message is the actual text I want to send to that SQS queue to hopefully come out the other end on my home network. So this function does all the signing and does a synchronous GET request. I want to make sure I'm done sending to SQS before I move on and respond back to Alexa. So this code is gonna look pretty similar to the last example except for this one call, sending to SQS. I wait for that to finish, then I respond back to Alexa.
So let's try this example. I'm going to run that receive-message code here in a super big font. I'm gonna start that, then switch back over here. "Alexa, tell the message sender to send Alexander Hamilton." "I heard you say Hamilton." Okay, and there it is, Hamilton showed up, from Lambda, from Ruby, to SQS, right here to my laptop.
All right, so now that we've got all that out of the way, it's time to actually build our TV language. So now there's more than just a name-repeating intent. There's a bunch of intents that I'm gonna wanna have: change channel, an off one, volume up, volume down, volume set, back, mute, unmute. You can imagine some other ones like fast forward, rewind, whatever you wanna do. Most of these are gonna be pretty straightforward, they're not gonna need any slots. But for changing the channel, I've got a slot, and also if I wanna set the volume to a specific number, that's using an Amazon number type.
So the channel slot type is a list of channels; that's a custom-made slot. What that looks like: it's just gonna be a list of all the words that are channel names, which I'm hoping will help Amazon understand what I say. I mean, let's face it, there's nothing more frustrating than screaming at your Alexa to change the channel to ESPN and it turns on Fox. I mean, it's very frustrating.
So we've got our intents, and then the utterances are the last piece. You can have more than one utterance for an intent, and in fact, the more utterances you have, the better it is. For changing the channel, I've got a few here: "turn on" and then the channel, "change to" and then the channel. The more things you have, the easier it is on the user, because the user doesn't have to remember the exact, perfect, specific language. It's a little more conversational, although not completely. The more things you have, the easier it is to use, and the better the chance that Amazon is gonna understand the speech that it hears and turn it into the correct data to send to your skill.
So we've got all these intents and utterances. So now the TV skill starts the same way again: we've got our index.js function, then there's gonna be a lookup channel function. This is there in order to handle channel numbers, if I want to just throw a number out there to be more specific, or for a new channel that I haven't configured yet. Otherwise, there's a little map to look up channel name to number. Then I've got two hashes here, commands and responses. The commands are what I'm gonna send to SQS, and hopefully you'll remember those HTTP GET routes from my Sinatra web app from earlier in this talk. They look a lot like those, so what I'm actually gonna send to SQS is like the ending portion of a URL.
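A minimal sketch of the channel lookup and the command and response hashes described above might look like this. All names and paths here are hypothetical: the intent keys, the URL endings such as `/channel/`, and the method names are illustrative placeholders, not the routes from the talk's Sinatra app. The two channel numbers are the ones heard in the talk's demo.

```ruby
# Channel name to number map; numbers for NBC and WGN come from the demo.
CHANNELS = { "nbc" => 605, "wgn" => 609 }.freeze

# Hypothetical URL endings to send through SQS, keyed by intent name.
COMMANDS = {
  "ChangeChannel" => "/channel/",
  "Mute"          => "/mute"
}.freeze

# What Alexa says back, keyed by the same intent names.
RESPONSES = {
  "ChangeChannel" => "Changing the channel to ",
  "Mute"          => "Mute"
}.freeze

# A spoken number is used as-is; otherwise fall back to the name map.
def lookup_channel(spoken)
  text = spoken.to_s.strip.downcase
  return text.to_i if text.match?(/\A\d+\z/)

  CHANNELS[text]
end

# Build both the queue message and the spoken reply for a channel change.
def change_channel(spoken)
  number = lookup_channel(spoken)
  return nil unless number # unknown channel name

  { command: "#{COMMANDS.fetch('ChangeChannel')}#{number}",
    response: "#{RESPONSES.fetch('ChangeChannel')}#{number}" }
end

p change_channel("WGN")
p change_channel("602")
```

Keeping the queue message identical to the tail of a URL means the receiving side needs no parsing of its own; it can hand the string straight to the existing web app.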
The responses are what Alexa's gonna say back. Then I have a couple of special cases: if I'm setting the volume to a number or changing the channel, I need to append the volume number or the channel number to both the command and the response. So then I send it off to SQS and I send the response back to Alexa. And so the final step is a little script that runs on the Raspberry Pi in a loop. I run my Ruby SQS listener and pipe the output of that into this xargs curl command, and that issues an HTTP GET to localhost on the port I'm running my Sinatra server on, and at the end of that, I'm inserting what came back through SQS.
So, time to take a look at this in action. "Alexa, tell the TV to turn on NBC." "Changing the channel to 605." "Alexa, tell the TV to turn on WGN." "Changing the channel to 609." "Alexa, tell the TV to set the volume to 24." "Changing the volume to 24." "Alexa, tell the TV to mute." "Mute." "Alexa, tell the TV to turn off." "Now turning off the TV."
All right, so there you have it, voice controlled home automation using Ruby. There's a bunch of links on the last slide in this presentation that go into a lot more detail than I could in 40 minutes, so you can pull a lot of reference data for all this stuff. All the code that backed all these examples is on my GitHub. I think I'm just about out of time. I'll hang out for a while, so if you have more questions, I'll be around and I'll gladly answer them. Thank you.