Well, thanks for showing up so early. It's, what is it, nine? Excellent. In that case, this is actually pretty impressive.

So I want to start with a little bit of a different goal. A lot of yesterday, at least to me, was more of a technical run-through, and there are elements of this that are technical. I mean, I have code samples, and we'll go as deep as we want to. But my real goal here is a little different.

First, I want to spark some idea you might have. If you've got something noodling in the back of your head, and Chris hit on this yesterday, last night, if you've got something you want to pull off that just seems worth your time tinkering with, I want to get it to the point where, if it involves telephones, you're ready to go. It shouldn't feel like there's any real barrier to being able to build Ruby voice applications.

Second, I want to shine a light on areas that might not have been explored before. We've all heard of Adhearsion, or at least a lot of people have. But the number of people who have actually sat down, installed Adhearsion, installed Asterisk, gotten a PBX connection, gotten a SIP connection, and then built something, let alone actually rolled it out in production with real people calling it, is pretty darn small. So I'm hoping we can get a glimpse of what it looks like to have a real application in production, which I do. We'll load it, and we'll refresh, and we'll call it, and watch it crash. Hopefully not.

And then third, I want to help you build something cool, and the coolest thing I could think of was dead Shaw in an igloo. And indeed he does look rather chilly. I don't know. It's him versus the parka.

So here's where we stand today. If anybody came to Ruby Hoedown last year, you might have seen Jay Phillips presenting about Adhearsion. I did not, but I've seen the slides, and I've seen the result in the year since. It doesn't feel like we've really come a year.
It feels like the project's moved forward a little bit. It feels like it's still a great framework. And it feels like we've solved the framework problem: we've got ways to build voice applications in Ruby. But among the crowd, has anybody actually done that? One, two. Okay, so we're two for like 50 or 60.

I want to raise that number, because when you're doing it, and you actually hit a URL, or you run a rake task, or bring up Merb or Sinatra, and your phone calls you, it's really freaking slick. I get a caller ID on this thing that says Cloudvox, or says Send Sign, or one of the other applications we have, and it just makes me smile. It's my version of the App Store, and I don't have to pay Apple.

The other thing I want to do: a lot of the things you might have heard about with Adhearsion, and just Asterisk in general, are kind of hackery. And I love hackery. It's great. But I think Chris really hit the nail on the head yesterday when he said, solve a problem that you have, and there'll be other people who have that same problem. You'll have a user base ready to go. So don't try to imagine it; make sure you actually have that problem.

So the examples that come to mind. If you know of the Roomba that's callable, the Roomba vacuum: it's novel, but it's not really a problem. Same thing with the flower waterer. There's a way to make a flower waterer call you to remind you to water the plant. And again, okay, it's novel.
You've proven that you can make phone calls with software. But you didn't really start out with a spark of "I've got an issue that I want to solve with phones," so the chances of that getting usage, or adoption, or interest, or focus, any kind of emphasis whatever, are zero. We aren't going to feed starving kids in India, but we are going to solve some real problems here.

This is 10 years in. This gentleman over here actually wrote Asterisk back in the day. In fact, the day being last night, I think. This is Mark Spencer, if anybody cares. He rocks, and he's the 10 years that Asterisk has been evolving. Yeah, say hi, everybody. He doesn't bite.

I bring that up because of where we sit, at the very top of this stack. You might write one line of Adhearsion, and that one line of Adhearsion might map to two or three or five or even 10 AGI commands. That's the Asterisk Gateway Interface, which we'll get to here in a minute. And that might in turn map to 50 or 100 or many, many more macros, dial plans, and SIP packets, of course. This really is the very, very top of the stack. We're finally at the point where you can sit down, without knowing a whole lot about the voice world and without knowing a whole lot about telephony at all, and write something useful. That's what I hope to show today.

Okay, so Twitter Vox. Okay, I just said it was going to be useful. Now this, what the heck. If anybody pays attention to NASA at all, like, you know, we are in Huntsvegas. There we go. So they've got a little rocket, apparently. Yeah, and they launched the lunar lander. This is the Mars Phoenix lander with a payphone, Alexander Graham Bell's creation. Oh yeah, exactly. I actually had to borrow this from NASA to pull this shot off.
It's not Photoshopped at all. It is early, isn't it?

So, what this is. Well, actually, a little context is in order. When it was landing, NASA figured that as a good PR move they would put up a Twitter feed, because hey, who doesn't have a Twitter feed right now? Who doesn't follow Twitter? I assume NASA has engineers just like us. We're all reloading all the time, or reading via RSS or Twitterrific or whatever. So they had the bright idea of twittering in the first person, as if the lunar lander was twittering. Which, hey, great PR move. The whole world is linking to it now. They can justify more congressional budget dollars. Justifiably.

What we wanted was to be able to call in and see when it landed. I wanted to hear the latest tweet from the lunar lander over the phone. Now, I wouldn't say this is necessarily terribly utilitarian. It's not going to solve world hunger. But you can imagine that if this was made public, if it was put up on the lunar lander blog, or on NASA's site, or anywhere, it would have a torrent of calls, much more so than, say, the Roomba or the flower waterer. There are things that really do spread virally.

So let's do it. Let me go to the App Store. Oh wait, no, no. And you can see I have succumbed to Steve Jobs's vision. I sold my soul to the man.

So I am going to call, and actually, if anybody else wants to, by all means feel free: pull out your phone and give it a shot. This is running on Cloudvox for the infrastructure, but the actual server that's doing the Twitter part is sort of closer to home. It's a slice on Slicehost. So we may make it crumble, but that's okay. That's what Slicehost is there for. What it's doing, I'll get to in a sec. Let me see if we can actually call in. Put it on speaker here. Keypad. One two four five five.

There's been a new tweet. Here's what it says: number zero eight zero eight zero eight.
It's sol 73 here. Sol equals Martian day. Tosol I'm digging to widen the new Neverland trench, plus looking at some icy spots. Cheers. Goodbye.

Cheers. The lunar lander has told us cheers. So what did we learn from that one? It sounds like crap. Why does it sound like crap? Well, because we're taking total garbage, a tweet, which has its own vocabulary, its own vernacular, shorthand, everything, and we're trying to run it through the most programmatic transformation you could possibly imagine, a text-to-speech engine, and expecting good output. It really does boil down to garbage in, garbage out. I'll show another example here in a minute where we speak English, and when we pass English in, lo and behold, it actually sounds pretty darn good. So I wouldn't recommend using it for free-form text. And actually, if you get a chance, load the lunar lander's Twitter feed. It's just twitter.com slash mars phoenix.

This was Cepstral Swift. Yeah, we have a new Allison voice that sounds better than that. Oh, so Allison's getting sexier. These voices have names, for anybody who doesn't have text-to-speech experience, which makes it more fun to say Allison is sexy. Yeah, so that voice was Callie, I think.

Now, what happened when I placed that call? This whole stack came into play. This is the part that is actually interesting to us. And I will point out one piece of context: the AGI request. That's the Asterisk Gateway Interface. What's cool about AGI is that it divorces the application you're writing from the Asterisk server. Why is that slick? Well, it means you can scale out instead of scaling up. The old model was this behemoth PBX that did everything under the sun, the monolithic PBX. You kept adding voicemail modules, you kept adding conferencing, and we all heard the horror stories of the half-a-million-dollar PBX or the hundred-thousand-dollar PBX. This is the exact opposite.
Let's have the PBX do one thing and expose bindings to us that we can do cool shit with. Cool stuff with, yeah. Doing that means a couple of things. One, we can use the same PBX for multiple applications, of course. But two, it makes it 10 times easier for those applications to scale, because now we can use conventional load balancing techniques.

Now, I mentioned "run call flow." Speaking specifically of Adhearsion, this is what that means. On the client side, where I've written this application, this Twitter Vox, I run this command. I've already created the voice app directory, which can be within a Rails app or outside of a Rails app. When I run this, it starts a daemon, by default on port 4573, which is the AGI port. That's the Asterisk Gateway Interface again.

What goes over that connection is the call information, the metadata, in ASCII. Rather than sending across the WAV or the PCM, which would again keep it from scaling quite as nicely, all that's going across between Asterisk and Adhearsion is a pure text protocol. So you could run this on a DSL connection. With a little luck you could maybe even run the client side on DreamHost. I don't know what DreamHost would say about that.

So when we placed those phone calls, what showed up on that client application was a single TCP connection per phone call. We're all used to HTTP queries; what shows up here is an AGI URL, with the same structure. And actually, my business partner just contributed this patch to Adhearsion, which lets you hard-code the application name and makes it easier to run multiple applications within the same Adhearsion runtime, which is pretty cool.

So this URL gets requested. If you've ever seen FastAGI, it looks really similar to FastAGI, or HTTP for that matter: key-value pairs show up. And then there is a file, appropriately named dialplan.rb, and dialplan.rb takes all the crap that conventionally would exist in extensions.conf.
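To make the "pure text protocol" point concrete, here is a minimal sketch in plain Ruby of reading the key-value block that arrives at the start of an AGI session. It is not Adhearsion's own code. The helper name `parse_agi_headers` and the sample values are made up for illustration; the `agi_` variable names follow the ones Asterisk sends.

```ruby
# Parse the "agi_key: value" lines sent at the start of an AGI session.
# The header block ends at the first blank line. The "agi_" prefix is
# stripped so each entry reads like a plain named variable.
def parse_agi_headers(raw)
  headers = {}
  raw.each_line do |line|
    line = line.chomp
    break if line.empty?            # a blank line terminates the header block
    key, value = line.split(':', 2) # split on the first colon only
    next if value.nil?              # skip anything that is not "key: value"
    headers[key.sub(/\Aagi_/, '').to_sym] = value.strip
  end
  headers
end

# Sample input with illustrative values (not captured from a real call).
sample = <<~AGI
  agi_network: yes
  agi_request: agi://127.0.0.1/default
  agi_context: default
  agi_extension: 1245
  agi_callerid: 2065550100
  agi_calleridname: Example Caller

AGI

call = parse_agi_headers(sample)
puts "Call from #{call[:callerid]} to extension #{call[:extension]}"
```

Since only a few hundred bytes of text like this cross the wire per call, the application side can sit behind an ordinary TCP load balancer.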
That's the Asterisk extensions file. Instead of having it sitting there where only the PBX admin can run it, we've moved it out to the client side, where whoever's doing the application development can make all the changes they want. So you've got the full power of the Asterisk dial plan, but you've got it on the client side, in Ruby, where it matters. And it runs stuff. Hopefully uncommented, though.

If anybody's got a PC open, feel free to pull this up. I'll give you a URL later on that has all this stuff up on GitHub or Gist. What this is, is the Phoenix lander.

So, default, and I'm going to flip back here for a sec. This "hoedown," which in Asterisk is a context, or a set of dial plan instructions, shows up. And like I said, you're running hoedown and you're running a block of stuff. In this case, we're running default with a block of stuff, and the stuff catches an exception if it happens. Otherwise it takes a URL argument, the a-equals-b. In this case the URL argument is called twitter_user, and Adhearsion automatically exposes that to us in the local namespace. So all you need to do when creating this application is point to it, and you can append state, if you will, from Asterisk. A whole bunch of it shows up automatically, which we'll see here in a minute.

And then it does more or less what you'd expect: grab the JSON feed, decode it, get rid of cruft. One caveat: if you're ever doing a demo, be careful with demoing the hot stuff. In a second, you'll see why. I tried to make the inverse.
I tried to make it where Twitter would call me when I tweeted, and we'll see that here in a sec. The problem is Twitter is a little overloaded, because we all know Rails can't scale. So the tweets show up a little slowly.

And then it runs Swift: the name of the voice, what to say, a couple of pauses for grammar, the text that we just built, and then a friendly goodbye.

A couple of notes on text-to-speech engines. This is all on the Asterisk side. The good news is, if you use AsteriskNOW or you install it yourself, some of it you don't have to worry about. The problem is, a lot of it you do. When you download AsteriskNOW you get a pretty bare-bones configuration, so there is some effort required to get the underpinnings going. I didn't cover speed: this setting will pace it, and it makes a lot of difference. It really is the difference between her being all happy and her being like, "Man, what the hell are you saying to me, Twitter?" Garbage in, garbage out.

So I mentioned my inspiration for this: humans should not poll. I actually gave that number for the Twitter feed out to a couple of people, and we got more calls than I expected. I was just playing around, and people started calling it, and they'd tell their friends, and it got viral among friends. The problem is people actually kept calling it. They'd call in like once a day. Which is kind of cool, you know, people are using the application. But it's kind of stupid too, because it's a square peg in a round hole. It's the wrong solution to the problem. Rather than having them call in to get the current tweet, let's change it to have the script call out to whoever you want, whenever a new tweet is posted.

It is "tweet," isn't it? Tweet. I've got to get my grammar right, you know. If I'm in Huntsville, is it still tweet?
Yeah, tweet y'all. Tweet y'all. So, with the tweet y'all.

Here's the polling. This is fairly straightforward if you've used continuations, and actually my business partner Eric gets credit for writing this. What we've got here is fairly basic: grab the current tweet ID, and then execute the continuation. And lo and behold, we're actually able to use the same code that we used for this. I'm going to go back a couple of slides. We're using the same code that we used here, for an outgoing call.

What's this mean? Before, I was able to call in and hear the tweet. Now we've got this separate poller that's looking for information and placing an outgoing call when it finds it. Rather than having to hard-code what should happen when that call goes out, and having basically two separate forks of the code, or having to set up our own library just to do this, what we end up with is that functionality for free.

The way we get that for free is this. When you place an outgoing call, you tell Adhearsion what should happen when the other end picks up that line. In this case, we tell it to do exactly the same thing that happens when somebody calls into the phone number. So we've got the direct route, if you call in, and then we've got this kind of shoehorned route: send this outgoing notification, and once somebody picks up, send them back into that same AGI URL that we used before, on our server.

The other thing I want to note here: it's using DRb for all the outgoing communication. Something that comes free with Adhearsion is the ability to use not just AGI, which is the incoming request, from Asterisk to the client. There's a separate protocol called AMI, the Asterisk Manager Interface, and that's from the client out to Asterisk, for things like outgoing calls. This is how we use it. You'll also note, yes, we pass in the local URL. All this is running on localhost here. Nothing funky on the outside.
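The poll-then-call-out idea can be sketched in plain Ruby without any telephony at all. This is a rough illustration under stated assumptions, not the code from the talk: `TweetPoller` is a hypothetical class, the feed is a stub, and the block that would originate the outgoing call (via AMI in the real setup) just prints a message.

```ruby
# Hypothetical poller: remembers the last tweet id it saw and hands each
# newer tweet to a block exactly once.
class TweetPoller
  # fetch_latest: any callable returning { id:, text: } or nil.
  def initialize(fetch_latest)
    @fetch_latest = fetch_latest
    @last_id = nil
  end

  # Returns true if a new tweet was found and passed to the block.
  def check
    tweet = @fetch_latest.call
    return false if tweet.nil? || tweet[:id] == @last_id

    first_run = @last_id.nil?
    @last_id = tweet[:id]
    return false if first_run # don't call out for a tweet that predates the poller

    yield tweet
    true
  end
end

# Stubbed feed standing in for the real Twitter request.
feed = [
  { id: 1, text: 'Landed.' },
  { id: 1, text: 'Landed.' },
  { id: 2, text: 'Digging a trench.' }
]
poller = TweetPoller.new(-> { feed.shift })

3.times do
  poller.check do |tweet|
    # In the real application this is where the outgoing call would be placed,
    # pointing the answered call at the same AGI URL used for incoming calls.
    puts "Would call out to read: #{tweet[:text]}"
  end
end
```

Keeping the fetch and the call-out as injected callables is a design choice for the sketch: it lets the polling logic be exercised without a phone or a network.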
Oh, and synchronous and asynchronous. I don't have an example here, but it's pretty straightforward to either block or not block, depending on what you want for the user behavior. There are cases where, you know, if somebody's hitting a web page, you obviously don't want to block. Get it out of the inline path and go to town. But if you're running it as a rake task, or you need an absolute return code, by all means block.

Where it gets a little funky is when you're calling out. As Courtney might say: fail. Somebody's not going to pick up, or they're going to have a busy signal, or maybe they're going to pick up but it's not going to be a human, it's going to be voicemail. There's a whole bunch of corner cases that you have to consider if you really care about the user experience. But for our purposes, for just kicking some rear and building cool stuff, it doesn't matter in the slightest. Just go to town.

Oh yes, the process of building voice apps. I learned this the hard way. I try to be pretty TDD, but, well, it got to be really easy to just pull out the phone. "Oh, I've got another test here. I just implemented a couple of lines. Let me call it and see, does it work or not?" Well, imagine testing the Xbox without having an emulator. You would poke your eyes out. And I rapidly approached that point, even with teeny, teeny little apps, where I'd be like, "All right, I'm going to call in again. I'm going to call in again and test it." So if you have to call in, you're doing something wrong. You look like this guy, who's, I'm not sure what he's playing with, but it looks nuclear. And his hat says "I love you" for some reason. This is what Google Images turns up. You never can tell.

So here's the way I've had the best luck. Start out on paper. Mock up exactly what you'll do. It can literally be a little flow diagram; in fact, that's the most fun. Small first: do the very, very core functionality. No logging, no errors.
No nothing. And then, placeholders first. When you call into a professional text-to-speech system or a professional IVR, usually what you're hearing is a WAV file or an MP3 or something. It's a prerecorded audio file. Don't do that. Don't bother. Use text-to-speech for everything, and until you get the exact workflow right, keep it as low-hanging as you can.

So, back to the incoming calls. When we called into Twitter Vox, this is everything that Asterisk passed in to the AGI library. Now it gets even slicker, because Adhearsion does the heavy lifting of turning these into variables for you. It lops off the agi underscore, and all of these become first-class named variables. So, for example, why is this slick? You get the caller ID. Why is this even slicker? If you enable it, and your carrier doesn't disable it, you get the name, or some permutation of it. Granted, the carriers have to choose to pass it, both your SIP carrier and the cell phone or PSTN carrier. But it can actually be pretty slick. When I call in to this number, it shows up as "Davis, Troy." Granted, reversed, but at least you kind of know who called in. And you get the extension. You know exactly what they called and how they showed up.

So, one more. This is a little more real, and it also falls into that category of viral stuff. This is called Yelp Vox. I kept running into really, really craptastic food. Fast food. And this happened again in Huntsville. Mark actually introduced me to a really good Mexican place, so thank you for undoing the damage of Thursday.

What Yelp Vox does is assume that you have the least-common-denominator cell phone. All you have is voice support. You don't have a web browser. You don't have SMS. Although, as it turns out, I'd argue it's actually better than using either of those two, and we'll see why here in a second. The assumption is that you only have phone calls.
Having only voice lets you use the phone number of the restaurant that you're closest to as sort of a proxy for geolocating where you are. We have no GPS, and we can't ask you to enter the street address, because you'd poke your eyes out. So we apply a little creativity, and we ask you for the phone number of the restaurant you're near. And lo and behold.

Oh yeah, that's the other thing. Why would you use this with the iPhone now? Well, I didn't want to buy the iPhone until the 3G came out, so I had to build this for myself. As it turns out, phones really are the least common denominator. People still use Google 411. Even with SMS and the web and Safari on my iPhone, I still call into Google 411, because it's the fastest way to get a phone number lookup. And that's still true here. It's the fastest way. If you're standing in a mall, or a strip mall, or whatever, and they've got the phone number on the door, you're 30 seconds away from knowing what's near you. You don't have to dink around with a browser and figure out whether you're on 3G or EDGE or whatever. And you can extrapolate from that. Granted, that's a corner case, but it's definitely true that a voice phone call is the one thing that your grandma understands about a cell phone. I hope.

So this is Yelp Vox. I'm not very creative with the names, I know. If anyone has suggestions, I'm all ears.

Smarticus, please enter the number you wish to call.

Okay. This voice is not very hot, I would think. 25905. Uh-oh.

Your call cannot be completed due to network error.

Brutal. So we've got two choices here.
We could fork the presentation. Namely, I would go launch a terminal and run that ahn start command that I forgot to run this morning. That would be one fork. The other fork is, we're going to assume that the code makes a better example. These slides will be up, and the phone numbers will continue to be there. So I hope this is kind of a blueprint, and one that you can share with anybody else as a way to call in. I mentioned this is all up on Gist or GitHub. This is the URL here. Feel free to pull it up. It's got the full version.

What we're doing is, oh, a little context here, too. Before, we had the little dialplan.rb file, right, that had the context, like default or hoedown, and then had the block of code to run when a call showed up in that context. In this case, we've actually taken that code and moved it into what's called an Adhearsion component, which basically amounts to a helper. That helper looks like this: a plain old class. We end up getting the call object, which has all the goodies attached to it, showing up as call. Then we throw in a couple of accessors. It gets initialized just like you'd expect. And then down here we go through it. You can pretty much read this; it's self-explanatory.

Let's pop into a couple of the methods that are doing the magic. You can see we get the restaurant by phone number, describe where you are right now, and then see what else is nearby that might not suck as bad as wherever you're standing.

Which brings up a pretty good point, or at least one that you'll run into really quickly. There's a right way and a wrong way to address somebody when you're talking to them over the phone, and that extends to computers too. It's a little strange.
It's almost like building your first website, in that you realize all the things that characterize a professional-quality, or even just an enjoyable-to-use, voice service, like 1-800-GOOG-411 or 1-800-FRU call. There really are a lot of teeny, teeny decisions. Latency matters. Is there a two-second pause, or is there a five-second pause? What do you do if you have to go out and get data off some other web service and it takes some time? How do you deal with telling the user that? How do you prevent her...

Mark, is that Allison? Sad. Allison's the text-to-speech voice. She looks like she could be Allison. Oh, yeah. Okay.

So how do you ask them for something? If you're collecting digits, like for Yelp Vox, where we have to collect a phone number, the goal is to minimize the amount of data entry, minimize the confusion, and get them over that as quickly as you can. Because once they've entered data, they're yours, and by that I mean they're going to complete the call. So maybe you get it to the point where they press one to do something and that's all you need. Or, as in this case, if you're getting caller ID anyway, why wouldn't you just default to the area code that the person's calling from?

So lo and behold, we get the restaurant by phone. We say "Hello, stranger," or if a name was passed in, we throw it out there. And then, "Please enter a restaurant's 10-digit phone number." I didn't extend this to handle eight digits or seven digits, but you get the idea. We've got a string. It has the phone number. Voila. And then we use the Yelp API to pull this back.

Now, the overarching point here is it took, what, 10 lines here and 12 lines here to pull off a service that you could actually share with people, and that, if you told Yelp about it, they'd probably put up, and you might get a thousand calls or five thousand calls or whatever, just from people tinkering with it. And then from there you can decide: hey, do I want to flesh it out?
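As a sketch of the "default to the caller's area code" idea, here is one way it could look in plain Ruby. Note that the talk's version only accepts 10 digits; the 7-digit fallback below is the extension being suggested, not something the original code does. The method names are hypothetical, and the numbers assume North American formatting.

```ruby
# Greet by name when caller ID supplies one, otherwise fall back to "stranger".
def greeting(caller_name)
  name = caller_name.to_s.strip
  "Hello, #{name.empty? ? 'stranger' : name}."
end

# Turn keypad input into a 10-digit number, borrowing the caller's area code
# when only 7 digits were entered. Returns nil if the input can't be used.
def normalize_restaurant_number(digits, caller_id)
  entered = digits.to_s.gsub(/\D/, '')
  from = caller_id.to_s.gsub(/\D/, '')
  from = from[-10..-1] if from.length > 10 # drop a leading country code

  case entered.length
  when 10 then entered
  when 11 then entered.start_with?('1') ? entered[1..-1] : nil
  when 7  then from.length == 10 ? from[0, 3] + entered : nil
  end
end

puts greeting('Davis, Troy')
puts normalize_restaurant_number('5550123', '12065550100')
```

A caller who only has to key in seven digits is that much closer to having "entered data," which is the point at which they tend to finish the call.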
Maybe I want to let people record comments about the experience they had. Maybe I want to let people press two to hear other people's comments spoken to them from the Yelp website. Yelp is like Citysearch, but it doesn't suck. If you get over the hump of having an application that you can extend, and having those first few users, you'll have the motivation to do the rest. At least I have so far.

When are phones good? When are phones really, really awful? They are awesome when you need an immediate response. Immediate as in it calls me, rather than I call it. They are awesome when you need a fast response, fast as in low latency: "Hey, what the heck?" People can ignore SMS, and people can certainly ignore client push, like on the web.

Really, the two big ones, for me at least. First, the phone is two-way. And yes, SMS is two-way. But the first time you ask someone other than yourself, even for an internal application, the first time you ask an employee to sit there and type a little code as a response to an SMS, it's not going to happen. On the other hand, if you read them a little thing and say, press two for this and press three for this: done. You're not going to have a four-way round-trip conversation with SMS. And the other reason is, everybody's got one. If you're designing for something that's either publicly consumable, or even for a group of 20 people, other than this room, you're unlikely to find 20 people who all have high-end smartphones.

And last but not least, we have lemmings. I don't know about you; I like lemmings. Lemmings are easy to predict. So we've got them here, and I'm thinking, okay, why not do American Idol for Huntsville? So this is the Ruby Hoedown Idol. Now, I didn't populate it all the way. I didn't really want to get ranked at the very bottom of the speakers.
So you're not going to get to vote for real people here. Which is good. I'm protecting my own reputation, you know.

So what we've got, and this is a mash-up with Rails, if you will, except there's no real mash-up to do. It's more or less automatic. We've actually got configuration instructions up on cloudvox.com if you want to see how to glue Rails and Adhearsion together. It's step by step, and takes about five steps. It took a whole bunch of effort on our part to get it down to five steps, but now that it's there, that's what it is.

So, do a couple of validations. This actually spawned out of my best friend's girlfriend asking whether it's possible to stuff the ballots for American Idol. So indeed, it's not. I speak empirically. What we can do here, though, is replicate that same thing. We have a vote model. We do basic validations, including seeing whether the caller ID can do stuff.

Now, what I'd ask you to do is imagine for a second that in your applications, in less than 10 lines, you can control phone conferences. You could kick off a recording and get back the URL to an MP3. You could let somebody call in and get bridged to an MP3 and press two to pause it. You could collect as many digits as you want, and not just play them back but post them to a real web interface. Consider these as building blocks and start thinking about what you could build.

Now, why couldn't you stuff the ballot? I mean, if you've got DIDs, you could just call from all of those different caller IDs, and you can set your caller ID to whatever you want on the DIDs.

This is true, and it is doable, with that caveat.
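The vote model itself is an ActiveRecord model in the talk and isn't reproduced here. As a rough stand-in, this plain-Ruby sketch shows the kind of caller-ID validation being described. `BallotBox` and its return symbols are invented for illustration, and it enforces one vote per caller ID, which is stricter than simply counting how many times someone has called.

```ruby
# Hypothetical in-memory ballot box: one vote per caller ID.
class BallotBox
  def initialize
    @votes = {} # caller id (digits only) => contestant number
  end

  # Returns :ok, :no_caller_id, or :already_voted.
  def cast(caller_id, contestant)
    id = caller_id.to_s.gsub(/\D/, '')
    return :no_caller_id if id.empty?       # blocked or missing caller ID
    return :already_voted if @votes.key?(id)

    @votes[id] = contestant
    :ok
  end

  # Contestant number => number of votes.
  def tally
    @votes.values.each_with_object(Hash.new(0)) { |c, counts| counts[c] += 1 }
  end
end

box = BallotBox.new
puts box.cast('206-555-0100', 1) # first vote from this number
puts box.cast('2065550100', 1)   # same number, different formatting
puts box.cast('', 2)             # no caller ID presented
puts box.tally.inspect
```

As the exchange above points out, a check like this only holds as long as the caller ID can be trusted, and a caller who controls what they present can get around it.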
Yeah, and actually, I don't want to get too far into it, but there are two caller IDs. There's the one that you set, and there's the one that the carrier passes down to you if you have a PRI, a high-end or at least slightly higher-end circuit. Exactly. And in theory you wouldn't be able to spoof that one, although it can be done too. So actually, if anybody wants to organize an American Idol rebel, you know, we could probably pull that off.

So what's that do? Oh, and obviously there's the TinyURL if you want to pull that up. This is also up on GitHub. So here's how we cast a vote. This is the dialplan.rb. Lo and behold: grab the count, pick our voice, create the text string, and then tell somebody, this is how many times you've called. Cast the vote. And then, hey, here's your result.

Now, clearly not the most sophisticated thing, but that's the point. Ruby is not the sophistication here. This is Ruby that we all know. This is the same functionality that we all know. If anything, it's on the very, very light end of the difficulty spectrum. Somebody with pretty basic Ruby experience could pull this off, and they'd have a pretty neat thing going. They could extend it in any direction they want. And that's, I guess, the underlying realization here: this isn't a technical problem.

There are a couple of other libraries that I'll cover here. I actually just contributed some code to Telegraph. The difference is this. With Adhearsion, all the stuff that we've seen is pretty much standalone. It's got this dialplan.rb that lives in its own world. It has access to the Rails models or ActiveRecord models, and it can do whatever it wants, sure, but it's pure phone code, geared only for phone code. In contrast, Telegraph mixes the two. The way it does that: if you've seen respond_to, you know, wants.html, wants.xml, the different MIME-type-based responses, this actually adds a placeholder MIME type for voice. Which sounds really cool.
It sounds like an awesome abstraction. The farther I got into it, the less it was, because the things people want to do in phone calls are very different from the things people want to do over the web. So even though you can have a separate view that is your phone view, which decides how to speak the data and does whatever you want that's specific to the phone, it turns out that the controllers are different enough that it's really more effort than it's worth.

And then the last one, and this is actually worth mentioning. It might be the easiest way to get started, although Telegraph's come along, and Adhearsion's come a long way. It's RAGI, and this goes back a couple of years. This is the original Ruby voice API. This one is mixed. It's, dare I say it, closer to PHP than anything. It isn't what I would stick with, but it does let you do some stuff as a one-liner or a two-liner.

So, back to the examples. Back to building cool stuff and not being too worried about anything except solving my own problem. I went looking for houses, and if you've ever been to Seattle, that's a really sorry experience. Certainly before the bubble, and even now, it's a painful experience. I'd wander around, and I'd see a house or a condo, and I'd see a flyer box. But it wouldn't look like this flyer box. Instead, it would be empty. Always. Like half the freaking flyer boxes. All I ask of a real estate agent, or all I would ask if I was selling a house, is to keep this flyer box stocked, so if somebody walks by, they're going to get a flyer.
They don't do that, for whatever reason. They're getting their six percent or what have you, and the boxes are empty. So, my rant aside, I set out to solve this problem because, again, I was carrying around a device that is totally capable of solving this problem for me. So I built SendSign. What SendSign is: you can call in, or SMS in, or email in from your smartphone, a house number or a multiple listing for-sale property number, and get back the bedrooms, the bathrooms. Does it have hardwood? Does it have a deck? What have you. What's the price? When was it built? All that good stuff. And if you're an agent, you can upload that information or pull it off the MLS directly.

Why is that cool? It's really cool because now I control my own destiny. And if I turned this on for the public and told them where they could control their own destiny, I'd be able to, I think, get pretty quick viral adoption. It doesn't take too much. You know, granted, there's two million iPhones out there, and probably another two million BlackBerrys, but that means there's 200 million plain old phones in the U.S.
That people are carrying around in their pockets And you've got to figure other people have encountered that same problem that I have And they don't need a sign, you know, there's doesn't need to be anything special You can tell them sidebands So this is slick because it means somebody with ruby experience could build what amounts to a standalone business Um, I flesh this out pretty far to the point that it effectively is now And we'll give it a shot Please enter the number you wish to call Hello Welcome to friends by Please enter house number or the mls number of any home for sale Oh, damn I went over to Let me try that one more time This is what I get for having half production Applications I keep putting myself into this, you know, I'm going to demo everything even if it kills me and it is Let's see what we've got That may be uh Apparently the hardest part of this is learning to work my cell phone And this latency is some processing I'm doing Enter the house number or the mls number of any home for sale That takes a spin That is not and actually that's a prime example of what it can sound like in production That's why I tell people don't worry about what it sounds like with text-to-speech because you're not Just a part of Brooklyn Avenue to 21 in Seattle features two bedrooms and one bathroom Is 938 square feet offered for two hundred ninety nine thousand nine hundred fifty dollars It was built in nineteen twenty nine and features laundry room hardwood balcony natural gas Visit www.sensine.com to learn more about the home you visited Don't forget to add this phone number to your phone's address book Thank you for calling So why did I bother with that? I wanted to show the relatively small amount of effort that created other than being 50 reliable this uh This actually live application that data wasn't static. 
That was actually pulled off the MLS, fed through, and lo and behold. I'm going to detour for a second, and you're going to be stunned to see I'm running Windows. It is lame. I've gone through three laptops in three weeks, and I haven't had a chance to reinstall Ubuntu, let alone Gentoo. So, if I can actually get a browser, which is easier said than done in this hellish OS. There we go.

So why do I demo that? Here is why I demoed that. What we've created, and granted, Adhearsion in Ruby is only a little part of this, you know, there's SMS functionality, there's email functionality, is an offline mode for Firefox. We're really good at creating that offline mode. I don't even have to write any code to create that. Forgive this quick detour. I think it will be worthwhile.

By the way, if I can put in a plug: Sprint internet cards, anywhere. I was really unwilling to pay the money for it. It's worth every dime. This is the Sprint EV-DO magic. So what we're going to do is get ourselves some interweb with the Sprint SmartView. Could you come up with a better marketing name, guys? Come on. Somehow they managed to take the Nextel logo. The Wi-Fi works too? Does it? Okay, thanks. The Wi-Fi is running on two Sprint cards. So, another fancy... there we go. So I am now connected, which means... oh, by the way, this was built in Google Docs.
So I'm not totally a Windows kid. What we're going to do is load sendsign.com and, lo and behold, the number that I just called in with from my cell phone. Imagine for a second that I'm a house shopper, or I'm just Joe off the street who walked by a house that was for sale and it piqued my interest, and the flyer box was empty. Or even if it wasn't empty, normally I'd just pick one up and throw it in the back of my car, because who shops for houses or condos by grabbing flyers and then reading them when you get home? That's just not how you do it. Rather than doing that, what if I could show up, and then at home... wait for the page to load, and load, and... come on. See, more proof that Rails doesn't scale. There we go.

And voila. So I'm going to log out just to show the experience here. Let's say I show up at sendsign.com. I'm Joe off the street, and this is unifying phone, web, and anything else you want to add to it. How did I inquire? Well, I called in, and my number is 206-683-8769, and by saying this on video I'm guaranteeing I'm going to get all kinds of stuff, I'm sure. Please don't spoof me. And four minutes ago I called in here. And lo and behold, we're pulling the same data off the web and we're able to unify that experience. So we've now used the phone as the way to connect people, to bring them back to the web. We've solved that original problem we set out to solve.

Everybody is happy. I'm happy as a buyer because I'm not lost and I can answer my question right then. I'm happy when I get home because I get to get real information about it. And as a real estate agent, I actually get to communicate with people without, you know, getting their email address or their privacy-invading information. But I get to put something up here that connects the dots. As a buyer,
I want this information. So then, as an agent, if I have a listing that is active, I can see it. This is one. This is a different listing. But I can see nothing that compromises privacy. We make sure of that, and this isn't commercial. It's just a concept. I wanted to prove the concept. We see who called in, but nothing specific, just how and when and where. And, you know, I get basic updates and all this. The slick part of this is it's essentially a business. You could go around to real estate agents, and for all I know we may, and say, "Hey, we've solved this problem for you." And it's all Ruby. It's mostly Adhearsion. And it's all phone based. This is the type of stuff that, if you want a side project, is totally buildable.

So, back to this. Here's what it looks like on the back side. This was actually built with Telegraph instead of Adhearsion, and it introduces a couple of concepts that only come up because of the phone world. In the phone world, you don't have a Google Analytics. You don't really have any kind of real reporting. You've got to handle it all yourself. Somewhat annoying, but different things matter. If somebody hangs up, I did something wrong. If somebody goes all the way through and has to spend three or four minutes on the phone, I did something wrong.
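To make the "hang-ups and long calls both mean I did something wrong" idea concrete, here is a minimal sketch of do-it-yourself call reporting in plain Ruby. The `CallRecord` struct, its fields, the sample numbers, and the 180-second threshold are all assumptions for illustration; none of them come from the talk's actual code.

```ruby
# Hypothetical record of one inbound call. Field names are illustrative only.
CallRecord = Struct.new(:caller_id, :seconds_on_call, :completed)

# Assumed threshold: the talk mentions "three or four minutes" as too long.
LONG_CALL_SECONDS = 180

# Classify a call the way the talk describes it: an early hang-up and an
# overly long call are both signs that the experience needs work.
def classify(call)
  return :abandoned unless call.completed
  return :too_long if call.seconds_on_call >= LONG_CALL_SECONDS
  :ok
end

# Tally outcomes across many calls, a tiny stand-in for phone "analytics".
def summarize(calls)
  counts = Hash.new(0)
  calls.each { |call| counts[classify(call)] += 1 }
  counts
end

calls = [
  CallRecord.new("555-0100", 12, false),  # hung up early
  CallRecord.new("555-0101", 45, true),   # quick and complete
  CallRecord.new("555-0102", 240, true)   # finished, but took four minutes
]
p summarize(calls)
```

Counting outcomes per day or per prompt would be a natural next step, but that goes beyond what the talk describes.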
So you're measuring for different things and caring about different things. And we also have to track this, assuming you want to support multiple protocols: the request method, a little bit of the logging. And then this is what I was alluding to about how Telegraph overloads the MIME type and creates a virtual voice MIME type, if you will. So just like you render a template with the HTML version, you render a voice template, and that voice template or partial looks like this. We're going to play something. This is a view, and it actually lives in your views directory as blah.voice. Play something, you know, using variables set by the controller, and then this is where it gets a little hairy. We actually have a voice form that posts to a controller but looks the same as, say, form_tag would, or any other type of form. So we're telling it what to collect, what to play first, what to store it in, all the things about collection. And then when the user does that, when they press pound at the end, it actually shows up just like it would via a regular HTML POST.

Sadly, we're all stupid. I'm stupid. My users are stupid. The listing providers are stupid. Everybody is stupid. All I have to do is deal with their errors day in and day out. So, users. They'll drop off from time to time. Their expectations are a little different than yours.
One neat thing: the second time they call in, they're thrilled. The first time, it seems, is always the learning experience with any kind of phone or voice app. The minute they call back in... in fact, if you build something to maturity, you may want to adjust the behavior for future calls, because callers have different expectations than on that first one.

The listing providers are stupid. As I mentioned, Twitter is not exactly the best text-to-speech input. You have to parse out enough granularity so that you're not using text-to-speech for huge blobs. Here I want to use text-to-speech for the number of bedrooms, or I want to use text-to-speech to say "hardwoods" or whatever. But as you mature this, and this gets back to my point, you don't need to worry about text-to-speech quality that much. You don't need to worry about speech recognition quality that much. Why? Because you won't use either of them very often.

Uh-oh. So "Stupid" is calling. Unknown. In effect, they probably are. So actually, I set up the Twitter watcher on Mars Phoenix this morning, so this may actually have been a Twitter alert. Yeah, exactly. Test. Stupid live... everything is so darn stupid. Wow, I'm popular today. We're going to just shut this off for a minute.

So, why are the listing providers stupid, and why should you not really care about text-to-speech or speech recognition? The reason you shouldn't care about either of those is that by the time it actually sees the light of day, and by the time you're sharing with people who aren't your friends, the amount of stuff, or the percentage of the words, that you are going to use text-to-speech or speech recognition for is infinitesimal. You might use it for numbers.
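One way to picture "recorded prompts for almost everything, text-to-speech only for the small variable bits" is a plan of play and speak steps. The sketch below is plain Ruby under stated assumptions: the prompt file names, the `RECORDED_PROMPTS` table, and the `[:play, ...]` / `[:speak, ...]` step format are hypothetical and are not part of Adhearsion, Telegraph, or any other library mentioned in the talk.

```ruby
# Hypothetical map from feature keywords to pre-recorded prompt files.
RECORDED_PROMPTS = {
  "hardwood" => "prompts/hardwood",
  "deck"     => "prompts/deck"
}.freeze

# Build an ordered list of steps for describing a listing.
# [:play, file]  means play a recorded prompt.
# [:speak, text] means fall back to text-to-speech.
def prompt_plan(listing)
  plan = []
  plan << [:play, "prompts/this-home-features"]
  plan << [:speak, listing[:bedrooms].to_s]   # numbers are a fair use of TTS
  plan << [:play, "prompts/bedrooms"]
  listing[:features].each do |feature|
    recorded = RECORDED_PROMPTS[feature]
    plan << (recorded ? [:play, recorded] : [:speak, feature])
  end
  plan
end

# Fraction of steps that still depend on text-to-speech.
def tts_share(plan)
  plan.count { |action, _| action == :speak }.fdiv(plan.size)
end

plan = prompt_plan(bedrooms: 2, features: %w[hardwood deck marble])
plan.each { |step| p step }
puts tts_share(plan)
```

As more keywords get recorded prompts, the text-to-speech share shrinks, which is the point being made about not worrying over synthesis quality.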
Maybe you'll use it for a couple of keywords. But as you saw here, where I was saying "hardwoods" or "it has marble" or whatever, it's just as easy to record prompts, and even if you had the most awesome text-to-speech in the world, you would still spend the effort to record prompts. There aren't that many cases where you need truly freeform data, and not that many of them are good for phones. So don't worry about it. Build it and the problems will go away.

So, logging. Again, there isn't a WebTrends or Google Analytics for phones, so you've got to do it yourself. I have a Request model, which, uh, I should have renamed so I don't look like a total clown. Don't ever name a model Request. It belongs to things that make more sense from a naming perspective, although Recipient, again, shoot me in the head. Before validation we set the default timestamps, because we care about a couple of things, like, say, when they're requesting it, and then we format it cleanly. This to_s is just what the view is actually inlining for the agent. So, pretty clean formatting, nothing magic there. And then a result type that has many failures. This is probably the most interesting part of that, in that as soon as you have the slightest few users, even if they're internal... We're working with a company now that does restaurant listings online. They have an awesome iPhone app that rocks, and it has led to a whole bunch of people saying, "Hey, you have some data you might want to correct." And they're doing that, or considering doing that, over the phone. Which is a perfect use case: call out to the restaurant and say, "Hey, are you still open? Is this your current information? Press 1 if it is, press 2 if it's not, leave yourself a voicemail." Voila. They've eliminated days if not weeks' worth of human calling. But when you do that, you really want a model like this to track what happens, to make sure you're doing a good job with your experience.

Oh, yes. So, getting this running. It's really magic. Your DHCP... you... oh wait.
No. Shit. Reality. Sorry. You set up a whole bunch of stuff, and then at the end you're finally able to do that last 10% with Adhearsion. This covers some of it. It gets a little bloodier from there. Like, there are three, count them, three different conferencing apps for Asterisk. They all have different shades of gray. I wouldn't want to rely exclusively on any one of them. They matter for different things, and the right way to run into them is not by seeing a list of them. It's by saying, "This is the thing I want; tell me which one I should use." At least for me, as a Ruby developer, I don't want to have to learn a whole bunch about phones. I want to be able to write code that does cool stuff. At the very top, or bottom, is the API, and it's living on all these things. They need to be configured properly.

So where are we now? Over the past year, people are starting to become aware of Asterisk. And Digium, which is a local company, is starting to become aware of the fact that the way to scale their business, or one of the ways, is not just relying on people who are building PBXs. It's that all of us in this room will do cool stuff, stuff that they never thought of and won't have time to execute on, if and only if the infrastructure is there to create it. So if we have the building blocks, and if it's just that last 10%, so we can all sit there and tinker, a whole bunch of cool stuff is going to happen. And between Digium realizing that and all of us hopefully realizing that, I think there's the potential to see some real phone applications show up in the next year.

FreeSWITCH is evolving. It's a good platform. It's not quite there yet, but they're making progress. Abstractions on abstractions.
So there's Adhearsion, and as of yesterday there's a new application called open diesel that is IVR oriented. It's just another way to shorten things and do more with less code.

Things are really getting popular. The whole world now, including my grandma, knows about 1-800-GOOG-411. And it's really creepy to hear your grandma say, "Have you tried 1-800-GOOG-411?" She shouldn't be mentioning 800 numbers. You'd have to know my grandma. Or 188 do fruit call and they'll tell you. The day she mentions that...

Developers. So the last thing that we need is developers. And we actually built a platform specifically for this. At the very bottom here. Whoops. Did I say that wrong? I don't want any developers at all. Not a one. None of you. All of you, go out. You're all stupid. Not so much. So the final thing we need is developers, and I hope I've shown that you can build a whole bunch of cool stuff with minimal effort.

Myself and my cohort, because we kept running into this problem, ended up building what we call Cloudvox, which takes that 90% and more or less builds it for you, and then you can code the last 10%. So if you want the chance to experiment or tinker, send me an email. We'll hook you up. Basically, we point AGI at you. So you install Adhearsion on your end and, voila, you're making phone calls.

That's almost unreadable, but it'll be up at cloudvox.com/go/hoedown, with links to all the gists and the GitHub repos, as well as instructions that say specifically the five commands that you have to run to get to the point where you can actually do that same Mars Phoenix application.

I was just thinking, have you guys considered advertising on the actual stands that are empty, that contain, you know... Oh, for the housing thing.
Yeah, putting in something saying, you know, "If this is empty, dial this number."

You know, we're trying to figure out what to do with it. That was the original thing that spawned Cloudvox, because I had this problem last summer and said, "Man, I would build something." And then in the process we spent eight months building the infrastructure to get to the point where I could build the application pretty quickly. And now that we're there, we're trying to make it easier for other people to do the same thing. But yeah, if we run with it, that's a great idea.

And Fred Couples, who is a golfer but apparently also knows the business world, said it really well. I'll let you read that. But imagine for a second that, whether it's Nagios or whether it's your own integration application, or maybe it's something you work with that your salespeople might want to call in and check in on, or maybe you have a tracking site and you want to let people call in and record their blood glucose readings, 42, 87, whatever: if it's simple, if it's fast, if it's explainable to a user in an average phone call, you can probably build it in less than an hour, maybe two hours, of effort to get it to proof of concept.

Uh-oh. I'm out of slides. Questions, comments, suggestions?

So can you please summarize what all you need to get going here?

So at the minimum, there's the underpinning, which will either be Asterisk or FreeSWITCH, and I'd recommend Asterisk given the relative stages right now. That can either run on dedicated hardware, so you could download it or, you know, install an RPM or Debian package or what have you. You can get a VM for it, like AsteriskNOW. Or you can use Cloudvox, which is the hosted version but scales way down. That's the underpinnings. On top of that, you need stuff that you can run in that Asterisk container.
So, text-to-speech. There's one called Festival that's free. It's... it's good for what it is, and I'm being kind. And then there's Cepstral, which is what you heard there, which is just good in general, but there is a licensing cost attached to it. That covers the text-to-speech building block. If you move over one, there's conferencing: things like app_conference, app_confcall... there's a third one that's escaping me right now.

Then, moving over another bucket, you'll need to get a carrier. So either a SIP provider or, God forbid... do not ever buy a hardware device. Sorry, Mark. If you can avoid it. I am a huge fan of SIP. I am a huge fan of not having to own FXO, FXS, PRIs. If you can avoid having to own a direct connection to the PSTN and do it through SIP, you will save yourself so much headache getting up and going, and then you can figure out what to do in production. So that's another part of the building blocks. If we keep moving over, then you'll need some DIDs, or phone numbers, with which people can call into your app.

You know, my experience has been exactly the opposite of that. Until we went away from SIP, we could not get our call quality the way it needed to be in our office. You know, we just could not rely on the internet as a reliable means of... I think we were just getting dropped calls, or call quality and all sorts of issues.

For real? With multiple carriers, or trying different carriers?

Yeah, and we were on a fiber link, you know. I mean, I don't know.
It's just... it seems like the phone, the PSTN, has been around for a hundred years longer, and it's a little bit more reliable.

That's a great point. Yeah, if you have quality concerns, certainly. Yeah. Yeah, when it's working well. All right. It's low touch enough to do that. At least in our case, the real pain was having to go through all the building blocks. Like, it's easy enough to go and download the AsteriskNOW VM, but it was all the other stuff, the organized documentation and getting phone numbers and stuff, that was a pain in the rear for us.

I'm interested in your scaling architecture. How are you doing that?

Well, on the Asterisk level... all right. My original goal was to get to the point where we had a hundred thousand people with their own little teeny PBX slices. And I still hope to get to that point. We aren't there yet, obviously, but we scaled it down so far that it's not tied to a VM or a piece of hardware or anything. It is kind of the scale-out of Asterisk. So at the low end, it's a teeny, teeny slice of a system, but it's totally private. Nobody can record your calls. You can associate SIP calls or SIP phones with it just like you regularly would. So it's an app-oriented PBX, but it's also a regular SIP connection. And then at the high end, we do some stuff to do, I guess you'd say, load balancing, but it really boils down to making one phone number able to handle way more calls than any single Asterisk install ever could.

Like, if, say, somebody wanted to do a killer app that did 10,000 concurrent calls, how would you handle that?

Scale out, which is to say multiple little installations. It's probably worth having a conversation if you want.

Good, good.
Let's catch up That's about a good dial plan So in Tricksbox you can point the uh one of your extensions or Like a whole context to adhesion and if you do that You'll be able to control the whole dial plan So really I guess depends what level you want to pass that on to I wouldn't waste your time with Tricksbox I wouldn't I don't have a choice. We've already got it. I'm sorry The truth comes out you heard it here first ladies and gentlemen Other questions thoughts The leader there is a company called lumen box And I guess I'd say for my gut it's a b maybe a b-minus But my standards are pretty high for speech recognition. Um, it gets way better if Obviously I never had to call In that case it's a d So that's someone who's designed for the exact case They're looking for one of two words and they can't pull it off Yeah, so I normally I'd rather go for for digit entry where you can But if you can constrain it to a really small vocabulary. Yes. No and a half dozen other words Then it works just fine But don't expect any kind of of free form text entry or, you know, free form speech For for speed recognition Because lumen box uses things which is the you know, I can get pockets things. It's just a modified version of it They use tarps, but their voice models are so much better than everything else Interesting. I haven't looked at the covers with lumen box. That's correct. Isn't it more? I don't I think he uses space In the copyright notice or whatever Yes A little stuff from from cmu too The lumen box guys a lot of that stuff is you know, the cmu Libraries have been used by a lot of people and they have a bunch of sort of underlines each technology So maybe there's something This makes it really slow This makes this painfully slow. 
Yeah, if you see any of your space as a speech recognition engine, don't bother. Sorry, voice recognition.

Excellent. So these will be up at cloudvox.com/go/hoedown, and by all means, if you have an AGI question or an Adhearsion question, or you want to tinker and you need a place to do it, send me an email. I'd love to see cool stuff come out of this. In fact, I hope that if I show up next year, some of the crowd will have released some cool stuff, or added the 5 percent to an existing application that is the phone functionality. Because we need to get this infrastructure to the point where you don't have to decide, "I'm building a phone application, I'm building a voice application." As a friend of mine, Thomas Howe, says, voice, or phones, is the paprika. It makes everything taste better. So you shouldn't need to decide, "I'm going to go spend the effort." It should just be, "Hey, I'm writing more Ruby," just like I don't have to decide whether I'm going to use ActiveRecord. It just works.

Thank you.