So my name is Jason Goecke. I've got some of my details up there, GitHub and Twitter and my email address, and I'll throw those up again later as well. How many people in here have actually heard of Adhearsion? Okay, a good number. How many have heard of Asterisk? Okay, the same number; I was curious whether that would match.

So I worked closely with Jay Phillips on the Adhearsion project. We actually started a company doing consulting and hosting services around Adhearsion and Asterisk. One of the things we found is that while everything works well once you're up and running, as a gentleman in the OpenVoice presentation mentioned about trying to get FreeSWITCH up and running, the setup is somewhat difficult. We found developers being put off by having to go and set up Asterisk and Adhearsion and things like that. So we created a sandbox: you could download Adhearsion, connect it to a sandbox running on a host of ours, and be up and running.

We've continued to support Adhearsion. In fact, we just had an Adhearsion conference that Jim Freeze came out for, I think last weekend or the weekend before; it was within the last couple of weeks. So it's still an active project that we're actively developing and moving along. And at that conference we agreed to go to version 1.0 shortly: it's running in enough places in production, and it's fairly feature stable, so we're going to push it out there now.

The company we built was acquired, and we've been doing other open source things since. Jay, who some of you may or may not know, is now at Pivotal Labs, living large in San Francisco. And I've taken on the Adhearsion project, and we're also running an open source project called Tropo; another piece of it is Moho. I forgot my laser pointer here, but what we've effectively done is create a cloud platform: instead of setting up your own telephony engine, you use a cloud service. And to keep from locking you in to that cloud service, even though the API itself is proprietary, we've open sourced all the underpinnings. So you have a SIP Servlet engine container (we're getting into a little bit of Java here; don't worry, we'll get to the Ruby soon) that Moho, which is a common library, runs on, and then Tropo, which is the API we expose, sits on top of that. It will run on SailFin as well, so you can take and download this code and install it on SailFin or on Mobicents; Mobicents even has a media server you can run this with. So you can literally take this whole API and run it on your own servers, wherever you want. But as has been mentioned before, doing telecom at scale is not a trivial task.

A little bit about Moho, because it forms the underpinnings. In the telephony world you have lots of different APIs, and no one really got together and designed them as a whole. You have SIP Servlets, which is how you write SIP voice-over-IP applications in Java. Then you have MRCP, which is how you speak to back-end media resources to do things like conferencing, making calls, speech synthesis, and speech recognition.
And what we've done with Moho is create a Java library that takes it up a level, so you can focus on a common object and event model and not have to worry about all those underlying protocols. It bakes in common concepts like conferencing, call queuing, speech synthesis, and speech recognition, like I was mentioning before, and then Tropo sits on top of that.

Now, what Tropo gives you is an open source API that can be consumed from the cloud or on your own servers, and it makes telephony very simple. You can make phone calls with 15 commands, and it's not limited to phone calls: as you can see in the graphic down here, you can do voice, instant messaging, Twitter, and SMS, as the gentleman with Splat mentioned before. And you're really using the same API for all of that multi-channel capability in your applications.

It also does speech recognition. Right now I think it's nine languages in total, and we're in the process of rolling out 20 languages; that's on both speech synthesis and speech recognition when soliciting input. There's speech recognition driven by grammars, and then there's transcription. So you can take a voicemail, for example, feed it into the transcription engine, and get back an email with the text of it. That's more free-form, and it tends not to be as accurate. If anyone's used Google Voice before, you know free-form speech recognition is hard; it's very different when you're doing real-time, grammar-driven speech recognition, because in that case you're giving it a fixed grammar, a fixed set of words that it will be recognizing.

And as I mentioned before, it's a single API. Now, we've created two different interfaces. There's our Web API, which is your traditional REST/JSON interface into our network, so you have your familiar request/response. I do a lot of examples using Sinatra talking to that API; Sinatra is very easy to write web services in, without all the formalities of Rails, of course. What we're effectively doing is POSTing you a JSON session any time a phone call comes in for your application. Within that you get things like the caller ID, where they're calling from, and where they're calling to in terms of the phone number they actually dialed. In the case of an instant message, you see which network they came from, so they can come in from Google Talk or from Yahoo or any of these networks, and you can see where they're coming from and interact with them that way, using the same API. Then it's just a series of resources that you cascade through, sending JSON documents for the different commands. A lot of web developers get this, because you're dealing with REST APIs out in the wild all the time already.

There's actually a Ruby gem out there that I wrote called tropo-webapi-ruby that hides all the JSON (well, it's not actually that complex); it hides the actual API from you and allows you to just deal with it in Ruby. It's meant to be plugged into something like Rails or Sinatra, and we've run it up on Heroku as well as Google App Engine to serve that up.
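To make that concrete, here is a minimal sketch of that request/response flow, assuming the tropo-webapi-ruby gem's Tropo::Generator interface; the exact session keys and method names are from memory, so treat them as assumptions:

    require 'sinatra'
    require 'tropo-webapi-ruby'

    # Tropo POSTs a JSON session document here each time a call or IM arrives.
    post '/greeting.json' do
      session = Tropo::Generator.parse(request.body.read)
      caller  = session[:session][:from][:id]   # caller ID, or the IM handle

      tropo = Tropo::Generator.new
      tropo.say "Hello, #{caller}. Thanks for calling."
      tropo.hangup
      tropo.response                            # JSON document sent back to Tropo
    end

The same handler works whether the session arrived by phone, IM, or SMS, which is the multi-channel point being made above.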
Now, what's interesting, and I'll get more into this later as we do things with Adhearsion and get to some demos, is that we also have a scripting environment. The scripting environment is more similar to what you might do with Google App Engine, whereby we take your script and run it in our network, on our cloud. We're a Java shop at the core, so all of our stuff runs inside a SIP Servlet container, which is Java, and then we expose the various language implementations to run the different scripts: PHP with Quercus, of course Ruby with JRuby, Python with Jython, Groovy, and JavaScript with Rhino. We take those, and everything goes down into the lower-level Java API. There's actually a joke in here somewhere; I think there's a ColdFusion icon in there as well, for some ColdFusion fans that we have.

The idea is that a lot of developers use this to get started quickly, because you simply write a little Ruby script that says answer, say something (speech synthesis), and then hang up, and you're done. You don't have to run any web services, you don't have to figure out the API; you're just off and running at that stage. And this will run on a local install just the same, if you decide to mix in your own SIP Servlet engine as well.

There are other use cases we're finding compelling out there. We had a set of developers who had been trying to do an interactive media billboard in common spaces, where people would dial into the billboard using their mobile phones and interact in that space with other people. The problem is they tried it with other services, and even with our Web API, and the HTTP POST round trips were creating latency you could feel as you interacted with the board. So with scripting they took Ruby (I helped them through it at that stage) and opened a persistent socket from our network to their back end, streaming the digits down in real time without any HTTP overhead or even opening new sockets. So there are interesting things you can start doing when your scripts run in the cloud, close to where the telephony is actually happening.

So, quickly, those are the different languages we support. And what's interesting here is that this is a good case study in JRuby and the power JRuby brings to the Ruby community. I use JRuby for everything I do, because it has native threads, it tends to be a solid implementation, and it's moving forward fast. But we've really taken it a step further: what we're effectively doing is exposing our underlying Java API, and then we've written a series of shims that sit on top of it for the various languages, to make the API feel like that language. We haven't done a perfect job, but if you want to do speech recognition, you simply do an ask with what you're asking, the timeout, and then the choices they have: they can say yes or no, or hit one or two, for example. What it's really doing is driving down into the Java API, so we don't keep you from dropping to Java if you know what you're doing, but from a Ruby perspective you don't have to know anything about the Java; you don't see it or feel it when you're using the API.
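A minimal sketch of that scripting style, assuming Tropo's Ruby scripting verbs (answer, say, ask, hangup); the option names here are from memory, so treat them as assumptions:

    # Runs inside Tropo's hosted scripting engine; no web server required.
    answer

    say 'Welcome to the demo.'    # rendered with speech synthesis

    result = ask 'Do you want to continue?',
                 { :choices  => 'yes(yes,1), no(no,2)',  # speak it or press a key
                   :timeout  => 10,
                   :attempts => 3 }

    say "You chose #{result.value}."
    hangup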
Now, something we've been working on, because what we're trying to do is make it easy to select whatever development environment you want, for a couple of reasons: one, whatever you're comfortable in is what you should be developing in, but also making sure you're not getting locked in to anyone's particular services, right? So if you want to get started, we've created a script that runs on Tropo so that Tropo actually looks like an Asterisk server to Adhearsion. So now you can bring up Adhearsion, go into your dialplan.rb (I'll actually demo this; I'm going to try a live demo and we'll see how it goes), and interact with Tropo and all of its features. Then, once you're ready to deploy, you can deploy on Tropo, or you can take it and deploy on Asterisk when you want to get to that level. So it's really the true spirit of open source: being able to move to whatever makes the most sense for your business model or your willingness to support, maintain, or build.

All right, so let's go into the demos. I've got a few of them. The first one I'll do is the Adhearsion one. So, how many have actually used Adhearsion before? Okay, a smaller number, four or five people. So this is the Adhearsion console. For a little more background, Adhearsion is really a Ruby framework, inspired by Rails. Jay Phillips started developing it in 2006, and I was one of the early real users and contributors to the project. In the Asterisk world, with Asterisk being the most popular open source telephony engine and FreeSWITCH, which OpenVoice has been porting to, probably the second most popular, there are really just two frameworks with the kind of features Adhearsion has: Adhearsion and Asterisk-Java. What they allow for is dealing with both of Asterisk's interfaces. There's the synchronous one, called the Asterisk Gateway Interface, or AGI, which is what you're seeing here: Adhearsion comes up and serves a TCP socket, in this case on 11276, for Asterisk to connect to. And then there's the asynchronous interface, called the Asterisk Manager Interface, or AMI.

For this first iteration of Tropo support, it's really the synchronous dialplan interface, a flavor of an Asterisk script. Here's the script, excuse me. Can you see that all right? So what you have is the concept of a context, which is really a block of code that gets called when a call comes in: a call comes into Asterisk, it finds your server and hits a particular context, which is a code block. Then you say something like answer, to answer the phone, because maybe you don't want to answer; maybe you want to redirect the call somewhere after looking up the caller ID and deciding to do something different with it. And then you can play. Now, what you're seeing here is the Tropo implementation. In Asterisk, when you play, you can really only play back an audio file that's pre-recorded on your server. In this case, since it's actually talking to Tropo, you can have speech synthesis, or you can pass it a URL, which plays an audio file as well.
So what a lot of people do while developing is use the speech synthesis, and then if you need a professional recording with a person, you go back and fill all that in as you go. Then we show the ability to ask for information. Since Asterisk doesn't have built-in speech recognition, we're using Adhearsion's capability to call a custom method, because in Asterisk anybody can develop a module and plug it in, and we may not, within the Adhearsion development community, know that that module exists; so there's a lower-level way to execute it. Since there is no ask verb within Asterisk, because there is no built-in speech recognition, I'm doing that here and passing in a series of parameters. The prompt: what you want to ask for; in this case, the zip code. The choices: since we're asking for a postal code, we ask for up to five digits, and those may either be spoken (in this case in English, because I believe we're using the English recognizer) or touch-toned in on the keypad of the phone. The number of times to repeat, in case it doesn't recognize what they're saying, say they end up saying banana instead of a series of digits. The terminator, so they can hit hash to finish quicker and return something. Then, of course, the timeout. In this case we're converting all of that to a JSON string and sending it over, because the Asterisk AGI is really a very simple TCP-based protocol that passes strings back and forth, line by line. Then I log the result that comes back, so you can see what was recognized, play back what's actually in there, and say thank you. So this is a pure Adhearsion app, with no change to the underlying Adhearsion framework, supporting that interface.

And the idea behind Adhearsion: right now it's Asterisk-based. We've started porting it to FreeSWITCH as well; we're porting it to Tropo, in this case unchanged; and then to other platforms, such as Moho, that would give you the asynchronous capabilities like you get from Asterisk's manager interface.

So I'm going to try and dial this. Let's see how it goes. I've got to use this phone, because the latency here to do VoIP is not very good, and hopefully everyone can hear it. [The demo runs.] So there you may see what it's actually doing on the Adhearsion side: it thinks it's talking to Asterisk. It's got no idea, really, right? So you can take this, with some very small modifications, and run it on your own Asterisk server at that point. Now, in that case I could have said 94070, but I'm never sure how well that will be picked up at a conference. And to do different languages, as I said, we have eight or nine different languages, and you just pass the different voice that represents the language, with male or female voices on that side.

And I'll give you a demo of that right now as well. So that was the one live demo; the other ones I've canned, to be a little safe. The next one is multilingual speech synthesis. Let me do the demo, and then I'll walk you through the code quickly, what it's doing and how it works; I find that works better. And I don't think we have... Do we have audio here? Yes. I'll just use the microphone.
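For reference, the dialplan just demoed looks roughly like this. It is a sketch assuming Adhearsion's classic dialplan.rb DSL; exactly how the Tropo shim carries the ask options across AGI as a JSON string is an assumption here:

    # dialplan.rb: each top-level block is a context that Asterisk
    # (or Tropo pretending to be Asterisk) routes inbound calls to.
    require 'json'

    tropo_demo {
      answer
      play 'Welcome to the demo.'    # speech synthesis when backed by Tropo

      options = { :choices    => '[5 DIGITS]',   # spoken or touch-toned
                  :attempts   => 3,              # retries on no-match
                  :terminator => '#',            # hash ends input early
                  :timeout    => 10 }.to_json

      result = execute 'ask', 'What is your zip code?', options
      ahn_log.info "Recognized: #{result.inspect}"   # log what came back
      play 'Thank you!'
      hangup
    }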
[A brief pause while the audio for the canned demo gets sorted out; the live demo went fine, but the canned one's video has no sound at the beginning.]

So I actually gave this talk in Kraków at Euruko, which was back in June, I think. That's why it has Polish in there as the featured language. The way this was done was with our scripting platform; once again, keeping it easy, no pieces running outside, things like that. I wrote a Ruby script where I created a default voice, which is English; in between, when it announces which language it's speaking, it keeps that in English. Then I created a hash of all the different language pairs that we actually have within the platform. Then I created the text. I could have said Austin is a beautiful city, I guess, but I didn't want to have to re-can my demo. And I'm using the Google Translate API, just to show that you can go out and fetch your data from wherever you want, passing it the text I actually want translated.

Then down here, the first thing I do is say the text in English. I sleep two seconds there because we support Skype calling, so you can make a free Skype call to your application as well as SIP or the regular PSTN, but Skype takes a little while to nail up the audio channel, so you wait a bit for the Skype world. You could do that smarter by checking whether the incoming channel is Skype or not. Then you say, passing it the voice, in this case English, and then I iterate through, passing a block with each of the voices, fetching the translation from Google with each language pair (I think now you could actually just let it detect the source language), and then playing back each of those. So what this shows is how quickly you can get up and running with a multilingual application, and I think it actually sounded pretty good in terms of the various languages.

I'm going to try and plug it in for the next one. What I'm showing here is dialing in via Skype, and since we're doing things server-side, we actually have a debugger that shows you, server-side, what's happening within the API as well. "Welcome to the conference. What is your favorite programming language?" "So you like dynamically typed languages like Ruby, do you? Enjoy the rest of the conference." So that's showing the ability to speak back the response, as opposed to entering it with touch-tone on the phone. And in this case: "Welcome to the conference. What is your favorite programming language?" "Java." "So you like statically typed languages like Java, do you?" That one was running over Skype, so you could hear it pick up some artifacts once in a while. And here's the script that ran that, right?
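That script is roughly the following. This is a sketch assuming Tropo's Ruby scripting API and its simple-grammar syntax for grouping utterances under a value; the result accessors follow what is described next:

    answer
    sleep 2    # give Skype a moment to nail up the audio channel

    say 'Welcome to the conference.'

    result = ask 'What is your favorite programming language?',
                 { :choices => 'dynamically(ruby, jruby, python), ' +
                               'statically(java, c)',
                   :timeout => 10 }

    # result.value is the matched group (dynamically or statically);
    # result.choice.utterance is the word the caller actually said.
    say "So you like #{result.value} typed languages " +
        "like #{result.choice.utterance}, do you?"

    say 'Enjoy the rest of the conference.'
    hangup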
So, relatively straightforward once again; you have an application up and running very quickly. You come in and say welcome to the conference, then you ask what the favorite language is. What we're actually showing here is the ability to group different words together under a particular context. So for "dynamically" you could have said JRuby, Ruby, or Python, while "statically" groups C and Java. Those get returned if they're recognized: the result value is the actual context that was matched, so "dynamically" or "statically", and the result choice utterance is what they actually said within that context. So once again, this is highly accurate compared to transcription, because with transcription you're not providing that definition of what you're looking for; you're trying to recognize anything in the world, with any accent, on that side.

So that's what I wanted to show. Any questions? There's a lot to digest; I could go on for a lot longer.

"So what speech recognition engine are you using?" We actually have our own speech recognition engine that we've done for English and Spanish, and then a series of third-party engines that we use (just while I'm bringing up the other demo here) for all the various languages, because we're never going to be able to develop all the different languages out there; there's a whole series of vendors we use there that I could talk to you about as well. But for English and Spanish, which are the two most common, it's actually our own speech recognition engine, and speech synthesis as well. Now, you can get open source ones, like Flite for speech synthesis and Sphinx for speech recognition, and they both do work, but they take tuning to sound good. And there are techniques we support, like SSML, which is the Speech Synthesis Markup Language, where you can actually control in detail how fast words are spoken, the inflections you want, things like that, and then GRXML on the speech recognition side.

There's another question back here: which IM platforms do we support? We support Google Talk and any XMPP network. AIM, MSN, Yahoo, XMPP, and direct Google Talk, where you have a GTalk ID you want to connect to.

"What's your business model for me to use these services?" There's a few pieces. If you take Tropo and Moho, you can run them on Mobicents, for example, and we're not going to make anything there; we're just trying to get people developing voice applications. If there's more people developing voice applications, the bigger the addressable universe of developers we have out there, and we get contributions back from the community. On that side, we're actually working with a few carriers that are collaborating on some of the layers and extending those layers, and filling in the various back-end provisioning systems and things like that. Then we have a commercial download, because we have our own application server and our own media server that you can download, and that's licensed. And then we have our cloud API, which is charged on a per-minute basis, prepaid, 3 cents a minute, and then depending on what things you stack on top of that.
Our developer program for the cloud is completely free. Our idea is we'd rather have a thousand people developing all sorts of innovative applications for free, in the hopes that two or three go big and then move on to our production platform. So we really want to incubate our developers' businesses: free phone numbers for as long as you're developing, free usage inbound and outbound, SMS, et cetera, on that side. And we do offer international numbers, but those are at an extra charge, as is international dialing, because of course there are real costs involved there. So there are various ways we do it. But the idea on the open source side, and the reason we're doing that, is we come from the world of VoiceXML and CCXML, where the way you avoided vendor lock-in was through an open standard at the W3C. A lot of these things are proprietary and moving too fast for that; there is no standard, so you avoid lock-in through open source instead.

Yes, back there. "How do you differentiate yourselves from, say, Twilio or PhoneTag? It's hard for me to understand the differences." We actually use PhoneTag for transcription, for example; we're not focused on the transcription side of the business. They have an API that we actually plug into, so they're a partner of ours. They don't do the other voice applications like we're doing here, all the self-service things and the APIs, et cetera, so they're the transcription guys. And I believe some of our competitors use them as well, though most of our competitors aren't actually using them directly.

Comparing to other people, there are several players out there. We have our enterprise-focused business, where we run about 30 million minutes a month on our platform, so we have a lot of scale, and Tropo is using that same back end. The competitors we have there are people like Tellme, which was acquired by Microsoft four years ago, maybe, and West Interactive, et cetera, that provide the more enterprise-focused stuff; I won't go too much into detail on that. On the Tropo side, like you mentioned, Twilio: what we have is one API for all of your channels. None of our competitors have that, so literally the exact same code can work on SMS, IM, or voice. Then we have speech recognition built into the platform, which the other competitors don't have; they only have DTMF input on the simpler APIs, so you can get input, but it's always through the keypad. There is no speech recognition capability except on Tropo. And then, of course, the whole multilingual capability as well: it's not just Spanish and English, it's nine different languages, and we're working on 11 more. And if you want to go into more detail, we can on that side, but those are some of the broad-stroke differences. And then, of course, the two different programming models, and now kind of a third one, as I've talked about, with Adhearsion as well.

So, any other questions? "Are all three programming models peers? In other words, if I go Web API, I'm not losing something from the scripting environment?" Correct. They all share the same verbs, the same underlying capabilities, just different models for getting there. In fact, the Web API is actually just a Ruby script running on our scripting engine, and the AGI support we've done for Asterisk is just a Ruby script that I've written that runs on our scripting engine as well.
So it's really exposing the same underlying capabilities on that side. All right. Well, thank you. Appreciate it.