The Italian voice mafia of the day. There's a bunch of us. Yeah, there's a bunch of you, four at least. So tell us about speech recognition, Adhearsion, whatever, all the things.
So, when I started preparing my talk for FOSDEM, I was thinking about building something that was not the usual. We usually work with various APIs, most of them commercial, stuff like IBM Watson and Google Voice and things like that. And I decided: what if I start my talk by building an application that's based only on open source technology, including the more advanced stuff such as machine learning?
So who am I first? My name is Luca Pradovera, I'm the new lead at Mojo Lingo, which is a company that's been managing the Adhearsion project and other stuff, and I'm an Adhearsion contributor and I will tell you all about Adhearsion today. And I've been playing with phones since I was eight. My father is also a telephone engineer, so it runs in the family.
So, demo first. Cool, someone call James Body while I pull up the demo, please. So, a warning: the demo doesn't actually work because we have no audio. What I've been trying to do here is get a call into a WebRTC client, which was the part that didn't work, essentially, and which I touch on in my presentation, I did a slide on this for after. Then get the call through speech recognition using PocketSphinx, so it's free and open source software, running through a free platform called Rasa NLU that does the actual bot interpretation, kind of like wit.ai and that kind of stuff, and then spit out the result. What I did here... "I need a Chinese restaurant." I'm just piping it to the fancy browser, and essentially what we're doing is that we get a Chinese restaurant out of that. This looks simple. It is essentially simple; it's also very much not polished, and it's horribly complicated.
So, what just happened? This was WebRTC going to FreeSWITCH. Check the disclaimer: the demo might not actually contain WebRTC.
FreeSWITCH is sending the call to Adhearsion for control. Adhearsion is using the result from CMU Sphinx to go through Rasa NLU and ask Google for a restaurant. Flite is what is speaking the voice. Flite is an open source TTS engine that's not too bad. Actually, that's one of the things I found out by building this application: all of these products are surprisingly good, and we're skipping over them and giving money and time to big corporations when we should probably be working with some of the people in these projects. That's the important message I got from building this one. And we're using HTTP just to go through. And the disclaimer says no keyboards have been harmed during the preparation of this demo. That's not actually true, I broke one just because I was so angry.
So, moving parts. As I said, we have FreeSWITCH with mod_verto. Adhearsion, which is the main control layer. PocketSphinx and Flite are the ear and the voice, and then Rasa NLU.
So, everybody knows what FreeSWITCH is by now. There have been like three presentations, and Giovanni is better than me at explaining what FreeSWITCH is. It's a switching platform, and you can use it for a variety of things. Very good modularity, it's easy to turn off features you don't want, and it's got very good WebRTC support through mod_verto. I'm talking about FreeSWITCH just because I used FreeSWITCH to build this particular demo. Asterisk would be as good. They don't have mod_verto, they just use SIP over WebRTC, and right now the two are pretty much feature equal. Depending on exactly what you need there might be some differences, but I don't see anything that's really relevant there.
The ear and the voice, as I said. The ear is PocketSphinx. PocketSphinx could be tuned for better results, for starters. This setup really only understands a few hundred words, just because I'm running with a stock grammar. There's a larger grammar, but I didn't have time to install it. It seems like it worked pretty well anyway.
And Flite gives you good TTS for the price, which is free. The voice is a bit robotic, but then again, it does speak pretty well.
The interesting thing I discovered when working on this project was this library, which I didn't even know about, which is Rasa NLU. This is a machine learning and natural language processing library that aims at providing a service similar to wit.ai, LUIS, or api.ai, so essentially a conversational interpretation of text. You ask it a question, and on the back end you have built a tree of entities and phrases it understands, and it will give you back an intent and subjects. What's an intent? The intent is what the person wanted to do or know, and the subject is what the person wanted to do or know about. So "I need a Chinese restaurant" has an intent of restaurant search, in my configured example, and an object of Chinese, because I said I want a Chinese restaurant. That's then used to do a search. This is a very, very interesting project, also because it's almost batteries included. It didn't take long to set up, and the time you spend setting up your model can be reused on any of the other platforms, just because they're compatible.
So what did I learn building the app? Principally one thing: we need a better way to set up FreeSWITCH or Asterisk for local WebRTC development. Working with the various security requirements, especially the SSL certificates, is a pain that needs to be fixed, and I'm probably going to start figuring out how to fix it from Monday on.
PocketSphinx is not as bad as the reputation it has. PocketSphinx is considered a toy. That is wrong, essentially wrong. It does most of what you need in a simple application, including interpreting simple grammars, so you can ask yes/no questions. You're generally asking that kind of question in a voice application anyway. You don't want people to tell you the story of their lives and then interpret that. You don't really care.
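As a rough illustration of the intent and subject split described above, here is a minimal Python sketch. It is not Rasa NLU's API or its model: the `parse` function, the keyword tables, and the `restaurant_search` and `cuisine` names are hypothetical stand-ins, and a real NLU engine would use a trained model rather than keyword matching. It only shows the shape of the result an application would then act on.

```python
# Toy stand-in for an NLU service: keyword matching instead of a trained model.
# All names here (parse, INTENT_KEYWORDS, CUISINES) are hypothetical.

INTENT_KEYWORDS = {
    "restaurant_search": ("restaurant", "eat", "food"),
    "goodbye": ("bye", "goodbye"),
}
CUISINES = ("chinese", "italian", "indian", "mexican")


def parse(text):
    """Return the intent (what the caller wants) and entities (about what)."""
    words = text.lower().replace("?", "").replace(".", "").split()
    intent = None
    for name, keywords in INTENT_KEYWORDS.items():
        if any(word in keywords for word in words):
            intent = name
            break
    # Every recognised cuisine word becomes one entity in the result.
    entities = [{"entity": "cuisine", "value": w} for w in words if w in CUISINES]
    return {"text": text, "intent": intent, "entities": entities}


result = parse("I need a Chinese restaurant")
print(result["intent"], result["entities"])
```

The calling application would branch on the intent and pass the entity values into its own search, which is the step the demo performs after recognition.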
If they need a counselor, they call a person, not a bot.
There's value in running your own brain. Of course, there's an economic value. You're not spending money at all, and the setup time is so tiny that it's probably worth looking into using something you host yourself. You're contributing back to open source and you're not giving things to Facebook or Google or whatever. They have enough money. They don't need ours.
The other important part, very important part, is Adhearsion itself. All these services are disconnected, in a sense, from each other. FreeSWITCH records the audio. PocketSphinx interprets it. The NLU gives you the conversational representation of what the person said. But you need something to run the call: pick up, answer, do the recordings, shuttle messages around between the various parts. And that's where Adhearsion comes into play.
So first of all, has anybody here heard about Adhearsion before? Woo! That's like 300% the usual amount, which is one. That's bad. Adhearsion, it's great. It's a Ruby voice application framework. It provides third-party call control logic to telephony engines, which means we don't mess with the actual media flow. We just handle things like picking up the call, transferring, answering, recording, playing audio, collecting input, using ASR, et cetera. It connects to the two main open source platforms, which are FreeSWITCH and Asterisk, using two different mechanisms. One is Rayo for FreeSWITCH and one is AMI for Asterisk. We're not on ARI; I know we should be. Version 2 is stable and it's been stable for a while; still, we just released an update. And version 3 has been at RC1 for quite a while because there's one bug we need to fix, and we need a sponsor for that, essentially, but it will come out soon enough. And it's backed by a foundation, so the project has guaranteed long-term viability.
So we have a few new things in Adhearsion 3 which make it even easier to use as a glue for building your own complicated applications, whether they're voice-based or other types. FreeSWITCH support is Rayo only. It doesn't use the event socket anymore, in case anybody knows what an event socket is. Asterisk 11 plus is required, which is, well, I mean, right now it's okay, I think. We streamlined the internals a lot. It's 30% faster on benchmarks. And it's got a built-in HTTP server, so if you need to build a very simple dashboard for, for example, the PBX, the internal telephone exchange for your office, you can have it straight on the server without any other service. And it's got native internationalization support, which means you can easily play and send translated text. It's important just because we didn't have it. I know it's a given these days, but it wasn't there before.
What can you do? So there are plugins. Plugins give you voicemail. Cudo TTS. What's Cudo TTS? It's a plugin I built that will play dates, numbers, and ordinal numbers, so first, second, third, just starting from a string, with audio files it carries itself. So you can even run without any kind of TTS and it will still try and play audio for people. There's platform-specific functionality, again mostly for Asterisk. Asterisk has very good primitive support, native support, while FreeSWITCH is only supported through Rayo. We have clustering. Logging has been unified, so it's easier to deploy to Heroku and whatnot. These are all small changes by and large, but generally speaking it means that you can just take an Adhearsion application, deploy it to Heroku, point it at your FreeSWITCH server, and you're done.
So every phone call is an actor. Probably some of you have heard about the actor model. Some people haven't. Has anybody here used Erlang or Elixir? Same people from before. Yeah, so anyway, the idea is that every call is isolated. If a call crashes, the process doesn't go down.
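The isolation idea described above, where one crashing call leaves every other call running, can be sketched in a few lines of Python. This is an illustration under assumptions, not how Adhearsion is implemented: the `handle_call` and `run_call` names and the simulated failure are hypothetical, and plain threads stand in for actors.

```python
import threading

# Hypothetical sketch: each call is handled in its own thread, and a failure
# in one handler is caught and recorded without affecting the others.

results = {}
results_lock = threading.Lock()


def handle_call(call_id):
    """Pretend call logic; call 2 fails on purpose."""
    if call_id == 2:
        raise RuntimeError("caller hung up mid-prompt")
    return "completed"


def run_call(call_id):
    """Wrap the handler so a crash ends only this call."""
    try:
        outcome = handle_call(call_id)
    except Exception as error:
        outcome = "crashed: {}".format(error)
    with results_lock:
        results[call_id] = outcome


threads = [threading.Thread(target=run_call, args=(n,)) for n in (1, 2, 3)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

for call_id in sorted(results):
    print(call_id, results[call_id])
```

Calls 1 and 3 finish normally even though call 2 raises, which is the property the actor model is meant to guarantee for the whole system.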
So we're falling back on what Giovanni mentioned earlier. If a call goes down, it's not a big deal. If someone gets your system, it doesn't go down. So the goal is to have something that always runs. Messages are passed around as events, and each call runs its handling logic in a separate thread, which is a technical way to say that if the call dies, only the call dies. The rest stays up.
So as I said, controllers group up features. So it's not unlike Rails or any MVC application, really. There's a controller for everything you want people to do. The demo I ran is one controller that asks the person for the restaurant, checks if there's valid input, and then looks for the restaurant and sends the URL back to the browser. Routing, kind of like in a web app, controls which call goes to which controller using, I don't know, it could be a phone number, it could be the time of day. So the usual time-of-day thing for when your office is open or closed. But it's far more flexible than using the dial plan because this is a normal Ruby application. You can access anything you want inside the application. There's an event handler to handle async messages. And it's generally based on Celluloid, which makes it behave similarly to an Erlang application. And there are DSLs for all common operations, so you can really replace your dial plan.
Quickly, on the Rayo protocol. Rayo is an XMPP extension that we use to communicate with FreeSWITCH. It's interesting because it's got built-in load balancing, so it scales very well. Actually, Adhearsion scales better on FreeSWITCH than on Asterisk, thanks to Rayo. As a side effect, every Adhearsion node also has its own XMPP address, which you can use to control the instance or send out events or participate in chats. So that's not really restricted to FreeSWITCH. It will just connect to FreeSWITCH using Rayo, but it will also have... okay, wait, eight. Perfect.
So Adhearsion on Asterisk has no Rayo support. There's no Rayo on Asterisk.
It doesn't really matter. It connects via AMI. We're not using ARI yet. Sorry, we'll get there. There's much better native command support. So if you have a platform that's based on a very complicated Asterisk install with a giant dial plan with thousands of lines and contexts and scripts and whatnot, you're better served moving it to Adhearsion right now, because it will be at most one tenth of the lines of code in Ruby. It's slightly easier to get started, because of configuration and a couple of other reasons. So if you're checking out Adhearsion, try Asterisk first.
What can you do? As I said: calls, conferences, media. Drive various types of speech recognition, including commercial engines that use special grammars in XML formats. Build very complex IVRs, starting from a simple "press one if you need to do this or press two to do that". We have IVRs recording anything from flight reservations to people placing complaints because they didn't receive their newspaper today, to really anything else. And you can connect to a database.
I'll speed up a little just to get to a few other interesting slides. It deploys on any Ruby flavor. It's usually one-to-one with FreeSWITCH or Asterisk. With FreeSWITCH you can now scale. It's sort of decoupled. You can have multiple nodes with multiple FreeSWITCH nodes. It's easier to scale, provided you have a load balancer. So it makes everything easier to scale.
Quickly, this is an XML dial plan for FreeSWITCH. And it's like a board with a nail through it. It's good, but it will only work for some things, and it's difficult to adapt. The controller code is not... Well, you can't read it, of course, but you can get the slides. The controller code is not necessarily shorter, so it doesn't look like it's any simpler. But keep in mind that it's normal Ruby code. You can literally access anything a Ruby application can, so anything from a command line application to APIs to whatever you need. We used this to push events to the browser, as you saw.
It was just connecting to a WebSocket and pushing events to the browser right away, without any intermediate process.
Who's using Adhearsion? Just to give you a few examples. These guys have an interesting application based on Asterisk, if I'm not wrong. So you need a doctor. You don't know who to call. You're new in the area. They give you a number. You call that number. They put you through to a doctor who is on call or available, depending on a complex routing table. That's an application that would be a pain to build in just a dial plan, or just using HTTP curl, complicated stuff like that, because you would have to adapt your logic to how Asterisk and FreeSWITCH want you to build it. Whereas here, what you do is just keep your logic in Adhearsion. All you do is a simple SQL query. Manipulate whatever you need to. Generate specific messages for the different doctors and just handle the call in an easier and more integrated way. Again, we're not doing anything with the media flows or signaling or whatnot. This is just call control.
LiveConnect. This is the newest project we have. This is WebRTC based. It uses FreeSWITCH to broadcast operating room surgery. Why is this SIP based? Because we got a very strange piece of hardware that's FDA approved in the U.S., and we could only use that. It's a very weird thing because it connects through SIP to send out video. I don't know who came up with that, but we had to use it, because changing it meant millions of dollars of investment and whatnot. So what we do is that the client calls the FreeSWITCH server and we just replicate the call over to WebRTC. People call in. In this case, the challenge was managing security. You have to be very sure that you, as a student, are only allowed to join a certain surgery during a certain time and not at other times, because of all the sensitive data and stuff.
So again, in this case, Adhearsion allows you to very easily manage access to the conferences by going through the database. We actually check with, I think it's a, I don't remember which big CRM application is handling the actual access, but the university that's running this has their own platform, and as part of checking whether the user is allowed to get into a conference we check with that API.
Power Home Remodeling. This is just a big customer of ours. Well, they're a customer of ours that are also big users of Adhearsion. They're very big: they sell half a billion dollars of windows every year, which is a huge amount of windows. Not Windows, computer Windows; literally glass windows. They were built, essentially, out of a call center. So the call center has always been very central to their strategy, and I'm talking about them more for the scaling properties that Adhearsion has than for the actual functionality. In this particular case, Adhearsion is not doing much more than being a dialer. So they have a list of people, they call those people and connect them to agents to have them confirm appointments or take a moment. It's very simple from an operating standpoint, but they have 400 call center operators working 24-7, and they have only one Asterisk box and one Adhearsion node, with another cold spare standing by. So yeah, that's not ideal, but it works. We can run a lot of calls. It doesn't add any noticeable overhead. You're going to hit your Asterisk or FreeSWITCH single-box maximum number of calls before you hit the Ruby maximum number of calls in this case.
Well, there are a few others. As I said, the publishing system, where they do complaints and stuff. We have one MVNO. Oh, I'd have you guess which one, but no one would know which one. It's RingPlus in the US, just because there's nobody here from there, I think. And there's an interesting free project we have, which is a cultural mediator network.
So people who speak a language and have done a course with the Italian government say, "I'm available, I speak Hindi, and I'm available to help out people who are in distress and only speak that language." And there's a routing system, so the police can call a number and it will give them a person who speaks that language. We have cases, though, in which the nurse or policeman didn't know which language the person was speaking. In that case we route them to some people who speak a lot of languages and will try to understand what language that is.
So, thank you. I had to speed through a little. There are plenty of other things I could show you, but mainly, Adhearsion removes a lot of the complexity of building applications. And we can fix the WebRTC certificate, but that's something I'll fix later. If there are any questions I'll be glad to answer. It should be about three minutes. Yes. Great. I hope you liked it. All right.