I think we're going to start a little differently today: with a demo, just to set up the context. Hopefully the demo gods are on our side. Based on that, we'll explore how we built it, and which open source technologies and APIs we used to realize the demo. Let's get started. I have a Raspberry Pi over here with a small mic, and what we're trying to mimic is a small speech-based home automation setup where you can control your home devices by voice. The Raspberry Pi acts as a cheap microphone, we bought it on Amazon, and the audio runs through a few cloud services that help us achieve that goal. Let's give it a try. "Hey Watson, turn on the light." Yeah, that's the gist. "Hey Watson, turn off the light." So that's what we're trying to show; at the end we'll do a few more demos using different media channels.

My name is Prashant Khanal, and I have my colleague Kalonji here. We're software engineers and developer advocates with the IBM Watson and Cloud team. Let's get started and talk about what we used and how we realized this demo with some SaaS products and open APIs.

This is what the overall architecture looks like. It looks complicated, but it's really not. On the far left, we're trying to mimic speech-controlled devices with a mic interface, whether that's a Raspberry Pi, your mobile phone, an Arduino, or some other embedded board such as a C.H.I.P. For this demo we used a Raspberry Pi. The speech I just demonstrated is recorded and sent to a serverless platform; we'll talk more about why we chose a serverless platform and how it helped us get things going fast. We have a set of serverless actions, which are just our code hosted on the serverless platform, and we compose multiple actions together to achieve our goal. The first action takes in the audio and transcribes it into text using a speech-to-text service. The output of that action is fed into another action, which runs the text through a natural language classifier to get the intent from the text; we'll talk in detail about the classifier we used and how we trained it. From there, a small serverless action uses a NoSQL DB to map the device name to a device ID and controls the device through an MQTT broker, in this case the Watson IoT Platform. We used an IoT gateway, just a Raspberry Pi, to provide IP connectivity to all of these small devices; in this example, a light bulb. That's the overall architecture of how we achieved this demo. In the next 15 to 20 minutes we're going to talk in detail about the individual components of this architecture. A rough sketch of that last device-control step is below.
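To make the tail end of the pipeline concrete, here is a hedged sketch of what that device-control action might look like: it maps the classified device name to a device ID in a NoSQL DB and publishes a command over MQTT. Every URL, credential, topic layout, and field name below is an illustrative assumption, not the exact code we ran.

```python
# Hypothetical sketch of the device-control step: look up the device ID in a
# NoSQL DB, then publish a command to it over MQTT (Watson IoT-style topic).
# All URLs, credentials, and field names are illustrative assumptions.
import json
import requests
import paho.mqtt.client as mqtt

DB_URL = "https://ACCOUNT.cloudant.com/devices"        # assumed device registry
DB_AUTH = ("DB_USERNAME", "DB_PASSWORD")               # assumed credentials
IOT_ORG = "myorg6"                                     # assumed IoT org ID
IOT_AUTH = ("API_KEY", "AUTH_TOKEN")                   # assumed app credentials

def main(params):
    # params are assumed to carry the classifier output, e.g.
    # {"device": "light", "command": "on"}
    device = params.get("device", "light")
    command = params.get("command", "off")

    # Map the spoken device name to a registered device type and ID.
    doc = requests.get(f"{DB_URL}/{device}", auth=DB_AUTH).json()
    device_type, device_id = doc["deviceType"], doc["deviceId"]

    # Publish the command through the MQTT broker (Watson IoT conventions).
    client = mqtt.Client(client_id=f"a:{IOT_ORG}:voice-control")
    client.username_pw_set(*IOT_AUTH)
    client.connect(f"{IOT_ORG}.messaging.internetofthings.ibmcloud.com", 1883)
    topic = f"iot-2/type/{device_type}/id/{device_id}/cmd/switch/fmt/json"
    client.publish(topic, json.dumps({"state": command}))
    client.disconnect()
    return {"device": device_id, "state": command}
```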
We're going to start with the serverless architecture: which serverless platform we used, what serverless is, and how we composed multiple actions together. For that, I'm going to hand the mic over to Kalonji.

Good morning, everyone. My name is Kalonji Bankole. I'm going to start by setting the stage for how serverless architectures can make day-to-day developer operations a lot easier. Say you're a developer who comes up with a perfect app idea. After solidifying the idea, you spend a good amount of time writing the application code and finally get a prototype up and running. Of course, if you want to share the prototype and let others use the application and give you feedback, you need to set up a server and put your code there. So you put the code on a server and send some friends the link to the beta. After the first week, issues start to happen: your hard drive might crash, and once you fix that, a new Linux vulnerability might come out. As the developer, you end up spending the majority of your time firefighting and fixing DevOps issues instead of focusing on the code itself, responding to feedback, adding features, and so on.

The serverless model frees users from manually maintaining day-to-day operations such as hardware crashes, scaling, updates, and networking issues, so they can focus on their code base. The way serverless works is that the developer writes a series of stateless, decoupled functions and uploads them to a serverless engine. Once they're uploaded, a function can be called either by an HTTP request or by a change in a service; these services can be almost anything, such as database changes, social media events, and so on. The term "serverless" isn't entirely accurate, because the servers still exist and need to be maintained by somebody, so many of my colleagues prefer the term "functions as a service." Again, the idea is simply that the user doesn't have to deal with maintaining servers and keeping them up.

There are similar offerings that have been around for a few years called platforms as a service; you may have heard of Cloud Foundry or Heroku. These allow for something similar: a user sends off their entire code base, and the platform as a service handles all the dependencies, scaling, and hosting. The primary difference is that once a platform as a service deploys the application, it's always up, sitting idle waiting for requests, and essentially charging your account for that uptime. Serverless instead lets us spin up portions of the application on demand in ephemeral containers, which fits the microservices approach, and you only get charged for how long your code actually runs. Each piece of the application is spun up in an ephemeral container that runs the code, sends the response, and is deleted afterwards. IBM is not the only company offering a serverless platform; there's also AWS Lambda, Google Cloud Functions, and Azure Functions. But ours, OpenWhisk, is currently the only open source one.
So you can actually spin it up and run it at home or in your own data center. There are essentially four pieces to OpenWhisk: triggers, actions, rules, and packages. Let me explain each one. A trigger defines which events OpenWhisk should pay attention to, and it can be essentially anything: a webhook, updates or changes to a database, incoming tweets to your social media account or changes to a hashtag, data coming in from IoT devices, messages arriving on a certain MQTT channel, and so on. You can also create custom triggers as needed.

The main thing the developer will want to focus on is implementing the logic that responds to these triggers, and they do so by creating actions. Actions are just functions, snippets of code, that are uploaded to the OpenWhisk action pool. At the bottom of the slide is what a very simple hello-world action looks like: it's just some code placed in a file or typed into the UI, and then uploaded to the action pool using the wsk CLI. We support quite a few languages so far, Node, Python, and Swift, and we're adding more as they're requested. By the way, Swift is the open source language Apple released about a year ago; it's becoming pretty popular, it's easy to pick up, and it works on both the front end and the back end. The action code is executed in a Docker container, and the result is returned to the user or forwarded to another action, which is called chaining. Actions can be chained together and executed in a sequence, so you can reuse pieces of code and combine them according to the needs of your application; this lets people develop their applications in a loosely coupled fashion.

Next we have rules. Rules tie everything together: they define the relationship between triggers and actions. Given the right set of rules, you can have a single trigger kicking off multiple actions, or the same action being fired by multiple triggers; it's very flexible. For example, if you have an application that acts as a security system, you might place sensors on all the doors and windows, and when the system is armed you'd set up rules saying that if any of these triggers go off, take some action, like texting these phone numbers. You might also want a trigger to kick off a number of actions in parallel; in the same scenario, you could say that when the alarm goes off, call actions to lock the doors, flash the lights, start a siren, and so on.

Finally, actions and triggers can be bundled into packages that can be shared through OpenWhisk. Packages can either stay private or be shared in the OpenWhisk catalog, so you can use these bundles internally or publish them publicly for all other OpenWhisk users. Now that we have these basic concepts down, let's see how everything comes together. The execution model starts with an event being picked up by the system. The trigger for that event should have a rule associated with it dictating which action gets kicked off, and based on the relevant rules, the right piece of code is pulled from the internal action pool and invoked in response to the request.
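The hello-world slide itself isn't reproduced in this transcript, but a minimal Python sketch of such an action might look like the following; the action name and parameter are just illustrative.

```python
# Minimal OpenWhisk-style "hello world" action in Python.
# OpenWhisk invokes main() with a dict of parameters and expects a
# JSON-serializable dict back.
def main(params):
    name = params.get("name", "World")
    return {"greeting": "Hello, " + name + "!"}

# Assumed upload/invoke flow with the wsk CLI (names are illustrative):
#   wsk action create hello hello.py
#   wsk action invoke hello --blocking --param name OpenWhisk
```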
So yeah, these actions are called and executed in a temporary Docker container: the code just runs, returns a response, and then gets deleted. The way OpenWhisk works, each function submitted to the system gets associated with a customizable REST endpoint, so you can invoke a function directly by performing an HTTP request. That request can be emitted by essentially any device that has internet connectivity, whether that's a sensor, your phone, a laptop, a smart TV, whatever, as long as it's connected to the internet. As an alternative to HTTP requests, we can also use something called feeds. Feeds monitor services such as a database or a message bus like MQTT, so if a message comes in on a certain topic on an MQTT broker, or a new record is added to the database, an action can be triggered in response to those changes. Finally, if a developer prefers to write actions in a language we don't support yet, they can do so by placing the code in a Docker image, building it, and uploading the image to our hosted OpenWhisk offering. As long as the container spawned by the Docker image follows a specific API, the custom code can be called just like any built-in action.

Before I pass this on to Prashant, I'll briefly show you the OpenWhisk UI. This is how you can see and update your actions; you can do it either through the CLI or through this UI, and you can string actions together into sequences. Here is a list of some of the public packages that you can pull in and associate with your account. For our demo, we used the Watson speech-to-text package; we didn't build our own. We chained it with our own small actions, combining the Watson speech-to-text package with the natural language classifier action, along with a parser, to compose multiple actions and get the classification result. That's what we used here.

Thanks, Kalonji. Let's go back to the architecture briefly. What Kalonji just described is how we used multiple serverless actions and chained them together in a sequence, where each action has its own logic: the Watson speech-to-text action just transcribes speech to text, and the natural language classifier, which we'll talk about later, produces a classification for that text so we know its intent, that is, which device it's trying to control and what kind of command it's trying to send. We'll look at that in detail. Once we figure out the desired state, we use a NoSQL DB to update the state in the document corresponding to that device; that change triggers another OpenWhisk action that goes through the MQTT broker to control the device via the IoT gateway. We also have the other piece, IFTTT, one of the free web services out there; we'll talk later about how this architecture can easily interoperate with those open web services. Before we move on to the IBM Watson platform, a rough sketch of the parser action we chained in is below.
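This is a hedged sketch of the small parser action mentioned above, plus how such a chain might be created with the wsk CLI. The action names, package path, and the speech-to-text result fields are illustrative assumptions rather than the exact code we deployed.

```python
# Hypothetical sketch of the "parser" action chained between the Watson
# speech-to-text package and the natural language classifier action.
# It pulls the best transcript out of the speech-to-text result and passes
# it along as the text to classify. Field names follow the usual
# speech-to-text JSON layout, but treat them as assumptions.
def main(params):
    results = params.get("results", [])
    if not results:
        return {"text": ""}
    transcript = results[0]["alternatives"][0]["transcript"]
    return {"text": transcript.strip()}

# Assumed chaining with the wsk CLI (action and package names are illustrative):
#   wsk action create voice-control \
#       --sequence watson/speechToText,parse-transcript,classify-intent
```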
So the IBM Watson platform is a SaaS product; you could call it cognitive technology as a service. The cognitive services we used for the demo are the speech-to-text service and the natural language classifier service, and it offers other cognitive services you can leverage to build cognitive applications. For example, it has machine learning as a service for running your own machine learning models, and a visual recognition service you can use to classify the objects in an image for your business needs. All of these services are configurable to run with your own training model and your own training data, so they capture the context of your business. They're exposed as open APIs: a set of HTTP-based REST APIs to consume the services, and in some cases, such as speech to text, a streaming API over WebSocket as well, so you can transcribe a stream of audio if that better suits your needs. In general, it enables developers to integrate cognition into apps and products, whether that's a mobile application, a web application, or even embedded devices.

Now let's talk about the Watson speech-to-text service that we used. It's an open API, available over both HTTP and WebSocket, to transcribe speech into text. Right now it supports eight languages, and it also supports custom models and custom corpora to give context to the transcription. There's a simple example of how you can leverage the speech-to-text service over HTTP: as you can see here, you just use a simple curl command to send a POST request with an audio file to the speech-to-text service, and it transcribes that speech and gives you back the result along with a confidence score.

Moving on to the natural language classifier. The process of using the natural language classifier service is to come up with training data covering the natural-language text you'd expect from users of your application, and to label each of those texts with intents, called classes. Once you've trained it with enough data, you can run the classifier on user text and it returns the classes for that text; from those classes and intents, you can figure out what the user really means in natural language. This is what simple training data looks like on the dashboard. You can automate the whole training process through the REST API and the CLI; for simplicity, I'm just showing the dashboard you can play with. As you can see, I've defined a bunch of classes here: off and on, the devices, fan and light, and the intent of the text, whether it's querying the state of a device or sending a command to a device. Those are the predefined classes I created, and I added some training data, the text I'd anticipate from users, and for each text I specified which classes it belongs to. Once you've done the training, you can go on and test your classifier with some text and see whether you're really getting the intended classes or not.
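The curl example from the slide isn't reproduced in this transcript, but a hedged Python equivalent of the two calls just described, transcribing the audio and then classifying the transcript, might look like the sketch below. The endpoints, model name, classifier ID, and credentials are illustrative assumptions.

```python
# Hypothetical sketch of the two Watson calls: transcribe an audio file with
# speech-to-text, then classify the transcript with a trained natural
# language classifier. Endpoints, model, classifier ID, and credentials
# are placeholders.
import requests

STT_URL = "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"
NLC_URL = ("https://gateway.watsonplatform.net/natural-language-classifier"
           "/api/v1/classifiers/YOUR_CLASSIFIER_ID/classify")
STT_AUTH = ("STT_USERNAME", "STT_PASSWORD")   # assumed service credentials
NLC_AUTH = ("NLC_USERNAME", "NLC_PASSWORD")   # assumed service credentials

# 1. Send the recorded audio and pull out the best transcript.
with open("command.wav", "rb") as audio:
    stt = requests.post(
        STT_URL,
        headers={"Content-Type": "audio/wav"},
        params={"model": "en-US_BroadbandModel"},   # assumed model name
        data=audio,
        auth=STT_AUTH,
    ).json()
transcript = stt["results"][0]["alternatives"][0]["transcript"]

# 2. Classify the transcript to recover the intent (device + command).
nlc = requests.post(NLC_URL, json={"text": transcript}, auth=NLC_AUTH).json()
print(transcript, "->", nlc.get("top_class"), nlc.get("classes"))
```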
For example, once you've done the training and run it against some text, these are the classes you get. These classes are a bit different from the ones I showed before, but this is totally configurable based on how you train and which classes you use to label the data. For the text "turn on the fan in the living room," I get classes saying that it's a command, that the intent is "on," and that the device is the living room fan. That sort of classification helps us figure out the intent of the text, and you can use the JSON response to figure out which device it is, what command is being sent to it, and with what confidence. You can then easily use that to communicate with the device.

For communication, we used the Watson IoT Platform, which is internally just an MQTT broker. In fact, you can use any open source MQTT broker: since it's compatible with the MQTT protocol, you can easily swap in a lightweight broker if you prefer. But yes, the Watson IoT Platform is internally an MQTT broker. It provides REST APIs as well as real-time APIs to communicate with your devices and to consume their data. We're mostly using it here to communicate with devices, but it can easily be extended to read device state, report that state, and store device events. On top of the MQTT broker, the Watson IoT Platform also provides some cognitive capabilities, such as running analytics, but for the purpose of the demo we just used the MQTT broker; it can be extended to use those other features.

I briefly showed you IFTTT in the architecture diagram. Are you familiar with IFTTT? Great. It's a collection of open web services built around the concept of recipes, now called applets: a combination of a trigger and an action, where the trigger kicks off the action, which you can use to control your devices or your web services. It's not really limited to IoT, but it has a lot of IoT channels, and a lot of vendors are pushing their services into IFTTT with recipes to control their appliances: LG washers and dryers, LG fridges, Samsung devices, even quite a few coffee makers. You can use a lot of web-based triggers; for example, if I receive a tweet that says "shut down the fan," I can use that trigger to connect to vendor devices already registered with IFTTT. For the purpose of our demo, we used a WeMo switch, which has an IFTTT service: we connected the light bulb to the WeMo switch and used the Maker channel, which lets you trigger those actions through a web request. I can show you the IFTTT side later. It's a way to show how you can easily interoperate with the actions and services that a lot of device vendors already provide. Using the Maker channel, you can trigger actions on those device-connected services over HTTP.
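As a sketch of that Maker-channel hook, here is a hedged Python example of firing an IFTTT Maker (webhook) event over HTTP. The event name and key are placeholders, and the applet that maps the event to the WeMo switch is assumed to be configured separately in IFTTT.

```python
# Hypothetical sketch: fire an IFTTT Maker/webhook event that an applet
# maps to a device action (e.g. toggling a WeMo switch).
# The event name and key below are placeholders.
import requests

IFTTT_EVENT = "light_on"        # assumed event name configured in the applet
IFTTT_KEY = "YOUR_MAKER_KEY"    # assumed Maker channel key

url = f"https://maker.ifttt.com/trigger/{IFTTT_EVENT}/with/key/{IFTTT_KEY}"
# Optional values the applet can use (value1..value3 are the fields IFTTT accepts).
resp = requests.post(url, json={"value1": "living-room"})
print(resp.status_code, resp.text)
```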
So it's really easy to extend your architecture to connect to a lot of those IFTTT-connected devices. Let's see if I can show you; bear with me. If you take a look at all these IFTTT-connected services and the devices they support, there are a lot of GE appliances, cooking appliances, dishwashers, dryers; all of these can now be controlled through the actions you integrate with them. One of the channels is the Maker channel. If you hook the Maker channel up with any of these devices, you can easily control them over an HTTP request. That's what we leveraged for the IoT side. One example of these applets (they used to call them recipes): you can track in Evernote every time you leave the fridge door open. You connect the device to the Evernote service so that any time the fridge door is open, a note is recorded, and you can even receive a notification through any channel, including your mobile application. Because these are open APIs and web services that can be controlled over HTTP, they're really interoperable with your own architecture.

Before we do a quick demo, let's go back to the architecture. We've explained what we did and how we used the IoT gateway to provide IP connectivity. Now let's do a quick demo to show how easily you can extend this architecture to work with any media channel that gives you text, such as SMS over Twilio or any messenger API. We already have an action that takes text and runs it through the natural language classifier, so let's see how we can extend the architecture to use a media channel such as Twilio SMS and control the devices with the pipeline we've already built. Let's give it a try; anyone can try this. If you send one of these commands as a text message to this phone number (for now we support only one device, the light, with on and off commands), it should control the device. This shows how easily you can extend the architecture and the classifier to work with multiple media channels; in this case we just used SMS text.

That was the whole goal of the demo: to walk through the technical recipe we used, leveraging these SaaS products, to build a miniature Google Home or Amazon Alexa-like appliance. You can easily build something similar using these architectures. That's the technical recipe we came up with. With that, we have a lot of time left for questions, so if you have any, I'll be happy to take them.

[Audience question, not fully captured.] We haven't really tuned the numbers on those yet; we were just trying to get the end-to-end functionality working, to be honest. Even for the speech to text, we're not leveraging any streaming capability; we're just recording the audio, saving it to a file, and sending that file.
Over HTTP, one way we could reduce that would be to use the streaming capability, but we haven't really tuned the numbers on this. It does go through multiple hops, multiple actions, so it may not be the right architecture for time-critical data, where you're worried about very low latency, or where data is coming in as a stream of maybe thousands of messages per second that you need to process immediately. There might be better approaches for those architectures than serverless. But in our case, you're just speaking into a mic; you're not expecting results instantly, within a fraction of a second, although you do want it as fast as possible. We haven't really played with the numbers, but I'd say there's room for improvement here.

[Audience question, inaudible.] Say that again? Sorry. We haven't actually used IFTTT for this demo. IFTTT is for the cases where there's already an action recipe out there, like when you buy a home appliance such as an LG fridge, or a coffee maker that supports IFTTT; you can use this architecture to interoperate with those services. But no, we haven't used IFTTT for this one; it runs solely through the OpenWhisk actions. Yeah, that's a good point. Any other questions?

[Audience question about multiple users.] It's just first come, first served right now. We haven't really thought about handling multiple users connecting to one device; whichever command comes in first wins. It's pretty much a single data pipeline. For OpenWhisk, maybe Kalonji can explain, but if there are multiple requests to an OpenWhisk action, it just spins up multiple action instances. So you might want to handle that if it's a requirement, but mostly this architecture is for fire-and-forget scenarios: you want to streamline some action based on the events you receive, and the order in which you receive the events is the order in which the actions execute. Internally we haven't used any synchronization mechanism.

[Audience question about phrasing variations.] Yeah, "turn off the light" is pretty much equivalent to "shut down the light" or however it's phrased. I think it does try to handle those scenarios; I expect it to, and I believe it does. It probably depends on how much you train it, but it should be able to handle them; it tries to do relationship extraction and entity extraction on the text. The main challenge we felt was coming up with the training data: how do you scale this natural-language text? We're training the NLC with phrases like "turn on the light," so if you add one more device in the future, do you go and train your classifier again, adding text for each of those devices? We're looking into that scenario, and we're also exploring applying some sort of entity extraction and tagging, so that you don't have to train for each and every device; instead, the natural language classifier can extract the entity that you consider a device from the text.
So those are, I think, the more challenging aspects we see going forward: being able to scale with the number of devices and with heterogeneous devices. But so far, in terms of text and intent, it's working pretty well at extracting the intent. Any more questions? All right, we'll be roaming around; if you have any questions offline, feel free to stop us and ask. Thanks, guys.