Welcome everybody, how's the conference been so far? Interesting, not a bad time, right? Right after the break, at least you get the sugar in, some coffee. Before lunch or before the break is normally the tough slot, when people are not interested in what you're going to say. I know I'm just before the evening social, so I'll try to make it as interesting as possible. My name is Belvinder Kaur. That slide is just the legal notice, which basically says that the company owns everything. A little bit about me: I'm an architect and software engineer, working for almost 20 years. I started off with enterprise software, then did mobile before Android and the iPhone came along. I used to work on this really cool Java-based project which was going to get open sourced. I was at Motorola at that time, and then Android and the iPhone came along and the project got canned. That would have been my big chance to make a contribution to open source, but there's always hope. After that I did a lot of Android, Android camera particularly, and since 2014 I segued into the Internet of Things, starting with a video development kit. Last year I joined a company called AppDynamics, which, if you're not in the enterprise software business, you probably would not have heard of. They have a lot of tools for application performance monitoring of web services, databases, and servers, and they wanted to start a new team that does embedded agents; that's when I joined the company. Right now I'm in the emerging technologies group there. So you may ask: why this talk, and why at OpenIoT? Well, the one thing I have found out in the last year is that there is a problem, and nobody seems to be thinking about it or realizing it. Normally when you go to give a talk, and I've done a few of those in my life, right?
It's either a new technology and you do a deep dive and share it with the audience, or you have this really cool solution, or, most of the time, it's an emerging-technology thing: look at this new technology. But after a year of working on this problem, I realized that most people are not thinking about it. I've had the chance to speak with many, many customers, and the reaction is, "Oh, we didn't even give that a thought." So that's why I decided: forget about talking about the solution, what I really need to start talking about is the problem itself. And I love this picture, because there's actually a baby in it, and the Internet of Things is really in its infancy, which is one of the reasons people are not thinking about this. So why OpenIoT? The company is not an open source company, so why am I talking here? One of the major reasons is that I believe that long term, five or ten years out, it's really the companies and individuals who work with open standards and open source who are going to be successful in actually deploying IoT solutions. Maybe there are some proprietary solutions that give you a quick win and some market advantage, but long term, because the space is so fragmented and because interoperability is such a big thing, at least some portion of the solution has to be built on open source or open standards. That makes this the right place; I think the people in this room, at this conference, are eventually going to be the most successful ones. Secondly, within the company, although we don't have much of an open source offering, there are a lot of us who are very passionate about it, whether on the enterprise side or elsewhere, and we're beginning to start internal processes: we have a hackathon coming up, with a special open source prize this time.
So we're beginning to get there, and I hope to see more of it. And finally, I just love to hang out with the open source crowd, because people here have different motivators. I'm not someone who gets motivated by competition; I love collaboration, so this is the right place, and it's a perfect excuse for me to attend the conference. What I'm going to do is first talk about what the landscape looks like, then do a deep dive into the problem itself, talk about what a solution looks like, and then come up with some best practices that I have found in the last year of working in this space. I've also collected a lot from other people who've been working in the space, coworkers who've been doing this for seven to eight years, although on the enterprise side. So we'll start with the first one. As you noticed, the topic is: are we neglecting response times as a risk? I don't mean to suggest that security, privacy, interoperability, and connectivity are not risks; those are normally the concerns that anybody venturing into this field, trying to deploy an IoT solution, thinks about. What happens after deployment is a different story, and that's what I'm going to try to highlight and spread awareness of, if you will. So the consumers are changing, right? We all know millennials are part of the workforce now and are going to be the ones with the most buying power. I actually wanted to post my teenage son's picture here, but then I thought if I did, I'd probably violate half a dozen ethical, moral, and legal rules. I'd be a bad mother, corporate legal would probably have issues, I'd violate a teenager's privacy rights, who knows what. So we'll just stick with Willy Wonka. But really, that's what it is. So, some traits of millennials; the animation doesn't seem to be working, let's see if we can go through it.
Nope, okay, I'm going to change the mode so we can see what this thing looks like. Give me a minute here, please. So this is a generation of digital natives: from the moment they were born, they know what digital is. They're also super impatient, right? If they want to search anything, they don't go to the library, they just Google it. Now they don't even need to Google it, they just ask Alexa. My sixth grader does geography homework by talking to Alexa; I had to put a stop to that. If they want food, you microwave it. You don't even need to wait half an hour for a taxi in San Francisco, just Uber it, and then they complain it's four minutes late. Super impatient. They expect all services working all the time, no excuses, and seamless access from all devices. They want something, school supplies: "Oh, can we use Amazon Prime?" Then they put it in the cart from the browser and just need mom's phone to click send, and we're good. Seamless access from all devices. And they're multitaskers. I actually caught my son one day watching television, watching a YouTube video on the laptop, and Snapchatting at the same time. I love telling stories about my kids; this is how I get back at them: go to a public conference and tell everyone what they do. But multitaskers, they manage to get stuff done, with a very short attention span. This is the consumer when you deliver an IoT solution. And ultimately, even in industrial IoT, if the workers are used to a certain experience in their personal lives, they expect that in their work lives. For this generation, work and life are mixed. They multitask all the time, they have no qualms about working at home; it's not nine to five or even overtime, they don't box themselves in. It goes back to the multitasking. Now let's talk for a minute about response time expectations.
In the early 2000s, from a web browser, people had the patience to wait up to 10 seconds for a response. As technology progressed, laptops and desktops became more powerful, connectivity rates rose, from EDGE to 3G to 4G, and backend servers became more powerful. Response times decreased, and by about the time smartphones had arrived, you would find that 50% of the population would drop a transaction or purchase they were making if it took more than 10 seconds. That was the inflection point, when they started expecting more. Right now, for UX designers, the target is 1,000 milliseconds, one second. At one second, your brain knows there is a break, but it feels in control, so it's acceptable. At 0.1 second, 100 milliseconds, it feels instantaneous. The good news is that we don't have to go below 0.1 second; after all, we're human beings, right? And I think the Internet of Things is going to be harder, because, to use the classic example everybody gives, people are used to the light bulb going on the instant they flip the switch. So there is no excuse: even if the switch is talking to the cloud and to the rest of your smart home, there cannot be a delay in the light bulb turning on. That's the new goal: 0.1 second. Companies like Twitter and Facebook are already working toward that goal with all kinds of algorithms and software engineering around caching and prefetching data as needed. Now, this is an OpenIoT and Embedded Linux conference, so I'd just like to know: how many people are aware of the trends on the enterprise side? I'm assuming most people are on the device side. OK, excellent. The enterprise is also changing, and I'll show a few slides on what's happening. Seven or eight years ago, this is what the flow looked like on the enterprise side. This is what it looks like now.
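Those thresholds can be sketched as a tiny classifier. This is a minimal illustration, not a standard: the bucket names and constants simply mirror the numbers from the talk (0.1 s feels instantaneous, 1 s is the current UX target, and beyond roughly 10 s about half of users historically abandoned the transaction).

```python
# Response-time buckets, using the thresholds cited in the talk.
INSTANT_S = 0.1    # 100 ms: feels instantaneous; the new IoT goal
UX_TARGET_S = 1.0  # 1 s: current target for UX designers
ABANDON_S = 10.0   # ~50% of users historically dropped off past this


def classify_response_time(seconds):
    """Bucket a measured response time against the UX thresholds."""
    if seconds <= INSTANT_S:
        return "instantaneous"
    if seconds <= UX_TARGET_S:
        return "acceptable"
    if seconds <= ABANDON_S:
        return "slow"
    return "abandoned"
```

For example, a 0.5-second light-bulb response would land in "acceptable", but by the talk's argument the bar for IoT is really the "instantaneous" bucket.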
All of this comes from the monolithic services at the back end starting to break down into what they call microservices. The biggest motivation for microservices is the ability to deploy quickly. There are software companies, again like Twitter and Facebook, that can do multiple production deployments in a single day, and the reason is that not only is the development cycle automated, the delivery cycle is also automated. You must have heard a lot of buzz around DevOps; that is the culprit. And the only way they can do that is by having small, self-contained teams. I actually had the chance a few weeks ago to attend a microservices summit in San Francisco, and it was amazing, because an architect there was talking about how they have created self-selecting teams. They no longer place individual contributors or engineers on a team; you can basically go and sign up for the team you want. And they have self-contained teams, with a product manager, a QA person, developers, and DevOps all contained within them. The only way they can accomplish that speed of deployment is by breaking things up into microservices. So what's becoming more complex on the enterprise side? Basically, a lot more access points, and we know that: every device that gets connected is a new access point. Then there's the microservices piece of it. The third thing that has gained a lot of adoption is cloud services; even very traditional companies that would never put anything in the cloud, where everything had to be on premise, with process after process for compliance and regulation, are also moving to the cloud. We know that from the success that AWS and Azure have had. So now there are a lot of external services that are part of the enterprise, and everything is now extremely asynchronous. If we go back to the previous diagram that I started with, it was just a linear flow.
You knew where it started from, you knew the next hop, the third, the fourth, and then there was a response returned to the client. That's no longer the case now. The last point is that everything is short-lived. In the keynote this morning they were talking about Lambda functions, and I believe Google Cloud has a similar offering: basically, serverless architecture. Even the backend is transient; it comes and it goes, which means you don't even know at a given time what was up. All of this leads to systems that are very complex and very hybrid, and if a person needs to see whether your system is running with the performance it needs, it is no longer a human task; it has to be automated. Uptime is equal to the business running, right? That's why I say that neglecting the end-to-end performance of your system, from the edge to the backend, may not feel like security or privacy, but it is critical to the success of the business, and that's why it's important to pay attention to it. So how do people start launching an IoT solution? The way I look at it, there are three pathways. First, you were a traditional embedded device manufacturer and you started connecting your devices to the backend with cloud services to provide new services. That is your pathway: device manufacturer, now connecting to the cloud. Things have changed, because you used to build it, test, test, test, deploy, and other than the service organization going out for periodic servicing, depending on the service contract, you really didn't know what happened to the device. Now it's connected to your backend, you're providing new services, you're making money off of it, and you need to know what the device is doing. Second case: you're an enterprise provider. You provided web services and web solutions; your only two clients were desktop browsers or mobile. Now you have all these other devices coming into the mix and you don't know how to deal with them.
You don't have embedded expertise within your company. And the third pathway is a startup building from scratch, where both sides are new. I found this very interesting: this is a survey that the Eclipse Foundation did last year, in 2016. By the way, the 2017 one is open right now, and I think it runs until March 17. I have nothing to do with the Eclipse Foundation, but I found last year's survey really useful, so if you want to take 10 or 15 minutes and share your knowledge, I think the whole community would benefit from our joint learnings. Like I mentioned, performance at the design stage was number five, I think, but the moment the solution got deployed, it rose to number three. Because now there is a person sitting in the operations department who needs to know: is the device up and running? If people are complaining, is it because the device went down? Did the application on the device go down? Was the network connection bad? Or did something happen at the back end? If you do not have tools in place, there is no easy way to find out. So the KPI for success is MTTR, mean time to resolution: the time from when a problem was detected to when it was resolved. It's very straightforward to understand. The other thing I'd like to point out is that the performance of devices impacts that of the back end, and vice versa. There is no way to ignore one or the other, and you cannot have an isolated solution for monitoring the runtime or uptime of your system. With this, I'd like to start digging a little deeper into the problem. We'll take two use cases; like I said, there were three pathways, and I'm not going to talk about the startup, just the other two cases. Teresa is a director at an IT services company that provides inventory management services.
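The MTTR definition above is simple enough to write down directly: for each incident, take the time from detection to resolution, then average. A minimal sketch (the incident representation as (detected, resolved) pairs is my own, just for illustration):

```python
# MTTR: mean time from problem detection to resolution.
from datetime import datetime, timedelta


def mttr(incidents):
    """Average detection-to-resolution time.

    incidents: list of (detected, resolved) datetime pairs.
    Returns a timedelta; zero if there are no incidents yet.
    """
    if not incidents:
        return timedelta(0)
    total = sum((resolved - detected for detected, resolved in incidents),
                timedelta(0))
    return total / len(incidents)
```

So an incident resolved in two hours and another in four give an MTTR of three hours; the point of everything that follows in the talk is to drive that number down.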
So far, she has managed the uptime and operations of the web application; her clients were mobile and web. Then IoT came along. It's really cool, they have to do something new for their business, so they decided to deploy an automated RFID-based inventory system. But what started happening after some time was that, while it's now possible to track inventory, the web services started becoming slow. Life is not looking so good for her. Next is Ivan. He's the head of operations at a white goods company, and they recently launched a new line of devices, but the consumers are complaining of unresponsive control panels. Again, connecting to the cloud is not so much fun. This part is pretty straightforward: where can problems happen? Either the device itself is unavailable or unhealthy, there's a network lag, or, now that we have third-party cloud services, maybe there's a problem with those. As an example, we have all these smart point-of-sale systems running Android now. Different companies create their little embedded applications and put them on the point-of-sale systems, and they talk to their own back end, but they also talk to the credit card payment system. If there's a lag, they don't know where it happened: at the point-of-sale system, or at the payment system? That first set of problems is straightforward, since we're all from the embedded world and know what can go wrong on a single device. But there is a second set of problems that comes from the aggregation of devices. Maybe all the devices are working fine individually, but together they generate so much data that there's a problem: the three Vs of data, the volume, the velocity at which the data gets sent, and the variety. These are also highly distributed and hybrid environments. So what are the reasons for poor MTTR?
When we talk to customers, most of them will tell you, "Oh, I already have logs. They're on my device, but I don't know how to pull them out." So you have to take the view of the person whose responsibility it is to make sure the entire system is up and running. The second issue is that some of the data are device logs and some are metrics that already get generated by frameworks, whether BACnet, or Modbus, or whatever, but again, they're trapped on the device, and nobody knows what to do with them. Now let's say they've come up with a solution, homegrown, open source, or commercial, and they're able to pull out all the data, whether logs or system-generated metrics, and bring it to the back end. But now they have different systems: the logs from the mobile apps are coming in one format, the ones from the embedded devices are coming separately, the backend application's are coming separately. They have to sit there, open multiple panes, and try to correlate the data. Manual correlation will increase your mean time to resolution. Let's say they've managed that, too: they've glued things together and they have a single pane, but they still don't really know. All they know is that calls to the database were slow, or that the mobile device has a long response time; they don't know exactly where the problem is. So you hand it off to the engineering department and say, "There's a problem, these calls are failing at the database." That, again, will take its due course of time: two days, three days, a week. Remember the impatient millennial? And finally, even if they detect that it is this method within a certain code base, or this mobile device, those two pieces sit across organizational gaps. Like the keynote this morning where she was talking about culture, this is a real problem.
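The manual-correlation pain above is exactly what a common schema removes: map each source's records into one shape, then group on a shared identifier instead of eyeballing multiple panes. A minimal sketch; all the field names (`reqId`, `trace_id`, `correlation_id`, and so on) are illustrative, not from any specific product.

```python
# Normalize heterogeneous log records into one schema, then group
# them by a shared request id so correlation is automatic.

def normalize(source, record):
    """Map a raw record from one source into the common schema."""
    if source == "mobile":
        return {"ts": record["timestamp"], "source": "mobile",
                "request_id": record["reqId"], "message": record["msg"]}
    if source == "device":
        return {"ts": record["time"], "source": "device",
                "request_id": record.get("correlation_id"),
                "message": record["event"]}
    if source == "backend":
        return {"ts": record["@timestamp"], "source": "backend",
                "request_id": record["trace_id"], "message": record["log"]}
    raise ValueError("unknown source: " + source)


def correlate(records):
    """Group normalized records by request id, each group time-ordered."""
    by_request = {}
    for rec in records:
        by_request.setdefault(rec["request_id"], []).append(rec)
    for group in by_request.values():
        group.sort(key=lambda rec: rec["ts"])
    return by_request
```

With this in place, a slow database call and the mobile tap that triggered it show up in the same, time-ordered group instead of in two separate panes.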
I believe there was a talk I attended this morning where they talked about IT and OT coming together and how it causes problems: engineering, information technology, operations technology. It's good to be wary of this; really it's a management problem, but it's nice to know that it is a problem. So what's the solution for all of this? Basically, you need an end-to-end monitoring solution, and it has to get designed in right when the product is getting designed. I had a very telling exchange with a customer once. They said, "Yeah, we're connecting the devices to the cloud now, we have Microsoft Azure, we've spent a lot of our development time on the security piece." I asked, "How are you going to ensure the runtime performance, that the system is up and running?" "Oh, we really haven't thought about that." The discussion started, and then the engineers responsible for designing that embedded product said, "But we have our budgets, right? We have a CPU budget, we have a memory budget, and this thing is an overhead." At that point somebody pointed out that it has to be considered a feature with a dedicated budget, rather than a debugging tool, an unnecessary thing that you put on your device. Then, because the back end is so complex and the front-end access points are so complex, you need something that can automatically correlate everything together for you, and trace the path of a user from the device where it started to where it ended. Remediation: being able to automatically correct problems, or, if not that, at least alert you, because you can't sit there manually looking at screens trying to figure out what went wrong. You need to be able to put health rules in place, and whenever they get violated, you need to get notified. And finally, analytics. So what does device-side instrumentation look like? Okay, same problem with the slides again.
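The health-rules idea just mentioned can be sketched very compactly: declare the thresholds once, evaluate every incoming sample against them automatically, and notify on violation instead of watching screens. The rule set and metric names below are illustrative assumptions, not from any particular monitoring product.

```python
# Declarative health rules evaluated against metric samples.
# A violation is what would trigger an alert/notification.
HEALTH_RULES = [
    {"metric": "cpu_percent", "max": 90.0},
    {"metric": "response_time_ms", "max": 1000.0},  # the 1-second UX target
    {"metric": "free_memory_mb", "min": 32.0},
]


def violations(sample, rules=HEALTH_RULES):
    """Return the rules violated by one metrics sample (a dict)."""
    violated = []
    for rule in rules:
        value = sample.get(rule["metric"])
        if value is None:
            continue  # metric not reported in this sample
        if "max" in rule and value > rule["max"]:
            violated.append(rule)
        elif "min" in rule and value < rule["min"]:
            violated.append(rule)
    return violated
```

A real system would feed every sample through this and page someone (or kick off automated remediation) whenever the returned list is non-empty.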
Basically, you need to capture device metrics and device events. Device metrics are things like CPU, power levels, memory, top processes, all with a given timestamp: at this time, this was the state of the device. Then events: I'm going into deep sleep mode, I'm coming back up. Above that, there is also the application layer. Say you have a thermostat: what was the thermostat doing, what was the application logic? So, events there too. Now, what does an enterprise-grade solution look like, one that can support millions of users and millions of devices? You need the ability to instrument all kinds of applications, because you won't find a single language; you need to be able to support different languages, different application frameworks, different devices. You need to be able to aggregate all of that at scale, and correlate it. And although it seems almost trivial or frivolous to have a single-pane view of the enterprise, in reality, when there is an operations problem, that is a lifesaver. So: a single pane of glass. Deep instrumentation: let's take a minute to talk about this. If your instrumentation can actually pinpoint what line of code caused the problem, where a crash happened, where an exception got thrown, then remember that hand-off to the engineering department that took two or three days to a week to actually solve the problem? That goes away, because now you have a stack trace; you just zip it up, put it in your issue tracking system, and they look at it. You've reduced days to maybe even hours, and deep instrumentation makes that possible.
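The metrics-plus-events split described above can be sketched as a tiny device-side agent: each metrics sample is a timestamped snapshot of device state, and each event is a discrete, timestamped record. This is a sketch under my own assumptions; a real agent would read `/proc`, a power-management API, or framework hooks rather than the stand-in reader callables used here.

```python
# Minimal device-side agent: timestamped metric snapshots plus
# discrete events ("entering deep sleep", application-layer events).
import time


class DeviceAgent:
    def __init__(self, readers):
        # readers: dict of metric name -> zero-argument callable
        # (stand-ins for real /proc or sensor reads)
        self.readers = readers
        self.samples = []
        self.events = []

    def sample_metrics(self):
        """Record one snapshot: the state of the device at this instant."""
        snapshot = {"ts": time.time()}
        for name, read in self.readers.items():
            snapshot[name] = read()
        self.samples.append(snapshot)
        return snapshot

    def record_event(self, name, **details):
        """Record a discrete device or application-layer event."""
        event = {"ts": time.time(), "event": name, **details}
        self.events.append(event)
        return event
```

Usage might look like `agent = DeviceAgent({"cpu_percent": read_cpu})`, sampling on a timer and calling `record_event("deep_sleep_enter")` from the power-management path.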
For some platforms, for example gateways, if you're running Java or .NET, and there are a lot of gateways in point of sale and retail that actually run those, they support what is called bytecode instrumentation, which basically means there is no development effort: you apply it post-compile and you can start getting your stack traces and your crashes. Then: the ability to diagnose problems quickly, alerts, and finally measuring the business impact. Say I'm at a point-of-sale system and I have purchased something. It's also talking to the payment system, where my credit card is going to get charged, and it might be updating an inventory system. Now, if the credit card call fails, the impact to my business is way higher than if, say, a recommendations service, the "this person bought these three things" wish list, or Netflix-style recommendations, gets delayed; that's a different thing. The severity to the business is different. So when you're instrumenting all of your code and your network calls and sending your crashes over to your management console, if you can leave breadcrumbs of what business got impacted, at the end of the day it will be very helpful. Because otherwise you could end up in a situation where you have instrumented so much and you're getting so much data, but you don't really know what it means to your business. This is like the cherry on top. So let's take a look at what happened with Teresa and Ivan. It turns out the RFID reader was updating the GPS coordinates even when it didn't need to, and this was detected by an end-to-end monitoring solution: they were able to figure out that traffic on the backend system from these new RFID readers was very high, there was a patch, and the problem was solved.
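The breadcrumb idea can be made concrete with a small sketch: tag every instrumented call with the business transaction it belongs to and a severity, so a failure report already says what it cost the business. The transaction names and severity map below are hypothetical, invented for illustration.

```python
# Business-impact breadcrumbs: the same failure carries a different
# severity depending on which business transaction it belongs to.
BUSINESS_SEVERITY = {
    "charge_card": "critical",       # a failed payment is business-critical
    "update_inventory": "high",
    "fetch_recommendations": "low",  # degraded recommendations hurt less
}


def breadcrumb(transaction, ok, duration_ms):
    """Build the record attached to an instrumented call."""
    return {
        "transaction": transaction,
        "severity": BUSINESS_SEVERITY.get(transaction, "unknown"),
        "ok": ok,
        "duration_ms": duration_ms,
    }
```

With breadcrumbs like these, a failed `charge_card` call surfaces as critical on the console, while a slow `fetch_recommendations` call is correctly triaged as low impact.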
The reason I had to put the smiley there is that if you try to search for the same person with both a frown and a smile, at least in the public photo database, you don't find it. Similarly for Ivan: somebody went and changed the backend and forgot that the washers were also connected to it, and they were timing out, which made the panels very unresponsive. Again: coordination between the two teams, being able to debug it, and happy customers. So, best practices. Runtime performance management instrumentation is not a nice-to-have and not an overhead; it's a must-have that should get designed into the solution, both on the backend and on the devices. When you are designing it, an overhead of two to five percent is a good estimate, whether you're allocating CPU, memory, or disk space. Now, some guidelines on choosing an agent. Say you have multiple options; what are the features in an agent that you should look for? It should be configurable, because different use cases and different devices have different requirements. Some devices say: while I need to send my own data over, I don't want the agent uploading any instrumentation data to the backend; once I am done, it is free to do so. On the flip side, another embedded device says: once I am done, I'm going to shut myself down and save power, so I don't want the agent working then; it should only work while I am up. You need to be able to configure that. You don't want an agent that you start and it does what it feels like on a background thread; you should have control over it: send your data now, this is how much to collect, whether to store anything in a buffer or not. And for the instrumentation data, the metrics you're collecting: do you want local aggregation?
Whether or not you want local aggregation, the agent should be controllable. Naturally, and I don't think I need to tell this audience: small footprint, secure, available in your favorite programming language. Collect and send crash information; I cannot emphasize this enough, because I've seen customers get into situations where they did not even have access to the box. Think of a set-top box. Or look at your smart TVs now, running Tizen or other systems: companies write applications that sit on these embedded devices, and they don't even have access to the crash traces, so the application crashes and they don't know why. The ability to capture crashes and send them over, so they can be analyzed by the operations team, is very, very valuable. The agent should also support offline mode, because, as we know, IoT devices can be connected or offline. If you're an IoT cloud service provider and you're looking to bring devices under your wing, where you need to look after them, please make sure, as part of your acceptance criteria, that they have instrumentation in them, with all the other things I said it should provide; otherwise, with just a single heartbeat that comes up every now and then, you'll not be able to detect problems at runtime. One more thing, and this is for the people with a cloud or enterprise background: they don't always have embedded knowledge, and they think that if they're already using an instrumentation solution for mobile, web, and backend, a similar set of criteria will work on the embedded devices.
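The agent properties argued for above, host-controlled uploads, bounded local buffering, and offline tolerance, can be sketched in a few lines. This is a minimal illustration under my own assumptions; the knob names are not from any particular product.

```python
# Sketch of a controllable, offline-tolerant instrumentation agent:
# collection is always cheap and local; uploads happen only when the
# host application allows them and the device is connected.
from collections import deque


class BufferedAgent:
    def __init__(self, max_buffer=1000, upload_enabled=True):
        self.buffer = deque(maxlen=max_buffer)  # bounded: drops oldest when full
        self.upload_enabled = upload_enabled    # host app controls uploads
        self.online = True                      # connectivity state

    def collect(self, record):
        """Buffer a record locally; never blocks on the network."""
        self.buffer.append(record)

    def flush(self, send):
        """Upload buffered records only when allowed and connected."""
        if not (self.upload_enabled and self.online):
            return 0
        sent = 0
        while self.buffer:
            send(self.buffer.popleft())
            sent += 1
        return sent
```

The bounded `deque` is the key design choice: it caps the agent's memory within the two-to-five-percent budget mentioned earlier, trading the oldest records for predictable overhead when the device stays offline for a long time.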
But it's not so; you have to understand that there are differences, and I have this table which highlights the different things a web agent will have compared to an embedded agent. It's important to be knowledgeable about the differences so you can pick the right agent for you. Then, guidelines for choosing a management console. Make sure it can display both time series and events. A time series shows how things change over time, and sometimes you'll detect problems through anomalies in it alone. Events are discrete: an event happened, right? "I've reached 10% of my battery," or "the application doesn't have any more memory." The console should be able to correlate all the instrumentation data in near real time, automatically, provide alerts and configurable dashboards, and the last item we've already talked about. With this, here are some open source solutions, both for instrumentation and for management consoles. I was at the VMware booth yesterday, and they had hooked up their agent with, I believe, Graphite. So, I'm ready to take questions if you have any. Say that again? It's the end of the day. If there are no questions, we can talk about other things; we can talk about nice places to eat around the venue. Recommendations? I've been going to the same place three times now, I could go somewhere else. No? Say that again? Oh, really? Okay. Sure, no questions, no comments. Things to do differently, no? Was it useful at all? Yeah? Say that again? Okay. Well, thank you so much. Thank you for coming. Thank you.