Hello and welcome. My name is Shannon Kemp and I am the Executive Editor of DATAVERSITY. We'd like to thank you for joining the current installment of the monthly DATAVERSITY Smart Data Webinar Series with host Adrienne Bowles. Today Adrienne will be discussing getting started with streaming analytics and the Internet of Things. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share your questions via Twitter using the hashtag #SmartData. If you'd like to chat with us and with each other, we certainly encourage you to do so; just click the chat icon in the top right for that feature. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now let me introduce our series speaker for today, Adrienne Bowles. Adrienne is an industry analyst and recovering academic. I love that. He provides research and advisory services for buyers, sellers, and investors in emerging technology markets. His coverage areas include cognitive computing, big data analytics, the Internet of Things, and cloud computing. Adrienne co-authored Cognitive Computing and Big Data Analytics, published by Wiley in 2015, and is currently writing a book on the business and societal impact of these emerging technologies. Adrienne earned his Bachelor of Arts in Psychology and a Master's in Computer Science from SUNY Binghamton, and his PhD in Computer Science from Northwestern University. And with that, I will give the floor to Adrienne to get today's webinar started. Great. Thank you, Shannon. I think this is the first time that you've introduced me as the host and not as the new host, so I guess I'm settling into the role after four months, you know? Well, thanks to everybody for joining us here. This is terrific. Before we started the call, I was saying to Shannon that it was kind of interesting that in the first three months we've had people from over 40 countries join us on this series. So wherever you are today, tonight, tomorrow morning, I'm not sure which time zones we're covering, but welcome to everybody. Today we're going to talk about getting started with streaming analytics and the IoT, and if I can – oops. This is the first time we've done this presenting from my computer. There we go. So what I'm going to do is give you a little background, some context, why this is important, new data sources, et cetera. Then a little background on streaming analytics in terms of what we're talking about and what we consider to be streaming and what isn't streaming. Then an overview of the importance of open data and open source tools in terms of building these applications. Then I want to spend some time looking at what the vendors are really doing in this area, because there's a lot of hype, but there's also some real value being delivered. It's a pretty exciting time to be working with analytics and the IoT. So let's get right into it and start with new data and new demands. For those of you that have been with us since the beginning, when we started the series in January, I used this slide once before to talk about the fact that we were going to be covering the Internet of Things.
You'll hear me talk about the Internet or the Internet of Things or the Internet of Everything almost interchangeably, because that's where the world is going. And the picture on the left, that's actually me interviewing John Bates, who's the CEO of Plat.One, and Plat.One is going to come up later in the talk as one of the vendors to watch in this space. But I think the quote here will kind of set the stage. The Internet of Things is often referred to now as the beginning of the new industrial revolution, and I don't think that that's overstating it. What we're talking about is really a level of connectivity that we've had in science fiction, but we've never had in reality in the past. And this goes, for me, back to the late 90s. Shannon mentioned that I'm a recovering academic. In the late 90s, I was teaching in the business school at NYU, and one of the things I talked about was the importance of getting everything that creates data, or about which we want data, online so that we can start to take advantage of understanding things and not just people, so the internet moves from being a pure communication device to being a connectivity device. With that in mind, we'll briefly look at these terms, Internet of Things, Internet of Everything. Once we get to the point that everything is connected, we're going to see new sources of data. We're already seeing that, and we're going to see a lot of examples today. But the important thing for businesses is that there are new sources of value from data that was already there that we really couldn't leverage in the past. And we have to start looking at things and changing our assumptions about what we should measure, whether we should be sampling data or actually going in and looking at everything. This is something that we'll get into today. We also have to change our assumptions about who owns data and what you need to do to communicate and to improve relations with your business's clients, your supply chain, regulators, basically everybody that you communicate with. So the IoT, the Internet of Things – and I want to say I still think of it as the internet, because basically we're layering on top of the internet, and I will talk about the industrial internet. We're seeing already new technologies, new business models, and what I'll look at today is some of the new ecosystems, because I think that's pretty exciting. This just happens to be a news story from a couple of months ago: Qualcomm buying a chip designer to tool up for the Internet of Everything. Qualcomm, of course, we've mentioned a couple of times in our talks about cognitive computing because they're one of the first companies that really made an investment in neuromorphic chips for machine learning. Now we're seeing this sort of second wave, and it all comes together. We'll talk probably at the end about how we are seeing this convergence between the Internet of Things, analytics, and cognitive computing; it's all coming together to change the way we interact with each other in society, but also the way businesses interact with each other and with their consumers. So what we'll see as we get into the examples is that once we do have data coming from a lot of new sources, very high volume data, and I'll try and define that in a minute, we also have to look at where we put the intelligence in our systems.
And just as we'll see a sort of migration or shift over the years from putting data in one place and operating on it in that place and then moving it somewhere else, to what we have today where the operations on data tend to follow the data or to migrate to the data, it's the same thing with intelligence. There's a real movement afoot to make the devices themselves intelligent, or to have what we think of as intelligent machine learning components that are leveraging all the data that's coming from these sensors. Sensors and sensor data, that's a topic that's coming up in another month or two. But let's start today looking at the new data that's available and how that's changing the demands, and also how it's creating new opportunities. So I thought this would be interesting to start with. For comparison, Gartner said that by 2020 there will be 21 billion IoT devices, which is cool except they were one-upped by Business Insider, which said that by 2020 there will be 34 billion devices. The truth is I have no idea how many devices there will be by 2020, but if I look at the expansion in the number of things that are being added to the net and that are discoverable, I certainly wouldn't count out the higher of these two estimates. We could even be really low, because as you start to put things on, it's like the classic network effect: if you think about it, having one telephone didn't do much good; that second one got a lot of calls from the first one. Once you have a network effect with millions or billions of devices and they can talk to each other and create some value from those interactions, then the rate of expansion is going to go up. So this was merely to show that we don't know how many there are, but there are going to be a lot. If you look at your own environment – the type of folks, the job titles of people that register for these webinars – I would assume that most of you already have at least half a dozen devices that are on the Internet, whether it's your computer, your smartphone, your tablet, or your personal exercise thing, like a Fitbit that's communicating with your system. We're getting to the point now where, for early adopters at least, it's not unusual to have all these things going at once and communicating. As we start to get more personal devices and your refrigerator connected and everything else, these numbers are going to start to look quite reasonable very soon. So, more demand. YouTube: I do a lot of work where we post videos on YouTube, but it's still amazing to me that every minute of every day YouTube is getting over 400 hours of content uploaded. And if you start to look at what the data rate is on that, depending on whether you're doing a low- or high-resolution video, you're talking gigabytes to terabytes for individuals and certainly into the petabytes and probably exabytes for YouTube. When you start to look at all the other places that are collecting this data, we'll see there are some interesting opportunities for companies and even individuals to leverage that onslaught, if you will. So here's one: General Electric, which is a hot topic. I happen to be in Connecticut, maybe 10 minutes away from GE's world headquarters, which is now moving out of Connecticut. But they have done a lot with the industrial internet, and they're going to come up again in a minute.
But here the interesting thing for me is their locomotive division. The locomotives today have so many sensors that they're generating something on the order of 150,000 data points per minute. If you just stop for a second and think about 150,000 data points per minute: is that something where you're going to use all of that information? Are you going to sample it? Are you going to abstract it out? Are you going to save it until the train stops and then download it and use it? Those are all questions that need to be addressed. The fact that we can generate 150,000 data points per minute on a train that's going across the country, does that mean that those things are all relevant? And that's going to be part of what we come to at the end in terms of deciding what to use and how to use it. So this is something that I think is just an orders-of-magnitude change in data processing and output from what we had just a few years ago. I wanted to note that most of the examples, or many of the examples, that we use today are from commercial ventures like GE and some of the others. But there's a movement right now that's been picking up steam to make government data, or data collected by the government, open and available to the public. And what that means in the case of New York City's open data initiative and open data portal is that, with very few exceptions, data that the city collects – and this is done in many places now – is available to the public. And you as an individual can go and get information out of these databases. Right now they're not that user-friendly, but there will be interfaces and APIs so people can create value by the way they analyze that data. And so that's another avenue here that's changing the way we think in terms of what the opportunities are. I'll give you one more example here from entertainment. Most of you are probably familiar with the change in the Netflix business model, for example, over the last several years, going from a model where they would ship you a DVD, and when you were done with it you would ship it back, and you'd always have something in the mail going back and forth, to on-demand or streaming data where you're getting the movie, the TV show, whatever it is, as you want to watch it. It's not actually stored on your machine, so you're dealing with bandwidth issues, but also infrastructure issues. With the little diagrams on the left, we're just showing that there are different ways to look at this if you've got a lot of data that needs to be moved from one point to another. In the case of Netflix, we're not doing analytics on the frames within the video, but there's still that choice that has to be made about the data: are we sending one copy that's been replicated, or, if we're doing something that's on-demand like this, are we sending a new copy to each of the users? The analytics comes in here when you start to see recommendations. So if you look at this one, I think I've cropped out anything that would indicate which shows I happen to watch or my children watch, but basically the analysis looks at what's being streamed and by which user. In the case of a family that's using something like Netflix – I think we have five people that are registered at our house – I can tell when somebody has signed on with my account, because I get a recommendation for something that I would have absolutely no interest in. Their algorithms are actually pretty good at looking at your real history.
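Coming back to that locomotive question for a second, here's a minimal sketch of one of those choices: rolling a high-rate sensor feed up into per-minute summaries instead of keeping every raw reading. The sensor names and values here are made up purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw readings from a high-rate feed: (minute, sensor_id, value).
readings = [
    (0, "axle_temp_1", 71.2), (0, "axle_temp_1", 71.9), (0, "axle_temp_2", 68.4),
    (1, "axle_temp_1", 74.8), (1, "axle_temp_2", 69.1), (1, "axle_temp_1", 75.3),
]

def summarize_by_minute(stream):
    """Collapse raw readings into one (min, mean, max) summary per sensor per minute."""
    buckets = defaultdict(list)
    for minute, sensor, value in stream:
        buckets[(minute, sensor)].append(value)
    return {key: (min(vals), mean(vals), max(vals)) for key, vals in buckets.items()}

for (minute, sensor), (lo, avg, hi) in sorted(summarize_by_minute(readings).items()):
    print(f"minute {minute} {sensor}: min={lo:.1f} mean={avg:.1f} max={hi:.1f}")
```

Whether a rollup like that is enough, or whether some readings have to be kept and acted on individually, is exactly the design question raised above.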
One more thing here. I don't know how many of you have reviewed your own data usage recently. I actually created this slide this morning because we're changing cell plans here, and I was a little surprised to see that I had personally used – Shannon, did we just lose the screen? There we go – that I personally used 29 gigabytes of cellular data in the last eight days. When you think about it, that's certainly a lot of data being used, because on my phone I'm also predominantly on Wi-Fi. So now I have to go ahead and do a little analysis on how I'm really using nearly 30 gig in just over a week. But the interesting thing there from an analyst standpoint is that that's what I'm using, that's what I'm downloading, but at the same time we could calculate what was being created by the phone as it captures data from everything from the accelerometers to the GPS, looking at what's being tracked. So I may be downloading nearly 30 gigabytes of other people's data, but the reality is that at the same time I'm creating gigabytes of data, or my phone is creating gigabytes of data, about my whereabouts and my behavior, and some of that would be available, anonymized, to create new value. So the last example I want to give before we move on to the next section – you may have seen this one before. This is from a site that I really like called Thingful.net, and this is a particular snapshot I took a couple of years ago. This is a search engine that looks at open devices, open meaning that the data is available to the public or the presence is available to the public, in a number of different categories. And so I happened to zoom in on a hotel where I was giving a talk and found the number of bikes that were actually parked in a bike rack, because they have sensors on it, at the hotel where I was. And each of these circles represents a device. It could be a weather station. It could be a navigation buoy in the harbor there. But the point is that more and more things are being added to the Internet of Things every day, and our challenge is to figure out which ones have business potential for us, which ones we can use, and what else we can do to, again, as I said, create that value. So we'll look now at the data itself. Let's look at the idea of streaming analytics as a subcategory of analytics. We've talked about analytics a lot in this series, but now I want to focus on streaming analytics. And I will say that if you were with us on one of the previous ones where we talked about machine learning, we talked about different types of analytics – predictive analytics, descriptive analytics, and even prescriptive analytics – and any of those can be applied as streaming analytics. So whether something is streaming or not is a function of how the data is moving. We'll just go through the attributes of a stream in nature and see how that fits with streaming data. So you've probably heard the saying that you can't step twice into the same river. It's largely interpreted to mean that every time you step in, even if it's the same place – if you're dealing with GPS coordinates, you're in the same place – it's different water. There are different things in the context. And so things have changed even if they look the same. And so if we're trying to analyze the contents of the physical stream as it passes through, you have a few choices. You can try to divert the flow so that you can measure it and see what's in it and analyze it, or you're going to pool the data – pardon the pun. You know, we put up a dam and then we can look.
And it is interesting to me that we use so many analogies or labels from flowing water – you know, data lakes, et cetera. But every time we try and understand what's in this flow, we have to be careful that we don't actually change the flow, right? Unless that's our intent. I mean, putting up the Hoover Dam certainly had a lot of intended effects, but there are also those unintended ones. And so it's very difficult to evaluate everything without changing the flow. And when I say ask Heisenberg, I'm not talking about Breaking Bad. I'm talking, of course, about the uncertainty principle. It's very difficult, if not impossible, to measure everything in your data stream without changing the properties of the stream. So the last option there is just to sample what's in there. And I think of that as kind of the catch and release. You go in, you look, and whatever it is that you've taken out, you put back in. But when we're dealing with data, in some ways it is just that simple and in some ways it's not anymore. So here, when I look at the evolution of data management – I'm not going back several hundred years, but just in the last century – you go from paper card catalogs to early disk drives to the 60s and 70s, when we started to actually have real commercial database management systems. And the last one is a modern, if you will, graph database. But what they all have in common is that we typically perform operations on the data at rest. So whether we pull the data into a database and then do our analysis on it, or whether we have to move some analysis or query capability to the data – whether you're dealing with things like stored procedures, et cetera – the data itself isn't moving. It's not in transit as you do the analysis. But today, we've got a whole new order of magnitude in terms of speed. Just a couple of examples here. Delta Airlines: they're processing five million business events per day. And a business event for an airline – we're talking about transactions, we're talking about scheduling, we're talking about personnel – but five million events per day. Pratt & Whitney, related to the airline industry, of course: a Pratt & Whitney jet engine today typically has 5,000 sensors that are producing 10 gigabytes per second per engine while the engine is running. And if you think about that, that's a lot of data. And that's going to come back to this question as we look at all the different sources. Is that something where you need to be looking at all 10 gigabytes per second per engine in order to have value? Or are you looking at pieces of it? Or are you looking at the aggregate? Is this something where some of that data you need to know about right away because you're dealing with failure? Some of it is going to give you information in the aggregate if you compare what's happening right now – what's streaming from the engine that's actually running – with historical data. You can do some pattern matching, you can do some machine learning to predict maintenance. So some of it requires immediate action, and some of it doesn't. Just to compare the jet engine to a typical Formula 1 car, a Formula 1 car has sufficient sensors to produce about 1.2 gigabytes per second. So a jet engine is in one way more complex and in some ways simpler, if you will, than a Formula 1 car engine. But the amount of data that they're actually capturing is significantly higher for jets today.
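As a rough illustration of that idea of comparing the live stream against history, here's a minimal sketch that flags readings drifting too far from a historical baseline. The sensor name, baseline numbers, and threshold are hypothetical, not anything from the GE or Pratt & Whitney systems.

```python
# Historical baseline per sensor: (mean, standard deviation), learned offline.
HISTORICAL_BASELINE = {"egt_sensor_12": (650.0, 15.0)}

def is_anomaly(sensor_id, value, n_sigmas=3.0):
    """Return True if a live value sits outside the expected band for that sensor."""
    mean, std = HISTORICAL_BASELINE[sensor_id]
    return abs(value - mean) > n_sigmas * std

# A few illustrative live readings; the second one should trigger an alert.
live_feed = [("egt_sensor_12", 655.2), ("egt_sensor_12", 712.9), ("egt_sensor_12", 648.1)]
for sensor, value in live_feed:
    if is_anomaly(sensor, value):
        print(f"ALERT: {sensor} reading {value} needs immediate attention")
```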
My little editorial comment at the end is that that's all great, because we need to be able to look at this stuff, but we're looking at it not only to understand what's happening now – descriptive analytics, if you will – we want to be able to take that data and use it to predict what's going to happen in the future. If you're looking at 10 gigabytes per second per engine on a multi-engine airplane, there are some things that you're going to predict where you should have a horizon of maybe a week to a month, in terms of maintenance that's coming up, and others that you need to be looking at much closer to real time. So we need to look at what the architectures are that will support that. So in a nutshell, when I'm talking about streaming data today, I'm talking about data that is high volume and high speed. If it's just high volume, but the speed at which it arrives and the speed at which it needs to be analyzed are not a consideration – I can get all this and run it overnight, maybe I'm looking at historical performance of my business, I'm not dealing with real time – fine, that's not necessarily a streaming data issue. But if the speed at which it's coming in and the speed at which it needs to be analyzed is high, then it's something that needs to be processed as it passes, and that's really the criterion for streaming analytics. So if I'm building a system to make recommendations on Amazon, for example, I go on and it'll give me some recommendation. The performance that's really required there is a couple of seconds, right? Because if I go on and then I go somewhere else, they've lost the opportunity. Now if we start to layer in geographic data and say, well, I want to make a different offer to you if I know that you're a customer and I have your customer history, I've got all that data, and you're in my retail store and I can tell, based on your cell phone location, which you have opted into at some point, that you are leaving, you're heading out the door, I may want to make a completely different offer, and the timing there is very important. So if you're 10 feet away from my store and you're heading toward my store, that's a completely different business proposition than if you're 10 feet away from my store and you're heading away. So let's look at the issue of architecture for a minute. Here I've got a simple representation of a typical conventional architecture. If I have multiple data sources, you've got data flowing from one place to another. I spent a lot of time in the 80s working with a New Orleans company doing data flow diagrams and modeling the flow of data through systems. So here, this is like a directed graph. You've got the data coming into these circles, which is where some process or transformation happens, and then it goes out and eventually it gets stored. The two arrows that go nowhere are typical of some systems that I found over the years where data was being produced and it just never went anywhere that it could be used again. But the key here is that the actual analysis is done when the data is at one of these points along the way. So the data is flowing on the edges and the queries are on the vertices, the points, as opposed to the lines. Now, in a streaming situation, you've still got the data flowing on the edges, but you can have a query anywhere.
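Here's a conceptual sketch of that distinction: the query sits still and every event is tested as it flows past, rather than the data being loaded somewhere and queried afterwards. The event fields and the retail predicate just echo the store example above and are purely illustrative.

```python
def event_stream():
    """Stand-in for an unbounded feed of (customer_id, distance_ft, heading) events."""
    yield ("cust_42", 10, "toward_store")
    yield ("cust_99", 10, "away_from_store")
    yield ("cust_42", 3, "toward_store")

def standing_query(events, predicate):
    """The query is stationary; each event is evaluated as it passes and only matches are emitted."""
    for event in events:
        if predicate(event):
            yield event

# Example predicate: customer within 15 feet and heading toward the store.
near_and_approaching = lambda e: e[1] <= 15 and e[2] == "toward_store"

for customer, distance, heading in standing_query(event_stream(), near_and_approaching):
    print(f"make an in-store offer to {customer} ({distance} ft away, {heading})")
```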
So think of it as almost like a fishing net that you stick in that stream. The water's going through, and if something of interest comes by, like a fish, it gets caught in there. Well, then you have actually changed the data, obviously, if you caught the fish. But think about it this way: the query is stationary and the data is flowing through it. And that comes back to that question I had before, which is do we need to look at every piece of data, or do we sample? And just to make a side note here, if you've heard the term complex event processing, that's really what we're looking at here. We've got these events and we want to be monitoring and to know when something of interest passes in front of us. So to address the sampling question, I'll give you an example from an audio wave. Most of us have digital music today in one form or another, whether it's something that you're downloading, you're streaming your music, or you've got it on iTunes from a CD that was actually recorded in digital format. The issue here is that if you have a signal – and data is really your signal; in this case it happens to be a sine wave – you can see on the left what the wave looks like. It happens to go up and down at around 440 cycles. Well, the red vertical lines represent where I'm sampling; that was the sampling interval that I used to look at this data. Every time I sample, it happens to be at the high point, at 880. So what you see on the right is what I would think I was looking at. It would be a straight line, because every data point that I look at looks the same. As opposed to if I sample more frequently, then I'm going to get a more realistic view of the data. So there's a balance: you have to sample frequently enough to understand what the pattern is that you're trying to find. That's easy to do when it's a simple sine wave; there are certainly well-known formulas for doing that, the Nyquist frequency and all that good stuff. But when you're dealing with data and you don't know what's in there, the issue is at what point you can be comfortable that you've sampled enough of what's going to pass you in the stream to have a good picture of what you need. I saw someone recently doing sampling of real big data in operational efficiency for IT organizations. They were looking at systems in a shop, and the manager said, well, you don't need to look for things on a particular port because we've blocked that off. And it turns out that what you think you have and what you actually have could be quite different. So in that case, they tried looking at everything and discovered that the policies weren't actually being followed. So as we start to build these systems, it's really important to make some choices here in terms of sampling versus capturing everything. And although it sounds trite and simple, the general rule is capture what you can. You may not use it today. You may not store it in the end. You may decide to have a trailing window. But the more data that you can capture in a streaming event, the more options you have for analyzing it afterwards. You may analyze it all as it passes through. You may analyze a sample but keep a record of the rest of it. Those are design choices that we need to look at.
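Here's a minimal sketch of that aliasing issue with a 440-cycle sine wave, assuming nothing about the actual slide: sample too slowly and every sample lands at the same phase, so the signal looks flat; sample fast enough and the oscillation shows up.

```python
import math

FREQ = 440.0  # the underlying signal: a 440 Hz sine wave

def sample(rate_hz, duration_s=0.01):
    """Sample the sine wave at a given rate for a short duration."""
    n = int(rate_hz * duration_s)
    return [math.sin(2 * math.pi * FREQ * i / rate_hz) for i in range(n)]

too_slow = sample(440.0)     # one sample per cycle: every sample hits the same phase
adequate = sample(44100.0)   # CD-quality rate: the oscillation is clearly visible

print("sampled at 440 Hz:   ", [round(v, 3) for v in too_slow])       # looks constant
print("sampled at 44.1 kHz: ", [round(v, 3) for v in adequate[:8]])   # varies as expected
```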
So now I'll take a quick look at where this fits with open source and open data, and just two slides on this. First, if you're building a system for streaming analytics today, you are almost certainly going to use some open source applications, open source system programs, in your architecture. This diagram I happened to borrow from a company called Striim, which is a streaming analytics vendor. But what I like about it is that they've segmented things into open source projects that are useful for data collection versus data delivery and all the different sections in the middle. In an earlier webinar, I think it was in February when we talked about machine learning, we talked about Spark and the machine learning libraries for Spark and things like that. So here, the point of this for streaming analytics is that there are a lot of systems out there. Most of the, or many of the, systems that we'll be talking about in the next 10 minutes leverage data that's coming in from things like Hadoop that may be organized using some of these other open source projects. But the change in the industry right now is how do you take all that data and create new value out of it. The benefit of using an open source infrastructure or architecture, if you will, to actually store and manipulate the data is that these things have really built up. Whether you're dealing with Hadoop or Kafka or Cassandra, you're dealing with something that has a lot of users, and there's a lot of progress being made that for practical purposes you're not paying for. Certainly the larger companies are actually investing in that. With that in mind: today you may be putting data from more traditional sources in these things – all your customer data, all your manufacturing data. Once we start to say we're going to be putting in the sensor-based data that originated from the Internet of Things, that's a little less mature, and that's why I have the next slide looking at the Industrial Internet Consortium, which is a recently organized group. It's actually managed under the auspices of the OMG, the Object Management Group. What's interesting here is that if you look at what they're doing, it's not really a standards organization, but they are collaborating and coordinating on what's being offered. I think that you'll start to see standards emerge from the group that then go into other places; the IIC is not actually itself a standards organization. The original contributors – sorry, the original members that founded it – were AT&T, Cisco, GE, Intel, and IBM. To some extent you might think, well, those are the usual suspects if you're dealing with, let's say, Cisco, Intel, and IBM, but then you look and say, well, what's the GE role or the AT&T role? Obviously telecoms produce a huge volume of data, as I described in terms of my own usage in the earlier slide, and the head of AT&T has talked about the importance of the Internet of Things. And GE is now building a platform, or has a platform, because they have turned all of their complex products, from locomotives to jet engines, et cetera, into sensor-enabled, information-producing devices, and so that really makes sense. I've just mentioned these half a dozen companies; the Industrial Internet Consortium today has a couple of hundred companies, and I would encourage you to just take a look.
One of the reasons I think that's important is that if you're looking at some of the newer vendors, some of the smaller, emerging vendors, it's good to see if they're actually supporting or participating in some of these organizations. So that brings us into the final section, where I want to talk about how I'm segmenting the market. One of the things I said when I put together the abstract for the talk was that the markets are realigning, and I think there are a couple of things to look at here. The first is the Internet of Things. It's very straightforward to create a device and get it on the net, get an IP address associated with it. Many of us do that with devices without even thinking about it. But there are a few companies, again, on the left-hand side, some of the giants, with their own IoT platforms that they're putting together, and that's why they're so involved in groups like the Industrial Internet Consortium – so they can influence what's happening there and how it fits together. So GE, for example, has their Predix – that's P-R-E-D-I-X. It's a solution that is basically the foundation for all of their own applications, but they're using it as a way of communicating with their suppliers and their users. So if you want to build applications – if you're a GE locomotive user and you want to do your own analysis of the data – you would do it going through that platform. Cisco has what they call the Jasper Control Center, which is their proprietary system to help their business partners launch services that they know are going to work with the Cisco environment. IBM has made a big investment in both cognitive computing and in the IoT, and now we see it coming together in what they call Watson IoT. So again, that's another way of interacting as an application developer, where you can build apps that interact with these frameworks, these platforms. And AT&T and Intel have what they call their developer resources for helping you put your devices on their platforms. On the right, I just wanted to bring up a couple of the smaller or newer vendors, because I think it's interesting to see that it's not all the giants; there is a lot of innovation coming from smaller firms. I'll mention the one in the middle, Flowthings, because I don't have a slide that goes into their architecture, but they're doing some interesting work, in particular with smarter cities and what they call smarter agriculture. One of the examples that I thought was kind of interesting with them is helping a cooperative, a collaborative group, on Cape Cod where individual shellfishermen can use sensors to verify to a restaurant, let's say, that when something was caught and put in a cooler, the sensor monitored where it was and what the temperature was between the time it was caught and the time it was delivered to the restaurant – ensuring a level of freshness that a sole proprietor out there, the clam diggers, typically couldn't demonstrate without this type of infrastructure. C3 is a very interesting company to me. I worked with them when they recently renamed themselves C3 IoT; it used to be C3 Energy. It was founded by Tom Siebel, who founded Siebel Systems and is ex-Oracle. They've built a platform, and I put up this diagram because, as it turns out, where I live in Connecticut, about every month we get one of these things, and I only recently found out that it was being generated from C3's analysis of my utility company's sensor data. And so this constantly goes on. They're constantly monitoring.
They provide information to the utility in a real-time or near real-time fashion, and then they provide information to the utility customer, like me, on a monthly basis. I just put it up here because every month I get one of these and it really annoys me, because I'm always way worse than the average home and it probably looks like I'm running a meth lab. I'm actually just running a pair of 27-inch monitors, and I've got a lot of teenagers in the house that use a lot of hot water, which uses a lot of power. But this is a system that was originally developed for energy monitoring and is now being used for all sorts of different applications where you have a lot of sensors connected to the IoT. And the last one in this category, Plat.One. I mentioned my interview with John Bates earlier. Plat.One is another Internet of Things device platform to enable the monitoring and management of complex systems. I think they've got a couple of hundred thousand devices under management right now. I just chose a couple of examples: one version that's being used in smarter cities and another by a utility. These are two areas that almost everybody gets into if you're doing IoT. All right. So before we open it up to the questions, we're going to quickly look at the streaming analytics vendors. Here on this slide, I'm really just showing you, again, the giants you might expect for streaming analytics tools and platforms. IBM, Amazon, Microsoft, they're all in there, and we'll see that they also have something else in common in a second. But companies like SAS and SAP and TIBCO, Software AG, Informatica all have streaming analytics tools and suites, if you will, that have matured as a result of them being in the analytics business before there was a real requirement for streaming analytics. What I wanted to show with the image on the left, just quickly, is that each of them is also focused on offering analytics, or streaming analytics, for IoT as a service. In an earlier webinar, we talked about machine learning as a service, or analytics or insights as a service. These three are sort of the top of the pile, if you will, in terms of investment and breadth of the suite of services that they're offering. This one happens to be Microsoft Azure, showing how you can do all sorts of predictive analytics by integrating their tools as a service, on demand, with your sensor-based devices. This one happens to be Amazon. Again, I chose this just to show that, even though Amazon obviously has some proprietary sauce, what you're dealing with in terms of the data management is built on Spark and open source, so it all works together these days. And the last one is IBM, in this case DB2 with Bluemix. It's analytics as a service using a platform from one of the major vendors. In this last section, mindful of the time, I just wanted to talk a little bit about the fact that the market is very hot right now, and there's some good investment but also some really interesting results. I've just created here a short list of vendors to watch. There are many more in this space, and we'll be doing a report on this probably around July. So if anybody's interested in that, please follow up with me afterwards and I'll tell you who we're actually including. I just wanted to mention a few of these because they've taken an interesting approach. So the first one, Striim – here's their architecture. Again, it's leveraging an open source data management system, and the second i in Striim, I guess, is for intelligence.
So you're looking at the analytics as the data passes through. And this is something that we're going to see more and more of. The Striim team comes out of some of the larger firms – I did an interview with them recently that's, I think, on YouTube at this point – and they've taken an interesting approach to integrating static data with dynamic data, or stream data. StreamAnalytix comes from a company that's best known as a services company, a large services firm called Impetus, and they've commercialized tools that they started to build in their consulting practice. You can see here, this is just an overview of their architecture. I wanted to point out the same thing: the streaming engine that they built leverages the open source applications underneath, using Apache Storm and Apache Spark. The key is that we're still dealing with proprietary tools and open source tools being well integrated. One more: Space-Time Insight. The thing that I thought was interesting here – we've talked to them recently – is that they are very big on looking at location data, geospatial data, the geospatial attributes of all data. So whether you're dealing with business data, operational data, or external data sources, it's looking at the impact of space and time. I think this is interesting because obviously for most data there is a time decay, there's a time issue to data, but location is not always factored in. And that reminds me of the investment that IBM made with the Weather Channel. A lot of times when we see changes in data, we're not considering the weather conditions and where the action that the data represents is happening. So if we're looking at retail data for a store, knowing where that store is and what the weather was at that time really can change the way we look at it. The last one is from a company called Zoomdata. What I think is really fascinating here – and again, you can look on YouTube, I interviewed their CEO recently – is that Zoomdata has taken an interesting approach to modeling and visualizing streaming data. They use the analogy of a VCR, a videotape recorder: you can look at something as it's passing through, which is the definition of streaming that we use, and you can also go back in time, which many tools let you do. But what they've done that's quite different is a technique called data sharpening. I was just fascinated by this. If you're dealing with this high volume of high-speed data that we talked about earlier, you can very quickly get a rough look at what's there, because they take your query, your DBMS query, and break it up into a whole set of micro-queries. They can analyze those separately so that you get a rough picture, and then it comes into focus; that's their term when they say data sharpening. It's almost like when you're downloading something, if you ever do your own YouTube videos: when the video is first ready, you can watch it, but you can only watch it in low resolution. As it gets processed – let's say you shared something in high def – you can't see it in high def right away, but you can see enough of it in a lower resolution. It's a good representation of what's there. It's useful, but over time it sharpens and it becomes the final answer. So it kind of pulls these things together. And their technique for that – partitioning queries into these micro-queries, solving the micro-queries, and then putting them back together – I think is one of the cooler technologies that I've seen in recent years.
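As a rough sketch of that progressive-refinement idea (not Zoomdata's actual implementation, just the general pattern), here's what splitting one aggregate into per-partition micro-queries and letting the running answer come into focus might look like:

```python
import random

# Hypothetical data split into partitions that can each be queried independently.
partitions = [[random.gauss(100, 20) for _ in range(10_000)] for _ in range(20)]

def micro_query(partition):
    """The same aggregate (sum, count) evaluated on a single partition."""
    return sum(partition), len(partition)

running_sum, running_count = 0.0, 0
for i, part in enumerate(partitions, start=1):
    s, c = micro_query(part)
    running_sum += s
    running_count += c
    # The estimate is rough after the first partition and sharpens as more arrive.
    print(f"after {i:2d} of {len(partitions)} partitions: estimated mean = {running_sum / running_count:.2f}")
```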
So my time is just about up. Let's just say that getting started, like with so many of these technologies that we talk about every month, really is all about the data. To decide whether or not you have an application, or should be building an application, using streaming analytics and IoT, start by looking at: do you already have, or can you capture, streaming data from what you're building? In the case of General Electric, they were able to add sensors to the engines, which created more value for them, but also for their users. Or can you create new value from analysis of open data? There are people out there now that are building systems that give you better access to the data that's being made available by government agencies. They're creating new applications, new ways of interpreting that data or combining it. So if you look at population data, which is open to everybody, and then you look at cell phone data, and you can add your own sensors in there, you can create something quite new. So my last slide on this is: the data's out there. The issue is finding the right data for an application that you can create that's going to add that value. And this just happens to be today's snapshot of my neighborhood. If you knew where I lived, you could find me on the map. This is showing a variety of sensors that are out there right now, from personal weather stations to actual commercial data that's being produced. And I think that the timing is perfect for getting started with streaming analytics and the IoT today. So I'm going to turn it back over to Shannon. We have a number of related webinars coming up over the next few months. Just a note here: I get a number of invitations to connect on LinkedIn. If you found me through one of the webinars, just send me a note and I'll be sure to accept, because I get a lot of them that are a little strange and I generally ignore those. But if you took the time to listen, I would certainly love to connect with you. If you have questions, here's how to reach me. Shannon, back to you. Thank you, Adrienne. Thank you so much for this fantastic presentation. And just a reminder to all the attendees that I will be sending a follow-up email by end of day Monday for this webinar with links to the slides, links to the recording, and anything else requested throughout. If you have any questions, feel free to push them into the Q&A in the bottom right-hand corner of your screen. And Adrienne, it's so funny that you mentioned it – it certainly is not a surprise to me, anyway, that Cisco and AT&T and the telecom companies are out there pushing the standards. I used to work – I was a telecom analyst for a call center, and we analyzed over 1,000 KPIs religiously, most of which came from the PBX, you know. Yeah. There's a lot of data that flows through our telecom systems. And it's so exciting, too, to see this sort of thing not only improve our business – I certainly live on the real-time analytics of the health of our website and what people are reading on a daily basis, so I can react and provide more content on things that people are truly interested in – but it's so nice that it's carrying over into our personal lives, and I can walk into my house and say, you know, turn on the kitchen light. Yeah.
Anything you want to add as to where you see it going? There are so many possibilities. What's the most – you know, it's mind-boggling, the possibilities of what this technology can bring to the world, especially, as you've mentioned, connected with AI and the other emerging technologies out there. Yeah. You know, I started really looking at this in terms of things being connected in 1999, and I was giving examples then that sounded pretty far-fetched – again, this was at NYU. Everybody at that point was just really getting into buying all their books on Amazon, and Amazon ran an experiment where they would let you see attributes of other people who were buying similar books and doing recommendations. But they also did this thing – I forget what it was called – where you could see for a particular domain what people were buying from that domain. You could go on and say, what are people with IBM emails buying in terms of books? Now, obviously somebody from IBM probably has two or three other email addresses, but you could get some insights there. And what we talked about is what are the implications, and what would you be willing to give up in terms of your privacy if you could monetize it? I think, you know, even back then – and we're talking about 17 years now – the idea of putting devices on the net was there, and it was pretty clear that that's where the world was going to go. One thing that I didn't see at all – I thought of it as something that would be good for monitoring health, and I talked about how my smart refrigerator could talk to my key chain, which could talk to grocery stores I passed by, and it would give me a special offer for ice cream. But the downside of that is that it would also tell my doctor that I was having it, you know. What I see right now is that the price of getting a device on the Web has gone down so much that, you know, it's not just your expensive refrigerator, it's not just your laptop – it's so many tiny things. Your telephone is obviously on there, but now you have your health-related apps and maybe a device, a Fitbit or something like that. What I think is going to be interesting is what the next generation will just assume is online, because I see that with my own kids, you know. Having something that doesn't talk to anything else is just so much less interesting, you know. I agree. Well, we do have a couple of questions that have come in here. We don't have much time left, but I do want to just quickly throw them out there. Do you know if AT&T is using any particular vendor for streaming, or are they building in-house solutions? AT&T, that's a good question. I don't know offhand what they are saying publicly. Let me see what I can do to answer that. I'll just have to say, no, I can't answer that one right now. That's okay. The other question coming in: these devices can generate gigabytes of information within minutes, but how much of that mountain of data is really valuable, and how much of the data will go into the cloud versus a local gateway? You were kind of talking about this earlier in terms of whether or not to sample and everything else, because there is so much data. Right. Yeah, I never thought about it – there's probably a good joke in there between jet engines and being in the cloud, because all that data is actually created while you're above the clouds. But some of it needs to be sampled immediately. There are certain things that you're looking for. Anomalies need to be reported very quickly.
I think that for most of these things – whatever that number was, 150,000 data points or sensors – there's a reason for collecting it. You may find that there's data that's collected that's never used, but a lot of it is going to be stored for a period of time, because later analysis is going to find patterns that you weren't looking for before, and then you can go back and see where it fits. In some industries, like pharmaceuticals, I know that the actual researchers' notebooks have to be kept for a very long period of time after something is approved, to be able to go back and look at it. I think that for mechanical devices like this, we're going to come up with at least de facto standards on how long you keep that data. So whether or not it's used immediately, I think everything that's being recorded is being recorded with the idea that it could be useful as conditions change. I don't know if that answers the question. The other part of the question was how much will go into the cloud. More and more of these things are going into the cloud, and the three examples that I gave for analytics as a service are all cloud-based. The IBM, Amazon, and Microsoft approaches are cloud-based, so everything that you do is basically taking your data into their cloud environment and doing the analysis on it. So whatever the percentage is today, I think it's just going to continue to increase. All righty, well, that is all the time that we have for today. Adrienne, thank you so much, as always. Just another great presentation. I really look forward to these each month, and thanks to our attendees for participating in everything that we do. We just love all the questions that come in. And like Adrienne said, we'll see you next month – the second Thursday of each month is when we have the series going. And I hope everyone has a great day. Thank you. Cheers.