Hello and welcome. My name is Shannon Kemp and I'm the Executive Editor of DATAVERSITY. Excuse me, Chief Digital Manager; I've got to update my title. We'd like to thank you for joining the current installment of the monthly DATAVERSITY Smart Data Webinar Series with Adrian Bowles. Today, Adrian will discuss streaming analytics for Internet of Things-oriented applications. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A in the bottom right-hand corner of your screen. Or, if you like to tweet, we encourage you to share highlights and questions via Twitter using the hashtag #SmartData. If you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the top right-hand corner for that feature. And as always, we will send a follow-up email within two business days containing links to the slides, to the recording of the session, and to any additional information requested throughout the webinar. Now let me introduce our series speaker for today, Adrian Bowles. Adrian is an industry analyst and recovering academic providing research and advisory services for buyers, sellers, and investors in emerging technology markets. His coverage areas include cognitive computing, big data analytics, the Internet of Things, and cloud computing. Adrian co-authored Cognitive Computing and Big Data Analytics, published by Wiley in 2015, and is currently writing a book on the business and societal impact of these emerging technologies. Adrian earned his BA in Psychology and MS in Computer Science from SUNY Binghamton and his PhD in Computer Science from Northwestern University. And with that, I will give the floor to Adrian to get today's webinar started. Hello and welcome. Thank you, Shannon, and thanks to everybody out there.
Sometimes when I start these, I feel like I'm talking about my kids. You know, it's like, oh yes, you're all my favorite. But this topic certainly is a favorite for me: looking at the future as we move to a world of streaming analytics, and in particular at the opportunities that the IoT is going to bring us. So to kick it off, I'd like you to imagine for a minute a world where everything and everyone is connected. Augmented and instrumented reality is reality. Let's talk now about how close we are to that world, how we'll get there, and how you should prepare now to thrive in it. Spoiler alert: we're almost there. So today, what I'm going to do is present some context: what the IoT does for us in terms of new data, new demands, and new opportunities. Then I'm going to outline a view of what streaming analytics is all about, the types of technologies involved, and the types of questions you should be asking. We're going to look at some of the open source projects and libraries that are supporting streaming analytics and IoT development. And then I'll briefly close with a look at what some of the interesting vendors are doing in this space. So, starting out: when everything is connected, what happens? And when I say everything is connected, I mean that every device, basically everything that's manufactured, is producing some data about itself, or about its condition, or about the environment it lives in. So there are lots of these. I don't like to quote numbers about how many billion devices there are, but we're at the point now where devices outnumber people, just like certain insects outnumber people. And that's only going to continue, particularly as the underdeveloped areas of the world develop. We tend to have more devices than we have people.
And I think that as the cost of building sensors to report on conditions drops, and it's very low right now, the speed at which we instrument devices is going to increase. So we need to challenge some of our assumptions about what is worth instrumenting, or turning into a device that reports on itself, and what isn't. This slide is one that I've actually been using for a few years now; it's from 2014. It's a view from a website called Thingful, which was in beta in 2014, an attempt to create a search engine for IoT devices. The reason I use it is that I first discovered it when I was giving a talk at the Seaport Hotel and found that, in close to real time, I could look at this site and discover how many empty spaces there were in the bike rack at the hotel. You could zoom in and look at each of these dots; the different colors represent different types of devices. But the point here is that they're all over, and they're of different types. Some of them are for commercial purposes, like the one at the Seaport. Some of them are for safety purposes, like some of the sensors in the water, which tend to represent things like a buoy or a signal. And the challenge is to figure out what value we can get from the data they're producing. If you're in the business of creating the data, that's a different way of looking at the world. But today we're going to look at it this way: all this data is out there, and more is coming. How do we build an architecture that supports it? How do we evaluate the data and then create value? Sorry, two different uses of the same root word there. When I say the value of the data, I'm talking about both the economic value and the actual data values themselves. So, and I'll go fairly quickly through the first few slides here, I just want to set the context in terms of the scope.
And this one, again, is from a couple of years ago. A locomotive that GE was featuring creates 150,000 data points per minute, and these locomotives are expected to last 20 years. So if you think about what that means over the life of the locomotive, the amount of data created is just a vast sum. I was thinking about it last night in terms of just my own week. I started the week in Connecticut and drove to an airport in New York. There's data being generated at that point by my phone and by my car; there are sensors in the car. I go through the TSA, and there's additional data being generated. Then I get on the plane. One of the examples everybody talks about is how much data is actually being produced by a jet engine today. It's been reported that Delta Airlines typically processes about five million business events per day. As an inside joke with Shannon, I would say it's a little more when it's raining in Atlanta. But on a typical day, five million business events. A Pratt & Whitney jet engine has 5,000 sensors, producing 10 gigabytes of data per second per engine. The engine pictured here happens to be a Pratt & Whitney engine from an SR-71 Blackbird. That one predates all this instrumentation, or actually maybe it doesn't; we just know what they had a few years ago. And Formula One cars are now producing about 1.2 gigabytes of data per second. So these are all producing data that we can use for everything from forecasting fuel usage to maintenance requirements. But we would also like to be able to predict the future: things like when the engine is going to fail, or a part is going to fail, or a part is going to need to be replaced. We've got all of this going on around us, and the question is, how do we take advantage of it? So one way to do this is a simple two-tier IoT architecture, where the dots on the left represent sensors.
And if you think about that bike rack for a minute, that's a pretty simple sensor. I forget how many spaces it had; say it had 12 spaces. Each of those has a sensor in it that tells you whether it's being used or it's open, and now they can report, and they can report to the world. Generally, you're going to want some system, some compute engine, which may be something as small as a Raspberry Pi, or it may be a network, a data center, or access to an application in the cloud. That's fine if you have a couple of sensors. But as you start to build up and scale up, it quickly gets out of hand. Think about the jet engine example: you've got hundreds of things in the engine that could fail, or that could give you information about activities you should be undertaking. As the number of sensors grows, this just becomes unwieldy. There's also the issue that sensors can be in motion, as would be the case with a jet engine, or my car as I'm traveling, or sensors on a cruise ship, for example. A lot of the time, they're not going to be in constant contact with a stationary and reliable compute engine, if you will. So this is not something that scales. What you want to do, and what people are doing today, is move to a three-tier IoT architecture, where you still have the sensors and an external compute engine, but between them, aggregating the data from the different sensors or devices, are gateways, or intelligent gateways. One of the companies doing interesting work on this is Red Hat; I did a few interviews with them recently that are on YouTube. Red Hat is building some interesting intelligent gateways, that buffer between the devices and the compute.
But once you start to do that, what you're doing is creating a higher-bandwidth, but fewer-in-number, set of connections between the devices and where the computation is being done. That simplifies things, but it also gives you new opportunities. We can start to move the analytics, the evaluation of the data itself, out of the central compute engine and create these smaller compute engines, if you will, at the device or aggregation level. So say we're building an automobile system, and the pink color here represents information about the engine, the purple maybe about the four tires. Instead of a system where each tire, including the spare, reports individually all the way back to the dealer or the manufacturer, that data gets aggregated. So you could actually cut those dotted lines and say, okay, maybe we need to push intelligence and computation out closer to the edge, which is what we show here. And then maybe we can do some things that require heavy lifting. Say we're going to build a machine learning model to do predictive maintenance on an engine. We want a lot of data, and we need to train the model, so we train it in a data center; that was the topic of last month's webinar. But then you can compress and deploy that deep learning model on an intelligent gateway. So now we're moving the intelligence closer to the edge. It means we can cut the cord, so to speak, and react to things as they happen, close to the source of the action. And that has a number of benefits. It means we're not as dependent on connectivity. In the case of the locomotive, for example, we may choose to send some information back to the manufacturer, or to the leaseholder of a leased engine, in real time as things are happening.
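To make the intelligent-gateway idea concrete, here is a small hypothetical sketch (not any vendor's actual API; the class, thresholds, and window size are all invented for illustration): the gateway reacts locally to alarms and forwards only compact summaries upstream, rather than streaming every raw reading to the data center.

```python
from statistics import mean

# Hypothetical sketch of an intelligent gateway: react to alarms locally
# and forward compact summaries upstream instead of every raw reading.
class EdgeGateway:
    def __init__(self, window_size=5, alarm_threshold=100.0):
        self.window_size = window_size
        self.alarm_threshold = alarm_threshold
        self.buffer = []
        self.upstream = []  # stands in for a network connection to the cloud

    def ingest(self, reading):
        # Immediate local reaction: alarms don't wait for the cloud.
        if reading > self.alarm_threshold:
            self.upstream.append(("alarm", reading))
        self.buffer.append(reading)
        # Periodically forward a summary, not the raw stream.
        if len(self.buffer) >= self.window_size:
            self.upstream.append(("summary", mean(self.buffer)))
            self.buffer.clear()

gw = EdgeGateway(window_size=3, alarm_threshold=100.0)
for r in [20.0, 30.0, 40.0, 120.0, 50.0, 40.0]:
    gw.ingest(r)
print(gw.upstream)
# → [('alarm', ...) for the 120.0 spike, plus two window summaries]
```

Six readings upstream become three messages here; at jet-engine data rates, that kind of reduction is the whole point of the gateway tier.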
Some things we may just batch up at the end of every run, or at the end of every cruise on a ship, and then offload the information. So now we've distributed the intelligence, the analytics, and the computation. We have some computation going on closer to the edge, at the devices and sensors themselves, and then the things that don't require a real-time response get pushed into the cloud, or the cluster, or the network, or whatever we're using. Now, what happens when we do this? Going back to the picture for a second: different companies manufacture devices for different functions. If you think about the automotive world, for example, there are several companies manufacturing not just cars but components of cars. There's a market there: you'll have half a dozen companies manufacturing tires, different companies manufacturing spark plugs, and one car may have components from several different manufacturers. What you want is for this intelligent gateway to gather information from the different devices in a standardized way, so that you can plug and play and you're not dependent on any one manufacturer. And that's where things like the Industrial Internet Consortium come in, where companies like Cisco, GE, Intel, and IBM, some of the founding members, and now there are lots and lots of member companies, are working on standards for communication between these devices. That's really what enables us to build the type of gateway I was talking about. So, looking at where this is going, we now have different companies providing platforms for this communication; I'll just name some of the bigger ones that you would know about.
But as we start to do that, think of these as the infrastructure companies, the companies building the roads, if you will, that we might travel on with all this information. We need to get from the platform level to building applications, because that's really where, well, I don't want to say where the interesting stuff comes from, because if you're working with a platform company, that's pretty interesting too. So now I want to look at what is going to be computed and communicated from these various devices. I do a number of webinars on analytics, so if you have questions about this part of it afterwards, or any of it, obviously, just get in touch. When we talk about streaming analytics, let's have a common definition first. When we talk about analytics, there's generally a hierarchy. Descriptive analytics describes known data values. Everything is known: if I have test scores for a particular class at a high school, I know what the values are. Descriptive analytics might be the mean, the median, or the mode of those scores. We can sort the students and rank them. There's no uncertainty there; the values are there, they're easily computed, and they simply describe what has already happened. Higher up on the stack we have predictive analytics, which is where most of our efforts are today. Predictive analytics uses statistics and a model of the world, including your assumptions about the data: whether it's uniformly distributed, what sorts of rules it follows. Predictive analytics, which gets us into machine learning, is where we use algorithms to look at the known values and compute or estimate, with some level of confidence, the missing values. So if we have information about 50 classes of students, and we have another school with comparable values for other attributes, then we may predict how its students are going to score on a particular test.
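The test-scores example above can be shown in a few lines. This is just an illustration with made-up scores, using Python's standard library; everything is computed from known values, which is what makes it descriptive rather than predictive.

```python
import statistics

# Descriptive analytics over known values: no model, no uncertainty.
scores = [72, 85, 85, 90, 61, 78, 95]

print(statistics.mean(scores))       # the average score
print(statistics.median(scores))     # the middle score
print(statistics.mode(scores))       # the most common score
print(sorted(scores, reverse=True))  # a simple ranking of the class
```

Prediction starts only when we use these summaries, plus assumptions about how the data is distributed, to estimate values we have not observed.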
So it's either filling in something that has already happened, or predicting what people are going to do next, which is where we get things like recommendation engines. Prescriptive analytics is not as common; that's where we have statistical models that tell us not only the prediction but the next best action. It's really a combination of predictive analytics, if you will, and operations research, telling you what to do next. So that's the analytics part of streaming analytics. On the data side, the streaming part, I want to build a similar hierarchy. When data is in a database, for example, it's not going anywhere. Once we put it in, we can change values, but if we take a snapshot at any given point in time, the data is there; it's static. A little higher up is what I refer to as "stop and frisk": data is going from one place to another, and we divert it so that we can analyze it. Think of doing water samples in an environmental study: you go out, you take water out of the flow, you analyze a sample, and then perhaps you put it back, maybe you don't. Basically, you're changing the flow of the data so that you can analyze it. And at the highest level, we have data that is actually in motion, and what we want to do is analyze it while it's moving, without changing it. That's where we get into the whole idea of streaming analytics. So the problem is, as we've known for a couple of thousand years, you can't step twice into the same river: it may look the same, but you don't know the actual values that are in there. This adorable little kid is in the water, and if he goes in tomorrow, it's going to look a lot different, partly because he's 17 now, but also because there are different things in the water, even though a lot is the same. You need to be able to take a snapshot and ask: what is it that's important, that we want to measure?
So we need to ask, when the data itself is moving: is it something we can divert? Can we pull it aside? Can we evaluate it without changing the flow? For those of you who think in terms of physics, we're talking Heisenberg here: by evaluating something, are we going to change it? With data, that's often the case. We have to divert it, or delay it, or sample it. And sampling is valid under some conditions, but under others it creates more problems. So let's take a quick look at that. With streaming analytics, and I'm going to take prescriptive off the table for the moment, we're doing descriptive and predictive analytics on data that's moving. So when I say streaming analytics, I'm talking about statistical analysis of data in motion. The main issue is: are we going to move the process to the data, or move the data to the process? If we move the data out of the stream, if you will, we can either sample it, or make sure that everything goes through a place where it can be monitored. Think of water going through a canal with locks: we can stop it, look at it, and then let it go on. But that changes the nature of the data; it changes the value of the data. The conventional architecture for data analysis is that data flows on the edges of this graph and the queries happen at the vertices, so there's a stopping point where something is held and analyzed. Think of documentaries where birds are taken out of the population: they do some measurements, they put a band on the bird, and they put it back. That has changed things; the results are not what they would have been if the data hadn't been moved. Moving to a real streaming architecture is different, and we're going to see some examples of that.
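One way to make "observe the stream without stopping it" concrete, as a sketch rather than anything from a particular product, is an online algorithm such as Welford's method: it updates the mean and variance one value at a time, touching each value exactly once and storing none of them, so nothing has to be diverted or held back.

```python
# Welford's online algorithm: running mean and variance over a stream,
# updated incrementally, with no buffering of past values.
class RunningStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def observe(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; defined once we have at least two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.observe(x)
print(stats.mean, stats.variance)
```

This is the "move the process to the data" choice in miniature: the computation travels along with the flow instead of pulling data out of it.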
What happens is that at any point where we're going to check the data, we're observing without deflecting or diverting, and that requires different technologies. The good news, of course, is that they're out there today; there's some interesting stuff going on, and I want to talk about how we're going to take advantage of it. Just a quick note on sampling. Suppose the data you're looking at is an analog sine wave. Depending on how often and when you sample it, if you're not looking at every point along that curve as it goes up and down, you can be misled. Say the signal is at 440 Hz rather than 880 Hz. If I inadvertently sample it only once per period, then when I go to plot it, it's going to look like a constant value, even though the signal itself is going up and down, and sampled another way the two frequencies could be confused with each other. But if I sample more frequently, I can get a more accurate representation of the analog signal. So when we're doing streaming analytics, and the signal has variance in amplitude, for example, or in frequency, then it becomes critical that the model we have for how the data is created is accurate and that we know how to sample it. For the rest of the webinar, we're going to focus on cases where we're not sampling, where we're really going in and trying to look at every data point. So I'm going to ask you, again, to consider this world where you as an individual are producing data, some of it streaming, some of it more static, and there are things around you creating data too, some of it streaming, and some of it right in your physical space, if you will.
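The aliasing point is easy to demonstrate. Here is a small illustrative sketch (the frequencies and sample counts are just chosen for the demo): sample a 440 Hz sine exactly once per period and every sample lands at the same phase, so the plot would look flat; sample it eight times per period and the oscillation reappears.

```python
import math

# Aliasing demo: undersampling a sine wave hides its oscillation.
freq = 440.0  # signal frequency in Hz

def sample(rate_hz, count):
    # Evaluate the sine at 'count' instants spaced 1/rate_hz apart.
    return [math.sin(2 * math.pi * freq * (n / rate_hz)) for n in range(count)]

undersampled = sample(440.0, 8)      # one sample per period of the signal
oversampled = sample(8 * 440.0, 8)   # eight samples per period

print(undersampled)  # every value is (numerically) the same
print(oversampled)   # the up-and-down shape is visible
```

The same trap exists for any streamed measurement with periodic structure: if your sampling model is wrong, the statistics you compute describe the sampling, not the signal.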
And I think the way to think about the future, the stuff that's really interesting and exciting and challenging from a technical standpoint, is when you have an individual, data about them, and data about their immediate environment, not just what I'm calling environmental data. Just as an example: I'm actually in San Francisco today, and this morning I took the train up from San Jose, further south. A typical smartphone, my iPhone, has a lot of sensors. As I was looking at the phone, I saw my location, and I could see it moving. So I was creating data with my device, and it could have been monitored. I happened to check in on my son to make sure he'd made it to school on time, 2,500 miles away. Because I could see the data from the sensor on his phone, my guess at that point was that he was at school, or at least that he'd been smart enough to send his phone to school. But now you start to look around and say: okay, as I'm traveling on this train, there are things around me creating data. By knowing about my data and that data, there are new opportunities to create value for me by being aware of what I'm doing. So let's take a look at the types of infrastructure we need to build systems that can leverage all that data, and then look at who's doing what in the space. This is one area where, I don't think today you can name any enterprise-grade system, or any infrastructure on which you would build such an application, that isn't heavily dependent on open source software. You could certainly make the argument that you can hardly build anything today without using some open source in your infrastructure stack. But here there's just no way around it, because to build the kinds of systems we're talking about, collecting analytics on streaming data, as soon as it starts to scale up, we need systems that can collect the data, aggregate data from different sources, and then do the analysis.
And so I'm not going to read through all of these; I just wanted to give a list as a good starting point. These are a number of projects from the Apache Software Foundation: Apache Apex, Flink, Samza, Spark, Storm. All of these are projects that have found a following and that create software at the infrastructure level, if you will, enabling you to build systems on top. I'll just take a couple. Apex is a unified stream and batch processing engine. Samza is a distributed stream processing framework, and it builds on multiple other pieces of open source software; in this case, Samza uses Kafka and Hadoop. In the interest of time, I'll just say that if you're dealing with storage and processing engines like Hadoop or Spark, this is where you're going to put your data so that you can analyze really large quantities of fast-moving data. One of the recommendations at the end, and I'll foreshadow it here, is that you really need to start looking at what's available and which of these projects are getting the most support. These are all fairly well established. Going back to 1999 or 2000, less than a couple of decades ago, when Linux was just starting to be commercialized and Red Hat did their IPO in 1999, there was some resistance to open source. Today I think we're past that, and most people would look at this and say: okay, if something has made it through the Apache Software Foundation and is now a top-level project, as most of these are, then it's stable enough, and has enough of a community behind it, that we can start to look at products developed on top of it. It is a pretty rigorous process to make it to that level.
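To give a feel for the core abstraction these engines provide, here is a deliberately tiny, framework-free sketch of a tumbling-window aggregation, the kind of operation that Flink, Spark Streaming, or Apex performs at scale with fault tolerance and distribution. The event data and window size are invented for illustration.

```python
# Tumbling windows: group timestamped events into fixed, non-overlapping
# time buckets and aggregate (here, count) per bucket. This is the basic
# building block of windowed streaming analytics.
def tumbling_window_counts(events, window_seconds):
    counts = {}
    for timestamp, _payload in events:
        # Floor the timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] = counts.get(window_start, 0) + 1
    return counts

events = [(0, "a"), (2, "b"), (9, "c"), (11, "d"), (14, "e"), (21, "f")]
print(tumbling_window_counts(events, 10))
# → {0: 3, 10: 2, 20: 1}
```

What the real engines add on top of this loop is exactly what's hard: out-of-order events, distributed state, and exactly-once delivery, which is why you build on them rather than on a dictionary.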
I mean, like most things, there are a lot more projects started than finished, and many that people contribute to never make it to that level of rigor, if you will, where a group like the Apache Software Foundation will promote them to top-level projects, the way Spark did, or the way Linux became established. But now we're at the point where this is the place to start, I think. If you're interested in building a fully instrumented world, you need a data architecture that can handle the volume you're going to deal with, and one that's going to keep being developed and refined. The way I like to think of it: when we're dealing with these open source projects, it's nice that something may be available for free or at low cost, but the reality is that most people using the results of these projects are getting them with support from a commercial vendor anyway. So it's probably less expensive than a purely proprietary solution, if you could even find one, but the real benefit of building your system on this infrastructure is that you're dealing with something with a higher rate of innovation, because people from different companies are building it and contributing to the code base. So here's a more graphical representation of this. This part of the slide is from the folks at Striim, a company I'll mention again in a little bit, but I like the way they put it together. You look at it and say: okay, we've got open source projects that collect data; on the right-hand side, we've got projects for data delivery, which is where we have Kafka and Cassandra, on one or both sides; and in the middle, we have all these other open source components or projects. "Project," by the way, is the term used for the group or federation, if you will, that actually maintains and expands the software.
And so for each task, there's typically one or more open source project that provides that functionality. So this is kind of the whole ecosystem. There are more, but this is a good representative sample. As you look at these projects, and then at the types of applications you're going to build (using "application" to mean any kind of package of functionality), you can see that for most of these projects there are now companies that have developed expertise around them. In many cases, the original team that started working on a project, which then got turned over to the open source community, goes off and commercializes it. So you have a number of companies, or sometimes university groups, focused on any one of these. Now a market starts to develop, and that's what really builds this up and, frankly, takes a lot of the risk out. With open source, you can't just say: okay, I'm buying this from XYZ company that's been around for 10 or 50 or 100 years, and we know that if something doesn't work, we can bring the lawyers in and get it fixed. You can't do that with a loose federation that isn't a commercial entity. But the fact that so many people depend on it means that by the time it gets to this level of maturity, it's okay; now we can start to build. So let's look at the types of companies and products that are now building on this infrastructure. One of the things about doing analytics is that you have to have that compute engine I mentioned. And today, one of the big differences between the way we build systems now and 10 or 20 years ago is that most approaches to this type of application, streaming analytics, are going to run at some point in a cloud environment or a distributed environment. So you want to look at who is providing platforms for that.
And obviously, if we're dealing with cloud services, you're going to be dealing with Microsoft, or AWS, or Google, or IBM with Bluemix. It's interesting that from a commercial standpoint, if we take just that group, Microsoft, Amazon, IBM, Google, and I guess Oracle, each of these companies provides a platform you can build on. They all use, or permit, if you will, some combination of those open source projects. You can go to IBM Bluemix and get access to different databases and different data sources, or go to Microsoft Azure or to Amazon and do something similar. But some of them are also developing their own services, and a big part of the economic model here is streaming analytics as a service. You have to have a platform, so you get access to the platform in an on-demand fashion, but then you need the analytics software on top. Most of the companies providing cloud services provide their own, but also give access to any number of commercial vendors building on those platforms. And that gets us to a couple of slides looking at some of them. This is Azure with their IoT suite; with the Azure suite, you've also got Microsoft machine learning and a number of analytics tools. Something similar here: this is AWS, Amazon Web Services, which has a big data analytics framework. Take a quick look and you can see, for example, open source projects that they are building on, which you can access at different levels. With Amazon, less so than with Microsoft, you get access to their own application-level services. They're more of a, I don't want to say more open, because certainly with AWS or Bluemix you can build whatever you want on top, but with Microsoft and IBM you have more choices at the application level too. So, one more here: big data analytics on Bluemix. It's the same idea.
You can have access at a high level of abstraction, where basically you're dealing with APIs, plugging and playing to figure out what you're going to do and what sort of analytics you need, and then the application you're building is one you compose from components available on one of these platforms. That's the last one I'm going to do on that. The key point of the last few slides is that those companies provide both the platform and some of their own applications on top, as well as access to the platform to build on with solutions from vendors like Informatica or TIBCO or SAP or SAS. The idea is that even though these companies all build proprietary commercial software with some of the stream processing functionality I've been talking about, you're buying that software and running it, composing with it if you want, on one of the other platforms. But what I wanted to do today, and we're going to have some time for questions, is look at another view, if you will, of this whole landscape. Besides the actual platform providers, the as-a-service providers that will take you right down to the bare metal in some cases, and besides the big companies that are moving to a SaaS model with analytics software that in the past was largely proprietary, there are a number of emerging firms focused on the functions of streaming analytics. They're doing some really interesting work, and I think they're worthy of consideration. So I'm going to go fairly quickly through four of the ones we see as quite interesting, and I put the comment at the bottom, "integrate and analyze," because that's where these tools have had a lot of value.
You want to be able to take data from a variety of sources. If you go back about 20 minutes in the webinar to the automobile example, that was almost grotesquely simplified: you had a system for tires, a system for the engine, and maybe a system for fuel monitoring, something like that. In larger applications, say you're trying to do business planning. You've got historical data about your customers, and that data is static, it's not going anywhere; it tells you what your customers have done in the past. Now you also have sensor-based data, so you know that a customer is in the store. You want to be able to integrate the historical view with the current view and predict the next behavior in order to make an offer at the right time. And in the diagram I had, where there's a circle around the individual, you've got information that's streaming, you've got static information, and then you've got information about what you have, perhaps your inventory, you know what your offers are, and maybe you'll take information from an outside source, like weather data. Say I'm building a system and I'm a soft drink manufacturer. I know that the person walking by the machine right now has a preference, and we're taking a really low-value example here, if you will, for one drink over another, and I know that the proclivity to buy that drink goes up as the temperature goes up. Now, if I know it's going to get hotter in the next two hours and I'm not going to be able to refill the machine in the next two hours, maybe I won't make a special offer; maybe I'll raise the price as the temperature goes up. But to be able to do something like that, I need to be able to integrate data from all these different sources.
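To make that vending-machine scenario concrete, here is a toy sketch of the decision logic it implies. All the field names, thresholds, and the simple linear demand model are hypothetical illustrations, not anything a vendor actually ships; the point is just how one decision combines a static customer profile, current inventory, and a streaming weather forecast.

```python
# Toy decision: combine static profile, live inventory, and forecast.
def offer_decision(profile, stock, forecast_temp_c, hours_to_refill):
    """Decide whether to discount, hold, or raise the price of the
    customer's preferred drink. All inputs are illustrative."""
    drink = profile["preferred_drink"]
    units_left = stock.get(drink, 0)
    # Proclivity to buy rises with temperature (simplified linear model).
    demand = profile["base_demand"] * (1 + 0.05 * max(0, forecast_temp_c - 20))
    if units_left == 0:
        return "no_offer"        # nothing to sell
    if demand > units_left and hours_to_refill > 2:
        return "raise_price"     # scarce stock, hot hours ahead, no refill
    if demand < units_left * 0.5:
        return "discount"        # plenty of stock, tepid demand
    return "hold"

profile = {"preferred_drink": "cola", "base_demand": 3.0}
stock = {"cola": 2}
# Hot afternoon, no refill for 4 hours: expected demand outstrips stock.
print(offer_decision(profile, stock, forecast_temp_c=32, hours_to_refill=4))
# -> raise_price
```

The interesting part is not the pricing rule itself but that every branch needs a different data source: the profile is historical, the stock and refill schedule come from the supply chain, and the temperature is an external stream.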
So in this case it's about the individual, it's about the stock in the machine, it's information about my supply chain and how long restocking will take, and it's data about the weather and the weather forecast. All four of these are interesting companies in the way they think about integrating and analyzing. DataTorrent is the first of the four, and for each I'm just going to do one slide; I'll have resources at the end so you can take a look and get more details. The folks involved in DataTorrent are the people behind the original development of Apache Apex, one of the infrastructure systems that's been promoted to a top-level project by the Apache Software Foundation. Here is a representation showing the kinds of data that can be aggregated and which open source systems are involved, in the second oval, if you will. We've got Hadoop at the top, using Amazon Web Services as one of the ways to get to it. In the center is the actual DataTorrent system, as they say, powered by Apex, which is their development. It also shows what it runs on: yes, it interacts with other systems, but you can run it on AWS or on Azure, so if you have a preference or a standard in your organization, you have options. And then it shows the features in terms of analytics on data in motion, and visualization. Visualization is something we really don't have time to get into in much detail, in terms of how these tools differentiate on it, but that's obviously a key way of evaluating the different tools. So that's DataTorrent. Next is Striim; I had one of their diagrams earlier. Striim is pronounced "stream," but the double i refers to integration and intelligence. And this is another one doing really interesting work, again built on aggregating and integrating data from a variety of sources.
It's a little difficult to read on the blue background here, but we've got Kafka and Flume and other open source projects feeding into it, with I/O through Hadoop. Hadoop is still a major presence in the market, even though Spark gets all the buzz today, so you can use Hadoop and feed Spark from Hadoop. It interacts with all of these, pulls it all together, and allows you to do the kind of intelligence I was talking about earlier, where we're pushing the intelligence to where it needs to be. And the integration here is with the existing jobs, the ETL, the extract-transform-load jobs. This diagram does a pretty good job of showing how they can pull in data from the legacy applications. So you could do the kind of thing I was just describing, where you're looking at the historical data about the customer, at the real-time application, what's going on with the customer right now, and at another data source, going back to my example with the weather, and that allows you to start doing predictive analytics based on all these different data sources. Next is StreamAnalytix. StreamAnalytix comes from a company called Impetus; it's a derivative business out of Impetus. I hate the term "next generation," but basically this is a very current way of approaching the integration of data from the multiple sources we've talked about, doing the analysis and being able to do predictive analytics based on current and historical data. What's interesting here is that Impetus is more of a services company, and what became StreamAnalytix grew out of their services business: they were getting repeated requests to build systems that did this, so they turned that into a product. Again, it's one of the four I'm mentioning today that I would encourage you to take a look at. And finally, Zoomdata.
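The pattern described above, enriching a real-time stream with historical data from a batch or ETL store plus an external source, can be sketched very simply. The customer records, field names, and weather value below are all invented for illustration; in practice the static side would come from something like Hadoop or a warehouse and the events from something like Kafka.

```python
# Minimal sketch of stream enrichment: join each live event against
# static historical data and an external data source.
historical = {                        # e.g. loaded from an ETL/Hadoop store
    "c42": {"lifetime_purchases": 118, "favorite": "iced tea"},
}
weather = {"forecast_temp_c": 31}     # external source, refreshed separately

def enrich(event):
    """Join one streaming event with static and external data."""
    joined = dict(event)                               # the live fields
    joined.update(historical.get(event["customer_id"], {}))  # history
    joined.update(weather)                             # external context
    return joined

stream = [{"customer_id": "c42", "location": "store_7"}]
for event in stream:
    print(enrich(event))
```

A real system would do this join continuously and at scale, but the shape is the same: the stream supplies "now," the static store supplies "before," and the external feed supplies context, which together is what makes prediction possible.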
What to me was interesting about Zoomdata is that their approach is a model of aggregation and visualization, if you will, that works a bit like a VCR. You can look at the data and integrate it; again, all of these tools will let you pull in data from different sources, including historical, more static data and data acquired in real time as streams, and then start to make some predictive analysis based on that. But what's interesting about their approach is that they break the query over the data up into what they call micro-queries that can run in parallel over very large data sets. And the thing I found very interesting here is that you can very quickly get a view of the answer that's perhaps more approximate than the ultimate one, but it refines as more micro-queries are processed; they refer to that as data sharpening. So you get something that can be used very quickly, and that early information may change the subsequent queries you make, rather than waiting for a complex query to run to completion. It's not a sampling technology; it's the actual sharpening algorithms they use that I think are interesting and an advance. All right. So, to get started. I say this almost ad nauseam in my head, sorry, but it's always about the data. If you're looking at this now and thinking about the IoT, how do I get started? Well, I've said that you're going to start by looking at open source tools and trying to choose the infrastructure that will let you look at both your historical data and your real-time data to do your analytics. In terms of figuring out which applications you can upgrade to take in streaming data, the idea is to look at where you have that historical data. That's one way to get started over the next 6, 12, 18 months, but there's another part you have to look at beyond that.
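To illustrate the micro-query idea, here is a toy version of progressive refinement. This is not Zoomdata's actual algorithm (theirs runs micro-queries in parallel and is not simple chunking), but it shows the user-visible effect: an early, rough answer that sharpens as more of the data is processed, instead of one long wait for the exact result.

```python
# Toy "data sharpening": answer an aggregate query chunk by chunk,
# emitting a progressively refined estimate after each micro-query.
def sharpened_mean(data, chunk_size):
    """Yield a running estimate of the mean after each chunk."""
    total, count = 0.0, 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]   # one micro-query
        total += sum(chunk)
        count += len(chunk)
        yield total / count                      # early, refining answer

data = list(range(1, 101))                       # true mean is 50.5
for estimate in sharpened_mean(data, chunk_size=25):
    print(estimate)
# -> 13.0, 25.5, 38.0, 50.5 (converges to the exact answer)
```

The first estimate arrives after touching only a quarter of the data, which is exactly what lets an analyst redirect their next query early rather than waiting for the full scan.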
But if you're going to look at a business and ask how you're going to leverage your data assets, consider that what you use as assets a year from now may not even exist today. The sensors may not even exist today. So you need to be looking at how you're going to create new sources of data, new aggregations of data, and frankly new algorithms for creating value from this combination of streaming and static data. And with that, with eight or nine minutes left, I'm going to open it up to questions. I'll just say that I know Shannon will tell you about getting a copy of the slides. I have a number of references for each of the products, and also for each of the open source projects, with links and more information. I'll just say we've got more in the series coming up, and I've got some new content; if you check out our research, you can see that too. So be in touch. Send me an email. And if you want to connect on LinkedIn, I have to say this because every month I get LinkedIn requests with no context: if you want to connect because you've heard one of the webinars, just tell me that, and I'll be happy to connect. All right, Shannon, I turn it back to you. Adrienne, thank you for another great presentation. Feel free to submit your questions in the bottom right-hand corner of the screen in the Q&A section if you have any questions for Adrienne. And just a reminder, as Adrienne mentioned, I'll be sending a follow-up email by the end of day Monday with links to the slides and to the recording of the session. It's very quiet today, Adrienne; I think you explained it all. There we go. No questions coming in, so that's great. And just as Adrienne is showing there, there are additional resources for you that will be in the follow-up email as well. And next month, we hope you all will join us for machine learning case studies. I'm really looking forward to that; that's another hot topic for our audience, I know. So thanks, everyone, for everything.
And Adrienne, thanks for another great presentation. And I hope everyone has a great day. Oh, I forgot I was on mute. Actually, we did have a question come in right at the last second here, so just really quickly, Adrienne: can you comment a bit on the Apache Edgent project? I'm sorry, can you say that again? Can you comment a bit on the Apache Edgent project? I don't know if I'm pronouncing that right. Edgent. Ooh. No, not intelligently in the time that we have. But let me ask: I can't see the questions right now, I'm in a space where, unfortunately, all I can see is my own slides. If the person who asked that is in the process of evaluating it, or wants to know about contributing, by all means, let's put that in the queue, and when you send it to me, I'll write up an answer that goes out to everybody. Yeah, absolutely. They provided a link to it. Okay, great. All righty. That is it; that's all the questions for today. Adrienne, again, thank you so much, and thanks to all of our attendees for joining in on the presentation. I hope everyone has a great day, and we'll see you next month. Great. Take care. Thanks.