Hi, I'm Peter Burris, and welcome once again to Wikibon's weekly research meeting on theCUBE. This week we're going to discuss something that we actually believe is extremely important. And if you listened to the recent press announcements this week from Dell EMC, the industry is increasingly starting to believe it's important too. And that is, how are we going to build systems that are dependent upon what happens at the edge? The past 10 years have been dominated by the cloud. How are we going to build things in the cloud? How are we going to get data to the cloud? How are we going to integrate things in the cloud? Well, all those questions remain very relevant. Increasingly, the technology is becoming available, the systems and the design elements are becoming available, and the expertise is now more easily brought together so that we can start attacking some extremely complex problems at the edge. A great example of that is the popular notion of what's happening with automated driving. That is a clear example of huge design requirements at the edge. Now to understand these issues, we have to be able to generalize certain attributes of the differences in the resources, whether they be hardware or software, but increasingly, especially from a digital business transformation standpoint, the differences in the characteristics of the data. And that's what we're going to talk about this week. How are different types of data, data that's generated at the edge and data that's generated elsewhere, going to inform decisions about the classes of infrastructure that we're going to have to build and support as we move forward with these transformations taking place in the industry? So to kick it off, Neil Raden, I want to turn to you. What are some of those key data differences, and what taxonomically do we regard as what we call primary, secondary and tertiary data? Neil? Well, primary data coming from sensors is a little bit different than anything we've ever seen in terms of doing analytics. Now I know that operational systems do pick up primary data, credit card transactions, something like that. But sensor data, and I mean sensor data, not scanner data, is really designed for analysis. It's not designed for record keeping. And because it's designed for analysis, we have to have a different way of treating it than we do other things. If you think about a data lake, everything that falls into that data lake has come from somewhere else. It's been used for something else. But this data is fresh, and that requires that we treat it really carefully. Now the retention and stewardship of that requires a lot of thought, and I don't think the industry has really thought that through a great deal. But look, sensor data is not new. It's been around for a long time. But what's different now is the volume and the lack of latency in it. But any organization that wants to get involved in it really needs to be thinking about what the business purpose of it is. If you're just going into IoT, as we call it generically, to save a few bucks, you might as well not bother. It really is something that will change your organization. Now, what we do with this data is a real problem, because for the most part these sensors are going to be remote and there are going to be a lot of them. That means they're going to generate a lot of data. So what do we do with it? Do we reduce it at the site? That's been one suggestion.
There's an issue that any model for reduction could conceivably lose data that may be important somewhere down the line. Can the data be reconstituted through metadata or some sort of reverse algorithms? Perhaps. Those are the things we really need to think about. My humble opinion is that the software and the devices need to be a single unit. And for the most part, they need to be designed by vendors, not by individual IT shops. So David Floyer, let's pick up on that. Software and device as a single unit, designed more by vendors who have specific domain expertise, turned into solutions and presented to the business. What do you think? Absolutely. I completely concur with that. The initial attempts at using sensors and connecting to them were very simple things, like, for example, thermostats. And that worked very well. But if you look at it over time, the processing for that has moved into the home, into your Apple TV device or your Alexa or whatever it is. So that's coming down. And now it's getting even closer to the edge. In the future, our proposition is that it will get even closer. And vendors will put together solutions, all types of solutions, that are appropriate to the edge. They'll be taking not just one sensor, but multiple sensors, and collecting that data together. Just like in the autonomous car, for example, where you take the LiDARs and the radars and the cameras, et cetera, they'll be taking that data, analyzing it, and making decisions based on that data at the edge. And vendors are going to play a crucial role in providing these solutions to IT and to OT and to many other parts of the business. And the large value will be in the expertise that they develop in this area. So as a rule of thumb, when I was growing up and learned to drive, I was told, always keep five car lengths between you and whatever's in front of you at whatever speed you're traveling. What you just described, David, is that there will be sensors and there will be processing that takes place in that automated car that isn't using that type of rule of thumb, but knows something about tire temperature and therefore the coefficient of friction on the tires, knows something about the brakes, knows what the stopping power needs to be at that speed and therefore what buffer needs to be between it and whatever else is around it. There's no longer a rule of thumb. This is physics and a deep understanding of what it's going to require to stop that car. And on top of that, what you also want to know, outside of your car, is what type of car is in front of you. Is that an autonomous car, or is that somebody being driven by Peter? In which case you keep 10 lengths. But that's not going to be primary data. Is that what we mean by secondary data? No, that's still primary, because you are going to set up a connection between you and that other car, and that car is going to tell you its state. So it comes to you as primary data. That's right, it comes to you as primary data, but the car in that case is emitting a signal, right? So even though it's going to your car, it's primary data. But one of the things from a design standpoint that's interesting is that the car is now transmitting a digital signal about its state that's relevant to you, so that you can combine it inside your car, effectively in a gateway inside your car. So there's external information that is in fact digital coming in, combining with the sensors about what's happening in your car. Have I got that right?
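To make that physics concrete, here's a minimal sketch of the kind of calculation being described; the function names, friction values, and reaction time are illustrative assumptions, not anything from a real vehicle system:

```python
from typing import Optional

G = 9.81  # gravitational acceleration, m/s^2

def stopping_distance(speed_ms: float, friction_coeff: float) -> float:
    """Braking distance from speed_ms down to zero: v^2 / (2 * mu * g)."""
    return speed_ms ** 2 / (2 * friction_coeff * G)

def required_gap(speed_ms: float,
                 own_friction: float,
                 reaction_time_s: float,
                 lead_stop_distance: Optional[float] = None) -> float:
    """Minimum following gap in meters.

    own_friction is estimated from the car's own sensors (tire temperature,
    road surface). lead_stop_distance is what the car ahead broadcasts about
    its own braking capability; if the car ahead isn't instrumented, assume
    the worst case: it could stop almost instantly.
    """
    reaction_distance = speed_ms * reaction_time_s
    own_stop = stopping_distance(speed_ms, own_friction)
    lead_stop = lead_stop_distance if lead_stop_distance is not None else 0.0
    return max(reaction_distance + own_stop - lead_stop, 0.0)

# At 30 m/s (roughly 108 km/h) on a dry road, behind an instrumented car vs. not:
with_v2v = required_gap(30.0, own_friction=0.8, reaction_time_s=0.1,
                        lead_stop_distance=stopping_distance(30.0, 0.8))
worst_case = required_gap(30.0, own_friction=0.8, reaction_time_s=0.1)
print(f"gap with a broadcast from the car ahead: {with_v2v:.1f} m")
print(f"gap behind an uninstrumented car: {worst_case:.1f} m")
```

The point is the one being made in the discussion: with a digital signal from the car ahead combined with the car's own sensor readings, the buffer becomes a calculation rather than a rule of thumb.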
Absolutely, that to me is a sort of secondary one. Then you've got the tertiary data, which is the big picture about the traffic conditions and the weather and the routes and that sort of thing, which is at that much higher cloud level, yes. So David Vellante, we always have to make sure, as we have these conversations... we've talked a little bit about this data, we've talked a little bit about the classes of work that are going to be performed at the different levels; how do we ensure that we keep the business problem in this conversation? So I mean, I think Wikibon has done some really good work on describing what this sort of data model looks like: from edge devices, where you have primary data, to the gateways, where you're doing aggregation, to the cloud, where maybe the serious modeling occurs. And my assertion would be that the technology to support that elongating and increasingly distributed data model has been maturing for a decade, and the real customer challenge is not just technical. It's really understanding a number of factors, and I'll name some. Where in the distributed data value chain are you going to differentiate? And how does the data that you're capturing in that data pipeline contribute to monetization? What are the data sources, and who has access to that data? How do you trust that data and interpret it and act on it with confidence? There are significant IP ownership and data protection issues. Who owns the data? Is it the device manufacturer? Is it the factory, et cetera? What's the business model that's going to allow you to succeed? What skill sets are required to win? And really importantly, what's the shape of the ecosystem that needs to form to go to market and succeed? These are the things that I think the customers I talk to are really struggling with. Yeah, the one thing I'd add to that, and I want to come back to it, is the question of who is ultimately bonding the solution, because this is going to end up in a court of law. But let's come to this IP issue, George. Let's talk about how local data is going to enter into the flow of analytics, and about that question of who owns the data, because that's important before we get to some of the ramifications and liabilities associated with this. Okay, well, just on the IP protection and the idea that a vendor has to take sort of whole product responsibility for the solution, that vendor is probably going to be dealing with multiple competitors when they're enabling, say, self-driving cars or other edge or smaller devices. The key thing is that a vendor will say, you know, the customer keeps their data and the customer gets the insights from that data, but that data is flowing through an analytic black box in the middle, and the insights come out on the other side, and the data changes that black box as it flows through it. So that is something where, you know, when the vendor provides a whole solution to a Mercedes, that solution will be better when they come around to BMW, and the customers should make sure that whatever benefit BMW gets also flows back to Mercedes. That's on the IP thing.
I want to add one more thing on the tertiary side, which is that when you're close to the edge, it's much more data intensive; that's where we've talked about the reduction in data and the real-time analytics. At the tertiary level, time is a bigger factor and you're essentially running a simulation; it's more compute intensive, and so you're doing optimizations of the model, and those flow back as context to inform, you know, both the gateway and the edge. David Floyer, I want to turn it to you. So we've talked a little bit about the characteristics of the data, and we have a great list from Dave Vellante about some of the business considerations. We will get very quickly, in a second, to some of the liability issues, because that's going to be important, but taking into account what George just said about the tertiary elements, now that we've got all the data laid out, how is that going to map to the classes of devices? We'll then talk a bit about some of the impacts on the industry. What's it going to look like? So if we take the primary edge first, and you take that as a unit, you'll have a number of sensors within that. So just a moment, this is data about the real world that's coming into the system to be processed. To be processed, yes. So it'll have, for example, cameras. If we take a simple example of making sure that bad people don't get into your site, you'll have a camera there which will do facial recognition. They'll have a badge of some sort, so you'll read that badge. You may want to take their weight. You may want to have an infrared sensor on them so that you can tell their exact distance. So a whole set of sensors that the vendor will put together for the job of ensuring you don't get bad guys in there. And what you're ensuring is that bad guys don't get in there. That's obviously one very important goal, and also that you don't go and stop good guys. Good guys going in there, yes. So those are the two characteristics, the false positives and the false negatives. Those are the two things you're trying to design that thing around. At the primary edge. At the primary edge. And there's a mass amount of data going into that, which is then going to be reduced to very, very little data coming up to the next level: this guy came here, these were his characteristics, he didn't look well today, maybe he should see a nurse, or whatever other information you can gather from that will go up to that secondary level. And then that'll also be a record to HR, maybe, about who has arrived and what time they arrived, and to the manufacturing systems about who is there and who has the skills to do a particular job. There are multiple uses of that data, which can then be used for differentiation and for whatever else, from that secondary layer into local systems. And then equally, it can be pushed up to the higher level: how much power we should be generating today, or, we now have 4,000 people in the building, so the air conditioning is going to look like this, those types of things. Or it could be combined with other types of data, like overtime, we're going to need new capacity, or payroll, or whatever else it might be. And each level will have its own type of AI. So you've got AI at the edge, which is there to produce a specific result, then AI to optimize at the secondary level, and then AI to optimize bigger things at the tertiary level.
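To illustrate the shape of that primary-edge reduction, here is a minimal sketch; the sensor fields, thresholds, and event format are illustrative assumptions, not a real device interface:

```python
import json
import time
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class SensorFrame:
    """One fused reading from the set of sensors at the door."""
    face_match_score: float    # 0.0-1.0 from the camera's facial recognition
    badge_id: Optional[str]    # badge read, if any
    weight_kg: float           # from a floor sensor
    ir_distance_m: float       # infrared range to the subject

def decide_entry(frame: SensorFrame,
                 badge_directory: Dict[str, Dict[str, float]]) -> dict:
    """Make the admit/deny decision locally at the primary edge and return
    only a small summary event for the secondary tier."""
    badge_record = badge_directory.get(frame.badge_id or "")
    admit = (
        badge_record is not None
        and frame.face_match_score >= 0.90          # threshold tuned against false positives
        and frame.ir_distance_m <= 2.0              # subject is actually at the door
        and abs(frame.weight_kg - badge_record["weight_kg"]) <= 10.0
    )
    # Megabytes of camera frames are reduced to a record of a few hundred bytes.
    return {
        "ts": time.time(),
        "badge_id": frame.badge_id,
        "admitted": admit,
        "face_match_score": round(frame.face_match_score, 2),
    }

# Illustrative use: the edge unit acts on the decision itself and ships
# only the compact summary event up to the secondary tier.
directory = {"B-1027": {"weight_kg": 82.0}}
event = decide_entry(SensorFrame(0.94, "B-1027", 80.5, 1.2), directory)
print(json.dumps(event))
```

The enrichment David mentions, the note to HR, the skills lookup, the air conditioning forecast, would then be derived at the secondary and tertiary levels from these compact events rather than from the raw camera feed.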
So we're going to talk more about some of the AI next week, but for right now we're talking about classes of devices that are high performance, high bandwidth, cheap, constrained, and proximate to the event. Gateways that are capable of taking that information and starting to synthesize it for the business, for other business types of things. And then tertiary systems, true private cloud for example, although we may have very sizable things at the gateway level as well, that are capable of integrating data in a broader way. What's the impact on the industry? Are we going to see IT firms roll in and control this sweep of, as Neil said, trillions of new devices? Is this all going to be Intel? Is it all going to look like clients and PCs? My strong advice is that the devices themselves will be done by extreme specialists in those areas. They will need very deep technology understanding of the devices themselves, the sensors themselves, and the AI software relevant to that. Those are the people that are going to make money in that area. And you're much better off partnering with those people and letting them solve those problems, and you solve, as Dave said earlier, the ones that can differentiate you within your processes and within your business. So yes, leave that to other people is my strong advice. And from IT's point of view, just don't do it yourself. Well, it sounds like you're suggesting that the gateway is where that boundary is going to be. IT technologies may increasingly go down to the edge, but it's not clear that IT vendor expertise goes down to the edge to the same degree. So Neil, let's come back to you. When we think about this arrangement of data, how the use cases are going to play out and where the vendors are, we still have to address this fundamental challenge that Dave Vellante brought up. Who's going to end up being responsible for this? Now, you've worked in insurance. What does that mean from an overall business standpoint? What kinds of failure rates are we going to accommodate? How is this going to play out? What do you think? Well, I'd like to point out that I worked in insurance 30 years ago. I didn't mean to date you, Neil. Yeah, an old reliable life insurance company. Anyway, one of the things David was just discussing sounded a lot to me like complex event processing. And I'm wondering where the logical location of that needs to be, because you need some prior data to do CEP. You have to have something to compare it against. But if you're pushing it all back to the tertiary level, there's going to be a lot of latency, and the whole idea of CEP was right now. So that I'm a little curious about. But I'm sorry, what was your question? Well, no, let's address that. So, CEP, David? I agree, but I don't want to turn this into a general discussion about CEP; it's got its own set of issues. It's clear that there have got to be complex models created, and those are going to be created in a large environment, almost certainly in a tertiary type environment. And those are going to be created by the vendors of those particular problem-solving solutions at the primary edge. To a large extent, they're going to provide solutions in that area. And they're going to have to update those. And so they are going to have to have lots and lots of test data for themselves. And maybe some companies will provide test data to those vendors, if it's convenient, for a fee or whatever it is.
But the primary model itself is going to be built at the tertiary level, and that's going to be pushed down to the primary level itself. So let me make an assertion here. The way I think about this, Neil, is that the data coming off at the primary level is going to be the sensor data. The sensor said it was good, and that is then recorded as an event: we let somebody in the building. And that's going to be a key feature of what happens at the secondary level. I think a lot of complex event processing is likely to end up at that secondary level. Then the data gets pushed up to the tertiary level, and it becomes part of an overall understanding of the behavior of the business; it's behavioral data. So increasingly, what did we do as a consequence of letting this person in the building? Oh, we tried to stop them. That's going to be more the behavioral data that ends up at the tertiary level. We'll still do complex event processing there. It's going to be interesting to see whether or not we end up with CEP directly in the sensor tier itself. We might under certain circumstances; that's a cost question though. All right, so let me now turn it, in the last few minutes here, Neil, back to you. At the end of the day, we've seen for years the question of how much security is enough security. And businesses said, oh, I want to be 100% secure. And sometimes the CISO said, we've got that; you gave me the money, we've now made you 100% secure. But we know that's not true. The same thing is going to exist here. How much fidelity is enough fidelity down at the edge? How do we ensure that business decisions can be translated into design decisions that lead to an appropriate and optimized overall approach to the way the system operates? Working from the business standpoint back, what types of conversations are going to take place in the boardroom that the rest of the organization is going to have to translate into design decisions? Boy, bad actors are going to be bad actors. I don't think you can do anything to eliminate it. The best you can do is use the best processes and the best techniques to keep it from happening, and hope for the best. I'm sorry, that's all I can really say about it. There's quite a lot of work going on at the moment from ARM in particular. They've got security capabilities built into the devices themselves. So there's a lot of work going on in that very space. And what's obviously interesting from an IT perspective is how you link the different security systems, both from an ARM point of view and then from an x86 point of view as you go further up the chain. How is that going to be controlled and how is it going to be managed? That's going to be a big IT issue. I think the transmission is the weak point. What do you mean by that, Neil? Well, the data has to flow across networks. That would be the easiest place for someone to intercept it and do something nefarious. Right, yeah. So that's purely a security thing; I was trying to use that as an analogy. So at the end of the day, the business is going to have to decide how much data we have to capture off the edge to ensure that we have the kinds of models we want, so that we can realize the specificity of actions and behaviors that we want in our business. That's partly a technology question, partly a cost question. Different sensors are able to operate at different speeds, for example. But ultimately we have to be able to bring that list of decisions and business issues that Dave Vellante raised down to some of the design questions.
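Picking up Neil's complex event processing thread for a moment, here's a minimal sketch of the kind of rule that could run at the secondary level against the compact events coming up from the edge; the event fields and the rule itself are illustrative assumptions:

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque

@dataclass
class EntryEvent:
    ts: float        # seconds since epoch
    badge_id: str
    admitted: bool

class TailgateDetector:
    """Flag a badge admitted more than once within a short window: a simple
    pattern that needs recent history (prior data) but not a round trip to
    the tertiary tier."""

    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s
        self.recent: Deque[EntryEvent] = deque()

    def observe(self, event: EntryEvent) -> bool:
        # Drop events that have aged out of the sliding window.
        while self.recent and event.ts - self.recent[0].ts > self.window_s:
            self.recent.popleft()
        duplicate = any(
            e.badge_id == event.badge_id and e.admitted for e in self.recent
        )
        self.recent.append(event)
        return duplicate and event.admitted

detector = TailgateDetector()
print(detector.observe(EntryEvent(100.0, "B-1027", True)))   # False
print(detector.observe(EntryEvent(130.0, "B-1027", True)))   # True: same badge twice in a minute
```

Latency stays low because the recent history the rule needs lives at the gateway; the tertiary level would see only the resulting alert and the aggregate behavioral record.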
But it's not going to be a matter of throwing a $300 microprocessor at everything. There are going to be very, very concrete decisions that have to take place. So, George, do you agree with that? Yes, two issues though. One, there are the existing devices that can't get re-instrumented, because they already have their software and hardware. There's a legacy in place. Yes. The other is that some of the most advanced research that's been going on, the work that produced much of today's distributed computing and big data infrastructure, like the Berkeley analytics lab and, say, its contribution of Spark and related technologies, is saying we have to throw everything out and start over for secure real-time systems, that you have to build from the hardware all the way up. In other words, you're starting from the sand to rethink something that's secure and real-time; you can't layer it on. So, very quickly, David. That's a great point, George. Building on what George has said, very quickly, the primary responsibility for bonding the behavior or the attributes of these devices is going to be with the vendor. The vendor creating the solution. Correct, that's going to be the primary responsibility. But obviously, from an IT point of view, you need to make sure that that device is doing the job that's important for your business, not too much, not too little, and that you are able to collect the necessary data from it that is going to be of value to you. So that's a question of qualification of the devices themselves. All right, so David Vellante, Neil Raden, David Floyer, George Gilbert, action item round. I want one action item from each of you from this conversation. Keep it quick, keep it short, keep it to the point. David Floyer, what's your action item? So my action item is, don't go into areas that you don't need to. You, and IT in general, do not really need to become experts at the edge itself. Rely on partners, rely on vendors to do that, unless of course you're one of those vendors, in which case you'll need very, very deep knowledge. Or you decide that that's where you differentiate in the value stream, in which case you've just become one of those vendors. Yes, exactly. George Gilbert. I would build on that, and I would say that if you look at the skills required to build these full stack solutions, there's data science, there's application development, there's the analytics, and very few of those solutions are going to find all of those skills in one company. So the go-to-market model for building these, at least at this point in time, is going to have to look to combinations like IBM working with, sort of, supply chain masters. Good. Neil Raden, action item. I think the question is not necessarily one of technology, because that's going to evolve. But I think as an organization, you need to look at it from this end: would employing this create a new business opportunity for us, something we're not already doing? Or, number two, change our operations in some significant way? Or, number three, the old Red Queen thing: we have to do it to keep up with the competition. David Vellante, action item. Okay, well look, at the risk of sounding trite, you've got to start the planning process from the customer on in, and so often people don't. You've got to understand where you're going to add value for customers and construct an external and internal ecosystem that can really juice that value creation. All right, fantastic, guys.
So let me quickly summarize. This week on the Wikibon Friday research meeting on theCUBE we discussed a new way of thinking about data characteristics that will inform system design and the business value that's created. We observed that data is not all the same when we think about these very complex, highly distributed and decentralized systems that we're going to build: there's a difference between primary data, secondary data and tertiary data. Primary data is data that is generated from real world events or measurements and then turned into signals that can be acted upon very proximate to that real world set of conditions. A lot of sensors will be there, a lot of processing will be moved down there, and a lot of actuators and actions will operate without referencing other locations within the cloud. However, we will see circumstances where the events, or the decisions that are taken on those events, are captured in some sort of secondary tier that records something about the characteristics of the actions and events that were taken, summarizes them, and pushes them up to a tertiary tier where that data can then be further integrated with other attributes and elements of the business. The technology to do this is broadly available but not universally successfully applied. We expect to see a lot of new combinations of edge-related devices that work with primary data. That is going to be a combination of currently successful firms in the OT, or operational technology, world, most likely in partnership with a lot of other vendors that have demonstrated significant expertise in understanding the problems, especially the business problems, associated with the fidelity of what happens at the edge. The IT industry is going to push very aggressively and very close to this at that secondary level through gateways and other types of technologies, and even though we'll see IT technology continue to move down to the primary level, it's not clear exactly how far IT vendors will be able to follow it. More likely we'll see the adoption of IT approaches to doing things at the primary level by vendors that have domain expertise in how that level works. We will, however, see significant and interesting true private cloud and public cloud systems emerge at the tertiary level, a whole new set of systems that are going to be very important from an administration and management standpoint, because they have to work within the context of the fidelity of this overall system. The final point we want to make is that these are not technology problems by themselves. There are significant technology problems on the horizon: how we handle this distribution of data, how we manage it appropriately, and how we ultimately provide the appropriate authority at different levels within that distributed fabric to ensure the proper working condition in a way that we can nonetheless recreate if we need to. But these are, at bottom, fundamentally business problems. They're business problems related to who owns the intellectual property that's been created, they're business problems related to the level in that stack at which I want to show my differentiation to my customers, and they're business problems from a liability and legal standpoint as well. The action item is that all firms will, in one form or another, be impacted by the emergence of the edge as a dominant design consideration, for their infrastructure but also for their business.
A taxonomy that looks at three classes of data, primary, secondary and tertiary, will help businesses sort out who's responsible, what partnerships they need to put in place, what technologies they're going to employ, and, very importantly, what overall business exposure they're going to accommodate as they think ultimately about the nature of the processing and the business promises they're making to their marketplace. Once again, this has been the Wikibon Friday research meeting here on theCUBE. I want to thank all the analysts who were here today, but especially thank you for paying attention and working with us. And by all means, let's hear those comments back about how we're doing and what you think about this important question of different classes of data driven by different needs of the edge.