Live from Berlin, Germany, it's theCUBE, covering NetApp Insight 2017, brought to you by NetApp. Welcome back to theCUBE's live coverage of NetApp Insight here in Berlin, Germany. I'm your host, Rebecca Knight, along with my co-host, Peter Burris. We are joined by Mark Bregman. He is the CTO of NetApp. Thanks so much for coming on theCUBE. Thanks for taking the time. So you have recently been looking into your crystal ball to predict the future, and you have some fun, sometimes counter-intuitive predictions about what we're going to be seeing in the next year and the decade to come. Your first prediction: you said data will become self-aware. What do you mean by that? Well, the title's kind of provocative, but really the idea is that data is going to carry with it much more of its metadata. The metadata becomes almost more important than the data in many cases. And we can anticipate architectures in which the data drives the processing, whereas today we always have a pile of data over here, and then we have a process that we execute against the data. That's been our tradition in the computing world for a long, long time. As data becomes more self-aware, the data, as it passes through, will determine what processes get executed on it. Let me give you a simple analogy from a different field, from the past. In the communications world, we used to have circuit-switched systems. There was some central authority that understood the whole network. If you and I wanted to communicate, it would figure out the circuit, set up the circuit, and then we would communicate. That's similar to traditional processing of data: the process knows everything it wants to do, it knows where to find the data, it does that, and it puts the result somewhere else. But in the communications world, we moved to packet-switched data. So now the packet, the data, carries with it the information about what should happen to it.
And I no longer have to know everything about the network. Nobody has to know everything about the network. I pass it to the nearest neighbor, who says, well, I don't know where it's ultimately going, but I know it's going generally in that direction, and eventually it gets there. Now, why is that better? It's very robust, and it's much more scalable. And particularly in a world where the rules might be changing, I don't necessarily have to redo the program; I can change the markup, if you will, the tagging of the data. You can think of different examples. Imagine the data that's sitting in an autonomous vehicle, and there's an accident. Now there are many people who want access to that data: the insurance company, the authorities, the manufacturer. The data has contained within it the knowledge of who can do what with it. So I don't have to have a separate program that determines whether I can use that data or not. The data says, sorry, you're not allowed to see this. This is private data; you can't see this part of it. Maybe the identifying data. Obviously the insurance company needs to know who the car owner is, but maybe they don't need to know something else, like where I came from. The authorities might need both. Well, I came from a bar. So you can imagine that as an example. If you think- But the implications of this metadata are important. For example, if I wanted to develop an application that would be enhanced by having access to data, I had to do programming to get to that data, because some other application controlled that data, and that data was defined contextually by that application. And so everything was handled by the application. By moving the metadata into the data, now I can bring that data to my application more easily, with less overhead. And that's crucial because the value of data accretes; it grows as you combine it in new and interesting ways.
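The accident scenario described above can be sketched in a few lines: the record carries a per-field access policy as metadata, so no external application has to decide who may see what. All field names, roles, and the policy structure here are illustrative assumptions, not anything from NetApp.

```python
# A sketch of "self-aware" data: the record carries its own access policy,
# so the data itself, not a separate program, decides who sees which fields.

crash_record = {
    "fields": {
        "owner": "Jane Doe",
        "speed_kmh": 87,
        "origin": "bar on Main St",
    },
    # Metadata travels with the data: per-field sets of permitted requesters.
    "policy": {
        "owner": {"insurer", "authorities"},
        "speed_kmh": {"insurer", "authorities", "manufacturer"},
        "origin": {"authorities"},
    },
}

def view(record, requester):
    """Return only the fields the record itself says this requester may see."""
    return {
        name: value
        for name, value in record["fields"].items()
        if requester in record["policy"][name]
    }

print(view(crash_record, "insurer"))       # owner and speed, but not origin
print(view(crash_record, "authorities"))   # everything, including the bar
```

The insurer gets the owner's identity but not where the trip started; the authorities get both, exactly the split described in the conversation.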
So by putting the metadata into the data, I can envision a world where it becomes much faster, much more facile, to combine data in new and unique ways. Exactly. It also is easier to move the processing- To the data. To the data, because the processing is no longer a monolithic program; it's some large set of microservices, and the data organizes which ones to execute. So I think we'll see, I mean, this is not a near-term prediction. This is not one for next year, because it requires rethinking how we think about data and processing. But with the emergence of microservices, compositional programming, and metadata traveling together with the data, we'll see more of it. Functional programming. Functional programming. So let me ask you a quick question before we go on to the next one. It's almost like in the late 1970s it was networks of devices, ARPANET, that became the internet. And then the web was networks of pages. And then we moved into networks of application services. Do you foresee a day where it's going to be literally networks of data? Yes, and in fact that's a great example, because if you think about what happened in the evolution of the web through what we call Web 2.0, the pages were static data, and they came alive in Web 2.0. There was much less of a distinction between the data and the program in the web layer. So that's what we're saying: we see that emerging even further. The next prediction was about virtual machines becoming rideshare machines. Well, this is somewhat complementary to the first one. They all kind of fit together. And here the idea is, if we go back to the earlier days of IT, it wasn't that long ago that if you needed something, you ordered the server, you installed it, and you owned it. And then we got to the model of the public cloud, which is like a rental. By the same analogy, in the past, if I wanted a vehicle, I had to buy it.
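Returning to the point above about the data organizing which microservices to execute: a minimal sketch is a record whose metadata names the steps to run, with a registry mapping those names to small functions. The registry, the tags, and the step functions are all hypothetical.

```python
# Sketch: instead of a monolithic program pulling data in, each record's
# metadata names the processing steps ("microservices") to apply to it.

def anonymize(payload):
    # Drop identifying information before further processing.
    return {k: v for k, v in payload.items() if k != "owner"}

def summarize(payload):
    # Reduce the payload to a cheap aggregate.
    return {"n_fields": len(payload)}

REGISTRY = {"anonymize": anonymize, "summarize": summarize}

def process(record):
    """Let the data choose its own pipeline from its metadata."""
    payload = record["payload"]
    for step in record["meta"]["pipeline"]:
        payload = REGISTRY[step](payload)
    return payload

record = {
    "meta": {"pipeline": ["anonymize", "summarize"]},
    "payload": {"owner": "Jane", "speed": 87, "location": "Berlin"},
}
print(process(record))  # {'n_fields': 2}
```

Changing what happens to the data means re-tagging the record, not rewriting the program, which is the point of the packet-switching analogy.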
And then the rental car agencies came along, and I said, well, you know, when I go to Berlin, I'm not going to buy a car for three days. I'll rent a car, but I can choose which car I want. Do I want the BMW or do I want, you know, a Volkswagen? That's very similar to the way the cloud works today. I pick what instances I want, and they meet my needs. If I make the right choice, great. And by the way, I pay for it while I have it, not for the work that's getting done. So if I forget to return that instance, I'm still getting charged. The rideshare is kind of like Uber, and we're starting to see that with things like serverless computing. In that model I say, I want to get this work done, and the infrastructure decides what shows up, in the same way that when I call Uber, I don't get to pick what car shows up. They send me the one that's most convenient for them and me. And I get charged for the work, going from point A to point B, not for the amount of time. There's some differentiation in choosing the car; we'll come to that. So that's more like a rideshare. But as you point out, even in the rideshare world, I have some choices. I can choose that I want a large SUV; I might get a BMW SUV, or I might get a Mercedes SUV, and I can't choose which. I can't choose if it's silver or black, but I can get a higher class. And what we're seeing with the cloud, with these kinds of instances, these virtual solutions, is that they are also becoming more specialized. It might be that for a particular workload I want instances that have GPUs in them, or some neural chip, or something else. In much the same way that the rental model would say, go choose the exact one you want, the rideshare model would say, I need to get this work done, and the infrastructure might decide, this is best serviced by five instances with GPUs, or, because of availability and cost, maybe it's 25 instances of standard processors, because you don't care how long it takes.
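The GPU-versus-standard trade-off just described can be sketched as a toy scheduler: the caller states the work and the deadline, and the "infrastructure" picks the cheapest instance mix that fits. All prices, speeds, and availability numbers below are invented for illustration.

```python
import math

# Hypothetical instance catalog: throughput, price, and how many are free.
INSTANCE_TYPES = {
    "gpu":      {"units_per_hour": 50, "cost_per_hour": 5.0, "available": 30},
    "standard": {"units_per_hour": 10, "cost_per_hour": 0.8, "available": 25},
}

def schedule(work_units, deadline_hours):
    """Pick the cheapest instance type and count that meets the deadline."""
    best = None
    for name, spec in INSTANCE_TYPES.items():
        # Instances needed to finish the work within the deadline.
        count = math.ceil(work_units / (spec["units_per_hour"] * deadline_hours))
        if count > spec["available"]:
            continue  # not enough capacity of this type
        cost = count * spec["cost_per_hour"] * deadline_hours
        if best is None or cost < best[2]:
            best = (name, count, cost)
    return best

# Urgent job: only GPUs can meet the deadline.
print(schedule(1000, 1))   # ('gpu', 20, 100.0)
# Relaxed deadline: a handful of cheap standard instances wins.
print(schedule(1000, 24))  # ('standard', 5, 96.0)
```

Like the rideshare, the caller never names the vehicle; the same request yields a different fleet depending on urgency, availability, and cost.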
So it's this compromise, and it's really very analogous to the rideshare model. Now, coming back to the earlier discussion, as the units of work get smaller and smaller and become really microservices, I can imagine the data driving that decision, hailing the cab, hailing the rideshare, and driving what needs to be done. So that's why I see them as somewhat complementary. And so what's the upshot, then, for the employee and for the company? Well, I think there are two things. One is, you've got to make the right decision. If I were to use Uber to commute to Sunnyvale every day, I'd break the bank, and it would be kind of stupid. So for that particular task, I own my vehicle. But if I'm going to go to Tahoe for the weekend and I need an SUV, I'm not going to buy one, and neither am I going to take an Uber. I'm going to rent one, because that's the right vehicle. On the other hand, when I'm going from where I live to the Marina in San Francisco, that's a 15-minute drive, I want it on demand, I take an Uber, and I don't really care. Now, if I have 10 friends, I might pick a big one or a small one, but again, the distinction is there. So I think companies need to understand the implications. And a lot of times, as with many people, they make the wrong initial choice, and then they learn from it. You know, there are people who take Uber everywhere. I had a friend who was commuting to HP every day by Uber from the city, from San Francisco. That just didn't make sense. He kind of knew that, but he wasn't paying for it. The next one is: data will grow faster than the ability to transport it, but that's okay. It doesn't sound okay.
It doesn't sound okay, and for a long time we've worried about that. We've done compression, we've done all kinds of things, and we've built bigger pipes, but we were fundamentally transporting data between data centers, or more recently between the data center and the cloud, in big chunks. What this really talks about is that with the emergence of IoT in a broad sense, telematics, digital health, many different cases, there's going to be more and more data both generated and ultimately stored at the edge. And not all of that will be able to be shipped back to the core. And it's okay not to do that, because there's also processing at the edge. So in an autonomous vehicle, where you may be generating 20 megabytes per hour or more, you're not going to ship all of that back. You're going to store it, you're going to do some local processing, and you're going to send the appropriate summary back. But you're also going to keep it there for a while, because maybe there's an accident, and now I do need all that data. I didn't ship it back from every vehicle, but that one I care about, and now I'm going to bring it back. Or I'm going to do some different processing than I originally thought I would do. So again, the ability to manage this is going to be important, but it's managed in a different way. It means we need to figure out ways to do overall data lifecycle management all the way from the edge, where historically that was a silo we didn't care about, probably all the way through the archive, or through the cloud, where we're doing machine learning, rules generation, and so on. But it also suggests that we're going to need to do a better job of discriminating, or demarcating, the different characteristics of different classes of data.
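The edge pattern described here, store locally, ship a summary upstream, pull the raw window only on demand, can be sketched with a ring buffer. The class and field names are illustrative assumptions, and the telemetry is reduced to a single speed reading per sample.

```python
from collections import deque

class EdgeNode:
    """Toy edge device: raw data stays local, only summaries go to the core."""

    def __init__(self, keep_last=1000):
        # Bounded local retention: old readings fall off automatically.
        self.buffer = deque(maxlen=keep_last)

    def record(self, reading):
        self.buffer.append(reading)

    def summary(self):
        """Cheap aggregate to send upstream instead of the raw stream."""
        speeds = [r["speed"] for r in self.buffer]
        return {"count": len(speeds), "max_speed": max(speeds)}

    def pull_raw(self):
        """Fetched only when the core decides it needs this node's data,
        e.g. after an accident or a skid."""
        return list(self.buffer)

node = EdgeNode(keep_last=3)
for s in (40, 55, 90, 62):
    node.record({"speed": s})
print(node.summary())   # only the last 3 readings survive: count=3, max=90
```

Every vehicle sends the small summary; the full buffer is transported only for the one vehicle the core actually cares about, which is why the data can outgrow the pipes and it is still okay.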
So the data at the edge, real-world data that has real-world implications right now, is different from data that summarizes business events, which is different from data that summarizes things as models that might be integrated somewhere else. And we have to do a better job of really understanding the relationships between data, its use, its asset characteristics, et cetera. Would you agree with that? Absolutely, and maybe you see the method in my madness now, which is that data will have associated with it the metadata that describes all that, so that I don't misuse it. Think about the video data off of a vehicle. I might want to have a sample of that every, I don't know, 30 seconds. But now if there's really a problem, maybe not an accident, maybe it's a performance problem, you skidded, I'd like to go back and see why. Was there a physical issue with the vehicle that I need to think about as an engineering problem? Was it your driving ability? Was it a cat that jumped in front of the car? So I need to be able, as you pointed out, to distinguish in a systematic way what data I'm looking at, where it belongs, and where it came from. The final prediction concerns the evolution from big data to huge data. That is really driven by the increasing need we have to do machine learning and AI, with very large amounts of data being analyzed in near real time to meet new needs for business. And, like many of these things, there's a little bit of a feedback loop. So that drives us to new architectures, for example being able to do in-memory analytics. But with in-memory analytics, with all that important data, I want to have persistence. Technologies are coming along, like storage-class memories, that are allowing us to build persistent storage, persistent memory. We'll have to re-architect the applications, but at the same time, that persistent-memory data, I don't want to lose it. So it has to be thought of also as a part of the storage system.
Historically, we've had separate systems: the compute system, then a pipe, then the storage system. They're kind of coming together. So you're seeing the storage impinge on the compute system; our acquisition of Plexistor is part of how we're getting there. But at the same time, you see what might have been thought of as the memory of the compute system really becoming an extended part of the storage system, with all the things related to copy management, backup, and so on. So that's really what that's talking about. And it's being driven by another factor, I think, which is a higher-level factor. For the first 50 years of the IT industry, it was all about automating the processes that ran the business. They didn't change the business; they made it more efficient. Accounting systems, et cetera. Since probably 2000, there's been a bit of a shift, because of the web and mobile, to say, oh, I can use this to change the relationship with my customer, customer intimacy. I can use mobile and I can change the banking business. Maybe you don't ever come to the bank for cash anymore, not even to an ATM, because they've changed that. The wave that's starting now, which is driving this, is the realization in many organizations, and I truly believe eventually in all organizations, that they can have new data-driven businesses that transform their fundamental view of their business. So an example I would use: imagine a shoemaker, a shoe manufacturer. For 50 years, they made better shoes, they had better distribution, and they could do better inventory management and get better costs, all with IT. In the last seven or 10 years, they've started to be able to build a relationship with their client. Maybe they put some sensors in the shoe and they're doing Fitbit-like stuff. Mostly, for them, that was about a better client relationship, so they could sell better shoes, because ours are differentiated now.
The next step is what happens if they wake up and say, wait a minute, we could take all this data and sell it to the insurance companies, or the healthcare companies, or the city planners, because we now know where everyone's walking all the time. That's a completely different business, but it requires new kinds of analytics that we can almost not imagine in the current storage model, so it drives these new architectures. Great, great. And there is one more prediction. Okay. Which is that, and it comes back again, it kind of closes the whole cycle. As we see this intelligence coming to the data, and new processing forms, and so on, we also need a way to change data management to give us a real understanding of data through its whole life cycle. One example would be: how can I ensure that I understand the chain of custody of data? In the example of the automobile accident, how do I know that data wasn't altered? Or how can I know who's touched this data along the way, because I might need an audit trail? And so we see the emergence of a new distributed, immutable management framework. When I say those two words together, you probably think blockchain, which is the right thing to think, but it's not the blockchain we know today. Yeah, maybe something else. It's something like that; it will be a distributed and immutable ledger that will give us new ways to access and understand our data. But, to extend the metaphor, once you decide to put the metadata next to the data, then you're going to decide to put a lot more control information in that metadata next to the data. Exactly. So this is just an extension; as you said, it kind of closes the loop. Exactly. Mark, well, thanks so much for coming on the show and for talking about the future with us. It was really fun to have you on. We should come back in a year and see if any of it has happened. I know, see if you were right. Exactly. Exactly. Great. Thank you.
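The distributed, immutable ledger idea from this last prediction can be illustrated with its simplest ingredient, a hash chain: each entry commits to the previous one, so altering historical data breaks verification. This is a sketch of the core mechanism only, not any particular blockchain or NetApp product.

```python
import hashlib
import json

def entry_hash(prev_hash, payload):
    # Canonical serialization so the same content always hashes the same way.
    blob = json.dumps({"prev": prev_hash, "data": payload}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

def append(chain, payload):
    """Add an entry that commits to everything before it."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    chain.append({"prev": prev, "data": payload,
                  "hash": entry_hash(prev, payload)})

def verify(chain):
    """Recompute every link; any tampering with past entries fails here."""
    prev = "0" * 64
    for e in chain:
        if e["prev"] != prev or e["hash"] != entry_hash(prev, e["data"]):
            return False
        prev = e["hash"]
    return True

ledger = []
append(ledger, {"event": "sensor reading", "speed": 87})
append(ledger, {"event": "accessed by insurer"})
print(verify(ledger))            # True
ledger[0]["data"]["speed"] = 30  # tamper with history...
print(verify(ledger))            # ...and verification fails: False
```

An audit trail built this way answers both questions in the conversation: who touched the data along the way, and whether anyone altered it after the fact.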
I'm Rebecca Knight. We will have more from NetApp Insight just after this.