Hi, this is Peter Burris with Wikibon's Action Item. Once again, the research team at Wikibon is broadcasting from our beautiful Palo Alto theCUBE studios. I've got here in the studio with me George Gilbert and David Floyer, and on the phone, or remotely, we have the ever-esteemed Neil Raden and Jim Kobielus. Hey guys. Good morning. Good to be here.

So today we're going to talk about something that's been tried before in the industry without a lot of success, but for a variety of reasons, not the least of which is need, we expect that we're going to see another go at it from a number of different suppliers and technologies. Specifically, what I'm talking about is what we'll call data-aware middleware: technologies that read and process metadata to take actions, either to organize, reorganize, move, or not move data based on knowledge about application needs, infrastructure capabilities, time, dependencies, et cetera. Now, we've seen this before. A number of years ago we had distributed, federated database technology. Didn't go very well. We had XML. Didn't go very well. But David Floyer, something's a little bit different this time. What is happening within the industry that's really catalyzing this need and increasing the likelihood that, at least from a business standpoint, this may actually achieve some degree of success?

Well, the majority of enterprise customers are saying that they want to adopt a hybrid strategy of having some on-premises and some in clouds, either their own clouds or public clouds, or in a variety of different places. And the access to data is key. Where the data is, is where you'd like to process it, if at all possible. It's a lot easier to move code to the data than it is to move a lot of data to the code, from both an elapsed-time and a cost perspective. Therefore, what we need across this set of clouds, private clouds and public clouds, is the ability to manage the data, and for that data to know what its characteristics are, where it is, what the latency is to different points within that topology, and to be able to guide applications, to inform applications about the best way that they should run, and at the same time keep consistency, availability, et cetera, of the data that's under their control. And we're seeing the introduction of some of those technologies, very interesting technologies, taking place now. A recent one was Avere, which was bought by Microsoft. So that's the background to there being a strong demand for this type of cross-cloud metadata, in particular, that will tell applications what to do. We've got S3 cloud storage technology, for example, object-oriented technologies where we're binding metadata and the data together. We've got platforms like Avere coming at this. We have other technologies like WANdisco, which started out mainly focusing on responding to application requirements and is now doing active-active replication based on practical data realities.
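As a minimal sketch of the idea David raises, that object stores bind metadata and data together, here is how user-defined metadata can be attached to an S3 object and read back cheaply, without pulling the payload. This is illustrative only; the bucket name, key, and metadata fields are assumptions, not anything discussed in the conversation.

```python
# Sketch: binding descriptive metadata to the data itself in S3 (boto3).
# Bucket, key, and metadata values below are hypothetical examples.
import boto3

s3 = boto3.client("s3")

# Write the data and its descriptive metadata together as one object.
s3.put_object(
    Bucket="example-analytics-bucket",
    Key="sales/2018/q1.parquet",
    Body=b"<parquet bytes would go here>",
    Metadata={
        "origin-region": "us-west-2",     # where the data was produced
        "sensitivity": "internal",        # governance hint for downstream tools
        "preferred-locality": "on-prem",  # hint: move code here, not the data
    },
)

# Later, a data-aware layer can inspect the metadata with a HEAD request,
# no payload transfer, before deciding where to run the processing.
head = s3.head_object(Bucket="example-analytics-bucket", Key="sales/2018/q1.parquet")
print(head["Metadata"])
```

The point of the sketch is simply that the metadata travels with the object, so placement and processing decisions can be made by reading headers rather than moving data.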
But we've seen this before, Jim. Why has this kind of an effort failed in the past?

Complexity, in a word. First of all, this end-to-end, self-describing integration fabric, you know, our esteemed colleague Neil and we've all discussed the fact that, a long time ago, in the late, great early part of the 2000s, there was this paradigm of the ESB, the Enterprise Service Bus, and SOA, and federated data and all that. But it took too long, and it was too complex for too little gain, to set up these complex webs of trust among these federated data domains, to build the elaborate choreographies and orchestrations, and to build out the overlying semantics with ontologies to make all this stuff work together. It was just too thick of a stack, and nobody could develop it. It was a bear to maintain this rat's nest of code, and it was largely abandoned. It was a great thing in the trade press, it was a great thing for architects to describe how they were going to change the world and make everything interoperate with everything else magically. It didn't work.

But there were also some very established and powerful entities who took an active role in ensuring that a lot of this didn't work; IBM's distributed, federated database technology, for example, was not easily supported by Oracle. There were a variety of different approaches to how we were going to handle XML in different languages, on different infrastructures and different platforms. Neil Raden, from a very practical standpoint, when we think about data, we have to be careful that we don't attribute some magic to metadata. What are some of the differences between data and metadata? And again, what is necessary to ensure that we have some degree of consistency and commonality at that metadata layer to make sure that this kind of thing will work?

Data is easy to create. Write an application and it reads data or it creates data; God knows there's tons of it. Metadata has to be created with some intelligence. That means either people or, now, some AI tools. And by the way, I noticed that Oracle, and I know they're not the only ones, has now referred to AI as augmented intelligence, and I really, really like that term, because to me artificial intelligence is something else; it's vision and cognition and all that other stuff. In business, we're really looking at augmented intelligence. So the deal with metadata is: how does it get created? Data doesn't create metadata. I mean, it can to a certain extent, but not enough. And I think that how the metadata then gets used is another issue, because you don't want to have a million people out there, and this would be the systems integrators' full-employment act, writing a bunch of if-then-else code to use this metadata across the systems. So it has to be done with some intelligence. I will disagree with one thing that David said. I think he described a hybrid environment that was really just within one enterprise, but very few enterprises today work without interoperating with the data and applications of their suppliers and customers and so forth. So for this to be useful, it would have to include metadata configurations for those systems as well.

What do you think, David? Where are these metadata definitions going to come from? Will some of the lower-level platform and storage guys be more successful at defining some of these things than the higher-level application tool guys who try to do this from the developer down?

Right, yeah, I think it's going to have to come and develop from the bottom up.
I agree with your requirement; that is a necessity if suppliers are going to work with their customers, and the supply chain as a whole is going to have to be integrated. That's going to be a requirement. However, there's a lot of simple stuff that can be done now. I gave the example of Avere: with their technology, they are able to use S3, together with a company's own filer capability, together with local distributed file systems, and make it all look like a single global file system. And within that, they have knowledge about where everything is across that topology, which guarantees consistency, as long as it's expressed within the Avere set of microcode. By using that particular object or file system across the board, if they agree to use it, suppliers and customers could produce solutions that would allow them to run their code against that object file system. You're seeing these different types of technology fit in as potential solutions. And this is, to me, very, very important, because unless you can get access to that data and use it wherever it is, and unless you can get away from this concept that everything has got to go up into the cloud, the ability to actually produce real results for enterprises will be severely limited.

Well, but even if everything's in the cloud, you still have the problem of understanding and knowing where the data is and being able to do something with it so that it's not siloed within individual applications. But as we think about this, George, the whole concept of the pipeline is kind of a middle ground between knowledge about the data as an object, and the metadata associated with it, and what the applications are going to need. How's that likely to play out in that world of thinking through the pipeline?

I think the way you framed it was really good, in that we've gone through decades and decades of ever more structured and componentized programming, because as programming projects get larger and larger, there have to be ways of breaking them down so that they're more modular. And as we look at this sort of data-driven middleware, it absorbs more of the metadata from the applications, so the applications themselves are more independent. If we go back in time, we go back to the transaction monitors, CICS and Tuxedo, which mediated access to the database. Then there were RDBMS triggers, which Sybase did first, which essentially took rules out of the applications and centralized them in the database so that they were consistent in terms of how different apps or modules accessed them. Now we have the distributed commit log, which took what was hidden inside the DBMS that essentially ensured ACID compliance, and we've opened that up with Kafka so that many applications can publish into this backbone and then subscribe from it. And the interesting thing there is that this can enable the continuous processing parts of microservices or functions, and then on the consuming side you can expose materialized views, which are sort of the result of a query. In other words, it looks like a database when you're consuming what was pushed out of this log. So the bottom line is there's a history of stuff doing this, but where it has been successful has been close to the data as opposed to close to the application.
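To make George's commit-log pattern concrete, here is a minimal sketch of applications publishing events into Kafka and a consumer folding the log into a materialized view it can query like a small table. The broker address, topic name, and event shape are illustrative assumptions, not details from the discussion.

```python
# Sketch: publish into a distributed commit log (Kafka) and build a
# materialized view on the consuming side. Uses the kafka-python client.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKER = "localhost:9092"          # assumed local broker for illustration
TOPIC = "account-balances"         # hypothetical topic name

# Producer side: any number of services can publish state changes to the log.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"account": "A-1001", "balance": 250.00})
producer.send(TOPIC, {"account": "A-1002", "balance": 75.50})
producer.flush()

# Consumer side: replay the log and fold it into a materialized view,
# i.e. the latest balance per account, queryable like a table.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,      # stop iterating once the log is caught up
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

materialized_view = {}
for message in consumer:
    event = message.value
    materialized_view[event["account"]] = event["balance"]

print(materialized_view)           # e.g. {'A-1001': 250.0, 'A-1002': 75.5}
```

The design point is the one George makes: the log, not the application, becomes the shared hub, and consumers derive whatever database-like view they need from it.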
And we haven't successfully gotten large numbers of developers to adopt common principles for how they define their data and their metadata so that this will work; we've been more successful with people who work with data as a set of assets and a set of objects. So as we think about this, Jim, I've just introduced a couple of things that are likely to have to happen for this to be successful, but there have got to be some others. It's a pretty high bar to imagine how this is going to play out over time. It's unlikely that we're going to see all data, well, we know all data is not going to magically present itself to Avere. But Jim, what are some of the other things that have to happen for this type of technology to be adopted with a little bit more success?

Yeah, what's going to have to happen, first of all, is a universal orchestration layer for managing the movement and tracking and processing of all these data and data-derived objects, and the metadata, from end to end. I think that's coming to pass. I think Kubernetes is an odds-on favorite to become that universal orchestration bus that enables the custody of data to be governed through the complex rules that can be defined and executed therein. What also has to happen is a universal web of trust. That's an old term, and we're starting to see it come into being through blockchain. Blockchain is a distributed hyperledger, built on PKI, public key infrastructure, to enable a universal trust infrastructure supporting a universal ledger, a system of record that's held in common, or accessed in common, by all application and data domains. Those are two critical pieces of the overall puzzle that are coming fairly rapidly into broad adoption, really through open-source code that everybody marches around. But also very much needed is a standard way of representing the policies governing how any given piece of content is to be managed throughout its life cycle. That's something we don't see yet. Unless you guys have heard of one, I don't yet see anybody describing a standard wrapper for the various types of things that can be done with any arbitrary piece of content.

This knowledge about the data, and then using that knowledge to apply it, goes even beyond what Jim was talking about. Orchestrating the data pipeline is what a distributed commit log does as a hub: it's intelligently bringing data in from many sources and publishing it to many others in different formats, where the consumers can present it as if it were a database, a cache, or a search index. But there's another, higher level, which is when we do IT operations management and application performance management. The models about how that environment works are rich, data-driven models, and they respond to the operational data coming from the infrastructure and the services to inform administrators how to keep the service-level objectives in compliance, or they do the automatic remediation themselves. That type of machine-learning-based management and orchestration is, in this case, applied just to running the applications, but in the future we'll see it applied outwards to the endpoints of the applications. Good point, George.
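To give a feel for the "standard wrapper" Jim says doesn't exist yet, here is a hedged sketch of a machine-readable policy descriptor that travels with a piece of content and that data-aware middleware could consult when deciding whether to move code to the data or the data to the code. Every field name and the placement rule are hypothetical illustrations, not a proposed standard.

```python
# Sketch: a hypothetical content-policy wrapper plus a simple placement
# decision that prefers moving code to the data (data gravity).
from dataclasses import dataclass
from typing import List


@dataclass
class ContentPolicy:
    content_id: str
    locations: List[str]        # where replicas currently live
    allowed_actions: List[str]  # e.g. ["read", "replicate", "aggregate"]
    residency: str              # e.g. "eu-only", "any"
    size_gb: float
    retention_days: int


def plan_processing(policy: ContentPolicy, compute_sites: List[str]) -> str:
    """Decide where to run a job: prefer moving code to the data."""
    for site in compute_sites:
        if site in policy.locations:
            return f"run at {site} (code moves to data)"
    # No co-located compute: only ship the data if the policy allows it
    # and it is small enough that transfer cost and latency are acceptable.
    if "replicate" in policy.allowed_actions and policy.size_gb < 10:
        return f"replicate to {compute_sites[0]}, then run there"
    return "reject: policy or data gravity forbids moving this content"


policy = ContentPolicy(
    content_id="orders-2018-q1",
    locations=["on-prem-dc1"],
    allowed_actions=["read", "aggregate"],
    residency="eu-only",
    size_gb=420.0,
    retention_days=2555,
)
print(plan_processing(policy, ["aws-us-east-1", "azure-west-europe"]))
```

The sketch only illustrates the shape of the idea: the policy, not application code scattered across systems, carries the rules, which is the opposite of the "if-then-else full-employment act" Neil warns about.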
Okay, so Neil, action item.

I think that metadata-driven middleware is a nice term. It requires intelligence, and in my experience with metadata over the years, it always seems to be kind of incomplete. The other thing I'd say is that developing something is actually easy; all you need is some money and some time to figure it out. But maintaining something in real time is quite difficult. So who's going to maintain the metadata when, between instance one and instance two, something has changed and the metadata hasn't picked up on it yet? I think that's going to be a big, knotty problem.

All right, David Floyer, action item.

So I'm really building on what Neil's just been saying: the core of this is that the systems are going to become too complex for humans to manage, and therefore, as part of the design, you have to put into the systems the ability to be self-managed. In other words, they have the AI capabilities and the operational automation capabilities that can offload that management onto the system itself and not have millions of alerts going to a few poor operators. So a key requirement is AI and automation, operational automation.

George, action item.

Couldn't agree more with David. I would say there's a step before that, which is not to take monolithic applications and break them apart, because that's a very dangerous undertaking, but to build the new greenfield apps more with either microservices or functions, where you have to build in knowledge of the data and the ability to evolve with that data as it changes. Those become sort of a loose collection of microservices and functions, and for that to work, it's feeding off the data: the control flow feeds off what values of data you have and what types of data you have. So it's very much more data-centric than rules-based.

Jim Kobielus, action item.

The action item is to follow deep learning into this space. I think deep learning is going to be used, before long, to automatically extract from any arbitrary data that comes to you the associated metadata and semantics, if they haven't already been provided, and to post them to some external system, or some system of record, where they can be accessed by all apps. So follow the AI industry's utilization of DL for this kind of application. I think that will come fairly soon, and I think it's probably already underway.

All right, so let's do a quick summary of everything. The overall action item for us this week is that when we think about the upcoming transition to utilizing cloud technologies as a basis for infrastructure, so that we can apply more complex application technologies, what that ultimately means is an increasing recognition that data has to be treated as an asset. First, it's going to be easier to move application function to the data. Second, we have to bring a degree of automation to how we manage data and infrastructure; otherwise we'll get buried by these more complex application forms. And third, we have to recognize that increasingly businesses are going to communicate with each other not just through humans and related contracting and interactions, but through machine-level interactions. That's going to place enormous pressure on the ability for data to do a better job of managing itself. We see on the horizon a class of data-aware middleware that uses metadata to better predict, anticipate, and appropriately store and/or move data in response to application needs, in an anticipatory way that satisfies the constraints of cost as well as latency and wait times. The challenges that the industry faces as we move to this are very clear.
We've tried this before in other domains, and it hasn't worked particularly well. This time, we think it's going to be different because we can bring artificial intelligence and other types of technologies to bear. One key set of challenges: organizations have to be ready to do this, developers have to be willing to adopt it, and we're going to need a network of trust that imbues security and other controls in the data itself, so that we can be certain as we process data. Blockchain may help there, but a lot of new innovation is required.

All right, so once again, I want to thank the Wikibon research team for a great conversation about this: George Gilbert and David Floyer here in the studio, and, remotely, Neil Raden and Jim Kobielus. I'm Peter Burris. Thank you very much. This has been Wikibon's Action Item.