Hi, I'm Peter Burris, welcome to Action Item. There's an enormous net new array of software technologies available to businesses and enterprises to attend to some new classes of problems. And that means there's an explosion in the number of problems that people perceive could be solved with software approaches. The whole world of how we're going to automate things differently in artificial intelligence and any number of other software technologies are all being brought to bear on problems in ways that we never envisioned or never thought possible. That leads ultimately to a comparable explosion in the number of approaches to how we're going to solve some of these problems. That means new tooling, new models, and any number of other structures, conventions, and artifacts that are going to have to be factored by IT organizations and professionals in the technology industry as they conceive and put forward plans and approaches to solving some of these problems. Now, George, that leads to a question. Are we going to see an ongoing, ever-expanding array of approaches, or are we going to see some new kind of steady state that starts to simplify how enterprises conceive of the role of software in solving problems? Well, we've had probably four decades of packaged applications being installed and defining the systems of record, which first handled the order-to-cash process and then layered around that. Once we had CRM capabilities, we had the opportunity-to-lead capability added in there. But systems of record are fundamentally backward looking. They track the performance of the business. The opportunity we have. Recording what has happened. Yes, recording what has happened. The opportunity we have now is to combine what the big internet companies pioneered with systems of engagement, where you had machine learning anticipating and influencing interactions.
You can now combine those sorts of analytics with systems of record to inform and automate decisions in the form of transactions. And the question is now, how are we going to do this? Is there some way to simplify, or not completely standardize, but can we make it so that we have at least some conventions and design patterns for how to do that? And David, we've been working on this problem for quite some time, but the notion of convergence has been extant in the hardware and the services, or in the systems business, for quite some time. Take us through what convergence means and how it is going to set up new ways of thinking about software. So there's a hardware convergence, and it's useful to define a few terms. There are converged systems. Those are systems which have had some management software brought into them, and on top of that they have traditional SANs and networks. There are hyper-converged systems, which started off in the cloud and now have come to the enterprise as well. Those bring software-defined networking and software-defined storage. Software defined, so it's a virtualizing of those converged systems. Absolutely. And in the future, it's going to bring automated operations as well, AI on the operational side. And then there's full-stack convergence, where we start to put in the application software, beginning with the database side of things, and then the application itself on top of the database. And finally, what you were talking about, the systems of intelligence, where we can combine the systems of record, the systems of engagement, and the real-time analytics as a complete stack. But let's talk about this for a second, because ultimately what I think you're saying is that we've got hardware convergence in the form of converged infrastructure, hyper-converged in the form of virtualization of that, new ways of thinking about how the stack comes together, and new ways of thinking about application components.
But what seems to be the common thread through all of this is data. So is what we're seeing basically a convergence, or a rethinking, of how software elements revolve around the data? Is that the centerpiece of this? That's the centerpiece of it. And we had very serious constraints on accessing data. Those were improved with flash, but there's still a lot of room for improvement. And the architecture that we are saying is going to come forward, which really helps this a lot, is the UniGrid architecture, where we offload the networking and the storage from the processor. This is already happening in the hyperscale clouds; they're putting a lot of effort into doing this. But we're at the same time allowing any processor to access any data in a much more fluid way, and we can grow that to thousands of processors. Now, that type of architecture gives us the ability to converge the traditional systems of record, and there are a lot of them obviously, and the systems of engagement and the real-time analytics for the first time. But the focal point of that convergence is not the licensing of the software; the focal point is convergence around the data. Now, that has some pretty significant implications when we think about how software has always been sold, how organizations that run software have been structured, and the way that funding is set up within businesses. So George, what does it mean to talk about converging software around data from a practical standpoint over the next few years? Okay, so let me take that and interpret it as converging the software around data in the context of adding intelligence to our existing application portfolio and then the new applications that follow on. And basically, when we want to inject enough intelligence to anticipate and inform interactions, or inform or automate transactions, we have a bunch of steps that need to get done, where we're ingesting essentially contextual or ambient information.
Often this is information about a user or about the business process. And this data has got to go through a pipeline, where there's both a design time and a runtime. In addition to ingesting it, you have to enrich it and make it ready for analysis. Then the analysis is essentially picking out of all that data and calculating the features that you plug into a machine learning model. And then that produces essentially an inference based on all that data that says, well, this is the probable value. And it sounds like it's in the weeds, but the point is it's actually a standardized set of steps. Then the question is, do you put that all together in one product across that whole pipeline? Can one piece of infrastructure software manage that? Or do you have a bunch of pieces, each handing off to the next? And- But let me stop you, because I want to make sure that we follow the thread. So we've argued that hardware convergence, and the ability to scale the role that data plays or how data is used, is happening. And that opens up new opportunities to think about data. Now what we've got is we are centering a lot of the software convergence around the use of data through copies and other types of mechanisms for handling snapshots and whatnot in things like UniGrid. But let's start with this. It sounds like what you're saying is we need to think of new classes of investments in technologies that are specifically set up to handle the processing of data in a more distributed application way, right? If I've got that right, that's kind of what we mean by pipelines. Yes. Okay, so once we do that, once we establish those conventions, once we establish organizationally, institutionally, how that's going to work, now we take the next step of saying, are we going to default to a single set of products, or are we going to go best-of-breed, and what kind of convergence are we going to see there? And there's no- First of all, have I got that right?
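George's point that the pipeline is a standardized set of steps — ingest, enrich, featurize, infer — can be sketched in a few lines. This is an illustrative sketch only; every function and field name here is hypothetical, not the API of any specific product:

```python
# Minimal sketch of the ingest -> enrich -> featurize -> infer pipeline.
# All names are invented for illustration; a real pipeline would use
# products like Kafka, Spark, or a feature store for each stage.

def ingest(raw_event):
    """Capture contextual/ambient information about a user interaction."""
    return {"user_id": raw_event["user"], "page_views": raw_event["views"]}

def enrich(event, profile_store):
    """Join the event with stored context to make it ready for analysis."""
    profile = profile_store.get(event["user_id"], {"lifetime_purchases": 0})
    return {**event, **profile}

def featurize(enriched):
    """Calculate the features that feed the machine learning model."""
    return [enriched["page_views"], enriched["lifetime_purchases"]]

def infer(features, weights=(0.1, 0.3)):
    """Produce a probable value (here, a toy purchase-propensity score)."""
    score = sum(w * f for w, f in zip(weights, features))
    return min(score, 1.0)

profiles = {"u1": {"lifetime_purchases": 2}}
event = ingest({"user": "u1", "views": 3})
score = infer(featurize(enrich(event, profiles)))
```

The point of the sketch is that the hand-offs between stages are fixed, which is exactly why the debate over "one product for the whole pipeline versus many specialized pieces" is possible at all.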
Yes, but there's no right answer. And I think there's a bunch of variables that we have to play with that depend on who the customer is. For instance, the very largest and most sophisticated tech companies are more comfortable taking multiple pieces, each of which is very specialized, and putting them together in a pipeline. Facebook, Yahoo, Google, got it. Those guys. And the knobs that they're playing with, that everyone's playing with, are three. Basically, on the software side, there's your latency budget, which is how much time you have to produce an answer so that it drives the transaction or the interaction. And that itself is not just a single answer, because the goal isn't to get it as short as possible. The goal is to get as much information into the analysis within the budgeted latency. So it's packing the latency budget with data? Yes, because the more data that goes into making the inference, the better the inference. The example that someone used, actually on Fareed Zakaria's GPS show, was that if he had 300 attributes describing a person, he could know more about that person than that person did, in terms of inferring other attributes. So the point is, once you've got your latency budget, the other two knobs that you can play with are development complexity and admin complexity. And the idea is, on development complexity, there's a bunch of abstractions that you have to deal with. If it's all one product, you're going to have one data model, one address and namespace convention, one programming model, one way of persisting data, a whole bunch of things. That's simplicity, and that makes it more accessible to mainstream organizations. Similarly, there are, let me just add, probably two or three times as many constructs that admins would have to deal with. So again, if you're dealing with one product, it takes a huge burden off the admin, and we know they struggled with Hadoop.
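The idea of "packing the latency budget with data" can be sketched as a loop that keeps adding feature lookups until the time budget is spent. This is a hedged illustration with hypothetical names, not a production serving pattern:

```python
import time

def gather_features(feature_fns, budget_seconds):
    """Collect as many features as possible within a fixed latency budget.

    feature_fns is an ordered list of callables, cheapest or most
    valuable first; we stop pulling in more data once the budget is
    exhausted, then infer with whatever we managed to gather.
    """
    deadline = time.monotonic() + budget_seconds
    features = []
    for fn in feature_fns:
        if time.monotonic() >= deadline:
            break  # budget spent; answer with what we have
        features.append(fn())
    return features

# Toy feature sources; real ones would hit caches, stores, or services.
sources = [lambda: 3, lambda: 2.5, lambda: 7]
feats = gather_features(sources, budget_seconds=0.05)
```

The design choice this encodes is exactly the one described above: the goal is not the shortest possible answer but the most informed answer the budget allows.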
So decisions about how to enact convergence are going to be partly or strongly influenced by those three issues: latency budget, development complexity or simplicity, and administrative complexity. So, David. I'd like to add one more to that, and that is location of data, because you want to be able to look at the data that is most relevant to solving that particular problem. Now, today a lot of the data is inside the enterprise. There's a lot of data outside that, but you will still want to combine that data in the best possible way. But is that a variable on the latency budget? Well, I think it's very useful to split the latency budget, which has to do with inference mainly, from development with the machine learning. So there is a development cycle with machine learning. That is much longer; that is days, could be weeks, could be months. It's still done in batch. Right, it is, it will be done in batch. It is done in batch, and you need to test it and then deliver it as an inference engine to the applications that you're talking about. Now, that's going to be very close together. That inference and the rest of it has to be all physically very close together. But the data itself is spread out, and you want to have mechanisms that can combine those data, move the application to those data, bring those together in the best possible way. That is still a batch process. That can run where the data is: in the cloud, locally, wherever it is. And I think you brought up a great point, which I would tend to include in latency budget, because no matter what kind of answers you're looking for, some of the attributes are going to be pre-computed, and those could be external data. And you're not going to calculate everything in real time. There's just, it's- You can't. Yes. You have a budget. You can't.
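David's split between a long batch design-time cycle and a tightly coupled runtime can be sketched as two phases: a batch job that trains and tests a model over days or months, and an inference engine that is delivered to the application and kept physically close to it. Purely illustrative; the function names and the toy "training" are invented for the sketch:

```python
# Design time: a batch cycle (days to months) that produces a tested model.
def train_model(history):
    """Toy 'training': fit a single weight from historical (x, y) pairs."""
    w = sum(y for _, y in history) / sum(x for x, _ in history)
    return {"weight": w}

def validate(model, holdout):
    """Batch testing before the model is delivered to applications."""
    errors = [abs(model["weight"] * x - y) for x, y in holdout]
    return max(errors) < 0.5

# Runtime: the delivered inference engine, living in the transaction path.
def inference_engine(model, x):
    return model["weight"] * x

history = [(1, 2), (2, 4), (3, 6)]
model = train_model(history)
assert validate(model, [(4, 8)])        # batch test before delivery
prediction = inference_engine(model, 5)  # fast call inside the latency budget
```

The asymmetry is the point: the batch phase can run wherever the data lives, while only the small delivered artifact has to sit close to the application.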
But is the practical reality, so again, the argument: we've got all these new problems, all kinds of new people claiming that they know how to solve those problems, each of them choosing different classes of tools to solve the problem, and an explosion across the board in the approaches, which can lead to enormous downstream integration and complexity costs. You've used the example of Cloudera, for example, some of the distro companies who claim that 50-plus percent of their development budget is dedicated to just integrating these pieces. That's a non-starter for a lot of enterprises. Are we fundamentally saying that the degree of complexity, or the degree of simplicity and convergence, that's possible in software is tied to the degree of convergence in the data? This is, you're homing in on something really important. Give me one more. Thank you. Give an example of the convergence of data that you're talking about. I'll let David do it, because I think he's going to jump on it. Yes. So let me take some examples. If you have a small business, there's no way that you want to invest yourself in any of the normal levels of machine learning and applications like that. You want to outsource that. So big software companies are going to do that for you. And they're going to do it especially for the specific business processes which are unique to them, which give them digital differentiation of some sort or another. So for all of those types of things, software will come in from vendors, from SAP or the son of SAP, which will help you solve those problems. And having data brokers, which are collecting the data, putting them together, helping you with that, that seems to me the way things are going.
In the same way, there are a lot of inference engines which will be out at the IoT level. Those will have very rapid analytics given to them, again not by yourself, but by companies that specialize in facial recognition or specialize in, say, warehouse automation. Wait a minute, are you saying that my customers aren't so special that they require special face recognition? So I agree with you, David, but I want to come back to this notion. The point I was getting at is, there's going to be lots and lots of room for software to be developed to help in specific cases, and large markets to sell that software into. Whether it's software, but increasingly also with services. But I want to come back to this notion of convergence, because we've talked about hardware convergence, and we're starting to talk about the practical limits on software convergence, but somewhere in between, I would argue, and I think you guys would agree, that really the catalyst, or the thing that's going to determine the rate of change and the degree of convergence, is going to be how we deal with data. Now, you've done a lot of research on this, and I'm going to put something out there and you tell me if I'm wrong: at the end of the day, when we start thinking about UniGrid, when we start thinking about some of these new technologies and the ability to have single copies or single sources of data, multiple copies, in many respects what we're talking about is the virtualization of data without loss. Without loss of the character, the fidelity of the data, or the state of the data. If I've got that right. Knowing the state of the data. Or knowing the state of the data. If you take a snapshot, that's a point in time. You know what that point in time is, and you can do a lot of analytics, for example, and you want to do them at a certain time of day, or whatever it is that you're comparing.
Is it wrong to say that we've moved through the virtualization of hardware, and we're now in a hyperscale, or hyper-converged, era, which is very powerful stuff, and we're seeing this explosion in the amount of software, in the way we approach problems and whatnot, but that a forcing function, something that's going to both constrain how converged that can be but also force or catalyze some convergence, is the idea that we're moving into an era where we can start to think about virtualized data, through some of these distributed file systems and other types of data technologies? That's right, and the metadata that goes with it. The most important thing about the data, and it's increasing much more rapidly than the data itself, is the metadata around it. But I want to just make one point on this. All data isn't useful. There's a huge amount of data that we capture that we're just going to have to throw away. The idea that we can look at every piece of data for every decision is patently false. And there's a lovely example of this in fluid mechanics. Hydrodynamics? Fluid dynamics. If you're trying to run a simulation at a very, very low level, with high fidelity, you run out of capacity very, very quickly indeed. So you have to make trade-offs about everything, and all of that data that you're generating in that simulation, you're not going to keep. All the data from IoT, you can't keep that. And that's not just a statement about the performance or the power or the capabilities of the hardware. There are some physical realities that are going to limit what you can do with a simulation. But, and we've talked about this in other Action Items, there is this notion of options on data value, where the value of today's data is much higher. Well, it's higher from a time standpoint for the problems that we understand and are trying to solve now.
But there may be future problems where we still want to ensure that we have some degree of data, so we can be better at attending to those future problems. But I want to come back to this point, because in all honesty I haven't heard anybody else talking about this, and maybe it's because I'm not listening. But this notion, again from your research, that virtualized data inside these new architectures is a catalyst for simplification of a lot of the subsystems. It's essentially sharing of data. So instead of the traditional way of doing it within a data center, which is: I have my systems of record, I make a copy, it gets delivered to the data warehouse, for example. That's the way it's been done. That is too slow. Moving data is incredibly slow. So another way of doing it is to share that data, make a virtual copy of it, and technology is allowing you to do that because the access density has gone up by thousands of times. Because? Because of flash, because of new technologies at that level. High-performance interfaces, high-performance networks. All of that is now allowing things which just couldn't even be conceived before. However, there is still a constraint there. It may be a thousand times bigger, but there is still an absolute constraint on the amount of data that you can actually process. And that constraint is provided by latency. Latency. The speed of light, and the speed of the processors themselves. Again, let me add something that may help explain the virtualization of data and how it ties into the convergence or non-convergence of the software around it, which is, when we're building these analytic pipelines, essentially we've disassembled what used to be a DBMS. And so out of that we've got a storage engine, we've got query optimizers, we've got data manipulation languages, which have grown into full-blown analytic languages, and a data definition language.
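The "share a virtual copy instead of physically moving the data" idea is essentially copy-on-write: a snapshot records a point-in-time view of the source without copying the underlying data, so later writes don't disturb it. A toy sketch under that assumption, with invented names, nothing like a real storage engine's implementation:

```python
# Toy copy-on-write style snapshot: the snapshot is just a frozen view
# of the source's block map, so analytics can run against a known
# point-in-time state while the system of record keeps transacting.

class Volume:
    def __init__(self, blocks):
        self.blocks = dict(blocks)   # block_id -> data

    def snapshot(self):
        # A "virtual copy": freeze the current block map; the block
        # contents themselves are shared, not physically moved.
        return dict(self.blocks)

    def write(self, block_id, data):
        self.blocks[block_id] = data

vol = Volume({0: "jan-orders", 1: "feb-orders"})
snap = vol.snapshot()          # point-in-time view for the warehouse
vol.write(1, "feb-orders-v2")  # live volume moves on

# snap still shows the state at the moment it was taken.
```

This is why David ties snapshots to "knowing the state of the data": the snapshot's value is precisely that its point in time is known.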
Now, the system catalog used to be just a way to virtualize all the tables in the database and tell you where all the stuff was, the indexes and things like that. Now, since data is spread out over so many places and products, we're seeing the emergence of a new type of catalog, whether that's from Alation or Dremio, or on AWS it's the Glue catalog, and I think there's something equivalent coming on Azure. But the point is, those are beginning to get useful enough to be the entry point for analytic products, and maybe eventually even for transactional products, to update, or at least to analyze, the data in these pipelines that we're putting together out of these components of what was a disassembled database. Now we could be- I would make a distinction there between the development of analytics and, again, the real-time use of those analytics within systems of intelligence. Those are different problems they have to solve. There's design time and runtime. There are actually four pipelines: for the analytic pipeline itself, there's design time and runtime, and then for the inference engine and the modeling behind it, there's also design time and runtime. But I guess I'm not disagreeing that you could have one converged product to manage the runtime analytic pipeline. I'm just saying that the pieces you assemble could come from one vendor. Yeah, but I think David's point is accurate, and this has been true since the beginning of time, certainly predating Univac: at the end of the day, read-write ratios and the characteristics of the data are going to have an enormous impact on the choices that you make. And high write-to-read ratios almost dictate a degree of convergence. We used to call that SMP, or scale-up, database managers. And for those types of applications, or those types of workloads, it's not necessarily obvious that that's going to change.
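The new catalogs George mentions play the role the system catalog played inside a DBMS: one entry point mapping logical dataset names to wherever the data physically lives. A minimal sketch of that core idea, with invented names and paths (this is not the Alation, Dremio, or AWS Glue API):

```python
# A toy data catalog: logical names -> physical location + schema.
# Real catalogs (AWS Glue, Alation, Dremio) add lineage, search,
# partitions, and access control; this shows only the core mapping.

catalog = {}

def register(name, location, schema):
    """Design time: publish where a dataset lives and what it looks like."""
    catalog[name] = {"location": location, "schema": schema}

def resolve(name):
    """Runtime: analytic tools use the catalog as their entry point."""
    return catalog[name]

# Illustrative entries; the locations are made up for the sketch.
register("orders", "s3://bucket/orders/", {"id": "int", "total": "float"})
register("clicks", "hdfs://cluster/logs/clicks/", {"ts": "timestamp"})

entry = resolve("orders")
```

The reason this matters to convergence is that once tools agree on the catalog as the entry point, the pieces of the disassembled database can come from different vendors and still find the same data.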
Now, we can still find ways to relax that. But you're talking about, George, the new characteristics. Injecting the analytics. Injecting the analytics, where we're doing more reading as opposed to writing. We may still be writing into an application that has these characteristics, but it's a small amount of data. But a significant portion of the new function is associated with these new pipelines. Right, and what data you create is generally derived data, so you're not stepping on something that's already there. All right, so let me get some action items here. David, why don't I start with you? What's the action item? So for me, about convergence, there are two levels of convergence. First of all, converge as much as possible and give the work to the vendor. That would be my action item. The more that you can go full stack, the more that you can get the software services from a single point, a single throat to choke, a single hand to shake, the better, and the more you have outsourced your problems to them. And that has a speed implication. And that has time-to-value. Time to value; you don't have to do undifferentiated work. So that's the first level of convergence. And then the second level of convergence is to look hard at how you can bring additional value to your existing systems of record by putting in real-time analytics, which leads to automation. That is the second one, and for me, that's where the money is. Automation: reduction in the number of things that people have to do. George, action item. So my action item is that you, the customer, have to evaluate your skills as much as your existing application portfolio.
And if more of your greenfield apps can start in the cloud, and you're not religious about open source but you're more religious about the admin burden, the development burden, and your latency budget, then start focusing on the services that the cloud vendors originally created as standalone but are increasingly integrating, because their customers are leading them there. And then for those customers who have decades and decades of infrastructure and applications on-prem and need a pathway to the cloud, some of the vendors formerly known as Hadoop vendors, but for that matter any on-prem software vendor, are providing customers a way to run workloads in a hybrid environment or to migrate data across platforms. All right, so let me give a final action item here. Thank you, David Floyer and George Gilbert. Neil Raden, Jim Kobielus, and the rest of the Wikibon team are with customers today. We talked today about convergence at the software level. What we've observed over the course of the last few years is an expanding array of software technologies, specifically AI, big data, machine learning, et cetera, that are allowing enterprises to think differently about the types of problems they can solve with technology. That's leading to an explosion in the number of problems that folks are looking at, the number of individuals participating in making those decisions and thinking those issues through, and, very importantly, an explosion in the number of vendors with piecemeal solutions, each promoting what they regard as the best approach to doing things. However, that is going to impose a significant burden that could have enormous implications for years. And so the question is, will we see a degree of convergence in the approach to doing software, in the form of pipelines and applications and whatnot, driven by a combination of what the hardware is capable of doing, what skills make possible, and, very importantly, the natural attributes of the data?
And we think that there will be. There will always be tension in the model to try to invent new software, but one of the factors that's going to bring it all back to a degree of simplicity will be a combination of what the hardware can do, what people can do, and what the data can do. And so we believe pretty strongly that ultimately the issues surrounding data, whether it be latency or location, as well as development complexity and administrative complexity, are going to be the range of factors that dictate ultimately how some of these solutions start to converge and simplify within enterprises. As we look forward, our expectation is that we're going to see enormous net new investment over the next few years in pipelines, because pipelines are a first-level set of investments in how we're going to handle data within the enterprise. And they'll look, in certain respects, like DBMSs used to look, just in a disaggregated way; conceptually, administratively, and from a product and service selection standpoint, the expectation is that they themselves will have to come together so that developers can have a consistent view of the data that runs inside the enterprise. Want to thank David Floyer, want to thank George Gilbert. Once again, this has been Wikibon Action Item, and we look forward to seeing you on our next Action Item.