Hi, I'm Peter Burris, and once again welcome to a Wikibon Action Item. We are again broadcasting from the beautiful theCUBE studios here in Palo Alto, California, and we're joined today by a relatively larger group. So let me take everybody through who's here in the studio with us: David Floyer, George Gilbert. Once again, we've been joined by John Furrier, who's one of the key CUBE hosts, and on the remote system are Jim Kobielus, Neil Raden and another CUBE host, Dave Vellante. Hey guys. Hi there. Good to be here. So one of the reasons why we have a little bit larger group here is because we're going to be talking about a community gathering that's taking place in the big data universe in a couple of weeks. Large numbers of big data professionals are going to be descending upon Strata for the purposes of better understanding what's going on within the big data universe. Now, we have run a CUBE show next to that event in which we get the best thought leaders that are possible at Strata, bring them onto theCUBE and really help separate the signal from the noise that Strata has historically represented. We want to use this show to preview what we think that signal is going to be so that we can help the community better understand what to look for, where to go, what kinds of things to be talking about with each other so that it can get more out of that important event. Now, George, with that in mind, what's kind of the top-level thing? If there was one thing that we'd identify as something that was different two years ago or a year ago and is going to be different at this show, what would we say it would be? Well, I think the big realization this year is that we're starting with the end in mind. 
We know the modern operational analytic applications that we want to build, ones that anticipate and/or influence a user interaction, or inform or automate a business transaction. For several years, we were experimenting with big data infrastructure, but that wasn't solution-centric. It was technology-centric, and we kind of realized that the sort of do-it-yourself, assemble-your-own-kit, open-source big data infrastructure created too big a burden on admins. Now we're at the point where we're beginning to see a more converged set of offerings take place, and by converged, I mean an end-to-end analytic data pipeline that is sort of uniform for developers, uniform for admins and, because it's pre-integrated, is lower latency and helps you put more data through one single analytic latency budget. That's what we think people should look for. Right now, though, the hottest new tech-centric activity is around machine learning, and I think the big thing we have to do is recognize that we're sort of at the same maturity level as we were with big data several years ago. People should, if they're going to work with it, start with the knowledge for the most part that they're going to be experimenting, because the tooling isn't quite mature enough. We don't have enough data scientists for people to be building all these pipelines bespoke, and we don't yet have a high volume of third-party applications where this is embedded. So if I can kind of summarize what you're saying, we're seeing a bifurcation occur within the ecosystem associated with big data: one side is driving towards simplification of the infrastructure, which increasingly is being associated with the term big data, and the other towards new technologies that can apply that infrastructure and that data to new applications, including things like AI, ML and DL, where we think about modeling and services and a new way of building value. Now that suggests that one or the other is more or less hot. 
But Neil Raden, I think the practical reality is that here in Silicon Valley we've got to be careful about getting too far out in front of our skis. At the end of the day, there's still a lot of work to be done on how you simply do things like move data from one place to the other in a lot of big enterprises. Would you agree with that? Oh, absolutely. I've been talking to a lot of clients this week. And we don't talk about the fact that they're still running their business on what we would call legacy systems and they don't know how to get out of them or transform from them. So they're still starting to plan for this. But the problem is it's like talking about the 27 rocket engines on the whatever it was that he launched into space, launching a Tesla into space. You can talk about the engineering of those engines and that's great. But what about all the other things you have to do to get that car into space? And it's the same thing. A year ago we were talking about Hadoop and big data and, to a certain extent, machine learning, maybe more data science. But now people are really starting to say, how do we actually do this? How do we secure it? How do we govern it? How do we get some sort of metadata or semantics on the data we're working with so people know what they're using? I think that's where we are in a lot of countries. Great. So that's great feedback, Neil. So as we look forward, Jim Kobielus, the challenges are associated with what it means to improve the facilities of your infrastructure but also use that as a basis for increasing your capability on some of the new application services. What are we looking for? What should folks be looking for as they explore the show in the next couple of weeks on the ML side? What new technologies? What new approaches? Going back to what George said, we're in experimentation mode. What are going to be the experiments that are going to generate the greatest results over the course of the next year? 
Yeah, for the data scientists who flock to Strata and similar conferences, automation of the machine learning pipeline is super hot in terms of investments by the solution providers. Everybody from Google to IBM to AWS and others is investing very heavily in automation of not just the data engine. That problem's been handled a long time ago. It's automation of more of the feature engineering and the training. These very manual, often labor-intensive jobs have to be sped up and automated to a great degree to enable the magic of productivity by the data scientists and the new generation of app developers. So look for automation of machine learning to be a super hot focus. Related to that, look for a new generation of development suites that focus on DevOps, speeding machine learning and DL and AI from modeling through training, evaluation, deployment and iteration. We've seen a fair upswing in the number of such toolkits on the market from a variety of both startup vendors, like the DataRobots of the world, and companies like AWS with SageMaker, for example. That's hot. Also look for development toolkits that automate more of the code generation, you know, low-code tools, but the new generation of low-code tools, as highlighted in a recent Wikibon study, use ML to drive more of the actual production of fairly decent, good-enough code as a first rough prototype for a broad range of applications. And finally, we're seeing a fair amount of ML-driven code generation inside of things like robotic process automation, RPA, which I believe will probably be a super hot theme at Strata and other shows this year going forward. So you mentioned the idea of better tooling for DevOps and the relationship between big data and ML and whatnot and DevOps. 
One of the key things that we've been seeing over the course of the last few years, and it's consistent with the trends that we're talking about, is increasing specialization in a lot of the perspectives associated with changes within this marketplace. So we've seen other shows emerge that have been very, very important, and that we are participating in, places like Splunk, for example, that is at the vanguard in many respects of a lot of these trends in big data and how big data can be applied to business problems. Dave Vellante, I know you've been participating in a number of these shows. How does this notion of specialization inform what's going to happen in San Jose, and what kind of advice and counsel should we give people to continue to explore beyond just what's going to happen in San Jose in a couple of weeks? Well, you mentioned Splunk as an example, a very sort of narrow and specialized company that solves a particular problem and has a very enthusiastic ecosystem and customer base around that problem, log files to solve security problems, for example. I would say Tableau is another example, heavily focused on viz. So what you're seeing is these specialized skill sets that go deep within a particular domain. I think the thing to think about, especially when we're in San Jose next week, is as we talk about digital disruption, what are the skill sets required beyond just the domain expertise? So you're sort of seeing these bifurcated skill sets really coming into vogue, where somebody understands, for example, traditional marketing, but they also need to understand digital marketing in great depth and the skills that go with it. So it's sort of a two-tool player. We talk about the five-tool player in baseball; it's at least a multi-dimensional skill set in digital. And that's likely to occur not just in a place like marketing, but across the board. 
David Floyer, as folks go to the show and start to look more specifically at this notion of convergence, are there particular things that they should think about? To come back to the notion of, well, you know, hardware is going to make things more or less difficult for what the software can do, and software is going to be created that will fill up the capabilities of hardware. What are some of the underlying hardware realities that folks going to the show need to keep in mind as they evaluate, especially on the infrastructure side, these different infrastructure technologies that are getting more specialized? Well, if we look historically at the big data area, the solution has been to put in very low-cost equipment as nodes, lots of different nodes, and move the data to those nodes so that you get a parallelization of the data handling. That is not the only way of doing it. There are good ways now where you can in fact have a single version of that data in one place in very high-speed storage, on flash storage, for example, and where you can allow very fast communication from all of the nodes directly to that data. And that makes things a lot simpler from an operational point of view. So using the current batch automation techniques that are in existence and looking at those from a new perspective, which is how do I apply these to big data, how do I automate these things, can make a huge difference in just the practicality and the elapsed time for some of these large training things. Yeah, I was going to say that in so many respects what you're talking about is bringing things like training under a more traditional operational set of disciplines. Very important. So John Furrier, I want to come to you and say that there are some other technologies that are the bright shiny objects, and people think that they're going to be the new kind of Harry Potter technologies of magic everywhere. 
Blockchain is certainly going to become folded into this big data concept because blockchain describes how contracts, ownership and authority ultimately get distributed. What should folks look for as blockchain starts to become part of these conversations? That's a good point, Peter. My summary of the preview for Big Data SV, Silicon Valley, which includes the Strata show, is two things. Blockchain points to the future and GDPR points to the present. GDPR is probably one of the most fundamental impacts to the big data market in a long time. People have been working on it for a year. It is a nightmare. The technical underpinnings of what companies have to do to comply with GDPR are a moving train. It's complete BS. There's no real solutions out there. So if I was going to tell everyone to think about that and what to look for: what is happening with GDPR? What's the impact on the databases? What's the impact on the architectures? Everyone is faking it till they make it. No one actually really has anything, in my opinion, from what I could see. So it's a technical nightmare. Where was that database? So it's going to impact how you store the data, and the sovereignty issue is another issue. So the blockchain then points to the sovereignty issue of the data in terms of the company, the country, and the user. These things are going to impact software development, application development, and ultimately cloud choice and the IoT. So to me, GDPR is not just a one-and-done thing, and blockchain is kind of a future thing to look at. So I would look through those two lenses and say, do you have a direction or a narrative that supports me today with what GDPR will impact throughout the organization? And then what's going on with this new decentralized infrastructure and the role of data and the sovereignty of that data with respect to company, country, and user? So to me, that's a big issue. 
So George Gilbert, if we think about this question of these fundamental technologies that are going to become increasingly important here, database managers are not dead as a technology. We've seen a relative explosion over the last few years in at least invention, even if it hasn't been followed with, as Neil talked about, very practical ways of bringing new types of disciplines into a lot of enterprises. What's going to happen with the database world, and what should people be looking for in a couple of weeks to better understand how some of these data management technologies are going to converge or evolve? It's a topic that will be of intense interest and relevance to IT professionals because it's become the common foundation of all modern apps. But I think what we can do is we can see, for instance, a leading indicator of what's going to happen with the legacy vendors, where we have in-memory technologies for both transaction processing and analytics. And we have more advanced analytics embedded in the database engine, including machine learning, the model training as well as model serving. But what happened in the big data community is that we disassembled the DBMS into the data manipulation language, which was an analytic language that could be Spark, could be Flink, even Hive. We had the catalog, which I think Jim has talked about or will be talking about, where it's not just a dictionary of what's in one DBMS, but it's a whole way of tracking and governing data across many stores. And then there's the storage manager, which could be the file system, an object store, or could be just something like Kudu, which is a, you know, MPP way of performing a bunch of operations in parallel on data that's stored. The reason I bring all this up is following on David's comment about the evolution of hardware. Databases are fundamentally meant to expose capabilities in hardware and to mediate access to data using these hardware capabilities. 
And now that we have what's emerging as this unigrid, with memory-intensive architectures and super low latency to get from any point or node on that cluster to any other node, like with only a five microsecond lag relative to previous architectures, we can now build databases that scale out with the same sort of knowledge base that we used to build databases that scale up. In other words, it democratizes the ability to build databases of enormous scale. And that means that we can have analytics and the transactions working together at very low latency without binding them. All right, so I think it's time for the action items. We've got a lot to do, so guys, keep it really tight, really simple. David Floyer, let me start with you, action item. So the action item on big data should be to focus on technologies that are going to reduce the elapsed time of solutions in the data center. And there are many of them, but it's becoming a production problem, so treat it as a production problem and put in the fundamental procedures and technologies to succeed. And look for vendors that can do that. Who can do that? Yes. George Gilbert, action item. So I talked about convergence before. The converged platform's center of gravity is now shifting to continuous processing, where the data lake is a reference data repository that helps inform the creation of models, but then you run the models against the streaming continuous data for the freshest insights. Okay, Jim Kobielus, action item. Yeah, focus on developer productivity in this new era of big data analytics. Specifically, focus on the next generation of developers, who are data scientists, and specifically focus on automating most of what they do so they can focus on solving problems and sifting through data. All the grunt work of training and all that stuff can be taken care of by the infrastructure, the tooling. 
Neil Raden, action item. Well, one thing I learned this week is that everything we're talking about is about the analytical problem, which is how do you make better decisions and take action, but companies still run on transactions. And it seems like we're running on two different tracks and no one's talking about the transactions anymore, which is like the tail wagging the dog. Okay, John Furrier, action item. Action item is dig into GDPR. It is a really big issue. If you're not proactive, it could be a nightmare. It's going to have implications that are going to be far-reaching in the technical infrastructure. It's like what Sarbanes-Oxley did for public companies. This is going to be a nightmare. And evaluate the impact of blockchain. Two things. Dave Vellante, action item. So we often say that digital is data, and just because your industry hasn't been upended by digital transformations, don't think it's not coming. So it's maybe comfortable to sit back and say, well, we're going to wait and see. Don't sit back and wait and see. All industries are susceptible to digital transformation. All right, so I'll give the action item for the team. We've talked a lot about what to look for in the community gathering that's taking place next week in Silicon Valley around Strata. Our observations as the community descends upon this, and what to look for: number one, we're seeing a bifurcation in the marketplace, in the thought leadership and in the tooling. One group is going more after the infrastructure, where it's focused more on simplification and convergence. Another group is going more after the developer, AI and ML, where it's focused more on how to create models, training those models, and building applications with the services associated with those models. Look for that. Be careful about vendors who say that they do it all. Be careful about vendors that say that they don't have to participate in a converged approach to doing this. 
The second thing I think we need to look for, very importantly, is that the role of data is evolving and data is becoming an asset. The tooling for driving velocity of data through systems and applications is going to become increasingly important, and the disciplines necessary to ensure that the business can successfully do that with a high degree of predictability, including bringing on new production systems, are also very important. A third area that we take a look at is that ultimately, the impact of this notion of data as an asset is going to really come home to roost in 2018 through things like GDPR. As you scan the show, ask a simple question: who here is going to help me get up to compliance and sustain compliance as the understanding of privacy, ownership, etc. of data in a big data context starts to evolve? Because there's going to be a lot of specialization over the next few years. There's a final one that we might add. When you go to the show, do not just focus on your favorite brands. There's a lot of new technology out there, including things like blockchain, that's going to have an enormous impact ultimately on how this marketplace unfolds. The kind of miasma that's occurred in big data is starting to specialize. It's starting to break down, and that's creating new niches and new opportunities for new sources of technology, while at the same time reducing the focus that we currently have on things like Hadoop as a centerpiece. A lot of convergence is going to create a lot of new niches, and that's going to require new partnerships, new practices, new business models. Once again, guys, I want to thank you very much for joining me on Action Item today. This is Peter Burris from our beautiful Palo Alto theCUBE Studio. This has been Action Item.