From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE. Now, here's your host, Dave Vellante.

Welcome to CUBE Conversations, everybody. We're going to have a conversation about how customers are gaining analytic insights and what role both infrastructure and software play in that equation. Jeff Kelly is here. He's a data strategist from Pivotal, and he's joined by Ted Bartos, who's a Senior Director of Product Management for Hybrid Cloud Platforms at Dell EMC. Gentlemen, welcome to theCUBE. Welcome back, both of you, both CUBE alums. Jeff, you've done a million CUBE interviews. I've done a few. Good to have you back here in the Wikibon studio. So, you guys have a big announcement. We're going to talk about that. We're just, you know, fresh off Strata, where we sort of had a full dose of what's going on in the ecosystem and some of the problems that we're seeing. So, Ted, let me start with you. Maybe you could summarize, coming off of Strata as well as what you've been seeing in the customer base, what are some of the big challenges that customers are facing, and then we can talk about how you guys are addressing those.

Sure, thanks, Dave. And thanks for having me. You know, coming off of Strata, and actually even over the last year of really working on directed availability of our big data solution, what we've really seen is a couple of key things. One, customers are looking for an end-to-end analytics lifecycle solution. They're looking for someone to deliver them the software componentry and the infrastructure that can take them from discovering data, ingesting it, the analytics, surfacing the insights, and then actioning those insights, so that they can wring the monetary value out of it and get the business value out of it. For that full analytics lifecycle, they also want it to be self-service, right? The people in IT that we're talking to, they really want to be the hero to the data scientists. Right now, they're underserving them just because the environment's so complex. When they get a request to stand up a Hadoop environment or a big data environment, it takes a long time, and then sometimes those data scientists, the data science professionals, go out to the cloud, and that's really underserving them. So they're looking to get to a governed, quota-managed self-service environment so they can really empower data scientists at the moment they want to investigate insights and prove out hypotheses. And then the third part of it really is to deliver that as a fully engineered, turnkey solution. At Pivotal, James Waters talks about being above the value line in development; there's also an above-the-value-line in big data. We really want to take the data science professionals and bring them to the point at which they're focusing on how they deliver value to the company, and we take the burden off of them in terms of the infrastructure and software stack gymnastics that they might otherwise need to do.
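(A minimal, purely hypothetical sketch of the governed, quota-managed self-service idea Ted describes; the class names, quota values, and provisioning flow below are illustrative assumptions, not the actual product's interfaces. The point is only that a self-service request is checked against a team's quota before an environment is stood up.)

```python
from dataclasses import dataclass

@dataclass
class Quota:
    max_environments: int
    max_storage_tb: int

# Hypothetical per-team quotas set by IT governance; a real platform would
# pull these from a policy service rather than a hard-coded dictionary.
QUOTAS = {"risk-analytics": Quota(max_environments=3, max_storage_tb=50)}
PROVISIONED = {"risk-analytics": []}  # environments already stood up, per team

def request_environment(team: str, storage_tb: int) -> str:
    """Self-service request: approved only if the team stays within its quota."""
    quota, existing = QUOTAS[team], PROVISIONED[team]
    used_tb = sum(env["storage_tb"] for env in existing)
    if len(existing) >= quota.max_environments:
        raise RuntimeError(f"{team} already has {len(existing)} environments")
    if used_tb + storage_tb > quota.max_storage_tb:
        raise RuntimeError(f"request would exceed the {quota.max_storage_tb} TB quota")
    env = {"name": f"{team}-env-{len(existing) + 1}", "storage_tb": storage_tb}
    existing.append(env)  # in a real system this would trigger automated provisioning
    return env["name"]

print(request_environment("risk-analytics", storage_tb=10))  # -> risk-analytics-env-1
```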
Jeff, when you think about sort of the evolution of the whole big data ecosystem, it sort of started with Hadoop. That really started it all, and people will debate, well, Hadoop doesn't equal big data and vice versa, but that really is where it got started, and the beginning was, what is Hadoop? Then we saw the distro wars, everybody was jumping in, and now the ecosystem has exploded. There's an explosion of complexity and a lack of applications. That's in part, anyway, why Pivotal was formed. So where are we, and how are you helping address both the complexity problem and the lack of applications?

Well, you're absolutely right. From a complexity perspective, I mean, if you were on the floor at Strata + Hadoop World last week, there were, I don't know, 100 vendors, half of them I hadn't heard of before, and a number from last year weren't there anymore. So there's a lot of complexity going on. There are a lot of different components required from a technology perspective to address the entire data lifecycle. So that's absolutely true. I think you need to step back, though, and what we tell customers is to think about, first and foremost, what's the business problem you're trying to solve. If you start there and then work back to the technology, I think that's probably the better approach, rather than what we saw in the early days, especially with Hadoop. People were deploying Hadoop and their so-called data lakes, and they were filling them with data, but ultimately they didn't know what to do with it, and it was very difficult to monetize that data because they just didn't know what the use case was. So, you know, when I was here at Wikibon, our research thesis was that the companies that are really gonna get the most value from data are gonna do it through applications. And Pivotal really was, in part, created to help facilitate that, to help that actually become a reality. So really going from managing your data to analyzing it, to coming up with insights, and then ultimately delivering those insights in context to users through applications. And if you think about the disruptors in Silicon Valley, they do this really well. Somebody like Uber, somebody like Netflix, I'm sure they use data in the traditional sense, in the background they're running data warehouses and reports, but ultimately they impact the customer experience by surfacing insights in their applications. So with Uber, you request a car, you get a little pop-up that says it'll be here in three minutes. There's a lot of data science behind that little pop-up. It seems simple, but there's a lot of data science happening, and ultimately you don't need to be trained on how to use that app. It's just in the context of what you're trying to do, which is get from point A to point B. So at Pivotal, we're trying to help our enterprise customers do the same thing. And that's really what we're all about.

Well, in the early days as well of Hadoop and big data, sort of the ROI was "reduction on investment." We sometimes joke, but it was true. It was that big sucking sound from the enterprise data warehouse. And you're right, people would just throw stuff in the data lake, and people, again tongue in cheek, joke about the data swamp, but there's a lot of truth to that. So what are you guys specifically announcing? How does it help people get insights from analytics, and how does it help with that sort of data flow, data pipeline?

Sure, thanks.
So we're announcing the Analytic Insights Module at Dell EMC World. And what it really provides is that full analytics lifecycle coverage. It provides capabilities for data scientists as they're looking to build out and discover the data that they need to fill out their analytics model. It provides them the ability to see what's in the data lake, as well as to go out and sample data stores and content stores in the enterprise and even in the cloud, and, within a button click, to see how that data model is coming together in completeness and to see the relationships. So for the full analytics lifecycle, on the front end, one of the key bottlenecks is that 80% of a data scientist's time is spent finding, blending, and wrangling data. So we're focused on the full lifecycle, with real help on the front end, so that we can help them not necessarily have to push everything into the data lake. What's in the data lake is high-use data, and then, on demand, they're able to investigate, sample, analyze, and bring in additional data. So: the full lifecycle, and a tailored user experience. We built the user interface using Pivotal Cloud Foundry. It's a PCF app, and it's really a tailored user interface to those data science personas: the data scientist, the data engineer, the data architect, the business analyst. And we deliver that as a solution, as a buy-versus-build proposition to our customers, both the converged infrastructure as well as the solution stack on top that provides the key solution components. That includes the data investigation piece, hitting that 80% of time; we also have the data ingestion module that we're partnering with a company on; and then one of the key things, especially as we penetrate financial services, healthcare, the highly governed verticals, is a really strong security component with very deep, granular security and attribute-based access control, so we can address things like data sovereignty at that level. So it's the full lifecycle, the tailored, persona-directed interface to really speed the discovery and delivery of insights. We're putting that on the converged infrastructure platform, and then, because we're a PCF app and we have our native hybrid cloud solution, which is PCF on our converged infrastructure, the Analytic Insights Module sits on top of that. So when the data scientists build their data model and their analytic model, they're on that same platform, seamlessly working side by side, shoulder to shoulder, with the app developer, who can then bind to that data service, connect to it, and deliver the applications like Jeff was talking about, those Uber-style applications, the ones that'll monetize it, right? Applications in a broad sense: consumer- and market-facing software applications that give you a deeper, more timely relationship with the customer; embedded systems; software applications that improve operational efficiency, like predictive maintenance or IT operations analytics; but then also even dashboards or reports that come out of software to help businesses make better strategic decisions.

So it's an end-to-end solution based on the premise that data's going to be wherever the data is. You're bringing in models to actually help people understand what should go into the data lake, and then you've got tooling to get insights out of the data and, ideally, operationalize those.
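(As a rough illustration of the "completeness" check Ted mentions, here is a generic pandas sketch, not the product's own profiling logic: it scores how fully each attribute of a sampled source is populated before the data is pulled into the lake. The column names and thresholds are made up for the example.)

```python
import pandas as pd

# Toy sample standing in for data profiled from a content store or the lake.
sample = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "country":     ["DE", "US", None, "FR"],
    "balance":     [250.0, None, 90.0, None],
})

# Per-column completeness: the share of non-null values for each attribute.
completeness = sample.notna().mean()
print(completeness)   # customer_id 1.00, country 0.75, balance 0.50

# A simple gate a data scientist might apply before pulling a source into the lake:
# the model needs customer_id fully populated and country at least 90% populated.
required = {"customer_id": 1.0, "country": 0.9}
ready = all(completeness[col] >= threshold for col, threshold in required.items())
print("meets model requirements:", ready)   # False here: country is only 75% complete
```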
So sort of an end-to-end solution, integrated sort of out of the box, if you will. What piece of that is Pivotal?

So Pivotal Cloud Foundry is the platform on which this runs. Pivotal Cloud Foundry is a cloud-native platform that supports that kind of agile application development process and continuous delivery, so you're continuously shipping code, that kind of modern application and software development methodology. So that's one place where Pivotal comes into play. From an analytics perspective, the solution can call out to different data sources. That can include Pivotal Greenplum; Pivotal HDB, which is our Hadoop-native SQL analytics database; it could even include more transactional-type systems like Pivotal GemFire, which is an in-memory data grid. So it plays a role both on the platform side as well as on the analytics side.

Okay, and what about other ecosystem partners that are part of the solution? Maybe you could talk about that a little bit.

Sure. So when we went out with the directed availability of our big data solution, we really iterated on the model with customers, and that's what drove us to that persona-focused interface. You know, get the key persona users, the data scientists, the data engineers, above the value line; get them really focused on how we speed their time. And so when we took a look at the key components for the data discovery portion of it, and the ability to not only see and search what's in the lake and in the enterprise data catalog but also go out and discover additional content stores in the enterprise or even in the cloud, we partnered with Attivio, a company out of Newton with a long pedigree in enterprise search; they've really been focusing on big data and the data discovery portion. For the ingestion capability, when we took a look at ingestion, we wanted a couple of things. One, someone with a really open platform, like a workflow engine for ingestion, so that you could use different approaches, whether it's streaming, whether it's batch, whether it's Spring XD or Kafka or whatever you want to use there, and those could be woven into a workflow. We chose Zaloni as our partner for that: open API, great workflow, great pedigree in the market. Another thing about them and Attivio is that both have a long pedigree of services delivery to the market, so they really get it, right? They've been out at companies really solving problems, and pragmatically, that's how we want to approach it, very pragmatically. On the security side, as we're bringing all this data together, and you have your data lake and then you're going to go out to content stores and sample them, as soon as we're working with a financial institution or anybody in HIPAA or healthcare, they're saying, well, okay, great, you can break down the silos, you can bring all the data together, but how do I make sure that only the right people can see the data, right? I think the statistic is that in breaking down silos in big data, north of 70% of people end up seeing data they shouldn't have access to. So we partnered with a company called BlueTalon. They were at Strata + Hadoop, they've won awards at the last few, and they provide very deep, very granular cell-, row-, and column-level security. They provide the ability to obfuscate, redact, or even tokenize data, so you can still do joins with tokenized data. But it's all policy-driven, and it's attribute-based access control. So it goes beyond role-based access control; it works off attributes.
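(To make the attribute-based access control idea concrete, here is a minimal, hypothetical sketch of a policy-enforcement check. The attribute names, policy shape, and tokenization scheme are illustrative assumptions, not BlueTalon's actual interfaces.)

```python
import hashlib

# Hypothetical user attributes, e.g. as they might be pulled from AD/LDAP.
USER_ATTRS = {"jdoe": {"role": "analyst", "country": "DE"}}

# One illustrative data-sovereignty policy: rows authored in Germany may only be
# read by users located in Germany, and account numbers are tokenized for everyone.
POLICY = {
    "row_filter": lambda user, row: row["origin_country"] != "DE"
                                    or user["country"] == "DE",
    "tokenize_columns": ["account_number"],
}

def tokenize(value: str) -> str:
    # Tokenization (unlike redaction) keeps values joinable: the same input
    # always maps to the same opaque token.
    return "tok_" + hashlib.sha256(value.encode()).hexdigest()[:10]

def enforce(username: str, rows: list) -> list:
    """Apply the policy at a single enforcement point before data is returned."""
    user = USER_ATTRS[username]
    visible = [dict(r) for r in rows if POLICY["row_filter"](user, r)]
    for row in visible:
        for col in POLICY["tokenize_columns"]:
            row[col] = tokenize(row[col])
    return visible

data = [
    {"origin_country": "DE", "account_number": "DE-123", "balance": 900},
    {"origin_country": "US", "account_number": "US-456", "balance": 120},
]
print(enforce("jdoe", data))  # the German analyst sees both rows, with tokenized account numbers
```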
So you can say, for any attribute of someone in AD or LDAP, what country are they in, right? Think of data sovereignty: data that's authored in Germany needs to only be viewed by people in Germany. So you're able to have a very flexible, very rich, very deep security model that you can apply to the different content sources through a policy enforcement point. And for those three companies, when we went out on top of that experience and looked at that workflow, we first went through the workloads, the key needs. Then we identified the key characteristics and capabilities that we wanted. Then we went out and looked at a range of buy, build, partner options. And that's how we came up with the three partners we chose. It was really pragmatic.

And I purchase this solution from Dell EMC, right? I don't have to cobble together all these pieces.

Ah, yeah, great point, yes. So one of the key assets of us delivering a fully engineered, validated solution is that there's a single line of support. Some of that complexity, as Jeff mentioned, you walk around Strata + Hadoop and you see all of the different applications, that's where the complexity comes in, and it's what's really stalled a lot of companies in setting up the right environment. So we set up that environment, we engineer it, we test it, and we stand right behind it with a single line of support. So for the end-user customer, this rolls up on the dock, it gets set up, it gets stood up, you have the full workflow, you have the personas, and you only have one number to call if anything happens to the system.

Jeff, I want to explore this notion of personas. So you have the data scientist, you have the data engineer, you have a business analyst, you have an application developer. How are you guys seeing those personas evolve? How are they related? There are gray lines; unpack that for us.

Well, that's part of what really excites me about this solution, that it helps bring that world of data and applications together. Traditionally those worlds were very separate. You had your data warehouse team, your BI team, creating reports, ingesting data, cleaning the data, and then you had your application development team that was creating your enterprise applications, and the two very rarely talked. Data went from your transactional, operational applications over to the data warehouse, but other than that, there wasn't much of a relationship. Insights from your data warehouse very rarely made it back to your applications. So what I really like about this solution is that it tries to bring those worlds together. In terms of how those roles are blurring, you know, it's still evolving. I mean, if you go to most enterprises, it's still the case that they don't talk very much. Part of what we're trying to do at Pivotal, our mission is to transform how the world builds software, and kind of bring the way Silicon Valley companies build software to the enterprise. And if you go to a Facebook or a Google, their application developers and their data people don't sit separately. They're working very closely together. They don't look at it as one or the other. And so we're trying to bring that mentality to enterprises. And frankly, it's a learning process for a lot of our customers. But we, you know, we work very closely.
We do that through our Pivotal Labs organization and our Pivotal Data Science organization, working side by side with our clients' data scientists and application developers, helping them learn to work together: educating application developers to start thinking, when they're building applications, about where analytics and insights might actually improve the application, to start thinking about the outcomes they're trying to achieve and how data might be able to help; and similarly, from a data science perspective, to start thinking, this isn't just a science experiment. As you're getting to the point where you're creating some really interesting insights, think about how this could actually impact the business. And then creating those lines of communication between the two. So it's just as much about culture and communication as it is about platforms and technology.

Well, and as well, to what extent can you automate that cleansing part, where data scientists spend 80% of their time wrangling, you know, and figuring out if this data even has the quality necessary, which in part, by the way, has to involve the line of business, to say, okay, is this right? How can you help, or are you helping?

Well, I think the way we look at it is, in a lot of ways, you can look to new advances in machine learning to help with that, to help classify data. That's really a role that it can play. But there are still areas where it's somewhat a manual process. There are tools out in the market that can help you cleanse data; there are data quality tools and data quality vendors out there. I think from Pivotal's perspective, if you look at our data science team, for example, when they're doing engagements, they really try to put machine learning and data science itself to work to solve the data quality problem. Machine learning can help you understand what data you have, can help classify data, things like that. So that's one way to approach it, and I think ultimately that's the direction the market's going, versus kind of more manual tooling. It's still pretty early; there's a lot left to do to crack that nut. But I think machine learning is going to play an important role in data quality.

Ted, you're part of the solutions organization at Dell EMC, which has had sort of a long history. What specific industries are you going after? You're building solutions for, you know, those various industries?

Yeah, great. We are, and we're really looking to build out the platform very pragmatically. As Jeff said, the utility of big data really comes in how it improves your business performance. So what we're doing is really targeting specific verticals and using that to pragmatically build out and extend the platform itself. These really aren't problems that only affect one market, but we want to go in and deliver value in those particular verticals. And the verticals that we're targeting have shown a good appetite for consuming big data well and going into production with it: telco, telecommunications, healthcare, financial services, oil and gas, and manufacturing, because of the IoT influence on streaming analytics and also core analytics.
So we're really focusing on those markets, and the partners and some of the success that we've had, you know, partners in healthcare, where we really sped their ability to get to better health results for people using their services, those are good examples of how we're targeting a specific business outcome, delivering value, but then taking that back to the platform and extending it, so that it's just that much easier for the personas that are using it.

Historically, Jeff, if you go back, data and applications were very tightly intertwined, and all the data was locked inside the application. Then the enterprise data warehouse emerged, and the next generation of decision support systems came out, and you had business analysts, which was all well and good, but it was like insights for a few that, by the time they got to the business, were outdated. We all sort of know and lament that problem. It seems like now data is being associated with business outcomes, and to the extent that you can operationalize that notion... you know, I know data scientists don't like the term citizen data scientist because it sort of denigrates what they do, and, you know, anybody can be a data scientist if they just change their business card. But the concept of an analyst or a business person that can be at least a quasi, you know, data geek seems to be happening. So do you agree with that progression, data tied to business outcomes, and are we close to actually seeing that data operationalized and much more widely distributed throughout the organization to have business impact?

Well, I agree absolutely that any business knowledge worker can do a better job if they have better information at their fingertips. So that's absolutely true. The question, I guess, is how do you get to that point? Do you want your business users spending time wrangling data and analyzing it, or do you want to deliver insights to them that are either immediately actionable or can lead fairly quickly to them improving how they do their jobs? And to do that, it gets right back to that whole point about integrating insights into your applications, delivering them in context. So if I'm a salesperson, I'm in my CRM application all day. If I can deliver some insights through that application, that's not necessarily a bar chart or a graph, but actually gives you some recommendations on the next actions you can take. Maybe for this customer you should call them and make an offer for this, or maybe this is a customer that might be a little angry, and you might want to reach out to them. If you can turn those insights into actual recommendations that they can take action on, I think that's the most valuable. That said, there's still value in kind of self-service dashboards and visualizations. Companies like Tableau and Qlik are very popular because of that, and there's definitely still value in that as well, kind of taking a step back and saying, hey, I'm going to examine some data, see if I can find some insights. But I think on a day-to-day basis, we're all short on time, and the more you can use insights to actually help people be more efficient and do their jobs in a smarter way, I think that's really where the big benefit comes in for business users.

How, go ahead please.
Yeah, I just wanted to interject one thing on the platform, in response to that and to your question directly. One of the things we found is that the ability to turn raw data, in the enterprise or out in the cloud, into a real digital asset is one of the key things we kept hearing from customers. So one of the capabilities we have, to kind of speak to your question: you have the data scientists, and they're building these models and targeting a specific outcome, but that data model in itself, once they've pulled that data together, blended it, wrangled it, cleansed it, we have the ability in the platform, at the customer's discretion, to actually publish it into a data catalog that we have. So it's one way to leverage the data scientists to do some of the deep data science, but then take that model and publish it, with full security, so others can bind a Tableau to it, or if they want to do some further analytics or whatever, they have a good starting point instead of having to start all the way from the beginning and find the data. And as we've talked with customers, a couple of things have been wow factors. The self-service is good, IT loves it because now they can be the hero to the data scientists and the business while maintaining governance, but one of the wow factors is the publishing, the ability to do that so raw data becomes an asset. And then the other one is really the interaction between the Analytic Insights Module and native hybrid cloud: when we show customers the fact that they can take those data models and bind to them directly, and really break down that barrier between data science and application development with a deliberately shared platform, that's the one where they're like, okay, we want to see you demonstrate that to us. That is really exciting, right?

Yeah, I mean, I think if you're a data scientist, what could be more frustrating than doing your work, spending months on a project, coming up with a great insight, you've built this great predictive model, and then it never gets implemented. It dies in PowerPoint somewhere. You give a presentation to the board or the C-level and they're like, this is great, but there's no way to actually turn that into an application that's gonna do something to help your business. I think that's one of the big challenges the industry faces, and I think that's one area where we're really trying to help.

And being able to improve that over time. Oh, absolutely. It's a learning process. The more interactions that happen, the smarter the algorithm becomes. And that metadata catalog is your technology? That's the Dell EMC piece?

It is. That's what we built on top of that infrastructure. We have the partner software, and we have the solution software that really provides the workflows; we're actually patenting some of those workflows and integrations as a method-and-apparatus patent. We built the data catalog on top. In the data catalog, one of the things you're gonna immediately do as a data scientist is say, well, I don't wanna start from the beginning. What's in the lake? What's in my catalog? For that, we do leverage the search capabilities of Attivio. You have a single place to go in the platform where you can say, hey, I'm looking for customer data from this particular region or whatever.
And that'll search across the lake, across any of the attached content stores, and in that data catalog, so we can present you with, hey, I've got curated data already that has the majority of the content you're looking for. And there's no faster way to speed past an 80% bottleneck than to take it out of the game.

You mentioned machine learning before. Are we getting to the point where machine learning and those techniques can actually recommend to the data scientist which algorithm to use and what the best fit is? I mean, are we there?

It's a little meta, but yes, we're getting to that point. In fact, I think one of the real killer use cases for machine learning is the data quality question, data classification, kind of the discovery process where you're just trying to figure out what you have. Machine learning can play a huge role in that. And in fact, we're seeing that with our customers. Our data science team has been around for years now, started way back in the Greenplum days, and has now grown into quite a large team. They use those kinds of techniques, whether it's something like Apache MADlib or other machine learning libraries, to actually do that kind of early classification process.

We're about to run out of time, Ted, but go ahead, please.

Just one last thing on there. As Dell and EMC came together, one of the key areas that is just a beautiful peanut-butter-and-chocolate, one-plus-one-equals-three story is around IoT, right? It's around Dell's presence in the gateways and our presence in the core, and being able to weave those together. The reason I bring that up is that's a big area for machine learning, right? The ability to have your analytic model at the edge, giving you near-real-time feedback at the edge, but then taking longer-term time series and more of the data back to the core, and then using machine learning to actually improve that analytic model out at the edge. So we really have this beautiful situation where we can have a continuously improving loop, right, through deep learning.

Well, it opens up the door for a manufacturing vertical solution as well, where, you know, there's a backlash against a lot of the products that are coming from overseas. Consumers in the United States are saying, wow, they don't have the quality that I expected. And you are seeing a slow shift in some industries back to onshore manufacturing. Perhaps data science can either solve the problem of quality overseas or bring some of the manufacturing back to the States. We'll see. So before we close, you mentioned directed availability a couple of times. What's the availability of these products? When do they go GA?

So we're going to be launching and announcing the Analytic Insights Module, analytic insights on native hybrid cloud, at Dell EMC World, which runs from October 18th to the 20th. And the GA, the complete availability, will be in the first week of November. So people will be able to go to their Dell EMC salesperson, and it will be out in general availability.

And how do you price these solutions? Is it sort of small, medium, large? Is it?

Well, it is. We have models, because it's such a scalable business, right? Either way: how many Pivotal Cloud Foundry application instances do you need? How much storage do you need for your big data? How many cores do you need to process that big data?
We price it. We have kind of a price point as a starting point for companies: one AI pack, so 50 AI instances; 75 terabytes of usable HDFS storage; about 90 or so cores on a converged infrastructure platform. And that's really kind of the seed. A lot of companies can start right there. They have everything, we deliver it, they open it up, and then they can scale out in any direction: more Pivotal Cloud Foundry application instances, more storage, or more cores. And so we're prepared to go on that journey with them and give them a really nice model that they can extend in any direction.

So that's the starter kit. Have you announced pricing for that starter kit at this point?

We'll announce it at Dell EMC World.

All right, gents, great to see you again. Thanks very much for coming on theCUBE. Jeff, welcome back. Good to have you. Thanks for watching, everybody. We'll see you at Dell EMC World. This is theCUBE. Thanks for watching CUBE Conversations.