Live from Midtown Manhattan, it's theCUBE, covering Big Data New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. Okay, welcome back everyone. We're here in New York City. This is theCUBE's exclusive coverage of Big Data NYC in conjunction with Strata Data going on right around the corner. It's our third day of wall-to-wall coverage, talking to all the influencers, CEOs, entrepreneurs, people making it happen in the Big Data world. I'm John Furrier, the co-host of theCUBE, with my co-host Jim Kobielus, who's the lead analyst at Wikibon for Big Data. Nenshad Bardoliwalla... Bardoliwalla? Nenshad Bardoliwalla. That guy. Okay, here goes. Paxata. Paxata co-founder and chief product officer. It's a tongue twister. Third day, originally from Jersey, it's hard with that R accent, but thanks for being patient with me. Happy to be here. And Pranav Rastogi, product manager of Microsoft Azure. Guys, welcome back to theCUBE. Good to see you. Thank you very much. Thank you for that third-day blues here. Guys, so Paxata, we had your partner on. Prakash. Prakash, really a success story. You guys have done really well, launched on theCUBE four years ago, so it's been fun to watch you go from launching to success. Obviously, the relationship with Microsoft is super important. Talk about the relationship, because I think this is where people can start connecting the dots. Sure, maybe I'll start and I'll be happy to get Pranav's point of view as well. Obviously, Microsoft's one of the leading brands in the world, and there are many aspects of the way that Microsoft has thought about their product development journey that have been very critical to the way we've thought about Paxata as well. If you look at the number one tool that's used by analysts the world over, it's Microsoft Excel, right? There isn't even anything that's a close second.
And if you look at the evolution of what Microsoft has done in many layers of the stack, whether it's the end-user computing paradigm that Excel provides to the world, or all of their recent innovation in both hybrid cloud technologies and the big data technologies that Pranav is part of managing, we just see a very strong synergy in combining the usage by business consumers of being able to take advantage of these big data technologies in a hybrid cloud environment. So there's a very natural resonance between the two companies. We're very privileged to have Microsoft Ventures as an investor in Paxata. And so the opportunity for us to work with one of the great brands of all time in our industry was really a privilege for us. Yeah, and that's the corporate venture side. So it's a different part of Microsoft, which is great. You also have a business opportunity with them. We do. Obviously the data science problem we're seeing is that they need to get to the data faster. All that prep work seems to be the big issue. It does. And maybe we can get Pranav's point of view from the Microsoft angle. Yeah, so to continue what Nenshad was saying, data prep in general is a key core component which is problematic for lots of users, especially around the knowledge that you need to have in terms of the different tools you can use. Folks who are very proficient will do ETL or data prep-like scenarios using one of the computing engines like Hive or Spark. That's good, but there's this big audience out there who like an Excel-like interface, which is easy to use, a very visually rich graphical interface where you can drag and drop and click through. And the idea behind all of this is: how quickly can I get insights from my data? Because in the big data space, it's volume, variety and velocity. So data is coming at a very fast rate. It's changing, it's growing.
And if you spend a lot of time just doing data prep, you're losing the value of the data, or the value of the data will change over time. So what we're trying to do with enabling Paxata on HDInsight is enabling these users to use Paxata and get insights from data faster by solving the key problems of data prep. So data democracy is a term that we've been kicking around, and you guys were talking about it as well. What does that actually mean? Because what we've been teasing out over the first two days here at theCUBE and Big Data NYC is that it's clear the community aspect of data is growing on a path similar to what you're seeing with open source software. That genie's out of the bottle: open source software is tier one, it won, and it's only growing exponentially. That same paradigm is moving into the data world, where collaboration is super important in this data democracy. What does that actually mean, and how does that relate to you guys? So the perspective we have, first, something that one of our customers said, is that there's no democracy without certain degrees of governance. We all live in a democracy, and yet we still have rules that we have to abide by. There are still policies that society needs to follow in order for us to be successful citizens. So when a lot of folks hear the term democracy, they really think of the Wild Wild West. And a lot of the analytic work in the enterprise does have that flavor to it. People download stuff to their desktop. They do a little bit of massaging of the data. They email that to their friend, their friend makes some changes. And next thing you know, we have what some folks affectionately call spreadmart hell, right?
But if you really want to democratize the technology, you have to wrap not only the user experience like Pranav described into something that's consumable by a very large number of folks in the enterprise, you have to wrap that with the governance and collaboration capabilities so that multiple people can work off the same data set, so that you can apply the permissions: who is allowed to share with each other, and under what circumstances are they allowed to share? Under what circumstances are you allowed to promote data from one environment to another? It may be okay for someone like me to work in a sandbox, but I cannot push that to a database or HDFS or Azure Blob Storage unless I actually have the right permissions to do so. So I think what you're seeing is that technology in general always goes on this trend towards democratization, whether it's the phone, the television, or the personal computer, and the same thing is happening with data technologies, certainly with companies like Paxata. We were talking about this when you were on theCUBE yesterday, and I want to get your thoughts on it. The old way to solve the governance problem was to put data in silos, right? That was an easy one. Just put it in a silo, take care of it, and your access control was separate. But now the value of the data is about cross-pollinating: making it freely available, horizontally scalable, so that it can be used. But at the same time, you need to have a new governance paradigm. So you've got to democratize the data by making it available, addressable, and usable for apps. At the same time, there are all sorts of concerns about how to make sure it doesn't get into the wrong hands, and so on and so forth. And that is also very common with running open source projects in the cloud as well.
How do you ensure that the user authorized to access this open source project, or to run it, has the right credentials and is authorized? The benefit that you get in the cloud is a centralized authentication system, Azure Active Directory. So most enterprises will have Active Directory users, who they'll authorize to access maybe this cluster or this workload, and they can run this job. And that further goes down to the data layers, where we have access-control policies which then describe what user can access which files or folders as well. So if you think about the end-to-end scenario, there is authentication and authorization happening for the entire system, on what user can access what data. And part of what Paxata brings into the picture is how you visualize this governance flow as data is coming from various sources. How do you make sure that the person who has access to the data does have access, and the one who doesn't cannot access it? Is that the problem with data prep, just that prep piece of it? What is the big problem with data prep? I mean, everyone keeps coming back to the same problem. What is causing all this data prep? People not buying Paxata, it's very simple. That's a good one. Hey, check out Paxata, they're going to solve your problems, go. But seriously, I mean, this seems to be the same hole people keep digging themselves into. They get it, they start something new, and they're in the same hole: I've got to prepare all this stuff. Well, I think the previous paradigms for doing data preparation tie exactly to the data democracy themes that we're talking about here.
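The two-layer check Pranav describes, a central directory for authentication plus per-folder access control on the data, can be sketched in miniature. All of the users and paths below are hypothetical, and the dictionaries merely stand in for Azure Active Directory and file-level ACLs; this does not call any real Azure API:

```python
# Sketch of centralized authentication plus per-folder authorization.
# AUTHORIZED_USERS stands in for a directory service (e.g. Azure AD);
# ACLS stands in for file/folder access-control entries on the data layer.
AUTHORIZED_USERS = {"analyst@contoso.com", "engineer@contoso.com"}

ACLS = {
    "/data/sales": {"analyst@contoso.com"},
    "/data/raw-logs": {"engineer@contoso.com"},
}

def can_read(user: str, path: str) -> bool:
    """True only if the user authenticates against the directory AND
    appears in the ACL for some folder prefix of the requested path."""
    if user not in AUTHORIZED_USERS:          # authentication layer
        return False
    for folder, members in ACLS.items():      # authorization layer
        if path.startswith(folder) and user in members:
            return True
    return False

print(can_read("analyst@contoso.com", "/data/sales/q3.csv"))    # True
print(can_read("analyst@contoso.com", "/data/raw-logs/a.log"))  # False
```

The point of the sketch is simply that both checks must pass end to end: a valid credential without a matching ACL entry, or vice versa, yields no data access.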
If you only have a very siloed group of people in the organization who have very deep technical skills but don't have the business context for what they're actually trying to accomplish, you have this impedance mismatch in the organization between the people who know what they want and the people who have the tools to do it. So what we've tried to do, again taking a page out of the way that Microsoft has approached solving these problems both in the past and in the present, is to say, look, we can actually take the tools that were once only in the hands of the shamans who know how to utter the right incantations, and instead move them into the hands of the common folk who actually... The users. The users themselves, who know what they want to do with the data, who understand what those data elements mean. So if you were to ask for the Paxata point of view, why have we had these data prep problems? Because we've separated the people who had the tools from the people who knew what they wanted to do with them. So it sounds to me, correct me if this is the wrong term, that what you offer in your partnership is basically a broad curational environment for knowledge workers to sift and sort and annotate and share data, with the lineage of the data preserved, and essentially a system of record that can follow the data throughout its natural life. Is that a fair characterization? I would think so, yeah. Yeah. And you mentioned, Pranav, the whole issue of how one visualizes, or should visualize, this entire chain of custody, as it were, for the data. Is there any special visualization paradigm that you guys offer? Now, Microsoft, you've made a fairly significant investment in graph technology throughout your portfolio. I was at Build back in May, and Satya and the others just went to town on all things to do with the Microsoft Graph.
Will that technology at some point, now or in the future, be reflected in this overall capability that you've established with your partner here, Paxata? I am not sure. Okay. So far, the rich graph capabilities exist in the Microsoft Graph; that's one extreme. On the other side, graph exists today for developers: you can do graph-based queries, for instance on Cosmos DB, which has a Gremlin API for graph queries. So I don't know how this will evolve. All right, let me get to the next question. What are the Paxata benefits with HDInsight? Just quickly explain for the audience: what is that solution, and what are the benefits? So the solution is a one-click install of Paxata on HDInsight, and the benefit for a user persona who's not used to big data or Hadoop is that they can use a very familiar GUI-based experience to get insights from their data faster, without having any knowledge of how Spark works or how Hadoop works. And what does the Microsoft relationship bring to the table for Paxata? So I think it's a couple of things. One is Azure is clearly growing at an extremely fast pace, and a lot of the enterprise customers that we work with are moving many of their workloads to Azure and these cloud-based environments. Especially valuable for us is a partner who truly understands the hybrid nature of the world. The idea that everything is going to move to the cloud, or that everything is going to stay on-premise, is too simplistic. Microsoft understood from day one that data would be in all of those different places, and they've provided enabling technologies for vendors like us to take advantage of. I'll just say it too, maybe too quick to say it, but the bottom line is you have an Excel-like interface. They've got Office 365. Their users are going to instantly love that interface because it's easy to use.
An Excel-like interface, not the Excel interface per se, but a similar paradigm. It's a metaphor, a graphical user interface that's clean and targeted at the analyst role or user. That's right. That's going to resonate with their install base, combined with a lot of these new capabilities that Microsoft is rolling out from a big data perspective. So HDInsight has a very rich portfolio of runtime engines and capabilities, and they're introducing new data storage layers, whether it's ADLS or Azure Blob Storage. So it's really a nice way of us working together to extract and unlock a lot of the value that Microsoft is creating. Here's the tough question for you. Open source projects: I saw Microsoft comments that hell froze over, because Linux is now part of their DNA. It was a comment I saw at the event this week in Orlando, but they're really getting behind open source, from Open Compute on down; it's clearly new DNA. They're into it. How are you guys working together on open source, and what's the impact for developers? Because that's only one cloud; there are other clouds out there. So data is going to be an important part of it. On open source, are you guys working together, and what's the role for the data? So from an open source perspective, Microsoft today plays a big role in embracing open source technologies and making sure that they run reliably in the cloud. And part of the value prop that we provide in Azure HDInsight is making sure that you can run these open source big data workloads reliably in the cloud. So you can run open source like Apache Spark, Hive, Storm, Kafka, R Server. And the hard part about running open source technology in the cloud is: how do you fine-tune it? How do you configure it? How do you run it reliably? That's what we bring in from a cloud perspective. And we also contribute back to the community what we have learned by running these workloads in the cloud.
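To make the "fine-tune and configure" point concrete, tuning a hosted Spark workload typically comes down to a handful of settings like those in a `spark-defaults.conf`. The values below are hypothetical illustrations, not recommendations from either company:

```
# Illustrative spark-defaults.conf entries (hypothetical values)
spark.executor.memory            4g      # memory allocated per executor
spark.executor.cores             4       # cores per executor
spark.dynamicAllocation.enabled  true    # scale executor count with load
spark.sql.shuffle.partitions     200     # parallelism for shuffle stages
```

Picking sensible defaults for knobs like these, per cluster size and workload, is part of what a managed service layers on top of the raw open source engines.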
And we believe in the broader ecosystem: customers will have a mixture of these components in their solution. They'll be using some of the Microsoft solutions, some open source solutions, some solutions from the ecosystem. And that's how we see a customer solution being built today. What's the big advantage you guys have at Paxata? What's the key differentiator for why someone should work with you guys? Is it the automation? What's the key secret sauce? I think it's a couple of dimensions. One is, I think we have come the closest in the industry to a user experience that matches the Excel target user. A lot of folks are attempting to do the same, but the feedback we consistently get is that when the Excel user uses our solution, they just get it. They're like, okay. Was that a design criterion from the beginning? How are you going to do this? From day one. All right, so you engineered everything to make it as simple as Excel. We want people to use our system. They shouldn't be coding. They shouldn't be writing scripts. They just need to be able to point and click and work with the data. Good Excel users do good macros. That's right. So simple things like that, right? But the second is being able to interact with the data at scale. There are a lot of solutions out there that make the mistake, in our opinion, of sampling very tiny amounts of data, asking you to draw inferences from that, and then publishing it to batch jobs. Our whole approach is to smash the batch paradigm and bring as much into the interactive world as possible. So end users can actually point and click on 100 million rows of data, instead of the million rows you would get in Excel, and get an instantaneous response, versus designing a job in a batch paradigm and then pushing it through the batch environment. So it's interactive data profiling over a vast corpus of data in the cloud. Correct. Nenshad Bardoliwalla, thanks for coming on theCUBE.
Appreciate it. Congratulations on Paxata and the Microsoft Azure relationship. Great to have you on. Good job on everything you're doing with Azure; we want to give you guys props. We're seeing the growth in the market, and we know the investment has been going well. Congratulations, and thanks for sharing. theCUBE coverage here at Big Data NYC, more coming after this short break.