Live from Midtown Manhattan, it's theCUBE, covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. This is theCUBE in New York City for Big Data NYC. The hashtag is #BigDataNYC. This is our fifth year doing our own event in conjunction with Strata Hadoop, now called Strata Data. It used to be Hadoop World. This is our eighth year covering the industry; we've been there from the beginning, in 2010, at the start of this revolution. I'm John Furrier, the co-host, with Jim Kobielus, our lead analyst at Wikibon. Our next guest is Christian Rodatus, who's the CEO of Datameer. Datameer is one of the startups now in, I think, its eighth year or so, roughly seven or eight years old. Great customer base, been successful. Blocking and tackling, just doing good business. Your shirts say "Show me the data." Welcome to theCUBE, Christian. Appreciate it. So well established, I barely think of you as a startup anymore. It's kind of true, right? Actually, a couple of months ago, after I took on the job, I met Mike Olson. Datameer and Cloudera were founded around the same time, really in late 2009, early 2010. And he told me that back then there were two open source projects, basically MapReduce and Hadoop. And Datameer was founded to actually enable customers to do something with it, right? As an end-to-end platform to help get data in, curate the data, and then do something with it. And now if you walk the show floor, it's a completely different landscape. We've had you guys on before; the founder, Stefan, has been on. Interesting migration; we've seen you guys grow from a customer base standpoint. You've come on as CEO to kind of take it to the next level. Give us an update on what's going on at Datameer. I love the shirts, "Show me the data," a "show me the money" kind of play there. I get that. That's where the money is. The data is where the action is.
Real solutions, not pie in the sky. We're now in our eighth year of this market, so there's not a lot of tolerance for hype. There's a lot of AI washing going on. But what's going on with you guys? So I would say, interestingly enough, I met with a prospective customer this morning, a very typical organization. This is an insurance company, and they're just about to spin up their first Hadoop cluster to work on customer management applications. And they are overwhelmed with what the market offers now. There are 27 open source projects. There are dozens and dozens of other tools that take best-of-breed approaches in certain layers of the stack for specific applications. And they don't really know how to stitch this all together. And if I reflect on a recent customer meeting at a Canadian bank that has very successfully deployed applications on the data lake, in fraud management, compliance, and things like this, they still struggle to replicate the performance and the service level agreements they're used to from the old EDW that they still have in production. And so everybody is now out there trying to figure out how to get value out of the data lake for the business users, right? There are a lot of approaches these companies are trying. There's SQL on Hadoop, which supposedly doesn't perform properly, right? There are other solutions, like OLAP on Hadoop, that try to emulate what they've been used to from the EDWs. And we believe these are the wrong approaches. We want to stay true to the stack, be native to the stack, and offer a platform that really operates end to end, from ingesting the data into the data lake, to curation and preparation of the data, and ultimately to building the data pipelines for the business users. And this is certainly something that we see.
So yours is more of a play for the business users now, not the data scientists and statistical modelers? I thought data scientists were your core market. Is that not true? So our primary user base at Datameer, until last week, was the data engineers in the companies: basically the people that built the data lake, curated the data, and built these data pipelines for the business user community, no matter what tool they were using. Jim, I want to get your thoughts on this, for Christian to address. I want to get a question to Chris, but I want to redirect it through you. Gartner, another analyst firm. I've heard of them. Not a big fan personally, but you know, they're still in business. They've got the Magic Quadrant, and they use that tool. Anyway, they had an interesting stat. Last year they predicted that through 2017, 60% of big data projects would fail. So the question for both of you is, did that actually happen? I don't think it did. I'm not hearing that 60% failed, but we are seeing the struggle around analytics, and around scaling analytics in a way that's like a DevOps mentality. So, thoughts on this stat that 60% of data projects fail? I don't know whether it's 60%. There was another statistic that said only 14% of Hadoop deployments are in production, or something like that. Define failure. I mean, you build a data lake and maybe you're not using it immediately for any particular application. Does that mean you failed? Or does it simply mean you haven't found the killer application for it yet? So I agree with you. It's probably not a failure to that extent. They dump the data into it, right? They build the infrastructure, and now it's about the next step, data lake 2.0: figuring out how do I get value out of the data? How do I go after the right applications?
How do I build a platform and a tool set that promotes the use of that data throughout the business community in a meaningful way? Okay, so what's going on with you guys from a product standpoint? You guys have some announcements. Let's get to some of the news. Absolutely. So I think we were very strong in data curation, data preparation, and the entire data governance around it. As a user interface, we're using this spreadsheet-like interface called the Workbook. It really looks like Excel, but it's not. It operates at a completely different scale. It's basically an Excel spreadsheet on steroids. And our customers build their data pipelines with it; these are the data engineers we discussed before. But we also had a relatively small power user community in our client base that used that spreadsheet for deep data exploration, right? And now we're lifting this to the next level: we've put a visualization layer on top of it that runs natively in the stack. What you get is a visual experience not only in the data curation process, but also in deep data exploration. And this is combined with two platform technologies. Number one, it's based on highly scalable distributed search in the back-end engine of our product. Number two, we have also adopted a columnar data store, Parquet, for our file system now. In this combination, the data exploration capabilities that we bring to the market will allow power analysts to really dig deep into the data. There are literally no limits in terms of the breadth and the depth of the data. It could be billions of rows. It could be thousands of different attributes and columns that you're looking at. And you will get sub-second response times, as we create indices on demand while we run this through the analytic process. With these fast queries and visualizations, do you also have the ability to do semantic data virtualization roll-ups across multi-cloud or multi-cluster environments?
Yeah, absolutely. So there's a second trend that we discussed before we started the live transmission here. Things are also moving into the cloud, right? What we are seeing right now is that the EDW is not going away. The on-premises data lake will prevail, right? And now they're thinking about moving certain workload types into the cloud, right? And we see ourselves as a platform play that builds a data fabric that really ties all these data assets together and enables business... This is one of the trends I want to get on camera, so let's bring it up here: the impact of cloud on the data world. You've seen this movie before. You have extensive experience in this space, going back to the origination, Teradata, when it was the classic old-school data warehouse: great purpose, great growth, massive value creation. Enter the Hadoop disruption. Hadoop evolved from batch, doing ranking stuff, and then it was a hammer that got turned into a lawn mower, right? It started going down that path and really wasn't workable for what people were looking for. But everyone was still trying to be the Teradata of whatever. Okay, fast forward: things have evolved and things are starting to shake out. Same picture, data warehouse-like stuff. Now you've got cloud, and it seems to be changing the nature of what this will become in the future. What's your perspective on that evolution? What's different now, and what's the same as in the old days? What are the similarities with the old school, and what's different that people are missing? So I think, not related to cloud, just in general, it is extremely important to foster adoption throughout the organization and to get performance and service level agreements right for our customers, right? This is where we clearly can help: we give them a user experience that is meaningful and that resembles what they were used to from the old EDW world, right? That's number one.
Number two, and this comes back to the question: do 60% fail, is it failing or is it working? I think there are a lot of really interesting projects out there, and our customers are betting big on data lake projects, whether on premises or in the cloud, right? We work with HSBC, for instance, in the United Kingdom. They've got 32 data lake projects throughout the organization. And I spoke to one of these... Not 32 data lakes, 32 projects that involve stepping into the data lake. 32 projects that involve various data lakes, right? And I spoke to one of the chief data officers there, and they said their data center infrastructure, just from kick-starting these projects, will explode, right? They're not in the business of operating all the hardware and things like this, so a major bank like them is starting to move its data assets into the cloud; they made a public announcement recently, you can read about it. So this is clearly happening at a rapid pace, and it will change the paradigm in terms of burstability and being able to satisfy peak workload requirements as they come up, when you run a compliance report at quarter end or something like this. So this will certainly help with adoption and with creating business value for our customers. We talk about real time all the time, and there are some examples of how data science has changed the game. I was talking about everything from how, from a cyber perspective, data science helped capture bin Laden, to how I can increase sales, to better user experience on devices. So having real-time access to data and putting some quick data science around things really helps at the edge. What's your view on real time? Obviously that's super important. You've got to get your house in order in terms of base data hygiene and foundational work, building blocks. At the end of the day, real time seems to be super hot right now. Real time is a relative term, right?
There are certainly applications, like IoT applications or machine data that you analyze, that require real-time access. I would call it right time: what's the increment of data load that is required for certain applications? We are certainly not a real-time application yet. We can possibly load and stream data through Kafka, but in general we are still a batch-oriented platform. We can do... Which, by the way, is not going away anytime soon. No, it's not going away at all, right? We can do mini-batches at relatively frequent increments, which is usually enough for what our customers demand from our platform today, but we're certainly looking at more streaming types of capabilities as we move forward. What do the customer architectures look like? Because you brought up a good point. We talk about this all the time, batch versus real time. They're not mutually exclusive. Good architectures would argue that you decouple them, grab good software elements all through the life cycle of data, and have a stack, and the stack's only going to get more robust. For your customers, what's the main value that you guys provide, the problem that you're solving for them today, and the benefits that they get? Absolutely. Our true value is that there are no breaks in the stack, right? We are end to end. We can satisfy all requirements, from ingesting the data, to blending and integrating the data, preparing the data, building the data pipelines, and analyzing the data, right? And all this we do in a highly secure and governed environment. The customer this morning asked me, so whom do you compete with, right? I get this question all the time. And we really compete with two things. We compete with build-your-own, which customers still opt to do nowadays, while our tooling is really point-and-click and highly automated. And we compete with a combination of different products, right?
So you need to have at least three to four different products to do what we do, but then you get security breaks, and you get a lack of data lineage and data governance through the process. And this is the biggest value that we can bring to the table. And secondly, now with visual exploration, we offer a capability that literally nobody has in the marketplace, where we give power users the ability to explore billions of rows of data with blazing-fast response times, in a very free-form type of exploration process. So there are more power users now than there were when you started as a company. It seems like tools like Datameer have brought people into the power user camp simply by virtue of having access to your tool. What are your thoughts there? Absolutely, it's definitely growing, and you also see different companies exploiting that capability in different ways, right? You might find insurance or financial services customers that have built a very sophisticated capability in that area, where you might see 1,000 or 2,000 users doing deep data exploration, and other companies that are starting out with a couple of dozen and then evolving as they go. Chris, I've got to ask you, as the new CEO of Datameer, taking it to the next level: you guys have been successful. We were commenting yesterday on theCUBE that we've been covering this for eight years in depth. We've seen waves of hype come and go, but now there's not a lot of tolerance for hype. You guys are one of the companies, I will say, that stuck to your knitting. You certainly rode the hype like everyone else did, but your solution is very specific on value, so you didn't overplay your hand. The company never really overplayed its hand, in my opinion. But now the hand really is value.
So as a new CEO, you've got to put a little shiny new toy on there and keep the car looking shiny, everything looking good with cutting-edge stuff, while at the same time scaling up what's been working. So the question is, what are you doubling down on, and what are you investing in to keep that innovation going? So that's really three things. And you're very much right: this has become a mature company, right? We've grown with our customer base, and our enterprise features and capabilities are second to none in the marketplace. Now the three investment areas where we're doubling down are, first, visual exploration, as I outlined before. Number two, hybrid cloud architectures. We don't believe customers will move their entire stack into the cloud. There are a few that are going to do this and are looking into these things, but we believe they will still have their EDW, their on-premises data lake, and some workload capabilities in the cloud, which will be growing. So this is investment area number two. Number three is the entire concept of data curation for machine learning. This is something where we released a plugin for TensorFlow early in the year, where we can build data pipelines for machine learning applications. This is still very small. We see some interest from customers; it's growing interest, and they're looking into this. It's a directionally correct bet for you. Absolutely. It's a good sign. Let's kick the tires on that and play around. Machine learning's got to learn from somewhere too. And quite frankly, there aren't really all that many deep learning and machine learning tools for the rest of us power users. They're going to have to come along and get really super visual in terms of enabling visual, modular development and tuning of these models.
What are your thoughts, going forward, on a visualization layer to make machine learning and deep learning developers more productive? That is an area we will not engage in. We will stick with our platform play, where we focus on building the data pipelines into those tools. Gotcha. So the last area where we invest is ecosystem integration. We think our Visual Explorer back end, which is built on search and on the Parquet file format as our columnar store, is really a key differentiator in feeding or building data pipelines into the incumbent BI ecosystems and accelerating those as well. We currently have prototypes running where we can give the same performance and depth of analytic capability to some of the existing BI tools that are out there. What are some of the ecosystem partners you guys have? Partnering is a big part of what you guys have done. Are you the Switzerland here, partnering with everybody? No, not really. We are focused on staying true to our stack and on how we can provide value to our customers. We work actively with Microsoft and Amazon AWS, which is very, very important in evolving our cloud strategy. We've started working with various BI vendors out there that you know about, right? And we definitely also have a play with some of the big, big SIs; IBM is a more prominent one. So the BI guys, mostly on the tools and visualizations that you said you were pipelining into? On the tool and visualization side, right. We have very effective integration of our data pipelines into the BI tools today. We support TDE for Tableau. We have a native integration. Yeah, why compete there? Just be a service provider. Absolutely, and we have more and better technology coming up to accelerate those tools as well in a big data context. So you're focused, you're scaling. The final word I'll give to you for the segment.
Share with the folks who are Datameer customers or have not yet become customers: what's the outlook? What does the new Datameer look like under your leadership? What should they expect? Yeah, absolutely. I think they can expect utmost predictability in the way we roll out that vision and build our product over the next couple of releases. The next five, six months are critical for us, right? We have launched Visual Explorer here at the conference. We're going to launch our native cloud solution to the customer base probably in mid-November. So these are the big milestones that will help us in our next fiscal year and provide real value to our customers. And that's what they can expect: predictability, a very solid product, and all the enterprise-grade features they need and require for what they do. And if you look at it, we are really an enterprise play, right? The customer base that we have is very demanding and challenging, and we want to keep up and deliver a capability that is relevant for them and helps them create value from their data lakes. Christian Rodatus, technology enthusiast, passionate, now CEO of Datameer. Great to have you on theCUBE. Thanks for sharing, and we'll be following your progress. Datameer, here inside theCUBE, live coverage, hashtag #BigDataNYC, our fifth year doing our own event here in conjunction with Strata Data, formerly Strata Hadoop World, eight years covering the space. I'm John Furrier with Jim Kobielus, here inside theCUBE. More after this short break. Thank you.