Okay, we're back here live in Silicon Valley. We are in the heart of Silicon Valley. We're at the San Jose Convention Center here in California. This is Hadoop Summit 2013. This is theCUBE, our flagship program. We go out to the events, extract the signal from the noise. A lot of action happening here, a lot of growth, a lot of innovation, a lot of positioning, a lot of companies. We've had startups on. We've had founders, we've had CEOs, developers. We're here to extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE.com, and I'm joined by my co-host.

I'm Dave Vellante of Wikibon.org. Stefan Groschupf is here. He's the CEO and founder of Datameer. Stefan, welcome back to theCUBE. Good to see you again.

Thank you. Thank you for having me.

Yeah, you're welcome. So you've been hitting the circuit up here. Saw you on stage, and you're a little controversial as usual, but you've been here since the beginning.

Oh yeah, surprising.

You've been here since the beginning of Hadoop. You're obviously an early contributor, a committer to the project. How would you describe where we're at today? You've seen the baby grow up, where are you at?

Yeah, so this is my tenth year. Maybe that's why I'm controversial too, right, I saw it before. But I think what we're seeing, and it's actually very fascinating, is that we are literally crossing the chasm. So if you actually go on Google Trends and you compare Hadoop, which I would argue is a technical term, versus big data, which is kind of the CIO term, you actually will see that big data is growing like a hockey stick, where Hadoop is actually flattening out.

CIOs are winning. They have the checkbook. They always win.

So what's very interesting really is that the conversation that we have with a lot of customers is really changing to use cases. Right, so for a while it was more kind of the technology looking for what it can solve. And you had some early adopters, right?
Techies, but now it's really the classical business problems. Funnel optimization, fraud identification, marketing segmentation, and there's huge value there, right? But it doesn't matter if it's this version, that version, in memory, compressed, over-optimized. The question of what that technology is really bringing to my business is the big question now.

Yeah, or other use cases, like we had Sky Christopherson on, one of your customers, partners, collaborators, and it's amazing what they were doing. I guess like using your technology to help athletes squeeze a little more juice out of their bodies.

Yeah, it's super interesting. I mean, I love the story, right? All of us at Datameer had goosebumps. And we were sitting in front of the screens at the London Olympics. We couldn't talk about it, right? But it's just such a great story that there's such a cultural shift in sports, where you used to have doping as kind of the standard, right? And now it all came up. And now here's Sky and the women's cycling team, and they have an innovative way, you know, and it's focused on data. It's focused on quantified-self data, and they used Datameer to really look at the patterns, and they found amazing insights. And boy, they shaved off five seconds, you know, came in as "no way they will get anywhere close to the medals," and won a silver medal. And again, that gives me goosebumps. And this is so great because Hadoop is not just able to sell more advertisement, right? It also really can change lives. And that's just a great story.

We met at the IBM meeting at Almaden, which is kind of a cool thing. We were having lunch together. We were talking about business users and you were showing me the little demo, and it's just amazing what you do with visualization these days. John, I think that, you know, Stefan's saying we're crossing the chasm, and it feels that way, doesn't it?
Well, I mean, look, Dave, let's just go right into the heart of the conversation. It's all about enterprise grade and business value. And I think, Stefan, some of the things you brought up, and that I saw on Twitter and would like to talk to you about, come down to that business value. Because we are now at the point where the rising tide is floating the boats, but the harbor has yet to be filled up, or the lake or the ocean, whatever data lake analogy people want to use. But to go public, to get bought, or to make money, startups or growing companies need to have metrics. So I want to get your perspective on this, because you're out there, you're talking to a lot of the vendors, you're in the middle, you're like Switzerland. What are the metrics that need to be in place to have a sustainable business model in this market? You know, obviously it's a little frothy right now, but I want to get your perspective on that.

Okay, great question. Well, so first of all, you have to provide ROI to your customers, right? So let's really start at the basics. And a lot of folks, they sell you a tool set, right? And there's really a challenge around that, because if you actually do the math behind that, and we do this with our customers on a daily basis, we have a nice ROI tool where you calculate the total cost of ownership, well, it turns out Hadoop and the hardware and all that good stuff is really cost efficient, no question. But boy, if you have a tool set, you have to duct-tape it together. And if you duct-tape it together, you need people that are knowledgeable about that. That's actually not cost efficient anymore. If you have five developers who know Hadoop, who by the way go for 200K in Silicon Valley, duct-taping something together out of all those different pieces for a year, you just spent a million dollars. So we really have to have an honest conversation about this.
We always say, oh, you can save IT costs with Hadoop, but actually, if you do the math, that's not the case. If you duct-tape and build something from the ground up, actually sometimes people are better off using an MPP database, right? So there are important conversations. Hadoop brings something really important to the table, but it's not SQL on top of Hadoop or lowering your IT costs. It's about flexibility, no schema, those kinds of things.

So it's early though, it's early though. I mean, it's not like it's not going to add value. What you're saying is, it might not be that shiny brass ring that you're going to have right out of the gate.

Well, Hadoop is another tool in the toolbox, right? And that's very important. So you have to know, do I need a screwdriver or do I need a hammer, right? And Hadoop is a freight train, basically. If you need a Ferrari, you need an MPP database. Anyhow, to answer your question, what is really important if you want to be successful as a company is to build your customer base, provide them value, and obviously make more money than you spend acquiring them, right? Guy Kawasaki's famous words: death by customer acquisition cost. A lot of startups actually fail. They raise a lot of money. Well, it's usually a bad sign, because it means they bleed money to get customers. And if you get that right, if you make more dollars than you spend on getting the customer, then you're moving in the right direction, right? And if you get to a certain stage, then it might be interesting to get acquired by one of the big guys, if the technology works. Or if you get really good and get really big, then you maybe might do an IPO. But in Silicon Valley, you always have a lot of companies that raise a lot of money because they have a product that they need to spend $100,000 to sell, right? Very classical enterprise software problem.

So can we go back to the comments that you were making?
Let's just go directly to it, because you're well known for your position on SQL coming to Hadoop. Can you restate that position and help us understand technically why you feel that way, or from a business standpoint?

Yeah. So if you think about the Hadoop file system, it actually works like a tape drive, right? We all remember what a tape drive is. Hadoop is the new tape, they say. It's not the new oil, it's the new tape. But if you try to find that one song on your tape, right? All the way at the back, what you had to do is go all the way through the tape. And technically, Hadoop is a sequentially optimized file system, and to find an individual record, you have to basically stream through all of the data. That gives Hadoop the performance for analytical workloads, like full table scans, joins, all this kind of good stuff. Fantastic, great. But for a query language, right? It's just miserable. That's why we invented CDs, which actually have more of a B-tree kind of data structure, where you actually have an index and then you know where to jump to, which you can do with a CD but not with a tape. Now, putting a query language on top of a sequentially optimized file system is just not making sense. It makes even less sense if we already have very great, mature technology. I'm not against SQL. You know, it's a very good tool, but you have Oracle, you have DB2, you have MS SQL, you have Greenplum, Vertica, Netezza. They've been there for 20 years. And the performance those tools bring, those technologies are somewhere else. It's a different universe than we will ever get with Hadoop, because it's a sequential file system. Now, the only reason some of those vendors, I think, do this, right, is because they need more adoption in their customer base. And hey, those guys all know chocolate. Let's put some chocolate frosting on top of that. But then the customer's biting into the cake and realizing, well, it's actually sausage and not a cake.
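The tape-versus-CD contrast above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `records` list and both function names are hypothetical, and a binary search over a sorted key list stands in for the B-tree index mentioned in the conversation. It is not how HDFS or any particular database is implemented.

```python
import bisect

# Hypothetical data set: 500 (key, value) records, sorted by key.
records = [(k, f"song-{k}") for k in range(0, 1000, 2)]
keys = [k for k, _ in records]  # sorted key list acting as a simple index

def scan_lookup(records, key):
    """Tape-style access: read records in order until the key shows up."""
    reads = 0
    for k, value in records:
        reads += 1
        if k == key:
            return value, reads
    return None, reads

def indexed_lookup(records, keys, key):
    """CD-style access: binary-search the index, then jump to the record."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i][1]
    return None

# Finding the last record by scanning touches every record before it.
value, reads = scan_lookup(records, 998)
print(value, reads)
print(indexed_lookup(records, keys, 998))
```

The scan is the right shape for workloads that need every record anyway (full table scans, joins); the index only pays off for point lookups, which is the distinction being drawn between analytical and query workloads.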
Right, so, and there's a big problem. So, they have expectations about, excuse my German.

I love that. Chocolate-covered sausage, too.

Yeah, so, right, so they have expectations around SQL, and Hadoop will never fulfill them, because the underlying architecture was never built for that. Now, the big question is, what is Hadoop then bringing to the table? I would argue that the new currency in organizations is time, right? If you push a new smartphone to the market, you have maybe only eight weeks to make it successful during the market entry. Now, take a traditional three-tier data analytics architecture, where you have ETL, where, if data's the new oil, you break up the data and make plastic out of it and you melt it into that schema, into that form of your data warehouse, and then you put BI on top of that. It's really a 19th century production environment, right? Yeah, that's how we used to do that. The reason we used to do that is because the man-hours we invested in pre-optimizing the data were much cheaper than the really, really expensive data warehouse. Well, guess what? The whole thing turned around with Hadoop. Now we have Moore's law, right? Every 18 months, hardware capacity doubles, multiplied by the number of machines in your Hadoop cluster. So you basically have unlimited storage and compute. We do not need to pre-optimize anymore. So what we now can do is almost kind of data 3D printing. We take the raw material, we put it in Hadoop, and now we can create views on the data. We create a marketing view, a sales view, you know, an IT view on the same raw data, and what that brings us is agility. So instead of spending 18 months, and that's a number from TDWI, on implementing the ETL, data warehouse, BI infrastructure, and it takes you 18 months to change, by the way, as well, we now have customers that within three days integrate data, create their views on the raw data, and get their insights.
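The idea of keeping raw data and deriving a marketing view, a sales view, and an IT view from it can be sketched as follows. This is a minimal illustration under stated assumptions: the sample events, field names, and the `view` helper are all hypothetical and are not taken from Datameer or Hadoop.

```python
import json

# Hypothetical raw events, stored once as JSON lines with no fixed schema.
raw_events = [
    '{"user": "a", "channel": "webinar", "amount": 120, "latency_ms": 80}',
    '{"user": "b", "channel": "trade show", "amount": 300, "latency_ms": 95}',
    '{"user": "c", "channel": "webinar", "latency_ms": 70}',
]

def view(raw_lines, fields):
    """Project each raw record onto the fields one team cares about.

    The structure is applied at read time; fields missing from a
    record come back as None instead of failing a load step.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}

# Three views over the same raw data, none of which changes the stored events.
marketing_view = list(view(raw_events, ["user", "channel"]))
sales_view = list(view(raw_events, ["user", "amount"]))
it_view = list(view(raw_events, ["latency_ms"]))

print(marketing_view)
print(sales_view)
print(it_view)
```

Adding a fourth view here means writing one more field list rather than changing an upfront schema, which is the agility argument being made.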
And if you're able to move in the market faster, then you can win against your competition. The only competitive advantage today is agility and flexibility.

Okay, so you're prescribing, use Hadoop as it was intended to be used, as this flexible, you know, John calls it data ocean. He doesn't like the term data lake.

I don't like it, don't like it. It's too small.

Yeah, it's too calm.

Data universe.

And then use...

We went and boiled the ocean. That's the... Riptides, rip currents, a lot of stuff. Every entrepreneur.

So use SQL where it belongs. So move the data out when you need to do that.

Yeah, or if you have structured data to start with, just leave it in your MPP database. I mean, organizations have MPP databases and they're great: great performance, great ETL tools, great BI tools, great security. It works, right? So for me, again, I've been doing this for 10 years and I spent a lot of time really trying to innovate here. For me, it feels like we're backporting a NoSQL platform that has no schema back to something that now requires schema, just to put the chocolate on top of that. Come on, let's call it sausage. And it's okay. If we're hungry for that, we can have that.

Great. Okay, so give us the update on Datameer. You guys made some, you know, new announcements recently. Talk about where you're at as a company.

Yeah. So we just announced Datameer 3.0 and we're super excited about this. It took us two years to really get this functionality in the product. We started hiring two years ago and we're really passionate. I mean, the whole Olympic team story shows what Datameer is about. We believe data and data analytics should belong to subject matter experts. If technology is so difficult that you need a scientist to work with it, we as, you know, vendors did something wrong. We really believe the doctor or the medical folks need to look at the data, not the data scientist. Data scientists know about algorithms, and I'm in IT, I'm an engineer as well.
I've worked a lot in data mining all my career. So, you know, maybe I could classify myself even there, but what I really saw is that when people like Sky Christopherson, you know, get access to the data, they see things that data scientists, engineers, IT will not see. So what we did in 3.0 is we introduced smart analytics, and smart analytics is basically machine learning, data mining, advanced analytics, whatever you want to call it, but it's so simple that everybody can use it. There's a single click of a button and you can do clustering on your data. So here's all my customer data, just go and find groups yourself. Then we did column dependencies, where you basically say, okay, well, here's my, you know, lead database. Find out if there are, you know, relationships. And we actually analyzed our Marketo data, where our marketing leads are. And what we saw is there's a very strong relationship between lead source and job title. And we drilled down, we used our own, you know, our own sausage, we eat our own sausage. And what we saw is like, wow, actually to our webinars, there are more VPs and CIOs coming, whereas to trade shows, there are more software engineers coming. So we instantly changed our messaging in our webinars, and we have much more technical stuff at our trade shows now. So instant insights with a click of a button, where we could see this as well. Then we have decision trees. That was also fantastic. We optimized our whole sales process based, again, on insights we found with this. This is where we basically look into the data as well and you get a beautiful graph, like a tree graph, where you can understand, okay, well, I have a lot of leads out of that specific vertical. Let's say healthcare, right? There's a lot of talk about big data in healthcare. But you know what, they're so slow in adopting technology, they don't convert for us very well. So we stopped spending time there, for example.
But hey, there's a lot of interest in, let's say, financial services. We have four of the five biggest banks as customers, right? But you know what, we don't convert them if we talk to the IT manager. Well, we sell a business analyst tool, right? So we saw all those things with a single click of a button. We have a recommendation engine as well. And again, we're very passionate. We believe that the biggest problems in our society can be solved with insights we can find in data. But we really need to put it in the hands of the people that know those specific problems.

So that's your strategy, making heroes, data heroes, out of business people.

Absolutely.

We're getting the hook here, but I want to ask you one question, because I want to end on this note. First of all, great perspective. We love having you on theCUBE, great content, great opinions as well, which we like. Advice to entrepreneurs, because obviously it's a confusing marketplace. You're using your own technology, making and eating your own sausage, and we kind of have our own tools as well. It's very difficult for an entrepreneur to come into these crowded spaces today and differentiate themselves when what they're building is either a converged and/or integrated solution of other metaphors. So what's your advice to entrepreneurs? Especially in the analytics space, you talk about smart analytics. How do they talk to potential investors or recruit employees? How do they talk about the vision and/or their market opportunity in this new world? It's not siloed. It's not just this, that, and the other thing. It's complicated. So what's your advice?

Yeah, so it's a very important question. So Datameer is my sixth company, and I crashed and burned a few in between, and I had to learn this really the hard way. And I'm an engineer, right? I always try to solve technical problems. It doesn't matter. It really doesn't matter. What you need to solve are business problems.
And the technology needs to have an edge to it so you can solve the business problem that nobody could solve before. So if you have a faster version of a file system, or if you can stream something in memory, or if your thing is a thousand times faster and scales a million times, it doesn't matter, right? What you need to find is this one specific area where someone has a problem, and go and solve that problem. And if it's PHP or a shell script you use to solve the problem, it doesn't matter. But there's huge opportunity in data, right? So think about, my favorite example is airlines. There's a huge big data problem there. They have to optimize their flight plans, right? There's a lot of regulation. Crews can only be that long in the air. If there's a delay, they have to have a crew on standby, and this is really expensive. If they need to reroute airplanes, they have to get more gas up front. Well, it's even more expensive. You have to buy it at a certain time. There's a perfect environment here, with a lot of competitive pressure, that you could solve with big data. And it's everywhere else: supply chain. I saw a really interesting company that used sensor data to optimize vegetable delivery to, like, farmers markets and those kinds of things. This is what you need to solve, a real problem. Don't just play with the shiny new toys and build technology. It's great, I love it too. I'm an engineer, every new toy I need to have, and I need to optimize it and build my own. Nobody, unfortunately, cares. Solve a real problem and contribute to society. If you're really passionate about improving a real problem in the world, and not just doing a little bit better technology, I think you will be successful.

That's great advice from a serial entrepreneur who has a very successful business now and has some scar tissue. As he said, he's crashed and burned a few times. As we all have, Dave and I can attest to that. We love having entrepreneurs and successful CEOs on.
Stefan, thank you so much for coming on theCUBE. I really appreciate it.

I brought you our latest t-shirt, as always.

Okay, Datameer always has the best shirts.

It's a data scientist. Now you can be a data scientist too.

We are data scientists. We are introducing software-defined Hadoop into the nomenclature. Wait a minute, Hadoop is software. That's our joke for the week and no one's laughing. Okay, we'll be back here at theCUBE right after this short break. This is SiliconANGLE and Wikibon's coverage of Hadoop Summit. I'm John Furrier with Dave Vellante. We'll be right back after this short break.

Thank you.