Special segment with Eli Collins, who's swinging by theCUBE. Eli Collins is a senior engineer at Cloudera, a friend of SiliconANGLE. We see him in the office when we're at Cloudera, and we've talked in the past on theCUBE. We've had some great chats. I know you're super busy. You're hosting a session at two o'clock, and we've got a couple of minutes. Tell us about Bigtop. I saw some hashtags popping around, and Doug Cutting was mentioned. So what is Bigtop? Share with the folks what Bigtop is. Yeah, Bigtop is a new Apache project that's focused on the Hadoop stack as a whole. So the testing and integration of all the components, and then releasing them as one unified platform. That's something new to the community. Typically a lot of that integration has not happened at Apache. So we've open sourced a lot of the integration, testing, and build infrastructure, brought it to Apache, gotten a bunch of people in the ecosystem involved, Hortonworks, Canonical, IBM, Yahoo, and we're collaborating on basically bringing the Hadoop stack together as an Apache project. So when was this announced? Because I've never heard of Bigtop. Yeah, it's still in the incubator at Apache, so we're still putting it through development. It's been in progress for at least two or three months, but we haven't had a big 1.0 release yet. We've had a couple of minor releases, but we're still working towards that. So it's coming along, there's interest, and you guys are making progress. So what is it again, and what's the purpose of it? The purpose is that it's an Apache project devoted to the Hadoop stack as a whole, primarily focused on the testing and integration of the stack.
So as the Hadoop stack grows and gets more complicated, and more and more use cases involve multiple components in the stack, you know, workflow, scheduling, storage, compute, there needs to be a home that exercises how these components interoperate and function, so that we can start thinking of it more as an integrated platform and less as the sum of the parts. That's great. So Bigtop is a new Apache project, Eli Collins, and you're involved personally? Yes. Who else is involved from Cloudera? Oh, there's a number; we've got five people involved on it. Roman Shaposhnik, Andrew Bayer, Bruno Mahé, Peter Linnell, a number of really strong people on it. So how are you feeling right now? Forty million dollars in financing for the company. Oh, good. Great validation. Pretty exciting times. I remember the first Hadoop World three years ago, and quite a bit has changed. It's pretty exciting to see. The rocket ship is finally entering the atmosphere. I remember last year, you know, I used the phrase big data revolution, Cloudera as the home of the big data earthquake, the revolution. And I remember back then the term big data was kind of like, yeah, we don't want to use it. But now it's a full industry. It actually is happening. Yeah. Your grandma might know the term big data. And Hadoop is happening. So congratulations. Ed Albanese was on earlier, and he clarified for the audience, and got on the record, that Cloudera is completely 100% open. Everything is contributed to open source except for the enterprise suite, which is unique to Cloudera. So he cleared up that whole misconception, for anyone tossing grenades at Cloudera, that you guys are 100% committed to Apache. Yeah, CDH is free. It's Apache. CDH is 100% Apache-licensed open source software, and future versions of CDH will actually be based on Apache Bigtop.
So you heard it here. Cloudera is banging the drum. CDH is open source, it's free. The only charge is for the added-value product that's unique to you guys, which is the Enterprise Edition. Exactly. Pretty simple. I mean, it's not complicated. You've got free Apache CDH, and Cloudera's unique differentiator. Exactly. And we'll continue to do that. So congratulations. You've got great reviews. We just had a guest on who had a great review of the Enterprise Edition; the new UI is fantastic. That product's really taken off. Congratulations, Eli. Thank you. Cube alumni, we've had multiple CUBE segments with Eli. Thanks for coming on theCUBE. Appreciate it. For the user, did you ever use Excel before? Yep. Then you can use our products. Really, it's that intuitive. Yeah, it's a spreadsheet. Stefan, sorry, I jumped in late there, I had to take a bio break. Nice to meet you. Welcome to theCUBE. We heard from other folks on theCUBE: pay attention to integration. The practitioners were talking about integration. Obviously, data sets are sometimes siloed, and that's a challenge. What's your advice for folks out there who want to start using the BI stuff and using social data, with diverse data types, and want to develop on that? How do you guys approach that, and how do you talk about that data integration component? So given our experience, we're really focusing on data integration and IT ecosystem integration in general, right? If your product is not monitorable, if you can't monitor it with Nagios or Tivoli, you don't even need to go to a big organization. If you don't integrate with LDAP, if you don't understand what LDAP or Active Directory is, don't even talk to those guys, right? We have a lot of customers that even have mainframe data and all those kinds of things.
And we're really good at this, because the reality is you usually come into an IT ecosystem that's grown best-of-breed for many, many years, grown over 50 years, and you really need to interact with those systems. So again, we have REST APIs, you can monitor us, we integrate with security solutions, it's pluggable. We can talk to mainframe machines, in fact, because in the financial industry, for example, that's very important, and so on. And I think the beauty, and that's the way we think about Hadoop, the beauty of Hadoop is that it eliminates the limitations in storage and compute. For 30 years we've done ETL and massaged data into star schemas because we had a limited amount of hardware. We created indexes and perfect star schemas, which by the way change every year because there's no perfect star schema, to really optimize data so we could reasonably interact with it. Because we could only scale up, and there's only so much HP or SGI or IBM hardware you can buy for money. With Hadoop, it's not about big data. If you ask me, by the way, big data is a big buzzword from big companies to make big money. It's like Web 2.0, it's a good buzzword, so it sells well. What is beautiful about Hadoop is that the limitations of storage and compute disappear. We don't need to do extract, transform, load up front. We can do that later. We can pull all data, any size, structured, unstructured, even image data, into Hadoop, and we can go from the raw data. It's not environmentally friendly, we should think about that, but it provides agility. It provides us the ability to make mistakes and to go back. And really, if we just have the tools that empower business users, we can get so much more insight. He was talking about his startup, Trasada, which is in the financial services area. It's really a more focused vertical for him. But in general, what he's saying is essentially that we're solving problems we never could solve before. So the word that I love these days is schema-less.
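The schema-on-read idea Stefan describes, pulling raw data in first and applying structure only when you query it, can be sketched in a few lines. This is a hypothetical illustration of the pattern, not Datameer's actual product; the event fields and the `total_spend` helper are made up for the example:

```python
import json

# Raw events land as-is: no upfront star schema, no ETL step.
# Records can even have different shapes, like a schema-less store.
raw_events = [
    '{"user": "alice", "amount": 19.25, "ts": "2011-11-08"}',
    '{"user": "bob", "item": "sku-42"}',
    '{"user": "alice", "amount": 5.50, "ts": "2011-11-09"}',
]

def total_spend(lines, user):
    """Schema-on-read: parse the raw records at query time and
    pick out only the fields this particular question needs."""
    return sum(
        rec.get("amount", 0.0)
        for rec in map(json.loads, lines)
        if rec.get("user") == user
    )

print(total_spend(raw_events, "alice"))  # 24.75
```

The point of the sketch: a record with a shape nobody anticipated (bob's) is kept rather than rejected, and a future query can impose whatever structure it needs on the same raw data.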
So it's enabling a whole new set of opportunities. As you said, it's a whole new world for us to get at that data. My question specifically, because you have visibility into both the old way and the new way: what are the disruptive forces hitting the old guard? There are guys making big money rolling the old data warehouse into business intelligence. How are they being disrupted right now? Can you share your observations, any anecdotes around that? What are those disruptions? And also, you've got Hadoop, which is free, and services and new products that people can build on top of it. And the old models, again, they charge a lot but might not be relevant. How are they being disrupted? Okay, let me get a little more technical to answer this question. RDBMS databases are really optimized for random reads and writes, right? They are transactional systems. They were really made for writing a record, updating a record, deleting a record, or reading that one record. RDBMS really wasn't built for data warehousing. Though in the last 20 years, we misused RDBMSs, with all the heavy lifting around B-tree data structures and transactions, for data warehousing. And if you really look at the way data is stored in databases, it's wasteful. You blow up the data to create indexes you actually don't need for data warehousing. Because if you look at the access patterns of data warehousing, you usually do joins, full table scans, those kinds of things; you do sequential access to the data, and that's where Hadoop is very, very strong. And what is very interesting, I mean, Hadoop didn't break or change the laws of physics. The difference with Hadoop is that it accesses data sequentially. And what happened, and Doug Cutting talked about this, what happened over the last 15 years is that the read head in a hard drive couldn't move any faster. It still takes eight milliseconds to move that read head. But the disk is now spinning much faster.
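Those numbers make for a quick back-of-envelope comparison. The figures below are assumptions, roughly typical for a 2011-era commodity drive (8 ms per seek, 100 MB/s sustained sequential transfer, 4 KB records); they are not from the interview beyond the 8 ms seek time Stefan cites:

```python
# Back-of-envelope: random vs sequential read throughput on a spinning disk.
seek_ms = 8.0         # average seek + rotational latency; barely improved in 15 years
seq_mb_per_s = 100.0  # assumed sustained sequential read rate
record_kb = 4.0       # assumed record size

# Random access pays one full seek per record.
random_records_per_s = 1000.0 / seek_ms                      # 125 records/s
random_mb_per_s = random_records_per_s * record_kb / 1024.0  # ~0.49 MB/s

print(f"random:     {random_mb_per_s:6.2f} MB/s")
print(f"sequential: {seq_mb_per_s:6.2f} MB/s")
print(f"sequential advantage: ~{seq_mb_per_s / random_mb_per_s:.0f}x")
```

Under these assumptions, sequential scans move data roughly 200x faster than seek-per-record access, which is exactly the gap MapReduce-style full scans exploit.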
So we can sequentially read much more data, but we can't randomly read data any faster. And this is really where Hadoop comes from, and where it's disruptive. In the whole data warehousing space, where we really need sequential access, I think the amount of business the traditional RDBMS vendors do will shift into more of a Hadoop environment. And we see that; every big software company has a Hadoop offering right now or is looking into this. And honestly, on the storage side, that's a great point about the disk. I mean, it's a physical head; now you've got SSDs out there with flash, which even accelerates that piece. It makes things a little faster, but it's not necessarily going to... That's a very good question. You're arguing essentially that seek time is the limiting factor there. So, okay, let's talk about this. You're the technical guy, Doug. So, I'm a German engineer, let me... That's it, fast cars and spinning disks, you know. In fact, with SSDs, and even with memory, sequential access is still dramatically faster than random access. So the idea behind Hadoop will not disappear with SSDs, even as SSDs get cheaper. It's great, everything gets faster, and certainly there's a boost for random reads and writes. But the overall performance difference between sequential and random access is tremendous. That's great. We had Facebook's Jonathan Gray on yesterday talking about how people store email and never read it; it just hangs around as passive data, and they're focusing more on the distinction between active data and passive data. Does that dynamic change the equation? Can you elaborate on that notion of active data, passive data, tiering? It's kind of a storage concept, and it plays into Hadoop. So let me share a little story about active and passive data.
So say you go to your bank, look three months back on your credit card bill, and say, hey, this $500 charge, that's not me, that's fraud. They look at that $500 from three months ago and, you're right, it's fraud. But it's passive data: the reality is it's on tape, somewhere in the basement, and they need to find the one guy who goes down into the basement, finds that tape, puts it back into the rack, and reads through all the records to check those $500. That's the reality today. The reality today is also that we have very big retailers that can only store and analyze the last three months of data. We at Datameer call it enterprise amnesia. They can't compare last Christmas with this Christmas, and so on. And if they do, they have to build something incredibly expensive. One of them said they keep data for 14 days, and they expanded that to 90 days. Yeah, that's great, but it's still way less than we need as consumers. Right; imagine you could only remember the last 90 days. And this is how you run companies. Certainly, this is how we run companies. We're really just looking through a keyhole, right? Now we have a customer that stores 250 trillion records on a Hadoop cluster and is building out this Hadoop environment. They're looking at transactions over the last 10 years. They're getting insights into customer behavior, into fraud, and into how their business is growing and could potentially be extended, insights they could never see before. And they brought their storage cost down from $27,000 per terabyte to $300 per terabyte. If you build a Hadoop environment like this, I'm sure your hardware vendor gives you a good deal on the terabyte, but all this data now is, quote unquote, active, right? You can analyze it, and with our product, a spreadsheet, there are hundreds and hundreds of users in that organization now looking at this data and getting insights.
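A rough sanity check on those cost figures. The per-terabyte prices are the ones quoted in the interview; the 1 PB working set is a hypothetical size chosen only to make the totals concrete:

```python
# Quoted figures: $27,000/TB (traditional warehouse) vs $300/TB (Hadoop).
warehouse_per_tb = 27_000
hadoop_per_tb = 300
data_tb = 1024  # assumed: a 1 PB active working set

warehouse_cost = warehouse_per_tb * data_tb  # $27,648,000
hadoop_cost = hadoop_per_tb * data_tb        # $307,200

print(f"warehouse: ${warehouse_cost:,}")
print(f"hadoop:    ${hadoop_cost:,}")
print(f"ratio: {warehouse_per_tb // hadoop_per_tb}x cheaper per TB")
```

At a 90x per-terabyte difference, data that was uneconomical to keep online at all, like ten years of transactions, becomes affordable to keep active and queryable.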
The value proposition is very compelling. Stefan, thank you for coming on theCUBE. We've got a break right now, and I'm sorry about the scheduling. Just one quick thing. I know you guys got a little involved in the who-contributed-more-code debate. We wrote about that, and Datameer sent out a tweet: whose Hadoop is bigger? Yeah. I had to ask you about that. So you got into the conversation, huh? So we at Datameer try to have a good time. We make great software, but we also try to see everything with a little smile. So I did a blog post in that context, because there were a few companies having a conversation about who contributed more, and the blog post starts out saying, actually, Datameer contributed more. Of course, it was a joke. And we made a t-shirt, and you can actually go to our website and get the t-shirt that says, my Hadoop is bigger than yours. So if your Hadoop cluster or your code contribution is bigger, please come to our website and get a shirt. I have one for you too, of course, if your Hadoop is bigger than someone else's. I certainly want that t-shirt. Yeah, we love to play jokes on theCUBE, SiliconANGLE, Wikibon. Last year we covered Hadoop when we were the only ones doing it; now everyone's covering it. But the joke was, we're going to come out with our own distribution of Hadoop. And we kind of announce it every year. So your elephant is faster than everybody else's. I figure EMC has one. Yeah, I mean, why not? SiliconANGLE is going to launch our own distribution of Hadoop. Come to SiliconANGLE.com and download it; you can find the link. That's great. Thank you for coming on theCUBE. Thank you very much for having me. Appreciate it. Great to meet you. Thank you. Sorry about the scheduling.