Hi, welcome back to Hadoop Summit here in San Jose. Jeff Kelly here from Wikibon. I'm here with my co-host, John Furrier with siliconangle.com, and Matt Pfeil is here, DataStax co-founder. Welcome to theCUBE. Thanks for having me. We had you guys on at Strata two years ago. Lots changed in the marketplace, a little bit, so. We also had them on more recently at the last Strata as well. Yeah, you guys have been big supporters of theCUBE. We appreciate that. You guys were in early, and obviously the database world is going through massive change. I mean, Hortonworks comes out of nowhere a year ago. Look where they are today, heavily venture backed. So what's the update with DataStax? Take us through where you guys are at right now, relative to all this discussion of HBase, HDFS, and MapReduce. Definitely. Where are you guys sitting in that equation? So we've always been the company behind Cassandra. Cassandra is an online database. It's part of the NoSQL space, so very high performance, very scalable, very distributed. And obviously as big data has continued to grow, our focus has been, how do you bring more and more of the compute to the data so you don't have to move between different systems? So what we've done is, we've got a Hadoop offering where we've integrated Apache Hadoop and Hive and Pig, et cetera, and Mahout with Cassandra. And then recently, we've also included Apache Solr in that. So now in one offering, you can get your online database needs through Cassandra, your analytics through the Hadoop stack, as well as your search needs through Solr, all completely compatible with the Apache distributions of each of those. How about the, share with the folks out there who are like kicking the tires. There's also some guys in the industry who know the inside baseball, but for the most part, there's a whole new wave of audience coming in saying, hey, I want some of that Hadoop stuff, Hadoop synonymous with big data.
There's a lot of use cases outside of just pure Hadoop MapReduce. And you guys hit that, right? It's a huge search function, different data sources. Talk about that marketplace outside of Hadoop. There's a whole other world obviously going up the stack to OLTP and other high-end scalable systems. I think you nailed it. If you look at where Hadoop came from, it was for offline batch analytics and doing incredibly intense things with unstructured large data sets. Whereas what we've done with Cassandra came from the other side, which is, give me all of that information that needs to be served in millisecond response times, and what can you do with that? And so one of the really nice things about Cassandra is its ability to have basically as many data centers as you desire, and then have all of those be active for reads and writes. So a lot of people, whether it's in the cloud or their own data center, say, I need my data around the world online at all times, and have the data store keep that in sync. So what we've done with Cassandra and the search integration is, whether you want to do sort of advanced key-value store queries via Cassandra, or you want to do intense full-text search queries in real time, something like a Google-type scenario, we provide that out of the box. So e-commerce is really popular if you want to do a product search. If you want to do something like even financial transactions, that's now possible with our technology. Apache is obviously growing huge, right? It's open source. First of all, we're huge open source fans, and we personally think, from our research, that open source is going to absolutely disrupt even more than it ever has, ever. But open source is like a NASCAR race, right? Someone's in the lead. Next project comes up. HBase right now is up there. How does Cassandra play in that NASCAR analogy, in the sense of you guys have a good track record. There's a huge community behind it.
Not as much hyped up because you get passed with some hype. So talk about how you guys fit into the overall ecosystem for the folks trying to unpack this. I think that people have really seen Cassandra as the solution when you need your online data to always be available. One of the nice things it does really well, if you look at the traditional CAP theorem, is it sacrifices consistency for availability in that scenario. Or it's actually tunable, but that's a longer conversation. But if you need something that can always be up, whether a server goes down or a rack falls over or even an entire data center goes offline, Cassandra's the choice because you can lose any server at any time and the system as a whole is always available. There is no single point of failure like there is with the NameNode in Hadoop. And so that's a real strength when you're talking about mission-critical, can't-be-down-for-even-a-second type data. Where do you guys see the sweet spot there? Your wheelhouse, application-wise? Is it hyperscale, financial services, all the verticals? Honestly, we're very vertically independent. We sort of like them all and they all sort of like us. And I think the key, though, is really when uptime matters more than anything, again across any geographical location, and performance of course is the key there. When you need sort of sub-10-millisecond response times and you really need your data to be really fast, we're the key there. You know, Google and Amazon have great white papers where every 100 milliseconds of latency cost them 1% of revenue. And so you obviously don't want more latency in those scenarios. Dave Vellante, who's normally the co-host, Jeff's boss, is a big horse racing fan and probably an addict on the betting side. I can imagine, knowing Dave. But he has a phrase, horses, courses for horses. Horses for courses. Courses for horses. Horses run better on certain tracks. So here, this is Cassandra, right?
So essentially what you're saying is that for high availability in production environments, you guys are the preferred solution. Yeah, the phrase that we've been using is continuously available. At no time can the system be down. And again, whether you lose a data center, that happens. Data centers unfortunately go down when a backhoe cuts a piece of fiber or a car runs off an interstate and lands on a power supply. It happens. Amazon East has been down a couple of times over the last few years. And a lot of the, in fact, all the guys that were running Cassandra out of two data centers didn't even notice that. I mean, let's be clear. We're big fans of Hadoop and all the rah-rah around the Holy Trinity, HBase, HDFS and MapReduce, which I think is the Holy Trinity. But it's still early. Look at what they're announcing here. High availability and the NameNode issue, right? We didn't actually get into that with everyone. I want to talk more about that. But up until CDH4, there was one problem: one NameNode. So you guys are well ahead of the pack, hence the horses for courses analogy. So talk about DataStax. What's going on with the company right now? Where are you guys at? What's the current product focus? Give us the inside track on that. So obviously, just like probably a lot of people in the sort of big data market right now, we're going through hyper growth. I mean, we are just over two years old, and I remember when there were two of us, and now we're up to north of 50. We did a Series B, I guess, eight months ago, and so that's been really successful. We're really happy with bringing on a new board member. And in terms of customer accounts. Which VC did that deal? The latest one was Crosslink Capital. Eric Chin joined us from Crosslink on the board. And we're very excited that our customer accounts are now, I think, right at 200 or just under. So we've seen a lot of success there. Again, it's across all verticals.
So it's really fun to go meet a lot of these users and hear about interesting use cases where they affect the end user very differently, but behind the scenes a lot of the data models are actually very, very similar. So we see a lot of success across different industries as a result of that. In terms of the product, you know, we actually, I think today, announced DataStax Enterprise 2.1, which is our main offering that combines Cassandra with Hadoop and Solr. And the release today integrated Mahout into that offering as well. Along with some minor enhancements where you can have multiple deployments of HDFS across multiple data centers. I'm sorry, you can run multiple instances of HDFS in the same cluster now, including multiple data centers, and then run independent jobs across each of those file systems. So you mentioned, you know, seeing some really interesting use cases and customers. So talk a little bit about who your customers are. How do they come to you in terms of, you know, what are their issues? Do they come to you because they have continuous uptime as the original kind of pain point they're looking to solve, and you now offer them a really comprehensive platform to do some other things as well? So how do you typically engage with the customer, and then maybe go into some use cases, some examples you're seeing. Sounds good. So, you know, obviously one of the really nice things about our business model is we are the company behind Cassandra, and I believe almost 100% of our leads have been inbound. So people come to us, which is really, really nice. In terms of why they come to us, a lot of times what we hear, and I'll use an example in a second, is whenever it comes to things like scalability or performance or the ability to do multiple geographic regions, or even to run on top of just commodity hardware, other solutions just can't do it.
You know, we run truly on really low-end commodity hardware, and so, you know, a really good example of that is Netflix, who moved off of Oracle onto Cassandra on the Amazon cloud for almost everything they do today. You go log in tonight, and the backend data store for that is Cassandra for almost all of it. And so that's something where we see a lot of scenarios in e-commerce, such as companies like Williams-Sonoma, who have talked about how they use Cassandra for their registry system, and other e-commerce sites where you simply can never be down or you lose massive amounts of revenue. All the way to major financials on Wall Street who are doing things like tracking ticker data in real time. Those guys are a little more secretive about their names just because they view IT as a sort of competitive edge, but it's rare to find online data stores that can handle that massive amount of throughput and actually keep up. So we, you know, we talk about your platform kind of having the three pillars, if you will. So managing that kind of environment, how do you go about doing that? We talk about workload management and making sure each workload is working optimally. How do you guys approach that? So there's two things that are actually really interesting about that, and the first one is, unlike a lot of the other Hadoop distributions, one of our really large advantages is operational simplicity, and I'll define that real easily. Every node in our cluster has the same role. There aren't different types of nodes like the DataNode or the NameNode or ZooKeeper, et cetera. There is a single type of server you deploy, and you enable the services you desire on that. It's all one JVM, which makes things really simple from an operational perspective. Going back to Netflix, they manage, I think, 55 clusters, or so they've said, with Cassandra, and they use three people to manage that entire system.
So it's fairly simple to do it operationally. Now the second piece is around how do you manage all the interaction there? So one of the nice things about Cassandra's architecture is we utilize the strength of that architecture to afford workload isolation between the different workloads. In other words, I can have my data set shared between the online, the search, and the analytical pieces, all with the same data, but I don't have to worry about one workload killing the productivity of the other. So the last thing you want to do is kick off a MapReduce job and have the response time go from five milliseconds to five seconds on the Cassandra portion of the cluster. By utilizing our workload isolation, that's not a concern, and there's no manual ETL to do that. And so that's a really big strength for a lot of our customers who say, hey, I don't want to move data around between different systems. I don't want to have to worry about that. Why should you? It's not your core competency as a business. So I use DataStax Enterprise to accomplish that, and that really lets me focus more on the business problems of what I'm building around the data store. And in terms of, you mentioned Netflix kind of moving from an Oracle environment to Cassandra, could you just give kind of the big picture? I mean, the NoSQL world, Hadoop, Cassandra, how is that disrupting, in your opinion, the more traditional database market, certainly Oracle, but also IBM? Oracle and IBM are kind of playing in this Hadoop world a little bit themselves, but what are you seeing in terms of the NoSQL world as a disruptor? I think the key is that if you look at the past, we had relational databases, and if we had an online transaction, we pushed all of our data into that relational database. And we squeezed it in. I've done it before.
When I was at Rackspace, you know, we ran the cloud apps division completely off of either MySQL or MSSQL, and we basically partitioned the crap out of those databases. But we ran it for millions of users and made it work across multiple data centers. Now that there are options finally available, I don't think that the relational database is going to go away. I think there's great use cases for a relational database. I just think in a lot of areas where scalability matters more than things like high-end performance features, a lot of these new NoSQL offerings make a lot more sense because they solve the bigger problem, that being scalability. And that's going to be a problem. I mean, that problem's only getting worse, or if you're a vendor, better in a sense, as data volumes grow. And that's really going to put, I think, companies like Oracle a bit in a bind. I mean, their approach is to throw more money at the problem, as opposed to that scale-out on commodity boxes kind of architecture. Yeah, I think Oracle will have some sort of reaction to that. I mean, they're pretty smart guys over there and they know what's going on. But I completely agree. I think the vast majority of future data growth is going to be in this type of technology, not the relational one. So, talk about when you guys go into a customer situation. Cassandra, it's still new, just like Hadoop and HDFS, kind of a burgeoning technology to a degree. It certainly doesn't have the legacy or the history of the relational database. What are you encountering in terms of attitudes towards adopting a new approach to data management? We talk a lot about, in the Hadoop world, is Hadoop enterprise ready? Of course, a lot of the work you guys do is exactly around that, to make Hadoop a more robust platform.
But what are the kind of attitudes that you're hearing from customers when it comes to, are all these NoSQL and Hadoop-based technologies ready for prime time? I think they're right about on the edge of being there. In fact, there was a great interview someone did online that said, how many of the Fortune 500 are using these new technologies? Or how many do you think are? And my gut says everyone in the Fortune 500 has at least one project in either something from the Hadoop space or the NoSQL space, if not both. I think that from our perspective on the Cassandra side, we've done a lot of work over the last year and a half to do things that make the adoption a lot easier from a development perspective. So we've introduced the Cassandra Query Language, which is very much like SQL, which really eases developer adoption. But at this point, we've got CIOs and VPs telling us, I can't even do this with the old technology because the vendors have told me it won't work at the scale I want to go to. So I have to go to something new. I would say at the beginning of 2011, there were a lot more question marks around, is this ready for prime time? And now it's almost assumed that this stuff is really going to make a difference in their business because they've seen the success stories elsewhere to know what's going to... Right, yeah, you're starting to see. That's classic vendor Olympics, which is urinary Olympics, we call it, where they're going to put the foot in the water because they don't have the products. And they own the account. I think it's just a matter of use cases for you guys. I mean, I'd like to know, as you go out and sell more and scale up your business, what are the top three use cases that you guys are seeing come out of them? So, and this is a somewhat generic answer, but there are two that really stand out to me. One is anything with time series data.
I mean, that could be something as simple as a tweet stream, which doesn't have that much of a different data model than ticker data on Wall Street. It's just a sequence of events that happen, or it could be when your phone is blipping its GPS every three seconds, so you can log in and see where it's been recently in case it gets stolen. And if you think about it, those data sets are astronomical, especially when it's coming from device-generated data. It's just gigantic, the amount of data that one device can put out. And at the same time, the devices per person are just going through the roof right now. My scale now hooks up to the internet to record my weight every morning. On the other side, I think anything involving the cloud from a user perspective is big. So things like SaaS are really important, because it's not that I think one user's worth of data is going from 50 kilobytes to 50 terabytes. I think it's that you have millions of users in one place, as opposed to being siloed previously. So those two use cases are just gigantic for us. Great, so in terms of what you're seeing here at this event, what are some of the more exciting things you've seen, what are your impressions of what's going on here, this environment? Is it more of the developer focus, less business? I mean, what's the vibe here? Well, there's 2,100 people here, right? So who thinks Hadoop is just a fad or isn't going anywhere? I think that's the big one, is that this big data thing is starting to catch on finally, especially with the masses. On the business side too, specifically. The C-level suites are pushing down, messaging to IT guys saying, get your shit together, basically. I think you nailed it. You know, two years ago I was at one of the private investor retreats where they have you come in and talk to all their investment bankers and CIOs just to sort of ask questions about this big data thing. And half the questions were, what is this Hadoop thing I keep hearing about?
That was two years ago. Nowadays we're hearing, hey, what do you do differently than this one? What's the next thing that's coming out? And yeah, I think it's starting to hit the C-level very, very strongly. They know what they're doing with it. Val Bercovici from NetApp, in the Office of the CTO, said all of NetApp's business opportunities in big data are not coming from IT. And all their salespeople sell to IT, right? So you're seeing kind of that. And when we were with theCUBE at HP Discover last week, they bought Autonomy. So one of the cultural mismatches is HP sells to IT and does a good job at it. Autonomy sells to the business case. So in the tech scene, it's a whole other mind shift, right? So you're out on the front lines doing the solutions. What's your experience with the DataStax solution? Because you guys are in that middle ground, right? You've got to bridge both worlds. So one of the really good things for us is we're an infrastructure player. And so we are selling to the guys who really understand the need for high-performance, very scalable infrastructure. And we actually have a nice value prop because of the fact we run on commodity hardware, so the guys that are at sort of the highest level there immediately say, hey, you're going to save me a lot of money on CapEx on my hardware? Absolutely, let's talk right now. On the flip side, I do think that there's a huge opportunity for that next abstraction level up, for people who are helping a lot of the business folks understand what the big data is doing. And that's a lot of the guys here. Like Karmasphere, right? Yes, exactly. I think that's a huge opportunity to start to say, don't answer the question I'm asking. Tell me what data to pay attention to. And I think a lot more of the BI folks will be doing more of that, and there'll be opportunities coming out of that market in the near future. Okay, Matt Pfeil, co-founder of DataStax.
Thanks for coming inside theCUBE. You're a DevOps guy, as we love to call it. You know, ex-Rackspace, I have to throw that in there. You guys are selling infrastructure. Congratulations on all your success on the business. And thanks for coming inside theCUBE, appreciate it. Thanks for having me. Okay, we'll be right back with our next guest after this short break. This is theCUBE, SiliconANGLE.tv's exclusive coverage, continuous coverage of Hadoop Summit 2012. We'll be right back.
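
A note on the "tunable" consistency mentioned in the CAP theorem discussion: the interview calls it "a longer conversation" and does not go into detail, so what follows is only a minimal sketch of the standard quorum-overlap rule, not DataStax's implementation. It assumes each row is stored on N replicas, a write waits for W replica acknowledgements, and a read waits for R replica responses. Under those assumptions a read is guaranteed to overlap the latest acknowledged write when R + W > N. The function names below are hypothetical and exist only for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas, e.g. 2 of 3 or 3 of 5."""
    return replication_factor // 2 + 1


def read_overlaps_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set must share at least one replica with
    every write set, which is the case exactly when r + w > n."""
    return r + w > n


def tolerated_replica_failures(n: int, required: int) -> int:
    """Replicas that can be down while an operation needing
    `required` responses can still succeed."""
    return n - required


if __name__ == "__main__":
    n = 3  # assumed replication factor, chosen only for illustration
    settings = [
        ("write ONE, read ONE", 1, 1),
        ("write QUORUM, read QUORUM", quorum(n), quorum(n)),
        ("write ALL, read ONE", n, 1),
    ]
    for label, w, r in settings:
        print(
            label,
            "| overlap guaranteed:", read_overlaps_latest_write(n, w, r),
            "| writes survive", tolerated_replica_failures(n, w), "replica(s) down",
            "| reads survive", tolerated_replica_failures(n, r), "replica(s) down",
        )
```

The sketch shows the trade-off in the conversation: asking fewer replicas to respond keeps operations available through more failures, while asking for overlapping majorities buys stronger consistency at the cost of tolerating fewer failures.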