We're here with the co-founder of DataStax. Matt, welcome to theCube again, congratulations. Thanks for having me. You're the co-founder, right? I am the co-founder, unless they took that away somehow. Do you actually go through the code base and look at all the code over here? I have not looked at that code in a very long time; I spend most of my time these days with customers, figuring out what their needs are and how to make them more successful. We're here with Jeff Kelly, our lead analyst at Wikibon.org, our research team at SiliconANGLE. So we can be technical, but we'll be like a helicopter and go up and down, because we've had a lot of geek conversations. We talked with Jonathan, who gave a keynote. We had Adrian Cockcroft from Netflix, talking about his demo. So let's range on that. But really the number one thing that we find in the marketplace, that people who want some signal are kind of scratching their heads on, is the noise: Cassandra's not relevant, Hadoop's won, Mongo's better for this. So you go on Quora, you go on the threads, and it depends on who you talk to and what school they went to or what language they write in. They liked this or it didn't have the documentation, had a bad experience with this, loved it here, hated it here. It's a lot of noise. Yes there is. Help us strip away that noise and talk about Cassandra: why it's relevant today, where the traction is and where it's going, and then we'll get into some of the specifics. Sounds like a good thing. So first of all, the traction that we've seen with Cassandra is really when companies want to build their business on top of a platform that not only scales but will not fail, and they can legitimately trust their information with it to run the business. So we see a lot of interest in the Fortune 500.
We see a lot of interest where people are moving off traditional relational databases like Oracle because they don't scale as well, and we've just found that as our sweet spot at this point. So if you look at the lineup here today, we've got companies everywhere from Disney to the Netflixes of the world, and they're just talking more and more about how they can use this. The theme that I've been hearing is that production is a key buzzword, both on the positive and the negative. In the historical sense, there have been a lot of dings on, quote, the Quoras of the world. I bring up Quora because I was just reading it again last night: "No, I tried it in production at work." But then there's huge traction in the numbers in production, right? And more than others. When I say production, I mean high availability, scalable, multiple data centers, I'm running a real business, not some [inaudible] type of app. Is that true? Absolutely. I think that... Drill down on that and explain it. I would even go a step farther and say we might not even have the most production deployments out there, but I bet you we have the most production deployments of people with their core business on this technology. I think that Cassandra is really, really good at staying up and running. It's really, really good at scaling, and it's really, really good at performance on top of that. And when you look at those features combined, and someone's going to bet the farm, saying, you know, if this technology goes down, I legitimately lose revenue, you want to make sure you trust it. And, you know, go talk to the guys around here today and you see a lot of that scenario. So Matt, let me ask you another question. We'll change gears; we'll come back to that. I have a couple more questions that go back to Strata. At the original Strata, we had some conversations with folks. That's when Hadoop kind of made its move. We talked about Cassandra, Mongo, HBase, and Hadoop as kind of like NASCAR.
Someone's in the lead, someone drafts, then slingshots around the next turn. So it kind of is like that. But first, before we go there, what's the personality of the Cassandra environment, the ecosystem and the community? The personality of the ecosystem. I mean, every community has a personality. Alpha geeks, problem solvers. We heard Jonathan say they're problem solvers. Expand on that; explain to the folks what the personality of the group is. Besides being beer drinkers, we know that for a fact. There's some whiskey and vodka drinkers in there too. I think that the personality of our community is realists who are smart and know how to pick a tool that is right for the job. They want something that's not fluff. They don't want to do more work than is needed. They don't want to do things just because it's cool. They say, I have legitimate real-world problems and I want a tool that's going to make my life and, as a result, my company's life significantly better. So very practical, very smart, hardworking people. That's interesting, because practical is the same term Jonathan used, as opposed to the more academic or theoretical approach, which, you know, creates some really cool technology. But how do you translate that into doing just what you guys are doing, which is supporting production-level environments, people running their business on this? You can't do that if you're focused more on the theoretical, hey, look what we can do, versus, hey, look what we can do and actually make it work. You don't make a lot of money just by writing academic papers all day unless you're in academia. We have specific problems that we hear from our users that we solve, so that their lives in that production environment are better. Let's lay out the horses on the track. Horses for courses.
Some run better in the mud or on the grass, whatever the horse runs on: Mongo, HBase, Cassandra, Couch, SimpleDB, Dynamo, whatever. Well, let's take the main ones, right? Mongo, HBase, and Cassandra seem to be the top, you know, Couch maybe, but I'll take those three as the most controversial, in terms of the ones that a lot of people are paying attention to. Break down the horses there. I mean, which one's better for which use cases? I'm not saying one's better than the other, because we're hearing different use cases, right? HBase, you know, great for this, but don't try to put it here. Mongo, great for here, but don't try to put it there. Is that true? Or is it just too early? If you go back, you know, and look at the last 30 years, we always tried to cram every problem into one of probably five relational databases, right? Oracle won that race with 11g for most of the higher-end solutions, with MySQL, MS SQL, and Postgres around, but we crammed all of the data into one of those relational systems. An easy analogy is, if you give me a nail and a screwdriver, I can get that nail into the wall. It might not be pretty, but I can make it work. Well, now we have a lot more tools in the chest, and so I do think that's a really good thing for technologists everywhere, because you can pick the right tool for the job. At the same time, I think the areas where we see Cassandra really, really thrive are, first and foremost, scalability, especially when high availability and never going down matter. I think multi-data center is another close one, and then I believe that the third, as Jonathan said in his keynote this morning, is anything involving time series data. So we see those three use cases, and I can't stress enough that high availability, always being up and running, is a really, really important one for us at whatever scale.
There was a quote on Quora I want to read to you from a guy named Steven Elliott. I don't know where he works, but it was a good quote. This is about, you know, I want to get a NoSQL database. He goes: "I would not place a huge amount of weight on any language or framework. Each person claims they specialize, and the truth is that any programmer worth their salt would be objective and experienced enough to pick the correct stack for the job. I would say the best way to see how good a programmer or architect really is, is to ask them to sit and listen to what you're trying to do and recommend the proper stack and justify it with reasons." Do you agree with that? Completely. I remember when I was in school, in college, and I was learning how to program, you know, formally for the first time, the language they taught us was C++. And you know, we were like, oh, okay, we're learning C++, and someone had the question of, well, what happens if I need a job that wants Java or PHP or whatever it is? And this was freshman CS 101. The professor said, if you can't pick up a new language in a matter of days or weeks, we're not teaching you correctly. We're teaching you ideas. Languages are just methods to implement those ideas. Yeah, well, I mean, I'm a little bit older than you. We had to actually learn data structures, and we had Pascal back in the day, but we also had to do a lot of assembler. C was great. C was very efficient with memory and so on and whatnot. But that brings the question back to Java. Java has been dinged; compared to other languages it's still faster, but you have that C++ versus Java crowd, right? That also comes up in conversations. How do you talk about that distinction between the two? Obviously, Hadoop is Java-based, and Bigtable was C++. Languages are not something I am an expert at, and my programming days are behind me at this point.
But I will say that, you know, there are strengths and weaknesses to both of those, right? The C++ crowd has a lot more control in their hands and they do get better performance, but they have to worry about memory management. And when you mess up, it's really bad for you. Java doesn't have to worry about that. It's a much more efficient language for the programmer, because you just completely ignore the responsibility of having to manage your own memory. So there are pluses and minuses to both. And again, I think that quote's really good: pick the right tool for the job. Yeah, and that really comes down to versatility in the skill set, right? Which is a whole other conversation. So let's go back to my experience at Strata, O'Reilly's inaugural conference. Actually here at the Hyatt in Santa Clara, I had a conversation with someone in the Cassandra community. I will not reveal their identity; they probably were not authorized. But he was bullshit about the whole Hadoop thing: oh, the best solution is not the one winning. So, you know, historically, the best solutions don't always win. They usually don't. Usually don't, you know. Look at the best technology solutions. Look at Microsoft DOS. That's my generation. It sucks, right? It's the worst operating system. Yeah, and then we've got Windows and then, thank God, now Apple's out there. Mac guys and Windows guys, I couldn't resist. But no, Cassandra has been known as kind of hardcore, with deficient documentation. We've heard you guys took some lumps for that and are fixing it, making the tools and the GUIs better. It was solid, but it was being out-marketed by other environments. How are you guys responding to that? Honestly, it's not affecting the community. It's pretty solid here. As someone who's a founder of DataStax and in the community, what are you guys doing to change the perception? Roadmap-wise, code-base-wise, contribution-wise. Give us an update.
I'm not saying those things are true, but I'm just saying that was kind of the sentiment of the crowd then. Because Hadoop was being pumped up pretty heavily. And I'll say that, first of all, I think that Hadoop is complementary to Cassandra. You know, Hadoop's core is batch analytics. Cassandra's core is very low-latency, fast queries for online-type scenarios. So again, those are not one or the other. It's very much, I should have both in my environment. Now with that said, I'm very, very happy with what we've done at DataStax as we've grown. You know, this is the third annual Cassandra Summit that we've now hosted. The first one was almost two years ago to the day. There were 140 people there. There were over 400 last year. And there's somewhere between 800 and 900 people here today, if not more. I just haven't seen the latest counts. The community is growing fast. It's a packed house. It's packed. Yeah, it's packed. We got emails that said, DataStax employees, do not eat any of the food. We had more people show up than we expected. So they took away our lunch. Now you've got a sushi bar over there. No, that's true, that's true. Co-founder. So, every once in a while they let us splurge. With that said, I think that one of the things that the Cassandra community, this is not DataStax but the community, has been really disciplined about, and it's important, is that we're not trying to be the best at everything. We are not trying to go a mile wide and a mile deep. It's impossible. We have really chosen some of the larger-data, very important data set problems as the ones that we're going to focus on. And I think those results are speaking for themselves. And I think that the users and customers of ours here today are giving great testimonials as to why that's important to them. Talk about those use cases real quick.
Disney's in the other room talking about how they are building a data infrastructure as a service for their other teams, so that each individual team no longer has to build and scale their own database. So they're building an internal service to power the rest of their business. Any other use cases that jump out at you, production-wise, that are compelling and that you can share publicly? Ones that are spoken about here today, I'm going through my head on all the great presentations. There's a great company that Matt Stump is the CTO of called SourceNinja, where they are working on tracking every single open source project for every single program that you've written. In other words, how do I make sure that every open source library that I'm using in my application is not only up to date, but that if it's GPL, I know, so that I don't have to open source all my code. Companies are trusting their core businesses to this stuff. We see Eddie over there, MVP, congratulations. He's kind of watching the co-founder in action. I know Jeff wants to jump in, but I've got to ask you this next point, because it's really come out to us as a real differentiator for the Cassandra product and community as it is. And it's also affecting the marketplace in a big way, and that is solid state. Solid state is changing the game on converged infrastructure, mainly because it's enabling people with on-prem infrastructure to really start changing their own economics and performance configurations: caching layers; you're seeing spinning disk move to backup; and you're seeing, as Eddie mentioned when we were just talking earlier, architectures being re-architected as IO-centric infrastructure. Completely, and just in the past 12 months within the enterprise, it's changing HP, Dell, and IBM's landscapes. So that's going on inside the enterprise, in the data center, among other things like power and cooling and the other stuff we track.
And then externally in the big data world, in the cloud, you have a massive tsunami of new apps, greenfield or clean-sheet-of-paper stuff like that. Talk about the impact of solid state on, one, you guys, your DataStax solution and the Cassandra community, and then what's going on with the customer. So first and foremost, everyone says SSDs are more expensive. And if you go by a pure cost basis, sure, an SSD drive costs more per gigabyte. For a device. It costs more per gigabyte than spinning media. But if you're doing a bill of materials for a device... It does not cost more on a per-IO basis. A spinning disk can spin at most 250 times per second. So you get 250 requests out of that. That's if you don't factor in other things, like taking away a SAN, a million-dollar SAN. Right, and how much? For the cost of some SSDs. Exactly. So the cost stance, we've debunked that. Okay, perfect. So we can move on. So I don't even know where we're going then. That seems like the answer's here. No, I want you to give us some proof points. SSDs make it truly linearly scalable from the hard drive, or the storage medium, to the memory, to the CPU, where you can map a certain amount of CPU to each one of those other components so that you can guarantee certain amounts of requests and do really, really good capacity planning for databases. I mean, SSDs are like the silver bullet for databases. Yeah, we heard Jonathan say that. But the reason why we're asking you is we want to get real third-party validation around our own rhetoric and reporting and analysis. But we're also really trying to share what the impacts are on customers' environments, because what SSD is enabling on the converged infrastructure side is the same kind of disruption that's happening on the customer side, which is that new ways of doing things are now emerging. So for example, we've said, let's look at the angle that with big data, new solutions can now be architected that never were thought of before.
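To make the per-gigabyte versus per-IO argument concrete, here is a minimal sketch in Python. The request rates are the ones quoted in the conversation (about 250 requests per second for a spinning disk, and the 10,000 per second Matt cites as a minimum for an SSD). The prices and capacities are hypothetical placeholders chosen only for illustration, and the `Drive` class and `drives_needed` helper are names invented for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    price_usd: float    # ASSUMPTION: illustrative price, not from the interview
    capacity_gb: float  # ASSUMPTION: illustrative capacity, not from the interview
    iops: float         # random requests per second, as quoted in the interview

    def cost_per_gb(self) -> float:
        return self.price_usd / self.capacity_gb

    def cost_per_iops(self) -> float:
        return self.price_usd / self.iops


def drives_needed(target_iops: float, drive: Drive) -> int:
    """Drives required to serve a target request rate, ignoring capacity,
    replication, and caching (a deliberately simplified model)."""
    return math.ceil(target_iops / drive.iops)


hdd = Drive("spinning disk", price_usd=100.0, capacity_gb=1000.0, iops=250.0)
ssd = Drive("ssd", price_usd=400.0, capacity_gb=250.0, iops=10_000.0)

if __name__ == "__main__":
    target = 50_000  # ASSUMPTION: example workload in requests per second
    for d in (hdd, ssd):
        n = drives_needed(target, d)
        print(f"{d.name}: ${d.cost_per_gb():.2f}/GB, "
              f"${d.cost_per_iops():.2f}/IOPS, "
              f"{n} drives (${n * d.price_usd:,.0f}) for {target} req/s")
```

With these placeholder numbers the SSD is more expensive per gigabyte but cheaper per request served, which is the shape of the argument being made. Real prices will move the crossover point, so the figures should be replaced before drawing any conclusion.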
Because now instrumentation data from the business can be rendered in a dashboard. So with converged infrastructure, the incumbents like Dell and HP and others sell servers. And you've got Cisco out there, and you've got Juniper, and Nicira, now part of VMware. And you've got storage, EMC selling arrays, right? That's the spinning disk. Fusion-io and Violin Memory are disrupting the market with caching layers up and down the stack. So this is game changing. So I'm trying to figure out for our audience, what are the use cases? Who's taking advantage of this? What architecture, Cassandra or others, is the best fit for that? I would say any application that's in an online setting should be using SSDs. It's that simple. Because the real benefit of SSDs is you eliminate seeks, right? So I'm going to get 10,000 requests per second at a minimum per SSD, with a latency of single-digit milliseconds, right? Really fast. There is no worst case anymore. It's linear. It is static. Use SSDs in any online scenario because, A, you'll save money, which you already covered, and B, you get a better quality of service for your customer. You take the guesswork out of it. All right. Jeff, I didn't mean to hijack the whole interview. Go ahead. No worries, John. So yeah, I just wanted to dig in a little bit more. You're on the front lines talking to customers every day. So, you know, I'm wondering what you're hearing from customers as we look forward, in terms of what DataStax and the Cassandra community are working on now. What are some of the main requests you're getting from customers? Saying, hey, we love DataStax, we love Cassandra, but we'd like to see X, Y, or Z. What are the next steps you guys need to take to take this to the next level? So the big thing that's coming in Cassandra that's really exciting is that we're introducing virtual nodes in a near-future version. And that's public. That's out there in trunk right now at the ASF.
And what that'll do is make recovery time after a node failure significantly better. And adding nodes to the cluster will be a lot better. So we're really working on rounding the edges on some of the operational things, just to make the user experience that much better as they continue to grow. Because you know, obviously the nature of any of these systems is, you start with some nodes and you grow. Otherwise, why would you use it? And so we want to make sure that that growing process is as seamless as possible. So that's another step in that direction. There's some more information around collections that I heard about on the mailing list and here today. I need to dive into them, and I just don't know enough about them at this point. But that group is moving forward really rapidly. And I think they're targeting an October release right now for the next version of Apache Cassandra. I want to talk a little bit about security. I noticed you guys signed a deal with Gazzang recently. So, you know, I've been thinking about security in the NoSQL world. And there was a story in Wired last week or two weeks ago about a new company coming out of the NSA that mentioned that they built cell-level security into their NoSQL database. How do you guys approach security? And how important is that to your customer base? Again, you're on the front lines. Is security an issue that comes up, you know, among the top two or three points when you're talking to customers? Yeah, so a couple of weeks ago I was on Wall Street, and we hear a lot of security questions, like how would we accomplish X, Y, and Z? And our partnership with Gazzang was a great step in that direction. So obviously for anyone that's in the Fortune, you know, 1000, that's a bigger deal than for the guys that are startups on Amazon today. They just haven't reached that stage of their company's life yet where they have to worry about it.
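The point about virtual nodes making node recovery and cluster growth easier can be illustrated with a toy consistent-hashing ring in Python. This is a simplified sketch, not Cassandra's actual partitioner, token allocation, or replication logic: the `token` function, the `Ring` class, and `donors_when_adding` are all hypothetical names made up for the example. It counts how many existing nodes hand data to a newly added node when each node owns one token versus many.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (illustrative hash choice)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Toy token ring: each node owns `tokens_per_node` positions, and a key
    belongs to the node owning the first token at or after the key's token."""

    def __init__(self, nodes, tokens_per_node):
        pairs = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(tokens_per_node)
        )
        self.tokens = [t for t, _ in pairs]
        self.owners = [n for _, n in pairs]

    def owner(self, key: str) -> str:
        # Wrap around to the first token when the key hashes past the last one.
        i = bisect.bisect_left(self.tokens, token(key)) % len(self.tokens)
        return self.owners[i]


def donors_when_adding(nodes, new_node, tokens_per_node, keys):
    """Return the set of existing nodes that give up keys to `new_node`."""
    before = Ring(nodes, tokens_per_node)
    after = Ring(nodes + [new_node], tokens_per_node)
    return {before.owner(k) for k in keys if after.owner(k) == new_node}


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    keys = [f"row-{i}" for i in range(5000)]
    single = donors_when_adding(nodes, "node-e", 1, keys)
    vnodes = donors_when_adding(nodes, "node-e", 64, keys)
    print("one token per node, donors:", sorted(single))
    print("64 tokens per node, donors:", sorted(vnodes))
```

With one token per node, a joining node takes over a single contiguous range, so at most one existing node streams data to it. With many tokens per node, the new node's ranges are scattered around the ring, so the work of bootstrapping or rebuilding it is spread across several peers.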
So it is something that comes up a lot. And obviously at DataStax, we're working a lot with partners to accomplish the things that customers need. Yeah, interesting. We talked a little bit earlier in the day about your partner strategy, but maybe you could articulate, I mean, how do you decide when there's a part of your platform that you need to partner on versus building it in conjunction with the community? I think that our strategy is we always think about the customer first and we think about the user first, and we think about what's going to be best for them. And sometimes it makes sense, because of the expertise that we have in-house, to say that's really core to the server and we should build that. There are other times where it's not something that we're necessarily experts at, and if we're not experts, it's really hard for us to build an extremely high quality product very quickly. There might be other people out there that can do it better than us, and we can get that end user satisfaction more quickly, such as with Gazzang. In that case, it made perfect sense. You know, Gazzang's based out of Austin. We have a team in Austin. There was a nice introduction there, and next thing you know, we've got this offering that we brought to market relatively quickly. So we look at each thing on a case-by-case basis to really find out how we can make the customer the most successful. That's interesting. So you really focus on the core of what you do. If it's related to that, that's where you guys decide to invest internally, versus finding a partner for something that might be more peripheral, important, but not part of your core DNA. Exactly, it goes back to the earlier thing. We can't be experts at everything. We just can't be. And if we try, we'll fail miserably at everything. So let's do the things we're really good at but, again, never sacrifice the customer experience.
Good strategy. Never. Yeah, I was going to say you never want to let the customer down. So finally, you know, we're seeing all the different NoSQL databases out there, and John and I have talked today with guests about the horse race. Are all of those different NoSQL databases racing to the same endpoint, or are we going to see an environment develop where there are really different NoSQL databases for different purposes, and it's not going to be a winner-take-all type of scenario? If you say that the destinations are use cases, I really think that there will be different winners for different use cases. I just don't know why we would go back again to having something that tries to be the best at everything. I just don't think it's possible, and because there are options, why would you not choose the best thing? It's out there. There are companies focused on different things. There are projects focused on different use cases, and as a result, you'll naturally get better results at some things than others. Matt, I want to talk to you about something. As you mentioned, when you were in school, you were back in the days when they were teaching C++ in CS 101, which is good. You have some chops, a good computer science program. You're a little bit younger than me probably, but we've been living in a decade of great innovation where open source has created a lot of wealth and opportunities for entrepreneurs. There's the ability to stand up a venture from zero stage to a prototype in market with validation at very little cost. The technology tax when I did my first startup was, I needed to spend $20,000 just for some gear in a data center, in a closet, and a T1, and I had to buy a Sun box. It sucked. It was hard. You had to go to the VCs and lie down and take your punishment. But now it's all history, it's well documented.
But the lessons were learned. Twitter was started in the cloud, Zynga was started in the cloud: hey, there's some MySQL servers, they jam up, throw some more in, scale up, throw some more RAM in there. But we heard from Netflix that you have problems, right? Scaling problems, things are breaking. I think Jonathan called it tech liability or tech something like that. Technical debt. Technical debt, yeah, which is true. I'm not asking how you avoid that, but we're a learning culture; entrepreneurs like to learn, developers seek information out. What would you say to the next generation of entrepreneurs who are using Node.js, for example, playing with Ruby, with Python, all these different tools, building apps, pumping stuff into the cloud, not thinking about what's possibly downstream? Are there things they could do, given what we have learned, that aren't a tax, that they could do fundamentally differently than before from a computer science and entrepreneurial perspective? Because before it was simple: oh yeah, cheap gear, boom, LAMP stack up and running. I think the cloud's actually a really big piece of what this has enabled, because the cloud made it so that there's no more CapEx, there's only OpEx. You can only program so fast, and languages will evolve and become higher level so that that can move faster. You can procure hardware via the cloud in a matter of seconds. I don't think that the development game has changed. I think what this does is enable the developer, and the entrepreneur specifically, to move quickly. And more importantly, you're going to fail, so fail fast and iterate off of that very, very quickly. And there's honestly no excuse not to do it these days because of how low the cost of entry to the game is. You don't need that much cash to start a company anymore. You just don't. So there's no reason not to take that idea and run with it. And if it doesn't work, it doesn't work.
You know, nine times out of 10, you will fail. Yeah, push the envelope. That's right, but go faster, and it'll make the world as a whole a better place. We will see more innovation from more sources, and as a result, there will be more benefit to the average user like you and me. What is Cassandra's community doing to foster that creativity and to give someone comfort? Because young kids will just throttle up because they don't know what's around the corner. They haven't crashed enough to know that it hurts a little bit. But obviously when you have failure, teamwork and community are always there. One thing about Silicon Valley and the tech ecosystems we live in, and the Apache community especially has been known for this for years, is that there's an honor among thieves, if you will, and I use that metaphorically, to help each other. Yep. So if you look at the Cassandra community, there's two angles to it. One is we're very open to accepting new people. You know, we acknowledged, I think, 17 MVPs today who've been building this community. And we are not there to drive an agenda. We're there to let them say, this is what the community needs to do next. At the same time, you do need some defense, some adult supervision, as you might say. And Jonathan, at the top of the Cassandra committers, is really good at saying, guys, I want to innovate, but we need to do so in a safe manner, because we are a database, and if we break, that's the end of the game. We can't break. A database, first of all, stays up and running, and second of all, performs. So there's a nice yin and yang between the guys guarding it and, at the same time, accepting innovation from some of the guys that come in. So the community as a whole can grow and then foster faster growth. Great, all right, cool. Outlook for the next year. What are you looking for in the next year? Just in life, in general, looking forward. More of you guys at these great events.
We'll be at Hadoop World and Strata. We're going to be at VMworld. We're going to be at Intel Developer Forum. We're going to be doing a remote Cube at Brocade's, doing a big data center thing. So obviously, we're covering a lot of converged infrastructure and big data as our sweet spots. But we're ranging across. Any other events I'm missing? I think you got them all for now, but I'm sure we'll have some more. I'm sure we'll have some more. We'll be on more red-eyes. We love to go out into the field and talk to the smartest people we can find. So, Matt, thanks for coming on theCube. I really appreciate it. Matt, the co-founder of DataStax and participant in the community. Great stuff here. Great insight for entrepreneurs and on just what's going on with Cassandra. We'll be right back with our next guest from Netflix right after this break.