Okay, we're back live here at the Strata Conference in Silicon Valley, where the big data discussion is happening all around us. The world's changing. The business intelligence and data warehouse market is changing, and all these new technologies are here making it happen. I'm John Furrier, the founder of siliconangle.com, and I'm here with my co-host.

I'm Dave Vellante of wikibon.org, and we're here with Billy Bosworth, who's the CEO of DataStax. Billy, first time on theCUBE, but a fan, I understand. So welcome.

I've watched you guys many times. I'm really thrilled to be on.

You guys were on last year too at theCUBE. We had your other guy on, who left DataStax to start Platfora, which is an early-stage company. I think they got a Series A from a big VC firm somewhere in Silicon Valley I'd never heard of before. But they only have $6 billion under management.

Small fund.

Yeah, small fund. So Billy, you guys invented the Netscape browser.

That's right.

All right, let's get into it. So you guys commercialize Apache Cassandra, right? Talk a little bit about that and where you fit in this whole ecosystem.

Great. Well, Apache Cassandra is obviously very near and dear to us. It's what we started the company on. The founders were responding to demand from people who were moving into production environments with Cassandra and needed help with that. Those were the early stages. Since then we've evolved our business model a great deal. The situation now is that we absolutely want to see Cassandra thrive and grow in the community, so we do a lot there. We're obviously behind a lot of the commits that go into it. We help with the education process. We have a free version that we bundle with documentation, tutorials, that sort of thing. But we also leverage it as the core foundational technology in a larger offering called DataStax Enterprise. And the interesting thing about what we're doing there is that we do not fork the code.
When you're in the open source world and you use a word like "commercialize," all the open source ears go up, right? What does that mean? Are you forking it? We don't do that; that's not our philosophy. Our philosophy is that we leverage Cassandra to do interesting things for enterprises in a big data platform.

So does DSE replace HDFS?

It does not use HDFS, that's correct. It uses the Cassandra technology, which is a fully distributed architecture rather than a master-slave architecture, and you pick one or the other when you're building these types of systems. With us, you bring the data in via Cassandra and then you don't want to move it around, so we enable people to leave it where it is and run Hadoop functionality on top of it. The key issue there becomes workload contention, so we guarantee workload isolation, so that you're not in a situation where those resources conflict.

So you're bringing the performance of Cassandra together with Hadoop analytics by dropping in the NoSQL database.

That's correct. It's the core of it, and it gives you all the architectural advantages people love about Cassandra: it's fully distributed, it gives you continuous availability, no single points of failure, and geographical distribution across multiple data centers. All of that gets inherited, if you will, into the Hadoop layer, so you get it for free when you're doing your MapReduce, your Hive, and your Pig.

Okay, and you said before that you don't fork the code. Can you talk a little more about what you mean by that? What's your open source strategy?

Right. Just to be clear, forking code is a situation where somebody takes a version and says, we're going to give the open source world version A, but then we're going to take our own version and do all these special things to it to make it better.
The reason we don't adopt that as a business philosophy is that we want people to come to us for the advantages inside DataStax Enterprise. If we cause unnatural development behavior in the open source world versus DataStax Enterprise, it becomes a very tough transition path for an open source user, and we don't want that. So we want to see Cassandra grow, thrive, and get better and stronger on its own. Our value on top of that starts with the standard stuff, like supporting it and web-based tools for monitoring and managing it. But the real value to businesses comes from what DataStax Enterprise adds on top of what you could do with Cassandra by itself.

So break it down for the folks watching. Cassandra is obviously a hot open source code base with a very vibrant community. You have Hadoop, which came out of the woodwork, is growing really fast, and is being hyped up big time, and legitimately; there's serious hype behind it.

Oh, sure.

It's legit, I think. Break down why Cassandra and Hadoop for the folks out there who are doing real business on existing legacy stuff. They've got data warehouses, they've got business intelligence systems. They're not in the in crowd chasing whatever the flavor of the month is; they've got to solve real problems. So break down the whole open source opportunity: how you're adding value, Cassandra versus Hadoop, and how that gets applied in the marketplace.

Sure. First of all, remember you're talking to a 20-year relational guy. So I completely understand that when you're thinking about a business problem and considering abandoning a 30-year ecosystem, you'd better have a good reason, and it had better be a good business reason.

Yeah, you don't do it because you want to, you do it...
No, you do it because you have to, and because you see some real advantages in it. With something like Cassandra, the first question I ask people who are evaluating technologies and come to me for advice is: how important is it for your application to always be available? Sometimes it's not. Sometimes downtime is completely acceptable. But when continuous availability is of paramount importance, that puts Cassandra on a very short list, because its architecture enables that not only in a local data center but across data centers. So you can do things like disaster avoidance, and you can make sure that, for performance reasons, your data is closer to your users. That's one aspect of it. Now you might say, hang on, in the relational world I have high availability, right? That's been around for a long time. But what you don't have is the ability to scale it to levels that relational databases just weren't designed to handle. Moreover, now you're into the notion of data being semi-structured and unstructured. What does that mean? It means you're going to have an unpredictable set of attributes for any given row. That creates a lot of flexibility for developers, but the relational world wasn't built on that concept. And the last element is flat-out scalability, but at a cost-effective rate, so that you're not punishing yourself on the cost side of the equation and limiting yourself on future growth, yet you're still solving your mission-critical problem. So those are the things that sort of...

So in the communities of the alpha geeks and the open source crowd, there's always mud being thrown between different open source projects; people promote and jockey for position, and all that other stuff you mentioned.
Okay, let's rise above that and talk about the real world: Cassandra, customers, and dealing with things like Cassandra and Hadoop. At the end of the day, people don't want to get stuck with anything, right? They don't want to get stuck with Hadoop, and they don't want to get stuck with Cassandra if it doesn't evolve fast enough with the market and the pace of the challenges, whether that's integrating the BI or anything else. What's the update on Cassandra? How has it changed? You mentioned your business model is changing a little with the market. What's changed with Cassandra relative to those customers? Not the industry infighting or conversations, but at the customer level, what's changed with the product and the community?

Absolutely. The first thing is the general education level. People ask me all the time, and it's a very common and legitimate question: who are your competitors? Then you get into what I think is kind of silly mudslinging, where everyone is fighting over a little patch of territory. Our real competitor is ignorance, and I don't mean that in the harshest sense of the term; I mean literally that the ecosystem has to get educated and caught up on how to think differently about these problems. What's happened in the last six months in particular, with some of the customers we've brought on board (to give you a reference point, we have about 140 to 150 customers), is that what we're starting to see now in Q4 are names of companies my mom would recognize, not just edge-case companies. And I'm asking them how they heard about us. So what is happening to change things?
They're starting with prototypes and small projects. And here's an interesting thing they're doing as well: if they have a large project with multiple architectural aspects to it, they're replacing one piece of a very large application structure with something like Cassandra, rolling it in slowly, and then making it more and more a part of the critical stack. What we're seeing, starting in Q4 and moving into this year, is people saying: okay, I'm through the prototype, I've got some people who understand how to think in a new data-model way, I've got some people who understand what big data really is, and now this is moving from prototype into real production. That's the big difference I'm seeing.

So let's drill into that point about prototype to production. Obviously it's a new mindset, and the solutions are new in the sense that predictive analytics and real-time analytics are at the forefront now. But I was just talking with someone in the hallway before coming on with you. There are these new solutions coming out, but he's also a data warehouse and business intelligence guy, and he said we've been talking about this stuff forever: at every session he's gone to, it's like, I went to a conference 10 years ago and had the same kinds of conversations, except now the predictive and real-time piece is bolted onto the front. So the question is, are those things happening? What kinds of conversations are new, and which ones are the same old conversations?
The conversations that are new are around cost-effective and manageable scale. That's what's new, because cost-effectiveness and manageability are often at odds: you will pay for simplicity, or conversely, if you want to save money, you will sacrifice simplicity for cost. In this new world it's about trying to balance all of those things together, and doing it in a way that also gives you flexibility. This notion of the application developer not being tied to a rigid schema is actually a really big deal. I was a developer for a long time, and being freed from a rigid schema creates some good things, but it also creates some bad things, because you sometimes lose the discipline of people who really understand how to document the data and how to tell other people what's in the system. We used to have folks called data architects, and people who did all this management; in this new world that's going to take some time to catch up. So those are some of the things.

So Bill Schmarzo, who has obviously been on the data warehouse and business intelligence side of the house for many, many years and is now at EMC, said that architecturally the data model is no longer the lock-in. Do you agree with that statement?

Yeah, partially. I think there are other aspects to it, but that is correct. The way I describe it to people who maybe aren't familiar with data models is: imagine you're building your house, and your house has these things called load-bearing walls, right?
Once those walls are fixed and you want to make a change to your house, it becomes a very big project. You can't simply say, I'd like to move all the rooms and reorganize them. That is analogous to a relational database schema: it's fixed, it's rigid, it's defined, and good things come with that. In this so-called NoSQL world (I think "flexible schema" is a better way to say it), you can literally wake up every day and say, I want every room in my house to be different, and you can do it just like that. That's the difference between thinking about a rigid schema and a flexible schema; it's a very different way of thinking about things.

So let me ask you about some other trends I'd like your opinion on. One: we had Scott Dietzen on, who's with Pure Storage and did WebLogic, an old-time systems guy, talking about the comeback of systems architecture and reconfiguration. I want you to comment on that trend in terms of this new solution set around data. Two: flash, which is obviously changing some database architectures around latency. And three: the rise of HBase. Just comment on those three things if you can. Are we coming back to a systems architecture programming model that's new, and what do you think about that?

It's funny: the older you are, the longer you've been in this industry, the more you start to see these things.

Yeah.

And you start to go, this sounds awfully familiar, right? Are we talking about an IMS database?
It's like that for people who've been around a long time when they hear these things. So it is definitely reinventing some of the same stuff, but I look at it not as a circle but more as a spiral: it sort of looks like familiar territory, but you've actually moved a little closer to the target, and the target comes around again. That cost, that flexibility, that scalability, that manageability: things that were awfully difficult to do before. But yes, some of the paradigms are going to look very familiar as you go around that spiral, and you think, I've seen that before, I've seen that story before. So that's one aspect of it.

I think it's being unbundled. When you talk to a bunch of folks doing flash, they're essentially taking commodity servers and creating master-slave architectures with flash. High performance has completely changed, and this is not just for unstructured data; it's for a combination of both. How's that playing out?

Absolutely, so that's the second point. We saw this coming a long time ago, even in my old role at Quest. Like everybody five years ago, we were saying it's just a cost equation. Clearly the technology is what everybody wants; they want that speed, and there's a lot of value in it. But it was unreliable for a long time, and it was very expensive. It seems like we've hit the tipping point with flash, where it's now becoming a standard. I was actually talking to somebody who's very educated in this space and does a lot of research for one of the big hardware companies [crosstalk], and I was asking him about the trends. He said: I think spinning media will definitely have its place, and it will probably always be the larger percentage of the market, but it's all about the use case. If performance was my holdup, I'm going to move to flash; if cost was my holdup, I'm going to go to these cheaper and cheaper spinning disks. So it is impacting the market, but I think it's going to be a selective choice; it won't be carte blanche. I think it'll be a hybrid solution based on the use case and, quite frankly, your checkbook.

Okay, then the third thing, and then I know Dave wants to jump in with some questions on flash, because I can see him champing at the bit: HBase. The rise of HBase has been very popular.

Yeah. I think the rise of HBase really comes from this problem: okay, I've got this data in Hadoop and I want to run batch analytics against it, but I also now want to start getting at it in real time. If you think about that scenario, it's the same outcome but in a different order from how Cassandra typically gets introduced into the equation. Cassandra begins with the real-time side, and then with DataStax Enterprise we introduce the Hadoop capabilities on top of it. So HBase is a natural step if you think about it in terms of low-latency data requests: I have to get at the data, so I need a way to do that, and if you're going to do it on top of HDFS, that's HBase. If you have a different model, you're looking at something like Cassandra or Riak or Mongo or even something like Membase. These are all different implementations, but the use cases are similar, and I think this is the point you're hitting on. It might be better to ask what's going on with the rise of real-time big data. And I don't mean scientific real time, where you get into these debates; I mean quasi-real time, near real time: fast, velocity and variety. Hadoop is volume; these other things are velocity and variety. So I have to be able to do it.

What other things do you mean, like Mongo?
Yeah: Cassandra, Mongo, HBase, Riak, all those other systems. I think that's a distinction that's definitely emerging. People are waking up to the real-time transactional side of big data, not just the batch analytics side.

I wanted to talk about that; it was actually one of my questions. I'd love to talk about flash too, but I think this is more interesting and more in context. Real time: you're right, there are a lot of academic debates about it. The best definition I ever heard was from David Floyer. He said real time is fast enough that you don't lose the customer.

Right.

You're going to act before you lose the customer.

I love that.

Or the patient. That's a non-academic definition of real time. Do you buy that?

Absolutely.

Yeah?

I absolutely do. I try to tell people it's more about query response. If I write a query and expect a response back, for most business people that's real time, as opposed to: I fire off a query, I fire off a job, I go to lunch, I go home, I come back. That's batch. And as you say, the use case... I actually like that definition. I think it's very good, because it puts a use-case face on this, and use cases are what we're struggling with in this world. People can only hear so much about the technology and the architectures, and then they want you to tell them: is this going to help me run my business? So thinking about it as "before you lose the customer," or the patient, which is an extreme example, is thinking about real time in the right way for a business person.

Yeah, good, I agree. My second question is an organizational one. You're a former DBA; you've risen up and obviously had a great career, but you started in the trenches. Is it a good time to be a DBA or not? Everyone's all about data scientists and new skill sets. What's your thought on that, and what's your advice to DBAs?

This is a question that's near and dear to my heart, because A, I came from that world, B, I have a lot of friends who are still in that world, and C, I spent the last 10 years of my career developing tools for that world. I think there is a phenomenal opportunity for DBAs, and here's why. Just like we were talking about a second ago, a lot of times we've seen this show before; we've seen how this stuff evolves. Because this has been such a developer- and operations-centric movement, the DBAs have not had a seat at the table. Developers love to develop; I was one, right? Developers don't like to do maintenance, they don't like to babysit, and they lose interest once something gets old. Somebody is going to have to step into that equation and sit over the top of this, and guess what that's called: administering a database. These things are still databases. They're not relational databases, but they're still databases. So I believe there's a real role for somebody who understands the data, understands the value of the data, and understands how the app team works with the operations team, and I think the people with the most institutional knowledge are the DBAs. If they're willing to step out of their comfort zone and start educating themselves on this, I think they have a very strategic role to play. If nothing else: how do you decide whether you need a relational system or a NoSQL one? Wouldn't it be nice to go to your 15-year DBA and say, you're a senior fellow, you've been around here a long time, can you help us understand the difference?
I think there's a powerful, powerful place for the DBA.

Yes, DBAs obviously have a lot of influence in a relatively narrow sphere, and you're talking about widening that into a more strategic role. So, good time to be a DBA: you heard it from Billy Bosworth on theCUBE. All right, it's break time. Billy, thanks very much for coming on. It was great to have you.

Thanks, guys. Good to meet you.

Great, appreciate you. DataStax is growing, Cassandra is very popular, 140 customers, congratulations. I tweeted that out since you said it.

Thank you.

[crosstalk] We didn't really get into the customers, but you've got a great list. Netflix I know is a customer, and there's a zillion more. You want to share your revenues while we're at it?

I will hold that one for a later show.

I'll wait for the IPO. How about headcount? What's the headcount?

I'll wait for the IPO... we're about 45.

Okay, we'll crunch the expense numbers and back into the revenue number. Jeff Kelly will have that in his next report. Thanks for coming on theCUBE, Billy.
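Bosworth contrasts Cassandra's fully distributed design with master-slave architectures and ties it to continuous availability, no single points of failure, and replication across data centers. The toy sketch below illustrates the general idea behind such peer-to-peer designs: a token ring in which every node is an equal peer and copies of each row land on several nodes in each data center, loosely in the spirit of Cassandra's per-data-center replication settings. It is a simplified illustration under stated assumptions, not Cassandra's or DataStax Enterprise's actual code; the MD5-based `token` function, the `Ring` class with its `replicas` and `readable` methods, and the node and data-center names are all hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key onto the ring (MD5 is an assumption made for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: every node is a peer; there is no master node."""

    def __init__(self, nodes):
        # nodes: iterable of (node_name, data_center) pairs; names are hypothetical.
        self.entries = sorted((token(name), name, dc) for name, dc in nodes)
        self.tokens = [t for t, _, _ in self.entries]

    def replicas(self, key, per_dc):
        """Pick replica nodes for `key` by walking clockwise from its token.

        `per_dc` maps a data-center name to the number of copies it should hold.
        If a data center has fewer nodes than requested, it gets all of them.
        """
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.entries)
        remaining = dict(per_dc)
        chosen = []
        for step in range(len(self.entries)):
            _, name, dc = self.entries[(start + step) % len(self.entries)]
            if remaining.get(dc, 0) > 0:
                chosen.append(name)
                remaining[dc] -= 1
        return chosen

    def readable(self, key, per_dc, down):
        """True if at least one replica of `key` is not in the set `down`."""
        return any(node not in down for node in self.replicas(key, per_dc))


if __name__ == "__main__":
    names = ["us-1", "us-2", "us-3", "eu-1", "eu-2"]
    ring = Ring([(n, n.split("-")[0]) for n in names])
    copies = {"us": 2, "eu": 1}
    print(ring.replicas("user:42", copies))
    # Take each node down in turn and check the row can still be read.
    print(all(ring.readable("user:42", copies, {n}) for n in names))
```

In this toy model, no node is special, so losing any single node still leaves other copies of each row reachable, and one copy lives in a second data center; a real system adds much more (consistency levels, failure detection, repair) that the sketch deliberately omits.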