Okay, we're back here live in Silicon Valley, in Santa Clara, California. I'm John Furrier, the founder of SiliconANGLE.com. This is theCUBE, our flagship telecast. We go out to the events and extract the signal from the noise. We are here at the Cassandra 2012 conference with the CEO, Billy Bosworth from DataStax, and Jeff Kelly, chief analyst over big data for Wikibon. Billy, welcome back to theCUBE. You're a CUBE alum, you've been on many times.
Yes.
Thanks for coming back.
Thank you very much. It's a pleasure to be here.
This is your event. Like Hortonworks, who put on Hadoop Summit, you guys are shepherding and being the steward for Cassandra Summit 12. You kicked off the keynote to a packed room. Share with the folks out there what the scene was like and how you addressed the audience. And we'll talk about some of the things you're working on.
Yeah, you know, it's really humbling to be in front of an audience like that, because these are some of the top engineers I've ever been around. And when you have 800 plus in the same room, when you know there's a big IQ spread between you and your audience, you've got to be a little careful with that. But watching this community grow, and seeing how they're doing this stuff, and then watching that room be standing room only, wrapped around the back, I felt a little bad we didn't have more chairs for everybody. But one of the things that I looked for in my last job at Quest Software, when we were trying to identify a growing market, was: how were the events doing? How were the shows doing? And so when you see this kind of growth, it's just a humbling, exciting experience.
You know, you got too small of a hotel, right? You need to double down.
Looking like it, yeah.
I mean, we've seen that growth, and your community's vibrant. So take us through the status of Cassandra. I mean, there's a lot of thought out there around all the different frameworks and databases.
And in particular, Cassandra, you know, stacks up against HBase, Mongo, Couch, and, as Jeff Kelly's colleague and Wikibon co-founder says, it's horses for courses: you have different tracks for different solutions. So take us through where Cassandra's at right now. Obviously it's a vibrant community. It's pumping on all cylinders. Is that just the market, or what's going on with it? Why is it happening?
It's the market, and the reason I can say that is that at DataStax, where we leverage Cassandra to build a platform called DataStax Enterprise, we're seeing equal excitement and growth in real customers doing this stuff in real ways. And the market that we're looking at right now, where I would say it's the most common fit, is if I talk to somebody and their first instinct is to say, I have an application that can never be down. I don't mean highly available. I mean this thing can never be down, and it has to scale massively. Then we're in the right conversation. And so the type of apps that we find ourselves working with today are massively scalable NoSQL apps where there is a huge dependency on availability. And then often that will also span multiple data centers. And so that enterprise-grade, mission-critical type of application is where we find ourselves playing a lot.
Well, that also kind of talks about another trend, DevOps. DevOps has been one of those trends that's been developer focused, yet operational. It's been kind of an oxymoron. We've had folks on theCUBE over the past couple of years saying, no, no, no, it's OpsDev, because the ops guy's like, no, no, ops can't go down. You can have a bug in a software package and it can crash; ops can't crash, zero downtime. So is that kind of the same thing? Are you seeing uptake in the DevOps crowd as well?
I think DevOps is an interesting thing for two reasons.
I think that on the first axis, it's the reasons you would run to it from a business-advantage perspective. I was a developer for 10 years in my career, and I know what that separation of the roles can do to slow down a project and make things worse in terms of the feedback loop. So I think on the one hand there is a desire to say, let's move more in this direction. On the other hand, a lot of it was just forced. When you had these new technologies that were being developed so quickly and being put into place in the enterprises, you had no choice. The ops teams didn't know how to do it, and so the developers did it by default. But developers are really good at building stuff. They're not so good at maintaining stuff over the long run. That's not what excites them. And so I think as the DevOps role matures, the good pieces of it are the tight feedback loop and the cooperation in getting a project to market very quickly. And I think, though, that the demand side is going to grow more. And we talked about this last time I was on: I actually think there's a role there for the DBA, the traditional DBA, as the DBA role grows. There's a big opportunity here for them.
Well, we say the DBA has evolved into a cluster manager, basically, and you have the DBA role changing. This came up in the Hadoop community as well; we're seeing similar things, where cluster management is more of a sysadmin meets database admin. You see that too?
Absolutely, because if you think about, again, our profile at DataStax, we tend to deal with a lot of pretty big mission-critical customers. You can see that in some of the folks who are talking today, with Disney and eBay and Netflix, and it's not uncommon to have that caliber. And when you think about that kind of company, who often has the most institutional knowledge about those systems? It's usually the DBA.
And so if the DBA can go ahead and bridge that technology gap and learn to be more of the data manager, as opposed to purely the relational database manager, then there's a lot of room for them to take their career.
You know, I'm really excited, because over the past two years with theCUBE we've gone out to the field to extract the signal from the noise, as we say. And we've had the chance to talk to startups and entrepreneurs like yourselves, and even pre-funded startups, all the way up to the SAPs of the world. So you have upstarts emerging, and then the big, big companies running large-scale businesses. Mark Hawkins and I were talking about how the Zune marketplace alone is 4% of Microsoft's revenue, yet no one really knows that. And Spotify would love to have that. Yet they killed the Zune service, and meanwhile most of it's Xbox, but you all know the story. So my question to you is: as running a business requires massive scale, are you seeing a different mindset in the developer community, where Cassandra might be a better fit than, say, Mongo? Mongo has been heralded as an easier-to-use, general-purpose, LAMP stack kind of framework or database, whereas Cassandra's gotten props on scale but might have been difficult to use. I know you're trying to address that at DataStax, but is that a legitimate criticism or not? And if it is, how might you be addressing that?
Yeah, well, I think that whenever you look at statistics like that, you're trying to figure out what produced what. In other words, was Cassandra difficult, and so therefore it attracted a certain kind of developer, or was a certain kind of developer attracted to the technology because of the use cases?
And quite frankly, I'm not sure. But what I can tell you is that if you poll our audience, and you walk around and talk to our audience, it is definitely very deep, high-level engineers who are working on very serious mission-critical problems. That's not the kind of engineer I was. I was a corporate developer. You have to go back a little bit, and guys like you and I are old enough to remember this, but if you remember, in the 90s, how the evolution of SQL Server happened in the same marketplace as Oracle, right? There was definitely a place for both, and both were very, very successful, and both have gone to billions of dollars of market. But you would tend to pick one over another for a given project, for a given reason. So many times I would choose SQL Server for a project in a departmental app. Other times, if I was writing something different that required bigger scale, I would choose Oracle. I think that same thing is happening here. So for a certain type of app, massively scalable, NoSQL, always available, multi-data center, I think Cassandra gets the nod in many cases, but that doesn't mean there's not room for the others.
I want to give Jeff Kelly a chance here, but Jeff, before you jump in, I want to ask one final question before we get to you. So Adrian talked about two constituencies out there: cloud guys and data center guys. He called them server huggers, and I love that term, because server people aren't green, but yet they're stuck to their old ways. And then Jonathan Ellis mentioned earlier today that you guys, that Cassandra, is built well for multiple data centers. So I want to ask you in particular: we've been covering the converged infrastructure space, where there's two things going on that we're tracking. Big data, which is the movement we're living with NoSQL, where you want to globalize that, and then converged infrastructure.
So I want to ask you, based on your experience, because you've lived on both sides, the emerging side and the legacy side, in your previous jobs: what has the impact of solid state been on the marketplace, both on the data center side, on premise, and in the cloud, and how does that fit into your Cassandra ecosystem?
Well, I think that at a high level, just talking about what solid state does, clearly anytime you can get something that's faster, that's going to be desirable. But then cost comes into the equation, complexity comes into the equation, and in this world there's not a lot more complexity, so cost has always been the big factor. Now that the cost is being driven down, like all hardware, it presents some great opportunities. And for Cassandra, some of the stuff that Jonathan talked about this morning, where we have optimized Cassandra to take advantage of SSDs, gives the architects and the IT departments a lot of flexibility in what they want to do. So suppose that you're not ready yet, or you haven't made the full jump to solid state, but you'd like to run some of your workload on that. Now you can do that with Cassandra; you can partition out your workload. So if it's now available in the cloud, which it is, then great, perhaps you want to run some of your cloud workload on it. What we're really about is taking advantage of that technology wherever it is. If it's on-premise, fantastic; if it's in the cloud, that's okay too. And in fact we see more and more of our customers doing hybrids, where they have some of it on-premise and some of it in the cloud, but it's the same database and it spans both data centers. So that gives us a big advantage there. But I love the work that Jonathan and team have done on the solid state optimization in Cassandra.
He said SSD was the closest thing to a silver bullet you could find.
It really is, from a performance standpoint.
I do, because, well, just think about it in our personal lives: if you bought a laptop recently that has a solid state drive and you watch it boot, you're obviously a lot happier.
You're happier with it fast, absolutely. Well, Scott Dietzen, another veteran CUBE alum, who's CEO of Pure Storage: their whole business is based upon eliminating spinning disk.
Yeah, yeah. Well, I think it's a natural evolution. Now, the time frame on that: these things always seem to have a longer tail than we think, for whatever reason. So the good news is, for a Cassandra user, we're going to give you not only the optimization of that technology but the flexibility to move your workloads accordingly. So we think we're in a pretty good spot with it.
I agree. Billy, I wanted to talk to you a little bit about your partner strategy, specifically around business intelligence and analytics, because I noticed some news in the last couple of days, some partnerships between DataStax and Actuate and Jaspersoft, and I think Pentaho was also a partner. So talk to our audience about why you made those partnerships and what role they play. We know that Cassandra is a great database for supporting real-time big data apps, but it's not necessarily known for the analytic side of it. Talk a little bit about the interplay there, and how closely are you really partnering with these BI players?
Well, BI is a subject that's been near and dear to my heart for a long time, even in my last role at Quest Software, and that's where I met some of these players, actually. And even as a consumer of data, in my roles as GM and VP, it was important that I had a good understanding of how that stuff worked. The important thing as a company is you have to know who you are and you have to know what you're good at. If you try to be everything to everybody, there's a really good chance you're going to fail.
We are an infrastructure enterprise database, and that means for us to take on the whole stack, the whole solution, we're going to have to partner to make that a reality. Well, we want to partner with people who know what they're doing, who are good at this stuff. We don't want to reinvent the wheel. And so when you mention companies like Actuate and Jaspersoft and Pentaho, the kind of players that have been doing this for a very long time, they know what they're doing, they get big data, they get BI. So we take those partnerships very seriously. We're going to be doing much more aggressive partnering as we move forward, with them and with others, to make it significant and deep, because for us it's about being able to walk into a customer and say, yes, we can help you on your infrastructure side, which is obviously very important, but we also want to help you on your business side, and that's where the BI piece comes in. So these are much more than logo partnerships. These are things that we take very seriously. We want to help enable our partners to do more and more.
So the idea is essentially allowing better analytics and reporting against Cassandra, so you can understand how your application is running and how your Cassandra cluster is running, and kind of correlate that with business metrics.
Yeah, because analytics is a bit of a loaded term, as is BI. It can mean many different things to many different people. When you say analytics to an application developer, they're going to think, okay, maybe that's some time-sliced data in real time with Cassandra, or maybe it's a MapReduce job in Hadoop. But when you say analytics to me, I have a whole different vision of what that means. I want my dashboard, I want my spreadsheet, I want my data in answers that I can understand as a normal human being. So we need to be able to satisfy that need across that whole spectrum. We can't do that alone; we have to have the partnerships to help us with the business side.
We'll handle the nuts and bolts underneath, but the spit and polish, the translation of that data into real, business-actionable items, that's where we need help from our partners.
So you mentioned you can't be all things to all people, but of course the DataStax Enterprise platform does have those three components: Cassandra, Hadoop and Solr. Talk to me a little bit about how they interoperate, and what are the key use cases, from a Hadoop and Solr perspective, for supporting your core Cassandra customer, the DataStax customer?
Right, and to the point earlier about making sure you know who you are: again, enterprise infrastructure. So from an infrastructure standpoint, the goal of DataStax Enterprise at the end of the day is to allow somebody to focus on the big data application and not worry about the complexities of that infrastructure. We want to take that complexity away. We believe Cassandra is extremely well suited to handle those real-time, massively scalable, always-available loads. Once that data is in the system from that real-time application, people then naturally want to do other stuff with it. How many apps today do you use in your personal life where you don't search? Probably almost none. Search is critical, so we want to bring Solr to the data. Then perhaps you want to do batch analytics; we're going to bring Hadoop to the data. Our objective is, once that data is created, let's not add complexity by adding more systems and moving that data all around. Let's help those developers and architects by saying, once it's there, we're going to let you do your batch analytics, we're going to let you do your enterprise search, right on top of the same data. And here's the key: we will do it with complete workload isolation, to make sure that those workloads don't bump into each other. How can we do that?
That has to do with the ability of Cassandra at its core to be multi-data center. So we do logical data centers under the covers, and that's what we do as a company, so that's what our objective is.
So let's talk about that workload isolation a little bit, and the significance there, because that's essentially where the Cassandra workload will take precedence. For instance, if a Hadoop batch analytics job is going to run at the same time, it's going to cause some performance issues with the Cassandra cluster supporting a real-time application, and you can make sure that doesn't happen, essentially.
Right. So as long as I can remember, one of the biggest challenges we've always had with databases is: how do I do my decision support systems, as we used to call them in the old days, data warehouses, right? How do I do my data warehouse load, and how do I do my, it used to be called OLTP, my online transaction processing? I mean, with all these terms. Remember?
Yeah, don't scroll down.
Right. How do we do all these together? And the answer was, we didn't, and that's why you had the rise of the data warehouses and then you had the rise of the OLTP. Well, in our world, with that ability to carve out those data centers underneath, we can then say: your Hadoop workload on these particular nodes is always going to operate with those resources. Your Cassandra nodes are always going to operate on these resources. The key is we manage all the data movement underneath, so that the replication is happening for you and you never have to think about it, but from a compute resource perspective those remain segregated so that they will not bump into each other. And where it gets cool is, maybe you want to take your Hadoop load and put that in the cloud. No problem.
You can take that same DataStax Enterprise cluster, keep some of it on premise, move some of it into the cloud, move some of it into a different data center, so you get a lot of flexibility. That's been a problem for a long time, and we think we have a good solution now on how to handle that.
That gets to some of the business model, because I think that's a relevant solution; it's what you've been saying and what I've been saying on SiliconANGLE: hybrid's the sweet spot right now. Public is certainly not going away, but it's not the land rush everyone thought it was going to be in terms of moving everything from on-prem to off-prem. But it's happening, right? It's just happening in a different way. Shadow IT, we had a long conversation about that.
Oh yeah.
So the question is... I used to be part of shadow IT. Shadow IT creates competition, and also we had Eddie, one of your new MVPs, from Splunk, on earlier talking about the creativity involved now within the developer process, saying, hey, we'll eliminate an entire SAN, save that money and move it into a new architecture. So there's some coolness going on, obviously, no problem. The question really comes down to: what's the business model for you guys at DataStax? Obviously Cloudera is making a business out of Hadoop, now Hortonworks; you guys are on Cassandra. Any updates there? What's the latest with DataStax?
Yeah, with the release of DataStax Enterprise 2.0, what we're seeing is more and more of the Cassandra use case that starts with the real-time load for that always-available, massively scalable app. And the reason that we developed DataStax Enterprise in the first place was because that's what our customers asked us to do. They said, hey, I need to do analytics now, I need to do search now, I don't want to have multiple teams, I don't want to have multiple systems to do this; can you help me find a way to bring this into the same system? Again, because of the flexibility of Cassandra, we can do that in some pretty meaningful ways, and we don't just offer superficial packaging. There's actually quite deep integration. Not to go too technical here, because this is about as low as I can get, but HDFS is not part of DataStax Enterprise; it's the Cassandra File System that we bring Apache Hadoop layered on top of, and it's Apache Solr layered on top of. And so for our customers it's: hey, DataStax, here's what I need. I need a platform. I've chosen Cassandra because I found the use case and it matches Cassandra perfectly, but now I need to do more. I need to search, I need to do my batch analytics. And we're going to do more in the roadmap that we're not talking about publicly yet. So our business model is one where, when you download that software, you're going to look at it for use cases very similar to...
Do you charge for the Hadoop connectors?
Well, it all comes in the same offering. So when you buy DataStax Enterprise you get it all, and then you can deploy those nodes as you wish. So when you buy 50 nodes, you can decide how many of those you want to be Hadoop, how many you want to be Solr.
So overall, a node is a node?
Absolutely, absolutely. And those are the same type of workloads that we've seen the last 10 years. These types of apps are the same type that Oracle would have claimed in the past 10 years.
So yeah, in the Hadoop community we're hearing a lot about YARN and next-generation MapReduce, and I
think that they're trying to address some of the issues that DataStax is addressing, kind of. You're addressing it by bringing together the three different components, whereas the YARN project is looking to build some of those capabilities beyond MapReduce into the Hadoop system. But certainly, you know, it's still alpha quality.
You know, I always laugh at the word lock-in, because everyone loves lock-in if you're on the business side, because lock-in means you extract more rents from the system. Look at Oracle, great example. Oracle is the classic, extracting rents like a telco; I've said that in the past. But eventually they have to figure out how to cannibalize their own. But really, there are other business models that can create kind of a quasi lock-in, and that's interoperability. We've seen this, you know, we talk about our histories: in the 80s and 90s you saw client-server go through that same kind of evolution, where there was some quasi lock-in, some proprietariness, but you kind of opted in for the proprietary, and interoperability was the key. Is that what you're saying right now, that the key to DataStax Enterprise is that you can interop with Hadoop and not have to make it mutually exclusive?
Yeah, it's two things. First, again, you start with that first use case, where you've identified an app that has that always-available need, that massively scalable need. Then you just hit it spot on: once you do that, why would you want to fight bringing in other systems? Once the data is already there, you're going to want to do other things on top of that data. What we want to do is take that complexity out of your mind. As a customer, we want you to feel like you can just focus on your app and let us handle how to bring all that together underneath. It's exactly what we do.
What's your plan for CFS? How are you guys going to be pushing that out there?
What's your big plan for making that the file system of choice? Because obviously you want to originate the data within CFS; that's a cool thing and it will help your business model. What's your core message there?
Well, the developers here at the conference, when you want to talk about CFS and HDFS, they're the right people to talk to. They're about five times smarter than me and they have all the details on that. The reality is, in our customer base, as we move more across the chasm, if you will, into the mainstream customer, frankly, they don't care all that much whether it's CFS or HDFS or XYZFS. What they want is something where they don't have to fight it every day, where they don't have to fight the complexities of the infrastructure. They need it to scale, they need it to never be down, and they need it to handle mixed workloads. So frankly, if we're sitting here a year from now and I'm telling you about how CFS conquered the world, we didn't succeed.
I've always said a rising tide floats all boats, and it's not one winner. I mean, it's different use cases, it's different problems to be solved, and interoperability is the key. And Jeff, I'd like to get your take on this, because you're covering all the horses on the track in big data, and somewhat in converged infrastructure, though I know that's David Floyer's area. What's your take on that?
Well, in terms of interoperability, it's critical when it comes to big data, because the whole point of big data is to bring together multiple data sources from different systems, from different applications. I mean, I've said in the past that you're really not doing big data analytics, in my opinion, if you're not matching up data from disparate sources. So it's key that they can all play together. As a vendor in this space, you've got to balance playing nicely with the rest of the ecosystem against building your own business, and that's always a balance. We've seen different business models in the Hadoop space, and now in the other big data markets we're seeing different balances between selling the software versus focusing on services and keeping the software open and free for the most part. So it's always a balancing act. I think we might see some changes in that as we move forward and some winners start to really emerge. Once they've got a good hold on the market, winners can change their business model a little bit and start to ramp things up in terms of revenue. But it's critical, really. You've got to be able to interoperate in terms of the types of data and the types of systems that can talk to one another and interact, because again, if you're not doing that, you're really not doing big data.
Great, and I can tell you guys, just by way of business model, since you asked earlier: services is not something that we're really interested in. We do services only during the presale part of the process, to make sure that the thing can get implemented well. After that, we are more than happy to work with our partner network, because the more services we do, the more it detracts from building that platform. That's where we want to bring the value: taking the complexity away from that platform.
Well, you guys have a great culture; the community here is very vibrant, and I think, you know, the
ecosystem at these early stages of the market growth, I mean, you know, it's crazy now, but I think it's even going to get crazier as the market continues to grow. Having that openness and enabling people to make money is a real key ecosystem philosophy. I mean, you look at all the successful ecosystems: you want to have a position where you put a stake in the ground and you put a fence around it and make some money, but at the end of the day, if you can enable other people to make money with you and do good things, it's always going to be good. My final question for you, Billy: you've been a great supporter of theCUBE, and we love having you on. You're a veteran in the business, you have the experience in the market and the industry, and you're now in a very emerging, cutting-edge, relevant segment, NoSQL, where relational databases are coming along with it. It's not that one philosophy beats the other one; it's growth. What's your vision for the next two years or so around the growth of the market, as the CEO of DataStax and also as someone in the industry? What do you see as the key things going on, the major trends that are emerging, major tech trends, major business trends? What are you watching, what are you afraid of, what are you excited about? Share with the folks your two-year view.
In two years, what I would love to see is a world where we've gotten beyond the hype of big data. I tell you, this sounds weird, but I have grown to hate the term big data in so many ways, because it's essentially become meaningless to a lot of people. So my vision in two years of a great world is one where everybody understands: this is the type of application I have, this is the type of infrastructure need I have; oh, they solve that really well; oh, they solve this one really well. Another thing I would love to see in two years is a time when our customers aren't coming to us and begging for resources to hire. They are having a tough time finding people of the caliber to go out and build these big
data applications. Unfortunately there's no quick fix to that, and that's one of the areas where I believe that rising tide does lift all boats, because every one of us in this new emerging big data space has to do a good job of educating the world. What I get frustrated with is when we educate the world inaccurately. That really agitates me. And so I hope we can get to a world where we're very clear on our use cases, we're very clear on who we are. And in our world, I can tell you, if that trend continues, you're going to see a lot more household names running mission-critical applications on DataStax Enterprise, while at the same time we do everything we can to make the Cassandra community one that continues to grow with really high quality. But those would be the two things: better education, meaning a better understanding of when to use the right tool for the right job, and then a bigger ecosystem.
As a follow-up, you mentioned crossing the chasm earlier. Hadoop's in the same kind of boat, where it's emerging from a specialism to general purpose. That chasm is being crossed kind of in flight right now. So as it lands on the other side of the chasm, making it easy to use is critical, and easy to develop on. Do you agree with that, and what are you guys doing now, and what are your plans for the year to do that?
No, first of all, I'm a big believer in and a fan of the Crossing the Chasm architecture and all the research that was done there. I think it's accurate; I've seen it happen multiple times. And that's why I think you need commercial companies to step in and do this. Everybody says, well, are you hijacking the community? And there will always be a portion of the community that gets a little frustrated over that, but the reality is, to take something into the mainstream you do need some help from commercial entities who can make things easier. I'll give just a small example: the Cassandra documentation was rough. That sounds like a small thing. It's not a small thing. I mean, that inhibits people from getting up to speed quickly, and it makes for a bad first experience. So we pay a lot of people to work on Cassandra, and we do that because we want to educate in the right way, so that you know the right use cases and you know how to get it into production on the correct path. So yeah, I think it's critical.
I've been on both sides of those arguments with developers: oh, the greed, the commercial company is going to take all the stuff out of the ecosystem. At the end of the day, though, what developers really care about is doing something meaningful and making money and getting distribution. So having stability is a good thing.
It's a huge thing. I can't think of anything worse than sinking a big part of my life and my energy and my soul into something that all of a sudden evaporates because it didn't get the uptake that it needs. But I think we're all working to do the same thing. I'm pretty good friends with Mike Olson over at Cloudera, and he's done a good job for a long time. I partnered with Mike back at Quest, and they've done some real exemplary work in educating the community on Hadoop. We need to do more of that; all of us need to do more of that.
I think, you know, Mike's been, I said, he let us hang in his office for a year and a half. You know Mike well; he's
been a good steward, and actually, you know, Cloudera is the gold standard for how to boot up a company in open source in a way that's credible. You know, they've always had to work on their marketing, but when you're number one in the market you don't really need to work on marketing. But a great company. You guys are doing some of the same work; you guys are kicking butt here at Cassandra. Thanks for your support, for allowing us to come, and congratulations on the conference. Billy Bosworth, the CEO of DataStax. DataStax is running Cassandra Summit. We're here live, and we'll be right back with our next guest after this short break.