In Santa Clara, my name is Jeff Kelly. I'm with wikibon.org. I'm joined by my co-host for this segment, Jeff Frick. Welcome. So we're here of course at Percona Live. We're talking about MySQL, the open source community that supports it, and a lot of the vendors that play in this space, helping kind of shore up MySQL for the enterprise. So we're joined in this segment by Robert Hodges, the CEO of Continuant. Robert, welcome to theCUBE, your first time here and we're glad to have you. Thank you, Jeff. It's a pleasure and hi everybody out there. So as it's your first time here, why don't you introduce your company and yourself to theCUBE audience. Tell us a little bit about the company and what you do. Sure. Continuant does clustering and replication for open source databases. And I think a simple way of explaining that is we really make open source databases work for big businesses. An example, a very large customer of ours is Marketo. They run a huge data-driven business. It's all done on MySQL. And what we do is enable them to cluster the MySQL databases. If they have to do maintenance, if something fails, we ensure that another database is there to take its place. And it's been one of the factors that has allowed them to grow into a multi-billion dollar company. So that's the kind of problem that we help people solve. Right, so yeah, I'm looking at your site here and you've got quite an array of customers, some impressive names on here, Financial Times, Groupon and others. So who's your core customer? Is it the more traditional enterprise or is it more of the web companies or kind of a mixture? No, it's a little bit of both. So I think Zappos is a good example of a more modern customer where they've been web-facing right from the start. We help them move point-of-sale data from MySQL into Oracle. Marketo, market management, very much in the forefront of marketing automation.
And then we have companies like Adobe who've been in the business for a while, and we help them also run their operations using our cluster. So let's talk a little bit about MySQL. Of course, it's been around for a while. You know, there's a lot of talk in the database world around NoSQL databases, but clearly MySQL is still a vibrant community. There are companies like Facebook and LinkedIn and others that are big MySQL users. Talk a little bit about where MySQL fits into the current database paradigm. Sure, I think that MySQL is still, particularly for web-facing applications, the premier online transaction processing system. There's been tremendous change in the industry, obviously, but for people who just need to process transactions very quickly, not lose them, and be able to draw on a large set of tools as well as expertise, MySQL is still the top dog. We're also looking at the emergence of databases like Mongo and Cassandra, which try to cover some of the ground that MySQL does, perhaps giving a little bit more ease of use, giving people more flexibility about schemas, but MySQL is definitely still in the running and very, very vibrant at this point. So Robert, all the buzz has been about NoSQL databases, with Mongo and a bunch of the other competitors out there with web scale, and then of course Hadoop is getting a ton of buzz in the news, in no small part because of the Cloudera announcement of the huge investment from Intel and others. You had a keynote this morning; unfortunately we weren't able to stream it. We didn't have a connection. But tell us a little bit about this Hadoop buzz and spin, and where does MySQL fit in this new world? Absolutely, and I just want to say on the NoSQL point, it's not so much that MySQL is going down but that other databases are coming up, as there's a wider variety and more data being processed in enterprise environments. So MySQL is still continuing to grow.
The Hadoop development is really interesting because what Hadoop has done is really change the way people are doing analytics, and it does it in a couple of ways. One is it can handle any kind of data. So people are using Hadoop as a way to aggregate interesting types of data, everything from machine sensor information to web logs to business transactions, in one place where they can compare them, make associations and get insights that enable them to drive their business. The other thing about Hadoop is it allows people to process enormous amounts of data. It takes advantage of decreasing costs of storage as well as processing. It's architected in a way that really allows people to take advantage of that. So what this has done is, for the first time, people running MySQL who have been accumulating this valuable business data now have to think seriously about getting it into Hadoop so people can run their reports. And in fact, we expected this to happen a couple of years ago, but really by the end of 2013 we were starting to see businesses, including our customers, coming to us and saying, hey look, we've got dozens to hundreds of MySQL servers, we need to get the data, the transactions that are happening there. We need to get them moved into Hadoop and we need to do that without changing our applications or putting load on those systems. And so that's really a trend that's going on right now and one that we're helping with through the software that we develop. And that's a new opportunity for you guys? Absolutely, I think that it's kind of a natural evolution in many ways. We do clustering, so we have to make replicas. To make replicas you need replication. So we developed replication that is completely open source but is very similar to the enterprise replication products like Sybase Rep Server, I used to work there, and like Oracle GoldenGate.
And so we've adapted that to allow people to replicate efficiently from MySQL to other data sources. And now Hadoop is turning out to be a really big opportunity there. So I wonder if you could expand on that a little. When we were talking before we went on, you mentioned how a lot of MySQL deployments were kind of siloed. Not unlike how Oracle database deployments are often siloed, maybe tied to one application but not really integrated into the rest of the architecture or other data sources and other applications. Maybe you could talk a little bit about why that is, how that's maybe changing, and your role in trying to help that change along. Yeah, absolutely. Well, I think the fact is that databases have always been kind of siloed because it is expensive to integrate, to do projects to integrate between, for example, an OLTP database and a data warehouse. So Hadoop has changed, shall we say, the balance, because of the fact that it can deal with all kinds of data. So there's a sense first and foremost that MySQL is missing the party. This is often very, very valuable transaction data: sales data, campaign management data, core transactions for the company that people now want to combine with the unstructured data that they're already processing in Hadoop. So there's definitely that; it's driven by the users having these requirements. I think the other thing that Hadoop has done is it's really changed the equation for developing a data warehouse. It used to be that data warehouses, for example Teradata, Vertica, these are great technologies, but they're complex, require a lot of specialist work, and require a tremendous amount of development before you can begin loading data. With Hadoop, it's simple to load data, it's just files.
And so you can often get value from the data very, very quickly once it's in Hadoop because you build just enough structure to answer the question that's currently on your mind. And I think that's another thing that's driving people to now pull in MySQL data and let it participate in this analysis. So what are you doing specifically around this? I imagine it relates to your replication capabilities and getting that data from MySQL, as it's created, as quickly as you can into your analytic environment, so it's available for that kind of analysis. Right, that's exactly what we're doing. And just to be clear, people have been loading data from relational databases into Hadoop for quite some time, years now. But for the most part in the MySQL world, it was done using snapshots. So take a picture of the database, just dump it out into a file and then load that file into Hadoop, maybe splitting it up, changing the data types so it's easier for Hadoop to consume. There are really two problems with that. One is it just doesn't scale very well. So we have customers that run seven, eight terabytes of data in a single MySQL server. The time to do a snapshot of that and load it into Hadoop might be days to a week. By the time you've done that, well, you wish you'd had four more copies. So there's a tremendous latency problem. The second problem is it doesn't capture all the information you want. When you're doing analytics, you actually don't want a static picture of the data, you want to know what happened. So you want to see the list of sales that you made in the order that they occurred. And so what we do with the replication technology is we extend the existing snapshotting tools, which are great for getting a first copy in. And we allow people to then add to them the transactions as they occur. And we can pull them out, we're very fast.
We can pull them out, basically have them out of MySQL in a second or less. And then the users can decide how quickly they want them to be loaded into Hadoop. Do they want to wait until they have a million of them and push them up a million at a time? Do they want to wait 30 minutes? All that stuff is configurable and they get to make the choices. And they make that choice based on business requirements. And the trade-offs would be with performance or? Exactly, there's a trade-off. If you load it too fast, it doesn't use Hadoop resources efficiently. Hadoop has a limit on the number of files, for example, that it can really handle. So you need to keep the files more or less large. At the same time, your analysts have some maximum latency that they'd like to tolerate. So for example, one of our customers is doing a data warehouse where they're loading point-of-sale data that they have in MySQL into the data warehouse. They'd actually like to see results in 10 minutes if they can. So that's kind of at the near real-time end. You have other people for whom 24 hours is fine. They just want to see the data. Well, it's interesting around the whole real-time question. It means different things to different people and it's not always needed or always the best choice because it can have performance implications and other implications for your infrastructure. I wonder if you could talk a little bit about the company and your business model specifically, because what interests me is in an open-source world, and I cover the Hadoop market closely as well, there are different business models around open-source. Do you provide just services? Do you bring in some of your own IP in proprietary software technology? What's your approach in terms of business model in this open-source environment? Well, our approach is, first of all, to be very, very careful about what we let go. Always a good first step.
Into open-source, and I think that a lot of people, very well-known companies, have gone broke putting too much stuff into open-source or not doing it in a planned way. So our approach, which we settled on over a period of years, is that the clustering is actually closed-source and licensed. We use a subscription model for that, but then we realized very early on that in order to be viable in the MySQL market we needed to make a contribution that would get people, first of all, feeling good about it, and second of all, looking at our technology, and we decided a few years ago that the piece we wanted to let loose in that way was replication. So the replication technology, everything we do in that area is 100% open-source. It's licensed under the same model as MySQL, GPLv2. And so that was kind of the hook that would get people interested in the company and really looking closely at our clustering. Now it turns out that replication, now that we have things like moving data to Hadoop, moving data to Vertica, moving data to and from Oracle, is actually valuable in itself, and there we do a model like Red Hat where we basically sell a yearly subscription. People pay us X amount per server that they're using with replication and then we provide them full support and bug fixes. And it's been very successful. I think unlike MySQL, which is kind of a commoditized database and at some level very simple to operate, replication is kind of complex, and being able to move that data is often business critical. So people basically want our support to make sure that it stays up. So Robert, that's interesting because you introduced the company by talking about replication and clustering, and then as a business model, you basically split it down the middle. One is open-source, one is not. Correct. And that's been successful, and how did you choose which one to go with? Well, by trying all the other ways of doing it first.
So we started out really with kind of a crippleware model; you could argue we didn't know better. It's sometimes hard to find the right balance, and I think it was something we evolved to because we wanted to release something to open-source that was complete, so that people could build complex, interesting applications with it and also get a feeling for the power of our technology. And so as a result, replication just felt like a good thing to do in this way. The other thing is that if you look at the enterprise replication products, again like Oracle GoldenGate, Sybase Rep Server, Quest, nobody's doing that as a fully open-source product. So we thought that was an interesting contribution to the market, to put that out there and give people the opportunity to try it as an open-source model. And within that piece, how much of the contribution is coming from outside the company? How has the community really adopted it and gotten behind it? Great question. The replication work is actually almost exclusively from Continuant. We do see people who, often on their own, will fork the code and write new features, but it turns out that replication is equivalent to the internals of a database, sort of the core database engine. It's a very complex piece of software and so most people can't just walk in and make contributions. We've had a few people that have, but for the most part the product development has been driven by specialists working for Continuant and doing it full-time. What we do get from the community, which is equally valuable, is direction. People let us know what's interesting. They try it out, they give us feedback, and then they're also able to, at the fringes, contribute things like documentation and scripts, things at the edge that allow the replicator to become more capable. That's great.
Well Robert, we've got time for just one more question. So I would like to give you an opportunity to tell us a little bit about what lies ahead for Continuant, what's on your roadmap to the extent you can share with us, of course. Oh, absolutely. Well, I think that Hadoop is really just the beginning of the wave where people are beginning to move their structured data in. So our first priority for the early part of this year is really to nail that problem, to get customers successfully deployed, build out the features, do the performance optimization that's necessary for that. We have really two directions beyond that. One is to continue to develop our core clustering capability, database services, which is actually in many ways a more valuable business for more people. So we definitely want to preserve those capabilities and make them better and compete in that market, and then, of course, to extend the replication out to more data sources. We work with MySQL, work with Oracle. We'd like to make the other databases better and have more things to offer to our users. Absolutely. Well, I mean, it's a really interesting market right now with both the open source technology and the new types of databases, with NoSQL coming online, and it's just becoming, in a lot of ways, more complex for enterprises trying to understand how to architect their systems and when to go with something like MySQL and when to go with something else. And there's a lot of need for that kind of technology to actually help integrate those, which is what you help do. So Robert Hodges, CEO of Continuant. Thanks for coming on theCUBE. My pleasure. Hope your first time wasn't too painful. Yeah. Thank you very much, Jeff. Thanks for coming. Thank you, Jeff. You're welcome. It's been a pleasure. Stay tuned, we'll be right back from Percona Live here in Santa Clara. I'm Jeff Kelly with wikibon.org and you're watching theCUBE.