Big Data SV 2014 is brought to you by headline sponsors WANdisco, we make Hadoop invincible, and Actian, accelerating Big Data 2.0. Hi everybody, this is theCUBE. This is Dave Vellante, and I'm here with Jeff Kelly. We're with Wikibon.org, and this is SiliconANGLE's live production of Big Data SV. Big Data SV is our production that we've been running for three days here, concurrent with the Strata Conference, which is across the street at the Santa Clara Convention Center. We're here live at the Hilton, and we have our event, theCUBE party, tonight, so if you're in Silicon Valley, friends of theCUBE, please stop by. We'd love to see you; we're in the Coastal Room at six o'clock. Brett Rudenstein is here, and he is a senior product guru at WANdisco. We're going to talk about HBase, Non-Stop for HBase. Brett, welcome back to theCUBE, it's good to see you again. I'm glad to be back, thank you. So we were just talking, we're both in Boston, and it feels like we're never in Boston, right? We're constantly on the road, but the last time we saw you, we both happened to be in town, and we got a great demo from you. You guys made some big announcements this week. We had David Richards on earlier, and Jagane as well, talking about you bringing your active-active technology to HBase and why that's important. We'll dig into that in a minute, but before we do, I want to get your take on the event across the street at Strata. It looks pretty packed. What's going on at your booth? Yeah, the booth has been very, very well trafficked. We've had a lot of interested parties, and what's interesting is that when we look back a year ago, when we announced our Non-Stop Hadoop, people looked at our technology and said, is that really possible? This year, it's pretty well understood that it's possible, and everyone's defining their use cases. 
So the people coming to our booth and looking at our technology have very real-world use cases and are very excited about what we're offering. So what kind of questions are they asking now? Last year, it sounds like, it was: I didn't think this was possible, what about the speed of light, all those Columbo questions we were asking you when we first heard about this. What's the discussion like now? The discussion really revolves around being able to maximize resource utilization in a DR cluster. Typically, people are buying an insurance policy. They might put $100,000 worth of equipment in a disaster recovery data center, and it's literally an insurance policy against a zero-availability event. If the primary site goes down, that equipment suddenly has value; otherwise, its only value is the insurance itself. What's different now is that everyone wants to take that idle resource and do something with it. So they're looking at using one data center for the primary jobs and running secondary jobs, jobs that otherwise wouldn't have the bandwidth to run in the primary data center, in that active-active manner that we talk about. So who are you having these discussions with when they come by your booth? Jeff was over there earlier, and we've heard there are a lot more suits than hoodies this year. What kinds of titles are you speaking to? Yeah, it's mixed. We're talking to engineers, we're talking to developers, and we're also talking to C-level execs, CIOs and CTOs of companies. It's very much people trying to understand how it fits in their environment and how it benefits them on a more holistic, global scale. All right, let's talk about HBase a little bit. 
HBase is kind of the default Hadoop database, but why don't you take us through it, give us a little HBase 101? Well, HBase is effectively what some people call columnar storage for big data applications; some people call it a key-value store. The fundamental principle behind it is to be able to store billions and billions of rows of data and, at the same time, have real-time or near-real-time access to that data. Okay, so it's a popular database, which we presume is why you picked it. But why did you pick HBase? Well, fundamentally, from a database perspective, these are what are often dubbed the NoSQL databases. HBase is often picked because of the level of scale it's able to achieve, and fundamentally, HBase is Hadoop's database. Because HBase stores its write-ahead logs and its HFiles, the persistent part of HBase, in HDFS, the first thing you need is a hardened HDFS, one that can withstand failure and gives you redundancy. So the first thing we did, when we made our announcement last year, was our Non-Stop Hadoop: active-active replication of the NameNode, and geographically aware, data-center-aware Hadoop. Once you have that solid underpinning, you can take on an application like HBase and give it those same characteristics. In fact, when people look at the NoSQL solutions that are available, they often have to make a trade-off: strong consistency with lower availability, versus eventual consistency with high availability. By taking our active-active replication technology and putting it onto HBase, we allow HBase to be not only strongly consistent but also continuously available. 
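The data model Brett sketches here, billions of sorted rows with near-real-time reads, can be illustrated with a toy in-memory model. This is a hypothetical sketch of the concept only, not the real HBase client API; every class and name below is invented for illustration.

```python
# Toy sketch of HBase's data model: a table is a sorted map from
# (row key, column family, qualifier) to a value, and keeping row
# keys sorted is what makes range scans cheap.
class ToyTable:
    def __init__(self):
        self.cells = {}  # (row, family, qualifier) -> value

    def put(self, row, family, qualifier, value):
        self.cells[(row, family, qualifier)] = value

    def get(self, row, family, qualifier):
        return self.cells.get((row, family, qualifier))

    def scan(self, start_row, stop_row):
        # Rows are kept in sorted order, so a scan is just a key range.
        for key in sorted(self.cells):
            if start_row <= key[0] < stop_row:
                yield key, self.cells[key]

table = ToyTable()
table.put("user#001", "info", "name", "alice")
table.put("user#002", "info", "name", "bob")
print(table.get("user#001", "info", "name"))                       # alice
print([row for (row, _, _), _ in table.scan("user#001", "user#003")])
```

The real system layers versioned cells, memstores, and HFiles on top of this idea, but the sorted key-value contract is the core of the "billions of rows with fast access" claim.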
So to take advantage of Non-Stop for HBase, are there prerequisites, and if so, what are they? Well, I wouldn't call it a prerequisite so much as a best practice: you'd want to have that reliable, continuously available data storage underneath. Meaning the Non-Stop NameNode, meaning HDFS with the Non-Stop NameNode, the Non-Stop Hadoop product. Okay, so you could do it without the Non-Stop NameNode, but then you'd make Hadoop the weak link. Correct. Okay, so your advice is to go with the Non-Stop NameNode and then apply active-active to HBase. Correct. Yeah, I think that's exactly the right approach. You want to make sure that the foundation of your house is in good order before you start building on top of it, and that's exactly what we've done. So talk a little bit about the implications for the enterprise, now that you can potentially run mission-critical applications. There are the cost savings you're talking about if you're running Hadoop instead of a costly proprietary database, but what are some of the implications for the enterprise now that you've got a rock-solid HBase? Well, I think it changes which applications can now participate in the use of HBase. You can use it for applications that are mission-critical: stock applications, streaming stock quotes, and things of that nature. Because if HBase becomes unavailable, you no longer have the business continuity that is required for a stock-style application. So this opens up all those possibilities for the continuous availability that these mission-critical applications require. 
Okay, let's take a look at the demo. Sure. If you take a look at my screen here for a moment, I thought I'd start off with a recap of where we started a year ago. A year ago today, we started with Non-Stop Hadoop: active-active replication of Hadoop's NameNode, and the ability to stand up Hadoop across a wide area network, across multiple geographic territories. We allowed the NameNode to be continuously available, even in the event of any failure in the system, and by making HDFS essentially available across geographic locations, you had continuously available Hadoop, wide-area Hadoop, with multi-data-center ingest. You've seen some of these sorts of demonstrations in the past, so I thought I'd start with a much smaller, scaled-down version. Once again, I've got a Northern Virginia data center and an Oregon data center, with about 3,000 miles between the two. From the screen, you can see the six graphing applications, each one representing one of the active NameNodes in the cluster. I'm going to create a MapReduce job here and put it in a directory called Q1. This is going to run TeraGen and TeraSort, so we basically put some data into HDFS and then sort it into a total order, and while we do it, we throw some failures at the system. A couple of things you'll notice on screen: in our graphing application, all the NameNodes are responding to the event; this is a synchronous update of each one of those NameNodes. But we also have these foreign blocks, and you're only seeing them on one side, the Oregon side. That's because the MapReduce jobs that we're running, TeraGen and TeraSort, are running in Northern Virginia, and the foreign blocks are the blocks that have not yet replicated from their site of ingest. 
So those move across the network asynchronously, and that allows us to have local, LAN-speed performance while synchronously updating the namespace, and of course it gives us complete replication of data across geographic territories. And if a client were to access files that weren't yet available locally, the client can reach across the wide area network and pull them over in real time if it wants to. So while we're doing this... There's a bit of a performance hit doing that, but eventually that goes away, is that right? Well, if you're reaching across the wide area network, yes, there's a performance hit, unless you wait until all the data is completely replicated, in which case it's local at that point. So let's go ahead. I'm going to pick this node here, called CDH3, and do a reboot on it. One of the things you'll notice, as we're running the MapReduce application, is that the bottom right-hand node goes down to zero: zero bytes in, zero bytes out. And of course our MapReduce application runs uninterrupted; there's no interruption of the service itself. Similarly, we can bring down another node. I'll bring down two of the six. I'm going to move over to machine nine on the other side, on the Oregon side, and reboot that as well. So you've just seen two NameNodes fail, yet despite that failure, our MapReduce job in the upper right-hand corner is just completing. There's been no interruption of service. And as you've seen in some of the previous demos, when those machines come back online, they're in safe mode. They'll learn from the other machines that they're behind in this global sequence of operations; they have to play those operations back and catch up, and once they're caught up, they become fully active participants in the cluster again. It really provides a tremendous amount of availability, continuous availability. 
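The pattern Brett is describing, a synchronously agreed namespace with block data shipped across the WAN in the background, can be sketched in a few lines. This is a hypothetical toy model of the idea, not WANdisco's implementation; every class and method name here is invented.

```python
# Toy model: metadata (the namespace) is agreed synchronously at write
# time, while block data ships across the WAN asynchronously. Until a
# block arrives at the remote site, it is a "foreign block" there.
class Site:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> data held locally

class GeoReplicator:
    def __init__(self, local, remote):
        self.local, self.remote = local, remote
        self.namespace = set()  # synchronously replicated metadata
        self.pending = []       # foreign blocks not yet shipped

    def write(self, block_id, data):
        self.local.blocks[block_id] = data  # LAN-speed local write
        self.namespace.add(block_id)        # namespace agreed everywhere
        self.pending.append(block_id)       # data follows asynchronously

    def pump(self):
        # Background transfer draining the async queue across the WAN.
        while self.pending:
            bid = self.pending.pop(0)
            self.remote.blocks[bid] = self.local.blocks[bid]

    def read_at_remote(self, block_id):
        # A remote client reads locally if the block has arrived;
        # otherwise it reaches across the WAN in real time.
        if block_id in self.remote.blocks:
            return self.remote.blocks[block_id], "local"
        return self.local.blocks[block_id], "wan"

repl = GeoReplicator(Site("virginia"), Site("oregon"))
repl.write("blk_1", b"data")
print(repl.read_at_remote("blk_1")[1])  # wan: not yet shipped
repl.pump()
print(repl.read_at_remote("blk_1")[1])  # local: replication caught up
```

The performance hit Dave asks about is the `"wan"` branch; once `pump` has drained the queue, reads take the `"local"` branch at full speed.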
It's the way you can get five-nines reliability in Hadoop. So let's fast-forward and look at what we do for HBase. In the traditional HBase architecture, we have some number of RegionServers, and the job of a RegionServer is to host these things called regions. A given table inside HBase might have many regions, but any one region lives on a single RegionServer, and if the RegionServer hosting a particular region fails, then that part of the table is unavailable for that period of time. So how long does it take for that region to come back online on a different RegionServer? Well, there are a couple of steps it has to go through. First, the RegionServer has to be identified as missing. Next comes the reallocation of that region: it has to be assigned to a new RegionServer. That RegionServer has to come online, look at the write-ahead log, play it back into its memstore, and once that's done, read the HFiles, and eventually come back up and be in service again. So how long does this take? It can take one second, it can take one minute, and it can take an hour. It's not a determined amount of time; it depends on what's going on inside the cluster at that particular time, how busy the cluster is, how big the table was, a number of factors. So there really isn't a consistent, predictable recovery time for bringing HBase back. 
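The recovery steps Brett lists, reassign the region, replay the write-ahead log into the memstore, then merge with what was already flushed to HFiles, can be sketched as a toy function. This is an illustrative model under invented names, not HBase internals.

```python
def recover_region(wal_entries, hfile_cells):
    """Toy replay of recovery for a reassigned region: rebuild the
    memstore from the write-ahead log, then overlay it on the cells
    already flushed to HFiles (unflushed edits are newer and win)."""
    memstore = {}
    for row, value in wal_entries:  # WAL replay, in log order
        memstore[row] = value
    store = dict(hfile_cells)       # state read back from HFiles
    store.update(memstore)          # memstore takes precedence
    return store

# Edits that were in the WAL but never flushed before the crash:
wal = [("row1", "v2"), ("row3", "v1")]
# Older state that had already been flushed to HFiles:
hfiles = {"row1": "v1", "row2": "v1"}
print(recover_region(wal, hfiles))
```

The unpredictable recovery time Brett mentions comes from the size of `wal_entries` and `hfile_cells` in a real cluster: a long log or a busy cluster stretches the replay from seconds to minutes or worse.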
So the contrast to that is this: when we take WANdisco's active-active technology, the distributed coordination engine we applied to Hadoop, and apply it to the HBase RegionServer, we can take a single region and have it hosted on multiple RegionServers, so that one region is actually available on, say, three RegionServers, and the client is able to load-balance its requests across all of them. So now you have a couple of interesting scalability benefits as well. There's a general recommendation, not a hard one, that RegionServers only allocate about 15 or 16 gigabytes of memory to heap, because when garbage collection kicks in, you don't want a big pause or slowdown in service. Well, when we have multiple active RegionServers for a region, we can essentially eliminate that bottleneck, because one machine can be doing GC while the other two are still serving. So we've got some benefits there as well. And of course it's continuously available: if a RegionServer fails, the other RegionServers pick up the slack for the regions it was hosting, and the application continues to run. That's exactly what the demonstration is going to show. The screen you're looking at now is a graphing application showing tweets, but again, think about this in terms of the example I gave a few moments ago, the stock trading application, or at least a stock ticker application, where you're getting a live feed of actual stock prices as they come through, the kind of mission-critical application we're talking about. In this case, we've got tweets coming in from a streaming API, selected by a series of keywords, and they're being fed directly into HBase. 
They're not fed into the mapping application directly; they go straight into HBase as a series of puts, and then the application you see on screen reads them back out of HBase, essentially as gets, and displays them. The bottom part of the screen is the actual tweet message, and the top half of the screen shows the area of the world the tweet is coming from, so it's something very visual. The reason we did it this way is so that you can see that when I stop the RegionServer hosting the region being written to here, the flashes will keep flashing and the messages will keep coming without any interruption of service. That's what Non-Stop HBase provides. So let's do that. I'll pick the bottom one, so I'm going to SSH into hbase4, that's the name of the machine, get the process ID of our non-stop RegionServer, and issue a kill -9 on it, let's see, it's 2055. Just before I hit the button, keep in mind that in any other circumstance, if the RegionServer hosting the region you were trying to access failed, you'd have an interruption of the service itself. So let's go ahead and kill it. If you look at the bottom right-hand corner of the screen, as soon as the graphing application updates, that HBase machine shows up as dead. The dials drop to zero, but if you've been watching the application and the tweets coming in, there's been no interruption of service. It's a non-stop, continuously available HBase that gives you the strong consistency that HBase requires and, of course, the continuous availability of WANdisco's non-stop technology. 
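The behavior in the demo, kill one RegionServer with `kill -9` and reads continue uninterrupted, comes down to client-side failover across a region's active replicas. A minimal hypothetical sketch of that idea, with invented names and a stub in place of real network calls:

```python
import itertools

class ReplicaServer:
    """Stand-in for one RegionServer hosting a replica of the region."""
    def __init__(self, name):
        self.name, self.alive = name, True

    def get(self, row):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"value:{row}"

class ReplicatedRegionClient:
    """Round-robin over the region's replicas; a dead (or GC-paused)
    server is simply skipped, so reads continue without interruption."""
    def __init__(self, servers):
        self.servers = servers
        self.rr = itertools.cycle(self.servers)

    def get(self, row):
        for _ in range(len(self.servers)):
            server = next(self.rr)
            try:
                return server.get(row)
            except ConnectionError:
                continue
        raise ConnectionError("no replica of the region is available")

replicas = [ReplicaServer(f"rs{i}") for i in range(3)]
client = ReplicatedRegionClient(replicas)
print(client.get("tweet#1"))  # served by one replica
replicas[0].alive = False     # the kill -9 moment in the demo
print(client.get("tweet#2"))  # still served, no interruption
```

The round-robin also illustrates the GC point from earlier: a paused replica looks just like a dead one to the client and is skipped until it serves again.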
Okay, and without WANdisco, you'd see an interruption in service, and I think you were saying that interruption would be very unpredictable, very inconsistent; it might not be so bad in one instance and might be huge in another, you just don't know. You don't know, and for mission-critical applications, you need not only consistency but the ability to recover in a timeframe that suits the application's service-level agreements, the SLAs. So what kinds of applications do you expect people to be applying your Non-Stop Hadoop technology to? You know, I think we'll see it applied in financial services, in healthcare, in pharmaceuticals, across the major players who have mission-critical applications that demand continuous access. And to put it in context a little bit, how often does HBase fail? Is this a common, everyday problem? Is it built into the very nature of HBase that it's going to have failures from time to time? When we talk about failures in a Hadoop cluster, and about RegionServers failing, we're talking about the hardware. And absolutely, hardware fails in Hadoop clusters all the time. The disks in Hadoop clusters fail all the time, and if you lose enough disks, the service effectively becomes unavailable, it's out of service. So this is a fairly common problem, and the larger the cluster, the more failures you're going to inherit. And you mentioned this allows you to build and deploy mission-critical apps that have specific SLAs. That's one of the things we're seeing in the larger market: a reluctance, in some cases, to adopt some of these newer technologies, NoSQL databases, HBase, whatever the case might be, because they don't necessarily meet the rather strict SLAs that are common, and pretty much table stakes, for all the other types of applications we run in the enterprise. 
It's the trade-off. It's trying to determine whether or not I can sacrifice the availability of the application for the consistency of the data within it. And that's why people sometimes get stuck relying on traditional RDBMSs, on things they're very familiar with, even though they need the scale that some of these NoSQL solutions provide. Awesome. Well, it's great to see you guys propagating your technology beyond the core of Hadoop. Is it likely that we're going to see this on other NoSQL databases, or are there other logical places we can expect to see non-stop technology? Well, our company applies non-stop technology to the ALM space as well: we've applied it to Subversion, and we've applied it to Git. In the big data space, at this point in time, we've chosen to stick with the Hadoop infrastructure and the Hadoop ecosystem, so certainly you'll see it applied to other aspects of Hadoop in the future. All right, Brett, thanks very much for coming on theCUBE and sharing that great demo. It was a live demo, by the way. Absolutely a live demo. A live demonstration, running, again, on machines in EC2, across four different nodes that comprise the cluster. Very impressive. All right, thanks again. Great to see you. Good luck getting back to Boston in the snow, and we'll see you next time. I really appreciate your time. Thank you. Okay, everybody, keep it right there. Jeff Kelly and I and John Furrier will be back. We're live from the Santa Clara Hilton in Santa Clara, California, right across the street from the Convention Center, where the Strata Conference is going on. We have our CUBE party this evening. If you're in the Silicon Valley area, friend of theCUBE, please stop by; we'd love to see you, 6 p.m. tonight at the Santa Clara Hilton. Keep it right there. We'll be right back with our next guest.