It's fun, it's exciting, and more importantly, it's very social. Live from Midtown Manhattan, it's theCUBE's live coverage of Big Data NYC, a SiliconANGLE and Wikibon production, made possible by Hortonworks — we do Hadoop — and WANdisco: Hadoop made invincible. And now, your co-hosts, John Furrier and Dave Vellante.

Hi everybody, we're back. This is Dave Vellante of Wikibon with Jeff Kelly, and we're here live at Big Data NYC. We're at the Warwick Hotel, right across the street from the Hilton on Sixth Avenue. We've been going wall to wall all day, we'll be here tomorrow, and we were here last night. Jeff and I are really excited about this segment. We've been tracking WANdisco for quite some time now. Brett Rudenstein is the product guy for Non-Stop Hadoop, which is really WANdisco's claim to fame. So Brett, welcome to theCUBE. Good to see you.

Thank you very much.

All right, so let's get into it. We were talking to some of your colleagues earlier about what it is you guys do, so I think we've got that down, but talk about the product specifically that you run.

Yeah, so WANdisco's Non-Stop Hadoop essentially provides active-active replication for the NameNode, which eliminates the NameNode bottleneck. It eliminates the single point of failure of the NameNode, and it allows clients to round-robin their requests to each one of the active NameNodes, so you get a NameNode load-balancing effect as well. And over the wide area network, we allow Hadoop to span geographic locations as a single cluster. What that effectively means is you have one Hadoop cluster, with a single cluster ID, that spans any number of geographic territories.

Okay, that's a pretty complicated proposition for a lot of people in the audience, so talk about the NameNode. What does the NameNode do? What's its purpose?

The purpose of the NameNode is basically to store the file system metadata. If you think about a standard disk drive, there's a placeholder, if you will, that knows where all the files live on that drive. Now, Hadoop is a distributed architecture, so the NameNode handles all of the metadata for where all the files live in the Hadoop cluster.

So the NameNode allows you to get to the data — it knows where the data is. You lose the NameNode, you're screwed, basically — and that's another piece of what you guys do. Is that the official way we put it?

That's the technical term.

And the other piece of it is, as you said, Hadoop is a distributed file system, so the data's everywhere, right? And the NameNode keeps track of those locations, so that a client that makes a request knows where to go to get that data.

So you said active-active. Talk about what that means.

Yeah, we look at it from two angles. One is from the NameNode perspective. Active-active means that a single client will send a request to one NameNode, that request will be handled by that NameNode, and the following request might land on another NameNode. In a standby situation, all client requests go to a single NameNode, and only in the event of a failure do you switch over, or fail over, to that standby. In an active-active environment, clients round-robin their requests across the NameNodes, so you get a load-balancing effect and you don't create a bottleneck.
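To make the round-robin behavior concrete, here is a minimal Java sketch of how a client-side selector might rotate requests across several active NameNode endpoints. The class and the endpoint addresses are hypothetical illustrations, not WANdisco's actual client code.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: rotates client calls across multiple active
// NameNode endpoints so no single node becomes a bottleneck.
public class RoundRobinNameNodeSelector {
    private final List<String> endpoints;   // e.g. "nn-a:8020", "nn-b:8020", "nn-c:8020"
    private final AtomicInteger cursor = new AtomicInteger();

    public RoundRobinNameNodeSelector(List<String> endpoints) {
        this.endpoints = endpoints;
    }

    // Each call returns the next NameNode in rotation; if one is down,
    // the client simply retries against the next active node.
    public String next() {
        int i = Math.floorMod(cursor.getAndIncrement(), endpoints.size());
        return endpoints.get(i);
    }
}
```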
You know, we've seen NameNodes go into garbage collection with these large Java heaps. The NameNode goes into garbage collection and causes a slowdown, or a complete pause of service. With multiple active NameNodes, because you're not limited to just one, you're able to continue service and operation in the cluster without any downtime, without any interruption.

So in an active-passive situation — what you call the standby situation — if there's a failure, you fail over to that passive NameNode, which then becomes active, which of course takes time.

Correct.

And, like you said, you've got to constantly rebalance to make sure you've got the situation under control. So you've essentially got a virtual single point of control with your solution. Is that right?

Yeah. In fact, if you really look at our architecture, we call it a shared-nothing architecture, because no one NameNode acts as a central coordinator. There isn't one machine in a leadership role, whereas with some of the other protocols out there, even if it's a temporary election to that role, it is a leadership role, and in the event of a failure you hope that some other machine self-elects in its place. With a shared-nothing architecture and no single point of failure, you always ensure consistency and continuous availability.

This would be what Abhi Mehta calls an AC product — after Cloudera — as opposed to BC, before Cloudera. So you guys designed this for Hadoop, obviously. Okay, so take us through the demo. What are we doing?

Yeah, and just before I do: we talked about multiple active NameNodes. Across a wide area network, we stand up that single HDFS, and it allows you to ingest data at both sites — data center A and data center B, and it could be a third site or a fourth site and so forth. The demonstration I'm going to do today uses some of the standard sample applications that ship with Hadoop. I'm going to run TeraGen and TeraSort in data center one, and if we have time, we'll run TeraValidate in data center number two to prove that the data made its way across the WAN.

Sorry to interrupt — you're saying you could do this across N data centers?

Correct, yes. Now, if you look at the screen here, you see a map. It's a live view of the active data centers, and we have two. There's a data center in Northern Virginia — that's data center one. These are running up in Amazon EC2, so this is a live clustered environment. Data center one is in Northern Virginia; data center number two is in EC2 West, in Oregon. So there's about 3,000 miles between the two data centers. I'm going to drill down on the WAN link here, if you will, and that'll give us an exposed view of what's happening operationally inside the data centers. Again, I've got the three active NameNodes that I talked about, and there are actually three DataNodes there as well — a six-node cluster in Northern Virginia — and then three more DataNodes in Oregon. The graphing application you're seeing shows the RPC bytes in and the RPC bytes out for each of the active NameNodes. So let's go ahead and start our TeraGen application. We're going to run a little script here, and we're going to put the output in a directory called SA.
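For reference, the demo's workload is the standard Hadoop example suite. Here is a hedged Java sketch of driving the same three steps programmatically rather than by shell script; the row count and output paths are made-up placeholders, and the classes are the stock TeraSort examples that ship with Hadoop.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.examples.terasort.TeraGen;
import org.apache.hadoop.examples.terasort.TeraSort;
import org.apache.hadoop.examples.terasort.TeraValidate;
import org.apache.hadoop.util.ToolRunner;

// Sketch of the demo's workload. Paths and row count are placeholders.
public class DemoWorkload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up the cluster's *-site.xml

        // Generate sample rows in data center one (directory "SA", as in the demo).
        ToolRunner.run(conf, new TeraGen(), new String[] {"100000000", "/SA/gen"});

        // Sort the generated data; the MapReduce job keeps running even
        // while individual NameNodes are failed during the demo.
        ToolRunner.run(conf, new TeraSort(), new String[] {"/SA/gen", "/SA/sorted"});

        // In data center two, validate that the data replicated across the WAN.
        ToolRunner.run(conf, new TeraValidate(), new String[] {"/SA/sorted", "/SA/report"});
    }
}
```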
When we run the script, one of the first things you'll notice on screen is that the NameNodes start responding to the requests. You can see each one of these NameNodes responding to the client activity, so we're already demonstrating that load-balancing effect: each one of the NameNodes is responding to client requests. One of the other things you'll start to see, over in data center number two in Oregon, is these little blue bars appearing. What these blue bars show is something called foreign blocks. Because we've made Hadoop data-center-aware, all of the blocks being ingested into data center one are written synchronously, but the foreign blocks move asynchronously, so we don't block the client. Now the next thing I'm going to do is start to...

Sorry to interrupt — I just talked to our director, and we're having a slight technical problem, so we're going to hold the demo for a moment. Why don't we jump in and talk a little bit about the applications this allows you to run — let's really talk about the business value. If you're an enterprise looking at this solution, what is it going to allow you to do that maybe you couldn't do before?

Yeah, there are a number of things. Let me start by contrasting it with some of the other methodologies people use today to try to keep data in sync across multiple Hadoops. The first point is that it's two different clusters you're keeping in sync, and the two most popular methodologies are these. One is that you use DistCp, or something controlling DistCp on the back end, to periodically push out that synchronization, if you will. But the problem we see is that DistCp won't copy a file if it has the same size and the same name as its predecessor. So how often does that actually happen? Well, interestingly enough, if you're doing job transformations, you've got the same data inputs — and that input might be growing — but the data you're creating tends to be file part 001, file part 002, and the split points are about the same. So after a couple of months, your clusters quietly diverge, and it's a manual operation for the sysadmins to figure out what diverged. Another thing to mention is that DistCp runs as MapReduce, so you're using up cluster resources to run it, and once those two data centers start to diverge, comparing the two of them can use up all the cluster resources as well, essentially leaving you out of service.

Not to mention the human element, right? Describe the process a sysadmin has to go through.

Well, a sysadmin essentially has to run checksums against each cluster individually and make sure the files line up, and when they don't, they need to involve the user community and say: I have two copies of your file — which one do I keep and which one do I throw away? So now the users are totally ticked off, right? Because first of all, you've advertised to them that you really don't have your IT act together. Secondly, you're asking them to stop what they're doing — hey, stop selling, I need your help to figure out which one of your files we should keep. So that's a real backlash.
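That manual audit could look something like the following Java sketch, which compares HDFS file checksums across the two clusters. The NameNode URIs and the file path are hypothetical, and the comparison assumes both clusters use the same block size and checksum algorithm, since HDFS file checksums are otherwise not comparable.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Rough sketch of the manual audit a sysadmin runs when two clusters
// kept in sync with DistCp may have silently diverged.
public class DivergenceCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem east = FileSystem.get(new URI("hdfs://dc1-nn:8020"), conf);
        FileSystem west = FileSystem.get(new URI("hdfs://dc2-nn:8020"), conf);

        Path file = new Path("/SA/sorted/part-00001");
        FileChecksum a = east.getFileChecksum(file);
        FileChecksum b = west.getFileChecksum(file);

        // Same name and same size can still hide different contents --
        // only the checksum comparison exposes the divergence.
        if (a == null || !a.equals(b)) {
            System.out.println(file + " has diverged between data centers");
        }
    }
}
```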
And there's a similar problem with parallel data ingest. A hiccup in either Hadoop — if you're using Flume, or even a load balancer in front of Flume, Sqoop, or some other technology to keep two different Hadoops in synchronization — is problematic as well. So those are the admin problems. Now, when we talk about the use cases we solve, one is failing over to a disaster recovery site. We used that phrase before: active-active. Both sites are active. You can ingest data into both sites; you can run jobs at both sites. So if one data center goes down, what is the length of time it takes to get that site back online? With WANdisco's Non-Stop Hadoop, it's essentially zero. You just keep running jobs in the data center that remains.

Awesome. Okay, now, we were talking to Jagane earlier, and you just mentioned some other competitive approaches. What else is out there in the marketplace like this, like a Non-Stop Hadoop?

There really isn't anything, because no other technology stands up a single HDFS that spans multiple data centers.

Okay, so you stand alone in that regard. So how does this differ from, for instance, something like NameNode Federation? We've heard a little bit about that from some of the other players.

Yeah, NameNode Federation is interesting, but what it does is partition the namespace. Conceptually — this is not quite how it works, but conceptually — you're saying: I'm going to serve all the A-through-L files on this NameNode, and M through Z will be served on that NameNode. So if a NameNode fails, you're back in the original position: you can't service the part of the namespace that failed. With the active-active architecture, you maintain continuous availability. You don't really need federation.
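Here's a purely conceptual Java sketch of the namespace split Brett describes, routing each path to the NameNode that owns its slice of the alphabet. Real federation uses ViewFs mount tables rather than first-letter ranges; the sketch only illustrates the failure mode.

```java
import java.util.Map;
import java.util.TreeMap;

// Conceptual illustration of a federated namespace: each NameNode owns
// a fixed slice, so a failed NameNode takes its whole slice offline.
// Real federation uses ViewFs mount tables, not first-letter ranges.
public class FederatedNamespaceRouter {
    // floorEntry maps a path's first letter to the owning NameNode.
    private final TreeMap<Character, String> owners = new TreeMap<>(Map.of(
            'a', "namenode-1:8020",    // serves a..l
            'm', "namenode-2:8020"));  // serves m..z

    // Assumes paths begin with a letter.
    public String ownerOf(String path) {
        char first = Character.toLowerCase(path.replaceFirst("^/+", "").charAt(0));
        return owners.floorEntry(first).getValue();
    }
}
```

If namenode-2 dies, every lookup for an M-through-Z path fails even though namenode-1 is healthy — which is exactly the contrast with active-active, where any surviving NameNode can serve any path.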
All right, fantastic. I think we've got our technical difficulties solved, so we're going to jump back into that demo. If it's okay with you, could we start at the beginning — take the demo from the top?

Yeah, I think that would be great.

So you've got two EC2 instances, right? Different zones, right?

Correct. Two EC2 instances, one in Amazon EC2 East and one in EC2 West — so, Northern Virginia and Oregon. What we're going to do is run some of the standard sample applications. We're going to run TeraGen and TeraSort in data center one, and then, to prove that all the data migrated between the two data centers, we're going to run TeraValidate over in data center number two. Along the way, we're going to throw a bunch of failure scenarios at it. We'll fail a local NameNode. We'll fail a remote NameNode. And at some point, we'll even crash the entire WAN link.

Let's pull a plug or two, yeah.

Pull a plug or two and crash the entire WAN link. So let's start that off. I'm going to run a TeraGen. This is our little script that runs TeraGen and TeraValidate, and we'll put the output in directory SA2 — that's just the directory the output files go into. Again, zooming back in on the internals of this cluster, you see the three NameNodes start to respond to the events. Basically, TeraGen is putting a bunch of information into HDFS, which we'll then put into a total order. We see all the NameNodes responding, and of course, over here in data center two in Oregon, we see these foreign blocks. A foreign block is a block that has not yet replicated from its site of origin. In other words, we're putting data into data center one — everything is fine there — and we transmit the foreign blocks asynchronously, so they don't block the current client while they replicate to data center number two.

Now let's throw some failures at this. I'm going to SSH into data center one's NameNode — let's pick, I don't know, C — and issue a reboot command. So what I've just done is reboot this bottom machine here, and in a moment the graphing application will update and you'll see it flatline.

Let me make sure I understand this. You've done that while you've got data in flight?

Data is in flight, correct.

So you're refueling the plane in midair.

Correct. Now, while we still have our job running, I'm going to also kill a NameNode in data center number two — NameNode C there as well, so it's in the same place on the screen. So effectively, while we're running our local ingest, I've also got those blocks replicating from data center one to data center number two. I've killed the second NameNode, so this one is down as well, and yet we're still replicating those blocks. There's been no interruption of service on the client side.

The next thing we'll look at is that whole self-healing aspect we talked about. Let's jump back one screen while we're replicating here. One of the things you'll see in the graphing application is that one of the dots on the screen has turned blue, indicating that we're still providing service — it would have turned red if there were no service — but at least one NameNode is down. And you see a blue dot on each side, in both Virginia and Oregon, yet the MapReduce job is still running; in fact, we're in the sort phase at this particular point in time. In just a few moments, the first NameNode that I rebooted will come back online, so it'll turn green. What's really happening under the covers, because we have a global sequence of operations, is that the NameNode comes up in safe mode. While it's in safe mode, it won't take any service requests; clients will only work with the other NameNodes that are available. It learns from the other NameNodes that it's behind in the global sequence, and once it has caught up on that global sequence, it becomes an active participant in the cluster again. As you can see from the screen now, each one of these is back online. If I drill back in, you'll see they're servicing client requests again, and each one of those NameNodes is coming back online. And our job has just finished as well.
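As a rough illustration of that recovery path — come up in safe mode, catch up on the global sequence, then rejoin — here is a hedged Java sketch. All of the names and the structure are hypothetical; WANdisco's actual implementation is proprietary.

```java
// Illustrative sketch of the recovery behavior described above: a
// restarted NameNode stays in safe mode, replays the agreed global
// sequence of operations until it has caught up, then serves clients.
// Names and structure are hypothetical, not WANdisco's implementation.
public class RecoveringNameNode {
    private long lastApplied;          // highest operation applied locally
    private boolean safeMode = true;   // no client requests while true

    public void recover(GlobalSequence peers) {
        // Learn from the other NameNodes how far behind we are,
        // then replay every missed operation in order.
        while (lastApplied < peers.latestAgreed()) {
            apply(peers.operationAt(lastApplied + 1));
            lastApplied++;
        }
        safeMode = false;              // caught up: rejoin active service
    }

    private void apply(Object operation) { /* update local metadata */ }

    interface GlobalSequence {
        long latestAgreed();
        Object operationAt(long index);
    }
}
```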
Now we'll throw one more failure at it. I'm going to run the application one more time, and the failure we're going to throw at it this time is a complete WAN separation — I'm going to break the WAN link between the two environments. So let's go over here, and I'm going to close the tunnel between the two by issuing an ipsec stop. I'll give that another moment — maybe we give it a chance to replicate some of the blocks, but not all of them — and then we'll shut it down.

Okay, great. Raising the bar on the demo. And we're looking at a live demo here, right?

This is a live demo, running in two clusters up in Amazon, in the EC2 East region and the West region, so about 3,000 miles between the two. So let's go ahead and stop the tunnel between them. What you're going to see now is that each of the NameNodes looks like it's down. They're not really down — they're still running in their respective data centers — but the client can no longer see them, and this graphing application can no longer see them. For a visual perspective, if I go back to the on-screen map, once the screen updates you'll see it turn red, indicating there's no service between the two. But if you look at our MapReduce job up in the upper right-hand corner, it's still running. It is still completing transactions, and it's queuing up the foreign blocks that need to cross the wide area network. The last part of the demonstration I'll do for you today is to bring that tunnel back up. We'll do an ipsec start; in just a moment we'll see the line turn green, and then block replication will resume and we'll have complete synchronicity across the two clusters. Because remember, while it's two data centers, it is a single HDFS — across the LAN or across the WAN.

Brett, that's awesome. It's not trivial, what you guys just showed, and all the tech behind it. But I wonder if you could compare this to other techniques for providing resiliency to data. For instance, I'm thinking about things like bit slicing, where people use Reed-Solomon codes so that if you lose a slice, or two slices, or three slices, you can recreate the data. How does this compare?

Well, this is really a way of doing coordinated synchronization, so it's not even like a block-device synchronization technology such as EMC SRDF. Those technologies are technically limited: typically latency is a big factor, so you only have a distance of about 200 miles or so, plus or minus. This technology is distance-independent — it runs over the LAN or the WAN — and it's completely software-based; there's no hardware involved. And as I mentioned earlier in the discussion, it's a shared-nothing architecture. There's no central coordinator where, if that machine is down, you have to wait to elect a new leader.

Okay, and you do this through the Paxos algorithm, right?

It is a patented implementation of Paxos that is able to cross those geographic boundaries, those geographic distances.

Juiced-up Paxos. All right, great.

Juiced-up Paxos — I like that.

It strikes me that it must be fun developing this product, because every day you're saying, well, let's find a new way to try to break this — throw a new problem at it and stay one step ahead of it.

Yeah. In some of the first early-on tests, we would basically reboot NameNodes and bring them back, and then someone said, let's try a network partition, where A can talk to B and A can talk to C, but B and C can't talk to each other. And that patented Paxos implementation handles all of those conditions: any network condition you can throw at it, short of total unavailability, and it will keep consistency and availability across the network.
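For readers who want the flavor of the consensus machinery, here is a minimal, textbook single-decree Paxos acceptor in Java — enough to show how a majority quorum lets the surviving nodes keep agreeing on the next operation in the global sequence during exactly the partitions Brett describes. This is standard Paxos, not WANdisco's patented WAN-optimized variant.

```java
import java.util.List;

// A minimal single-decree Paxos acceptor. A value is chosen once a
// majority of acceptors accept it, so agreement survives node failures
// and partitions as long as a quorum remains mutually reachable.
public class PaxosAcceptor {
    private long promised = -1;   // highest proposal number promised
    private long acceptedNum = -1;
    private Object acceptedValue;

    // Phase 1: promise to ignore proposals numbered below n.
    public synchronized boolean prepare(long n) {
        if (n > promised) { promised = n; return true; }
        return false;
    }

    // Phase 2: accept a value if no higher-numbered promise was made.
    public synchronized boolean accept(long n, Object value) {
        if (n >= promised) {
            promised = n; acceptedNum = n; acceptedValue = value;
            return true;
        }
        return false;
    }

    // Tally one vote per acceptor in the cluster: true if it accepted.
    public static boolean chosen(List<Boolean> votes) {
        long yes = votes.stream().filter(v -> v).count();
        return yes > votes.size() / 2;   // strict majority required
    }
}
```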
Excellent. All right, Brett, we're getting the hook. Thanks very much for the demo, and thanks for coming on theCUBE and going deep with us.

All right, keep it right there, everybody. We'll be back with our next guest. This is theCUBE. We're live from Big Data NYC. We'll be right back.