From New York, it's theCUBE. Covering Big Data New York City 2016. Brought to you by headline sponsors Cisco, IBM, NVIDIA, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and Jeff Frick.

Welcome back to New York, everybody. This is theCUBE, the worldwide leader in live tech coverage. This is our special presentation of Big Data NYC, and as part of Strata + Hadoop World, every year we come down and do our show within a show. We've been going, Jeff, four days of nonstop coverage, starting with deep learning and machine learning with NVIDIA. We had the big IBM party on Tuesday night, yesterday wall-to-wall coverage, and today, more end to end. So, Sean McEwen is here, he's a technology solutions architect for data center and virtualization at Cisco. Sean, welcome to theCUBE.

Thanks very much, great to be here, and I really appreciate the time.

You're welcome. So, we've been having quite a few conversations about what Cisco has been doing with UCS, and some of my early misconceptions about what that was all about. You guys changed the game there. But talk about your role and how it relates to big data.

Absolutely. I've been working here at Cisco for about six years in the data center space, driving solutions in the big data arena for most of that time. We've been working with our ISV partners, Cloudera, MapR, Hortonworks, IBM, kind of the who's who in the big data space, going back about that far, building solutions together with our hardware and some of our software and theirs as well, creating validated designs, testing, and building really solid solution products that our customers can deploy easily, so they can get into this new technology area feeling comfortable with it. I think so many of our customers are coming from that traditional mode-one world where they've done everything one way for so many years, and they need a little bit of guidance on how to move into this new space.

This is a big data world that we're in, right? The architecture, I think, is different enough that you really have to take a different approach to building out these large clusters.

Yeah, look at the classic Oracle or SQL Server world: you've got separate compute and storage, and you're always having to manage everything independently. Here we're building what is effectively a supercomputer, right? Out of commodity, industry-standard pieces, and you want to make all of that work consistently together. That's what we've been doing with our partners.

So I remember my first Hadoop World. Mike Olson came on, and I was like, what's Hadoop? Give us the 101. He said, well, back in the day you'd buy a Sun server, run Unix, take some Oracle licenses, and if you had any money left over you'd start to develop applications. And he sort of described how Hadoop was all about taking five megabytes of code to petabytes of data, and that was sort of the epiphany of Hadoop. How do the requirements of big data differ from an architectural perspective? What's the implication from an architectural perspective?
Yeah, absolutely. If you look at those classic relational database designs, where you've got separate compute and storage, it made a lot of sense. There's a good reason for doing it: you could independently manage the two pieces, and when the dataset was smallish, at least by modern terms, that was the right way to go. But when the datasets got big, and you were trying to address a significant portion of them in a given query, in a given job, that old method just didn't work anymore. It was funny, I was talking to my son the other day, and he asked me this question out of the blue. He's like, Dad, when people want you to roll down your window, why do they do this? Where did that gesture come from? And he does the crank motion, right? It's doing it the old way. So you've got to recognize when things aren't done the old way anymore. It's a different problem to solve. Now the problem is really throughput to the data: trying to get that data into CPU cores, into RAM, as quickly as possible. And to do that, you've got to get those CPUs near the data, get the compute near the data, and scale horizontally rather than vertically. That's what Hadoop did so well, and it really revolutionized the whole world 10, 12, 15 years ago, something like that.

So what's Cisco's role here? You're saying you're building out architectures and reference architectures to help customers figure out how to exploit data. Talk more about that.

Yeah. So, transitioning into that mode of managing systems at that horizontal level, what happens is you start building these clusters, and they start maybe modestly, at 10 or 20 nodes, but pretty quickly, and we've seen it over and over again, our customers come back in six, 12, 18 months and they double, and double again, and double again in size. So now you're dealing with a cluster that might be 50 or 100 or 200 nodes; we have customers with thousands of nodes. How do you deploy that in a consistent fashion? How do you manage that in a consistent fashion at that scale? Plenty of our customers have thousands of servers in their data center, but they're all doing different things. In a supercomputer, you're trying to have all of those computers work together to do one thing, different jobs at the same time, but basically working under one framework. That becomes a real challenge as they start to scale. So what we've done with UCS is essentially templatize the management of individual physical servers. We have what we call service profiles, which allow you to create a single template for any group of servers, tens, hundreds, thousands of them, and stamp them out in the build process. They all have the same network settings, BIOS settings, firmware versions, all of that kind of stuff. Then when it's time to maintain them, when it's time to upgrade a particular component, maybe you want to experiment, maybe you want to check whether your workload works better with hyperthreading off or on, you can make that change at a global level across the cluster, flick of a switch, right? Doing that, and being able to manage it in conjunction with validating it against all of our ISV partners, is I think a key area of value that we bring to our customers.
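To make the "service profile as template" idea concrete, here's a minimal illustrative sketch in Python. The class and field names are hypothetical stand-ins, not Cisco's actual UCS Manager API; the point is just that one template object drives the settings of every node, so a single change fans out across the whole cluster.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a UCS-style service profile template.
# Real service profiles live in UCS Manager; these names are illustrative only.
@dataclass
class ServiceProfileTemplate:
    name: str
    firmware_version: str = "3.1(2b)"
    hyperthreading: bool = True
    vlan_ids: list = field(default_factory=lambda: [100, 200])

@dataclass
class Server:
    serial: str
    profile: ServiceProfileTemplate  # every node is bound to the shared template

    def effective_config(self) -> dict:
        # Each server derives its settings from the template it is bound to.
        return {
            "serial": self.serial,
            "firmware": self.profile.firmware_version,
            "hyperthreading": self.profile.hyperthreading,
            "vlans": self.profile.vlan_ids,
        }

# Stamp out a 200-node cluster from one template.
template = ServiceProfileTemplate(name="hadoop-datanode")
cluster = [Server(serial=f"FCH{n:04d}", profile=template) for n in range(200)]

# "Flick of a switch": one change to the template applies to all 200 nodes.
template.hyperthreading = False
assert all(not s.effective_config()["hyperthreading"] for s in cluster)
```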
I was going to say, we hear over and over that there's just a shortage of people to really get involved here; there just aren't as many as are needed. So to come with a validated architecture that's put together with the right partner, you remove a lot of that complexity, you're giving them something to start with. And if you look at IT spend, it's essentially flat, maybe even declining year over year, while the data's going up exponentially. You can't add people every time you add a terabyte, or a petabyte; you've got to flatten that curve with a management capability.

Right. So as the cluster grows, we're trying to keep the consistency and management straightforward and keep that curve flat. And adding on to that, not just at a hardware level, we've built some software management products. There's a product we have called UCS Director Express for Big Data, and it extends that beyond the hardware itself: it actually communicates with our ISV partners' tools, with Ambari from Hortonworks, Cloudera Manager, the MapR management tooling, that sort of thing, and with Splunk now as well. So when you use that tool to provision the hardware, once it's done building the hardware to the spec of the reference design, it then speaks to the appropriate ISV management tool and has it do its provisioning as well. Now the software and the hardware know what each other are doing, and we keep that consistency across the spectrum. (See the sketch of that handoff below.)

I wonder if I could ask you to go back to the scaling question. In the early days of Hadoop, people would complain: I wanna scale compute independent of, say, the storage, and they had trouble doing that. Is that still a problem? Are you helping address that issue?

Yeah, I think there is still a desire, and I do hear this from customers, to have some flexibility in how the storage scales relative to the compute, because the hardest thing for customers is knowing ahead of time what the right ratio is. They'll do their workloads, they'll do a POC, they'll guess as best they can, but boy, it's hard to know. And even if you guess right, six months later it's probably wrong. Six months later, the rest of the business units have figured out that this is a cool thing and they want in on it; now all your spreadsheets are broken. So yeah, we are trying to bring a more, and I know this term is overloaded, composable or flexible approach to adding compute and storage to the cluster. We have a new product called the C3260, and if you get a chance to look at it in our booth, it's pretty impressive: a four-RU rack server with 56 drives and up to two server nodes in it. You can configure it with one or two servers, and that's the key. You can start with a chassis that's partially populated, one server and half the number of disks, and if you need more disks, you add more disks; if you need more compute, you slide in another compute sled, and you adjust as you grow. And we've got customers doing exactly that.

So you mentioned before ISV partnerships. Discuss their role, maybe name names if you can.

Sure.

What are they doing with this?

Yeah, so going back, I think I started at Strata + Hadoop World before it was Strata + Hadoop World, you know, 2011, something like that. It was Hadoop World.
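As an illustration of that second step, the handoff to the ISV management layer, here's a minimal sketch against Ambari's blueprint REST API, which Hortonworks' Ambari does expose. The host name, credentials, node names, and blueprint contents below are made up, and the blueprint itself is heavily abbreviated; the idea is just that once the hardware is stamped out, a provisioning tool can register a cluster definition and let the ISV tool lay down the software.

```python
import requests

# Assumed values: Ambari server address and credentials (hypothetical).
AMBARI = "http://ambari.example.com:8080/api/v1"
AUTH = ("admin", "admin")
# Ambari requires this header on modifying requests.
HEADERS = {"X-Requested-By": "ucs-provisioner"}

# A heavily abbreviated blueprint: which services run in which host group.
blueprint = {
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.5"},
    "host_groups": [
        {"name": "datanodes", "cardinality": "16",
         "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}]},
    ],
}

# The cluster template maps the freshly provisioned hosts onto host groups.
cluster = {
    "blueprint": "hadoop-ra",
    "host_groups": [
        {"name": "datanodes",
         "hosts": [{"fqdn": f"node{n:03d}.example.com"} for n in range(16)]},
    ],
}

# 1. Register the blueprint, 2. instantiate a cluster from it.
requests.post(f"{AMBARI}/blueprints/hadoop-ra", json=blueprint,
              auth=AUTH, headers=HEADERS).raise_for_status()
requests.post(f"{AMBARI}/clusters/bigdata", json=cluster,
              auth=AUTH, headers=HEADERS).raise_for_status()
```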
Yeah, it was just Hadoop World. I was there at the one when they announced Hortonworks, too; it was basically the Yahoo version of Hadoop back then. So we've been working with these vendors basically since their inception: Cloudera, Hortonworks, MapR, and more recently IBM, who we've added into the mix with their BigInsights product, and Splunk as well, obviously less from a Hadoop perspective and more for ingest of machine-generated data. All of them have been just really stellar partners for us. We bring their engineers and ours into the lab, sit down and decide what are the best configurations we want to offer to our customers, and then test the hell out of them. And we've published results: there's a new TPC benchmark now for big data, TPCx-HS. You're probably familiar with TPC from the old relational database days. It's an industry standard, and a needed one, because there are all kinds of claims about how fast X, Y, and Z are in the big data world. So it's a very public, industry-standard benchmark, and we've published with all the vendors I was referring to and set records in each case. Every time we published, we were able to move the bar higher.

Sean, what's the gnarliest big data problem from an architectural standpoint that the industry needs to solve?

Yeah, that's a great question. I do think the management of these systems at large scale becomes pretty hairy. It's easy enough to build a system on your laptop or a five-node cluster, have it be functional, and actually get some good results out of it. This happens all the time: we'll go into a customer environment where they've done a POC on whatever they had lying around, built it up, and shown this incredible speedup over whatever the previous system was. It's a very different animal to take that five-server POC and build a 500-server environment. So that itself is a pretty big problem, and one I think we're solving reasonably well. Maybe the next one, and you're seeing hints of it coming out, is these new technologies: well, SSD is not so new anymore, but NVMe, and some of the 3D XPoint-type memory technologies that are going to be coming out and becoming more affordable in the next few years. I wouldn't call that a problem so much as a challenge: figuring out how best to take advantage of these technologies at the right price point. You basically want to use the right tool for the job at each layer of the software stack: use memory for what memory is good at, use SSD for what SSD is good at, and so forth. And I think there are going to need to be some innovations there to balance those resources appropriately.

Yeah, because it changes the balance. It changes the bottlenecks. The spinning disk used to be the bottleneck, and flash comes in and all of a sudden, whoa, it exposes the network, which you didn't have to worry about as much because you had the disk slowing you down.

And once we see broad adoption of SSD and NVMe technology, that pendulum's going to swing right back to the network again.

Yeah, move to your next point of failure.

Ten gig is pretty much the standard at this point for all of these clusters, but we're already seeing hints of it: some customers are asking for 40-gig interconnects at the server level, not just at the spine layer. It's the classic game in IT, always hunting down that next bottleneck, and that's how we make progress, right?
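Sean's point about flash exposing the network is easy to back with rough numbers. The sketch below is back-of-the-envelope arithmetic with assumed per-device throughput figures (ballpark mid-2010s values, not measurements): a dozen spinning disks already approach a 10 Gb link's capacity, and SSDs, never mind NVMe, multiply that gap several-fold, which is exactly the pressure pushing 40 gig down to the server.

```python
# Back-of-the-envelope: when does node storage outrun the network link?
# Throughput figures are rough, assumed ballpark values (MB/s per device).
DEVICE_MBPS = {"spinning disk": 120, "SATA SSD": 500, "NVMe SSD": 2500}
DRIVES_PER_NODE = 12

def link_mbps(gigabits: float) -> float:
    # Convert link speed in Gb/s to MB/s (1 byte = 8 bits).
    return gigabits * 1000 / 8

for media, per_drive in DEVICE_MBPS.items():
    node_mbps = per_drive * DRIVES_PER_NODE
    for link in (10, 40):
        ratio = node_mbps / link_mbps(link)
        print(f"{media:13s}: {node_mbps:6.0f} MB/s vs {link} GbE "
              f"({link_mbps(link):.0f} MB/s) -> {ratio:.1f}x the link")
```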
So as you provide scale-out architectures, you've gotta be able to share data, so you're seeing we need fast pipes between servers now.

Absolutely, and it's always gotta be at the right price. It's always a price-performance equation. You can't just say, I want faster; well, I can give you 100-gig pipes if you want them, but it's not gonna be cost effective, right? We've gotta find that right balance.

But by definition, if they're successful with the POC, then they're gonna want to do more data. The data from those sources is also growing at the same time, so it's more sources, and the data within each source is growing, and now the application stack, as you said, other people want some of that too.

And that excitement, I think part of it is, especially in big data, so much of what we've done in the first part of this century, that always sounds weird to me, the beginning part of the century...

I was just thinking of, like, the '20s.

...has been cost-savings arguments: buy this solution, because you can do what you're doing for less money, we'll save you 20, 30% on whatever your workload looks like. These big data environments are net new; they're almost always net-new capabilities for the business, and that creates the excitement you're talking about. The customers come in, somebody shows one interesting POC in the lab, and then the word gets out and the business guys are all over it, right? And they want it quick, they want it right then and there. So yeah, absolutely, that challenge is huge for all of our customers.

All right, we have to leave it there, Sean, but I'll give you the last word. What do you want people to know about Cisco and big data?

Boy, I guess it's almost the flip side of the open source software world, right? Open source is a fantastic movement, and our enterprise customers are absolutely interested in the open source world, and it's not because it's free. Free is okay, but it's free like a puppy, right? You gotta watch out for the baggage that comes with it. That's where ISV partners like Cloudera do a great job of taking what's free, and all the goodness there, and making it easy to manage. Well, we're the other side of that coin, on the hardware side. You can build these clusters with, you know, a stitched-together bunch of laptops, and it will function, right? But you're gonna be dealing with that free puppy at some point, so there's a little value add on this side.

Puppies grow.

Right, yeah, yeah.

Sean, thanks very much for coming on theCUBE. I really appreciate it.

I appreciate it.

All right, keep right there, buddy. We'll be back with our next guest right after this. We're live from New York City.