The Cube at EMC World 2014 is brought to you by EMC: redefine. VCE: innovating the world's first converged infrastructure solution for private cloud computing. Brocade: say goodbye to the status quo and hello to Brocade.

Okay, welcome back, everyone. We're here live in Las Vegas for EMC World 2014. This is The Cube, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE, joined by my co-host, Dave Vellante, co-founder of Wikibon.org. Our next guest is Brian Doherty, Chief Architect at CMA. Welcome to The Cube.

Thank you very much.

So I've got to ask you, what's your impression of all the big news here? Obviously EMC's pretty serious about all-flash arrays, don't you think?

Oh, nothing but flash. It's been a great experience being here. Flash has truly come of age, and it's the talk of the conference.

What about your company? Let's start there, with CMA. Tell us what you guys do and what your role is there.

CMA is a hosting and data center analytics company. We host the largest Medicaid data warehouse and analytics environment in the country. We also have a products division: we deliver Oracle RAC clusters and a VQ big data product. And we host a lot of different analytics in four different locations around the country.

So you say hosting analytics. Are you actually monetizing the data, or are you hosting it for clients, or a little bit of both?

A little bit of both, but primarily hosting it for clients.

And the future is monetizing?

That's what we're looking at.

So that's an interesting discussion I'm sure you guys are having inside the company. What's the 10-second bumper sticker on that?

Well, we're just trying to figure out the best way to exploit the technology and the data that we have right now.

Huge opportunity. How do you do it without ticking off everybody?

That's correct. Massive amounts of data. The key is exploiting that data, and that's what we're looking at now.
So paint a picture of your IT environment. Maybe talk about applications and how you're serving the business.

So we have a very heterogeneous environment. We have a lot of Oracle RAC clusters. We have big data clusters. We have Hadoop running in our environment. We have security infrastructure, Oracle Fusion Middleware, and a suite of analytic apps running. And we get thousands of users coming in every single day and running tens of thousands of queries, different analytic queries, some long-running queries, some really short and quick queries. So it's a heterogeneous, complex environment.

So of course the big discussion, John and I do a lot of big data events and a lot of Hadoop events, and the big discussion there is bringing real time to Hadoop. You hear a lot about SQL coming into Hadoop to widen the programming skill sets. Are you seeing that? What does real time mean to you guys?

Real time is coming to mean, for us, less than a millisecond, down to microseconds. And it's becoming a competitive, strategic advantage. It's no longer okay to have an analytics environment where people run a query and wait five seconds for the response. We now have applications that are intercepting data from devices, not only people, and that requires immediate response time, microsecond response time. It's just becoming a more complex environment.

And so obviously flash fits into this discussion very nicely. What's your experience with flash? When did it start? Take us back to the beginning.

About 15 months ago, we started looking at different flash products. We had ultra-low-latency application needs and requirements. We looked at several products on the market, stumbled across XtremIO, liked what we saw, and continued to test with it for several months. And we've been there ever since.
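The latency gap described here, waiting seconds for a query versus microsecond responses, is easiest to see in tail latencies rather than averages. A toy illustration follows; the timing numbers are synthetic, not CMA's measurements, and the nearest-rank percentile helper is just a minimal sketch:

```python
# Simulated query latencies in microseconds (synthetic, illustrative numbers).
# One slow outlier dominates the tail even though the median looks fine.
latencies_us = [120, 95, 110, 4000, 130, 105, 90, 115, 100, 125]

def percentile(data, p):
    """Nearest-rank percentile: sort, then index by rank (no external deps)."""
    s = sorted(data)
    k = max(0, min(len(s) - 1, round(p / 100 * len(s)) - 1))
    return s[k]

print(percentile(latencies_us, 50))  # 110 (median is healthy)
print(percentile(latencies_us, 99))  # 4000 (tail dominated by the outlier)
```

This is why interactive analytics environments are judged on p99 latency, not the mean: one slow query in a hundred is what users and devices actually experience waiting on.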
Through the purchase, through the acquisition, we've just continued to see value add in the product. We're pleased with it, and we're looking at the product a little bit differently today as well.

What are some of the use cases you're seeing out there that are now possible with it? I mean, we're talking about snapshots, everyone uses snapshots, databases, the tsunami of data is a killer. With every new technology there are always new experiences, user experiences, outcomes. What are you seeing as an architect, what are you going for, and how do you build an architecture for potentially unknown opportunities?

Right. So there were a couple of things that are exciting to us. One is we have an Oracle RAC accelerator product, and we were having a very difficult time getting the I/O throughput that we needed. It's a Linux RAC-based cluster; we'd plug a VMAX or another very large storage array into the back end and then ship it to a customer and plug it in. Well, that becomes a very difficult thing to do. So we were looking for a much smaller form factor, something that would fit into a 42U rack, that had the power of a large-scale array and that we could quickly deploy and deliver. So one of the reasons we looked at this early on, and got very excited, is that we could essentially shrink-wrap our clusters with the I/O power and capability of a large array, and that was a critical thing for us, a crucial thing for us to do.

What about, you mentioned that example where you struggled to do this with the VMAX, but what about the stack? Because VMAX you trust, right? It's got all this hardened software, it's got two decades of testing and everything else. Did that make you nervous, or was it the type of workload that didn't make you nervous because you really focused on opportunities? How did you rationalize that?

Anyone would be kidding if they said it didn't make them nervous to move off of VMAX onto another platform.

That's good.
But one of the things that is pleasing to us, and one of the reasons we're with XtremIO, is the reputation of EMC. We know EMC. We know EMC delivers great research and development, great service. And that was one of the things that was very, very compelling to us up front. And we saw the performance we could get from XtremIO, the scale-out performance, so we knew we were not going to tap out the performance that we had. So yes, it was scary up front a little bit, but we knew EMC was backing it. We knew they had great technology, great engineers, and we've grown to trust it just like a VMAX.

And how would you characterize the state of the XtremIO stack? Obviously it's not as mature as a VMAX, but where is it at?

Two things. From the time we first started using XtremIO, we were looking at performance, performance, performance. That was a crucial thing for us. But lately, in the last couple of months, we've grown to look at more of the value-add features that are there. You mentioned snapshots. Snapshots are a crucial thing for us going forward, because we struggle right now. We may have 50 development projects going on at any given point in time. We have sets of developers, we have QA and test. It's very difficult for us to support five to ten different images of the production database and serve them out to the different development groups. With snapshots, we can now do that easily and quickly. And we don't have to compromise performance, we don't have to compromise latency. We can let a developer or a QA tester run on an environment that is essentially a production environment.

I wonder if we can talk about that a little bit. A lot of people complain about copy creep; it gets expensive. But you're telling a different story. Why? Help us double-click on that a little bit.

It's the metadata, and it's a value add of the operating system itself.
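The economics behind metadata-based snapshots can be sketched with a rough storage model: a copy-on-write snapshot shares unchanged blocks with production and only stores what diverges. The database size and change rate below are invented for illustration; the "five to ten images" figure comes from the interview:

```python
# Rough model: storage needed to serve N dev/test copies of a production
# database, full physical copies vs. metadata (copy-on-write) snapshots.
# All numbers are illustrative assumptions, not CMA's actual figures.

def full_copy_storage(prod_tb, copies):
    """Each dev/QA environment gets a complete physical copy of production."""
    return prod_tb * (1 + copies)

def snapshot_storage(prod_tb, copies, change_rate):
    """Each snapshot shares unchanged blocks with production and stores
    only the fraction of blocks its users have actually modified."""
    return prod_tb * (1 + copies * change_rate)

prod_tb = 10         # assumed production database size in TB
copies = 10          # "five to ten different images" from the interview
change_rate = 0.05   # assume 5% of blocks diverge per dev/QA copy

print(full_copy_storage(prod_tb, copies))              # 110 TB
print(snapshot_storage(prod_tb, copies, change_rate))  # 15.0 TB
```

Under these assumptions the snapshot approach carries ten near-production images for roughly the cost of one and a half full copies, which is why copy creep stops being the limiting factor.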
With XtremIO, it's possible to leverage a smaller set of core flash data and essentially multiply or project it very efficiently out to a heterogeneous group of users and environments. In the past you could do that, but it took longer and it was a higher-overhead operation. This is very efficient, low overhead, and we can exploit it for many different groups at the same time.

So you're a big Oracle shop. One of the things we've looked at at Wikibon is the impact of bringing more flash into the environment and optimizing the storage infrastructure, actually maybe spending a little bit more on flash and then reducing the number of cores that you require, because Oracle license costs are based on cores, and license and maintenance are oftentimes 50-plus percent of the overall total cost of ownership. Are you seeing that in your environment?

That's another big thing for us. We're seeing three things along that line. We're seeing an ability to shift from a larger, more expensive hardware platform to a smaller x86-based hardware platform and get the same performance. We're seeing the ability to exploit the processor cores more because of the reduction in latency off the drive itself. So we can shift to smaller hardware, get better utilization, reduce the core count, and we're benefiting from the fact that Oracle is essentially half the cost on the Linux platform versus, for example, an AIX platform. So there are really three things that drive costs down for us: the shift to x86, the reduction in hardware cost, and the exploitation, the higher utilization, of the core count that's there.

Okay, but not yet reducing license and maintenance costs. Is that the future?

Well, no, we're doing that right now also. That's because we're able to reduce that core count confidently, because we have the power in the back end, and it comes right through. But it's not just the core count reduction.
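The licensing arithmetic behind those levers can be sketched as a back-of-envelope calculation. The core counts and per-core list price below are assumptions for illustration; the 0.5 multiplier for x86 reflects Oracle's published processor core factor table, but actual pricing varies by contract:

```python
# Back-of-envelope model of per-core Oracle licensing before and after a
# flash-driven consolidation. All figures are illustrative assumptions.

def oracle_license_cost(cores, per_core_list, core_factor):
    """Oracle licenses per processor: physical cores x core factor x list price."""
    return cores * core_factor * per_core_list

PER_CORE_LIST = 47_500  # assumed list price per processor license, USD

# Before: 32 cores on a RISC/AIX box (core factor 1.0 for many RISC chips)
before = oracle_license_cost(32, PER_CORE_LIST, 1.0)

# After: 16 x86 cores suffice once flash removes the I/O wait
# (Intel x86 carries a 0.5 core factor in Oracle's factor table)
after = oracle_license_cost(16, PER_CORE_LIST, 0.5)

print(before)              # 1,520,000
print(after)               # 380,000
print(1 - after / before)  # 0.75, i.e. a 75% license reduction
```

The point of the model is that the savings multiply: halving the core count and halving the core factor compound, before the cheaper x86 hardware is even counted.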
All three of those things together really lead us to a great environment.

In combination. How about, there's a lot of talk about data scientists, the new rock stars, and the shortage of data scientists. What are you guys doing in the world of data scientists? Are you hiring data scientists, training data scientists? Is it a key part of your organization?

What we're seeing a lot of is a kind of retrofitting of analysts, scientists, PhDs in our environment who were using older technology, and now we're redirecting them to the newer technology that we have. So some of it is moving, for example, from something like SAS to R, different software, and moving to Hadoop platforms. And with the power and speed of flash and some of the other technologies, they're able to iterate much more quickly through the data. Not run a query, wait three hours, and run a query again, but actually run a query 15 to 20 times within a five-minute span.

Where in your Hadoop infrastructure are you actually applying flash? Are you using Hadoop as a filter and then running the real-time analysis on the nuggets you extract, and that's where the flash is? Or do you see that real-time capability getting into Hadoop and the flash actually permeating into that infrastructure over time?

Actually, most of the time we're using flash in the Hadoop environment. We may have Hadoop on some different storage, but we have the high-performance in-memory databases or the high-performance MPP databases sitting on the flash itself. So it's really a mixed environment, where some of the core Hadoop is running on maybe slower internal disk, but the in-memory database, the database part of the stack, is running on the flash.

Right. And so, for example, GemFire may be running on flash.

Sure, okay. So what's your take on the whole NoSQL trend?
Do you see that being part of your portfolio in the future, or is it already?

What people forget sometimes, I think, is that the industry spent 30 to 40 years developing optimizer technology. You just don't develop optimizer technology overnight. So there are a lot of things you can replace in that relational environment, but 30 years of optimizer technology is not one of them. Some things will be fine with NoSQL, some things will be fine with a kind of lightweight SQL, but some things are going to continue to require that optimizer depth, and those will continue to stay in the MPP databases, the in-memory databases, or the Oracle databases.

Okay, so it's fair to say you've looked at it.

Yeah.

You're not deploying it today.

We're judiciously deploying it.

Okay. Where's the good fit for it?

Something quick and dirty that doesn't require a lot of elegance and complexity in the query path. So sifting through data quickly, taking a quick, broad look at it, and then maybe at that point loading it into a relational database or an MPP database to do more in-depth mining and analytics. There's a lot of scrubbing, a lot of screening, there are massive amounts of data, so it can be a good tool to pre-qualify data before you go further down that path.

Brian, what role does visualization play in all this, and how are you visualizing all this data and turning data into insights?

Yeah, one of the most difficult things for us is to go through 100 terabytes of data and then pick out the 15 kilobytes that make sense. You have so much noise in the system, yet you need to get to the value in the data. So we're drowning in data flowing in, but we're having a very difficult time sifting through it. There are some good visualization technologies that can be employed to do that, but I think there's still some work to do in that area, and we're still struggling with it ourselves.
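The "pre-qualify, then load" pattern described here can be sketched in a few lines: a cheap, schema-light screening pass over raw records, with only the survivors handed to the relational or MPP store for deep analytics. The field names, records, and threshold below are invented for illustration:

```python
# Sketch of pre-qualifying raw data before loading it into a relational/MPP
# database. The screening pass is deliberately simple: no joins, no query
# optimizer, just filters over loosely structured records.

raw_records = [
    {"id": 1, "amount": 12.0, "valid": True},    # below threshold: screened out
    {"id": 2, "amount": None, "valid": True},    # missing value: screened out
    {"id": 3, "amount": 950.0, "valid": False},  # failed validation: screened out
    {"id": 4, "amount": 840.0, "valid": True},   # survives, worth deeper analysis
]

def prequalify(records, min_amount=100.0):
    """Quick-and-dirty scrub/screen pass: keep only complete, valid,
    above-threshold records for loading into the analytic database."""
    return [
        r for r in records
        if r["valid"] and r["amount"] is not None and r["amount"] >= min_amount
    ]

qualified = prequalify(raw_records)
print([r["id"] for r in qualified])  # [4]
```

The design choice is the one Brian describes: spend cheap compute discarding the bulk of the data early, and reserve the optimizer-heavy relational engine for the small fraction that merits in-depth mining.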
All right, that's Brian Doherty here inside The Cube, extracting the signal from the noise. We are rocking and rolling: three days of wall-to-wall coverage, not one Cube operation but two, a double-barrel shotgun of thought leadership. I'm John Furrier with Dave Vellante. We'll be right back after this short break.