Okay, we're back. This is Dave Vellante of Wikibon, and we're here with Colin Mahoney. We had a segment earlier with Colin. Brief segment, high-level overview, drive-by. Thanks for coming back. No, thanks for having me back. Jeff Kelly is here as well. So, let's get deeper into it. We were talking earlier about Vertica 6. Big announcement for you guys. First of all, 6, that's a good number. It's a great number, yeah. It's not Vertica 2. There aren't too many version 6s of big data platforms out there. Right, right. And we were also talking about this whole trend of bringing what started as Hadoop, sort of exhaust data, into the traditional enterprise. So, why don't we pick it back up with Vertica 6. What's the real intent? What are the real innovations that you guys are bringing there? Yeah, so I touched on a little bit of this earlier, but let me go into a little more detail. There are really three things that we've created in Vertica 6. And this has been a long time coming for us. The good news is we actually got it out ahead of schedule. So, the core Vertica team that built this company is still intact, still committed, still as energized as ever. And we continue to build the company as we go forward here as part of HP. But what we did in Vertica 6 was, number one, we added to our FlexStore, which is something we introduced a couple of releases ago. And what that allows us to do is, people love the Vertica SQL. They love the SQL engine. They love that we have some advanced functions for time series analytics, rolling windows, real-time sessionization. And what they said to us was, it would be great if we could use those same analytics against not only Vertica, but maybe even some other data sources. Now, why would they want to do that? One reason is that before they pull a bunch of data into Vertica, they might just want to do some exploration on their data. They might want to go through something in another database, go through something on Hadoop.
But ideally they'd like to do that through the single Vertica interface. And so, the first thing that we've enabled in Vertica 6 is the ability to do that. And it's through a combination of some very advanced external table functionality that we've added, as well as something that we're calling user-defined loads. And that's part of our SDK, which we introduced a few releases ago. And the combination of those two things really lets you parse out and tell us what any data source is. So if you have some sort of non-gzipped file, as an example: we support gzip natively, but now any zipped file, 7-Zip or anything, you can natively load right into Vertica. And so you can pipe data from an existing database, from an existing data warehouse, from Hadoop, et cetera. And I think one thing that we've always said at Vertica is you've got to use the right tool for the right job. And so we fully admit there are a lot of other data sources out there, and we want people to be able to take advantage of them immediately. And then what's also really nice about it is we set this up so that at query time, we'll pull the data in. So you can say, at load time I just want to load it into Vertica, or I'll do it at query time temporarily. And then if it's something that you want to keep in the Vertica database, we make it seamless for you to do that. So that's the first thing. The second thing is adding the R framework inside of the platform: being able to do massively parallelized, really intensive, complex statistical analysis for data scientists and other people that want to make decisions on their information. One of the most costly parts of doing this type of analysis is that you've got to extract information, typically from a database, then run your analytics against a small data set, and then put it back. With Vertica, we get to run all of that inside and give you the advantage of the compression, the massive parallelization, et cetera.
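The user-defined load idea described above, teaching the engine to decompress and parse an arbitrary source format before ingest, can be sketched in miniature. This is a toy Python illustration of the concept only, with a hypothetical function name and a made-up pipe-delimited format; Vertica's actual SDK is a compiled plug-in interface, not this.

```python
import gzip

def user_defined_load(raw: bytes, codec: str = "none"):
    """Toy version of a pluggable load step: decompress an arbitrary
    source format, then parse it into rows the engine can ingest.
    (Illustrative only; not Vertica's real SDK interface.)"""
    if codec == "gzip":
        raw = gzip.decompress(raw)
    # Parse pipe-delimited text into one tuple per row.
    return [tuple(line.split("|")) for line in raw.decode().splitlines()]

# A gzip-compressed clickstream fragment, loaded with no pre-processing step:
payload = gzip.compress(b"2012-06-05 10:01:00|/home\n2012-06-05 10:01:07|/cart")
rows = user_defined_load(payload, codec="gzip")
```

The point of the real feature is the same as the sketch: the decompress-and-parse logic lives inside the database's load path, so callers just point at the source instead of staging converted files.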
And then the third and final big thing in the release is taking advantage of more HP Converged Infrastructure, more offerings: we can run on 3PAR, we can run on the DL380s, the BL460c blades. So we keep expanding what we can run on with HP's hardware. We are agnostic, we're standard x86-based, we're a software offering ourselves. And also the cloud. We made some announcements today on HP's Converged Cloud. So really it's just giving customers the choice of where they want to run it. And then beyond that, as with any one of our releases, we've got things running faster; we constantly tune what we do so that we can always make it better. And so this release has a combination of all of those things. So you've, I'm sure, heard the whole rap from the Hadoop community back in the day. If you had a big data problem, you'd buy the biggest box, the biggest Unix box you could find, you'd pay Oracle a bunch of money, and if you had money left over, you might be able to get an office or something like that. And so the Hadoop crowd brings the code to the data. We all sort of know that. But now we're in this mode of, okay, great. I've got all this data out there and I've got to be able to analyze it. So I've got this cool Hadoop application and it's doing what I want it to do, but now I need to pull the nuggets in. So, can you help me with that, number one? And can you help me without going back to the God box? The monolithic app, yeah, absolutely. And that's exactly what this addition to FlexStore does. You've got a great Hadoop cluster running. Keep it; you don't have to change anything. Now from Vertica, however large Vertica is, you just point to that as a data source and we do all the heavy lifting. And you can decide whether or not you want us to actually move the data in permanently or just temporarily.
So now what we can do, that in my mind really no one else can do, is let you leverage the best of Hadoop and a cluster of hardware to scale out your Hadoop environment, and the best of the Vertica system. And so you can run any Hadoop jobs like MapReduce, you can run any SQL, and you can even use a lot of the advanced analytics that we have pre-packaged against both Hadoop and the Vertica platform. So unlike anyone else, we now have the ability to give you the best-of-breed Hadoop experience and the best-of-breed analytic database experience. And I think there are a lot of companies that talk about it, a lot of younger companies. We, for a long time, have had companies doing just that very thing. And so what drove us to put this in Vertica 6 is that a lot of customers came to us and said, it would be great if you could create this ability. And that's exactly what we did. We were talking to Cloudera the other day. They're obviously making some great progress. They've hired a ton of people over the last couple of years, and you're just finally starting to see that come out in some decent products. But the state of the art to do what we just described was sort of to have a connector, right? Or maybe you'd use Hive. Are you going a step beyond that with your vision here? Yeah, we absolutely are going a step beyond, I think not only with the vision, but with the offering itself. We were the first analytic database company to come out with a connector. And the connectors are great, but typically, in how you hook up those systems, how you connect them together, how you run those jobs, the connector can only do so much. And the connectors are very much associated with moving information back and forth. And what we really wanted to do was separate the Vertica analytic engine, if you will, from the Vertica storage engine, the columnar storage that we have. And by doing that, you still get to leverage Vertica for the analytics and the full SQL support.
Whereas Hive and HiveQL, it's not full SQL. It's not going to work against every BI tool. So part of what we can bring is, using this technology in FlexStore, you can take all your traditional BI front ends connected to Vertica, and it has no idea that you might actually be pulling right off of HDFS or Hadoop. So I do think it's very powerful. It's much more advanced than what anyone else in our space has. Yeah, outstanding. Again, I see that as a huge problem. We're talking to a lot of big data practitioners, and they don't want to go back to the God box model. So I'm glad to hear you're doing that. I wanted to talk about the whole landscape. I said earlier, you and I met on a plane. You gave me some great perspectives. At the time, Cloudera was out there. They had a big lead. But since then, we've seen just a spate of announcements, new distributions. Greenplum and EMC kind of did their thing with MapR, and then they've broadened that out now, sort of playing on a wider field. You've seen Hortonworks spin out with some serious VC behind it. HPCC, I'm not really sure what the status of that is, but certainly Hortonworks has some juice and some legs with Apache. So what's your take on that whole landscape? And specifically, analysts like me love these reference models. They say, okay, who's going to be the next Red Hat of Hadoop? And others have said, man, there probably isn't going to be a Red Hat of Hadoop. What's your take on that whole landscape? And where do you guys fit into that? Yeah, so my personal take is I just don't see there being a Red Hat of Hadoop. I see so many different initiatives, so much innovation going on, by not only Cloudera, but Hortonworks and MapR and a whole slew of companies, including some very large companies where people don't even know what they're doing, some stealth guys that you haven't even heard of yet. And stealth guys you haven't even heard of.
And I think the Apache Foundation itself is also adding a lot to it. We've taken a very pragmatic approach. We're very open-standards-based, just like we work with a lot of front-end BI tools. We work with a lot of ETL and ELT partners. We want to work with all of them and we want to be able to easily hook up to all of them. And so that's what we've focused on. And then yesterday, HP announced the AppSystem for Hadoop, on the ESSN hardware side. And they've also taken an agnostic approach. They looked at the landscape and they said, we're not just going to bet on one horse, because we've got a lot of customers that are using different distributions. We want to be able to play with everybody. So it's hard for me right now, from my vantage point, to see a single dominant company in the Hadoop space. What I have been seeing is that Apache seems to be getting a lot more proactive on this, and that distribution seems to be getting a lot better, which is wonderful for us. And I think it's wonderful for the end customer. One thing that people gravitate towards Hadoop for is that it's free. Well, Hadoop is free, but there's hardware and there's services and there's expertise you have to have around parallel programming on top of it. But I think all the advancements that are happening on the software, especially the Apache software, are really forcing the vendors in that space to innovate just above the standard distribution. And I think that's great for everyone. I wonder if we could talk a little bit about applications. Big data apps. I think it was Mike Olson, Dave, who said there's just not enough application development going on in the big data space, and if you've got an idea, let me know, I'll connect you with some VCs. On the other hand, you guys are doing some things with well-known clients like Zynga and others, but also more traditional markets.
So what are some of the more popular types of end user applications that people are building on top of Vertica? And what role do you guys play in that kind of application development process? Yeah, so that's a great question. We admittedly are an infrastructure provider, so traditionally Vertica has not created applications itself. We are the underlying engine, the underlying platform, the database, and we've partnered with a lot of companies that create the applications on top. Within HP, there are so many groups, when you think about ES, the services organization, organized by industry, that have built applications, and there are parts of HP Software that have actual applications. And so part of what we've been doing around our vision of analytics everywhere is figuring out how to embed Vertica as part of those. And one of the true benefits of Vertica is that it is embeddable. Our server RPM is less than 50 megabytes in size. We can run, and we do run, in some OEM appliance platforms in a single rack unit. We're part of other pieces of software. So we're continuing to do that. We're working with ES. We're working with a lot of other external service providers. And that's where we're going to build most of our applications. Now, there are some that we've been working on internally, where we see a lot of repetition around something that seems to be difficult, that we think... What are some examples of that? So, clickstream analytics is one. And actually, we just announced something with Autonomy around their Optimost clickstream analytics solution, where Vertica is the underlying database. They're taking orders of magnitude more clickstream views into this, and they do all the visualization and analytics on top as part of their Optimost engine. So that's one example. I see a lot happening around web data, clickstream data. We've also been doing a lot of innovation around sensor data.
Part of the advancements in our SDK make it easy, with the user-defined load, to load any type of sensor format into Vertica. So you'll see, we'll continue to innovate around there, in these high-growth areas where really nobody has a solution, because for us that's the best place to go from a growth perspective. So, I wanted to run this premise by you. Jeff and I have been sort of batting it around; we met with Peter Goldmacher from Cowen. So, we met him, and he sort of turned us on to this idea, which I liked, which was that big data practitioners are going to create much more value than the big data technology suppliers. And I added to that, basically: what customers do with the technology and monetizing data is going to really be where the action is. And the more I think about it, the more I believe that, because the productivity impacts of big data, if it can live up to its promise, which I think it's already beginning to, and we can see that, are going to be huge. And so, I wonder if you could comment on that, what your thoughts are. Specifically, as a former entrepreneur in residence at a VC, from an investment standpoint, how do you capitalize on that? And what does that mean for you? Well, I'll tell you something interesting. I was in a CIO panel today in a forum, and I posed the question to the audience: how many people in here have a big data problem? Almost every hand went up. And then, right after that, I said, okay, and how many people in here feel like they're doing a great job with big data and they've really got their arms around it? Not one hand went up in the room, not one. And so, when you look at it from a technologist's perspective versus a practitioner's, I believe that companies like HP have to do both, because people don't care about the data. To them, I always say, the data is black sand. They care about the gold, and that's the information in the data.
And one interesting thing about the cloud is that now you're really combining these things, right? In some ways, the cloud obfuscates what's going on behind the scenes so that we can just provide a solution. A solution that a practitioner can take advantage of, that a business person can take advantage of. And I believe HP has a huge advantage here, because we've been doing not only the infrastructure support, but a lot of the business practitioner support as well, from our acquisitions of EDS, now ES, and Knightsbridge and some of these other groups. So, I think the opportunity for us as a company to gain the most value out of big data, and frankly to help our customers the most, is to offer not only the technology, but the solutions and the know-how of how to apply this, and how we applied it in one industry that can map directly to another industry. And there is such a gap right now at every company in who can take advantage of big data. There are not enough data scientists. There are not enough people in IT who understand what can be done. There are not enough business people who know how to take advantage of it. So, the opportunity is getting in the middle of those three things and making sense of it all for those three parties. Excellent. I wanted to go back to something else while we have time. We're like minus five minutes, but let's keep going. I want to talk about your architecture. Yep. You know, sometimes we get into the weeds, but it's important, I think. When I first met you, you described it to me, and I'd had some exposure to Vertica earlier, several years ago. I was very impressed; you seemed to have a good handle on it. Line up the horses on the track. There's you guys, there's Netezza, there's Greenplum, there's Aster Data, you know, and some others, ParAccel. What is it about Vertica that makes it stand out? Yeah. So, you're absolutely right. There are a lot of solutions out there.
And what's ironic about this is that a few years ago, all the big guys pooh-poohed the columnar thing. They just said, oh, that doesn't work for this, that doesn't work for that. Now what's happened? Teradata is talking columnar. Netezza's talking columnar. Oracle acts like they invented it. Oracle acts like they invented it. Greenplum's talking about it; everybody's talking about it. I think what is so unique about Vertica is our founder, Mike Stonebraker, a database legend, who obviously understands the space. He did Ingres, he did Postgres, he was the CTO of Illustra and ultimately Informix. So he's been around the block, and he took a look at this, and he could have used Postgres. He's very comfortable with Postgres, which is what a lot of these other guys, like Netezza and Greenplum and the like, have done. But he said, we need to fundamentally write this from the ground up. And there are a couple of key design decisions that were made by Mike Stonebraker, by Andy Palmer and the founding team, one of which was that we are natively columnar. Everything we do is columnar. That means not only do we get the most advanced compression, but we only pick the columns that we need for a query, so we're not doing full table scans. So it acts like a native index without the overhead of indexes, without the latency of an index. Number two is that by being columnar, we get super compression and footprint reduction. That not only helps us with density and power management in data centers, but it actually speeds up query performance as well, because we have to pull in less data and we operate on encoded, compressed data. So we're able to do our analytics against the data while it's compressed. So as an example, if you've got a trillion rows of data, and one of your columns is US states, we know there are only 50, and we run run-length encoding against that column, as well as a few other things, and now we're only storing 50 values in that column, and when we do the analytics, that's all we need.
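The US-states example maps directly onto classic run-length encoding: a sorted, low-cardinality column collapses into one (value, count) pair per run, and aggregates can be answered from the counts without decompressing. A minimal sketch of that generic idea (not Vertica's actual encoder):

```python
from itertools import groupby

def run_length_encode(column):
    """Collapse runs of repeated values into (value, run_length) pairs,
    the kind of encoding a sorted, low-cardinality column invites."""
    return [(value, len(list(run))) for value, run in groupby(column)]

# A sorted 'US state' column: nine rows compress to three entries.
states = ["CA"] * 4 + ["MA"] * 2 + ["NY"] * 3
encoded = run_length_encode(states)      # [('CA', 4), ('MA', 2), ('NY', 3)]

# A count query is answered straight from the encoded form:
rows_in_ca = dict(encoded)["CA"]
```

Scaled up, this is why a trillion-row state column can be stored as on the order of 50 entries once sorted, and why operating on the encoded data speeds up the query rather than just shrinking the footprint.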
The other thing that was really important was this notion of batch overnight loading in traditional databases. Most of these row stores take a lock on the data as you're loading it, which means if you're trying to query the information, you're not getting fresh data. So Shilpa Lawande and our engineering team have built an amazing ability for us to just continuously stream data in. It goes into memory, so we leverage a hybrid of in-memory and on-disk: the data comes into memory, and then a lot of our secret sauce is how we get it down to disk really quickly. So we not only have super load performance for a column store, but we usually load data faster than any row store out there as well, and you're hitting data within a few seconds after it gets created. So we use the word real time a little bit loosely. It's not millisecond real time. You're not the only one. Yeah, admittedly, full disclosure, but compared to most traditional data warehouses, it's real time. And the point I made earlier today on theCUBE, when Jeff was asking about this, is that you can spend a lot of time massaging your data and getting it perfect, but guess what? By the time perfect is done, it's sort of what Keynes said: in the long run, we're all dead. So we've got to make sure that people can take advantage of that information, even if it's not perfect, right when it comes in, as long as they know that this is the raw data. This is the data that's just coming into the engine. And so we make that available, but we also let them do all the traditional things in an EDW as well. So let me ask you a question as a follow-up to that. You've got some secret sauce that addresses the horrible storage stack, I'll call it: the spinning disk and all the SCSI protocols in between. Does the advent of flash minimize that advantage for you and allow other people to catch up?
So it's a great question, because if you think of Vertica and what we do with the columns, you could say, well, just put it on flash and now you won't have a spinning disk head to move around and you can just easily get to the data. Well, it turns out that flash actually can help Vertica quite a bit too, and we've done some initial benchmarks where it can make a big difference. The second thing is that having the compression that we have is really helpful on flash right now, especially given the cost of flash technology. If you want to store petabytes of information, which some of our customers do, you're not going to want to do that in flash today, and you're going to want to do it in a compressed way. So it might remove somewhat of an advantage, but a lot of our competitors have come out with flash-based appliances and we're still delivering an incredible performance boost against them. So what I say about flash is it benefits a row store in that the disk head doesn't have to move around to find some of the data, but it also benefits us in many ways as well. And actually, for some of the workloads that we don't do today that are more OLTP-like, you could imagine that having flash might make a column store really fast at doing OLTP as well, which is not an area that we focus on today, but you could make that same argument. So are your customers deploying flash today? Some of our customers have a flash tier inside their hardware, whether it's Fusion-io PCI Express cards, or maybe it's even some SSD drives instead of spinning magnetics. Inside the array, yeah. Inside the DL380 or inside 3PAR or whatever it is. And some of the things that we're working on too are basically tiering, where you can tell the database, I'd like to physically store this data, maybe the most recent three months, in flash or memory, and the older data over here. That's a lot of what we're working on as well. All right, cool. You had some questions?
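The tiering described in that answer, recent data on flash and older data on spinning disk, boils down to a placement policy keyed on row age. A hypothetical sketch of that policy; the tier names and the three-month window come from the conversation, and this is not Vertica's actual storage-policy syntax:

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=90)  # "the most recent three months"

def pick_tier(row_time: datetime, now: datetime) -> str:
    """Route recent rows to the fast tier, everything else to disk.
    (Illustrative policy only, not Vertica's storage-policy API.)"""
    return "flash" if now - row_time <= HOT_WINDOW else "spinning_disk"

now = datetime(2012, 6, 5)
recent = pick_tier(datetime(2012, 5, 1), now)   # 35 days old
old = pick_tier(datetime(2011, 6, 5), now)      # a year old
```

The design point is that the policy is declarative: the operator states the window once and the database decides placement per row, rather than anyone hand-partitioning data across devices.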
Yeah, you mentioned the ability to analyze data immediately after it's ingested. So I think a lot of people can understand the benefit of that in, say, a financial services situation, where you've got one second and one more piece of information on a trade is going to make a big difference one way or the other. But what would you say to people out there in other industries, let's say retail or manufacturing? Why is it so important that I have that real-time access to data versus waiting, okay, I've got to wait an hour? Why is it important? Well, I was amazed. Two weeks ago I was in Australia, and I went to one of the largest mining companies in Australia, and they brought me in and they said, you know, there's a mining boom going on and it's not going to last forever, so time is money. They have a 40-terabyte data warehouse, and what they found was that the fastest query they have takes about 14 hours to process. And the slowest thing in the operation is actually not the enormous dump trucks that are moving the physical materials around; it's the database. Doing the analytics and the modeling, from satellite feeds and from all the net-present-value calculations that they run and everything they do with the minerals in the earth to figure out how to optimize the value, they get one hit at this thing roughly every 14 hours, in an industry where they have a fixed amount of time to do what they need to do. So every at-bat they can get, and the more people that can get into this, the better they can make the deployment decisions to take advantage of it. So to your question, I've been amazed at how many industries outside of financial services find the value in being able to run the query faster. Now, not every organization is ready for that change. I think the first thing that we see happening is that once people get this faster platform, they start hiring people who can take advantage of it and start acting on it in the company.
But I really think that almost every industry would like to ask more questions, have more people ask those questions, and get faster results from that information. And I think that's really where we're going to see the huge impact. It comes back to that productivity impact: to the extent that we can shave milliseconds off the time and the latency of making decisions, it's just going to drive revenue per employee through the roof. Well, Colin, thanks very much for coming back. Thank you guys. It's great to be here. Yeah, no, it's really wonderful to be here. All right, good to see you. So this is a wrap from day one of theCUBE at HP Discover. We'll be back tomorrow. We start at 10 a.m. Pacific time. My colleague John Furrier is flying in tonight. He'll be here. We've got a full lineup on Wednesday and Thursday, so keep it right there, and we will see you tomorrow. This is theCUBE. I'm Dave Vellante, with my colleague Jeff Kelly. Thanks for watching, everybody. We'll see you all.