We're here with Mingsheng Hong from Vertica, an engineer turned product marketer, so he knows what he's talking about. Welcome to theCUBE. Thank you, John. We've got one of the smart minds out there; we're going to extract some knowledge. We have a demo, you're going to show us some eye candy from Vertica, and one of the big things around the marketplace right now, and certainly why we're here at the show, is to figure out where HP fits in the big picture for convergence, cloud, mobility, consumerization of IT, all those buzzwords. At the heart of today's marketplace, powering mobile, social, and cloud, is data. Data is the new developer environment; people are using data in all sorts of new ways, and there are new data types. You guys at Vertica have been hugely successful in building out a new opportunity that plays in the big data space. So let's talk about big data. First, tell me a little bit about Vertica and how you work within the big data ecosystem, because it's growing and changing. You have Hadoop out there, you have proprietary approaches, you're now in HP, and all the brain trust is trying to figure out what to do with you. Surely they're going to do some good things, but tell us your view of the big data ecosystem. Absolutely, John. When Vertica got started in 2005, there was this tremendously growing trend of big data: the amount of data being generated and captured every year was growing exponentially, but the technology for analyzing that data, for gaining real-time insight over fine-grained data, wasn't there yet. That's the premise on which Vertica was founded. So we took a revolutionary approach, started with a clean slate, no legacy technology, no burden of history, and we architected a compelling solution that reduces disk I/O, which is often the key bottleneck for large-scale data analysis at terabytes, petabytes, or even more, right?
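To make the disk-I/O point concrete, here's a minimal sketch, in plain Python rather than anything from Vertica itself, of why a column-oriented layout cuts the bytes scanned when a query touches only one field. The record fields and sizes are made up for illustration.

```python
# Hypothetical sketch of why a column store reduces disk I/O for analytics.
# A row layout stores every field of a record together; a column layout
# stores each field contiguously, so a query that aggregates one column
# reads far less data from disk.

rows = [
    {"device_id": i, "vendor": "hp", "bytes_sent": i * 10, "error_rate": 0.01}
    for i in range(1000)
]

# Row store: to average one field we still scan every field of every row.
row_bytes_read = sum(len(str(r)) for r in rows)

# Column store: only the bytes_sent column is scanned.
bytes_sent_col = [r["bytes_sent"] for r in rows]
col_bytes_read = sum(len(str(v)) for v in bytes_sent_col)

avg = sum(bytes_sent_col) / len(bytes_sent_col)
print(f"row-store bytes scanned:    {row_bytes_read}")
print(f"column-store bytes scanned: {col_bytes_read}")
print(f"average bytes_sent:         {avg}")
```

The byte counts here are crude stand-ins for disk pages, but the ratio between them is the point: an analytic query over one column of a wide table only needs a fraction of the I/O.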
For these data analysis engines, we figured out a really low-cost, flexible software solution that pushes the power of standard hardware to the extreme. So data, as we've been covering at SiliconANGLE.com and wikibon.org, our research group, is a source of competitive advantage, because traditionally data has been parked away in data warehouses. HP has played in that world; they build gear and hardware to make things go faster. But when you park that data away, it's really hard to get it out fast; you've got to throw some hardware at it. Now, with these new approaches, with unstructured data, with mobile and all those new data types, low-latency retrieval of archived data is critical for all applications. We heard from VMware earlier today talking about how virtualization is changing; we had David Scott on here, Senior Vice President of the Storage Group, talking about thin provisioning, all these things that are happening. So tell me, Mingsheng, what do you see as the technology enabler of big data? We see things like Hadoop out there growing like a weed, it's open source, people are building proprietary approaches, and we see Vertica, with a clean sheet of paper, developing some technology. What's disrupting this marketplace? What technology is the disruptive enabler? Absolutely, you're spot on, John, when you commented that it used to be the case that the data warehouse is the place where you park the data, or as some people say, the data warehouse is where data goes to die, right? And that's not compelling, because you'd like to put your data to work, just like you put your money to work. You want to keep all your fine-grained raw data and analyze it in real time; that's what Vertica does. In terms of the key enablers, I would say there are three aspects that are receiving continuous innovation, and these are the key focuses for Vertica.
First of all, performance, and by performance we mean efficiency, scalability across standard hardware, reliability, and continuous querying and loading with no downtime. Then we talk about features, because people realize that given the new age of 21st-century Web 2.0, visualization and analysis, mobile devices, and social networks, traditional structured data analysis, and in particular the language called SQL, is no longer sufficient. We want to add all sorts of statistical data mining algorithms and predictive analytics, and feed those into the database engine if possible; that would make the ultimate compelling solution. And lastly, I would stress the importance of usability, because you don't want a tool that is incredibly powerful but no one can use, where you need an army of PhDs who have to roll up their sleeves and program from scratch every day. You want to put the brains, the people's insights, into higher-level innovations, enabling new use cases, and let the data engine take care of the low-level work for you. These are the foundations that Vertica was created on. And performance is key, and that's going to be something you have to deal with in these legacy environments. We want to talk more about that, and I want to go to the demo in a minute, but I want to ask you about data science, data scientists. That's the big rage right now, data scientists. And they don't just graduate from college anymore; that's PhD level. So talk about the skill set required to play in this area, because this is not just hiring a kid out of high school or college with a CS degree, right? There's math involved, there's some real tech. So talk about the science involved and also the role of the data scientist. Absolutely. Traditionally, and it's continuing along this trend, a data scientist needs to have three strong skills.
You need to have, first of all, mathematical, and more specifically statistical, analysis skills. You need to understand what kind of methods you're employing on which kind of data. Secondly, a performance-related database background; that's how you know which tool to pick to maximize performance. Because if you just understand the algorithm but don't know how to execute it efficiently, that's not going to be realistic. And thirdly, domain expertise: you're talking about analyzing social gaming behavior at Zynga, or analysis at Groupon, or at AOL, looking in real time at people's web browsing behavior, analyzing it, and serving precisely targeted ads. These are all different domains, and you need expertise in them, right? And the trend is continuing, but given the power, the performance, and the usability that Vertica puts on the table, it can offload a lot of activities from the user, so that they can really focus on what's adding value to their business: the business insights, the domain-specific knowledge they bring to the table, as well as the statistical elements. Vertica takes care of the performance and the scalability for you. I'm John Furrier with SiliconANGLE.com, the reference point for tech innovation. We love talking about emerging technology like big data. I'm here with Mingsheng Hong from Vertica, now part of HP, a really smart guy from the engineering side, now in product marketing. Before we go to the demo, I want to ask you a final question about sizzle, okay? Sizzle and steak. The sizzle is the analytics. You get the CEOs up there doing dashboards; I was at the SAP conference: oh, look at this, we can get data from all these databases in seconds, I can run my business, I get real-time information, real-time analytics. They're selling that dream. It's a great dream, right? So that's the sizzle. Let's talk about the steak. The steak is the science behind it.
Share with us some of the science behind making that dream of real-time analytics happen. Well, science is always the concrete foundation, but it's the relatively boring part that people don't often highlight on the big screen, right? But if I were to summarize the science behind big data technology, and specifically Vertica, it comes down to the following aspects. We want to leverage standard x86-based commodity hardware. That's how you can ride the curves of decreasing price and increasing performance, year after year, really standing on the shoulders of giants like Intel and others. So, standard hardware, and you want a massively scalable solution. You don't want any single bottleneck in your system. That sounds obvious, but it's very challenging to pull off, right? So I'm a customer. I want some of that big data solution. Can I just hire a consulting firm to roll in and roll me out some big data? Do I just download Hadoop for free and I've got real-time analytics? Is it that easy? Where are we with this? Oh, absolutely, Hadoop is one of the examples that's been revolutionizing the big data industry, with its open source nature and this incredibly talented community behind it. Now, people do use Hadoop as an entry point to conduct analytics, and Hadoop is very flexible, handling jobs all the way from the ETL data loading side and transformation through to the analytics. There's a lot of commonality between Hadoop and Vertica in terms of using commodity hardware scaled out, as we just talked about. The difference, I would say, is that Vertica really focuses on physically organizing the data well when you load it into the database: segmenting the data across the nodes, not in a random way, which is what Hadoop does for high availability.
Vertica segments the data in a way that's very amenable to query analytics, and it also sorts, encodes, and compresses the data in ways specifically tailored to each individual column, right? That's how you can squeeze the storage footprint; Vertica often gives a 10-to-1 compression ratio. That's very compelling, because if you've got terabytes or petabytes of data, when you put it onto disk, you don't want it to blow up. Whereas with other solutions, with high availability through replicas, it usually blows up anywhere from 3x to even 100x. Great. Well, at SiliconANGLE.com and SiliconANGLE.tv here inside theCUBE, we've done many CUBE gigs this past year, many big events, including the O'Reilly Strata Conference, which focused on big data, where we had over a million views. We did a lot of different events, and demos of data are hard to do. One of the most popular demos we've done, and our director Michael Schoenwright loves the demos, is visualization of data. So you have a demo; let's go to it right now. Share with us the setup, what the demo is, and then just jump right in. Okay, all right, this is very exciting. What I'm going to show you is a live demo; there's no PowerPoint or sleight of hand here, so if it actually crashes in the middle, then you know it's the real stuff. Let's get started. First of all, let me set the table here. We basically developed a demo to show Vertica's real-time analytics capability: monitoring a large number of machine devices. Think about your mobile phones, your tablets, or perhaps network routers, any machine that's generating real-time data, and you want to analyze them and monitor the situation. For a large organization such as HP, with hundreds of thousands of global employees, you may have that many devices online at the same time, and you want to understand which devices are consuming most of the bytes on the network, and so on.
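The sort-then-encode idea behind those compression ratios can be illustrated with a toy example. This is not Vertica's actual encoder, just a sketch of one per-column technique (run-length encoding) that becomes very effective once a low-cardinality column is sorted; the vendor names and counts are invented.

```python
# Illustrative sketch (not Vertica's encoder): sorting a column first makes
# run-length encoding very effective, which is one reason sorted, per-column
# encoding can shrink the storage footprint so dramatically.
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(v, sum(1 for _ in g)) for v, g in groupby(values)]

# A low-cardinality column, e.g. device vendor, over many rows.
column = ["hp"] * 500 + ["cisco"] * 300 + ["apple"] * 200

encoded = rle_encode(sorted(column))
print(encoded)  # three (value, count) pairs instead of 1000 strings
ratio = len(column) / len(encoded)
print(f"compression ratio on this toy column: {ratio:.0f}:1")
```

Real column stores combine several encodings (run-length, dictionary, delta) and pick per column, but the principle is the same: physical organization at load time pays off at query time.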
So let's get started with the user interface here. What I'm going to show you is a loader that generates realistic machine measurement data and loads it into the Vertica engine, and then we feed the Vertica insight into a dashboard to visualize the query analytics. All of these components are running on my latest and greatest HP laptop, right? But truth be told, it's still a laptop. All right, now let's get started with the demo. I'm going to hit the click button to start loading the data. What I'm loading is, every half minute, a million generated records corresponding to those machine measurements, across hundreds of thousands of machine devices. The blue bar shows you the batch corresponding to that half minute's worth of data, and the green bar is the amount of time it takes Vertica to load that data, right? So it's nowhere close to the machine's capacity. And let me remind you, we're running on a single laptop here; if you have beefier hardware, you can really get linear scalability. The text might be a little hard to see, but let me read some of the key statistics for you. We're loading over 20 megabytes a second, which translates to multiple terabytes a day, right? We have customers like Zynga and others who are generating that much data, loading it into Vertica, and getting real-time analytic insight, okay? So this is the loading side. Now, while we're continuously loading that much data into Vertica, let me show you how we can extract analytic insight the moment the data hits the database. You don't want a separate nightly load window, which is the common practice today but creates huge latency for the analysts. If I want to look at what's going on in my network right now, I don't want to wait 24 hours. So let's look at the reporting side, okay?
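The "20 megabytes a second translates to terabytes a day" claim is easy to sanity-check with back-of-the-envelope arithmetic:

```python
# Back-of-the-envelope check of the numbers quoted in the demo:
# a sustained 20 MB/s load rate extrapolated over a full day.
mb_per_second = 20
seconds_per_day = 24 * 60 * 60       # 86,400 seconds in a day
mb_per_day = mb_per_second * seconds_per_day
tb_per_day = mb_per_day / 1_000_000  # decimal units: 1 TB = 1,000,000 MB
print(f"{tb_per_day:.2f} TB/day")    # roughly 1.7 TB/day on one laptop
```

So a single laptop at that rate lands in the low terabytes per day, and the "multiple terabytes" figure follows from scaling across a few nodes.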
Let me start with this: if I'm an analyst, I want to understand, among the hundreds of thousands of devices, which are the ones I'm interested in? I don't want to eyeball all of the hundreds of thousands, right? So I can break them down by device vendor. We've got a couple of device vendors here, and if I click any vendor, I see a more detailed breakdown of the device families. Notice that every time I click a UI button, it's generating a real-time query, feeding it into Vertica, and getting the result back. Look at the latency here: very compelling, very interactive. Similarly, if I click any device family, it brings me a long list of all the devices currently active. And in effect, if I select any device, it gives me all the details: perhaps which user it belongs to, what the hardware details are, and you can select a few of them for contrast if you like. So far, we've covered the reporting side for static hardware device information. Now let's look at the more exciting side: the runtime information. How are my hundreds of thousands of devices doing in real time? I'm just refreshing this pretty complex dashboard, and I'll walk you through it. On the top left screen, we're plotting all the data across a timeline, aggregating the information across hundreds of thousands of devices and plotting the average throughput, which is the first curve; the bottom curve is the average error rate, right? Once you've got the 10,000-foot view, you might ask yourself: what are my 10 busiest devices? Or who are the 10 users who have been consuming most of my 3G bandwidth? That's what the top right screen is showing you: the top 10 devices. Again, if I hit refresh, notice the extended timeline there; it actually brings in the latest data that's hit the Vertica database and gives you real-time insight.
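The "top 10 busiest devices" view described above boils down to an aggregate-then-rank query. Here's a hypothetical version in plain Python (the device names and byte counts are invented; the dashboard would issue the equivalent SQL against Vertica):

```python
# A sketch of the dashboard's "top N busiest devices" query: sum the bytes
# reported per device, then rank and keep the heaviest consumers.
import heapq
from collections import defaultdict

# (device_id, bytes_reported) measurement stream -- invented sample data.
measurements = [
    ("dev-1", 120), ("dev-2", 340), ("dev-1", 80),
    ("dev-3", 910), ("dev-2", 15), ("dev-4", 400),
]

totals = defaultdict(int)
for device, nbytes in measurements:
    totals[device] += nbytes  # aggregate per device

# Rank without sorting everything: a heap keeps only the top N.
top = heapq.nlargest(3, totals.items(), key=lambda kv: kv[1])
print(top)  # [('dev-3', 910), ('dev-4', 400), ('dev-2', 355)]
```

At dashboard scale the aggregation runs inside the database, but the shape of the computation, group by device, sum, take the top N, is the same.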
Now if I want to drill down, let me click any individual device, and it brings me a similar curve, but this time focusing on that particular device. What if I want to contrast different devices? Let me pick a few, and you can look at the visually compelling curves to see how they differ. And finally, if I pick a single device, I may want to understand the correlation between different measures; on the bottom right screen, you have the X axis as the error rate and the Y axis as the throughput, right? So we have a single data point for that particular chosen device at nine o'clock. Now let's understand how that data point changes over time. I'm going to play this movie-like experience, where we can look at the trajectory of the data points as they move. And let me remind you, every time it's issuing a new query to the database and bringing back the latest information, right? So I want to just, we have a guest, we have to give you the hook. Sorry about interrupting the demo; thank you for the time. I know you've got a lot to show there; we'll get to it later if you have time. Sorry to interrupt, but really thankful for the demo; we have to move on. It was a great demo. We ran over on time; we realized Paul Miller was supposed to be coming on as well, and we'd lost our chair for the demo, but thanks for showing it to us. Vertica, the thing's hot. Congratulations. The pleasure is mine, John.