Okay, we're back live here in New York City for Big Data Week. This is SiliconAngle.com's exclusive coverage of Strata Plus Hadoop World. Hadoop World, as we were writing on SiliconAngle.com today, was going to be the South by Southwest of data nerds and geeks, an intersection between geeks and business. That's really the theme here today at Hadoop World New York City. I'm John Furrier, the founder of SiliconAngle.com. I'm joined by my co-host Dave Vellante, the co-founder of wikibon.org. SiliconAngle and Wikibon, the reference point for Big Data. We cover Big Data like a blanket. Wikibon is free research. You go there, you can edit documents, edit the wiki, share documents, collaborate with your peers. Dave, we have an awesome guest today, Brian Bulkowski, the CEO of Aerospike. We met at Oracle OpenWorld. There's always that startup that pops up on the radar and starts to shoot up, and you can see it. It's a shining star, and Aerospike was one of those. Brian, welcome back to theCUBE.

Thanks, glad to be here.

So, we've been following you, and you have a great technical background, but you really hit a nerve with your offering and your company. There's a little rebranding going on, from Citrusleaf to Aerospike, but at Oracle you made this move, almost like you got caught in a thermal and got lifted into visibility. So, explain why you think that's happening and why it's resonating so well right now.

Sure. What Aerospike has been about since the very beginning is speed and scale, ease of use, ease of deployment, and the real problem that we've had is that everyone says that. So, as everyone keeps saying that, what does that mean? What do you do?

Oh, you too. SiliconAngle.com, we scale and we're easy to use. Yeah, we scale and we're easy to use. So, you know, what else is new? You're the fifth guy in my office today. We run Flash, run my iPhone.
You know, what I think is really interesting about Hadoop World Strata this year is everyone's still talking about it, even more, right? You've got Impala talking about real-time, you've got the folks at Hadapt trying to bring the worlds together, but I haven't seen any speed numbers.

MapR's throwing out some TeraSort. They had that TeraSort, so they did throw some numbers out.

And that's great, someone won.

And we got one guy actually showing numbers. At least they did, world record. First time ever.

So I was over at the Intel booth, and those guys are showing TeraSort, but they also have their own benchmark for Hadoop, which is great. I mean, let's get a shootout going between the guys at Hortonworks and the guys at Cloudera and their distributions. Let's start seeing some numbers. But I even asked the Intel guy, so what have you seen? What distributions are fast? How do you run them? And he had no idea. He didn't even bring his own numbers, but at least he brought a benchmark. So, one-

Let's talk about benchmarks, because that's a good point, because MapR legitimately has a performance angle and they're touting that, and again, they're the only ones stepping up to the microphone. Sub 60 seconds, 1000 servers.

Hold on, hold on, hold on. Beat the previous record. Yeah, hold on, hold on.

You make a good point. I mean, it's one of those things where the microphone's open and they're stepping up and talking into the microphone, saying, hey, we got a benchmark. Got some numbers, they're pretty impressive. But like you said, no one else is stepping up. So let's talk about why that is the case. Is it because no one understands the use cases? Is it too early? They don't want to show the numbers? All of the above? What's your opinion on that? And then after that, what benchmarks should we talk about?

Well, first of all, it is hard. Let's just put that out there. It's a hard thing to do. It's a hard thing to do well.
It's easy to just toss something together and then get taken apart by your competitors who say, well, they didn't really test the right thing. That's not a valid configuration. You didn't tune me, et cetera. So it is hard. It takes some real technical focus. And when you're out there trying to solve customer problems and build new technology, the last thing you want to do is spend some of your senior guys for a month to try to put together a great benchmark. But that is what the world needs right now. To get above the noise right now in speed and scale, I think you have to actually come with the goods.

All right, so let's talk about the goods. Let's talk about the kind of benchmarks. Let's make the market right now. In today's ecosystem that we're in now, let's try to unpack this: what kind of benchmark should be laid down right now? Lay out the framework of the benchmark, and then what kind of performance?

So TeraSort's not bad, but it is one very specific use case. You have to talk about load times as well. People are very concerned with load times in Hadoop. So you've got to get your data out of wherever it is and into Hadoop. And MapR has a good solution around that. They justifiably are pushing the fact that you don't have to actually load in the traditional ETL way, which is great. So TeraSort is good, but the world does need some new benchmarks. We need to really think about some of the streaming real-time use cases, some of the things that something like Impala would be good at, and really try to come up with: I've got this ticker data flow, it's time series data, I now want to do this kind of analysis over it. It does take some thinking to really come up with what those use cases are going to be. TeraSort's nice because it's: take this pile of data, sort it. Any idiot can understand that, which means it's great. People can come with their best times.

That might be the TPC-C.
You learn that in your first year in the computer science program. Run a sort, quicksort.

Yeah, exactly.

Okay, it's not complicated, but it gives a feel for the performance, again in one dimension. So just to recap, load time, streaming, what are the factors you think should be looked at in the benchmark?

Well, I think the cost of the hardware is a critically underappreciated value. I was at a meetup last night and there was a real-time database company, and they were talking about running a million transactions per second, and then finally they owned up that it was on $40,000 worth of hardware. We had come to the same meetup the week before and showed a million transactions per second on $2,000 worth of hardware. That's a much different perspective. And it does depend on the size of the cluster and stuff like that, but I would like to see more benchmarks in terms of the actual amount of money that it takes to run that benchmark.

Oracle beating both on a million and a half dollars worth of hardware.

Maybe they did, maybe they did.

We should make a note. We should make a note. Wikibon should put a benchmark together. We should really kind of work on that. Because it's interesting. I mean, I never thought about it until you just said that, but really no one's talking numbers. I mean, MapR was the only company.

And good for them.

Yeah, good for them. You're right. I mean, it's a deep topic, right? And there are so many factors involved, the workload itself even, right? And generally, these are, you know, ranges of workloads, but, you know.

Well, we were just at IBM Information On Demand. And we actually mentioned you guys a few times, but IBM's got a completely different philosophy than Oracle. Oracle is the, you know, one size fits all. And IBM is at least saying, hey, you know what? We got big iron. They want mainframes, we got mainframes, but we got everything else down here, and some Hadoop, but they're workload specific. So I kind of agree with that.
I like that philosophy. I want to get your take on that whole workload view, and then some of the things that you guys are doing in the market, because you have customers right now, and Dave and I did some digging after Oracle OpenWorld, and you got some results there. So tell us about what's going on with Aerospike's specific customer situations. No hype, just raw data.

So when I founded Aerospike, I asked around and I said, well, how do you get the word out about your company, the technology that you've built? And there are only two things that anyone should believe when talking to a database vendor. One is the customers that are using it, because you don't trust the vendors. I wouldn't trust a vendor. So what you do trust is the stories told to you by the customers. And you also trust getting the software in your hands yourself and then running your own benchmarks, essentially over your own workload. That's been one of the benefits of open source, and the reason why MapR, even as a non-open-source company, has their M3 version, which is full performance so people can take it themselves. Closed-source companies like us, we have a free version that you can download. We have an enterprise trial as well as a free version. You can download it yourself and run it over your workload, and that's a critical proof point. The other critical proof point, as you were mentioning, was customers. So we had the folks at AppNexus, one of the premier independent or non-Google ad exchanges in the world. They're passing 30 billion impressions per day, not per month, per day, 30 billion, not million. They have a big placard in their office about doing 500,000 transactions per second on the front side. All of those hit their Aerospike servers, and again on very lightweight hardware. So folks hear that story.
They hear about the portion of the internet that we're powering, and that's a much more interesting story, really bringing those kinds of numbers out into the market in real-world deployment scenarios.

Brian, we talked a lot about Flash at Oracle OpenWorld. How do Flash and some of the concepts that you were educating us on at OpenWorld translate into this world of Hadoop?

Well, that's actually a very interesting question, because Hadoop and HDFS are tuned for rotational disk and petabytes. So the implementations of Hadoop today are really still focused on rotational disk performance. I've seen two vendors here talking now about in-memory MapReduce, about stepping away from the rotational petabyte problem and towards the real-time problem: I have the few terabytes of hot data that I need to analyze right now. And that's an area where we as Aerospike are also very interested and where Flash starts coming into the picture. So Flash has its role between pure DRAM speeds, which are of course higher, and cost-effective 10 to 100 terabyte solutions of Flash. We now really have this third player in the middle, between in-memory and rotational disk, that, frankly, the Hadoop infrastructure is not tuned for, which is why you see the real-time and the in-memory guys really looking at using Hadoop tooling, but not necessarily using the Hadoop technology stack, which is rotational-disk and petabyte optimized. So Flash is now, what, under a dollar per gigabyte. So at a dollar per gigabyte, you know, terabytes are looking cheap for Flash and tens of terabytes are looking reasonable. That really changes what you can do with the front side and fast portion of big data.

Let's talk about some of your customers. Drill into that a little bit. You've got a growing list. You guys have been putting out a lot of press releases about customers lately. What are some of your favorite stories?
Well, you know, AppNexus we talk about till we're blue in the face, and they're great guys down there. They're really pushing the envelope here in New York. There's a company out of the West Coast called BlueKai that we've worked with for a number of years now. They are one of the premier data vendors. So there are these clearinghouses on the internet now where you can buy and sell data of all sorts. In terms of anonymized data about web traffic and behavior, pretty much everything flows through BlueKai at some point. It breaks down barriers in terms of being able to start a new company and being able to say, hey, wait a minute, in order to operate this new company, I need a lot of behavioral data. Where am I going to get that? I'm not Google. One of Google's compelling advantages is all of this data that they have, every single search. Everyone touches Google. So companies like BlueKai have come up saying, hey, look, let's be the aggregator for so many different small companies, and even larger companies, and let's expose the data that usually only Google has. Another company we deal with in the same space is eXelate, a great company, and they use us in the same way as BlueKai, right on the front edge, the real-time edge of responding to requests. They have Hadoop on their backside. They have Hortonworks. I remember seeing a Hortonworks presentation there in the early days. They do a lot of the analytics on that backend data and basically exchange anonymized versions of this data. And they use Citrusleaf. They switched off of Cassandra because they didn't have the speed and performance and actual reliability in the field. So we recently announced them as a customer.

I want to step back a little bit. Anytime we get a database guru on, we like to pick your brain a little bit. So John and I have joked that four, five, six years ago, if you went to a party and you said, yeah, I'm in the database field,
people would go, okay, next. But now database has become one of the hottest areas. I mean, it's just exploding with innovation. So talk a little bit about your perspectives on what's happening there, the big trends in the business, and help our audience understand those.

Sure, there really is a tectonic shift happening in databases. There are a lot of reasons for that. One reason is this new available hardware. The switchover in data centers to flash storage has been a massive part of that change. Another is the ability and understanding of cloud, the technologies that have been built out by some of the very early leaders like Google and Yahoo, and also some academic work. Those guys have kept the technology to themselves. They don't sell products. They only allow you to use them as a service. Part of what we do, and some of the other folks innovating in this area, is to say, let's bring out a product. Let's let you use this new technology. Let's use NoSQL, massively distributed systems. That's not just for the smart guys at Google. Let's make sure that that's available to all comers, and to the guys who compete with Google as well. That has been evolving over time, really to go beyond the simple problems of key-value store and fast storage, where we've initially done our work, and into the broad database market. The pendulum is swinging back towards SQL as well. So not only can you use the newer, innovative, easy-to-use NoSQL interfaces, but there are more and more applications of SQL, like what we're seeing out of, as I mentioned, Hadapt and Impala, and even Cassandra now having a SQL interface. I think you'll see just about every vendor, and we certainly are considering it, re-approaching SQL as an interface. There's nothing wrong with SQL as an interface. There was really a problem with the core implementation of many of the non-scalable relational-model database implementations. SQL's a great tool.
So I think that it's all about scale and performance, proving those, and building out multiple interfaces on top of that.

Well, there's a lot right with the SQL interface too, as in a lot of people understand it, right? Okay, I also want to get your opinion on multi-tenancy. We were at Oracle OpenWorld, and Larry Ellison announced the world's first multi-tenant database, which of course is probably not true because DB2 has had a multi-tenant database for a while, but-

It's called an exaggeration.

Yeah, but it's Larry, so we give a pass on that because in his world DB2 doesn't matter. But his argument, his premise, was that you don't want to do multi-tenancy at the application layer. You want to do it at the database level. Can you explain why that is? Or do you agree with that?

I don't agree with it. Multi-tenancy is a concept that has to be at many different layers of your stack. For him to try to take the entire world of multi-tenancy and say, mine, mine, mine, mine, mine. You know, Larry Ellison doing something like that. Who would have guessed? No, look, let's even just take the example I was mentioning with Google. With Google, they have the Google application stack and you log into it, and their entire infrastructure has to be multi-tenant. From Larry's world, there's a database and then there's the application, which is the Google Cloud. Well, the Google Cloud has to be multi-tenant. You have to be able to log into it. Just like with EC2. You have to be able to go into Amazon's EC2. You have to be able to create your different services as well as create database instances. Multi-tenancy in the cloud world exists top to bottom. And it is very good to have that in the database as well. And I'm glad Oracle is supporting that at this point. But multi-tenancy is everywhere. It has to be the lingua franca. And in some ways, you also have to think about how virtualization plays.
Because virtualization was: so few things are multi-tenant, let's be able to stack up multiple virtual machines, since multi-tenancy isn't really with us.

Where are we in multi-tenancy? First of all, I totally agree. Multi-tenancy has to be standard. Table stakes, relative to the concept you just mentioned. Because everything as a service will be cloud-based at some point. Whether it's on-prem or not, there's a mixture of that. So where are we in this multi-tenancy? On a scale of one to 10, ten being mature, baked out, and one being just born.

I'm going to say four.

So it's early.

It's early. It's still early. Oracle only supporting it now, at the fundamental basis it needs to, shows you it really is still in the earlier days. The cloud service guys are ahead, because you have to be, as you're saying, to follow all those trends. And folks like us who are database innovators, such as the guys I talked to last night at this meetup, all of us are going, hey, we're reinventing the entire database. Give us a little time to get to the multi-tenancy. You can sort of fake it with virtual private networks like Amazon has been doing. So there are some tools out there that get you by. And so it's often something you add at step two or step three down the road. But everyone really needs it up front.

And the goodness, you referenced before that it's good that Oracle's doing it. The goodness in multi-tenancy at the database level is what? That I don't have to just make copies of that database across the, you know, set of applications that I'm serving, or?

And it has to do with security. It has to do with, you know, in the privacy-enhanced world that we have today, being able to very carefully sculpt out which applications, not just which users, can touch which pieces of the data. I think the next step is actually security on queries.
We've been very focused on securing the data itself, which is critical, but knowing what a user asks can be more important than the data that a user is generating. So there are multiple layers of multi-tenancy that need to be considered.

When we met, I said, we're going to be picking your brain a lot, and I love when you come on theCUBE. Everybody, you know, they like to talk about the company, your company, that's great. But there's a lot more that folks like you can add to the industry, and our community really appreciates it. We had the CEO of a company called Sqrrl on yesterday, Oren Falkowitz. And they are basically, you know, commercializing around the Accumulo project, and they're doing the cell-level security. And it seems that an opportunity for them might be a multi-tenant database for the cloud. What are your thoughts on security in general? Have you looked at the Accumulo project, and do you have any opinions on it?

I have not looked at it in depth. So one of the classic things to do at an early stage in databases is to try to solve a few of the most key problems early. At the NSA, security is the most key problem. So they started with that. The folks at FoundationDB, who I saw yesterday, focused on transactions ahead of just about everything else. I was talking to our advisor, who's one of the founders of Informix, and we asked him the question, so how long did it take before customers really required transactions? And he said, well, six years. So that informed us a lot in terms of saying, being fast and being scalable, those are the core values we're focusing on right now. Security is important, it's going to be important, and there are a few key use cases for which security is the first item on the checklist for you.
And I'm sure Accumulo will do great business in that in the short term because of those few use cases, just like for us, the levels of hyperscale that we've brought have brought us into the online advertising community, where for them, security is at a broader brush. Everything needs to be secure. It's all PII, everything in there is, but they don't need specific roles for individual users. So there's a lot of fragmentation, and this is what's caused confusion in NoSQL. There's so much fragmentation that everyone says, well, which part are you good at? That's the first question they ask.

But there's a lot of innovation going on, which is exciting.

Absolutely.

All right, the other area I've got to ask about, because you asked me off camera a couple of weeks ago, is, can we talk about Spanner? And I would love to get your take on that, because I've been talking about it, I've been squinting through the paper and my eyes are bleeding. I'm not sure anyone really understands it yet, or well. So talk about it. To you, what does it mean? I mean, Google claims it's the world's first globally distributed transactional database. Looks like they've solved the speed of light problem.

Well, what's amazing, and that's not really an academic paper, but there are a few papers you look at and a few statements of technology where, as a technologist, I say, wow, they really thought through something in a different way. And what they did was they looked at time differently. So it is almost like bending the speed of light, if you say, hold on a second, time isn't really what you think it is. There's a variety of ways to think about time, and you have to really think about these different actors with their own view of time coming together, and these time synchronization records, new ways of synchronizing not only servers, but adding time marks to a transaction. That to me was the real innovation with Spanner.
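Editor's sketch, not part of the interview: the idea of "adding time marks to a transaction" when every server has its own view of time can be illustrated with a clock that reports an interval rather than a single instant, plus a wait before a commit is made visible. This is a heavily simplified reading of the commit-wait rule described in Google's published Spanner paper, not Google's implementation. The names `TimeInterval`, `UncertainClock`, `commit`, and the `epsilon` uncertainty bound are all hypothetical and chosen only for illustration.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeInterval:
    """A server's view of 'now': true time lies somewhere in [earliest, latest]."""
    earliest: float
    latest: float


class UncertainClock:
    """Hypothetical clock that admits its own error bound (epsilon, in seconds)."""

    def __init__(self, epsilon, source=time.monotonic):
        self.epsilon = epsilon
        self.source = source  # assumed to be within epsilon of true time

    def now(self):
        t = self.source()
        return TimeInterval(t - self.epsilon, t + self.epsilon)


def commit(clock, sleep=time.sleep):
    """Pick a timestamp for a transaction, then wait out the uncertainty.

    The timestamp is the latest instant that 'now' could possibly be. The
    function returns only once every correct clock must agree that the
    timestamp is in the past, so later transactions get larger timestamps.
    """
    timestamp = clock.now().latest
    while clock.now().earliest <= timestamp:
        sleep(clock.epsilon / 4)
    return timestamp


if __name__ == "__main__":
    clock = UncertainClock(epsilon=0.005)  # assume 5 ms of clock uncertainty
    first = commit(clock)
    second = commit(clock)
    print(first < second)
```

The cost of this approach is latency: each commit waits roughly twice the uncertainty bound, which is why a small bound matters in a design of this kind.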
And then algorithmically solving to get to the accurate time over distance, right? Is that a correct understanding? Like I said, I'm not sure, I'm still reading the paper myself.

But what I see is more of a relativistic view, that there are many views of time. Time is not one thing in the Spanner paper. It really is about looking at the different slices of time and then later how you get everyone on the same page, and you say, well, if you think of time this way and you think of time this way, now you all agree.

So as a visionary, how do you see a technology like that being applied to create business value?

Well, in the end it's going to come down to it, since the only two things that matter are speed and ease of use. Ease of use in this case is that I have a larger number of servers than before. We specialize in server clusters in the 20 to 40 server size, tightly connected clusters that can get up to 10 million transactions per second, which is good for a lot of business uses. And what Spanner is really saying is there are transactional problems that can be solved in the thousands and thousands of server range. So when you have that type of problem where you need transactional integrity, this is a great step forward.

Awesome, I really appreciate you letting me pick your brain like this. We can talk some more about your company. We'd love to learn more about you expanding your management team. Talk about that a little bit, what's going on there.

Great, so it's a classic story of two guys in a garage two years ago taking our seed round from Alsop Louie, our lead partner, and then bringing in NEA a bit later. With NEA we also brought in a new CEO, a guy named Bruce Fram, who was known for GridIron as well as Data General, sorry, Network General, the Sniffer company, as well as a few other interesting ventures. So he brings a lot to the management team, as well as Monica Pal, who's our new VP of marketing.
So that really is a key value in our company and in any database company today. Being technology focused, you really have to rise above the noise in this market. And for all of the jokes about Larry Ellison, one of the things he did brilliantly was hire and fire marketing people, especially fire them. There are a lot of stories about how they would last about six months because he was so brilliant at marketing. Having core values and core competency within marketing is critical to database companies in this innovative field. You can't just come with great technology, you have to tell people about it and they have to understand it. So Monica's been a great new part of the team.

Well, I think I've been saying now for quite some time that marketing needs to be a source of value to technology practitioners and CIOs, and you've certainly provided a lot of value for our community, so thank you for doing that.

Okay, awesome. Brian Bulkowski, the CEO and founder of Aerospike. We'll be back with our next guest after this short break.

Thanks John, thanks Dave.

Thanks Brian.