Live from Midtown Manhattan, it's theCUBE, covering Big Data, New York City, 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors.

Okay, welcome back everyone. We're live in New York City. This is theCUBE's coverage of Big Data NYC. This is our own event; for five years now we've been running it, and we've been at Hadoop World since 2010. It's our eighth year covering Hadoop World, which has evolved into the Strata Conference, then Strata + Hadoop World, now called Strata Data. And of course it's bigger than just Strata, it's about Big Data in NYC. A lot of big players here inside theCUBE: thought leaders, entrepreneurs, and great guests. I'm John Furrier, the co-host this week with Jim Kobielus, the lead analyst on Big Data for our Wikibon team. Our next guest is Yaron Haviv, who's with Iguazio. He's the founder and CTO of a hot startup here at the show, making a lot of waves with their new platform. Welcome to theCUBE. Good to see you again. Congratulations.

Yes, thanks. Thanks very much. We're happy to be here again.

You're known in the CUBE community as the guy on Twitter who's always pinging me and Dave and the teams: hey, you guys got to get that right. You really are one of the smartest guys on the network in our community, and your team has great tech chops. And in the middle of all that is the hottest market, which is cloud native, as it relates to the integration of how apps are being built. And essentially new ways of engineering around these solutions, not just repackaging old stuff; it's really about putting things in a true cloud environment, with application development, with data at the center of it. You've got a whole complex platform you've introduced, so I really want to dig into this. Before we get into my points, and I know Jim's got a ton of questions, give us an update on what's going on. You've got some news here at the show. Let's get to that first.
Hi. So since the last time we spoke, we've had tons of news. We're making revenues, we have customers, and we've just recently gone GA. We recently got significant investment from major investors: we raised about $33 million from companies like Verizon Ventures, Bosch for IoT, the Chicago Mercantile Exchange, Dow Jones, and others like Dell EMC. So it's pretty broad.

So customers, pretty much.

Yeah, that's the interesting thing. Usually investors are strategic investors, or partners, or potential buyers. But here it's essentially our customers, because it's so strategic to their business.

Well, talk about the GA approach. Just get into what's shipping, what's available, what's the general availability, what are you now offering?

So Iguazio is trying to, as you alluded to, be cloud-native and all that. Usually when you go to events like Strata and big data, it's nothing to do with cloud-native: a lot of hard labor, and not really continuous development and integration. It's continuous hard work, okay? And essentially what we did is create a data platform which is extremely fast and integrated. It has all the different forms of state, streaming and events and documents and tables and all that, in a very unique architecture; won't dive into that today. And on top of it we've integrated cloud services, like Kubernetes and serverless functionality and others, so we can essentially create a hybrid cloud. Some of our customers even deploy portions as an opex-based setting in the cloud, and some portions at the edge or in the enterprise, deploying the software or even a prepackaged appliance. So we're the only ones that provide a full hybrid experience.

This is a SaaS product?

It's a software stack, and it can be delivered in three different options. One, if you don't want to mess with the hardware, you can just rent it, and it's deployed in an Equinix facility; we have very strong partnerships with them globally.
And if you want something on-prem, you can get a software reference architecture, and you go and deploy it. If you're a telco or an IoT player that wants it in a manufacturing facility, we have a very small 2U box: four servers, four GPUs, all the analytics stack you could think of. You just put it in the factory, instead of two racks of Hadoop.

So it's not general purpose; you put in whatever stack the customer wants to deploy, and they have flexibility on it.

Yeah, and it is an appliance, even when you deploy it on-prem. It's a bunch of Docker containers inside that you don't even touch; you don't SSH to the machine. You have APIs and you have UIs, and just like the cloud experience, when you go to Amazon you don't open the kimono, you just use it. That's our experience, and that's what we're telling customers: no root access problems, no security problems, it's a hardened system. Give us servers, we'll deploy it, and you go through consoles and UIs.

So you don't host anything for anyone?

We host for some customers, including our-

So you do whatever the customer is interested in doing.

Yes.

So you're flexible, okay.

We just want to make money.

Well, I think you're going to do pretty well. Let's dig into the product. So on the GA, it's obviously a big data world here, and you mentioned you have this data layer, a data piece. So I've got to ask you the question, and pretend I'm an idiot for a second, right?

Okay.

No, you're a smart guy. So what problem are you solving?

So I'll keep it simple.

I love what you're doing. I assume you guys are super smart, which I can say you are, but what's the problem you're solving? What does that mean for me?

Okay, so there are two problems. One is the challenge that everyone wants to transform. There's this digital transformation mantra, and it means essentially two things. One is, I want to automate my operations environment so I can cut costs and be more competitive.
The other one is, I want to improve my customer engagement. I want to do mobile apps which are smarter, get more direct content to the user, get more targeted functionality, et cetera. These are the two key challenges for every business, in any industry, okay? So they go and deploy Hadoop and Hive and all that stuff, and it takes them two years to productize it, and then they get to the data science piece. And by the time they finish, they understand that this Hadoop thing can only do one thing: queries and reporting and BI and data warehousing. How do you get actionable insights from that stuff? Because actionable insight means I get information from the mobile app, and then I translate it into some action. I have to enrich the vectors, do the machine learning, all those details, and then I need to respond. Hadoop doesn't know how to do that. So the first generation is people who pulled a lot of stuff into the data lake and started querying it and generating reports. And the boss said-

A low-cost data lake, basically.

And the boss said, okay, what are we going to do with this report? Is it generating any revenue for the business? No. The only revenue generation is if you take this data-

You're fired again, exactly.

And they're not all fired, but-

Well, you don't get the budget anymore.

Now they're starting to buy our stuff. So now the point is, okay, how can I take all this data and at the same time generate actions, and also deal with the production aspects: I want to develop in a beta phase, then I want to promote it into production. That's cloud-native architecture, okay? Hadoop is not cloud. How do I take a Spark Zeppelin notebook and turn it into production? There's no way to do that. And by the way, depending on which cloud you go to, they have a different mechanism and elements for each cloud.

So the cloud providers do address that, because they are starting to package-

There's no DevOps option that spans all the clouds, yeah.
Yeah, so cloud providers are starting to have their own offerings, which are all proprietary: this is how you do it, forget about HDFS, we'll have S3, and we'll have Redshift for you, and we'll have Athena, and you start consuming that as a service. It still doesn't address the continuous analytics challenge that people have. And if you look at what we've done with Grab, which is amazing: they started with Amazon services, S3, Redshift, Kinesis, all that stuff, and it took them about two hours to generate insights. Now the problem is, they want to do driver incentives in real time. They want to incentivize the driver to go and make more rides or other things. So they have to analyze the event of the location of the driver, the event of the location of the customers, and throw messages back based on analytics. That's real-time analytics, and that's not something they could do.

They'd have to build that from scratch right away. I mean, they can't do that with the existing stack.

No, and Uber invested tons of energy around that, and they don't get the same functionality. Another unique thing is what they talk about in their PR. This is the use case that you're talking about, this Grab, which is the car-

Grab is sort of the number one ride-sharing company in Asia, bigger than Uber in Asia, and they're using our platform. By the way, even Uber doesn't really use Hadoop; they use MemSQL for that stuff, so it's not really using open source and all that. But the point is, for example, with Uber, when they monetize the rides, they do it just based on demand. With Grab, because of the capability where we can intersect tons of data in real time, they can also look at the weather. Was there a terror attack or something like that? They don't want to raise the price.

A lot of other data points. Could be traffic.
They don't want to raise the price if there was a problem and all the customers get aggravated. This is actually intersecting data in real time, and no one today can do that in real time beyond what we can do.

A lot of people have semantic problems with real-time. They don't even know what they mean by real-time. The data could be a week old, but they can get it to them in real-time.

But every decision, if you think about it, and we have slides on this that I explain to customers: every time I run analytics, I need to look at four types of data. The first is the context, the event: what happened, okay? The second type of data is the previous state: I have a car, was it up or down, what was the previous state of that element? The third is the time aggregation: what happened in the last hour, the average temperature, the average ticker price for the stock, et cetera. And the fourth is enriched data: I have a car ID, but what's the make, what's the model, who's driving it right now? That's secondary data. So every time I run a machine learning task, or any decision, I have to collect those four types of data into one vector, called a feature vector, and make a decision on that. You take Kafka, it's only the event part, okay? You take MemSQL, it's only the state part. You take Hadoop, it's only the historical stuff. How do you assemble and stitch together a feature vector?

You're talking about a complex machine learning pipeline, so clearly you're talking about a hybrid.

A hybrid, yeah. And actions based on just dumb things, like the car broke and I need to send it to a garage; I don't need machine learning for that, okay?
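The four data types Yaron walks through can be sketched in a few lines of Python. This is a minimal illustration, not Iguazio's API: the car-ID keys, field names, and dict-backed "stores" are all hypothetical stand-ins for the event stream, state table, rolling aggregations, and reference data he describes.

```python
# Hypothetical sketch of assembling a "feature vector" from the four
# data types described above: the event, previous state, a time
# aggregation, and enrichment data. Store names are illustrative only.

def build_feature_vector(event, state_store, agg_store, ref_store):
    car_id = event["car_id"]
    return {
        # 1. the event itself (what just happened)
        "speed": event["speed"],
        "timestamp": event["ts"],
        # 2. previous state of this entity
        "prev_status": state_store.get(car_id, {}).get("status", "unknown"),
        # 3. time aggregation (e.g., rolling average over the last hour)
        "avg_temp_1h": agg_store.get(car_id, {}).get("avg_temp_1h", 0.0),
        # 4. enrichment (static reference data keyed by ID)
        "model": ref_store.get(car_id, {}).get("model", "n/a"),
    }

# Tiny in-memory demo with dict-backed "stores"
state = {"car-17": {"status": "up"}}
aggs = {"car-17": {"avg_temp_1h": 73.2}}
refs = {"car-17": {"model": "Model S"}}
vec = build_feature_vector(
    {"car_id": "car-17", "speed": 55, "ts": 1}, state, aggs, refs
)
print(vec)
```

The point of the interview's argument is that each of the four lookups typically lives in a different system (Kafka, MemSQL, Hadoop, a reference database), so stitching this dict together at decision time is the hard part.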
So within your environment then, do you enable the machine learning models to execute across the different data platforms of which this hybrid environment is composed, and then do you aggregate the results of those model runs into some larger model that drives the real-time decision?

In our solution everything is a document, so even a picture is a document with a lot of things. You can essentially throw in a picture, run TensorFlow, embed more features into the document, and then query those features on another platform. That's what makes this continuous analytics extremely flexible. So what do we give customers? The first thing is simplicity. They can now build applications quickly. We had a tier-one automotive customer; the CIO came to meet us and said, you know what, I have a project: one year, I need to hire dozens of people, it's hugely complex. We said, tell us the use case and we'll build a prototype, okay?

All right, well, I'm going to push.

One week. We gave them a prototype, and they were amazed how in one week we created an application that analyzed all the streams of data from the cars, did the enrichment, did the machine learning, and provided the predictions.

We'll have to come in and test you on this, because I'm skeptical.

Everyone is.

Okay, we'll get to that. I'm probably not that skeptical, but I kind of am, because the history is pretty clear. If you look at some of the big ideas out there, like OpenStack, that thing just morphed into a beast. Hadoop was a cost-of-ownership nightmare, as you mentioned early on. So people have been conceptually correct in what they were trying to do, but getting it done was always hard, and then it took a long time to figure out the operational model. So how are you different, if I'm going to play the skeptic here? You know, I've heard this before: how are you different than, say, OpenStack or Hadoop clusters?
Because that was a nightmare: cost of ownership, I couldn't get the time to value I needed, I lost my budget. Why aren't you the same?

Okay, that's interesting. You know, I ran a lot of the development for OpenStack and Hadoop when I was at Mellanox, so I touched a lot of those.

So do you agree with what I said? That that was a problem?

They are extremely complex, yes. And I think one of the things is, first, OpenStack tried to bite off too much, and it's sort of a huge tent; everyone tries to push his agenda. OpenStack is still an infrastructure layer, okay? And Hadoop is sort of something in between an infrastructure and an application layer, but it was designed 10 years ago, when the problem Hadoop tried to solve was how you do web ranking, okay, on tons of batch data. And then the ecosystem evolved into real-time and streaming and machine learning, and that doesn't-

A data warehouse alternative, whatever.

That doesn't fit the original model of batch processing, because if an event comes from an IoT device and you have to do something with it, you need a table with an index. You can't just go and build a huge Parquet file.

Now, you know, you're talking about complexity. So that's why you're different. Go ahead.

So what we've done with our team, after knowing OpenStack and all those giant-

All the scar tissue.

All the scar tissue. And my role was also working with all the cloud service providers, so I know their internal architectures, and I worked on SAP HANA and Exadata and all those things. So we learned from the bad experiences and said, let's forget about the lower layers, which is what OpenStack is trying to provide: infrastructure as a service. Let's focus on the application, and build from the application all the way down to the flash and the CPU instruction set and the adapters and the networking. That's what's different. So what we provide is an application and service experience.
We don't provide infrastructure. If you go buy VMware and Nutanix and all those offerings, you get infrastructure; now you go and build, with a dozen DevOps guys, all the stack above. You go to Amazon, you get services.

Just that they're not the most optimized in terms of implementation, because they also have dozens of independent projects, and each one takes a VM and starts writing something. But it's still a good service; you've just got to put it together.

Yeah, but it's also the way they implement it: in order for them to scale, they have a common layer based on VMs, and then they start building up the application, so it's inefficient. And a lot of it is built on a 10-year-old baseline architecture. We've designed for very modern architectures: parallel CPUs with 30 cores, flash, and NVMe. So we've avoided a lot of the hardware challenges and serialization, and we just provide an abstraction layer on top, pretty much like a cloud.

Now, in terms of abstraction layers in the cloud, they're efficient at providing a simplified experience for developers. Serverless computing is up and coming; it's an important approach. Of course we have the public clouds from AWS and Google and IBM and Microsoft, and there's a growing range of serverless computing frameworks for on-prem deployment. I believe you are behind one. Can you talk about what you're doing at Iguazio on serverless frameworks for on-prem?

Yeah, so first, I'm very active in CNCF, the Cloud Native Computing Foundation. I'm one of the authors of the serverless white paper, which tries to normalize the definitions of all the vendors and come up with a proposal for a sort of interoperable standard. I spend a lot of energy on that, because we don't want to lock customers into an API. What's unique, by the way, about our solution is that we don't have a single proprietary API. We just emulate all the other guys' stuff: we have all the Amazon APIs for data services, like Kinesis, Dynamo, S3, et cetera.
We have the open source APIs, like Kafka. So also on serverless, my agenda is to promote that if I'm writing to Azure or AWS or Iguazio, I don't need to change my app, and I can use any developer tools. That's my effort there. And a few weeks ago we launched our open source project, a sort of second generation of something we had before, called Nuclio. It's designed for real time.

How do you spell that?

Nuclio. I even have the sticker here. It's really fast, because it's-

So that's the open source project that you guys are sponsoring.

And all the code is out in the open.

Pretty cool.

It has a lot of innovative ideas on how to do stream processing at its best, because the original serverless functionality was designed around webhooks and HTTP, and even many of the open source projects are really designed around HTTP.

I have a question. I'm doing research for Wikibon in the area of serverless; we've recently published a report on it. In terms of hybrid cloud environments, I'm not yet seeing any hybrid serverless clouds that involve public serverless, like AWS Lambda, and private on-prem deployments of serverless. Do you have any customers who are doing that, or are interested in hybridizing serverless across public and private?

Of course, and we have some patents there I don't want to go into. But the general idea is that what we've done in Nuclio is also the decoupling of the data from the computation, which means that things can sort of be disjoint. You can run a function on a Raspberry Pi and the data will be in a different place, and those things can move, okay?

So the persistence has to happen outside the serverless environment, like in the application.

Outside of the function. The function accesses the persistence layers through APIs, and how that data persistence is materialized is sort of a separate thing.
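The decoupling Yaron describes, where the same function body runs unchanged whether a Kafka topic, a Kinesis stream, or an HTTP request feeds it, looks roughly like a Nuclio-style Python handler. This is a sketch under assumptions: the `Event` and `Context` stub classes below stand in for the real runtime so the example is self-contained, and the trigger wiring in `function.yaml` is only summarized in a comment.

```python
# Sketch of a Nuclio-style handler: the function body never names
# Kafka, Kinesis, or HTTP. The trigger is declared in configuration,
# so the same code runs unchanged against any of them. The stub
# classes below stand in for the runtime, for a local demo only.

class Event:
    def __init__(self, body):
        self.body = body  # raw payload bytes, whatever the trigger delivered

class Context:
    class logger:
        @staticmethod
        def info(msg):
            print(msg)

def handler(context, event):
    # Transform the incoming record; persistence would go through a
    # separate data binding / API, not endpoints hardcoded here.
    record = event.body.decode("utf-8")
    context.logger.info("processing " + record)
    return record.upper()

# In production, a trigger section in the function's configuration
# (kafka, kinesis, http, ...) decides what feeds `event`; locally we
# just invoke the handler directly.
result = handler(Context(), Event(b"driver-location-update"))
print(result)
```

The design point is that swapping the event source is a configuration change, not a code change, which is what makes the hybrid public/private deployment story plausible.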
So you can actually write the same function that will run against Kafka or Kinesis or RabbitMQ or HTTP without modifying the function, and ad hoc, through what we call function bindings, you define what's going to be the thing driving the data or storing the data. So you can write the same function that does an ETL job from table one to table two; you don't need to put the table information in the function. That's not something Lambda does. And it's about 100 times faster than Lambda: we do 400,000 events per second in Nuclio. So if you write your serverless code in Nuclio, it's faster than writing it yourself, because of all those low-level optimizations.

Yaron, thanks for coming on theCUBE. We want to do a deeper dive; we'd love to have you in Palo Alto next time.

As usual, yeah.

When you're in town, let us know when you're in Silicon Valley for sure. We'll make sure we get you on camera for multiple sessions. And more at re:Invent; we're looking forward to seeing you there. Love the continuous analytics message. I think continuous integration is going through a massive renaissance right now; you're starting to see new approaches. And I think the things you're doing are exactly along the lines of what the world wants, which is alternatives and innovation. Thanks for sharing on theCUBE.

Thank you very much.

This is theCUBE's coverage of the hot startups here at Big Data NYC, live from New York. I'm John Furrier with Jim Kobielus; we'll be back after this short break.