Live from New York, it's theCUBE. Covering theCUBE, New York City, 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. Okay, welcome back everyone. Live from theCUBE in New York City. It's our second day of two days of coverage. Cube NYC, the hashtag, Cube NYC. Formerly Big Data NYC, renamed because it's not just about big data anymore. It's about serverless, it's about Kubernetes, multi-cloud data, it's all about data, and that's the fundamental change in the industry. Our next guest is Yaron Haviv, who's the CTO of Iguazio, Cube alumni, always coming on with some good commentary, smart analysis, kind of a guest host as well as an industry participant and supplier. Welcome back to theCUBE, good to see you. Thank you, John. Love having you on theCUBE because you always bring some great insight and we appreciate that. Thank you so much. First, before we get into some of the comments, because I really want to delve into comments that David Richards, the CEO of WANdisco, made a few years ago. He said, Cloud's going to kill Hadoop. And people were looking at him like, oh my God, it's heresy, he's crazy. What was he talking about? But you might not need Hadoop if you can run serverless, Spark, TensorFlow. You talked about this off camera. So is Hadoop going to be the OpenStack of the big data world? Yes, I don't think cloud necessarily killed Hadoop, although it is working on that, because you go to Amazon, you can consume a bunch of services, and you don't really need to think about Hadoop. I think cloud native is sort of starting to kill Hadoop, because Hadoop is three layers. It's a file system, HDFS, and then you have sort of a scheduling layer, YARN, and then you have applications, which started with MapReduce and then evolved into things like Spark. So file system, I don't really need in a cloud. I use S3, I can use a database as a service as a pretty efficient way of storing data.
For scheduling, Kubernetes is a much more generic way of scheduling workloads. I am not confined to Spark and certain specific workloads. I can run with TensorFlow. I can run with data science tools, et cetera, just containerized. And so essentially, why would I need Hadoop? If I can take the traditional tools people are now evolving and using, like Jupyter notebooks, Spark, TensorFlow, those packages, with Kubernetes on top of a database as a service and some object store, I have a much easier stack to work with, and I could mobilize that whether it's in the cloud, on-prem, you know, on different vendors. And scale is important too, how do you scale it? Of course, and you have independent scaling between data and computation, unlike Hadoop. So I can just go to Google and use BigQuery, or use DynamoDB on Amazon, or Redshift or whatever, and they automatically scale it out, and then, you know, they'll... That's a unique position. So essentially Hadoop versus Kubernetes is a top line story. And wouldn't that be ironic for Google, because Google essentially created MapReduce and Cloudera ran with it and went public. But we're talking about the 2008 timeframe, 2009 timeframe, back when cloud was just emerging in the mainstream. So wouldn't it be ironic if Kubernetes, which is being driven by Google, ends up taking over Hadoop, in terms of running things on Kubernetes and cloud native vis-a-vis on-premise with Hadoop? People tend to give this comment about Google, but essentially Yahoo started Hadoop. Google started the technology, and just a couple of years after Hadoop started, Google essentially moved to a different architecture with something called Percolator. So Google's not too associated with Hadoop, and they haven't really used this approach for a long time. Well, they wrote the MapReduce paper, and the internal conversation about Google that we reported on theCUBE was that they just let that go. And Yahoo! They moved to something.
Yeah, quite slightly differently. The companies that had the most experience were the first to leave. And I think in many respects what you're saying, Yaron, is that as the marketplace realizes the outcomes that Hadoop is associated with, they will find other ways of achieving those outcomes that might be more technically efficient. There's also a fundamental shift in the consumption, where Hadoop was about ranking pages in a batch form, just collecting logs and ranking pages. The challenges that people have today revolve around applying AI into business applications. So it needs to be a lot more concurrent, transactional, real-time-ish, which has nothing to do with Hadoop, okay? So that's why you'll see more and more workloads mobilizing into things like serverless functions, into pre-canned services, et cetera. And Kubernetes is playing a good role here as providing the transport for migrating workloads across cloud providers, because I can use GKE, the Google Kubernetes, or Amazon Kubernetes, or Azure Kubernetes, and I can write the same application and deploy it in any cloud or on-prem on my own private cluster. It makes the infrastructure agnostic, really, and the focus is on the application. A question about Kubernetes: we heard on theCUBE earlier, the VP of Product of BlueData said that the Kubernetes ecosystem and community needs to do a better job with stateful. They nailed stateless, but stateful application support is something that they need help on. Do you agree with that comment? And if so, what are the alternatives for customers who care about state? They should use our product. Something about the database, maybe, you're going to say. Before we get there, is Kubernetes struggling there? And if so, let's talk about your product. So I think there are challenges around it. There are many solutions in that. I think they're attacking it from a different approach.
Many of them are essentially providing some block storage to containers, which is sort of not really cloud native. What you want is to be able to have multiple containers access the same data. And that means sharing either through file systems, through objects, or through databases, because one container is generating, for example, ingestion or a stream. It writes a sample, another container is manipulating that same data. A third container may look for something in the data and generate a trigger or an action, okay? So you need shared access to data from those containers. Does the data synchronize all three of those things? Yes, because the data is the form of state. The form of state cannot be associated with a single container, which is what most of... I'm very active in CNCF and those committees, and you have all the storage guys in the committees, and they think that block storage is the right solution, because they still think like virtual machines. But the general idea is that, if you think about it, Kubernetes is like the new OS, where you have many processes, they're just scattered around. In an OS, the way for us to share state between processes is either through files or through databases and those forms. And that's really what we're doing. Threads and inter-process communication, basically. And that's essentially... I gave, maybe two years ago, a session at KubeCon in Europe about what we're doing on storing state: extremely high performance access from those container processes to our database, which can impersonate objects, files, streams, or time series data, et cetera. And then essentially all those workloads just mount on top of it, and they can all share state. We can even control the access for each individual application. So you feel you nailed the state problem? Yes, and by the way, we have a managed service. Anyone could go today to our cloud, to our website, and just start a trial.
He gets his own Kubernetes cluster provisioned within less than 10 minutes, five to 10 minutes. With all those services pre-integrated, with Spark, Presto, Zeppelin notebooks, Jupyter notebooks, real-time utilities, serverless functions, all that, pre-configured on his own Kubernetes cluster. 100% compatible with Kubernetes. No impact. It's a real Kubernetes, and now we're... It's a Kubernetes investment. Yes, and we're just expanding it to more types of Kubernetes clusters. Now it's working on bare metal or Amazon Kubernetes, EKS, I think. We're working on AKS and GKE as well, in partnership with Azure and Google. And we're also building an edge solution where essentially exactly the same stack can run on an edge appliance in a factory, and you can essentially mobilize data and functions back and forth. So you can go and develop your workload, your application, in the cloud, test it under simulation, push a single button, and teleport the artifacts into the edge factory or retailer. So is that like near real-time Kubernetes? It's a real-time Kubernetes. If you think about the kind of things we're doing, it's all real-time. Interesting. Talk about real-time in the database world, because you mentioned time series databases, and you could do an object store versus block. When you talk about time series, you're really talking about data that's very relevant in the moment. And also understanding time series data, and then it's important post event, if you will, meaning how do you store it? You care. It's important to manage the time series. At the same time, it might not be as valuable as other data, or valuable only at certain points in time, which changes its relationship to how it's stored and how it's used. Talk about the dynamic of time series data. So first, we sort of figured out in the last six or 12 months that essentially real-time is about time series. Everything you think about in real-time, sensor data, even video, is a time series of frames, okay?
And what everyone wants to do is ingest a huge amount of time series data. They want to cross-correlate it, because, for example, even think about stock tickers. The stock has an impact from news feeds or Twitter feeds of a company or a segment. So essentially what you need to do is something called multivariate analysis of multiple time series, and be able to extract some meaning and then decide if you want to sell or buy a stock, as an application example. And there is a huge gap today in solutions in that market, because most of the time series databases were designed as operational databases. Things that monitor apps, not things that ingest millions of data points per second and cross-correlate and run real-time AI and analytics. So, because we essentially have a programmable database under the hood, we've extended it to support time series data with about a 50 to one compression ratio compared to some other solutions. We've worked with a customer, we've done sizing. He told us, I need half a petabyte. After a small sizing exercise, we got to about 10 to 20 terabytes of storage for the same data he stored in Cassandra in 500 terabytes. Huge ingestion rates, and what's very important, we can do AI in-flight with all those cross-correlations. So that's something that's working very well for us. That's going to help on smart mobility, and kind of as 5G comes on, certainly intelligent edge. So the customers that we have, the use cases that we apply right now are in financial services, two or three main applications. One is tick data analytics. Everyone wants to be smarter, do deep learning on how they buy and sell stocks or manage risk. The second one is infrastructure monitoring, critical infrastructure monitoring and SLA monitoring: being able to monitor network devices, latencies, applications, transaction rates, all that. Being able to predict potential failures or SLA degradations. We have similar applications within the telcos.
We have about three telco customers using it for real-time time series analytics of network data, cybersecurity attacks, congestion avoidance, SLA management, and also automotive fleet management, ride hailing. They're all essentially feeding huge data sets of time series analytics. They're running cross-correlation and AI logic, and based on that, they can generate triggers. And if you now apply that to Hadoop, what does Hadoop have to do with those kinds of applications? You cannot feed huge amounts of data. You cannot react in real-time, and it doesn't store time series efficiently. It becomes a poop. Yes, you said that. That's good. Yeah, I know we don't have a lot of time left, we're running out of time. But I want to make sure we get this out there. How are you engaging with customers? You guys have got a great technical solution. We know, we can vouch for the tech chops that you guys have. We've seen your solution. If it's compatible with Kubernetes, certainly this is an alternative to have really great analytical infrastructure, the cloud-native goodness that you're building. Do you do POCs, do they go to your website, and how do you engage? How do you get deals? How do people work with you? So because now we have a cloud service, we also engage through the cloud. Mainly we're going after customers and leads that we generate from such events or from webinars and activities on the internet. And then we follow up with those customers. We know the specific... So direct sales. Sort of direct sales, but through the lead generation mechanism. Marketplace activity, Amazon, Azure Marketplaces. We have partnerships with Azure and Google now. With Azure we have joint selling activities. They can actually resell and get compensated on our solution as an edge for Azure. We're working on a similar solution with Google, very focused on retailers. That's their current market focus. Essentially, think about stores: a single supermarket will have more than a thousand cameras.
Okay, just because they're monitoring shelves in real time. You think about Amazon Go kind of applications, real-time inventory management. You cannot push a thousand camera feeds into the cloud in order to analyze them and decide on inventory levels and proactive actions. So those are the kind of applications, some of them. So it's bigger deals, you're on some big deals. Yes, we're not a Raspberry Pi kind of a solution. That's for a bigger customer. Got it. Well, Yaron, thanks for coming on. Yaron Haviv, the CTO of Iguazio, check them out. Obviously, that's been great commentary. Love the Hadoop versus Kubernetes narrative. Love to explore that further with you. Stay with us for more coverage after this short break. We're live in day two of Cube NYC, part of Strata, Hadoop, Strata Hadoop World, Cube Hadoop World, whatever you want to call it. It's all about the data. We'll bring it to you. Stay with us for more after this short break. Thank you.