From Austin, Texas, it's theCUBE, covering DockerCon 2017. Brought to you by Docker and support from its ecosystem partners.

Hi, I'm Stu Miniman with my co-host, Jim Kobielus, who's been digging into all the application development angles. Happy to welcome back to the program here at DockerCon Yaron Haviv, who's the co-founder and CTO of Iguazio. Yaron, great to see you.

Thanks. How have you been?

Great, great. Been busy traveling a lot. We talked about, you know, some of us celebrated Passover recently. I had brisket at home. We had Franklin's barbecue brisket here. Anthony Bourdain said the only two people that know how to do brisket well are Franklin's and the Jews. So, you know.

Yeah, so we had Passover, a lot of good food, but also a lot of traveling. I was also at the Kubernetes conference in Europe, and prior to that a big data show, so it's a lot of traveling, yeah.

So, you know, Kubernetes, Docker, the ecosystem, you've been watching this, your company's involved in it. What's your take on the state of the ecosystem, and what do you think of the announcements this week?

Yeah, so, you know, I've also been to the Kubernetes conference, and you see those are still relatively small shows, and it's mostly developer focused. What we see is that Kubernetes is taking a lot of share from the others, because most of the guys that adopt it are not enterprises yet. You know, it's people that have a large enough infrastructure that they want to use it internally, and Kubernetes is a little more flexible. And on the other end, you see Docker trying to create sort of a VMware-like shrink-wrapped version of container infrastructure. So we see those two, and there's obviously the public cloud with their fully integrated stack. Now, what I noticed here at the show, and also a couple of weeks ago at the Kubernetes conference, is this: think about this stack. It has, let's say, 20 components.
So someone like Amazon brings the entire 20 components, and it's fully integrated and secure, with networking and storage and data services and everything. And here what you'll see is a lot of vendors: this guy has those four components, the other guys have those five components, and in some cases they actually overlap. So this guy will have three unique components and two other components, et cetera. And it's very hard to assemble a full-blown solution. So as a buyer, how do you decide which components you're going to choose? Okay, that's part of the challenge, and it also helps the cloud guys.

Yeah, it goes back. I remember when I first joined Wikibon, we talked about how the hyperscale model was you take your team of PhDs and you just architect your application and software. If you're the enterprise, though, you don't have that talent, so you will spend money to buy that packaged solution. I want to buy it as a service. I want to buy it easy. Where do you see the maturity of this market, and how does that fit with what the enterprise can consume? How do they do it? Or do they just go to platforms?

Yeah, so this is why we position Iguazio as a platform. We're not a component. We're a fully integrated system. We have multi-tenancy. We have security. We have data lifecycle management. We integrate with applications. We have our own UI, you know, but it's focused more on the data services. So if you take a dozen Amazon data services, like Kinesis and Dynamo and others, and object and file, we basically pack all of them, because data is the biggest challenge, as you know. High availability, versioning, reliability, security: the toughest challenges are around the data. And once you resolve that one, the applications all become stateless. That's much easier.
Okay, now there still needs to be a bigger ecosystem around it, which is why we're doing a lot more work with the CNCF as an organization, trying to create standards for the different interactions between those components. So when a buyer goes and buys a certain component from one vendor, it doesn't necessarily lock them into that. They can just go and modify it in the future, okay? But I think once you solve the data problem, the persistency, which is sort of the toughest challenge in this environment, the rest of it becomes simpler.

So, one of the questions Jim's been asking this week is where analytics fits in. I look at your real-time continuous analytics piece. Not an application that I heard talked about too much. Maybe you can give us your viewpoint on it.

And the relevance is, of course, that much of the application development that's going on, the hot stuff, is related to artificial intelligence on streaming analytics.

Which is where we focus. You know, one of the things I try to do when working with different communities is explain that right now we have a bifurcation. We have sort of the Apache ecosystem and we have sort of the Docker ecosystem, totally separate ecosystems. And by the way, you know that the cloud is where most analytics happens. So basically analytics and cloud technology have to converge. And this is what we've been trying to pitch. It's, you know, why do you use YARN as a scheduler when I can use Kubernetes, which is more generic because I can schedule any type of work, okay? So this is something that we're trying to push. And all this notion of continuous integration: when we say continuous analytics, it's not just about the real-time aspect. It's also about the continuous development and integration, okay? So you actually want this notion of serverless functions, which is one of the things I like, and also immutable code and infrastructure. You want to adopt those notions.
So think about it: analytics is going to go real-time more and more, okay? So that means, let's say I have my connected-car pipeline where I get streams and I process them and I generate insights. What happens if I find a bug in my application, or I just want to enhance it and create another feature? I want to be able to just push a new version of my analytics code into some platform, you know, hopefully ours. You also want to train that new algorithm as well, to make sure it's fit for every type of system. But you have to have this notion of continuity, which means all the integration with data has to be different. It has to be a lot more atomic. It has to be checkpointed. All those things, so that I can basically knock down my analytics process and relaunch it, and it continues seamlessly. And that's not the Apache model that you'll find in the Hadoop camp. That's sort of a lot more legacy kind of approach, which I don't connect to.

Yeah, Yaron, maybe complete out the stack that you're building. How does serverless fit into this also?

Okay, so basically we're building all the data engines. You know, we're doing streaming, we're doing objects, files, NoSQL, SQL. For us it's all integrated into the same very high-performance engine. It also has built-in analytics, so we can do things like joins and aggregations and a lot of computation on the data as it's ingested. And it can basically present itself as many different things, okay? Now, one of the things that we get asked by customers, and we demonstrated at Strata: let's assume I'm throwing an image into this thing, okay? I want to be able to immediately analyze the image and say if there is a face, if there is something suspicious about the picture, or maybe even simple things like extract metadata, information like the geolocation of the picture, so I can do something with it.
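The checkpoint-and-relaunch idea described above can be sketched in a few lines. This is a minimal illustration, not Iguazio's actual implementation; all names and the file-based checkpoint are assumptions for the sake of the example.

```python
import json
import os
import tempfile

# Illustrative sketch: an analytics step that commits its progress
# atomically, so the process can be killed and relaunched at any
# point and continue seamlessly without reprocessing events.

CHECKPOINT = os.path.join(tempfile.gettempdir(), "pipeline.ckpt")

def load_checkpoint():
    """Resume from the last committed offset, or start from zero."""
    try:
        with open(CHECKPOINT) as f:
            return json.load(f)["offset"]
    except FileNotFoundError:
        return 0

def save_checkpoint(offset):
    """Commit progress atomically so a crash never leaves a torn state."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp, CHECKPOINT)  # atomic rename

def process(stream):
    """Process only events after the checkpoint; safe to relaunch."""
    start = load_checkpoint()
    results = []
    for offset, event in enumerate(stream):
        if offset < start:
            continue  # already handled in a previous run
        results.append(event * 2)  # placeholder analytics step
        save_checkpoint(offset + 1)
    return results
```

Kill the process mid-stream and run it again: the relaunched instance skips everything already committed, which is the atomicity Yaron contrasts with the legacy batch approach.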
So we had to develop internally an event-driven processing engine, we didn't call it serverless internally, where you throw data in and it immediately launches and triggers a process, a Docker container-based process, that has high-speed message-based integration into our data platform, and it immediately invokes and processes that in a very elastic fashion. If you throw in thousands of objects, it basically elastically generates multiple workers to work on them. That's also how we designed things like DR and backup internally in our platform, to be very flexible. So we can do DR to S3. How do we do it? We basically have serverless functions that know how to convert the updates into a continuous stream of updates, and then there is a small piece of code that says, okay, go write to S3. That allows me a lot of flexibility to develop new features. So all this notion of data lifecycle management, which is very advanced in our product, is actually based on serverless functions. We just didn't call it serverless. One of the things that we're working on now with the community is trying to detach that portion from our product and contribute it as an open-source project, because it's much faster and much more optimized than what you'll see elsewhere, including IBM OpenWhisk or Amazon's Lambda implementation.

Are you working with Apache, or working in the context of the Apache framework, to expose, for example, machine learning pipeline functions as serverless functions?

So again, Apache is not necessarily the right place to do that.

Yeah, they do Spark and all those things.

They do Spark and all that. But we do want the Kubernetes environment to deal with all the orchestration requirements for that thing. So the way that we do TensorFlow integration is that we may expose a file into TensorFlow on one end, to be able to look at the image.
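The event-driven, elastically fanned-out pattern described above can be illustrated with a small sketch. The handler and trigger names here are hypothetical stand-ins, not Iguazio's API, and a thread pool stands in for containerized workers.

```python
import concurrent.futures

# Illustrative sketch: an object lands, a handler fires, and many
# incoming objects fan out elastically to parallel workers.

def extract_metadata(obj):
    """Stand-in for the analysis step (face detection, geolocation...)."""
    return {"name": obj["name"], "size": len(obj["data"])}

def on_put(objects, handler, max_workers=4):
    """Trigger the handler once per incoming object, in parallel."""
    with concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
        return list(pool.map(handler, objects))
```

The same shape covers the DR-to-S3 case Yaron mentions: swap `extract_metadata` for a function that writes each update to an external store.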
And at the same time, the metadata updates describing what the image contains are exposed to TensorFlow as sort of a key-value store or document store, basically just updating attributes on the same image. So the way that we work, we're working now with healthcare, think about, like, an MRI image coming in. Something looks at the MRI image and senses cancer. So you can immediately tag the same image, the same record, with fields that say: contains cancer, by this guy, picture of this guy. And then when you want to run a query and say, you know what, give me all the MRI pictures that contain cancer, it now flips around and acts like a database, and you just get all those images, okay? So it's basically a different approach to how to do those things.

All right, Yaron, we've talked about Docker and containers, Kubernetes, serverless. Where do virtual machines fit into the environment?

So I had some interesting conversations at the Kubernetes conference with some friends that are sort of high-ranked in this industry, without disclosing who. It's like, do we really need OpenStack in between bare metal and containers? Because the traditional approach is, okay, you know, we have bare metal, we need to put a virtualization layer in for isolation, and then we need to put Kubernetes or Docker on top. And we figure there's actually very little risk in skipping that, especially with the new security things around containers and image signing, and what we do, which is authenticating the container, not the infrastructure, on data access, network isolation, all those things. Eventually you can collapse and eliminate virtualization. But not for every application. Some applications, the more traditional legacy applications, may still require VMs, because it's quite a different philosophy to develop microservices versus developing for VMs. And part of what I see here at the show is that not everyone has internalized that.
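The tag-then-query flip Yaron describes, where a classifier writes attributes onto an image record and the same store later answers queries like a database, can be sketched with a toy in-memory store. Everything here is illustrative; it is not Iguazio's data API.

```python
# Illustrative sketch: records get attributes attached by an analysis
# step, and the store then "flips" to answer attribute queries.

class ImageStore:
    def __init__(self):
        self._records = {}

    def put(self, key, data):
        """Store the raw image bytes under a key."""
        self._records[key] = {"data": data}

    def update_attrs(self, key, **attrs):
        """A classifier writes its findings as attributes on the record."""
        self._records[key].update(attrs)

    def query(self, **match):
        """Act like a database: return keys whose attributes match."""
        return [k for k, rec in self._records.items()
                if all(rec.get(a) == v for a, v in match.items())]
```

One path writes pixels for TensorFlow to read as a file; the other writes and reads the attributes, which is the dual-interface idea in the transcript.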
Okay, people still think in the notion of: oh, here's my lightweight VM that happens to be called a Docker container, and I'm going to give it a volume, and I'm going to create snapshots on that volume, and all that stuff. But if you think about what microservices are really about, it's allowing this sort of elasticity, so the same workload can spawn multiple workers, okay? It's the ability to go and create updated versions. It's the ability to knock down this container any time I want, just kill it and launch it in a different place. You know how Google works, or Amazon or eBay, all those guys are basically killing containers on purpose to test their systems. So all this notion that my configuration and my logs and all that stuff sit inside the container is not cloud native, and it doesn't allow you the sort of elasticity that you want if you're building a Netflix or an eBay or a modern enterprise infrastructure. So I think we need to put those two things aside. You have legacy applications? Keep them in VMs. Okay, you have new workloads? You need to think of data integration and microservices differently, as something which is entirely stateless. The image of the container is built from the git repo, okay, and creates a Docker image. And if you want to go to a different image, you just go and recreate the same image from source. The data for that image needs to be stored in a data facility, like a database or an object store or something like that.

Yaron, the final question I have for you: talk a little bit about the customers that you're interacting with. Talk about the people that are here. As you said, there's a spectrum of how far along they are in their thinking. You're pretty advanced in some of your architectural thoughts, and opinionated as to where you're going. Where are the customers today? How many of them are ready for the future, versus, you know, kind of sticking with what they've got?
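The stateless pattern described above, configuration from the environment and state in an external store, so any instance can be killed and relaunched anywhere, can be sketched briefly. All names here are illustrative assumptions, not a specific product's API.

```python
import os

# Illustrative sketch of a cloud-native, stateless service: nothing
# worth keeping lives inside the process or its container image.

def load_config(env=os.environ):
    """Read configuration from the environment, not from files baked
    into the image, so the same image runs anywhere."""
    return {
        "db_url": env.get("DB_URL", "memory://"),
        "log_target": env.get("LOG_TARGET", "stdout"),
    }

class Counter:
    """A 'service' whose only state lives in an injected external store."""
    def __init__(self, store):
        self.store = store  # e.g. a database client; here, a plain dict

    def hit(self, key):
        self.store[key] = self.store.get(key, 0) + 1
        return self.store[key]
```

Kill one `Counter` instance and start another against the same store and nothing is lost, which is exactly why the chaos-testing style Yaron mentions (killing containers on purpose) is survivable.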
So, as you mentioned before, part of the key challenge for enterprises is they all want to move into digital transformation. They all want to be competitive, because some of them have existential threats. Think about even banks today: when Apple comes in with Apple Pay, it kills a lot of the margins that they're making from all those small transactions. And now no one really cares how many branches your bank has, because all the younger generations just go to their mobile app. So someone like a bank has to immediately transition and be able to offer premium services, offer a better experience through the mobile application, be able to analyze user behaviors, things that are more strategic. So the traditional things that IT deals with, like Exchange server management, SAP, all those sort of legacy things, will move to the cloud, because there's no real value there. And what you see is more and more enterprises thinking about how do we generate differentiation, which is more about analyzing data and being able to provide better service to the customers. And the biggest challenge is they don't know how to do it, because what the industry tells them is: go to Apache and take a dozen or so projects, and now integrate those and figure out the security problem. And you know what, you want to add Kubernetes? That's a different story again, but let's try and glue this together, and that's extremely complicated. So what we're trying to do is go to those customers and say, you know what, we're building a full-blown solution, fully integrated, security is baked in, all the different data services, and it's integrated with things like Kubernetes natively. We actually go the extra mile: we build things like Spark and TensorFlow as ready-made images that contain everything, including the support for our platform, so that you can just launch Spark and it connects and works.
So we want to make life easier for those enterprises, to solve those key challenges that they're working on. And this is working extremely well for us. Actually, the challenge we have: we only have, I think, two sales guys, and we have a huge pipeline, and we can't really deliver on most of those projects.

Good challenges to have sometimes. So we talked about scaling, which has been one of the themes of the week here. Yaron Haviv, great to catch up with you as always. We'll be back with two days of our coverage here at DockerCon 2017. You're watching theCUBE.