Live from Seattle, Washington, it's theCUBE, covering KubeCon and CloudNativeCon North America 2018. Brought to you by Red Hat, the Cloud Native Computing Foundation, and its ecosystem partners.

Okay, welcome back everyone. We are here live with theCUBE coverage, three days of wall-to-wall coverage here at KubeCon, CloudNativeCon 2018 in Seattle. I'm John Furrier with theCUBE, with Stu Miniman here, breaking it down. We're at day two and we've got a lot of action. Our next guest is David Aronchick, head of open source machine learning strategy at Microsoft Azure, formerly of Google, now at Microsoft. Welcome back to theCUBE. We had a great chat in Copenhagen. Good to see you.

Great to see you too. Thank you so much for having me.

You've been there from day one, and it still kind of feels like day one. Since Copenhagen it's just kept growing. You've got a new gig, you're at Microsoft, formerly at Google. You had a great talk at Google Next, by the way, which we caught online. You're still doing the same thing. Take a minute to explain what the new job is and what your focus is.

Absolutely. In many ways I'm doing a very similar job to the one I was doing at Google, except now across all of Azure. When you look at machine learning today, the truth of the matter is, it is about open source. It's about pulling in the best from academia and from open source contributors and developers across the spectrum. While I was at Google, I was able to launch the Kubeflow project, which solves a very specific but very important problem. Now you look at Azure, a division that is growing extremely quickly and looking to expand its overall open source offerings, make investments, work with partners and projects, and make sure that researchers and customers are able to get to machine learning solutions very quickly, and I'm coming in to help them think about how to make those investments and accelerate customers' overall time to solution.

So it's both the commercial side of Azure, which has a business objective to make money, and also open source. Is it still open source for you? Is it all open source, or a little bit of both? Just quickly clarify that.

Yeah, there's no question. Obviously Azure is a business, they pay me a salary, and we're going to have a great first-party solution for all of these various things. But the reality is, much like Kubernetes has both commercial offerings and an open source project, I think all the major cloud providers will have that kind of duality. They'll work in open source, and you can measure how many contributions they make and what they're doing in the open source projects, but they'll also have hosted and other versions that make it easier for customers to migrate their data and adopt some of these new solutions.

You know, one of the things that's interesting on that point, because this is a super important point, is that the open source community that's here with Kubernetes, around Kubernetes, is all kind of an upstream concept, but the downstream impacts are on IT and your classic developers. You have your open source thing going on, and that's the core of this community and event, but the IT investments are shifting. In 2019 we're seeing a somewhat radical trend, certainly a reimagining of IT. I mean, certainly you guys have gone cloud at Azure and have seen that result. Absolutely. Good pickup by customers, Office 365, that's now SaaS, and now you've got cloud, you've got cloud scale.
This is where machine learning is really shining. So the question for you is: what do you think is going to be the big impact in 2019 on IT investment strategies, in terms of how they procure and consume technology and how they build their apps with the new goodness coming in from Kubernetes, et cetera?

Absolutely. You know, I remember back in the day, I was an IT admin myself and I carried a pager for literally when a machine went down, or a power supply went out, or some RAM was bad, something like that. Today, if you went to even the most sophisticated IT shop, they would say, what are you, crazy? You should never carry a pager for that. You should have a system that understands it's okay if something that low-level goes out. That's exactly what Kubernetes provided. It provided this abstraction layer on top of the infrastructure, so if something went down, Kubernetes knew how to reschedule a pod and move things back and forth. Take that one step further, into machine learning. Unfortunately, today people are carrying pagers for the equivalent of a power supply going out or something going wrong. It's still way too low-level. We're asking data scientists and ML engineers to think about how to provision pods, how to work on drivers, how to do all these very, very low-level things. With things like Kubernetes, and with things like Kubeflow, you're now able to give higher-level abstractions, so a data scientist can come in, open up their Jupyter notebook, work on a model, see how it works, and when they're done, hit a button and it will provision out all the machines necessary, all the drivers, everything, spin it up, run that training job, bring the results back, and shut everything down.
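For a sense of what that "hit a button and the training run gets provisioned" idea looks like in practice, here is a minimal sketch, not Kubeflow itself, that submits a containerized training run as a plain Kubernetes Job using the official kubernetes Python client; the image name, namespace, and GPU request are hypothetical placeholders.

```python
# Minimal sketch: submit a containerized training run as a Kubernetes Job.
# Not Kubeflow itself; the image, namespace, and GPU count are placeholders.
from kubernetes import client, config

def submit_training_job(name: str, image: str, namespace: str = "default") -> None:
    config.load_kube_config()  # use load_incluster_config() when running inside a pod

    container = client.V1Container(
        name=name,
        image=image,
        command=["python", "train.py"],
        resources=client.V1ResourceRequirements(
            limits={"nvidia.com/gpu": "1"}  # let the scheduler find a GPU node
        ),
    )
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1JobSpec(
            backoff_limit=2,
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(containers=[container], restart_policy="Never")
            ),
        ),
    )
    # Kubernetes handles node selection, device plumbing, retries, and cleanup;
    # the data scientist never has to touch an individual pod.
    client.BatchV1Api().create_namespaced_job(namespace=namespace, body=job)

if __name__ == "__main__":
    submit_training_job("mnist-train", "registry.example.com/team/train:latest")
```

Kubeflow's own resources (TFJob and friends) sit a level above this, but the division of labor is the same: the user describes the run, and the cluster handles the machinery.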
So Dave, I wonder if you can help expand on that a little bit more. One of the things that's great about Kubernetes is it can live on a diverse set of infrastructures. One of the biggest challenges with machine learning is: where's my data, how do I get it to the right place, where do I do the training? We've been spending a couple of years looking at edge, and what's the connectivity, and how we're going to do this. Could you help paint a picture of the landscape? What do we have solved and what are we still trying to put together?

Yeah, I think that's a really excellent question. Today there's so much focus on, well, are you going to choose PyTorch or TensorFlow, CNTK, MXNet, NumPy, scikit-learn; there are a bunch of really great frameworks out there done in the open source, and we're really excited about them. But the reality is, when you look at the overall landscape, that's just five percent of the work the average data scientist goes through. Exactly to your point: how do I get my data in? How do I transform it? How do I visualize it, generate statistics on it, make sure it's not biased against certain populations? And then, once I'm done training, how do I roll it out to production, monitor it, log it, and all those things? That's really what we're talking about, and that's what we try to work on with Kubeflow: thinking about this in a much broader sense. So you take things like data. The reality is you can't beat the speed of light. If I have a petabyte of data here, it's going to take a long time to move it over there, so you have to be really thoughtful about those kinds of things. I'm very hopeful that academic research and industry will figure out ways to reduce the amount of data required, make this problem much saner to address overall, and make it easier to train in various locations. But the reality is, I think you're ultimately going to have models, training, and inference move to many, many different locations. You'll do inference at the edge, on my phone, or on a little Bluetooth device in the corner of my house saying whether it's too hot or too cold. We're going to need that kind of intelligence, and we're going to do that kind of training and data collection at the edge.

Do you see the landscape evolving where you have specialty ML? For instance, the big concentration in IoT is moving compute to the data to deal with that latency. Do you see machine learning models moving around as code, so I can throw machine learning at a problem, and is that where Kubernetes fits in? I'm trying to put together a mental model of how ML scales. What's your vision on that? How do you see that evolving?

Yeah, absolutely. Going back to what we talked about at the beginning, we're really moving to much more of a solution-driven architecture today. ML is great and the academic research is phenomenal, but it is academic research. It didn't really start to take off until people created things like ImageNet and MobileNet that did very important things like object detection, and then commercial researchers were able to take that and move it into locations where people actually needed it. I think you will continue to see that migration. I don't think you're going to have single ML models that do a hundred different things. You're going to have a single ML model that does a vertical-specific thing, anomaly detection in, say, factories, and you're going to use that in a whole variety of locations, rather than trying to develop one ML model to solve them all.

So it's application-specific, or vertical? Absolutely. Because the data is super important. Quality data: clean data gives clean results, dirty data gives bad results. Absolutely right. People have been in this kind of virtuous circle of cleaning data; you guys know it, Google certainly, Microsoft as well. Data quality is critical, but you've got the horizontally scalable cloud and you need specialism around the data and the ML. How do you see that? Obviously it sounds like the right architecture, but this is where the finesse and the nuance come in. How do you see that?

So, you bring up a really interesting point. Today the biggest problem is how much data there is, right? It's not a matter of whether you're able to process it; you are. But it's so easy to get lost, to get caught in little anomalies. If you have a petabyte of data and, whatever, a megabyte of it is the thing that's causing your model to go sideways, that's really hard to detect. I think what you're seeing right now is a lot of academic research, which I'm very optimistic about, that will ultimately reduce that. It will both call out, hey, this particular data smells kind of weird, maybe take a closer look at it, and it will shrink the need for training data: where it was once a petabyte, you're able to train on just 10 gigabytes. I'm very optimistic that both of those things happen, and as you get there, you get better signal-to-noise and you start saying, oh, in fact, this is questionable data. Let's move that off to the side, or spend more time on it, rather than what happens today, which is: I've got this model and it works pretty well, I'm just going to throw everything at it, try to get some answer out, and go from there.
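As a toy illustration of the automated "this data smells kind of weird" screening he is optimistic about, here is a small sketch that sets suspicious rows aside before training rather than throwing the whole pile at the model; scikit-learn's IsolationForest stands in for whatever detector a team would actually use, and the column names and contamination rate are made up.

```python
# Toy sketch: flag suspect rows before training instead of feeding everything
# to the model. IsolationForest is a stand-in; columns and rates are made up.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

def split_suspect_rows(df: pd.DataFrame, feature_cols, contamination=0.001):
    detector = IsolationForest(contamination=contamination, random_state=0)
    labels = detector.fit_predict(df[feature_cols].to_numpy())  # -1 marks outliers
    return df[labels == 1], df[labels == -1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "temperature": rng.normal(70.0, 2.0, 10_000),
        "vibration": rng.normal(0.5, 0.1, 10_000),
    })
    df.loc[42, "temperature"] = 9_999.0  # one bad sensor reading hiding in the pile
    clean, suspects = split_suspect_rows(df, ["temperature", "vibration"])
    print(f"{len(suspects)} rows set aside for a closer look, {len(clean)} kept")
```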
And that's where a lot of false positives come in, all that good stuff. Absolutely. All right, so take it to the next level. Here at KubeCon, CloudNativeCon, in this community where Kubernetes is the center of all these sets of services and building blocks, where's the ML action? Say I want to jump into this community, and I'm watching this thinking, hey, you know what, I've got Amazon Web Services, re:Invent just pumped up a lot of ML and AI, SageMaker and a bunch of other things. What's going on in this community? Where are the projects? What are the notable things? Where can I jump in and engage? What does the map look like? How do I navigate?

Absolutely. So obviously I'm pretty biased, I helped start Kubeflow, and we're very, very excited about that. So Kubeflow is one. Yeah, absolutely. But let me speak a little more broadly. Kubernetes gives you this wonderful platform: highly scalable, incredibly portable. And I can't overstate how valuable that portability is. The reality is, and we talked about data a bunch already, customers have data on-prem, they have data in cloud A and cloud B, it's everywhere. They want to bring it together, and they want to bring the training and the inference to where the data is. Kubernetes solves that for you. It gives you portability, it lets you abstract away the underlying stuff, it gives you great scalability and reliability, and it lets you compose highly complex pipelines that let you do real training anywhere, rather than having to take all your data, move it to one cloud, and train on a single VM that you're not sure has been updated or not. This is the way to go.

Versus the old way, which was what? Because this is an easier way of orchestrating and managing all that. What was the alternative?

The alternative was you built it yourself. You pieced together a whole bunch of solutions, you wired them together, you made sure that this service over here had the right user account to access the data that that service over there was outputting. It was just crazy town. Now you use Kubernetes constructs, you use first-class objects, you extend the native Kubernetes API, and it works on your laptop, on cloud A and cloud B, on-prem, wherever you need to run the training rig.

And that's the magic, basically. Absolutely.
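To make that portability point concrete, here is a minimal sketch of one Deployment object applied, unchanged, to several clusters from a single script using the kubernetes Python client; the kubeconfig context names, image, and replica count are invented for illustration.

```python
# Minimal sketch: one Deployment object, applied unchanged to several clusters.
# Context names, image, and replica count are invented for illustration.
from kubernetes import client, config

def deploy_everywhere(contexts, image="registry.example.com/app:1.0", replicas=7):
    deployment = client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="demo-app", labels={"app": "demo"}),
        spec=client.V1DeploymentSpec(
            replicas=replicas,
            selector=client.V1LabelSelector(match_labels={"app": "demo"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "demo"}),
                spec=client.V1PodSpec(
                    containers=[client.V1Container(name="app", image=image)]
                ),
            ),
        ),
    )
    for ctx in contexts:
        # Each context points at a different conformant cluster (laptop, on-prem
        # rig, or any hosted Kubernetes); the object we apply never changes.
        api = client.AppsV1Api(api_client=config.new_client_from_config(context=ctx))
        api.create_namespaced_deployment(namespace="default", body=deployment)

if __name__ == "__main__":
    deploy_everywhere(["minikube", "gke-prod", "aks-prod"])
```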
All right, so multi-cloud has come up a lot. Hybrid cloud is the buzzword of the year; I call it the 2018, maybe 2019 buzzword. But I think the real endgame in all this, from a customer standpoint, and it's what we've been reporting on SiliconANGLE and theCUBE, is choice. Multi-cloud is the modern version of the old multi-vendor concept, which basically means choice. So how does Kubernetes fit into multi-cloud? Why is that good for the industry? What's your take? Can you share your perspective?

Absolutely. When you look at the recent RightScale reports, 81 percent of enterprises today are multi-cloud, full stop, 81 percent. And not just one cloud; they're on five different clouds. That could be on-prem, could be multi-zone, could be Google or Amazon or Azure or Salesforce, however you define cloud. They're spreading out, and they're doing it because that kind of portability is right for their business. Kubernetes gives you the opportunity to operate at an abstraction layer that works across all of these clouds. So whether you're on your laptop using Docker or Minikube, on your private training rig, or you go to Google Cloud or Azure, on Google Cloud you have GKE, on Azure you have AKS, you're able to build CI/CD systems, continuous delivery systems, that use common Kubernetes constructs: I want to roll this application out, I want there to be seven pods, I want it to have an endpoint that looks like this. And that works anywhere you have a Kubernetes-conformant cluster. When it gets to really complex apps like machine learning, you're able to do that at an even higher level using constructs like Kubeflow and the many, many packages that go into Kubeflow. We have NVIDIA contributing, and Uber, we have Intel and Cisco, and I hesitate to keep naming names because I'll be here all day, but we have literally over a hundred contributors.

That's a great tailwind for Cisco, they've got the network. Everybody wins: the CI/CD side for developers gets one common construct, and the network guys become more relevant, because if you decompose an application, the network ties it together. So everybody wins in the stack.

Absolutely. I think hybrid is really interesting. Hybrid has kind of become a dirty word; people say, oh my God, why would you ever spread an application across multiple clouds? And that I agree with. A true hybrid deployment today isn't taking my app and spreading it across six different locations. What you really want is isolated deployments to each place, where a single button deploys to all three of these locations but keeps them isolated: this particular application goes here, and if AWS has an outage, GCP is there; if GCP has an outage, Azure is there. You can do that very readily, or you can keep it close for geographic reasons or legal reasons or whatever it might be. That kind of flexibility, the ability to take a single construct of your application and deploy it to each one of these locations, not spreading it but having the option, gives you pricing power, gives you flexibility, and lets you take advantage of what's needed at the right cost.

If you have the operating model, and the CI/CD is common, that's the key value right there. Absolutely right.

David, thanks so much for coming on theCUBE. As usual, great commentary and great insight; you've been there from the beginning. Just one final question: predictions for 2019. What's going to happen in 2019 with Kubernetes? What's your prediction?

Well, I think you've heard this message over and over again: you're seeing Kubernetes become boring, and that is incredibly powerful. The stability, the flexibility; people are building enormous businesses on top of it. But not just that. They're also continuing to build things like the custom resource definition, which lets you extend Kubernetes in a safe and secure way. That's incredibly important. It means you don't have to check code into the main tree in order to make extensions; you're able to build on top of it. And you're seeing more and more businesses build great solutions, customer-focused solutions.
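As a rough sketch of the custom resource definition mechanism he is pointing to, here is a made-up TrainingJob type registered with the API server through the kubernetes Python client's apiextensions API; the group, kind, and schema are invented for illustration, and real projects such as Kubeflow ship their own, richer CRDs.

```python
# Rough sketch: register a made-up TrainingJob custom resource with the API
# server. The group, kind, and schema are invented for illustration only.
from kubernetes import client, config

config.load_kube_config()

crd = client.V1CustomResourceDefinition(
    api_version="apiextensions.k8s.io/v1",
    kind="CustomResourceDefinition",
    metadata=client.V1ObjectMeta(name="trainingjobs.example.com"),
    spec=client.V1CustomResourceDefinitionSpec(
        group="example.com",
        scope="Namespaced",
        names=client.V1CustomResourceDefinitionNames(
            plural="trainingjobs", singular="trainingjob", kind="TrainingJob"
        ),
        versions=[
            client.V1CustomResourceDefinitionVersion(
                name="v1alpha1",
                served=True,
                storage=True,
                schema=client.V1CustomResourceValidation(
                    open_api_v3_schema=client.V1JSONSchemaProps(
                        type="object",
                        properties={
                            "spec": client.V1JSONSchemaProps(
                                type="object",
                                properties={
                                    "image": client.V1JSONSchemaProps(type="string"),
                                    "gpus": client.V1JSONSchemaProps(type="integer"),
                                },
                            )
                        },
                    )
                ),
            )
        ],
    ),
)

# Once registered, the new kind behaves like a built-in resource: a controller
# can watch and act on TrainingJob objects without any changes to Kubernetes.
client.ApiextensionsV1Api().create_custom_resource_definition(crd)
```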
Well, next time we get together, I want to do a drill-down on what the word stack means. I hear people say Kubernetes stack, and I'm like, yeah, they love the stack, but it's not really a stack anymore; it's sets of services. David, thanks so much for coming on. I appreciate it. We're here with theCUBE's live coverage in Seattle for KubeCon, CloudNativeCon. I'm John Furrier with Stu Miniman. We'll be back with more after this short break.