All right, welcome, everyone. As you can tell from my voice, I'm very delighted to have you all here. I know it's late in the afternoon and the refreshments are waiting for us. This is the last panel, so I will try to keep it very exciting and hopefully not put you to sleep. I am Tushar Katarki. I am a product manager on the OpenShift team at Red Hat, and this is the panel discussion on AI/ML on Kubernetes and OpenShift. We have, and I'm going to say it a couple of times so that it sinks in, a new SIG that we are creating in OpenShift Commons, which Diane mentioned. So I'm going to plug it now and again at the end: go and sign up on OpenShift Commons for the SIG if you're interested in this topic, especially after you've heard from this esteemed panel here. The logistics are: I thought we'd do some introduction to the topic and to the panelists for about 10 or 15 minutes, then we'll get into the main discussion itself and ask some questions, then I'll give you an opportunity to ask questions for, say, 10 minutes, and then we'll wrap up with some more plugs. Does that make sense? All right, cool. So, a brief introduction to this topic. It's very exciting, right? Artificial intelligence and machine learning are already touching our lives, be it driverless cars, personal assistants like Alexa and Siri, Netflix recommending your favorite movie, Pandora your favorite music, or a Nest thermostat optimizing your energy usage; there is some AI in there. It's really already here. We say it's new and ask what's next, but in some ways it's already here and affecting our lives. So I'm very excited to talk about this today. 
My personal favorite use case was something I was reading about: they are using pattern recognition and image recognition to reunite families that were separated, in places like China, for 20 or 30 years, based on photographs. Oh, I have photographs of my child when they were six years old, and now, 30 years later, a match is possible only because of machine learning. I remember doing pattern recognition 15 or 20 years ago in grad school, and it was so much more rudimentary. So anyhow, I'm very excited about this. The one caveat is, AI is not easy, right? AI has been talked about for a long, long time in academia. So why has it not happened until now? Because we didn't have things like cloud, and we didn't have big data. It is certainly building upon a couple of huge technology trends, and those technology trends and use cases, as you know, are also moving very fast. So it is complicated. It uses a number of languages, and we'll talk about that. It uses a number of frameworks. It is very computationally intensive and resource hungry, so you really want to optimize your use of resources. And it touches a number of different roles, data scientists being an obvious new one. So we have all this complexity. And one of the advantages we have, which we're going to talk about today and over the next few days, is containers. Simplistically put, hopefully we can containerize away some of this complexity that we just saw. Containers are lightweight, they are fast and efficient, and they are portable across a hybrid cloud footprint. 
So Kubernetes and OpenShift, as you saw in Chris Wright's keynote, have really emerged as a very powerful container platform; in fact, you might call it the de facto platform. So it is great to moderate this panel on AI/ML on OpenShift and Kubernetes. With that, let me start with the introduction of our panel. I'll start with David. David Aronchick is a product manager at Google. He is a manager for Google Container Engine and has been shipping software for 20 years in various roles. He has been a founder of three startups and has had stints at Microsoft, Amazon, Chef, and now obviously at Google. David is now focusing on ML on Kubernetes, and he has some very exciting news that he'll share with us in a bit. But David, say hi to everyone and add anything that I might have missed. That's 20 years' worth of summary right there, so that's pretty good for me. All right. OK, next we have Kristopher Overholt. He is a product manager at Anaconda for the data science platform. He has expertise in distributed systems, data engineering, and computational science workflows. He has a PhD in civil engineering from UT Austin, and prior to Anaconda he was at the National Institute of Standards and Technology (NIST), Southwest Research Institute, and the University of Texas at Austin. Chris, say hi, and anything else you want to add? Yeah, hello, everyone. Thanks for having me here; it's an honor. Anaconda's headquarters is about five blocks over, so this is our home base. We have about 100 employees. And if you're not familiar with Anaconda, it's the leading open source data science distribution. We have just about 5 million unique data science users around the world, on Windows, Mac, and Linux. It provides a lot of the foundational pieces of data science and machine learning, and gateways for folks to get into things like notebooks and machine learning with TensorFlow. 
So excited to talk about that here today. 5 million, that's a big number. Very nice. Thanks, Chris. And last but not least, we have Matt Farrellee. He's a senior engineer and architect at Red Hat. He is a founding member of the radanalytics project, which he's going to talk about in a bit: an open source data analytics and ML platform based on Apache Spark on OpenShift and Kubernetes. He was one of the founding members of Sahara (I can say that, right?) in OpenStack, which was the OpenStack big data processing project, and he was involved in the University of Wisconsin Condor project, which, as some of you might know, was the high-throughput computing project and a kind of early pioneer in distributed cluster computing. Matt, you want to say hi? Hi, everyone. And that was great, thank you very much. All right, thanks. I've known Matt for several years now, going back to the University of Wisconsin project. All right, so let's dive right into it. That was the introduction, so let's dive in so that we have a common understanding. What does AI and ML mean to you? I'll start with you, David, and we'll circle around; you can go in any order. Scope it a little bit: what is it, and what is it not? Absolutely. So I always think it's funny, and somewhat not great, when people joke about AI coming to murder us all, and I am as guilty of that as anyone. But there are many, many people in the world who do not understand AI and ML, and when they hear that, even as a joke, from the people who do know, it's not so great. So please, let me plead with you about that. When I think of ML, I basically think there are three categories of problems that it unlocks that you've never seen before. The first is where things are hard today but tractable; you could at least potentially do it. 
So that might be: if you had a million pictures, identifying which pictures had dogs in them. You could do that today with a standard algorithm, or you could do it with humans, and so on and so forth; it would just take a really long time. Then there's the second category: problems you know how to solve, but that would effectively be impossible to solve via computers at all. That would be something like beating Go, for example. We know the rules of Go, and theoretically you could beat it, but you could not use standard computational techniques today to beat Go or be better than humans at that problem. And then the third is where we can't even really describe how to solve the problem. We know what a solution looks like, or when we've succeeded, but there's no algorithm that we could come up with to solve it. That would be something like identifying cancer in radiology, for example. We have a kind of generalized heuristic, but if you got 10 doctors together, they might still disagree. Yet even now you have AI, or excuse me, ML, being able to look at this problem, make an assessment, and be better than humans are today, and it's still improving beyond that. So that's all to say, those are the commonalities of those problems, and that's where I'd put the definition of ML: ML is being able to solve a problem without necessarily understanding exactly the methodology to get there. And that's not great, right? We actually should probably be able to understand it, but that's what we're unlocking today, and that's probably the biggest thing that you're seeing. Yeah, when I think of machine learning, if I initially scope that to a library, say in Python or R, it's just a collection of statistical algorithms that can be applied to different data sets. 
So the sort of import-a-library, run-it-on-some-data step is the beginning of machine learning, but I often think of it on an implementation timeline: how do we make that useful for other people, and how do we democratize it? The next two big stages I think of are: you have a library, it has some statistical functions, you do things like model selection, and there are dozens of steps beyond that. Once you have something sort of working, it works on my machine, how do I share this and make it useful for the rest of the world to build on and improve, to match up with the open source philosophy? I think things like standardized formats, whether it's serializing data or sharing models in efficient ways, allowing other people to build on that modularly, are a big deal. And when you start working with larger and larger groups, whether that's a foundation or an enterprise team, things like reproducibility, governance, and traceability of those models become very important. Then comes deploying that out, so people don't have to follow dozens of steps to get it up and running; it's very easy to get it up and running in any environment: HPC, cloud, on-premise. And beyond that stage of deployment and usability really comes the consuming of it, right? We actually want our greatest audience to be able to consume it in an interactive visualization or just a browser. There may be a complicated technical stack underlying that, and it all starts at the library and infrastructure level. But as I've watched different industries evolve, it's all been about consolidating these models and APIs on a common framework and common tool set to really democratize the audience, the people building and consuming. So to me, machine learning is those libraries plus the reproducibility plus the collaboration at that human scale, at global scale, in many different industries. Thanks, those are very interesting answers. 
I take more of an engineering approach to it. I think of AI as a large body of research that's been ongoing for many, many decades. Within that you have things like knowledge representation and machine learning, and then within machine learning you have things like neural networks and deep learning. So I think of it from a structured perspective that way. When it comes to the scope or the impact, it's more about how AI and machine learning are giving us ways to interpret the world, to interpret all the data that's around us, and giving us new ways to interact with the world and with other people. Tushar, you gave some examples of AI and machine learning applications that people may interact with on a daily basis, if they go buy a Tesla or something like that. But in reality, AI and machine learning are really ubiquitous already. Google search is an example of this that's been around for a very long time and is really part of people's lives at this point. On the question of what it is not: I like to think that it's not, and I'm going to violate David's comments a little bit here, the destruction of humanity, and it's also not the savior of humanity. And it's also not a salad dressing, although, given how hyped it is right now, people might say it is. One of the things you also hear in this context is, oh, it'll kill all the jobs. I think it's not even that, because just take the example of driverless cars: driverless doesn't mean you can sleep inside. For the next 10 or 15 years, you still have to pay some attention. We've had many of these revolutions, things people thought were going to destroy humanity or whatnot, and spoiler: we're still here, and things are for the most part getting better. 
We'll have the same thing with AI as soon as a lot of the hype settles down and the reality of what people can do with it and how you interact with it in your life becomes clearer. I'd like to support that. I actually just want to say one thing on top of that, a yes-and: it won't kill us, and it won't kill all the jobs or anything like that, but only if we, all the people in this room and the people watching, are responsible. You are all technology implementers and creators, so please do think as you're doing this stuff. Don't rely on someone else doing the hard work and being aware of it. I totally agree that it by itself will not do it, but we do need responsible implementation. Yeah, ultimately it's a tool, and we need to use that tool in good ways. Very good. That's a good introduction to the topic. So next, what I was going to ask each one of you, and you can go in any order, is: what is a favorite use case for you? Something that you get excited about, oh, I want to make this work today because it'll solve this problem. What gets you excited in this space? One example, or multiple examples, it's up to you. I can start on this one. I think my favorite use case is somewhat a selfish one. I've never been particularly good at foreign languages. I took lots of foreign languages in high school and college but never really immersed myself in an environment where I could actually use them, to actually communicate with people, and I think the translation capabilities that are coming out right now are really going to make it much easier for people to communicate, and much easier for me to communicate with people I wouldn't otherwise be able to. For me, a little bit of my background is in civil engineering, specifically life safety systems and building protection systems. 
So I've really kept an eye on building systems and building integration as they come together across many different manufacturers. If you look at a building, inherently it's sort of boring: a boxy structure with rooms. But as soon as you start recording and logging information like energy use, temperature, and occupancy, and you put all of that together and aggregate it, you actually get a really beautiful picture of how that building behaves as it interacts with people and people interact with it. And at a city scale, it helps things like emergency responders. It's a nice example of bringing something that was formerly static online, something we can monitor over time that becomes integrated with many, many different subsystems of many different types. So to me, the complexity of that, and how we wrap it all up into some useful metrics for people, is a pretty awesome example of a smart building or a smart city. And to David's earlier point, with some of these insights, we don't even know what we're looking for yet, and machine learning will help us. David? I spend all my time in ML, so I'm always astounded at everything. I'll try to keep it super brief. I have a talk tomorrow, and I'm going to give away some of the stuff I'll talk about. One example that I love is from Google. Google, as you may know, has a lot of data centers, and we hire some smart data center people. There's this metric in data centers called PUE, power usage effectiveness: roughly, the ratio of the total power coming into the facility to the power actually used for computing. You want it to be as close to 1 as possible. We hire really good data center engineers who understand this and try to drive it down. And then we looked at it and thought: all these fans and water and cooling systems, they kind of look like signals for ML. So we hooked it up, and the PUE went like this: ML on, down; ML off, right back up. 
Literally, and we're very public about this, we save 15% on our power just by using ML against these data centers. It's such a phenomenally large number that you can't even believe it. And again, it's not something you could just wire up instantly, but it was really, really important. The second thing I want to mention is one of the most interesting areas of ML today, use-case-wise: something called GANs. Is anyone familiar with those? Generative adversarial networks. The incredibly simple summary is: you take two ML models and pit them against each other. The first one tries to produce a solution, and the second one tries to find something that breaks the first one's solution. You just force them to compete. It's unbelievable; this helped with translation, and you see it all over the place, because right now we're entering a phase where we're actually having trouble coming up with exactly the right data to face off against the model. I remain very, very optimistic that this is the area where you'll see the most creativity, because literally we can't even think of ways to break the model, in such a way that the model can get better, anymore, and you're seeing some really interesting stuff there. Very cool, very cool. All right, so what I'll do next is try to dig deeper into it. I talked earlier about how this all sounds very exciting, but it's also complex and not easy. So starting with David: what are some of the challenges in AI and ML that you see today, and how is Kubernetes playing a role? And maybe this is the time to talk about the thing you wanted to talk about. Sure. Generally I see a couple of really big things. The first is the approachability of ML. 
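An editor's aside on the GAN idea David sketches above: the two-model face-off can be illustrated in a toy one-dimensional form. This is a hedged sketch, not anything the panel showed. The "generator" is just a mean `mu` producing Gaussian samples, the "discriminator" is a tiny logistic regression, and the update rules are the standard non-saturating GAN gradients reduced to this 1-D case; all names and constants are illustrative.

```python
import math
import random

random.seed(0)
REAL_MEAN = 4.0          # the distribution the "generator" must learn to imitate

def d_prob(x, w, b):
    # discriminator: 1-D logistic regression giving P(x is real),
    # written in the numerically stable form
    z = w * x + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def mean(xs):
    return sum(xs) / len(xs)

w, b = 0.1, 0.0          # discriminator parameters
mu = 0.0                 # generator parameter: mean of its Gaussian output
lr_d, lr_g, batch = 0.05, 0.05, 64

for _ in range(3000):
    real = [random.gauss(REAL_MEAN, 1.0) for _ in range(batch)]
    fake = [random.gauss(mu, 1.0) for _ in range(batch)]

    # discriminator ascent on log D(real) + log(1 - D(fake))
    pr = [d_prob(x, w, b) for x in real]
    pf = [d_prob(x, w, b) for x in fake]
    w += lr_d * (mean([(1 - p) * x for p, x in zip(pr, real)])
                 - mean([p * x for p, x in zip(pf, fake)]))
    b += lr_d * (mean([1 - p for p in pr]) - mean(pf))

    # generator ascent on log D(fake), the non-saturating objective:
    # d/dmu log D(mu + z) = (1 - D(mu + z)) * w, averaged over a batch
    pf = [d_prob(random.gauss(mu, 1.0), w, b) for _ in range(batch)]
    mu += lr_g * w * mean([1 - p for p in pf])

# as the two models compete, mu is pulled toward REAL_MEAN
```

The feedback loop is the whole point: whenever the generator's mean sits below the real mean, the discriminator learns a positive weight, which in turn pushes `mu` upward, and vice versa once it overshoots.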
Today, if I sat any of the non-ML practitioners among you down and walked you through what the average ML person does, you'd be shocked at how absolutely tribal and back-of-the-envelope it is: oh, maybe I'll tweak this number. That is really, really disturbing for a bunch of CS folks who think there should be a standard process for going through this and exploring, and I think we'll get a lot better there. Part of it relates to the second big problem, which is real transparency and understanding: being able to probe into a model. If I run an application today, I can attach to the process and see exactly what's being called, at what time, and what's using memory, and there's no equivalent level of introspection into a model and how it's performing right now, which is not great. That's the second major effort. And both of those relate to the third thing, which I'll talk a lot about in a little bit: one of the biggest problems is that it's not easy to roll out a standard ML deployment. Everything is very bespoke. You piece together whatever works however you'd like to approach it. And that's not great, because it means standard tooling doesn't work anymore. You now need to not only create this stack, but then, on the other half of that, create a set of tooling that supports your stack, lets you introspect and understand what's going on, lets you auto-tune, and all those various things. That said, that's where I hope Kubernetes can help us. Today, people generally create their stacks from the ground up. They have to understand exactly what version of Python, what libraries they're running, what networks, all these various things. That's just too much for the average data scientist to approach, and the data scientist shouldn't have to think about that. 
And that's where Kubernetes has really changed the game. It creates a wonderful standard abstraction over the infrastructure that you're running on, and not just an abstraction, but rich objects that allow you to interact with various components of the platform and, to your point, help you wire a bunch of services together. So I remain very optimistic, and like I said, I'll talk in just a little bit about what I think the future looks like for ML and Kubernetes. Definitely. So, in terms of data science with Python and R: if you climb the stack over the past couple of years with our users, Anaconda solved the problem of "I need to get Jupyter up and running with TensorFlow, with all of its Fortran and C dependencies, as quickly as possible across Windows, Mac, and Linux." That was a good thing. The next question was, now I want to share this analysis or this model or this server or this visualization with my buddy on a different operating system, so they need to install system packages, allow these things through their firewall, get these other libraries in, and deal with mismatched versions of everything. So Docker was a nice addition to something like the Anaconda Python distribution: where the open source conda package manager left off in terms of environments, Docker took over, and said, now I can bake everything into an image and it's very portable. And then the next layer was resource management, scalability, and orchestration, and that's where Kubernetes came in, because what we found about a year ago was that our users were building amazingly different things, and amazing things, with Anaconda, and the last mile was: now what? Now how do I deploy this thing? For data science deployment, you can read blog posts on the "20 easy steps to data science deployment," right? Spin up a web server, set up a reverse proxy, set up your SSL certs, hook up authentication, on and on. 
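To make that bespoke last mile concrete, here is a minimal editor's sketch, not Anaconda's actual stack: a toy "model" wrapped in a stdlib HTTP microservice. The weights, route, and payload shape are purely hypothetical; the point is that this is the hand-rolled endpoint that Kubernetes and containers then package, route, and scale so you don't do the remaining steps (TLS, proxying, auth) by hand.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from threading import Thread

# hypothetical "model": a trivial linear scorer standing in for a trained artifact
def predict(features):
    weights = [0.5, -0.2, 1.0]
    return sum(w * x for w, x in zip(weights, features))

class ModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # read a JSON body like {"features": [1.0, 2.0, 3.0]}
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # keep request logging quiet for this sketch
        pass

def serve():
    # port 0 asks the OS for any free port; a real deployment would fix this
    server = HTTPServer(("127.0.0.1", 0), ModelHandler)
    Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client then POSTs features and gets a score back; everything around this single process (restarts, replicas, ingress, certificates) is exactly what the panel is saying the platform should own.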
But what Kubernetes did for us was create an ecosystem with a common interface, and a way to take what we had already built with the package manager, with Docker images and containerization, and give us that last mile. Now someone using Python 2, Python 3, R, Lua, Julia, or a Fortran or C dependency, any of it, doesn't matter: they can all play in the same space. They no longer have to worry about things clobbering one another's environments. It all just works at that abstraction layer for people who just want a microservice, who just want their model to run and to share it alongside others so people can build on top of it, without having to go through all of that over and over. So this has been very important for enterprises adopting AI and machine learning, for large organizations working together, and for really democratizing environments. The fact that they can deploy to the cloud or on-prem without changing anything is a game changer; it means we don't have to switch APIs every single time we want to deploy somewhere. So really, in the last year, data science deployment, anywhere from interactive visualizations to machine learning models and all of the above, has become that much more pervasive through things like Kubernetes, containerization, isolation, and orchestration. I'll be quick and build a little bit on what was said here, and also touch upon data, right, and how data is so important for all of this. So, quickly: I think Kubernetes has done a tremendous job providing the API and the interfaces expected by operations people, by sysadmins, by developers. It has really codified a lot of their best practices from many, many decades of experience. But with AI and machine learning, there is a shift in the way these systems operate and the way they're built, which Kubernetes is going to have to adapt to to some extent. 
There's an understanding that is still being formed, which we'll talk about with data later, of how data scientists operate, what expectations they have, and what expectations the things they build have on the infrastructure they're running on. A concrete example we usually use is thinking about bit rot. From a developer perspective, bit rot means you've deployed some piece of code, and rot is something that happens over long periods of time, usually from dependency changes or input changes, and things fail, and they fail in a fairly drastic fashion. The infrastructure, Kubernetes, understands how to deal with systems that fail that way. AI and machine learning systems are inherently more statistical. They don't give you a clear signal that something failed; they just start performing suboptimally. Detecting that suboptimal performance, and having infrastructure that can respond to it, is going to be more and more important as we try to take the data science artifacts produced by these bespoke systems and put them onto something that an enterprise really understands. It's going to be a challenge for Kubernetes. So, Matt, let me start a new line of questions, in some ways. Tell us a little more about what Red Hat is doing in this space and some of the projects you're leading. What are you contributing? Do you want to walk us through that? Sure. Just really quickly, from my perspective, truly from my perspective: I think Red Hat is getting into this conversation about AI and machine learning in, I'll say, three important ways, interacting with our customers. One is that we're really consuming best practices for AI and machine learning. 
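An editor's aside on the "statistical bit rot" point above: detecting quiet degradation usually means watching a quality metric over a rolling window rather than waiting for a crash. This is a minimal, hypothetical sketch; the class name, window size, and thresholds are all illustrative, not any product's actual mechanism.

```python
from collections import deque

class DriftMonitor:
    """Rolling-window accuracy check: a model that 'rots' statistically
    keeps serving answers, so we compare recent accuracy to a baseline."""

    def __init__(self, window=100, baseline=0.90, tolerance=0.10):
        self.window = deque(maxlen=window)  # 1.0 = correct, 0.0 = wrong
        self.baseline = baseline            # accuracy measured at deploy time
        self.tolerance = tolerance          # allowed drop below baseline

    def record(self, prediction, actual):
        self.window.append(1.0 if prediction == actual else 0.0)

    @property
    def accuracy(self):
        return sum(self.window) / len(self.window) if self.window else None

    def degraded(self):
        # only alarm once the window is full, to avoid noisy early readings
        if len(self.window) < self.window.maxlen:
            return False
        return self.accuracy < self.baseline - self.tolerance
```

Infrastructure that can act on a `degraded()` signal (alerting, rollback, retraining triggers) is one way the "performing suboptimally" failure mode becomes something a platform can respond to.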
We're understanding what it means to build these applications and to use them to improve our business, so being a practitioner of these things becomes really important. The second thing we're doing is starting to have the conversation about how we influence and give back to what those best practices are. Chris mentioned the radanalytics work that's going on; this is an output that's starting to capture some of the understanding we've built up over the last number of years using AI and machine learning. I think that's really key. And then the third thing to add is that we're actually putting out services and software for our customers that don't have a big AI/machine learning stamp on them, because in the end it's a tool to do something, but that are actually powered by AI and machine learning. Chris mentioned earlier today Red Hat Insights and the work that's happening with OpenShift.io; I'll add a third, which is intelligent routing for support, getting customers to solutions more quickly using AI and machine learning. Cool. Who wants to talk about some of the work that is happening? We talked a little earlier about the Kubernetes resource management working group, some of the work happening with respect to GPGPUs and their enablement, and related work in Kubernetes. Who wants to take that? I'll throw in a couple of words. So there's work happening in those working groups around the hardware technology that is becoming more and more important for these machine learning algorithms, making sure that hardware is exposed and accessible to the algorithms that are actually running on top of OpenShift. 
There's also the performance-sensitive application pod work that's happening, to make sure Kubernetes has a very solid foundation, not just in the ops APIs and the developer APIs, but in the hardware support that makes it worthwhile to use those APIs. All right, so the next question I was going to tee up is this: everybody is thinking about this and making these decisions, right? Why should we care now? Can you address that, especially around how important data is? Even if you decide to do something today, you might not have collected the data yet. How difficult is data for an organization to collect, internally but also from outside, what are the dynamics there, and how is that important for AI and ML? Sure. So, I think that we are awash in data in a way that we've never been before. Literally, we are collecting data from every movement: every device, Fitbit trackers, the sensors in this room reading temperature, thermostats, and so on and so forth, all the way up to the largest possible scale of queries, user behavior, and things like that. So we're at a phase where the data itself is absolutely transformative. And I think there will be a really important transformation that goes on. A lot of researchers out there nowadays would argue that, with all the hardware investments and all the data investments, we actually have everything that's necessary to make these great decisions; the problem is that it's not available in a format or a way for us to consume it, or our algorithms are just wrong and too slow, and so on and so forth. And I think that's a really valid point. 
I think probably the biggest problem today is that, when you think about your average pipeline, the amount of time anyone spends building a model versus everything else involved in building out ML is very, very small. Building the model is tiny compared with ingesting data, getting rid of outliers, feature engineering, transforming it, and moving the data around in a pipeline in a regular way, let alone what happens after the model comes out: are you tracking it? Are you being responsible security-wise? All that good stuff. A lot of this is gated on the process and the pipelines rather than on the actual implementation of building a model. I think a couple of interesting things we've observed happening in the data world in the last year: one of them is related to data virtualization. The data is going to live in different places, and we know it's going to live in different formats, and that's something that data science, both at the machine learning library level and at the operational Kubernetes level, takes as a fact. Folks have worked really hard to build in connectivity, whether it's Amazon S3, Google Cloud Storage, on-prem, or HDFS; those different connectors have high value and need to be maintained over time, and that's a lot of hard work. Things like standard data formats and best practices, for example Apache Parquet and columnar data stores, have become important. From the Python and R data science perspective, Python has connectors to just about every data format and data source you can imagine. 
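To make the columnar-format point above concrete, here is a toy, purely illustrative comparison in plain Python: the same records stored row-wise and column-wise. A single-column aggregate only has to touch one array in the columnar layout, which is the basic reason Parquet-style stores win for analytical scans.

```python
# the same toy dataset in a row-oriented layout (one dict per record)...
rows = [
    {"user": "a", "clicks": 3, "revenue": 1.50},
    {"user": "b", "clicks": 7, "revenue": 0.25},
    {"user": "c", "clicks": 2, "revenue": 4.00},
]

# ...and in a columnar layout (one array per field, as Parquet keeps on disk)
columns = {
    "user": ["a", "b", "c"],
    "clicks": [3, 7, 2],
    "revenue": [1.50, 0.25, 4.00],
}

# row-oriented aggregate: every record, with all of its fields, is touched
total_clicks_rows = sum(r["clicks"] for r in rows)

# columnar aggregate: only the one array we care about is read
total_clicks_cols = sum(columns["clicks"])

assert total_clicks_rows == total_clicks_cols == 12
```

At three records the difference is invisible; at billions of rows, reading one column instead of whole records is where the orders-of-magnitude performance gaps mentioned below come from.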
It's part of the Python data science and the kind of philosophy, right, to just connect to all sorts of remote data and compute sources, but what really happens is we're seeing users exercise, you know, for a given problem, it's best to use Parquet stored in this particular data store for performance reasons, for training reasons, so we're seeing a lot of that as we see these get exercised in different verticals and another thing interesting we've seen in the last year is generating synthetic data when training, right, so just sometimes you just don't have enough data to get started when you need to do model selection on natural language processing or image classification and we've seen really interesting use of generating huge data sets in parallel that can be used for the training iterative process and then you can bring in the real data on a rolling basis, so between those two things, data format, data storage, especially many remote sources, I think that it is hard work and it's something that was recognized up front in Kubernetes and containers and it's going to be hard work to continue to maintain but we're gonna learn the high value connections of things like standardized data formats as the training, whether it's image classification or NLP, orders of magnitude difference in performance when you use the right tool for the right job with the right data format and the right data storage, so that's all starting to come together and I think we're learning a lot to that together even in the open source and cloud activity that's going on. Cool, I was gonna go to the Q&A next but before that I wanted to give you each an opportunity to kind of talk a little bit about something that's happening here at the conference or a plug that you want to do for an announcement that you want to make so David, you want to start? Yeah, absolutely, thank you. Well, my plug is for my talk tomorrow, please come. 
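The synthetic-data bootstrapping idea mentioned a moment ago can be sketched with scikit-learn, which is assumed to be available; the sizes and model choice are arbitrary:

```python
# Hedged sketch: generate a synthetic labeled dataset to bootstrap model
# selection before enough real data exists. Assumes scikit-learn is
# installed; all numbers here are arbitrary.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 10,000 fake samples with 20 features and a known class structure.
X, y = make_classification(n_samples=10_000, n_features=20,
                           n_informative=5, random_state=0)

# Iterate on model choice against the synthetic set first ...
model = LogisticRegression(max_iter=1000).fit(X, y)

# ... then bring in real data on a rolling basis through the same pipeline:
#   model.fit(X_real, y_real)
train_score = model.score(X, y)  # sanity check on the synthetic set
```

Because the pipeline is identical for synthetic and real data, the training iteration can start before collection catches up.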
But I do want to talk about something that we're doing, just between everyone in the room and those on Facebook. We're launching something which is designed to solve exactly a lot of the problems that we've talked about here on stage. It's called Kubeflow, and the idea is that it is a standard ML stack for running ML on top of Kubernetes. It is not about re-implementing all the great hard work that's out there in the world: TensorFlow, XGBoost, scikit-learn, anything like that, or any of the UIs or any of the transformation tools. It is really about, much in the same way that Kubernetes didn't go and re-implement a database or a serving tool or something like that, it just allowed you to take that containerized tool and spin it up in a very elegant way, and not just elegant but also portable and very scalable, so you could deploy it to your laptop, you could deploy it to a GPU rig, you could deploy it to a cluster, all with the same command, repeatedly. And that is something that we're very, very happy to get out the door, because this is something that we hear so often from customers: oh geez, I wanted to go down the ML path, but I had to completely re-implement that stack, or I had to build it myself, or my data scientist had the wrong version of Python and so everything failed. Our goal with Kubeflow is to be this very open framework that the community can come together and help collaborate on. So we're kind of being quiet about it right now; we'd love your thoughts, but the GitHub repo is open, so please join in. All right. My plug is for Anaconda, what else? I guess, how many folks in here have downloaded or used Anaconda? There. So if you haven't, it's a free download, Windows, Mac, and Linux, with over 1,000 libraries for Python and R.
Any area you can think of: image classification, natural language processing, NLTK, Gensim, Jupyter Notebooks, and machine learning. We've been very busy adding more and more libraries, TensorFlow with GPU support, and the nice thing is you just install it, it's all pre-compiled across Windows, Mac, or Linux, which makes it very easy to use, and it's free to use on any of those platforms. And then we have Anaconda Enterprise, which you can sign up for a 30-day trial of; it's a data science platform with collaboration, authentication, and security, but it's all powered by the underlying Anaconda distribution and the Conda package manager. So if you haven't used that and you're tired of living in dependency hell, dealing with Fortran and C libraries and system libraries when doing machine learning, try out Anaconda and let us know what you think. Cool, Matt. So I'm gonna bookend you here with Kubeflow on the other side too. I think one of the really important things that we should be looking at when it comes to something like Kubeflow, and what David's gonna talk more about, is that there are many, many organizations out there who have been producing bespoke solutions for building these pipelines, building these flows, trying to put them into production, trying to address how data scientists work, trying to address how operations folks work. So there's a lot of engineering work that's gone into this. There's also a lot of engineering product work that's gone into this. You see cloud vendors producing solutions for these almost on a weekly basis if you look over the last few months or so. What's really missing in the space right now is a place where a community can form, to really not just have a single vendor or a single instance position on what this flow should be, what the interfaces should be, how things should fit together. So I'm really hoping for, and looking forward to, Kubeflow as a potential place where that community work can really happen.
Cool, and David forgot one, so let me plug it for him. There's a Birds of a Feather, a BOF session, at 7:30 or something like that this evening. Tomorrow, I think it's early, I think it's 5:30, but it's on the schedule: Birds of a Feather, ML on Kubernetes. I'm gonna be there to talk about Kubeflow, but I wanna be clear, it is the community that comes together and wants to talk about whatever they want that takes it in that direction. All right, thank you. We'll come back to you for one final thought, but we'll do some Q&A now. So I'll open it up. And I'm gonna put in one more plug too: if you go to commons.openshift.org, halfway down the page, there is an ML working group that we're starting up on OpenShift Commons. So if you're interested in this and you wanna get involved and hear more about the best practices and lessons that we're learning around Kubeflow, please sign up there as well. So do we have any questions in the audience? I know it's towards the end of the day. There's one right here. Hi, I was just wondering if there was anything in the predictive analytics space in particular that you're excited about, that you feel is really interesting? I mean, honest to God, I see something almost every day. I'm not an ML PhD by any stretch of the imagination; I came at the ML space thinking, well, how hard could it be to take an existing model and transfer it to this new space? And what I've found is that, unfortunately, today it is highly, highly bespoke. You could have a predictive model that is 98% accurate in the movie theater ticket space that doesn't work in the baseball game ticket space. And it's just kind of crazy, but that is where we are.
So I guess the things that I'm most optimistic about are things around transfer learning, really, where, for those that don't know, basically imagine your model is, I don't know, 100 layers deep. You strip off the last five layers and then you retrain on a much smaller set of data. That's very exciting to me, and there's a lot to do there, to say the least, but it often reduces the total amount of data you need by a factor of 100 or more. And that's quite compelling. Yeah, for me, it's a little bit more rudimentary, but it's exciting to watch our users going through the process of swapping their batch jobs for models that are constantly training, constantly running, instead of this daily or weekly thing. That's exciting, because there's a lot of wasted time that goes into the iterations, the daily iterations, as opposed to just bringing something online and having it run on an ongoing basis. And the other part that's interesting, again, it's not cutting edge, but it's watching our users refactor the way that they work into microservices. So what would have previously been a monolithic image classifier with a UI built onto it, with a very specific declarative way of doing something, let's say recognizing images or edges, is now completely different: in the past year we're seeing our users build a specialized API that just does the classification, and a specialized front end for it that's modular and can swap between different backends. So actually watching that roll out to the larger masses, and not just the bleeding-edge developers, is really nice to watch. And it lets us focus on the best tool for the job instead of a monolithic approach to everything. So we're starting to see those projects get deprecated, and sometimes broken up into microservices that are actually healthier than the original monolith.
So it's exciting to watch that happen as things get adopted more and more across the industry. So just two quick things, then. One is, I wanna add on to David's comments about transfer learning. People need to watch this space: the vast majority of the complexity is in the data science and data engineering work, model design and whatnot, and being able to reuse that as a developer is gonna be hugely empowering. So that's really key to watch out for. To your question about something happening in the predictive space: as a user, I'm really excited about more work that's happening around news curation, or really curation in my RSS feeds. There's more and more work happening there. It's just saving lots of time in the morning. Yeah, I mean, at least from a Red Hat perspective, to add to that: something that you mentioned, using the digital exhaust like logs and metrics, and how do we make our systems much smarter in terms of scaling, or in terms of even better fault tolerance, et cetera, is something of a lot of interest for us. So, one more question. All right, there, go ahead. Hi, so I've heard a lot of people complain about how, when technology advances, humanity regresses. So how, as technologists or advocates of technology, are we supposed to educate people to use this to our advantage and not the opposite? We really need to become data literate. We need to understand what the sources are. We need to be teaching people to understand what data is, how data is used, and what the potential is. And really, from a personal perspective, also looking at understanding what our, to use the term, data exhaust is in the world today. I think if we can teach people, if we can educate people about data, how it's used, how they can use it, and then what they're actually producing, we'll have the building blocks for not regressing.
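Coming back to transfer learning for a moment: the "strip the last layers and retrain on a small dataset" idea can be caricatured in plain NumPy, with a frozen random feature map standing in for a pretrained network; everything here is illustrative, not a real model:

```python
# Illustrative-only sketch of transfer learning: a frozen feature extractor
# stands in for the pretrained layers we keep, and only a small linear
# "head" is retrained on the new, much smaller dataset.
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" base: fixed weights we never update (the frozen layers).
W_base = rng.normal(size=(20, 64))

def features(X):
    return np.tanh(X @ W_base)          # frozen feature extractor

# Small labeled set in the new domain (200 samples, not millions).
X_new = rng.normal(size=(200, 20))
y_new = (X_new[:, 0] > 0).astype(float)

# Retrain only the head (plain least squares here for brevity).
F = features(X_new)
w_head, *_ = np.linalg.lstsq(F, y_new, rcond=None)

preds = (features(X_new) @ w_head > 0.5).astype(float)
accuracy = (preds == y_new).mean()
```

Only the head weights are fit, which is why the amount of new labeled data needed drops so sharply compared with training the whole stack from scratch.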
I think a big piece of empowering both the producers and consumers of ML and AI is transparency and reproducibility of the models and the analyses themselves. A bad example is treating something as a black box that only runs in a certain environment, where we don't really know why it works so well, but it works great and saves us money. That's not a good approach. You know, I come from a civil engineering background, very hands-on physical engineering, and I think ML to me is the same. When I think about the hyperparameters or distributions that are going through, I want to see those all the way through. I don't ever want to see a step where I don't really know what happened to that distribution or hyperparameter, but it looks good. As producers, we need to be very careful to document and be very open about the stages that go through the layers, and what each layer was intentionally put in for in a given analysis. And then as consumers, we need to ask those questions of, hey, this is a really nice model, it outperforms the others by 10%, but why? Can you reason through why? Whether it's physical model reasoning or hyperparameter tuning, just having that global picture often lets us not focus so much on one well-performing local model, but really get a big picture of the context. Especially depending on the industry you're in, in finance or healthcare, that can have really big impact. So reproducibility and transparency of the models, I think, are the two key things, as a citizen data scientist, to always exercise: either produce them or ask for them of things that are being produced. I think those are both terrific answers. You know, if I was going to generalize it a little bit, there are two key things that these both factor into. Is anyone familiar with Chaos Monkey? It's the Netflix tool that they use to actively kill machines randomly to tease out issues.
Like, technology is not neutral. We need to be aware of that. We are technologists. We need to be aware that it is not neutral. It has a positive or negative effect. And it is up to us to be our own Chaos Monkeys for the technology we roll out. We need to be probing in every possible way and be mindful that, hey, have I checked to make sure that this model that I rolled out doesn't actively bias against a certain population, whatever that might be. Have I checked to make sure that the, you know, what the edge cases look like here? And hey, is this an area? Let's be clear, you are not reinventing anything. I promise you, right? If you develop a loan analysis system for looking at someone's credit report, there is literally 50 years of sociological research showing how minorities and people of color and women were biased against through standard methodologies, like long before ML came along. Go be smart about that area and the ways that it impacts you, even prior to technology, and be thoughtful and mindful about entering that space again. And don't try and be an expert in it either. You will get it wrong. Go find an expert and chat with them and understand how you can be better at it. So that's largely it, but to be honest, we will never educate the population. We need to be doing the educational work on our own and really have a hard line about this. Great answers. Next question. Yeah, I guess I'd follow on. You guys were talking about it from a producer of ML technologies and I'm thinking about it from a consumer of ML technologies in terms of is there development or some kind of transparency guidelines that we can use to figure out, okay, when a ML model makes a certain decision, why is it making that decision? 
And is there a way that I can tell, if I'm consuming Google's version of this algorithm versus Microsoft's version of this algorithm, which one is gonna give me the appropriate transparency, and how do I trace it to make sure that the decision is not a biased decision, on a case-by-case basis but also on a software-by-software basis? I'll give a very concrete example of that. I use Google Maps a lot, and I always wonder whether it's experimenting on me this time, you know? And that goes to consumer transparency: will we know that we are being experimented upon this time? No, you will never know that. Yeah. That's their goal, and I say that as a Google employee, but any AI, or any solution, and this is not AI- or ML-related, they're just gonna experiment. They wanna see whether or not a new thing works. So that's not the problem. I know technology is not the solution or a panacea or anything like that. My hope is, and I know this is my job to pitch my new thing, but my hope is that by creating standardized ML stacks with somewhat standardized reusable components, we will develop standardized reusable transparency tools for them. For example, there are two very, very popular image recognition models out there right now, ResNet and Inception; they're both very, very successful, and they both perform better than humans right now. You could not use transparency analysis tools that you built for one with the other. They're just completely different layouts and models and so on. And so my hope is that by building some of these standard tools, you could do it. But let me make a pitch out there: I would love someone to build Chaos Monkey for ML, meaning you don't need to introspect into the model, right? You could build this and say, hey, I have a set of multiple different population types as data that I can feed into your model and get the results back on the other end.
And it doesn't have to be real humans, they can be totally anonymized, but if at the end your model comes out and it's biased, then you're like, hey, something bad is happening here. It doesn't give you the transparency that we all should demand, and literally there are 100 PhDs working on introspecting into models today, but at least then we have some awareness. And so I will pitch that, and I will endorse it and find Google engineers to help you if you wanna lead that kind of thing. Exactly, but test cases where we don't reveal what the population source was. The population source is not made available to the model. You just hand these objects in, some results come out the other side, and the test on the other side looks for bias against populations. So, two quick things. One, going back to my definition of what AI is, what machine learning is, and whatnot: we need to understand that there are sub-areas in machine learning. Neural networks and deep learning capabilities are one particular area that's being applied a whole lot right now, and it specifically has interpretability concerns associated with it. There are other approaches that are better in some use cases, like image recognition and speech, and worse in others, but are interpretable. When it does come to neural networks, I think we need to extend your question: it's not just the focus on the model, but, and it gets more to what David was talking about, the focus on the model, the code that was associated with it, and the data that was used to drive it. We don't have an answer now, but if we can start capturing these things, smart people will figure out how we can get an answer to it. Then one more question.
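The "Chaos Monkey for ML" pitch could look something like this black-box probe; the model, the fields, and the fairness threshold are all made-up stand-ins, not a real API:

```python
# Hypothetical "Chaos Monkey for ML" probe along the lines pitched above:
# feed the same black-box model inputs drawn from different (anonymized)
# population groups and flag it if outcome rates diverge, without ever
# introspecting the model. Everything named here is a stand-in.
import random

random.seed(0)

def model(applicant):
    # Black-box stand-in: approves based on income only.
    return applicant["income"] > 50_000

def bias_probe(model, populations, max_gap=0.1):
    """Compare approval rates across groups; flag gaps above max_gap."""
    rates = {name: sum(model(a) for a in group) / len(group)
             for name, group in populations.items()}
    gap = max(rates.values()) - min(rates.values())
    return rates, gap <= max_gap

# Synthetic, anonymized test populations with different income profiles.
populations = {
    "group_a": [{"income": random.gauss(60_000, 10_000)} for _ in range(1000)],
    "group_b": [{"income": random.gauss(45_000, 10_000)} for _ in range(1000)],
}

rates, fair = bias_probe(model, populations)
# A large gap here signals the pipeline should be audited, even though we
# never looked inside the model.
```

Nothing inside the model is inspected; only outcome rates across anonymized groups are compared, which is exactly the awareness-without-introspection trade-off described above.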
Yeah, David, to your point, I think we had Chaos Monkey like 10 years ago in the financial industry; they all used AI and they never saw it coming. But what I wanted to ask: where do you think the main contributions of AI will be when we talk about things like the self-driving data center? When I listen to your answers, David, I think you're hedging a little bit when you say right now that complexity is too high and we have to focus on abstraction. Do you foresee a future where maybe the scheduling is predictive, where you already know, okay, to assure the performance of a workload, this is where I have to put it, I can provision more stuff? How do you see this playing out? Sorry, so if that's what you took away, I apologize; I don't think the complexity is too high at all, right? We have existence proof of us solving that problem. I think in my opinion right now, the problem generally relates to the approachability of using AI or ML for your data center, for your self-driving data center; that is too high. And by that I mean literally the interface between a model and your system is broken, it's highly bespoke, meaning either I have to rewrite my model in some very specific way, or I have to build some crazy feature engineering tool to translate the data that I have into something that's actually usable, or, even if I get answers, do I have the correct feedback loop so that as I take action on my answers, it's feeding back in properly? All of that is broken right now. It's very implementable; the problem is that there are no standards. And so all three of us on the stage are contributing towards this stack. And again, I don't wanna hold this up as the end-all, be-all solution, but my hope is that we're able to develop some standards as an industry around stacks, around ways to ingest data, ways to spit out answers, getting feedback loops, all that kind of stuff, where it doesn't have to be so bespoke.
And to be clear, like I said, we have data centers that we do it at Google, we have internal services that literally self-drive our data centers. So it's absolutely possible. It's just, how do we make that available to everyone? Democratize, that's the word that was used. Everyone's democratizing, I'm done with that. I wanna feudalize our data center. All right, with that, I think this is a great panel discussion I thought. I hope you all agree. Thanks to our panelists. And I hope to see you all over the next few days, including the few sessions that we have.