Test - hey everybody, we're going to take about five more minutes just to get set up here on some AV things, so it's a good time to grab some coffee and some fruit back there, and then we'll go ahead and get started.

Okay, we're going to go ahead and get started. There's still a lot of folks out there waiting to come in, but in the interest of time, and trying to keep to a very busy schedule: I'm Josh Bottum; I helped to organize this event. I'd say about 30 years ago I showed up in Southern California with everything I owned in the back of the car and got my first job as a coder, and the folks in Southern California have always been very welcoming to me. And I can tell you, over the last couple of years I've been going around talking to people about machine learning projects - I go around the country, around the world, to talk about these - and some of the most interesting, most innovative things are happening right here in the LA basin. The other piece that I would say is that the people here are very open. When we first started this - I work in the Kubeflow community - I reported back to them that I was seeing a lot of interest in LA, and I said, hey, I would like to bring the community down to LA to do this, so we could grow it and try to make things better. And so I talked to Constantinos here, and he said, yeah, let's do it. And then we've got Thea. Everybody that is in the Kubeflow community today, on the community call - if you could stand up, just so people can see who the participants in the community are. These are all folks that volunteer their time to build this project. You know, I always think about: why do people spend their extra time? If you took how many people signed up for this - it was about 300, 350 people - and you think about what those salaries equal, that's like 50 million dollars a year in talent that this room will represent when it's filled.

The piece that I would tell you is: why do you do these things? Look what happened at Cruise Automation - anybody familiar with Cruise Automation? They were a company that was bought by GM; they were doing self-driving cars. They were bought about two years ago for 587 million dollars. Last year, SoftBank and Honda put another two billion dollars into that company. Right now, Cruise Automation is valued at 14 billion dollars. When those investments by Honda and SoftBank went into Cruise Automation, GM stock went up two points. I mean, this is not just impacting one company. Software today is no longer doing a linear improvement of people's processes; we're getting into an exponential space, and one of the leading places that I have seen people investing is in Kubeflow.

So what I wanted to do today was bring this all together and say: look, the heartbeat of Kubeflow, to me, has been these core contributors, and one of them is right here on this stage. So I would like you all to welcome Jeremy Lewi from Google, who - nights, weekends, holidays - is there to answer questions. I would just like to get a round of applause for Jeremy for everything that he's done over the last year. Jeremy, thanks so much. Thank you. Before I hand it off to all the speakers, what I'm asking them is: why do you do this? Why do you spend your extra time?
Because I think there are a lot of really interesting applications going on out in the world. So we'll talk about one of them, which is this natural language code search that one of our collaborators is working on. Thea, do you want to answer the same question - why do you work on this? I work on Kubeflow because I think it's a great example of a really complex technical challenge that suits open source well.

All right, so with that, let's get started. By way of introduction: my name is Jeremy Lewi, as Josh said - a very warm introduction - and I'm the tech lead at Google for Kubeflow. Thea, do you want to introduce yourself? I'm Thea Lamkin, and I work on open source for Kubeflow at Google. So thank you all very much for coming today to hear about Kubeflow, and thank you, Josh, for putting this together.

We'll be talking today about Kubeflow and, kicking things off, about building the next ML platform together. Here's the agenda for today. We're going to start off by talking about why we built Kubeflow - the problem that we're seeing in industry, which is that putting ML into production, into products, takes way too long, and we're trying to use Kubeflow and Kubernetes to solve this problem. Then we're going to describe in a little bit more detail what we mean by Kubeflow as a platform, and in particular why we describe Kubeflow as a composable, portable, and scalable platform on Kubernetes. Then Thea is going to talk about the community - what we're doing and what our community looks like. And then finally, we'll close things out with a brief demo of what we have coming up in our next release, which will be 0.5, at the end of this month. What we hope you take away from today's talk, and this day as a whole, is that with Kubeflow we're trying to make it easy to build machine learning products using Kubernetes.

Let's start off by talking about why Kubeflow. Here's an example of a product that one of our collaborators - Hamel, who's a data scientist at GitHub - is trying to build. What you're seeing is a search engine for code. He typed the query "ping REST API server" into the search engine, and we're seeing it return some results, which are samples of code pulled from GitHub. The key thing here is that, if you actually look at the examples he's returning, none of the words that appear in the search query appear in the code. So it's not like he's doing keyword searching and matching against, say, the name of the function or the comments in the code. The reason is that he's actually used machine learning to learn what that code is doing, and then learned how to map that to natural language, so that he can search using natural language. And this has a lot of amazing use cases for something like GitHub, and outside GitHub, because now you can really use GitHub to search code - to find code examples that do what you need. It has other applications too: people who aren't, let's say, English speakers can search for code using their native language, and find code that was perhaps written by somebody who doesn't speak their language.
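To make the mechanics concrete: what Jeremy is describing amounts to embedding both code and natural-language queries into a single shared vector space, then answering queries by nearest-neighbor search. Below is a toy sketch of that shape - not Hamel's actual system; the learned encoder is stubbed out with a deterministic random projection purely for illustration.

```python
import numpy as np

# Hypothetical encoder: in the real system, a model trained on (code,
# docstring) pairs maps both code and natural language into one shared
# vector space. Here we stub it out with a bag-of-tokens random
# projection, just to show the shape of the search.
RNG = np.random.default_rng(0)
VOCAB_DIM, EMBED_DIM = 4096, 128
PROJECTION = RNG.normal(size=(VOCAB_DIM, EMBED_DIM))

def embed(text: str) -> np.ndarray:
    """Stand-in for a learned encoder: text -> unit-length dense vector."""
    v = np.zeros(VOCAB_DIM)
    for tok in text.lower().split():
        v[hash(tok) % VOCAB_DIM] += 1.0
    e = v @ PROJECTION
    return e / (np.linalg.norm(e) + 1e-9)

# Index time: embed every code snippet once, ahead of time.
snippets = [
    "def http_get(url): return requests.get(url, timeout=5)",
    "def parse_csv(path): return list(csv.reader(open(path)))",
]
index = np.stack([embed(s) for s in snippets])

# Query time: embed the natural-language query and take the nearest
# neighbor by cosine similarity -- no keyword overlap required.
query = embed("ping rest api server")
best = int(np.argmax(index @ query))
print(snippets[best])
```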
So this slide captures the journey he went through, and it captures the sort of challenge we see throughout the industry. He started off by building a prototype and exploring his idea in a notebook. A notebook, in case folks aren't familiar, is an interactive environment that data scientists use to rapidly prototype and test ideas. With that, he spent about two weeks trying to build a model using some data from GitHub, and then running some predictions inside the notebook to do a sanity check - to see whether the results he was getting were useful and valuable. After about two weeks, he thought that was successful: he saw that there was merit to this idea, that he could actually build a machine learning model to solve this problem. So then he spent about three days turning this into a pretty web application - putting it behind a pretty web server, which you can see here, bundling it up into a Flask app. He did that so he could create a prototype that he could shop around GitHub, get people really excited, and get them to invest in it. But then the next step was actually launching this publicly, as an experiment on experiments.github.com, and that took three months. So most of his time and energy as a data scientist was not spent developing the model or trying to make the model better; it was spent getting the model into production and getting it launched. And this is what we see throughout the industry.

The reason we see this is because, in reality, when you're building an ML product, what you end up with is a complex distributed system, and you end up with all the DevOps challenges associated with managing large, complex systems. This diagram is taken from a paper published by Google that was intended to describe why we spend so much time investing in machine-learning-related infrastructure. The key takeaway is that your machine learning code - the algorithm - in terms of the energy and effort spent, actually ends up being only a small part of your investment. Most of your time and energy ends up going into the supporting infrastructure around it that actually turns this into a product: managing your resources, like your VMs or your CPUs; doing logging and monitoring; security; all of that stuff. And so this is really the genesis of Kubeflow. We said: Kubernetes is really great at DevOps, it's really great at building complex distributed systems - can we take advantage of Kubernetes and really create a great platform for machine learning?

Let's describe Kubeflow as a platform in a little bit more detail. This is what the machine learning development workflow looks like. As a data scientist trying to build a product, these are the kinds of steps you go through. You start off by doing data analysis and data transformation: where do I get my data from? How do I transform that data into data that I can feed into my model? How do I clean and validate that data?
Then you go into the next step, where you actually have to build a model and train it using your framework, whether that's TensorFlow or scikit-learn. Once you have the model, you actually have to roll it out into production. And once you have a model running in production, serving traffic, you have all the DevOps challenges associated with keeping it up and running and healthy. For each of these steps, there are different frameworks, applications, tools, and libraries that you might want to be using. Some of those are going to be developed by the Kubeflow community - that's what we've indicated by the stars. Those are the steps where Kubeflow has seen some gaps, and we've decided to try to fill those gaps by building some applications and libraries. But there's also a large ecosystem of tooling that already exists out in the community, and we want to make it super easy for people to use that in a cohesive platform.

This slide is aimed at giving you a sense of what the ML landscape looks like; it's a diagram produced by the Deep Learning Foundation, under the Linux Foundation. What they've done is divide the world up into these different areas: you have traditional machine learning, you have deep learning, then you have your models, you have natural language processing... Oh, folks - that second "deep learning" is actually supposed to be data management, I apologize. Each of these boxes you can blow up, and you have a lot of different libraries and applications that you can use. So if we look at traditional machine learning, we have libraries like Spark MLlib that you can use; you can use libraries like scikit-learn; you can use XGBoost. Within deep learning, you can use TensorFlow or PyTorch or a whole bunch of other frameworks. And even then, once you've picked your framework, you have a bunch of high-level libraries, like Keras, that you can use to really build your models, right?
So the challenge we see is that, as a data scientist or ML engineer, you have this incredible landscape that's very rich and full of tools, but actually cobbling these together into a cohesive platform is a really challenging and time-consuming problem - which is not where you want to be spending your time. And so with Kubeflow, we're trying to organize the world and create a cohesive platform that's really easy for data scientists and ML engineers to consume.

This is how we diagram our architecture, and it's one way to view Kubeflow and get a sense of what it looks like. At the lowest level, we have a bunch of APIs and services that are very low-level, in the sense that they do one thing and they try to do that one thing very well. Some of these services are going to be developed within Kubeflow by the Kubeflow community, but a lot of them come from outside the Kubeflow community, and we're just going to incorporate and integrate them. As an example, we've created some Kubernetes custom resources to make it super easy to take advantage of Kubernetes to run deep learning frameworks in a distributed fashion, so you can scale out to train really large models on really large data by using multiple machines (there's a small sketch of what submitting one of these looks like below). But we're also leveraging services and applications in the broader landscape, like Argo, which is an orchestration framework for directed-acyclic-graph-type workflows, and we've been starting to work with Spark - there's a Spark custom resource for running Spark jobs on Kubernetes.

Then, a layer above this, we have some "systems," as I'll call them, that combine some of these lower-level services into more complete functionality. So, one example: we have a pipelines system, which combines a whole bunch of different low-level services to give users a full experience in terms of describing, running, and managing complex ML workflows. We use some of these low-level resources, like Argo, to run workflows, but then we provide a UI on top of that which makes it easy to visualize your runs and the results - like the ROC curves for your models - as well as run those on a regular schedule. So then you can actually combine these services into complex workflows, like I showed you before, where you don't just train a model: you preprocess your data, then you train your model, and then you roll it out into production.

And then, at the layer above this, we have a bunch of tools and libraries. We have some command-line tools, like Arena, and we have some libraries, like Fairing. They're really focused on end users, trying to create that usability layer that hides some of the low-level details of Kubernetes and some of the other infrastructure, so that we have a really data-scientist-friendly and usable platform. And then we have what I'll call these vertical areas - orchestration being one - where we try to tie all these layers together so that we provide a cohesive platform. Orchestration is one because data scientists want to do these complex ML workflows, so they have to tie together multiple services.
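Here is a minimal sketch of the kind of Kubernetes custom resource Jeremy mentions for distributed training - a TFJob, submitted with the standard Kubernetes Python client. The namespace, job name, and image are hypothetical, and the API version shown is roughly the Kubeflow 0.4/0.5-era one; check the tf-operator docs for your release.

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a cluster

tfjob = {
    "apiVersion": "kubeflow.org/v1beta1",
    "kind": "TFJob",
    "metadata": {"name": "dist-train", "namespace": "kubeflow"},
    "spec": {
        "tfReplicaSpecs": {
            # Scale out across machines just by asking for more replicas.
            "Worker": {
                "replicas": 4,
                "template": {"spec": {"containers": [{
                    "name": "tensorflow",
                    "image": "gcr.io/my-project/train:v1",  # hypothetical
                }]}},
            },
            # Parameter servers for the distributed job.
            "PS": {
                "replicas": 2,
                "template": {"spec": {"containers": [{
                    "name": "tensorflow",
                    "image": "gcr.io/my-project/train:v1",  # hypothetical
                }]}},
            },
        }
    },
}

# The operator watches for TFJob objects and spins up the pods.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubeflow.org", version="v1beta1",
    namespace="kubeflow", plural="tfjobs", body=tfjob,
)
```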
And then metadata is the other one, because with all these complex workflows, at every stage we want to keep track of what's happening, so that we can understand what data we processed and how it was used to produce our models.

So here's the takeaway: we want to produce a platform where users can consume their favorite ML applications and libraries. That's what we mean by composable - they can use whatever library or framework they want. We want to be scalable: we want to enable them to run lots of jobs, or take advantage of multiple machines to scale out horizontally, and also vertically, by using larger machines or GPUs. And we want to be portable: we want to be able to run in the cloud, or in an on-prem data center, or even on a laptop, and for that we're taking advantage of Kubernetes to provide a platform that runs anywhere. Our hope, really - and one of the reasons we're trying to grow the community - is that as more apps integrate with Kubeflow, that's going to attract more users to Kubeflow, because they can find what they're looking for on Kubeflow, and that in turn is going to convince more people to make their apps play well and integrate well with Kubeflow. We're really trying to drive that virtuous cycle. And so with that, I'm going to hand it over to Thea to talk about our community.

Thanks, Jeremy. I'm going to start off by talking about some of the core principles that we've established in the Kubeflow community, which help us organize not only how we structure ourselves socially, but also how we pursue some of our technical challenges. The first of these three core principles is, for us, the most important: we are an open and inclusive community. One of the things that Jeremy touched on is that building an ML platform is a giant challenge, and we can't do it alone. We think that the Kubernetes community is a great model for how to successfully engage a large community to tackle a huge technical hurdle like this, and we really want to architect our community in a way that empowers contributors to take ownership of the compatibility points. What this means is that we consider all members of the community to be equal, with an equal opportunity to contribute their ideas and expertise; all ideas and perspectives are welcome. So whenever we put a PoC or a proposal out there, it's really important to us that we hear from a wide variety of both users and contributors, to make sure that Kubeflow serves a wide range of products. We're also doing something internally right now within the community to define the project's governance; we think that it's really important to have a diverse set of very invested leaders, from a variety of companies that are invested in Kubeflow. And then we also have a really interesting set of community guidelines - basically, more like what to do and what not to do.
We're actually hoping to roll that out as something that others can borrow from as well. And just in closing for this slide: I think it's really important to have a diverse community of builders serving a very, very diverse community of users - you don't have just one profile. I'm going to hand it back over to Jeremy to cover two other principles that help us organize.

Yeah, so we have two principles that are a little bit more technical in nature - well, the first is a little bit more product in nature. The first is: we really want to focus on having a low bar and a high ceiling. Low bar means we really want to make it super easy to get started with Kubeflow - so, minimize the number of Kubernetes concepts that people have to learn to get started. If you go back to that original slide about what Hamel was doing as a data scientist: we want to make it super easy for somebody like Hamel, who's not a DevOps engineer, to get started prototyping and developing his model in a notebook on Kubeflow, without getting distracted by learning Kubernetes. But then, once he has to get that model into production, and he has to start thinking about some of the more complex DevOps things, like scaling his model and security, we want that to be a smooth transition to doing those more complex tasks. We don't want him to fall off a cliff, where now he's got to completely rewrite his model and choose a different set of tools in order to take that model into production.

And then the final principle is that we want to be Kubernetes-native, and one of the things this means is that we're taking a hard dependency on Kubernetes. One of the reasons we do that, as a community, is that when we're designing how we're going to do things - whether it's individual applications, or things like how we're going to support notebooks - we want to be able to focus on how we're going to do it on Kubernetes, and not have to think about, and have discussions about, what to do and how to support non-Kubernetes-native platforms. This really helps us at a technical level. And one of the reasons we can do this is because Kubernetes has been super successful, and it's now supported and runs almost anywhere. So we see this as a huge advantage. And with that, I'm going to hand it back to Thea.

I'm going to dive a little bit deeper into our first principle - openness and inclusivity - and what that means in practice for our community. We believe that open source is a powerful model for collaboration for a project like this, which has so many different components.
There are so many different communities that we need to be compatible with, and we really draw on the expertise of our community members to help provide that compatibility over a large surface area. We also want to provide a kind of test bed for machine learning innovation. Machine learning is still in the early stages as far as the tooling that we need to support machine learning programs in production, and we really want to decentralize the innovation that happens in the project itself - to make sure that people feel comfortable sharing their ideas, and that the different components in the project serve as test pieces for how to incorporate different technologies, to see which ideas prove out to be the best, based on their usage within the community's components in the GitHub organization.

That's worked out well so far. We have a really giant community - well, "giant" is a relative term, but I think it's growing very, very quickly. As far as our 0.4 release: we had 137 individual PR creators, which represented a 47 percent increase over the last release. We had 1,053 PRs for 0.4, which is also a 43 percent increase over the previous release. We have 140-and-counting community contributors, who come from 25-plus companies; you can just take a look at the logos on the right-hand side to see the people investing in the project today, and there are many more than can fit on this slide.

Okay, so Jeremy mentioned how important it is to provide integrations with Kubeflow, to demonstrate its use in the machine learning market today. We have some of our best demo creators in the audience right now, and we have a pretty good path for how to share your ideas, as far as interesting integration projects and PoCs in the community. So I just wanted to tell you how that works in practice. First, it's really great to connect with us in all the community channels we use: that's Slack, for asynchronous communication; GitHub, which is our center of gravity and source of truth for all technical conversations in the community; and then our mailing list, which is more how we socialize ideas and make sure people are aware of what's going on in the project. After you get traction around your idea in one of those channels and come up with an interesting demo, you can demo it at our community meetings; we have weekly community meetings that happen at alternating time zones, to make sure that we serve all of our community members wherever they happen to be. We also encourage people who have gone through the effort of creating a demo to contribute it to our Medium blog. The Medium blog is in the process of becoming a better place for ecosystem projects to be shared and heard, but we really would like more contributors on the blog, so please let us know if you're interested in providing content.

Okay, so a more generic topic: contribution. If you are interested in getting more involved in the project - whether you have an idea for an issue that you want to solve, or you just want to somehow get involved - first, we think it's really important to start by participating in community meetings and participating in, or reading, mailing-list discussions. That's the best way to surround yourself in the technical milieu of the project. And then, of course, there's our contributor guide.
It's really important that you acquaint yourself with the processes that we use to get contributions through, to make sure that your contribution doesn't fall through the cracks, and that the investment you put into it is well spent. A good place to start is the "good first issue" or "help wanted" issues that are tagged in the GitHub repos. Then open a GitHub issue, and create a small proposal for more involved changes, which you can use to build consensus with the community about your proposed design. And then, once you've gotten that, move to the PR, and after feedback, bring it to the maintainers, bring it to community meetings, until you get the feedback you need to get it merged.

Code is not the only thing that we need as far as contributions to the project. Other ways to help: evangelism and outreach - one good example is attending and speaking at events just like these. Documentation is really important for usability; we have a fascinating project that constantly changes, and we have a really high need for more people to collaborate with us on documentation. Issue triage is important as well - we have about 444 issues open in the repo right now, and it would be great to have more community members contributing to organizing those issues and setting priorities. Examples and tutorials kind of define the happy path for our users who are testing out the product, or serve as a reference for some of our more advanced users. Testing and release infrastructure is a little bit of an unsung hero, but it's so important, and we're going to figure out how, in the community, we can make sure that testing and release contributors are recognized - I think they're the most important people. And then: sharing your use cases, trying Kubeflow, giving feedback. This is an example of a use case that was contributed by a community member to our documentation website, but it's actually just one use case currently in the documentation - in the use-case folder on that website. We need more, and we're really looking forward to hearing how you're using Kubeflow today and how other people can use it for similar cases. Finally, you can organize an event like this. So I want to give a strong thank-you to Josh Bottum for putting all of this together; this came together at the last minute, and we need many more people bringing Kubeflow to their local communities across the US and the world.

Okay, so I'm going to dive really quickly into our release process. We release quarterly, and this release process is managed by community members who have come together in a product management working group. For each release - and sometimes for multiple releases - we define a critical user journey, which is a key experience that we think is important to develop to support our users and make them more successful at their work. After we've agreed on a critical user journey to organize our work, we take it to users, get their feedback, and then, based on that feedback, we organize and prioritize our issues into release themes, which community members then step up to coordinate throughout the life cycle of the release.
We also report on release progress weekly, in our community meetings and also in our product management working group.

Another really key theme that's kind of blossoming in our community right now is user-driven development. We just released our first-ever Kubeflow user survey and got a really great response from the community. It was run and created entirely by the community - a joint effort from Josh Bottum, Elvira, Francisco, and me, with some advice from our UX researcher at Google. We wanted to dig into who, why, where, and how people are using Kubeflow today. We got 81 responses, which is actually a really, really big response for a small survey like this, and we hope that this is a first step towards building a Kubeflow user council that will eventually give us programmatic feedback on our CUJs and help us set our roadmap and release milestones.

Just a couple of interesting slides I wanted to share, based on results from that survey. This is an example of how we break down the users as far as what their primary roles are within their organizations: data and machine learning engineers are the biggest users of Kubeflow right now; we hope to eventually expand the number of data scientists using Kubeflow as well. Mostly enterprise users are adopting Kubeflow, which was a little bit affirming for us - we think that Kubeflow is most useful in enterprise settings, so that was good to see. And then, finally, this is a look at what users are looking to improve in their use of Kubeflow, as far as their existing machine learning processes. I think this kind of program is going to help us make sure that we're steering Kubeflow in the right direction. It's really important, because many of the people working on Kubeflow are not necessarily the target user, and so having this program where we programmatically get user feedback is critical to driving our release milestones. And now we're going to have a demo of our 0.5 release, which Jeremy will show you.

This is a quick video of our demo. The idea here is that we've deployed Kubeflow, and we've just gotten started; we're showing you, through the dashboard in GCP, all the different services that ended up getting deployed as part of your Kubeflow deployment.
We're just trying to illustrate the overall breadth of the platform. But the key thing starts right now, where you can see that what I'm doing is creating a namespace. One of the things we're focused on is enabling teams to operate in an environment where everybody gets their own isolated environment in which to work. After I've created the namespace, you can see this is our central dashboard, which allows you to navigate between all the different services, and from here we've clicked on the JupyterHub page to actually spawn a notebook. You can see we have this form-based approach where you can easily spawn different notebooks, and you can specify the resources and the Docker image that you want to use, as well as a name - so from here you can spawn multiple notebooks. Then you just click Spawn, and now we're going to wait a second for it to spin up; it takes a little bit of time, just because you have to pull down the Docker image and get that started.

And while we're waiting a second for that to start: one of our calls to action today is that we'd love to have people try out and share Kubeflow. So try out Kubeflow - go to our web page, get started with it, and then share your experience with us on Slack, GitHub, or the mailing list.

And then the final piece that's showing up in the demo: you can see we're actually spinning up a notebook. We're cloning an example from our repository, and in this notebook we're actually defining our model - it'll come up in a second. One of the things we've added to our notebooks is a library called Fairing, which is going to be in our 0.5 release. This allows you, from your notebook, to automatically turn your notebook into a Docker container and then spawn that up on Kubernetes, either for running the training or for doing the deployment of the model. So we're completely hiding Kubernetes from the data scientist, in an effort to make it more accessible.
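A rough sketch of the notebook-to-Kubernetes flow Jeremy describes, using the Kubeflow Fairing library. The registry and base image are hypothetical placeholders, and the exact API shown here is approximate - older releases used `import fairing`; check the Fairing docs for your release.

```python
from kubeflow import fairing

DOCKER_REGISTRY = "gcr.io/my-project"  # hypothetical registry

def train():
    # Normal model-training code, written as if it ran locally.
    ...

if __name__ == "__main__":
    # Package the surrounding code into a Docker image, push it, and run
    # it as a job on the Kubernetes cluster -- no kubectl involved.
    fairing.config.set_builder(
        "append",
        registry=DOCKER_REGISTRY,
        base_image="gcr.io/kubeflow-images-public/fairing:dev",  # hypothetical
    )
    fairing.config.set_deployer("job")
    remote_train = fairing.config.fn(train)
    remote_train()  # executes train() remotely on the cluster
```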
And so with that - okay. Yeah, so what we'd ask of you, as you depart from SCaLE later today, is to try Kubeflow and give us feedback. If you run into some major blocker, we'd love to hear about that. And then finally, we are looking for local meetup organizers, or event organizers, or speakers who are interested in starting up a community here in LA. If that's something you're interested in, please come and talk to me at lunch, or just email me - Thea Lamkin, at google.com - and I'd love to work with you. Thanks. Do you guys have any questions?

In your slides you mentioned MPI. Is that Message Passing Interface? Okay, thank you. Any other questions?

So, big fan of this project. Is there any future roadmap to incorporate more classical machine learning frameworks, like scikit-learn? Right now it seems very deep-learning-centered. Yeah, so we're making a big push to better support Python frameworks; they're already supported in Kubeflow, we just haven't done a really great job promoting that. But we're looking at some of these other efforts - one of the things we're promoting is our XGBoost support, and some more traditional frameworks. Any other questions? Ah, great. Thank you.

Is GPU resource management addressed by Kubeflow at all, or is that kind of just... Yeah, mostly we rely on Kubernetes to handle resource management, including GPUs, and mostly we just provide sugar on top of that. So, as an example, in our Jupyter notebook spawner we've made it super easy to attach a GPU to your notebook, but under the hood it's Kubernetes; we're just surfacing it in a data-scientist-friendly way. Any other questions? Awesome. Thank you. I'll give it back to Josh, I think.

So, while Timo is getting set up, just by show of hands: it was interesting - in the survey we saw more machine learning platform engineers using Kubeflow than data scientists, which was kind of reversed from what we thought. So, show of hands: are there any machine learning platform engineers in the room? Right. What about data scientists? Right, so we have some of both. So, Timo, you're getting ready here... I'm surprised we don't have more questions. Come on, there's got to be a couple of stumpers for Jeremy; he's done such a good job on this. Is this kind of what you guys were expecting to see from the platform? Is that a good introduction? Well, so give me an idea: why did you show up? "It just seemed like an interesting track to be on for day one. I wanted to learn more about Kubernetes, and I'm interested in getting more into ML as well, so I'm a bit of a novice. But, yeah, it seemed like a good place to meet like-minded people too." That's really it - I mean, if you get a chance to meet four or five people... I think what we saw when we did this before is that people started working together and helping each other solve problems, and it was a nice kind of flow.

I think we'll go ahead and actually get started; my colleague is just grabbing some water back here. I'm Timo Mechler, and this is Charles Adetiloye, who's just walking up to the stage. We're both from MavenCode, and we're going to talk to you today about orchestrating and deploying machine learning platforms at scale with Kubeflow. But before we really get into the meat of our presentation, we're going to talk for just a few minutes about our company and ourselves. Charles and I both work for MavenCode; we're a Dallas-Fort Worth-based software and consulting company. We started in about 2008 and have worked with a variety of clients - both large and small companies - in the data space since that point. Today, our primary areas of expertise are in providing our clients the deployment of scalable AI and ML pipelines, and building out the related infrastructure. We also work with our clients to do the actual modeling once we deploy the data pipelines. More broadly, we offer cloud services to our clients: we still work with clients today to help bring legacy applications into the cloud, move those applications to microservices, and provide consulting around Kubernetes - deploying it, managing it, and, more recently, with Kubeflow, which is why we're here today. We also have a data security practice in our company; we're a data company and we take data security very seriously, so this is one thing that we offer. And then, finally, we also offer training, to both small teams and enterprises, on these topics.

A little bit about ourselves. So again, my name is Timo, and I'm an architect and product manager at MavenCode; I've been with the company about one and a half years. Prior to joining MavenCode,
I spent a good seven to eight years working in energy commodities trading, both as an analyst and a strategist, working on trading desks, building out scalable research platforms and developing trade strategies, working a lot with traders. And then Charles, next to me here, has got well over a decade of experience working with many different companies and building large-scale data software platforms, again across a variety of different verticals. There's a lot of experience there. But that's enough about us.

We always like to start with a motivating question, and that is: we're all here because there's some interest in AI and ML, but why is the time right, right now, to do machine learning? For us, it really comes down to three main things. Number one is just the availability of data and the amount of data being generated - everything seems to generate data, and we're able to store that data. In fact, I think we'd all be surprised how much data is being generated just in this room, on our devices, right now. But it's not just the data itself; it's also the ability to process that data - having these computing resources, having self-service cloud providers like Google, Microsoft, and AWS, and the ability to spin up a thousand boxes and machines with just a few clicks of a button, to really process large-scale data. And the third thing that's been really important is the modeling: all these modeling frameworks that have come about in the last few years and had time to grow and mature. So it's really the combination of these three things that's gotten us to why the time is right, right now, to do machine learning - and to do machine learning at scale.

So if we've got everything there, then okay, things should be pretty straightforward, right? Anybody can just start on machine learning. Well, we've found through our practice that there's still a lot of perception that machine learning is kind of simple, like this: you start with some data - maybe it's pretty pictures of puppies, maybe it's a time series - then you sit down for a while, hack away at a laptop or desktop, write a lot of really cool machine learning code, and out comes a powerful predictive model. Well, you can't really do things that way; the reality of the matter is that things aren't quite that simple. And just as in the presentation before, we also saw that writing the actual code is just a really small component of a machine learning pipeline. There's a ton of data collection that needs to go on to really get good model performance and be able to do machine learning at scale. You have data, but is it good data, verified data? You have to label it, you have to transform it, and you have to store it; then you have to set up all the infrastructure for that data ingestion. Once you actually have your data and you start modeling, it becomes about actually describing the data properly: do you have the right features? You have to do feature engineering. And once you've done all this, there's more yet. Let's say you have a model: okay, now I'm ready to put this thing into production. Well, you're not done yet - there's serving infrastructure to worry about. Once you get it into production, you have to monitor it: is the model performing well? Is bias being developed?
Does it have to be retrained? Then there's the management of all these resources, all these machines, etc., that you have to worry about. And finally, there's the analysis on top of it all. So really, an actual machine learning pipeline is far more complex than just sitting down one afternoon and writing code.

We at MavenCode, through our consulting engagements over the last decade - particularly over the last few years - also realized this, and so what we've done internally is build our own software platform to make this simple. That software platform is called SmartDeploy AI, and I'm just going to give a quick plug for it before we get back to Kubeflow. Essentially, what SmartDeploy AI does is take away this side of the diagram and help automate it. We've built a platform that helps anyone really quickly bring up scalable AI and ML data pipelines in the cloud or on-premise. So we really have an automated platform that allows you to do that, and this is great if you're in a situation where, again, you may not have the DevOps expertise but you really want to do AI and ML at scale. With SmartDeploy, we can cut down the time to get you running at scale, and cut the cost down. So we think this is going to be a wonderful tool as AI and ML get more democratized, and small and mid-sized organizations are looking to really get ramped up on this, instead of spending a lot of money hiring a lot of DevOps engineers and multiple data scientists and developers.

Very quickly, to drive that home: our SmartDeploy AI platform really makes it seamless to set up scalable, end-to-end AI and ML platforms in the cloud or on-premise. We are already using this internally with our client engagements, and we're looking at a broader release to the public later this spring. One quick last plug about the platform: in the future, we do integrate it with Kubeflow, and one nice thing that we also offer is intelligent monitoring of that whole data pipeline. For the people involved in the whole AI and ML process, there's a collaboration workflow, and we offer metrics so that anyone, from the data scientists all the way up to the C-suite, can see what's going on: are we seeing the right return on this product? How much are we spending to be able to see this? And so on. That is really a big part of the platform; we're very, very excited about it. And we actually got certified - or accepted - into a Google Cloud partner program just this past month; we're very excited about that.

But enough about our product. Let's go back and talk real briefly about what a typical machine learning production deployment lifecycle looks like for us at MavenCode. We've worked with a variety of different clients in a variety of different industries - energy, oil and gas, telecom, retail - and it always starts out by having some domain knowledge. If you don't understand the industry that you're in, and you're trying to model something you don't understand, it might not turn out so well. So we start with that, and then it's all about the data - making sure you have the right data. You acquire data; you have to label, transform, and store that data to make sure it's actually ready for modeling. Once you've got your data, then we work with the client and say: okay, what's the right kind of model to use? What should we be doing to model this?
And then you start modeling. Once you've got a model, you start training, you start tuning: am I getting the right performance? You've got testing and validating; you might go back and say, oh, I don't have enough data, and you just start over, collect more data, then try again, and model some more. Then, after a while of this iterative process, you say: okay, I've got something that I'm ready to put into production. And once you're at that point, we've always encouraged our clients to deploy on Kubernetes, because we've worked a lot with deploying cloud-native applications in the cloud, and Kubernetes provides this common layer for deployment.

The whole point about this production lifecycle is that you want to be able to repeat it over and over and over again, without having to spend a lot of time and a lot of money to redo it. For example, if you're a company and, let's say, you have multiple power plants, and you're trying to model each plant separately, you should be able to use the same lifecycle, the same process, for each one, without having to spend six to nine months to get it going. So you're really looking to have a common deployment platform. It's also important for organizations: if you're the data scientist in the organization, as we heard in the last presentation, you do not want to focus on the deployment details. You want to focus on the modeling, the domain knowledge, describing the data properly, collecting the data. By being able to deploy with Kubernetes, we are able to do things like model serving, storing performance evaluations, monitoring, versioning, and so forth. So really, we want to use Kubernetes - which comes with the concept of deploying cloud-native applications - to build a cloud-native machine learning stack. That's what we're trying to do. The idea is that it doesn't matter what infrastructure you're on - whether you're on Google, Microsoft, Amazon, or maybe on-premise - Kubernetes provides that common layer, and then, by putting Kubeflow on top, we're able to do machine learning deployment easily. The data scientist can interface with that infrastructure through a Jupyter notebook, and really just has to worry about what modeling framework and process to follow: do I want to be training in TensorFlow, MXNet, PyTorch, or even something else? So with that, I'm going to pass it over to my colleague, who's going to talk in a little more detail about Kubeflow Pipelines and how we deploy them.

Thank you so much. Yeah, so a lot of the projects we work on are to help clients take data all the way from ingestion to where we can model it and do some analysis with it. It's a typical use case for building pipelines: we have clients generating data from sensors - maybe from wind plants, or from oil wells, or from all these small-scale IoT devices. In this case, we're running in the cloud, so we run all our analytics on Google Cloud. Our typical framework looks like this. One of the things we've done is to leverage Kubeflow Pipelines, which is one of the great projects the Kubeflow community is working on. We want to be able to compose this infrastructure and basically be able to deploy it and make it very repeatable. So in this case, we're trying to deploy and train a model and push it out to a device.
On all of these devices, we run our TensorFlow model with TensorFlow Lite. So all we do is ingest the streaming data coming into Pub/Sub here - it could be Kafka, but it could be anything else. We use Dataflow (you could use Dataproc as well, or you could run Spark or Apache Beam processors to do that), and we write it out to a storage location. So we want to be able to compose all these steps with Kubeflow, and one of the things we've evolved to over the years - one of the things we're doing right now - is running our pipelines with Kubeflow Pipelines. We can compose all these small steps: we have a task step for the Pub/Sub here, another task step for bootstrapping the ETL (the Dataflow), then another task step to write it to a storage location where you can use it to train a model.

So what's a Kubeflow pipeline like? A Kubeflow pipeline basically allows you to bootstrap your infrastructure and run your runtime code. Everything is a Docker container. You start up that Docker container; it kicks off, it bootstraps your infrastructure - so, let's say you're connecting to a queue: it's going to launch the connection you need to connect to that queue - and then it will execute your runtime code. A typical setup looks like this: we have our runtimes - I mean, the components that you code that bootstrap the infrastructure - and then, inside that Docker container, the runtime code that connects to the queue, and everything we need in there. So this is a snapshot of a project setup. The Dockerfile basically wraps everything up, and you deploy it, and that's a unit of work - that's one of the tasks over here. The whole idea is for us to compose all these tasks and create a DAG, which is what becomes the Kubeflow pipeline. So, looking back at the pipeline we had on the previous slide, it's going to look like this: the first step will be to connect to the queue, so we'll read whatever message we have on the queue; then we do an ETL process, probably to clean up the data set - hello?
Oh yeah, sorry about that. So the first step will be for us to connect to the queue and read the message from the queue. Then the next step will be for us to do an ETL process - maybe you want to clean up the data and reorganize the data, things like that. Then the next step will be for us to roll it up and write it to a location where we can feed the data sets to our machine learning training. So overall, the pipeline looks like this: we bootstrap our pipeline, and the pipeline contains all these stages - the Dataflow stage, the roll-up stage, the model-training stage - and there's the Kubeflow Pipelines annotation for you to compose all these steps. This annotation helps you compile and generate an executable compressed file, which is an Argo workflow - a big contribution from the Argo team - and we can now deploy it and use it to run our model. Once we do that, the generated zip file, which contains the pipeline information and all the metadata about the pipeline, gets uploaded to our Kubeflow Pipelines instance. From there, you can see the DAG and all the stages. One of the cool things about this is that you can debug each step of the pipeline. So let's say I'm ingesting data and something is not coming up properly: I can go in and debug that stage. Maybe I need to update my ETL process, or I need to chain my ETL process to some other process as well - I can do that. So this is a simple post-processing step as part of our pipeline, and we can run multiple of these in parallel. Once you upload the pipeline information, you get this nice diagram that illustrates your DAG and everything going on in the pipeline.

The cool thing about this is that, as Jeremy and Timo alluded to earlier, whenever you're building a machine learning pipeline, you want to be able to manage the process and not get entangled with all these other things. One advantage of building a pipeline this way, which we've seen over the past few months, is that you can try out different things. The customer comes in and says: hey, we have this new data source, we have these new events coming in as part of the event source, and we want them appended to what we're doing. In the past, doing this was really a pain. But now we can create a different version of the container code running as one of the task units, and we can quickly test it out. If everything works out good, then we can flip over to that version, since all these things are just Docker images that you chain along your pipeline. So I can decide to create a new version, test it out real quick - maybe reading all the events - and if everything is okay, then I can just promote it and decommission the old part of the pipeline. And this applies to any other part of the pipeline. I may decide to do some more tests on the training of my model: I might want to do some more training, create a new version of the task on the stage that does the training, and deploy it as a Docker container, and once everything works out good, then I can flip it over. The other advantage of doing this is that I can run multiple experiments. Normally, whenever you're doing machine learning, you want to test things out - you want to change your hyperparameters and see how things go. With this, we can run multiple experiments concurrently, by changing the hyperparameters, running them through the pipeline, and checking everything out.
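Here is a minimal sketch of the compose-and-compile flow Charles describes, using the Kubeflow Pipelines (KFP) SDK. All image names are hypothetical placeholders standing in for the per-task containers; the real steps would be MavenCode's own images.

```python
import kfp.dsl as dsl
from kfp.compiler import Compiler


def ingest_op():
    # Task step 1: read messages off the queue (Pub/Sub, Kafka, ...).
    return dsl.ContainerOp(
        name="ingest",
        image="gcr.io/my-project/ingest:v1",  # hypothetical image
    )


def etl_op():
    # Task step 2: clean and reorganize the data set.
    return dsl.ContainerOp(
        name="etl",
        image="gcr.io/my-project/etl:v1",  # hypothetical image
    )


def train_op():
    # Task step 3: train the model on the rolled-up data.
    return dsl.ContainerOp(
        name="train",
        image="gcr.io/my-project/train:v1",  # hypothetical image
    )


@dsl.pipeline(name="iot-telemetry", description="ingest -> etl -> train")
def telemetry_pipeline():
    # Each step is just a Docker image; chaining them builds the DAG,
    # so swapping in a new version of one task is a one-line change.
    ingest = ingest_op()
    etl = etl_op().after(ingest)
    train_op().after(etl)


if __name__ == "__main__":
    # The decorator plus the compiler produce the compressed Argo
    # workflow file that gets uploaded to the Kubeflow Pipelines UI.
    Compiler().compile(telemetry_pipeline, "telemetry_pipeline.tar.gz")
```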
So with that, I can have different versions of my model and do quick A/B testing to see how things are going. To that point, the advantage of Kubeflow Pipelines for us is mainly allowing us to scale: doing this training is a lot easier than before, when we first got started, where we had to bootstrap all these things by writing shell scripts to do all that and stitching things together. We can move faster - iteration is a lot faster right now - so we can easily train multiple models at once, see the results, and see how everything is doing. And overall, our productivity has increased as a result of this.

Then there's serving the model. The other part I didn't show in this demo is the serving path. With this, we can serve our model and see how the model is doing in production; we can do quick A/B testing to see which model is better, and based on that we can decide to roll out the new model, or promote or decommission the older model. And the other cool thing is that, because all these things are Dockerized - they're containers, and each stage is part of the pipeline - there are a lot of these Docker containers out there that we can leverage, and a lot of them are pre-built. Take TFMA, TensorFlow Model Analysis, where you can do real-time analysis of what's going on with your model and see how two models are doing. Or TensorFlow Transform, which is built on Apache Beam; we use that a lot for all our Dataflow ETL processes. So we can leverage all these pre-built Docker containers that contain the things we want to do, and we can control when we decide to adopt a particular release or change, because all these things are building blocks. One of the things we noticed is that if you're using Apache Beam, and there's a slight bump in its version and we're not aware of that, it can derail a lot of things we're doing. So we want to be able to control our own internal environments whenever we're building all these things. With this, we have a lot more control in terms of how we deploy things and how we accept things from the community into our internal development ecosystem. And overall, it makes things a lot more portable, because Kubernetes is a common baseline for all these infrastructures, across all the cloud companies. We want to be able to build things that we can repeat across all our customers: whether you're running on Amazon, on Google, on Azure, or on-prem, as long as you're running Kubernetes as a base, I think we can easily help you get things going with your team. And overall, for us, it's the efficiency for our team in terms of getting things out and getting things going for everybody. So I'm going to hand it back over to Timo; if you have any questions, we'll be able to answer those as well.

Yeah, just a quick thank-you to everybody: Google in particular, for giving us a bunch of credits to burn on Google Cloud to get all this up and working; the conference organizers here, and especially Josh, for putting all the effort in; and the Kubeflow team, for working with us. And then finally, just a little bit about how to get in contact with us: again, we're a consulting company, so if you're looking to get spun up and build a data pipeline in the cloud, or if you're interested in learning more about our SmartDeploy AI platform, come talk to us afterwards; we'd be happy to discuss that. So with that, I'll open it up for any questions from the audience. Yes? Having looked at some of the slides, on Google and Azure, does it work like the
Amazon AWS SageMaker? So the question was: does this work like Amazon's AWS SageMaker? To some extent, yes. But the thing about SageMaker is that you're locked into Amazon. If you're trying to be cloud-native, you don't want to be locked down to a particular cloud vendor, because GPU prices vary - it's a lot cheaper with Google to run these kinds of workloads. And the other thing is, you can run these as managed services. What I mean by that is, on Google you have things like Dataflow, where you don't have to worry about your ETL process, bootstrapping the server, and everything like that, or BigQuery and all those kinds of services. And with that, you're still not locked in: you can decide to move back to Amazon and do things in Amazon as well. But the whole idea is for you not to be locked into a particular cloud infrastructure provider.

Do your customers tend to reuse the same infrastructure for the training portion and for the serving portion, or do they tend to have separate - whether it's separate physical infrastructure or separate clusters - for those two purposes? To that question: in most cases, yes, because they're running entirely on Kubernetes. So you can have serving on a particular pod running in your Kubernetes cluster, and you can have training the model on that infrastructure as well. If you're using managed services on Google, you can use things like Cloud ML; in that case, it runs on a different infrastructure. And if you're building for IoT devices - all these edge devices - you can actually push your model to the device after training. Any more questions? Questions? I think that's it. Yeah, thanks a lot for having us.

Can you hear me? Oh, you can hear me. Okay. So - yes, I think we're on. So, hello. Thank you all for being here. My name is Vangelis Koukis; I'm the CTO of Arrikto, and I'm here to talk to you about advanced data management with Kubeflow. Okay. So, just as Josh asked - the very first thing he did, right? What's our motivation in this? Why are we working with Kubeflow? Well, exactly as Jeremy mentioned in the first presentation: where will machine learning run? Machine learning will run on Kubernetes. Kubernetes is becoming the de facto standard for deploying applications across clouds, on-prem and in the cloud - the single declarative language for explaining what kind of workloads you want to run. Okay. And then you're interested in ML workloads, right? Okay, what do you need to run ML workloads? Do you need all these things? You don't just need the models; you need your data sets and your notebooks - that's what you work with. Okay, how many of you use notebooks? Lots, I guess. Can I have a show of hands? How many of you use notebooks? Lots of people use notebooks. Okay. And then you need to run training - maybe distributed training, in parallel. And then you need pipelines: a single declarative way, reproducibly, to explain what you want to run in sequence. Okay. And then you need to track your experiments, and do hyperparameter tuning, and serve and deploy in production, and you need to monitor all this. Okay, are you going to build this yourselves? You'll use Kubeflow for this. That's what Kubeflow does: Kubeflow containerizes all these things, on Kubernetes exclusively, so you can get on with doing the actual hard ML work, instead of having to worry about "how do I run my models on Kubernetes?" Okay. So, having seen this, and having seen people struggle with how to get these workflows going - and Kubeflow is a composable, reproducible way of doing this - what do you need next?
So you need all of these components — this is a full ML workflow. And Kubeflow's mission is to make it super easy for everyone to develop this kind of workflow, deploy it anywhere, and manage it, across platforms and across locations. Okay. So if I have these components, I'm done, right? I'm good to go? Well, no, you're not. There's something missing. Kubeflow gives you a reproducible, composable way of declaring what you want to run — right? But there's this tiny bit missing, which is that you need your data to run. You need to track your data; you need to know what data you worked on to produce the results you produced. "Oh, I have this great model, it works, it's got 99.99% accuracy." Okay — can my colleague reproduce my results when they train? No. Why? Because I used this specific data set, which no longer exists. Or because someone cleaned the data set, but the way they cleaned it, which was supposed to improve accuracy — well, it actually hurt accuracy. You must have seen this, right? How can I go back and have full lineage for my model? I need a reproducible way of explaining how this model came to be. I need to track my metadata and my data. So what do people do today? That's what we can talk about. And why are we in Kubeflow? To show an end-to-end, data-plus-code way of working, and make it work reproducibly for everybody.

This is a figure from the TFX paper by Google — TFX is Google's internal ML platform, right? All of these components have now been open sourced by Google; they're libraries that do specific steps in an ML pipeline. Okay — so, Kubeflow. The paper assumes there's a shared configuration framework and a job orchestration framework. In the case of Kubeflow, that orchestration framework is Kubernetes: Kubernetes will orchestrate your workflow as pods, right? Kubeflow covers this — it containerizes all of these components, and more, to give you a very easy, declarative way of saying "I want to run a parallel job." Kubeflow takes care of it, and integrates it with a frontend for essentially logging into the cluster and using it.

Our contribution to Kubeflow: we are major contributors to the notebooks and UI frontends. We start from the user — how can the user actually log in and use the infrastructure? They start from a notebook, so we contribute there. That's one end of the story. The other end is the two layers at the bottom, shared between all of this, for garbage collection, data access controls, and pipeline storage in general. What does this do for Kubeflow deployments? This is where our software comes in. We give you software to manage your data as your pipeline runs. We are the mount point that you see; we are the file system you store stuff in. You can then snapshot this file system, reproduce it, clone it, and share it with your colleagues — so you can have a fully reproducible workflow.
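To make the mount-point idea concrete: a pipeline step that sees managed data as a plain local directory is just a pod mounting a PersistentVolumeClaim. A minimal sketch — the claim name and image are illustrative, not Arrikto's actual artifacts:

apiVersion: v1
kind: Pod
metadata:
  name: preprocess-step          # illustrative name
spec:
  containers:
  - name: step
    image: registry.example.com/pipelines/preprocess:v1   # illustrative image
    volumeMounts:
    - name: data
      mountPath: /data           # the step just sees a local file system
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: pipeline-data   # illustrative claim name

The point of the pattern is that the step's code contains no storage-vendor client at all — it reads and writes /data, and the volume layer decides where those bytes actually live.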
So why are we in Kubeflow? To tell an end-to-end, user-driven story. One end is how people log in to the platform and explain what they want to do; the other end is what happens with their data at the lowest level. Traditionally, we have been data-management, storage people, but Kubeflow has been such a vibrant, user-driven community — that's why we are part of it. So we'll talk about data in this presentation; please feel free to think about how you use your data, and ask questions according to your application.

Our mission is planet-scale data management: manage your data across platforms and across locations, globally. We give you software to snapshot all of your file system, version it with a specific version, package it, and distribute it in a peer-to-peer way with your colleagues at other locations — shared from a single pane of glass. So you can have access control lists — who sees what — and you can do it across teams and locations, on-prem, on your own deployments, or in the cloud.

So how does this fit into the Kubeflow Pipelines story that MavenCode also explained very nicely in a previous presentation? You have a pipeline; it has three steps. How do people actually implement these steps today? Usually, they have all of the data in an external data lake and write specific code to access it. I know my data is on S3, so I import the S3 client, I import my tokens, and I go to the specific buckets and find my data. That's what you do, okay? But what happens if this data changes? I don't know. What data did my pipeline use when it ran? I don't know. Can I run a pipeline on Google Cloud and keep my data on S3? I can, but then performance suffers. Can I develop on-prem and then run my pipeline in the cloud? Where's my data? So the traditional approach is: import the S3 client, go to the lake, bring data locally — because that's the only way to get bearable performance — crunch it, then store it back to the data lake. Lather, rinse, repeat for every step.

Okay — is there a better way? Yes. We are your data management layer. You mount us; you work on a disk that you see: /data, /data1, /data2. You can clone this disk from an existing snapshot, so you have ready-made versions of your data to start from. I want to start from this dataset: I declare it, I access it via /data, and I see one terabyte of data. This is a local disk on your local cluster, no matter where you are. And then, when the step runs — what do you do with this data? Do you allow the next step to just go touch it? Well, you can, and that's what you do; you don't need to transfer the data. But before you allow the next step to mess with the data, what do you do? You snapshot it. Why? So you can go back in time and see exactly what that step produced. So you have a declarative, reproducible, immutable pipeline — everything is immutable; you can go back in time and see exactly what happened. The next step starts from a clone of whatever the previous step produced, and then the next step after that starts from a clone of whatever that step produced. This is our vision for advanced data management in Kubeflow. And by having every Kubeflow component work with persistent volumes, you can use this across Kubeflow. Our mission is to make sure every single Kubeflow component uses Kubernetes persistent volumes in a vendor-agnostic way; we then know our technology integrates as just one more storage vendor in Kubernetes, and you can do all these things. This is the end-to-end story. Make sense?
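The snapshot-between-steps pattern maps onto stock Kubernetes primitives, too. A sketch using the generic CSI volume-snapshot API (snapshot.storage.k8s.io, shown here in its later v1 form — it was still alpha around the time of this talk) rather than any vendor-specific interface; all names are illustrative:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: step1-output             # snapshot taken after step one finishes
spec:
  volumeSnapshotClassName: csi-snapclass   # illustrative snapshot class
  source:
    persistentVolumeClaimName: pipeline-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: step2-input              # step two starts from a clone of that snapshot
spec:
  dataSource:
    name: step1-output
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Ti

Because each step's input is declared as a snapshot of the previous step's output, the whole chain is immutable and replayable from any point.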
So why is this important? Because underneath, we use a local object store to store everything — but the users never see it. We maintain hashed chunks of your data on a local object store. If you're working on Amazon, we can use S3; if you're working on-prem, we can use a standard NFS share.

And because you now have a reproducible way of running your pipeline: what if you want to run it in a hybrid way, both on-prem and in the cloud? Because every step gets its input as a snapshot of the previous step, running a pipeline in two locations becomes running a meta-pipeline of two steps. Step one: run a pipeline on-prem. Step two: run a pipeline in the cloud. How do I connect these two steps? I snapshot the output of step one and give it as input to step two. By having this kind of flexible data management, you can solve the "how do I run a hybrid pipeline" problem: you just take the output of step one and feed it to step two. And then, if a pipeline produced an exciting result — how do I reproduce it? I take the code of the pipeline, which I've committed in Git somewhere, and which is a Python declaration, as MavenCode demoed previously; I feed it the exact same input, somewhere else, even on another cloud — one that may be faster, may have more GPUs, may be easier for me to use — and I expect the exact same results. So this is it: by being able to handle your data sets the same way you handle your code — essentially, data commits — you can reproduce the pipeline in another location. We sit underneath your volumes, and we synchronize data among locations in a peer-to-peer way.

So this may sound interesting, and you may want to try it out. Okay, so what are the next steps? I'll come back to this. How do we model this? There are usually three stages in building an ML workflow — but you know better, and you can correct me. First: how do I develop something? Then: how do I train? Then: how do I deploy? People would like to develop locally on their laptops — that's easiest. But I need to train at scale, so here's a big cloud like Google's, with lots of GPU-enabled instances. And then I need to deploy in thousands of locations — because I do autonomous cars and I need to run my models out there. Where's my data? If I keep all of my data in a single data lake, this suffers, for the reasons I explained previously. What's our approach? Instead of having one gigantic reservoir of data, you see the data in your pipeline the same way everywhere — on-prem and in the cloud — with the same APIs, and you see your data where you work on it. So that's the overall architecture: there's Kubeflow orchestrating, instructing components to run your workloads across all of these locations, and the workloads use the same declarative, Kubernetes-based language. To do that, we make Kubernetes handle data well. We contribute code to Kubeflow so that it always uses persistent volumes — it's not, excuse me, it's not Arrikto-specific code; we contribute code to Kubeflow to always use persistent volumes. In Kubernetes, persistent volumes — that's the PVC world over there, okay? Kubernetes has a pluggable storage driver interface, called CSI, the Container Storage Interface. We are an implementation of the Container Storage Interface, and we use it to orchestrate your local storage, wherever you are.
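For reference, a CSI driver surfaces to users through a StorageClass, and claims that name the class get provisioned by that driver. A sketch — the provisioner string is hypothetical, not Arrikto's real driver name:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-ml-data               # illustrative class name
provisioner: csi.example-vendor.com   # hypothetical CSI driver
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

Any PVC created with storageClassName: managed-ml-data is then handled by that driver, which is how a data-management layer slots in as "one more storage vendor" without Kubeflow knowing anything about it.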
One of the major hurdles in getting started with Kubeflow is: how do I install it? So we have taken the latest Kubeflow release, 0.4, and we're now packaging it as a virtual machine image. Please be there if you want to know more about it — we'll be demoing how this works end to end. That's the overall idea: you have a very simple way of spinning up a local Kubeflow cluster in minutes, if you don't count the initial download. This is it. If you want to ask any questions — I don't know, am I out of time? No, I'm not. Okay.

So, the question was: do we synchronize data on demand, when the data is requested, or do we do it continuously? Right. The difference in our approach, compared to a traditional active-passive, primary-secondary replication approach, is that we don't replicate data. We do not store your data; we do not replicate your data. What we do is sit beside your data — beside your primary storage, on the side. We snapshot it periodically, extracting what has changed in a differential way, so it's super efficient. And then we give you a way to essentially declare the data you're interested in. Someone publishes datasets — they publish a bucket and make it accessible — and then other people subscribe to that bucket. By having this kind of publish-and-subscribe semantics, you can declare the kind of data you're interested in, and that brings you into the swarm. It's an actual swarm — like a torrent swarm — of people interested in this data. So you then start to synchronize. And we do it in a way that's actually authenticated and private, so it's not just a public exchange of information: you have to prove to others that you're allowed to be part of this swarm, you exchange tokens, and you establish the links — that's just technical detail. But the broader idea is that there are full access control lists of who shares what with whom.

Can you repeat the question? Sorry, I meant to — ah, okay, okay. Yeah, I get the question. No, we won't work with your data if it sits directly on the object store. We use the object store for our own purposes: we maintain our own format in the stored data, so we can synchronize it this way. So essentially, there has to be a first ingress step. You continue using your data lake any way you like — we don't touch it — but the very first step in your pipeline is ingressing into the pipeline, into us, the subset of data this pipeline is going to work with. And then we give you a very efficient way of snapshotting it. This is what we'll be demoing with MiniKF in the afternoon.
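Anticipating the next question: in a design like the one described, a pre-hydrated volume could be requested declaratively with an annotation on the claim. A sketch — the annotation key and dataset name are hypothetical, shown only to illustrate the mechanism, not Arrikto's actual API:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-data
  annotations:
    data.example-vendor.com/clone-from: "published/imagenet-v3"   # hypothetical key
spec:
  storageClassName: managed-ml-data   # the class sketched above
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Gi

To Kubernetes this is an ordinary PVC; only the CSI driver reads the annotation and decides to hydrate the new volume from an existing snapshot.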
How do you interact with the PVC and PV API in Kubernetes? Do you create a new class, or is it kind of hidden behind the scenes, reading from the Kubernetes API?

So the question — oh, you heard the question. The question was how we integrate with Kubeflow, and how we interact with Kubernetes on the creation of PVCs. We are a CSI plugin. Kubernetes says, "I want a new persistent volume of size 100 gigabytes." We annotate the volume and give it extra information about where to clone from. So if you pass us an annotation that says "I want you to clone from this data set," then when we create the PVC, we instruct your primary storage to create the volume and then hydrate it. We do all these things, but Kubernetes doesn't know about it — then you just mount your PVC. So you don't see us directly; you see us as a CSI plugin, and we extend Kubeflow to pass this kind of annotation down to the lower layers. Does that answer your question? Ah, okay. So yeah, I have the exact answer to this, and we can demo it later. The notebook UI, when you request a notebook, will request a PVC for you, and we will dynamically provision the volume underneath; the PVC will be bound to the PV, and you, as a data scientist, don't have to know anything about it. And this is important, because if you get data scientists worried about PVCs and PVs, we are not in a good place. I think that's the general idea, right? We don't want people worrying about PVCs and PVs and Kubernetes and paths; we want them to see a uniform, notebook-centric interface in Kubeflow. Any other questions? Am I out of time? I think I am. Okay — thank you.

Yeah, we need two mics — one on each side. So: "We're Exploring Kubeflow." This is Sumi — I'm the founder and CEO of Dreamjob; it's a LinkedIn for tech talent, tech professionals like yourselves. I'll be moderating the panel, and I have the local machine learning experts from the LA region here to share their experiences about why they're exploring Kubeflow. So we'll kick things off with each one.

They had been mentioning Airflow at one point. I didn't look at that too deeply — I know it's one competitor, I guess, for the DAG aspect of this space. But when I saw Kubeflow, and I talked to Josh, it seemed like they were hitting a lot of the points just right out of the box. I didn't really think I was going to do a shootout between more than one product. A lot of our needs are based around openness — the ability to tinker with any part of it. There's no real lock-in to any format, or to the whole stack at once. We are a polyglot company; we have a lot of different kinds of code that goes in. So for us, being able to drive through something that is, from the onset, open and embracing of different stacks — TensorFlow and MXNet and PyTorch and whatever you have — and using JupyterHub as a basis for data scientists to create and deploy their code and build their models from just a notebook: I think it was a great approach to integrate right into. That attracted us right away to try it out.

Let's see — I think some of the drivers for me were a little bit strange. I didn't have a whole lot of alternatives; I sort of had to make my own alternatives for deployment issues and things of that nature. But one of the things that really drew me to Kubeflow was the unique situation I was in, where I was literally teaching people how to do what I did, in different contexts. So being able to just teach people Kubeflow and how to use it was a huge help, because it gave me a formal way of going about things,
like a repeatable process I could just hand to each person I was advising. And then, personally, it just made things a lot easier — I didn't have to worry about the mental overhead.

Yeah, I think the big driver for me was probably just ease of use, and the cost-effectiveness of it. I used to be really big on Amazon, but I got invited really early to use Google Cloud, back when it was in beta. It wasn't quite as fleshed out back then — it didn't have all the technical documentation — so I ended up depending more on AWS. But as time went on, I found that Google Cloud just kept providing superior options. And then Kubeflow showed up and made my life so much easier.

So, going back to the "how and when" — that's the biggest question. Yes, Kubeflow is going to save a lot of time and resources; it's going to make life very easy. But then: how, and when? What would be the things that would push the needle for you, in your project, to say: yes, Kubeflow, I can start using it? Is it still too early? Do you want better tutorials, or more use cases aligned to your business needs? What would push the needle for you? And I'm sure it's the same for everybody in the audience who is looking.

Yeah — so first, what kind of models are we running? We're running object detectors. We have primarily this security vertical that we're going after, and we also have a healthcare vertical. A storage-unit complex doesn't really care about pose estimation — you know, where you get the skeletal points on a person — but a hospital would. So we end up having to support a large number of models. We don't have a large team tinkering with all these knobs at the same time — that's not really our problem. It's that we have so many of these analytics which — and this is another conversation — are black boxes to us, that we need to enable as a feature set in the product, and it needed to be out yesterday. So for us, it's about internally managing all those different models. You know, Derek found some model that some kid from India did, and it's amazing — okay, let's pop it in, let's retrain it, let's get it out. That kind of workflow is amazing, right up to the point of deployment, for us. So Kubeflow is running internally — we have a little main data center in our office, and it's running there; we're doing the training there. We foolishly bought one big machine that does all our training. And the biggest driver for me — the biggest push to get this full flow working in production — is the edge computing component. Not all of our customers can update at the same time; not all of those devices are connected to the internet at the same time. Those connections to the internet — I don't know if you've ever tried to get fiber in Kentucky, but it's pretty difficult. So those kinds of problems — I don't know if it's under the purview of Kubeflow to solve them — but at the moment, that's our biggest pain point: how do we deploy this thing, when do we deploy it, how do we make sure that model got down there, what version of the model are we running. Because, for example, we try to train a per-camera model. The ideal scenario in edge computing: in our world, we have a camera, and we have a local edge device where we're doing the compute, on the customer's network, and
then we go out to the cloud when we need to, for them to view the camera, the metadata, or whatever — to visualize it. So if we have a model running on the camera — some of these cameras are smart beyond light motion detection; they have decent chips on them, and it seems like that's where the world is moving — how do we manage that model? And how do we manage the training of that model there, by the way? Because if we took all of these models and actually had to go train them in the cloud, the amount of time and compute it would take would cost us too much. It would break our business model, in other words, to go out to the cloud and do all this — not just the training, but the inference as well. So: managing all of these different IoT devices; being able to identify them — this is a camera, this is an edge device, this edge device is small and has X and Y capabilities. We can run OpenVINO on an Intel i-series, but we can't run, I don't know, some of these GPU-enabled models. I know the Intel one can work on GPU — I'm just saying, if we don't have that there — say we only have an i3 — I can only run this subset of models. Defining those, and making sure the right model gets down to the right edge device. Because if I deploy — let's say I do some kind of federated learning for a small retail store. They have eight cameras, all in different positions, so we can use the data from all eight, and that would be an amazing model, customized just for them. The detections are going to be great, false-alarm rates are going to be really low, they're going to be really happy with it — they didn't even have to buy new cameras; these things were installed in the '80s, and they work great. But if I deploy somebody else's model there by accident, or the training doesn't go well and I don't know about it, and there's downtime — especially with the security use case — that's just untenable. So it's really all about the edge computing; that's what makes me really nervous about getting this system up in production. Because if I have all these edge nodes — and you could do this, you could run this kind of system where you have the master nodes out in the cloud and each edge thing communicates back home, and we kind of do that already — to rely on Kubeflow at this time for that piece, I think, would be unwise. However, I also don't think there's a lot of business value in us building this kind of DAG pipeline ourselves. If one of our teams came up to us and said, "hey, I built essentially Kubeflow," I'd be pissed. Why would you do that? There's no value in that.
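On the earlier point about getting the right model onto the right edge device: plain Kubernetes scheduling primitives cover part of this. A sketch, assuming each edge node is labeled with its hardware class — the labels and image are hypothetical, not the speaker's actual setup:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: detector-interior            # illustrative
spec:
  replicas: 1
  selector:
    matchLabels:
      app: detector
  template:
    metadata:
      labels:
        app: detector
    spec:
      nodeSelector:
        hardware.example.com/cpu-class: intel-i3   # hypothetical label: CPU-only boxes
      containers:
      - name: detector
        image: registry.example.com/models/openvino-interior:v4   # illustrative image

With labels like these in place, a model that needs a GPU simply never schedules onto an i3-class device — the capability matching happens in the scheduler rather than in application code.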
I mean, it's almost like — at least for us, with some of these commodity models, these object detectors — we're not in the business of creating novel model architectures. We're not going to win there; we're too small, we don't have enough capital, we'd lose that game. So the quicker we can get these state-of-the-art, or latest state-of-the-art, models into a pipeline to be retrained and redeployed, the better. And I hope that's what this project becomes — I'd love to see that, and to help with it as well. Sorry, that was a long one.

We're also constrained by a small team and a lot of projects, but we look for opportunities to tie together a lot of the pieces that are waiting for development cycles. Right now we have a bit of a backlog waiting for Istio to get deployed on our clusters. Kubeflow is built on Istio, so that's a great opportunity to start putting it into production too, and to get some of the backlog in data science activated as well. So that's one of the pieces I'm thinking of now. It becomes kind of a DevOps concern too: we need to be able to spin up multiple clusters so we can test our other applications, which may or may not need to be rebuilt on the networking layer. But that's where we are now — we're kind of ready. We're wrapping up a round of deployments, and we'll probably end up concentrating on this conversion to Istio and adopting Kubeflow at the same time. So yeah, I'd say we're pretty much ready, in about a week or two.

I would love to see — touching on what you were saying earlier — more tutorials, particularly aimed at addressing what he was talking about: embedded systems, and machine learning on embedded systems. If you put out more material like that, it would make my life a lot easier, because part of my job is also instructing others and teaching them how to replace all of us, so they can go on to do the same thing.

I think they were talking about, you know, a lot of meetups and local communities coming up, and if you want to join and take leadership in them, that's a good place to learn too. So now I'm opening it up for the Q&A — whoever has questions can stand up, and I'll bring the mic to you. Oh, sorry — can we get a quick question? Thank you. And please give a big round of applause for the panelists.

I asked myself a question — like, "are you amazing?" Yes, I am. Thank you — thank you for sparing my self-esteem. I'm curious about the edge-computing ML models: how much of the training do you do on the devices, and how much do you train elsewhere and then ship the model to the devices? What would you say is the trade-off — or is it completely —

Oh, we're training on devices. So this is going to be embarrassing — you're going to know how bad this is — but this is where we're at. Everybody gets the same basic model, and then, over about a week — if we have the bandwidth, which we always require for that first week while we say the system is training — we'll actually go out and talk to a bigger system. We'll go out to the cloud and ask about something — something like, I don't know, something yellow, or whatever the case may be: what is this?
So most of our IP is in this kind of lightweight algorithm that doesn't require a GPU. Again, that was somewhat foolish given where we're at today, but four or five years ago this wasn't clear, and it wasn't available. So we did all this work that, you know, Kubeflow is doing now, and we look at it and think: oh man, not only did these guys do it, they did it way better. So everybody gets a basic model, and when we train on the CPU, we're maxing out each of these boxes, because everybody wants to spend less money. How many cameras do you have? That's the kind of box you've got to buy. You want to add more cameras, or more compute? You've got to buy a different box, or a bigger box — that's how it is. So I don't have a lot of headroom: I can't do inference, keep all the cameras up, and train at the same time. So everybody gets a basic model — you pull down your basic model, and at this point we have better ones: okay, this is an interior model, this is an exterior model, this is the interior of a data center, this is the interior of a storage unit. And what we're working towards — my end goal — is that every camera gets its own model. That would be the ultimate; we'd get a lot fewer false alarms. That's coming, and it's just difficult to do. I think sometimes — I look fondly at cloud computing, because I don't often have a cloud, and I think it's really easy to get lazy about the resources you're using. But perhaps that's another discussion. Edge computing is harder.

Yeah — I also do edge computing. A while back I did a little bit of research and found this weird, obscure Russian research paper about what they called binary, or bitwise, neural networks — a method for very small, very low-energy machine learning models. That's what we use for our cognitive radios, for spectrum management. So what you want with cameras, we kind of have now with these radios. Yes, we do use specialized hardware.

So — well, I'm the last speaker separating you from lunch, so I'll try to be quick and diligent. There have been a few questions around the background of the people here — whether you're a machine learning engineer or a data scientist. How many here are excited about Kubernetes, deploy to Kubernetes, are interested in CRDs? Great. So this will be a little bit separate from the previous conversations. The previous speakers have done a great job talking about Kubeflow — why to get involved, why it exists. I'll re-emphasize some of those messages, but then I'll also dive into some of the things Kubeflow uses to actually build, develop, and deploy as a Kubernetes-native application.

All right. So Kubeflow is, as Jeremy mentioned, composable, portable, scalable — this is a SCaLE conference, after all. But it's also Kubernetes native, and we'll dive into that a little bit; extensible; and open. Each of these three things will hopefully give you further motivation to get involved in the community. It'll also make you a better partner if you're in a relationship, or, if you're single, it'll give you better dating mojo — guaranteed. Just seeing if people are paying attention.

All right, so you saw this — Jeremy mentioned it already. People get involved in AI and machine learning thinking they're going to spend all their time on the code, but in reality, where they spend their time is all this other stuff
to make it happen. That's in the article there — the hidden technical debt paper: you do this work, and then you have to do all this other supporting work to make it a reality. Why is that a problem? Well, you could lose a lot of money. That same article mentions one organization that lost half a billion dollars in 45 minutes. Holy crap — that's pretty bad. And they attributed part of it to dead experimental code paths. Well, how hard could it be to find dead experimental code paths? If, on that previous slide, your code and all that other stuff looked a little bit like this — try to find the dead, non-working cable in this mess, right? In reality, what you want is something like that. And that's where Kubeflow comes in: it helps you build systems — scalable systems — that look a little more like this. You could argue that you still couldn't find the dead cable path in this, but at least it would be easier, and less onerous, to read through all the different cables in there. So that's part of the motivation. Back to these three simple messages: Kubeflow is Kubernetes native, extensible, and open. You'll see those a couple more times.

So what does it mean to be Kubernetes native? Here's a general architecture for Kubernetes — you've probably seen this a thousand times. You've got a master node and worker nodes, with a bunch of things in there that schedule your containers. You have an OCI repo, or your Docker repo, that serves up the images. I haven't drawn in the fact that somebody actually built some code, had a Dockerfile, created that container — thanks for that — and pushed it to the repo, but that's the general architecture. To level-set a little bit: what is Kubernetes? In here — you can probably read faster than I can talk — you've got nodes, and they run pods. Pods are created and managed by replica sets. You can do some of these things without replica sets, without deployments, but in a second you'll see how those things help manage your workloads and keep the running containers happy. So those are some of the terms I'll use throughout. Eventually we'll talk about APIs — what they are, why they're important — and then get into CRDs, custom resource definitions, which Kubeflow, and other Kubernetes-native applications, make heavy use of.

To re-emphasize Kubernetes APIs a little more: you could double-click on each one of these and have a pretty good conversation about what it means to be declarative as opposed to imperative, what it means to be asynchronous, level-driven, observable, etc., etc. I'll dive into the declarative one and the extensible one. And again, this is just to re-emphasize the transformation you're seeing within the community — software development, application development, corporate applications going from monolith to microservices, but essentially moving into the cloud-native world, and quite possibly the Kubernetes-native world — and how you can take advantage of Kubernetes to accelerate your application development, get involved in Kubeflow, and understand how Kubeflow is built.

All right, let's go into a deployment example and see what happens. You have a YAML file — and you'll see a lot of YAML files as part of Kubeflow. You may or may not get involved with these YAML files, but it'll be good for you to embrace them, love them, and understand what they are.
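For readers following along without the slide, a minimal Deployment manifest of the kind being described looks roughly like this — an illustrative sketch, not the exact slide contents:

apiVersion: apps/v1          # resource type metadata
kind: Deployment
metadata:
  name: hello-web            # object metadata
spec:                        # the desired state the controllers act on
  replicas: 3
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
      - name: web
        image: nginx:1.25    # illustrative image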
The first two lines there are just the resource metadata. The next part — the specific metadata — is the object metadata. And then you get into a spec; you'll see that a little later, when I show you an example of a TensorFlow job spec — similar, but different. So what happens when you run kubectl apply on that YAML file? First, it discovers the endpoints for the API server and does an HTTP POST, and then, magically — I put a question mark there, because "magically" — all these pods just appear on worker nodes somewhere in the world. Now, how does that happen? Well, here's a more detailed view of what happens. You hit apply on the deployment; it hits the API service's create; that sends a watch event to the controller — the deployment controller — which says, "hey, I'm going to go create a replica set." And I'll just click through the rest of this, because what you'll see is that each of these controllers is looking for certain events to happen. They get notified of an event, and when that event happens they can do whatever they want. In this particular scenario, they create additional resources, which then causes additional work to happen. So you'll see the replica set controller doing its job, the pod scheduler doing its job, the node doing its job — and those last two, the pod updates, are actually updating the status of a pod resource that's maintained by Kubernetes. So that's the end of that discussion. What I was trying to tease out is what happens with a YAML file, or resource definition, and what happens with controllers.

Now we're going to fast-forward into a TFJob CRD. Can you guys read that one on the left? Awesome — that's another four-point font. But it's there to give you an example of defining a TensorFlow job — one you can submit, with a bunch of arguments in it. I'm not going to go over all of them — if I went over... no, I can't do that easily, can I?
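A TFJob manifest of the era looked roughly like this — a sketch based on the kubeflow.org/v1beta1 API that shipped around releases 0.4 and 0.5; the job name, image, and arguments are illustrative:

apiVersion: kubeflow.org/v1beta1
kind: TFJob
metadata:
  name: mnist-train          # illustrative name
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: tensorflow   # the TF operator looks for this container name
            image: gcr.io/example/mnist-train:v1   # illustrative image
            args: ["--epochs=10", "--batch-size=64"]

The shape mirrors the built-in Deployment above — metadata plus a spec — which is exactly the point of CRDs: training jobs become ordinary Kubernetes resources handled by their own controller.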
I was going to show our wonderful documentation — and that's why I wrote it there. Here you can see this is up on our site, kubeflow.org, under docs: under TensorFlow training you'll see "What is a TFJob." This is the same thing in better print. You can see how you can pass in arguments, and that by defining this YAML file, you can use kubectl to send it in and have all that training happen — all that magic happen. Again, I'm highlighting this not only to show how the TFJob works, but because, if you have other corporate applications that you're in the process of transforming to Kubernetes, Kubeflow provides a really, really good example of how to do Kubernetes-native things, and of best practices for taking advantage of what Kubernetes has to offer as a platform. And that page goes into all of the detail. I don't think we've actually shown this website yet, so please go to kubeflow.org — there's lots of great documentation in there — and again, this specifically covers how TFJob works. As Jeremy showed in one of his slides, there are a lot of boxes in the architecture, a lot of components that are part of Kubeflow; there's not time today to go into each of them, so I'll go back to my presentation here.

Okay, so why is this important? Here's your general training process — you've seen examples of this already: defining the model, training it, having a trained model, deploying it, whether to a private cloud or a public cloud, all that good stuff. Remember those crazy cables in your data center. So you can start on a workstation, with Kubeflow and your CPU, GPU, storage, etc. But now imagine you have a single cluster, but multiple deployments of Kubeflow — it could be one per namespace. Imagine each one of these belongs to an individual user, with shared resources across the bottom. Now take that further: imagine you have not only a single cluster but multiple Kubernetes clusters — on-prem, VMware, bare metal, up in the public cloud — and you want to deploy not only this many Kubeflow deployments but also your corporate applications, leveraging the exact same hardware. If you think about resource utilization, CPU utilization, and so on — doing not only your artificial intelligence on that hardware, but also your corporate applications — that's when you start to see the power of Kubeflow, and of being able to use Kubernetes and define CRDs, to define your own resources and interact with them through an API. I know a lot of DevOps hands went up earlier — you can search this tree of resources quite easily once you understand what it looks like and how to start surfing these clusters.

All right. You've seen some of these components already, so I won't go into the details — more and more are being added all the time. I don't think I actually put ModelDB up here; ModelDB has been around for quite some time. It's from a community out at MIT — quite interesting. It's not a CRD, it's a service, but it's an example of this growing community: how you can easily add your own component and integrate with it within your machine learning jobs, with just a simple addition or modification to some of your code. A lot of key components — these are on Jeremy's slides as well, so I won't go into the details; I'll
skip that slide. And then — yeah, it's extensible. So add your own Jupyter containers, your own controllers, your own pipeline components; build your own standard pipelines; come up with your own innovation. There are lots of ways you can get involved, extend Kubeflow, and make it available to your organization — particularly if you're a platform engineer responsible for providing a platform to your organization, these are some of the areas where you can play.

And lastly — I flew in from Australia this morning, so that's why I might be a little sleepy, but that's what open roads look like in Australia. I don't see any dead kangaroos in that one; a lot of the time you'll see dead kangaroos as well, but that's a different story. The community is very open, so if you haven't gotten involved yet, please get involved. If you are an innovator, there are lots of different platforms you can use to innovate. I work for Canonical and Ubuntu, and it's really easy to get going on Ubuntu and the Ubuntu platform — we have a lot of tools to get you going. That's it.

Sure — sure, sure. So CDK: the Canonical Distribution of Kubernetes, or the Charmed Distribution of Kubernetes. There will be a charmed version of Kubeflow, and you'll see an example of that later today. Any other questions? All right — be back at 1:30.

Just talk about the things we've done in the community, rather than just the problem and how it works — okay, okay, gotcha. Basically, you can talk about how we win, so that people out in the world, at a different company, can do the same thing.

Okay, I think we're going to take a couple more minutes to let people filter back in from lunch, and then we're going to hear from Debo Dutta from Cisco — I'll introduce him when we're ready to get started. Thanks.

Welcome back, everybody — or hello for the first time, if you're just getting here. I think Josh Bottum, who is our event organizer today, may have gotten stuck at lunch, as I'm sure some of our attendees have as well. But I'm really excited to hear from Debo Dutta, who is one of our — oh, there you are. Hi! Well, I was just going to present anyway. He's one of our core community members in Kubeflow, and he works at Cisco. He's here not to talk about how Cisco handles machine learning, but about how they've engaged with the Kubeflow community and helped us build the project into what it is today. So thanks, Debo.

Hello — can you hear me?
Yeah — so, I've been an engineer at Cisco. I'm originally from — well, I spent a lot of years in Southern California, at USC. Today we are not going to talk about Cisco and what Cisco does with AI, but I do want you to know that we have a diverse portfolio of products, and all of them use AI in some form or another. We have our own scars and pain points — the AI apps, the data scientists, all of the above that we've talked about today. But we're not going to talk about the Cisco stuff. I'm sure there are many people here from other enterprises who are thinking about similar problems, so I just want to talk about why we chose Kubeflow, and what the rationale was, exactly one year ago.

A year ago, my team was trying to build our own machine learning pipeline — and then we noticed Kubeflow. I talked to Jeremy, and then I went back to the team and said: we should actually just scrap our own project, contribute, and start building with the community. It might take a little longer — or it might not; we didn't know. But we saw the right kind of people in the community. Our metric was: who are the people in the community? We saw Canonical, Red Hat, Google — they were all there. So I said: okay, that's a good community; it's going to take off. And that's what we did. Right now, Kubeflow is a pure open-source effort for us. We do not do anything internally; everything, from day zero, is external-facing.

So I want to talk about how we chose what to contribute to, so that many of you, when you go back to your own enterprises, can think about what your needs are and how you can help — either use Kubeflow, or leverage Kubeflow and then contribute back to it, in many ways. There are multiple ways you can look at our journey and figure out what works best for you.

When we looked at all our products and talked to a lot of our customers — and I personally talked to many of them — everybody had the same pain point. They are not completely cloud-focused or completely on-prem-focused; in fact, they typically deploy their production workloads across multiple clouds, and they have all kinds of pipeline — I mean, lifecycle — challenges. You can recognize this in most of them. The biggest problem is that when you try to manage the entire lifecycle, and the cost of running it across many clouds, the different toolchains of the different cloud providers and the on-premises software can become a bottleneck. So we felt Kubeflow was the right platform, because, directionally, it abstracts all the machine learning infrastructure from the user — any user; and there are different personas, which we'll talk about — and that's when you can make rapid progress. So if you want to accelerate your teams, your customers, your partners, whoever, in machine learning: in the long run, it's not just about data science, it's not just about DevOps — it's about the entire lifecycle. Please do consider Kubeflow, and contribute back.

So there's one use case — when people say: "but cloud provider X already has a cool ML tool; why can't you use that?"
So, if you look at some of these use cases in industrial settings — take something simple like microscopy. You can go to openmicroscopy.org and get the entire use case without having to take notes. These high-throughput imaging systems in industrial environments can generate 50 to 100 terabytes of data a day, depending on how many machines you're running. So the question is: how do you detect artifacts? It's not about figuring out cat versus dog — the quintessential ML use case that half the talks have — it's about real industrial value. In these use cases, you simply cannot ship data to the cloud at that rate. So you have to do both on-premises and cloud, and you have to think about consistency — always — not just how something works well for you today, but as your business grows and spans multiple clouds.

When we looked at our problems last year, when we started off, we said we were going to simplify our customers' journey by betting on open source — by looking at open-source products that come from partners known to do good things in AI, and that can help both sides of the organization. Typically, in an organization, you have the scientists and the engineers on one side, and enterprise IT somewhere on the other, each trying to figure out their side of the same coin. So we bet on Kubeflow. That's pretty much the motivation — and if you have a different motivation, I'd love to hear it after the talk.

This slide deck is evolving, but this was something we felt: Kubeflow is really good for the basic Kubernetes integration and the essential ML tools, because those are going to be commoditized. It is best to pay your technical debt up front — I'm speaking on behalf of any enterprise that wants to accelerate itself in machine learning — and focus on your line-of-business applications, which are the top layer of the stack. That's where you want to focus all your energy; let open source — let's build very strong, big, open-source communities — deal with the other layers.

So this is a proof point of what we've done. Our team's charter is: make Kubeflow really good, but ensure you have a consistent experience whether you're in the cloud or not. We've worked regularly with Google's partner engineering team to ensure a uniform UX. For us, that is the most important thing, and we'll keep highlighting it in this talk.

In order to get into the community — we were new to it, and we didn't know exactly what would be valuable — we asked Jeremy and David (I don't know if David is in the room; I didn't check). We asked: where can we help the community? — rather than: "this is what we want to build." So, effectively, we said: okay, what does an ML training job look like? It looks like some kind of Keras or TensorFlow workload; you go through the TensorFlow toolchain, you compile it, and then, through an operator in Kubeflow, you run the job on Kubernetes pods. Everybody in this room probably knows this already. So then: where do we contribute?
Well, we realized that the TF operator was really robust and very good, but there was a gap in PyTorch. We also noticed that, at that point, Kubeflow had a really good project that was just kick-starting, called Katib. So we looked at all these things, took the community's guidance, and said we would first improve the PyTorch operator and make it as good as the TF operator. That's what we did. And that was really valuable, because we had been seeing a lot of teams who didn't want to use Kubeflow because "oh, Kubeflow is only about TensorFlow." Now we could say: no, Kubeflow is not about TensorFlow; it's about any machine learning back-end you want to have. This was a good step in that direction.

Then, when we started playing with Kubeflow on-premises a year ago, we asked: how do we know this stack is performant, and that it matches our expectations? We realized there was a big gap at that point, and the gap was benchmarking. If you want an on-prem solution, what is your benchmarking story? So we said we'd create a benchmarking pipeline — an entire module for it — and we contributed that. In fact, we feel very good about it, because not only can we use it in Kubeflow, it lets us connect other open-source communities together. If you Google mlperf.org — that's another benchmarking community, again spun up by many companies, basically to come up with industry-standard deep learning benchmarks. Our goal now is to ensure we can run some of their reference workloads, via Kubebench, on a Kubeflow cluster, no matter where it runs.

Another thing we noticed when we looked at Kubeflow — remember the slide I showed you: you have the basic tools that integrate machine learning workflows, with operators, into Kubernetes. But typically, when people start out, most of the time they take existing models and fine-tune them. I think this was also mentioned by the panel earlier: many people do not have the skills, the time, or the resources to build models from scratch. So hyperparameter tuning, and some of the more advanced techniques, are actually very important. For those who don't know hyperparameter tuning, a simple way to understand it: suppose you take a deep learning model. You specify the model, but you do not know how many layers to use, what kind of filters to use, or what the filter sizes are in a typical CNN, and you want to iterate to figure out the right variant — the qualities of the right model you want to train. This is not even figuring out which model to use: given that you've decided on computer vision and a CNN, what are the qualities of that model, what parameters would you train with? That is hyperparameter tuning — and, by the way, tuning the hyperparameters is not the same as training the model. And if you want to take a step further: on one hand you have hyperparameter tuning; on the other you have neural architecture search, which means: I don't know whether I want a CNN model that's already specified in the TF examples — I want to build my model from scratch, maybe with some constraints, and actually construct the architecture. That field is called neural architecture search. It's relatively recent, and there's a lot of activity around it in the industry. So when we saw Katib, the hyperparameter tuner, we said: why shouldn't it do neural architecture search too?
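To make the tuning idea concrete: in Katib's later Experiment API (kubeflow.org/v1beta1 — sketched here from memory, so treat the field names as approximate), a search over learning rate and layer count is declared rather than coded:

apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: cnn-tuning             # illustrative name
spec:
  objective:
    type: maximize
    objectiveMetricName: accuracy
  algorithm:
    algorithmName: random      # Bayesian optimization is another option
  maxTrialCount: 12
  parallelTrialCount: 3
  parameters:
  - name: lr
    parameterType: double
    feasibleSpace:
      min: "0.001"
      max: "0.1"
  - name: num-layers
    parameterType: int
    feasibleSpace:
      min: "2"
      max: "8"
  # a trialTemplate describing the training job to launch per trial
  # is also required; omitted here for brevity

Each trial becomes its own training job on the cluster, which is what makes the operator work above (TFJob, PyTorchJob) the natural substrate for tuning.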
Because we were already, independently of Kubeflow, doing our own research and building our own open-source software to do this, we said: we are going to make this happen inside Katib. That was our promise — Jeremy and I have talked about this promise for many months now (not years, sorry). By the way, one of the things we did along the way was also contribute Bayesian optimization for Katib. Right now it's in alpha, but we are contributing to Katib so that it becomes a general AutoML system — and then we'll be much closer to parity between the tools from the cloud providers and on-premises. So these are some of the ways that your companies — or you, if you're from academia — can contribute and make this a really robust community, and make Kubeflow the de facto ML platform for Kubernetes, as opposed to building yet another one. Otherwise, people will keep building toolchains, and that's probably not that valuable. The bigger the community is, the more things there are that you can use, the more vibrant the ecosystem — and it will amplify your effort.

One of the things we also recently started participating in was helping folks actually talk to customers and run user surveys. You've probably seen this already, but essentially we realized that as this community grows, its needs will change. So our goal was: let's take a snapshot today; let's help with the user survey; and let's actually run the user survey every six months, to see the dynamic nature of our community and how it changes, so that we can build the right things. And what we realized, when we talked to a lot of people — even internally — is that a lot of people want on-prem to be a first-class citizen. So we created a CUJ proposal — and, by the way, it's not a Cisco proposal, it's a community proposal; we expect everybody to contribute to it. There was nothing in the community, so we just jump-started it: how do we make on-premises a first-class citizen? For example — the baby steps — how do we address bare-metal install and management challenges? Then, how do we ensure storage integration, as Arrikto talked about? And then, how do we ensure the same user experience? I think that's very hard to define precisely, because the community itself is changing, and it will keep improving. But as the community evolves, how do we ensure seamless UX on-premises and in the cloud? Only when you have seamless UX can you have a seamless hybrid cloud: if your on-premises UX is not as good, you will not have the hybrid-cloud experience — and if that's what your business requires, you need to enable it. We've taken some baby steps; I don't want to go into the details. Most importantly, what we want you, the community, to do is join hands and do a lot of testing, figure out what your use cases are, talk to the PM community, and give feedback to the on-prem community. In fact, we want to create a subgroup so we can help prioritize this. It's not going to happen with just one participant in the room; it's going to happen when we all join hands to make this on-premises story look as good as the cloud offerings. And of course, there's a lot of testing involved, a lot of integration, a lot of solution work, and a lot of good documentation that we need to do. So I think Kubeflow is a great journey,
and we believe we will keep improving and fueling the community. It's going to take off — there's no doubt in our minds; we just want more people to join it. That's pretty much why I'm here to talk to you about Kubeflow. I didn't want to take too much time, since everybody's just back from lunch, but we still have six minutes for questions.

Yes — critical user journey. Yes, sorry, I should have spelled that out. Any other questions?

I think I asked this question in the previous talk, but I'm curious — your use cases are very interesting. What about non-deep-learning use cases?

Yeah, that's a very good question. I think we've all felt the same pain points. We have a lot more of those — personally, when I look around and talk to our customers and partners, we have more non-deep-learning workloads — and that's one of the things we will also help with. Your point is really well taken.

I have a question for you. It's a question slash request for feedback: what do you think are the most effective ways to engage with the Kubeflow community, both as an advanced contributor and as a new contributor starting out?

As an advanced contributor, the best thing is to actually attend the Kubeflow contributor summit, socialize, and see where you can help — go with an open mind. That's the advanced-contributor pitch. For the new contributor: a lot of people, even within Cisco, have said, "hey, you're doing Kubeflow, I want to get involved." I'd say: install Kubeflow, run a simple application, and you'll feel some of the pain points. Start attending the group meetings, be on the Slack channel, ask a lot of questions, and make sure you can actually run Kubeflow first — then you can contribute something to it.

What are some of the problems you've seen running it on bare metal, and what are you trying to fix?
Yeah — so if you think about Kubeflow today, the bare-metal support for some of the components, like Pipelines, and even some of the operator support, requires extra work. For example, you have to do the storage integration yourself. It depends on how you're trying to use it: Pipelines works, but Pipelines is a very abstract view of your workload, and if your workload's data is somewhere else, you need to bring it close to you. And — oh, sorry — yeah, mostly these are not major problems; it's about making the UX better. They're smaller issues: you have to fix up your storage back-end, and you have to make sure you're reading from the right place. A lot of this will get worked out — I'm sure the community will smooth it out over the next few months.

This neural architecture search in Katib that you're talking about — is it leveraging any of the existing open-source tools? Keras has AutoKeras, which does this. Or is it a separate effort?

I think it's been a separate effort, from scratch, in the Kubeflow community. What we've done is build it natively — most of the other efforts are not native Kubernetes efforts, so we wanted to do it from scratch. We had our own code base for some of our own algorithms, and we ported some of the existing algorithms into the framework. We wanted it to be a native Kubeflow tool.

Not specific to this topic, but I have a general question about the open-source aspect of Kubeflow. If you look at the current installation guide, it uses ksonnet — the ks interface. But when I tried to install ksonnet — yes, it's there, but I also saw the note that it's defunct and there's no more active development on it. So that's one question: are we going to update all our docs to use — ? Of course. And I use the UI — the UI works just fine — but I want to be more command-line.

No — that's actually one of the things the community is really working hard towards: better documentation, and better — that's what we meant by a better user experience. When a new user tries to learn and jump-start, it's still not as friendly as it could be, especially for the non-ops folks, like the pure data scientists.

Agreed, and I'm happy you've already identified that as a problem — knowing that it's a problem is important.

Yeah, we all recognize it. Within the community we're very open: we recognize all these problems, and we're working very hard on them.

And the second part I want to ask is: what is the way to give feedback about anything? Is it just Slack?

Slack works — and file an issue if it's a real problem. — It's more like feedback, like these kinds of user-experience things. — Yeah, I think Slack should be good for that. — Okay, great, thanks. Thanks.

I do want to emphasize that filing an issue is the easiest way to get your problems visible — and actually filing an issue, and then referring to the issue from Slack or from the mailing list, is A-plus contributor work. So please do that.

So I have a silly question: do you have any examples — maybe some YouTube videos — where you work through an entire example? Or do you have a GitHub repo, or something like that?

Yeah, there are community examples — community GitHub material. We've also published an entire example on our own GitHub. But I would recommend first checking out the community examples; you can find them in the kubeflow/examples repo. Um, no — we
No, we don't have videos as such, but GitHub is the best place to look at all of them, both the ones coming from the Google team and the ones from the community. Thank you.

So next up we have Rita Zhang and Sertaç from Microsoft, who are going to talk about scaling and testing experiments on Kubernetes using Kubeflow, Helm, Virtual Kubelet, and Azure Container Instances. While they're getting set up, I have some questions for the audience, just to get a sense of the open source experience here and what matters most to you when adopting open source projects. How many people regularly use an open source tool as part of their workflow? Awesome, that's a lot. I'm wondering: as far as documentation, examples, or basic API stability, which of the three is most important when you're considering getting started with a new open source tool? Documentation, any hands for most important? Okay. How about examples? Okay. And how about API stability? Awesome, so all of them are important; we'll get working on all three of those. Thanks.

Hello. Hey everyone. I know some of you asked for demos, so we will have demos; it's going to be exciting. We're very happy to be here at Kubeflow Day, and we're going to talk about how to use Kubeflow with all the other awesome Kubernetes toolkits you already love and are familiar with. So it is about Kubeflow, but also, in addition to Kubeflow, what other tools you can use at your disposal.

All right. Early last year, OpenAI basically told everybody they were using Kubernetes for their ML trainings, with one of their largest clusters going up to 2,500 nodes. That was really cool, and a lot of people were impressed. But as you know, not every company can be like OpenAI, with a dedicated operations team supporting their data science team. So in this talk we're going to show you some tools you can use in your own organizations, so you can also do what Elon Musk does.

I'd like to introduce Rita. Rita is a software engineer at Microsoft in San Francisco. She works on the Kubernetes upstream team, contributing to the Kubernetes project and many other projects, making Kubernetes great on Azure, and also contributing to projects like aks-engine and AKS, the Azure Kubernetes Service. And I'd like to introduce Sertaç. He's also a software engineer at Microsoft in San Francisco, and he works on the managed OpenShift offering we have on Azure, as well as the Azure Kubernetes Service. So if you want to try those out, you know this guy worked on them.

For the rest of this talk, we'll start with some typical ML workflows and their shortcomings; and since I know you already saw a lot of slides and workflow diagrams today, we're going to short-circuit that and skip some of it. Most importantly, we want to use this opportunity to show that with tools like Kubeflow, Helm, and Virtual Kubelet, you can enable your data scientists to run their trainings and servings anywhere: on premises in your own cloud, or in any of the public cloud offerings. Let me get this microphone closer. We're going to have some demos: we'll see image classification with Inception V3 and transfer learning, and then we'll look at automating repeatable machine learning experiments with containers.
Then we're going to deploy the ML components with Kubeflow to Kubernetes, and finally scale and test the workflows we deploy with Kubeflow. Okay, so by now you really should know what Kubeflow is, right? All right, we're going to skip this; we all know it's awesome, yes.

How many of you have heard of Helm? Oh, nice, so I don't really need to talk about it. But for those who might not know, it's basically an application package manager for Kubernetes applications. Now, we all love YAML, right? What Helm lets you do is package your application so that all of your nice YAMLs are bundled into one unit. When you share your application with other people on the team, or with your customers, you can hand them this Helm chart, which is composed of all the Kubernetes resources your application requires, and you can manage your application's lifecycle using Helm. We're going to use Helm in our demo to show how you can package your machine learning applications like any other Kubernetes application.

For those who might not know Helm, a quick walkthrough. One really nice thing about Helm is all the community-driven, community-supported applications: in the charts repo you can see all the popular applications people have already created. Take a WordPress chart: you basically just define the resources the application needs deployed in your Kubernetes cluster (the Ingress resource, PVCs, Secrets, Services) in order to run WordPress. And what's nice is that you can expose the configurable parts of your application via the values.yaml, parameterizing it so you only expose the parts you want users to configure. This is really important here, because in our demo we use Helm charts to parameterize our machine learning experiments, and we use the values.yaml to drive the hyperparameter-sweep parameters. Next, we're going to talk about the Virtual Kubelet.
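To make that concrete, here is a minimal sketch of what such a values.yaml could look like. This is a hypothetical file, not the actual chart from the demo; the parameter and image names are assumptions:

```yaml
# values.yaml: hypothetical knobs an ML training chart might expose.
# Override at install time with: helm install ./train-chart -f my-values.yaml
image:
  repository: myregistry.azurecr.io/celebrity-train  # hypothetical image
  tag: latest
training:
  epochs: 4000
  learningRates: [0.001, 0.005, 0.01]  # swept hyperparameters
  batchSizes: [50, 100, 150]           # 3 x 3 = 9 permutations, 9 jobs
storage:
  azureFileShare: experiments          # network share for summaries and models
```

The chart's templates would iterate over those lists to stamp out one job per permutation, which is exactly the nine-experiment pattern shown later in the demo.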
How many of you are familiar with Virtual Kubelet? Okay, so not that many. Virtual Kubelet, VK, is an open source project in the Kubernetes upstream that implements a virtual kubelet that masquerades as a node. To Kubernetes it looks like a kubelet, but on the back end you can do whatever you want. It provides a pluggable interface so you can add different providers; the current ones include Alibaba Cloud ECI, AWS Fargate, Azure Container Instances, Azure Batch, Service Fabric, and HashiCorp Nomad. Today we're going to talk about the Azure Container Instances provider and how we can deploy containers from your Kubernetes installation to the cloud using ACI. The workloads are not in your Kubernetes cluster; they run on Azure Container Instances. But you can see them in your Kubernetes cluster, because they're scheduled through the virtual kubelet.

The first step: if you have the Helm binary installed and you run helm init, it installs the necessary components into your Kubernetes cluster. In this case we already have it, so helm version shows a client and a server component. (In the next version of Helm there won't be a server component; it'll be client-side only.) Like we talked about earlier, we install the VK chart from the virtual-kubelet organization on GitHub, and we override the values.yaml with the one we specified: if you pass -f and a YAML file, it overrides with whatever you want, so if you're doing hyperparameter tuning you can specify exactly the values you need. It installs the deployments and pods, and helm ls shows the release is deployed. Then we do a get pod and see the virtual kubelet running as a pod, and in the last step a get node shows it registered as a node. So now when we create a new deployment, we can say: go ahead and schedule onto the virtual kubelet, and it creates something on ACI, for example, or any of the other providers.

This is another screenshot of the same thing: you can see the virtual kubelet registered as an agent node. If we describe it (I'm not sure how clearly this projects), the capacity is enormous, something like 800 CPUs, and you can also have GPUs and other resources there. So, like I said, you can think of it as a node, but on the back end it goes somewhere else, to any of the providers.

For example, this is how you would schedule a GPU on Azure Container Instances: you add an annotation, virtual-kubelet.io/gpu-type, with the GPU SKU you want. In this case it's K80, but there are other SKUs too. And for the node selector, just like scheduling a pod to a specific node, you target the virtual kubelet: type: virtual-kubelet, hey, I want this one. You can also have a node selector for a Linux or a Windows node; in this case it's Linux. That's how the pod gets scheduled to the virtual kubelet, which then creates an Azure Container Instances instance, basically.
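Pulled together, the scheduling pattern just described looks roughly like the pod below. This is a hedged sketch: the annotation key and the toleration reflect how the ACI provider worked around this time, but treat the exact names, and the image, as assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-train
  annotations:
    virtual-kubelet.io/gpu-type: K80     # assumed annotation for the ACI GPU SKU
spec:
  containers:
  - name: train
    image: myregistry.azurecr.io/celebrity-train:latest  # hypothetical image
    resources:
      requests:
        cpu: "2"
        memory: 8Gi
  nodeSelector:
    type: virtual-kubelet                # target the virtual node, not a real VM
    beta.kubernetes.io/os: linux         # ACI runs Linux or Windows containers
  tolerations:
  - key: virtual-kubelet.io/provider     # assumed taint set by the provider
    operator: Exists
```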
All right, so Sertaç talked a lot about Azure Container Instances; so what is it? Basically, it's a service that lets you run your containers without worrying about the underlying infrastructure, all the nodes you'd otherwise have to manage yourself. All you need to do is bring your container: tell it how much resource you need and where your image is, and if you decide to use files, tell it where the network share is. All in all, it's a provider that runs your container without you having to worry about the actual infrastructure. The other nice thing is that it's created as you request it, so it only runs, and only charges you, per second, when you actually request the resource. If your container isn't running, you don't have to maintain it and you don't have to pay for it.

So why is this important in our scenario? Well, we've worked with a lot of data scientists trying different experiments. If you've ever had to create your own cluster for your data scientists, you know the experiments they need may require many GPUs, and creating and maintaining a cluster with many, many GPUs over time is very expensive. With this, you can let your data scientists schedule their experiments as they need, without waiting for other data scientists to finish theirs first. It gives them a way to schedule an experiment on demand and scale up and scale out as needed to perform it.

So let's take a look at what VK and ACI look like. Can you guys hear me? As Sertaç mentioned earlier, because I pre-installed the Virtual Kubelet Helm chart, I basically now have a virtual node. The Virtual Kubelet node in this case presents itself as a node, but it's actually backed by Azure Container Instances. So to me, the cluster operator, it looks like I have 100 GPUs at my disposal, but I don't have to pay for them, because I'm not running anything. Now let's see how a developer or data scientist can schedule a container, a job, or a pod in this cluster that's really running on an Azure Container Instance. Here I have the often-demoed NGINX pod, and all I'm doing is telling it: I need this much, and I want this container to run on the node called virtual-kubelet. Once this pod is scheduled (maybe this view looks better), it shows that it's running on the virtual-kubelet node, not one of the nodes in my cluster, and it gives me a public IP I can use to reach my NGINX pod. And there you go: you create and schedule this pod like any other pod, but the actual resource it runs on is an Azure Container Instance. As soon as I delete the pod, it's no longer running and I'm no longer paying for it.

Okay, I might skip this, because as you've seen, we've had a lot of diagrams; but let me quickly walk through it. In a typical machine learning workflow, you have some data set you want to train on, and you want to be able to run training, whether that's sequential or with hyperparameters you want to experiment with, and maybe you want to distribute that training.
Those are the parts that get difficult, especially if you only have one machine, or you're in an organization that doesn't have much GPU. And the other problem a lot of data scientists face: now I have a bunch of models, how do I serve them, and how do I make sure the serving application is scalable? These are the questions that keep coming to us from data scientists.

For our particular demo we'll use the same workflow, but the problem is to find a celebrity: we have one celebrity, and a bunch of people who are not that celebrity; basically an image classification problem. What we're doing is transfer learning with Inception V3, which was previously trained on ImageNet. Taking that, we create our own data set of celebrity and not-celebrity, and we use transfer learning to train the top layer so we can classify for our particular celebrity. To make our model more accurate, we can run our experiments repeatedly in containers, and because we're using Helm, Virtual Kubelet, and ACI, we can parameterize our experiments and run them in parallel, and we can visualize the training results as they run with TensorBoard, because all of that comes with Kubeflow. And last but not least, we have a pipeline, with Kubeflow Pipelines or Argo of your choosing, so once your model is done, the pipeline can trigger the serving stage, and the serving stage uses the model you want.

So let's look at the demo. Here I have a simple docker run command, and all it's doing is running my code: Python code that takes the Inception V3 base model, retrieves the images I want to use for training, creates some distorted images for training, and builds the last layer before the actual training starts. As you can see, I'm running this on a single VM, using a Tesla K80 GPU. All this shows is a container running the training portion of the experiment.

Next is the serving part. Once you have a trained model, you obviously want to serve it, and there are many ways to do it: you can write your own Flask application and serve your model that way, or use TensorFlow Serving, or Seldon, however you want. In this demo we're using TensorFlow Serving, because it just worked out of the box; all of that is in this demo. Again, we're not even on Kubernetes yet; this is just Docker. I take the TensorFlow Serving base image and use it to build my own image with the transfer-learned model I've already trained. Then I create the serving server, which runs on port 8500. Next you'll see me run some Python code that grabs an image and calls that TensorFlow Serving endpoint; here is the code doing the prediction, running against the serving server, and here I get some results back, which tell me how good the prediction is and whether it's classified as the celebrity I want.
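A sketch of that Docker-only train-and-serve flow follows. The training image and its flags are hypothetical; the tensorflow/serving usage follows that image's standard conventions (model under /models/&lt;name&gt;, gRPC on 8500):

```sh
# Train on a single GPU VM (hypothetical training image and flags)
docker run --runtime=nvidia \
  -v "$PWD/models:/models" \
  myregistry.azurecr.io/celebrity-train:latest \
  --base_model=inception_v3 --output_dir=/models/celeb

# Serve the exported model with stock TensorFlow Serving, gRPC on port 8500
docker run -p 8500:8500 \
  -v "$PWD/models/celeb:/models/celeb" \
  -e MODEL_NAME=celeb \
  tensorflow/serving
```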
So anyway, that's the serving demo. In the next step we use TensorFlow training on Kubeflow. We have a Kubernetes cluster with GPUs; you can immediately see the GPUs there. Next, we install Kubeflow. This recording shows an older installation, an older version that uses ksonnet; now there's kfctl to make this more streamlined. We install all the components, and at the end you can see the Kubeflow components. Like I said, this is an older version than what's current, so there's more there now, but this is basically a Kubeflow installation.

One of the things Kubeflow really helps with, compared to a plain Kubernetes batch Job, for example, is auto-discovery for things like GPUs. Before, you had to explicitly specify things like the driver volume mounts, which a data scientist, or even an engineer, might not know how to configure, especially on a managed Kubernetes installation. The TFJob abstracts away all of those components, so you don't have to worry about where your driver paths are, for example. On the left you see a very simple batch implementation for Kubernetes running the same container; on the right is the TFJob.

In this step we create that TFJob on Kubernetes, via the TFJob CRD. We see the retrain TFJob running, and a persistent volume that's now bound. This is the container that's running, and it starts running just like Rita showed earlier; very similar, but in this case it's on Kubernetes using Kubeflow. We're doing the same thing, and as you can see it's reserving GPUs for this container. It also creates a TensorBoard instance, so once we reserve an external IP and browse to it, we can see the TensorBoard instance for our training. In the next one, we deploy this to ACI.

All right, so this is the holy grail. We've got our training onto Kubernetes, and we're training with TFJob. Because of Kubeflow we have all this stuff running in our cluster, but how do I actually scale it out? How do I have one centralized TensorBoard looking at all my experiments happening at the same time? And how do I pay only for the container resources I need, while they're actually running? This is where Virtual Kubelet and ACI all work together. We first look at the nodes; here I'm just showing you, that's the virtual kubelet. Next we use Helm to install the hyperparameter experiment; again, an application is an application, so you can package your ML experiment with Helm like any other application. As you can see, this creates nine different jobs based on the hyperparameters I've parameterized: nine permutations, so nine experiments with nine TFJobs. Each job creates a corresponding pod, and the pods are all running on the virtual kubelet, as you can see. And now I run az container list, just to show you what's behind it.
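For reference, here's a sketch of what one of those nine TFJob manifests might have looked like. It uses the kubeflow.org/v1beta1 API of the 0.4/0.5 era; the image and arguments are hypothetical:

```yaml
apiVersion: kubeflow.org/v1beta1
kind: TFJob
metadata:
  name: retrain-lr001-bs100              # one of the nine permutations
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 1
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: tensorflow             # TFJob expects this container name
            image: myregistry.azurecr.io/celebrity-train:latest  # hypothetical
            args: ["--learning_rate=0.001", "--train_batch_size=100"]
            resources:
              limits:
                nvidia.com/gpu: 1        # GPU request, no hand-wired driver mounts
```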
Behind those pods, it's actually creating a bunch of container groups on Azure Container Instances, and as you can see, it uses the SKU I tell it to; it's using K80, but you can specify whatever SKU you want. This is basically all the information you give to Azure Container Instances so it knows how to run your container, and you can see it creating these containers right now. Okay, now the containers are running. Just to show you what it looks like behind the scenes: each pod corresponds to a container group in Azure Container Instances, and all of them are running and utilizing resources. You can look at the containers that are actually running, the lifecycle of each container, and all the logs, very much like the logs you saw earlier, except now we're inside Azure; that's all.

What this is doing is training all of your experiments with the parameters you specified, so each experiment has its own set of parameters. In my Python code I've also told it to persist the experiment summaries and the models to Azure Files, which is a network share you can write all your stuff to. So when we load TensorBoard, that's where it retrieves all the logs and summaries from, and that's why you're seeing all of the test results at the same time inside one TensorBoard: you can watch how each experiment is doing as it runs.

All the code and examples we talked about are in the first repo, k8s-ml under ritazh on GitHub, and under github.com/Azure we have kubeflow-labs, which takes you through a workshop-style progression, from how to do it with Docker, to Kubernetes, to Kubeflow, going through all the components one example at a time. And this is how you can create GPU-enabled clusters in Azure using the Azure Kubernetes Service. Thank you. This has all the code and all the charts you need to get this entire demo working, and it also has a lot of videos (I know someone was asking for videos) of all of this, end to end. So check it out. We have stickers!

When would you choose to use Azure Container Instances versus Azure Kubernetes Service? It seems like you'd have different types of flexibility with those two approaches.

So, when you want to run a cluster, say for your organization, where you want to maintain the lifecycle of your cluster for a long time, that's when you want an actual cluster; that's when you want AKS, the Azure Kubernetes Service. Whereas Azure Container Instances deals more with the container itself: it's for people who just want to run their containers and don't necessarily need anything fancy around them. Someone mentioned Istio, or writing your own CRD you want to deploy; say I wanted to deploy Kubeflow, I couldn't do it on Azure Container Instances alone, because ACI just runs that workload on its own, without knowing anything else around it. With Kubernetes, you can only run the CRDs and the Services and the PVCs if you're actually in a cluster. So the question is: with Virtual Kubelet, can you actually leverage both?
Right, exactly. That's exactly why Virtual Kubelet was created: so you can leverage these container runtime providers while still keeping all the goodies of the Kubernetes cluster itself. That way you can manage all your applications, deploy them, and handle lifecycle and security with Kubernetes, but only pay for the resources you need, when you need them, via Virtual Kubelet.

Yeah, I'll say basically the same thing: with Virtual Kubelet you can burst out to ACI whenever you need to, without having to have the necessary resources in your cluster already; it's on demand, basically. You can add more GPU nodes to your Kubernetes cluster, but then you have to scale those nodes back down, which takes more time. Actually, we forgot to talk about the cluster autoscaler. The cluster autoscaler lets you scale out the actual nodes in the cluster, but don't forget: you still have to manage those nodes once they're created. You have to patch them, you have to, you know, love them. With ACI you don't: you schedule them, they run for as long as the container is alive, and after that you never have to worry about them. One thing I also want to mention is persistent volumes. If you have an application where you don't want to persist to a network share, ACI wouldn't make sense, because it doesn't have a concept of local volumes. If your application does care about local persistent volumes, that's where a Kubernetes cluster makes more sense. So it depends; but with VK you can have the best of both worlds.

Along the same lines: instead of comparing nodes with a managed container, what about managed containers versus managed functions, like Knative? Couldn't you get some of the same benefit, not having to run your own node, with a function? Isn't that a more granular way of doing this?

I can't comment too much on Knative, but I can talk about what I do know. I think the big difference is running your code as code, versus owning the lifecycle of your container; they're different. In this case you're treating your application as a container, not just "I'm running some Python code" or "I'm running some Go code." Functions are more like the Heroku world, where you ship a buildpack, whereas here your knowledge and your lifecycle live in the container itself. I'm not sure I'm fully answering, but it's a different way of doing the same thing. I think it comes down to how much freedom you need in your application, because there's a lot you can do with functions and Knative; it just depends what you need.

Thanks for your answer, Sertaç. And next up we have Pete MacKinnon from Red Hat, who's going to talk about how flexible Kubeflow is for deploying and scaling in multiple environments; specifically, in this instance, taking a local deploy environment and scaling it up to OpenShift.

Hello. Welcome to this slightly delayed presentation.
I'd like to thank Canonical for sabotaging Red Hat. Thank you, Josh, very clever. Seriously, I would like to thank Canonical; they were the drivers behind putting together Kubeflow Day, so it's a great event. Thanks, Josh. Let's see, let me put this down.

Since this presentation is starting so late, it's going to move pretty quickly, and we're going to skip over some stuff. Basically, the presentation is about a specific feature of Kubeflow: its capability, or ability, to be deployed into multiple environments, and we'll look at what that means with respect to different environments. You've seen lots of slides up to this point, starting with this morning, and they should give you the sense that it's actually pretty tricky to put together a machine learning platform when you consider all the different parts: model training, management of the data, data provenance, building the model, rolling that model out to production; a lot of parts involved.

So when our heroes at Kubeflow, like Jeremy, started the project, they set out some high-level goals: portability, scalability, and composability. Composability is really covered in the sense of a microservice architecture, but this talk focuses on the first two, portability and scalability: the notion that you can go from bare metal to the cloud with this open source machine learning platform, and that it scales easily, starting with a developer and their machine, maybe a laptop, and scaling out to a hundred nodes.

Guess what: surprise, surprise, Kubernetes is a terrific enabler for these types of goals. And today with Kubernetes we have different dimensions to consider. There's on-prem, in terms of how you deploy it; there's hybrid cloud, which my company is very much involved in; and then there are the public cloud providers: GKE, EKS, AKS, EC2, all that stuff. So that's one dimension of the problem, and the other part is the Kubernetes distributions themselves: there's OpenShift, which my company Red Hat makes, there's Rancher, and Docker has an implementation of Kubernetes you can install on your desktop. So there are a couple of dimensions to this challenge. And when it comes to the cloud providers, think of the different ways there are to authenticate to those environments; and then, as Kubeflow matures, there's also the question of how we take that authentication and manage it.
Ideally it's a one-time, SSO-type authentication, applied and managed within the cluster itself; in this case I'm talking about a Kubeflow cluster. Then there are considerations with respect to storage: Google Cloud Storage versus S3, local storage, or block storage like Ceph or NFS. And there are considerations around hardware resources, which are very important in machine learning platforms, particularly GPUs: there are different ways to install GPU capabilities across the different Kubernetes providers. Certainly my company, with respect to RHEL, is very invested in SELinux, and doing this in that type of environment provides, or imposes, some additional considerations, as it were. Nonetheless, the community, the open source project, strives to put all of that to rest for the users and the operators, to make it easy as a user and an operator to have a consistent experience.

By now you have a sense of the components involved. We have this elastic notion of the core components that fall out of kfctl. If you do an install of, say, 0.4, there's the Ambassador component for ingress control, and JupyterHub; we have an asterisk there, because that's undergoing an evolution, and in 0.5 there will be some significant changes in the notebook-launching space. There's Katib, which has been discussed, for hyperparameter tuning; Kubeflow Pipelines; the training controllers; all that stuff; and then these various other components. That span of components also brings various considerations for how you deploy this into different environments, starting with a local dev environment and scaling up into the cloud environments.

When our heroes at Google started, they looked around for something that could provide config management for this new project, and they settled on a project called ksonnet. That's a CLI tool that's still used today within Kubeflow, and it's used to specify and generate these Kubernetes deployment objects. It's based on the Jsonnet language and is more expressive than defining your Kubernetes components directly in JSON or YAML: you have variables, conditionals, mixins, and the notion of prototypes, where you can install various components from what are essentially libraries, in registries that can be hosted on GitHub. Fairly powerful constructs. And what ksonnet generates from those resource definitions is entirely compatible with Kubernetes: Services, Pods, Deployments, all these objects, which you can directly edit and modify. The idea is that you'd have, if you will, a source of truth for your application, expressed through this ksonnet-managed configuration.
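For flavor, the ksonnet workflow from the Kubeflow getting-started guides of this era looked roughly like the following. This is reconstructed from memory, so treat the registry path, prototype names, and parameters as approximate:

```sh
# Approximate ksonnet flow from the Kubeflow docs, circa v0.4
ks init my-kubeflow                       # create a ksonnet app
cd my-kubeflow
ks registry add kubeflow \
  github.com/kubeflow/kubeflow/tree/v0.4-branch/kubeflow
ks pkg install kubeflow/core              # pull in the component prototypes
ks generate kubeflow-core kubeflow-core   # expand the prototype into components
ks param set kubeflow-core cloud gke      # one variable retargets the environment
ks apply default -c kubeflow-core         # emit and apply plain Kubernetes objects
```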
The other part of that, again thinking of the multi-environment problem, is that in the enterprise you'd have a staged rollout of a machine learning offering: you might start out in a dev environment with certain specifications or capabilities, then a test environment, then staging, then prod. The idea with ksonnet is that a minimal specification of a variable or two can greatly define and impact how the application is actually generated and deployed into that specific Kubernetes environment. It also supports continuous deployment, and you can delete the resources by deleting the components themselves.

But: wah-wah, sad trombone. What happened at the end of last year is that VMware acquired Heptio, and Heptio is, or was, a very influential company in the Kubernetes space. In early February they made announcements in a blog post declaring what they were going to do with some of the open source projects they had started. One of those was ksonnet, and unfortunately the decision was made that ksonnet would be shuttered in terms of new development. That has pushed our happy Kubeflow community to take a step back and think about decisions to make about how we want to come back to this problem of config management for this application.

The key point here is that even with the ksonnet announcement, the underlying goal of the project hasn't changed: it should still be very easy to deploy Kubeflow into these different types of environments. That's still a key goal. Also, ksonnet provided an on-ramp for the various integrations that were really key from the beginning of the project. Going back a year, we had Seldon integrated into the project, and Pachyderm, and the roadmap for those integrations was provided via ksonnet, so it's still important to have that type of capability. Today the community is evaluating options. There's a lot of interest (I don't know if Kam's here, or Jeremy) in Kustomize, an open source project for YAML-based configuration similar to what ksonnet provided. There's Kapitan; there's Helm 2, which the folks at Microsoft were just talking about, and its successor, Helm 3. So, a couple of options.
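To give a feel for the Kustomize option, here is a minimal, hypothetical overlay layout; the directory names and parameters are mine, not Kubeflow's actual manifests:

```yaml
# overlays/gke/kustomization.yaml: hypothetical overlay on a shared base
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../base                   # common Kubeflow component definitions
patchesStrategicMerge:
- jupyter-gpu-patch.yaml       # GKE-specific tweaks, e.g. GPU node selectors
configMapGenerator:
- name: kubeflow-platform
  literals:
  - platform=gke
  - zone=us-central1-a
```

An overlays/minishift directory would patch the same base differently, which is exactly the base-plus-overrides idea the talk comes back to in a moment.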
We're actively looking at that and making decisions as a community. So, still sticking with ksonnet: it's not dead yet, and it's still used in our community for the time being. With the shell script that spawns a Kubeflow deployment for you, you can specify these different platform settings, and from that it defines a set of parameters relevant to what you're doing. In the case of TF Serving, if you're storing your models in S3, there are parameters for zone, region, endpoint, whether you want to verify SSL, the access credentials, things like that. Docker for Desktop and Minikube are the developer-type implementations of Kubernetes, and for those there's a definition for local storage. Alibaba has been very involved in Kubeflow, and as a Chinese company, one of their early issues was realizing they couldn't get access to some of the public images being curated and produced by the Kubeflow community; that's due to the Great Firewall, so there are parameter changes that provide a different URL to a mirror inside the Great Firewall for pulling those images. And then for GCP there's the API version for GKE, zone, email, project, the endpoint, the Google application credentials secret, all that information. These are the types of parameters it's important to capture and define, to make it easier for an operator to deploy Kubeflow into different environments.

I don't know that we've made a decision about Kustomize; it's certainly a frontrunner, and Kustomize is interesting as an alternative. It keeps this notion that we would have a common, or base, set of definitions in YAML, which gets processed against specific overlays or overrides of those base definitions, or enhancements to them, for that environment. So Minishift, kind of like Minikube (it's the OpenShift version of Minikube; I'll talk about OpenShift in a second), and then you can imagine an overlay of GKE-type parameters that gets processed and used for deployment into Google Cloud. Okay, so that's kind of where the community is at with adopting a new configuration tool.

Just a word about OpenShift: that's my company's Kubernetes distribution, an enterprise Kubernetes distribution. The design of OpenShift is really about enhancing the developer experience for developers using Kubernetes, and also for the operators tasked with deployment into the enterprise environment. There are higher levels of security around things like container UIDs: not being able to run a container as a well-known UID like root, or a container that's trying to bind to port 80, things like that; there are controls within OpenShift for managing that. It's very much self-service; there are developer tools for quickly generating applications; and underneath, it's still all the same Kubernetes constructs you know and love, Pods, Services, and it all integrates seamlessly with Kubernetes. There's a developer tool with OpenShift, the OpenShift client, and as part of that tool it has the ability to fire up a developer Kubernetes instance.
And basically what it does is pull down an image and fire up a container which is itself a self-contained Kubernetes environment, whereby you can quickly deploy an application; in this case, the application we're interested in deploying is Kubeflow itself. This runs in developer environments. I know it's heretical to say this at a Linux conference, but this is actually running on my MacBook Pro, or that's where the video was stolen from. That's a 16 GB environment; it's limited, but it can be useful for doing quick evaluations of various Kubernetes-type applications, notably Kubeflow.

And then there's the deployment of Kubeflow itself. On our website, in the getting-started information, there's a fairly well-laid-out recipe that involves the kfctl shell script for deploying into basically a bare-metal or on-premise environment. That's what you're looking at here. Again, this is in a local developer environment, running in this "oc cluster up" environment, and these are the same instructions you would follow basically anywhere to deploy Kubeflow.

To run Kubeflow on OpenShift, this enterprise Kubernetes distribution, there are some considerations, because of the extended resource concepts and the RBAC stuff: there are enhancements like projects layered on top of namespaces, and there are permissions and constraints applied through the security context constraints around those things. We at Red Hat have had a lot of success with OpenShift; it's been a popular Kubernetes distribution, while the Kubeflow community does most of its development against a generic form of Kubernetes, and a lot of the development is done in GCP and GKE.

The other thing that's come on recently, not just in Kubeflow but across the Kubernetes space, is the development of operators. Kubeflow has developed its own set of operators, and the ecosystem we're integrating with is also making strong use of them. An operator typically consists of an implementation, maybe in Go, of some capability that manages a very specific application resource for you; in Kubernetes resource terms, there are custom resource definitions, cluster roles, and cluster role bindings. A CRD, as you know, has to be installed as a cluster admin, and these can prevent, or impose challenges on, deploying Kubeflow on OpenShift. So I created an issue in the community and captured the friction log there: there are various cluster-admin-type capabilities you have to apply to some of the components to get it running smoothly on OpenShift.

But having done all that, let's look at a little demo; it's not a live demo, it's recorded. The same Kubeflow application you saw earlier in the other video: this is the exact same deployment. That's really the point I'm trying to make here with Kubeflow. This one has been deployed into a much more robust environment: it's running RHEL 7.6, it has 64 GB of RAM and 8 CPUs, but more importantly it has an NVIDIA V100 GPU. This is playing in a loop, but what it's showing you is that using the same components specified from kfctl, I can launch a JupyterHub instance, specify one of the curated GPU TensorFlow images from the Kubeflow project, and have it run successfully in this environment.
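The kfctl recipe referenced above was, at this point, still a shell script; from memory it looked roughly like this, so treat the paths and flags as approximate:

```sh
# Approximate kfctl.sh flow for an existing-cluster/on-prem deploy, circa v0.4
export KUBEFLOW_SRC=~/kubeflow
export KFAPP=my-kubeflow
${KUBEFLOW_SRC}/scripts/kfctl.sh init ${KFAPP} --platform none
cd ${KFAPP}
${KUBEFLOW_SRC}/scripts/kfctl.sh generate k8s   # emit the manifests
${KUBEFLOW_SRC}/scripts/kfctl.sh apply k8s      # apply them to the cluster
```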
I'll say it again: it's all the same code, whether I installed it on my MacBook or in this larger environment that just so happens to have a very expensive GPU card in it. That's the beauty of Kubeflow, and something the community really strives for as a project goal: it's a very portable environment. Especially for people just getting started with machine learning platforms, they can work comfortably and easily in a local environment, and then take that out to their bigger game, in these larger environments with more capability.

So, back to this guy. And you've heard this before from other people in the community: we love getting feedback, and in the context of this particular presentation, what we love hearing about is any issues or challenges you encounter trying to use Kubeflow, or deploy it into different types of environments. We'd like to have an all-encompassing experience for Kubeflow in many different environments, whether on-prem, bare metal, or certain cloud environments. What helps us is providing your support in the way of testing, GitHub issues, friction logs, pull requests when you've found something that needs to be fixed; we'd love to hear about it. When you file a GitHub issue, it's great to have specific details: for example, you had a notebook pod that didn't launch, you discovered it was because of this, and you needed to do something to get it running in a particular cloud environment. Tell us about that. Capture the information, the logs, the config (just redact it if you have to) and throw it over to us, and let the community have at it, making a better, smoother Kubeflow experience for everyone. And that's the abbreviated version of my talk, I think. Any questions? Good. Thank you.

Thanks, Pete. Up next we have my co-worker Fatih Nar, who is new to Kubeflow but is going to talk about it in the context of AI use cases he's seeing in the field as a customer engineer at Google.

Good afternoon. Good afternoon! So I have some stickers, each unique, from our friends in Japan, Studio Ghibli, if you're familiar with their work. Each is unique, and you will never see them again other than on your laptop; there are about five of them. To get one you have to answer a question, so I need some interaction here first.

All right. As my friend said in the introduction, I'm new to Kubeflow. However, I am not new to Kubernetes, nor to ML and AI. Before I joined Google I was part of Verizon Wireless as a distinguished member of technical staff. Before that I was at Canonical, Ubuntu, where I did my service to the SABDFL, whatever you call him now: the self-appointed benevolent dictator for life. And before that it was Ericsson. So through my life I have worked in different disciplines of technology, including voice over IP, IPTV, media backends, infrastructure as a service, platform as a service, and software as a service; now at Google I'm helping developers, customers, and enterprises bring their workloads to the cloud. It's easy to say, but it's kind of hard to do, isn't it? Especially if you're in love with the technology but your counterpart on the business side isn't seeing the light at the end of the tunnel as a return on investment; that's a challenging situation. So let me stop there.
What's the picture about? This is my personal car. I had a big accident last weekend in Dallas; I got t-boned, literally, and it was, unfortunately, my mistake. I wasn't on my phone or anything; it was just one second of, I guess, distraction. And the funny thing is, before I left Verizon we were working on edge computing cases, and one of them was literally red-light and stop-light detection. We did it: we deployed it on some 5G sites, and that deployment is still running, on top of Kubernetes. The key quality of that Kubernetes deployment is that it's redeployable, iteratively, without creating snowflakes, so we can scale it and bring workloads on top of it as much as we can without changing the workload. At the time that was barely possible; now, with Kubeflow, especially for ML and AI, it is.

Beyond that case, what I'm going to present today (you've already seen this chart, from a slightly different perspective) is more about the use cases you can go advocate inside your organizations to get some funding. Because at the end of the day, your job is tied to the business; this is the reality. If you can't bring a good use case for ML/AI, which is really hard, especially if you're doing unsupervised learning, where you don't know what you're going to get out of the training (especially clustering, and I'll give you a couple of good examples from the field), you may not get good sponsors.

All right: has anybody heard two words in the same sentence, Walmart and Pop-Tart? Anybody? Walmart and Pop-Tart? Okay, so you didn't see the news about it from a couple of years ago. Walmart has crunched a lot of big data from the last 15 years of purchasing, and they ran k-means clustering on it, and the result was interesting: at particular times of the year when there are extreme weather conditions, people buy a lot of flashlights, and also Pop-Tarts. At first it didn't make sense; then they realized the human behavior for surviving is to have a light, a blanket, and some comfort food. So that drives a different shelving strategy in Walmart if the predicted weather points to extreme conditions: when you walk into a store in a particular Walmart area during rainy, stormy, or hurricane season, you'll notice blankets, flashlights, and Pop-Tarts. This didn't come as a predicted outcome: they crunched big data with the Teradata folks, it took a while, and they spent a lot of money to learn that Pop-Tarts sell well in bad weather. They could have done better if they'd had Kubeflow at the time, and also portable pipelines, in the sense that they can run anytime, anywhere, without restructuring.

Okay, is this better? All right, sorry. Is this better? Better? All right, sorry; the microphone is always the problem at conferences.

One key challenge, as everybody has mentioned: if you're a data scientist or an ML/AI engineer, knowing all of this has been the prerequisite, right? You need to know how to package your software, how to distribute it, how to build your infrastructure, such as a Kubernetes cluster, and maintain and scale it, and also deploy your stuff on top of it, all at the same time. It's a lot of work. That's true.
Yeah, and besides, if you're looking at more accelerated workload cases, where you leverage GPUs or TPUs, that needs even more expertise. In my previous lives, at the time OpenStack was popular, before Kubernetes stole the title, accelerated network applications required better throughput, in terms of latency and bandwidth, and the industry invented SR-IOV. That required different talent on the hardware specification side, in design and drivers, and also for workload migration; it has been a lot of challenge. What SR-IOV mainly did was kill the ability to move workloads on the fly, even when your infrastructure was failing or you were scaling your workload. What does Kubernetes do for you, much more than OpenStack? It not only gives you a place to run your application, it also takes care of your application's lifecycle, by means of ReplicaSets, or you can use DaemonSets or Jobs, as well as monitoring health on a socket, or whatever port you like to scan, or a probe you want to run to see if your application is healthy. And on top of those abilities, as many folks have mentioned, we're putting Kubeflow, to give you a better tool set and make you more productive.

So, edge use cases. Everybody's talking about 5G, about mobile edge computing, low-latency computing. Low latency is only possible, by the law of physics, if you're close to the place where the event is happening: that is, if the execution environment for your application is close to that event. So in telecommunications everybody is saying: okay, we'll put edge computing nodes where the eNodeBs, the radio base stations, are located.
That's what Verizon and AT&T have been doing, partnering with other telco vendors. But what they're not thinking about yet is: the investment I'm making on that edge, putting in a couple of servers, how am I going to build infrastructure there that can be reused by others without requiring a lot of additional administration? They first thought about using OpenStack, then realized OpenStack isn't really the way to go for portable workloads, and started looking more at Kubernetes; and then OpenStack changed its name to be more agile and closer to Kubernetes. At the end of the day, what you're hearing from the other organizations under the Linux Foundation is what you see here: the base infrastructure depends on Kubernetes.

You saw in the other presentations that you can do your ML/AI app packaging on your developer laptop or workstation and then push it to a cloud for the heavy lifting of training the algorithm, because what makes sense is to train your algorithm with a bigger data set, merged data sets. One reality we saw in the field: to get insights that are valuable to your business or your partners, you have to crunch a lot of data, and a lot of different data. If you think "I have one petabyte of data, so I'll get great insights out of it," it may not happen. You have to think differently and combine different data sets. You have a data set for weather forecasts? Combine it with sales from nearby retail stores. If you're not in sales, say you're predicting which neighborhoods are going to be more popular for house sales: combine it with geographic topology information. And you may say, "okay, I don't have that data." That's the reality (and Josh is sleeping right there; realities!), but you don't have any excuse. Right now the political situation around the world is driving data to be opened up, by governments as well as organizations. If you need a certain type of data, there are already data brokers out there selling data. And if your organization doesn't know what to do with the data you have, you can consider selling your data through a broker and becoming a partner to whoever uses it, as a kickback for the business. There are different businesses running in the world right now on big data, ML, and AI. You don't necessarily have to know that this data will be used for this particular business; or, if you combine it with other data you can purchase from a data broker, you get different insights.

One of the use cases we saw in the field: one of our partners is a retailer, a very famous one (JCPenney, say; you pick the name). They put down a carpet that senses footsteps: the size of the footstep, the time, all recorded. And they saw that at particular times of the week, and particular times of day, there are different shopping behaviors. After 6 p.m. they noticed families with little kids running fast through the store, obviously to the toy section. At other times of day, especially lunchtime, women's behavior differs from men's in terms of which departments they visit and what they purchase. They crossed the data and came up with: okay, how can we restructure our store, dynamically and fast,
so that we can get more business out of it, driven by the data?

So today, one of the key conversations is: all right, it's good to talk about running workloads at the edge, but is the edge ready? It's true and not true. One example given was the Raspberry Pi: obviously a Raspberry Pi doesn't have enough muscle to run inference, or prediction, at the edge on its own. But Google announced the Edge TPU last year, and now it's available for purchase; this is the kit you can buy for around 149 dollars. And you can see the test benchmarks: the leftmost column shows you get a lot more throughput with the Edge TPU than with an eight-core desktop CPU, which is an expensive combination; and the desktop CPU plus the USB accelerator gives pretty much the same numbers. Meaning that once you have an accelerated board like this, with the Edge TPU plugged into your Arduino or Raspberry Pi, it's good enough to run inference at the edge (on an algorithm you obviously trained beforehand) and deliver low-latency predictions. One example could easily be plugging a small Raspberry-Pi-sized computer into your car, with a small dash camera, to give you a buzz: "dude, you're going too fast and there's a red light right there." It's not a future feature; it's doable today, and this hardware is already available for purchase and consumption to deliver such use cases.

All right. One of the things I have learned through my life is that community is better than companies. It's true: companies have their own business focus, their own agenda, despite leveraging the same technology, but a community has higher purposes. The reason OpenStack grew so much was the contribution from the larger community, and hopefully Kubeflow will grow as fast and as big as possible, to deliver ready-to-use pipelines; even more than that, ready-to-use base pipelines that you can put additional components on top of to deliver different experiences.

All right, another question; I know this is kind of a DevOps question. When you deploy Kubernetes, we have masters and workers, right, master nodes and worker nodes? Before we called them workers, we used to call them something else. No? Minions! A sticker for you here, if you'd like to get it; that was kind of a historical question. All right, another question. When you deploy an additional feature on top of Kubernetes, such as Istio, it's not necessarily a regular pod, it's something else. What do we call, for example, Envoy, or Istio, on a worker node? Yes, sidecar; one for you too, if you want it. The reason I'm asking these silly questions is that Kubernetes is open to different ways of delivering different feature sets: it could be DaemonSets, it could be sidecars,
it could be pods. And Kubeflow is leveraging that heritage. So your business may need you to deliver something brand new; or your business, or your partner in crime, may need something that's already out there to use as an API. For example, Google Cloud has a Vision API you can use to detect objects and label them, and a Speech API, mainly for speech-to-text and text-to-speech conversion. Or, if you need something a little different, you can train your algorithm on ML Engine and export it. And if you want to do something definitely brand new at the edge, by means of either training or just running inference, or prediction, that's where you're going to see the benefit of Kubeflow. These aren't necessarily separate worlds; maybe you combine them, in the sense of getting the best price/performance for your spend: crunch the data and train your algorithm on ML Engine, leverage the TPUs, leverage preemptible VMs for minimum bucks, export it, and run your inference at the edge with the Edge TPU accelerator. That's much, much less than you'd spend buying servers with GPUs and maintaining them: electricity, cooling, even securing those servers.

All right, this next slide is just self-advertisement, so I'm skipping it; many folks have talked about these already, and Pipelines. What I'm going to fast-forward to: there was a question today, I think from a user, about examples, right? Let's go do them.

The demo I'm going to talk about is from the Kubeflow GitHub, under examples. I took it one step further, to make it easier to grasp and deploy yourself. I know the projector resolution isn't good, so let me tell you what you'll be seeing shortly. First, we package our application on our laptop, or inside the machine you're using as a jump host; mainly, we're building a Docker container, and inside the Docker container we have a machine learning algorithm. Then we push that container to the Google Cloud container repository, GCR. At the same time, we deploy a small Kubeflow cluster, which, as you'll see, is very easy to do through the UI or the CLI, and then we train the algorithm on the Google Kubernetes Engine (GKE) based Kubeflow cluster. Once training is done, we export the model into our object storage, Google Cloud Storage. And then (imagine this second Kubeflow cluster is running at the edge) we simply push that model from GCS into the edge infrastructure where Kubeflow is running, and we run the prediction there. As a sample, we deploy a web app front end to run some sample inference test cases.

The first step is deploying your Kubeflow. If you go to deploy.kubeflow.cloud you'll see a UI like this, and on the left you put in your project ID and a deployment name. The project ID is the project you created on Google Cloud Platform. So the first precondition: log into the Google Cloud console with your Gmail address. If this is your first time logging in, you're given 300 dollars of credit, which is good enough, if you ask me (I experienced it before I was part of Google) for a sample ML/AI or even big-data analysis workload.
As an example: before I joined Google, I collected the weather forecast from the open data initiative in Dallas, as well as ground humidity data for agriculture. I pushed it into BigQuery, and from BigQuery I ran ML/AI on it, and that cost me around 100 dollars, believe it or not, on a data set of around 100 terabytes.

So once you have created your project, put the name in there. Let me create another Kubeflow cluster here as a sample. You can see my project name under the home button; the project ID is kubeflow-demo-2019. Voila. Let's go back and look at our deployment status. What it will do is first create a deployment, and show your deployment here; as you see, kubeflow-la is getting deployed. There are different types of tools out there, from AWS, Azure, and others, and on top of those there are umbrella solutions such as Terraform and so on and so forth. We have our own automated way of deploying things; we call it Deployment Manager. So what the UI triggered is an automated deployment of a cluster. What's running right now will first trigger the creation of the virtual machines, then deploy Kubernetes on top of the virtual machines, and then automatically install Kubeflow on top of Kubernetes. That's the first step, and it's only one click through a UI.

You may say, I won't remember all of that, so all the references are here. If I go and open this GitHub repo, this page literally lays out all the steps we are running through, with all the details. The code itself is under the Kubeflow examples, for optical character recognition through ML/AI, and all the steps are described here with the CLI commands and everything. If you see an error, feel free to drop me a message so I can correct it, and once I can, I'll add this README as a second README under the example.

So after the cluster is deployed, you can obviously go and get the details of the cluster for the kubeflow namespace: kubectl get pods --namespace=kubeflow. Here we go; these are the pods that were deployed automatically under the kubeflow namespace, the ones delivering the Kubeflow components.

So, the following step: okay, our Kubeflow is up and running, and we've packaged our Docker container through the steps I provided; the next step is running the ML training and then exporting to the GCS bucket. Who here has never touched Docker before? Everybody knows how to build a container out of a manifest, I assume, right? It's pretty straightforward. Who has recently pushed a container into a Docker repo or GCR? All right, so the question here is: imagine your partner in crime says, hey, this is our idea, we cannot put it on a public repo. You cannot push your container to a public repo; they will steal it, they will download it. So what would you do? The answer could be: hey, I'll get a managed service from Docker or GCR for a private repo. Or: set up your own. Exactly. But the problem is that setting up your own comes with your own responsibility. What if that machine crashes? What if you have a hard disk failure? You've lost the container, right? That's where a managed service comes in, with the beauty of a service level agreement: whatever container you push out there, even to a private repo, will be there.
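As a rough sketch of that build-and-push step, assuming the project ID from the demo; the image name is illustrative, not from the talk:

    # Build the training image and push it to a (private) GCR repo
    gcloud auth configure-docker                          # wire Docker up to GCR credentials
    docker build -t gcr.io/kubeflow-demo-2019/ocr-train:v1 .
    docker push gcr.io/kubeflow-demo-2019/ocr-train:v1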
Since you're paying for it, even if the back end crashes, there's replication in the picture that helps you continue your business. Plus delivery: when you push a container into GCR, you can pull it from your edge computing sites with much better throughput and latency than those edge sites connecting back to your headquarters; God forbid you're using a VPN for remote connectivity and so on and so forth, that's another bottleneck.

All right. As our friend mentioned in the previous Red Hat presentation, in this demo we are also using ksonnet, mainly for configuration management, to handle the different contextual switches. What you see here is where we build the first container and push it to the GCR path we were given above, and then train on the GKE cluster we just deployed as a Kubeflow cluster; ksonnet is mainly telling it where to download the container from and running it on the Kubernetes cluster.

And the last step is step number eight, where Kubeflow is running at the edge side and predictions are happening: again, the same gcr.io path is used for downloading the container, the config is applied through ksonnet, and then you run the inference. Obviously, when you run predictions at the edge, the easy way to see it for demo purposes is to build a web front end, which we did here. Let me show you. This is the web UI, running as a Docker container on the same edge Kubeflow Kubernetes cluster, and this is the UI's IP address. If I go down here, this side is connecting to the prediction, or inference, serving running on top of Kubeflow: you give it an image, and it translates it into a character. This one says it's most likely a one, I guess; let's try this again; it says six, and so on and so forth. So in this demo we leveraged the existing MNIST data set, used the ML algorithm inside the Docker container, pushed it to GCR, downloaded it for machine learning training on Kubeflow, exported the trained model into a Google Cloud Storage bucket, and then pulled it into Kubeflow again for inference. That's the long story short that I'll give you today.

Any questions so far? Did I make you sleepier than before? Sure, which step? Oh, okay, sure. For this example, let me open the code. So first, let me take that question so I can show the code quickly. The beautiful thing about GCP is that you can use your source repos directly on there, Cloud Source Repositories, as well as your container registry. The code is there, and I just sync it with my GitHub. If I take a sample here, this is the model; I'm not sure if you can see it. This is the model that we package inside the Docker container and then run the training on top of. Can you see it? Okay. So this repo, the Google Cloud source repository, is per project. You can sync it two ways with your GitHub, and code changes here can automatically trigger the CI/CD part: building the container on the fly and also deploying through Spinnaker. So back to the other question: which step did you mean, sorry? Oh, you're all set.
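For reference, the ksonnet flow mentioned above would look roughly like this; the component and parameter names are illustrative, and the sketch assumes a component was already generated from a prototype:

    # ksonnet-era configuration management (sketch; names illustrative)
    ks init my-app && cd my-app               # creates a default environment from the current context
    # ... assume a 'train' component was generated from a prototype earlier ...
    ks param set train image gcr.io/kubeflow-demo-2019/ocr-train:v1
    ks apply default -c train                 # render and apply the component to the cluster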
Oh, okay. I hope this made a little bit of sense, by means of not just using the tools for the sake of using them, but generating something valuable: an idea becoming a reality, and the reality becoming business income for you. As I said, at the end of the day technology comes and goes; the main thing is what you get out of it as an output. Today, as you saw, ksonnet will no longer be the obvious choice for us because of the Heptio acquisition; something else will replace it, and later we'll see other tools come and go. Mainly: if you have an idea, and these tools make your life easier, use them, and if you need support, leverage the community. That's the only message I can give you from experience. Community is a two-way road, by the way. You shouldn't only be extracting knowledge and data out of it; you need to think, hey, I'm part of a family, I have to contribute back. Because I see, in my experience, some organizations out there (I don't want to name those service providers) getting into open source projects, bundling them into a business proposal, but not delivering anything back to the community. This is not the way; it's selfish. Contribute back, please. This is how we grow as a body of knowledge, as a human organization. All right, that's it. Stickers are here for the people who'd like them; come up and take one, I'll give you one, how about that?

All right, next up we have a workshop on using model-driven architectures to deploy Kubeflow at scale, from Kenneth Koski, who works at Canonical. Kenneth, are you here? Yes, excellent.

...as an implementation of Kubernetes persistent volumes. This is the end-to-end story. This is what we do: planet-scale data management. We can snapshot, version, package, and distribute your data over a peer-to-peer network. So you can be working locally, producing versions of your data: you put things in a file system, you snapshot it in a notebook, and then you start machine learning pipelines from that snapshot. That's what we do today, and then you distribute these versions.

So, this is a pipeline. You've heard a lot about pipelines today, right? Pipelines are big in the machine learning space, and Kubeflow gives you a standard way of running your pipeline. Instead of having these pipelines access a data lake directly, you have them access us as your data management layer: they access a local disk, a local directory, /data, that we manage and that we can then snapshot and clone for your steps. So each step of your pipeline works on a different clone of the data, and you get full reproducibility of how your pipeline runs. For each step of the pipeline, you can essentially go in and say: what snapshot did the step start from, and what did the step produce? And then you can mount all of this in a notebook and start over. The idea is that you use notebooks to experiment, you store your data in persistent volumes, you create pipelines from these persistent volumes, and then you can continue experimenting and iterating.

So, this is too much talk; let's see what the machine has been doing. Vagrant up... okay, it came up, and it printed a welcome message: welcome to MiniKF, run vagrant ssh to get started. Okay, I'll do that. At this point this is a full Kubeflow deployment on Kubernetes, so this is Kubernetes, and you can use the usual Kubernetes commands to interact with it. Okay: welcome to MiniKF, run the minikf command to ensure everything is up and running.
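Pulled together, that login sequence is roughly the following; the kubectl check at the end is my addition for illustration, not something shown on screen at this point:

    vagrant ssh                               # log into the MiniKF VM
    minikf                                    # provisioning script: ensures everything is up
    kubectl get pods --namespace=kubeflow     # inspect the Kubeflow pods once it's done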
This is our provisioning script, so we run it. Okay: welcome to MiniKF. So MiniKF is a small deployment of Kubernetes, Kubeflow, and Rok, our software for data management.

Okay. At this point the script essentially declares what needs to be running and makes sure it gets created. The pipeline has run, so let's use a notebook to explore what happened. Again, I spawn a notebook, but this time it's going to be seeded with data. And again, I can go back into MiniKF and see that my pod has been created. Okay, I can actually watch it; that was just a shortcut for this, right? Okay, it's running; it became running in about 29 seconds. JupyterHub is not great at detecting when the pod is up, so I'll just retry instead of giving it more time. So my notebook is up, my snapshot is here, and it already has the file space: it contains the state of the pipeline step exactly as it was. Let me zoom so you can actually see. So step two was expected to run at this state: I have all three files, I have the gzip file that step one created, and I have the unzipped file that step two created. So I can actually see what happened; I can have full introspection, using notebooks, into the results of my pipeline. This is it.

And why is this important? Because I can split pipelines and run them in a hybrid fashion, across locations, or I can recreate pipelines that ran and produced interesting results. And why do we want to do that? So you can have development, training, and deployment happening in distinct locations, driven by distinct people, using distinct platforms. You're running MiniKF on your laptop, exactly as it is now; you're developing and deploying over your local disk, using the exact same APIs; you use pipelines, you use notebooks; you can just work locally, you don't have to go to another location to do that. And then, when you want to train at scale, you can move to Google, for example, and get GPU-enabled nodes there. And when you're done, you can snapshot the results of your training, the model, and distribute it to your fleet of self-driving cars, which will be running inference at scale. All of this happens with Kubernetes and Kubeflow running in each individual location, and we don't have to change our code. But we have revisited this like 20 times today, so it has become more or less common knowledge.

This is it. The way we work is: we appear as a CSI implementation under Kubernetes, and we make Kubeflow data-aware by contributing code to have all Kubeflow components access persistent volumes. Okay, I'll skip the technical part; if you want, we can go over it later.

Thank you. We have a MiniKF channel in the Kubeflow Slack. Please download MiniKF, try it out, and let us know how it goes: anything in the description you don't like, anything that could be improved, any suggestions. We're really looking forward to your suggestions and contributions. Thank you. Any questions we may have?
I've got this very nice microphone here. Thank you. Thank you, that's super awesome; it's very exciting to have launched something today. So thanks, all of you, for coming, and thank you to the people who have stayed until, like, the bitter end here; we really appreciate your enthusiasm. And I guess the takeaway here: we've seen so many talks about different use cases for Kubeflow, but the key is, we would like you to try it. We would like you to send us your friction logs and your bugs; we would like you to try all the different integrations that exist in the community, to work with your cloud provider, or your data, your streaming, your learning workflow. Sorry, it's getting to be that time. And one last thing that would be super helpful for us, to figure out what kinds of content would be most useful to bring to community conferences like this in the future, would be to respond to an attendee survey that Josh just put together on the fly for you all. So if you could visit the link; I think that Ilan will also distribute it by email to all the people who registered for this particular summit. But yeah, let us know what you think. In case you were wondering, we finished up early today; I think one of the workshops became much shorter than expected, but I don't think we've had many breaks today, so I for one am not sorry. If you were expecting the session at five, we did it at 4:30 and it's already done; you're more than welcome to enjoy the rest of the day. But yeah, the speakers are here, and we have questions just in the second row here; otherwise, that demo has already been done.

Can you hear me? Okay. So, thanks for your interest in watching this demo, but I have to undo the results of the previous demo, so give me two seconds. It's all real, because some of the steps fail, right? That's cool. Okay, destroyed everything. So this is where we start from.

This is a presentation on MiniKF. MiniKF is a product that we announced today: essentially a packaged version of Kubeflow, so that you can run it on your local machine, get onboarded, and be able to use Kubeflow as fast as you can. So, are you familiar with what Kubeflow is? Yes, you're at Kubeflow day. Kubeflow is a way to containerize ML components for Kubernetes; I'll be talking more about this later on, because this download can take 5 or 10 minutes, which is very short. This is the fastest way to run Kubeflow.

Okay, what's the story? The story is: run these two commands, wait for a few minutes, and be done. This is what I'll demo today. So you go into a command prompt, you create a directory, you switch into this directory, and you use Vagrant, which is a tool to manage virtual machines, to initialize it based on a box that we produce; we call it MiniKF. And then you ask Vagrant to bring up the machine. This is it. At this point Vagrant is extracting the roughly 8 gigabytes that this box requires (because Kubeflow is a lot of code), and then it will run a short provisioning script so you can try it out.

So, are you familiar with Minikube, for example? Minikube is a very easy way to run Kubernetes on your local machine and start experimenting with the Kubernetes APIs.
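Those two commands, roughly as MiniKF documents them; the box name is the published one, and the rest is a sketch:

    mkdir minikf && cd minikf
    vagrant init arrikto/minikf    # initialize from the published MiniKF box
    vagrant up                     # download ~8 GB, boot the VM, run provisioning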
Why is this important? Because you get a full Kubernetes deployment, you can use the exact same APIs, and you can deploy your workloads in the exact same way you would on the cloud. So MiniKF is Minikube, plus Kubeflow on top of this Minikube, plus Rok, Arrikto's data management software. You have a full deployment of Kubeflow on Kubernetes, with the data management software, that you can use locally within a few minutes. And this is now uploaded to the internet, and you can go and try it out right now. I mean, don't do it right now, because (a) I'd like you to be paying attention to me, and (b) the network here may not be fast enough to download 8 gigabytes of data. So this is progressing, and it's copying lots of stuff onto my disk, so I'll switch to the presentation and talk a bit more about Kubeflow.

So what is Kubeflow? This is a figure from Google's TFX paper; TFX is Google's internal machine learning infrastructure, and Google is essentially open-sourcing these components. To build a machine learning pipeline you need lots of components that have to do with how you manage the different steps of building ML models. All of these components have been containerized and open-sourced, and Kubeflow is containerizing them so they run as components on Kubernetes. So Kubeflow is a project to run machine learning workflows on Kubernetes; this part of TFX is essentially the focus of Kubeflow.

Then you need an orchestration and configuration framework, and Kubeflow uses Kubernetes for this: Kubeflow orchestrates these components as pods on Kubernetes. And then Kubeflow provides an integrated front end, a user interface, so you can manage these components. This is one thing we are contributing to Kubeflow: a user interface to manage Kubeflow's notebooks. Most data scientists begin with notebooks, so we are contributing to Kubeflow a way to manage notebooks easily and efficiently.

And then there are these two big white boxes at the bottom: garbage collection, data access controls, and pipeline storage. Where do you store your data? This is what our software brings. The ML pipelines omit the most important part: how do you handle your data? You can always go to an external data lake to fetch your data and upload it back, but this is not efficient; it doesn't work if you want reproducibility, if you want performance, if you want to work in a hybrid fashion. So we combine Kubeflow, our improvements (such as the user interface, which we contribute as open source to Kubeflow; we are core Kubeflow contributors), and our data management software into a whole, and that is what MiniKF is, for a local deployment.

Okay. So what can you do then? Our business is planet-scale data management. We allow you to take your data, a local file system, or the pods you are manipulating from within your notebook, and snapshot it: multiple point-in-time snapshots, one every five minutes, for example. We allow you to version these snapshots and package them; this is similar to what Git does for your code. We are essentially a Git for your data. And then we can distribute these snapshots, and you can share them in a completely decentralized way over a peer-to-peer network. If ten developers are sitting here producing data snapshots, they exchange those snapshots in a secure way over local links; you don't have to upload something to the cloud just to bring it back. Makes sense?

So before I move on, let's see what MiniKF is doing. It's done; okay, and we started like three minutes ago.
Okay: run vagrant ssh to get started. Great. Let me move to the top of the screen. Okay, and then there was a welcome message, which I cleared, that said: run the minikf command. Okay, I'll run it. This is essentially our provisioning script, which will make sure that everything is up and running, so I can start using Kubeflow within minutes. I'll run the provisioning script, I'll accept the end-user license agreement (I'm a very fast reader, actually), and now the provisioning script runs and ensures everything is up and running. The provisioning script will deploy Minikube, will deploy Docker, will deploy all of the different Kubeflow components, will make sure they are up and running, and eventually it will make sure our software is up and running. And then we can demo using the full set of Kubeflow APIs over this local deployment.

So let's give it some more time. It checks the network to make sure there are no new versions, and the network here is not super good, so there might be a bit of latency; this happened in the first demo as well. This is live, right? It's happening right now; what you're seeing is actually what's going on right now. And because we don't like silent periods, I'll switch here and show you how I can log into MiniKF and essentially use it as a local Kubeflow and Kubernetes deployment. It's still running. Okay. Is the network working? Yes, it is, so let's give it some more time. What is it doing? It's updating stuff. Okay, why is it updating stuff? Our update seems to be not progressing, and I'll blame the network for this. See, I can always blame the network and it works; I didn't do anything, just gave it some more time. And then we provision Minikube, and then we make sure Kubernetes is up and running, and I can actually use this as a Kubernetes deployment. So I can go and say: tell me what pods are up and running right now. These are the pods that are up and running, and the software is initializing itself as Kubernetes components. But you don't need to know this to run it; you can just be sitting here watching the provisioning script run. What the provisioning script does is make sure that all of the distinct components are in place before it says: okay, you can use this. It makes sure to pre-download notebook images, so you can spin up your notebooks quickly, and it makes sure the components are up and running. And when they are, it's been, what, 5 minutes after the machine came up? So within 10 minutes total, you can have your own Kubeflow running locally.

So let's give it some more time and talk about what we'll be demoing after that. This is a machine learning pipeline; three steps, for example. At each step you want to fetch some data, manipulate it, and then produce data for the next step. If you use a data lake, a big S3 deployment somewhere, what most people do is download from this lake, because they need the data to be local (otherwise performance is super bad), and then upload the data back to the data lake. So your performance depends on where your data is; you want to bring your computation close to your data. You cannot be running in one place with your data somewhere else; that would be super slow. Our approach allows you to essentially use us as the local data management layer: each step uses a local disk to manipulate data, and we snapshot the disks after each step.
We give you APIs in the pipeline to manipulate this, so you can have a snapshot of whatever each step produced, and that allows the pipeline to be traceable and reproducible. Otherwise there is a problem in running a machine learning pipeline, or any automated pipeline in general: if something doesn't do what we expect it to do, there is no way to actually go inside the step and see what happened, what it did. So what we allow you to do is say: oh, this step failed; okay, this is where it started from, this is the output it produced, and I already know its code, so let me trace each step exactly. That's what you can do if you run our software.

So what is our end goal with Kubeflow? It's having an end-to-end story that starts from multi-user notebooks: multiple users working on the same Kubeflow deployment, each spinning up their own notebooks, sharing their data, spinning up pipelines from within their notebooks, and then creating new notebooks to explore pipeline steps. And what could the final result of the pipeline be? A model, a trained model. So then you can have a serving pipeline that starts from this model and serves it. Why is this important? Because you can have a training pipeline here, producing the trained model, and then a serving pipeline running somewhere else that takes the trained model as its input and serves it. So you can split a full pipeline into individual steps that run in different locations, because you can very efficiently feed the output of one step as the input of another. And you can also recreate the pipeline: if you can recreate its input and its code, you can recreate it fully.

So let's get back to this; it's up and running. How much time has passed? 10, 15 minutes at most. Okay. So I'll go here, I'll go to the URL, and this is Kubeflow. You deploy it, and you have Kubeflow up and running locally, so you can go explore the Kubeflow APIs. I'll do just that: I'll create a notebook, create a pipeline, then create a notebook again to explore one step in the pipeline. So I'll create a notebook. This is the Pipelines component of Kubeflow; this is the notebooks component. This is release 0.4; we have contributed a new user interface for release 0.5. Okay. So I log in, and I'm creating a new notebook from this image. This notebook will have this much CPU and this much memory, and it will have a new volume as my workspace; this is where I'll store my Python libraries, for example. And I'll create a data volume that I will mount under /pipeline-data, pick its size, and this is where I'll leave my data as I work in the notebook, so that I can then start a pipeline from this data.

So I spawn the notebook. At this point Kubeflow contacts Kubernetes and asks it to create a new pod that will be used as my notebook server. So I can go back to MiniKF and query the pods, and see that my container is being created: 19 seconds into being created, 25 seconds... running within 27 seconds. So it took about 30 seconds to get my container, my notebook, up and running. I go back here; JupyterHub hasn't actually realized the pod is up, so I'll just refresh (the new user interface polls more frequently, so there it comes up even faster). My notebook is up and running: you deployed Kubeflow within minutes, you deployed a notebook on Kubeflow, and you can start using it. So I have the place where I'll leave my data; it's empty. I'll spin up a terminal, and I'll create a very complicated data set, which will consist of three files. But this is a proof of concept for any kind of data processing, right?
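Something like this in the notebook's terminal; the mount-point path is an assumption based on the volume name above, and the file contents are arbitrary:

    # In a terminal inside the notebook server (mount point assumed)
    cd /pipeline-data
    for i in 1 2 3; do echo "created at $(date -u)" > file$i; done
    ls    # file1  file2  file3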
At this point I could be going to my data lake and fetching a few gigabytes of data. This is a local volume, so it doesn't cost anything to move 100 gigabytes of data from one step to another; it's just there. Another step can come and access the data, instead of having to upload 100 gigabytes to S3 and then bring it back, paying the associated cost. So this is where I'll be leaving my data. I'll create... let's say the time is 01:32 UTC... I'll create a few files here. Okay, I'll create three files, because this is the kind of pipeline I'll run later on. So I have created three files. This is the current state of my notebook: three files inside this directory. This is my data set.

And now what I'll do is access our software on MiniKF; we call it Rok. And I'll take a snapshot of this notebook; I'll essentially create a starting point for a pipeline from this notebook. So I log in. This is an empty bucket; we organize data sets in buckets. I'll bring in the full JupyterLab, my notebook server; it auto-completes that this is the one I'm currently running. And at this point I'm essentially creating a data commit; this is why it looks like one. I'll be committing my seed commit for a Kubeflow pipeline. The message will be: create three files, seed the pipeline. At this point this creates an asynchronous task (my notebook is still running) that snapshots my notebook, so Rok can be creating multiple point-in-time snapshots of my notebook as I work, and I can then use these snapshots as starting points for pipelines. So this is the task; it's almost done. Everything happens thinly, so if I take another snapshot after a few seconds, it will take ten seconds. It's done. And now my snapshot is here: I snapshotted the Jupyter server, my data volume, and my workspace volume. I'll keep the URL to the snapshot of my data volume, because this is what I'll spin a Kubeflow pipeline from. I copied the URL here.

So now let's see this pipeline. This is another Kubeflow API: Pipelines. This is the definition of my pipeline, in Python. Kubeflow Pipelines defines a Python-based domain-specific language to define your machine learning pipeline, so my pipeline is a Python function. It starts from a Rok URL, the one I just copied, and it says: create a new volume as a clone of the snapshot in this parameter. The first step mounts this volume as /data and concatenates all the files into a gzip file; so step one gzips everything into full.gz. Then it snapshots the results of step one, so I can retrace what step one did, and seamlessly clones this snapshot into volume two. This happens thinly: if this were a 100-gigabyte volume, I wouldn't have to copy 100 gigabytes. Step two unzips the data, full.gz, creating /data/full (that's what gunzip will do), using volume two as its /data; then it takes another snapshot. Step three starts from a clone of that snapshot and just shows me what's in /data/full. What do you expect /data/full to be? The concatenation of all three files.

Okay. So I'll compile this pipeline using the Kubeflow Pipelines compiler. Compiling the pipeline produces an Argo workflow, which you then submit to Kubeflow Pipelines. So this is the compiled result: a big YAML file that you don't really have to look at. You write nice, idiomatic Python, but this is the end result, a big YAML file. And then I can log into Kubeflow Pipelines, which is here, and I'll upload a new pipeline, the one I just compiled: example five.
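A rough reconstruction of that pipeline in the Kubeflow Pipelines DSL of that era, using KFP's VolumeOp and VolumeSnapshotOp; the image, sizes, and the Rok annotation key are illustrative assumptions, not taken verbatim from the talk:

    import kfp.dsl as dsl
    import kfp.compiler as compiler

    @dsl.pipeline(name="example5", description="gzip / gunzip / cat over cloned volumes")
    def example5(rok_url):
        # Clone the seed snapshot into volume 1 (annotation key is an assumption)
        vol1 = dsl.VolumeOp(name="create-volume", resource_name="vol1", size="1Gi",
                            annotations={"rok/origin": rok_url},
                            modes=dsl.VOLUME_MODE_RWM)

        # Step 1: concatenate every file into /data/full.gz
        step1 = dsl.ContainerOp(name="gzip", image="library/bash:4.4.23",
                                command=["sh", "-c"],
                                arguments=["cat /data/file* | gzip -c > /data/full.gz"],
                                pvolumes={"/data": vol1.volume})

        # Snapshot step 1's output, then thinly clone it into volume 2
        snap1 = dsl.VolumeSnapshotOp(name="snap1", resource_name="snap1",
                                     volume=step1.pvolume)
        vol2 = dsl.VolumeOp(name="clone1", resource_name="vol2", size="1Gi",
                            data_source=snap1.snapshot)

        # Step 2: unzip /data/full.gz into /data/full on volume 2
        step2 = dsl.ContainerOp(name="gunzip", image="library/bash:4.4.23",
                                command=["sh", "-c"],
                                arguments=["gunzip -c /data/full.gz > /data/full"],
                                pvolumes={"/data": vol2.volume})

        # Snapshot step 2's output and clone it again for step 3
        snap2 = dsl.VolumeSnapshotOp(name="snap2", resource_name="snap2",
                                     volume=step2.pvolume)
        vol3 = dsl.VolumeOp(name="clone2", resource_name="vol3", size="1Gi",
                            data_source=snap2.snapshot)

        # Step 3: print the concatenation of the three files
        dsl.ContainerOp(name="cat", image="library/bash:4.4.23",
                        command=["sh", "-c"], arguments=["cat /data/full"],
                        pvolumes={"/data": vol3.volume})

    if __name__ == "__main__":
        # Produces the "big YAML file" (an Argo workflow) to upload to the Pipelines UI
        compiler.Compiler().compile(example5, "example5.tar.gz")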
Do I have to create a new experiment? I don't remember... let me create a new run. Yes, I do have to create a new experiment first, so I'll create one and call it experiment one. And now I'll create a new run of this pipeline. The run name is going to be run five, and the pipeline needs a Rok URL to start from; it needs the seed snapshot. So I'll go into Rok, get that link, and give it to the pipeline. Why is this important? Because I can run multiple pipelines in parallel, with different parameters, from the same snapshot. Why do that? Because as an ML engineer, I may want to run the same model, from the same input data, with ten different values for a hyperparameter. This is how I do it programmatically: I give it the same input data, a different value for each parameter, and run the same pipeline ten times. Kubeflow has components that do this automatically, and we are extending those components to work over persistent volumes.

So I'm creating the pipeline run, and what this will do is show me the pipeline graph being created. Initially we create a volume (this pink-red color), and the volume has been created; this is a persistent volume on Kubernetes, managed by our software. And now step one is running. Okay, let me refresh; step one is still running. Step one ran, and snapshot one is being created, again as a Kubernetes resource. And because this is all Kubernetes, I can go into MiniKF and ask: give me a list of persistent volumes. This one, this volume, is volume one; it belongs to this run of this pipeline, it's bound, and it's managed by Rok. Okay, give me a list of snapshots: this is the first snapshot, being created. Okay, give me another list of volumes: now there's a new volume. Why? Because the pipeline is running as we speak. So if I switch to the Pipelines UI and refresh: volume two was created, step two ran, snapshot two is being created, all automatically. Okay, give me a list of snapshots: snapshot two is there, 17 seconds ago. Okay, let me refresh; the UI doesn't refresh that well, but it will eventually. Snapshot two is done, volume three has been created, step three is running; same thing, another volume will have been created. So we orchestrate the creation of Kubernetes resources from pipelines. This is the easiest way to run a full Kubeflow pipeline. And finally, step three is done.

I can go into each step and look at its inputs and outputs, and I can look at the logs; the logs of step three are the concatenation of the three files, because the definition of step three was to cat the /data/full file. So this proves that all of the steps ran. But let's say that step two had a problem; step two misbehaved, and I'd like to see exactly what its output was, snapshot two, and maybe I also want to see what its input was, snapshot one. So what I'll do is destroy my notebook; I'll stop the server. JupyterHub only lets us run a single notebook here (running multiple notebooks is not very easy, but the new user interface in 0.5 will show multiple notebooks at once, so there I wouldn't have to actually destroy my notebook). So what I'll do is spawn a new notebook and attach snapshot one and snapshot two to it, so I can see what step two did. So I'm creating a new notebook. This notebook will be empty, but I'll attach an existing snapshot, which will be step one's. I go here, I see that Rok has picked up the two snapshots, so I'll attach snapshot one as a mount point, and I'll also attach snapshot two as another.
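As an aside, the run-it-ten-times idea from a minute ago could look roughly like this with the KFP Python client; the client setup, the placeholder URL, and the parameter key are assumptions (the key must match the compiled pipeline's parameter name, which may be sanitized):

    import kfp

    # Assumes a reachable Kubeflow Pipelines endpoint (e.g. after a port-forward)
    client = kfp.Client()
    exp = client.create_experiment("experiment-one")

    seed_url = "<rok snapshot URL copied from the Rok UI>"
    for i in range(10):
        # Same seed snapshot every time; in a real sweep you would also pass
        # a different hyperparameter value per run
        client.run_pipeline(exp.id, "run-%d" % i, "example5.tar.gz",
                            params={"rok-url": seed_url})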
This is the old JupyterLab, and it's dead, of course, but you close it. And I'll attach snapshot two... did I copy it correctly? I don't remember; let me try it again. I'll attach snapshot two here. Okay.

So I'm creating a notebook to explore the execution of the pipeline; this gives full insight into how the pipeline ran. Creating the notebook... again, underneath, I can see that my notebook is being created as a Kubernetes pod. We'll give it 20 or 30 seconds, and within 30 seconds I'll be able to explore the results of the pipeline, results that are immutable and reproducible because they are snapshots. Okay, it's running, within 28 seconds. This is fast; this is really fast. I'll switch to the user interface; JupyterHub is not that fast, on the other hand. Okay, and what do we have here? We have a new notebook that already has snapshot one and snapshot two mounted. Snapshot one has contents file one, file two, file three, and full.gz; no gunzipped file, because step two hadn't run yet: this is the input of step two. Snapshot two has file one, file two, file three, full.gz, and the unzipped file. Why? Because this is after the execution of step two. So I have full insight into my pipeline: I can go and see exactly what changed before and after the execution of step two, which is super important when you're debugging a pipeline. So this is snapshot one, and this is snapshot two, my own local clones. This is it.

So, what we did: we got Kubeflow running locally within 10 minutes; we used Kubeflow to spin up our notebook within 30 seconds; we compiled a pipeline that uses local volumes for data exchange; we ran this pipeline starting from a snapshot of a notebook; and then, when the pipeline misbehaved or broke, we were able to attach a new notebook to clones of those snapshots for full introspection. That's it. Any questions? Yes, please; let me bring you the mic.

Thank you for the demo, it was very impressive, actually. Is there a place where we could find documentation about MiniKF? There is not much documentation on MiniKF itself, but there is lots of nicely written documentation for Kubeflow, and MiniKF is just a packaging of Kubeflow that's meant to run locally; we have tuned Kubeflow so it runs easily and fast on your laptop. What about the Vagrant commands that you shared on your first slide; are you going to share those somewhere? Yes, I can show all of this. There is a blog entry on the Kubeflow blog that describes MiniKF, and it points to our website, which is this one. The network is... well. You can find both where to get support (we have a channel on the Kubeflow Slack) and the installation guide: essentially you need quite some RAM, and then it's installing Vagrant and VirtualBox and running those two commands. That's it. So if you go to Kubeflow's blog, this is the blog; you should find the post, it's up, and it should point you to whatever you need. Got it, thank you. And I should have a final presentation slide for this as well, so let me... oh, I missed it, sorry. This is it: we're at channel #minikf on the Kubeflow Slack, and if you go to the Kubeflow site you'll find the Kubeflow blog and everything. Any other questions? Thank you.