Hi — okay, this is working great. So, I'm Miloš, and this is my colleague Christopher. We'll talk to you about how we introduced Kubeflow and really drastically changed how the data science process is done at PepsiCo.

So, why Kubeflow? It's a good question. We already had a Kubernetes system, we already had the infrastructure team to do all the stuff related to that and maintain it, and Kubeflow was really the logical thing to add — especially when you think about all the add-ons it has, like ways you can easily serve models. That's typically a big problem: I've witnessed many times that projects just failed because there was no way to put them into production. So that's one very important thing for us. The second thing is definitely hyperparameter tuning. We all know that for good models we need to do proper hyperparameter tuning, and we really don't want our data scientists to just run something on their own machine and then wait for two days — a total waste of time. And of course, training models and all that stuff on the infrastructure. So practically, Kubeflow helps us with all of that.

So how did we organize ourselves? Practically, let's say, in three groups. One group is the infrastructure people. As I said, they manage the Azure cloud, handle all the maintenance, do all the stuff around permissions, onboarding people onto the platform, and that kind of thing — and also scream at us about costs. The second group is us. First we were called MLOps, but we quickly changed the name to MLE, because we practically just specialize in helping data scientists use Kubeflow, and the whole platform around it, so they can do their job properly. And the whole idea of this story is that we want data scientists to improve models. That's what their skills are, and that's where they can create the biggest benefit for the company. We don't want them to, I don't know, write the same code over and over again — code to extract data or something.
No — we will provide all of that. We just want them to focus on the actual models, on improving models. And yeah, the third group is the data scientists — though it's not just data scientists; there are other teams across the company that are already using it. The whole idea, of course, is to build cool stuff, but that is unfortunately much easier said than done. But okay, let's go on and see what the original problem was.

Approximately 15 months ago we started this whole effort. I'll just try to explain how projects were done before. You'd have business owners sending a request: hey, we want some predictions of whatever. Then a data scientist, working on his own machine, would get some dump of data, create something locally, and after some time there was a model — a model that only works on his own machine, with no chance you could replicate it. Working totally in isolation; nobody knows what he was actually doing. So it was really problematic. And then when they wanted new predictions — this is really funny — they would literally send him an email with the new data. He gets the data and runs the model again. But think about it: maybe he's already on some other project and has to drop it and go back. Then he runs the data and sends the email back. Really, really not efficient — even funny — and, as I said, almost impossible to put into production or automate: problems with libraries, et cetera. So it's a total mess.

And yes, as I said, collaboration was zero — everybody working in isolation. They're all members of the same team, but working on different projects, and they don't even know who is doing what. Everything was so isolated. And of course, as I said, production was a very painful thing.
It's also a big problem because, okay, they've got the model — now what? It's a huge problem, how to put something into production. Practically, data scientists were just left alone to handle all these things by themselves, and of course it was really not efficient; it was not working very well. And, as I said, the wheel was constantly being reinvented. For example, I was going through different repositories, and over and over again they were creating code to connect to Snowflake — sometimes better, sometimes worse, but always the same stuff. They'd spend two or three days testing everything, and then maybe they're proud: oh yeah, this is working. But that's just three wasted days that they should have spent focusing on the model itself.

So the whole process of introducing Kubeflow — we were building the plane, flying it, and learning how to fly, all at the same time. We were learning how to use Kubeflow, the infrastructure team was learning how to organize and maintain everything, and data scientists were practically being forced to shift out of their cozy little space — where they're just using, I don't know, a Jupyter notebook, creating some nice little models — and now they have to understand the whole concept of Kubeflow, how to compile pipelines and all of that, and learn Kubernetes, because you cannot run from it. So it was really painful in the beginning — a lot of trial and error. We only had one little repository with some really poor documentation — I wrote that really bad documentation. And the pipelines were complex. You know, when you go and look at example Kubeflow pipelines, those are all cozy little examples: wow, this is nice — read data, do some cleaning, create a model, start a KServe server.
Wow, great, it's working. But we started with parallel loops inside parallel loops, and in that inner parallel loop there was hyperparameter tuning, also done in parallel. So it was totally crazy — but that was the actual real-life problem they had, so we started with totally crazy pipelines. And of course, how the hyperparameter tuning is implemented was very complex, and all that stuff — problems with authentication, and so on. All these things were a really big problem for our data scientists. We also had to be aware that people don't want to change. I was kind of afraid there would be some kind of obstruction — oh, this is not working, this is bad, we just want to keep things the way they were before. So we really needed time, and we needed to invest time and effort to make it as simple as possible for them to use, so there were no excuses not to use it.

The first thing — this was Chris's idea, and I think it's a great one — was to create a monorepo. Enough of everybody having their own little repositories; everything goes into one big repository. It's very good for collaboration and knowledge sharing. That was a really good idea. Then we built the Prometej CLI. It's called Prometej because it brings light to people — practically giving data scientists the power to use Kubeflow more easily. It's built on the KFP SDK, and we'll talk about it in detail a little bit later. There are also some other Greek-mythology-named parts of this whole ecosystem, or ML platform — like the component library and the common pipeline repo. So generally, the whole idea was to give them all the tools necessary to make it as easy as possible, and just let them focus on the things that matter. So let's quickly go over this monorepo.
There are some pros and cons for your organization and your company, whether you want to use one or not, but the idea is to enhance code visibility, integration, and collaboration — plus dependency management and tooling standards. A lot of things that I think definitely improve the business process and how things are done. And also, now they're not repeating the same stuff all the time. They can see a pull request on something and say: this is that project, this is something different — but still there are a lot of common things they can just reuse, and they're also learning from each other. So that's a really, really good thing. And here's the joke — because at one point I realized I was just merging pull requests, never-ending — and of course, one pull request to cripple them all. So that's something you have to be aware of: some kind of hygiene is needed. You don't want to end up with 500 branches and that kind of stuff. So some rules should be followed — but I think it's a very, very good thing.

The main and most important thing we built was the CLI. Practically, it's a command-line tool which tries to hide the underlying Kubernetes things from users — in our case, data scientists. If they want to learn, no problem; actually, they're really welcome to learn about Kubernetes and all those things. I'm just afraid they don't have enough time to do that — and, as I keep repeating, maybe they should focus on improving models. So this tool is practically just hiding all those things; they're all done in the background. Also authentication.
So, we started with authentication where you practically had to go to the user interface — the Kubeflow UI, in Chrome — get the cookie, copy-paste it, expose it as an environment variable, and then run your pipelines. It was kind of a pain in the neck. Now that's done in just one line; everything is automated, and we'll show you a little demo of it. Also, recurring runs are now implemented with a decorator: they just use the cron syntax, run one command, and it will create all the stuff they need. Hyperparameter tuning is also improved: they just define it in Python dictionaries and run it. We already have all the tools to build images for them and push them to the right place in ACR, the Azure Container Registry. So all these things are automated; they don't have to think about them. Also, deploying to KServe is, for them, practically just having one method, predict, and a decorator where they can define where they want it to run — and that's it. Really, really convenient, because I was doing all that stuff and it was hard for me, and it would be especially hard for them to do everything that's currently involved in Kubeflow. Also, the whole business of using GPUs and large-memory machines is now simplified, so it's very easy for them. And again, we always have a kind of friendly argument with the infrastructure team: they're always trying to lower the costs, and we always want to get as many resources as possible.

So this is kind of what the CLI looks like. This is one example: authenticating, and then all the things that are automatically logged to either staging or production, and you get all the stuff. It's really easy for totally new users, because we want to onboard a huge number of people from different teams — it's a big company — and it
must be very easy for people to use. Okay, I'll give my microphone to my colleague for the next part. Thanks.

So, first of all, we named it Prometheus because, yeah, like Miloš said, we've got to bring fire to the masses. And five seconds later we thought: well, Prometheus is already a very popular Kubernetes tool — maybe we should name it something else. And I said, it's not like we're ever going to be on stage at a Kubernetes conference, so we can name it whatever we want. We ended up naming it Prometej, because that's the Serbian version of it. So if there's any confusion: it's not actually the Prometheus.

On top of that, to give you more context on what Miloš was alluding to: a year ago, I was on the data science team, and I hated Kubeflow. It was this thing that MLOps was setting up, and I thought it was so clunky — I'd used Prefect and all kinds of other tools, and I figured I'd just do things myself. The person who was the original head of MLOps was very much an infrastructure person, and she did an amazing job of actually setting it up. She left last March, last April, and they came to me and said: hey, you should take over that job. And I said: well, I hate Kubeflow — I'm the guy that's always trying to sneak around it. And they said: that's exactly why you have to be the guy. You have to make it an actual tool that people like you — data scientists — not only tolerate but really enjoy. So we've been kind of obsessed, thinking: okay, we've done a lot to get it running from the ground up, as infrastructure, as a robust tool, but there's a lot missing from the top down — from the data science developer experience — to actually get things running. So: a ton of work taking stuff that's already in Kubeflow.
That's already really nice — the pipeline and component decorators, the SDK — and we're just wrapping extra goodies in there. So: what are the kinds of things we want to add to the Kubeflow compiler? And we injected them.

Once we got that up and running — like Miloš said — and this was not so much a Kubeflow issue, but we saw Kubeflow as an opportunity: a place where people could run serious jobs and share their work and notebooks in a way they weren't able to before. And once we started asking: okay, what is the universe of things we're building? All these data scientists across multiple teams are building a lot of time series forecasting, right? We always want to know how many Mountain Dews we're going to sell tomorrow. We do recommendations, we do marketing modeling, and various visualizations and EDA. So what are the kinds of tools we can move up that abstraction chain — a lot of which was already there? One thing I keep trying to point out to people, especially in the last month: we can put Plotly plots directly in pipelines, as metrics, as outputs — save the HTML — and people thought we'd done something really impressive, but really, Kubeflow can output that pretty nicely on its own. So — do we go to the next slide?
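As a hedged illustration of the Plotly-in-pipelines trick mentioned above: the Kubeflow Pipelines v1 UI renders rich outputs from a file named `mlpipeline-ui-metadata.json`, so a component can surface a Plotly figure by writing the string from `fig.to_html()` as an inline `web-app` entry. The helper name and default path here are illustrative, not part of the tooling described in the talk:

```python
import json

# Minimal sketch: expose an HTML report (e.g. a Plotly figure exported with
# fig.to_html()) in the Kubeflow Pipelines UI. The v1 UI looks for
# mlpipeline-ui-metadata.json among a step's outputs; an inline "web-app"
# entry renders the HTML directly in the run's artifacts view.
def write_ui_metadata(html: str, path: str = "/mlpipeline-ui-metadata.json") -> dict:
    metadata = {
        "outputs": [
            {"type": "web-app", "storage": "inline", "source": html}
        ]
    }
    with open(path, "w") as f:
        json.dump(metadata, f)
    return metadata
```

Inside a component you would call this at the end of the step; the pipeline UI picks the file up automatically, which is why no extra wiring was needed to make it "look impressive."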
So this is an example of what we were talking about before. We went into all these different repos — we use Snowflake for basically all our data — and everybody had their own notebook on some random branch of some random project. And they'd have one file called utils, with 200 lines to connect to Snowflake, and you're like: okay, I don't know why you're doing exactly this — fine. You'd find these all over the place: everybody's got their own 100 or 200 lines to read some secrets or read a file. And we said: no, no, no — come to Kubeflow. You'll get a better machine anyway if you really need to scale out, and we'll make it so you add your SQL query and everything else is basically defined for you. You'll see here we actually define different clouds — one of the big things, because when I took over a year ago, they said: oh, by the way, we're also switching from AWS to Azure. And I thought: okay, this is going to be how I get fired, because it's not going to work. A lot of tools talk about being cloud agnostic and making that switch easy; this actually was a pretty easy switch. I mean, there were some permissions we had to change here and there, but the lift from Kubeflow on AWS to Kubeflow on Azure was pretty much pain-free. Total aside — but that gives you an idea of the kind of stuff: common components, defined in a single line. Do we go to the next one?

And here's another problem that data scientists had been handling on their own, in a ton of different ways: data drift. So we said: okay, everyone who's interested in doing data drift — let's get all your code together, let's sit down, let's figure out what the tools are. What is a common component that everyone can use?
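To make the "200 lines of utils" point concrete, here is a minimal sketch of what such a shared Snowflake component could centralize. The function name, environment-variable names, and role names are assumptions for illustration — not PepsiCo's actual library:

```python
import os

# Hypothetical sketch of a "common component" helper: one place that turns
# environment secrets into Snowflake connection parameters, instead of every
# notebook carrying its own 200-line utils file. All names here are
# illustrative, not the real Prometej API.
def build_snowflake_config(environment: str = "dev") -> dict:
    required = ["SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD"]
    missing = [v for v in required if v not in os.environ]
    if missing:
        raise RuntimeError(f"Missing secrets: {missing}")
    return {
        "account": os.environ["SNOWFLAKE_ACCOUNT"],
        "user": os.environ["SNOWFLAKE_USER"],
        "password": os.environ["SNOWFLAKE_PASSWORD"],
        # Per-environment settings (role, warehouse, etc.) are declared once
        # here rather than re-discovered in every project.
        "role": "ANALYST_DEV" if environment == "dev" else "ANALYST_PROD",
    }

# The real component would hand this to snowflake.connector.connect(**cfg)
# and run the user's SQL query -- the only part the data scientist writes.
```

The point of the sketch is the division of labor: secrets handling and environment differences live in one reviewed place, and the user contributes only the query.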
You know, you can just import from the main set of components and plug these common components right into your pipeline. We want to make the easy things easy and the hard things possible. So of course, it's always trivial to say: I want to start from the most generic Kubeflow component, add my packages, and build my own thing — you can certainly always do that. But we're trying to build more and more, where it makes sense, up that abstraction chain. Cool.

So yeah, obviously a big part of this was the culture shift. Our data scientists would hear the word Kubeflow and it was this thing that someone was going to make them use. We really needed to make them accountable — you have to use Kubeflow — and in turn we had to make it easy for them to use. Do you want to go to the next slide?

Yeah, so: we're super fanatical about our users. And again, as Miloš alluded to, more and more we've been taking on people that are interested — data engineers who sit on projects where there's not enough of a reason to bring in an entire data science team, but they need some sort of analysis, some sort of model, and they're happy to pull open scikit-learn, or even PyTorch, and build something simple to get a quick analysis out. These are people that use Airflow and so on. So we've been trying to make onboarding work for basically anybody who knows Python — you don't even have to be a data scientist, but obviously you want to use some models. How do we get you, knowing Python, to use it? Is Terry the next one?
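The shared data-drift component discussed a moment ago isn't shown in the talk; as one hedged sketch of what such a component might compute, here is a population stability index (PSI) between a baseline and a current sample. The function name, bin count, and epsilon are illustrative choices, not the team's actual implementation:

```python
import math

# Hypothetical sketch of a shared data-drift check: a population stability
# index (PSI) over equal-width bins fitted to the baseline sample.
def population_stability_index(baseline, current, bins=10):
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor at a small epsilon so the log below stays defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    p = bin_fractions(baseline)
    q = bin_fractions(current)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
```

Identical samples give a PSI of zero; a common rule of thumb treats values above roughly 0.2 as meaningful drift. Wrapped as a pipeline component with "this is the input, this is the output," this is the kind of thing every team can reuse instead of reinventing.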
Okay, so — you'll see my giant bald head in there. We've taken kind of a community, open-source approach to the whole project. We're making tons of tutorials and how-tos and demos; we have this big playlist internally. We bring people on and we make them write code — we'll say: okay, we're going to film you writing a pipeline for the first time, even though you have no idea what Kubeflow is. Some people were amenable to that, some were not. So we've got one — in fact, a kind of testimonial from one of our data engineers:

"I worked with the data science team in the past and remember being extremely intimidated by some of the tasks I successfully completed. It's never been my strong suit, and I still struggle to this day. Now, I'm far from an expert, and I'm fairly new to both machine learning and Kubeflow. Yet I was able to create a pipeline with just a few lines of code using Prometej. Prometej removes the guesswork, making machine learning more accessible for developers at any level."

Okay — I told her not to make it sound like such an ad, but... yes. So, Terry is a data engineer, and she's on this project for Kroger, which is a big retailer in the US. It was a project where the data was kind of small and not so good, but they wanted some kind of small models, and Terry said: you know, I'll take a crack at it. So she got up and running — we walked her through the general UI of Kubeflow and then our tooling — and yeah, probably 20 or 30 lines of code later,
she had, I think, a random forest or something, and people seemed to be pretty happy with the results. So that's something we're really trying to do as we go. Obviously, the kind of data scientists that come to us and say, hey, I need three GPUs and TensorBoard and all the goodies — you'll get that. Or: hey, I don't even really have a data science or ML problem, but I want to look at some visuals really quickly, and Airflow doesn't do that very well for us — this and that. We want this to be a more general place for people to experiment. So, at that point: questions, I guess.

Hello. Yes, my first question — well, no, I'd have several. Did you need to integrate this authenticator that you talked about at the beginning — the one-click authenticator for Kubeflow — with anything other than Dex? I know Dex is the authenticator that comes with the Kubeflow installation, and we've had some problems integrating it with Keycloak and other things we have in the company. So that's the question. Thank you.

Yeah, the one-click authenticator — right now it's a bit of a hack, but it's growing into proper software; it's several try/except/if/else stacks. Eventually, I think, we're going to try to get onto SSO — that's definitely going to be easier said than done — but yeah, for Dex, that's where our authentication goes. So the workflow is basically: get onto our VPN, go to the Kubeflow UI, Dex pops up, you click GitHub, you authenticate through GitHub — and you have to go through SSO through your GitHub.
You're authorized, and now — if you're lucky and you remember what you even came there for — you're in Kubeflow. If you do that, the GitHub authentication and the SSO will stick around for a couple of days at least, so you don't have to do it every time. But it's still painful enough that people will just not bother: it takes five minutes just to get set up and click through the UI, then you've got to go get a cookie, put that cookie in some environment variable, and then — it's a whole thing. And then people want to switch between environments, and you've got to do the whole thing again to move to a different environment. So the one-click is basically: we took a Chrome driver and it just does all of that for you, stepping through the process. And if you need to log in, it's saved in Chrome and everything, so you just click a button if you have to. Like I said, it's a little bit of a hack; we're working on making it a little more robust. But people have been using it — and I've honestly been using Kubeflow a lot more just because it's: oh yeah, I'll go to the UI, I'll click the same way you do with the GitHub auth login, it pops open, and you're in.

Hi, thanks for your talk — I think it's really interesting, and I see a lot of common patterns. So maybe can you give us a bit more details on
why you chose to build your own component library, and what this tool actually does aside from the authentication — pushing pipelines, or... a bit more detail on those?

So, regarding that — as an example, you have something that keeps repeating over and over: they need to connect to Snowflake, because our data is in Snowflake. As I said during the presentation, there's no need for them to go and write the same code again and again. Also, these components are, let's say, pre-built: they already handle all the secrets and all that stuff. Otherwise, before, you have to authenticate, so you need to attach some secrets; then sometimes the names in the secret files are not exactly the same, so you have to change them — and there are a lot of those things that are just a waste of time for users, and confusing, especially if they don't have the background. So all those things are hidden. And also, if you want to do, say — we were talking about data drift — no need to reinvent the wheel: just use this component, this is the input, this is the output, and include it. So practically, building this is an ongoing process; we're still expanding from project to project. Imagine you're doing, let's say, time series forecasting for specific products for one retailer, and then another group of people is doing a comparable thing for another retailer. They would just waste their time — whereas if they have the components, they can reuse them here and there, and they don't have to think about all the underlying Kubernetes things.

Yes, thank you. So, can you give us an example of other types of components, other than data loaders?
I think a good one is — we didn't really advertise it a ton here, just because I didn't want to show off tons of SQL, I guess. One big thing we have — and a lot of this, again, is that some of these are cultural problems that we're not necessarily trying to solve with technical solutions, but we think we're making it easier to have those conversations — is sourcing data. When we started working with data engineers more, we brought them in and said: can you look at some of the ways we source data? And they would often say: well, that's not even the right table — we switched that months ago — and this is a totally unoptimized query, et cetera, et cetera. So we started making a common library of data sources, right? Because something like Amazon sales is the kind of thing we're going to want to use in a thousand projects. So do that kind of thing once, and then that component is now kind of declarative, because this particular query in a dev environment has certain things that need to happen — different roles, different this and that — and you can declare it all in one component. Then we'll know: okay, we're in the environment that needs this kind of setup — adding pod annotations, adding kind of Kubernetes-level stuff for monitoring and tracking. Because we don't want people to say: okay, go add a pod annotation and change this thing and this v1 selector, whatever. So the data library is a big one. And then one that Miloš spent a ton of time on was the hyperparameter tuning — making that as simple as basically one decorator to set up, across different modeling types and that sort of thing.

Thank you very much.