All right, so today we're here to talk about how we scale open source machine learning and how we use Kubernetes to deliver food to millions of people. I'm Steven, I'm a machine learning engineer at Wolt, and I'm here with Ed, who is Head of Developer Relations at a company called Seldon.

Maybe for the people that don't know Wolt: we started as a food delivery company in Helsinki in 2016, and we are now in 23 countries, going from Norway to Japan, with millions of users. That's the boring part. The fun part is that we have machine learning, with a lot of different use cases. The first one is supply and demand forecasting. The second one is recommender systems. Then we have logistics optimization, fraud detection, and the one with the longest title: situation monitoring and inbox prioritization.

With that many machine learning use cases we have a lot of different needs, and we have to address them. The first and biggest need when you train machine learning models is data access. How do you access your data, and how do you make sure that access is secure but also simple? We don't want our data scientists struggling for days to get access to this table or that database, but we also don't want them to have access to every database and every table we have. So: simple and yet secure.

Then there's infrastructure. As has been said, scaling is quite hard, and you usually need infrastructure to scale. When you train a model you need compute resources: CPU, memory, and sometimes GPUs, whether that's one GPU, five, or even more. I'm a big believer that if you need something, it should be easy to get. If you need one GPU, you should just say "I want one GPU" and not have to think about all the drivers and everything else. So we have a whole infrastructure for that.

One part that is very important is fast deployment. Machine learning is very iterative, and if deployment is slow for your data scientists, the whole process becomes slow. Instead of taking three months to get a model to production, you take six months, a year, or you never deploy it at all. We all know of models that someone trained, was very happy with, and then never deployed to production. That's why fast deployment is so important, and why we have a lot of CI/CD pipelines and a lot of templates.

The last one is standardized monitoring. Our data scientists train a model and they're very happy.
They deploy it, but I don't want them to then have to go into Grafana and create their own dashboards, because most of them don't really care about Grafana and don't want to build dashboards. The same goes for logging: when you deploy a model you're going to have logs, and we want the basics to be the same for everyone. Things like CPU usage and memory usage shouldn't be something they have to think about; we should just provide them.

So those are the different needs that we have. Maybe a quick reminder of the typical life cycle of a machine learning project. You have an idea, or usually a problem, so you collect some data and then you create a first model. Usually the first model is terrible. You evaluate it, you create a new model, you collect more data, and so on; you have a whole loop here. After a bit of time you're like, okay, I'm pretty happy with my model, it looks decent, so you deploy it to production.

You have your first model deployed and you're very happy, but then you need to monitor it. If you're doing online inference and you don't know, for example, the latency of your model, that's pretty bad. The same goes for the quality of your model. You trained it on data from the past, and new data keeps coming in, so you need to make sure the model keeps performing: maybe you deployed something and the data is now very different from what you trained on. You really have to be careful with that. And then you iterate again: back to the problem, back to collecting data, prototyping models, and deploying again. That loop is the thing we really wanted to focus on: making iterative work quick and easy.
That's our main thing. And we had different challenges when we started the whole ML platform.

The first one is that we only had projects that had been running for a long time at Wolt. We've been using machine learning for, I think, five years, but only a couple of projects were actually running, and it was very hard to create a new one. We have data scientists with amazing ideas, but starting a new project was sometimes very hard: you want to work on something, but you don't have access to the data, or you don't know exactly how to deploy anything. That was quite hard.

The second one is that we had a lot of different tools. We were using Airflow, we were using cron jobs to train models, also AWS SageMaker, and sometimes custom code that someone wrote three years ago and that no one knows exactly what it's doing. That was a big problem for us.

The last one is really the impact. Because we didn't know exactly what was running, or whether it was running well, it was also very hard to monitor the impact of our models: is my model actually good, does the food arrive on time or not when I order? Those models are very important for us, and we couldn't really measure them. One thing people don't often think about is that we had a disconnect between machine learning metrics and business metrics. You arrive at a meeting as a data scientist, very happy because your mean squared error is lower, but we didn't know exactly what that means for the business. So it was a big struggle, and now when you start a new project you define both data science metrics and business metrics, and you optimize for both. Then meetings become a bit easier: my error is lower, therefore the business or the retention should improve. Those were the main challenges we had.

And, if my clicker works, yes: what do we want exactly from the machine learning platform? The first thing is easy to launch and iterate; I've said that a couple of times already, but that's the main thing. Then it should really be a driving force for new endeavors: you create a new project and you know there's a whole ML platform behind you, it's pretty easy to train a model, pretty easy to deploy it, and you get monitoring and logging, so you're confident you can actually take it end to end. And focus on platform velocity: that's a big thing as well.

Then tooling and common best practices. We write a lot of tests for data scientists so they can just use them, and we have different tooling, like an in-house tool that turns YAML into Terraform. When our data scientists want access to some bucket on S3, instead of writing all the Terraform themselves they just write a bit of YAML describing the S3 bucket, and the tool generates the whole Terraform without them having to do anything. That's pretty handy, because they didn't sign up to write Terraform. I didn't sign up to write Terraform either, so I'm pretty happy I don't have to. A sketch of what that YAML might look like is below.
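To make that concrete, here is a purely hypothetical sketch of the kind of YAML a data scientist might write. The tool itself is internal to Wolt, so every field name here is invented for illustration; only the idea (declare a bucket and its access in YAML, get Terraform generated for you) comes from the talk:

```yaml
# Hypothetical input to the in-house YAML-to-Terraform tool.
# A data scientist declares what they need; the generated Terraform
# (provider config, bucket resource, IAM policies) is produced for them.
resources:
  s3_buckets:
    - name: food-classifier-models   # bucket for trained model artifacts
      read_access:
        - ml-training-jobs           # service accounts that may read
      write_access:
        - ml-training-jobs           # service accounts that may write
```

The point of the design is that the data scientist never touches the generated Terraform at all; they only maintain this small declaration.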
And then we also have different templates. When you train a new model with the tooling we use (I'll talk about it later), you need to write a Dockerfile, and there are things you might not want to do, like running your container as root, plus other security-related details. So we have templates: people take the template, replace the placeholder with their own code, and most of the work is already done. They don't have to think about those things, and the result is deployed in a secure way, so to speak. I'll show a sketch of what such a template might look like right after this section.

We also have a lot of automation, which is very important. You work on your feature branch, you merge it to the main branch, and a lot of CI/CD runs, including automatic deployment on merge. I'll come back to that a bit later.

And the impact: we want logging and continuous monitoring by default for everything, for the model you're training and the model you're deploying, without the data scientists having to do anything. And we want to make machine learning a core business component. It's been used for years, but we're still not there, and I think if we can build a really good ML platform, we can make it a core business component.

So how did we start? We started by trying to create value very quickly for our data scientists. Data scientists are our customers. I guess most of you work with customers: you have to create value for them so they're happy, and it's the same for us. That means fast iteration with automatic monitoring, and, for example, being able to run a model in shadow mode. You have a model running in production and you want to run a new one in shadow mode, so you can make sure it's actually good and its latency is not too high compared to the one in production. We can do that easily now. And rolling updates: you deploy your model, and if it's broken you don't update everything at once; you get everything that Kubernetes gives you with rolling updates.

It also had to be easy to deploy on Kubernetes, because when I joined the company, a year and a half ago, there were already 14 data scientists and I was the only platform engineer. You have to earn the trust of the data scientists and the different stakeholders. I didn't want to arrive and expect everyone to trust me; I had no background in building an ML platform. To earn that trust we had to ship something on Kubernetes quickly.

And the last one: we have a motto at Wolt, which is to focus. So we also wanted to focus on only one component, meaning we didn't want to build the whole ML platform from the first month. We wanted one component that creates value quickly and has a quick impact, so that we don't spend six or eight months working on something that could be amazing but that nobody ends up needing.
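Here is the promised sketch: a minimal, hypothetical training-image Dockerfile of the kind such a template might provide. The base image, file names, and user name are all invented for illustration; the grounded part is the non-root pattern the talk calls out:

```dockerfile
FROM python:3.9-slim

# Create an unprivileged user so the training job never runs as root
RUN useradd --create-home trainer
WORKDIR /home/trainer

# Install dependencies as their own layer for better build caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# The data scientist's code goes here; this is the part they replace
COPY train.py .

USER trainer
ENTRYPOINT ["python", "train.py"]
```

A data scientist would typically only swap in their own train.py and requirements; the security-relevant parts ship with the template.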
Because then your data scientists will just say, "I'm not going to use it, I don't need it." We want quick feedback from them too: we expect quick iteration from them, and in return we want quick feedback, like "what you did is pretty cool, but improve this or that." So that's how we started.

And this is what it looks like today: we have a whole ML platform now. We have Flyte, which is what we use to train and orchestrate our different workflows. It runs on Kubernetes as well, it's fully distributed, fully scalable, and it runs on dedicated Kubernetes clusters that we have. Then we use MLflow for experiment tracking; it's very well known, and it lets you track metrics and other things related to your models. Then we have some Python services that provide automatic updating for your models and other things we need, including the YAML-to-Terraform tool I mentioned. And the last one is Seldon Core, which is what we started with. It's our deployment service: it basically lets you put models into production as microservices, with automatic logging, automatic monitoring, shadow mode, and A/B tests. So that's what we have now, and that's what we're going to build on in the future.

Go for it.

Great, thanks, Steven. So, a quick show of hands: who has heard of or used MLflow before? Okay, cool. For people joining online or watching the recording, that was maybe half the room, so I'll do a very quick recap, then show a little of what Seldon Core offers, and then we'll jump into a demo and show some stuff live. I'll try to go through things quickly.

MLflow is basically made up of four components. There's Projects, which lets you create a reproducible environment: you can define the model interface and so on, and set all of that up. There's the Tracking API, and this is the bit we'll use very heavily: it lets you track every experiment you run, every training run, the parameters that were used, the data, the versions of the configuration, and so on; you'll see some of that again when I show it in a second. It has its own model format, which is really good for making your models, particularly across different frameworks, reusable in the same way. And it has a Model Registry, which is a way of handling the life cycle of a machine learning model, particularly if you're using MLflow to track the training runs.

Okay, so what is Seldon Core? Steven mentioned it a bit already. It's an open source project that allows you to deploy and monitor models on Kubernetes.
That's kind of it at the highest level. If we dig a little deeper, what it actually does is create containerized microservices with a REST and gRPC interface. And digging in even further, it provides highly optimized inference servers that you can use to execute your model's code at runtime, when you're doing inference. On the right there is an example of a very simple SeldonDeployment CRD; I don't know if this pointer is going to work, but you can see, for example, this implementation key-value pair set to the scikit-learn server, and I've given it this modelUri, so it can go and pull my model down and deploy it on an optimized scikit-learn server. A sketch of that manifest, and of a more complex graph, appears just after this section.

One of the things I really like about Seldon is the ability to build complex inference graphs. We appreciate that not every model runs standalone: often you have input transformers and output transformers, or you might want to chain models together. I've given an example up here of two models running in tandem. If you look down the left-hand side, model two might be some sort of image classification where I do a transformation on the input beforehand and another on the output; we see a lot of NLP use cases like that, converting to word embeddings, running a model, and converting back to text on the output. And model one might just take standard tabular, numerical inputs. Then you can have things like combiners, which take the outputs of multiple models and merge them. So you can define almost the whole machine learning business logic within your deployment, which means your end users only have to hit one API to get all the results they're expecting. There's an example up here of how you build that up within the CRD.

A couple of other things: it integrates really nicely with everything in the Kubernetes stack. We use it a lot for request and response logging; we use Elasticsearch for that, but you can plug in different databases. You can do distributed tracing with Jaeger. It automatically scrapes metrics from your models and the containers underneath and passes them to a Prometheus server. We do stream processing integrations with Knative and Kafka, and you can integrate with batch workflow managers like Argo; we also have a CLI tool that lets you run batch processing workloads.

Okay, that's enough talking; I'm going to show you a bit of a demo. I have to squat a little here because I'm too tall for this lectern, so maybe I'll go for a power stance.

Just before I start, let me talk about the use case. Obviously I can't show Wolt's actual machine learning models and data live, so I've tried to pick something that's kind of similar.
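Before the demo, here is the sketch promised above. Since the slide isn't visible in a transcript, this is a hedged reconstruction of what such a minimal SeldonDeployment looks like in Seldon Core's v1 CRD; the deployment name is illustrative, and the model URI below is the public example bucket from the Seldon docs rather than anything from this demo:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: sklearn-example                         # illustrative name
spec:
  predictors:
  - name: default
    replicas: 1
    graph:
      name: classifier
      implementation: SKLEARN_SERVER            # prepackaged scikit-learn inference server
      modelUri: gs://seldon-models/sklearn/iris # Seldon pulls the model from here
```

And a rough sketch of how a chained graph builds up in the same CRD, with a transformer feeding a model through its children:

```yaml
    graph:
      name: input-transformer
      type: TRANSFORMER          # pre-processes each request
      children:
      - name: model-a
        type: MODEL
        implementation: SKLEARN_SERVER
        modelUri: gs://seldon-models/sklearn/iris
```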
So we're doing food classification with TensorFlow. It's basically a bunch of images taken at different restaurants on people's phones, so you'll see that the quality varies quite a lot. The motivation for doing this is that if you can classify every image you're using in a food ordering app and attach that metadata, you can build up a profile of a user: as I click on certain images and go "oh yeah, that looks nice", I can store that metadata and use it for recommender systems later on.

So let's dive into that. Can someone at the back tell me if you can see this? Do I need to make it a bit bigger? Thumbs up, I think. Bigger? Okay, cool, let's go bigger. All right, fine.

I'm going to get something training first; I'll make this one bigger too, presumably that's okay. This is again a simple convolutional neural net. The thing to highlight in this code, rather than going through it in detail, is this line here, which is really cool: if you import MLflow, you can just run one line, mlflow.keras.autolog(). MLflow has this autolog feature for a bunch of frameworks, and it will automatically track a bunch of parameters and metrics around your model, and store the model in one of those reusable MLflow model formats. That's enough to do maybe 80% of what you want. You can also manually log parameters on top of that, so I'm going to log batch size, epochs, dropout, and so on; there's a sketch of this pattern right after this section. Then I'll just run this. How many epochs did I set? Only three, so we're good; that should take a little bit of time. You can see examples of images that have been classified: some are good, some are a bit more ropey. This pizza image down here could be anything; maybe if you look closely you can see it's a pizza.

Cool, so we'll let that run, and I'll jump in and show you the MLflow UI. If I refresh this, you should see... hang on, is that running? Cool: it's already picked up my new training run even though it hasn't finished. If I jump back down you can see it's still training, but it has picked up some of those parameters I logged.

And if I open something from a previous run, the cool thing is I can compare metrics over time; all of this is automatically logged for me by MLflow tracking. If I look at, say, accuracy versus validation accuracy for this model, plotted by epoch rather than by time, then after about 30 epochs I'm just overfitting: the training accuracy keeps going up but the validation accuracy does not. So that's a good indicator that I probably don't need to train for that long, but that there are maybe some data enhancements I need to do.
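As an aside, here is a minimal self-contained sketch of the autolog pattern just described. The dataset and network are stand-ins so the snippet runs anywhere (the real demo trains a CNN on food photos), but the MLflow calls are the ones the demo relies on:

```python
import mlflow
import tensorflow as tf

# One line: auto-track params, per-epoch metrics, and the trained model
# artifact in the reusable MLflow model format.
mlflow.keras.autolog()

# Stand-in data and model so the sketch runs end to end.
(x, y), _ = tf.keras.datasets.mnist.load_data()
x = x[:2000].reshape(-1, 784).astype("float32") / 255.0
y = y[:2000]

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

with mlflow.start_run():
    # Manual logging on top of autolog, as in the demo.
    mlflow.log_param("dropout", 0.2)
    model.fit(x, y, batch_size=32, epochs=3, validation_split=0.2)
```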
I can also do cool things like comparing runs. If I grab two of them, compare them, and look at the accuracy or loss metrics, you can see that the one with the larger batch size converges on a reasonable solution much quicker, as we'd expect, because the model gets more information with each batch.

Cool, so let's jump back down here. My model has completed, and just quickly, first I'll show you that you get all the same information for the run I just completed. It has also saved a summary of the model, the actual model topology built in Keras. Now I'm going to copy this run ID and run a small Python script, giving it that run ID. It just saves me a bit of navigating around the file system: it copies that saved MLflow model into a MinIO storage bucket on my Kubernetes cluster. Once it's there, I can create a SeldonDeployment and deploy it straight on top of Kubernetes.

So let's have a look at what one of those looks like. For the sake of this demo there's one already running: my food classifier, which can already accept predictions. You can see some of the things I pointed out earlier, like this MLFLOW_SERVER implementation, because the model is in that MLflow model format, and this modelUri, which says where my model is stored so Seldon can go and grab it. And importantly, because that's in an S3 bucket, or in this case MinIO running on my cluster, I have to pass in a secret as well so it can actually access it. That's basically all you need; a hedged reconstruction of this manifest follows right after this section.

One of the cool things with Seldon Core is that you can customize a whole load of the CRD underneath, so if you want to write the full container spec, you can. In this case I actually needed to, because of what MLflow does when you deploy an MLflow model: it recreates the whole environment from scratch. I'll show you in the UI: for the model we just trained you can see these artifacts, there's this MLmodel file, and there's a conda.yaml with all my Python dependencies, which it uses to recreate the environment the model will run in. The reason I then need to set timeouts on my liveness probe and my readiness probe is that that can take a while, particularly on the rubbish Wi-Fi in this venue: it's installing TensorFlow from scratch, and that's a pretty big download. So I can do things like that.

What I'm going to do now is show you how, for the new model we just trained, which was only three epochs and is probably a pretty rubbish model, we don't roll it straight out into production; we do a shadow deployment instead. If you look at this one, it's the same as the deployment we had earlier, but I've added another component spec down here with the same liveness and readiness probe thresholds, and this time I'll put in that new run ID. So this is the new model we just created and pushed to the bucket.
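Here is the reconstruction promised above: a sketch of a SeldonDeployment for an MLflow-format model. The deployment name, bucket path, and secret name are illustrative, and the probe override is a plausible shape reflecting the long conda-environment rebuild the demo describes, not a copy of the demo's actual manifest:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: food-classifier                              # illustrative name
spec:
  predictors:
  - name: default
    graph:
      name: classifier
      implementation: MLFLOW_SERVER                  # understands the MLflow model format
      modelUri: s3://models/food-classifier/<run-id> # MinIO/S3 path holding the artifacts
      envSecretRefName: seldon-init-container-secret # credentials for the bucket
    componentSpecs:
    - spec:
        containers:
        - name: classifier
          # The MLflow server rebuilds the conda env (including TensorFlow)
          # at startup, so give the probes plenty of time before giving up.
          livenessProbe:
            httpGet:
              path: /health/ping
              port: http
            initialDelaySeconds: 600
            failureThreshold: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health/ping
              port: http
            initialDelaySeconds: 600
            failureThreshold: 10
            periodSeconds: 30
```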
And all I need to do to specify that it's a shadow deployment is this one little line: shadow: true. Now all the traffic that goes to the main model gets replicated to the other one, but the output is just discarded. So if I'm monitoring it with Prometheus and looking at latency and things like that, I can make sure the model performs well before I actually promote it. Equally, if I wanted, I could set traffic: 20 on this one and change the traffic up here to traffic: 80, and do A/B tests and things like that on my model too. Let me just undo that; okay, it's shadow: true again. I'll sketch both variants at the end of this section.

And how do I actually deploy that model? It's just a simple kubectl apply of the shadow manifest. Now if we look at my deployments, you can see that it's creating: my food-classifier default is already there, but it's creating this shadow deployment as well, which will get the requests too once they come in.

So finally, let's send a request and have a bit of fun checking whether my classifier is any good. Let's grab an image. Who here likes tiramisu? That was one of the classes of food in there. All right, this guy does, nice. Let's grab an image; hopefully it gets this right. I'll copy the image URL, run my predict script with that URL, and it'll go and pull the image down. Oh, maybe not; what have I done wrong? Forbidden: that URL doesn't like me. Okay, let's try another one. That looks good... it still didn't like it either. Well, this is what happens when you do stuff live and try to grab images. I can try one final image, and I did copy the image address this time... if this doesn't work, we'll just move on. The reason I wanted to show this is that the output of the classifier is kind of cool: you get to see live inference requests. Oh no, I've broken something now. Okay, never mind, I won't fix it now because we don't have time; the issue is that I haven't port-forwarded from my cluster to my device. So I'll hand back over to Steven so we don't run out of time, and you can finish off.

Yeah, thank you. So maybe the very important question left is how exactly we scale the ML platform, because that was the title of the talk. On our side, our data scientists are our customers, and what we have now is dedicated Kubernetes clusters just for machine learning. We have one running in the development environment, one in production, and others as well. So when you work on your machine learning and use a lot of different data and a lot of different nodes, you don't impact the rest of the production workloads, which is pretty good.
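Going back to the demo for a moment, here are the two variants sketched, again with illustrative names and paths; the grounded fields are the shadow flag and the traffic weights called out above:

```yaml
spec:
  predictors:
  - name: default          # current production model, keeps serving responses
    graph:
      name: classifier
      implementation: MLFLOW_SERVER
      modelUri: s3://models/food-classifier/<old-run-id>
      envSecretRefName: seldon-init-container-secret
  - name: shadow
    shadow: true           # mirror all traffic here, discard the responses
    graph:
      name: classifier
      implementation: MLFLOW_SERVER
      modelUri: s3://models/food-classifier/<new-run-id>
      envSecretRefName: seldon-init-container-secret
```

For an A/B test, you would drop shadow: true and instead set traffic: 80 on the first predictor and traffic: 20 on the second. Either way, it goes out with a plain kubectl apply of the manifest file, as in the demo.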
So that's pretty good Yeah, common tooling and infrastructure as well as I said our yaml to terraform for example is one of the best example we have Automation yeah, you automatically deploy to production automatically deployed to shadow mode as well Automatically do a lot of things a lot of automation and the last one is really predicting the future needs of data scientist At the moment we only have one model that is using GPUs for example And but maybe in a year we'll have I don't know five five different models And so they're gonna use GPUs to train the model But then how do you how do you run inference with GPUs? Do you need GPUs to run inference or do you use CPUs? That's like one thing that we try to predict so then when they arrive and then a year later They're like, okay, we need that. We're like, okay, we tried out, you know, you should do it like that So that's something that we really try to have and I clearly don't have time to talk about that But that's a very big model. We had running and we're totally blind on it But what we did now is that it's running on the infra and because of that now we can see like number of requests per second it receives and The latency the memory usage But one thing that is very cool is like we use Kafka a lot and now every response you can you can log it to Kafka automatically So then you can monitor the quality of your model and then you can we use snowflake behind the scene So then you get your snowflake and you can compare your prediction to the real values And you can do that easily because everything is locked to Kafka and Yeah, you can run shadow mode a B test and different things So that's that's what we have now And yeah future work. I will finish with that We want really want to work on the ergonomics of the platform data scientist for now I still have to struggle with Cube CTL and everything in some cases We want to get rid of that. So we want to make you acubitus completely Yeah, we want to hide it from them if they're not interested and Really have an integration with the rest of our tools We have an experimentation platform so we want to have full integration with that so we're gonna run a B test and then monitor the result and everything and I will finish on that because we're out of time. So yeah, thank you All right Any questions any questions Anybody got a question anybody hungry Jeremy so sounds good too. Okay. We got a couple of questions Yeah question you down here Sorry, sorry, sorry microphone's got me Hi, Steven. Nice to meet you. Hi, here go from VMware. Um, quick question on your YAML to terraform tool Why are you not open sourcing that? That's one thing we want to do But for now, it's very very targeted towards our infra But it's one of a goal yet to open source it at one point because I think it's pretty cool for a data scientist Hi there when you put model a URI on the spec Yeah, actually fetching the model and then running in your inference server there so you like it will provide for you like the Interface interface like HTTP at gRPC for you. Yes. 
Yes, exactly: it pulls that model down. And that's actually mandatory within Seldon Core: you have to provide an external URI to the model, because if your model is huge you don't want to be pushing it up inside a kube manifest when you deploy. We have an init container that goes and grabs it, pulls it down, and makes sure you've got all the artifacts before the classifier starts running.

"So it could also be a proper S3 bucket, not MinIO?" Yeah, that endpoint was one running on my cluster, but you can just swap it for an S3 bucket, Google Cloud Storage, whatever. It supports something like 40 different storage backends, because it uses Rclone under the covers.

Any other questions? All right.

"Hi, I'm Luke, nice to meet you, and thanks for the talk. Have you ever considered using the Kubeflow stack? I see a lot of overlap with what you did; if so, why did you discard it?" Yeah, when I started we benchmarked Kubeflow and the whole thing, but I found it quite hard to use, especially for a data scientist: having to write all the pipelines and all the DSL and everything related to that. I found it quite hard even for me, and I was interested in the tool, so I didn't really want to sell it to our data scientists, because I think it would have been too complicated for them. That's why we didn't go with it.

Okay, folks, that's unfortunately all the time we've got for questions. If you want to continue the conversation, we can do so outside. A big round of applause for this amazing talk! Let's go take the photo.