Hello, welcome to our talk. My name is Chris, and together with Tinko I'm going to take you on a short journey: how we adopted cloud-native tools to build our data analytics platform, and how the recent addition of advanced schedulers to that stack helps us achieve a bigger scale operating that multi-tenant environment. That's going to be our story, but to begin with, let me introduce you to our company.

ING is a financial institution that operates globally, with European roots. We have over 52,000 employees worldwide, and our mission is to empower people to stay ahead in life and in business. The roots of the company go back to the deep economic crisis of the 1920s, and our mission right now is also to help transition our world into a new era: to shift to a low-carbon future and to speed up innovation in finance. Having spent over 10 years in the banking industry myself, it's a fascinating area where a lot of regulation applies, so adopting new technology is never easy. But it also keeps you honest, because one of the most important things in a bank is trust. That shapes the way we want to handle data: to handle only what is required, and to respect your rights in that regard.

That's the mission for the company, but the mission for our platform is to become data-driven. What we want to emphasize here is supporting our employees with a self-service platform. We are in a platform economy, and the mission of what we build is to empower you to build on top of that platform and solve your business needs.

On the growth factor: the platform's roots are in 2013, when we started adopting open-source technology to bootstrap the first product initiatives. In 2018 we started looking at a new version of the platform, when we finally saw that the transition towards cloud-native tools could help us rebuild the platform on a new foundation at the infrastructure layer. The numbers may not be impressive, but we have seen stable growth since then, the adoption rate keeps growing, and over 400 projects now exist on our data analytics platform. That self-service paradigm really helped with the adoption rate.

What we built is essentially a product of a product-driven mindset; it's something that is embedded in how we work, using modern open-source technology. What you see here is the entry point of our platform. It gives users access to significant compute power, with security and compliance configuration embedded inside, supporting not only the global but also the local needs of our users. The three foundational pillars of the self-service platform are collaboration, seamlessness, and security. We want to emphasize the engineering capabilities of the platform: engineers can start their data analytics journey using predefined pipelines, building, sharing, testing, deploying, and creating new insights for the business. Data is stored in a secure way.
One of the most important aspects of the platform is sharing: sharing data and resources between products, based on predefined roles. The toolset is then also delivered so that the latest tools are available to the users.

We're looking for the ultimate answer, and in my perfect world the ultimate answer would be a search box where users could type whatever they want to do next, and everything else would be taken care of for them, with the platform guiding them directly through the next step of the user journey. In reality it's a little more complex, obviously: you have these pieces of the puzzle that you need to combine to make sure that your platform delivers on its promise. The way we think of this is as a set of layers that you stack up to deliver the end product, tailored to the user's needs.

For that journey we prepared a couple of interfaces our users can use on the platform. What you saw in the previous picture, the landing page, is the starting point where users choose from the portfolio of the platform and its products. We have our opinionated data science toolbox, obviously with a Jupyter notebook interface; we recently switched from our custom-made build environment towards Buildpacks, which helps us build these environments. For the data discovery portal, cooperation with the Lyft folks helped us enable the data discovery and metadata engine on top of the platform, and we have Superset for our BI and analytics needs.

The recent transition for us was the move to cloud-native tools: we switched from the Hadoop ecosystem towards object storage and Kubernetes, with the addition of a caching layer (something there were also some talks about here yesterday) that helps us achieve proper performance on top of those object storage implementations.

The key challenges that we saw so far with cloud-native tools, as you can see, are job management, scheduling, and multi-framework support. Despite leveraging cloud-native technology like Kubernetes for stateless applications, and it was even possible to set up stateful applications, for data analytics workloads we were still missing a mature scheduler like YARN to enable jobs running concurrently in a multi-tenant environment. What YARN was missing, on the other hand, was support for the new frameworks that are emerging, like TensorFlow and PyTorch. That's something we were also looking for; it was going to be the next big thing for our platform. With that, a short transition towards Volcano, and I'll hand it over to Tinko to give you a bit more detail about our implementation.

Okay, so hello everyone, and on to the road to our solution. Chris just told us a little bit about it; I hope you can keep this image in mind. For us, on-premise means that we have a fixed cluster available, and on our previous cluster the scheduling was split between Hadoop and Kubernetes. Essentially, all the big Spark jobs that we ran were happening on our Hadoop YARN cluster, along with all these different kinds of tasks, and the rest was all fully Kubernetes. But like the rest of the world, we actually aim for having only Kubernetes, since this makes the whole paradigm much simpler.

So here, if you look at it from a resource consumption perspective, you could see that with a certain split in workloads between Kubernetes and Hadoop, during normal office hours the Kubernetes side is actually using somewhat, but not everything, while the Hadoop part definitely is, since Spark always tries to use as many resources as possible.
We have many jobs running. But, for example, during the night, once we run all the big batch jobs which happen in our bank, the batch part is fully utilized while the Kubernetes part is barely using anything, although at peak capacity you could see all different kinds of things being used. You might see that this is not a very optimal way of allocating.

What's also important to us is that we have a distinction between batch and interactive users. Interactive users are the people using the platform directly; batch is the big processes that have to finish on time, so to say, for business-critical applications.

If we, for example, had a fully Kubernetes cluster and ran all the Spark jobs on top of it, the resource load might look like this: you have the pods running, and essentially all the available space that the pods and the core services aren't using, plus a little bit of spare capacity reserved for the rest of the deployments, would be available for Spark to use. That means that, for example, during the night our batch processes would have many more resources to use.

Essentially, this is where Volcano comes in. There are many different approaches, but essentially we needed a job scheduler for Kubernetes specifically: currently for Spark, but in the future for different kinds of technologies. What Volcano offers is job queues with weighted priorities. Queues are the way you divide the cluster up: if there are four users running on the cluster, then everybody gets one fourth of the cluster to use, so their jobs can complete as fast as possible. But it also has the ability to over-commit above the queue limits: if, for example, two users aren't using anything, then the other users may use their resources. So it's a system where you try to claim as many resources as possible. It also has the important ability to preempt pods when more pods come in. Lastly, it has configurable strategies to deal with competing workloads from a task scheduling perspective. For Spark, for example, you can preempt machines, you can kill executors, and the job will still complete; for TensorFlow this might not be the case.

All these features already exist, but they only exist in YARN, not in the Kubernetes ecosystem. Since the 3.3.0 release, Spark officially has support for Volcano; this was made by the Volcano community itself. There is also another workload and batch scheduler, called YuniKorn, which is now supported in the latest release, 3.3.1.

I will tell you more about Volcano along the way, but Volcano essentially is a generic task scheduler. The way we see it, you have service scheduling and you have task scheduling: all the different kinds of tasks which have a certain predefined moment at which they will stop. That's TensorFlow, PyTorch, Spark, and Kubeflow, and you can think of any other application on top of it. As for the configuration, I wouldn't say it's easy, but it tries to be as simple as possible.
For the job itself, you just have a Job object, which is a small abstraction on top of pods, in which you can select the number of replicas you want to schedule. You can also add policies: for example, "if every pod is completed, then the job is completed," but also things like "if a certain pod doesn't work, it should restart, say, five times." It has this plugin-based architecture in which you can select different kinds of plugins, and there are different kinds of actions that can happen within Volcano itself. Then you also have the queue part, which is specifically needed for our Spark setup and which I will demonstrate later. So it's a relatively simple overlay on top of the Kubernetes API.

If you look at Volcano itself, it actually consists of three different services: you have the admission service, which checks that everything is correct; then you have the controller manager; and then you have the scheduler, the scheduler being the main one making the decisions about where to allocate the tasks, and so on. So, let's get into the balancing situation: what are the main differences I want to move forward with?
There are quite a few differences in the strategies for how you want to schedule pods from a service perspective versus from a job perspective. For example, if you have services running in Kubernetes, you want them spread as much as possible, so that if a certain node fails you can still resume the rest of the services, and you have a lot of redundancy. But that is not what you want for high-performance jobs: there you essentially want to put the job's pods as close together as possible, so you have less network traffic and the job can complete faster. At the same time, you might want the applications of different users spread as much as possible, so that you don't get competing workloads on one node.

What you can see here is that, for a task scheduler, while Kubernetes mostly just spreads all the pods, Volcano actually has many different plugins attached to it. You can have DRF, which is the dominant resource fairness algorithm, which I will get into. You can have gang scheduling, where you say, "I only want to deploy all four pods when the resources are available for all four of them," so that no pod is left hanging. You can add priorities, you can add resource quotas, and you can add checks, for example, that a certain pod doesn't wait too long when there isn't available space. There are many different kinds of algorithms you could think of when you approach it from a task scheduling angle, and for us this means we can further optimize this Spark traffic in a Kubernetes cluster, which is very important to us.
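The plugin mix just listed (priorities, gang scheduling, DRF, queue quotas, bin-packing) is wired up in the scheduler's configuration file. A sketch, close to Volcano's upstream defaults:

```yaml
# volcano-scheduler.conf (mounted via the scheduler's ConfigMap)
actions: "enqueue, allocate, backfill, preempt, reclaim"
tiers:
- plugins:
  - name: priority        # honor job and pod priorities
  - name: gang            # all-or-nothing scheduling: no pod of a job runs alone
  - name: conformance
- plugins:
  - name: drf             # dominant resource fairness between jobs
  - name: predicates      # node filtering, like the default kube-scheduler predicates
  - name: proportion      # divide the cluster between queues by weight
  - name: nodeorder
  - name: binpack         # pack job pods tightly instead of spreading them
```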
The old YARN cluster was highly dedicated to Spark jobs, so it had been optimized by many years of experience; in Kubernetes, we essentially still have to do that ourselves.

The main feature that was important to us, and why we essentially selected Volcano, is that we needed dominant resource fairness, which they have enabled. For example, assume you have a cluster with 18 CPUs on the one side and, let's say, 72 gigabytes of memory. You would like both users to be able to use as many CPUs as possible, so that none of their jobs is unable to execute. So you want a situation where one user can have nine CPUs and the other user also gets nine CPUs in the system. But if one user is using less, that might mean one user uses 12 CPUs and the other uses six. This is the part where you have a weighted claim, and Volcano does this using queues: you can over-commit, using one user's resources for another's processes. In dominant resource fairness, the calculations are done on the most dominant resource being used, in this case CPU, because memory is used way less and there's even some space available. In Volcano that means resources are preempted: once, for example, user one wants to use more resources, pods from user two will be deleted to make space for user one.

Then there is also the part about resource starvation. For example, if you have two nodes available, you want to have long-running services; for us that's things like Presto, or a cache like Alluxio, which we run on every node. So we have different kinds of compute options, and we also have a caching layer so we can access data as fast as possible. That means the rest of the space is essentially available for tasks.
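The dominant-resource-fairness arithmetic from the 18-CPU / 72 GB example above can be sketched in a few lines of Python; the numbers are illustrative, not our production capacity:

```python
def dominant_share(usage, capacity):
    """A user's DRF share is the largest fraction they occupy of any single resource."""
    return max(usage[r] / capacity[r] for r in capacity)

# Hypothetical cluster roughly matching the numbers in the talk:
capacity = {"cpu": 18, "mem_gb": 72}

# Both users are CPU-dominant: 9/18 = 0.5 beats 18/72 = 0.25 and 12/72 ≈ 0.17.
user1 = {"cpu": 9, "mem_gb": 18}
user2 = {"cpu": 9, "mem_gb": 12}

print(dominant_share(user1, capacity))  # 0.5, so CPU is the dominant resource
print(dominant_share(user2, capacity))  # 0.5
```

The scheduler equalizes these dominant shares: whichever user has the lower share gets the next free pod, and preemption kicks in when a lagging user needs space back.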
But yeah, if we use everything for tasks, that might mean we cannot do any deployments anymore, or cannot make any changes. So, for example, we added some spare-capacity pods, so that we always have some spare capacity in the cluster and can do deployments without any issues.

Next part: Spark itself. We have made some custom changes on top of Spark and of Volcano itself, but for us it looks like this: you say, okay, in my Spark config I want to use Volcano. Then you define, okay, I want to use root.user-one as my queue name, and I want to use this pod group where all these pods live, as an abstraction over them. Spark itself will create these queues and pod groups automatically within our setup, and the started pods are automatically assigned to the Volcano pod group that you have declared. We have owner references and a driver heartbeat for garbage collection, which means that if somebody's Spark session, the Spark driver, stops, the executors will go down by themselves; this is how it's done in Spark.

Then we have a "max pending pods" option to limit the amount of allocation: Spark tries to ask the Volcano scheduler for as many resources as possible, and if you limit that, it will ask for less and less. But the main part is that we have dynamic allocation in use. For the users, the main resource requirements are hidden away from them as much as possible; they just get the Spark pods in which they can run their query. If they need more resources, Spark automatically keeps asking Volcano for more and more pods.
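The Spark-side wiring described above corresponds roughly to the Volcano options Spark 3.3 introduced. A spark-defaults style sketch; the file path and executor limit are hypothetical, and our custom additions (the driver heartbeat and the max-pending-pods limit) are not part of stock Spark:

```properties
# Use Volcano instead of the default kube-scheduler for driver and executors
spark.kubernetes.scheduler.name=volcano
spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep
spark.kubernetes.executor.pod.featureSteps=org.apache.spark.deploy.k8s.features.VolcanoFeatureStep
# PodGroup template; the target queue (spec.queue) is set inside this file
spark.kubernetes.scheduler.volcano.podGroupTemplateFile=/opt/spark/conf/podgroup.yaml
# Dynamic allocation: keep asking for executors while work remains
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.shuffleTracking.enabled=true
spark.dynamicAllocation.maxExecutors=34
```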
It keeps asking so it can execute and process the job as fast as possible.

Then we come to a small demo which I wanted to show. Here comes the part where I have to say that we work in a highly regulated bank, so I cannot show you our Kubernetes commands and so on, because there might be some sensitive information lying around. But I can show you, for example, the Grafana dashboards, which I have updated a little, in which we run Spark. I think this will immediately make clear to you how it works.

For example, if I add one user, it will automatically ask for as many executors as possible and try to fill up the cluster as full as possible; that will be around 34 executors. If suddenly a second user is added, that means the first user is automatically downscaled: his pods get killed off to make room for the second user to come in, and then it balances itself again, and both have 17 executors, so to say, on average, with their jobs still running in the background. If you add a third user, it will automatically scale itself back so that everybody has around nine, sorry, 12 executors. And there you can see the amount of CPU and memory being used.
So in this situation you also see, in the parts on the right, that we tend to use all the resources we can. And if, for example, we remove user three (let's go back a little), that means we can essentially divide the cluster back between two people again, so they get back more resources if needed. In this situation you have a static allocation of the cluster, but then you manipulate how many resources a particular job or user gets. It helps you to, you know, keep the cost at bay while giving as much power as possible to specific jobs. If the cluster is empty, a user gets the full capacity of the cluster; if new users come in, they share the resources. That was something really missing in the case for bringing data analytics computation towards Kubernetes, exactly.

For example, if we didn't have Volcano, then for every user we would have to limit the namespace and the resource requirements, which means each user could only use far fewer resources than are available in the cluster, because we would have to declare it statically. So from a resource consumption standpoint, this is very much needed for us. And if we leave it unbounded, then essentially they will fill the cluster and we can't do any deployments anymore.

Then, if, for example, all Spark processes are killed, you will see in the figures that everything goes down. This is all being run using Spark interactive mode, so that's quite nice. This is just done with these commands, and users can do anything on top of it. We provide this dashboard to users, but essentially most of this is hidden away.
So they might complain when many users are in the system and they get less space, so to say, performance-wise, but they will always get the option to run their Spark job.

Then, into a little bit of workload monitoring. Something I also want to show you, but which is still not done, is that we can have a DRF dashboard in which we want to show things more fine-grained, in more hierarchical situations: we have one root queue, we have an interactive queue for the interactive users, and we have a project queue in which all the big projects that we run live. There, for example, we can give the project queue more priority than the rest, so that our business-critical Spark applications always get more resources. If you have batch ETL jobs that need to have a higher priority, because they need to finish on time to bring new data to the cluster, then the rest gets divided up over the interactive queues for interactive user sessions.

Okay, and then we also essentially want to avoid traffic jams. We were thinking about adding some "cluster rush hour" part, in which we can give users a little bit more context about when the cluster is more heavily used. That's more from a self-service UX perspective, because we essentially want to hide all of this away as much as possible. As you see, we are eliminating the toil for data engineers and data analysts. They start their session.
So they start this session They do not care about the configuration of the cluster to run their job and they get the best performance possible What we also want to make it visible towards them You know, we're gonna be the best time to During the day that you can run your job because the cluster is less busy Exactly so Since I'm not a volcano maintainer I wanted to just give all the love to all the kind of folks that essentially made this terrific scheduler They did a really great job. I personally also think they did They did that this is like the best way to go because they have a nice Nice abstraction over how we can try to make tasks scheduling more formal and human users and how we can get more performance out of it I'm not a very great open source worker so These things are working and with a small amount of changes on top of it But I haven't open sourced it myself. So hopefully if somebody from volcano is here. I we can talk We have added the DRF dashboard we added spare space automatic queue management more Prometheus metrics updates to the Grafana dashboards cube state metrics and They leverage some cluster wide permissions and I have reduced that bit But essentially what they have what is all done is working and I think definitely this is the way to go and it would be cool in the future to also support like TensorFlow by towards like different kind of distributed methods. So we can add all different kind of tasks On top of Kubernetes this way Then I want to conclude my presentation