Okay, so hi. My name is Tobias Henkel. I'm working for BMW, and today I want to talk about why we're using Zuul and also how we're using Zuul at scale.

The focus of this talk is automotive software development, that is, software in the car. BMW is also developing software for very many different areas, for example backend mobility services and financial services, but the focus of this talk is software which basically runs in the car.

To start, I'll outline a bit of the history of automotive software development. I guess most of you already heard this at the keynote, so I'll do it a bit quicker. At first there were only small, independent ECUs, and they were running only small pieces of software. However, they still had to fulfill very high safety standards because, for example, it is critical that an anti-lock braking system does what it's supposed to do. Otherwise bad things can happen. For this, automated testing is crucial for the entire development process. In those times smaller systems were sufficient, and a typical setup was Jenkins-based, for example with a few build slaves.

Then the existing features evolved and many new complex features were introduced. That led to a massive amount of new code and new ECUs in the car, and it also needed much bigger CI systems. A typical setup in this era was, for example, a Jenkins with more build slaves, but most of the time we had several Jenkins instances because we had more projects to run in parallel.

Now look at the software that is running in the car today.
It is much more complex. Software is now everywhere, ranging from window lifters up to infotainment systems, and the code size has increased dramatically, which also requires a massive increase in the continuous integration systems. What is also very challenging is that we need cooperation between many internal and external teams. To be able to manage that, automation is not only crucial for the software quality but also for running the software development process itself, for example for creating and distributing releases. A typical setup was a Jenkins multi-master setup with many build slaves. But for certain smaller projects we have already been running Zuul-based CI systems for four years. We were an early adopter in that area outside of the OpenStack ecosystem, and we were also among the first users that switched to the new Zuul version 3, even before it was released.

So now we are developing the software of the future cars, and for this we have to drive many large projects in parallel. These require cooperation with many suppliers and teams, and this now covers not only on-board but also off-board software. Autonomous driving is just one example of that. Developing autonomous driving is very challenging. It is a huge effort, and it needs a highly complex software stack. Obviously safety is a must, and we also have an extensive need for simulation, because we cannot just rely on unit tests, system tests, integration tests and manual test drives. It is just not enough. We have to simulate millions of kilometers to be really sure that this works. In order to drive this we need a large-scale CI system, and that is Zuul. So we're using Zuul now also for our large projects.
It is already in use by international development teams and partners, and it enables our software development process.

One key feature of Zuul is that it scales its resources horizontally. When the projects grow, we can just add more cloud resources to the system, Zuul can use those resources, and so we can support more developers. But this is not the most important feature of Zuul. The most important feature in my eyes is project gating. This is the key for scaling not only the compute resources of the CI system but also the development teams themselves.

Without gating there is often a process called stop and fix: when the build is red, everyone has to stop and fix the build until it is green. When you scale your development teams, that obviously becomes the major pain point, so many big projects tend to just ignore the red build and merge features as quickly as possible. This then needs an additional regular fix-the-build day, or in larger projects even weeks, and that has its own problems. Leveraging automated gating with Zuul can prevent breaking the build up front and so improve the development process itself dramatically.

Further, Zuul is a great open-source project with a very helpful community, and we're extremely proud of being an active part of that.

So how does Zuul fit into our existing environment?
One of our requirements is that we need to separate the projects from each other. We work together with many OEMs and suppliers, and we have many different projects. From a company point of view we need to separate these projects so that they cannot interfere with each other and cannot see each other. This is a hard requirement for us, and luckily Zuul has built-in tenant support, which is a perfect match for this.

Further, we're using GitHub for development. We chose GitHub because many developers out there are also working on open-source components, for example, and most developers are familiar with GitHub. Zuul supports GitHub, and we also contributed many changes to improve that support.

Further, configuration as code is one of the principles that is very important for us. We want to version not only the projects themselves but also the configuration to build these projects, the jobs and so on. That is also very important when it comes to traceability. In automotive software development you typically need to be able to trace every single change of a specific project, via some systems, up to the requirement, for example. This applies not only to the code itself but also to the build system, because you have to be able to reproduce the build as it was at that time. For this we need configuration as code, and that is one of the key principles of how Zuul works: Zuul is a Git-driven system.
So that is also a perfect match.

Further, we don't want to run many Zuuls for many projects. We want to run one centralized Zuul for all projects, because that brings down the maintenance overhead, for example. But this comes at a cost. When we're running a centralized Zuul, it has a high impact if Zuul goes down, for example for a few hours. That would leave several hundred developers unable to work, and that is bad. So our requirement is that we need to ensure as high an uptime as possible for the CI system. For this Zuul is a great fit, because most components of Zuul are scaled out and can therefore tolerate failures of specific components. There is only one component left in Zuul that is not scaled out yet, the Zuul scheduler, but it is at least on the roadmap to make this a scale-out component.

Also regarding availability: multi-cloud support. We want to use cloud computing to distribute the build resources. But as we have these high availability requirements, we also need to distribute the load over multiple clouds. This is natively supported by Zuul, so this is also a great fit.

So what's our approach to running the CI system? There is not only technical stuff around that but also process and organizational topics. At BMW we have a dedicated team of CI developers, both on the infrastructure running Zuul and in the individual projects.
So we have a team that runs the CI infrastructure, and in each project we also have a team that is mainly responsible for the jobs. This is important so that we have a couple of people with the best knowledge of how to design the jobs. In these big projects it really matters how you design the jobs so that you don't bring down the system, for example because you download five gigabytes per job every time, and that with 100 jobs in parallel. That's not going to work. So we need people who know what they're doing to design the jobs.

Further, we are building up a CI community in-house. We're starting with our infrastructure team and our project support team, and the projects themselves, at least the key users, are also part of the CI community. We also have a dedicated group chat for that, so the community can talk to each other, share best practices or ask questions. That way not every question has to be answered by us who run the CI infrastructure.

As I already mentioned, we have one centrally hosted CI system with a large amount of cloud resources, because we want to bring down the maintenance cost. The most important advantage is that we have shared resources for all projects. We don't have a few resources for this project and a bit more for that project; they are just shared, so in peak times each project could in theory use all resources that are available. But with that approach it is still important to have enough resources for all. It can be a problem if many projects use a lot of resources and you don't have enough for everyone. This is also a challenge, but I think you can resolve that if you manage to have enough resources at all times.
Then this is a win-win situation for everyone.

Further, we're following an upstream-first open-source strategy. To be fair, we're running Zuul ahead of master, but every single patch we run in our Zuul is already in review upstream before I take it into our Zuul. I think this is one of the most important parts when it comes to using open-source projects: you need to contribute back, and if you need patches to run, then contribute those patches back. That's the most important thing when it comes to using open-source projects, in my eyes.

So what is our technical setup? Obviously we have a Zuul and we have a Nodepool. Nodepool is the component that manages the build resources, and this is running on OpenStack, of course. But it's not only running on OpenStack. Because of our availability requirements, we run Zuul and Nodepool on top of OpenShift, which is a distribution of Kubernetes. This has some advantages for us. OpenShift has self-healing capabilities, health checks, liveness probes and all this kind of stuff. For example, if a component of Zuul fails or crashes, it is automatically restarted on a different node. Even if a whole VM crashes and takes some Zuul services with it, they are just rebalanced to a different node and the system is restored. This would work perfectly if we had the scale-out scheduler. The Zuul scheduler is the only component that loses its state if that occurs, but this is still to be fixed.

What is also important for us, and I think for all of you who want to run Zuul on OpenStack, is that you should separate the control plane and the build resources, where Nodepool operates, into different tenants.
I think that is very important, because Nodepool also has some cleanup functionality, which usually works perfectly. But as it is software, there is a small risk that it might misbehave, and I guess no one wants their control plane to be cleaned up by Nodepool because it considers the VMs leaked. So better safe than sorry: just let Nodepool work in a different tenant.

Nodepool also has support for static nodes, which is important for us because we have several use cases for static nodes. For example, we are testing our software on hardware racks, and this obviously cannot be done in a virtualized manner. So this is also a good feature for us: we can do real hardware testing together with Nodepool.

Also, we have GitHub. Zuul connects to GitHub and just reacts to the pull requests, similar to Gerrit.

Also, we are using the zuul-jobs repository, currently mainly for the base jobs and some minor stuff, but we are planning to use zuul-jobs much more than we are currently doing. And we have a similar repository, we call it CI lib, which is our internal zuul-jobs repository where we have Zuul jobs and Ansible roles, for example, that are shared between all our different projects. This CI library is also part of every Zuul tenant we have. We also use it to develop functionality independent of zuul-jobs, and we are looking into which parts of the CI library can be upstreamed to zuul-jobs. Actually this should be a deployment-specific or BMW-specific collection of jobs and roles, and what is interesting for everyone we will also upstream to zuul-jobs.

Also, we don't only have Git and GitHub, but we also need a binary store, for example for build dependencies. We also have projects that have binary dependencies and binary deliveries, for example from some OEMs.
So we store that in Artifactory. We are also using that as a mirror, for example, so we don't have to download everything from the internet.

I'd like to share some of our users' feedback. This is from half a year ago, when we activated the first projects for Zuul and our first users used Zuul for the first time. Many users were a bit skeptical at first because they didn't know this concept of automated gating. They just knew, for example, Jenkins and clicking the jobs together and so on. There is a huge mindset change necessary to switch from clicking jobs together in Jenkins to a Git-driven system with Zuul and automated gating. So they were a bit skeptical about that, but after just one or two weeks of using it they were happy with the concepts.

What we also found out was that the big projects really benefit from automated gating too. We already had Zuul in use for some smaller projects more than four years ago, so we knew how gating works, and these smaller projects also benefit dramatically from gating. But this was the proof that it also works for bigger projects.

The bad thing was that most developers in our automotive domain were C++ developers, and Ansible comes from the DevOps world, which is completely different for them. So it was a relatively steep learning curve for the users to get used to the new concepts.

And just to share two quotes: one developer said, "Never again have the project get broken by a developer, never ever." So he's very excited about the gating, which is really the most important thing in Zuul. Another developer said, some weeks after the migration, "It just feels right."
So this is our user feedback, which I'm very proud of because they were very skeptical before that.

So thank you very much for your attention. Are there any questions?

How do you make your deployment outcome compliant at the end? That means, when your different teams are deploying the application, how do you make sure that what you are deploying at the end is 100 percent compliant with the company rules?

What do you mean by deploying? In our automotive domain we're developing software which is not directly deployed to a car. It is tested and released, and it is checked, for example, whether there is a JIRA ticket linked to it. Then we have a release process where the artifacts are signed and so on, and a package is created which can then be flashed to the car. So we don't have the full chain yet to deploy the artifacts directly from CI into the car. We're not there yet. The automotive industry moves slowly with regard to that.

Thanks. I have two quick questions. What if I run Kubernetes without OpenStack? Can I still use Zuul for CI/CD?

You can still use Zuul. We're running Zuul on Kubernetes, which is not the common case; we're just doing that for availability reasons. You still have the choice to run Zuul on bare metal, on VMs or on your own Kubernetes. You also don't need an OpenStack cloud for Zuul to work, because Nodepool can also work with static nodes. And I think just one or two weeks ago a patch landed in Nodepool that added Kubernetes support for getting the build resources. There is still a small piece missing in Zuul to really use that, but Zuul will also soon support running container workloads, for example.

Okay, thanks. And my second question is about Ansible, because I didn't see that in your diagram. So where does it actually come into the game?
So Ansible is the language in which you describe your jobs. Zuul has two parts of configuration. One is the jobs, which is a custom language based on YAML, and the second is what the jobs are doing, and that is just plain Ansible. And this was, well, new to most of our users.

I have a question. How do I install Zuul? Is there a standard procedure or documentation available? I don't see anything obvious.

There is documentation available on zuul-ci.org. If there's something missing, we are open for contributions, but just one or two weeks ago a new getting-started guide was added, which is based on Docker Compose, to try it out. Our OpenShift deployment, though, is currently specific to our deployment.

So to deploy on a private cloud, all the documentation is available?

Yes.

Okay. Two questions, actually. I work for a company that is relatively small compared to BMW; we're about 50-ish developers. You mentioned in one of your slides that you had kind of a central CI team with dedicated people. We did the same, and it didn't work very well for us, because the requirements from all the different teams were just different enough that the CI team had a really hard time providing something that is, you know, useful to everyone. So I guess the question is: how big is that team in your case in comparison to the others? And question number two: how do you, you know, keep them from going insane?

Okay. In that regard we actually have several layers of teams. We have a handful of people who are running the infrastructure part of Zuul. Zuul is very generic, so you don't have to make many customizations to Zuul. Zuul is so generic that you can drive almost every workflow without customizing Zuul, just with job configuration.
So that lowers the barrier. The second layer is that in each bigger project we have a project-internal CI community, people who got to know Zuul in detail and know how Zuul works, because for the average developer it's still difficult to manage that complexity. That's our two-layer approach: we have a central team that drives the infrastructure and also maintains the CI library, and we have the second layer where we have centralized knowledge in the projects themselves.

Hi, thanks for a great presentation. I'm coming from the automotive space too, and you said you handle this OEM stuff, the different vendors and other tier 1s, tier 2s and OEMs, via multi-tenancy in Zuul. So you have one big instance of Zuul. Are your lawyers okay with that? Because on our side some developers even have three different notebooks for different code bases, and it would be crazy great for us if we could consolidate and do it via easy multi-tenancy in a tool.

Yeah. What we're also doing is that we have most of the code within our GitHub, and there the access rights are separated from each other, so that the developers of project A don't have access to project B. That applies essentially to the OEM and the supplier members of these development teams. And with Zuul we just have multi-tenancy, where we can divide these projects, and they then don't share any job configuration except zuul-jobs, for example, and the CI library, which are injected into all of the tenants.

Okay, thanks.

Actually, I have a follow-up question regarding the multi-tenancy. You mentioned that you separate the rights on the GitHub side. However, you still need to upload the logs somewhere. So how do you control the access to the logs, which will probably contain a lot of stuff from the jobs themselves?
That is a good question. We're uploading the logs prefixed with the tenant, and we're just doing path-based authentication and authorization.

Could you say a bit about why automated gating is such a game changer for you guys?

Automated gating is a game changer because you can scale the development teams themselves. Back in time we had, for example, a project which didn't have automated gating yet, but they still wanted to test the current state. They didn't want to land patches and then see that the build is red, so you need to test before you merge. Then the problem is that if you do it automatically with Jenkins, you're stuck with serialized gating, which is a problem when your jobs take longer. If your job takes one hour, for example, then you can land just 24 patches per day, and that's not much. What Zuul does is parallelize this process in a speculative manner. Each change is tested in a speculative future state, together with all patches that are queued up in front of it. If all go green, they are all merged, and this greatly improves the gating performance.

But back in time, where we didn't have any gating, the procedure was this: you rebased, you triggered the jobs, and if the jobs were green you hit the merge button. But then all other changes which are already there are invalid, because for their jobs you don't know if they still work. So you rebase the next patch set, and so on. That's just work you don't have to do. So that's why automated gating is really a game changer in that regard.

Have you included calibration of your ECUs in the CI system, or is that later?
Calibration?

Often the ECUs are uncalibrated, and then you add some data sets afterwards.

Okay, so I'm not that deep into the ECU projects themselves. I'm the person running the CI system, and usually I don't care what they're doing with it. But I know that they have test racks and they deploy their code on those, and I guess they're also using test data.

Any further questions? Okay, then thank you very much.
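
As a rough illustration of the gating arithmetic from the answer on automated gating (one-hour jobs limiting a serialized gate to 24 merges per day, versus testing queued changes speculatively in parallel), here is a minimal Python sketch. It is a toy model and not Zuul's implementation: the function names, the fixed pass/fail outcome per change and the `window` parameter are assumptions made only for this example.

```python
HOURS_PER_DAY = 24


def serialized_merges_per_day(job_hours: float) -> int:
    """Upper bound on merges per day when only one change is gated at a time."""
    if job_hours <= 0:
        raise ValueError("job_hours must be positive")
    return int(HOURS_PER_DAY // job_hours)


def speculative_gate(changes, window):
    """Toy model of speculative gating (hypothetical, simplified).

    changes: list of (name, passes) tuples in queue order.
    window:  how many queued changes are tested in parallel per round.

    Each round takes one job duration. Every change in the window is tested
    on top of all changes ahead of it. Changes ahead of the first failure
    merge, the failing change is rejected, and everything behind it is
    re-tested in a later round without the failing change.
    Returns (merged_names, rejected_names, rounds).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pending = list(changes)
    merged, rejected = [], []
    rounds = 0
    while pending:
        rounds += 1
        batch = pending[:window]
        # Index of the first failing change in this round, if any.
        first_failure = next(
            (i for i, (_, passes) in enumerate(batch) if not passes), None
        )
        if first_failure is None:
            merged.extend(name for name, _ in batch)
            pending = pending[len(batch):]
        else:
            merged.extend(name for name, _ in batch[:first_failure])
            rejected.append(batch[first_failure][0])
            pending = pending[first_failure + 1:]
    return merged, rejected, rounds


queue = [("A", True), ("B", True), ("C", False), ("D", True), ("E", True)]

# One change at a time: five rounds for five changes.
serial_result = speculative_gate(queue, window=1)

# Five changes in parallel: A and B merge and C is rejected in round one,
# D and E are re-tested without C and merge in round two.
parallel_result = speculative_gate(queue, window=5)
```

With one-hour jobs, `serialized_merges_per_day(1.0)` gives the 24 patches per day mentioned in the talk. In this toy model the same five-change queue needs five job durations with `window=1` but only two with `window=5`, which is the effect described above: when most changes pass, many of them can merge after a single job duration.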