My name is Johannes Foufas; I'm a senior principal software engineer at Volvo Cars and one of the drivers of Zuul adoption there. Zuul at Volvo Cars started as a bit of a garage project, a small project. But over time it has grown, and I think about a year and a half ago we became the de facto CI tool. So new software teams starting new projects, with source code that goes into the car, are expected to use Zuul.

Here in the picture you can see some of the main components we have. We almost exclusively use Gerrit for code review. We have a little bit of GitLab, but no GitHub yet; we can't rule that out, but we have enough problems as it is. We use AWS and Azure as cloud providers, and we run our Zuul backend in Kubernetes.

We have a massive growth rate, or at least massive in our terms. We have a lot of small teams using Zuul, and many other things besides; we also define a lot of jobs, plus some SDKs and test environments. We went from around 200 projects to over 600. And we were lucky, actually, that Zuul 5 came, with the work our colleagues from BMW, whom you heard earlier, had contributed back. We needed that too.

This is a plot of our node usage. I think the top part is the static nodes and the rest are dynamic. We are trying to get rid of all the static nodes, but they are legacy, heritage, and we are working on that. We do have really beefy bare-metal nodes in any case, and we have our own server rooms, though I think the rest of the industry is moving to the cloud.

And this is our node usage compared with OpenDev; I took some random Grafana samples to compare. We are not that big, so it's a lopsided comparison, but for me it is a lot to handle. Here you see that we peak at around 150 nodes in use, the executor queue is around 180 at most, and OpenDev runs far more. And here are the build stats: I haven't computed an exact average, but I think we run around 4,500 jobs per 24 hours, so not per hour. And here are some stats from the check pipeline for one of our tenants, the biggest one. You can see when it is the weekend and when there are releases. It's wild.

As I said this morning, we won the Volvo Cars technology award, and this really means a lot to us, because every employee within Volvo Cars can vote. We never thought we were going to win this among all the other teams. So yeah, that was really, really nice. When you start as a grassroots movement and you win a technology award for a CI system, that is, I think, really nice.

Yes, and we use the upstream community's work a lot. I would like us to contribute more to the open source product with our developers; so far, or lately at least, we simply haven't had the time. But we have a nice cooperation with James, who provides us with custom images, and as he said this morning, sometimes we get the features we really need before they are released upstream. And sometimes we wait with upstream releases, because we may be in a critical phase in some project and don't dare to change anything. Maybe that's overly cautious, but that's how we operate. Some of the things James has developed for us are an improved Azure driver, improved metrics, optimized reconfigurations, and something called semi-dynamic nodes, which is like a dynamic node with a timer.
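To make the semi-dynamic idea concrete, here is a minimal sketch in Nodepool-style YAML. This is custom work, not an upstream Nodepool feature, so the layout is simplified and the reuse-timeout attribute is invented purely for illustration:

    # Hypothetical sketch of a "dynamic node with a timer": the node is
    # created on demand like a dynamic node, but kept around for a while
    # after its job so the next job can reuse it (and its proprietary
    # license). "reuse-timeout" is an invented attribute, not upstream.
    labels:
      - name: licensed-compiler-node
        min-ready: 0
    providers:
      - name: azure-main
        driver: azure
        pools:
          - name: main
            labels:
              - name: licensed-compiler-node
                reuse-timeout: 3600   # seconds to keep the node alive after a job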
Because of the proprietary license models we have on some of our systems, it is much easier for us to have something in between static and dynamic nodes. And the same thing goes for the enterprise-wide semaphore. One of the problems we had, when we just poured in projects from everywhere, was that we needed to keep them in one tenant if they used the same proprietary license, like a compiler license. And that wasn't really a good idea; when I talked to James, that wasn't really how Zuul was intended to be used. Now we have solved our performance issues, and we have the enterprise-wide semaphore, so we can share important licenses and not have that as a reason to keep people in the same tenant. Or rather, we like to keep them in the same tenant only if they really belong together, if the software modules belong together.

Yes, and this is a picture of me, or rather our team, trying to describe how we operate at the moment. I heard the main principle of Zuul explained many times during this conference, by James and others. For us, when we migrated to version 3, we did that with the knowledge that our new generation of cars has a full ecosystem of modules, and many of them have dependencies on each other. If you remember, I said that we were like a grassroots movement and we weren't the default to start with, so people came from all different kinds of systems. Then, finally, for the core computer that we have in the middle here, we decided on a company level that all modules that go in there should be using Zuul. In that situation we really needed to use the dependency functionality: both what people can state on their changes, with Depends-On, and stating dependencies in their actual jobs, like: my thing here actually depends on these projects. And that, I think, is what really makes Zuul important for us: teams can collaborate, they can depend on each other's changes, and still pass through check and gate. And the different modules have a lot of dependencies between each other.

Here is an example of how we use Zuul's dependency management to do a rebuild. In this example, we have a C++ base tech library, and we want to upgrade it because that gives us new features in a software stack. In this picture, we have the base tech CI, which uses Jenkins and provides the library, and we have an integration repository in Git. When we want to change the version, we update a manifest, and that triggers a rebuild in Zuul. When the rebuild is ready, we build a binary and test it in many instances, but mainly in a hardware-in-the-loop setup that I will show a picture of.

Yes. So all these modules have complex dependencies. I made some drawings here to show it; this is a small test branch, not hundreds of modules, because then you wouldn't be able to see anything. And here, on the side, you see Zuul's user interface, where, when the jobs are ready, you get this time graph. I really appreciated it when this came into Zuul web. In this dependency graph, we start to build from the bottom of the stack, and then we build the stack upwards. I tried to show this with some kind of animation: when the lower layers are built, we build the dependent layers. All of this is then done automatically.
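Both dependency mechanisms mentioned here are standard Zuul. As a minimal sketch, with project and change names invented for illustration: a developer states a cross-project dependency in the Gerrit commit message, and a job states its place in the build graph in the in-repo configuration:

    # In the commit message of a change (cross-project dependency):
    #
    #   Depends-On: https://gerrit.example.com/c/platform/base-lib/+/12345
    #
    # In the repo's Zuul configuration (job-level dependency, so the
    # stack is built bottom-up):
    - job:
        name: build-base-lib
    - job:
        name: build-middleware
        dependencies:
          - build-base-lib   # only runs after build-base-lib succeeds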
So the developers state their dependencies in their repos, and then they are prepared for automatic rebuilds. And here we have another module up here, which has all of its dependencies downwards in the stack. Of course, all these modules can belong to different teams in different Gerrit instances, and maybe in different types of repos; it could be GitLab, but mainly it is Gerrit. From my perspective, our vision is to have this happen instantaneously: whenever a module is updated, we want the dependency check to tell us whether it still works or not. We are not there yet, so this is what we have in the meantime. As for performance: before we had this, these tests took about a week, because people had to talk to each other and check, oh, I have a dependency on your things, and build everything in the right order, basically traversing the dependency stack by hand. Now it takes three and a half hours to rebuild all the modules we have. From our perspective that is a great improvement, and it really made the developers happy.

Yes. This is a picture I took from the official Volvo Cars web page. It shows that we have a lot of sensors in the future platform we work with. And we have many different hardware-in-the-loop setups: setups for vehicle motion control, both longitudinal and lateral, and for autonomous drive, ADAS, protective safety, and so forth. There we have both component-level and domain-level test setups. The simple ones are basically a node or two of the smaller setup; the large ones, which I have a picture of here, are part of a rig with real radars and real cameras for object identification and so forth. We have systems from both National Instruments and dSPACE, including scenario generators and vehicle models. So I think it is quite close to the real deal, but in a controlled environment. Whenever we merge, triggers fire, we build the whole stack, and we test it here to check how well it works.

Here is another use case we run in Zuul, and we started doing this quite recently. Here we have a combined simulation platform for active safety called CSBOSS. It is a special setup where different modules are combined in a different way than in the onboard software, and then combined with world engines and vehicle simulators. We also have OpenSCENARIO and OpenDRIVE and a scenario generation engine that we can feed to create driving scenarios; for ADAS this is very important. Then there are some software plugins that communicate with the ADAS stack. This setup we get from our supplier, Zenseact, and they are actually here today. When we get a delivery, which enters our artifact storage through some gateways, we trigger on that, and then we have test cases that are evaluated with pytest. Here are some links for those interested, and some contact names; I have colleagues who work with this. And esmini is quite nice; it is an open source project that my colleagues contribute to. Here are also links to the OpenSCENARIO page.
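As a rough sketch of how a delivery-triggered scenario job like the CSBOSS one above can be wired up in Zuul (the job name, playbook path, and timeout are invented for illustration; only the general shape is standard Zuul):

    # Hypothetical job: triggered when a supplier delivery lands in the
    # artifact storage, runs the driving scenarios, evaluates with pytest.
    - job:
        name: csboss-scenario-tests
        parent: base
        description: Run OpenSCENARIO driving scenarios and evaluate with pytest.
        run: playbooks/csboss/run-scenarios.yaml   # invented playbook path
        timeout: 3600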
Another interesting thing we run in Zuul. We run a lot of things, of course: C and C++ unit tests, compilation jobs, and so forth. But this is the one I think is really interesting: a domain software stack. Here we have the source code, which we build and compile in Zuul anyway. Then we have virtual ECUs from our suppliers; we often negotiate with our suppliers to get software-in-the-loop simulation models, so that we can connect our own source code to them. We have both supplier and in-house models, for instance propulsion systems, brakes, steering models and so forth, and all of that is combined. These models use Silver and TestWeaver from Synopsys, and everything runs in CarMaker, a multi-body simulation environment. It is used to smoke-test functionality. The really nice thing with this system is that it is a kind of verification of the software, because the fidelity is quite high; you can actually see real problems in brake systems virtually. I think it is really impressive that the team has this as a gate and release job in Zuul, so they can keep track of the real functionality we experience in the car. They run it through Zuul, and they also use Zuul to set up the actual models in the framework.

Yes, our current setup is Zuul version 5, with six schedulers and ten executors. We scaled up slowly, basically increasing whenever we saw performance issues we had to fix. The backend currently runs in an EKS Kubernetes cluster.

Our biggest challenges; I have two slides on that. One is that when we onboard new teams, maybe coming from a small Jenkins setup, they just say it is really slow: I pushed my change and I have to wait a minute, or two, or five; what is this, I want it to execute immediately. And then we have to tell them: yes, but we have this rebuild machinery, and you are listening to a lot of repositories. We had some real issues with this. I don't have graphs from when it was bad, but for a short while it could take 10 to 15 minutes. What we have in this plot is one of the upstream metrics, the event job time: the time between when Zuul sees the trigger event and when the first job starts. For me that is a key metric of how long the developers have to wait. For our biggest tenant we see times around one and a half minutes, maybe; for the smaller tenants we have since created, the event job times are lower. When we had these issues, we actually did a job of dividing the different users into more tenants where we could, and of course in those smaller tenants the event job times are smaller. We also got some nice optimization work from James, and that really helped; since we monitor the reconfiguration times, we saw them improve enormously. I think that at around 600 projects we hit some kind of critical mass where the system almost broke, because people were starting up new software teams, messing about with their job configuration YAML files, and all those little things caused reconfigurations. While we worked on it, I actually sat for a few days just dequeuing jobs during the real rush hours; I would contact a team, check their Gerrit, and say, let's wait with this one. We had, I think, one week where we really struggled, but after that it went quite well.

We also have the global semaphore developed by James, because we use proprietary licenses for compilers and some of our test frameworks.
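The global, enterprise-wide semaphore is defined in the Zuul tenant configuration file, which only the operator controls, and jobs in any tenant can then reference it. A minimal sketch, with the license name and seat count invented:

    # In the Zuul tenant configuration file (operator-controlled):
    - global-semaphore:
        name: compiler-license
        max: 8   # license seats available across the whole installation

    # In a tenant's in-repo job configuration:
    - job:
        name: build-with-licensed-compiler
        semaphore: compiler-license   # at most 8 such builds at once, across all tenants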
The point is that we don't want to put software teams that have nothing in common in the same tenant just because they share a license. The global semaphore really helps us keep them separated and keep our system fast.

This is another challenge: a graph of the number of support issues we have. We use Discourse internally, on our own Discourse server, and yes, it is a lot. At times our whole Zuul CI team, myself included, works on support. It is a challenge, but it is getting better. For years we have had the approach of finding the small islands: there is always someone in a team with an interest in DevOps, and some teams have dedicated DevOps people. We talk to them, get to know them, and teach them, and when they show that they can really handle the situation, we give them more and more privileges. Usually, after a while, they have their own tenants and their own admin rights. Then we just discuss what we should request or develop on a larger scale. But it is a real challenge for us.

Our future: we want scalability for our worker nodes and we want Kubernetes, and I think a week ago we just got that; it runs in the same cluster as the backend. Simple Python jobs now run in our cluster, and we are investigating ways to run more complex containers, maybe containers within containers, so those can run in the cluster too. On the roadmap, if we get the chance, we would like to not be fully dependent on AWS but to have a switchable backend to Azure, like a Boolean switch: I want to switch now, for some reason, and we should be able to do that. We also got requests to run Zuul jobs in an OpenShift HPC computing cluster, and we are investigating the drivers for those. And that's nice. I think that's it; my clock here says we still have a few minutes for questions.

Oh, there are so many examples. Questions on the base jobs are very common. We always try to set up a base job; for instance, if we use Google Test, we have a base job for building a certain binary. And there are always questions like: OK, I have this base job, but now I want to modify it. People are not used to Ansible, so that generates a lot of questions. I also often tell people to use the noop job; we have it in the documentation. Use the noop job when you bootstrap your new repos, because usually people just define a lot of jobs, or copy something, and then they sit there saying: nothing happens, why doesn't my job start? That is a very common one. With the more experienced DevOps people, the discussions are usually about how they define their jobs, where those should be stored, and how to solve complex problems; sometimes they come and say, hey, I want to enable circular dependencies, and we are like: no, you don't want that. Those kinds of questions are common. OK. Well, thank you very much.
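As a footnote on the noop advice above: noop is built into Zuul and always succeeds immediately, so a brand-new repo can get a passing result before any real jobs exist. A minimal bootstrap configuration looks like this (the pipeline names assume a typical check/gate setup):

    # .zuul.yaml in a freshly onboarded repo:
    - project:
        check:
          jobs:
            - noop
        gate:
          jobs:
            - noop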