connecting to cloud. Okay, so hi everybody. I'm really, really happy to have our first Open Infrastructure Foundation Israel Meetup. We have two fantastic speakers today who will introduce themselves, and I'm really excited. James, the stage is yours.

Thank you. My name is James Blair, and I founded the Zuul project. At a very high level, think of it like a CI system, but in a few minutes I'm going to tell you why it's different from other CI systems. Zuul came out of the OpenStack project, which of course is the founding project of the Open Infrastructure Foundation, so as I talk about it you might get a little history of OpenStack as well, and some visibility into multiple activities of the Open Infrastructure community.

This is also a meetup, not just a presentation. I have some slides to start the discussion off, but I would absolutely love it if people would interrupt me during the talk, and I might pause occasionally to ask questions. Let's see if we can get a discussion going: if something piques your interest, feel free to share why it's interesting to you or ask questions about it. Don't make me talk the whole time.

So without further ado, I'm going to share my screen. All right, I think you should have my slides full screen now, right? Yeah, okay. The thing about this is that I can't see anything else now that they're full screen, so if anybody raises their hand or asks something in chat, I won't know it. If you do want to interrupt me, please literally just interrupt me and start talking.

I'm going to talk about project gating with Zuul. I've introduced myself already; in addition to starting the Zuul project, I've started a company called Acme Gating, which focuses on supporting Zuul and supporting companies using
Zuul, helping them scale it, doing custom development and deployment, and things like that. So this is really a full-time thing for me; I eat and breathe Zuul all day long.

Zuul is a project gating system that was originally developed for OpenStack. A minute ago I said it's like a CI system. That's the box it fits into, but it does things a little differently: we're really focused on complete and correct testing of projects, and in a little bit I'll explain why I think gating is different from CI. It was originally developed just for OpenStack, and over the past several years it's been used by more and more other companies, to the point that I think new features going into Zuul are now driven more by external users than by OpenStack. That's a reversal from where we started, and it's really great to see more companies using it and the project growing based on their needs, not just OpenStack's.

So why is Zuul interesting? It is a completely Git-driven system, and it's actually both CI and CD, an all-in-one CI/CD system. That was pretty novel 12 years ago when we started it; I think most new CI systems are Git-driven now. We did a number of things very early on that were different then and are commonplace now. I don't know to what extent we inspired those other products. I think to some extent that is the case, but we'll never know for sure; nobody comes out and says "we did this because we saw it in Zuul." But we've been talking about these ideas and sharing them this whole time. Anyway, it's completely Git-driven, which is really great for integration with the project and for developer access to it; I'll talk about that more in a minute. It has a really flexible model of centralized and decentralized control, it supports cross-project collaboration, and then its main
feature is what we call speculative execution and gating. These two things are very much tied to each other, and I'm going to drill down into each of these topics.

So why did we make it Git-driven? It came out of developing OpenStack: we wanted to add new tests and new kinds of jobs as we went. By making the system Git-driven, we could, for example, make a single commit that adds a new feature to the software as well as a new job to test it. Of course you're not adding a new job with every commit, but being able to keep the test infrastructure completely synchronized with the system it was testing was huge. It meant that the developers working on the individual OpenStack projects had some agency over the CI system. It wasn't just "go ask the QA team to add a new job"; they were responsible for maintaining the test infrastructure for their own projects. It's really empowering for developers to have that kind of flexibility.

This feature came out of what we thought was a really unique aspect of OpenStack at the time, which is that OpenStack is a collection of different projects. The cloud compute engine is called Nova, the image service is called Glance, and these are actually different projects with different teams working on them. There are many, many OpenStack projects now, all fairly independent, and at the end of the day we need them to work together. So there's this mixture: projects have a certain amount of autonomy, and they need to be able to manage their own tests and add their own test jobs, but all of the OpenStack projects should also behave the same way, follow the same conventions, and be tested and
work together. So we set this up so that we could make some decisions centrally, like "everybody has to run documentation jobs, and they all have to use the same documentation system," while individual projects can add their own functional test jobs to test what's unique to them. There's a mixture of centralized and local control, and in Zuul you can dial that in wherever you need it. If your developers run everything, you can just give them full local control. If you have compliance regulations or something like that to enforce, you can define that centrally and ensure it runs all the time.

We thought this was pretty unique to OpenStack's situation, which is why we hadn't seen it in any other CI system out there. Then we quickly realized that this actually describes the environment in pretty much every company: everybody needs some mixture of centralized and local control. So I'm glad we put that into Zuul originally; as it turns out, it's been really useful in getting Zuul adopted by folks outside of OpenStack.

Unlike many of the more recent CI systems out there, Zuul is completely agnostic about where its code lives. OpenStack keeps its source code in the Gerrit code review system, so that's where Zuul started, but folks have since added support for GitHub, GitLab, and Pagure, and Zuul works with all of these systems equally; they're all first-class citizens as far as Zuul's support goes. Moreover, you can combine projects from different code review systems in the same test job. Say one part of your organization uses Gerrit and another part uses GitLab: you can bridge the gap in Zuul with a single job that checks out a repo from GitLab, checks out a repo from Gerrit, and builds an
application from both of those repos and tests them together. So not only does Zuul support dependencies between different Git projects, it supports them across different code review systems. Again, that's something we found is very common in enterprises. We all wish companies would standardize on a single system, since it would make things easier for everybody, but very frequently different parts of an organization use different software, and Zuul being able to bridge that gap is very useful.

Zuul has this feature that we call project gating. To explain what I think the difference is between gating and other CI systems, I like to think about the history of CI. The first CI system, as we know it, was Hudson, which later became Jenkins. CI stands for continuous integration, and the idea was that you would continuously build your project and make sure everything worked. What would happen is people would merge some changes, then the daily CI build would run, and you'd find out what broke. Essentially we were testing the past, because these were post-merge tests.

Later on we gained the ability to test changes before they were merged: you upload a change to Gerrit or open a pull request, a CI system runs tests on that change, and it tells you whether it's good or not. So you're testing the present, the thing you just wrote.

What Zuul does is a step beyond that: it's focused on the future, on what will happen if you merge the change you just wrote. It doesn't just test your change; it first checks out the current state of the repo and applies your change on top of it. So it's not even testing the commit that you
made; it's testing what the result would be if we actually merged that commit. Moreover, because of Zuul's support for cross-project dependencies, if your commit depends on another commit, either in the same repo or in a different repo, Zuul will check those out as well. So Zuul constructs this future state: it might not be just one commit, it might be five, ten, or twenty commits, possibly in different repositories, that all need to merge, and the question it answers is "if we merge all of those commits, what will the result be? Will it work?" That's why I think project gating is distinct from other types of CI systems: it tests the future.

I've got a couple of definitions here, because I've been using this word as if everybody knows what it means. This is what gating means to me: every change proposed for a repository is tested before it merges. Zuul adds to that the idea of co-gating: changes to a set of repositories merge monotonically, so that each change is tested with the current state of all other related repositories before it merges. This might be a sequence of changes that are all interdependent, or that at least have a dependency relationship between them. Co-gating means all of these projects are gated together, and they're never going to break each other, because every change to each of them is tested before it merges.

Finally, we have the idea of parallel co-gating. When testing changes to different projects before they merge, Zuul does this very efficiently: it tests all of these changes in parallel, but still treats them as a series of
changes that merge in a defined order, either the order in which they were approved or the order their dependency chain implies. Rather than testing them one at a time, we start testing them all at once, in parallel, with the assumption that they're going to merge, and if they all pass, they merge.

That's getting hard to explain in words, and I'm a visual person, so this is an illustration of how Zuul deals with multiple changes and runs their tests in parallel. Think of a queue of changes that need to merge. A few of them may have already merged; we've got two changes at the top of the screen that are already merged. Then somebody approves a change, and Zuul starts running tests on it: a series of jobs to see whether we're going to merge it. Then somebody approves another change on top of that, and then a third change, so we start testing all of these changes one after another.

When Zuul starts the jobs for the first new change, it checks out the state of the repo, applies that change, and starts running jobs. When it starts jobs for the second one, it checks out the current state of the repo, applies the first change, then applies the second change, and then starts running jobs, and so forth for the third change. So this third change down here includes the two changes ahead of it in its tests as well.

Now let's say the second change starts failing. Since its jobs are failing, Zuul knows at that point that the change isn't going to merge. What do we do now? Well, the change at the head of the queue stays the same; it's fine. The second change, the one
that's failing now: we keep running its jobs and let them finish, in case any other jobs fail, so we can give as much information back to the user as possible. There's also a possibility that the change ahead might still fail, and if it does we might need to rearrange things again. But for the moment, we pull the second change out of the queue and restart the jobs for the third change, so the change at the bottom of the screen now runs its jobs with the first change and the third change, but not the second one. This is how Zuul maintains its queue of changes to be merged and runs the test jobs for all of them in parallel.

What this lets you do, once you combine it with cross-project gating, is construct a situation like this. Say you make changes to two different libraries that your system uses. Then you make a change to your front end, the actual web service that depends on those libraries. By telling Zuul that your front-end change depends on the library changes, your front end will be built with the new libraries. Then you make a change to your infrastructure to actually deploy this front end, saying "we're going to deploy this new version," and that change depends on the front-end change, which depends on the library changes. Rather than having to merge the library changes, rebuild your front end, merge your front-end change, and then redeploy your infrastructure, with Zuul you can have all of these changes tested end to end in a simulated production environment before any of them merge. This is huge for developer productivity, and huge for folks running infrastructure as well, because it means
that you can avoid the situation where you merge something and then have to revert it, by essentially testing your complete application from end to end.

We've used this a lot in the OpenStack project. Let's say we're making changes to a client API, and we want to show that those changes do the job they're intended to do. There's a commit message that says "this adds a new API that does something." Well, with Zuul we can prove it: not only are we going to add this new API, we're going to use it in a consuming application and have a Zuul job that shows that process working from end to end. That way, if the API is a little wrong (it looked right at first, but when we tried to use it, something didn't quite work), you can go back and revise the API before it's even committed to the repository.

I'm going to skip over this slide; it basically shows the same thing again, how the changes in the queue can come from different projects. Why don't I stop here for a second to give folks an opportunity to jump in and ask questions? I've talked a bit about where Zuul came from and about project gating, so are there any questions about project gating, or does that make sense?

Yes, please, go ahead.

Great. So my team uses Vercel for CI/CD.

Sorry, which project?

My team uses Vercel. It's a tool for CI/CD of front-end apps; a few of my team members use it in a few projects. Let me name it for you; I've just sent it in the chat. They use this tool for CI/CD. Is it similar to or different from how Zuul works?

I am
not familiar with Vercel. Can you give me a quick summary?

Maybe it's the same; it's a general, typical CI/CD tool.

Yeah, so, without knowing much about Vercel (and if anybody else knows the answer to this question, please jump in), I would say that in general the key feature that distinguishes Zuul from most other CI/CD systems is cross-project dependencies. A lot of CI/CD systems are focused on helping developers with a single project, so once more Git repositories get involved, they tend to fall down a little; it's either hard or impossible to express dependencies between different projects. More and more CI systems are doing something close to gating, though most of them aren't as sophisticated about it as Zuul is. If you go back to my past, present, and future idea, most CI systems are moving from testing the present to testing the future, again not quite as much as Zuul does, but pre-merge testing is certainly the norm now. Some systems are developing a fire-and-forget gating system, where you approve a bunch of changes, they go into a queue, and eventually they get merged if they pass their tests. Zuul is different from those in that it does that testing in parallel, in a very efficient way. So I think parallel gating and cross-project dependencies are the things that really set Zuul apart from most other systems.

Okay, I'll jump back to my slides now. Once you set up the things that Zuul provides, cross-project dependencies, gating, and so on, there are some emergent behaviors.
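The parallel gating described earlier in the talk can be illustrated with a small simulation. This is a minimal sketch, not Zuul's implementation: the `passes` callback is a hypothetical stand-in for running a change's jobs against a speculative state, and the rounds are evaluated one after another, whereas Zuul runs the jobs for every change in the queue concurrently and reacts as results arrive. Each change is tested on top of everything already merged plus every change ahead of it. When a change fails, the changes ahead of it merge, the failed change is removed, and the changes behind it are retested without it.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical test runner: given the ordered changes applied on top of the
# branch tip, return True if the change at the end of that sequence passes.
TestFn = Callable[[Tuple[str, ...]], bool]


def run_gate(queue: Sequence[str], passes: TestFn) -> Tuple[List[str], List[str]]:
    """Simulate a speculative gate queue; return (merged, rejected) changes."""
    merged: List[str] = []
    rejected: List[str] = []
    pending = list(queue)
    while pending:
        # Speculative state for each change: what has already merged, every
        # change still ahead of it in the queue, and the change itself.
        states = [tuple(merged + pending[: i + 1]) for i in range(len(pending))]
        # Zuul would run these jobs in parallel; here we simply evaluate them.
        results = [passes(state) for state in states]
        first_failure = next((i for i, ok in enumerate(results) if not ok), None)
        if first_failure is None:
            merged.extend(pending)
            break
        # Changes ahead of the failure were tested without it, so they merge.
        merged.extend(pending[:first_failure])
        rejected.append(pending[first_failure])
        # Changes behind the failure were tested with it included, so they
        # restart on a new speculative state that leaves it out.
        pending = pending[first_failure + 1 :]
    return merged, rejected


if __name__ == "__main__":
    # Change "B" is broken: any speculative state containing it fails.
    merged, rejected = run_gate(["A", "B", "C"], lambda state: "B" not in state)
    print(merged, rejected)  # ['A', 'C'] ['B']
```

In the example, "C" also fails in the first round because its speculative state included "B"; once "B" is removed, "C" is retested on top of "A" alone and merges.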
And one of them is that I've talked to a lot of people who have switched their entire infrastructure, or their entire development organization, from multiple repos to a monorepo, and they do that because of the CI system they're using. If they're using a CI system that doesn't understand cross-project dependencies, and they have these kinds of tight integrations, people often say the easiest way to deal with it is to just have a monorepo; that way you can test changes to your front end and your back end at the same time, since they're both in the same repo. Having a monorepo can be a good choice in some situations; other times it brings a whole lot of other problems with it. I think whether or not you use a monorepo should not be decided by your CI system; that seems like the tail wagging the dog to me. It should be decided by the level of integration between the actual components of your system.

So with Zuul you don't need a monorepo. Zuul will work just fine with a monorepo, and there are Zuul users that use it with one, but they've elected to do that because it makes sense for their project, not because of a weakness in the CI system. If your system involves multiple Git repositories that you need to test together, you don't need a monorepo to do that with Zuul: you can keep these components independent and combine them when you run the job. Of course there are a lot of different ways of combining them. They might be combined through library relationships, where project A uses project B as a library that's incorporated at build time, or over the network, where the two components talk to each other over TCP and you're following the microservices paradigm. As
an example, OpenStack uses both of those: some OpenStack components are built from other OpenStack components as libraries, and different OpenStack components also talk to each other over web services, so in OpenStack's own use of Zuul we see both combination methods. A third one, which I hesitate to mention because it opens a can of worms, is submodules. There are ways to use submodules with Zuul, and Zuul gives you more options than any other CI system for using them, but any time you start using submodules it gets complicated: they're difficult for developers to deal with and hard to get right in testing. But it is a possibility.

Zuul has two ways of dealing with dependencies. Traditionally we only had linear dependencies: a change in project A might depend on a change in project B, and in that case you'd have to merge the change in project B before you could merge the change in project A. We originally wrote Zuul that way for OpenStack because we intended OpenStack to be continuously deployed: you might upgrade Nova to the next commit, and then upgrade Glance to the next commit after that. We wanted to make sure there was always an order in which you could upgrade OpenStack components and the system would continue to work, so the original dependency implementation in Zuul was written for those linear dependencies.

Since Zuul has come to be used by projects other than OpenStack, the idea of circular dependencies has become more important. That's because, as an example, folks who are manufacturing hardware don't have the same continuous-deployment restrictions that a web-services-based software system has. Consider updating the firmware of multiple components in, say, a vehicle.
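The difference between linear and circular dependencies shows up clearly if changes are treated as a graph. The sketch below assumes a hypothetical mapping from each change to the changes it depends on (it is not Zuul's own data model) and uses Tarjan's strongly-connected-components algorithm to group changes into units that would have to be tested and merged together. With purely linear dependencies, every unit holds a single change and the units come out in the "merge B before A" order described above; a cycle produces one multi-change unit, which has no valid one-at-a-time order.

```python
from typing import Dict, List, Set


def merge_units(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group changes into units that must merge together, dependencies first.

    depends_on maps a (hypothetical) change name to the changes it depends on.
    Tarjan's algorithm emits a component only after every component it can
    reach, so each unit appears after all of the units it depends on.
    """
    index: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    units: List[List[str]] = []
    counter = 0

    def visit(change: str) -> None:
        nonlocal counter
        index[change] = lowlink[change] = counter
        counter += 1
        stack.append(change)
        on_stack.add(change)
        for dep in sorted(depends_on.get(change, set())):
            if dep not in index:
                visit(dep)
                lowlink[change] = min(lowlink[change], lowlink[dep])
            elif dep in on_stack:
                # dep is part of the cycle currently being explored.
                lowlink[change] = min(lowlink[change], index[dep])
        if lowlink[change] == index[change]:
            # change is the root of a component: pop every member of it.
            unit: List[str] = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                unit.append(member)
                if member == change:
                    break
            units.append(sorted(unit))

    all_changes = set(depends_on) | {d for deps in depends_on.values() for d in deps}
    for change in sorted(all_changes):
        if change not in index:
            visit(change)
    return units


if __name__ == "__main__":
    linear = {"frontend": {"lib-a"}, "deploy": {"frontend"}}
    print(merge_units(linear))  # [['lib-a'], ['frontend'], ['deploy']]
    firmware = {"ecu-a": {"ecu-b"}, "ecu-b": {"ecu-a"}, "gateway": {"ecu-a"}}
    print(merge_units(firmware))  # [['ecu-a', 'ecu-b'], ['gateway']]
```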
It turns out you can update the firmware on different components at the same time, so you're never in a situation where the firmware on component A has to be upgraded first, then somebody drives the car, then component B gets upgraded, and then somebody drives the car again for a while. Usually the upgrade cycle is that you upgrade components A and B at the same time. So we've added support for circular dependencies: we can test changes to different projects together at the same time, and Zuul will then merge either all of the changes in the set or none of them. This has been an area of a lot of recent work. In addition, we've made things a lot more efficient: if multiple changes in a dependency cycle run the same jobs, we deduplicate those jobs.

This also ties neatly into a feature that is unique to Gerrit, which also understands the idea of simultaneous dependencies. If you're using Zuul with Gerrit, not only will Zuul test all of these changes together and try to merge them all at once, but Gerrit will also internally enforce that atomicity: it will merge either all of those changes at once or none of them. It's neat to see that support in both the CI system and the code review system, working together.

So where does Zuul do its work? It's agnostic about that as well: it's not tied to any one cloud provider or any one way of running work. You can run Zuul jobs on virtual machines, and we support OpenStack, AWS, Google Cloud, Azure, and IBM Cloud; you can run your workloads on Kubernetes or OpenShift; or you can run them on static nodes. Those could be just pieces of real hardware that you SSH into; maybe
those are proxies used for testing other kinds of real hardware, and maybe you hook them up to actual devices that you burn firmware onto, things like that. All of that is possible with Zuul, and Zuul doesn't really care where you run those things; this is all fairly pluggable, and we support a bunch of different environments.

I haven't talked about this much yet, and I'll talk about it a little more, but we run a Zuul that is used by the OpenStack project and by the Zuul project itself, and any other Open Infrastructure projects are welcome to use it as well. We run it in what we call the OpenDev Collaboratory, which is basically a volunteer-run organization that runs Gerrit, Zuul, and a bunch of other things to help Open Infrastructure projects with their development. Because it came out of the OpenStack environment, OpenStack is the only cloud environment we use inside OpenDev, but we use something like six different OpenStack clouds. So we treat this as a multi-cloud application: if, for example, one of our cloud providers goes down, that's fine; we'll just use nodes from a different cloud provider. I work with companies using Zuul internally that, for instance, run both AWS and Azure, so if there's a problem with AWS they fall back on nodes from Azure. Or maybe they can get a certain amount of cheap resources from one cloud provider or region, and if those aren't available they use more expensive resources from a different one. All of these things are possible.

Zuul supports multi-node jobs natively. This is another feature that came out of needing to solve OpenStack's problems, because when you deploy OpenStack, you're deploying it on different
nodes: there might be a compute node and a controller node, things like that. Or, in a Kubernetes environment, maybe you need to deploy Kubernetes on three different nodes. We designed Zuul to support these multi-node jobs natively, so you can say "when you run this job, you're going to need two different virtual machines: one of them is the cloud controller, and the other one is just a compute node," and we'll test OpenStack on both of them.

When we were designing the current version of Zuul, we said we're going to need a way to say "run this task on the compute node and then run this task on the controller node." Instead of inventing a new way of doing that, we said, "there's already a system out there that lets you designate certain tasks to run on certain nodes: it's called Ansible." So Zuul jobs are written as Ansible playbooks. A Zuul job's playbook might look like what you see on the screen here: on the controller node, run a bunch of tasks to set up the controller; then, on the compute node, run a bunch of different tasks to set up the compute node.

I'm going to walk you through a quick example. This is a real-world example of how we use Zuul in the OpenDev Collaboratory, and it's a little different: I've talked about using Zuul for software development, but this is using Zuul for infrastructure deployment. In the OpenDev Collaboratory we are all in on GitOps and infrastructure as code, so all of our deployment is driven by Ansible playbooks in Git repos. If we want to deploy a Grafana server, we have a Zuul job that actually deploys the server with Ansible playbooks. We have two versions of this job.
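Multi-node jobs rest on Ansible's inventory: every node in the job gets a name, and each play picks the hosts or groups it runs on. As a rough sketch of the kind of data Ansible is handed for the controller/compute example, here is a Python function that assembles an inventory in Ansible's YAML/JSON inventory layout (`all`, `hosts`, `children`). The node names, group name, and documentation-range IP addresses are hypothetical, and this is not the code Zuul uses to write its own inventory.

```python
import json
from typing import Dict, List


def build_inventory(nodes: Dict[str, str], groups: Dict[str, List[str]]) -> dict:
    """Build an Ansible-style inventory.

    nodes maps a node name to its address; groups maps a group name to the
    node names that belong to it (both are hypothetical inputs).
    """
    inventory = {
        "all": {
            "hosts": {name: {"ansible_host": addr} for name, addr in nodes.items()},
            "children": {},
        }
    }
    for group, members in groups.items():
        unknown = [m for m in members if m not in nodes]
        if unknown:
            raise ValueError(f"group {group!r} references unknown nodes: {unknown}")
        # Host entries under a group carry no extra variables here.
        inventory["all"]["children"][group] = {"hosts": {m: None for m in members}}
    return inventory


if __name__ == "__main__":
    inv = build_inventory(
        nodes={"controller": "203.0.113.10", "compute": "203.0.113.11"},
        groups={"openstack": ["controller", "compute"]},
    )
    # A play with "hosts: controller" would target only the controller node,
    # while "hosts: openstack" would target both nodes.
    print(json.dumps(inv, indent=2))
```

Keeping node-to-task assignment in the inventory and playbooks, rather than in the CI system, is the design choice the talk describes: the same playbook can then run against different sets of nodes.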
We have one that does it for real on our actual production server, and then we have a version that runs the same playbooks but in an ephemeral test environment. This gets us to the point where, if somebody wants to upgrade Grafana, they can propose a change that bumps the Grafana version, and Zuul will run its playbook: it'll spin up a test node from an OpenStack cloud, run the deployment playbook on that test node, and we'll actually see the system running before we merge the change to deploy it.

This is basically everything we need to deploy a Grafana service. We have a base playbook that we run on all of our nodes, for things like setting up firewall rules. We have a Grafana playbook that does the actual Grafana deployment. We have an inventory file, which in production is our real inventory file; in the test job, it's an inventory file that contains just the ephemeral nodes that Zuul retrieved for us. We have a bastion host, we have our service host, and then we have the Ansible configuration.

This is what a Zuul job configuration looks like; it's all in YAML. In the actual Git repository it would look like this but be a little longer. We're saying that to run this Grafana deployment we have a job whose name is system-config-run-grafana. Before we run this job, a different job has to build the container image, so that it's in place. There are two Git repos involved: the system-config repo, which is where our deployment playbooks and things like that live, and a project-config repository that actually has our Grafana dashboard definitions, because we drive those out of Git as well. To do a full deployment, we need both the definition of how to deploy the server and how to set up
the dashboards in it. So this tells Zuul that both of those projects are involved and that it needs to check out both of them when it runs this job. For the test deployment, we request a bastion node as well as our actual Grafana node, so we get two nodes from Nodepool, and Zuul uses both of them when it runs the job. Finally, we run a series of playbooks. We run the Let's Encrypt playbook, which we run on every service where we have SSL certs, because we get all of our certs from Let's Encrypt, and then we run the Grafana deployment playbook. Since this is the test job, we run one more playbook after the production deployment steps, which runs some functional tests on the system. We use Testinfra for that: it runs tests that verify that the service is up and returns appropriate data, and, this is a neat thing, it also takes some screenshots, so that developers reviewing changes can look at the Grafana dashboard and say "oh yeah, that looks right."

So the test run sets up our hosts, sets up our test mirrors, and things like that. We translate our Zuul inventory into an Ansible inventory for the nested Ansible that we run, because we're doing an Ansible deployment. Then we run the series of playbooks I just discussed. The test playbook checks that the ports are listening, does some HTTPS queries, and captures screenshots and logs.

So let's say somebody has uploaded a change to do this. This is what the result looks like in Zuul; if you go to Zuul's web interface you'll see this job, and these are actual screenshots, because this is a real job. So this is the summary screen
showing you that the job ran and was successful. If we go down and look at the task summary, we can see that we ran 226 tasks on our fake bastion server and 117 tasks on our Grafana server. Over here on the console tab you can get an idea of what's going on at an individual task level. At the top of the screen you're seeing a bunch of playbooks that Zuul ran, and one of those is the run-base playbook that I mentioned. If you expand that, you can see the individual tasks that were run; for example, on our bastion host we install Ansible, so these are the individual tasks that install Ansible on the bastion host, and you can drill down even further if you want to.

At the end of the job we collect a bunch of logs: logs from the bastion host and logs from the Grafana server. We run Grafana in containers using Docker Compose, so we've got all the Docker logs in there, a copy of syslog, and things like that. So if anything goes wrong with this deployment, we've got all of the logs right there to take a look at. We collect a bunch of artifacts as well: we generate an ARA report, we have the results from running Testinfra, and then the screenshots. Here's the ARA report, which is the report from the nested Ansible run; here are the Testinfra results; and here's a screenshot, where the job basically used Selenium to run a browser against our test Grafana system and take a screenshot of it.

And since we did all of that work to have this ephemeral test job, we can build our actual production job based on it, and it's quite thin. It's a lot shorter; again, if you looked at it in the repo it would be a little longer than this, but all you really need to do is just change the
playbook that you run. This runs the same playbook that our test job did; it just runs it against the actual inventory instead of the fake inventory, and it doesn't run the validation steps at the end.

One last word about Zuul's architecture. I haven't talked very much about it, and I don't think we need to; it's just important to know that Zuul is a highly scalable system. There are no single points of failure in Zuul, and you can scale out any of its components horizontally depending on your needs and the load you're handling. OpenDev runs two schedulers, two web servers, 12 executors (executors are the component responsible for running jobs), and six mergers, which are responsible for just doing git merge operations. That's a pretty good-sized system. It's not the largest Zuul system out there, but it's a decent size.

We also upgrade Zuul continuously, or at least almost continuously. Right now in OpenDev we upgrade Zuul to the latest commit on master once a week, no matter what that is. We're so confident in our testing that we don't really schedule upgrades; they just happen on Friday nights. We'd actually like to make that a little more frequent. The only problem is that doing a rolling restart of the executors takes something like 24 to 48 hours if the system is busy, and we're not sure we want to commit to always being in the middle of an upgrade. So right now a weekly automated upgrade is our happy medium, but generally speaking we can upgrade continuously, and we do it without any downtime because all of these services are redundant.

If you'd like to learn more about Zuul, that's the link to our project page, and if you'd like to learn more about Acme Gating, there's my web address. On the Zuul project page
there are links to our mailing list, our Matrix chat, the source code of course, the documentation, all of that. So that's what I have here for slides. I'm going to stop and see if there are any questions.

Yeah, I had a question. I'm not sure if it was covered, because I kind of flipped back and forth while you were going over the actual screenshots. Is there an actual visual feedback chain, a workflow chain, like you get in a lot of tools? Argo CD, I think, has it: it shows a visual representation of where things are sitting, and you can click on those icons and get basically a little pop-up where you can do stuff. Could you talk about that?

Yeah. Let me share the screen again. Actually, let me do this: I'm going to share this tab. We do have some tools to help developers visualize what's going on with Zuul. There isn't an exact analog to what you'd see in Argo CD, partly because the constructs are a little bit different in Zuul. The way that you build Zuul pipelines is a little bit different; you can do everything in Zuul that you can do in Argo CD, you just do it differently, and so the visualizations are a little bit different too.

So I'm going to show... I have no idea what this is going to look like, but okay, it looks like we're fairly busy. This is Zuul's own Zuul, but I'm actually showing the OpenStack tenant because it tends to be a little busier. So this is what's happening in the OpenStack project right now. If you think back to my slide, remember I had some circles that were connected to each other; that corresponds directly to what you see here on the screen. So right now,
let's see, weirdly there's nothing going on with the OpenStack project itself; it's all the ancillary projects right now. But there's a queue for OpenStack-Ansible, and there's another queue for a different part of OpenStack-Ansible. These are different, independent queues for groups of projects, and these are the changes that are lined up to be merged. If we look at this set of three changes right here, these are three changes lined up to be merged in the OpenStack-Ansible os_neutron project, and within each of those changes we're running a bunch of Zuul jobs: a whole bunch of jobs just for that first change, then a whole bunch of jobs for the second change, and so on. So that's how you can visualize the different changes that are in flight and their relationship to each other.

You don't see this in Argo CD because it's usually not focused on more than one change at a time, and that goes back to one of the differences between other CI/CD systems and Zuul: a lot of systems don't have this visualization because they don't have the idea of multiple changes lined up in different queues for different projects and depending on each other. So this is how you visualize that in Zuul. In Argo CD you're going to be more focused on how you build and deploy a particular component of the system and how it relates to other components. The mapping isn't one to one, but in Zuul that's a little more like the relationship between jobs for a single change. This isn't a great example, since I selected it at random and it's mostly just testing one thing on a bunch
of different platforms. But you could imagine... do we actually have a... I was hoping we might have one. Oh, right. Going back to OpenDev and my example from earlier, this is an actual production deployment job. It's a job that updates Zuul itself, as it turns out, as well as a few other things; our eavesdrop service, for some random reason, is also connected there. So this is updating several different production systems at once. We need to bootstrap our bastion server and update anything on it, and then we update Zuul, our container registry, Nodepool, and the eavesdrop service. So inside a single queue item, this is a bunch of different components that have relationships to each other, and the way we deal with that in Zuul is that we construct jobs for each of those components and then express the relationships between those jobs.

So if I go to, let's see if I can find it here, the system-config project, and ask what happens when somebody proposes a change to the master branch of the system-config project, this is the complete set of jobs that might run. It's not actually the largest graph of this type that I've seen, but it's one of the larger ones.

And this graph is basically what I was asking about, but it's so complex I don't think it would be used in the same way. The specific graph I was picturing (I hadn't touched Argo CD in a couple of months) is the stage graph: stage one is complete, and so on, whatever you name the stages, so if you have, you know, QA
testing or whatever you have, it's listed there in that stage, it moves through the stages, and you can click on it to see what happened. So it looks similar to this, but it's not as complex, because there's not... but go ahead.

Yep. So definitely, reading this requires knowledge of the system that you're working with. But at a high level, these are a bunch of jobs that are completely independent; they don't rely on anything else in the system, so they can just run on their own. These jobs over here on the right all have some kind of relationship with each other. Take this one on the far right: it's system-config-run-review, which basically means run our Gerrit server. In order to run our Gerrit server we might need to build a Gerrit image; in order to build a Gerrit image we might need a Gerrit base image; and in order to build our Gerrit base image we're definitely going to need our container registry. So it's not exactly the same thing that you see in Argo, but it's similar in spirit in that...

Yes, that last part you just said, that's like what we usually see in CI/CD on steroids, because you're literally going back to look for an image. You're not just checking whether there's an image in a registry; you have it in your infrastructure as code, or whatever, to check whether you even have the image or need to go build it.

Exactly, yeah. And that's actually why this is a dashed line: maybe we don't need to build that image, and Zuul is configured in such a way that it might decide it doesn't need to build it, in which case it can skip that dependency. And really the reason why this doesn't
look exactly like Argo CD is that, like I said, you can accomplish the same things, but the way you accomplish them is different enough, and the way you think about the problem is different enough, that it's not going to look the same, because you don't express it the same way. You can't write a script that translates Argo CD configuration syntax to Zuul syntax, because you have to rethink the problem in Zuul's way of thinking about it versus Argo's way of thinking about it.

Thank you.

Okay, so if nobody else has questions, Johannes, please, the stage is yours.

Thank you. The same goes for me: if there's a question, it's easiest if you just interrupt me any time. Now I'll try to see if I can get this to work. Can you see this? Yes, great.

So the obvious reason why we have chosen Zuul is that we don't want to merge broken code, and James has gone through in quite some detail how that works. I have a timeline here that shows some of the background, the history of when we started to use it. Time flies: it's nine years ago that we started on a small scale, and we started using Zuul version 2 in 2018. That version used Jenkins as its job execution manager. In 2019 we deployed Zuul version 3, which uses Ansible in the way that James has described, and I'll come back a bit later to why we made this choice, because people were used to Jenkins and Ansible is hard for some people. In 2021 we won something called the Tech Awards inside Volvo Cars, and this was really nice because developers voted for it. It was the first time you could vote, and developers showed their appreciation and voted, so we won that year. Since then we have scaled up, and we are continuously scaling up our Zuul
deployments. This slide is quite recent. It shows the same web GUI that James showed, with our current tenants. As I think is very common, we have a tenant called VCC, the Volvo Cars tenant, and this is the largest. We try to separate projects into different tenants, because then they get better performance, but sometimes that's not possible. Here we can see that we run all the electric car central components and the ECUs of the car, and they have their own tenants. Then we have the VCC tenant, which is basically our ADA computer, and that's the largest one. Then we have special tenants for logical simulations and a tenant for Zuul itself, and I think we have an old legacy tenant here that we did just recently.

I'll start with some statistics. I don't know if that's interesting or not, but these are recent figures. The top graph shows Gerrit patches. We do have GitLab connected too, but the majority of the software changes proposed to Zuul come from Gerrit, and Gerrit is also what we've selected to focus on in the future. It's not the biggest Zuul deployment I've heard of or seen on YouTube, but it's starting to get a bit serious.

Here we see the Zuul pipelines, what James showed in the Zuul web UI. In the VCC tenant we have check jobs: when you push a change to Gerrit, we automatically run the check pipeline, and that's the large majority of pipeline runs each day. As for the number of jobs, I think last year we peaked at about 10,000, and now I think we have an average of 15,000, so it's growing rapidly, and it's growing all the time. When I looked at this at the beginning of the year, I didn't have time to dig into it, but then we had to go through our backend and say, oh, we're growing, we have to make sure we can handle it. So it's more and more, but
as we also heard before, Zuul is scalable. We started much smaller, we have been growing all the time, and we think we can keep growing, as these figures also show. Of course, we need to make sure that we don't have bottlenecks in the infrastructure backend, and that's something we talk with James about: when we reach a certain scale, we look at the different components and the hardware behind them, just to make sure they can handle the increased load.

This is from the Grafana dashboards that you get out of Zuul out of the box, and here we can see a typical day. I think the last time I presented this we were at around four or five hundred nodes in parallel. We mostly use Amazon nodes at the moment, and now we almost hit a thousand dynamic EC2 nodes. I think we have an Azure account here as well, so not so much there, but we do use both Azure and AWS providers. And of course some bare metal nodes: we have a few of these rigs connected and a few odd bare metal computers, but I think they're added on top here, so there are very few of them that we run.

We have a central team, two teams actually, with 17 people in total. Then we have three people in propulsion who work with their own tenants; they're self-governed, they mind their own business, and they basically just talk to us when there are issues. We also have a small Rust-based tenant, and they're also minding their own business. We have a support contract with James through Acme Gating, and this helps us a lot. Having the founder of the open source project as support to help us out, especially when we're stressed and trying to solve operations or other issues, is great.

I've added a few other companies here. One is our fully owned Volvo Cars company Zenseact; they do the sensor fusion parts
of the ADAS, and they also use Zuul, with an OpenShift deployment. Then we have Aurobay, which is a rather new company that handles our legacy combustion engines, petrol and diesel, which we have branched off; we have a plan at Volvo Cars to go fully electric quite soon. They also have their own deployment. So there are several automotive companies using Zuul.

We like to participate as much as we can in the community, in the live chats and so on, and we like the openness and the collaboration of both OpenInfra and the Zuul project itself. It's very nice to see that anyone from the public can come with questions and get excellent support; everyone gets first-class support. It's really nice to read the live chat logs and see the mentality of the community and how it participates. For us this is really important, and it's something we're trying to increase, because it fosters a very good developer environment. Our developers are free to contribute to the project itself if they like, and we have seen that when they do, it's very good for their learning and their development as developers, so we actively encourage this. We also really appreciate the truly open source paradigm, where you use an open source ecosystem, and how that ecosystem is managed. Personally, I find it very interesting to see how this is operated and how an open source ecosystem can be used for something as complex as OpenStack and Zuul together.

Our own setup is six schedulers, 20 executors, and 10 web pods at the moment, and we are working on autoscaling. So far we've just panic-scaled, I could say: when we needed to, we just scaled it up
and let it be like that, so now we can handle the load. But we are currently working on setting up autoscaling. The backend of Zuul runs in an EKS cluster, so I think it's going to work to have it autoscale based on certain load parameters. But this is the size we run today.

Here I have some material that touches on what James has said. I'll start with a few text-based points. We can make changes to a set of repositories and merge them monotonically, and we can test each change with the current state of all the repositories before we merge it. This is really a crucial feature, and I'll show it graphically in some slides. Then we have the complex situation with co-gating, where we can make sure that we run all relevant tests for a complex set of changes distributed across repositories. For us, if we add a computer to our car and it's an important node with a lot of software developed for it, the first step is just to establish a connection between Zuul and its repository. Once we have that, we try to group it together with other nodes of the car, which means other Gerrit projects. That means a developer can write "Depends-On" in the Gerrit commit message and reference another Gerrit change in another node's repo. In a way, what this does is emulate a monorepo in a CI system. Even though the code can live in different Gerrit instances, developers can push changes that depend on each other in different code bases that have a clear interaction: they push changes to both, just say in the commit message "I depend on this change", and Zuul takes them through the check and gate pipelines. The art director in me tried to make a picture of this to explain how it works, and here I
have edited out a little information, but basically we can see that we have changes that can depend on other changes. If you look at the car here, this is one architecture used in the industry, the zonal architecture, where functionality is distributed among these compute nodes. This is the reason we chose Zuul version 3 in particular: it has the Depends-On functionality. You can have changes that are on different computers in the car, possibly in different repos, but you still need to develop functionality that holds together, and with Zuul you can do this very easily: you just write a message and there you go. The car of today is a complex data center with a lot of large computers and a lot of communication between them, and you can have thousands of developers. We saw that our product is complex, and when we looked around at OpenStack, which was public, we could look at their pipelines and how they work. They have an even more complex ecosystem, where they build things on top of each other out of different Lego pieces. We saw that for these data centers on wheels we really need gating to ensure our speed, and we need this Depends-On feature. That's basically the reason we chose Zuul, and it's a very powerful feature.

Let me see, I have a picture of this here. For this to work, and this is very interesting I think, because we could look at OpenStack in the open and see how they do it, you need the complex gating mechanism, but you also need another thing, and that's the tests. So we see
that we need to have, ideally, cloud-based tests running for these nodes. Sometimes it's possible to abstract the hardware away. We recently started using Graviton ARM-based nodes, and on those we can run the same binaries as we have in the car. That's extremely powerful, because then we can run virtualized tests that translate very well to the real computers of the car. This is the other component that's really needed: if you have good tests for all these different nodes, then you really can handle gating of a complex system and solve dependencies within it. Here we see an enlargement of some of the nodes of the car, and I put some cloud computers on top of them.

This is a very detailed picture with a lot of information in it. It shows our current state, what we have today. Basically it's a simplified picture of the flow of different kinds of commits and how they translate to the check, gate, and release/deploy pipelines of Zuul. At the moment we also have a manifest for our car in GitLab, and that's why it says merge request check and gate; those also run in Zuul and have their own pipelines. We are currently working on changing this, but at the moment it looks like this.

Here I've put in some kinds of jobs, to give an idea of what developers have in their pipelines. We have some Python jobs, but we also have a lot of GoogleTest tests. We use clang-tidy and Valgrind, we use emulators to run our code, and we use some proprietary linter tools. And here we see the manifests that were in the purple picture. Here we run other kinds of tests, full-scale tests of the system, because when we merge a
manifest, the car manifest, it means that it's actually available to be sent out to real cars. So here we have other, more complex kinds of tests, and at times we also have hardware-in-the-loop systems that run tests on the real target computers, but I won't go into details on that. Oh, that was the last slide. I lost track of time, but I think this was quite brief, right? I hope.

Actually, I have a question. Do you use Zuul to update vehicles in real time while they're driving?

No, no, we don't do that. We don't use Zuul as a CD system in that way, and that has a lot to do with security. What we have is some functionalities, let's take drivability development, which we've had in Zuul for years. Let's say you want to work with the software that handles how the car behaves at certain speeds: when you push the pedal, how the torque is sent out to the inverters, which gives the drivability feeling of the car. You have to validate that in a real car. Unfortunately we don't have high enough fidelity models; we do in some parts, but not for drivability. So that needs to be tested in a car, preferably on a test track. We have a really big test track outside of Gothenburg, and we have specialized developers: you need to go to a course and learn the safety procedures and everything, and these cars have red buttons you can slam to just kill the car. But there, people just run the check job and do a software download; they push to Gerrit, they get the binary, and they flash it into the car. So it's not like Zuul... it's
not like the car is a bare metal node that something gets downloaded to automatically; it's not like that. Instead, they get the artifact from the artifact page in the Zuul web UI and use that. I think we have a few teams that do a lot of in-vehicle testing of their systems, and they have specialized pipelines where they get exactly what they need for test-track testing: the images and a few other things. They have stripped-down pipelines that don't have everything in them, just what they need: the documentation, the diffs, the logs, and all the binaries they need for the nodes they're testing. We're not at the point where we do canary deployments, like picking a customer and pushing it out to them. There are also legal aspects to this, and it's very complex. So I hope that's an answer.

Yeah, thank you.

Okay, does anybody have other questions? Okay, so I think we can wrap up this meetup. James and Johannes, thanks a great deal. I'm really excited about this first meetup. I'll write my email down here, and everybody can really email me about anything. I think it's super exciting and interesting. Thanks, everybody.

Thank you, thank you guys. I was really happy; I didn't realize it was going to be James, and also the gentleman from Volvo talking about the real world, from a super-duper company. I'm pretty excited I was able to catch this.

Great, you're welcome. Personally, and I think James agrees, if somebody wants to talk about Zuul, I just turn up. I've been to some really odd, small events; I just like talking about Zuul with anyone, and I don't care if it's a huge crowd or some
conference center or a small meetup. I really enjoy talking about this because I think it's great. I'm always happy to, and always trying to make time for these things.

Yeah, that's it exactly. Great, great. So I can stop the recording.
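James mentioned that OpenDev's Grafana test job translates the Zuul inventory into an Ansible inventory for the nested Ansible run. The sketch below shows one way such a translation could look. It is not OpenDev's code: it assumes the input mirrors the host layout of the inventory Zuul writes for a job (all, then hosts, then one entry per node with an `ansible_host` variable), and the node and group names are hypothetical.

```python
"""Sketch: turn a Zuul-style inventory into an INI inventory for a nested Ansible run.

Assumptions (not OpenDev's actual code): the input dict mirrors the host layout
of the inventory Zuul writes for a job (all -> hosts -> <node> -> ansible_host),
and the group and node names used below are hypothetical.
"""
from typing import Dict, List


def to_nested_inventory(zuul_inventory: dict, groups: Dict[str, List[str]]) -> str:
    """Render an INI inventory that places the job's nodes into the given groups."""
    hosts = zuul_inventory["all"]["hosts"]
    lines: List[str] = []
    for group in sorted(groups):
        lines.append(f"[{group}]")
        for name in groups[group]:
            # Carry over only the address, so the nested Ansible run targets the
            # same ephemeral node that was handed to the job.
            lines.append(f"{name} ansible_host={hosts[name]['ansible_host']}")
        lines.append("")  # blank line between groups
    return "\n".join(lines)


if __name__ == "__main__":
    example = {
        "all": {
            "hosts": {
                "bridge01": {"ansible_host": "203.0.113.10"},
                "grafana01": {"ansible_host": "203.0.113.11"},
            }
        }
    }
    print(to_nested_inventory(example, {"bastion": ["bridge01"], "grafana": ["grafana01"]}))
```

Emitting only `ansible_host` keeps the sketch small; a real translation would likely also carry connection variables such as the SSH user.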
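The Grafana test playbook described in the talk uses Testinfra to check that ports are listening and to run some HTTP queries. As a minimal sketch of the same kind of smoke check, here is a standard-library-only version; it does not use Testinfra's API, and the host name, Grafana's port 3000, and the `/api/health` path mentioned below are assumptions rather than values from OpenDev's tests.

```python
"""Sketch of post-deployment smoke checks in the spirit of the Testinfra tests.

Uses only the standard library instead of Testinfra. Any concrete host, port,
or path passed to these functions is an assumption supplied by the caller.
"""
import socket
import urllib.request


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the status code of a GET request (urlopen raises on 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def smoke_test(host: str, port: int, path: str = "/") -> bool:
    """Pass if the port accepts connections and the path answers with HTTP 200."""
    if not port_is_listening(host, port):
        return False
    try:
        return http_status(f"http://{host}:{port}{path}") == 200
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False
```

For example, a hypothetical `smoke_test("grafana01.example.org", 3000, "/api/health")` returns True only if the port accepts connections and the path answers with HTTP 200.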
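James noted that a rolling restart of OpenDev's executors can take 24 to 48 hours when the system is busy. A back-of-the-envelope model suggests why, assuming executors are restarted one at a time and each stops taking new jobs and waits for its longest running job before restarting: the total is roughly the sum of those waits. The job durations in the example are made up, not OpenDev measurements.

```python
"""Back-of-the-envelope model of a graceful rolling restart of executors.

Assumption: executors are restarted one at a time, and each one stops taking
new jobs, waits for its longest running job to finish, then restarts. The
durations in the example are hypothetical.
"""
from typing import Sequence


def rolling_restart_hours(running_jobs: Sequence[Sequence[float]], restart_hours: float = 0.1) -> float:
    """Estimate wall-clock hours to restart every executor sequentially.

    running_jobs[i] lists the remaining runtimes (in hours) of jobs on executor i.
    """
    total = 0.0
    for jobs in running_jobs:
        drain = max(jobs, default=0.0)  # wait for the slowest job to finish
        total += drain + restart_hours
    return total


if __name__ == "__main__":
    # Twelve busy executors: six whose longest job has 2 hours left, six with 4 hours left.
    busy = [[0.5, 2.0]] * 6 + [[1.0, 4.0]] * 6
    print(f"{rolling_restart_hours(busy):.1f} hours")
```

With those made-up numbers the estimate comes to about 37 hours, which illustrates how a dozen executors running multi-hour jobs can land in that range.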
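The queues James showed on the Zuul status page are dependent pipelines: each change is tested on top of the changes ahead of it, and a failing change is ejected so that the changes behind it are retested without it. The toy model below captures only the final merge decisions, under the assumption that tests are deterministic. `passes` is a hypothetical stand-in for running the jobs, and real Zuul runs the queue items in parallel on speculative states rather than one after another.

```python
"""Toy model of a Zuul-style dependent (gate) pipeline queue.

Assumptions: tests are deterministic, and `passes(state)` stands in for running
every job on a speculative state (the surviving changes ahead of this one, plus
this change on top). Real Zuul tests all items in parallel and restarts the
items behind a failure; under these assumptions it reaches the same decisions.
"""
from typing import Callable, List, Sequence, Tuple


def run_gate(queue: Sequence[str], passes: Callable[[Tuple[str, ...]], bool]) -> Tuple[List[str], List[str]]:
    """Return (merged, rejected) change IDs in queue order."""
    merged: List[str] = []
    rejected: List[str] = []
    for change in queue:
        state = tuple(merged) + (change,)  # branch tip + surviving changes ahead + this change
        if passes(state):
            merged.append(change)
        else:
            rejected.append(change)  # ejected; changes behind it are tested without it
    return merged, rejected


if __name__ == "__main__":
    # Hypothetical: change C only breaks when A is merged ahead of it.
    def passes(state: Tuple[str, ...]) -> bool:
        return not (state[-1] == "C" and "A" in state)

    print(run_gate(["A", "B", "C"], passes))
```

The point of testing each change against the changes ahead of it is that what merges is exactly what was tested, even when two changes pass individually but break together.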
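The dashed line in the system-config job graph marks a dependency that can be skipped. Here is a minimal sketch of ordering jobs with hard and soft dependencies, loosely following Zuul's documented rule that a soft dependency on a job that isn't running is ignored while a hard dependency on a job that isn't running is an error. The job names are hypothetical, modeled on the Gerrit image example James walked through, and `graphlib` needs Python 3.9 or newer.

```python
"""Sketch: order a change's jobs given hard and soft job dependencies.

Loosely follows Zuul's documented semantics: a soft dependency on a job that
is not running is ignored, while a hard dependency on a job that is not
running is an error. Job names are hypothetical. Requires Python 3.9+.
"""
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set, Tuple

# job name -> list of (dependency name, is_soft)
Dependencies = Dict[str, List[Tuple[str, bool]]]


def plan_jobs(deps: Dependencies, selected: Iterable[str]) -> List[str]:
    """Return the selected jobs in an order where each job follows its dependencies."""
    selected_set: Set[str] = set(selected)
    graph: Dict[str, Set[str]] = {}
    for job in selected_set:
        graph[job] = set()
        for dep, soft in deps.get(job, []):
            if dep in selected_set:
                graph[job].add(dep)
            elif not soft:
                raise ValueError(f"{job} has a hard dependency on {dep}, which is not running")
    # static_order() yields every job after the jobs it depends on.
    return list(TopologicalSorter(graph).static_order())


DEPS: Dependencies = {
    "system-config-run-review": [("build-gerrit-image", True)],
    "build-gerrit-image": [("build-gerrit-base-image", True), ("buildset-registry", False)],
    "build-gerrit-base-image": [("buildset-registry", False)],
}

if __name__ == "__main__":
    # Only the deploy job was selected (say the image inputs didn't change),
    # so its soft dependency on the image build is simply dropped.
    print(plan_jobs(DEPS, ["system-config-run-review"]))
```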
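Johannes described developers writing Depends-On in a Gerrit commit message to tie changes in different repositories together. The sketch below shows how such footers could be pulled out of a commit message. It is a hypothetical helper rather than Zuul's own parser, it assumes the URL form of the footer, and the URLs are made up.

```python
"""Sketch: extract Depends-On footers from a commit message.

Hypothetical helper for illustration, not Zuul's own parser. Assumes the URL
form of the footer ("Depends-On: <change URL>"); example URLs are made up.
"""
import re
from typing import List

# One footer per line; [ \t]* (not \s*) so a match never spills onto the next line.
DEPENDS_ON_RE = re.compile(r"^Depends-On:[ \t]*(\S+)[ \t]*$", re.MULTILINE | re.IGNORECASE)


def depends_on(commit_message: str) -> List[str]:
    """Return the change URLs named in Depends-On footers, in order, without duplicates."""
    urls: List[str] = []
    for url in DEPENDS_ON_RE.findall(commit_message):
        if url not in urls:
            urls.append(url)
    return urls


if __name__ == "__main__":
    message = """Adjust torque request handling

The matching interface change lives in another node's repository.

Depends-On: https://gerrit.example.com/c/zone-front/+/4711
Change-Id: I0123456789abcdef0123456789abcdef01234567
"""
    print(depends_on(message))
```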
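Once a change's Depends-On relationships are known, it can only be gated behind the changes it depends on. As a minimal sketch, assuming the Depends-On data has already been collected into a dict keyed by hypothetical change IDs, the function below lists the changes to enqueue, dependencies first, skipping ones that have already merged. For simplicity it rejects dependency cycles outright rather than modeling how Zuul treats them.

```python
"""Sketch: which changes must enter the gate, and in what order, for one change.

Assumptions: Depends-On relationships are already extracted into a dict keyed
by (hypothetical) change IDs, dependencies are enqueued ahead of the changes
that need them, merged changes are skipped, and cycles are simply rejected.
"""
from typing import Dict, List, Set


def enqueue_order(change: str, depends_on: Dict[str, List[str]], merged: Set[str]) -> List[str]:
    """Return unmerged changes to enqueue, dependencies first, ending with `change`."""
    order: List[str] = []
    visiting: Set[str] = set()
    done: Set[str] = set()

    def visit(c: str) -> None:
        if c in done or c in merged:
            return
        if c in visiting:
            raise ValueError(f"dependency cycle involving {c}")
        visiting.add(c)
        for dep in depends_on.get(c, []):
            visit(dep)  # depth-first: dependencies are appended before dependents
        visiting.discard(c)
        done.add(c)
        order.append(c)

    visit(change)
    return order


if __name__ == "__main__":
    deps = {
        "infotainment:101": ["zone-front:55"],
        "zone-front:55": ["central:7", "central:3"],
        "central:7": ["central:3"],
    }
    print(enqueue_order("infotainment:101", deps, merged={"central:3"}))
```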