My name is Alessandro Fael Garcia. I'm a Senior Solutions Engineer in the NGINX community and alliances team, and I also happen to be the main slash only internal developer for two of our most popular open source projects. One of them is the Docker NGINX Unprivileged image, and the other one is the Ansible NGINX role.

A little housekeeping before I get started proper. You can find my contact details down there in the slide. If you have any questions after the session that I don't get to here, feel free to shoot me an email or get in touch via Twitter or GitHub. And just to set expectations a little bit more: I don't think any of the lessons I'm going to be covering here are necessarily going to be groundbreaking, but hopefully they confirm things that you're already doing in your pipelines nowadays, or they provide some guidance on how you could improve some of your workflows.

With that said and done, let's move to the beginning: chapter one, The Great NGINX. You probably already know what NGINX is, but just in case, and to level-set expectations for anyone who might be tuning in to the recording later down the line, let me give you a brief recap. NGINX started out as a web server in 2004. The problem it was trying to solve back when it was first created was the world-famous C10K problem. The CliffsNotes version of that problem is that no web server before NGINX could handle 10,000 concurrent connections; then NGINX came along, and it could. There's a bit more nuance behind it, obviously, but I'm not going to delve into it right now. As of April 2023, it's the most used web server in the world. And in the almost 20 years since it was first created, a bunch of new features have been added. These days, in addition to using NGINX as a web server, you can use it as a reverse proxy, an API gateway, a content cache, a CDN, a WAF, and the list goes on. You can do pretty much everything with NGINX. Well, not everything, but you get the gist.

And there's a very, very good reason for its ever-increasing popularity: performance. You could say that NGINX is the Formula One car of web servers. It has a very performant engine, hence the name NGINX. That's probably a joke; I actually don't know where the name came from. But it also has a very lightweight encasing: it's around 8 megabytes on Alpine Linux. If you're using other distributions, Debian, RHEL, whatever, it's not going to be 8 megabytes, but it's still pretty small. Combine those two things together, and you get a blazingly fast web server. Or reverse proxy, or API gateway; it depends on how you want to use it.

Moving on: chapter two, A Tale of Two GitHub Projects. Now that we've covered a little bit about NGINX, let me move on to the two projects that I'm going to be talking about today. The first one is the Docker NGINX Unprivileged image. This project is fundamentally a port of the upstream Docker NGINX repository, which you'll probably have used at some stage in your life. There are two main differences. The first one is that, obviously, it makes some changes to the Docker image to be able to run NGINX in an unprivileged environment. The second main difference is that it has its own pipeline to build and push those images.
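To make that first difference concrete, here's a minimal sketch of running the unprivileged image with Docker Compose. The image runs as the non-root nginx user and listens on the unprivileged port 8080 instead of 80; the service name and tag here are just illustrative:

```yaml
# compose.yaml: a minimal sketch of running the unprivileged image.
services:
  web:
    image: nginxinc/nginx-unprivileged:stable-alpine
    ports:
      - "8080:8080"   # the container listens on 8080; no root needed to bind
```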
The upstream Docker NGINX repo is part of the official Docker library, so all the pipelines are maintained and supported by Docker, for various reasons that I'm not going to delve into here. The unprivileged image is not part of the official library, which means all the building and pushing, all the CD pipelines, have to be run by us at NGINX.

And then there's the Ansible NGINX role. There's a bunch of other NGINX Ansible roles out there: there's one for configuring NGINX, there's one for NGINX App Protect, which is our commercial WAF offering, and there's an Ansible collection for NGINX. A bunch of Ansible stuff out there. But all of the innovations, so to speak, any of the new developments that get made to our CI/CD pipelines, are usually first made on the main NGINX role, and then they get ported across all the other roles and collections. As for Ansible, I'm sure everyone knows about it, but just in case you don't, the TL;DR is that it's a configuration management tool. It's not that dissimilar from Puppet, who are here, for example, as one of the sponsors.

And with that, let me move on to chapter three, Moby Dick, or The Docker NGINX Unprivileged Image. I'm going to be covering now some of the lessons that I've learned. They're mostly in chronological order of when they were implemented, not always, but just to give you a bit of guidance on how things are going to be presented.

We'll start with the first lesson: implement a CI/CD pipeline as soon as you can, not four years later, which is how long it took me to implement a CI/CD pipeline for the Docker NGINX Unprivileged image. For the first four years, I was building everything locally using a local script. Let's just say local scripts for CD are not great. This script took, on average, about 30 hours to run on my laptop, every time I had to do a new release or a new CVE came around and I had to rebuild everything from scratch. It's not that common for a build script to last 30 hours, but it can last even longer than that. You don't want your laptop or your local environment tied up by such a tremendous amount of effort for so long. In this case, it also took half of my laptop's RAM and basically made it unusable. And say your laptop crashes, or any of the many other issues your local environment can have crashes the script: you have to start again. Not fun times. As it turns out, it probably took me longer to actually write this bash script than it took me to implement CI/CD on GitHub in the first place. So, you know, as soon as you can, just implement your CI/CD pipelines, whether that's via GitHub Actions or whatever other tool out there that you want to use. Just make it one of your first priorities.

Second lesson: figure out when to best publish new artifacts. This is especially relevant if you're pushing something like Docker images to an open repository, but it can also apply to any manner of packages. There are some expectations, some unwritten rules, around publishing packages. One of them, for example, is that you never publish something new on a Friday, because if things break, people are going to have to stay over the weekend. That's not fun.
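If you're starting from zero, even a tiny GitHub Actions workflow beats a 30-hour local script. Here's a minimal sketch to show how little it takes to get going; the file name, branch, and image tag are my own illustration:

```yaml
# .github/workflows/ci.yml: a minimal starting point for CI.
name: CI
on:
  push:
    branches: [main]
  pull_request:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build only; pushing to a registry belongs in a separate CD workflow.
      - run: docker build -t nginx-unprivileged:ci .
```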
So it's up to you; you've got to figure out what makes the most sense for your project. For the Docker NGINX Unprivileged project, what we ended up settling on, what made the most sense, is a weekly basis: every Sunday night, or Monday morning at midnight (in UTC, which is the time zone GitHub Actions schedules use), new Docker images are built and published to Docker Hub, to the GitHub Container Registry, and to the Amazon ECR registry. And then there's also this very neat option that GitHub gives you, which is to set a flag called workflow_dispatch, which lets you manually trigger those builds down the line. For example, say it's Wednesday, someone finds a new CVE, a fix gets released for that CVE, and you want to publish that fix before next week: you can just go manually trigger that new build.

Next lesson: integrate one delivery target at a time, and once it works, move on to the next target. This is what I did when I was implementing Docker Hub, the GitHub Container Registry, and Amazon ECR. Even if you have the code done from the beginning, and you're pretty sure it works, and maybe you tested it in development and it worked, once you're publishing to production it's just way better to make sure one target works at a time. You push your builds to Docker Hub, you give it a couple of weeks, you make sure there are no fires to put out; then you implement, let's say, the GitHub Container Registry, wait another couple of weeks, make sure there are no fires to put out; and then you go on to the next target, and so on. This fundamentally helps you, if things go wrong, to address one issue at a time instead of having, I don't know, ten new issues over a week targeting three different platforms.

The next lesson is that caching can be cool, but you'd better be wary of potential pitfalls. Why do I say this? When I first found out about caching for Docker images using GitHub Actions, I thought, this is cool. My builds on GitHub Actions at the time still took four or five hours on average; let's try to make that a little bit faster. Turns out it did work quite nicely. My build times went from four or five hours to 30 minutes, 20 minutes, sometimes 10 minutes. But I did not take into account that caching, at least the way it's implemented for Docker on GitHub Actions, fundamentally does not detect package updates within your Dockerfile unless you very explicitly set it up to. Dependencies of dependencies and so on are not really flagged as being outdated. What that means is that when a CVE happened and a fix was released, sometimes rebuilding the full image would not actually force a rebuild of the cached layers, and the CVEs would not get fixed. That can be a huge pain in the ass, to put it nicely. And there's also the small issue that the GitHub Actions cache, to this day, is not quite where it should be. If you want to delete a cache through the web UI, you have to go through the entries manually, and, let's just say, it's not the best UX. If you want to use the GitHub CLI, it's a little bit better, but it's still a bit of a process.
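Putting those pieces together, here's a sketch of what such a publish workflow can look like: the triggers mirror the weekly schedule and the manual escape hatch I just described, and the caching setup uses the GitHub Actions cache backend of docker/build-push-action, with a comment flagging the staleness pitfall. The file name and tags are illustrative:

```yaml
# .github/workflows/publish.yml: weekly publish with a manual escape hatch.
name: Publish
on:
  schedule:
    - cron: "0 0 * * 1"   # every Monday at midnight UTC
  workflow_dispatch:       # lets a maintainer trigger a mid-week CVE rebuild
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v5
        with:
          push: false   # flip to true once the registry logins are wired up
          tags: nginx-unprivileged:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
          # Pitfall: cached layers won't notice upstream package updates.
          # For CVE rebuilds, disable the cache (no-cache: true) or bust it.
```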
So caching can work out in your project, but before you implement it, make sure you're taking into account all the possible things that can go wrong with the cache, and be ready to potentially have to delete certain caches and go through that process, which, again, is still not quite there yet.

And the final, very obvious lesson that I've learned over time on the Docker NGINX Unprivileged repository is to never rewrite tags. Once upon a time, I decided to rewrite a tag because I'd made a small change that I thought was going to be a huge quality-of-life improvement for everyone using the repository and the Docker image. Turns out people were not happy about it. Since then, obviously, I've learned my lesson the hard way: you just never rewrite any tags, no matter how minor the changes. It doesn't matter if it's just a Markdown update. People expect their tags, for good reason, to be immutable, and they should be immutable. So no matter how tempting it might be, just don't do it.

And with that, we can move on to chapter four, From the Earth to the Ansible Galaxy, which covers some of the lessons I've learned whilst developing the Ansible NGINX role. Ansible Galaxy, for those of you that might not know, is the Ansible open source marketplace, as the little joke goes.

All right, lesson number one: avoid pipeline vendor lock-in as much as possible. When I first started working on this project, the go-to default for Ansible workflows, both for CI and CD, was Travis. It was the tool recommended by Ansible; it even came with some built-in integrations. As we all know by now, Travis open source is no longer a thing, for better or worse, so that's obviously not a good candidate for any open source project, and I had to migrate to GitHub Actions. You could fill in any other pipeline tool out there; GitHub Actions just made the most sense to me. The migration was relatively easy for me, fundamentally because all I was doing in Travis was running a bunch of scripts, and porting scripts is easy. But you've got to be careful not to lock yourself in too much with whatever pipeline you choose. A good example, if you're using GitHub Actions, is when your entire workflow depends on externally maintained actions. If you're only using actions supported by other people and you want to migrate away some time down the line, it's going to be quite hard. Which does not mean that you should not be using external GitHub Actions maintained by other people; just make sure it's a conscious decision, and that you know what you're doing when you choose one of those over your own scripts running on whatever platform.

Lesson number two: optimize your pipeline's runtime and efficiency. This is one of the most important things, honestly, in this day and age, as far as pipelines go, especially if they're part of your automated workflows. It just makes your life so much easier. Developer productivity goes through the roof; if there's an issue with your code, you're going to be able to find it as soon as you can. There are multiple ways to do it, and there's no right or wrong way. In my case, for example, I run a bunch of Molecule tests (Molecule is Ansible's native testing suite), but I used to run them in order in the same workflow. What that meant is that running the full suite of tests would easily take about eight hours, something like that. That's when I learned about using build matrices. Travis supported them, so does GitHub Actions, and I'm seeing other tools that support them too. You can create a workflow matrix and run a bunch of tests at the same time, say ten jobs concurrently, and they will hopefully be done within an hour or two, instead of you having to wait ten hours just for a single suite of tests to run.
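As an illustration, here's roughly what that fan-out looks like in GitHub Actions. The distro list, scenario name, and the MOLECULE_DISTRO variable are placeholders of my own, not the role's actual test setup:

```yaml
# .github/workflows/molecule.yml: fan the test suite out across a matrix.
name: Molecule
on: [pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Each combination becomes its own concurrent job.
        distro: [alpine, debian, ubuntu, rockylinux]
        scenario: [default]
    steps:
      - uses: actions/checkout@v4
      - run: "pip install ansible molecule molecule-plugins[docker]"
      - run: molecule test -s ${{ matrix.scenario }}
        env:
          MOLECULE_DISTRO: ${{ matrix.distro }}
```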
Again, there's no right or wrong way to do this, but there's always some way you can optimize your runtimes.

The next lesson is: split your pipeline into distinct workflows. This is a little bit like monolithic versus microservices. When you're starting out, having everything in the same workflow file can speed up development, and it might make sense depending on your whole ecosystem and infrastructure and the complexity of your project. But as you keep developing it, as you keep adding more use cases and things that your workflows can accomplish, it's a good idea to start splitting them into unique and distinct responsibilities. For example, in my case, I have one pipeline for testing, I have one pipeline for deploying new Ansible releases to Ansible Galaxy, and then I have one pipeline for creating release notes. And it could get even more granular than that. You could have different pipelines for testing external and internal PRs. You could have a more complex pipeline that runs once a week and does a full range of unit and integration tests, versus a simpler pipeline for external PRs. Fundamentally, it depends on your use case again; just keep in mind that it's usually a good idea, over time, to split your pipeline into distinct workflows.

And this ties into the next lesson: figure out when it makes the most sense to run each workflow. If it's a workflow that just publishes to, say, Ansible Galaxy, maybe you only need to run it when you're creating a new release. If instead you have a pipeline that creates a Docker build artifact, maybe you want to run it on every external PR, publish it to your development environment, and test that it works. If you have a pipeline that only does release notes, maybe you only want to run it when you're drafting release notes, and so on. Again, it depends, but you've got to figure it out, and it's a good idea to spend some time thinking about it. (There's a small sketch of this after the next lesson.)

Next lesson: Dependabot is great, and you should enable it. This one is very GitHub specific. Dependabot is a great tool: it will tell you exactly which of your packages are outdated, it will open PRs for you automatically, and it will save you a lot of the time you'd otherwise spend hunting for those updates yourself. There's a little issue with it, though, and that's that it can also be overwhelming. At least as of two years ago, things might have changed, but back then the defaults meant that Dependabot would open a PR any time a new release of any package came out. What that results in is that over a week you might have five or six PRs being opened at random times over different days, and suddenly half of your job becomes being a Dependabot maintainer. So it's a good idea to use Dependabot, but tweak the updates to a consistent, sane frequency.

What I did, and what made the most sense to me, is that now every Monday at midnight, or Sunday night (because Dependabot lets you do this, and not a lot of people know about it), Dependabot searches for any new updates all at the same time and opens a bunch of PRs. Then every Monday morning, when I log on to GitHub, I can see all the PRs and deal with them all at once. If there's an issue with one of the PRs, I can deal with it over the week, but fundamentally I only have to worry about Dependabot once a week.
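To make those last two lessons concrete, here are two short sketches; the file names and ecosystems are illustrative, not the role's exact setup. First, "only run it on a release" is just a different trigger in a dedicated workflow file:

```yaml
# .github/workflows/galaxy.yml: a deploy workflow, separate from CI.
name: Release to Galaxy
on:
  release:
    types: [published]   # only runs when a release is published
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder for the actual Galaxy import step.
      - run: echo "import the role to Ansible Galaxy here"
```

And the Monday batching for Dependabot lives in the repository's dependabot.yml; the weekly interval plus the day key is what produces the once-a-week rhythm:

```yaml
# .github/dependabot.yml: batch all update PRs into Monday mornings.
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
      day: "monday"
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
      day: "monday"
```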
All right, next lesson: hard-code package versions in all your pipelines. This is something that has bitten me quite a few times, and it still bites me to this day; it happened with some packages just last week, actually. It's not uncommon, especially in some of the more dynamic ecosystems such as Node or Python, to have multiple packages being released week after week. Those packages usually include a bunch of dependencies, and those dependencies, in turn, every once in a while, might have breaking changes. So it's not uncommon for a workflow to just randomly stop working from one day to the next, and that's, again, not great. It's not the first time I've had to spend a full day just trying to debug why my pipeline stopped working, only to eventually find out that the issue was one of my dependencies getting a major breaking change in an overnight update without me realizing. Hard-coding those package versions helps with that. You will always know which version is being used, and if there's an update to that package, it'll be found by Dependabot, Dependabot will open a PR, and you'll be able to fix any issue in that Dependabot PR specifically, instead of your whole pipeline failing.

The next lesson is that pipelines will randomly fail, and it's good to find ways to avoid a hard restart. Why do I say they will randomly fail? If you're using anything that relies on a remote HTTP endpoint, whether that's downloading a package from somewhere or querying an API, timeouts happen, the internet goes out; there's always a chance that something will randomly fail. If you use networking in any way, shape, or form, you'll know what I mean. GitHub Actions has this option on matrix jobs (I can't quite read it from the slide) called fail-fast, something like that, and it defaults to true. That means that if one of the jobs in your matrix fails, everything else is cancelled automatically. I'm not a big fan of that because, again, pipelines randomly fail. I always suggest people set it to false, so that if one of the jobs fails, all the other jobs still continue. Ideally, if it was just a random failure, all the other jobs will succeed, and then you can go to the job that failed, dig into it, and see why it failed. If the issue is something to do with your code, obviously you go fix it and push a new commit. But if it was just a random failure, which in my experience happens more often than not, you can just rerun the one job that failed.
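Both of those lessons tend to show up in the same few lines of a workflow. A sketch, with the version numbers being examples rather than recommendations:

```yaml
# Excerpt from a test workflow: pinned versions plus fail-fast disabled.
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false      # one flaky job no longer cancels the whole matrix
      matrix:
        ansible: ["2.15.5", "2.16.2"]   # hard-coded, Dependabot-friendly pins
    steps:
      - uses: actions/checkout@v4       # pinned to a major version, not @main
      - run: pip install ansible-core==${{ matrix.ansible }}
      - run: ansible --version
```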
The next lesson, very briefly, is something that you develop over time: wherever possible, consolidate your pipeline tools and steps. To do this properly, you usually have to keep actively checking for updates to the tools that you're using. An example of this is one of the ansible-lint releases that came out in the past couple of years: it added yamllint as a dependency and started running yamllint natively within ansible-lint. What this meant is that I was able to get rid of the separate yamllint step in my workflows and let ansible-lint handle it. It's not a big time saving on its own, but over time, as a bunch of these little changes happen to the tools you're using, you end up saving real time. And it goes back to optimizing your workflows: being able to run your workflows as fast as you can, as often as you can. Beyond that, as you keep developing your workflows and learning how your software and your workflows behave, there are always opportunities to optimize a little bit more and consolidate steps.

The next lesson is enabling contributors to develop PRs for licensed software. I'm not going to dive too much into this, because it's one of those areas where there's no right or wrong way and it depends on your use case. But fundamentally, if your tool has an open source component and an enterprise component, like NGINX has NGINX Open Source and NGINX Plus, it's a good idea to give users a way to test those enterprise features: if they fork the repo and provide some licenses as secrets, they should be able to test against that software. That's instead of what I initially did here, which was to fundamentally block any enterprise tests from running on any fork or external PR. (There's a sketch of the friendlier approach a bit further down, in the discussion of secrets.)

And the last major lesson: integrate all of your target distros into your pipeline, as well as all of your target architectures. This seems like it makes a lot of sense, but I've seen plenty of cases where people don't properly test all the platforms their software is supposed to target, or don't properly test all the architectures their software is supposed to run on. It's very obvious, but it just saves you a lot of headaches. Even if you test some of these things locally and they're not part of your pipeline, and you think you're going to keep doing that locally for every major release, it's only a matter of time until something slips and you stop doing it. So just integrate everything into your pipeline, into your automated pipelines really.
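For the Docker side of this, one way to cover multiple architectures inside the pipeline itself is QEMU emulation plus Buildx, which is also what the Docker build action uses behind the scenes. A sketch, with the platform list trimmed to an illustrative subset:

```yaml
# Excerpt: build the image for several architectures in one job.
steps:
  - uses: actions/checkout@v4
  - uses: docker/setup-qemu-action@v3    # emulate the non-native architectures
  - uses: docker/setup-buildx-action@v3
  - uses: docker/build-push-action@v5
    with:
      platforms: linux/amd64,linux/arm64,linux/s390x
      tags: nginx-unprivileged:multiarch
```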
And with that, for the final few minutes, we can move on to chapter five, One Hundred Years of Collaboration, also known as why pipeline as code is great. Pros and cons, fundamentally.

So, there's the good. There's the version control side of things: when you're using pipeline as code, CI/CD as code, everything is checked into Git, or Mercurial, or whatever tool you're using; I would hope most people are using Git these days. Automated CI/CD, not just automated PRs, becomes very easy when you use pipeline as code. You can foster continuous improvement through knowledge sharing amongst the other open source users out there in the world. And you can reduce false positive bug reports. This ties back a little bit into hard-coding your package dependencies: it helps people figure out whether an issue is on their end, maybe because they're using package versions that are not the versions you're using, or whether it's an actual bug in the source code.

It also helps because people can then look at your CI/CD and go, hey, this is being tested this way; maybe I'm not using the software or the tool the right way; maybe there's a flag that's not documented but is there in the CI/CD pipeline. It does help quite a bit with that. And the last part is that both the source code and the CI/CD code live together. This is quite important because, as an open source user, I know I love it when everything is in the same place, when I don't have to navigate through three different websites to find everything related to the software I'm using. It's the same reason you probably want your docs in the same repo as your source code: it makes your life, and the lives of external and internal developers too, all that much easier.

Then there's the bad, which is not necessarily bad; it's just things you need to think about. You know, what do you do with secrets? If it's an open source project and it requires secrets to run, do you enable them for external PRs? If not, do you provide an easy way for external contributors to supply their own secrets? It's stuff that you need to think about. There's no right or wrong answer; again, it depends on your use case, but you need to consider it. Then there's when you run your CD pipeline, or rather, who or what triggers the CD pipeline. Does the CD pipeline run on every PR that gets pushed to the development repository? Does the CD pipeline only run on a new release? Is the CD pipeline automatically triggered, or do you need a maintainer to come along and run it? All food for thought. It's not necessarily bad; it's just things that you really need to take into account.

And then there's the ugly, which is the cons. There are really no cons that I can think of, or that I've ever been able to find, with pipeline as code. There might be some, I don't know; if there are, just hit me up after the session and I will make some amends. But, you know, there's really no reason not to start your journey with pipeline as code.
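On the secrets question, one pattern that enables the fork-friendly enterprise testing I mentioned earlier is to expose the secret as an environment variable and let the test step skip gracefully when it's empty, since forks don't get the upstream repository's secrets. A sketch, with the secret name being a placeholder of my own:

```yaml
# Excerpt: run licensed/enterprise tests only when the secret is present,
# so forks without the secret still pass CI instead of failing outright.
jobs:
  plus-tests:
    runs-on: ubuntu-latest
    env:
      NGINX_LICENSE: ${{ secrets.NGINX_PLUS_LICENSE }}   # empty on forked PRs
    steps:
      - uses: actions/checkout@v4
      - name: Run NGINX Plus tests
        if: env.NGINX_LICENSE != ''    # skip gracefully when no license exists
        run: echo "run the enterprise test suite here"
```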
And with that, that's the end. You can find all these details in the slides later, so I'm not going to spend too much time here, but we have a Twitter account, a YouTube account with a bunch of videos, LinkedIn, whatever. And then there's the little epilogue, which is basically the Q&A. Before that, I think I have about two minutes for questions, not a lot of time, so I just wanted to do a quick call to action. We have an NGINX community Slack. It's quite recent, only two or three months old, maybe a little bit older than that, no more than half a year old, but we are all hanging out there, and it's been growing quite steadily in population. If you have any questions, if you want to interact with any of us NGINXers out there in the world, this is a great place to do it. There's a QR code there, and the link down below. And then there are my contact details again, if you want to reach out at some stage.

I don't know if we have time for questions, but does anyone have any questions by any chance? No, no... yes?

[Question about running GitHub Actions locally.] Yes, there is a tool called act. It's not the easiest tool to use, and it has its quirks, but it does the job well enough.

[Question about the slides.] Yeah, if you look at my session profile on the schedule app, I uploaded my slides; there should be a way for you to get access to them. I honestly don't know how that works, because I've never done it, but there should be a way. Oh, for publication? That's maybe a little bit more involved. Stick around later and you can shoot me an email or something, and we can look at that, yeah.

Right, one more question. [Question about multi-architecture Docker builds.] Yes, I actually do use that for all my Docker builds. As you mentioned, Docker: I build for all sorts of architectures using Docker, and I use Buildx and QEMU now on GitHub Actions. Way back when, I used Docker Buildx natively when I ran the script locally, and these days, by default, if you use the Docker build action on GitHub Actions, it uses Buildx behind the scenes. There's s390x, ppc64le, et cetera. I do that for the Docker NGINX images, which get built for all the architectures. The Ansible role gets tested on AMD64, ARM64, and s390x; that's what they're called these days.

All right, any more questions? No? I think that's it then. Thank you for coming along.