Ready? All right. Thanks, everybody, for coming today. I'm Ann Marie Fred. I'm a senior principal software engineer at Red Hat, and the talk today is about CI/CD pipelines. It's geared for beginners, managers, or people like me who are pipeline interface nerds. A little bit about myself: I've been a software developer for more than 20 years, of which about the last 10 have been in a DevOps role. I was a DevOps Days conference co-organizer in Raleigh for three years. I'm also active in a couple of Linux Foundation and CD Foundation open source projects. I have three years of experience as an HR manager and four years as a security focal. Most of those years were at IBM; about a year ago, I moved to Red Hat. So just so you know what you can expect today, I'll talk about how you can get started with CI/CD pipelines. As I'm going through this, I'll be showing you some open source shared vocabulary that we've been working on with the CD Foundation Interoperability SIG. I'm also going to talk about some common terms that you might hear and what people often automate with their pipelines, and maybe it'll give you some ideas for things you haven't tried yet. Then I'll also talk about things like events, the inputs and outputs that we expect from each of these steps, and how it all relates to DevSecOps and software supply chain security. And at the end, I have a little bit about some current research we're doing on more pluggable Tekton pipelines. I'm trying to gear this talk for both beginners and experts. So my goal is, if you're a beginner, you can think about how you can use these different tools and techniques yourself in your own development process. And if you're an expert, try to read between the lines and think about how you can help us standardize the semantics and the inputs and outputs of each of these tools and techniques. Here's a little plug for the CD Foundation Interoperability SIG.
It was founded because users faced a challenge with a lack of interoperability across their CI/CD tools and technologies. That causes issues while constructing and running pipelines, such as passing metadata and artifacts between the tools or achieving traceability from commit to deployment. Especially with supply chain security, this becomes more and more important. I just took this from the website, but a couple of things to highlight: we really want to promote the needs of users who are facing challenges constructing these complex pipelines and flows, explore the synergies between them, and enable collaboration across the different projects. So why does a shared vocabulary even matter? As a user of CI/CD pipelines, I want to be able to think and talk about them with people who use different tools in a coherent way. The DevOps world is an open world. Everybody's using different tools and techniques, and it helps if we can share best practices with each other. And furthermore, from a software perspective, we need to be able to pass the outputs of one tool as inputs to another. So if we can develop a shared vocabulary, we can understand each other more easily. The catch is that many of these tools have been around for several years and have their own strong communities; they're certainly not going to just change all their wording. But maybe we can use these shared words when we talk to each other. You can find all this at the document that I linked, and if you need me to send you a link later, or a link to these slides, I can do that. At that particular website, you'll find terminology that's used across various tools and techniques, including CI/CD and source code management tools; a mapping of terms from one to the other, so you can say a pipeline in one is a workflow in another; and also some shared vocabulary. Now, on to basic CI/CD concepts, if this is new to you.
So the use case here is: as a software professional, I want to automate as much of my manual toil as possible. The fact is that software development, deployment, operations, and support involve a lot of repetitive tasks. And they're boring and a waste of time unless I do the work to automate them. Some organizations have people who specialize in this; it could be a DevOps or automation role, or a build specialist. But many developers actually set up their own automation as they go, using a combination of specialized CI/CD and pipeline tools, as well as shell scripts that cobble everything together. So: continuous integration, continuous deployment, and continuous delivery. Continuous integration is the practice of merging developers' working copies of code to a shared mainline several times per day, and it includes practices like automatically testing and building your software from scripts or tools. Continuous delivery is a software engineering approach in which teams produce software in short cycles, ensuring that the software can be reliably released at any time, without doing so manually. We usually aim to be able to deliver the software daily, or ideally even several times per day. Continuous deployment is the approach in which software functionalities are delivered frequently through automated deployments. To contrast the two: in continuous deployment, you actually do deploy the software to production, whereas in continuous delivery, the software is merely capable of being deployed. Maybe you've packaged everything and put it up to a repository, ready to be delivered, but you're not going the last step. In the context of this talk, it doesn't matter too much whether we're talking about continuous delivery or continuous deployment; I'll point out the one place where it does matter. So in general, I'll just use the term CI/CD, and it means both.
A few more terms that you'll hear. A pipeline is a sequence of CI/CD stages and steps that's used to automate a repeatable process; it's just an automated workflow. A stage is the unit of work one degree smaller than a pipeline, and stages might be implemented sequentially, in parallel, or some combination of both. And then a step is one degree smaller than that, two degrees smaller than a pipeline; you can think of a step as something that would typically be several lines of code. You'll see this chart a few times. You don't have to memorize it, but it shows a lot of different CI/CD tools down the left side. You can see they use different terms, but a pipeline is generally the same thing across tools, even if it's called a workflow or an activity. Many people start with these four goals for their CI/CD, and these are often implemented as pipeline stages. The first is build: you're gonna download, retrieve, assemble, and compile software and documentation into an executable and testable format. It's sometimes also called compile. Then test is where you're gonna test, scan, verify, and lint software and documentation; it's also sometimes called verify. Release is where you're gonna package, version, sign, and publish the artifacts and documentation; sometimes it's called deliver or publish. And then deploy is where you're gonna deploy those artifacts and documentation to any environment other than the pipeline/test environment; sometimes it's called install, especially if you have long-running systems that you're updating. And here's the stage terminology you see across a bunch of different tools. Now, pipeline steps, sometimes called tasks, are where the low-level implementation happens. So as a pipeline developer, if I know where to look, hopefully I can find pre-implemented, pre-tested steps that other people have made and reuse them in my own pipelines.
But it does take more time and effort for somebody to write a reusable step and share it, because they have to parameterize everything and test it in a bunch of different scenarios. So in the worst case, I might just implement my own from a few lines of scripting language. And here again, you can see step is a pretty widely accepted term, although sometimes it's called an activity, a job, or a task. There are a few things that most pipeline tools have in common when you're working within the context of a step. You're gonna have access to environment variables and configuration parameters. There's gonna be a workspace where you can read and write files. You'll have some way to securely store and retrieve secrets, and a way to return results, including success or error codes, or the output of the step. There's also a way to control whether execution continues or stops based on whether the step has an error or a success. Some steps are optional, so they can fail, but you can continue anyway. And then there'll be log storage and retrieval, because you always need to go back and see the logs from your pipeline if there's a problem. So let's talk through a bunch of steps, starting with the most basic and going up from there. Before I do that, there are some shorthand terms that I use in here. Software source means human-readable source files, usually what you have in your source code management system. It can also include things like configuration files, documentation files, and your declared dependencies, like if you have a list of packages that you depend on. Or what we call the baseline composition information: if you compile locally, you can generate a lock file that says exactly what version of each package you used, and actually put that in source control to make sure that the build system uses exactly the same versions.
So that's all your software source. Then your binary source is things like executable dependencies, container images, and virtual machine images. As outputs, you're gonna have your generated software, which I use to mean things that are still human-readable but are generated by the pipeline, and you also have generated binaries: again, your executable software, images, and VMs. So the first step is you're gonna do some setup. You'll have to provision the resources for the pipeline itself and then set up your workspace. Sometimes this is called initialize, start, prepare, workspace, or orchestrate. As inputs, you're gonna get the request or trigger parameters that caused the pipeline to run, and maybe you'll have a container image name and version; for example, it might say, okay, this is a Node 14 pipeline, right? And then as outputs, you're gonna get your workspace populated with your secrets and your environment information. Some side effects: this might also set up some persistent storage, or another method for the pipeline steps to share their inputs and outputs with each other. Sometimes this is done under the covers by the tool. Then you have your source step, where you're gonna copy your software, images, and documentation into the workspace and fetch your configuration data. Sometimes this is called clone or fetch. For this step, you're gonna need to know what source code management repository, branch, or commit you're gonna pull in, and as the outputs, you're actually gonna get your source files in the workspace. Then in the build step, you're gonna assemble and/or compile your software and documentation into some executable, usable, and testable format. This might be called compile, install, assemble, or generate. The inputs are your software source and binary source, and the outputs are your generated software and binaries. Then in a test step, you're gonna run a test suite.
Now, there's a wide variety of tests that you can run, right? You can have unit tests, integration tests, acceptance tests, performance tests, canary tests, A/B tests, smoke tests, and code coverage checks. These are all just different types of tests, and they're gonna produce test results, test reports, and test coverage metrics for you. Then in a package step, you're gonna create the software artifacts that will be published, including your container images, tags, and digests, and any archive files that you need. In a tag step, you'll annotate your source code or artifacts with information like a version number and a description. In a publish step, you're gonna upload these to another repository, and you might also update various catalogs or mirrors, or update your release notes. Sometimes this is called push, upload, or release. So that's just the very basics; you can't get very far without that. Now you're gonna add provisioning and deployment next, and this is where we make the leap from continuous delivery to continuous deployment. Sometimes we'll have different tools for the CI and CD stages. For example, something like Argo CD is really focused on this next part of the process, whereas something like Jenkins tends to handle both the CI and CD in the same pipeline. So in a provision step, you're gonna request that a new physical or virtual server, network, or other resource be allocated for you. This might also include a test cluster or some object storage that you need. And then the next step is to deploy. You can make changes to any environment other than your own pipeline, configure it, and deploy your dependencies, software artifacts, and documentation. Again, sometimes this is called install or configure. And you're gonna get back routes to your deployments with connection information, deployment records, and sometimes secrets to access the newly deployed resources.
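The provision and deploy steps, with the outputs just described (routes, records, secrets), can be sketched like this. This is a toy illustration, not any real tool's API; every name, URL, and field here is hypothetical:

```python
import uuid

def provision(kind):
    """Provision step: request a resource and return a handle
    that later steps can use to reach it."""
    return {"kind": kind, "id": f"{kind}-{uuid.uuid4().hex[:8]}"}

def deploy(resource, artifact):
    """Deploy step: place an artifact into a provisioned environment.

    Returns the route, a deployment record, and a secret for access,
    matching the outputs described above."""
    return {
        "route": f"https://{resource['id']}.test.example.com",
        "record": {"artifact": artifact, "target": resource["id"]},
        "secret": "generated-access-token",  # placeholder, not a real credential
    }

cluster = provision("test-cluster")
deployment = deploy(cluster, "app-1.0.tar.gz")
```

The key idea is that provisioning returns a handle, and deployment turns that handle plus an artifact into connection information that the verify step will consume next.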
Then once you have a deployment, you can verify that it was successful. Sometimes this might be called a smoke test, or there are other terms for it. Of course, you're gonna have to take the provisioned resources that you just got, along with your routes and your secrets, then run a bunch of tests, and you're gonna get as outputs your verification results and some record of that. And then, importantly, a cleanup phase. That's where you're gonna release pipeline resources, deprovision environments, and delete the workspace of the pipeline and the pipeline containers. Sometimes it's also called finalize or finish, and sometimes it's kind of hidden from you by the tool. So this is where it starts to get a little more sophisticated: adding DevSecOps and supply chain security. Once we have the basics working, now we wanna automate our security and compliance work. Secret detection is where we detect secrets in the source code or other software, or even secrets accidentally left in our documentation. The inputs will be something like a reference to your source code repository, and an interesting side effect here is that these tools can actually revoke the secrets. For example, I know of a number of tools, like the one GitHub has, that will detect secrets and invalidate the keys that it can if it sees that you've checked them into your source code. Another is dependency discovery. This is a deep discovery to identify all your dependencies, including the transitive dependencies. For example, if I import five packages, and each of them imports five more packages, and then they import five more, this is gonna go all the way down the tree, and it's gonna create a dependency list or graph. It can create a lock file. It's also something that's used to generate a software bill of materials; we hear people talking about an SBOM, speaking of which.
Actually generating that file is the SBOM, or bill of materials, step, and that's gonna have pedigree information about all your dependencies. A lot of inputs go into this; I won't read them all to you, but that's what's happening here. Then remediate is kind of fun. This can find and automatically fix known vulnerabilities in your application package dependencies, as well as your container base images and your operating system packages. Sometimes it's called fix or update. One good example, if you use Node.js: npm audit will find all the dependencies that have known vulnerabilities, and then npm audit fix will actually replace them with the later version automatically for you. That's one example of a remediate step. Or if you have a container base image, a lot of people, like with Ubuntu, might do an apt-get update and rebuild the image automatically as part of their pipeline. A lot of outputs come out of this; it's almost like a mini build phase happening again. Sometimes the remediate phase happens at the beginning of your pipeline, and sometimes people run this totally separately on a schedule; maybe they run it every few days just to get the latest dependencies. At Red Hat, we have a tool called Freshmaker that does this for you. And it can have some interesting side effects. It might also update your source code repository with new dependencies, or it might create a pull request with the updates, which is kind of my favorite: it's saving me 15 minutes of doing the same thing by hand. All I have to do is read the pull request, see it change one line, and it'll actually run my tests for me again. I see they still passed, and I just say merge, and it's done. Or it might use APIs to request an update; maybe it'll open a GitHub issue saying that you have a problem you should fix. In the case of mutable infrastructure, this could even update a running system.
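The core of what a remediate step automates, whether it's npm audit fix or a scheduled tool like Freshmaker, is matching pinned dependencies against known advisories and bumping them to the fixed release. Here's a toy sketch; the advisory data and package versions are made up, and real tools resolve semver ranges rather than exact pins:

```python
# Hypothetical advisory data: package -> (vulnerable version, fixed version).
ADVISORIES = {
    "minimist": ("1.2.5", "1.2.6"),
    "lodash": ("4.17.20", "4.17.21"),
}

def remediate(deps):
    """Remediate step: replace known-vulnerable pinned versions with the
    fixed release, and return the changes so a pull request can show
    exactly one line per bump (the side effect described above)."""
    changes = []
    for pkg, ver in list(deps.items()):
        if pkg in ADVISORIES and ver == ADVISORIES[pkg][0]:
            deps[pkg] = ADVISORIES[pkg][1]
            changes.append(f"{pkg}: {ver} -> {deps[pkg]}")
    return changes

deps = {"minimist": "1.2.5", "express": "4.18.2"}
changes = remediate(deps)
```

The returned change list is what feeds the automated pull request: one line per bump, easy to review and merge.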
So I've seen people who really, really trust their tests, where it will automatically run a remediate step, create a new container, publish it, deploy it, and done. Saves a lot of time if you can do that, but you do have to have excellent test coverage. A scan step is where you're gonna use a tool to do some verification of the software and documentation other than testing. In here we include things like static code analysis, security testing, linting, checking for known vulnerabilities, dynamic security scans, license checks (making sure that the open source licenses you're using are appropriate for your project), code smells, things like this. Then when you sign, you're gonna use a cryptographic method to authenticate the software. This might include information about the source of the software and how or where it was built; it might say Ann Marie's pipeline version two is signing this, and what level of approval it has received. So my pipeline might say, I've approved this for staging or production. Then we have policy steps. These verify that corporate policies are followed. For example, you might have a policy that software has to come from a trusted source, like you have to download it from an internal repository. Or you might verify that your source repositories are configured correctly; for example, you might configure GitHub to require that a second person has reviewed and approved every single change, so you can't merge your own code. You can check that your dependencies are signed themselves, or that code reviews have been completed, or that appropriate work items or change requests are associated with the change and that they're approved. The inputs to this are usually some kind of policy language, as well as the outputs of the other steps. And then, interesting side effects: a policy might be able to block a pull request from being merged.
They might discard or approve an artifact or an image, or they might stop a deployment from happening. Then record results is a step where we're gonna record and report pipeline results and compliance evidence, and store the artifacts for long-term archival. Sometimes this is called audit, attestation, evidence, or report. Basically, the side effect is that these artifacts are uploaded or archived somewhere else, because by default the pipelines will usually only last, I don't know, a day, and then they're deleted. So here's an example of one of these that I implemented in Tekton, just to make it more concrete. You can see first on the left it's going to do a setup step, and then it pulls the source. It builds in a way that's appropriate for this programming language, and then there's a package step on the right. And it keeps going after package. It actually does that deep discovery on the image and the software, builds a bill of materials, signs the artifacts, and publishes them to our internal registry; it actually publishes to Quay. And then you can see, kind of on the right, it's starting to provision the resources in parallel for the test phases. It's also doing some other things in parallel: it's detecting secrets in the source code, and it's checking a policy that code reviews have to happen and also that branch protection is in place, which means you have to have somebody else approve your change. You see, it just keeps going, right? After it's published and provisioned what it needs, it can deploy to a test environment and verify that that deployment worked. And then you can see it does some scans on the running code, like Twistlock, and it does an acceptance test and a dynamic security test. And once it's done everything that needs a running copy of the software, it does a cleanup of the deployment and records the results of that.
And at the right, kind of cut off, you can see it sends a message to Slack so that we know how it went. Let's see, you can also see that the unit tests can happen in parallel with all this. And then after the unit tests there are some other checks happening, like a code coverage check, a license scanner, and a static application scan. And there are a bunch of policies that got cut off there on the right, like the corporate policies I talked about. So these can actually get pretty complex pretty fast. I think this is about 35 steps, and it's actually not an unusual pipeline. And there's even more that you could do. In an analyze step, you're gonna do additional processing and analytics based on the results of any previous activity. These are sometimes called metrics, score, or grade. Parse, I'm not sure why they put that in there. Outputs could include analysis results and reports. A message: I mentioned this briefly; you probably wanna send a message to another system, like a Slack message or an email. There's another step called create request, but a message is just send-it-and-forget-it, whereas with create request you're gonna get back a link or a handle for what you did. So create request is for when you're gonna create a request in another system, like a change request that has to be approved before you can deploy to production, for example. We need to store the link to the request so that we can keep updating it over time through our pipeline. Then update record is updating a record that already exists in another system, like updating and closing a change request after the deployment to production happens. Or maybe you're gonna update your GitHub issue with the results of your policy check. It's actually really nice if all your pipeline checks are putting messages into your source code management system so that you can see it all in one place.
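The difference between create request and update record comes down to keeping the handle around. A toy sketch of that interaction, with a made-up stand-in for the external change-management system:

```python
import itertools

class ChangeSystem:
    """Hypothetical stand-in for an external change-management system."""
    def __init__(self):
        self._ids = itertools.count(1)
        self.records = {}

    def create_request(self, summary):
        # Create request step: open a change request and return a handle
        # that the pipeline must store for later updates.
        req_id = f"CHG-{next(self._ids)}"
        self.records[req_id] = {"summary": summary, "status": "open"}
        return req_id

    def update_record(self, req_id, **fields):
        # Update record step: modify an existing record by its handle,
        # e.g. closing the change request after the deployment happens.
        self.records[req_id].update(fields)

cm = ChangeSystem()
handle = cm.create_request("Deploy myapp 1.0 to production")  # keep the handle
# ... later in the pipeline, after the deployment to production succeeds ...
cm.update_record(handle, status="closed", result="deployed")
```

A message step, by contrast, would just fire off a notification and discard any response, so there's nothing to store.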
A run step is where you're gonna run a script or program that doesn't fall into one of the other categories. Often this will execute in another container, or maybe it'll be implemented in such a way that somebody else builds a container and you pull it into your pipeline and run it. Super useful, but also very dangerous from a security perspective, because whatever that script or program does is gonna run inside of your build pipeline. So this is a potential source of vulnerabilities for supply chain attacks, right? You have to put some extra care into making sure these are running in a sandboxed environment, or that you trust them completely, or both. And as organizations get even more advanced, they start to add more stages. Some examples might be open source introduction and open source storage, like if you're going to pull open source code, rebuild it yourself, and have your own internal repository for people to pull from. There's open source consumption and more. Describing these is still a work in progress; this is something we're doing now with the Interoperability SIG, and you're welcome to contribute suggestions. And here's a little bit of current research that we're doing that sort of motivated this work: we wanna make pluggable Tekton pipelines. The use case is, as a provider of CI/CD pipelines, somebody building pipelines for other people, I wanna create a policy that requires that my pipelines verify certain things, but easily replace one implementation with another. A very simple example: I might have this 30-step pipeline, but my build step is gonna be different for Java or Node or Python, right? So you have to be able to swap those in and out for each other. Or the static application security test might be different based on the programming language. Or maybe one company has this tool for dynamic testing and another company has a different tool. Well, what if we wanna share it across companies, right?
We have to make these sort of pluggable. In reality, if you don't do this, you end up with like 50 different implementations of pipelines that essentially do the same thing, and it's a maintenance nightmare. So our pipelines need to be able to automatically run these tests and gather the evidence that the right checks are happening, for security and audit purposes. Also, developers need to customize their pipelines. Maybe when you're first developing something, it's not gonna pass all 30 checks, but you need to at least get the unit tests working, right? But by the time you get to production, you have to have passed all the checks. So you need to be able to turn things on and off, and you need to be able to chain these all together into a customized pipeline. So standard step and task types are a starting point for all that work. All right, thank you. Feel free to reach out if you wanna chat. It looks like I have about two minutes if there are any questions, too. [Answering audience questions:] Basically, you can sign everything. In the pipeline, yeah. In fact, I know in our own pipeline, pretty much the only way you can sign it is through the pipeline. That's something that the pipeline team kind of locks down. I don't personally, but I have definitely seen them generated, yeah. Like, you can pull comments out of the pull requests, yeah. If you wanna reach me after, I'll be out here also. You can reach me on LinkedIn or Twitter; I'm easier to reach on Twitter. Thanks.