with a few other things. The biggest difference for me is that it is fully open source. So with the Community Edition, you can install it yourself and you have a fully open source version control platform. It's very nice for on-premise environments, where companies are not necessarily very happy about pushing their code to various cloud environments or SaaS products; GitLab makes a very nice on-prem alternative. There's a Community Edition and an Enterprise Edition. If you are at a company that has the financial means, I would recommend going with the Enterprise Edition, simply because in order for the Community Edition to exist, people need to pay for the Enterprise Edition. So we all benefit from that. We are an Enterprise user as well, but we also use the open source edition for the clients where that is the better option.

GitLab CI is a continuous integration service. The nice thing is that it's fully integrated with GitLab itself. Most of the time, the CI you're doing is very tightly coupled to the source code you want to run it on, so it's quite nice that GitLab CI is so closely coupled. It's fully open source, as I already said. If you are already using GitLab Community Edition, all you need to do is check in a .gitlab-ci.yml file and presto, you're getting started. That's not actually true — you'll also need a runner, but we'll get to that in a minute. This minute.

In GitLab CI, runners are the machines or environments that run your — are you guys okay? Yeah, the green shirts fix trouble — so, the runners run your builds. There are many different types of runners, but basically a runner is some kind of environment that picks up builds from GitLab CI through the internal APIs. It can be specific to a project or serve any project. Specific to a project is useful if, for instance, you have certain code that only works on Windows: then you'll have a runner that runs Windows, and you won't want to run your other projects on it. We'll see later on how to determine which projects get run on which runners. But in general, runners are the isolated environments where you run your code.

There are a ton of different runners — you can look them up — but the most useful at the moment, depending on your project of course, is the Docker runner. It just runs one or more Docker containers that pick up builds, execute a bunch of stuff, and then throw it all away after it's done. It's very easy to get started: the way the standard runner is set up with GitLab is very easy to run. However, if you're going to run this in production for a longer period of time, look into the actual Docker installation underneath. We had one problem where, by default, it was using one of those storage drivers that can only eat space and never release it. I won't get into that here, but it's useful to look into. I'd recommend running your runners on a separate VM: one VM that runs the GitLab installation, including GitLab CI, and then one VM that runs the runners — that's the easiest. So the Docker runner basically goes against the real idea of Docker and abuses your Docker containers as throwaway VMs, which feels a bit awkward at first, but it's actually quite nice once you get used to it, because it simply works.
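As a rough sketch of that setup (not shown in the talk), registering a Docker runner on that separate VM against your own GitLab instance looks something like this. The URL, registration token, image, description, and tags are all placeholders you would take from your own GitLab admin or project settings.

```sh
# Hypothetical example: register a Docker runner against a self-hosted GitLab instance.
# URL and registration token come from your GitLab settings; image and tags are up to you.
gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.example.com/" \
  --registration-token "YOUR_REGISTRATION_TOKEN" \
  --executor "docker" \
  --docker-image "ruby:2.3" \
  --description "docker-runner-vm" \
  --tag-list "docker,linux"
```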
I currently do a lot of operations stuff, so a lot of Puppet code, and we use GitLab CI a lot for testing Puppet code. That's actually really nice, because the thing with Puppet code is that it actually makes changes to a system if you run anything. So you want these throwaway environments, because you don't want to use the same environment twice — you want to start from the same baseline every time. For web applications that's a little different, because they usually don't make that many changes to the actual system underneath, but in our case it's quite important.

So as I said, the only thing you need to do is, in your GitLab project — which is the same idea as a GitHub project or a Bitbucket project — in the root of your project, you check in a file called .gitlab-ci.yml. Here on the right, we see one of the simplest versions of what you could be doing. We're choosing a Docker image here; this specifies which image we're running, and it just comes from Docker Hub. Then we have one job called test. This label can be anything you want — if you wanted it to be your mother's name, that's totally fine, though it's probably more convenient to use something that actually means something. In its simplest form, the job has a script parameter, which is just a number of statements you want to execute. In this case, we do a bundle install, and then a rake lint and a rake syntax task. This was taken from one of our Puppet projects, but you get the point.

Once you've checked this into your repository, from that moment on, every push to a branch that has this file in it will run your build. It will also run for every merge request. But you have to understand that your .gitlab-ci.yml can be different between branches. I wouldn't necessarily recommend doing that, but you'll notice it when you make changes to the .gitlab-ci.yml file: you'll probably have it sitting in a topic branch first, which gets a merge request, and until that is merged you have two different .gitlab-ci.yml files. So you have to be a bit aware of what it's actually running. It's not a big deal, and you could intentionally abuse it if you have some kind of specific situation, but I haven't really run into that.

So jobs — this little "test" here is basically a job name. Jobs can be run in all of these different ways; as I said, these are basically the runner types. You could run it locally, but then you're messing with the VM that GitLab itself runs on, so I wouldn't recommend that. The others offer varying degrees of complexity that also provide flexibility. You can even connect to a remote SSH server if you really need to run on that specific machine — because it has a particular IP address, or something else that means it has to run on that machine — and that's also totally possible.

As you see here, though, we have now split up our .gitlab-ci.yml into multiple jobs. It's doing exactly the same as before — a bundle install, a rake lint, and a rake syntax — but there are two good reasons why you want to keep your jobs as small as possible. The first one is that these jobs can now run in parallel. So if you have multiple runners available, these jobs run in parallel, theoretically cutting your build time in half. It won't be exactly half, but it cuts your build time, and if you multiply this by a lot of builds, that saves you a bunch of time. So the first reason is to enable parallelization; a sketch of that split-up file is below.
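For reference, the split-up file being described might look roughly like this — the image name and the exact rake tasks are illustrative, not taken from the slides:

```yaml
# Hedged sketch of the split-up .gitlab-ci.yml described above.
# The simplest form would be a single "test" job containing all three commands;
# splitting it into separate jobs lets them run in parallel on different runners.
image: ruby:2.3        # any image from Docker Hub that suits your project

lint:
  script:
    - bundle install
    - rake lint

syntax:
  script:
    - bundle install
    - rake syntax
```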
The second reason is that if something goes wrong with a job, it's much nicer to have a very specific place to go and search. You'll have to dive into the build log anyway, but if you already know that it failed against rake syntax, then you know exactly where the problem is going to be.

One more thing we see here is the before_script section. before_script is not a job; it's a special configuration parameter: whatever is in before_script is run before the script of every job — in this case a bundle install. It sounds more complicated than it is, but it's self-explanatory: the before_script runs before every job.

You see that I created the names of the jobs here as job:test:syntax. That's not actually necessary — you could just call this job "syntax". But because we have things that are jobs and things that are not jobs in the .gitlab-ci.yml file, I find it very convenient to prefix all the jobs with "job". Later on, we'll see that this syntax job runs in the test stage. So I like to call my jobs literally the word "job", then the name of the stage, and then the name of the job, so that it becomes easier to grasp what runs where and when.

If you do that, you can create a nice little pipeline. So you have multiple jobs, and GitLab creates this pipeline overview for you automatically — these are all individual jobs. We'll see in a minute how these are different stages and how to tie all of this together. After a build has passed, you get your pipeline; actually, while it's running you can see this pipeline as well. You can see the progress and the jobs completing or not completing, and you'll see little green check marks. While a build is in progress, you can click on it and watch the build log being created, all of that. This is a link to the commit that's being tested. Here you see one special thing: you probably noticed a red cross, but the pipeline continues anyway. That's a special feature I won't dive into too much, but you can set certain jobs to be allowed to fail. So it's okay that it doesn't build on Windows, because who wants it? No, wait — it's okay that it fails on Windows, for whatever the specific reason is. This is actually the pipeline of the GitLab CI multi-runner — the Docker runner for GitLab CI, I think. I just made a screenshot of that to show you how a pipeline looks.
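A hedged sketch of the before_script and the job:stage:name convention just described — again, the image and task names are only illustrative:

```yaml
# Sketch: before_script runs before the script of every job,
# and the jobs are prefixed with "job" plus the stage they run in.
image: ruby:2.3

before_script:
  - bundle install

job:test:lint:
  script:
    - rake lint

job:test:syntax:
  script:
    - rake syntax
```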
So this bundle install thing was really nice, but it has a problem. Does anybody know what the problem is? Imagine we're running this a thousand times a day. Sorry? Exactly. We're doing this bundle install before this job, and before this job, times a thousand — that just adds up. We can easily reduce the amount of time spent on that by using what are called artifacts and dependencies. A job can export an artifact, which is just a bunch of files created by that job (or maybe not created by that job, but usually so) that can then be used in subsequent jobs throughout the pipeline. So exactly: why would we run bundle install every time? It doesn't really make any sense. If you run bundle install with --deployment, it installs all the gems in a subdirectory called vendor, in the directory where your Gemfile is. So we've changed the script a little bit: we make it bundle install --deployment, and then we create artifacts from the vendor path.

So everything in the vendor path will, after this job is done, be zipped up and sent to GitLab, from where the next job downloads it — because these jobs can actually be running on different runners, on different physical machines, different virtual machines, different Docker instances. So the artifacts are communicated back and forth. You want to make sure that you set this expire_in flag, because by default the expiry is never. That means all your artifacts are going to stay on your GitLab server forever and ever, eternally, and that just uses a bunch of space. So expire them after whatever period is convenient. After this time they will be deleted, but this is the vendor directory — a bunch of downloaded gems — so nobody really cares about them after the pipeline has completed. It does mean that the pipeline has to complete within a day, but I'm willing to gamble that that actually happens.

So this job exports the artifacts, but then obviously we need to say in these other jobs that they actually depend on the artifacts job. We see that I've created a job called job:build:artifacts, and the other two jobs now have a dependency on job:build:artifacts. Because of that dependency, they automatically download the artifacts from the job they depend on. The first line is bundle install --deployment, which wants to go through my Gemfile and download all the gems that are in there — except that it's getting this vendor directory from the artifacts job, so it automatically sees that all of the gems are already there and doesn't need to do anything, and then we can continue as normal with the other jobs. This example is obviously specific to Ruby, but you can hopefully imagine how this works for whatever language and whatever situation you're in. It's a very, very flexible system.

So stages allow you to create different stages of a pipeline. It's what you saw here — these are stages: prebuild, test, build, package, release — and we define them like this. These are actually the default stages, but we've mentioned them here for clarity, and you can add as many stages as you want. The stages are run through in the order you define them here. So if you put test before build, then first all the jobs in the test stage will run, and then all the ones in the build stage. Jobs of the same stage that don't have dependencies can run in parallel. So again, referring back to this: these four jobs will all run in parallel if they can, which reduces our time significantly compared to running them all in sequence. It's very simple: all we need to do is set, in each job, the stage that it's in — build, test, test. The deploy stage I left out, because this is long enough already. Is it readable for the guys in the back? Yeah — while I was making the slides I was wondering about that; it was going to depend on the size of the screen. I'll put the slides up on the FOSDEM website after this, so you can take a look there. So that's stages, fairly straightforward also.
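Putting stages, artifacts, and dependencies together, a hedged sketch might look like this — the job names, image, and the one-day expiry are just examples in the spirit of what's described above:

```yaml
# Hedged sketch combining stages, an artifacts job, and dependent jobs.
image: ruby:2.3

stages:
  - build
  - test

job:build:artifacts:
  stage: build
  script:
    - bundle install --deployment    # installs gems into vendor/ next to the Gemfile
  artifacts:
    paths:
      - vendor/
    expire_in: 1 day                 # don't keep downloaded gems around forever

job:test:lint:
  stage: test
  dependencies:
    - job:build:artifacts            # pulls in the vendor/ artifact from the build job
  script:
    - bundle install --deployment    # sees vendor/ already populated, so this is quick
    - rake lint

job:test:syntax:
  stage: test
  dependencies:
    - job:build:artifacts
  script:
    - bundle install --deployment
    - rake syntax
```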
Limiting builds — how am I doing on time? What's that? Perfect. So sometimes you want certain jobs to not always run. Think of jobs that take a lot of resources, either lots of time or lots of processing power; then you might say, hey, I only want to run these when I'm actually deploying to a staging or production environment, or I only want to run these for the master branch. For that, we have two options: we have only and we have except.

As you might expect, only defines a list of Git refs for which the build is created, and except a list of Git refs for which it is not. So in this case, we will only run this for the master branch — except you wouldn't actually use this particular combination together like this. It works on Git refs, so it works on tags as well as on branch names. So you could actually have a branch name here: you run it only for the master branch, except for builds that are tagged as develop. There is also a special keyword you can put here, and I forget what it's called — a special keyword which only runs the job if it's specifically requested through the API. Triggers. Triggers, that's the one, thank you. The use case for that is, in my opinion, relatively limited, but I presume it's there because somebody needed it. We also see here allow_failure: this is the flag I mentioned that allows that Windows job to fail while the rest of the pipeline continues anyway.

Selecting specific runners: you can use tags in a job to make sure that a job only runs on a specific runner. As you're registering a runner — for instance, a Windows runner — you can tag it with "windows", and then in your jobs you tag your Windows jobs with "windows", and nothing else will run on those runners except jobs that have been tagged "windows". And yes, you can have multiple tags; it's a fairly flexible system. You see them here as well. Using this, you can handle not only Windows versus non-Windows, but also, for instance, runners running in different cloud environments that won't necessarily all be able to run the same jobs. So you can fairly flexibly define here which jobs can run where.

Manual builds: a small but very important addition that makes for a nice finish. Manual builds are very simple, but they create this little play button here, and the only way to run the job is by clicking that button. This can be very useful for this kind of setup, where for test and build and deploy-to-staging you don't care — as long as the previous jobs all complete successfully, let's go and deploy to staging. But for the actual deploy to production, you want a human being to actually press that button and say, hey, let's deploy this. So you set the job to manual. I didn't actually include the parameter on the slide, but it's literally manual — when: manual — and I think you can figure it out. If you set that up, your job will only be executed when you manually press that button. We use that for exactly this setup: we have some Puppet repositories, and we don't care about Puppet code going to the staging server, that's totally fine. But for Puppet code to go to the production servers requires all kinds of unfortunate approvals from people. Once those are there, somebody can simply press this button, so that nobody has to actually touch the production server. It becomes really nice. If you couple this with, for instance, LDAP authentication for GitLab, then you can very precisely determine who can deploy to your production environments.
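A hedged sketch of those limiting and selecting mechanisms together — the job names, tags, and the deploy.sh script are placeholders, not taken from the slides:

```yaml
# Hedged sketch of only, tags, allow_failure, and manual jobs as described above.
job:test:windows:
  stage: test
  tags:
    - windows              # only runners registered with this tag pick up the job
  allow_failure: true      # a red cross here doesn't stop the rest of the pipeline
  script:
    - rake spec            # placeholder test task

job:deploy:staging:
  stage: deploy
  only:
    - master               # only run for the master branch (refs, so tags work too)
  script:
    - ./deploy.sh staging  # placeholder deploy command

job:deploy:production:
  stage: deploy
  only:
    - master
  when: manual             # shows a play button; a human has to trigger this job
  script:
    - ./deploy.sh production
```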
Secret variables need a little bit of work in my opinion, but they're already there. Sometimes you have things that you don't want to show in your .gitlab-ci.yml file — for instance, passwords to places, or API keys, etc. You can have your whole project be open source, but it would really be convenient not to have your AWS access secrets checked into your .gitlab-ci.yml. For this, there are secret variables. They live at the GitLab project level: in your project, you define that this variable has this value, and then in your .gitlab-ci.yml you can just use it as a variable that automatically gets its value from the secret variable on the project. The downside is that currently they are not masked and will just show up in the build log. So if you are using AWS credentials that are stored as a secret variable in your project, and something in your build prints those variables to the log, they will just show up. So it's a dangerous thing at the moment — you have to be really careful about this. There is an issue open for this, which is developing slowly but nicely, and I'm sure that not too long in the future we will have a solution where the build log automatically doesn't contain these things.

To go a little bit more advanced: you'll fairly quickly run into a situation where you say, hey, I want access to this private repository — but the GitLab CI runner does not have special privileges on your GitLab instance; it only has the specific commit that you are testing at that moment. So if you want to do things over SSH — in this case, with private repositories — you'll have to get a bit creative to get that working.

How does it work? First you create a new SSH key pair and add the private key as a secret variable on the project. The public key you put wherever you need access — for instance, as a deploy key on the repository you want to clone. Then, in a before_script that runs before the job, you prepare the environment. We have a very simple test job here that SSHes to git@gitlab.com and does a git clone of a private repository, simply to show how it works. The only thing you need to do in your before_script is make sure that whatever environment you're running in actually has an SSH agent installed and has the private key — that is, it creates the private key on the build environment you're running on at that moment. So in this case we do an apt-get update and an apt-get install of openssh-client. We ssh-add the private key — this $SSH_PRIVATE_KEY is the reference to the secret variable that lives at the project level. We create a .ssh directory, and we turn StrictHostKeyChecking off in the SSH config file. This is all mumbo jumbo that just boils down to: get this environment ready to do a checkout of a private repository. Then here, you can actually do the git clone of the private repository. The nice thing is that all of this is destroyed the moment the build is done. However, you have to be careful not to accidentally print this private key to your build log somewhere, because then your private key is right there. Especially if you're doing open source projects this becomes a bit more of a challenge, because you want to make sure that nobody ever accidentally does something in your build that prints that key to the log.

As for the use cases: one is to check out private repositories. The other is to deploy to production servers, or any kind of server that is not easily accessible. So if you have a deploy job like here, for instance, you could have some commands in it that SSH into your production machine and run a command there — or run a command over SSH on your production instance, whatever you prefer.
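Roughly what that before_script boils down to, as a hedged sketch — $SSH_PRIVATE_KEY is the project-level secret variable, and the repository URL is a placeholder:

```yaml
# Hedged sketch of the SSH setup described above, for a Debian/Ubuntu-based image.
job:test:private-clone:
  before_script:
    - apt-get update -y && apt-get install -y openssh-client
    - eval $(ssh-agent -s)                       # start an agent in this throwaway container
    - ssh-add <(echo "$SSH_PRIVATE_KEY")         # load the key without writing it to disk
    - mkdir -p ~/.ssh
    - echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config
  script:
    - git clone git@gitlab.com:example/private-repo.git   # placeholder private repository
```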
Either way, this looks a bit complicated, but it's really a one-time thing, and it's fairly well documented what you need to have there. Once you do that, it becomes very easy to do secure communication with different places. Is that clear? Everybody awake?

One of the last things I want to show is YAML anchors. This is not actually a GitLab CI feature; it's some deep dark corner of YAML that allows this stuff. We don't actually use it, because I find it a fairly complicated way of accomplishing things, but depending on what you're doing and how much duplication you have in your .gitlab-ci.yml, it might become useful. So this stuff here on the left is equal to this stuff on the right, and this works purely at the YAML level — GitLab CI doesn't really have anything to do with it. It's just a trick you can use to deduplicate your .gitlab-ci.yml. To run through it real quickly: we define this hidden key called job_template, and we assign all of these settings to it, and then everywhere we use these << merge keys, they actually merge it in — this job definition refers to that template and gets merged into this test1 job. What we see is that this stuff is equal for job one and job two, so we might as well extract it. This actually expands, at the YAML level, into this: two jobs, one called test1, which uses Ruby 2.1 and Postgres and Redis, and you see the exact same thing here. So you could just as well write this — or actually this — but this version has the deduplication, and especially if n becomes larger than 2, that becomes more useful. Use it with care, because it has a tendency to get really messy really quickly. We've played around with it and then decided not to use it.
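For reference, a hedged sketch of the anchor construction being described — the image and services match what's mentioned in the talk, while the script lines are placeholders:

```yaml
# Hidden key (leading dot) holding an anchor, merged into two jobs via <<.
.job_template: &job_definition
  image: ruby:2.1
  services:
    - postgres
    - redis

test1:
  <<: *job_definition        # merges image and services from the template
  script:
    - rake test1             # placeholder

test2:
  <<: *job_definition
  script:
    - rake test2             # placeholder
```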
That's it. I was going to do a demo, but I was smart enough to buy one of those new MacBook Pro 2016 things, which has USB-C only. So I thought, okay, I'll buy every connector under the sun — except for VGA, because that's a 1996 thing — and of course, FOSDEM runs everything on VGA. So I'm graciously borrowing Tyler's laptop, and therefore demoing is going to be a little bit tricky. However, are there any questions?

Bunch of questions. Sorry, come again? Pass or fail, but if you have some metric, can you not track it — is there something you can use? So the question is: imagine that you have jobs that are not necessarily pass or fail, but that produce some kind of metric you want to track. Not directly that I'm aware of. There is, for code coverage — which is one of the use cases I could imagine you'd want this for — what I would almost call a hack, which lets you define a regular expression at the GitLab level that is matched against the build log to pull out the code coverage percentage. But other than that, right now it's a binary thing. I'm also looking at our GitLab friend over there, so if I'm incorrect about something, you have to tell me, but I think that's the way it is, right? Yeah.

There are more questions. Yeah. So is that lock with a CK or log with a G? So the question is: can GitLab CI lock resources other than the runner? I'm not aware of it, so you'd have to do this in your scripts if you can find a creative way to do it. But I'm not aware of it being able to lock other external things at the moment.

So the artifacts — how are they kept? By default, they are available within the same pipeline, but you can change that by controlling the key under which they are stored, so you can use them across builds as well. I haven't personally played with that, so I can't give you the details, but I have seen that it is indeed possible.

Yeah. So the question is: do you need to update them? What we do — and this comes straight from an actual CI configuration that we're using — is that here we do a bundle install --deployment, which stores everything in the vendor subdirectory, and then here we do a bundle install --deployment again, because something might have changed. It's super unlikely for our use case at the moment, but this is how you would do it if you have a longer cache time or you want to use them across builds: you use a construction like this to say, hey, nothing should have changed, but please go and check that that is actually the case.

Can you use Docker's layered file system for artifacts? To be honest, I have no answer to that question; you'd have to look it up. Sorry.

Sorry, I didn't get your second question — let me answer the first question first. The first question was: where do the logs go? The logs go back to GitLab CI itself. They're not stored on the runner, so you can access your logs as far back as your pipeline history goes; you can always look back at your logs. Your second question? Code review. Yeah. So the question is: can you integrate code review? You would solve that on the GitLab side, not necessarily on the GitLab CI side. What we do, for instance, is that somebody writes code and opens a merge request against the upstream, and that needs to be reviewed before the merge request gets merged. In GitLab you then have the ability to comment and to push further commits to the merge request. So the testing is what happens on the GitLab CI side, and the human aspect of the code review happens on the GitLab side.

So the question is: can you pass a Dockerfile instead of an image? You cannot directly pass a Dockerfile, but since one of the last couple of versions, there's an internal Docker registry built into GitLab itself. So you can actually build your own Docker images, push them to that registry, and then refer to them here. They don't ever need to see the outside world; everything lives in GitLab. Yeah — come and talk to me after the session, because that's exactly one of the things I wanted to look into sometime soon. So you can actually build a Docker image from within a Docker image and then use that in the rest of your pipeline, which is quite a nice solution.

So the question is: what if you have multiple source repositories that depend on each other, and you want to do CI on a combination of those repositories? This has been a problem for us as well. Specifically, we had two repositories: one is a Puppet control repository and the other one is a Puppet module. The control repository is the thing that sits at the top, and it checks out a bunch of modules, including the one we were making changes to. The module had its own CI process, and the control repository had its own CI process. What happened is that sometimes you push to both nearly simultaneously, because you made a bunch of changes that touch both of those repositories.
So we would push to one repository, which has an automatic deploy to the staging environment — or rather, both of them have an automatic deploy to the staging environment if everything is green — and you push to them almost simultaneously. This pipeline completes fine and pushes to your staging environment; that pipeline does not complete fine and actually breaks the code base. So as this one is deploying to your staging environment, you are now deploying broken code, because it also checks out the other repository. And we've had some, let's say, unfortunate side effects. I mean, we didn't actually drop any production databases, so we're still... Sorry, I couldn't resist making a joke at least once. I do recommend looking into it if you haven't: GitLab accidentally dropped or destroyed a production database this week and had a massive outage from it. But if you want to see how to do operations, it is absolutely exemplary how public they were about it. There was a Google Doc that you could openly access with all the details in it: what happened, who fixed what, who went where, what the consequences are, how they're learning from it. At some point, there was even a YouTube live feed where you could just watch engineers fix GitLab's production environment. I'm really impressed by that. And really, it was an unfortunate turn of events where somebody made a mistake and then it turned out that all the failsafes didn't actually work. I've been there, and it's not a great place. It doesn't make you a bad engineer; it's that one unfortunate, one-in-a-gazillion case that happens to everyone at some point. Anyway, I digress. Any more questions?

Do you have any notifications if a job succeeds or fails? So first of all, there's email — I think it's enabled by default. If you configure the SMTP settings in GitLab, it automatically starts emailing you when pipelines fail or succeed. Secondly, GitLab also comes packaged with an open source Slack alternative called Mattermost. If you haven't looked into it, do look into it, because it's really a great communication platform, fully open source. If you have GitLab CE installed with the Omnibus installer, it's already there; literally one line and you can start using it. And GitLab CI will post notifications to Mattermost channels if you want it to. You can also do webhooks and a whole bunch of other stuff — that's all at the GitLab level.

Somebody in the back has a really urgent question. So as for the plans, I don't know anything about them; I have no official ties to GitLab other than that we buy some licenses from them. But the question is about a build matrix: where you say, hey, I want to run all versions of this against all versions of that. I can honestly say that I don't know if that currently exists. Yeah. Yeah, so then you get into dirty, ugly YAML for now — it doesn't exist, including or excluding some of the variants — and it sometimes becomes quite ugly. That is also a reason to just keep the YAML simple: maybe there's a little bit of duplication in the YAML, but at least you really know what is happening there. But there are a number of improvements coming for the YAML.
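To make that last point concrete, a hedged sketch of what such hand-spelled "matrix" YAML might look like today — the Ruby versions and task names are just placeholders:

```yaml
# Hedged sketch: there is no build-matrix keyword, so each variant is spelled out by hand,
# optionally deduplicated with a YAML anchor as shown earlier.
.test_template: &test_template
  script:
    - bundle install
    - rake test

test:ruby-2.1:
  <<: *test_template
  image: ruby:2.1

test:ruby-2.3:
  <<: *test_template
  image: ruby:2.3
```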