Welcome to my talk, my journey using Docker as a development tool, from zero to hero. This is going to be my journey with Docker over the last five years: what I've done, how I've integrated it into my development workflow, and what's changed along the way. This presentation, funnily enough, is actually running in Docker, since I don't have Reveal.js installed locally. The code examples and the slides will be shared at the end; there'll be a link, and I'll add it to my abstract as well if you want to look in your own time. A little bit about me: I'm a software engineer, and there's a link to my blog and my website. I really like cats, and anyone who talks to me for more than a couple of minutes will know I'm an avid cricketer, really, really into my cricket. If anyone wants to talk to me about any of those things, come catch me at the end of the presentation. I also work for a company called Zoe, who have very graciously sent me over here. Zoe is a start-up in the healthcare space, and we have what you might call two products: a personalised nutrition product, which is all about understanding how your body responds to food, and I'll show you a little bit about that in a second, and a health study, which is using the power of community science to tackle global health issues. This is what we pivoted to from the COVID Symptom Study, which some of you may have heard of, especially those living in the UK. Earlier this year I was quite lucky to be able to try the Zoe nutrition product. These are my blood sugar levels from when I was wearing a glucose monitor. On the left is roughly day two or three of me wearing the blood glucose monitor, and what you can see after I ate something is a big blood sugar spike and dip, which comes with a lot of unpleasant symptoms like crashing, hunger and fatigue.
On the right-hand side you can see me about two weeks into wearing that glucose monitor. We'd done some experimentation, I'd tried different food combinations, and you can see much shallower peaks and troughs as I better understood how my body responds to food. That's what the nutrition product is all about. Again, if you're interested in that, please catch me at the end and I'll talk to you about it in more detail. Part of the reason I wanted to talk about Zoe is that I'd describe the Zoe engineering team as a Docker-first engineering team. We try to get everything running in Docker first, and then perhaps there are other ways as well, but it means that when new developers get set up on a project, they need far fewer dependencies installed locally, which is quite nice. Zoe is where I've really accelerated my learning with Docker and tried to dockerize far more things than I used to. So who's this talk aimed at? It's aimed at people who've used Docker but perhaps aren't experts in it: you've used docker build and docker run, you know some basic CLI commands, maybe you've heard of multi-stage builds but haven't used them, perhaps you want to use Docker in CI but don't really know where to get started. We'll cover all of those things. If you are an expert in Docker, perhaps this talk isn't for you, but of course feel free to stay. What are we going to do? We're going to take an example web service, in this case a FastAPI web service, FastAPI being an asynchronous Python web framework very similar to Flask. It's going to interact with a database, and it's just going to get and add users. The details don't really matter; just imagine it as your standard CRUD service. And we're going to use Poetry for dependency management.
Poetry adds a few nice things like locking our dependencies. Again, I'm not going to go into too much detail about Poetry, but it's a really interesting tool, and someone gave a talk about Poetry earlier today as well. So before we get into the how of Docker, I think it's really important to cover the why, so let me take you back five years. I've just graduated, I've started my first job, I'm the only developer at this place, and my manager comes up to me and goes, I've heard about this thing called Docker, do you think we could use it? I go and do some googling and I see some big words, well, maybe not big words, but words I don't understand, like container and image, and I don't really get it. I just say, oh, I don't understand, and move on. Then a couple of weeks later I come across a Medium article and, you know, we all have that penny-drop moment. I read this one phrase and the penny really dropped for me. It was something along the lines of reproducible builds in development and production, and I think that's at the core of why a lot of people like Docker: you have this recipe, and it means the environment you're working in locally is much more consistent with what you probably have deployed in production. It's much closer, and that's really nice. One other thing that's quite nice about it, if I can tell you a story about someone at Zoe: as we all do, a new macOS version comes out and we upgrade straight away, and all of a sudden this developer couldn't run or build anything. There were lots of compatibility issues, and he spent the next couple of days fixing things before he could do any development. We've all been there; it's very frustrating. Since moving to Docker, a lot of those problems have gone away, because Docker abstracts away what's running on your host machine.
It cares more about what's running in the container, because you've containerized everything, so it's quite nice. It provides easy setup for developers: if you're jumping between multiple projects, you don't have to worry about the setup as much, and you probably have far fewer commands to run. It's also OS-independent. The nice thing there is you don't need specific instructions for Linux and Mac, like using brew on Mac and apt on Linux as your package manager. You just need docker build, docker run, whatever Docker commands, so you can run it wherever; all you really need is a Docker daemon running and something to interact with that daemon, often a CLI. And so we've moved to this brave new world where instead of saying it works on my machine, we say it works on my container. More seriously, hopefully the fact that it works in your container gives you a lot more confidence that it's actually working. So let's go back five years. What would Haseeb's first image have looked like? Probably something like this; I'll blaze through it. We're going to use python:3.9 as the base image, which comes with a few nice things like Python and pip. We define a bunch of environment variables, most of them for Poetry: telling Poetry where to install its dependencies, adding it to the PATH variable, and other such things. The details don't necessarily matter. We're going to copy over the relevant Poetry files, install Poetry and then install all of our dependencies. Then we copy the files from our host machine into the Docker image, and finally we start our web service. Really straightforward, nothing too complicated there. How do we run it? Well, we've got two steps here. We need to build the image, and we pass the tag flag so we tag it as app.
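That first, naive Dockerfile might look roughly like this (a sketch; the Poetry environment variables and the module path are illustrative):

```dockerfile
FROM python:3.9

# Poetry configuration: where it lives, and put it on the PATH (illustrative)
ENV POETRY_HOME=/opt/poetry \
    POETRY_VIRTUALENVS_CREATE=false \
    PATH="/opt/poetry/bin:$PATH"

WORKDIR /app

# Copy the relevant Poetry files, install Poetry, install the dependencies
COPY pyproject.toml poetry.lock ./
RUN pip install poetry && poetry install

# Copy the application code from the host into the image
COPY . .

# Start the web service
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]
```

And the two steps to run it:

```shell
docker build --tag app .          # step 1: build the image, tagged "app"
docker run --publish 80:80 app    # step 2: run it, mapping host port 80
```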
Then we run that image as a container, and again we use the same name, app, that we tagged it with. We're also going to do some port forwarding, which you see with the publish flag, 80:80. What that means is we map port 80 on the host to port 80 in the container, which is where our web service will be running. So rather than having to figure out the IP address of that container, we can just access it on the host machine on localhost, which is a nice bit of convenience. Right. Now imagine there are two developers: where I was working, we've hired a second developer. He comes up to me and goes, ah, it's really cool that we've dockerized our app, but we still have a local database. Let's imagine we're using Postgres. We're having to do some shenanigans and weird things to get it working and connecting. He's like, it's really good we've got the app dockerized, can we also dockerize the database? Because you've got different projects with different versions, we've got Postgres 11 and Postgres 13, and it all gets a bit messy; it's not the easiest thing to configure. So let's take a look at how we might go about dockerizing some of our app dependencies, in this case a database, but it might be an SFTP server or other things you want to use when developing locally. Without Docker, on a Debian-based system with the apt package manager, we'd probably do something like this: we install Postgres, we create a service so it starts automatically with our laptop, we create some users, we probably have to fiddle around with permissions, we create a database, and then we also have to hope that nothing else is interacting with our database. If you've got two projects and they both accidentally end up using a database called test, and one deletes test by accident, what happens then? It gets a bit fiddly.
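The non-Docker setup might look something like this on a Debian-based system (a rough sketch; the package, user and database names are illustrative):

```shell
sudo apt-get install postgresql        # install Postgres itself
sudo systemctl enable postgresql       # start it automatically with the laptop
sudo -u postgres createuser app        # create a user (plus permission fiddling)
sudo -u postgres createdb test         # create a database, shared by everything
```

And then every project on the machine shares that one installation.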
So, as we've containerized our app, let's look at how we can containerize the database with Docker. We have something like this. We do docker run, and we pass the volume flag, which mounts a volume. In this case it means that data will persist between multiple runs. Normally, when you stop the container and kill it, if you rerun that database container there will be no data in there, and you probably want your data to persist between runs. So we add that volume mount, and that makes sure the data stays between multiple runs of our Docker container. We provide a few environment variables, such as the default database name and the default password. We do some port forwarding again so we can access the database on localhost, and then we specify the image we want to run, in this case postgres:13.4. But if your project needed Postgres 12, you could specify that. You can see how it's now contained to just that project, and we've got really easy setup. If we want to bootstrap this all together, we'd probably have something like this: we create some networks, we build the app, we do a bunch of stuff. You don't have to worry about the details, and we'll simplify this in a second, but we probably aren't running all of these commands by hand; we've probably got them behind some script or a make target, a build.sh or a run.sh. Then we've probably got some teardown work to do to stop the containers and everything, so we probably have something like that too. That's okay, that's fine. But me and Bob are talking again, and Bob's like, this is a bit complicated, isn't it, Haseeb? Imagine we need an extra container; this script is already 11 lines long. There must be a simpler way to manage multiple containers. So we do some googling, as all good devs do.
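The database container and the bootstrap script together might look something like this (a sketch; the network name, volume name and credentials are illustrative):

```shell
# One-off setup: a network so the app can reach the database by name
docker network create app-net

# Build the app image
docker build --tag app .

# The database: a named volume so data persists between runs,
# environment variables for the defaults, and port forwarding
docker run --detach --name postgres --network app-net \
  --volume pgdata:/var/lib/postgresql/data \
  --env POSTGRES_DB=app \
  --env POSTGRES_PASSWORD=postgres \
  --publish 5432:5432 \
  postgres:13.4

# The app itself
docker run --network app-net --publish 80:80 app
```

Plus the matching teardown commands to stop and remove everything, which is exactly the growing script Bob is complaining about.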
We come across this tool, which I'm sure some of you have heard of, called Docker Compose. Docker Compose, as you may have worked out, provides an easy way for us to manage multiple containers: we can define all of our containers in one file. We'll take a look at that in just a second. Just as a quick aside, there are two Docker Compose tools. There's the older docker-compose (with a hyphen), which I'm sure most of you are more familiar with, and the newer docker compose (with a space). In this talk we're going to use docker compose, because as far as I'm aware docker-compose was deprecated in April. The syntax is almost identical; there's nothing majorly different between the two. I've got a link to an article at the end where you can find more information about why they made that change. So what do we do? We create a docker-compose YAML file and define a bunch of services, where one service is going to be a container. We have our app service here, and we provide it some build information, like which Dockerfile to build. We provide this command, and again, we're looking to improve our development experience a little bit, so now we add this reload flag: when the code changes in the container, it restarts our web service. Before, we'd probably have had to rebuild the image and rerun the container, which is a bit fiddly; now we can simply live-reload. The second piece of that is using another type of volume mount, where all the files in the root directory, where the Docker Compose file is, get mounted into the app folder in our container. That means when files change on our host machine, they also change in our container, which triggers that live reload and restarts our web server. We also define a bunch of environment variables.
We're going to take a twelve-factor app approach: some of this config will change between different environments, so we don't want to hard-code it. You probably don't want postgres as your password in production. Maybe you do, but I probably wouldn't recommend it. One interesting variable to note is on line 13, where you see POSTGRES_HOST. Rather than needing to supply an IP address, we can just supply a service name, which is what we're going to call our Postgres container as well. Docker's DNS is clever enough to do a lookup and go, oh, this is a container name, let me resolve it for you: it's 172-dot-something. As Docker IPs can change between runs, this is quite nice, rather than having to hard-code things, so less boilerplate. We also do some port forwarding, and I've bound it to the loopback interface. Again, I've got a link to an article at the end where you can find out a bit more about that, but it just means the port will only be advertised on our loopback interface, which is only accessible on our host machine and won't be accessible on other interfaces. Otherwise, if your firewall isn't configured properly, other people on the same network could potentially access your container. Then we have this Postgres container, and again, the POSTGRES_HOST variable here matches what we defined here. The rest of the definition is very similar to what we had before, just in YAML rather than a docker run. As you can see, we now have one file rather than that big script. Then how do we run it? We just do something like this: docker compose up with the build flag, which builds our images and runs the containers, all in one step. Then to tear it down, you can just do docker compose down. To summarise this section: dockerise your app, dockerise your app dependencies, and then use Docker Compose to manage multiple Docker containers.
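Putting all of that together, the docker-compose YAML file might look roughly like this (a sketch; the service names, module path and credentials are illustrative):

```yaml
services:
  app:
    build: .
    # --reload restarts the server when code changes inside the container
    command: uvicorn app.main:app --host 0.0.0.0 --port 80 --reload
    volumes:
      - .:/app                     # mount the project root into the container
    environment:
      POSTGRES_HOST: postgres      # the service name doubles as the hostname
      POSTGRES_PASSWORD: postgres
    ports:
      - "127.0.0.1:80:80"          # advertise only on the loopback interface

  postgres:
    image: postgres:13.4
    environment:
      POSTGRES_DB: app
      POSTGRES_PASSWORD: postgres
    volumes:
      - pgdata:/var/lib/postgresql/data
    ports:
      - "127.0.0.1:5432:5432"

volumes:
  pgdata:
```

Then `docker compose up --build` builds the images and starts both containers in one step, and `docker compose down` tears them down.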
Me and Bob are talking again, and Bob's like, great, we've got the whole tech stack dockerised. But we're having this problem, and I'm sure a lot of developers encounter this, where your tests pass locally but fail in the continuous integration pipeline. It's a real pain, because you're like, ah, how do I debug this? Bob says to me, you know how we've used Docker to have a more consistent environment between development and production? Can we do the same thing between our local environment and our continuous integration environment? So we start taking a look at how we can do that. Basically, what we want to do is run our tests within Docker. How can we do that? With our current setup, it's not too complicated. Imagine we're using pytest as our test runner, and imagine we're already installing it as a dependency. We can just do something like this: docker compose run, then the name of the service, app, and the command we want to run in the container, in this case pytest. If we didn't change our Docker Compose configuration, that would just start the app container. But let's pretend our tests need to interact with the database, and we want the Postgres container to also be up. Either we'd need to run a command beforehand to start Postgres, or we can add this depends_on clause, where the name matches the name of the service, and that will spin up the Postgres container beforehand. That's quite nice, because then we only need one command: Postgres will run, then our app container will run, and then it will run pytest within that Docker container. Again, that name matches what we have here. So how do we get it working on our continuous integration pipeline? We don't want to be the developer who says it works on my machine. Imagine we're using GitHub Actions, which is one of many tools you can use for continuous integration, along with CircleCI, Travis, GitLab CI, et cetera.
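The depends_on piece might look like this in the compose file (a sketch, building on the earlier service definitions):

```yaml
services:
  app:
    # ... build, command, volumes and environment as before ...
    depends_on:
      - postgres   # spin up the postgres service before the app container

  postgres:
    image: postgres:13.4
```

Then `docker compose run app pytest` starts Postgres first, starts the app container, and runs pytest inside it, all from one command.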
This is just a standard workflow file. Let's take a look at how it might look before. We need Postgres, so we define it as a service. We check out our code, we use Python 3.9, we install Poetry, install our dependencies, export the environment variables we need (twelve-factor app again), and then we run pytest. That's okay, but it's a bit complicated, and as we've seen, it's a bit fiddly; sometimes things just break in funny ways and you spend time you don't really want to spend. Or, after, we can just do something like this if we use Docker Compose: docker compose run app pytest. You'll notice this is the exact same command we're running locally, now running in CI, which is quite nice. Hopefully it gives you a bit more confidence that what you're doing in CI and what you're doing locally are much closer, so if something passes locally, it'll also pass in CI. And of course, you can do this with a bunch of your other tasks, like linting and database migrations. So to summarise this section: dockerise your development tasks, you know, tests, linting, database migrations. At Zoe we use Alembic for database migrations, and we have that all dockerised; if you set up volume mounts, you can even copy those migration files back to your host. Try to use Docker on CI to have that consistent environment between what you're running locally and what you're running in CI. So Bob comes up to me again and goes, this is pretty good. Tests are running, and when they pass locally, they pass in CI as well. But CI is a bit slow, it takes three minutes, and I'm really lazy; can we speed it up? So again, me and Bob do some googling and look at what people have done to speed up their Docker images and Docker builds. One thing we come across is that a lot of people recommend smaller images.
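The "after" workflow can be very short indeed (a sketch; the workflow name and action version are illustrative):

```yaml
name: CI
on: push

jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run tests in Docker
        # The exact same command we run locally
        run: docker compose run app pytest
```

No Python setup, no Poetry install, no service definitions: Docker Compose carries all of that for us.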
There are a few reasons for this. One is that if we remove redundant dependencies, there are fewer things that can break: fewer things interacting with our app that can fail in funny ways, because you didn't need those dependencies in the first place. From a security point of view, there are fewer things that can be out of date and fewer things that can be exploited, so a hacker or someone malicious trying to get access to your app has fewer ways to do something you don't expect. Less storage: if you're publishing your images to a container registry, you're taking up less space, so you're probably saving a little on cost, and also on upload and download time when pushing and pulling those images. So how can we do that in our case? Well, it's just five characters: we add -slim. This slim base image removes a bunch of dependencies we don't need, like curl, wget and a few other things, and it really slims things down. You could also look at using Alpine Linux, although my personal experience with Alpine and Python has not been fantastic; again, there's a link at the end. If we do a quick comparison between our old image and our new image, we're saving about 700 megabytes from just that five-character change. But I think the most interesting line there is that our CI pipeline runtime is now 43 seconds quicker. If you imagine you have four developers and each of you runs the pipeline 10 to 15 times a day, because you're always pushing code, you're getting some significant savings there, which is quite nice. So to summarise this section: aim to use smaller base images, remove unnecessary dependencies, and try to reduce your build time. If you do a bit more googling, and I wanted to take this to the nth degree, you end up like Pepe Silvia, from one of my favourite TV shows.
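The five-character change is just the base image tag:

```dockerfile
# Before: the full Debian-based image
# FROM python:3.9

# After: the slim variant drops tools we don't need, like curl and wget
FROM python:3.9-slim
```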
And what you realise is that we're installing our development dependencies and running them in production. Imagine we have pytest, flake8, isort, pre-commit, all these things we don't actually need to run our application in production. But we also need them, because we're running our development tasks in the container. So what can we do? We can look at something called multi-stage builds. Stick your hands up if you've heard of multi-stage builds. Cool, a decent chunk of the audience. My general pattern with multi-stage builds is: I have a builder stage, which uses a larger image, and in it I build my application or dependencies, whatever it may be. This creates some artifacts. I then copy those artifacts into a slimmer image, and you'll notice that when we do the copy in step two, we're not copying from the host; we're specifying a previous stage, builder. When you actually build your image, you can specify which stage you want to build, builder or main in this case, and it throws away the stages you don't use. It's all about build versus run: often you need more dependencies to build things than you do to run them. In a compiled language like Go or C, that difference is bigger; for Python, it's mostly just about building dependencies. So let's take a look at how we can do that with our current image. I'm going to create a stage, and the first stage I'll just alias as base. I'll then have a builder stage, which installs all of our production dependencies; on line 29 you see the --no-dev flag, so it installs just our production dependencies and no development dependencies. Next, I'll use the builder stage as a base and call the next stage development, and that installs all of our dependencies, including pytest, flake8, all those things.
It does much the same as before. And then we have a split: we have a production stage, which uses base as its base, and we copy over just the production dependencies from that builder stage and throw away everything else. At the moment you're probably not going to see any significant savings, but you'll see in just a second how this helps. Then, in our Docker Compose file, we want to add a target of development. When we're building locally we'll probably want to target that development image, not the production one, but when you're publishing your production image you'll specify a target of production. If we do a quick comparison of our old slim image versus our new multi-stage image targeting production, size-wise we've saved about 80 megabytes. That's a bit conservative, because in my tests there weren't that many dependencies; as you have more dev dependencies, the discrepancy gets wider and you'd be saving far more space. Our build time is a little slower for the production image, because we have more steps now, but I'll show you a way to speed that up in just a second: we can use cache_from. If we add cache_from to our Docker Compose file, and you're publishing your images to a container registry, which we do at Zoe, we publish our development and our production images, then if the provided image and your current build have layers in common, you get the same speed-up as if the image had been built locally. Anecdotally, I noticed it saved about five seconds when running on CI, but it really depends on your change and how much of the cache you can use. Cool.
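The multi-stage Dockerfile described above might be sketched like this (stage names follow the talk; the paths assume Poetry installs into the system site-packages):

```dockerfile
# Shared base: slim image, common configuration
FROM python:3.9-slim AS base
ENV POETRY_VIRTUALENVS_CREATE=false
WORKDIR /app

# Builder: install only the production dependencies (--no-dev)
FROM base AS builder
RUN pip install poetry
COPY pyproject.toml poetry.lock ./
RUN poetry install --no-dev

# Development: builder plus the dev dependencies (pytest, flake8, ...)
FROM builder AS development
RUN poetry install
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80", "--reload"]

# Production: start from the slim base and copy across only the
# installed production dependencies from the builder stage
FROM base AS production
COPY --from=builder /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]
```

And the target and cache_from additions in the compose file (the registry URL is hypothetical):

```yaml
services:
  app:
    build:
      context: .
      target: development          # build the development stage locally
      cache_from:
        - registry.example.com/app:development
```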
So Bob comes up to me again and goes, Haseeb, sorry, requirements have changed, which I'm sure has happened to no one ever. Instead of a synchronous call to create new users, we're going to listen to an event now: we'll subscribe to some event, and when that event comes in, we'll create the new user. That's fine; at Zoe we have a pub/sub library we can use to subscribe to events. But our libraries live in private Git repositories, so to get that to work with Poetry I'm going to need to inject an SSH key that allows us to clone those repositories and install them. I want to inject the SSH key only at build time; I don't want to bake a key into the image, because you might end up accidentally publishing your private key, and if someone gets access to that, they can do a lot of damage. So let's take a look at how we can use our local SSH key to do that. First, we add this pub/sub library to our Poetry dependencies. Then we come back to our image: in the builder stage we now install OpenSSH and Git, and much is the same as before. Then the cool thing: we use this --mount=type=ssh, which injects an SSH key just during that one step. It means that while we're installing our production dependencies, and only during that step, we have the SSH key available; you can specify this on multiple steps if you need it. And we'll see in a second how you can just use your local SSH key, the one I was already using to clone projects. The way we do that is with ssh-add: we add our SSH key to our SSH agent, then we do docker compose build with the --ssh default flag, which uses the default credentials in our SSH agent. So I haven't really had to do anything, I haven't had to move any SSH keys, I already had them, and that's quite nice.
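A sketch of the SSH mount in the builder stage, assuming BuildKit (the host assumed is GitHub, and the key path is illustrative):

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.9-slim AS builder
# Git and an SSH client are needed to clone the private repositories
RUN apt-get update && apt-get install -y --no-install-recommends git openssh-client \
    && mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts

RUN pip install poetry
COPY pyproject.toml poetry.lock ./
# The key from the SSH agent is available only during this one step;
# it is never written into an image layer
RUN --mount=type=ssh poetry install --no-dev
```

And on the host side:

```shell
ssh-add ~/.ssh/id_rsa                 # add your existing local key to the agent
docker compose build --ssh default    # forward the agent's default credentials
```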
In terms of CI, we now have to inject an SSH key. Most CI platforms allow you to inject secrets somehow, so we inject the key as a secret; we do have to run a build step beforehand. And a quick comparison now shows some significant savings: if you compare our old image, with no multi-stage, against our multi-stage image targeting production, we're now saving 200 megabytes, because we don't have OpenSSH and Git in our production image, where we don't need them. To summarise this section: use multi-stage builds to have slimmer production images, and leverage SSH key injection at build time. So what did we do? To summarise this talk: we dockerised the app, we dockerised its dependencies, we used Docker Compose to manage those Docker containers, we used Docker for development tasks, and we looked at multi-stage builds. Here's a link to the slides and some code examples. Any questions? [Audience question] Yeah, thanks for your talk. You told us that your dev looks like your CI, but what does your production look like?
Relatively similar; I wouldn't say it's too different from development. Obviously there are always going to be differences: it's probably going to have more memory and things like that, and the database it's connecting to is going to be more powerful, et cetera. But for the most part, what we have running locally is very similar to what we have in production, to the point where, if we have SFTP servers, we have a mock SFTP server we can use in our integration tests, to test that if we're uploading files to or pulling files from an SFTP server, that server is there and the code works. That gives you more confidence that when you push something to production, it will hopefully work. Obviously there might always be a few teething issues, but generally I think we do a decent job at Zoe of matching development with production, and Docker Compose really helps with that, because you can have it all in one file. [Audience question] But you're not running Kubernetes or something like that? We do use Kubernetes, so, as I say, there are a few differences between what you have locally and in production. Cool, I think that's everything then, so we're done.