Hey everyone, thank you for joining my talk on mastering production-grade best practices for building your Node.js Docker images. Glad to have you here. My name is Liran Tal, also known as that guy with the hat. I'm a developer advocate at Snyk, where I'm on a mission to help developers build applications securely using open source software. I'm actively involved in the Node.js security working group, in OWASP, and in various security research and best-practices efforts. So if you want to touch base on any of those, or just ask questions, I'm on Twitter: reach out to @liran_tal and we can chat. Today, we are going to dive specifically into building Node.js applications with Docker containers. Now, most blog articles I've seen start and finish along the lines of following not the best, but I'd say the simplest, sometimes over-simplistic, Dockerfile instructions for building Node.js Docker images. And what I mean by that is: it's simple and it works. Something as simple as this Dockerfile, which you can docker build and then docker run, and it's fine, the application will run just fine. The only problem is that it's full of mistakes and bad practices for building Docker images, definitely if you want to do this production grade. So you want to avoid anything that looks like this, by all means. Now we're going to dissect what every single line of it means, starting from the first one, FROM node. That line actually means non-reproducible builds, because you're pulling in the latest image. We're going to drill into all of this, so don't worry, stay tuned.
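For reference, the kind of simplistic Dockerfile being described might look roughly like this (the file names are placeholders):

```dockerfile
# Works, but every line has a problem this talk dissects:
FROM node            # unpinned: resolves to node:latest
COPY . .             # copies everything, including sensitive files
RUN npm install      # pulls devDependencies, ignores the lockfile
CMD "npm" "start"    # npm wraps node and swallows signals
```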
The other thing here is that you're potentially copying sensitive files, because you're just copying everything: maybe config files, maybe environment files that you don't need in the running environment of the Node.js image. What about unneeded dependencies in this npm install command? Who knows what you're pulling in when this runs during the build of the Docker image. And finally, even this command that spawns the Node.js runtime and the application itself is incorrect usage, and it may end up with your application not having a proper graceful shutdown, definitely if you're using this in a mature, rich orchestration environment like Kubernetes or Swarm. So let's start off with our first best practice: use explicit and deterministic Docker base image tags. What does that actually mean? If we look at the first line of this Dockerfile, FROM node, well, what image are we actually pulling in? It may seem like an obvious choice at first, but this is actually an alias to the latest Docker image of Node. And should we really be pulling in that latest image? At the very least, it means unreproducible builds: every time we build this image, we're potentially pulling in a newer version of the base image than the last time it was built. Think about it: we use lockfiles with npm or yarn, whatever package manager you're using, precisely because we want to pin dependencies and get consistent, deterministic builds, not indeterministic ones.
Same thing here: you want to pull in a very specific Node.js Docker image. The other thing is, if you're pulling in images like that, FROM node latest, you're taking the latest Node image, which, I don't know if you knew this, is a full-fledged operating system with many libraries and binaries that you may not need for a running Node application. So why would you want that? More software means bigger downloads and more risk for you, because there's now more software bundled, and who knows what could happen with it. I'll tell you that we're going to see a live hacking demo here of what happens when you bundle a lot of dependencies into a Docker image: the container itself can be compromised, and we'll see it. And as we've seen in previous research at Snyk, taking that latest image, even for other base images like Couchbase, MySQL and the other popular ones, basically the top ten Docker images on Docker Hub, almost all of them (except Ubuntu, the last time we scanned) ship with vulnerabilities by default in the image itself. So why would you want to take that latest image? Probably not. So let's fix it. The base image directive we're now going to use has a pinned tag, and we can find the SHA-256 hash for it on Docker Hub, or just by running docker images --digests, which will show you the digest of any image you've pulled. So find it whichever way makes sense, and use it.
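A pinned base image directive along those lines might look like this (the digest itself is a placeholder; look up the real one on Docker Hub or via `docker images --digests`):

```dockerfile
# Deterministic: this digest always resolves to the exact same image.
FROM node@sha256:<digest-from-docker-hub>
```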
The downside here is that it's a little bit unreadable. If you maintain Node images, or Docker images in general, over time, you're not really sure which base image you're actually referring to here. So, look at this: we can prepend that SHA digest with an alias telling us what the image actually is. And you see, this is node:lts-alpine. I'm using that base image tag, but specifically the image build of the Node version behind that SHA, and this gives me deterministic builds of the Node Docker image every time. Moving on: are we going to install dependencies the right way, or the bad way? Let's see. Well, we started our simplistic Dockerfile with npm install. As you probably know as a developer, this is not the best way of doing it: it adds unneeded dependencies and security risk because you're pulling in dev dependencies and the like, and it inflates the image size. Why would you want that? And don't do this variant either, an npm install followed by updating to the latest versions, because you have no idea what you'll be pulling in. Like I'm saying here: please do not do this. This is not a best practice at all. You do not need those dev dependencies, and you do not need that indeterministic way of pulling dependencies in. Now, you could try adding --production at the end so that only production dependencies get pulled in, but it may still surprise you with the dependencies you pull in during a CI environment, because many things can happen if you're not honoring a lockfile.
So if you want to do it the most proper way, you pull in only the production dependencies, and you also pin them to what you have in the lockfile using npm ci, which gets you deterministic builds. This is, by the way, also faster than the regular way of installing dependencies. Now, something I've seen happening with different packages is the question of whether we're optimizing the libraries, and the way we build the image, to actually work for production. What that means is that a lot of libraries have expectations you may not know about that are tied to performance and security optimizations. What exactly does this mean? Well, if you set NODE_ENV=production to tell the npm package manager to install only production dependencies, that will work, but that NODE_ENV=production only lasts for that step and that layer in the Dockerfile, the one that creates the production dependencies. When you run npm start at the end, it will still run as if NODE_ENV is in development; it's not in production mode. And why do you want NODE_ENV=production enabled for running the application in general? Because of things like Express: Express will only enable view caching, less verbose error messages and other production-oriented optimizations if NODE_ENV is set to production. There's a blog post on this from Daniel Khan, dated way, way back, about why this is important, and plenty more has been written since.
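Putting those two points together might be sketched roughly like this (the digest and paths are placeholders):

```dockerfile
FROM node:lts-alpine@sha256:<digest>
# Set NODE_ENV at the image level so it applies at runtime too,
# not just during install; Express and others optimize based on it.
ENV NODE_ENV=production
WORKDIR /usr/src/app
COPY package.json package-lock.json ./
# npm ci installs exactly what the lockfile pins, and with
# --only=production it skips devDependencies.
RUN npm ci --only=production
```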
But basically, what it means for us is: install production npm dependencies with npm ci --only=production, and move that NODE_ENV outside of the install step so it applies generically to building and running the application. Okay, once we've got this, let's talk about this other principle of least privilege, a long-standing security control from the early days of UNIX that we should always follow, regardless of containerization, serverless or whatever else. This is a best practice. Now, what do I mean by that? We've gotten to this state of the Dockerfile, which is already much better than the simplistic version, and we've probably already remediated some vulnerabilities and risks. But the thing is, do you know which user actually owns the process that runs the runtime? Not really sure. Why am I asking you this? Because let's see some examples of how this can turn really, really bad. So maybe somewhere there's usage of an insecure API like this, a child_process.exec, and who knows who owns that command once it's running inside the container. More than that, let's say you have this worker Node.js application image that listens on a message queue to do offline image processing. And you see, I'm not even using child_process here; I'm using pdf-image, an open source package I found that lets me do image manipulation. So this is on my containerized Node worker, working off a queue, handling billions of messages that I need to resize. But what if this PDF file path is now user controlled?
Something like this: what if someone could add a payload that manifests into this call to the pdf-image library? Now, the thing is that exactly this kind of vulnerability really happened, for pdf-image and for other packages. Why does it happen? You may not notice, because pdf-image is an abstraction for you, but the implementation detail behind the scenes is that it spawns that insecure API, the child_process.exec command line, to run the convert utility for the image manipulation. So now that you know this, you're probably a bit more worried about these issues. What we want to do here is contain, and containerize (a bit of a pun), the blast radius of what could go wrong. So instead of running that pdf-image command injection, if it happened, as the root user, which is Docker's default if you don't choose anything, we now want to use USER node, which is less privileged and can't do a lot of things inside the container when it's running. The thing is, that COPY command I showed you before is a bad practice partly because you may be copying sensitive files, but now, because we also want to run as the least-privileged user, we also need to make sure that all the files related to the application are owned not by root but by that user. So now this is a much better state, where both the COPY and the USER directives are aligned and give us least privilege for the user.
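The aligned COPY and USER directives might be sketched like this (the path is illustrative):

```dockerfile
# Files are owned by the built-in unprivileged `node` user, not root,
# and the runtime drops to that user before anything executes.
COPY --chown=node:node . /usr/src/app
USER node
```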
What about those other common mistakes I see in blog articles about containerizing Node.js applications: how they invoke the node process itself inside the container? How many Dockerfiles have you seen in tutorials and blogs that recommend this way of executing your Node runtime? Probably a lot; a lot of tutorials do. Maybe you're even doing this today, in your team, in your production environment. So here's why not to do it, and what could go wrong. The problem is that while this works, and it's okay for experimenting, it's a bad choice for production Node.js containers. And this is a bad way of doing it too: you might think the variant with the square brackets is better, but it's also bad. Maybe you think of invoking the node process directly, like this? Nope, not helpful either. And even if you try to wrap it up with a shell script, unless you knew exactly what to do in that script (which we'll get to in a second), that's also a bad way of running and spawning your Node containers. Okay, why is all of this bad? To understand, we need the bigger picture of how Node containers run in a larger environment. What I mean by that is, there's an orchestration engine, as you can see here: Docker Swarm, or Kubernetes, or even just the Docker engine itself.
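For reference, the problematic variants being described look roughly like this (server.js is a placeholder name):

```dockerfile
# All of these are bad choices for production signal handling:
CMD "npm" "start"           # shell form: sh -c wraps npm, which wraps node
CMD ["npm", "start"]        # exec form, but npm still wraps node
CMD ["node", "server.js"]   # node itself runs as PID 1
CMD ["./start.sh"]          # a shell wrapper, unless it traps and execs
```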
Now, generally speaking, the environment needs a way to send signals to the process in the container, to let the container know that maybe it should die: maybe we want to do some A/B testing, maybe we want to roll a new version in so we need to kill some containers, maybe they're over capacity, whatever the reason, these orchestration engines need a way to signal applications to terminate them. So they send signals like SIGTERM and SIGKILL. And the caveat here is twofold. Firstly, we are indirectly running the Node application by directly invoking the npm client. What that means is, when we run npm start, npm itself, the package manager CLI, spawns a new child process for the Node runtime running your application. But who's to say it forwards all the signals it receives to that application? Well, actually, it doesn't. If you don't believe me, let me show you: it's a very simple experiment to set up. Add a process.on handler for SIGHUP, which is one of the signals an application can receive, to your very simplistic Node web application. Then, using docker kill, the Docker CLI itself, you can pass --signal and send a specific signal to a running container. If you run that, you can see, just like on my screen here where it's sitting there as if waiting for interaction, exactly what happens: the Node runtime never shows you any console log that it received the event, because the npm CLI in that case swallows all of those signals. And that's not something we want. So in the previous example we had npm wrapping the actual Node runtime and not forwarding the signals to it.
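A minimal sketch of that experiment, assuming a file like server.js running in the container:

```javascript
// If this handler never fires when you `docker kill --signal=SIGHUP`
// the container, then whatever sits between Docker and the node
// process (the npm CLI here) is swallowing the signal.
let received = false;
process.on('SIGHUP', () => {
  received = true;
  console.log('Received SIGHUP');
});

// Keep the process alive to receive signals, like a web server would.
setInterval(() => {}, 1 << 30);
```

Then, from the host: `docker kill --signal=SIGHUP <container-id>`. When npm sits between Docker and node, the log line never appears.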
Now we made a change; are we starting the process directly, or are we? What's happening here? Let's open a shell in this running container and see what we have now. It looks like we started the Node runtime directly, but that CMD shell notation actually tells Docker to execute the process wrapped in a shell. So does the shell forward the SIGHUP signal to it? As you can see on my screen, even though that shell is process ID 1 (owned by root, by the way, which as we discussed is bad), this sh -c wrapper is not actually forwarding the signal. So let's try a different form. This is called the exec form, where we use the square-bracket notation; let's run this command and see what happens. Well, when node runs entirely directly like this, it runs as process ID 1, which effectively takes on some of the responsibilities of an init system inside a running container. That typically means being responsible for initializing operating system processes, but the Linux kernel treats process ID 1 very differently than it treats other process identifiers. This special treatment from the kernel means the handling of things like SIGTERM signals is different, and it may not even invoke the default fallback behavior that would kill the process. So there's a recommendation from the official Node.js Docker working group telling you not to run node inside a container as process ID 1, and that's also why not to do it. We're getting to what you should be doing: use a lightweight process supervisor to handle the signals, something like dumb-init. That's the name of a tool, a statically linked binary with a very small footprint that's very easy to work with.
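The dumb-init approach might be sketched like this (the digest and file names are placeholders):

```dockerfile
FROM node:lts-alpine@sha256:<digest>
# dumb-init is a tiny, statically linked init that runs as PID 1
# and properly forwards signals to the node process it spawns.
RUN apk add --no-cache dumb-init
ENV NODE_ENV=production
WORKDIR /usr/src/app
COPY --chown=node:node . .
USER node
CMD ["dumb-init", "node", "server.js"]
```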
And it's a good helper for these jobs. So if you're spawning a Node.js process like this, you'll also notice that I needed to install dumb-init in my Alpine container here, and we're taking advantage of image layer caching. What we're doing now is making sure dumb-init runs and, when it gets signals, actually forwards them to the node process, so they're treated correctly. This also relates to the fact that we need the Node.js application to receive interrupt signals like SIGINT, from Ctrl+C. Once the container gets that SIGINT, or SIGTERM, or whatever is sent to tell it to stop, it will kill the node process running inside, unless we've set up some kind of graceful shutdown. Because we don't want to abruptly kill the current connections, the requests in flight in the container; we want to let them finish and stop new traffic from coming in. What we're enabling is the ability for the container to gracefully shut down when it receives those signals: clean up resources, free memory, properly close database connections, whatever it needs to do. Only when all the connections have been freed and all the interactions have finished should the container drop off, rather than abruptly killing connections for people in the middle of things. So that's container signal handling, and all the best practices we've talked about so far. Now I'm getting into this: why are you not fixing the vulnerabilities in your Docker images, for your containers?
What I mean by that is, Docker has this scan command built into it, so you can run, for example, docker scan node:14 if you wanted, and find what vulnerabilities you have in the container. Now, granted, fixing some of these vulnerabilities is kind of hard, but it might mean we need to address them, and this already gives you some really interesting input. For example, it shows you where a vulnerability is coming from: this one is coming from ImageMagick, so that's the library actually introducing it. Furthermore, it tells you where in the Docker image it's getting introduced. Did you specifically apt-get install imagemagick, if this was a Debian or Ubuntu image? Or is it inside the base image, so that just using node:14 or node latest introduces it through the base image? This is very worrisome, and when you scan Docker images you may find hundreds of vulnerabilities, as we've seen before. And I know what you're asking now: what's the worst that can happen? Because I may have to accept some risk; I can't mitigate maybe 600 vulnerabilities, or where would I even start? Let's see what can happen first; let's do a bit of a demo and understand what is happening and what could happen. So what I'm going to do next is share my screen here, and my terminal, and bump up the font size for you to see it. The first thing I want to do is docker run this container called rce. To see what that actually is, let's move to this code snippet here in VS Code: you can see this is a Node container running node:6.10-wheezy.
We've traveled back in time to Node 6, so I can show you some vulnerabilities, including an interesting one. It's a very simplistic file; nothing here is really an issue (well, there are plenty of issues here in terms of best practices), but for us, this actually works for the container itself. You can see, for example, how I'm importing express and multer to be able to upload images. So this application gives me the ability to upload images, on port 3112. And there are actually, I'd say, no insecure API practices or bad practices in my own code: I'm just using execFile, which is a pretty secure API, to pass the command itself and then any arguments to it. So once that's done, let's see if the app is actually running now: 3112, it's on /public if I remember correctly. Yeah, so this is it, this is the application. Imagine this is not even interactive, just some worker thread processing images. What I want to do now is upload an image. But before I do, I want to show you a little bit more inside it. So I'm going to move into the container itself, kind of like SSH in, so to speak: I'm opening a terminal and showing you what it looks like inside the container. You can see I'm already in /usr/src/goof, the goof application running in this container, and I can run cat server.js so you can see the actual code. It's exactly what I showed you before; this is the application working. So let me clear that up and show you again the files that actually exist here. What I'm going to do now is upload an image, like you'd expect any application to let you do. So, rce1, let's see what's going on here: rce1.jpeg. When I upload this one and resize it, it looks like it was successful, and I can go ahead and upload a new one.
That's great. This is a great application: it resizes images to the thumbnail size I want, and so on. But what's actually happening? Let's see. If I look at the list of files here, you can see a bit of a discrepancy. I don't know if you caught it at first, but look at this: May 16, rce1. Just by uploading a file, I've created a new file; I spawned a command. This is command injection running inside my container and creating a new file. Now, why did it actually happen? Because I have this exploit here, rce1. If I show you what's inside, you can see it's not really a regular JPEG image, but it is acceptable input to be manipulated by the convert utility, the ImageMagick one, which exists inside this node:6.10-wheezy base image. What I'm doing here is giving it some commands, concatenating into this rce1 file something that makes it touch a new file, create a new file. The same way, I could create a reverse shell, run rm, whatever I want on this container, because I now have the ability to run commands on this running container. So this is pretty significant. And this is it: what do you do now? You've found 624 vulnerabilities in the container. What do you do? This is kind of hard. Maybe, maybe not. Let's look at the actual full output I've saved from what you saw before: if you provide docker scan with your Dockerfile, then at the end, after giving you all the vulnerabilities and all the counts, it actually tells you all of the, I'd say, alternative base images that you could transition to.
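For context, this kind of exploit file is plausibly a variant of the well-known ImageTragick payloads (CVE-2016-3714): a small text file that ImageMagick's convert happily parses, with a shell command smuggled inside. A sketch (the URL and the touched filename are illustrative only):

```
push graphic-context
viewbox 0 0 640 480
fill 'url(https://example.com/image.jpg"|touch "rce1)'
pop graphic-context
```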
And if you would transition to one of them, it would give you, I'd say, a smaller vulnerability footprint. So if my current image, node:14.10, has 624 vulnerabilities, like I'm seeing here, then if I move to node:14.16-buster-slim, I'll be left with only 58. I'm mitigating some 500-plus vulnerabilities just by moving to a different base image. If my application can function fine with that, why not, right? So you can see this with docker scan and mitigate those vulnerabilities, which we've just seen can genuinely impact the application itself and cause command injection. Or if you're using the Snyk app itself to scan your images, it will show you similar things. It will tell you: hey, we found this Node 10 image you're running right now, but if you want to fix those vulnerabilities, you should try, what is it, node:dubnium-buster-slim, and you'd actually be in a better state, with fewer vulnerabilities impacting you and less risk. So that was about remediating vulnerabilities. But there are other interesting things we can do. For example, multi-stage builds are a great way to move from a simple and, I'd say, potentially error-prone Dockerfile into separated steps of building a Docker image. Let me explain what I mean by that, and how it can help you avoid leaking sensitive files. If you do something like npm ci --only=production, that's fine; but if you need some private packages, you probably need a token for them. So what do you do? You go and add the token inside the Dockerfile, like the 1234 here, run the npm install, and it works. But it's not really cool, because now you have hard-coded secrets in your Dockerfile.
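The hard-coded-token anti-pattern might look roughly like this (the token value is obviously a placeholder):

```dockerfile
# Anti-pattern: the secret is committed to the Dockerfile and baked
# into an image layer.
ENV NPM_TOKEN=1234
RUN echo "//registry.npmjs.org/:_authToken=${NPM_TOKEN}" > .npmrc \
    && npm ci --only=production
```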
So maybe you try something else: maybe you provide it as a command-line argument, an NPM_TOKEN build argument that exists in the Dockerfile and gets passed when building the image, which is a step better. But if you look at the history on the host that built it (you can see the history of the image itself), that NPM_TOKEN 1234 is exposed. So this is still a bad way of doing it. Now you'd think: okay, this might be a good way of doing it, but I want to remove the sensitive 1234 token from the image itself, because I don't want it in the running container, so I'll rm -rf it. The thing is, that adds a new layer that deletes the file, but all the previous layers and their history still exist as part of the Docker image. So when I do something like docker push and put it on Docker Hub, if this is a public image, and even if it's a private one, because it might theoretically become public and open source in the future, that's still a bad thing, because that NPM_TOKEN 1234 still exists as part of the history of the Docker image. And this is really where multi-stage builds come in. The fact is that I can now use one image, this top one, even if it's a big one, node latest or whatever, to do all the installs I need; and when I'm done with it, having installed whatever I needed from private packages, I move to a smaller image, FROM node:lts-alpine, copy all of these artifacts from the bigger image into the smaller one, the one that's purposeful for production. And I can now basically mitigate two things.
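A multi-stage build along those lines might be sketched like this (image tags and paths are illustrative):

```dockerfile
# Stage 1: a full-featured image for installing, where build-time
# secrets like private-registry tokens can live without ever
# reaching the final image's layers.
FROM node:latest AS build
WORKDIR /usr/src/app
COPY package.json package-lock.json ./
RUN npm ci --only=production

# Stage 2: a slim runtime image; only the needed artifacts are
# copied over from the build stage.
FROM node:lts-alpine
ENV NODE_ENV=production
WORKDIR /usr/src/app
COPY --from=build /usr/src/app/node_modules ./node_modules
COPY --chown=node:node . .
USER node
CMD ["node", "server.js"]
```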
First of all, I'm getting smaller base images for production: fewer vulnerabilities, less software, less size. And I'm also preventing sensitive information leaks. Now, one thing that's super important and not well known is how to mount secrets safely. It's a slightly better way of handling that NPM_TOKEN thing. Sometimes you may even need a bit more than the token itself, like the .npmrc file, which has your registry and some other defaults. So what do you do? You don't really want to copy all of that into the running container, and maybe you have a .dockerignore or something like that. What you can actually do is use a new capability available in Docker with BuildKit, the newer build backend for Docker: mounting a secret. I can mount a specific file into the container and give it a name, and when I actually build, I give it a reference ID and the file. What happens then is that the secret is available as a file in the container only for that specific build step, and nothing else: it is not retained in any container history or image layers or anything like that. This is the proper way of mounting secrets, like files, into the container itself. Now, there are a lot of other best practices for building containers securely that we haven't had time to show here; there's a lot of them on the Snyk blog. And you should probably scan and monitor your code repositories and Docker images all the time, because of all the vulnerabilities that can happen, like we've seen in this demo. So, may the container gods keep you safe. But until then, thank you for joining my talk, and good luck building your containers.