Okay, so this presentation goes more in depth into how to work with these containers and what kind of functionality Docker has. We have seen this slide before, with two important concepts: you can compare an image to a recipe and a container to an actual dish, while the image contains all the instructions needed to get that container running. There is a little bit more to it. For example, an image almost always consists of multiple layers, stacked on top of each other. We have now seen the Ubuntu image. As you can imagine, if you do an additional installation and make an image out of that, you get an extra layer on top of the Ubuntu image, in this case the installation of figlet. And if you make an image out of that again, you have multiple layers, and a container can really have a lot of different layers. The funny thing is, and we quickly touched upon this in the quiz question we just saw, that this is where the analogy between a recipe and a dish stops: you can actually make an image out of a container. From a dish it is very difficult to reconstruct the recipe directly, right? But from a container you can make an image. So this is how that would look as a visual representation. We have these different layers. We again have the Ubuntu base image; it can be any other base image, but let's say it's Ubuntu. On top of that we install Python, which is an additional layer; for our example in the exercise, that extra layer will be the installation of figlet. And then there is one more layer added on top of all the image layers: the writable layer, and that represents the container.
So if you want to create an image, there are basically two ways to do that. First and foremost, and that is the most used and the best practice, is to use a Dockerfile. We will learn much more about that in this lecture and later this afternoon during the exercises. But, as I said in the slide about the analogy, you can also make an image from a container. What you then do is use docker commit. What docker commit does is turn your container from writable to read-only: it fixes the container as a layer on top of your other image and makes an image out of that. So in principle what we could do is take our Ubuntu image, do an installation manually inside the container, then go outside the container and run docker commit. That freezes the container layer, makes it read-only, and turns it into an image. We can then store our Ubuntu image, with the figlet installation as a layer on top of it, as an actual image. We will do that in the exercises later on, so we will use docker commit. However, it is not recommended to use it too often. Related to that, I have a question for you, so I will stop sharing and reshare. Here we go. As mentioned a few times, using docker commit is not very reproducible. Why do you think that is the case? Most of you have answered, so we'll stop. If you haven't answered yet: it's completely anonymous, no wrong answers. Or there is a wrong answer, but nobody will know. Okay, here we go. Most of you answered "all of the above", and that is indeed the correct answer. I think you might have hesitated about the first one the most: in order to recreate the container, you will need the entire image. Yes, that's the case, and it's actually almost always the case that in order to recreate the exact container somebody had running on their computer, you will almost always need the image.
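As a sketch, that docker commit workflow could look like this (the container and image names here are just examples):

```shell
# Start an interactive Ubuntu container and install figlet inside it
docker run -it --name my-container ubuntu
# (inside the container)
apt-get update && apt-get install -y figlet
exit

# Back on the host: freeze the container's writable layer into a new image
docker commit my-container my-ubuntu-figlet

# The new image can now be used like any other
docker run my-ubuntu-figlet figlet hello
```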
But what we have also learned, and what we will learn more about, is that you can describe an image through a Dockerfile. And if you only share the Dockerfile that describes the image, you can actually build the image from that Dockerfile. So that's what I meant with "in order to recreate the container, you will need the entire image": you can also share the Dockerfile, which can sometimes be a better idea. Actually, usually it's a good idea to share both the Dockerfile and the image. The most obvious answer is that with docker commit you do not, by default, describe the exact build steps. You might have done some installations that you forgot and didn't write down; then you do docker commit, later try to recreate the image, and you figure: hey, it doesn't work anymore. Why is that? Simply because, by default, the exact steps you performed are not written down, at least not in a standardized way. In addition, and related to that, it is difficult to trace back software versions: nothing forces you to specify the versions of the software you install. That is actually also the case for Dockerfiles; you can of course write Dockerfiles without specifying versions, and then they are still not very reproducible, but it is easier to add the software versions to a Dockerfile. So therefore, all of the above are correct. Okay, then we go back to the presentation. Okay, then about Dockerfiles. If you work with containers, if you develop containers, you will see these a lot. What they are is just a set of instructions on how to add layers to an image. They also contain some instructions on how to modify settings of a specific image, but first and foremost, the most important reason to write a Dockerfile is to give a set of instructions on how to add layers to an image. So how does that look? Well, this is an example of a Dockerfile.
And basically it just builds those layers on top of each other. It says FROM ubuntu; that is usually one of the first lines in a Dockerfile. FROM means: this is my base image, I am going to build upon this image. And that can be, or rather has to be, an image that is present in Docker Hub. We have been talking about versions already for a bit: usually you would also like to specify the version you need here, via a tag, and we will learn more about that later on. Let's say we don't do that. So we have our first layer, which is the Ubuntu base image. Then we do some installations using apt-get; in this case we install both Python and pip. That will be the second layer. And then the third layer will be a RUN pip install line, where we actually install a Python package using pip. Once you have the Dockerfile ready, with docker build you actually build the image. That image will appear inside Docker; you can see it using the Docker interface or the command line interface, and you will find that the image was actually built. You can then use it to make a container, or, for example, share it through Docker Hub. So, this Docker engine. Thomas has a question, I think, or not. Oh, your hand is still there, I think. Yes. Okay, so to manage all these images and containers, we use the Docker engine. The Docker engine is a daemon process, meaning that it basically always runs in the background, and it manages both the images and the containers. A nice thing about this daemon process is that, for example, these layers are handled very efficiently. So if you are in the process of building a container, what you can do is, for example, build the first part of your image, add some lines to your Dockerfile, and build it again.
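A minimal sketch of the Dockerfile just described, plus the build command (the pinned package, numpy, and the image name are only illustrative):

```shell
# A sketch of the Dockerfile described above
cat > Dockerfile <<'EOF'
# Layer 1: the Ubuntu base image
FROM ubuntu
# Layer 2: install Python and pip with apt-get
RUN apt-get update && apt-get install -y python3 python3-pip
# Layer 3: install a Python package with pip (numpy is just an example)
RUN pip install numpy
EOF

# Build an image from the Dockerfile and give it a name with -t
docker build -t my-python-image .
```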
So what the caching mechanism does is: it will not try to rebuild the entire image from all the lines in the Dockerfile, but will reuse the layers that have already been built, and only build the new layer that was added to the Dockerfile. That makes development of these containers very efficient. It can also reuse layers across images: if you have two images on your computer that share layers, for example an Ubuntu image and an image that is based on the Ubuntu image, it will not store both full images on your computer. It will only store the layers once, and it knows which layers are shared between different images, and also between different containers. So that's very nice. However, one thing that is maybe not so nice, it depends on how you look at it, is that you cannot interact with containers and images through files. They are not available to you as files; there is no file containing the actual full image sitting there for you. The interaction with these containers and images only happens through the command line interface, or the graphical user interface, but usually the command line interface, which is brought to you by the Docker engine. So, some examples of this command line interface that you will use a lot when working with Docker. This one we haven't seen yet: to get a list of images, you can type docker image ls, and it gives all the images that are available locally on your computer. You can of course always add images just by downloading them from Docker Hub. And this one we have seen before: docker container ls, which gives you all the containers that are currently there on your computer, and it will also tell you which image each container was based on. So let's say you have built your great image, for example with the figlet installation, and you think: well, this is a great image.
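For example (the pulled image name is just an illustration):

```shell
# List the images available locally
docker image ls

# List containers; add -a to also include stopped containers
docker container ls -a

# Add an extra image locally by downloading it from Docker Hub
docker pull ubuntu
```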
I want to share it with the world, or at least upload it to some repository. If you are working with Docker and you don't mind other people being able to download your image, you usually use Docker Hub. It is very easy to interact with Docker Hub from the Docker command line, because what you do is just type docker push and then the image name, and it will push it to your repository on Docker Hub. That also goes quite efficiently, because Docker checks which layers are already there on Docker Hub and recycles those layers when uploading your image, so it handles storage very efficiently. Of course, there are also alternatives to Docker Hub; a very frequently used alternative is quay.io, which is owned by Red Hat. For Singularity images we have Singularity Hub, and there are both the GitLab and GitHub container registries, and many, many more. Ruben, did you have a question? Yes, I have a question regarding Docker Hub. Does it also work as a version control system, or is it just a repository where you upload the image without tracking versions? It's not there for version control. Usually you do version control at the level of the Dockerfile. It is very common to have your Dockerfile under version control with Git, on GitHub or GitLab, and from that Dockerfile you build an image, which you then upload to Docker Hub. That doesn't mean you cannot have multiple versions of the same image; actually, it's very common to have multiple versions of the same image, and we do that with tags. We will learn more about tags in the exercises. One thing that's important to know is that there is one very important tag, and that tag is latest. You can kind of compare latest with the master branch in Git: it is basically the latest image that is there.
So usually people want to download the latest image, because it's the latest version, and the different versions can then have different tags: you can have version 1.1, 1.2, 1.3, and that is specified in the tag. If you push directly to Docker Hub without specifying a tag, it will just overwrite latest, which is quite similar to what happens with Git if you don't create a branch and push directly to your repository: you kind of overwrite the master branch, or the main branch on GitHub. That's roughly how it works, but you will play around with it quite a bit in the exercises. An alternative to using repositories is the command docker save. docker save stores your image as a file, so that way you can really share your image with others as a file. But then you lose the functionality of all these layers: it will store the full image in that file, and if you then have a different version of your image and want to store that as a file as well, it will again store all the layers in that second file. So in terms of disk space, that is not a very efficient way to store your images. But if you don't want to rely on Docker Hub or any other repository, it can be a way to save and share an image. Another way to share an image is through the Dockerfile. Of course, that's not actually sharing the image; it's sharing the description of the image. But two things are very nice about a Dockerfile. You know exactly what is done to build the image, so if you want to change the image, you have a pretty good idea of how to change the Dockerfile and rebuild the image. Another nice advantage is, of course, that it's very small: it's just a flat text file.
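A sketch of tagging, pushing, and saving (the user and image names are placeholders):

```shell
# Tag the image with a version and push it to your Docker Hub repository
docker tag my-image myuser/my-image:1.1
docker push myuser/my-image:1.1

# Pushing without specifying a tag updates the "latest" tag
docker push myuser/my-image

# Alternatively, store the full image as a single file...
docker save -o my-image.tar my-image
# ...which the recipient can load again with
docker load -i my-image.tar
```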
So a disadvantage of only sharing the Dockerfile is, of course, that if things change and versions are not properly specified inside your Dockerfile, rebuilding an image from the same Dockerfile can give a different image two months later, just because software versions changed in the meantime. So, and let's say this is the conclusion of this story, if you share an image, you usually share both the image and the Dockerfile together. Okay, I have a question for you. My question is: what is the best way to share an image in the context of reproducibility? Is that as an image on Docker Hub, as a tar file, or as a Dockerfile in a Git repo? And here there is not one correct answer; it's mainly to ask what you think is the best way to share it in terms of reproducibility. Most of you have voted, so we'll stop. Okay, most of you thought: as an image on Docker Hub. Not a lot thought: as a tar file, and second most of you thought: as a Dockerfile in a Git repo. Well, all three of them, as we just discussed, have their advantages and disadvantages. If you share your image as an image on Docker Hub, then somebody downloading that image gets the exact copy of the image as you intended it to be. However, the person using that image locally doesn't really know anything about its history: how it was built, what software was used to build it, which installations were done. And that is not super reproducible, right? Of course, you can reuse the image, so in that sense it is reproducible: it does exactly the same thing. But it's not very interoperable, meaning that you cannot really disentangle the image and adapt the thing that somebody else built so that it can actually be applied to your specific problem. It is then quite difficult to apply it to something slightly different.
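One way to mitigate the "different image two months later" problem is pinning versions in the Dockerfile, which could look like this sketch (the tag and version numbers are just examples):

```shell
cat > Dockerfile <<'EOF'
# Pin the base image with an explicit tag instead of relying on "latest"
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y python3 python3-pip
# Pin the Python package version (1.26.4 is just an example)
RUN pip install numpy==1.26.4
EOF
```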
As a tar file, well, it is quite similar to sharing via Docker Hub, with the disadvantages of having to save the entire image and having to ship quite large files. As a Dockerfile in a Git repo, the advantage is that you know exactly how the container was built, but rebuilding it a few months later might not result in exactly the same image. So that's why the best way to share an image is both as an image and as the Dockerfile. Back to the presentation. Okay, so this was about building images and managing containers a little bit. Now I would like to discuss features that are quite specific to Docker, that are frequently used, that you will probably also use frequently, and that you will apply in the exercises later on. One of them is mounting directories. That is of course a very nice feature, because then you can have files stored on your computer, for example input files, FASTQ files, whatever, use those inside your container, do a calculation with them inside your container, and then write the results back to a directory on your computer. You can do that with Docker. Managing identities is quite important, partly security-wise, because, depending a little bit on your operating system, we have seen that we can be root inside a container. If you then write a file to a directory on your computer, it depends on the operating system whether that file is written as root or as another user, and it's quite important to understand that, because at some point, depending on how you mount directories and what you want to do with the output files, it can become problematic if files are written as root outside the container. Third, I will quickly talk about mapping ports, because if you are, for example, hosting a browser interface inside a container, you have to specify which port it is on and map that port to a port on your computer in order to actually see that browser interface.
About mounting: there are basically two frequently used ways of mounting volumes or directories. I think the most frequently used one is the bind mount, which basically makes a directory that is managed by your own operating system available inside the container. So that is really a local directory made available inside your container, and the container can easily use files on your computer. The second one is a volume mount, which is quite different, because a volume is entirely managed by Docker and entirely isolated from the host file system. That means you cannot really copy-paste files into it, but if you have, for example, multiple containers running, they can all mount that volume. It is entirely managed by Docker, so in a way it's quite safe, and it also makes it quite easy to have containers communicate, for example through that shared volume. But for you, especially as beginner users, the bind mount is the one you will use most frequently, and that's also the type of mounting we will use in the exercises. Then a few words about identity. Let's say you have a directory on your host computer and you have mounted it inside the container. Falco has a question. Can you hear me now? Yes. Okay. With regard to the sharing, or interactivity, between different containers: why would a volume necessarily be better for that than a mounted directory? Can you not share mounted directories? You can, definitely. But the mounted directory is fully managed by the host computer, meaning that, for example, if you are mounting a Windows directory into an Ubuntu container, there are some limitations there. Whereas if you are mounting a Docker volume into a container, it is fully managed by Docker, and therefore also by the container.
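A sketch of both kinds of mount (the paths, volume name, and image are placeholders):

```shell
# Bind mount: make a host directory available inside the container at /data
docker run -v /home/me/data:/data ubuntu ls /data

# Volume mount: create a Docker-managed volume and mount it;
# several containers can mount the same volume to share files
docker volume create shared-vol
docker run -v shared-vol:/data ubuntu touch /data/results.txt
```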
Therefore, if you always require, for example, a shared directory between containers, and you want to ship that group of containers as an application, then you don't want to rely on how the directory or the file system is managed by the host operating system; you want it to really be managed by Docker, in order to be more reproducible. Otherwise some things might work on your Ubuntu host operating system but not on a Windows operating system. So it's just another level of isolation, and therefore reproducibility. Makes sense, thank you. So, an example. Let's say we have mounted a host directory, part of the file system of our host computer, inside a container. Inside the container we are root, and if we are on a Linux system and write a file to that particular directory, that file will also be owned by root. You can also change the user inside the container: you can use the same user ID and group ID as you have at the moment you are running the container. If you are that identity inside the container, the files will also be owned by that identity outside the container. Being root inside the container can have a lot of advantages, of course, because you can do all the installations and everything, but if you are running the container for a particular purpose, you usually don't want all the output files to be owned by root, because that can give security issues, for example, but it can also limit what you can actually do with these files once you are back to your original user identity. This is different on other systems: on both Windows and macOS you can be root inside the container, but everything you write to a directory mounted from the local host will always be owned by the identity that is actually running the container, so it doesn't give too big an issue there. However, a lot of people run containers on Linux, so you have to take this into account.
And in order to take that into account, you specify the -u option. The -u option tells Docker who you want to be inside the container. If you are developing a container, you want to be root; but if you are actually using the container, actually writing files to your local directories, then you want to be exactly the same user inside the container as outside the container, and this line will help you take that identity over. Then the third one: mapping ports. Containers are nowadays also very frequently used to host browser content like JupyterLab or RStudio Server. You can have all of these inside a container, and that is very powerful, because then you can do all kinds of installations inside your container. For example, a lot of people who use R might know that all these dependencies in R can give you quite a bit of headache sometimes, especially if packages need compiling. So they install all the packages they require for a particular project inside a container, use an RStudio Server container, and then interact with the container through the browser interface, through RStudio or JupyterLab, for example. One thing you then have to do is map ports. The browser interface is hosted inside the container at a certain port, so you have to tell Docker: okay, emit this browser content to a particular port on my computer. You do that with the -p option, and it looks like this: the first port you specify with the -p option is the port you are mapping to, outside the container, so at the host; and the second one is the port you are mapping from, so inside the container.
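As a sketch of both options (the image names and port numbers are just examples):

```shell
# Run as your own user and group instead of root, so that files written
# to a bind-mounted directory are owned by you on the host
docker run -u $(id -u):$(id -g) -v /home/me/data:/data my-image

# Map port 8888 inside the container to port 8888 on the host:
# host port first, container port second
docker run -p 8888:8888 my-jupyter-image
```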
We will see an example of that in the exercises, where you actually run JupyterLab inside a container and map some ports. Okay, now we have some more advanced exercises. In the previous exercise we exited the container, and you could see the container was still there. In the first part of the next exercise you will learn how to reattach to that container: being able to go inside that container again after you have stopped it, so actually starting it up and attaching to it again. You will create an image with docker commit; that's the example we've seen where you actually freeze your container into an image. And you will use a container in a non-interactive way. In the previous exercise we used the options -i and -t in order to have an interactive session inside a container, so you could use the command line and play around in the container. You can also use your container kind of as an executable, meaning a non-interactive run where it does only the thing you want it to do and then just exits. Then removing a container, because if you run a lot of different containers from different images, you can imagine the list of containers becomes very, very long. Pushing to Docker Hub: if you have created an image, how to add it to Docker Hub. Mounting a directory, so a local directory, not a volume; we will not be doing volumes in this course, but mounting a local directory, so you have your local files available to your container. Managing permissions, the one I just talked about: who are you inside a container, who are you outside a container, and some exercises to understand how that works. And, oh yeah, the last exercise will be about mapping ports.