All right, we're at time. I'm going to go ahead and get going; folks will trickle in a little bit. I'm going to apologize in advance: I'm losing my voice, so hopefully I don't sound too much like a frog. My name is Adam Miller. I work at Red Hat on the Fedora engineering team. Today I'm going to be talking about creating a reproducible build system for container images, so Docker, anything OCI-compliant at the runtime. The topics we're going to go through: first, I want to define what containers are in the context of Linux systems. Is this better? One, two, is that good? Good, better? All right, fantastic. So I'm going to talk about what containers are in the context of Linux, a very brief history of Linux containers in general, and just kind of talk about them as an abstract concept, just to set a baseline. I'll very quickly talk about Docker. I'm going to assume, by the title of the talk and the fact that there are people in attendance, that everybody knows what it is and how it works, but just in case we will quickly go through that. Then I'll briefly define what release engineering is, and the reason I do that is for a little bit of background about why certain decisions were made in the design of this particular build system, why it's not just a single little script or something like that firing off docker build somewhere on your laptop. And then I'm going to actually talk about the Docker layered image build service: how it's built, its various components, so OpenShift, the OpenShift build service, which is OSBS, and then the Koji container build integration. That part isn't relevant if you're not running Koji, but if you happen to be running Koji, which is a build system in its own right and what Fedora uses, I'll talk about that a little bit. And then specifically Fedora's implementation, and how we're using it and why we're using it and those kinds of things. So now that the agenda has been laid out, if this is not something that would be interesting to you, please feel free to go find something that would be. I do not want to hold you captive; you will not offend me if you get up and exit. I would prefer that you get something that does interest you out of the conference and out of the sessions. All right, I'm gonna go ahead and kick it off. So what are containers? From an academic standpoint, containers aren't really a thing; it's generally considered operating-system-level virtualization, and we in the greater Linux community have decided that we're just gonna call that thing containers. Basically what that means is that you can take a piece of software and run it in a multi-tenant environment completely isolated from other instances of that software. You can effectively lie to it about its context, and you should be able to do that without having to change that software. You shouldn't have to actually alter the software to redefine its execution environment. And I'm going to propose something that some of you may or may not laugh at, and that's fine. Containers are not new, and I think the original container was the chroot. And the reason that I say that is because it allowed us to do that. It allowed us to lie to a piece of software about its surroundings. The piece of software thought that it had a file system tree under it, and it could, or maybe it didn't.
Maybe it just thought that it had a directory underneath it, and that it had a certain structure within that directory, a subdirectory structure, and the files within it were laid out a certain way, whereas how they actually existed on the file system, that wasn't true. It was very unsophisticated. You couldn't necessarily confine certain things about resources, or about the file system — I mean, you could if you had a very sophisticated file system, but at face value, no, not necessarily. And you also couldn't do anything about networking. With chroot by itself, you couldn't really do anything there. And a handful of other Unix-style operating systems took this concept a bit further. FreeBSD jails, to the best of my knowledge, was the first one to really take off with this in 2000. There was also Linux-VServer, Solaris Zones, OpenVZ, and LXC. And the reason that I think LXC is where things get interesting is because this is the first time that something had actually been merged upstream into the Linux kernel. There's a lot of kernel capabilities that are now widely available to all users in all distributions, and LXC made that easily and readily available. So IBM created this tool chain — basically a handful of command line tools that let us do certain things and wrap a process in cgroups and namespaces and those kinds of things. And those have been around in the kernel for quite some time. Around 2011, systemd-nspawn happened, and that's technically containers in the same line of thought. But interestingly enough, that was created purely to be able to test systemd without having to reboot systems all the time. And then some people kind of took that and ran with it — there's really good integration with networkd, and Rocket, for example, is based on it. But anyways, 2013, this is where things got really big and got really popular. Once upon a time, Docker, Incorporated, was called dotCloud, so at the time in 2013, when this took place, this was a true statement: dotCloud released Docker upon the world. And this got very interesting because it introduced a tool chain that kind of standardized how we create and ship what is a container image, what eventually becomes the unit of compute known as a container. And we never really had that standardized format before. We never had that way to build them and then pass them around the internet. Yeah, okay, you could tar up a file system and throw it around and run it with the LXC command line tools, and there were a handful of things that were trying to do that. If anybody was ever familiar with OpenShift version two, we had our thing called Gears, and that was like an SELinux sandbox and cgroups, but at the same time, it didn't have that very simple barrier of entry to be able to just kind of fire it up on your laptop. Okay, so from there, CoreOS came out with Rocket. They defined a couple of specifications. In 2015, the Open Container Initiative happened, and there is a laundry list of sponsors involved here. So that is kind of a big consortium around trying, again, to define that standard of execution, that runtime and that image format. From there, the CNCF popped up — the Cloud Native Computing Foundation. They kind of want to go beyond what the OCI is based around and do all sorts of orchestration and that kind of stuff.
runC happened in 2015 also. runC is a reference implementation of the OCI runtime and it will run OCI images. Also, just as a fun point of trivia, as of Docker 1.11 and anything newer, runC is actually what's running under the hood, and then containerd, in 2016, also 1.11 and newer, is now a daemon that orchestrates runC — threads, I guess, or just execution. And the reason that I talk about this is because it gives a good idea of where we've come with everything in the container ecosystem, but also to show, for those who are not familiar, there's a lot of different people out in the ecosystem working on container technology, and under the OCI they've come together to try to define this runtime. There's a couple of others on here that I kind of debated whether or not to introduce, but the idea is having this specification for the runtime, and if we have a common image format, then we can just kind of share that piece. I'm gonna specifically talk about Docker today, though, because that is, probably without much argument, the most popular implementation there is. It is also the one that Fedora initially targeted because of its popularity, because of the demand from our users, and also just from the standpoint that when we kicked off this initiative, the OCI didn't exist yet. So for those not aware: Docker, or what is now known as Docker Engine, used to just be called Docker. It's the daemon, it's a single point of entry into your execution environment, it has an API, and it can be accessed remotely or locally. And then we also have language APIs, so if you wanna interact with it programmatically, you can do so. I don't know if anybody knows, there's a thing called docker-py, it's the Python bindings — I don't know if anybody here is a Python programmer, but that's what I've done most of my work with, because the system that we wrote is written in Python. So images are built from a thing called a Dockerfile, and that's kind of your specification and your entry point into the build pipeline. And then a container itself is actually an instance of an image, much in the same way that you have instances of virtual machine images in infrastructure-as-a-service clouds. So your Amazon AWS, your Google Compute, your OpenStack — Eucalyptus still around, yes? Okay, cool, yeah, all right. So Eucalyptus, very similar idea; OpenNebula's still doing stuff. There's a handful of them. But it's the same idea: you have this image that becomes an instance that you run. However, one thing to note is there's a difference between your base image and your layered image. So the base image is traditionally provided by your distribution provider. That's gonna be your base runtime for all of your other applications to run on top of. So for example, if for whatever reason we wanna take a Fedora 24 application and run it on a Fedora 25 host, we can do that because the base image for Fedora 24 is available, and we can install that application in there and run that on a Fedora 25 host. And the reason they're different is because there's kind of this special, I guess, verb for lack of a better term in the Dockerfile called FROM scratch, and FROM scratch means that you basically just pass what should be the root file system of this image. So for Fedora, we provide that, we build that in our build pipeline and we release that, and so does everybody else. So Debian, openSUSE, Arch, Ubuntu — I don't know, is Gentoo doing Docker images?
Yes, anyone? Okay, cool. So you have your base image, and then you build your applications on top of that, and your application's base image doesn't necessarily have to be an image that's built from scratch. You can actually have multiple layers, and your layered image could potentially use another layer. But I try to distinguish between them because there is a difference: a base image is something that is generally provided for you or managed in some way, as opposed to being where you install your application, and the layered image is gonna be where your application is. However, I will put an asterisk on that, because sometimes some folks are doing, like, some Go stuff where they have a statically compiled, statically linked binary and they're just doing FROM scratch and dropping their binary in there, and that's how they run. And that's fine, I guess, just depending on how you do things, but there's a difference. So a Dockerfile — by show of hands, who's familiar with Dockerfiles? Most of the room, okay, cool. Basically, you have your FROM line, and this is gonna define what your base image is, so what you're building from, what you're gonna build on top of; who the maintainer is; run some commands; expose potentially a port or two; add files; run some more stuff; and then have a command entry point. And there's a lot more — there's a lot more directives than this, and this is just kind of a quick example. And then you do a docker build, and this is your output, and you notice here there are different layers. There's step one, step two, and they have these hash values there, and there are different layers, and at the end you don't actually have those layers anymore, but they are intermediate layers, and for each RUN command you get a new layer, and you can squash them, but you don't have to. Okay, so release engineering. What is release engineering? I'm not gonna go too far off in the weeds with this, but effectively you're making a software production pipeline that is reproducible, auditable, definable and deliverable, and also as much as possible should be automated. For most intents and purposes we shouldn't have to put hands on a keyboard unless something goes wrong. And there's a definition there if you care to read it; I won't read it at you, I assume everybody here can read. But I do wanna show this diagram, and I know that nobody can read the font in all these little cells, because I can't either. Does anyone wanna take a guess what this diagram is? This is the dependency chain that is required to compile the Linux kernel. Just the kernel. So this is what we need, from a distro provider standpoint, just to deliver the kernel. And the reason I like to show that graph is just to show how much effort goes into trying to sanitize a build environment just for the first thing you need. Yeah, I guess not technically the first thing — you need a bootloader first. So one of, and potentially the most important, package out of — so Fedora has 15,000 source packages, not binary, but source. For one of those, we need that dependency graph. All right, moving on: OpenShift. OpenShift is a container platform. It is open source, by Red Hat. And it is a Kubernetes distribution. Show of hands, who knows Kubernetes? All right, cool. So basically Docker is very powerful and it makes it very easy to run multiple containers on one host.
It's very, very good at that, and it's very good at providing a low barrier of entry there. And Docker has the ability, if you use Swarm, to scale out. Somewhere around — I'm gonna mess this up, I think it was 2014 — Google came out with kind of their opinionated container platform and said, hey, we've been doing containerized services for a long time and this is how we think it should be done. And somewhere along the way, a handful of people jumped in and were like, hey, we think that's a good idea also. So Kubernetes is now open source, and OpenShift is now built on top of that. And the things that OpenShift adds to it are developer focused. One of those things is a build pipeline. And while it doesn't constrain the build pipeline, which is what you need from a release engineering standpoint, it does provide a very nice solution for scaling that out. So if you look at this diagram, in the green section there's a thing that says build automation — I should actually have highlighted that some way in my slide. But for those not familiar, this is kind of what the architecture looks like. And I say OpenShift slash Kubernetes because this is basically what the architecture looks like for both. So you have your master and your nodes, and your master is gonna have an API endpoint and a scheduler, and your scheduler is gonna schedule things on nodes. And there are gonna be pods. A pod is either one or many containers that share kernel namespaces; they share network space, they share an IP, they share IPC, they share storage, all kinds of different things. And they're the smallest unit of schedulable compute within a Kubernetes environment, and again, the same for OpenShift. And the reason I point that out is because by leveraging this already scalable cluster environment, we now have a build pipeline that can scale as well. So if we hit a point where we have too much going on in the build system, we can very easily add more nodes and scale it out. Next up, the Docker layered image build service. So there's a lot going on here. I'm gonna talk through it. The next two slides after this are gonna be walls of text that I will effectively have read — well, not actually read to you verbatim, but have talked through already — so we'll kind of skip those. The goal there was that my slides would be useful to somebody who just reads them later. So we have the OSBS CLI, and that is a command line tool that allows you to interact directly with OSBS. From an architectural standpoint, what we've done is, number one, we've created an OpenShift custom build. So in OpenShift's build pipeline, they have this thing called a build primitive, and for that build primitive they have a standard set of build types and then also the custom build. So we created a custom build, and then we wrapped the components that need to go in that custom build in an API. And that API allows us to enforce the appropriate information being provided. So the only way to interact directly with our build system is through that API, and that gives us input sanitization. And then that build will go into OpenShift — and in Fedora's case, we use OpenShift Origin, which is the upstream, community-focused version. Then Atomic Reactor. Atomic Reactor is actually a single-pass Docker build tool that has a very large set of plugins, and those plugins can do all kinds of things like adding or removing input sources.
So if you are an RPM-based distribution, it can add repositories. If you're doing something with, like, PyPI or RubyGems and you're mirroring that content, it can inject those into the build environment so you can limit where artifacts come from during the build. It can also do things like automatically uploading the image to some registry when it's done. It can do metadata validation afterwards, or metadata gathering. So one of the plugins that we have actually does artifact metadata gathering so that we can upload into our build system's database what that container image that got built contains. So we know, for historic reasons and auditing reasons, what each image has in it, all the way down to the name-version-release of each component. Then the OSBS client API, which is that API I mentioned — that's how the CLI talks to it and that's also gonna be how the plugin that I mentioned previously hooks into it. Sorry, the hot tea helps with the losing-of-the-voice thing. So we have a registry, and we technically have multiple repositories inside that registry in just a base OSBS deployment. And every time a build finishes, it's gonna automatically be provided in that candidate registry. And what's great about the candidate registry is the fact that we can immediately, as soon as the build finishes, have that — based on webhooks or however you wanna do it — kick off some sort of a CI job, and then the image can immediately be pulled and tested before it's then promoted. Show of hands, who went to the talk on Thursday from the gentleman from JFrog about patterns and anti-patterns in Docker image lifecycle management? Okay, cool. All right, decent amount. It was a great talk, and it talked about this idea of dealing with the promotion of artifacts and keeping the inputs in a known good state so that you can then graduate the image through. And that follows a pattern known as immutable infrastructure, which to the best of my knowledge was coined by — I'm gonna mess this up. I'm thinking it's Martin Fowler, it's either Martin or Chad — is it Martin? Martin, okay, Martin Fowler. Yeah, so you have this single build artifact that, once it's created, we don't mess with it. We just kind of graduate it up to the point at which it's ready for production. So we follow this very similar pattern, and then as our inputs we have our content streams, and we can mirror whatever content we want as input — and when I say mirror, I mean we do actually gate it. We wanna gate it, we wanna sanitize it, we wanna have it almost curated, for lack of a better word. We wanna curate that content stream just to make sure that our inputs are known good and nobody's doing anything silly. Another thing that we do is we have a build root, which Atomic Reactor — so our build root is going to be a minimal Docker image, and that is where we perform the builds inside of. So we actually have a Docker image with Atomic Reactor that is then kicking off a second Docker image for the build. So we have a controller and a builder, and inside of the build root it's confined. So we actually have it locked down at the build root level, so we will not allow somebody to run, like, curl piped to bash in their Dockerfile. We will actually fail that build. Question? Okay, so the question was, because I mentioned the JFrog talk, what's the strategy for the registry? Do we have multiple registries? Do we have multiple repositories? Are we tagging things?
So the answer is we're tagging things in just base OSBS, and I'll talk a little bit about Fedora's setup and how we augmented that a little bit. But by default, every build actually gets a unique identifier — a date-time plus random UUID stamp — and then you can also define the name-version-release of your component. So in your Dockerfile, if you provide three pieces of metadata as labels, those will be used in creating a name-version-release tag for the resulting image that goes into, by Docker terminology, the repository inside the registry. Okay, and here's the wall of text, and here's Fedora's implementation. So this will be a little bit more long-winded, but it will build on what we've already talked about. So in Fedora, a little backstory: content today is contributed by maintainers. So all of the RPMs that people install — and this is gonna be very similar with every other distribution in the world, so Debian, openSUSE, Arch, Ubuntu, I don't know, Mint — yeah, Mint, they do, they have their own builds. Gentoo's gonna be a little bit different; they're doing ebuilds, so they're not actually building packages, but somebody still maintains that thing that gets taken in and then created and redistributed in some way. So there's a maintainer in the same fashion, and so we have Fedora layered image maintainers. And the layered image maintainers use our setup we call dist-git. So dist-git is distro Git, and the idea is that you have a Git repository for every component of your distribution. So for RPMs, in our dist-git environment we have an RPM namespace, and inside of the RPM namespace we have a Git repository for every single RPM in the distro, and each repository has a branching strategy: the master branch is the development branch of the distro, and then every other branch is a version release of the distro. So in Fedora, we're currently on Fedora 25, so the development branch will eventually become Fedora 26. We have the freeze and branch phase where we branch the distro and everything, and what we mean by that is we're literally branching every package. So anyway, dist-git for RPMs will be spec files; for the container namespace it will be Dockerfiles. So we'll have our Dockerfiles, our service scripts if your service needs a script of some sort for startup, and then tests and then documentation. And you probably can't read this, but right here there's a little fedpkg — that's a command line helper tool for Fedora packagers to be able to interact with all of our systems and submit builds and updates, all kinds of things. So fedpkg container-build will then kick off a task in Koji, which is our authoritative source of all binary artifacts in Fedora. So Koji is where RPM builds happen, that is where the container builds are scheduled and all of the metadata is consumed, that is where we create our ISOs and our cloud images, our ARM images — effectively anything that release engineering curates together as a single release and pushes out to the world goes through this system. And the reason for that is you kind of need one thing to be authoritative. From a release engineering standpoint, you have to be able to ask a source of truth, and this is ours, and it has a PostgreSQL database on the backend; it's backed up, it's very nurtured, it's very cared for.
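To make that concrete, here is a minimal sketch of the kind of Dockerfile a layered image maintainer might keep in the container namespace of dist-git. The exact label keys OSBS expects aren't spelled out in the talk, so the keys and package below are illustrative assumptions; the point is just that name/version/release metadata in the Dockerfile becomes the tag of the resulting image.

```dockerfile
# Hypothetical layered-image Dockerfile; label keys and package choice
# are illustrative assumptions, not the exact keys OSBS requires.
FROM fedora:25

LABEL name="httpd" \
      version="2.4" \
      release="1"

# Content comes only from the curated, gated repositories that the
# build root makes available during the build.
RUN dnf -y install httpd && dnf clean all

EXPOSE 80
CMD ["httpd", "-DFOREGROUND"]
```

From the matching dist-git branch, the maintainer then submits the build with fedpkg container-build, which files the task in Koji rather than running docker build locally.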
So the container build's gonna kick off in there, and then that will be scheduled in OSBS, which is — so this little box down here is the box that was up in the top left of the slide before the walls of text. And now it's gonna do the build, just like we talked about, except instead of having one registry to rule them all, we have two registries in Fedora's space, and you can set this up optionally if you deploy OSBS for yourself; the documentation's out there. So you push this up into the candidate registry, and as soon as the candidate registry is done — we actually don't have it yet, so I didn't put it in the slide here, but we will have CI jobs, because everything in Fedora's infrastructure has a message bus, so when the build is done and that image is available in the registry, there's a message sent on the bus, and our CI system can then consume that message and kick off CI and then automatically promote that to stable. But yeah, so once the build is done, we will actually bring all of the metadata about the build back into Koji's database. And here's an interesting point about that: that is our method of persisting historic data about the build, because if there's any point in time where OSBS falls off the map, or we lose data, or something happens and we need to rebuild all of those artifacts, we can, inside of Koji, create custom repositories of RPMs to then feed in, and verify that they're the same name-version-release of every component that will go into the container image, and we can literally reproduce that container image down to the name-version-release of every component inside of it. And we can do that because we have the ability to gate our content streams. So right now we have a standard two-week release cycle on our container images. We also will do ad hoc releases, so if, like, Shellshock happened, we would get a build out. Right now we have a standard two-week release, it goes out to the registry, users can pull it, and we go from there. So again, a little bit of text. Dist-git, fedpkg — we talked about those. Koji container build is the thing that links those together, and the registry. The registry is the upload/download point. I probably should have clarified that, but for those in the room who have used Docker, they know. For traditional Linux systems, you're gonna have a repository. So if you're a Debian base, it's gonna be apt — well, yeah, apt-get or aptitude — and if you're Arch, it's Pacman; your package manager is gonna go out to a repository of some sort to get the software. In Docker vocabulary terms, there is a registry, and inside of the registry are multiple repositories. The repositories are named, and then inside repositories are gonna be tagged images. Yes? So the question was about Koji — if Koji builds Docker in Docker. So Koji doesn't actually build the Docker image itself. Koji schedules the build in OSBS, and then there's a thing called a content generator metadata import, such that OSBS is a content generator, and once the build is done, we re-import all the metadata so that we could reproduce that build if we need to. So I wanna round back to what release engineering is and make sure that we have satisfied the requirements. So we want something that is reproducible, auditable, definable, deliverable.
So from a reproducibility standpoint, because we have the ability to limit our inputs and curate our content, and we also have an output manifest of everything that went into our image, we can then recreate a set of content to be the inputs into a new build to recreate that image. From an auditability standpoint, because we have those manifests, we can store them persistently and then verify where images with checksums come from — at the end of the build, from the build log, we are given a checksum, and we can verify that checksum maps to what we're actually getting. Definable: OSBS defines the OpenShift build, so if the definition is violated, we can fail builds. So we are able to actually define the inputs and define what it means to have a build go through and succeed. And then deliverable: we have a method by which we gate and promote content out to the world. And, okay, so I have a pile of references. I didn't just randomly pull this out of thin air. I've spent about the last year of my life working on this system, and these are various references of things that I've read and/or worked with. When I did my dry run of this, it took me about 10 to 15 more minutes; I either spoke very fast or I glossed over some things. Do you have any questions? Question? Absolutely, yes. Okay, so the question is: normally in a Dockerfile, your input would be some kind of set of content from, like, a Git repository or somewhere out on the internet. And in this system, does that mean that our input would be, like, an RPM? Well, your input would be very similar. So in the sense that you do today — I don't know, what's your distro of choice? Okay, whatever. All right, so for the sake of example — as a Fedora representative, I'm obviously gonna say Fedora, but for kicks, let's go with Debian. You would apt-get install some software and then you would put your software on top of that. So maybe you would git clone your software. So the repository of .deb packages — or Deb or DPKG? No, dpkg is the program, .deb is the file format. So .deb — you would have a repo of that, and that would be your input stream. And then you would have another input stream as your Git repo. And those two things would be whitelisted. Where the input stream of Debian packages — let's say you, on your internal team, from a release engineering standpoint, want to gate that and you're not just gonna pull from the internet. That's gonna be a curated set of content that is known to be good, that has gone through some sort of validation and is available and ready to be pushed out to your systems. That's what I mean by the content stream. Great. Question. So the question was: can I, as a user, verify and audit the build? Correct? Okay. So the answer is yes, but it's not super simple. Yeah, I mean, it's not super simple, but all the information is made available — I mean, especially for Fedora. Everything we run is open infrastructure. You can actually, as a person off the street, join our infrastructure team — they have an onboarding process — and all of our systems are openly available and all of our Ansible playbooks are in a public Git. The only thing that's not up there is the private Git repo with the vault and the passwords in it. Yeah, so all the data's out there. The logs are out there, the metadata's out there. To reproduce the build, you would mostly only need two things.
You would need Atomic Reactor, which is the command line tool I talked about, and then you would need a way to create the curated input stream. And if you could do that — and the reason it's kind of vague is because it depends on what your input stream of choice is. So, for example, for RPMs, you just need to have a local RPM repo, and then you can tell Atomic Reactor, hey, this is my input, and you could do it that way without having to run the whole system. The whole system is mostly about, number one, being scalable in our environment, and number two, integrating with tools that we already have. So yes, I mean, the answer is yes. The problem I see with that is it's not as easy as it could be, and I don't know if we have a really well-defined set of documents for that specific process, like from ground zero. Some of it kind of assumes a little bit of background knowledge. So yes. Okay, so the question is — I think the question assumes we're trying to solve configuration management, and we're mirroring content to be able to do that instead of pinning versions. Okay, so number one, we're not attempting to solve configuration management. What we are attempting — so right now, in the history of Fedora, the only true artifact we shipped was RPMs, and then things built out of RPMs. We are now getting to a point where we want to actually ship containers as a thing, and then you would install containers or sets of containers to provide a solution, instead of RPMs or sets of RPMs to provide a solution. So we are actually trying to create that fundamental build artifact, that image thing that you as a user would download. And in terms of mirroring, it's not so much that you have to mirror — you absolutely could pin. It's just that, going back to the giant dependency graph, in the pool of 12,000 components that Fedora has to deal with, if one of them pins to a version that is incompatible with the version pinned by another, nothing will work and we can't ship anything. So we had to find a way to handle curating the input stream in another way. So it's not so much that we're mirroring the world; it's that we already have the world, because we are a distro provider. However, if you wanted to do version pinning inside your Dockerfile, absolutely — that's a perfectly valid solution for somebody who doesn't have the weird problem we have. Because in a very realistic sense, the problem that Fedora has, maybe only half a dozen other projects — let's go to the upper bound and say 20 open source projects on the planet — have to deal with. We produce over 20,000 binary RPMs, but our input is 12,000 source repos. So yes, that is a perfectly valid solution. Yes, yes, absolutely, and that's good feedback. If I ever give this talk again, I'll know to frame the reference of why we're doing this: because we have the world. Thank you. Well, so, okay, so the question was — I assume you're going to the second part. Okay, so the original statement was that the claim is that we have a checksum that maps, that can be verified. Yes. Okay, so that's a very good question. The question was: if there are post-install scripts that are run by the RPMs, or something else that's installed inside the container image, do we have any kind of sanitization that happens to result in an identical checksum at the end?
The answer is no, because we don't actually provide that. What we provide is the ability, if we do have to rebuild, to audit those two images and, by each of their checksums, map them back to the same set of input data. So we're not doing bit-for-bit reproducibility, which, yeah, is a much more difficult problem to solve, which I'm sure you are exponentially more intimately familiar with than I am, and probably everybody in this room. But that is something that we've looked at. Do you know Dennis Gilmore? Okay, so yeah, I spent a decent amount of time working with Dennis, and that's something that we want to get to; we're just not there yet. All right, thank you for your time. Check, check, check, check. Sounding good out there? Awesome. Check, check, check. You guys hear me all right? Everybody still feeling good? Still awake? Good Saturday? Ready to rock? We've got a long way to go still. It's not yet Saturday night. All right, let's get this party started. Well, some people keep coming in, but I'll start by introducing myself. My name's Michael Hrivnak. I work at Red Hat. I'm a software engineer there. We're gonna talk about Kubernetes and how I deployed essentially a pre-Docker application in Kubernetes. Let's start with talking a little bit more about me. Why not? I wanna give you the context for what perspective I'm bringing. This is not gonna be necessarily another introduction-to-Kubernetes talk. We're gonna talk about deploying an application in Kubernetes from a particular perspective. So I'm a software engineer. My daily activities are occasionally writing code, but more often reviewing code and interacting with stakeholders, receiving requirements, receiving bug reports, triaging these things, reconciling schedules, and all that kind of stuff, which I enjoy. But what that does mean is I'm not a systems engineer, not a DevOps engineer. And so I don't deploy anything important. I deploy virtual machines on this laptop. I've deployed the occasional humorous IRC bot, but other than that, nothing important. So why am I standing here talking about deployment and giving a talk about it? Well, it's really because I want our users to have a great experience. We as a software development team — and we'll talk about these couple of roles a little bit more later — have to enable our users to have successful deployments. Particularly, I think, in the open source world, the roles of development and packaging tend to be housed under the same roof. So we do those two things, and then some unknown number of users — we hope it's a lot — go and deploy this stuff that we've given them, with the documentation we've given them, the packaging we've given them, and all that kind of stuff. So if they don't have a successful deployment, they come back to us. Anything I can do, or that my team can do, to make our users more successful with their deployments is a good thing. Anything we can do to help us spend more of our time writing software, which is what we really love, is also a good thing. The story I want to tell is from the perspective of a development team working on an application, and in particular a pre-Docker web application. So this application was started seven, eight, maybe even nine years ago now, and it simply wasn't designed with containers, as we know them now, in mind. It was designed with a different model. It does have multiple services. These are not microservices, which is what we like to see in container orchestration frameworks these days.
These are, in fact, some pretty big services. But they're distinct. They have different roles to play in a cluster. They scale differently. They have different resource requirements and consumption behaviors. So they're actually a decent fit for a deployment model like Kubernetes. And it scales horizontally. We can scale both for capacity or for high availability, or both. But it does it the old way, where when you want to scale, you add another machine. You install more software on that machine, you configure it, you add it to shared resources, and so on. It's a familiar pattern that's gotten us pretty far, but it's not the Kubernetes and OpenShift model. Speaking of the people who are deploying this stuff, we have a community of users that's pretty diverse. So we have our upstream community. We're part of Fedora, so that's exciting — that's the first distribution that we got added into. We have people who run our app on RHEL and CentOS and various versions of Fedora. Being a Red Hat sponsored community project, we're also part of two different products at Red Hat. We also have our app run in several large internal deployments; Red Hat runs it in public clouds. So if you guys are doing things like yum update from an Amazon AMI, you're talking to this app that we're talking about today. But all that variation of infrastructure causes a variation in the things that we, as a development team and packaging team, have to support. That gets back into the complexity story. So most of our users start with a single-machine deployment. Maybe they start by putting a database on a second machine. But otherwise, they take all these services that our app provides and they deploy them together on one machine. Maybe they have a test deployment or a demo that turns into production — we've all seen that. Maybe they're just not sure what their ongoing long-term resource needs are gonna be. Maybe they're not sure what their high availability needs are gonna be. For whatever reason, most of them start with a single machine. At some point in the future, they decide they need to scale. They need the high availability, or they just need more capacity. And now it's time to add a second machine, and a total rethink has to take place around shared resources. And it's these shared resources that are the primary topic I wanna address today and walk through and really get our hands dirty with, and see how we wrestled with those concepts of shared resources and fit them from this — I don't wanna call it a legacy application, we'll just stick with pre-Docker application — into a container orchestration framework kind of world. So the first of these is shared storage. This is a simple and straightforward one. Our app has to be able to share a file system among different services that write things and read things and then serve things out to clients. We'll talk about the details of this app in just a moment. But shared storage is a requirement — it's a hard requirement — and, as a matter of fact, one of the biggest reasons we were not able to recommend or really pursue containerized deployment of our application for quite some time, until it became a first-class, very mature, supported feature of frameworks like Kubernetes. But now we're in good shape. The second thing is shared configuration and secrets. So we have a service that we're now running across multiple machines or multiple nodes or multiple containers, whatever it's gonna be.
They all have to share configuration: they talk to the same database, they talk to the same message broker, they have some of the same secrets in terms of authenticating to other services, or how they handle authentication for users or other services trying to talk to them. All those kinds of things add yet another layer of complexity. And then scaling services independently is another challenge that shows up when we add that second machine or when we want to scale. All of a sudden you have to make decisions about that second machine — obviously we're talking here about the old model. On that second machine, do I just run a copy of the same services that were on the first machine and fire up more of them over there? Do I pick and choose and look for which ones are the bottleneck, and which ones do I really need extra capacity on, or which ones are really the most critical that have to survive failure of a machine, for example? Those kinds of decisions add complexity to your deployment. And these three things here are things that Kubernetes or OpenShift or frameworks like that help us with tremendously. That's why I'm very interested in this deployment model and what's beautiful about it: it can take a lot of these problems away from us as the deployment team, or the development team and the packaging team, and allow them to be handled by this layer of abstraction — software engineers love layers of abstraction — that we know as Kubernetes. So today we're gonna get our hands dirty in some YAML. We are gonna look at what makes this application deployment go. We're gonna try to connect the dots from one resource to another in these three different areas. So I'm gonna show you: what do you have to do to make shared storage work with an application that wasn't really designed to be deployed in Kubernetes? What do you have to do to take an application that was not designed with the best practices around configuration in containers, and make it work with the concepts that Kubernetes provides in terms of shared configuration and shared secrets? So let's get introduced to our application. This is Pulp. This is the application I work on at Red Hat. It's a Python web application. Not surprisingly, it's open source. It's on GitHub; everything's out in the open. The mission of Pulp is essentially to manage repositories of software. Say you want to have an onsite mirror of yum repositories — Pulp can help you do that. If you wanna keep those up to date on a schedule, Pulp will help you with that. If you wanna also manage other kinds of repositories, like Docker images, Python packages, OSTrees, puppet modules, other things, Pulp can handle all of those with one standardized API. So we're part of Red Hat's general solution for how you get onsite copies of software repositories, and then how you manage things like promotion workflows and that sort of thing. So that's the general problem area we're working in. It has a REST API. So when I call it a web application, that doesn't mean it has a graphical web interface — it does not. A REST API and a command line interface, as we'll see later. Like we've been talking about, it is not designed for container deployment. So let's see how it is designed. This looks, I think, pretty familiar to most of us. This is the most simple, probably, stateful web application that you might think of — something simple like a WordPress — where you have a web server, it stores some state in a database. Life is good, it serves requests.
But Pulp also has a requirement to do asynchronous long-running jobs. For example: go download these 30,000 RPMs from a Fedora repository, and I'll see you in a couple hours. So we have these worker processes. This is a second service. For anybody who's a Python fan, we use the Celery framework, by the way, for these worker processes. It's worked very well for us. But then for our workers, we want to be able to queue work for them in efficient and even advanced ways. We also have a message broker, so that's yet another service that we deal with. And then there's this task editor, even on top of that. Without getting too deep into the gory details, Pulp makes some guarantees about work coming into these workers: that the order in which tasks are queued is preserved, that's the order they get executed in, and for any given repository we don't have two workers working on it at the same time. So we do some fancy management of putting messages into specific queues to preserve those guarantees. And that's what this task editor is for. So this one is probably one of the closest things we have to a microservice. And then we have this thing — I'm just calling it cron here, that's not what we call it in Pulp land, but it's the most general term. It sits around and is also microservice-like. All it does is queue jobs on a scheduled basis, whatever a user has configured. So these are the things that we are going to look at containerizing and deploying. Everything that's in blue here is a Pulp service. And everything that's in blue scales horizontally — you can have as many of them as you want. Workers tend to be where we want the most; they're the ones getting the work done, followed probably by the web services. And then these others are actually run as singletons, and then you can have, like, hot spares waiting to take over in case the master dies. So that's the general model we're gonna work with. So now that we know Pulp, let's do a quick Kubernetes refresh — I promised you this is not gonna be an introduction to Kubernetes. There was a great introduction on Thursday that you may have missed. We're gonna do a quick refresh, make sure we're all on the same page. Maybe some of you are hearing this for the first time. So I'm gonna oversimplify these concepts deliberately, to give you, I think, the right frame of mind to think about them for the purposes of getting an application deployed for the first time. So here we go. A Kubernetes node is a machine, that's it. Physical, virtual, doesn't matter — it's a machine. The machine is part of a Kubernetes cluster. We're good. Okay, a pod. This is where it gets a little more interesting. A pod, I think of as just co-located containers, but most of the time you're gonna have one like the one on the right, where it's just one container, and a pod is the Kubernetes mechanism for expressing "run this" — and that's it. Sometimes, as in this example on the left, you'll have co-located containers where, from a resourcing standpoint, they need to be on the same physical machine, perhaps. They need to do inter-process communication, that sort of thing. So here's an example, I think I stole it from the Kubernetes documentation itself, of an Apache web server, perhaps with a cache, and maybe this web service checks the cache before it goes and checks the database as a more expensive operation. In that case you may wanna say: every time I deploy an Apache web server container, I want to also deploy a cache married to it. So that's what this pod concept's all about.
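For reference, a two-container pod along the lines of that web-server-plus-cache example might look roughly like this; the image names and ports are assumptions, just to show the shape of a pod definition.

```yaml
# Sketch of co-located containers in one pod (shared network and lifecycle).
# Images and ports are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: web-with-cache
spec:
  containers:
  - name: web
    image: httpd:2.4
    ports:
    - containerPort: 80
  - name: cache
    image: memcached:1.4
    ports:
    - containerPort: 11211
```

Both containers get scheduled together and can reach each other over localhost, which is the property the caching example relies on.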
Pretty good on that? Questions? Replica sets are where it really gets interesting. A replica set is really just a way in Kubernetes of expressing: I want to take this pod definition and run two copies of it at all times, or some number of copies of it at all times. Really very simple. And the replica set does the monitoring of those and ensures that if one dies it'll spin up a new one; it can do health checking and things like this. So scalability-wise, this is where things really get interesting. And then the second-to-last one is a service. A Kubernetes service is essentially the network presence for a pod — that's how I like to think of it — or how some other pod might interact with your pod. So it gets an IP address, you can have ports that are open, and it's the Kubernetes concept that ties things together from a networking standpoint. Does that make sense? A public interface — although interface is overloaded here, so it's probably a bad term. It is actually a cheap load balancer. Yeah, it does do load balancing. It's not what you would normally think of as a load balancer, but yes. Essentially your service ensures that you can connect to any node in your cluster on a given port that a service is serving, and even if an appropriate pod is not running on that node, it will route your request to a node that does have one of those pods. But we don't wanna use it in the way that we normally think of a higher-level, external load balancer. And then lastly — this is one of the most important things when it came to deploying Pulp — is secrets. Secrets are really just small blobs of sensitive data. Kubernetes does some nice things to avoid writing them to disk except in one place, in its own source of truth. This is the way that we're gonna inject things like configuration files and authentication credentials into these images that we've built for Pulp and make it run. So these are the five concepts. Any questions? This is actually pretty important to at least have a general, vague, oversimplified idea of before we move on. So don't be shy. Yes. Yeah, we're gonna look real in-depth at it. We're gonna see it in practice. We're gonna look at some YAML. Yeah, we'll definitely dig in. Anybody else? Yes, please. I'm sorry, I can't hear you. Dig in? Yep, secrets. Secrets — are they distributed or in one place? Kubernetes stores them in its storage service, which is etcd, and that, I think, you can make highly available itself. So it depends on what you do there. But in terms of nodes, when you're actually deploying an application, it creates a temporary file system and keeps them there. So it doesn't write them to disk anywhere else — that's the goal. Okay. Let's get into actual practice. Okay, great. You guys can see — I did not plan very well in advance for being able to drive this, so I'm gonna just crane my neck and try to do that. So this is the Kubernetes console running. I'm running Minikube, which is a fantastic way to get introduced to Kubernetes very, very easily: in essentially one command, minikube start, it starts up a virtual machine running a single-node Kubernetes cluster. So here we see our one node, and we can come down and see replica sets. These match some of the services we talked about earlier. So there's httpd. We have a database, MongoDB. We have a message broker, that's Qpid, and so on. And if we look at our pods, these are the pods that those replica sets are managing, that they created for us. One of them — let's see, do we have workers?
Yes, we have one worker there that is running. We have services for each of the services that needs to be available to be externally connected to. And then lastly, we're gonna look at secrets, and we're gonna dig into these, as I promised, later, and look at how these get created and what's inside them. But this is a really nice view to just get an overview of what it looks like to actually deploy an application in Kubernetes. So now we are ready to dive into some YAML, if you guys are ready. Okay, we talked about Minikube. So let's talk about storage. Persistent shared storage is such an important aspect of this. We're gonna look at a resource file that sets up storage, and we're gonna start with these first 11 lines. Just to walk through this quickly, if this is the first time you're seeing one of these YAML files: this is the way that Kubernetes allows you to express a resource and load it in. We'll see what loading looks like in a moment, but you essentially maintain these YAML files and you just stuff it into Kubernetes — Kubernetes, create whatever's in this YAML file — and it does. And in this case, we have this kind: it's a PersistentVolume. So here's where we're gonna start thinking about the line between a deployer and a packager/developer hybrid. A persistent volume is something that the deployer of this cluster is responsible for providing. We've given this one a name, vlp, because I'm a lazy typer — that stands for /var/lib/pulp. We've given it some capacity and other attributes; those are basically just defaults. And in this case, the type of storage I've used is only valuable for a single-node, demo-quality deployment like we're doing here, which is host-path storage. So it is storing on the local host: whatever node the pod happens to land on, it will use storage on that node's local file system. Great for a demo, terrible for anything that's real in production. Like I told you, I don't deploy anything important, so I'm gonna leave that to somebody who actually takes this and deploys it in an important way. But instead of these two lines here, this is where you could express something like an NFS mount or iSCSI, or use something fancier like Gluster or Ceph, or if you're in a cloud provider — whatever persistent storage your cloud provider offers is probably supported by Kubernetes, and you could express that here. This is a wonderful way for a deployer to take these 11 lines, really just modify the last couple to match whatever resourcing they have, and they're done. They have now fulfilled their obligation of providing persistent storage to Pulp as an application. Now let's come back to me as a software engineer slash packager. These last several lines — nine, ten? — these last ten lines are the part that we provide, that would not necessarily need to change unless you wanna tweak the amount of storage. I'm doing demos; one gigabyte seemed like enough. This is really just saying: given the persistent volume that we defined up top, I wanna claim some of that persistent volume with a PersistentVolumeClaim. So this is a nice separation of concerns. I've given a name that this claim can be referenced by in YAML that we'll see in a moment. Okay, everybody with me still on this YAML? Anybody have questions about this before we go start connecting the dots to other parts of what we've done here? Great, let's keep moving. We're going to look at a pod definition now.
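Before the pod definition, here is a rough sketch of the storage resources just described: a hostPath PersistentVolume on the deployer's side, and the PersistentVolumeClaim that the application side references. The name vlp comes from the talk; the capacity, access modes, paths, and claim name are assumptions.

```yaml
# Deployer-provided volume; swap the hostPath stanza for NFS, iSCSI,
# Gluster/Ceph, or a cloud provider's storage in a real deployment.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: vlp
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteMany
  hostPath:
    path: /data/vlp
---
# Application-side claim against that volume; this is the name the
# pod definitions refer to.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vlp-claim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
```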
We're gonna talk about the purpose of this pod, because I haven't really talked about it yet — how it fits into the role of deploying this application — but we'll get there. Suffice to say, this is a setup pod. On line two here, we see the kind of resource we've defined: it's a Pod. So Kubernetes itself is parsing this YAML; that's how it knows essentially how to address this particular resource and what to do next, and that dictates what we fill in for the rest of this YAML. We've given this pod a name; the label's not too important to think about real hard. Okay, we've defined a container. So this is a container I built. It's just a normal Docker container. We'll look at a Dockerfile, but it's honestly pretty standard — it's actually very basic, I didn't have to do very much special. Image pull policy — that's really only helpful if you're building locally like I do as a developer. Okay, where does the rubber meet the road on this storage? It's down here in the last three lines. We have referenced a volume and its name. We've given it a name to address it by, on line 25, within this YAML file. Line 26 shows this is a persistent volume claim — that's the type of volume we're defining here, essentially making available to this pod. And then we've referenced that specific claim name. So this line 27 is referencing the same identifier we defined in that last YAML file. So this is just making that volume available within this pod definition. And then we come up here to our volume mounts section. This is how we actually mount things into a running container. We reference the name, and a nice, beautiful part of this abstraction is: notice we don't have to care what kind of storage this is at this point. Not only do we not care if it's NFS or local file system or anything else, we don't even care if it's a persistent volume claim at this point. It could be any other kind of volume. We're just saying: take this volume and mount it in the container at this path. And that's really it. It's now very simple, relatively speaking, to configure and deliver an application, even an old-school application like this one that wasn't designed with this stuff in mind, and provide for shared storage in a container cluster. So, any questions about storage? Okay, let's look then, quickly, at a specific problem that this particular pod is solving for us. So I told you this has a special purpose within our deployment. I'm gonna tell you about a limitation. This falls in the technical debt category that most long-lived applications have, especially ones that were not designed with this in mind. When you install Pulp on a brand new machine, it lays down some structure within its data directory, which is /var/lib/pulp, as we've seen. And it assumes that that data is gonna be there. It's really just a directory structure, but it assumes that it's gonna be there. It really shouldn't, and we really ought to fix that, but we just haven't. So in the meantime, while we get Pulp moved over and adapted to this new mindset, I needed some way to say: on the image, when I built a Docker image and installed Pulp RPMs on it, it laid down this boilerplate file system structure, and I need to now get that onto my persistent shared storage. And so this setup script is gonna do exactly that for us. So you see we're mounting this volume at — I just made this up, not a great path probably — /mnt/pulp. And now let's take a look at that setup script that's in our base image. That's all it does.
Now let's take a look at that setup script that's in our base image. That's all it does; probably even that fourth line isn't necessary. It just rsyncs what's baked onto the image right over onto the /var/lib/pulp volume. So this is, I think, a decent pattern for any kind of initialization that you have to do for your application the first time it runs. If you're following best practices for a new microservice, container-and-cloud-everything application, you're not gonna need any of this, but if you're dealing with an older application that just has needs like this, I think this is a decent way to go. Let's take one more look at this pod. And if you notice, we're gonna reuse this base Docker image throughout all the different images, whether it's a worker, whether it's httpd, whether it's this setup script, or whether it's our database migration tool; they all start from this base image. So what we do is literally kubectl create, and we feed it a file, setup.yaml. And that's all it takes to create a pod when you have a definition like this. Now it's gonna complain, because that pod already exists. We can look at it, and this is actually a good thing, because we only wanna run this once. We can see it here. Okay, can you guys see my highlighting? No, you can't, dang. Okay, it's the second line there, the last line. It's called setup. You can see that it completed 17 hours ago. There's not one currently running; there was, at some point, a desire for one to be running. That's really all there is to it. You run the pod, it finishes very quickly, and it's done. Now, there's another construct in Kubernetes that can be used for this sort of thing, called a job. As near as I can tell, for this very simple use case I couldn't find any value in using a job. But if somebody knows a reason why I ought to use a job instead of just using a raw pod, I'd like to know it. Come talk to me after, or maybe during the questions you can tell me. Otherwise, this is just a very simple use case: run it, let it exit, and it's done. Any questions? I'm real casual about it; just yell out questions as we go. Sorry, one more time? Yeah, so there's zero out of one. What the ready column is showing us is, on the right, how many do I expect to be running, and on the left, how many are actually running. So this is a little bit of a weird, misleading concept with this particular one, and this is maybe the one way in which a job would be more appropriate. A job expects pods to exit. A pod itself is really designed with the mindset that it's gonna be long-running, and if it exits, even if it exits zero, it's considered a crash, unless you set a specific setting, which we did here, to not restart it. So it's a little odd, but it's at least showing you that zero of them are currently running. Does that answer your question? Okay, I think that's all we have to look at for storage.
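Before moving on, here are those few commands in one place, roughly as shown in the demo; "setup" is the pod name from the sketch above.

kubectl create -f setup.yaml
kubectl get pods        # the setup pod shows READY 0/1 and STATUS Completed once it has run
kubectl logs setup      # standard output of the one-shot container, if you want to inspect it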
So now we're gonna move on to secrets, which is probably the more interesting part of this talk. Let's go look at our secrets. I'm gonna show you all my secrets, at least just the pulp ones. Okay, what do we have in this directory? Everything that's green is a script that I created; it's not important to think too hard about them, but these are just things that generate the secrets. And the kind of secrets we're talking about are SSL certificates. This is, oh man, like I said, I'm a developer; I don't like to deploy things, especially things that are important. So I very rarely go through the full process of creating all the SSL certificates that are required to do a full, legitimate, production-worthy deployment of pulp. And man, when I went through it recently to make this happen, it was painful. So I'm real glad that it's fully scripted now; I don't have to deal with it anymore. One of the most painful parts of this, for what it's worth, is Qpid. Our message broker requires an NSS database to be created, and you have to add your certificates to it using tools that you've never heard of. In any case, I'm done complaining now. These scripts create the secrets that we're then gonna load into Kubernetes as our next step. So for example, we have this make-certs.sh, and all it does is all that nonsense pain that I just told you about: it's 131 lines worth of creating SSL certificates for us. It creates a CA, self-signed, it creates other certificates, signs them; details are not important. What is important is that it spits out all these certificates in the certs directory, and the next thing we're gonna do with them is load them into Kubernetes as secrets. Similarly, this Qpid database directory is that NSS database, so we're gonna load it in too. The one other thing I'll point out, before we look at how we load a secret, is this pulp server.conf file. We can take a look at it. This is the default pulp configuration file, and as you can see, we've done our best to document it well. So it's hundreds of lines, and all the defaults work, but it's designed to make it easy to configure. We don't want to load all of this commentary into Kubernetes as a secret. Secrets really ought to be small blobs. Probably a few kilobytes here and there between friends isn't gonna hurt anything, but it's certainly not a bad idea to keep your secrets small and manageable. So once a user who's deploying this thing has come in here and finished customizing it however they like, for example, if you don't want to deploy your database in your cluster, for whatever reason, maybe for performance, maybe you want to run it on bare metal, you could change this section and have it talk to some existing database, that sort of thing. You edit that file, and then if we look at commit.sh, this is what loads the secrets into Kubernetes. Let's pick one line here to look at; I'm just gonna put some space around it. Let's look at line 14. It's very simple. We picked one file. We said: hey Kubernetes, I'm gonna create a secret. It's a generic type of secret. The name for the secret is mongodb-cert, and I want you to load this file, the certificate that we created for MongoDB. You run this command and it's in. It's in the cluster, and that's it, you're done. Unless you want to change it in the future, you do this one time, and now it is available to be mounted into pods that we will spin up later. And the rest of the script just goes through each of those artifacts and loads them as some kind of named secret. And up here on lines five and six, this is where I did some fancy egrepping. Really, it's not very fancy, some poor man's egrepping, to strip out those comments, create a minified server.conf, and load that.
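The shape of those commit.sh lines is roughly the following; the file paths and secret names here follow the ones mentioned in the talk but are otherwise illustrative.

kubectl create secret generic mongodb-cert --from-file=certs/mongodb.crt

# the "poor man's egrep" step: strip comments and blank lines from the big,
# well-documented server.conf before loading it as a secret
grep -Ev '^\s*#|^\s*$' server.conf > server-minified.conf
kubectl create secret generic pulp-config --from-file=server.conf=server-minified.conf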
Any questions on secret creation before we go look at how we use them? Yes. Oh man, I just can't hear you, I'm sorry, say it one more time? Oh, great question, thank you. Okay, the question is: can you create these secrets as resources in a YAML file, like we did our other resources? And the answer is absolutely yes, you definitely can. I don't have an example here, but essentially it's a resource that looks very much like the others; they all look basically the same. And at some point in there you can define data values, where this is the name of a secret and here's the value. For the value, you're expected to base64-encode whatever your blob of secret stuff is and stick it there. A couple things I don't like about that: one, size maybe gets to be an issue, like if it's an entire certificate. I don't know, I guess I also don't like dealing with YAML and JSON by hand, especially when you've got that kind of line wrapping, or who knows, I'm sure it's reconcilable; maybe I'm just whining and should suck it up and learn how to wrangle YAML better. But why bother? If I can just write my secret to a file and say the data's there, just read it, I think that's a lot easier. The other thing you at least have to consider before you go putting a secret in a YAML file is how you are going to manage these YAML files long term. Probably you're gonna check them into something like a Git repository, and you probably don't want your secrets checked into that Git repository. So putting them in a dedicated directory of some kind, at least for me, helps me remember not to check them in with all the rest of the YAML. Does that answer your question? Great, any other questions before we see how we use these in a container? Okay. And remember again, this application was not intended to go looking in some arbitrary place for its configuration file, let alone was it designed with the best practice we now know of, making your app fully configurable with environment variables. That would be a nice way to go. And in fact, another thing to know about secrets is that you can make them available inside a container as environment variables, if your application is so inclined, which it just so happens mine is not. Okay, so we've been through step one, which was creating our secrets. We've been through step two, which was committing them with commit.sh; we loaded all those secrets into Kubernetes. That's what we saw in the dashboard earlier; we saw each of those secrets listed. Actually, as a matter of fact, let's just look at one. Holy cow, I should have thought this through. Which one is Mongo? That's the one. Okay, so this is a secret, and in the data area it contains one file, the file we saw on line 14 of that script a couple minutes ago. And that's it. So I can reference this secret from a pod definition and make it available as a file mounted into my container at runtime, and we'll see exactly how that works. There could be multiple files in here; for example, httpd has four of them. So you can group them in whatever way makes sense for your app.
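For reference, the resource-file flavor of a secret from that earlier question might look roughly like this; the secret name and key are hypothetical, and the value is just the base64 encoding of whatever file or blob you need stored.

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Secret
metadata:
  name: mongodb-cert
type: Opaque
data:
  mongodb.crt: $(base64 -w0 certs/mongodb.crt)   # base64-encoded file contents
EOF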
Okay, so now that we have these, we're ready for step three, which is actually using them in a container. So we're gonna look at our httpd resources. Now, the first thing I have to confess is I kind of misled you: there is another resource called a deployment. We didn't talk about it, but you don't need to think about it too hard. For the purpose of this, a deployment is a replica set; it has some extra features that I'm not using yet, around things like rolling back or continuous deployment. It's a beta feature, so I figured I'd give it some mileage. It effectively behaves exactly the same for these purposes as a replica set does, unless I use those other features. To quickly step through this, let's just think about the difference between this and the pod definition we saw. We have this replicas line, which is interesting: here's where we could specify how many of these we want running at any given time. And here in this spec, essentially everything from line 35 down is very similar to a pod definition, except we haven't given it a pod name. We have a template name, and from this we will see names come out. From line 36, oh yeah, that's the name of the container; we will see names come out like httpd-something, with a generated, randomized identifier that Kubernetes comes up with. Otherwise it's very similar to a pod definition. Are we good with that? Okay, where does the rubber meet the road with that secret? You can see our old friend here, vlp, /var/lib/pulp, is also a volume that we have available in here, but we're not looking at that right now. We are looking for pulp-config. This is a secret, and we're treating it just like a volume, the same way we treated shared storage. We've referenced it by name and given it a name that we can reference within this file. Just like before, we then say: I want to mount this within my container at this path. Here's where it gets a little bit interesting, how we wrangle making this configuration file available to my application even though it's in a non-standard path from where pulp normally assumes it's gonna be. Everybody good with the tie-in here of the volume and mounting it in? Great. Okay, let's look at an actual Dockerfile. This is the base Dockerfile; it's from Fedora, Fedora 25, for what that's worth. We install some stuff. Okay, here's the critical two lines: I deleted the stock configuration file and replaced it with a symlink into this /var/run/secrets/pulp directory where I'm mounting the secret into the container, and that was really it. I mean, there's really nothing all that fancy or interesting about it, except that I did think about it and do it, and as you can see down here, that's what I did for other things too. The certificates it needs access to, this key pair it needs access to: I essentially deleted any artifacts that were installed as defaults on the image and replaced them with symlinks to the location where I promise I'll mount them in the future. So again, these are not best practices. These are mitigation practices for how to take an application that doesn't have containerized assumptions baked into it and make it, in a reasonable way, deployable in Kubernetes. And if anybody has better ideas on this, by the way, I'd love to hear them at the end.
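A condensed sketch of those two halves, with assumed names and paths rather than the exact files from the talk.

# The pod-template side: the secret mounted as a volume.
#
#   volumeMounts:
#   - name: pulp-config
#     mountPath: /var/run/secrets/pulp
#   volumes:
#   - name: pulp-config
#     secret:
#       secretName: pulp-config
#
# The image side: a RUN step in the Dockerfile just swaps the stock file for a
# symlink pointing at wherever the secret will be mounted later.
rm -f /etc/pulp/server.conf
ln -s /var/run/secrets/pulp/server.conf /etc/pulp/server.conf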
So, any questions about this? Yes, please. I missed the last couple of words of that. Yep. Yes, that's exactly right. Yes. Ah, okay. So I think you're asking a general question, to clarify when the injection of the secret happens, since it's not in the Dockerfile. Into memory, right. So Kubernetes is the one that looks at that YAML file, at the definition for the pod template, and sees we've referenced a secret, so that secret's available in this pod definition. Now I've told it: take that secret and mount it at a specific path. And that path happens to be the same path that I've symlinked to here. And so at runtime, when Kubernetes creates the pod, that's when it will create the temporary file system. That's when it will copy the file, in memory, onto that temporary file system and mount it into the container at the appropriate location. Does that make sense? Great. Yes, yeah, so all I've done here in this Dockerfile is say that somehow, something in the future is gonna put the files you're expecting into those places that we've symlinked. The details of who or what are not important in terms of the Dockerfile. That's right, yeah, that's exactly right. Yep, yeah, so none of these files will exist on the physical disk at runtime, at least within that specific container. Yes, please, excellent question. Yep, very good question. I tried that, as a matter of fact. So the question is: why the heck aren't you just mounting the secret in the place that pulp expects to see it? And the reason is that doing so clobbers the entire directory. And there are other files, other artifacts in there where the defaults are totally appropriate and fine, so I really just didn't want to clobber entire directories wholesale. Yeah, good question. And that's just another thing where there's nothing elegant about it; it's just the nuts and bolts of getting it done. But something I find interesting about this particular container definition is that there's nothing Kubernetes-specific about this Dockerfile. This is a pretty reasonable Dockerfile, and I was very impressed by that, going through this for the first time and learning my way through Kubernetes: I didn't really have to do any fancy environment-variable stuff, run a script to read them and stuff them into a reconstructed config file at runtime just before I start the process, or anything like that that you used to have to do in the past. It's actually very simple. We could probably take this same thing and run it in other ways, without Kubernetes at all, which is great. It's reusable. Okay, any other questions about this? Okay, I think that's all the information about secrets. We talked a little bit about the deployment. If I can find my mouse cursor, there it is. We're gonna look at a service, and that's gonna be most of our YAML digging for the day. Okay, we talked about a service some time ago. A service is the network presence for a pod, or for a set of pods in a replica set. So in this case, we've defined a service and given it a name. And here's, this is a challenge that, I don't know what the heck, let's go ahead and talk about it now. This is a challenge I faced that I'd love to have a better solution for than what I've done. But essentially, for starters, we've said this service is going to expose port 80 and port 443 from whatever is listening on those ports inside the pod. We're using TCP, and that's about it. This selector name has to match the name down here in this deployment, on line 26. But then the question came: okay, how do I make this service externally accessible? And the kicker with pulp is that we really need to do SSL termination at the process running in the container. Part of that is that we do SSL certificate-based authentication. Part of it is also that when we make content available to clients, and this is the style of authentication and authorization that Red Hat uses for its CDN to protect its software repositories, it uses client-side SSL certificates that have a specially crafted blob in an X.509 v3 extension, containing some compressed data that says, basically, which paths you are allowed to see.
So in any case, pulp needs to be able to see that certificate directly, see that blob and read it, and, in case you're trying to access protected content, reason about whether or not you're allowed to. So I really need layer four to go all the way to the process running in the pod. I think I could use an external load balancer to do this, but it seemed like a lot of work. So the poor man's cheap way to deal with that, for the meantime, was to use this node port feature. Node port is what I was talking about with, I think, this guy right up here a little while ago. That is the feature that says I can connect to any node in my cluster on port 30080, even from outside the cluster, and it will route my request to the appropriate service and the appropriate pod. We're limited in port range when we use this node port feature, so that's kind of ugly. I at least was able to say: use this specific port number instead of assigning one randomly. I'd love to have a better solution out of the box, without having to go through the trouble of external load balancers. But again, maybe somebody can show me how to make an external load balancer easy to use, especially if there's a way to do it with a sane default that just works. So in any case, that's a challenge I faced and would love to have a better answer for. Any questions about this service? Okay, fantastic. So let's talk about a couple of specific challenges, and then we'll do a quick demo and be done. One of those challenges I discovered, well, I'm gonna tell you a quick story. I was banging away at this some time ago, trying to get these services running in pods, and at some point I had a pod that kept crashing and I couldn't figure out why. And so what I did was I ran this log command, which I'll show you now. Okay, so we've gotten our pods. We're gonna pick one; let's pick our worker, that's exciting. Okay, that's a tmux-paced level of excitement. Let's see if I can delete all these dots. Okay, so we have just looked at the standard output of this particular container. So back then, I wanted to look at the logs for my crashed container, so I ran a command very much like this one, and what I saw on the screen was a one-liner that basically said: error, can't access the log. Well, that's strange. Why can't Kubernetes access the log for this pod? What's going on? Being new to Kubernetes at the time, I spent an embarrassing amount of time pulling my hair out trying to figure out what had gone wrong, and as it turned out, there was no problem with Kubernetes. Kubernetes was showing me the log, and the sole contents of the log was an error from my process, my application, that it couldn't open /dev/log, and that's why it was complaining. How sneaky is that, to make the sole contents of a log file an error about not being able to access a log? So in any case, I got a couple of extra gray hairs from that. But in the process I discovered that I had to do a workaround, which we're gonna see right now, which is simply mounting in, these last three lines, line 75 there: we are mounting /dev/log from the host into the container, because that's what it takes to satisfy pulp and get it to run. So as a result, its log output does show up in the system log on the node, on the host that's running it. Which is not really ideal; like I said, it's a workaround, and it got things working, and in the meantime, while we're using this workaround, we can try to improve pulp to not make that assumption.
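Two rough sketches of the pieces just described, with assumed names: the NodePort service that exposes 80 and 443 on fixed high ports on every node, and the /dev/log hostPath workaround as a fragment you would add alongside the other volumes in the worker's pod template.

cat <<'EOF' | kubectl create -f -
apiVersion: v1
kind: Service
metadata:
  name: httpd
spec:
  type: NodePort
  selector:
    app: httpd              # must match the label on the httpd deployment's pod template
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
    nodePort: 30080         # reachable on any node from outside the cluster
  - name: https
    protocol: TCP
    port: 443
    targetPort: 443
    nodePort: 30443         # illustrative; node ports must fall within the allowed range
EOF

# The /dev/log workaround, as a pod-template fragment:
#
#   volumeMounts:
#   - name: devlog
#     mountPath: /dev/log
#   volumes:
#   - name: devlog
#     hostPath:
#       path: /dev/log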
So that was one challenge, and hopefully, if one of you sees that one-line error in your Kubernetes log output, you won't spend as much time as I did trying to figure out what in the world is going on. Okay, we talked about jobs. Let's just talk quickly about the actual initialization process for getting this app off the ground. This is all on GitHub, by the way; you can see where I normally live on GitHub. You can see it, you can pull it, you can run it, and there's a long README explaining exactly how to do this stuff, how to generate the secrets and how to load them. Our supporting services, okay, this is what we're gonna talk through right now. So we looked at that setup pod, which just did that little bit of file system wrangling. That's something that only has to be run the first time you deploy this application, or any similar application you want to deploy. In addition to this, we also have a tool that we call pulp-manage-db; I don't know why, but it's our migration runner. So anytime you upgrade pulp, if there are any data migrations, particularly database migrations, associated with that upgrade, you have to run this tool, and it checks: okay, what's the state of the database, how many more migrations do I need to run to get it up to the latest version? So you have to run that. For that, we created yet another pod that you simply run anytime you do an upgrade, and you treat it the same way we treated that setup pod: you let it run, it exits cleanly, hopefully, and that's it, then you're good. And from there, our pattern is, I just created this up.sh script, and it loads all the other service and replica set resources into Kubernetes. That's it; you can access it. So that turned out to be a decent pattern. And the reason I'm pointing this out is that I think the best practice for an application like this, if we were doing it from scratch for this kind of environment, is that none of these services would care about the order in which they started. None of them would care about the initial state of whether some directory exists on the file system or not. And we may not even care about not running updated software until we get an updated database; we'd want some kind of rolling migration pattern where we could upgrade images and upgrade the data as we go. Pulp doesn't have any of that yet, but we'll work on that stuff. And in the meantime, if you guys have an app like I do, like my team does, that requires this kind of special care, the message here is: it's doable, and these are decent patterns to get the stuff done. Okay, is there anything else I want to talk about here before doing a demo? Oh boy, we're basically out of time, aren't we? In fact, I've probably gone beyond time. Man, you guys just let me keep talking. All right, I'm just gonna stop there and take questions. They will be available on the conference website. They should be available on, I'm not sure who said that, but I'm looking in the general direction, they'll be available on the conference website soon, and they have the slides. So that is good to go. Other questions while I get to the end? Come on. Yes, please. How do you make the database persistent? That's a great question. I didn't show that, but we just did the same thing as for /var/lib/pulp. There's another resource called vlm, which is for /var/lib/mongodb, same deal.
It's up to you as a deployer to modify that persistent volume and put there whatever persistent storage you want. Now, this gets into a bit of a philosophical debate and a performance question of trade-offs: are you willing to run your database on network storage? Maybe not, maybe you are. Some are better than others, but that's a whole level of complexity. The nice thing about this is that it puts that decision 100% in your control, and I don't have to worry about it. I don't have to write documentation for five different kinds of network storage. Yes, the question is: do we work with an ops team to get this deployed? And the answer is, yes we do. There's more than one, because it's deployed in more than one place. Then of course we have a boatload of customers that deploy it directly, and a big upstream community. Very different, yeah. And that difference is the sprawl of complexity that happens among all of them. They have differences like: are they using upstart or systemd? It could be either, so we as the development team have to provide systemd unit files and big piles of bash. And every one of those decisions and variations in what kind of infrastructure you have adds up to things we have to document and in some way support as we work with them. Right, and none of them are doing this yet. Yeah, exactly, none of them. This is solving those problems, in large part, for me. Exactly, yeah. OpenShift includes Kubernetes as its core and then adds workflows and features on top. Exactly, and the next step will be the beautiful, oh it's magic, here it is in OpenShift. So I'm absolutely moving in that direction. Yes, please. Yeah, what benefits do I get using something like Kubernetes or OpenShift? I think it would be the same or better versus doing it the old way. And the old way is that, if you go look at our pulp documentation, we have documentation for all these different variations of infrastructure: what kind of proxy are you using? If you're using this proxy, you have to make this particular setting; if you're using that proxy, a different setting. If you're using NFS, you need to think about these things for performance, tweak this and that. For network-based storage, what that setup script does, we have to tell you to do by hand, because we don't have a tool that does it automatically at install time. We have to document all kinds of things about which services are good to pair together versus which ones are okay to run separately. How do you reconcile which services you want to run on which machines? What are the failure modes of these different services? This takes that decision away, because you're just running some number of copies of these services somewhere in your cluster, and you don't care where they are. The only guidance we have to give you is maybe how many of each you should think about running, and why. There are just so many examples of things like that, things we as a development and packaging team have to think about, that this abstracts away for us, and it becomes just a Kubernetes problem. Then, if you wanna provide network-based storage to pulp, you don't have to go read pulp documentation and do pulp-specific things to tell pulp where it is or whatever. You just read the Kubernetes documentation and do what it says. So there's one beautiful way: Kubernetes owns it. We let Kubernetes be the expert at doing that and providing that. We get to focus on writing software. That's what I like. Any other questions?
Great, I think we're way past time. So thank you very much for coming. Okay, I think we'll start. Good afternoon, good evening. I am a support engineer at Canonical, and for some time now Canonical has commercially supported the ZFS file system in Ubuntu. And since it started being used with LXD, we wanted to do a talk about how ZFS and LXD can work together. First, a disclosure: I am a support engineer, so while I put quite a lot of love into these slides, I fix things, I don't make presentations, so if I don't make sense, please bear with me. Sometimes I have a tendency to drone on. I have some skill in recognizing the look in my kids' eyes when they just go blank; I'll try to catch it in you. But if you see that I'm not making sense, just shout out. So, how many of you use, or know, or have ever used ZFS? Oh, that's quite a lot. On Linux, a little bit of history: ZFS was first developed by Sun Microsystems for Sun Solaris and first released, I think, around 2005. At the same time, the source was released as part of the OpenSolaris project. It was later ported to FreeBSD, there was an effort to port it to OS X, and several efforts to port it to Linux. In 2010, Oracle bought Sun. And at this point comes a very important distinction between ZFS and OpenZFS. They are very similar, because they share the same code base up to 2010, but from that point on they are two separate file systems. They are not compatible. And this is sometimes confusing, because when you're looking for documentation for ZFS, you're quite probably going to find Oracle ZFS documentation. Oracle ZFS has some features that are not present in OpenZFS implementations, so you're not going to get them on Linux or FreeBSD or the OpenSolaris/illumos side. The most cited one, I think, is native encryption: OpenZFS is working on it, but it's not present yet. And you can't easily share a pool between OpenZFS and Oracle ZFS. Well, up to some pool version you can, but you don't want to run an ancient version of the pool. So if you create a pool on Linux, you can move it to FreeBSD and you can move it to illumos; you can't move it to Oracle, or the other way around. So what is ZFS? It's a very feature-rich, copy-on-write file system, which means that when the file system writes a new block of data, it's not overwriting the existing block on disk; it writes it at a new location. And it has quite a lot of features, like a built-in RAID and volume manager, checksumming of data and metadata, confusing naming of the building blocks; you name it, it's there. I'm going to describe these things shortly. So when ZFS was created by Sun Microsystems, they decided to create new names for everything, because why not? The building blocks are the physical virtual devices, the vdevs, at the bottom of the slide. These can be LUNs, these can be files, these can be physical drives in your system. Each one of these becomes a physical virtual device, and you group them together into logical virtual devices. The logical virtual devices are the middle building blocks, and they describe things like the redundancy level. And the logical virtual devices are grouped together to create the pool. The pool is a pool of drives that give their space to all the file systems that will be hosted in the pool itself.
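Roughly, and with illustrative device names, those building blocks look like this in command form: physical devices grouped into a redundant vdev, vdevs grouped into a pool, and file systems carved out of the pool's shared space.

zpool create data raidz /dev/sdb /dev/sdc /dev/sdd /dev/sde   # one RAID-Z vdev of four drives
zpool status data      # shows the vdev layout
zfs create data/home   # file systems draw from the pool's free space as needed
zfs list data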
So each file system is free to take blocks from the whole pool and can grow up to the whole free space in the pool, unless some administrative action is taken, like reservations or quotas. So, once we have vdevs and we have created the pool: this slide is about snapshots and clones, and I didn't really know how to show it nicely. Let's assume that A0, B0 and C0 are data blocks on your drive, on your file system, written at some point. At this point in time, you create a snapshot of the file system, and later on you overwrite that data, which means the new data is actually written to a different location in the pool. What happens is that your file system adds the blocks A0, B0 and C0 to a table, and this is the table that describes your snapshot. So your snapshot is only as big as the changes that get introduced to the file system itself. This is one property. The second property is that snapshots are blazing fast to create. When you give the command, zfs snapshot, some file system, it takes like a tenth of a second and it's created. But from this property comes a caveat that I will mention later, about space consumption monitoring; I will describe it later, and I think I have it in the demo. So, in the pool, and I have two pools on this system, data and rpool, where rpool is actually the root file system on my notebook, I run a very unsupported configuration on my notebook. So rpool is the pool, and you automatically get a file system created that's mounted, if you don't change it, at a mount point that's exactly the name of your pool, and each of these directories is actually a file system. So your file systems can be nested: rpool is a file system, root is a file system, ubuntu is a file system, home is a file system, Troke is a file system, music, videos. And file systems can have properties of their own, like separate compression mechanisms or deduplication, or they can inherit them from the parent file system. So if you turn on compression for rpool and you don't disable it for the child file systems, whichever file system you create within it will be compressed with the same algorithm as its parent. That slide was supposed to show space consumption; I don't know if it comes across clearly. What I wanted to show is that when you have a file system nested within a parent file system, both of them can consume the whole space in the pool. There are no boundaries between them: if home is taking some space, Troke can become bigger than home, unless I create a quota or reservation for either of them. One cool feature of ZFS is that you can create a logical device that looks like a physical drive, a zvol. When you create a zvol, you basically create a block device that you can use locally on your system, like I use one for my swap, or you can export it using Fibre Channel or iSCSI and consume it on a client machine just like a normal storage array LUN. That's actually quite a popular use, storing virtual machines on ZFS arrays. So space allocation works differently in ZFS than it does in a traditional file system like ext4 or XFS. There is no pre-allocated space in the pool; you don't create inodes, you don't do this mkfs step. Drives are labeled as part of the pool, and the space is allocated as data is written to the pool.
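Circling back to the snapshot, nesting, inheritance, and zvol features just described, a few illustrative commands against the rpool layout from the slide; the dataset names are otherwise made up.

zfs snapshot rpool/home@before-change   # near-instant; grows only as the live data diverges
zfs set compression=lz4 rpool           # children inherit this unless they override it
zfs create rpool/home/music             # nested file systems all share the pool's free space
zfs set quota=50G rpool/home/music      # an optional boundary between sibling file systems
zfs create -V 8G rpool/swap             # a zvol: a block device backed by the pool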
One advantage of that allocation model is that when you replace a drive in a redundant set, only the data is recreated on the new drive. In a traditional array, your whole drive is going to be mirrored again, or rebuilt if it's RAID 5 or RAID 6; in ZFS, only the data is replicated. That's called resilvering in ZFS lingo. So, there are several common file system layouts, pool layouts. This is a single-vdev pool built from six drives, six physical devices. It can be RAID-Z, it can be RAID-Z2. It can be a stripe, but it's not very sane to do a stripe. You can grow the pool by adding another vdev, and there is a constraint, well, a best practice, it's not a constraint per se: you grow the pool by adding logical vdevs that have the same configuration as the existing vdevs in the pool, for performance reasons. You can mix them, you can create a pool with a RAID-Z vdev and later add mirror vdevs, but that's gonna be horrible from a performance standpoint. So, these things are not bulletproof and they are not ideal. There are several things that can be confusing or dangerous when you're using ZFS. The first thing is space utilization reporting. The du tool is not a very good way to track space utilization in ZFS, because it can get confused by file systems that have snapshots or compression. You should use the zfs space listing, which I will present to you shortly; it has quite verbose output, and I recommend reading up on it. When I was working at a software-defined storage company, Nexenta, one of the most frequently opened tickets was about "where has my space gone". People didn't know about snapshots. They forgot to destroy snapshots, and they thought that when they were removing files from the file system, they were freeing space. Well, if you have a snapshot, you're not freeing that space; the snapshot keeps holding it, and du is not going to tell you about that. My personal favorite thing about ZFS is ZFS as your root file system. It's a cheap and quick way to create a cloned version of your operating system. So when I have my Ubuntu installed, I have a clean version without absolutely anything related to a graphical environment, and then on a clone of it I installed Kubuntu. If I ever get tired of it, or I want to upgrade to something better, I just create another clone and upgrade that. If I decide that it's broken, I reboot back into the pristine installation. It's not very easy on Linux; it works perfectly with FreeBSD and illumos, but on Linux this is not automated. I think there are several attempts to do that, but it's cool, I recommend it. The second point, and I do this personally: when I create the system installation, I take a snapshot and I send it, as a zfs send/receive stream, to my backup server. And when I need to recreate it, I just boot from a stick and send it back to the pool, and I have my system restored. Which is very fast, because send/receive is basically sending a stream of blocks over the network; it's not like rsync, which sends files.
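The root-file-system cloning and send/receive backup workflow just described, sketched with illustrative dataset and host names.

zfs snapshot rpool/ROOT/ubuntu@clean                      # freeze the clean install
zfs clone rpool/ROOT/ubuntu@clean rpool/ROOT/experiment   # a writable clone to upgrade or break
zfs send rpool/ROOT/ubuntu@clean | ssh backup-host zfs receive -F backup/notebook   # off-site copy
ssh backup-host zfs send backup/notebook@clean | zfs receive -F rpool/ROOT/ubuntu    # restore it later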
So here comes the second part, where I try not to make a mess of containers. There was a very good talk about containers previously, so I shouldn't talk too much about it. One of the container technologies that Canonical uses is LXD, and these are machine containers: a separate, isolated userland running on a shared kernel, with the use of cgroups and namespaces. They have this nice property of being very, very fast to start, because your only overhead is the processes running in the separate userland; they don't emulate hardware and they don't boot a separate kernel, which gives them quite a performance boost. Our choice of hypervisor is LXD; this is a slide I borrowed from Dustin Kirkland. So LXD is the next iteration of the Linux containers effort, these days driven by Canonical. LXC is the actual hypervisor layer, the library; LXD is the daemon that exposes an API to allow easier configuration and manipulation of the containers. It also allows you to do remote management and migration between hosts. So since Ubuntu has had ZFS as a supported file system for some time, and has LXD, it's a logical thing to try and tie them together, because ZFS has these cool features like snapshots that are fast to create, clones that are fast to create, and remote moving and replication using ZFS send and receive. So ZFS was chosen as the default storage backend when it's available on the system. You can also have it backed by a file if a pool isn't available, but that's not gonna be a good configuration for anything other than just looking at how it works. So let's go for a demo. I have two virtual machines running here. Let's see the list of what's running. So I have one running, and I love these automatically created names, Valetil. This is an Ubuntu 16.04 container running on the ZFS backend. All the images, all the configuration data, are stored in a pool that's called lxd. So when I have a container running on ZFS, there is a ZFS file system that actually contains the container itself. So I can run this, and you will see that below the container's file system, you have a snapshot of that file system, and it's a ZFS-native snapshot. You can do everything with it that you could normally do with ZFS. You can also send it somewhere remote for storage, but if you want to migrate your container, you can do that with one command; I think it's lxc move. And on the other machine, where is my pointer device, you will see that there is a zfs receive running in the background. It's not actually completely flawless in my installation right now, because for reasons unknown to me at this very moment it's telling me about this error, but when you're looking at this... this hypervisor has this container working right now. So when you're running LXD on the ZFS backend, you can dedupe it, you can compress it, you can use all the features that ZFS has for you. Also the external devices for write buffering and for the cache: you can have this second level of cache, several hundred gigabytes of it; just slap an SSD device into your system and add it to your pool.
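Roughly the LXD-on-ZFS flow from the demo; the pool, container, and remote names here are illustrative.

sudo lxd init                       # interactive setup; pick ZFS as the storage backend
lxc launch ubuntu:16.04             # container gets an auto-generated name and a ZFS dataset
lxc list
zfs list -t all -r lxd              # the container's dataset plus its ZFS-native snapshots
lxc snapshot mycontainer clean      # snapshots are zfs snapshots underneath
lxc move mycontainer otherhost:     # migration; zfs send/receive does the work under the hood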
When you're using it, there are several things to pay attention to. I mentioned space consumption. So this output, you need to read up on it, because explaining it on a slide is not very easy. Basically, this command will tell you where all your blocks of space in the pool have gone. It may be very easy to understand in this setup, because I only have several file systems. But when you create thousands of them, and it is a common configuration to have thousands of file systems, each of them has several snapshots, some of them have quotas, and there are two kinds of quotas in ZFS: one quota accounts for all the consumption of the file system plus its children and its snapshots, and the other one only accounts for the space used by the file system itself, not its children and not its snapshots. The df command is not going to give you a lot of information; basically, you don't know where your space has gone, and the numbers reported by df may be inaccurate, because df doesn't understand deduplication and it doesn't understand compression. One thing to pay attention to is deduplication. When I was working at Nexenta, installing and maintaining storage arrays, deduplication was a big red flag. If someone was turning on deduplication, the first thing to tell them was: no, really consider it, think about it. What does deduplication do? It deduplicates blocks of data. If you have data with identical blocks, only one copy is written to the pool, but a structure is added to the deduplication table to track that block. Each time you need to read that data, it has to be retrieved from the drive pointed to by this structure. If your data deduplicates very well, your memory consumption goes up very fast; you can run out of memory pretty quickly, and you may run out of CPU power very quickly too. We actually had one customer who put his swap device on a deduplicated file system, which meant that when he ran out of memory and the system tried to swap, it had to walk dedup tables that it couldn't fit into memory. And deduplication is a one-way street of sorts: you can turn it on and you can turn it off, but the data that has been deduplicated stays deduplicated until you move it out of that file system, so effectively you recreate the pool and restore your data from backups. And each time you skip testing your backups, a kitten dies, so please think of the kittens: do your backups and test them. If you don't, well, say you have a pool with 126 drives in a redundant configuration, RAID-Z, which is a rough equivalent of RAID 5, and one of the drives in one of these vdevs goes bad and you need to replace it, and these are like six-terabyte drives. You replace the drive and it starts to resilver, which is the recreation of the data on the new drive. And resilvering is a background process that is very nice to the system, so it's going to take very long, and on busy storage arrays we've seen resilvers going for weeks. So if you have drives from the same batch, how high is the probability that you're going to lose a second drive in the same vdev? Very high. So if you don't have backups, all you can do is light a candle, start singing Space Oddity, and hope for the best. So do your backups, and also don't put drives from the same batch in the same pool; I think people who work with storage will know that. Actually, I went through this very fast; I think this is the end of my talk. Are there any questions? So the question is: if you turn on deduplication, is the memory always consumed by those dedup tables, right? It is. There was some work going on to be able to move the dedup table out to a separate device, like a separate SSD drive. I don't know if it's been done; I don't remember anything merged in the main tree about that. This is why deduplication is so difficult and you need to be cautious: if you have deduplicated data, your memory is going to be consumed.
The table will just be read from the drive upon pool import, and it's going to be there. Other storage arrays that give you deduplication, like EMC or NetApp, turn deduplication off at some point: if you reach some amount of deduplicated data, they just turn it off for you. And it's really a courtesy, because there is no implementation of deduplication in the wild that I know of that does this without problems. So, if you have ZFS available on your system, or if you have drives available in your system, you can do sudo lxd init. Ah, sorry, I won't show you that. So if you have drives available, or a pool available, you do the lxd init step, and it will ask you if you want to store your containers on the ZFS backend or the dir backend. LXD can also consume btrfs and LVM, with similar features. So, the question is about striping drives: you can slap drives together in a pool and not give them a redundancy level, like RAID 0, right? It's a continuous space across all the drives that gives you no redundancy. The problem is that once you lose one drive, the whole pool is gone. If you want to restore the data, you need to contact one of the several ZFS magicians who will do it for a magical amount of money, or you will look for your backups, and if you don't have them, you light the candle and sing Space Oddity. So ZFS does striping for you if you create, well, I will close this one, I just powered it off. This talk about backups comes from professional experience, because ZFS was at some point positioned as a bulletproof file system, and the file system itself is always going to be consistent, but your data, not necessarily. If you lose power to the system and it reboots, the way ZFS is implemented, your file system won't need to be checked, there is no fsck command for ZFS, but your data may not be consistent, because the application crashed, right? So I have a separate virtual machine that does have a pool. Sorry. Does it have zpools? zpool list: rpool. Okay, so I created myself a script so that I wouldn't need to remember the actual commands. This is actually going to work just as fast with thousands of drives, because I've done it; you're going to want to script it. So you have the pool "data", which is actually a pool built from three mirrored vdevs, and within each vdev, well, the file system, when you're writing data to it, will stripe, will try to load-balance the data across the vdevs. In a mirror it's going to be pretty easy, because both drives contain the same data, right? In a RAID-Z configuration, it's going to spread the data evenly across the drives. Sorry. When you add another vdev, it will spread the new data across the new vdev and its drives. Well, this is another caveat, because ZFS doesn't rebalance: only the new data is going to be balanced onto the new vdevs. So if you have old data and would like to have it spread among more vdevs, so it's faster to read and update, you will need to move that data around yourself; it's not going to be rebalanced for you, just like that. Yeah.
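Layout examples from this discussion, with illustrative device names: a redundant pool grown by adding a matching vdev, versus a bare stripe with no redundancy.

zpool create data mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde mirror /dev/sdf /dev/sdg
                                          # three mirrored vdevs; writes are load-balanced across them
zpool add data mirror /dev/sdh /dev/sdi   # grow with a same-shaped vdev; old data is not rebalanced
zpool create scratch /dev/sdj /dev/sdk    # plain stripe, RAID 0 style: lose one drive, lose the pool
zpool status data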
So this output gives you the most important information: available space. The next column is how much space the given file system consumes. And there are columns that will tell you how much space is referenced by snapshots, how much is used by the dataset itself, and a dataset in ZFS is everything that contains data: a snapshot, a clone, a zvol, a file system. The last ones are the space used by the children of this file system and the space used by reservations. A reservation is, well, you can constrain your file system by giving it a quota, right, it won't grow beyond the quota, and that's normal, but you can also give it a reservation, which means you guarantee this file system at least this amount of space, which means the other file systems can't grow so much that they take this guaranteed space. That's shown in the next-to-last column. This is really useful if you have quite a lot of file systems and you start to snapshot, and you're trying to figure out, when you need free space, where you can gain it fast and easily by destroying a snapshot, or destroying a clone, or destroying a file system that doesn't have snapshots, things like that. Any other questions? Yes. Okay. I haven't done it for a bit of time, so I will need to remind myself. It's gonna be something like rpool/var, and I think it's going to be zfs set quota, let's say three gigabytes, on rpool/var. So the quota has been created. Actually, the quota is lower than, yeah, what it's already using. If you want to see the quota, you need to interrogate the file system using get and the property name. This will list all file systems and give you this property for every file system in all the pools in your configuration, and in this configuration you can see that rpool/var has a quota set to four gigabytes. So right now I couldn't give it, I think I couldn't give it, because there's only, well, let's set the quota to one gigabyte, and let's set 500 megabytes on rpool/tmp. rpool, sorry, rpool/var/tmp. One thing to also note is that zfs list doesn't list snapshots in its default output; you need to use -t all or -t snapshot if you want to see snapshots. This is also a point of confusion, because up to some point, zfs list would list all of the file systems and snapshots, but if you have like 7,000 file systems and each has several snapshots, that's going to take quite a lot of screen. So the decision was made to remove snapshots from the default listing, but if you don't know about that, you will be puzzled about where the snapshots have gone. Yes. Yes, with some additional things like creating the control structures for LXD, because the container is quite a bit more than just the file system itself. Yes. So you can do a rollback. If you snapshot the container, it's ZFS snapshots underneath; if you do a rollback through LXD, it's actually going to roll back from that snapshot. Yes. Well, you can set file system properties through LXD, so if you set a quota on the container's root file system, it's actually going to use the ZFS quota, and things like that. There is the nova-lxd driver for Nova, which lets you administer LXD containers through Nova, and I don't know if it allows you to do this right now, but in the future it will allow you to control the file system itself from the Nova command. So if there are any other questions, I'm available after the talk. Let's wrap it up. Thank you very much for showing up and bearing with me. Thank you guys.