I created a JSON file that basically describes what's in the rootfs. Then I tie the whole thing up together, so I use tar, the tape archive tool in Linux, and I tar those up. Now I can have what's called layered images, which is basically: I'm going to install something on top of that rootfs. So I tar up the first one, and I install something new. Now I tar up just the difference from the original content to the new content in another tarball. And I create another JSON file that modifies the original JSON file, I tar that up, and that's a layered image. So it's really nothing more than tarballs and JSON files.

And the next thing you do is you take these tarballs and you put them out on a website. In this case, we call that website a container registry. So: a container registry, and then we build a protocol to pull those images back and forth.

So when we came out with these tarballs originally, there was no standard. There was no standard for it, and everybody was just using the de facto standard, basically what Docker did in the beginning. And everybody was fine with that for a little while. Then all of a sudden CoreOS came along. CoreOS had a different technology, a technology called Rocket. And what they wanted to do with Rocket was ship their own application container images. And they decided to standardize it; they didn't want one company to be able to control what the format is.

And if you think about the problems of one company controlling a data format, just think of Microsoft. Microsoft came out with the .doc format back in the 1990s. And what Microsoft would do is, with every single release of their operating system, they would basically change the .doc format. So all of a sudden people couldn't send documents around unless they bought the latest Windows or the latest Office products. If you had Windows 95, and all of a sudden Windows 2000 comes out, people would build documents on Windows 2000 and you wouldn't be able to view them on Windows 95. And of course Microsoft also made sure LibreOffice and OpenOffice and all these other tools weren't able to interoperate.

So what we wanted was a standard. And CoreOS said, we have to have a standard for what this image format is, and they came out with the appc spec. Now, the appc spec was different than the Docker image format, so there was a problem with that. (I prepaid for the next one.) All of a sudden the big industry companies, like Red Hat and Microsoft and Google and IBM, basically said, this is going to be bad. What's going to happen is there will be multiple different specifications. So if you want to build applications that are going to ship in the future, you're going to have to have an appc version, and you're going to have to have a Docker version. (That was my second one.) And we really didn't want everybody having to ship different types of container images. So everybody got together and said, we're going to form a standard, and that was OCI. OCI stands for the Open Container Initiative. It was a standards body originated by Red Hat, Docker Inc. (I don't have to pay for that one, it's the company), Microsoft, IBM, Google, CoreOS, and maybe a couple of others. But anyways, they got together and they standardized it. And as of last December, they came out with the OCI image bundle format.
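To make the earlier point concrete, that images are just tarballs and JSON files, here's a rough sketch of the idea. The file names and JSON fields here are simplified illustrations, not the exact OCI manifest format:

    # Base image: a rootfs tarred up, plus a JSON file describing it
    mkdir -p rootfs/etc rootfs/usr/bin
    echo 'Fedora' > rootfs/etc/os-release
    tar -czf layer1.tar.gz -C rootfs .
    cat > image.json <<'EOF'
    { "layers": ["layer1.tar.gz"], "cmd": ["/bin/sh"] }
    EOF

    # Layered image: install something new, then tar up only the difference
    printf '#!/bin/sh\necho hello\n' > rootfs/usr/bin/myapp
    chmod +x rootfs/usr/bin/myapp
    tar -czf layer2.tar.gz -C rootfs usr/bin/myapp       # just the delta
    cat > layered.json <<'EOF'
    { "layers": ["layer1.tar.gz", "layer2.tar.gz"], "cmd": ["/usr/bin/myapp"] }
    EOF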
That spec basically defined what goes into an image. So a lot of times when you talk about images now, instead of calling it a Docker image, call it an OCI image. It's a standardized image format, based on the original Docker image format, but everybody agreed to it. CoreOS actually triggered this long before they were acquired by Red Hat.

So the next thing you need to do — oh, segue. The next thing you need to do is basically pull down an image. So this is one of the tools I'm introducing today. It's been around for a couple of years, so it's kind of weird that I'm introducing it now. It's called Skopeo. How many people have played with Skopeo? OK, good group. Skopeo was introduced a few years ago, and the whole idea originally was that we wanted to go out to a container registry and look at the JSON file associated with an image. If you think about some of these images, I've seen JBoss images that are hundreds of megabytes, getting up to a gigabyte in size. At the time, the only way with the docker tool to look at the JSON file associated with an image was actually to pull the image. So do you want to pull a couple hundred megabytes just to look at the JSON file that describes the image, say "oh, that's the wrong image," and then throw it away?

So what we wanted was basically to be able to do a docker inspect --remote. We did a pull request to upstream, and they said, no, we don't want to clutter up the CLI; we don't want you to do that. But, they said, it's simple — it's just a web service. Just use the web protocols and you can pull down the JSON file. Build your own tool to do it. So we built a tool for that called Skopeo. Skopeo, which comes from the Greek for "remote viewing," was a tool to basically look at a remote site and just pull down the JSON file associated with a container image.

The guy who did this on my team, Antonio Murdaca, actually decided to go further. Originally he just did inspecting images, but then he started to say, well, I can implement the entire container image protocol, the ability to pull these images back and forth between registries. And he built Skopeo into a tool that could move images around. Now Skopeo has become really cool because it can actually translate between different formats. You can copy down an OCI-format image and store it inside of a Docker daemon, you can pull it to a local directory, and it can translate from the original Docker image format to the new OCI image format. But the really cool thing is you can actually move images from one container storage to another, or from one container registry to another. So a lot of people now are using Skopeo to move images around their environment, and we're getting a lot of uptake on this.

We were working with CoreOS to try to get them to embed Skopeo into Rocket. And they said they didn't want to embed a CLI tool into Rocket; what they wanted was to just use the library Skopeo was built on. So that library became containers/image. containers/image on GitHub is now a library for moving these OCI images, and old-fashioned Docker images, back and forth around your environment and between registries. And you don't need any root-based tools. So you can sit there as a regular user and copy from, say, my internet-based container registry to my internal container registry, or copy the files locally.
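Roughly, that's the kind of thing Skopeo lets you do. The registries and image names here are just examples:

    # Look at an image's JSON without pulling hundreds of megabytes of layers
    skopeo inspect docker://registry.fedoraproject.org/fedora:latest

    # Copy between formats and locations: registry -> local OCI layout,
    # registry -> local Docker daemon, or registry -> registry
    skopeo copy docker://docker.io/library/alpine:latest oci:/tmp/alpine:latest
    skopeo copy docker://docker.io/library/alpine:latest docker-daemon:alpine:latest
    skopeo copy docker://registry.example.com/myapp:1.0 docker://internal.example.com/myapp:1.0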
So it became a mechanism for moving the image from the registry to the host. The next thing we needed to do was take that image and basically explode it on disk. In order to run an application in a container, we have to have that rootfs re-established. So we take those one or more layers and reassemble them. The way you do that in Linux is with things called copy-on-write file systems. You might have heard of overlayfs, device mapper, or Btrfs; there are a whole bunch of them. So we took a lot of the tooling we had worked on upstream and built it into a little tiny library called containers/storage — the ability to explode images onto a copy-on-write file system.

And the last thing you need to do is actually run the container — well, what does it mean to run a container? Luckily, OCI has standardized that too. There is a standard mechanism for running a container, and that was also specified, by the beginning of last year, as the OCI runtime specification. The OCI runtime specification says: I pulled down the image, and that image had a JSON file that tells me how to run the container. Well, I also have input from the user, and I might have input from whatever tool is putting this all together. And I basically want to take those three inputs and combine them. A user might come in and say, I want to run in privileged mode, or I want to run without this capability, or I want to volume-mount in this stuff. So we need to take the user input, the input from the application setting it all up — what I'm going to call a container engine — and, last, the stuff from the image. The engine munges all that together and writes out another JSON file. That JSON file becomes the OCI configuration, and it's part of the runtime. The OCI runtime spec defines what's in that JSON file, as well as what's in the rootfs. It says: you put a rootfs on the system, you put this JSON file next to it, and now I launch an executable that understands the JSON and configures the system. Docker Inc. basically gave us the first tool to do that, called runc. runc was the first implementation, the de facto implementation, of the OCI runtime specification. Just about every tool that runs containers now in the universe uses runc to create the container.

So those are the steps you need to do to run a container on your box. Everybody agree with that? Anything missing? So we don't need a big fat container daemon to do all those steps. And I push back hard against the big fat container daemon, because here we are five years into containers, and there's only one way to run containers. Everybody knows it. If I ask you how you pull an image, you tell me docker pull. If I ask you how to build it, docker build. And everything goes through that one daemon. The biggest problem with the big fat container daemon is that we get the least common denominator of security. What I need to build a container is much different from what I need to run it in production. I need a lot more privileges to write to the container image than I do when I just want to run it, say, under Kubernetes. So what we want to do is take these pieces apart and reassemble them into different types of tools for running containers, each one with the least privilege.
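Going back to the runtime step for a second: an OCI bundle is really just a directory with a rootfs and a config.json next to it. A minimal sketch with runc looks roughly like this — the rootfs tarball name here is hypothetical, and you'd normally let an engine do all of this for you:

    # An OCI bundle is a directory with rootfs/ and config.json
    mkdir -p mybundle/rootfs
    tar -xf my-rootfs.tar -C mybundle/rootfs   # any root filesystem tarball will do

    cd mybundle
    runc spec              # writes a default config.json next to rootfs/
    # edit config.json to fold in the user input: capabilities, mounts, privileged, etc.
    sudo runc run demo     # reads config.json, sets up namespaces/cgroups, runs the process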
Later on, there's going to be a talk about some of the security features we've been able to add by breaking apart the big fat container daemon. I work on OpenShift, so everything I do tends to be either for open source or because I'm instructed to do it for OpenShift. OpenShift is Red Hat's Kubernetes, our enterprise version of Kubernetes. Really, OpenShift is Kubernetes plus plus — we have other features and other things we've added onto Kubernetes. But basically, if you come to Red Hat and you want to buy Kubernetes from us, we will sell you OpenShift.

So what do OpenShift and Kubernetes need to run a container? They need those first four things, but they also need CRI. There's a little story here. CoreOS, again. The original version of Kubernetes had Docker embedded all over the place inside the code. CoreOS came along and said, we want to support Rocket inside of Kubernetes. So they wrote huge patch sets and sent them upstream to Kubernetes that basically said: ifdef Rocket, do it this way; else do it the old way. And the upstream Kubernetes developers at the time said, wait a minute, we can't do this. Because if we do this for Rocket, then all of a sudden God knows what other container engine is going to come along and say, we want you to support our container runtime as well. So what Kubernetes did is they turned it on its head. They basically said, you guys implement a small daemon and we will talk to it, via a thing called CRI, the Container Runtime Interface. Kubernetes defined an interface it will use to talk to container engines, and if the container engine implements it, Kubernetes will happily use it.

The next thing Kubernetes needs to do when it talks to a container engine is tell the CRI implementation that it needs a container image. The CRI implementation needs to pull the image from the container registry, needs to store it on top of a copy-on-write file system, and finally needs to execute an OCI runtime. Anything look familiar from the first part? We had all these tools. Another member of my team, when this happened, basically said, we could take our standard building-block tools and build our own CRI implementation. And that thing was called CRI-O. The CRI stands for Container Runtime Interface for Kubernetes, and the O stands for open containers — OCI, the Open Container Initiative. So we developed a small, lightweight daemon that basically just implements what Kubernetes needs to run containers in the environment, and we called it CRI-O.

So CRI-O is OCI-based — I already said that. Its scope is totally tied to Kubernetes. It only supports the container use cases needed for Kubernetes, nothing more, nothing less. Let me beat this to death. CRI-O loves Kubernetes. Kubernetes is it. CRI-O is, you know, very loyal to her man. She's never going to go anywhere. Mesosphere might come in and start shooting around her and stuff like that, but she says, no frigging way. Not even in the ballpark. No way. Definitely not, okay? All CRI-O cares about is Kubernetes. It's just Kubernetes.

So, an overview of additional components. There are additional things we needed in order to build CRI-O, and we'll talk a little bit about those.
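For a rough idea of what that wiring looks like in practice, pointing a kubelet at a CRI implementation like CRI-O was roughly this in the Kubernetes 1.10/1.11 era — flag names may differ in your version, so check the docs:

    # CRI-O listens on a UNIX socket; the kubelet is pointed at that
    # socket instead of at the Docker daemon.
    kubelet \
      --container-runtime=remote \
      --container-runtime-endpoint=unix:///var/run/crio/crio.sock \
      --image-service-endpoint=unix:///var/run/crio/crio.sock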
One of the things we needed to do is translate the input from Kubernetes. Kubernetes has its own specification of what it wants to do to run a container, but we have to translate that specification into the OCI runtime specification. There happened to be a project inside of OCI called oci-runtime-tools, actually written by one of my guys. It's a library that can take input from users and generate an OCI runtime specification, so we use that inside of CRI-O.

The next thing we needed — again, CoreOS comes along — was a way to configure networks. Networking is kind of a strange part of this whole container world. We want to allow different virtual network tooling to come along and be able to plug into the container environment. There are lots and lots of companies building their own container networking, either hardware-based or software-based. So CoreOS defined a standard called CNI, the Container Network Interface, to allow other people to plug in. It's been used with Flannel, Weave, OpenDaylight, Open SDN — I think OpenShift has its own version. So lots and lots of people are building Container Network Interface plugins.

Lastly, to run containers, we need a way to monitor the container. When I launch a container on the system using an OCI runtime, the runtime just goes out and configures the kernel — the cgroups and security settings and namespaces — launches the process, and then goes away. At that point, there's nobody watching the container. There's nobody sitting out there saying, did the container exit? Trapping it. So we needed a tool to basically watch the container, and that's called conmon. We wrote it in C because we want it to be as lightweight as possible. It monitors the container and takes care of logging the output — when you run containers, you usually watch what's going to stdout and stderr. It handles the TTY, it serves attached clients, and it detects when the container dies and then writes the exit status to a file. So any container engine that comes up later can go to conmon and figure out what happened; or conmon will exit with the container, but it will have recorded what happened.

So, the pod architecture. When you're running Kubernetes in your environment, Kubernetes runs pods, it doesn't run containers. Pods are basically one or more containers running together. And the pod also has this idea of what's called an infra container, or pause container. What happens when you launch a pod under Kubernetes is it launches this little tiny container program that basically goes to sleep. It just starts up, and then all those namespaces get attached to it — you have to have a process holding the original namespaces — and then containers get added to it. So if you looked at it under CRI-O, what happens when you launch a pod? We launch the infra container, it has one conmon listening to it, and then one or more containers get launched. So that's basically the whole pod infrastructure under CRI-O.
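Back on the networking piece for a second: a CNI configuration is just a little JSON file dropped into /etc/cni/net.d/ that any CNI consumer, CRI-O included, reads. A minimal bridge example looks roughly like this; the network name and addresses are just examples:

    cat > /etc/cni/net.d/10-mynet.conf <<'EOF'
    {
      "cniVersion": "0.3.1",
      "name": "mynet",
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "10.88.0.0/16",
        "routes": [ { "dst": "0.0.0.0/0" } ]
      }
    }
    EOF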
We talked earlier about how much CRI-O loves Kubernetes. The way we're trying to prove that is we have the biggest test suites. Every test suite we can find, we run before anything gets merged into CRI-O. We don't want CRI-O to ever break Kubernetes; no new feature ever breaks Kubernetes. Right now we're running — I don't know, it's probably much more than 500 tests — nine full test suites. To get a pull request into CRI-O at this point is pretty difficult, right? You have to jump through hoops. You have to make sure everything is in a passable state. No PRs merge without everything passing.

CRI-O came out and was fully supported as of last December. My engineers wanted to call it 1.0, so we released that back in December. I hated the fact that we called it 1.0. So the next release we called 1.9, which works with Kubernetes 1.9. Then we released 1.10, which works with Kubernetes 1.10. Anybody have a guess what works with Kubernetes 1.11? Yeah, okay. 1.11 works with Kubernetes 1.11. We are stalking the hell out of Kubernetes, okay?

The goal — I'll talk about this in a minute — is that OpenShift 4.0 will support CRI-O by default. Right now we support both CRI-O and Docker under the covers, but the goal at 4.0 is CRI-O by default. CRI-O is now running a lot of OpenShift Online, so if you go on OpenShift Online, you're using CRI-O. If you go to Microsoft and you want to launch a Kata container, you're using CRI-O. So CRI-O is actually getting out there. But in a lot of ways, I always tell people, I want CRI-O to be something you ignore. The real goal here is to make running containers in production boring. People ask, all right, what do you use in the back end? I ask them, what file system do you use? I don't know what file system I have on my laptop. Is it ext4, is it XFS? I don't know and I don't care. The only time I care is when something breaks. So our goal is to make this thing just blend into the background. It's just a feature underneath Kubernetes.

So what else does OpenShift need to do to run containers, beyond Kubernetes? Well, it needs the ability to build images. OpenShift has this concept called source-to-image, where a user just checks something into Git, does a push, and all of a sudden a container poops out the back end of OpenShift, right? So we need that container image to come out the end. We needed a way to support that for OpenShift, and we need the ability to push these things to container registries.

So this guy right down here, Nalin Dahyabhai, was working with me last year at DevConf.cz, and we're sitting there together — he's in charge of containers/image — and I kept saying to him that I need a tool for building containers. I wanted coreutils for building containers. I said, you know, it's just a rootfs. I need to create a rootfs, tar it up, tar up some JSON file, put it together and build it. And I said, I need some copy-on-write; you've got that, you've got containers/image — could we throw together something to do that? I told him that in the morning while we were at DevConf, and by that evening he did a five-minute talk showing how he would build container images using containers/storage. And he said, what do you want me to call it? I said, I don't care what you call it, just call it "builder." What difference does it make? And then he came out with this: Buildah.

And the last thing here — this is not the current logo, but this was the first image we put out for it. It's a Boston Terrier, supposedly wearing a hard hat. As soon as we tweeted that we had an icon for this, people came back and said, why do you have a dog with tighty-whities on his head?
So I still leave it in — it's much more of a hard hat nowadays — but I like to keep it just for that joke. Okay, so in the coloring book, which hopefully you guys picked up (if not, come get one from me afterwards), this is what Buildah is represented as: a dog, and I think it kind of looks like Nalin.

Okay, so Buildah came along, and again, my idea was coreutils for containers. We wanted a simple interface. We needed to be able to pull an image from a container registry to the host, so we have buildah from fedora. What this does is it uses containers/image to go out to a container registry, pulls the Fedora image down to the local system, puts it on top of containers/storage, and then creates a buildah working container. "Container" is a way overused word in this world, but basically it has all the data associated with a container.

The next step is to mount the container, right? I want a mount point. I want that rootfs mounted on my system so I can just write to it. So we have buildah mount, and that gives you back a mount point.

Okay, now the segue. Anybody ever hear of this command? Anybody know what it does? It copies content from a container image to the host, or it copies stuff from the host into a container image. Really cool, huh? I saw that and I said, I'm going to steal that idea. So I decided to go off and build my own tool, and I called it cp. I put it into the core utilities on the system, and it works really well. But once I saw that work really well, I decided to build another tool. So I built a tool called dnf. Sometimes you call it yum; I used to call it yum; I might call it yum again in the future. But with this tool, you can actually install content into a container rootfs. I just added --installroot, and you can install Apache into an empty rootfs. And I said, that's cool, I'll invent another tool. I invented a tool called make. With make, I can do this thing called DESTDIR; I can set it up to point to a rootfs. So what I'm showing here is that you can use anything on a Linux system to populate what's going to go into your container.

The next thing you need to do is populate the JSON associated with the container image, and for that we have buildah config. You can set things like the entrypoint, environment variables, all the different stuff that you put into a container image to identify what the container is. And then finally, we want to take that working container and actually create an image, right? Create an OCI image on the system — that's buildah commit. And then of course I want to be able to push it somewhere, push it to a container registry, so we have buildah push.

So with this tooling — and by the way, all of this, no big fat container daemon, right? I don't need a daemon to do any of this. Not only that — I'm showing it running as root here, but with the current Buildah we can do all of this as non-root, taking advantage of user namespaces. Simultaneously, all right? We're gonna try it again, everybody at the same time. One, two, three. Dammit, please!
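Pulling that whole workflow together, a minimal sketch looks something like this. I'm running it as root here, and the httpd example and registry name are just illustrations:

    # Pull fedora and create a working container
    ctr=$(buildah from fedora)

    # Mount its rootfs and use ordinary Linux tools to populate it
    mnt=$(buildah mount $ctr)
    dnf install -y --installroot $mnt httpd
    echo 'hello' > $mnt/var/www/html/index.html     # plain old shell redirection works too
    buildah umount $ctr

    # Set the image metadata, commit it to an OCI image, and push it
    buildah config --entrypoint '["/usr/sbin/httpd", "-DFOREGROUND"]' --port 80 $ctr
    buildah commit $ctr my-httpd
    buildah push my-httpd docker://registry.example.com/my-httpd:latest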
What about the Dockerfile? Glad you asked. Buildah also has to support Dockerfiles, okay? The Dockerfile has become the de facto standard. I like to think of it as a really crappy version of a shell script, but it's become the thing everybody wants to support. So we added support in Buildah for building from Dockerfiles. We built a command called buildah build-using-dockerfile, and it basically has the same syntax you'd expect for running builds. But of course we're engineers, so we're all lazy, so we also have buildah bud. And no, Anheuser-Busch was not involved in this decision. So we can build container images using Dockerfiles.

Well, there's no "buildah file," but I decided to write this really nice scripting language and I called it Bash. So after I wrote Bash, I basically have lots and lots of tools out there to build container images. The whole idea is that what I really wanted to provide is a library and low-level command line tools that other people can use to build higher-level container languages. We want others to build on it.

OpenShift is looking at this to replace what source-to-image does today. Right now, source-to-image actually injects the Docker socket into the containers to do builds. A lot of times I tell people that is probably the most insecure thing you can possibly do. If you want to give people access to the Docker socket, you might as well just give your non-root users sudo and turn off your logging, because if you give a non-root user access to that socket, that's what you're doing. If I do evil things on a system via the Docker socket, I can then destroy my container and there's no record of me ever doing anything on your system. So never give that socket to a non-root user. So what we want to do with source-to-image is stop injecting that socket. Lots and lots of people are out there running container builds inside of Kubernetes, and what they're doing is volume-mounting in that socket, okay? Which is equivalent to giving them root on any host that's doing it. So we want to be able to use Buildah inside of source-to-image and stop injecting the socket. Ansible Container is also looking at potentially using Buildah, basically using Ansible playbooks for defining what's in the container image.

So what else does OpenShift need? We need the ability to diagnose problems. We need people to be able to play in this environment. So we decided to create a new tool, and we called it Podman. Podman is part of the libpod effort. We wanted to build a pod manager, a container management tool. It's just a CLI tool that can be used for managing containers and container images, and we based it on what everybody already knows, which is the Docker CLI. So Podman is now out. We're actually releasing Podman on a weekly basis; we've been doing that for probably the last six months. The latest is Podman 0.8.3 — eight is the month and three is the week. So at the end of the year we're going to be in trouble; we have to have 1.0 by the end of the year because we can't keep our naming scheme going. But basically: you want to list the containers on the system, you want to run a container, you want to exec into an existing container, you want to list the images on the system.
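Roughly, those map one-for-one onto the Docker CLI commands people already know; image names here are just examples:

    buildah bud -t myapp .            # build from a Dockerfile, like docker build

    podman ps -a                      # list containers on the system
    podman run -d --name web nginx    # run a container
    podman exec -it web bash          # exec into an existing container
    podman images                     # list the images on the system

    # and if you really want, in ~/.bashrc:
    alias docker=podman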
Basically we've tried to copy everything in that CLI that we care about. Obviously we're not doing Swarm with this command, but most of the commands are done, and lots and lots of people are using it. There was a great tweet that came out — I guess back in May — and I love this tweet. He says, "I completely forgot that two months ago I set up an alias of docker=podman. And it has been a dream." So he'd been running with Podman for two months at that point. And of course that's a several-month-old tweet now. The next reply down says the only downside is no book — I'll talk about that in a second. The next one down, Joe Thompson replies and says, so remind me, how did you figure out that you were running Podman instead of Docker? And he said, I executed docker help and it came out with Podman help. And I think I owe about three quarters.

So what I'd like you to do right now is go home and try this out, try out Podman. It's available on Fedora, RHEL, CentOS, Ubuntu, and it's fully supported on openSUSE as well. It's basically gotten out there; we have lots and lots of contributors. And guess what? No big fat container daemon, okay? It works like fork and exec. It works sort of exactly the way you expect — not a client-server operation. Podman is really, really cool and does almost everything you need.

So we've talked a lot about containers. We handed out the coloring book before, and I think I'm just about out of time. We have two other talks this afternoon. Nalin is going to be giving a talk — and I'm sure he's going to come back and attack me — a deep dive into Buildah. And then Urvashi and Sally O'Malley are going to be talking about all the deep stuff; I've said there's lots of security work we've been able to do by breaking apart containers, and they're going to be talking about that later this afternoon. So look for those talks. You can take a photo of this slide, and the presentation will be there.

I can only answer one question, I guess. Yes? Is there any tool currently that can update a tag on a container in a remote registry? Actually, someone asked for that. The answer is that has to be built into the protocol that talks between the client and the registry. And Vincent's raising his hand back there because he's going to point out that they're working on a standard now to define that. Is that what you're going to tell me, Vincent? Yeah, you can drop a coin in. The Docker Registry API — not the Docker Registry code base, but the Docker Registry API — has now been donated to the OCI, the Open Container Initiative, as the distribution spec. It is the API that would enable a feature like that, but it's not really there in the client tools right now; they would have to do some shenanigans, like fetch the image, retag it, and re-push it. So that would be the place to look: opencontainers/distribution-spec. So we actually had a big bug report with someone asking for that in Skopeo, but we needed to get it into Quay and Artifactory and docker.io. We really need it to be a standard for how you interact with container registries before we can do something like that.

Anybody else? Everybody loves this idea and they're all aliasing it on their machines right now — excellent. All right, anybody want to talk to me? I'll be around, and thanks for coming. Yeah, all right, cool. All right, sweet.
Following Dan is going to be hard, because he was awesome today; he was on fire. But I'm going to dig a little deeper into a lot of the things he talked about — go a hair deeper into the standards, how they work, and why. Essentially, if you back up and think about it as a whole, the problem for the industry is that if we don't have a standard, you can't consume things portably. A perfect example: say you want to use a registry server at Amazon. You want to build an image locally, push it to Amazon, push it to Docker Hub, push it to your own built-in registry. You have to have a standard that everybody complies with so that you have true portability. That's the idea of the cloud, right? I can move things back and forth and make it easy. And it's a mixture of products provided by companies, community projects created by individuals and communities, and services that are also provided by companies. If you don't have that standard, you have bags, boxes, barrels and crates, as I used to say — all different sizes — and to put them on a ship you have to stack them all. So really, literally, the good analogy is shipping containers.

And people don't think about this — it's a conversation I've had a lot. In fact, this week I had it with the Sylabs guys, the folks who do Singularity, because they have a different format. We were discussing where they should plug into the ecosystem. This is a conversation where, if you put your head down for five minutes to start working and then lift your head, you go, oh man, where do I fit into this ecosystem? Do I build to the CRI spec? Do I build to the OCI spec? Where do I fit in and where don't I? Should I use the Docker CLI? Some people have integrated with the Docker socket, things like that. And you end up in dangerous situations where the world moves, tectonic plates shift, and now your stuff doesn't work right because something broke with a version change.

So what I want to dig into in this talk is: where should you integrate? Where do these standards protect you? Where don't they protect you? What do you really get from all of them? Dan did a great job, because we've built an ecosystem of tools — Buildah, Skopeo, Podman — that actually work using a lot of these standards, so we have a lot of expertise in figuring out where these things should fit and where you can invest long term. To have a healthy ecosystem where people can go off and innovate and do things, and then lift their head back up and have their tool still work in the ecosystem, you pretty much have to have well-defined interfaces and well-defined standards. And I'm going to dig into about five of them.

So the solution, as I mentioned, is open standards. You have to have well-defined interfaces and open standards. I need to be able to swap any one of these components out in certain places. The world started with Docker; they invented the concept of a container engine, which, as I talked about yesterday, is really basically a giant proof of concept. That word Docker means so many things: it's a company, it's an API, it's a daemon. It was a format for images. It was also an ecosystem of images out on Docker Hub.
And the only way to sell the world on how to do containers in this new way was to build that giant proof of concept. That was the genius. But the downside is that now, four years later, you've got everything munged into this giant proof of concept. And that is bad, because it's really hard when you want to make a small change — like the example Dan brought up, where Rocket wanted to integrate with Kubernetes but Kubernetes was hardwired to Docker. That's a bad scenario, right? Or you want a standard spec for what images get pulled. So we've chipped away at different pieces of this to make these things pluggable. And today, in 2018, we're actually in a really good spot where you can plug these different things in, and essentially every one of them does the same job — I'll dig deeper into where they all fit and why they work.

So, a long time ago, several years back — four years ago, four and a half, whenever Docker first came out — a lot of people made the shipping analogy. And I actually went back and dug into some of the ISO standards around actual shipping containers, because they're so apropos to software containers. One that was really interesting to me was ISO 1161, hook dimensions and strength. I just pictured it: there was some point in history where some guy was operating a crane, trying to lift a container, and the top tore off, and who knows, the container fell and all the stuff spilled out everywhere, because nobody had ever standardized the tensile strength of the corners and the hooks and things like that. We went through that same process with software containers, and now we have an actual standard for what the images look like.

And this also highlights not just the image but the tooling, right? Like the cranes: I want to be able to buy one set of cranes that can move these containers around. I want to invest in those cranes because I'm going to keep them for 30 years. The business problems associated with that become a lot more obvious when you're building physical stuff, because software feels like you can change it really fast — but in reality, nobody wants to once it's in production. So you want to adopt a standard, because I still want to get value out of whatever I built in five years or three years. Even with some upstream stuff I've built — I'm lazy, right? I don't want to go change it just because something changed. I don't want to have to constantly rewire it. I just want to build to the spec and say, okay, it should work. That's what I'm trying to highlight with the analogy to real shipping containers.

So these are the five specs I want to dig into, and they're actually similar. I would say these two on the end are really well-defined interfaces — I'd call them community-driven standards — and these three are actual standards-body standards. So there's a spec for images: the actual image format on disk and when you're moving it around. Then there's a standard now, which is recent, for distribution.
Distribution is about moving an image between a registry and a cache, or between registry servers, which is what Skopeo works on. Then there is the runtime specification, which explains — and I'll dig deeper into it — how you take all the stuff that's in the container image and turn it into a running process, and all the translation that happens in between, talking to the kernel. CRI does the same thing on the network side in a very similar way — sorry, CNI is what I'm talking about there — except that there are pluggable binaries that do the work, where on the runtime side runc is kind of the standard most people use (although that's pluggable as well). I'll dig into that a little bit. And then CRI is really the interface between the kubelet and the container engine.

That one actually came up the other day — I mentioned the anecdote about where the Sylabs guys from Singularity should fit. They have a really cool container engine, essentially, but they never were quite sure where they fit in and where they wanted to play in the space. They play in HPC, and in HPC they built their own image format that's not OCI-compliant, right? So now they're kind of trapped in that world. But that's good and bad: it lets them go innovate and do things that maybe HPC users want. Maybe HPC users do want to ship the data around with the container image, which is essentially what they do. OCI images do not do that; OCI images are just the code, and the data lives locally. But the cool part is that with CRI, two different groups can build a CRI-compliant container engine, and then Kubernetes can just talk to it — or crictl, which I'll dig into, can just talk to it — and you can list the images and see them there, even if they're different formats; it doesn't really matter.

So the beauty of understanding these is that you can understand where your project should fit in, where you should build, where you can rely on things so you know your stuff will work. And as an architect, somebody building an environment, knowing where these work, you know which projects you want to investigate. If I'm looking at a project, personally, I would be a little bit apprehensive if Singularity didn't comply with the CRI interface, because there are probably places where I'd want to be able to call it through that standard interface.

So I'm going to focus on CRI, CNI, and OCI — those three main sets of standards, if you will. I showed this slide yesterday, and it's the only one I'm going to show from my other talk, but this is kind of the full money shot of what a container engine does. It took me a long time to boil down what is actually happening in a container engine. Essentially — and I'm going to dig deep into this in some later slides — it's a conglomeration of options specified in the container image as metadata, options specified as defaults in the container engine, and user-overridden options that they pass on the command line, or through YAML in Kubernetes, which gets passed to the kubelet, which gets passed to the container engine.
But in a nutshell, it's image-defined, engine-defined, and human-defined options all getting combined into something that then gets passed to runc — an OCI-runtime-spec-compliant runtime — and then to the CNI plugins, which are also binaries that take a config blob very similar to what gets passed to runc, but only for the network side. So if you think about it, that's what a container engine does. And then on the far right side there, I show all the image layers coming together, getting smashed together — in a nutshell you can think of them as an overlay, with a copy-on-write layer added at the very end, which I won't dig deep into. At the end of the day, just think of the container image getting smashed down into a single layer that gets mounted as a rootfs. And then all these options come together, and that's how a container gets run.

So now I want to dig a little deeper into where the standards play. I showed you CNI, the OCI runtime, the OCI image — and distribution is really up there too; I have an update of this drawing. And here's where CRI works: Kubernetes is talking through the CRI interface to CRI-O. CRI-O is a daemon that understands the CRI protocol, essentially. And you could plug anything in there instead of CRI-O. In fact, there's commonly something called the dockershim, which speaks CRI on one side, speaks Docker on the other side, and talks to the Docker engine. containerd has a new component, the containerd CRI plugin, that speaks this natively. And as I mentioned, the Singularity guys are thinking about plugging Singularity in there as a CRI implementation. The cool part here is — notice how CRI-O is boxed in here. It's boxed in by standards on every side: by CNI, the OCI runtime specification, the OCI image and distribution specifications, and CRI. So you can plug CRI-O in and out. There's nothing stopping you from swapping some other CRI-compliant, OCI-compliant daemon in and out of there. And that's the beauty of these standards.

And then this is the one I came up with for Dan — he requested a drawing that shows where crictl and Podman come in. So this is a slightly deeper version. There are some other things going on below the covers, right? At this layer we have the CRI interface, and that's what defines the interface to the CRI-compliant container engine — whether it's a shim or it natively supports CRI, you can plug that container engine in and out. On one side, what you care about is the robots: Kubernetes. That's the scheduler coming in and talking to the container engine. But on the other side, we have something called crictl. Think of this as the human interface to the container engine. This took a while to click for me: if you want to have pluggable container engines, you need a standard API that robots can talk to, but then on the other side, you really need a utility that humans can become familiar with.
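Concretely, that human-side interface looks something like this. The socket path here assumes CRI-O's default; adjust it for whichever engine you've plugged in:

    # Point crictl at whatever CRI-compliant engine is running
    crictl --runtime-endpoint unix:///var/run/crio/crio.sock images   # what's cached locally
    crictl --runtime-endpoint unix:///var/run/crio/crio.sock pods     # list pod sandboxes
    crictl --runtime-endpoint unix:///var/run/crio/crio.sock ps -a    # list containers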
So if I learn how to use crictl — if I just learn crictl images and crictl ps — you'll notice the interface is quite similar to what Docker looks like, except there's no run, because it's assumed that with the CRI interface the kubelet will be sending the runs to the engine. But in a nutshell, it's a pretty familiar command line interface, and you can now swap out the container engine pretty transparently to both the robots and the humans. That's the beauty of CRI, crictl, and the kubelet talking CRI to a CRI-compliant container engine.

Then one level down, many or most CRI-compliant engines will be pulling OCI-compliant images from OCI-compliant registries through the distribution and image specs. The one caveat there, as I mentioned: the Singularity guys, for example, could plug in a non-OCI-compliant engine that goes and pulls its own container images in a completely different format, which is fine — if that's what you want, as long as you understand that's what you want, because you want some other value from that container engine. But then all the rules below this might not apply anymore. In the CRI-O world, this is what it looks like: you pull OCI-compliant images, then one more layer down you've got containers/image and containers/storage, which mandate in a well-defined interface how things get laid out on disk and how the images get stored. To look at that, you can do a crictl images and see what images are cached locally. But if you want to tag images and play around with what's in the cache, that's where something like Podman or Buildah — any of the tools that can talk to the local container storage — lets you muck around with that stuff at a lower level if you want to.

And then, at the end of the day, as I showed in the last slide, everything gets handed to runc. When it gets fired up, you're just passing a rootfs and that config.json to runc, and runc talks to the kernel through a standard interface. As long as the kernel features are there, those things get turned on — whether it's cgroups, SELinux, seccomp (which I tracked down the other day, and I have a whole deep article on that), and whatever clone arguments you want to pass, so whatever namespaces you want the container to run in. All of that's defined in that runc interface.

All right, so what is a container anyway? I think Nalin mentioned he's going to go into this deeper. It's not rocket science. People treat it like a black box, so it's a little bit intimidating when you first start. And I have to admit, tracking down a lot of this in the code and asking Nalin and Dan and a bunch of the guys on the runtimes team different questions, I realized there was a class of people who understood what was going on under the covers, and it was like five, 10, 15 people in the world. And I thought, we need more people to understand this, because as architects, when you're building these environments — I was a long-time sysadmin, way too long, I have PTSD from it — sysadmins, architects, people like that need to understand this so they realize, oh, okay, some of this stuff is not that hard. I can feel a little more warm and fuzzy about where I can swap things in and out and actually get the value I want out of things.
So there are places where you might want to swap out the runtime or the CRI engine, et cetera. But they're not magic, right? At the end of the day, it's just a bunch of metadata and a bunch of files. That's all it is. That's all container images are. And the metadata is really, if you think about it this way, a way for the human who is building the container image to express to the person consuming it: hey, here's how I think you should consume this. Here are some sane defaults. Here's the basic command that should run. Like, this is a memcached container image, so the command is going to be memcached. That makes sense, right? That's pretty obvious. But as container images get more complex, there are other things the container builder might want to express to the container consumer. So really, think about this as a human interface. This is for humans to communicate with each other using code. And if you think about what these containers are, they're a format for collaboration.

The analogy I use is: hardware was physical letters, right? I used to get servers from HPE or IBM or Dell, and they had to ship them to me just like a letter. That was a very slow process of collaboration. If I got the server and it had the wrong OS on it, or it wasn't updated to the right level, there was all this stuff where I'd have to call and update the drivers, blah blah blah, all these nasty things. It was very slow and tedious. It was still decent collaboration — more open than we had in mainframe and even Unix days — but it was slow. Then the next level, I joke, is email: that was VMs. Email and VMs are better, but not awesome, because if I want to collaborate with another human being on a VM image — the virtual appliances that VMware tried and failed with — I have to go pull a 10-gig image from an FTP site or SCP it from somewhere. And whenever I want to collaborate with another human being, I have to mark up changes in that user space in that VM, change them, save them, and then they have to upload them back to me. That's terrible. That's like emailing LibreOffice documents back and forth and using the markup. And if you send it out to five people and each of them makes changes, you're now screwed, because I can't actually merge all five changes — I don't know who made changes first, and there might be collisions.

But when you get to containers, you have a format where you can actually define the layers and see the diffs between the container image layers. Now we get to a point where the files portion of this and the inputs portion of this are really collaborative. So we can really collaborate on what the container builder and the container consumer want to do. In a nutshell, that's all this is doing. And there are two basic things, right? There's the config.json, which I mentioned here, and then there are the image layers. The image layers get combined together through a graph driver that builds the root file system; that, plus the config.json, then gets handed off to runc. And then I show a little bit more complex version of it.
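You can actually look at that builder-to-consumer metadata directly with the tools Dan showed; something like this, where the image name is just an example:

    # See what the image builder is telling you: env vars, labels, layers...
    skopeo inspect docker://docker.io/library/memcached:latest

    # Or pull it locally and look at the full config, entrypoint and all
    podman pull memcached
    podman inspect memcached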
So this is the OCI image specification: it defines that you put these things in the container image. The container engine explodes that stuff, does all the work to turn it into a root file system, builds the full config.json, and hands that config.json and root file system off to an OCI-compliant runtime — which could be runc, Kata Containers, Railcar, gVisor; there's a whole bunch of OCI-compliant runtimes. But as was mentioned, runc is what probably 98% of the world is using. And that calls the clone syscall, which then talks to Linux — typically a Linux operating system; it could be Windows, Windows has their own syscall layer.

And then here's the full money shot, as I mentioned. What are we doing here? There's a container builder and a container user — this is the consumer, this is the builder. What the builder is doing is saving a bunch of inputs to communicate to the end user: what do I think you should do with this container? It could be, like I mentioned, a memcached image, and maybe I'm going to set the entrypoint to memcached so that this thing always works. Maybe I'm going to set a default username and password for my SQL server, whatever. There are all kinds of things you can communicate to the end user here. Or maybe you're going to communicate that they shouldn't use this without a password, so it should fail unless you pass an environment variable. Or, like Microsoft SQL Server: their images won't run unless you agree to the EULA, so you have to set EULA=Y. So this is a way for the container image builder to pass information on to the container consumer in a really collaborative, cool way — way better than VMs. And I give some examples of what they can use to build: Buildah, docker build, umoci — there are all kinds of tools that now build OCI-compliant images. You can do it with Podman and a Dockerfile. And at the end of the day, what happens is you end up with the config — essentially the image-side equivalent of the config.json — shoved in a registry with a bunch of image layers. And they're just a bunch of tarballs; that's all they are.

And then I show the daemon — or really the engine, which is what I should change this slide to say, because it used to be only a daemon, and now there's no more big fat daemon. The engine now has a job, right? It has to interpret what's going on with some of these environment variables. Which ones does it want to obey? Which ones doesn't it? Which things does it want to add itself? A perfect example: if you don't specify a seccomp profile, the engine has some defaults. If you don't specify --privileged, the engine has some defaults for what it does with security. In the Red Hat world, SELinux is on by default, so the engine will automatically generate an SELinux context that's dynamic and unique to that container, and it will start the process in that context. Those are all defaults specified by the engine itself. Then the engine builds the config.json I mentioned. And the user, at the point where they go to interact with the engine, can override some things, right? I give some examples here, like kubectl run with --env=A=B.
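As a sketch of that layering of defaults and overrides — the image and variable names here are just examples:

    # The image ships the builder's defaults; the human can override at run time
    podman run -d memcached                          # uses the builder's entrypoint/env defaults
    podman run -d -e MEMCACHED_MEMORY=128 memcached  # human override of an env var

    # Same idea one level up, at the orchestration layer
    kubectl run mc --image=memcached --env="MEMCACHED_MEMORY=128"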
So even if you're interacting with it at the orchestration layer, you can override some of these environment variables that might be buried in a container image somewhere — again, it's a way for the person who built it to communicate to the user: hey, here are some sane defaults, but maybe you want to override them because it makes sense. And then, as I already mentioned, it gets handed off to runc or one of the other OCI-compliant runtimes, off to the kernel, and then it runs.

So I'm going to show just a quick example of this, which I actually already have running. Watch this. The beauty here is: we can docker run bash. We all understand how to do that. cat /etc/redhat-release, right? Makes perfect sense; we've probably all seen that. But then let's do this — here it is, actually let's just do it by hand: sudo podman run -it rhel7 bash. Look at that, same thing, right? Completely command-line compatible. Pretty familiar, right? I think most of you would know how to run that. How many of you feel good with that? Pretty easy, right?

Now, something I didn't talk about: this CLI interface is not covered by a standard. Neither is the Docker API. So there's one gap there. I'll tell you the one place I would recommend not integrating: directly against the Docker socket, or against the daemon where you're going back and forth with the native Docker API. I wouldn't do that, because you're completely unprotected there — that could change and break. We can protect you with a tool like Podman for the CLI, and behind that we can protect you with the image format, the distribution spec, the CRI interface, the CNI interface. You're pretty safe once you get into those. But there may be a few places where people have integrated directly, natively with the API. That's one place I'd say steer clear of; it doesn't make sense, don't do it. Does that make sense to everyone?

So let me show you what I did here. This is basically what I did, right? It's that simple: I swapped out the Docker engine for Podman. Pretty much exactly the same, CLI-compatible. You won't notice a difference. You're getting the same output, the same things happening — the config.json and root file system get generated, runc runs it. Probably 90% the same code, except no daemon.

And here's one other piece I wanted to show you. Watch this: we'll go into another terminal over here and I'll show you what's going on. pstree -ac — so here's what's happening. It's actually really easy to track down Podman: there's the SSH daemon, my original bash that I was in, I ran podman, here's that bash — boom, things getting fired off underneath Podman. That's so much easier than trying to do it with Docker. You can trace this and see what syscalls are happening inside the process; you have a much better understanding of what's going on in the sub-process. With Docker, it's a pain, because you have to trace the running daemon, basically mess with the CLI, send some commands to the daemon, and try to figure out what the daemon is firing off. I've actually run into this problem, and it's kind of a nightmare, especially remotely, to figure out what's going on. This is a much more elegant interface.
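The demo boils down to something like this, where rhel7 stands in for whatever base image your registry serves:

    # The Docker way everyone knows
    sudo docker run -it rhel7 bash -c 'cat /etc/redhat-release'

    # Same thing, no daemon
    sudo podman run -it rhel7 bash -c 'cat /etc/redhat-release'

    # In another terminal: podman's containers show up as ordinary child processes
    pstree -ac | less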
This is better for things like another use case that came up last week: a container executor. They didn't want to use Kubernetes, they wanted to use Yarn — it was a Hadoop environment. They have their own scheduler, they've already built their own jobs, they probably have a ton of investment in that. But they were using the Docker daemon to fire off containers. That's an inelegant solution, really, if you think about it. What was happening is the Yarn container executor was firing up containers and shutting them down so fast that it was corrupting the Docker daemon. Because it's a daemon and it's trying to keep track of all this nasty stuff, right? And so they were firing up 1,000 containers, shutting them down real quick, and that can end up corrupting the daemon. In this scenario, that doesn't happen. This is just cloning and running — essentially a fork and running another process. So there's no client-server interaction, there are no race conditions where you send a command to the daemon and, oh, it thinks it's already shut down but it's not shut down, and then it tries to send another shutdown command. They were running into all kinds of nasty stuff where it would get corrupted. So we had them try Podman, and so far they love it. And this is a perfect example of where they were protected because they used the CLI — they weren't tightly integrated against the API — and everything else below that, the container images and everything else, was protected by the OCI image format. So they could fire up the exact same images with the same command-line interface, but in a very different scenario, with much better results. And those are the kinds of scenarios — people don't care until they do, right? Because usually what happens is you end up in some kind of nasty problem where something breaks like that, and then you're in a deep pickle, and that's when you start to care. So why would you want to do this, right? Well, because — I mean, this has already been gone through pretty well by Dan — it's a fully community-driven, open source set of tools: small, nimble core utilities that are standalone, can be used to build, can just be used to run, can be integrated with other schedulers in random ways like the Yarn one. Better security, can run as non-root. FIPS is something that we're working on now that we've enabled the possibility of, because our underlying Golang is linked against OpenSSL dynamically so that it can be put into FIPS mode. All kinds of innovations like that. You start to be able to break all this stuff apart, and you can really start to mess with some of the underlying things. And again, you don't realize what you need until you actually start to dig into the use case and start to build out real production stuff. And that's when you get burned by having everything buried in a giant POC where it's hard to make changes, this big ecosystem of things that are all glued together, very tightly coupled. So, call to action: here are some good places to go check some things out. Go check out CRI-O, Skopeo, Buildah — and Podman; I didn't put Podman on here yet because this is an old slide. And then go check out the upstream work in these libraries too, because you can see what we're doing and how they're leveraged in these tools. And so I'm actually ahead for once. I guess I flew through that. But I'll break and let you guys ask any questions. Any questions? I'm surprised. I don't know, you tell me. The question is, can you run Podman as non-root? The answer is yes.
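And the non-root piece is easy to try for yourself — a quick sketch, with an example image name:

    # As an ordinary user: no sudo, no daemon, no socket with root behind it
    podman run --rm -it rhel7 cat /etc/redhat-release
    podman images   # image storage lives under your own home directory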
So that's another thing. Oh, that actually brings up a good point; I'll rant about that for two seconds. I had a guy beat me up on a mailing list one time about, well, I can run Docker as non-root. I'm like, no you can't — you're running a CLI as non-root, but it's still talking to a daemon, and essentially you have complete root access. You can run a --privileged container and mount the local file system — do a -v /:/mnt — mount the local file system and corrupt it to hell. You are not preventing anything by doing that. It is an illusion; it is a difference between the model in your mind and reality, and your threat model is off if you think that you're running Docker as non-root. It is always root. But I saw a question over here. Oh, sorry, you got a question? And then he had one too. Yes. At Thursday's dojo, Josh Burke presented Kubernetes in 15 minutes on CentOS, and he said install Docker in his instructions to get that done. Would it be possible, speaking as a novice, to do that without using Docker? Yeah, you can do it with CRI-O very easily. I used to do a demo during this talk where I would actually swap in CRI-O live and show how Kubernetes doesn't even care — it actually keeps track of the containers, sees that they're not running, and restarts them in CRI-O, because it's just polling that CRI interface looking for: are the containers running? They're not up; it'll try and start them again. So it's actually very, very easy. It's one config line change, basically, is what I demo, and just installing a package, essentially. Next question: let's suppose in our companies, or any company, you have your own private infrastructure — private cloud, hybrid cloud, whatever you have — and you have deployed these flows in an environment where you have a mix of solutions. Does this work in every environment, or in a combined environment? I think that's the beauty. I don't quite understand — is that a question? I don't quite understand the question, but the idea is yes. Say Amazon has a registry — perfect example, they have their own registry, right? And if there's no standard, how do I know that I can push my container image into the Amazon registry and then pull it within Amazon, right? Sometimes you don't wanna set up your own registry server, but you want one that's local on that network so that it can fire up quickly. So you have a thousand container hosts out in Amazon, and you don't wanna set up your own container registry, you just wanna push into theirs. These standards are what protect you, that make sure that will work, right? And that's the beauty. Maybe I just wanna run a registry for a few minutes to cache locally, pull an image, and then blow it out to all these other container hosts locally. That's really nice to be able to do, because I can speed up distribution real quick for, say, a batch job that I just wanna run. So this is what protects you and lets you do that. Otherwise you would have to set up your own registry and make sure that it's compatible with everything, and it would be a pain in the butt.
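One way to do that kind of registry-to-registry or local-cache copy, sketched with the Skopeo tool from the call to action (registry names are placeholders):

    # Copy an image straight from one registry to another, no daemon involved
    skopeo copy docker://registry.example.com/myapp:1.0 \
                docker://cache.internal.example.com/myapp:1.0

    # Or pull it down to a local directory for a short-lived local cache
    skopeo copy docker://registry.example.com/myapp:1.0 dir:/tmp/myapp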
Any other questions? All right, I think we'll wrap up. So, all right, well thanks. Thank you so much. Thank you so much. Oh my God. Okay. Can you hear me? Testing? Awesome. All right, good morning everybody. We're gonna have the next talk, on scalable monitoring using Prometheus with Apache Spark, by Diane and Zach from the CDO office at Red Hat. Thank you. Thank you. So we're gonna be talking about scalable monitoring with Prometheus, using Apache Spark. So let's talk a little bit about that. I'm here with Diane Fedema. My name is Zach, and we work in the CDO office on AI and machine learning. Today we're gonna spend some time talking about observability and about performance tuning when you're running machine learning workloads, and I think this is a very important topic. Let's look at some of the use cases here. Whenever you're building something, the same problems of developing software exist whether you're doing machine learning, Java programming, or whatever development you're doing. Performance is a very important thing: if you're shipping something, the quicker you ship it, the better it is. So let me ask you a question, Diane: in terms of getting something — if I were to order a Tesla today, how long would it take? I think it might take a little bit too long to get one, and I'm impatient. It's the same with machine learning jobs: sometimes it takes a long time to train a model, right? Depending on how much data you have and your hardware setup, it could take months to run an experiment, and I don't think it's good to lose that time. The quicker you run experiments, the quicker you get feedback, and the quicker you can experiment, tune, and improve. If you take something that takes a long time, improve the performance, and make it run within a week, that's better than waiting a whole month, and so on and so forth. So one of the things that's important is that in order to improve things, you need to be able to see what the bottleneck is, right? In terms of performance. So I'll tell you a little bit about the history of why we started building all this infrastructure and tooling. We were basically running Apache Spark in Kubernetes pods, and we had data scientists writing notebooks and doing different experiments. But we didn't have visibility into exactly how much memory they were utilizing, or what was failing. So we ended up instrumenting a Java agent to run alongside Apache Spark and expose metrics — for example, metrics like the DAG scheduler, the JVM pools, the block manager, and other internal JVM metrics from inside the container. We later experimented with CRDs and continued doing more experiments. I got in contact with Diane because Diane has experience doing performance work in high-performance computing on supercomputers. So we did some more work there. So, does anybody here know Prometheus? Okay, about 40%. Just for the folks that haven't heard about Prometheus, I'll give a brief explanation. Prometheus is a time-series database system that collects metrics. In order for it to collect metrics, you've got to tell it where to collect metrics from — what the location is — and how often you want those metrics collected. So you have a couple of choices when you're designing a system like this. If you have access to the code base, you could instrument your code: import some libraries — there are libraries for Go, Java and Python — and expose specific metrics that way. Another option is to use exporters.
The nice thing is that Kubernetes already comes out of the box with instrumentation that exposes metrics. The problem was that Spark doesn't expose its metrics in Prometheus format, so we had to do some experimentation there. Anybody here know Apache Spark? Okay, a lot more people. So we won't go into the basics of it, just a high-level view of what we used it for. The thing we liked about Apache Spark was that you could do both batch and streaming: you could choose to get messages from a Kafka topic, or you can just pull from a file somewhere. It has machine learning libraries, it's distributed, you can do graph processing, and it has a nice SQL API as well. There's some interesting news: does anybody know about the Spark Kubernetes scheduler? Okay, one person. There's actually some work being done upstream at Apache where some folks at Red Hat and other companies got together to create a Kubernetes scheduler, so that instead of using Hadoop YARN, or Mesos, or standalone mode, you can use Kubernetes as the scheduler. And then there are a lot of programming language bindings, so you could write your program in Java or Scala or R and run your workload on Spark. And then obviously the data access options and the different data formats. So at a high level, what does an application look like when I'm using Spark? Over here you have a data source — this could be streaming, or it could be a file that sits on S3 or HDFS or some other file system that you have locally. In the middle there, you have some processing happening: you could do some ad hoc processing, or you could use machine learning if you want and produce a model. And that model you would store somewhere like S3 or another system. So, the high-level architecture of how we have this system in place: we use Spark from the radanalytics.io project, where we have pods, and we have a driver, and that driver is the application that's running against the cluster. And when we submit this application, we actually do a few things. For example, we have a Java agent running within these pods, and that Java agent runs with the Spark master, the workers, and the driver application. At a particular interval, Prometheus will scrape each endpoint and collect those metrics. And one very nice feature we found in Prometheus: if there's a problem — for example, if you want Prometheus to send you an email — you can set up something called Alertmanager. You set up particular rules, and I'll show you an example of a rule in a later slide. Anybody here work with Java agents? Okay, one person. So, Java agents: Spark is a JVM application, so it has metrics that you can expose through different options, right? One of the options is JMX. JMX provides these things — and I don't wanna go into too technical a deep dive into JMX and MBeans — but basically, you get your metrics accessible in a format that Prometheus can understand by adding this agent and exposing your metrics, and then Prometheus will scrape those metrics. So we're getting all the metrics — from Kubernetes, from the network, from Spark — all in one place.
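The Java agent wiring looks roughly like this — a sketch assuming the Prometheus community jmx_exporter agent jar, with an invented port and config path:

    # Expose Spark's JVM/JMX metrics in Prometheus format on port 9404
    spark-submit \
      --conf "spark.driver.extraJavaOptions=-javaagent:/opt/jmx_prometheus_javaagent.jar=9404:/opt/jmx-config.yaml" \
      --conf "spark.executor.extraJavaOptions=-javaagent:/opt/jmx_prometheus_javaagent.jar=9404:/opt/jmx-config.yaml" \
      my_spark_job.py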
So what does the configuration file look like? You're telling Prometheus how often to go and collect your metrics, and, if there's a problem, where to send notifications — to Alertmanager. And where are the rules that you want notifications for, and what's the URL or location that you want metrics collected from? There are a couple of options: you can do static configuration, or you can use auto-discovery. For example, in Kubernetes we use auto-discovery to discover the metrics endpoints within Kubernetes. So this is an example of a rule. This rule is pretty simple, right? If there's a particular issue on your cluster, you could say: okay, if this expression evaluates true for five seconds, send a critical alert saying that this Spark cluster is down. You can set different types of alerts: for example, alerts that tell you the health of your cluster, or alerts for when too much CPU or memory is being utilized. And then maybe somebody can go in, look at what the problem is, and do some more analysis. So we use PromQL — sorry, not Spark SQL — to get metrics out, and then put them in graphical form using Grafana. And there are a couple of different terms that exist: gauges and counters. Gauges are for when you only care about the latest value of a metric; if you wanna collect incrementing numbers, then you use a counter. So I'm gonna pass it on to my teammate, Diane, who's gonna take it from here. Okay, great, I think I'm tethered to this spot. Can you hear me? Okay, great. So in the second half of the presentation, I'm gonna talk about how we can use Prometheus to do performance analysis, which is my background. I did performance analysis for years in HPC on supercomputers, with parallel climate models that used MPI. So it's a little bit different, but the same general idea. Basically, when you've got a cluster running like this, you want visibility into everything that's going on. You need to know how much memory you're using in real time. If you're gonna see a problem, it's really nice to have it scrolling right in front of you as the job is running, and that's what we're gonna show you in a demo. If you hang on just for a minute, I'm gonna give you a little background before the demo. How do I advance this thing? It takes time — it's over the network. This, right? So, our example application — you can tell I haven't used this clicker — we're gonna show you a snippet of code and explain how we're gonna do an optimization, and we'll see it before and after, running in real time. And Spark — first I should explain, for those of you who don't know — is a very memory-intensive framework. You can sometimes run into 100-gigabyte JVMs, which is pretty unusual in the Java world, but it does happen fairly often with Spark. So it's worth optimizing your memory usage with Spark, it's worth paying attention to this — it's a good place to start. In our example, we're gonna look at running a Cartesian product, and we're gonna show you before and after caching an RDD that we're gonna reuse. With the Spark in-memory framework, even when you have nodes that have, say, 200 gigabytes each, it's worth it to pay attention to the memory that you're using. So before I show you the code that we're gonna run, I'm going to explain how the Spark memory management model works. There are two memory use categories in Spark: execution and storage. They share the JVM space, and that boundary you see there is movable. So Spark allocates memory in blocks like this.
And if you imagine on the left, those execution blocks: in our example code, we're gonna do a group by. Spark builds a hash table to perform that group by, and that hash table is gonna be built on the left side there, in those execution blocks. And then if we cache the result, it will be stored in the storage blocks over on the right. So you see that empty space in the middle is shared memory, and we're okay as long as there's no contention. But once there is contention, a block is gonna have to be evicted — and execution memory can never be evicted; it takes precedence. So in this case, that storage block is gonna be spilled to disk, and it will have to be recomputed. But that's okay — it's still worth it to cache things that you're going to reuse. And of course, we always wanna avoid spilling to disk if we can, because you pay a performance penalty for that. Now, as execution requires more memory, it is allowed to evict blocks from the storage area, up to a point which is user-definable: spark.memory.storageFraction is something the user can set. You can figure out how big your RDDs are and configure that so they won't be evicted and can be reused. Past that point, the block for execution is actually gonna be spilled to disk instead. So execution takes precedence, but within storage there is an unevictable amount that we can tune. That's another thing we can look at tuning, and we can run this with the Prometheus and Grafana dashboards and just see how the different settings come out for us. One more thing I'm gonna explain before we show you the code and do the demo, and that is Spark SQL. If you like working with relational databases, you can interact with Spark through the Spark SQL API. You get SQL syntax, and you get data frames, which we are gonna use here in our example. Data frames are like a table in a relational database: they are strongly typed, they have named columns, and you interact with them in a declarative way, just like you do with SQL — you use very SQL-like commands to interact with them. I just wanna show you where Spark SQL sits in the Spark stack: you can think of it as a library that sits on top of Spark Core. Our program is like this user program up here on the right: it's written in Python, and it uses both the Spark SQL API and the Spark Core API — you can intermix those any way you want. Some of the benefits of using Spark SQL — which is sort of the future of Spark — are that you get a lot of optimizations with it. It has the Catalyst optimizer, which optimizes your queries behind the scenes on the back end, and it also has a more efficient memory model than the plain JVM. So we're gonna show you a cached and a non-cached example. One of the things you wanna know, before you decide how much memory to set aside for storage, is how big your RDDs are. So you can just cache your data frames, which get turned into RDDs, go into the Spark UI, click the storage tab, and you see right there we have a six-megabyte RDD. So we wanna make sure that that unevictable storage is at least that big, so it will hold our RDDs. This is our code example, non-cached: we generate two random RDDs, we convert them into data frames, then we take the cross join — which is a Cartesian product — and we do a group by.
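The shape of that example — not their exact code, just a PySpark sketch with invented sizes, where commenting the cache() line out gives the non-cached run, and the memory knobs just discussed are passed as --conf properties:

    cat > cache_demo.py <<'EOF'
    import random
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cache-demo").getOrCreate()

    # A small random data frame (size is illustrative)
    df = spark.createDataFrame(
        [(i, random.randint(0, 99)) for i in range(2000)], ["id", "val"])

    df.cache()  # comment this out for the non-cached run

    # Cross join (Cartesian product) followed by a group by
    other = df.withColumnRenamed("id", "id2").withColumnRenamed("val", "val2")
    df.crossJoin(other).groupBy("val").count().show()

    spark.stop()
    EOF

    # spark.memory.storageFraction is the slice of shared memory that storage
    # keeps even under pressure -- the "unevictable" amount described above
    spark-submit \
      --conf spark.memory.fraction=0.6 \
      --conf spark.memory.storageFraction=0.5 \
      cache_demo.py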
One of the difficult things about a demo like this is that you have to make a small example that will run quickly, so this code actually runs in under a minute. The cached version here is exactly the same, except we're caching that RDD that we know we're gonna reuse. This is what the Catalyst optimizer does for us: in the Spark UI you can see how it has optimized your code, on the right. You can see, in our cached version there — I'm not sure if you can read that, but those are in-memory tables. This is what our dashboards look like, which we're gonna show live: the non-cached and the cached version. Because that's hard to read, I'm gonna expand it a bit and show that by caching this one RDD we reduced the memory usage by about half on those workers, if you look at the first four entries. So comparing non-cached versus cached: this is on four nodes, and we're seeing a 65% reduction in memory and a reduction in our timings. If we run it on eight nodes, we also get over 50% reduction in memory and in our timing. The thing that we do leave on the table in this example, though, is — see how it's a little bit load-imbalanced? You'd probably wanna go back, run this again through Prometheus and Grafana, and maybe increase the number of partitions from 200 to maybe 400 to see if we can load-balance the cached version. So now we're gonna run the demo. I'll mirror it. Okay, so we start here at the OpenShift console. Let's see if we're gonna refresh that. One second. I'm gonna have to connect to the VPN. I'll show you that this is live — definitely a live demo. It's a lot more fun to see a live demo than something that has a safety net, a video; it's more fun. It's connecting, it'll take a second. Yeah, we are on one of the students' accounts, so it should work. It'll take a second. I think we're on now. I'm afraid it may have dropped the student login — are we using a token? Yeah, it's cool, it's way more fun to do live demos. Oh, we're back on guest — that's the problem. We were on a student's account and now we're back on guest, so it's not gonna connect. It's been working for me, I guess. Is this the student who helped us? One second, one second. Well, we're still good on time. Okay. So I'll explain some of it while Diane's setting everything up. What Diane's gonna show you is she's gonna run a Spark job against a cluster with the sample code that she showed you, with the two different versions of the code from the slide deck. And then once Diane deploys it to the cluster, we're gonna see live metrics and graphs and charts. And our demo is okay — yay, it's back. Perfect. Thanks for being patient. Okay, so first thing, we can see that our targets are up here. This is in Prometheus. If you've used Prometheus much, you know that one of the first things you wanna do is check the status of your targets to make sure that they're actually being scraped and have a status of up, so that they're sending you data. So our targets are up. Now I'm gonna go to the deployment config that we have for the non-cached version. Perhaps I should explain what deployment configs are, right? For example, since Diane already wrote her Python code and she wants the platform to handle creating a Docker image and doing all that stuff, this makes it a little bit easier to rerun your Spark job.
So you can just rerun the deployment config by clicking deploy. Basically, in order to create that deployment config, you just use a template, and then S2I goes and creates that deployment for you. Here's our job — it just showed up. This is the non-cached version. I'm gonna prove to you that it's not cached by looking at that storage tab, like I told you earlier: we have nothing there, because nothing's cached. But you can see that the job is running, and these are the tasks that have succeeded so far. Then we'll go to Grafana, and we see that everything in the system is pretty quiet, but the CPU usage is starting to pick up, and we're gonna see the memory usage start to pick up here in a second. We've set the scrape interval at a bit of a bigger number, like 10 seconds or so. You can actually set scrape intervals to be quicker, to scrape more frequently, if you want snappier graphs. So here we can see these workers coming in — the Spark cluster is on the top there — and they're using about five gigabytes of memory each. So we wanna save on that: by doing caching, we wanna see whether we can improve on that. We have all these different dashboards that we created, and — especially if you love to measure things like I do — it's a lot of fun to make these dashboards; there are basically infinite things you can do with this. Down here at the bottom, we have the JVM space. So say you're doing a lot of garbage collection — we don't happen to be doing that in this case — but if you are, you can go down and see how the JVM is being used. And this area at the bottom is the storage that is cached. So if you had a problem — if you're garbage collecting too much — you could see it here. So here we see that the Cartesian product is happening now, because we're up over 19%, 20% — it's a CPU-intensive operation. We have one container per pod, so these values are the same; that was from the diagram that Zach shared with you earlier, with the workers each having their own pod. So that job is over now, and we can go back here. Let me see — I need to look at the Spark UI. See that our job finished in 1.2 minutes. That's gonna improve with the cached version. We have the deployment config with the cached version right here: push the deploy button to rerun it. That's what I was explaining — how easy it is to do once you have your job all set up. And we're gonna see this job, the cached version, coming in here. So the tools I used to use in the past for reviewing this kind of thing were colmux and collectl, and it was all ASCII-based. You kind of saw what was happening over the whole cluster, but you didn't get these nice graphs, and you didn't have the control of building your own graphs so easily. The latest version of Grafana lets you drag and drop things all over — drag and drop your graphs wherever you want — and it's just a really nice interface. So, see this — there we go, there's the cached version. Let's prove to ourselves that it's really cached: look at the storage tab, there's the cached RDD. Now we go to Grafana. You see that the cluster CPU usage and memory usage are down again between the jobs. And you see we're scraping every 10 seconds, up there on the right. You can see that this is an eight-node cluster here, and you can actually look at these statistics per node if you like, or per application. I'm gonna look at all of them instead.
And you see the new job coming in now. Pod memory usage is just half what it was, at 2.42 gigabytes, and we have that benefit of caching shown here. And like I say, there are all kinds of knobs in Spark that you can change, and then you can quickly see your improvement or degradation here at a glance. And when you're running things on cloud workloads, where you're paying by the minute for particular hardware resources, I think it's important to know exactly what's being utilized, to fix the bottlenecks, and to be able to optimize your application a little bit better. I remember back when I was in college, I did a startup where we built a search engine and did a lot of experimentation, and we managed to optimize our code to run more efficiently and ended up saving a significant chunk of our AWS costs. So it's always good to optimize for performance — you save money too. So the timing on this is 44 seconds. And yeah, I think any time you're doing some optimization, you just don't know where to start unless you have some visualization of the utilization of the resources. And this works for GPUs too — you can also put GPU information into Prometheus. It just allows you to get the most out of your hardware; without it, you have no idea what's really going on. You have to have some visualization of what's happening across the cluster. So, does anyone have any questions? The question: how do I do this with my own application, my Python code? So we started with the Python code that we showed, and then we ran it using Oshinko from radanalytics.io. They have S2I templates that you can run on Kubernetes: my code is sitting out on GitHub, I tell the template that my code is on GitHub, and it creates a Docker image for me. So I think what you're asking is: with your Python code — let's say it's not Spark, you just have Python code — you just import the Prometheus client pip library, you put all the metrics inside your code, and then you get an endpoint; you add that endpoint into the config map here, and then Prometheus is gonna be able to go and scrape it. I can't hear your question, so I'll repeat it. In your code you do a lot of log statements, and that's how you measure metrics and different things that are happening in your application. So there are some interesting projects — there are different ways to measure things. For example, there's OpenTracing, which is a Jaeger project. Then there's the type of metrics that we just want to get out of the application — we can instrument our application for metrics — and then there's the log method that you have. The concern is: I want my application not to be too tied to the external monitors that monitor my application, not to be too tied within the code itself. If I need to import Prometheus-specific code and write Prometheus-specific code in my application, that feels like an anti-pattern. Well, instrumenting your code is one option — to instrument your code and output metrics in Prometheus format, if you wanna do that.
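For a plain Python application, the instrumentation route looks roughly like this — a sketch using the prometheus_client library, with made-up metric names:

    pip install prometheus_client

    cat > app_metrics.py <<'EOF'
    import random, time
    from prometheus_client import start_http_server, Counter, Gauge

    REQUESTS = Counter('myapp_requests_total', 'Requests processed')
    QUEUE_DEPTH = Gauge('myapp_queue_depth', 'Items currently queued')

    if __name__ == '__main__':
        start_http_server(8000)   # serves /metrics; point a scrape config here
        while True:
            REQUESTS.inc()
            QUEUE_DEPTH.set(random.randint(0, 10))
            time.sleep(1)
    EOF
    python app_metrics.py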
Should we take another question in the back? Yeah, we have a question in the back. So, I can see where this would be very beneficial if I'm running a consistent workload, to optimize for that workload. Are there any best practices for applications that are processing dynamic data, where you don't know what the exact volume of the data is gonna be? So the distribution of what should be storage and what should be execution might change over time — are there rules of thumb to start with if that's your use case? So, for a general Spark application, are there rules of thumb? Is that what you're asking? Yes. Okay, yeah, there are a lot of rules of thumb for many of these settings in Spark. For instance, the number of partitions you wanna set: you want every one of your cores to get at least one partition — that's kind of a load-balancing thing. And Spark is very memory-intensive, so in general I start very high, if possible, with my memory settings, just to get something running first. I'll start with something like five gigabytes per worker; if that doesn't work, double it, and then I'll back down. So yeah, there are rules of thumb, and there are a lot of knobs in Spark to set. But if you start with the right number of partitions and plenty of memory, you've got a starting point: you'll be able to look at it like this and then tune it down from there. So I think we ran out of time, but if anybody has any more questions, we'll be in the hallway and we can have discussions out there. Thanks, everybody, for coming, and thanks for your patience. So the next talk's gonna be on Open vSwitch by Aaron Conole. Yep, hi everyone, I'm Aaron. This talk is on OVS debugging. It's gonna be very terminal-oriented, with a lot of text, so sorry — I guess read your email if that kind of stuff bores you. This talk is gonna be about debugging networking with Open vSwitch. I don't mean debugging the C code of Open vSwitch — we're not gonna do anything with GDB — but we are going to use some fancy OVS commands. We will talk about tracing packets, and yes, that does mean we'll be using tcpdump a little bit. But no, tcpdump is not the only tool you need to reach for when working with OVS. Finally, I'm not going to touch netfilter, the routing table, any of those things. We'll get to why in a bit; if you have a problem and you think, oh, okay, Open vSwitch and netfilter aren't playing well together, we will cover that, but I'm not gonna talk about netfilter itself. Okay, so there are two types of people I've geared this talk to: people who are writing SDN orchestration tools, and people who are supporting this stuff in a support role. The most common things that come up are: packets don't go out, packets go out the wrong port, performance is bad. Those are the big ones. And then most recently, when we enabled support for running stuff under SELinux: OVS doesn't start — but we should have solved that; those are real OVS bugs. All right, so how does OVS work? It's two daemons, primarily. We have the OVSDB, which is the configuration database, and the vswitchd, which does the forwarding decisions and the flow pipeline. There are some important commands that go along with it. ovs-vsctl is one of the most common ones — that's how you add ports, add bridges, and dump database information from the OVSDB. Another one that's important for debugging and diagnostics is ovs-appctl, and that allows you to send commands to specific OVS applications.
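A few of those everyday commands, just as a sketch (bridge and port names are examples):

    # Configuration side: talk to the database
    ovs-vsctl add-br br0
    ovs-vsctl add-port br0 veth0
    ovs-vsctl show                   # bridges, ports, interfaces
    ovs-vsctl list interface veth0   # the full record for one interface

    # Diagnostics side: talk to the running daemons
    ovs-appctl vlog/list             # current vswitchd logging levels
    ovs-appctl dpctl/dump-flows      # flows currently installed in the datapath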
So you can send ovs-appctl commands to the DB, and you can send ovs-appctl commands to the vswitchd — any of the daemons that are running will have their own set of commands, and ovs-appctl is how you access them. The OVSDB contains just configuration information: ports, bridges, interfaces, mirror information, that kind of stuff. It doesn't contain other kinds of data — it doesn't hold copies of packets, and it's not involved in the actual forwarding at all. It just says: this is the configuration. You can dump that information by using ovs-vsctl show, ovs-vsctl list, et cetera. Sometimes the DB can contain what some people refer to as stale information. What that means is someone has added some port configuration, say, for a port that doesn't exist. The DB does not enforce that you have a correct configuration. So, just like a configuration file where you can throw in whatever interfaces you want, the DB will allow you to put anything in it. So yeah, beware. The vswitchd is the other side of OVS, the forwarding side, and that will pull all the configuration out of the database. It will make sure that the running state of the system matches what's in the database, it will periodically clean up any flows that have been installed in any of the data paths, and it will make sure that new flows that are required are inserted. That's basically all it does — I mean, we'll get to some other minor things it does, but for the most part it's just making sure things match the configuration that's been requested. Okay, there are two important data paths that vswitchd cares about: netdev and netlink. Netlink is what we sometimes call the kernel data path. It's important to note that OVS runs on Windows as well as Linux and Mac and FreeBSD and whatever else, and some operating systems — notably Windows and Linux — have support for using this netlink data path. So the vswitchd in that case will generally prefer to use the netlink data path; we'll get to why in a second, but we usually call that the kernel data path. The netdev data path is all done in user space. That means packets come into the vswitchd, and the vswitchd processes them and pushes them out as well. So it's kind of simple what happens in a data path, right? Packet in, packet out. There are two paths: there's the fast path — in the kernel, or, as we'll get to, in netdev — and then there's the slow path, which is everything the fast path can't do. So when the fast path can move a packet, it does; when it can't, it falls back to the slow path. That's what we call an upcall — I actually like to think of it as a down call, but they think of it as going up to user space. When the packet doesn't match any rules in the kernel flow table, it gets pushed into user space, and then user space has to figure out what's going on. There's no netfilter processing. The OVS data path does nothing that you don't ask it to do — or rather, it only does what you ask it to do. So if you don't ask it to send the packet through something that handles netfilter — if you want iptables processing and you add some iptables rules selecting on packets that are in your OVS bridge — you'll notice that those rules don't do anything. That's because the packet comes in, is processed by the data path, and is pushed right out.
There's no chance for netfilter hooks to operate. You would need to somehow get it to the local host: push it to some kind of local interface that has those netfilter hooks — maybe a veth device, maybe a tun device, something like that. Otherwise OVS isn't gonna call those things, and it won't push things out to conntrack, for instance, without you telling it to. So really, OVS tries to do the simplest thing possible: packet in, packet out, and give you the building blocks to build what you want. So this picture just illustrates what I've been talking about. A packet comes in, and that packet is matched against the flow key table. If there's no key that matches that packet — meaning whatever metadata is associated with that packet, whatever stuff makes up a flow key, so for instance IP source and destination, Ethernet source and destination, what port it came in on, those kinds of things — if those aren't in the flow key table to match, then it will be sent down to vswitchd (or rather, they like to flip the picture and say it's sent up to vswitchd). The packet is processed by the vswitchd and then pushed out, and simultaneously a flow gets installed into the flow key table to match future packets that come in. Okay, the netdev data path is a little bit different, because there's no need for an upcall as such, right? And it can do some other things: it can take advantage of packet batching, if that's possible, and it also uses a whole bunch of caches. Maybe, if we have time, we can talk about some issues around the caches. This is an illustration of what happens: a batch of packets comes in and gets pulled off of a port, and they're run through what's called the EMC, or exact match cache. There's actually another cache called the SMC, but we'll just call that part of the EMC. The EMC is very small — you can see the cost, which I've tried to illustrate, getting a little higher each time you have to go to the next cache — but the idea is it's very fast. If the packets don't match in the EMC, they're pushed on to the data path classifier, and if they don't match in the data path classifier, they go through ofproto processing. In OVS — or rather in OpenFlow — everything is match-action. So fields like packet type, IP header information, all of that: those are what you can match on. You can also match on some metadata — what port it came in on, or what bridge we're using, that kind of information. And then the actions are all about what to do with the packet: jump to other tables, output to ports, push it over to whatever conntrack implementation, modify parts of the packet, drop the packet — those are all actions.
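Match-action in practice is just OpenFlow rules — a sketch with invented addresses and port numbers:

    # Match on header fields plus metadata (the ingress port), then act
    ovs-ofctl add-flow br0 "priority=100,in_port=1,ip,nw_dst=10.0.0.2,actions=output:2"
    ovs-ofctl add-flow br0 "priority=100,arp,actions=normal"
    ovs-ofctl add-flow br0 "priority=0,actions=drop"
    ovs-ofctl dump-flows br0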
All right, so when do things go wrong? Open vSwitch never takes action unless it's been told to, right? The netlink data path is simple: it just forwards packets — maybe it'll go out to conntrack, but that's it. Netdev is a bit more complex because it has those caches, and it has to be involved in pulling and pushing the packets, but really it's still just forwarding packets. And it's really software-defined networking. What that means is that most likely, when you have a problem with a packet moving — just like when you have a problem with a program running on your computer — you told OVS to do something that you didn't intend. So you told it to take some action, and it's taking that action, but the result is not what you expect. Usually it's not a fault of OVS: you've told it what to do, and it's carrying it out. Orchestrators do misconfigure things — we see this a lot. Things like adding ports and then forgetting to delete them because of race conditions internally. Or adding improper flow rules for the system, which forward packets all over the place and create loops. Bad port parameters: setting queues up incorrectly, setting priorities incorrectly, or binding queues to specific CPUs incorrectly. Failing to restore flows after OVS restarts — some orchestrators don't detect that OVS has crashed and come back up, and so then your system has no flows and it's not gonna process anymore. And failure to observe faults in OVS. So it's important to remember: upstream is always available to help. Everyone in the OVS community really does want the OVS software suite to work. So go to openvswitch.org — seriously, not joking, go. Sign up on the discuss and dev lists. Right now — people already have their laptops out, and you can do it on your phone too, it's pretty simple. I'm not kidding, it's good to do. There's a lot of good information there, and people are very responsive. So for the remaining part of the talk, I'll try to do some examples. It's always good to have a real test environment. I like to use network namespaces and veth devices. Veth devices actually work pretty well for both data path types, they're simple to set up, and it's simple to set up network namespaces. It's something like six, eight, ten, maybe eleven commands to set up two network namespaces connected through veth devices so that you can ping from one to the other, back and forth. And by default this will work — you can send packets back and forth. Another great environment, where you can actually work with a real orchestrator, is OpenShift's docker-in-docker cluster hack script. That's really cool, because it does set up Open vSwitch, it adds flows, it allows you to start pods on your local machine, and you can play around with it. I actually like that quite a bit. So, all right. A lot of times, problems that get reported can be solved by just looking at the logs. vswitchd logs a lot. It is configurable, but vswitchd definitely logs any errors, warnings, all of that. And if you're using the netdev data path with DPDK ports, all the DPDK log data is also in the ovs-vswitchd log. I don't know how many times we've gotten bugs reported where, in the log, it actually says this port is not available, for whatever reason. And people complain to us — oh, we don't know what's going on, why OVS isn't working, that's the thing they say — and in the log it actually tells you that the port failed to add, and it tells you why: the IOMMU is misconfigured, or something else. So you can actually see right there what went wrong and go fix it. A lot of people ignore this; it could have answered simple "why" questions. Really, the logs are quite good.
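Two quick things worth having in your back pocket here — the throwaway test environment he describes, and the log check; names, addresses and the RHEL log path are illustrative:

    # A tiny test environment: two namespaces plugged into one OVS bridge
    ovs-vsctl add-br br0
    for ns in left right; do
        ip netns add $ns
        ip link add ${ns}0 type veth peer name ${ns}1
        ip link set ${ns}1 netns $ns
        ip link set ${ns}0 up
        ovs-vsctl add-port br0 ${ns}0
    done
    ip netns exec left  ip addr add 10.0.0.1/24 dev left1
    ip netns exec right ip addr add 10.0.0.2/24 dev right1
    ip netns exec left  ip link set left1 up
    ip netns exec right ip link set right1 up
    ip netns exec left  ping -c 3 10.0.0.2

    # And when something doesn't work, look at the logs first
    grep -E 'ERR|WARN' /var/log/openvswitch/ovs-vswitchd.log | tail -20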
Sure, any time. The question: for people who don't know the logs, they can be fairly cryptic, or fairly specific — I don't know if this has been asked before; now that I'm halfway through my rambling, they'll give me a mic. Is there any thought on making that better, especially with the different kinds of netdevs — DPDK errors look different than regular ones using the standard DPIF? Yeah, so that's a good point. One thing that's nice, though, in defense of the logs: any time there's an error, you can just grep for that ERR or WARN string. But I know what you're saying, and I agree, sometimes it is difficult to understand the faults. I'll get to that — well, not on the next slide, but in a couple of slides there's some stuff I'll talk about. Yeah. Check your firmware, check your kernel — make sure the version numbers for the firmware are appropriate to the software you're using. We actually had instances where NICs were sending up multiple packets — duplicate packets — and it was being blamed on OVS, and the team hadn't upgraded their firmware in two or three years. It was mismatched with the driver: the driver thought it was programming something into the NIC, and instead it was telling the NIC to duplicate the packet and forward it up the queue. So it really is important to make sure that the configurations are set right. Sometimes some offloads do cause problems for certain network scenarios. I know Andy just did a talk, and he said, oh, people always just disable offloads — and here I am on stage advocating, yeah, just disable offloads — but sometimes they don't make sense. And sometimes you have hardware that does require additional work to get the kind of functionality that you want: there may be additional kernel module parameters, additional BIOS setup, other things that have to be done for that hardware to work optimally, or even at all. Okay, so, to go back to your logging question a little bit: a lot of times when a port is misconfigured, it just shows up in ovs-vsctl show. So if you do an ovs-vsctl show and there's a port error, it usually shows up right there. In this case these ports are set up correctly, but a lot of times — like if you add a port that doesn't exist — it will say that port's not found, right there, you can just see it. So there's no need to grep the logs in that case, although it will show up in the logs too. Now, someone might ask: well, you know the port's not there, can't you just write a clean-up script? It's actually a little bit difficult, right? You have to know what kind of port you're dealing with. For instance, vhost-user ports won't show up in the kernel — if you do a netlink query to get all the interfaces on the system, you won't see vhost-user ports, and you won't see DPDK ports. So you have to know which ports to whitelist; you might assume they're non-existent if you do a simple naive match, and you might remove a working config. So a clean-up script is really difficult. I like to say it's best for the orchestrator to clean up the ports it adds, because the orchestrator is supposed to know; OVS really can't.
And then, for the netdev data path — which really only applies to OpenStack deployments; I don't think OpenShift is using DPDK at all — DPDK ports do require extra configuration to get optimal performance, or sometimes to get any performance at all. So you need to check your hardware topology: make sure your NUMA nodes and the hardware are correctly matched, and that the VMs are spawned on the right NUMA node to get optimal performance. Are your kernel parameters, or tuned parameters, set up correctly? Did you use isolcpus, did you turn off the RCU callback processing, did you allocate enough huge pages? Are your VMs on the right node, and is the configuration for accessing those huge pages right? There's a lot of additional stuff on top of Open vSwitch for that to work. And finally, you should know, when you're debugging this stuff, what your network topology is supposed to be. A lot of projects actually set up their network topologies differently: OpenShift wants to configure OVS differently than OpenStack, and probably differently than RHV, and probably differently than some other project that's using SDN and controlling Open vSwitch. So I like to say all bridges are not created equal. A lot of times developers make assumptions about how packets should flow when they add an OVS bridge, but if you read that blog — which I wrote, so it's a plug for me — it actually goes over how, in the OVS kernel data path, a bridge is really a fiction on top of a bunch of flow rules. It doesn't exist as a thing in the way of that packet; it's not even a bump in the wire or something. For OpenShift and OpenStack, you can actually read how they like to set up their networks at these two URLs. There's a lot of good information there: you'll find out what br-int, br-ex, all those different bridges do, and for OpenShift it's radically different. Does your system — whether you're using OpenStack or OpenShift, this applies — use the kernel IP stack, the kernel networking stack, in addition to Open vSwitch? For OpenShift it does: they have a tun device and they forward packets through that to provide iptables hooks. So OVS doesn't directly use the routing table — I mean, it can, and in some cases it will, but generally speaking it doesn't. OVS doesn't use netfilter; it uses conntrack, and only if you've told it to. It's integrated with the kernel, but it doesn't use those parts of the kernel you haven't asked it to use. Question? How would you map these rules that you've set up — is there a way for the user to see that? The rules are a different thing. If you're asking about topology, I would say use plotnetcfg, or I think there's actually another tool called Skydive, and both of those will actually detect what ports you're using and give you a graph — a plot that shows how the interfaces are interconnected. It won't show you the flow rules, though — maybe Skydive will — but I will get to how to debug those flow rules. Okay. And then: do the old things like BGP and all that stuff still exist in this world, or is that just a different world, since you're not using routing tables? So OVS operates at kind of a lower level, right? It just moves packets based on matching these fields, from one place to another. So all that routing decision stuff — BGP, OSPF, all that — is done at a higher layer.
Okay, we can follow up. So sometimes, when the setup is wrong, you can actually see how it was made wrong by using the ovsdb-tool. If you do an ovsdb-tool show-log and point it at the database, it will give you a rundown of the transactions that happened and which process executed those transactions. So it's quite helpful if something got set up incorrectly. You can also grab some stats — this is for the netdev data path — so you can see running statistics for how the forwarding engines are working. The kernel has other ways, and you can pull interface statistics when the port is a non-DPDK port — if it's a kernel port, you can pull those interface statistics using your standard ip, ifconfig and ethtool. So sometimes a packet goes out an interface and you have no idea why, right? So we do something like dump-flows. In this case it's really simple — okay, there's one flow, the action is NORMAL, so it's behaving kind of like a switch. And a lot of times, if your flow rules are small, you can just watch which flow's n_packets counters are increasing while you push packets through. That works great if you have a static setup with no other data going through; it doesn't work well on heavily loaded systems, and a lot of times you'll be reading through reams of flows. You can do something crazy like I've done before, which is dump the flows and use diff to try to compare them — but that's the C-and-kernel programmer in me coming out; people don't like to do that. I like to equate it to finding the Higgs boson: a whole bunch of stuff is blasted through this, and you're just sifting through all this data to figure out what's going on. And what complicates it, or makes it worse, is that the flows as they look in the kernel data path are completely different from what the OpenFlow rules look like. Because, again, as I said, the kernel data path is just a flow key match — just these specific things match, that's all it does, there's no processing — whereas on the user space side it evaluates the rules. So it's a bit more complicated. But maybe there's a better way. So we'll take a quick detour: what's an SDN system? It's programmable. It has instructions, a pipeline — it's like a processing chip, but specific to packets. And that means we do have some cool debugging tools. The one that I would reach for, to answer your question about tracing these flows, is ofproto/trace. You give it a description of a packet, or you can give it an actual packet dump, and it will show you how it evaluated those rules. So, from the example — I made a change to the flow rules from that demo example I showed — you can see here an ARP ping works, but an ICMP ping does not. If I use ofproto/trace and just say, okay, show me ARP, it actually shows that it matched a rule — ARP, in port one, at that priority, and the action is output to two. But if we trace ICMP, we see that no rules matched. So clearly, somewhere in my flow rules, I have accounted for ARP, I might have accounted for TCP, I might have even accounted for UDP or SCTP, but I forgot ICMP. So we can go through and debug. How much time? Okay.
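What that trace looks like on the command line — a sketch with made-up ports and addresses:

    # The OpenFlow table covers ARP but nothing else of interest
    ovs-ofctl dump-flows br0

    # Trace an ARP packet arriving on port 1: it matches, action is output:2
    ovs-appctl ofproto/trace br0 in_port=1,arp

    # Trace an ICMP packet: nothing matches, so it gets dropped
    ovs-appctl ofproto/trace br0 in_port=1,icmp,nw_src=10.0.0.1,nw_dst=10.0.0.2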
So yeah, as far as getting packet data goes — sometimes that's what people reach for, so you could just reach for tcpdump. tcpdump works great if you have a kernel interface; it doesn't work at all for vhost-user, and it doesn't work for DPDK ports. But OVS includes ovs-tcpdump, which sets up a mirror, and that works internally in OVS for all kinds of ports — kernel ports, vhost-user ports, all of that. And then it has this other cool gadget called ovs-tcpundump. Remember I said ofproto/trace can take packet bytes? You can actually pipe tcpdump output into ovs-tcpundump, you will get those bytes out, and you can then feed those to ofproto/trace. So, in conclusion — sorry for concluding so quickly — OVS debugging really shouldn't feel daunting. There's a ton of documentation; I know I pushed a lot of URLs up there, but there's a ton of stuff on the web to read. The OVS documentation is really top-notch: you can go to openvswitch.org — you should already be there from signing up for the mailing lists — and just click over and read through some of the docs. OVS is almost always doing exactly what it's asked to do. It's software, so sometimes it has bugs, but usually what you're seeing is not a bug in OVS, it's a bug in what you've programmed into OVS. Finally, those are some of my email addresses; I'll see you all on the mailing lists. Questions? Could you go back two slides for me? Okay. So with ovs-tcpundump, you can actually take the byte stream from that and pass it to the command from two or three slides previous, and it'll show you — and in that particular example you had a really nice pretty-print, like in port this, version, type ICMP — I think it should. This guy, yeah. So you can just pass it a raw stream? Yes, you can pass it the stream of bytes — I forget the exact syntax, but it does actually take it. That's cool, that's really cool. I could've tried to cook up the exact pipeline to make it happen, but I'm not that cool. Yeah, just a simple question on the logging thing: on systemd systems, do you log to the journal by default? What was that? On systemd systems, do you log to the journal, as well as or instead of the log file? Yeah — 'cause that's kind of where I look for logs. Yeah, so it's true, right now we aren't logging to the systemd journal. That's probably a good enhancement to make, because I think a lot of tools do make use of the journal now. So, yeah, propose it on the mailing list, maybe. So you said that when a packet goes up the slow path, the result of that forwarding creates an update on the... on the flow table? Yeah, yes. Is that from the get-go, or is there a configuration where all the forwarding is done ahead of time — is it always learned? Yeah, it's always learned. So if I go back to this post here — see this blog post: if you actually go there, it talks about programming the kernel, the Open vSwitch module in the kernel. It does not contain flows by default. By default the packet comes in and it doesn't match anything, and then it's pushed to an upcall and processed, and then that table is updated. I think I'm out of time. No, good talk, thanks. So to answer your question — what Aaron mentioned, I think he was rushing through his slides — at a high level there are, and I'm gonna do a shameless plug here, two other utilities.
plotnetcfg, which gives a static X-ray of the networking setup within the server, and another project called Skydive, which is exactly what Adam mentioned, but I just wanted to mention it again. Yep, you can get a map; we can talk offline too. But there are other utilities above and beyond this. This talk was at a lower level, but there are higher-level maps, sort of network operating system stuff, available as well. Thank you, Aaron. I just wanted to highlight that there will be a party tonight at seven p.m. at Ziskin Lounge. Please do collect your tickets at the registration desk if you haven't already, and if you decide not to attend, please give your ticket back, because we only have 200 seats. We do hope to see you all there. Thank you.

Hi, everyone. We now have Thomas Cameron presenting on OpenShift for operators. Hello, everybody. My name is Thomas Cameron. I am a senior principal cloud engineer at Red Hat, and I'm going to cover about four days' worth of training in 35 minutes, so buckle up, buttercup. My contact info is up there: I'm thomas@redhat.com, yes, I've been there that long, and you can follow me on Twitter at thomasdcameron. We're going to cover quite a bit of stuff. I'm going to talk about setting up the machines for OpenShift. I'm doing this in an RHV environment; I'm assuming you're in an enterprise environment that has something like RHV or VMware and is using Satellite. But everything I talk about here you can do manually; I'm just showing you how to do it a little more easily in an enterprise environment. So we'll talk about Satellite and RHV, creating a template, the build, some add-on software, storing the template, using the template, installing packages, and using Ansible playbooks, and then we'll get into the configuration and installation. Very briefly, when you're setting up the environment to install your OpenShift nodes, I set up content views inside Satellite. And this kind of got me: be aware that if you're using Satellite, when you're syncing the channels for the Ansible components required for this, they don't appear under the operating system branch. You actually have to go over to the other tab, scroll down, and enable Ansible Engine 2.4, and it needs to be 2.4 per the installation docs for OpenShift 3.10 Enterprise. So the way this looks is you create the content view and make sure you've got all the components in it. I did one content view for the operating system, all the bells and whistles for the OS, and then another OpenShift content view where I added Ansible Engine, Fast Datapath, and the OpenShift Container Platform channels.
And then I created a composite content view with both of those, so I had all of that content available to the machines I'm installing. I created an activation key so that when I register the systems to Satellite, everything just works, and I added the repositories that were part of that, so it's super easy. Again, I don't expect you to pick all of this up because we're moving quickly; I'm just talking about the software channels you need to make available in an enterprise environment to make this work. This is what it winds up looking like when I set up my repository sets. Notice that I had to manually override almost all of them: when you add those repositories to your activation key, they're not turned on by default, so I overrode that in the activation key, and that way when you register the systems they have access to all the software repositories you need. From an RHV perspective, I built the OS, set the optimization, set the name. Again, I'm going to move through this quickly, but I created a template first. Because I'm using a virtualized environment, I don't want to kickstart a whole bunch of machines, so I just create the template, and this is what that looks like: I build it, I set the operating system, I set the optimization for server, and give it a name. I also set up the memory amount, or actually, let me go back one. The other thing I did was set up two disks for storage. That was a requirement from a previous version of OpenShift, so you can ignore that second disk if you want; I did it for 3.7 and then realized when I was doing 3.10, oh, they take care of storage for us now, so I don't have to do that anymore. All right, so set memory: I did eight gigs, but you can change that once you're building from the template. I built the machine. I didn't partition that storage drive, because again, we take care of that now. Make sure you install the katello-ca-consumer RPM, which you install off the Satellite server, so the system can be registered to Satellite. Then I installed the OS and common packages: I registered the machine to the Satellite server, made sure it registered correctly, installed the software needed to interact with Satellite so I can install all my packages, and I also installed the RHV guest agent packages so the systems show up properly under RHV; rhevm-guest-agent-common installs that. Install the packages recommended in the installation docs, so wget, git, net-tools, and so on; you have to install all of those. Now, I'm lazy, I cheat, and I do a yum group install of base so I have ifconfig and all the old-school Unix stuff, because I'm old. So that gets installed, then I updated the machine and rebooted as per the instructions, so yum -y update and then reboot. And then I installed openshift-ansible. This is a change from previous versions; it used to be the atomic installer. openshift-ansible drags in a bunch of other packages, and then you need to install Docker as well, so just yum -y install docker, and you've got those packages installed.
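To recap those package steps as commands, a minimal sketch; the Satellite hostname, organization, and activation key are placeholders, and the exact package list is the kind of thing the 3.10 install docs spell out:

    # register the VM to Satellite (hostname/org/key are made up)
    rpm -Uvh http://satellite.example.com/pub/katello-ca-consumer-latest.noarch.rpm
    subscription-manager register --org="Example_Org" --activationkey="openshift-3-10"

    # docs-recommended utilities, plus the old-school base group
    yum -y install wget git net-tools bind-utils yum-utils bridge-utils bash-completion
    yum -y groupinstall base

    # guest agent so the VMs show up nicely in RHV
    yum -y install rhevm-guest-agent-common

    # update, reboot, then pull in the installer and the container runtime
    yum -y update && reboot
    yum -y install openshift-ansible docker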
Then what I do is remove all the unique host information for this VM: MAC address, network UUID, SSH server keys, and so on. I just edited the ifcfg file and took out the UUID, because that's unique and this is going to be a template, and I also deleted all the SSH host keys. Then I unregistered the machine from the Satellite server and shut it down, so subscription-manager unregister and then boop, down it goes. Create a template from that. I'm not going to go into the details too deeply because it depends on your virtualized environment, but I just created the template. In this case I did qcow2 so it could be spun up quickly; in a production environment, use raw. qcow2 is copy-on-write, so your initial writes to disk are going to be slow, and that's going to make your OpenShift environment slow. So I made the template; you can see it's locked while it's being created, then it goes from locked to available and we're in good shape. You can then delete the original VM if you want, or delete it from the Satellite server and from Red Hat Virtualization as well, because we don't need the original, we just need the template. So I go in there, delete it, and it's good to go. Then I create the new VMs. Again, I'll go through this quickly. The one thing you do want to do is give the master node 16 gigs. Remember we created the template with eight, so I just customize it a little: give it the name, give it the memory, change it from eight gigs to 16, and the machine is locked until it gets created. Then lather, rinse, and repeat for the other nodes, so boom, you create ose2 through ose5; I'm doing a five-node cluster. So now we've got the machines installed, and now we're going to get them up and running. Here's kind of a gotcha: I like to use DHCP for all my network stuff, so I had to go get the MAC addresses from these machines and add them into my dhcpd.conf file, and then you have to create DNS. I'll talk more about DNS in a little while, but if you've installed OpenShift, you know you've got to have a zone that's specific to your OpenShift environment, a child zone. My zone in my home office is tc.redhat.com; I created the OpenShift zone, not really a subdomain, but I gave it names, and I'll talk more about the cloudapps zone in a minute, that's where your applications are going to live. So reboot them, they come back up with the right IP addresses, we're in good shape, and the host names resolve correctly, where before they were just host one, host two, host three. You have to set up passwordless SSH so the machines can log in to each other; it's just ssh-keygen and then copy the keys to the other machines (see the sketch below). Like I said, in DNS you need to set up the DNS wildcard for the subdomain; in this case I did cloudapps.openshift.tc.redhat.com, and that's where all the apps are going to live. That's what that looks like: I've got the hosts that are going to be in the cluster, and the applications I set up will live in the cloudapps.openshift.tc.redhat.com zone. And they all resolve back. This is a gotcha.
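A rough sketch of the template clean-up and the key distribution mentioned above; the interface name, hostnames, and domain are assumptions for illustration:

    # strip per-VM identity before sealing the template
    sed -i '/^UUID=/d;/^HWADDR=/d' /etc/sysconfig/network-scripts/ifcfg-eth0
    rm -f /etc/ssh/ssh_host_*
    subscription-manager unregister
    poweroff

    # later, from the master: passwordless SSH to every node
    ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
    for h in ose1 ose2 ose3 ose4 ose5; do
        ssh-copy-id root@${h}.openshift.tc.redhat.com
    done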
When you do this, make sure that the host you're pointing at with that wildcard is one of the nodes that's actually going to be serving up content. The first time I did this I didn't read the docs right and thought, oh, it needs to go to the management node. That is not correct; you want it pointing at one of the worker nodes. You want to make sure DNS is working in both forward and reverse. This will bite you, I promise, so make sure DNS works forward and reverse. And you want to check the wildcard: I looked up hostfoo and got that .48 address, then hostbar and got the same address, so anything you look up in that zone should come back with that address. Firewalling: I'm going to skip over this because it's old information; the new installer for 3.10 just works, it gets all the firewall rules set up, so I'll blast past it. Let's see. Yeah, there are a few steps involved. Now, I'm lazy: because I was doing this in a lab environment, I just opened up all the ports for my local subnet, from 1 to 65534. Don't do that, that was me being lazy. Hey, what can I say? Then you re-register these hosts to Satellite, again using the activation key from earlier, boom, boom, boom, a for loop over the hosts, or use an Ansible playbook for it. They're all subscribed, they show up correctly, the host names are correct, everything's good there. Install nfs-utils across all the machines, so that if you're using NFS-backed storage you can actually access it. Here's where we get into the intricacy of the installation: the Ansible hosts file. There's an example at access.redhat.com under the OpenShift documentation. I will not lie, this was a little challenging to get set up correctly, because it's a generic example and some things in it frankly don't work right. You literally copy and paste one of the examples; I used the one with a single master and single etcd, both running on the same machine, and multiple nodes, and I copied it to /etc/ansible/hosts on my local machine. You can put it in another location, but that's the default. Verify that the deployment type is OpenShift Enterprise if you're doing Enterprise; if you're doing Origin, set it to Origin. Then you've got to define the etcd server, the master, the workers, and the infrastructure nodes. So you define what your master node is, what the etcd node is, and then all the worker nodes; there's a sketch of what an inventory like that can look like a little further below. In previous versions of OpenShift the master node was not schedulable; we couldn't run jobs on it. Be aware that in newer versions, 3.9 and 3.10 I think, it is schedulable, so your master is no longer kind of a wasted node just doing management stuff; it can actually serve out content. Now, if you don't uncomment the identity provider line, the default behavior is that no one can log in on the web UI. We did that intentionally: we want you to make a conscious decision about what type of authentication you're going to use. I just uncommented the line so it uses htpasswd; you can integrate with all kinds of authentication backends, just be aware. The oreg_url line: comment that out, because it points at a bogus location.
If you comment it out, it just uses the default and goes and grabs content from us. So comment that line out. That one killed me; I fought with it for like an hour trying to figure out why my OpenShift node wasn't able to come up, so just be aware of that. Then you run the playbook. I actually typed ansible-playbook -i, but since I put the inventory in the default location, /etc/ansible/hosts, I didn't really have to; that -i syntax is important if you put your hosts file, the config file that defines how the hosts are going to be set up, somewhere else. You run the prerequisites.yml playbook first; again, this is all in the documentation. Let that run until it completes successfully, and then you run the actual deploy_cluster.yml playbook. These take a while, and they take longer the more nodes you have in your cluster. Mine was a five-node cluster running on some big honkin' ProLiant machines with a ton of memory, very fast CPUs, and super fast storage, and it still took like half an hour. So it takes a while. It'll run through, and hopefully, if the OpenShift gods are smiling, you'll get the green okay at the end. It took me a few tries because I had to figure out some of the syntax changes in that hosts file. But when it's done, you should be able to set up authentication and log into the console. So verify in /etc/origin/master/master-config.yaml that the identity providers section is set up for htpasswd auth, and figure out where the file is that you're going to use for authentication; /etc/origin/master/htpasswd is the default location based on that example inventory. Now you can create the user. You have to have the httpd-tools package installed for the htpasswd command. You run htpasswd -c to create a new user, point it at the htpasswd file, and the username is tcameron; it prompts you for the password. Once you're done, you can take a look at that htpasswd file and you'll see the password in there in an obfuscated format. So now you can log in with the web UI and test that the system's up and running. You connect to the machine at https:// your URL on port 8443, you get the standard warning that says, hey, this is not private, so be aware of that; you accept it, get logged in, and use the username and password you created earlier on the command line, tcameron and the password, and now you've got the web UI. You can start creating applications, so life is good there; the system came up the way we expected. You also want to test from the command line that you can run commands logged in on the console. You can do this if you have the oc commands installed on a workstation at your desk or something like that; oc login will create your config file, so oc login and then the URL if you want. In this case I was on the master node, so I just did oc login -u tcameron, it asked for my password, and it said, you don't have any projects. It's a simple environment. You can create a new project or a new application from the command line. But you can look at the config file and see it's got all kinds of information in it.
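Putting the inventory and install steps above into one hedged sketch: the hostnames, node group assignments, subdomain, and username are illustrative, and the real starting point is the example inventory in the OpenShift 3.10 docs the speaker mentions.

    # /etc/ansible/hosts -- single master + etcd on one box, plus infra/compute nodes
    [OSEv3:children]
    masters
    etcd
    nodes

    [OSEv3:vars]
    ansible_ssh_user=root
    openshift_deployment_type=openshift-enterprise
    openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
    openshift_master_default_subdomain=cloudapps.openshift.tc.redhat.com
    # leave oreg_url commented out so the default Red Hat registry is used

    [masters]
    ose1.openshift.tc.redhat.com

    [etcd]
    ose1.openshift.tc.redhat.com

    [nodes]
    ose1.openshift.tc.redhat.com openshift_node_group_name='node-config-master'
    ose2.openshift.tc.redhat.com openshift_node_group_name='node-config-infra'
    ose3.openshift.tc.redhat.com openshift_node_group_name='node-config-compute'
    ose4.openshift.tc.redhat.com openshift_node_group_name='node-config-compute'
    ose5.openshift.tc.redhat.com openshift_node_group_name='node-config-compute'

And then, as commands on the master:

    # prerequisites first, then the cluster deploy (paths from the openshift-ansible RPM)
    ansible-playbook -i /etc/ansible/hosts /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml
    ansible-playbook -i /etc/ansible/hosts /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml

    # first web UI user; the htpasswd command comes from httpd-tools
    yum -y install httpd-tools
    htpasswd -c /etc/origin/master/htpasswd tcameron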
It's got authentication information and stuff like that. You also want to log in as admin. This is kind of important, because until you log in from the command line as admin, that account doesn't get created on the back end, and I'll show you what that looks like in a little while. This only works from a node, so on the master I did oc login -u system:admin. I'm using a system role as administrator, and it creates my user account and gives me access. You'll notice that once I'm logged in as administrator, I can now see all of the projects, which are all the services that help manage the environment: the Kubernetes services that are running, logging, the OpenShift service itself, the Ansible service broker, infrastructure and node services, and so on. These are all containers running in the environment that manage all the services handled by the OpenShift cluster. Now I can do something like oc status once I'm logged in as administrator, and that gives me a cluster-wide status; you'll notice it points over to the management node, ose1.openshift, blah, blah, blah, and it tells you about all of the services that are running, and I mean it's page after page after page. You get a lot of good information about what's running in the environment. I can do oc get nodes just to see what the nodes in the environment are doing. This is actually really helpful: if you see a status of NotReady, you can start digging into logging, look at oc status on the node, and try to figure out what's going on. And I actually did not even catch that one wasn't ready; I've got to go look at that. I literally finished these slides like ten minutes ago, so, you know, in fine Red Hat form, right? It's a new version of the software; I got up at four o'clock this morning running through the labs trying to make sure this stuff is all correct. Then you can do oc get pods to see what pods are running in your environment and what they're doing, so you've got the registry, the console, the router services; you can see what their status is, whether they're working, whether they've had to restart, anything like that. And for a lot more information than you probably ever wanted, you can do oc describe all and pipe that to less or something like that, and I mean it is page after page after page. As an operator this is really handy, because you can dig down and look at all the service descriptions, see if something isn't ready, and page through everything that's going on on pretty much every node in your environment. So now that it's all up and running and you've logged in and tested that your connectivity is there, you can actually create an application. Now, I'm an operator; I come from a long background. I'm dating myself: I was a Novell sysadmin back in the early 90s. Okay, there's somebody out here as old as me. I went to work for Microsoft back in '94 because that was kind of the new kid on the block, so I was an MCSE after that. I've been doing this for a long time. But my point here is I went through this whole career of administration and operations, and I am not a developer.
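A quick recap of the oc commands mentioned above, as you would type them on the master; the username is the one from the htpasswd step, and everything else is stock oc:

    oc login -u tcameron              # ordinary user, prompts for the htpasswd password
    oc login -u system:admin          # cluster admin; only works locally on a master
    oc status                         # cluster-wide status once you're admin
    oc get nodes                      # are all the nodes Ready?
    oc get pods --all-namespaces      # registry, router, console, and friends
    oc describe all | less            # far more detail than you probably wanted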
And so when I was doing this presentation or when I was submitting presentations, I was like, I want something for OpenShift that's targeted at me and people like me. So I'm going to talk about building the applications, but honestly, man, me building an application is silly because I'm not, you don't want me writing code. Like, it's just, it's not good, not at all. But here's what the UI looks like. Once you get logged in, you do get a UI. And this is one of the things that I love about OpenShift is that it has got from the factory sort of, we've got a ton of options for, you know, platforms, for languages, for application services and so on. So you can go in and you can drill down and start building applications. Like in this case, I decided, okay, I can do Apache, right? I can do a simple Apache server. That's not a problem. So I go in and I define what I want my Apache server to look like. I give it a name, you know, my first Apache project. I don't really need to fill a whole lot of the other stuff out because it will know to go and grab, depending on what you selected, we pre-populate, like where it's going to go grab the content for the container for Apache. So the git URL is in there. Now, here's something that I have run into in the past. I will verify the application host name. And in a lot of cases, like whatever I put up above, TCA Apache or whatever, make sure that you put the domain name for your environment. If you let it auto-populate, I've seen cases in the past where it'll auto-populate with some name that actually doesn't resolve in DNS. I don't know exactly why that's like that. But, so verify that your application host name has, it is resolvable, or at least that it comes back, you know, that wild card to cloud apps. Remember that I talked about earlier that we set up in DNS? So some name that shows up in the cloudapps.openshift.tz.redhead.com or whatever your domain name is. And so you click on create and it takes a few minutes and you can click on continue to the project overview and then you can go and watch the process where it's downloading the software from GitHub and so on and so forth. So here's what that looks like. You can watch the builds and if you expand this out, you can actually see, you know, what the URL is that it created. It'll tell you that the build is pending. The Apache web server is pending. And as you watch it for a while and the screen refreshes, you should actually see, you know, you can even see where it's going and grabbing stuff out of GitHub. So you can watch the progress there. It's pretty cool. And from an operations perspective, that's actually really helpful because you will get, you'll see where if it can't connect or something like that, you should see that information as well. So now once it's all completed and you transition from, see how over here the pod is kind of grayed out. Once your application is up, once your container is up, that pod will then turn solid blue. And now you know that the system is, or your container is up and running. So you can look at the URL right there and open it up in a new web browser. And there's your application. At this point, you would grab the content, you would clone it, you could make changes, push it out there and put your application into production. Not me though, because I'm a terrible developer. So, and at this point, you've done, you've jumped through all the hoops, you've got all this stuff set up, and you can start allowing developers to access the containers. 
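For what it's worth, roughly the command-line equivalent of that web UI flow; the project name, app name, and git URL here are made-up examples, not from the talk:

    oc new-project my-first-apache-project
    # source-to-image build on top of the httpd image stream
    oc new-app httpd~https://github.com/sclorg/httpd-ex.git --name=tca-apache
    # give it a route under the wildcard DNS zone set up earlier
    oc expose svc/tca-apache --hostname=tca-apache.cloudapps.openshift.tc.redhat.com
    # watch the build, like the web UI progress view
    oc logs -f bc/tca-apache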
As I said, we've got 35 minutes, and oh man, I actually finished way early. I wanted to move really quickly because these are short sessions, and the key points I want to make are these: when you're setting this stuff up in an enterprise environment, you really want to make your life as operators as easy as possible. Because you're usually going to be using enterprise virtualized environments, not a KVM instance on your laptop, it's important, and you'll get this slide deck later on, but it is really important that you pay attention to things like your templating and your software distribution through Satellite or through RHN or whatever; make sure that stuff is nailed down up front. As I've gone through all the different iterations of OpenShift, we do change things from release to release. Like I said, I was up in the speakers' room ten minutes before this cursing because I couldn't get some stuff to work. It turned out to be silly DNS issues, mostly on me, not on the software, but the big thing I want you to take away from this is: if you get your fundamentals nailed down from an operations perspective, make sure you've got a good template, make sure you've got good access, and make sure you update your systems so that the environments you build out are secure, then as an operator your life is going to be so much easier. Just out of curiosity, since I did finish a lot earlier than I thought: how many folks work in enterprise environments and are dealing with OpenShift? Okay. And raise your hand if you're using virtualized environments like VMware or RHV or something like that. Okay, cool. And raise your hand if you're using enterprise Linux versus a respin or something like that. Okay, cool. All right, excellent. So really that's it. That's a ton of information in a ridiculously short amount of time, but I think what I'll do now is just open it up to any questions. That was faster than I intended, sorry. Yes, sir. So my question goes beyond what you presented, into the developer's use of OpenShift. I know with OpenShift it's sort of a PaaS experience: you push your code directly to OpenShift at its git endpoint. The way we're used to building software is we build it, we generate an artifact, and then we promote that artifact across environments. How do I create a similar workflow with OpenShift? That's a great question for a developer. Now, in all seriousness, I don't know if you saw, when I was going through the screen that had all the pre-built application templates, we actually have the ability to set up an entire Jenkins infrastructure so you can do a workflow. You can point it at an existing Git repository and kick off automated build processes, or you can point it at upstream GitHub and start build processes that way.
So, the whole concept of having that sort of workflow, that pipeline, the development pipeline, we've built that into OpenShift with an expectation that yes, you're going to spin up Jenkins or whatever your favorite CICD environment is, you can set that stuff up and then your developers who are going to know better like what your Dev and QA and UAT and Pro, you know, whatever that cycle looks like. But yeah, you can absolutely do exactly what you described either behind the firewall on a private Git repository or out on GitHub or whatever. Does that answer your question? I answered a developer question. Are you proud of me? Actually, it's really more operations, but you know. What else? Yes, ma'am. What exactly is the licensing on OpenShift? It's a little confusing when I look at it. It says something about the Apache license. Yep, yep. It is just like every other product that Red Hat releases. It is an open source license. We don't sell licenses for software. We sell subscriptions. And those subscriptions cover open source licenses from the Apache Software Foundation license to GPL, you know, various versions of GPL and so on. So when you purchase a subscription for OpenShift, you're not paying a license for the software. You're paying for the support. You're paying for access to the documentation. You're paying for the updates, the bug fixing, all the engineering that we do on the backend. So you can absolutely go, and here's the other cool thing. You can go and download the upstream OpenShift and you can run it in your environment. You just, and it's pretty similar. It's almost identical to what we've got depending on how far ahead the community is versus our commercial product. But you don't get support. You don't get consulting services. You know, it's basically what we're doing is we're wrapping up support, hardening certification with third parties, et cetera, et cetera, et cetera. And that's what you're paying for when you buy the subscription, not the software itself. You can also do that. So you would, if you'd upload it and put it on your own server, you would get the community updates if you wanted. Exactly. How often does Red Hat incorporate community updates? Who are we at, about a six month release cycle right now, Dan? About six months? Three months, three months. Oh gosh, okay, yeah. Which is why I can't keep up. So yeah, so we will take from upstream and what we do is we're behind upstream because what we do is we'll take the upstream release, we'll code freeze it at a certain point in time, apply a bunch of bug fixes to it, certify it with third parties, generate documentation in multiple languages, get our consultants trained, you know, blah, blah, blah, blah, blah. There's a ton of stuff around it. Yeah, and so yeah. I mean, the upstream is actually awesome, but it's kind of wild west. And I will be the first one to admit, like I've been doing this since 1993. I'm not an idiot, I think. And man, you know, there are times where I struggle and it's silly stuff. You know, I messed up my DNS config, you know, and it blew up and I'm like pulling my hair out, trying to figure stuff out. But OpenShift is not a trivial product. There are a lot of concepts you need to understand. You know, we talked about everything from like storage back ends. We talked about, you know, doing updates. We talked about how to build your cluster, what the various roles are and so on. I mean, there's a lot involved in setting it up. 
So with upstream, man, it's a lot of fun, but it can be challenging. Any other questions? I know I'm the last person between you and lunch; everyone's like, no, shut up. All right, was this helpful from an operations standpoint? Good, because everything else here seems to be developer focused and I feel like the last man standing. Hey guys, thank you so much for coming. On behalf of Red Hat and on behalf of DevConf, we appreciate you being here. You guys have a great day. Thank you, Thomas. Hang on, a couple of announcements. Today there will be a party at 7 p.m. We would like you all to come, but please ensure you have your tickets; they're at the registration counter, and if you don't want to attend, please do return yours. The second announcement is that there will be a keynote presented by Chris Wright and Saren tomorrow at 9:30 at Metcalfe Large. Please do come. Thank you.

So, this talk was originally announced as being about Fedora Atomic Workstation. Since then things have changed and got renamed, and now it's called Life and Death on Fedora Silverblue. So I decided I'd spend the first few minutes of this talk talking a little bit about the changes that led to that and why it's Silverblue now, and then I'll switch over to give some impressions of how daily life on such a system actually is. So let's dive right in here. This works; other direction. The first section is going to be about the renaming and rebranding, and these are the highlights for that section. If you take away just these three points, then you can sit back and wait for the entertaining part, the second half. But I want to talk a little bit about the fact that Silverblue really is just Fedora. I'm stressing that because when we first started talking about Silverblue, there was some concern that, oh, we're trying to do a new distro outside Fedora, it's going to be a competitor, and why are we doing that? That is not the case. Silverblue really is just Fedora. It's the continuation of something that used to be called Fedora Atomic Workstation, which has existed since roughly Fedora 25, and it's built from Fedora RPMs just like any other Fedora edition. It's just that we compose those RPMs on the server side and deliver the content as an OSTree image, an OSTree repository, similar to what Atomic Host does, and we install applications via Flatpak. But why did we decide we needed to rename it? I don't know about you, but to me Workstation sounds a little old fashioned; when I hear the name Workstation, a computer comes to mind that looks maybe like a SPARCstation from the 90s. That's not necessarily the impression we want to give people, so we wanted a name that sounds fresh and interesting. And the other half of the name, Atomic, also is not so cool for marketing purposes. Those of us in the know, tech-savvy users, maybe know where it comes from and find it okay, but for the rest of the population the closest thing that comes to mind is probably nuclear waste, and since we want to reach out beyond this audience here in this room, we decided it would be better to avoid that. So yes, we decided we needed not just a new name but also a new logo, a new website, and new communication channels. The logo we came up with is the thing you can see here; if you squint a little you can maybe see it could be a leaf. The original name we were going for was Silverleaf.
It didn't work out for some reason, so we had to go with Silverblue in the end, but the logo still kind of has this leafy feeling to it, which I think is nice. You can maybe think of it as a new leaf on the Fedora tree or something like that. Down here you can see the link to the website we set up, we have a Twitter handle by the same name, and as I said, we also wanted to try something new for communication: instead of a mailing list, which may be the most comfortable for us old farts, we decided to set up a web forum using Discourse, and the link to that is down there. What I should also say is that when we started setting this up, we did things pretty quickly, so we got our own domain, teamsilverblue.org, and we're now in the process of moving this over to Fedora domains, so some of the URLs you see here might change over time; for instance, the Discourse server has already moved over to the Fedora side, because we want to use Discourse a little more widely in the Fedora context in the future. Right, so Fedora Atomic Workstation, as I said, has existed for a while, since Fedora 25, and it was kind of undercover. Not many people knew about it; a few of us were trying it on and off, and it lived off to the side under the umbrella of Project Atomic, next to Atomic Host and a bunch of the other new container tools we're hosting there. When we decided this year, at DevConf in February, that we were going to try and push Atomic Workstation and make it something more public, more visible, and ready for prime time for a wider audience, we did not know yet that Red Hat was going to acquire CoreOS. That was the surprise we all learned of on the flight home from DevConf. It has taken a while, but by now it's become clear that Atomic Host is going to be merged with the Container Linux we acquired that way and become Fedora CoreOS. And Atomic Workstation is becoming Silverblue, as I already said. The other projects currently hosted under Project Atomic are going to find a new home, a new place for container tools, and Project Atomic is basically going away in the medium term. So in a way it was very lucky for us that we had already embarked on this rebranding effort, just in time to be surprised by this. Right, so much for names and brands; now I'll switch to the actual topic of my talk, and I want to give a little bit of an impression of how daily life is on an Atomic Workstation, or Silverblue, system. I reinstalled my system after DevConf and forced myself to use it. I had played with Atomic Workstation before, like in a VM: you install it off to the side, you play around with it for ten minutes, and then you go back to your actual system and your work, which doesn't really force you to figure out how things work on this kind of system. So I decided I'd have to actually reinstall my system and force myself to use it day in, day out. What I'm showing you next is basically the experience I had doing that. So the first impressions: I'll just show a few commands you could run on such a system to see how it feels. If you run dnf install to add some extra package, which seems like the most normal thing to do on a Fedora system, you get told: dnf, command not found. That's pretty shocking. So how do you get anything done on a system where you cannot install a package?
Or maybe you look around a little further at how /usr is mounted, and you'll see, oh, it's read-only. That means dnf wouldn't have done you much good anyway, but it's another sign that this is really a system that's not like the other Fedora systems. The third command I put down here is calling rpm; rpm at least is installed. You can run rpm -q kernel and it'll tell you, well, this is the kernel you have. So that's at least a little bit of a good sign: there's still an RPM database somewhere on the system, and it can actually tell you what's installed there. So there's still some familiarity left. Moving on: dnf is not there, so we have to learn to work again and figure out how to get things done on a system like this. The important command to know here is rpm-ostree, which is the tool we use, inherited from Atomic Host, to update the system and install from an OSTree repository. There's a bunch of useful commands. The one I show up top is rpm-ostree status, which gives you a bunch of output telling you which image is installed on your system currently, which version it is, and what the exact commit is. I abbreviated it a little; if you want to see more, it'll also tell you about the other images that are on your system, because rpm-ostree actually keeps more than one system at a time and lets you switch between them. The second command down here, rpm-ostree upgrade, is what you run to pull down the latest; basically the equivalent of dnf update. I didn't have an upgrade available here, otherwise it would tell me a lot more about what's actually getting installed. And I should say, I'm running these commands not as root, because rpm-ostree has a daemon that runs in the background, and if it needs privileges to do things, the command-line tool will just talk to the daemon and get things done that way, so you don't need to run it as root. At this point, I'll have a brief intermission about why we may want to use rpm-ostree in this way. The basic idea is that this is an image-based system: I'm installing an image that is identified as a unit, not individual packages that get combined on my system with a bunch of post scriptlets run in undefined order, where the outcome is not really clear. This is an image-based system where I put down an image that is clearly identified by this commit checksum, and everybody who does the same will get exactly the same bits on their system, which is very nice for reproducibility: QA can actually be sure they're testing the same bits that users have on their systems, and reporting bugs becomes a lot easier. The way this works in practice is that when I run rpm-ostree upgrade, it will start pulling down the new bits and put them somewhere on my disk until the full new image is assembled. But it's not going to modify my running system underneath me; the /usr that I have is read-only and is not going to get modified, and I actually have to reboot into my newly downloaded image, which means the updates are actually atomic. I'm running the old system until I'm ready to switch into the new system with a reboot, there are no weird mixed states, and if I lose power in the middle, the next boot will bring me back to my old system rather than to something that is half-updated or broken.
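A compact sketch of that update cycle; these are standard rpm-ostree subcommands, run as an ordinary user:

    rpm-ostree status        # which image and commit am I on, and what else is deployed
    rpm-ostree upgrade       # assemble the new image on disk; the running system is untouched
    systemctl reboot         # the update only takes effect on the next boot
    rpm-ostree rollback      # if the new image misbehaves, flip back to the previous one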
I already mentioned the third point here: rpm-ostree downloads a new image, but the old image is still around after I reboot into the new one. If I find a problem, I can easily reboot back into the old system, so rollbacks are kind of easy. The last point I put down here is that the downloads are at least somewhat efficient, because OSTree uses content addressing, so it only downloads files that have actually changed; if it finds that one of the files it needs is already on my system, maybe because it didn't change from the old image to the new image, it just keeps reusing the file it already has. Moving on to learning to do more things on the system, the first line here shows the dirty secret of rpm-ostree, or if you want to put it in a positive way, the charming compromise: you can still install packages, even on this image-based system. That's what I'm doing here: rpm-ostree install gdb. It's going to go out to the yum repository, find the gdb RPM, download it, and then do some magic, and the magic is that rpm-ostree will compose a new image on my system by combining the existing image I'm currently running with the layered RPMs. I still have to reboot to get the new image, so in that sense it's still image-based. But I do lose some of the advantages I just had on the previous slide. I said everybody gets the same bits; now I've composed an image on my own system that has whatever RPMs I needed at the time, so it's no longer going to be the same bits. I lose that advantage. But at least it's still atomic: I have to reboot to get into the new image, and in that sense it's still safe. This is called package layering, and it makes the system a lot easier for somebody who comes from a traditional Fedora system, because at least you don't have things changing behind your back. But maybe this is not the ideal way to go, because this layering, as I said, brings back some of the problems of package-based systems. So maybe the right way to go is actually to look at containers. When I started this journey in February, I didn't know much about containers at all; I hadn't really used Docker at all before. I'd heard about Buildah, and everybody thinks it's cool, so I thought I should try it and build myself a container. This is how you do it: you say buildah from fedora, which downloads the Fedora base image, and then I can run things inside it like this. Inside that container, and I'm using a different color here to indicate that these last two commands are running inside the container, I actually have dnf, because this is like a traditional Fedora system, so I can just dnf install something and it'll work. And then I can try to run it, and when I did that back in February, I discovered, oops, I didn't have a display here, because the container is isolated from my system. I tried for quite a while to figure out how to fix this, but I don't know Docker and I don't know a lot of the technologies involved, and despite trying to leak the display socket into the container, I could never really make it work. A little frustrating. But thankfully, things can get better. Debarshi on my team is working on making this something you don't have to figure out for yourself. We're working on a toolbox container, which will basically be a ready-made Fedora container that has a bunch of useful things in it and will be set up in a way that makes it very easy.
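Side by side, a minimal sketch of the two approaches just described, package layering versus a Buildah container; gdb is just the example package, and the container name is whatever buildah prints:

    # package layering: compose a new local image with gdb added, then reboot into it
    rpm-ostree install gdb

    # the container route: a throwaway Fedora working container instead
    ctr=$(buildah from fedora)
    buildah run "$ctr" -- dnf -y install gdb
    buildah run "$ctr" -- gdb --version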
So yesterday evening I finally got around to trying out what Debarshi has so far, and I found that with this toolbox I can again install my little test application and just run it, and it works. This is hopefully coming in Fedora 29 and should make things a little less hard to get going. Yeah. Another quick interlude here: for the Workstation target audience, developers are obviously a very important segment. The Atomic Workstation is supposed to basically replace the Workstation over time, so we still need to target developers, and that means in this case, in particular, we need to make sure the container tools are installed, so all of these will be installed on a Silverblue system: Podman, Buildah, Skopeo, and maybe even Docker, I don't know. And this toolbox that I just mentioned, I hope we can bring it in for Fedora 29. Long term, we also want to look at making the terminal you run on the desktop a little more aware of the context you're in; if you're jumping in and out of containers a lot, it kind of becomes important to know where you are, so we're looking at making the terminal a little bit aware of that and maybe telling you about it. There are some other alternatives that I tried out back when I started in February, just to get my work done. flatpak-builder is a build tool that comes with Flatpak which uses containers and avoids host tools for doing builds, so that works well on Silverblue. And GNOME Builder is an IDE that does things in a similar way, also using sandboxes for builds and not using host tools. And you can install both of these from Flathub with a command like the one down here. Yeah, and now that I've started talking about Flatpak, I'll talk a little bit more about that. I said earlier on that we're using rpm-ostree for the host image itself, and we're installing applications via Flatpak. And Flatpak is essentially just desktop containers, you could say; it's using all the container technologies that Dan Walsh talked about this morning, just wrapped up in the right way to make desktop applications work nicely. So they are isolated from the OS, and we can update them independently and safely. Currently the best place to get Flatpaks is the website called Flathub. I put the wrong URL there, but if you go to flathub.org you'll find 350 or so applications packaged in this format, and that's pretty good. It's not something we're comfortable enabling by default for Fedora, because there's a mix of proprietary and free applications there, so it's kind of the RPM Fusion problem all over again. And thankfully, very soon we'll start building Flatpaks directly inside Fedora; Owen, who is somewhere here, worked on this. As soon as this week actually, or next week, we'll start seeing Flatpaks come out of that, and those will be available from the Fedora registry in the form of OCI containers, and we'll look at making those available by default. Yeah, going back to my experience from back in February: after I learned how to do some basic things in containers, I got somewhat comfortable with the system, but I'm using the Rawhide stream, and Rawhide is still Rawhide, so you hit all the usual problems, like broken composes. That's not so bad; I mean, if you don't get an update for a day, you just keep using what you have. Unfortunately, the most recent phase of broken composes has now been lasting for well over a month. I'm a little disappointed by that.
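The Flathub commands referred to above look roughly like this; the remote URL is Flathub's published .flatpakrepo file, and the two application IDs are GNOME Builder and flatpak-builder as shipped there:

    flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo
    flatpak install flathub org.gnome.Builder
    flatpak install flathub org.flatpak.Builder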
I hope we can get a compose going again very soon. And sometimes you hit a working compose and it just turns out that the updated image fails to boot or doesn't work in some other way, which on a traditional Rawhide system would be the point where you're looking at maybe a few hours of lost work, because you have to struggle to get your system back into a working state. That's always very annoying, which is why so few people use Rawhide if they have any sense. Thankfully, on Silverblue Rawhide this is a lot easier, because as I said earlier, there are easy rollbacks. You can just reboot and switch back to the old image, and you've lost maybe a minute or two, not an hour or two. That's very nice, and that's what gave Atomic the slogan of fearless updates. It's an important characteristic of this kind of system: you can just update, and if it doesn't work the way you thought it should, you can easily get back to a working system. But it's still just software, so a sufficiently stupid or determined human can still break it, and I thought I'd tell you about the one time I almost painted myself into a corner on this system. It was a snow day. It was morning, I had just installed an update, and I obviously needed a web browser, because it was a snow day, I was stuck at home, and I needed to be in a meeting in an hour. So I rebooted into my new image and I didn't get a login screen, because GNOME Shell was giving me an error message: couldn't find the GBM renderer, blah, blah. So far, so good. I didn't fear the update, because these are fearless updates. So I went on IRC and asked some of my graphics team peers what to do, and they recommended I get them the output of a little tool called eglinfo. So I booted back into the broken image, runlevel 3 this time, tried to run eglinfo, and was surprised it wasn't installed. But I have package layering available, so I just ran rpm-ostree install egl-utils, which contains that tool, and rebooted to get into this image with egl-utils installed. And well, at the boot menu I realized my mistake: now I was faced with the choice between the broken image and the broken image plus egl-utils. The second one is kind of what I was going for, but I had hoped that my working image was still available. It turns out that rpm-ostree keeps exactly two images around at all times, and it always assumes it can never replace the one you're currently booted into, because that's supposedly the good one. In my situation I was booted into the broken one and still needed to create another one. So this was a bit of a bummer. I had 50 minutes left until the meeting. So what to do? I remembered another slogan that Atomic has, which is that OSTree is like git for operating systems. So if it's like git, there should be a history somewhere. After a little bit of poking around, I found a command called ostree log, which lists existing commits, or images. I've abbreviated the output here, but the important line is the last one: history beyond this commit not fetched. Basically what OSTree was telling me is that I only had two commits available, the current one and the previous one, and the rest of the history is not kept, because it's kind of bulky and my disk is not infinite. So I had to figure out that I can actually pull more commits from the remote repository using ostree pull, giving it a depth.
So I went five versions back, not infinitely far, because that would have been crazy. And then I could use another command, ostree admin deploy, to actually switch to one particular commit that I knew was working. This was all a little stressful, but I managed to make it just in time and got into my meeting, so that was good. But I guess this was the closest I've come to death on Silverblue. And that's basically the story I have to tell. I hope you're interested in trying this out for yourself now, and if you want to, you don't have to wait: you can try it today. There's a kind of pre-release that we did for Fedora 28 in the spring, and for Fedora 29 we'll have the first, I guess, proper release of Silverblue. Some of the things I've talked about here today are going to land for Fedora 30 and hopefully be ready for prime time. The ultimate goal of this initiative is that we'll make Silverblue basically ready to take over the role of the traditional Workstation edition and be useful for everybody. But you can try it today. Oops, wrong direction. Yeah, that's all I had for you today. There's a bunch of links you can follow to learn more; the docs website has a list of references to the blog posts I've written since February, which have a bunch more stories of life and death on Silverblue. Thank you for listening, and if you have any questions, I think I have some time for that. One second, there's a microphone. So two questions: the Silverblue image itself is basically the same set of packages the Workstation has, plus a few more, correct? Pretty close, yeah. So I'm managing a fleet of RHEL 7 workstations, and we frequently have to provide additional packages that aren't applications but add on to the base GNOME desktop environment, say additional GTK themes or additional fonts because users want them. Would those be installed as layered RPMs? That's certainly one option; you can just use layering, but that kind of goes system by system, so if you have a whole fleet, that might not be the most practical approach. You can also compose your own image if you want. The tooling is there; rpm-ostree is also what's used on the server side to compose the image. That may not be the most well-documented aspect of this whole system, but it's certainly possible. People have done that before: if you want to put up your own OSTree repository that has an image that is a bit larger, with other packages in it, that's certainly possible. For instance, there are a few people who created a KDE variant of Silverblue, I believe; I forgot the name, but it's possible. Gotcha. So I also maintain a fleet of workstations, mine are Fedora, but what does configuration management look like in a world like this? So this is a workstation, so most people use it on their laptop, so it's kind of individual systems, I guess, most of the time. And if you're looking at /etc, that's not managed by rpm-ostree, or rather, it's managed in a somewhat smart way: OSTree will merge your own edits in /etc with whatever comes in from the updated image, from packages, and tries to do a three-way merge there to make that work. Okay, just for reference, we have workstations that are, old term, I suppose, but are actually managed through Ansible and such.
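To recap the snow-day recovery dance above in command form, a sketch only; the ref and stateroot names here are guesses for a Fedora 28 system, so take the actual values from rpm-ostree status and ostree admin status:

    # which commits do I have locally? (run against the system repo, as root)
    sudo ostree log fedora/28/x86_64/workstation

    # fetch a bit more history from the remote, five commits deep
    sudo ostree pull --depth=5 fedora-workstation fedora/28/x86_64/workstation

    # make a known-good older commit the next boot target
    sudo ostree admin deploy --os=fedora-workstation <known-good-commit-checksum>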
Yeah, I mean, /etc is open; you can edit /etc, and whatever tools you use to manage that configuration on your systems should work on atomic systems as well, I think. So I think it was clear at the start, but I just want to make sure: the rpm-ostree approach versus the Fedora CoreOS, the CoreOS merger, all that kind of stuff; Silverblue is staying with the OSTree stuff, is that correct? Yeah, and I can maybe say a little bit more about that, but you can also go to Colin Walters' talk tomorrow to hear what the plans for Fedora CoreOS are. I believe Fedora CoreOS is also going to continue using rpm-ostree. They kind of tried to pick the best technology from both sides, and I think for the atomic-update, image-based system they decided to use rpm-ostree. Do you have to reboot for any update? As long as you do the things that I showed here, yes. If you install using rpm-ostree upgrade, that'll require a reboot to get into the new image; if you use rpm-ostree install to layer a package on top, that'll also create a new image that you need to boot into. There are some experimental things; rpm-ostree is full of these charming compromises. There's also an experimental thing where you can unlock the system and install a package live, which basically gets you all the way back to traditional package installs: you install into the existing image and things take immediate effect, but they're usually gone with the next reboot. So there are different compromises there. Yes. It sounded like there are three options for tools that are not in the base image: you could use a Flatpak, you could use a pet container, or you could use a layered package install. How do you make choices between those? So I'm not the expert on containers, but in my experience since February using the system, it seems that package layering brings back a bunch of the problems you have from the traditional system: the package sets get out of sync and then there's a conflict between the layered package and the base image, and things tend to fall apart a lot more if you have a lot of packages layered. So I found that it's much nicer for me to stay away from package layering and use either a container or a flatpak-builder style approach for doing things. That's the amount of recommendation I have, I guess. So you said you're working on something called the toolbox, a container that enables you to run graphical apps and other things. Three years ago I tried to run graphical apps in Docker, and it was annoying, because I had to use TigerVNC or X2Go or try to figure out some better way to do X forwarding. I can't find any information about this toolbox; is that its name, or is that your nickname for it? Probably that's the nickname. I mean, it comes from CoreOS, which actually had a thing like that called toolbox, I believe, which works very much the same way: you basically have this toolbox command which gives you a Fedora in a container. So we kind of stole that name from CoreOS, but Debarshi has not published what he has yet, because he's still not quite happy with the current state of affairs. But I'll push for getting that out there very soon so we can try it out. Thank you.
My question's about — at the very end, you quickly went past the future of Silverblue, how it could maybe become part of Workstation in general. OSTree systems initially have to be released as images themselves. So how will that work if OSTree is part of a normal Workstation release? Is it going to be released as an image? Is it going to be a feature within the installer that makes this immutable platform that's updated through OSTree? What does that look like? So I think we're basically going to steal whatever Atomic Host does with Anaconda. Atomic Host has an installer that basically does this already, right? It gets an OSTree image onto your system initially, and I believe it does that by having an OSTree repository on the ISO and installing from there. We're just going to use the same approach. One thing we haven't worked out in this area is how to pre-install Flatpaks, because currently the Silverblue image — I didn't mention this — still contains a bunch of graphical apps that are just expected to be there, like the terminal and the file manager and a simple editor, things like that. Ideally we want all of those to be installed as Flatpaks, so we want to take them out of the OSTree image, but we don't want to give you an empty system when you install for the first time. So we need to figure out how to pre-install Flatpaks with Anaconda. That's one part we haven't solved yet, and I assume it will end up looking similar to what Anaconda currently does for the OSTree image itself. Okay, we have time for one more. You got it? Perfect. So in many IT organizations you want to mirror the repositories you use for YUM, and there are multiple ways of mirroring YUM repositories. I see that OSTree has a way of mirroring repositories, but is there a way of mirroring Flatpaks? Well, I didn't say this explicitly, but Flatpak actually uses OSTree itself underneath for repositories and for local storage. Flatpak is very similar to rpm-ostree in that respect: it also uses OSTree as the repository format. So whatever works for mirroring rpm-ostree will work for Flatpak, most likely. Great. I recommend you go to Colin's talk tomorrow; maybe he'll talk about some ideas he has for making mirroring of OSTrees easier and better. Whose talk? Colin, Colin Walters. Thank you. Thanks for this. Thanks everybody for coming.
I work on the OpenShift team, and I absolutely love my job. I wasn't planning to say all this, sorry. But I started out as an intern working with the containers team, and that's how I got really interested in Linux containers. I now work on the OpenShift installer, but I worked with Dan Walsh as an intern, and I basically won the lottery of internships, because he's awesome. Yeah, I'm Urvashi. I'm a software engineer at Red Hat, also on the OpenShift runtimes team. I also started as an intern under Dan, I'm still working with him, and he really is a great guy. So today we're going to talk about container security and how you can use these tools. There are so many options out there and so much innovation going on in this container space. All right, before we start, how many of you were at Dan's talk this morning? OK, quite a few. So most of you know what a Linux container is, and you've been to a few talks today where it's been discussed.
But Linux containers, when we talk about them, are normal processes running on a Linux host, and they have three things going for them: they're constrained, they're isolated, and they have some extra Linux security features added. So how are they constrained? Well, Linux has cgroups, or control groups, a mechanism that can limit the amount of resources, like CPU or memory, that a process in your container can use. The isolation comes from Linux namespaces. There are six Linux namespaces, and one example of what a namespace gives you is the virtualized feel of a containerized process: if I'm a process running in a PID namespace, I think I'm the only process running on that host, and I can't see any processes running outside the namespace. Same thing with the mount namespace: you can mount a whole root filesystem inside a mount namespace, and that way you can have the whole Ubuntu user space inside your container and feel like you're running an Ubuntu system, when really you're just a process running on a Fedora host. So that's the idea of namespaces. And then Linux also has seccomp for syscall filtering, plus Linux capabilities and SELinux, to add to the isolation of the Linux container.
So Linux containers have become super popular over just the past few years, and there's been a ton of innovation and development surrounding them, really in two different areas: the container image format and the container runtime. Those are the two pieces you need. So the industry got together and formed open standards around those two areas. Now we have OCI images, from the Open Container Initiative, and that has enabled all sorts of development. Now we have a nontraditional kind of Linux container, host-separated containers, where they don't share the host kernel: they actually wrap each containerized process in its own virtual machine. That's Kata Containers or gVisor or Nabla. And now we're also free to develop all sorts of tools surrounding Linux containers, tools that we know will work: if we follow the OCI specs, we know they'll work with all container runtimes, and we can run any OCI image with any OCI container runtime. So standards were very important in moving things forward with containers.
OK, so now that we know what containers are, we can break the container space into four different sets of actions. One: building your container images. Two: running and developing those containers locally. Three: sharing your container images, pushing them to remote registries and moving them from one environment to another. And four: finally running them in a production cluster. So what would happen if all of these functions lived in one monolithic tool? We'd end up with the least common denominator for permissions, which affects the security of the whole system. For example, you don't need all the privileges to run a container that you need to build container images, so why have them all together? Hence we decided to break these actions into four different tools, following the UNIX philosophy: design programs that do a single thing, do it well, and work well together. And obviously all of the UNIX founders are very happy that we followed that.
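Before getting to the four tools, here is a small sketch of the constraint and isolation mechanisms just described, using unshare from util-linux and the cgroup v1 filesystem layout (an assumption about the host; the paths and group name are just examples).

```sh
# New PID + mount namespaces: --fork makes the shell PID 1 in the new
# namespace, --mount-proc remounts /proc so `ps` only sees namespaced
# processes -- the "I think I'm the only process on the host" effect.
sudo unshare --pid --mount --fork --mount-proc /bin/sh -c 'ps -ef'

# Constrain a process with cgroups (v1 memory controller assumed):
# cap the current shell and its children at 256 MB of memory.
sudo mkdir /sys/fs/cgroup/memory/demo
echo $((256*1024*1024)) | sudo tee /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
echo $$ | sudo tee /sys/fs/cgroup/memory/demo/cgroup.procs
```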
So the four tools I'm talking about are Buildah — the name says it all — for building container images; Podman, for running and developing containers locally; Skopeo, for moving container images around and sharing them on container registries; and CRI-O, for running your containers in production in Kubernetes or OpenShift. So let's go through these and talk about working with containers. The first one is Buildah. Buildah is a tool that Red Hat has been working on for the past few years to build containers. What do I think about when I want to build securely, and build secure containers? One thing that comes to mind is minimal images. You want to create images with as little in them as possible to minimize the attack surface: the more you have in an image, the more that can go wrong. And another security feature Buildah offers is that you can run Buildah itself inside a container. That way you add an extra layer of isolation between your host and your build process, so even if the builds you're running have elevated privileges and something breaks out of the container, it can't affect your host. Wouldn't it be cool if we could show this live? Yeah, we have live demos for you. And we've already sacrificed to the demo gods, so they should all go well. Live-ish — we're pretty live.
So I'm going to show how easy it is with Buildah to create a minimal image. When I run buildah from scratch, that command starts a container with absolutely nothing inside; it's completely empty, you're literally starting from scratch, and it gives you a working container. Of course with Buildah you can use Dockerfiles, but here I'm showing that without a Dockerfile you can just start a working container, put stuff in it, and then commit it. So say I want to install a package; I have one thing I need in my container. First I create a mount path so that I can install something from my host into the container without having to have DNF inside the container — again, there's nothing in there. So I'm going to use my host's DNF to install this small package that my friend Nalin told me has no dependencies, because this is a live demo and I didn't want the install to take forever. And I should have hit that button while I was saying that. But anyway, the Fedora repo. I can tell you a story while we're waiting — actually it's pretty fast, I don't think I'll have time. Last week I created a minimal image at work. I'm working with the OpenShift installer, and they have a bunch of Terraform files, and with every pull request we want to make sure we run Terraform format. That requires the Terraform binary, which not everybody has on their system, and our CI certainly didn't have it. So instead of installing it CI-wide — and we're running in Prow, so everything runs in a container — I created a minimal container with just Terraform in it. To run it in Prow — I'm just going to finish my story — you have to volume-mount in the source code you want to check, just like for golint or go vet, and you also have to mount the temp directory read-write, because Terraform has to write there to do its thing. And then we run that in Prow, and it's a minimal image, so it's not going to blow up the CI infrastructure if something goes wrong with it — and there's nothing that's going to go wrong, because there's only that one thing in there.
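A rough sketch of the from-scratch flow being demoed here, assuming a Fedora host with DNF; the package, release version, and image name are placeholders, not necessarily the exact ones used in the demo.

```sh
# Start an empty working container and mount its root filesystem.
# (Run as root, or wrap in `buildah unshare`, so the mount works.)
ctr=$(buildah from scratch)
mnt=$(buildah mount "$ctr")

# Use the *host's* dnf to install one small package into the container's
# rootfs, so dnf itself never ends up inside the image.
dnf install -y --installroot "$mnt" --releasever 29 \
    --setopt install_weak_deps=false busybox

# Unmount and commit the result as a new, minimal image.
buildah umount "$ctr"
buildah commit "$ctr" minimal-image
```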
All right, so now that I've installed my small package — oh, what does it say? I think you know what that is. Okay. So now I can commit my image. You'll notice I unmounted that directory first, and then I commit it, and I'm going to call it — what did I call it? — minimal-image, okay. There. Now that I have my image committed, I can use any container run tool, such as Podman, and what's interesting is what's not in the image. If I try to ping, that's not there. A lot of packages will pull in Python; I didn't need it and I don't want it, so that's not there. All this image can do is run busybox, and that's the busybox help menu. Pretty awesome. So again, I used the host's DNF to install packages — or, in the Terraform case, I downloaded the zip file and unzipped it into that mount directory, and there it was in my container.
Okay, now on to the demo of running Buildah inside a container. I have this Dockerfile. I already built an image that has Buildah installed in it, from Fedora, before starting this demo so we don't have to sit through that. The entrypoint is set to buildah, so I can use Podman to run this image and do things like buildah bud my image, and I tell it where my Dockerfile is, which is in the volume I'm mounting in. Once I do this — it's a small, simple Dockerfile, from Alpine, just setting an environment variable and a label — it builds and commits it, and I can go in and look at the image that was built by doing buildah images. As you can see, the last image, the most recent one, is the image that was just built inside the container. Yeah, so this is all inside the container. And now that you have that image, you could push it up to a registry, you could push it over to your host, play around with it. Is that it for Buildah? All right, let's go back to the slides.
So the next thing we want to do is run and develop containers locally. For that we have this tool called Podman, where you can manage, develop, and test your containers locally. It's an all-in-one tool, more of an entry-level tool. We've basically covered everything that the Docker CLI has to offer, and much more; we also have the podman pod command, where you can create pods. One of the cool security features that comes with Podman is that you don't need root privileges to run it. That's great: admins can get away with not giving developers any root privileges. And an added consequence that is really good is that it gives you compartmentalization, in the sense that multiple users can work simultaneously on the same host machine and not be able to access each other's work. So for example, Sally and I can be working on the same host machine, but I wouldn't be able to see, or even know, that she has any containers or images on the host. Everything is in its own compartment. Adding to this compartmentalization, Podman has a feature called user namespaces that adds even more isolation. What user namespaces mean is that you can map a certain range of user IDs in the container to a different range of user IDs on the host. So I can map, for example, UID 0 in the container to UID 100,000 on the host.
So my processes will have root privileges in the container, but on the host they'll be running as UID 100,000. If the process breaks out, it can't cause any damage; it won't have the privileges to do so. That's pretty cool. And adding to that, we can run each container in its own separate user namespace. What does that mean? It means that if a process breaks out of the first container, it still won't be able to access the second container, because it won't have the same privileges. I'll delve into this a bit further when we get to the demos, where it's easier to see. Yeah, and also with Podman — if you went to Dan's talk — there's no daemon, no big fat daemon. Podman runs in a true fork/exec model rather than the client/server model we're used to. What that means is that the child processes started by Podman inherit the parent's login UID, so you can easily trace through Podman who on the system has been running things. I can show that in the demo too. I think we can go to the demos now. Demo time.
So, you do the rootless one first. Yes. As you can see here, I don't have root privileges, so I'm going to pull an image by running Podman in rootless mode. When I list the images, you can see Alpine. And just for comparison's sake, I'm going to list the images using root privileges, so you can see that as root I have way more images than just Alpine — just to emphasize the compartmentalization I was talking about earlier. And just quickly, to show that it actually works, I can run the Alpine container and list what's in my home directory. Yeah, so now back to the user namespace stuff. Using podman run, you can pass a UID map, and that basically tells Podman: map UID 0 in the container to UID 100,000 on the host, and do that for the next 5,000 UIDs — that's the range. I'm going to run this detached in the background, and we can use the podman top command to look at what the user ID is in the container and on the host. As you can see, it's root and 100,000. The latest flag is just a really cool feature we have in Podman that tells it to use the most recent container you've created, so you don't have to go back and get the container's ID. When I do a ps and grep for sleep on the host, you can see it's running with user ID 100,000. To show you what I meant by each container having its own user namespace, I'm going to create another one, but map it to 200,000 instead. Same thing as before. And as you can see, the processes are there: one of them has 100,000, the other has 200,000. So if any process from the container with UID 100,000 breaks out and tries to talk to the one with 200,000, it won't be able to; they're in completely different user namespaces.
The fork/exec model I wanted to show is pretty easy to show. On my host system, to see who I am, I can cat /proc/self/loginuid, and you'll see that it's 1000 — that's the user currently logged in on the host. Now I'm going to run a container, just a Fedora container, and cat /proc/self/loginuid from inside the container. As you'd expect, since it's a fork/exec model, the user logged in there is me, login UID 1000. Now, the interesting thing is that with another container runtime, if I run that same exact command, I get this huge number — the maximum unsigned 32-bit value, I think I said that right — which means the login UID is unset: that user has never logged into the system.
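A sketch of the rootless and user-namespace commands walked through above; the image name and UID ranges mirror the demo, but are otherwise just examples.

```sh
# Rootless: pull and list images as an ordinary user. This image store is
# completely separate from root's.
podman pull alpine
podman images
sudo podman images                     # root sees its own, different set

# Map container UID 0 to host UID 100000 (a range of 5000 UIDs), then
# compare how the process looks from inside and outside the container.
podman run --detach --uidmap 0:100000:5000 alpine sleep 1000
podman top --latest user huser         # container user vs. host user
ps -ef | grep sleep                    # the host sees UID 100000

# fork/exec model: the login UID is inherited inside the container.
cat /proc/self/loginuid
podman run fedora cat /proc/self/loginuid
```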
So I hope you see the problem there. If I try to do something tricky, like touch /etc/shadow, then back on the host a sysadmin can use the audit search tool to see exactly who that was: user 1000, who is cloud-user, did that. The interesting thing is that I can run a Docker command, volume-mount in the root directory, and touch /etc/shadow, and now I see someone did it again — and who was it? Well, that user is unset, which means I have no idea who it was. So you can see the problem there, and the benefit of the fork/exec model in being able to audit who's doing what on your system.
Now, we want to show a couple of the neat features of the podman top command. podman top just prints things out in a nice, pretty way, and you can use it to see what security features are enabled for your container. So here I'm just going to run a Fedora container, and if I pass label to podman top, you can see the SELinux label that's currently there. I can make sure that my container is running with seccomp filtering turned on. You can also check which capabilities are currently effective inside the container. I'll talk a little more about Linux capabilities when we get to CRI-O next. But podman top is a really useful command. Back to the slides.
So, Skopeo. We know what a container is, we've created the image, we've played around with it on our local system with Podman, and now we're ready to manage the image — what do we use? When you add security to a system, the first thing you want is to not have to run as root, and there's no reason to run as root when you're managing your images. Skopeo is our tool for that. Originally it was designed so that we could inspect an image on a remote registry. Before Skopeo, in order to check out an image, you had to download it to your system and then run inspect on it. Now you can run skopeo inspect against a remote registry and get useful information: the JSON file that describes the image, the layers, who owns it. But the important thing is that you don't need root to run Skopeo; there's no daemon, so there's no reason for root. And since the original use case worked out so well, we've added other things: you can copy an image from one registry to another without ever having that image on your host system, you can delete images, and so on. And I don't know if we mentioned this before, but Buildah can also run without root — so for all three tools here, Skopeo, Podman, and Buildah, you do not need root privileges. You have the option, but you can also do without. So here's just an example of some information you can pull down; we pulled this from Docker Hub. You can see all the tags available, you can see the layers in there. That's just the Fedora image on Docker Hub. And in the spirit of "don't download and run random crap off the internet," that's what Skopeo solves there.
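A quick sketch of the Skopeo usage described above; the registry and image names are only examples.

```sh
# Inspect the JSON metadata of a remote image without pulling it.
skopeo inspect docker://docker.io/library/fedora:latest

# Copy an image between registries without a daemon and without root;
# credential and TLS options are omitted here.
skopeo copy docker://docker.io/library/fedora:latest \
            docker://registry.example.com/mirror/fedora:latest

# Copy a remote image straight into the local storage that Podman uses.
skopeo copy docker://docker.io/library/alpine:latest \
            containers-storage:localhost/alpine:latest

# Delete an image from a registry (if the registry allows it).
skopeo delete docker://registry.example.com/mirror/fedora:latest
```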
Okay, so now we have a tool to build our container images, we've tested and run them locally using Podman, and we've put them on registries and moved them around using Skopeo. The final thing we want is to run these containers in production — in Kubernetes, for example. That's what CRI-O does. CRI-O is a container runtime that implements the Kubernetes Container Runtime Interface, the CRI, so Kubernetes can use it to launch containers in production. We firmly believe that when you're running containers in production, you should run them in read-only mode. What does this mean? It means that processes running inside your container should not be able to write to any part of the container that came from the image, making almost every path in your container immutable. Now you're wondering: what if I need to run processes that write to a path in the container? What if I need to save vital information? Guess what — you should actually be glad we have read-only mode, because if you were writing inside the container, you'd lose that information when the container is destroyed. The way to handle this is to mount volumes into your container and write to those paths; their contents persist even after your containers are destroyed, because they're backed by a path on your host. So that's what read-only mode does in CRI-O. We'll also show you that with CRI-O it's really convenient to set which Linux capabilities are enabled system-wide for all of your containers. Linux capabilities divvy up the superuser privilege on a Linux system: there's the CHOWN capability, the NET_RAW capability, and so on — a list of about 40 of them — but with CRI-O, by default, we enable only a very small subset. I'll show you that list in a minute. The idea is to run with as few capabilities enabled as you can, only the ones you need. Again, this minimizes the attack surface and the chance that something breaks free and wreaks havoc on your host. CRI-O also has the same user namespace support as Podman; the only thing is that Kubernetes isn't quite there yet, so we're waiting on Kubernetes before we can take full advantage of this feature in CRI-O. And if you work for the federal government, it's interesting to know that CRI-O is your option for running things FIPS compliant. FIPS is a list of encryption algorithms that are permitted to be used, and the federal government pretty much makes its employees run their systems in FIPS mode. CRI-O is the only container runtime that knows what that is and can carry that information into the containers and enforce it.
Back to demos. The first demo is read-only mode. We have a config file for CRI-O; I've set the read-only flag to true, which tells CRI-O to run all containers in read-only mode, and restarted the CRI-O daemon. And crictl is a CLI tool that you can use to debug and run containers with CRI-O, since CRI-O is really meant to be driven by Kubernetes — this is just a way to do it locally, and we use JSON files for that. So I use runp to create a pod, then using that pod I create a container and start it. Now I'm going to exec into the container and try to dnf install something, BusyBox for example. As you can see, that failed, saying it's a read-only file system, because dnf expects to write logs to /var/log, which is a restricted path. The great thing about this is that if your container gets hacked into, your attacker is going to want to put a backdoor in place so that the next time the container starts up, they have easy access to it. Read-only mode stops that.
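A rough sketch of the configuration and crictl commands behind this demo; the config excerpt, capability list, and JSON file names are assumptions about the setup, not the exact demo files.

```sh
# /etc/crio/crio.conf (excerpt, assumed): run every container read-only
# and enable only a small default capability set.
#   [crio.runtime]
#   read_only = true
#   default_capabilities = [ "CHOWN", "FOWNER", "SETGID", "SETUID", "KILL" ]
sudo systemctl restart crio

# Drive CRI-O locally with crictl, using pod and container JSON specs.
pod=$(sudo crictl runp pod-config.json)
ctr=$(sudo crictl create "$pod" container-config.json pod-config.json)
sudo crictl start "$ctr"

# Writing to an image path now fails with "Read-only file system".
sudo crictl exec "$ctr" dnf install -y busybox
```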
So run your containers in read-only mode in production. Also run with as few capabilities as possible. I want to show you which capabilities are enabled by default with CRI-O: it's just a small subset, and it's super easy to go in and delete a couple of them. All you have to do then is restart CRI-O. Now I'm going to start a pod — again, starting a pod with CRI-O directly is a little cumbersome. And here, if we print out the capabilities — it's not as pretty — you can see which capabilities are enabled. And if you're in a cluster, this information also carries through to the pod, not just the container: this is information about a pod in your cluster, and you can see it has CHOWN gone, and whichever other one I took out — I deleted, I can't remember — DAC_OVERRIDE, I think. DAC_OVERRIDE, yeah. And is that it? Yeah, that's it for the demos. That wasn't so scary. Those are the security features that come with our tools. Please try to use them when you're working with containers, so that a great guy like Dan Walsh doesn't get upset — he's too good for that. And remember the UNIX founders. So, these are the resources if you want the GitHub links to our tools, and the demo script is right up there if you want to grab it and play around with it. We also have this coloring book, so if you didn't go to Dan's talk or Scott's talk this morning and pick one up, we have them here for you to take. It's a book that highlights our tools at a high level, and you can learn as you color. Thank you. Thank you. Awesome. If you have questions — I'll just combine — give me the mic. Yes, we explained everything. Perfectly. Yeah. No questions.
Just a few closing things: don't forget tomorrow at 9:30 to attend the keynote speech by Chris Wright, Red Hat's CTO. And tonight, if you're partying — I think they're still available, so if you don't have one, you can walk up to registration. Thank you so much.
We're here to show you what we've been up to with Minecraft and OpenShift and children. What got you started, Eric, when you were young? So when I was young, about 12, a friend of mine bought a computer, an MSX computer, like this one. Okay. And it booted up into basically a BASIC environment, so you could type your own program in there. And the first program I did was just a little loop that printed my name. Right. And right away I was really caught by this: I got a book from the library, started programming, and got hooked. That sounds very familiar. I had an almost identical experience on this environment here, which some of you might remember. Very, very similar, right? And in those days it was different: there's no icon here to double-click to get onto YouTube. Exactly. No apps to tap. It was really simple. And that's a bit of a problem right now with kids. If you want to teach them to program and you say, okay, this little program can print your name on the screen, they're not really caught by that anymore. That's a little boring, right? Print "hello world" doesn't get them anymore. Right. I wonder if you have any idea what we could do about that. Well, my son went to this DevOps for Kids event once. Okay. And they had Minecraft there. Oh, that's a cool game.
I know, I've played it as well. Yeah, and that really gets them, because they know this game and they play it. And at DevOps for Kids they showed them how to extend the game. So what he could do was build a little plugin that changed arrows into cats. Oh, right, to throw cats through the air. Exactly. That's cool. That's really cool. But then we came back and I said, okay, let's do more of this, because it was really cool — I thought, hey, I'd like to see how it's done. And then he said, yeah, I don't know how, because you have to install all these things: an Eclipse workspace and all that stuff. Because Minecraft has a Java API. Right. To set up everything, you have to set up Java, you have to set up an editor, the server, the plugin, the workspace; you need to compile things and copy jar files around. And that's all a bit much, I guess. That's a bit much for kids, right, compared to our print program, our super environment here, where you just turn it on and go. I wonder if we could somehow get back to an experience like that. What would we need for that, do you think? We'd need something like a server running somewhere. Yeah, and an easy way to set it up. That would be one of the elements, right: some way to run, in this case, the game server — to just fire it up and get on it. Luckily, I've been working on the developer experience, and we have this project called Launch. Okay. It's basically — show the website — it's basically a website where you can get going on OpenShift, real simple and easy. Right. I like where this is going. OpenShift, this runtime environment where you can get free accounts to host your applications and run them. Perhaps we could just run our server there and let our kids connect to that. The way we made it is that we have supported runtimes here, like Swarm, Vert.x, Node, and Spring Boot. This all looks very serious. This is for work. This is for work, but it's made in a way that you can also extend it and add your own examples to it. Oh. So while Monday to Friday at work we create new projects with WildFly and Vert.x and Spring Boot, on the weekend we can make Minecraft servers for our kids. That sounds cool. Can you show us? Sure. So we have this server set up on our local OpenShift. We have a local OpenShift running on this laptop — that's just because we didn't trust the wifi here. Right. So we actually have OpenShift running locally; there's this thing called Minishift you can use for that. Right. So here we can say, I just want to build and deploy and run on this local machine, this local OpenShift cluster. And then you can choose what you want. Oh, check this out: Node, JBoss, Vert.x — but also Minecraft, Sponge. There are a couple of different kinds of APIs you can use to build stuff on Minecraft servers; Sponge is one of them, and that's the best one. Okay. Of course. That's the best one; that's why we have it here. And you can choose here a simple plugin example that will show you how to extend Minecraft and do some basic things. Okay. And then you choose your GitHub account, so it will create a GitHub repo for you with this example code. Okay. Right, that sounds really handy to get started. Right. And then when you press setup application here, it will create the GitHub repo and add some webhooks. Okay.
So that when your code changes, the server gets restarted and loaded with your new changes. Right. And it will set all of this up in OpenShift. Wow. So with one click of setup application, you can get started with an example, have it running in the cloud, and let your friends connect to it. Yeah. And this is pretty fast? Yeah, it's just a couple of minutes, but like a good chef — here's one I prepared earlier. Right. That saves three minutes of the half hour we have with you. Yeah. So let's see: this is the GitHub repo that it created, and it has webhooks. So if I go here — yeah, wait, hang on, you lost me. What are we looking at here? This is the demo plugin it set up. Okay. So here is the example code, a Minecraft extension. This is what a kid could be shown to extend Minecraft. Yeah. So then you could go and play in this now — oh, I'd love to. Time for some fun? Time for some fun. All right, let's try this. Okay. So this is the server that was also booted up and set up for you. This runs on this laptop, inside our OpenShift — the local single-node cluster that we have here, in a container. And those of you who know Minecraft: it's a standard Minecraft world, you can walk around here. And look at that — what's happening here? That's the extension, that's the plugin; it's changed the behavior of Minecraft. Whenever you collide, so whenever you jump, it prints this "boing" in the chat. "Boing" in the chat. Yeah. Great. Okay, so — and you're saying this is coming from here? This is coming from here. Look at this: an event listener with a collide event handler; it finds the player and says boing. Okay. So walk me through this again. We have on GitHub an example project, how to extend Minecraft, which the OpenShift launcher created for us. Right. And this is running because — how did this actually get built? So there's S2I, the source-to-image thing. Right, source-to-image is this thing they have in OpenShift: from source to containers. It builds this code and boots up a server with this plugin added. So all I have to do is change this. If I wanted to change it, I could just go here, go to edit, and change "boing" into something else. Oh, right on GitHub. Right on GitHub, check it in, and it will fire the webhook and rebuild: it'll do the Java compilation to build the plugin, then create a new container image with this S2I stuff. Right. It will stop the previous container in a deployment that replaces the previous server and roll the new one out. Right. So can I just keep playing, or how does that work? You'll be disconnected, because the server restarts. All right, the entire process is wiped. Yeah, killed. I mean, this is interesting, and for learning about OpenShift it's okay, but I have a 10-year-old daughter, and she'd be back on YouTube in no time. Yeah, I'm afraid you're right. That's so not going to fly. If only we had some way to hot-reload code. That would be nice — so that we could actually change it live. Yeah, so that we could change something, build it, and hot-replace it.
So there are a couple of things, at least two sides to this, right? We'd need a way to change the code somewhere more directly, because of this whole GitHub commit-and-rebuild cycle. To address that, we have something like Eclipse Che, which is an IDE that runs in your browser. So instead of installing Eclipse on your local machine, we can run this too on our OpenShift instance. And now, instead of just a slightly fancy text editor, I have the same project here, see? The same project, the same code. Oh yeah, this is the same thing that was just up on GitHub. Right. And instead of on GitHub, it's in a real editor — it's just text there, yeah, with some syntax highlighting — but here I have code completion. Oh, this is a real editor; it's pretty good. So that will help kids find the right API to call and what to write. Hey, this Eclipse Che thing is pretty cool, and it runs on OpenShift — on the same instance, on our same local setup. That's pretty handy. That's pretty cool. So that solves one of those problems, and the other problem we have to solve is hot-replacing the code. If only someone had built something on top of OSGi. Right, right, because Minecraft is Java-based, and if you want to hot-code-replace things in Java, there are a couple of options. There are JVM agents that you can use to replace code. Yeah, the Java debugger can do this. Yeah, that was the other thing, but it has limitations. There are limitations, yeah. This OSGi approach isn't used that much in the industry for production applications, because you might not want to replace your running code in production. You don't. But in this case, we totally want to do that; in this case, it's quite handy. Yeah, so let's check: Minecraft and OSGi — oh, you actually did that last year. Yeah, right, I've got this. All right. So you're saying that from in here, you could make a change. If I can find it again — sorry, that's the one. Yeah, here, I could make a change. So let's do that: let's say, hey, "hello DevConf", that would be something. That would be really cool. Nobody does that. Right. DevConf — if I could type — DevConf.us, right. So yeah, and then save it. Save that, okay. So now we've changed the source code inside the Che container, on the development side. I mean, it's just not there yet; somehow we have to get it over to the server. So there's a little Maven plugin already installed in this project that will make a diff of your code and send it over a WebSocket to the running Minecraft server, right? Wow, got it. Wow, okay. It will apply the diff. No. Yeah, then it will compile the code into an OSGi bundle and hot-replace it. You don't say — this actually works? This will work, and I'll show you. All right. If the demo gods allow it. Right. So I just run this Maven build. Maven build, yeah, inside OpenShift, in our development container — in a separate container. Right, so we can just get our kids to extend and add things in here, just by going to this website. Yes. Nothing needs to be installed locally. Right. So here's the plugin running now. Okay. And it will send the diff — the things that changed. Okay. Right. And then it will fire up and build and do the OSGi magic to get this code running. So right now — it takes, it takes a...
All right, it's still "boing" — it takes 10, 20 seconds or so. So, without restarting this Minecraft server — because we can keep playing here, we can be in the world live and keep on jumping. And you think this message will change? It will. Huh. If not, we can cut. Let's see. Boing, boing, boing... "Hello DevConf"! This is big. All right. Thank you. So now we have the Node.js experience — in Java. Right, right, right. All of this without restarting; we just reload. This is handy. This is handy. This is getting somewhere, right. Right, right, right. So I think this works really well, if we go back over here. This works really well if you can sit down together with your kid — what would you say, he'd have to be like 12? 12, 13, 14. Yeah, maybe even a little older, kids who have done some of this before. Right, because you have to know the APIs, you have to know all these Java constructs. There's still a bit of that. Right, all of these different APIs, these classes here. Yeah, you have to know how to put the semicolons and braces and whatnot in the right places, all of the syntax stuff, otherwise it turns red like this. Yeah. I mean, it's cool, right? It's cool. Pretty cool, pretty cool. But I'm just wondering if we could do better for the younger bunch — sort of eight, nine, ten, twelve-year-olds. Maybe we could do better. This is the home of MIT. You're right, we're in Boston. Right. So they have this thing called Scratch, developed here. Anybody here who does not know Scratch? Raise your hand if you don't know Scratch. Everybody knows Scratch. Right. Everybody knows Scratch. Okay. Scratch is a visual development environment for children, up at scratch.mit.edu. Very well known; it's been around for 12, 13 years, something like that. Yeah. Oh, right, yeah, it's available on the Raspberry Pi; it's very widely used. And the cool thing is that it's visual development. So instead of writing this code that we have over here — and the point here isn't really that it's Java code; if it were Python code, it would be the same thing — the point is that for younger children, just being able to assemble blocks is a really much easier experience. I could do that at the age of two. Yeah, yeah, something with blocks. But here, yeah, at eight they could get this: a minimum of maybe reading, recognizing the blocks, and you can get them started. Yeah. Right. So, hang on. What we're saying here is that it would be interesting to be able to mod Minecraft with Scratch. With Scratch, instead of Java. So how would that work? Because we are in a container over here in OpenShift, with the OSGi stuff. Yeah. But we would have to have some sort of — so Scratch can be extended with JavaScript. Oh, Scratch can be extended with JavaScript. Yeah, so Scratch can be extended with JavaScript. It's built in Flash — that's the current version — but you can extend it with JavaScript. Right. There's a new version coming up, Scratch 3, which is completely written in JavaScript. Pure JavaScript. Yeah, because then you can also use it on your iPad or on your mobile. Okay. And Flash is, well... So that would let us get some additional blocks in here. Right, that would get us some additional blocks. So if you had something that could bridge JavaScript and Java, that would be really cool.
So yeah, because the problem we still have is that, even if we had custom blocks — like a custom block to maybe, I don't know, show something in here — we still need to bridge from what we're running in the browser to the backend where the Minecraft servers are. So we need something that gives us a sort of distributed message bus, to go from the browser back into the... Something reactive. Something like Vert.x. Vert.x. Right. What about that? Yeah. Vert.x is this other Red Hat thing — we're talking about a lot of these Red Hat things here. Yeah, we get to eat our own dog food. Yeah, we do. So Vert.x is a reactive Java framework that has an event bus, which you can also use from JavaScript, because it's distributed. Right. So let's see — we actually have something running, right? Right, let's show it, let's go there. So this is on another server, so I just need to quickly go out here. It's also running locally on our own OpenShift instance. Actually, do we have the time to jump into OpenShift? Yeah, let's show how we set that up. We're good on time. Yeah. So what we can show here is how, on this local OpenShift instance — which just timed me out, very securely set up — we have the launcher, as shown earlier, in separate projects. Okay. So this is the container that runs that thing. Yeah. Right? Mm-hmm. Then the Che instance we have shown — that's a container; actually several containers that run this online environment, because every build you do here is in a separate container. Okay. Yeah. And then we've got test38, which is our Minecraft server. Right. The launcher created this project and set it up to do the OSGi stuff. So let's have a look into this one, just to drill in a little bit. Yeah. What it's showing us here is that we have an application, a Minecraft server, and that application, if we click on this, has a deployment — the first deployment, because we just set it up — and we can check out the logs here. Right. Okay, the logs of the running Minecraft server. All right, if they come up. Demo. Where are the logs? Yeah, it's loading. Logs — yeah, there we are. Oh, here we go. So this is actually the log of the Minecraft server inside the container; we can see it from the OpenShift console. Yeah, you see the Maven build — this is when we made the change from "boing" — this is the Maven build — right, when the "boing" became the "hello DevConf". Yeah. And what's over here? What does this do? So here it builds and creates the image that it runs. Okay. So going from our source code initially to a container is done by the build; that happens here. Over here we have a build, and there's also — you can maybe show there's an image — oh, this is gone. Yeah, this is gone. So the images — there's an image here, two actually. Is this like Docker Hub? Kind of, basically. It's a registry, yeah — a container registry that's built into OpenShift. Yeah. Okay, cool. Well, with that, we could look at our other project, the stories one here — a very similar project. I'm going to use this opportunity as a plug to show where we actually have this running, in case anybody wants to continue checking this out at home.
We actually have what we're showing you here running on a public server, on a public OpenShift instance. So if you go to www.learn.study, you can actually see what we're about to show now, the Scratch integration. And if you want to run it at home, or, God forbid, help us make it better, you can go check out the sources there, where everything we've done is linked. Hey, you could make it better or add some YouTube videos. Yeah, every contribution is welcome, and there are different ways to contribute to open source, right? It's not just code. No, you don't have to code: maybe improve the documentation, make the site better, make a video that shows what you've done with this, or get your children to do it. Or file some issues: what else do you want, what did we maybe forget, or what doesn't work. Let's jump into this. So how does this work? I have to connect to oasis.learn.study. Yeah, that's the public one, but we also have it running locally. Running locally as well? Yeah — we didn't want to trust the conference wifi. Yeah, so we're going to use this instance here, the Minishift one. Yeah. But you get the same thing when you connect to the public one. Right. Ah, it became dark while we were talking here. Dark too long. Yeah, these Minecraft worlds have a weather cycle and a daytime cycle. Hey, who is this? That's Benny. Oh, right. Benny, our favorite donkey. Okay, so this is a standard Minecraft server. Where is the "boing"? It's not on this one. Right, this one is a different one; this is the one where we have the Scratch integration. Right. Okay, so let's go back to www.learn.study. What does it say here? Yes: start by typing slash make. You type /make in the Minecraft chat and you get a link to the Scratch — oh, check this out. So you can do /make. Right. Then you get a link: click here to open Scratch and make actions. Ooh, okay. Do this, do this. And you think we can go in here? So basically we're doing the same thing as before: we're creating a plugin, or rather we're extending Minecraft, but not with code — by adding Scratch blocks, using this Vert.x event bus from client to server. So here we have a normal Scratch GUI. It can do stuff like events and control blocks; everything other than the "more blocks" section is the standard Scratch environment that some children are already familiar with. Right. And we have the extra Minecraft blocks under "more blocks". So what do we have here? Maybe we can make this a little bigger — is there any way to — the control blocks — we'll load this up a little bit, let's see. So among the blocks that we have here, there's one that says title. How does that work? If we take this one here, put it in here, and use a Scratch trigger — yeah, so "when space key is pressed". This is the space key in Scratch, not in Minecraft. Yeah, just standard Scratch. So if I press space here — you have to be fast — I get a welcome over there. You get, basically, my first program again. Right, we're back to 20 years ago. 20 years ago. Because this is something that kids can get excited about. Right, right, and this is really easy. You don't need to set up servers and whatnot. You just connect to this Minecraft server, type /make, and then use these blocks. Right, this is nice. And then you're extending Minecraft. And you're extending Minecraft. Do you think we can do more here? Yeah, sure, go ahead. What should we do?
I think you should do something with our favorite donkey here. I think that's a good idea. Let's try to do something creative, more like storytelling, not like slaying zombies and counting points. That's stupid, right? Yeah, that's stupid. Let's do something a bit nicer with — for example, we have a — what am I holding here? You're holding a carrot. Carrot. Tell you what: we'll make a custom command. Instead of pressing space in Scratch, we'll build a custom command, /demo, and we have this entity-speak block here. Why don't I try to make Benny say something? Yeah — "Hallo DevConf". Over here. Oh, can I do that — demo? Let's do demo2. So, if we look at Benny and we do /demo2 — "HALLO DEVCONF!" Whoa! Okay. And because we have all of Scratch available here, we could do more. We could do more. Crazy, right? We could add some logic to this. Okay. Let's do some if-else. What do we have in here? Or maybe do it even better. If-then, or if-then-else? If-then-else. Let's go crazy; this is serious logic. So we could make a story — a fun little story with the carrot. And feed me — feed the donkey, kind of thing. That'd be fun. That'd be fun. So why don't we say: if the item that's held — yeah — is equal to — an apple? No, do a carrot; donkeys love carrots, donkeys eat carrots. Okay, so we'll do carrot. And let's change the "Hallo DevConf" to say — isn't the donkey polite? Donkeys are also polite. Donkeys are polite — "please feed me a carrot". No, this is for when you have a carrot. Oh, right. I'm sure kids would have gotten this right, but I'm totally confused. If-then-else: we say "please feed me a carrot" if we're not holding a carrot, and if we are holding a carrot, he says thanks for that — like, yummy. So Benny will say — and we can have different characters too, so we can have a dialogue here: Benny and his friend can exchange comments or something like that. Yeah. We could have, "thanks, yummy, thanks for the carrot," because he's polite like that. Right. So with this, if we're in here and we say /demo2: "please feed me a carrot" — that's because we're not holding the carrot. And if we use this Minecraft inventory to take the carrot into our hand, and we do /demo2 now: "yummy, thanks for the carrot." Nice. Cool. All right. So now our kids could build a whole adventure inside Minecraft, with a goal at the end. This is just the beginning. There could be some puzzles to solve. Puzzles, riddles, real stories, something where you have to find something. And you can use the variables in Scratch — the data blocks are basically variables — where you could say, if you've done this before, set something to true, and only after this, do that: first complete the quest, something like that. Right. The sky is the limit. Yeah. We probably can't think of all the things the kids can. No, we're too old. Right. So, with that, we're almost up on time and have basically covered what we wanted to show you. If you have any questions, we're happy to answer them. And most importantly, try this out. You can join our server here: www.learn.study. And you wanted to show the other — and we have some more links. We even have this chat server; our kids told us that gamers are all on Discord.
We didn't know much about it, but — yeah, my son says, hey, Dad, if you want to be cool, you have to be on Discord. That's good. So, there are Discord channels over here that you can join, say hi. Yeah. And, again, the sources are linked down here. We also have a link down here for more cool stuff. This has nothing much to do with Minecraft, but we're just using the opportunity of having you all here to show you that the last link on www.learn.study goes to this little page of educational YouTube channels and hardware things — robots and micro:bits and all of these things that some of you, I'm guessing, are familiar with. Over the last two years or so, I think we've collected some recommended links there that are great for children. Yeah, and if you can think of things we forgot on this page, you can send pull requests for it. Because, of course, this page is just a README, Markdown on GitHub. Right, it's linked down here; you can just click here and edit it. Send us pull requests for other things you might be aware of that are useful. Yeah, that's it. That's pretty much it. Yeah. Are there any questions? Thanks for your interest. Just speak up and we'll repeat the question.
So — I did this on the Scratch page; we had it on the save. The actual demo was a little bit different; there was a save link. So, two questions, and I think this is number one: clearly you can save it and come back to it later, but are these actions active for every user on the server at the same time? That's a great question. Yeah, that's a great question. So the question is: are these commands active for everybody on the server? Right. And that's actually very much what we were working on, like, half an hour before the presentation. Currently, this project lives in the browser, and you can save it locally — the save basically opens a file chooser and puts a zip file on your local machine. And we've just started discussing how we're actually going to save the project you have here, which is a JSON file in Scratch, and push it to the server. So, save the scripts together with the worlds, because that's where they belong; this goes very much together with what you've created in Minecraft. Right: you have this donkey and you have your adventure set up, and it links, technically, to things that are in the world. So we're going to save these projects on the server. We're not quite there — we haven't finished that yet, but we're definitely going to do it. So help us, if you like. And what you do is active for every user. So, right now, it's for every user, as long as you have this page open. If you were to join the server right now, this would work for you. But it's running in the browser, so if we close this tab, it won't be active for you anymore. Yeah. And something we want to do as a follow-up, when we have this pushed to the server, when we have it saved on the server basically — you can speak about this — so, in Scratch 3, the whole logic has changed, and there's a separate component that can parse the Scratch file, which is saved as a JSON file. So we could run the whole parsing and logic on the server, and then we can save it easily. So the cunning plan is to move this to Scratch 3, which has this JavaScript-based, very modular architecture, where they have — what do they call it?
It's based on Blockly. Blockly is the front end? Blockly, from Google — that's how it looks. That will replace this part? Yeah, that will replace this part; that's how it looks. Then there's Scratch VM, which is basically the whole parsing and running of the scripts. The Scratch VM is the non-GUI part, which we would then also run in Node on the server. Right. And then there's a separate GUI part, which we'll change a bit, because there's this preview window here that doesn't make that much sense for us — so that goes, and we have more space. So the idea would be to push the project to the server, run it with the Scratch VM in some Node container or something, and have it persistent there. Very cool. Thank you. Thank you so much. Cool, cool. Just go for it, try it out. Yeah. Thanks for your interest. Can you go back to the last page? Yeah. So, this one here. And this is easy to find — yeah, it's right here. A couple of things: it's easy to find, so don't write down the URL of this; just go to the Minecraft page, learn.study, and down here you get to this. Maybe we should point to it from our main page; that would make it easier to find. All right, any other questions? Otherwise — I know we're between you and the break. We're around, you can talk to us afterwards as well. We'd love to see many of you on our Discord server and chat with you. See you. Thanks. All right, thanks. Thanks again.
Time to start. Okay. Hi, everybody. Hi, Tom. How is everyone doing today? So, my name is Nalin Dahyabhai; I work in the containers group at Red Hat, and I'm here today to talk to you about how we build container images. Or, more specifically, ways that you could build a container image that you probably shouldn't — but we're going to do it anyway as a learning exercise. Then I'm going to list some reasons why we shouldn't do it that way, and then we're going to look at some other ways to do it that are way better. So let me back up just a second and recap. We've been talking about container runtimes — those of you who saw Scott McCarty's talk, or Dan Walsh's talk, or Sally's talk earlier today. Lots of talk about container runtimes and engines, and a little bit on orchestration. I missed some of those talks, but we have been covering it here today. One of the things we didn't go into in depth is what goes into an image. The image is what you use as a template for launching a container. It is the initial state of the root filesystem, plus some additional information about how to run the stuff that you have inside of that filesystem. And it's a useful exercise to know how it works. The purpose of this particular talk is to demystify the build process, and we're going to do it by walking through it all directly. By the end — well, actually within the next half hour or so — you're probably going to walk out of here thinking, well, I know how to build that, and you might just go ahead and do it anyway. And I'm not going to stop you. So for our example case, we're going to use — well — probably find, inside of a chroot. We're going to start with that, build a container image out of it, and run it. Then we're going to refine that a little bit.
And, over three or four steps, we're going to refine it into something more full-featured and fleshed out, something that looks more like what you'd expect from a container image that actually does things. It starts with a simple case and I'll work my way up. So, what do you need to supply when you're building a container image? Three things, really. You need to supply the root file system: the things that the processes running inside of the container are going to see. You need to supply a configuration that tells the container engine how to run it: things like the environment variables to set and the actual command to invoke when you want to start the container. And the third thing is the manifest. That's a detail of the image format and how things are moved around. I'm going to go through them in that particular order because each one of them successively contains information about the one you created before it. So let's start with... oops, I forgot about this one. One thing I'm not going to do is actually do the work of pushing the image to a registry myself, because we already have Skopeo for that and it's a great tool. So the output of our particular build process will just be some files on the local disk that we're going to transfer over to the registry. Skopeo actually does quite a few other things for us, but that's what I'm going to be using it for here. So let's get to it. Let's create a layer. This is going to take a little while. I'm going to use a directory named root. Oops, that's a subsequent version. Okay, let's make a directory named root; we're going to call that our root file system. Create a directory inside of it named bin. Not really required, but let's do that. And I want to run find inside of my container, so I'll just grab a copy of find that I happen to have on my system already and copy it in there. And let's see if we can run it under chroot. That's not going to work. Right: my copy of find is a dynamically linked object, so we need the runtime linker to be present in the chrooted environment in order to run it. So what was the name of the file it needed? There it is. I need to make the directory first. Now let's try running it again. Right, it needs shared libraries. Well, what shared libraries does it need? root/bin/find needs several shared libraries, and at this point I'm going to give up on this particular exercise, because that's starting to look like a lot of work. So let's find a statically linked binary to use instead. Let's search /usr/bin first for statically linked binaries. Well, those aren't particularly promising examples for telling me what's going on inside of a container. Let's check /usr/sbin. Oh, busybox. Of course it was going to be busybox. Those of you who predicted it was going to be busybox, pat yourselves on the back. So, start over again. Now, the fact that I'm using a directory named bin to hold it is merely personal preference and isn't strictly required. Okay. We can run that inside of a chroot. So now let's turn it into a file system layer. File system layers are extracted relative to the current directory, so let's just create one: tar cvf ../layer.tar . with the current directory as the argument. Now we have a layer.
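In shell terms, the layer step just walked through comes down to something like this. It's a minimal sketch with paths and names of my own choosing; the busybox location and the layer.tar name aren't necessarily the ones from the demo:

$ mkdir -p root/bin                          # this directory tree is the image's root filesystem
$ cp /usr/sbin/busybox root/bin/busybox      # one statically linked binary is all we need
$ chroot root /bin/busybox find /            # sanity check; chroot itself needs root or a user namespace
$ tar -C root -cvf layer.tar .               # layers get extracted relative to ".", so tar up the contents
$ sha256sum layer.tar                        # we'll need this digest for the config and manifest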
So, the next thing we need to do is create, yep, a configuration. This is an example configuration. Oh, oops. Hopefully it shows up well. I'm actually omitting some of the fields that can be left empty by default, just so that it fits on the screen here; I'm going to edit a proper one in a minute. So, as you can see, there's some bookkeeping information, a created field that doesn't really tend to get used by anything, and information about what architecture and OS you need in order to run the container, because, again, it's not a fully virtualized environment, so it doesn't carry its own kernel. Then the basic stuff you would expect: environment variables, the username, well, the user ID, to run the command as, and which command to actually run. And the history is part of how we built it. The rootfs and the history match the set of layers we're using, and since for our example we're only using one file system layer, each of these things, which can be an array, is really just one entry. So let's go to the command line. I actually already created a template which contains most of this stuff in empty form, so let's just edit that quickly. Layers are represented in the configuration file by the SHA-256 sum of the layer contents, so we need that information. Okay, and this goes here. I already filled in a history entry, because, well, why not. Leave this alone. It added a volume; we don't actually need a volume, so we'll just delete that. This is the command we're going to use. It's a little more elaborate than just busybox find, because I wanted it to show us all the information about the files that it finds. This, well, we're not going to use that; we don't need it. Okay. Now we have a working config. You're just going to have to take my word on it until we actually try it for real. And like I said, it's really not unlike filling out a survey. The next thing we need is a manifest. The manifest needs to list both the configuration and every layer blob, again using their digests, and this time it also needs to include their sizes. This is actually a very simple one, which would work quite correctly for our image except for the fact that these values are all made up, the SHA sums are truncated, and the sizes are completely wrong. So let's create one of our own. This actually all fits on the screen at once, so let's do that. Let's grab the configuration: it's 1071 bytes, and this is the digest. Paste the digest. That's that. We again need the SHA sum of our layer, which goes here. This is the media type of the data, and this one just says that it's not a compressed tarball; we could do compression, but that just makes things a little more complicated for us. Okay. I need this again. I probably could have remembered that. Probably not, but it's the size I needed. There we go, we'll paste that here. That's everything I need in order to copy this up to a registry.
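Written out, the config, manifest, and copy steps look roughly like this. It's a hand-written sketch: the digests and sizes are placeholders where the demo used real sha256sum output, the command line is my own invention, and the image name and local registry address are made up:

$ cat > config.json <<'EOF'
{
  "architecture": "amd64",
  "os": "linux",
  "config": {
    "Env": ["PATH=/bin"],
    "User": "0",
    "Cmd": ["/bin/busybox", "find", "/", "-ls"]
  },
  "rootfs": {
    "type": "layers",
    "diff_ids": ["sha256:<sha256sum of layer.tar>"]
  },
  "history": [{"created_by": "hand-rolled busybox layer"}]
}
EOF
$ cat > manifest.json <<'EOF'
{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:<sha256sum of config.json>",
    "size": 1071
  },
  "layers": [{
    "mediaType": "application/vnd.oci.image.layer.v1.tar",
    "digest": "sha256:<sha256sum of layer.tar>",
    "size": <size of layer.tar in bytes>
  }]
}
EOF
$ # blobs in the directory layout get named by their digests, so link them up,
$ # then let Skopeo push the result (the local registry here has no TLS set up):
$ ln -s layer.tar   $(sha256sum layer.tar   | cut -d' ' -f1)
$ ln -s config.json $(sha256sum config.json | cut -d' ' -f1)
$ skopeo copy --dest-tls-verify=false dir:. docker://localhost:5000/library/demo:latest
$ # or skip the registry and load it straight into the local Docker daemon:
$ skopeo copy dir:. docker-daemon:demo:latest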
So one of the things we're going to do is tell Skopeo to copy it, using the current directory as a source. What Skopeo does when you're using a directory as a source is it assumes that everything except the manifest is named using the SHA sum of its contents, so I need to create a couple of symbolic links so that Skopeo will be able to find things. Let's see: skopeo copy. What should I name this image? Anyone? What? Okay, but I might already have an image named busybox in my registry; you wouldn't know. Someone made a suggestion; I didn't see who, but let's go with that, because I know I haven't used it. Anyway, I'm running a registry on my local machine, slash library, slash... Right, Skopeo does not like the fact that I didn't bother setting up SSL. Right. What's the flag, dest-tls-verify equals false? Okay. I copied it up to my registry. Now let's do it this way, because people are familiar with this. Dan, I'm pretty sure you're giving me the stink eye right now, but let's not go there. Add a tag. Do I need to add a tag? This was working yesterday. This is very annoying. Okay, forget the registry, we'll just copy it directly into the daemon; the destination looks like this. Okay, so let's run that. Right, I made a mistake here, because my root file system doesn't contain an /etc/passwd file, so specifying the user to run everything as by name meant that it couldn't be resolved to an ID. So let's change that. I need to update these things in the manifest: the new size, and everything else is still correct. Man. Oh, right, I didn't create the symbolic link from the new digest to the name. Ah, success. I'm now running a copy of busybox in a container, from an image that I just built. Oh, I see, there's a problem here: this file should be owned by root, but it belongs to me, because the copy that I used belongs to me. So let me digress for a second. Those of you who saw Sally and Urvashi's talk earlier today: they discussed user namespaces, and that's how I'm going to get around this one. Actually, yeah. Oh, no, I own the files. When you tar something up as yourself, you own the files, usually. Well, it's usually content that you own, and I want the contents of the layer to look like they belong to root. Now, one of the things you can do with unshare, which is a useful piece of container technology, is create a user namespace. It's essentially launching a new process inside of a new process tree, and for everything inside of that process tree there's a mapping from the UIDs and GIDs it sees to a set of UIDs and GIDs outside of the namespace. One of the cool things about user namespaces is that you don't need much in the way of privileges to create one: an unprivileged user can create a new user namespace and map their own UID to UID 0 inside of it. They'll still have limited privileges on the system at large, but everything inside will think that that user is root. So we're going to do that. In fact, I'm going to do it the cheapest way possible: unshare -Ur. So let's take a look at what we've got here. /proc/self/uid_map shows us what's going on: UID 0 in the namespace is the beginning of a range being mapped to a range starting at my UID, 2510, outside of the namespace, and the range only has one entry in it, which is fine, because we only have one ID that we want to map. So if I look at my raw contents, they look like they're owned by root. They're still actually owned by me, but the user namespace causes everything inside the namespace to see things that belong to me as if they belong to root. Things that are not mapped, because, again, I only mapped my own ID, look a little weird in there: unmapped values show up as specific magic values, set via sysctls configured in the kernel at runtime. But we'll get to that in a minute.
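To make the single-mapping case concrete, the cheap version looks roughly like this, with illustrative output; your UID will differ, and the paths come from my earlier sketch, not necessarily the demo:

$ id -u
2510
$ unshare -U -r                  # new user namespace; map my own UID/GID to 0/0 inside it
# id -u
0
# cat /proc/self/uid_map         # inside-UID 0 maps to outside-UID 2510, range length 1
         0       2510          1
# ls -ln root/bin/busybox        # contents I own now show up as UID 0 in here
-rwxr-xr-x. 1 0 0 ... root/bin/busybox
# tar -C root -cvf layer.tar .   # re-tar inside the namespace so the layer records root ownership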
So I need to recreate my layer with the new contents. Now I need to update things; this is going to take a while. The digest of my layer is different now, which means the copy of the digest I keep in the config needs to be updated, and the copy that I keep in the manifest also needs to be updated. How big is the file? The file is the same size. Okay. Then... right. But the digest of the configuration just changed. Live demos, everybody; I know you love them. Okay. Oh, I forgot to name the files. Am I in the right shell? Yeah, that's where my extra work went: I forgot to exit my namespace. Okay, there we go. Right, the blob is... I didn't create the symbolic links to my updated layer. So we'll copy this. What did I miss? The one that's missing is the layer, which is no longer available. What did I name this thing? Okay. Right. Okay. Now if I run it... ah, there we go. Everything in the image, and in the container that's based on it, now appears to be owned by UID 0. Fun. Yes. Yes, you can see, you're all laughing. Okay. Now, one of the fun things about a user namespace is that when I'm UID 0, I don't have to settle for busybox; I can go ahead and use the whole distribution's package manager. So let's look at that. Let's try again. I'm going to use this as my shortcut cheat sheet. Actually, yeah, that'll be good enough. There we go. Oh, whatever. Good. Yes, latest version. Yes, I'm using --nogpgcheck; I'm here to talk about bad ways to do things, so there's more of that. Oh, the Wi-Fi is going much better today than it was yesterday. You're right. So, I'm installing a simple Python script that depends on the Python interpreter, which depends on libc, and wow, this is taking longer than I expected. I just got the ten-minute warning. So... actually, we're doing fine. Tick, tick, tick, tick. Come on. Right. Okay. So, this is where I need to fill time. Anybody here from Indiana? Anyone in the audience? No? Okay. I've just been obsessed with that since this morning. So, the annoying thing about installing things into the chroot is that we do need to pull down a fresh copy of metadata, which gets stored in the chroot environment. We're not going to bother cleaning that out, because I'm going to use it a bit later, because I fully expect this to fail at some point. Okay. And that's probably going to break something, but that's not what I expect to break. Right? Language packs? Got it. That's a fairly large piece of it. Okay. That's an error. That's more errors. Ooh, so many errors. Transaction failed. That's a problem. So, I already know, but does anyone else know why the transaction failed? Well, that's one good reason. The other one, the main problem, is that when I created the namespace, the only UID and GID that I mapped, the only ones known inside it, the only ones allowed to own files that you create, are zero, and not every file in the distribution is owned by root. So a lot of this is just that the right user couldn't be given ownership of anything, in particular this one. Ooh, yeah. So let's try that again. Well, we can tell unshare not to map things, and then we can use newuidmap and newgidmap, tools that were introduced in newer versions of shadow-utils, to give us access to ranges we didn't already have. Let me back up a second. Don't give me that look. Okay. Normally, when you run unshare as an unprivileged user, you're only allowed to map your own ID. That way, you can't map somebody else's ID into your namespace and start fooling around with their stuff, because you'd be UID 0 in there and could start doing chowns and deleting things; that isn't allowed because it's generally unsafe. And if you could map the real root ID into your namespace, all kinds of crazy stuff could happen, so that's also not allowed.
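Which is exactly why the transaction above fell over: with only one ID mapped, handing a file to any other owner fails. A tiny sketch, with a made-up path and UID, shows the failure mode:

$ unshare -U -r
# mkdir -p root/var/www
# chown 48:48 root/var/www       # UID/GID 48 (apache) aren't mapped in this namespace...
chown: changing ownership of 'root/var/www': Invalid argument   # ...so the kernel refuses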
Starting with shadow-utils, I want to say 4.2, one of the things that happens by default when you create a user with useradd is that it allocates an entire range of previously unused subordinate UIDs. There's an entry for every user that gets created, and I'm in here somewhere, yep, and there's a matching one in subgid. The whole point of this is that it notionally sets aside a range of UIDs that are only going to be available for use by me and that are authorized for use by me. It also ships a setuid tool that I can use to set up a UID map in a user namespace, which lets me map in at least the things that are in this range. So we're going to use unshare again, but we're not going to tell it to initialize a UID map; there's nothing there. Then we'll go to a different shell and use newuidmap and newgidmap to set the mappings for my new shell. 22520: start by mapping my ID to root, okay, a range of one. Then map, starting at ID one in the container, to 2200544, the whole range. Okay. Now this should be... aha, I've set up mappings. Now let's double-check the GID side. Yep, same: newgidmap sets that up. Let's try that again. I really need to cut that down, okay; watch me type a little faster. So what I'm doing here is the exact same thing I was doing earlier, installing an entire root file system to run one program. It's fine. But we're doing it inside of a namespace where the set of available and recognized IDs runs all the way from 0 through 65,536, because the first one is actually me, followed by an entire range. And that should be enough for us to create this image. So far, in practice, I haven't seen images with contents owned by UIDs anywhere near the top of that range, so I could actually cut this down significantly, take that 65K range and slice it up into about 64 pieces, and it would still work; they can be completely unrelated sets. So hopefully this should go a little faster than the last run, if only because I'm scanning fewer repositories. Now, this is not the most optimal way to install things, because there are ways to do smaller installations: I'm still installing recommended package dependencies, which I'd rather not do, and I'm also not going to bother cleaning up the DNF metadata that I just downloaded about the Fedora repositories. That's also going to take up space in the image, but I don't care, because it's local disk and local disk is free. Sort of. That's just a warning, not an error; keep going. Come on. Getting closer. All right, come on, come on. And the transaction succeeded. So let's create a layer out of this, same as we did before. Before I forget, I should get out of that namespace; let me do that. Okay. So the diff ID for the new layer is this. In fact, I'm going to change the command the image invokes, because I just installed a new one. There we go. Copy; let me update the symbolic links. The new configuration... that's the digest of the new layer. The new layer is significantly bigger than it was; that's fine, it's not compressed, and I also didn't bother cleaning up a lot of space. So let's copy this. It'll be much slower to upload, but that's fine. And let's run it. There we go: we ran the command in a container, and everything's fine.
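Strung together, that second attempt looks something like the following. It's a rough sketch in which the subordinate range (100000:65536), the shell's PID, the Fedora release, and the package being installed are all stand-ins for whatever was actually on screen:

$ grep $(id -un) /etc/subuid /etc/subgid       # the ranges useradd reserved for me, e.g. 100000:65536
$ unshare -U bash                              # new user namespace, deliberately left unmapped
$ echo $$                                      # note this shell's PID; the maps get written from outside
$ # ...from a second shell, as the same user, via the setuid helpers from shadow-utils:
$ newuidmap <pid> 0 $(id -u) 1  1 100000 65536   # 0 -> me, then 1..65536 -> my subordinate UIDs
$ newgidmap <pid> 0 $(id -g) 1  1 100000 65536
$ # ...back in the first shell, which now looks like root with a full range of IDs:
$ dnf -y --nogpgcheck --releasever=28 --installroot=$PWD/root install python3
$ tar -C root -cvf layer.tar .                 # then refresh the digests and sizes as before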
If you're thinking this is pretty easy: yeah, it is. I did purposely skip over some of the things that make it complicated, though, and I'm going to go over what some of those are. First things first. While I was installing packages, the post scripts being run by RPM underneath DNF all executed as UID 0 in the namespace. Now, if you remember, I set myself up as UID 0 in the namespace, so those commands, if they had broken out of the chroot that DNF set up, would have had free rein over the system as my UID. That's generally a bad idea for me, because I like having my stuff not messed around with by other people; if I were building as root, it would be an even bigger problem. Normally, the container image build tools that you see out there in the world will use a proper container: they'll do seccomp filtering, they'll set up control groups to limit the resources that can be consumed by this one process, to avoid messing with the system. So that was sort of a bad idea, what I just did back there. But it kind of worked. You also didn't see me have to deal with multiple image formats. While I was using the OCI format for the configuration blob and the manifest, there are actually three different formats, two of which are very similar to each other, and each of them unfortunately includes information that the others don't. So if you need a specific field that is specific to your particular format, you need to be able to write that exact format of file. Skopeo, in addition to being awesome at copying things around, will also handle format conversions for you automatically, and if the registry requires that you compress the layers, it'll do that for you too. It's pretty sweet. And, oh yeah, way back at the beginning of the talk I said I wasn't going to be layering file systems, and I didn't, because it's a much harder job to build up root file systems incrementally and generate the differences between them in the way you need for layers. So what I've essentially done here is the equivalent of a squashed image, and this approach isn't going to be able to do anything more complicated than that. So how do you overcome these limitations? Oh yeah, and I forgot repeatability. While shell history is great for messing around on your system and doing something ad hoc, it doesn't help you rebuild the image later unless you remember the exact sequence, and that kind of shell script is very hard to document. The format that people tend to use for expressing how to build an image is the Dockerfile, so it's still highly desirable to be able to support Dockerfiles if you're going to be creating a build tool.
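For contrast with doing it by hand, here's roughly what the same squashed busybox image looks like with one of the build tools surveyed next (Buildah). This is a sketch of my own, not something shown in the talk, and the registry name is a placeholder:

$ ctr=$(buildah from scratch)                    # a working container based on an empty image
$ mnt=$(buildah mount $ctr)                      # rootless users would wrap this in 'buildah unshare'
$ mkdir -p $mnt/bin && cp /usr/sbin/busybox $mnt/bin/busybox
$ buildah umount $ctr
$ buildah config --cmd '/bin/busybox find / -ls' $ctr
$ buildah commit $ctr localhost:5000/library/demo:latest
$ buildah push --tls-verify=false localhost:5000/library/demo:latest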
Am I going to do it on time? Oh, a few minutes more. Right, okay, time for a speed run: a quick survey of the tools that are out there for building container images, the ones known to me at least; there are probably a lot more that I don't know about. Microsoft's Azure cloud platform includes a container registry, which currently includes a build-containers feature that's in preview status; it handles Dockerfiles. Buildah is the one that I work on; I happen to like it. Buildah will use runc if it has to handle RUN instructions in a Dockerfile. BuildKit is the newer bit that was spun out of Docker. It's actually pretty cool: it uses a lower-level representation, and it handles Dockerfiles; well, one of its examples is a front end that takes Dockerfiles and reworks them into the lower-level syntax that BuildKit actually runs. docker build is the OG builder; everyone's pretty familiar with how it works. Google Cloud Build is a hosted service. I wasn't able to get the whole thing running on my local machine, but you can run it locally, which I think is a very positive thing. img is a pretty cool project which handles a lot of this: it fills in the blanks in BuildKit, also handles Dockerfiles, and also runs unprivileged. Kaniko is a really interesting one that is built to run inside of a container. It makes certain assumptions because it is running inside of a container, and it's hard to start a container inside of a container, so one of the things it does is, when you do a FROM an image, it actually explodes that image into the container it's running in. It doesn't have to launch a container to run a command, because it already expects to be running in one, so it just executes commands directly, which is pretty cool. And these are some of the references I went through; here's where you can find all of this stuff. Now, sorry about that, I didn't really time this well. Any questions? Any comments or opinions disguised as questions? Okay. One last thing: I want you all to check under your seats and make sure you didn't drop anything. Okay. Thanks a lot, everyone. Thank you. Thank you.

Just a couple of words before you guys leave. Tomorrow at 9:30 there's a keynote; please plan to attend that. The keynote speaker is Chris Wright, who is the CTO of Red Hat. And if you haven't got the party tickets, I think there are some available for tonight's party; stop by the front. Yeah, thank you.

A practical guide to KubeVirt: hopefully a guide for the rest of us. Just the current state of the world, or a statement on where it is. Containers are increasingly becoming the de facto standard for how we package applications, and Kubernetes and OpenShift are becoming kind of the de facto way that we run them. But that's for new applications. When you start talking about virtual machines, I've heard some people say they're going away. Well, no, they're not. For business reasons, it's hard to redo some applications, and for technical reasons it may be impossible: for instance, if you need Windows in a machine, or cases like the one covered in yesterday's talk, that would not be something you would put directly in a container. Now, in today's world we traditionally have separate management infrastructures for these two stacks, which unfortunately means underutilizing some hardware, because you just can't mix. So that's where KubeVirt comes in. We're looking at a technology that enables unifying these two infrastructures, so that you can build, modify, and deploy virtualized applications alongside your containers; or, in other words, put virtual machines right into your Kubernetes project. We do this by using a custom resource definition that we drop into existing Kubernetes clusters. Now, this is really important to say: one of our requirements for ourselves is that we do not require modification of the Kubernetes cluster before we deploy. In other words, we can't change container runtimes, we can't add system accounts, or what have you; it all has to be done as part of our deployment or it can't be done. And by doing this, we extend the Kubernetes infrastructure in as Kubernetes-native a way as possible. And so the virtual machines are actually inside a container. Some solutions out there, such as Kata Containers, I believe, actually modify the container runtime.
That's something we're explicitly trying not to do, because we don't want to be modifying anything ahead of time. Now, in the future, that might be a restriction that's lifted, because dynamic container runtimes are something that may come to Kubernetes. But for now, that's kind of a hard and fast rule and one of the reasons we're doing it the way we are. And by leveraging the existing ecosystems, teams are allowed to use the proper tool for their solution: whether that's a virtual machine or a container, they can put it right into their CI/CD pipelines. So, for the way we implement this, we are using a custom resource definition. I've got an example of one over on the right; it's basically just a YAML file, for those who haven't seen this before. For those who have seen Kubernetes constructs before, this should look pretty familiar. In this case, the only thing special about it is that the kind is VirtualMachineInstance; virtual machines here have their own kind. And this gives us the ability to express all the common virtual machine parameters, such as memory, CPU, and the like. Because we're implementing this as a custom resource definition, we also inherit RBAC rules, so users are only allowed to modify things in the namespaces that they're assigned to, and what have you. So, here's a little bit of the workflow. This is a busy slide, so let me take a minute to explain it. When the user posts this custom resource to the system, that's a virtual machine instance, and it's actually just a record in etcd in the cluster. We've got a controller, virt-controller, which is monitoring for changes to these custom resources, to virtual machine instances in this case. And when it sees one, it schedules a pod. And that's all it does at this point: we just schedule a pod. You can see that's the third step here. Now, virt-controller is a cluster-level resource, so its only job is to schedule pods. Then, on each of the individual nodes, virt-handler is running; that's another controller we have, and it is looking for pods that have a special label on them, so that it knows it owns that pod, and it will then arrange to start the virtual machine inside of it. Now, there's a little bit of hand-waving there, of course, because I said we start a virtual machine in a container that's already running. What we're actually doing is we've got a daemon called virt-launcher inside this pod that does that work; just some full disclosure there. So, as far as scheduling: as I said, virt-controller schedules a pod, which means we're literally using a pod, and pod rules, for deciding where we end up placing virtual machines. That means anti-affinity, affinity, labels, selectors, all of the constraints that you can put on a Kubernetes pod still work, and you can even use a custom scheduler if you need to. Now, for the applications within the virtual machines, because they are leveraging a pod, all existing Kubernetes constructs such as services and routes still work, and we'll get a little more into what those are later. We actually use labels on the service itself to designate which pod the service belongs to, where to route the packets, basically.
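To give a rough idea of what such a resource looks like: this is a hand-written sketch, not the YAML from the slide; the API version, the names, and the claim it references are assumptions in the general shape KubeVirt uses:

$ cat <<'EOF' | kubectl apply -f -
apiVersion: kubevirt.io/v1alpha2        # version is a guess for this timeframe
kind: VirtualMachineInstance
metadata:
  name: demo-vmi
  labels:
    devconf.us: demo                    # an arbitrary label a Service selector can match on
spec:
  domain:
    resources:
      requests:
        memory: 2G                      # the usual VM parameters: memory, CPU, and so on
    devices:
      disks:
      - name: rootdisk
        disk:
          bus: virtio
  volumes:
  - name: rootdisk
    persistentVolumeClaim:
      claimName: devconf-pvc            # the VM's disk, backed by a claim
EOF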
So, virtual machines live in pods. Now, that's transparent to higher-level management systems. But, you know, technically, that's no worse than it was before we did this project. Now, virtual machines leverage pods. When we have a new virtual machine record, any labels that are on it are translated over to the pod; we're going to need that for scheduling, of course, and to match things like services. CPU and memory resources are matched from what the virtual machine's definition has over to the pod, so that we're not over-allocating how much we're requesting. And, of course, affinity and anti-affinity; I talked about that. As far as the storage goes, that's where the real rubber meets the road. We're using persistent volumes for this at the production level. What that basically means is that any existing storage backend that you already have for your Kubernetes cluster, we can take advantage of, and there's a one-to-one mapping between the persistent volume and your virtual machine disk. By doing this, of course, we can benefit from all the existing ecosystems that are currently out there, and there are a lot. So, for the disks: this is actually not a KubeVirt project, it's a sister project, the containerized data importer. I'm going to take advantage of it and use it in the demo in a few minutes, because it's just that awesome. This is something that allows you to take an existing virtual machine image, a raw disk or something, and import it directly into your Kubernetes cluster on the fly, which is something we're obviously going to need if we're going to pull this off, because that's been complicated for us in the past. This is also, like KubeVirt, a declarative Kubernetes utility, so you have controllers and operators monitoring for resources and taking action when those resources show up. And the two use cases here are to either designate an HTTP URL that you would use to download an image from, or, a second use case that I won't be showing, to use a read-only namespace within Kubernetes to copy golden images to your user's namespace, so that users don't modify the original, and all the goodness that comes with that. So, as far as the network goes, we're using the pod network for the virtual machine. That's both good and bad. The good, of course, is that you're able to communicate with any existing container resource as it currently exists. We can also expose services from our virtual machine, using services and routes as I've mentioned, to expose specific ports on your virtual machine to the outside world. We're looking at alternative networking options, such as multiple networks or different variants, but right now what we're using is just a tap device into the virtual machine. The unfortunate part is that we lose the ability to do live migration. In the beginning we actually had libvirt outside of our pod, at the cluster level, or rather one libvirt per node, and that allowed us to do migrations of a virtual machine between different nodes in your Kubernetes cluster. The trouble with that was a bit of a rabbit hole, but we had some issues with PID namespaces and the like where we were violating assumptions, and we just really couldn't do it; it wasn't a good model. So instead we're doing one libvirt per pod: libvirt actually lives inside of the pod that we're deploying our virtual machine in. What that unfortunately means is that libvirt has no network access to the cluster or to other nodes.
So, we lose the ability to do live migration for now; once we implement other networking options, we can reintroduce it. So, looking at the virtual machine client tool: this is virtctl. One of the things that I've sort of glossed over to this point is virtual machine instances versus virtual machines. These are two different kinds of records. A virtual machine is kind of a static template for a virtual machine instance. The point being: in the Kubernetes world, if you start a pod or stop a pod, you're basically creating or deleting a resource, and that's what our virtual machine instance is; it's an analogy to that. But we recognize that that doesn't really translate well to the virt world of RHV, oVirt, and the like; for people coming into this ecosystem, or for the tools we're trying to translate to this system, that doesn't really work well. So we created the virtual machine object, and that's what I mean when I talk about starting and stopping: you issue a virtctl start command on a virtual machine and it kicks off an instance. It also allows us to connect to the console, or use VNC, to interface with your virtual machine and get a snapshot of what's going on, because you're obviously going to need that. Now, there are two ways to run the virt client: either as a standalone command, which is what I'll be using, or as a kubectl plugin, run straight off of kubectl.
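The client-side commands in question look roughly like this; the virtual machine and instance names are placeholders of mine, not the ones from the demo:

$ virtctl start demo-vm           # kick off an instance from a VirtualMachine definition
$ virtctl console demo-vmi        # attach to the running instance's serial console
$ virtctl vnc demo-vmi            # or open a VNC viewer against it
$ kubectl virt start demo-vm      # the same idea via the kubectl plugin form, if you've set that up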
And now, time for a demo. Real quick, before I do that, I'd like to explain what my system looks like. This slide, other than being an example of something complicated, is what my development environment looks like. Inside the physical machine we're actually running, and I'm sorry Dan, I have to say it, a Docker cluster. I nearly got away with it, but it's on the slide. Inside of the D-word, Docker, we're running a Vagrant instance, and the reason we're doing this is to streamline development so that everybody's machine looks the same and we get consistent builds and the like, but it unfortunately adds a little bit of complexity that I can't get around when I'm showing this as a demo. We'll be using kubectl commands directly from the physical machine; there's a little sleight of hand where we're proxying these calls through the different layers here and down to a node port. But when I start working on the networking, the edge of the light gray box, node01, is where your node ports actually terminate, so I can't reach them from the physical machine. I had to explain that before we get into this. So, if we start this off, I think I can just hit... So, I'm just running a QEMU instance here. Let's see if I can make this bigger. No, I don't want to do that. So, I'm just booting a QEMU instance here: two gigs of RAM and just a network device, a standard CentOS machine. I'm logging in to show what the environment looks like; a standard, run-of-the-mill CentOS machine. All I did was take an image dd'd from /dev/zero, run a QEMU install onto it, and, of course, reskin it with the DevConf logo so that we would have something recognizable. So, killing that off. And I'm going to start a simple HTTP server here using just Python, which I wouldn't really recommend, but it works great for a demo. I'm going to use port 9090, and, of course, when you run this it exposes all the files in this directory. One of those files is disk.img, served over port 9090, which will become important in a minute when we start using the containerized data importer. So, here, this is the containerized data importer. What just happened is that I'm using the git tree, as you can see; there's a little bit of cruft there from the video, so ignore the second argument of the last part of it. All it is is a pointer to my git repo with the containerized data importer, and all I've done to it at this point is run make manifests; it's just a straight git tree you can check out and run directly. And all I did here was deploy these different pieces: a service account, the cluster roles needed to actually perform these actions, and, of course, the controller that is monitoring for persistent volumes, or persistent volume claims, that match it; they're annotated, and I'll show that in a minute. So, I'll show you what the persistent volume claim looks like. We're using annotations, and that's all the containerized data importer needs in order to recognize the persistent volume claims it's supposed to act on. As you can see, I've got port 9090 and disk.img; this contrived IP address actually points back to my bare-metal machine where I ran this demo. And, of course, the key-value pair, whose key is the kubevirt.io storage import endpoint, tells the containerized data importer where to go to fetch this image. The commented-out secret name I obviously don't need, because I'm just using Python's simple HTTP server, so it's going to serve up whatever file it sees. So, I'm going to go ahead and create that. That's created the persistent volume claim already, and we'll look at the pods real quick to show that that's the case. Now, there ends up being a little bit of lag here, and that's an unfortunate side effect of our current implementation. As you can see, it's now running. We're running the upload for the containerized data importer directly through the Kube API server, and in this case that's a 10-gigabyte image: we're moving 10 gigabytes through the Kube API server, and that's causing lag. I'm sorry about that; it is what it is. In the future, we're going to re-implement that as its own service endpoint, using different solutions to authenticate, so that we don't have that particular issue. So, checking the logs here, you see the import has begun. And so, let's look at the virtual machine instance itself; this is what's actually going to use the persistent volume claim we're creating right now. This is basically the bare minimum that I would really want to define in the first place: two gigs of RAM, and a persistent volume claim, which in this case maps back to devconf-pvc, the persistent volume claim we're creating with the containerized data importer as we go. So, highlighting some of that here; of course, two gigs of RAM. Sorry, I shouldn't pause on that point; I'm basically repeating what it's doing. So, quit out of there, and then we'll check on the persistent volume claim again to see if it's up yet. Still not.
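The annotated claim being described looks roughly like the following. This is a sketch of mine rather than the file on screen; the exact annotation key depends on the CDI version, and the endpoint address is a placeholder:

$ cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: devconf-pvc
  annotations:
    # tells the CDI controller to import this HTTP-served disk image into the claim
    cdi.kubevirt.io/storage.import.endpoint: "http://<my-laptop-ip>:9090/disk.img"
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
EOF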
But for now, let's look at the services. We don't need to wait for the persistent volume claim itself to be instantiated in order to instantiate the service. So, this is what it looks like at this point. We're basically just going to use port 22 as our service, SSH, because it's already running on the CentOS box, and we're going to expose port 30000 as a node port, which means that on the outer edge of the light gray box we'll be using port 30000 for SSH. So, over here in devconf.yaml, I'm highlighting the incorrect thing; what I actually wanted to show is that the label here is devconf.us: demo, which is just an arbitrary label that I put on. And the selector on the service, devconf.us: demo, that key-value pair, is what indicates that this service should match that virtual machine. That's all you have to do. So, we can go ahead and create that, and it's up. I'll show that: we've got a cluster IP, and the port that we're exposing is 30000. And we can check the endpoints real quick. Services always have an endpoint, which is where they map to on the other end, and at this point, of course, the endpoint we're mapping to is none, because we haven't created the virtual machine that maps to it yet. So, let's check again, and it looks like the import has now completed. Let's check the logs real quick to make sure that everything went okay, and it did, right there: import complete, down near the bottom. And we don't need to worry about that warning about the secret, because we didn't use one; we used plain HTTP. And here is the persistent volume itself that we're bound to, and it is mapped to the devconf PVC; we don't need to worry about its name, because that gets looked up automatically. So, let's go ahead and create the virtual machine instance now, and show that we have a new pod for virt-launcher and it is running, it's been up for three seconds, and here's the virtual machine instance it goes with. So, let's check the VNC console here. Log in through VNC. And the screen saver kicked on; I don't know how to get out. Actually, no, we ran out of power on the laptop. We never plugged it in. So, yeah, the power ran out on the laptop. Here is a plug; we never plugged it in. So, at that point we were showing the virtual machine; you just missed it right there. VNC was going to boot up and show that we were using the exact same image we had started with on bare metal, which, well, we've got a charge now, but it would take a minute for the machine to boot at this point. So, the only other thing I wanted to show was the service, and how we mapped from that virtual machine and exposed that service: the endpoint I showed you a moment ago was mapped to the pod for that virtual machine, and because we map the pod's IP to the virtual machine, the service terminates at that endpoint. So we were able to SSH in from the node IP itself; something outside the cluster could then SSH into that box, which, obviously, if you're going to have a cloud-based virtual machine, is a very essential point. From here... yes? Yeah, we're just going to talk about the next steps, of course. One of the things that I glossed over was that I was using local storage for this, because it's a single-instance machine. You can use other backends; however, I chose not to because of the complexity of doing that on a single node. One of the things we'd like to work on in the future, of course, is making that a little easier to do, and multiple networks, as I mentioned, is another thing on our major wish list.
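Putting the demo's service piece down in one place before questions: a sketch with a placeholder node address rather than the exact YAML from the demo:

$ cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: devconf-ssh
spec:
  type: NodePort
  selector:
    devconf.us: demo          # matches the label on the virtual machine instance's pod
  ports:
  - port: 22                  # SSH inside the VM
    targetPort: 22
    nodePort: 30000           # exposed on the node itself
EOF
$ kubectl get endpoints devconf-ssh      # stays empty until the VMI's pod is running
$ ssh -p 30000 centos@<node01-address>   # then something outside the cluster can SSH in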
But from there, I guess it's time to take questions. Here, a question about the containerized data importer; we're bringing a mic over, one second. All right, can you guys hear me? All right, I want to make sure I understood the containerized data importer, is that what it was called? Yes. Okay, it's basically just a utility that will go grab a disk image over HTTP, right? Yes. But it runs as a container, is that why it's called containerized? It is a container, basically running as a run-once container. And, you know, as you saw, it said completed when it finished doing the import. That pod's job was simply to mount the local storage on one end, connect over HTTP on the other, and move the bits. Move the bits into the PV? What's that? Just dumps the bits into the PV? Exactly. Okay. And it does it in the namespace that you want, on purpose: whatever namespace you mentioned in your virtual machine is where the containerized data importer is going to start its pod, and so we're not running into permission issues by doing it that way. And is that the only utility you have today? You don't have anything else that'll pull? I was talking to Steve Gordon here, and I thought there was some way to pull an OCI image that might be embedded with a kernel and all the things that you want. So pull an OCI image that's already got all the bits you want? It's an abuse of OCI, but it also would make it nice in that all the infrastructure would be the same, right? I think what you're talking about in that case is using a different, excuse me, container runtime. No? No, no, no, I'm saying: dump a VM image into an OCI image, stuff it in a registry. Oh, right, pull that in instead of just a random disk image. Or another option would be to pull straight from Cinder; I always wondered why not. So, we actually do have registry disks as a possible option here, so you absolutely could do that. I think that in a production environment the persistent volume claim would probably have more universal appeal, but yeah, we certainly could put it into a registry as well. A container registry, right? We're actually doing that for our dev images right now. We created a Docker registry, sorry, a container registry, and when we instantiate the development environment, we stock it with an Alpine image and a CirrOS image and a couple of others, just so that we have base images for all our testing infrastructure. So yeah, we put those directly into a container registry and use those images inside KubeVirt as well. In that case there was actually one other pod sitting there when I was listing the pods; that's what was doing that. Yeah, one more pod. Any other questions?
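On the registry-disk aside from a moment ago: in the KubeVirt of this era that ends up looking roughly like the volume below. This is a sketch with a made-up image name; the field was called registryDisk around this time and was later renamed containerDisk:

$ cat <<'EOF' | kubectl apply -f -
apiVersion: kubevirt.io/v1alpha2
kind: VirtualMachineInstance
metadata:
  name: registry-backed-vmi
spec:
  domain:
    resources:
      requests:
        memory: 512M
    devices:
      disks:
      - name: rootdisk
        disk:
          bus: virtio
  volumes:
  - name: rootdisk
    registryDisk:                                   # VM disk pulled from a container registry
      image: registry.example.com/vm-images/cirros:latest
EOF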
So this project has been going on for a couple of years. Are there examples of people using this primarily for security, or are they just migrating existing virtual machines to make it easier on their existing workflow? And if the former, can you give an example of where it's better than just... I'm getting a lot of echo; I couldn't hear the question, I'm sorry. So, one of the benefits of running virtual machines on top of Kubernetes is that you get more than just LXC-style namespaces and cgroups for sandboxing between different environments. And I was curious: are there either customers of Red Hat, or Red Hat itself, using it somewhere where they have pen-tested this against plain containers and found it secure enough? Can you talk about that a little bit? So, in terms of what I think the question is: was security one of the things we were trying to address when we set this up? Yes, it is more secure. That wasn't necessarily our stated goal, in terms of something we set out to accomplish, but you're right: there's a lot stronger process isolation when you run services inside of virtual machines like this. However, I would still point out that between the other containers, there's no isolation at that level anyway. So really, if you're looking at untrusted workloads in the virtual machine, yeah, this is great; if you're looking at not trusting anything on the cluster, you're going to need stronger guarantees. Does that answer the question? Yes. So the question was: how is it different running in a virtual machine versus just in a container, and maybe who cares about that? Okay, so in terms of what's the difference in level of security between a virtual machine and a container: the level of process isolation? Sorry, the level of process isolation? Yeah, so that is one of the reasons that you would run a virtual machine: you have stronger process isolation than you do with a container, where you have cgroups, namespaces, SELinux. I mean, those are strong guarantees, but in theory you might be able to do something, I don't know. So yeah, the virtual machine is a stronger guarantee, but that's not what we set out to do when we set this up. Thank you. Anybody else?

If you don't mind, I might try to answer this question as I understand it. I think you're thinking about it in reverse: you're thinking about it from a security perspective, and he's saying it exactly right, don't think of it from a security perspective. But we should tell him what to think about instead. Think about it as a tool to pull a VM into an application, similar to a Kubernetes YAML file; that's phenomenal, right? Because now I've got a database living in a VM, I've got front ends living in containers, and it's a way to scope the entire application with a single application definition, with KubeVirt. On the other side of that, I would say something like Kata Containers is where you're running a container in a VM: it still pulls the VM image, I'm sorry, the container image, it still uses all the container constructs that you care about, so it's a packaging format; in that scenario you're adding extra isolation around a container. I think that's the way to think about it: a security construct, in that now it's an isolated container, but I still get all the packaging-format advantages. KubeVirt gets you the bring-the-old-stuff-that's-in-a-VM advantage, which is essentially the converse. Thank you. First, you owe me about 50 cents. Can you spot me?
So, one of the interesting things I would like to see with KubeVirt is to be able to handle the Kata Containers use case, in the sense that Kubernetes is running along and finds that it needs additional resources, an application figures out that it needs additional resources, so it calls into KubeVirt and launches additional VMs, and Kubernetes could take advantage of those new VMs to launch more containers inside them. Is that being considered? So, the big limitation on what we're setting out to do right now is that we're attempting not to modify the Kubernetes cluster, or not to require it to be modified ahead of time. And in order to run a container as a virtual machine, that requires a different container runtime than the default, and we can't ask an administrator to necessarily do that, because that's going to increase the friction. In the future, if Kubernetes does allow dynamic container runtimes to be injected on the fly, then we can explore using virtual machines directly, or using them in the Kata Containers sort of approach, where the point is process isolation. Does that answer the question? I'm not necessarily looking for the isolation to be great, just being able to launch more at a certain point: an application runs out of resources inside of the existing VMs and it needs to launch more VMs, and usually Kubernetes breaks at that point and has to fall back to OpenStack or some other tool for launching more VMs. I just thought KubeVirt could be a way to actually... Oh, so resource overrun, right; I hadn't thought about that, that's an interesting angle. Three more minutes, maybe less. Is there a way to pass a cloud-config as part of the virtual machine instance object, cloud-init data? Yes. I did not show that, but absolutely, yes: we inject that into the virtual machine, and, you know, cloud-init is basically its own standard, so we're not doing anything special there; we're just injecting that data and making it available, usually through its own volume mount. So just to confirm: KubeVirt has no access into the VM itself, right? It just brings up the VM and you're expected to know how to make use of it? That's a loaded question. No, what I mean is, for example, there's no injection into the cloud-config of, say, an SSH key. Well, you can do cloud-init data; you can do an SSH key that way, absolutely. Right, right, but I mean, that's up to the user. That's up to the user to do what they set out to do; presumably you would know that you wanted to do that. So we're exposing the ability to do it, and yes, you can use it, but you have to know to do it. Right, what I mean is that KubeVirt just brings it up and that's it; there's no management inside of the VM itself. As of August 2018, the answer is yes, that's true. We are looking at other possibilities in terms of a monitoring application that would be available, but of course, how do you do that in a generalized fashion? If you're building up a generalized virtual machine, suddenly you're building your own VMware sort of infrastructure. So yes, we're looking into possibilities for limited cases. Last question, maybe?

Let's get started. I never gave this talk in 35 minutes, so I'll probably run over time, but nobody's speaking after me, so if you get bored, just leave. But thanks for coming, and thanks for staying this long on a Saturday evening at DevConf. Greybeards versus... why would we need this, with containers changing the world? Greybeards: they fight Balrogs and fork Linux distributions. So when I came up with
the name, this whole systemd drama was going on, and Debian got forked over systemd, and I thought: if they fork over systemd, what are they going to do with containers and Kubernetes? Kubernetes is actually systemd done right, in a way. So that's where the title came from. The first time I gave this talk was in Brno, and I actually hadn't shaved for quite a bit, but I can't pull that off here. So, what are we talking about? I'll go a little bit into the history of the role of the operating system and why I think these things matter. Inside Red Hat you can often have this discussion about what the operating system is; a lot of people say, well, RHEL is the infrastructure. I never really agreed with that. From my point of view, infrastructure is network cables and hardware and things like that, and the point of the operating system is actually to run applications. It's the application runtime, because no one runs an operating system because they want infrastructure; you run an operating system to run an application, and you want to abstract the infrastructure so you don't have to deal with it. But it's true, there are two views: there's the hardware-centric view, where you're thinking about how to make the hardware work, how to bring it up, and there's the application-centric view, which is how do I give an application a common runtime. And if you look at the historic role of Linux, specifically in our world of enterprise Linux: it came out of a world where everything was about the hardware. Computers were big and scarce, you called them mainframes, and they came from a big vendor who controlled the hardware, the operating system, and the software you could run on the operating system. You leased the hardware and ran black-box services on it. Not every mainframe worked that way, but the biggest, dominant players in the market worked that way: you didn't really have much control and everything was vertically integrated. It was leased hardware, and if you wanted more performance, you called them, somebody walked in, made funny noises, and then activated more capacity in your mainframe. And then we had Unix and minicomputers, which I will ignore here to make this shorter; someone will call me out on it. With Unix we had a little bit less vertical integration: you still had the operating system and the hardware from the same vendor, usually, and depending on the vendor, more or less control over the ecosystem. Some of the vendors had very tight control over the software you could run on top; some had a very liberal view, and in many cases you would install the GNU environment anyway, because you wanted tab completion and at least a better shell, and open systems was the big deal. So you ended up with controlled hardware, vertical integration of hardware, operating system, and often the toolchains, and you had some choice of third-party software vendors. But then that also was too controlled for many people, and with Linux we broke both of them: we created a model where you now have free choice of hardware from different vendors, you have free choice in the ISV ecosystem, the software you're going to run on top, and you have full transparency and insight into the operating system itself. The operating system today, Linux today, becomes an abstraction layer across different types of hardware, across the different footprints of what we call the hybrid cloud, so bare metal, virtualization, private cloud, public cloud, and it allows you to build a binary once and run it in all these footprints, and it
gives you that transparency. It also gives different players in the market, different ISVs, a common way to run binaries, to build binaries, an implemented standard. It also gives you the same interface across different hardware architectures: if you're doing ARM or Intel or POWER today, or, what's really interesting from my point of view, RISC-V, you can use Linux as the abstraction layer. They're not binary compatible, but at least at the source code level the architecture is compatible, so you don't have to write your software multiple times, and a lot of the additional work of making things work in different environments goes away. So Linux is the neutral runtime across different types of infrastructure, different hardware; it abstracts from that and allows you to be efficient as an application developer, to not have to deal with the underlying infrastructure or have dependencies or get locked into vertical dependencies. So that's the role that GNU/Linux played, and in our world that Red Hat, Red Hat Enterprise Linux, played for the broader industry. And, you know, the snarky comment is that we allowed people who couldn't afford a Sun server to run their IT as if they could. So that's what we did back in the day. Now, early on, the way we managed software stacks in Linux we inherited from Unix. When I went to university, we had a bunch of VAX machines and a bunch of big Sun servers at the university, and then a whole bunch of individual Linux PCs showed up. But early on, what we used was primarily the Sun servers via terminals, and we had multiple admins per server, and they would compile software locally on these machines, into /usr/local, and then use stow to put it somewhere your binary path would find it, and you'd cross-mount /usr/local, or even /usr, depending on what you did, from one machine to multiple machines. So you shared your binary runtime across multiple machines by cross-mounting it over NFS. That was a common practice; we did a lot of that, even on the Linux machines early on. So Linux distributions came up as a way to easily install Linux, because figuring out how to make 500 or 600 different components work together in specific versions was really hard, and Linux distributions were meant to make that easy. The first Linux distributions basically used tar files with pre-compiled binaries for the initial install; Slackware was what we used early on, but there were others, SLS and things like that. But then we would get additional software, and that usually was compiled into /usr/local, and we used stow to symlink it into the path, and then redid that whenever something changed, and all that good stuff. The problem with that was that you had a lot of dependency on the state of the machine, so you couldn't expect that the software would behave the same way everywhere: the state of the machine at the time you install the software, when you compile the software, because you don't install it, you compile it, really defines how your software is going to behave. And that worked fairly well if you had a few big machines and admins taking care of them, but it became very clear very quickly that it wouldn't work when you used PC-type machines, many machines with Linux, because it just didn't scale; it just didn't scale to compile everything on every machine. So the solution was to create binary reproducible builds, to package things in a different way, not just tarballs, but managing dependencies and things like that, which
So you manage the problem by trying to capture more of the state of your software stack in the binary package you create. RPM and Debian packaging came along: you build once, install it everywhere, and it behaves the same way; it manages dependencies and all that context. Debian added apt as kind of a transport model for that; Red Hat had something called up2date and later replaced it with yum. What that gave you is really a good way of dealing with stack complexity. However, the original model was very late binding: you bind at the source code level, you compile in place, and you rely on that leading to the same behavior. Here we still do late binding, but we only bind dependencies late: we use shared libraries extensively, and we rely on ABI stability between the libraries to update parts of the stack and keep it reproducible. It worked fairly well and got us really far — 20 years of this, I'd say.

The problem is that dependencies get hard to manage over time; a lot of people in this room are painfully aware of that. You start backporting things to older versions because you need to keep binary compatibility, because you locked in this ABI contract in this dependency model. That's a downside, and things break if you don't do it. An interesting side effect that I didn't recognize as a problem for a long time: back in the day, when you compiled into /usr/local and stowed things, you could have multiple versions of the same binary, the same stack — multiple instances of the same code on a machine in different versions, a multi-instance, multi-version environment — because it was just separated away in /usr/local. In the move to binary packaging, both in the Debian and in the RPM world, we implicitly turned Linux into a single-instance, single-version system: you can only install a single instance and a single version of every binary package. If you want multiple versions at the same time you have to rename the package, do some naming tricks and relocation — and if you want to experience that, look at the Python stack. Some languages support it better than others, but it can be really painful, and we evolved around it with some nice tooling, Software Collections, but that's additional tooling to work around single instance, single version.

And I never really realized the bigger problem: of course we wanted one version of everything, because we moved away from these complex multi-service machines, but the Java folks never really got on board with that, because they wanted their own versions — it runs everywhere as long as you use the same JVM, right? — and they always wanted their own version of the runtime in their home directory. They might be okay with a shared JDK or JVM for the machine, but then they want multiple instances of everything else. We had these arguments, like, JBoss should be in an RPM package, and nobody ever wanted to use that, and we were like, why wouldn't you use that — you get actual control over your software stack and over your dependencies, scrap those zip files. I never understood why they didn't want it, until I realized: oh, they want multiple different versions on the same machine, and that's why they hate this, because RPM doesn't let them do that in a clean way, or only with these additional steps. It was an interesting side effect, and I think we'll see how that plays out in this discussion.
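To make the single-instance, single-version point concrete: language-level tooling is how people sidestep it today. A minimal sketch, assuming Python 3 on Linux with the standard-library venv module and network access; the library name and version pins are just placeholders:

```python
import subprocess
import venv

# Two isolated environments, each with a different version of the same library --
# something the system package database can't give you without renamed packages.
for env_name, version in [("app-a", "2.25.1"), ("app-b", "2.31.0")]:
    venv.create(env_name, with_pip=True)          # creates ./app-a, ./app-b
    pip = f"{env_name}/bin/pip"                   # Linux-style layout assumed
    py = f"{env_name}/bin/python"
    subprocess.run([pip, "install", f"requests=={version}"], check=True)
    out = subprocess.run(
        [py, "-c", "import requests; print(requests.__version__)"],
        capture_output=True, text=True, check=True,
    )
    print(env_name, "has requests", out.stdout.strip())
```

Each application carries its own copy in its own namespace instead of fighting over one system-wide version — which is essentially what the Java folks wanted all along, and what containers later generalized to the whole stack.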
Now, with all these PCs, this scale-out architecture and more and more machines, RPM and yum on their own weren't good enough, so other tools were invented. One of the big things for Red Hat was Satellite and Kickstart. Kickstart lets you recreate a software stack on installation, on deployment of a machine — automated deployment — and with Satellite we built instrumentation around that and a central service for content management. Satellite server basically was, and still is, content management for software streams: you do your standardized, approved build, you deploy it through a Kickstart file, and then you centrally manage what gets updated where, so with a large number of machines you centrally manage the software updates for all of them. Worked like a charm. And you had things like CFEngine, and later newer implementations like Puppet — which is CFEngine re-implemented in Ruby; why that makes sense is a different question — or Ansible as a newer model. So: centralized control for configuration management and for orchestrating the machines you own. That got us to really big deployments, and it works really well under the constraint of single instance, single version — or some extra workarounds — and the constraint of a binary ABI contract that you have to maintain.

The problem in that model is still that whenever a dependency changes — you want something newer — you can't install different versions of it at the same time, and with the complexity of software that gets harder and harder, just because of the amount of software and the amount of change. I had that here: I manage the streaming for this conference and the presentation laptops, and I installed them one day, and two days later I ran the same script, which did a yum update, and 86 packages got updated. Now, this is Fedora, not an enterprise Linux, but it shows you the amount of software that gets updated in a Linux distribution in two days — the kernel and some SSL thing and a bunch of other things, 86 of them. So basically, if you have a large cluster, by the time you're done patching it you get to start over, because it's always moving. And it runs into real problems: we are late binding, and we expect people to do this in production. I have a production cluster, a running application, and I'm going to update a library like SSL — which is kind of important — in production, in that cluster, and I expect all my applications to keep running; I might have to restart them so they load the new version. And I do that constantly, because when I'm done the next update rolls in, if I have 5,000 machines. And that breaks over time: we do an update because, say, Satellite wanted it, and then we break OpenStack because they used the same shared Python library and ABI compatibility was not maintained — I'm making that up, that never really happened, it's just a theoretical example. So the problem is that the amount of software, the amount of change, the scale, and maintaining ABI stability gets really hard, and we call that dependency hell.
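A toy way to see why a shared, single-version namespace turns into dependency hell: two applications on the same host each constrain a shared library differently, and no single installed version can satisfy both. This is purely illustrative Python with made-up package names, not how rpm or yum actually resolve anything:

```python
# Each app declares the versions of a shared library it can work with.
requirements = {
    "satellite-ish": {"libfoo": {"1.2"}},          # only tested against 1.2
    "openstack-ish": {"libfoo": {"2.0", "2.1"}},   # needs the 2.x API
}

def resolvable(reqs: dict) -> dict:
    """In a single-version world, every app must agree on one version per library."""
    agreed = {}
    for app, deps in reqs.items():
        for lib, versions in deps.items():
            agreed.setdefault(lib, set(versions))
            agreed[lib] &= versions          # intersect constraints across apps
    return agreed

print(resolvable(requirements))   # {'libfoo': set()} -> no version satisfies both apps
# Per-application namespaces (VMs, and later containers) dodge the problem by
# letting each app carry its own copy instead of intersecting constraints.
```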
So what happened is that people moved to VMs. VMs came along to solve a big problem, one big thing we inherited — it came out of Windows, primarily. In Linux, early on, we ran multiple services per machine: you had a cluster with a bunch of servers, you ran your database and your business logic in the same cluster, and you had cluster managers that would schedule your services. By default you had three tiers, typically on at least three machines: the database on one, the app server on the next, and your front end on the third, and if one failed, things moved around and consolidated, and you could run multiple things in the same cluster. It was really nice. In Windows they never could really do that: you never ran multiple services on the same Windows server, and that was a real problem, because you had to buy a piece of hardware for every service; they couldn't multiplex hardware like we could. So VMs took off as the solution, because with a VM you had virtual hardware: you multiplexed your hardware by running multiple VMs, one instance of the Windows operating system per VM. They worked like a charm — and we inherited that in Linux, because suddenly our customers started operating Linux the same way.

That solved some of the dependency problems, because the dependency problem within one service is kind of manageable: if you're running Satellite on one machine, we test Satellite, and when we update RHEL we test RHEL and Satellite, and that usually works. The problem comes in when you run your own application on top of that with a different version of some Ruby package, and another Red Hat product on the same box: you have a single namespace of dependencies shared by multiple applications from different developers who want different versions, or who aren't aware of the same level of change in API and ABI, and you get dependency conflicts. A VM solves that, because you're basically saying: I'm using one instance of the operating system, I'm binding it to my application, and I just run a lot of VMs. It's a bit heavyweight, but it gives you isolation and reduces the problem.

It has two downsides. One, we call it VM sprawl: now you have ten times the number of servers that you need to update, so if you had 500 servers you now have 5,000, and you have to update all of them, and the whole problem of being done at one end and starting over again gets worse and worse. And two, people used VMs... in theory it would have worked; in practice they just screwed it up. When VMs came along, everyone said: oh, you're going to use image-based deployment — I'm going to build my golden image and use it like an appliance, we called it a virtual appliance, and you update the golden image and roll it out across your cluster, and that's going to be super clean, you don't have to update things in production anymore, no more "I'm updating an underlying library and suddenly my application logic breaks because the developer didn't know the behavior changed, the ABI changed, or the API changed." In reality that never worked. Most people, even if they use virtual appliances, use them for the initial deployment and then go back to running yum update inside the VMs. I shouldn't say I've never seen it done right — I've seen it once or twice — but not at scale. By and large, people are not running VMs efficiently with imaging; they're treating them like pets. So it's terrible, and in the end it bought us a couple of years on dependency hell, because you reduce the side effects of different stacks sharing a namespace, but it sprawled out too much and it just got too heavyweight. And then cloud took over: it abstracted us from the hardware — we don't care about hardware anymore, performance doesn't matter anymore, it's all about scale, and we're elastic.
That just made this even worse. Now there's no control at all anymore: you don't have to request a VM and wait a week and justify it; everyone deploys things all the time, you press a button in a self-service environment, you get a VM, you install something, and you run it. Many companies have some workflow around that, with golden images and some control, and you're not supposed to — if you're the application owner, the line of business, you don't have root in production. But in reality, most of them give you root when you install your software, because it's too hard to figure out how not to need root; you're supposed to give it back. In the end that means your operations people have no idea what this VM is doing; no one knows what's in there. It kind of works — things change, you build a CI/CD process around it, you build DevOps around it, works great — but the problem is that because you're treating it like a pet and not using image-based deployment, you develop against one version of the stack, then you move it to testing and someone runs yum update, so you test a different version of the stack, with SSL updated and some Ruby packages updated, and then you run in production and it changes again. So what runs in production is not the stack you tested, and in production you keep updating it with yum update, and SSL changes again. It gets reassembled in production every time there's a security fix or an update. People try to control it, but customers are genuinely afraid of security updates and bug fixes because they introduce churn — and there's no way around it, because you can't separate the two; it's a single path forward, basically.

In RHEL we did some interesting things like Extended Update Support — for which I profoundly apologize to our engineers; it's my fault, I was the PM for it — but it was a workaround, because customers wanted the new features for new hardware and new software, so they wanted us to keep backporting things to versions of RHEL within a major release. So RHEL 7 runs for a couple of years — 12 years, I think; I'm not the PM for it anymore, so don't hold me to it — but then they wanted, once a year, a more stable release that gets only critical fixes, because they didn't want to absorb the change anymore. So we added another full layer of lifecycle to work around the problem, and it works, but it's very heavyweight.

Now, while all this was going on, a couple of things changed out there in the industry. Everyone became a software company, because software was eating the world. There is no business that isn't a software business — my landscaper might disagree, but in larger businesses everything is defined by software; your car is a freaking data center on wheels. So this is the value that's driving development. We used to have a certain control over what happens in IT, because there was an IT department that in the end could enforce a standard, could enforce a golden image, but that went away once the line of business became a software business, writing its own software and getting the power because it drives the revenue — so in the end they win the argument against IT over what to run, with few exceptions. And we also had a change in culture: I grew up with three TV stations, you had to follow the schedule, and if it wasn't on, it wasn't on. It's kind of weird, but it makes you a bit more tolerant about waiting for things.
But that completely changed. Newer generations, people, customers are not going to deal with that, because we have an on-demand culture, and that has trickled through; it has changed line-of-business behavior and corporate behavior. Bring-your-own-device is just one example: who today would accept the company telling them they can't bring their own phone? If you need proof, me having my own laptop and my own phone here is proof enough that those things don't work anymore; you can't stop it. We have a much more service-oriented architecture — it's kind of a buzzword, but it's reality, we are aggregating services — and it led to a preference for consuming the most current version, because developers are driven by getting value to the end customer in the line of business, so they're not going to standardize on old versions of anything; they want the newest version of everything. Interestingly, open source is the default, but that removed even more control, because everyone can get everything everywhere, and so customers now have the problem of how to control what gets downloaded. It's creating a lot of value for a lot of people, but it's hard to control. Back in the day with proprietary software it was easy, because if the vendor didn't publish it, you couldn't get it; in our world that's not the case — if someone writes it, it's on GitHub, you can get it, and someone is going to use it. And we had cloud taking off, DevOps changing how we do things, and then application centricity — I wanted to delete that point because I'm making it later, so I'll skip it.

One effect of all that is that the amount of software available and being used is exploding. I got this on the internet, so it must be true: modulecounts.com gives you a picture of how fast software is being created — NPM is going to create a black hole if they keep going like this. Sure, a lot of it could be forks of the same package, but someone is going to use every single version of it. And if you think we had a dependency problem before, to put this into perspective: I think Fedora has between 20 and 30 thousand packages in the ecosystem that we package as RPMs, and I think Debian has 10,000 more or so, which would be somewhere way down on that chart. We can't repackage all of this as RPMs. We have this nice solution that got us really far, but it's not going to keep up with the software that line-of-business application developers want to use; no chance, you can't keep up with that. And most of these languages have their own package managers, so in a way this is going all the way back to compiling locally — well, it depends on what you're doing, but if you're using pip install, it really is going back to compiling locally when there's no wheel for what you want: it compiles on your local machine, and the behavior depends on the state of that machine. I'm sure the other languages have similar problems; it's a huge problem. So because of the complexity, we're back where we started, and I would argue that even in a completely binary-packaged world — complete RPM or Debian packaging — the fragility of the dependency chain right now is basically as bad as the fragility of compiling from source on every machine back in the day, because there's too much change and the ABI contract doesn't hold end to end up at the application level.
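A small way to check the "pip is compiling locally again" point for any given dependency is to ask the index whether a release ships pre-built wheels at all or only a source distribution. This is a sketch against what I believe is PyPI's public JSON endpoint (https://pypi.org/pypi/&lt;name&gt;/&lt;version&gt;/json); treat the field names as an assumption, and the package and version as placeholders:

```python
import json
from urllib.request import urlopen

def distribution_types(name: str, version: str) -> set[str]:
    """Return the artifact types PyPI lists for a release (e.g. sdist vs bdist_wheel)."""
    url = f"https://pypi.org/pypi/{name}/{version}/json"   # public JSON API (assumed)
    with urlopen(url) as resp:
        release = json.load(resp)
    return {artifact["packagetype"] for artifact in release.get("urls", [])}

types = distribution_types("PyYAML", "3.12")   # placeholder release
if "bdist_wheel" not in types:
    print("no wheel published: pip will build from source on this machine,")
    print("so the result depends on local compilers, headers, and library versions")
else:
    print("wheel available:", types)
```

When only an sdist exists, every install is effectively a local compile — which is exactly the state-of-the-machine dependence we thought binary packaging had solved.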
And I'm not arguing that we should give up on ABI contracts — there are a lot of places where they are very valuable, at the low level, at the root of stability: the kernel, glibc, gcc, the things you want to control. But above the interpreters you can't keep up. So that puts us in a bind: our traditional distro approach to application centricity is at an impasse. We can't keep up with the amount of packages — we can't even make the packages available — and customers just don't care. Solomon Hykes, founder of Docker, of the whole Docker container thing, said a couple of years ago in a talk at their conference in Europe — DockerCon Europe or whatever it was called — that no developer using Docker uses the distribution package manager inside the container. He's probably right: developers will use the language-native packaging inside their container, because the stuff they want, in the versions they want, is just not available from the distribution — too much content, too much change, too many different versions. And I would also argue that repackaging things that are already packaged in a binary format, or in a format that sits at the intersection of binary and compile-locally dependencies, is of limited use: if you have a PyPI package, wrapping it in an RPM doesn't solve an additional problem, it solves the same problem in a slightly different way. So we see a change there.

There's also the problem that testing, up at the application level, isn't really valid unless it's done with your application, because most of our customers have more developers than we do, and they write applications that use features in these libraries that we will never be able to test. So it's not only the amount of software but also the depth of functionality that we can't cover in testing across the whole ecosystem. You actually need to validate that a certain Ruby library didn't change behavior in a way that breaks your application — you have to test it with your application — and that further devalues the standardization we've been driving in the Linux distribution. Lower in the stack I don't think things have changed, but as you go up the stack it gets harder and harder, because it's not a high-value target: validating the kernel is a high-value target, validating glibc is a high-value target, because everyone is using them — nobody compiles their own kernel anymore; back in the day people did, and no one does now, because the value versus the risk you incur just isn't worth it. But if you're some Ruby app, that's different.

So containers came along, and we thought they might solve the problem. We now had the ability to run things in different namespaces — LXC was there — and it gave us multi-instance, multi-version environments: we could run different namespaces, different versions, at the same time. VServer was a project, or Virtuozzo, that people were using around 2011, 2012, and that actually solved a lot of problems, because now I could run different things on the same machine without dependency interactions, I could compile things locally without polluting my system, I could install things from pip or Ruby gems without pushing them into the shared namespace that all of my applications share. So this problem of "I'm updating this Ruby library and I'm breaking Satellite and OpenStack" — OpenStack doesn't actually use Ruby, I'm making this up for the sake of argument — isn't happening anymore.
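To ground what "namespaces" means at the lowest level, here is a minimal sketch of the kernel primitive that LXC and everything after it builds on: a child process unshares its UTS namespace and gets its own hostname without affecting the host. It assumes Linux, Python 3, and root (or CAP_SYS_ADMIN); the libc path and the clone flag value are taken from the usual headers:

```python
import ctypes
import os
import socket

CLONE_NEWUTS = 0x04000000  # from <linux/sched.h>: new UTS (hostname) namespace
libc = ctypes.CDLL("libc.so.6", use_errno=True)

pid = os.fork()
if pid == 0:
    # Child: detach into its own UTS namespace and rename itself.
    if libc.unshare(CLONE_NEWUTS) != 0:
        raise OSError(ctypes.get_errno(), "unshare failed (needs root / CAP_SYS_ADMIN)")
    socket.sethostname("pet-container")
    print("inside namespace:", socket.gethostname())
    os._exit(0)

os.waitpid(pid, 0)
print("on the host:     ", socket.gethostname())  # unchanged
```

Real runtimes do the same dance for mount, PID, network, IPC, and user namespaces and add cgroups for resource limits — that's the multi-instance, multi-version isolation being described here; the image and distribution story is what came next.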
Now, that first iteration of containers was seen as a nice packaging or operational tool, but then Docker came along and revolutionized how we do this, and it's brilliant. We had containers, which is basically cgroups, namespaces, and SELinux, and Docker combined that with a transport format and a copy-on-write layering model. They added the concept of aggregate packaging and the concept of distribution. With RPM we had packaging of an application, and we had some notion of aggregate packaging: when I was in sales engineering and consulting, a big deal was installing Oracle databases on RHEL, and they had this Java installer that was just terrible — you couldn't really run it headless or remote — so what you did was run it once on a test machine and then build a binary RPM of the whole Oracle install to make it redistributable. So you used RPM for aggregate packaging of the whole software stack just to make it reproducible. That's essentially what Docker did: you build once, you inherit the frozen, binary, reproducible build, and you aggregate the whole software stack per application. And for distribution: the model we had was yum; they added push — which totally makes sense in the modern world; it didn't make sense in the broadcast world, but in an on-demand, peer-to-peer world it totally makes sense that you can push and pull things — and they had the management of it.

So in a way you can look at this as reinventing the static binary. It's early binding: you bind the full stack at one point, when you create the container, and then you have a fully controlled binary distribution with exactly the same behavior. The only remaining interface, the ABI interface, moves down to the userspace-to-kernel interface, which is the one we can absolutely keep stable — even upstream keeps it fairly stable, it has a great track record — and it's a valuable target because it's the right interface to keep stable; everyone buys into that. It also lets you move the same artifact through the development, test, and deployment process: you build it early, then you test and deploy the same thing, and we build orchestration around that to make it fast.

We've heard enough about CRI-O today, so I'll skip this, but it's containers done right. Docker was great — they innovated, invented a lot of things — but they also had a lot of overlap with the underlying system that wasn't necessarily useful. It's not a natural experience for a sysadmin when you use Docker; it's probably good for developers, but for sysadmins it's not ideal, and there's too much daemon. With CRI-O we solved most of that, and if you want to know more, Dan's and Nalin's talks on that today were awesome and we recorded all of them, so you can find them on YouTube if you missed them. Another point is that an application today is not a single container — we have multi-container applications — and that's where Kubernetes comes in: it orchestrates multiple containers into a consistent application. And of course we're moving away from the single server as the default deployment. Traditionally the single server was what we cared about and the cluster was the exception — you would aggregate multiple servers into a cluster. Nowadays everything is a cluster; the cluster is the default and the single server is the exception, so the cluster is the computer again, and Kubernetes is how we solve that.
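Since a transcript can't show a live cluster, here is a rough sketch of what "orchestrating multiple containers into a consistent application" looks like through the Kubernetes API, using the kubernetes Python client. The images, labels, names, and replica count are purely illustrative, and it assumes the client library is installed and a kubeconfig points at a reachable cluster:

```python
from kubernetes import client, config

config.load_kube_config()          # assumes ~/.kube/config points at a cluster
apps = client.AppsV1Api()

pod_spec = client.V1PodSpec(containers=[
    # Two containers deployed and scaled together as one application unit.
    client.V1Container(name="web", image="docker.io/library/nginx:1.25",
                       ports=[client.V1ContainerPort(container_port=80)]),
    client.V1Container(name="log-sidecar", image="docker.io/library/busybox:1.36",
                       command=["sh", "-c", "tail -f /dev/null"]),
])

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="demo-app"),
    spec=client.V1DeploymentSpec(
        replicas=3,   # the cluster keeps three copies running, rescheduling on failure
        selector=client.V1LabelSelector(match_labels={"app": "demo"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "demo"}),
            spec=pod_spec,
        ),
    ),
)

apps.create_namespaced_deployment(namespace="default", body=deployment)
```

The interesting part is the shape of the object: the early-bound images, how many instances, and how they group are declared once and reconciled by the cluster, instead of being configured machine by machine.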
We call it the meta-kernel, but actually it's just a meta-systemd; we call it a meta-kernel because everyone likes a kernel and no one likes systemd. Kubernetes is basically both: it's the application orchestrator — it takes multiple containers, puts them together into a multi-container application, manages how many instances of what you're running and how they connect — and it's also your cluster manager, so it gives you a scale-out cluster.

The next thing is full service delivery. With Operators, the Service Broker API, Helm, things like that, we are putting together the full transport capability. The problem statement is: today I can do yum install ipa and then ipa-install --config or whatever — it's a config script — and I have a running, orchestrated instance that comes back the next time I boot my machine. With just containers and Kubernetes that's a bit hard to do: I have to pull things from Git, export files, edit files, run 15 commands to get to the same experience. With the combination of things like the Open Service Broker and Operators, I get to a similar experience where I can take a full application, see it in a service catalog, an app store, click a button, and it gets deployed into my cluster. So I get full application deployment and portability, which is really important for scalability and reproducibility. And I'm out of time, so I'll talk quickly through two more slides before Langdon kicks me out. I'll register a protest here: someone had the stupid idea — sorry for that — to put 35 minutes on a technical presentation; you can't say anything in 35 minutes. Clap if you agree.

So, three use cases that matter. Fully orchestrated applications that are portable, orchestrated with Kubernetes. You still have application scenarios with what I call loose orchestration: you're deploying one container somewhere, using something like a unit file to start it up on a traditional server. And you'll still have pet containers, which is: I want to do what I did on a single server, but with multiple versions of it — I want to run different versions of the userspace on a newer version of the kernel. That's where pet containers come in, so you're basically operating inside the container. It's an exception, but it's valuable if you're on single servers: you want to pip install things, you want a different version of an RPM than you have in the shared namespace — it's just namespacing the machine, still valuable, and with Buildah it's awesome.

So where does that leave us? It leaves us with a new architecture where we have an application platform — containers plus Kubernetes — as a new meta operating system, extending the old model of Linux into a cluster with an application-centric view. It's awesome, and if you think this isn't real: unicorns are real. Thank you very much, thanks for watching.