Hello, and thank you for coming to hear my talk. I'm honored by your presence. My name is Randy Barlow. I work on the Fedora engineering team with Paul Frields and a bunch of incredible engineers, and it's a thrilling place to be. Today I'm going to talk to you about our plans for getting the containers we're making out to you. I'm going to begin the talk by describing the challenges we face with getting our containers out to the world. Then I'm going to present a plan that we think of as our immediate short-term solution. We have a very short timeframe to achieve our goals, and so we're going to take some shortcuts. And lastly, I'm going to talk about the future, the crystal ball, so to speak. I'm going to try to predict the future, which is very hard to do. So, the problem. If you've been at this conference very much, you probably haven't made it very far without hearing about all the containers that Fedora wants to distribute. And for the record, Flatpaks may have some connection to containers too, so we aren't just talking about containers; we may also be thinking about Flatpaks a little bit here. We want to distribute those to our users, and we have a lot of them. And our users aren't in one place, they're all over the globe. So we need a solution that will scale for a large number of people distributed around the planet. I had a note here that so far we don't have any requirements to ship our containers beyond the solar system, but maybe in the crystal ball section we can think about some ideas for that. Oh, I also wanted to note that I am open to questions throughout the talk, so if anything strikes you, you can ask; you don't have to wait until the end. So for the next few minutes, I'm going to talk through some ideas we considered.
I joined the Fedora engineering team in June of last summer, and when I joined there had already been some ruminations about this problem: some ideas, some plans, some thoughts. So I'm going to start with the initial plan we had considered, which was to use Pulp. Pulp is the project that I worked on for approximately four and a half years prior to joining this team, and I think that experience maybe helped me sneak in the door a little bit, because in particular I wrote Pulp's Docker plugin for distributing containers. When I joined the team, Adam Miller, who is actually giving a talk right now in another room, had developed a plan to use Pulp to solve this problem for Fedora. So we sat down and talked about it quite a lot. One of the things I came to realize was that the plan was to use Pulp to generate the artifacts that we would put into our mirror network. What seemed a little challenging about that was that we would be using a relatively complex project essentially to write some large files onto a filesystem. They're called blobs; they're the images for the containers, and we were going to put those into our mirror network. It seemed like using a crane to crush a fly, so to speak, an idiom that means using a very powerful solution for something small. So I started talking with Adam Miller, and I suggested that we could write a small tool that simply extracts the files we want so that we could put them into our mirror network. We started iterating on our next idea: a small tool that talks to our internal build container registry, extracts these large files, writes them into a filesystem in a way that's suitable for MirrorList and MirrorManager to consume, and then our worldwide mirror network would copy these files down. And then what?
So we started thinking about this problem, and there were a lot of things that would need to change. We would need to make contributions to MirrorList. If you're not familiar with MirrorList: when you type dnf upgrade, your DNF client talks to MirrorList, and MirrorList hands back a list of places where you can get repository metadata. Your client picks one of those, hopefully near you geographically, extracts the repo metadata, and retrieves whatever files you're looking for. So we would have to alter MirrorList to be container-aware. In addition, there's another service you may not be aware of called MirrorManager. MirrorManager is the mirror-admin side of things, while MirrorList is user-facing. MirrorManager's job is to tell the mirrors what updates are available, and it gives them some options about what they want to sync down. So we would also have to update MirrorManager to be container-aware. And we would have to make some changes to our build system and integrate it with these other systems. There were a lot of pieces to touch, and it's complicated. As we started thinking about this, we realized it was going to take a long time to build this system. It would work well, but it would take a long time to get there. Additionally, there was one more challenge. We analyzed the Notary system, which is a container signing system, and we decided that it was a bit more complex than we want to use and that it wouldn't integrate well with our existing signing solutions. So we needed a way to serve these containers and manifests to our users that was safe, so they can be sure the files they download from a mirror actually came from the Fedora project and haven't been man-in-the-middle attacked by anyone along the way. So we decided to consider another approach: a small Flask app that we named Fegistry.
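To make that MirrorList flow concrete, here's a rough sketch of the client side. The URLs are made up, and a trivial "first candidate that works" choice stands in for the real GeoIP-based ordering; this is an illustration of the idea, not MirrorList's actual code.

```python
# Hypothetical sketch of the MirrorList interaction described above.
# In reality the DNF client receives a GeoIP-ordered mirror list from
# MirrorList; here the list is hard-coded and "nearest" simply means
# the first mirror the client can reach.

def pick_mirror(mirrors, reachable):
    """Return the first mirror the client can reach.

    `reachable` is a stand-in for actually probing the mirror over HTTP.
    """
    for url in mirrors:
        if reachable(url):
            return url
    raise RuntimeError("no reachable mirrors")

# MirrorList would hand back something like this, nearest-first:
mirrors = [
    "https://mirror.example.edu/fedora/",
    "https://ftp.example.org/pub/fedora/",
]
print(pick_mirror(mirrors, reachable=lambda url: True))
```

The client then fetches the repo metadata and whatever files it needs from the chosen mirror.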
We considered doing this project with OpenShift Dedicated, where we could store our large files, while serving the manifests and initial API calls from Fegistry. Fegistry's job is to answer the container pull call. So when someone wants to retrieve a container, they speak with Fegistry: they say hello, Fegistry says hello, then they say, I would like a manifest for the Fedora container, and Fegistry says, here you go. Then, when the client asks for the blobs, it gets a 302 redirect into OpenShift Dedicated. There was actually a really cool patch I found that runcom created that would allow us to inject URLs into the manifest, such that the client would never come back to Fegistry to ask for the blob, which would save a step. It also lets you give a list of places to look instead of a single place, which is the problem with a 302: if any one of them responds with a 404, the client just iterates through the list until it finds the blob. So that seemed pretty cool. However, we have an incredible sysadmin in Fedora, and I was talking with him about this problem. Actually, what really happened is that I wrote a pull request and he reviewed it, and he didn't like it. But this is good, right? This is open source. So I'm proposing a change to Fegistry, and he comes in and says, why are you doing this? My change was to add support for the 302 redirects and some caching, and he said, why don't we use Varnish for caching the responses and just use httpd rules for the 302 redirects? I responded that we wanted a fedmsg consumer inside Fegistry that would know when to invalidate the cache, and then once again our amazing sysadmin, this is Patrick Uiterwijk, whose name I've surely mispronounced, responded that we actually already had a solution in place to do cache invalidation in response to fedmsg messages, which is awesome. So we decided to kill Fegistry.
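The list-of-URLs behavior that patch enables can be sketched like this. Here `fetch` is a hypothetical stand-in for an HTTP GET, and the whole thing illustrates the idea rather than reproducing the actual patch.

```python
# Hypothetical sketch: the manifest carries several candidate URLs for a
# blob, and the client walks the list, skipping 404s, instead of relying
# on a single 302 target.

def fetch_blob(candidate_urls, fetch):
    """Return blob bytes from the first candidate that has them."""
    for url in candidate_urls:
        status, body = fetch(url)
        if status == 200:
            return body
        if status != 404:
            raise RuntimeError(f"unexpected status {status} from {url}")
    raise RuntimeError("blob not found at any candidate URL")

# Toy demonstration: the first mirror is missing the blob, the second has it.
responses = {
    "https://mirror-a.example/blob": (404, b""),
    "https://mirror-b.example/blob": (200, b"layer data"),
}
blob = fetch_blob(sorted(responses), lambda url: responses[url])
print(len(blob), "bytes")
```

The nice property is that a single stale mirror no longer fails the whole pull, which a lone 302 target can't offer.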
It only lived for about three days. It was a sad thing. I want to make a side note here, because this is something you might encounter while writing software or doing any kind of technical work: you may have something you're invested in, and even if you only put three days into it, you feel passionate about it. When someone comes along and says on your pull request, let's not do this at all, this entire project, let's just kill it, that's a tough thing to hear. But the thing is, you have to be able to look past your feelings and analyze: is this what's best for Fedora? Is this what's best for our users? And it was. So we made the decision to retire Fegistry. By the way, it was in Rawhide for about a week before it was retired. I was hoping I would get a badge for that, because I wanted a your-project-was-the-shortest-lived-project-in-Fedora badge, but there's no badge for that. Maybe I should submit an idea. So now I'm going to talk about what we are going to do in the short term, and I've actually hinted at it quite a lot. We are planning to run the Atomic Registry in our Phoenix data center. The reason we're going to use the Atomic Registry is that it's difficult to keep up with the container API changes and manifest changes, and I've had a lot of experience chasing these things: every time you spend a long time writing all your code, by the time you submit your pull request and your co-worker tells you what you forgot, oh, there's a new blog post and they changed it. There's v3 now and schema 10, and everything you wrote is no good anymore, and you have to throw it all away and start a new project. So in order to avoid that problem, we're going to use the Atomic Registry and let someone else solve that problem for us. This is going to be in our Phoenix data center, and we're going to use Varnish on top of it to handle requests.
Now, one thing that's awesome, and I don't have a graph for this, is that our Varnish system is a worldwide network of Varnish servers. There's probably one nearby here somewhere. What this means is that when you ask for a container, you will very likely not be talking to the Phoenix data center unless the content isn't in cache; you're very likely going to receive a response from a caching proxy server near you. So this should be a quick interaction. What's more, the data center will be protected from serving any of the large files, the blobs, because those will come from OpenShift Dedicated, where we will run another Atomic Registry. The reason we are not just using OpenShift Dedicated, the reason we also have one in our data center, is that we want to serve the manifests over TLS, because we will not have signatures on our containers for most clients. In order to provide safety, we will serve the manifest to you over TLS, and the manifest references the container blobs by hash, typically a SHA-256 hash. Then, when your client asks for the blob from OpenShift Dedicated, which runs in Amazon, you don't have to worry about whether the container image has been tampered with, because the client will check the checksum and make sure it's correct. A side note here: we will also have many users in Fedora who use the atomic client, and we do have a signature solution for atomic's pull command, so we will be signing for that client. But Fedora really wants to support the idea of non-Fedora users pulling our containers and using them. We want to think about Windows users, even, because they can run containers, and maybe they would like to use the containers we're making. So we want it to be safe when they ask for a container from us, and we don't want to rely on the atomic signature being the only way we provide safety.
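That checksum check is simple enough to show in a few lines. This is a generic illustration of the trust model, not the actual client code.

```python
import hashlib

# The trust model in miniature: the manifest (served over TLS from the
# Fedora data center) names each blob by digest, so the blob itself can
# be fetched from an untrusted host and verified locally.

def blob_is_trustworthy(blob_bytes, manifest_digest):
    """manifest_digest is the 'sha256:<hex>' string from the manifest."""
    algorithm, _, expected_hex = manifest_digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    return hashlib.sha256(blob_bytes).hexdigest() == expected_hex

blob = b"pretend this is a container layer"
digest = "sha256:" + hashlib.sha256(blob).hexdigest()
print(blob_is_trustworthy(blob, digest))          # True
print(blob_is_trustworthy(b"tampered!", digest))  # False
```

Because the digest travels over the TLS channel from Fedora, the blob host only needs to be available, not trusted.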
So that's the motivation for running two of these: one just to serve the API and the other to serve the files. Oh, and Loopabull is in the picture here. This is a project Adam Miller made that's very cool. It listens on the fedmsg bus, and in response to messages it can be configured to run a playbook for us. So we can use Loopabull to listen to our build system and know when new containers are out. It can push them into OpenShift Dedicated for us and push them into our Phoenix data center, so that everything happens automatically: you just type koji build and everything goes where it goes. That's the role of Loopabull in the solution. Now I'm going to show you a picture of how the API is going to work; I actually skipped a few steps here. So, the Phoenix data center. This is our Atomic Registry inside our Phoenix data center. It runs behind httpd, which is what's going to send the 302 redirects. We have our caching proxy servers; there's one in Phoenix too, which I actually didn't draw, but as I said, there's one of these near you as well, so you may not ever even speak to this data center. The client will ask for a manifest over TLS, and the data center will return the manifest over TLS. As I said, this manifest has the hashes of all the blobs inside it, so you can be sure whether you're getting the correct blobs or not. The next step is that an unpatched client will ask the data center for the blobs, the large files. The data center is going to say, no, you can't have that from me, but you can get it from someone else, and it sends a 302 redirect. And then lastly, the client goes to our OpenShift Dedicated registry and asks for the blob, and the blob is returned. Are there any questions at this point? Yes. So the question is how we are putting the blobs, whoops, what have I done?
The question was how are we, we're putting blobs in our Phoenix data center but we're also pushing it, putting them into OpenShift dedicated and the question is how are we accomplishing that? Is that? They both actually have a complete copy of both codes. So the internal registry has everything, it has the manifests, it actually does have the blobs and the one in OpenShift dedicated also has all the metadata and the blobs. However, we don't want users to default to coming here because we cannot guarantee as the Fedora project that these images have not been tampered with given that they don't run in our data center. So the only reason we're putting the data in both is for simplicity. We don't want to write our own custom project, we want to use upstream projects. The cost here is that we will be using a lot more storage in our data center. That's a downside to this plan. And likewise the OpenShift dedicated doesn't really need to have the manifests but they're relatively small. The big deal with there is gonna be the storage and the bandwidth. So our thinking is that keeping the solution simple allows us to deliver more quickly and we can solve those problems later and we will. Any other questions about, okay. So that's what we're going to do and we're planning to launch that solution in mere weeks. We are very close to being done there. So once we're done with that solution we will have to do some work to bolster it a little bit to make our data center solution highly available and scalable. So for the next few months we're going to be working on doing things like writing standard operating procedures for our sys admins and we're going to write a lot of documentation and we're going to we're actually going to deploy it to more than one data center. Phoenix will actually in the end not be our only data center that has the in data center manifests so that we can fail over live. We anticipate to have all of that done. 
The goal for the initial launch is just Fedora 26 Alpha, so that's next month. We want to be highly available for the beta freeze, so we have a few months to work on those things. After we finish that, we can start thinking about what we want to do in the long term. One of the problems is that storing all of our blobs duplicated like this will cost us money. We will also be paying to use the OpenShift Dedicated system, and being the Fedora project, we don't have the large budgets that some other awesome companies have, with their incredible CDNs. In fact, it has been estimated that for Fedora just to mirror our RPMs on something like a CDN would cost about $2.3 million a year, and Paul, my manager, just made a scary face. That's a lot of money, and so Fedora has traditionally always used our mirror network, a worldwide volunteer mirror network. I hinted earlier that we had been looking at a solution that uses them, and so one of the things we would like to do long term is to consider other blob storage solutions. One idea could be to do all that work I talked about: keep serving our manifests from our data center, but keep the blobs on our mirror network, which would involve all those changes to MirrorManager and MirrorList. One thing that's nice about our short-term solution is that the in-data-center registries will still be useful alongside our mirror network, for the same reasons. A caveat to mention here is that our mirrors typically do not want to run software that we ask them to run; they really just want to serve files. And this is a problem for trust: since we don't have signatures, we need to run the container API in our own data center over TLS. So I think it's going to be advantageous for the long-term future that we have this in-data-center deployment.
So we need to think of new places where we can store our large blob files. We had also considered using something like Docker Hub, but Docker Hub actually did not have terms of service the last time I looked, which was in December: none at all. When you make an account, you're not asked to agree to anything. So, oh, a question. The question, to summarize, is: why do we not require our mirrors to serve all of our files over HTTPS, so that we don't have to rely on signatures, or do both? Well, if you recall, we don't have signatures for our containers when they are pulled with every client that's available out there, because we will not be using the Notary service. So we need some way to establish trust in the files you're downloading, and even if you speak to a mirror over HTTPS, that only indicates that you trust the mirror, not that you trust the Fedora project. In other words, there's no way for you to know that the mirror has not tampered with the files and that they really came from the Fedora project. That's a really important thing to us: asking the question, who do you trust? Perhaps you have a mirror that you trust, maybe you run it yourself, but we don't want each of our users to have to think about trust for each mirror; you should only have to trust the Fedora project. So if we serve our manifests over TLS out of our data center, then there's verification that you really got that digest from the Fedora project, and then you can go to any mirror, and as long as the file you download matches the checksum, you can know for sure that it came from the Fedora project and has not been tampered with. Although I would say it's still good to speak to the mirror over TLS, just for privacy and as a second layer of defense. So the next question is essentially: is there a conversation to be had about the Notary service's complexity, and whether we want to try to offer signing for more than our atomic client?
I actually am not an expert in this area, so I don't know that I can tell you the full details. Our security officer, I like to call him chief, Patrick Uiterwijk, whose name I've now surely mispronounced four times today, analyzed the Notary service and determined that it was not ideal for Fedora to use. I really cannot tell you the details; he explained it to me once, and Patrick knows a lot of things that I do not know. So if you're interested in that question, I think we could have a discussion on fedora-devel, and I'm sure he would participate and probably give us an amazing answer. I just know that he closely examined the solution and rejected it for Fedora. The comment was, tell him to come up with something else, which I am very much in favor of. Yeah, it would be helpful if we had something like that. And to reiterate: users who use the atomic CLI, which is going to be integral to the future of the Fedora operating system, are going to be getting a GPG signature on these things, so they don't suffer from a lot of the problems we're describing here. But one of our main goals was to think about more than just our own current users. We want to try to draw users from Windows or other Linux distributions, so we want to make sure that they are also getting a guaranteed safe experience from our containers and are not being attacked in any way. Any other questions? Oh, I completely forgot to talk about that; that was something I meant to say, and it's good that you brought it up. The question is: are our mirrors eager to serve these files? This is actually one of the biggest problems, and I can't believe I forgot to mention it. One of the challenges with containers and many of these new formats is that they do a lot of bundling.
In addition to bundling, they use layers. You typically have a base layer, which has your core things like glibc in it, and then you've got your application layer on top, which might have the program you want to run, perhaps httpd. If you think about this, it causes a distribution problem for the Fedora project, because these things generate a huge amount of data. Every time the base layer changes, we will need to rebuild every single container we have in the whole project. So if there's a glibc security update, instead of updating the glibc RPM and having one file that changed, we will now rebuild every container we have. That means however many containers Fedora has, times hundreds of megabytes, that we now need to send out to our mirror network. And this could happen a few times a week. Our RPMs are already quite a lot of data, but each container is also much larger than the RPMs that make it up, because there's so much duplication. Suppose I've got a container and you have a container, and we both depend on python-django: your container has python-django and my container has python-django, but there's no deduplication, so we've got it twice. And it's not just twice; other people are using that package as well, so it will be in quite a few of these things. So every time we do a rebuild, we could be talking about 10 terabytes that our mirrors need to sync down. Even with very fast data links, that's a lot of data. And you have to think about storage as well. As a developer, I sometimes laugh and say, well, you can get a big hard drive for $100, but that's because I've got a consumer hard drive in my computer, not a data center hard drive. Data center storage costs a lot more than you might think; it can be surprisingly expensive. So our mirrors are volunteering to do something that costs them a lot of money, which is awesome.
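A quick back-of-the-envelope calculation makes the scale concrete. Every number below is a placeholder invented for illustration, not a real Fedora figure.

```python
# Hypothetical rebuild cost: when the base layer changes, every
# container image is rebuilt and every rebuilt image must go to every
# mirror. All figures below are made up for illustration.

container_images = 200    # assumed number of images in the distribution
avg_image_mb = 300        # assumed average size of a rebuilt image
rebuilds_per_week = 2     # e.g. a couple of base-image security updates
mirrors = 100             # assumed number of participating mirrors

weekly_tb_per_mirror = (container_images * avg_image_mb
                        * rebuilds_per_week) / 1024 / 1024
total_weekly_tb = weekly_tb_per_mirror * mirrors
print(f"each mirror syncs ~{weekly_tb_per_mirror:.2f} TB/week")
print(f"~{total_weekly_tb:.0f} TB/week pushed out across all mirrors")
```

Even with these modest made-up numbers, the totals add up quickly, which is the heart of the mirroring problem described above.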
But we can't ask too much of them. So that's one of the challenges of the potential long-term solution of putting the blobs on the mirrors: we may not see very much participation from the mirrors in syncing that content. We have had some discussions on the mirror-admin mailing list. We put out some feelers on the list to ask, hey, if we did this, what would happen? There were a lot of questions from the various admins about how much data we were talking about, and those questions are very hard for us to answer, but the truth is we're talking about a lot of data. It sounded like there would be a lot of concern about whether the places where they work and host this stuff would be willing to pay for the bandwidth and the storage. So that's a really big problem. It may be that this does not work for us long-term, and we'll have to think of something else. We also did look at quay.io, or key.io; I have no idea of the right way to pronounce it. We would need to establish proper terms of service, but that may be a potential option for us, and maybe there's something else I don't know of that you know. So, the OSTrees. The question is: would OSTrees be another way we could distribute these, to reduce the amount of data? That's a good thought, but OSTrees actually have the opposite problem from containers when it comes to mirroring. We did in the past put OSTrees into our mirrors. I actually wasn't around for this, so I did not receive the complaints, but I heard that there was some consternation. The problem with OSTrees is that instead of making huge files, which is the problem with containers, they make lots of little files, and the mirrors sync with rsync, so they would spend an entire day doing a sync, and by then we've already updated it again.
So can we ask the mirror admins to use OSTree instead of rsync? One of the problems we've had with the mirrors, sort of a social problem, not a technical problem, is that they typically do not want to run software we ask them to run inside their data centers. They want a simple httpd server serving a filesystem, and they like to use rsync, and this is the way it's always been done. Keep in mind that many of these mirrors also mirror other distributions, so they want something that will just work and be easy, because after all they're volunteering and being very generous to us. We can't ask very much of them; we can't ask them to run our software, for example. A comment from the audience was that we can't even guarantee there's infrastructure in place that can run OSTree at all. We could offer options, though; this has been a discussion inside release engineering. We could say, hey, you can rsync these files, but that has some problems; if you're willing to run OSTree instead, consider that. We might see some participation, but historically the participation levels when we ask for things like that are actually quite low. So that's one of the challenges. Any other questions? I'm actually near the end of my talk. So, in summary: we talked about the challenges, although I forgot to mention the mirrors-and-big-files problem at the start, so I covered the challenges out of order. I talked about some of the solutions that were considered, both before I joined this project and since. I talked about our short-term plan to run an Atomic Registry inside our data center and one in OpenShift Dedicated, just to get us going, because we need to have this live in a few weeks, which makes Amanda happy, and we want to keep Amanda happy.
And then we talked about some long-term ideas. The long-term ideas are not in any way set in stone; these are just things we're thinking through, and if you have ideas we haven't talked about, I would love to hear them, because we're kind of out of ideas. So, Paul. Oh yes, that's right. We have been in talks with the CentOS project to find ways to collaborate on this problem. We have talked with them a few times. There's a high-level idea that we want to collaborate, and that sounds awesome, because why would you not want to collaborate? When we work together, we accomplish more than when we work apart. But one of the things that's been difficult to identify up to this point is: what are the exact integration points we can collaborate on? Where do we draw the boundaries? There have been suggestions that we could actually share a registry deployment, so that one registry would have both Fedora containers and CentOS containers. There are options to share the technology, to share the way we deploy. There are all kinds of hybrid ideas we've talked about. We've even considered using 302 redirects in our registry, so that when we see people asking for CentOS containers, we say, oh, I don't have CentOS containers, but I know they're over there, so I'll send you there, and vice versa. So there's actually a lot of discussion left to be had, and in fact we're planning on talking about that at some as-yet-unscheduled time during this conference, in the hallway track. Perhaps today, perhaps tomorrow. So that's an ongoing question: at what levels can we collaborate, share, and work together to create something bigger and better? It's kind of a fuzzy answer; we're at a fuzzy place. Any other questions? So the question is: how do we manage authentication and authorization on the registry?
Because this is an open source project, as CentOS is, we are not going to require our users to authenticate when doing a container pull from us. Much like with DNF, we don't require you to authenticate to do package updates or installations. We do plan to use authentication and authorization on write operations, of course. We plan to use httpd to protect the particular API calls involved in pushing containers to the registry, so that pushing is only allowed from our own infrastructure, which uses Kerberos internally. So you'll have to have, what's the term, a TGT? Actually, it's the thing a server has when it holds a Kerberos identity: a service principal. So basically only our servers are going to be able to push into the container registry, no users will be able to do this, and it will be protected that way. Any other questions? No? Okay. Actually, I should have put this slide up. So that's all I have, unless there are further questions. Okay, a question: could we propose tools around MirrorList and MirrorManager that would offer, for example, deduplication or multicasting, along with pull requests for the clients to consume these new features, so that everybody would see this as an improvement, a step forward, and it would become a more acceptable way to serve those big blobs? So, to summarize the question: do we have an opportunity here to make contributions to our MirrorList and MirrorManager projects so that they can do things like deduplicate the blobs, the large files, and make it easier for the admins in our mirror network to consume them? If we were to use MirrorManager to distribute our blobs, we would have to make contributions to it. As it is, it's very oriented around YUM and DNF repositories.
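The anonymous-pull, authenticated-push policy described a moment ago boils down to a method-based rule. Here is a toy version in code, purely illustrative, since in production the rule lives in httpd configuration in front of the registry rather than in application code.

```python
# Toy access policy: anyone may read (pull), but only authenticated
# internal infrastructure may write (push). Illustrative only; the real
# enforcement is done by httpd, and the names here are made up.

READ_METHODS = {"GET", "HEAD"}

def request_allowed(method, has_internal_kerberos_identity):
    """Anonymous pulls, infrastructure-only pushes."""
    if method.upper() in READ_METHODS:
        return True
    return has_internal_kerberos_identity

print(request_allowed("GET", False))   # anonymous pull is allowed
print(request_allowed("PUT", False))   # anonymous push is refused
print(request_allowed("PUT", True))    # internal push is allowed
```

The design point is that the registry itself never needs user accounts: reads are open to the world, and writes are gated on the Kerberos identity of the infrastructure hosts.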
So when we were considering the mirror approach, that was very much the plan: to get involved in those communities and make contributions there. One of the problems is the deduplication. We could deduplicate some of the data in MirrorManager itself, perhaps, but if you recall, one of the challenges is that the blob files need to be downloaded by the mirrors, and they don't wish to do anything other than rsync. That's the trouble: though rsync does a lot of block-level delta operations, the amount of data we're asking to be served is quite large. We do have the opportunity to distribute the base image one time, so that's a point of deduplication; we can have it hard-linked so that it's only served once to all the mirrors. But the duplication I was referring to is mostly in the layers on top: every image that has Django in it is going to carry Django again and again, and it's very difficult to deduplicate that without a fancy filesystem or something similar, and rsync will not do it for us, unfortunately. Any other questions, or ideas for a pull request? Okay, well, thank you so much for coming. I appreciated you all, and thank you for listening.