Thanks everyone for coming. My name is Zina and I work on the Pulp project at Red Hat. The main reason we all gathered here is Pulp. Please raise your hands: who is here because they are hearing about Pulp for the first time? Very good. So in this session you will have the chance to learn about the Pulp project, and for those of you who already had the chance to play with Pulp and have some experience, to keep your interest warmed up, I will talk about new features we recently implemented.

So the plan is: I will go through the slides and show you the basic concepts, I will show some features and a demo, and at the end I will take a bunch of questions. So let's start.

What is Pulp? Pulp is a platform for managing repositories. Imagine that on one hand you have a lot of content types, like RPMs, Docker images, errata, Puppet modules and many others, and on the other hand you have a lot of repositories. In case you want to combine and manage these two concepts with one tool, Pulp is just for you.

As I mentioned (yeah, I forgot to click that), Pulp supports many content types, but at the same time it is not tied to any one type. We have support for Python packages and for Docker images; I will tell you more about that in a couple of slides. I will also tell you in this session about the brand new feature called pull-through cache. It is available in the 2.8 version.

We are completely open source. You can find all our code on GitHub: you can see all the pull requests, you can comment, you can submit a pull request, and we will be really happy to see some contributions.

Pulp is a web application. It doesn't have any fancy graphical interface; instead we have a command-line interface based on the REST API.

Now let's see some examples of who uses Pulp. Red Hat release engineering uses Pulp. Release engineering is responsible for delivering content to our clients, and it's a big deal because we have a lot of products, right?
Behind the CDN, the content delivery network, there is Pulp. Pulp is also in public clouds: if you have ever used an EC2 image on Amazon and yum installed something, you pulled that content from Pulp. Katello, which is the upstream project for Red Hat Satellite 6, uses Pulp. And, obviously, our lovely community.

So when you have a fresh installation of Pulp, what do you usually start with? Usually you start with the creation of a repository. When you create the repository, you tell Pulp what kind of content you want to place into it; let's say you want to manage RPMs. You created this repository, and it's empty at this point. The only thing that has happened is that a record was created in the database. That's all.

At this point you want to get content into Pulp, and you have two options. You can synchronize the content from a remote repo, from sources like the CDN or CentOS, either manually or on a schedule. Or you can upload your own content.

Once the content is in Pulp, you can move it around: you can copy from one repository to another repository. The main cool feature here is that copies are cheap. For example, you can have hundreds of repositories with the same RPM in each repository, and there will be only one record in the database and a bunch of symlinks that point to the storage path for this content unit. You can also apply some filters during the copy by using criteria search, so basically you can mix and match content the way you need.

So let's say you have this repository prepared and you are ready to show its contents to the rest of the world. What are you going to do? You want to publish your repository, and by publish I mean make it available to the rest of the world. There are several options for how to publish it: you can host it as a web-based repository, or you can export it into an ISO. A very nice thing about that is that it's not limited to any
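The "copies are cheap" point above can be sketched in a few lines of Python. This is a hypothetical model, not Pulp's actual code: the class and method names are invented, but it shows the idea that each unique content unit is stored once, and a repository copy only creates new associations (symlinks on disk in Pulp's case).

```python
# Hypothetical sketch of cheap repository copies: each content unit is
# stored once, and a "copy" only creates a new association, never a
# second copy of the bytes. All names here are invented for illustration.

class ContentStore:
    def __init__(self):
        self.units = {}   # unit_key -> storage path (stored exactly once)
        self.repos = {}   # repo_id -> set of unit_keys (the "symlinks")

    def add_unit(self, unit_key):
        # The binary blob would be written to disk only on first sight.
        self.units.setdefault(unit_key, "/var/lib/pulp/content/%s" % unit_key)

    def associate(self, repo_id, unit_key):
        self.repos.setdefault(repo_id, set()).add(unit_key)

    def copy(self, src_repo, dst_repo):
        # Copying a repo copies associations, not content.
        for key in self.repos.get(src_repo, set()):
            self.associate(dst_repo, key)

store = ContentStore()
store.add_unit("bash-4.3-42.el7.x86_64")
store.associate("repo-1", "bash-4.3-42.el7.x86_64")
for i in range(2, 101):                  # 100 repos carrying the same RPM
    store.copy("repo-1", "repo-%d" % i)

print(len(store.units), len(store.repos))   # 1 stored unit, 100 repos
```

One stored unit serves all one hundred repositories, which is why promoting content between many repos stays fast and small.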
particular kind of content. What does that mean? It means that Pulp has a very flexible design. We support a lot of content types, but it can happen that you need something we do not support yet, and at that point you can add it yourself. How to do that I'll show you later; it's not a very big deal.

So let me show you a specific case. Do you see? Yeah, this is pulp-admin; pulp-admin is our command-line interface. What you do is tell it "I want to create an RPM repo": you run pulp-admin rpm repo create, you specify the repo ID, and you specify the feed. The feed is the URL from which I want to pull in the content; this is the upstream remote content source. So at this point I have created this repository.

What do I do next? I sync it. I run pulp-admin rpm repo sync run and specify the repo ID. In addition to run, we also have pulp-admin rpm repo sync schedule, so you can make a schedule; it's like a cron job. You can run it once a day, once a month, however you need.

So this thing starts: it downloads some metadata, it figures out what content it needs to grab (there are some RPMs and delta RPMs), and in the end the task succeeds. At this point the content is in Pulp.

Then there is the publish. As you notice, it has a similar structure: there is pulp-admin rpm repo publish run. At this point we are making the content available to the rest of the world. What do we do? We publish the metadata. Oops. Yeah, here it is: we are publishing the content, and in the end we make it available via the web. By default it's available via HTTPS, but it's up to you to configure that.

Oh, and now let's talk about this cool feature that we recently implemented. In Pulp we have a concept called download policies. What does it mean? It means that you tell Pulp in which way you want the content made available.
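Conceptually, the sync step above boils down to diffing the remote repo's metadata against what is already in the local repo and fetching only what is missing. Here is a minimal, hand-rolled sketch of that idea (the function and its signature are invented for illustration, not part of pulp-admin):

```python
# Hedged sketch of what a sync does conceptually: read the remote
# metadata, diff it against local units, and fetch only the missing
# ones. Names are invented; this is not Pulp's implementation.

def sync(local_units, remote_metadata, download):
    """Return the updated set of local units after one sync run."""
    missing = [u for u in remote_metadata if u not in local_units]
    for unit in missing:
        download(unit)                  # a network fetch in real life
    return local_units | set(missing)

downloaded = []
local = {"vim-7.4", "bash-4.3"}
remote = ["vim-7.4", "bash-4.3", "git-2.7"]
local = sync(local, remote, downloaded.append)
print(sorted(downloaded))               # only the new package is fetched
```

Running the same sync again would download nothing, which is why a scheduled sync against a mostly static upstream repo is cheap.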
So there is the immediate download, which we just saw: you tell Pulp to sync, the content is pulled right away and saved to the filesystem. And there is the deferred download, which basically enables Pulp to serve clients or copy content between repos without actually downloading every content unit first. We have two types of deferred download: the on-demand download and the background one.

So what is on-demand? With on-demand, only the metadata is downloaded and saved to the database, and during sync the download of the files is skipped. So after synchronization and publish, the content is ready to serve even though no actual content was downloaded; there is some magic, let's say.

The background download is pretty similar to on-demand: it skips the files during sync and saves the metadata into the database, but in addition, after the sync completes, it creates a task which runs in the background and downloads every file. The cool thing about that is that while the content is being downloaded, you can still manage it: you can do a repo copy, you can publish it; it will basically be ready to serve clients anyway.

Let me explain how it works. Let's say we have a yum client, and it talks to Pulp and says: "Hey Pulp, I want the vim package." It makes a request to Apache. What does Apache do? Apache handles the request and looks into its filesystem: do I have this package? In case I do have this package, I serve it directly and respond to the client. In case I do not have this package, I make a 302 redirect, and the request is redirected to Squid. Squid is our proxy, and what it does is cache the content. Squid looks into its cache and checks: do I have this file?
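The three download policies just described can be summarized as a small sketch. Everything here is invented for illustration (the policy strings match the talk, the function does not exist in Pulp): the key point is that metadata is always saved, file bytes are fetched during sync only for "immediate", and "background" additionally queues a catch-up task.

```python
# Illustrative sketch of the three download policies: "immediate"
# fetches files during sync; "on_demand" and "background" save only
# metadata, and "background" also queues a task to fetch everything
# later. All names are invented, not Pulp API.

def sync(policy, remote_units, db, filesystem, task_queue):
    for unit in remote_units:
        db.append(unit)                       # metadata is always saved
        if policy == "immediate":
            filesystem.append(unit)           # bits fetched during sync
    if policy == "background":
        # The repo is already publishable; files trickle in afterwards.
        task_queue.append(("download_all", list(remote_units)))

db, fs, tasks = [], [], []
sync("on_demand", ["vim", "bash"], db, fs, tasks)   # metadata only
sync("background", ["git"], db, fs, tasks)          # plus a queued task
print(db, fs, tasks)
```

After both syncs the database knows about all three packages, the filesystem holds none of them yet, and one background task is waiting to fill in the files.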
No, I don't have it. So what it does is make a request to the Pulp streamer. The Pulp streamer is a microservice which is responsible for downloading the actual content from the remote source. The streamer figures out where this exact RPM lives, goes to the upstream repo, downloads the content, and streams it back through Squid and Apache right to the client. So at this point the client has received its content.

One nice thing about this whole process is that once the Pulp streamer finishes its download, it makes a record in the database and says: "Hey, I downloaded this file." So at this point Pulp knows that this content was downloaded. And every 30 minutes (but I think it's configurable) there is a task, and what this task does is go to Squid, where all the cached files are, download them, and save them into Pulp's filesystem. So the next time a client requests the same package, it's not downloaded from the upstream repo: Apache can serve it directly, because it is already saved on the filesystem. I think that's all.

And about Squid: it has a very nice feature, which is that it deduplicates requests. What does that mean? It means that if, for example, we have 100 clients that simultaneously decide to ask for the same package, so there are 100 requests, Squid looks at them, deduplicates them, and practically makes only one request to the streamer to download this package. So we will not be in a situation where the same package is downloaded 100 times.

Do you have some questions at this point? Yes: how do we actually invalidate the cache? Oh, well, honestly speaking, I wasn't working on this particular feature, so maybe Michael will be able to answer; he's the team lead. How do we invalidate the cache? So I think there is a variable which you can configure, and this cache I think is alive for one day, right?
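The lazy-serving chain and the deduplication behaviour can be modelled together in a few lines. This is a toy model with invented names, not Apache, Squid, or the streamer themselves: a request is served from disk if the file is there, otherwise it falls through to a cache that collapses all requests for the same file into a single upstream fetch.

```python
# Toy model of the lazy-serving chain: serve from the filesystem if
# present, otherwise hit a cache that makes at most one upstream fetch
# per file, no matter how many clients ask. Names are invented.

class LazyCDN:
    def __init__(self, upstream_fetch):
        self.disk = {}                  # Apache's document root
        self.cache = {}                 # Squid's cache
        self.upstream_fetch = upstream_fetch
        self.upstream_calls = 0

    def request(self, path):
        if path in self.disk:           # already saved to the filesystem
            return self.disk[path]
        if path not in self.cache:      # first miss fetches upstream;
            self.upstream_calls += 1    # everyone else hits the cache
            self.cache[path] = self.upstream_fetch(path)
        return self.cache[path]

cdn = LazyCDN(lambda p: "bytes-of-%s" % p)
results = [cdn.request("vim.rpm") for _ in range(100)]   # 100 clients
print(cdn.upstream_calls)               # the package crossed the WAN once
```

One hundred requests, one upstream download: that is the deduplication the talk attributes to Squid, in miniature.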
So you can configure it to expire in half an hour, in one day, in one month.

Back to the content types. As I mentioned, we have a lot of them, and currently we support these content types: we have the whole RPM family, we recently implemented support for Python packages, and we have support for Docker images. There is also OSTree; you probably know about Project Atomic, which is based on OSTree. We also have community support for Debian packages and SUSE packages.

So let's see what the use cases for Pulp are. The typical use case is dev, test, production, where you pull the content into a development repository, you do some testing, and then you promote it by copying to the testing repository, and then to a production one. When you manage content this way, you want to be sure that you do not screw up and by mistake promote the testing repo to production; Pulp basically ensures that this will not happen. It's very useful for testing upstream repos, like new Red Hat point releases.

Another use case is that you can mirror packages from the Python Package Index. You can sync part of them or all of them, and you can also add your own custom packages. I want to mention that Pulp can retain all versions, and I'll explain why that's important. I bet you have all had the situation when you were using a package at some version, you built your infrastructure around this version, and then there is a new release on PyPI, and it happens.
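The dev → test → production flow described above can be sketched as a tiny promotion helper. The helper and the repo layout are invented for illustration, not a Pulp API: the point is that content only ever moves forward by explicit copies, so production cannot accidentally receive an unvetted build.

```python
# Invented sketch of the dev -> test -> prod promotion flow: content
# moves forward only by explicit copies, and the source repo is never
# modified by a promotion.

repos = {"dev": {"app-1.0", "app-1.1"}, "test": set(), "prod": set()}

def promote(src, dst, units=None):
    """Copy the given units (or everything) from src into dst."""
    picked = repos[src] if units is None else set(units) & repos[src]
    repos[dst] |= picked

promote("dev", "test", units=["app-1.1"])   # only the vetted build
promote("test", "prod")                     # everything that passed test
print(sorted(repos["prod"]))
```

Only app-1.1, the build that went through testing, reaches production; app-1.0 stays behind in dev, and because copies are cheap this costs almost nothing.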
Sometimes the whole release just disappears, but you still need it. So if you want to take control of these versions, Pulp will help you: it can retain exactly the versions you need.

Our community also likes to take advantage of the use case of mirroring Puppet modules from Puppet Forge. You can sync all of Puppet Forge or just some specific modules, you can remove or add modules as much as you need, and it can also retain all the versions.

Pulp is scalable and extensible. Pulp has a very flexible design: our core features, like synchronization and publish, were implemented in a generic way, so you can extend them with plugins. A plugin tells Pulp how to get content into Pulp and how exactly to get content out of Pulp, and once a plugin is installed, Pulp has the ability to auto-discover it automatically.

So, plugins. As I said, if you want to implement support for a new type, for example Ruby gems (we don't support Ruby gems so far), and you need them, you can implement this, and to do that you need to write a plugin. When you write a plugin, you need to define what your content will be: what makes it unique. Let's say RPM; who knows what makes an RPM unique? What? Exactly, the NEVRA.

Once you define the uniqueness of the content, you need to figure out how to write the importer. The importer is the thing that basically answers the questions: how do I pull content into Pulp? How do I interrogate the remote source? What do I need to grab, how do I download it, and how do I stuff this content into Pulp?

What does the distributor do? The distributor basically does the opposite thing: it figures out how to provide the content out of Pulp. You have a bunch of RPMs in an RPM repo, and what the distributor does is make a generic yum repo out of it, which is treated like any other yum repo by yum clients. So yum clients can come and download the packages like from any other yum repo.

We also have the CLI extensions, the command-line extensions.
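The three plugin pieces just described (type definition with a uniqueness key, importer, distributor) can be sketched together. The class names and method signatures below are invented; Pulp 2's real plugin API is different, but the division of labour is the same: the unit key (NEVRA for RPM) defines identity, the importer pulls content in, and the distributor lays it out as a plain yum repo.

```python
# Hedged sketch of the three plugin pieces, with invented names: a
# type definition with a uniqueness key, an importer that pulls
# content in, and a distributor that exposes it as a yum repo.

# Type definition: what makes one unit distinct from another (NEVRA).
RPM_UNIT_KEY = ("name", "epoch", "version", "release", "arch")

class Importer:
    """Knows how to interrogate a remote source and store units."""
    def sync(self, remote_units):
        seen = {}
        for unit in remote_units:
            key = tuple(unit[f] for f in RPM_UNIT_KEY)
            seen[key] = unit            # duplicates collapse on the key
        return seen

class Distributor:
    """Knows how to expose stored units as a plain yum repo layout."""
    def publish(self, units):
        return ["repodata/repomd.xml"] + sorted(
            "%(name)s-%(version)s-%(release)s.%(arch)s.rpm" % u
            for u in units.values())

rpm = {"name": "vim", "epoch": "0", "version": "7.4",
       "release": "1.el7", "arch": "x86_64"}
units = Importer().sync([rpm, dict(rpm)])   # the same NEVRA seen twice
print(len(units), Distributor().publish(units))
```

Syncing the same NEVRA twice stores it once, and the published layout is just metadata plus package files, which is why any yum client can consume it without knowing Pulp exists.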
They are also pluggable, and they have a very nice hierarchical design; we use the Okaara framework. What it basically does is enable you to implement new commands for newly implemented content types and operations. So we have things like rpm repo sync and publish; I don't know what else you can invent or what else you need, but you can do it with this framework.

Now let's talk about integration. Pulp is designed to be integrated with your build system and with your continuous integration testing workflow through the REST API: a generic REST API that manages many content types. If you want to respond to some action, like, I don't know, a successful publish, you can have these events published to an AMQP topic exchange. To be clear, AMQP is a message broker protocol: you have a message producer that sends messages to the exchange, and the exchange decides which queue should receive each message based on the topic. So you subscribe to this exchange, watch what Pulp is doing, and when you want to react you just say: I want to kick off a job in Jenkins to test this just-published repository, to check its correctness and that it's working.

We also provide HTTP callbacks. What they do is send a callback to a URL to notify you that a job has been completed; it's a very nice way to inspect what Pulp is doing and to respond to actions where it's needed.

Consumer tracking: Pulp has the ability to figure out what is installed, what packages, what content is installed on every machine in your infrastructure. In addition to that, it can figure out which recently released updates every machine in your infrastructure needs. I think Satellite takes a lot of advantage of these features.

Asynchronous: there are many asynchronous actions done in Pulp.
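The topic-exchange routing just described can be simulated in a few lines. This is a hand-rolled toy, not a real AMQP client, and the routing keys are invented: the point is that queues bind with a pattern, and the exchange delivers each message only to the queues whose pattern matches, so a CI job can subscribe to just the events it cares about.

```python
# Simplified sketch of AMQP topic-exchange routing (hand-rolled, not a
# broker client): producers publish with a routing key, queues bind
# with patterns, and the exchange routes each message by matching.
import fnmatch

class TopicExchange:
    def __init__(self):
        self.bindings = []              # (pattern, queue) pairs

    def bind(self, pattern, queue):
        self.bindings.append((pattern, queue))

    def publish(self, routing_key, message):
        for pattern, queue in self.bindings:
            # Real AMQP uses '*' and '#' wildcards; fnmatch stands in here.
            if fnmatch.fnmatch(routing_key, pattern):
                queue.append((routing_key, message))

jenkins_queue = []
exchange = TopicExchange()
exchange.bind("repo.publish.*", jenkins_queue)     # only publish events
exchange.publish("repo.sync.finish", "synced")     # not delivered
exchange.publish("repo.publish.finish", "published")
print(jenkins_queue)
```

The Jenkins queue receives only the publish event, so a listener draining that queue could kick off a test job exactly when a repository goes live, ignoring all other traffic.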
We have a distributed task system: there is the REST API that runs in Apache, and long-running jobs like synchronization and publish are performed in dedicated worker processes. These workers listen on an AMQP queue: a worker goes to the queue, and once there is a job to perform, it takes the task, completes it, sends a notification, then takes another job, completes that, and repeats. This is very nice because imagine that you try to sync, I don't know, ten repositories at once; obviously you don't want to be blocked, right? For these asynchronous things we use the Celery project, which is, let's say, the main, famous, reliable project for asynchronous work.

My presentation comes to an end and I'd like to summarize. I gave you the basic concepts of Pulp, so I hope you already have in your head what Pulp can do, what you can do with Pulp, and where to apply it. We have very nice, extensive documentation and I encourage you to visit it; you can find it at pulpproject.org/docs. In case you are not able to find the answers to your questions, or you have some troubles and issues, you are very welcome: come to the channels on Freenode, #pulp or #pulp-dev, and we will be very happy to help you. I also want to mention that Pulp is in Fedora and EPEL, so there is another reason to just go and install it, because it's easy. And the last but not least important thing I want to mention is that we do accept contributions. Thanks, that's all.

Questions, if anybody has any? And I also have stickers, who wants them?

Yeah, so the question was what kind of database we use: we use the MongoDB database. As for the question of why we use this database... Yeah, you're talking about the lazy,
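The worker model described above (an API that only enqueues work, and workers that pull tasks off a queue in a loop) can be shown with stdlib threads and a queue. This is a toy stand-in, not Celery: the shutdown sentinel and the task payloads are invented for the example.

```python
# Toy version of the distributed task model (threads + a stdlib queue,
# not Celery): the API call only enqueues; workers pull tasks off the
# queue in a loop, so ten syncs at once never block the caller.
import queue
import threading

tasks = queue.Queue()
done = []                               # list.append is thread-safe here

def worker():
    while True:
        task = tasks.get()
        if task is None:                # invented shutdown sentinel
            break
        done.append("synced %s" % task) # the long-running work
        tasks.task_done()

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for repo in ["repo-%d" % i for i in range(10)]:
    tasks.put(repo)                     # the "API call": returns at once
tasks.join()                            # all ten syncs ran on workers
for _ in threads:
    tasks.put(None)
for t in threads:
    t.join()
print(len(done))
```

Ten jobs were submitted instantly and completed by three workers in the background, which is the property the talk is after: the caller is never blocked by a long sync.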
I mean the new feature, the deferred download. So the question was when the new feature will be available: the new feature is available in the 2.8 release, and it is planned to be in Red Hat Satellite 6.2. And I am supposed to repeat the questions.

Okay, yes. Yes, it can happen, because Pulp can manage a lot of repositories, and it depends on what you're going to do with Pulp. If you plan to manage big repositories, with a lot of content types and a lot of content in general, it's better to scale the database onto its own machine. We practically do not save the content itself in Mongo, but we do store and save all the metadata, and imagine if you have a million, or millions, of unit types; of course it will come to at least 10 gigs.

So, okay, who was first? Do you have any other questions? Yes. Well, probably it's in the plan on some roadmap, but currently I don't know what our plans are. We do always plan everything, but we don't have any strict time frames. That's actually a new feature; the support was implemented like a couple of months ago, right? And currently it's the last, 10 and 11; I'm not sure about which we support.

Yeah, so, yes. Well, this is very interesting, because we use both the relational and the non-relational, right? It's very interesting why it was done this way; I don't know, because I wasn't on the team at that time, but we do plan to migrate from Mongo to Postgres, I think.

So you do release one, release two, release three; when all your machines are on release three, you want to garbage collect the release one repo. You do a repo delete: in the logic of Pulp, you delete the repo and you garbage collect the packages, and they are gone, but the symlinks are still there. So the repo is gone, the symlinks are there, and it kind of makes a mess. Yeah, it's an old bug, but I think we did fix that. Did we?
Yeah. At least I remember that we've been trying to fix that, and since our developers (and I am a developer person) try to fix everything, I think it's fixed by now.

Yes. So the question was pretty long, and it was basically: when you sync from the remote and publish, and then there are some new packages on the upstream source, and you sync and publish again, the publish still keeps the old packages which are already gone upstream. We have these cool options. There is an option to retain old content, which means Pulp will retain the packages even if they are gone upstream after the next sync and publish. And there is another option, called remove missing, which means that on the next sync it will look at the upstream, and let's say some package is gone there; it will then also delete that package locally. Does that answer your question? Yes? Oh, I think so.

What is our time? We still have time, yes. Well, I've heard a lot of requests for Ruby gems, and there is an external contributor who, I think, was writing a plugin for a non-RPM type, right? Yeah, but I don't know, did he finish it? Not yet; well, it's still in progress. But to write a plugin is not that difficult: there are these three parts, the type definition, the importer, and the distributor. We also have very extensive docs and examples, and, most importantly, we are always there and we can help you. And in case you would even like to fix some bug, you can pick it up.
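The difference between the two sync options just mentioned can be shown with set arithmetic. The function and its flag are invented (the semantics are paraphrased from the answer above, not Pulp's option handling): without remove missing, a sync only adds; with it, units that vanished upstream are also dropped locally.

```python
# Sketch of the two sync options from the answer above, with invented
# names: by default old packages are retained even after they vanish
# upstream; with remove_missing=True they are deleted locally too.

def sync(local, upstream, remove_missing=False):
    updated = set(local) | set(upstream)        # pull in new units
    if remove_missing:
        updated &= set(upstream)                # drop what upstream dropped
    return updated

local = {"app-1.0", "app-1.1"}
upstream = {"app-1.1", "app-1.2"}               # 1.0 is gone upstream
kept = sync(local, upstream)                    # default: retain old
pruned = sync(local, upstream, remove_missing=True)
print(sorted(kept), sorted(pruned))
```

The retaining sync keeps app-1.0 alive locally (the PyPI "disappearing release" scenario from earlier in the talk), while the pruning sync mirrors upstream exactly.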
Don't be afraid, because we will help you. We had a lot of cases when an external contributor just reported a bug, wanted to fix it, already had the fix, and just took it, fixed it, and submitted a pull request, and it was merged.

Once again? Yes, we do have QA to verify the bugs. We have a nice framework called Pulp Smash, and we ourselves contribute to it, so it becomes more and more robust, and there will be more and more test cases with each day. So we do have these parts of Pulp automated; Pulp Smash is the framework to test with.

Yeah, a couple; about ten developers and two QA. We don't have that strict a separation between developers and QA, because developers are also interested in making Pulp really reliable and safe. So when we fix a bug, we know exactly the steps to reproduce it and what to test, and if we have some time, like half an hour or an hour, we just write the test. We do have regular meetings with QA, where we plan how this framework should be implemented and what else we need to automate. So we work very closely and we help each other.

Pulp performance testing? Okay, I'll just take a couple more questions from the floor at the end. Last question: three, two, one. Done. Thanks. And the stickers, who wants the stickers? Thanks. Okay. Yes, I will wait for you outside. Okay, great. Yes, thanks. Oh, thanks.