Welcome everybody to the fourth and last part of the CVMFS tutorial at the EasyBuild User Meeting. Today we're going to cover a couple of advanced topics and do some demos for some of them, but we won't give you an exercise at the end anymore; we figured that was a proper way of wrapping up the tutorial. I'll pass the mic to Bob, who will do the first couple of sections, and then I'll take over at the end. Yes, good morning. First a short recap on what we did yesterday and some questions and issues we've been seeing. First of all, there were some questions about those .cvmfscatalog files, which you put in the directory structure to indicate that you want a nested catalog for the entire subtree below that directory. People asked what kind of file this is, whether you should put anything in it, and whether anything ends up in it after you run the publish command — in other words, whether that file will contain the catalog database itself. That's not the case: it will initially be an empty file and it will always stay an empty file. The CVMFS server part will not store the actual database in that file; it's just a marker indicating that you want a nested catalog for that particular directory, while the catalog itself is stored elsewhere. So you can just leave it there, and if you ever want to get rid of that nested catalog, remove the file and do another publish operation: CVMFS will go up the directory tree, find the closest parent that also has a nested catalog, and merge this one back into it. Then there were some questions about the solution that we provided for the exercise, the .cvmfsdirtab file, and as some of you already mentioned in the Slack channel, our solution is not really correct.
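To make the lifecycle of that marker file concrete, here is a minimal sketch; the repository name is a placeholder, and these commands assume you are inside a transaction on the stratum 0:

```shell
# Request a nested catalog for a subtree: create an empty marker file
cvmfs_server transaction repo.organization.tld
touch /cvmfs/repo.organization.tld/software/.cvmfscatalog
cvmfs_server publish repo.organization.tld

# To get rid of the nested catalog again: remove the marker and republish,
# which merges it back into the closest parent catalog
cvmfs_server transaction repo.organization.tld
rm /cvmfs/repo.organization.tld/software/.cvmfscatalog
cvmfs_server publish repo.organization.tld
```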
So that might be a bit confusing, because we said there's a recommendation to have between 1,000 and 200,000 entries per nested catalog, while the solution we provided gives you nested catalogs that contain like eight or sometimes 24 entries. That's mostly because we are using dummy software installations, which only have a few files, except for the OpenFOAM one, where we put in a few hundred thousand files. So it doesn't really comply with what we recommended, and you could indeed say that we provided the wrong solution. Some of you solved that, or partly solved it, by doing it slightly differently, which is of course correct. On the other hand, if you're going to use this in production, it might still make sense to have one catalog per software installation, because these catalogs are also a way to give clients efficient access to files that are often accessed together — and if you access one file in a software installation directory, you're probably going to access more files of that same installation. Maybe even better for this exercise would be to make one catalog per architecture, because every client is most likely only going to access the software installations for its own architecture: if you're running on an x86_64 machine, you're probably not going to access the ARM installations anyway. Because these installations are quite small, you could have done it that way, with one nested catalog per architecture, but then you'd still have to be careful with the OpenFOAM installations. So let me show you an alternative. What I did this morning is make another one — I forgot the repository name — organization.tld.
So this might have been a better real solution. I'll also show you the dirtab that I used for this one, which basically only says: make nested catalogs for the examples subdirectories in every OpenFOAM installation, for every version (that's this wildcard), and for every microarchitecture that we have in the repository. You then end up with nested catalogs for all these examples directories, which is the most important thing for this exercise, because otherwise you get a huge catalog. So we make one for each examples directory, and everything else goes into the root catalog; even then the root catalog is still quite small, so that's fine. But as I mentioned, maybe even better — even though this repository is quite small — would be to already make one per microarchitecture: one for Intel Haswell, one for ThunderX2, one for AMD Rome. Especially looking at the future: for now this is okay, but once you add more installations it will grow very quickly, and then you'll again have to do manual work to fix everything. To be prepared for that, it's probably better to already make one per architecture, or, as I mentioned before, one per software installation, because if you're going to add a lot more software soon, that would already take care of it. So I can imagine that it was indeed confusing, and we should at least have made a remark about it, but strictly speaking our solution was not correct.
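As a sketch, such a dirtab could look like this — the exact paths are assumptions about the example repository layout, so adapt them to your own directory structure:

```
# /cvmfs/repo.organization.tld/.cvmfsdirtab
# One nested catalog per examples directory, for every OpenFOAM version
# and every microarchitecture (hypothetical layout):
/*/software/OpenFOAM/*/examples

# Alternatively: one nested catalog per microarchitecture subtree
/*
```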
Then the next question we got was about what happens if you change the .cvmfsdirtab later on and, for instance, remove an entry that was there before. The directory that was listed in that file may already contain a .cvmfscatalog file: when you create the dirtab and run a publish command, it creates all those .cvmfscatalog files, and maybe later on you want to get rid of some of them. If you just change your .cvmfsdirtab, it will not automatically remove the marker files that were created before, which is slightly annoying, but that's just how it works right now — I'm not sure if it's intended to be like that. So if you're going to change the dirtab, check whether you need to remove something manually: go into that directory, remove the empty .cvmfscatalog file, and the next time you run the publish command, it will merge the catalog back into a higher level. Finally, there were some discussions again about the requirements for a stratum 0: what kind of resources do you need, especially if you're going to do large publish operations?
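So after removing an entry from the dirtab, the manual cleanup might look roughly like this (repository name and path are placeholders):

```shell
cvmfs_server transaction repo.organization.tld

# locate leftover marker files that the updated .cvmfsdirtab no longer covers
find /cvmfs/repo.organization.tld -name .cvmfscatalog

# remove the marker for the subtree you no longer want a nested catalog for
rm /cvmfs/repo.organization.tld/some/subtree/.cvmfscatalog

# publishing merges that nested catalog back into its parent
cvmfs_server publish repo.organization.tld
```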
I don't think I discussed this fully in detail yesterday, so let me go back to this slide. First, in terms of CPUs and memory, you really don't need a lot here. For instance, for our EESSI project we sometimes do really large ingests of lots of software in one go: we build lots of software for one microarchitecture and then ingest one huge tarball with lots of files — several gigabytes or even more, and hundreds of thousands of files. That's no problem for our current stratum 0, which is a simple VM with only two cores and a few gigabytes of memory. Of course, faster CPUs may help a little, because the server has to calculate hashes for all the files and create all the catalogs, but in principle, if you don't mind waiting a few minutes when you ingest a large tarball, it's fine to use just a few cores. The most important thing is the storage space for your CVMFS repository, and there are two locations you should take into account. On the stratum 0, /cvmfs is basically a read-only file system; when you open a transaction and start adding new things to your repository, CVMFS puts a writable overlay on top of that read-only /cvmfs directory. That writable overlay uses a scratch area, by default in /var/spool/cvmfs. This means that if you're going to manually add lots of files to your repository, by just moving them into /cvmfs during a transaction, you need quite a lot of scratch space in /var/spool, and you may quickly run out if you don't have much space there by default. Then, when you run the actual cvmfs_server publish command, it takes all the files that were stored in that scratch area, in the writable overlay in
the spool directory, and then compresses them, de-duplicates them, calculates their hashes, updates the catalogs, and so on, and finally moves the compressed files to the real CVMFS storage area, which is by default under /srv. So even if you have a lot of space in /srv, you sometimes still need quite a lot of space in /var/spool as well; otherwise you may get weird errors saying you've run out of disk space. That's important to take into account. One huge advantage of using the ingest command for large tarballs is that you basically skip that staging area: it streams the contents of the tarball directly into the right place, so you don't need a lot of space in /var/spool, because it doesn't use the writable overlay. It calculates the hashes and so on on the fly and moves the compressed contents of those files into the real storage area right away. That's the advantage of the ingest command over doing it manually by opening a transaction and extracting the tarball yourself, which does require a lot of scratch space. So that's it for the stratum 0 specifications, and I think that covers most of the things we've seen yesterday. Unless there are any questions right now about the publishing section, I'll continue with the advanced topics. I don't see any questions right now — okay, then I'll continue. The last section, the last page of the tutorial website, discusses a bunch of advanced and other topics that we haven't covered this week; it's really just a collection of different things. I'll go to that page and explain the first couple of them, and Kenneth will do the last two, I think. Let me refresh the page to be sure I have the latest version. First, there's the automation part. During these exercises you've done everything manually, which is a good thing to do at least once, so that you know what
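The two approaches described above can be sketched like this; the tarball and repository names are made up, while `--tar_file` and `--base_dir` are, as far as I'm aware, the actual options of the ingest command:

```shell
# Manual approach: extraction happens in the writable overlay, so this
# needs lots of scratch space in /var/spool/cvmfs
cvmfs_server transaction repo.organization.tld
tar -C /cvmfs/repo.organization.tld/versions -xf software-haswell.tar.gz
cvmfs_server publish repo.organization.tld

# Ingest: streams the tarball straight into the backend storage (/srv/cvmfs),
# bypassing the overlay's staging area
cvmfs_server ingest --tar_file software-haswell.tar.gz \
    --base_dir versions/ repo.organization.tld
```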
kind of steps are involved and what you should take into account when you set this up, in terms of ports, which component talks to which, and so on. But as you've probably also noticed, and as I even noticed myself doing some of the demos this week, it's very error prone: you can easily mix up IP addresses, make typos somewhere, confuse the proxy with your stratum 1, or something similar, and then things break and you have to start debugging, which with CVMFS is sometimes not easy — finding the actual cause can be hard. To make this easier, it's probably a good idea, if you're going to run this in production, to use some kind of automation. Let me zoom in a little. It depends on what tools you're familiar with, but for instance in the EESSI project we use Ansible a lot, so you can take a look at the link here to find our playbooks, which are mostly based on the Ansible role provided by the Galaxy project. It's a bit tailored towards the Galaxy project — there are lots of variables prefixed with galaxy_ — but you can either override those or use the more generic variables. It allows you to install a stratum 0, stratum 1, proxies, clients — basically everything. It's a really cool and powerful role that you can easily use to roll out all the infrastructure you need, and often you don't have to configure much: maybe change some storage locations, add the public keys of your repositories somewhere in the configuration, and provide the names of the repositories you want to create or configure. Then it's just a matter of launching the playbook and it does everything for you — it can even change the firewall settings, which is quite nice. There's an alternative from Compute Canada: they also have an Ansible role to configure
clients, which is publicly available, and also one for servers, which I'm not sure is fully complete — they call it a demo release — but you can at least take a look at their Ansible roles and see how they do it. CERN also has its own automation module, but only for Puppet, so if you use or are familiar with Puppet already, you can try that one; it works for both servers and clients, so it can also do basically everything you need. I don't know if it makes sense to go into this more, but I can quickly show what such a playbook looks like. For instance, the one we use for the stratum 0 of the EESSI project: this is basically everything you need to include in the playbook. You specify which hosts to run it on — we had to make some fixes because some things didn't work on Debian systems; the browser is a bit slow with zooming in — but other than that we don't do much here. We basically just run the role itself, the Galaxy project CVMFS role, and we add the EPEL repository because that's needed for some packages. That's basically it. And then there's of course a configuration file that's specific to EESSI; it's over here. You just add some details about your repositories: the domain name, the public keys and the location where you want to store them, and the URLs of the stratum 1s. This is something you have to do once; maybe later on, if you add more stratum 1s, you have to add those URLs here as well, but other than that it's configure once, and then you can use it for all the different components. That covers the automation part — if you have any questions in between, just let me know, raise your hand, and you can ask. Then I'll go to the debugging section. I already did more or less a live demo — when was it, yesterday, I think — when I ran into an issue, so most of this I've already shown you, but
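As an illustration, a stripped-down playbook along those lines might look like this; hostnames and variable names are illustrative, not the exact ones from the EESSI playbooks or the galaxyproject.cvmfs role:

```yaml
# playbook.yml -- set up a CVMFS stratum 0 using the Galaxy project's role
- hosts: cvmfs_stratum0
  become: true
  roles:
    - galaxyproject.cvmfs
  # repository names, domain, public keys, and stratum 1 URLs would be
  # supplied through group_vars, as described above
```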
let's go through it quickly one more time. This is a very useful command to check your setup. I don't think it does an actual syntax check of your configuration files, but it does a lot more than that: it checks whether the right configuration files are in place, sets up connections and verifies they can be made, checks whether services such as autofs are running, checks permissions on directories — a long list of checks to make sure everything is in order. It's basically a bash script, so you can look up its source yourself if you want. The exception, I think, is actual syntax checking of your configuration files: those are also just bash files, which it simply sources to set up the connection, so if there's something wrong in them you'll probably get an error somewhere, but it doesn't explicitly check the syntax. Another useful command is the showconfig command. We've not really shown it this week, but CVMFS uses a kind of hierarchical configuration structure, and it can happen that you set a parameter in one file but it gets overwritten by another file, so the value you set isn't actually used. This command dumps the effective configuration that is used at runtime, so you can check: is my proxy being used, is my server URL correct, is the directory that contains the public keys correct? Those are probably the three most important settings to check when you have an error — make sure they point to the right URLs and the right directories. It's always a good thing to do when you run into an issue: check whether the live configuration is really correct. The public keys are also something to always verify, because a wrong public key file can cause weird errors that are sometimes hard to debug. One of the first things you should do on the
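In practice, the checks described above boil down to something like this (the repository name is a placeholder; `CVMFS_HTTP_PROXY`, `CVMFS_SERVER_URL`, and `CVMFS_KEYS_DIR` are the settings meant here):

```shell
# sanity-check the client setup: config files, permissions, autofs, connectivity
cvmfs_config chksetup

# dump the effective runtime configuration and inspect the critical settings
cvmfs_config showconfig repo.organization.tld \
    | grep -E 'HTTP_PROXY|SERVER_URL|KEYS_DIR'
```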
client when you run into errors: check whether your public key file is correct and whether the directory location you configured here is correct. The probe command basically just tries to connect to your repository — it tries to mount it and access it — so that can also be used to find out whether everything is okay. Finally, you can enable a debugging setting: if you want more information about a particular error, you can add this to your default.local file, pointing to a location where you want to store a log file with the debugging messages. One warning here, which I think I already mentioned a day or two ago: make sure you point to a location that's writable for the cvmfs user, because if you don't, the CVMFS client itself may even crash, and then you have two problems: it doesn't work anymore and you don't get log information either, which makes it really hard to debug. So set it to something like /tmp. If you add this to your configuration, you have to reload the configuration, which you can do by, for instance, unmounting and then remounting the repository. If the repository was already unmounted — which autofs will do after a couple of minutes — you can just re-access it, which remounts it and reloads the configuration automatically, but unmounting makes sure that it happens. When you then access the repository again, you should start seeing messages appear in this debug file. One other big cause of problems is of course connection issues, which can be caused by wrong IP addresses, wrong firewall settings, and so on. If you think it's a connection issue, it's always a good idea to verify, for instance with telnet, that you can reach the actual IP addresses of all the different machines on the right ports — for instance, check from your client that you can reach your proxy on port 3128,
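A sketch of enabling that debug log, assuming the usual client configuration layout (`CVMFS_DEBUGLOG` is the parameter meant here):

```shell
# /etc/cvmfs/default.local
# Point the debug log at a location writable by the cvmfs user,
# e.g. under /tmp -- otherwise the client itself may crash.
CVMFS_DEBUGLOG=/tmp/cvmfs-debug.log

# then reload so the client picks it up:
#   cvmfs_config reload repo.organization.tld
```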
unless you've used a different port, and the same for the stratum 1. If that's okay, then at least you know the services and ports are reachable. One other check that I always find very useful is whether you can access a kind of hidden file that's available in every CVMFS repository, via this URL: just fill in the stratum 1 IP (it also works for the stratum 0, if you want to check that one) and try to fetch the file at this location, /cvmfs/<name of your repository>/ followed by the name of this file. That should return some output that includes this, and if it returns this HTTP code, you can be pretty sure that CVMFS is returning the right thing, so at least you know the CVMFS part is working correctly and the port is open. You can also try that through the proxy, using this curl command: it's the same command, with only this part added, which means "go through the proxy with this IP address and this port, and then try to access that file from my stratum 1". That way you check both the proxy and the stratum 1, and if the first one works but the second one doesn't, that's a clear indication that something is wrong with your proxy settings, so you should probably check your Squid configuration. One mistake that's easy to make is a wrong ACL setting in your Squid configuration; then this will probably not work anymore, for instance because your proxy cannot, or is not allowed to, access the stratum 1 servers. Finally, here's an overview of the relevant log files on the different machines. The client has the debug log that you can enable. The stratum 0 is basically just an Apache server, so you can check the Apache logs. The stratum 1 is basically Apache plus Squid, so you can check the Apache and Squid logs; Squid logs are by default in /var/log/squid.
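Those two curl checks could look roughly like this; the IP addresses are placeholders, and I'm assuming the "hidden file" meant here is `.cvmfspublished`:

```shell
# Directly against the stratum 1:
curl --head http://STRATUM1_IP/cvmfs/repo.organization.tld/.cvmfspublished

# The same request, forced through the Squid proxy:
curl --proxy http://PROXY_IP:3128 --head \
    http://STRATUM1_IP/cvmfs/repo.organization.tld/.cvmfspublished
```

If the first succeeds but the second fails, the proxy configuration is the prime suspect.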
Also, the snapshot command that you run as a cron job logs to this file, snapshots.log, in the CVMFS log folder, so if it looks like your stratum 1 is somehow running behind, you can check whether anything went wrong during the latest snapshots — you might find errors in that file. The proxy itself is Squid only, so take a look at the Squid logs there if something is wrong. Besides that, you might of course also find things in, for instance, the system logs, but I've not included that here. Let me check — no raised hands yet — so I'll continue with garbage collection. I've already talked a little bit about the tags, or snapshots, and revisions that you have for each repository. If you add a file to your repository and later remove it, it might not automatically be purged from disk on the stratum 0 and stratum 1 servers, especially because by default CVMFS enables a feature called automatic tagging, which automatically assigns a tag to every publish operation you do. I can show that one more time: on my stratum 0 server I can run the tag -l option with the repository name. Every time you do a publish operation, you get a new revision, and as you can see, some revisions are already missing here, because I purged some of them or went back to earlier versions. You can assign tags — basically names — to those revisions, to easily distinguish between them, and even descriptions to make it clearer what each tag means. Now suppose that in revision two, for instance, I added a file, and I'm going to remove that file in a new revision. So I open the transaction here and remove the file. Let's do that transaction — sometimes you don't have to provide the repository name, I think, if you only have one repository, but I apparently made another one here. Yes, so
there's a second one. So if I now remove this hello.sh file and then publish, it doesn't show up here anymore, but if I look at my tags again, this is the publish I just did — the one where I removed the file. Of course that file was included earlier; I don't remember where I added this hello script, probably somewhere in revision two or three. So that revision might still include the file, and this one definitely does, and this one, and 10 and 11, et cetera. CVMFS can't remove that file from disk yet, because it's still included in earlier revisions, and if I want to roll back to such a revision, the file of course still has to be present. So it will not be removed from disk unless I start removing those revisions, and for that you need garbage collection. I can start removing tags by doing tag -r — or was it -d? let's check — -r, to remove a tag. So I can remove all these older tags, so that no revision has a tag attached to it anymore. That basically marks the revision for removal, but it still doesn't actually remove the file from disk; for that you have to run garbage collection, which you do with the gc command — it's listed over here. For this you first have to enable garbage collection on your repository, which is disabled by default. To enable it, go into the configuration for the repository, which on the server is always in this folder, repositories.d, followed by the name of your repository; in there you'll find the client and the server file with the server settings, and you should see the garbage collection setting, which is indeed set to false. I can enable this, set it to true, and then run the actual garbage collection command. I don't know if
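The sequence just demonstrated, as a sketch — tag and repository names are placeholders:

```shell
# remove a tag, so the old revision becomes eligible for garbage collection
cvmfs_server tag -r some-old-tag repo.organization.tld

# enable GC in /etc/cvmfs/repositories.d/repo.organization.tld/server.conf:
#   CVMFS_GARBAGE_COLLECTION=true

# then actually remove objects no longer referenced by any revision
cvmfs_server gc repo.organization.tld
```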
there's anything to remove already — I'm not sure anymore what I did in all those revisions and which ones I've removed — but if you run this, it checks for revisions that have been removed and files that are no longer referenced by any of the active revisions in my repository. Hm, it's not allowed; I probably have to run this as root. A segmentation fault — that doesn't look good. I'm not sure what's going on here; well, I wasn't planning on doing a demo of this anyway, so that's something to check, I guess. Maybe it's not happy because it doesn't have anything to clean up — yeah, that could be. Let's skip this for now. One other thing that I do want to mention here is these automatically generated tags. They are useful, but at some point, especially if you don't add messages to them, you'll lose track anyway of which automatically generated tag added or removed which files, so I don't think there's a point in keeping them forever, which it does by default. You can change that with a parameter in the same server configuration file I just edited, which basically says: keep these automatically generated tags only for the last 30 days, and remove all the older ones automatically. That would mean — well, I can't show it here, because my tags are all newer than that — but if you set it to one or two days, for instance, it starts removing the very old ones automatically, so you don't have to run tag -r for all those tags anymore; it automatically marks them for removal. And again, to actually remove everything that's no longer needed, you then have to run the garbage collection command. So my recommendation would be to use those together: set the auto tag timespan to something
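I believe the parameter meant here is `CVMFS_AUTO_TAG_TIMESPAN`; a sketch of the server configuration for a 30-day retention could be:

```shell
# /etc/cvmfs/repositories.d/repo.organization.tld/server.conf
CVMFS_GARBAGE_COLLECTION=true
# drop automatically generated tags older than 30 days
CVMFS_AUTO_TAG_TIMESPAN="30 days ago"
```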
useful — 30 days, or 14 days, whatever you find useful — and then make a cron job that runs the gc command every night, for instance. You can also use these options, which are useful: -a runs it for all the repositories on your server, so you don't have to provide the name of the one you want to run it for anymore; -l makes it print to a log file which files it's actually going to remove, so you can at least see what it has been doing; and -f makes sure it doesn't ask for confirmation, which I just gave by pressing yes. So this you can easily put in a cron job file and run every night, so that you at least clean up old, unreferenced data. You have to do the same on the stratum 1, by the way: you first enable garbage collection on the stratum 0, but then on the stratum 1 you still have to run this to clean up old data there as well. I think that covers garbage collection; let me see if there are any questions. There's a question in Slack: can we extract a diff of the metadata from two tagged revisions — files that have been deleted or added, so what's different between two tags? Yes, there's a diff command for cvmfs_server — I'm making lots of typos today. Here you can see how it works: you provide two tag names, and it shows the differences between them. I think Jakob explained this yesterday as well when we were talking about this. If you're inside a transaction — if you're currently changing files — you can also use it; that's actually the default, in which case it checks the differences between the last published version and the current state. So that allows you to check for the differences. Then a slightly related question: can you extract a random file from a tagged revision? So if you know a file is there in a certain tag, can you just grab a copy of that file without obviously rolling back
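Putting the pieces together, a nightly cron job plus the diff command might look like this; the schedule and tag names are arbitrary examples:

```shell
# /etc/cron.d/cvmfs_gc -- nightly garbage collection for all repositories
# (-a: all repos, -l: log deleted objects, -f: don't ask for confirmation)
0 3 * * * root /usr/bin/cvmfs_server gc -a -l -f

# show what changed between two tagged revisions:
#   cvmfs_server diff -s older-tag -d newer-tag repo.organization.tld
```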
to that tag? That's a very specific question. As far as I'm aware, there's no command to do that, but I could ask Jakob — I don't think he's in the call anymore. No, I don't see a way to do that with the cvmfs_server command, so I don't think that's possible; I'll ask Jakob and see if I can get an answer. Okay, and then there are actually two more. How do you delete or remove an ingested tarball, like the one we did yesterday — how do you undo that? The ingest command itself also has an option for that, this -d option, to remove directories — but I assume you mean that you have a tarball and you just want to give it the tarball so that it removes everything? No, I think what he means is undoing the ingest. Okay, if you just want to undo it, that's easy, because the ingest also shows up in this list of tags: if I do an ingest, you'll see a new entry appear at the bottom, and the revision number gets increased. So what you can do is just roll back to the previous revision with the rollback command, which I think I explained yesterday, but I can quickly repeat it. Rollback allows you to provide a tag — you can just pick the name of one of the tags in this list, it's this column here — and then the name of your repository, which goes over here. If you don't provide a tag name, it automatically goes back to the previous revision. So that's an easy way to revert to the previous version. Okay, and then maybe a final question on garbage collection: when you're doing a gc run, can you export the deleted files, so you get a copy of what's being deleted, just in case you need to re-ingest them later? Not that I'm aware of. You can make it print what it removes, with this -l option, and you can make it do a dry run, I think, so that it shows you what it's going to remove, but if you're actually doing
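So undoing an ingest is just a rollback; as a sketch (the tag name is a placeholder):

```shell
# list tags/revisions; the ingest shows up as the most recent entry
cvmfs_server tag -l repo.organization.tld

# roll back to the previous revision (the default) ...
cvmfs_server rollback repo.organization.tld

# ... or to a specific tag from the list
cvmfs_server rollback -t some-earlier-tag repo.organization.tld
```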
the garbage collection, then it's actually removing things; it doesn't give you a copy. I guess that could be a feature request to suggest to the developers — yeah, sure, it sounds like a useful feature to me: move them to a trash bin or something first, so you know which ones have been deleted, just in case you later realize "oh, I shouldn't have done that", so you're not actually losing the files. Yeah, that makes sense. Okay, quickly checking Slack, but I guess that's everything for now. Yes, that should be it, and for the other question, I've asked Jakob — if he answers, we'll get back to them. Great. Then the next topic is about the gateway and publishers, which we might have briefly mentioned before, and Jakob also talked about this in his CVMFS talk earlier this week. Previously it was only possible to add new files on the stratum 0 server, which is a bit annoying, because then everyone who needs to change the repository has to get access to your stratum 0 server. There's now a fairly new feature in CVMFS that allows you to set up a gateway and publisher machines: you run a gateway — which you can easily do on the stratum 0 itself — and then you set up build machines, or publisher machines, which get write access to the repository. They talk to the gateway service running on the stratum 0, put a writable overlay on top of your CVMFS repository on the publisher machine, and then just send the changes back to the gateway machine. We wanted to use this for the EESSI project as well, but we were given a bit of a warning that it's still quite a new feature and not yet used much by large production sites, so we should take care, be careful, and use it at our own risk, basically. It should be stable, but it
So I can quickly show you how that works. Again, the easiest way is to run the gateway on the Stratum 0. You can run it on a different machine as well, but I think it then needs to have the backend storage of your Stratum 0 mounted, because it has to write the files into the storage area. On the Stratum 0 it's just another package to install, cvmfs-gateway. You do need to open a port: by default it uses port 4929, but you can change that — in any case you need one more open port on this machine. Then, if you don't have a repository yet, you just create one by running the same command as you would otherwise use on the Stratum 0. And then you have to provide some configuration about who is allowed to change the repository, and which parts of it — that's a very nice feature you get here. You basically define keys. First you give each key a name or ID — you can use, for instance, the name of the person who's going to use that key — and then you define which part of the repository it applies to: the repository name is given over here, and then the path, starting at the root of the repository, for which that user should get write permission. So, for instance, this user here will only be allowed to write to /restricted/to/subdir. Then the keys themselves are stored next to that: the ID again refers to the entry above, and you fill in some kind of secret password or code, which that particular user will use as a sort of API key that gives them access to the repository. That's basically all you need on the gateway side. There's also a configuration file stored at this path — you get one by default, which you can start editing — and there's another one you get by default as well.
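The key configuration just described lives in repo.json on the gateway (under /etc/cvmfs/gateway/); here is a sketch following the format in the CernVM-FS gateway docs — the key IDs, paths, and secrets are placeholders, and it's written to /tmp here just so it can be inspected:

```shell
# Sketch of the gateway's repo.json: which keys exist, and which
# subtree of which repository each key may write to
cat > /tmp/repo.json <<'EOF'
{
    "version": 2,
    "repos": [
        {
            "domain": "repo.organization.tld",
            "keys": [
                { "id": "keyadmin", "path": "/" },
                { "id": "keybob",   "path": "/restricted/to/subdir" }
            ]
        }
    ],
    "keys": [
        { "type": "plain_text", "id": "keyadmin", "secret": "<long-random-string>" },
        { "type": "plain_text", "id": "keybob",   "secret": "<another-long-random-string>" }
    ]
}
EOF

# Quick sanity check that the file is valid JSON before deploying it
python3 -m json.tool < /tmp/repo.json > /dev/null && echo "repo.json is valid JSON"
```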
That other default one, this one here, you don't really have to change, unless you want to change the port on which the gateway service runs, for instance, or you want to put a maximum on how long a user can hold a lock on the repository — basically, if you open a transaction, how long that transaction can stay open before it is automatically closed. That's useful because, if you give lots of build machines or publisher machines access to the repository, you don't want one machine that, say, forgets to close or publish its transaction to keep holding a lock on the repository forever, so that no one else can make changes to that part anymore. So you can put a maximum on the transaction length. — And these locks, I think, Bob, can be path-specific, right? So a publisher can take a lock on just part of the repository? — Yeah, I think Jakob has mentioned that before, but in principle you just use the transaction command for this, and I don't really see an option here to define which part — or maybe that's an upcoming feature. We can ask him about that as well, because indeed I think he did mention that if you have write permission on the entire repository, but you just want to change one particular software installation somewhere in a subdirectory, you should be able to say: I only need a lock on that particular part, so that you don't lock the entire repository. — Yeah, or the build machine for Haswell and the build machine for Skylake: they're obviously not going to write to the same path, so that would make sense. — So that's one side, and the other side is of course the actual publisher machines that are going to write to that repository. There are no special requirements there, you just need a bunch of packages — and again, I think this slide is not entirely correct, we should probably update it: I think you also need the cvmfs package, and because we install cvmfs-server here, which had that dependency on
jq, you also need epel-release here again — but in principle what you need to install is more or less the same as for a Stratum 0. Then you need three keys. First, again, the public key of the repository, which you can just copy from the client, or the Stratum 0, or wherever you have it stored. Second, a special certificate, which is stored on the Stratum 0 as well: if you go to the keys directory on the Stratum 0, you will find both the public key and that certificate — that's basically what gives you permission to be a publisher. And third, there's the gateway key that we created previously in the JSON file: you need to store that API key — just your secret code that gives you access to your part of the repository — in a file with this name, the gateway key file. So those first two you copy from the Stratum 0; how you get the gateway key depends on whoever maintains the Stratum 0, but give it this name and store all three files together somewhere. It doesn't really matter where you store them, because in the next step you will provide the path to where they are, and you only have to run that command once. So once you have obtained all these keys and put them in that directory, you run this command: again cvmfs_server mkfs, but with some special flags this time. Here you provide the URL to the Stratum 0 — be careful to also include this path, and just replace the IP address by the actual Stratum 0 IP address or hostname — then some special options, then again the Stratum 0 with the port for the gateway and a sub-URL that gives access to the API being used. The -k option should point to the keys directory where you stored those three keys — so put them all together in that single directory — and then there's, again, the owner of the repository.
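Put together, the publisher setup described above looks roughly like this — a sketch based on the gateway section of the CernVM-FS docs, where the hostname, key directory, and repository name are placeholders:

```shell
# The three keys, collected in one directory of your choosing:
#   /etc/cvmfs/keys/my-keys/repo.organization.tld.pub   (repository public key)
#   /etc/cvmfs/keys/my-keys/repo.organization.tld.crt   (repository certificate)
#   /etc/cvmfs/keys/my-keys/repo.organization.tld.gw    (gateway API key / secret)

# Run once as root on the publisher machine; afterwards the named owner
# can open transactions as a regular user
sudo cvmfs_server mkfs \
    -w http://<stratum0-host>/cvmfs/repo.organization.tld \
    -u gw,/srv/cvmfs/repo.organization.tld/data/txn,http://<stratum0-host>:4929/api/v1 \
    -k /etc/cvmfs/keys/my-keys \
    -o "$(whoami)" \
    repo.organization.tld

# With a gateway, the transaction can also be scoped to a path inside the
# repository, so only that subtree is leased and the rest stays unlocked
cvmfs_server transaction repo.organization.tld/software/haswell
# ... make changes under /cvmfs/repo.organization.tld/software/haswell ...
cvmfs_server publish repo.organization.tld
```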
If you run this once as root, then later on you can just do transactions as a regular user — you only need root once, to set everything up. And of course, in the last part here, the name of the repository. So just copy-paste this — it's a bit of a long command — and fill in the Stratum 0 IP here; it will then automatically be replaced everywhere. Once you've done that, and you didn't get any errors, you can start running transactions on that machine and send them back to the Stratum 0. So it's a very nice way to allow multiple people to write into your repository. — And the ingest command will also work, I guess? — Yeah, it will also work — basically everything you could do manually on the Stratum 0 you can now do on the publisher. — OK, that covers most of this. Meanwhile, in Slack, I think we have an answer to the question on extracting a file from a specific tag: there's a checkout command that looks like it could be helpful, and a client can also mount a specific tag of the repository — if you do that, you can copy the file from that specific tag. — Yeah, that's indeed a possibility: you change your client settings, and there's indeed a parameter that lets you specify which revision, or which version, of the repository you want to mount. The checkout I'm not sure about — I think checkout allows you to make separate branches. By default everything is on one single branch, but you can make other branches, just like in Git, and then you need the checkout command to go to a different branch. So I'm not sure if that lets you get a file from a different tag. — So tags and branches are different things for CernVM-FS, while in Git they're basically the same? — Well, no — a branch is a moving tag in Git, so I guess it will be the same kind of thing here. — Yeah, OK. I'm not sure it's used a lot; I myself haven't really used it before.
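The two mechanisms mentioned for the tag question can be sketched like this — a sketch based on the client parameter and checkout command in the CernVM-FS docs; the tag and branch names are made-up examples:

```shell
# Client side: pin the mount to a specific tagged revision of the repository
# (this line goes into the client config, e.g.
#  /etc/cvmfs/config.d/repo.organization.tld.local)
CVMFS_REPOSITORY_TAG=my-tag-name

# Server side: branching, similar to Git; create and switch to a branch
# starting from an existing tag
cvmfs_server checkout -b my-branch -t my-tag-name repo.organization.tld
```

With the pinned client mount in place, copying a file out of that revision is just a regular `cp` from /cvmfs.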
before yet but I think that in the documentation they say something about data releases where this could be useful if you just want to update some data release that you put those in separate branches so we have another question so if you want to use a gateway you need a stratum zero and a gateway yes yes yeah doesn't it suffice to install the gateway part on the stratum zero yes you do so the gateway and the gateway service and the stratum zero service can live on the same server and I guess in practice that's also usually done like that I think so at least I would assume that that's how I'm using it as well but I don't really see the advantage except for maybe spreading the load a bit between the stratum zero and the gateway because then the gateway will take care of some other stuff that the stratum zero don't have to handle anymore but yeah it will make the setup a bit more complex because you have to share the storage then and and Caspar has an idea on why your your garbage collection failed or at least he saw the same issue he managed this by starting a transaction and then publishing publishing that transaction without changes so it seems like the publish reloads the server config or something which is needed to pick up the the change you made in the configuration so just changing the config file is maybe not enough seems to work thanks Caspar yeah so now it's indeed doing the garbage collection yeah maybe I should have included the minus l then it also would have shown me what it's going to remove but at least you see the process now of it's just going to search for unreferenced stuff in the repository and remove that yeah so I guess there was a reload needed of the configuration or something yeah okay then just coming back to that gateway question so this is actually my stratum zero that I've been using all week and you can see I installed the cvmfs gateway on the same machine it's even running at the moment with the gateway so all the configuration for my 
This is basically taken from our tutorial web page, and I just defined a very good secret for one user here — actually even two: one has full access to the repository, and the other one only to /bob. And then you can just start the service, and it's running, on the same machine. I see more questions popping in, but I think we'll first continue with the remaining two topics. — Yeah, maybe Bob can jump into Slack and answer them, or we can get back to them at the end. We're already over the time we had planned, but we do have plenty of buffer. — OK, so I'll take over the screen, Bob. — Yeah, one more warning that I forgot to mention. I was checking whether we actually documented it on the page, and indeed it's here — a big warning: if you're going to use the gateway on the Stratum 0, you apparently should no longer open manual transactions on the Stratum 0 itself, because they might interfere with the gateway. You can still do it, but then you have to stop the gateway first, because the gateway is now managing all the external transactions, and if you run a manual one on your Stratum 0, that might interfere with something else — it may even corrupt your repository. So if you're going to do that, stop the gateway first, and only then do the manual transaction. So that's all I wanted to say about this topic, and then Kenneth will take over. You're going to share your screen yourself, you said, right? — Yes. — OK, then I'll stop sharing. — Sharing rights, yes. OK, that looks good. These are the slides; I'll continue with mounting CernVM-FS repositories as an unprivileged user. Up until now we've installed the CVMFS client natively, which requires sudo to install the packages and also to set up the configuration. That's fine if you have sudo, like we do in the VMs here, but you don't have that everywhere — for example, if you want to access CVMFS repositories on an HPC cluster
where you do not have admin rights, or where the admins are not ready to set up and install CVMFS for you, you can actually do it as an unprivileged user as well, and there are a couple of ways of doing that. One way is through Singularity, so using a container image — this is also currently what we recommend, or at least what we have documented, in the EESSI project for accessing the EESSI repositories if you don't have root permissions. Singularity has a fuse-mount option that lets you mount FUSE file systems — which CVMFS repositories are — and you can leverage that to mount a CVMFS repository without having CVMFS on the host. You only need the CVMFS software installed in the container image itself; having Singularity installed on the host is enough to mount CVMFS repositories as an unprivileged user. So as long as Singularity is installed, you're good to go. I'll quickly demo that with the EESSI repository; this is basically copy-pasted from our documentation. What we do is create two directories that we bind-mount into the container, which are required by CVMFS: /var/run/cvmfs and /var/lib/cvmfs have to be writable for CVMFS, for the caches and things like that, and obviously, when you start a Singularity container it's read-only, so we do have to bind-mount directories into the container where we can write stuff, at those locations — that's what the SINGULARITY_BIND part does. And we also do a fuse-mount of not one but actually two repositories. The first is the CVMFS config repository — the configuration repository, which we discussed, I think, in yesterday's recap — which makes sure that all the configuration is available for the EESSI pilot repository, and that you always have the latest configuration, so you don't have to make any manual changes to the container image or anything like that. And the second is the actual fuse-mount of the EESSI pilot repository itself.
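The block being copy-pasted here looks roughly like this — a sketch along the lines of the EESSI pilot documentation at the time, so treat the exact repository and image names as subject to change:

```shell
# Writable areas that CernVM-FS needs, bind-mounted into the
# (otherwise read-only) container
mkdir -p /tmp/$USER/{var-run-cvmfs,var-lib-cvmfs}
export SINGULARITY_BIND="/tmp/$USER/var-run-cvmfs:/var/run/cvmfs,/tmp/$USER/var-lib-cvmfs:/var/lib/cvmfs"

# Two fuse mounts: first the configuration repository, then the
# EESSI pilot repository itself
export EESSI_CONFIG="container:cvmfs2 cvmfs-config.eessi-hpc.org /cvmfs/cvmfs-config.eessi-hpc.org"
export EESSI_PILOT="container:cvmfs2 pilot.eessi-hpc.org /cvmfs/pilot.eessi-hpc.org"

# Start a shell in the EESSI client container from Docker Hub;
# the image tag follows the host architecture (x86_64 or aarch64)
singularity shell --fuse-mount "$EESSI_CONFIG" --fuse-mount "$EESSI_PILOT" \
    docker://eessi/client-pilot:centos7-$(uname -m)
```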
So we define environment variables for those two mounts, just to keep the actual command shorter, and this one is an environment variable that is picked up by Singularity itself. Then the last line is the actual command: we start a shell in a container image, we use the fuse-mount option to mount both the config repository and the pilot repository, and we use a container image that we pull from Docker Hub. In the EESSI project we've created a container image that basically only includes a bare CentOS 7 operating system with CVMFS installed — that's all that's in there. So, to demo this, what I did is a full reset of the client node in Azure — the same one you are using. I'm showing here, with the archspec command — a small utility I installed — that you're running on an Intel Haswell system; that's important for the EESSI repository. And I'm showing you that there's no CVMFS natively installed on the node; only Singularity is installed. So I'll just go ahead and copy-paste this whole block and run it. I already did this before, to make sure it actually works, so it's not going to pull in the Docker container anymore — it's already cached locally. And you do see some scary-looking errors — they're actually warnings: it's failing to do something, but it doesn't really affect the functionality of accessing the repository. There is a small performance issue here, but I don't think it's a very big one, and I think it may even be fixed in more recent versions of FUSE or Singularity. So now we're in the Singularity container, and we can check /cvmfs — we have our pilot repository — and we can source the EESSI init script, which will set up our environment to start using the software. What the init script does is some detection, using archspec like I showed before, of what kind of CPU is in here; based on that it picks the particular part of the repository that is specific to Haswell, and then you're ready
to go: with the module command you can start loading the modules that are included, and there's a whole bunch of stuff here — OpenFOAM, GROMACS, TensorFlow are already included in the EESSI repository. And just to show you: if I exit the container and check /cvmfs, even the mount point doesn't exist. So it's quite easy, even as a non-privileged user, to get access this way. And I have another window here, which is another VM, actually running in AWS. I want to show this one because it's an ARM system — an AWS Graviton 2 — and I can use the exact same command here. We've built the container image such that it's not hardcoded to x86: it also works on ARM, thanks to this last bit, uname -m, which spits out what type of architecture we're running on. If I run that on the Azure VM, I get x86_64; if I run it on the Graviton 2 VM in AWS, I get aarch64, so 64-bit ARM. I just copy-paste the exact same command, I get the container as well — and here I don't even get the warnings — and I can run the exact same source command to source the very same initialization script. So I'm not telling it anywhere what kind of CPU I'm running on, not even whether it's x86 or ARM; the script just figures it out: OK, this looks like a Graviton 2, so let me give you that part of the repository, and make sure you get software that's not only working but also properly optimized for the hardware on which you're running. So you can see that, thanks to CVMFS, we're able to do this no matter whether we're in the cloud or on other systems, and with some additional logic in how the software is organized in the repository, plus some scripting on top of that, you can make it work nicely even on very different CPU architectures. So that's one way: through a Singularity container that basically only includes CVMFS, so you don't need anything more — that's very good.
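The architecture selection just described boils down to a one-line shell idiom — this is only that idiom, not EESSI's actual init script:

```shell
# uname -m reports the machine architecture: x86_64 on Intel/AMD hosts,
# aarch64 on 64-bit ARM hosts such as AWS Graviton 2
arch=$(uname -m)

# The same string is used to pick the matching container image tag
echo "docker://eessi/client-pilot:centos7-${arch}"
```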
If you don't have Singularity available, there's another way: a separate cvmfsexec utility, which sits in a separate GitHub repository, and which you can also use. But there are some limitations — or, I should say, requirements. If you're on a fairly recent operating system, like RHEL 8 or CentOS 8, which has user namespaces enabled and allows FUSE mounts as a regular user, then you can use cvmfsexec without having Singularity installed, and it lets you mount the repository, even without admin privileges, under /cvmfs itself. It will only be mounted for your own user and that specific session, but at least you can mount it at the correct path, which is typically important for the software installed in that repository to actually work properly. So this does have some requirements: you need access to user namespaces, so that has to be enabled on the system, and you need to be able to do a FUSE mount as a regular user. It says here that this works on older operating systems as well — that's true, but then you have to add some kernel parameters at boot time to make sure user namespaces are enabled. I think they're enabled by default on RHEL 8, and probably also on CentOS 8, unless they were actively disabled, so at least on modern operating systems you have a bigger chance of this working. I won't demo it here — you can check the README of the GitHub repository for how to use the utility — but it's obviously a very handy one as well. And it comes with a separate script, singcvmfs, that leverages Singularity when it's available: if you don't have user namespaces or any of the other requirements, but you do have Singularity, that script basically automates what we showed before, so you can use a Singularity container to get access to the CVMFS repository that way. So there are definitely options.
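Basic cvmfsexec usage, per its README, looks roughly like this — the repository names passed to it are examples, not a prescription:

```shell
# Fetch the cvmfsexec utility (separate GitHub repository)
git clone https://github.com/cvmfs/cvmfsexec.git
cd cvmfsexec

# Bundle a CVMFS distribution plus default configuration into ./dist
./makedist default

# Mount repositories under /cvmfs without root (inside an unprivileged
# user namespace) and run a shell in which they are visible
./cvmfsexec cvmfs-config.cern.ch sft.cern.ch -- /bin/bash
```

The mounts only exist for your own user, inside that session; once the shell exits, they are gone.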
Then the other part — but I think we covered this already yesterday: with a configuration repository you can avoid having to manually maintain things on your client. You set things up once, you mount a configuration repository which knows about the different CVMFS repositories — even across different Stratum 0 and Stratum 1 servers — and since it is itself a CVMFS repository, it also gets updated automatically when changes are made. So it's a sort of one-time thing, and then you're always sure you're picking up the latest configuration of all those repositories — it takes away the maintenance problem you might otherwise have. But the important limitation is that you can only have one configuration repository in use at a time. As we showed yesterday, you can actually have multiple ones installed, but only one will be active and usable by CVMFS, and this is a pretty hard limitation that's unlikely to be removed anytime soon, so it's definitely something to take into account. Of course, you can combine a configuration repository — which gives access to a bunch of CVMFS software repositories — with manually maintained configuration for additional repositories, so you can still get access to as many repositories as you want; you'll just have to maintain those configurations yourself. And there is the cvmfs-contrib GitHub organization, where some of these configuration packages are hosted — it's typically the big organizations that use CVMFS which collect their repository configurations there, so they're easy to pick up — and EESSI has its own small configuration repository as well, which only gives you the pilot repository. So yeah, we'll probably have to provide other ways too: proper documentation for doing the manual configuration, or new packages, to make it as easy as possible.
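On the client, selecting a configuration repository comes down to a one-line client parameter — a sketch; a config package like the EESSI one mentioned above would effectively ship something along these lines:

```shell
# Client-side config snippet (e.g. in a file under /etc/cvmfs/default.d/):
# use EESSI's configuration repository, which then supplies the
# configuration for the repositories it knows about.
# Only ONE configuration repository can be active at a time.
CVMFS_CONFIG_REPOSITORY=cvmfs-config.eessi-hpc.org
```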
I don't know if there are any questions on that — Bob, maybe you can raise them to me if there are any — or maybe we should wrap up first and then do some more Q&A. Let me wrap up the slides first, and then we'll do more Q&A. So, just to wrap up the tutorial: the VMs you have in Azure — you can still use them today, but we will actively destroy them, and remove the user account you have access to, tonight around 8 p.m. European time. So if you still want to play around there, you can. If you're done with them, we just ask that you terminate the cluster yourself: you go here at the top and use the terminate button — and remember that this kills all the nodes and throws away all the storage and all the configuration you had, so you will lose everything. Keep that in mind, of course, and only do it when you're really done playing. If you haven't terminated your cluster by tonight, we will happily do it for you, but then everything will be gone and destroyed as well. And then the final slide: if you want to reach out to, or get engaged with, the CernVM-FS community, there are different ways. They have a JIRA bug tracker, where they keep track of ongoing work, reported bugs, and feature requests — you can jump in there and create any issues you find relevant, and if you're hitting any bugs, it's definitely very useful to report them there. They have a Mattermost server — Mattermost is an open-source equivalent of Slack, so an interactive chat environment. You do need to create a CERN lightweight account for that — anyone can create one — which gives you access to certain services at CERN, including the Mattermost server they have. And — this is brand new, they only announced it a couple of days ago — they now have a forum as well, a Discourse forum: for the new people, or the hipster people, it's called Discourse nowadays; it used to just be called a forum, and it really is a forum — you can post things there, people get an email and can reply if they want to, and it's all public and available on the web. And the timing of this is pretty good, because next week there's the CernVM workshop as well. It's similar to the EasyBuild user meeting: people giving presentations, discussions back and forth, and it's open for anyone to join, free of charge. Jakob will give a talk there on the roadmap; there's a talk by the author of the cvmfsexec tool, who will explain how it works and what you can do with it; Bob is giving a talk on the EESSI project; and there will be several talks about the combination of containers and CernVM-FS as well — unpacking containers into a CVMFS repository, why that makes sense, and all the tooling they have around that. OK, with that, maybe we can take a look at whether there are any more questions — feel free to raise your hand, or raise them in Slack. Is there anything we should cover from the Slack channel, Bob? — I think on Slack everything seems to be covered. — OK, nice; I don't see anything in Zoom here either, so we'll wrap up, because we're quite a bit over time according to the planning. But you know where to find us: in the EasyBuild Slack there's the CVMFS tutorial channel, or, if you want to get in touch with the CernVM-FS developers themselves, I guess Mattermost is a good starting point — that's an interactive chat as well. Oh, there is one more question coming in, from Nicolai, who is not in Zoom — he's watching on YouTube, with a small bit of delay — so let's let him type out his question... or maybe it's a practical issue: yeah, a probe is failing. So he's trying something and it's not working out; I guess it's better to handle that in Slack, because it will mean some back and forth. OK, so we'll wrap up here, people. Thank you very much, everybody, for joining us. I think this worked out really well — it was the first time we did a CernVM-FS tutorial, and we were actively working on it in the days prior to the tutorial, but I'm quite happy with how it has worked out, and I think it's a very valuable contribution to the community as well. Thank you very much! We'll wrap up the recording and the stream here, and we'll be around in the Slack channel for sure.