Hi, I'm John A. I'm from the Fred Hutch in Seattle. Thanks for having me; it's great to be here, and thanks to HPCNow! for hosting us. The Fred Hutch is a cancer research institute in Seattle; we also do HIV research. We're publicly funded, so we get most of our funding from the federal government. We have about 300 PIs at the site and about 3,000 people on campus. We're also associated with the University of Washington, and with the Seattle Cancer Care Alliance, which is the actual clinical setting for patients.

We've been using EasyBuild since 2015. I've had a lot of experience with modules; I've been using them since the '90s, back when I was at NCT. So modules weren't new to me, but EasyBuild was. We initially thought of EasyBuild as just a tool to build software, so we started with an initial pull of the repository and started building, and we installed EasyBuild on a machine that we had been using for years as a developer platform, which had an enormous amount of software already on it. So in our early use of EasyBuild, our builds were polluted: a package could pull in OS versions of low-level libraries instead of the EasyBuild versions, and the line between the OS and EasyBuild got blurred. We're now at a point where our next cluster will probably be deployed with the most minimal Ubuntu we can find. Nothing in there at all: no compilers, no make. If it's not loaded through EasyBuild, it's not going to work.

I'm also the person responsible for R and Python, and we have some very demanding R users. R development lived at the Fred Hutch for eight years, and we still have a lot of those developers on campus. Bioconductor was also developed at the Fred Hutch, and we still have a lot of those people too, although the project moved to Buffalo a few years ago, which is a crazy move. So my users expect R to be out like the day after a release; it's a very demanding group. To prepare for that, I'm usually rolling out the development copy of R on a weekly basis, and I have an easyconfig for that. My users get a pre-release of what's coming, and I get to practice my build. So it's not that I'm quick; I've already practiced that release a week or two in advance.

Obviously most of our packages are bioinformatics; we're not involved in chemistry or physics. We also publish all of our EasyBuild recipes, and you can find them online. The repository looks a lot like the community one: an easyconfigs directory, and underneath that the alphabetical directories with all of our easyconfigs. So if you're looking for a new version of R or Python, we might already have it, although maybe not in the toolchain that you want.

So again, we started out just using EasyBuild: we took off, started building R and Python, and kept adding modules to the bottom of the file. We weren't really paying attention to the EasyBuild community and what was available out there, and we diverged from it rapidly. Now we're very focused on the community; we see a lot of value in being able to get easyconfigs from it. So last year, maybe over a year ago, I started to untangle our R easyconfig from the one the community publishes: tease all those different pieces apart, and always use the base R and base Python.
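As a rough illustration of those weekly pre-release R builds, an easyconfig for one might look something like the sketch below. Every concrete value here (version, snapshot tag, toolchain, source URL, dependency list) is an illustrative assumption, not the actual Fred Hutch recipe.

```python
# Sketch of a weekly pre-release R easyconfig; all values are placeholders.
name = 'R'
version = '4.0.0'
versionsuffix = '-devel-20200121'   # hypothetical weekly snapshot tag

homepage = 'https://www.r-project.org/'
description = "Pre-release (development) build of R for early testing."

toolchain = {'name': 'foss', 'version': '2019b'}

# R-devel snapshot tarball; the URL layout is assumed for this sketch
source_urls = ['https://stat.ethz.ch/R/daily/']
sources = ['R-devel.tar.gz']

dependencies = [
    # the real recipe carries many more low-level libraries, since
    # nothing may come from the OS on a minimal-Ubuntu node
    ('zlib', '1.2.11'),
    ('libpng', '1.6.37'),
]

moduleclass = 'lang'
```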
On top of those we add our site-specific modules through bundles. And of course our current philosophy is to publish as much as possible now; I think three of the last R releases were mine. The other thing we're doing with R: the community has a separate Bioconductor bundle, and that's very confusing to our local users. So all the Bioconductor packages our local users want are loaded into our custom Fred Hutch version of R, and easy_update is able to look at both CRAN and Bioconductor. When you load the Fred Hutch R, it has well over 1,000 packages now, including the Bioconductor ones, so we don't support Bioconductor as a separate package anymore.

Here's what our Python looks like now: our custom version, with the fh1 (Fred Hutch version 1) suffix. You can see the dependency list; there are a lot of version suffixes, and easy_update has to read all of that in. Then our exts_list is custom to our site. When I started developing easy_update, it only read a single file. So to make an easyconfig like this work with easy_update, what's happened in the last year is that I open the file up and actually read the dependency list and the exts_list. About eight of the dependencies are modules with their own extensions, so I open every one of those up, read its exts_list, and store all of that in one big list of packages that are already provided. Then, when we scan the local exts_list, we're always checking it against what's already included in those other files. That was a major upgrade this year.

Also, initially it would only work with modules named Python or R. But now we have so many bioinformatics packages written in Python with huge dependency lists that when I scan a file, I need to figure out what language it is. So I look for the exts_defaultclass, and I look in the dependency list; hopefully the very first dependency is R or Python, and I use that version to create the version suffix.

Here's easy_update on the command line. I'm looking just at the dependencies for rstan, and if we run it like this, you can see there are 48 dependencies, so it looks like a rather large package. One comment here: for plain searches I have rver and pyver arguments, because otherwise I don't know what language we're using or what version I'm searching against. Those command-line arguments are just hints for easy_update about which repository I'm searching. I think I have an open issue from somebody who thought that if they put those there, I would update the file, but they're really just hints.

So the new feature of easy_update is that it reads all the existing dependencies from a file. Here I give it my R easyconfig and then tell it to search for rstan, and of those 48 dependencies, almost all are satisfied by the existing R that I'm specifying on the command line. So if you're building a custom package that just needs rstan, you can see there are only four packages left to add. I use this a lot when building new packages, or even just for metadata searches. I don't have it on the slide, but if you pass --meta, it dumps a ton of information about the module; it's kind of interesting to see what the contributor thinks is important. And that works for both R and Python.
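The core of that "what's already provided" check might look roughly like the sketch below. This is an illustrative reimplementation, not the actual easy_update code; in particular, `find_dep_path` is an assumed helper for locating a dependency's easyconfig on disk.

```python
# Collect every extension already provided by the easyconfigs in the
# dependency list, so only genuinely new extensions get added locally.
import ast
import re

def parse_easyconfig(path):
    """Crudely pull exts_list and dependencies out of an easyconfig.

    Easyconfigs are Python and may use templates and variables, so a
    robust tool should use EasyBuild's own parser; a regex plus
    literal_eval is enough for plainly written files in this sketch.
    """
    with open(path) as fh:
        text = fh.read()
    fields = {}
    for key in ('exts_list', 'dependencies'):
        match = re.search(r'^%s\s*=\s*(\[.*?\n\])' % key, text, re.M | re.S)
        fields[key] = ast.literal_eval(match.group(1)) if match else []
    return fields

def already_provided(main_ec, find_dep_path):
    """Names of all extensions shipped by the main easyconfig's dependencies.

    find_dep_path is an assumed helper mapping a dependency tuple, e.g.
    ('R', '3.6.2'), to the path of its easyconfig, or None if unmanaged.
    """
    provided = set()
    for dep in parse_easyconfig(main_ec)['dependencies']:
        path = find_dep_path(dep)
        if path:
            for ext in parse_easyconfig(path)['exts_list']:
                provided.add(ext[0] if isinstance(ext, tuple) else ext)
    return provided

def new_extensions(main_ec, wanted, find_dep_path):
    """Filter wanted package names down to the ones that still need adding."""
    have = already_provided(main_ec, find_dep_path)
    return [pkg for pkg in wanted if pkg not in have]
```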
So again, our current approach is to use bundles for everything, and that's what I see in the community too. There's so much Python code being written in the bioinformatics world, and I needed to improve how easy_update works in order to install these things. WOT is a single-cell RNA tool from the Broad, and it's a very, very complicated package; building it by hand means suffering through repeated failures with dependencies not met. You can see that when I'm using easy_update, it reads all the dependencies found in the dependency list, then it finds WOT and starts to unwind all of its dependencies recursively. This one package extends the base Python and adds 50 modules to it.

John, a question on this. Where are you getting the information about what requires all these Python packages? Because this is not tracked in PyPI, is it?

PyPI does; the dependency listings are in there. Although PyPI is still very problematic, as you know, and not everything is documented. Once you download a package, you might look at the dependency list in the setup.py, and it often doesn't match what's in the metadata. Since it's maintained by humans, there are lots of errors in there. So what I've done is I have a separate script for that, which is probably not available anywhere, but what I do is a pip install of WOT and then just check what pip did and use those versions.

And where is pip getting that information from?

God knows, I don't know. Maybe it scans the setup.py or the requirements.txt in the source tarball, or it asks PyPI. I also have a lot of scripts lying around in that directory that I've not checked in. I'll actually go to PyPI, find the sources, untar them, put them in a directory, open up the setup.py file and scan that for dependencies; I've also looked at the requirements file. I'm a huge critic of PyPI; that's a whole other slide deck. But it's getting better: in the last two years I've seen a lot of improvements, and people are putting in better metadata now.

I'll just go into it briefly. What we do now when you open a new PR for easyconfigs is that it goes through Jenkins, which actually tries to build the thing. With PyPI there's no such process; anyone can just push something up. Until PyPI has CI built in, where if you say a package needs these requirements it tries to build with them and flags the failure, we're going to keep having these problems. Although there are still a lot of people out there building software outside any such process; I think even WOT still lives at the Broad, not checked in yet, and CI would be another barrier to getting people to maintain their software. I do see Facebook investing money in PyPI, and their key thing is that they want modules that are signed; they're very serious about security and making sure modules are what they say they are.

The reason I pick the versions based on the pip install is because that's what everybody's doing. Everybody does pip install wot, and whatever pip decides is how it should be. Rather than scanning the metadata myself and trying to figure things out: I mean, even the people who develop WOT probably do pip install wot themselves, and if they're happy with what pip does, then we should be too.

I'll look at the code and see what it does.

The pip code? Yeah.

So those are the updates that I've done to easy_update.
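That "let pip resolve it, then record what it chose" approach could be sketched like this. It's illustrative only, assuming a POSIX virtualenv layout; the real script surely differs.

```python
# Install the target package into a throwaway virtualenv, then record
# exactly which versions pip resolved, ready to paste into an exts_list.
import json
import subprocess
import sys
import tempfile
import venv

def pip_resolved_versions(package):
    """Return {name: version} for everything pip installs for `package`."""
    with tempfile.TemporaryDirectory() as env_dir:
        venv.EnvBuilder(with_pip=True).create(env_dir)
        py = env_dir + '/bin/python'   # POSIX layout assumed
        subprocess.run([py, '-m', 'pip', 'install', package], check=True)
        out = subprocess.run([py, '-m', 'pip', 'list', '--format=json'],
                             check=True, capture_output=True, text=True)
        return {p['name']: p['version'] for p in json.loads(out.stdout)}

if __name__ == '__main__':
    pkgs = pip_resolved_versions(sys.argv[1])   # e.g. "wot"
    # Emit exts_list-style tuples for the easyconfig
    for name, version in sorted(pkgs.items()):
        print("    ('%s', '%s')," % (name, version))
```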
A few of the other things we're doing at the Hutch: again, based on the earlier problems we had with a very polluted build environment, I now build every easyconfig inside a Docker container. I have a Docker container with a very particular version of EasyBuild and a very particular version of the toolchain, and we're very careful about how we build that container. It's a real chicken-and-egg thing: we need dev tools to install this stuff, but after that, I don't want dev tools. So it's one huge RUN command that installs the Ubuntu dev tools, installs EasyBuild, and then removes the dev tools, all in one line. The resulting container is a true clean room. It literally does not contain make; it doesn't have GCC; it doesn't have any development libraries. It's a very, very clean environment, and we only use it one time. There's so much software out there that will insidiously pull libraries from the net, things you're not aware of, and drop them into your environment. So we just do a docker run, build a single easyconfig, and when the run is over, the container is discarded. Each time we build a new piece of software, we start with a clean container and a clean build environment, and that has improved our build quality substantially.

And my usual complaint about toolchains: when a new one comes out, it's really hard to get our next builds out, because I need a hundred other dependencies first. So we've also started a bounty program; we're working with HPCNow! to help us build as many of those dependencies as needed and help the community out. That's an ongoing project and kind of an experiment for us. That's a very old photo.

Let me briefly share this; oh, I need to share my screen. We're using modules of course, so we can do a module spider, and every time I install a new piece of software at the Hutch, I run a couple of tools I have and update our website. So for the first time in many years we have a current, up-to-date list of every piece of scientific software at the site, published on this website. There's a little tutorial in here on how to use modules and how to request software, and then some really specific sections on R, Python, and our bioinformatics software in general. If you're looking for all the different Rs we have, they're listed here. And since we're using Lmod, anything whose module class says bio gets published as a special list too. I remove all the duplicates, so any given name is only used once. It's kind of cheating, but you can use the browser search: if you were looking for something that did single-cell, you could just type that and let your web browser search through the list. It's just Python that's publishing this. For any of these entries, you can click through to the actual site where the software lives; I tease all that metadata out of the easyconfig and annotate everything.

The other thing I do, alongside easy_update, is something called easy_annotate. Since I do a lot of work with the exts_list, the question is how to expose all of those thousand R modules and close to a thousand Python modules to our end users. Again, I open the easyconfig, pull all those lines out, go to PyPI, find the website, and publish it all in a list. This is the module list for our current Python right now, fully annotated: a complete list of everything, and you can click on anything in here, go to the site where the software is, and get the docs.
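Going back to the container builds for a moment, the one-build-per-container loop could be driven by something as simple as the sketch below. The image name and mount path are placeholders, not the actual EBCB setup.

```python
# Build each easyconfig in a fresh, throwaway clean-room container.
import subprocess
import sys

IMAGE = 'fredhutch/easybuild-cleanroom:2019b'   # hypothetical image tag

def build_one(easyconfig, software_root='/app'):
    """Run a single EasyBuild build in a fresh container, then discard it."""
    subprocess.run([
        'docker', 'run', '--rm',                         # container dies after the build
        '-v', '%s:%s' % (software_root, software_root),  # persistent software tree on the host
        IMAGE,
        'eb', easyconfig, '--robot',                     # EasyBuild resolves deps inside the clean room
    ], check=True)

if __name__ == '__main__':
    build_one(sys.argv[1])   # e.g. R-4.0.0-foss-2019b.eb
```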
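And the annotation step, in miniature, might look like this. The link patterns are simple assumptions (CRAN for R, PyPI for Python); the real easy_annotate resolves homepages from the repositories' metadata.

```python
# Turn an exts_list into a Markdown table of packages linked upstream.
def annotate(exts, language):
    base = {
        'R': 'https://cran.r-project.org/package=%s',
        'Python': 'https://pypi.org/project/%s/',
    }[language]
    lines = ['| Package | Version |', '|---------|---------|']
    for ext in exts:
        name, version = ext[0], ext[1]
        lines.append('| [%s](%s) | %s |' % (name, base % name, version))
    return '\n'.join(lines)

# Example: a couple of entries as they would appear in an exts_list
print(annotate([('numpy', '1.18.1'), ('scipy', '1.4.1')], 'Python'))
```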
All of those documentation pages I was showing you are actually GitHub Markdown; they're the flip side of our repository, all contained in our docs directory. And from there you can find all of our easyconfigs, which are published. We're using GitHub for all of this: it's our easyconfig repository and it's also our documentation, and the scripts that create the web pages live in the same repo, so it's all nicely done up in one single bundle. A git push puts my recipe out there, and if I run my scripts correctly beforehand, it also creates all the documentation for it. So as a new piece of software is published, all the documentation that goes with it is published at the same time. And that's the end of my presentation.

That last bit is probably something you could automate with GitHub Actions, so that a push updates the docs and publishes the new page.

Yes, that's the next step. It would save me a couple of minutes; it's just a couple of command lines now. But yeah, that would be version two.

Any questions for John?

The mechanism you're using with a container to build easyconfigs, is that publicly available so other sites could use it?

It is. Let me see here... in the Fred Hutch GitHub organization there's a repository called EBCB, EasyBuild Container Build. Catchy.

I have a question regarding the container build. You mentioned that you have this long RUN command to install EasyBuild and then remove GCC and the rest. Did you think about using Docker multi-stage builds, where you have a build container and then assemble the clean container from it?

I have. I've looked at that and done a lot of experiments with it, and I have not had any success with it at all. I don't know; it may not work the way I think it does. Obviously I'd have many stages in there: an OS stage, an EasyBuild stage, Lmod; well, Lua is way up at the top, you need that first, and it's not part of base Ubuntu. But if it's working for you, I'd like to see an example.

Just adding on to Ken's point about the GitHub Actions workflow: there's another action that may help you. It's a URL checker. Since easy_annotate is collecting all those URLs, you want to make sure you can still validate them. Sure, they're probably still working, but there's an action that does that check, which could help when you're publishing that documentation. It's on the GitHub Marketplace; it's called URL Checker. I was curious about it, tried it out, and it works pretty well, and it runs as a GitHub Action.

Yeah, again, whatever is published I just take at face value and publish that, and occasionally there are some broken links. That would be a good thing to add. Okay, thank you.