Okay. Hi, I'm John Dey. I'm here from the Fred Hutch in Seattle, Washington, and I'm the author of easy_update, which was on the survey. Can I just do my own survey right now? How many people use easy_update or have heard of it? Oh, okay, a few users out there.

I just threw this in real quick. So the Fred Hutch — we're in Seattle, Washington, home of three Nobel laureates. We have a pretty nice campus and a pretty good budget; we're federally funded. And we have very close to 300 faculty at this point, which we refer to as PIs, principal investigators — people who have funding to do research. That's our campus.

The Fred Hutch has had a really interesting relationship with R. The main maintainers used to be on campus — they were there eight years, a long time — so there are a lot of leftover R users. The main Bioconductor people were there up until about a year and a half ago, when somebody's wife got a different job and they ended up moving to a different part of the country. So we have hundreds of R users, and they're very demanding people. When a new R comes out, they want it within days. So I'm very highly motivated.

When your local R build has 800 modules and you have to go through that list by hand and update them all for the next release of R, it's a level of work that can't be performed by humans anymore. It's just not possible. It's a really difficult task. And of course Python is in a similar situation. So that's the motivation: R is coming out four times a year, Python comes out a couple of times a year, there's Python 2 and Python 3, and again, these things have hundreds of modules, and it's really hard to go through and update the versions for all of them. It's also very hard to unwind all the dependencies and make sure you have them, and that they're ordered correctly in the easyconfig.

So R is pretty straightforward. It has a really nice API.
The metadata is always correct. It's a beautiful thing — whenever I'm updating R, it works really, really well.

Bioconductor is a little different. Their metadata is JSON files that you download. So when I'm running easy_update there's a slight penalty to pull those files down, but we do it once, we load them, and if I see them locally I'll just open them again. And again, the Bioconductor metadata is beautiful. It's always exactly right, the dependencies have already been researched, it's very well maintained, and they only update that file on new releases of Bioconductor.

Python, on the other hand, is just a swamp. Anyone can upload a Python module, and the kinds of things I see are: the project name is different from the module name, which is different from the download name. The dependency names people use are like street slang — they don't actually match the project name. They might be referring to the import name, they might be referring to the project name. So I've had to write a lot of extra logic to tease that apart and try variations: dashes, underscores — one's allowed in URLs, one's allowed in the language. Something to think about the next time you design a language, Guido: should language constructs match across different domains?

Perl — I had to do BioPerl around Christmas. It's not in the code, I haven't pushed it yet, and it's really ugly. I've been looking for an API for Perl. I joined the Perl developers mailing list and asked about it, and they just treated me like a crazy person. Why would you want an API? Why would you use the internet for Perl? It's in Perl. So the only way to really get the metadata you want is to do the latest install of Perl and then use Perl tools to interrogate it. I did find that some people out there have taken the CPAN database, put it all in Elasticsearch, and made it searchable.
But it's not so much an API as a curiosity. Elasticsearch — I'm a huge fan, it's a great product, and you can certainly get data out of it in a lot of different ways. But for something like "give me the latest version," there's no real nice way to do that. So the next time I'm under a lot of pressure to update Perl, I'll probably do a better job and get that code out there. So I'm supporting three and a half languages right now.

And then the dependencies. Again, R is really great: if they list a dependency, it's actually needed, and no one ever leaves stuff out. In Python, people forget things. Again, it's PyPI.org — it's just humans typing. There doesn't seem to be any gateway that really enforces this; they don't put it in a Jenkins container, they don't try to make sure it actually works or does what you say it does. I don't want to come here just to whine about Python, but if you run EasyBuild and there are some mistakes, you get what you pay for. The PyPI maintainers are very aware of this, and there's a blog post somewhere that explains the situation: they basically say, yeah, we're getting whatever people put in, and we can't fix it. I can say it's gotten a lot better. Between a year and a half ago and this spring, they really shut down pypi.python.org — you don't get a 404, but they're not serving it up anymore. Through that whole process people have been cleaning things up, and it does look a lot better now than it did even two years ago. So someone's paying attention and working on it.

This came over the mailing list just last week, so I thought I would throw it into the slide deck: somebody wanted to know what would happen if you included rstan in R. A feature of easy_update is that you can run it on the command line — as long as you give it the R version (you can also give a Python version) along with --search.
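Conceptually, what --search is doing is walking a package's dependency tree from the repository metadata and emitting exts_list entries in install order. A minimal sketch of that idea — the metadata table here is toy, abridged data standing in for a real CRAN query, and the function names are my own:

```python
# Toy metadata standing in for real CRAN responses (abridged, made up
# for illustration -- real rstan has more dependencies than this).
METADATA = {
    'rstan': {'version': '2.18.2', 'depends': ['StanHeaders', 'inline', 'loo']},
    'StanHeaders': {'version': '2.18.1', 'depends': []},
    'inline': {'version': '0.3.15', 'depends': []},
    'loo': {'version': '2.0.0', 'depends': ['matrixStats']},
    'matrixStats': {'version': '0.54.0', 'depends': []},
}

def resolve(pkg, seen=None, order=None):
    """Depth-first walk: every dependency is emitted before the package itself."""
    if seen is None:
        seen, order = set(), []
    if pkg in seen:
        return order
    seen.add(pkg)
    for dep in METADATA[pkg]['depends']:
        resolve(dep, seen, order)
    order.append(pkg)
    return order

def exts_list_lines(pkg):
    """Render the resolved tree as cut-and-paste exts_list entries."""
    return ["    ('%s', '%s')," % (p, METADATA[p]['version'])
            for p in resolve(pkg)]
```

The depth-first ordering is what guarantees the entries can be pasted into an exts_list as-is: by the time EasyBuild reaches a package, everything it depends on has already been built.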
And the output of this is something you can just cut and paste with the mouse and put in your easyconfig's exts_list. So rstan sits at the bottom here, and above it, in two columns, are all the dependencies. That's what it takes to walk that tree. If you were just building a single R instance with rstan in it, this is what would be in there.

The other feature I have — which I haven't made public yet because the code is so ugly I'm embarrassed by it — is this: you'll end up having the standard R distribution and then your custom one. After you've generated this huge list, there's almost certainly a lot of duplication between it and what's in the official EasyBuild distribution. So you'd like to be able to subtract that out, because again, we're humans, and I don't want to spend all day picking through this file looking for duplicate entries. The diff is a really weird feature — not something you'd use every day, unless you're forced to build R over the weekend.

And I'm really into automation; it's something I think about a lot. So 2019a was just released, and I'm thinking: why don't we have all 1000 EasyBuild modules updated over the weekend? We should just be able to go out to the internet and update all the version information — not just the entries in R and Python, but all of our bioinformatics modules. And there's no answer to that. But I think about maybe the idea of hints. The easyconfig itself tells you a lot about where the software might be coming from — GitHub, a private repo, SourceForge — and there are APIs for those. I don't know, it's probably not worth the effort, but just talking about it — I think in the future people need to be cognizant of things like that, and we should be educating programmers about it when they're young, in school.
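As one example of such a hint: for projects hosted on GitHub, the releases API can usually answer "what's the latest version," provided you normalize the tag. A minimal sketch, assuming the project actually tags its releases (the helper names here are my own, not easy_update's):

```python
import json
import re
import urllib.request

def normalize_tag(tag):
    """Pull a dotted version out of a release tag: 'v1.4' -> '1.4',
    'release-2.10.1' -> '2.10.1'. Tags with no version pass through."""
    m = re.search(r'(\d+(?:\.\d+)+)', tag)
    return m.group(1) if m else tag

def latest_github_release(owner, repo):
    """Query GitHub's public releases API for the newest tagged release.
    Unauthenticated calls are rate-limited, and projects that never tag
    a release will return 404 -- exactly the 'please tag your software'
    problem."""
    url = 'https://api.github.com/repos/%s/%s/releases/latest' % (owner, repo)
    with urllib.request.urlopen(url) as resp:
        return normalize_tag(json.load(resp)['tag_name'])
```

This is roughly the 90% case: when maintainers tag properly, stripping the leading v is all it takes; the OpenFOAM-style special cases still need manual handling.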
And GitHub's a really great thing, but even when we look at GitHub, it's just a morass. It could be master.zip. It could be v1.4 instead of... It could be releases, it could be versions, or no information at all. When I find that in the wild, I'm that guy putting in an issue: please tag your software, please make a release, please get that v out of there. So I'm not just standing up here in front of you complaining — I'm trying to do something to improve how software in general gets released. We're all using EasyBuild because we want reproducible science.

This is something I've had in mind for a while, having a --update option — trying to figure out what the latest version is. And you're right, getting it perfect is going to be very, very hard. It's a mess. But I think we can probably do it for 90%. GNU software has a very standard way — just look through the directory and sort it the right way. And for some, even most, of the software on GitHub, it's still okay if they do proper release tagging: if you see a v, just throw it out. So we can probably do 90% well, and then the very special ones — OpenFOAM, or things with no tagging, or I don't know what — those you do manually. But it would still save you a lot of work on the 90%.

I might take a stab at the GitHub stuff as a first go and see what I can get — just run it over the weekend and see what the results are. Oh yeah, like Linux From Scratch — that's such a great site. It's not really releases, it's not a distribution, but all the modules that are out there in Linux are there. It's a place I go to look for stuff a lot, to see what's possible. It could be in CentOS or Ubuntu, but it's kind of a neutral site for me. And then Nanopore — they're difficult to deal with.

So, like a lot of people, we started using EasyBuild around 2015.
At the time we just thought, oh, this is a great tool for building software — we'll be able to save our easyconfigs, we'll be able to reproduce things. And we are saving them: every time I download a source file for some project, I save it along with the easyconfig. It's amazing how many source files disappear over the years, or how many projects get pulled from GitHub. Again, we want to make science reproducible, so keeping those source files on site — and spending the money on the storage to hold them — helps with that.

But we were just all over the place with our easyconfigs. We'd grab something, and it's like, well, we're using this version of zlib — so we'd end up making a new easyconfig because of something very inconsequential. Who cares which version of zlib is in some bioinformatics package? The researcher doesn't know; they just want their stuff to work. So over the last year we made a big effort — and 2018b, of course, is a huge release — to make sure all the dependencies are the same. I'm in that boat: I don't want them to change. I want consistency across the line. So we're very much committed to 2018b, to making sure everything's consistent, and, for the first time, to contributing back. Our R and Python have always been very divergent, and we're not going to do that anymore. We're going to use the standard R and the standard Python that come down from the community, and then make a bundle with all the extra modules we want — the same for both. So I just want to encourage more sharing, more uniformity in the recipes, and not so many of them.

And again, personally — we're upgrading our cluster's operating system, and what a task. I'm the guy who's rebuilding all the software.
I'm maybe about 80% of the way there, and I've got until the end of March. So about the community — I know it's out there, and there are issues and things like that, but where could I post a "modules wanted" list or something? It doesn't seem like anyone has any spare time. But we'll get this somewhere. My boss has mentioned the same thing — maybe getting funding for a position. That money would look well spent: if it's national money, the National Institutes of Health spending it, then we can show we're contributing back into the community and it's being shared with a large number of people. I think that's our angle on funding now.

Maybe a central place to list wanted software is actually a good idea, because many people are sitting on things, or not contributing, because of a lack of time. They could just tell you: look, I have something here, I don't have time to do a proper contribution, but take it and use it as a starting point. And I'm sure some of the software, at least, would already be handled by someone in some way. You know, as everyone posts their URLs or GitHub repos, I'm trying to type them all out — I'm going to have to parse through them at the end of the show to get my list of places to go.

Maybe we should even have a separate repository on GitHub where you just open an issue — a "wanted" repository. If people have an easyconfig file, they just copy-paste it into the issue and you can take it from there. That could work. Yeah, we could at least try it and see if it works. So just quick-and-dirty contributions, and then someone else takes it from there — that would be a big, big help. And then maybe, if enough people use it, we get it polished, check it in, and get it upstream.

So, yeah. No, I get it — I was going to ask: which OS? Yeah.
So we're moving from 14.04, which is rather embarrassing, up to 18.04, and from toolchain 2016b to 2018b. There's a little hook you can put in Lmod that writes a message to syslog, and we're using that hook — we want to make some changes to it. Then we take that syslog information and put it into Splunk. For our entire cluster it takes hours to run these reports, but we can actually get a report on every module load command across the whole cluster. That's the basis for how I pick what to build for the next release.

We have 138 modules in the bio moduleclass — that nice little line at the very, very bottom of the easyconfig where it says moduleclass. That is super helpful. So I'm just looking at layer 8, at what the researchers see — those bioinformatics modules — and all that other stuff, all those other libraries, just take care of themselves with --robot. So: 51 million module loads — our cluster's nowhere near the size of some that have been presented — and 1.3 million of those are bio packages. That's my target for what to rebuild.

So there's a pull request that's still in the works, right, about updating dependencies? With this kind of transition, where you're going from 2016b to 2018b, it would automatically pick up all the latest versions and insert them into the things you're trying to move over. So there's no hand-editing of anything like that — it automatically updates whatever's already available in 2018b, right?

Well, I'm not sure I understand.

Say you have something in 2016b that uses an old version of Autotools, and there's a later version in 2018b — it will automatically update that in the transition.

And that's available in EasyBuild today?

No, it's not — we're still waiting for a review. But it's working.

Okay, that would be a big help.
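The reporting side of that pipeline — an Lmod hook writing to syslog, then aggregating loads per module — can be sketched. The log line format below is invented for illustration; real Lmod syslog messages and the actual Splunk queries will look different:

```python
import re
from collections import Counter

# Toy syslog lines; the exact message format an Lmod hook emits is
# site-configurable, so this layout is an assumption for the sketch.
LOG = """\
Mar  2 10:01:12 node01 lmod: user=alice module=R/3.5.1-foss-2018b
Mar  2 10:01:30 node02 lmod: user=bob module=SAMtools/1.9-foss-2018b
Mar  2 10:02:05 node03 lmod: user=alice module=R/3.5.1-foss-2018b
"""

def load_counts(log_text):
    """Count module loads per software name (the part before the first '/')."""
    counts = Counter()
    for line in log_text.splitlines():
        m = re.search(r'module=(\S+)', line)
        if m:
            counts[m.group(1).split('/')[0]] += 1
    return counts
```

Run over a cluster's worth of syslog, a count like this is what tells you which of the 138 bio modules actually earn a rebuild for the new toolchain.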
Some things I've done: I just went to the 2018b repo and took every bio module there, whether I needed it or not. I looped through that last week, and I think I built 57 packages with a script that ran over the weekend. So yeah — just scoop up whatever I can. And that's the end of my talk, just a quick update on easy_update.

Oh — do I have a demo? The other half of easy_update is something called easy_annotate. The same code that reads an easyconfig — I can take that, unwind it, and publish it. This site is GitHub Pages built from Markdown, attached to our EasyBuild repository. Whenever I check in a new module it's kind of awkward — you have to do it twice: you create all the documentation, and then GitHub publishes the Markdown site for you. So, our modules — we have a lot of them over the years; here's my current 2018b one. What easy_annotate does is publish a list of all the modules that are inside the easyconfig. With this site, a user can just go and hit the little search box, and if they're looking for a module, maybe it's already installed. Probably about half of the help desk tickets I get now are for stuff I already have installed that people don't know how to find. So I'm trying to keep everything we have published — all of our bioinformatics stuff, all of our easyconfigs — in a user-friendly way, with an overview and a little tutorial on how to use modules. In general we can just direct users to this page. And all of these entries are linked back to their original sites, so if you want to know what a module does, it's right there. All the features in easy_update for crawling the web and finding these links — it's all in the code, so I can annotate this and produce Markdown. This whole site, again, is all generated.
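The easy_annotate idea can be sketched roughly like this: read the exts_list out of an easyconfig and emit linked Markdown. The parsing here is a simplification (a literal-eval of the list — real easyconfigs need EasyBuild's own parser), and the CRAN URL template is a stand-in for the per-package homepages easy_annotate actually resolves:

```python
import ast

# A fragment of an easyconfig's exts_list, as easy_update would emit it.
EXTS_LIST = """[
    ('StanHeaders', '2.18.1'),
    ('rstan', '2.18.2'),
]"""

def exts_to_markdown(exts_src, url='https://cran.r-project.org/package=%s'):
    """Render exts_list entries as a Markdown bullet list with links.

    The URL template is an assumption for this sketch; easy_annotate
    resolves each package's real homepage from its metadata."""
    lines = []
    for entry in ast.literal_eval(exts_src):
        name, version = entry[0], entry[1]
        lines.append('- [%s](%s) %s' % (name, url % name, version))
    return '\n'.join(lines)
```

Publishing the result through GitHub Pages is then just committing the generated Markdown next to the easyconfigs, which is why the site can stay in sync with the repository.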
And of course, since we have a module list of just bio, I can publish that too. So these are all the modules we have in bio. Mostly, again, having it on a web page makes it searchable, so people can just go look here before they open up an issue with us.

So — questions?

First of all, great talk. It's really great that you're tackling a problem many of us fight. Out of curiosity, since you're kind of connected to the Bioconductor community — by proximity, whatever — R essentially sticks to a semantic versioning scheme, but not officially. I recall the day the Bioconda community asked the R developers to announce that they stick to semantic versioning, and they never got a reply — never a confirmation that R would adopt semantic versioning. Even though, if you look at the details of every release, it's essentially semantic versioning, and the minor and patch releases could be depended on. Have you heard anything from the R developers?

No, we're kind of out of touch right now, but I know people who know people, so I could get to them. But yeah, it's pretty frustrating, right? Some of the R modules obviously don't move at the same rate — some of these things have been sitting around for a few years.

It wouldn't be a problem if the update weren't forced, because it's actually compatible in most cases.

Yeah, and I've had to fudge things. When things are broken, I just drop down a version and try again. Yeah, it should be better.

Thank you. Thank you.