my opening slide. Yeah, the slide is there. So good to go. All right. So tell me when to start. Yeah, you can start. Okay, so I'm Robert McLay. I'm the chief cook and bottle washer of Lmod. I've been working on Lmod for over 10 years now. I'm going to give a brief history, and I'm going to assume that everybody in the room has heard of modules and has at least occasionally heard of Lmod. Is that a fair statement? Yeah. Okay. This is really going to be an advanced-topics kind of talk. Since everybody should be fairly familiar with Lmod, or at least with modules, I thought I'd show some interesting features of Lmod itself, things that you might not easily find or know about. So I'm going to give a brief outline of what I think are the major features, a little history, and then go into advanced topics and mention some things I'm thinking about.

So, Lmod reads modulefiles written in TCL or Lua. It has the one-name rule, which means you can only have one version of GCC loaded at a time. It supports a software hierarchy, but one is not required. It has a way to remember all the modulefiles, or as many as you want to tell the system about, in a cache, which means you can do things like a fast module avail or a fast module spider. Modules can have properties: things like whether a package is built for GPU or MIC, or whether it is alpha, beta, or deprecated. Lmod supports semantic versioning, which means that 5.10 is newer than 5.6, which is not what you would get with a lexicographic comparison. It has support for families, so you can have only one compiler loaded, or one MPI stack, or one Python; for example, if you have Python 2 and Python 3 on your system, you could prevent people from having both of them loaded at the same time. You can also track what modules are being used. I think this is an important feature, and one we use a lot to figure out what people are using and what we can deprecate. There are many other features: ml, collections, hooks. So you can tailor Lmod to work the way you like your system to work. Collections mean you can have a default set, a group of modules, that you want loaded either at startup or on demand: you say module restore gnu, which gives you a whole bunch of GNU stuff, and then you switch to Intel or whatever. And ml: I could wax for hours on how wonderful I think ml is. If you haven't been exposed to ml, ask one of your colleagues who has, and they'll tell you why they can't give it up.

All right, so that's a big list of some of the features that I think are important in Lmod. A little history, one slide's worth. The original version of Lmod only supported a name and a version. Lmod 5 supported category/name/version, so you could say compiler/gcc and a version number. Lmod 7 supports name/version/version, so the name of the module could be impi and the version could be 64/18.0.1.

Okay. So now I'm going to talk about some of the features I think are helpful in using Lmod. Suppose there is a module you want to remove, but you don't want to do it immediately; you want to contact only the users that care. There's a file called admin.list, a.k.a. the nag message file, where you can put in a message that says this module is going away, or this module has a problem.
Please use something else, or whatever you want in the message, and only the users who load that module get the message. So you don't have to send out a blanket message that says please stop using XYZ version 2. The format of the nag message is described in the Lmod documentation, and there is support for pattern matching on either the module name or the file name.

Okay. Another recent feature in Lmod is depends_on. Let's say modules X and Y depend on module A. You could have said that X and Y have prereqs on module A, but depends_on handles this slightly differently. If you purge and then load module X (ml X; ml followed by a module name means load that module) and then unload X, Lmod will load A and then unload A. If you purge again, load X and Y, and unload only X, A will be retained. If you load X and Y and then unload both of them, A will have been loaded and then unloaded. And if a user loads A, X, and Y and then unloads X and Y, A is kept, because the user requested A themselves. So it's a convenient way to manage dependencies and not leave a bunch of dependent modules lying around. And I know that modern versions of EasyBuild support this, right, Kenneth? Yes. Okay.

Okay. So, since EasyBuild makes it so easy to generate lots and lots of modules, and many sites use a single location to store these files, you may want to allow old versions to stay, but you don't want to overwhelm your users with a huge list of modules that are two or three years old when they should use the most recent versions. So you can hide modules. Through a system module RC file (there's a standard one) or a user module RC file, you can include a statement such as hide-version flu/1.2.3: that version is hidden from avail, spider, and keyword, but hidden modules can still be loaded. Sites can use this for deprecated modules or for experimental modules. Users can see the hidden modules with module --show_hidden avail. Sites must explicitly list all hidden modules; there's no pattern matching there. So Ward provided a hook that allows you to use pattern matching: there's an isVisible hook you can use when you want pattern matching. You are passed the full name, the short name, and the file name. The full name is something like gcc/7.2, the short name (sn) would be gcc, and fn is the file name, and an isVisible flag is passed to you. You use whatever rules you want to set isVisible to true or false, and that way you can mark which modules you want hidden. And there's a fully worked example in the contrib more_hooks SitePackage.

Okay. So here's something that came up with somebody I'm working with at the University of Texas. These people are big into Singularity, and they want to supply a huge list, like 8,000 containers, which package up a lot of bio software: Bowtie and all those kinds of packages. We have a small but growing number of biologists who use, or want to use, those apps, but we don't want to expose that huge list to all of our users. The bio people are the ones that tend to have the largest lists that I know of, but any group might want this. So you want users to opt in for these modules: you'll say something like module load bio-containers or bio-packages.
And so people who don't load that module won't know about them at all; they'll just see the bio-packages module. Everybody who wants to opt in just loads the meta-module, which exposes them. So let's say bio-packages is the meta-module. Among the things it does, it prepends MODULEPATH with the location where these bio-package modules live. But you can also extend the cache: you don't have to have a single cache file that includes this huge tree of modules, you can have a separate cache just for that. You create a small rc-style file (similar to lmodrc.lua, and its location has to match what the bio-packages module says); in it you define a Lua table that gives a cache directory and a timestamp file, and with that you can dynamically extend MODULEPATH to include this huge list of packages. The trick is to not let spider know about the prepend to MODULEPATH: spider won't see the prepended MODULEPATH, so it won't walk that tree. But when a user loads the module, the mode won't be spider, so MODULEPATH gets extended and the separate cache file is there. So all these modules just appear for the people who want the bio-package containers, and when they unload the module, the whole thing disappears. It's a nice way of hiding collections of modules that most of your users don't want to see. And this all works right now; no changes required to Lmod.

Robert, one small question. Yes. Why is it important not to prepend to MODULEPATH in spider mode? Is that to prevent these modules from ending up in the general cache? Yes. But then what if they do module spider on, say, Bowtie? It will not see those modules, right? Correct. Okay. Correct. So that's the trade-off. Is there a way to force spider to pick up on the packages? Well, you can do that, but then... here's the problem. Well, I guess that's actually Alan's question. What we're asking is: is there a forced spider mode that will jump around this check? No, at least not at the moment. And there's no way to do it otherwise. I mean, if you want that, then you take out that if block and just put in a plain prepend, and it would just work. But then it would be part of the cache, so it would always be visible. Yeah. So you're either on or off. Yeah, that's cool. You know, most sites may not need this. But for this particular case where there are 8,000 modules listed, yes, it is hiding them from people who might want to know about Bowtie, but I think it's a useful trick. And sure, if you want it to be part of the standard package, then you don't need to do this, and you don't need a separate cache file. The reason this exists is the way our file systems work: we have what we call the work file system, which spans multiple systems, and we want to be able to provide this set of containers to all the systems that mount work, which is everything now. But we don't want these containers to be loaded by everybody.
So this is a convenient way for us to do that. Okay. Oh, let me skip something. No, that's it. Okay. Well, I wanted to make this short, so it's short.

So here's an idea; this actually came out of a discussion with Xavier. One of the complaints is that if you have many, many TCL files in your module tree (.version files or .modulerc files written in TCL), Lmod sets up a TCL interpreter for each TCL file, every time. I want to look and see if there are ways to improve the speed of this. One possibility is to optionally combine the TCL and Lua interpreters together into one package to improve performance, and use that to parse the .version, .modulerc, and TCL modulefiles. I've only thought about this; I don't know how feasible it is or whether it speeds things up at all, but it's an idea I'm looking at. I'm hoping it will speed things up for the sites that have lots and lots of .version files.

So, conclusions: you can find the latest version of Lmod at GitHub, the last stable version is at SourceForge, and the documentation can be found at lmod.readthedocs.io. So I'll entertain any Lmod questions. Okay. Any questions on Lmod in the room or on the chat? Maybe not on the chat either. No, there don't seem to be any questions for now. All right. Well, give me a minute to switch... switch talks, I mean. Okay. Yeah, ready to move on. You can start when you're ready. Okay. I am ready.

Okay. So, Kenneth, let me steal some time to talk about Lmod and XALT. I'm going to talk about the other project I work on, which is XALT. It's a way to track HPC usage via job-level data collection. XALT is trying to figure out what's running on your system. It was a National Science Foundation project. It's a census taker: what programs and libraries are running on your system. It's currently running at lots of places: TACC, NICS, the University of Florida, KAUST. There's some work going on at Livermore. It integrates with TACC Stats, which is another way to get detailed performance counters, and commercial support is available if you want it.

So we're trying to understand what your users are doing. What programs and libraries are your users using? What are the top programs by core hours, by counts, by users? Are they running their own programs, or programs built by somebody else, either by the system or by another user? Are the executables implemented in C, C++, or Fortran? If you build under XALT, I know the linker, so I know what compiler you used to link the program, assuming you used one; I assume that main is written in whatever language you linked with. You also want to track the number of MPI tasks and the number of nodes, and track threading through OMP_NUM_THREADS.

The history is that Mark Fahey and I separately came up with independent packages, and then we combined forces: ALTD and Lariat were combined into XALT 1, with Reuben joining a little later. And then, once the National Science Foundation funding finished, I extended it to XALT 2. The earlier versions only tracked MPI programs; now we can track all programs. The design goals are to be extremely lightweight, to provide some provenance data (how did your program run, what libraries and applications were used), and then to collect the data in a database where you can do analysis. XALT wraps the linker to track executables.
We provide our own ld command, which intercepts the user's link line. It generates key/value pairs, and it uses the trace/map option to ld to find out what .o's and .so's were part of your executable. This is collected into a JSON file, which can be transmitted via a file mechanism or via syslog; I'll talk about that more later. It can also add code that executes before main and after main completes. This can then be moved into a database, either by writing files or by sending data to syslog, and you can either collect it using the database I provide or, if you're willing to take the JSON, you can stick it into any kind of system you want, like Elasticsearch, Logstash, and Kibana.

Since I know the author of Lmod, I can get Lmod to provide some useful information. Lmod's spider walks the entire module tree, and it can build what I call the reverse map, which maps paths to modules. So I can map both programs and libraries to modules: if I have a path that points to some HDF5 library, I can map it back to which module it came from. This is also helpful if you want to do function tracking. And if you don't run Lmod but run Tmod, you can still use Lmod to build a reverse map without making users switch to Lmod. You're not required to have a reverse map: if you don't want to run Lmod at all, or don't have some way to generate it, then you can just live with the paths, and that may be sufficient for most sites. But I like to think in terms of modules; I don't really care where they're located. So I like this mapping from paths back to modules.

One of the problems with XALT is that it's running in the user's environment, at least on the data collection, or data generation, side. So you've got to protect XALT from unexpected changes in the user's environment. The way I do this is by remembering the LD_LIBRARY_PATH and the PATH that were there at configure time, and using those for all the internal executions that happen within XALT, the C++ programs I call internally. This protects me from the user doing something bad to my environment; by setting both LD_LIBRARY_PATH and PATH, I can avoid that. So all you have to do is get it right once at configure time, and then the user's changes will not affect XALT.

So what's the point of collecting this data? We had a case where a lot of people were using the large-memory nodes, and we figured out that many of them were running a particular chemistry application. By using TACC Stats, we were able to show that they did not need the large-memory queue. They could instead use what we call reduced wayness: instead of asking for all the cores, they could ask for half or a quarter of the cores, and get double or quadruple the amount of memory per task. So they could run in the normal queue and not clog up the large-memory queue.

So, tracking non-MPI jobs. As I said, originally we only tracked MPI jobs, by hijacking mpirun or ibrun or srun or whatever. Now we use an ELF trick to track all jobs, all programs. ELF is baroque.
Among the things you can do is define some functions and add them to the init array or the fini array. Anything in the init array gets called before main, and if the program completes, anything in the fini array runs after main completes. XALT also adds a signal handler, so if a program segfaults or whatever, I've hooked it in so the fini code gets called as well. But since you only get one signal handler per signal, a user can override this. The C code is compiled and linked in through the hijacked linker; it can also be injected via LD_PRELOAD. By default, we only use LD_PRELOAD, but you can do both. As long as your program is dynamic, not completely static, then if you set LD_PRELOAD, those objects are read and dynamically added to your program. So if you have any dynamically linked library and you set LD_PRELOAD, XALT will hook into your program and get called before main and after main. For static binaries, you have to include it in the executable. So, depending on your site, you can have it built into the executable or just set via LD_PRELOAD; by default, I'm only using LD_PRELOAD.

The advantage XALT has is that it can track all programs that run on your system: every cp, every mv, every quick call to Python. And I don't want to do that, because I'd get overwhelmed with the data. Typically, I only want to track executables on the compute nodes, and I don't want to get overwhelmed by the data. So the idea is that I'm not going to get everything, but I want to get a flavor of what's going on; I want to know what's going on at least in some detail. It's sampling. It's like what the pollsters do: I'm going to collect enough data to really understand what's going on without having to talk to every single person. So you can tell it to track only when you want it to. You can control where it runs; in particular, you might only want to track what's going on on a compute node. You can also filter based on path: I want to ignore everything that's in /bin or /usr/bin. There's an issue about standard error getting closed before the fini code gets run; XALT takes care of that. Lmod has nothing to do with this.

We've also got sampling for non-MPI programs, and this is site-configurable. We use flex to compile in the patterns; these regular expressions control what to keep and what to ignore. There are ways to control this. There's an accept list: I want to accept certain things, like things that are in /usr/bin or /bin, I want to track Perl, or whatever, but I want to ignore everything else. So here's a configuration file that says three things. You can control what pattern of hosts you want to be on; in our case, our compute node names always start with a c, three digits, a dash, and three more digits and something else, but you could have any other pattern that your site uses. Then there are the executables that I want to track: for example, I want to keep DDT, but I want to skip anything that starts with /usr or anything that starts with /bin. But anything that looks like Python, anything that looks like R, I want to keep; those are special packages, which I'll talk a little bit about later.
But there's special treatment for these things that are PKGs, and anything else I can skip. You can also control what parts of the environment you track; you might want to skip a lot but keep some things, because the environment can be huge, and that can take up a lot of room in the database.

So, I want to have minimal impact on all programs. Currently XALT, at least in my testing, takes less than a tenth of a second. Non-MPI programs only produce an end record: normally XALT produces something before main and after main, a start record and an end record, but for non-MPI programs I decided I'm only going to write an end record, because I want to be able to sample. This is site-configurable, but these are the rules we're using: any program that runs between 0 and 5 minutes has a 1 in 10,000 chance of being recorded; between 5 and 10 minutes, a 1% chance; and anything longer than 10 minutes gets recorded 100% of the time. So now I can track things like Perl and awk and sed and gzip, etc., and at least get some flavor of how important they are on our system, and then use the fact that I know what the sampling probability was to estimate how much they're being used. It's only an estimate, but it tells me a lot about what's going on in the system.

Okay. So now, these things called PKGs, these packages: we can track things like R, and we want to know what libraries, what packages, are used by R. We can do this by having code that runs as part of your R session and then generates a record. So I'll know you used R because you ran the R program, but I'll also know which packages were used. I'm working to support Python and MATLAB later.

Robert? Can I ask a question? So, on the previous slide, you said that you're looking at awk usage and gzip usage. How can you do that, if XALT 2 does not wrap the launcher any more? So that's what this was all about. What I do is I have a .so, which contains these init functions and fini functions, and I set LD_PRELOAD to point to that .so. Then ELF automatically combines those into the running executable, and so you can hijack things. You can do all kinds of things to an ELF program: if it's a dynamically linked program, then you can dynamically link in stuff that the user knows absolutely nothing about. So, any application... okay, so statically linked applications you cannot get, right? That's correct. Although, well, on our system there is no libc.a, so there's no way to make a static application on our system, except on a Cray. It's pretty hard to make a static binary, and under most modern Linuxes even mv and cp are dynamic. Okay, thanks. You should have seen Victor's face when he realized what you were doing. It was cool. Okay. So, yeah, we can track Perl, awk, sed, etc. Obviously I'm not getting all of them; for any of those that take less than five minutes, I only have a 1 in 10,000 chance of getting them. But at least it gives me a sample of them.
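As a rough illustration of the sampling rule just described, here is a toy Python sketch; this is not XALT's actual code, just the thresholds quoted above and the scaling that turns the sample back into a usage estimate:

```python
import random

def sample_probability(runtime_seconds):
    """Probability of recording a non-MPI program, using the
    example thresholds from the talk."""
    if runtime_seconds < 5 * 60:       # under 5 minutes
        return 1.0 / 10000.0
    elif runtime_seconds < 10 * 60:    # 5 to 10 minutes
        return 0.01
    else:                              # 10 minutes or longer
        return 1.0

def maybe_record(runtime_seconds):
    """Decide whether to write an end record for this run."""
    return random.random() < sample_probability(runtime_seconds)

def estimated_total_runs(recorded_runs, runtime_seconds):
    """Each recorded run with probability p stands in for roughly
    1/p actual runs, which is how the usage estimate is made."""
    return recorded_runs / sample_probability(runtime_seconds)
```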
I have a friend, whom some of you know, Peter, who came up with a tool that is somewhat similar to XALT. Because they were dealing with a system that was mainly a bio shop, and most of the programs they ran were awk and sed and Perl, those kinds of programs, they wanted a way to track that. So this gives us a way to do that without overwhelming the database. And as I just said, we can support tracking packages: we've got this running right now for R, and we're looking to add it for Python and MATLAB.

As I said, XALT knows about all programs that run; they don't have to be built with XALT. Commercial applications, or things that were built on another system, won't be linked with the XALT linker. But if you do build with the XALT linker, XALT puts in what I call a watermark, which records information about the program: who built it, on what machine, and what modules were loaded. There's a program called xalt_extract_record, which we supply, that will pull out this information. Here's a piece of what that looks like: it tells you what the epoch time was, the date, who built it, what compiler was used, the path to the compiler, what the host was, and what version of XALT generated it. This information is in the executable if it was built with XALT; if it wasn't, there's no watermark, so I don't know anything about who built it and when. A funny story: we were benchmarking for a new proposal, what we call our Frontera proposal, and we were competing, trying to reproduce benchmarks that were run on the Blue Waters system at NCSA. One of my colleagues tried to run an executable and it failed, and he ran this program and discovered it had been built at NCSA and not by us. So that was kind of cool.

Some new features: XALT can know if a GPU was used. This is an optional choice. It will know if one or more GPUs were accessed; there's no performance data. This comes from Scott McMillan from NVIDIA. XALT can also now track what's going on inside a Singularity container: you just provide some environment variables (SINGULARITYENV_-something-or-other, LD_PRELOAD, whatever), and it can write to syslog or use the file transfer mechanism; this is also from Scott. One of the things this pointed out to me is that many systems run Red Hat-style distributions (we run CentOS), but containers, even though they use whatever kernel you're running, are essentially doing a fancy chroot, so they're running in a slightly different environment, and where CentOS puts things and where Ubuntu puts things are slightly different. So I modified the way XALT works: instead of hard-coding the paths at configure time, I'm now using a limited PATH to control what gets run where. That's been a new change to XALT as well.

So, if you want the reverse map, you'll need Lmod, and there's a pointer to where to get Lmod. You can get the latest version of XALT at GitHub, at xalt/xalt, and you can get the documentation at xalt.readthedocs.io. Okay, I'll entertain questions on either Lmod or XALT. Any questions? All right. We'll walk up to John, who has a question, which may be related to R. Hi, this is John Day, and I don't really have a question. I actually wrote something exactly like XALT a couple of years ago, but I am plugged into the Unix kernel and I get an event for every single exec call.
And unlike your program, my code doesn't get inserted at the start of the program. It's a stack-based thing where I'm picking up this giant fire hose of events off the kernel. My first filter is the UID: anything that is run by root, I don't care about, so I throw all that stuff away. And much like yours, I also output key/value JSON straight to syslog and collect all that. It is pretty interesting what you can find and detect in patterns of behavior on the cluster. But I never thought of looking at the loader. I'm just curious if you have comments, if you've looked at going straight into the kernel and grabbing kernel execs.

I haven't looked at that. The advantage of hijacking the linker is that I can put in, among other important things, this build UUID. When I start a program, I look to see whether the watermark is there, and then this build UUID is available to me, so I can connect the execution that was run with where it got linked. That's really why I want to do it: if I hijack the linker, I get that connection, and that provides useful information. Among the things I really enjoy is that by doing this, I can back out what you linked with, so that answers the C, C++, Fortran questions. I guess it could be Go as well, although I think Go produces static applications. Anyway, this gives me a way to get that information. And you're right: the problem with XALT is that you've got a supercomputer generating all this data, and trying to find ways to get useful information out of it without being overwhelmed by the fire hose is really the problem. But yeah, send me some links if you've got anything written up on your stuff; I'd like to look at it. Okay, he's nodding, so that's coming up.

It sounds like this may be a way to fix the lack of being able to track static binaries. Because... No, well, I'm not sure. You can track static binaries, but only if XALT was hijacking the linker, right? Correct. Yeah. But as I said, I'd like to know how many sites are actually producing static binaries. Yeah, that's a good point. I certainly can't unwind as much data as what you have. What I get is just given to me from the kernel, so it's what's in the proc file system: the UID, the parent ID, the command line. So it's not as valuable; I have to tease it apart to try to figure out what users are using.

And this gives me a way to track things. The way we understand our system is typically through the module usage data and through XALT. What was nice is that when we were proposing a new system to the National Science Foundation, we could say: we know what percentage of usage is by users, by SUs, sorry, core hours. We didn't have to base it on surveys and such; we just know. But it's surprising what we learn. Every time somebody asks me a question, I say, oh.

Robert, how do you actually get XALT into production? Do you have a module for XALT, or do you put it in the path of all users so they cannot really unload anything and it's always enabled by default? So what we do is: it's a module. And I actually like having it as a module, because occasionally XALT can cause problems.
It's rare, and it gets rarer all the time, but something happens. So it's a module. On our system, every user, by default, gets the TACC module, and in that TACC module we load XALT. So you have to explicitly unload the XALT module, or do a lot of extra work, to make sure that XALT is not there. And as I said, I'm not going to collect everything, so if some users don't have XALT, that's okay. Most of our users are unsophisticated enough that they don't know what modules are, so it just happens. And since I'm sampling, I can't get everything; I would need another system, almost as big as the system I'm trying to measure, to keep track of it all if I did.

Can you comment again on why you don't want to track very small, very fast jobs? Sorry, ask the question again. Why don't you want to track very, very fast jobs? Because, well, here's what happened to me: on one of our systems, somebody was processing image data for dark energy studies, or dark matter, I'm not sure which. Each program took about two seconds, and in two hours they generated five million entries. It took me five days to read in those five million entries, and that's just not feasible; I have to be able to read it in in less than a day. So I just cannot track everything.

And do you have any plans to drop the MySQL support for something like Elasticsearch? You're not required to use MySQL as the database collector. It breaks into two pieces: it generates JSON, and then I provide a tool that reads that into MySQL. But Kenneth and I, in a day (well, he did most of the work, I watched), got an ELK setup going. So if you understand ELK, it's quite easy to do. My problem with ELK, at least with my limited understanding of it, was that it throws data away after, I guess you can control it, but after something like three to six months, and that's not something I want. The other thing, and again this is me speaking with limited knowledge, I'm sure somebody else knows more than I do, is that XALT generates many, many different one-to-N relationships: an executable has 10 or 15 or 20 shared libraries. That gets complicated with databases that are designed to be fast. I've never found a good way to do both: these time-series databases are great, they're fast, but they don't really support one-to-N relationships, at least I don't know how to do it. So the answer is you can do what you want. I provided one of those choices, but I haven't had time to look at ELK. It certainly would be something that we could also support, because it's just going to syslog, and syslog can be processed multiple ways.

Okay. Hi, I'm Chris Mesas, and I'm speaking on behalf of Victor; he's just nodding now. I'd like to comment on the issue of the static binaries. At least on our side, we see a little more frequently than in the past that people request software written in esoteric languages like Go, Rust, or D, with LLVM support and stuff like that. And instead of providing the entire toolchain that goes with it, we rather install a static binary, if it's at least performant. I don't know how other sites here would react to this attitude, but at least we would actually require that such a static binary can be installed, and we would like to see it tracked. Fair question.
I mean, my understanding is that you can't use the LD_PRELOAD trick for those, but you can set up XALT to use the wrapped linker and insert the code there, and that works; at least as far as I know it works, I've yet to see a case where it doesn't. As long as XALT's ld is there, the binary will be tracked. But, you know, my solution is limited to those two methods.

Hi Robert, it's Amy here. I have a question, kind of a best-practices question. So you have these two methods in XALT, hijacking the linker and using LD_PRELOAD, but I guess both have their downsides. We use EasyBuild, so you would have different linkers spread out over the file system. If you hijack the linker, I guess you would have to make each one of these installations aware of it? No. How does it work in that case? Well, you have to be using Lmod, but Lmod supports a thing known as priority paths: the XALT module has a path that's more equal than any other. Every time a module gets loaded, it puts the XALT bin path ahead of everybody else's. But with less priority than the path of XALT, kind of thing? No, the other way around: it has a higher priority. It has a higher priority, yeah. Okay. Okay, that makes sense.

And so, for people who are using XALT already, because we aren't: how are people normally using it? Are they hijacking the linker or using LD_PRELOAD? A mix of both. Because one requires that you recompile your stack and the other... Well, no, you don't. As I said, the only thing you lose is the watermark information; you don't have a watermark, but you still know the name of the executable that ran. So you don't have to rebuild anything; you just lose a little bit. You still have to set LD_PRELOAD, right? Yes. So that means that if you have a static binary that was built pre-XALT, or with XALT not loaded, it will not be tracked. I think part of the answer is that you would enable both: if you're hijacking the linker and you're building static programs, you will get that information, because XALT is included in the binary and it will spit out all the data. And LD_PRELOAD you use for things like cp and mv and things you haven't compiled yourself but that are dynamically linked, so those are picked up as well, with whatever information they have. That way you get most of the data you can get. The only thing you lose is static binaries that were built without XALT being around. That's the only thing you lose.

I had a question as well, unless somebody else has more questions. Okay. One thing you mentioned is that you already have support for tracking the R libraries that are being loaded, or imported, and you can do the same for Python, but it's not there yet. Is that right? Yes. I haven't had a chance to work on it, but I know how to do it. You know how to do it. Yeah. And I know how to do it for MATLAB as well. Right, it's just a matter of finding time. And what about Perl, same mechanism? I don't know. I mean, I could track Perl; I don't know whether I have the ability to go into the use mechanism and have my own wrapper in there that could generate records.
It depends on whether the Perl runtime gives you a hook or something. Right, I'm dependent on some sort of hook. For each of these, R, MATLAB, Python, there's some trick that you have to know, and each one is different. But once I know the hook, then there's support for collecting it. Right. Okay, sounds good. Any more questions before we close this talk and switch over to the last remote one? Do you have a time frame in mind for being able to track Python imports? Sometime this year; it's not hard. Yeah. Okay. Okay. Thank you very much, Robert. Thank you.

I'm actually at work rather than at home, with a computer that I know works. Okay. The streaming has started, so you're good to go. Okay, thanks. Okay, welcome everybody. My name is Bart Oldeman, and I'm one of the EasyBuild maintainers as of a few weeks ago. I work for Compute Canada, then for a regional organization called Calcul Québec, and then for McGill University. Since there have been a lot of discussions about Python, I'd like to present our setup and a few things that we're working on upstreaming to the main EasyBuild repositories that could be useful for other people. There are some things we're doing completely differently, and I'll present it for your information; you can learn from it, or maybe use it at your leisure, or not.

The overview of the talk: a quick introduction to our infrastructure and software stack, which shouldn't take more than five minutes because I want to focus on Python; an introduction to our Python module (really modules, but it's just three or four variations of the same thing for different Python versions); and then how we use a single module with multiple Python versions, plus virtual environments and wheels, which is basically our way of dealing with packages.

The motivation for setting up a common system for academic clusters in Canada is that our bigger national systems replaced many smaller clusters: basically every self-respecting research university used to run its own little cluster, sometimes bigger, sometimes smaller. We now have bigger clusters, only five of them, which are supported nationally. So there are many universities that no longer have a data center, but they still give support and have analysts working for them; they just don't have the hardware on site. Of the big systems we have now, there's a cloud system at the University of Victoria that came online about two and a half years ago. Almost two years ago we had Cedar and Graham coming online; they're general-purpose clusters with many CPU cores and various CPU types. Niagara came online last year; it's a homogeneous system with 60,000 CPU cores: 1,500 identical Skylake nodes with 40 cores each. And last but not least, we have Béluga, which is sitting in the data center right next to me at the moment, about to be handed over to us by the vendor any day now. It has almost 45,000 cores and 688 GPUs, 4 GPUs per node on the 172 GPU nodes. Those are V100s, which is a great advantage to people doing AI, because they are really very fast at half precision, like a factor of eight or so versus the P100s that are on the other clusters.

So we wanted to have a common interface: when people switch between clusters, they get pretty much the same module environment. There are internal differences between the clusters: they may have Omni-Path or Mellanox InfiniBand.
They may have a different file system; some clusters are on GPFS, etc. But we want a common user experience. So we need a distribution mechanism for the software, which we do via CVMFS. We'd also like to keep open the possibility that the underlying OS can be different: it should work not only on CentOS, which these clusters are using, but also on Ubuntu, etc. We do that via a middle layer that sits on top of the OS in a prefix. We're using Nix for that, but we're also testing another prefix solution, because we've had some issues with Nix. Then we use EasyBuild, of course, for the automated installation of the scientifically oriented packages. And we use Lmod with a hierarchical structure to make it scale well and not have module avail fall over with 10,000 modules.

So this is our design overview. At the top, we have the EasyBuild modules. Below that, we have a few EasyBuild-generated modules around Nix profiles, which seemed like a good idea at the time, but we are mostly not doing that anymore; we use it mostly for GCC, where a Nix GCC sits below the EasyBuild modules. And then all the kind of boring stuff, including glibc, the GNU utilities, and so on, sits in the Nix layer. So we basically filter out all these boring library dependencies that are just dependencies for other software. Many sites make them hidden modules, but we just put them in a generic middle-layer module, which is currently Nix; we're testing an alternative for that as well. Then there's the gray area of some very system-specific libraries like CUDA and the Lustre client libraries, which we can distribute on CVMFS but don't have to. And the OS kernel, of course, needs to be local, along with some restricted software, like VASP, whose license doesn't allow us to redistribute it in any way, even internally.

CVMFS: I don't really have to talk about what it is, because it's been extensively covered in the previous talk already, so I'm just going to skip this slide.

Nix. So this is the middle layer that we provide via a module. Using Nix, we provide the low-level modules I just mentioned. We have a long list for the filter-deps configuration option of EasyBuild, so even if we take an upstream easyconfig, those dependencies are just not taken into account, because we provide them via Nix, maybe in a slightly different version, but usually new enough. We have an updated glibc, so people can download random binaries, run patchelf on the downloaded binary, and it will work; we don't have an outdated glibc like you generally have on CentOS. And we can just add any library that we like, independent of the system; Nix has hundreds of OS packages out of the box. But it had some drawbacks, namely that Nix is very self-consistent in itself, but to take full benefit, users of Nix, like regular developers on our clusters, would also have to use Nix themselves, which is an unrealistic expectation. So basically we asked: what do we need Nix for? We basically use it to put things like common dependencies in a prefix, but there's a simpler solution to get that working too, which is Gentoo Prefix. That has matured quite a bit over the last few years, and it's actually also available on a CVMFS repository. It's quite a mature system for building software using ebuilds; maybe EasyBuild can learn a bit from it too. And it doesn't mean we're moving completely away from Nix.
There's still a Nix module on the Graham cluster, which is full-featured, and people can install their own software with Nix. We just make it separate: we don't use Nix in our own fashion anymore, but we keep the full-featured Nix for people who really want to use it.

So, there are the statistics. We have a team installing software, so now we have thousands and thousands of modules. Over 2018 there was a bit of a bump towards the end, in October and November, as my colleague Maxime Boissonneault was recompiling a lot of things for newer toolchains, GCC 7.3 and Intel 2018. And you also see at the bottom a lot of Python wheels, which I'll get to in a moment; we have about 1,500 of those.

So, to get to the core of the talk: the Python module. There's been some discussion about moving Python down to the GCCcore level. We put it even lower: we put it at the dummy level. So we basically use the system compiler, which in our case is GCC 5.4, which is basically the Nix compiler that we started with. It's pretty much a module whose easyconfig looks a lot like the Python "bare" easyconfig, except we have a couple of extra options. One is modextrapaths, which sets a PYTHONPATH pointing at a directory with just one file in it, which I'll get to in a moment. The second is that we set a pip config file: we have a customization for pip so that people can set up their own virtual environments, and with this pip config file pip will download wheels from our local wheelhouse instead of downloading them from the internet. The last thing I'll just mention is that what we install in our Python module is a very short list of extensions; we only have five of them. This is the minimum we need to install wheels: things like setuptools, virtualenv, and wheel, and readline is just handy in general.

The two lines that I marked in dark red are the things I'll go into in more detail. First, the modextrapaths entry, the PYTHONPATH: what is it doing? In that path, we have one file, named sitecustomize.py. This file is documented in the Python documentation, and it is simply loaded by default when you start Python, so it can add things to sys.path, which is basically the internal-to-Python equivalent of PYTHONPATH. What it does is look at an environment variable, which we named EBPYTHONPREFIXES (you can name it anything you like), and combine EBPYTHONPREFIXES with the actual Python version: it takes each path from EBPYTHONPREFIXES, adds lib/python plus the major.minor version (python2.7, python3.5, etc.) and site-packages, and then adds those paths to the standard paths using site.addsitedir, which does all the processing that's needed, like .pth files and other things that are in those directories.

Why do we use this method and not PYTHONPATH like the standard modules in EasyBuild do? There are two reasons. One is that PYTHONPATH is not version-aware: if you set PYTHONPATH for a Python 2 module and you start Python 3, then Python 3 will still try to use the Python 2 PYTHONPATH entries and it will just not work, and vice versa. So it's very hard to use if multiple Pythons are available at the same time, which is the case, because there's always the system Python 2 that sort of floats around in the background, and PYTHONPATH affects that too.
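Here is a minimal sketch of what a sitecustomize.py along these lines could look like, based on the description above; EBPYTHONPREFIXES is the variable name used in the talk, and the real Compute Canada file may differ in detail:

```python
# sitecustomize.py -- picked up automatically because its directory is on PYTHONPATH
import os
import site
import sys

def add_eb_python_prefixes():
    """Add <prefix>/lib/pythonX.Y/site-packages for every prefix listed in
    EBPYTHONPREFIXES, using site.addsitedir so .pth files are processed."""
    prefixes = os.environ.get("EBPYTHONPREFIXES", "")
    pyver = "python%d.%d" % (sys.version_info[0], sys.version_info[1])
    for prefix in prefixes.split(os.pathsep):
        if not prefix:
            continue
        sitedir = os.path.join(prefix, "lib", pyver, "site-packages")
        if os.path.isdir(sitedir):
            # addsitedir appends to sys.path, so these directories end up at
            # lower priority than a virtualenv's own site-packages.
            site.addsitedir(sitedir)

add_eb_python_prefixes()
```

A module that ships Python packages then only has to prepend its own installation prefix to EBPYTHONPREFIXES, and the matching lib/pythonX.Y/site-packages directory is picked up at run time by whichever Python the user has loaded.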
The other problem with PYTHONPATH is that it overrides virtualenvs. We sometimes saw that people were installing things in a virtual environment but also had a Python package module loaded, and the module's PYTHONPATH was overriding what they had installed in their virtual environment, which is counter to their expectation that the virtual environment would take precedence over the loaded module. EBPYTHONPREFIXES is designed to sit at a lower priority than packages installed in a virtualenv.

So how are we using EBPYTHONPREFIXES? It basically allows us to have a single easyconfig and module for multiple Python versions. Instead of having the Python version in the version suffix, which is often the case in the standard modules (Boost/1.68.0-Python-3.6.3, -Python-2.7.15, etc.), we have a single Boost module, and depending on which Python you load at run time, it will look for the files for that particular Python, which is kind of the way it happens in standard Linux distributions that don't have modules. What we put in the easyconfig is a new option named iterative build dependencies; the name is up for discussion. I just put in a PR on the EasyBuild framework where I added a new dependency parameter, on top of build dependencies, that iterates through a list of dependencies and uses them at build time. It's a similar mechanism to what you can already do with buildopts and configopts, where you can iterate through them. The build then just happens N times with different modules loaded. And then, in the software that uses Python, we add a modextrapaths entry so that the installation prefix goes onto EBPYTHONPREFIXES. If it's an extension, we could put this kind of mechanism into the easyblock as well; that's something we're working on.

How this works in practice is that a module may have, in its modulefile, a prepend_path for EBPYTHONPREFIXES: the scipy-stack module, for example, prepends something like prefix/Core/scipy-stack/2019a. Then Python 2.7.15, when you start it, will look in that directory under lib/python2.7/site-packages, and Python 3.6.3 will look in the python3.6 directory. So when it's working, it's all pretty straightforward; you just have to hope that other people don't use their own sitecustomize, which would override ours. So far that hasn't happened.

Now, the iterative build dependencies support is something I only recently implemented. We used to hack around it by putting explicit module load statements in the easyconfig files; that's a bit of a hack, so I'm trying to contribute it properly. Iterating over dependencies is a bit of an intrusive change within the EasyBuild framework; it's not a lot of lines that change, but it means that the modules that are loaded are not always the same, and that breaks some expectations, so I need to go through a few iterations to get that fully right.
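As a rough sketch of what this could look like on the easyconfig side (the iteration parameter is only a proposal from the talk and the PR, so its name and shape may well change; the package, versions, and paths here are placeholders):

```python
# Hypothetical easyconfig fragment for a Python package built once per Python version.
easyblock = 'PythonPackage'

name = 'example-pkg'
version = '1.0'

toolchain = {'name': 'dummy', 'version': ''}

# Proposed parameter: build the package once for each of these dependency sets,
# similar to how buildopts/configopts can already be iterated.
iterate_build_dependencies = [
    [('Python', '2.7.15')],
    [('Python', '3.6.3')],
]

# Prepend this installation's prefix to EBPYTHONPREFIXES, so the
# sitecustomize.py shown earlier finds lib/pythonX.Y/site-packages
# for whichever Python the user loads at run time.
modextrapaths = {'EBPYTHONPREFIXES': ''}
```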
As an aside, there are some drawbacks to using a dummy-toolchain Python. They were discussed on the mailing list and in various conference calls. Generally, whether you use GCC or the Intel compiler to compile Python makes no difference, because Python by itself is slow in any case. However, there was one issue found by Damian, which he filed as a GitHub NumPy issue: mathematical functions like exponential, sine, and cosine aren't using the faster libimf in NumPy, even if NumPy is compiled with the Intel compiler. Even if libimf is linked and set up, it will still use libm, because Python itself is using libm and the symbols already loaded in memory come from libm; even if libimf comes in later, the symbols aren't overridden for the main Python process. There are multiple workarounds. One is to use LD_PRELOAD, which is a bit cumbersome. Another is to just ignore it, because most programs don't spend 100% of their time in exponential and trigonometric functions. Another is to use a hidden Intel-compiled Python executable whenever an Intel compiler is loaded. And another thing, which will make this problem go the way of the dodo in the future, is that newer glibc versions have much faster exponential and trigonometric functions. On clusters, that would normally only arrive in the system glibc in maybe ten years, but we work around that by having the glibc in our prefix layer. I read there is some work going on in the glibc community to separate out some of the user-level functionality, to make this kind of thing available earlier on clusters, independently of the system-specific part. What they did is relax the precision of some of these functions. It used to be 0.5 ULP, which is essentially ideal precision, but to get such good precision, in some edge cases glibc went into a multi-precision path, so there are certain input values where the function takes 20 times longer than for a value that is 10^-20 higher or lower than the slow value. That took some people completely by surprise. So they made glibc a tiny bit less precise, from 0.5 to 0.509 ULP, essentially a rounding error, and now the performance is comparable with libimf, sometimes even slightly faster. And libimf, by the way, has much worse ULP numbers, like 1, and even a fast path which is something like 5 to 9 ULP, so the new glibc is still very good. But that's the issue that's sort of plaguing the decision whether to go with a GCCcore Python or a dummy Python.

Then the other thing we're doing is providing virtual environments and wheels. We basically do not provide a full Python with a lot of packages built in; we only provide a base one and a scipy-stack module for the most commonly used packages like NumPy, SciPy, etc. Everything else goes into wheels, except for a few things that are hierarchy-dependent, like mpi4py, and some packages with Python bindings, like OpenCV and Boost. But everything else goes via a virtual environment. That's the common case: people use TensorFlow, PyTorch, scikit-learn, and they install all of those from our wheels. We actually recommend that people set up their virtual environments dynamically in their job scripts, because most of the nodes have a local SSD file system. We say: load the Python module, then set up your virtual environment dynamically, and by using pip install with a requirements.txt you can say exactly which packages you want, and they just get installed on the local file system, which runs fast. The requirements.txt comes from pip freeze: you set up a virtual environment in your home folder first, as a test environment, and then pip freeze basically saves the virtual environment so you can reproduce it with pip. Now, the wheels are not without their issues.
So we've run into various issues, but we basically have solutions for them. What we do is have an arch-specific pip configuration file for the wheels (one for AVX2, one for AVX512, etc.), which refers to two wheelhouse directories: one arch-specific and one generic. And it sets prefer-binary to true, because what happened in the past is that sometimes pip would install a package from the wheelhouse, but the package had some dependencies listed, and pip is very dynamic: it thinks, oh, the requirement is that package X must be version 0.20 or newer, and oh, I see there's a version on PyPI that's newer than the wheel we have available, and it just starts downloading stuff that it didn't download the day before, just because PyPI was updated. That's something I want to prevent, because it's not reproducible and it also takes a lot longer. So we set prefer-binary to true: we prefer the binary wheel, and only if we don't provide a wheel does it fall back to downloading from PyPI. Prefer-binary doesn't mean that we download binary wheels from the internet, though. The binary wheels from the internet are usually manylinux1 wheels, which are not very good wheels for an HPC system: they're very generically compiled, basically meant to run anywhere, but they're not very optimized. So we disable them, and we can do that by putting a _manylinux module on the Python path that sets manylinux1_compatible to False; then those wheels are not downloaded.
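For reference, the hook pip checks for this is a module named _manylinux (described in PEP 513); a minimal sketch of what gets placed on the default Python path might look like this, though the exact file shipped in the Compute Canada stack may differ:

```python
# _manylinux.py -- placed somewhere on the default sys.path.
# pip imports this module, if present, to decide whether manylinux1
# wheels from PyPI should be considered compatible with this platform.
manylinux1_compatible = False
```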
One last note: all our wheels sit at the Core level, which gives us some challenges, because sometimes weird things happen with libstdc++. There's a scenario where we load Python; Python is not compiled with C++, so there's no libstdc++ in memory yet. Then some Python package loads which is linked against the GCC 6.4 libstdc++. Then we load package number two, which was compiled with a different GCC, and we get symbol errors, because all Python packages are .so files that live in the same memory space. We are now avoiding that by trying to recompile as much as we can with GCC 7.3 and using the same libstdc++ everywhere, but we can't shield ourselves against users compiling with their own GCCs; at least we know where the issue is coming from. In the end, maybe we'll just make sure that the newest libstdc++ is always used, because these libraries are actually backwards compatible. The other way is, of course, to use a hierarchy throughout, so that you cannot load a Python wheel compiled with GCC 7 when you have the GCC 6 module loaded, but we found that a bit too constricting, and it means we would have to build a lot more wheels.

So that basically concludes my presentation. I'd like to credit the other members of the national research support team in Canada, led by Maxime Boissonneault; we are responsible for setting up this software stack and the associated ticketing system. Also some helpers, the Nix experts on the sideline: Tyson Whitehead, Silverio, Arthur Prentice, and my colleague Casey Chen, who started this as an experiment that we're building on. Also, thanks to EasyBuild: we took a lot of clues from other sites, the University of Ghent of course, and a lot of ideas from Jülich's and Robert's talks as well. So this concludes my presentation. Thank you very much.

Thank you, Bart. Let's see if there are any questions in the audience. If not, I do have a question. So you're working on the iterative build dependencies thing in the framework. One thing I immediately start wondering, though I haven't taken a close look at the PR yet, is: why introduce a separate parameter and not just extend the scope of the current build dependencies one? I actually used that method of extending build dependencies in my first patch, but I found it was getting messy very quickly. If there are common build dependencies, you basically have to mention them two or three times. Suppose there is a common build dependency, like CMake or something, and you put it in the iterative build dependencies: then you have Python 2.7 plus CMake, then Python 3.5 plus CMake, then Python 3.6 plus CMake. You have to mention CMake three times, and then the framework would have to detect the duplicates to avoid loading the module three times. Well, maybe loading the module three times doesn't hurt, but it's still a performance issue even if it doesn't hurt. Right. So what you're doing now is: for the ones listed in the iterative build dependencies, you load the first set, and then in the second iteration you unload the first set and load the second one? Yeah, actually, this is one of the issues of the PR I'm fighting with, because I haven't quite figured out how to do the unloading. So what I do is save the environment before the first iteration, and then restore that environment before loading the modules for the second iteration. Properly, you would unload the modules.

And then, are you aware of any other changes we would have to make to be able to have a single installation that's compatible with, let's say, multiple Python versions? I mean, we would need to get that sitecustomize script in there somehow. Yeah, that's all part of the Python module; it's in the Python installation. Yeah. And then that's it? Well, there needs to be some way to get EBPYTHONPREFIXES set, so that could be part of the Python extension easyblock. Yeah, just a flag that says this is a multi-Python package, or I guess you could auto-detect that if you have iterative build dependencies with Python in them. Yeah, that implies that you need to set it. Yeah. But you might even want to use it from a ConfigureMake easyconfig, so I'm not sure how that would work exactly; then it needs to go into the framework. Yeah, either make it a general easyconfig parameter in the framework where you can enable it, or add the auto-detection in the framework. Yeah, we can figure it out once we get there. These are all technical details. I think the most difficult part is to get the iterative build dependencies in; the rest are fairly simple tweaks. I mean, I can already just put them manually in the easyconfigs that I showed you; to get it automated at a higher level needs a few more tweaks, whether in the easyblocks or in the framework.

Okay. Can you go back a couple of slides? A question popped into my head, but I forgot what it was; if I see the slide again, it'll come back to me. One more. One more, I think. Was it here? I forgot what it was. Maybe it was the next one. Yeah. Yeah, I guess I forgot. Oh, yeah: the comment you made that you could have the problem that users have their own sitecustomize.py, and that gets in the way of making this actually work.
Is there any way to avoid that from happening at all? Not that I'm aware of. I think users can always modify PYTHONPATH and put all kinds of things in there anyway. But it could be that some software depends on its own sitecustomize.py.