Okay, so hello everybody. I'm Vasileios Karakasis, currently working at NVIDIA as a senior performance engineer. Formerly I was at CSCS, so several people in this audience know me already. I'm now a free-time contributor to ReFrame, but I'm also a user of it in my daily work. Today we're going to present the latest advances in the framework. We haven't given this presentation since the last user meeting, which is already almost one and a half years ago, so a lot has happened. I'll try to be quick so as not to overrun the time, because there's plenty to cover.

The presentation has practically two parts. In my part I'm going to give a community update and an overview of the ReFrame 4.0 changes, and I'm also going to present some of the less well-known 4.0 features, with some examples. Then Theofilos is going to focus on the programmable configuration, a feature that has been in ReFrame almost from the beginning, and on how you can leverage it to do container-based testing and also user-environment-based testing.

Most of you already know what ReFrame is, but here is a recap. It's a framework for writing system and performance tests that has HPC features. It's not necessarily limited to HPC, although that's where it started; you can write tests for your laptop or for other use cases as well. The key advantage is the way you write the tests, which is very composable: you can compose tests, define variables and parameters, have test dependencies and so on. There is also support for several HPC schedulers, module systems, build systems and container runtimes, and integration with Elasticsearch and Graylog, so that you can feed the performance data of your tests directly into them. There's also CI integration, and tests are executed in parallel, with the runtime taking care of that.

The community has grown since last year, and we are happy about that. We have about three to four hundred unique readers of the docs monthly, from all over the world, and our Slack workspace has more than 230 members. We recently updated the invite link, since the service we were using is no longer free and Slack now offers a permanent invite link, so you can use this link to join the workspace and ask support questions or whatever else interests you about ReFrame.

On GitHub, since last year we have moved the repo under the reframe-hpc GitHub organization. This organization also holds some public forks of site test repositories; we want it to act as a reference, so that other people can find examples of what sites are doing. ReFrame is gradually becoming a community project, with contributions from up to 45 contributors since the beginning. The backlog is public, and you can see it in the reframe-hpc org. And don't hesitate to give the repo a star. There's also the PyPI package, which is quite popular there, although I don't know how accurate the pepy.tech metric is: it shows the weird thing that version 3.11.2 is the most popular one. Probably a CI bot keeps downloading it, I don't know, but it's interesting.

We also made some changes in the way we move development forward since 4.0, so we introduced a develop branch.
All new features go to this branch, the latest docs point to it, and it's also the default branch. Master remains the release branch, so every release goes through master; bug fixes, documentation updates and minor enhancements also go directly to master, and periodically we sync the two branches. The reason we ended up with this scheme is that it gives us quick release cycles for bug fixes, one or two weeks, and we can also follow the semantic versioning scheme more accurately: whenever you see a patch-level bump, it is indeed just a documentation update or a bug fix, whereas in the past we might have also been adding some functionality. New functionality now lands only in feature releases. The disadvantage is that we have to keep syncing the two branches, but so far this has worked well for us. It might be a bit confusing, if you want to contribute, as to which branch you should target, but there is a small guide for contributing.

Starting with the changes in ReFrame 4.0: all the features deprecated in 3.x have been dropped, so if you're getting deprecation warnings in 3.x and you want to move to ReFrame 4, you will have to fix those deprecations first. We also introduced one more big deprecation in 4.0: the `variables` attribute of tests and the `variables` configuration parameter of environments are now called `env_vars`, to match more accurately what they actually are. So you might get lots of deprecation warnings as soon as you move to ReFrame 4, but especially for tests this is really a one-to-one replacement, so it should be easy to fix.

Now to some of the new features. The configuration can now be split. We already had scoping in the configuration through the target systems, where parts of the configuration could be targeted at specific systems or partitions; now you can also split the configuration into multiple files, which are all combined, and you can pass them either from the command line or through environment variables. This is quite helpful because, for example, the configuration shown here is a valid minimal configuration: you define your system and the environments relative to the system, and you no longer have to copy any configuration from ReFrame itself. You can leave out sections that you don't want to touch, like the logging, and systems that you don't care about, like the builtin ones; these are combined in automatically.

Also, the way ReFrame handles performance logging has been revised fundamentally. The reporting and performance logging, especially the traditional way of logging to files, had been left behind a bit, but with 4.0 I think it takes another spin. The default output is now CSV, which can be post-processed and imported into whatever you want to get your performance data, if you don't want other means such as sending the data to Elasticsearch. We also emit a header line, and if for whatever reason this header line changes, which could be because you added a new performance variable to your test or changed the way you name one, a new log file is generated with the updated header. That way your logs stay cleaner and don't get mixed up.
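Coming back to the configuration splitting mentioned above, a minimal configuration along those lines can look roughly like this; a sketch, with made-up system, partition and environment names (everything not given here, such as logging, comes from ReFrame's defaults or from other configuration files):

```python
# A minimal site configuration: one system and its environments.
site_configuration = {
    'systems': [
        {
            'name': 'mycluster',                 # hypothetical system
            'hostnames': [r'mycluster-ln\d+'],   # regexes matched on login
            'partitions': [
                {
                    'name': 'gpu',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'environs': ['gnu'],
                }
            ]
        }
    ],
    'environments': [
        {
            'name': 'gnu',
            'cc': 'gcc',
            'cxx': 'g++',
            'ftn': 'gfortran',
            'target_systems': ['mycluster'],     # scope it to this system
        }
    ]
}
```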
Another thing is custom parallel launchers. We have seen that several sites really need to make small tweaks to the launchers, or even have entirely custom ones, and in the past you had to do that inside the framework: either you had to submit a PR, or you had to maintain a separate fork. There's no need to do this anymore, as you can define the launcher in the configuration file. I have an example later on, so I will show you how. But if your launcher is useful for the community, please do submit a PR; that's the purpose of it.

We have two new backends: we now support the Apptainer container platform, which is basically the same as Singularity, and, thanks to Vanessa Sochat's contribution, we have Flux framework support.

Another big thing in 4.0 is the new naming scheme. We introduced it especially for parameterized tests and tests with fixtures; it doesn't affect the classic tests. The name now encodes the test's information in a more human-readable way: you can see the name of the test, its parameters, and, after a slash, an associated hash, and you can also optionally see the test case info, that is, on which partition and with which environment the test is going to run. So you can now get an idea of exactly what is going to run just by listing the tests. You also get the dependencies, the fixture scope, and the variable name where each fixture is bound. That is useful, for example, if you want to set a variable in a deeply nested fixture: you can track it through that chain of names, as in this example, where you can reach the `version` variable of the fixture that downloads the benchmarks. Tests can be filtered by name, by hash or by the variant number, all with the -n option.

Some post-4.0 features that are interesting: the --dry-run option, introduced in 4.1, basically generates all the scripts that would be executed and tries to validate as much of the test as possible, without actually executing it. It should work out of the box, but it may be that your test assumes something has been produced because the test has run; your test can still adapt to dry-run mode by calling `self.is_dry_run()`.

We now also allow custom formatting of the records that are sent to Elasticsearch. The reason is that some Elasticsearch servers may impose certain schemas, and this is entirely site-specific; I have an example showing how to do that.

And a really fresh one: the --reruns and --duration options from ReFrame 4.2, which was released last Friday. The idea is that you rerun the session multiple times with these options, or you rerun it for a certain period of time, and the statistics you get come from all the runs. It's not like the --max-retries option. Of course, you should use these with care, especially --duration, because they can also be used for stress-testing the system.
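To illustrate the dry-run adaptation just mentioned, here is a minimal sketch; the post-run hook and the file inspection are hypothetical, just to show where `is_dry_run()` fits:

```python
import os

import reframe as rfm
import reframe.utility.sanity as sn
from reframe.core.builtins import run_after, sanity_function


@rfm.simple_test
class EchoTest(rfm.RunOnlyRegressionTest):
    valid_systems = ['*']
    valid_prog_environs = ['*']
    executable = 'echo'
    executable_opts = ['hello']

    @run_after('run')
    def inspect_output(self):
        # This hook assumes the job has actually produced its stdout
        # file, which is not the case in dry-run mode, so bail out.
        if self.is_dry_run():
            return

        outfile = os.path.join(self.stagedir, self.stdout.evaluate())
        with open(outfile) as fp:
            self.first_line = fp.readline()

    @sanity_function
    def validate(self):
        return sn.assert_found(r'hello', self.stdout)
```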
Going a bit into other features now: this one is really something that has come out of discussions in this community and the EESSI community. With prior versions of ReFrame, a test was somehow bound to the actual system or environment, by essentially hardcoding the system name in `valid_systems` and the environment name in `valid_prog_environs`. Since 3.11 we have completely changed that: in the configuration you can define features or extras, and those are selectable in your tests. You can say, for example, that `valid_systems` is `+cpu +ib`, which means that this test is valid for any system partition that defines the `cpu` and `ib` features; or it can be an OR, with comma-separated entries in the list; or, for example, this test is valid for any programming environment, on a specific system, that does not have the `cuda` feature; or you can match specific extras values. This completely decouples the tests from the systems and turns the problem into defining a contract of feature names and attributes. We have not standardized any vocabulary in ReFrame; we leave that to the sites, and perhaps a community effort to standardize those names will come up, which could be really helpful. There's more on this in Theofilos's part.

Another feature that is quite interesting is the -S option, which sets test variables from the command line. I use it very often, because you can modify the behavior of a test from the command line. It has been there since 3.8, with subsequent refinements; I totally recommend taking five minutes to learn it and then using it.

Others are the --repeat and --distribute options: --repeat practically clones the test N times, whereas --distribute distributes a single-node test across the individual nodes and pins a clone on each node.

Another useful option that is popular, and is by now a couple of years old, is --ci-generate, with which ReFrame can generate GitLab child pipelines that run the different tests. We have now added support for tests to modify the generated CI pipeline.

Another one I kind of rediscovered recently is a pretty old feature, there since 2.5 (I was looking at the history to see when it was introduced): the --mode option. When it was introduced, the idea was that when you hand ReFrame off to somebody else, you don't want them to have to remember a whole bunch of command-line options. But it turns out that if you do performance testing, work interactively, or want to somehow record the way you ran your experiments, it's a perfect fit: you name the experiment and its options, and then you can combine them very easily. For example, here I'm just running the baseline experiment, which expands to all these options, and then: what if I increase the number of clients to 10, or what if I set another variable in my test? You can also change other options, because options given on the command line take precedence over those in the mode.

The programmable configuration I will probably skip due to time, because Theofilos is going to cover it. So I'll jump to some of the examples I promised. Here is the custom launcher.
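Here is a sketch of registering such a launcher straight from the Python configuration file; the launcher name and its command are made up, and the registration decorator is the one ReFrame uses for its own launchers:

```python
# Inside the Python configuration file
from reframe.core.backends import register_launcher
from reframe.core.launchers import JobLauncher


@register_launcher('mylaunch')
class MyLauncher(JobLauncher):
    def command(self, job):
        # Build the parallel launch command prefix from the job spec;
        # see the JobLauncher docs for the available job attributes.
        return ['mylaunch', '--np', str(job.num_tasks)]


site_configuration = {
    'systems': [
        {
            'name': 'mycluster',
            'hostnames': ['mycluster'],
            'partitions': [
                {
                    'name': 'default',
                    'scheduler': 'local',
                    'launcher': 'mylaunch',   # refer to it by its name
                    'environs': ['builtin'],
                }
            ]
        }
    ]
}
```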
Here it's very easy: you define your launcher class, which has a single `command` method that takes the job instance; you can find what the fields of the job are in the docs. You register it with a name, put all that in your config, and then you can use it.

And here is the example with a custom record formatter, where you define the httpjson handler that is going to talk to an Elasticsearch server, and you say: I want to change the way the record is formatted before it is sent. In this case I'm just adding a `my_` prefix to the fields. There is also documentation on this that you can look at.

And here is a very interesting but quite advanced feature, the `make_test` API call, which basically allows you to create tests programmatically. For example, this test here, the class HelloTest, can be created programmatically with `make_test`, by giving it the class name, the base classes, the body of the class, the methods, and so on. And then you can use it as you would use the hand-written class; if you look, there is an almost one-to-one mapping. That's very powerful, because it opens completely new horizons in how you can build on top of ReFrame, if I may inflate things a bit. In fact, at some point I had some workflows that I wanted to combine and run in parallel, but I didn't want to have one big test file, and I also wanted it to be dynamic.

So here I have the same idea with the STREAM benchmark. Imagine that you have written a STREAM benchmark test as it is in the tutorial, but you want something domain-specific on top of that. Here I have a mini-language for STREAM workflows: it describes the workflows, some of the variables of the tests, and also thread scaling, if I want to scale. This spec generates those tests (I have skipped some of them here), and then you see the tests and their fixtures, and so on. And this literally happens with this code; there are some imports that I didn't have enough space to include, but the key takeaway is that everything happens in a normal test file that you would load with the -C option. The spec file is passed in an environment variable, called STREAM_SPEC_FILE here; the test file loads this specification, generates the tests from it using `make_test`, and finally registers them with the framework through `rfm.simple_test`. It really allows you to build on top of ReFrame and use all the machinery and all the features it has. It's a bit advanced and it might have edge cases, but it's quite cool.

I'm not going to go through this example in detail for the sake of time, but there is a pull request for it, because the idea is to convert it into a tutorial so that people can try it. The whole code is in the pull request, but there's no text yet; the only text about it is what I said in this presentation.
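As a rough sketch of the `make_test` pattern (a toy hello test rather than the STREAM generator from the slides):

```python
import reframe as rfm
import reframe.utility.sanity as sn
from reframe.core.builtins import sanity_function
from reframe.core.meta import make_test


def _validate(test):
    return sn.assert_found(r'hello', test.stdout)


# Equivalent to writing a HelloTest class by hand: the class name, base
# classes, class body and decorated methods are all passed in
# programmatically.
hello_cls = make_test(
    'HelloTest',
    (rfm.RunOnlyRegressionTest,),
    {
        'valid_systems': ['*'],
        'valid_prog_environs': ['*'],
        'executable': 'echo hello',
    },
    methods=[sanity_function(_validate)],
)

# Register the generated class with the framework by calling the
# @rfm.simple_test decorator as a plain function.
rfm.simple_test(hello_cls)
```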
Now, as a future outlook: one thing we want to improve is the reporting and post-processing of reports. We do generate detailed reports, but it's not easy, for example, to search them and compare against past reports, and if you use ReFrame daily that's something you miss; it has also come up as a request from several people.

Another interesting item is generalizing the system auto-detection method, so that it can be adapted more easily to other environments, for example more cloud-based ones where the hostname doesn't tell you anything about the system (it can be a random hostname), so you want a different way to retrieve something that you can use to identify which configuration entry to pick. There is already a draft PR that is a first attempt to address this, and we hope to merge it in 4.3.

Another one has to do with test parameterization: however cool it is, you end up asking, what if I change that parameter to a different range? Then you realize you have to go into your test and change it. Or, what if I have a variable and I want to create a parameter out of it? Again, you have to do it manually. This would be really easy to deal with if you could do the parameterization on the command line, and perhaps also generalize the conversion between variables and parameters.

And as I said, ReFrame is becoming a community project, so if you have anything to contribute, you're welcome to do so. That finishes my part, and I'm handing over to Theofilos. I hope I didn't overrun.

Can you see my presentation? I can see it. Can you hear me? Yes. Okay, good. Kenneth, is it okay on your side, can you see my slides? If not, let me know. Since we are short on time, I will start quickly.

My talk is a continuation of what Vasileios was presenting; it's called "Embracing ReFrame programmable configurations". This is a brief outline: what the configuration of ReFrame looks like and the different flavors we have; then two use cases of programmable configurations, container-based testing and user-environment-based testing; and then I will end with some conclusions.

First, a brief overview of what the ReFrame configuration looks like. We have two flavors of the configuration: a Python one and a JSON one. I will focus on the Python one, since it's the one that is fully programmable. The reason I want to go to the programmable configuration is that, up to now, the JSON one and the Python one were more or less one-to-one, in the sense that all the information in the Python one was hardcoded: we expected that the system we were going to run on had all the features that we were going to test, so we gave ReFrame the list of tests and it would test all the programming environments, according also to the specific test requirements. ReFrame can work with both flavors, and it's easy to go from one type of configuration to the other via the --show-config CLI option, saving the output as a JSON file. As Vasileios already said, ReFrame 4 allows splitting the configuration into multiple sub-configuration files, and I have here the link to the documentation for the various configuration options that ReFrame accepts.

So, what does the programmable configuration look like for containers? When I started looking into testing containers, I realized that each container can have very different features, so I cannot take the whole set of tests that we have at CSCS, for example, and assume that the containers offer everything that the tests require.
In fact, we can use the LABEL instructions of a Dockerfile (Singularity has something similar) to pass metadata to the container image. You see here, for example, that I can pass ReFrame-specific information: let's say that this image provides the OSU micro-benchmarks, MPI, serial compilation, OpenMP and CUDA, and here I describe what the compilers look like, in this case the MPI C, C++ and Fortran 90 wrappers. We have a similar image, but without the CUDA version of the OSU micro-benchmarks; here we have another image which provides LAMMPS, for example, and here another one with a CUDA version of GROMACS. This metadata is added to the image when it is created, and it can be accessed at runtime and used to program the configuration that is going to be generated. I will show you what this workflow looks like.

For the configuration, instead of having a hardcoded one, you pass the container images that you want to test as an environment variable, and you get the image names from it here. You can inspect each image using skopeo and retrieve its features, and you use the image name as an actual programming environment: here I create an actual ReFrame environment for each image, and I pass the features and every other ReFrame-specific bit of information into the features of the environment, along with the `cc` key, for example, or any other compiler settings I want. Then I assign the environment names to a specific partition of my system. Finally, I put the actual environments that I created in step four into the site configuration, which is the actual variable that ReFrame needs in order to understand what the definition of each environment is. So, more or less, the final site configuration will contain, for each of my target partitions, a programming environment for each specific container, and through this, the features that the container image says it provides can actually be tested by the corresponding ReFrame tests.

On the test side, it looks like this: I have here a ReFrame test using features. In the past, I would say that my valid systems are, for example, Piz Daint with the GPU partition, or Piz Daint with the MC partition. Here I say that I don't require something very specific from the system; from the environment point of view, though, I require that the environment provides me with the OSU micro-benchmarks. And since it's a container, this test is going to run via the container platform, and I access the image through the environment variables that are passed in by the programmable configuration. The difference for the CUDA version of the OSU latency test is that not only does my programming environment need to provide the OSU micro-benchmarks with CUDA, but my actual system also has to have an NVIDIA GPU. All of this is taken into account, and ReFrame figures out which tests have to run on which system and with which programming environment, which here is our container.

To put it all together, here is how it looks. I use an environment variable, here I call it RFM_CONTAINER_IMAGES, and I pass it a list of images separated by semicolons; I did a dry run here, because otherwise it would not fit on the page. I passed three container images, and ReFrame knows and maps which tests are required to test the features of each image.
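A condensed sketch of the configuration side of this workflow; the RFM_CONTAINER_IMAGES variable appears in the talk, but the label key, the system details and the container platform are assumptions for illustration:

```python
# Programmable configuration driven by container image metadata.
import json
import os
import subprocess


def image_features(image):
    # Inspect the image metadata with skopeo and extract the
    # ReFrame-specific labels baked in at build time.
    meta = json.loads(
        subprocess.check_output(['skopeo', 'inspect', f'docker://{image}'])
    )
    labels = meta.get('Labels') or {}
    return labels.get('org.reframe.features', '').split()  # assumed key


images = os.environ.get('RFM_CONTAINER_IMAGES', '').split(';')
environs = [
    {
        'name': img,
        'features': image_features(img),
        # The tests retrieve the image to run through this variable
        'env_vars': [['IMAGE', img]],
    }
    for img in images if img
]

site_configuration = {
    'systems': [
        {
            'name': 'daint',
            'hostnames': ['daint'],
            'partitions': [
                {
                    'name': 'gpu',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'features': ['nvgpu'],
                    'environs': [e['name'] for e in environs],
                    'container_platforms': [{'type': 'Sarus'}],
                }
            ]
        }
    ],
    'environments': environs,
}
```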
So for the LAMMPS image (you see 'lammps' here; this is my username on Docker Hub), a 'lammps:latest' programming environment corresponding to the LAMMPS image is created, and all the LAMMPS checks that are needed are going to run using this container image. Then I have two flavors of the OSU micro-benchmarks: one provides CUDA and is based on MVAPICH2, the other does not provide CUDA and is based on MPICH; I just have them here to illustrate the concept. You see that the OSU bandwidth CUDA test will run only on the GPU partition of Daint, and only for those container images that actually provide both the OSU micro-benchmarks and CUDA. Here are all the tests that are going to run. So it is very easy to let ReFrame do the mapping between the features that an image says it provides and the corresponding tests that check whether these features actually work properly and also perform as expected.

Now I will show you more or less the same concept, but using something that we are developing at CSCS called user environments. User environments are more or less squashfs images which contain a software stack, which can be comprised of one or more programming environments. They are created using a CSCS-developed CLI tool called Stackinator, to which we pass a YAML file that describes the components of the software stack. Using another CSCS-developed tool, which is actually a Slurm plugin, you mount this squashfs image at runtime, via your batch script, on a specific mount point, so you can access it on the compute nodes. The good thing about user environments is that they can make use of host libraries, and that is one very important difference between user environments and container images, because some host libraries, for example those needed to take full advantage of the proprietary interconnect, live on the host and are needed for best performance.

Each of these squashfs images is accompanied by a YAML file, currently a work in progress, which is consumed by ReFrame; it is quite similar to the metadata provided by the container images. Using the features of the environments and their names, we can again use the programmable configuration and let ReFrame do the mapping and testing for us. Here is what this YAML file looks like: for example, I have two programming environments, one called 'cray', with the modules that might be needed for this programming environment to work correctly, and here a LAMMPS one, and each one lists the features that the environment provides. So it's more or less the same as the labels in the Dockerfiles that I showed previously.

Similarly to before, but instead of inspecting with skopeo, I now retrieve the image path from environment variables (you could also set it up to accept CLI options). I pass the image path and the mount point that the user can provide into the Slurm options, using them as the access options that are going to be used by the scheduler. Then I open the YAML file to consume the metadata of the image: I get the environment names and use them as the partition's environments, and I pass the module path so as to use it in a prepare command; this is optional, in case your stack is not based on a module system. And here I create the actual environments, using the features that I retrieved from the YAML file, and pass them into the final site configuration.
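A sketch of what this user-environment-driven configuration can look like; the variable names (UENV_IMAGE_PATH, UENV_MOUNT_POINT, UENV_META_FILE), the --uenv Slurm option and the YAML layout are all assumptions for illustration, since the metadata format is still work in progress:

```python
import os

import yaml

image = os.environ['UENV_IMAGE_PATH']
mount = os.environ.get('UENV_MOUNT_POINT', '/user-environment')

# Consume the YAML metadata that accompanies the squashfs image
with open(os.environ['UENV_META_FILE']) as fp:
    meta = yaml.safe_load(fp)

environs = [
    {
        'name': env['name'],
        'features': env.get('features', []),
        'modules': env.get('modules', []),
    }
    for env in meta['environments']
]

site_configuration = {
    'systems': [
        {
            'name': 'mycluster',
            'hostnames': ['mycluster'],
            'partitions': [
                {
                    'name': 'gpu',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    # Ask the Slurm plugin to mount the squashfs image
                    'access': [f'--uenv={image}:{mount}'],
                    # Make the stack's modules visible before running
                    'prepare_cmds': [f'module use {mount}/modules'],
                    'environs': [e['name'] for e in environs],
                }
            ]
        }
    ],
    'environments': environs,
}
```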
Putting it all together again: here I export the environment variables for the image file, and then I can run all the different tests. So I run for an environment that provides MPI and an MPI compiler, which also has OpenMP, allows for serial compilation, and offers the OSU micro-benchmarks. And again you see that the OSU micro-benchmarks that require CUDA, for example the OSU latency CUDA test here, are only run on a partition that actually specifies that it provides an NVIDIA GPU. Here there is a large list (I trimmed it a little bit) of environments, to test all the features of each programming environment that this user environment provides.

To conclude: the programmable configuration, as I have shown, gives full access to the Python language and its libraries, so it can be very expressive. The final configuration can be generated on the fly, based on the environments and systems that are to be tested, and this on-the-fly generation moves the responsibility onto the actual environments to state which features they provide and want to have tested by ReFrame. Labels in container images are a useful way to pass the metadata that ReFrame needs as features to map onto the regression tests, and by inspecting the features that the user environment stacks declare in their YAML file, ReFrame can test the corresponding functionality for you. So thank you very much for your attention, and I will be happy to answer any questions you might have.

Okay, thank you Theofilos and Vasileios. Do we have any questions? Maybe we have time for one. It was a very deep talk, I think, very technical, so for people not familiar with ReFrame this was not easy, but there are some very interesting developments being done. I was looking at Kasper here; maybe for Kasper everything was perfectly clear? Yeah, no. Okay, too deep, as you said. But at least it's there, so people can look at it as a reference.

Can you repeat that, Vasileios? Perhaps it was too deep, or assumed really good knowledge of ReFrame, but at least it's there as a presentation, so it can be used as a reference in the future, in case people encounter the same things. Yeah, indeed. It was fully recorded and will be made available online.

There's a question on Zoom. What is this one? Ah, that's a good one: test names are looking more and more like a Spack naming scheme; is that intentional? I mean, yeah, we got inspiration from that, but the context is different. Spack uses symbols that are not interpreted by the shell; for example, the dependency symbol is the same, but then we essentially overload them, and the reason for choosing those symbols is that you don't have to escape them. Besides that, I'm sorry to say, but I do like that little language of specs in Spack. So we use it, though with a completely different meaning, except for the target. Yep. Okay, thank you very much, we'll wrap up here.