Thanks very much. I hope you can hear me okay. I'm going to do a quick introduction. Who am I? Well, I'm quite busy in the open source world. I maintain 20 or so packages in Debian, and quite a few of those are my own bits of software. I'm currently a principal engineer working at Intel. Before that I was a senior kernel engineer at Canonical, working on kernels, fixing bugs and basically getting my hands into lots of different kernel problems and trying to get them sorted out. I've done quite a bit of upstream work; mostly now it's janitorial work, fixing trivial bugs which I find via static analysis. Before kernel bug fixing I started working on laptops, trying to fix firmware issues, and so I started writing the Firmware Test Suite, a tool for checking BIOSes. That really led me on to developing stress-ng, another tool for finding bugs in the kernel. So my history is that I really like finding and fixing bugs, and I really like it when people send me new hardware, because I like trying to break kernels on new hardware. People in the past have given me lots of nice bits of kit which I've tested things out on and broken kernels with, so that's my background. So on with the show — can I have the next slide please.

So, a long time ago, about 10 years ago, we were looking at laptop issues. When I was at Canonical we had laptops getting a bit too hot, and we used the stress tool, which was kind of cool — a great tool written by Amos Waterland — but I wanted to add some more tests to it, so I started reworking stress into stress-ng. That really is stress, the next generation, hence the NG in the name. It's not a brilliant name, but there we go. The whole intent there was to make laptops really busy: see if we could burn CPU cycles and hammer the memory and caches to push laptops past their thermal limits. We kind of did that.
And that was really to complement the work on the Intel thermal daemon, thermald, which is a tool designed to keep laptops from overheating. stress-ng was really designed to make thermald work hard and make sure that laptops didn't get too hot before they went into production running Ubuntu. So that was the original intent, but like most projects it has grown in scope, new features get bolted on, and the tool has grown over a decade. It's just interesting — at the bottom right there's a picture of a Raspberry Pi being exercised with stress-ng, and you can see the red parts are quite toasty and hot. So that's the intent: make things hot, see if we break things. But stress-ng has since increased in functionality, so let's look at what else it adds — next slide please.

So what does stress-ng do now? Well, I've really designed it to break and crash kernels. The main thing by that is, I want stress-ng to exercise memory, caches, devices and file systems, and hammer away on all the different kernel interfaces: system calls, sysfs and procfs entries and so forth. We also want stress-ng to exercise the memory subsystem, the scheduler and I/O, and basically hammer a system as hard as we can, so we exercise those weird corner cases where resources are low and bugs seem to pop out. Alongside that, we also exercise things with concurrency: stress-ng is designed to run multiple processes and really exercise race conditions in the kernel, to see if we can cause lockups or weird NULL pointer dereferences where an object has been freed while another process is still working on it. The last couple of years I've also been working on making stress-ng more portable, so it should build on most POSIX systems quite easily, and I've made it compiler friendly so it builds with multiple compilers.
So what is stress-ng now? Well, really it's like a sledgehammer: it cracks kernels, it breaks out and finds bugs. It's also a Swiss army knife, because it's got rather a lot of functionality — it's not a single-purpose tool, it's got lots of purposes. So without further ado, let's look a bit more — next slide please.

For example, we don't just break the Linux kernel. Although I am mainly working on the Linux kernel with stress-ng, stress-ng has been known to break other kernels too. This is a bug which occurred in DragonFly BSD — and oh dear, a spinlock panic, and it's broken. Can we look at the next slide please. And this is OpenIndiana, basically a Solaris kernel. I used to think Solaris was like a rock — a rock-hard kernel you could never break — but oh dear, I managed to break it with stress-ng on a file locking operation here. So stress-ng being portable means you can test kernels other than Linux. I do want to emphasize that it really has been designed for Linux first; for other systems, if I can support them then I'll try to get that working. Next slide please.

Right, here's a quick overview of the stress-ng design. On the left-hand side there's a box which shows the main structure of the main stress-ng program: it sets a timeout alarm, then it forks off a stressor process (I'll describe the stressors later on), then it waits for the stressor to complete, and then it reports any metrics such as bogo-ops, memory consumption or how long it has run for. In the middle part of the slide we've got how a stressor instance works. Basically the stressor initializes and allocates any resources it requires, such as mutexes or open files and so forth, and then it spins on a very simple stressor loop.
So basically it checks if an alarm has occurred telling it to stop running, or a SIGINT (like you hit Ctrl-C), or if the bogo-op counter target has been reached. The increment step bumps the bogo-op counter for every iteration of the stressor loop. You can tell stress-ng to run for a set number of bogo-ops, or you can make it run for a certain amount of time: the timeout is where the alarm comes in, and the bogo-op limit is where the increment-counter-and-keep-stressing check counts the number of loops you do. Then when the stressor has finished, it cleans up, writes some statistics into shared memory which the main process can read, and then exits.

So that's a single stressor instance in this example, but stress-ng can run multiple stressor instances, up to 4096, and those stressors can be any of the 280 or so stressors built into stress-ng. This example just shows you one stressor running, but imagine stress-ng running lots of stressor instances in parallel. That's where we scale up the ability of stress-ng to really hammer a system and try to find bugs. Can I have the next slide please.

So as I mentioned, stress-ng now comes with over 280 stressors. These stress tests cover a lot of the kernel, and they do lots of different things, and when you combine them you've basically got a very powerful ability to hammer a system and try to force out bugs. As I said earlier, you can run one or more instances in parallel, and each of the different stressors can also run in parallel, so you can mix and match stress tests and run them concurrently. The stressors are focused: they basically do one thing and try to do it well. A stressor will not try to do lots of different things; it keeps focused on one particular stress scenario, such as exercising memory, or exercising the cache, or exercising the CPU in a specific way.
When stress-ng terminates, it can also report lots of different types of stats. It can show you the load average, and it's got the ability to dump out perf counters so you can see exactly what the processor has been doing — for example cache stalls or TLB reloads and lots of really deep nitty-gritty detail from the CPU. You can also see how much memory is being used, and the bogo-op counter as well, which gives you a bogus measure of throughput for each stressor. So that's a very quick overview. Can we have the next slide please.

So here, let's get into details with a very simple example. The first example runs the CPU stressor: --cpu 1 means just run one instance of the stressor, and the timeout option says run it for five minutes. stress-ng will basically run the CPU stressor, which exercises lots of different CPU operations such as floating point, integer maths, branching and instruction cache usage and so on, so it basically makes the CPU as toasty as possible, running for five minutes on one CPU.

But we can do more than that. The next example, stress-ng --cache 8, runs eight instances of the cache stressor, and also runs the memory contention stressor — in this example four mcontend stressors — which exercise reading and writing memory. This test puts heavy contention on the processor: it's fighting over memory reads and writes in the same or different locations, across the caches and across memory. The -t option here says run it for one hour. With the -t option you can specify seconds, minutes, hours, days, or even years if you're that way inclined — so the -t option specifies the run time, and the number given with each stressor specifies how many instances to run.
In the next example, we're running the virtual memory stressor, and giving it the number zero tells stress-ng to find out how many CPUs there are on the system and run the vm stressor on every CPU. This is kind of handy if you don't know how many CPUs you've got, or you can't be bothered to type it in — the zero option will always pick up the number of available CPUs. My laptop's got eight CPUs, so --vm 0 is equivalent to running --vm 8.

And there are per-stressor options as well. Each stressor may have extra options; the vm stressor has an option to specify how much memory to use. In this example I'm using --vm-bytes 95%, so the stressor will look at how much memory is available and use 95% of that. I'm also enabling the verify option, which tells the stressor to exercise memory and then read back the results to make sure there aren't any memory errors. Now, when you put verify mode on, tests will run slower because verification adds a bit of overhead, so if you're benchmarking with bogo-ops, turn verify off and you'll get faster results.

Hopefully that gives you a quick overview of how to start different stressors, how you can run one or many and mix and match them, and also how stressors can have extra options to drive or configure them in different ways. Can I have the next slide please.

Right, so as mentioned earlier, I gave the --vm-bytes option a percentage, but you can also specify sizes in megabytes, gigabytes or terabytes. Whenever there's an option to specify a size, it will normally be the stressor name plus -bytes: for example, vm has --vm-bytes, and if you're stressing the hard disk it's --hdd-bytes. You can specify sizes in kilobytes, megabytes, gigabytes or terabytes depending on the suffix, or you can even give a percentage of the total resource.
And again, with times, you can specify minutes, hours, days, years. For time the default is seconds: -t 30 is 30 seconds, and -t 30s is the equivalent of 30 seconds as well. You cannot specify fractions though — these are all integer values, I'm afraid. So maybe one day I'll address that and fix it properly, but currently all the sizes and times are integer values. Hopefully that gives you a flavor of how to configure sizes and things.

How is stress-ng different from LTP? I'll answer that question at the end of the session, I think, if that's okay — thanks for that question. Right, can I have the next slide please.

Right, so one of the strengths of stress-ng is that we can use concurrency to hammer a system. As I said earlier, we can run multiple instances of a stressor, and multiple instances of lots of stressors, and these stressors are monitored by stress-ng as they're running. So stress-ng can tell if the OOM killer has killed a memory-stressing instance, and it will restart that stressor over and over if it gets OOM-killed. One can override that default with the --oomable option, which tells stress-ng to allow stressors to fail without ever being respawned, but the default behavior is to continue to respawn and keep stressing.

Now, some of these stressor instances may themselves fork off their own children. For example, a lot of the network stressors start up in a client and server mode, so they have their own children, and some stressors can use pthreads — multiple pthreads. So a stressor is not just one process: it could be one or more processes, or one or more pthreads. And the idea of concurrency is really to try to trigger soft or hard lockups — locking conditions and places where the kernel can't handle concurrency correctly.
In the past stress-ng has been quite useful at finding these types of conditions, especially corner cases with race conditions where things aren't completing correctly. Okay, next slide please.

So I'll quickly give you a view of some of the stress cases we've got. stress-ng has been designed to stress CPU caches. Typically in modern processors we have a level one cache, which might be split data and instruction caches or a combined data/instruction cache, depending on the architecture. Nowadays most processors have a level two cache, and there might even be a last-level cache — level three or even a level four cache. And finally we hit the actual memory. The whole idea of cache stressing is to put pressure on the different levels of cache by fetching data, prefetching, using fencing operations, flushing and invalidating cache lines, and basically hammering adjacent cache lines to cause lots of havoc in the cache, and also in memory. This is kind of useful with the development of processors, and past work I've been doing at Intel is this kind of work: we have processors and we want to make sure the cache is working correctly.

The other thing with cache stressing is that we can actually exercise the instruction cache. With self-modifying code, instruction cache flushing or random branching, we can make the level one cache be exercised in a stressful way and cause lots of cache misses, and this is kind of useful for testing as well. Other places where we can stress the cache are streaming memory writes, randomized memory rewrites, and exercising shared memory across multiple CPUs in symmetric multiprocessing or even NUMA architectures.
So stress-ng has got quite a lot of tools for checking that the cache is working correctly — especially if you use the verify mode on these stressors, it will try to exercise and find weird corner cases where caches might be behaving incorrectly. Can we have the next slide please.

I'll just quickly show you some examples of how to use some of the cache stressors. The first one runs the cache stressor across all CPUs, and it uses an option to flush the cache, so basically the stressor is spinning around, flushing caches all the time and checking that works correctly. The third example exercises the level one cache only: it allocates a buffer exactly the same size as the level one cache and keeps on exercising that memory, which hammers the level one cache a lot, and there's a verify option there to make sure it's working correctly. And there's also the prefetch stressor in the bottom example: if you want to benchmark how fast prefetched reads are from the level three cache, the prefetch stressor with the --metrics option allows you to benchmark how fast you can read from your level three cache. So hopefully that gives some flavor — there are lots more stressors, but these are just a taste of the ones designed for CPU cache handling. Next slide please.

Right, now system calls are really where stress-ng tries to work the kernel hard. There are different ways of exercising system calls on the kernel. stress-ng can call libc, and libc then of course invokes the system call which goes to the kernel — that's one way. So if you use, say, the select() API, that's libc calling the select system call. There are other ways of exercising system calls as well: you can directly invoke them using syscall(2), or via a little bit of memory called the vDSO. Now, the vDSO has functions in a shared page, which is shared between user space and the kernel.
It's a very fast way of calling a function which can read some kernel data and return without actually jumping into kernel space, but it gives you kernel system calls — gettimeofday() is implemented like that. So it's super fast, but doesn't actually cross the kernel boundary. stress-ng exercises the vDSO system calls. Also, on x86 we use the syscall instruction, and the int 0x80 syscall trap as well, so we try to exercise different ways of entering the kernel.

I think it's been about three years now I've been working on this, and about 98% of all the system calls are now covered. I'm just missing some of the mount system calls which were added a couple of years ago — I haven't got around to those yet. So the idea is basically to cover as many of the system calls provided in Linux as possible, and I'm also working on that for different operating systems as well.

The way we test the system calls is, obviously, we use them in valid ways: each system call has valid use cases, and stress-ng tries to exercise those different use cases. But we've also got ways of exercising invalid arguments, so we use the system calls in just stupid ways — wrong arguments, ways which are pathologically stupid — just to see if those trip kernel bugs. Now, this is not a replacement for things like syzkaller, which is a fantastic way of fuzzing kernel interfaces. stress-ng is not intended to do that; stress-ng is just intended to use system calls, and with a large combination of system calls being used in lots of stressors, hopefully that forces out a different class of bugs. If you're really interested in breaking system calls, use syzkaller. Next slide please.

So here are some examples. I've got a stressor called sysinval, which just uses lots of different permutations of invalid arguments, and a stressor which calls lots of different random system call numbers which are not valid.
For example, we found a bug on the RISC-V kernel where a boundary check wasn't set correctly, so we caused a kernel error with that. And as I said earlier, we've got the vDSO being tested, and we can also test the x86 syscall paths — so there's a mix of different ways of exercising system calls, as illustrated there. Next slide please.

Other kernel interfaces, apart from system calls, are the /dev, /sys and /proc file systems and device nodes. These use the traditional Linux mindset of everything being a file. For device entries and sysfs and procfs files, stress-ng will iterate over all the ones it can find, and it'll try to open them, perform ioctls or fcntls on them, and mmap, read and write them where appropriate — basically exercising these different files in as many ways as possible. The whole idea here is to use sysfs and procfs in ways they maybe weren't designed for, just to try to catch bugs and shake out issues, and we've had some success with that in the past. For example, doing seeks and reads at different offsets — moving around with seek and read on a sysfs file — I think that triggered a bug, and it got fixed.

With device entries, stress-ng will detect what type of device they are, such as block devices or character devices, and try to determine what the device might be as a subset of that — it might be a CD-ROM device, say, and then it will exercise lots of ioctls on the CD-ROM and try to force out bugs. So it tries to be thorough, but there's still more work to be done there, because device drivers are a huge part of the kernel, and there's a lot of work to get full coverage. The key thing is that sometimes when exercising these, the kernel might just break, or maybe a device just hangs for a while — so stress-ng has the intelligence to detect if opens get blocked, and it will back off and not try that again if something's gone wrong.
So it essentially tries to keep moving forward if it gets stuck; sometimes kernels just break and stress-ng will eventually stop working if it can't make progress, but it tries its very best to keep on working. Next slide please.

So here are some examples: stress-ng exercising the dev stressor on every CPU. Running this as root allows you to access more functionality from the device drivers, but that does come with its risks. The --klog-check option will check for any new messages appearing in the kernel log, and it will report those to stdout when running stress-ng — that way you can see if the kernel panics or throws any weird warnings when exercising device drivers. And there are similar examples, further on, for sysfs and procfs. One difficulty is that there are multiple devices and thousands of files in sysfs and procfs, so if you run these stressors, please run them for at least five minutes, or maybe much longer depending on the speed of your machine, just so that every file gets exercised in these pseudo file systems. Can I have the next slide please.

Right, I'm quickly moving on to the scheduler. This is a very naive diagram on the left-hand side, but basically there's the run queue, and lots of stressors queued up on different CPU threads. One way of making sure things work is to force context switching and change affinity — moving stressors from one CPU to another and back again, causing frustration — changing niceness values, and doing lots of evil things like thousands of forks and clones which then die very quickly or hang around to become zombies; thrashing the wait system calls waiting for thousands of threads; migrating things across NUMA nodes; and doing things like locking on resources and forcing priority inversion.
By thrashing the CPU scheduler with stress-ng we can cause ridiculously high load averages, and it's always interesting to see what happens when the system gets fully overloaded — to see if any soft lockups occur, or even hard lockups in the kernel, when the scheduler has a massive run queue. So that's the mindset behind that. Can we have the next slide please.

So, again, here are some very quick examples. The first one exercises the fork, vfork, clone and daemon stressors — multiple instances on all CPUs with lots of stressors running. If you run that, you'll watch your load average just crawl up to a stupid level. We can do things like create zombies: processes which die but wait a while before being reaped. We can force heavy context switching with the switch stressor — there are some other stressors which can also force context switching, but this is a good example. And the hrtimers stressor is really cool, because with the hrtimers adjust option it will try to figure out the shortest timer interval to schedule, to create the most timer interrupts. When you run this across multiple CPUs on a system with lots of CPUs, it's really amusing to see how many thousands or even millions of interrupts you can generate from the high-resolution timers. And this is one way of exercising a system to see if you can cause any problems with that. Next page please.

So, there are too many really to list here, but to give you a flavor of the different types of scheduler stressors, here's the definitive list at present. I just want to pick out a few. There's a softlockup stressor, which was designed to construct soft lockups. And there's another stressor called syncload: this one creates a load which scales across all CPUs, and then, in synchronization, it will stop the load for a duration.
And then at another synchronized event time it will start the load again, so it starts and stops load across all the CPUs. That's kind of useful for seeing how frequency scaling occurs, and seeing how well the system can spin up and then shut down when you've got loads peaking and spiking very frequently. But there are far too many examples here to work through each one, so please refer to the stress-ng manual — I just want to give you a flavor of the lots of different ways of stressing the scheduler with stress-ng. Next slide please.

So, network stressing: this is still work in progress — I've got partial coverage of the network stack in stress-ng. Most of the network stressors, such as the UDP, TCP or SCTP stressors, start a client and a server. They talk to the kernel and basically use the loopback network device to send messages between the client and server, so it's all within the system — packets never go outside the machine, so you can't launch denial-of-service attacks on other hosts. That's the idea there. There's quite a range of stressors: we also do ICMP ping flooding on the local host, and stress-ng also supports raw socket and raw UDP client/server models. I try to exercise all the ioctls and setsockopt options where possible. And the stress-ng network stressors can use different ways of sending and receiving messages, using the send/write family and conversely the recv/read family of system calls to send and receive data over a socket. Currently we support IPv4, IPv6 and also UNIX domain sockets. Next slide please.

So, here are some examples. The first one is the UDP stressor on all CPUs; we also enable UDP-Lite, and we specify IPv6 for the UDP domain. As you can see, the options are quite flexible and allow you to target different use cases. Oh, I've got some background noise — thank you. Right.
Next, the TCP socket stressors — that's the sock stressor example there. Where possible you can use zero-copy for faster packet transmission into kernel space, and the sock option there selects sendmsg as the way to send the data. The next example is the sockmany stressor, and this one's useful because it runs 200 instances, and each sockmany stressor will itself open up thousands of socket connections to itself — so it will basically thrash the machine with, hopefully, tens or even hundreds of thousands of open sockets. And we can point this at a specific interface as well with another sock option: all the network stressors have an interface option (such as --sock-if) to specify the network interface to be used. The default is the loopback, but you can specify other network devices.

Again, you can mix different stressors: the bottom example shows the TCP and SCTP protocols being used, with eight and four stressors being run in that example. So mix and match, hammer the network. Each stressor also allows you to specify the port being used, so if you want you can move ports around, or target the same port with lots of different stressors, just to cause havoc. So have fun with that. Next slide please.

So, lots of different types of network-specific stressors. I mentioned a few earlier, but here's the whole list of stressors which can be used for the network. I also exercise the Linux netlink interfaces — that's the proc and task netlink interfaces — and I also do a bit of network device stressing on the interfaces there. A curious one is the crypto AF_ALG protocol: you can actually talk to the crypto engines in the kernel over a socket, and that's the af-alg stressor. It's not really a network stressor, but it uses a network socket, so I've just included it in this list.
Yeah, there's a bit of coverage, but ultimately I would like to cover more protocols — so if anyone is a real kernel and Linux networking guru, please consider adding more stress tests where there are gaps, because I'd love to see that. Right, next slide please.

So, I hope I'm not boring everyone with all these examples, but I'll go on now to the virtual memory stressors. In the real physical hardware we have physical pages, but Linux allows those pages to be virtualized, so each process, when it uses a page, is using a mapping onto a physical page. Now, stress-ng tries very hard to exercise memory mapping and the virtual memory space with lots of devious stressors. The idea here is to really see all the different ways we can memory-map pages: change attributes, map them to files or anonymous physical memory, change the way pages are shared, change protection bits, exercise page sharing across processes, and basically force page eviction and swapping and all sorts of havoc like that. So we're trying to exercise the memory subsystem in the kernel and try to cause problems. When you've got tens of thousands or maybe millions of pages mapped, stress-ng can exercise page traversal — walking over the different pages, causing translation lookaside buffer (TLB) flushes — basically hammering the virtual memory subsystem as much as we can. Next slide please.

As I may have shown earlier, the vm stressor is an interesting one because it contains lots of methods for stressing memory. It's a good one if you have a system and you want to make sure memory is working correctly and you're not getting bit rot — the vm stressor is a good first try. In the first example I'm using eight stressors on nearly all the available virtual memory. The --vm-populate option basically says that when a page is allocated, back it with physical memory immediately it's mapped.
The verify option says that when exercising each page with a test, make sure the bits are all correct and we haven't had any bit errors during the testing. Now, the second example shows 32 GB of memory being exercised by the vm stressor, and I introduce here the --vm-method option, which allows one to specify the different ways of exercising the memory. There are lots of these methods, and I'll describe them at a later point, but for this example we're using the bit-flip method. That goes through every single bit in each byte in each page of the 32 GB, flips them one by one, and then verifies those bit flips worked correctly. The vm method option has a lot of different ways of hammering memory: rotating or shifting, XORing, writing different patterns, walking over cache boundaries, and all sorts of weird and wonderful ways to try to catch memory bit corruption.

Now, the next example shows an abuse of the virtual memory system by running the brk and stack stressors. If you're not clued up about how brk works: the brk and sbrk system calls are used for allocating memory on the heap. The brk stressor is equivalent to malloc'ing lots and lots of memory; this causes paging, and then hopefully the kernel might kill the process with the OOM killer once it's used too much memory, because of the way the Linux OOM killer and overcommit work. The stack stressor just keeps on pushing data onto the stack and tries to run out of memory through abusing the stack. Basically, if you've got lots of swap, this stressor will take a long time to complete because swap is slow, but eventually it will get to the point where there's no more memory left and the OOM killer will kill the stressor — and stress-ng will restart that process, so it will just keep on stressing until you stop it.

The final example shows how to generate minor and major page faults.
It's left as an exercise for the reader to work out how minor and major page faults work, but just to let you know: if you want to create millions of page faults, use this stressor option. Next slide please. Here are some other examples. You can run random memory advice hints on lots of pages; that's the madvise stressor. If you look at the madvise system call you'll see there are lots of different ways of giving hints to the virtual memory subsystem, to say things like: I want this page, or I don't want this page anymore, or please page this in and back it with physical memory. So the madvise stressor exercises lots of different pages in lots of different ways. The next example uses the mlock API to try and lock pages into memory so they don't get swapped out. If you run this stressor as root with thousands of instances, it will try to lock as many pages into memory as it can, and then you get weird bugs occurring because other processes can't get pages without being swapped out, and maybe you'll see daemons crash and so forth. That's a good way of allocating lots of pages and causing a bit of havoc. There are other ways you can exercise memory. The next example uses the mmap stressor, and it backs the memory-mapped pages with a file, four gigabytes' worth of allocation, so that's a simple example there. And the remap stressor does memory mapping and remapping, so again we exercise the mremap API. It's all very simple, very focused stuff, but you know, we need to test the kernel APIs, abuse them, and see if any bugs pop out. Next slide please. Yeah, memory: there are lots of different ways of exercising it with stress-ng. There are things like the bsearch, heapsort and tsearch stressors, which exercise memory doing searching and sorting algorithms. This is endless.
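The bit-flip verify idea described a moment ago is simple enough to sketch in a few lines of Python. This is a toy version of the concept, not stress-ng's actual vm-method implementation: flip each bit of each byte in a "page", check the flip took, restore it, check again.

```python
# Toy bit-flip verify pass: walk every bit of every byte in one 4 KiB "page",
# flip it, verify, flip it back, verify again; count any mismatches.
buf = bytearray(4096)
errors = 0
for i in range(len(buf)):
    for bit in range(8):
        buf[i] ^= 1 << bit          # set the bit
        if buf[i] != 1 << bit:
            errors += 1
        buf[i] ^= 1 << bit          # clear it again
        if buf[i] != 0:
            errors += 1
print(errors)  # 0 on healthy memory
```

The real stressor does this across gigabytes of mapped pages, which is what makes it useful for spotting bit rot.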
So I advise people to look at the manual page or the documentation, because there are a lot of stressors here, but stress-ng is designed primarily to hammer memory and cause processes to fail or the kernel to run out of memory and cause lockups. Next slide please. We have a couple of questions, and I also have some questions of my own, so I'm wondering if this might be a good break. Yeah, I can skip through here, or maybe we can go through some questions. Okay, is this a good time to answer some of the questions? Yeah. So the first question was: how is stress-ng different from LTP? LTP is a really excellent source of regression tests, and LTP has lots and lots of very focused small subtests. stress-ng is different because it allows you to mix and match lots of stress cases across lots of CPUs, so you can do broader testing with a good mix. stress-ng also has options to do benchmarking and profiling, and to see how the kernel is running with things like the perf monitors. So in that respect stress-ng is different from LTP, but I can't say much more because my knowledge of LTP is a little bit weak: LTP is very focused on micro tests, while stress-ng has stressors with a bit more functionality built in. I could be wrong on that, so please bear with me if I am. The next question: do we have any support to stress encrypted file systems? Well, stress-ng exercises file systems, so once you've mounted an encrypted file system you can run stress-ng and target that mounted file system, but there aren't any specific encryption APIs being used. So the answer is nothing special is being done; it's very much like how fio would be used to stress test a file system. Roberto has asked a good question about practices to test VM hypervisors.
Yes, so it really depends what you want to look at. With hypervisors, I suspect you want to make sure you're not running out of memory and can handle memory contention, and stress-ng has, as I say, lots of memory stressors. You might also like to see if things like noisy neighbour problems occur. One thing with virtualization is you can have multiple virtual machines running on the same hardware, and stress-ng has some devious stressors, for example the lockbus stressor, which exercises locking across page boundaries and cache-line boundaries and really horrible things like that, which can actually affect neighbouring virtual machines. So that's a good thing to test. stress-ng also has memory stressors like memrate, which lets you benchmark the memory on your hypervisor, and maybe loading your hypervisor with lots of CPU load is useful just to see how well it responds when cranking up to full speed and back down. I think the question can best be answered, Roberto, if you go through the stress-ng manual and consult all the different stress cases to see if they're useful for your use case, because it's difficult to say without knowing how you'll be using your hypervisor. Hopefully that helps a bit. Any further questions? I have a few questions myself. When you talked about exercising the cache a few slides ago: what does exercising mean in the context of the cache, and what kind of tests do you do? So, if you imagine the cache is broken up into small cache lines — normally on x86 we're talking about 64-byte chunks — stress-ng will try and walk over those and, for example, flush them out. There are lots of different options, but one of them is to walk through the cache on multiple CPUs, flushing and maybe prefetching, mixing those, and then testing that the memory is as coherent as it should be.
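A minimal sketch of that kind of coherency check, using fork and a shared anonymous page (a toy analogue, not how stress-ng implements it): if the caches stay coherent, a write made by the child process must be visible to the parent.

```python
import mmap
import os

# One anonymous MAP_SHARED page visible to parent and child; the child's
# write must be observed by the parent if coherency is working.
buf = mmap.mmap(-1, mmap.PAGESIZE)   # anonymous + shared by default on Unix
pid = os.fork()
if pid == 0:                         # child: scribble on the shared page
    buf[0] = 0x5A
    os._exit(0)
os.waitpid(pid, 0)                   # parent: wait, then read it back
coherent = buf[0] == 0x5A
buf.close()
print(coherent)  # True
```

stress-ng does the same sort of thing, but with many processes, many pages, and deliberately nasty access patterns mixed with flushes and prefetches.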
For example, if you're sharing a page of memory across multiple stressors, stress-ng will write to different bits of that memory. Hopefully, if the cache management is working correctly, you won't see the wrong contents appear; it should all be coherent across all the processes. So there's testing like that. For those stressors, it's basically: can we exercise the cache in very bizarre mixes, in ways you probably wouldn't do in real life? But it's useful to thrash the cache with reloading and such, just to see how well it performs. Some of the stressors, for example the stream memory stressor, will read all the way through memory, and when you enable the --perf option you can see how well it's doing in terms of cache misses and prefetching. So you can look at the CPU perf monitors to see how the cache is behaving, and whether it's working as you expect or not. There's some in-depth knowledge required about how caches and perf monitors work, but hopefully the mix of stressors lets you construct different stress scenarios to exercise the cache the way you want. So it looks like you are exercising cache coherency, integrity and performance aspects. Yes. Yeah. So is this x86 only, or...? Well, predominantly x86, but there are other architectures where I've got some of these features working — I can't remember now which — where there are some kinds of cache flushing options. So where possible, if the processor supports it, I try to implement the more complex cache behaviour. I've got a question here about how stress-ng stresses different cache types; I'm afraid I don't understand that question, actually, so maybe that could be clarified. While that's being clarified, I have another question: when you were talking about sysfs and procfs coverage, you mentioned giving it enough time.
What is, in your experience, an ideal time — say in QEMU versus a desktop, for example? Well, experience seems to tell me that normally 10 minutes is sufficient. But it really depends: if you've got a device attached — say a CD-ROM, and a stressor exercises the CD-ROM — sometimes it takes a while to open the device because there's a CD in there. So it depends on what devices are attached, how slow some of them are to start and stop, what their timeouts are, and so forth. Typically I do about 10 minutes' worth of testing with as many CPUs as possible to cause as much racing havoc as possible. Each of those stressors, by the way, forks off lots of pthreads, so there's lots of concurrency at the pthread level, and also at the process level across lots of different CPUs. That's why I get quite good coverage results: the parallelization stress-ng is using. But it's also worthwhile running it for, say, 10 minutes to get as much coverage as possible. Okay, the question is on inclusive and non-inclusive/exclusive cache policy. I still don't understand it, so excuse my naivety in not understanding the cache question. I will look that up and get back to you, I think that's probably the best shot. If whoever asked can reach out to me and send me an email, I will look into it and answer the question, because at the moment I don't know how to answer it. Thank you. And I have, sorry, a couple more questions — I was writing down questions as I was reading the presentation, because I was kind of manning the advancing of the slides. What is load in your syncload option? You mentioned load; what does it mean? Okay, so basically it does some very simple busy spins on the CPU, just doing some simple math operations and things like that, so it's nothing...
...nothing like rocket science, but the whole idea is just to keep the CPUs busy across the whole set of CPUs for a determined amount of time and then to stop and go idle. It does very simple busy maths in a spin loop and just keeps iterating over that. The syncload stressor allows you to specify the times when it's idle and busy — you can specify in terms of nanoseconds, or scale up to seconds or so — and just twiddle with those options to force CPU busy and not-busy cycling. You can do this across all CPUs, or specify multiple CPUs, and alternate between idle and busy. The whole idea is basically to do everything in sync, and that should exercise, say on x86, CPU frequency scaling and dropping into different C-states and P-states. The reason you can change the durations is so you can check: am I actually dropping into the deepest C-state across all CPUs or not? That's the mindset there. It's also useful if you want to run, say, a laptop busy, not busy, busy, not busy, and see how the fan responds, because the temperature might rise and the fan might or might not cycle. So it's just another way of loading CPUs, basically. For this kind of testing, do you also take into account the ACPI modes that CPUs might be in, like a performance mode, a normal mode, or a low-power-consumption type mode? No, I don't worry about that. That's really how the user wants to configure their system; I try to keep stress-ng as policy-free as possible. Basically, set your system up in the mode you want to test, then run stress-ng. Looks like we have a question in the Q&A, and one person has their hand up.
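The busy/idle duty cycling the syncload stressor does can be sketched like this in Python (the timings and function name here are illustrative, not stress-ng's flags): spin busy for a while, sleep for a while, repeat, and on real hardware watch the C-states and fan while it runs.

```python
import time

# Sketch of syncload-style duty cycling: busy-spin for busy_ms, sleep for
# idle_ms, repeated; the sleep phase lets the core drop into a C-state.
def duty_cycle(busy_ms, idle_ms, periods):
    for _ in range(periods):
        end = time.monotonic() + busy_ms / 1000.0
        while time.monotonic() < end:
            pass                      # busy: keep the core out of idle
        time.sleep(idle_ms / 1000.0)  # idle: let it power down

start = time.monotonic()
duty_cycle(busy_ms=20, idle_ms=20, periods=5)
elapsed = time.monotonic() - start
print(f"{elapsed:.2f}s of alternating load")
```

stress-ng additionally synchronizes the cycling across all the CPUs, which is what lets you see whether the whole package actually reaches the deepest C-state.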
All right: how can I come up with good baseline benchmark values? Yeah, so this is the interesting thing. If you're using the metrics, stress-ng has this notion of bogo-ops, which is how many times it has spun round a busy stressing loop. What you should always do when comparing systems with stress-ng is: number one, don't compare different systems with different versions of stress-ng. In short, build or install one version of stress-ng when you're benchmarking across systems. Number two, if you're building from source, make sure you're using the same compiler version. The reason is that stress-ng will try to use more advanced optimization features in the compiler when available, so if you compile the same version of stress-ng with older or newer compilers, you might get different metrics coming out. The bogo-ops measurement can also suffer from jitter if you don't run the stress test long enough. And another thing: some stress tests work very well as reliable benchmarks, others don't, and the reason is just the behaviour of the system. For example, if you're doing lots of random IO seeks on a hard disk, you'll get different metrics than if you're measuring something consistent like CPU operations, because on the hard disk you're working with real physical hardware, which can vary in performance. So I don't think I've answered that question well, but the things to be aware of are: always run the same stress-ng, built with the same compiler. Then you can compare apples with apples rather than apples with pears. And for a good baseline, it's always good to run the benchmarks on all CPUs rather than just one; you will notice different behaviour as stuff scales up, so benchmarks are more reliable run across all the CPUs.
That way you know you've got a fully loaded system and a fairly representative idea of how your system is behaving. So I hope that answers the question. Okay, there's a hand up. Oh, I can't see — sorry. Hi. Yeah, sorry for the earlier question about inclusive and exclusive caches. What I mean is that some levels of the cache might be inclusive of the lower levels, right. Yeah, okay, I've got it. So the cache stressors are particularly naive. If you say "I want to stress the level one cache", stress-ng will basically allocate a buffer exactly the size of the level one cache and then exclusively exercise that. So you will probably get leakage down — stuff gets flushed out from beforehand — but if you're spinning on one blob of memory which fits inside, say, the level one cache, then hopefully that will stay active as the working set. I don't have any smarts to exclude stuff from level two, level three or DRAM; it's not that smart. Hopefully that answers your question. Yeah, sure. Cool. Okay, I think that wraps up the questions, so we can move back to the presentation. Let me take you back, and I'll zip through, because I've got lots of examples — maybe people can look at the slides later for more worked examples, as I've got quite a bit of content to go through. So let's go on to the next slide. So, as I say, lots of examples here. Thermal overrun was really what stress-ng was originally designed for, and stress-ng has lots of different ways of exercising the CPU. The cpu stressor has over 80 individual stressor methods, and they exercise all types of compute: floating point, vector operations, loads and stores, trig functions.
It's got algorithms which are used widely in industry, such as Fourier transforms, cyclic redundancy computations and IPv4 checksums; there's also compression and bit packing and all sorts of things being used there. So I think stress-ng at this point is fairly representative of ways to make your CPUs hot and exercised. Now, a thing to be aware of: if you find a stress test which heats up your CPU, then when you upgrade your CPU it might be a different set of stressors that hits thermal overrun. That's because the microarchitecture changes over time. So always be aware that what is a good test today might not be so in two years' time. It is also architecture specific: something that makes x86 really hot might not do the same for a Raspberry Pi. So I won't go through them all, but here are some more examples of ways of stressing stuff. The thing to notice here is the --tz option, which dumps out the state of the thermal zones. You run your stress test for five minutes, and the temperature of every thermal zone on your computer gets dumped out so you can see how hot it's getting. The third example down, the int128decimal32 method, is a really good CPU stress test for x86 systems: it tests 128-bit integer maths and 32-bit decimal floating point maths, and for some reason this seems to make modern x86 CPUs quite toasty. And the matrix stressor is useful because it does compute-bound and cache-bound loads and stores, so the cache gets hot and the compute part of the CPU gets warm as well. Those particular tests illustrated here seem to produce the most heat, but your mileage may vary on different systems. Next slide please. So, as I said, the --tz option dumps out thermal zone temperatures in degrees Celsius. It's kind of useful — you can see how hot my laptop gets here, and that's after changing the thermal paste. So, yeah.
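The thermal zone data that --tz summarises comes from sysfs, and you can read it directly; a small sketch (plain stdlib Python, nothing stress-ng-specific — note that VMs and containers may expose no zones at all):

```python
from pathlib import Path

# Read each thermal zone's type and temperature (reported in millidegrees
# Celsius) from /sys/class/thermal, the same source --tz uses.
zones = []
base = Path("/sys/class/thermal")
if base.is_dir():
    for zone in sorted(base.glob("thermal_zone*")):
        try:
            ztype = (zone / "type").read_text().strip()
            mdeg = int((zone / "temp").read_text())
        except OSError:
            continue                 # some zones refuse reads; skip them
        zones.append((ztype, mdeg / 1000.0))
for ztype, deg in zones:
    print(f"{ztype}: {deg:.1f} C")
```

Running a heavy stressor while polling these files shows you exactly which workloads push which zones towards thermal overrun.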
Very simple example, nice data coming out of it. Next slide please. Other compute options: there's JPEG compression — the jpeg stressor uses a series of different types of images to compress. We've got zlib compression. We've got vector maths and wide vector operations; these are really only supported on modern x86 processors, or architectures which support the GCC vector maths extensions. There's also the GCC target_clones attribute, used to generate code that matches newer versions of CPUs, so if you've got the latest and greatest x86 system, then hopefully GCC will produce code which is quite efficient, or relatively efficient, for your target processor — and this is determined at runtime. When the stress test starts up it will say: oh, you've got all these fancy wide vector operations, let's use that bit of code. Another one to look at is the hashing functions: there are, I think, at least 20 hashing functions, and they're good for stressing shifting, multiplying, and memory and cache activity. So those kinds of things make the system a bit warmer as well. Next slide please. Yeah, just to say again, the stress-ng cpu stressor is very wide and rich, so look at the which option: --cpu-method which will show you all the different stress methods in the cpu stressor. Next slide. So, someone asked earlier about eCryptfs. stress-ng can exercise a lot of — well, all of — the file system system calls: reading, writing, seeking, direct IO, async IO; it exercises io_uring, syncing data, creating holes and punching holes in files, and it exercises a lot of file-based ioctls as well. There's locking too — all the different file locking APIs which Linux has, way too many of them. So you can exercise all that on the file system: point it at your favourite file system, be it eCryptfs, ext4, XFS, btrfs, whatever, or ZFS for example, and then run the stress test on that.
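Hole creation, mentioned above, is easy to see from user space. A toy sketch (not stress-ng's implementation): seek a megabyte past the start of an empty file and write one byte, and the file's apparent size is over a megabyte while almost nothing is actually allocated.

```python
import os
import tempfile

# Make a sparse file: the region between offset 0 and 1 MiB is a "hole"
# that reads back as zeroes but consumes no disk blocks.
fd, path = tempfile.mkstemp()
os.lseek(fd, 1024 * 1024, os.SEEK_SET)
os.write(fd, b"x")
st = os.fstat(fd)
apparent, allocated = st.st_size, st.st_blocks * 512
os.close(fd)
os.unlink(path)
print(apparent, allocated)           # apparent size far exceeds allocation
```

File system stressors lean on tricks like this because sparse extents, hole punching and the code paths around them are a rich source of file system bugs.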
We've also got the ability in stress-ng to exercise the block device by doing lots of random read patterns and interesting read patterns, which is kind of fun on a traditional old hard disk because you can move the head back and forth and exercise it. stress-ng is not really a benchmarking tool or a replacement for fio. If you want to do serious file system work, use fio — stress-ng has still got the ability to test things if you want, but I recommend fio; it's a most excellent tool. Next slide please. One feature I put into stress-ng is the ability to check the drive's SMART stats. Most modern drives have SMART enabled, and they can tell you if your drive is going to fail, how hot it's getting, and other interesting things like that. So the top example runs a mix of IO patterns, enables SMART monitoring, and says: I want to test that file system using 10% of its free space. I leave that running for a while, stop the test, and the SMART data is dumped out. I've used this before on a drive and detected it was having lots of seek errors and read problems, so it's useful as a way of exercising a drive to see if it's going to fail or not. One other option I'd like to draw your attention to is the temp-path option; that's the third example down. You can specify where stress-ng will create its temporary files for the file system stressing. stress-ng will generally use the current directory it's running from as the default place to put its temporary files; --temp-path allows you to override that, so you can point at places where other file systems are mounted. Well, I think that's a quick and easy example there; as I say, there are some more examples you can look at, or read the documentation. Next slide please.
And as I say, there are lots of different file system stressors that test different system calls: ways of creating and populating directory trees, locking, and so forth. Some of those examples are at the bottom — I've got lots of different file system exercisers running all at the same time, just to show how you can mix different file system stressing capabilities with stress-ng. Next slide please. So here's an example of Minix being broken. Just to show how portable stress-ng is: it builds cleanly on Minix, and it breaks the virtual file system. Oh dear. Yes. Well, there we go. Next slide please. Quickly, just to say that stress-ng can also exercise branch prediction. One of the stressors is a horrible mix of labels to jump to: it uses gotos to jump indirectly to labels, driven by a little random number generator, which produces branching that is very hard to predict. This causes lots of branch prediction misses, so it's a nice little exercise to see how good your processor is at branch prediction. Here's an example using the branch stressor with perf, and you can see the perf monitor stats being dumped out after a minute: it's doing 0.31 billion branch instructions per second, of which 0.29 billion per second are branch misses, so that works out at nearly 100% branch prediction misses. That's how evil the branch stressor is — it really does make your branch prediction logic work hard and get it wrong. And there's another stressor I've written called the goto stressor, which does forward and backward branching. It does so many of them that the processor runs out of branch prediction resources. So that's another way of making branch prediction work hard. Next slide please.
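Why random branch targets defeat the predictor is worth a tiny illustration (a statistical sketch in Python, not the stressor's code): if a PRNG decides every branch, taken versus not-taken runs at about 50/50 with no exploitable pattern, which is the worst case for prediction hardware.

```python
import random

# The branch stressor's trick in miniature: a PRNG picks each branch
# direction, so there is no history a branch predictor can learn from.
random.seed(42)                       # fixed seed for a repeatable sketch
n = 100_000
taken = sum(1 for _ in range(n) if random.getrandbits(1))
print(f"{100 * taken / n:.1f}% taken")
```

The real stressor turns the same idea into computed gotos between many labels, so it's the branch *target*, not just the direction, that the hardware keeps getting wrong.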
Quickly, just to say that there are ways of exercising locking primitives: there are pthread mutexes, the futex system calls, and some old, ancient ways of doing locking using the Peterson and Dekker algorithms, which use just shared memory and no kernel interfaces. The Peterson and Dekker stressors are really useful for systems where you've got shared memory and you want to make sure the cache coherency is working correctly on an SMP or NUMA system. And finally, atomic locking: basically lots of x86 lock instructions (on other architectures it's implemented differently). It's a really good way of — literally every instruction is a locking instruction — and it tries locking across cache-line and page boundaries and all sorts of evil things like that, just to make life difficult for the processor. Next slide please. So, here's an example of exercising memory bandwidth. This is the stream stressor, which is modelled on STREAM, the de facto memory bandwidth benchmark tool, but it behaves differently: it also allows lots of different levels of indirection to handle multiple pointer dereferences. So it's not exactly the same as the STREAM benchmark, but it's useful for seeing the kind of memory rates you can get from reads and writes, and how much compute you can get while doing streaming operations. So it's a useful metric of compute and memory throughput for benchmarking systems. Next slide please. And this current slide shows you the memrate stressor, which does writes and reads of various sizes. On x86 systems it will use non-temporal writes where possible — that's where you bypass the cache and go directly to memory, so you can get very good write performance. There are also prefetched reads for x86 and other systems which handle prefetching. So this is a good benchmark if you just want to get an idea of streaming write or read performance on your system.
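Going back to Peterson's algorithm for a moment: it's short enough to sketch. This toy Python version (not stress-ng's C implementation — CPython's GIL happens to provide the sequentially consistent memory the classic algorithm assumes) shows two threads achieving mutual exclusion with nothing but shared variables.

```python
import sys
import threading

# Peterson's algorithm: mutual exclusion for two threads using only shared
# flags and a turn variable; no mutex, no kernel locking involved.
sys.setswitchinterval(1e-4)          # switch threads often to provoke races
flag = [False, False]
turn = 0
counter = 0
ITERS = 2000

def worker(me):
    global turn, counter
    other = 1 - me
    for _ in range(ITERS):
        flag[me] = True              # announce intent to enter
        turn = other                 # politely offer the tie-break
        while flag[other] and turn == other:
            pass                     # spin until it is our turn
        counter += 1                 # critical section
        flag[me] = False             # leave

threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 2 * ITERS if mutual exclusion held
```

On real SMP hardware without the GIL this only works if the cache coherency protocol (and the right memory barriers) deliver each thread's writes to the other in order, which is exactly why the Peterson and Dekker stressors are good coherency probes.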
The memrate stressor is not really going to do any verification; it's just a throughput kind of benchmark. Next slide please. For inter-process communication we have semaphores, message queues and pipes. Those are pretty standard, so I won't go into them much more, but be aware that stress-ng can stress those too. Next slide please. Right, so now on to how I develop stress-ng. Well, apart from writing a stress test and guessing how well it works, I also use gcov instrumentation in the kernel. I build a kernel with gcov enabled, install that kernel, then I run a script called kernel-coverage, which is in the stress-ng repository. This takes 12 to 15 hours to run. At the end of the run I run lcov, and that generates HTML pages which show every line in the kernel which has been exercised or not exercised by stress-ng. Then I look at each stress test and see where coverage is hitting and also missing, and I try to devise new stress cases to exercise the bits which are missing. Basically I keep repeating that — rinse and repeat — and try to add more features in each release, exercising all parts of the kernel. If you've never seen lcov output, let's look at the next slide and I'll show you what we see. Here's an example of the fs directory in the kernel, and you can see the coverage for each individual source file: the percentage of coverage and the number of lines in each file. You can see I'm getting quite good coverage on some files and some parts of the system, whereas others, like binfmt_misc, I'm not really exercising at all — there's hardly any coverage. The next slide will show you an individual source file in the kernel. So here we go: here's fs/fcntl.c, and there's one particular part of a case statement — this particular command, F_GETOWNER_UIDS, is not being exercised by stress-ng for some reason.
Now, that could be because I haven't implemented it, or because the #define isn't in my libc implementation at the moment, or it may be that my stress test is broken and not testing it correctly. But at least one can see, by looking at the coverage, that I'm not exercising it, so that needs to be addressed. This is the kind of exercise I do for each release: I run this coverage, check that I'm getting better coverage than the previous version I released, make sure there are no regressions, and identify areas where I can improve stress-ng. It's very tedious and painstaking, but it does help guarantee I get good coverage. Next slide please. So, as mentioned, one of the features of stress-ng is micro-benchmarking, and the bogo-op is a bogus measurement of operations per second. Remember, it's bogus — totally bogus. It doesn't mean anything apart from how many times a stress test has iterated round a loop. As mentioned earlier, it can jitter a lot. Some stress tests are better than others: compute-bound ones have little jitter, but IO-based ones, or ones where the kernel is doing other activity in the background, may jitter, so bear that in mind. I've also mentioned that new releases of stress-ng have optimizations, new features, bug fixes and so forth, so performance measured in bogo-ops may and will change. Don't compare metrics from different versions of stress-ng against each other, and make sure you build with the same compiler — that is something you need to remember. The bogo-ops are not gospel truth; this is not a true benchmarking system, but it's something to give you a feel, a reasonable indication within a few percent of slop. It's jittery, it's sloppy, it's not perfect, but it's a good way of getting an idea of how well your system is performing. Benchmarking is really hard to do and really hard to get right, so treat this with caution.
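What a bogo-ops rate boils down to can be sketched in a few lines (a toy model, not stress-ng's accounting): count trips round a busy loop for a fixed wall-clock window and divide by the elapsed time. Short windows make the jitter obvious.

```python
import time

# A "bogo-op" is one trip round a stressor's main loop; the rate is simply
# ops / elapsed seconds. Very short runs jitter, which is why longer runs
# give steadier numbers (tiny durations here just keep the sketch quick).
def bogo_rate(seconds):
    ops = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        _ = sum(i * i for i in range(100))   # stand-in "work" per bogo-op
        ops += 1
    return ops / seconds

rates = [bogo_rate(0.05) for _ in range(3)]  # three short, jittery samples
print([f"{r:.0f}" for r in rates])
```

Comparing the three samples against each other gives a feel for the run-to-run noise you should expect before trusting any single number.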
The --metrics option shows you the bogo-ops rates, and there's also a --times option which shows you how much CPU time — user time and system time — has been used by each stress test. In the example below I'm running the cpu stressor and saying: do 10,000 bogo-ops and stop after that. So you can specify how many iterations to run rather than specifying a time to run. Next slide please. I think we're running out of time. So here's an example of the matrix stressor being run, and the type of output you get. Notice the first run uses one CPU and the bottom one eight; the bogo-ops from the eight-CPU run are not eight times the one-CPU run, just because of the way things scale — we're using threads, and you don't get a one-to-one improvement as you increase the number of CPUs. That gives you a flavour of how the metrics work and the type of bogo-ops stats you get. You've got the real time, which is the wall clock time, and then user and system time, and bogo-ops per second measured in different ways: throughput based on how long it ran, or on how much CPU was used in user space, in the kernel, or combined. I hope that makes sense; if not, ask me later and I'm very happy to clarify. Quickly moving on to the final few slides — it's a bit of a marathon, this. So, here we go: perf. I've mentioned this before, so I won't elaborate, but here's an example of the stream stressor with the --perf option. You can see how many cache misses I'm getting — level one, level two, and TLB misses and such. perf is great for seeing how the system runs: think of a stressor, run it, get the perf output, and see how that changes on your system, or where the hotspots are. So that's kind of useful. Next slide please.
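Going back to the metrics for a moment, the real versus user+system rates can be illustrated with a sketch (a toy model of the idea, not stress-ng's output code): the same run measured against wall-clock time and against CPU time actually consumed, which diverge on a loaded or preempted machine.

```python
import time

# The same busy loop measured two ways, like the --metrics columns:
# bogo-ops per second of wall-clock (real) time, and per second of
# CPU time actually consumed (user + system).
t0, c0 = time.monotonic(), time.process_time()
ops = 0
while time.monotonic() - t0 < 0.2:
    _ = sum(range(1000))             # stand-in "work" per bogo-op
    ops += 1
real = time.monotonic() - t0
cpu = time.process_time() - c0
print(f"{ops / real:.0f} bogo-ops/s (real)  {ops / cpu:.0f} bogo-ops/s (usr+sys)")
```

On an idle machine the two rates are close; when other processes are competing for the CPU, the real-time rate drops while the CPU-time rate stays roughly put, which is exactly why both are reported.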
Yeah, the other thing you can do while running a stress test is use the --vmstat, --thermalstat or --iostat options to get more data out of stress-ng while it's running. In the bottom example we're running iostat every five seconds, and that will show you how many reads and writes are occurring every five seconds, and the actual amount of data being read and written as well. That's --iostat; --vmstat is very much like the vmstat tool and gives you information about CPU utilization, and --thermalstat gives you statistics on how hot your CPU is getting, so that's useful to have. And you can run all three in parallel — vmstat, thermalstat and iostat all running at the same time, if you like. Next slide please. I mentioned that the stressors are all gathered together in things called classes. So if you want to run, say, all the file system tests together, you can use the --class option. This example shows me running the file system class: the stressors run sequentially on eight CPUs, and each file system stressor runs for one minute. So you can kick that off and come back in an hour or so, after it's walked through every file system stress test. It's just a useful way of collecting stressors into groups of similarly designed stress tests. I don't use it that much, but some folks do. So, a couple of questions. One is: would you say running all these classes, say in a shell script, would exercise the whole kernel? Yeah — well, actually, if you wanted to run all the stressors you could just run stress-ng --sequential 8 -t 1m, and then stress-ng will work through every single one of the 280 or so stressors for one minute each on eight CPUs.
But the --class option lets you say: actually, I want to just exercise all the stressors that exercise interrupts (--class interrupt), or all the memory-specific ones (--class memory), and all the memory-type stressors will run. So it's just a way of bundling them together, kicking off a run for a while and then coming back to see how it went. So, one question I have, as we were talking about earlier, about lcov, and that's cool by the way: how do you determine if you need to make improvements to stress-ng to keep up with the kernel? How often do you find yourself making changes to stress-ng to keep up with new kernel features, and how extensive is the work you find yourself doing? So, a lot of stress-ng I've developed in my spare time, and since I work on the kernel quite a lot, I keep track of what's going into linux-next and what Linus is actually pulling in from next. I also look at things like Kernel Newbies, which has a list of all the features going into each kernel, so I try to keep an eye on what new system calls, extra flag options and ioctls are being added to the kernel. I build the latest kernel, run the current stress tests on it and then eyeball those areas, just to verify whether I'm actually exercising them. And then I'll write the stress test, so it takes a bit of time. I must admit I normally do a release of stress-ng once a month, so I do the coverage at least once, because I do it around release time. Using that coverage data I then write down a roadmap of extra things I want to put in for the next release. So that's generally how I drive it. It's rather ad hoc work, spare-time stuff I do in the evenings or at weekends, so it's kind of boring, mind-blowingly boring, but I like doing it because I like adding more features.
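The class-based runs just described could be driven from a small script like the one below; the class names used are ones documented in the stress-ng manual, and the timings are only illustrative:

```shell
# Walk through every stressor in a few classes, sequentially,
# 8 instances each, one minute per stressor.
stress-ng --class filesystem --sequential 8 -t 1m
stress-ng --class interrupt  --sequential 8 -t 1m
stress-ng --class memory     --sequential 8 -t 1m

# Or simply walk through every stressor stress-ng knows about:
stress-ng --sequential 8 -t 1m
```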
So we have a question, looks like from Paul, in the Q&A box. Oh yeah, let me find that. What's been the greatest benefit of stress-ng? Okay, so the greatest benefit, really: the win was when I was working at Canonical on bring-up of the RISC-V platforms. We had a couple of development platforms, and we were working closely with the vendor there; they were running stress-ng and we were running stress-ng, and we were finding all sorts of failures, just because on a new architecture there were problems with the system call interface, for example, and the way stuff was not being mapped correctly was causing panics and exercising the virtual memory subsystem in weird ways. So it's great when you're doing bring-up of new platforms. It's also really good when new features of the kernel arrive: you enable them, and it's good just to thrash them with stress-ng, just to see if they behave correctly. So I think that's the benefit: stress-ng has found 45 or so kernel bugs, and they're not just Linux ones, they're on different kernels as well. So: new architectures and new kernels, very useful to run on; new features, good to exercise. And missing capabilities? Well, I was going to say that GPU stressing is not implemented, but just this very week some very kind folks at Red Hat sent me a patch, which is in review, which stresses the GPU, so that kind of answers that. I do feel the network side of things could be improved. And other shortcomings: it's not as good as tools like syzkaller, which is really developed to exercise system calls and does that really well; stress-ng is not like that. So that's a deficiency of stress-ng, really. And syzkaller is more of a fuzz tester, right? Absolutely. Stress-ng is doing different things, in terms of regression testing, performance testing and stress testing. Yeah, it's different.
I wouldn't compare syzkaller and stress-ng in that sense; that would be short-changing stress-ng in some way. I've had people saying, oh, it stresses all the system calls, why aren't you using syzkaller? It's like, yeah, it's different. That's why I want to point out that I'm really trying to produce corner-case bugs, where, say, you've run out of resources in various different ways, so your driver is not checking a pointer correctly when its memory allocation fails, and then you get a kernel crash; or there's a locking issue because I'm running multiple threads on a device, opening and closing it in a racy way, and bugs pop out. That's the concept of stress-ng: to force out these weird corner-case issues. And the coverage allows me to look at where corner cases aren't being exercised, and I try to think of ways of creatively exercising and tickling those error paths. So I have one more question: you mentioned SMART, what does SMART stand for? I do not remember off the top of my head what the acronym stands for. It's an industry-standard way of getting various drive metrics from a disk drive or SSD, and stress-ng supports it. It basically tells you things like read error rates, and it's useful for detecting if sectors are going wrong and the drive is doing sector redirection to cover up faults on the disk; it's a good way of catching a disk that is starting to fail while running a stress test. Yeah, I can't remember; it's an industry-standard term which I've completely forgotten. That's what happens with standard terms: I use them and then I don't always remember what they stand for. So I have one more question for you.
What are your thoughts on hooking stress-ng into kselftest? In terms of, if I were to say, I could write a shell script that picks a few classes and runs them through as tests? I think that's a good idea, yeah. And, you know, there are different ways of doing it. I think the tricky part is getting stress-ng for your system, because if you apt-get it you're going to get an old version, because I push out releases literally monthly. Someone kindly provided a way of building it in Docker just this last week, so you can now get a Docker build of the latest and greatest. There are also snaps, if you want to use those, but they're not my favourite. So there are different ways, or you can just build it from source. The cool thing about stress-ng is, if you don't have all the support libraries, you can still build it; it'll just lose functionality. So it should build cleanly out of the box with the compilers you've got. Oh, so I might explore that option. And then one last question, you don't have to answer this if you don't want to: have you ever considered making stress-ng part of the kernel? Um, no. Well, we could do that, but I wanted to decouple it from the kernel, because my intent is to make it useful across lots of different kernels, and pushing commits into the kernel means it has to go through another layer of indirection. So, it looks like there are overlaps between the perf tool and stress-ng in terms of performance; would you say that's accurate? Perf is great for monitoring stuff; it's brilliant for that.
The perf interface in stress-ng is an extra way of getting metrics, so perf is a great tool, and combined with stress-ng it's just nice to be able to validate that some of these stress tests are doing what you think they're meant to be doing. That's one of the reasons for having the perf option there. It's good to see if you can actually drive a system really hard, to see if you can maximize the number of instructions per second or your memory bandwidth, and perf is very good at giving you some very deep statistics, so you can see if stress-ng is doing what it says on the tin. Cool, so I think those are my questions. And then, how are we doing on time? Well, I've got a few more slides, so we can zip through those. Yeah, we can ignore that one. Just to say, you can use the cyclic stressor for doing real-time and latency benchmarking of the kernel. Read the slide, and come back to me, folks, if you want to know about that, because it's actually really quite handy. Next slide please. Yeah, we'll move on. Next slide please. I've got some evil stress tests, like generating System Management Interrupts, and that requires the --pathological flag because it is really horrible. There are ways of exercising P-states, and there are ways of locking the bus, which is really useful in VMs for causing noisy-neighbour problems. So there are evil ways you can use stress-ng; treat them with caution. Next slide please. So I just want to emphasize that stress-ng has been designed over the last few years to be really portable, so it does a parallelized auto-detection of all the APIs and features you've got with your C headers, the compiler features, and which instructions are supported by your toolchain.
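As a hedged sketch of the cyclic stressor mentioned above (option names from the stress-ng manual; real-time scheduling policies typically need root):

```shell
# Measure scheduling latencies with one cyclic stressor instance
# under the FIFO real-time policy, printing a latency distribution.
sudo stress-ng --cyclic 1 --cyclic-policy fifo --cyclic-dist 100 -t 60s
```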
And I've got it to build with GCC, clang, TCC, PCC and the Intel compiler, and it builds with most of those with zero warnings even with pedantic warnings enabled, so it's really clean code. And it compiles and runs on lots of different architectures, and I've also got it running on a lot of different operating systems. You can use glibc or musl libc as well. So I've tried to make it really portable, and I'm really interested if other folks want to try it out on systems where I haven't tried it. For example, if anyone's got access to AIX, it'd be nice to see if it builds and runs on there. So try it out on different kernels, different operating systems, different toolchains; if there are bugs, let me know and we'll get it even more portable. Next slide please. So, it's very simple to clone the repository. If you want to build it, cd into the stress-ng directory and just type make clean and make, then make install. And if you want the manual, make pdf; obviously you need the build dependencies for making the PDF, and you need the toolchain. Basically, that's it. There's a README file with stress-ng which tells you all the libraries you need to install for different distros, and that way you get the full set of functionality with stress-ng; but if you don't have those libraries, you can still build and run stress-ng with less support. So I try to make it so it builds cleanly out of the box and will work. That's about it there. And next slide please. Just to say, the way it's structured: stress-ng.c is the main stress test harness; there are core support files that do all the magic shimming around interfaces and abstraction; then there are the stressor sources, the test directories and the Debian config and makefiles. Just run make at the top level and it will do all the stuff for you. That's it for that. And I think we're down to the final slides now. Next slide please.
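The build steps just described look roughly like this (the GitHub URL is the project's public repository; the PDF target needs extra documentation dependencies, as noted):

```shell
git clone https://github.com/ColinIanKing/stress-ng.git
cd stress-ng
make clean
make -j"$(nproc)"
sudo make install
# Optional: build the PDF manual
make pdf
```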
Yeah, as I say, I work really heavily on making things work correctly, so I do static analysis on stress-ng before I release it, I build it with lots of compilers, and I test it on lots of different operating systems on lots of architectures. Most of this is automated: I've got 90 virtual machines set up to do the testing, to make sure it really isn't buggy when I release it. Bugs do still escape, but I try to minimize that with thorough testing. Next slide please. And for people who really want to get involved with stress-ng: it'd be great if people could see if we can improve the ioctl coverage across different drivers, or work on networking support. There's the new family of mount system calls I haven't implemented yet. I want to keep in sync with new kernels, and x86 has been the focus a lot in stress-ng, so other targets, different architectures, would be really useful. And the kernel is majority driver code; stress-ng should be testing drivers better to find more bugs. I think most of the bugs now are in the kernel drivers, so that's where we can focus next on making stress-ng better. I think we're down to the final slide now, please. We've got 180 stressors; we've found 45 kernel bugs; micro-benchmarking has found 15 or so kernel improvements; there's zero-day CI integration with the Linux kernel project; stress-ng is being used by quite a few distros for kernel regression checking; people are using it now for research in the cloud domain; and it's also being used to check for microcode regressions by some folks. So it's getting a lot of use. I do get feedback on how it's being used, but if you're using stress-ng in a really interesting way, let me know and we'll see if we can improve it for you. This is the final slide: this is the place where everything's kept, so please refer to those URLs and the manual. Please read the manual before asking me questions.
And yeah, that's about it, really. Thank you, Colin, great presentation. Sorry, there's a lot of content there; there are a lot of features in stress-ng, so apologies, everyone. Colin, do you want to see if anybody has any final questions? Yeah, sure, let's wait for some questions. The other thing is I'm very approachable: send me an email or report bugs on the project site and I'll try to get answers to you pretty quickly, normally within a day or so, unless I'm on holiday. So it looks like somebody posted "Self-Monitoring, Analysis and Reporting Technology"; that seems to be what the abbreviation SMART stands for. Oh yes. Yeah, that's handy. One more message: just questions about the slides being shared. I suppose they'll be shared after this, somewhere on the Linux Foundation website, on your landing page on the Linux Foundation website. Thank you. And I think I'll probably put the slides as a PDF in the repository as well, if that's okay. Awesome. Okay, well, if there are no more questions I think we can let everyone go, but thank you so much, Colin, for your time today, and thank you everyone for joining us. Just again, a quick reminder: this recording will be up on the Linux Foundation's YouTube page later today, and we will have a copy of the presentation slides on the Linux Foundation website on the landing page where you went to register. So we hope you will join us for future mentorship sessions. Thank you so much. Thank you.