Okay, thank you. So I guess I should start with a brief description of what monkey testing actually is. There's this myth that a monkey, given enough time, will produce the complete works of Shakespeare. There's actually an empirical study of this, and they found that basically the monkey will do bad things to the typewriter long before it produces the works of Shakespeare. But in computer science, monkey testing means basically spamming some piece of software with continual random key presses until some kind of bug is found.

Now of course, when you do this you probably end up with thousands and thousands of key presses needed to reproduce the bug. But we can randomly discard keycodes, for example, and if the shorter recipe still reproduces the bug, we keep that shorter recipe, and eventually end up with a very nice short recipe. This is a bit like genetic algorithms, for those who've come across those, except that the only mutation we need is one that makes the gene shorter.

So the short answer is yes, monkey testing can produce better bug reports. For example, being an automated process, it can produce consistent bug reports with all the required information, like backtraces and so forth, whereas users will often forget some important piece of information. Monkey testing can also provide bisection reports: I can go through each version of the software and find the exact version that caused the bug to be introduced. This is a lot of work for users to do manually, so most manually reported bugs never have bisection reports. Also, when a user comes across a bug, it's often because your software is eating their homework, and they're not in that good a mood, so the bugs are often emotive, and almost always emotive in a negative way, which is unfortunate. Monkey testing, on the other hand, is always factual.
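That recipe-shrinking loop can be sketched in a few lines. This is only a minimal illustration, not the actual script; the `triggers_bug` callback is a stand-in for replaying a keycode sequence against the application and checking whether the crash still occurs.

```python
import random

def shrink(recipe, triggers_bug, rounds=500):
    """Randomly drop keycodes from a bug-reproducing recipe,
    keeping any shorter variant that still triggers the bug."""
    best = list(recipe)
    for _ in range(rounds):
        if len(best) <= 1:
            break
        candidate = list(best)
        # mutation: delete one randomly chosen keycode
        del candidate[random.randrange(len(candidate))]
        if triggers_bug(candidate):  # shorter recipe still crashes: keep it
            best = candidate
    return best
```

Each replay is cheap, so a few hundred rounds of this usually whittle thousands of key presses down to a handful.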
It's always patient. It doesn't care about its homework being lost. Another advantage is that users generally report bugs against a stable version of the software, which from the point of view of the developers is ancient. In fact, I remember back when I started using Linux, the KDE project wouldn't accept bugs from any version of KDE that was actually distributed in a distribution. Developers really like to have recent bugs against recent versions.

But there are some related questions. Although monkey testing produces very high-quality bugs, one question is how difficult it is to run. In principle it's fully automated, so there's essentially no cost, but it's a little more complicated than that, as we'll discuss. The other question is, if these bug reports are better, to what extent does that really save developer time? Those questions are a little more ambiguous, so I'll discuss them too.

But to begin with, I'll give some motivation. I've reported a number of bugs to Launchpad; in one case, I think one of my bugs was marked as a papercut and, I think, fixed. But the bug numbers in Launchpad are up to about 700,000 by now.
There are a lot more bugs in Launchpad than anyone could possibly fix, and I think it's probably the same with Fedora. I mean, Fedora generally doesn't fix that many bugs in upstream projects. Ultimately, all of the distributors rely on the upstream projects to give them high-quality software. So to some extent you'd think that cutting out the middleman and reporting directly upstream would be more efficient, and in particular would let upstream have the bug reports earlier, and the earlier a bug is found, the easier it is to fix. So arguably everyone wins: Ubuntu, Fedora and upstream. But, for example, a lot of the bugs in Ubuntu aren't that high quality, and just dumping 700,000 bugs on upstream developers without any triage could be quite counterproductive.

I guess the idea is that with a fully automated bug detection system that automatically refines the bug reports and generates all the required information, we can produce bug reports of higher quality than even a skilled bug reporter could. And because it's essentially fully automated, any user able to run the software will generate the same quality of bug reports, so you can produce consistently high-quality bug reports. Monkey testing itself can only find crash-type bugs, but if it encourages users to learn how to report good upstream bugs, it could also help other bugs be found. And so the dream is that you could have this project.
One that's easy to use, that pretty much any user could use, maybe with a GUI. There are millions of Linux users, so if even a tiny fraction used this highly easy-to-use software, that would mean every GUI project could be tested. You could find important bugs even in the snapshots, so even the unstable versions could be quite stable in practice. This would also encourage power users to use upstream versions, and it may be that a lot of bugs would be fixed well before the Debian packages or RPMs are produced, saving downstream packagers a lot of work.

The other goal is that many of the tools produced along the way would be useful in their own right. For example, bisection is a very important tool: developers often want to know when a particular regression was introduced, so a user-friendly bisection tool would be useful even for bugs that weren't found by keytest. It's also about empowering users. For one thing, when the user knows the particular change that introduced a bug, you could have a GUI that offers to revert that particular change. So if there's a particular bug that really annoys them, they could simply undo whatever change caused it; it would let them scratch their own itch, to some extent. And allowing users to contribute more effectively to upstream projects would put them on a more equal footing with developers. In open source software you often have the feeling that the developers are just throwing out this code to help the users, and the users aren't really directly contributing back, which leads to a sort of uneven relationship where it's hard for users to make requests in some cases.

So, monkey testing is in principle a fully automated process, and the costs are pretty low. One of the things I particularly wanted was for monkey testing not to require developer time from the upstream projects, for
one thing, of course, because developer time is very valuable, and also because if you want to test, you know, every great project, then while there are lots of good ideas, you can't expect every developer of every upstream project to be excited about yours. So it has to be something that doesn't require any buy-in from them.

Clearly CPU time is something a user can contribute very easily, because desktops happen to have huge amounts of unused CPU time. And although users usually aren't willing to spend a year learning C, a small amount of time just to have a look at what keytest has done and then report a few bugs is probably quite reasonable.

Has anyone here heard of Progress Quest? Does anyone here actually enjoy Progress Quest? Okay, no. Progress Quest is a game with absolutely no interaction: it just sits there and tells you that you're becoming more and more powerful. Keytest is kind of the same thing: it just continually tells you it has found all these wonderful new bugs. Just watching the process, seeing it count away and find new bugs, is sort of interesting, and I expect that sort of minor contribution is something a lot of users would be willing to make.

Is it free? Well, the script is GPL version 2, but what I'm really asking is whether it requires resources that are scarce. CPU is cheap, but memory, for example, is not so cheap on a desktop machine. It's easy to say the background process can have the CPU and the foreground process can take it back when it needs it; it's easy to nice things. On the other hand, it's harder to suddenly swap out memory when the user starts using the machine. So memory is not as cheap as CPU, and in my experience it can cause the user's processes to go a bit slower. There are also a lot of things we have to be careful about.
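The nicing and memory-capping can be sketched like this. This is a hypothetical harness wrapper, not the actual keytest code: it launches the application under test at the lowest CPU priority and with a hard address-space limit, so a runaway allocation fails inside the child instead of pushing the user's programs into swap.

```python
import os
import resource
import subprocess

def run_niced_and_capped(cmd, max_bytes=1 << 30, niceness=19):
    """Launch the application under test at lowest CPU priority, with a
    hard cap on its address space so a runaway allocation gets a memory
    error instead of eating the whole machine."""
    def limit():
        os.nice(niceness)  # drop to lowest scheduling priority
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
    return subprocess.Popen(cmd, preexec_fn=limit)
```

When the child dies from hitting the cap, the harness can notice the nonzero exit status, kill any leftovers, and restart it, which is the detect-and-restart behaviour described below.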
If we're not careful, it could really annoy the user. The standard example is: what happens if you're testing a word-processing application and a random key press triggers Print? And it prints again and again and again. That's not good. So obviously we run this as a user that doesn't have the rights to print. There are various other problems here too, like the application trying to allocate all memory, so we detect if that happens, kill the process and restart it. Also, processes in Linux distributions are often quite buggy; run with no session manager, they can leak file handles, which you don't want either, because it's going to annoy the user when they stop being able to use their desktop. So we try to avoid causing other components of the computer to fail, and that seems to be pretty successful. At the moment, the only really annoying thing is that it tends to use more memory than I would like.

Also, when I say monkey testing can produce bisection reports: the actual testing of each version is essentially free, because we have an automated script to reproduce the bug, so we can simply ask, does the bug occur on this version of the software? What's a little trickier is compiling all versions of the software. There's no standard way to say "I want to compile this particular version", because the old version of the software might also require old versions of libraries, and there tends to be no standard way of getting that old version and all the old libraries it needs. So it requires a bit of skill. I think it's probably possible to automatically detect which libraries it uses; for example, we could assume that a version of the software from 2007 probably wants libraries from about 2007 as well. And I think this is worthwhile, because users might come across a bug and learn that it doesn't occur in an old version, or maybe,
on the other hand, that it doesn't occur in a newer version. So even if we're not talking about monkey testing, I think it's quite good to give users the power to choose the exact version of the software they would like to use. Although this requires a bit of work that hasn't been done yet, it does seem worthwhile to me to let a user say "I want to compile this particular version" without requiring too much thought on their behalf.

The other question was: does it help developers? And the answer is yes. The developers, particularly in the LyX project, found the bisection reports very useful. They found it very useful to be able to say: this is the particular change that caused the problem. In particular, they found keytest's ability to continually test the very latest snapshot very useful, because it was often able to find a bug within about five days of it being introduced into the main version control, while the bug was still fresh in the developer's mind. Whereas typically, when users report against a stable version, it's, you know, years past.
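Given the automated reproduction script, the bisection itself is just a binary search over prebuilt snapshots. A minimal sketch, where `is_bad` stands in for "replay the recipe against this snapshot and see whether the bug appears" (in practice, building the snapshots is the hard part, not this search):

```python
def first_bad_version(versions, is_bad):
    """Binary-search a chronologically ordered list of prebuilt
    snapshots, assuming versions[0] is good and versions[-1] is bad.
    Returns the first version that shows the bug."""
    lo, hi = 0, len(versions) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # bug already present: first bad is at or before mid
        else:
            lo = mid  # still good: first bad is after mid
    return versions[hi]
```

With precompiled snapshots each `is_bad` check takes seconds, which is why the tool can verify the user's good/bad endpoints up front and then be trusted to run overnight.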
They don't remember why they did it; they know they did it in an odd way, and they think maybe they were trying to work around some other bug, but they can't really remember. So the developers found the recency very useful. Another thing that was somewhat useful was providing recipes for existing bugs. As I said, users often forget to put important information in, or maybe they have a bug that really annoys them but they don't know exactly how to reproduce it. So when monkey testing comes along, finds the exact same bug, and says "here's a recipe to reproduce it, here's the backtrace and all the other information you need", they found that quite useful.

So, since monkey testing can help the developers produce better software, you might wonder: does it actually, you know, make them happy inside? I discussed it with them, and the answer is no, not really. It's kind of a victim of its own success. Technically speaking, having 60 known bugs is better than having 60 unknown bugs, but suddenly having 60 bugs appear in your database can drain motivation. If you're used to a low known-bug count, where you get a bug report, fix it, and go to bed happy with a count of zero, suddenly you can't do that any more: you have 60 bugs. You have to change the way you think. You have to start realizing, for example, that just because a bug is a crash bug, it doesn't absolutely have to be fixed right away, because the only one who cares is a monkey, and a virtual monkey has no emotions, so it doesn't actually matter at all. Maybe we could limit the rate of artificial bug reports.

For people who work for a company, there's a parallel with company politics. It's generally well known that if you have a cool new way of finding bugs, it's often not a good idea to use it immediately before your performance review, because you tend to be judged on the known-bug count, not
by other measures. So it can be a bad idea, particularly if your boss has a nice pretty graph that he wants to show to his boss, showing your known-bug count converging to zero, and then all of a sudden it shoots skyward. So finding lots of bugs doesn't necessarily make the developers happy. And I certainly know the script I'm using has a lot of bugs, and I just pretend they don't exist when they're not biting me.

On the other hand, it doesn't find all bugs. There's a paper suggesting that monkey testing only finds 10 to 20 percent of bugs. And it only finds, for example, crashes and assertion failures, the hard bugs that have a very technical definition. But I think these are perhaps the most important bugs, because when you're testing software and you do something very simple, like saving, and it crashes, and a hundred other users hit the exact same problem, you just think: what's the point of reporting bugs? Everyone knows it's buggy. And I think monkey testing does provide an approximation of a user, because users tend to use the frequently used functions, and a frequently used function will usually have a hotkey, in which case it will be tested very frequently by the monkey testing software.

So what's the status of this at the moment? Well, it's still very experimental. As I said, at the moment I just pretend the bugs don't exist if they're not biting me. But I've been running it unattended in the background for quite a while. In fact, I can probably show you it running in the background.
It's running as we speak. Now, it's also quite aimed at LyX at the moment. For example, the bisection utility I use only supports Subversion, and people who are familiar with git might say, well, I don't care, because git bisect is probably better than whatever you wrote anyway. I think that's ultimately true; probably the better approach would have been to convert everything to git and then just use git bisect.

There were a few nice features of the bisector I use, though. For example, I keep snapshots, so if you want to test where a particular bug reproduces, it will quickly go: you think this version is good and that version is bad? Well, I'll just check you're right, using these precompiled snapshots, and that will be very quick. So you'll quickly know that the bisection will work, and you can just trust it to run overnight and give you the report you need. Whereas if you have to compile each time you do a test, it might be a while before you realize you made a mistake and the bisection isn't going to work. It's good to have immediate feedback. The other thing is it's designed to work backwards: git bisect, as I understand it, requires you to manually say this is a good version, that's a bad version, whereas I use a heuristic to try to guess the best good and bad starting points. So there are a few little things that could be made more general.

And the other thing is, as I said, it's good to have consistent bug reports, and one way I did that, as I'll show you later, is that when you report a bug it will automatically fill in all the fields required. But this only links into Trac, so if you use Bugzilla,
it doesn't link in. So it has lots of features specifically aimed at LyX or LyX-like projects. Also, for example, sometimes it allocates way too much memory, and for some reason that doesn't get caught by the automated kill script, so you'd probably want to use something like the cgroups feature. Has anyone come across that? No? Cgroups are a way of limiting the resources used by groups of processes, so you can say this entire group of processes is only allowed to use, say, a certain amount of memory.

And there isn't anything really resembling a GUI yet, so I just thought I'd show it to you in action. Okay, so this is running in the background. What you can see there are all the files it's using; the only one that's really interesting is the keycodes file there. That's the file where it stores all the keycodes it has tried, so if it finds a bug it can replay that file and get back the bug it found. What it generates is a list like this. For example, if we take this one at the bottom here: this backslash-a means Alt, so it means press Alt+I, A, F, Ctrl+Shift+L, Ctrl+I, Alt+V, O. And so we can take this LyX we prepared earlier and go: Shift+L... I'm not that good at typing an exact sequence of commands under pressure, but anyway, there, it produces a bug; we can reproduce it.

There's also a search function I've put in, and that's useful because it may be that someone else has already found and reported this same bug. So it goes into the Trac database and sees if it can find a similar bug. And as I said, we can have a report button, so it will fill in all the fields required.
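Expanding those compact keycode recipes into readable keystrokes can be done mechanically. This sketch assumes a made-up notation modelled on the one shown in the demo (a backslash prefix marks a modifier: `\a` Alt, `\c` Ctrl, `\s` Shift, applying to the next key); the real keytest format may well differ.

```python
# Hypothetical modifier prefixes, per the demo's "\a means Alt".
MODS = {"a": "Alt", "c": "Ctrl", "s": "Shift"}

def describe(recipe):
    r"""Expand a compact keycode recipe such as r"\aI \c\sL x" into
    human-readable keystrokes: ['Alt+I', 'Ctrl+Shift+L', 'X']."""
    steps = []
    for token in recipe.split():
        mods, i = [], 0
        while token[i] == "\\":            # collect modifier prefixes
            mods.append(MODS[token[i + 1]])
            i += 2
        steps.append("+".join(mods + [token[i:].upper()]))
    return steps
```

Something like this gets you from the raw recipe to the first draft of a "steps to reproduce" section; the plain-English meaning of each keystroke still has to be filled in by hand, as described next.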
So for example, this particular bug isn't so much a crash as an assertion, for those who know what that means, so it automatically fills in "assertion". And because keytest has a tendency to generate huge numbers of bug reports, the developers generally want a "keytest" keyword as well, so they can quickly find those reports and maybe deprioritise them or whatever. And once it's filled in, the only thing I really change is the "to reproduce" section: I give a plain-English meaning of what the keys do. So I say, well, what is this actually doing? It starts a math formula, then I navigate to this part, and do that, so it's not just a bunch of keys that don't mean anything.

So I guess the goal is to increase the reliability of software while not requiring any expensive resources. But of course, when you're finding 60 bugs, the effort required to fix those bugs is quite substantial. So what is perhaps ultimately a better approach is static analysis.
People who know C might see there's a bug in that line, and what static analysis can tell you is: yes, there's a bug on that exact line, and the way you fix it is by changing that five to a six. There's almost no thought required; it would make things a lot easier. You don't have to track things down like you do in monkey testing. So this has the potential to fix a lot of bugs without requiring the effort it would normally take to track down a bug, even with monkey testing. The downside is that formally reasoning about code is very difficult. I've basically done a PhD in formal reasoning, and the idea of trying to reason about something as complicated as C++ gives me nightmares, whereas with monkey testing, my first prototype worked fairly well after a weekend or so of work.

The open source tools tend not to deal with C++ very well, or they focus instead on enforcing some coding standard. And of course, if you're a user going up to a developer and saying "I don't like your coding standard, you should use whatever coding standard this tool enforces", they're probably not going to like you very much. But on the other hand, if you say "this tool shows that you crash here", or "this is an obviously incorrect piece of code that will cause problems or security flaws", they'll have much more open ears.

There was a project whose goals seemed ideal for this purpose: they focused on finding obvious defects while filtering out false positives, so it seemed a very useful approach to increasing the overall quality of software without putting too much extra load on the developers. The downside is that the software is closed source, and an even more fundamental problem is that I sent them a couple of emails and they didn't respond, so it's been useless to me.
I guess if I can't even use it, that settles that. So, keytest still needs work. If I try to apply it to a new project, it needs a bit of fiddling. For example, with AbiWord: keytest tries to run AbiWord in gdb, but I found that AbiWord tends to use a script to launch the binary, and gdb requires the binary itself, so you have to pop open the script and find out how to run the binary. We could potentially do that automatically, but each time you try a new project you'll probably find some extra thing that would have to be added to keytest to make it work automatically. So as it stands, you do need a little bit of thought from someone who actually knows programming to get it working on a new project.

So in some sense monkey testing was a mixed success. It was very successful in that it found over 60 bugs, but the real advantage is not the number of bugs found. It's an advantage that the developers have these bugs and can fix them, the downside being that it creates a fair bit of work for them, and it drains their motivation when they find, all of a sudden, that they can't fix all the known bugs before going to sleep. The real advantage was finding recent regressions: you can find a bug that has just been introduced, and the developers can fix it while it's fresh in their minds, and they found those bugs very easy to fix.

So I think that leaves about five minutes for questions or something. Questions? Excellent.

What sort of connection or interest have you had in the security implications of doing this? Obviously lots of people do things like fuzzing for security research. Has there been any contact with people in relation to that, or anything you've particularly identified?
Well, I'm certainly interested in security, and I've vaguely talked to people who work on improving the security of systems. In particular I think of object capabilities, where whenever you pass in a descriptor of a valuable resource, you're also passing in the right to use it. So, for example, you can imagine with a word processor or something, you go File, Open, my document, and when you click on "my document", the GUI passes the word processor the right to open that particular file. I find that interesting because it greatly limits the damage. But I think that at the moment this probably isn't that relevant to security, because GUI software usually doesn't have extra rights; it's usually running with the same rights as the user who's typing at the GUI, so the only rights you'd get from breaking the software are rights you already have.

Right, and I guess with most fuzzing it's attacking a file format, so it's assuming somebody has maliciously given them a file, whereas it's less likely that someone's maliciously sitting at their keyboard typing.

I imagine there's possibly software where maliciously bashing on the keyboard could cause problems like this? Possibly, yes: setuid-root applications, maybe, though you'd hope the GUI wasn't running as root. Who knows? Yeah, anyone else?

I did have a quick question. When you get that sequence of key commands back that generates an error, does it only test that sequence once on any given instance of an application? I'm curious, I suppose, if you're dealing with any external hardware that might be in different error states depending on what went before, or if you're dealing with parallel processing or quirky stuff like that. Does it just run through until it gets the error state and log the bug, or can you make it run again after it's found it?
Like, I've found a sequence of characters that produces an error: can you get it to run it a few times, just to see whether there's any more information about the underlying situation that caused the problem?

I don't configure the number of times it attempts; I just leave it to run. If it doesn't reproduce after a fair number of tries, I just go, well, maybe it's not reproducible, and if it's that hard to reproduce, the developers aren't going to be able to do it anyway. And in this project, you know, they really need to prioritize which bugs they fix, so the easy ones, the ones that are easy to reproduce, get prioritized.

Excellent. Everyone, thank John with me.