I would like to introduce Pablo Duboue. He's going to talk about "What if we could automatically test our textual instructions?" He's going to do his presentation, and then questions afterwards.
OK, hello, everybody. I'm using Beamer with the Singapore theme, which has this nice feature of showing you how many slides I have per section. As you can see, I have a lot of generalities and very few details, because the details are very messy and I don't want to bore you or overwhelm you. But this, for me, is like my presentation in front of Debian. I have been working for many years on things that were very secretive, so to speak. And now I'm out of a job, and I can do more interesting stuff with a wider community. Because of the type of things I was working on, I cannot really say much about them. So I go to things I like that the natural language processing community has been working on, and that I think we can profit from within Debian. This paper is from last year, from a research group at MIT, and it's a really awesome paper. Professor Barzilay was visiting the research center recently and she showed us the paper. When I saw the presentation, I thought, wow, we should be able to profit from this in Debian. The thing that got me even more interested in profiting from it in Debian is that the whole paper was built on top of Microsoft Windows. And I thought, OK, if they have this technology, we can have it too. One of the things I learned working for big private companies is that many times they develop a technology but then they don't use it, because there are different business units, and just because the technology is there doesn't mean it makes business sense. One of the beauties of the free software world is that we are much more reactive. We can adopt new things much more quickly. I think that's one of our biggest strengths, and we should definitely go for it and try to take advantage of things coming from universities as much as we can.
And this paper received the best paper award at one of the top conferences in my field. So the problem they were looking at is this: you go to the Microsoft Knowledge Base website and you have a bunch of instructions that look like this. This is an instruction on how to remove a temporary folder, msdownld.tmp. And the instructions go: click Start, point to Search, click For Files or Folders; in the Search Results dialog box, on the Tools menu, click Folder Options, blah, blah, blah. What they were interested in was transforming this into what we would consider a shell script, actually feeding this into the computer so that the computer does it for them. So what they were working on is mapping those textual instructions into actions that look like this. You have a command and you have some parameters, and the parameters refer to GUI objects. I mean, if you had come to me two years ago saying this can be done, I would have said no way, because the space is huge. You are searching in a really, really big space. And also because you are dealing with things that are very difficult to deal with, like clicking buttons and stuff like that. So, okay, yeah, here: up to some point they pulled it off. They managed to make this work. This is one of the best kept secrets of natural language processing: we always find some granularity where the numbers look good, but most users really care about the column where the whole problem gets solved, and we don't really solve it yet. But still, they are getting a mapping from instructions to actions, in the fully unsupervised case, at 65%. So 65% of the time, without having any human annotating anything, they can tell that this instruction means this action on the GUI. I'll explain later how they managed to do it. And then when you go from instructions to full sentences, of course, there is more error.
And then when you go from full instructions to full documents, the results are much less interesting. But their algorithm is such that you go consuming text and executing these actions one after another. And the most interesting part of their approach is that they do this by restarting a virtual machine over and over again (they don't even use a free software virtual machine, they use VMware), just sampling the possible actions you can do and doing reinforcement learning over that. That's also one of the things I hope this talk can help with: to say, wow, such a radical idea, and for these guys it actually works, so it may work for other things. I mean, the idea of exploring a space with virtual machines, where you keep restarting them and seeing what happens, is pretty radical. Okay, so their training data is these text instructions plus some way to tell that you are doing the right thing. The way they do it in this case is to say: the instructions refer to text, like "show hidden files", things that you should be seeing in the GUI at that time while you are doing the pointing and clicking. So the reward function is: as you go through these instructions, you make sure that you are seeing on the screen the text that you are supposed to see. And if you don't, then you say "bad algorithm, bad" and you discount the probability of that thing being correct. The output is that type of script. And as I was mentioning, the method is to run this in a virtual machine, execute at random at first, and start forming a model that maps text to actions. The process is quite complicated because the instructions can be out of order. You can see plenty of text like "select Run after clicking Start". So the ordering is swapped: this is the first action you execute and this is the second action. But in language we like to spice things up and make nice sentences.
The other thing is that there are multiple ways to mention the same thing: they count at least seven different ways to say "click on the Start menu". You have aggregated phrases, phrases that refer to multiple actions at once. And you have high-level instructions: in some instructions it will be explained that "start the web browser" means clicking on an icon on the desktop, while in other cases they assume the users know what that means and they will just say "start the web browser". But this is not a talk about what they do. This is DebConf, and the question is how we can use this type of technology and why we should care. Well, the main point is that users want instructions. Instructions give them, I don't know, a warm cozy feeling, and they would rather have instructions and then modify them or follow them through; they just don't want to execute scripts blindly. So maintainers from time to time write instructions, or there are forums online with instructions, or you have instructions in the installer scripts. But when the system changes, the instructions become stale. So we have places in the system where there are instructions that refer to commands that have been renamed, to things that have moved, to things that don't exist. And Debian has a very strong tradition of automating this type of process as much as possible and of testing to know that things are doing okay. Like you have lintian, piuparts, et cetera. So this type of technology can enable us to continuously test the instructions related to packages. Here is an example that comes from the team I'm part of, Debian Java. There is this program called Hadoop, and the other day I was following the instructions on the man page about how to modify an installation. Well, the first thing is that it says it's managed by alternatives, but alternatives is now called update-alternatives.
And then it says to show hadoop, but now it's actually called hadoop-conf. And then it says that after you modify all these things you restart hadoop-namenode, but now it's called hadoop-namenoded. These are all silly problems that I'll hopefully fix directly in the man page soon, but the point is that the maintainer never realized these problems were in the man page, because Thomas wrote the man page a long time ago and nobody really tried to follow these instructions since then. So I've been thinking about how to use this. These are definitely not good ideas. That's why I'm giving this talk: hopefully you guys can contribute better ideas. One of the possibilities would be to have, for example, a target in debian/rules that lets you register these instructions. So you say there are these instructions within this package that achieve these certain things. At the beginning the system trains itself and says, okay, now I know how to execute this. Then, when a new upload comes, it says, okay, these instructions no longer work, and it notifies the maintainer. We were discussing this last night with Mark and Max, and the reaction was that maintainers will never write this. Well, yes, most probably we need something better. The other option that I thought would be pretty interesting is to add special markup to discussion forums, because it's very, very common to find a forum thread that says "how to do this" and you don't realize that the message is from 2007 and the instructions are actually for a very, very old version of Debian. So if you allow people to add tags in the forum to say, okay, here is the problem and here are the instructions, then we can automatically mark it up in the forum to say, well, this actually doesn't run anymore on Debian, this code is old. That at least will help our users a little bit.
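The simplest form of the staleness check described above (commands that were renamed or no longer exist) can be sketched without any learning at all. The following is a minimal illustration, not the approach from the paper: it assumes a hypothetical convention where commands inside instructions are wrapped in backticks, and it only checks that the first word of each marked command can still be found. The function names and the sample text are invented for this example.

```python
import re
import shutil


def extract_commands(instructions):
    """Return the program name of every backtick-quoted command.

    The backtick markup is an assumption made for this sketch; real man
    pages and forum posts would need their own extraction step.
    """
    spans = re.findall(r"`([^`]+)`", instructions)
    return [span.split()[0] for span in spans if span.split()]


def find_stale_commands(instructions, is_available=None):
    """List the commands mentioned in the text that are not available.

    By default availability means "found on PATH" (shutil.which); a custom
    predicate can be passed in, for example one backed by a package index.
    """
    if is_available is None:
        is_available = lambda name: shutil.which(name) is not None
    return [cmd for cmd in extract_commands(instructions) if not is_available(cmd)]


# Invented sample text, loosely modelled on the renamed-command problem.
sample = (
    "Select a configuration with `alternatives --config hadoop-conf`, "
    "or inspect it with `update-alternatives --display hadoop-conf`."
)
installed = {"update-alternatives"}  # pretend system state for the example
print(find_stale_commands(sample, is_available=installed.__contains__))
```

A check like this could run on every upload and notify the maintainer when the list is non-empty. It says nothing about whether the English around the commands is still correct, which is the part that needs the heavier machinery.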
Then there are many more. I mean, I went around and tried to look for other things. On forums.debian.net there is this "Docs, Howtos, Tips & Tricks" section that has 600-plus topics. I'm really interested in the Debian installer, because when you are installing, this is a moment where you will read something and follow instructions, because you have no system yet to execute many scripts for you. The other is the instructions in debconf templates. One of the interesting things about this is that if the system can really be trained and works well, you can try to address multilinguality, in the sense that you take the model from text to actions for two languages, and then you can produce the instructions in the second language when the first one changes. So you can keep them in sync. And the idea is that this should produce better quality than just machine translation. So I feel at home here, because I spent five years of my life in this building. You can still see a photo of me on the NLP group page on the seventh floor. I graduated in 2005. I have been working for Big Blue since then. We were putting together this computer to play a trivia show; that's pretty exciting and there should be more news about it soon. And we're living in really exciting times right now for natural language processing, because thanks to increased computational power and advances in machine learning, some tasks in NLP are starting to work. I mean, how many people here, for example, have been able to read something using Google Translate in a language that you couldn't understand? That wouldn't have happened 10 years ago. The machine translation technology was just producing something that wasn't readable. I'm very passionate about free software and Debian, and I'm really looking into getting more natural language processing technology into Debian, particularly at the tooling level. And I'll have plenty of time in the upcoming months.
So I'm hoping to contribute to Debian at a more technical level. Okay, so I want to talk about how these guys did it, how they managed to get this to work. Not for the sake of just talking about it, but because this may help other people try to do the same thing with this virtual machine approach. That's the part that I'm hoping is going to be the takeaway. The details of the natural language processing are, well, interesting for the most part to people who do natural language processing. So the idea of reinforcement learning is that you are exploring a space where you are taking actions, and you have a teacher that comes and says, well, here you made a mistake. So it gives you a reward, which can be positive or negative. And what the learner is doing is trying to build a model that chooses which action, given the current situation and the environment, has the highest payoff. So what you normally do in such a system is sample actions at a given point and see how the environment reacts. It's very common, for example, in robotics to use this type of approach. When we are mapping instructions into actions, the instructions guide the sampling of actions and are used to compute the rewards. And the sequence of actions that maximizes the reward is the one expected to correspond to the instructions. So the document is a bunch of sentences and you are trying to map it into a vector of actions. The actions are predicted and executed one after another, and the system consumes a full sentence before moving to the next one. An action has, as we saw in the first figure, a command (like "click", for example), the parameters that the command takes, and the words that were consumed to instantiate that action. And the environment contains the objects and their properties. So when the environment changes with a command, you have a transition probability between the previous environment and the new one.
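To make the pieces just described concrete (an action made of a command, its parameters and the words it consumed, and an environment that changes as actions run), here is a small sketch. It is an illustration under simplifying assumptions, not the authors' code: the environment is reduced to the set of visible labels, the transition is a deterministic lookup table instead of a probability, and all names and labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    command: str       # e.g. "click" or "type-into"
    parameters: tuple  # target GUI object label, plus any text to type
    words: tuple       # words of the sentence consumed by this action


def step(visible, action, transitions):
    """Toy transition: look up which labels are visible after the action.

    Actions with no entry in the table leave the environment unchanged.
    """
    key = (visible, action.command, action.parameters[0])
    return transitions.get(key, visible)


# Hypothetical environments, each one a frozenset of visible labels.
desktop = frozenset({"start", "run"})
run_dialog = frozenset({"open", "ok", "cancel"})
transitions = {(desktop, "click", "run"): run_dialog}

# One sentence mapped to a sequence of actions, executed in order.
plan = [
    Action("click", ("run",), ("click", "run")),
    Action("type-into", ("open", "sql-msc"), ("type", "sql-msc", "open", "box")),
]

env = desktop
for action in plan:
    env = step(env, action, transitions)
print(sorted(env))
```

In the real setting the table is replaced by an actual virtual machine, and the learner has to discover which action to take from the words of the sentence instead of being handed the plan.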
So again here, the first one is a full sentence, then this is the full vector of actions as it's being executed, and these words are already consumed. On the left side you can see the environment. On the right side you can see how the environment changes as these actions get executed. So the first action was a click on the Run menu, and then the Run menu gets clicked and you have the Run dialog there. Then it says type in sql-msc in the Open box. So you have the box that has the label "Open" and the system types sql-msc there, and then you press OK and conclude the sentence. The reward function they use, as I was saying, is to check which words are still visible after you execute these actions. If nothing is visible, they consider that they have hit a complete dead end: they clicked outside a menu and they just get the empty desktop. That's a real dead end and you get minus one. Otherwise, depending on the number of words you see there, you get a weighted reward between zero and one. The main point is that if you didn't have this, you would need a human sitting there watching two million runs of a virtual machine and saying this good, this bad, this good, this bad. So this is the key that makes their technique work. And the same goes for us: if you are trying to find out the best parameters for the Debian installer, for example, you will need to be able to do that without any human intervention. The environment for them is all the UI objects on the screen and their properties, such as label, location, and parent window. And they have commands like click, type into, and the like. In total, each environment encompasses 4,000 features, which is quite a big number of features for a problem like this. One of the interesting things is that they train all this on only 128 documents, which is very small for the magnitude of the problem. And each document has around 10 actions on average.
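The reward just described can be written down in a few lines. This is a toy version based only on the description in the talk, not the exact formula from the paper: minus one when none of the words still to be followed is visible, otherwise the fraction of those words that is visible. The function name and the choice to return zero when no words are left are assumptions made for the sketch.

```python
def reward(remaining_words, visible_labels):
    """Score an environment by how many instruction words are on screen.

    Returns -1.0 for a dead end (no instruction word visible), otherwise
    the fraction of instruction words that are visible, between 0 and 1.
    """
    words = {w.lower() for w in remaining_words}
    if not words:
        return 0.0  # assumption: nothing left to check, neutral reward
    seen = words & {label.lower() for label in visible_labels}
    if not seen:
        return -1.0
    return len(seen) / len(words)


print(reward(["open", "ok"], {"Open", "OK", "Cancel"}))
print(reward(["open", "ok"], {"Open", "Cancel"}))
print(reward(["open", "ok"], set()))
```

The important property is that the score is computed from the screen contents alone, so millions of sampled runs can be judged without a person watching them. A command-line equivalent for Debian would need a different observable signal, such as exit statuses or the resulting files.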
So just to get an idea of the magnitude of this thing, imagine you go to Windows and you click on 10 random things. You can end up absolutely anywhere on the system. And with only 128 documents they managed to train this. The algorithm they use is gradient descent, so it's an algorithm that can get stuck in a local minimum. But the idea is that they sample the different actions and then compute the reward. From the reward they compute a gradient that they use to modify the function that samples new actions. So each time you say, well, I did this action and it was kind of bad, so next time I won't do it. But if it was good, then you keep repeating it when the circumstances are the same, so you need to take the environment into account. Anna has a differing view. So, again, the results: they also combine this with some supervision and with full supervision. The authors had to go and mark up these actions, 120 documents, by hand. And the interesting bit is that there is not so much difference between these settings. With full supervision you still don't gain much. So, initially I was really hoping to get some example of this working within Debian, because the authors make the code available on their website. You can get all the Python code for this. I haven't been able to find which license they are putting it under, but the code is there. And the main problem I'm having is that in the Debian world we don't write instructions like this. I guess our users are a little smarter. In general, our instructions are like this, a mixture of words and actual shell scripts. And that's already a very different problem from the one they have. And the state of our system is much more than just the things you see around the Start menu. So profiting from this new technology is not going to be as easy as just adapting it.
Maybe more so for the GUI side, for GNOME, but I looked around many forums, even Ubuntu forums, and seriously, nobody writes something like this around here. And I guess, well, these are professionally written instructions by people who are expected to document things in great detail. So, okay, now where to go from here? You can throw some tomatoes at me, but I would rather have you guys save them for lunch. You can forget NLP and use reinforcement learning for something else. We can talk about where we can tinker with this within Debian. Maybe you have some interesting set of instructions that are more constrained and we can give it a try. And I'm going to spend a little more of the talk on other things with NLP that I have been looking into for Debian, so that now that I'm done with talks we can spend some time in the hack labs and play a little bit. The first thing I have been looking into: in Debian Java we have a mailing list that has huge traffic. It's very, very difficult to keep up to date with the number of emails it gets. So the idea is to try to supplement the mailing list with a page of summaries and information that is relevant to the reader. Like, you are part of the maintenance team, but you really maintain five packages, so you only want to see the messages related to those packages plus what the discussions are on the other topics. Another thing in a similar vein is to try to make summaries for debian-devel, in the same style as the now defunct Kernel Traffic summaries. How many people here have ever heard of the Kernel Traffic summaries? I mean, they were really good. And they are really good training material. Of course, a computer will never produce something so exciting, but still. So the idea is to try to use that to complement the submitter-driven news services that we have in the project, like Debian News.
So if you have anything related to these and you would like to work together on them, or you have another team with a very high traffic mailing list that may profit from this, just come see me and we can talk. The other thing that, after being in team meetings for so many months, I think could be very interesting to automate: I don't know how many people here have used MeetBot. Okay, so MeetBot is a bot that stays on the IRC channel, and you can conduct a meeting with it. You issue commands to it by just saying things like #info, and then the bot will catch it and say, okay, here's some info to add to the summary of the meeting. It was written by Richard Darst in pure Python, and we use it heavily in the DebConf team, and many other groups also use it, even outside Debian. It has a number of commands. The idea is that while you are conducting the meeting you can say this thing was agreed upon, this person is going to take this action, and so on. And you can include further information in the summary by issuing this general #info command. But many times the participants forget to issue the #info commands because they are too caught up in the excitement. What I'm interested in is trying to have something like automatically noted items, which work like automatic #info messages. This should be fairly straightforward. It might need more work to get good quality, but I'm hopeful I can get hold of Richard. Maybe we can work on this in the hack lab. Okay, in a more general vein: NLP can work, but it requires many, many eyes. The whole idea of free software, which allows for many people looking at the problem and all that, is key for natural language processing, because there are plenty of errors in the current systems. Systems tend to be very heuristic in nature. So it's not that we have unveiled some underlying truth about how the human mind processes language.
No, we have a bunch of ifs and some statistical models and stuff. But perfection is possible, it's just painful. So if we have a task that we care about, we can work together and make it work. That's one of the reasons why I'm more interested in contributing to free software than to pure scientific things: because I believe we can get to 90% performance from the 60s you get in research, just by working together and solving problems that we care about. So, I'm also trying to start working more on the maintenance of NLP tools within Debian. I started by using the NLP tools within Debian, and that's fairly helpful for finding bugs and so on. Also, I'm interested in getting more free software contributors interested in natural language processing. So I'm considering offering some sort of online course for free software contributors in the coming months. If you are interested, ping me afterwards. So, to conclude: some cool new technology is becoming available, and in Debian we can take advantage of it if it's useful for us. And I mean, when I saw this talk, I was really impressed by the fact that they could make this virtual machine stuff work. So if you want to take something away from this talk, it's the fact that this type of technology actually works for somebody. And I hope we can add some natural language processing goodies to the Debian tooling. So, let's keep in touch and Dr. Davin of TC and Pablo.db.gmbl. Thank you.
Hi, I'm Ashish. I've done some NLP work and I think this is awesome. So, the Microsoft example you gave, the Microsoft instructions plus the research paper: they had 128 documents in their corpus. So basically they had 128 walkthroughs of "here's how you do something". You pointed out that there are no forums that have such a clear, well-structured style of instructions, but I think the style of instructions that the Microsoft page had is pretty good.
Here's basically what you see, here's basically where you click. So, would it be interesting, could you maybe name a part of Debian where the following could work? Step one, a bunch of people come up with between 50 and 128 documents that have instructions. Step two, we set up these VMs, we apply this model, and we see if we can induce the actions from them. If so, that would be a success that we can try to generalize. But we would probably want to pick the area. I mean, do you think we should do that?
Yeah, 128 really boring sets of instructions, it will take some effort to convince people to write them, but I'm sure we have some place full of instructions waiting to be tapped. It's just a matter of finding it. And I spent quite a bit of time looking around for them, and the instructions I found were not of that kind. So either we need to generalize to these mixed shell script type of instructions, or look into a wider range, more like the GNOME forums and stuff like that.
Hi, I'm Daniel, DKG. One observation I had here is that each of your documents appears to be on average 55 words in length. That's actually quite short. I think it might even be shorter than the example that you showed earlier. So generating 100 of those might not be as complicated as generating 128 really long sets of steps. Also, one of the main reasons why I like the text-driven command line interface is the lack of ambiguity. That's one of the reasons I don't contribute textual instructions about how to manipulate a GUI, right? I don't write those instructions because they don't actually work, and put aside all the machine learning stuff: human to human, those communications don't work. The strings get localized differently. Somebody goes, oh, I've got a great idea for how to change the UI. And they decay much faster than the sort of stable-ish API that we have with a textual instruction set.
So, I mean, that's just to give a rationale for why people like me haven't contributed. That's where I'm coming from.
Yeah, well, along those lines, you could imagine a tutorial on setting up mod_rewrite for WordPress that was half English, half command line. And there would still be some ambiguity. It would say: open a terminal, type the following in, cd /etc/apache2/sites-available, whatever, open up an editor, edit this file to have these contents. So even in the command line world there is some sort of English and command hybrid. And I guess I'm talking away from the camera. But it would be pretty cool, even for these more explicit instructions that you're describing, to be able to automatically test whether they still work. And that's where I think the real value of this is. So I still want to do this.
Yeah, I see what you mean. I mean, if we can solve this hybrid bit, then for many of these instructions you just run them and see whether they still work. The problems I had with the instructions for Hadoop were not in the English. The English was fairly good. The problem was in the actual, well, one was an English problem, but the other ones were actual commands. But you need to execute everything, so you need to give semantics to the English chunks. Then you can also go through the shell scripts and say, well, this failed. The main thing here is that you don't get the reward they have. You cannot just look at the GUI and see how things are doing. So it will need much more smarts. The good thing is that I know the author, I mean, we were students here at Columbia, and I mentioned to her that this could be interesting for Debian. So we can push them into using our instructions, and maybe something will come out of that.
Hi, my name is Par. So we could use this for testing our instructions.
Yeah, so the idea...
So write test cases for our instructions, basically.
And if they fail, we have to update our instructions.
Yeah, so the idea is that when you upload the first version of the package, the system will assume that the instruction is indeed correct. Then it will keep checking that the instruction still executes when you upload new versions of the package. And when it doesn't work anymore, it will notify the maintainer and say, well, this might be a bug.
With new versions of other packages as well, or just new versions of that package?
Well, you have to run these in a place with all the dependencies. So it will be new versions of other packages too.
No, I mean, do you test in between versions of the package?
You can try to do that too. I mean, when you upload a new package, you can also test the instructions of dependent packages. But the problem is that one instruction is in a man page, another is in a README, another is somewhere else. It's going to require quite a bit of effort on the part of the maintainer to say, well, this chunk here, this chunk there.
So, fair.
Hi, a couple of sources that you could use. One is the Debian installer instructions, which are aimed at a fairly naive audience. Another would be searching the web for blog postings and wikis that start with phrases like "I haven't packaged it yet, but you can get it installed like this".
I looked at the Debian installer in detail. The instructions are full of branching points. So in reality, the instruction is like a union of 200 instructions. So, yeah, if we could explode it into different branches and so on, then maybe this thing can work. But the way it is right now, it's difficult to understand even for non-native English speakers with a lot of English knowledge. So, okay, thank you very much. See you at lunch, and don't hesitate: if you want to talk about this stuff, I'll be very happy to talk with you guys.