OK. Welcome to this talk. The speaker is Alessandro Amici, a good developer, I think. He's a friend. He will explain some stuff about pytest, so welcome.

So this talk is about test-driven code search. This is a rather new technique. Not entirely new, because someone already tried it a few years ago with Java, but it's the first time that I see it applied to Python. The idea is pretty simple. What we produced is a very basic search engine. It's pytest-nodev, a pytest plugin that lets you search for code on your own machine, in the packages that you have installed locally. The special thing about test-driven search is that you use a test as part of the search query. You may use some metadata to refine your search, but at the core, what you are looking for is what you describe within a test. We call it a specification test. It is something that tries to specify a behavior or a feature without going too much into the details of how it is implemented. And once you run your search engine, you get some search results: a list of functions or classes, or whatever object, actually, that pass the specification test. The core of the tool is the pytest-nodev plugin, and there you have the main documentation, but there are a couple of other tools that I will show during the talk.

Now, since this is something new, at the beginning I organized this talk to be somewhat theoretical, but then I completely rewrote it yesterday, because I think really good examples make people understand much faster. So how does it work? Do people here know pytest and pytest fixtures? Okay. The basic implementation detail is that the plugin provides a special fixture called candidate, and you need to use this fixture when you write a test that you want to use to search for code.
What will happen is that the fixture effectively parametrizes your test by passing it all the objects that it manages to find in your environment. So if you install 10 packages in your virtual environment together with pytest, it will collect all the live objects in the standard library and in all the packages that you installed. Then, obviously, since this is a parametrized test, the test will most probably be run thousands of times, once for every object, and a reference to the object will be passed in the candidate variable. So you basically use this candidate as if it were the function that you are looking for, and then the search engine just tells you which functions, classes, or objects in general actually appear to behave as you intended.

So let's do our first search. Let's search for a function that, given the name of an executable, returns the path to it. This is not just a nice example. This is actually the first real case that we had. We had exactly this need, and we started searching for it on the web. We didn't like the results, and we said, okay, this is the perfect test case, because it's something easy. It's easy to write a test for it. And maybe there is something somewhere in my environment already that does it. Obviously, you could just write something like a subprocess call to which and then parse the result, et cetera, but that would be hacky, and it would not work on Windows, so it's not the best.

So what is the specification test? I write a standard test function for pytest. I use the candidate fixture. Then, just to make the test more readable, I rename the candidate to which, which is more or less the idea: I'm looking for something, maybe in the standard library, that works like the which command. So then I assert the behavior that I expect.
If I ask for sh, the shell, the function I'm looking for should return /bin/sh, and if I pass it the string env, it should return /usr/bin/env. These are two very common UNIX commands, and their locations are among the most stable, because a lot of commands can be in /usr/bin, or in /bin, or in /sbin, or somewhere else. But these two are the most common. So once I have written this test, I save it to a file, and then I just run it with pytest as usual, except that I need to add --candidates-from-all. This means that the candidate fixture will be parametrized with everything it finds in my environment. So this starts a standard test session, and I usually get something like 5,000 or 6,000 objects. This depends very much on how many packages you have installed. This is not many. It's easy to go into the 30,000 or 50,000. And then it just runs for a while. We will see in a minute. Since the test is expected to fail, pytest prints a small x when the test fails. I mean, you are basically throwing random functions at the test, so you expect it to fail most of the time. But then you have a capital X, which means that the test passed. It was not expected, but it passed. At the end of the run, you have many, many small x's. What you expect is to have a result, and in this case we found three functions, three objects that passed the test. And this is the report. So for my test_which file, we found a case in which the test_which function passed, and this is the executable.

Now, I have the test function as well, so let's see how it works and how much time it takes. Right now, I'm not using pure pytest. I'm using a kind of boxed run of pytest inside a Docker container, because when you throw random arguments at random functions, anything can happen. If you try to do it on your machine, you will find backup files with crazy names, or probably connections to wrong hosts, or whatever.
So you prefer to do it in Docker, and at the end of the run, you throw away your Docker environment. So what happens is that right now it's collecting all the objects. Now, I have a few fewer objects than when I prepared the test, because I blacklist objects all the time, because they might crash your environment or, I don't know, open up a browser, et cetera. And this is what happens. Now the test is running with all the functions. We see small x's, which means that we didn't find any match, but here we have one capital X. So we found at least one function that actually worked. This takes approximately 60 seconds, and everything goes okay. Now we should also have some garbage on the screen, because you are using functions and classes in unexpected ways, you are always throwing random stuff at them. Exactly. You end up discovering a lot of bugs in the packages that you have, because most of the printout is exceptions in __del__ methods, which are ignored but printed to standard error. Well, it finished.

Now, what happens once you get the results? You say, okay, I have written a very easy, very basic test, and what do I do? Well, since I have a manageable number of results, I can just have a look at them and decide if this is really what I want. This is, sorry, distutils.spawn.find_executable. The name looks like what we are interested in, and this is inside the standard library. So it's very useful. Maybe I don't need to write any code for my find executable, for my which function, because I may just use this one. You see, that's more or less what I thought. It gets the path, then it splits it in an OS-independent way, then it does some Win32 checks that I didn't even think I needed, because I don't usually use Windows, but yes, they might be useful. And then it just tries to see if the file exists inside the path. It's not really the best. I mean, it doesn't check if the file is an executable.
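To make the gap concrete, here is a small sketch of a path lookup that only checks existence next to one that also checks execute permission. This is not the distutils code: both functions are simplified, hypothetical lookups, and the example assumes a POSIX system where file modes control executability.

```python
import os
import stat
import tempfile


def find_file_on_path(name, path):
    """Return the first file called name in the path directories, executable or not."""
    for directory in path.split(os.pathsep):
        full = os.path.join(directory, name)
        if os.path.isfile(full):
            return full
    return None


def find_executable_on_path(name, path):
    """Same lookup, but only accept files the current user may execute."""
    for directory in path.split(os.pathsep):
        full = os.path.join(directory, name)
        if os.path.isfile(full) and os.access(full, os.X_OK):
            return full
    return None


with tempfile.TemporaryDirectory() as tmp:
    tool = os.path.join(tmp, 'mytool')
    with open(tool, 'w') as f:
        f.write('#!/bin/sh\n')
    # Readable and writable by the owner, but with no execute bit set.
    os.chmod(tool, stat.S_IRUSR | stat.S_IWUSR)

    found_any = find_file_on_path('mytool', tmp)
    found_exec = find_executable_on_path('mytool', tmp)
    print(found_any == tool, found_exec)
```

A specification test that only compares returned paths cannot tell these two apart, which is why reading the code of each hit still matters.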
So it's not really perfect, but at least I have a template if I want to improve on that. Then I have pexpect, with its which. I don't care too much about that, because I already have a function in the standard library, so I don't need to add a dependency to my project. But then I have shutil.which. Whoa, this is even more standard in the standard library, and this is the code. If you look at the code, it's much, much more complex, and it has a real access check, which means it checks that you can read the file and that you can execute it. Plus several details that I would not have thought of, that would have taken me one year of production to get right. So very nice. Unfortunately, if you go into the documentation, you learn that this is Python 3 only, actually Python 3.3 and later. So if your use case needs to work in Python 2 and 3, you get find_executable, which is still in the standard library. It's not as nice, but okay, or maybe you can just take it as a template and make it better. Or if you are Python 3 only, you have the luxury of using which, which is great. Well, how many of you already knew the which function, or how to solve this problem? Okay, a few, right? I mean, it's in the standard library, but I didn't know it, and it was faster this way than to look for it.

Okay, let's go back. Now, this is a very simple example, but it also shows how things work. One of the points is that in this case the input and output of the function were really easy. When the reasonable implementation is really easy, it's easy to write a test, but as soon as you look for more complex stuff, writing a test that is somehow implementation agnostic, that doesn't make too many assumptions about the implementation, is complicated.
It's more complicated, but actually Python is really great for writing stuff that is not too tied to the details of the implementation, because of its dynamic nature. For example, using duck typing, you are not forced to guess the right data type. The in operator is extremely powerful, and a lot of classes work nicely with the in operator. That is, instead of checking that the result of your function is a list, and that the first element of the list is what you were looking for, you just use the in operator to see if somewhere inside your result there is what you expected. And then you may write specific helpers. In particular, we wrote nodev.specs, which leverages the inspect module to go even deeper in searching whether your result actually contains what you expected it to contain, even if in crazy ways.

So let's see how you would write a specification test in a way that tries to be more independent from the implementation. Here, I want to parse an RFC 3986 URI. This is also a real case. So I use the candidate fixture. I just rename it so it reads nicer. I use a test URI. All the functions that I get will be passed this URI, and I expect them to return some kind of tokens. And then here I check that the scheme and the path that I put in my URI are correctly parsed. Now, since there are a lot of false positives that are just strings, I mean, functions that just return the same string as the input, I check that the return value of my function is not a string. I really want the string to be divided into tokens. So I don't want one string. I want some kind of list of strings. So let's see how it goes.
This is the naive implementation, in the sense that I didn't use any special trick except Python's standard in operator, which can be overloaded, et cetera. Now, this is going to run. Come on. Usually, I have different command-line options that can be passed, and those are mostly needed to restrict the search space. If you already know that some packages are not useful, you want to restrict the search space so it gets faster. But this one, --candidates-from-all, is the most powerful. It just searches everything and anything in your environment. So this is where, obviously, I tested it just before the talk, I don't know. Let's see. I have a second run. Now, let's see. It takes a little bit of time to run.

I also tried a second example, the same parsing function, where the test is instead written using some more advanced functionality. This is a container. It gets an object and it makes a proxy object that, when you use the in operator, tries really hard to see if the item that you're looking for is somewhere in the object. So for example, it looks into the attributes, into the properties. And if it's an iterable, it looks inside every item of the iterable. So it's extremely thorough. Let's see if we manage to not kill the CPU. So, okay, apparently they're both running. On this screen, I have the naive test, the one that crashed before. It was some kind of race condition, because it's going okay now.

And now let's see what the results are. Okay. I got several results. Now, the first three results, in collections, don't look very good, because KeysView, ChainMap, and UserString really look like false positives. That is, they're not trying to do anything with RFC or URL parsing. They are just somehow packaging the thing that you are giving them. But then you have this rfc3986.api.uri_reference, which looks very nice, but also urlparse.
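The proxy object described above can be approximated in a few lines. This is a rough sketch of the idea, not the helper's actual implementation: `DeepContainer`, its `max_depth` limit, and the `ParsedURI` class are hypothetical names, and the sketch only looks at instance attributes and iterable items, not at properties.

```python
class DeepContainer:
    """Wrap an object so that `item in DeepContainer(obj)` searches inside it."""

    def __init__(self, obj, max_depth=3):
        self.obj = obj
        self.max_depth = max_depth

    def __contains__(self, item):
        return self._search(self.obj, item, self.max_depth)

    def _search(self, obj, item, depth):
        try:
            if obj == item:
                return True
        except Exception:
            pass  # some objects raise on comparison; treat that as no match
        if depth == 0 or isinstance(obj, (str, bytes)):
            return False
        children = []
        if isinstance(obj, dict):
            children.extend(obj.values())
        else:
            try:
                children.extend(obj)  # items of any finite iterable
            except Exception:
                pass  # not iterable, or iteration failed
        # Instance attributes, for plain classes that do not support `in`.
        children.extend(getattr(obj, '__dict__', {}).values())
        return any(self._search(child, item, depth - 1) for child in children)


class ParsedURI:
    """Hypothetical result class with neither __contains__ nor __iter__."""

    def __init__(self, scheme, path):
        self.scheme = scheme
        self.path = path


result = ParsedURI('http', '/some/path')
print('http' in DeepContainer(result))
print('ftp' in DeepContainer(result))
```

In a specification test, the assertions would then read `assert 'http' in DeepContainer(tokens)`, which also accepts results that a plain `in` check would reject with a `TypeError`.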
That means you hit functions that are able to do this both in a package and in the standard library. Now, what is interesting is that in both cases, both the urlparse inside the rfc3986 package and the one in the standard library, they don't return lists. They return classes. So how exactly did this work with a class? The point is that a lot of people are quite smart, and they give you some way to access stuff or to test stuff in an implementation-independent way. That is, the two implementations actually provide a __contains__ method, so that they test exactly like a, sorry, a tuple or a list. So it's not a simple type, but it's a class that behaves like one. Very nice. So you can happily use this one for most of your needs, but if you need more features, you may explore the code, and you see this special package has more features. For example, it's able to recognize the username, which the standard library function doesn't.

Now there's something even more interesting. The other test, the one that uses a dedicated proxy object to do the containment test, has found one more object that matches. This is a class in the pip project that actually does the right thing, but doesn't provide the nice containment helper functionality. We managed to get it as well because the helper tried very hard to find whether the scheme and the path were inside the class.

So this is a way to test results in an implementation-independent way, but then I also want to pass arguments in an implementation-independent way. In this case, what helps me is the parametrize marker of pytest. For example, here I'm looking for a function that just removes comments from a stream. And the main point is how do I represent the stream?
Because this is my text, this is the content read from my configuration file, for example, and I want to strip these comments here. So how do I do it? I use a parametrized argument so that I can say, okay, I have different functions that will turn this text into different shapes. I can pass it as it is. I can pass it as a list of individual lines. Or I can pass it as a list of individual lines with line numbers. This is how my application was actually doing this part. Or I can pass it as a file. Now, in this case, since I have a lot of parameters, this will run not just 5,000 times but 20,000 times. So I prefer to restrict my search by including only functions whose name matches this regular expression. I want something that has to do with comments. This makes everything much, much faster.

And here it is. I find an ignore_comments function in pip. Very good, because pip is something that I can assume is a light dependency. And this tells me that the text-to-stream function that passes is the third one. So I go back here. It's 0, 1, 2. So this is the shape that was exactly the one I preferred. I could have worked with any of the other three, but this means that I don't even need to change my application to use that function. And, by the way, extremely fast. This is ignore_comments. It's very simple, and it also has the feature that it skips the line if it's empty. And this is the reason it also returns the line number, because it doesn't return all the lines that you pass. And look at the other function just below. This function takes options, which is a special class. And this class must have the skip requirements attribute, otherwise it crashes. Oh, God. Even if I needed this, I would never, ever manage to pass the correct parameters to it, because these parameters are extremely tied to the implementation. So I write tests that are quite loosely coupled with the implementation.
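The input-shape trick can be sketched as follows. In the talk this is done with pytest's parametrize marker; the sketch uses a plain loop so it runs without pytest or the plugin. The sample text, the `repr`-based check, and `fake_ignore_comments` (a stand-in written for this example, not pip's code) are all assumptions.

```python
import io

TEXT = 'first = 1\n# a comment\nsecond = 2\n'

# Four ways to hand the same text to a function of unknown signature.
TEXT_TO_STREAM = [
    lambda text: text,                                   # 0: one string
    lambda text: text.splitlines(),                      # 1: list of lines
    lambda text: list(enumerate(text.splitlines(), 1)),  # 2: numbered lines
    lambda text: io.StringIO(text),                      # 3: file-like object
]


def spec_skip_comments(candidate, text_to_stream):
    skip_comments = candidate
    result = list(skip_comments(text_to_stream(TEXT)))
    # Crude shape-agnostic check on the printed form of the result.
    flat = repr(result)
    assert 'first = 1' in flat
    assert 'second = 2' in flat
    assert 'a comment' not in flat


def fake_ignore_comments(numbered_lines):
    """Hypothetical candidate: yield (number, line), skipping comments and blanks."""
    for number, line in numbered_lines:
        line = line.strip()
        if line and not line.startswith('#'):
            yield number, line


def passing_shapes(candidate):
    """Indexes of the input shapes for which the candidate passes the spec."""
    passed = []
    for index, text_to_stream in enumerate(TEXT_TO_STREAM):
        try:
            spec_skip_comments(candidate, text_to_stream)
        except Exception:
            continue
        passed.append(index)
    return passed


print(passing_shapes(fake_ignore_comments))
```

The index of the passing shape is what tells you how the function you found expects its input, without reading its code first.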
I try to be as implementation-independent as possible, but then I only find functions, callables, or classes that are good code, that don't mix implementation details into their interface. You could just drop this skip requirements attribute and have a keyword argument with a sane default. The function would be just as useful, and I would have been able to search for it, or to use it.

So, when you search, you may either get only relevant results, which means your query is just perfect, or you have to refine your query. If you don't get any results at all, which happens quite often, it means that your test is too strict, and you probably need to remove test cases, edge cases, or just use a lower number of normal cases. If you find a lot of results but they are not relevant, it means that your test is too weak. It's not strict enough, so you need to add more cases, describe your feature better, and probably add more corner cases. If you appear to go from no results at all to no relevant results and back, you are most probably looking for a function that is not in your environment.

Now, this is the basis of test-driven reuse, which is something that has been studied a little bit in the Java community. The idea of test-driven reuse is that you start like in test-driven development. You write your test, maybe trying to write it in a more independent way than you would if you already knew the implementation, and then you search to see if you find a function that already works. If any function passes your test, then you have three options. If you don't find any function, it's test-driven development. You have to develop it, so fine.
Otherwise, you may just import it, which means you take on the dependency. Or you may fork it, that is, you take exactly the same code and tests, you check the license, and you copy it into your project. Or you may just have a look at it to see how many details you hadn't thought of yet. Another trick is that you may use test-driven code search, which is a tool by itself, for unit test validation. If you wrote a test and you think it's a good test, then you run a search with it, and if you find a couple of totally unrelated functions, it means that your test is too weak. It finds false hits.

So, limitations and future work. The main point right now is performance. You may do a lot of things, like extending the search space and making more tools, but then you get even more work to do, so performance, performance, performance, and parallelization, et cetera. It would be very nice if this was not done on your machine, but on the web. So what we are trying to do is to make a kind of search engine on the web. If you want to know when things are starting to roll, write an email to the address here. We are looking for people who are willing to test.

Conclusions. If you start using it, you will recognize much better what are good tests and what is good code, and you will tend, at least this is what we noticed, to write your code so that the implementation details stay as far away as possible, and everything is as simple and as intuitive as possible. Thank you for your attention.

Okay, do you have any questions?

So do you filter somehow already on, for example, the number of arguments that can be passed and similar things? Because if a function doesn't take any argument and you need something that takes one, then it's not a valid candidate, for example.

I didn't understand.

When you look for candidates of things that solve your problem, do you filter already on the things that you don't?

Right now, no.
This is one of the reasons a web search, I mean a curated index of objects, would be nice, but it's very difficult to do on your machine. I mean, with Python, you may tell how many arguments a function takes, but not much more, because due to duck typing you might not actually want to be too strict. So the idea behind having a web search engine is that you have a curated index of what kind of function may fit a particular test or not.

Is there anything for timing out functions that could take a long time?

This is already taken into account. Every test has a timeout of one second. So a lot of the stuff that is tried times out. Things that use prompts, raw_input, et cetera, don't give any problem. The real problem is when you call C extensions and they just crash the interpreter. I have a long blacklist for this kind of thing.

Another question? Yeah? No?

So how do you deal with multi-argument functions where I don't know what the order of the arguments is going to be, and what's the time complexity of that?

Okay. So this is what he, in the first row, tried to do at the beginning: an automatic permutation of arguments. I refused that, because right now getting correct results is more important than having a larger search space. But as I was writing this talk, I noticed that you can easily parametrize your test function to just switch the arguments. So it can be done right now, by just passing a parametrize with the switched arguments. And the complexity is very bad. If you have two arguments, it's two, but in general it's n factorial. With four arguments, you are already very, very heavy. It's very easy to go into the hundreds of thousands of tests. Now, this was really a small environment, for educational purposes, et cetera. Thank you.
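The argument-switching idea from the last answer can be sketched in plain Python. The helper name `orders_that_pass` and the `str.join` example are illustrative assumptions; with the plugin, the same orderings would be supplied through pytest's parametrize marker.

```python
import itertools
import math


def orders_that_pass(candidate, args, expected):
    """Return the argument orders for which candidate(*order) == expected."""
    hits = []
    for order in itertools.permutations(args):
        try:
            if candidate(*order) == expected:
                hits.append(order)
        except Exception:
            continue  # this order does not fit the candidate's signature
    return hits


# str.join takes (separator, items); try both orders without knowing which is right.
print(orders_that_pass(str.join, (['a', 'b'], '-'), 'a-b'))

# Each candidate is run once per ordering, so the cost grows as n factorial.
for n in range(1, 6):
    print(n, math.factorial(n))
```

With 5,000 candidates and four arguments, that is already 24 runs per candidate, or 120,000 test runs, which matches the order of magnitude mentioned in the answer.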