Hello everyone. We are here with Sylvain, who is our next speaker. I saw his profile on the EuroPython website and I'm very interested in your talk, I have to say. Where are you calling from? I'm calling from the west of France, actually. And how's the weather there? Actually it's so-so, and we are close to the ocean here, so it changes every 10 minutes. I'll tell you if the sun comes through the windows. Yeah, please check what you're sharing, and whenever you want you can start. Perfect, well, thanks a lot for hosting this event. I want to wish everyone good morning, good afternoon, good evening. Well, I'm Sylvain Marié. I work as an analytics and AI engineer at Schneider Electric in Grenoble, France, where I'm not located right now, by the way. In this talk I'll introduce you to a library named pytest-cases and I'll show you how to use it to hopefully write powerful tests and reproducible benchmarks. In this presentation I'll start with a very quick reminder of pytest, as I assume the audience knows about it already. Then I'll introduce how pytest-cases works, the basic mechanisms, and then I'll switch to more advanced topics for the end of the talk. Okay, so pytest first. As most of you know, it's the test framework for Python, extremely popular now, and the reason why it's popular is really its philosophy. It's meant for you to not write extra lines of code. You just focus on what is interesting, so you can really reduce boilerplate and copy-paste code in tests to almost zero. And how can you do that? Why does it work? It's based on three pillars. The first one is that a test is just a plain old Python function, such as this test_foo function here. Second, you can parametrize tests by decorating the function and declaring in the decorator the name of the parameter and the range of values.
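As a self-contained aside, the first two pillars just mentioned can be sketched in a few lines (function names are made up for illustration; only pytest itself is assumed):

```python
# Pillar 1: a test is a plain Python function that pytest collects
# by its "test_" prefix. Pillar 2: parametrization via a decorator.
import pytest

def test_foo():
    # plain old Python function, no class or registration needed
    assert 1 + 1 == 2

@pytest.mark.parametrize("name", ["hello", "world"])
def test_greet(name):
    # pytest generates one test per value and injects it as `name`
    assert isinstance(name, str)
```

Running `pytest` on such a file collects three tests: `test_foo`, `test_greet[hello]` and `test_greet[world]`.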
And with just that line, your test will generate two tests here, and the value of the parameter will be injected as an argument into your test function. So extremely simple. And third, and it's really the killer feature of pytest: your tests can reuse shared tools, objects, features, and that's the place where you actually handle all the complexity that you sometimes need, you know, setting up a database, tearing it down, etc. These are named fixtures, and actually there are many of them built into pytest. One example that is quite easy to grasp is the tmpdir fixture. You just declare that you use it by adding it to the signature of your test. When your test runs, this particular fixture will create a temporary directory before your test in a setup phase, then it will inject the path to this temporary directory, and when your test ends it will delete it. And when a fixture already exists, you just have to name it to get all of that. So this is really powerful. These three pillars, plus many other features, and also great IDE integration, for example in PyCharm, make it really easy to create tests, and it's actually fun. Maybe two other things to mention for pytest. If you don't know them, have a look at them, they are cool. One is the fact that you can write conftest.py files that can be placed anywhere to declare common fixtures or other things. Another reason why pytest is so popular is that there is a huge range of customization hooks that make it very easy to write plugins, which is why there is a huge list of plugins, and why I'm presenting yet another plugin today. So let's switch to the core of this presentation: pytest-cases. It's a plugin for pytest that I started to write in 2018, and its main goal was really to make pytest easier to use in a very particular context: tests for data science.
I'll explain later why tests for data science or analytics libraries have some peculiarities. A secondary goal that actually popped up along the way is to make pytest easier to use in general. The idea is that while developing pytest-cases I created mechanisms that were not there before, and that I now also make available directly, so that you can use them without using the core of pytest-cases. It's a mature project. This is the collection of badges that you can find on the welcome page, and I have to mention the users: a small community, but very active, with a great lot of ideas. I also want to thank everyone who has been reporting issues, because it helped the library become more mature. Okay, so the use case that was at the origin of this library is testing analytics and AI libraries, such as machine learning libraries for, I don't know, predictive maintenance. This is the kind of library we build at Schneider Electric for forecasting energy consumption or production. These libraries, machine learning algorithms, usually work on tables. If you are not familiar with the reference libraries for tables in Python, there is one named NumPy for matrices and another named pandas that is basically for tables. And these objects are complex Python objects, because they represent tables with potentially hundreds of rows and columns, and you usually cannot set them up with a single line; you need a few lines to create the object and manipulate it. These are the kind of inputs that we use for our tests, and we can source most of these inputs from files or other kinds of storage, so that is one way to create the test data. For example, we get feedback from production software where some table was not working, we capture that in a file, and we include it in the tests.
But we also sometimes want to push the limits of the algorithm, and in that case it's better to simulate, to create a table programmatically, controlling everything, you know, the correlations in the columns, etc. In that case we rather need code. So that's at least two ways to source the test data, and of course there are many others. Another need I mentioned at the bottom is the need to benchmark. You might tell me that this is not really testing. However, when writing benchmarking libraries or tooling in the past, I ended up finding that I was recreating almost everything that was already in pytest. So maybe there was something to do here, because a benchmark is nothing more than, you know, running several versions or several algorithms on several datasets. That resembles tests very, very much. Okay, to go into a little more detail here, the complexity I was mentioning, you can really see it on this slide. Basically, pytest was meant to parametrize your tests with things that are quite small to write, such as this tuple here at the top. And for data frames, and I mean data science in general, we end up with things that cannot be very small. First, you have to create objects, and you usually cannot do it in a single line, so you then add other lines to say, oh, I want to drop this part of the table, blah, blah, blah. So you manipulate it over several lines. Second, once you have done all of that, how will the next person looking at your code know what you've done? You need somehow to be able to document this parameter value, which is really not convenient in the pytest.mark.parametrize decorator. And besides, as I was mentioning earlier, this example you see on screen is a script that simulates a table, but sometimes you want to read it from a database.
And in that case, you would like to use a fixture setting up the database beforehand and draw your sample from that database or file system. And finally, the ID of the case, of the parameter, becomes very important, because you end up having many data frames, more than 20 for a single test function, and you need a very quick way to check which one is not working. Okay, so it did not seem really right to use the pytest.mark.parametrize decorator for that. The idea behind pytest-cases is really to transform this iterable of parameter values that you see in the decorator here into functions: not fixtures, just functions. You see that, by default, you have to prefix these functions with case_. And the only contract these functions have is to return the parameter value to use. Since these functions are created separately from your tests, the decorator for your tests is a bit different. It's shown at the bottom here: it becomes parametrize_with_cases. You still have to say what the parameter name to inject is, same as before; here it's name. But now you have other arguments in this decorator to declare where the collection of cases should be listed from, and possibly how to filter them. We'll talk about that later. Another thing you see on the screen is that we now get a docstring for free, because we create functions. So we can document our parameter values out of the box. Okay, so what about IDs and marks? For those familiar with pytest, IDs are usually generated automatically by pytest, but sometimes you want to control them, especially when your object is complex, because it might not be represented in a way that you like. You can pass these IDs as either an iterable or a callable in the decorator. Another way to pass an ID is to attach it to the value directly. It's not shown here, but it would appear here between this comma and marks. You can customize the ID.
And you can also customize marks on parameter values one by one. So here I declare the second parameter, the parameter world, to be marked as skipped. This is done through pytest.param. So as you see, it's feasible, you can do it, but it's less elegant than the previous screen. So how is it done in pytest-cases? Well, the default ID is directly extracted from the function name after the prefix. So here it would be europython, lowercase. But you can also customize it with a decorator named case, where you can use the id argument, here with earth. So those are the two ways you can use to customize the IDs. And for marks, it's even easier, if possible: the same marks that pytest lets you put on test functions can be used on case functions. So the pytest.mark.skip decorator can be applied to this case_world function, and it will be skipped. Now, let's talk a little bit about parametrization and fixtures. In pytest, parameters, so here the EuroPython 2021 parameter, cannot be recursively parametrized. You cannot say, I want EuroPython 2021 and then EuroPython 2022. If you want to do that, you have to expand this in the list of parameters; it is a one-level list. Also, you cannot say that one of those parameters requires a fixture. And I mean, it's for a reason: you would not even know how to express it. But you cannot even inject a fixture directly as a parameter value, and that use case is more frequent. Those things you can do with pytest-cases. So your case functions, such as the one on the left, it's the same idea as before with the skip mark: you can actually mark a case to be parametrized, and it will just work the same way as a test function. So here we generate two parameters, one that will contain the string world0 and a second that will contain the string world1. And for the fixtures, it works as well, and it's the same way that you inject fixtures into test functions.
You declare in the signature of your case that there is a fixture here, and then you can use it. So in that dummy example, I require the fixture I was telling you about at the beginning of this presentation, the tmpdir fixture, and I say that from this fixture I want to extract the base name of the temporary directory and use that as a parameter. And last but not least, I don't know if you will use this often, but some users have already reported that they do: you can recurse. You can actually parametrize a case with cases. It becomes a bit hard to read. Okay, so a few properties of cases. The functions are lazy, so they will be called just before your test executes, just like fixtures, actually. And the case functions that require fixtures will be transformed into fixtures; otherwise pytest would not know how to handle them. They will therefore also be set up just before your test runs. A second interesting property is caching. If one of these case function results is needed in several places in the pytest framework, for example by other plugins, the case function will not be called several times. This is very important, because you usually install other plugins, and sometimes these plugins grab argvalues from parametrized tests. So it's important to not spend time again looking up the database or recreating an example that you already had in hand. Okay, let me talk very quickly, because time flies, about collection of cases. By default, the initial idea was to say, okay, let's be very strict and separate two files, one for the tests and one for the cases, so that we are very clear. That's still the default. Personally, I use it much less often than what's on the next slide, but it's still worth mentioning. So if your test file is named test_foo, then you just have to create a file named test_foo_cases, or cases_foo, whatever your preference is.
And if you do this, you do not have to declare anything in the parametrize_with_cases decorator; it will look up all the cases in that module. A more interesting option, actually suggested by one user in the repository, whom I want to thank a lot because it really opened up the possibilities, is that you can now also specify a module or a module name to grab the cases from, and relative references work. So here I use a dot to say: just use this module. And you can use cases that are written just before the test. Or you can use a class as a namespace to contain other cases, and in that case you can reference the class. Moreover, you can actually grab cases from several places at once: modules, classes, or even functions directly. Here I reference a case function directly. On top of this collection, I see that I have five minutes left, so I will really skim through this one. On top of this, there are many tools, such as support for different prefixes, tags for filtering, and callable filters with a built-in library of filters. All of these allow you to, you know, multiplex your cases. So you can put all of your cases in a single file, but still have test functions grab only the ones they are interested in. Finally, you can use parametrized cases to parametrize fixtures. This requires the fixture decorator provided by pytest-cases. And there is also a built-in fixture named current_cases that you can use to know the function and the ID of the case that was injected. If you want to do some debugging, putting a breakpoint is easy with that. Okay, now let me jump to advanced topics. So, very shortly, pytest-cases would not be possible without major improvements to the pytest core mechanisms, and these are made available independently of what I explained before. So I invite you to check the documentation.
In particular, there is an enhanced parametrize decorator, an enhanced fixture decorator, and a concept of fixture union that I will talk about a bit later. pytest-cases was created with a few design choices, so as not to, you know, disturb or hack into pytest too much. So I tried at the beginning to say, okay, I will just modify the functions from the user, injecting things or removing arguments, so that I don't mess with pytest too much. In the end, I did end up messing with pytest later on, but for many features I still just wrap the functions from the user. This is done using a library that I wrote separately, named makefun, which you can see as an enhanced version of functools that allows you to dynamically modify a function: adding arguments, removing arguments, things like that. I will not have the time to explain these graphs. One challenge that was solved, and it was a major challenge, is the concept of fixture union. In pytest, everything is a cross product. Tests are parametrized and depend on fixtures, which themselves depend on other fixtures that are parametrized, and all of this can be represented as a graph. Basically, pytest grabs everything and does a cross product of all the combinations, and that's your list of test nodes. This limited the capability I wanted in pytest-cases, to have some parameters require some fixtures, but not all of the parameters. So I ended up creating this mechanism of fixture union, which is basically the ability to create a cross product plus another cross product plus another cross product. If you want a detailed explanation with examples, there is one in the documentation. Finally, to conclude in the last minute on benchmarks: does such a table look familiar? A table where, I don't know, in a research paper for example, you've got datasets for each row, and then challengers, for example algorithms with different variants.
Here, it's a polynomial regression with degree one and degree two. And you want a results table like that, with some kind of evaluation protocol providing error metrics. Well, this is actually feasible with pytest-cases. You just have to see the evaluation protocol as a test. So in the figure you have on the slide, the evaluation protocol is the test in the middle. It will receive one algorithm, one challenger, that will be a case; for example, you can prefix it algo_. It will also receive a dataset that you can prefix data_. Both will be cases, but with two different parametrizations to grab each prefix independently. And then, once they are injected into the test, you can do whatever you want, for example fit the model, compute performance metrics, et cetera, and then grab it and dump it to a file. And that creates your results table. If you want to do that more efficiently, you can use pytest-harvest, which is another plugin that I wrote for pytest that is meant for exactly that. Okay. Now it is time for questions. Thanks for watching. I think that was a very interesting talk. We'll have some questions; I will put one or two here, and I think there will be a big discussion in the breakout room after. So how is scoping done, session, module or function, in relation to laziness and caching for cases? So cases are function-scoped only. It's like a parameter. Therefore, they can rely on fixtures that have all the other kinds of scopes, because those are bigger, but cases are function scope. Okay. And another one is about the case functions, namely: why not use the decorator library, just like pytest fixture? Sorry, say it again, I did not get that. About the case functions, why not use the decorator library, just like pytest fixture? Yeah. So I actually use part of decorator, which I extracted into a library that is now named makefun.
Basically, in decorator there were several things mixed together: the capability to create a decorator and the capability to create a dynamic function. I created independent libraries for both of those capabilities. So you can look at decopatch for creating decorators, and makefun for dynamic function creation, because most of the time I just wanted to create a function, not a decorator. So I would say I relied on the great work that was done in decorator, and also in attrs, because almost the same code was written in attrs as well. It's now in makefun. Okay. And I think that this is it. Thank you very much, Sylvain. I liked it very much. I think we can continue the discussion in the breakout room, because there are many questions. Cool. Thanks very much for welcoming me. Bye. Bye.