I am Saransh, and I am pursuing an undergraduate degree in computer science and maths at the University of Delhi, so, as you guessed, I am from India. I like Python and Julia, and I like open source research software; I maintain a few such projects, listed here: PyBaMM (Python Battery Mathematical Modelling), BattBot (short for "battery bot"), and Vector (which is not an acronym, it stands for vector itself). I love working on a lot of them, specifically the open source research software. You can learn more about me on my website, and, as my shirt suggests, I like superheroes. Going through the table of contents quickly: I will start with an introduction covering the basics of unit testing and code coverage, then run tests and code coverage locally, but not in a subprocess yet. Then we will configure a complete pipeline, with some YAML and some Codecov magic, to automate everything: every test and the code coverage values. Then we will start working with threads: we will run the tests inside a thread and see whether that affects code coverage or not; I don't want to spoil it for you, so we will look into that in that section. Then we will move on to subprocesses: what subprocesses are, why and how to use them, and the basic data structures you should care about when using subprocesses while unit testing. Then we will run tests in a subprocess, which is where we will identify our main issue; we will push everything to the remote, see how the issue persists, and then work on a solution. Alright, so what is unit testing? A very basic term; let's get started with a brief introduction.
As it says on the slide, unit testing means writing extra code to test your user-facing code, the main code your users will be using, and this process should ideally be automated: you should not run unit tests manually every time you make a change; they should run automatically in a pipeline. We will be using Python's unittest library for this talk. A lot of Python libraries are shifting to pytest for most of their tests, but for this talk let's stick to unittest, and maybe cover other libraries next year. Moving on to code coverage: again, as the slide says, code coverage is a percentage value that shows how much of your user-facing code is actually covered by the unit tests you are writing. It helps you find missing tests, again in an automated way. Getting to a very small example of a unit test: let's say we have a feature that adds two numbers, in the form of a very simple function, and we write a unit test for that function. This is the standard format for writing unittest tests: we inherit from the unittest.TestCase class, which gives us access to the assert methods provided inside that class (there is a whole list of assert methods at a URL I have linked in my slides). Then we check some edge cases to make sure the unit tests are actually meaningful and not just a hoax, and the final `if __name__ == "__main__"` statement is again a pretty standard way to write unittest code: it ensures our unit tests run when we run the module, so we can run all of them at once. So if we run `python -m unittest` with `-v` (which just prettifies the output), it shows us that all the unit tests pass.
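The function and its test described above might look like this (a minimal sketch; the file layout and the class name TestCalc are illustrative, and `exit=False` just keeps the example friendly to interactive runs):

```python
# calc.py -- the hypothetical user-facing feature: a function that adds two numbers
def add(a, b):
    return a + b


# test_calc.py -- the standard unittest structure described in the talk
import unittest


class TestCalc(unittest.TestCase):
    # inheriting from unittest.TestCase gives us the assert* methods
    def test_add(self):
        self.assertEqual(add(2, 3), 5)
        # edge cases, to make sure the test is not just a hoax
        self.assertEqual(add(-1, 1), 0)
        self.assertEqual(add(0, 0), 0)


if __name__ == "__main__":
    # ensures the tests run when the module itself is executed
    unittest.main(exit=False)
```

Running `python -m unittest -v` in the same directory would then discover and run this test case.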
We can do the same thing with coverage: the `coverage run` command with `-m unittest` and `-v` (again for the pretty output) does exactly the same thing, but it also generates a `.coverage` data file, and then we can use that file with the `coverage report` command. `coverage report` shows the coverage of the calc module, that is calc.py, and of test_calc.py, but we want to omit test_calc.py, because that is not user-facing code; it is the tests we wrote for the main feature. We can omit it using the `--omit` option, and then the report shows that our coverage is 100%, meaning we are covering everything a user would be using. There are other commands, such as `coverage html`, which you can use to visualize the coverage even better, and a lot of these options are available in the coverage.py documentation, so I highly recommend checking it out. Then we move on to why unit testing and code coverage: why should I grow my already huge open source workflow to include unit tests and coverage reports?
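The command sequence just described, roughly (assuming coverage.py is installed and calc.py and test_calc.py sit in the current directory):

```shell
# run the tests under coverage; -v prettifies the output
coverage run -m unittest -v

# report coverage, omitting the test file (it is not user-facing code)
coverage report --omit="test_calc.py"

# optional: generate a browsable HTML report in htmlcov/
coverage html
```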
There are a lot of points that make unit testing and code coverage a very good addition. First, they make your code more reliable: when users come to your code base, they know the code is tested, and if a certain breaking change happens, you will know how to fix it, or at least know that a breaking change is making your code crash, because the unit tests will tell you. It obviously improves your code's quality alongside testing it. It makes your code more maintainable, which means it is easier to accept contributions from other people: you no longer have to worry about whether a contribution breaks your existing code base, because the pipeline you write tests that for you. It lets you localize faults: if your existing code suddenly breaks, or a new feature breaks something, you will know exactly where, because that particular unit test will fail. It keeps production, the master or main branch, safe, so your user base won't go down. Moving on to coverage: it definitely pushes you to add more unit tests, but it also tells you where tests are missing and how you can improve the coverage of your code base, and it helps you identify bugs, though not always; bugs can still make their way past unit tests without being detected. All in all, it is a very good addition to your code base; most open source libraries use it, and you should too, maybe not in your personal projects, but definitely in the projects that other users depend on. So don't be this person, who does not write any unit tests and would do anything to skip writing them, or this person, who unit tests the very first line of the project and is addicted to unit testing, which
means writing unit tests before even writing the code. Instead, maintain a balance: you should go up through this pyramid. The first priority should be writing the basic features of your project, so that your project is usable. Then you should add and improve unit tests, which makes your project reliable, and users will trust you even more. The next step is to automate these tests and coverage, so coverage tells you where tests are missing and you don't have to worry about running them again and again; everything is automated. The last step, at least according to this tutorial, the top of the pyramid, is making unit testing mandatory: now you don't have to worry about a random person accidentally breaking your project, because whenever they add a new feature, they have to add a unit test along with it. After that you can explore even better testing, such as integration tests, which check how different parts of your code base integrate together, but we won't be talking about that in this talk. So finally, let's set up a dummy project for this talk: a very simple project with a calculator file named calc.py and tests for it named test_calc.py. Right now everything should work fine and give me 100% coverage, so let me check. Is the code visible, or only the presentation? I have VS Code on my screen, but I'm not sure if it is visible; just a minute, I'll share my complete screen. I think the screen should be visible now, yes. So, a very basic calculator file with some tests written for it, and according to the slide it should work very well. Let me open up the terminal and cd into the dummy project. Alright, so far we were using the omit feature of coverage.py on the command line, but a wiser option here is to add a .coveragerc file, which will also be useful
for the upcoming sections of this talk. You can specify the coverage configuration in this file and coverage.py will automatically pick it up. It is written in the classic INI format, and it has a lot of sections and fields, which are all explained in the documentation. If I go ahead and run coverage again with `coverage run -m unittest -v` (the `-v` for the pretty output), it executes all the tests and generates the `.coverage` file we discussed, and the test file is ignored when I run `coverage report`, because that is how we have configured it; it knows about that file, and everything works out. Next, let's go to GitHub and create a new repository; then we will automate everything so that we don't have to care about running the tests ourselves. Let's name it europython. If you have Codecov already configured, it shows you the option here, but if you don't, you can just sign in and it will automatically come up once we create the new repository. Meanwhile, I will set up a git repository locally, add the files to the staging area, and commit, with the message "initial", say. We can then push that local repository, and I already had a .gitignore in place to ignore the cache and the `.coverage` file. The `.coverage` file should ideally not be kept in your version control system, because it changes every time coverage runs, and you will be running coverage a lot if you have a bigger project. So everything is working fine, and we have created a remote repository. The next step is to configure an actual CI pipeline; for this we will be using YAML and Codecov. GitHub provides this really nice Actions tab, and we can configure whatever we like; GitHub gives us a few starter workflows. We don't want a Python package, we want a Python application, and we will configure it; we can do everything inside this particular editor. Let's say this should run on every push and on every pull request on this particular branch; we
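The .coveragerc file being described might look roughly like this (INI format; the omit pattern mirrors the `--omit` flag used earlier, and `fail_under` is an extra option added here purely for illustration):

```ini
# .coveragerc -- picked up automatically by coverage.py
[run]
omit = test_calc.py

[report]
# optional: make `coverage report` exit non-zero if coverage drops below this
fail_under = 90
```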
don't need the permissions section, and for the jobs, let's say we want two: the first checks the style of the files with flake8, the linting part, and the second runs the actual unit tests and uploads the coverage report to, say, Codecov. I already have a file in place so that I don't have to write it all from scratch right now. For the style job, I can copy it here; it sets up Python 3.9. Interestingly, setup-python from version 4 onwards requires you to specify the Python version; we could have skipped specifying it if we were using v3, but v4 requires it. It upgrades pip, installs flake8, and then just runs flake8; if there are style issues, it automatically errors out. Let me paste that here. For the build part, again let's pick Python 3.9 so the libraries don't give any errors, and for installing dependencies a good rule is to use `python -m` before `pip`, to avoid any path errors, especially on a remote system. We don't need flake8 and we don't need pytest here; we will be using coverage, and we don't have a requirements file, so I can skip that. Linting with flake8 is done in a different job, and we are not testing with pytest, we are testing with unittest, so let's say "Run unit tests and generate coverage report", and here we can use our old command, `coverage run -m unittest -v`. This generates the coverage data, but then we need to make sure it gets uploaded to Codecov. Speaking of Codecov, I think this would be a really nice time to go to Codecov and sign in if you don't have an account already. I already have an account, so it automatically redirects me; these are all the repositories connected with GitHub, and ours will appear right there. This is how we were generating the coverage report locally, the `.coverage` file. Next, to make it a bit more realistic, we'll add the `needs` field here
just to make sure that if our style check fails, we don't run the unit tests: we want contributors to write pretty code first, and then test it. The next step is to upload the coverage file generated here to Codecov; for that there should be an action. Yes, there is the Codecov action; let me check out its usage. It shows that I can basically use it like this, plus some additional arguments and options I can pass, which we don't have to care about right now. The next step is to add that particular action here; let's say "Upload coverage report" and put it there. Just to be on the safer side, and to make sure it is good and I haven't made any typos, I'll copy and paste the already written one; I think both of them are identical at this point. Let's rename the workflow to CI. So this will run on every push and every pull request on this particular repository, no matter what the branch is; it will first check the style, that is, run flake8, and if that errors out, it won't proceed. If that passes, it will proceed ahead, test everything, and upload a coverage report, making a complete CI pipeline: you don't have to do anything manually, everything is automated. Alright, let's commit this directly to the main branch, and ideally this should have started some checks here. As you would guess, the Actions part is running: it first runs the style job, and if that passes it goes to the build job, and if the build succeeds, you should be able to see the repository in Codecov. Let's wait for it; spoiler alert, it will work. The build check has started, the tests ran, and yes, everything passed, and the "upload coverage report to Codecov" step also succeeded; the result URL is available here, though you don't actually have to go there. It will take some time, but Codecov should automatically show that
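Putting the pieces together, the finished workflow being assembled here might look roughly like this (a sketch; file names, the 3.9 version, and step names follow the talk, while the `coverage xml` step and the action version tags are assumptions on my part, since the Codecov action expects a report file to upload):

```yaml
# .github/workflows/ci.yml
name: CI

on:
  push:
  pull_request:

jobs:
  style:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.9"   # setup-python v4 requires an explicit version
      - name: Lint with flake8
        run: |
          python -m pip install --upgrade pip
          python -m pip install flake8
          flake8 .

  build:
    needs: style                  # skip testing if the style check fails
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.9"
      - name: Run unit tests and generate coverage report
        run: |
          python -m pip install --upgrade pip
          python -m pip install coverage
          coverage run -m unittest -v
          coverage xml              # produce an XML report for the uploader
      - name: Upload coverage report
        uses: codecov/codecov-action@v3
```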
repository here; I already have a test repository from yesterday. It should take some time... alright, it's here. It shows that the coverage is 100%, which is what we were seeing locally, and if I go to files, it shows only a single file; if I open that file, it shows me exactly which lines are covered, which is all of them. This is also roughly how the interface looks if you run `coverage html`, but that is local; if you want it automated, use Codecov (another option is Coveralls, but I personally prefer the coverage.py plus Codecov combination). Alright, our repository is here as well, europython; if I open that up quickly, it shows, yes, I have made one commit that has 100% coverage, the root directory, only one file, some pretty graphs, and again that particular file and its coverage. So everything works right now, which is good. Next, let me create a PR here to show you how Codecov works on pull requests. Codecov comes with a nice badge to let your users know how much coverage your repository has; let's create the Markdown badge and go to the repository. Doing everything on the remote repository: create a README, let's name it "EuroPython talk", add the code coverage badge, and GitHub Actions also comes with a nice-looking badge that tells your users when the repository is broken, when the tests are failing, and when everything is passing; copy the status badge. So if I do this right and create a new PR, the tests run automatically, because we configured our CI that way: first the style job runs, and after that the build job, which is the unit testing, and it is running the full matrix now. As we're running out of time, I'll move ahead while the tests run; everything should work out of the box if we check out that PR. Moving to the next part, which is threads: one
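The two README badges being added might look like this (Markdown; `<user>` is a placeholder for the actual GitHub account, and `ci.yml` for the workflow file name):

```markdown
# EuroPython talk

[![CI](https://github.com/<user>/europython/actions/workflows/ci.yml/badge.svg)](https://github.com/<user>/europython/actions)
[![codecov](https://codecov.io/gh/<user>/europython/branch/main/graph/badge.svg)](https://codecov.io/gh/<user>/europython)
```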
might want to run the unit tests in threads. This comes with some minor changes to your code: for example, you cannot return values directly from threads; instead you use thread-safe data structures, like the built-in dictionaries (most of the built-in data structures are effectively thread-safe). You append a value, or add a new key and its value, inside the thread, and then access it from the main thread. So here, when we are asserting and we want to make sure everything works fine, we run the test inside a thread, passing arguments like this, and additionally passing a dictionary to capture the return values. If I run the tests and code coverage on this particular code, it runs as fine as it was running before: threads share memory, so it does not affect coverage.py in any way, and the output is the same whether I use a thread or not. I won't actually run this, because we are running late on time. Why would you want to run tests in a thread? First, it allows you to execute tests in parallel; a lot of open source projects use this to execute tests in parallel and save time and resources. For example, if you have a really large test suite, you might otherwise have to pay GitHub to allocate you a larger amount of CI time per month so that your tests can run without failures. If you want to check out examples, you can search this particular expression on GitHub and you'll find that a lot of repositories use threads. There are some additional things you'll need to take care of when using threads with
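The threaded test described above might look like this (a minimal sketch; the `worker` function and `results` dictionary names are illustrative, and `exit=False` keeps the example friendly to interactive runs):

```python
import threading
import unittest


def add(a, b):
    return a + b


class TestCalcThreaded(unittest.TestCase):
    def test_add_in_thread(self):
        results = {}  # a dict shared with the thread captures the return value

        def worker(a, b, out):
            # threads cannot return values directly, so store the result
            out["sum"] = add(a, b)

        t = threading.Thread(target=worker, args=(2, 3, results))
        t.start()
        t.join()  # wait for the thread to finish before asserting
        self.assertEqual(results["sum"], 5)


if __name__ == "__main__":
    unittest.main(exit=False)
```

Because threads share the process's memory, coverage.py sees the lines executed inside the thread just like any other lines.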
unit testing: ensure that multiple threads don't access a single variable at the same time, so you don't have a producer-consumer type of problem and don't run into deadlocks and so on, and use the thread-safe built-in Python dictionaries, or other built-in data structures, to return values; then everything works when you use threads. So let's switch to the pull request we made: all the checks are passing, and in addition there are two more checks automatically added by Codecov, which show that the coverage is not affected at all and the coverage report is full, that is, 100%, plus some additional information. So everything is automated: if someone comes along and contributes to your repository, you don't have to take care of running the tests and the coverage manually; everything is done here. I can go ahead and merge the pull request so that the repository looks a bit nicer. Yes, the badges are working very well, everything is passing, and Codecov is at 100%.
Next we go to subprocesses. With subprocesses, memory is not shared; there are specific mechanisms through which you can share memory, but we won't be discussing them here. Instead we'll use a queue to return values, and this queue is taken from the multiprocessing library itself. So this is how we would be running our test: we create a process targeting the add function, passing the arguments, and additionally passing the queue to get the result back, and then we use assertEqual to check that the result is correct. Alright, I'll run this quickly: I'll move calc.py and test_calc.py out, and move the process version in... the process file seems to have disappeared, so I might not be able to run it right now because of the time shortage. If we run this, we will notice that our coverage value drops, going from 100% down to around 60%, because coverage.py cannot automatically detect tests that are running inside a subprocess, whether you're using multiprocessing or any other library to spawn the processes. Now, why use subprocesses at all, when the coverage value goes down and it makes our lives difficult? Again, you can run tests in parallel, saving both CI time and resources, though this is a bit less efficient than threads if you only want parallel execution; but you can also stop tests midway if they're taking too long and restart them, so if you have some probabilistic tests in your repository, you might want to use subprocesses to run those. Again, a lot of repositories use subprocesses; a few of them, to name some, are PyBaMM, BattBot, and others, and PyBaMM and BattBot are the ones that inspired this talk. So now you know that you can use subprocesses and how they affect the coverage value of your code base; and again, use process-safe data
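The subprocess version of the test described above might look like this (a minimal sketch; the `worker` name is illustrative, and the `q.get()`-before-`join()` ordering is a deliberate choice, since a child can block on a full queue if the parent joins first):

```python
import multiprocessing
import unittest


def add(a, b):
    return a + b


def worker(a, b, q):
    # subprocesses don't share memory, so a multiprocessing.Queue
    # is the process-safe channel back to the test
    q.put(add(a, b))


class TestCalcSubprocess(unittest.TestCase):
    def test_add_in_subprocess(self):
        q = multiprocessing.Queue()
        p = multiprocessing.Process(target=worker, args=(2, 3, q))
        p.start()
        result = q.get()  # blocks until the child puts the value
        p.join()
        self.assertEqual(result, 5)


if __name__ == "__main__":
    unittest.main(exit=False)
```

Run under plain `coverage run -m unittest -v`, the lines executed inside `worker`'s child process are invisible to coverage.py, which is exactly the drop being demonstrated here.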
structures, like queues and maps, to return values. This does not work out of the box; I guess I cannot get it to run right now, and I don't have enough time to figure that out. So why do we get the wrong results? Because, as coverage.py's documentation puts it, measuring coverage in those subprocesses can be tricky, because you have to modify the code spawning the process to invoke coverage.py. In other words, coverage.py will not automatically know that you are running tests inside a subprocess; you need to invoke coverage.py in the subprocess itself. For a fix, let me add that and run coverage quickly, so you can actually see the coverage going down first. I have a test repository here with the subprocess version of the calculator; everything is the same as before, except we're calling the test from inside a process and using a queue to return values. Let me run `coverage run -m unittest -v`; it runs all the unit tests fine, but if I do `coverage report`, the coverage actually goes down, even though we're testing everything. This is the part that does not work naturally; you have to configure some things. The first fix is to add these particular lines to your .coveragerc file: telling coverage.py that you're running in parallel, and specifying multiprocessing in the concurrency setting; here you can replace that with the particular library you're using to spawn the processes. Alright, I think I'll push this up to GitHub to make sure that Codecov gives the same result; commit "process" and push, yes, with set-upstream. Now that we have another pull request, I'll open it up quickly; it will take time for the tests to run, so we can move ahead. If I add these particular lines, coverage.py automatically knows that I am running tests inside a
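The .coveragerc additions being described, following the coverage.py documentation (merged with whatever `[run]` settings, such as `omit`, the file already has):

```ini
[run]
# write one data file per process instead of a single .coverage
parallel = true
# tell coverage.py which mechanism spawns the processes;
# replace "multiprocessing" with the library you actually use
concurrency = multiprocessing
```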
subprocess (and for any other library spawning the subprocesses, you can replace the library name here). If I run the same commands, running the tests and generating the coverage report, it first creates different data files for the different processes; these files are named differently because of the `parallel = true` option. Now we can use `coverage combine` to combine these files into the single `.coverage` file we've been seeing until now, and then a classic `coverage report` gives us the 100% we were expecting. Alright, the Codecov side might take some time, so I don't want to spend more time here in the session. This was the config fix for this particular problem: editing your configuration file. And if you edit your CI and push it up, Codecov will eventually notice when the coverage is wrong and error out on the pull request. Another fix is to pass the concurrency option on the CLI itself, but this particular option works only with multiprocessing; that means if you're using some other library to spawn the subprocesses, this will not work, and you should stick to the configuration file fix. So if I go ahead and remove the `concurrency = multiprocessing` part here, I can still run everything by passing `--concurrency=multiprocessing` on the command line. It still generates a lot of files, I think four of them for the different subprocesses and one general overview file; then we `coverage combine` these files, and `coverage report` now gives us 100% coverage again. And if we push this... alright, I think we did not edit our CI, but if everything went well and we had edited it, the Codecov check would give us an error here saying the coverage is dropping, whereas we are actually testing everything inside subprocesses. Alright, so our problem is
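The CLI alternative just described, as a command sequence (assuming multiprocessing is the spawning library, since the flag supports nothing else):

```shell
# CLI alternative to the config-file fix (multiprocessing only)
coverage run --concurrency=multiprocessing -m unittest -v

# each process wrote its own data file; merge them into one .coverage
coverage combine

# the report now counts the lines exercised inside the subprocesses
coverage report
```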
solved: we can run tests inside a subprocess and inside a thread without taking down the coverage of the code base, keeping it all looking pretty for the users as well, and everything works now. Things to take away from this session, which was a bit rushed, but I hope you got the basics of it: we went through the basics of unit testing and coverage; then we had some fun with a CI pipeline using GitHub Actions and Codecov; then we started running tests inside threads and inside subprocesses, and, at least locally, we discovered that the coverage was going down (it would definitely also happen on the remote, but maybe I'll save that for another talk next year); and then we saw how to fix that coverage error using two particular solutions: editing the config file, or passing the concurrency option directly on the CLI if we are using multiprocessing as the library to spawn the subprocesses. And yes, that's it. Thank you. I know it was a bit rushed, because I couldn't figure out the echo issue at the start of the presentation, but I hope you got something out of it.