Hello everyone, my name is Radek Ankewicz and I'm from Poznań, a beautiful city in western Poland. My talk today will be about writing quality code.

First, a few words about me. I started programming in Python in 2007, and since then I've mostly been using it to develop web applications. I've had a chance to learn and use some of the most popular Python web frameworks, starting with Zope 2 back then. For many years I've been associated with STX Next, a Python software house in Poland, where I had a chance to take part in projects for the banking industry and to develop some governmental sites. Since the beginning of my adventure with Python, I have always tried to improve the quality of the code I write, and today I would like to share some thoughts and ideas about it.

To make this part of the presentation at least a bit interesting, here's a random fact about me: I once wrote an application that lets you write a decision-making algorithm that plays poker. I made it just for fun, for an internal coding contest in our company, but if you are interested you can take a look at my GitHub profile; the code is there.

Now the plan of my presentation. First, I'm going to try to define what code quality is. Then we will try to find out whether it is really important and really worth taking care of. Then we will try to answer whether there is any reasonable way of measuring the quality of code. In the main part of the presentation I will give you some advice and conclusions I have arrived at, and present some useful tools and other hints that will help you improve the quality of your code. At the end there should be a couple of minutes for questions and comments from the audience.

OK, so let's start with the definition of high-quality code. In fact, there is no explicit definition.
It is rather a subjective feeling, and different programmers may have different opinions on a given piece of code, based on their experience, the language they use or the conventions they follow. However, in his book Code Complete, Steve McConnell distinguishes between external and internal characteristics of software quality. External characteristics are those the end user is aware of, the ones a user of a software product can actually experience, such as correctness, usability or efficiency. Internal characteristics, on the other hand, are not visible to the end user, but they have a big impact on the development process. They include maintainability, flexibility and portability of code.

In my opinion, we could simplify this even further and say that good code can be characterized with just one word: good code should be understandable. That is because, according to my observations, understanding code is what developers spend most of their working time doing, and only a small part of their time goes into modifying existing code or writing new code. This chart is not the product of any scientific research; I took it from Jeff Edwards' blog post, but it perfectly describes my feeling about developing software.

OK, so now let's find out whether it is really important to care about code quality. Sometimes, under the pressure of a deadline or of a project manager, we tend to sacrifice the quality of the code we write. You should take into account that the technical debt you take on at an early stage of developing a software product will surely have to be paid back. Let's take a look at the chart that shows the relationship between code quality and the time spent developing a new feature.
In a project with high-quality code, the time needed to develop a new feature is much shorter than otherwise. So now we know that we should take care to write quality code, because bad code always costs something in the end.

Now let's find out whether we can actually measure the quality of code. As I said earlier, code quality is usually a subjective thing, and, speaking half-seriously, according to this funny comic you can tell bad code from good code just by your impressions when you read it. Fortunately, there are some metrics that let us assess the quality of code more objectively.

I will say a few words about them, starting with cyclomatic complexity, which can be defined as one plus the number of decision statements and Boolean expressions in a block of code. Here's a short reference table showing how different statements affect the cyclomatic complexity number, which is also called McCabe's number; you can find the full table online. There is an alternative definition of McCabe's number, which says that it is the number of linearly independent paths through a block of code. Let's see it in an example. We've got a function here which does something; it doesn't actually make any sense, it's just an example to show how to count McCabe's number. It is one plus all the decision statements, so with one, two, three decisions the number should be four. We can compare it with the number of paths, which is one, two, three and four, so the two definitions agree.

Another metric, or in fact a set of metrics, are the so-called Halstead metrics. They are calculated from the number of operators and operands in a block of code. Here's another example. We can count the operators, which stand for the actions applied to operands.
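Stepping back to the McCabe example for a moment: the "one plus the number of decisions" rule can be sketched with Python's `ast` module. This is a simplified rule set of my own (every `if`/`elif`, loop, `except` clause, conditional expression and comprehension clause, plus each extra operand of `and`/`or`, adds one), so real metric tools may count some constructs differently. The function `sign_label` is an invented example, not the one from the slide.

```python
import ast

# Simplified, assumed rule set: each of these nodes adds one decision point.
# `elif` appears in the tree as a nested ast.If, so it is counted too.
DECISION_NODES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While,
                  ast.ExceptHandler, ast.comprehension)


def cyclomatic_complexity(source: str) -> int:
    """Return 1 + the number of decision points found in `source`."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            complexity += 1
            if isinstance(node, ast.comprehension):
                # Each `if` filter in a comprehension is one more decision.
                complexity += len(node.ifs)
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` is one BoolOp with three values: two extra decisions.
            complexity += len(node.values) - 1
    return complexity


# Invented example with three decisions (if, elif, if), so the result is 4.
EXAMPLE = '''
def sign_label(n, verbose):
    if n > 0:
        label = "positive"
    elif n < 0:
        label = "negative"
    else:
        label = "zero"
    if verbose:
        label = label.upper()
    return label
'''

print(cyclomatic_complexity(EXAMPLE))  # → 4
```

The sketch measures the whole source string at once; a per-function report, like the one the tools described in the talk produce, would parse each function separately.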
Operators are logical and mathematical statements, anything that affects the flow of the algorithm. On the other hand, we've got the operands, which are usually the variables and constants used in the code. Based on their distinct and total counts, Halstead defined a set of metrics. I'm not going to go into the details of those formulas; I'm only mentioning them because they are commonly used by automated tools that measure code quality. The Halstead volume is also used to calculate another metric, called the maintainability index, which is based on the Halstead volume, the cyclomatic complexity and the number of lines of source code.

I would like to show you a tool that helps you calculate those metrics for your code. With a single command you can run it on your code, and it will print a rank for every method and function defined in the module you run it on. It ranks each one with a letter; in a minute I will tell you what the letters mean. In the same way you can calculate the maintainability index for a given module, which also gives a ranked result.

Here's the reference table for McCabe's number. Each letter is a rank that tells you how complex your code is. If your code is rated E or F, you should probably check whether there is any way to simplify it, because such code may turn out to be error-prone and unstable. The same goes for the MI results: if your module is ranked C, it will be quite difficult to maintain. Just out of curiosity, I ran the MI metric on some of the popular Python web frameworks, and the results are pretty good. Flask even got an A for all of its files, which is a good result.

Another useful tool I would like to mention here is Pylint, a static code analysis tool that helps you keep to the coding standard used in your project.
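Returning briefly to the Halstead metrics and the maintainability index: the sketch below approximates the operator and operand counts with the standard `tokenize` module, computes the Halstead volume V = N log2 n (N = total operators plus operands, n = distinct operators plus operands), and plugs it into one common variant of the maintainability index, MI = max(0, (171 - 5.2 ln V - 0.23 G - 16.2 ln L) * 100 / 171), where G is the cyclomatic complexity and L the number of source lines. Several things here are assumptions: the operator/operand split is simplified, the talk does not name the tool it shows, and the letter thresholds are the ones documented by Radon, which uses the same A-F and A-C ranks. Other tools may use different formulas and limits.

```python
import io
import keyword
import math
import tokenize

# Keywords that name values rather than actions; treated as operands here.
CONSTANT_KEYWORDS = {"True", "False", "None"}


def halstead_counts(source: str) -> tuple[int, int, int, int]:
    """Return (n1, n2, N1, N2): distinct/total operators and operands.

    Simplifying assumption: every OP token and every non-constant keyword
    is an operator; other names, numbers and strings are operands.
    """
    operators: list[str] = []
    operands: list[str] = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP:
            operators.append(tok.string)
        elif tok.type == tokenize.NAME:
            if keyword.iskeyword(tok.string) and tok.string not in CONSTANT_KEYWORDS:
                operators.append(tok.string)
            else:
                operands.append(tok.string)
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            operands.append(tok.string)
    return len(set(operators)), len(set(operands)), len(operators), len(operands)


def halstead_volume(source: str) -> float:
    """V = N * log2(n), with length N = N1 + N2 and vocabulary n = n1 + n2."""
    n1, n2, total_ops, total_opnds = halstead_counts(source)
    vocabulary = n1 + n2
    length = total_ops + total_opnds
    return length * math.log2(vocabulary) if vocabulary > 1 else 0.0


def maintainability_index(volume: float, complexity: int, sloc: int) -> float:
    """One common MI variant, rescaled to 0-100; tools differ in the details."""
    raw = (171 - 5.2 * math.log(max(volume, 1.0)) - 0.23 * complexity
           - 16.2 * math.log(max(sloc, 1)))
    return max(0.0, raw * 100 / 171)


def cc_rank(complexity: int) -> str:
    """Letter rank for McCabe's number (Radon's documented thresholds)."""
    for limit, rank in ((5, "A"), (10, "B"), (20, "C"), (30, "D"), (40, "E")):
        if complexity <= limit:
            return rank
    return "F"


def mi_rank(mi: float) -> str:
    """Letter rank for the maintainability index (Radon's thresholds)."""
    return "A" if mi >= 20 else "B" if mi >= 10 else "C"


snippet = "total = price * quantity + shipping\n"
volume = halstead_volume(snippet)
mi = maintainability_index(volume, complexity=1, sloc=1)
print(round(volume, 2), round(mi, 1), mi_rank(mi))  # → 19.65 90.8 A
```

Counting every parenthesis, colon and comma as an operator inflates the numbers compared with more careful tools, so the sketch is only useful for seeing how the pieces fit together, not for comparing scores across tools.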
Besides that, Pylint detects some errors that can be found by statically analyzing the code, so it is also useful as a refactoring aid. It is fully customizable and should be easy to integrate with the editor or IDE you use.

Here's an example of using Pylint: another piece of code with some flaws and defects. We've got some unused variables, and we are using a module that we haven't imported. And here's the result of running Pylint on this code. As you can see, it warns us about some code-style violations. It detected the error of using the name datetime, which is not imported anywhere in the module, and it gave us warnings about the unused variables. It also rates the code, and this piece of code was rated -8 on a scale up to 10, which means that it is not good code and should be refactored. I cut out only part of Pylint's output to fit it on the slide; it actually gives you more results. That's why I usually use this tool and would recommend that you do too. If you would like to learn about other tools that support checking the quality of Python code, I encourage you to attend a session by a colleague of mine tomorrow, at the same time, in Room Barrier 2.

Now let's think about how we can improve the quality of the code we write. I will try to give you some hints. In my opinion, the single most valuable practice in developing a software project is code review. First, it decreases the number of bugs, because when more people look at a piece of code, it is more likely that an error will be found. It also encourages you to write better code: when you write code knowing that someone whose opinion you care about will look at it, you program differently and try to avoid simple errors.
It also speeds up learning, because it enables the flow of knowledge between more and less experienced programmers. And it enhances the team culture, because it encourages constructive feedback, which is very important for programmers to grow in their jobs.

A few hints on how to do good code reviews. First, all code that is pushed to the master branch should be reviewed; you should never push anything that hasn't been reviewed. Second, you should use automated tools so that reviewers can focus only on the code itself; you won't have to care about style violations, because they can be found automatically. The third rule is that everyone should be equal: everyone should both make and receive code reviews. As for tools that support code reviews, there are plenty of them and most are really good, but for me the simplest one is sufficient: the inline comments available on most code hosting services, like GitHub or Bitbucket.

Readability has a great impact on how understandable code is, so we should also focus on writing readable code. Python was designed to let you write clean code, but it is just a tool, just the language you use, so you also have to take care yourself. I'll give you some common rules that will help you achieve it. First, try to stay consistent with the coding convention your project uses. It doesn't have to be PEP 8, but it is usually good to use it, simply because it is widely used across Python projects and it is probably a default option in your editor, which will help you stay consistent with PEP 8.

Naming variables is not as easy as it may seem. So let's take an example and try to find some common rules for naming variables.
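To make the naming rules discussed here concrete, here is a hypothetical example, not the one from the slide: the same function written twice, once breaking the rules and once following them. The `User` class and both function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Invented record type used only for this illustration."""
    user_id: int
    email: str
    is_active: bool  # positive name: `if not user.is_active` reads naturally


# Harder to read: `get` and `r` are too general, `x` says nothing,
# the parameter `list` shadows the built-in (as `id` or `object` would),
# and negating a negative flag produces a double negation.
def get(list):
    r = []
    for x in list:
        is_not_active = not x.is_active
        if not is_not_active:  # "not not active", i.e. "active"
            r.append(x.email)
    return r


# Easier to read: descriptive but not excessively long names
# (compare `emails_of_all_users_whose_accounts_are_currently_active`).
def active_user_emails(users):
    """Return the e-mail addresses of active users, in input order."""
    return [user.email for user in users if user.is_active]
```

Both functions return the same result; only the second one tells the reader what that result is without having to trace the loop.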
First, a name should be descriptive, so don't use one-character variable names and don't use overly general names. This one is really bad, because you are shadowing a Python built-in here, and I've seen many times people use object or id as a variable name, which can be quite dangerous. On the other hand, a variable name should not be too long, because, first, it is unreadable and, second, it is not convenient to use in your code when you have to type 50 characters. Here are some good names for the variable in this example; abbreviations are of course acceptable, as long as they are obvious enough. One more thing: you should also avoid negative Boolean names, because they go against natural human understanding. When you try to negate such a name, you get a double negation, which is not very readable.

A word about docstrings and comments. There is one general rule: docstrings should be valid and up to date, so whenever you change anything in the code, you have to check whether it has made the docstrings out of date. The same rule applies to comments, with one more rule: use them when you can't avoid complexity or obscurity in your code, and don't state the obvious, because that doesn't improve the readability of your code anyway. You should also keep your tests clean and take care of them, because messy tests can also decrease the maintainability of your project.

Besides that, you should put some individual effort into writing more readable code. You should know Python idioms; in this example, using Python's chained comparison is more readable than not using it. You should know the functions in the Python standard library, which will help you write more readable code, and you should know the Python syntax constructs that enhance the readability of your code.
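A small sketch of these idioms, with invented names: a chained comparison, a context manager that opens and closes HTML tags around whatever the caller adds, and a decorator that wraps a function's result in a tag. It is an illustration in the spirit of the slide, not a reconstruction of it.

```python
from contextlib import contextmanager
from functools import wraps
from html import escape


def is_valid_percentage(value: float) -> bool:
    # Chained comparison: reads like the maths, unlike `0 <= value and value <= 100`.
    return 0 <= value <= 100


@contextmanager
def tag(lines: list[str], name: str):
    """Append an opening tag, let the caller add content, then close it."""
    lines.append(f"<{name}>")
    try:
        yield
    finally:
        lines.append(f"</{name}>")


def wrapped_in(name: str):
    """Decorator factory: wrap the string a function returns in an HTML tag."""
    def decorator(func):
        @wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            return f"<{name}>{func(*args, **kwargs)}</{name}>"
        return wrapper
    return decorator


@wrapped_in("strong")
def greeting(user_name: str) -> str:
    return f"Hello, {escape(user_name)}!"


def render_page(user_name: str) -> str:
    lines: list[str] = []
    # Tags close in reverse order: </body> first, then </html>.
    with tag(lines, "html"), tag(lines, "body"):
        lines.append(greeting(user_name))
    return "\n".join(lines)


print(render_page("Ann"))
```

The user name goes through `html.escape` because the sketch builds HTML from plain strings; for real pages a template engine is the better tool.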
By the way, this is not a good way to generate HTML anyway, but I use it just to show that a context manager and a decorator make this code quite clean and readable.

Besides that, you should read some books, not only Python-specific ones but also language-agnostic books about writing clean code. There are plenty of them, so you should easily find one that suits you. You should read a lot of documentation: the Python standard library has great documentation with many examples, and it will surely help you learn how to use the standard library modules. And you should practice a lot.

I would like to show you one more thing: a site I sometimes use to challenge myself in Python. It is called checkio.org, and here is an example task about finding the most frequent letter in a piece of text. Here is the solution I came up with. It's not very readable, but at the time I thought it did what it had to, and I submitted it. Then I took a look at the list of players who had solved this task; the good news is that solutions can be rated and commented on. So let's take a look at the code in first place. Yes, you can solve this task with just two lines of Python code, which gave me a lot to think about regarding how I could improve the quality of my code.

OK, thank you for your attention; now it's time for questions, if you have any.

Questions? OK, so I have a question about docstrings. Which parts of code, in your opinion, deserve a docstring? Writing one for every function would be too much, I think.

It depends on the convention you use in your code. I must say that I try to keep a convention of having at least a short docstring for each function. And sometimes, I agree, they are not useful. It's up to you.
Pylint will tell you that you don't have a docstring defined for a function, but you can of course configure it to skip these messages. Any other questions? OK, thank you very much. OK, now we...