I'm Christian Rades, and I've been a professional PHP developer at Shopware for the last three years. I also live in rural Schöppingen, where Shopware is based, and it looks like this. So there's really not much going on: just meadows, some cows, a few trees, normal village stuff, more cows than people. That leaves two things I can talk about: trees and beer. Today it's going to be trees, maybe next time beer, so keep an eye out for that.

As the talk title said, today I'm going to talk a bit about forests and trees. Forests are a very cultivated thing nowadays. They might look rough and tumble, but in fact a lot of forests are maintained by humans going through them and taking care, like this guy. These people measure the trees and cut them down if they're rotten; they're caretakers. And the thing I want to drive home is that you are caretakers as well, for your software. You don't go out into the forest and cut down trees, but the metaphor still works, probably.

So what kinds of trees does your application contain? I've got a little example here, just a pseudo call trace. You have a controller at the checkout route. The checkout fetches some users from the database, fetches a cart from the database, calculates the bill, and then renders it. A normal call graph. And keeping your trees in order when they're small and well groomed is easy, right? An application that has already been well taken care of and was developed with things like clean code in mind is often easier to keep in that state. But most people who start to use things like static analysis encounter something more like this. And it's really hard to get order into that and to start changing it for the better.
Because for that you need something to go by: values, metrics, things like that. So, enough about the trees for now.

The thing I want to talk about first is readability, as a cornerstone of getting your trees in order. We all know about code style. It's important because readability is not only for the developers maintaining the code base and searching through it. It also shapes the first impression, how you feel about the code when it's properly indented, so you don't just open the code base and think, oh my God, what's happening here, it's not even formatted properly. You've got indentation, line breaks, maximum line width, the occasional stray space.

It's quite obvious, but it's hard to get right. In a code review, for example, you only have a limited amount of time to really look through the code with concentration. Every second you spend thinking about whether this space is one too many, or whether that line is indented improperly, you lose concentrated time for thinking about data flow and the general idea of what the pull request is trying to do. Style issues are boring. You don't want to bicker about how long lines should be, when to put a newline, or whether you should use comments.

The tooling for that is quite well explored. We have Easy Coding Standard, ECS. The nice thing about it is that it combines two tools. On the one hand you have PHP-CS-Fixer, which does a lot of the standard layout stuff like line width and spaces. On the other you have PHP_CodeSniffer, which also does some cleanliness checks. They work quite well in conjunction. You can also have some custom rules.
And you can just use ECS out of the box, because it already comes with an opinionated collection of lints and style formatters that can automatically bring your code in line.

So after this quick detour to readability, we're back to the trees, like here. Today I don't only want to talk about green trees, because PHP itself, the language, also has trees. That's actually what inspired the horrible pun that is my title: the abstract syntax tree. It's been in PHP since PHP 7. It was put in there by Nikita Popov, and it enabled the language to be more memory efficient and a bit quicker at compiling, and it also made the PHP runtime more maintainable.

I could show you a PHP syntax tree, but they are quite big. So let's start a bit smaller, like this, because as it turns out, math has a kind of syntax tree as well. You can parse a mathematical term like this one here as a tree. Why is this a good idea? Because if you just read this term left to right, you get the wrong answer. It's obviously not six times three. It's... I can't do math right now. It's been a long day of standing all the time; we don't have chairs at the Shopware stand outside. But the right answer is obviously 10. To get to this answer, the computer has to build a model of precedence, and you can represent this model with a tree. First you have to figure out what the arguments of your addition are, and to figure that out, you need the result of the multiplication. Then you can just collapse your tree and get the right answer.

To give you an idea of how big these abstract syntax trees in PHP can get, I've taken the Shopware 6 index.php and run a line count on it: 104 lines. That's not much code. Everybody can keep that in their head and reason about it.
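As a rough illustration of the precedence tree described above, here is a minimal sketch in PHP. The term itself is not named in the talk; `4 + 2 * 3` is an assumption that happens to match the numbers mentioned (read left to right it gives six times three, the tree gives 10). The nested-array node format and the `evaluate()` helper are made up for this example and are not how PHP represents its own AST.

```php
<?php
declare(strict_types=1);

// Assumed term: 4 + 2 * 3. The multiplication binds tighter,
// so it sits deeper in the tree than the addition.
$tree = ['+', 4, ['*', 2, 3]];

/**
 * Collapse a tree bottom-up: leaves are integers,
 * inner nodes are [operator, left, right].
 *
 * @param int|array $node
 */
function evaluate($node): int
{
    if (is_int($node)) {
        return $node;
    }

    [$operator, $left, $right] = $node;
    $l = evaluate($left);   // resolve the children first ...
    $r = evaluate($right);

    switch ($operator) {    // ... then apply the operator to their results
        case '+':
            return $l + $r;
        case '*':
            return $l * $r;
    }

    throw new InvalidArgumentException("Unknown operator: $operator");
}

$result = evaluate($tree);                        // tree with correct precedence
$leftToRight = evaluate(['*', ['+', 4, 2], 3]);   // the naive left-to-right reading
```

Swapping which operation sits deeper in the tree is all it takes to model the wrong reading, which is why the tree shape, not the text order, carries the precedence.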
But if I run the PHP parser on it and output the AST as JSON, suddenly it's about 35 times the volume. You have no chance of thinking that through manually; just reading through the AST won't happen. So I want to show you two tools that help you with that.

The first one is the somewhat older tool. It's called PHPStan. It basically does the job of looking through the syntax tree, finding weird parts, maybe erroneous parts, and reporting them to you. One of the main advantages of PHPStan is its easy configuration. It has nine levels, and all you need to do is decide which level to take. Level zero is the most permissive, so it only finds things like syntax errors. Level eight is the most restrictive, so it looks into things like types, dead code, and logical conditions. To run it, you just give it the level and point it at your source directory, and off you go: PHPStan will look through your code.

It's not that slow either. Lots of people may be thinking, oh, static analysis, that will take quite a while. I think we run our roughly 3,000 files in five minutes, maybe ten. That is obviously a bit much for day-to-day development, like running it continuously after every code change, but it's good enough to run per merge request. That way you know whether you're building errors into your software that don't need to be there.

The other really cool thing about PHPStan is that you can define custom rules. These custom rules are just PHP files implementing some methods that PHPStan calls to check the tree with custom information. For example, at Shopware we're more of a vendor kind of software, so we have to take care that our code is usable for third-party developers. We need to make clear to our own developers, and to people who might work on it later, that we want clear interfaces, for example through decoration.
So we implemented an annotation called @Decoratable. It makes sure that a class marked with @Decoratable is properly decoratable through Symfony services, and that we don't accidentally break this decoratability with some later change. We don't get direct feedback about things like this, because we don't decorate our services ourselves. Other people do, and we don't have their code bases. So we would just break code somewhere else and not know about it.

So we implemented a few checks that all fit into this "this class is decoratable" category. Obviously the class needs to implement an interface, so we've written a rule that checks whether it implements any interface. Then we have a rule that says it must not add another public method, because you can't decorate methods that aren't in the interface. Stuff like that, such as the class not using any of its own methods. And this is why we're using PHPStan.

Some people think you only need to run, or only can run, one static analysis tool, and that if you run more than one, they will just overlap. But as it turns out, with Psalm you still get added benefits from running both tools at once. What's so awesome about Psalm, which by the way is developed by Vimeo, is that it has an extended type system. It puts a lot of effort into building additional typing into PHP, even things like pseudo generics with its templating mechanism.

The type system adds other things as well, for example union types. They exist in the type annotations that are pretty common in PHP, where you annotate types in a doc comment instead of putting them directly in the function signature. And for the weird functions, like the string functions that may return an integer such as zero, or false, you can even use literal values as types, like false directly.
So the return type is not "int or bool", which would allow true as well as false; it is exactly int or false. With that annotation you can find errors further down the line, where maybe somebody casts the result of this operation down to a bool. If you cast it to a bool somewhere, maybe to fit another function signature that expects booleans, it was never going to be the literal true, and a match at position zero even turns into false. So Psalm is able to find that code.

The other thing is that it adds typed arrays, something we cannot have in the PHP core itself because it would be too slow. Arrays in PHP are quite pliable. You can throw a lot of stuff into them, and if you wanted them type checked at runtime, you would have to run the check on every single item, on every single array operation, and that would just be too slow. So we do it in a kind of compile step, the static analysis that Psalm does. As you can see, we can define not only the value type of the array; if it's an associative array, you can also define the key type. If we define the key type as int, it's like a normal non-associative array. By now I think PHPStan supports the same notation. So PHPStan and Psalm work in conjunction here, and if you write it in the Psalm notation, PHPStan will most likely also help you find bugs caused by typing mistakes.

The next thing, which is quite special to Psalm, is object-like arrays. Psalm is, to my knowledge, the only tool that does this: you can even type specific keys in your associative arrays. So what looks like a JSON object basically guarantees that whatever is written into the variable v here has to contain an item under the key value and an item under the key name. This is of course useful for APIs and the like, because you can write assertions for it, and Psalm knows about those as well. You can say: this asserts that the array has a key-value pair with the key value and an object of type Foo.
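The slides are not part of this transcript, so here is a minimal sketch of what such annotations can look like. The class `Foo` and the keys `value` and `name` come from the talk; the function names `joinNames`, `assertPayload`, and `describePayload` are hypothetical, and the string type for `name` is an assumption. The annotations only matter to the static analyser; at runtime this is plain PHP.

```php
<?php
declare(strict_types=1);

final class Foo
{
}

/**
 * Typed array: integer keys and string values,
 * i.e. an ordinary non-associative array of strings.
 *
 * @param array<int, string> $names
 */
function joinNames(array $names): string
{
    return implode(', ', $names);
}

/**
 * Hypothetical input validator. The assertion annotation tells the
 * analyser that $v has the given shape once this function returns.
 *
 * @param array<array-key, mixed> $v
 * @psalm-assert array{value: Foo, name: string} $v
 */
function assertPayload(array $v): void
{
    if (!isset($v['value'], $v['name'])
        || !$v['value'] instanceof Foo
        || !is_string($v['name'])
    ) {
        throw new InvalidArgumentException('Expected a Foo under "value" and a string under "name".');
    }
}

/**
 * @param array<array-key, mixed> $input
 */
function describePayload(array $input): string
{
    assertPayload($input);

    // After the assertion, the analyser can treat $input['name'] as a string.
    return $input['name'];
}
```

Calling `describePayload(['value' => new Foo(), 'name' => 'demo'])` returns the name, while an array missing either key fails fast in the validator instead of somewhere deeper in the code.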
Once you've run this assertion on your input data, all the code that comes after it is treated as if this were a valid assumption. So you can find even more bugs, and you are basically forced by Psalm to have proper input validation.

I can talk all day about language quirks, but the question that comes up is: why even care about type safety? And excuse me for all the drinking; my mouth is quite dry today. But have a look at this. This is a real-life example that we found in our code. It's quite small. Somebody wanted to sort an array, so they gave the sort function an anonymous function that sorts by the value of type. And this looks correct, right? You have a greater-than in there. But once we ran Psalm on it, it said InvalidScalarArgument. In this case I reproduced it, that's why it's in a main.php and right at the top of the file. Psalm found out that the callable you pass must return an integer, but the callable we had here returns a bool.

What does that mean? It means that usort under the hood will cast your bool to an integer. You might say, so why should I care? The thing is, the return value doesn't only represent whether an item is larger than another one. It represents three cases: an item is larger than the other, an item is equal, or an item is smaller. So what you need to do is produce a value of minus one, zero, or one, so that usort knows how the two items relate to each other. The PHP manual even says that if two members compare as equal, their ordering is not guaranteed. So it might work for years, but suddenly a little change that shuffles the order of the array before it is sorted might change the order after it is sorted. That is a total no-go, and it happens because in this case PHP cannot tell whether A is smaller than B or equal to B. The fix is quite easy.
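The original snippet is not in the transcript, so the following is a reconstruction under assumptions: the array contents and the `label` key are made up, only the `type` key and the greater-than comparison come from the talk. It shows the problematic callback as a comment and the version using the spaceship operator.

```php
<?php
declare(strict_types=1);

// Made-up sample data; only the 'type' key is taken from the talk.
$items = [
    ['type' => 3, 'label' => 'c'],
    ['type' => 1, 'label' => 'a'],
    ['type' => 2, 'label' => 'b'],
];

// Problematic version: the comparison yields a bool, which ends up as
// the integer 0 or 1, so "smaller" and "equal" become indistinguishable.
// usort($items, fn (array $a, array $b) => $a['type'] > $b['type']);

// Fixed version: the spaceship operator yields -1, 0, or 1.
usort($items, fn (array $a, array $b): int => $a['type'] <=> $b['type']);

$labels = array_column($items, 'label');
```

Declaring the `int` return type on the callback also lets the analyser confirm that it matches what the sort function expects.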
You just have to swap out the operator; PHP even provides one for that. This was just to show you that type safety really is something you should care about.

But there's another thing I want to show you, and that is metrics. We can try to satisfy tooling all we want, but we might still want an easily parsable quality measure, a way to know whether our software is going in the right direction, whether it becomes more maintainable, not less. People have to measure trees to know if they are sick.

The first tool I have here for you is Deptrac. Deptrac is configured through a dependency file. This again is a real-world example, from a project I worked on about two years ago. It was quite the legacy project, and we tried to refactor it and make it more maintainable so we could put it on the back burner a bit. First of all, you tell Deptrac where to find the software: our components are in our component folder. And we exclude our test files and their relationships, because tests are tests. They can do whatever they want as long as they make sure the software works correctly.

Then comes the cool part. You can customize what constitutes a module, a dependency, for Deptrac. If you plot out every class and its relationships to every other class, you just get a giant graph with something like a million lines, and you can't tell anything apart very well. This problem actually has a name. It's called the big ball of mud, because you arrange your classes in a circle and then draw their relationships as lines across the circle, and for most software projects of medium size it just turns into one big black circle where you can't tell anything apart. So we define our layers, for example a CDN layer, and the collector is what actually assigns the classes to the respective modules. In this case, we just go by the class names.
Here you have a regex, and this regex is just checked against the fully qualified class name. This is quite important, because maybe you want modules that span several namespaces, or that only share a common ancestor somewhere further down, so you can specify whole parts of the namespace there.

Then you define your rule set, because the nice thing about Deptrac is that you can teach it which kinds of dependencies you want to avoid. After all, the reason we're doing this is to figure out where we make mistakes. In this case, we have two components that really contain cross-cutting concerns. They're called SDK and common; things like logging and messaging are what you'd find in these two modules. We basically tell Deptrac that every module is allowed to depend on these two cross-cutting concerns, but not on anything else.

It took, I don't know, an afternoon to get working, and after we ran it, we found this. We don't have that many modules, but keep in mind that each of these modules itself contains tons and tons of classes. The updata, for instance, was the main component, and the other components depended on it, for example the indexer here and what we call the social network.

The problem that became obvious here was this. You may know the distributed monolith, which is badly architected microservices, where every microservice knows about every other microservice and constantly works with everything else, and there is no clear path for your data to take. We had something similar inside a monolith. We had components that wanted to be modularized quite strongly, but they weren't, because at some point some developer, probably even inadvertently, built up dependencies, because it was just the quick way to go.
So we came up with the idea to just use XML-RPC, because we already used XML-RPC in that project for some other things: take all these dependencies and move them behind an RPC interface, so that the modules don't have to know about each other anymore. After the first, I think, three weeks to a month of that, we were at the stage where we had reduced the wrong dependencies by about 40. Then, after about two more months, and we put quite a lot of work into this, we got this, which was perfectly fine. Now it's easy to see that the SDK and the common bundle are cross-cutting concerns. They form a layer below the rest. So this is what simple dependency tracking can do for you: it makes obvious where the hotspots are that you want to refactor.

The next tool is PhpMetrics. PhpMetrics has quite a number of features, as you can see from the dots on there. It can output totally simple things like lines of code. I did that earlier with a 30-year-old Unix command, so that's nothing special. Then you have things like cyclomatic complexity.

If you don't know cyclomatic complexity: it basically models, statically, how many paths there are through a function. A function where nothing happens, where data goes in and data might come out, has a cyclomatic complexity of one. Easy. But when you introduce an if, suddenly there are two ways your data can go through the function. It might go into that if or it might not, depending on a condition, so your cyclomatic complexity increases. When you've got a loop in there, you have even more possibilities: the loop might not run even once, or it might terminate somewhere in between. And if you put ifs in loops, and loops in loops, and so on, cyclomatic complexity balloons.
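To make the counting concrete, here is a small sketch with three made-up functions. The numbers in the comments follow the usual rule of one plus the number of decision points; individual tools may count some constructs slightly differently, so treat them as illustrative.

```php
<?php
declare(strict_types=1);

// Complexity 1: one straight path, no decisions.
function twice(int $x): int
{
    return $x * 2;
}

// Complexity 2: the if adds one decision point.
function clampToZero(int $x): int
{
    if ($x < 0) {
        return 0;
    }

    return $x;
}

// Complexity 3: the loop and the nested if each add one decision point.
/** @param array<int, int> $values */
function countPositive(array $values): int
{
    $count = 0;
    foreach ($values as $value) {
        if ($value > 0) {
            $count++;
        }
    }

    return $count;
}
```

Each additional decision point is another path a test would need to exercise, which is why the number doubles as a hint for how many test cases a function deserves.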
So it is a measure of how complicated code is to understand, because after all, we're not computers. When we read this code, we might not spot all the ways in which this function can behave. So it's a good thing to keep track of. I can tell you that if you use test-driven development, cyclomatic complexity tends to stay quite low by default. But if you're not using test-driven development, which is basically most of us, myself included, it's a good indicator that a function should be tested more intensely. If your function has a high cyclomatic complexity of 10, 15, or even 20, is maybe 100 lines long, and you've only got one test for it, that test had better be gigantic to cover all the different cases in that function, or you have a problem, because there might be a code path that nobody knows about.

What's also pretty neat is the distribution of lines of code. You get a nice graph with percentiles: say, the 50th percentile of your classes is below 50 lines of code, and the 95th percentile is around 400. I got these numbers from Twig; I had a bit of spare time last week and ran it on Twig. It's quite fast, by the way. Twig is not that big, but also not small, and it took about 10 seconds to run through it. As far as I know, this does not even build an abstract syntax tree. It really only looks at the code as text.

Then you've got quite the curious measure here: the average bugs per class. A proper question would be: how do you know how many bugs are in the code if you don't run it, don't test it, and just look at it as text? This was the idea of a computer scientist some 40 years ago. He basically looked at how many unique operators you have in your code, things like addition, function calls, return statements, and how many unique operands you have.
Operands are things like variables, all kinds of variables, static variables, and so on. From these measures you can calculate how complex a piece of software is, and from that complexity you can infer, based on experimental results, how probable it is that your code contains a bug. This is of course a bit of divination. It's a pretty general measure, so even very good code bases will have an average bugs per class greater than zero. But it can give you a good indication of how probable it is that your code does something you don't expect.

Then you've got afferent coupling and efferent coupling. I hope I said that right. I always get them mixed up, and for a non-native speaker afferent and efferent are not the easiest to say either. Afferent coupling is basically inward coupling. If you have a class and other classes in your module depend on it, that is afferent coupling. It basically means the class is used a lot in your code base. On the other hand you have efferent coupling, also called fan-out coupling. That is how many other classes your class uses.

The point of these two is that they indicate how fragile your class might be. If you depend on a lot of primitives, classes, objects, structs from another software bundle, your class might break, because dependencies can change. And when they change and they are used in a lot of different places, those places might break. There's a measure for instability, which is basically efferent coupling divided by afferent coupling plus efferent coupling, so efferent divided by the total coupling. This means that a class which is more or less just a facade over another piece of software is not hit as hard by the coupling effect as a class that is more of a user, one that actually works with those other primitives rather than just hiding them.
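The instability measure described above can be written down in a few lines. This is a minimal sketch; the function name `instability` is made up here, and returning zero for a class with no coupling at all is a convention chosen for the example, not something the talk specifies.

```php
<?php
declare(strict_types=1);

/**
 * Instability = efferent / (afferent + efferent).
 *
 * @param int $afferent number of classes that depend on this class (inward)
 * @param int $efferent number of classes this class depends on (fan-out)
 */
function instability(int $afferent, int $efferent): float
{
    $total = $afferent + $efferent;
    if ($total === 0) {
        // Assumption for this sketch: a class without any coupling counts as stable.
        return 0.0;
    }

    return $efferent / $total;
}

// A class used by three others that itself uses one class.
$widelyUsed = instability(3, 1);

// A class nobody depends on, but which uses four others.
$pureUser = instability(0, 4);
```

A value near zero means many classes lean on this one while it leans on little itself; a value near one means it mostly consumes other classes and is the first to feel their changes.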
And last but not least, it can output the class relationships I just talked about with the big ball of mud, the graphs that are not so easy to read. PhpMetrics has the nice benefit of outputting an HTML site as an artifact, so you can click around in it and have a look. It uses some JavaScript to keep all the graphs readable; they highlight things, they link to files.

It can also tell you about maintainability. As I told you, I've run this on Twig. You can see lots of circles here, and each of these circles is a class. The size of a circle is basically an indicator of how long the class is. So tiny classes will be somewhere here, large classes will be on the outside, and the color represents their maintainability, which is an aggregated score of all the metrics I've shown you before. Classes like the ones on top here, very cherry red and quite big, are probable breaking points in your application, should you want to change something or should any of your dependencies decide to change something. That is quite common, because after all your code is used by people, and they may have changing requirements. You might also find security bugs, and at that point "we just won't add another feature" is no longer a valid way to keep your code stable, because you've got to fix security issues.

Although it is nice that PhpMetrics generates a ton of different values and graphs and even a whole HTML page, you want something that's easier to digest. That's why I want to recommend PHP Insights to you. PHP Insights is a bit young, and it's very opinionated. You definitely have to configure it if you run it, because it disallows certain things. I think it says no comments in code under any circumstances, which is not that good an idea.
You shouldn't just write out everything your method does inside the method, but at some points maybe you're using something magic and you want to document that in the code for the next person, because it will be useful.

This is its output. I've just run it on a plugin I'm currently developing. The really great thing, and why I like it, is that it's a CLI app. For developers that's quite natural: you're running your tests, so you might as well run PHP Insights. This might say more about me than about the tool, but I really like that it has color-coded output. Not only does it look kind of pretty for a terminal app, it's also rewarding to have a tool tell you, hey, you did well. That's not to be underestimated, because you want to feel good about your code, and this helps you.

It runs some general analysis, like how many comments and classes you have and how big your functions are, and breaks that down. Then you have a complexity measure, which is basically cyclomatic complexity, as I explained. And there's one score called architecture. That's maybe a bit of a controversial misnomer, because architecture is more about how people envision the software, how it's modularized, and how its parts depend on each other, and this does not check that. So architecture might be more aptly called clean code. It checks that your interfaces are not too big. It checks that you don't have functions that take a million parameters, things like that.

What I'm not showing you here, but which is pretty neat: the issues it finds are shown in the CLI with a bit of interaction, so you can scroll through them one by one. They are a mixture of Slevomat-style recommendations and concrete findings, such as a function header that is too big, or a gigantic cyclomatic complexity somewhere in a function, say a foreach inside a while with an if as well.
So these are the general tools I wanted to show you, the tools we can use to keep our code clean. But here's the thing: we're software developers, not gardeners. So this doesn't cut it. We need automation. We need the really big guns.

For that I recommend CI. I think it's well adopted by now, and a lot of you will use some form of CI, maybe nightlies, maybe continuously. But just because your checks run on a server somewhere doesn't mean you shouldn't also run them locally. Make these tools available locally, because your CI will always be slower than running them locally, simply because you've got limited resources on your CI server. I've seen it myself at Shopware: we have about 50 people committing to a project, and the CI server almost bursts into flames, because all these jobs get queued and queued and queued. Then maybe somebody finds another bug, does another rebase or something, and every time the workload increases. And every time the CI fails, you will 100% get another task for the CI, because the developer fixed it.

This means that a task might take quite a while to even get scheduled. You push your code to the repository, you open a merge request, and suddenly you need to wait half an hour for the result of your static analysis and tests. And when they fail, that's the moment where people say, oh, this is just shit, I'm really down, I'm really angry, because you've lost a lot of time and you expected it to run successfully. So the key takeaway I want you to take is that you need to keep the friction low. You need to lubricate it. And what better way to do that than making your tools available locally?
But in PHP, if you go about adding these tools naively, you might put them in your Composer dependencies or force people to install them globally. Installing them globally is a bit bad, because you have to tell people to do it. It's hard for them to find out by themselves; it needs to be documented somewhere. To keep a separation but still have it work automatically, I can warmly recommend the bamarni Composer bin plugin. It's quite a neat invention. You install this plugin in your composer.json, and then the composer bin command lets you have namespaces for your Composer dependencies. You're basically building tiny Composer projects automatically, one in each of these namespaces.

In this case, I have a phpstan namespace and require PHPStan in it. That's all. The Composer bin plugin will then automatically create a new composer.json in the phpstan folder, or more precisely in the vendor-bin/phpstan folder, and manage the dependencies there. So you no longer have a problem with conflicting dependencies, because your CLI tools might update at different rates. When a very common dependency like Symfony's console component updates, your tools might break, because they all want a different version of Symfony. Some updated, some didn't; maybe a project has gone into a bit of a hiatus. So you want that encapsulated, and the bin plugin helps you with that.

The result of running these commands is just this file tree. You have a vendor-bin folder, and in it you find your namespaces, each with its composer.json and composer.lock, and each gets its own vendor folder with the fetched dependencies. You can also run things like composer bin all. So if you want to update your tooling, you just have to run one command to update all your tools at once.
Now you have tools spread across the file system, but what makes this really good is that the Composer plugin does another neat thing. It symlinks the executables it finds in each of these vendor-bin folders into your main vendor/bin. So to the user it looks like these are dependencies of your project, used like vendor/bin/phpstan, but in reality they're encapsulated, and you're not in dependency hell with these applications.

Next thing. Having encapsulated tooling that comes with your repository is very nice, but you've still got to make it a bit easier. Most people don't want to run five commands to get everything installed. I prefer a Makefile, for example. Make is an old and battle-proven tool. The syntax is admittedly a bit weird, but with a Makefile you can encapsulate even your commands: one target runs the Easy Coding Standard check as a dry run, and another target just appends the fix option. You might say, oh great, this saves you five characters, but it adds up and speeds you up. The same goes for static analysis: you can just add more and more tools, and they run the same way they run in your CI. With things like these pipes you can also require commands to run in order, so it will automatically run a composer install before it runs your tools if it doesn't find a composer.lock file or the like.

Of course there are alternatives. You might prefer writing plain shell scripts, which is totally fine as well and also more compatible with most common CIs. Or even PHP.
I have a colleague who basically wrote a whole build tool in PHP to run shell scripts in a more organized fashion. And there are thousands of other ways. You could use whatever kind of make you like; I guess CMake would be a possibility if you're into that. Or more outlandish stuff like writing your own tool, your own application, maybe in JavaScript for all I care; you might put it into npm. The most important part is that it's easy for your developers to use your tooling.

And with that I want to thank you all for attending my talk, and I wish you all a good day.