Yes, so my name is Witold and I want to share some stories about how to work on code that can, well, drink legally even in the United States. So it's quite old. So what is old code? Imagine you're a software engineer in the mid-90s in Poland and your boss asks you to write a phone book application, a simple one. A phone number in Poland is nine digits long: a two-digit area code and a seven-digit subscriber number. Easy, right? Your task is: given a number, return the area the number is from. So you write a simple function that checks if the number is of the correct length, checks that it really is a number, and does a simple table lookup. But then GSM comes, and the prefix for GSM operators is three digits long, no longer two. So you need to add some more code to deal with the special cases. Then GSM is a hit, so they add more prefixes. Then the regulator of the market decides that you can switch between operators without changing your number, so you need to add a database lookup. This code was okay at the beginning, but then you add layer after layer after layer, and you end up with something that's not really nice, because it's doing what it was never originally designed to do.

So it's not that the code is badly written. It's just filled with lots of technical debt, because people asked you to add features, not to fix the code. It's usually not testable, because it has so many special cases and the like that it's practically impossible to test. And it works. It performs well, and, you know, if it ain't broke, don't fix it. So it's hard to tell your manager: I have to spend three months refactoring this piece of code, even though I won't fix any bugs and it won't perform better, but I have to. And the thing is, it won't fix itself. If you have code that's complex, old code, it will only get worse with time. It will never fix itself. You have to spend time fixing it.
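The evolution described above might look something like this minimal sketch. Everything here is hypothetical for illustration: the function name, the prefix table, and the choice of mobile prefixes are invented, not the actual application.

```c
#include <stddef.h>
#include <string.h>
#include <ctype.h>

/* Hypothetical area-code table: two-digit prefix -> area name. */
struct area { const char *prefix; const char *name; };

static const struct area areas[] = {
    { "22", "Warsaw" },
    { "12", "Krakow" },
};

/* The original simple version: validate, then do a table lookup --
 * with the later "GSM" special case already bolted on. */
const char *
area_from_number(const char *number) {
    size_t i;

    if (strlen(number) != 9) {                /* correct length? */
        return NULL;
    }
    for (i = 0; number[i] != '\0'; i++) {     /* all digits? */
        if (!isdigit((unsigned char)number[i])) {
            return NULL;
        }
    }
    /* Special case added later: hypothetical mobile prefixes.
     * Later still, number portability would force a database
     * lookup here instead of a prefix check. */
    if (number[0] == '5' || number[0] == '6' || number[0] == '7') {
        return "mobile";
    }
    for (i = 0; i < sizeof(areas) / sizeof(areas[0]); i++) {
        if (strncmp(number, areas[i].prefix, 2) == 0) {
            return areas[i].name;
        }
    }
    return NULL;
}
```

Each new rule lands as another early-return or branch inside the same function, which is exactly how the layers accumulate.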
The other problem is that it's hard to understand. If you have a function that's really complex, it's really hard to get into it, and that creates a high barrier to entry. When I first started working on BIND a few years ago, my manager told me he didn't expect me to do any real work for the first couple of months, because that's how long it would take me to get accustomed to the code. I could look at some bugs, try to fix something, but if I didn't do anything productive for the first couple of months, that was okay, because that's how it works.

So, BIND 9, first commit imported from CVS. Can anyone who doesn't know it guess the date of the first commit in BIND 9? Anyone? Oh, come on. It's August 1998. So this August it will be 21 years old — it will be able to drink legally in the US. It was a replacement for BIND 8, which was the buggy Internet name daemon, because of, well, tons of bugs. And BIND 9 was written with a principle of design by contract: because BIND 8 was so buggy, every function always verifies all of its inputs, and if something is wrong, you bail. So there were never any exploits for BIND 9, but there were bugs that caused denial of service, because basically if we hit a problem, we bail, we shut down.

And just to give you a sense of the age: BIND 9 is older than The Matrix. Older than the Agile Manifesto, which was created in 2001. Older than test-driven development — well, the rediscovery of test-driven development, because there is a paper from around the 1950s on how to write code that is basically test-driven development, but somehow people forgot about it for 50 years. Older than Linux 2.2, from 1999 — not to mention 2.6, with modern threading, from 2003.

So BIND is like an onion: it has layers, and if you open it, it makes you cry. But BIND is a reference implementation of DNS. Well, it used to be, and it's still used that way — are there any DNS vendors here who haven't used BIND as a reference implementation of DNS?
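Design by contract in C looks roughly like this. BIND's real macros are REQUIRE(), ENSURE(), and INSIST() from its internal assertion headers; the macro below is a simplified, hand-rolled sketch, not BIND's actual implementation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified design-by-contract macro: if a caller violates the
 * contract, bail out immediately rather than limp on in a bad
 * state. Bailing this way is why contract violations surface as
 * a shutdown (denial of service) rather than an exploit. */
#define REQUIRE(cond)                                            \
    do {                                                         \
        if (!(cond)) {                                           \
            fprintf(stderr, "contract failed: %s\n", #cond);     \
            abort();                                             \
        }                                                        \
    } while (0)

/* Hypothetical function: verify every input at the top. */
static size_t
name_length(const char *name) {
    REQUIRE(name != NULL);
    return strlen(name);
}
```

The trade-off is deliberate: a violated contract crashes the process instead of letting corrupted state propagate.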
If you don't know how to do something, you check what BIND does and you do it that way. The other thing is: if there is an RFC, BIND implements it. Well, no longer — at some point we decided it doesn't make sense to put everything in — but we still had support for authoritative servers on dial-up connections in 2018, 2019. And also, we are a small team, like virtually any open source project, and because of the barrier to entry, because the code is so complex, we don't have many external contributors. We have some, but not that many. And basically that's how old and complex code is created: you never have time to fix the issues you have, because you have to fix bugs, you have to add new features, and there is never a moment when you can say, stop, we have to halt development for a year and fix the code.

Okay, so let's do science — I'm a mathematician by education, so let's do science. How do you define which code is complex? Well, there was a guy called Thomas McCabe, and in 1976 he published a paper in which he defined something called cyclomatic complexity. That's the number of linearly independent paths through a program's or a function's source code. So if you have a function that just prints "hello world", that's one. If you have: if the input is odd, print "hello world", else print "goodbye cruel world" — that's two. The more paths execution can take through the code, the larger the McCabe cyclomatic complexity number, and the more complex the code, the harder it is to understand and the harder it becomes to test. There's a rule of thumb that I found in many, many papers: if the complexity of a function is below 10, it's okay; between 10 and 20 is worrying; above 20 is bad and you basically have to refactor it; and above 40 is horrible. Okay, so let's do applied science.
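The two examples above, written out as C functions — one path versus two:

```c
/* One linearly independent path through the code:
 * cyclomatic complexity 1. */
static const char *
hello(void) {
    return "hello world";
}

/* One if/else, so two possible paths:
 * cyclomatic complexity 2. */
static const char *
hello_by_parity(int input) {
    if (input % 2 != 0) {
        return "hello world";
    } else {
        return "goodbye cruel world";
    }
}
```

Every additional if, loop, case label, or && / || condition adds another path, so the number grows quickly in branch-heavy code.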
There is a tool in Ubuntu, and I think in EPEL, called pmccabe: you give it source code and it prints the complexity and the number of lines of code for each function. I ran it on the BIND source code, and in a nice file called bin/named/query.c I found a function called query_find. Now, that's an interesting function that basically does everything. It has 2,500 lines of code and a complexity of 474. It's a horror. And I don't think pmccabe even captures all of it: there were lots of gotos going backwards, gotos jumping into switch statements. It was a horror. It wasn't always that bad — it started at about 100, which by 1998 standards was okay. But still, 474. The whole DNS camel was in this function: if you had DNS64, it went into query_find; filter-aaaa, it went into query_find; everything was there. So, you know, hold my beer.

Disclaimer: I'm not saying that what I did with query_find is perfect, not even that it's good. What I'm going to say might sound obvious, but if I had seen this presentation before I started, I would have saved a month of work. You know, wise people learn from their own mistakes; smart people learn from, well, my mistakes. So this is a collection of my mistakes with this code.

So how do you start? Read the code. 2,500 lines. Then read the code. Then read the code again. You won't understand all the flows through the function — it's impossible; that's what McCabe said. If complexity is above 20, it's impossible to comprehend all the flows; if it's 474, it's really impossible. But for each piece of the function, you need to know, at least vaguely, what it does. Then comes the crisis: you want to just rewrite it from scratch. But can you guarantee it will perform the same way it did before? No. Can you guarantee the behavior won't change, given that your task is to refactor it, not to change behavior? How many new bugs will you introduce?
Do you have enough tests to verify that the code you wrote from scratch works? Do you have the budget to do it? Because it will really take a long time. So how do you make progress instead? You cut the function into smaller pieces. You take something that looks like a separate function, and you move it into an actual separate function. Optionally, you create a state — well, we are talking about C, so you create a structure with all the state that can be passed between functions. As an example, here's a piece of BIND 9.11 code, with some comments removed just to make it clearer, and here's BIND 9.12, before and after refactoring. That's a piece of query_find, and that's the beginning of the separate function that replaced this piece of code. You can see nothing changed except the added state: that's the query context, the structure that gets passed between the calls.

There are four rules you have to follow if you want to do this correctly and not waste time like I did. First: stick to your job. Your job is not to optimize the code. Your job is not to fix bugs — even if you find a horrible bug, leave it. Your job is not to rewrite pieces because "oh, this looks bad, I could rewrite it". No, it's not. Your job is to refactor the code by cutting it: cutting out code, putting it into a separate function, and optionally threading through the state object. I know it's tempting, but just don't do anything else. You will regret it later. Make comments, write bug reports, put post-it notes on your monitor about things that could be fixed — but don't fix them at this stage.

Second: don't be smart. If a code block looks similar to a code block you cut out earlier, you don't care; just make another function. There will be time for that later. Otherwise it turns out there was one subtle difference in an iterator, and nothing works, and you don't know why, because you thought you had two identical pieces of code — but they were not the same.
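The cut-and-move with a state object, in miniature. The struct, its fields, and the helper below are invented for illustration; the real BIND 9.12 structure is the query context mentioned above, which carries far more state.

```c
#include <stdbool.h>

/* Invented miniature "query context": the local state the big
 * function used to keep in local variables, now in a structure
 * that can be passed between the pieces that were cut out. */
typedef struct query_ctx {
    int restarts;   /* how many times the lookup restarted */
    int rcode;      /* response code to send back */
    bool is_zone;   /* answering from an authoritative zone? */
} query_ctx_t;

/* Before: this logic was an inline block inside one huge
 * function. After: the block is moved, as close to verbatim as
 * possible, into its own function; the only change is that it
 * reads and writes its state through the context pointer. */
static void
qctx_handle_restart(query_ctx_t *qctx) {
    qctx->restarts++;
    if (qctx->restarts > 10) {   /* hypothetical restart limit */
        qctx->rcode = 2;         /* 2 == SERVFAIL */
    }
}
```

Because nothing changes except where the state lives, each cut is easy to review and easy to revert if a test breaks.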
If a function can be simplified, don't do it — it's not the time to simplify functions. If some code is not reachable, still copy it and make a function out of it; there will be time to deal with that. Basically, be a dumb code-cutting monkey. Nothing more at this stage.

The third rule is: work slowly. You might miss something that's really important. For example, you might not notice that the piece of code you just cut out has some effect on the global state of the function — that it modifies something — and only realize it after three days of cutting, when suddenly DNS64 stops working and you have no idea why. Then you spend one more day trying to figure out why DNS64 isn't working, and then you have to scrap everything and start from the beginning. So: cut one piece, commit, compile, run all possible tests. It's a slow process, really painstakingly slow, but it will be faster in the long run. Basically, don't take shortcuts, or you'll pay for them.

And the fourth rule, the one I regret: you now have a bunch of small, simple, testable functions, so write unit tests for them. You know how the code works. You know what it's supposed to do. You're very familiar with this code. Write the unit tests right now. Don't tell yourself you'll write them in a few weeks or a few months — you won't. Basically, drop everything and write unit tests. Because I didn't. That was my final mistake: I wrote some unit tests, but it wasn't enough, and now I really regret it, because I would have to re-familiarize myself with the code, and that takes time.

And then you can take care of the post-it notes. You can fix things, because you have unit tests; you can fix things without worrying that you will break something. You can fix bugs. You can optimize, because you have tested code with small units you can work on. And, for example, you can merge similar functions.
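A unit test for a freshly cut-out helper can be as small as this. The helper and its behavior are hypothetical, standing in for the kind of function the cutting produces:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical freshly extracted helper: decide whether a
 * response code means the lookup should be retried. */
static bool
should_retry(int rcode, int attempts, int max_attempts) {
    if (attempts >= max_attempts) {
        return false;
    }
    return (rcode == 2);   /* 2 == SERVFAIL */
}

/* Write this while the behavior is still fresh in your head --
 * a few assertions pin down exactly what the cut-out block did. */
static void
test_should_retry(void) {
    assert(should_retry(2, 0, 3) == true);
    assert(should_retry(2, 3, 3) == false);  /* budget spent */
    assert(should_retry(0, 0, 3) == false);  /* NOERROR: no retry */
}
```

The point is not coverage numbers; it's that the assertions record your hard-won understanding before it fades.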
That often happens with such a large piece of code: you will find blocks of code that are very, very similar. But now — only now — is the time to take a look at those functions. They look similar; where are the differences? Okay, they're the same, or you just need to put one if inside and add one flag.

The result is not perfect. We still have functions that are overly complex after the refactor. But it's good enough. And thanks to this refactoring, we now have QNAME minimization — that turned out to be a really easy thing to work on; I did the work on QNAME minimization. We have modules coming in BIND 9.14: basically, you can add a module and change the way a query is processed. DNS64 and filter-aaaa will be modules. And that's mostly thanks to this refactoring, because working with the code is now much easier than it was before.

And one more thing, at the end: remember to measure your code regularly. It's very easy for code to go berserk, and you end up with a function, with code, that's not maintainable and very hard to read. There are tools like pmccabe; use them to measure your code and see, for example, that some function has become overly complex. It won't fix itself unless you refactor it, and it's better to do that earlier rather than later. I've checked the open source DNS projects, because that's what I do, and some of them have problems, I have to say. So use pmccabe, measure your code, and fix the issues while you can, and while it's still vaguely easy, because later it will be much harder. Any questions?

Have you pointed these other projects to their complexity problems? So the question was, have I pointed the other projects to this. I believe I did, but that was a few years ago, so I might talk to people from those projects about it again. I remember something, yeah — so yeah, I've said something about it. Next question.
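The "one if and one flag" merge described above, in miniature. The function and the record-type encoding are invented for illustration:

```c
#include <stdbool.h>
#include <stddef.h>

/* Two near-identical counting loops found during the cut-up
 * phase -- one counted every record, the other only A records.
 * Merged afterwards into one function, with a flag guarding the
 * single line where they differed. */
static size_t
count_records(const int *types, size_t n, bool only_type_a) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (only_type_a && types[i] != 1) {  /* 1 == type A */
            continue;                        /* the one difference */
        }
        count++;
    }
    return count;
}
```

Doing this merge only after the cutting phase means you compare two small, named functions side by side, instead of guessing whether two blocks buried in a 2,500-line function really match.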
So the question is, when is QNAME minimization going into BIND? It is already in BIND 9.13, which is a development version, and it will be in the next stable release, 9.14.

So the question was, do I have any advice for when the data structures were chosen wrong. Well, in this case, no, not really — I was focused on working on this particular function. It happens, and usually if you chose the data structures wrong, you have bigger problems: you sometimes need to rewrite the whole thing from the beginning. So yeah, I was trying to focus just on how to simplify the structure of a function.

You mentioned modules — what can we expect from modules? Okay, so the question was about the modules I mentioned, and what they are. This is the thing that will come in 9.14: you can modify the query as it goes through BIND. The first two things that will be exported as modules, which are currently core features of BIND, are filter-aaaa — the feature where if you query over IPv4 you always get an A record, and over IPv6 you always get a AAAA — and DNS64, which will also be a module. The things you can do will be added on an as-needed basis: if you need to do something, we will add a place to put a module in. But basically, you can transform the way query processing is done; you can replace some parts of query processing, like DNS64, for example.

Do you plan to add some generic hooks, like the Lua hooks in PowerDNS? So the question is, do we plan to add generic hooks or Lua hooks. I don't know. We don't have those plans yet, but if there is interest, then why not?

So the question is, why isn't BIND written in CoffeeScript instead of C? Well, yeah, I would love to have a modern name server written in JavaScript or some modern language, definitely. Yes? It's very useful for parsing — it puts zones out in a sensible, parseable format, and I don't have to write a parser.
Well, yeah, that was more a statement than a question — that named-compilezone is useful. And like I said, we are no longer the reference implementation, but people still use named that way, more or less. So people still use it to verify. Yep? Okay. Thank you.