 Hello, welcome everyone would like to present our next speaker Andre Suri from ISE who will be talking about bind 9 code quality I don't think this is actually a DNS talk is it? No Right a little bit so Just to get a feeling of the room who knows bind Who uses bind? Okay. Who is the developer here? Oh Oh, so many No, no I would know something about it and who who ever programs something about DNS like DNS developers, okay? so This talk is more about like what we are doing to bind at the moment So it's not strictly about the DNS. It's more about the bind and And about the concurrent programming So so my question would be who does threaded programming Okay, quite a little hints So there's a couple of prerequisites for us In in bind and that's continuous in the integration because if you don't use the tools I'm going to show you on a regular basis. It doesn't make sense at all So so if you ever use the tools just integrate them into your continuous integration and then you have to have a time to fix all the bugs so About bind. It's a 20 years plus All the code base so you can imagine It's it's pretty nice C99 codes sometimes and it's getting better. It's multi-fitted It uses locking it has its own RW lock which Have some advantages like it allows downgrade and upgrade and Since by 914, which is current stable We are right now just before the release in 916 stable version It uses the STD atomics or or the shims for in GCC or or in Windows API so I don't know if you can actually see that but there are a couple tools that we can use or We are already using that it's kind of built from Clank, I'm going to talk about a cpp check There's also clank tidy which does a different kind of checks. We are not using that yet for the runtime analyzes this there's a lot of Sanitizers from from GCC and and a lot of em which are very good and There's some commercial tools like the cover this cover this can which can be actually used for free for the open source projects There's something called solar cloud, I think But that's that's like very chetted it represents all kind of Weird stuff. That's not even like it has a lot of false positives for us And there's look good to me That's calm which also looks very good and And the last tool to be used for one is cosinelle. It's Very good tool for refactoring large code bases. It's it's mainly being used for Linux kernel, but we are using that too So so what's what's the scan belt? It's static code analyzer It works by replacing the CC and CXX Environment variable and it comes from the LLVM and clank suit suit So the usage is quite simple. This is for the auto conf base projects You need to use it for configure and then when you are running make you keep the CC variable set from the configure step and and the output is this is just one of the comets because we tried to be More verbose than just fix back in the commit messages, so this includes that the real reports from the from the scan belt and It it finds that like the little stuff that the bind code base is quite good in regard regarding the scan belt So it found like the small stuff like the result is never used and And so we fix that obviously Then the CPP check that that's a different tool it's independent projects also static code analyzer It's more difficult to use if you if you use auto conf because it requires compilation database, which is a JSON file with like which includes all the configuration options and and C flags and LD flags and and and the path to files and And you need to build a compilation database So there are some tools that can build that automatically like I think CMake and others For auto make it's it's more complicated And it has different levels of strictness that the most strict level is is Crazy at the moment for bind. I'm not saying it refers false false positive. It returns good stuff, but you're not there yet so This is this is the usage so you run configure then are you form? for may ought to make an auto make you still tool called bear which is I think Like short for built ear so it listens what what goes on and then you run CPP check on the Compile commands jason that's the compilation database and it runs the scan on all your source files so it's important to run your project in like a debug mode or or Enable you enable everything in the configure so it gets analyzed and the output looks like this so it's This is all from the tool called CPP check dash HTML report and You are lucky because the CPP check one point ninety just came out and it found a new errors Because we were CPP check clean before so this is this is from the CPP check nine point ninety So it found a new errors And then if you click the individual error, it looks like it looks like this so it It's annotated source your source code And it shows the path through the file and it's very it's very good So if you're if you are developers and not using the CPP check, I invite you to do so Because we like it quite a lot So The next tool the cosinell It's a program matching and and transformation engine and it has something like called semantic patches the usage is that you call the command as patch then you Pass it a patch did the semantic patch it can work on on git repositories and There are some other small options So this is an it outputs the number of files it matches according to rules And then if there's anything Needs that needs to be patched then it either outputs the patch to the standard output or you can like patch the files In place there's an option for that too This is a very simple Semantic patch So what this semantic page does that if it finds an assertion? For example, if you have a switch Switch command and there's a default which always asserts like this is like custom Bind assert it insists zero. So it if the code reaches the point it crashes with assertion. So So it adds ISC unreachable which is actually just macro to To attributes that the end couldn't be reached So it adds the IC unreachable to all occurrences of the insist where it's not already there So that's the condition behind. I'm sorry for the unicode corrector. It's actually The the normal Asterisk dash, but the font kept like changing that and And this is more complicated example We made a recently we made a change to the bind allocator So I see man get cannot return null it if it if the allocation fails The bind will crash because if you don't have enough memory, then you have different kind of problems anyway With the current allocators in modern operating systems. So this this tries to catch all the all the occurrences where There's a statement Which is you you declare a statement s and then expression v so if v like a variable Is a null and then falls the statement It just deletes the whole whole block of the With the variable and there are some it's it's quite smart it it even can understand. So if if the is not now and there's a block and There's a L statement. You can just keep the keep the other block which is The condition on the the first con first block on the on the right with the s1 s2 So it can do all kinds of transformation So it understands a C code and it transforms the code properly. So it's not like a Like find and replace it it understands a C code and can delete the whole statements or transfer them into something else Again if you are working with the old cold cold base like bind is and you need to do the extensive refactoring this is very very cool our tool and and You should try it so There are some more examples how to how to do stuff with costinell. There's a web page There's we have a directory with all the semantic patches will be used in bind But they are very bind specific and there's a costinell area org which is even more examples in a in the gate repository So so why and and here comes the fun part of all It's not so funny if you have to fix all the bugs So who understands the memory ordering? Oh, I'm lucky. I can tell you whatever I want on a platform with weak memory or consistency Okay Me neither Well, I'm trying So and what about memory barriers like do you know what's that? Okay one So and and bonus question what's luck a lesion would anybody know Okay, I'm not going to talk about luck a lesion, but it's a feature on Intel processors maybe on AMD 2 where you can like not block things, but when they Collide you roll back the whole transactions in on a CPU and memory and you actually do the locking so you can avoid some locking on a hardware Very cool, but I'm sure there are some bugs hidden in there especially with Intel So Fred sanitizer This is runtime analyzer. So it's it doesn't analyze the static code, but it analyzed it like running code It requires toss it requires requires a good test suite to run Or you can like run in on your production if you are brave enough And it checks for data races Like if multiple threads access the same memory location Mutex ordering so if you have like multiple locks in place and they are locked in different order in from different threads So that there's a possibility of deadlock and other subtle errors in the internet programming The usage is is again quite simple. You just add a flag to to see flags or a leaf legs Well, you need both and and for buying we need to I told you we are using our own implementation of RW lock So Fred synthesizer doesn't understand that so we need to use the the standard P threat RW lock Here then you you build the program then you export some extra options We because buying this quite complex we need to increase the history size It's like a amount of memory the threads in Russia used to to keep track of the memory locations and some other options which make it more like Cleaner because for example lock path the lock path T-SEN Basically means that the output is slog in the separate file and it's not mixed in the in this standard standard output Exit code again means like usually it exists with Exit code 66 But if you are running a test to it, you usually don't want to stop the test suite because the it failed so There are some like options like this then then you run a test suite Bind has both unit tests and system tests and then there's an output like this so As you can see there are data rays between the two threats and There are similar location like I see and I'm handle on ref But there's a data rays and it's quite interesting data rays because this is the this is the unref function It uses reference counting and if the reference goes to zero Which is the the function is your f count decrement So if it go if it's not zero yet, then it returns But if it's zero then it delegates resources, it should be correct, right? So why is there? A data rays especially in this function, so it probably uses the reference counting and Then the like the free function, which is called here. We just delocates resource and that's it It clears the resource and then it's the memput is basically the IC version of free and Still there's a data rays it looks like innocent in a cruise, right? So the fix is to either destroy to the to the reference handle why because the Refcon API uses the standard atomics and and the decrement function uses the memory ordering release and This story function uses the order acquire and To actually understand this this is the from the documentation So if you don't use that There's no synchronization between the threats So the destroy function the missing destroy function made the one of the threats not to synchronize with the other during the decrement and Then there was a data rays because of the missing release load and Like okay, so who now thinks understands memory ordering? I Understand memory ordering and the atomics more than I understood them a year ago, but I'm still I Another example There's data rays on the power PC64 Reported by redhead because they do have some interest in hardware And it's it's caused by our own RWB lock and the proposed patch again changes the Memory ordering from relaxed which doesn't cause any synchronization to acquire And and the compare exchange operation again to acquire On a synthetic benchmark, it works. I just I just look at the function and I think that basically There's an error in our RWB implementation, but we haven't seen it because the usual hardware has a strong memory consistency and It doesn't manifest so it manifests only on the power PC64 because it has a weak memory consistency again like So but I Think that there's a there's a lesson to be learned here. So do my I Like the saying don't do your own crypto and I would add don't do your own walking Unless you absolutely know what you are doing and And and the good news is that in a new version by 916 we are actually have running a benchmark now with the system library Rw lock and With some other tweaks. We are over one million Requests per second. So just dropping our own RW implementation increases the performance because we changed the other parts of Bind so That's good news. We might be locked. We might be dropping the implementation at all another example There's a comment. No one else has this socket so we can free it, right? And and the fix is actually to lock the the socket structure because again There's there was a bug which no one else has the socket except when you are on free BSD in a VM where on a hardware with load of course, it was very hard to reproduce And But it was reported by the free BSE maintainer And thank you for for all the work he does by reporting the bugs to us But it was it was very weird bug We be tested like with a lot of course on Linux, but it only manifests in the free on free BSD in VM VM like If anybody can say me why then I would be happy to listen So another another thing On free BSD We added to our continuous integration. We added free BSD and windows in September last year We also added open BSD because if you don't run in the in the CI On every bird request, then it it's like it's not happening because the developers are going to ignore the reports that came synchronously and and there was an intermittent failure in the free BSD system test job and it was a crash in the libfret system library and It was a open SSL at exit concurrency because open SSL 1.1 Automatically destroys the resources, but if you have a multiple frets running and and shutting down the resources then One of the frets were destroying the mutex is held by the open SSL before the open SSL on Exit functions ran so we had to request that like There's an internal task manager In buying so in a fatal function you have to request the exclusivity of a task manager so it finishes all the other stuff before it crashes again like It was very weird because we were like looking why it's crashing in the in the system library Here's a quote from the documentation The other type of the box is the lock order inversion and this is actually the the cycle at the beginning is quite short I have some like eight Eight mutex long Bucks in bind at the moment. They are not yet fixed and but it looks like Like mixing the mutex is in a different order from different frets Just an example how it looks The most simple thing you can do is the convert the locking to Because there's there's a lot of flags in structures. So we just converted this this flag remove to standard atomics drop the locking at all and It's protected by the atomics And now you have to get the memory ordering right So are we there yet with the fret synthesizer? Not yet. So this is the current result on the on the bind master There's like 10 distinct warnings from the from the fret synthesizer, but There's something like more than 200 different paths to to the warning. There's a lot of Duplicates like these six Errors are probably the same one and in the in the lock ordering. There's probably also a Lot of duplicates. So it's not that that much bucks But we are not there yet. It doesn't crash on a regular basis. It just sometimes something happens and We don't know why and And getting getting this clean means that we can run the fret synthesizer in a strict mode so if there's a new warning, it will stop the merge request from being merged and But as I said, we are not there yet So, well, thank you for listening and We have seven minutes for questions if you have any questions Not about memory ordering, please I have several Just a very quick one Did you also enable more compiler warnings over time? Yeah, we run with W all W extra all the time and We also have a in the CI. We have multiple jobs. So you run with clank We run with GCC and we run with different Optimization options because sometimes it's also if you run with minus or zero Sometimes it reports stuff. That's not real back. But it like I don't know there. There was a there was a code that's if there was a variable Zero, it didn't run a loop So on a higher level of optimization that got optimized out But on a lower level optimization it did pretty warning about something So so again like trying different optimization options from minus O0 to minus O3 is also a good option in the in the CI because the compilers Do a different stuff Also running the threads in a tizer with clank and GCC There's a different results. So that's also thing you should do if you are running the sanitizers Just just do the different Compilers because they did, you know, they created a little bit different code I see a question. Just talk loud. I don't know. I don't know what is our cold coverage at the moment You'll go back like 30 slides Okay, question here. I would like to ask you about C plus plus check you use this tool to Try to find bugs or exceptions in your code. Is it possible to use it for checking coding style? Well bind is written plain C. So but I don't know if the C plus plus check Okay, yeah, you mean this one is possible to check just coding style Some yes For on some of them on some strictness level it also reports like Variable scoping so when you can reduce the scope of the variable and and some others it's it's quite extensive lists actually so But it would be moves to just so show it here, but this is what we use so we This is from our CI that the line so we are just using these but I have a I'm In the e-max camp so And I do my coding in e-max. So I use fly check to run cpp check and plank On my code so it out to automatically like highlights the issues I'm I'm probably sure that the the stuff that the young people use like all the ID So that it does this as well, but he makes I like he makes So what's bear make bear make it's It creates the the completion database from right okay any other questions So are you finding that see I is very expensive or slows down your team because I found that those are the options so right now um We are using Like hosted bare metal servers and we have like eight of them To run all the CI, but we have something like 50 56 jobs at the moment So it and it it's getting slower and slower as we had more and more tests We also have a windows in the CI so it's it just consumed one one of the machines, but the other machines are shared so if you are Open bsd is ran in the QMU Freeb sd had also consumed its own machine and it uses jails We we wrote a plug-in for githlap ci to use jails in the in the githlap ci and Also the open bsd that's also our own plug-in For Linux it reduces the standard Docker Oh But for for the other systems we had to write custom executors for the githlap ci So define expensive so If it catches a bug they would cause a CV later was it was that expensive? No, no, no but I found in some teams that It gets care people off like you say we're going to spend a thousand dollars a month now I'm just testing the code all the time. They say what can we do a hundred and Well, yes, then the team can do one feature a week Yeah, we are spending something under hundred thousand dollars per month for hundred thousand now Thousand under thousand one one thousand one thousand under one thousand a month under one thousand a month It seems like a good deal. Yeah All right. Thank you