I work as a Linux kernel developer at Oracle, in the kernel security engineering group. I work on the kernel hardening project and on some static and dynamic analysis tools to reduce bug classes in the Linux kernel. Today I will be speaking about Coccinelle, which is a program matching and transformation tool that has been used in the Linux kernel for the last 10 years. Here are some examples of Coccinelle and how we are using it in the Linux kernel.
So, why do we need Coccinelle? As a big community we have some code maintenance issues. The first is refactoring code to use newer APIs. Sometimes people want a new API for their code, and they go and push the changes, but they don't know that other subsystems in the Linux kernel are also using that code. So the newer APIs come in, the code that uses them needs updating, and sometimes that results in specific kinds of errors. In those cases we can see that these kinds of bugs are there, and people can change the structure of the code.
And then there is the human factor. With copy-pasted code, mistakes can always happen. I mean, let's just admit that we do copy-paste code.
So, take these conditions in the Linux kernel. We found a case in the Realtek driver with all of these shift-based conditions, though it doesn't happen that way all the time. It can be done with the BIT macro. It's a global macro defined in the header files, and different subsystems can use it. So, this is the shift. Over here there is a separate variable for it. So this line could use the macro here, and this one too.
Before that, can anybody tell me what is wrong with this code? I have just the part of the code where the bug lies.
So, there's a chance that you might never lock, but... if the while condition is never true in the second bit.
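The slide with the code is not part of this transcript. As a rough userspace sketch of the general shape under discussion, assuming hypothetical stand-ins throughout (`toy_spin_lock`, `toy_spin_unlock`, `toy_copy_to_user` and `read_event` are invented for illustration and are not kernel APIs), the following models a helper that copies to user space being called while a lock is held, next to a variant that copies only after the lock is dropped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these are real kernel APIs. */
static bool lock_held;          /* models "a spin lock is currently held" */
static bool copied_under_lock;  /* set when a user copy runs under the lock */

static void toy_spin_lock(void)   { lock_held = true; }
static void toy_spin_unlock(void) { lock_held = false; }

/* Models copy_to_user(): records whether it ran with the lock held. */
static size_t toy_copy_to_user(void *dst, const void *src, size_t n)
{
    if (lock_held)
        copied_under_lock = true;
    memcpy(dst, src, n);
    return 0; /* 0 means every byte was copied */
}

/* Helper that looks harmless at the call site but copies to user space. */
static size_t read_event(void *ubuf, const int *event)
{
    return toy_copy_to_user(ubuf, event, sizeof(*event));
}

/* Problem shape: the helper is called between lock and unlock. */
static bool buggy_read(void *ubuf, const int *event)
{
    copied_under_lock = false;
    toy_spin_lock();
    read_event(ubuf, event);
    toy_spin_unlock();
    return copied_under_lock;
}

/* Safer shape: take a snapshot under the lock, copy after dropping it. */
static bool fixed_read(void *ubuf, const int *event)
{
    int snapshot;

    copied_under_lock = false;
    toy_spin_lock();
    snapshot = *event;
    toy_spin_unlock();
    read_event(ubuf, &snapshot);
    return copied_under_lock;
}
```

In this sketch `buggy_read` reports that the copy ran with the lock held and `fixed_read` reports that it did not. In real kernel code the concern is that a copy to user space can fault and sleep, which is not allowed while a spin lock is held.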
If the while condition is never true, you'll never do a spin lock, but you'll probably end up doing a spin unlock.
No, actually... We are calling the function hwdep_read_locked under this spin lock, and this function uses copy_to_user. So the kernel is interacting with user space, and if something happens in user space then this will end up in a deadlock, and the system will crash. So this is a basic pattern: you should not use copy_to_user or copy_from_user under a spin lock. This kind of thing does happen in the kernel. When we reported this, he said he didn't know that this could happen. So this kind of thing also exists out there in the kernel.
To find bugs like this, the tool, like any other static analyzer, has to abstract over the irrelevant information. The whole pattern is not on one line; there is other code in between the parts of the pattern. So the tool should be able to abstract over the information which is not relevant to the pattern. How that happens, I'll explain later. It's for all kinds of software developers, and it runs on your source code only, so it's not tied to the build process or to different architectures.
So, Coccinelle is a program matching and transformation tool. It's developed by researchers at Inria, and it has its own language, called the Semantic Patch Language. It uses a very intuitive, patch-like notation. As in a plain git patch, we have plus and minus signs. We can write these kinds of scripts and they will fix your code. It's used by several other communities as well. Coccinelle knows C. It's written in OCaml and Python. And it abstracts independently of the compilation process. It has metavariables, which we use to abstract over subterms.
And it has dots, which we use when we don't care about the irrelevant information that goes between two code fragments.
Now we can write the Coccinelle script for this example, for this part of the code. What we want to do is remove this line and use the BIT macro instead. There are two things we care about. First is the number 16: it should move over here, and the shift should be replaced with BIT. The other thing: in the previous example we had one constant, which is the number 16, and we also had a macro. So 16 and the thing written in capital letters are both constants, right? The script should match both of these. That's why we declare the metavariable as a constant: whatever goes there is a C constant. And then we just write the minus and plus lines.
Then there can be some kind of expression over there. For that we have another metavariable type, called expression. So this script uses a disjunction. It will match the Coccinelle... sorry, it will match both the constant and the expression. The disjunction separates the cases: either this or this.
Now, in the community, some people prefer to use it and some people don't; it's a choice. So sometimes what maintainers prefer is that if the file already uses it at some point, then the script converts all the other places so that the file is consistent. That is done with another rule. One rule, called "uses bit", just records that this file is using the BIT macro. The other rule depends on that first rule. So if the file is already using BIT, then we change things.
Second example. There are some lines which are useless. So we can have the directory here and the data here, and we can just print it.
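To make the first example concrete, here is a small C sketch of the before and after forms of the BIT rewrite described above. The `BIT` definition is a local stand-in so the example builds outside the kernel, and the function names are hypothetical. The only point is that the open-coded shift and the macro form compute the same mask, for a constant such as 16 as well as for an expression.

```c
/* Local stand-in for the kernel's BIT() helper (assumption: same shape,
 * an unsigned long 1 shifted left by nr). */
#define BIT(nr) (1UL << (nr))

/* Before: open-coded shift by a constant, the form a "-" line would match. */
static unsigned long set_flag_before(unsigned long reg)
{
    return reg | (1 << 16);
}

/* After: the same operation written with the macro, the "+" line. */
static unsigned long set_flag_after(unsigned long reg)
{
    return reg | BIT(16);
}

/* The expression case: the shift amount is a variable, not a constant.
 * With a 32-bit int this form is only valid for shift values below 31. */
static unsigned long mask_before(unsigned int shift)
{
    return 1 << shift;
}

/* The expression case after the rewrite. */
static unsigned long mask_after(unsigned int shift)
{
    return BIT(shift);
}
```

Both pairs return the same value for small shifts, which is why the rewrite can be applied mechanically once the constant or expression has been matched.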
So we don't care about what goes inside the brackets, the arguments of the function. In that case we can just put dot dot dot there. Like I said, the irrelevant information can be abstracted away with dot dot dot. It means that you don't care what goes in that section.
So, transformation specifications. In the patch we saw, minus marks something which should be removed and plus marks something which should be added, like in patch style. The other thing we have is the star. With it you can see what happens in particular parts of the code and just collect things, for example the list of files which have the same kind of pattern.
That covers how the script is written based on the code fragments. But how do we run the scripts? Coccinelle has its own command line tool, which is called spatch. To check that your semantic patch is valid, you can just parse it. If it shows zero errors, then you can run it over your code. And these are the commands for running it over your code. It can be run over a single file, a directory, or everything.
Coccicheck. Coccicheck is a shell script which is designed for the Linux kernel. It works with four modes: patch, context, org and report. If you just run coccicheck without specifying a mode, then by default it gives you the output in report form.
So here are the modes. The first mode is patch mode. Like we explained in the other examples, this is the patch mode. Once you run it with mode patch, you get the output in the .out file and then you can apply it to the files you want. The other mode is context mode. Context mode is basically the star thing. So either you can use the star in your patch itself, or you can have the minus and plus lines in your patch and still run context mode on it. It will just put a dash there.
So you can highlight this section and check it with context mode. The other is org mode, which generates a report in the Org mode format of the Emacs editor. And then there is the common report mode, which is what most static and dynamic analyzers give you: it just prints messages and warnings with line numbers.
So this was the simple stuff. You can do it for your own C projects as well; it's not limited to the Linux kernel. You can have your own shell script if you want to automate running the scripts you have written for your C code. Or you can have a single script, run it over your code, and check what happens.
But this is not the limit of Coccinelle. These were the simple things. There are many other things in the language, called SmPL, which is used by Coccinelle. One of them is that you can embed Python and OCaml scripts. The printing in report mode and org mode is done by a Python script.
And then there are the dots, where we write dot dot dot because we don't care about what is there. Even with the dots written there, if the code has the pattern, it will match. You can put a constraint on the dots, because the pattern is not all on one line; the dots cover everything which goes in between the parts of the pattern. Say some function is called in there. You can say "when not equal to this function", and only then give me the output.
So then it will give you the output based on that. You can also put constraints in the metavariable section.
Then, position metavariables. What you can do is match in two stages. In the first rule you bind a position, so you are holding on to that place, and then in the other rules you can check whether that same thing occurs there. So you can use position metavariables for that.
And iteration. Iteration is done with OCaml and Python scripting. It's possible that you have five rules, and some of those rules should match in one file and some of them can match in header files or in other .c files. In those cases, when you want to do interprocedural analysis, you can use iteration. You can have a hash table, hold the results of the first two rules, and then carry on with the other rules.
These are some of the useful things. Maybe that was pretty fast, but feel free to ask me any questions, if you have any.
Do you think it's possible to do some automatic backporting, like driver backports?
Sorry?
You know driver backports, backporting from a newer version to the stable version, the backports project. Is it possible to backport a driver automatically with this?
Yeah, yeah, yeah. There is a project going on for automatic backporting. One of the developers is actually working on these things and writing some of these scripts for it, where they are trying to backport drivers automatically using patterns.
Just a quick question. You showed an example just now of taking a spin lock, then copy_from_user, and then spin unlock. Would this program be able to detect such a pattern?
Yeah, it would detect patterns like this.
So what you do is you have one rule which says that there is some function which uses copy_to_user, and you don't care about anything else in it, right? And then you have another rule which says that the function found by the first rule should not be called under the spin lock pattern.
So as long as it is between spin lock and spin unlock, you should not copy?
Yeah. If it is unlocked then it is not under the lock.
So what about a function that is nested? For instance, the main function calls function A, which does the spin lock, and then it calls function C, which does the spin unlock.
Yeah, that can be done with iteration. The script we wrote was using iteration. In this example it was all in the same file, but it's possible that there is a chain of functions being called, and yes, that can be done.
So why don't you just write lock asserts in your hwdep mumble function? Isn't it easier to just assert that you are calling this with the lock held?
Yeah, it is, actually, even just as documentation, right?
Right. But then if you actually put an assert in every piece of performance-sensitive code, it doesn't quite scale like that. The sheer reason why you use a spin lock instead of, say, an atomic lock is that you want to save as many cycles as you can.
I am aware of performance. While you are developing, you are not really performance sensitive. So ideally you crash fast.
Yeah, so there should probably be some debug build in which you put an assert. That is a fair point.
Yeah, maybe for that reason you need to go for the slow path.
So is this project actively developed?
Yeah, it's open source, over here. And there is the mailing list. The person who developed this tool, she's very active, and she's also collaborating and changing it according to feedback and things like that.
Yeah, I did my Outreachy internship with her, and now I'm using it for security purposes.
So it sounds like something that could, at least conceptually, be implemented as a static analysis pass in LLVM or GCC.
Yeah, yeah, yeah.
So I'm trying to look at it from that aspect. It looks like a very interesting thing that you would probably add as a GCC plugin or as an LLVM pass. But maybe GCC, I guess, yeah.
Yeah, it would be a lot easier to add to LLVM than GCC.
Yeah, arguably a better solution for LLVM.
So, any other questions? Okay, a round of applause for all the questions.