Hello everyone, here's the last free poster, and there are stickers too, also free, if anybody wants them. Now please welcome Siddharth.

I am Siddharth Sharma and I work for the Red Hat Product Security team. The title of my talk is Tactics of a Code Auditor. Code auditing is not something that can really be taught; it comes with time, with how much time you devote to reading code and doing stuff with it. I've been auditing code for the past two years now, so I'm going to share some of my experience of how I do it. I don't claim it's the complete picture; people can build their own techniques on top of it. Here is the agenda for the talk; I'm not going to read through it, we'll just go with it.

So, what does code auditing look like? This is actually a fort in India. It's pretty amazing. This fort was built quite a long time ago, and the fun part is that when the tide is high, the water comes all the way up here and you cannot walk to it; you go by boat or any other means that moves on water. When the tide is low, you go by horse or other means of transport. What I want to say is that source code, or an application, is just like this: put your application into different environments and it will have different types of vulnerabilities. It can be attacked differently depending on the architecture and on how the application is running at the moment. So yes, it's pretty much the same thing: the fort has a back door, it has known flaws, it has unknown flaws, and there are guards you want to get past.

Now, the scope of the audit. I work for a team whose customers are internal to Red Hat.
Sometimes customers who use Red Hat products say: we want to buy this product, but what proactive effort are you putting into auditing its code? Many developers will ask: we have implemented this feature in the application and we want someone to look at it, because there might be a vulnerability in it. And outside open source or Red Hat, freelancers can go for bug bounties and zero-days; we all see that. A lot of the time, program managers or other people involved in a product will say: even if you only find low or moderate vulnerabilities in the code, just tell us and we'll fix them. So when you start auditing code, it's really important to understand what you are looking for. Your customer might only want the vulnerabilities that are critical or important to them and not care about the lows and moderates, so you don't report those.

This is pretty much the methodology I use. When you are looking at a particular application, you should read the documentation, and exactly the version of the documentation that applies to the source you are auditing, because that's really important. Very often in the documentation, or in the code itself, you will see comments from the developers saying: we have not implemented this feature yet, but it is really important. Then you know where to look. The other thing, which I haven't tried because most of the developers live elsewhere and I don't talk to them in the real world, is to talk to the developers. When a developer is drinking, go and ask them; they might spill the beans about a problem in their code, and then you know this is something you need to look at. The other thing is, if the application you are going to...
If the application you are going to audit has been in the community for a long time, you should see what people in the community are saying about it. They might have some ideas: we think this could be a security flaw in this application, but no one knows for sure. That gives you some boundaries to start thinking within; there is an idea, and then you work out how you are going to audit it.

The other important part: an application can have millions of lines of code in it. If you can get the blueprints of the application (how it works, what its architectural design is), you might not want to look at the backend, at whatever gets stored in the database, but rather at the modules and the other places in the program where the user feeds input to the application. That way you can literally cut down the areas of the code you have to look through, because there is no way anyone can audit a million lines of code, or more, in a week or two. It's not possible.

Here I'm just taking Chef as an example. This is its architecture, how it works, what's there. From this you get some idea of how you want to audit the code and where to look: for example, where there is networking, or where cryptography is used, or places where there should be a really secure channel for the data but the developers have not implemented one. Those are good places to look in the code. Do you find vulnerabilities there? Yes.

So: why was the application written? Who are its users?
That also really matters, because if there is a daemon that runs as root and doesn't have many users, then you look at things like whether the daemon escalates privileges with setuid/getuid, and whether there are places in the code where it doesn't drop them properly. You might want to look at those places in the code as well. Basically, you should have a high-level view of the application, so you can think within the boundaries an attacker would be looking at.

Then there is authentication, and information disclosure while logging data. This is also really important. For example, in a lot of applications developers enable debug logging, and the debug logs get shared with other people or uploaded to public places, and they can contain passwords, IPs, a lot of things. So how the application logs data, especially the debug logs, is one of the important areas for information disclosure. And crypto failures, yes.

It's also important not just to read the source code but to compile it. The compiler will throw up a lot of warnings. When we build RPM packages, you can read the RPM build logs and see what went wrong there; it points to places where the code was not properly written, and you get some ideas about the application or the code.

Now I'm basically going to talk about C and C++ vulnerabilities, the common types of vulnerabilities people can actually see in code. The first is the buffer overflow, then data type flaws, then miscalculations or bad logic in the code, and predictable filenames. Buffer overflows are caused by bad APIs; you see memcpy and strcpy and things like that.
In such situations, what I usually do is just grep for these functions, throw the names at the source code, and if something shows up, I don't have to use a lot of different tools. If you see these things, you can log them directly into your audit report and say this needs to change. Usually, though, these are not very critical vulnerabilities nowadays.

The other thing: if the code uses string functions and you don't include string.h, a lot of security checks go missing from the code. While compiling, _FORTIFY_SOURCE will then not work: it will not insert the proper security checks, and if you have an overflow or a format string attack there, the application may just not crash, and you will still get the overflow. It is also important to compile the code with -D_FORTIFY_SOURCE together with an optimization flag, because without optimization _FORTIFY_SOURCE doesn't work either. In our RPM builds we make sure that's enabled by default.

In terms of buffer overflows, this is the classic one that happens all the time in code: you can see an argument that goes directly into sprintf. You will find a lot of places in code where user data flows directly into these functions. These are the places to look at: what the source of the data is and where it is being written to. Here are some more examples you will find in code. Sometimes developers don't think about the null termination of a string, so you get an off-by-one error. Below is how you fix it: you check that the string length is not greater than the buffer size minus one.
That leaves room for the null byte at the end, and that's how you return it safely. Those are the kinds of places in code where you can look for overflows.

Next are the data type flaws. These differ across the architectures you are working on: on x86 or ARM or any other architecture, the handling or the size of a data type can be different. Integer overflows are associated with this, and signed versus unsigned data types are a major source of the problems. You will see developers use size_t and ssize_t in a lot of places: size_t is the unsigned data type and ssize_t is the signed one. Now, how many people here know the difference between signed and unsigned data types? Is there anyone who doesn't? Okay, everyone knows.

There are places where the signed type, ssize_t, should be used but isn't, and you will find developers making this common mistake. You can see it in this code: size is declared size_t, then the user-supplied length goes into some check, and that check returns minus one on error. The result goes into size, but size is unsigned, so the minus one wraps around and a very large value gets stored in size. The following comparison is then always false, so the error path is never hit. A security check was done, it returned minus one on error, but it's never caught, because the value wrapped and size now holds a huge positive value. The code escapes the check and then, boom, a four-gigabyte memory allocation. That's how two's complement works: in a 32-bit signed integer, the first bit is used to store the sign.
But see, this is the common problem: you use an unsigned char, the maximum value it can store is 255, and in some logic it adds one to it; now it's zero. The overflow bit in the processor is set, but it's checked nowhere, so it's a pretty common mistake in code. The same applies to a signed variable: add one to its maximum and it becomes minus 128.

This is also bad code. Can anyone tell me what the problem is? Yep, you get a muffler. Should I give it now? It's here. So, here is some more bad code. Anyone? This one I can actually show you; it's from one of the CVEs that we had. Try to see what the problem is. Basically, the function expects four digits, but as soon as the PID goes one above this FFFF here, after this operation it becomes 1-0-0-0-0, and it overflows. These are pretty common errors you can look for in the code.

Now, a lot of places (this example is actually from Python code, but it applies to C and C++ programs and probably any other language too) write to the temp directories. If there is a daemon working with files in a temp directory and an attacker has access to it, they can just swap in malicious content there, and whenever the daemon reads it again, it executes it, and you pretty much get root. If the daemon is running with root privileges, you will get them.
The other thing sometimes doesn't depend on the source code itself but on how it is shipped to customers. For example, when you create RPMs, the mechanism for delivering updates to the customer also depends on how you have packaged them. If there is a post-install script that does something like this, that's bad. It can happen in the spec file, so you leave the boundaries of the source code. The code auditor has to care about how this is going to be delivered to the customer later, because suppose this is an RPM and you are an internal user: you can just wait until you know your admin is going to update the RPM, change this, add sudo for your name, and since RPM installs with root privileges, whenever the RPM installs and runs its post-install script, you get root.

Now, about the tools. A source code browser, yes, everyone uses one. I use cscope to see how functions call each other back and forth, because a static analyzer will say "this is the function you need to look at", but then you go into the function and you need to see how the values reach it, how the input gets into it. So I use cscope or similar tools to see how the functions are called, starting from main. You effectively backtrack: you have the place the static analyzer showed you, then you backtrack from there and see how things could be manipulated along the way, if they can be. For quick checks you can use cpplint, Flawfinder, or Cppcheck; Cppcheck is good, it shows quite a lot of problems. And there is the post-memory-corruption memory analyzer; it's on GitHub, and if anyone wants, I'll share the link. You can run a program and attach its PID to it.
Then, whenever the program crashes, it analyzes the crash automatically and tells you most of the regions where things were executable, on the stack or the heap or wherever.

The other part is the sanitizers. These matter for code audit because a code auditor can spend a lot of time reading source code, and it gets frustrating not to find anything. So what I usually do is compile the source code with the sanitizers and just run it, and whenever there is a corruption, it tells me exactly where the corruption happened; then I can go back, fire up my source browser, and look at the code where it happened. For me, code auditing is not only one-way; it works from the other direction as well. Then there are fuzzers, but that is pretty much the pen testing that people do. Still, as a code auditor, if you can do it, it's good: it helps you track down places that might have been missed in the code audit.

About the sanitizers: there is libasan. If you are using Fedora, you can just install libasan and it works with GCC; compile with the -fsanitize=address flag and it's set. AddressSanitizer can find use-after-free, double free, buffer overflows, and flaws of that type. There is another library in Fedora called libtsan: pass -fsanitize=thread and ThreadSanitizer comes into the picture, checking for race conditions and deadlocks. These are good. As for MSan, I don't think there is a package available in Fedora; it tracks uninitialized memory.

As code auditors, we also use CVSS v2 in Product Security at the moment for analysis of the flaws.
But as a code auditor, you can use these things to identify the impact of a vulnerability you have found. Take the access vector: how you are able to get at the program; whether it's on the local machine, on an adjacent network such as a VPN or the intranet, or some kind of daemon you can reach over the network. You plug in these values, and the CVSS v2 calculator will show you the impact of the flaw.

So, that's pretty much it. Does anyone have any questions?

[Audience question] Yes. Actually, I haven't dealt with most developers like that; we just say "this is the flaw", that's it, nothing more than that. But yes, the same thing applies: if you look at the build logs of any project, the compiler will be able to show that. The most important thing is that if these are built on a system where no one is watching how they build, it's going to get messed up. Sorry? Yeah, yes, it should be.

There are some other things too. I showed you some of the classic functions. When we tell a developer "don't use a banned API", the developer may at some point write his own version of the banned API, avoiding those functions while doing the same thing, and then the compiler won't show any warnings. Take strcpy, or any of the banned APIs: a lot of people might write their own C code to do the same job, which is even worse. Now the auditor's job has become more difficult, because the tools won't flag it.

Any more questions? [Audience question] I usually spend my Fridays on it, because you should know when to leave, when to give up on something; maybe it's not your day in front of the code you are auditing, maybe the next day will be. It's really frustrating. It's not something you know beforehand; your manager will not come to you and say "you have to find 100 flaws in this source code, now find them". Maybe there is nothing there, or maybe it just escapes the eyes of the auditor.
And that's why there is more use of things like the address sanitizers, for catching what the audit missed; and then the auditor can go back, see what the problems were, and look at the code again.

[Audience question about Coverity] I don't know a lot about that. A lot of RPM packages use Coverity, but I basically don't use it much. For me, static analyzers are good, but they produce a lot of false positives, and if one gives you 4,000 places to check in a million lines of code, that's simply not possible. So the only way is: you run the application, you poke it in several ways, you see where the crash was, and then you try to match it against where the static analyzer said something important was. If you can see a link between those two places, then you know exactly where to look. In auditing you cannot start from the main function and trace everything forward from there, right? It's a long, long way.

Fuzzers I use very, very rarely, because fuzzing is a completely different world, and today there are lots and lots of fuzzers. It's a bit difficult, and I usually stick to my classic ways of exploiting, the ones that were used before fuzzers existed: just send some crap into the system and see where it crashes. That's what a fuzzer does for you, running the permutations and combinations of different inputs, but sometimes it's a bit difficult to use a fuzzer on some projects.

For example, I work particularly on storage: I look at storage security for Ceph and Gluster, and for me the important areas are there. But there are a vast number of people working on different products, and they will have different approaches for their own products. For me, it's storage only.

I guess no one wants this poster.
So, the mufflers are here; anyone can take them.

[Audience member] When I was learning programming... well, you may try. Okay, like that. And one more question, because I really don't know a lot about security: what would you recommend to start with, some basics, some literature? Yeah, sure, sure.

[Audience member] Sorry, you didn't mention Valgrind. Is ASan similar, maybe better? So, with Valgrind the problem is that it's a runtime thing, and the application becomes really slow. Does that affect things sometimes? Yes, it does, for a lot of flaws, like memory-based flaws. ASan, in contrast, puts its checks in at compile time. But you still have to run it, no? You just compile it with ASan and then run it? Yes: compile it and run it.

[Closing conversation] Thank you. I'm sorry, just give me... yeah, sure, sure. There was a flaw in there, right? Yeah, thank you, thank you so much. You work at Red Hat? No, I don't work at Red Hat. Okay. I'll ask you for that literature, because I was interested; I started with some kind of very serious book and it was... And your name? Siddharth. Siddharth? Is it with an S? Siddharth, with a D-H. All right, I will write it to you right now. Thank you.