Okay, good afternoon. Welcome back. Now we have Luciano Bello, who is going to talk about "Towards easier security patch porting". So please welcome Luciano.

The normal disclaimer: this is an extremely alpha thing. It's more like an idea, so the talk is kind of a call for comments and suggestions. For that we are going to do something like we used to do a lot here at DebConf with Gobby; now we are going to do it with another pad. The URL is here at the bottom, and it will be there all the time during the presentation. The bullets of the presentation are there, so if you have any comments, please write them down there, because I'm really interested in getting as much detail as possible in the feedback. The idea is that I will take about half of the time to present the idea, and then we are going to discuss, of course with you at the microphones, but also maybe on IRC if somebody is there, and we will take a look at the comments in the pad.

Okay, so let's start. Nowadays, if we want to patch something in Debian, what we do is something like this. We go to the Debian security tracker. We see that at the bottom there are some links. One of them goes to a GitHub commit, so we click there and get the diff that solves, or that we assume solves, the vulnerability that we want to fix. Then we run patch with that diff on the source that we want to fix, and we get a lot of errors. Basically, that's the workflow. Are any of you familiar with this workflow? Have any of you tried to do this? Okay, that's good. So this is mostly for you. I don't know what I'm going to do with the rest, but I'm happy that you're here. So the question is how to automate this process.
So for that, at IBM (IBM is my employer, and half of my time I'm working on this project there) we thought: okay, maybe what we can do is pull all the URLs that we see in the security tracker and put them in a database. Then we go, for example, to GitHub and collect the patches that we find there, and then we try to massage them to see if they fit in the vulnerable source code. Once we have the results, we put them back in the security tracker. The reason why the monkey is there is that, for now, we put it in a Greasemonkey extension. You will see how it works.

Let's go to demo time. This is, for example, one vulnerability. This is a moving target, because all the time they are fixing problems in Debian. We tried to make sure that all the demos work, but maybe something will not work. So the normal security tracker looks like this, and ours looks like this. I'm not sure if you noticed the difference, but the difference is that there is a column that didn't use to be there. That's the column that we are adding now with the Greasemonkey extension. So if you activate Greasemonkey, we see another column, and we see, for example, that we have a green check mark. That means that one of the patches listed here at the bottom (you can see at the bottom there are two patches from GitHub) could be ported automatically by the application. The other one is still in process; that's why it has this loop icon.

So if we go there (it takes some time to load), from there we can just download this patch, which is fully quilt compatible. So if you download it and import it in your quilt, it will work. There will not be broken hunks and that kind of thing. So automatically we grab it from GitHub and we run it on the vulnerable source code, in this case version 7 3 2 5. Once it is merged, we try to check, for example, if it compiles.
If it compiles, we have some sort of guarantee that the patch is good. Sometimes the build process also goes through testing and that kind of thing. So that's the main idea of the project.

Here, for example, these are the stages that we pass through. In the initial stage we download the source code, and then we apply the hunks one by one, for example this hunk. We apply it there, and then at the end we build. If everything is fine, we say okay. This is a super easy, trivial case. You don't have to deal much with it; you can just download the patch.

We have other situations too. For example, in this example, check out the three patches provided by GitHub, by upstream. There are these three X's. X means that for some reason we couldn't adapt the patch to make it go through. But there is also one check mark. That is because we could get the patch from jessie. Oh, sorry, that's a bug there: it should say wheezy, because in wheezy it's already fixed. So we could go to wheezy, and because the quilt format has the header and that kind of thing, we knew that that patch fixed that vulnerability. We could adapt it because the version is exactly the same, or almost the same. That adaptation was so much easier than trying to backport something from 8.1.3. So basically we took the patches from here, or from here, adjusted them to here, and we know that it now works. So we can also download the patch from here.

If you noticed, there are some situations that say "single solution" here at the top. Why is that? Because as part of massaging the patch to port it automatically, we introduce ambiguities. What does that mean?
Check out this situation. Here we have some patches that have this double check mark. The double check mark means we have multiple solutions. How do we end up with multiple solutions, with multiple patches? Well, in the process of trying to port the patch back to another version, maybe, for example, we remove the offset. So we say: okay, try to find the context of this patch anywhere in the file. And maybe that context is repeated in the file.

For example, in this case we tried to backport this patch. It's a super simple patch, but we could find several places where it could match. In this particular case we found two places where it can match. It can match in this context. You don't have to pay attention to the details here, but basically this is one possible patch, and this is also one possible patch. Notice that there is a different line number: this one is 283 and the other one was 259. Or we have a third option, where we patch both places. Okay, I don't know what happened here; this is incomplete, but it should have both. We should insert these lines twice in the file. So again, patch one was that example. Thank you. Oh, yeah, thank you, good catch. So here you can see that the same insertion was made twice.

That's why we end up with these multiple solutions, where you can download three versions of this backported patch. And that's what we represent with this small tree. There are other situations like that one.
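To make the ambiguity concrete, here is a minimal Python sketch of the situation described above. It is not the project's code: the function names, the sample source, and the inserted line are all invented for illustration. Once the line offset is ignored, the context can match in several places, and every non-empty subset of those places is a candidate backport (two matches give three candidates, as in the example from the talk).

```python
from itertools import combinations

def find_context(lines, context):
    """Return every 0-based index where `context` occurs as a contiguous block."""
    n = len(context)
    return [i for i in range(len(lines) - n + 1) if lines[i:i + n] == context]

def candidate_patches(lines, context, new_line):
    """Yield (positions, patched_lines) for every non-empty subset of matches."""
    matches = find_context(lines, context)
    for size in range(1, len(matches) + 1):
        for subset in combinations(matches, size):
            patched = list(lines)
            # Insert from the bottom up so that earlier indices stay valid.
            for pos in sorted(subset, reverse=True):
                patched.insert(pos + len(context), new_line)
            yield subset, patched

# Hypothetical target file in which the hunk context appears twice.
source = [
    "def read_a(buf):",
    "    n = len(buf)",
    "    return buf[:n]",
    "def read_b(buf):",
    "    n = len(buf)",
    "    return buf[:n]",
]
context = ["    n = len(buf)"]
candidates = list(candidate_patches(source, context, "    check(n)"))
for positions, _ in candidates:
    print("insert after context at lines", [p + 1 for p in positions])
```

With n matching positions this enumeration produces 2^n - 1 candidates, which is why larger patches with several ambiguous hunks quickly become hard to handle.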
Here is another one. Again it is the insertion of a single line. The insertion of a single line can happen twice, or a single time, or three times, and this one happened three times. So we represent these ambiguities with this tree at the bottom.

What happens sometimes, for example in this case, is that the real patch is the combination of those two commits. So if we want to create the patch that combines both commits, we do the following. We take one of the existing patches and create something based on it, so we click there. Here we have all the hunks that are included, and we can choose which ones we want to insert and which ones we don't want to consider. We can filter by file, by CVE, or by origin; for example, in this case both come from GitHub. So in this case we change this filter, we focus only on this vulnerability, and we just add the other hunk that was in the other commit. Once we have chosen all the hunks that we want to insert, we click proceed, and the backend will again try to merge them and again try to compile.

In addition, we can say, for example, for this hunk in particular, try a different strategy. So we have this idea of heuristics, which are ways to insert a patch in an old version. For example, let me see the default strategy. Let's take this one. It says: first it tries normal patch, the command patch. Then it tries to find, in the same file, any offset where the context matches. Then comes something called fuzzy patching, which basically ignores the first line of the context; then we also ignore the bottom line of the context. That's what we do with these parameters. We have other strategies, for example, ignore the file name.
Maybe this diff belongs to a different file, because the file was renamed or something like that. So we can set different strategies, different heuristics. We have to consider that the more generic these heuristics are, the more ambiguities we may introduce.

So that's yet another thing that we can do. When we click proceed here, it is going to ask for a token, because this pushes into the backend. I'm not sure if I will be able to... And then this will put the full process in the queue, and after some time it will give you the results, to see how that went. If you go back, it will probably show as in progress, or probably waiting, because I'm processing some patches now. Yeah, so now this one is waiting. Okay, so you can create your own patches very quickly from the web, based on the patches that we already have there. A very common thing is, for example, that the patch also includes changes in the changelog or something like that, which we don't care about. So we can disable those automatically.

So far, with very simple strategies, which are offset, fuzz, and ignore file names, of all the patches that we automatically found, which is the full bar, we could automatically port the ones in blue. That may not look like a lot, but consider that these ones are patched for free. They are around 100. Those 100 patches are trivial cases: you can directly download the quilt patches and it's done. You don't have to do anything else than compile, and of course make sure that the patch patches what you want to patch. That is usually the case, because the security team makes a really good effort in putting the patch information in the security tracker.

So this is basically a small list of the features that we have so far.
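The strategies mentioned above (strict patch, any offset, fuzz on the first and last context line, ignoring the file name) can be sketched as a cascade that tries the strictest rule first. This is only an illustrative Python sketch under my own assumptions: `locate`, `find_block`, the strategy labels, and the sample files are hypothetical and are not taken from the project's source.

```python
def find_block(lines, block):
    """Return every 0-based index where `block` occurs contiguously in `lines`."""
    n = len(block)
    if n == 0:
        return []
    return [i for i in range(len(lines) - n + 1) if lines[i:i + n] == block]

def locate(files, name, line, context):
    """Try progressively looser strategies; return (strategy, [(file, index), ...])."""
    target = files.get(name, [])
    # 1. Strict: the context must sit exactly at the stated (0-based) line.
    if target[line:line + len(context)] == context:
        return "exact", [(name, line)]
    # 2. Offset: the context may sit anywhere in the same file.
    hits = find_block(target, context)
    if hits:
        return "offset", [(name, i) for i in hits]
    # 3. Fuzz: drop the first context line, then also the last one.
    #    The returned index is where the trimmed context starts.
    for label, trimmed in (("fuzz-top", context[1:]), ("fuzz-both", context[1:-1])):
        hits = find_block(target, trimmed)
        if hits:
            return label, [(name, i) for i in hits]
    # 4. Ignore the file name: look for the full context in every other file.
    hits = [(other, i) for other, lines in files.items()
            if other != name for i in find_block(lines, context)]
    if hits:
        return "ignore-filename", hits
    return "failed", []

# Hypothetical source tree with a single file.
files = {"net.c": ["int f(void) {", "  int len;", "  read(len);", "  return len;", "}"]}
ctx = ["  int len;", "  read(len);"]
print(locate(files, "net.c", 1, ctx))      # context is at the stated line
print(locate(files, "net.c", 3, ctx))      # context moved within the file
print(locate(files, "renamed.c", 0, ctx))  # file was renamed
```

Any strategy after the first can return more than one location, which is the trade-off mentioned above: the looser the rule, the more ambiguities it can introduce.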
For testing that the patch works, or that it at least makes syntactic sense, we build it in a pbuilder. Also, we provide you a quilt-ready patch. "Ready-ish", because, for example, it does not include the header that DEP-3 asks you to include. But that's probably something that will come in future versions.

We also have multiple providers. In some cases these are different distributions (we don't pull from many different distributions yet), but we also pull things from other version control systems. Some projects have their own version control system, so we pull from there. It would also be great, for example, to pull from different distributions.

We have heuristics that are more flexible than patch. Patch is a good tool, but it's a bit old and was built with a different mindset somehow. So we reimplemented patch, and we are very flexible in what we can do with it. And we manage to visualize ambiguities, which is yet another thing that patch cannot do.

We can easily spot which are the trivial cases. So sometimes I am standing in the queue in the cafeteria at work and can see: okay, this patch will not take me more than 20 minutes, because I know that it works. I know that it works, meaning that I know it will not be super long. I can do it in 20 minutes; I just need to compile it. And if it doesn't work, I can quickly rerun something based on another strategy.

The goal that we started the project with was to have a platform where security patches are shared, automatically ported to different versions, and where multiple stakeholders can define together
what is a good patch, and consume solutions from others. Because remember the ugly process that I told you about at the very beginning: the same process is done by many distributions at the same time, by many derivatives at the same time. HP is doing the same thing for their own custom packages; they try to backport patches constantly. So if we have a place where we already know, okay, this is the patch for this vulnerability and this is the way that you have to adapt it to your old version, it's just one click away. We can automatically do this transformation, and people can define together what is a good patch. Maybe with thumbs up, I don't know exactly, but some way where you say: okay, this patch is the good one, just ignore this part. That would be great. It would also be great to have it in a way that everybody can consume it, for example with a REST API or something like that. So we stop, once and for all, duplicating work, because I know that a lot of time is being lost there.

So with this goal in mind, let's focus a bit more on the roadmap. Something that we have to do, at least this week, is to improve the performance. We are doing a lot of things on the client side that we could do in the backend, so we should move those to the backend.

It would be great to integrate better with the Debian security tracker, of course, if that's okay with you. For instance, it would be great to have something like a field called PATCH instead of NOTE, because right now we are just pulling every URL that appears in the tracker to see if it is a patch. Instead of doing that, there could be a way to say: okay, these are patches, don't go around searching for things. I'm sure you already know what I'm talking about, but I can show you quickly. So this is the source of the security tracker.
This is the file that the security tracker consumes for generating the website. This is a good example of why we need it. Instead of saying NOTE, we should say something like PATCH, and then the URL here. Because what we do right now, for example, is go to that URL, say "this is a patch", and try to backport that one. But that's not a patch. That's the point where the bug was introduced. So obviously what we are doing there is wrong. Okay, this one is the typical situation, where in that commit, in this line, the vulnerability is solved. So this is the case of a patch. What I'm suggesting is, instead of having NOTE, basically having PATCH, so we can automatically, or quickly, know which ones are patches.

And of course this way of pushing into the database with a token is not nice. We plan to have a user registration interface, but for Debian developers it's probably just easier to use the credentials that I know we already have. So that is the integration part.
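As a rough illustration of why a dedicated field would help, the Python sketch below contrasts "grab every URL in the entry" with "read only the lines tagged as patches". The entry text is invented and only loosely imitates the tracker's data file; the `PATCH:` field is the proposal from the talk, not something the tracker is known to support, and the function names and example.org URLs are hypothetical.

```python
import re

URL = re.compile(r"https?://\S+")

def all_urls(entry_lines):
    """Current approach as described in the talk: every URL is a patch candidate."""
    return [url for line in entry_lines for url in URL.findall(line)]

def tagged_urls(entry_lines, field):
    """Proposed approach: only trust URLs on lines that start with `field:`."""
    urls = []
    for raw in entry_lines:
        line = raw.strip()
        if line.startswith(field + ":"):
            urls.extend(URL.findall(line))
    return urls

# Invented entry; the PATCH field does not exist in the real tracker data.
entry = [
    "CVE-0000-0000 (example entry, not a real vulnerability)",
    "\t- somepackage <unfixed>",
    "\tNOTE: Introduced by: https://example.org/repo/commit/aaa111",
    "\tPATCH: https://example.org/repo/commit/bbb222",
]

print(all_urls(entry))              # includes the commit that introduced the bug
print(tagged_urls(entry, "PATCH"))  # only the fix
```

The first call also returns the commit that introduced the bug, which is exactly the wrong backport attempt described above; the second returns only the commit that was explicitly marked as a fix.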
We can discuss a bit more about this later.

As I say, it would be great to be able to crawl other distros searching for patches. We tried to do something with the Red Hat website, but it's not so easy to find which patch solves a particular vulnerability.

In the same way that I'm suggesting a PATCH field, it would be great to have a PoC field. Sometimes we add a proof of concept in the tracker, where we know that this file is the one that makes the program fail. Maybe we can use that input that makes the program fail to check if the patch is good. I don't know exactly how to automate that, but it would be great to have something like that. It would give you even more certainty that the patch is a good one.

About interface improvements: for that I need you to test it and take a look at it. Here in the URL you can find instructions about how to install it. If we integrate it, it will be easier, but right now you can go there and try it. If you need a token for pushing things into the database, just let me know and I can give you one. The database is an important part of it. I have been using it a bit and trying to detect some things here and there, but you will probably come up with better examples.

At IBM a lot of people are very excited about using machine learning in this context. I'm not a machine learning guy, so I have no idea how exactly, but maybe, for example, it could be possible to find the injection point of the patch with machine learning.
I don't know how. Also the custom runs: instead of doing the customization by hand, maybe a machine can learn to do that. Again, I have no idea exactly how, because that's not my field. Let me see if something about this part is missing. Yeah, also, for example, if we start ignoring things like changes in the changelog, maybe the machine already knows that we usually ignore text files when patching, so maybe the backend can ignore those automatically. If somebody here is good at machine learning and has ideas about how machine learning can be applied, I'm happy to hear more about it.

Lastly, it would be great to add new, fancier ways to include a patch in backported versions. Right now we have really basic heuristics. But, for example, if we take all the situations that we cannot handle and make a taxonomy of them, see why they don't work, some sort of classification, maybe we can come up with more generic ways to solve those situations.

Something that has existed in academia for some time already, developed by Inria in France, is called semantic patches. It's an idea that I would like to talk about briefly, as an example of what a fancy heuristic could be. For example, take this patch. This patch solves an old problem; it's an old bug. It solves a typical serialization issue. The patch basically removes the unsafe use of pickle in Python, adds a new module for doing proper serialization, and then modifies the call sites to use it for serializing. And this hunk at the bottom is repeated several times in the patch. So if we have this target, this patch is not working; it's not merging properly. Okay. Can you see why?
Check out line 19, sorry, line 18, which reads the response. It should match line 17, no, 79. But the variable was renamed. Because of that difference, the matching does not work. Semantic patches abstract from this by using the AST instead of the syntax. They abstract from variables; they use metavariables. So it doesn't match on the name of the variable, and it finds this matching point quickly.

Another thing, which is slightly subtle to see, is the fact that the indentation is different. That also makes the patch hard to merge in the target. Semantic patches abstract from that kind of syntactic issue.

Another thing that semantic patches have is something called isomorphisms, which is based on the fact that all these lines in C do the same thing. So basically, if I search for a match for the first one, it doesn't matter if it is written like the second one. For C, the Inria project already has a full database of isomorphisms, pieces of code that do exactly the same thing. But it would be possible to extend this to other languages, for example Python. Look: the patch at the top needs to match the target at the bottom. Lines 19 and 20 are semantically equivalent to line 12, but at the top we are using concatenation, while at the bottom we are using string substitution. If there is a way to say that line 12 is doing exactly the same as line 19, then the merge of the patch is straightforward.
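The renamed-variable and indentation problems above can be illustrated with Python's standard `ast` module. This is only a toy sketch of the idea of matching on the syntax tree with metavariables; it is not how semantic patches are implemented, the two sample lines are invented rather than taken from the slides, and it does not handle isomorphisms such as concatenation versus string substitution, which would need explicit equivalence rules.

```python
import ast
from textwrap import dedent

class Metavars(ast.NodeTransformer):
    """Rename every variable to a positional metavariable (v0, v1, ...)."""

    def __init__(self):
        self.names = {}

    def visit_Name(self, node):
        # The same original name always maps to the same metavariable.
        new = self.names.setdefault(node.id, f"v{len(self.names)}")
        return ast.copy_location(ast.Name(id=new, ctx=node.ctx), node)

def shape(code):
    """Dump the AST of `code` with indentation and variable names abstracted away."""
    tree = Metavars().visit(ast.parse(dedent(code)))
    return ast.dump(tree)

# Invented lines: same statement, different variable names and indentation.
patch_line = "data = response.read()"
target_line = "        body = resp.read()"
other_line = "body = resp.readline()"

print(patch_line == target_line.strip())       # textual match fails
print(shape(patch_line) == shape(target_line)) # tree match succeeds
print(shape(patch_line) == shape(other_line))  # a different call still differs
```

A textual comparison rejects the target line, while the comparison on the normalized tree accepts it and still tells a different method call apart.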
So I have tried to start playing with these kinds of ideas. There are a lot of engineering issues that I couldn't finish, but if you come up with other ideas, I'm all ears; it would be great.

Everything is under the MIT license. You can download the source code at this URL. The patch crawler is written in Node.js, in case you want to write something there. The patcher itself is written in Python.

Another thing that you can contribute is adding patches to the security tracker. If you know which commit on GitHub fixes a security issue, let the security team know. And the last point: if you are fixing a vulnerability, put the CVE in your patch header. That's key, because we can also use that information for porting that patch to a different version.

So that's all of my presentation. Now we start playing with you. Let's see if you have comments or ideas.

I'm already here. Thanks, that looks nice; I'm willing to try it out. A couple of features that I had in mind. One that would be particularly useful is to check whether a patch does not apply at all because the vulnerable code might not yet be present, for example when patching a function which does not exist. That is something that could be mostly automatic. Another...

So hold on, that's super interesting. What you suggest is that if the part that we are trying to match does not exist in the code, maybe the code that we are trying to patch is not vulnerable.

Yeah. I think there could be pretty reliable heuristics to detect whether older code is not affected at all, or at least to give a strong indication that I can still review.

That's a good one.
And I think another one. Let's say there is some hypothetical application in version 3, and the upstream fix adds a new library function which fixes a specific problem, like introducing a length check, etc. For that patch, the upstream authors usually only fix the call sites, and of course only the call sites which are present in version 3. Whenever we receive patches, one of the things where you need to be very, very careful about whether the backport has been done right is whether actually all call sites in the older version are fixed. For example, the fix involves introducing a new function which now verifies user input, and the upstream author has fixed all the users of the affected function in the current version. But the old version might simply have different functions. So just applying a patch with the existing hunks might simply miss existing affected functions in the original version. So that would be really useful too. I have no idea how to automate that idea, though.

Yeah, we can discuss it later. I can add it to the pad.

And speaking of the changes to the security tracker: we can certainly do all that. I think it needs some further discussion on the specific syntax and the design. Right now NOTE is like a catch-all for any arbitrary comments, but we can definitely make this a little bit more structured, for example with the fixed version and the specific branches that it pertains to.

Yeah, it could be, for example, that the patch depends on specific versions. It's not generic.
It's not generic Yeah, yeah, I think Okay, thank you any added comment What are you back another one from I don't think this necessarily of course some of the existing data sources are title security tracker, but I don't think this necessarily should be Limited or tied to security updates. We have the very same problem with backporting standard Standard functionality bug fixes to a model release. Yeah, so I don't think this should be like a part of security tracker For the specific case of backporting security vulnerabilities. It would reuse a lot of the logic, but I think it should be equally supported and So the use case should be kept in mind to simply Yeah, the thing is we start which has patches in jail but security patches has the feature of of being Super small and that and that being key because when you have a huge patch the the possibilities of not having a match is huge too and even worse When you have these ambiguities They explode exponentially. So if in every step they have new ambiguities You end up with a lot of possible patches at the end of the of the trade. That's why we we focus on security patches to Just keep it Keep the patches small So I will see some also inside. That's like you Yep So thank you so much for your attention