Good morning, everybody, and welcome to the security track at SCaLE 21x. Our presentation is "The Open-Source Fortress" and our presenter is Andrei.

Hey, good morning, and thanks for joining. I'm Andrei, and we will spend one hour together talking about how we can find vulnerabilities in software using open-source tools. If you have any question during the presentation, please feel free to interrupt me and we can discuss it. Let me start with a bit of context about myself. I started an open-source startup one and a half years ago with a bunch of friends, and after working on it and participating in different programs for startups, we decided to stop it because of lack of traction in the open-source space and the business space. That also came with the decision to leave my previous job, which was in the army, the Romanian army to be precise; this is the photo I have from my last day, in the server room. After that I was lucky enough to join Canonical, on the Ubuntu Security team, securing the distribution; this is a photo with my team in Bilbao, where we participated in the Linux Security Summit and had a physical sprint at the same time. Thanks to Alex Murray, I started doing short sections in the Ubuntu Security Podcast covering academic papers, so if you want to hear that, give it a try on Spotify or whatever podcasting platform you prefer.

So let's dive into the content. Open source implies the code is open, namely anyone can read it, and that includes an attacker, who can read the code, detect vulnerabilities, and exploit any instance set up by anyone on the internet. In my opinion, it doesn't need to be this way. We already have open-source tools that we can install to detect these vulnerabilities before the attackers do, in a shift-left approach, during the development of the code, and this will also minimize the risk of publishing
vulnerable software on the internet. Let's start with an example: Roundcube. Roundcube is a browser-based IMAP client and, as their website says, it provides the full functionality you expect from an email client, for example Outlook or Gmail. This is a graph of their GitHub stars, just to get a grasp of the adoption of the tool in the open-source space; you can see a steady increase, so the software is, in my opinion, widely used in practice. You can also judge by this Shodan query. For anyone who doesn't know it, Shodan is a search engine for computers on the internet: you create a query, run it, and Shodan provides a list of all the hosts on the internet that match it. Using this simple query, `http.component:"roundcube"`, we can see there are 161k hosts on the internet, which somehow proves our previous assumption. After that, we can look at which programming languages this codebase uses. What I did here is just clone the repository, change into the cloned directory, and run a language-statistics tool over the current folder, piping the output through `head`. The top programming languages in this codebase are PHP, SQL, JavaScript, and HTML. If anyone here has a homelab, maybe this is familiar to you: this is the web installation page for Roundcube, where you can basically plug a lot of details into your browser and they will be stored on the server. What happens is detailed in this diagram: we have the user, who configures the Roundcube instance through the user interface, and the user-controlled configuration is stored in the internal configuration. After the initial configuration, if anyone sends an email with a non-standard image, Roundcube gets the path to the conversion program, which was set in that initial configuration, and calls its `exec` helper to convert the image into something Roundcube understands. So basically
we have two main flows here, one for installation and one for email processing, and the second one is the vulnerable one. I want to ask you: what's missing here, judging by this raw user-controlled configuration? Yeah, that's right. And any other aspect that's missing? Input sanitization, maybe. It's not only that the user is not authenticated; we also have the problem of the input not being sanitized, namely any command executed in that last execution path is controlled by anyone on the internet. So an attacker can send a POST request to the installer; you can see there an `im_convert_path` containing a PHP command for a reverse shell. After that, the attacker sends an email containing an image in a non-standard format, for example TIFF, that Roundcube doesn't understand by default. Roundcube will try to convert the image to JPEG using the command provided when configuring the instance, which is the value of the `im_convert_path` parameter, and in this way the attacker gets a reverse shell. Basically, anyone on the internet can scan for hosts, send this kind of request, then send an email with a non-standard image, and get a reverse shell into these instances. This is known as CVE-2020-12641: there are multiple configuration items, like the `im_convert_path` we just saw, which led to arbitrary code execution in the instance. The CVSS score for this vulnerability was 9.8, with an exploitability score of 8.1, which is pretty high, so if you have a vulnerable Roundcube instance on the internet, there is a high probability of being exploited. During the last year this was also used by APT28, the advanced persistent threat actor, a huge name in the bad-guys scene, which compromised Ukrainian organizations' servers during the Russia-Ukraine war, and it was added by CISA, the American cybersecurity organization, to their Known Exploited Vulnerabilities catalog.
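The heart of the bug is interpolating attacker-controlled configuration into a shell command. Here is a minimal sketch of the same flaw in Python (the real Roundcube code is PHP, and these function names are invented for illustration), with `shlex.quote` playing the role of PHP's `escapeshellarg`:

```python
import shlex

def build_convert_command(convert_path: str, src: str, dst: str) -> str:
    # VULNERABLE: the configured program path is interpolated verbatim,
    # so a value like "convert; id" smuggles a second command into the line.
    return f"{convert_path} {src} {dst}"

def build_convert_command_safe(convert_path: str, src: str, dst: str) -> str:
    # SAFER: quote every attacker-influenced piece so the shell treats
    # each one as a single literal word.
    return " ".join(shlex.quote(p) for p in (convert_path, src, dst))

print(build_convert_command("convert; id", "a.tiff", "a.jpg"))
print(build_convert_command_safe("convert; id", "a.tiff", "a.jpg"))
```

The second variant turns the injected `;` into inert characters inside a quoted word, which is exactly the escaping step the vulnerable function skipped.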
Then the question is: was this preventable? What do you think? Yes, it was, but there is a "but": not with standard linters and scanners. As you previously saw, there are functions, for example Roundcube's `exec` helper, that Bandit or Cppcheck or any other linter will not understand by default. This is the vulnerable function, which doesn't do the input sanitization we mentioned: you can see there, after the comment "executable must exist", that if that check passes, the command is returned as-is, basically the raw user-controlled configuration, without any shell escaping. One possible solution for this is taint analysis, which we will discuss in a following section. Basically, we track the execution flow using a source of data, namely Roundcube's `get_instance()` configuration object, and we set a sensitive sink: if this data ends up in the sink, for example as an argument to a function call, then we have something sensitive there. There is something named Semgrep, a code-querying engine we will discuss later in the presentation, and this is a YAML rule created for Semgrep to detect this vulnerability. We basically set the source, using a pattern with the function and the object we previously mentioned; the sanitizer that must be called for the data to be considered clean, in this case a shell-escaping function; and the vulnerable sink, which in this case is the `return`. If we run this Semgrep rule over the function we saw before, we get two matches: the returns of commands without any sanitization function called. So what I want to detail today is the Open-Source Fortress. There is an internet-facing website hosting the whole presentation and the workshop that backs it; it is basically a collection of open-source tools that you can
practically use to detect vulnerabilities in your codebases. There are multiple sections: factual information about software security topics, a brief presentation of each analysis technique, and some practical examples, namely exercises you can do at home to test these tools on your own. This presentation will cover only the bolded items here; I hope the quality of this slide is good enough... judging by the smiles, it's not. So let's take this one by one. The first step is threat modeling: we need to think about how an attacker could attack our application. After this process we have a threat model; for example, if we have a blogging page, we can think "hey, maybe anyone on the internet can brute-force this", so there is a threat there and we should consider it in later development phases. After that, there are two branches. One is manual vulnerability detection: I myself, as a security researcher, do some manual steps to detect the vulnerabilities in the code. The other, the bottom branch, is automated vulnerability detection, which has multiple steps: first, running the tools and aggregating the results; then triaging, namely seeing which warnings are valid or not; and in the end we end up with vulnerabilities coming from both processes, the manual and the automated one. For the second approach you can also think about a CI/CD pipeline with quality gates: for example, "after each PR I receive for my codebase, I run a lot of linters and security scanners to see whether that change is vulnerable or not; if it's not, I merge the changes into the main branch". After having these vulnerabilities, we have two approaches: either execute defensive activities, if for example the codebase is ours, or attack the codebase if we
are a malicious person. For the defensive activities, we need to research the vulnerability. For example, if you have experience with AFL or any other fuzzer, you can detect crashes: blobs of data which don't mean anything, where you only know that the program is crashing but have no further context. In this step you basically need to take the crash, the data that is crashing the application, and do some research to understand why your application is crashing. This also includes CVSS approximation. You can see there a string with multiple components for computing a score between zero and ten, which is the severity of the vulnerability; this one is for Log4j, and at the end you can see something like C:H/I:H/A:H, which means confidentiality impact high, integrity impact high, and availability impact high. Besides this we have the CWE, which is basically a classification of the vulnerability; in the case of Log4j we have CWE-502, which stands for deserialization of untrusted data. You'll have the presentation afterwards and the links are clickable, so you can deep-dive later. We should also request a CVE ID. For example, if I'm a maintainer of, let's say, OpenSSL, and I want the users of my library to be concerned and to update their libraries, what I do is publish this CVE, which is basically an identifier plus a bunch of information about the vulnerability itself; in the case of Log4j there is this unique string, which may or may not be well known to you. Another defensive activity is patching, for example this patch that is linked here from Oracle, plus communication with stakeholders; this is the remediation guide from Apache. As previously mentioned, you can also do offensive activities, for example writing malware. This is not something I am proposing, but it is something usually done by state actors, by anyone wanting to spread malware in the wild.
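The vector string mentioned a moment ago is mechanical to unpack. Here is a small sketch (the helper name is mine, not from any CVSS library) that splits a CVSS v3.1 vector into its metrics, using the Log4Shell base vector as the example:

```python
def parse_cvss_vector(vector: str) -> dict:
    # "CVSS:3.1/AV:N/.../A:H" -> {"version": "3.1", "AV": "N", ..., "A": "H"}
    head, *pairs = vector.split("/")
    metrics = dict(p.split(":", 1) for p in pairs)
    return {"version": head.split(":", 1)[1], **metrics}

# Log4Shell (CVE-2021-44228): network attack vector, low complexity,
# no privileges, scope changed, and C:H/I:H/A:H impact.
log4shell = parse_cvss_vector("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H")
print(log4shell["C"], log4shell["I"], log4shell["A"])
```

This only parses the vector; deriving the numeric score from these metrics is a separate formula defined by the CVSS specification.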
There are multiple steps here: you need to write an exploit and you need to find an attack vector. For example, continuing the Log4j example, you can target VMware Horizon instances and exploit Log4j. You can bypass mitigations, and here we have two TTPs, symmetrically-encrypted channel communication and mimicking legitimate services; again, there are links in the presentation for afterwards. Then weaponization, an ID for a TTP, and the exploitation part, in which you basically run the exploit over instances on the internet. But today we will look at Ubuntu Portrait, which is a vulnerable-by-design codebase, like WebGoat, like Damn Vulnerable Web Application, like CloudGoat: an application written to be vulnerable by default. Basically, it's a lightweight piece of software that runs on an Ubuntu server and allows users to control it through their browsers. I wrote it from scratch; it contains a lot of vulnerabilities, and we will look at multiple techniques for detecting vulnerabilities in its codebase. It should be deployed on-premise, and it's written in Python and C, so most of the tools are specialized for these two programming languages. There are 12-plus embedded vulnerabilities, and there is a plus there because the application depends on Python packages; as you know, vulnerabilities appear overnight, so maybe more vulnerabilities have appeared in the meantime. Let me show you how this looks. Here we have a login form, in which I use PAM credentials; I upload a file to the server by clicking some buttons, and there is also a functionality for exploring the filesystem. I'm showing you this to give you an idea of what the application is about, and to let you think about what can go wrong with this application. For
example, you've seen commands, a way to execute commands; what if we can run commands which are not embedded in the sandbox? This is the architecture of the application: we have a web UI which makes API requests over HTTP to a backend hosting a web API in Python 3 and Flask; the backend also connects to a shared object written in C for generating recovery tokens, and to the Linux authenticator for checking whether the provided credentials are valid or not. Nice. So the first technique for detecting vulnerabilities, or let's say a practice, not really for detecting vulnerabilities but for catalyzing the action of detecting them, is threat modeling, and this is an image perfectly illustrating what threat modeling is. We have some assets, the food in our fridge; we have a threat to our assets, the cat trying to get the food; and threat modeling means being inspired enough to assume that the cat will want to steal the food from the fridge. As previously mentioned, it means identifying the assets and the threats, so you should answer these questions for your codebases: what do we need to defend, and what can go wrong? I should mention here that threat modeling is already a legal requirement in some states, for example the USA and Singapore. Not all states are adopting this at the moment, but in my opinion this will be the future. There are multiple advantages to this kind of approach to threat modeling: you design a secure-by-default solution, you prioritize your activities, and you boost the confidence of your stakeholders, because, as you see, some states are requiring this, so they want to have confidence in the solutions you develop. The first open-source tool we will look at is OWASP Threat Dragon. Here is a quick demo: I'm creating a threat model here, or rather I'm using something that already exists, and after clicking the existing model
I can see the architecture items, for example the circles and the rectangles on top, and some details at the bottom explaining what each component does. The red lines over there are existing threats: for example, I have an authentication path, something there may go wrong, so I should think about that and plan some mitigations. The usual process for this open-source tool is creating a threat model using a diagram style; in this case, you know, we have STRIDE and CIA. We need to represent our assets, which can be stores, processes, actors, data flows, and trust boundaries, and to manually identify the possible threats, with type, status, score, priority, description, and mitigations. There are, for example, some commercial tools which automatically identify the possible threats: "hey, I'm using these AWS components, I'm linking these two together, what can go wrong?", and the tool automatically detects that information without requiring you to do it manually. Do you remember this image? Any Matrix fans here? No? There is this search scene in The Matrix, and it's an illustration of the next vulnerability detection technique we will look at, namely code querying. It means searching for a specific pattern in our codebase, with an optional abstract representation of the codebase, for example abstract syntax trees or control flow graphs. There are multiple query types here. You can search for literals, for example searching for `scanf`, but this is a dummy approach: you can have `scanf` in a comment and it will also match. You can use regexes, and basically I'm writing here some parentheses for detecting the calls; this is still dummy. But I also have some data structures, and there is a tool called Joern which has a specialized query language: after the tool creates an abstract representation, it provides the ability to query that tree, so
I can write here `cpg`, which is the code property graph, then query for the `scanf` method and, after that, all its calls. The advantage of this technique of code querying is that you can have community queries which are generic, but which can help you detect vulnerabilities you are not even aware of. This is an example of an abstract syntax tree; this is what the computer, or compiler, understands, so we need to think about ways in which we can query it. For this we have Semgrep. The installation process is pretty straightforward: just pip-install Semgrep. After that we need to write these kinds of YAML rules. The first example is the one regarding Roundcube; this one is for detecting all logging calls, through `logging` or `logger`, that use passwords or tokens. We are trying to detect all the passwords or tokens which are logged by the application, which, as you can assume, is a vulnerable pattern; we should not do this in our codebases. After that, I run Semgrep with the scan functionality. I want to export the results in the SARIF format, which is something proposed by Microsoft for standardizing the results of static analyzers; I mention the configuration path, where the output should be stored, and the codebase, in my case the Portrait application, the one we described five minutes ago. After running the tool, I can see that 17 files were scanned and that there are nine findings, namely nine vulnerabilities in this codebase. And this is a snippet from Ubuntu Portrait in which, in fact, the username and password were logged after authenticating a user. You can also test this at home using Semgrep and the codebase of this workshop. What is Semgrep? It's a partially open-source code scanner. I'm saying partially because they are an open-core startup: they have an open-source solution, but they also provide pro features which you should pay for. It supports 30-plus languages.
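Both the Joern query and the Semgrep rule boil down to matching nodes in a parsed representation of the code rather than in the raw text. Here is a miniature of that idea using Python's standard-library `ast` module (the helper is mine, not part of either tool): unlike a grep, a `scanf`-style name inside a comment does not match, because comments never reach the tree.

```python
import ast

def find_calls(source: str, name: str) -> list:
    """Return the line numbers of every call to `name` in `source`."""
    return [
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    ]

code = (
    "# eval(trusted)  <- only a comment, so no match\n"
    "x = eval(data)\n"
    "print(x)\n"
    "y = eval(other)\n"
)
print(find_calls(code, "eval"))
```

Real engines go further (call graphs, data flow, cross-file resolution), but the querying primitive is the same: walk a structured representation and filter nodes.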
There are no prior build requirements, and in my opinion this is a huge advantage. You have, for example, Coverity, but Coverity has the disadvantage that you must first build your codebase and only after that run the tools; for Semgrep that's not the case, it just scans the code and that's it. Another advantage of Semgrep is that it has no DSL. If you remember the Joern query, you have seen that there are multiple weird names combined to form the query; with Semgrep, if you want to query Python code, you write Python code in the end, so it's a low barrier to entry. And the advantage previously mentioned: it has default rules, a default configuration you can use for scanning your codebases, but there are also third-party rules, written by members of the open-source community, that you can take and run on your codebases. The next technique: yeah, another classic here. This is Idiocracy, from 2006, in which people are dumb enough to not be able to solve these kinds of games. The question here is: what if we take the same dumb approach for testing our programs? This is what fuzzing does: "hey, let's run the program with some random inputs and see what happens". In this case a crash in the program is basically a security issue, which may or may not be a true statement. For example, we can catch null-pointer dereferences, and we can use sanitizers, such as AddressSanitizer and UndefinedBehaviorSanitizer, to increase our success rate. Basically it's a BFS traversal of the control flow graph, and there are multiple optimizations for running our code faster, such as instrumenting the code or knowing the input format, which help the fuzzer do a better job. This is the architecture of a fuzzer, but we will not deep-dive into that; maybe leave it for home. As an open-source tool for
implementing fuzzing in your codebases, you can use AFL++. There is a straightforward installation: you basically need to run the Docker image and that's it. And this is a harness, namely some code that we run beside the code of the application, gluing the fuzzer and the codebase together. You can see here that we have a file that is read, basically the file produced by the fuzzer, and in the end there is a call to `generate_recovery_token`, the function we want to fuzz; basically we run that function with two arguments generated by the fuzzer. This is how we run AFL. First we compile our program using `afl-cc`, an alternative to GCC or Clang or whatever you use, which instruments the codebase so the fuzzer can do a better job. We also include the debug symbols using the `-g` flag, we specify the executable to produce, namely the crash-me-if-you-can ELF, and the C source files used as input. After that we run the fuzzer, `afl-fuzz`, with the `-i` flag specifying some dummy inputs (for example, if we want to fuzz a parser for images, we can provide some example images for the fuzzer to better understand the format), `-o` specifying where the output should be stored, and in the end the executable we want to fuzz, plus the `@@` placeholder specifying where the fuzzed data should be placed, in this case as an argument. We run the program, and we can see that after some milliseconds there is the first crash: you can look at the findings in that section, and there is a total crash count, which is one. This is the code that triggered the crash; you can see here that there is a buffer with a fixed size into which we copy the data specified when calling the function. So basically this is how we can use a fuzzer to detect vulnerabilities in this kind of complex code, which is, in my opinion, hard to audit manually, or hard to understand if you are not the author of the codebase.
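To make the mechanics concrete without AFL++, here is a toy, coverage-free fuzzer in Python: a target with a planted bug that blindly trusts a length prefix (the same spirit as the fixed-size buffer copy in the C code), and a loop that throws random bytes at it and records every input that raises. All names here are invented for the sketch:

```python
import random

def parse_token(data: bytes) -> str:
    # Planted bug: the first byte claims the body length and is trusted
    # blindly, much like copying into a fixed-size buffer in C.
    length = data[0]
    body = data[1:1 + length]
    if len(body) != length:
        raise ValueError("declared length exceeds the actual data")
    return body.decode("latin-1")

def fuzz(target, runs: int = 2000, seed: int = 1) -> list:
    # Dumb random fuzzing: no instrumentation, no corpus, just noise.
    rng = random.Random(seed)
    crashes = []
    for _ in range(runs):
        blob = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 16)))
        try:
            target(blob)
        except Exception:
            crashes.append(blob)
    return crashes

print(f"{len(fuzz(parse_token))} crashing inputs found")
```

Real fuzzers like AFL++ replace the blind random generator with mutation of a seed corpus guided by coverage feedback, which is what finds bugs buried behind format checks.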
So this was a demo of AFL++, a fork of American Fuzzy Lop. Besides AFL, it has additional features: QEMU emulation, persistent mode, which helps us create snapshots of the process and run the program faster and faster, and other optimizations. It's nice that Google has already embedded this tool into OSS-Fuzz. A short introduction here: if you are a maintainer of an open-source tool in a memory-unsafe language, you can basically create a PR for OSS-Fuzz integrating your application into their fuzzer, and they will allocate Google infrastructure to fuzz your code, basically to detect vulnerabilities in your code, which is pretty nice. There are a lot of huge projects in the open-source space already leveraging this, and they have nice trophies, nice vulnerabilities detected with this approach. Another technique for finding vulnerabilities is secret detection. I am Romanian, in fact, and we had this habit of hiding our keys under the mat, which is really insecure: a thief can look under the mat and find the keys. We can transpose this approach, finding the keys under the mat, from real life into software development, and here we have secret scanning. There are multiple types of secrets in a codebase: API keys, credentials, tokens. These are sensitive items which should not be included in your codebase; you should use, for example, environment variables, or standard input, or whatever you prefer, but not include them in your Git repository. This approach means scanning for a specific pattern, or for the entropy of a secret, and, as in the case of code scanning, we have community or generic rules.
For example, there are rules created by the community for scanning for AWS tokens; you can import them for your codebase and run them, and that's pretty much it, there is low effort for the practical part. For the next open-source tool, we will download the release of Gitleaks: you basically need to go to their GitHub, click the releases tab, and download the binary for your architecture. The run is pretty straightforward: just use this command, which specifies where the source code is stored and where the results should be stored; the last flag is for redacting the secrets, so the results will not include the secrets themselves, only their locations. After running this tool over multiple commits, we can see that five secrets were leaked, and this is one from Ubuntu Portrait, the application from the beginning of the presentation, in which I included a secret key for Flask. It is hard-coded; I should have used, for example, I don't know, some random number generator present in my operating system, and not this approach of having a hard-coded key. So Gitleaks is a detector for hard-coded secrets. What's nice about this tool is that it analyzes the entire Git history, not only the current state of the codebase but also the commits in the past, because, for example, you can have overwritten secrets: you may think they no longer exist in your codebase, but in fact someone on the internet can walk all the commits and detect them. It also supports baselines, and custom formats for secrets: for example, if you have a custom API, and this is something GitHub does, they have a prefix, for example `ghp_`, and after that the secret. You can take the same approach for your secrets, prefix them with something, then create a pattern for the secret scanner, and in this way you can detect your own custom tokens.
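Entropy scanning, the second approach mentioned above, is easy to sketch. This toy version (the threshold and regex are my own guesses, not Gitleaks' actual rules) flags long base64-ish tokens whose Shannon entropy is high enough to look like random key material:

```python
import math
import re

def shannon_entropy(s: str) -> float:
    # Average bits per character; random base64-ish keys score roughly 4.5+.
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def find_secret_candidates(text: str, threshold: float = 4.0) -> list:
    # Long runs of base64-ish characters whose entropy exceeds the
    # threshold; real scanners pair heuristics like this with regex rules.
    tokens = re.findall(r"[A-Za-z0-9+/=_\-]{20,}", text)
    return [t for t in tokens if shannon_entropy(t) > threshold]

config = 'SECRET_KEY = "A7f9Kq2ZxP0vLmN8rT4sWb6Yc3Je"  # hard-coded, bad'
print(find_secret_candidates(config))
```

The entropy gate is what separates a real-looking key from a long but repetitive string such as a padding constant, which the regex alone would flag.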
And the baselines: what do they mean? These tools may have a high rate of false positives, so you can run the tool once, validate the warnings, and after that mark those warnings as false positives; if you run the tool again, it will exclude the warnings you already triaged, so you only scan the difference and you only have to validate a few new warnings. I have this xkcd comic for illustrating a chain of dependencies, and I think you are already familiar with this because of PyPI and npm. This is a graph generated for an npm package: you can see the main package in the center, and then there is a huge dependency tree for that package. Basically, if we have a vulnerability in one dependency, it propagates into our codebase. But it's nice that we have open-source tools to deal with this, namely tools for dependency scanning: we basically iterate through all the dependencies in our codebase and try to find vulnerabilities in them, for example CVEs published for the Flask we are using to create our web servers. It uses a dependency declaration list, for example `pyproject.toml`, or `package.json` in the case of the npm ecosystem. For this demo section we will look at OSV-Scanner, a tool created by Google. As before, the setup process is straightforward: we just download the binary from the GitHub releases. We run the tool with this simple command, just specifying the `poetry.lock` file generated by Poetry, a package manager for Python, and we can see a bunch of vulnerabilities reported by the tool. I really think this list has grown in the last few days, so this may be a bit outdated. You can see here the dependency list; in this case I searched for CVE-2023-4863, which was a heap buffer overflow in libwebp, which was in fact used by the Pillow library, which I had integrated into Ubuntu Portrait.
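The core loop of a dependency scanner is tiny: resolve the pinned packages from the lockfile, then look each (name, version) pair up in an advisory feed. Here is a sketch with a hard-coded, in-memory advisory table standing in for the OSV database; the two entries echo CVEs mentioned in this talk but the affected-version mapping is illustrative, not authoritative:

```python
# Hypothetical advisory feed; OSV-Scanner queries Google's OSV database
# and matches whole version ranges, not single pins.
ADVISORIES = {
    ("pillow", "9.5.0"): ["CVE-2023-4863: heap buffer overflow in libwebp"],
    ("flask", "0.12.0"): ["CVE-2018-1000656: JSON denial of service"],
}

def scan_dependencies(pins: dict) -> dict:
    """Map pinned packages (name -> version, as parsed from a lockfile)
    to their known advisories."""
    findings = {}
    for name, version in pins.items():
        hits = ADVISORIES.get((name.lower(), version), [])
        if hits:
            findings[f"{name}=={version}"] = hits
    return findings

print(scan_dependencies({"Pillow": "9.5.0", "requests": "2.31.0"}))
```

A real scanner additionally normalizes package names per ecosystem and compares versions against advisory ranges rather than doing exact-match lookups.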
OSV-Scanner: basically there is a database published by Google, the OSV database, which embeds multiple sources, namely GitHub Security Advisories, PyPI, RustSec, and the Global Security Database. There is also support for ignoring vulnerabilities, so if you can prove a vulnerability doesn't apply to your codebase, you can mark it as a false positive. Yeah, to answer the question: Black Duck, if I remember correctly, is for software composition analysis, so maybe the end goals are a bit different, but I can compare it to FOSSA. I know they have a reachability feature, which is pretty handy in my opinion: they see that you are a dependent of a package, but they also look at your function calls, because if I'm using a vulnerable package it doesn't mean I'm also vulnerable; I'd have to call the vulnerable function from that package. So they are doing this too; maybe we can expect some tools in the open-source space implementing it in the near future, but at the moment only the commercial providers offer this kind of feature. The next section: I like car safety in the automotive industry, because they have these standards they check for each car; basically each car gets a score, and based on that score you can estimate how safe the car is in an accident. We have the same approach in our computer science industry with linting, which basically means defining some rules and doing static analysis to find issues before compiling or running the code. These issues may be formatting, for example "hey, you are using spaces and not tabs", or tabs and not spaces, whatever you use. You can also scan for grammar: for example, if a developer is not using inclusive words in the UI, you can automatically raise a warning for the developer to replace those words. Or you can also have
security warnings. For this demo section we will use Bandit; maybe this is really familiar to the Python developers in the room. It's already integrated into a lot of IDEs, but we will prefer the CLI interface. I'm using the recursive flag for scanning all source files, specifying the path to the codebase, and specifying the format, which is again SARIF; this is a common theme in my presentation, I like SARIF. What's nice about SARIF is that you can also push the results into the GitHub Security tab, so it's pretty straightforward to get results into the GitHub user interface from the linters and security tools you have set up. And `-o` sets what the output file is. After running the tool we can see this output, including a snippet from our codebase. So this is a snippet from Ubuntu Portrait: you can see that I'm unarchiving a file, and this is a vulnerable pattern that is already detected by a lot of linters, so it's low-hanging fruit that is easily found by an attacker, but can also be easily found by the developers, with Bandit in this case. Bandit is a linter for Python. It has an abstract syntax tree representation of the code, and there are multiple modules in the tool: we have patterns for suspicious code, we have denylists for imports and function calls (for example, if we want to block the `pickle` module, we can do that from Bandit), it has report-generating functionality if we want to pass the results to another person, and it also supports the baseline functionality we already discussed. Another movie, do you remember this? It's from The Maze Runner: they are locked inside a maze, and they need to run through it to discover all the paths and escape. We can also think about how to apply this abstract concept, of having a huge problem space and searching it, in computer science.
Searching such a space is what symbolic execution, and data flow analysis, does in computer science. We have the CFG, the control flow graph, basically a graph of all the execution paths through our code base, and we replace the concrete values with symbolic ones and search for specific patterns. There are multiple components here: there are sources, namely where the data comes from; there are sinks, namely places where tainted data may end up; and, given some patterns, we can consider a combination of a sensitive sink and a tainted source a vulnerability. We also have a path explosion problem here: some code bases are huge, it's really hard to explore all possible paths, and we can miss a lot of vulnerabilities. But the guarantee of this approach is that, with, let's say, infinite compute power, we would explore all possible combinations in the end. That's not realistic, but it still detects a lot of vulnerabilities in real projects.

As a demo we have KLEE, which is an open-source symbolic execution engine. Just take the Docker image and run it. As in the case of fuzzing, you need to write a bit of glue code: in this case I'm marking some buffers and an integer as symbolic, and I'm running the generate-recovery-token function. Again, I run the compilation phase and then the engine, and in the end I can see that I have an error there, namely an out-of-bounds pointer, which is basically the error that was detected previously by fuzzing. So KLEE is a generic symbolic execution engine with security use cases, built on LLVM.

This marks the end of the vulnerability detection techniques, but there are others you may use. There is stress and load testing: you can use JMeter for many protocols and services, or k6 for Kubernetes and the web. And there is dynamic analysis using OWASP ZAP. We all like automation, so all of these tools can also be automated. I will briefly mention some tools that you can use. There is SARIF Multitool, for performing operations on the SARIF files we generate.
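Since SARIF comes up so often here, a quick sketch of what consuming one of these files looks like. The runs/results/ruleId/message layout below follows SARIF 2.1.0, while the sample findings are invented (B301 and B602 are real Bandit rule IDs):

```python
import json

# A minimal SARIF 2.1.0 document with two invented sample results.
sarif_text = """
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "bandit"}},
    "results": [
      {"ruleId": "B301", "level": "warning",
       "message": {"text": "Use of pickle detected."}},
      {"ruleId": "B602", "level": "error",
       "message": {"text": "subprocess call with shell=True."}}
    ]
  }]
}
"""

def summarize(sarif: dict) -> list:
    """Flatten a SARIF document into 'tool/ruleId: message' lines."""
    lines = []
    for run in sarif.get("runs", []):
        tool = run["tool"]["driver"]["name"]
        for result in run.get("results", []):
            lines.append(
                f"{tool}/{result['ruleId']}: {result['message']['text']}"
            )
    return lines

for line in summarize(json.loads(sarif_text)):
    print(line)
```

The same structure is what the GitHub Security tab ingests when you upload scanner results.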
With the SARIF files covered, we also have Make and Poe the Poet for running tasks. We have IDE workflows, for example tasks integrated in Visual Studio Code, for running the tooling while coding. We have pre-commit, for example "hey, run me this bunch of linters before pushing the commit to the origin". And we have act and gitlab-runner for running CI/CD workflows locally, which is pretty useful for developing with, or simply running, the same code that you would run remotely on GitLab or GitHub, but locally. And of course there are the remote alternatives.

I want to end by mentioning the concept of ubuntu. I guess you are familiar with the Ubuntu operating system, but we didn't invent the name: it's an African philosophy and ethical concept, and I really like this quote from Nelson Mandela: "Ubuntu does not mean that people should not enrich themselves. The question therefore is: are you going to do so in order to enable the community around you to be able to improve?" So I want to end this presentation with some ways in which you can help the open-source space. First, you have some sponsorship options. There are a lot of GitHub projects with this button, namely Sponsor; you can donate one or two euros monthly to the projects that you use, and this means a lot to maintainers who are contributing their spare time to the community. You can also report bugs and vulnerabilities via email, GitHub issues, or any reporting channel that the upstream has. For example, just take a random code base from GitHub, run the tools described in this presentation, and see what you get; maybe there are some vulnerabilities the upstream is not aware of. And lastly, you can implement features or fix bugs in the security tools themselves, or propose patches for the vulnerabilities you discovered with the previous approach. In this way, I can bet that the whole community will be grateful
for the work that you did. So, that was it. If there is a single thing to remember from this entire presentation, it's this link, which contains the presentation, the workshop, the cheat sheets that you can use, and more. The barrier to entry into this vulnerability detection space is pretty low, in my opinion, so you can go home and set these tools up in a few minutes. Thanks!

Towards the beginning, you mentioned a flaw in Roundcube, and then you used Semgrep, a static analysis tool, to show a way of detecting it, but you had to load a custom rule. I use Coverity, and I'm fairly certain Coverity would just catch that without needing a custom rule. The open-source version of Semgrep: does it have a more generalized rule that would catch that in other software?

So, as we previously saw in the code base, there are some custom methods being called, right? There is, for example, the Roundcube namespace, and there is this exec function inside the namespace. Linters and Semgrep will not detect these kinds of methods by default, so you should specify them. In my opinion, the rules are pretty straightforward to implement: you just need to understand the format of these YAML files, and the open-source engine is capable of using the rules that you write, running the data flow analysis that I mentioned in the presentation, combining this information, and detecting the vulnerability. This is something that I saw in practice, so you can use the open-source engine with that rule and detect the root cause of the CVE that we have seen. Does this answer your question?

For me, the reason to use code scanners is so I can stand on the shoulders of somebody who's way brighter than I am, because I'm stupid. These open-source code scanners that you're talking about: are they actually maintained by people, or do I have to go in there and write my own rules to try to figure out
my own bugs? Are other people contributing? How active are these projects, so that I can look at this and say, look, I'm going to put this in my pipeline so that I'm constantly getting other people's input and building on their shoulders? Because if they're not, I'm just going to go out and spend the money to buy a commercial version of this, because the security of my products depends upon it. So can you comment on what you see as the activity on these various projects?

So, the startup that I previously mentioned was for SMEs, small and medium businesses, and I think what I learned from that experience is that small companies often don't have the money for the commercial solutions. For example, they may not be able to pay a thousand for a security solution, because maybe they have a marketing campaign that they need to execute. So in my opinion, you can use these open-source tools if you don't have the money, or if your use cases are generic enough. To respond to your question: there are a lot of contributions. These tools are used by security researchers, by DevSecOps people, by DevOps people, so there are a lot of parties in the community contributing their rules. I'm familiar with Semgrep: there are a lot of people pushing their rules into the open-source space, either by using the hosted repository from Semgrep or by just creating a GitHub repository and pushing the YAML rules there. So in my opinion, yes, you can use Bandit, for example: just run the tool and you can catch some low-hanging fruit. But if you have some really tailored use cases, maybe you can extend the open-source scanners, or, if you have the money, pay a provider to do this work for you. You should also consider the fact that most of these tools, namely the static ones, so not symbolic execution and not fuzzing, are
generating a lot of false positives. This issue seems somewhat solved by the commercial companies, but we are still fighting it in the open-source space: there are a lot of warnings generated by Bandit, by Cppcheck, by other tools, so you should invest the time to validate whether the warnings are valid or not.

Just one follow-up question: the one thing that I didn't see in here is profilers, where you take your code, run it in a profiler, and run your tests against it to make sure that you're actually getting into all of the if statements and execution paths and so on. Are you referring to coverage? Yeah, code coverage profiling. It's a nice recommendation; I will add this, in fact. The workshop and the website are works in progress, so it's a good idea. I preferred focusing on security-tailored tools, namely security scanners, fuzzers, and symbolic execution engines, but I agree that you can also use this kind of coverage analysis or profiling to detect the code paths that are rarely executed. Because if they are not executed, maybe people are not paying enough attention to those code flows, but attackers may discover that there is a vulnerability there and target that specific code flow. Thanks for the recommendation.

Okay, so I guess one of the questions I have, going back to OSV-Scanner: projects like Yocto have tools for checking vulnerabilities, but they're using the CVE database instead, like cve-check. Would you recommend that one of those projects start looking at using this different database, OSV, as opposed to just checking CVEs? They are also checking CVEs, but what's nice about them is that they are trying to integrate this vulnerability data under the same umbrella. They are doing this, for example, with RustSec: if I remember correctly, there is no direct correlation
between a RustSec ID and a CVE; they are not working on assigning a CVE ID to each RustSec ID. So in my opinion, it depends on the project. If you have vulnerability data which is not available in CVE, MITRE, or NVD, you should also use alternative streams; maybe that's OSV-Scanner, maybe it's another tool which is tailored for your toolchain.

Okay, so Yocto is a distribution builder, I don't know if you're familiar with it. It's used for building embedded Linux distributions and is pretty heavily used in the embedded Linux community, so I'm just curious whether somebody on that side should be looking at switching to OSV-Scanner as opposed to the scripts I think they may have rolled themselves to look for CVEs, especially if OSV is kind of a superset of the CVEs. Is that basically true?

Yeah, but it depends on the superset. I mean, if only CVEs apply to your project, in my opinion it's the same thing to use a CVE scanner as to use this kind of aggregator like OSV-Scanner. If I'm using Rust, for example, maybe I have an advantage in using OSV-Scanner, because the Rust scanning is already integrated and those vulnerabilities are not published as CVEs. But if I'm using something that is not integrated into OSV and there are only CVEs, then I would just use the CVE scanner, because OSV-Scanner will basically give me the same information.

Okay, yeah. There is a Rust layer for Yocto, I don't know if it's official or not. Okay, thank you.

Thank you for attending; please give a round of applause to our speaker.

Good afternoon, everybody, and welcome to the security track at SCaLE 21x. This presentation is "Human versus AI: How to Ship Secure Code". Please welcome your speaker, Xavier.

Hey folks. First, thank you for being here. I know that it's the last slot, so you made it; congrats to
you. So, there has been a slight title change; it's "Code Security Reinvented" now. My name is Xavier. I'm French, but I live in Palo Alto, California, with my wife, my three kids, my three cats, and my three koi fish. I'm working at GitHub, which is the AI-powered developer platform where more than 100 million developers build, scale, and ship secure software. There is of course a commercial side to GitHub's mission, but I hope you all know that there are also sides of GitHub that are not commercial, for example GitHub Education, where we give resources to students and faculty to learn how to develop software, and of course everything that we do for open source. And I am part of this side: I'm leading the GitHub Security Lab, a team of hackers who find and help fix security vulnerabilities in open-source projects. So, disclaimer: I'm not working on GitHub products, so I won't be able to answer all of your questions about GitHub products if you have them, but I will give you the perspective of a user of these products, basically of an open-source customer of these products.

This talk is about the impact of AI on security, and I will illustrate that with what I see in open source. So let's start with the state of open-source security. Open-source code is everywhere, right? In all the software that is crucial for our lives, whether in transport, medical, defense, or finance, open source is everywhere, and we all depend on it. However, there is a lack of security expertise. Some studies say that there is only one security professional for every 100 developers, and this is a number for software globally, so you can imagine that it's even worse for open source, where there are fewer resources; this imbalance is much worse in open source. In addition to that, there is a disconnect between security and developers. Security experts think that developers don't care about security, and for developers, security is often
the department of no, you know, the people who are blocking them, the people who are slowing them down, the people who are preventing them from shipping features. This is similar to the disconnect that we had between dev and ops before the DevOps movement, where dev and ops were working toward competing objectives: innovation and stability. Now it's the same for security; we have the competing objectives of innovation and security. And here again, the situation is worse in open source, because these two communities are often not collaborating very smoothly, to say the least.

So what do we need to do? Well, we need to shift left. We hear that a lot, right? But what I see happening a lot is people trying to run the traditional security tools and practices earlier in the development process, and that cannot work. These tools and practices are made for the security teams, not for the developers. Security teams will be able to understand and act on the findings, but the tools are not designed for developers, and they will generate even more frustration and friction between the teams. The best example of that is false positives. Developers hate false positives, right? And just moving these tools earlier in the development process will reinforce the disconnect: if these tools bring a deluge of false positives to developers, developers will have confirmation that security is the department of no, blocking them for no reason. And you know what happens when you have an alarm that goes off frequently for no reason: after a few false alarms, people tend to ignore it. So developers will begin to ignore these alarms, and security will have confirmation that they were right, that developers don't care. You will have a reinforcement of these wrong assumptions on both sides.

So what's the solution? Well, we need to empower developers. We cannot shift left by just moving these tools and these practices to the left; we
really need to give developers autonomy and expertise. That includes designing the tools for them, for their use cases, but we really need to shift left a culture of security. So this is the agenda for the rest of this presentation: we saw the state of open-source security; next I will show you what it takes to effectively shift left a security culture, what we are doing right now as humans; then we will see how AI accelerates, augments, and reinvents this; and finally we will finish with some perspectives. We will start by talking about what we are doing as humans. It will be a bit boring, and then the robots will come in, so bear with me.

So yeah, shifting left the security culture: this is not easy. Encouraging developers to adopt security practices is a bit like encouraging my kids to eat vegetables. Developers prefer to ship features, and they leave the security to the side of the plate. Despite the provocative analogy, I don't believe that developers are childish, and I don't believe that developers don't care. I believe that they need to be empowered. This is a cake that my wife baked for my son's birthday a few years ago. Who can take a guess what's in this cake? No one? Chocolate? No, there is a question mark here, so it's not chocolate; it's anything but chocolate. We're looking for a vegetable. No? It was for nine-year-olds. Yeah: beets, red beets. And the kids loved it. Imagine nine-year-old kids, pretty difficult judges when it comes to food, and they loved it. My wife was doing this all the time: she was integrating the vegetables into a meal that the kids love. And what does it mean for security and developers? Well, we should try to integrate security practices into what developers love. And what do developers love? Coding. So my solution to really shift left effectively is to integrate security into developers' existing practices,
into their existing flow. Not in another tool: I want to integrate security into the developer's IDE, into the developer's browser. Not in another process: I want to integrate security into the current development process, whatever it is for your organization. We talked about vegetables, but this is what worked with DevOps, right? Developers started testing when they could code their tests, with xUnit and FitNesse and Selenium and so on; developers started coding deployments with infrastructure as code. So I think this is really the solution for security: bring the community's expertise into the developer's flow.

Now let me show you a few examples of how we are doing that at GitHub and in my team. The first example is how developers can secure their dependencies. My team maintains the GitHub Advisory Database. This is open-source information about security vulnerabilities disclosed in open-source projects. It's free, and it's open source because we think that vulnerability information about open source is a common good; and because it's open source, we receive more than 1,000 community contributions per year. For this database, we originally take data coming from the NVD, the National Vulnerability Database, and from some other sources like RustSec, like PyPI, etc., but we also get community contributions from anyone who has the information, the knowledge, and the expertise. We review it in my team, we curate it, and we make it available for everyone. For example, as soon as this denial-of-service vulnerability was disclosed, it got reviewed into our advisory database with important information such as the CWE, the CVSS score, and, very important, the affected versions and the patched versions. With that, this information flows directly to Dependabot. Dependabot is the SCA tool by GitHub, which is free for everyone, and if you want, Dependabot will automatically create a pull request in all the projects that depend on word-wrap, if they use one of the affected versions.
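As an aside, the affected-versions matching that drives those pull requests can be sketched as follows. This is a simplified model: real advisories use ecosystem-specific version semantics, and the range used here is invented.

```python
# Simplified advisory matching in the OSV style, where a vulnerable
# range is half-open: introduced <= version < fixed. Dotted integers
# are enough for a sketch; real ecosystems need proper version rules.

def parse(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))

def is_affected(version: str, introduced: str, fixed: str) -> bool:
    return parse(introduced) <= parse(version) < parse(fixed)

# Invented advisory: 1.2.0 introduced the bug, 1.2.5 fixed it.
print(is_affected("1.2.4", "1.2.0", "1.2.5"))  # True: needs the update
print(is_affected("1.2.5", "1.2.0", "1.2.5"))  # False: first patched release
```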
As a developer, I just have to review the pull request and merge it if I'm happy with it. So here I really have knowledge coming from the community and ending up in my developer flow; this is exactly what I would do with a fix suggestion in any pull request. You will note that there is a yellow box here with a nice message from Dependabot. Well, it's not that nice; it's a bit passive-aggressive, you know: "oh, you're a bad maintainer, you don't care about me, so I will pause my interaction." Indeed, it's because I kept this alert untouched so that I could take the screenshot for this presentation. But more seriously, this feature is another example of how a security tool can try to be less disruptive to the developer's flow: "I detect that it seems you don't care, for whatever reason, so I will pause and not bury you under a lot of useless reminders."

My second example is how you can secure your code. At GitHub we have a static analysis engine called CodeQL that finds security vulnerabilities in your code. It's also free for open source, and it takes one click to enable. The difference from the other static analysis engines is that CodeQL has very powerful taint tracking, to track malicious data through your code. It does it across different functions, across different files, across different libraries, and even through libraries external to your code. For example, CodeQL can detect Log4Shell, the 2021 JNDI injection in Log4j, because it can track the untrusted data from the call to the logger down to the JNDI call, and this data flows through more than 150 steps in the code, in more than 10 different files. CodeQL is able to detect it out of the box, and again, it's free for open source.

Yes, a question? So, CodeQL is available for Java, JavaScript, TypeScript, C, C#, C++, Python, Ruby... I think that's the
list of languages; I might have forgotten one or two. And yes, of course, we are trying to implement other languages. It has to be done language by language, because the way it works, to be able to do this powerful taint tracking, is that we really need to adapt our data model and our control flow to each language, to have something very precise, without the false positives that we all hate.

CodeQL is what my team uses for finding vulnerabilities in open source, and you can check our CodeQL wall of fame: we find more than 100 vulnerabilities, 100 CVEs, with CodeQL in open-source projects every year. And it is used by the community too. The second line here, Jordan and Alexandra: they are not in my team, they are working at Google security, but they're using it, and they can say, hey, we found this CVE with CodeQL.

So, once enabled, you will benefit from hundreds of security checks designed by the GitHub security experts, but also by security experts from the community, because the queries, the CodeQL queries, are open source. By just enabling this scanner, you will benefit from this ever-growing expertise automatically, without doing anything, and you will basically sit on the shoulders of giants: when our security teams, or customers who are using CodeQL, improve these queries, you will benefit from that automatically when the changes reach the open-source repo. And the GitHub team creates these queries so that they have a very, very low false positive rate, because again, we don't want to interrupt developers in their flow. So basically you have two kinds of queries: queries that are designed for developers, with very low false positive rates, and queries designed for security teams, the ones that my team uses, for example, where we care more about false negatives ourselves, so we will have a lot of false positives, but we triage them. So by default you
have the standard queries for developers; just by adding one line to your configuration, you can also benefit from the security queries sourced from the community. And for my purpose here of empowering developers, these alerts are displayed in the developer's flow, in their pull request, next to the code, next to the very line of code. I remember this friend who was an engineering manager, five years ago, and he was very happy because his source-scanning tool was being upgraded, and he told me: hey, you know, the previous version was generating the security alerts in a big PDF, and now it will generate Jira issues, and developers will love it. Well, they did not love it. Yes, we can all agree that Jira issues are much better than a big PDF, but it was still not in their flow. Instead, here you have the community expertise flowing directly into the developer's flow, in their code, in their pull request. You'll notice also here the link "Show more details", which will open more information about the vulnerability and how to fix it, because again, we are talking here about shifting left expertise, not only the security finding. And it works: by bringing these alerts into the developer's flow and by eliminating false positives, we obtain an impressive fix rate. Half of these alerts are fixed immediately, when and where they are raised, which is really a testament to the effectiveness of shifting left and empowering developers.

My last example is a training that my team developed. It's a game, a set of challenges to train your developers on secure code, and it's an in-repo game: developers just have to clone it, read code, run tests, fix the code, and re-run the tests. And it's open source, so we have contributions from the community, who are adding more challenges as we go. Here again, same principle: community-powered, because it's open source, and directly usable by developers. It's in their IDE or in their browser, and they are just doing what
they usually do as developers: they code, they test, they fix. So again, an example of community security knowledge flowing to the fingertips of your developers. And when you think about it, this is a perfect use case for AI: getting the community's knowledge and bringing it to the fingertips of your developers. So now let's explore how AI can accelerate and augment this solution, and how it really reinvents the way we are thinking about shifting security left.

I will illustrate this exploration with GitHub's AI pair programmer, GitHub Copilot. I'm pretty sure some of you forgot, but GitHub Copilot has been around for quite some time now; we announced it in June 2021, so it was really the first production-ready AI pair programmer. And I want to say out loud that it's not here to replace developers. It's here to help them be more productive by eliminating low-value tasks, such as writing regexes, so that they can focus on higher-value tasks. This is the goal of GitHub Copilot. Last year we surveyed 500 developers about their use of GitHub Copilot, and nearly 90 percent reported that they were completing tasks faster. But what I find even more important is that 88 percent said that it helps them stay in the flow and focus on more satisfying work, and this is exactly what we want to do. I will repeat that throughout the presentation: AI is not here to replace developers, just to make them more productive by accelerating the low-value-added tasks.

So now we are going to explore several ways developers can use AI to leverage the world's security knowledge, and remember, with the same exigence that we stated earlier: we want this knowledge in our flow, we want this knowledge to accelerate what we do, not to slow us down. And yes, a small disclaimer about what I will show now: what I showed you so far, the whole GitHub security suite, is free for open source, but Copilot is not, and therefore the AI features that I
will show you are not. But again, I'm not in the product team, so I cannot tell you whether that will change in the future.

So let's get started with writing safer code. Can someone in the audience spot the bug? It's a SQL injection, indeed: the user input is concatenated directly into the SQL query. So what can happen? Well, the attacker can put another SQL query inside that query and, for example, drop the users table; they can create an admin user with elevated privileges; they can enumerate all users of the table, which can leak private information. So that's a SQL injection; that's bad. But if I delete the code that I've written and leave it to GitHub Copilot to propose something, the suggestion is safe from this SQL injection: it uses parameterized queries, and so the user-controlled value is not passed into the query text anymore.

Yes, sorry? "Are you running a form of security scanning on each suggestion from Copilot?" No, let me tell you how it works. It's because we added a vulnerability filter on top of Copilot. Originally, Codex, the LLM developed by OpenAI that powers Copilot, was trained on all public open-source code available on GitHub, so the suggestions are by default as secure, or as insecure, as this public code. But we added this filter. The filter is another model, trained on CodeQL results, and it is able to identify the most common vulnerabilities, like path injections and SQL injections, and, having identified them, to remove them from Copilot's suggestions. So in the end, the suggested code will be safer than the original code that Codex was trained on. Does that answer the question?

But I will repeat this again: Copilot is not here to replace developers, and Copilot is not here to replace DevOps practices. It's even the opposite. I'm saying that GitHub Copilot makes you gain time.
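The SQL injection example from a moment ago, string concatenation versus a parameterized query, can be sketched in Python's sqlite3 like this (the table and its data are invented for the sketch):

```python
import sqlite3

# Invented table and data, just to make both behaviors observable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

def find_user_unsafe(name: str):
    # VULNERABLE: user input is concatenated into the SQL string.
    return conn.execute(
        f"SELECT role FROM users WHERE name = '{name}'"
    ).fetchall()

def find_user_safe(name: str):
    # SAFE: a '?' placeholder makes the driver treat the value as data.
    return conn.execute(
        "SELECT role FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"            # classic injection payload
print(find_user_unsafe(payload))   # every row leaks: [('admin',)]
print(find_user_safe(payload))     # no user has that literal name: []
```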
Some of the time you gain writing code you should spend on more DevOps practices; it's an opportunity to do more of them. Now let's move on to examples of where AI can help us find security issues. We will see several examples of that: one, finding leaked credentials; two, finding security vulnerabilities in your code; and three, doing a code review.

In GitHub Advanced Security, GitHub's security product suite, you have a feature called secret scanning, to detect secrets, like keys and tokens, that you have checked into your repositories. Again, it's free for public repositories. And this is important because, according to Verizon's Data Breach Investigations Report in 2023, the use of stolen credentials was by far the preferred way for attackers to breach into an organization, so preventing secrets from leaking into public code goes a long way toward protecting our organizations. One million secrets in public code were detected by GitHub's secret scanning during just the first eight weeks of this year. That's more than 10 secrets per minute. I thought that it doesn't happen frequently, pushing secrets into code; well, I was wrong, it happens all the time.

In addition to detecting secrets when they are already in repositories, we added a feature called push protection, which is proactive detection of secrets leaked locally: when you try to push a commit to a shared branch, push protection kicks in, blocks you, and tells you, hey, you are trying to push a secret to this branch, and it tells you how to remediate that. So the secret will never reach a public branch in the first place. This feature was opt-in until now, and last year alone it prevented more than 30,000 secrets from reaching public repositories. Given the scope of the problem, we decided to enable it by default. So now it's enabled by default; it's
been two weeks now, so you will have to opt out; by default, we will prevent you from pushing a secret into a public repository.

Back to our topic: how can AI help us here? All these secrets that we detect in code, we detect because we are partnering with different providers, like AWS, Apple, and so on; we have 180 partners and more than 220 patterns, regular expressions provided by these companies, that we can detect. But you might still have some custom patterns that you want to detect, patterns that are very specific to your organization, and you will have to write them as regular expressions. Well, we saw in the earlier example that writing regular expressions is not always easy; a colleague of mine likes to say that the plural of regex is regrets. So, to make it easier and faster for you, we now include this form, an AI-powered experience that guides you through creating those custom patterns. This feature is not only an incredible time saver; it helps you get the coverage that you need to make sure your secrets are secure.

We are shipping another AI-powered feature, now in public beta: AI-powered detection of passwords. If you try to use the traditional scanning methods, regex matching, to detect passwords, you will have a high false positive rate, because passwords don't have a structure. So we are shipping this new feature where we ask AI to detect whether passwords are leaked in code, and so far the results are pretty promising; we have a very low false positive rate. But even with that low false positive rate, we will display these results in a separate section, together with other low-confidence patterns, because the rationale, again, is that we don't want to block developers in their flow. So we display the high-confidence, low-false-positive detections separately.
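As an aside, a custom secret-scanning pattern like the ones just described is essentially a named regular expression. A minimal sketch, using the well-known AWS access key ID shape (AKIA followed by 16 uppercase alphanumerics); the dict layout and sample text are my own illustration:

```python
import re

# The AWS access key ID shape is publicly documented; org-specific
# patterns would be added to this dict the same way.
SECRET_PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scan(text: str) -> list:
    """Return (pattern_name, match) pairs found in `text`."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.findall(text):
            findings.append((name, match))
    return findings

# AWS's own documentation example key, safe to show.
sample = "aws_key = 'AKIAIOSFODNN7EXAMPLE'  # oops, committed a secret"
print(scan(sample))
```

Passwords, as noted above, have no such structure, which is why regex matching alone produces so many false positives for them.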
Findings that are low confidence and potentially high false positives go in their own section; again, we don't want to interrupt developers for nothing, same principle. So that was secret scanning.

Now let's look at finding security bugs in your code. We talked earlier about scanning your code with CodeQL, our static analysis engine, and now we are shipping a feature called code scanning autofix. Code scanning will now propose AI-generated fixes right in the pull request, enabling your developers to instantly fix vulnerabilities while they code, and we think it will give an even better remediation time. If you remember, we had a 50% fix rate when the alert was raised in the pull request; we hope that providing a suggested fix in addition to the alert will make it even easier for developers to act on these alerts. These suggestions will also help developers quickly understand the vulnerability and how to remediate it, because they will basically have a concrete example of an insecure pattern in their code and of a secure pattern with the fix; so it also contributes to shifting expertise left at the same time. At the moment this is for JavaScript and TypeScript, but more languages are coming.

So let's see that in action. Here I'm writing some very simple code: I'm personalizing "hello world" by adding a name. I see that there is a reflected cross-site scripting alert, a reflected XSS, in my pull request. Of course you can see where the data flows; there are only two steps here, but remember that for Log4Shell, for example, there were 150 steps you could see in there. On this alert, you can dismiss it if you deem it a false positive, but now, this is the autofix: autofix will generate a fix for you. The fix consists in sanitizing the user-controlled parameter with a call to escape.
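The shape of that fix can be sketched outside the product too. Assuming a toy greeting function (the function names are invented, and this sketch uses Python's `html.escape` rather than the JavaScript escape helper the autofix would actually insert), the before and after look like this:

```python
import html

def greet(name: str) -> str:
    # Vulnerable: the user-controlled name flows into the HTML unmodified,
    # so a value like "<script>...</script>" is reflected back verbatim.
    return "<h1>Hello, " + name + "!</h1>"

def greet_safe(name: str) -> str:
    # Fixed: escape the user-controlled parameter before it reaches the
    # page, the same idea as the sanitizing call the autofix suggests.
    return "<h1>Hello, " + html.escape(name) + "!</h1>"

payload = "<script>alert(1)</script>"
print(greet(payload))       # the script tag survives: reflected XSS
print(greet_safe(payload))  # angle brackets are escaped, payload is inert
```

The two-step data flow mentioned above is visible here: the parameter enters at the function boundary and exits into the response, with escaping inserted in between.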
Well, you can edit it, you can dismiss it, or you can commit it directly. You can see that the fix spans different files, not only your original file; it's a proper fix, LLM-generated, to help you remediate this vulnerability. Again, it's right in your pull request, right in your flow, not anywhere else, and it's another example of the security community's knowledge flowing into your flow.

Now let's look at my third and last example for finding security issues: code review. In my team, I regularly ask my hackers to audit some open source code that they are not familiar with, and I guess it's also the case in organizations when they ask security teams to audit some code they don't know. So I can use GitHub Copilot Chat, which is available through GitHub Copilot Enterprise, to describe the attack surface of a project. The attack surface is the set of possible attack vectors through which an attacker can access and abuse the system: entry points to the system, locations where there are accesses to assets of the system. So here I'm asking for the attack surface, and I get an answer from Copilot Chat telling me the different entry points in the code that I should look at. You can note that it starts by telling me: hey, I won't perform the audit for you. Yes, of course; once again, it doesn't replace you, it just accelerates the process, it kind of quick-starts it. Before, my hackers would look at the documentation and then try to find their way: okay, what would be interesting, where is the database, okay, I will look at that. Now that process is accelerated. This is another example of the same thing, where I used our secure code game repo, which is intentionally vulnerable, and I was happy here that Copilot was able to tell me: oh, you know, you're using a
database, so you should pay attention to SQL injection, you should pay attention to untrusted inputs, you should pay attention to managing elevated privileges.

Now, my last example is about developer learning. As we said, shifting left is also about giving autonomy and expertise to the developers, not only the tooling. So let's see this example. I wrote some code, and again there is a SQL injection in it, because stock symbol is controlled by an external user and is concatenated directly into the SQL query. Let's say I'm a junior developer; what do I do after writing the code? Well, I ask for a code review. So let's ask GitHub Copilot for a code review, and immediately GitHub Copilot tells me that there is a SQL injection in my code. Okay, but what is a SQL injection? I don't really know; I'm a junior developer. So I ask, and I immediately get an answer that is not super helpful; I could get that from Wikipedia, right? It's not helping. What I need is to really understand the consequences of the SQL injection, and I need that knowledge tailored to my code; that is how I will really be able to grasp completely what this SQL injection is about. And so I get a clear answer here telling me: okay, yes, stock symbol is user-controllable; you can append OR 1=1 to it and it will enumerate your whole database. And again, this is really tailored to my example, to my code, so I can kind of test it and see it. I can ask clarifying questions. So here, what am I asking? Can this attack be used to damage my database? And yes, of course, if the user appends a DROP TABLE to the external symbol. Only then, you know, only after that, when I understand, do I ask for a fix. I don't just get a fix and apply the fix; I get expertise.
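The pattern in this example, and the fix a review would suggest, can be sketched like this. The `quotes` table, the symbols, and the function names are invented for the demonstration; the payloads are the same `OR 1=1` style discussed above:

```python
import sqlite3

# In-memory toy database with made-up stock symbols.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (symbol TEXT, price REAL)")
conn.executemany("INSERT INTO quotes VALUES (?, ?)",
                 [("AAAA", 10.0), ("BBBB", 20.0)])

def lookup_vulnerable(symbol: str):
    # The user-controlled symbol is concatenated into the query, so a
    # payload like "x' OR '1'='1" rewrites the WHERE clause and
    # enumerates the whole table.
    query = "SELECT symbol FROM quotes WHERE symbol = '" + symbol + "'"
    return conn.execute(query).fetchall()

def lookup_safe(symbol: str):
    # Parameterized query: the driver treats the input purely as data,
    # so the same payload matches nothing.
    return conn.execute(
        "SELECT symbol FROM quotes WHERE symbol = ?", (symbol,)
    ).fetchall()

payload = "x' OR '1'='1"
print(lookup_vulnerable(payload))  # both rows leak
print(lookup_safe(payload))        # empty list
```

This is exactly the kind of runnable, tailored illustration that helps a junior developer test the explanation against their own code rather than read an abstract definition.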
And this expertise is really useful and concrete, because it's tailored to my context, to my code.

Okay, so what did we see here? We saw how developers can use AI to leverage the world's security knowledge, through several examples. But what I want to point out is that these examples span the entire development process: coding, code review, tests, and even learning. Now I'd like to conclude and give some other perspectives. For me, it's pretty obvious that AI is reinventing developers' day-to-day, including secure coding; from the survey we ran last year, 92% of developers surveyed are already using AI tools at work or for their personal projects. So it's there. And I want to repeat again: AI is an accelerator, it will not replace you. You will spend less time typing, you will spend less time searching the internet, and it's an opportunity for you to have more time for security, quality, design, learning. The good news, from the same survey, is that this is exactly what developers want: they say that if they were using AI coding tools to gain time, they would like to use that time to focus more on code reviews and security reviews. That's good.

Now, every new technology breakthrough comes with positive and negative consequences, so this talk would not be complete without exploring the dark side. I will just cover the two questions that people ask me the most. The first one: can Copilot reinforce insecure coding practices? And the second: can Copilot leak my private code to the public? There are others, and I trust you to ask them after the talk, but these two are really the most asked. The first is a very legitimate question. Copilot is trained on public open source code, and this open source code can be
insecure. Copilot also relies heavily on your surrounding context to tailor its suggestions. Therefore, if you have a lot of insecure code in your team or in your project, it's natural that Copilot's suggestions will be tailored to that and will give you insecure code. But this is not new; it was already happening before Copilot. A junior developer getting into a team or onto a project would also be heavily influenced by the surrounding code: they will learn from it, they will copy-paste from it. That already existed before Copilot.

So, three considerations for you. First, this behavior existed before Copilot, but now Copilot does it faster; that's only a half-truth, because, as we will see, it does that for secure code and for insecure code alike, just faster. Second, actually it's not quite true, because we apply a vulnerability filter on top of the suggestions; this filter removes the most common security vulnerabilities, so in the end the code generated by Copilot will be safer than your original context. And finally, why was the first point a half-truth? Because if you're going faster, use that time to do more DevSecOps: more design, more learning, more testing. You must not use Copilot without DevSecOps practices. Imagine the following scenario: you're a junior developer coming into a team, and the piece of code you're working on is full of insecure patterns. You're using Copilot, and thanks to the vulnerability filter, the code suggested by Copilot is secure; the code that you're writing with Copilot is secure. Okay, but then, during the code review phase, a colleague tells you: hey, don't reinvent the wheel, I've already coded something that does exactly that, so reuse
it instead of reinventing the wheel. Which makes sense, but then, in the end, by doing that you will reuse the insecure code, and you will be insecure. So again, if you use the gained time to do more DevSecOps, more learning, you will hopefully prevent this scenario from happening.

The second question I get asked is whether Copilot can leak your private code and make it public: if my private code is used as training data, could suggestions to external users eventually end up in public code? Well, GitHub Copilot for Business does not retain any prompts. In the prompt there is your prompt and also the surrounding code, and all of this is sent to the model; you get back the suggestion, and the prompt is not retained, it is discarded automatically just after a suggestion is returned. So no, your private code cannot leak into public code. But there are more questions, of course; again, it's a new technology breakthrough, people have a lot of questions, and they are legitimate. So we created the GitHub Copilot Trust Center, with dozens of frequently asked questions related to security, privacy, and transparency, such as the one I just answered. You can go to this page; it's a great resource to understand what is happening with your data when you use GitHub Copilot.

Finally, this talk was focused on how AI can reinvent code security, but that's only a small portion of what AI can help with when it comes to security. Software security is much broader, and we can anticipate a lot more applications of AI: threat intelligence, penetration testing, malware analysis, security operations, incident response. My team is also using it for code security, to improve fuzzing for example: they are using coverage data, identifying gaps in the
coverage, generating new fuzzing harnesses, and then closing the loop with that. So certainly a lot more to come when it comes to how AI can help security. That's the end of this presentation. You can follow us on X with the GitHub Security Lab account to get more information about what we are doing for open source security, and you can also follow me. Thank you for staying for this last slot of the conference, I appreciate it, and if you have questions, I'm here.

Thank you for your presentation. I have a couple of questions. The developer learning part, does it have to be in the GitHub ecosystem? It's built into Copilot, right? For example, if I don't want to use GitHub and just want the developer learning, I can't, right? You have to use GitHub on the back end?

So, what I showed was with GitHub Copilot, but I know there are competitors out there; I know they are at least doing code completion. I'm pretty sure that in principle you're not bound to that. I showed examples with the GitHub platform, but having this kind of developer help, where you can chat to get the community's expertise, is a principle I want to push. I'm just not aware of other solutions.

Okay, and one more question: CodeQL, you said it's open source, right? The queries are open source?

So with CodeQL you have the engine that runs the queries on your code, and you have the queries, and the knowledge, if you want, resides in the queries, because that is where a security researcher will say: oh, this pattern is insecure, let me code it. All these queries are open source.

Okay, so I still need to use GitHub for this, or do you have a public
repository? Is it a public repository that you have? For example, if I use Bitbucket, which is Git on the back end, I wouldn't be able to use CodeQL, right?

Is your project open source?

No, it's not; it's private.

Then no.

Okay, and if I use Bitbucket and it's a public repository, would I be able to use it?

Yes, CodeQL is free for all public repositories, even if they're not on GitHub. If you're on GitHub you benefit from the integration, you know, the alerts in the pull request, etc., but if you're not on GitHub and you have a public repository, you can still use CodeQL on your code.

Got it. So for CodeQL you have a nice feedback cycle, in my opinion: for example, as a security engineer you can publish queries via GitHub and maybe get a bounty for a query, and basically the entire open source ecosystem is improved. Do you also plan something like this for the AI-based patches or the AI-based vulnerability detection? For example, for me as a security engineer to publish some patches which are correct, or to publish some vulnerability detection patterns?

No, we don't; it would be a nice next addition, I guess.

I have two questions. The first is: what is your stance on people hard-coding test tokens or other sorts of invalid tokens?

Sorry, I don't hear very well; I think it's the noise.

So my first question is: what is your stance on hard-coding test tokens or secrets? Are those considered false positives? For ML models, I feel it's very hard to distinguish tokens that are structurally the same as a real token but differ in the context in which they're used; sometimes people want to hard-code those for particular test reasons. And the second question is: is there any
application of using ML models to detect general sensitive data, like personal user information or other things, beyond just secrets?

Okay, so the first question was about my stance on these detections: are test tokens or test secrets considered false positives, and should people be adding them to repositories at all? It depends. I know that we have some secrets that are only for tests and don't do anything, and we can add them, but they will be detected by secret scanning, and then the developer can say: no, I dismiss it, it's a false positive, because of such and such. You have the possibility to do that. My personal stance is that if you have another way to do it, it's better not to put the secret in there, because it's also a matter of habit: no, don't put secrets in there, find another solution. Oh, it's only for tests; oh, it's only a very limited secret? Perhaps, yes, but if you can find a way to do it differently, please do it differently. That's my stance. But the product gives you the possibility to manage this case: you can say no, it's a false positive.

And the second question was about using AI or machine learning models to detect general sensitive data, like PII or personal identifying information, beyond just secrets. Yes, that's interesting. Again, I'm not in product, so I cannot tell you if we are looking at that too, but I know the question of what exactly counts as a secret was floating around in discussions. So I don't know, but I know it has come up.

So is this integrated tightly into GitHub? If I was
using a Git repository internally, is there a way for me to use the AI?

The AI, no. Secret scanning, no. But CodeQL, for example, you can use for any public repository, and the advisory database is open source, you have an API, you can use it for everything. So it depends; but since your question was specifically about the AI part: no. CodeQL is free for public repositories, and you can also buy it for private repositories.

I know I have a lot of questions, but it's okay, maybe other people have some too. I did not understand why you said Copilot is reinforcing insecure code. Were you saying that people think: now I don't have to learn how to write secure code? Or is it something else? Why are people saying it's reinforcing insecure code?

Well, I don't know why they're asking that; it's a question that I get asked a lot, and I think they might have read it somewhere. But when you think about it, it's normal, because Copilot is influenced by your surrounding code. If you are on a project, in a team, with code that you already wrote that was insecure, Copilot will automatically try to adapt its suggestions, to prioritize suggestions that match that style, for example. And doing this, if you have written a lot of insecure code, it will give you insecure code: insecure in, insecure out, right? That would be the default, but fortunately we have added this vulnerability filter on top, which removes the insecure suggestions and hopefully breaks the vicious circle.

Thanks. I also liked your last bullet about privacy, because as security engineers we're always concerned: I don't want people to know the stuff I have. So that was good.

Yes, and that's important for us. It is important. I
like how you said AI is not going to replace developers; you repeated it enough, I liked how you said it repeatedly. Hopefully that's true.

Oh, I'm sorry, let's give a round of applause. Thank you, thank you all.