The next two sessions will highlight technology partners and how we innovate together. At GitLab, we're strengthening and maturing our product organically. And while our static application security testing is now mature on our rather rigorous maturity scale, there's always room for improvement and iteration, especially with help from technology partners. So it's my pleasure to introduce to you Luke O'Malley with r2c and Taylor McCaslin with GitLab. They'll show you how together we've integrated r2c's Semgrep next-generation SAST with GitLab to prevent vulnerabilities with secure guardrails. Be sure to ask questions in chat or connect with them after the session.
Hi everyone, welcome to GitLab Commit Virtual. We're thrilled you've chosen to join us today. We're going to chat about an integration with Semgrep, a new static analysis tool created by r2c. We'll talk about the next generation of SAST to prevent vulnerabilities with secure guardrails. Today I'm joined by Luke O'Malley, who is the co-founder and head of product at r2c. We'll chat about what this integration looks like, why we've chosen to integrate with r2c and their Semgrep tool, and what the future holds. And then Luke is going to tell us a bit about what the future of Semgrep looks like. So let's jump in.
Our first point is that we are going to talk about future-facing roadmap items in this presentation. As with everything forward-looking, things may change and plans may change, so don't use this for any purchasing or planning decisions. Again, these are all forward-looking statements.
So let's dive in. Let's start by looking at what GitLab SAST is today, take a look at where we're going in the future, and then we'll talk a bit about the Semgrep application itself. In terms of GitLab SAST, we've really focused on developing this shift-left persona to embed security into the DevOps platform that we have here at GitLab.
Our ideal setup is that as you develop code, security tests run and analyze the source code contributions that you're introducing with your commits. Then you're able to mitigate those vulnerabilities before actually deploying that code to production. So we're really trying to embed security scanning and make it an everyday concern for developers. You may have heard this referred to as shift-left: shifting security tools as close as possible to developers. One of the reasons we want to do that is to increase efficiency for development and security. When your developers commit code, they're thinking about that code, they're in the flow, and they have the context of what they're trying to accomplish with it. If they introduce a security vulnerability, they immediately have the context to work through and remediate it. If we wait a week or two or three and deploy that code to production, the engineers have moved on to other functionality and no longer have the context of the code they wrote. So it's a lot more expensive and a lot slower to remediate vulnerabilities late in the process. This is that shift-left we were talking about: moving security testing as close to the actual development time as possible, so that you get the benefit of that shared code context for developers and reduce both the cost and the time to remediate those vulnerabilities.
So what does this look like in practice? We've got a visualization here of a traditional feature branch development process. As your developer works on a feature, they're pushing up commits. Every time a commit gets pushed to GitLab, we trigger our CI/CD process, which runs all of our security scanning. That is the point at which we're detecting all of those vulnerabilities.
You'll notice on the far right here that we run security scans again when you merge that feature branch into your default branch. When we run those scans, we compare the feature branch with the default branch, so you get a very direct cause and effect between the changes you're working on and the vulnerabilities they introduce. Developers can really understand: I made this code change, I introduced this vulnerability, and I need to go and remediate it. So that's how our SAST setup works today within GitLab.
We support a variety of tools: SAST, DAST, Fuzz Testing, Secret Detection, Composition Analysis, Dependency Scanning. All of these tools run within GitLab CI and produce outputs. We've normalized all of those outputs so that you can triage, remediate, and manage those vulnerabilities. There's a whole vulnerability management suite associated as well, and all of that is built in with our security scanning tools. We really want to make it as easy as possible for developers to run as many security scans as needed and to layer those on top of each other so that you're getting the best type of scan.
That's really one of the reasons why we're talking about Semgrep today. We think Semgrep is an awesome tool that is the future of software development and security testing. We want to bring best-in-class security tools to developers, and that is where the integration with Semgrep was born. So Semgrep is joining our entire lineup of SAST tools today, though we'll talk a little bit about what that migration path looks like. GitLab SAST is a conglomeration of about 15 different tools that support about 20 different languages. We have automatic language detection, so as you're pushing those commits, we're looking at the code changes and running the appropriate security scanner.
The scanners are all language specific today, so you're getting the best scan for the type of code change that you're committing. All of this is built in and easily configured. All you have to do is include the SAST CI template that you see the code snippet for there. We also have a UI tool to easily enable it, as well as to configure it if you need advanced configurations. We also support customizing rule sets, so you can pass custom settings to those analyzers as well. We try to build all of this so that it works by default. We've also built this into Auto DevOps, so you can check a box on your project and get all of GitLab's security scanning capabilities on and running by default. In general, the scanners just run and tell you about security vulnerabilities that you can then go and fix. That's really how we've built all of our security tools. And that, again, is why we are starting to migrate towards Semgrep: we think it is an amazing and very fast static analysis tool that's really pushing the bar forward for the kinds of vulnerabilities you can detect in source code.
So let's continue on and take a look at what we've got today. I did briefly mention that we support custom rule sets. This is for when you want to start fine-tuning your rules and adding custom rules. The reason I bring this up is that the Semgrep community is actually very rich. There are tons of community contributors building and writing rules for the Semgrep engine. So part of our transition to adding Semgrep is enabling all of those community rule sets. We'll talk a little bit more about that later.
With our 13.11 release, we announced a partnership with r2c and their Semgrep tool to power the future of GitLab SAST. As I mentioned before, GitLab SAST today is about 15 different tools. As you can imagine, managing and updating all of those tools, and keeping them running and secure, is a large burden.
There's also a lot of work to be done to add features like the custom rule sets feature that I talked about. We have to implement that 15 times, once for each of the analyzers that we wrap. It has worked up until now, but we want to support more languages and support them faster. So one of the thoughts we had was: what if one tool could support multiple languages? And what if it had a really great API and development team behind it? That's how we discovered Semgrep and the r2c team. We reached out to them and said, hey, we really like what you're doing. The tool is a really modern interpretation of SAST. We would love to replace some of our tooling with the Semgrep engine. And that is effectively how this partnership was developed.
In 13.11, we released a beta version of Semgrep powering our JavaScript, TypeScript, and Python analyzers. We recently released that in 13.12 as well. All of this together has helped us start reducing the number of security tools that we've got running, and we're actively transitioning these analyzers to Semgrep now. If you've used GitLab SAST and you have JavaScript, TypeScript, or Python code, you may not have noticed it, but you're already running the Semgrep engine by default. As part of our management of all of these tools, we actively change the way that those rules and the triggering mechanisms work. So behind the scenes we have swapped out those engines to run Semgrep. Ideally, you haven't noticed anything and you're getting the same quality of results. We did a lot of work to replicate the quality and the coverage of all of those security tools to make this as seamless a transition as possible. This is only the beginning for us. We really want to power the rest of SAST this way and continue transitioning a lot of our analyzers to the Semgrep analyzer engine.
The idea here is that by making this transition, we'll reduce the number of analyzers, we'll engage with the active community of rule writers that r2c has developed, and we'll contribute rules ourselves. This is all built on top of Semgrep's advanced detection engine, which has a number of modern techniques for detecting vulnerabilities that Luke is going to go into in a bit. It'll also help streamline the customizations that I talked about with custom rule sets, so that you'll have a single way to express vulnerability detection rules in a language that is really easy to understand. That was one of the things we really liked about Semgrep: its rule set grammar is very easy for developers to understand and to customize, which is not true for all of the tools that we have today. And this sets us up to enable new language support and expand the number of tools and languages that we're able to cover. So that is the background behind what we're doing with the Semgrep analyzer and why.
Just to give you a sense of scale, GitLab SAST today runs 2.75 million monthly scans. That is a SAST scan for every code contribution that is being run on GitLab. And of those 2.75 million, in the past couple of months we've already run nearly half a million Semgrep scans. It has been truly phenomenal to see the immediate success of this tool. We've done a lot of work to make this as seamless as possible. In most cases, developers on GitLab have not had to do anything to start leveraging the benefits of Semgrep, and we're only just getting started on unlocking all of the awesome functionality within Semgrep.
To that point, when we look at what's next, we're going to continue migrating some of our analyzers to Semgrep so that you're getting the benefit of the rule engine that Semgrep offers and the community of rule writers contributing to those rule sets.
We're also looking to add support for native Semgrep custom rule sets, so that if you find an interesting rule pack on Semgrep or want to work with the r2c team to develop a custom rule pack, you'll be able to bring that in and run it natively alongside GitLab SAST. And then we'll also start expanding to other languages. We're going to try Semgrep's new beta language support for Rust and then go from there as we enable and roll out all of the languages that Semgrep supports. So that's a look at where we are today and where we're going with GitLab SAST and the Semgrep engine. And now I want to hand it over to Luke O'Malley to talk about r2c and the Semgrep engine. Luke, take it away, tell us about your tool.
Great, thank you, Taylor, I really appreciate it, and hello everybody. I wanted to spend the little bit of time that we have together talking through Semgrep, both within GitLab and outside of it. I'll give you a little bit of context on the tool and the types of problems that it's really designed to solve, and we can answer questions during the Q&A.
As an illustrative question, I wanted to ask the group: what do SQL injection and cross-site scripting have in common? They are both vulnerability classes, but that's not quite the answer I'm looking for. We'll leave that question open for a second, and I want to share a story that will provide a little bit of detail on what these have in common from a Semgrep perspective.
Stepping outside of security and software engineering, I want to talk about building codes, and in particular fires within the state of California. I live in California, so this is very near and dear to me. I was reading about the Camp Fire, which happened about two or three years ago, and found this interesting stat: in 2008, building codes changed, and the effect that the building codes had on fire damage was dramatic.
If a building was built before 2008, 82% of properties were damaged, and if it was built after 2008, it was only 49%. The change was relatively small: it had to do with fire-retardant cladding on homes. But it was a building code, the way that we construct our homes, that had an effect on the safety and security of those homes into the future. The reason I bring this up is that we could have built faster fire trucks or maybe bigger hoses, but instead we changed the way that we actually build, and I think this is true for software as well.
So what would it look like to enforce software building codes? This is an approach that a few companies have taken and championed. Google comes to mind, and the React community comes to mind as well. The building codes in this case are referred to as secure guardrails or secure defaults. I want to talk a little bit about this, because this is how SQL injection and cross-site scripting have something in common. It has to do with the mitigations, and how you eliminate the prevalence of these types of issues in your code base by construction instead of looking for them after the fact.
In the guardrails case, for SQL injection, instead of making raw SQL queries, most folks now use an object-relational mapper of some sort. It's an abstraction on top of the database that they use to interact with it. When you introduce this, you virtually eliminate SQL injection vulnerabilities. I believe this is the approach that Google has taken to great effect. In the secure defaults case, the question really is: how do we construct a library or a framework that is very difficult to misuse, that doesn't require specialized knowledge to use, and where you know when you're going outside of normal practice? So for React, instead of having something like setInnerHTML or direct innerHTML editing, they're very explicit that it's a dangerous action.
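As a rough illustration of the SQL injection guardrail described above, here is a minimal Python sketch using the standard-library sqlite3 module. The table, the sample data, and the function names (make_db, find_user_unsafe, find_user_guarded) are hypothetical and not from the talk. A parameterized query stands in here for the ORM layer mentioned by the speaker, on the assumption that both work by keeping user input out of the SQL text.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two sample users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "dev")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Raw string building: the input becomes part of the SQL text itself.
    query = "SELECT name, role FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_guarded(conn: sqlite3.Connection, name: str) -> list:
    # Guardrail: the input is bound as a parameter and never parsed as SQL.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()
payload = "nobody' OR '1'='1"          # classic injection attempt
unsafe_rows = find_user_unsafe(conn, payload)    # matches every row
guarded_rows = find_user_guarded(conn, payload)  # matches nothing
```

With the guarded version as the only sanctioned way to query, the injection attempt is treated as an odd user name rather than as SQL, which is the "by construction" property the building-code analogy is pointing at.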
In React, dangerouslySetInnerHTML is what a software engineer has to interact with. These are the types of things that I and the members of the Semgrep community feel are the future of DevSecOps, shifting left, and a secure code base. And that's where Semgrep really comes in. The intention of Semgrep is not only to find bugs, but to help you enforce these code standards. Those might be standards that are set by the community, for example the maintainers of a particular library who have standards that they believe should be followed. Or they might be specific to your organization, like the ways that you use authentication libraries or the ways that you interact with databases.
Semgrep is an open source tool. It works on 17-plus languages right now, so it has broad language support, which Taylor mentioned as one of the compelling reasons to try it out. And we do have a rich community. We've had 1,000-plus rules contributed to the registry, with more and more contributed every day. So if you just want to get started, you can go and use rules that other folks have already written, which helps us rapidly codify the knowledge of other security teams and AppSec members.
On writing rules, one of the things we found is that a lot of tools required a lot of specialization. You had to be a security researcher, maybe a PhD, and know about things like abstract syntax trees. That was a barrier for a lot of folks who had knowledge but weren't able to share it via a rule. So we've made it really easy to more or less copy the code that you want to match, and that becomes your rule or your pattern. It's really the democratization of rule writing. It runs super quick. This is all part of the shift-left theme.
You've got to run a tool like this as close to code conception time as possible, so we have a lot of things for that. And it should be easy to adopt, so there are some niceties there that the GitLab team has also provided.
When you think about Semgrep, there's always the question: where does it fit into the space? I've heard of linters. I use grep sometimes. There are some more complex, maybe expensive tools that we've discussed using. Where are you? Semgrep falls somewhere in the middle of the spectrum. We wanted a tool that was easy to use but powerful, a tool that was smart but simple to use. So we position ourselves in the middle: Semgrep is aware of the semantics of a programming language, so it's not just performing a textual search like grep might. It also understands things like the different ways to import a library in Python. Semgrep takes care of those edge cases so that you don't have to think about all of them. That's some of its intelligence to make rule writing easier.
Within the GitLab context, if you want to learn more, there's a specific page you can go to: semgrep.dev/for/gitlab. It talks about both GitLab's native integration and some of the Semgrep community's customization on top of that, specifically so that you can use rules from the registry. We're going to dive into that in just one second, so there's a little bit of information there for you to check out. The registry, which is at semgrep.dev/explore, contains both individual rules that members have authored and collections of rules, which we call rule sets. You can go through the registry and search it by language or by the type of issue that you're interested in.
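To make the contrast with grep from earlier in this part of the talk a little more concrete, here is a toy Python sketch built on the standard-library ast and re modules. It is not how Semgrep is implemented; it only illustrates why resolving import aliases matters. The sample source, the choice of pickle.loads as the call of interest, and the function names (text_matches, semantic_matches) are all assumptions made for the example.

```python
import ast
import re

# Four ways of reaching the same function, as discussed for Python imports.
SOURCE = '''
import pickle
import pickle as p
from pickle import loads
from pickle import loads as unpack

pickle.loads(a)
p.loads(b)
loads(c)
unpack(d)
'''


def text_matches(source: str) -> int:
    """Grep-style search: count occurrences of the literal call text."""
    return len(re.findall(r"pickle\.loads\(", source))


def semantic_matches(source: str) -> list[int]:
    """Resolve import aliases, then return line numbers of pickle.loads calls."""
    tree = ast.parse(source)
    module_aliases = set()  # names bound to the pickle module
    func_aliases = set()    # names bound directly to pickle.loads
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "pickle":
                    module_aliases.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module == "pickle":
            for alias in node.names:
                if alias.name == "loads":
                    func_aliases.add(alias.asname or alias.name)
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in func_aliases:
            hits.append(node.lineno)
        elif (
            isinstance(func, ast.Attribute)
            and func.attr == "loads"
            and isinstance(func.value, ast.Name)
            and func.value.id in module_aliases
        ):
            hits.append(node.lineno)
    return sorted(hits)


text_count = text_matches(SOURCE)      # only the fully spelled-out call
semantic_hits = semantic_matches(SOURCE)  # all four call sites
```

The textual search sees one call while the alias-aware search sees four, which is the kind of edge case a rule author would otherwise have to enumerate by hand.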
You can click on any rule or rule set in the registry and run it very quickly, either in your terminal or, if you're using the integration that I spoke about earlier, by adding it to your job definition within GitLab CI. So it's very easy to go in and start adding more. Here's a sample where I've chosen to run a rule set that covers OWASP Top 10 types of issues. And if you're feeling super adventurous, you're fully committed to shift-left, and you want to get results in front of developers, there's also an integration for inline merge request comments, which as a developer has been really nice to get. I just wanted to show that briefly. Again, you can go to semgrep.dev/for/gitlab for more details, and the GitLab documentation is fantastic for the native GitLab integration.
We promised to talk a little bit about what's coming next for Semgrep. The big thing that we're excited about is helping the GitLab team transition more of their analyzers. It has been a truly awesome relationship, both because the team is contributing their rules back to the registry for everyone to use, and because it has pushed us on the performance front and on grinding down the sharp edges, which we've really valued. We also have our eye on a number of performance improvements. We've got the catchy tagline, static analysis at ludicrous speed, but we want to be even faster, specifically improving the in-editor experience so that you can run your rules, or rules from the community, on every single file save within the editor. That's truly trying to shift left. There have also been some questions about the power and capability of Semgrep. One of the features that we're working on is further tainting capability, so that you can specify sources and sinks and accomplish more sophisticated types of rules and checks.
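For readers unfamiliar with the sources-and-sinks idea just mentioned, here is a deliberately tiny Python sketch of taint tracking over top-level statements, using the standard-library ast module. It is an illustration of the concept only and does not reflect Semgrep's engine or rule syntax. The choice of input as the source and system as the sink, the sample program, and the function names (call_name, is_tainted, find_tainted_sinks) are assumptions for the example.

```python
import ast
from typing import Optional

SOURCES = {"input"}   # calls whose return value is untrusted (assumption)
SINKS = {"system"}    # calls that must not receive untrusted data (assumption)


def call_name(node: ast.AST) -> Optional[str]:
    """Return the bare function or attribute name of a call node, else None."""
    if isinstance(node, ast.Call):
        func = node.func
        if isinstance(func, ast.Name):
            return func.id
        if isinstance(func, ast.Attribute):
            return func.attr
    return None


def is_tainted(expr: ast.AST, tainted: set) -> bool:
    """An expression is tainted if it contains a source call or a tainted name."""
    for node in ast.walk(expr):
        if call_name(node) in SOURCES:
            return True
        if isinstance(node, ast.Name) and node.id in tainted:
            return True
    return False


def find_tainted_sinks(source: str) -> list:
    """Walk top-level statements in order; report lines where taint reaches a sink."""
    tainted: set = set()
    findings = []
    for stmt in ast.parse(source).body:
        # Check sink calls using the taint known before this statement.
        for node in ast.walk(stmt):
            if call_name(node) in SINKS and any(
                is_tainted(arg, tainted) for arg in node.args
            ):
                findings.append(node.lineno)
        # Propagate (or clear) taint through simple name assignments.
        if isinstance(stmt, ast.Assign):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if is_tainted(stmt.value, tainted):
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return findings


EXAMPLE = '''
import os
name = input()
cmd = "echo " + name
os.system(cmd)
safe = "echo hello"
os.system(safe)
'''

findings = find_tainted_sinks(EXAMPLE)  # only the call fed by input() is reported
```

The sketch ignores functions, branches, and sanitizers entirely; its only purpose is to show that a taint rule is about data flowing from one place to another, not about the shape of a single line of code.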
And then finally, more languages. More and more folks, both within the r2c team and within the community, are writing new language parsers. I know that we're adding Bash right now, HTML, Rust as we talked about earlier, Kotlin, basically all the languages that you can think of. We're really fortunate to be able to use the tree-sitter project as well, which is another open source project. If any of this seems interesting to you, we have a really active and, I think, really friendly community. You can access it through r2c.dev/slack, and folks are there to answer your questions, both from the r2c team and from the broader community. I would love to get a chance to show you what Semgrep is all about. And that wraps it up. Taylor, is there anything that you want to add before Q&A?
No, I think in general, hopefully you've enjoyed this presentation, hearing from Luke and getting a slight peek into how awesome the Semgrep tool is. Like I mentioned, we've had a blast working with the r2c team. They're a great group of people. If you are running GitLab SAST today, you're likely already using Semgrep and don't even know it. If you're interested in customizing that experience, looking further into Semgrep is probably a great next step. And like I mentioned, we'll be coming out with more native integrations so that you can develop your own custom rule sets and run them natively alongside GitLab. So there are lots of exciting capabilities here, and ultimately all of this is about helping you write more secure code. That's what we're here to do. Everyone can contribute, and hopefully with GitLab SAST and Semgrep, everyone can write really secure code. Luke, thanks again for joining me today. It was awesome to hear more about the Semgrep tool and where you're headed in the future. I can't wait to work more with your team.
We feel the same. Thank you, Taylor, and thank you everyone for tuning in.
Awesome, thanks everyone.
Have a great Commit and we'll see you next year. Thanks.