Good afternoon, everybody. Thank you for coming to my talk. My name is Michael. I work for Rackspace. Today I'm going to talk about how to integrate security into the CI/CD pipeline. Before I start, I want to do a little survey. How many of you are working in the security field? Great, we have lots of security folks. How many of you are developers? That's great. How many are QE or quality? Do you all know CI/CD? How many know CI/CD? How many don't know CI/CD? I'd say everybody knows CI/CD. That's great. Unfortunately, Jim will not make it here today, so I will do the talk by myself.

I'm a manager for security engineering at Rackspace. I have been working at Rackspace for the past couple of years, and I have been in the security field for more than 10 years. During the past two years, my team has spent a lot of time testing a lot of OpenStack projects, including Heat, Neutron, CDN, and Solum, and we found quite a few security defects. At the same time, a lot of Rackspace teams started to adopt the CI/CD process. So we have been working very hard to integrate security into the CI/CD process, and I would like to share our experience with you all.

Before we start: if you went to the keynote by Jonathan, you will remember Jonathan mentioned that everything is about software. OpenStack is a platform that lets people run software. We give it to our customers, and our customers run their software. Along the way, our developers also create software, our testers test software, and our security engineers check the security of the software.

Unlike a lot of traditional software security companies, Rackspace does things a little bit differently. At Rackspace, security engineering belongs to quality engineering. We strongly believe that quality includes three things: function, performance, and security. Those are the key pillars of quality. For your software to work, it must function properly.
If you send a POST request, the resource should be created correctly. The software also should perform well enough to meet our SLA: if we say we're going to create this resource within two minutes, it should happen within two minutes. And it should also be secure. So our security engineers have been working closely with our performance engineers and quality engineers to make sure all the software we develop and give to our customers is tested thoroughly, and that no security defects are introduced in the process.

If we want to test software, we need to know and follow how developers are creating software. There are tons of ways for developers to create software; here are the three main ones: the traditional waterfall method, the agile process, and the CI/CD process. Let's take a look at them one by one.

A lot of products are still created using traditional waterfall. Traditional waterfall is a linear process: you go step by step. First you design and come up with all the functional requirements, then you develop the software, then you test — the UI tests, the functional tests, the security tests — and then you deliver it to the customer. One advantage of traditional waterfall is that you plan everything up front. You come up with detailed plans: from day one to day sixty we do functional requirements, from day sixty to day ninety we do development, and so on. Another advantage is that before you start to develop the software, you have already defined the scope: this is what we're going to deliver to the customer, and we work very hard to deliver it. And some people say that because waterfall forces you to think about the whole structure of the software, it might produce a better design, because you have to think about every component and how they interact with each other.
But one of the main problems of the traditional waterfall approach is that it might take too long to develop the software, and because there is little interaction between the developer team and the customer, when the developer team finally delivers the software, the customer says: oh no, that's not exactly what I want.

So people came up with the agile process. I'm pretty sure everybody knows how the agile process works. The key benefit of agile is increased interaction between the customer and the developer team, so that software can be delivered on a pretty good timeline. You run sprints — normally a sprint is two weeks or four weeks — you come up with minimal requirements for those two weeks, and you deliver the software in two weeks. After another two weeks, you deliver another feature. Based on customer feedback, you can always make a change. The ability to change is very important; that's what makes it different from traditional waterfall.

But the business is not satisfied with the agile process. They want to develop their products even quicker. In today's world, if you can release a key feature even one day ahead of your competitors, you gain a great advantage. So people came up with the term CI/CD. For the CI/CD process to work, people want to release their products multiple times each day. They want small changes. You could bundle a couple of small changes into one big change, but the problem with big changes is that if something goes wrong with the deployment, it's very difficult to troubleshoot. With separate small changes, it's very easy to identify what the problem is. So a benefit of CI/CD is that you have fewer defects and the quality of the product improves. And since every commit to the repo always ends with a deliverable, it basically means you can release whenever you are ready. Some of the teams release multiple times each day.
All of those are good things, but they cause some problems for the security team. Generally, the security team has limited resources: we don't have enough people to test our software, and we cannot test it quickly enough. So we are trying to find better ways to improve. At the same time, when we work with the development team, there is always a struggle over priorities. We always want security to be the top priority, but the business wants features, and they want them as fast as they can get them. Because developers are always busy with their software, it's a challenge to keep both sides satisfied. And a lot of our testing has to be manual, because we want better test results — especially for OpenStack, because it's API-based and there are no automated tools — so it might take a long time for us to test the software.

All of those are problems, but if we take another approach, they are all opportunities for us to improve. At Rackspace, we have been working very hard to take this opportunity to improve our process: we are integrating security into our CI/CD process.

Here is a typical OpenStack project at Rackspace, and you can see a typical CI/CD pipeline. We have developers; whenever they commit code to GitHub, GitHub triggers our build server, which automatically builds everything. Once the build succeeds, we run our tests here: the security tests and the functional tests. This is where we inject our security tests — our source code security testing is here, and our dynamic API testing is here. If any of these tests fail, the system automatically generates reports and sends them to the team, including the security engineers, so we know something is wrong and action should be taken. And we treat security defects the same as functional defects.
So the pipeline will not continue until the security defects are fixed. Once the build is successful, it moves to staged deployment: a deployment to a predefined environment, like the staging environment or the pre-production environment. There we run all kinds of tests — smoke tests to check that it functions properly, performance tests, and this time we also run our security tests to make sure everything we deploy into production is secure. If anything fails, alerts are sent back to the team, and the team takes action to correct it.

Once we built the pipeline, we thought: cool, we did it. And there were quite a few benefits we saw immediately. The first benefit is test time. Before the CI/CD process, we had some automated tools, like Burp and some others, but the whole process was still manual and very difficult to repeat. If a product was released to data center A, we would get together with the developers, ask for all the documentation, try to make the functions work, and then start testing based on different attack vectors: check for SQL injection, check for authentication and authorization, check for transport layer security, and other security tests. Because there are so many functions to test — we needed to test every function and every parameter, including every HTTP header — it takes a while to do this. And then the developers say: oh, we made a small change, we added a new feature, and we released to another data center. That means we have to repeat the whole process for that data center. If we have five data centers, we have to repeat the whole effort five times. So the testing might take a long time — a couple of months — and it's really painful.
With the CI/CD process, the first run takes some time, because we need to create those automated test scripts; there's no way to cheat on that part. But once we have created all those automated test scripts, running them is easy: you just type the command line, our framework runner keeps going through all the test suites, and normally a test run takes a couple of hours. If you want to test another data center, it's just a simple change in the config file — we point it to, say, the London data center — and in a couple of hours we have our test results ready.

The second big advantage of this approach is that our defect fix time is reduced from weeks to days. In the old way, after we finished our testing, we generated these beautiful PDF reports to meet the compliance requirements and sent them to the developer team. The developer team put them into their backlog and started working on them. Once they finished the fix, they sent us an alert, an email, and we went and checked whether it worked, then closed it in our defect system. Sometimes there was more than one iteration, so it took a long time, because it depended on the developers' priorities and how much time they had. But with the CI/CD approach, a security defect is just like a functional defect: it's a roadblock, so it forces the developer team to take action. We have had a couple of cases where, once we identified a security defect, the fix was in within five or six minutes, they pushed it to the staging environment, and we verified and closed it quickly.

And for our security testing, by creating automated security tests, we can repeat our tests. Just like I mentioned before, if we want to test different data centers, it's just a simple configuration change.
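The "point the same suite at another data center" idea boils down to a config lookup that the tests read before running. Here is a minimal sketch of that pattern; the file format, keys, and `example.com` endpoints are invented for illustration and are not Rackspace's actual framework configuration.

```python
# Illustrative sketch: point one security test suite at different data
# centers by changing only a config value. All names here are made up.
import json

def load_target(config_text):
    """Build the base URL the test suite will run against."""
    cfg = json.loads(config_text)
    return "https://{region}.{service}.example.com/v2".format(
        region=cfg["region"], service=cfg["service"])

# Re-targeting the whole suite is a one-line config change:
dfw = load_target('{"region": "dfw", "service": "neutron"}')
lon = load_target('{"region": "lon", "service": "neutron"}')
```

Every test then builds its requests from the returned base URL, so the same couple-of-hours run works against any region.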
It's also auditable, because with the CI/CD process everything is saved on the build server, so we can always come back and check: on Monday we ran this security test, on Tuesday we ran this security test, and so on. It leaves an audit trail for us. Whenever it comes to PCI compliance — did you guys test this product before? — I can say: yes, we did, here are the results, here are the issues we identified and how fast they were fixed.

Another big advantage of this approach is that it generates a lot of metrics for us. We know how many tests we ran and how many issues we found, and we can build a timeline and tell the development team how each type of defect appears — for example, SQL injection appeared two times in the past year, and input validation has been a consistent problem. Based on these metrics, we can work with the development team on possible training and improve our code in the future.

Of course, during this process we ran into some challenges. As I mentioned before, the first run is always time-consuming, because we need to spend time creating all those automated security test scripts. The advantage for us is that we have been using our QE framework, so we save some time: instead of exploring the application by ourselves, trying to figure out what's going on, or reading the documentation, we can just borrow their code, and our quality engineers can also help us create security test cases. They can help us speed up some of these parts, but it's going to take some time if you want automation. Another challenge is that different projects run in different ways — different teams run different pipelines, so there are always some tweaks. The OpenStack QE framework is different from the Rackspace QE framework, and we had to overcome all those hurdles. And the last and most important one: it takes some time for security engineers to adjust to this approach.
CI/CD is a process, but the most important thing is the mindset. Everybody should agree that this is something we need to do: whenever the code changes, all this stuff should be automated, and we should follow the pipeline all the time. It's a little difficult for security engineers to adjust, because we got used to the security consulting role: we test your product, send you a report, and it's all your problem — fix it and come back.

Let's look at how, since we want to integrate with the CI/CD process, we are really going to do it. The key is automation. The only reason CI/CD can work is that everything is automated. So for security, what should we automate? Generally, there are three types of security testing. The first is static code analysis. The second is dynamic application testing — for OpenStack, that's mostly API security testing. And we also have infrastructure testing: basically scanning the network and the servers to check for open ports and whether you are running outdated software.

Let's first look at static code analysis. Static code analysis is a process where you have access to the source code. You get a copy of the source code, you read through the logic, and you check how data is processed step by step: which function is called, how does the code call the database? Then you check for common security vulnerabilities, like SQL injection, improper input validation, and so on. There are generally two ways to do it. One is to manually review all the code. Manual code review really can find a lot of good defects, but it takes a long time. The general rule of thumb is that a good security engineer can review up to 200 or 300 lines per hour. If you think about the size of OpenStack, it would take a very long time to review. So we definitely need help.
There are a couple of commercial products — Fortify, Veracode, WhiteHat — that are specialized automatic scanners. Those scanners read your source code and, based on their rules and parsing results, give you a list of the defects they found. But unfortunately, this does not work for us, because so far none of those commercial vendors support Python. Python is a dynamic language, and it's very difficult to do data flow analysis on it. We talked with all those vendors, and they keep telling us: oh, this is going to be a new feature next quarter — and so far it has not happened.

So here come our heroes. The OpenStack security community saw that there is definitely a gap here, and they created a project called Bandit. Bandit is an automated framework that scans your code for security defects. Since it has existed, it has been used by a lot of projects; Keystone has already integrated it into their CI process. And at Rackspace, we have already used Bandit on a couple of projects.

Here is an example of the results reported by Bandit. We ran Bandit on the Solum project, and you can see it definitely gives us some results back: it can tell you the severity of these defects and the confidence in them, and which lines of code might have security defects. The first one is about a subprocess call with shell equals true. Basically, the code is trying to run some system command, and Bandit sees that it's a dangerous function and gives you a warning. The second one is a warning about using random: random is used to generate random numbers, but it's not a cryptographically strong random generator, so it warns you that you should not use it for cryptographic purposes. Just like any other automatic scanner — and you can see it here — it might give you false positives, because you cannot say definitively in this case that it's a security defect.
Random is used, but is it used for encryption or decryption? Maybe not. So, just like with other tools, you need to dig a little further to make sure the findings are not false positives.

A good thing about Bandit is that it comes with a configuration file where you can include the types of checks you want. For example, if your application does not use SQL, you can completely remove all the SQL injection checks. And if you don't want it to look into your test cases, look at the example here: you can exclude the tests directory, which means we are not checking for security defects in our test code.

In some cases you will find new defects, right? New security defects, found either by manual code review or some other approach. Bandit provides a very flexible way to write plugins, so you can easily add a plugin to the framework. Here is one of the plugins, for shell injection. If you look at the code, you can see how it works. The key check matches the function name against something defined in the configuration file — the subprocess functions. The second check looks at the call arguments for shell equals true. If both match, it returns an issue, saying: we identified an issue — you are using this dangerous function with the argument shell equals true, so this is a security defect.

At Rackspace, we have been running Bandit on a couple of projects. Whenever code is checked into GitHub, it automatically kicks off Bandit; Bandit runs, sends the report to the team, and keeps all the results on our build server. That gives us a baseline of how the code is doing over time.
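The two-step check described above — match the function name against a configured list, then look for a `shell=True` argument — can be sketched as a small, self-contained AST walk. This imitates the idea of the plugin shown on the slide; it is not Bandit's actual plugin API, and the `DANGEROUS_CALLS` list is an illustrative stand-in for the configuration file.

```python
# Self-contained sketch of the shell-injection check described above:
# flag calls to configured subprocess functions that pass shell=True.
# This illustrates the idea only; it is NOT Bandit's real plugin interface.
import ast

DANGEROUS_CALLS = {"subprocess.call", "subprocess.Popen", "subprocess.check_output"}

def find_shell_injection(source):
    issues = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Reconstruct a dotted name like "subprocess.call" for the call target.
        func = node.func
        if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
            name = f"{func.value.id}.{func.attr}"
        elif isinstance(func, ast.Name):
            name = func.id
        else:
            continue
        # Check 1: does the function name match the configured list?
        if name not in DANGEROUS_CALLS:
            continue
        # Check 2: is there a keyword argument shell=True?
        for kw in node.keywords:
            if kw.arg == "shell" and isinstance(kw.value, ast.Constant) and kw.value.value is True:
                issues.append((node.lineno, name, "subprocess call with shell=True"))
    return issues
```

Running `find_shell_injection("import subprocess\nsubprocess.call('ls ' + path, shell=True)")` flags line 2, while code without `shell=True` produces no findings.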
At this stage, we do not fail the whole pipeline on Bandit results because, as we said before, there might be some false positives. But if we introduce one feature and our defect count jumps from 10 to 30, that's definitely a red flag, and a security engineer should jump in and take a close look at what happened.

Next, let's talk about dynamic API testing. Basically, we need a working environment, and we send test requests and check the responses to identify security defects. It would be great if there were a commercial or open source product that we could just run automatically to get a list of security defects, but unfortunately, so far, that does not exist. At first we were thinking: how about we create our own framework and do some testing? After we talked with Sam from Rackspace, he quickly said no. And here come our heroes again: our quality engineering team. They already have a framework that does automated functional testing, so we can definitely leverage what they have. We can build a security plugin in their framework that specifically does security testing. We can also reuse their code, because they have already tested the products and made sure all the functional code works, so there's no need for us to read the documentation and send requests to see whether the JSON in the documentation is correct or not. That takes a long time — trust me — because OpenStack documentation sometimes needs a lot of improvement.

And we came up with our checklist. Here is the list of checks we run for all our products. You can see there are common injection defects and transport layer defects, and the last category is application-specific attacks. That one differs from project to project: based on what the application does, we design the attacks and tests we want to run.
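A checklist like this lends itself to a data-driven harness: one table of attack payloads per category, substituted into each request parameter, with any server error flagged. The sketch below illustrates that pattern under stated assumptions — the `send()` stub stands in for a real HTTP client from the QE framework, and the payloads, field names, and status logic are all hypothetical.

```python
# Minimal sketch of a checklist-driven test loop: each checklist category
# becomes a list of attack payloads, and any 5xx response is flagged.
# send() is a stub for a real QE-framework client; names are illustrative.
CHECKLIST = {
    "sql_injection": ["' OR '1'='1", "1; DROP TABLE users--"],
    "input_validation": ["1" * 10000, "\x00", "../../etc/passwd"],
}

def run_checklist(send, field):
    """Substitute each payload into `field` and flag server errors."""
    findings = []
    for category, payloads in CHECKLIST.items():
        for payload in payloads:
            status = send({field: payload})
            if status >= 500:  # a 5xx means we crashed the server
                findings.append((category, field, payload, status))
    return findings

# Stub server: dies on very long input, rejects the rest cleanly with a 400.
def fake_send(body):
    return 503 if any(len(v) > 1000 for v in body.values()) else 400

findings = run_checklist(fake_send, "dns_nameservers")
```

A well-behaved API should return 4xx for every payload; the long-string payload here trips the stub's 503, which is exactly the kind of finding the loop is meant to surface.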
After we came up with the checklist, we worked with our quality engineers, got their source code, their models, and their client, and then we started creating our security test cases. Here is one example we created for the Neutron project. This one is for authorization, and the code is pretty straightforward: we create two clients, one client creates one network, the other client creates another network, and we check whether client one can access the other client's network. The second case checks whether one client can update the other client's network. Once we build all those automated security test cases, they go through the same process as our QE's tests: they go through Gerrit review and get merged into GitHub. Then, whenever new code is pushed to GitHub, all our security test cases run automatically.

Let me share some of the security defects we identified. This defect was found in Neutron. This is the POST request that you send when you want to create a subnet. In the payload, you define your subnet name and your network ID — and pay attention to the parameter called DNS name server. The proper value for this should be an IP address or a list of IP addresses. But we found that if we send a long string of ones, the server gives us a 503, and in addition, any further request to the server returns the same. After working with the developer team, they tried to log into the server and found they could not even run a trace on it. Basically, this one request killed the whole API node. After digging around in the source code, it turned out that for this security defect, the code did try to do user input validation — but the input validation relied on a regular expression.
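The failure mode behind this defect is catastrophic backtracking: a regular expression with nested or ambiguous quantifiers can take exponential time on a non-matching input, pinning the CPU. Neutron's actual regex is not reproduced here; the vulnerable pattern below is a classic textbook example of the problem, and the `ipaddress` stdlib module shows a backtracking-free way to validate such a field.

```python
# Sketch of the failure mode: catastrophic backtracking. The vulnerable
# pattern below is a classic textbook example, NOT Neutron's actual regex.
import ipaddress
import re

VULNERABLE = re.compile(r"^(1+)+$")   # nested quantifiers: exponential on failure
attack = "1" * 40 + "!"               # long run of ones that cannot match
# VULNERABLE.match(attack)            # do NOT run: this can hang for a very long
                                      # time, which is what took down the API node

def valid_nameserver(value):
    """Backtracking-free validation using the stdlib instead of a regex."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False
```

With strict structural validation, the long string of ones is rejected immediately instead of tying up the server.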
Their regular expression was so complex — the most complex regular expression I have ever seen — that our attack string killed the regular expression check, and that killed the API nodes.

So here are some lessons we learned by integrating our security testing into our CI/CD process. First, the CI/CD process has become popular; more and more projects are adopting CI/CD for better code and better quality, and this definitely brings an opportunity for security teams to improve the traditional way of doing things: we should automate our security testing and improve our process. Second, Bandit is a very good tool to build on for the future. Even though it is missing some features that are available in commercial automatic scanners, it looks promising, and by working together and contributing code to Bandit, we can definitely make it a good tool for us to use in the future. The last lesson is about collaboration. Internally, we collaborate with our quality engineers, performance engineers, and development teams. More importantly, we need to collaborate with the OpenStack community: we are going to push our code to OpenStack, and we want OpenStack to integrate our security testing into its test suites in the future. Together, we can make OpenStack a better and more secure product. Thank you all. If you have a question, please use the microphone here.

All right, thank you. I have a question. You just mentioned your team is using its own QE framework. Why are you not using the upstream Tempest for the functional tests? It's open source, it's widely available. Also, do you have any plan to push the security test cases upstream?

Yeah, that's something our team is currently working on. Currently, we are working with the CDN team and the Solum team.

Are you going to push it to Tempest or another project in OpenStack?

We are going to try to use Tempest for this one. Okay.
And probably a separate folder in Tempest — there are the API and CLI folders, and maybe another security folder in Tempest. We are still debating which one we should use. Okay. All right. Thank you.

I'm trying to stay away from this wall. With your testing framework, your REST API testing, it looked like a lot of it was modeled off of basic functional testing. I see the paradigms of security testing and functional testing as radically different. The tests you write today look for a single input and a positive output, whereas in security, anything can be considered input. Along with that, how do you readjust your frameworks and your tests based on new and emerging security vulnerabilities — like the shift right now from UTF-8 encoding attacks to UTF-16 encoding attacks? How do you make sure you're ahead of the game when those new vulnerabilities come into play?

There are two questions here. The first is how functional tests differ from security tests. In our view, they are the same type of test, but the focus is different. QE has negative tests, and those are, to some extent, security tests. So our tests leverage the same framework; we send attack strings. For example — just like the example we showed here — instead of a valid IP address, we try different values to see whether we can crash the server or get a different response. That part we can handle. The second question is about new attacks. For new attacks, the framework will not do it automatically. We have another suite that we use ourselves, just for security — basically fuzzing. We feed it a lot of things, like Unicode, different encodings, and other stuff. But that takes a long time, because we have a huge list of fuzz strings for all of the endpoints, so it cannot be integrated into the CI/CD process — it might run two or three days.
But once we identify something — for example, if we find that the string of all ones can kill the Neutron API — we build a test case into our test suite to make sure that, in the future, it will not cause a problem. Thank you.

Thank you. Hi, my name is Rob Clark, I'm from HP. Firstly, I want to say thank you for doing this talk and rounding out the security track today; I think you did an excellent job. It's really encouraging to see Rackspace and other big organizations pick up Bandit, which we open sourced recently, and it's really great to see it's in your CI/CD chain already. My question is: have you developed any extra plugins or enhancements for it, and will you be pushing them upstream?

At this stage, we haven't yet, but that's what we are hoping to do. We definitely want to get more involved with Bandit and try to push some of our security tests to the community.

Cool. Well, we have code sprints on Bandit about once or twice a year, and we'll look forward to seeing Rackspace there. Cheers.

Yeah, great.

Hey, my name is Tim Kelsey, I'm also from HP. Really great talk. We actually have a talk about Bandit specifically later this week, which should be quite interesting. I wanted to ask whether any of the findings from the work you've done end up documented anywhere. The security project — the recent merger between the OpenStack Security Group and the VMT, the vulnerability management team — has a couple of initiatives, the OSSAs and the OSSNs, where we try to document potential security issues and things that can be avoided with configuration changes. I just wondered if any of the stuff you've discovered through this testing process gets fed back into that process at all?

Some of the defects we identified we logged through Launchpad as defects and went through the process, but for some of the tests, we did not.
It depends on which project we're working on. If the code belongs to the OpenStack community, we follow the process. If it's Rackspace-customized, we do not. Okay, cool. Okay, thank you very much. Thank you.