Okay. Hey, everybody. My name is John Price. I'm with IT at Intel. A few months ago somebody said, hey, you should put together something to present at the Cloud Foundry Summit. And I'm like, what would we do? And everybody's like, well, how about something about security? I'm like, yeah, why not our security guys? They're kind of psycho. We call it psycho security. That'll draw people in. They'll all show up. And much to my dismay, here you all are, because public speaking isn't my thing. But anyway, I'll give it a shot. I've been with Intel for about 15 years, and I've been involved in Cloud Foundry since early in the V1 days; we've been deploying and using it ever since. We're a pretty small team. Up until recently it's been pretty much two of us engineering this stuff, myself and my partner, Aaron Huber, here. We just got two more guys in Costa Rica who are ramping up right now, because we're expanding. We're growing all the time.

A little bit about Intel IT. We're a pretty big company; most of you have heard of Intel, I hope. We're over 100,000 employees, and about 6,000 of us are in IT. We're actively consolidating our data center footprint: we're down to 61 data centers right now, from 91 in 2010, and we actually had 142 in 2007. So a lot of virtualization, a big effort to consolidate. We support almost 120,000 devices, just about 54,000 of them handhelds, and most of those are employee-owned, bring-your-own-device sorts of things.

Our objectives here: we're going to talk about the importance of security and why we can't ever assume that our own networks are secure. That seems to be a common misconception, that once traffic hits your network, everything is safe and secure and you can do whatever you want. That's not actually the case; most breaches happen internally. And I'm going to show you how we at Intel have addressed that problem ourselves by enabling end-to-end network encryption.

So, security breaches are on the rise. I'm sure you've all heard about them lately; they've been in the news. According to a 2014 report called the Insider Threat Security Manifesto, 35% of organizations in the U.S. and U.K. with over 10,000 employees have had an internal security breach in the last year. That's a lot, and those are just the ones they know about. So if you haven't been hacked, it may just be a matter of when. Hopefully not, but that seems to be the trend. Here are a few examples: Operation Aurora, Stuxnet, Target and Home Depot with their point-of-sale devices being hacked, Anthem Blue Cross Blue Shield, Sony. It's been all over the news.

A little bit about why everybody should care about security. Everybody has a role in this. As a platform operator, your role is to protect both the platform itself and your customers, and not just the customers who are landing apps, but the customers consuming the apps that are landed. As a customer, when you initiate a secure connection to a platform, to an app, over HTTPS, it's reasonable for you to expect that your connection and your data are secure all the way through, not just to the first load balancer they hit. As an end user, you just say: I'm making a secure connection.
You don't know that down the road it's being decrypted and sent in clear text.

As a platform operator, you also need to protect your platform. If somebody gains administrative access to your Cloud Foundry deployment or your BOSH deployment, they own that thing. They can see all the customer apps, all the data. They could destroy your platform. None of that would be good. You also need to protect the rest of your org. If somebody gets access to an account, like a valid domain account on the network, that could be the gateway, the entry point, to gain access to other things on the network beyond your own Cloud Foundry stuff. And as a developer, you need to be aware of what the vulnerabilities, the potential areas of exposure, are within the platform you're landing on. If you're deploying to Cloud Foundry, what are the potential areas where your data could be compromised, and how can I design and code my app in a way that mitigates some of that?

So, again, never assume your network is secure. At present, in a standard Cloud Foundry deployment, there is data being transmitted in clear text between the servers in your deployment. All the Cloud Foundry servers have clear-text messaging going on between them, and that includes usernames, passwords, and all customer application data. To help show that, here's a diagram depicting it. In the standard deployment, you have some sort of load balancer sitting in front of your Cloud Foundry deployment, and somebody comes in with their CLI or browser, whatever it might be, and initiates a secure HTTPS connection. That traffic is terminated at the load balancer, and from there on, everything inside is sent in clear text. So if somebody gets a worm on your network, or they've compromised it some other way and they're sniffing traffic, all of that is visible to them.

All right. So what have we done at Intel to address this problem? Our solution includes three main components. First, we've deployed HAProxy from the cf-release; with BOSH, we just deployed HAProxy, and we use it to terminate the HTTPS traffic from our load balancer. Second, we've implemented IPsec to provide encryption between all of our Cloud Foundry and Iron Foundry servers. For those of you who don't know, Iron Foundry is a component you can add on to Cloud Foundry that gives you a Microsoft .NET stack so you can deploy .NET apps; it works well for us. And third, iptables and Windows Firewall to restrict traffic to only the allowed ports, because once we've enabled IPsec, the security groups you've implemented in your infrastructure as a service, in AWS, OpenStack, whatever you're using, can no longer inspect that traffic. It's all encapsulated within IPsec, so we need another way to restrict the traffic.

After we're done implementing the solution, here's what it looks like. We still have the secure connection to the main load balancer outside of our platform, and we still terminate that traffic there, but then we re-initiate an SSL connection to HAProxy. And everything within the platform is now IPsec encrypted. So at no point do we have unencrypted traffic in our platform; we've pretty much eliminated all the open traffic.

So, what is IPsec? For those of you who don't know, IPsec is Internet Protocol Security. It's an industry-standard protocol suite that provides authentication and encryption at the IP layer. Most VPN software uses IPsec underneath.
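By the way, if you want to see that clear-text problem for yourself on a stock deployment, a sniff along these lines will show it. This is just a sketch: it assumes NATS is on its default port 4222 and that you have shell access to one of the VMs, and the interface name is illustrative.

```sh
# Watch the NATS message bus on a stock (pre-IPsec) deployment.
# Assumes NATS on its default port 4222; the interface is illustrative.
sudo tcpdump -i eth0 -A 'tcp port 4222'

# In the ASCII (-A) output the protocol is readable as-is, including the
# CONNECT line that carries the NATS username and password, e.g.:
#   CONNECT {"verbose":false,"user":"nats","pass":"..."}
```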
There are some nuances you need to keep in mind when you're establishing IPsec tunnels between Linux and Windows systems: finding common protocols, ciphers, encryption suites, et cetera. For pretty much every aspect of configuring IPsec, you have to find settings that work on both Windows and Linux, and that was actually a challenge for us.

So how did we approach this? The first thing we had to do was identify all the communication flows in our Cloud Foundry environment. There are several ways we did this. In our configuration files and our BOSH manifest, a lot of those ports are defined and configurable, so we knew what those were. Code reviews: we've been doing Cloud Foundry for a long time and we've seen most of the code, so we've seen some of the ports in the code. But the easiest and probably most direct way is to just use something like netstat, tcpdump, or Netmon and go to each server. As you can see here, a netstat dump shows what's listening. This looks like one of our lab servers; from the output, it was probably a GoRouter and a Cloud Controller API server.

Next, we had to create iptables rules, the firewalling, to restrict traffic to just the allowed ports. I don't know if you've worked with iptables before or not; it's kind of a steep curve when you first learn it. This is just an example of what the iptables rules for a NATS server would look like. We're allowing ICMP traffic, loopback traffic, UDP port 500, and ESP and AH, which are the allowed IPsec protocols. We're allowing port 22 for SSH. These three here, 4222, 4223, and 8080, are for NATS itself; those are the actual NATS server ports. And then we drop and log everything else. That's just one example; every server, every component, is going to have a different set of iptables rules.

An important thing to note here: we're using certificates to authenticate our servers for all of our IPsec tunnels. If you do use certificates, you can use your own certificate authority that you create, or you can use your existing PKI. But if you do that, you have to make sure you get the correct key usage extensions and extended key usage extensions on the certificate: in this case, digital signature, key encipherment, and server auth. Otherwise, IPsec won't use that cert. And most PKIs don't add all of the appropriate extensions; I know our internal one doesn't. We had to create our own cert.

Then we had to install and configure strongSwan. strongSwan is the software suite we're using on the Ubuntu stemcells to provide IPsec. It's standard software; just apt-get install strongswan. The important thing to note here is ipsec.conf. This is where all of our configuration is set for IPsec communications. Our key exchange is Internet Key Exchange version 2 (IKEv2). We have our IKE ciphers and algorithms, the ESP cipher, what we do for dead peer detection, and the certificate we use to authenticate between servers. And finally, all of our connections to the other servers are defined here. This is just an example; the actual configuration is much larger, because we have many more servers than that. But the important stuff is there.
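To make the firewalling concrete, here's a minimal sketch of NATS-server rules along the lines I just described. Our actual release handles logging and established connections a bit differently, so treat this as illustrative rather than our exact rules:

```sh
# Illustrative NATS-server iptables rules (not our exact release)
iptables -A INPUT -i lo -j ACCEPT                          # loopback
iptables -A INPUT -p icmp -j ACCEPT                        # ICMP
iptables -A INPUT -p udp --dport 500 -j ACCEPT             # IKE key exchange
iptables -A INPUT -p esp -j ACCEPT                         # IPsec ESP
iptables -A INPUT -p ah -j ACCEPT                          # IPsec AH
iptables -A INPUT -p tcp --dport 22 -j ACCEPT              # SSH
iptables -A INPUT -p tcp -m multiport --dports 4222,4223,8080 -j ACCEPT  # NATS itself
iptables -A INPUT -j LOG --log-prefix "dropped: "          # log everything else...
iptables -A INPUT -j DROP                                  # ...then drop it
```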
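And since those key usage extensions tripped us up, here's one hedged way to mint a cert that IPsec will accept, using a little CA you create yourself. All the names and subjects here are made up, and the process substitution assumes bash:

```sh
# Illustrative self-created CA plus a server cert carrying the extensions
# IPsec requires: digitalSignature + keyEncipherment, plus serverAuth EKU.
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
    -keyout caKey.pem -out caCert.pem -subj "/CN=cf-ipsec-ca"

openssl req -newkey rsa:2048 -nodes \
    -keyout nodeKey.pem -out node.csr -subj "/CN=cf-node-1"

openssl x509 -req -in node.csr -CA caCert.pem -CAkey caKey.pem \
    -CAcreateserial -days 3650 -out nodeCert.pem \
    -extfile <(printf 'keyUsage=digitalSignature,keyEncipherment\nextendedKeyUsage=serverAuth')

# Verify the extensions made it in; without them IPsec will refuse the cert.
openssl x509 -in nodeCert.pem -noout -text | grep -A1 'Key Usage'
```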
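Here's a trimmed-down sketch of an ipsec.conf along those lines. The ciphers, addresses, and conn names are illustrative, and as I said, the real file defines a connection for every server:

```sh
# /etc/ipsec.conf -- trimmed strongSwan sketch (all values illustrative)
conn %default
    keyexchange=ikev2                 # IKEv2 key exchange
    ike=aes256-sha256-modp2048!       # IKE cipher/integrity/DH proposal
    esp=aes256-sha256!                # ESP proposal
    dpdaction=restart                 # dead peer detection behavior
    dpddelay=30s
    authby=pubkey                     # authenticate peers with certificates
    leftcert=nodeCert.pem             # this host's cert
    type=transport                    # host-to-host transport mode
    auto=start

# One conn block per peer; the real config enumerates every server.
conn nats-server
    left=10.0.0.10                    # this host
    right=10.0.0.20                   # the NATS server
```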
All right. When we implement IPsec, that adds overhead to every packet. Because of that, we have to determine and set our network MTUs, both on the virtual machines in the Cloud Foundry deployment and within the Warden containers themselves. For those of you who deploy to OpenStack today using GRE tunnels, you're already familiar with part of this: you're probably already setting the MTU in your BOSH deployment manifest to account for the overhead of the GRE tunnels. This is much the same thing; we're adding another tunnel on top of that. If we get packet fragmentation, that leads to all kinds of bad things in the environment; some of the Warden containers can no longer talk on the network when packets are fragmented.

As a good Intel employee, I'd be remiss not to mention that there are some benefits to deploying on an Intel-based infrastructure. All the current versions of strongSwan, OpenSSL, and the Linux kernel will automatically leverage the AES-NI instructions, which offload the encryption work to the CPU hardware itself. And in a completely unbiased setup and test that Intel performed, you can achieve up to a 400% throughput increase using a hardware-enabled architecture versus a non-enabled, software-only solution.

There is some impact to supportability with this. Once we've turned on IPsec and all these firewall rules, supportability becomes harder. Some of this is by design: the traffic is now encrypted and encapsulated, so it's harder to inspect. That was our end goal; we don't want people looking at the traffic, and that includes us. So when we're trying to diagnose a problem, we can no longer just do a tcpdump and look at what's going on. It's all encrypted; it's gibberish. So yeah, we stepped on our own feet there. We also need to establish additional monitoring. We need to make sure the IPsec tunnels and services are running at all times; if those fail, so does our network, and so does our platform. And we also need to make sure we start our IPsec jobs and our firewall rules in the right order. We want those tunnels and firewall rules in place before all of our CF jobs start up and start communicating. That's especially important for the DEA servers: the Warden containers that are spun up create their own iptables rules, and we want our rules to be in place and processed before those rules are created.

Now, this solution isn't going to catch everything. It only solves the problem between the servers we set up IPsec on, that is, our own Cloud Foundry servers. If we're making a connection to an external server, anything outside of our platform, like logging to an external syslog server, we still have to make sure we're using a secure protocol like TLS to do that; otherwise it will be unencrypted. Developers should design their apps to enforce HTTPS: they should look at the X-Forwarded-Proto header in their app and make sure the app requires HTTPS. And if you're using something like LDAP, which we do for our authentication, make sure you're using LDAPS. Otherwise you've defeated the whole purpose of this, and all your domain passwords will be flying around in clear text.
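Going back to the MTU point for a second: here's a rough way to find a safe value by hand. The peer address, interface, and sizes are illustrative; in practice, you'd bake the final number into the BOSH manifest rather than set it manually.

```sh
# Probe the path MTU across the tunnel: -M do sets the don't-fragment bit,
# so oversized pings fail loudly instead of fragmenting silently.
ping -c 3 -M do -s 1382 10.0.0.20   # 1382 data + 28 ICMP/IP headers = 1410 bytes

# Once you know the largest size that gets through, set the interface MTU
# to match, leaving headroom for ESP (and GRE, if you tunnel on OpenStack).
ip link set dev eth0 mtu 1410
```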
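And on the extra monitoring: at its simplest, the tunnel check can be a little cron-able script like this hypothetical one, reusing the illustrative conn name from the earlier ipsec.conf sketch. A real deployment would wire this into whatever alerting you already run:

```sh
#!/bin/sh
# Hypothetical watchdog: restart IPsec if an expected tunnel is down.
# "nats-server" is the illustrative conn name from the ipsec.conf sketch.
if ! ipsec status nats-server | grep -q ESTABLISHED; then
    logger -t ipsec-watchdog "tunnel nats-server down, restarting IPsec"
    ipsec restart
fi
```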
So, a call to action, and this is for all of us as community members, contributors to the code, supporters of Cloud Foundry. There are other things we can do to extend Cloud Foundry and secure it.

We could be better at documenting the ports and protocols used within the platform. A lot of these are hard to discover; they're not clearly documented. Are they secure? Are they not? It's hard to find out, especially for a new adopter of Cloud Foundry. And we want people to adopt this thing, right? The more the better. We could natively encrypt all the endpoint traffic; then we wouldn't need IPsec anymore. That would be awesome. If we could turn all this off, that would be great.

We could do other things, mostly targeted at enterprises like us or any other big corporation. We could make the staging process extensible, allowing us to insert things into the workflow, like real-time dynamic code scanning. So a developer deploys their code, and upon deployment it gets scanned for vulnerabilities, common exploits, SQL injections, that sort of thing, before it actually gets deployed: real-time scanning. Or we could even inspect the buildpacks. Who knows what's in some of these buildpacks? If you're allowing developers to specify their own buildpacks, they could point to any URL out there on the internet, and who knows what code they're running on your platform. Maybe you just deployed the best botnet out there: the Cloud Foundry botnet.

And one more thing, which is actually one of our security requirements: we need to be able to isolate the Cloud Foundry API from the endpoint that serves the deployed applications. We have an internal community of developers targeting our environment and deploying applications for external customers, people out on the internet, to see. But our security group doesn't want those people on the internet to be able to hit the Cloud Foundry API. They don't want them to be able to target our platform, scale apps, things like that. At present, we can't do that. We can obfuscate it through the host headers and make it seem like you can't get to it, but a savvy hacker could still discover the host header for our deployment and target it. Something like the ability to register the API route on a different TCP port would be one way to address that: we just wouldn't forward that port on our external load balancer, and that would isolate the API. Those are just some examples of things we can do to enhance security in the platform. I'm sure there are a thousand more, and that's what all of you are here to help with, including us. Whatever we come up with, we should contribute back.

How am I doing for time? Any questions?

So, to summarize the question: he wants to know if I can quantify the overhead, the performance impact, of turning on all this encryption with IPsec, and then how much of that we get back with AES-NI. So, I don't know. We didn't do any actual measurements of the impact, so I don't have numbers I can quote. I can say that it was noticeable when we first turned it on: oh, that's a little slower. Then we took a look and noticed that our platform itself, our OpenStack environment, hadn't been configured to actually expose the AES-NI extensions to the VMs. So we had to reconfigure our OpenStack to enable the pass-through of the AES-NI extensions, and after that it was noticeably faster, just in day-to-day use. We don't have timing measurements to back that up, but it was noticeable. So, yeah.
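If you want to run the same check in your own environment, here's roughly how to tell whether AES-NI is reaching your VMs and what it buys you. The OPENSSL_ia32cap mask is the commonly documented value for switching AES-NI off in OpenSSL, so treat it as an assumption:

```sh
# Is AES-NI exposed to this VM? On OpenStack, the host CPU flags have to be
# passed through to the guest for this to show up.
grep -qw aes /proc/cpuinfo && echo "AES-NI available"

# Compare hardware-accelerated vs. software-only AES throughput.
openssl speed -evp aes-256-cbc                                       # AES-NI on
OPENSSL_ia32cap="~0x200000200000000" openssl speed -evp aes-256-cbc  # AES-NI masked off
```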
So the question is: what does the iptables approach do to elasticity? There are things you're going to have to consider. It's not as easy to scale out, obviously, because now you have to implement all these new iptables rules, and IPsec rules for that matter. In our case, we've built most of this into a BOSH release. We've made our own custom stemcell with a lot of this pre-installed. We didn't start from scratch: we took one of the default stemcells, extracted it, added strongSwan and some of our own things, repackaged it, and then made a BOSH release to deploy IPsec and all these rules. So in our case, we just change the number of servers we want and redeploy with BOSH, and it all scales. Because we did it as a BOSH release, that's all comprehended in there: all the IPs are in our manifest, and we define all of those when we do our release. If you didn't do a BOSH release, it would be a lot harder; you'd have to go out there and turn all of this on manually.

The question is: have we made our BOSH release available for the community to see? We haven't yet. Quite frankly, we didn't know how many people would be interested in implementing IPsec and doing all this work, because it's some serious overhead, and it's not for the faint of heart. It took us a long time to get this configured and working right between all of our servers. It was something our security team said: you're not deploying without that. I don't know how many other people have that. Is everybody security psycho like that? I don't know; I assume there are some others. So that is something we can explore. We'd be happy to see what parts we can make available, and maybe write some white papers or blogs to get into more detail on how we did this.

Yes, sir. Oh, I got you. Okay. So you're asking about the Warden containers themselves: because they're assigned dynamic ports when they're deployed, how did we account for that? We didn't actually need to. We're securing the traffic between the DEA servers themselves and the rest of the platform, and the Warden containers you're talking about are all handled by iptables rules within the DEA itself. It's a hard thing to describe. Aaron, do you have a way to summarize how that works? Yeah, if you're asking about the iptables rules: we have to use port ranges to open things up. So, for example, there's a wide range of ports that a Warden container might open, and we just open that entire range. We're really just trying to block access to well-known ports; we don't want people trying to hack things we don't want to expose. Good answer. Good question, too.

Yes, sir. I don't think they're fully comfortable with that. I think they know that right now we've done the best we can, given what we have. We're always looking to evolve and improve what we have. So no, I think there's always room for improvement, and that is possible. Do I have any examples? Not offhand, just yet, but I'm sure eventually our security team is going to think about that and go, hey, you guys have to do this, and they won't have a solution for us. They'll just tell us we have to, and we'll figure it out.

Yes, sir. So, did we do anything to secure BOSH itself? We haven't specifically targeted BOSH yet, mostly because there's not a lot of sensitive information going back and forth between our BOSH systems.
There's no customer data, usernames, passwords, or things like that, and we're not using LDAP to authenticate to BOSH or anything like that. So it's not a top priority for us. But still, if BOSH were compromised, they could destroy everything. Yeah, for sure.

Yeah, right. So when I talked about natively encrypting the endpoints in the platform, I actually meant all of the endpoints between the Cloud Foundry components themselves: the NATS communication bus, if that were natively encrypted, or the API calls between the Cloud Controller and the Health Manager, for example. If all of that were already encrypted, we wouldn't need to secure it ourselves. What's an example? Well, I think at present, between the UAA server and the login server, you can actually specify in your deployment manifest to use HTTPS and give it a certificate. Yeah. But yeah, if we could do that everywhere, the problem would be solved.

Anybody else? Oh, yes, sir. Monit actually starts them in the order they're listed in, doesn't it? Well, let's see. The jobs are processed in the order we list them in the manifest as well. Right. So the job looks through all of our manifest properties for all the IP addresses and everything we specified in there, and it creates the connection rules to all the different servers it needs to talk to. We have roles defined, so for example a NATS server has its specific IPsec rules, and the release generates all of that, pushes the configuration to that server, and then starts the services. There is one particular place, I think, that you're referring to: the monit job. Ideally, we'd be able to say in the monit job that there's a dependency of the DEA component, for example, on the IPsec job. To do that today, we have to modify the file that CF creates, right? So there is an issue there, but we can handle it through some sort of configuration management on top of the releases, because it's not extensible in the release itself.

Anybody else? Anybody going to rush home and try this? No? Oh, you guys are? Let me know how that works. No, seriously, if anybody does have interest in trying any of this, we'd be happy to help in any way we can. Ideally, we'd like to see some of these changes pushed back into Cloud Foundry itself, so that we don't have to take these kinds of steps to secure the platform. And I think we're getting there; we see improvements all the time. Those of you who have been with Cloud Foundry for a few years know the code is getting better all the time. It used to be that just deploying it and figuring it out would take you months; now a new user can come in and have a platform running within a week pretty easily. Once they figure out BOSH, that is; there's the hard part. I guess that's it. I'll hang out up here for any questions anybody wants to ask me.