Hello, everybody, and welcome. My name is Greg Marsden. I'm responsible for Linux kernel development at Oracle. We provide OS and kernel services for Oracle Cloud Infrastructure, upstream Linux feature development, upstream kernel packaging, and Ksplice rebootless patching. A large part of that role has always been security, but never more prominently than this year. In this talk, I'm going to talk a little bit about various approaches to OS security from a design and process perspective, and some of the lessons that come out of those experiences. As more services start to rely on containers for isolation, what are the ways to protect yourself and your data?

For those of you listening who were involved in this past year of security vulnerabilities, it was an amazing accomplishment to get the patches done and shipped for each vulnerability on time. Thank you so much for sticking with that. I can't tell you how many times we thought we had a kernel ready to ship, only to hear the dreaded phrase: stop the build, we need one more patch in there. As for the user community, security isn't new. It's just finally in the budget. Take this time to get good security policies put in place, ones that really guarantee that your systems will stay up to date.

The other day, I got a call from a friend. We have a winner, he said, for the system with the longest uptime: 2,258 days without a reboot, running your kernel. We found it in a closet. That's more than six years without a reboot. I'm still not sure whether to be proud of that accomplishment or not. I mean, I still feel a little glimmer of joy at a 200-day uptime, or a 300-day uptime. But on the other hand, it's now tempered a little by the knowledge of security exploits and vulnerabilities, and sympathy for whatever poor soul is someday going to have to upgrade that application to use systemd. If there was one benefit to our year of security, it was rooting out boxes like this one, an artifact of a simpler time.
Just try starting a server on any large cloud provider's public IP block these days, and watch your log files. Even without a hostname assigned, attacks will start to roll in: security probes, SSH login attempts, Apache POST requests. This isn't academic research; security breaches are up. One way to handle that legacy OS problem is through containers. Containers are really popular right now, and for good reason: strict container usage provides process isolation and creates a natural sandbox environment should an attacker be successful. After all, they're still stuck in that container. And this makes for pretty decent security, to an extent. But containers share a lot with the host OS, and share a lot with each other. It's one kernel running underneath all the containers, and a vulnerability in that kernel is going to affect every container running on that system. Meanwhile, each container has its own userspace runtime stack with its own set of security updates. A container will pick up an updated on-disk version of a library, but only after the container is restarted. Until it restarts, it keeps running the old versions of all of those libraries. Applications written to live in containers using a microservice architecture would actually get updated, because they're easy to update. But legacy-style applications in containers have the same uptime and availability requirements as a classic data center service, which means that despite the advantages in deployment and packaging, container security isn't really that much better than the old model of application development. Containers also provide a perfect jumping-off point for side-channel attacks, running on shared processor resources and sharing kernel and system services with each other. Side-channel attacks work against processes running on hyperthreads.
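That shared-kernel point is visible from inside any container. A minimal sketch (Linux-only, standard library): a process can list the namespaces it lives in, but the kernel release it sees is the same one every other container on the host sees.

```python
import os

# Each container is just a group of processes placed in Linux namespaces.
# The namespaces are per-process, but the kernel underneath is shared.

def shared_kernel_view():
    # Namespaces this process belongs to (mnt, pid, net, uts, ...).
    namespaces = sorted(os.listdir("/proc/self/ns"))
    # Kernel release -- identical in every container on the same host,
    # which is why one kernel vulnerability reaches all of them.
    kernel = os.uname().release
    return namespaces, kernel

if __name__ == "__main__":
    ns, kernel = shared_kernel_view()
    print("namespaces:", ns)
    print("shared kernel:", kernel)
```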
And cgroups and namespaces don't necessarily give native protection against sharing processor resources with potentially antagonistic workloads. We are really excited about the promise of the Kata Containers project. Rather than running as processes in namespaces, Kata Containers take advantage of hardware virtualization for process isolation, using a stripped-down kernel that boots in seconds. These special containers use hardware virtualization features to defend against cross-contamination with other containers, and will be able to take advantage of KVM's defenses against side-channel attacks. That problem of one vulnerability showing up in every container is only magnified in the cloud world: where containers share one kernel across a computer, clouds end up sharing a hypervisor across an entire fleet. Clouds still have plenty of software diversity at the guest layer, but they enforce much stricter controls at the service and hypervisor layers. Clouds have managed to standardize, to some extent, the hypervisor and virtualization environments to provide enforced consistency across that whole environment. An ideal cloud would have just one OS platform running all of its services, and just one hypervisor version underneath them, always kept fully up to date, fully patched, and able to be patched for security at a moment's notice. Because when security vulnerabilities are fixed, they need to be fixed everywhere. A homogeneous environment means a faster response time, and it means that a vulnerability that can be patched there will be patched on all the systems at once. There's a saying for this: put all your eggs in one basket, and watch that basket. For our cloud, that means relying on rebootless live patching. Live patching is now fairly commonplace. It used to be magic, and the Ksplice team worked very hard to convince sysadmins that they really could patch their kernels in real time without requiring reboots.
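As an analogy only (Ksplice itself rewrites running kernel code through a very different mechanism), the core idea of live patching, redirecting future calls from a vulnerable function to a fixed one without a restart, can be sketched in-process. The service and parse functions here are made up for illustration.

```python
def vulnerable_parse(data):
    # Buggy version: trusts its input blindly.
    return data.split(",")

def patched_parse(data):
    # Fixed version: validates input first.
    if not isinstance(data, str):
        raise TypeError("expected str")
    return data.split(",")

class Service:
    # A long-running "service" that calls parse() on every request.
    def __init__(self):
        self.parse = vulnerable_parse

    def handle(self, request):
        return self.parse(request)

svc = Service()
svc.handle("a,b")        # running with the vulnerable code

# "Live patch": future calls go to the fixed function; no restart needed.
svc.parse = patched_parse
svc.handle("c,d")        # same service object, now running patched code
```

The kernel version of this swaps the entry point of a compiled function while the machine keeps serving traffic, which is what makes it possible to fix a fleet without scheduling downtime.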
Here's why you want to think about live patching solutions for your production workloads. Cloud service providers can often get early disclosures for security vulnerabilities, but end users really only find out when the public disclosures come out. So live patching is really the next best thing: a system that can actually respond and pivot to attacks in real time, moving toward a self-healing OS. And the abilities of live patching have scaled farther than we ever could have imagined. For the latest Spectre variant, the L1 Terminal Fault, we were able to apply, in real time, a patch that changed more than 2,000 lines of kernel code across 50 kernel files, without rebooting. Now, that's how you keep a fleet secure. And why has patching been in the news? Well, the latest raft of security vulnerabilities comes from a new class of exploit known as speculative execution side-channel attacks. Like other side-channel attacks, they take advantage of external, measurable characteristics of the processor, like power consumption or temperature. In this case, it's precisely timing loads and stores in the speculative execution pipeline and comparing the results. It's not directly reading data, but the latest optimized proofs of concept have refined this into a data exfiltration tool. What slows the attack down is that each bit of data requires sometimes hundreds or thousands of attempts and comparisons. The good news is that we don't actually know of many real-world exploits of this yet, probably because it's easier just to send out some spearphishing emails. But as people get smarter about their operational security, we need to start thinking about these back-channel attacks as well. The bad news is that many of these speculation-based attacks are indistinguishable from correct processor operation, and can be mitigated but not detected.
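That many-attempts-per-bit behavior can be made concrete with a deterministic toy. This models a classic early-exit timing leak, not Spectre itself: loop iterations stand in for elapsed time, and the secret, alphabet, and padding character are invented for the illustration.

```python
# Toy model of a timing side channel: an early-exit comparison "leaks"
# how many leading characters of a guess are correct. A real attack times
# cache hits left by the speculative pipeline; here we count iterations
# as a stand-in for time, to keep the demo deterministic.

SECRET = "k3rn"
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

def check_password(guess):
    # Vulnerable early-exit compare: the work done depends on the secret.
    steps = 0
    if len(guess) != len(SECRET):
        return False, steps
    for g, s in zip(guess, SECRET):
        steps += 1
        if g != s:
            return False, steps
    steps += 1  # success path does one extra unit of work
    return True, steps

def recover_secret():
    # Recover the secret one character at a time by picking, at each
    # position, the character that makes the comparison "slowest" --
    # over a hundred guesses just to recover four characters.
    known = ""
    while len(known) < len(SECRET):
        # Filler character, chosen not to occur in this toy secret.
        pad = "z" * (len(SECRET) - len(known) - 1)
        best = max(ALPHABET,
                   key=lambda c: check_password(known + c + pad)[1])
        known += best
    return known

print(recover_secret())  # prints k3rn, without ever reading it directly
```

The speculative-execution variants work the same way in spirit: no direct read of the data, just a measurable difference the attacker can accumulate bit by bit.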
There's been a lot of discussion about the vulnerability disclosures that came out even just a moment ago. Despite the very late notice, I think all the OS vendors were able to scramble and have those vulnerabilities patched at the same time. It was not a fun exercise. There's an inherent conflict in keeping embargoed security vulnerabilities private in an open-source community. But it does more harm than good if those vulnerabilities are exposed before they can be patched; therefore, sometimes it is necessary to have secrets, even in an open community. Those vulnerabilities don't stay secret forever. They do get published, and even the exploit code sometimes becomes public. Responsible disclosure gives vendors and Linux developers a chance to fix the bugs first. And I want to thank the researchers at TU Graz, and many others, who ensured that the Linux community had the chance to patch those vulnerabilities in advance. Intel has also significantly improved their cross-vendor collaboration and helped build a mechanism for responding to these types of issues across the entire Linux ecosystem. Even when everything goes right, security is a race against time and attackers, and time and again, patches have come out at the literal last possible minute. So what's next for security? We've already talked about the promise of Kata Containers and the magic of live patching. In the side-channel realm, work focuses on restricting the secrets available within the kernel; proposals include removing page table entries entirely from the host OS while it's running in a potentially untrusted context, and restoring them only once those secrets have been flushed out. We also need to focus on traditional security and traditional attacks, because innovation is progressing there too. We're looking at developing runtime patches for glibc.
That already exists to some extent with Ksplice, but I think there's a lot more room to make it efficient and effective across thousands of active containers. We're also using seccomp to prevent containers from touching unpermitted system calls. Combined with SELinux, that provides a very good traditional security isolation model for containerized workloads, at the cost of a bit of a performance penalty, though we're working very hard to remove that. So how do we prevent the inevitable firefighting that comes when security vulnerabilities are revealed? By ensuring that our infrastructure is in a state to be patched quickly. Security comes at every layer in the stack: from hardware to virtualization, OS to container, and of course at the application layer too. The most important thing we as developers can do is to make sure security is easy. After all, if applying security fixes is difficult or impacts end users, then most companies will delay weeks or months or longer before scheduling critical downtime to patch the OS. Don't let that be your company. Insist on a regular maintenance cycle, and on applications that are designed with patching in mind, because when a security issue is revealed, it can turn into a week-long or weekend-long patching nightmare. A combination of preparedness and rapid response can make the difference between security and outage. So that is the silver lining to this year of security vulnerabilities: people are finally starting to take security seriously. We are finding those machines with 2,000-day uptimes and scheduling them for maintenance and modernization. We as an industry are building a robust infrastructure that can be patched quickly and safely. And the next time I hear someone say, hey, I've got just one more patch, we'll be able to pull it in, no problem. I hope all of you will too. Thank you very much.