Okay, so I guess it's a good time to get started; we have a pretty tight schedule. I'd like to thank you all for joining this session and showing interest in Linux, encryption and performance, because those are the three topics I'm going to talk to you about today. My name is Daniel Soldo. I'm a performance analyst at IBM R&D in Germany, in the south near Stuttgart, where we have one of the largest IBM R&D sites outside of the US. A short introduction about myself: I actually come from a small country in Europe called Croatia, from the city of Split. I did my studies there, and four years ago I made the jump to the southern part of Germany, to a little town called Böblingen. The main, and probably only, reason to do that was the IBM R&D lab in Böblingen. As mentioned, it is the largest IBM R&D site outside of the States, and among other things it is also the home of Linux on the IBM Z platform. So this is basically what I do there: a combination of Linux on Z, performance evaluation and crypto; let's say the performance evaluation of the crypto stack on Linux on Z. And IBM Z is basically the mainframe, so I guess it's a pretty well-known name after all these years. This is a picture of the first mainframe, released in the 60s, so it's a pretty old platform, but it is still very much in use, and currently it looks something like this. It's a very modern, scale-up platform, so you can imagine it as a very, very big server. The current version goes up to 170 CPUs, each running at 5.2 GHz, so it's a really powerful machine. Most of the airlines, most of the insurance companies and most of the banks today still rely on this machine. We have Linux running on it, and it is a growing business; the interest is there, so the future should be interesting to work on.
But today's talk is motivated by something called data breaches. In the last year or two, data breaches have been getting quite a lot of traction in the media as well; hardly a month goes by without some big, famous company having a data breach. The interesting fact about data breaches is that it isn't always someone else who suffers from them; it might also happen to your company, or to some partner company you're working with. And there are some interesting facts from recent reports. Maybe the most shocking one is that an average data breach costs around $4 million, so it is a serious trend which should be taken seriously as well. What I found interesting is that across all the data breaches of, I guess, the last six years, only 4% of the data that was breached was encrypted. So everyone knows about encryption, but we're still not doing it enough. In Europe, we got an interesting regulation last year called the GDPR, the General Data Protection Regulation. One part of the GDPR states that companies may face fines of up to 20 million euros or 4% of their annual turnover, whichever is bigger. It still can't be taken quite as simply as it is stated, because there are a couple of measures companies can take to avoid paying such big fines, such as reporting data breaches as soon as possible, or giving their customers some other benefits to reduce the cost to them. The GDPR came into force around May last year, and one year after, we had around 56 million euros in fines. The interesting fact is that the single biggest fine was around 50 million euros, so the rest of the fines over the whole year amounted to around six million euros, distributed across different companies. So the rest of the fines haven't been that big, but the general opinion is that the government regulatory bodies are still warming up.
So the bigger fines are still to come. A few examples; this is pretty fresh news. The ICO in the United Kingdom stated their intention to fine British Airways over £180 million, and Marriott International around £99 million. So there are some serious fines there, and we should keep track of what happens with those companies. But yes, the situation is getting serious. One interesting point is that the data breach fines, as serious and big as they sound, are only the tip of the iceberg. There is a nice study from Deloitte showing exactly that: the real costs are actually hidden, and they will hit the companies in the years to come. For example, they identify the value of lost contract revenue over the next five years, or the brand devaluation that will happen over the next couple of years. So they identified those costs to a company as much more serious than the fines. Anyway, there are ways to protect yourself. There's a site called GDPR Associates, which you can find online, and they prescribe a number of methods for protecting your business from data breaches in an efficient way. Maybe the number one way to protect data is encrypting it and having backup policies; for most businesses, this is the first step to take. Some other examples are updating security policies and doing regular risk assessments. But I would like to point out this one: staff training and awareness. There are a number of recent reports which showed that, I guess in the first quarter of this year, around 70% of data breaches were caused by employees having their passwords saved in plain text files. So raising awareness and providing more education to employees is of great importance in the data protection game.
So let's get back to encryption, which has been identified as the number one way to protect yourself. Why should we actually do it? I tried to divide the reasons into two groups. One of them is to prevent data breaches: we saw a number of breaches which were serious precisely because, when the breach happened, the data was exposed directly; it wasn't encrypted. For example, in the huge Equifax breach, once the attacker was inside, he just found huge databases full of plain text data, so it was a matter of simply taking the data, without any consequences, without any obstacles. Encryption is a very simple mechanism, maybe not to implement, but for a lot of people to understand, and it's very powerful in that it very quickly reduces the attack surface. The other side of the story is that big enterprises do business in a lot of different industries, and all these industries have regulatory bodies coming up with regulations which they should follow: for example, the Payment Card Industry regulations, the GDPR mentioned before, or HIPAA for health information. A lot of these regulations require the big companies to have encryption mechanisms implemented, so having encryption in place is a very quick way to be compliant with a lot of them. And then there's this big question: why are we still not doing it? Maybe the first answer is performance, and when we talk about performance, the next thing everyone thinks about is IT costs. Okay, it's nice and simple, I will implement it, but then my IT costs go up by, let's say, 30%. The IT folks face discussion after discussion with executives about how to reduce IT costs, and then they come up with some crazy idea like encrypting everything, and the IT cost jumps 30%. No one likes to hear that.
Missing skills are both a technical and an organizational reason; so find the right people and educate them. I mean, we all have a lot of stuff to do and lack the time for proper education, but that's a time management issue all companies have to face. It's also interesting to see that there are countries which block encryption of data. For example, there are countries which block HTTPS, and if they don't manage to block it completely, they try in a few different ways to degrade its performance. So there are also a couple of political reasons why encryption isn't implemented in some cases. But the focus of this talk is on the performance of encryption in Linux: how it evolved and what actually happened recently. Recently there's been a visible effort by a few vendors to show that low-cost encryption is actually possible, so we're often seeing marketing charts like this one. For example, on the Intel platform, using the AES-NI instructions which improve AES performance, they claim there's less than 1% overhead to encrypt 100 GB of data. IBM, for example, ran an aggressive marketing campaign with the most recent IBM Z machine promoting a feature called pervasive encryption. The focus has been: let's shift from selectively encrypting only critical data to pervasively encrypting everything we have in our systems. To do that, we had to do some hardware optimization and improve the actual infrastructure we had there, and those hardware optimizations led to significant performance improvements. In some cases we saw up to 18 times faster encryption, and in some other cases up to 93% lower cost of encrypting, just by upgrading the machine to the latest one.
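If you want to get a feel for what hardware acceleration contributes on your own machine, one rough way (on x86, with a reasonably recent OpenSSL) is to benchmark AES with and without AES-NI. This is a sketch, not part of the talk: the `OPENSSL_ia32cap` mask below is a commonly cited value for masking off the AES-NI and PCLMULQDQ capability bits, and the absolute numbers are purely illustrative.

```shell
# Does the CPU advertise AES-NI? (x86 only; prints "aes" if so)
grep -m1 -o '\baes\b' /proc/cpuinfo

# AES-128-GCM throughput with hardware acceleration enabled...
openssl speed -seconds 1 -evp aes-128-gcm

# ...and with AES-NI and PCLMULQDQ masked off, forcing OpenSSL to
# fall back to its software implementation for comparison.
OPENSSL_ia32cap="~0x200000200000000" openssl speed -seconds 1 -evp aes-128-gcm
```

The gap between the two runs is, roughly, the "cost of encryption" the marketing charts are talking about.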
And yes, when talking about encryption in Linux, we can separate it into three different groups. The first one is data in flight, or data in transit, encryption. Maybe the best-known, most used standard there is TLS, which on Linux is implemented mainly in the OpenSSL package. There are a couple of other examples, but my next few slides are going to talk mostly about the TLS standard. When talking about data at rest encryption in Linux, the discussions mostly start with LUKS and dm-crypt and everything revolving around them. Data at rest encryption is interesting because we may talk about different levels, different granularities, of encryption: we can talk about file system encryption, for example, or, let's say, database encryption; the difference is just the granularity. Data in use is quite a trendy topic. On the first day of this Open Source Summit, we saw at the keynote that this week the Confidential Computing Consortium was presented, and it is addressing exactly this issue. A number of vendors, a number of companies, are joining forces, and we see a lot of movement in that area. These are some examples of technologies already available there, but be sure that now, in the time of cloud computing and multi-tenancy, this is a very important aspect of encryption, and the focus on it keeps getting bigger, I guess. About TLS: Google has an interesting transparency report, and in it we can find that since 2015 there's been a great increase in the usage of TLS. This chart shows only the usage seen by the Google Chrome browser, and it's kind of interesting to see that right now we're mostly above 90% TLS coverage across websites on the internet, but it's also a bit scary to see that only four years ago, every second web page wasn't using TLS at all.
So an improvement has been made, and it should probably not get worse. What happened last year? In August 2018 we got the new TLS standard, TLS 1.3, an update coming almost 10 years after the TLS 1.2 standard. So finally we got an update, and regarding cipher suites, we got only five cipher suites now. It was kind of shocking, but it's very interesting to see that TLS 1.3 limits us to the use of only five cipher suites. That is quite different from TLS 1.2, which had, I guess, more than 20 cipher suites to select from. The important news is that we're limited to authenticated encryption ciphers for the symmetric part. It basically means that of all the AES modes, you may only use cipher suites based on AES-GCM mode; the other option is ChaCha20-Poly1305. There are also some new crypto algorithms: ChaCha20, as mentioned, is there, and also, interestingly, the key agreement method X25519. I'm going to talk a bit more about it later, because it uses an elliptic curve named Curve25519, which brings in a bit more performance. And for the key exchange part, the standard restricts us to ephemeral Diffie-Hellman only. In TLS 1.2 it was possible to use non-ephemeral Diffie-Hellman; now fresh key generation is mandatory, so every new handshake, every new TLS connection, generates the keys from scratch, which in a way gives us perfect forward secrecy. TLS 1.3 promises superior privacy, security and performance. Regarding performance, simply by reducing the amount of extra data and round trips in the handshake, they managed to get some performance improvements. The folks from wolfSSL made some measurements publicly available: just by switching from TLS 1.2 to TLS 1.3, they got a performance improvement of between 6 and 15%, depending on the cipher suite.
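As a quick sanity check on your own system, OpenSSL 1.1.1 or later can list the TLS 1.3 suites it actually offers. A minimal sketch (the exact output depends on how your OpenSSL was built; most builds enable three of the five RFC 8446 suites):

```shell
# Restrict the listing to TLS 1.3: only the RFC 8446 AEAD suites
# (AES-GCM and ChaCha20-Poly1305 based) can appear here.
openssl ciphers -s -tls1_3

# Contrast with the much longer TLS 1.2 list:
openssl ciphers -s -tls1_2 | tr ':' '\n' | wc -l
```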
On to elliptic curve cryptography. As is probably well known, RSA used to be the most widely used asymmetric crypto algorithm, but in recent years RSA has suffered a couple of attacks and has known vulnerabilities, so elliptic curve crypto is basically the way to go when talking about asymmetric encryption. At this link you'll find an interesting comparison of the popularity of the different curves available. We see that the P-256 curve is still the most used and most popular. It's a very interesting article: the author tries to compare these curves to popular singers of today, saying, for example, that this curve is like Beyoncé and this curve is like Cardi B. So it's kind of fun if someone's interested in elliptic curves. But we see the rising share of Curve25519, and this curve is set as the default in the TLS 1.3 standard. And yes, we have some performance numbers here as well. It's maybe important to note that one should take these numbers carefully: they also depend highly on the application implementing them. These numbers are an example from the Nimbus JOSE project working with JSON Web Tokens; they saw around 14 times better performance using this curve compared to P-256. The curve is also generally known to have better performance than the prime curves. Another argument for using it, which you'll find quite often online, is that it's not a NIST-approved curve; people connect NIST with the NSA and the government, so: let's keep the crypto away from them. I mentioned already that TLS 1.3 enforces authenticated encryption, which basically means all the cipher suites using AES-CBC mode are moving to GCM. And some data from Firefox shows that the vast majority of websites are already using AES-GCM mode: around 88% of all websites currently. There are more reasons to switch to GCM.
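To get a feel for X25519 in practice, the OpenSSL command line (1.1.0 or newer) can perform the whole key agreement. A minimal sketch, with placeholder file names:

```shell
# Each side generates an X25519 key pair and publishes its public half.
openssl genpkey -algorithm X25519 -out alice.key
openssl genpkey -algorithm X25519 -out bob.key
openssl pkey -in alice.key -pubout -out alice.pub
openssl pkey -in bob.key  -pubout -out bob.pub

# Each side derives the shared secret from its own private key and
# the peer's public key; both end up with the same 32 bytes.
openssl pkeyutl -derive -inkey alice.key -peerkey bob.pub -hexdump
openssl pkeyutl -derive -inkey bob.key   -peerkey alice.pub -hexdump
```

This is exactly the ephemeral key agreement a TLS 1.3 handshake performs with its default group, just done by hand.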
Different vendors have also noticed that GCM is the way to go, and there are hardware instructions and hardware improvements in that direction. This is, for example, a comparison of three different Intel processors, comparing GCM throughput against CBC, and in all the cases GCM has much better performance than CBC, up to around three times in the last case. At IBM, with the z14, as mentioned before, there was a clear strategy of motivating people to encrypt everything. Among the hardware improvements we introduced there, when comparing the AES cipher modes, you can see GCM down here showing an improvement of around 12 times compared to the former machine. So the focus has really been on improving the performance of the hardware. AES-XTS mode has also tremendously improved in performance. The point about AES-XTS is that it's basically the mode of choice for data at rest encryption: XTS is the preferred mode for data at rest, GCM for data in flight. The focus is clear to see. Intel also has some nice charts showing a lot of improvement in hashing performance, and CBC and GCM also show solid improvements over previous generations. When we talk about dm-crypt, it basically gives us a good way to enable transparent end-to-end volume encryption. One of the major benefits of using dm-crypt is that transparency. What does it mean? Transparency in these terms means that developers don't have to change any application code: you don't have to implement encryption inside your application; the application can remain intact, and its data will still be written to disk encrypted. Another good thing about dm-crypt is that it is a device-mapper target, so you can combine it with other device-mapper targets such as logical volumes, multipath devices and so on.
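You can reproduce this kind of GCM-versus-CBC comparison on your own hardware with OpenSSL's built-in benchmark. A rough sketch: the `-evp` flag makes sure the accelerated code paths are used, and the absolute numbers depend entirely on your CPU.

```shell
# Throughput of AES-128 in GCM vs. CBC mode across several buffer
# sizes; on CPUs with AES-NI and carry-less multiply, GCM usually
# comes out clearly ahead.
openssl speed -seconds 1 -evp aes-128-gcm
openssl speed -seconds 1 -evp aes-128-cbc
```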
And LUKS, as you have probably seen before: I guess everyone already has it on their laptop. When booting your laptop, you probably provide a LUKS password, because it is widely used on desktops, on servers, and by Android smartphones as well; Android disk encryption is based on dm-crypt. dm-crypt can use different formats, and the most popular one is LUKS, because it integrates a header into your encrypted volume and saves your encryption key material in that header. If you're not using it, if you're using plain mode, you'll have to think of a safe place to store your encryption keys yourself. These are some numbers I produced comparing the latest IBM z14 to the previous machine. To be honest, I was hoping to see the performance gap between no encryption and encryption turned on be very small, maybe below 10%. But in this case, I got around 20% overhead with encryption turned on compared to the no-encryption case. The third case uses protected keys: protected keys are basically a mechanism for protecting the encryption keys by encrypting them with a master key; we can talk about it offline if someone is interested. Right now, let's just focus on the clear key bar here: I got around a 20% degradation of my throughput just by turning encryption on. It's important to note that the workload I had here was direct I/O and synchronous. Synchronous I/O basically means that every I/O request waits for the previous one to finish before it can start, and direct I/O means I'm not using any means of caching, so I'm avoiding the Linux page cache entirely. In this case, I'm actually having each request encrypted and written down to the disk before I can send the next one. So it's far from a realistic use case; one could say it's possibly the worst case scenario.
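That "worst case" access pattern can be expressed as an fio invocation. A minimal sketch, where `/dev/mapper/enc-vol` is a placeholder for whatever dm-crypt mapping you are measuring (don't point it at a device holding data you care about):

```shell
# Synchronous, direct (page-cache-bypassing) 4K writes: each request
# must be encrypted and reach the disk before the next one starts.
fio --name=worst-case \
    --filename=/dev/mapper/enc-vol \
    --rw=write --bs=4k \
    --ioengine=sync --direct=1 --numjobs=1 \
    --runtime=60 --time_based
```

Running the same job against the encrypted mapping and against the raw backing device gives you the encryption overhead for exactly this pattern.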
But for us, it was important to see how it looks. What happened is that by the end of 2017 we already had the Linux 4.12 kernel, and with 4.12 it became possible to extend the sector size at the device-mapper level. Building on that, in December 2017 we finally got LUKS2 support, and LUKS2 made it possible to use a 4K sector size in dm-crypt. Just by turning that on, the encryption overhead in this case dropped to about 8%. So this is what happened purely in the Linux world. This example is on the IBM z14 platform, but it isn't bound to the platform itself: it was already shown at a conference in 2016 on an ARM Cortex processor. Those folks didn't have the Linux 4.12 kernel at the time, so they had to patch the kernel themselves to use 4K sector sizes, and already in that case they got an improvement of almost a factor of two. So the same effect seen on the IBM platform was shown on the ARM platform: it's platform independent and can be tried on any platform. What also happened, at the beginning of this year, is that Google presented Adiantum as a new cipher for data at rest encryption. The cipher is optimized for embedded devices, IoT devices and smartphones, and they claim it provides five times better performance than the usual XTS mode used for data at rest encryption. It's based on ChaCha: ChaCha normally runs the full 20 rounds, but they used 12, so you could say they were using ChaCha12. I'll try to go over this quickly. Just to give a few examples: hardware acceleration is really used, first of all, to offload cycles from your main CPU, and secondly, to get better performance for encryption operations.
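For reference, this is roughly how you would format a volume as LUKS2 with 4K encryption sectors, and how you can benchmark candidate data-at-rest ciphers (including Adiantum) in-kernel. A sketch with assumptions: `/dev/vdb` is a placeholder device, the format commands need root and cryptsetup 2.x, the 4K sector size needs kernel 4.12+, Adiantum needs roughly kernel 5.0+, and `luksFormat` destroys whatever is on the device.

```shell
# LUKS2 volume with 4KiB encryption sectors instead of the 512-byte
# default: fewer, larger crypto requests for the same amount of data.
cryptsetup luksFormat --type luks2 --sector-size 4096 /dev/vdb
cryptsetup open /dev/vdb enc-vol
cryptsetup status enc-vol | grep 'sector size'   # should report 4096

# In-kernel speed test of AES-XTS vs. Adiantum (no block device needed):
cryptsetup benchmark --cipher aes-xts --key-size 512
cryptsetup benchmark --cipher xchacha12,aes-adiantum --key-size 256
```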
So for example, the z14 has a co-processor on each core of the machine: another little processor dedicated just to crypto operations. Another option is PCI Express cards, which are used first of all as HSMs and then, of course, also to accelerate mostly asymmetric crypto operations. Intel has QuickAssist Technology, which is also an interesting example: it's also a PCI card, and it provides improvements in crypto and also in compression. Compression is very commonly combined with encryption, for performance reasons of course. To stay on time, I'll go over a couple of future topics which are going to be interesting in the next couple of years. Data in use protection: we already saw fresh news this week that big things are happening, and I suppose big things will keep happening in the coming months and years. Cloud HSMs: HSMs are a topic in themselves; an HSM is a piece of hardware where you can safely store your master keys, your most valuable pieces of data, and in cloud environments customers want safe mechanisms for storing their keys so that even their cloud service provider can't know them. Quantum-resistant crypto: quantum computers are emerging, they're coming, so people got together and are trying to design new algorithms which could resist quantum computers as well. And homomorphic encryption is also a very nice topic for the near future, but I guess the folks working on homomorphic crypto are hoping a lot for much faster machines, maybe even quantum computers, because with current computers they lack the performance to do it the way they would want. Quickly, to recap the key takeaways: it's important to stay up to date with crypto and security news. Every month there's new stuff coming out, improving performance and adding new functionality, so keep up with the news. Another thing: we should raise awareness through education.
We saw that a lot of data breaches happen because of lacking education and awareness on these topics. Performance evaluation of your own workload is also very important, because different types of workload perform differently with encryption in use, so it is a good thing for every company to start doing such evaluations themselves. And again, a lot of hardware vendors see the requirements in the encryption space, so there are a lot of improvements, and as customers the best thing we can do is try to use and exploit them. That should help you on your data protection journey. So thanks a lot for your attention; I guess we have a minute or two for additional questions if there are some. Yes, please. Right, right. I'll repeat the question quickly: what tools did I use to measure the performance, to see the differences? I used, I guess, just two simple tools available in the Linux community. One is the flexible I/O tester, fio; I guess everyone in the Linux world who does any kind of I/O evaluation already knows that tool. The other is sar, with sadc, from the sysstat package, which can be used to follow overall system performance; I used it to get some more CPU utilization numbers and so on. Yes? Side effects. So the question was regarding boot time impact. That's a good one, but I guess most of the bigger systems don't reboot much. It could be an interesting question for containerized workloads or virtual machines, but my focus was mostly on the bigger kinds of systems, let's say databases, bigger instances which don't really do a lot of rebooting if not necessary. Regarding boot performance: encryption isn't used that much in that step. Maybe secure boot processing is being done, but it doesn't take a lot of the time in the whole boot sequence. Yeah, another question, I see.
Right, right, right. So the question was: do I have any experience with applications accessing encrypted data, and what is the effect? I did some evaluation with a Postgres database: a Postgres database sitting on top of encrypted volumes, both the data and the log volumes, with a request generator, HammerDB, on the side hammering the database the whole time, and I saw around a 5% throughput degradation with encryption implemented at all layers. It was kind of interesting, because the benchmark I showed earlier is a microbenchmark, synthesized to measure only that particular aspect of performance, whereas a database is a more realistic end-to-end scenario, and in that case I saw around 5% degradation, so to say. So I'll wrap it up and conclude. Thanks a lot for your attention, and have a great weekend; and before that, a great rest of the conference. Thanks a lot. Thank you. Thank you.