Ladies and gentlemen, please welcome our next speakers. These are Milan Brož and Ondřej Kozina, who will say something about the future of disk encryption.

Thank you for the introduction. I would like to say that I am the maintainer of LUKS disk encryption in Linux; I will explain what that is on the next slides. Ondřej Kozina will talk about online re-encryption. He is the maintainer of the same package in Red Hat Enterprise Linux. So, let's start.

Just a short overview of what I will be talking about. This talk is quite high level. I will not go into technical details; ask me later or send mail to the mailing list. It's a collection of reasons why we need some extension of the disk encryption scheme. I see some people here who are really requesting this extension, like Nathaniel. Thank you for coming. We will mention why we want to do that and what exactly is needed. It's not a dictate. It's just what we are working on, so any discussion later on the mailing list is welcome.

First I will do some introduction: what is disk encryption in general, and what is LUKS. Then a very slight touch of cryptography, only why we need to change something in the future. Then some view of LUKS2. We have a proof-of-concept implementation, so it's not only theory, but it's still really in the proof-of-concept phase. Then Ondra will talk about online re-encryption and why we need that. Again, it's a proof of concept, so it's not in a production-stable state, but we already have some code. And last but not least, I promised something in the abstract about a user survey. We will see if we have time. It's something quite different, but still related.

So, let's start. Disk encryption generally is encryption of a block device, imagine a disk or a raw disk partition, and it works on the sector level. So you basically present a new plaintext device, while your raw disk contains just the ciphertext device. I am talking here only about so-called software full disk encryption.
It means that the real encryption process is running on the main CPU, for example on the CPU in your laptop. I will completely ignore the various other forms like self-encrypting drives or encryption in the chipset. There were other talks about that. For now, we will stay with software disk encryption. So, that's the definition.

What is important is that this type of encryption is basically transparent to the file system, so you can use any file system above it. It can be used for hibernation, for a swap file, and so on. It's just a layer right above the raw disk.

I would like to mention two key terms here, because I will operate with them: what is the volume key and what is the passphrase? The volume key in this context is the real encryption key, the key the sectors are encrypted with. So if an attacker gets the volume key, the game is over. He can decrypt the device without any key management. He's simply in. The passphrase is something used for unlocking the device. We can, of course, unlock a device in other ways, but the passphrase is usually something you enter when you are booting, and the system is decrypted. There is some magic in deriving the volume key from the passphrase. I will talk about that later, but for now let's say that there is a key difference between these two.

In Linux, full disk encryption is implemented inside the kernel by the dm-crypt module. That's basically a kernel module using the device-mapper infrastructure. The control of that module is done by cryptsetup, the user-space control utility. And LUKS is basically just a very simple key management scheme implemented inside the cryptsetup library, completely in user space. So if we are talking about dm-crypt, dm-crypt has no idea what LUKS is.

I will just shortly mention what LUKS is, with a short bit of history. In 2004, Jana Saout wrote the dm-crypt kernel module. Basically, it's the same as today, just missing some functions added later.
And a very simple cryptsetup utility. It was really simple, just to allow configuring it in a more user-friendly way. But at that time the volume key was directly derived from the passphrase. Basically, you take the entered passphrase, run some hashing, and the output is directly the volume key. That's what is today called plain mode. It's still available in cryptsetup, but we don't want to use that way of key derivation.

So around 2005, as part of his thesis at the Vienna University of Technology, Clemens Fruhwirth invented LUKS. The thesis was about new methods in hard disk encryption. For the implementation, he needed some standard for storing the key in a key hierarchy. He found that there was no such standard, so he defined it himself. Since that time we have basically made only small changes. The small changes were always security related, so the format is still compatible.

Since 2009, I think, I took over maintaining LUKS and cryptsetup, because Clemens lost interest in that. So since that time probably all bugs in that package are mine. Thank you for reporting them. In 2012 we redefined the cryptsetup API. I added loop-AES and TrueCrypt support, so you can use it for other full disk encryption systems. It's not a replacement, but you can use those formats without installing third-party systems. So that's just the history.

Now I will talk about common use cases and why we want to move forward. The first use case is the one LUKS was designed for: a local encrypted disk, usually in your laptop, or an external disk. You just want to enter a passphrase and unlock it. LUKS cannot do anything more than take a passphrase. If you want something more, there is some user-space wrapper and so on. Corporate notebooks are something similar. There you usually need to implement what I call here on-demand recovery.
It means that there is either a special key slot or some other mechanism for key escrow, storing keys in some enterprise environment like Active Directory, because in an enterprise or corporate environment you want the administrator to be able to unlock your device if you forgot the passphrase, if you left the company, something like that. There are some ways to do that, but they are definitely not optimal.

What is more important, especially today, is the use case for data center disks. It may not seem to make sense if you think, okay, I am in a safe data center, I have no problem with storing the disks. But if you consider that disk encryption can also be used for secure data disposal, that you can destroy your data just by destroying the key instead of wiping the whole device, it starts to make sense. If you are replacing a disk and the disk doesn't contain the key, the data are basically still safe. So that's the major use case. What we mainly want to do in the data center is to somehow implement automatic unlocking, because you definitely don't want to unlock hundreds of disks by hand. That's the key missing feature there, along with some interface for it.

The third major use of disk encryption is mobile devices. Maybe you have seen that Android 6 requires mandatory encryption. They are using dm-crypt, maybe some patched dm-crypt, but they are usually not using LUKS metadata. So I just mention it here, but it's a huge segment where disk encryption is used today.

Very quickly, I would like to show how we should think about security features and how to think about applying them. We should always think about the environment and what we need to cover there. Usually it's done by some threat modeling, threat analysis. Here is a very simple example of how to do that. We have some confidential data on disk.
We want to prevent leaking this data, so our threat is a stolen disk with the confidential data. The countermeasure here is using strong encryption with a random and long enough key. Also, if we are unlocking with a passphrase, we need effective resistance to dictionary attacks, so the attacker still has the cost of doing brute force. So basically we are covered: even if the attacker has the disk and is running brute force, it should be so slow that with a strong enough password he statistically shouldn't have a chance to guess it.

Another key point I would like to mention: LUKS provides only confidentiality. Basically, full disk encryption provides only confidentiality. Because the mapping is one to one, the ciphertext device and the plaintext device have the same size, so you cannot effectively provide integrity of data. You have no space to store integrity protection. What it means is that if the attacker has physical access to the disk, he can play with the ciphertext, and you have basically no way to detect it, with some exceptions. So that's the first problem of disk encryption. It's maybe a more academic problem, but I will mention it later.

The second problem is that full disk encryption is always designed only for a powered-off device. If you have the device mounted in your system, there are a lot of other threats and a lot of other ways to get the volume key, directly from memory for example. So the threat model must define this properly. Sometimes it's a problem, sometimes not. For example, if you have a powered-off laptop and someone steals it at the airport, it works. If the laptop is just in some power-saving mode, I can probably, by hacking with FireWire or something, get the key from memory. I will not go into detail here, but that's an important thing.

So, a light touch of cryptography. We have two major areas in full disk encryption where cryptography with strong algorithms is used.
The first is key management, so how to get the key for encryption. The second part is the real encryption engine, the real encryption of the disk sectors.

Key management, basically oversimplified, is about weak passwords. If you have a password with enough entropy, you don't need to apply any key derivation or anything like that; you have strong enough input for your key. But that's not the case. Passwords are here, they will be here, and we just need to work with weak passwords. I don't mean weak passwords like one, two, three, four. I mean passwords that are really strong for people, but from the cryptographic point of view there is still not enough entropy, so we need to apply some countermeasures. The countermeasure in LUKS is a so-called password-based key derivation function. Basically it takes the password plus some salt on input, the output is some intermediate key, and you work with that in some key hierarchy. That's not important here.

Today the most used key derivation function is PBKDF2. There are problems with that function, because it can be very successfully optimized on GPU systems, since it has a very small memory footprint. We actually proved that by running attacks on LUKS in some tests at the university. The gap between the attacker's machines, or the attacker's parallel machines, and current user systems or laptops is so large that we just cannot balance the trade-off anymore, or at least not next year. So we need to replace that function with something more resistant to these attacks, to this speedup. The plan is to use the algorithm selected by the Password Hashing Competition, which finished just last year. The name of the algorithm is Argon2, and it introduces some new parameters. What I want to mention is that it has a memory cost, so you can say how much memory it will use, and you can say how many parallel threads it runs. That is our countermeasure to avoid the speedup on GPUs. Memory is very costly on GPUs, so it works.
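To make the difference between the two kinds of key derivation concrete, here is a minimal sketch using only the Python standard library. PBKDF2 is called the way a passphrase plus salt would be stretched into an intermediate key. Argon2 is not in the standard library, so scrypt is used here purely as a stand-in for a memory-hard function; it is not Argon2 and not what the talk proposes for LUKS2. The function names, iteration count and cost parameters are illustrative assumptions, not values taken from cryptsetup.

```python
import hashlib
import os


def derive_pbkdf2(passphrase: bytes, salt: bytes, iterations: int = 100_000) -> bytes:
    """Stretch a passphrase with PBKDF2-HMAC-SHA256 (CPU cost only, tiny memory use)."""
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, iterations, dklen=32)


def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate working memory of scrypt: 128 * n * r bytes."""
    return 128 * n * r


def derive_memory_hard(passphrase: bytes, salt: bytes,
                       n: int = 2**14, r: int = 8, p: int = 1) -> bytes:
    """Stand-in for a memory-hard KDF (scrypt, NOT Argon2).

    n and r set the memory cost, p the parallelism; these play the same role
    as the memory and thread parameters described for Argon2.
    """
    return hashlib.scrypt(passphrase, salt=salt, n=n, r=r, p=p,
                          maxmem=64 * 1024 * 1024, dklen=32)


if __name__ == "__main__":
    salt = os.urandom(16)                      # random salt, stored next to the key slot
    passphrase = b"correct horse battery staple"
    print("PBKDF2 key :", derive_pbkdf2(passphrase, salt).hex())
    print("scrypt key :", derive_memory_hard(passphrase, salt).hex())
    print("scrypt uses about", scrypt_memory_bytes(2**14, 8) // 2**20, "MiB per guess")
```

The point of the second function is that every password guess costs the attacker the stated amount of memory as well as time, which is what limits massively parallel guessing on GPUs.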
I have a lot of academic papers on that; it's part of my research at the university, so if you are interested, just send me an email. I cannot go into detail here, but that's one of the problems in the cryptography world. Another problem, just a short mention: we have no simple way to bind the key to tokens, a TPM, hardware like smart cards, and so on. That's missing in LUKS. You can do some wrapper, but it's not standardized. It's also related to multi-factor authentication.

The other part of the cryptography is the real sector encryption. There are some problems there too. Note that with the block ciphers used there, we are quite safe in the post-quantum world. What can happen is that we need to use a long enough key. If you are using a 256-bit key, we should still be safe even with post-quantum crypto. But there are other problems. Inside the sector we have something called the encryption mode. Again, I cannot go into details, but the major problem with the currently used modes is this: if you change one bit in the plaintext, you would expect that the whole sector, which is the atomic unit of encryption, will randomly change as a whole. Currently that's not the case. Usually only one slice changes, the size of one encryption block, which is usually 16 bytes. So you can localize a change in the plaintext if you have two snapshots of the ciphertext, before and after the change.
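The locality problem can be shown with a small sketch. The standard library has no AES, so the code below builds a toy 16-byte tweakable block cipher from HMAC-SHA256 in a four-round Feistel network and applies it to each 16-byte block of a 512-byte sector independently, with the sector number and block index as the tweak. This is only an illustration of a narrow-block mode: it is not AES, not XTS, and not secure, and all names are made up for the example.

```python
import hashlib
import hmac

BLOCK = 16     # cipher block size in bytes
SECTOR = 512   # sector size in bytes


def _round(key: bytes, tweak: bytes, rnd: int, half: bytes) -> bytes:
    """Round function of the toy Feistel cipher: a keyed hash of tweak, round and half-block."""
    return hmac.new(key, tweak + bytes([rnd]) + half, hashlib.sha256).digest()[:BLOCK // 2]


def toy_encrypt_block(key: bytes, tweak: bytes, block: bytes) -> bytes:
    """Encrypt one 16-byte block with a 4-round Feistel network (toy, illustration only)."""
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for rnd in range(4):
        f = _round(key, tweak, rnd, right)
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right


def encrypt_sector(key: bytes, sector_no: int, plaintext: bytes) -> bytes:
    """Narrow-block mode: every 16-byte block is encrypted on its own."""
    assert len(plaintext) == SECTOR
    out = bytearray()
    for offset in range(0, SECTOR, BLOCK):
        tweak = sector_no.to_bytes(8, "little") + (offset // BLOCK).to_bytes(2, "little")
        out += toy_encrypt_block(key, tweak, plaintext[offset:offset + BLOCK])
    return bytes(out)


def changed_blocks(a: bytes, b: bytes) -> list:
    """Indexes of the 16-byte blocks that differ between two ciphertext snapshots."""
    return [i // BLOCK for i in range(0, len(a), BLOCK) if a[i:i + BLOCK] != b[i:i + BLOCK]]


if __name__ == "__main__":
    key = bytes(32)
    before = bytes(SECTOR)
    after = bytearray(before)
    after[100] ^= 0x01                      # flip one bit in plaintext byte 100
    snap1 = encrypt_sector(key, 7, before)
    snap2 = encrypt_sector(key, 7, bytes(after))
    print(changed_blocks(snap1, snap2))     # prints [6]: only block 100 // 16 changed
```

With a wide-block mode, the same one-bit change would be expected to alter all 32 blocks of the sector, so two snapshots would reveal only that the sector changed, not where.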
Not that it's a problem in all cases, but I would say we should do something about it, at least in the future. One idea is to use some wide mode. That's defined exactly so that the block size is the same as the sector size. Unfortunately, most of the wide modes are patented. I know about some modes which are not, but that's something we need to test later.

What is also missing, and what we would like to test, is some support for authenticated encryption. It means adding integrity data to disk encryption. I already mentioned the problem. What we have currently is sometimes called poor man's authentication. It means that you have some plaintext, the attacker changes the ciphertext, and you have no way to detect it. The only way to detect it is to notice that where you had some plaintext you now have some garbage, and that is usually not noticed. So we really need either authenticated encryption or some added integrity protection. It's not easy at all, I'm just mentioning it. And volume key change, that's for online re-encryption.

So, very shortly, what LUKS2 from my point of view should be and how we play with that. The motivation: we have confidential data on disk, and these data will be there probably for decades. I already have disks formatted with pre-LUKS cryptsetup, so 10 years old. So we need to think about providing security even in that time frame. Either we need to provide secure enough algorithms today, or we need to provide a way to upgrade the encryption to some better standard later. I would like to provide both: the best of what we have today, and online re-encryption to make it possible to re-encrypt the device later. That's called security hardening, at least here. I already mentioned key derivation. Integrity support is questionable, but at least it's interesting for academic research, so that's another part of it. And online re-encryption, which Ondra will talk about in a few minutes.

The second part is from the user perspective, and it's even more important. Today we don't have any interface to bind the volume key or
the passphrase to some token, to some remote key, or something like that, and there are a lot of requests for that. So we are trying to make it possible to link this directly in the metadata. For example, you will have a key slot which is password only, you can have a key slot which is bound to a TPM, so part of the passphrase is stored in the Trusted Platform Module, or we can have a key stored remotely, fetched either by some protocol or just directly over SSH. That's the goal of LUKS2 from this point of view.

Then there is metadata redundancy. LUKS is very simple, so if there is some data corruption, you have no backup. It's just game over; you cannot recover, even for the visible metadata. The second part of that problem is that you cannot even detect that there is corruption. There is no checksum. That's probably a design mistake in the LUKS design, so we would like to fix that.

What I would like to stress is that everything I will say is just an experiment. It's not fixed as a standard; it's just a plan. So what would I like the new on-disk format for LUKS to look like? We need a format which allows adding features without any change of binary structures on disk, so something modular. I would like to use some kind of abstraction: a key slot is the way a key is stored, and then we say what it will use. One key slot can use a token, one key slot can use just a passphrase, one key slot can be some special hidden key slot for the administrator, something like that. From the user point of view, we would like to provide a way to upgrade online from the LUKS1 format. It's not always possible, but we already have that implemented in the proof-of-concept code, so it's definitely possible. It just limits some areas of extension.

So LUKS2 mainly targets enterprise users who want to connect the encrypted disk to some other environment, typically a corporate one. LUKS1 will probably stay here forever. I don't want to touch the LUKS1 code. It's very stable, very proven code, so it will stay here. If you don't need to
change, you can stay with LUKS1 forever, at least if there is no really bad security problem.

So just a very quick view of how it looks in the proof-of-concept code. Instead of one header we have two headers. Each header consists of two parts. The first is a very small binary part which is intended for blkid checks, so blkid can scan for the UUID and some basic parameters like label, subsystem and so on. There is no real need for complex parsing. As you probably know, udev depends on blkid scanning, so it needs to find the UUID of the device to bind it to the system. The rest of the header is basically a buffer for the JSON format. I selected JSON because it's a very simple format. It's able to store attributes, and what I don't understand I can just take as a JSON object and copy, even if I have no idea what's in there. It's a question whether it's a good idea or not. Maybe someone will find some problem, but for me it appears to work. So that's where the configuration is stored. And of course there is a checksum and there is some recovery. Because we have two headers, we can implement a simple journal, so if there is a power failure during the header write, we have either the old header or the new header, and we can recover in that situation. There are still some problems, but at least the user experience will be better.

Very quickly, this is a real working example of the JSON. No need to read it. I just want to mention some abstractions here. The first abstraction is the definition of the key slot object. The key slot object is, as I said, basically how a key is stored and encrypted. It can be the same way as in LUKS1, just stored in some binary area. Or it can be, for example, just information on how to get the key through libssh. I actually have an example, just to prove that the interface works, that downloads the key directly from a remote server. So outside of LUKS you set up a public key to get to some server, and it will unlock the device just by downloading the key over a second channel. It's a few lines of code and it actually works.
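The idea of two checksummed header copies with a JSON area can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the LUKS2 on-disk format: the layout (SHA-256 digest, then an 8-byte sequence number, then a fixed 4096-byte zero-padded JSON area), the field names in the sample metadata, and all function names are hypothetical.

```python
import hashlib
import json
import struct

JSON_AREA = 4096   # hypothetical fixed size of the JSON buffer, chosen at format time
DIGEST_LEN = 32    # SHA-256


def pack_header(seq: int, metadata: dict) -> bytes:
    """Serialize one header copy: checksum | sequence number | zero-padded JSON."""
    body = json.dumps(metadata, sort_keys=True).encode("utf-8")
    if len(body) > JSON_AREA:
        raise ValueError("metadata does not fit into the JSON area")
    payload = struct.pack("<Q", seq) + body.ljust(JSON_AREA, b"\0")
    return hashlib.sha256(payload).digest() + payload


def unpack_header(blob: bytes):
    """Return (seq, metadata) if the checksum matches, otherwise None."""
    if len(blob) != DIGEST_LEN + 8 + JSON_AREA:
        return None
    digest, payload = blob[:DIGEST_LEN], blob[DIGEST_LEN:]
    if hashlib.sha256(payload).digest() != digest:
        return None                                   # torn or corrupted write
    (seq,) = struct.unpack("<Q", payload[:8])
    return seq, json.loads(payload[8:].rstrip(b"\0").decode("utf-8"))


def load_best(primary: bytes, secondary: bytes):
    """Pick the valid copy with the highest sequence number."""
    valid = [h for h in (unpack_header(primary), unpack_header(secondary)) if h is not None]
    if not valid:
        raise ValueError("both header copies are corrupted")
    return max(valid, key=lambda h: h[0])


if __name__ == "__main__":
    old = pack_header(1, {"keyslots": {"0": {"type": "passphrase"}}})
    new = pack_header(2, {"keyslots": {"0": {"type": "passphrase"}, "1": {"type": "token"}}})
    torn = new[:40] + bytes(len(new) - 40)            # simulate power loss mid-write
    print(load_best(torn, old))                       # falls back to the old, valid copy
    print(load_best(new, old)[0])                     # prints 2 once the write completed
```

A reader of such a header does not need to understand every attribute in the JSON; unknown objects can be carried along unchanged, which is the property the talk gives as the reason for choosing JSON.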
Next is something called a segment. It's a description of the area on disk where the user data live, so it's similar to LVM. In a normal situation there is only one segment, but Ondra will need more segments while re-encryption is running, so you have some old key area, some new key area, and something in between. What is important is that the segment abstraction allows me to define a new type of segment for disk integrity data, so it allows me to add such an extension later without modifying the whole header processing. The area object is not so important here; it's basically the same as a segment, just for internal use by LUKS. And the digest is exactly the same as in LUKS1. It defines how to check that a candidate key is correct without any parsing of user data. It means that you enter a passphrase, it runs that magic to produce a key, you have some candidate key, and now you need to check whether that key is correct or not. That's the way to do that. We are using key derivation again, but that's not important. Again, the abstraction allows me to change it in the future to something better.

Okay, so I will hand over to Ondra, and he will talk about online re-encryption.

Thanks. Okay, so let me start with a brief review of what re-encryption is and why anyone would want to re-encrypt his block device. Perhaps the most obvious reason is that we have quite different data lifetimes and algorithm lifetimes. For example, you may want to store your data for decades, I don't know, maybe 50 years or more. And then there's the algorithm lifetime. A good old example is the DES algorithm, which was invented in 1975 and standardized in 1977, and I think it was the late 80s when there were the first concerns about the algorithm's strength, or weakness. So now I have data encrypted using the DES algorithm and I am, well, unsure about their safety. So what shall I do? The most obvious way is to exchange the volume key, that is, re-encrypt the whole block device.

Another good example of why to re-encrypt is to prevent access to the data using
a LUKS header backup from a time when the attacker had some knowledge about the header. For example, as an attacker I'm an ex-employee of some company and I knew at least a single passphrase for the LUKS header, so I had access to the volume key of the whole storage. Even if a future administrator changes all passphrases in the LUKS header, the attacker still has access to the volume key, so he can access the whole ciphertext device. Again, if I exchange the volume key, the old header backup becomes useless.

The next example of why to re-encrypt is something we call a snapshot replay attack. It's sort of complementary to the header backup attack. An attacker can create a snapshot of the ciphertext, and even without knowledge of the volume key he can play with the ciphertext backup and do some nasty things. A good example: let's say the attacker has an estimate of where the /etc/passwd file is stored on disk, and he has a backup of the ciphertext from a time when that /etc/passwd file contained a password he knows. So he can replay the snapshot of the ciphertext over the current ciphertext, and even without knowledge of the volume key he has now gained access to the system. So that's the basic picture. And lastly, you may simply be forced to perform a regular volume key exchange due to a security policy.

Okay, why do it online? There are basically only two major reasons. The first is that re-encryption of a whole disk can take a really, really long time, and it's not feasible to ask administrators to take down the data storage for the time of re-encryption. It gets even more tricky when you want to re-encrypt the storage that holds your root file system, because that is basically equal to shutting down the system. We already have offline re-encryption that works perfectly, but we were somehow forced to bend the offline use case for re-encryption of the root file system, which had to be performed during the initrd phase, during booting, before mounting the file system. So it was like you turned
on your system and then you had to wait for the re-encryption of the whole disk. And it was performed in the initrd, in the boot phase, where in case something went wrong you as a system administrator had really limited access to the tools needed for recovery. So such interruptions could end up really horribly. So we are working on an online re-encryption tool that will perform it on a live, mounted file system.

What is from my point of view even more interesting is that we are able to store the metadata in a more resilient way. Originally the cryptsetup-reencrypt utility, the offline re-encryption tool, stored its metadata aside, typically in another file system. Now we store the metadata about the progress of the re-encryption inside the LUKS2 header. So if the re-encryption gets interrupted for some reason, the only thing the user will notice after a system restart or whatever is that the device activation takes slightly longer. He just inputs the password, and it takes a little longer, perhaps only milliseconds, before the activation is completed. That's thanks to the fact that the only thing we need to do after online re-encryption gets interrupted is to find a clear boundary between the new volume key and the old volume key. When we find this boundary and do some verification, we just set up a mapping with two keys, which is what this picture is trying to describe.

This is your LUKS container, and as you can see, this yellow magical box is slowly moving from the beginning of the device to the end. Underneath this yellow window we are performing the re-encryption. The other two segments, marked in green, are perfectly accessible to the file system or other users of the block device. So we need to suspend only this particular chunk, which is a few megabytes, during the re-encryption, and the rest is perfectly accessible to the file system. So that's how the re-encryption will work. This is
a really very simplified version of how it will work, so if anyone is interested, just approach me after the talk and I can explain. I think that's pretty much all I can share about online re-encryption.

There's one question: what happens if the system is interrupted during re-encryption, when it is in the yellow box?

Yeah, that's a tricky question. I will explain that after the talk, because we have no time. There is really some magic there, and we know about all the problems, but the scheme is much more complicated than what is on this picture.

Okay, so I promised some user survey output. We have just a few minutes, so I will go very quickly to leave time for questions and answers. Maybe most of you noticed there was some strange mail on the mailing list last year. That was about this. We have a thesis at the university about a user-friendly approach to disk encryption. Initially it was about the graphical part of it, but because it's a cooperation between the faculty of social sciences and the faculty of informatics, we decided to try social science. So we created a questionnaire, and my plan was to put it to several companies here in Brno, specifically ones where I know they are forced to use very bad encryption products, so I wanted some feedback. Unfortunately, all the companies rejected that for various reasons. Apart from that, we collected some output: exactly 141 responses. It's not much, but statistically you can draw some conclusions from that. The thesis is unfortunately not yet online, because it's not yet defended, so I just want to share some nice numbers here. The complete analysis is in the thesis.

For me, what is interesting is that people mentioned that they don't use encryption even if they have a company policy to use it. No comment on that. About 96% of people believe that encryption increases security. That was a question I forced in just for fun. So 4% probably work for the NSA. I have no
idea. What is more important from my point of view: 20% of people said that they lost data on an encrypted disk at least once. That's a huge number of people. What worse can you do in a storage system than lose a customer's data? So that's something we should think about. Of those people, 59% lost the data forever. I have to mention that part of that is real data corruption where they didn't have a backup, and part is that they didn't even try to recover, because the data on the disk was just not so important. Imagine you have some rips of DVDs; you can of course buy those again. 18% of them suffered corruption of the encrypted disk as the real problem. So we should probably work somehow to provide a better backup solution and lead users to create backups. That's a lesson for me as a developer. 62% of people use backups, just a nice number.

Another important number for me: just 1% of people said that they noticed a slowdown from disk encryption and that it's a problem. There are a lot of flame wars about encryption speed and so on. That means for me that in the desktop segment the performance of disk encryption is probably not a problem. We probably do have a problem in the data center, in real systems, and we can work with that, because in the data center there are really powerful systems, so we can use parallelization appropriate to that hardware. So that's a very short overview. I will probably prepare better numbers later; this was just for fun.

Just the conclusion, very quickly. As I already said, I think we need both: LUKS1 as a stable format living forever, and LUKS2 for extensions. From the academic point of view, I would like to work on integrating new strong cryptography to increase security, but in a really very conservative way. So we will not push something new on you and then later some attacks appear. We will just say you can use it if you want, but it will never be the default until it's
really proven stable, if we can do that. And a conclusion about the user-friendly way. My message here is that as storage developers and engineers we really should think about users, how they use the system and whether it's comfortable for them, and provide proper interfaces, not push on them some complicated systems, and even systems which can lose their data. So that's all from me, and we have probably two minutes for questions and answers. I see a lot of them, so let's start.

The LUKS2 format is JSON data, so won't it be impossible to atomically update the data inside the block?

Yeah, I will just repeat the question. The question is whether JSON forces us to use non-atomic updates, because it's really big data. Yes, I must update a lot of sectors. There is actually a fixed buffer for the JSON format in LUKS, so during format you say how long the buffer is. The update is non-atomic, so I handle a failure like this: I write the JSON data, then I atomically write the checksum of the data. Until then the new data is not considered correct, and if there is a problem I still have correct metadata. So I can work atomically using that journal-like access. Yeah, it's already implemented.

Yeah, we can store the size of the LUKS container there, but actually you just need to resize the block device underneath. You should re-encrypt just the rest of the data, but you can resize. The header is not on the device in the beginning, for some reasons.

Unfortunately, I think we are out of time. Can we continue, or is there nothing here after the talk? Can I continue? Okay, so I will just wait until people move to another room. If you want to discuss, we will continue with that. Maybe you should go there, because there is a continuation of that. So thank you.