Hello, everyone. Thanks for coming. The next talk will be about something new added to cryptsetup, or rather the cryptsetup tools, because there are several tools. But I should mention that the idea behind the integrity protection was actually an academic idea in the first place. I do some research for my PhD study, but because I'm the cryptsetup maintainer, I would like to have something practical in the end. And actually it seems that we have something practical before the publication of some papers. So the talk is about what you can do with that, and maybe you can play with it and maybe find it useful.
The talk starts with some introduction. I tried to do it with pictures, not with mathematics or something complicated; we will see if it works. Then I will describe the new device-mapper integrity module, which is already in the upstream kernel, so we can play with it. What is authenticated encryption for dm-crypt? That's another extension. And we will also shortly talk about LUKS version 2, but I have only two slides at the end, because LUKS is really about key management and I will be talking about the real encryption. But I promised it in the abstract, so let's try to do that.
So, just a very short introduction: what is full disk encryption? Full disk encryption is encryption on the lowest layer of the storage stack, so on the disk sector level. And usually it's transparent, so you can put it in the chain as encryption of the sectors. So we have some limitations there, but the implementation is also usually quite a bit simpler than, for example, an implementation in file systems. As I said, the encryption is on the sector level, so we are limited by the sector size. Sectors are always accessed independently, so you cannot have any interdependence between the sectors, because that is how a block device operates. Full disk encryption initially was, and still is, mainly used for data-at-rest protection.
So it means it's designed to protect your data when the device is switched off, and usually the threat model, which I will mention later, is that if someone steals your device, the data are still safe. So it provides confidentiality, but by design it doesn't provide data integrity protection. What I mean by that we will see, but the reason for it is that we have so-called length-preserving encryption. So the ciphertext of the sector has the same size as the plaintext, and we have actually no space to store any integrity or authentication information. That's one of the problems I am trying to solve here.
So let's start with the pictures, and maybe an explanation first. It looks like glitch art, but it's not glitch art. It's really generated using dm-crypt in the kernel and cryptsetup. If you see some talks about encryption, you will usually see the encrypted Tux with ECB mode. I will start with that as well, but we have some other pictures trying to show what it's about. Technically it's a bitmap image, because a bitmap doesn't have any integrity checks, so it allows me to display even the ciphertext. So for the start, just simple text; that's our plaintext, and the correct encryption should look something like this. Just pseudorandom noise; you shouldn't see any patterns. Actually this is encryption with AES-XTS mode, with the sector number as the initialization vector. Most of the disk encryption systems today, on various operating systems, are running in this mode. So this is how it should look.
In this one you will probably recognize some pattern; that's what you get if you use electronic codebook. Electronic codebook is the mode that should never be used for basic encryption. The principle is that you just split the plaintext into blocks of the cipher block size, here 16 bytes, and run the encryption without any tweaking. So if there is the same plaintext, there will be the same ciphertext, and of course in the image there are some patterns.
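The ECB effect described above can be reproduced without any image. The following is a minimal sketch, assuming a toy keyed function (truncated HMAC-SHA256) as a stand-in for the block cipher; it is not AES, it is not even invertible, and the names `toy_block`, `ecb_like` and `tweaked` are made up for this illustration. It only shows that equal plaintext blocks give equal ciphertext blocks unless a per-block tweak is mixed in.

```python
import hashlib
import hmac

BLOCK = 16  # cipher block size in bytes, as in the talk


def toy_block(key: bytes, tweak: bytes, block: bytes) -> bytes:
    """Toy keyed function standing in for a block cipher (NOT AES, not invertible)."""
    return hmac.new(key, tweak + block, hashlib.sha256).digest()[:BLOCK]


def ecb_like(key: bytes, data: bytes) -> list[bytes]:
    # No tweak at all: identical plaintext blocks map to identical outputs.
    return [toy_block(key, b"", data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]


def tweaked(key: bytes, sector: int, data: bytes) -> list[bytes]:
    # Tweak built from the sector number and the block position inside the sector.
    out = []
    for i in range(0, len(data), BLOCK):
        tweak = sector.to_bytes(8, "little") + (i // BLOCK).to_bytes(4, "little")
        out.append(toy_block(key, tweak, data[i:i + BLOCK]))
    return out


key = b"k" * 32
plaintext = b"A" * 64  # four identical 16-byte blocks

ecb_blocks = ecb_like(key, plaintext)
tweaked_blocks = tweaked(key, 7, plaintext)
print("distinct blocks without tweak:", len(set(ecb_blocks)))
print("distinct blocks with tweak:   ", len(set(tweaked_blocks)))
```

With the tweak, the same plaintext also encrypts differently in a different sector, which is the property the sector-number initialization vector is meant to provide.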
That's very well known and you shouldn't do that, but unfortunately there are millions of drives using chip encryption that are running exactly this way. So it's not just academic fun; someone in some big corporation actually used that for a big series of drives.
Anyway, the second picture is something I would like to use to demonstrate that it's not only about the mode. It's also about the tweak value: you really want to have the same plaintext encrypted differently at different places on the disk. And in this picture I actually use XTS mode, so a quite secure mode. We will talk about some problems with XTS later, but here I just use a constant initialization vector. That's the tweak value for the sector. Because it's constant, and because of the internal operation of XTS, you can see the pattern, just in a somewhat different form. So not only the encryption algorithm and the encryption mode, but also the tweak value, which later in authenticated encryption is called a nonce, is important for the security of the encryption.
And I would like to show you on a picture what data integrity is here. What I play with here is that I take the ciphertext, that's the image of the disk, and I just implanted some picture there, or just wiped some bytes with some constant value. So I'm playing with the disk without the key, just rewriting the ciphertext, and then I let the user decrypt it to the plaintext form. You can clearly see that the parts I rewrote were decrypted as some pseudorandom noise. But you can also see that I implanted some additional information there, the picture of some pirate flag or something like that. It's quite a stupid example here, but I would like to say that it can be more sophisticated. Someone can, for example, take a hidden disk, if you know that TrueCrypt concept, and implant it in your encrypted device, and then later prove that your device actually contains some data you have no idea about.
And because it's length-preserving encryption, your system has no way to recognize that, because there are no checksums, there is no authenticated encryption, nothing. Maybe it's not a problem, but our solution is based on a threat model that does think it's a problem.
So, the threat model, just a very, very simplified view. The usual disk encryption threat model is designed for a stolen device: someone steals your laptop or your disk, the data are gone, and they are never used again. So the confidentiality is perfectly fine. He cannot decrypt the data and your data are safe, so you lose only hardware. But if the device returns to the original owner, you can have a problem, because there can be some silent data corruption. Random data corruption you actually have no way to detect here; you should detect it on the upper layer. But as you saw with that picture, that's not always the case. And also, as I said, an attacker can implant some external data there. And if you think that this situation is just an academic exercise, just remember how many times your device travels separately from you, and how many times you are forced to put your laptop in some baggage and so on. So yes, I know there can be hardware attacks, and some keylogger can be installed. But what I mentioned is just manipulation in software. So it's no excuse; we should take care of some data integrity here.
And there are other trade-offs I would like to mention, just because we are trying to maybe fix them as well. The first one is that you would actually expect from disk encryption that on every write to a sector, the whole sector on the disk will change pseudorandomly. So an attacker can always detect with a snapshot that the sector was changed. That's actually the same with a plain disk drive; you see which sectors I changed. But you shouldn't be able to locate the change with a finer granularity than the sector size.
That's not true in most of the modern disk encryption, because of the operation of the XTS mode and similar modes. And the second problem is that there is no additional randomized information for the sector. The same plaintext written to the same place on the disk will always produce the same ciphertext. So an attacker can actually detect that you are writing the same data. In cryptography we would play with some oracle model; I would say it's a chosen-plaintext attack. But let's not talk about cryptography. I will show it in a picture.
Just one thing here. There are also replay attacks. It means that someone is able to take old valid data and put it back on the disk, either the whole disk or just a part of it, and you are not able to recognize that. Solving this problem is much more problematic than the others, because you need an additional trusted store to keep something like a counter, a last-modified date, or the root hash of a Merkle tree. A Merkle tree is a tree of hashes, and the top-level hash is stored somewhere. Chromebooks use it for the system partition, for example. We are not solving that, because our requirement is that we work just with standard drives and don't have any ability to store it. Just a warning in advance.
So I promised some pictures, and here is another one. Actually it's not encryption; I just inverted every other block. I would like to demonstrate what the encryption block granularity is. On the left side you can see the AES block of 16 bytes. If you imagine that ECB encryption pattern, that's the reason why we see this pattern there. On the right side is the 4-kilobyte disk sector, done the same way. So that took something around 16 kilobytes of data. So you can see how it propagates here.
What is more interesting is if I apply it to a real situation. So I have the same encrypted disk and I, as a user, just run some operation. I just selected something very simple.
So I overwrite 64 bytes in every sector of the plaintext. And then I run a difference between the old ciphertext on the disk and the new ciphertext. You would expect that every single sector changes pseudorandomly, because the 64 bytes are inside each sector on the disk. The reality is what you can see on the left. Actually the XTS mode has, for performance reasons, a somewhat different structure, and it encrypts the data internally in parallel, similar to ECB mode. And you can clearly see that I can locate the change down to the AES block. The white spaces are unchanged areas. It can be a problem, it may not be a problem. Sometimes it's called a leak of the XTS mode in some articles or analyses. I don't think it's a problem, but I would like to do something about it if possible. Just one thing: when I generated this picture, I noticed that there are even some unchanged parts inside the blocks. I was too lazy; I was just overwriting the disk with a constant value. In this case I just overwrote the same value that was already there. So of course it didn't change, just in case you are interested why there is some pattern even inside the changed pattern.
So what can we do with that? One solution is that we somehow randomize the tweak value for the sector, so randomize the initialization vector. Actually I tried that. It's implemented in the kernel now, and you can see what happens. So this is the expected output, everything changing pseudorandomly. But there are some problems with that, as we will see later.
So what's missing? If I would like to provide some data integrity protection on this level, so the block level, then instead of length-preserving encryption I need to use authenticated encryption, if you want to be cryptographically safe, or authenticated encryption with additional data; that's the variation there. The change granularity problem can be solved by using a randomized IV, as I showed you, or by using so-called wide encryption modes.
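Going back to the overwrite experiment above, the difference between a fixed per-sector IV and a randomized IV can be modelled in a few lines. This is a minimal sketch under stated assumptions: `toy_encrypt_sector` is a hypothetical narrow-block stand-in (truncated HMAC-SHA256 per 16-byte block), not XTS and not the kernel code; a 512-byte sector and the overwrite offset are chosen only for illustration.

```python
import hashlib
import hmac
import os

BLOCK = 16    # cipher block size in bytes
SECTOR = 512  # sector size in bytes (illustrative)


def toy_encrypt_sector(key: bytes, iv: bytes, data: bytes) -> list[bytes]:
    """Toy narrow-block model: each output block depends only on (iv, position, block)."""
    blocks = []
    for offset in range(0, len(data), BLOCK):
        position = (offset // BLOCK).to_bytes(4, "little")
        mac = hmac.new(key, iv + position + data[offset:offset + BLOCK], hashlib.sha256)
        blocks.append(mac.digest()[:BLOCK])
    return blocks


def changed_blocks(old: list[bytes], new: list[bytes]) -> list[int]:
    # Indices of blocks that differ between two ciphertext snapshots.
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


key = os.urandom(32)
old_plain = bytes(SECTOR)
patched = bytearray(old_plain)
patched[128:192] = b"\x01" * 64  # overwrite 64 bytes inside the sector
new_plain = bytes(patched)

# Fixed IV derived from the sector number: only the touched blocks change.
sector_iv = (42).to_bytes(16, "little")
fixed = changed_blocks(toy_encrypt_sector(key, sector_iv, old_plain),
                       toy_encrypt_sector(key, sector_iv, new_plain))

# Fresh random IV on every write: the whole sector changes.
randomized = changed_blocks(toy_encrypt_sector(key, os.urandom(16), old_plain),
                            toy_encrypt_sector(key, os.urandom(16), new_plain))

print("changed blocks with fixed IV:", fixed)
print("changed blocks with random IV:", len(randomized), "of", SECTOR // BLOCK)
```

The snapshot comparison with the fixed IV pinpoints the modified 16-byte blocks, while the randomized IV hides where inside the sector the change happened, at the price of having to store the IV somewhere.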
Wide encryption mode means that the block of the mode is the same size as the sector, so it will work as expected. But there are two problems with that. It's still length-preserving encryption, and the second problem is that most of the modes with okay performance are patented and we cannot use them. So I am not playing with that, but maybe one day there will be a mode we can actually use. There is the EME2 mode, but it's still not sure whether it's patent-free or not. Anyway, the last problem I mentioned, that we would like the ciphertext to change even if we are writing the same data, can again be solved by randomization of the initialization vector. All these problems together have one big problem, and that is that it requires some additional metadata per sector. You need to store that randomized initialization vector somewhere. You need to store the authentication tag somewhere. So that's the major problem we are trying to solve.
So before I start describing how we did that, here are the requirements that were defined before we even started. Of course we are not reinventing the wheel; there are some solutions. I would like to mention the FreeBSD solution for encryption combined with data integrity protection. This solution is nice, but it is limited to some sector sizes and even to some algorithms. The problem is that it increases the sector size. I mentioned that the sector size today is 4 kilobytes. So if I used that sector size, I would need to increase the sector size for user space to something bigger, and it would be bigger than the page size in most systems. So there are problems with that. I don't use it, but I know about it, and it's quite a nice solution; we just use something different.
So our solution should require no special hardware. You should just use the laptop as you have it today. We would like to use commercial off-the-shelf disks, with a focus on SSDs. You will see why: on a rotational disk it will work, but it will have quite bad performance.
What I would like to have, even for the future, is configurable per-sector metadata. So I can decide that I will put there not only a data integrity checksum or an authentication tag, but maybe even forward error correction, so in the future it may try to recover from errors. So I would like to have it configurable, so we can change algorithms and we can change the way we use it. I already mentioned that the native sector size in the whole storage stack should remain the same. And what is absolutely necessary is to have some protection against power failure, so we don't lose data. Data corruption must not occur. We are actually writing data and metadata together, so this pair must always be kept in sync, even if the system crashes. This means we have to implement some kind of journal or something like that to prevent this situation. And from the cryptography point of view, the solution should be algorithm-agnostic. Algorithms are changing, specifically for authenticated encryption today; very, very often a new algorithm appears, so we should not be tied in any way to specific algorithms. And of course it should be free.
So that's the introduction part, and now to the real part: what we implemented and what's actually already in code you can test. The solution was that I tried to separate the crypto part and the storage part. The storage part is the metadata handling, so the implementation of the per-sector metadata. The crypto part is the implementation of authenticated encryption inside dm-crypt. So we somehow split these two areas, and the first one is dm-integrity, a new kernel module that basically emulates per-sector metadata. If you know those enterprise storage drives with half a kilobyte plus 8 bytes, so 520-byte sector size, that's something similar, just defined in software and with a completely configurable interface, so we can have whatever metadata per sector we want.
And we implemented it using an internal kernel feature that was already there, which is actually used by these enterprise disks and is able to... Inside one structure you have the data and you have the metadata, the integrity-protection metadata, and we just use that for our block-level metadata handling in the kernel. Maybe we misused it, I think, but it works, and it doesn't require touching the block-level subsystem inside the Linux kernel, so the whole solution is actually embedded inside device-mapper in the kernel. So that's the first part, the non-cryptographic one. Later we added a standalone mode, in which dm-integrity can actually calculate the integrity data itself. I will show you an example of that, but it was not the first idea. The first idea was to implement authenticated encryption, and we already have dm-crypt as the provider of encryption in the kernel, so the solution there was just to add authenticated encryption and a randomized initialization vector. And the last part was just a wrapper, so you can use it in a user-friendly way without a million parameters on the command line, and that's the implementation in cryptsetup using the new on-disk format.
So how does it work internally? dm-integrity works somewhat similarly to dm-verity, if you are familiar with that code. It will format the device. In the device structure there is a superblock with persistent parameters, then there is the data area, and the data area consists of interleaved data sectors and metadata sectors. A metadata sector can contain the metadata for a whole range of sectors, and these areas are basically interleaved on the disk. The reason why I'm interleaving it is that we can easily increase the size of the device online, just in some steps. If we stored it separately, it would be much more complicated.
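The interleaved layout described above implies a fixed per-sector cost for the tags. The following back-of-the-envelope sketch is only arithmetic under simplifying assumptions: it ignores the superblock, the journal area and any padding, and the helper names `tag_overhead` and `tags_per_metadata_sector` are made up for this illustration rather than taken from dm-integrity.

```python
def tag_overhead(sector_size: int, tag_size: int) -> float:
    """Fraction of the raw space spent on tags when every data sector carries one tag."""
    return tag_size / (sector_size + tag_size)


def tags_per_metadata_sector(metadata_sector_size: int, tag_size: int) -> int:
    """How many per-sector tags fit into one metadata sector."""
    return metadata_sector_size // tag_size


# Illustrative tag sizes: a 4-byte checksum (for example CRC32) and a 32-byte keyed hash.
for name, tag_size in (("4-byte checksum", 4), ("32-byte keyed hash", 32)):
    share = tag_overhead(512, tag_size)
    count = tags_per_metadata_sector(512, tag_size)
    print(f"{name}: {share:.2%} of space, {count} tags per 512-byte metadata sector")
```

A small checksum keeps the space cost well under one percent, while a cryptographic tag plus a stored initialization vector costs noticeably more, which is why the per-sector metadata size is worth keeping configurable.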
Then there is a journal area that provides journaling of the data. The journal can be switched off if you have journaling on the upper layer, but the area is still on disk. Usually it's a fraction of the device size, something around 80 megabytes or so. Everything is configurable, so you can play with the parameters for how to format it. Just for information, and I will first show the standalone mode, this is how much storage is used for the metadata. So if we stick with non-cryptographic metadata, just protection against random bit flips or something like that, it's under 1% of the storage, which is quite good. The problem, of course, comes later with performance.
So, standalone-mode data integrity: because there is no key and no encryption, it is really designed just for detecting random errors. It works so that on reads it validates the checksum, on writes it updates the checksum, and that's actually all. We designed, and I finally wrote for the cryptsetup tools, a special tool to configure dm-integrity in standalone mode. The tool is called integritysetup and can be used quite simply. This is just an example from some virtual machine. You format the device; you can specify the algorithm and all the parameters, I just use the defaults here. There is just one problem: if you format it for integrity protection, all the integrity checksums there are just not initialized, so every read on that not-yet-written device will return an integrity failure, of course, because the parity or checksum is not yet initialized. So by default it tries to wipe the whole device with zeros, wiping the device to initialize the integrity checksums, because otherwise you can have some problems later, which will probably appear once people try to use that. You can get an integrity failure even if you are reading just a part of the device, because you have the page cache in the kernel, and the page cache can generate reads itself, because it will enlarge the read to the page size, and if some sector in that area is not yet initialized, you
will receive a data read error, an integrity error, so you see why I am trying to prevent it. Probably a file system should be aware that the disk is accessed in page sizes and it should work; that's not completely true, but maybe it's a nice tool to discover some mistakes in the implementation of various systems, we will see later. Anyway, it works this way: you format it, you open it, and there is some status, so you will see the tag size, which algorithms are used, and then you have some additional parameters. That's actually just the status, and then you can use the device as a normal logical volume or something like that. But it's only protection against random data changes.
For authenticated encryption we have to use dm-crypt on top of dm-integrity. So dm-integrity provides the metadata, dm-crypt provides the authenticated encryption, and authenticated encryption must ... So I defined something very simple; I tried to keep it really simple, without any complicated design. So this is all of the authenticated encryption: every sector is basically processed as this request. As you can see, AAD is additional authenticated data; that's our additional data that are authenticated but not encrypted. They are just authenticated and not stored anywhere. We must always authenticate the sector number, because that protects against sector misplacement. If you take one sector and move it to another area of the disk, even together with its initialization vector, you should get an integrity failure; that's not an allowed operation. So that's why we are authenticating the sector number. Then there is the encrypted area, and we are authenticating the initialization vector as well. It's actually not needed, but it's according to some IEEE standard and it's safer, so you will get an authentication failure error. The encryption produces the authentication tag, and the decryption just verifies this tag. So if there is any authentication error, you should not receive the decrypted data; you will receive just an authentication failure or integrity failure.
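The request format described above (sector number and initialization vector as additional authenticated data, then the encrypted payload, then a tag) can be sketched with the standard library only. This is a minimal sketch under loud assumptions: the cipher is a toy SHA-256 counter-mode keystream, the tag is HMAC-SHA256 in an encrypt-then-MAC arrangement, and the names `seal_sector` and `open_sector` are hypothetical. It is not the dm-crypt implementation and must not be used to protect real data.

```python
import hashlib
import hmac
import os

IV_LEN = 16


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Toy keystream (SHA-256 in counter mode); a stand-in for a real cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "little")).digest()
        counter += 1
    return bytes(out[:length])


def _aad(sector: int, iv: bytes) -> bytes:
    # Additional authenticated data: sector number and IV, authenticated but not encrypted.
    return sector.to_bytes(8, "little") + iv


def seal_sector(enc_key: bytes, mac_key: bytes, sector: int, plaintext: bytes):
    """Encrypt one sector with a fresh random IV; return (iv, ciphertext, tag)."""
    iv = os.urandom(IV_LEN)
    stream = _keystream(enc_key, iv, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, _aad(sector, iv) + ciphertext, hashlib.sha256).digest()
    return iv, ciphertext, tag


def open_sector(enc_key: bytes, mac_key: bytes, sector: int,
                iv: bytes, ciphertext: bytes, tag: bytes) -> bytes:
    """Verify the tag first; release plaintext only if verification succeeds."""
    expected = hmac.new(mac_key, _aad(sector, iv) + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("integrity failure")
    stream = _keystream(enc_key, iv, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


enc_key, mac_key = os.urandom(32), os.urandom(32)
iv, ct, tag = seal_sector(enc_key, mac_key, 5, b"secret data".ljust(512, b"\0"))
print(open_sector(enc_key, mac_key, 5, iv, ct, tag)[:11])

try:
    open_sector(enc_key, mac_key, 6, iv, ct, tag)  # same bytes presented as sector 6
except ValueError as err:
    print("moved sector rejected:", err)
```

Only the IV and the tag need to be stored in the per-sector metadata; the sector number is known from the request itself, which is why moving a sector together with its metadata still fails verification.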
So that was how the request works, and for every request we need to use some randomized initialization vector; that's the tweak for the sector. There are several possible solutions. I just chose the simplest one: I'm just generating the whole initialization vector from the random number generator. And there is one important thing. We have to do it in a way that the collision probability is negligible. A collision is when you produce the same random number for two sectors, even across the history of the drive, and it must not happen. So the collision probability must be negligible from the cryptographic point of view. And as you will see on the next slide, that's actually a problem with today's algorithms in the kernel, and we have to do something about that. Just to mention it again, this solution doesn't prevent replay attacks. You can take a part of the old device, with correct checksums, and put it there, and it will not detect that. This is not a solution for that; we would need separate trusted storage to implement it, so I ignored this problem.
So what's the mentioned problem with authentication algorithms? I'm not saying the algorithms in the kernel are bad. They are actually very good, and they are designed for network encryption, so for authenticated transmission on a network, where you have no problem doing some rekeying after several packets, or where the counter just increases and there is no collision. But if you have a disk, you have the whole disk, and you are reading sector one and sector one million and then again sector one, and you are actually running back in time; on the network you cannot do that. And for this, if I store the initialization vector on the disk itself, I just need to prevent collisions there. So for that I need a long enough nonce, so that I would need to write a lot of data before the probability of a collision is no longer negligible.
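The requirement above can be made concrete with the usual birthday approximation: for n random nonces of b bits, the collision probability is roughly 1 - exp(-n(n-1)/2^(b+1)). The sketch below is only this textbook estimate; the function names, the chosen nonce lengths and the probability thresholds are illustrative assumptions, and which threshold counts as negligible is a policy decision that the code does not make.

```python
import math


def collision_probability(writes: int, nonce_bits: int) -> float:
    """Birthday approximation of the chance that any two of `writes` random nonces collide."""
    return -math.expm1(-writes * (writes - 1) / 2 ** (nonce_bits + 1))


def writes_for_probability(p: float, nonce_bits: int) -> float:
    """Approximate number of sector writes at which the collision probability reaches p."""
    return math.sqrt(2 ** (nonce_bits + 1) * -math.log1p(-p))


def bytes_for_probability(p: float, nonce_bits: int, sector_size: int = 4096) -> float:
    """The same budget expressed as bytes written, one fresh nonce per sector write."""
    return writes_for_probability(p, nonce_bits) * sector_size


for bits in (96, 128):
    for p in (2.0 ** -32, 2.0 ** -8, 0.5):
        total = bytes_for_probability(p, bits)
        print(f"{bits}-bit nonce, p={p:.3g}: about {total:.3g} bytes of sector writes")
```

The budget grows with the square root of the nonce space, so every extra 2 bits of nonce doubles the amount of data that can be written for the same collision probability.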
Unfortunately all the implementations of authenticated algorithms inside the kernel, specifically GCM, use a 96-bit nonce, and with this there is a problem. It depends on the sector size, but with a 4-kilobyte sector size, if I remember correctly, it's over 100 petabytes of overwritten data before, by the birthday paradox, you get a collision. The probability will be even worse, and there is even the problem that you shouldn't do so many writes with GCM. So GCM is not usable, specifically because if there is a collision, it not only breaks the integrity for that particular sector, but there is a known attack that will reveal a part of the key or the whole key, which is the worst case that can happen. If you have the key, you can do anything. So GCM in this state in the kernel is not usable. Unfortunately even ChaCha20, if you like the Bernstein-designed modes, has the same problem; we are actually using it in a wrapper according to this RFC. It has again a 96-bit nonce, so it can be used, but the probability of a collision is not as perfect as we would like.
So what's the possible solution? The first solution is to just not use an authenticated mode: just use a length-preserving mode and add some HMAC or keyed hash, so just calculate the integrity separately. That's the first item here, and I will show you how to do that, but the major problem with it is that it's too slow, and it will always be slow, because if you want to calculate the integrity separately, for example with HMAC, you have to hash the whole sector, and hashing 4 kilobytes of data will always take some time. So it works and it's safe, but I would not use it as the default. But that's not the end of the world.
There is the CAESAR competition for cryptographic authenticated algorithms, and it's actually in the final stage. Because we know there is this problem, we tried to implement some of the finalists and play with them inside dm-crypt. Actually our student did that; he is defending his master's thesis on this just this week. So he has already implemented three of the CAESAR finalists, and the performance, with an implementation that I guess can use hardware acceleration, looks very good. So I would say that's the future, but we'll see. So that's the reason why in all the notes I mention that authenticated encryption for disks inside LUKS is an experimental feature. You can play with it, you can use it, but please keep in mind that it's just an attempt to implement an academic idea practically, and maybe we are too fast. I was probably too practical, so you can use it now while we still don't have the final result of the crypto competition.
So anyway, how does it look practically? For LUKS2 disk integrity protection, LUKS2 here really just works as the storage, the new on-disk storage format; no other new features are used.
So the first set of parameters is for the separate HMAC calculation, so a length-preserving mode plus HMAC. The second is for the native AEAD mode; I use ChaCha20 with Poly1305 authentication. And you can see what happens if you just luksFormat it as LUKS2: you just need to specify the cipher, as in LUKS, and you need to specify the integrity field, and it will format it, and then you use it just as a normal LUKS device; there is no difference. Later, if you activate it, you can clearly see that in the middle there is a stacked device called test_dif; that's the dm-integrity device providing metadata for the test device. It's completely hidden in activation, so only if you look at how the system is configured can you see it, but from the user's point of view it's hidden. If you look at the status here, there are two differences, apart from the fact that it reports the proper ciphers. For the separate HMAC authentication you can see the separate integrity key, so it has a separate, independently generated key. For the authenticated mode there is actually just one key, and it's up to the encryption mode internally how it deals with the encryption part of the key and so on, so that's simpler. Otherwise it is exactly the same.
So, as always when I show this, there is the question of how fast it is and whether it's usable. The main design goal was not to design something with regard to performance; we wanted to design something that is secure and then try to optimize it. Anyway, because I know that academic works are usually on paper and reality differs, we tried to do something that is usable. Whether it's usable is up to you, but here are some numbers. This is a picture from a paper that was rejected, so I can use it. Unfortunately I cannot show you other pictures, because I just submitted the paper again and there are strange academic restrictions. What you can see here is a fio-simulated load. I tried to simulate something very near to the worst case, and I am interleaving 70% of reads with 30% of writes,
so really a mix, with 8-kilobyte block size, and I'm just measuring what's happening here. The black box is the underlying device, so no encryption, just the baseline. The second one is just dm-integrity, so CRC32, and the rest are various encryption modes. You can clearly see that the problem is with the journal: the journal drops the mixed reads and writes to half of the throughput. If we switch the journal off, so no journal, the two pictures on the top, you can see that there are some differences between the algorithms. The slowest one is of course the HMAC, but it's not so bad. I should mention that all the processing used hardware acceleration, on an Intel E7 here, so we are using AES-NI, we are using specific SSE instructions and so on. So the real problem of this system is not the algorithm; the problem is that journal. What to do about it depends on the situation. Maybe if we stack some file system that is doing journaling on top, we can switch the journal off, but it depends on how it will be used.
So a short summary, before I move to a very short introduction to LUKS2. All I described is already available as code, and not only on GitHub; it's already in a released version. The dm-integrity and dm-crypt extensions have been in the kernel since 4.12. The user space, so LUKS version 2, is in cryptsetup 2.0-something. I will probably do some minor releases, but it's just entering distributions now. Fedora will have it in version 28. Debian already has it in the unstable repository; I see it for most of the architectures, in that experimental state. The reason is that we had to change the ABI, so we basically bumped the soname of the library, so it needs to be integrated into systems, but it's on the way. If you are playing with unstable distros, you can just install it and play with it without additional patching of the source code.
The conclusion for the encryption here is that really, for authenticated encryption, we need to wait for the finalists of the crypto competition; I don't expect that NIST
or someone will provide something better there, and then for the implementation inside the kernel. But as I said, we already have something done, so we will see how it goes, and you can experiment with it. Once the thesis is finally defended, I will send to the list a link to the implementation of the various authenticated algorithms, so people can play with that. It's already on GitHub somewhere.
What I didn't mention, and many of you are probably asking, is why the hell I am doing it on the block level. That was actually the motivation. I was waiting for years for a file system, a real cryptographic file system in Linux with integrity protection. There are some out-of-tree ones, but in the mainline tree there is no protection using authenticated encryption. All the encryption coming to the kernel today, ext4, and there were discussions just last week about XFS, is all the same: length-preserving encryption using XTS mode, so something we had in dm-crypt 10 years ago and still have today. So my motivation, that's the academic motivation, is just to try to motivate people that we need to implement something for the future. So yes, it should be implemented on the higher level, there should be file systems using that, but why not try to do it here? And as you can see, the initial idea was in 2015, now we have January, February of 2018, and we have it in the kernel, so it can be done. I know file systems are complicated, but yes, it should be there; I'm not disputing that.
So just one idea for the integrity. My initial idea was just to implement the integrity for existing devices, with the journal and that performance stuff. But I would say in the future, with persistent memory, and I mean real persistent memory, not those DIMMs with a battery or something like that faking it, just real persistent memory, you are not limited by any block size. You can just define your block size, so you can define your sector size with metadata, and then you have atomic writes without any of the dm-integrity stuff, and dm-crypt should work there as well with
a very small layer there. So that is something I would like to see in the future, but usually in the storage world it always turns out differently than people think. So that's the conclusion for this part, and if I have a few minutes, I will try just a short introduction to LUKS version 2.
But just a big warning: LUKS2 was designed for several other reasons. I am still behind with documenting it, so there is just very small documentation inside the tree. But it's about key management. All the rest of the talk was about the real encryption engine, and LUKS2 is mainly about key management, so it's something different, just an addition to this talk. LUKS2 is actually an additional format; it's not a replacement for LUKS1. As I mention on the last line, LUKS1 will be supported probably forever. If you are happy with it, it works, it's there. The reason I need a new version is that I need something where I can add extensions, where I can store metadata, maybe do some header redundancy. But I am not touching the key cryptographic concept inside LUKS; it's the same. Just the on-disk format is more generic, but the principles are the same. We will probably change some parts in the future, but that really requires some security analysis and so on. So for now, the logic of how the keyslots are calculated and how the key derivation function works is the same.
What I added to LUKS2 is a new metadata format. I use JSON, and if you are scared that we use JSON for something on disk, actually it was a very good idea. Several times I needed to change it during development, and it was much simpler than changing any binary format. We are using it in a very specific way, so we have no problem with partial updates of the JSON or something like that. So I think it was a good decision here: don't use a binary format, just use the JSON format for metadata. And what's actually new here is the Argon2 key derivation function. That would be for a different talk, but Argon2 is a memory-hard function, so we are
trying to implement something to prevent GPU acceleration of password cracking and so on. So that's another feature. And what I would like to show on the last slide is actually something James was talking about in the talk before me: using TPMs, various tokens and so on. I made just one decision: cryptsetup will never ever link directly to any hardware provider, whether TPM, various tokens, smart cards, anything. We are just not able to keep an eye on all the problems with that. But we try to provide a new interface for how you can combine, for example, sealing the passphrase in your TPM or token with unlocking cryptsetup or the LUKS2 format. That's called the token concept. A token is just a JSON object, so just a metadata object inside the header that you can add there, and that object says how I can get the passphrase for a particular keyslot. So instead of typing the passphrase on the command line, the header contains the information: just look somewhere whether there is a passphrase, and if there is, unlock the device with that. The first concept is kernel keyring tokens. We are intensively using the kernel keyring, and the token basically says: look into the kernel keyring, and if there is a key with this name, try to use it to unlock LUKS. If it works, the device is unlocked automatically and that's everything. So it's up to some external application to put the passphrase into the kernel keyring. The second option is more sophisticated; I am not sure whether it will be used in reality, but you can define your own token type, so your own metadata. We are providing a way to store it inside LUKS, so you can define whatever you want and then operate with it, but cryptsetup will just ignore this kind of token. You have to implement a library for activating devices and so on. So for example TPM can use the first one, if you have an application, or even the second one; someone just needs to write some external application working with that. And just my last note on TPM: it is a trusted
platform module, so you have to trust it, of course. I am still not convinced that these modules, which use embedded firmware, can be trusted to the extent that I would like to use them myself. And there is a nice story: I was actually sitting literally in the same room with the people who found the ROCA attack on TPMs, and I asked about one thing that was in that paper but not so visible: how is it possible that the manufacturer is able to just upgrade the firmware in that trusted platform module? If you see that Chromebook updates and Microsoft Windows updates update TPMs, then the manufacturer can update the firmware, so they can implant whatever they want, maybe some backdoor or something like that. And we don't have a way to check that the update key or something is not leaking, and it's a firmware blob. So yes, use it if you trust the manufacturer. You have to trust your notebook anyway, but just think about that: it still contains firmware that can be upgraded, so someone can play with that. So it should be someone else's responsibility if it has problems, and in the ROCA case you can see that there was a big problem with factorization of numbers. Next time it can be manipulation with the keys. Anyway, that is just my personal opinion on that. OK, that's all. Thanks for your attention. If you have any questions, if we have time for questions. Anyway, if you have any questions later, just use the dm-crypt mailing list so we can discuss it there.
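As a note on the token concept described in the talk, here is a minimal Python sketch of how a keyring-style token could be resolved to a passphrase. It is only an illustration under stated assumptions, not how cryptsetup is implemented: the kernel keyring is faked with a plain dictionary, the helper name `passphrase_for_keyslot` and the `example-custom` token type are hypothetical, and the token field names (`type`, `keyslots`, `key_description`) follow the LUKS2 keyring token layout as I understand it.

```python
import json

# Hypothetical stand-in for the kernel keyring: key description -> passphrase.
# A real setup would have an external application load the key into the kernel.
FAKE_KEYRING = {"my:luks-key": b"correct horse"}

# Token metadata in the style of the LUKS2 header (field names are assumptions).
TOKENS_JSON = json.dumps({
    "0": {"type": "luks2-keyring", "keyslots": ["1"],
          "key_description": "my:luks-key"},
    # A user-defined token type; this sketch ignores it, as the talk describes.
    "1": {"type": "example-custom", "keyslots": ["2"],
          "hint": "handled by an external tool"},
})


def passphrase_for_keyslot(tokens_json, keyslot, keyring):
    """Return a passphrase for the keyslot found via a keyring token, or None."""
    tokens = json.loads(tokens_json)
    for token_id in sorted(tokens, key=int):
        token = tokens[token_id]
        if token.get("type") != "luks2-keyring":
            continue  # unknown token types are skipped, not treated as errors
        if str(keyslot) not in token.get("keyslots", []):
            continue  # this token is assigned to a different keyslot
        passphrase = keyring.get(token.get("key_description"))
        if passphrase is not None:
            return passphrase
    return None
```

Calling `passphrase_for_keyslot(TOKENS_JSON, 1, FAKE_KEYRING)` finds the passphrase through the keyring token, while keyslot 2 yields `None` because its only token has a custom type that would need an external tool.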