Hello, my name is Mike Halcrow and I'm here with Eric Biggers, my colleague, and we're going to present on fs-verity. Now we're just presenting on this, but this is actually an effort that's being contributed to by multiple people, including Ted Ts'o and Victor, who's also here from the Android team, and so forth. We started out with some design work about a year ago, and I'd like to say any good parts of the design came from Ted; any questionable parts I'll take the fall for, personally. And code-wise, anything that's good comes from Eric, and anything questionable you see in the code, again, I'll take the fall for that.

By way of background, I originally wrote eCryptfs, which some of you may have heard of, and I apologize at every opportunity I get for having done eCryptfs; I'm sorry. I've since also done fscrypt, again together with Ted and other contributors; that's the native encryption capability in ext4 and f2fs. Anyone else who's interested in integrating fscrypt, I'm of course always interested in having a chat.

So we're going to talk a little bit about taking measurements of the contents of storage, about dm-verity, and about doing integrity and authenticity in the file system. Then we're going to introduce some of the work we've been doing with fs-verity and some of the use cases we have in mind.

At the core of all this is taking hashes. We take measurements of the contents of storage in order to establish identity. A cryptographic hash is a one-way function: you take a preimage, which can be of arbitrary length, run it through the hash algorithm, and you end up with just a few dozen bytes. Those few dozen bytes can reliably and uniquely identify the thing you're hashing. You can then verify those few dozen bytes against a root of trust.
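The fixed-size digest property described above can be sketched in a few lines of Python using the standard hashlib module (SHA-256 is the default hash fs-verity uses later in the talk):

```python
import hashlib

# Two preimages of very different lengths...
short_input = b"hello"
long_input = b"x" * 10_000_000  # ~10 MB of data

# ...both hash down to the same fixed-size, 32-byte digest.
d1 = hashlib.sha256(short_input).digest()
d2 = hashlib.sha256(long_input).digest()
assert len(d1) == len(d2) == 32  # "just a few dozen bytes"

# Any change to the preimage, however small, yields a different digest.
assert hashlib.sha256(b"hellp").digest() != d1
```

Those 32 bytes are what gets compared against a root of trust; the preimage itself never needs to be trusted.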
When you take a hash of an object, you measure the entire thing prior to taking further action with it. There can be significant latency on initial access, especially if it's a large object and the computational resources of the platform are limited. But if you take an initial measurement of the whole thing, you can validate it before you even begin accessing it, which is a really nice security property, and oftentimes a requirement, depending on what your adversarial model looks like. However, there's no further revalidation of the contents coming from storage after that initial measurement. So if you have a malicious data source, such as a file server, a disk, or controller firmware that has been attacked in some way, then you can play games through this "man in the disk", so to speak, and make changes after the initial measurement. There are firmware attacks; you can look up things like EquationDrug and GrayFish and so forth. This is something that's been around for quite some time, and it's something you have to worry about in several contexts.

An alternative to measuring the entire object all at once is to use an authenticated dictionary structure. There are several different structures that exist, with various pros and cons depending on your usage requirements, but one of the common ones is a Merkle tree. With a Merkle tree, you segment whatever you're measuring into chunks, you measure the individual chunks, and then you take measurements of the measurements, all the way up to some top level. What this allows you to do is take a partial measurement while still ensuring comprehensive validation, because all you need to sign is the top level of the authenticated dictionary structure.
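The construction just described, hash the chunks, then hash the hashes up to a single top-level value, can be sketched as follows. This is an illustrative toy: real dm-verity and fs-verity trees pack about 128 hashes into each 4K tree block and support salting, whereas this sketch pairs hashes two at a time to stay short.

```python
import hashlib

BLOCK_SIZE = 4096  # chunk size; also fs-verity's default block size

def build_merkle_tree(data: bytes) -> list[list[bytes]]:
    """Segment data into blocks, hash each block, then repeatedly hash
    the concatenated hashes until a single root hash remains."""
    level = [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
             for i in range(0, max(len(data), 1), BLOCK_SIZE)]
    levels = [level]
    while len(level) > 1:
        # Real trees fit ~128 hashes per block; 2 keeps the toy small.
        level = [hashlib.sha256(b"".join(level[i:i + 2])).digest()
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels  # levels[-1][0] is the root hash

tree = build_merkle_tree(b"A" * 20000)
root = tree[-1][0]  # sign or trust only this one value
```

Signing just the root commits you to every chunk, because any chunk change ripples up through the measurements of measurements.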
After you access some subset of the object, you can say: at least the part I've looked at has been validated against a digital signature, which covers the top of the data structure. When you first access the object, the latency is logarithmic in the object size, because you don't have to read the entire object before you begin accessing it. The trade-off, of course, is that I/O errors are possible while you're processing the file. This allows an attacker to get creative by injecting faults in the middle of the file you're accessing, and thus achieve some arbitrary manipulation of the execution of the process: you've gone through a certain amount of computation, you've generated a certain number of side effects, you've impacted the environment, maybe sent out some network packets or what have you, and then, bam, you get an I/O error, and the attacker can select where that I/O error occurs. So that is a concern with this particular approach.

How many of you with Android devices, when you turn on your phones, get something that looks like this? OK, surprisingly few. This is what you see when you disable Android Verified Boot. This is a feature of Android where we take the system image, generate an authenticated dictionary structure over it, and push it out. Then we have keys in the secure environment on the platform that are used to validate the root of this authenticated dictionary structure, or Merkle tree. If we're unable to validate the authenticity of the system partition, we give this warning saying: look, we don't know what's on your phone, all bets are off. This is accomplished with dm-verity, which has been around for a little while. dm-verity sits in between the block device and the file system, and it protects all the file system content plus the metadata.
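The logarithmic access cost mentioned above can be made concrete with a small self-contained toy: a binary Merkle tree over eight blocks, where verifying one block touches only the sibling hashes along its path to the root. Again, this is a sketch of the general technique, not fs-verity's on-disk layout (which fans out ~128 hashes per 4K block rather than 2).

```python
import hashlib

def sha(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

# Toy binary Merkle tree over 8 "blocks".
blocks = [bytes([i]) * 512 for i in range(8)]
level0 = [sha(b) for b in blocks]
level1 = [sha(level0[i] + level0[i + 1]) for i in range(0, 8, 2)]
level2 = [sha(level1[i] + level1[i + 1]) for i in range(0, 4, 2)]
root = sha(level2[0] + level2[1])  # validated against the root of trust

def verify_block(block: bytes, index: int) -> bool:
    """Validate one block using only log2(n) sibling hashes,
    without reading the other seven blocks at all."""
    h = sha(block)
    for level in (level0, level1, level2):
        sibling = level[index ^ 1]
        h = sha(h + sibling) if index % 2 == 0 else sha(sibling + h)
        index //= 2
    return h == root
```

If `verify_block` returns False mid-stream, the reader gets an error only at that point, which is exactly the selectable-fault concern raised above.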
So the user-generated content, the contents of the individual files or inodes, is protected. But in addition, everything else associated with the file system, like the block mappings, the dentries, and so forth, is also protected, which is a very nice property to have, and in general I recommend it if your environment allows you to do so. As we're going to discuss, though, not every environment is amenable to that level of protection. If you have, for example, a mobile platform like Android, you have an ecosystem, partners, different regions, different devices, and so forth. Every incremental update to the system partition requires regenerating the entire authentication tree, and you'd somehow have to pack all of these updates together with system image updates. When you deal with the Cartesian explosion of devices, updates, versions, and so forth, you end up with something very unwieldy; it's intractable to keep all of that authenticated in the way we'd like.

So we started looking at the file system as a way to address this issue. What we're talking about is partial disk authentication, where selected parts of the file system, primarily the file contents, are validated using these authenticated dictionary structures. This facilitates incremental updates of arbitrary subsets of the file system, and it significantly reduces the complexity of deployment. The trade-off, of course, is that the file system metadata is unauthenticated, so there could be opportunities for people more creative than I am to find ways to manipulate the block mappings or the dentries, directory structures, and so forth, in order to trick applications into behaving in ways they weren't intended to behave. I'm going to go ahead and turn this over to Eric, who will talk about some of the more technical details of fs-verity.
OK, so now we get to fs-verity. fs-verity is basically dm-verity for individual read-only files. It's implemented at the file system level; specifically, it's part of the file system, but most of the code is separated out into its own module that is shared by multiple file systems. So far we've implemented ext4 and f2fs support, but it could be supported by other file systems too in the future. fs-verity is not yet upstream, but we are working on it; I sent out the first version of the kernel patch set a few days ago, and anyone is welcome to review it.

The contents of a file using fs-verity look like this. There's the original file contents, then potentially a bit of padding, then the Merkle tree. The size of the Merkle tree depends on the settings used, but with the defaults, for large files it's approximately the original file size divided by 129; small files do have more overhead due to the padding. After the Merkle tree, there's a small structure called the fs-verity descriptor, which contains some additional metadata fields, like the hash algorithm and the block size used in the Merkle tree. Similar to dm-verity, all the fs-verity metadata, including the Merkle tree, is written by user space ahead of time, and the kernel only reads it. We've written a user space tool which sets up this metadata. The simplest command is just "fsverity setup" and then the file path; that appends the metadata to the file using the default settings, which are SHA-256 hashes with a 4K block size, and without any signature included. But there are options to change these settings if you want to. Also like dm-verity, you set a block size, usually 4096 bytes, and all the hashing for the Merkle tree is done over blocks of that size. The root level of the tree, which is always only one block, is stored first in the file, and the leaf level, which we call level zero, is stored last in the file.
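The size of the tree under the default settings can be worked out with a little arithmetic: each 4K tree block holds 128 SHA-256 hashes, so each level is roughly 1/128th the size of the level below it. A minimal sketch of that calculation, assuming those defaults:

```python
import math

BLOCK = 4096                  # default fs-verity block size
DIGEST = 32                   # SHA-256 digest size
PER_BLOCK = BLOCK // DIGEST   # 128 hashes fit in one tree block

def merkle_tree_blocks(file_size: int) -> int:
    """Number of 4K tree blocks needed for a file of the given size.
    Each level stores one hash per block of the level below, down to a
    one-block root level."""
    n = math.ceil(file_size / BLOCK)   # data blocks hashed by level 0
    total = 0
    while n > 1:
        n = math.ceil(n / PER_BLOCK)   # blocks needed to hold n hashes
        total += n
    return total

# For a 1 GiB file: 262144 data blocks -> 2048 + 16 + 1 = 2065 tree
# blocks, about 8.1 MiB of tree, i.e. well under 1% overhead.
print(merkle_tree_blocks(1 << 30))  # -> 2065
```

For small files the padding and the one-block minimum per level dominate, which is why small files see proportionally more overhead.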
The leaf level is the largest level, since it gives the hashes of all of the data blocks. The other important metadata fields stored on disk include the block size, the hash algorithm, and the original file size. There's also a way to store variable-length fields; those include the root hash of the Merkle tree, optionally a salt to salt the hashes, and also optionally a PKCS#7-formatted digital signature of the file measurement.

It turns out that the Merkle tree root hash by itself is not sufficient to reliably identify the file contents, because of the other metadata fields like the hash algorithm, the block size, and the original file size, and because of the padding that's needed at the end of the original file contents. So what we actually do is hash the root hash together with these other metadata fields. That gives us another hash value, which we call the file measurement, and the file measurement is what fs-verity actually reports as the hash of the file.

As I mentioned, the fs-verity metadata is written by user space. Afterwards, there's an ioctl to actually enable fs-verity on the file by setting the verity bit in the inode; the user space command to do this is "fsverity enable". Once fs-verity is enabled, the file becomes read-only, and the Merkle tree and other metadata become invisible to user space, so user space will only see the original file contents. Also, file systems are allowed to move the metadata somewhere else if they want to, like into a file stream if the file system supports that, but for ext4 and f2fs we're just keeping it past the end of the file contents.

To implement fs-verity when reading data, we hook the file system's readpages function, which is the function that reads data from a file into that file's page cache.
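The file measurement construction, hashing the root hash together with the other metadata fields, can be sketched as below. The field names and packing order here are illustrative only, not the actual on-disk fs-verity descriptor format; the point is that the same root hash with a different claimed block size or file size yields a different measurement.

```python
import hashlib
import struct

def file_measurement(root_hash: bytes, block_size: int, file_size: int,
                     hash_alg: str = "sha256", salt: bytes = b"") -> bytes:
    """Hash the Merkle root together with the other metadata fields, so
    the measurement pins down the algorithm, block size, salt, and true
    file size -- not just the tree root over the padded contents.
    Field layout is a sketch, not the real fs-verity descriptor."""
    desc = struct.pack("<32sQI", root_hash, file_size, block_size)
    desc += hash_alg.encode() + salt
    return hashlib.sha256(desc).digest()

m1 = file_measurement(b"\x00" * 32, 4096, 1_000_000)
# Same root hash, different claimed file size -> different measurement,
# so an attacker can't silently truncate into the zero padding.
m2 = file_measurement(b"\x00" * 32, 4096, 999_999)
assert m1 != m2
```

This file measurement, rather than the raw root hash, is what gets reported, logged, or signed.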
If the file is a verity file, when a read from disk completes, but before unlocking the pages, we have the file system submit the pages to a work queue, which calls into the fs-verity module to verify the hashes of the pages. To do that, for each page, fs-verity reads any needed hash pages from the file and verifies the hashes, starting from the root node of the Merkle tree and descending to the appropriate leaf node; finally, it verifies the hash of the data page. As an optimization, each hash page is also cached in the page cache, with a bit saying whether it's been verified or not. Since there are many hashes per hash page, for usual I/O patterns most data pages get verified without having to read or hash any additional hash pages. We do not allow direct I/O on verity files, since that would bypass the verification. But we do support encrypted files, in other words, files that use both fscrypt and fs-verity simultaneously.

So fs-verity is essentially a way to measure or hash a file in constant time, subject to the caveat that the verification against that hash happens on demand as data is read, and applications will receive an I/O error if they try to read from any corrupted part of the file. fs-verity file measurements are available in the kernel, but are also exposed to user space via an ioctl; the user space command "fsverity measure" just calls that ioctl and prints out what the kernel returned.

There are various use cases that can be supported or enabled by the fs-verity mechanism. The simplest is integrity only, to detect accidental corruption; for that, all you have to do is turn fs-verity on. There's also the auditing use case, where you log the file measurement before doing something with the file, like executing it.
And finally, there's the authenticity or appraisal use case, where you have a known good file measurement, either from a digital signature or from somewhere else, and you validate that the actual file measurement matches the expected one. That detects both accidental and malicious changes to file contents.

There's been some confusion about the relationship between fs-verity and IMA, the integrity measurement architecture, which already exists in the kernel. The difference is that fs-verity is a lower-level thing, basically a way to hash a file, whereas IMA is focused on higher-level things, like what the policy is for which files are measured and what should be done with the measurements. So in general, fs-verity is not replacing IMA, and in fact we're planning to have IMA support getting its measurements through fs-verity on files that use it. That should allow IMA users to take advantage of fs-verity, for example for IMA audit or for IMA appraisal. However, we have found that some users find fs-verity useful on its own without IMA, so it will also be supported to use fs-verity on its own and just do things with the file measurements in user space via the ioctl I mentioned. We're also considering supporting a mode where you can configure fs-verity to enforce that all file measurements are signed by a certificate that has been loaded into the kernel. That mode is maybe still up for debate, since it does overlap with IMA, but we do have a user who is asking for it.

So in the IMA use case, this is a representation of what IMA does today. Essentially, when IMA needs to measure a file, for example because of an open or an exec, it just hashes the entire file, which causes a long latency before the operation can proceed. And this is what IMA is planned to look like when it's used on fs-verity files and configured to support them.
It's the same as before, except the actual hashing of the file contents is replaced with just asking the file system for the fs-verity measurement, which allows operations to proceed without waiting for the whole file to be hashed. So if you're interested in fs-verity, you can try out the kernel patches and the user space tool, and help review the patches, which, as I mentioned, I sent out a few days ago; they add support for fs-verity to ext4 and f2fs. We've also been working on tests as part of the xfstests file system testing suite. So this concludes our presentation, and thank you for your attention.

Question: Can you talk about performance impacts?

So the performance for smaller files is going to be a little more impacted, because the amount of space we take for the tree structure is at least one block, which right now is fixed to the page size, which on most architectures is 4K. So you're going to wind up amplifying the I/O as a result. However, you also get some characteristics of storage that help mitigate that, such as readahead. The person best equipped to give fine details about this is sitting right here, Ted Ts'o, and I'm sure he'll give a little more context on that.

So in general, it's going to be no worse than dm-verity for large files. dm-verity also has some significant performance impacts if you're using it on hard drives, simply because you're seeking a lot to read the Merkle tree. In practice, the root of the Merkle tree will probably always be cached, so that's not a big deal, but some of the intermediate nodes and the leaf nodes don't get used as often, and so they very often will have to be pulled in from disk. In practice, this has not really been a problem on flash devices, because on flash devices random reads are much less of an issue.
I think we've been a little bit more worried about the CPU impact of actually doing the hash, especially on some of the lower-end Android devices that simply have a CPU that doesn't have a whole lot of oomph.

That's right. Yeah, so I think Goldmont is the Intel architecture that has the SHA-2 acceleration, and so that helps. Since we are targeting mobile platforms, we do expect there's going to be additional power consumption as a result of the frequent hashing, but we anticipate it's going to be a net benefit: the trusted APKs for privileged applications on the platform are currently measured at the time the APKs are installed and validated, but after a device reboot they're not revalidated, so anything could have happened to storage. We expect that's going to be a good trade-off.

Question: You mentioned earlier that one of the problems with having verification done not on the entire file but on individual blocks is that you can get EIO at basically an arbitrary point in the file and screw it up. Is there any mitigation for that, or did I miss it?

It kind of feels like the nature of the beast. If you're going to delay verification to the point of access, then there are tricks you might be able to play if you want to try to detect that there has been an offline attack against your storage. It may not necessarily be at all helpful against a man in the disk, where you have a compromised controller that can fake the results arbitrarily at the time of access. But when you bring the platform back up, we've imagined scenarios where perhaps you could have something in the background that gradually reads the files and does nothing with the reads except just read the files. Then you're just in a race condition, and you're not necessarily getting security guarantees.
The other thing to note here is that for the initial use case, we're using it for privileged APK files, which basically means the class loader is going to be loading a class at a time. So if there is a failure, it will probably be as you are loading a class, and not at any random point in program execution, which mitigates that somewhat for that particular use case. It's very much user-beware: you need to be aware of the benefits and trade-offs of complete measurement of the file prior to the start of any access versus validation after you've begun access.

Question: With dm-verity, we added forward error correction to prevent random bit flips from preventing your system from running. Are you doing the same thing for fs-verity, or considering something like that?

We've considered it. We haven't implemented it; that's going to be for a future version, I think. But it definitely is something you want to consider doing, so you can mitigate the incidental bit flips as opposed to the malicious bit flips.

Question: How exactly do you protect against a man in the disk? As far as I understood, you're still storing the Merkle tree on the disk, so the man in the disk could modify it while the system boots up, to match the bad changes in the file. Are you using some kind of cryptography, and where do you store the keys for that signature? How does that work?

Yeah, that's a great question. It has to do with the security profile of the entire platform: what your secure boot process is, and how you're utilizing things like secure elements to get key material into the proper locations. You have to consider that in your adversarial model, and you have to make sure you get your keys from the trusted source, the secure element of the platform.
What fs-verity is actually doing is creating something called the fs-verity measurement, which covers not just the contents of the file itself but also the metadata, including the tree. The root of the tree, together with the descriptor, which is the metadata describing the characteristics of the fs-verity file, is all captured in that measurement, and the measurement is the thing to be validated against the key you get from the secure element.

Yeah, the other thing to say here is that the measurement, again for this initial use case, will be digitally signed, and the key to verify that will be in the system image, which is protected by dm-verity, and the key for dm-verity is in the kernel, and the kernel is protected by the trusted bootloader. So for that initial use case, that's the chain.

Yeah, and in the next session we have a panel on hardware root-of-trust type issues.

Question: One question I had was whether you had thought about attestation of that appraisal that you perform.

We haven't thought about attestation. In general, attestation terrifies me, but yes.

So you're talking about linking with the TPM and doing PCR measurements based on...

Yeah, extending the measurement off to be validated.

Right, binding to PCR measurements and so forth. That's not a requirement for the Android platform. It's an interesting use case, and we'll consider it; I think we might be able to get support for that when we add IMA support. Mimi's nodding over there.

Any more questions? OK, thanks.