Hello, my name is John Holly. I work in the open-source programs office at VMware, and today I want to talk about peeling back the layers of storage, with a specific emphasis on how bits get from the physical storage, where they actually exist on disk, all the way up to where you finally access them in a text editor, your video editing software, a database, or whatever you're trying to read those bits from. This is going to have a specific focus on how Linux does it. Most other operating systems are going to be roughly the same, with some caveats in how they handle certain layers, but for the most part this is a fairly broad overview of how storage works. In my talks over the last couple of years I've already explored nftables and networking, and I've had a number of people reach out asking how this stuff actually works, so I want to keep diving in and sharing it with everybody.

So, without further ado: storage. I shouldn't have to put up this definition, but I'm going to, so that everybody's aware of exactly what we're talking about. Storage is literally just "a place for storing things; memory." It's very simple to understand conceptually, but how storage actually works is incredibly complicated. So let's talk about that.

Let's start at the bottom and work our way up, because, as you can see on the right-hand side, that diagram of how Linux handles all the layers involved is very complicated. Kudos to Thomas Krenn for this diagram; I literally could not do it any better. Unfortunately it's a fairly old diagram, generated in 2017.
The way the latest kernels work is a little bit different, not dramatically, but a little. The physical side of this is where there's a lot of black magic going on; this is where we start talking about MMC, SD, SATA, SAS, NVMe, USB, and all those kinds of pieces. We'll then move on to how things transport from the physical realm and cross into the software domain: how a block device actually works inside of Linux, what a block device is, what device mapper, LVM, and these soft-partitioning and aggregation ideas are, VFS views, and we'll wander off into some other topics like swap and some other bits and pieces I want to make sure everybody's aware of.

One of the nice things about how the Linux storage system works, kind of like how the networking subsystem works if you've seen my previous talk, is that certain pieces can be stacked almost arbitrarily, and this provides a way to mix and match how your storage works to fit what you actually want to do. I'll get into this a little more later, but outside of a couple of specific spots you can stack stuff however you want: you could put another block layer on top of a file system, you could put RAID devices on top of encrypted devices, you could put encrypted devices on top of RAID. You can mix and match all of these things. I'll go into more detail on all of this, but keep in mind that the Linux storage system is very versatile and very flexible in how it can be put together and mashed into a working system.

Before I get too far: the one thing that always comes up when you start talking about storage on Linux is ZFS, and ZFS is a really fascinating file system.
It actually goes beyond just being a file system, but it has a couple of inherent issues that people need to be aware of before they even start playing with it, and I'm not going to cover ZFS beyond this slide, because it's fundamentally not in the Linux source tree. The reason is a licensing conflict between the GPL and the CDDL; you can't really cross those barriers. That leaves ZFS completely outside of the Linux kernel, outside of the main development model, and outside of a number of different pieces, which complicates everything involved with it. This isn't to say that it doesn't work; it works, and I have lots of friends and people I know who are using it to great success. But as soon as you start playing with it, you're kind of out on your own. Bugs are going to be more common, just because it doesn't have as many people running it, or running it as often. If you compare the runtime numbers of ZFS versus ext4 or XFS or the other file systems in the Linux kernel itself, it's orders of magnitude of difference in the number of power-on hours the file system has been run, and the more power-on hours you have, generally speaking, the more bugs have been found and squashed.

You're welcome to read up on it; it does a lot of different things beyond just being a file system. It takes care of a lot of the block layer, because it has tiered storage, which is actually really cool. You can get to roughly the same kind of tiered storage using some of the layers in the rest of the Linux kernel, but it's not as tightly integrated. You can do RAID-like things; you can do a whole bunch of different pieces. The other thing to keep in mind with ZFS is that it uses its own entire tool chain, so all the tools you use to interact with ZFS are exclusive to ZFS.
They are not the normal tools you use with the other file systems in Linux, and that's because ZFS exposes a different API up the stack than the file systems in the Linux kernel do. So I at least wanted to touch on this and try to head off anybody asking about ZFS before they ask. If you're interested in ZFS, that's a completely different beast and outside the scope of this talk.

One of the things we need to get out of the way very early on is that the entire storage stack is built on lies and potentially very bad assumptions, and this has been the case for a very long time. Some of it comes from how drives expose themselves to the operating system and what lies the drive theoretically has to tell to work, all the way through how a hardware RAID card exposes and describes things, and why one hardware RAID card explains things differently than another. It gets further into the weeds when you start talking about a real hardware RAID card versus what I'll call "fake RAID" cards: basically software RAID cards that have just enough BIOS support to fake being a RAID card at boot time, and then switch over to being proper software RAID after boot.

This extends to the blocks that are exposed on the device. A lot of drives, even today, despite the fact that 4K blocks have been common for probably 20 years now, will still report 512-byte blocks, and that's for compatibility reasons. 4K blocks provided some substantial performance improvements, as well as the ability to handle bigger file systems and whatnot more efficiently. Why are some drives still reporting 512? It's all legacy support: it's sometimes what operating systems expect, or sometimes what other devices expect; it's just there. What happens in the background is that the firmware on the drive translates: it grabs eight 512-byte writes, which gets you to 4K, mashes them together, and then writes that. Is this efficient? No; it causes a lot more operations to take place compared to just one big 4K block write.

On ATA devices, if anybody is still playing with those: ATA is effectively a subset of the SCSI protocol, and so in Linux they actually did away with the entire separate ATA subsystem and rolled it under the SCSI subsystem. This is why, if you plug in a CompactFlash card or a straight ATA device, it shows up as a SCSI device rather than an ATA device. This simplified a lot of drivers and made things a lot easier, but again, in this case Linux is the piece that's lying to you, as opposed to the hardware.

Hardware RAID cards, SMR drives, flash, all kinds of things will specifically lie to you about when something is on disk, and that's because caching on these devices is a very complicated argument. In a lot of cases, when you're waiting for something to be written to disk, when you've synced it, you want it to actually be on the physical platters (if you're using platters) or physically in the flash. However, because of how things like SMR or even QLC flash work, the write to the medium may take a very long time, and so the disk will lie and claim something is on disk when it's actually sitting in some cache layer between where it is right now and its final resting place. Keep this in mind if you're ever playing with stuff and you start seeing weird data loss.
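As a quick aside, you can see the 512-byte-versus-4K split from userspace. This is a sketch; /dev/sda is a placeholder device, and the lsblk call is guarded so it only runs if that device actually exists:

```shell
DEV=${DEV:-/dev/sda}   # placeholder; point this at a real disk to try it

# Logical vs. physical sector size as the kernel reports them; "512e"
# drives show LOG-SEC=512 but PHY-SEC=4096:
[ -b "$DEV" ] && lsblk -d -o NAME,LOG-SEC,PHY-SEC "$DEV"

# The firmware packs eight 512-byte logical sectors into one 4K
# physical sector, which is why sub-4K writes turn into
# read-modify-write cycles in the drive:
echo $((4096 / 512))   # logical sectors per physical sector
```

The same numbers are also available via `blockdev --getss` and `blockdev --getpbsz` if you prefer a single value per query.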
Sometimes it could just be weird caching issues. Hardware RAID, because it's mashing a bunch of different drives together, sometimes literally just lies about everything. Write queues are also vague suggestions to the hardware; sometimes devices don't even honor the write queues they claim to handle.

And a block is not where you think it is. Most storage devices operate on the idea of a block of data; you can think of it as a 4K block or a 512-byte block. Your hardware will report that a given block lives in some specific place on the disk, but because of how the disk works, whether that's flash or spinning media or anything else, where it says that block is might not be where it actually is. Behind the scenes the drive itself is moving stuff around. Say a bad block comes up on spinning media or in flash: the drive will copy all the data out, with its checksum data, to a new location, and set up an internal reference for where that block now lives. So even though you may believe you're streaming a contiguous stripe across an entire disk, which should be very fast, you may actually be seeking all over the place because there's a bunch of remapped bad blocks somewhere in that stream, and you don't even know about it. Keep that in mind: where you think a block is is not necessarily where it is.

That leads into: failures aren't real until there have been a lot of them. Because of how much extra overhead there is, both on spinning media and in flash these days, drive failures may be occurring while none are being reported. You may be losing blocks as you go, but that isn't necessarily percolating up through SMART, where you could actually see that blocks are being shuffled off to other areas of the disk that aren't damaged.

SMART, if you've ever played with a disk drive, is a kind of self-reporting tool about what the drive controller thinks is going on on the disk. All the data in there is relatively accurate, but it's possible to erase SMART data if you know what you're doing, or in some cases if you don't. And, as with the failure reporting, SMART may be lying to you about what's going on. It's not explicitly required to be fully accurate, and it may simply not expose information you want: say you want the power-on hours for a drive, and it just doesn't record them.

Flash in particular has some very odd write, and specifically erase, properties. An erase may not actually happen on a flash drive until an explicit TRIM command is issued, and if you try to erase things by, say, dd'ing zeros across the entire drive, you can burn through the drive's write endurance much faster than if you'd just issued a TRIM. Something to keep in mind.

And a friend of mine asked me to include this bit about disk self-encryption: a lot of disks over the years have offered self-encryption. Never, ever trust it. If you actually want your disk encrypted, use whatever mechanism your operating system provides, because that has at least been vetted in a number of different ways; the on-disk self-encryption often has not. If you're ever interested, go do some searches on it and you'll see where things stand.

Okay, so that's a lot of preamble to finally get to physical storage.
This is where the bits actually get stored, their final resting place, and where things get pulled back from. This is going to be the most black-box part of the entire system, just because there's no way to know what's actually going on inside the disk. The disk reports certain things that you expect, but as I said before, you don't know exactly where the data is being written on the disk; you can't predict it. Long ago, in the ancient times of disk controllers, your disk driver was literally actuating the head on the drive, moving it around to read data at certain times. Those days are long gone, and we don't want to go back; it's way too complicated and would burn CPU cycles for no reason. This is why dedicated controllers exist. But this is definitely where you find a lot of black box; there are a lot of unknowns about what goes on in here, but it's where your data physically lives. This could be flash, spinning media, tape, or, as pictured, a floppy drive; any of these can be and are physical storage locations. But, as I just said, you can't really directly talk to the physical storage location anymore.

So how does software actually talk to this mythical, magical physical device? That's where the transport layer comes in. This is where you convert the software side of things into commands and information the physical device can understand, and get information passed back and forth. The transport layer is mostly what you write storage drivers for directly. There are a few odd cases where this isn't exactly true; everything in storage is kind of weird.
There's always an exception, and I'm going to try to gloss over those. These interfaces get lumped, particularly in Linux, into broader categories. ATA, SCSI, SAS, and SATA all share certain aspects of the overall SCSI protocol, and thus they all get lumped together as SCSI disks. Is that strictly accurate? No. For the most part nobody is still running an Ultra-320 low-voltage-differential SCSI drive, but there are lots of people running SAS and SATA drives right now, there are a few ATA drives out there, and yes, there are still a few SCSI drives for various things. That's one big layer, and those drives, if you don't do anything weird on Linux, show up as /dev/sd-something: SCSI drive, or SCSI disk.

If you have an NVMe drive, NVMe uses a completely different protocol, because it ditches a lot of the extraneous pieces that a PCI Express bus doesn't need for accessing flash (or, in rare cases, high-speed spinning media). You end up talking a completely different protocol, so those all end up under the NVMe umbrella. There are various other bits and pieces where this kind of grouping happens, but the transport layer is where you translate from the software stack into what the disk can understand, and things shuffle back and forth.

I've glossed over those last two layers to a certain extent, because for the most part you're not going to be mucking with them much; there's not a whole lot of layering or interesting bits going on there. But now we're going to start talking about the block layer, and this is where things start getting very interesting.
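You can see that grouping directly in the device names. A small sketch: the lsblk output depends on your machine (and the call is allowed to fail quietly), and the name-parsing at the end just illustrates the convention:

```shell
# Disks with their transport: SCSI-family devices (sata, sas, usb)
# appear as /dev/sd*, NVMe namespaces as /dev/nvme0n1, and so on.
lsblk -d -o NAME,TRAN,TYPE,SIZE 2>/dev/null || true

# The naming convention itself: /dev/sda3 is partition 3 of disk sda.
dev=sda3
disk=${dev%%[0-9]*}   # strip the trailing partition number
echo "$disk"          # -> sda
```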
So the block layer comes out of the conceptual idea that there is a block of data on the disk, and almost all storage can eventually be reduced to that: there is a block of data. Depending on what kind of device you're using, that block might be 512 bytes, it might be 4K, it might be something else; the size of the block is kind of irrelevant in some cases. You just need to know what the size is, because that determines how big anything can be.

This is also where things like the kernel I/O queue happen. This is how the kernel determines what gets pushed down into the transport layer, and in what order, in an attempt to optimize for whatever you're after, whether that's latency or throughput or something else. But the kernel I/O queue has to make a lot of assumptions about the things the drives are reporting, and if the drives are reporting incorrect data, the I/O queue layer is going to have trouble being correctly optimal, and I use "correctly" in the mathematical sense.

The next step up: once you've got a block system, a set of blocks you want to write data to, you start getting into how you break that up, and there are several different ways of going about this. One thing to be very specific about: partition tables are not required to use a disk. There is nothing inherent about a partition table that's needed for the disk to be usable. You could use a disk directly, no partition table, nothing, and you wouldn't have to worry about how things are broken up, but you'd probably end up using the entire disk for one thing. So why do we have partition tables?
The real answer is that there are a number of different things a system needs, and breaking those up onto separate physical devices is sometimes completely unreasonable. If you look at a laptop or a small-form-factor system, it probably has a single disk in it, if not just straight eMMC. Or your phone: your phone has a single piece of storage in it. For the most part there aren't a lot of different disks you can put different pieces onto. So what a partition table does is break the disk up into logical chunks. It's an indicator to the operating system and to the software: this is /boot, this is my UEFI partition, this is where /home lives, this is where the root file system lives, this is where swap lives. It's just a way of breaking the disk into logical pieces.

But there are some interesting caveats about how to break up a disk optimally, because when you write to a disk you want writes to fall on certain boundaries, particularly on spinning media, so that things don't have to be reshuffled and re-split. With a partition table, if you're not careful, you can define areas where each block the partition table claims as a block is actually offset from where the physical block really is. If you're ever curious why partitions start at sector 2048 with most Linux tools these days, that's why: they're forcing all partitions to start high enough up that they should be aligned to a correct boundary. It's still possible, if you go and manipulate the partition table directly, to offset them again, and what happens on the hardware side, underneath all of this, because the software won't necessarily realize it, is that the drive has to read in the first physical block you partially overlap, overlay the new part, write it down, then read in the next block, overlay the next part, and write that back. So a single write operation actually ends up being two reads and two writes if you do this incorrectly.

For partition tables there are two types: GPT, which you'll find on more modern systems and on very large disks, and the classic MS-DOS (MBR) type. MS-DOS realistically only provides for four primary partitions (one of which can be an extended partition holding logical partitions), which is why it's a little limiting, and it doesn't have much room to define what those partitions are; there's only gross information about them. GPT allows a much more specific definition of what a partition is intended to hold, so things like UEFI boot partitions, MS-DOS-style partitions, and a whole slurry of other partition types can be defined using the extended GUID syntax it uses. But that's all hinting information about what's expected to be on there.
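Coming back to the alignment point for a moment: the misalignment penalty is easy to reason about with a little arithmetic, and parted can check your partitions for you. A sketch; /dev/sda is a placeholder and the parted call is guarded so it only runs against a real device:

```shell
DEV=${DEV:-/dev/sda}   # placeholder device

# Ask parted whether partition 1 starts on an optimal boundary:
[ -b "$DEV" ] && parted -s "$DEV" align-check optimal 1

# Why sector 2048? 2048 * 512 bytes = 1 MiB, a clean multiple of every
# common physical block size, so the partition start is aligned no
# matter what the drive really uses underneath:
echo $((2048 * 512))          # bytes into the disk (1 MiB)
echo $((2048 * 512 % 4096))   # remainder 0 means 4K-aligned
```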
That's it. There is one caveat I want to point out about partition tables: if you are on a GPT-defined, pure UEFI system and you're using software RAID, UEFI can't correctly (again, mathematically correctly) handle a software RAID boot device. What happens is that the system comes up, attempts to boot off the UEFI device, and if it writes anything back, it only writes it to the one disk it actually read, not to both, because it doesn't understand the RAID information. So while under Linux you can set up a GPT UEFI RAID-1-like pair, it will almost always be out of sync with its partner. Something to be aware of: software RAID early in boot is sometimes very weird and wonky, and this is partly why fake-RAID cards exist; they fake enough information up to the boot process to get you around these issues with software RAID devices early in boot.

Now, this is where things get exceptionally interesting, because this is where a lot of the layering really comes into play, with things like device mapper, LVM, and encryption. Once you have a block device that you can just start dumping stuff to, once you have it partitioned or broken up however you want, you can start overlaying pieces onto the disk to build up some very interesting topologies. If you've ever encrypted a disk on Linux and taken a look at what lsblk shows, you'll see many of these layers all grouped together in a long tree structure; I've got an example of this towards the end of the presentation, and a visual representation here of sort of what's going on. Device mapper specifically brings a lot of low-level block-device ideas that you can layer.
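For a taste of what that layering looks like in practice, here is one possible stack (RAID, then encryption, then LVM on top) built with the standard tools. Everything below is illustrative only: the device names are placeholders, every command needs root, and several of them destroy data, so they're shown commented out rather than as a runnable script:

```shell
# 1. Link two disks into an MD RAID1 array (destroys data on both):
#    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
#
# 2. Encrypt the whole array with LUKS/dm-crypt; the RAID metadata
#    below it stays unencrypted:
#    cryptsetup luksFormat /dev/md0
#    cryptsetup open /dev/md0 secure
#
# 3. Put LVM on top of the encrypted device so the space can be
#    re-carved later without touching the layers underneath:
#    pvcreate /dev/mapper/secure
#    vgcreate vg0 /dev/mapper/secure
#    lvcreate -L 20G -n home vg0
#
# 4. Each layer exposes just another block device; the resulting tree
#    is visible with:
#    lsblk
```

Each step only cares that there's a block device underneath it, which is exactly why the ordering can be shuffled.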
This is where MD RAID and DM RAID come in. This is where things like dm-cache, the main Linux block-caching layer, come in; dm-crypt, where some of the encryption stuff comes in, along with LUKS; and DRBD, the distributed replicated block device. Each of these cares only that there's a block device underneath it, and what it exposes is a block device above it. Which means that, like building blocks, you can literally plug these together in different chains and get different topologies.

In the example I've got here, you've got a bunch of disks (the number doesn't really matter) linked up into a RAID device, md0. That is then encrypted, so the entire RAID device is encrypted, but the metadata about the RAID is not. On top of that there's a caching layer, dm-cache, and the caching layer is not encrypted in this specific scenario, because the caching happens outside the encrypted part of the stack. And then there's a DRBD block that runs over the top of all of it. So there are four different block layers involved here before we've even gotten to something like a file system or anything above it. And you could literally just start replacing stuff: you could swap DRBD and the RAID layer in this diagram and it would still work, or you could swap where dm-cache and the encryption happen so that the cache is encrypted along with the underlying disk, and all kinds of other combinations. This is where stuff gets very interesting, particularly from the device mapper perspective.

LVM is a little bit different: not only can it act sort of like device mapper, it also provides a more scalable way of effectively partitioning the disk, so you can shift blocks around inside the disk. It's very, very powerful. It also has a ridiculously complicated set of tools, at least for someone who's not used to it, because there are physical volumes, volume groups, logical volumes, and a number of different pieces. Not that I was intending to mention ZFS again, but LVM is probably the closest thing in mainline Linux to how ZFS moves things around and puts things together.

Once you're past all of the device mapper layers, you end up with yet another block device, but one whose I/O passes through several different layers before it gets back to the transport layer and then the physical layer. Now, once you've got a block device, for the most part people want to put something over the top of it, and usually that's either a virtual file system or some sort of object store. An object store is just a file system that exposes blocks in a slightly different manner; more or less, an object store is logically a file system without POSIX attributes, or without the file-system hierarchy attributes. That's it.
That's all an object store is at this level. The virtual file system, really, all it does is translate the blocks again, because 4K or 512-byte blocks are not exactly human-friendly to read; humans don't like reading binary, or lots of hex for that matter. What it does is translate that into something legible, and the file system also creates additional boundaries and additional protections for that data: forward and reverse references, file names, where in the file-system hierarchy everything exists, those kinds of things.

Not all file systems are explicitly bound to POSIX attributes. Things like VFAT and exFAT don't expose all the portions of POSIX, but they have a lot of the same basic ideas; they're not object stores, but they're also not POSIX-compatible. POSIX, here, is a set of attributes that defines how a file system is expected to work, what it's expected to store from a metadata perspective, and so on. And not all file systems do exactly the same thing. The file system is explicitly responsible for showing you files, but also for deciding what those files logically look like on disk, so it does a set of translations here. Sometimes you get copy-on-write, journaling, reflinks; some file systems do tiering at this layer instead of at the block layer; there's data integrity, sparse-file processing, and a whole number of other things happening at this layer. So ext3 and ext4 have journaling, XFS does journaling as well, ext2 does not, and Btrfs uses a copy-on-write system, which is a completely different idea of how bits end up on disk compared to journaling. VFAT and exFAT tend to be very simplistic, NTFS has its own idea of everything, and ZFS, again, also has a completely different idea of how everything should work under the hood. But in the end, at least from the file-system perspective, what you expect to see is some sort of hierarchical file-system structure that you can see files in. That's all that's going on there.

Now, what's interesting is that just because we've gotten to the file-system layer doesn't mean things above it can't regress back down. So let's move on a little. Everything we've talked about up to this point has lived almost exclusively in kernel space; nothing at this point, other than maybe what I'm about to talk about, should have left kernel space. But file systems in user space can be very, very powerful. There are a lot of really neat things you can do once you get out of the kernel and can start taking advantage of different languages and different ideas to process data that looks like a file system.

Sometimes this involves licensing complexities: there is, or at least there used to be, a FUSE ZFS file system, and there is an NTFS FUSE file system, so sometimes FUSE is used to avoid licensing issues. Sometimes the complexity of what you're trying to do just extends beyond what makes sense to do in the kernel; things like deduplication can be much easier outside kernel space than inside it. And there are all kinds of oddities where this makes sense, up to and including things like mp3fs and mtpfs: these wrap data or protocols that don't actually expose a file system, but presenting them as a file system means you can run tools like rsync against the remote data store. curlftpfs is the same kind of idea.
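As a concrete sketch of that idea: mounting a remote FTP tree with curlftpfs and then pointing ordinary tools at it. The host and mount point are hypothetical, it needs the FUSE packages installed, and the commands are shown commented out since they depend on a live server:

```shell
# Mount a remote FTP tree as if it were a local directory:
#    mkdir -p /mnt/ftp
#    curlftpfs ftp://ftp.example.com /mnt/ftp
#
# Ordinary tools now work against the remote store:
#    rsync -av /mnt/ftp/ /srv/mirror/
#
# FUSE mounts are detached as a plain user with fusermount:
#    fusermount -u /mnt/ftp
```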
Let's take a website and turn it into a file system. Or, in the case of something like UnionFS, you want to take multiple different places in a file system and merge them in a layering scheme; say you've got data on two different disks but you want it woven together into a single store. You use things like UnionFS.

The big downside with FUSE is that it tends to be slower, because you kick all the way out of kernel space to do some operation and then back into kernel space to expose the result up through the kernel's file-system API. Basically, you end up doing a lot of context switches. Once in a while, in certain cases, particularly with networking file systems and local caching, you might end up faster; most of the time you don't. So there's a big caveat: FUSE file systems can be very powerful and very interesting, but they live outside the kernel, and there's almost guaranteed to be a performance hit with them.

A question that always comes up when you start talking about swap or memory on a system, and I do want to talk about swap, is: why is there no free memory on my Linux machine? And that's because the Linux kernel does a phenomenal job of disk caching. More or less, Linux looks at all the RAM that isn't being used on the system, goes "great, this is all available for disk cache," and starts caching disk contents into all that extra space. What this ends up looking like is this second black picture here, where it doesn't look like there's any free memory. However, that's not actually true.
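You can see this for yourself on any Linux box. The "available" column is the number that actually matters, since page cache is reclaimed on demand; the exact figures will obviously vary per machine:

```shell
# "free" separates cache from truly used memory; the "available" column
# estimates what processes could still allocate:
free -h

# The raw counters behind it live in /proc/meminfo:
grep -E '^(MemTotal|MemFree|MemAvailable|Cached):' /proc/meminfo
```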
Linux will eject things out of the disk cache essentially instantly if a process actually needs the RAM. It's a question that comes up a lot because people don't understand what's going on from the disk-caching perspective, but that's what's happening. And swap: there's a growing sentiment that says, "Oh, I don't need swap, swap is an archaic thing," and that's all wrong. Swap is incredibly useful, because sometimes there's stuff that ends up in RAM, either from the operating system or from running processes, that literally just never gets accessed again: it gets shoved up there, it's almost never touched, and it shouldn't be eating up your expensive, very fast memory. What swap can do, in a couple of different forms, specifically things like compressed RAM (zram) and the like, is take that data out of the fast tier of memory and push it into potentially progressively slower storage locations, and this can free up huge amounts of RAM for things that actually want to use it. The downside of swap, as anybody who's ever played with it knows, is that if your working set exceeds the actual amount of RAM, you're going to disk constantly to get stuff in and out, your disk I/O goes really badly, and the whole system can grind almost to a halt, depending on how you have things set up. This is where swap has a tendency to get a bad name: people end up with too little RAM and too much swap, and the system just swaps constantly because there's not enough working RAM to actually fit everything. Some of this can be offset with, again, compressed RAM. This isn't a snake-oil concept; if you've been around since the 386 days, there were products back then that would magically "compress your RAM" and double it, or whatever. Those were all snake oil.
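You can get a feel for why compressed swap helps with any compressor. gzip here is just a stand-in for the lz4/zstd-style compressors zram actually uses, and zeroed data is the best possible case, so real memory pages won't shrink anywhere near this much:

```shell
# 1 MiB of zeros: the best-case input for any compressor
head -c 1048576 /dev/zero > /tmp/zeros.bin
gzip -c /tmp/zeros.bin > /tmp/zeros.bin.gz

# Compare sizes: the compressed copy is a tiny fraction of the original
wc -c /tmp/zeros.bin /tmp/zeros.bin.gz
```

The same trade-off applies as in zram: you pay CPU to compress and decompress, but the compressed copy takes far less space than the original.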
The compressed-RAM implementations we have these days are much better. They don't claim they're going to double your RAM (it won't), but what they do is give you a compressed space inside of RAM that you can swap to. It's still slower, because you have to compress and decompress things to get data in and out, but it is definitely faster than a physical disk, and some things compress really nicely, so it can be a huge performance win. If you're setting up a system, that's something that's really useful to take a look at; it can really change how your swap I/O looks. Now that we're up several layers and into actual file systems, let's talk about bind mounts, symlinks, and reflinks. Reflinks actually exist further down, near the block-allocation layer, but what all of these technologies are sort of doing, whether they're hard links or soft links, is trying to deduplicate data for one reason or another. With bind mounts, you're taking one file system location and mounting it into another, so there's basically a link: if you cd into this directory, it takes you over there, and as you pop back out, it pops back correctly. You can access the same location either by going through one path or through the other.
That's effectively what a bind mount does, and it's doing it through the mounting system in the Linux kernel, as opposed to some file-system-specific mechanism. Symlinks and hard links, specifically, need file system support to actually work. Hard links are bound to the same file system, sort of like reflinks are bound to the same file system, because of how they reference things. Soft links are not; they're literally just a pointer. If you're coming from another operating system, think shortcuts or similar ideas: what exists over here is nothing but a piece of metadata that just happens to point off to somewhere else. Soft links are a little special, particularly when you go to read them: depending on how the open function works in whatever programming language you're using, it may read straight through the link, or it may need to know to look at the link itself and parse it. Hard links don't need any of this; they just need file system support to be created. And with reflinks, you're telling the layer underneath, "these things are the same, please reference them the same." The main difference between hard links and reflinks is that with a hard link, if you edit the file through either name, both names reflect the edit; with a reflink, most file systems will break the reflink when one side writes to it, because at that point they're no longer identical references. And tmpfs: we're coming up on the end of this, but I did want to touch on tmpfs, because we've already talked a little bit about RAM.
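Before moving on to tmpfs, here is the hard-link versus symlink behavior in a quick throwaway-directory session. The reflink line is commented out because it only works on file systems with reflink support, like Btrfs or XFS:

```shell
cd "$(mktemp -d)"
echo "original contents" > file.txt

ln file.txt hard.txt      # hard link: a second name for the same inode
ln -s file.txt soft.txt   # symlink: just a pointer containing the path "file.txt"
ls -li                    # note file.txt and hard.txt share an inode number

echo "edited" > hard.txt  # write through the hard link...
cat file.txt              # ...and the original name sees the edit too

# Reflink: a copy-on-write clone; needs Btrfs/XFS, so it's left commented here
# cp --reflink=always file.txt clone.txt
```

Deleting `file.txt` would leave `hard.txt` intact (the inode survives until its last name is gone) but would leave `soft.txt` dangling.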
tmpfs is a file system that exists in memory. Basically, it slaps some POSIX attributes over the top of RAM, where you've already got the disk-caching layer anyway, and exposes it as a file system. The really nice thing about RAM is that it's super fast; the really bad downside is that it's volatile. If you turn your machine off, or something goes wrong, or the kernel crashes, everything in this space is lost. But that does make it really nice for things like systemd, or anything else that just needs a place to dump some data that you want to be really fast. There are databases that do similar things, where they'll store everything in RAM and dump it back out to disk on occasion; sometimes they never dump it out to disk at all and just exist completely in RAM, memcached for instance. If you stack everything up, this can be a very interesting place to do some very fast I/O. If you start playing with this for anything normal, please remember that if you put an 8-gigabyte file in it, it's going to use 8 gigabytes of memory. Sometimes swap will be very useful here, because a tmpfs can swap out, but if you don't have enough memory and you don't have enough swap, things are going to go very poorly. So, putting this all together and looking at what a finished setup looks like: this is literally taken off of a server I have here that had the most interesting topology I could show everybody.
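Before digging into that topology, note that mounting a tmpfs yourself is a one-liner. The size cap and mount point here are hypothetical, and the mount itself needs root:

```shell
# A 512 MiB RAM-backed filesystem (needs root; the mount point is arbitrary)
mkdir -p /mnt/scratch
mount -t tmpfs -o size=512M,mode=1777 tmpfs /mnt/scratch

# The equivalent /etc/fstab entry:
#   tmpfs  /mnt/scratch  tmpfs  size=512M,mode=1777  0  0
```

The `size=` cap matters for exactly the reason above: without it, a runaway writer can eat a large fraction of your RAM.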
There are 13 disks involved in this system: three LUKS devices, four MD RAID arrays, a zram swap, and a slew of tmpfs entries that aren't actually listed in the lsblk output because they're not block devices. You can kind of see how the RAID device builds up out of, or comprises, several other pieces: you've got LUKS devices that overlay on top of RAID devices, which overlay on top of partition devices, which overlay on top of disk devices, and these just build up. In fact, you can see in the 32.7-terabyte disk here that home, groups, backups, and main storage all sit on that single device with multiple mount locations, and that's because there are three different bind mounts: backups, group, and home are all bind mounts out from main storage. You can see the same thing with /var/lib/libvirt and group2 in the 10.9-terabyte example. I wanted to show everybody this specifically so you can see what a running system looks like, and what lsblk output looks like, so you can go take a look at what things look like on your own system. The -s switch to lsblk reverses which direction you're looking at: usually it goes from the base layer all the way up, and with -s you're looking at it in reverse, from the top device back down to the disks. So, where do you go from here? I'm kind of running out of time. Honestly, your best bet for doing anything with storage is to test it out, because you really need to understand what's going on to feel confident in how you've got it set up and how it's working. I can't stress that enough. I could put up examples of how to do fstab and how to link all of this stuff together, but there's a lot of good information out there.
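The two lsblk views just mentioned are worth trying side by side on your own machine:

```shell
# Default view: physical disks at the top, partitions/RAID/LUKS nested beneath
lsblk -o NAME,TYPE,SIZE,FSTYPE,MOUNTPOINT

# Inverted view (-s): start from each top-level device and walk down to the disks
lsblk -s -o NAME,TYPE,SIZE
```

The output depends entirely on your hardware and layering, which is exactly the point of running it yourself.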
There's so much good information out there about how to do each of these things; you should go look for it for the specific thing you want to do. Understanding that you can piece these things together in different orders is probably the biggest takeaway I can give you today: you don't have to have MD RAID underneath LUKS, you could have LUKS underneath MD RAID, and all these other pieces exist for you to play with. That being said, any time you play with storage, there's one important rule: you want all the bits that go down to the disk in one order to come back in the same order you put them down in. Don't do any of this experimentation on data you care about, and if you do experiment on data you care about: backups. Backups are your friend; backups are always your friend. RAID is not a backup. Don't back up to the same machine if you don't have to, or at least don't let the same machine hold your only backup. If you need a backup recommendation, play with borg and borgmatic; it's what I've been using, and it's quite good. There's lots of other stuff out there that's similar to borg, but you definitely want to do backups, and remember: any backup that you've never tested, or haven't tested recently, is not actually a backup. The last thing I can give everybody today is some homework. Let's be honest: when was the last time you did an off-site backup of all the data you care about?
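And if you do try borg, a minimal workflow looks something like this; the repository path and source directory are hypothetical, and this is a sketch to adapt, not a complete backup policy:

```shell
# Create an encrypted repository once (you'll be prompted for a passphrase)
borg init --encryption=repokey /backup/drive/myrepo

# Take an archive; {now} expands to a timestamp so each run gets a unique name
borg create --stats /backup/drive/myrepo::home-{now} ~/important

# List archives, then actually exercise a restore: an untested backup isn't a backup
borg list /backup/drive/myrepo
borg extract --dry-run /backup/drive/myrepo::ARCHIVE_NAME  # pick a name from 'borg list'
```

These commands need borg installed and a real destination, so they're illustrative rather than runnable as-is.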
Most everybody watching this is probably going to say they haven't done one recently. So here's your homework: go do it now if you can, or remember that you need to do it soon. Go make an off-site, off-disk backup, one that's not near anything you care about, of the data you actually care about. And with that, thank you. I hope you've learned something about how storage works in Linux. I know this has been a fast-and-furious dive through all of it, but I'm trying to get some of the basic concepts across, and I don't have an infinite amount of time to get into the deep stuff. If you're ever curious about storage or anything else I'm doing, there's my contact information; I'm happy to talk about almost anything, really. Please reach out and ask me questions; I try to be as approachable as I possibly can. So with that, thank you. I hope the rest of the talks you're going to watch or listen to go well, and I hope that if you're there in person, the hallway conversations are amazing. I will admit I really miss them, but hopefully soon we'll all be back. Thank you very much.