All right. Today is Friday. It's going to be like 20 degrees tomorrow, right? Celsius? Yeah, I wish.
OK, so today we're going to keep talking about files. I'll do a little design exercise where we talk about file metadata and where it goes. We'll talk a little bit about what we mean by a file and the semantics that are associated with files, and start talking about hierarchical file systems and hierarchical file system names. These are concepts that are probably familiar to you, but you may not have really thought through some of the design implications. Compared with processor scheduling or virtual memory, you guys probably have more direct experience manipulating file systems. Although I am still shocked that many of you seem to think that you can only open a file if you're in the same directory as the file, which is not true. Anyway, regardless of how painful your interaction with the shell is, you do have some first-hand knowledge of the file system. I just want to build on that, and make sure that you understand what's going on behind the scenes.
OK, do assignment three. That's all I have to say. You have a week to complete the first part. A number of groups have already submitted the first part of assignment three. Currently, the grading distribution is that 100% of people have gotten 100%. So maybe the first part is easy. I hope so. But please get that done. If you are continuing to work on assignment two, I would really suggest prioritizing this part of assignment three for the next week. Get it done. You don't know how long it's going to take. Finish it, get your 100 points, and then you can keep plugging away on that assignment. Any questions about logistics? We will have the midterms graded by next Friday, right, Ali? Hopefully. OK. Last time you said yes. So I don't know. Connect the dots there.
We'll see where we are on Monday. But yes, our goal is to have the midterms returned by next Friday. And I really would like that to happen.
OK. Any questions about disks? Last time we talked about hardware: spinning disks, flash. Any questions before we go on? Yeah. Magnets, yes. What is it actually doing on the platter? Oh, Lord. OK. So I'll put a big proviso around this answer, which is that I'm not sure. Look, I'm a computer scientist, OK? I'm not an electrical engineer, happily. What I'm assuming is happening is that it is altering the magnetic properties of the medium at that particular point in space. So I'm magnetizing the medium at a particular spot, and it will remain that way over time. Does that sound right? Does anyone know? Come on, there are some electrical engineers in here. Andrew, did that sound right? He's like, I don't know anything about magnetics. Well, talk to one of the materials scientists. Anyway, the answer is I don't know. That is my best answer. At least he didn't ask about flash, because with flash, I have no idea.
Yeah. Ah, yes. That's interesting. We'll come back to this when we talk about early file system designs, which did a lot of work on geometry. But it is a good observation that if the disk is spinning at a constant rate, the track at the edge of the disk is much longer than the track toward the middle, and so that part is passing under the heads more quickly. That is true. And that has implications for the amount of data that you can read or write in those areas. That's a great point. There are file system designs that have considered that aspect of disks. Did you have a question, Margaret? OK. So just to review, I think you guys get this: platter, spindle, head, track, sector, cylinder. Any questions here? I'll just do this, see if that stimulates learning somehow.
Any questions about disk parts? Yeah. This is like a Google interview question. What's that? Seek times. Yeah, that's a great question. I don't know why you don't have disks that are these tall, long things, because you're right, the seek times would be shorter. Here's my assumption. Let's say that rather than having a disk that's big and flat, I'm going to make the Tower of Pisa disk: short diameter, tall. How would you characterize this disk, compared with its flatter, squatter sibling? Yeah? Well, I can create one that holds as much data. I just have to make it tall enough. If it doesn't hold enough data, keep going. Just a big, long skyscraper of a disk. That disk could be very fast, but it's going to be very what? That's right, it won't fit in my pocket. That is a consideration, obviously. Your iPhone would have a big pillar coming out of it. That would make it awkward. But there's something else here. For every platter, what do I need? Yeah? Well, I still just have one arm. Now, the arm might start to get weird, because it might be hard to move it in sync, or it might wave in the wind. So that would be an issue. But what do I need in between every platter? What's the most expensive part of the disk? Yeah, the heads. So the cost of that disk, I think, would be pretty enormous. If I suddenly have 30 platters, and they're all very small, you're right, I can seek super fast. Now, the geometry is kind of weird. Well, here, I have a better idea. Let's take that disk. This is fun. Let's chop it into four pieces and put them next to each other. So essentially, I have four little disks. You could probably bin pack them into the same area. I think the problem is the cost. The more heads I have, the more expensive things are. But you're right. Performance-wise, to reduce seeks, that would work.
What's that? It's kind of a little hard drive. Yeah, yeah. Like data is possible. Well, that's kind of what RAID is, right? So hold that thought. Once I have the four little drives next to each other, which I actually think is really awesome, and I wonder if anyone ever did that, I can treat all the heads independently. That's pretty cool. Again, hold that thought until we come back and talk about RAID, because to some degree, what RAID is, is that disk. If you zoomed out a little bit, everything got better.
All right, good questions. So we talked about the fact that, compared with the other parts of the system, disks are slow. They move, and that introduces failures. These are things that file systems have to cope with: the slowness of moving things around on disk and finding data, and the failures caused by the fact that disks are mechanical, or the old disks were mechanical, and so will wear out. Flash still has this property, it turns out, which is sort of sad. And disks are exposed to the operating system as devices, so there's a lot more management of them that's done in software. We talked about the different components of finding things on the disk, the steps that it takes to actually get data back and forth to the drive. And the fact that disks have been getting bigger and bigger, although this has probably tailed off a little bit, I'm assuming. And yet they have not been getting much faster. This I/O gap represents the difference in the performance improvements that we've seen over the past few decades between CPUs and memory, which have just gotten faster and faster, and disks, which have gotten bigger and have gotten faster, but at a much slower rate. So the gap between the performance of the CPU and the performance of the storage subsystem has been growing over time. And this has caused a lot of interesting redesigns in operating systems.
All right, any questions on this before we go on? We have a couple of things to finish up. We're not going to talk too much about the low-level disk interface, but I want to give you a brief look at what it is. You can think of the disk as providing an interface to the operating system that allows you to read and write 512-byte chunks of data, or maybe it's bigger now, like 4096 bytes on newer drives. There's usually no way to write a single byte to the disk. If I want to change one byte on the disk, what do I need to do? Let's say I want to change the third byte in a 512-byte sector. What do I have to do? Yeah? Exactly. I have to read the sector in, make the change in memory, and then write out the whole contents. On flash, this turns out to be even worse. We'll come back to that in a second.
So this is kind of cool. We've talked about the operating system as a provider of illusions. The illusions about the file system that you are familiar with are entirely in software. The underlying disk has no concept of most of the abstractions that you are completely familiar with when you interact with the file system. There's no notion of names, no notion of directories, of permissions. All that stuff is a figment of your imagination that's implemented in software. What the disk knows is that it has an array of 512-byte chunks. It says, which chunk would you like to modify? That's it. All it understands is numbers. Everything else on top of that is done in the file system. And this is what's kind of cool about file systems. The file system design challenge is: given this low-level interface that essentially allows me to read and write sectors of data identified by an index, build a hierarchical file system with all the features that you're familiar with. And oh, by the way, the disk can fail. Parts of it can fail at any time. The whole thing can fail.
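The read-modify-write step described above, changing one byte when the disk only speaks whole sectors, can be sketched in a few lines. This is a toy model under stated assumptions, not a real driver: `ToyDisk`, `read_sector`, `write_sector`, and `change_byte` are hypothetical names invented for illustration, and the 512-byte sector size is simply the figure used in the lecture.

```python
SECTOR_SIZE = 512  # bytes per sector, the figure used in the lecture


class ToyDisk:
    """Hypothetical disk: nothing but an array of fixed-size sectors."""

    def __init__(self, num_sectors):
        self.sectors = [bytes(SECTOR_SIZE) for _ in range(num_sectors)]

    def read_sector(self, index):
        return self.sectors[index]

    def write_sector(self, index, data):
        # The interface has no notion of a partial write.
        if len(data) != SECTOR_SIZE:
            raise ValueError("the disk only accepts whole sectors")
        self.sectors[index] = bytes(data)


def change_byte(disk, sector_index, offset, value):
    """Change one byte by reading, patching in memory, and writing back."""
    buf = bytearray(disk.read_sector(sector_index))  # read the whole sector in
    buf[offset] = value                              # make the change in memory
    disk.write_sector(sector_index, buf)             # write the whole sector out


disk = ToyDisk(num_sectors=8)
change_byte(disk, sector_index=3, offset=2, value=0xFF)  # the third byte
print(disk.read_sector(3)[:4])
```

Notice that the toy disk knows only sector numbers. Names, directories, and permissions would all have to be built on top of `read_sector` and `write_sector`.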
It's possible that I'm in the middle of doing something to the disk and the power turns off, or the computer crashes. So there are all these really interesting design goals that come out of the features of disks. And all of this is done in software. File systems were a huge part of computer systems research and computer systems design for years for that reason. It is entirely a systems challenge to build a reliable storage system with a bunch of cool features on top of this very low-level and potentially very unreliable substrate. And, as we talked about last time, one of the ways you know that this has happened is that there are so many different file systems. How many people have tried any of these? Is anyone still using ext2? Don't do that. It's too old. ext3, anyone? Also too old. ext4? OK. Actually, you're all using ext4, because the VM you use for this class uses ext4 internally. Anyone tried ReiserFS? ReiserFS was the hip file system about 10 years ago. And then the guy that created it, I think, did bad things and went to jail. And it turned out to have bugs and things like that. NTFS? Is that still a thing? Is that still the Windows file system? Oh, OK. And this list goes on and on. This is just a subset of what's out there, and people are still designing new file systems. There's a lot of work on novel file system designs for flash that are trying to take advantage of some of its unique features. I think this is why systems people, particularly systems people that lived through the 80s and 90s, have a soft spot for file system design. It's an interesting, fun problem. It's all solved in software, and there are a bunch of different ways to do things.
And there are a bunch of cool lessons here. All right, so let's compare and contrast with flash a little bit while we're talking about the low-level disk interface. Flash: no moving parts. Fantastic. With a spinning disk, the speed at which I can access part of the disk depends on the disk's current state. If you have thought about stateless interfaces, a spinning disk is not a stateless interface. The amount of time it takes to perform some operation depends on where the heads are right now. With flash, not so much. It's all electronic. Speeds really depend on how fast I can get things over the bus and very little else. However, there are two big complications with flash. The first applies to certain flash technologies. Remember how, to modify a byte on the spinning disk, I had to read the sector in, change the byte, and write it back? On flash, this is similar, except the units are really, really big. And there's a mismatch between the amount I can read and the amount I have to write. I might be able to read in chunks of 512 bytes, but I have to write 32 kilobytes. So if I want to change one byte in that 32 kilobytes on flash, I have to read the whole thing in, erase all of it at once, change the byte, and write out the whole thing. That's terrible. This is sort of ugly. The second complication is wear, which is the reason for what's called wear-leveling. Again, I have no idea why this is the case, but as you erase and write parts of flash over and over again, they only have a certain number of what are called erase cycles that they can go through before they start to fail. And what's interesting about this is that if you're erasing and writing the same part of flash over and over and over again, that part of your flash drive is going to fail way before the rest of it.
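Both flash complications described above can be seen in a small toy model. Everything here is an assumption made for illustration: `ToyFlash` and its methods are hypothetical, and the 512-byte read size and 32-kilobyte erase block are just the example figures from the lecture, not the parameters of any particular chip.

```python
PAGE_SIZE = 512                # assumed read granularity (lecture's example)
ERASE_BLOCK_SIZE = 32 * 1024   # assumed erase/write granularity (lecture's example)


class ToyFlash:
    """Hypothetical flash chip: small reads, whole-block erases and writes."""

    def __init__(self, num_blocks):
        self.blocks = [bytearray(b"\xff" * ERASE_BLOCK_SIZE) for _ in range(num_blocks)]
        self.erase_counts = [0] * num_blocks  # erase cycles used per block

    def read_page(self, block, page):
        start = page * PAGE_SIZE
        return bytes(self.blocks[block][start:start + PAGE_SIZE])

    def erase_block(self, block):
        self.blocks[block] = bytearray(b"\xff" * ERASE_BLOCK_SIZE)
        self.erase_counts[block] += 1

    def program_block(self, block, data):
        if len(data) != ERASE_BLOCK_SIZE:
            raise ValueError("flash only accepts whole erase blocks")
        self.blocks[block] = bytearray(data)


def change_byte(flash, block, offset, value):
    """Read the whole block in, erase it, change one byte, write it all back."""
    pages = ERASE_BLOCK_SIZE // PAGE_SIZE
    buf = bytearray(b"".join(flash.read_page(block, p) for p in range(pages)))
    flash.erase_block(block)
    buf[offset] = value
    flash.program_block(block, buf)


flash = ToyFlash(num_blocks=4)
for i in range(1000):
    change_byte(flash, block=0, offset=2, value=i % 256)
print(flash.erase_counts)  # block 0 has absorbed every erase cycle
```

A thousand one-byte updates cost a thousand full-block erases, all on the same block, while the other three blocks have not been touched. That imbalance is the problem wear-leveling exists to address.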
A drive where one part fails way before the rest is not ideal. How would you guys feel about a hard drive that had slowly declining capacity? You wake up one day and it's like, well, I thought I bought a 32 gigabyte hard drive, but now it says there's only 21 gigabytes left. Next week it's like, oh, you only have 18. Time to start offloading some of those files. Yeah, this isn't usually considered to be a feature. So flash drives have internal circuitry that tries to level out accesses to the underlying flash, and you can imagine how hard that is. Yeah, Steve, do you have a question? I have a file that I know that I still need to put in there. That's cool, I like that, yeah. Well, let's come back to this when we start talking about file systems. You can think of a file system as a really complicated data structure that I'm building on top of the underlying disk. So, for example, deleting a file might require changing a part of the flash that can't be changed anymore, and so the system may be trying to delete the file and it's just impossible. And this is another one of the challenges. You may think, well, it's not a big deal, I just have a bunch of data there, and if I do some redundancy where I have different copies of it, I should be able to recover from failures. But disk failures can affect data, and they can also affect file system metadata, the file system's internal data structures. When that happens, it's usually much more problematic. Question? What's that? TRIM. Yeah, I don't know what TRIM is. I know that TRIM exists.
Yeah, so I think TRIM falls into the category, and I'm happy to do more research about this, of tricks that flash file systems play to even out wear and load on the underlying flash layer. When you're using flash, it's giving you this illusion that there's a contiguous array of sectors, similar to what a spinning disk would provide. But where those sectors are in flash is kind of a mystery, because there's something called the flash translation layer that's playing games with where they are, to try to make sure that all of the flash is wearing out at the same rate. But you can imagine how hard that is. Do you guys use all of the files on your computer with the same frequency? I know I do, right? No. There's some file that's the one movie you watch over and over again. And then there's other stuff on there where it's like, wow, I still have that file around, I totally forgot about it. So trying to even out wear is really hard. All right, anyway, so flash is not necessarily a complete panacea.
So now let's start talking about files. We're going to talk about files and file systems and file system data structures for the next couple of weeks. The primary abstraction that file systems exist to provide is this idea of a file. Like I said, the disk understands blocks, that's it. Everything else we're going to talk about, files, permissions, names, is all stuff that is stored on the disk, but the disk doesn't understand it. When you talk to a disk, you just say, I want sector 10,042. It says, okay, here's 512 bytes of information. That's it, and there's nothing about any of this other stuff. Okay, so we'll talk a little bit about what files are and the minimum that we expect from a file.
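Before moving on to files, here is a rough sketch of the flash translation layer idea mentioned above: the host sees stable logical block numbers, while the layer quietly redirects each write to whichever free physical block has been erased the least. This is a deliberately naive toy, not how any real drive works. `ToyFTL`, its fields, and the least-worn-first policy are all assumptions made for illustration.

```python
class ToyFTL:
    """Hypothetical translation layer: logical block numbers in, physical out."""

    def __init__(self, num_physical):
        self.erase_counts = [0] * num_physical
        self.data = [None] * num_physical
        self.mapping = {}                      # logical block -> physical block
        self.free = set(range(num_physical))   # physical blocks holding no live data

    def write(self, logical, payload):
        # Pick the least-worn free physical block (ties broken by block number).
        target = min(self.free, key=lambda p: (self.erase_counts[p], p))
        self.free.remove(target)
        self.erase_counts[target] += 1         # count one erase cycle per write
        self.data[target] = payload
        old = self.mapping.get(logical)
        if old is not None:
            self.free.add(old)                 # the stale copy can be reused later
        self.mapping[logical] = target

    def read(self, logical):
        return self.data[self.mapping[logical]]


ftl = ToyFTL(num_physical=4)
for i in range(1000):
    ftl.write(0, f"version {i}")   # the host keeps rewriting "the same" block
print(ftl.read(0), ftl.erase_counts)
```

Even though the host hammers a single logical block, the erases end up spread across all four physical blocks. The hard part, as the lecture says, is that real data is not this cooperative: blocks that are written once and never touched again pin their physical location unless the layer also moves cold data around.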
We'll also talk about some of the challenges associated with organizing files together and also some of the information that, wait, I got this out of order, okay, sorry. So what other stuff do we save about a file? There's other information that file systems store about files that's potentially useful. We'll talk about assumptions that are baked into the way that processes interact with files. For example, why do I have open and close? It's interesting. And then we'll talk about groups of files. How do I organize groups of files? How do I optimize access to groups of files? These are all tasks that the file system is in charge of. As you found out when you did assignment two, it's really the underlying file system that implements most of the UNIX file API. You have a little bit of work to do in the system call, but not much. Most of what you're doing is gathering data and sending it down to the underlying file system to handle, so that's really what's doing all the work.
Okay, so here I've got a file. It's an old file. What does a file have to do to be useful? What do you expect from a file, at minimum? Yeah. Yeah, I need to be able to locate it. Okay, so that's important. Hold information, yeah. Store data, that's useful. What else? Okay, now we're getting fancy here. The file might have a type. That's actually one of the grossest things about file systems. Yeah. Give you the data when you ask for it? Yeah, that's sort of like finding it and storing it, so I should be able to summon up the data. There are some other features, though. I could build a file system meeting these requirements that you guys would be unhappy about. What else should a file let me do? Well, I might want to know its size, that's true. That's useful metadata.
Okay, maybe some sort of permissions. Okay, we're getting warmer. There are still some pretty important features that we're missing. It'd be very hard to complete the assignments for the class if you could not, what? Move it. Okay, yeah, I should be able to rename it. Okay, we still have some problems here. Yeah. Resize it, we're getting warmer. Open it, okay, that's good, but now we're a little colder. Yeah. Modify the contents, right? You do want to be able to do that. I can build you a file system that's like, okay, there are the contents of the file, you can never change them again. You have one chance to complete vm_fault, that's it. You're done. So yeah, modify it, change the size. Files grow, they shrink, I add stuff to them, I move them around. These are all good things. This is what we expect: reliably store data, be located, be modified. Those are, I would argue, the basics. Some of the other things people said are nice to have. Permissions, types. Yeah. Functional, I like that. A functional file system, where there is no editing. Well, actually, this is interesting, there is one. You guys are using one right now. Git: a file system where it is impossible to edit something, and all I can do is save new versions. Okay, good. Once an object is in the Git repository, it's there, unless you go and rip it out. I will just warn you about this, because I was telling someone earlier that GitHub has apparently discovered that people check all sorts of bad things into their Git repositories, like passwords, nuclear launch codes, stuff like that. Once you check something into a Git repository, if you remove that file, it's still there. The information is not gone. Anything you commit is there forever.
Now, there are things you can do to get rid of it, but they require a great deal of brain surgery and the cooperation of anybody else who has cloned the repository. In general, once you commit something to Git, it's there and it cannot be edited. The file can be overwritten, it can be moved, it can be deleted, but those contents are there until the end of time. The heat death of the universe or the end of GitHub, whichever comes first. All right, so, yeah, the functional file system, I don't know, I'll keep thinking about it.
So these are the base requirements. We expect that things are going to change when we ask, and that they shouldn't change at other times. Now, this is harder than you think. This is an old bug, and I'm sure there are still new bugs. Margot used to say, and Margot did a lot of work on file systems, that file system data corruption is the worst problem you can have, because if you corrupt somebody else's data, not only do they have a reason to come find you, but they have the time, because you probably ruined their whole project. So, for example, if you could figure out how to corrupt the file system on the test161 server and destroy all of your submissions for the class, and it's an interesting idea, I would have time to come find you. But anyway, file systems turn out to be tough to get right. There's actually ongoing work, and maybe we'll look at one of these papers, although it's a little theoretical, on provably correct file system designs. Wouldn't it be nice if I could guarantee that a file system didn't lose data? Now, the thing that makes these expectations hard to meet is failures. And these failures can be the things we've talked about: flash sectors wearing out, disk sectors wearing out.
They can also be unexpected changes to the disk state: things like the power going off, or you dropping something, or somebody tripping over a cable and unplugging the disk from the server that it's connected to, or whatever. And the problem here, and we'll come back to this when we talk about caching, is that a lot of the strategies we use to improve performance, which involve moving things into memory, have a direct impact on our ability to make sure that things persist. To some degree, making file systems fast requires memory. Making file systems safe requires making sure everything's on disk at all times. So those two objectives are in clear competition with each other.
All right, so we came up with a couple of these before, but let's talk a little bit about file metadata, because this gets mixed in with the concept of a file. There are timestamps associated with files that are usually stored by the file system to give me information about things like when the file was created and when it was modified. Why? Is this just because it's interesting? Is this just for your information? Who can give me an example of a tool or a program that is heavily dependent on file timestamps? Yeah. Make, thank you. You ever wondered how make works? It basically asks: is the timestamp on the thing you are trying to make earlier or later than the timestamps on its dependencies? If it's later, then it's probably up to date. If it's earlier, then I need to rebuild it. It's really pretty simple. If you destroy timestamps on files, then make will complain. Or if you modify timestamps on files, you can get make to do things that you want, like rebuild your kernel. There are other tools that do this as well. So these timestamps are not just for fun or for investigative purposes. They actually enable certain tools to operate properly. Permissions, this is a good one. Who owns a file?
Who should be allowed to read or write or, in certain cases, execute the file? What would happen if I didn't have these permissions? Give me an example of a place where I might want to use these. Yeah. I like that, anybody could do anything. It's like freedom, anarchy. Okay, so this is interesting. Let's say there were no file permissions. Tell me how you would attack that system, because you really can't do anything to the files on my machine. You don't have any access to it. Yeah. SSH? Yeah, but you're not going to be able to SSH into my machine. I'm too smart for that. What's that? I think so, yeah. That's a good answer. Something to check after class. Yeah. No, no, no, but hold on. We're getting two things confused. You don't have any access to my machine. That's different. Where is an environment where file permissions start to become pretty important? Yeah. Timberlake, there we go. You could go onto Timberlake and you could read all of the important files I have on Timberlake, of which there are zero. But if you had important files on Timberlake, I could go get them. So file permissions are kind of weird. When you use shared machines, they can be pretty critical. But on your personal machines, they probably don't matter as much. Unless you share your personal machine with an adversary, which is always a great idea. If you have an enemy, you should let them use your machine on a regular basis. Make sure your file permissions are in good shape. Other file attributes? Somebody had an idea over here about something else they might want to know about a file. One of the sadder stories. Yeah, there are even weirder file permissions, like one saying the contents of the file should be able to run as root. It was an interesting thing that they added at one point.
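The timestamp and permission metadata discussed so far is visible to ordinary programs through the standard `stat` call. The sketch below, assuming a POSIX-like system, shows a make-style staleness check and a readable rendering of permission bits. The helper `needs_rebuild` and the file names `main.c` and `main.o` are made up for illustration, and real make does considerably more than this comparison.

```python
import os
import stat
import tempfile


def needs_rebuild(target, dependencies):
    """make-style check: rebuild if the target is missing or older than a dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.stat(target).st_mtime
    return any(os.stat(dep).st_mtime > target_mtime for dep in dependencies)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "main.c")   # hypothetical source file
    obj = os.path.join(tmp, "main.o")   # hypothetical build product
    for path in (src, obj):
        open(path, "w").close()

    os.utime(src, (1000, 1000))         # source last modified at t=1000
    os.utime(obj, (2000, 2000))         # object built later, at t=2000
    up_to_date = not needs_rebuild(obj, [src])

    os.utime(src, (3000, 3000))         # like running `touch main.c`
    stale = needs_rebuild(obj, [src])

    os.chmod(src, 0o640)                # owner read/write, group read, others nothing
    mode_string = stat.filemode(os.stat(src).st_mode)

print(up_to_date, stale, mode_string)
```

Changing nothing but a timestamp flips the answer from "up to date" to "rebuild", which is why messing with timestamps can get make to do what you want.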
It's not clear that letting files run as root had a good impact on the world. So I just want to pause briefly and talk a little bit about file metadata, because there's other information that we associate with files, and there's an interesting design question about where this information should go. Actually, let me back up a minute. When you open a file on your computer, what else might you want to know about it? Yeah. Well, I'm assuming I know the path. Size, okay, yeah. Type, what does that mean? Yeah, okay. So file type turns out to be kind of weird. What is that? I could have a file that has text in it. What type of file is that? It's a text file. Does that uniquely identify a program that should be used to open it? No. Yeah, okay. So now we get into the grubby business of file extensions, probably the nastiest hack the world ever created. Unfortunately, I don't talk about this in class, because there's really no rhyme or reason to how file extensions work. There are conventions online and people have whole dictionaries of them, but the truth is no one cares. If you want to take that XLS file and rename it to PNG, you can get a photo browser to try to open it. It won't work, because it doesn't have the right internal format to be a PNG, unless you have a very unusual Excel spreadsheet. In general, these sorts of extensions are an entirely ad hoc mechanism. But they work okay. Yeah. Does Unix use file extensions? It does. If you open files on Ubuntu systems, Ubuntu pretty much does the same thing that other systems do, which is it looks at the extension and then makes a guess about what type of thing should be used to open it. So if you open something with a .tex extension, it might try to open a LaTeX editor or something like that.
But the extensions are just guesses. Yeah. Yes. And you just have to say what program you want to open it with? Let me ask you a question. Which system does your mom like better? Does she like it when the pop-up opens up and it says, which program would you like to open this with, and there are like a thousand entries? Or does she like it when it just magically opens with Preview? Yeah, magic. People like magic. You can always get around this. You could just open the program, go to a file, and open it, and it will try, or it might say, no, no, no, I don't want to do that. What's another way of identifying file types? If I try to open the Excel spreadsheet with my photo viewer, why won't it work? What's that? Yeah, the format's wrong. There are also sometimes what are called magic numbers at the beginning of the file. Certain file formats identify themselves by saying, okay, the first eight bytes of the file are always this signature. Therefore, when I try to open a file with a certain program, it can look at that and say, okay, well, this isn't the signature that identifies this as being a Word document, so I'm just going to bail out here. The problem is that when you have things that are just text file formats, it doesn't work. Right, Steve? Don't get me started on HTML. HTML is a totally different beast. HTML has the sad distinction of having to grapple with some of the worst interpreters of all time. HTML had to be very flexible because browsers suck. They all do things wrong, but they do things wrong in different ways. We're starting to fix this, finally. Has anyone ever tried to get something to look right on IE, like version six or something? Turn on quirks mode right away.
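Going back to magic numbers for a moment, the idea can be sketched as a lookup on the first few bytes of a file, ignoring the name entirely. The signatures below are the well-known leading bytes for PNG, PDF, ZIP, and GIF. The `sniff` function, the table, and the sample byte strings are hypothetical illustrations rather than any real tool's implementation.

```python
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",   # exactly eight bytes
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive (also the container for .xlsx and .docx)",
    b"GIF89a": "GIF image",
}


def sniff(data):
    """Guess a file's type from its leading bytes; None if nothing matches."""
    for signature, name in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


# A spreadsheet renamed to look like an image: the name lies, the bytes do not.
fake_name = "budget.png"
fake_contents = b"PK\x03\x04" + b"...spreadsheet bytes..."
real_png = b"\x89PNG\r\n\x1a\n" + b"...image bytes..."
plain_text = b"<html><body>hello</body></html>"

print(fake_name, "->", sniff(fake_contents))
print("real image ->", sniff(real_png))
print("text file ->", sniff(plain_text))
```

The last case shows the limitation raised in the lecture: plain text formats carry no signature, so a check like this has nothing to match on and the system falls back to guessing from the extension.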
Quirks mode. It's like, what is quirks mode? I don't understand this. There's a mode that causes it to do weird stuff? Awesome, that sounds like something that every program should have. I'm going to run test161 in quirks mode. It causes it to randomly add points to your score or subtract them. All right. Anyway, we digressed a little bit here. It is April Fool's Day, so we can have some fun.
So, MP3. Give me some examples of MP3 attributes. Other than the fact that it's an MP3 file, which contains some sort of audio data in a standard format, what's other stuff about an MP3? Yeah, Steve. Title, artist, genre. MP3 genres are a total mess. Talk about an unstructured system. What else, Jared? Yeah, maybe the date it was recorded, the length. Now where should we put this information? Let's say there are three options. I can put it in the file itself, and this is usually not a bad idea, particularly when I have information that's very central to the file. I could put it in another file. I haven't dealt with this for a long time. What was it called? What was the extension of this special, M3U? Yeah, I remember this, it was some sort of weird, or ID3, sorry, was that it? Man, how do you guys know this stuff? This is awesome. I think we solved this problem, I guess. Yeah, it's like a tag. So essentially what we did at some point with MP3s is we moved it into the file itself, which is nice. The other option, just so you understand this: there have been file systems, and this is not a feature you guys are familiar with, because I don't think it's very common in modern file systems, though I think there are some modern file systems that still support this. Has anyone ever used file attributes before? On what system? Linux. Linux has file attributes?
It's a little way down the side of the list. Windows, okay, I think NTFS has this, actually, it's kind of interesting. So imagine you could just attach structured data to any file on the system you wanted. This is kind of neat, right? So if I store it in the file, right, so this is the ID3 tag, sorry, that's exactly right, it was on the next slide. The pro is that this travels along with the file. Are there any cons to this approach? What's that? It makes the file a little bit bigger. It makes the file a little bit bigger, yeah, that's true. What else, yeah? Oh, okay, that's a good point, yeah. So if I have secret metadata I want to associate with the file, then I have to move it along with the file, yeah, what else? Yes, that's part of the problem, and that was kind of the major problem with some of these things, right? And this is almost HTML-like, where in the early days of MP3s, and I know you guys were like in the womb back then, so whatever, you'll just have to trust me that I'm telling the truth here. Every MP3 player interpreted these tags a little bit differently, so you would write tags with one player and they couldn't be read with another. Are there any problems like this today with files? All these problems have been solved. You guys have never noticed a problem like this? Where one version of a particular type of software interprets the file a little bit differently, yeah. Oh, of course, yeah, yeah, absolutely. And well, there you're talking about the official version versus the Harry Hacker version, right? And of course, those guys will never get it right, because every day the Word people get in a room and they're like, how can we break LibreOffice? Let's break it right away. Let's change something really important, right? Yeah, yeah. So what about a case where there's a document that's supposed to be portable?
An example of a portable document that you might see two different viewers open differently. Is anyone familiar with this portable document that I'm talking about? Oh, okay. Well, HTML is definitely right, yeah, what else? I'm thinking of the portable document, right? No? There we go, PDF. You guys know what PDF stands for, right? The Portable Document Format. That is literally what PDF stands for. Vicki's laughing over here, because she knows what a PDF is. You guys never do that? Portable Document Format. How portable are PDFs? Has anyone tried adding a comment to a PDF before? What's that? Yes, every time it asks, every five minutes, basically. Yeah. Has anyone ever tried adding a comment to a PDF? Can you see that comment in other viewers? Sometimes, right? It really depends on if the moon is in the right quadrant or whatever, right? Like sometimes Preview shows you comments that were created with Adobe Reader, and usually they're broken, or sometimes you can edit them and sometimes it'll work. Yeah, so even with these portable documents, right? HTML is even worse, obviously. It's difficult for people to agree on standards and how to interpret those standards. HTML at least has the benefit that usually your browser is not editing the document. If it was, all hell would have broken loose a long time ago, right? With HTML, all the browser has to do is display the document, and it has enough of a hard time with that. Once you actually start editing things, you're in real trouble. And this is true even in the case of portable documents, where there is a well-recognized standard for how the document is supposed to work. Okay, so if I put the file metadata in another file, an example of this would be like the iTunes database. So when you use iTunes, it builds up a database of all the music on the system. It stores that somewhere. That includes things that aren't part of the ID3 format. Is album art now part of ID3? Probably not.
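For reference, here is a small sketch of the "put it in the file itself" option, using the original ID3v1 layout: a fixed 128-byte block at the very end of the MP3 that starts with the bytes "TAG". The helper names `read_id3v1` and `make_id3v1` and the sample values are made up for this illustration, and later ID3 versions use a different, more flexible layout that this does not handle.

```python
import struct

# ID3v1: the last 128 bytes of the file, laid out as
# "TAG" + title[30] + artist[30] + album[30] + year[4] + comment[30] + genre[1]
ID3V1 = struct.Struct("<3s30s30s30s4s30sB")


def _text(raw: bytes) -> str:
    """Decode a fixed-width field, dropping NUL and space padding."""
    return raw.split(b"\x00", 1)[0].decode("latin-1").strip()


def read_id3v1(data: bytes):
    """Return the ID3v1 fields as a dict, or None if there is no tag."""
    if len(data) < ID3V1.size:
        return None
    tag, title, artist, album, year, comment, genre = ID3V1.unpack(
        data[-ID3V1.size:]
    )
    if tag != b"TAG":
        return None
    return {
        "title": _text(title),
        "artist": _text(artist),
        "album": _text(album),
        "year": _text(year),
        "comment": _text(comment),
        "genre": genre,  # an index into a fixed genre table
    }


def make_id3v1(title, artist, album="", year="", comment="", genre=255) -> bytes:
    """Build a 128-byte tag (hypothetical helper, only used for the demo)."""
    enc = lambda s: s.encode("latin-1")
    return ID3V1.pack(
        b"TAG", enc(title), enc(artist), enc(album), enc(year), enc(comment), genre
    )


# Stand-in for real audio data followed by a tag; the values are invented.
fake_mp3 = b"\xff\xfb" + b"\x00" * 64 + make_id3v1(
    "Friday", "Example Artist", year="2011"
)
print(read_id3v1(fake_mp3))
```

Because the tag is just bytes inside the file, copying the file copies the metadata too, which is the pro mentioned above. The fixed 30-byte fields also hint at why players ended up disagreeing about how to store anything that did not fit.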
Album art was always that weird thing that was like, where does it go? Anyway, so this allows me to maintain it separately. The con is that it doesn't move around with the file, so it's harder to keep it in sync, right? Okay, and then attributes are cool. Attributes are probably the feature of file systems that I'm most surprised never really caught on, because attributes turn out to be pretty neat. It's sort of an interesting way to merge certain database-like features with a file system. Attributes, I think, were first introduced in BFS in BeOS. BeOS was a pretty cool operating system. I don't think it's around anymore. The nice thing is that these are maintained by the file system, so there's a lot of standardization, and frequently I can build interfaces for querying them. If I want to find, for example, all the songs in my collection that include the word Friday, I can run that query in the file system rather than having to run it in my application, which is kinda cool, right? I mean, it's almost, again, like a database-like feature that's now baked into my file system. The problem is that these never really caught on, right? So they're not cross-compatible enough, and they don't move around. I could move them around with the file as long as the other file system also supports attributes in some standardized format. Maybe we'll get there, though, right? I mean, the existence of things like JSON now makes me hopeful, because at least there are more standards about how you can store structured data, so it's possible this will happen. Okay, any questions about file metadata? So let's talk a little about process semantics. You may have noticed that the Unix file interface provides these calls that allow a process to establish a relationship between the process and the underlying file that's being used. There are times when that's important, but it might have struck you as sort of unnecessary.
So what's a way to simplify the file system interface by getting rid of open and close? What's that? Well, I mean, so how do things work right now? I call open with the path, I get a file descriptor back, and I use that file descriptor in read and write. Cut out the middleman here. I mean, it's two fewer calls you have to write for assignment two. What do you do? That's interesting, I like that. By default, every file is open. What does that mean? Like, if I don't have open and close, how do I do read and write? Yeah. Functional file system. Okay, there we go. I don't do write. Okay, now I've got three calls off the list. I don't need write either because it's a functional file system. So how do I do read? Let's just talk about read, then. Yeah. No, I don't have open and close. No open and close. How do I get rid of open and close? Yeah. Just pass the path name to read and write. That's so simple, right? Here's the file I wanna read. Here's the path name to it. Done. Why don't I do things this way? Seems simpler to me. Yeah, I mean, so with open I am sending a signal that the file is in use. So it's not terrible, right? Now that may or may not be a good signal. I might have programs that open a bunch of files that they never use, right? Hopefully not. But yeah, so this allows the file system to identify a little bit more about when things are being used. There are also semantics of open that I might wanna support. So UNIX allows me to open a file exclusively. What does that mean? There's this O_EXCL flag I can pass to open. How does that change the semantics of open? What does it sound like? Exclusive. Yeah, it means that my process is now the only process that can have this file open. What happens if Process A calls open with the exclusive flag and then Process B tries to open the file? It should fail, right? Because Process A has the file open exclusively.
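One caveat worth flagging before a sketch: on POSIX systems, `O_EXCL` is defined together with `O_CREAT`, and what it guarantees is that open fails if the file already exists. By itself it does not stop another process from opening an existing file. The closest standard idiom to the "Process A succeeds, Process B fails" behavior described above is a lock file created with `O_CREAT | O_EXCL`. The sketch below assumes that idiom; `try_claim`, `release`, and the file name are hypothetical.

```python
import os
import tempfile


def try_claim(path: str) -> bool:
    """Atomically create `path`; return False if someone already created it."""
    try:
        # O_CREAT | O_EXCL: create the file, but fail if it already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release(path: str) -> None:
    """Give up the claim by removing the lock file."""
    os.unlink(path)


with tempfile.TemporaryDirectory() as tmp:
    lock = os.path.join(tmp, "song.mp3.lock")
    first = try_claim(lock)    # plays the role of process A: succeeds
    second = try_claim(lock)   # plays the role of process B: refused
    release(lock)
    third = try_claim(lock)    # available again once A has released it
    print(first, second, third)
```

Both calls here come from one process to keep the example short, but the check-and-create is done by the kernel in a single step, so the same outcome holds when two separate processes race.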
So I can't support those sorts of semantics if I don't have open and close. As you guys pointed out, maybe I can improve performance if the OS knows what files are open. It also allows me to do things like exclusive access. But here's what's interesting, right? There are file systems that don't support this at all. So NTFS, sorry, NFS. NFS, which is still the file system that's used by a lot of shared clusters, probably by something like Timberlake, unless it's using something that's even older. NFS doesn't support opening and closing. On NFS, everything is done using read and write. Now, you can still run your program that calls open and close on an NFS file system, but the open and close don't really do any work. They don't perform any operations. Every time I read or write the file, I have to pass a full identifier for the file to the file system in order for those calls to work. Now the reason for this, of course, is that these file systems were designed for network clients. So let's say that process A opens a file exclusively, and then that machine becomes disconnected from the network. What do I do? Never open the file again. That's my functional file system, right? Just make a new copy of it, move on with life, right? No, I mean, these things cause problems. So newer versions of NFS have solved this problem, but earlier versions of NFS, which are still probably more widely deployed than they should be, still have this feature. So I also store the file position, right? You guys remember this from when I do lseek. That's also kind of weird, right? What's the alternative to storing the file position? So I got rid of open and close, now I'm coming after lseek. We can almost get rid of lseek. How do we get rid of lseek? Why do I need lseek? Yeah. Yeah, that sounds awesome, right? Get rid of lseek, lseek is just a hanger-on, man.
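To make the stored file position concrete, here is a short sketch using Python's thin wrappers around the Unix calls (`os.open`, `os.read`, `os.lseek`). The file name and contents are made up. The point is that read never says where to read from: the position lives with the open file and moves on every call.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "wb") as f:
        f.write(b"0123456789")

    fd = os.open(path, os.O_RDONLY)
    try:
        a = os.read(fd, 4)                  # reads from offset 0
        b = os.read(fd, 4)                  # continues where the last read stopped
        pos = os.lseek(fd, 0, os.SEEK_CUR)  # ask where the position is now
        os.lseek(fd, 2, os.SEEK_SET)        # move the position explicitly
        c = os.read(fd, 3)                  # reads from the new position
    finally:
        os.close(fd)

    print(a, b, pos, c)
```

The two consecutive reads return different bytes even though the calls look identical, which is the implicit state that lseek exists to manipulate.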
I don't know who came up with lseek, but we don't need lseek anymore. On every read or write, I'm already passing the full path name, so I just pass the area I want to read or write from. That's a lot clearer, right? No more of these weird implicit reads and writes. I just pass the pointer to where I want to read. I could do this. Of course, this makes things a little different when I'm using pipes and other things, okay? All right, we're out of time. We'll come back and talk about the file interface on Monday. Have a great weekend, go play a trick on somebody.
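As a sketch of the explicit-offset alternative: Unix already has a positional read, `pread`, which takes the offset as an argument and leaves the stored position alone (Python exposes it as `os.pread` on Unix-like systems). The `read_at` helper below is hypothetical. It imitates the "pass the path and the area on every call" interface by opening and closing the file inside each call.

```python
import os
import tempfile


def read_at(path: str, offset: int, length: int) -> bytes:
    """Hypothetical stateless read: the caller names the file and the offset
    on every call, so nothing is remembered between calls."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "wb") as f:
        f.write(b"0123456789")

    chunk = read_at(path, 6, 3)  # three bytes starting at offset 6

    # pread on an already-open descriptor does not move the file position.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.pread(fd, 3, 6)
        pos_after_pread = os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)

    print(chunk, pos_after_pread)
```

Pipes are where this breaks down: a pipe has no offsets to name, so positional reads on one fail, which is the complication mentioned above.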