All right, good afternoon. Clearly I have a branding issue; I should have called it a review session if I wanted anyone to show up, I guess. But yeah, I'm here. Any questions, for the three of us that are here? Because that's all this is. I'm not piling on new content; it's not a ruse or anything, just questions about anything in the course so far, so feel free to ask. Sorry? Oh, you have four exams before this one. Yeah, this one's on the 24th. Okay, so it's low on the priority list. All right, well, no questions, so I'll just be here then. Hopefully it's an applicable course. I tried to make it so that you guys picked up some actual useful skills, because when I took it, I don't even remember anything that happened in it. I re-learned the stuff on my own later and thought, wow, this is actually useful. I'm hoping to add more and change things into exercises and give you solutions to them, because some of it I would rather people actually learn from reading good code, as opposed to just kind of making stuff up. It's nice to try it too, but yeah. Do you have any questions? Okay. So yeah, it's just open office hours, a review if you want me to go over anything: unplanned, no-new-material time. We can also look at the topic list if you want that as inspiration. Let's see, here's the stuff we've learned. So the first question is: what's the difference between a mutex and a spinlock?
A mutex is a general lock that gives you mutual exclusion, and it could be any type of implementation. A spinlock is a specific implementation of a mutex. A spinlock specifically doesn't have a queue or anything: if it doesn't acquire the lock, it just tries over and over again. Sometimes you want that implementation because it's really responsive; as soon as the lock is released, you get it immediately, and sometimes the critical section is really, really small. The default mutex implementation uses a queue, and it puts the thread to sleep if it can't acquire the lock, then wakes it up again some amount of time later, or maybe when another thread unlocks it. With a queue there's a bunch more overhead, so if it's a really small critical section, you may want a spinlock. But it's an actual implementation choice, and sometimes you might actually care. All right, any other inspiration questions from the first 12 lectures? Hopefully that spurs everyone's memory. Quick rapid fire: how do you create a process? Sweet. What do you need to do after you create a process? Wait, yeah. You have to acknowledge your children: you don't want any orphans, and you don't want any zombies. Everyone knows what those two are, hopefully; we took three lectures to go over that. So the question is: can a process be both an orphan and a zombie? The answer is yes. You can make a child, the child terminates, you don't call wait, so it's a zombie; then the parent process terminates, calls exit, and that child is now an orphan and a zombie at the same time. It would get reparented, probably to init, and then init would wait on it and it would finally die. You can also have just an orphan, just a zombie, or just a running process that hasn't terminated yet. So yeah, they can be both; they're not mutually exclusive. Basic IPC: let's see, that was read and write.
It's writing bytes to and reading bytes from a file descriptor; I don't think there are many questions about that. Oh, okay, so back in lecture two, the kernel mode thing. Remember, there's that system call interface between you and the kernel, and if you make a system call, it's actually much, much slower than just calling a normal function, because your CPU has to change modes. It has to go into kernel mode, then the kernel has to figure out what you're asking for, and then actually do it. So system calls are really, really slow. The question on the midterm was something like 1000 system calls versus just one, and you would always want just the one. There's a specific stream of research, too, that tries to speed up the system call interface by batching calls together: instead of doing, you know, five system calls, they try to make one system call that does all five things. That's one flavor of the research that's out there, and yeah, you can write a paper just on implementing that. Then there are the kernel architectures, which we didn't really go into because we didn't implement a kernel; we kind of saw it in the last bit, adding some code to the Linux kernel. But there are those different architectures, right? The monolithic kernel has everything in it.
The idea behind that is you don't do as many system calls, because the kernel does everything: all the drivers are in kernel mode, all the file systems are in kernel mode, everything. Whereas if you have a microkernel, the idea is to have kernel mode do the least amount of work possible. Then your file systems would all have to go through system calls to actually work, and all the drivers would be normal programs and would have to use system calls too. And there's a line between them. On Mac they don't let you write kernel code, because they're Apple, so Apple drivers are user mode things: you write a driver and then you ask the kernel, hey, this is my driver for this device, please run it, and it runs as a normal process, so you can't ruin any hardware or touch anything directly. I think Windows drivers are actually in kernel mode, which probably explains why Windows crashes, or at least did crash a lot. Well, Windows 98 was probably before any of you were born. So, pipes. What about pipes? If you create a pipe, you can think of it as a buffer that's managed by the kernel, and the interface it gives you is just two file descriptors: one to write and one to read, and that's it. So if we make some process, you can see that, right? Okay, yeah. We make some process, and say it makes a pipe.
Say it makes a pipe, and then file descriptor 3 is the read end and file descriptor 4 is the write end. Without even forking or anything, that process can use the pipe and read and write to it if it really wants to. It's kind of useless, because you're communicating with yourself, but you could do that. And yeah, if there's nothing in the pipe (you can think of the pipe as just being some buffer over here), it makes sure that you don't have any invalid reads or anything like that. So if nothing has been written to it, there's nothing currently in the buffer, and you call read on it: read is what we called a blocking system call, so if there's nothing to read, it just won't return, and it actually blocks your process. Your process won't wake up until there's actually something there to read, and then it returns from read. Yeah, so the question: let's say we forked here. Boom, we forked; this is now process 3. And let's say process 2, you said, writes twice. It writes twice to file descriptor 4, something like that, and then process 3 just calls read from file descriptor 3. Let's say we have a big buffer, and this buffer would be in that process; it would be its own array, right? So let's say we make a buffer.
It's the size of a page, and we read into the buffer. So the question is: what would the other process read? Well, we actually don't know, because remember, all these system calls return the number of bytes. For read to return, it needs to have read something into the buffer, but you don't know, when it returns, what the buffer looks like. Process 2 could finish writing "hi", and then, because there's something in the buffer, this could wake up; the scheduler could say, okay, you can go now, return from read. Then read would say, hey, I read 2 bytes, and it would put "hi" in that buffer. Another possibility is that process 2 does both writes, so the pipe would contain "hi" and "bye", and after process 2 does both writes, process 3 wakes up, calls read, and it would actually read "hibye", which would be 5 bytes. So we actually don't know. It's the same thing as with threads: you don't know when you'll get interrupted or whatever, because processes are like threads, right? The OS schedules them however it wants. So you'd either see "hi", or "hibye"; those are probably your two possibilities. Yeah. The question: are read and write atomic? The answer to that is yes. They're supposed to be thread safe, because the kernel is supposed to be managing a whole bunch of different things, so everything the kernel does is thread safe. For the most part; there might be some weird caveats where you have to read the documentation, but for the most part they aim for everything to be thread safe, and process safe too.
Because two processes could be calling read on it as well, and they'll be atomic. Oh yeah, so let's say that instead of doing the one read, you did it with two reads, and you said how many bytes you want each one to read, like this. Yeah, in that case the first one would definitely return "hi" and the other one would definitely return "bye", minus any errors that could happen. In this case there should be no errors under normal behavior, but, as we saw when we introduced signals, sometimes a read system call can be interrupted. So the read might be interrupted or something weird, but under normal conditions it'll always work. Yeah, like if you have read in a loop; we did that, right? So if we go all the way back to lecture six, that was essentially our cat implementation, right? There's just a while loop that reads over and over again while there's still data to read from standard in, and the only way it exits is if read returns 0, which means there's no possible way to get any more input. So this was the implementation of cat, essentially.
All it did was read from standard in and write whatever it read out to standard out. So yeah, you can just read over and over again, and if it's a pipe, the read end will report closed if there are no write ends open in any process. Then it's not possible to get any more data into the buffer, and if it's not possible to get any more data into the buffer and you've read everything in it, you're done. There's no possible way to get more data, and that's how it knows it's done. That's why, in the examples when we started messing with pipes, sometimes if we didn't close the write ends of the pipes, it would just sit there and look like it was stuck. It looks kind of deadlocked, I guess, and it doesn't respond to anything. But that was because you forgot to close one of the write ends of the pipe, so it was reading from a pipe where it was still possible for data to arrive, even though you never wrote anything there, and it just sits there forever. Yep, multiple processes can write to the same pipe, and yeah, they can all read from the same pipe too. But remember, when we talked about the file descriptor table: the open file has a position in it. So it's atomic, like the question before. If a bunch of processes try to read from it, it's guaranteed that only one will read whatever that data is, and then that process advances the position, and the next one that reads gets the next chunk of data. So, yeah, here, let's go back to this. Whoa, took too much. So let's say that instead of these reads, I had the reads in different processes, something like that. Well, then it's kind of like a race between them: the reads are going to happen atomically, but you don't know which one is going to read first. So in this case, let's simplify it.
Let's just say that this process writes "hibye" all at once, in one write, and then both processes are reading from that file descriptor. You don't know which one is going to go first, right? And where each read starts is here; that's where the internal position would be, in the global open file table. Either one could return from read first. So this one could return first: it would read "hi", the first two characters, and then atomically it would advance the position, so "bye" is the next thing to read, and then the other process would return from read and it would read "bye". But what's the other option? Yeah, the other option is that process 4 went first. Because the reads are atomic, they're guaranteed to read everything between them, because of how we have it set up, but you don't know in what order. So if process 4 returns from read first, its three-byte read would get "hib", and then process 3's read would get "ye". And that's the same if you share a file descriptor, fork like eight times, and each child calls read. You're guaranteed that, across all the processes calling read, you'll read the entire thing, but you have no idea what the order is going to be. And that might be a very poor idea, especially if you assume that each process is the only one reading from it and it represents a whole file or something like that; then each process would get different chunks of the file if you're unlucky, and you probably won't be able to use it. Oh, so if you close the read end of a pipe, and that's all the read ends, all the read ends are closed, and then you try to write to it, the kernel will give you an error on that write system call. Yeah, it'll return -1 and set errno, and I think there's an errno that says broken pipe; there's a specific error for that. And if
you read from a pipe that doesn't have any writers, then it will just say it's closed once you've read everything. Yeah, that read just returns 0, the same as a file: read returns 0 if it reaches the end of the file, and read returning 0 for a pipe just means no more data; it's essentially done. Yeah, too bad I wrote this exam a week ago, otherwise I could put that in. Well, there's always next time. All right, any other things, or shall we go back to the list of topics? All right, so let's see. After that, what did we get? We got virtual memory (is that cut off a little bit?), everyone's favorite topic. Hopefully lab 5 drilled that into you a bit better; other offerings didn't have lab 5, so hopefully you do much better on the virtual memory question. Oh yeah: virtual memory, page tables, page table implementation; those three were all virtual memory lectures. The tricky thing about page tables is implementing copy-on-write; that was probably one of the harder things you could do. So hopefully that wasn't too bad, and hopefully you didn't just slam in code until all the test cases passed. But sometimes people do that. Let's see, threads. So, threads: kind of hard. Any questions about threads? Well, threads themselves aren't that hard; data races are hard. So, everyone, off the top of your head: what's a data race? Yep, that's a fine enough definition: when there's more than one access trying to read and write at the same time, and at least one of them is a write. The actual wording is about concurrent accesses, but same deal: if you have threads, you have concurrency.
So you don't really need to say that part. That's pretty much the concern: whenever you have threads, you want to worry about data races, because you will get really, really odd values and it will be an absolute pain to debug. Those are the types of bugs that have lived in the kernel for like seven years before someone fixed them, so that should tell you about how hard they are to identify. And unfortunately for all of you, you are now in the age of multicore machines. If you want your program to run fast, well, guess what, you have to use threads, and then you have to worry about this. If you care about performance at all, you will have to deal with it. And this is why people can't use Python for everything, as much as they would like to. So, has anyone... okay, who here has ever tried to use threads in Python? One. How was it? Was it slow? Yeah, okay, so do you know why it was slow? Okay, so Python does support threads, but it's really, really slow. Python is an interpreter; it doesn't actually run compiled code or anything. So within the Python interpreter, guess what, there are a bunch of potential data races that it has to prevent. Ideally the Python interpreter would just have fine-grained locks for everything, but the easiest thing to do, which is what they did, is to put a lock around the entire interpreter, so they don't have to worry about any data races at all. So whenever you call into the Python interpreter, there's something called the GIL, the global interpreter lock, and the whole thing locks. So if you use Python and you use threads in Python, it will probably be mostly the same as single-threaded, because it's all going through the interpreter, and the interpreter just has one big lock around it, so only one thread will actually run at a time.
So it'll be slow as hell. If you want Python to go fast and you want to stay in Python, the recommendation is to use another process. There's a multiprocessing module, so instead of dealing with threading and doing locks, you just use multiprocessing, which will fork, and then everyone has its own independent copy of the Python interpreter and they can run at full speed. Well, full Python speed, which isn't that fast. But if you want to make Python go fast, you pretty much need to use processes, which, hey, good thing we had this course. Yep. So if you have multiple threads, a thread can call exit(0), and that just ends the whole process. Yeah, it's pthread_exit or something like that if you don't want to kill the process: if you want to end just one thread, it's pthread_exit, if you're using pthreads. Yep. Yeah, so the question is: if a thread makes it to the end of its run function, does it pthread_exit automatically? And yeah, it does exactly what you did in lab 3. If it makes it to the end, it implicitly calls pthread_exit, which is what you guys implemented too, right? So you kind of know how they would have implemented it. Let's see: threads, threads, threads. Well, we talked a lot about threads. Threads and locks, semaphores, locking. Oh, we even did a parallelization example; that was fun. The whole thing is pretty much: (a) you have to prevent data races; (b) if you have multiple locks, you have to prevent deadlocks; and (c) in some programs you also have to ensure some type of ordering between threads, depending on whether you have a bunch of dependencies or whatever. Any other general questions about threads or anything like that? Okay. Let's see. The last third of the course was pretty much not that much. It was disks: SSDs being kind of weird in that they have to erase huge... oh god, I even forget the name. What are they called? Not pages; the thing above pages. Whole blocks? Do they call them blocks?
They might call them blocks. But you have to erase multiple pages at a time, and you can only write to a freshly erased page, and there are a bunch of weird rules, which is what makes it kind of difficult. Then we saw file systems, some basic ones you could do, different allocation strategies. That's pretty much a file system's whole job: give you a name to refer to a file, and allocate some blocks in some manner. And the inode ones: well, you got to see an inode file system up close and personal in lab 6. That was a whole file system. If you're running Linux, the file system you're actually running is probably just an extension of that. It has some more features, but that's essentially all it is. Instead of being one megabyte, yours is hopefully several hundred gigabytes or terabytes, but that's the only difference; it's otherwise exactly the same. Then we saw sockets; that was more IPC, mostly to give you some background so you can connect to the internet and fun stuff like that. Page replacement had the clock algorithm; hopefully that was okay to follow. You can probably deduce that that's on the final, so practice it; that hopefully is free marks. Then general memory allocation, buddy and slab allocators, and virtual machines, and that was it for the content. After that we saw a fun memory mapping thing, people making machine learning models go fast by knowing about operating systems. Then some kernel module stuff, just so we could see some kernel mode code. Rust, which was just fun, I guess; it was just rewriting that bank simulator with threads and showing that Rust prevents data races but does not prevent deadlocks. And then yeah, today, open office hours. Anything for the last third of the course? The last third was definitely a lot lighter than the rest of it. Sorry, wait. Oh, what's the difference between the symlink and the file?
All right, here, we'll go here. Okay, we'll try to flip back and forth, because this is probably a good question. Let's say we make a file called a.txt. It's just a normal file; it gets its own inode and everything, right? It has no content, so let's make some content for it. Okay, so now it has some content: it's 10 bytes, and it's got this inode number, right? So if we make a hard link, b.txt, it refers to the same inode; it's just another entry in that directory, because a directory is essentially just pairs of name and inode. So if I modify one of these files, it modifies the same underlying inode. If I go ahead and modify this one and then look at b.txt, well, I see the same contents of the file. All right. And let's make another file called c.txt, and then we'll make a soft link, d.txt. All right, so we made a soft link. A soft link does not refer to the same inode. This is the inode for c.txt, and it's different from the inode for d.txt, because d.txt is supposed to just refer to a different name, right? It means if I try to access d.txt, it says: okay, to access d.txt, actually go look at c.txt, and then you read c.txt, right? So, sorry, what's your question about symlinks? Why would I use them, or how is d.txt different? Yeah, so the difference is kind of in their use. The symlink is its own inode and just maps name to name; it has a bit of a hop. You can think of soft links as name to name, but in actuality what they are is a name in a directory mapping to a symlink inode, and that inode stores the target name. So there's an extra jump, but really abstractly, they're just name to name. And why would they be different? Well, in this example right now, a.txt refers to the same thing as b.txt, and c.txt is the same as d.txt, but through another layer; it's more indirect, right?
But if I cat a.txt, I get the same thing as b.txt, and if I cat c.txt and d.txt, they're the same thing too, right? So they kind of look like they serve the same purpose. But yeah, here's the difference. Because it's a hard link, if I remove a.txt, I don't actually remove that inode, because there's still something pointing to it: b.txt. But let's say I make a new a.txt, and say this is version 2. Now, because I just removed a.txt and made a new one, well, guess what: if I look at a.txt, it's version 2, but if I look at b.txt, it's still pointing to the old one. Maybe you want that behavior, in which case you'd use a hard link. But if you actually want them to always refer to the same thing, then a soft link is probably a better idea. Because, well, what happens if I remove c.txt? Now d.txt refers to c.txt, which doesn't exist, and that makes sense: c.txt doesn't exist. But if I went ahead and made a new c.txt and said this is v2 of c.txt, well, guess what: now they both refer to the same thing still, and I didn't have to update anything. d.txt and c.txt are the same thing, because I just went through one layer of indirection.
I said, okay, d.txt should refer to the same thing as c.txt at all times. You don't have to play with the inodes or whatever; the final inode you arrive at is whatever inode c.txt arrives at, so it always keeps itself up to date. And sometimes you really, really want this. A bunch of the time in Linux, for configuration files or server files, or when you have to do cloud stuff, this is how it knows what to start up. There will just be a soft link in a particular spot, and they say: I don't care, this web server always goes to this symlink and just starts running whatever it points to. That way, if you update it later and make it point to a different inode, it doesn't care; it just goes to that name and then whatever that refers to. So if you have to update anything, it always stays consistent. Otherwise, if you really wanted a.txt and b.txt to always point to the same thing, that would be really, really annoying: if you change what either of them points to, you have to make sure to change them all so they all agree. Instead of dealing with that headache, you just use a soft link. Yep. Yeah, d.txt would have its own inode. You created a symlink inode, and the data stored in that symlink inode is the name. So it just uses that name and restarts the path resolution. Because I'm currently in this directory, it would use the current directory, so it's c.txt in this current directory. For soft links I could also do this, right: I could create a soft link whose target is an entire path. And then, hey, guess what: if I cd to homedir, well, my shell keeps track of the path, but guess what, this is actually my home directory, and it's the same thing as if I went to /home/john. Like, it's the same contents. So it's kind of like a shortcut. Yeah, here, let's see. Yeah, that's a good question.
So your question was: what if I cd into homedir? So I cd into homedir, and this would actually go ahead and access that directory, the inode for /home/john, and nothing else, right, even though it looks like this path. So within here are going to be all the entries of that directory, and it's still that directory. This path here isn't really a real path, right? You asked: if I do cd .., where do you think I'm going to end up? Yeah, and why would I end up in /home? Because in my directory, .. refers to that inode. And let's see, so if I do that... yeah, I'm in the right directory, but your shell keeps track of the path with the symlinks in it, so it's kind of lying to you a little bit. Well, it's not really lying to you; that's how you got there. Your shell keeps track of whether you went somewhere through a symlink and adjusts the name based on that. Oh, wait. Oh, that was annoying. Okay, never mind. Okay, so in that case your shell does something really weird, then. Yeah, because we're in the home directory, and .. is supposed to be this inode number here, which is not this number. So my shell played a magic trick on me; it didn't actually do what I wanted it to do. My shell obviously remembered that when I cd'd into it, it was a symlink, and it assumed I wanted to go back to the symlink, not what that directory actually was, probably based on this path. So it figured it out based on how I got there and tried to fool me, and I guess it worked. Yeah. Yep, it's an AI camera. Oh, see. No, it's one you can just buy. I guess I should stay neutral and not advertise what it is; I can say it after. Actually, screw it: yeah, it's an Insta360 Link. So you can see the actual physical thing move. Before, I was using my iPhone, but I thought that looked like garbage, because your iPhone uses the wide-angle lens, and it looks like crap because it just digitally crops in. This one,
actually, you can see it move; that thing physically moves. But yeah, I only got this like one day ago, but it seems to be pretty nice. All right, any other final questions or concerns? Or do we want to look at last-minute panics with the front page of the exam? Where did I even put it? Yeah, so, front page of the exam again, really quick: short answer; page replacement (you probably know what that's going to be); virtual memory; processes; processes and threads and how they interact together; locking; semaphores. Locking, you can probably bet, will be either deadlocks or data races; I actually forget, so I can't ruin that one. Semaphores; then another threads question, just threads, which could also involve data races and deadlocks. And then file systems, which, I mean, we kind of just went through file system stuff here, and you did lab 6, allegedly. So yeah, you should be good for that. Final last-minute things, let's see. So, another good possible question. We probably don't have enough time to go over it now, because we only have like three minutes, but this is the CS 111 summer final, question 3. This is probably closer to your threads-and-processes question, because guess what, it has both. This creates, what, four threads, and then in each thread it forks. Anyone remember what happens when you fork within a thread? It chooses a random thread? Sorry? Yes, okay. So if your process has four threads and then one of the threads calls fork, that creates a new process, and then the dilemma is: okay, does that new process have four threads? Or does it have one, or whatever?
What it will do, and what the guarantee is, is that the newly created process will only have one thread running in it, and it is the same thread that called fork; it's a copy of that one. So that's how they resolve that issue. The other weird thing when you have threads with processes: remember, signals get sent to a process. So another thing you'd have to contend with if you add threads is: if you send a signal to a process with eight threads, what happens? Do they all process the signal? Does just one get it? Because no particular thread initiated it, right? In the case of signals, the kernel just says, yeah, a random thread gets it. You have no control over who gets the signal; just one will get it, and not everyone. There are other complications, but those are the two main ones we care about. All right, so let us wrap up. A little anticlimactic end to the year, but hopefully the course was enjoyable and you learned stuff. Even if you didn't like the course, if you're staying in software, guess what: you will have to deal with this stuff, because everything runs on an operating system. So hopefully you learned some things so you can actually optimize your programs. You can make them multi-threaded; you can make them run really fast. And unlike other people that didn't take this course, your code won't have data races in it, you won't have deadlocks, everything will go as fast as possible, and it will be great. So yeah, just remember for the final: pulling for you. We're all in this together.