Today I'd like to talk about Go, which is especially interesting for us in this course because Go is the language you're all going to do the labs in. So I want to focus today on the machinery that's most useful in the labs and most particular to distributed programming. First of all, it's worth asking why we use Go in this class. In fact, we could have used any one of a number of other systems-style languages; plenty of languages like Java or C# or even Python provide the kind of facilities we need, and indeed we used to use C++ in this class and it worked out fine. Go, like many other languages, provides a bunch of features that are particularly convenient for us. It has good support for threads, and for locking and synchronization between threads, which we use a lot. It has a convenient remote procedure call package, which doesn't sound like much, but actually turns out to be a significant constraint: for many languages, C++ for example, it's actually a bit hard to find a convenient, easy-to-use remote procedure call package, and of course we use RPC all the time in this course for programs on different machines to talk to each other. Unlike C++, Go is type safe and memory safe. That is, it's pretty hard to write a program that, due to a bug, scribbles over some random piece of memory and then causes the program to do mysterious things; that just eliminates a big class of bugs. Similarly, it's garbage collected, which means you're never in danger of freeing the same memory twice, or freeing memory that's still in use. The garbage collector just frees things when they stop being used. And one thing that's maybe not obvious until you've played around with this kind of programming before: the combination of threads and garbage collection is particularly important. One of the things that goes wrong in a non-garbage-collected language like C++, if you use threads, is that it's always a bit of a puzzle, and requires a bunch of bookkeeping, to figure out when the last thread that's using a shared object has finished using that object, because only then can you free the object. So you end up writing quite a bit of code: the programmer manually does reference counting or something in order to figure out when the last thread stopped using an object. That's just a pain, and that problem completely goes away if you use garbage collection, as we have in Go. And finally, the language is simple, much simpler than C++. One of the problems with using C++ is that often, if you make an error, maybe even just a typo, the error message you get back from the compiler is so complicated that it's usually not worth trying to figure out what the error message meant; I find it's always much quicker to go look at the line number and try to guess what the error must have been, because the language is far too complicated. Go probably doesn't have a lot of people's favorite features, but it's a relatively straightforward language. Okay, so at this point you've all done the tutorial. If you're looking for what to read next to learn about the language, a good place to look is the document titled Effective Go, which you can find by searching the web. All right, the first thing I want to talk about is threads. The reason we care a lot about threads in this course is that threads are the main tool we'll use to manage concurrency in programs.
And concurrency is of particular interest in distributed programming, because it's often the case that one program actually needs to talk to a bunch of other computers. A client may talk to many servers, or a server may be serving requests at the same time on behalf of many different clients. So we need a way to say: my program really has seven different things going on, because it's talking to seven different clients, and I want a simple way to allow it to do those seven different things without too much complex programming. For us, threads are the answer. These are the things the documentation calls goroutines; I'll call them threads, because goroutines are really the same as what everybody else calls threads.

So the way to think of threads is that you have one program and one address space. I'm going to draw a box to denote an address space. In a serial program without threads, you just have one thread of execution executing code in that address space: one program counter, one set of registers, one stack describing the current state of the execution. In a threaded program, like a Go program, you can have multiple threads, which I'll draw as multiple squiggly lines, and what each line represents is a separate program counter, a separate set of registers, and a separate stack for each thread, so that each can have its own thread of control and be executing in a different part of the program. Hidden here is that for every thread, there's a stack it's executing on. The stacks are actually in the one address space of the program; so even though each thread has its own stack, technically they're all in the same address space, and different threads could refer to each other's stacks if they knew the right addresses, though you typically don't do that. And in Go, even the main function, when you first start a program and it runs main, is running in just a goroutine; it can do all the things goroutines can do.

As I mentioned, one of the big reasons for threads is to allow each different activity in the program to be at its own point of execution. I usually refer to that as I/O concurrency, for historical reasons. The reason it's called I/O concurrency is that in the old days, where this first came up was that you might have one thread waiting to read from the disk, and while it's waiting, you'd like to have a second thread that can compute, or read somewhere else on the disk, or send a message on the network and wait for a reply. So I/O concurrency is one of the things threads buy you. For us, I/O concurrency will usually mean that one program has launched remote procedure calls, requests to different servers on the network, and is waiting for many replies at the same time. That's how it'll come up for us. The way you'd do that with threads is to create one thread for each of the remote procedure calls you wanted to launch. Each thread would have code that sent the remote procedure call request message, waited at that point in the thread, and then, finally, when the reply came back, the thread would continue executing.
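To make that concrete, here's a minimal sketch of the one-thread-per-request pattern; the rpcCall helper here is a made-up stand-in for a real RPC client call, not anything from the labs:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// rpcCall is a hypothetical stand-in for sending an RPC request
// and waiting for the reply.
func rpcCall(server string) string {
	time.Sleep(100 * time.Millisecond) // pretend network latency
	return "reply from " + server
}

func main() {
	servers := []string{"s1", "s2", "s3"}
	var wg sync.WaitGroup
	for _, s := range servers {
		wg.Add(1)
		go func(s string) { // one thread per outstanding request
			defer wg.Done()
			fmt.Println(rpcCall(s)) // this goroutine blocks; the others keep going
		}(s)
	}
	wg.Wait() // wait until all the replies have arrived
}
```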
And using threads allows us to have multiple threads that all launch requests into the network at the same time; or not at the same time, they can execute the different parts of this whenever they like. So that's I/O concurrency: overlapping the progress of different activities, so that while one activity is waiting, other activities can proceed.

Another big reason to use threads is multi-core parallelism, which I'll just call parallelism. Here the thing we're trying to achieve with threads is: if you have a multi-core machine, as I'm sure all of you do in your laptops, and you have a compute-heavy job that needs a lot of CPU cycles, wouldn't it be nice if one program could use CPU cycles on all of the cores of the machine? And indeed, if you launch multiple goroutines in Go and they do something compute-intensive, like sit there in a loop and compute digits of pi, then, up to the limit of the number of cores in the physical machine, your threads will run truly in parallel. If you launch two threads instead of one, you'll be able to use twice as many CPU cycles per second. So this is very important to some people. It's not a big deal in this course; it's rare that we'll think specifically about this kind of parallelism. In the real world, though, when building things like the servers that form parts of a distributed system, it can sometimes be extremely important for the server to run threads and harness the CPU power of a lot of cores, just because the load from clients can often be pretty high. So parallelism is a second reason why threads are of quite a bit of interest in distributed systems.

And a third reason, which is maybe a little bit less important: there are times when you really just want to do something in the background, or there's something you need to do periodically, and you don't want the main part of your program to have to keep checking "should I be doing these things that should happen every second or so?" You'd just like to fire something up that does the periodic thing every second. So there are convenience reasons. An example that will come up for you: it's often the case that a master server wants to check periodically whether its workers are still alive, because if one of them has died, you want to launch its work on another machine; MapReduce might do that. The arrangement for "do this check every second, or every minute; send a message to the worker, are you alive?" is to fire off a goroutine that just sits in a loop, sleeps for a second, does the periodic thing, and then sleeps for a second again. In the labs you'll end up firing off these kinds of threads quite a bit.

Yes? Is the overhead worth it? Yes, the overhead's really pretty small for this stuff. It depends on how many. If you create a million threads that each sit in a loop waiting for a millisecond and then send a network message, that's probably a huge load on your machine. But if you create ten threads that sleep for a second and then do a little bit of work, it's probably not a big deal at all.
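A minimal sketch of that periodic pattern; checkWorkers is a made-up placeholder for the actual liveness check a master might do:

```go
package main

import (
	"fmt"
	"time"
)

// checkWorkers is a hypothetical stand-in for pinging each worker
// to ask "are you alive?".
func checkWorkers() {
	fmt.Println("checking workers...")
}

func main() {
	// Fire off a goroutine that does the periodic thing in the background.
	go func() {
		for {
			checkWorkers()
			time.Sleep(1 * time.Second)
		}
	}()

	// The main part of the program goes about its business.
	time.Sleep(3 * time.Second)
}
```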
And, I guarantee you, the programmer time you save by not having to mush different activities together into one loop of code is worth a small amount of CPU cost, almost always. Still, if you're unlucky, you'll discover in the labs that some loop of yours is not sleeping long enough, or that you fired off a bunch of these and never made them exit, for example, and they just accumulate. So you can push it too far.

Okay, so these are the main reasons that people like threads a lot and that we'll use threads in this class. Any other questions about threads in general? Yes? What differences are there between concurrent programming and asynchronous programming? By asynchronous programming, you mean like a single thread of control that keeps state about many different activities? Yeah, so this is a good question, actually. What would happen if we didn't have threads, or for some reason didn't want to use threads? How would we write a program, a server that could talk to many different clients at the same time, or a client that could talk to many servers? What tools could be used? It turns out there is another major style of structuring these programs, called asynchronous programming, or what I might call event-driven programming. The general structure of an event-driven program is usually that it has a single thread and a single loop, and what that loop does is sit there and wait for any input, or any event, that might trigger processing. An event might be the arrival of a request from a client, or a timer going off. Or, if you're building a window system: many of the window systems on your laptops are written in an event-driven style, where what they're waiting for is key clicks or mouse movement or something. So an event-driven program might have a single thread of control that sits in a loop, waits for input, and whenever it gets an input, like a packet, figures out: which client did this packet come from? Then it has a table of the state of whatever activity it's managing for that client, and it says, oh gosh, I was in the middle of reading such-and-such a file, and now it's asked me to read the next block; I'll go read the next block and return it.

Threads are generally more convenient because they allow you to write straight-line sequential code: compute, send a message, wait for a response, whatever. It's much easier to write that kind of code in a thread than it is to chop whatever the activity is up into a bunch of little pieces that can be activated one at a time by one of these event-driven loops. So one problem with this scheme is that it's a bit of a pain to program. Another potential defect is that while you get I/O concurrency from this approach, you don't get CPU parallelism. So if you're writing a busy server that would really like to keep 32 cores busy on a big server machine, a single loop is not a very natural way to harness more than one core. On the other hand, the overheads of event-driven programming are generally quite a bit less than for threads.
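Before getting into the overheads, here's a minimal sketch of the event-driven structure just described, a single loop with a per-client state table; the event type and channels here are made up purely for illustration:

```go
package main

import (
	"fmt"
	"time"
)

// event is a made-up event type: a packet from some client.
type event struct {
	clientID int
	data     string
}

func main() {
	events := make(chan event)
	ticker := time.NewTicker(50 * time.Millisecond)
	defer ticker.Stop()
	state := map[int]string{} // table of per-client state

	go func() {
		events <- event{clientID: 1, data: "block1 "}
		events <- event{clientID: 1, data: "block2 "}
	}()

	// The single event loop: wait for any event, look up the relevant
	// state, advance that activity one step, go back to waiting.
	for handled := 0; handled < 2; {
		select {
		case ev := <-events:
			state[ev.clientID] += ev.data
			fmt.Println("client", ev.clientID, "so far:", state[ev.clientID])
			handled++
		case <-ticker.C:
			fmt.Println("timer went off")
		}
	}
}
```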
You know, threads are pretty cheap, but each one of these threads is sitting on a stack, and a stack is a few kilobytes or something. If you have 20 of these threads, who cares? If you have a million of these threads, then it's starting to be a huge amount of memory. And the scheduling bookkeeping for deciding which thread to run next also starts to cost: scheduling lists with a thousand threads in them can start to get quite expensive. So if you are in a position where you need a single server that serves a million clients, keeping a little bit of state for each of a million clients, threads could be expensive; and, at some expense in programmer time, it's easier to write a really stripped-down, efficient, low-overhead service in the event-driven style. It's just a lot more work.

Yeah? In languages like JavaScript, right, you can write multiple event listeners, and those would be event-driven; do they use multiple cores, or does everything run on one thread? Are you asking about JavaScript? I don't know. The question is whether JavaScript has multiple cores executing your code. Does anybody know? Depends on the implementation? Yeah, so I don't know. I mean, it's a natural thought. Even in Go, if you knew your machine had eight cores and you wanted to write the world's most efficient whatever-server, you could fire up eight threads, and on each of the threads run a stripped-down event-driven loop, one event loop per core. That would be a way to get both parallelism and I/O concurrency.

Yes? Okay, so the question is: what's the difference between threads and processes? Usually, on a Unix machine, a process is a single program that you're running, with a single address space, a single bunch of memory for the process; and inside a process, you might have multiple threads. So when you write a Go program and you run it, running the Go program creates one Unix process and one memory area, and when your Go program creates goroutines, those sit inside that one process. I'm not sure that's really an answer, but historically, operating systems have provided this big box, the process, implemented by the operating system. The operating system does not care what happens inside your process, what language you use; none of that is the operating system's business. But inside that process, you can run lots of threads. Now, if you run more than one process on your machine, more than one program, like an editor and a compiler, the operating system keeps them quite separate. Your editor and your compiler each have memory, but it's not the same memory; they're not allowed to look at each other's memory, and there's not much interaction between different processes. So your editor may have threads and your compiler may have threads, but they're in different worlds. Within any one program, the threads can share memory and can synchronize with channels and use mutexes and so on; but between processes, there's just no interaction. That's just the traditional structure of this kind of software.

Yeah? So the question is: when a context switch happens, does it happen for all threads?
Okay, so let's imagine you have a single-core machine that's really only doing one thing at a time, and you're running multiple processes on it. The operating system gives out the CPU by time-slicing back and forth between these programs. So when the hardware timer ticks and the operating system decides it's time to take the CPU away from the currently running process and give it to another process... it's complicated. All right, let me restart this. The threads that we use are, in the end, based on threads provided by the operating system, and when the operating system context switches, it's switching between the threads that it knows about. So the operating system might know that there are two threads in this process and three threads in that process, and when the timer ticks, the operating system will, based on some scheduling algorithm, pick a different thread to run; it might be a different thread in this process, or one of the threads in that process. On top of that, Go will cleverly multiplex many goroutines onto single operating-system threads to reduce overhead. So there are really two stages of scheduling: the operating system picks which big thread to run, and then within that process, Go may have a choice of goroutines to run.

All right. So threads are convenient because a lot of the time they allow you to write the code for each thread as a pretty ordinary sequential program. However, there are in fact some challenges with writing threaded code. One is what to do about shared data. One of the really cool things about the threading model is that these threads share the same address space; they share memory. If one thread creates an object in memory, it can let other threads use it; you can have an array or something that all the different threads are reading and writing. That's critical if you're keeping some interesting state: maybe your server has a cache in memory, and when a thread is handling a client request, it first looks in that cache. But it's a shared cache: each thread reads it, and the threads may write the cache to update it when they have new information to stick in. So it's really cool that you can share that memory. But it turns out that it's very, very easy to get bugs if you're not careful when you're sharing memory between threads.

A totally classic example: suppose you have a global variable n that's shared among the different threads, and a thread just wants to increment it: n = n + 1. By itself, this is likely to be an invitation to bugs if you don't do anything special around this code. The reason is that whenever you write code in a thread that you know is accessing, reading or writing, data that's shared with other threads, there's always the possibility, and you've got to keep it in mind, that some other thread may be looking at the data or modifying it at the same time. The obvious problem is that maybe thread one is executing this code, and thread two is in the same function, in a different thread, executing the very same code. And remember, I'm imagining that n is a global variable, so they're talking about the same n. What this boils down to is that you're running some machine code the compiler produced, and what that machine code does is load n into a register, add one to the register, and then store that register back into n.
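Here's a minimal, self-contained sketch of that race; the load, add, and store steps of the two increments can interleave, so the final value isn't guaranteed to be 2:

```go
package main

import (
	"fmt"
	"sync"
)

var n int // shared global, no lock

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			n = n + 1 // compiles to roughly: load n; add 1; store n
		}()
	}
	wg.Wait()
	fmt.Println(n) // usually 2, but with an unlucky interleaving, 1
}
```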
Here n is just the address of some location in RAM. So you can imagine both of the threads executing this line of code: they both load the variable n into a register, and if n starts out as zero, that means they both loaded zero; they both increment that register, so they get one; and they both store one back to memory. And now two threads have incremented n, and the resulting value is one. Well, who knows what the programmer intended; maybe that's what the programmer wanted, but chances are not. Chances are the programmer wanted two, not one.

The instructions get compiled down to assembly; can we consider each instruction to be atomic? That's a very good question: whether individual instructions are atomic. The answer is, some are and some aren't. A 32-bit store is extremely likely to be atomic, in the sense that if two processors store 32-bit values at the same time to the same memory address, what you'll end up with is either the 32 bits from one processor or the 32 bits from the other processor, but not a mixture. Other sizes, it's not so clear. One-byte stores depend on the CPU you're using, because a one-byte store is quite possibly implemented as a 32-bit load, then a modification of 8 bits, then a 32-bit store; it depends on the processor. And more complicated instructions, like increment: your microprocessor may well have an increment instruction that can directly increment some memory location, and that's pretty unlikely to be atomic, although there are atomic versions of some of these instructions.

All right, so this is just a classic danger, and it's usually called a race. You'll see it a lot, because you're going to do a lot of threaded programming with shared state. "Race," I think, refers to some ancient class of bugs involving electronic circuits. But for us, the reason it's called a race is that if one of the CPUs has started executing this code, and the other thread is getting close to this code, it's a race as to whether the first processor can get to the store before the second processor starts to execute the load. If the first processor actually manages to do the store before the second processor gets to the load, then the second processor will see the stored value: it will load one, add one to it, and store two. So that's how you can justify the terminology.

Okay, so the way you solve this, the simplest way, is that as a programmer you have some strategy in mind for locking the data. You say: this piece of shared data can only be used when such-and-such a lock is held. You'll see this, and you may have used it in the tutorial. Go calls locks mutexes, so what you'll see is a mu.Lock() before a sequence of code that uses shared data, and a mu.Unlock() afterwards. Then, of whichever two threads execute this, whichever one is lucky enough to get the lock first gets to do all this stuff and finish before the other one is allowed to proceed. So you can think of wrapping some code in a lock, remember, even though the increment is one line, it's really three distinct operations, as making a multi-step code sequence atomic with respect to everyone else who acquires the lock. Yes? Can you repeat the question? Oh, that's a great question. The question was: how does Go know which variable we're locking?
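To have something concrete in front of us, here's a minimal sketch of the locked increment; the variable names here are mine, not from the code on the screen:

```go
package main

import (
	"fmt"
	"sync"
)

var mu sync.Mutex
var n int // shared; by convention, only touched while holding mu

func increment() {
	mu.Lock()
	n = n + 1 // the load/add/store sequence is now atomic w.r.t. other holders of mu
	mu.Unlock()
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			increment()
		}()
	}
	wg.Wait()
	fmt.Println(n) // now reliably 2
}
```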
Right here, of course, there's only one variable, but maybe we're saying n = x + y, really three different variables. And the answer is that Go has no idea. There's no association at all, anywhere, between this lock, this mu thing, which is a variable of type Mutex, and any variables. The association is in the programmer's head. So as a programmer you need to say: here's a bunch of shared data, say a complex data structure like a tree or an expandable hash table, and any time you're going to modify anything associated with this data structure, and of course a tree is composed of many, many objects, you have to hold such-and-such a lock. The set of objects even changes, because you might allocate new tree nodes. But it's really the programmer who works out a strategy for ensuring the data structure is used by one thread at a time, and who creates the one lock, or maybe more; there are many locking strategies you could apply to a tree, and you could imagine a tree with a lock for every tree node. The programmer works out the strategy, allocates the locks, and keeps the relationship to the data in the programmer's head. For Go, a lock is just a very simple thing: there's a lock object, the first thread that calls Lock gets the lock, other threads have to wait until it unlocks, and that's all Go knows.

Does it not lock all the variables that are part of the object? Go doesn't know anything about the relationship between variables and locks. When you have code that calls Lock, exactly what it is doing is acquiring this one lock, and that's all it does. Anybody else who tries to lock it, somewhere else, having declared var mu sync.Mutex so that this mu refers to some particular lock object, and there can be many, many locks, has to wait until we unlock this lock. And it's totally up to us as programmers what we're protecting with that lock.

So the question is: is it better to have the lock be the private business of the data structure? Suppose you're implementing a map; you would hope, although it's not true of Go's map, that the map internally would have a lock protecting it. And that's a reasonable strategy: if you define a data structure that needs to be locked, have the lock be interior, have each of the data structure's methods be responsible for acquiring that lock, and the user of the data structure may never know. That's pretty reasonable, and the points at which it breaks down are, well, a couple of things. One is that if the programmer knew the data was never shared, they might be bummed that they were paying the lock overhead for something they knew didn't need to be locked. The other is that if there are any inter-data-structure dependencies, two data structures, each with locks, that maybe use each other, then there's a risk of cycles and deadlocks. Deadlocks can be solved, but the usual solutions require lifting the locks out of the implementations, up into the calling code.
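To make that hide-the-lock style concrete before moving on, here's a minimal sketch: a made-up Counter type whose methods take an interior lock, so callers never see the mutex at all:

```go
package main

import (
	"fmt"
	"sync"
)

// Counter hides its lock: every method acquires it, so callers can
// share a Counter between goroutines without knowing about mu.
type Counter struct {
	mu sync.Mutex
	n  int
}

func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

func (c *Counter) Get() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

func main() {
	c := &Counter{}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc()
		}()
	}
	wg.Wait()
	fmt.Println(c.Get()) // 10
}
```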
I'll come back to deadlocks at some point. So it's a good idea to hide the locks, but it's not always a good idea.

Okay, so one problem you run into with threads is races, and generally you solve them with locks. Actually, there are two big strategies: one is that you figure out some locking strategy for making access to the data single-threaded, one thread at a time; the other is that you fix your code to not share the data at all. If you can do that, it's probably better, because it's less complex.

All right, another issue that shows up with threads is called coordination. When we're doing locking, the different threads involved probably have no idea that the other ones exist; they just want to get at the data without anybody else interfering. But there are also cases where you do intentionally want different threads to interact. I want to wait for you to produce some data: you're a different thread than me, you're producing data, and I want to wait until you've generated the data before I read it. Or you launch a bunch of threads to, say, crawl the web, and you want to wait for all those threads to finish. So there are times when we intentionally want different threads to interact with each other, to wait for each other, and that's usually called coordination. As you probably know from having done the tutorial, there are a bunch of techniques in Go for doing this. There are channels, which are really about sending data from one thread to another and waiting for the data to be sent. There are also more special-purpose things: there's an idea called condition variables, which is great if there's some thread out there and you want to give it a kick; you're not sure the other thread is even waiting for you, but if it is waiting, you'd like to give it a kick so it knows it should continue whatever it's doing. And there's the wait group, which is particularly good for launching a known number of goroutines and then waiting for them all to finish.

And a final piece of danger that comes up with threads is deadlock. Deadlock refers to the general problem you sometimes run into where one thread, thread one, is waiting for thread two to produce something, so I'll draw an arrow to say thread one is waiting for thread two; for example, thread one may be waiting for thread two to release a lock, or to send something on a channel, or to decrement a wait group. However, unfortunately, maybe thread two is waiting for thread one to do something. This is particularly common in the case of locks. Thread one acquires lock A, thread two acquires lock B; then thread one needs to acquire lock B as well, that is, to hold two locks, which sometimes comes up, and it just so happens that thread two needs lock A. That's a deadlock: they'll each grab their first lock, proceed down to where they need their second lock, and now they're waiting for each other forever. Neither one can proceed, so neither one can release its lock, and usually just nothing happens. So if your program just kind of grinds to a halt and doesn't seem to be doing anything, but didn't crash, deadlock is one thing to check.
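A minimal sketch of that lock-ordering deadlock; locks a and b here are just made-up names. Running it typically hangs, and since every goroutine ends up blocked, the Go runtime will usually abort with "all goroutines are asleep - deadlock!":

```go
package main

import (
	"sync"
	"time"
)

var a, b sync.Mutex // the two locks, "A" and "B"

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // thread one: A, then B
		defer wg.Done()
		a.Lock()
		time.Sleep(100 * time.Millisecond) // make the bad interleaving likely
		b.Lock()
		b.Unlock()
		a.Unlock()
	}()
	go func() { // thread two: B, then A
		defer wg.Done()
		b.Lock()
		time.Sleep(100 * time.Millisecond)
		a.Lock()
		a.Unlock()
		b.Unlock()
	}()
	wg.Wait() // never returns; the usual fix is one agreed lock ordering
}
```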
Okay, all right, let's look at the web crawler from the tutorial as an example of some of this threading stuff. I have a couple of solutions in different styles, really three solutions in different styles, to let us talk a bit about the details of this kind of thread programming.

First of all, you all probably know what a web crawler is. Its job is: you give it the URL of a page that it starts at, and since many web pages have links to other pages, the crawler fetches the first page, extracts all the URLs mentioned in that page's links, fetches the pages they point to, looks at all those pages for the URLs they refer to, and keeps going until it's fetched, let's just say, all the pages in the web; and then it should stop. In addition, the graph of pages and URLs is cyclic; that is, if you're not careful, if you don't remember "oh, I've already fetched this web page," you may end up following cycles forever and your crawler will never finish. So one of the jobs of the crawler is to remember the set of pages it has already crawled, or even already started a fetch for, and to not start a second fetch for any page it has already started fetching. You can think of that as imposing a tree structure, using a sort of tree-shaped subset of the cyclic graph of actual web pages. Okay, so we want to avoid cycles; we want to not fetch a page twice. It also turns out that it just takes a long time to fetch a web page, both because servers are slow and because the network has a long speed-of-light latency, so you definitely don't want to fetch pages one at a time, unless you want the crawl to take many years. It pays enormously to fetch many pages at the same time, up to some limit: you want to keep increasing the number of pages you fetch in parallel until the throughput you're getting, in pages per second, stops increasing. That is, you want to increase the concurrency until you run out of network capacity. So we want to be able to launch multiple fetches in parallel. And a final challenge, which is sometimes the hardest to solve, is knowing when we're finished. Once we've crawled all the pages, we want to stop and say we're done, but we actually need to write the code that realizes we've crawled every single page, and for some solutions I've tried, figuring out when you're done has turned out to be the hardest part.

All right, my first crawler is this serial crawler here; by the way, this code is available on the website as crawler.go, on the schedule. This first one effectively performs a depth-first search of the web graph, and there is one moderately interesting thing about it: it keeps this map called fetched, which it's basically using as a set, in order to remember which pages it has crawled. That's about the only interesting part. You give it a URL at line 18; if it has already fetched the URL, it just returns. If it hasn't fetched the URL, it first remembers that it has now fetched it, then actually fetches that page and extracts the URLs in it with the fetcher, and then iterates over the URLs in that page and calls itself for every one of them. It really has just the one table; there's only one fetched map, of course, because when I call a recursive crawl and it fetches a bunch of pages, after it returns, the outer crawl instance needs to be aware that certain pages are already fetched. So we depend very much on the fetched map being passed between the functions by reference instead of by copying. Under the hood, what must really be going on is that Go is passing a pointer to the map object to each of the calls of crawl, sharing a pointer to the same object in memory rather than copying the map. Any questions?
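For reference, here's roughly the shape of that serial crawler. This is my reconstruction from the discussion, so names and details are approximate; Fetcher stands for the tutorial's interface that fetches a page and returns the URLs found on it:

```go
// Reconstructed sketch of the serial crawler being described.
type Fetcher interface {
	// Fetch returns the URLs found on the given page.
	Fetch(url string) (urls []string, err error)
}

func Serial(url string, fetcher Fetcher, fetched map[string]bool) {
	if fetched[url] {
		return // already fetched (or fetch underway); avoids following cycles
	}
	fetched[url] = true
	urls, err := fetcher.Fetch(url)
	if err != nil {
		return
	}
	for _, u := range urls {
		// The map is effectively passed by reference: every recursive
		// call shares the one fetched table.
		Serial(u, fetcher, fetched)
	}
}
```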
So this code definitely does not solve the problem that was posed, because it doesn't launch parallel fetches. Clearly we need to insert goroutines somewhere in this code to get parallel fetches. So let's suppose, just for chuckles, that we start with the laziest thing: I'm going to just modify the code to run the subsidiary crawls each in its own goroutine. Actually, before I do that, why don't I run the code, just to show you what correct output looks like. From this other window I can run the crawler; it actually runs all three versions of the crawler, and they all find exactly the same set of web pages. So this is the output we're hoping to see: five lines, one for each of the five different web pages fetched. So let me now run the subsidiary crawls in their own goroutines and run that code. What am I going to see? The hope is to fetch these web pages in parallel, for higher performance. You're voting for only seeing one URL; why is that? That's exactly right. It's not going to wait in this loop at line 26; it's going to zip right through that loop. It fetches one page, the first web page, at line 22, and then in that loop it fires off the goroutines, and immediately this crawl function returns; and if it was called from main, main will just exit, almost certainly before any of the goroutines was able to do any work at all. So we'll probably just see the first web page. And indeed, when I run it, you'll see here under "serial" that only the one web page was found. Now, in fact, since this program doesn't exit after the serial crawler, those goroutines are still running, and they actually print their output down here, interleaved with the next crawler example. But nevertheless, just adding a go here absolutely doesn't work. So let's get rid of that.

So now I want to show you one style of concurrent crawler, and I'm presenting two. One is written with shared data, shared objects and locks; it's the first one. The other is written without shared data, passing information on channels instead, in order to coordinate the different threads. So this is the shared-data one, or rather just one of many ways of building a web crawler using shared data. This code is significantly more complicated than the serial crawler. It creates a thread for each fetch it does, all right, but the huge difference is that it does two things: it does the bookkeeping required to notice when all of the crawls are finished, and it handles the shared table of which URLs have been crawled correctly. So this code still has a table of URLs; that's this f.fetched map at line 43. But this table is actually shared by all of the crawler threads, and all of the crawler threads are executing inside ConcurrentMutex. So we still have this sort of tree of ConcurrentMutex calls exploring different parts of the web graph, but each one of them was launched as its own goroutine instead of as a function call, and they're all sharing this table of fetched URLs, because if one goroutine fetches a URL, we don't want another goroutine to accidentally fetch the same URL. As you can see here at lines 42 and 45, I've surrounded the map accesses with the mutex operations that are required to prevent a race that would occur if I didn't have the mutexes. The danger here is that at line 43, a thread is checking whether a URL is already fetched. Suppose two threads happen to be following the same URL, so two calls to ConcurrentMutex end up looking at the same URL, maybe because that URL was mentioned in two different web pages.
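Here's approximately the shape of this mutex-based crawler, again reconstructed from the walkthrough, so treat it as a sketch; it assumes a sync import and the Fetcher interface from the serial sketch, and the line numbers in the discussion refer to the original crawler.go, not to this listing:

```go
// Reconstructed sketch of the shared-data crawler being discussed.
type fetchState struct {
	mu      sync.Mutex
	fetched map[string]bool
}

func ConcurrentMutex(url string, fetcher Fetcher, f *fetchState) {
	// Atomically test-and-set this URL's entry (the lines ~43-44
	// discussed below): no other thread can sneak in between.
	f.mu.Lock()
	already := f.fetched[url]
	f.fetched[url] = true
	f.mu.Unlock()
	if already {
		return
	}
	urls, err := fetcher.Fetch(url)
	if err != nil {
		return
	}
	var done sync.WaitGroup
	for _, u := range urls {
		done.Add(1)
		// u is passed as an argument so each goroutine gets its own copy.
		go func(u string) {
			defer done.Done()
			ConcurrentMutex(u, fetcher, f)
		}(u)
	}
	done.Wait() // wait for this level's children to finish
}
```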
If we didn't have the lock, they'd both access the map at line 43 to see whether the URL had already been fetched, and they'd both get false; they'd both set the URL's entry in the table to true at line 44; and then they'd both go on to fetch the web page. So we need the lock there, and the way to think about it, I think, is that we want lines 43 and 44 to be atomic. That is, we don't want some other thread to get in and use the table between 43 and 44; each thread wants to read the current table content and update it without any other thread interfering, and that's what the locks are doing for us. Okay, any questions about the locking strategy here?

All right, once we've checked the URL's entry in the table, line 51 just fetches that page in the usual way, and then the other interesting thing going on is the launching of the threads. Yes? So the question is: what's with the f-dot? Okay, there's a structure defined at line 36 that collects together all the different state we need to run this crawl; here it's only two fields, but it could be a lot more, and they're only grouped together for convenience. There's no deep significance to the fact that mu and fetched are inside the same structure, and f-dot is just the syntax for getting at one of the fields in the structure. I just happened to put the mu in the structure because it allows me to group together all the stuff related to a crawl, but that absolutely does not mean that Go associates the mu with that structure, or with the fetched map, or anything. It's just a lock object; it just has a Lock method you can call, and that's all that's going on.

So the question is: how come, in order to pass something by reference, I had to use a star here, whereas in the previous example, when we were passing a map, we didn't have to use a star, that is, didn't have to pass a pointer? That star notation you're seeing there at line 41 means we're passing a pointer to this fetchState object, and we want it to be a pointer because we want there to be one object in memory, and all the different goroutines want to use that same object, so they all need a pointer to it. When you define your own structure, that's the syntax you use for passing a pointer. The reason we didn't have to do it with the map is that, although it's not clear from the syntax, a map is a pointer. Because it's built into the language, they don't make you put a star there; but when you declare a variable of type map, what that is is a pointer to some data in the heap. So it was a pointer anyway, and it's always passed by reference; you just don't have to put the star, the language does it for you. So map is definitely special: you cannot define map yourself in the language, it has to be built in, because there are some curious things about it.

Okay, good. So we fetched the page; now we want to fire off a crawl goroutine for each URL mentioned in the page we just fetched. That's done at line 56, which just loops over the URLs that the Fetch function returned and, for each one, fires off a goroutine at line 58. That func syntax in line 58 is a closure, a sort of immediate function; what that func keyword is doing is
declaring a function right there, which we then call. The way to read it, maybe, is that you can declare a function as a piece of data: just func, then you give the arguments, then you give the body, and that declares an object. It's like when you type 1 or 23 or something: you're writing a constant object, and this is the way to write a constant function. We do it here because we want to launch a goroutine that runs this function we declared right here, so to make the goroutine we add go in front, to say we want a goroutine, and then we have to call the function, because the syntax of the go keyword is that you follow it with a function call: a function and the arguments you want to pass it. And there's really one reason we're doing this. In some other circumstance we could have just said go ConcurrentMutex(...), since ConcurrentMutex is the name of the function we actually want to call with this URL. But we want to do a few other things as well, so we define this little helper function that first calls ConcurrentMutex for us with the URL, and then, after ConcurrentMutex is finished, does something special to help us wait for all the crawls to be done before the outer function returns.

That brings us to the wait group at line 55. sync.WaitGroup is just a data structure defined by Go to help with coordination, and the game with WaitGroup is that internally it has a counter. You call Add, like at line 57, to increment the counter, and Done to decrement it; and then the Wait method, called at line 63, waits for the counter to get down to zero. So a wait group is a way to wait for a specific number of things to finish, and it's useful in a bunch of different situations. Here we're using it to wait for the last goroutine to finish: we add one to the wait group for every goroutine we create; line 60, at the end of the function we declared, decrements the counter in the wait group; and line 63 waits until all the decrements have happened. So the reason we declared this little function was basically to be able to both call ConcurrentMutex and call Done; that's really why we needed that function.

Yes? What if one of the goroutines fails and doesn't reach the Done line? That's a darn good question. I forget the exact range of errors that will cause a goroutine to fail without causing the whole program to fail; maybe it divides by zero, or dereferences a nil pointer, I'm not sure. But there are certainly ways for a function to fail and have the goroutine die without the program dying, and that would be a problem for us. So really the right way to write this, and I'm sure you had this in mind when you asked the question, the right way to be sure the Done call is made no matter why this goroutine is finishing, would be to put a defer here, which means: call Done when the surrounding function finishes, always, no matter why it finishes.

Yes? So the question is: how come two uses of Done in different threads aren't a race? The answer must be that internally a WaitGroup has a mutex, or something like it, that each of its methods acquires before doing anything else, so that simultaneous calls to a wait group's methods are safe.

So the question is whether wait groups are a Go-specific thing. Yeah; for C++ and
C, you want to look at something called pthreads. In C, threads come in a library, they're not really part of the language, called pthreads, and these are extremely traditional and ancient primitives that appear in lots of languages. Say it again? Not in this code, but, you know, imagine a use of wait groups: wait groups just count stuff, and a wait group doesn't really care what you're counting or why. This is the most common way to see one used.

You're wondering why u is passed as a parameter to the function at line 58? All right, so backing up a little bit. The rule for a function like the one I'm defining at line 58 is that if the function body mentions a variable that's declared in the outer function, and not shadowed, then the inner function's use of it is the same variable as in the outer function. That's what's happening with fetcher, for example. What does the fetcher variable refer to in the inner function? It's the same variable as the fetcher in the outer function; it just is that variable. So when the inner function refers to fetcher, it's referring to the same variable as this one here, and the same with f: the f used here just is this variable. So you might think we could get rid of this u argument that we're passing, and just use the u that was defined up on line 56, in the loop. It would be nice if we could, because it would save us some typing. It turns out not to work, and the reason is that the semantics of Go's for loop at line 56 is that the loop updates the one variable u. In the first iteration of the for loop, that variable u contains some URL; when you enter the second iteration, its contents are changed to be the second URL. That means that the first goroutine we launched, if it were looking at the outer function's u variable, would see a different value in u after the outer function updated it. And sometimes that's actually what you want: for f, and in particular f.fetched, the inner function absolutely wants to see changes to that map. But for u we don't: the first goroutine we spawn should read the first URL, not the second URL. So we want that goroutine to have its own private copy of the URL, and, though we could have done it in other ways, the way this code happens to make a copy private to the inner function is by passing the URL as an argument.

Yes? What if we pass the address of u? Yeah, then, well, actually, I don't know exactly how strings work here, but passing u as an argument is absolutely giving you your own private copy of the variable. Are you saying we don't need to play this trick in the code? We definitely need to play this trick in the code. Oh, strings are immutable? Strings are immutable, right. So the question is: given that strings are immutable, how can the outer function change the string? The problem is not that the string is changed; the problem is that the variable u is changed. When the inner function mentions a variable defined in the outer function, it's referring to that variable and the variable's current value. So if you have a string variable that has "A" in it, and then you
assign "B" to that string variable, you're not overwriting the string; you're changing the variable to point to a different string. And because the for loop changes the u variable to point to a different string, that change to u would be visible inside the inner function, and therefore the inner function needs its own copy of the variable. Right, and the observation is that when you pass an immutable value as an argument, you essentially get your own copy, so even if the variable it came from later changes, your copy doesn't. Yes, that is what we're doing in this code, and that is why this code works. The proposal, the broken code that we're not using here, I'll show you later. And as for why the argument works: if you make it an argument, it's an ordinary variable that doesn't escape its context, so you really do get your own copy; whereas Go's static analysis, if you take u out of the argument list and capture it in the closure instead, will put it on the heap. This is just a horrible detail, but it is unfortunately one you'll run into while doing the labs, so you should at least be aware that there's a problem here, and when you run into it, you can try to figure out the details.

Yeah? Okay, that's a great question. The question is: if you have an inner function that refers to a variable in the surrounding function, but the surrounding function returns, what is the inner function's variable referring to, now that the outer function has returned? The answer is that Go notices. Go analyzes these inner functions, they're called closures, and the compiler says: aha, this closure is using a variable in the outer function, so let's allocate heap memory to hold the current value of the variable, and both functions will refer to that little area of heap. So the variable won't be on the stack, as you might expect; it's moved to the heap if the compiler sees it's used in a closure. Then, when the outer function returns, the object is still there in the heap, and the inner function can still get at it; and the garbage collector notices when the last function that refers to this little piece of heap has exited, has returned, and frees it only then.

Okay. So, wait groups: the wait group is maybe the more important thing here. The technique this code uses to wait for all of this level of crawls to finish is the wait group, and of course there are many of these wait groups, one per call: each call to ConcurrentMutex just waits for its own children to finish and then returns.

Okay, so back to the lock. Actually, there's one more thing I want to talk about with the lock, and that is to explore what would happen if we hadn't locked. Right, I'm claiming: oh, you know, if you don't lock, you're going to get these races, you're going to get incorrect execution, whatever. Let's give it a shot. I'm going to comment out the locks, and the question is: what happens if I run the code with no locks? What am I going to see? You'd see a URL fetched twice? Yeah, that would be the error you might expect. All right, so I'll run it without locks, and we're looking at the concurrent-mutex output, the one in the middle. Well, this time it doesn't seem to have fetched anything twice; it's only five. Run it again. Gosh, still five. Geez, so maybe we're wasting our time with those locks. It never seems to go wrong; I've actually never seen it go wrong. But the code is nevertheless wrong, and someday it will fail. Okay,
the problem is that, you know, there are only a couple of instructions here, and so the chance that these two threads, which each run maybe hundreds of instructions, happen to stumble on the same couple of instructions at the same time is quite low. And this is a real bummer about buggy code with races: it usually works just fine, but it probably won't work when the customer runs it on their computer. So it's actually bad news for us, right? In complex programs it can be quite difficult to figure out whether you have a race, and you may have code that looks completely reasonable but is in fact, unknown to you, using shared variables. The answer is that really the only way to find races in practice is to use automated tools, and luckily Go gives us a pretty good race detector, built into Go, and you should use it. If you pass the -race flag when you execute your Go program, it'll run with this race detector. Well, I'll run the race detector and we'll see. It emits an error message for us: it's found a race, and it actually tells us exactly where the race happens. There's a lot of junk in this output, but the really critical thing is that the race detector realized we read a variable, that's what this "read" is, that was previously written, with no intervening release and acquire of a lock; that's what this means. Furthermore, it tells us the line numbers: it's told us that the read was at line 43 and the previous write was at line 44, and indeed, if we look at the code, the read is at line 43 and the write is at line 44. So one thread did a write at line 44, and then, without any intervening lock, another thread came along and read that written data at line 43. That's basically what the race detector is looking for. The way it works internally is that it allocates sort of shadow memory, it uses a huge amount of memory, and basically for every one of your memory locations, the race detector has allocated a little bit of memory of its own, in which it keeps track of which threads recently read or wrote every single memory location. It also keeps track of when threads acquire and release locks and do other synchronization activities that it knows would force threads to not run concurrently. And if the race detector ever sees that there was a memory location that was written and then read with no intervening lock, it raises an error.

Yes? Is it perfect? I believe it is not perfect. I'd have to think about it, but certainly one way it is not perfect is that if you don't execute some code, the race detector doesn't know anything about it. It's not doing static analysis; the race detector is not looking at your source and making decisions based on the source. It's watching what happened on this particular run of the program, and so if this particular run didn't execute some code that happens to read or write shared data, then the race detector will never know, and there could be a race there. So that's certainly something to watch out for: if you're serious about the race detector, you need to set up testing apparatus that tries to make sure all the code gets executed. But it's very good, and you should just always use it for your 6.824 labs.

Okay, so there's a race here; and of course the race didn't actually occur. What the race detector did not see was an actual interleaved, simultaneous execution of the sensitive code. It didn't see
two threads literally execute lines 43 and 44 at the same time; as we know from having run the thing by hand, that apparently doesn't happen, or at least only with low probability. All it saw was that at one point there was a write, and then, maybe much later, a read with no intervening lock. So in that sense it can detect races that didn't actually happen, or didn't really cause bugs on this run.

Okay, one final question about this crawler: how many threads does it create, and how many concurrent threads could there be? Yeah, so a defect in this crawler is that there's no obvious bound on the number of simultaneous threads it might create. With the test case, which only has five URLs, big whoopee; but if you're crawling the real web, with, I don't know, billions of URLs out there, we certainly don't want to be in a position where the crawler might accidentally create billions of threads. Thousands of threads is just fine; billions of threads is not okay, because each one sits on some amount of memory. So there are probably many defects in real life with this crawler, but one at the level we're talking about is that it can create too many threads, and it really ought to have a way of saying: you can create 20 threads, or 100 threads, or 1000 threads, but no more. One way to do that would be to create a fixed-size pool of workers, and have each worker iteratively look for another URL and crawl that URL, rather than creating a new thread for each URL.

Okay, so next up I want to talk about another crawler, implemented in a significantly different way: using channels instead of shared memory. Remember, in the mutex crawler I just showed, there is this table of URLs that have been crawled, shared between all the threads, which has to be locked. This version does not have such a table, does not share memory, and does not need to use locks. Instead, there's basically a master thread, that's the master function at line 86, and it has a table, but the table is private to the master function. Instead of creating a tree of function calls that corresponds to the exploration of the graph, which the previous crawler did, this one fires off just one goroutine per URL that it fetches, and it's only the master, the one master, that creates these threads; we don't have a tree of functions creating threads, we just have the one master. Okay, so it creates its own private map at line 88 to record what it has fetched, and it also creates a channel, just a single channel, that all of its worker threads are going to talk to. The idea is that it fires up a worker thread for each fetch, and each worker thread, when it finishes fetching its page, will send exactly one item back to the master on the channel, and that item will be the list of the URLs in the page that worker fetched. So the master sits in a loop where, at line 89, it reads entries from the channel; we have to imagine that it started up some workers in advance, and now it's reading the URL lists those workers send back. Each time it gets a URL list at line 89, it then loops, at line 90, over the URLs in that list, from a single page fetch, and if a URL hasn't already been fetched, it fires up a new worker at line 94 to fetch that URL. And if we look at the worker code starting at line 77, a worker basically calls the fetcher and then sends one message on the channel, at line 80 or 82, saying: here are the URLs in the page I fetched.
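Here's approximately the shape of this channel-based crawler, again my reconstruction from the walkthrough; it reuses the Fetcher interface from the serial sketch, and the line numbers in the discussion refer to the original crawler.go, not to this listing:

```go
// Reconstructed sketch of the channel-based crawler being discussed.
func worker(url string, ch chan []string, fetcher Fetcher) {
	urls, err := fetcher.Fetch(url)
	if err != nil {
		ch <- []string{} // send exactly one item, even on failure
	} else {
		ch <- urls
	}
}

func master(ch chan []string, fetcher Fetcher) {
	n := 1                           // number of outstanding workers
	fetched := make(map[string]bool) // private to the master; no locks needed
	for urls := range ch {
		for _, u := range urls {
			if !fetched[u] {
				fetched[u] = true
				n += 1
				go worker(u, ch, fetcher)
			}
		}
		n -= 1 // one item read means one worker has finished
		if n == 0 {
			break // no outstanding workers: the crawl is done
		}
	}
}

func ConcurrentChannel(url string, fetcher Fetcher) {
	ch := make(chan []string)
	go func() {
		ch <- []string{url} // seed the channel so the master has something to read
	}()
	master(ch, fetcher)
}
```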
Notice that maybe the most interesting thing about this version is that the worker threads don't share any objects: there's no shared object between the workers and the master, so we don't have to worry about locking and we don't have to worry about races. Instead, this is an example of communicating the information rather than getting at it through shared memory.

Yes? So the observation is that the workers are modifying ch while the master is reading it. That's not the way the Go authors would like you to think about this. The way they want you to think about it is that ch is a channel, and a channel has send and receive operations: the workers are sending on the channel while the master receives on the channel, and that's perfectly legal; the channel is happy. What that really means is that the internal implementation of a channel has a mutex in it, and the channel operations are careful to take out that mutex when they're messing with the channel's internal data, to ensure the channel itself doesn't have any races. So channels are protected against concurrency, and you're allowed to use them concurrently from different threads; there's a small example of that just below.

Yes? No, we don't need to close the channel. The break statement is about when the crawl has completely finished and we've fetched every single URL. What's going on is that the master keeps this n value, its private counter. Every time the master fires off a worker, it increments n, and every worker it starts sends exactly one item on the channel. So every time the master reads an item off the channel, it knows that one of its workers has finished, and it decrements n. When the number of outstanding workers goes to zero, we're done. And once the number of outstanding workers goes to zero, the only reference to the channel is from the master, or really from the code that calls the master, so the garbage collector will very soon see that the channel has no references to it and will free the channel. So sometimes you do need to close channels, but I actually rarely have to close channels.
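Here is the small example promised above: many goroutines sending on one channel while a single goroutine receives, with no mutex anywhere in our own code. This is a self-contained illustration, not part of the crawler.

```go
package main

import "fmt"

func main() {
	ch := make(chan int)

	// Ten senders use the channel concurrently; the channel's own
	// internal locking makes this safe without any mutex of ours.
	for i := 0; i < 10; i++ {
		go func(id int) {
			ch <- id
		}(i)
	}

	// One receiver drains the ten sends, in whatever order they arrive.
	for i := 0; i < 10; i++ {
		fmt.Println(<-ch)
	}
}
```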
Can you say that again? Okay, so the question is: if the master reads from the channel right away, what puts the first item into the channel? You can see at line 106 that before calling master, ConcurrentChannel fires up a goroutine that shoves one URL into the channel, to get the whole thing started, because the master goes right into reading from the channel at line 89. There had better be something in the channel, or line 89 would block forever. So if it weren't for that little bit of code at line 107, the for loop at line 89 would block reading from the channel forever, and this code wouldn't work.

Yes? So the observation is: gosh, wouldn't it be nice to be able to write code that would notice that there's nothing waiting on the channel? And you can: look up the select statement. It's much more complicated than this, but the select statement allows you to proceed, to not block, if there's nothing waiting on the channel; there's a small sketch of that below.

Another question: suppose the URL we're fetching is really big, and the fetch itself takes a very long time. We launch the worker thread at line 94 and keep going, and we still haven't received anything on the channel from the worker; when we reach line 101 and check for anything else left in the channel, again there's nothing there, because the worker hasn't finished. And doesn't that also create concurrency issues? In Java, changing a list while you're in a for loop over it throws a concurrent-modification exception.

Okay, sorry, for the first question: I think what you're really worried about is whether we're actually able to launch fetches in parallel; the very first fetch won't be in parallel, because at first we haven't received any results. And the follow-up worry is that when we check at line 89 to see if there's anything left in the channel, we've already sent the work off, the worker is still trying, but we see nothing in the channel, so the for loop exits. That's not what happens: the for loop waits at line 89. The for loop at line 89 does not just loop over the current contents of the channel and then quit; the for loop at 89 may never exit. It's just going to keep waiting until something shows up in the channel. So if you don't hit the break at line 99, the for loop won't exit.

Alright, I'm afraid we're out of time. We'll continue this, but next up is actually a presentation scheduled by the TAs, which will talk more about Go.
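And here is the small sketch of select mentioned above: a non-blocking check of a channel using a default case. This is a generic, self-contained illustration, not code from the crawler handout.

```go
package main

import "fmt"

func main() {
	ch := make(chan []string)

	// Nothing has been sent on ch, so a plain receive (<-ch) would
	// block forever. A select with a default case checks the channel
	// and proceeds immediately if nothing is waiting.
	select {
	case urls := <-ch:
		fmt.Println("got a result:", urls)
	default:
		fmt.Println("nothing waiting on the channel right now")
	}
}
```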