This will be a very broad introduction to concurrency: what is it, and what are its issues? First off, we'll look at the rationale for concurrency: when and why would you want to split up a task into multiple threads of execution, whether using cooperating processes or multiple threads within a single process? Then we'll end by looking at how locks allow threads to share common resources without tripping each other up.

So what is a thread of execution? A thread is simply a serial execution of a piece of code: one instruction after another, with only one instruction executing at any one time. By default, an operating system process has just one thread, but a process may have many threads, and all of those threads share the resources of the process, including its memory space. The operating system schedules these separate threads to run just like it does separate processes, so on a multi-core system, the separate threads of a process may at times end up running simultaneously. Just like with separate processes, though, this is a decision left up to the operating system: a program is responsible for starting and stopping its threads, but it otherwise can't specify when they actually run.

So the question is, why use a multi-threaded process instead of just separate processes? Well, because threads of the same process share the process's resources, including its memory space, the threads all have equally fast, convenient access to the same data. When instead we use separate processes, the threads of each process have their own separate memory spaces. In fact, though, many operating systems, including Linux and Windows, allow processes to share all or part of their memory space with other processes, which achieves basically the same result: multiple threads, all with access to the same memory space. So arguably the real choice is whether your separate threads share a memory space or have their own separate memory spaces.

In the case where memory is not shared, when we have separate processes with their own memory spaces, the processes can still coordinate and share data using inter-process communication mechanisms, such as files, pipes, or network sockets. Sharing data through these mechanisms rather than through shared memory, however, is both less efficient and less convenient, which is why programmers often elect to instead create multiple threads within a single process. On the other hand, there is a virtue in processes sharing data only by copying, such that each process can only touch its own copies: this arrangement minimizes the sort of errors that commonly stem from threads sharing resources like memory.
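To make the shared-memory point concrete, here is a minimal sketch using POSIX threads, where a worker thread and the main thread communicate through an ordinary global variable, simply because they share the process's memory space. The variable name and its value are just placeholders for illustration.

```c
/* Threads of one process share its memory: the worker writes to a
   plain global, and the main thread reads the result directly, with
   no files, pipes, or sockets involved. */
#include <pthread.h>
#include <stdio.h>

int shared_result = 0;   /* ordinary memory, visible to every thread */

void *worker(void *arg) {
    shared_result = 42;  /* write straight into the shared memory space */
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);   /* wait for the worker to finish */
    printf("worker left %d in shared memory\n", shared_result);
    return 0;
}
```

Two separate processes could accomplish the same handoff only through an inter-process channel like a pipe, or by explicitly mapping a shared memory region.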
Now, you're probably wondering, why would you want to write concurrent code at all, using either threads or separate processes? What's the point? Well, there are two basic purposes of concurrency: what's called data parallelism and what's called task parallelism. In a problem that lends itself to data parallelism, you have some amount of work that can be neatly divided into separate chunks. In the real world, for example, if you want to dig 100 holes, that job can be neatly divvied up: instead of one worker digging all 100 holes in sequence, you can have 10 workers digging 10 holes each and thereby get the job done 10 times faster, or you can have 100 workers dig one hole each and thereby get the job done 100 times faster. In this example, the 100 spots on the ground where you want the holes are the data that can be operated upon in parallel, hence data parallelism.

In computation, you might have a job like, say, adding together 100 numbers, and because addition is commutative and associative, you can neatly split the work between separate threads: one thread adds together 50 of the numbers, and the other thread adds together the other 50. You then take both results and add them together to get your sum of all 100 numbers (we'll see this sketched in code below). By splitting most of the work into separate jobs that can run in parallel, we can complete the task in significantly less time, assuming a system with multiple cores to run the threads simultaneously. So data parallelism is about performing the same kind of work on the many items in a collection of data: we have all this stuff, and we want to perform basically the same operation on each thing, so we split the stuff up into chunks to be processed in separate threads.

In task parallelism, in contrast, the work we want to do concurrently, the work we want to do in separate threads, is not uniform: we want to do different tasks in parallel. For example, in manufacturing, many separate factories can simultaneously produce different components that all get sent to one factory to be assembled into one product. In computing, you might have a program that needs to perform a long-running computation in one thread but which also wants to send data across the network in another thread, so that the data can be sent without interrupting or waiting upon the computation.

An especially common use of multi-threading is in interactive applications, where the user wants a responsive interface even while the program performs some long-running task. For example, while a video editing program processes a video, the user might hit the cancel button. If the cancel button click were handled in the same thread as the processing, the cancel button wouldn't do anything until the processing finishes, defeating the whole purpose of a cancel button. So such a cancel button is handled by another thread, which upon the user's click terminates the processing job in the other thread. This example shows that concurrency isn't entirely about improving performance by splitting workloads across processor cores: it also tends to better fit systems where multiple things are going on at the same time, particularly systems with a human user who wants a responsive interface.

Be clear, though, that concurrency is not strictly necessary for such systems. Video games, for example, were programmed for most of their history as single-threaded programs. The responsiveness, the interactiveness of a video game is achieved with what's called a game loop. The idea of the game loop is that whatever work our game must do (get user input, simulate the game world, render the screen, render the sound) is all done in sequence, over and over again, many times per second, say 30 or 60 times per second. This, of course, imposes a strict limit on how much processing time each one of those tasks can take. If you want the game to maintain a 60-frames-per-second target, the loop must iterate at least 60 times per second, and so the work done in each iteration must add up to less than one sixtieth of a second. When the game needs to do a task that may take longer than a fraction of a second, like, say, opening a file, that work must be broken up over multiple iterations of the loop.
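Before moving on, here is the two-thread summation from earlier sketched in code, as a minimal illustration using POSIX threads. The array contents and the even split into two halves are just assumptions for the example, not anything prescribed by the technique.

```c
/* Data parallelism: two threads each sum half of an array of 100
   numbers, and the partial sums are combined at the end. */
#include <pthread.h>
#include <stdio.h>

#define N 100
long numbers[N];

struct chunk { int start, end; long sum; };

/* Each thread runs this on its own chunk of the array. */
void *sum_chunk(void *arg) {
    struct chunk *c = arg;
    c->sum = 0;
    for (int i = c->start; i < c->end; i++)
        c->sum += numbers[i];
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++)
        numbers[i] = i + 1;                 /* the numbers 1 through 100 */

    struct chunk lo = { 0, N / 2, 0 };      /* first 50 numbers  */
    struct chunk hi = { N / 2, N, 0 };      /* second 50 numbers */

    pthread_t t1, t2;
    pthread_create(&t1, NULL, sum_chunk, &lo);
    pthread_create(&t2, NULL, sum_chunk, &hi);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("total = %ld\n", lo.sum + hi.sum);   /* prints 5050 */
    return 0;
}
```

The same pattern scales to more threads by cutting the array into more chunks, one per core.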
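And here is the shape of a single-threaded game loop, as a bare-bones sketch. The per-frame task functions are hypothetical empty stubs standing in for real input polling, simulation, and rendering, and the fixed sleep is a crude stand-in for proper frame pacing.

```c
/* A single-threaded game loop: do all the per-frame work in sequence,
   over and over, aiming for 60 iterations per second. */
#include <stdio.h>
#include <time.h>

/* Hypothetical placeholders; a real game would poll input devices,
   step the simulation, and draw to the screen here, and each task
   must finish well within the frame's time budget. */
static void get_input(void) { }
static void simulate(void)  { }
static void render(void)    { }

int main(void) {
    const struct timespec frame = { 0, 16666667 };  /* ~1/60 second */
    for (int i = 0; i < 60; i++) {      /* one second's worth of frames */
        get_input();
        simulate();
        render();
        nanosleep(&frame, NULL);        /* sleep out the rest of the budget */
    }
    puts("ran 60 frames");
    return 0;
}
```

A real loop would run until the player quits, and would measure how long the frame's work actually took rather than sleeping a fixed interval.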
So that is the traditional structure of how to program a game, and it is still present in some form in basically every game made, except that today, increasingly, a lot of the work is being farmed out into separate threads, largely for the sake of taking advantage of multi-core processors. As long as your program is single-threaded, it is only ever using one of the cores in your system, so concurrency is actually necessary to take full advantage of today's hardware.

Now, an important thing to make clear is that not all problems lend themselves to parallelism. You can't just take any problem and split up its workload such that it can be done in parallel, because many problems have sequential dependencies: certain steps have to be done before others, because the work produced by the earlier steps is necessary to do the later steps. For example, when you make bread, you can't make the dough and bake the dough at the same time. You have to make the dough first and then bake it, because obviously the baking step requires the product of the make-the-dough step. In computing, a lot of algorithms have these kinds of sequential dependencies: data produced by one step of the algorithm is used in a later step, so until those earlier steps are completed, the later steps can't proceed.

What makes concurrency so problematic, so error-prone, is that the independent actors, which are our separate threads or separate processes, may have some number of resources in common, whether we're talking about a file, data in memory, or some piece of hardware. Hardware resources, though, are generally the least problematic, because the operating system handles most of those issues for us. All interactions with input-output devices are done through system calls, and it is the responsibility of the operating system to resolve conflicting requests. Say a process requests to use an optical disk drive while another process is using it: the operating system will make the process wait, or maybe tell it to try again later. And of course, the sharing of the CPU cores is handled entirely transparently to the processes and threads.

The real problem area is memory shared between threads of execution. And be very clear, the issue with shared memory is a software problem, not a hardware problem. If two simultaneously executing threads both access the same memory address, where one reads and the other writes, that conflict is resolved at the hardware level: either one writes before the other reads, or vice versa. But otherwise, the two threads just proceed as normal; they don't both get stuck as if they were trying to walk through the same doorway at the same time. Instead, the actual danger with concurrency is that a thread might modify some piece of data in shared memory in a way that's unexpected by other threads, likely leading to errors. This is analogous to what happens in real life when, say, someone leaves a toilet seat up or down, and someone else comes along and either pees on the seat or sits in the bowl because they made a false assumption about the state of the seat without checking.

Humans can commonly, if not reliably, avoid these kinds of errors because humans can notice things: while we generally assume that the state of the world is the same as we last observed it, we have the ability to unconsciously notice things out of place. Computers have no such faculty. The logic of code is always grounded upon certain things being in certain states in certain places. If you change the name of a file by one letter, the code doesn't notice, hey, this looks like the same thing just with one letter changed, not unless the code is explicitly written to look for files that are one letter off. Code can only notice things which it is explicitly coded to check for.
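Here is that danger in code: a minimal sketch, assuming two POSIX threads that each increment a shared counter a million times with no synchronization. The increment is a read-modify-write, so one thread can overwrite the other's update, and the final count typically comes up short.

```c
/* A data race: two threads increment the same counter with no
   synchronization, so increments get lost. The final value is
   usually well under the expected 2000000. */
#include <pthread.h>
#include <stdio.h>

long counter = 0;   /* shared between both threads */

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++)
        counter++;  /* read, add one, write back: not atomic */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expected 2000000)\n", counter);
    return 0;
}
```

Each thread reads the counter, assumes it stays put, and writes back a value that may already be stale: the toilet seat problem, in memory.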
So the key question is, how can a thread of execution be sure that the data it shares with other threads doesn't get unexpectedly messed with behind its back? The general solution is to use what are called locks. A lock is simply a temporary assertion of privilege over a piece of data: hey, no one else touch this thing while I'm still using it. But note that the term lock is a bit of a misnomer. In the real world, if I lock something, that actually physically prevents access: if I lock my safe, you can't get at the contents even if you try. A software lock, though, is merely advisory. If I lock a piece of data, I'm just asking you to please not touch my data until I unlock it. A software lock is much more like a please-do-not-disturb placard hung from a hotel doorknob.

The two operations of a lock are most commonly called wait and signal. When I want to use a resource protected by a lock, I invoke wait on the lock associated with that object, and when I'm done with the object, I invoke signal. If the lock is currently held by another thread when your thread invokes wait, then your thread will be made to wait, hence the name. Most commonly, wait will either put the thread to sleep or will actually put it into the blocked state, such that the operating system will not schedule it until the thread is unblocked. And this is why the unlocking operation is called signal: it not only releases the lock, it signals all other threads waiting on the lock by unblocking them. Once unblocked, those waiting threads try to acquire the lock again, but of course only one of them will succeed, so the rest will block again. Just like when the occupant of a bathroom stall vacates: only one of the people waiting goes in and locks the door, and all the others keep waiting.

Now, when implementing a lock's wait operation, we want to make sure the testing of the lock's availability and the acquiring of the lock are done atomically, meaning done together as one indivisible instruction. Otherwise, what might happen is that in the gap between the testing of the lock and the acquisition, some other thread might happen to acquire the lock, leaving us with two threads holding the same lock at the same time, which is not supposed to happen. So the implementation of a wait operation hinges upon CPU instructions which do both a test and a modification of memory in one instruction. A test-and-set instruction, for example, sets a bit and hands back its previous value in one indivisible step, so a thread can tell whether it was the one that flipped the bit from clear to set, and thereby acquired the lock.
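To see locks in action, here is that same racy counter, now protected by a lock, again as a minimal sketch using POSIX threads. Note that the pthreads API names the two operations lock and unlock rather than wait and signal, but the behavior is as described above: a thread that finds the mutex held is blocked until the holder releases it.

```c
/* The racy counter from before, fixed with a lock: only one thread
   at a time may read-modify-write the counter. */
#include <pthread.h>
#include <stdio.h>

long counter = 0;
pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&counter_lock);    /* the "wait" operation */
        counter++;                            /* safely ours until... */
        pthread_mutex_unlock(&counter_lock);  /* ...the "signal"      */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);       /* reliably 2000000 */
    return 0;
}
```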
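And here is a sketch of how a wait operation can be built on an atomic test-and-set, using C11's atomic_flag. This is a spinlock: rather than blocking the thread, it just retries in a loop, which is simpler to show than a blocking lock but wasteful of CPU time while waiting.

```c
/* A spinlock built on test-and-set. atomic_flag_test_and_set sets the
   flag and returns its previous value in one indivisible step, so the
   test and the acquisition can never be split apart by another thread. */
#include <stdatomic.h>

atomic_flag lock_flag = ATOMIC_FLAG_INIT;

void spin_wait(atomic_flag *flag) {
    /* If the previous value was "set", someone else holds the lock:
       keep retrying. If it was "clear", we are the thread that flipped
       it, so the lock is now ours. */
    while (atomic_flag_test_and_set(flag))
        ;  /* spin */
}

void spin_signal(atomic_flag *flag) {
    atomic_flag_clear(flag);  /* release, letting another test-and-set succeed */
}
```

Swapping these in for the mutex calls in the previous sketch would protect the counter just as well, at the cost of burning CPU whenever a thread has to wait.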
When using locks, there are two issues to keep in mind: granularity and deadlock. Granularity refers to the scope of the lock: how long the lock is held for, and how encompassing the locked resource is. A fine-grained lock is a lock which encompasses the bare minimum resource needed and which is held for the bare minimum time needed; a coarse-grained lock is a lock which encompasses more than the bare minimum. Say that multiple people are trying to simultaneously use the same set of tools. If one person told the others, hey, don't use this particular hammer for the next 10 minutes because I'm going to use it, that would be a fine-grained lock. Whereas if one person told the others, hey, don't use any of these tools for the next week because I might use some of them, that would be a very coarse-grained lock.

The obvious problem with coarse-grained locks is that they unnecessarily prevent others from doing work. If one person needs to use the pliers while another person only needs to use the hammer, it would be wasteful if they could not both do so simultaneously, but that is just what a coarse-grained lock might prevent. The same thing can happen in multi-threaded code: if one thread locks more resources than it really needs, or for longer than it really needs, other threads might have to wait unnecessarily for the resources they need. Using fine-grained locks, locking the bare minimum of what you really need, solves this problem, but at the complexity cost of using more locks more frequently, which is not only more burdensome to code but may also lead to a higher chance of what's called deadlock.

Deadlock refers to a situation in which two or more threads are all waiting on each other, because each is waiting on a lock which another one holds. Say we have two threads, one of which has acquired a lock x, and the other of which has acquired a lock y. Now imagine that both of these threads, while holding onto these locks, attempt to acquire the lock which the other one has. The thread with lock x would be waiting for lock y, and the thread with lock y would be waiting for lock x. And now, because both threads are waiting on each other to release the lock they want, neither is ever going to release its own lock, so they're both stuck in deadlock.

The simplest technique for avoiding deadlock is to impose a locking discipline, such that all threads acquire locks in a set order and then release them in the reverse order from which they were acquired. So for example, say we have locks a, b, c, and d, and that this is their established order. The rule is that as long as a thread holds onto a lock, it should not acquire a lock earlier in the order. So given locks a, b, c, and d, as long as a thread holds onto lock c, it should not acquire locks a or b, but it may acquire d. And if the thread does acquire d, then it must release d before c, because that is the reverse of the order in which they were acquired. So to be clear, the only way a thread could hold all four locks at the same time is if it first acquired a, then b, then c, then d, and it would then release them in the order d, then c, then b, then a. Also be clear that a thread can skip over locks in the order: for example, a thread could acquire lock b, then d, without acquiring c. It's just that once that thread has d, it can't acquire c until it releases d. Again, the rule is that as long as you hold a lock, you should not acquire a lock that's earlier in the sequence, and the locks must be released in the reverse order from which the thread acquired them. If your threads follow this discipline, deadlock will not occur.

So this solution is quite simple in theory. The trouble is that, in practice, coding it correctly is quite error-prone. This technique can also easily lead to excessively coarse-grained locking, thereby putting a damper on efficiency and thus possibly defeating the whole purpose of using concurrency.
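Here is the ordering discipline as a minimal sketch with two POSIX mutexes. The names lock_a and lock_b and the empty critical section are placeholders for illustration; the point is that every thread acquires in the order a then b and releases in the order b then a, so the circular wait that produces deadlock can never form.

```c
/* Deadlock avoidance by lock ordering: every thread acquires lock_a
   before lock_b and releases them in reverse order. */
#include <pthread.h>
#include <stdio.h>

pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;  /* first in the order  */
pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;  /* second in the order */

void *worker(void *arg) {
    pthread_mutex_lock(&lock_a);      /* acquire in order: a, then b */
    pthread_mutex_lock(&lock_b);

    /* ... use the resources both locks protect ... */

    pthread_mutex_unlock(&lock_b);    /* release in reverse: b, then a */
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    puts("both threads finished, no deadlock");
    return 0;
}
```

If one thread instead took b before a while the other took a before b, each could end up holding the lock the other needs, which is exactly the x and y scenario described above.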