The Unity Job System enables us to utilize multiple cores in our game code, but without the usual headaches of writing multi-threaded code. In this video, I'll discuss only the Job System and not ECS, which is independent of the Job System, though the two are highly complementary when used together. When we write multi-threaded code without the Job System, we're required to create and manage threads ourselves, as well as to manually synchronize access to data that is shared between the threads. The Job System instead creates and manages a pool of threads for us, usually one per CPU core. We then create units of work called jobs and add them to the Job System queue. The Job System then farms these jobs out from the queue to the pool of threads as the threads become available. It's left up to the Job System to decide precisely which jobs run on which threads in what order, and once a job is running on a thread, it is never preempted by other jobs. While a job runs, its thread is occupied until the job finishes. Primarily for this reason, the Job System is not appropriate for doing IO work. If a job were to wait on IO, its thread would sit wastefully idle during that time, thwarting our goal of maximizing utilization of the cores. So jobs are intended only for doing computation upon in-memory data. Now, it may sometimes be the case that two or more concurrently scheduled jobs access the same shared data. If the shared data is only read, this is not a problem, but if one or more of the jobs mutate the data, we usually want to ensure these jobs run in a certain order and do not overlap in their execution. We can do this in the Job System by specifying dependencies between the jobs. If we tell the Job System that job A is dependent upon job B, then the Job System will make sure job B finishes executing before job A starts. A single job can have multiple direct dependencies, and those dependencies can have their own dependencies.
However many dependencies a job has, it will not start executing until all of its dependencies have finished. So the general idea is that by splitting our workload into separate jobs and creating dependencies between them only where needed, we can maximize the utilization of the cores. But how do we create and schedule jobs? Well, a job is represented as a struct that implements the IJob interface, which has a single method, Execute. When a job is run, its Execute method is called, and when Execute returns, the job has finished. Alternatively, a job struct could instead implement IJobParallelFor, which we'll discuss later. When a job is run, it's usually run in a native thread, meaning a thread not managed by the .NET runtime. Just before calling Execute, the job struct is copied, and the job is meant to access only this private copy of the struct, not any other data. Accessing managed objects from a native thread is dangerous because it might interfere with garbage collection. So as a rule, a job struct can only have fields which are blittable types or which are native container types. What C# calls a blittable type is a data type which can be simply copied byte by byte between managed code and native code. Non-blittable types, in contrast, would require extra steps, mainly pinning the references so as not to interfere with garbage collection. The basic number types, like ints and floats, are blittable, but not chars or bools. One-dimensional arrays of blittable types are themselves blittable, as are structs that contain only blittable fields. However, class instances and other managed objects are not blittable, so we cannot include them in our jobs. Unity provides a set of native container types which are not technically blittable, but a special allowance is made for them in jobs. The native container types include NativeArray, NativeHashMap, and a few other basic data structures that cover most common use cases.
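As a sketch, a legal job struct might look something like this (the struct and field names here are illustrative, not from the video):

```csharp
using Unity.Collections;
using Unity.Jobs;

// A job struct may only contain blittable fields and native containers.
public struct AddAmountJob : IJob
{
    public int Amount;               // blittable field: allowed
    public NativeArray<int> Values;  // native container: allowed
    // public string Label;          // managed object: NOT allowed in a job

    public void Execute()
    {
        for (int i = 0; i < Values.Length; i++)
            Values[i] += Amount;
    }
}
```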
You can implement your own native container types if the stock set is inadequate. What a native container is, is a struct type with a pointer to natively allocated memory where the actual contents are stored. A NativeArray, for example, is a struct with a pointer to natively allocated memory, and the contents of the array, its elements, are stored in that native memory. Because the garbage collector has no awareness of the natively allocated memory, it is our responsibility to deallocate the memory by calling the native container's Dispose method when we no longer need it. When creating a native container, we specify one of three options for how to allocate the memory. The Persistent option allocates using malloc, meaning an actual system call might get triggered, which can be quite slow. The Temp and TempJob options allocate from pre-malloc'd chunks of memory and so are generally faster, but they are not intended for long-lived allocations. A Temp-allocated native container will throw an exception if it is not disposed in the same frame in which it was allocated. A TempJob-allocated native container will throw an exception if it is not disposed within four frames of being allocated. Understand, however, that the checks for these exceptions are fairly expensive, and so they run in the editor but are disabled in standalone builds of the game. Regardless, you should always respect the intended allocation lifetime limits. Now, if a job can't perform IO or access any data other than its own fields, the only way a job can do useful work is by mutating one or more of its native container fields. Although a job effectively accesses its own private copy of the struct, when the native container fields are copied, their pointers still point to the same underlying memory. Thus, mutations of native containers in a job are visible outside the job.
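A quick sketch of the three allocation options and manual disposal (the variable names are illustrative):

```csharp
using Unity.Collections;

// The garbage collector won't free these, so each must be disposed manually.
var longLived = new NativeArray<int>(100, Allocator.Persistent); // malloc; may live many frames
var thisFrame = new NativeArray<int>(100, Allocator.Temp);       // must be disposed this frame
var forJobs   = new NativeArray<int>(100, Allocator.TempJob);    // must be disposed within 4 frames

thisFrame.Dispose();
forJobs.Dispose();
longLived.Dispose();
```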
For example, if the purpose of a job is to calculate a single number, we create a NativeArray of length 1 and set the array as a field of the job before scheduling it. In the job, we store the calculated number in the single slot of the array, and after the job finishes, we can read the result from the array. Once we've created a job struct, we put the job on the queue by calling the Schedule extension method, which returns a job handle. Passing job handles to Schedule makes them direct dependencies of the newly scheduled job. Because we need the handle of a job to use it as a dependency, the job must already be scheduled, and so it's effectively impossible to create circular dependencies: we can't make job A a dependency of job B while also making job B a dependency of A. And we wouldn't want to do so, because then both jobs would be deadlocked, each waiting forever for the other to start running. Having scheduled a job, at some point we'll want to make sure that it has finished. That's when we call the Complete method on the job handle. Complete does three things. First, Complete marks the job as ready for execution. When a job is scheduled, it will not be executed until it's marked ready. As we'll show later, you can also ready jobs by calling the ScheduleBatchedJobs method. Second, Complete will not return until the job has finished executing. If the job happens to have already finished, Complete will return immediately, but otherwise it will wait. Now, jobs usually run on background worker threads, but when we call Complete, it may be the case that the job or its dependencies have not yet started executing. In such cases, the Job System may elect to run the jobs on the main thread rather than letting the main thread wastefully sit idle. Third, Complete removes all record of the job from the queue. Forgetting to complete a job effectively creates a resource leak, because the record of the job otherwise won't get removed from the queue.
Two more important notes on Complete. If a job has already been completed, subsequent calls to Complete do nothing. Also, completing a job first completes its dependencies and all of their dependencies, recursively. So, having created and scheduled a long chain of dependencies, you need only complete the root of the chain rather than complete each job in the chain individually. Finally, here is a simple example of creating, scheduling, and completing a job. The example is very artificial but demonstrates the steps. At the bottom, we have defined a struct called MyJob. It implements the IJob interface and so has an Execute method that takes no arguments and returns void. This simple job has just one field, a NativeArray of ints, and in the Execute method, the job simply increments the value at index 4 of the array. Above, in the Start method of a MonoBehaviour, we're creating a NativeArray of five ints, assigning the value 99 to its index 4, and incrementing that value. So, when we print it out immediately after, it has the value 100. We then create a MyJob instance, assigning the native array to its single field, schedule the job, and then call Complete on the returned job handle. The call to Complete readies the job for execution, the job is then executed, incrementing the value in the array, and when Complete returns, all record of the job has been removed from the queue, and we can be sure that it has finished executing. When we next print out the value at index 4 of the array, it is now 101. Having completed this job, we don't need the array anymore, so we dispose of it. Note that we chose to use the Persistent allocator here, but we could just as well have used the Temp or TempJob allocators. Generally for jobs, we prefer to use TempJob rather than Temp, and we only use Persistent for jobs that run over many frames. Also note that this job requires a NativeArray of at least five ints.
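The code for this example might look something like the following sketch (the field name is my guess, not necessarily what was shown on screen):

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

public class JobExample : MonoBehaviour
{
    void Start()
    {
        var nums = new NativeArray<int>(5, Allocator.Persistent);
        nums[4] = 99;
        nums[4]++;
        Debug.Log(nums[4]);            // 100

        var job = new MyJob { Nums = nums };
        JobHandle handle = job.Schedule();
        handle.Complete();             // readies the job, waits for it, removes it from the queue

        Debug.Log(nums[4]);            // 101
        nums.Dispose();
    }
}

public struct MyJob : IJob
{
    public NativeArray<int> Nums;

    public void Execute()
    {
        Nums[4]++;                     // requires an array of at least five ints
    }
}
```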
When we run in the editor, an exception is thrown if we access an array out of bounds, and so we would get an exception here if the array of the job had fewer than five ints. However, no bounds checks are performed in standalone builds, and so this code would dangerously access memory outside the bounds of the array, given an array with fewer than five ints. In this next example, our job struct at the bottom now has an additional int field called val, and the Execute code multiplies index 0 of the array by val. Above in the Start method, this time the array has just one int and is allocated with TempJob. We assign 3 to the single int of the array and create two instances of MyJob. Both jobs are assigned the same array, but the first is given 2 for its val, and the second is given 5 for its val. Because the first job is completed before the second is scheduled, the first job will always finish before the second starts. When the first job runs, the 3 in the array is multiplied by 2, giving us 6, and then when the second job runs, 6 is multiplied by 5, giving us 30. So after completing the second job, the value stored in the array is 30. If, though, we try scheduling both jobs before completing either, we'll get an exception running the game in the editor when we try to schedule the second job. The Job System safety checks notice that the second job uses the same native array as another job that's already sitting on the queue, and neither job is a dependency of the other. If the Job System allowed this, it would be dangerously indeterminate which job ran before the other, and the jobs could also dangerously run at the same time. For jobs that share mutable data, we almost always want to ensure that they run in a set sequence. In this case, if the second job were to run before the first, we'd get the same result, 30, because multiplication is commutative, but in many other cases, ordering jobs differently would produce different results.
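A sketch of what this second example might look like (again, the names are my guesses):

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

public class TwoJobsExample : MonoBehaviour
{
    void Start()
    {
        var nums = new NativeArray<int>(1, Allocator.TempJob);
        nums[0] = 3;

        var jobA = new MyJob { Nums = nums, Val = 2 };
        var jobB = new MyJob { Nums = nums, Val = 5 };

        jobA.Schedule().Complete();  // 3 * 2 = 6
        jobB.Schedule().Complete();  // 6 * 5 = 30

        Debug.Log(nums[0]);          // 30
        nums.Dispose();
    }
}

public struct MyJob : IJob
{
    public NativeArray<int> Nums;
    public int Val;

    public void Execute()
    {
        Nums[0] *= Val;
    }
}
```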
To make the Job System happy here, we should make the first job a dependency of the second. The safety checks then won't throw any exception, and the Job System will ensure that the first job finishes executing before the second begins. And thanks to the dependency, we need only explicitly complete the second job, because doing so first transitively completes its dependency, the first job. It's also the case that while a job sits on the queue, we shouldn't access any of its native containers from the main thread, and in the editor, safety checks will throw an exception if we do so. Only once the job has been completed, and so removed from the queue, should the main thread again access its native array. To make this code correct, the main thread would access the array only after the job has been completed, not before. As a rule, jobs should only be scheduled and completed on the main thread. Though scheduling and completing jobs within other jobs might seem useful in some scenarios, it was decided that allowing either would be too error-prone and would create too many complications. As a guideline, it's generally best to schedule jobs as soon as we can and then wait to complete them only when we absolutely need them completed. In our examples so far, we've completed jobs right after scheduling them, but this tends to defeat our goal of maximizing utilization of the CPU cores. More realistically and more typically, we often want to schedule jobs at the start of a frame and then only complete them at the end of the same frame. In this example, we're scheduling a job in the Update method but then waiting to complete the job in LateUpdate, which runs after the Updates of every MonoBehaviour. In other cases, we may even want to complete a job in a later frame, allowing the job's workload to be spread across multiple frames. As mentioned, there's no possible logical conflict between jobs if the data they share is only read and not written.
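Expressing that dependency when scheduling might look something like this sketch (assuming a MyJob struct with Nums and Val fields whose Execute multiplies Nums[0] by Val):

```csharp
var nums = new NativeArray<int>(1, Allocator.TempJob);
nums[0] = 3;

var jobA = new MyJob { Nums = nums, Val = 2 };
var jobB = new MyJob { Nums = nums, Val = 5 };

JobHandle handleA = jobA.Schedule();
JobHandle handleB = jobB.Schedule(handleA); // jobA is now a dependency of jobB

// Completing jobB first transitively completes jobA,
// so jobA always runs before jobB.
handleB.Complete();

Debug.Log(nums[0]); // 30
nums.Dispose();
```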
So, we can mark native container fields of a job with the [ReadOnly] attribute. The Job System will then not consider it a conflict to schedule multiple unrelated jobs that only read from the same native container. Without this attribute, we would be required to make the jobs dependencies of each other, meaning their executions couldn't run in parallel. When running in the editor, safety checks will throw an exception if we modify a read-only native container. In this example, the input array is marked [ReadOnly], and so mutating its contents triggers an exception. Sometimes we might want to complete multiple jobs at the same point, but calling Complete on one job after the other may force them to run in an order that makes sub-optimal use of the CPU cores. By instead using the CompleteAll method, we can wait for multiple jobs to complete but allow the Job System to choose an order of completion that may be more optimal. The Schedule method only accepts one job handle as a dependency, so to give a job multiple direct dependencies, we need the CombineDependencies method, which combines multiple handles into one virtual handle. The combined handle doesn't represent an actual job, but if we schedule a job with this combined handle as its dependency, then the new job will wait for A, B, and C to all finish executing before it starts. And be clear that A, B, and C do not have to be dependencies of each other, so they can still run in parallel with each other. The IJobParallelFor interface is like IJob, but the Execute method has an int parameter. When we schedule an IJobParallelFor, we specify a count and a batch size. Behind the scenes, the job is split into sub-jobs, each of which calls Execute with a different range of indexes. Here, for example, the count is 100 and the batch size is 20, so the job is split into 5 sub-jobs. These sub-jobs are individually taken off the queue and executed like normal jobs.
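A sketch of a read-only field and of combining dependencies (the job and handle names are hypothetical; handleA, handleB, and handleC are assumed to come from previously scheduled jobs):

```csharp
using Unity.Collections;
using Unity.Jobs;

public struct SumJob : IJob
{
    // Multiple unrelated jobs may safely read Input in parallel.
    [ReadOnly] public NativeArray<float> Input;
    public NativeArray<float> Output;

    public void Execute()
    {
        float sum = 0f;
        for (int i = 0; i < Input.Length; i++)
            sum += Input[i];
        Output[0] = sum;  // writing to Input here would trigger a safety exception in the editor
    }
}

// Elsewhere: give a job three direct dependencies by combining their handles.
// JobHandle combined = JobHandle.CombineDependencies(handleA, handleB, handleC);
// JobHandle handleD = jobD.Schedule(combined);  // D starts only after A, B, and C finish
```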
The first sub-job calls Execute 20 times with indexes 0 through 19. The second sub-job also calls Execute 20 times, but with indexes 20 through 39. The third sub-job also calls Execute 20 times, but with indexes 40 through 59, and so forth for the other sub-jobs. Now, understand the count need not be evenly divisible by the batch size, in which case the last sub-job will make fewer Execute calls than the other sub-jobs. However the work is split across however many sub-jobs, the job is only considered finished once all of its sub-jobs have finished. What a ParallelFor job effectively allows us to do is conveniently split a workload across multiple jobs while only explicitly creating and scheduling one job. This example increments all 100 indexes of the array, splitting the work across 5 sub-jobs that can run in parallel on separate cores. Note that the array is effectively split into sub-ranges, one for each sub-job, and it would be improper for a sub-job to access indexes of the array outside its own sub-range. In fact, the job really shouldn't access any index of the array other than the one passed to the int parameter. Now, you might be concerned that a regular job has a single call to Execute, but a ParallelFor job can have many, and those many method calls may induce a lot of overhead. This is not actually a concern, because the overhead gets optimized away by the compiler. Speaking of compilers, lastly, another feature of Unity's data-oriented technology stack is the Burst compiler, Unity's special optimizing C# compiler that can aggressively utilize SIMD instructions. SIMD stands for single instruction, multiple data, and these are instructions in modern CPUs that perform operations en masse, particularly math operations on floating-point values.
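A sketch of this ParallelFor example (names are my guesses; nums is assumed to be a NativeArray of 100 ints created earlier):

```csharp
using Unity.Collections;
using Unity.Jobs;

public struct IncrementJob : IJobParallelFor
{
    public NativeArray<int> Nums;

    // Called once per index; each sub-job covers one batch of indexes.
    public void Execute(int index)
    {
        Nums[index]++;   // touch only the index we're given
    }
}

// Elsewhere: a count of 100 with a batch size of 20 yields five sub-jobs.
// var job = new IncrementJob { Nums = nums };
// JobHandle handle = job.Schedule(100, 20);
// handle.Complete();
```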
A typical SIMD instruction might say, multiply four pairs of floating-point numbers, and it does so in significantly fewer CPU cycles than if you were to multiply the four pairs individually using regular instructions. Unity's regular C# compiler, the Mono compiler, is not very aggressive about utilizing SIMD instructions because of certain design choices in C#. The Burst compiler, though, only works on a subset of C# that Unity calls High-Performance C#, and it only works on job code. So by moving your workloads into jobs and sticking to this subset of C#, you can take advantage of the Burst compiler, which often yields performance gains in the range of 2x to 10x, or sometimes even more. It's about as close as you're going to get to free performance. Currently, Burst is only available in preview as an optional package in the Package Manager, and once installed, you enable Burst compilation on each job with the [BurstCompile] attribute. Burst does not yet support debugging, so you'll need to disable Burst compilation if you want to debug a job. There's not much you really need to know about Burst to use it, but I'll cover more details in a later video.
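Opting a job into Burst is as simple as adding the attribute, as in this sketch (the job itself is illustrative):

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

[BurstCompile]   // compile this job's Execute with Burst instead of Mono
public struct MultiplyJob : IJob
{
    public float Factor;
    public NativeArray<float> Values;

    public void Execute()
    {
        // A loop like this is a good candidate for SIMD vectorization.
        for (int i = 0; i < Values.Length; i++)
            Values[i] *= Factor;
    }
}
```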