In this video, we're going to explain four strategies to address what we call input-output architectures. The underlying question behind all four strategies revolves around a single issue: how much work is done by the CPU in an input-output operation? The situation we have is a system with a CPU that is in charge of executing machine instructions as fast as possible, but at the same time there are input-output devices that need to perform certain operations.

Let's start with the first strategy, a very simple one called polling. We'll call it number one, and it's very easy to explain because the premise is that the CPU takes care of everything. First it has to ask each device whether an operation is pending, then it initiates the processing, and it also carries out the transmission of data. So this is simply the strategy in which most of the work is done by the CPU. It is highly inefficient: if the system has many devices, asking each one whether an operation is pending requires the CPU to spend a lot of time, and if the number of devices is very large, most of the CPU's time can end up wasted just on asking devices whether an operation is pending.

The second strategy focuses on reversing that first step, and it's called interrupts. With interrupts, which we'll call number two, the idea is that even though the CPU has to keep executing machine instructions continuously, whenever an input-output device requires attention, instead of the CPU asking, the device notifies the CPU, and the CPU temporarily stops the execution of instructions and performs what is called an interrupt service routine. So the CPU still does most of the work, except for the step of asking whether the device is ready to perform some input-output operation.
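The polling strategy described above can be sketched as a toy Python simulation. The device names, the one-byte transfers, and the `poll_cycle` helper are all illustrative assumptions, not a real driver; the point is that the CPU pays one check per device on every pass, whether or not that device has anything pending.

```python
# Toy simulation of polling: the CPU asks every device whether an
# operation is pending, and also performs the data transfer itself.
# Device names and byte-at-a-time transfers are illustrative assumptions.

class Device:
    def __init__(self, name, pending_data=None):
        self.name = name
        self.pending_data = pending_data   # bytes waiting, or None

    def has_pending(self):
        return self.pending_data is not None

def poll_cycle(devices, memory):
    """One polling pass: the CPU visits every device and does all the work."""
    checks = 0
    for dev in devices:
        checks += 1                        # CPU time spent just asking
        if dev.has_pending():
            memory.extend(dev.pending_data)  # CPU performs the transfer too
            dev.pending_data = None
    return checks                          # cost grows with the device count

devices = [Device("kbd", b"a"), Device("mouse"), Device("nic")]
memory = bytearray()
checks = poll_cycle(devices, memory)
print(checks, bytes(memory))               # 3 checks, though only one device had data
```

With many devices, `checks` grows on every pass even when nothing is pending, which is exactly the inefficiency the next strategies remove.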
The device is the one notifying. The first step that occurs here is that the interrupt is detected; in other words, the CPU detects that one device requires its attention immediately. Because the CPU is going to stop what it is doing and start executing this interrupt service routine, it first has to save the execution context. The CPU's internal resources, registers, internal flags, and so on, are now going to be used by the interrupt service routine, so we need to store a copy of this context in memory. That way we can restore the situation when we are done executing the interrupt service routine, and the CPU can proceed executing the program normally.

The next step is one of the most important ones. There is a table which is checked, called the interrupt vector table (IVT), and it is used to perform a lookup using the number of the interrupt that was received by the CPU. The lookup identifies one slot in the table, and each slot contains the memory address at which the corresponding interrupt service routine is stored. So the interrupt vector table, which is itself located in memory, holds in each of its slots the address of the ISR, or interrupt service routine, that will take care of the operations required by each input-output device hooked to the system.

After the interrupt service routine has executed, we have to return the CPU to the way it was working before, which means we first restore the previously saved context. In other words, we leave the CPU exactly as we found it, as if the interrupt service routine had never executed; it leaves no trace on the execution context. Then the microprocessor returns to regular execution.
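The save/lookup/execute/restore sequence above can be modeled in a few lines of Python. This is a sketch under obvious simplifications: a real IVT holds memory addresses of routines, while here the table maps interrupt numbers directly to functions, and the "context" is just a dictionary standing in for registers and flags.

```python
# Toy model of interrupt dispatch: save the context, look up the interrupt
# number in the interrupt vector table (IVT), run the ISR, restore the context.

isr_log = []

def keyboard_isr(ctx):
    ctx["r0"] = 999              # the ISR freely clobbers a register...
    isr_log.append(ctx["r0"])    # ...while doing its device work

IVT = {1: keyboard_isr}          # one slot per interrupt number -> its ISR

def handle_interrupt(irq, context):
    saved = dict(context)        # 1. save the execution context to memory
    IVT[irq](context)            # 2. IVT lookup by interrupt number; 3. run ISR
    context.clear()
    context.update(saved)        # 4. restore: the ISR leaves no trace
    return context

cpu_context = {"pc": 100, "flags": 2, "r0": 7}
handle_interrupt(1, cpu_context)
print(cpu_context)               # unchanged: {'pc': 100, 'flags': 2, 'r0': 7}
```

After `handle_interrupt` returns, the context is byte-for-byte what it was before, which is exactly the "no trace" property the transcript describes.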
Now, the key observation here is that we have changed the step of asking devices whether they have an operation pending. However, the CPU still manipulates the data and performs the transmission of data. So even though from this point of view the CPU reduces its work, it still carries out quite a lot of operations as part of this input-output processing.

Let's now look at a more advanced strategy called DMA, which stands for Direct Memory Access. This would be our strategy number three. Direct memory access comes from the realization that some input-output is done in blocks. Take the disk, for example: its blocks are typically four kilobytes. So in a system with the CPU and memory connected to the bus, and a device also on the bus, the device's operations are not based on one or two characters at a time; they are based on large blocks of four kilobytes. If we go back to the interrupt scheme, this means that every time new data is produced by the device, the CPU needs to take care of all of that data and transfer it to memory. In other words, the transmission still requires a lot of CPU time when we deal with blocks of information produced by these devices.

The solution adopted by designers is the strategy called DMA, in which a new circuit appears. This circuit is called the DMA controller, and it is hooked to the bus, connected to the system, precisely to relieve the CPU of performing this transmission. What the CPU now has to do is program the DMA operation. It tells the circuit three things: which device needs to be served, that is, which device requires the transmission of a block; the number of bytes in the block; and finally a memory location for the transfer of the information, either from the device to memory or from memory to the device.
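The three-parameter programming step can be sketched as follows. The class names, the `program`/`run` split, and the fake disk data are illustrative assumptions; in real hardware the CPU writes the parameters into the controller's registers and the transfer happens on the bus without executing CPU instructions.

```python
# Sketch of DMA: the CPU supplies three parameters (device, byte count,
# memory address); the DMA circuit then moves the whole block itself and
# signals completion with an interrupt. All names are illustrative.

BLOCK = 4096                         # 4 KiB disk block, as in the example

class Disk:
    def read_block(self, count):
        return b"\xab" * count       # stand-in for data read from the platter

class DMAController:
    def program(self, device, count, mem_addr):
        # The CPU writes exactly these three parameters, then goes back
        # to executing its regular instructions.
        self.device, self.count, self.mem_addr = device, count, mem_addr

    def run(self, memory):
        # Carried out by the DMA circuit, not by CPU instructions.
        data = self.device.read_block(self.count)
        memory[self.mem_addr:self.mem_addr + self.count] = data
        return "interrupt"           # completion still signalled by interrupt

memory = bytearray(2 * BLOCK)
dma = DMAController()
dma.program(Disk(), BLOCK, 0)        # device, byte count, memory address
signal = dma.run(memory)
print(signal, memory[0], memory[BLOCK - 1], memory[BLOCK])
```

Note that the CPU's only involvement is the single `program` call; everything in `run` stands for work the controller does on its own.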
So now the only thing the CPU does is provide these three parameters, and from then on the DMA controller takes care of supervising the entire transmission of data between the device and memory, while the CPU keeps executing regular instructions. As we can see, the main difference between strategy number three and strategy number two is that a significant portion of the handling of the data, especially in the case of block-based devices such as hard drives, is now passed on to this more sophisticated circuit called the DMA controller; all that is needed from the CPU is some basic programming of the circuit, and the rest is taken care of. Once the device finishes the transmission, the CPU still needs to be interrupted. In other words, we are still relying on the interrupt-based mechanism, so that the CPU can then check for errors and sometimes program the next operation. This scheme is fairly efficient, because with only brief interactions between the CPU and the DMA controller we are capable of programming the transfer of a fairly large number of blocks between the device and memory.

So that is strategy number three. The next step is to go even further in the type of computation that occurs without the intervention of the CPU. This strategy is called channel input-output, and it would be strategy number four. It is based on a circuit that is much more sophisticated than the DMA controller, called the IOP, or input-output processor. This circuit is still connected to the regular system as we know it, with the CPU and the memory, but now the difference is that rather than having the devices directly connected to the bus, we have one or several IOPs, and each of them takes care of all the operations for more than one device: these processors are capable of dealing with multiple input-output devices.
So as you can imagine, this type of architecture is more suitable for sophisticated systems that have large numbers of input-output devices. The difference with the previous strategy is that the CPU again programs the IOPs, but this programming is different from the programming of the DMA controller, because the IOP is capable of executing high-level input-output operations. For example, suppose we have to transfer a file that is in memory to disk, and this file spans a large number of sectors or blocks. In the DMA approach, the CPU would have to intervene every time a block needs to be transferred. IOPs, on the other hand, are capable of understanding higher-level input-output operations, so the CPU would only program the IOP to say: this file that you have in memory needs to be stored on disk, and that's it, and then later check that the operation completed successfully.

So as we can see, these four input-output strategies vary significantly in the amount of work performed by the CPU, to the point that we can represent them in the following two graphs. Suppose we plot the CPU time devoted to input-output; we can place the four strategies we have described, or the four architectures, as follows. Polling would be at the top, because it is the one that requires the most CPU time. Interrupts would be slightly lower, because the CPU doesn't have to go and ask devices whether they are ready.
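The file-transfer example above can be made concrete by counting CPU interventions under each scheme. The block size, the "+1 completion check," and the function names are illustrative assumptions; the point is only that the DMA count grows with the file size while the IOP count stays constant.

```python
# Rough count of CPU interventions for writing a file from memory to disk:
# plain DMA needs the CPU once per block, a channel/IOP needs one high-level
# command plus one completion check. Numbers are illustrative assumptions.

BLOCK = 4096                             # assumed 4 KiB block size

def cpu_interventions_dma(file_size):
    # The CPU reprograms the DMA controller once per block,
    # plus one final completion check.
    blocks = -(-file_size // BLOCK)      # ceiling division
    return blocks + 1

def cpu_interventions_iop(file_size):
    # One high-level command ("store this file on disk"),
    # then one completion check.
    return 2

size = 10 * BLOCK                        # a 10-block file in memory
print(cpu_interventions_dma(size))      # 11
print(cpu_interventions_iop(size))      # 2
```

This is the quantitative version of the transcript's point: as files grow, DMA still scales the CPU's involvement with the number of blocks, while channel input-output keeps it constant.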
DMA would reduce the amount of time the CPU dedicates to input-output, because it takes away all the complexity of transferring blocks between the device and memory, and finally channel input-output would be the lowest in CPU time devoted to input-output, because these processors are capable of executing fairly sophisticated, high-level input-output operations.

At the same time, we can draw another graph in which we plot circuit complexity. As we can see, circuit complexity starts at a very low value for polling, because the circuitry needed is almost nothing: the CPU asks each device whether it needs an operation and takes care of everything itself. Interrupts are a bit more complicated, because the notion of an interrupt controller appears; we need the lookup table, and we need the service routines stored in memory. Option number three, DMA, is more complex because now we have a specific circuit capable of executing some basic operations, essentially a transfer loop, but it is still considerably more complicated, and therefore the number of transistors or gates required to implement the circuit is much larger. And at the top would be channel input-output, because the IOPs we have here are much more complex circuits than the DMA controller and therefore require a larger number of gates.