In the last few lectures, we have seen the improvements that we can bring about over the standard buffer insertion technique by using low-swing current-mode signaling. We saw two variants. One of them used inductive termination to provide a high-pass function which countered the low-pass nature of the wire. The other boosted the high-frequency components of the signal before it passed through the wire. That was done through dynamic overdriving, in which additional drive was added through a strong driver at every transition. This boosts the high-frequency components of the original signal, and therefore, even after the wire attenuates them, you get an undistorted signal at the far end. As a result, we could get very power- and energy-efficient transmission at high speeds over wires which are essentially low pass. We had also seen a comparison of the inductive termination method and the dynamic overdrive method.
There is yet another technique which has been suggested lately, and it is worth mentioning because it does essentially the same thing as dynamic overdrive but accomplishes it using much simpler components. After all, what do we want? We want to dump extra current into the line every time there is a transition, and this can be done by putting a capacitor in series with a strong driver. The series capacitor makes sure that no extra drive is injected when there are no transitions, and that extra current goes into the line at every transition. So it is a much simpler way of boosting the high-frequency components than the NAND-NOR combination that we have seen for dynamic overdrive. So let us have a look at this capacitive driving technique, what its problems are, and how we overcome those problems. As we have seen, inductive peaking counters the low-pass nature of the wire by providing a high-pass function at the receiver.
Dynamic overdriving provides compensation by boosting high-frequency components before transmission, by providing extra drive during transitions. The same effect can also be achieved by putting a capacitor in series with the driver at the transmitter. However, this causes a problem. The DC common-mode voltage of the line now becomes undefined, because we have put a capacitor in series. To counter this, we can add a weak driver to set the DC level and to provide low-frequency coupling to the line. Otherwise, just by putting a capacitor in series, we might go to the other extreme, where the drive is much diminished at the low-frequency end and the DC drive is removed altogether. So a combination of a weak driver and a capacitive driver provides functionality very similar to the dynamic overdrive solution.
Let us look at the circuit which accomplishes this. If you look at the circuit at the top, the strong driver is now a simple chain of inverters of increasing geometry. This is a standard way of driving high-capacitance loads. Notice the one inverter here: the inverted digital input drives an NMOS which is directly coupled to the line, and this is a model of the low-pass interconnect. At the receiver end we have a grounded-gate PMOS transistor, which acts as the termination and also pulls the wire up to VDD, because with its gate grounded it is always on. The receiver is actually a comparator, and we will see the details of its working in a little while. The grounded-gate PMOS at the receiver keeps the line at VDD when the input is at 1, because when the input is at 1 this voltage is at 0 and therefore the NMOS is off. With the NMOS off and the PMOS always on, a proper, well-determined DC voltage is established at this point, and it floats up close to VDD.
When the input is at 0, this point is at 1 and the NMOS turns on. This pulls the line to a voltage lower than before. In fact, the geometry of this transistor and the current for which it is biased determine the low-frequency swing on the line. When the input is 1 the line floats up to VDD; when the input is 0, and consequently this node is at 1, the line sits at a lower voltage because of the pull-down provided by the NMOS, and the amount by which it is lower depends on the current that this transistor draws. The drop is determined by the current drawn by the NMOS and the resistance provided by this transistor. Now these two transistors together consume static power when the input is at 0. Therefore we would like to keep the current through them as low as possible. After all, they only provide DC and low-frequency coupling, so high power is not required through them, and we would not like to waste static power by keeping their drive any higher than required.
The actual high-frequency drive is provided by this chain of inverters; only two are shown here, but any even number can be used. These inverters have progressively larger geometries in order to drive a large capacitive load. This capacitor then provides the capacitive peaking: the inverter chain is coupled to the line through it only when there is a transition at this point. If the input remains at 0 or remains at 1, there is no drive through the capacitor. This is exactly like providing a sharp current pulse of positive or negative value to the line during transitions, which is what we had done in dynamic overdriving. So it is a simpler replacement for the concept of dynamic overdriving, and now we do not need the NAND-NOR and feedback circuit.
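The behavior of the series capacitor described above (drive only at transitions, nothing while the input holds steady) can be illustrated with a small numerical sketch. This is not the circuit from the lecture: it assumes an ideal voltage step from the inverter chain, a purely resistive termination, and hypothetical values for the resistance, capacitance and time step, and it ignores both the weak DC driver and the wire itself. It uses the standard discrete first-order high-pass recurrence.

```python
def series_cap_response(bits, samples_per_bit=500, dt=1e-12,
                        r_term=1e3, c_series=50e-15, vdd=1.0):
    """Voltage coupled through a series capacitor into a resistive load.

    All component values are illustrative assumptions, not lecture data.
    Discrete first-order high-pass filter:
        y[n] = a * (y[n-1] + x[n] - x[n-1]),  a = RC / (RC + dt)
    """
    tau = r_term * c_series          # RC time constant of the coupling path
    a = tau / (tau + dt)
    out = []
    prev_x = bits[0] * vdd           # start with the line already settled
    y = 0.0
    for bit in bits:
        x = bit * vdd                # ideal rail-to-rail driver output
        for _ in range(samples_per_bit):
            y = a * (y + x - prev_x) # only a change in x injects anything
            prev_x = x
            out.append(y)
    return out


# Input holds at 1, falls to 0, holds, then rises back to 1.
pulses = series_cap_response([1, 1, 0, 0, 1])
```

With these assumed values the output stays at zero while the input holds, shows a sharp negative pulse at the 1-to-0 transition and a sharp positive pulse at the 0-to-1 transition, and decays back toward zero in between. That is the positive or negative "kick" at transitions that the lecture attributes to capacitive peaking; the DC level would have to come from the separate weak driver.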
So this is an attractive solution, and it is now being used in addition to the dynamic overdrive solutions that we have seen earlier. There are various pros and cons of this technique versus dynamic overdrive. We shall see that the receiver is a little harder to design, simply because the available headroom is small: the line swings from VDD to a few millivolts below VDD. Amplifying this low voltage swing is not easy, because we do not have much headroom for the transistors connected to the line, and the comparator is therefore harder to design. In fact, our group at IIT Bombay has also worked on this technique and modified it so that the resting voltage at this point is brought closer to VDD/2, so that efficient comparators can be designed easily without adding too much to the static power consumption. That, combined with a worst-case data sequence technique, optimizes the behavior of this capacitive peaking and gives very efficient data transmission on long wires.
However, all these techniques have gained their efficiency by reducing the swing on the line. This means that our design has to be very careful; otherwise small changes in device parameters, which will always happen, can have a disproportionate effect on the performance of the system. In voltage mode the signal swings all the way from ground to VDD, and therefore small changes in VT etc. of the transistors do not have such a big effect. However, in modern short-channel processes, variations in transistor parameters are large; some parameters can vary by as much as 60 percent. Therefore we have to design circuits so that they are robust with respect to batch-to-batch variations as well as variations between devices on the same die. What will these variations do?
They can change the operating points and the strengths of the drivers connected to the line. Therefore we have to design our techniques so that they are practical and robust with respect to such expected variations. On-die variations are also important: after all, we are talking of long wires, so it stands to reason that the transmitter and the receiver are in different parts of the chip, separated by a large distance. As a result, the transistor parameters at the transmitter end and those at the receiver end will not be identical. So there are two kinds of variation which worry us, and interconnect design must take both into account. One is batch-to-batch variation: if we are not careful in our design, a scheme that works for one run may see its operating point shift in another run, where the values of VT, mobility and so on are different for the N- and P-channel transistors, and because the swings are extremely small, it is not guaranteed that the scheme will still work as well. The other problem is that the transmitter and the receiver are in different parts of the chip, so they may be mismatched on the same chip in the same run, and this mismatch can cause malfunctions. We therefore need a style of design which takes these variations into account and permits our circuits to keep working in spite of them.
So what are our robustness requirements? Process, supply voltage and temperature variations, popularly known as PVT variations, affect the core logic as well as the data communication circuits. It is not only the interconnect which slows down; the rate at which we generate data also changes. Therefore the requirement for data transmission is not complete invariance with respect to PVT variation; that is not our robustness requirement.
We just have to ensure that the throughput and delay properties of the interconnect are at least as good as the data generation and clock rates. If we land in a slow corner of the transistor parameters, then the data generation rate and the clock that the logic can support will also come down. What we have to ensure is that the deterioration in interconnect properties is no worse than the deterioration in general logic. And because global interconnects, by definition, connect remote points on the die, on-chip variations must also be accounted for.
Let me give a simple example of how this local variation can be of concern. Batch-to-batch variation of transistor parameters is easy enough to understand: a slow circuit, where VTs are high and mobilities are low, will not keep up with the speed requirements. It is a much subtler point why local variations should worry us so much, and why that worry is greater for the low-swing techniques that we have been describing. Consider this case. At the receiver we are trying to resolve a small swing around a common-mode voltage into a full ground-to-VDD swing, which will then be used by the receiving logic. If the switching threshold of the receiver is exactly aligned with the common-mode voltage as driven by the transmitter, then we do not have a problem; designing a comparator which amplifies this small swing to a full rail-to-rail swing is not very difficult. But let us say that, because of parametric variations and mismatch between the transmitter and the receiver, the common-mode voltage at the transmitter is slightly lower. In fact, suppose it is so low compared to the common-mode voltage at the receiver that even the highest level of the swing at the transmitter remains below the resolution threshold of the receiver comparator.
As a result, while we have a healthy swing around the common-mode voltage at the transmitter, the entire signal, whether high or low, is below the threshold at the receiver, and the receiver will be stuck at 0. Exactly the same thing happens if the common-mode voltage is much higher than the receiver threshold. Because the swing is rather low, relatively small mismatches between the transmitter and the receiver can lead to problems, whereas there is no such problem for the rail-to-rail swing of the buffer insertion technique. So while we have come up with a better technique which is energy efficient, it brings its own requirement of robust design, which we must be aware of.
To analyze this, we use a somewhat idealized model of either dynamic overdriving or capacitive drive, in which we apply an enhanced drive for a short time (whether through capacitive coupling or through the dynamic overdrive NAND-NOR combination) and then maintain the line at a low drive. Similarly, when there is a 1-to-0 transition, we give it a large boost for a short time and then maintain it at low drive. At the receiver end we have a reference voltage Vm, which is the switching voltage of the inverter at the receiver, and a terminating resistance. This amplifier has a high gain provided the line is kept at Vm, and finally it drives a buffer which drives the load capacitance. Several parameters of the transmitter and receiver affect the robustness of this solution. Ip is the peak current supplied by the strong driver during an input transition. Tp is the duration for which the strong driver is on. Delta V is the line voltage swing at the receiver end: as a result of this current drive shape at the transmitter, we get a delta V at the receiver end after the signal has passed through the low-pass line.
And finally, there is the mismatch between the common-mode voltage seen at the receiver and the operating point of the transmitter. These are the parameters which affect the robustness of a design.
The scheme with feedback which we had described, which has a feedback inverter that stops the drive when the line at the transmitter end reaches a 1 or a 0, has a particular problem. The sensing inverter which turns off the drive is at the transmitter end, while the very similar inverter which converts the low-swing voltage to a rail-to-rail swing is at the receiver, and these two might not match. If the mismatch is too large, we may have a problem. Let us look at this case, and say that the common-mode voltages at the transmitter end and the receiver end have a certain mismatch. Because the receiver tries to maintain the line at its own common-mode voltage, the sensing at the transmitter goes completely awry. Consider the case here. Let us say that the line was resting at 1 and we are trying to pull it down to 0. As we pull it down, the feedback inverter turns the strong driver off once the transition is complete. However, the voltage at which this turning off occurs is much lower than the receiver common-mode voltage, so the line goes to a voltage much lower than the receiver common-mode voltage. Therefore, as soon as the strong driver turns off, the receiver starts charging the line up, because it is trying to keep the line at VCM,RX. As soon as the line reaches a certain voltage, the feedback inverter at the transmitter decides that this voltage is too high and turns the strong driver on again. Notice that the input has had no transition at all; yet, because of this feedback, there is a back and forth between the receiver and the transmitter.
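The back and forth between transmitter and receiver can be reproduced with a very crude behavioral sketch. Everything in it is an assumption made for illustration: voltages are normalized to VDD = 1, the feedback inverter is modeled as a comparator with two hypothetical thresholds (`v_tx_on`, `v_tx_off`), and the strong driver and the receiver termination are modeled as simple first-order pulls with made-up rates. The input is taken to have already gone to 0 and to stay there, so any extra firing of the strong driver is spurious.

```python
def count_strong_driver_pulses(v_rx_rest, v_tx_on=0.55, v_tx_off=0.45,
                               v_start=0.9, steps=2000,
                               k_tx=0.2, k_rx=0.02):
    """Count how often the strong driver fires while the input stays at 0.

    v_rx_rest : level the receiver termination pulls the line toward
    v_tx_on   : line level above which the transmitter's feedback
                inverter decides the line is 'too high' and re-fires
    v_tx_off  : line level below which it switches the strong driver off
    All values are normalized and hypothetical.
    """
    line = v_start
    driver_on = False
    pulses = 0
    for _ in range(steps):
        # Transmitter-side sensing of the line (the feedback path).
        if not driver_on and line > v_tx_on:
            driver_on = True
            pulses += 1
        elif driver_on and line < v_tx_off:
            driver_on = False
        # Strong driver pulls the line toward 0 while it is on.
        if driver_on:
            line += k_tx * (0.0 - line)
        # Receiver always pulls the line toward its own rest level.
        line += k_rx * (v_rx_rest - line)
    return pulses


matched = count_strong_driver_pulses(v_rx_rest=0.5)     # receiver agrees with transmitter
mismatched = count_strong_driver_pulses(v_rx_rest=0.8)  # receiver level sits too high
```

In the matched case the strong driver fires once, for the real transition, and then stays off. In the mismatched case the receiver keeps pulling the line back above the transmitter's threshold, so the driver fires again and again with no input transition, which is the intermittent behavior described in the lecture.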
When the strong driver turns off, the receiver starts charging the line to its common-mode voltage. This voltage is too high for the feedback inverter at the transmitter, which turns the strong driver on again and takes the line down to what it sees as the appropriate low voltage. When that voltage is reached, the strong driver turns off. As soon as it turns off, the receiver again tries to take the line to its common-mode voltage, and because the two are not the same, you get an intermittent turning on and off of the strong driver. This reduces the average swing of the line and can cause robustness problems.
These problems are removed by a technique which we have advocated, the fixed pulse width driver, which gets rid of the feedback. Notice that this circuit has no feedback; the drive is now provided for a fixed delay. That means the strong driver is not turned off on sensing the line; we are not sensing the line any more. The strong driver is turned off after a delay, which is process dependent. We would like to minimize this process dependence, and this we have done in some work at IIT Bombay by developing a bias circuit which senses the process. The circuit pairs a long-channel transistor with a short-channel transistor of the opposite type: a long-channel PMOS with a short-channel NMOS, and a long-channel NMOS with a short-channel PMOS. It relies on the fact that short-channel transistors show much higher variation than long-channel transistors. So consider this branch: the short-channel NMOS varies with the process, whereas the long PMOS does not, at least to zeroth order. The long PMOS therefore sends a more or less process-independent current through this diode-connected NMOS, and as a result the output tracks VTN. If VTN is higher, this voltage also becomes higher, and that corrects the bias for transistor parameter variations. Exactly the same thing happens for the PMOS.
So, using such auto-bias circuits, we have developed a system in which the drive is corrected for process variation and which also uses no feedback. By combining these two techniques we have been able to come up with a very robust scheme. We have simulated three schemes: the current-mode scheme with feedback, the current-mode scheme with fixed pulse width, and the current-mode scheme with the smart bias which I have just described. We find that the degradation under mismatch is much reduced. In delay, the percentage degradation can be as much as 25 percent for the feedback case, 10 percent for the fixed pulse width case, and only 4 percent when we combine fixed pulse width with the smart bias. Similarly, the throughput degrades by about 33 percent with feedback and about 14 percent with fixed pulse width, but only 9.5 percent with the technique which combines fixed pulse width with smart bias generation. So by using good VLSI design techniques it is possible to meet the robustness requirements, so that the current-mode solution can become practical.
You can see that the ring oscillator frequency degrades by about 23 percent here due to process variation, and the degradation of the scheme with the bias is much smaller than that. What this means is that the ring oscillator frequency determines the rate at which digital data is generated. If this degrades by 23 percent, then as long as the interconnect degrades by less than 23 percent everything is fine, and we notice that the voltage-mode scheme fails to meet this requirement.
So does the current-mode scheme with a fixed pulse width, whereas the scheme that we have suggested and the scheme with feedback can meet the requirement under process variation. Notice that this is not local variation; this is process variation. However, the feedback circuit is not so graceful, as we have just seen, under on-chip variation between transmitter and receiver: this table is for global process variation, and for local variation the feedback scheme is not very good. The fixed pulse width scheme is acceptable under local variation but not under global variation. The scheme that we have suggested, the current-mode scheme with smart bias, meets the requirement in both cases.
I think we will skip this and go on to the bidirectional link. Bidirectional links are very important; we had talked about them earlier, and we need a scheme which permits bidirectional transmission of data. In voltage buffer mode this can be done by using back-to-back connected tri-state buffers, exactly one of which is activated at a time. However, as we had seen, this leads to problems. First, the delay of a bidirectional repeater is more than that of a unidirectional buffer because of the extra loading. Second, a direction control signal is required by each repeater; if there is a bus, the direction control signal is loaded by a large number of such transistors in parallel, so the buffers carrying it are heavily loaded and consume additional power. So we need a repeaterless signaling scheme, and this is what the current-mode bidirectional link provides. We connect both a transmitter and a receiver at each end of the line; notice that nothing needs to be connected in the middle of the wire.
Obviously the transmitter and the receiver must know who is to transmit and who is to receive; that is the case in any bidirectional scheme. Since that information is available, we can use it to turn on either the transmitter or the receiver at each end of the wire, and thus achieve bidirectional transmission quite easily. This is possible because there is no active circuitry in the middle of the wire. Here we plot the region in which the current-mode bidirectional drivers consume less power than the voltage-mode bidirectional drivers. The plot is data rate versus line length, and all combinations for which current mode consumes less power are shaded. You notice that most of the useful range is covered by that region: for example, for line lengths greater than 2 millimeters and data rates of, say, a few hundred megabits per second, current mode consumes less power than voltage mode. This is a very important point, because these are the robust designs that I described just a little while ago.
Current mode has one additional advantage, and this concerns the power drawn from the supply. Voltage-mode buffers draw a large amount of power from the supply and as a result cause spikes on the supply voltage. This is a source of additional noise for the entire system. Because current mode draws less power from the supply, the spikes that it generates on the supply voltage are much smaller, and the injected noise level is much smaller. Consequently, a current-mode interconnect runs quieter than a voltage-mode interconnect.
So we are talking of a 68 percent reduction in peak current, and hence a smaller contribution to supply noise, and an 80 percent reduction in active area. For bidirectional data transmission, current mode is therefore extremely attractive.
Many of these ideas sound quite attractive, and we would like to show that they work in silicon under practical conditions. One problem presents itself when we contemplate doing this. The overall delays of wires of any practical length are quite small, of the order of a nanosecond or less. Measuring such delays is not easy, and if you couple the signal through a pad and bring it out to external instruments, which might present loads of the order of picofarads, it is extremely difficult to demonstrate which of the techniques is in fact faster. We therefore need test circuits which allow us to compare the performance of the various schemes on the chip itself, and whose output is either DC or some low frequency which can be brought out from the chip easily and measured using inexpensive instruments.
We shall illustrate this with only a few representative circuits. Consider this suggestion. What we have here is a multiplexer-demultiplexer circuit. The MUX and DEMUX close a loop so that there is a ring oscillator, which oscillates at a frequency that depends on the total loop delay. In one arm of the MUX-DEMUX pair we put the transmitter, wire and receiver of the suggested scheme; the other arm is a dead short. We first measure the delay using the dead short. This measures the delay through these inverters, the MUX and the DEMUX, plus the delay of the short wire L3 which provides the shorting path.
In the other position of the MUX-DEMUX, we have the approach lines L1 and L2, and in addition the transmitter, the long wire over which we are measuring data rate, power and so on, and the receiver. The layout is such that the wire loops back to the same region. L3 is the length of the shorting wire, L1 is the length of the approach wire to the transmitter, and L2 is the length of the approach wire from the receiver. We ensure that L1 plus L2 is the same as L3. As a result, the delay of the common path is the same in the two cases, and because L3 equals L1 plus L2, the approach-wire delay is also the same. Therefore the difference of the two delays accurately measures the delay through the transmitter, wire and receiver.
So what we have done is convert the measurement of very short delays into a measurement of oscillation frequency. We put the MUX-DEMUX in the L3 position, and because of the much smaller delay through L3, the ring oscillator oscillates at a much higher frequency. Using a low-frequency control signal, we then switch the MUX-DEMUX to take the lower path. The delay of the transmitter and receiver is now included in the loop, so the ring oscillator still oscillates, but at a much lower frequency. These two frequencies are indicative of the delays in the two cases, and if you take the difference of the two delays, all the common delays cancel out, leaving only the delay of the path which we want to measure. So the net delay of the transmitter plus wire plus receiver is obtained by measuring the frequency in the two cases, and is given simply by 1/f_RO minus 1/f_system.
Here f_system is the frequency when the loop is shorted, and f_RO is the ring oscillator frequency with the transmitter and receiver in the loop; notice that f_system is much higher. This was first assessed through simulation, where of course we can see the delay directly and can also see the frequencies. When we simulate this circuit, we take the frequencies, compute the delay using this formula, and compare it with the delay seen in the transient simulation; we find that the percentage error is very small. Similarly, we can use a time-to-voltage conversion, in which the application of the digital bit at the transmitter starts the charging of a capacitor through a current source and the arrival of the bit at the receiver stops it. This converts the delay to a DC voltage, which can be read by outside circuits. This points out that such circuits are possible: they can be put on the same chip as the interconnect, and by using them we can make very small differences in delay and power visible through signals like frequency and DC voltage, which are very easy to measure off chip.
So we actually implemented these various schemes on silicon and used these measurement circuits on chip. The high frequency of the ring oscillator was scaled down by a factor of 32 to 64, to bring it to a level where we can measure it easily using inexpensive frequency meters. This is a chip that we actually made; this is a photograph, not a diagram, and the transmitter, receiver, all the wires and all the circuits are here. We built an external test jig which provides all the voltages, control signals, trial signals and so on, and the whole die was packaged in a 44-pin QFN package.
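The two on-chip measurement ideas described above reduce to simple arithmetic, sketched here. The function and parameter names are hypothetical. The first function follows the formula exactly as it is spoken in the lecture (difference of the two periods); whether the slide carries an extra factor of one half, since one ring oscillator period normally spans two trips round the loop, cannot be recovered from the transcript. The second function assumes an ideal constant current source charging an ideal capacitor, so that V = I * t / C.

```python
def link_delay_from_frequencies(f_ro_hz, f_system_hz, prescale=1):
    """Delay of transmitter + wire + receiver from two measured frequencies.

    f_ro_hz     : frequency measured with the link in the loop
    f_system_hz : frequency measured with the loop shorted (the higher one)
    prescale    : on-chip divider ratio applied before the measurement
                  (the lecture mentions 32 to 64); 1 means no divider
    Returns 1/f_RO - 1/f_system for the undivided frequencies, in seconds.
    """
    f_ro = f_ro_hz * prescale          # undo the divider
    f_system = f_system_hz * prescale
    return 1.0 / f_ro - 1.0 / f_system


def delay_to_voltage(delay_s, i_charge_a, c_farad):
    """Voltage on a capacitor charged by a constant current for delay_s."""
    return i_charge_a * delay_s / c_farad


# Illustrative numbers only: 1 GHz shorted, 500 MHz with the link in,
# both observed after a divide-by-32 prescaler.
d = link_delay_from_frequencies(15.625e6, 31.25e6, prescale=32)

# Illustrative numbers only: 100 uA into 1 pF for 1 ns.
v = delay_to_voltage(1e-9, 100e-6, 1e-12)
```

With these made-up inputs the period difference comes to 1 ns and the time-to-voltage conversion gives 0.1 V. In both cases a sub-nanosecond quantity ends up as a low frequency or a DC level that an inexpensive instrument can read.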
Using this, we measured the actual delay, power and energy used by the three schemes. Looking at the data rate and the measurements, consider the proposed circuit, which is CMS-Bias; remember, this is the circuit which counters both batch-to-batch process variation and transmitter-to-receiver on-chip parametric variation. Using CMS-Bias we get about a 22 percent improvement in delay and as much as an 85 percent improvement in the energy-delay product over the voltage-mode scheme. This establishes that this scheme is much superior to the widely used buffer insertion scheme and at the same time is robust against process variations and on-chip variations. It is therefore possible that future interconnect-aware designs will make use of circuits of this class. Remember that this has the advantage that the general design style of the digital circuits, which constitute most of the complexity of a VLSI design, does not change at all. The bits are still rail to rail; they are conventional voltage-mode bits. It is only the transmission of these signals which has now been moved to current mode.
To summarize the behavior: there is at least 7 times lower power in the worst-case process corner, a 78 percent gain in active area (this is the area on silicon), and a 65 percent reduction in peak current, which translates to lower supply noise. Another factor which must be pointed out is that voltage-mode inserted buffers have to be redesigned for every wire length: if the wire length changes, the placement and sizing of the buffer inverters has to change. The current-mode scheme, on the other hand, is very robust in this respect; it is designed once and for all and remains unchanged for all wire lengths. This is an advantage because you can put it in a library and then not worry: whatever the length of the wire, the same component is pulled out and used.
The proposed dynamic overdriving CMS scheme (by the proposed scheme I mean the one that we have proposed, which corrects for batch-to-batch and on-chip variation using a smart bias circuit) offers a 26 to 40 percent improvement in delay for 2 mm to 8 mm long lines, and it also offers a substantial improvement in the energy-delay product compared to other schemes. Compared to other current-mode schemes, like the one with feedback, there is a 22 percent improvement in power-delay product. This margin is much smaller because, of course, all current-mode schemes perform much better than voltage buffer schemes: the 22 percent improvement is over the other current-mode scheme, against a factor of 7 improvement over the voltage-mode scheme. Also, the CMS scheme with feedback is sensitive to intra-die variation, whereas the current-mode scheme with smart bias remains faster than the logic circuits even in the presence of intra-die and inter-die process variations. We have also made measurements with bidirectional links, and we noticed that current-mode bidirectional links offer very small delays and low power consumption compared to the traditional voltage-mode buffering scheme. In this case, because the simulations showed that the performances are not even comparable, we did not actually compare the two on silicon. We also extracted transistor parameters from some extra test patterns that we had put on the chip, and we can show that if we use the transistor parameters from the exact run on which we made measurements, we can reproduce the results that we measure.
So, in conclusion, global interconnects form a major bottleneck for the performance of digital systems in scaled-down technologies. Current-mode signaling promises to remove this bottleneck, as shown through simulation, circuit fabrication and actual measurements on silicon.
We have demonstrated that current-mode signaling has overwhelming advantages over the currently used voltage-mode buffer insertion schemes. We have also demonstrated that the particular configuration we have suggested is superior even to other current-mode schemes. Apart from a fixed-width overdriving pulse, this configuration has a biasing scheme which controls the amount of current dumped by the overdrive in a process-independent and variation-independent way. Our scheme is robust with respect to batch-to-batch parametric variation and on-chip parametric variation, and it is therefore a practical option for implementing both unidirectional and bidirectional data links in modern systems. With this, we bring the discussion of current-mode and voltage-mode data links to an end.
What all this means is that interconnect wires, which were not even considered important earlier, have become the performance limiter, and they have to be designed very carefully. The widely used methods are now running out of steam. Fortunately, new schemes which combine mixed-signal design with VLSI design can give us interconnect-aware designs, and they can continue to boost the performance of integrated circuits as we scale down dimensions, at least for the foreseeable future. It is the use of these techniques which will result in the interconnect-aware designs of tomorrow. We bring our discussion of interconnect-aware design to a close with this lecture.