Welcome to this lecture on field programmable gate arrays in the course Digital System Design with PLDs and FPGAs. In the last lecture we looked at the Xilinx Virtex configurable logic blocks. We looked at the lookup tables and how they can be combined, then at the flip-flops, and we started on the carry chain, or carry logic. In today's lecture we will continue with the carry logic and the flip-flops, and say something about the clocking and so on. Let us briefly go over the last lecture's portion so that we get some continuity.

This is the CLB, the configurable logic block, of the Virtex FPGA. It has two identical parts called slices, and within a slice there are again two identical parts. Each part consists of a four-input lookup table, which can implement any Boolean function of four variables, and the output can go to a flip-flop. We have seen the detailed diagram: these are the four inputs and this is the output. The output can be used as a combinational output, and in that case the flip-flop can be used separately with this input. If you want, you can register the output, which is the case in a data path or a sequential circuit.

You can combine two four-input lookup tables using this MUX, the F5 MUX, and its select line acts as the fifth variable. Two such five-input lookup tables can be combined again: assume there is one here, two lookup tables forming a five-input lookup table. The outputs of the F5 MUXes come here, and the F6 MUX makes the six-input lookup table, with this select line as the sixth input. The result can be taken out directly or through a flip-flop. In that case, if you do not register the output, this flip-flop cannot be used separately, because this input is already used for the six-input lookup table.
The same is the case with the five-input lookup table and this flip-flop. We have also seen that the flip-flop has the data input D, the output Q, and set and reset. For the clock there are options to connect to different clocks: there are global clocks and user clocks. What this EC is, we will talk about as we go along.

So I have shown all of that: using the lookup table and the flip-flop separately or together, forming the five-input lookup table from two four-input lookup tables, and forming the six-input lookup table from two five-input lookup tables, which consumes all four of the four-input lookup tables. We have also seen that a six-input function does not always need all four four-input lookup tables. Depending on the function you are implementing, you can cascade lookup tables. We have seen an example with six inputs, with the terms A B C D E and A B C D F. There is a common product term, so we take it out and implement it in one lookup table. That output, X, is cascaded with E and F to form the real output Y. So it is not that whenever you have six inputs you need to go for a six-input lookup table; you can cascade four-input lookup tables.

This shows a five-input function using two cascaded lookup tables rather than combining them with an F5 MUX. But there are cases, take the function Y = (A B C D) XOR E, where the tool definitely has to use two lookup tables with the F5 MUX; there is no other way.

So this is a summary. We also said that the lookup table is nothing but a memory, one bit wide. A four-input lookup table is a 16 x 1 RAM, and the RAM write is usually controlled by the configuration circuit, because the lookup table is written during the FPGA configuration.
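To make the cascading idea concrete, here is a minimal Python sketch. It treats each four-input lookup table as a 16-entry memory and assumes the six-input function from the example is Y = ABCDE + ABCDF. The names `make_lut4`, `read_lut4`, `y_cascaded` and `y_reference` are made up for this illustration and are not part of any tool.

```python
import itertools


def make_lut4(func):
    """Return the 16-entry truth table (a 16 x 1 memory) of a 4-input function."""
    return [func((addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1) & 1
            for addr in range(16)]


def read_lut4(table, a, b, c, d):
    """Read the LUT: the four inputs simply form the memory address."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


# First LUT holds the common product term X = A.B.C.D.
LUT_X = make_lut4(lambda a, b, c, d: a & b & c & d)
# Second LUT combines X with E and F: Y = X.(E + F). Its fourth input is unused.
LUT_Y = make_lut4(lambda x, e, f, unused: x & (e | f))


def y_cascaded(a, b, c, d, e, f):
    """Two cascaded 4-input LUTs, no F5/F6 MUX needed."""
    x = read_lut4(LUT_X, a, b, c, d)
    return read_lut4(LUT_Y, x, e, f, 0)


def y_reference(a, b, c, d, e, f):
    """The assumed six-input function, written out directly."""
    return (a & b & c & d & e) | (a & b & c & d & f)


if __name__ == "__main__":
    same = all(y_cascaded(*bits) == y_reference(*bits)
               for bits in itertools.product((0, 1), repeat=6))
    print("cascade matches reference:", same)
```

The point of the sketch is only that the six-input function fits in two lookup tables once the common term is factored out; a function such as (A B C D) XOR E cannot be factored this way.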
So if by chance a lot of lookup tables are not being used in your design, this memory goes to waste. For that reason the write control is also made available to the general-purpose routing lines. In principle you can then use the lookup table as a RAM, and you have to use the Core Generator tool to instantiate it. The only thing is that it can then no longer be used for logic. But if you have a design in which a lot of lookup tables remain free and you have used up all the block memories, the hard-core memories, then you can use these as memory. Since the lookup tables are distributed all over the chip, it is called distributed RAM.

I mentioned that there can be a variable latency because the memory is distributed, and one needs to take care: if the output is not registered, you will experience different delays on reads, and that can mess up your design. As an example, suppose you are trying to generate a waveform from a lookup table. In waveform generation you store the samples and output them at regular intervals. If the read access time of the memory is variable, there will be jitter and the waveform will be distorted. You might not think about it, but this can be the result. If you use distributed RAM for waveform generation, a huge waveform with a lot of samples stored, you will find that the waveform is distorted depending on its frequency. If you are generating a very low-frequency wave it may not create much of a problem, but if you read the memory very fast, where this variation in latency matters, then you have a problem. So you should take care of that.

So let us come to the carry logic. This is the full adder; we are talking about a ripple adder.
This is the ripple adder equation. The sum of the i-th stage is s_i = a_i XOR b_i XOR c_i, and the output carry of that stage is c_(i+1), because it is the carry input to the next stage, s_(i+1). That is c_(i+1) = a_i b_i + (a_i XOR b_i) c_i. It means that if both inputs are 1, you definitely know a carry is generated, and it does not matter whether the input carry is 1 or 0: 1 plus 1 is 0 and the carry goes out, and if the input carry is 1 it only makes the sum 1. If both are not 1 but one of them is 1, then the carry out depends on the carry input: if the carry input is 1, the carry output is 1. That is what the equation says.

Now if you implement this in an FPGA using lookup tables, you need one lookup table for s_i and one lookup table for c_(i+1), because a lookup table has only one output. So for a 4-bit adder you need 8 lookup tables. More than that, c_(i+1) has to be connected through wires to the input of the next stage. So a lot of wiring is required, and this slows down the adder. If the adder is slow, the multiplier will be slow, and so on; all the arithmetic operations slow down.

This happens because of the interconnection between the adder stages. One thing we can do is implement the carry equation in hardware; we will see how. There is dedicated hardware for the carry alongside the s_i stage, and c_(i+1) goes to the next stage not through the programmable interconnection matrix or the programmable switches but directly by a wire. That is how it is speeded up. I think we had just started describing it in the last class. The equation is implemented here by this lookup table and this MUX.
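The ripple adder equations above can be checked with a short Python sketch. It is only a behavioral model of the equations, not of any FPGA primitive; the function names and the `luts_needed_without_carry_chain` helper are assumptions made for this illustration.

```python
def full_adder(a, b, c):
    """One ripple-adder stage: s_i and c_(i+1) from a_i, b_i, c_i."""
    s = a ^ b ^ c
    c_out = (a & b) | ((a ^ b) & c)
    return s, c_out


def ripple_add(x, y, width, c_in=0):
    """Chain `width` full adders; the carry ripples from stage to stage."""
    carry = c_in
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


def luts_needed_without_carry_chain(width):
    """One LUT for s_i plus one for c_(i+1) per bit, since a LUT has one output."""
    return 2 * width


if __name__ == "__main__":
    print(ripple_add(9, 8, 4))                 # 9 + 8 = 17: sum bits 0001, carry out 1
    print(luts_needed_without_carry_chain(4))  # the 8 LUTs mentioned for a 4-bit adder
```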
We will see that in the real CLB, but before going there: this is the lookup table, a four-input lookup table. What is done is that a_i and b_i are connected here. Mind you, c_i is not connected here. You do not implement the sum a_i XOR b_i XOR c_i in a lookup table, because the trick is that a_i XOR b_i is common to the sum and the carry. So the idea is to implement this common term in the lookup table. You connect a_i and b_i and implement the XOR of a_i and b_i; then this MUX forms the carry and this XOR gate forms the sum. These are dedicated circuits. The lookup table forms a_i XOR b_i, and you can see it coming here and combining with c_i, which comes out of the previous stage. So this XOR gate forms a_i XOR b_i XOR c_i, and the sum comes out there; it can go out directly or go to a flip-flop.

Now see that a_i XOR b_i is the select line of this MUX. Assume it is 1; then this is the path, and c_(i+1) is equal to c_i. That is what the equation is saying: if exactly one of the bits is 1, the carry depends on the input carry. If the carry input is 0, the carry output is 0; if the carry input is 1, the carry output is 1.

If the select is 0, then c_(i+1) is a_i b_i, and you can see that this other path is selected. If the XOR is 0 there are only two possibilities: both bits are 0 or both bits are 1. So you do not have to do an AND; you just pick up a_i and connect it here. If a_i is 0 the carry out is 0, and if a_i is 1 then both are 1. For the input combinations 01 and 10 this path is selected, and for 00 and 11 that path is selected. If both are 1 then a_i is definitely 1, and that is enough. Then c_(i+1) goes to the second stage, which works exactly the same way.
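The stage just described, a lookup table producing a_i XOR b_i, a dedicated XOR for the sum and a dedicated MUX for the carry, can be sketched as a small Python model. The function names are invented for this sketch; it models only the logic, not the routing or timing.

```python
def carry_chain_stage(a, b, c_in):
    """One bit of the adder as mapped onto LUT + carry MUX + XOR gate."""
    p = a ^ b                   # lookup table: only a_i XOR b_i, no carry input
    s = p ^ c_in                # dedicated XOR gate forms the sum
    c_out = c_in if p else a    # dedicated MUX: p selects c_i, otherwise a_i
    return s, c_out


def carry_chain_add(x, y, width, c_in=0):
    """Stack the stages; the carry passes stage to stage on the dedicated path."""
    carry = c_in
    result = 0
    for i in range(width):
        s, carry = carry_chain_stage((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    # The MUX form must agree with c_(i+1) = a.b + (a XOR b).c for all inputs.
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                assert carry_chain_stage(a, b, c)[1] == (a & b) | ((a ^ b) & c)
    print(carry_chain_add(7, 9, 4))  # 7 + 9 = 16: sum bits 0000, carry out 1
```

Note that each bit now costs one lookup table instead of two, because the carry never occupies a lookup table.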
So let us go to the slice diagram of the CLB. Look here: this is where a_i and b_i are connected, and you form a_i XOR b_i in this lookup table. You can see it coming here to this exclusive OR gate, and the carry in from the previous stage comes here through this MUX and is combined with a_i XOR b_i to give the sum. You can take the sum out directly or register it through a flip-flop, depending on the function you are implementing. If you are implementing a counter, say, it goes here to the flip-flop; if for whatever reason you are not registering it, it goes out directly.

Now look at the other function. a_i XOR b_i goes to the select line of the carry MUX. If it is 1, c_(i+1) is nothing but c_i, and if it is 0, a_i (or b_i) comes in as the input. So this is the carry logic: it works along with the a_i XOR b_i formed here, and it forms the carry for the next index, i+1, while this is the sum. The sum is formed using this lookup table and XOR gate, the sum for i+1 using that lookup table and XOR gate, the carry circuit for stage i is here, for stage i+1 there, and so on.

You can also see that there is an input MUX in this stage. If you are starting the 0th stage of the adder in this stage, it allows you to have a carry input to the stage; the first stage can have a C0, or C-in. That is what this is for. It is not available in this other stage; it is available again only in the next slice. I hope that is clear to you.

So the suggestion is that when you implement an adder in an FPGA, you do not write the equations of the adder using a loop or a generate statement as we showed in VHDL. When we discussed VHDL we were not discussing the FPGA implementation part; we were discussing VHDL synthesis.
When you come to an FPGA, if you write the equations of a ripple adder, everything will get implemented in lookup tables and everything will be slow. To be able to use the carry chain there is one direct method and one very circuitous method. The direct method is to use the plus operator. If you want to add A and B, you just say C gets A plus B, straightforward. Otherwise you can go very low level: you can start playing with the lookup tables and the carry logic, instantiate them, interconnect them and all that, but it is not a very good idea and it is not required. So the advice for FPGA design is that whenever you use a plus operator, the tool will use the carry logic. When you use a counter or a higher-level function, somewhere this plus will appear; you say count gets count plus 1, as we have seen in our examples, and the carry logic will be used automatically.

There is also an AND gate, which was shown earlier, that is used for the partial-product implementation in the case of a multiplier.

In some cases the carry chain can be used for cascading. Suppose the inputs are A, B, C, D and you implement A AND B AND C AND D. Suppose you want to AND that with, say, E F G H. There is a way in which it comes in through this kind of circuit, gets ANDed here, and you can take the output. Not all FPGAs have it, at least not the Virtex, but there are FPGAs where this AND cascading or OR cascading can be done, and you can see that it is very easy with a multiplexer kind of scheme. In that case it is called cascade logic or a cascade chain.

So let us come to the flip-flop. Now I want to talk about this EC, which is called enable clock. When we discussed our initial case study of the CPU, we said that the controller controls the data path, and many times it has to enable a register or a counter or things like that.
For such cases we have seen one foolproof way of implementing the enable, which is a recirculating 2-to-1 MUX. Suppose this is a latch signal from the controller and this is the register. When latch is 1 the input gets registered; when latch is 0 the output gets recirculated, and there is no timing issue. If you do clock gating, there is a timing issue. Since the latch signal comes from the state machine, it has a delay with respect to the clock, the tcq plus tcomb delay, so this enable signal arrives offset within the clock period. At the next clock edge, when the clock comes to the register, the data coming in from here gets latched. And the data has plenty of setup time, because the path gets enabled well before the latching clock edge. That is what this recirculating MUX does.

Now this recirculating MUX is built into the flip-flop of the FPGA. The real FPGA flip-flop is a 2-to-1 MUX plus the flip-flop, combined. This select input is brought out of the flip-flop and is labeled clock enable. So whenever you see it, assume there is a 2-to-1 MUX with clock enable as its select line, which allows you to implement an enable signal.

Look at this VHDL code, which says: if clock'event and clock = '1', then if some control signal is '1', then Q gets D. Normally a 2-to-1 MUX is required to implement that, but in the FPGA it is built in. Whatever the control signal is, it is connected here, the clock is connected here, this is D and this is Q, and you do not have to do anything more. One level of 2-to-1 MUX is available. If you need further control, you need to have the MUXes outside; this will not help you. But in most cases, on average, you need about one level of control: enabling a counter, enabling a register and so on.
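The behavior of a flip-flop with a built-in recirculating MUX can be sketched in Python as follows. The class name `FlipFlopCE` and the helper `run` are made up for this illustration, and each call to `clock_edge` stands for one rising clock edge.

```python
class FlipFlopCE:
    """D flip-flop with clock enable, modeled as a 2-to-1 MUX in front of D."""

    def __init__(self):
        self.q = 0

    def clock_edge(self, d, ce):
        # Recirculating MUX: CE = 1 selects the new data, CE = 0 feeds Q back.
        self.q = d if ce else self.q
        return self.q


def run(samples):
    """Apply a list of (d, ce) pairs, one per clock edge, and collect Q."""
    ff = FlipFlopCE()
    return [ff.clock_edge(d, ce) for d, ce in samples]


if __name__ == "__main__":
    # D changes on every edge, but Q only follows it when CE is 1.
    print(run([(1, 0), (1, 1), (0, 0), (0, 1)]))  # [0, 1, 1, 0]
```

The clock itself is never gated in this model, which is the reason the scheme has no timing issue.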
That is easily implemented using this clock enable, and some vendors call it enable clock. Some people call it CE, clock enable, and some call it EC, enable clock, probably to make a difference between devices or vendors.

Now let us see how a circuit gets mapped to this kind of architecture. Suppose you have a combinational circuit. You know that lookup tables are used to implement it. They can be combined if there are more inputs, two four-input tables into a five-input table, two five-input tables into a six-input table and so on, and they can be combined further or cascaded. So a combinational circuit will be implemented in one or more lookup tables, cascaded or combined using MUXes, whatever it is.

Now suppose you have a data path. I have written sequential circuit here, but assume it is a data path or a part of a sequential circuit; it does not matter. You have a flip-flop, then a combinational circuit, then another flip-flop. In the slice, in the CLB, there are a lot of flip-flops, and all of these get implemented in the flip-flops of the CLB. The flip-flop output goes to the combinational circuit, the combinational circuit goes to the next flip-flop, and the combinational part gets implemented in one or more lookup tables. We have seen in the diagram that the combinational output can be taken to the flip-flop, and its inputs can come from the flip-flop before.

If you take an FSM, a finite state machine, you have logic whose output goes to the flip-flops, the flip-flop outputs are fed back to the logic, and some inputs come in as well. Again it is the same thing. You implement the next-state logic using lookup tables, which get the fed-back flip-flop outputs through some wires and the inputs through other wires. The output of the next-state logic goes to the flip-flops, and the flip-flop outputs also go to the output logic.
So if you look here, you have the lookup table acting as next-state logic, its output goes to the flip-flop, the flip-flop output is fed back here to form the feedback, and it can also go to the next lookup table to implement the output logic. That is how things get implemented in an FPGA.

Now let us look at the IO block. We have seen that all around the FPGA there are IO pins, and attached to each IO pin is an IO block. The star of the IO block, the main element which implements the IO functionality, is this tri-state gate. When it is enabled it acts as an output; when it is cut off the pin acts as an input. This is the IO pin, and you can assume the FPGA wires are here, to which all these signals can be connected. So there are essentially three paths: one from the internal wires to the output pin, one from the pin to the wires inside the FPGA, and an enable signal, which is also controlled from the wires. Three paths: output, input and enable. All of these can be combinational: the output can come directly, the input can go in directly, and the enable path can also be combinational.

But there are two things. We might get an input which is not synchronous with the clock here, so the input can be synchronized. We have seen the problem of metastability and we talked about the single-stage synchronizer. So you have the option of taking the input directly inside to the wires, or through a synchronizing flip-flop, and that flip-flop has clock selection among multiple clocks, clock enable, reset, everything that is there in all the flip-flops.

Then there is a programmable delay which you can add on the input to make the hold time zero, because meeting the hold time is quite a tough thing. We have seen that for a hold-time violation not to happen, tcq plus the minimum tcomb should be greater than the hold time.
Many times, if the hold time is large it is difficult to meet, so this delay can be adjusted to make the hold time zero. It will increase the setup time, but that does not matter. Similarly you have the enable path, which can come directly through a combinational path or registered. So that is the basic IO block. Some voltage standards need a reference voltage, and that is this particular pin.

These pins support various IO standards, which can be programmed in the constraints editor. We will see the constraints editor when we discuss the tool. The block supports various voltages, maybe 3.3, 2.5, the PCI standard and so on, and the slew rate, the rate of change of the output, can be programmed. You can program a pull-up or a pull-down resistor. We have discussed that if the pin is tri-stated it will be in high impedance and will pick up noise, and whatever inputs are connected to it can start switching, because the level may be near the threshold voltage. Then a slight noise makes it go up and down across the logic level, and the input circuitry switches and dissipates power. To avoid that you can pull up or pull down, and there is also a hold circuit, or weak keeper, which remembers the last value. You can choose pull-up, pull-down or the keeper. The keeper is a latch which holds the previously driven value very weakly, and that too can be programmed.

So essentially this is what the slide says. There are three paths, output, input and tri-state, and each can be taken directly as a combinational path or through a flip-flop. The flip-flops have set/reset, clock enable, clock selection and all that. The programmable delay can make the hold time zero. The pull-up, pull-down, hold circuit and slew rate are all programmable.
Often you have implemented something in a CLB and the CLB output, from a flip-flop, is taken to a pin or a pad. What the tool does is move that flip-flop to the IO block, because it is very nice to have the last registering at this point. If the flip-flop is somewhere inside a CLB, you have to run wires from it, which adds a lot of wire delay and may create timing problems for the logic that follows. So many times it is a wise idea to register the output just before it leaves the pin, because then it goes out with the minimum delay. After the clock edge there is only the tcq delay, so whichever circuit is using this output as an input will have fewer timing problems. Similarly, if the input is asynchronous we can synchronize it, and if everything is synchronized the tri-state enable can also be registered.

The block supports various IO standards: CMOS 3.3, 2.5, 1.8 and all that. There will always be two supplies, a core supply for the internal logic and an IO block supply. The core can work at a low voltage, resulting in low power dissipation, and at the IO pins the voltage is scaled up; inside you keep it low because if the swing is more there will be more dissipation. In most recent chips, not only FPGAs, the core works at a low voltage near 1 volt, and at the pins it is scaled up to 2.5 or 3.3 volts. That is done in the FPGA too.

This is the hold circuit. Suppose you have a bus and you program the hold circuit onto the bus. Suppose the bus is driven to 1. This 1 comes here, this becomes 0, this 0 becomes 1 again, and the value gets latched. But if you put a normal latch, this inverter will drive the bus very strongly, and if some other output is driving the bus to 0 while this is driving it to 1, there will be a short circuit.
That is why a high resistance is kept here, so that the latch drives the bus weakly and holds on to the previously driven value very weakly. Very weakly means that, since it drives through a resistor, some other output can pull the bus down and make it 0. And since there is a high impedance, nothing happens; otherwise there would be a short circuit. So that is the purpose of the hold circuit: it remembers and holds on to the last value of the bus, and thus avoids glitches and the unnecessary switching which dissipates power, creates noise and so on.

This is the view of the internal wiring structure. You have a lot of wires. There is a huge switch here which connects the vertical and horizontal wires together, and normally the input to the CLB would be taken from the wires here. But in the Virtex there are connections from this switch directly to the CLB. That is what is shown here: this is the switch, which connects to the adjacent switches, and this is the logic block. What it shows is that the switch has direct connections to the CLB, and the CLB output is fed back to the CLB input. So when a CLB output has to be taken to its own input, it need not go out through the general wires and come all the way back; it can go directly to the input. It also shows that adjacent CLBs are directly connected. There is a CLB here, and some of its outputs go to the inputs of this CLB, and similarly some outputs of that CLB come to the inputs of this one. There are no programmable interconnections in between, so the interface between adjacent CLBs is very fast. If things are placed close together, the delay between them is minimal.

These are some statistics of the Virtex routing wires. The adjacent CLBs are directly connected. Then there are 24 single-length lines.
It means that between one switch and the next there are 24 wires, and the same between every pair of neighboring switches. You have 72 buffered hex lines, meaning lines from one CLB to the sixth CLB: from here to the sixth CLB there is a single wire running. That makes long connections less costly. With single-length wires, a long connection needs a lot of interconnection points to reach the destination, and all of them add to the delay; the hex line reduces the delay from the first CLB to the sixth. There are 12 buffered long lines, meaning lines running from one end of the chip to the other, not only horizontally but vertically also. And per channel there are 4 tri-state lines, both horizontal and vertical; these are separate lines which can form tri-state lines, or busing. Every CLB has two outputs through tri-state gates, which are connected to those 4 common wires in some fashion. The two outputs are connected to two of the wires, so you can form a bus using these tri-state gates at the output of the CLB.

So now we have learnt the architecture. We have looked basically at the architecture of the CLB and understood how the CLBs are interconnected. Let us try to estimate some resource requirements. My question is this: we have a finite state machine with 2 external inputs, 3 states and 2 Mealy outputs. How many CLBs do you require in a Virtex FPGA to fit this finite state machine?

If you look at the finite state machine, you have next-state logic which decodes the present state and the inputs. To decide how many lookup tables this requires, we need to know how many flip-flops there are and how many inputs there are.
Similarly you have to determine the number of flip-flops required, and you should remember that next-state logic is required for each flip-flop. If there are 4 flip-flops, Q3, Q2, Q1, Q0, there will be D3, D2, D1, D0, and the next-state logic must give 4 outputs to connect to D3, D2, D1, D0. So let us work out both requirements for this case, knowing that the Virtex CLB has a four-input lookup table and a flip-flop, and four such pairs.

This was the example: a finite state machine with 3 states, so it requires 2 flip-flops with binary encoding. The next-state logic has the 2 external inputs and the present state, which is the 2 Qs from the flip-flops. So the next-state logic has 2 state variables and 2 inputs, 4 inputs in all. There are two state variables, Q1 and Q0, so we need to decode for D1 and D0, and two four-input lookup tables are required, because one four-input lookup table is required per bit.

When it comes to the output logic, we have Mealy outputs, which decode the present state and, though I have not shown it, the inputs also. There are 2 flip-flops and 2 inputs, so the output logic has 4 inputs and needs a four-input lookup table. That is shown here: the output logic has the 2 inputs and the 2 state variables Q1 and Q0, so you require a four-input lookup table per output, and since we have 2 outputs you need two four-input lookup tables.

So we need 2 lookup tables for the next-state logic, 2 flip-flops for the state variables and 2 lookup tables for the output logic. In total that is 4 four-input lookup tables and 2 flip-flops, and we know that a CLB has 4 lookup tables and 4 flip-flops.
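The counting argument above can be written down as a small Python sketch. It assumes binary state encoding, that every next-state bit and every Mealy output depends on all state bits and all inputs, and a CLB with 4 four-input lookup tables and 4 flip-flops as described in the lecture. The function name `estimate_fsm` and its parameters are made up for this illustration.

```python
import math


def estimate_fsm(num_states, num_inputs, num_outputs,
                 lut_inputs=4, luts_per_clb=4, ffs_per_clb=4):
    """Rough LUT/flip-flop/CLB estimate for a small Mealy state machine."""
    # Binary encoding: the number of bits needed to count up to num_states - 1.
    state_bits = (num_states - 1).bit_length()
    fan_in = state_bits + num_inputs
    if fan_in > lut_inputs:
        # Wider logic needs combined or cascaded LUTs; not covered by this sketch.
        raise ValueError("logic is wider than one lookup table")
    next_state_luts = state_bits   # one LUT per D input
    output_luts = num_outputs      # one LUT per Mealy output
    luts = next_state_luts + output_luts
    ffs = state_bits
    clbs = max(math.ceil(luts / luts_per_clb), math.ceil(ffs / ffs_per_clb))
    return {"luts": luts, "flip_flops": ffs, "clbs": clbs}


if __name__ == "__main__":
    # The example from the lecture: 3 states, 2 inputs, 2 Mealy outputs.
    print(estimate_fsm(3, 2, 2))  # {'luts': 4, 'flip_flops': 2, 'clbs': 1}
```

This is an upper-level estimate only; a synthesis tool may share or simplify logic and end up with fewer lookup tables.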
So basically this requires 1 CLB, with 2 flip-flops left over. You can see the power of this architecture: a small state machine requires only 1 CLB.

Similarly, let us take the example of an 8-bit counter with a parallel load feature, meaning a normal 8-bit counter with a reset which also has a synchronous load. We have seen the circuit of this counter. You have the 8 flip-flops, Q7 to Q0, whose value is incremented and fed back, and for the parallel load there has to be a load signal and the data-in signals. If you take each flip-flop on its own, there would be 8 things fed back, plus the load and the data in.

So for this 8-bit counter you need 8 flip-flops. The incrementer can use the carry chain, so essentially we need only 8 lookup tables, and the carry chain implements the plus 1. You can think of an adder with the carry input as 1; that implements the plus 1, and the result goes to the flip-flops. Whatever the Q is comes in here, and the carry chain uses it as an incrementer: this BX will be 1, and that adds 1 to the whole value. You do not need to give a plus 1 at the input of the lookup table; the carry input is used for the plus 1.

That is how the counter gets implemented. Beyond that we need the load feature and the data input. The load can be given here and the data in for the corresponding bit can be given here. We still need only a three-input lookup table, and what we have is a four-input lookup table, so that is enough to implement the incrementer plus the multiplexer for the loaded value. So essentially we require 8 flip-flops and 8 lookup tables with a carry chain.
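A bit-level Python sketch of such a loadable counter is given below. It is one possible arrangement, not necessarily what a synthesis tool produces: each bit uses one three-input lookup table (q_i, load, din_i) followed by the carry MUX and XOR, and, as an assumption of this sketch that the lecture does not state, the carry input is taken as 1 when counting and 0 when loading so that the loaded value passes through unchanged. The name `counter_next` is made up.

```python
def counter_next(q, load, din, width=8):
    """Next value of a loadable counter built from one LUT per bit plus carry chain."""
    # Assumption of this sketch: carry-in is 1 for counting, 0 while loading.
    carry = 0 if load else 1
    nxt = 0
    for i in range(width):
        q_i = (q >> i) & 1
        d_i = (din >> i) & 1
        lut_out = d_i if load else q_i      # 3-input LUT: q_i, load, din_i
        sum_i = lut_out ^ carry             # dedicated XOR gate
        carry = carry if lut_out else 0     # carry MUX; other input is 0 for "+1"
        nxt |= sum_i << i
    return nxt


if __name__ == "__main__":
    print(counter_next(5, 0, 0))       # counting: 5 -> 6
    print(counter_next(255, 0, 0))     # wraps around to 0
    print(counter_next(5, 1, 0xA5))    # loading: next value is 0xA5 = 165
```

In this arrangement the resource count matches the estimate above: 8 lookup tables and 8 flip-flops for 8 bits.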
So that is what is required. You have 8 flip-flops, the incrementer uses the carry chain, and the next-state logic for each bit uses 1 state variable for the plus 1, 1 load signal and 1 data input signal for loading. That is 3 inputs per bit, so the NSL, the next-state logic, requires a three-input lookup table, and we need 8 of them for the 8 flip-flops. So 8 lookup tables and 8 flip-flops: you will end up with 2 CLBs, or 4 slices. If you know what is inside a circuit, you can estimate how much resource is required to implement it.

As I said, we are discussing the Virtex architecture. That Virtex chip is not available now; there are the more recent versions, Virtex-5, Virtex-6, Virtex-7. You also have the Spartan-3, which has an architecture similar to the Virtex, and the Spartan-6, which is quite different. For all of these you can now go and look at the architecture of the CLB and understand its functionality, and given a code or a spec you can estimate how many lookup tables and flip-flops are required to implement it. That can be done easily.

We will take one more example. I will give a VHDL code and let us see how it gets implemented within the CLB, using slices. This is the library and package declaration, and this is the entity declaration, a very simple entity: A, B, C, D, E, F, G, H as inputs and Z as output, so 8 inputs and 1 output, which at first looks like combinational logic. In the architecture you write a process which is sensitive to A and B. It says: if A = '1' then Z gets '0', so A looks like a reset and Z is an output; elsif B'event and B = '1'. So we know that B is the clock, A is the reset and Z is the output. Under that we say: if C = '1' then Z gets (D and E and F and G) xor H. Again we know that when you say if C = '1' under the clock condition, C is a control signal.
So this can go to the clock enable, because that can enable the recirculating MUX, and Z is the AND of four variables XORed with the fifth variable. We discussed this in the last lecture: you cannot cascade two lookup tables here, you need to combine two lookup tables with an F5 MUX to implement it, because the XOR expands to the AND term with H-bar, ORed with the complement of the AND term with H. That is how the equation comes out, so you need a 5-input lookup table. That is the essence of the circuit. I have not shown the synthesized circuit because it is easy to imagine: A is the reset, B is the clock, C is the clock enable, Z is the output, and the logic gets implemented in a 5-input lookup table.

So let us come to the slice diagram. We need to combine two 4-input lookup tables using the F5 MUX. D, E, F and G are common, so you see D, E, F, G wired to the top lookup table and the same D, E, F, G wired to the second lookup table; you can see both outputs going to the F5 MUX, and the select line is the fifth variable, H. So our logic gets implemented here. That logic has to go to the D input of the flip-flop, so the F5 output has to reach the flip-flop. You can see that it is taken through this MUX and this MUX and comes to the flip-flop, and the clock enable is connected to C, because that is the control signal, as the code suggests. Now B is the clock: the clock is this pin, and it is connected to B. Assume that there are wires in the channel in front of the lookup table and that these pins are connected to those wires through some switch. One more point: A is the reset. You can see that A is connected to the set/reset line here, and that blue line goes to the INIT input of the flip-flop, so the flip-flop gets reset. So the idea here is that there is no magic. Most people do not look at the CLBs.
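The mapping just traced can be checked with a small Python sketch. Everything named here is an assumption made for illustration (lut4, LUT_H0, LUT_H1, f5_logic, RegisteredZ are hypothetical names, and which physical lookup table holds which truth table is not stated in the lecture); it is a behavioural model of the described process, not the synthesized netlist.

```python
from itertools import product


def lut4(table, d, e, f, g):
    """Read one bit from a 16-entry truth table addressed by four inputs."""
    return table[(d << 3) | (e << 2) | (f << 1) | g]


# Truth tables as they might be written at configuration time (assumed contents).
LUT_H0 = [1 if addr == 0b1111 else 0 for addr in range(16)]  # D and E and F and G
LUT_H1 = [1 - bit for bit in LUT_H0]                         # complement of that AND


def f5_logic(d, e, f, g, h):
    """F5 MUX: the fifth variable H selects one of the two 4-input LUT outputs."""
    return lut4(LUT_H1, d, e, f, g) if h else lut4(LUT_H0, d, e, f, g)


def matches_equation():
    """True if the two-LUT-plus-MUX structure equals (D and E and F and G) xor H."""
    return all(f5_logic(d, e, f, g, h) == ((d & e & f & g) ^ h)
               for d, e, f, g, h in product((0, 1), repeat=5))


class RegisteredZ:
    """Flip-flop with asynchronous reset A, clock B and clock enable C."""

    def __init__(self):
        self.z = 0
        self._prev_b = 0

    def update(self, a, b, c, d, e, f, g, h):
        if a == 1:                            # reset has priority over the clock
            self.z = 0
        elif b == 1 and self._prev_b == 0:    # rising edge of B
            if c == 1:                        # clock enable; otherwise Z is held
                self.z = f5_logic(d, e, f, g, h)
        self._prev_b = b
        return self.z


if __name__ == "__main__":
    print(matches_equation())
```

Calling update once per change of the inputs mimics the process sensitivity to A and B: Z changes only on a reset or on a rising edge of B while C is 1.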
However complicated the CLB looks, there is not much to it: it is lookup tables. When you go to a higher-end FPGA you might have a more complex lookup table: instead of a four-input lookup table it will have a 6-input lookup table, and sometimes that lookup table is itself composed of two 5-input lookup tables. Here what we have is two 4-input lookup tables which can be combined into 5-input and 6-input functions, but in a complex FPGA you might have multiple 6-input lookup tables, each of which can be used as two 5-input lookup tables. They play with this kind of functionality, but if you spend some time you will be able to understand it, and given a code you can even draw the synthesized circuit and place and route it manually, for your own understanding, onto the slice or within the CLB.

What you can do is write such a VHDL code, implement it using the tool, place and route it, and then, with the floor planning tool, go inside and literally see the connections; you can trace the wires inside. Maybe we will do that when we discuss the tool. I am keeping the demo of the tool to the end, because the problem with the tool is that it keeps getting modified: the GUI changes, the menus change, more functionality is added. If I start showing things in the middle of this course and, after say 1 or 2 years, the course survives and is found useful, that part will become outdated. So I am keeping that towards the end. I hope you will have the maturity to adapt quickly to that kind of thing; we do not need to show a tool and make it work in every lecture.
So let us come back to the slide. That is how a given VHDL code is mapped here. I did skip one step: I did not draw the block schematic that the synthesis tool produces. You can do that yourself, because it is fairly simple; that is why I did not draw it.

The Virtex also has block memory built in. The Virtex FPGA has dedicated memory, and if you want to use it you should not write VHDL code for it, though the tool allows that; what you can do is instantiate the template for it using a tool called Core Generator. This dual-port memory is quite useful because, unlike a normal memory with a single port, it has 2 ports: you can see that there are 2 sets of address lines, 2 sets of data lines, 2 write lines, 2 enable lines and so on. It is a single memory, and of course the data input and data output are separate, which is true inside every memory chip: you can imagine the memory cell, or assume a flip-flop, in each location, where the data input is the D of the flip-flop and the data output is the Q of the flip-flop.

The only thing is that there are multiple locations. For the location you want to read there is a MUX: the address goes to the select lines of the MUX and the appropriate location is muxed to the output. Similarly, the input goes in parallel to all the flip-flops, but the address lines clock only the correct location: there is an address decoder which enables the clock of the addressed flip-flop. So in principle you can have multiple ports: on the same set of locations, use another MUX with a second set of address lines to read another location. Similarly, while you are writing to one location, a second address decoder can be clocking another location.
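The decoder-and-MUX picture above can be written down as a short Python sketch. The function names (decode, write_clock, read_mux) are hypothetical and the list of integers stands in for the flip-flops in each location; it is a conceptual model of the description, not the actual block memory circuit.

```python
def decode(addr, depth):
    """Address decoder: a one-hot list with one enable line per location."""
    return [1 if i == addr else 0 for i in range(depth)]


def write_clock(cells, addr, data_in, write_enable):
    """One write clock: data_in reaches every location, only the decoded one latches it."""
    enables = decode(addr, len(cells))
    return [data_in if (en and write_enable) else old
            for old, en in zip(cells, enables)]


def read_mux(cells, addr):
    """Read MUX: the address is the select line that picks one location."""
    return cells[addr]


if __name__ == "__main__":
    cells = [0] * 8                           # eight locations, all cleared
    cells = write_clock(cells, 3, 0xAB, 1)    # write through the decoder
    # Two read MUXes with independent addresses look at the same cells,
    # which is the idea behind adding a second port.
    print(read_mux(cells, 3), read_mux(cells, 0))
```

A second write port would, in the same spirit, be a second call to decode with its own address, which is where the question of two ports touching the same location arises.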
So that is how dual-port RAMs are built. I have not shown the internal details, but it is not a very difficult thing to imagine. In principle you can build any number of ports, 3 ports, 4 ports, it does not matter, but what is most useful is a dual port, because many times you can write using one port and read using the other. That is very useful: maybe some hardware which is computing is writing to memory locations, and at the same time the earlier output can be read and processed. So the computation can go on uninterrupted. If it is a single-port memory, the first computation block has to write and then stop, and then the next block has to read using the same port, so the throughput suffers. The dual port allows you to write and read at the same time at different locations. If it is the same location, of course, there is a conflict: if one port is writing and the other port is reading the same location, there is a problem, and what is read may not be correct.

You should also know that there is nothing special about separate input and output data lines. In a normal chip they are multiplexed together to save pins, but here it is ideal to have a separate input and a separate output, because you do not need a MUX select, that is, a separate write/read select, and all that. So that is how the dual port is implemented. It is a true dual port: each port can read or write, or be used only to read or only to write. It is all synchronous and works on the clock edge: when a clock edge comes the address and the data are latched and used, and similarly when a clock edge comes the read happens.
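The port behaviour described here, including the same-location conflict, can be illustrated with a hedged Python sketch. The class DualPortRam and the tuple layout (enable, write, addr, data_in) are assumptions made for this example. Returning None for a read that collides with a write is a modelling choice to mark the value as unreliable; it does not describe what the real memory outputs.

```python
class DualPortRam:
    """Behavioural sketch of a synchronous true dual-port memory (hypothetical API)."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def clock(self, port_a, port_b):
        """Apply one clock edge.

        Each port is a tuple (enable, write, addr, data_in).
        Returns (out_a, out_b, conflict); a read output is None when the
        port is idle, is writing, or collides with a write on the other port.
        """
        en_a, wr_a, addr_a, _ = port_a
        en_b, wr_b, addr_b, _ = port_b
        conflict = bool(en_a and en_b and addr_a == addr_b and (wr_a or wr_b))

        # Reads see the contents from before this clock edge (assumption).
        outs = []
        for en, wr, addr, _ in (port_a, port_b):
            if en and not wr and not conflict:
                outs.append(self.mem[addr])
            else:
                outs.append(None)

        # Writes take effect on the edge.
        for en, wr, addr, data in (port_a, port_b):
            if en and wr:
                self.mem[addr] = data

        return outs[0], outs[1], conflict


if __name__ == "__main__":
    ram = DualPortRam(16)
    ram.clock((1, 1, 3, 0x55), (0, 0, 0, 0))         # producer writes location 3
    print(ram.clock((1, 1, 4, 0x66), (1, 0, 3, 0)))  # write 4 while reading 3
    print(ram.clock((1, 1, 3, 0x77), (1, 0, 3, 0)))  # same location: conflict
```

In the second call the producer keeps writing while the consumer reads the earlier result, which is the uninterrupted throughput case; the third call shows the collision that the design has to avoid.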
It can be instantiated with the Core Generator tool. Several blocks can be combined for larger width and depth. If the same location is accessed from both ports there is a conflict and the read can be wrong. The memory can also be initialized in the VHDL code: you can instantiate it and specify some pattern to be in the memory to start with. It gets written while the FPGA is being configured, so that can be specified in the VHDL.

So I think we have come to the end of the lecture. What we have seen today is basically the carry chain and how it implements fast adders; then how a sequential circuit, the FSM, gets mapped onto the CLB; we have seen the IO block and the whole circuit; we have seen a little bit of the tristate bus lines; we have seen some examples of fitting an FSM and a counter; we have seen, given a VHDL code, how to trace the routing within a CLB; and we have seen the block memory usage. So we will wind up here, and we will go ahead with the configuration of the FPGA in the next lecture. I wish you all the best, and thank you.