Welcome to this lecture on field-programmable gate arrays in the course Digital System Design with PLDs and FPGAs. In the last lecture we looked at the carry logic in the Virtex, then at the routing resources and how combinational circuits, sequential circuits and data paths map onto the FPGA resources. We looked at the dual-port RAM and a few fitting examples: given a state machine or some circuit, how many resources it uses within the FPGA; and given a VHDL code, what circuit is synthesized and how the signal paths are traced within the logic block, or slice, of a Virtex FPGA. We will have a quick run through those lecture slides before we get into today's portion. So let us move on to those slides. What we said about the carry logic is this: the basic operation in arithmetic is addition, which underlies subtraction, multiplication, division, everything, so to make arithmetic faster the addition has to be faster. Each adder stage needs two outputs, a sum output and a carry output, and if you implement these in two lookup tables you need to interconnect them through the general wiring; when you cascade the full adders like that, the total delay becomes very high. So there is an advantage if this part is hardwired, put into dedicated logic, and that is what the carry chain is all about. We have seen that essentially, for a particular stage i, the inputs Ai and Bi are given to a lookup table where an XOR is performed, and that XOR output selects a MUX: when it is 1, the carry output is the carry input; when it is 0, both inputs are equal and the carry is generated when both are 1.
So essentially this implements the carry equation: Ai XOR Bi forms the select path, and Ai (equivalently Ai AND Bi when the select is 0) forms the generate path. To produce the sum, Ai XOR Bi is combined with the carry input through a dedicated XOR gate external to the lookup table, giving Si = (Ai XOR Bi) XOR Ci. You can imagine this going to the next stage, where Ai+1 and Bi+1 are input to the next lookup table, combining to generate Si+1 while the carry logic generates Ci+2, and so on. This is dedicated, built-in logic, and we have seen it in the real logic diagram of the slice, where you can identify the parts forming the carry logic and the XOR gate external to it; that sum can be taken out directly or registered through the flip-flop. The advice is: if you use the + operator, the tool will pick up this carry chain and use it, and you do not have to write the equations for a ripple adder yourself. In fact, if you do write them out, the design will not get mapped into the carry chain; it will use two lookup tables per stage. So whenever you want an adder, use the + operator. Also, in some FPGAs this carry chain can be used as a cascade chain: one stage's output can be ANDed, or ORed, with the next, depending on whether it is positive or negative logic. We have also seen that in a sequential circuit an FSM controls a register or a counter, and in our CPU example the FSM enables the data path through a recirculating MUX: when the enable signal from the FSM is 1, the input goes to the flip-flop; otherwise the output is recirculated. In an FPGA this recirculating MUX is built into the flip-flop, and the select line of the recirculating MUX is available as the clock enable, which some vendors call EC, enable clock.
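The advice about the + operator can be shown with a minimal VHDL sketch (entity and signal names are my own, and MUXCY/XORCY are the Xilinx primitive names the lecture's slice diagram corresponds to):

```vhdl
-- Minimal sketch: writing "+" lets the synthesis tool infer the
-- dedicated carry chain (LUT XOR + MUXCY/XORCY in each slice).
-- Hand-written ripple-carry Boolean equations would map to plain
-- LUTs instead and be much slower.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder8 is
  port (a, b : in  unsigned(7 downto 0);
        s    : out unsigned(8 downto 0));  -- extra bit for carry out
end entity;

architecture rtl of adder8 is
begin
  s <= resize(a, 9) + resize(b, 9);  -- "+" infers the carry chain
end architecture;
```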
These are all the same thing: some vendors call it CE and some EC. So if you write VHDL code like "on the clock edge, if some control signal is 1, Q gets D", then that control signal is automatically connected to the clock enable, because one level of recirculating MUX is already available. But if you have an else branch driven by another control signal, then another MUX has to come outside, and that will be implemented in a lookup table rather than in the flip-flop. On average you will mostly have one control signal, and having it built into the FPGA flip-flop is quite good, because otherwise a lookup table would be used unnecessarily for a 2-to-1 MUX; by building it into the flip-flop the lookup table is saved. A 2-to-1 MUX has 2 data inputs and 1 select line as inputs, so it is 3-input logic, and a 4-input lookup table would be wasted implementing it. When you have a combinational circuit in an FPGA, it will be mapped to one or more lookup tables: maybe two lookup tables cascaded or combined, or four of them combined using the F5 and F6 MUXes, which can in turn be cascaded with something else, and so on; you can work that out. When you map a sequential circuit you have flip-flop, combinational circuit, flip-flop, whether it is a state machine or a data path; in a state machine the source and destination flip-flops can be the same registers, while in a data path there are separate source and destination registers, but the structure of the path is the same. The flip-flops get implemented in one or more slice flip-flops, and the combinational circuit gets implemented in the lookup tables.
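The clock-enable inference just described looks like this in VHDL (a sketch with hypothetical names; the key point is the `if` with no `else` inside the clocked branch):

```vhdl
-- Sketch of clock-enable (CE/EC) inference: because there is no else
-- branch, Q implicitly holds its value, and the tool connects "en"
-- to the flip-flop's built-in CE pin instead of burning a LUT on a
-- 2-to-1 recirculating MUX.
library ieee;
use ieee.std_logic_1164.all;

entity ce_reg is
  port (clk, en, d : in  std_logic;
        q          : out std_logic);
end entity;

architecture rtl of ce_reg is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        q <= d;   -- no else: maps to CE, not to a LUT MUX
      end if;
    end if;
  end process;
end architecture;
```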
And again, the flip-flops go to the flip-flops of the slices, and the best we can hope for is that a combinational circuit followed by a flip-flop gets implemented in the same slice. Depending on the number of flip-flops it might occupy multiple slices, or it might require some extra flip-flops from another slice; it all depends on the complexity, and unless we see the circuit we cannot say exactly how it gets fitted. The same holds for a sequential circuit or finite state machine: you have flip-flops and the next-state logic. If you go back to the slice diagram, you have a lookup table followed by a flip-flop, so the combinational circuit in the lookup table followed by the flip-flop can get implemented there, and that output can go to the lookup table of the next slice, and so on. That is how a real sequential circuit gets mapped into an FPGA. We have also seen the IO block: essentially it is a tri-state buffer which provides an output path, an input path and a tri-state control path, and each can be combinational. As the diagram shows, the FPGA wires come here, and the output can go out directly or through a flip-flop so that it is synchronised first. That is a good idea, because if the signal comes through some combinational circuit there is already a lot of delay from the clock edge, and going out of the chip it can suffer further delays. So it is useful to synchronise it with the clock, so that it appears with minimum delay after the clock edge; then it can be re-synchronised at the receiving end, or go through some combinational circuit there and get re-synchronised.
Similarly, the input path can be taken directly or it can be synchronised. All these flip-flops have clock enables, the clock can be chosen from multiple clocks, and they have set and reset. A programmable delay can be added on the input to make the hold-time requirement zero: sometimes the signal coming from an external source is a synchronous signal, but the hold time can create a problem because meeting the hold time is tough, so we add some delay to make the hold-time requirement zero; once the input is synchronised, hold time is not much of a worry. Similarly, the tri-state control path can be combinational or registered. And as we said, the pad can be pulled up or pulled down, because when you tri-state a signal it floats, noise can get picked up, and the input buffers can switch. That can be avoided through a large pull-up or pull-down, or you can program a keeper circuit which weakly holds on to the last value to avoid noise pickup; the IO block also supports various IO standards. The keeper is a latch-like circuit: it latches the pad value, and you can see the output is driven back through a large resistance, so even though the pad is being held, another driver can pull it up or down with a low resistance; there is no VCC-to-ground short circuit because of this resistance, but it avoids the switching caused by noise. Then there is the main wiring diagram: we have the main switch matrices, the logic block, and the inputs to and outputs from the logic block. As the picture shows, there are direct wires running from the switch matrix to the CLB, and adjacent CLBs are connected with dedicated wires; that means one CLB's output can go to the input of the next CLB, and that CLB receives the input without going through the general routing.
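The three IO-block paths (registered output, registered tri-state control, registered input) can be sketched in VHDL like this (hypothetical names; the `'Z'` assignment is what infers the pad's tri-state buffer):

```vhdl
-- Sketch of an IO-block style bidirectional pad: registering d/oe/din
-- corresponds to using the IOB flip-flops so signals leave and enter
-- the chip with minimum delay after the clock edge.
library ieee;
use ieee.std_logic_1164.all;

entity io_pad is
  port (clk   : in    std_logic;
        oe, d : in    std_logic;
        pad   : inout std_logic;
        din   : out   std_logic);
end entity;

architecture rtl of io_pad is
  signal d_r, oe_r : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      d_r  <= d;    -- registered output path
      oe_r <= oe;   -- registered tri-state control path
      din  <= pad;  -- registered input path
    end if;
  end process;

  pad <= d_r when oe_r = '1' else 'Z';  -- tri-state pad driver
end architecture;
```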
So that can be very fast, and there are some statistics on the number of wires available. Then we have seen the bus lines: two tri-state gates per CLB to form a bus, which is not available in current FPGAs, so if you want muxing there you have to do AND-OR muxing rather than tri-state muxing. Then we looked at an FSM example with 2 inputs, 3 states and 2 Mealy outputs, and worked out how many CLBs it requires; we came out with basically 1 CLB and 2 flip-flops. You can refer to the last lecture's discussion for the details. That is how it implements, and we also looked at the 8-bit counter with a parallel-load feature. This is the counter: you have 8 flip-flops with the plus-1 logic, and we said that the lookup table with the carry chain can implement the plus-1 feeding the flip-flop. So it will be 8 flip-flops and 8 lookup tables with the carry chain, which requires basically 2 CLBs. We also looked at a code with inputs A to H: it is a process where A is the reset, B is the clock, C is a control signal, and Z is a 5-input function of D, E, F, G and H, roughly (D and E and F and G) xor H. We discussed that this will be two 4-input lookup tables cascaded: C goes to the clock enable of the flip-flop, B is the clock, A is the reset. Since we need a 5-input function, D, E, F and G go to the two lookup tables, and H is the 5th variable that selects the F5 MUX. That MUX output is routed to the flip-flop, and the flip-flop gets the clock, which is B; the reset, which is A, the blue line; and the clock enable, EC, which is C. That is how it gets mapped. As I said, you can write this code, select a Virtex or Spartan-3 FPGA, synthesize it, implement it, go to the floorplanner, and zoom into the CLB which implements it.
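A hedged reconstruction of that A-to-H example (the exact Boolean function is my assumption from the transcript; the port labels follow the lecture's A, B, C, ... naming):

```vhdl
-- Hedged reconstruction of the lecture's A-to-H example.
-- A = async reset, B = clock, C = clock enable, Z = f(D,E,F,G,H).
-- The assumed 5-input function needs two 4-input LUTs plus the F5 MUX,
-- with H as the selecting 5th variable.
library ieee;
use ieee.std_logic_1164.all;

entity example is
  port (a, b, c, d, e, f, g, h : in  std_logic;
        z                      : out std_logic);
end entity;

architecture rtl of example is
begin
  process (a, b)
  begin
    if a = '1' then                        -- A: asynchronous reset
      z <= '0';
    elsif rising_edge(b) then              -- B: clock
      if c = '1' then                      -- C: maps to clock enable (EC)
        z <= (d and e and f and g) xor h;  -- assumed function of D..H
      end if;
    end if;
  end process;
end architecture;
```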
Then you can see the wiring connections inside, so you can verify that it is working; if time permits, I will show this exercise in the tool. We have looked at the dual-port RAM, and true dual-port RAMs are available because the block RAM has 2 ports, each with input and output separated. So you could write through one port and read from the other, or you can write using both ports; the only thing is that there should not be a conflict, meaning both ports should not access the same location at the same time. If somebody is writing through one port and you are reading the same location through the other port, the read value may be wrong, and if both try to write the same location, we are not sure which one will succeed. That conflict should be avoided, but otherwise it is a true dual port. We also discussed multi-port memories a little; they are not very difficult to implement, but these block RAMs are hard-wired basic blocks, and they can be combined in various ways to implement different widths and depths. For example, if a block is some depth by 8, you can combine 2 of them in parallel to make it 16 bits wide, or combine both to double the depth, which involves some muxing, as we have seen with the lookup tables; all of that is part of the block RAM. But you should know that there is a basic granularity in the width and the depth. If you choose some arbitrary width and arbitrary depth, some block RAM gets wasted; you might look into the FPGA data sheet and find the total size of memory available, but with an arbitrary depth and width you will see that some memory is wasted. So you need to take care: find the basic granularity, the basic size in terms of depth and width.
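A minimal sketch of a simple dual-port RAM that most synthesis tools map onto block RAM (names and sizes are my own; one write port, one read port, both synchronous):

```vhdl
-- Sketch of an inferable simple dual-port RAM: write through one port,
-- read through the other, all on the clock edge. If waddr = raddr in
-- the same cycle, the read data is tool/mode dependent -- that is the
-- conflict the lecture warns about.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity dp_ram is
  port (clk          : in  std_logic;
        we           : in  std_logic;
        waddr, raddr : in  unsigned(7 downto 0);
        din          : in  std_logic_vector(7 downto 0);
        dout         : out std_logic_vector(7 downto 0));
end entity;

architecture rtl of dp_ram is
  type ram_t is array (0 to 255) of std_logic_vector(7 downto 0);
  signal ram : ram_t := (others => (others => '0'));  -- initial contents
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(waddr)) <= din;  -- synchronous write
      end if;
      dout <= ram(to_integer(raddr));   -- synchronous read
    end if;
  end process;
end architecture;
```

The initial value aggregate corresponds to the initialization-in-VHDL point made below.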
If you choose an integer multiple of the basic depth and width there will be no wastage, but if it is fractional, some gets wasted, because these blocks are cascaded or ganged together at the block level, not at an arbitrarily fine size; the basic blocks are combined together to build the memory, and that should be kept in mind. So this is a true dual-port memory: each port can read or write, or one can read and one can write; it is all synchronous read and synchronous write, everything happening on the clock edge; it can be combined for larger width and depth; and it can be instantiated through the CORE Generator tool. When you want to use it you invoke the CORE Generator, which gives an instantiation template that can be cut and pasted into your code. There is the conflict condition to respect, and when you use this memory you can initialize it with some data, and that initialization can be specified in the VHDL code. So that is what we saw in the last class; maybe I have gone a little fast, so please look at both the last lecture and this lecture. Now let us come to another part. We have discussed the topic of metastability: for a flip-flop to latch, or register, the input data to the output, the input has to satisfy some timing with respect to the clock edge. That means when the clock edge comes, the data has to be set up some time before it, and after the clock edge it should be held for some more time, so that the value at the input gets faithfully copied, or transferred, to the output. So there is a window around the clock edge in which the data has to be stable. The time before the clock edge is called the setup time, and the time after the clock edge is called the hold time. If that is met, then after a propagation delay Tco the output appears after the clock edge.
And if you do not meet that, there is a chance that the output will be wrong, not a faithful copy of the input; in the worst case it can get stuck in between, and we have discussed this phenomenon and seen how to avoid it by synchronization. We looked at single-stage, double-stage and multi-stage synchronization and other techniques, and also a little at how the failure probability changes. That is the essence of the input timing of a flip-flop. Now take a data path: this timing determines the maximum clock frequency and is also used to check for hold violations. Take a data path with a source register whose output goes through a combinational circuit and reaches a destination register, and assume that when a clock edge arrives at the source, the destination gets the clock at the same time. Whatever data is at the destination input at that point gets captured; with respect to that same clock edge, the source output changes and propagates through the combinational circuit, but by the time it reaches the destination, that clock edge is already gone, so the data must be set up some time before the next clock edge to be captured. So in this case we always analyse the timing from one clock edge arriving at the source, through the data transfer and the combinational circuit, to setup before the next clock edge at the destination. This edge-to-edge timing is analysed to find the clock period: the minimum clock period should be greater than Tco plus Tcomb plus Tsetup.
We also know that when a clock edge comes, the input has to remain stable after that clock edge. When the clock arrives at both registers at the same time, the destination input changes Tco plus Tcomb after the edge, and that should be greater than the hold time, because this is with respect to the same edge: the same edge arrives at the destination, and there is a hold window after it. The data at the destination input changes after Tco plus Tcomb, so Tco,min plus Tcomb,min should be greater than the hold time; otherwise the hold time is violated with respect to the same clock edge. So basically, from the setup and hold times we derive two conditions: the setup time decides the clock period, and the hold time decides the hold-violation condition. The same is true in a sequential circuit or FSM, where the critical path is from some source register, through a combinational circuit which is the next-state logic, to the destination register. The situation is the same: from a source register through a combinational circuit to a destination register, the clock period should exceed Tco plus Tcomb plus Tsetup, and similarly, for no hold violation, Tco,min plus Tcomb,min should be greater than the hold time. And mind you, in all this analysis, the clock period is analysed from one clock edge to the next, while the hold time is analysed for the same clock edge: when the clock comes, the data changes after Tco plus Tcomb, and for the same clock edge there is a hold window; the change should happen after the hold window, which is what the inequality expresses.
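The two conditions just stated can be written compactly (subscripts added for clarity):

```latex
% Setup check: from one clock edge to the next (edge-to-edge)
T_{clk} \;\ge\; T_{co,\max} + T_{comb,\max} + T_{setup}

% Hold check: with respect to the same clock edge
T_{co,\min} + T_{comb,\min} \;\ge\; T_{hold}
```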
Now, one thing you should remember: we are making a very simplistic assumption in all these analyses of the minimum clock period and the hold violation; you can think about what that assumption is. The assumption, which is not very realistic, is that the clock reaches the source and destination flip-flops at the same time. In a block diagram it may not be obvious, because everything looks near, but you should remember that in a huge chip the source register may be in one corner and the destination register in another, and the clock has to travel through various paths to reach them, suffering delay all along. So there can be skew between the clocks: skew is the relative delay between the clock arrivals. If the clock arrives at one flip-flop at 100 nanoseconds and at the other at 102 nanoseconds due to wire delay, there is a 2-nanosecond skew between them. That can affect the timing, so we have to bring the skew into the analysis of both the clock period and the hold violation; that is important for a realistic analysis.
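The lecture's numeric example, written out:

```latex
% Skew = difference in clock arrival times at the two flip-flops
t_{skew} \;=\; \lvert t_{arr,2} - t_{arr,1} \rvert
        \;=\; 102\,\mathrm{ns} - 100\,\mathrm{ns}
        \;=\; 2\,\mathrm{ns}
```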
So that is what is stated here: in all the analysis we initially assumed that the clock reaches the flip-flops at the same time, but we have to consider the relative delays due to the wiring, which is called skew, the difference in arrival time of the clock at the flip-flops, and we will see what happens when there is skew between the flip-flops. Assume there is a chip with flip-flops scattered around, and assume, unrealistically, that the clock comes from a pin and goes from flip-flop to flip-flop like a chain; that is not a good strategy, but I am putting up this fictitious case to highlight the problems. You see there are two scenarios. Take a source flip-flop and a destination flip-flop where the clock travels from the source towards the destination, and the data also goes from source to destination; this is called the min path problem, which we are going to analyse. In the other situation, the clock reaches the destination register first and the source register later, while the data goes from the destination-side clock's point of view backwards: as far as the data is concerned, this is the source register and that is the destination register, and the data and the clock travel in opposite directions; that is called the max path problem. So where data and clock travel in opposite directions we have the max path problem, and where data and clock travel in the same direction we have the min path problem.
We will analyse both of these and see what happens. Let us first take the max path problem, where the data goes from the source register to the destination register, but the clock reaches the destination register earlier than the source register. This is a little bit of an opposite case: the source register drives the destination register with data, but as far as the clock is concerned, the destination register gets the clock first and the source register gets the clock later. Let us put up that picture for analysis. We have a source register whose output goes through a combinational circuit and reaches the destination register; the clock of the source is called clock 1 and the clock of the destination is called clock 2. You see that the data moves one way and the clock moves the opposite way, and the fallout is that clock 1 is a delayed version of clock 2. So you have clock 2 reaching first, and clock 1 as a delayed version of clock 2; that delay is called the skew, and it can come from wiring delay, buffer delay and so on in the clock path. So there is a skew between clock 1 and clock 2, and mind you, clock 1 is the delayed version, so clock 1 comes later. Now assume a clock edge arriving at clock 1 and clock 2. For the same edge we need not worry, because the data launched by that edge only arrives Tco plus Tcomb later, by which time the edge at the destination is already gone; the data comes later. So the analysis runs from this source clock edge to the next destination clock edge, and Tco plus Tcomb plus Tsetup must fit within that interval. But you see, clock 1 is here and clock 2 is there: clock 2 comes early by the amount of the skew, which is the delay in the clock wire.
Since clock 2 comes early relative to clock 1, the timing analysis runs from this clock-1 edge to that clock-2 edge. The time from one to the other is Tclk minus Tskew, and that has to accommodate Tco plus Tcomb plus Tsetup. So you get the expression: Tclk minus Tskew should be greater than Tco,max plus Tcomb,max plus Tsetup, and hence Tclk is greater than Tco,max plus Tcomb,max plus Tsetup plus Tskew. What happens is that the required clock period increases because of the skew, and the frequency of operation goes down. If there were no skew we would say Tclk greater than Tco plus Tcomb plus Tsetup; with skew, that skew gets added, the Tclk has to accommodate it, and the fallout is that the clock period goes up and the frequency of operation goes lower than without the skew. That is the disadvantage of skew in a max path: because of the skew the clock period goes up and the frequency of operation comes down; you are forced to choose a larger clock period, which reduces the frequency of operation of the data path, which is not a good idea. So one needs to analyse all the paths, find the maximum skew in a max path situation, and add that to the clock period to get the corrected clock period. And we know the hold condition will also change, because the skew must now be brought into the hold-violation analysis; that should be analysed too. Now let us take the other problem, where the data and the clock travel in the same direction. Here you see that clock 2 is a delayed version of clock 1: you have clock 1 here and clock 2 after a skew.
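The max-path derivation in one line:

```latex
% Max path: destination clock arrives T_{skew} earlier than the source
% clock, so the setup check loses that much margin
T_{clk} - T_{skew} \;\ge\; T_{co,\max} + T_{comb,\max} + T_{setup}
\quad\Longrightarrow\quad
T_{clk} \;\ge\; T_{co,\max} + T_{comb,\max} + T_{setup} + T_{skew}
```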
So clock 2 comes later, and now we are in a better position: if you analyse from one clock edge to the next, you have Tclk plus Tskew available, because we are going from this edge to that delayed edge. Tclk plus Tskew should be greater than Tco plus Tcomb plus Tsetup, which is good because, moving the skew to the other side, Tclk need only exceed Tco plus Tcomb plus Tsetup minus Tskew. So the clock period requirement comes down and the achievable frequency goes up. But this is not of great benefit, because there could be a max path elsewhere which limits the frequency; one needs to choose the critical condition, which is the max path, not the min path, as far as the clock frequency is concerned. The greater danger, when you look at the diagram, is not from one clock edge to the next; it comes from clock 2 being delayed. Normally we assume the clock arrives at both registers at the same time: the data at the destination changes Tco plus Tcomb after the edge, and we assume that is greater than the hold time, so by then the hold window is gone and nothing happens. But here the situation is that clock 2 is delayed, so Tco plus Tcomb can catch up with the hold window of clock 2. That is what is shown here: because of the delay, the hold window shifts later, and if Tco plus Tcomb is minimal, it can violate the hold time. So the hold-violation condition becomes: Tco,min plus Tcomb,min should be greater than Tskew,max plus Thold,max. That is the issue with clock skew: it can create a hold violation, and if it happens, there is no way to solve the problem by changing the clock period, because increasing the clock period does not help when the problem is with respect to the same clock edge.
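The min-path hold condition, with the skew included:

```latex
% Min path: destination clock arrives T_{skew} late, pushing its hold
% window later, so even the fastest data path must arrive after it
T_{co,\min} + T_{comb,\min} \;\ge\; T_{skew,\max} + T_{hold,\max}
```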
When you go from one edge to another, the clock period comes into the picture; but here we are talking about the same edge reaching one of the registers, the destination, later, which can cause the hold violation. Here Tco,min plus Tcomb,min should be greater than Tskew plus Thold; if it is not, a violation happens, and because Tcomb falls inside the hold window, changing the clock period will not help. What you can do is add additional delay in the data path, so that the data at the destination arrives after the hold window. So this is the issue with clock skew: it creates max paths and min paths; in a max path it brings down the clock frequency, and in a min path it can cause a hold violation. So essentially, when you route the clock in a VLSI chip, one has to make sure that the clock reaches every flip-flop without much relative delay between them; if there is a relative delay, all these issues crop up. It is advantageous to route the clock, which reaches all the flip-flops spread across the chip, such that the relative delays at the flip-flop clock inputs are minimal. That is the basic requirement of clock tree routing: minimum relative delay between any two flip-flops, at least between flip-flops connected by a data path. If you have two flip-flops, we should make sure there is minimum difference between the arrival times of the clock, at least between registers where there is a data path; if there is no data path, there is no cause for worry. So the trick is to balance the number of buffers: assume there is an input pin where the clock gets buffered and then reaches the flip-flops.
We have to ensure that the number of buffers from the input pin to the clock edge of each flip-flop matches across flip-flops, and that the wire segments and the buffers in each segment are identical. One solution, and it need not be the only one, is called the H clock tree, where the clock is routed like the letter H. In a VLSI chip, when you do the clock routing, you have to make the clock tree an H-shaped tree, but in an FPGA it is built into the fabric. Suppose there is a clock pin, buffered at the beginning; the clock comes vertically down, then goes horizontally like an H; whenever you branch, you put buffers; then you go along the horizontal line, then vertically, again putting buffers at every branch; and finally, from the vertical line, each flip-flop branches out with buffering. Now the advantage: suppose there is a flip-flop here and a flip-flop there. From the input pin you count the number of buffers, say 1, 2, 3 and 4 to reach one, and 1, 2, 3 and 4 to reach the other, so both flip-flops see the same number of buffers from the input pin. Similarly, the wire segment lengths, 1, 2 and 3 here versus 1, 2 and 3 there, are also roughly identical. So essentially we can assume that the skew between the end points is small, which is what is required. There could be a delay between the input pin and the end points, but we are not worried about that; what we are worried about is the relative delay between the flip-flops, and that should be minimal. That is where the clock tree comes in: in an FPGA it is built in, and there are dedicated pins driving these clock trees.
So you should connect the clock oscillator, or the clock input, to those particular pins. This is about the Virtex clock tree, but in recent FPGAs there can be clock trees handling only one part of the FPGA, because the fabric is quite big and may not be clockable by a single clock, or in such complex FPGAs there can be multiple clock domains. So sometimes you need a global user clock, and sometimes a clock that is limited to a region of slices or CLBs of the FPGA; that should be kept in mind. Whenever you want a clock, use the dedicated clock pins and the dedicated clock tree, and so on. We should also look at the delay-locked loop (DLL) available in an FPGA, which is used to remove the skew between the input clock and the clock seen at the flip-flops. Assume there is an input pin and a clock tree which we are using to clock a lot of flip-flops. Because of the buffer delays and the loading of all those flip-flops, there can be a lot of delay from the input to the clock tree outputs, and that can be compensated by the delay-locked loop: the input pin is connected to the DLL input, the DLL output drives the clock tree, and the clock tree output is fed back to the DLL's feedback input. What does the DLL do? You have a clock input, and assume the clock output is skewed by some amount. The DLL cannot move anything back in time, but what it can do is delay the input further by the clock period minus the skew, so that the output edge gets aligned, synchronized, with the next input edge; the relative skew is removed. So the DLL is used for deskewing an input clock with respect to an output clock: you can remove the skew between them using the DLL, which adds delay to cancel the skew.
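The DLL's trick, written out: it cannot remove delay, so it adds the complement of the skew, aligning the tree's output edge with the following input edge.

```latex
% DLL deskew: extra delay inserted at the input
T_{delay} \;=\; T_{clk} - T_{skew}
\qquad\Longrightarrow\qquad
T_{skew} + T_{delay} \;=\; T_{clk}
\;\; \text{(output edge lands exactly one period later, i.e. zero apparent skew)}
```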
You do not strictly need a PLL for this, because a DLL is much simpler to implement than a PLL. You must have learned the block diagram of the PLL: in a DLL the input clock is simply delayed, but in a PLL you continuously measure the phase difference between the input and the output, filter it, and give it to a voltage-controlled oscillator to synthesize a clock which is synchronized to the input. So in a PLL the VCO, the voltage-controlled oscillator, synthesizes a new clock of the same phase and frequency as the input. One problem with the PLL is that it will lock only onto a range of input frequencies, depending on the filter used inside. It also takes some time to achieve lock, that is, to get the output in phase; the DLL is much quicker at de-skewing. But one advantage of the PLL is that it freshly generates the output, so even if the input has some jitter, that jitter will not appear at the output. So for serious frequency synthesis we should consider a PLL rather than a DLL, and current FPGAs have PLL blocks in addition to the DLL in the digital clock manager. So current FPGAs have PLLs and a digital clock manager (DCM), and the DCM contains a DLL for de-skewing. It has a phase shifter, so the input clock can be phase-shifted by 90 degrees, 180 degrees and so on, and frequency multiplication and division are available within the DCM. There are also clock buffers and clock muxes which are glitchless: a 2-to-1 mux where you can switch between two clocks without creating any glitch — with an ordinary mux, depending on when you switch, there could be additional edges at the output. Glitchless switching essentially means the switched output completes the current clock phase cleanly before rising with the newly selected clock; internally there are two flip-flops synchronizing the select, cross-coupled so that one clock is fully disabled before the other is enabled.
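You can imagine the circuit of that glitchless mux; a rough behavioural sketch in VHDL is below. This is an illustration of the cross-coupled two-flip-flop scheme, not the actual Xilinx BUFGMUX implementation; the entity and signal names are mine. A real design would also double-register `sel` against metastability, which is omitted here for brevity.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity glitchless_mux is
  port (
    clk0, clk1 : in  std_logic;  -- the two candidate clocks
    sel        : in  std_logic;  -- '0' selects clk0, '1' selects clk1
    clk_out    : out std_logic
  );
end entity;

architecture rtl of glitchless_mux is
  signal en0, en1 : std_logic := '0';  -- per-clock enables, cross-coupled
begin
  -- Each enable is registered on the FALLING edge of its own clock, so a
  -- switch takes effect only during the low phase (no runt high pulses),
  -- and is gated by the other enable being off: the old clock is fully
  -- disabled before the new one is enabled.
  process (clk0) begin
    if falling_edge(clk0) then
      en0 <= (not sel) and (not en1);
    end if;
  end process;

  process (clk1) begin
    if falling_edge(clk1) then
      en1 <= sel and (not en0);
    end if;
  end process;

  -- Only the enabled clock reaches the output.
  clk_out <= (clk0 and en0) or (clk1 and en1);
end architecture;
```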
So the muxes are available, and all these devices — the PLL, the DLL, the clock buffers and muxes — can be inserted in the clock tree path, that is, between the clock pin and the clock tree, wherever you require them. Those are the basic resources for clock management: the PLL, and the DCM, the digital clock manager, which contains the DLL and does de-skewing, phase shifting, and frequency multiplication and division, plus the clock buffers and muxes; all of this works with the dedicated clock pins, and you can insert it between the clock pin and the clock tree. So much for the special resources: the buffers, DLL, PLL, DCM, block RAM and DSP blocks can all be instantiated as vendor library components, and in the case of Xilinx you have the CORE Generator tool which can generate the template code for instantiating them. Sometimes you simply write code and the synthesis tool infers that it is a memory and puts it in block RAM, or that a computation can go into a DSP block; all that is decided by the synthesis tool. And sometimes, instead of using the vendor library component, you can write plain VHDL code for a memory and use attributes in the VHDL code to say: use block RAM for this memory. Writing such code gives you portable designs. If you use vendor library components and you translate the design to an FPGA from another vendor, you have to change all those components; the same applies if you take the FPGA-based design to an ASIC — which is often the case, because ASIC designers do the prototyping in an FPGA and then move to the ASIC — where every vendor component has to be changed in the ASIC design domain. That can be avoided by using plain VHDL with attributes.
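As an example of the attribute approach, here is a sketch of a small memory in plain VHDL with a synthesis attribute asking the tool to map it to block RAM. The attribute name `ram_style` with the value `"block"` is what Xilinx synthesis tools (XST and later) accept; other tools may use a different name, so treat this as an illustration rather than a universal recipe. The entity and port names are mine.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram_block is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  unsigned(9 downto 0);            -- 1K deep
    din  : in  std_logic_vector(7 downto 0);    -- 8 bits wide
    dout : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of ram_block is
  type ram_t is array (0 to 1023) of std_logic_vector(7 downto 0);
  signal ram : ram_t;

  -- Xilinx-style synthesis attribute: ask the tool to use block RAM
  -- rather than distributed (LUT) RAM for this signal.
  attribute ram_style : string;
  attribute ram_style of ram : signal is "block";
begin
  process (clk) begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(addr)) <= din;
      end if;
      -- Synchronous (registered) read: block RAM has a registered
      -- output, so a clocked read is what lets the tool infer it.
      dout <= ram(to_integer(addr));
    end if;
  end process;
end architecture;
```

The same code moved to another vendor's FPGA, or to an ASIC flow, still synthesizes as a memory; at worst the attribute is ignored or renamed, whereas an instantiated vendor primitive would have to be replaced.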
Next we are going to have a look at the configuration of the Virtex FPGA. First a quick recap: previously we looked at metastability, and at clock-period analysis and hold-time-violation analysis for a data path in a sequential circuit, where we first analysed as if the clock reaches every flip-flop at the same time — which is not the case in real life. So we put in the real-life case: we took the max path and the min path, introduced skew between the clocks, analysed it, and saw that skew is troublesome. If there is a problem on the max path, you have to reduce the clock frequency; on the min path, a hold-time violation means you have to increase the combinational delay. We looked at the H clock tree routing, which gives minimal relative delay between the flip-flop clocks; at the DLL, which is used for de-skewing; at the difference between a PLL and a DLL; and at what current FPGAs have in terms of clock-management resources and how those resources are instantiated in VHDL. Now let us look at the configuration of the FPGA. I am limiting myself to the Virtex, and I will briefly tell you about current FPGAs, which have more options than the Virtex; to be current, I will talk a little about the Spartan-6 with respect to configuration. Basically, when you are prototyping you can use a port called JTAG. The name comes from the committee which came out with this particular port, the Joint Test Action Group. It is also called the TAP, the test access port, and also called boundary scan, because the port was originally developed for continuity checks on PCBs with very complex packages like BGAs.
You know that on a PCB, with the earlier DIP or PLCC packages, if you wanted to check continuity you could probe the source pin and the destination pin — two probes, one on the source and one on the destination. But in a package like a BGA or a flip-chip, the pads are underneath the chip, and because of the multilayer PCB an output pad may go deep within the layers and travel through the inner layers of the PCB to the destination pad of another IC. On a bare PCB there is no way to probe this, and in practice, once the PCB is manufactured the chips get mounted and only then does the board come to the testing site; it is not that the bare PCB comes back for a continuity check and goes again for component mounting — that would cause delay, because unnecessary transport is involved. So the PCB gets fabricated, the chips get mounted on it, and boundary scan is the test port that came about for continuity checking through the boundary of the chip — electrically, not by probing. You can call it a kind of built-in probing inside the boundary of the chip through the test access port, and the same port is used for FPGA programming. Basically this is a serial port with a minimum number of lines so as not to waste pins: it has a clock pin (TCK), a data input (TDI), a data output (TDO), and a mode select (TMS) which controls a small FSM inside. When you are prototyping you can program an SRAM-based FPGA from a PC through a small dongle. A dongle is a small circuit, something like a USB pen drive; normally you have a USB dongle whose output goes to the board, and you can program and reprogram.
This is very useful at the time of prototyping, because you are working through many iterations: you do not want to program the design permanently into the FPGA, so you just program it, test it, rework the design, reprogram, and so on — that is where JTAG is used. It is also possible to program multiple FPGAs through JTAG, because the devices can be cascaded: the data output of the first FPGA can be connected to the data input of the second, and so on. So JTAG can program a single device or multiple cascaded devices. Then there is the master serial mode. JTAG is a standard for boundary scan, but master serial is a proprietary port from Xilinx — a very simple port with just a clock and a data input. In the master case, the clock is given by the FPGA to a serial PROM: the FPGA literally clocks the data out of the serial PROM, and the FPGA gets programmed. This is very useful for deploying an FPGA-based embedded board in the field, because in the field you cannot insist on programming the FPGA from a PC. The configuration is stored in a PROM connected in serial mode to the FPGA; at power-on the FPGA gives the clock and programs itself. As for slave serial — master versus slave depends on who gives the clock. If the FPGA gives the clock it is called the master, but in slave serial the FPGA gets the clock from outside; it is not the PROM that gives the clock. You can think of a serial chain where the FPGAs are cascaded: one FPGA acts as the master, clocking the PROM and also clocking the slave FPGAs — that is where slave serial is used. All of these are serial modes, which are slow. There is also a parallel mode, where, say, a CPU — a microprocessor — can program the FPGA through an 8-bit or 16-bit wide path which is synchronous.
This is the case when you have a processor on the board along with the FPGA. You do not need an additional PROM: the processor will have some flash memory to store its embedded software or firmware, and in the same memory you can also store the configuration for the FPGA; then through the 8-bit or 16-bit wide path the CPU can program the FPGA. In the Virtex FPGA only the master kind of mode is available for this particular parallel path, SelectMAP, but in current FPGAs there is also a slave mode. I will show at least one or two examples — at least one SelectMAP example, and these two together; it is pretty simple — and in the next lecture we will look at what current FPGAs support, in a nutshell, with a little more detail about configuration. So that is about configuration: one mode for prototyping, and modes two, three and four for deploying in the field. SelectMAP is parallel programming — parallel configuration — which is much faster than serial configuration. With that I would like to wind up; we are coming to the end of this lecture set, the lecture module on the FPGA. What is left is a little more about configuration, a bit about debugging, and a brief look at other FPGAs; then we will wind up the FPGA module. So please go through the lectures on the FPGA and try to understand and grasp them, so that when we wind up and put it all together, you are very clear about what is happening when I show the tool, and so on. I wish you all the best, and thank you.