Welcome to this lecture on field programmable gate arrays in the course Digital System Design with PLDs and FPGAs. In the last lecture we looked at the interconnection technologies, the design methodology, the tool flow and the commercial tools available, and we just started with the Virtex CLB architecture. So we will have a quick look at those slides and move on to today's portion. Last time we said there are three programmable interconnection technologies. SRAM uses a pass transistor as a switch, with the gate status stored in a flip-flop which is arranged as SRAM, hence the name. The flip-flop consists of cross-coupled inverters, which take 4 transistors; with the write transistor and the real switch it comes to 6 transistors, which is quite big. That is why the logic blocks are made big. A flash transistor is a normal transistor with a floating gate, which can be turned off by trapping electrons. It can be erased electrically with polarities opposite to those used for writing. In antifuse, something is deposited between the wires in adjacent layers. Normally it does not conduct, and when you apply a voltage it conducts. It is not an active switch, it is a passive connection which is one-time programmable. It takes very little area because it sits where the wires are: you need to interconnect layers of wires anyway, and the deposit is made between the layers. Though a transistor is required to isolate the programming voltage applied to the fuse, the result is a very small delay in a small area. The summary we have seen is this: SRAM is volatile, so it needs to be programmed each time it is powered on, and it has large delay and large area. Antifuse has very small area and small delay, and it is non-volatile but one-time programmable. Flash comes in between: it is non-volatile and reprogrammable, with large delay and medium area.
So most of the FPGA vendors use SRAM, because there is no limit on the reprogrammability, and because of the fabrication process and so on. In the SRAM-based technologies you make the logic block big because the connections are costly. If you make the logic block small, then the switches and the associated circuitry will be quite huge in comparison with the logic. The antifuse vendors make the logic block small so that less is wasted. In the SRAM case, as I said, they make sure that nothing is wasted: there is a lot of configurability, so that a block can be used partially, split into 2 parts, or you can use the combinational part and the flip-flop separately or put them together. All kinds of programmability are given within the logic block. This slide shows a coarse-grain architecture with four lookup tables in two slices, and this one is a fine-grain logic block from Actel, which is essentially a 4-to-1 multiplexer. We had also looked at the design methodology. We start with the HDL source, do functional simulation, then synthesis, which generates the netlist, and then logic simulation. Then you go for place and route, where you can specify the IO constraints and timing constraints. You can do a static timing analysis, which is quicker than timing simulation, and ultimately you can do timing simulation. If everything is fine you can program the device. There is iteration at every stage: you iterate at functional simulation, then you come down and iterate at logic simulation, then through static timing analysis, then through timing simulation, and ultimately you program it. We have seen the commercial simulators, the commercial synthesis tools, and the vendor tools, which have everything in them; most of the vendor tools are quite good. And these others are the commercial, very complex EDA tool suites. Each consists of a lot of packages, and you can do synthesis and simulation with them, but for place and route you have to go to the vendor tool.
So let us come to today's portion. We will look at this particular FPGA, the Xilinx Virtex FPGA. It is not commercially manufactured now, it is probably only supported, but I chose it as a basis on which you can build your understanding of other architectures. If you understand this well, you can take the latest parts, say Virtex-7, and understand them, along with other FPGAs from Xilinx itself and from other vendors. So this is a summary. Basically it is an SRAM-based programmable connection and configuration. The lookup table is used as combinational logic. You have flip-flops within the logic block with synchronous and asynchronous set or reset. The configurable logic block is large, it is quite big. Then you have dedicated memory called block RAM, which can be configured as single port, dual port, FIFO and all that. The lookup table can also be used as memory, we will see that. There is a clock distribution network, called clock trees, which is low skew; once again we will see why it needs to be low skew. There are DLLs; current FPGAs use PLLs. This is a dedicated component: you have to instantiate it, or take it from the core generator, and then you can use it. Then you have tri-state gates for forming buses, which may not be available in current FPGAs. You have carry chains; we will see what the carry chain or carry logic is, and the cascade logic. Then there are various configuration schemes: serial, parallel, JTAG and all that. And there are IO blocks which support multiple IO standards. So this is a top-level view of the Virtex FPGA. I will not go into the specifics of this. The first thing to note is that all around are the IO blocks, and you will have the PLLs (in this device it is a DLL, but in newer ones there will be PLLs). Inside are the block RAMs and the CLBs: the block RAM is a dedicated memory, the CLB is a configurable logic block, and these are sandwiched one after the other.
In the recent FPGAs you will even find DSP blocks, which basically provide fixed-point arithmetic: you can have a fixed-point addition or multiplication and things like that. Those blocks are built into the FPGA. Some recent FPGAs might have an ARM processor core built in, some might have memory controllers built in, and so on. So that, in a nutshell, is the top view of the Virtex FPGA. There are many things to look at in an FPGA: the programmable interconnection, the logic blocks and all that. But if you take the memories, they come built in, and you have a tool to instantiate them in various widths and depths. So there is nothing much to worry about there: you have it, you use it, though there are minor quirks which I will mention as we go along. The one thing we will go into a little deeply is the configurable logic block, because that is where the main functionality gets built. We will look at it in a little more detail because there you can do a lot of optimisation, you can understand, you can debug, you can find out why certain things are happening, or why a design occupies so much more area or so much less area. When you have done a design and you look at the resource utilization, you may wonder why it has taken so many lookup tables. You will be able to figure that out at the end of the lecture. So we will look deeply into the CLB and touch upon the other standard things, the basics. So let us come to the Virtex configurable logic block. This is the top-level schematic from the manufacturer, taken from the Xilinx data sheet. The CLB has two identical parts, which Xilinx calls slices. So this is a slice and this is another slice, and within one slice you have two identical blocks.
So, one CLB is two slices, and one slice has two parts which are identical. Now if you look at one part, there is a four-input lookup table, then something called carry and control (we will see what it is), followed by a flip-flop. The lookup table is used for combinational circuit implementation and the flip-flop is for registering. So if you have a data path with a register, then a combinational circuit, then a register, it is implemented like that. Or you have next-state logic with the inputs and the present state, fed back through the wires, and the next state is loaded into the present state. I hope you get my point. So that is what it offers. We have seen what a lookup table is when we discussed the simple PLDs. The evolution of the simple PLDs started with the PROM, and at that point we looked at how a memory can be used for Boolean function implementation. I hope you have seen that lecture; if you have not, I will briefly explain, so that you do not have to go back in search of that particular module. A lookup table is nothing but a memory. Suppose we have this particular Boolean function, x XOR y, to be implemented. You take a memory with two address lines, that means a four-location memory, because there are two variables and we have to cover all the minterms: x-bar y-bar, x-bar y, x y-bar and x y. So you take a two-address-line, or 4 x 1, memory and you write the truth table of this function into the memory. The truth table is that 0, 0 gives 0; 0, 1 and 1, 0 give 1; and 1, 1 gives 0. You write each value in its particular location. That writing is what you call configuring the lookup table, or programming the lookup table.
So, you write to the memory in the write mode, and that happens during the configuration. Now, as a user, you want to use this as x XOR y. You put the memory in the read mode, and the data line is taken as the output, say z equal to x XOR y. Now if you give 0, 0, it goes to the 0th location and you get 0. With 0, 1 it goes to the 1st location and you get 1. With 1, 0 it goes to the 2nd location and the data line gives 1, and with 1, 1 it goes to the 3rd location and gives 0. So you get the picture. If you have an n-variable function to implement, you take a 2 raised to n by 1 bit memory, program the truth table of the whole function into the memory locations, put the memory in read mode, connect the variables to the address lines (a1 and a0, in the same order in which you programmed), and use the data line as the output. It is as simple as that. One advantage is that you can program any function of two variables. It is so flexible that whether you want to implement AND, OR, XOR or NOR, everything is possible. That is the flexibility of the lookup table. When we discussed the PROM, we said it was an overkill, because there the PROM contained a lot of locations, it was a very wide implementation, and it wasted a lot of locations. For that particular application, which was basically address decoding, only one product term was needed and we were implementing quite a number of them. So that was a waste. But here there are only a few address lines and all the functions can be implemented.
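
The write-then-read procedure described above can be sketched in a few lines of Python. This is only an illustrative software model, not vendor code; the names `make_lut` and `read_lut` are made up for this sketch, and the address is formed with the first input as the most significant bit.

```python
def make_lut(func, n):
    """'Configure' an n-input LUT: store func's truth table in a 2**n x 1 memory."""
    table = []
    for addr in range(2 ** n):
        # Split the address into its bits, most significant bit first.
        bits = [(addr >> i) & 1 for i in reversed(range(n))]
        table.append(func(*bits) & 1)
    return table


def read_lut(table, *inputs):
    """'Use' the LUT: the inputs drive the address lines, the data line is the output."""
    addr = 0
    for bit in inputs:
        addr = (addr << 1) | (bit & 1)
    return table[addr]


# The same 4 x 1 memory implements XOR, AND or NOR; only the stored bits differ.
xor_lut = make_lut(lambda x, y: x ^ y, 2)
and_lut = make_lut(lambda x, y: x & y, 2)
nor_lut = make_lut(lambda x, y: 1 - (x | y), 2)

print(xor_lut)                  # [0, 1, 1, 0]
print(read_lut(xor_lut, 1, 0))  # 1
```

Note that nothing is optimised here: whatever the function, the cost is the same four memory bits.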
So, the thing to remember is that writing to the memory happens while configuring, and that write is controlled by the configuration circuitry of the FPGA. As a user, you use the memory in the read mode, with the address lines connected to the input variables and the data line used as the output. As I said in the PROM case, it is a fixed AND array which generates all the minterms, followed by a programmable OR: it is an AND-OR structure, and what we are programming is the OR of the minterms. Before going further, I want to tell you that when we say we are programming an FPGA or configuring an FPGA, it essentially means writing to the configuration memory. All the interconnection patterns are configured this way, and so are the logic blocks. Writing to the configuration memory configures everything possible: all the interconnection, all the logic, all the special resources. Within a logic block there are a lot of things to configure. One is writing the lookup table with the truth table, when you are using the lookup table for Boolean logic implementation. Sometimes you need a lookup table with more inputs; then you can combine lookup tables, and that needs some additional circuitry to be configured, which is also done through the configuration. And then, the lookup table is nothing but a memory.
So, we are using the memory for combinational logic implementation, but you could use it as a memory also. If it is not being used to implement logic, you could use it as a memory; we will discuss that. For the flip-flop, you can select among a number of clocks, maybe more than in the CPLDs, and you can configure the set and reset of the flip-flops. There are a lot of multiplexers within the logic block to give various kinds of programmability and make it flexible, and all these MUX paths are programmed by writing to the configuration memory. In addition you can configure the memory, FIFO, PLL and all that. There could be memory blocks which are combined to make a larger width or larger depth; all that combining is done through extra circuitry which is configured by writing to the configuration memory. The same goes for programming the interconnection and the IO blocks, where there is programming available for using a pin as an output, an input, a bidirectional IO and so on. So when we say FPGA configuration or programming, it is not merely the interconnection of the blocks. There is a lot within the blocks, a lot to do with the special resources, and quite a bit in the IO blocks, and we will see all that one by one. When you start looking at the data sheet, you will be looking at a family of devices which goes from low complexity to high complexity. A very important specification as far as an FPGA is concerned is this one, the array of logic blocks: how many logic blocks are there. If you take this particular device, the XCV100E, there is an array of 20 x 30, that is 600 logic blocks. The largest device has an array of 104 x 156 logic blocks, which is only on the order of 16,000 if you multiply. But if you look at the current FPGAs, they may have 2 million logic blocks and things like that. And we are going to see what is inside the logic block.
So, one important spec you need to look at is the number of logic blocks. You can disregard this logic cell figure. You see, 20 x 30 is 600, and we have seen that each CLB has 4 identical logic cells, and 600 times 4 is 2400. But they have shown a little bit of scaling up to account for the extra overhead circuitry. There is some overhead circuitry used to combine the logic resources, and at times that itself can be used for some logic implementation. So they include that also, and instead of multiplying by 4 they multiply by 4.5. But for understanding the architecture, the important spec is the number of CLBs. Then definitely the number of IO pins. There can be single-ended IO and differential IO: if you want more noise immunity, if you want to reject the common-mode noise, you can use differential pairs. Another important thing is the block RAM bits, how much dedicated memory is available. Here around 81 kilobits are available, and when it comes to the largest device nearly 1 megabit is available. The distributed RAM bits figure is again something to do with the CLB array; it can be calculated from the CLB array spec, so it is not a big spec. So I would say that the important things to look at are the number of CLBs, the number of pins, and the amount of block RAM. When you encounter the latest FPGAs, you definitely have to look at the number of PLLs available, maybe the number of DSP blocks available, and all that. So you can extend this list. But there are figures which are not useful, namely the system gates and logic gates. These are equivalent gate counts, and it is not very clear how they are calculated. Maybe you can use them to decide between devices within a family.
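
As a quick check of the arithmetic above, the derived figures can be computed from the CLB array alone. The sketch below is a hedged illustration: the function name and dictionary keys are made up, the 4.5 factor is the one quoted in the lecture, and the distributed RAM figure assumes every 4-input lookup table contributes its 16 bits.

```python
def clb_figures(rows, cols, luts_per_clb=4, lut_bits=16, cell_factor=4.5):
    """Derive headline figures from a CLB array size (illustrative assumptions only)."""
    clbs = rows * cols
    return {
        "clbs": clbs,
        # Four LUT + flip-flop pairs per CLB (two slices, two pairs each).
        "lut_ff_pairs": clbs * luts_per_clb,
        # Marketing-style "logic cells": scaled by 4.5 instead of 4.
        "logic_cells": int(clbs * cell_factor),
        # Assumes each 4-input LUT can serve as a 16 x 1 memory.
        "distributed_ram_bits": clbs * luts_per_clb * lut_bits,
    }


small = clb_figures(20, 30)     # the 20 x 30 array discussed above
large = clb_figures(104, 156)   # the largest array mentioned
print(small)
print(large["clbs"])            # 16224
```

For the 20 x 30 array this gives 600 CLBs, 2400 LUT and flip-flop pairs, and 2700 logic cells, which is the scaling the lecture describes.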
But there is no point in comparing the system gates of one family with another, or a family from one vendor with a chip from another vendor; that does not make much sense. One should not choose FPGAs on the basis of big quoted numbers like 1 million, or 100,000, or 200,000 system gates or logic gates, though they are an indication of the complexity. If you understand the device well, you can make your decision judiciously and objectively, knowing the architecture and the essential specs. So basically I would say, at least in this case, the CLB array, the block RAM bits, the user IO and the differential IO are the important specs. In the current FPGAs you can add the DSP blocks, the number of PLLs, and any other dedicated resource which is available. So let us look at the CLB itself. As I said, one CLB has two identical slices and one slice has two identical parts, so we will concentrate essentially on one part. It has a four-input lookup table. That means you can connect any four variables, say A, B, C, D, and implement any function of A, B, C, D, and there is no optimization involved. When the number of variables is four or less there is absolutely no optimization: you just write the truth table into the lookup table and it works. If you have more than four variables, there may be different ways of handling it; I will give you some possibilities. You can see that the output goes to the flip-flop, and the flip-flop has D, Q, the clock, a set (or preset) and a reset. There is also something called clock enable; we will see what it is. One thing to note is that it is not necessary that the lookup table output should go directly to the flip-flop. Sometimes we want to use it as combinational logic which is combined with other combinational logic to implement complex functionality, or you would like to take it out directly to a pin, whatever it is.
So, that is possible: you can see that this path goes directly out. In that case the flip-flop could get wasted. This picture is just a top-level schematic, and we will go into detail; I just want to illustrate the idea, so the connections shown are not exact. You see this input, which can in principle go to the input of the flip-flop. So you could use the lookup table separately and the flip-flop separately, or you can use them together. That is one level of programmability. What we are going to do now is look at a blow-up of one slice so that we can understand it a little more deeply. Now, that is one slice. At the beginning it looks a little intimidating, but do not worry. These are the identical parts: this is one part and this is the second part. You see there is something shown here which is common to both lookup tables; we will see what it is. There is something in between here, and again we will see what it is. And there are two flip-flops. The first thing I said is that you can implement any function of four variables A, B, C, D in the lookup table, and the output can go to a flip-flop to implement a data path or a sequential circuit or whatever. But suppose you want to use them separately: you give A, B, C, D, take out Z, which is some function of A, B, C, D, and from the interconnection wires you give some other input directly to the flip-flop and take its output. That is possible. So they can be used together or they can be used separately. So let us go back to the main diagram. You see this lookup table; there are four inputs and one output. This one output comes here, and you can see it can go through this 2-to-1 MUX to the flip-flop.
So, the lookup table is combined with this flip-flop, or if you do not want that, you can take its output directly outside. When I say "you can", of course you are going to write some HDL code which goes through a synthesis tool; it will be synthesized, and ultimately it will be placed and routed within this particular circuit block. So the tool does it, but this is how things happen. Now suppose you use this lookup table independently of the flip-flop; then the flip-flop can be used separately. You can see this signal BY, which can be taken directly to the input of the flip-flop, so the flip-flop can be used on its own. So you can combine the lookup table with the flip-flop, or you can use it separately, in which case the flip-flop can also be used separately. It is the same thing in the other half: the lookup table output can go out independently or to the flip-flop, and BX is used as the input to the flip-flop when it needs to be used separately. That is what it is about. Now, mind you, you should not lose the global picture. This is only one slice, and there are two slices in a logic block. You should know that there are a lot of vertical wires and horizontal wires, and there are switches. There are input connections from these wires coming into all these parts; you can connect the wires to these inputs, that is the meaning of it. Similarly, the outputs can be taken to the wires again and on to other logic blocks. That picture you should have in your mind. Now the question to ask is: suppose we need a 5-input lookup table. Say we have a 5-variable Boolean function. Can we implement a 5-input lookup table using 4-input lookup tables? The 5-input lookup table needs 32 locations, because 2 raised to 5 is 32. The 4-input lookup table has 2 raised to 4, that is 16 locations.
So, it looks possible, because you have 16 locations here and 16 locations there. How do we combine them? You can connect only A, B, C, D to each lookup table, and you have a 5th variable. So one idea is this: program the part of the truth table for which the 5th variable is 0 into the first lookup table and choose its output when the 5th variable is 0; program the part for which the 5th variable is 1 into the second lookup table and choose that output when the 5th variable is 1. That suggests a 2-to-1 MUX in between, with the 5th variable used as the select line of the MUX. That is how the 5-input lookup table is built. You have a 4-input lookup table with A, B, C, D, and another 4-input lookup table with the same A, B, C, D, and the 2 outputs are multiplexed through a MUX, which in the case of the Virtex FPGA is called the F5 MUX. The select line of that forms the 5th variable: I0, I1, I2, I3 are the 4 variables and this is the 5th. When it is 0, that part of the truth table is programmed here; when it is 1, that part is programmed there. A, B, C, D are common, E is connected to the select line, and this is the output of the 5-input lookup table. Now we go back to the diagram of the slice. This is the output of the top lookup table, and this is the output of the bottom lookup table. You see that there is a 2-to-1 MUX which is called F5; the F5 output goes there, and the select line is here. The select line is BX, and this is the 5th variable. So if you have A, B, C, D, E, you connect A, B, C, D here and A, B, C, D here, and you connect E to BX, and you get a 5-input lookup table whose output can be registered using this flip-flop or taken directly out. So that is how the 5-input lookup table is formed. You can ask the next question: can we form a 6-input lookup table? Remember that the 5-input lookup table already occupies one whole slice.
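
The F5 construction described above can be modelled in Python as a sketch: split the 32-entry truth table on the 5th variable, store each half in its own 16-entry table, and let E select between the two outputs. The helper names (`lut4`, `read4`, `make_lut5`) are invented for illustration, and the 5-input parity function is just an arbitrary test case.

```python
from itertools import product


def lut4(func):
    """16-entry truth table of a 4-input function; address is a b c d with a as MSB."""
    return [func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) & 1
            for i in range(16)]


def read4(table, a, b, c, d):
    """Read a 4-input LUT: the inputs form the address, the stored bit is the output."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


def make_lut5(func5):
    """Build a 5-input LUT from two 4-input LUTs plus a 2-to-1 MUX selected by e."""
    lut_e0 = lut4(lambda a, b, c, d: func5(a, b, c, d, 0))  # half used when e = 0
    lut_e1 = lut4(lambda a, b, c, d: func5(a, b, c, d, 1))  # half used when e = 1

    def read(a, b, c, d, e):
        out0 = read4(lut_e0, a, b, c, d)
        out1 = read4(lut_e1, a, b, c, d)
        return out1 if e else out0   # models the F5 MUX, select line = e

    return read


def parity5(a, b, c, d, e):
    return a ^ b ^ c ^ d ^ e


lut5_read = make_lut5(parity5)
mismatches = [v for v in product((0, 1), repeat=5)
              if lut5_read(*v) != parity5(*v)]
print(len(mismatches))  # 0
```

Both 4-input tables see the same A, B, C, D; only the stored bits differ, which is why the pair consumes both lookup tables of a slice.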
So, within a slice you have 2 lookup tables. One lookup table is 4-input, but you can combine two 4-input lookup tables to make a 5-input lookup table, and that occupies a slice. So the next question is: suppose we have another slice which has another 5-input lookup table. Can we combine the two 5-input lookup tables to make a 6-input lookup table? The answer is yes, we have yet another MUX. This is a 5-input lookup table and this is a 5-input lookup table. Now you use a 2-to-1 MUX between the outputs of the two F5 MUXes, which we call the F6 MUX, and the select line of that is the 6th variable. A, B, C, D are the same for all four lookup tables. Within each slice, the part of the truth table for E equal to 0 is programmed in one lookup table and the part for E equal to 1 in the other. When F is 0 the first slice is selected, so the F equal to 0 half of the truth table is programmed there, and when F is 1 the second slice is selected, so the F equal to 1 half is programmed there. So let us go back to the slice diagram from Xilinx. Assume that we have combined these two lookup tables into a 5-input lookup table using the F5 MUX, and assume that there is another slice before it; that could be in the same logic block or a different logic block, it does not matter. Now assume that the F5 output of that slice is coming in here. In the case of a single CLB, the F5 output of one slice is connected to the F5 input of the next slice. You see that this F5 output and the F5 output of the previous slice are combined in the F6 MUX. BY can be used as the select line of that MUX, so BY carries the 6th variable, and the F6 MUX output can be taken out directly or can be registered in this flip-flop.
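
Extending the same idea one level, a 6-input lookup table can be sketched as four 16-entry tables, two F5 MUXes selected by E and one F6 MUX selected by F. As before this is a hedged software model with invented names (`make_lut6`, `slice0`, `slice1`), and the test function is an arbitrary 6-input example.

```python
from itertools import product


def lut4(func):
    """16-entry truth table of a 4-input function; address is a b c d with a as MSB."""
    return [func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) & 1
            for i in range(16)]


def make_lut6(func6):
    """Four 4-input LUTs, one per (f, e) combination, combined by F5 and F6 MUXes."""
    luts = {
        (f, e): lut4(lambda a, b, c, d, e=e, f=f: func6(a, b, c, d, e, f))
        for f in (0, 1) for e in (0, 1)
    }

    def read(a, b, c, d, e, f):
        addr = (a << 3) | (b << 2) | (c << 1) | d   # same address to all four LUTs
        slice0 = luts[(0, e)][addr]   # F5 MUX of the slice holding the f = 0 half
        slice1 = luts[(1, e)][addr]   # F5 MUX of the slice holding the f = 1 half
        return slice1 if f else slice0              # F6 MUX, select line = f

    return read, luts


def at_least_four(a, b, c, d, e, f):
    """Arbitrary example: 1 when four or more of the six inputs are 1."""
    return 1 if (a + b + c + d + e + f) >= 4 else 0


lut6_read, lut6_tables = make_lut6(at_least_four)
mismatches6 = [v for v in product((0, 1), repeat=6)
               if lut6_read(*v) != at_least_four(*v)]
print(len(lut6_tables), len(mismatches6))  # 4 0
```

The four tables hold 4 x 16 = 64 bits in total, exactly the size of a 6-variable truth table.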
The only disadvantage is that if you make a 5-input lookup table and take its output directly out, you cannot use this flip-flop separately, because BX is already used as the 5th variable. Similarly, if you are implementing a 6-input lookup table, the select line is nothing but BY, so then that flip-flop cannot be used separately. That should be kept in mind. So, in a slice you can have two 4-input lookup tables or one 5-input lookup table. In a CLB, because it has 2 slices, you have one 5-input here and one 5-input there, which can be combined to make a 6-input lookup table. That is what we have discussed. Now, one thing should be understood. We said that if you have 5 variables, A, B, C, D, E, you implement the function with a 5-input lookup table using an F5 MUX. Many a time that is not required. Or if you have 6 variables, you do not necessarily have to go for a 6-input lookup table. In some specific cases you can cascade lookup tables. The MUX-based lookup table covers all possible minterms, but specific 6-input functions can be implemented using a cascade of 2 lookup tables. If you build a 6-input lookup table from 4-input ones you need 4 of them, because two 4-input lookup tables combine into a 5-input lookup table and four 4-input lookup tables combine into a 6-input lookup table. So let us take an example. Say this is the function we want to implement: Y is A B C D E or A B C D F. We now have 6 variables. One way of doing it is to go to this structure: connect F here, connect E here, connect A, B, C, D to all 4 lookup tables, write a big truth table for the function, work out where the 1s and 0s are, and come back and program it.
But there is another way, by simplifying it. You see that A B C D is common, so we take A B C D out, and the function becomes A B C D and (E or F). So we can say A B C D is X, and Y becomes X and (E or F). Now one lookup table is used to implement A B C D, which is X, and the output of that lookup table is connected as an input to the next lookup table. Then you connect E and F to it and implement X and (E or F) there. What the function is does not matter, because we have seen that any Boolean function of up to 4 variables can be programmed in a lookup table, so this can be easily implemented. So this is a 6-input function, but it can be built using 2 lookup tables instead of 4; that should be kept in mind. Of course the disadvantage is that you encounter 2 lookup table delays when you implement it like that. If you implement it with the MUXes, it is one lookup table delay plus the F5 and F6 MUX delays, which will be less than two lookup table delays. That is not much of a problem, because many a time in an FPGA what dominates is the interconnect delay, and in any case we have saved 2 lookup tables, which are quite precious. So that is how it can be done. Similarly, for 5 inputs, instead of combining 2 lookup tables with an F5 MUX you can cascade 2 lookup tables. A B C D E can be implemented as (A B C D) and E: we implement A B C D as X in one lookup table, and X is combined with E in another lookup table. You do not need an F5 MUX; you can use a cascade. But there are cases where a cascade is not enough. Cascading works only when the four variables can be reduced to a single intermediate signal. Take A B C D XOR E: if we call A B C D as Z, expanding gives Z E-bar or Z-bar E, and that is still just a function of Z and E, so the second lookup table can implement it and the cascade still works. The problem arises when the function for E equal to 0 and the function for E equal to 1 are two unrelated functions of A, B, C, D. Then there is nothing common to take out, you cannot have a common lookup table to cascade, and you need a real, true 5-input lookup table.
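
The factoring argument for Y = ABCDE + ABCDF can be checked exhaustively with a small Python sketch. The model below is illustrative only: `lut_x` and `lut_y` are made-up names for the two cascaded lookup tables, and the fourth input of the second table is simply tied to 0.

```python
from itertools import product


def lut4(func):
    """16-entry truth table of a 4-input function; first argument is the MSB."""
    return [func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) & 1
            for i in range(16)]


def read4(table, i3, i2, i1, i0):
    return table[(i3 << 3) | (i2 << 2) | (i1 << 1) | i0]


# First LUT: X = A.B.C.D
lut_x = lut4(lambda a, b, c, d: a & b & c & d)
# Second LUT: Y = X.(E + F); its fourth input is unused.
lut_y = lut4(lambda x, e, f, unused: x & (e | f))


def y_cascade(a, b, c, d, e, f):
    x = read4(lut_x, a, b, c, d)      # first LUT delay
    return read4(lut_y, x, e, f, 0)   # second LUT delay


def y_reference(a, b, c, d, e, f):
    return (a & b & c & d & e) | (a & b & c & d & f)


cascade_mismatches = [v for v in product((0, 1), repeat=6)
                      if y_cascade(*v) != y_reference(*v)]
print(len(cascade_mismatches))  # 0
```

Two 16-bit tables reproduce all 64 rows here only because the function factors through the single signal X; a general 6-input function still needs the four-table structure.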
So, you are forced to use this architecture. This can be extended to the 6-input case also: there are 6-input functions you can come up with which definitely need four 4-input lookup tables, that is one real, true 6-input lookup table, rather than a cascade. All of this is done by the tool. What I suggest is this: you know VHDL, and I will demonstrate the tool towards the end of the course, during the case study. At that time you can write all these functions and see how they are fitted, and you can even see how things are routed by going to the floor planner tool and seeing what is happening inside the lookup tables. I may not take this exact code and show you, but I will show you in general how you can view the floor plan. So all this can be tried. That is how you learn: you try out something and then you see what is happening within the device. You see, most people take the tool as it is. The tool has a lot of features, so you write some code, click some button, choose some options, and some magic happens: the hourglass comes, a lot of console messages come, you enable some optimisation and you get some advantage, you get some speed. The tool gives you a lot of features; some are used, some are not. But whatever the tool and whatever the technology, if you try to understand it in a little more detail, by playing with the code, playing with the design, playing with the tool options, analysing the various outputs and trying to figure out what is happening inside, then you will get a deeper understanding. All the reports churned out by the tools will make sense to you, and much more than that, you get ideas on how to optimise.
Many a time the tool vendors try to incorporate that insight back into the tool: any special insight that you get from working out various examples, they try to build into the tool to make it smarter. So if you have such an insight, it can be communicated; if it is not known, it can be published, and the vendors will take note of it and try to improve their devices, tools and so on. That should be kept in mind. So let us come back to the slide; this is what we have seen. In summary, in the Virtex CLB the lookup table and flip-flop can be used separately or together. It has four 4-input lookup tables; a 5-input lookup table can be built from two 4-input lookup tables using the F5 MUX, and a 6-input lookup table can be built from two 5-input lookup tables using the F6 MUX. So essentially it is four 4-input, two 5-input or one 6-input lookup table, and of course you can do cascading. For the flip-flop you have set, reset and clock enable. One advantage of set and reset is that you can initialise the registers to any value; you do not have to say that at power-on everything needs to be 0. You can have an 8-bit register and say that at power-on it should be loaded with a hex value, say 3A. That adds a lot of flexibility: when you build a complex function you might have some default register values at power-on, and those can be easily implemented using this. Now, when we look at the slice diagram, one thing you should notice is that when we use the lookup table as logic, it is the read part of the circuit that is available to us. We connect the address lines to the various input signals, the data line is used as the output, and the memory is in read mode.
Now, this memory is written during configuration. That means the write control of the lookup table is through the configuration circuitry; if it is a dedicated write circuit, it is not available to the user. But the lookup table is nothing but a memory with four address lines, so it is a 16 x 1-bit memory. Suppose you have an FPGA where 50% of the lookup tables are not used as logic, you have used up all the block RAM as memory, and you want a little more memory. If you look at it, a lot of lookup tables are lying around. They are memories, but you cannot use them because the write circuit is not available to the user. So what Xilinx and other FPGA vendors do is make the write circuitry available to the user. You can see in the diagram that there is a write enable and a data input. Internally you can think of the lookup table as flip-flops: the flip-flop outputs are MUXed to the output O, and the select lines of the MUX are the address lines; that is the read circuit. The data input is the D input of the flip-flops, connected to all of them as a bus, and the same address can be decoded to select which flip-flop is written. That circuit is made available, so that if the lookup table is not used for logic it can be used as a memory. Normally what you do is go to the Core Generator, which makes the standard cores available, and instantiate the distributed RAM. Sometimes you write a case statement in VHDL: you say case on abcd, and for 0000 the output is something, for 0001 the output is something else, and so on for all 16 possible values. That is an ideal candidate for a distributed RAM, and it is put in the lookup table. You might ask what the difference is, since it can be treated as logic. Yes, and it can go to block RAM also; it does not have to come to the distributed RAM.
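To make the read path and the write path concrete, here is a minimal behavioural sketch of one lookup table used as a 16 x 1-bit memory. It is a Python model under stated assumptions, not a vendor primitive: the class name `LutRam16x1` and its method names are hypothetical, and the write is modelled as happening once per call, standing in for a clock edge.

```python
class LutRam16x1:
    """Behavioural model of a 4-input lookup table used as a 16 x 1-bit memory."""

    def __init__(self, init=0):
        # 'init' stands in for the contents loaded at configuration time:
        # bit k of the 16-bit value is the content of address k (assumption).
        self.cells = [(init >> k) & 1 for k in range(16)]

    def read(self, addr):
        # Read circuit: the four address lines select one cell through a MUX.
        return self.cells[addr & 0xF]

    def clock_write(self, addr, data_in, write_enable):
        # Write circuit made available to the user: the address is decoded
        # and the selected cell takes data_in only when write_enable is 1.
        if write_enable:
            self.cells[addr & 0xF] = data_in & 1


ram = LutRam16x1(init=0x8000)                      # only address 15 holds a 1
ram.clock_write(addr=3, data_in=1, write_enable=1)
ram.clock_write(addr=4, data_in=1, write_enable=0)  # ignored: write disabled
```

When the same object is only ever read, it behaves exactly like a lookup table used as logic; the write method is the extra circuitry that turns it into a small RAM.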
But what we are discussing is that if the lookup table is not used for logic implementation, it can be used as a memory, and since it is distributed it is called distributed RAM. A block RAM is in one place, but a distributed RAM is spread all over the device. Now you should know that there is a slight issue with it, particularly when you combine a lot of separate lookup tables. Suppose you want a 16 x 16 memory, that is, 16 locations of 16 bits. Then you have to put 16 lookup tables in parallel to get the 16-bit width of the memory. It may happen that you assign the pins somewhere and some lookup tables are close to the pins while others are far away from them, so the read latency may be different for different lookup tables. The reads are usually not synchronised, though you could synchronise them using the flip-flop. If you do not, you will see variable latency, and if you use the memory for something that needs precision, say you are trying to generate a waveform using a DAC, then you will find jitter. That may not happen with the block RAM, where many a time the output can be synchronised. So keep this in mind: if you use distributed RAM as a huge memory which is spread across the device and the read is not synchronised, you might get variable read latencies, and in some cases that can affect your implementation. That is what is shown in the slide. Basically you have a lookup table, and there is a block which is the lookup-table write circuit, normally controlled by the configuration logic. Now those wires are made available to the outside, so that the user can connect their signals and use it as a memory. That is what is shown here in the slice diagram of the logic block.
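The variable read latency can be pictured with a toy model of a 16 x 16 memory built from 16 one-bit columns. This is only an illustration in Python: the function names are made up, and the per-column delays are invented numbers, not figures from any data sheet.

```python
def build_columns(words):
    """Split sixteen 16-bit words into 16 one-bit columns, one per lookup table."""
    return [[(w >> bit) & 1 for w in words] for bit in range(16)]

def read_word(columns, addr):
    """Settled read: the same address is applied to all 16 columns."""
    return sum(columns[bit][addr] << bit for bit in range(16))

def unsynchronized_read(columns, old_addr, new_addr, delays_ns, t_ns):
    """Word seen t_ns after the address changes, if column 'bit' only
    responds to the new address after delays_ns[bit] (hypothetical delays)."""
    word = 0
    for bit in range(16):
        addr = new_addr if t_ns >= delays_ns[bit] else old_addr
        word |= columns[bit][addr] << bit
    return word


words = [i * 0x1111 for i in range(16)]      # address i holds the hex digit i repeated
columns = build_columns(words)
delays = [1.0] * 8 + [3.0] * 8               # low byte routed close, high byte far away
early = unsynchronized_read(columns, 0, 15, delays, t_ns=2.0)   # mixed old and new bits
late = unsynchronized_read(columns, 0, 15, delays, t_ns=3.0)    # fully settled word
```

Sampling the word through a flip-flop after the slowest column has settled corresponds to the synchronised read mentioned above.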
You have the address lines, the data line, the clock and the write enable; everything is here. So you can use it as a distributed memory. Now let me introduce the next topic: there is a carry chain within the FPGA slice. You know the equation of an adder. The sum for any particular index i is s_i = a_i XOR b_i XOR c_i; for example, s_7 = a_7 XOR b_7 XOR c_7. This is a full adder, and you construct a ripple adder by cascading full adders. If you look at the carry, c_(i+1) = a_i b_i OR (a_i XOR b_i) c_i. The term a_i b_i means that if both are 1 the carry is 1; the term (a_i XOR b_i) c_i means that if exactly one of them is 1 and the carry input is 1, then the carry output is 1. Many a time we simplify this by using an inclusive OR, a_i OR b_i, in place of the exclusive OR. That is a bit of redundancy, because a_i OR b_i covers not only 0 1 and 1 0 but also 1 1; it does no harm, because 1 1 generates a carry anyway. But we will look at the exclusive-OR form of the equation, because it enables us to implement a carry circuit. Now suppose we try to implement a ripple adder using full adders in an FPGA, using the lookup tables. You need one lookup table to which you give a_0, b_0 and c_0 to implement s_0, and another to which you again give a_0, b_0 and c_0 to get c_1 and take it out. Then you combine c_1 with a_1 and b_1 in one lookup table to get s_1, and in another lookup table to get c_2. So essentially you need 2 lookup tables per full adder, and all the interconnection is going to make everything slow. And if addition is slow, the multiplication you build using addition will also be slow.
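The adder equations above can be checked quickly in software. The following is a small Python sketch, with function names chosen only for illustration; it compares the exclusive-OR form of the carry with the inclusive-OR form and builds a ripple adder by cascading the full adder.

```python
def full_adder(a, b, c):
    """One full-adder stage using the exclusive-OR form of the carry."""
    s = a ^ b ^ c                          # s_i = a_i XOR b_i XOR c_i
    c_out = (a & b) | ((a ^ b) & c)        # c_(i+1) = a_i b_i OR (a_i XOR b_i) c_i
    return s, c_out

def carry_inclusive_or(a, b, c):
    """The commonly used simplified form with an inclusive OR."""
    return (a & b) | ((a | b) & c)

def ripple_add(x, y, width=8, c0=0):
    """Cascade 'width' full adders; returns (sum bits as an integer, final carry)."""
    carry, total = c0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry
```

For example, `ripple_add(0x3A, 0xC7)` adds two 8-bit values whose true sum does not fit in 8 bits, so the final carry comes out as 1.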
So that is one issue: if a separate lookup table is used for the carry logic, the interconnection will slow down the adder, and that will slow down all arithmetic operations. What is normally done is that the carry logic is built into the slice. I will describe the idea of how it is done, and then we will go to the slice. There is a lookup table, and along with it there is some additional carry logic. Every lookup table has a 2-to-1 MUX connected like this: the select line is connected to the output of the lookup table, one of the two lookup-table inputs goes to one data input of the MUX, and the carry coming from the MUX of the previous section is connected to the other data input. The output of this MUX goes to the input of the MUX above, because the carry comes in from the bottom and goes out to the top. So there is a MUX with each lookup table, and there is an additional XOR gate which is used for the full-adder implementation. It can be used for other purposes also, if the placement permits. The lookup-table output goes to one input of the XOR gate, with the incoming carry on the other input, and the XOR output is available as an output. Now this implements the carry equation we have seen: c_(i+1) = a_i b_i OR (a_i XOR b_i) c_i. How it happens is that you give a_i and b_i to the lookup table, which produces a_i XOR b_i, and that goes to the select line of the MUX. When the select is 1 the MUX passes c_i, which is the (a_i XOR b_i) c_i part. When the select is 0 both inputs are equal, so passing one of them gives the a_i b_i part. So that is the carry chain, basically. We are coming to the end of this part of the lecture; we will continue with the carry chain next time. What we have seen today is the detail inside a slice, or inside a logic block: the lookup tables, the flip-flops, combining the lookup tables for higher complexity, cascading them, and using the lookup table as a distributed memory.
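The carry-chain idea can be sketched the same way: one lookup table per bit, a 2-to-1 MUX for the carry and an XOR gate for the sum. This is a behavioural Python model with hypothetical names (`slice_bit`, `carry_chain_add`); which of the two inputs feeds the MUX is an assumption here, and it does not matter because both are equal whenever that input is selected.

```python
def slice_bit(a, b, carry_in):
    """One bit of the carry chain: lookup table + carry MUX + XOR gate."""
    lut_out = a ^ b                             # lookup table programmed as a XOR b
    carry_out = carry_in if lut_out else a      # MUX: select = lookup-table output
    sum_out = lut_out ^ carry_in                # dedicated XOR gate produces the sum
    return sum_out, carry_out

def carry_chain_add(x, y, width=8, carry_in=0):
    """Chain 'width' bits; the carry passes from each MUX to the one above."""
    carry, total = carry_in, 0
    for i in range(width):
        s, carry = slice_bit((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry
```

Compared with the two-lookup-table-per-bit version, only one lookup table is used per bit here, and the carry never leaves the dedicated MUX path.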
And now we are looking at the carry logic, which speeds up the arithmetic operations. So please have a look at the lecture notes. You can look at the data sheets, and you can even look at the data sheets of other FPGAs; I suggest you look at the Spartan-3 FPGA data sheet, where the logic block diagram is identical. Once you master it, you can go on to Spartan-6 or Virtex-7 and so on. So please revise; we will continue with this in the next lecture. I wish you all the best, and thank you.