Welcome to this lecture on field programmable gate arrays in the course Digital System Design with PLDs and FPGAs. In the last lecture we looked at Xilinx Virtex configuration, or programming, and at certain features that were not available in the Virtex kind of device but are available in newer devices such as the Spartan-6. Then we looked at one-hot encoding, and a little detail was left towards the end of the lecture, so we will complete that. In today's lecture we will look at similar FPGAs from other vendors, like Altera and Actel, though not in depth, because we have already gone through the Virtex device in depth. I presume you will be able to understand the other architectures easily; if you spend some time, you will be able to make out everything. Before going to today's part, we will briefly review the last lecture's portion for continuity. So let us move to the slides.
In the last lecture we looked at the Virtex configuration modes. One is through the JTAG port, for prototyping, at any time. Master serial is where the FPGA is connected to a serial PROM; in the Virtex it was only 1-bit data, but in current devices it can be 1-, 2- or 4-bit serial data to support larger devices, because with a single bit the configuration time will be quite high. Four bits should reduce it by a factor of four. Master serial means that the master FPGA clocks the PROM. In slave serial, the FPGA expects the clock, and the data in synchrony with that clock, to be supplied to it. This is useful when you chain multiple FPGAs in serial mode: the master clocks the serial PROM as well as the slave FPGAs. Then there is the SelectMAP mode, which is an 8- or 16-bit-wide mode: 8 bits in the Virtex, 16 bits in the current FPGAs.
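The effect of data width on configuration time can be put into a rough calculation. The sketch below is only an idealized model: it assumes one data word per configuration clock and no protocol overhead, and the bit stream size and clock frequency are made-up illustrative numbers, not values from any data sheet.

```python
def config_time_seconds(bitstream_bits: int, bus_width: int, cclk_hz: float) -> float:
    """Ideal configuration time: one bus-width word per clock, no overhead."""
    if bus_width <= 0 or cclk_hz <= 0:
        raise ValueError("bus_width and cclk_hz must be positive")
    cycles = -(-bitstream_bits // bus_width)  # ceiling division
    return cycles / cclk_hz


# Hypothetical numbers chosen only for illustration.
BITSTREAM_BITS = 8_000_000   # assumed bit stream size
CCLK_HZ = 20e6               # assumed configuration clock

for width in (1, 2, 4, 8, 16):
    t = config_time_seconds(BITSTREAM_BITS, width, CCLK_HZ)
    print(f"{width:2d}-bit data: {t * 1e3:.1f} ms")
```

Under these assumptions the 4-bit mode takes a quarter of the time of the 1-bit mode, which is the factor-of-four reduction mentioned above.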
The SelectMAP mode can also be in master or slave mode: in master mode the FPGA gives the clock, in slave mode the FPGA expects the clock. Again, this can be chained in current devices. I am not showing the JTAG scheme because it is simple: normally there is a JTAG USB dongle connected to the PC, which converts USB to JTAG, and it is connected to the JTAG port of the board. Using the programming tool you can program either the FPGA or the serial PROM at any time; even the serial PROMs have a JTAG port. If you program the FPGA directly, it has to be programmed at every power-on. If you program the serial PROM, it is permanent, because the PROM is non-volatile: you put the FPGA in master mode, and at each power-up the FPGA selects the PROM, clocks it, and programs itself.
This slide shows a scheme with two devices, where the programming bit streams of both are combined in one PROM. The master gives the clock, and as the data comes it programs itself; while it is programming, it pushes ones to the slave, which waits. Once the master is programmed, it passes on the bit stream for the second device, which then programs itself, and so on. At the beginning, both devices initialize by clearing their configuration memory. To synchronize that process there is an I/O pin, INIT, which is an open-drain output as well as an input, and which is pulled up. While a device is initializing, it pulls the pin low. So whichever FPGA finishes first waits for the line to go high, so that everything is initialized properly, and only then does the programming start.
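The open-drain, pulled-up line described above behaves as a wired-AND: it reads high only when every device has released it. A minimal behavioural sketch follows; the function names and the per-device clearing times are hypothetical, chosen only to illustrate the handshake.

```python
def wired_and(released):
    """Level of a pulled-up open-drain line: high only if every device released it."""
    return all(released)


def init_line_trace(clear_cycles):
    """Return [(cycle, line_level), ...] until the shared line goes high.

    clear_cycles[i] is the (hypothetical) number of cycles device i needs
    to clear its configuration memory; until then it holds the line low.
    """
    trace = []
    cycle = 0
    while True:
        released = [cycle >= n for n in clear_cycles]
        level = wired_and(released)
        trace.append((cycle, level))
        if level:
            return trace
        cycle += 1


# Two devices: one clears in 3 cycles, the other in 5.
for cycle, level in init_line_trace([3, 5]):
    print(cycle, "high" if level else "low")
```

The faster device releases the line at cycle 3, but the line only goes high at cycle 5, when the slower one has also finished, so both start configuration together.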
Similarly, at the end each FPGA takes its own time, and the last slave is programmed last. So the DONE pin is again a wired-AND, open drain with a pull-up, and when it goes high the rest of the circuit knows that programming is done and it can continue operation. While programming, all the pins are tri-stated, and the rest of the circuit has to take care of that; at the end of configuration, all the flip-flops are reset internally by the FPGA.
In the SelectMAP scheme, the FPGA supports a 16-bit or 32-bit parallel configuration interface, with these additional signals. Normally this does not match the bus protocol of a CPU, so you either use a CPLD or some parallel port to drive this interface. The only issue is that with a parallel port it can be pretty slow if it is software controlled, so it is worthwhile to build in a CPLD so that it is done much faster. The configuration bit stream is stored in the CPU memory, to save space. That slide shows the timing.
In the current FPGAs, as I said, you have boundary scan, which can be a single device or chained, as in the serial configuration. Master serial can be chained or ganged; ganged means identical devices are configured in parallel, and chaining is what we have just discussed. The serial PROM can be 1-, 2- or 4-bit, and the PROM itself can be programmed through the JTAG port. In slave serial, again, the data width can be 1, 2 or 4. In the master SelectMAP mode the FPGA clocks a BPI flash in 8-bit or 16-bit mode; it can be a single device, chained, or ganged. And there is a slave SelectMAP mode that goes along with the master SelectMAP mode, where the clock comes from an external source.
The next thing we discussed was that in this kind of scheme the bit stream is exposed: people can reverse engineer your design by copying the configuration bit stream.
So current FPGAs have encryption: before programming, the bit stream can be encrypted with the AES algorithm using a 256-bit key. The FPGA is programmed with that key through the JTAG port; the key can be permanently fused, or it can be stored in an internal battery-backed RAM, with the backup battery outside. Once it is programmed, certain configuration bits can be set so that the configuration, or even the key, cannot be read back; all that can be blocked. So that is the encryption scheme.
There is also bit stream compression, because maybe only part of the FPGA is configured. If the bit stream contains information about all of the resources, the unused part can be removed to reduce the storage requirement and the time spent configuring. It is useful when you have less memory space for the configuration stream, or you want to configure at power-on much faster than usual, because every device takes a certain time depending on its complexity. When you compress, the configuration time is less.
Another option addresses corruption: if the bit stream you are programming gets corrupted, one needs to reprogram. So it is possible to store a golden configuration at another place in the flash. This works with SPI flash and with BPI, the platform flash with the 8-bit or 16-bit interface, which also has JTAG, so it can be programmed through JTAG. If at any time during configuration a CRC error occurs on the bit stream, the FPGA can fall back to the golden configuration at a particular location in the memory. Or, if there is a sync word detection failure, there is a watchdog timer watching; when it times out, the FPGA tries to configure from the golden bit stream.
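The fallback decision can be sketched in software terms. This is only a behavioural illustration: the image names, the `select_image` function and the use of `zlib.crc32` as the check are assumptions made for the example, not the CRC or the mechanism the device actually uses, and the sync word and watchdog paths are not modelled.

```python
import zlib


def select_image(flash: dict, expected_crc: dict) -> str:
    """Return the name of the image that would be booted: main first, then golden."""
    for name in ("main", "golden"):
        image = flash.get(name)
        if image is not None and zlib.crc32(image) == expected_crc[name]:
            return name
    raise RuntimeError("no valid configuration image")


# Hypothetical flash contents.
golden = b"golden-bitstream"
main = b"field-upgraded-bitstream"
expected = {"main": zlib.crc32(main), "golden": zlib.crc32(golden)}

flash = {"main": main, "golden": golden}
print(select_image(flash, expected))   # healthy main image is chosen

flash["main"] = main[:-1] + b"\x00"    # simulate a corrupted field upgrade
print(select_image(flash, expected))   # falls back to the golden image
```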
Normally, whether in embedded systems or FPGAs, golden means a default, well-tested configuration that is not upgraded and is kept from the beginning. The main configuration is the one that gets upgraded in the field. There can be corruption because you are writing to that area of the flash quite often, and the bit stream itself could be buggy, or could be corrupted because it is downloaded through the net. Multi-boot takes care of that. So that is multi-boot; these features can be used in the current FPGAs.
As for the DSP slice, as I said, there is a pre-adder which is 18 bits and a multiplier which is 18 bits, giving a 36-bit result, which can be sign extended to 48 bits and added to a 48-bit value. It is very useful in DSP algorithms where you have multiply-accumulate operations. You can bypass the outputs of various stages, cascade the adders, take out the multiplier output; all possibilities are there, and I am not showing them here. As with the CLB, you can go through the detailed schematic, which should give you a good idea. This helps in implementing DSP algorithms very efficiently, and the tool vendors often offer a MATLAB or Simulink interface for such algorithms: you implement the algorithm using the Signal Processing Toolbox or Simulink, and equivalent VHDL code can be generated which uses the DSP slice and can be synthesized, placed and routed in the FPGA to generate the bit stream.
Another issue with FPGAs is that once you program the device it is very difficult to debug. Everything was timing simulated, but when you program it, it does not work as intended, and then debugging is a big problem because only the external pins are accessible to you. If something is wrong internally, it cannot be seen.
So what is done is that a logic analyzer IP is connected to the JTAG port, and its probes can be connected internally to the interfaces of your circuit. You can set a trigger, say that when a particular value comes on this data bus or on these signal lines you capture from then onwards, or you capture for some time before and after, and transfer it through the JTAG port to the PC for offline analysis and debugging. It is a very useful tool once you program the FPGA. In the case of Xilinx it is called ChipScope Pro, and in Altera it is SignalTap. The slide shows the picture: you instantiate this logic, connect it to the user function, capture, analyze, change the code, rebuild, and iterate.
A very important thing to remember is that it occupies a certain area. If you are trying to use ChipScope Pro, you must have some free area for that IP, and sometimes it can mess up your timing a little. Unless you floorplan properly, the presence of that circuit can upset the place and route. If you leave the place and route completely to the tool, it can flatten everything, and all the logic can get mixed up within the area. You cannot say that the logic of a particular module will be placed close together; the ChipScope IP and your logic can all get mixed.
So whatever timing was achieved may be upset by its presence, but if you floorplan properly there should not be an issue. That you should keep in mind.
We also looked at the pins: the clock pins, the programming mode pins, and a few pins like CCLK, PROGRAM and DONE; INIT is not a dedicated pin. The dedicated pins are these, VCC, ground and the JTAG port; the rest are all user-defined ports. Even for the SelectMAP interface, user-defined pins are used: depending on the mode pins, these pins have special programming functions at power-on, and after programming they become user-defined pins.
The last thing we discussed was one-hot encoding. The issue arises if you do a binary encoding of the states, say with five flip-flops, and in the worst case five inputs come in as well. I say worst case because in a state machine you often make a decision on a particular signal, or a combination of two signals; you do not combine all the signals. But let us assume a scenario where there are transitions to a state from several other states on different conditions. That can happen, and when you develop the equation for that state there could be, say, a five-input condition along with the state. The effect is that with five inputs and five flip-flops, a ten-input logic implementation is required for each flip-flop. Once again, in the worst case a ten-input function can span quite a lot of logic blocks, spreading the logic across the FPGA, increasing the interconnect delay and reducing the clock frequency.
That is what I have shown here: a particular FSM where in the worst case you can go up to 16 CLBs, which can make the logic delay very high. We said the solution is to encode each state in its own flip-flop, so that for the state decoding the inputs are the condition inputs plus the flip-flops of the states from which there is a transition to that particular state. We saw an example where Si on condition i transits to Sj, and Sj on condition j remains there. Then the equation is Dj = Ci·Qi + Cj·Qj, because Si is represented by the single flip-flop Qi and Sj by Qj. If the condition is a five-input one, which as I said is less likely, since mostly it will be one input, then even in the worst case it is only seven inputs, which is not a big deal.
One more thing with one-hot encoding is that many outputs will be Moore-type outputs, and such an output will often be high in one particular state, or in one or two states. So the output logic becomes just the Q output of a particular state, or the OR of multiple Qs. In one-hot encoding, not only does the next-state logic become simple, the output logic also reduces: it is the OR of one or more flip-flops, and in the case of one it can be taken out directly. So timing-wise too it will be much faster than binary encoding. That is about the output logic.
The state encoding can be changed using a user-defined attribute. I mentioned this briefly when we looked at state machine coding in VHDL. It is given in the VHDL code after the state type is specified. One example I am showing here: there is a state type defined with the name state type, and you write an attribute, state encoding, of state type, where the type is gray, meaning a Gray code, or one-hot-one, or one-hot-zero.
These are one-hot encodings; we will see what one-hot-one and one-hot-zero are. There is another possibility: you write attribute enum encoding, for enumerated encoding, of state type, and you literally give the encoding. Here we are assuming four states and two flip-flops, and you are saying the states are 00, 01, 11, 10, which is nothing but a Gray code. Mind you, this is called a user-defined attribute, and user here does not mean the designer; it means the vendor of the tool. It is not standard VHDL, so you have to check it. It can change from vendor to vendor; there is no guarantee that if it works for Xilinx it will work for Altera. So check the tool vendor's manual and recommendations on the attribute for changing the state encoding.
Now, there is a little difference between one-hot-one and one-hot-zero. One-hot-one means true one-hot: with five states you have 00001, where this flip-flop represents the first state, 00010, where this one represents the second state, and so on. So five patterns represent the five states. But the issue is that at power-on you want to come to a starting state, and if you are using the reset of the flip-flops it is troublesome, because you need a particular flip-flop to be 1, and if a set is not available you have to build it into the next-state logic. In that case the reset of the flip-flop is going to waste. So sometimes it is useful to start with all zeros and then move to this kind of pattern. Mind you, the all-zeros pattern is a dummy power-on state: at the beginning the machine comes to this state, but for all practical purposes it transits to the next state, 00001, and never goes back. It is only for starting; the real state machine that you design starts with 00001.
So you have a dummy state which could be a replica of the starting state as far as the outputs are concerned, but which is never visited again. That is one-hot-zero, which is the most useful form of one-hot encoding; that you should keep in mind.
Suppose you are not able to control the state encoding using these attributes, for whatever reason. That is less likely: mostly the synthesis tool options will support one-hot-one and one-hot-zero, and many times you do not even have to use an attribute, but you have to check how the tool handles this. In the case of Xilinx there is a synthesis tool called XST, the Xilinx synthesis tool, which has a command line option for this, and in the GUI there are pull-down synthesis properties where the encoding can be changed. If you are somehow stuck and want to hard code the one-hot encoding, you can define the states explicitly. Normally we use an enumerated data type for present state and next state, but you can instead say signal present state, next state is standard logic vector 3 down to 0, because we have only four states, and then literally say constant A is standard logic vector 3 down to 0 with value 0001, constant B likewise, and so on. So we are defining the one-hot encoding ourselves, and for the states you use A, B, C, D. That is the basic idea.
So we have come to the end of the Xilinx FPGA part. I have discussed the Virtex in detail, some of the features of the current FPGAs, and some issues like one-hot encoding. I am sure you will now be in a position to understand all the Xilinx FPGAs if you spend some time going through the data sheets; there are quite a lot of them.
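The one-hot ideas discussed above (one flip-flop per state, next-state equations of the form Dj = Ci·Qi + Cj·Qj, outputs as an OR of Q bits, and an all-zeros power-on state) can be tried out in a small behavioural model. The four-state machine below, its inputs `go` and `hold`, and the `busy` output are all hypothetical, invented for illustration. Detecting the all-zeros state with an explicit comparison is also an assumption of this sketch, not something taken from the lecture slides.

```python
# Hard-coded one-hot constants, in the spirit of declaring
# constant A as a 4-bit vector "0001", B as "0010", and so on.
A, B, C, D = 0b0001, 0b0010, 0b0100, 0b1000
POWER_ON = 0b0000  # one-hot-zero: dummy state reached by a plain reset


def next_state(q: int, go: bool, hold: bool) -> int:
    """Next-state logic written as one equation per flip-flop."""
    qa, qb, qc, qd = (bool(q & s) for s in (A, B, C, D))
    power_on = q == POWER_ON          # assumed detection of the dummy state
    da = power_on or (qa and not go) or qd
    db = (qa and go) or (qb and hold)  # Dj = Ci.Qi + Cj.Qj
    dc = qb and not hold
    dd = qc
    return (A if da else 0) | (B if db else 0) | (C if dc else 0) | (D if dd else 0)


def busy(q: int) -> bool:
    """Moore output: simply the OR of the Q bits of states B and C."""
    return bool(q & (B | C))


state = POWER_ON
for go, hold in [(False, False), (True, False), (False, True),
                 (False, False), (False, False), (False, False)]:
    state = next_state(state, go, hold)
    print(f"{state:04b} busy={busy(state)}")
```

Each equation depends only on the flip-flops that feed that state and on the relevant condition inputs, which is the fan-in reduction that one-hot encoding is meant to give.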
There are some parts I have not covered, which you can look up. There are packages and package details: when you make a PCB there are guidelines on which package to use, how to do the PCB layout, how to route it properly, how to terminate certain high-speed signals, all that. You can refer to the manual. I have not discussed the basic electrical and timing characteristics in detail; all that can be looked up in the data sheet. But the internal logic, the routing, the special resources, the configuration, and some examples of how these internal resources are used and how standard circuits map onto them, all that I have covered. So I am sure you are on a good footing now to understand other FPGAs from Xilinx and from other vendors.
Before closing, I want to look very briefly at Altera and Actel devices, because Altera also has a good share of the market commercially, and Actel has a dedicated technology for space and high-reliability applications. So let us turn to that part. Altera has a mainstream high-performance FPGA family called the Stratix series: Stratix II, III, IV, and I think V may be there. I am really not in touch with these devices, but you can refer to them; pretty much everything is similar to Virtex. If you look at the latest FPGAs, Altera and Xilinx offer similar features; maybe one is better in some ways and the other in other ways. The first thing to note is a marginal difference in the interconnection, which is at two levels. In the Xilinx FPGA you have the interconnection matrix and the logic blocks, and we have seen that adjacent logic blocks are interconnected directly.
In the case of Altera this is a little more extended; that is the basic difference. The programmable connections are SRAM based, no difference. A logic array block is 10 logic elements: each is a lookup table and flip-flop, and ten such elements are connected together, whereas in the Virtex you have four lookup table and flip-flop combinations in a logic block. The lookup table is combinational logic, no difference. There are flip-flops with synchronous or asynchronous reset and preset, dual-port RAM, single-port RAM, FIFO; all that is there. There are low-skew clock trees and PLLs; PLLs are there in the latest Xilinx FPGAs and the later versions of Virtex. Carry and cascade chains are the same. There is a DSP block with multiplier and shift register; again, it may not be there in the original Virtex, but it is in the later Virtex series. The I/O blocks, registered and non-registered, are the same; multiple I/O standards, the same; JTAG, parallel and serial configuration, the same. So it is very much identical: if you know one, you know the other.
I am putting up a high-level block diagram taken from the Altera data sheet. You have the I/O pins, the logic array blocks, the memory blocks, the DSP blocks, and a big RAM block here. That is pretty much the architecture, which is very much identical to Virtex. When I say two levels of interconnection, this is what I mean: you have the vertical and horizontal interconnection matrix, but you can also see that the 10 logic elements are connected to the adjacent logic elements, and between the logic elements there is a local interconnect. This is maybe a slight difference between the Altera Stratix and the Xilinx Virtex. For the later FPGAs, one needs to look at how it is done.
I will not elaborate more. This information is taken from the Altera data sheet, and you can go through their later FPGAs, but it is more or less similar to the Xilinx SRAM FPGAs.
So let us look at the Actel FPGAs, which are quite different. The Actel 54SX is a really old series, but very useful for certain applications, essentially because it uses antifuse as the programmable interconnection. At the beginning we looked at the general interconnection technologies, and we said that antifuse is a one-time programmable solution. If you send something to space, in a satellite, it is exposed to radiation in outer space, and a RAM or a flash can get corrupted. Even if an SRAM-based FPGA is radiation hardened, if it is sent to space there will be frequent corruption, and it has to be reprogrammed regularly; probably on an hourly basis the bit stream has to be put back, and that is the only way to keep it sane. But an Actel kind of FPGA has fuse connections, permanent connections, and nothing can get corrupted. So it suits space and military applications, basically anything going to space where there is radiation. Maybe it is used in some weapon technologies; I am not very clear about that. But at least it is used in space applications.
One issue we discussed at the beginning is that since antifuse takes less space and adds less delay, the logic blocks can be simple. So there are combinational blocks and register blocks, which are separate. In the SRAM-based FPGA everything is put together, but in the Actel 54SX the combinational, lookup-table-like cell is separate and the flip-flop is separate, and you can get a certain ratio between them by choosing certain clusters.
The rest is a very simple I/O block, a low-skew clock tree, and multiple I/O standards. I must say that Actel is the one that had hardware probe pins built into the chip. We talked about the internal logic analyzer IP connected to JTAG, but Actel had two extra pins on the chip, and through JTAG one could connect internal signals to these two probe pins and capture them to debug. I think it was an original idea at that time to provide two dedicated probe pins. So instead of taking the probed data out through JTAG, it was taken out through a pin, while the probe selection was done through JTAG.
If you look at the combinational cell, it is essentially a 4-to-1 multiplexer; when we discussed fine-grain FPGA architecture, we showed this as an example. A 4-to-1 multiplexer can implement any function of two variables: you connect A and B to the select lines, so all the minterms are available at the data inputs, and you select them by tying the inputs to 1 or 0 to implement a logic function. It is also possible to connect a third variable at the data inputs and implement three-variable functions. And because of the presence of this AND gate and OR gate, instead of connecting one signal you can connect two, so some combinations of the minterms of five inputs, two here, two here and one here, can be implemented in the combinational cell, and this goes to a flip-flop.
The register cell is nothing but a flip-flop with set and reset. You can choose the clock polarity, and you can choose the clock from a hardwired clock, which has low-skew routing, or two other clocks. You see there is a recirculating mux, and this input is connected directly to a combinational cell, the adjacent one, while this one is for a routed connection through antifuses. So this port acts as a clock enable, and this is a connection for the input from the previous C cell.
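The use of a 4-to-1 multiplexer as a small logic function generator, described above for the combinational cell, can be checked with a short model. This is a generic sketch of the mux technique only; it does not model the extra AND and OR gates of the actual cell, and the function names are made up for the example.

```python
def mux4(d0, d1, d2, d3, s1, s0):
    """4-to-1 multiplexer: selects data input number 2*s1 + s0."""
    return (d0, d1, d2, d3)[(s1 << 1) | s0]


def two_var_function(truth_table, a, b):
    """Two variables on the selects, constants 0/1 on the data inputs."""
    return mux4(*truth_table, a, b)


XOR_TABLE = (0, 1, 1, 0)   # data inputs tied to 0 or 1, one per minterm


def majority(a, b, c):
    """Three variables: a and b on the selects, c fed to the data inputs."""
    return mux4(0, c, c, 1, a, b)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, two_var_function(XOR_TABLE, a, b),
              [majority(a, b, c) for c in (0, 1)])
```

Tying the data inputs to constants gives any two-variable function; feeding the third variable into the data inputs, as in `majority`, is the Shannon expansion about the two select variables.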
Normally what is done is that there are clusters, of two types: two combinational cells with one register cell, and two register cells with one combinational cell. These clusters can be mixed to control the ratio of register cells to combinational cells. So some devices may have more registers and some more combinational cells, suited to different applications. As for the routing, between a C cell and the adjacent C or R cell there is a direct connection; within a cluster it is a one-antifuse connection, and across clusters a two-antifuse connection. What is specified is that the number of fuses used is minimal when the interconnection is close by, so that you get a fast interconnect. This is all taken from the Actel data sheet.
This slide shows the idea of probing: there are two probe pins, and this is the FPGA. There is hardware which connects to the JTAG port of the FPGA and to the probe pins, and it is serially connected to the computer. You can give a command to connect an internal signal to a probe pin, capture it, view it and debug. So it does not require additional logic to be put into the chip. At that time it was a good idea, which was started by Actel.
Actel also has a flash-based FPGA, which is in a way better than SRAM. Even with an SRAM FPGA, when you deploy something in the field, a flash PROM is used, and at the beginning there is a configuration time: a copy from the flash to the configuration memory, which is SRAM. So an additional configuration step is involved. But here the programming itself is held in flash, so it is non-volatile and the chip is already configured and ready at power-on. That is the advantage. Otherwise the architecture is pretty much the same, with memory, I/O pins and logic blocks, but there is a difference in the logic block, which we look at just for curiosity.
This is the logic tile, or logic block, of the ProASIC Plus. The first thing to notice is that there are no flip-flops inside, which could be quite surprising. If you look at the combinational circuit, you have a 2-to-1 mux, which means you can implement a function of two variables, one on the select line and one on the input, and there are two such 2-to-1 muxes; that is all. So how does one implement an edge-triggered flip-flop? If you know the circuit, you can make out that a 2-to-1 mux with these two inverters can act as a latch. I am showing that in a picture here: this is a 2-to-1 mux, the select line is the clock, and this is the D input. When the clock is 1, D goes to Q, and when the clock is 0, Q is latched. That is what is shown here. The input goes through this NAND gate, which acts as an inverter when this input is made 1, and then to the second inverter, and so it acts as a latch.
Now, you know the master-slave edge-triggered flip-flop: there is a master latch and a slave latch with an inverter in the clock, and together they act as an edge-triggered flip-flop. You can see the clock line going to the selects of the muxes, one inverted and one direct. That is how the master-slave flip-flop is implemented using these 2-to-1 muxes, cascaded, with the output fed back to the input. So the tile can be used very cleverly for a combinational circuit as well as for a flip-flop.
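The latch and master-slave construction described above can be simulated at the behavioural level. The sketch below models only the idea of a 2-to-1 mux with feedback acting as a latch, and two such latches on opposite clock phases acting as an edge-triggered flip-flop. Which latch receives the inverted clock is an assumption here (it gives a rising-edge flip-flop), and the inverters and NAND gate of the real tile are not modelled.

```python
def mux2(d0, d1, sel):
    """2-to-1 multiplexer."""
    return d1 if sel else d0


class MuxLatch:
    """Level-sensitive latch: mux output fed back to its own '0' input."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        # enable = 1: transparent (D goes to Q); enable = 0: Q is held.
        self.q = mux2(self.q, d, enable)
        return self.q


class MasterSlaveFlipFlop:
    """Two mux latches on opposite clock phases."""

    def __init__(self):
        self.master = MuxLatch()
        self.slave = MuxLatch()

    def update(self, d, clk):
        m = self.master.update(d, not clk)   # master transparent while clk is low
        return self.slave.update(m, clk)     # slave transparent while clk is high


ff = MasterSlaveFlipFlop()
for d, clk in [(1, 0), (0, 1), (0, 0), (1, 1)]:
    print(f"d={d} clk={clk} q={ff.update(d, clk)}")
```

In this model Q takes the value D had while the clock was low, at the moment the clock goes high, and changes on D while the clock is high are ignored, which is the edge-triggered behaviour.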
So you use a tile either to implement two functions of two variables or a flip-flop. It is a clever idea. I have not used it, and I do not know how effective the utilization is, or what the overhead is in terms of switches, because switches are needed to configure the interconnect, chain tiles together and so on. I have absolutely no clue, because I have not played with this particular FPGA, but one can look at it. They have fast connections between adjacent blocks; short lines spanning 1, 2 and 4 logic blocks; long lines across the device; a clock tree; a pad ring, that is, wires around the periphery for pin locking; and SRAM blocks. The programming technology is flash.
That winds up our look at FPGAs from other vendors, Altera and Actel. In the case of Actel we looked at antifuse technology and flash. Antifuse is very useful in space applications, and flash is convenient because there is no configuration time at power-on. The Actel FPGAs have somewhat different architectures, and the antifuse architecture is totally different from the SRAM-based architecture. If you are working on space applications, maybe you should look at the Actel FPGAs. For other FPGAs there are radiation-hardened devices available, but SRAM FPGAs are prone to corruption in the presence of space radiation, so that should be kept in mind.
So very briefly since we have looked at the CPLDs and FPGA a basic you know comparison we have mentioned this at the beginning so the logic is in a CPLD is always AND and OR and most of it is wasted because there is very wide AND decoding and sometimes the product number of product terms are quite high the FPGA essentially uses lookup tables and there are few FPGAs as MUX and few gates the register logic ratio is small very very very less number of flip flops but FPGA has lot of flip flops and even we have seen that lookup table itself can be used as memory it can be used as shift registers we have mentioned this and so there are quite a large registers and dedicated memory is available so for all practical purpose this is a huge advantage FPGA not only in terms of complexity the register logic ratio is quite huge in the case of FPGA and the timing is very simple because there is a crossbar central switch interconnecting the old logic block so the timing is simple you know you have one or two interconnection but in the case of FPGA it can be very complex because from one end to other end there can be quite a lot of switches programmed and that will add to delay unless one fit in your design and floor planet properly you would not be able to kind of estimate the timing a priori you cannot say that at the beginning for a complex design how much performance in case of in terms of delay you can get it and the architecture variation in the CPLD is very small like you take the CPLD from Altera or Xilinx or say Atmel the architecture will be identical but if you look at the FPGA it can be quite different at least SRAM FPGAs look similar but then you can have the anti fuse FPGA which has a quite a different architecture and the programming technology in the case of CPLD is always flash it is good in a way it is non volatile but in the case of FPGA it is SRAM based mostly SRAM based some flash and some anti fuse with regard to capacity let us look at the slide see it is 
limited: a CPLD has something like 10k blocks, but here there are something like 2 million lookup tables in the largest FPGA available now, plus a lot of dedicated memory. We are not talking about the lookup tables being used as memory; the dedicated RAM itself is quite large. So that is the comparison between CPLD and FPGA. The FPGA is very useful for prototyping, and by and large the use of FPGAs is in prototyping: you want to develop an ASIC, so you implement it first in an FPGA. That is why high-end FPGAs are used; there is the Virtex-7, a very complex FPGA which costs quite a lot. But it enables one to test an ASIC before going to the foundry. For an ASIC the capital investment, the non-recurring engineering cost, is very high, and if you go to the foundry with only the timing simulation, there is a huge risk in fabricating it. You have heard about the Intel bug in the floating point unit, the famous bug: when there is a bug like that, the chip manufacturer has to recall the chips and correct the issue. It is a huge trouble; all the production is stopped, and a lot of money and time is wasted. All that can be avoided by putting the design in an FPGA. When, say, the CPU vendors put their design in FPGAs, they have to spread it across multiple FPGAs. If you take an Intel CPU, even the core functionality cannot be put in a single FPGA; it might require multiple FPGAs to put everything together. But once it has been tested for a long time, they are confident and go for the ASIC. One more thing is that at-speed testing may often not be possible. If there is a CPU which clocks at 1 or 2 gigahertz, there is no chance that you can clock an FPGA, even the latest one, at 2 or 2.5 gigahertz. It is impossible; it will be in the order of megahertz, 200 to 400 megahertz.
But you can run it for a long time. So FPGAs are mainly used for prototype checking. There are low-cost FPGAs like the Spartan which can be used in a mass-produced item. But it is all about the economics: you take a $10 FPGA and put it in a consumer device, maybe a set-top box or a TV, but people always think of integrating that logic into an ASIC, because there will be some CPU-based SoC in the product anyway. So why not put whatever logic was in the FPGA into that ASIC? So even where low-cost FPGAs are used, there is a great chance that the logic within the FPGA gets absorbed into the SoC being built. By and large, I would say the use of the FPGA is in IP development and in prototyping, which is a huge role. You should not underestimate it just because you do not see an FPGA in a mass-produced commercial device; there is a particular place for the FPGA in this design space. One can also play with different high-performance architectures like accelerators: you want to speed something up, and you can use an FPGA. You can try out various algorithms, and when one is successful you can make it an intellectual property that can later be translated to an ASIC. As for high performance, we know that the FPGA clocks at a very low frequency compared to an ASIC, but you can still get very high performance out of an FPGA because you can parallelize the computation. Assume that you are doing some packet processing in an FPGA, say a network security device such as an intrusion detection device, where you have to analyze a packet. When you get a packet, you get it serially and you convert it into parallel. Now maybe you have to look at the addresses: the destination IP, the source IP, the TCP port numbers.
The basic idea is that once you parallelize it, you can do parallel processing. You do not have to look at things sequentially, first the destination IP address, then the source IP address, then the protocol number; everything can be looked up in parallel. While converting serial to parallel, you store the bit stream in a wide memory, maybe a 128-bit wide memory, read all 128 bits together, and simultaneously look at the source IP address, destination IP address, source port and destination port. You may even want to look at a higher level: suppose you are doing some HTTP protocol processing or parsing, say you want to look at SQL queries, because there could be an attack through SQL. These are deep inside the packet, within the TCP payload, and they too can be looked at in parallel, so the performance will be quite high. That is how one achieves high performance using the FPGA: you make a wide data bus and wide memories and you do the processing in parallel. Sometimes you may have to use multiple memories. Say the maximum possible memory width is 128 bits; then you put two such memories side by side, and both memory outputs feed your processing logic in parallel. Sometimes we use multiport memories: you have a 4-port memory where 2 ports are writing and 2 ports are reading different locations, and the processing goes on in parallel. All these techniques can be used to extract high performance. There are things I have not touched on, because this was a basic course; I wanted to take you from a minimal background to a good level, that was the intention of the course. So there are many things I have not seriously discussed, like synchronization or these high-performance techniques.
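The wide-memory idea can be sketched in software. The following is only a rough Python model, not an FPGA design: it assumes an IPv4 header without options followed directly by a TCP header, and the names `to_wide_words`, `field`, `classify` and `sample_packet` are made up for this illustration. It packs the serial byte stream into 128-bit words and then slices every field of interest out of those words. Python evaluates the slices one after another; in hardware each slice is just wiring, so all the fields are available in the same clock cycle.

```python
import struct

WORD_BYTES = 16  # one 128-bit memory word, the width mentioned in the lecture


def to_wide_words(packet):
    """Serial-to-parallel step: pack the byte stream into 128-bit words."""
    padded = packet + bytes(-len(packet) % WORD_BYTES)  # zero-pad the last word
    return [int.from_bytes(padded[i:i + WORD_BYTES], "big")
            for i in range(0, len(padded), WORD_BYTES)]


def field(word, byte_offset, nbytes):
    """Slice nbytes starting at byte_offset out of one 128-bit word."""
    shift = (WORD_BYTES - byte_offset - nbytes) * 8
    return (word >> shift) & ((1 << (nbytes * 8)) - 1)


def classify(words):
    """Extract all header fields from two wide words.

    Assumption: IPv4 header of exactly 20 bytes (no options), then TCP.
    Word 0 holds packet bytes 0..15, word 1 holds bytes 16..31, as if two
    128-bit memories were read side by side.
    """
    w0, w1 = words[0], words[1]
    return {
        "protocol": field(w0, 9, 1),   # IPv4 byte 9
        "src_ip": field(w0, 12, 4),    # IPv4 bytes 12..15
        "dst_ip": field(w1, 0, 4),     # IPv4 bytes 16..19
        "src_port": field(w1, 4, 2),   # TCP bytes 0..1
        "dst_port": field(w1, 6, 2),   # TCP bytes 2..3
    }


def sample_packet():
    """Build a made-up 40-byte IPv4 + TCP header for the demonstration."""
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([192, 168, 1, 20]))
    tcp = struct.pack("!HHIIBBHHH", 49152, 80, 0, 0, 0x50, 0x02, 8192, 0, 0)
    return ip + tcp


if __name__ == "__main__":
    for name, value in classify(to_wide_words(sample_packet())).items():
        print(name, hex(value))
```

As a rough feel for the numbers, if one such 128-bit word is consumed on every cycle of a 200 megahertz clock (an assumed figure, in the range quoted earlier), that is 128 × 200 million = 25.6 gigabits per second of packet data, even though the clock itself is slow.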
One thing I have not mentioned is pipelining, wherein a data path operation is sequenced through registers. Take something very simple: you have a ripple adder, and normally in an 8-bit ripple adder all 8 bits are added together in one go. But it is possible to introduce registers such that the two least significant bits are added in the first stage, and in the second stage the next two bits are added. While the second bits of the first data are being added, new data can come in and the LSBs of the new data can be added. So it is possible for 8 additions to be in progress in parallel; it needs a lot of storage for the intermediate results, and so on. That is essentially pipelining. You may have studied it in the context of the CPU, but pipelining is a general technique, not only in hardware; it stems from factory assembly line automation, and if you want a hilarious look at the background, you can watch Charlie Chaplin's Modern Times to see how the automated factory line works. The basic principle is that pipelining is useful only if the input data is streaming, coming in continuously; in such cases you can get very high throughput by pipelining. So that is another way of getting high performance using an FPGA. I think I have briefly covered how to do high-performance computation within an FPGA: by parallel processing, replicating the computational engines (you have a basic core doing some basic computation, and you replicate it to do things in parallel), and then by pipelining to get high throughput. These are the different methods of achieving very high performance, and you can use multiport memories, which aid parallel computation, because you can write through one port and read through another, or read through multiple ports from multiple locations, so that the processing proceeds in parallel.
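The pipelined adder can be tried out with a small cycle-by-cycle simulation. This is a sketch under stated assumptions, not a hardware description: it assumes one bit position per pipeline stage (which is what gives eight additions in flight for an 8-bit adder), and the names `stage_add` and `pipelined_add` are invented for this example. Each pipeline register carries the two operands, the partial sum so far and the carry into the next stage.

```python
STAGES = 8           # assumption: one pipeline stage per bit of the 8-bit adder
BITS_PER_STAGE = 1


def stage_add(k, a, b, partial, carry):
    """Add the bit slice handled by stage k, given the carry from stage k-1."""
    shift = k * BITS_PER_STAGE
    mask = (1 << BITS_PER_STAGE) - 1
    s = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
    return partial | ((s & mask) << shift), s >> BITS_PER_STAGE


def pipelined_add(pairs):
    """Stream operand pairs through the pipeline; return (sums, clock cycles)."""
    stream = list(pairs)
    n = len(stream)
    regs = [None] * STAGES  # regs[k] is the register at the output of stage k
    results, cycles = [], 0
    while len(results) < n:
        # One clock edge. Later stages are updated first so that each stage
        # reads what its predecessor held before this edge.
        for k in range(STAGES - 1, 0, -1):
            prev = regs[k - 1]
            regs[k] = None if prev is None else prev[:2] + stage_add(k, *prev)
        if stream:
            a, b = stream.pop(0)
            regs[0] = (a, b) + stage_add(0, a, b, 0, 0)
        else:
            regs[0] = None  # no new data: a bubble enters the pipeline
        cycles += 1
        if regs[-1] is not None:
            _, _, partial, carry = regs[-1]
            results.append(partial | (carry << (STAGES * BITS_PER_STAGE)))
    return results, cycles


if __name__ == "__main__":
    sums, cycles = pipelined_add([(200, 100), (255, 255), (1, 2), (15, 1), (170, 85)])
    print(sums, cycles)
```

With n operand pairs streaming in, the simulation finishes in n + 7 cycles: the first result needs 8 cycles to emerge, and after that one result comes out on every cycle. Reusing a single one-bit stage without pipelining would need 8 cycles per addition, which is why the benefit shows up only when the input keeps coming continuously.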
All these techniques can give very high throughput. We have implemented very high-throughput processing, and it is sometimes quite surprising that with a clock as low as 200 megahertz you can achieve such high throughput. That is one reason FPGAs provide very high-speed serial ports, serial ports which run at gigabits per second. So you can get the data in very fast and then parallelize it; though the internal clocks are slow, you can do the computing in parallel. That is the basic idea of high-performance design using FPGAs. So please go through the FPGA lectures I have given and try to understand them; you will be able to do good designs. I wish you all the best, and thank you.