Welcome to this lecture on field programmable gate arrays in the course Digital System Design with PLDs and FPGAs. In the last lecture we looked at the clock tree: some basics related to the clock tree and why special clock trees are required. We analyzed the basic timing of a data path and a sequential circuit in the presence of skew, which sets the background for the clock tree, whose job is to give minimum skew between the end points. Then we looked at the special resources like DLLs, PLLs, clock managers and so on, and we started on the various methods of configuring or programming the FPGA. So before continuing with configuration, we will briefly review the last lecture's portion and then get on to today's part. In the last lecture we looked at the issue of metastability: for a flip-flop not to get into metastability, its input has to meet the setup and hold times. For a data path, that is, a register-to-register path, from that requirement we developed the inequality for the minimum clock period and the condition for not violating the hold time. The same holds for a sequential circuit or FSM, where the register-to-register path is between the state registers; the expressions are identical, because if you separate the source and destination explicitly it looks exactly the same. What was missing in this analysis was that, in arriving at these expressions, we assumed the clock reaches the source and the destination at the same time, which cannot be true in a chip: depending on which way the clock flows, the destination clock could be lagging or leading the source clock in arrival time.
So that has to be taken into consideration when analysing these expressions, and that is what we did: we set up two scenarios, one where the clock and the data flow in the same direction, which is called the min-path problem, and one where the clock and the data flow in opposite directions. In the opposite-direction case the destination clock leads, or equivalently the source clock lags, so the available time for a register-to-register path is reduced by the amount of the skew: t_clock − t_skew has to accommodate t_co, t_comb and t_setup. In effect the clock period requirement goes up and the maximum frequency comes down; because of this skew the frequency of operation drops. The opposite case is the min-path problem, where the destination register receives a clock which is a delayed version of the source register's clock as far as skew is concerned. In this case, looking at a register-to-register path from one clock edge to the next, we seem to be in a better position, because t_clock + t_skew is the time available to accommodate the path delay. So from one edge to the next edge things look comfortable; it is easier on the timing budget, and if you had fixed the clock period there is now a relaxation on it, which is good in a way. But the real danger, as I said, is that since the destination clock is delayed, a fast data path can violate the hold time: the capturing edge has moved later, and if t_co and t_comb are at their minimum, the new data can arrive before the hold window closes. That is what is shown here: t_co,min plus t_comb,min must be greater than t_skew,max plus t_hold. If that condition is not met, a hold violation occurs, and if a hold violation happens, reducing the clock frequency is of no use.
In the earlier problem, if the clock period could not accommodate the skew, we could increase the clock period. But the min-path problem is not related to the clock period at all, because we are talking about the same edge of the clock, or between edges which are offset only by the skew. So the only way to avoid a hold-time violation is to increase the combinational or logic delay. In essence, when somebody routes the clock in a chip, any relative delay between the end points is troublesome: it can reduce the clock frequency and it can create hold-time violations. So there has to be a mechanism to route the clock tree such that the relative delays between the end points are minimal. With respect to the earlier picture, we cannot route like this, where as the clock tree goes along everything gets progressively delayed because of the loading, and the delay from this end to that end is very high. We cannot have such an arbitrary clock routing scheme. Ideally, from the clock pad to any end point there should be an equal length of wire and an equal number of buffers. Simply inserting a buffer here and a buffer there to offset the loading does not help, because the buffer delays themselves add to the skew. So the buffers have to be balanced: in a simplistic way, you put a buffer, then branch, then another buffer and another branch off the main route, and so on.
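The two timing conditions above can be written down as a small check. This is a sketch in Python with hypothetical nanosecond values; the names follow the lecture's t_co, t_comb, t_setup, t_hold, t_skew notation:

```python
def min_clock_period(t_co, t_comb, t_su, t_skew):
    """Max-path (setup) case: clock and data in opposite directions.
    The capture edge arrives t_skew early, so the period must cover
    t_co + t_comb + t_su plus the lost skew."""
    return t_co + t_comb + t_su + t_skew

def hold_ok(t_co_min, t_comb_min, t_skew_max, t_hold):
    """Min-path (hold) case: same launch edge, delayed capture clock.
    Note the condition is entirely independent of the clock period."""
    return t_co_min + t_comb_min > t_skew_max + t_hold

# Hypothetical numbers in ns: a 0.8 ns skew pushes the minimum
# period from 6.5 ns up to 1.0 + 5.0 + 0.5 + 0.8 = 7.3 ns, and a
# too-fast path (t_co,min + t_comb,min = 0.4 ns) fails hold.
period = min_clock_period(1.0, 5.0, 0.5, 0.8)
fast_path_safe = hold_ok(0.3, 0.1, 0.8, 0.2)
```

The only knob that fixes a failing `hold_ok` is increasing `t_comb_min`, which is exactly the point made above.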
The extension of that is the H clock tree: from the input point you branch horizontally, then vertically, then horizontally, and wherever there is a branching you put a buffer. This does not mean each end point is a single flip-flop; depending on the buffer drive capacity there could be 20 or 25 flip-flops on each of the last branches. And if you have an order-of-magnitude increase in the number of flip-flops, you branch further: from each point you can go off on different vertical branches and so on. The idea is that if you take any two end points and count the number of buffers and the length of wire up to each of them, they will be identical. So the relative delay between the end points will be minimal, though from the input to the end points there could be quite a bit of insertion delay. That is usually not much of a problem, and if it is, a DLL can compensate for it. That is what the DLL does: suppose the clock tree is heavily loaded and the clock at the end points is delayed, so there is a skew between the input clock and the tree output clock. The DLL can remove that: the output is fed back and compared with the input, and there will definitely be a skew between them. It is compensated by adding a further delay of (t_clock − t_skew) into the tree, so that the delayed edge comes into sync with the next input edge. The delay-locked loop synchronizes an edge by further delaying an already delayed edge; that is how it works. Compared to a PLL: in a PLL the scheme looks similar in that you have the input clock and the feedback clock.
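The balanced-path property of the H-tree can be illustrated with a toy calculation. This is an idealized sketch, assuming identical buffer delays and identical wire segment delays at every level, not a real extraction:

```python
def htree_leaf_delays(levels, t_buf, t_seg):
    """Idealized H-tree: every leaf is reached through exactly
    `levels` branch points, each contributing one buffer delay and
    one wire-segment delay, so all root-to-leaf delays are equal."""
    n_leaves = 2 ** levels
    return [levels * (t_buf + t_seg)] * n_leaves

delays = htree_leaf_delays(4, 0.2, 0.5)   # 16 end points
skew = max(delays) - min(delays)          # relative skew is zero
```

The insertion delay (here 2.8 units at every leaf) is nonzero, but the *relative* delay between end points, which is what causes the timing problems above, is zero.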
But what is done is that the phase is compared and the phase difference is converted into a voltage with a low-pass filter or some equivalent scheme, and a voltage-controlled oscillator synthesizes a clock, which is not a delayed version of the input clock as in the case of the DLL. In a PLL a new clock is synthesized in phase with the input clock, and that has a certain advantage: if there is jitter on the input clock, in a delay-locked loop it will appear as jitter at the output, because the delay-locked loop just delays the clock. But in a PLL, since the clock is synthesized and there is a filter which filters out the small jitters, the output clock is stable; it is a new clock and does not carry the distortion or the jitter of the input clock. In that way a PLL is better, but a PLL always operates within a lock range and has a time to lock: at start-up the PLL has to acquire lock, which takes a certain time. So that is the PLL. In the current FPGAs you have PLLs and the digital clock manager (DCM), which is built around a DLL for de-skewing, a phase shifter, and frequency multiplication and division; it gives phase shifts like 90, 180, 270 degrees and so on. There are clock buffers and muxes for clock switching, and the switching has to be glitchless, which means the edges have to be synchronized. So there are synchronizing flip-flops which synchronize two clock sources to each other, and all of this can be connected into the clock tree path. That is what we discussed, and all these special resources can be instantiated from the vendor library using the core generator tool. Sometimes the synthesis tool will infer them from the code.
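The DLL's de-skew trick from the previous paragraph can be put in numbers. A toy model, assuming the tree insertion delay is known and less than one clock period:

```python
def dll_extra_delay(t_clock, t_tree):
    """DLL de-skew: insert extra delay so that the tree delay plus
    the inserted delay equals one full clock period; the delayed
    output edge then coincides with the *next* input edge, giving
    zero apparent skew at the end points."""
    assert 0 <= t_tree < t_clock
    return t_clock - t_tree

t_clock, t_tree = 10.0, 3.0
extra = dll_extra_delay(t_clock, t_tree)   # 7.0 units added in the loop
aligned = (t_tree + extra) % t_clock       # 0.0 -> edge-aligned
```

This also shows why the DLL cannot remove input jitter: every output edge is still just a delayed copy of an input edge, unlike the PLL's resynthesized clock.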
Sometimes you have to write a VHDL attribute along with the code to help the synthesis tool infer what is going on and instantiate the correct component. The last thing we looked at was the configuration or programming of the FPGA. While prototyping, the most suitable method is through a synchronous serial port called JTAG, which is used for PCB testing and chip testing and is used for FPGA configuration as well. A small dongle connected to a PC can program the FPGA. Then there is master serial mode, where the FPGA clocks a serial PROM, gets the data and programs itself. Slave serial is exactly like master serial except that the FPGA expects the clock as an input, with the data in synchrony with that clock; normally this works in conjunction with master serial for chained programming of multiple FPGAs. SelectMAP is byte-wide programming, 8 or 16 bits; usually when there is a microprocessor, the FPGA is programmed through a parallel bus, and the configuration bitstream can be stored in the processor's memory, wherever that is located, maybe in a flash on the board, alongside the firmware; at start-up the CPU can program the FPGA. This slide shows a master serial configuration where one FPGA is the master and another FPGA is the slave, selected by the mode pins; three mode pins are shown here, while in the current FPGAs there are only two. This one is in master mode and this one is in slave mode, and you can see there is a serial PROM whose clock is given by the master FPGA. The slave FPGA also gets its clock from the master, the data from the serial PROM is connected to the data input of the master FPGA, and the data output of the master FPGA is connected to the data input of the slave FPGA.
Now if you have a third FPGA, the data out of the second FPGA can be connected to the data in of the third, and so on; you combine the bitstreams of all the FPGAs and store them in this one PROM. At power-on the FPGAs start programming; or, if the PROGRAM pin is pulled low with a zero-going pulse, programming can be started at any time. One issue is that at the beginning these FPGAs have to clear their configuration memory, because it may not be a power-on programming; it could be a reprogramming of the FPGA while the power is on. All the configuration memory has to be cleared, and depending on the FPGA type and size this takes a variable time. So the question is: how does each FPGA know that the others have completed initialization? There is an INIT pin, an open-drain output which is also sampled internally. It is an IO pin which is open-drain and pulled up, so it forms a wired-AND: the line is high only if every device drives it high, and low if any one of them pulls it low. Suppose the master has cleared its configuration memory and come out of initialization; it will release its INIT output high. But if the slave is still initializing, it will be driving the line low, so despite the pull-up resistor the line stays low. That line is also an input to the master FPGA, which samples it, and as long as it is low the master waits for the other FPGAs to finish. So the INIT pin, an open-drain IO forming a wired-AND, handles the initialization synchronization between master and slaves. Ultimately every FPGA initializes and comes out of initialization, and then the master FPGA starts fetching the data.
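The open-drain INIT behaviour is just a wired-AND. A minimal sketch of the synchronization it provides, with hypothetical device states:

```python
def init_line(devices_released):
    """Open-drain wired-AND with a pull-up: the shared INIT line is
    high only when every device has released it, i.e. finished
    clearing its configuration memory. Any one device pulling low
    keeps the whole line low."""
    return all(devices_released)

# Master done clearing, one slave still initializing -> line stays
# low, so the master samples it and keeps waiting.
assert init_line([True, False, True]) is False
# All devices released -> line pulled high, configuration can start.
assert init_line([True, True, True]) is True
```

The DONE pin described below works the same way, which is why a single pulled-up line suffices for any number of devices.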
The master gives the clock, and you can see that the INIT pin is connected to the output enable of the PROM. So even if the clock is running, unless INIT goes high the PROM's output is not enabled and the data will not come. Once the init phase is over, the FPGA starts clocking and getting the data. First the master FPGA programs itself; while it is being programmed, its DOUT is held at one. Since everything on DOUT is one, the slave waits for the starting pattern of the configuration. Once the master's configuration is over, the slave FPGA's configuration bits arrive and are bypassed to DOUT, and the slave gets programmed, and so on. While the second slave programs, it holds its DOUT at one and the third slave waits. So one by one the FPGAs get programmed. Then there is a DONE pin, which indicates that the FPGA has finished programming. Once again this is an open-drain output which is pulled high, so it too forms a wired-AND: unless every FPGA's DONE pin is high, the line will not go high. This is an indication to the rest of the circuitry, for example a CPU working in conjunction with the FPGAs: the DONE pin tells the CPU that the FPGAs have finished configuring and normal execution can continue. If there is a CPU, it will sample the DONE pin and wait for the FPGA configuration to be over; once it is done, it enables normal operation. If there is other external circuitry, you have to make sure the DONE pin gates the rest of the circuitry, otherwise there will be a synchronization problem between the FPGA and the rest of the circuit: if another circuit drives data into the FPGA before the FPGA is configured, that data will be lost.
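The pass-through behaviour on DOUT amounts to each device consuming its own portion of one combined bitstream. A simplified model, ignoring the all-ones idle pattern and the start sequences:

```python
def split_chain_bitstream(stream, lengths):
    """Each FPGA in the chain consumes `lengths[i]` words of the
    combined bitstream and then bypasses the remainder to DOUT for
    the next device downstream."""
    parts, pos = [], 0
    for n in lengths:
        parts.append(stream[pos:pos + n])
        pos += n
    return parts

combined = list(range(10))                  # toy combined bitstream
master, slave1, slave2 = split_chain_bitstream(combined, [4, 3, 3])
```

Here `master` gets the first four words and each slave downstream sees only what the devices before it have bypassed, matching the one-by-one programming order described above.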
So this DONE pin has to be sampled by the rest of the circuit, which may be a CPU or another programmable-logic-based circuit; whatever it is, this has to be taken care of while designing. That is the very important mechanism for programming FPGAs in a chain; if there is only a single FPGA you can forget about the chaining and the same scheme works. Nowadays SPI-based PROMs can be used: the FPGA offers an SPI port instead of the custom serial port, and the SPI can work with 1-bit, 2-bit or 4-bit data. In addition, this PROM itself can be programmed permanently through the JTAG port. So not only can the FPGA be programmed through JTAG while prototyping, the SPI-type PROM can also be programmed through the JTAG port; when you buy an FPGA board these options will usually be available. Normally an SPI flash is connected to the FPGA in this fashion, so that you can program both the FPGA and this flash through the JTAG port, and if you set the mode pins appropriately the FPGA can be configured from the flash PROM. That is what is written here; all I have done is elaborate on it. Then there is the SelectMAP scheme, where you have a CPU and an FPGA and you program the FPGA over a synchronous byte-wide or two-byte-wide data bus. These are the pins: PROGRAM, chip select, write, clock and data; it is a synchronous bus, with INIT, DONE, BUSY and so on. Now, the issue with such a bus is that most processor buses are not synchronous in this sense; a microcontroller bus, for instance, may not be. So you may have to translate the bus protocol of the processor to this synchronous protocol.
That can be achieved with a CPLD in between to do the protocol translation of the bus; we discussed in the CPLD part of the lectures that CPLDs are good for bus protocol translation. That is the scenario shown here: the CPU has some memory, which could be flash, where the firmware is stored. The FPGA configuration bitstream can be stored there; the CPU can read it, say one byte at a time, and write each byte out, read and write again, using this kind of scheme. If the CPU has a synchronous port which matches the protocol, it can write directly; otherwise you could use some kind of parallel port instead of a CPLD. But using a parallel port on a CPU can be very slow, because you have to address each port and use a separate port access for clocking. The timing diagram shows that on every clock edge the data is presented with chip select and write-bar low; whenever the FPGA cannot accept data, it asserts the BUSY signal, in which case the data has to be held for one more clock. It is like extending the bus cycle when the peripheral is slow; I am sure you have studied such schemes, the normally-ready and normally-not-ready kinds of systems. This is a normally-ready system: if the FPGA does not require more time, you keep clocking data every clock cycle; if it does, it indicates it is busy and you extend the bus cycle by adding extra cycles. This is a very simple scheme which can be implemented in a CPLD. So that is the crux of FPGA configuration.
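The normally-ready bus-cycle extension can be modelled in a few lines. A sketch where `busy_at` marks the hypothetical byte indices on which the FPGA asserts BUSY for one clock:

```python
def selectmap_bus_cycles(data, busy_at):
    """Normally-ready handshake: one byte per clock, but when BUSY
    is asserted the same byte is held on the bus for one extra
    cycle -- the bus cycle is extended, just like a wait state."""
    cycles = []
    for i, byte in enumerate(data):
        cycles.append(byte)
        if i in busy_at:
            cycles.append(byte)   # data held for one more clock
    return cycles

# BUSY asserted on byte index 1: byte 0x22 occupies two clocks.
assert selectmap_bus_cycles([0x11, 0x22, 0x33], {1}) == [0x11, 0x22, 0x22, 0x33]
```

With no BUSY assertions the transfer takes exactly one clock per byte, which is the normally-ready fast path.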
Current FPGAs are more elaborate; I will mention that briefly, but for Virtex these are the main ways of configuring: JTAG, master serial, slave serial, and I mentioned the SPI port; then SelectMAP, which is byte-wide, 8 or 16 bits. One thing to remember is that while the FPGA is being configured, all its pins will be tri-stated. So the rest of the circuit has to account for the fact that these pins are tri-stated during configuration, and they have to be appropriately pulled up or pulled down depending on the requirements of the rest of the circuit. For example, suppose the rest of the circuit samples one of the FPGA's outputs, assuming it is 0 in the default state, and the rest of the circuit comes up before the FPGA is configured; a tri-stated pin can then create a problem, so it has to be pulled down. Once the FPGA is configured, all the flip-flops are reset using an internal reset line, which the FPGA vendors advise you not to use as your design's reset. In your circuit, if you want a reset, even a power-on reset, you should implement a separate one rather than use this internal reset signal, though there is a way of using it within your design. That line has a lot of flip-flops connected to it and is very heavily loaded, so it is a better idea to generate a separate reset with good drive. I mention this briefly because there are extended versions of the Virtex configuration scheme in recent FPGAs. I have taken the example of Spartan-6, and the same is true of Virtex-6, Virtex-7 or any of the 7-series FPGAs.
Boundary scan, which is the JTAG port, can configure a single device, or, as we saw in serial mode, you can chain multiple devices through JTAG: exactly as before, there is a TDI pin which takes data in and a TDO pin which takes data out, and the TDO of one FPGA can be connected to the TDI of the next, so devices can be chained. So with boundary scan you can program a single device or a chain of multiple devices. In master serial you can likewise have a chain along with the slaves. Sometimes the same configuration has to be applied to all the FPGAs: if all the FPGAs are of the same type and are configured by the same bitstream, they can be connected in parallel. In serial mode with an SPI flash, the data width can be 1, 2 or 4 bits, with appropriate pins, and similarly slave serial, as I said, normally works along with master serial for the chain. In SelectMAP you have 8-bit and 16-bit configuration, with a single device or a chain of devices; in master SelectMAP the clock is given by the FPGA, but in slave SelectMAP the FPGA expects the clock from outside along with the data. So it is a little more elaborate than the Virtex scheme we studied; I mention it so that the information is current, and whatever I have said about lookup tables, logic blocks and so on extends to these newer devices.
But this is additional, so let me mention another issue with the earlier FPGAs. Suppose you come up with a proprietary design and put the configuration bitstream in this PROM, then ship it into the field on a PCB or in a product. At power-on, somebody can capture this data quite easily, because the clock and the synchronous data are right there on the board. The bitstream can be read as-is and reverse engineered, and from it the complete system: in an FPGA-based product the main thing is in this PROM, and there is no security on it. So it is worthwhile to protect it, because mostly it is the intellectual property of the designer or the company doing the design. What is done in the current FPGAs is that this bitstream can be encrypted using the Advanced Encryption Standard, called AES, with a 256-bit key. Through the JTAG port the FPGA is programmed with that key, and the FPGA can be set for no read-back: once programmed, the key cannot be read back, nor can the configuration. Otherwise it is possible to read back the configuration for the purpose of verification and so on; that too can be disabled. Then you deploy the system: you program the PROM with the protected bitstream, so what travels on this line is an AES-encrypted bitstream, and without knowledge of the key it is very difficult to break the scheme. The key can also be very specific to the device: if a company is making, say, 10,000 devices, each of the 10,000 can have a separate key rather than the same key. So the design can be very well protected; this is available in the current FPGAs.
So there is AES encryption with a 256-bit key: the bitstream is encrypted by the bitstream generation tool with the 256-bit key, the encryption key is programmed into the FPGA through the JTAG port, and once programmed you can configure the device for no read-back, so the configuration cannot be read back either. The AES key can be permanently fused, like blowing a fuse in the FPGA, or it can be held in internal SRAM with an externally connected battery backup. One can choose between those options; if the key is permanently fused you cannot change it, so you will be forced to use the same key for every bitstream you program into the flash. In this way encryption protects the design. Another option available in the recent FPGAs is bitstream compression; this is a somewhat older feature, available even in FPGAs before Spartan-6. There can be a lot of resources which are not used or configured at all, so there is a lot of redundant information in the bitstream, and if that is removed the configuration bitstream size comes down. That has two implications: you can store the bitstream in less memory, and the configuration takes less time. Otherwise, for a given device the configuration takes a fixed time, and a larger device takes longer; but even a large device, if only 50% of it is used, can take less time than a full FPGA configuration. Another possibility: suppose you have programmed a configuration into a flash memory deployed in the field, and the flash gets corrupted; then the whole device may not work and the flash needs to be reprogrammed.
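The effect of removing redundant, unconfigured frames can be illustrated with a generic compressor. This is only an analogy: the device uses its own frame-skipping scheme, not zlib, and the data here is a toy stand-in:

```python
import zlib

# A toy "bitstream": long runs of all-zero frames standing in for
# unused resources, plus a small configured region.
unused_frames = bytes(64) * 200            # 200 empty 64-byte frames
used_frames = bytes(range(64)) * 4         # some configured data
raw = unused_frames + used_frames

packed = zlib.compress(raw)
ratio = len(packed) / len(raw)             # far below 1.0
```

A mostly-empty image compresses dramatically, which is exactly why a half-used large device can configure in much less than the full-device time.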
Nowadays much more is possible. You would have seen that earlier the computers used to update their drivers and software through the internet; now the set-top box at your home connected to a TV gets its firmware updated through the cable, and your mobile phone gets firmware updates over the air. The same thing can happen here: the configuration can reach the device through the internet or a wireless network and the device can reprogram itself. Still, if the flash gets corrupted in the field, it is a good idea to have a golden bitstream or fallback bitstream, which may not be the most recent one, maybe a very old one which is stable. It can happen that you update the configuration over the internet and some corruption occurs; then you can fall back on the stable version which has been there from the beginning. That is what is called multiboot, and that is what is shown in this particular slide. You have at least one main configuration and one fallback configuration. While the configuration bitstream is read, a CRC is calculated; if there is any bit corruption the CRC gives an error, in which case the device falls back on the golden configuration. Or the sync-word detection may time out: at the beginning of the bitstream there is a sync word, and a watchdog timer waits for it; if it does not arrive, the timer expires and the device can fetch the golden configuration from a particular location, program the FPGA from it, and recover from that error. This multiboot scheme is available with the SPI-based flash PROMs and the BPI-based flash PROMs.
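The multiboot fallback logic, sync word plus CRC check and then the golden image, can be sketched as follows. The 0xAA995566 value matches the Xilinx sync word, but the image layout used here (sync word, payload, CRC32 trailer) is purely illustrative, not the real bitstream format:

```python
import zlib

SYNC = bytes.fromhex("AA995566")           # sync word (illustrative layout)

def make_image(payload):
    """Toy image: sync word, payload, 4-byte big-endian CRC32 trailer."""
    return SYNC + payload + zlib.crc32(payload).to_bytes(4, "big")

def load_with_fallback(main_img, golden_img):
    """Try the main image first; on a missing sync word (the
    watchdog-timeout case in hardware) or a CRC mismatch, fall
    back to the golden image."""
    for img in (main_img, golden_img):
        if not img.startswith(SYNC):
            continue                       # sync never seen -> timeout
        payload, crc = img[4:-4], int.from_bytes(img[-4:], "big")
        if zlib.crc32(payload) == crc:
            return payload
    return None

golden = make_image(b"stable-config")
img = make_image(b"new-config")
corrupt = img[:6] + b"\xff" + img[7:]      # flip one payload byte
```

A corrupted main image fails its CRC and the loader returns the golden payload; an intact main image is used directly.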
It is available in both, and it is a very good option: if you are deploying FPGAs in the field, you should think of multiboot, encryption and compression, which improve the reliability and the security. Now, the current FPGAs, in addition to the DLLs, PLLs, block memory and so on, have DSP slices which allow the designer to implement DSP algorithms very efficiently. DSP algorithms normally use fixed-point computation, and 18-bit operands are typical. You know that one of the major operations in signal-processing algorithms is multiply-and-accumulate, which requires a multiplier and an adder. The DSP blocks in the Xilinx FPGAs give you exactly this option. You have a pre-adder, an 18-bit two's-complement adder, meaning you can do 18-bit signed addition in the pre-adder; that sum can go to a multiplier, which is an 18-by-18 multiplier. So two 18-bit values can be added, and the sum can be multiplied by another 18-bit value coming from a separate port, producing a 36-bit result. This can be sign-extended to 48 bits, and it is followed by a 48-bit two's-complement adder/subtractor. Not everything needs to be used: you can use the multiplier along with the post-adder, or cascade the pre-adder and post-adder bypassing the multiplier, or take the 36-bit multiplier result out directly. All kinds of options are available in the DSP slices, which really enables one to implement DSP functions like filtering algorithms, encoders, decoders and so on very efficiently. Many times you just use the multiplication operator in your code; if the data size matches, it gets implemented in the DSP slices, or the slice can be instantiated and used directly.
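The pre-adder, multiplier, post-adder path can be modelled in a few lines. This is a behavioural sketch with two's-complement wrap-around at the stated widths, not a cycle-accurate model of the DSP48A1:

```python
def wrap(x, bits):
    """Two's-complement wrap-around to the given bit width."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x & (1 << (bits - 1)) else x

def mac(a, b, c, acc):
    """Multiply-accumulate along the slice's dataflow: 18-bit signed
    pre-add (a + b), 18x18 signed multiply by c (the 36-bit product
    is sign-extended), then a 48-bit post-adder accumulate."""
    pre = wrap(a + b, 18)
    prod = pre * c                 # fits in 36 bits
    return wrap(acc + prod, 48)    # 48-bit accumulator

# (3 + 4) * 2 accumulated onto 10 -> 24
assert mac(3, 4, 2, 10) == 24
```

Bypassing stages (multiplier only, or pre-adder plus post-adder without the multiply) corresponds to simply omitting the matching line of this dataflow.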
This is the general architecture: you have 4 ports, all pipelined; two ports go into the 18-plus-18 pre-adder, giving an 18-bit result which is combined with another 18-bit input in the multiplier. That output can be taken out, or it can go into another adder, which is 48 bits wide; the product is sign-extended, added, and that output is available. This helps in implementing DSP-style algorithms, and it is called the DSP48A1 slice in Spartan-6. In addition, I must mention, though I have not put it on the slide, that the current FPGAs allow you to use the lookup table as a shift register. A four-input lookup table has 16 memory cells inside, each serving as a storage location; these can be connected in a chain and made available as a shift register. In this case it is called SRL16, and it can be chained to form SRL32 and even larger shift registers, up to 256 bits in some Spartan-6 slices, chained across lookup tables. So you get a shift-register implementation without using the flip-flops of the slice, and the number of flip-flops in a slice is limited: we have seen in Virtex there are only 2 per slice, 4 in a CLB. If you implement a 16-bit shift register with flip-flops, you end up using 4 CLBs, but using one lookup table in a CLB you can implement the same 16-bit shift register. One other very important thing with respect to FPGAs, as I discuss all that remains: one problem with the FPGA is that you have designed something, verified it, done behavioural simulation, timing simulation, everything, but the moment you put that design in the FPGA, if something goes wrong for whatever reason on an internal signal, you cannot debug it. You cannot see what is happening inside.
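The LUT-as-shift-register idea can be modelled behaviourally. A sketch of a 16-deep SRL16-style shift register: one data bit in per clock, the bit shifted in 16 clocks earlier out:

```python
class SRL16:
    """16-deep shift register built from a 4-input LUT's 16 storage
    cells: each clock shifts one bit in at the head and presents the
    bit that entered 16 clocks earlier at the tail."""
    def __init__(self):
        self.cells = [0] * 16
    def clock(self, d):
        out = self.cells[-1]               # bit shifted in 16 clocks ago
        self.cells = [d] + self.cells[:-1]
        return out

srl = SRL16()
outs = [srl.clock(1)] + [srl.clock(0) for _ in range(16)]
# The 1 appears at the output exactly 16 clocks after it went in.
```

Chaining two such registers tail-to-head gives the SRL32 behaviour, and so on up the sizes mentioned above.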
So what is done is that the FPGA vendors give you a logic analyzer circuit which is hooked to the JTAG. You instantiate this logic analyzer IP along with your design, connect its probes to the internal signals, and through the JTAG port, using software on the PC side supplied with the tools, you can capture the waveforms of the internal signals and analyse them. And like a real logic analyzer, you can trigger. It would be crazy otherwise: suppose you have an 8-bit data bus you want to monitor and your clock is 200 MHz; in one second 200 mega transfers pass over the bus, and there will not be enough memory to store all of that for analysis. But if you have a hint that the error happens on a particular data value, you can trigger on that value and capture some data around the trigger point, after it, or before it. If you have used a logic analyzer you would have heard the terms pre-trigger, post-trigger, pre-trigger 50% and so on; they describe how much you capture before the trigger and how much after. The analysis is always offline: on a trigger you capture some data, transfer it to the PC, and try to debug it there. It is a great tool; once some complex circuit goes into the FPGA, these logic analyzers have to be used to debug it. Xilinx calls it ChipScope Pro and Altera calls it SignalTap, and over the years it has become a very easy, very nice tool. The only catch is that this IP occupies some space within the FPGA.
So if you have not really floor-planned the rest of the circuit, it can sometimes upset the timing performance of your design, but with a proper floor plan that is not a very serious issue because it does not occupy much space. That is shown in the picture here: on the FPGA board, inside the FPGA, you instantiate blocks like the ILA core, the integrated logic analyzer, whose probe points can be connected to the user's signals, and a core which connects to the JTAG. On the PC side you have software which, on the trigger, captures the data, transfers it to the PC, and lets you analyse it and take action; that is what the signal probing does. Now I am just showing the Virtex pins. You have the dedicated global clock pins, which are inputs to the clock tree; the dedicated mode pins for selecting the programming mode; CCLK, the serial configuration clock; and the DONE, INIT, BUSY and parallel data pins. The PROGRAM pin is dedicated and the INIT and DONE pins are dedicated, but most of the other configuration pins can be reused as user I/O once programming is done. And there is a dedicated JTAG port: TMS is the mode select, TCK is the test clock, TDI is the data input, TDO is the data output. Then you have the VCCO and VCCINT supply pins, ground, and so on. So normally this is the scenario for an FPGA: a few clock pins, a dedicated JTAG port, dedicated mode pins, and programming pins, some dedicated, most reusable by the user once programming is over. Now there is one point I want to mention here, something called one-hot encoding, a state-machine encoding which is used in FPGAs.
Here you see the state-machine block diagram: the next-state (excitation) logic decoding the inputs and the present state, the state register, and the output logic; we have seen enough of it. We will take an example to set the background for the need of one-hot encoding. Assume a finite state machine with 5 inputs, 18 states and 6 outputs. With 18 states, binary encoding needs 5 flip-flops, because 18 is greater than 16 and less than 32, and there are 5 inputs. So the next-state logic sees 5 inputs plus 5 state variables, a total of 10 inputs. There are 5 flip-flops, so there are 5 functions D4 to D0; each of them may not use all the inputs, which is less likely, but let us assume the worst case where all 10 inputs are used. Once again assume the worst case in the Virtex: we have seen that one CLB can implement up to a 6-input function, since two 4-input lookup tables make a 5-input function, two 5-input make a 6-input, and so on. With a flat 10-input next-state function you need 16 CLBs, because one CLB implements a 6-input lookup table and each extra input doubles the count: 2, 4, 8, 16.
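The 16-CLB figure follows from Shannon decomposition: expanding the 10-input function on 4 of its variables leaves one 6-input cofactor per combination of those 4 bits,

```latex
f(x_1,\dots,x_{10}) \;=\; \bigvee_{c \in \{0,1\}^4}
  \left(\prod_{k=1}^{4} x_k^{c_k}\right) f_c(x_5,\dots,x_{10}),
\qquad
\text{number of cofactors} = 2^{10-6} = 16,
```

where \(x^1 = x\) and \(x^0 = \overline{x}\). Each cofactor \(f_c\) fits the 6-input capability of one CLB, and the 16-way selection among them is built from the CLB multiplexer and routing resources.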
So you end up with 16 CLBs. I have definitely exaggerated: it is not the case that every next-state function requires all the external inputs, and even a genuine 10-input function need not be implemented as a flat 10-input lookup table; one can factor and cascade. But assume the worst case happens. Then the next-state logic is spread over multiple CLBs, and the interconnect between them makes it slow. The minimum clock period, t_co + t_logic + t_setup, now contains a large next-state logic delay, and that can bring down the clock frequency of the state machine. We analyzed the data path and tried to be very aggressive in its timing; but if you have designed a high-performance data path and the controlling state machine is slow, nothing works. So this is not a good situation; the problem here is the next-state logic, which is very complex. The question is how to reduce that complexity. Coming back to the slide, we have 5 inputs and 5 state variables; the question is whether we can reduce the contribution of the state variables. We used binary encoding, which necessitated 5 flip-flops for 18 states. Why not encode the 18 states in 18 flip-flops? Then, when decoding a state, instead of needing the 5 bits that represent it, you take the one bit that represents that state; that is the basic idea: each state is one flip-flop. So take a state transition like this: state Sj gets two transitions, one from a previous state Si on condition i, and a self-loop on condition j.
So Dj, the D input of the flip-flop corresponding to state Sj, is nothing but (condition i AND Si) OR (condition j AND Sj); but Si is just a single flip-flop, so we write it as Qi, and similarly Sj, represented by a single flip-flop, is Qj. Each condition is composed of, worst case, 5 inputs. So you have the worst-case 5 inputs specifying the transition condition plus the 2 inputs Qi and Qj: a 7-input next-state function, which requires only maybe 2 CLBs, is less spread out, has less logic delay, and the clock frequency becomes manageable. That is the idea of one-hot encoding. Many a times people apply it blindly: when they see an FPGA they just say, let the state machine be one-hot encoded. It is not always required. Suppose you have a 4-state machine with 2 inputs and 2 outputs; there is absolutely no need for one-hot encoding, because it can be encoded in 2 flip-flops, and with 2 inputs the next-state logic for each flip-flop fits in one lookup table. But when the number of states and inputs is larger, you can go for one-hot encoding, which eats up flip-flops but makes the timing better; the improved timing definitely comes at the cost of extra flip-flops.
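As a sketch of the per-state equation above (the three states and the condition signals are illustrative, not from the lecture's 18-state example), a one-hot machine in VHDL looks like:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- One flip-flop per state: s(i) = '1' exactly when the machine is in
-- state i.  Each D input ORs the incoming transitions, so its fan-in
-- is only the condition inputs plus one bit per source state.
entity onehot_fsm is
  port (
    clk, rst       : in  std_logic;
    cond_i, cond_j : in  std_logic;  -- transition conditions
    q              : out std_logic_vector(2 downto 0));
end entity;

architecture rtl of onehot_fsm is
  signal s : std_logic_vector(2 downto 0) := "001";  -- start in S0
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        s <= "001";                    -- one-hot reset state
      else
        s(0) <= s(0) and not cond_i;   -- stay in S0 until cond_i
        s(1) <= (cond_i and s(0))      -- enter S1 from S0 on cond_i
             or (cond_j and s(1));     -- self-loop on cond_j
        s(2) <= (not cond_j and s(1))  -- otherwise leave S1 for S2
             or s(2);
      end if;
    end if;
  end process;
  q <= s;
end architecture;
```

The assignment to s(1) is exactly the Dj = (condition i · Qi) + (condition j · Qj) equation from the lecture: a single flat sum of products with small fan-in per state bit.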
So this can be controlled using attributes in the VHDL code. The available state encodings are: sequential, which is binary, where state 0 is 00, state 1 is 01 and so on; gray, which is a Gray code; and one-hot, which gives a single flip-flop per state. Suppose you have defined an enumerated state type, say state_type; you can write an encoding attribute on state_type saying the encoding is "gray" or "one-hot", or with the enum_encoding attribute you can literally specify the encoding, saying S0 is this, S1 is this, S2 is this, S3 is this. Now, this is vendor dependent; it is not part of VHDL, it is a user-defined attribute, so you have to refer to the vendor's tool manual to see whether they support it. We are coming to the end of the lecture, so I will complete this part in the next lecture; maybe in another 15 minutes I will be able to complete the FPGA part, and then we will look at the remaining VHDL part. Today we basically completed configuration: we looked at what is implemented in current FPGAs in terms of configuration, at bitstream compression, bitstream encryption and multi-boot, and then at one-hot encoding, where the next-state logic complexity can be reduced so that the state machine works at a faster clock, in sync with the data path, and at how that encoding can be controlled using attributes. What remains is very minimal; we will complete this part, look very briefly at FPGAs from other vendors, one from Altera and one from Actel, and wind it up.
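For instance, with the enum_encoding attribute as supported by Xilinx XST (the syntax is tool specific, so check the vendor manual; the controller itself is an illustrative 4-state example):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity controller is
  port (clk, go : in std_logic);
end entity;

architecture rtl of controller is
  type state_type is (S0, S1, S2, S3);

  -- User-defined attribute understood by the synthesis tool; it is
  -- not part of the VHDL standard, so support is vendor dependent.
  attribute enum_encoding : string;
  -- Explicit one-hot assignment: one flip-flop per state.
  attribute enum_encoding of state_type : type is "0001 0010 0100 1000";

  signal state : state_type := S0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      case state is
        when S0 => if go = '1' then state <= S1; end if;
        when S1 => state <= S2;
        when S2 => state <= S3;
        when S3 => state <= S0;
      end case;
    end if;
  end process;
end architecture;
```

Replacing the string with "00 01 10 11" would give sequential (binary) encoding from the same source code, which is exactly the flexibility the attribute mechanism provides.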
Please review the portions I have covered so that you are in sync. I wish you all the best, and thank you.