So, let us continue from the last session, where we were discussing the data path extraction capabilities of the compiler. Let me first review what data path extraction means: the compiler is able to see what the data path operators do, and it tries to extract the maximum possible number of operators and put them into a single module. That is, given a module in which you have coded a number of operations, DC will try to extract all such operators and implement them in a single data path block, so that resource sharing and optimization can make the maximum use of them. We saw in the last session that extraction only occurs in certain cases: there is a list of operators that are extracted as part of the data path, while shifter and equality operators are not extracted.

So let us look at one example of RTL code, and we will walk through the process to see what data path gets extracted. This module has seven inputs and two outputs, and there are, in fact, seven operations taking place, something like: t1 = a * b, t2 = a * c, t3 = d * e, t4 = f - g, t5 = t1 + t2, z1 = t5 + t3 + t4, and z2 = t5 + 2. So these are the seven operations taking place in your RTL. Now let us see how DC extracts the data path. You do not have to do anything special: you just analyze this RTL, elaborate, apply constraints, and compile. After the compile is done, you run report_resources. A similar (not exactly the same) example was discussed in the last session, where we looked at the report_resources output.
So now, what does report_resources report? It tells us that DC extracted all these operations and put them in the same module. DC instantiates a module built from DesignWare components, and in this case it has instantiated a single module for all these operations. On the right-hand side it also lists the operations taking place: three multiplications, four additions, and one subtraction. Let us go back to the RTL: there are indeed three multiplications, one subtraction, and four additions, which matches. Since all the operations in the RTL conform to the coding guidelines that enable data path extraction, DC simply instantiates this one DesignWare module.

Further, the report tells us that this module has seven inputs and two outputs, and it gives the port mapping. Going back to the RTL again: there are seven inputs a, b, c, d, e, f, g and two outputs. So a through g connect to the module's input ports i1 through i7, and z1 and z2 connect to the output ports O1 and O2 of this module that DC instantiated. The report also tells us the bus widths and so on, and this module corresponds to the name shown here.

Now let us see what else it reports. It tells us that DC created an intermediate port, which the report calls something like fanout_3, implementing i1 * i2 + i1 * i3. Since i1, i2, i3 correspond to a, b, and c, it actually computed a * b + a * c. We can see the corresponding operation in the RTL: t5 = t1 + t2, which is a * b + a * c. So DC created an intermediate fanout_3 port with this operation; it contains two multiplications and one addition, right.
Then the O1 output is simply fanout_3 plus the remaining product and difference terms, roughly fanout_3 + i4 * i5 + (i6 - i7). So DC is able to determine that O1 is a function of fanout_3, and you can verify this against the RTL: O1 corresponds to z1, and z1 = t5 + t3 + t4, so this is indeed the case, and the report tells us which operations are needed. Again, O2 also reuses fanout_3. So DC uses one intermediate result to compute both outputs O1 and O2.

All these reports are telling us that DC is able to extract the data path: it creates an intermediate module that implements all our operations, and it shows how the inputs and outputs are connected. Now remember, if you do a default compile and set the variable that controls data path ungrouping, compile_ultra_ungroup_dw, to false, then the netlist still contains the module and all the DW components visibly in the hierarchy.

Next comes the implementation report. It tells us the resource name, the module name, and the implementation. Here the implementation shown is "str", a generic data path implementation, so it does not yet tell us what kind of multipliers or what architecture is chosen. Some reports are more useful: they tell us which operators and which implementation techniques were chosen; we saw a few examples in the last session as well.

So this is the way DC extracts the data path and reports how it is implemented. Now, if DC had implemented t1, t2, t3, etc. in separate modules, then given the fact that module boundaries are hard boundaries, it would not have been able to reuse the resources or optimize across them. By creating a single module and implementing all the operations in that single module, DC has the flexibility to reuse the resources, plus the flexibility to optimize.
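To make the sharing concrete, here is a small behavioral sketch in Python. The exact RTL expressions are partly a reconstruction from the lecture (the names t1..t5, fanout_3, and the port order i1..i7 are illustrative), but it shows the point: the extracted datapath block computes the shared subexpression once and reuses it for both outputs.

```python
# Plausible reconstruction of the RTL in this example (names illustrative).
def rtl_reference(a, b, c, d, e, f, g):
    t1 = a * b
    t2 = a * c
    t3 = d * e
    t4 = f - g
    t5 = t1 + t2          # shared subexpression: a*b + a*c
    z1 = t5 + t3 + t4
    z2 = t5 + 2
    return z1, z2

# What the extracted datapath block effectively does: compute the shared
# intermediate (the report's "fanout_3" port) once and reuse it twice.
def extracted_datapath(i1, i2, i3, i4, i5, i6, i7):
    fanout_3 = i1 * i2 + i1 * i3        # a*b + a*c, computed once
    o1 = fanout_3 + i4 * i5 + (i6 - i7)
    o2 = fanout_3 + 2
    return o1, o2

assert rtl_reference(2, 3, 4, 5, 6, 7, 8) == extracted_datapath(2, 3, 4, 5, 6, 7, 8)
```

With the ports mapped i1..i7 = a..g, both functions agree; the datapath version needs one fewer adder tree because fanout_3 is shared.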
So it would be a very good idea to play with this; in fact, the last example we did in lab 4 has examples of data path operations. You could use that to experiment more and see how DC's data path extraction works. You can also compare the implementation reports between compile and compile_ultra, and you will see that compile_ultra reports in a much more sophisticated way.

On RTL coding guidelines: you should not code RTL in such a way that extraction of the data path becomes impossible, because that leads to worse QoR. There is one solid article about this.

Now, some history on compile_ultra. DC Ultra was introduced in two steps. In the first step there was no separate command: you enabled the Ultra optimizations (with something like set_ultra_optimization) and then ran the normal compile command. In later versions, the single compile_ultra command contains all the functionality of DC Ultra, so it is a converged solution. We will talk more about the differences between compile and compile_ultra.

Again I will repeat: until now in unit 4, in every discussion of compile_ultra, we have seen that in most cases you as a designer do not have to do much. You just have to run compile_ultra. We saw some concepts, and if you feel that your particular goal is not being met, then you use these advanced techniques to achieve that goal. But as a first pass you should always do a plain compile_ultra with no special options; I would guess that in 95 percent of the cases your goals will be met and you will be happy, right.

So now let us discuss briefly how compile_ultra operates. By default, compile_ultra runs two passes of compilation.
That means that inside a single compile_ultra invocation there are two passes of compilation, unlike the compile command. It has timing-driven high-level optimization, and these optimizations are somewhat different from the regular compile command. It has architecture exploration for arithmetic operators: it will select the data path implementations from the DesignWare library. There are also things related to wide-fanin gate mapping. It employs aggressive logic duplication. Then there is a feature unique to compile_ultra: automatic ungrouping of hierarchies along the critical path. You can stop it with the -no_autoungroup option. And there is DFT flow support.

These are options we have seen earlier. We have already looked at the variables that control the maximum block size for auto-ungrouping, and the DesignWare library that will be loaded is set via the synthetic_library variable.

Now, there are two more options: -area_high_effort_script and -timing_high_effort_script. These should only be used if you are chasing an aggressive area or timing target, and you cannot use both: either you are worried about your area or about your timing not being met. If you give one of these options, DC will employ some strategies which are not visible to us and will try to achieve a lower area or better timing. There are some figures available: roughly 3 percent timing and 10 percent area improvement compared to plain compile_ultra. These are built-in flows, they are updated from release to release, and they do not require any change to your scripts. But again, these should be used only in very specific situations, because every time you use some special, advanced, or specific technique, it will lead to bigger and longer runs. So for a big design you should use them with caution.
Runtime is always the limiting factor.

Now a few more examples of data path optimization. Here is one that is unique to DC Ultra: it utilizes resource sharing for parallel constant multipliers. See this example: temp = I1 + I2, then O1 = temp * 105 and O2 = temp * 617 (the constants here, as best I can reconstruct them, are 105 and 617). temp is common to both. Write out the binary codes for 105 and 617: DC sees that the bit pattern of 105 is contained in 617, so the hardware for temp * 105 can be shared. The implementation is not two multipliers; in fact, the multipliers are not real multipliers at all, because multiplication by a constant reduces to shifts and additions.

So temp * 105 = (temp << 6) + (temp << 5) + (temp << 3) + temp, and temp * 617 = temp * 105 + (temp << 9). You should be very clear how DC does this: each 1 in the binary representation of the constant represents a shifted copy of temp. The LSB here represents temp shifted by 0; this next set bit means temp shifted left by 3; this one means a left shift by 5; this one means a left shift by 6. This is the way it converts multiplication by a constant into shift-and-add operations, and adders are much better for area and timing compared to multipliers. Furthermore, there is one more optimization: DC realizes that temp * 105 is common to both constants and reuses it, so temp * 617 = temp * 105 + (temp << 9). This results in better area. This is one of the special cases where DC Ultra does a very good job.

Now, here is again a very important thing: library-aware mapping and structuring. What DC Ultra does the first time you run it is this: it reads in the library before reading in the design. So when you read the design, it will first read in the library, characterize it, and create a set of structures for each function based on the library cells available, right. So what it does is pre-analyze the library.
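The shift-and-add decomposition above can be checked mechanically. A minimal sketch, assuming the constants are 105 (0b1101001) and 617 (0b1001101001 = 105 + 512):

```python
# Constant-multiplier sharing: one shifted copy of temp per set bit.
def times_105(temp):
    # set bits of 105 are at positions 0, 3, 5, 6
    return (temp << 6) + (temp << 5) + (temp << 3) + temp

def times_617(temp):
    # 617 = 105 + 512, so reuse the shared partial result
    # instead of building a second shift-add tree
    return times_105(temp) + (temp << 9)

# Sanity check against ordinary multiplication:
for t in range(1, 100):
    assert times_105(t) == t * 105
    assert times_617(t) == t * 617
```

In hardware terms, times_617 costs only one extra adder on top of the times_105 network, which is exactly the area win the lecture describes.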
In the lab you must have noticed that when we run compile_ultra for the first time, it gives a message saying that it is creating an alib. An alib is the analyzed library. You have the standard-cell library; DC will take that library and pre-analyze it: it analyzes all the cell functions and creates a table very similar to the one shown on the slide. Say there is an operation such as z = a·b' + b'·c; it will create multiple structural representations of that, based on the types of cells available in the standard-cell library. Some of these representations will be good for area, some of these representations will be good for timing. Please note this is not design dependent: the library analysis is performed before optimization, and the information is kept in a file called the alib file.

There are two options. You can pre-analyze the technology library yourself, using the alib_analyze_libs command and the variable that points to where the alib files are kept, so the alib file is generated once and stored alongside the library. Or, when a pre-generated alib file is not found, DC will automatically generate the alib during compile. We have seen the second case in the lab: when you start fresh in a new directory, it says the alib is being generated, and the first compile actually takes some extra time; it may take about five minutes to analyze the library. So again, two options: you can pre-analyze and keep the alib file, or you can let the first compile analyze the library file. The second time, when it finds the alib, it will not do this process again.

Next, compile_ultra has enhanced constant register removal. Constant registers are detected by initial-state analysis. Say the data pin of a flop is tied to 0: DC will remove that flop by default.
That is because the register has no functionality apart from providing a constant value, so DC removes it and saves area. Now suppose there are two registers connected like this, where at reset only the second register's output is 0. That 0 feeds the AND gate here, so the input of the first register is 0, and this pair of registers can never come out of 0, because this Q is fed back in. In earlier versions of DC this type of structure was not detected: only a single register was recognized for constant register removal. But DC Ultra does a better job of it: it recognizes structures where even a shift register, tied like this knowingly or unknowingly, cannot come out of reset, and it removes the registers and replaces them with constant 0. Obviously, if such a register is one you intended to keep, there are options to control which registers may be removed. So compile_ultra explores design structure characteristics, such as dependencies between registers, to do register removal.

We saw in the labs that by default we were running compile_ultra with -no_seq_output_inversion. DC Ultra in fact does sequential output inversion by default. It has one more related capability: say your library does not have an asynchronous-set flop. If you have an asynchronous-reset flop but not an asynchronous-set flop, DC can implement the asynchronous set by adding an inverter on the data input and using the inverted output, that is, taking QN (or adding an inverter at Q). So again, it has better capabilities related to sequential mapping and constant values. By default, sequential output inversion is on; it is disabled with the -no_seq_output_inversion option, and compile_ultra is not affected by the variable that controls this for the plain compile command; compile, on the other hand, is affected by that variable, right. So what is sequential output inversion? The output of a flop might be inverted, with the surrounding logic adjusted, to provide a better mapping.
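To see why the inverter trick for the missing async-set flop works, here is a tiny behavioral sketch (plain Python, illustrative class names, ideal devices): an async-reset flop that stores the inverted data and presents an inverted output behaves exactly like an async-set flop.

```python
# Emulate an async-SET flop using an async-RESET flop by inverting
# both the D input and the Q output (behavioral model only).
class AsyncResetFF:
    def __init__(self):
        self.q = 0              # async reset forces the stored bit to 0

    def reset(self):
        self.q = 0

    def clock(self, d):
        self.q = d & 1          # capture D on the clock edge

class EmulatedAsyncSetFF:
    def __init__(self):
        self.ff = AsyncResetFF()

    def set_(self):             # "async set" = reset the inner flop
        self.ff.reset()

    def clock(self, d):
        self.ff.clock(d ^ 1)    # inverter on the data input

    @property
    def q(self):
        return self.ff.q ^ 1    # inverter on the output (or just use QN)

ff = EmulatedAsyncSetFF()
ff.set_()
assert ff.q == 1                # behaves as if asynchronously set to 1
ff.clock(0); assert ff.q == 0   # normal data capture still works
ff.clock(1); assert ff.q == 1
```

The two inversions cancel on the data path, so only the reset behavior changes: the "reset" of the inner flop appears as a "set" at the outer pins.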
But this might create problems in Formality. We have a couple of sessions on Formality coming up, and we will see what this means and how it creates a problem. Whenever DC does such things, whenever it implements sequential output inversion, it gives a warning or information message that sequential output inversion was performed and that the SVF file must be used for formal verification. You can choose not to do it by using the option we just discussed.

Now let us look at the concept of register retiming. What is register retiming? How does it work? What is it called? When should it be used? And so on, right. Register retiming is one of the more advanced transformations, where the registers (the flops) are either duplicated, merged, or moved around the combinational logic, just to reduce area or to make performance better. There are four kinds of register retiming operations. The first one is register moving; see the first graphic. In all four graphics there is a left-hand side and a right-hand side: the left-hand side is before, the right-hand side is after.

In register moving, a register at the input of an inverter can be moved to the output of the inverter. In register splitting, a register at the output of a gate can be moved to the inputs of the gate, but now it has to be duplicated: it has to be present on both the inputs, or all the inputs, if it is moving to the input side. In register merging it happens the other way around: the registers at the inputs are removed and a single register appears at the output. So splitting involves increased area, because where there was one flop there are now two flops; merging gives reduced area, because where there were two flops there is now one flop. In the hybrid case, something like the following might happen with, say, three registers; take this example, which is a full adder.
Now for the full adder case, the registers at the inputs can be moved to the output to save flops, right. Note that if your timing depends on the register placement, you have to explicitly ask the compiler to retime. Please note that the functionality of the circuit is not changed. There are only two parameters here, area and timing: retiming either results in lower area or better timing. If you are confused about why the functionality is not changed, please note that pipelining, or the placing of registers, only affects the sequencing of events, not the logic function, right. For example, in the case of this full adder, earlier the inputs were registered and now the outputs are registered; that does not change the functions S and C. The functionality of S and C remains the same. However, it changes the timing characteristics: earlier the timing check was done here, at the input flops, and now the timing check will be done here, at the output flops.

Let us see a few more examples to make the concept clear. The underlying question is whether we want to pipeline or not to pipeline. Let us see one example of this code. The left code is VHDL, the right code is Verilog; we will follow the Verilog code. Say you have prod_AB = a * b, prod_CD = c * d, diff_EF = e - f. Then P21 = prod_AB + prod_CD, that is, a * b + c * d; P22 = diff_EF; and Y = P21 + P22. Please note that although this is coded in a single always block, each such nonblocking assignment implies one level of flops.

So what happens when this is implemented? a * b, c * d, and e - f are the first level of operations. Start with the right-hand sides: a * b, c * d, and e - f go into prod_AB, prod_CD, and diff_EF respectively; those are the registered outputs. So the a * b operation takes place first and then goes into prod_AB. prod_AB is a flop; this is prod_AB.
This is prod_CD; this is diff_EF. Now prod_AB and prod_CD are added and go to P21. diff_EF goes to P22 directly. Then Y = P21 + P22: there is an adder here for P21 + P22, and through a flop it goes to Y. What are the registers in the design? prod_AB_reg, prod_CD_reg, diff_EF_reg, P21_reg, P22_reg, and Y_reg, right. This is essentially the only way you can code it: you, the designer, have implemented pipelining in this design just by the way of the RTL code, right.

Now let us see what happens to the timing. These multiplications are bigger in area and slower compared to the adders. So the critical paths will be from the inputs to the first flops: a to this flop, b to this flop, c to this flop, d to this flop. These paths through the multipliers are the critical paths, as compared to the adder paths, right. So the design violates timing there.

Let us put some numbers on this. Take the clock period to be 10 ns, the clock-to-Q delay to be 1 ns, and the setup time of the flops to be 1 ns. Say the multiplier takes 16.5 ns and this adder takes about 4.2 ns. For the path from input a to the first flop, assume the input delay to be 0. The multiplier delay is 16.5 ns, and since the data must arrive 1 ns (the setup time) before the clock edge, 1 ns effectively gets added to the data path delay: 16.5 + 1 = 17.5 ns, against the clock period of 10 ns. What is the slack?
Since the data arrival time is later than the data required time (the required time is 10 ns and the arrival time including setup is 17.5 ns), the slack is minus 7.5 ns. Now let us talk about this adder path. The adder delay is 4.2 ns and the clock-to-Q delay is 1 ns, so it is 1 + 4.2 = 5.2 ns; the data should arrive 1 ns before the capturing edge at 10 ns, so 5.2 + 1 = 6.2 ns. The total data path delay is 6.2 ns, the required time is 10 ns, and subtracting, the slack is positive 3.8 ns. Similarly you can do the exercise for the other adder path: if that adder takes 4.3 ns, the slack is 3.7 ns. Please make sure that you understand these calculations, or better, take pen and paper and do the calculations yourself. The important thing here is that the multiplier stage is slow and there is a violation, while the adder stages are comparatively faster and meet timing by a good amount, right.

Now consider this pipeline (these boundaries are just the register stages) and look at the complete end-to-end path. We see that a slack of 3.8 ns is available here and 3.7 ns is available here, and the violation is minus 7.5 ns. If I add 3.8 and 3.7, the sum is 7.5. So ideally the end-to-end path could meet timing, because the total slack, minus 7.5 plus 3.8 plus 3.7, is zero. Is there a way to use this? This is called time borrowing, right: I want to borrow the positive slack from these stages and use it to fix this violation.

In a pipelined design (I am sure you have read about pipelining in some other course as well), the design has the best performance when each combinational cloud has a similar delay. In a case like this one, where one combinational cloud has much more delay than the other combinational clouds, you are limited by the performance of the slowest stage.
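A small script to redo the slack arithmetic above (period 10 ns, clock-to-Q 1 ns, setup 1 ns; the stage delays 16.5 / 4.2 / 4.3 ns are the lecture's example figures):

```python
# Setup-slack sketch for the three pipeline stages in the example.
PERIOD, CLK_TO_Q, SETUP = 10.0, 1.0, 1.0

def setup_slack(logic_delay, launched_by_flop=True):
    """Slack = data required time - data arrival time at the capturing flop."""
    arrival = (CLK_TO_Q if launched_by_flop else 0.0) + logic_delay
    required = PERIOD - SETUP
    return required - arrival

mult_slack = setup_slack(16.5, launched_by_flop=False)  # input arrives at t=0
add1_slack = setup_slack(4.2)
add2_slack = setup_slack(4.3)

print(mult_slack, add1_slack, add2_slack)   # prints: -7.5 3.8 3.7
# The positive slacks exactly cover the violation:
assert abs(mult_slack + add1_slack + add2_slack) < 1e-9
```

The zero total slack is precisely why time borrowing (retiming) can rescue this design without touching the 10 ns clock.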
For in this case, if you do not do anything special, the fastest clock you could run at corresponds to the slowest stage: the period would have to grow by the 7.5 ns violation, to about 17.5 ns. So you get lower performance. However, if all these clouds were of similar delay, your frequency would be the maximum: in this case your clock period can still be 10 ns, given that you are able to borrow this time. So what do we do now? We borrow slack from another stage. How do we do that? A pipeline is fastest when the logic is equally balanced, and leftover positive slack can be used to reduce area.

The question is: can we code our HDL to achieve this balance? The answer is no, we cannot. Why? Because this is the only way we can code it in HDL. We do not know, or want to know, the internal structure of this multiplier. We are not implementing this multiplier gate by gate; we are just using a multiplication operator. We can only place flops at the logical boundaries, like done here. We cannot place flops in the middle of the multiplication, because at the RTL there is no visible structure of the multiplier. So the answer is that we cannot code HDL to achieve this balance. This can only be done at the implementation level, right: to achieve this balance, some structure of the multiplier must first be available, and the structure is available only after mapping, that is, after compile, right.

So what happens after we distribute the slack? We move the 3.8 ns here and the 3.7 ns here; the slack everywhere becomes 0, and the stage delays become balanced, something like 9 ns, 8 ns, and 8 ns. You can do the calculation on paper to confirm that these are correct.
So, since the sum of the positive slacks here equals the negative slack here, after distribution the slack should be 0 at all places and the combinational clouds are equally balanced. Here the delay is 9 ns, here the delay is 8 ns, here the delay is 8 ns. Why do I say they are equally balanced when the numbers differ? Because in stages 2 and 3 there is an additional clock-to-Q delay: the 1 ns of clock-to-Q is present here and here, but in stage 1 we are assuming the primary input data arrives at time 0, with no launching flop. In stage 2 and stage 3 the data starts at 1 ns after the edge, right. So each stage has exactly zero slack. Please make sure that you understand this; draw the waveforms on paper and verify it. If you understand this calculation, you will understand report_timing. So I would say go the other way round: go through the lab, make sure you understand the timing report, and then do the calculation on paper for this circuit, right.

Now, there are a lot of instances like this, and we now know that we cannot code HDL to achieve this kind of balancing; it is simply not possible. So what is the way out? The way out is register retiming by DC. It is a technique to optimize at the gate level, primarily for timing (and possibly for area, but the major use is timing), right. It is used to re-pipeline a gate-level netlist for better throughput. Now, there are two ways by which we can do register retiming. One is compile_ultra -retime: the algorithm used by compile_ultra performs adaptive retiming during optimization to improve timing. The second is the command optimize_registers (with set_optimize_registers). This is a much more extensive retiming. So compile_ultra -retime is a lighter retiming used for general logic; optimize_registers is used for very special cases of complex pipelined data paths.
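The balanced stage budgets can be checked mechanically (same example figures as above; this is just the zero-slack bookkeeping, not anything DC-specific):

```python
# Balanced-pipeline sketch: total logic delay redistributed across stages.
PERIOD, CLK_TO_Q, SETUP = 10.0, 1.0, 1.0
total_logic = 16.5 + 4.2 + 4.3          # multiplier + two adder clouds = 25 ns

# Per-stage logic budget at exactly zero slack:
stage1 = PERIOD - SETUP                  # input at t=0, no launching flop: 9 ns
stage2 = PERIOD - CLK_TO_Q - SETUP       # 8 ns
stage3 = PERIOD - CLK_TO_Q - SETUP       # 8 ns

# The budgets exactly absorb the total logic, so a 10 ns clock is feasible
# once retiming moves logic across the register boundaries.
assert abs((stage1 + stage2 + stage3) - total_logic) < 1e-9
```

This is why the "unequal" 9 / 8 / 8 split is in fact perfectly balanced: the extra 1 ns in stage 1 is exactly the clock-to-Q delay that stages 2 and 3 must pay and stage 1 does not.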
Again, the recommendation is to use compile_ultra -retime first, and if you are not able to meet timing, then go to optimize_registers. So what does optimize_registers do? It does a lot of things. In phase one it does register moving: it moves registers across combinational elements, like in the graphic here; it does not modify any combinational element. It performs minimum-period retiming, followed by minimum-area retiming at that period. So first the goal is the clock period that we apply; then, within the same phase, it can also do some register area optimization. That is phase one. Phase two is an incremental compile: boundary optimization and combinational optimization. It optimizes the combinational elements between registers in any way it likes, but there is no register moving. So in the first phase it does register moving to achieve better timing; in the second phase it optimizes the combinational logic between those sequential elements, so it improves area and timing both.

Formality support is very, very important here, because now we are moving registers. We will talk about this in the Formality sessions: whenever we move registers, the placement of the sequential elements from the RTL is altered in the netlist. The sequential elements that are in the RTL are not in the same place in the netlist; they have moved around. And Formality considers sequential elements as compare points. We will see this in Formality in much more detail, but it is a problem for Formality. So there should be a way by which we tell Formality that my registers have moved, and Design Compiler provides that kind of support: it records the register movements in a file called the SVF file.

Here is an example: in the existing circuit the clock period is 10 ns; one stage is violating at about 10.2 ns, while the other side is coming in at around 5 ns. So now it will move registers. The retimed registers show up with names containing something like BRT; I believe that is an abbreviation related to register retiming, something like that.
After that it will optimize register area. Here it gains on both: it improves timing and it optimizes register area. I am not sure of the exact circuit here, so we cannot tell with 100 percent surety what happened, but it decreased the number of signals crossing the register boundary, and it also did the slack improvement by borrowing slack from one stage into the other.

The command is optimize_registers, with the clock period constraint applied; the primary goal is obviously timing, the secondary goal is area. It preserves functionality, which is very important: the functionality of your circuit does not change. It operates at the gate level, so register placement is not restricted by RTL operator boundaries. You code the RTL the way you like, obviously following the guidelines, and let optimize_registers do the best possible job.

Now, compile_ultra has absorbed most of this. There is a command called set_optimize_registers, and Synopsys tells us that compile_ultra now has most of the functionality of optimize_registers. optimize_registers is a very expensive command, and there are also options to control it; you can limit its scope. So ideally, as a first pass, you should use the set_optimize_registers command, which brings most of the functionality of optimize_registers into the compile_ultra flow. How do you do it? You say set_optimize_registers with some options, whatever is available or whatever you want to use, on the designs you want retimed; this is where you tell which designs you want it applied to. Then you do a compile_ultra. compile_ultra will look for the designs on which you have set_optimize_registers and will try to retime only that part of the design. So you can use it selectively on some part of the design, right.

Now, users can also trade off run time against the quality of reset-state justification during retiming; there is a functionality related to the reset state.
So, what you can do is use an option called justification effort, through which you control how much time the tool spends on initial-state justification during retiming. Again I am stressing the point that all of these extras will increase the run time. So if register retiming is enabled, your run time will be much more compared to the default compile effort. The justification-effort option controls that trade-off. If you set it low, DC will finish quickly, at the cost of quality, obviously. You can use medium, which is the default and has the best trade-off of QoR and run time. If you set it to high, it gives potentially better results, at higher run time, compared to low and medium, right.

Again, optimize_registers also supports retiming on subdesigns. There is no need to change the current design: you just apply set_optimize_registers true to the subdesign, giving the design name, run the flow, and only that subdesign is retimed. These kinds of set commands, say set_optimize_registers true on a design named X, place an attribute on the design: design X is marked with an attribute called optimize_registers. These things are called attributes, like optimize_registers, dont_use, dont_touch. All of these are collectively called attributes, and they stay with the objects inside DC. For example, if I set some library cell as dont_use, it means I am telling DC that this cell carries the dont_use attribute. DC then looks at the attributes to decide whether to use a cell, whether to optimize a design, whether to retime a design, and so on.
These attributes are a very, very powerful feature of Design Compiler and of PrimeTime; DC uses them internally, and as users we can also use them. For example, suppose I want to report the list of all the library cells that have dont_use set. What can I do? I can write a procedure in Tcl which will go through all the library cells, get the dont_use attribute of each cell, and check whether dont_use is set to true. So I can write such powerful programs in Design Compiler, or PrimeTime, or any Tcl interface, just by getting the values of attributes. There is a documented list of attributes which are available. These attributes are object-specific: there are some attributes which are special to designs, some attributes for cells, and some attributes which are particular to other objects such as nets. We will see a lot of attributes, with examples, in PrimeTime in Unit 5; but here, restricting retiming to attributed designs means: do the retiming only on the designs that have the optimize_registers attribute set to true. So that is the meaning of attributes in this context. Next, the compiler also has support for latch retiming. We discussed many retiming cases; all of those were flip-flop retiming, where a flip-flop is moved from one place to another, or duplicated, or merged, or whatever. But DC Ultra also supports latch retiming. In latch retiming, similar to flop retiming, time borrowing happens. The important thing is the latch phasing: if you are using a single clock, the phases of adjacent latches must be inverted. Say there are two latches, with a combinational cloud between them, and you want to borrow time from one side to the other.
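The attribute-query procedure the lecture describes could be sketched like this in dc_shell Tcl. The `*/*` library-cell pattern and the output format are illustrative choices, not from the lecture:

```tcl
# List every library cell whose dont_use attribute is set to true.
# foreach_in_collection / get_lib_cells / get_attribute are standard
# Synopsys Tcl commands; -quiet suppresses errors for cells that do
# not carry the attribute at all.
foreach_in_collection cell [get_lib_cells */*] {
    if {[get_attribute -quiet $cell dont_use] == "true"} {
        puts [get_object_name $cell]
    }
}
```

The same pattern, getting attribute values inside a collection loop, works for design, cell, net, and pin attributes in both DC and PrimeTime.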
Now, in this case the middle latch must have the opposite polarity compared to the latches on either side; only then can time be borrowed. Why? Because a latch, as opposed to a flip-flop, is level triggered: during one phase of the clock it is transparent, and it launches data on its closing edge. So whenever one latch closes and launches, the next one should already be transferring; when the two are of opposite polarity, the middle latch behaves, during its transparent window, like a combinational element. So: if there are two different clocks, they should have alternate phases; if it is the same clock, the latches should have opposite polarity. Now, let us say this cloud here has the worst timing; this is the cloud into which I want to borrow, compared to the other one. But the first question we should ask is about the budget. Let us say the period is 10 ns, cloud number 1 takes 6 ns, and cloud number 2 takes 4 ns. First we should ask whether the sum of the two clouds fits within the clock period: in this case, for a period of 10 ns, 6 plus 4 is 10, so it is barely meeting. Now, what can I do by adding a latch here of the opposite polarity? I am effectively making this whole circuit a single-cycle path: when the flop launches data, the data will arrive at the latch 6 ns after the launching edge, and if at that time the latch is open, the data will shoot through, because a latch during its transparent phase is like a combinational buffer.
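The budget check in this example can be written out explicitly. Assume, as in the lecture's idealization, that the latch delay in the transparent phase is zero and that the latch boundary sits nominally at the half period:

```latex
% Period and cloud delays from the example
T = 10\,\mathrm{ns}, \qquad d_1 = 6\,\mathrm{ns}, \qquad d_2 = 4\,\mathrm{ns}
% Feasibility: both clouds together must fit in one period
d_1 + d_2 = 6 + 4 = 10\,\mathrm{ns} \le T \quad \text{(barely meeting)}
% With the latch boundary nominally at T/2 = 5\,\mathrm{ns},
% cloud 1 borrows from the second phase:
d_1 - \tfrac{T}{2} = 6 - 5 = 1\,\mathrm{ns}
% leaving for cloud 2 exactly
T - d_1 = 4\,\mathrm{ns} = d_2
```

So the 1 ns that cloud 1 borrows through the transparent latch is exactly compensated by the 1 ns of margin in cloud 2, which is why the pair just meets the 10 ns period.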
So, this data will go through, and at the capture edge the data will be available after the sum of the delays of cloud 1 and cloud 2. Assuming the latch delay in the transparent phase is negligible, as in this exercise, we see that the data at the capture point arrives after the delay of cloud 1 plus cloud 2. In effect, the latch here is transparent, and we say that the path was able to borrow time from stage 2 into stage 1, or from 1 into 2, as the case may be. So in this way, optimize_registers in Design Compiler has retiming support for latches as well. Note that the idea here is a well-formed, symmetric, two-phase non-overlapping clocking style; this is called two-phase non-overlapping clocking. Ideally your RTL will not have this: designing with latches is not recommended; they are used only in very special cases, in some very special circuits. So just keep in mind that the support is available, though in most cases you will not come across it. We will, however, see the timing reports: in Unit 5, static timing analysis, we will do the same as in synthesis, assume that the netlist and the design are available to us, and see how we analyze such a design. In cases where latches are already present in the design, you have to make sure that you check the timing correctly, and there we will see what a time-borrowing timing report looks like. It might be a bit more complex than a flop-to-flop report, but we will see how the time borrowing looks. So this is the way it is implemented. Again, there can be a case of a latch between two flops, or a latch driven by one clock between latches on another clock. If you have two separate clocks, then they should have alternating phases, so that the transparent window of one does not overlap with the transparent window of the other.
So, whenever this latch launches data at its closing edge, the next latch is already transparent; that is the essence of latch retiming. So this ends the session on advanced synthesis techniques. To summarize: as a first step you should not use any of these techniques. The first rule is that the first synthesis trial should be done with default options, followed by quality-of-results analysis. What comes out is the quality of results: you should check the timing, you should check the area, you should run report_constraint -all_violators, and make sure that your design is meeting its goals. In case your design does not meet all the goals, you can use advanced synthesis options. First check whether you are using compile_ultra; use compile_ultra in most cases. If your goals are still not being met, use the advanced synthesis options, and even within the advanced options do not jump straight to retiming. First verify that the goals are realistic; you can use group paths and so on. So let us talk about timing problems. If you encounter timing problems, first of all use group_path: see if any one group has violations, or whether the critical path in one group is causing some other paths not to be optimized. Like we saw in the lab, some paths of low priority in the same group, like I/O paths (input and output paths), can cause a register-to-register path to violate. So make sure that your groups are correctly created, and for a very critical path use a separate group, so that DC will be able to focus more on it.
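Path grouping along these lines could be sketched as follows; the group names, the pin pattern, and the weight value are placeholders, not from the lecture:

```tcl
# Separate I/O paths from register-to-register paths so that a
# low-priority input/output path cannot dominate optimization effort.
group_path -name INPUTS  -from [all_inputs]
group_path -name OUTPUTS -to   [all_outputs]
# Give an especially critical endpoint its own group with extra weight.
group_path -name CRITICAL -to [get_pins u_core/result_reg*/D] -weight 5
```

With the I/O paths in their own groups, a violating input path no longer steals optimization effort from the register-to-register paths, which is exactly the failure mode described above.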
So, first use group_path; you can use critical_range also. The default goal is just to get the worst slack to 0, but if you want positive slack on the near-critical paths as well, use critical_range for that. Each of these will increase the run time. Then come to high map effort: high map effort enables advanced critical-path optimization, so use that. If even then your design timing is not met, again check whether your design goals are realistic, and latch retiming should always be the last resort. Now, there are DesignWare components like multipliers. Multipliers are big blocks, so most of the time they are a timing problem. You could use a pipelined multiplier from the DesignWare library. What is a pipelined multiplier? A pipelined multiplier has internal pipelining built in, but that pipeline is not at a fixed location: the pipeline registers will move around based on what your timing situation is. So it has retiming built into it. You can use such options; you can go through the DesignWare literature, and there are datasheets for these components and so on. But please note that all these advanced features, like retiming or sequential output inversion, will cause issues in formal equivalence checking with Formality, so you have to make sure that you are well versed in how to resolve them in your verification flow. In Unit 4, further, we will discuss power optimization: in the next session we will first see the basics of power, what kinds of power we are talking about, and how you optimize power, and then we will move on to Unit 5, which starts from there.
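The critical-range and map-effort steps mentioned above might look like this in dc_shell; the 0.5 ns range is an illustrative value:

```tcl
# Also optimize paths within 0.5 ns of the worst path in each group,
# not only the single critical path (costs run time).
set_critical_range 0.5 [current_design]
# High map effort enables additional critical-path optimization.
compile -map_effort high
```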