This lecture is on performance, and by that I mean the performance of a computer system. Here is the outline of today's lecture. After a brief introduction, I shall define what we really mean by performance. Then I shall discuss the iron law of processor performance and the various factors on which performance depends, and after that, what we mean by processor performance enhancement. Then I shall discuss performance evaluation approaches: once performance is defined, how can we evaluate it, and what approaches can be used. Another very important aspect is performance reporting: how you can report performance using a single number. Finally, I shall conclude the lecture by discussing Amdahl's law, which is related to performance measurement. Performance measurement is important because it helps us decide whether one processor works faster than another. Since we are discussing high-performance computer architecture, we shall be covering various techniques by which the performance of a processor can be improved or enhanced, and in that context performance measurement plays a key role; we have to see how you can really measure performance. It also tells us how much performance improvement has taken place after incorporating some enhancement feature, whether in the compiler or in the organization of the computer, such as pipelining and various other things. And it helps us see through marketing hype: whenever a new processor is introduced, people claim that this processor is so much better than other processors.
How true that hype is can be evaluated with the help of performance measurement. It will also provide answers to the following questions. Number one: why is some hardware better than others for different programs? We measure performance by running programs, and a particular processor may perform better on one program than on another; that analysis is very important, and this topic will give you the answer. Then, what factors affect system performance? You will find there are several. The first is the hardware: the way the processor is implemented, incorporating the arithmetic and logic unit, the register file, the control unit and so on, will definitely play a very important role. The second important factor is the operating system, which schedules the various tasks on the processor and therefore also plays an important role in deciding performance. Third, whenever you write a program in a high-level language such as C or Fortran, it has to be converted into machine language before it can run on a processor, so the efficiency of the compiler will also affect the overall performance of the system. These are the various factors which affect performance. And last but not least, how does the machine instruction set affect performance? A particular processor is characterized by its instruction set, as I briefly discussed in my last lecture.
Now, that instruction set can differ: a RISC processor has one type of instruction set, a CISC processor another, and depending on the complexity and nature of the instructions, which we shall call the instruction set architecture, the performance will vary. Those aspects are to be studied. The question is: how do you really measure performance? Obviously it is time, time, time and time. Time is the ultimate measure of performance. A computer exhibits higher performance if it executes programs faster, so whenever you measure performance you measure execution time: the shorter the execution time, the better the performance. But time can be measured in various ways. From an individual's perspective, response time or elapsed time is the important factor. An individual user will submit a task, and the time until the result comes back is the elapsed time; that time is important to the user, because that is how long one has to wait for the result. For a system manager the perspective is different: throughput is the most important parameter, because the manager is more interested in how many jobs the machine can run at once. In a multi-user, multiprogramming environment a computer executes tasks of many users, and the number of tasks executed per unit time, the average execution time, and how much work is getting done are what matter to the system manager.
So, you can see that response time and throughput are not really the same: an individual is interested in response time, while a system manager is more interested in throughput. Let us see what we really mean by elapsed time. It counts everything from start to finish: memory accesses, waiting for I/O, running other programs, and so on. A job will be submitted, and during its run the user gets only a fraction of the CPU time; there are other components such as the time required to switch from one task to another, the waiting time for I/O, and the operating system time. We can state it as a number: elapsed time = CPU time + wait time, where the wait time may be due to waiting for I/O, other programs running, page faults and so on (later on we shall discuss all these things). Then comes the CPU time, which does not count waiting for I/O or time spent running other programs; it simply measures the time required to perform the task itself. It can be divided into user CPU time plus system CPU time, since operating system calls are involved when a particular user runs a program. So CPU time = user CPU time + system CPU time, and elapsed time = user CPU time + system CPU time + wait time. That means the elapsed time a particular user experiences has three components. For this particular course, however, our focus is on user CPU time; we shall not be bothered about system CPU time or the wait time when the processor is running somebody else's job.
So, we shall be primarily concerned with user CPU time, that is, the CPU execution time, or simply the execution time: the time spent executing the lines of code that are in our program. A particular user runs a program, and the time required to run that program is what we use as the measure of performance. Now, performance is a relative thing. For some program running on machine X, performance_X = 1 / execution_time_X. Execution time and performance are inversely related: the larger the execution time, the worse the performance, and the smaller the execution time, the better the performance. When you compare X with Y and say X is n times faster than Y, we mean performance_X / performance_Y = n; this is how we shall compare performance. Before I go into these topics, there is another very important aspect. You may know that a computer is essentially a sequential circuit, and a sequential circuit or finite state machine requires a clock. So how is time related? A processor is controlled by a clock, which is nothing but a repetitive waveform with a certain time period. The time period of the clock is usually stated in seconds, or, since it is very small, in microseconds or, as speeds increase, nanoseconds.
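The relation just stated, that performance is the reciprocal of execution time, can be checked with a small sketch. The execution times here are made-up figures for illustration, not numbers from the lecture:

```python
def performance(execution_time_s):
    """Performance of a machine on a given program: 1 / execution time."""
    return 1.0 / execution_time_s

# Hypothetical: machine X runs a program in 10 s, machine Y in 15 s.
time_x, time_y = 10.0, 15.0
n = performance(time_x) / performance(time_y)
print(f"X is {n:.1f} times faster than Y")   # → X is 1.5 times faster than Y
```

Note that the ratio of performances equals the inverse ratio of execution times, which is why "n times faster" can be read off either way.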
So, one repetition is called the clock period or time period. If tau is the time period, then the clock frequency is related to it as f = 1/tau. You will often see us express the execution time in terms of the number of clock cycles, because ultimately the processor is controlled by a clock and the number of clock cycles is related to the execution time. Let us see how we are going to express performance in terms of clock frequency or clock period. Processor performance is determined by the time required to execute a program, and it comprises three components. A program is a set of instructions; an instruction requires one or a few clock cycles to execute; and each cycle takes some time. So we have three important parameters: the number of instructions per program, which we may call the instruction count, and which depends on the size of the code; the number of cycles per instruction, known as CPI, which can be one, more than one, and, as we shall see later, even less than one; and the clock cycle time, that is, the time period, which I have already explained and which can also change. You have to consider all three parameters, and you will see that these three parameters are affected by three important aspects. One is the architecture, represented by the instruction set architecture, which I briefly discussed in my last lecture and shall elaborate in
more detail in the next lecture. Then, for a given instruction set architecture, the processor is implemented; an instruction set processor is essentially represented by its implementation. Then comes the realization: implementing that instruction set processor with electronic circuits, be it transistors, integrated circuits or VLSI chips. So you can see that several designers are involved in the design of the system: the compiler designer, the processor designer and the chip designer, and their designs together affect the overall processor performance. These three factors are to be considered when we consider processor performance. This is known as the iron law of processor performance: execution time is the product of three terms, the code size (instruction count), the CPI, and the cycle time. Now, you may ask what we really mean by the number of instructions per program. The source code of a program consists of some number of instructions; that is static in nature. What we count is the number of instructions actually executed by the processor, the dynamic instruction count, which is largely independent of the static size of the code. The static code can be very small: you may have written a small loop, but if it iterates 1000 times, that code is repeated 1000 times, and you have to take into consideration the total number of instructions that are executed. So do not get confused with the source code size; it is essentially the instructions executed, not the static code size. This particular factor, the instruction
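The iron law just stated can be sketched directly: time per program is the product of instructions per program, cycles per instruction, and seconds per cycle. The figures below are hypothetical, chosen only to show the units working out:

```python
def execution_time(instruction_count, cpi, cycle_time_s):
    """Iron law of processor performance:
    time/program = (instructions/program) * (cycles/instruction) * (s/cycle)."""
    return instruction_count * cpi * cycle_time_s

# A hypothetical program: 50 million dynamic instructions,
# average CPI of 2, on a 500 MHz clock (cycle time = 2 ns).
t = execution_time(50e6, 2.0, 1 / 500e6)
print(f"Execution time = {t:.3f} s")   # → Execution time = 0.200 s
```

Every enhancement technique discussed later attacks one or more of these three factors.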
count, is dependent on three important parameters. First of all, it depends on the algorithm: you can devise a better algorithm to reduce the amount of work, and hence the code executed. It also depends on the compiler: the compiler can perform a number of optimizations to reduce the code, as those who have attended a course on compilers must have studied. Last but not least, it depends on the instruction set architecture, that is, the instructions which can be executed by the processor. So you can see the instruction count depends on several factors. Next, the cycles per instruction depends on the ISA and the CPU organization. As I said, the ISA can be CISC (complex instruction set) or RISC (reduced instruction set), and the complexity of the instructions determines the average number of cycles they require. How do you measure CPI? CPI = total number of clock cycles / total number of instructions, for a particular program. Again this number is dynamic, in the sense that it is based on the instructions actually executed. It is determined by the ISA and by the way the CPU is organized (later on we shall discuss this in detail), and overlap among instructions reduces this term. We shall discuss different techniques for reducing the CPI, such as pipelining
and other things by which the CPI can be reduced. Then the last term, the cycle time, is determined by the technology, the organization and clever circuit design. The first factor is the VLSI technology. As you know, VLSI technology is improving over time; you may have heard of Moore's law, which I briefly mentioned in my last lecture. The size of the devices is shrinking, roughly halving every 18 months, and as the size shrinks the capacitance is reduced and the circuits become faster. So the technology determines the cycle time: in the early years it was microseconds, now it is nanoseconds, because clock rates have gone from a few megahertz in the early days to gigahertz in modern processors, primarily because of the advancement of technology. The cycle time also depends on the organization, particularly pipelining, and on clever circuit design. So these three factors together decide the processor performance, and you will see that every processor performance enhancement technique boils down to reducing one or more of these three terms: either we try to reduce the instruction count, or the CPI, or the cycle time, or sometimes more than one of them together. Now, some techniques can be used to reduce one term without affecting the others. For example, when the technology is enhanced and we go from one technology generation to the next
generation, the circuits become faster, so the cycle time is reduced while the CPI and the instruction count remain unaffected, if only the technology enhancement is considered. That is improved hardware technology. Similarly, when you apply compiler optimization techniques, removing dead code and so on, the instruction count reduces without affecting the CPI or the cycle time. Such performance optimization techniques are preferred, because they affect only one of the three parameters. However, there exist other techniques where the terms are interrelated: a technique may reduce one term but increase another. Let me explain with an example: consider CISC and RISC. With a complex instruction set architecture, we know the instruction count reduces, whereas with RISC the instruction count increases, because the number of instructions required by a RISC processor may be roughly three times that of a CISC processor. So one particular parameter decreases for CISC and increases for RISC. On the other hand, if you consider the CPI, executing a complex instruction obviously requires more cycles, which increases the CPI, while for a RISC processor the instructions are simple and require few cycles, so the CPI reduces. So whenever we compare CISC and RISC, we see that one particular parameter is
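This CISC/RISC trade-off can be made concrete with hypothetical numbers (they are illustrations, not figures from the lecture): suppose the RISC version of a program needs three times as many instructions, but each one takes far fewer cycles.

```python
def execution_time(instruction_count, cpi, cycle_time_s):
    """Iron law: product of instruction count, CPI and cycle time."""
    return instruction_count * cpi * cycle_time_s

cycle = 1 / 100e6                 # both machines at a hypothetical 100 MHz

# CISC: fewer, more complex instructions.  RISC: ~3x as many simple ones.
t_cisc = execution_time(10e6, 6.0, cycle)   # IC = 10 M, CPI = 6
t_risc = execution_time(30e6, 1.5, cycle)   # IC = 30 M, CPI = 1.5

print(f"CISC: {t_cisc:.2f} s, RISC: {t_risc:.2f} s")
# One term fell while another rose -- only the product decides the winner.
```

With these assumed numbers the RISC machine wins, but flip the CPIs slightly and the conclusion reverses, which is exactly why neither style can be declared better from one term alone.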
reduced while the other increases, and as a consequence we cannot really say that one is better and the other inferior, because there is an interrelationship: CISC reduces the instruction count but increases the CPI. Similarly, consider another technique, which we shall discuss later, known as loop unrolling. When a loop is unrolled, the static code size increases, but the dynamic code size, and hence the instruction count, reduces, because the decision-making (branching) instructions executed on every iteration are largely eliminated. So the dynamic code size, and therefore the instruction count, reduces with loop unrolling. However, because the static code size that you have to load increases, hazards can increase during execution, and as a result the cycles per instruction can increase. Later on we shall discuss loop unrolling in more detail and see how the dynamic code size reduces while the CPI can increase because of the various hazards that may occur when you execute the code.
So, loop unrolling reduces the instruction count but may increase the CPI. We can see there are many factors which are interrelated, and in such cases we have to be very careful when we try to measure performance using these techniques. Earlier, a very important parameter was MIPS or MFLOPS. MIPS stands for millions of instructions per second, and MFLOPS for millions of floating-point operations per second. These two were extensively used some 30 years back as measures of processor performance: the higher the MIPS rating, the faster a processor used to be considered, and similarly for the MFLOPS rating. But this has some drawbacks. Let us see what we really mean by MIPS: MIPS = instruction count / (execution time x 10^6), or equivalently MIPS = clock rate / (CPI x 10^6). So MIPS can be calculated by executing a program and used for comparison, but let us see what kind of problems we face when we use MIPS as a measure of performance. We encounter three significant problems, and they are so severe that somebody commented that MIPS stands for "meaningless information about processing speed", although MIPS was used as a metric for performance measurement for a long time. The first problem is that MIPS is instruction set dependent. As I have already told you, the instruction set architecture plays a very important role in deciding the performance of a processor, and you can see from the formula, clock rate / (CPI x 10^6), that MIPS depends on the CPI.
Since the CPI is instruction set dependent, MIPS is dependent on the instruction set; we cannot compare processors simply in terms of MIPS, we have to also take the instruction set architecture into consideration. The second problem is that MIPS varies between programs on the same computer: if you take different programs and run them on a single processor, you will find the MIPS rating is different for different programs. So you cannot really say that a higher MIPS rating means a better machine, because for one program the MIPS value may be better and for another it may be inferior, which I shall illustrate with an example. The third problem is that MIPS can vary inversely with performance; that is the reason somebody commented "meaningless information about processing speed". So let us illustrate with an example why MIPS does not work. Consider the following computer, with two compilers, compiler one and compiler two, designed for the same machine, and three categories of instructions: category A instructions require one cycle, category B instructions require two cycles, and category C instructions require three cycles. Compiler one generates 5 million category A instructions, 1 million category B instructions and 1 million category C instructions. Compiler two generates 10 million category A instructions, 1 million category B instructions and 1 million category C instructions.
So, CPI = CPU clock cycles / instruction count: you take the sum over categories of CPI_i x n_i, divide by the instruction count, and find the CPI for each compiler. For compiler one, the clock cycles are (5x1 + 1x2 + 1x3) x 10^6 and the instruction count is (5 + 1 + 1) x 10^6, so the CPI is 10/7 = 1.43. The machine operates at 100 MHz, so the MIPS rating for compiler one is 100 / 1.43 = 69.9. The CPI for compiler two is calculated in a similar way: (10x1 + 1x2 + 1x3) x 10^6 cycles divided by (10 + 1 + 1) x 10^6 instructions gives a CPI of 1.25, so the MIPS rating for compiler two on the same processor is 100 / 1.25 = 80. Compiler two has the higher MIPS rating, so it should be faster. But now let us translate this into CPU time = instruction count x CPI / clock rate: for compiler one we get 7x10^6 x 1.43 / 10^8 = 0.10 second of execution time, and for the code generated by compiler two, 12x10^6 x 1.25 / 10^8 = 0.15 second.
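The numbers in this compiler example can be reproduced with a short script following the lecture's figures (instruction counts in millions, a 100 MHz clock); the small difference from the lecture's 69.9 MIPS comes only from its rounding of CPI to 1.43 before dividing:

```python
CLOCK_RATE = 100e6                       # 100 MHz
CYCLES = {"A": 1, "B": 2, "C": 3}        # cycles per instruction category

# Instruction counts (in millions) generated by each compiler.
compiler1 = {"A": 5, "B": 1, "C": 1}
compiler2 = {"A": 10, "B": 1, "C": 1}

def stats(mix):
    """Return (CPI, MIPS rating, CPU time in seconds) for an instruction mix."""
    instr = sum(mix.values()) * 1e6
    cycles = sum(CYCLES[c] * n for c, n in mix.items()) * 1e6
    cpi = cycles / instr
    mips = CLOCK_RATE / (cpi * 1e6)
    cpu_time = instr * cpi / CLOCK_RATE
    return cpi, mips, cpu_time

for name, mix in [("compiler 1", compiler1), ("compiler 2", compiler2)]:
    cpi, mips, t = stats(mix)
    print(f"{name}: CPI = {cpi:.2f}, MIPS = {mips:.1f}, CPU time = {t:.2f} s")
# compiler 1: CPI = 1.43, MIPS = 70.0, CPU time = 0.10 s
# compiler 2: CPI = 1.25, MIPS = 80.0, CPU time = 0.15 s
# Compiler 2 has the higher MIPS rating yet the longer CPU time:
# MIPS can vary inversely with performance.
```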
So, earlier we found that in terms of MIPS rating compiler two was giving better performance, but the CPU time for the code generated by compiler one is less. Therefore program one is faster despite its lower MIPS rating, and we can say that the MIPS rating does not really reflect processor performance. What else can you do? You can calculate an overall CPI from the instruction mix of a program. Suppose the mix is: ALU operations 40 percent with CPI 1, load instructions 27 percent with CPI 2, store instructions 13 percent with CPI 2, and branch instructions 20 percent with CPI 5. Then for that particular instruction mix the overall CPI is 1 x 0.4 + 2 x 0.27 + 2 x 0.13 + 5 x 0.2 = 2.2 for that particular program. So this is how you can calculate CPI. Now, whenever you try to measure performance you have to run some program, and these programs are known as benchmark programs. Benchmark programs can be at five different levels. Number one is real applications, the applications that run in your computer in day-to-day use: compilers, editors, various scientific programs, graphics applications and so on. Unfortunately, real applications suffer from a portability problem, because they depend on the operating system as well as on the compiler, and for different computers the operating system and the compiler can differ. As a result, portability is a problem whenever you try to compare performance with the help of real applications. So instead one can consider modified applications.
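The overall-CPI calculation from an instruction mix can be written out directly, using the frequencies and per-class CPIs from the example above:

```python
# Instruction mix: (frequency, CPI) per instruction class.
mix = {
    "ALU":    (0.40, 1),
    "load":   (0.27, 2),
    "store":  (0.13, 2),
    "branch": (0.20, 5),
}

# Overall CPI is the frequency-weighted average of the per-class CPIs.
overall_cpi = sum(freq * cpi for freq, cpi in mix.values())
print(f"Overall CPI = {overall_cpi:.1f}")   # → Overall CPI = 2.2
```

The same weighted-average pattern works for any instruction mix, which is why profiling the dynamic frequencies of a workload is the first step in this kind of analysis.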
That means you take a particular application, modify and tailor it so that the portability is improved, or so that it tests specific features of the CPU, such as graphics features or digital signal processing (DSP) features; those particular aspects can be specially tested by modifying the applications. The third level of benchmarks is known as kernels: small, key pieces of real applications. These programs are very simple, perhaps 10 to 100 lines of code; examples are the Livermore Loops (a set of 24 loop kernels) and Linpack (a linear algebra package), and they can be used as a measure of performance. The fourth category is toy benchmarks, also simple programs of 10 to 100 lines of code, which are easy to type and run on almost all computers. These are the kind of programs typically given as assignments to first-year students, such as quicksort and merge sort, and they can be used for the purpose of testing. However, there is another level, known as synthetic benchmarks. A synthetic benchmark is created by analyzing the distribution of instructions over a large number of practical programs: you have an instruction set architecture, you want to test how the different types of instructions are executed by the processor, so you synthesize a program that has the same instruction distribution as a typical program. These programs have no real meaning to a user, because they do not give any meaningful result. Examples of synthetic benchmarks are Dhrystone and Whetstone; these are some of the older benchmarks. Nowadays, however, people depend on SPEC.
SPEC is the System Performance Evaluation Cooperative. This is the popular recent approach, in which a collection of benchmarks is put together to measure the performance of a variety of applications; we are not dependent on one particular application but choose applications from different fields. SPEC is a non-profit organization, with website www.spec.org, and it has developed CPU-intensive benchmark programs for evaluating the processor performance of workstations, servers and so on. Here is the history of SPEC. The first round was SPEC CPU89: 10 programs yielding a single number. The second round was SPEC CPU92, with 6 integer programs and 14 floating-point programs; in this case the compiler flags could be set differently for different programs. The third round was SPEC CPU95: the benchmark programs were changed as processor technology improved, with 8 integer programs and 10 floating-point programs, and in this case a single flag setting is required for all programs, whereas in the previous round the compiler flag settings could differ per program. The fourth round is SPEC CPU2000, which is presently used: SPECint2000 has 12 integer programs and SPECfp2000 has the floating-point programs, with a single flag setting for all programs. These programs are written in C, C++, Fortran 77 or Fortran 90. Here is the list of the integer component of SPEC CPU2000: 12 programs written in C or C++, performing different functions such as compression, FPGA circuit placement and routing, a C programming language compiler and so on.
Then you have the floating-point component of SPEC CPU2000, with 14 programs written in C, Fortran 77 or Fortran 90, performing the various functions given here: physics and quantum chromodynamics, shallow-water modeling, image recognition, seismic wave propagation, image processing, computational chemistry, number theory and so on. So instead of considering one particular application, applications from different fields of scientific computing have been taken to evaluate performance. Then comes the question of reporting: how do you report with the help of a single number? We have run, say, 12 integer programs and 14 floating-point programs, and the results are to be combined into a single number that gives the measure of performance. How can this be done? (You can visit the SPEC website for more details and documentation.) Whatever measure we use, it should reflect execution time. The single-number result can be either the arithmetic mean or the geometric mean of normalized ratios for each code in the suite. It has been found that the arithmetic mean of normalized ratios, although it gives some measure of execution time, is not very good: if you take one computer as the reference you get one result, and if you take another computer as the reference you may get a different one, so the arithmetic mean has a lacuna or pitfall. The geometric mean is good in that it gives a consistent single number regardless of the reference machine, but unfortunately it does not give a measure of execution time. Another term that is used is the weighted arithmetic mean, which summarizes performance while tracking execution time; this has been found to be quite good. In addition to using a benchmark suite, you have to report a precise description of the machine, because the platform plays a very important role.
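The reference-machine pitfall of the arithmetic mean, and the consistency of the geometric mean, can be seen in a small sketch. The execution times below are made-up numbers chosen to expose the problem:

```python
import math

# Hypothetical execution times (s) of two programs on machines A and B.
times_a = [2.0, 10.0]
times_b = [4.0, 5.0]

def normalized_ratios(times, ref):
    """Speedups of `times` relative to a reference machine's times."""
    return [r / t for t, r in zip(times, ref)]

def arith_mean(xs):
    return sum(xs) / len(xs)

def geo_mean(xs):
    return math.prod(xs) ** (1 / len(xs))

# Arithmetic mean of normalized ratios: each machine "wins" when the
# other is taken as the reference -- the ranking depends on the reference.
print(arith_mean(normalized_ratios(times_a, times_b)))   # A vs. B → 1.25
print(arith_mean(normalized_ratios(times_b, times_a)))   # B vs. A → 1.25

# Geometric mean: the two directions are exact reciprocals, so the
# ranking of A vs. B is the same whichever machine is the reference.
print(geo_mean(normalized_ratios(times_a, times_b)))     # → 1.0
print(geo_mean(normalized_ratios(times_b, times_a)))     # → 1.0
```

Here the arithmetic mean declares each machine 25 percent faster than the other depending on the reference, while the geometric mean consistently calls them equal; the price, as noted above, is that the geometric mean no longer tracks total execution time.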
What CPU are you using? What is the on-chip cache? What is the off-chip cache? What is the main memory size? All these parameters will affect the execution time. So, whenever you report, you have to give a precise description of the machine, because if you change some parameter of the machine, say you increase the cache memory size, the execution time can be different. So, platform information has to be provided whenever you report the performance. Then comes the compiler flag setting: you have to report what compiler flag settings have been used, since different compilers are used for different languages like C, C++, Fortran 77 or Fortran 90. Our discussion would not be complete without considering Amdahl's law. It quantifies the overall performance gain due to an improvement in a part of the computation. Normally, we cannot really improve the performance of all aspects. For example, we may improve floating point processing by adding a co-processor; then only the performance on floating point program execution will improve, but not that of other types of programs. In that context, Amdahl's law states that the performance improvement gained from using some faster mode of execution is limited by the fraction of time the enhancement is actually used. That means you have to consider the gain for that part only. So, you can say that the speed up achieved by improving a particular aspect is equal to the execution time for the task without the enhancement divided by the execution time for the task using the enhancement. That enhancement can come in different forms: maybe a floating point processor, the compiler, or various other aspects. Speed up tells us how much faster a machine will run due to the enhancement, and whenever you use Amdahl's law, there are two things you should consider. Number one is the fraction of the computation time in the original machine that can use the enhancement. 
So, if a program executes in 30 seconds and 15 seconds of the execution can use the enhancement, that fraction is half. That means the entire program may not be using the enhancement that has been incorporated in the system; that is what is being stated here. Number two is the improvement gained by the enhanced mode of execution itself. If the enhanced task takes 3.5 seconds and the original task took 7 seconds, we say the speed up of the enhanced mode is 2. Now, let us see the formula that we can use for the speed up. The execution time of the new system after enhancement is: ExecTime_new = ExecTime_old x [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]. So, the fraction of the computation that can use the enhancement is taken into consideration in this formula, and you can find out the new execution time from it. Then the overall speed up, which is the execution time of the old system divided by the execution time of the new system, works out to: Speedup_overall = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]. Do not try to just memorize these equations and plug numbers into them. It is always important to think about the problem: you have to consider what application you are testing for, see on what aspect the improvement has to be done, and that will lead to the enhancement in performance. So, we can summarize our lecture by mentioning the points to remember. First of all, we have seen that processor performance depends on three factors: the code size, that is the instruction count, multiplied by CPI, the cycles per instruction, multiplied by the cycle time. 
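The two Amdahl's law formulas above can be sketched directly in code, using the figures from the lecture: a 30-second program in which 15 seconds (fraction 0.5) can use an enhancement that speeds that part up by a factor of 2.

```python
# Amdahl's law, as stated in the lecture.

def new_exec_time(old_time, fraction_enhanced, speedup_enhanced):
    # ExecTime_new = ExecTime_old * ((1 - F) + F / S)
    return old_time * ((1.0 - fraction_enhanced)
                       + fraction_enhanced / speedup_enhanced)

def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    # Speedup_overall = 1 / ((1 - F) + F / S)
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

# 30-second program, half of it enhanced by 2x:
print(new_exec_time(30.0, 0.5, 2.0))   # 22.5 seconds instead of 30
print(amdahl_speedup(0.5, 2.0))        # about 1.33 overall, not 2
```

Note how the overall speed up is only about 1.33 even though the enhanced part runs twice as fast: the unenhanced half of the program limits the gain, which is exactly the point of the law.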
So, you have to consider all these three together whenever you try to compare processor performance, and in particular we have seen that these terms are interrelated. You have to minimize the time, which is the product, not the isolated terms. You may reduce the cycle time, but it may affect the others, as we have already seen. So, you have to consider all three factors together. On the use of a benchmark suite to measure performance, I have already told you about the use of SPEC, and you have to report with the help of a single number; to do that, we have seen you can use different techniques, but the weighted arithmetic mean gives a good result because it ultimately tracks the execution time. With this we have come to the end of today's lecture on processor performance. In the next lecture we shall discuss the instruction set architecture. Thank you.
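As a recap, the three-factor product summarized above (the iron law of processor performance) can be sketched as a quick calculation. All numbers here are hypothetical, purely to show why the product, not any single factor, is what must be minimized.

```python
# Iron law: CPU time = instruction count x CPI x clock cycle time.

def cpu_time(instruction_count, cpi, cycle_time_seconds):
    return instruction_count * cpi * cycle_time_seconds

# Hypothetical machine: 1 billion instructions, average CPI of 2,
# 1 GHz clock (1 ns cycle time) -> 2.0 seconds.
t1 = cpu_time(1e9, 2.0, 1e-9)

# Halving the cycle time helps only if CPI does not rise to compensate.
# Here the faster clock pushes CPI up from 2 to 3, so the program runs
# in 1.5 seconds: an improvement, but far less than the 2x the clock
# speed alone would suggest.
t2 = cpu_time(1e9, 3.0, 0.5e-9)

print(t1, t2)
```

This is why the lecture warns against optimizing one term in isolation: the three factors interact, and only their product is the execution time.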