to today's lecture on case studies. So far we have discussed different types of architectural features and various innovations that can be used to realize processors. Today I shall discuss some case studies; in particular, we shall focus on the evolution of Intel processors. We shall see how Intel processors have evolved over the years, and the objective is to examine how the various architectural innovations we have discussed so far have been incorporated into them.
If we look at the history of microprocessors, we find that the microprocessor was invented around 1971. The first processor was the 4004, a 4-bit processor developed by Intel. It had only 4096 4-bit words of memory and very limited processing capability. It was primarily used for video games, calculators and microprocessor-based controllers, and its instruction set was very primitive. Subsequently an 8-bit processor, the 8008, was introduced. This was the first 8-bit microprocessor; memory was extended from 4K × 4 bits to 16K × 8 bits, and the instruction set was also upgraded. Then, back in 1973, another version of the 8008 known as the 8080 was introduced, which was a much better processor with many modern features. Subsequently, in 1977, one of the most popular microprocessors, the 8085, was introduced. The 8085 is itself an 8-bit processor with quite a rich instruction set. About 100 million copies were sold by Intel itself, and Intel was not the only manufacturer: many other companies such as Toshiba, AMD, NEC and Hitachi obtained licenses and manufactured the same chip.
Based on this success, the evolution continued from 8-bit to 16-bit processors, in particular the 8086 and 8088. The 8086 was a 16-bit processor with many modern features and powerful processing capability, although of course much lower capability in the present-day context. It had a larger memory space of 1 MB of addressable RAM, all the registers were 16-bit, the data path was 16-bit, and it was possible to attach a separate floating-point processing unit, the 8087, which is known as a co-processor. Because of all these features, a full-fledged computer known as the IBM PC was developed based on the 8088, which was essentially a simplified version of the 8086: the external bus was 8-bit, but the internal capabilities were the same as those of the 8086.
Based on that success, subsequent processors were developed, and we got the x86 series of microprocessors. The Intel 8086 evolved into the 80186 and 80286, which were all 16-bit processors. Subsequently the 32-bit IA-32 processor family was introduced by Intel, then the P6 processor family, and then the NetBurst family of processors, which we shall discuss briefly today and in the next lecture.
This is the internal architecture of the 8086 processor. As you can see, it has two distinct units: the execution unit and the bus interface unit. These two units work in an overlapped, that is, pipelined, manner, so pipelining was introduced with this processor. The bus interface unit accesses memory and fetches instructions, so address generation and bus control are handled by the bus interface unit. It contains internal registers such as the segment registers and the instruction pointer, which are used for generating the memory address. After being fetched, the instructions are stored in a queue known as the instruction queue, and from this instruction queue they are taken by the execution unit for processing. The execution unit comprises the ALU and various registers, such as the general-purpose registers and other registers where you can store operands, flag bits and so on. This is how pipelining was introduced in Intel processors, and the 8086 was the beginning of that.
As I mentioned, the 8088, a minor refinement of the 8086, became very popular in the IBM PC. The 80186 is an extension of the 8086, and the 80286, which is a little more powerful than the 8086, is a reasonably successful extension of it. Then came the IA-32, that is, 32-bit, processors. The 80386 was Intel's first 32-bit processor, and the 80486, a much improved version of the 80386, used an instruction pipeline; we shall discuss their internal architecture in more detail. Then the 80586, a much improved version of the 80486, was introduced and named Pentium, and subsequently some upgrades were proposed: the 80686, an improved version of the 80586, named Pentium Pro; the 80586 MMX, a refined and faster version of the 80586 with MMX extensions for multimedia processing, named Pentium MMX; the 80686 MMX, a refined and faster 80686 with MMX multimedia extensions, named Pentium II; and the 80686 with MMX plus SSE, where SSE provides much more powerful processing capability, leading to Pentium III. Then there are the IA-64 processors, that is, Intel's Itanium 64-bit architecture. I shall discuss these one after the other.
First let us focus on the 80186 and 80188. Just as with the 8086 and 8088, Intel introduced two processors, the 80186 and 80188, and these may be considered embedded controllers with the following enhancements over the 8086 and 8088.
Now you see, Intel targeted its processors in two directions. One is embedded applications, which require a minimum number of chips, so that a complete system can be built with few ICs and go into various embedded systems such as printers, communication equipment and so on. So what was done? The 80186 incorporated an internal clock generator. Earlier, with the 8086, the clock generator was outside the processor: the 8284A clock generator IC was connected externally to the 8086. In the 80186 it was included as part of the chip. The programmable interrupt controller, which was also outside the chip in the 8086, was built in. Three programmable 16-bit timers were incorporated as part of the chip, along with a two-channel programmable DMA controller. With the 8086 a DMA controller is needed outside the chip, but here, as long as you are satisfied with a two-channel DMA interface, the two-channel programmable DMA unit is built into the 80186. Finally, there is a built-in programmable memory and I/O decoder. Whenever you build a complete system you require some decoder ICs, which select the memory devices, I/O devices and so on. Instead of using separate decoder ICs, some of this decoding logic was incorporated into the 80186 itself. All these features facilitated system implementation with a reduced chip count, which is a very important requirement for embedded applications.
This is the internal diagram. As you can see, apart from the processor's execution unit, it has a clock generator, a programmable interrupt controller, and programmable timers/counters, which are needed in many embedded applications to generate time delays and to implement different types of counters. Then there is the two-channel DMA controller, and this is the programmable chip-select unit. As I mentioned, you require decoders for selecting memory and I/O devices, and this unit performs the function of providing the chip-select signals. The bus interface unit, which was present in the earlier 8086, is also present.
As I already mentioned, the external clock generator is replaced by an internal clock generator. Similarly, the built-in programmable interrupt controller controls all internal and external interrupts and can control up to two external 8259A PICs. Normally a large number of I/O devices operate in the interrupt-driven mode. Data transfer can take place in three distinct modes: synchronous mode, asynchronous mode and interrupt-driven mode. The interrupt-driven mode is very suitable when the speed of the I/O devices is not compatible with that of the processor, and a large number of I/O devices, such as a mouse and the various other devices you attach to modern processors, can be interfaced using this interrupt-driven data transfer technique. Since a large number of I/O devices are connected, you require external programmable interrupt controllers such as the 8259A. The built-in programmable interrupt controller can be used to interface two additional external 8259A controllers, and without any external interrupt controller it provides five interrupt inputs: INT0 to INT3 and NMI, which stands for non-maskable interrupt.
Interrupt inputs can be broadly divided into two groups. One is the non-maskable interrupt, which cannot be disabled; at least one has to be provided, particularly for emergency situations. On the other hand, you may have a number of maskable interrupts, and the number provided in the 80186 is four, INT0 to INT3. These five interrupts in total have been found to be adequate for many applications; when they are not sufficient, we can use external interrupt controllers.
Then the timers/counters, which as I said are required in many applications, can also be used to implement watchdog timers, which are necessary in embedded systems. If an embedded system fails and does not respond within some duration, the watchdog timer automatically resets the processor. Sometimes the processor is stuck in an infinite loop or is not able to perform the task for which it is designed; in such cases the watchdog timer helps, and this is also provided for in the 80186. Then the DMA unit, as I said, provides two channels, and the programmable chip-select unit has six output lines to select memory and seven lines for selecting I/O, so you can interface external memory and I/O devices without the need for additional decoder ICs.
Coming to the 80286: the 80186 was meant for embedded applications, whereas the 80286 is an extension of the 8086 that was primarily used for making computers. So the processing capability was enhanced in the 80286, and another new feature was incorporated, the memory management unit. We have already discussed the need for a memory management unit in the context of virtual memory, which allows you to have a larger address space. That is what was done in the 80286. Since the primary purpose was to build computers, the physical address space was extended from 1 MB to 16 MB, and 1 GB of virtual memory was supported using the built-in memory management section.
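The memory sizes quoted so far follow directly from the number of address bits. The following is a minimal sketch of that arithmetic. The bit widths used (20 physical address bits for the 8086, 24 for the 80286, 30 bits for the 1 GB virtual space, 32 for the 80386) are not stated in the lecture; they are supplied here as the widths that correspond to the quoted sizes, and the function names are purely illustrative.

```python
# Sketch: address-space size as a function of address width.
# The bit widths below are assumptions chosen to match the sizes
# quoted in the lecture (1 MB, 16 MB, 1 GB, 4 GB).

UNITS = ["B", "KB", "MB", "GB", "TB"]


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses with the given address width."""
    return 2 ** address_bits


def human(n: int) -> str:
    """Format a power-of-two byte count using binary units."""
    i = 0
    while n >= 1024 and i < len(UNITS) - 1:
        n //= 1024
        i += 1
    return f"{n} {UNITS[i]}"


if __name__ == "__main__":
    cases = [
        ("8086 physical", 20),
        ("80286 physical", 24),
        ("80286 virtual", 30),
        ("80386 physical", 32),
    ]
    for name, bits in cases:
        print(f"{name:16s} {bits:2d} bits -> {human(addressable_bytes(bits))}")
```

Each extra address bit doubles the address space, which is why going from 20 to 24 bits takes the physical memory from 1 MB to 16 MB.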
So both physical memory and virtual memory were enhanced with the help of this memory management unit. This is the internal diagram. As you can see, it has the various registers, and this is how the address generation is done; this part is the bus interface unit, which generates the addresses and other signals, and this is the instruction unit, where instruction decoding takes place.
Coming to the 32-bit family: back in 1985 the Intel 386 was introduced, with 4 GB of addressable RAM. All the registers were 32-bit, and it supported virtual memory using the paging concept, so it was very suitable for building computers. Then in 1989 the 80486 was introduced with instruction pipelining. Up to the 80386, pipelining was done only in a limited way: there was no separate pipeline for instructions and data, only a single pipeline. I shall discuss this in a little more detail. Then the Pentium was introduced in 1993 with a superscalar architecture. Superscalar capability was introduced for the first time in the Pentium, with a 32-bit address bus and a 64-bit internal data bus.
So, as I mentioned, the 8086 evolved into the IA-32 family of 32-bit processors, and the basic feature of these IA-32 processors is that they are CISC processors. As you know, RISC stands for reduced instruction set computer. The main difference is that in RISC processors the instruction length is usually fixed, for example in the MIPS processor, whereas in CISC processors, the complex instruction set computer architecture, the instruction length is variable and many complex addressing modes are provided which are not available in RISC processors.
The IA-32 processors turned out to be the most dominant architecture of their time in terms of sales volume: not only personal computers but also workstations and desktop systems were built using these 32-bit processors, and the sales volume was quite high. As I have already mentioned, the Intel 386 was introduced in 1985 and the 486 in 1989. In the 80386 the most important feature was the extension of the virtual memory architecture: it includes both segmentation, used in the 80286, and paging, the preferred technique of the UNIX world, and the two are used together. The 80486, as I mentioned, used pipelining and had an on-chip floating-point unit. In the 386 the floating-point unit was external, but in the 486 it was put inside the chip.
As far as pipelining is concerned, the Intel 486 used instruction pipelining for the first time in Intel processors. It had five stages: instruction fetch, instruction decode 1, instruction decode 2, execute and register write-back. The functions performed in the different stages are given here. Instruction fetch: fetch instructions from the 32-byte prefetch queue. Instruction decode 1: translate the instruction into control signals or a microcode address; this stage also initiates address generation and memory access. Instruction decode 2: access the microcode memory and output microinstructions to the execution unit. Later on, as we shall see, instructions are converted into micro-operations, and that is what is being done here: the microcode memory is accessed and microinstructions are output to the execution unit.
Those micro-operations are ultimately executed: ALU and memory-access operations are performed in the execute stage, and in the register write-back stage the result is written into the register. To go through the same thing again: in the fetch stage, the fetch unit loads 16 bytes of instructions into the prefetch buffer. In the decode 1 stage, the decode unit determines the instruction length and instruction type. In the decode 2 stage, address calculation is performed: it computes the memory address and generates immediate operands, which are provided as part of the instruction. In the execute stage, register read, ALU operations and memory read/write operations are performed, and, as I mentioned, in the write-back stage the register file is updated. You can see the 80486 stages here: instruction fetch, decode 1, decode 2, execute and write-back. Later on we shall discuss the Pentium, which is an extension of this using a superscalar organization, so there are two paths along which processing is performed, the U-pipe and the V-pipe; I shall come back to this a little later.
You may be wondering why two decoding stages were provided in the 80486. It is because it uses a complex instruction set: decoding a CISC instruction is much more complex than decoding a RISC instruction, so it was necessary to have two decoding stages. This is inevitable because of the microcoded control used in these processors. Effective address calculation is done in decode stage 2, and multi-cycle decoding of the more difficult instructions stalls the incoming instructions.
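To make the overlap between the five 486 stages concrete, here is a minimal sketch that draws an ideal pipeline timing chart. It assumes one instruction enters the pipeline every cycle and that there are no stalls, which is an idealization; the stage abbreviations (IF, D1, D2, EX, WB) and the function names are chosen for this illustration only.

```python
# Sketch: ideal timing chart for a five-stage pipeline with the
# stage names used for the 80486 in the lecture. No stalls are
# modelled; this is an illustration, not a model of the real chip.

STAGES = ["IF", "D1", "D2", "EX", "WB"]


def pipeline_chart(n_instructions: int) -> list:
    """Return one row per instruction; row[c] is the stage occupied in cycle c."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = ["--"] * total_cycles
        for s, name in enumerate(STAGES):
            # Instruction i enters IF in cycle i and advances one stage per cycle.
            row[i + s] = name
        rows.append(row)
    return rows


def render(rows: list) -> str:
    """Format the chart as text, one instruction per line."""
    lines = []
    for i, row in enumerate(rows):
        lines.append(f"I{i + 1}: " + " ".join(row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(pipeline_chart(4)))
```

With four instructions the chart has 4 + 5 - 1 = 8 columns: once the pipeline is full, one instruction completes in every cycle even though each individual instruction still takes five cycles.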
These are the consequences of doing the decoding in two stages. The execute stage performs both ALU operations and cache memory access. Two penalty cycles are incurred if an instruction produces a register result and the next instruction uses this result for address generation. Whenever you do pipelining some kind of penalty is incurred, and this is how it happens here. The pipelined 486 could achieve a performance improvement by a factor of about 2.5 over the 386 because of pipelining, the faster clock and various other features.
You can see here the number of clock cycles required by the 386 and the 486 for different instructions. A load requires four cycles on the 386, whereas on the 486 one cycle is sufficient. Similarly, a store requires four cycles on the 386 and only one on the 486. ALU operations require two cycles on the 386 and one on the 486. A jump, when taken, requires nine cycles on the 386, because address calculation and so on are involved, and the 486 can do it in three cycles. When the jump is not taken, address calculation is not required, so it is faster: three cycles on the 386 and only one on the 486. Subroutine calls require nine cycles on the 386 and three on the 486. The main difference comes from the on-chip cache used in the 486. The 386 did not have any on-chip cache memory, but in the 486 a small cache memory was provided on chip, and that led to faster loads and stores. The other reason for the improvement is the pipeline: as we have already seen, a five-stage pipeline was used in the 486.
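The per-instruction cycle counts above can be combined into an average cycles-per-instruction (CPI) figure once an instruction mix is chosen. The sketch below does that arithmetic. The cycle counts are the ones quoted in the lecture; the instruction mix is a hypothetical example, not measured data, and the calculation ignores clock-rate differences, cache misses and pipeline stalls.

```python
# Sketch: average CPI of the 386 and 486 for an assumed instruction mix.
# CYCLES holds the (386, 486) cycle counts quoted in the lecture.
# MIX is a made-up workload used only to illustrate the arithmetic.

CYCLES = {
    "load": (4, 1),
    "store": (4, 1),
    "alu": (2, 1),
    "jump_taken": (9, 3),
    "jump_not_taken": (3, 1),
    "call": (9, 3),
}

MIX = {  # hypothetical fractions, must sum to 1.0
    "load": 0.25,
    "store": 0.10,
    "alu": 0.45,
    "jump_taken": 0.10,
    "jump_not_taken": 0.05,
    "call": 0.05,
}


def average_cpi(mix: dict, column: int) -> float:
    """Weighted average of cycle counts; column 0 is the 386, column 1 the 486."""
    return sum(fraction * CYCLES[kind][column] for kind, fraction in mix.items())


if __name__ == "__main__":
    cpi_386 = average_cpi(MIX, 0)
    cpi_486 = average_cpi(MIX, 1)
    print(f"386 CPI: {cpi_386:.2f}")
    print(f"486 CPI: {cpi_486:.2f}")
    print(f"cycle-count ratio: {cpi_386 / cpi_486:.2f}")
```

A different mix gives a different ratio, so the result should be read as an illustration of how the table translates into an overall figure rather than as a benchmark.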
These are the different members of the Pentium family, and how it has evolved is shown here. You can see that the number of transistors has increased from 3.1 million to 42 million in the Pentium 4. This has been possible because of the advancement of VLSI technology. The technology has also improved: the feature size has shrunk gradually from 0.8 micron to 0.18 micron, and because of that it was possible to incorporate a larger number of transistors in the later processors.
The Pentium series of processors ran from 1993 to 1996, with clock frequencies in the range of 66 MHz to 166 MHz. As I have already mentioned, it was superscalar: it was a 2-issue processor, that is, two instructions were issued at a time. The word size was 32 bits and the L1 cache was restricted to 2 × 8 KB, that is, 8 KB instruction and 8 KB data caches on chip; there was no L2 cache in the Pentium processors. Then in the Pentium MMX and Mobile Pentium MMX, introduced in 1997 and 1998, the clock frequency was enhanced and the L1 cache size was also increased. In the Pentium Pro processors, introduced in 1995, the clock frequency remained more or less the same as for the Pentium MMX, but an L2 cache was introduced. Then in the Intel Celeron processor you can see that the number of transistors increased, the clock frequency was enhanced, and the L1 cache went from 8 KB to 16 KB; the first version of the Celeron had no L2 cache, and the second version had one. Then came the Pentium II, Mobile Pentium II and so on, where you can see the enhancement in terms of increased L1 cache, increased L2 cache size and increased clock frequency. Then we come to NetBurst, which we shall discuss a little later. So this is how the P5 and P6 processors evolved over the years, from 1993 to 2000.
This is the internal architecture of the Pentium (P5). You can see that the floating-point unit is built in, and this is the bus interface unit, which interfaces with the memory and I/O devices. The internal bus is 64-bit and the external bus is also 64-bit. The cache memory is shown here: 8 KB of instruction cache and 8 KB of data cache. There are two ALUs, since it uses a superscalar architecture; these are the register sets, and this is the prefetch buffer, where prefetched instructions are stored. It has a separate built-in branch prediction unit, and, as I mentioned, multiplication and division are available in hardware, which was not the case in earlier processors. So, in addition to the floating-point unit, multiplication, addition and division are all built into the hardware, and as a consequence instruction processing is much faster.
This is an overview of the Pentium processor. Architecturally the Pentium is vastly different from the 486. As we have already seen, the Pentium has two processing units: essentially one full 486 execution unit, called the U-pipe, and a second unit, called the V-pipe. The two pipes are capable of executing instructions simultaneously. There are separate write buffers and even simultaneous access to the data cache, which is represented here; you can see that the two pipes access it through 32-bit buses, and the register set can also be accessed simultaneously. This is how the Pentium is superscalar of degree 2, meaning there are two processing units.
As we have seen in the diagram, it is at least twice as fast as the 486 because of this internal architecture. I have also mentioned that the Pentium expands the 32-byte prefetch queue to 128 bytes; we have seen the prefetch queue here. So this shows the Pentium superscalar processor: it has two pipes, the U-pipe and the V-pipe, with separate execution units, separate write buffers, separate decode units and so on. Instruction fetch, two stages of instruction decode, execution and write-back are performed for the two execution units. How it works logically is shown here: fetch and align instructions; decode the instruction and generate the control word; then, in two separate pipes, decode the control word and generate the memory address; access the data cache or calculate the ALU result; and write the register result, which is the write-back stage. The last three steps are repeated in the U-pipe and the V-pipe.
The Pentium performs superscalar execution, but there are some restrictions: it cannot always execute two instructions together. When it can and when it cannot is highlighted here. After the instructions are fetched, they are stored in a buffer and checked to see which pairs can be executed in the two separate pipes. Instructions I1 and I2 can be executed in parallel if they are not jumps, the destination of I1 is not a source of I2, and the destination of I1 is not the destination of I2. If all these conditions hold, I1 and I2 can be issued in parallel; otherwise the two instructions cannot be issued together.
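The pairing check described above can be sketched in a few lines. This is a simplified reading of the rule as stated in the lecture (no jumps in the pair, no read-after-write and no write-after-write dependence between I1 and I2); the actual Pentium pairing rules have more conditions. The `Instr` record, the register names and the greedy `issue` loop are hypothetical and exist only for this illustration.

```python
# Sketch: the U-pipe / V-pipe pairing test as stated in the lecture.
# Instr, can_pair and issue are illustrative names, not Intel terminology.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str] = None       # destination register, if any
    srcs: Tuple[str, ...] = ()       # source registers
    is_jump: bool = False


def can_pair(i1: Instr, i2: Instr) -> bool:
    """True if I1 and I2 may issue together under the simplified rule."""
    if i1.is_jump or i2.is_jump:
        return False                 # assumption: a jump is never paired
    if i1.dest is not None and i1.dest in i2.srcs:
        return False                 # I2 reads what I1 writes (RAW)
    if i1.dest is not None and i1.dest == i2.dest:
        return False                 # both write the same register (WAW)
    return True


def issue(program: List[Instr]) -> List[Tuple[str, ...]]:
    """Greedy in-order issue: pair adjacent instructions when allowed."""
    cycles = []
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            cycles.append((program[i].name, program[i + 1].name))
            i += 2
        else:
            # I1 goes alone; I2 is retried as the first of the next pair.
            cycles.append((program[i].name,))
            i += 1
    return cycles


if __name__ == "__main__":
    prog = [
        Instr("add", dest="eax", srcs=("eax", "ebx")),
        Instr("mov", dest="ecx", srcs=("eax",)),   # depends on add
        Instr("sub", dest="edx", srcs=("edx", "esi")),
        Instr("jmp", is_jump=True),
    ]
    for cycle, group in enumerate(issue(prog), start=1):
        print(cycle, group)
```

In this example the `mov` cannot pair with the `add` because it reads `eax`, so the `add` issues alone and the `mov` is paired with the following `sub` instead.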
As we know, in a multiple-issue processor all the execution units may not always be busy. If the conditions hold, I1 is issued to the U-pipe and I2 to the V-pipe; otherwise I1 is issued to the U-pipe and I2 is issued in the next cycle. In such cases the issue is serial, and I2 can then be paired with the following instruction. This is how superscalar execution takes place in the Pentium.
Then, as I mentioned, the Pentium Pro was introduced in 1995, and the Pentium II was introduced with the MMX multimedia instruction set, so that multimedia processing can be done at a faster rate. Then SSE, the streaming SIMD extensions that I mentioned, was included in the Pentium III, and the Pentium 4 and Xeon used the Intel NetBurst microarchitecture, which is also tuned for multimedia. I shall briefly highlight their internal architecture.
This is the P6 microarchitecture. It forms the basis of the Pentium Pro, Pentium II and Pentium III. Besides some specialized instruction set extensions such as MMX and SSE, these processors differ in clock rate and cache architecture. They are dynamically scheduled processors that translate each IA-32 instruction into a series of micro-operations. We have already mentioned how instructions are converted into micro-operations, which are then executed. Micro-operations are similar to typical RISC instructions, and that allows a hardwired control unit. How the conversion takes place is shown here. The x86 instructions are fed to the superscalar decode unit, and there is a translate unit; these two together convert the x86 instructions into RISC-like micro-operations. The micro-operations are sent to the dispatch unit, which identifies the functional unit to which each operation will go.
You can have different functional units, such as fixed-point units, a floating-point unit and a fetch unit, and depending on its type each micro-operation is fed to the appropriate functional unit. Since the functional units execute independently, this may lead to out-of-order execution. The out-of-order results have to be put back in order: results should be produced in the same order in which the instructions appear in the program. That is achieved with the help of the in-order retire unit, where writing into the registers takes place in the sequence specified by the program.
The Intel P6 has five functional units: two integer units, separate load and store units, and a floating-point unit. It uses a 14-stage pipeline, so in contrast to the 5-stage pipeline used in the Pentium, the P6 uses a much deeper pipeline. Since the P6 must execute CISC x86 instructions, the instructions are decoded into simpler RISC-like micro-operations, as I have already mentioned, and out-of-order execution takes place, as I showed with the help of this diagram. In the P6, up to three IA-32 instructions are fetched, decoded and translated into micro-operations in every clock cycle, and the micro-operations are executed by the out-of-order speculative pipeline. So speculative execution is used for executing the micro-operations. We have already discussed in detail register renaming and the reorder buffer, which are used whenever you do out-of-order execution, and these have been incorporated in the P6 processor to facilitate out-of-order execution of the micro-operations.
Processors in the P6 family may be thought of as three independent engines coupled with a single instruction pool. Instructions are fetched and kept in an instruction buffer, from which they go to the execution units, and execution takes place in parallel. This diagram shows functionally how this happens. The instruction fetch unit fetches 16 bytes in every cycle, and these are stored in an instruction queue. From the instruction queue, three instructions per cycle go to the instruction decode unit, where they are translated into micro-operations; three instructions can lead to up to six micro-operations. From this buffer the micro-operations go to the renaming and issue unit at the rate of three micro-operations per cycle. Then they go to the reservation stations. Reservation stations are nothing but buffers, and 20 entries are provided. From the reservation stations the micro-operations go to the functional execution units; as I have already mentioned, there are five functional units in total, including separate load and store units and one floating-point unit. The outputs that are generated go to the reorder buffer, because execution can take place out of order and results will be produced in a different order. The reorder buffer has 40 entries.
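The role of the reorder buffer, completing micro-operations in any order while releasing them strictly in program order, can be illustrated with a small sketch. The capacity of 40 entries and the limit of three retirements per cycle are the figures the lecture gives for the P6; the class, its method names and the list-of-dicts representation are assumptions made for this illustration and do not describe the real hardware structure.

```python
# Sketch: a reorder buffer that accepts out-of-order completion but
# retires in program order. Only the numbers 40 and 3 come from the
# lecture; everything else is an illustrative simplification.
from collections import deque


class ReorderBuffer:
    def __init__(self, capacity: int = 40, retire_width: int = 3):
        self.capacity = capacity
        self.retire_width = retire_width
        self.entries = deque()  # oldest micro-operation at the left

    def allocate(self, uop: str) -> bool:
        """Reserve an entry in program order; False means the buffer is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append({"uop": uop, "done": False})
        return True

    def complete(self, uop: str) -> None:
        """Mark a micro-operation as executed (may happen in any order)."""
        for entry in self.entries:
            if entry["uop"] == uop:
                entry["done"] = True
                return
        raise KeyError(uop)

    def retire(self) -> list:
        """Retire up to retire_width finished micro-operations from the head."""
        retired = []
        while (self.entries and self.entries[0]["done"]
               and len(retired) < self.retire_width):
            retired.append(self.entries.popleft()["uop"])
        return retired


if __name__ == "__main__":
    rob = ReorderBuffer()
    for name in ["u0", "u1", "u2", "u3", "u4"]:
        rob.allocate(name)
    rob.complete("u2")
    rob.complete("u1")
    print(rob.retire())   # nothing retires while u0 is still pending
    rob.complete("u0")
    print(rob.retire())   # u0, u1, u2 leave together, in program order
```

The key point is that a finished micro-operation waits in the buffer until every older micro-operation has also finished, so the register state is always updated in program order.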
From the reorder buffer the results go to the graduation unit, which retires micro-operations at the rate of three per cycle. This is how the P6 microarchitecture executes instructions.
This is the schematic diagram of the fetch and decode unit of the Intel P6. This is the instruction buffer, where instructions are stored after being fetched; as you have seen, 16 bytes can be stored. From here they go to the decoder units, of which you can see three, and up to six micro-operations are generated. This is the microprogram control memory, stored in ROM. As you know, the control unit can be of two types, a hardwired control unit and a microprogrammed control unit. Whenever you use a microprogrammed control unit, the micro-operations are stored in a separate memory known as the microprogram memory; this is where they are stored, and the micro-operations are fetched from here as decoding takes place. The micro-operation queue stores the six micro-operations, and this is where branch address calculation takes place for generating the next instruction address. So this is the fetch and decode unit of the P6 pipeline.
Then there are the streaming SIMD extensions known as SSE2 technology. SSE2 extends the MMX and SSE technology with the addition of 144 new instructions. The new instructions include 128-bit SIMD integer arithmetic operations, 128-bit SIMD double-precision floating-point operations, and cache and memory management operations. With the help of these additional instructions it allows enhanced encryption, video, speech, image and photo processing.
That means various types of applications can be run with much higher efficiency with the help of this enhanced instruction set. The Pentium Pro was introduced in 1995. It supports predicated instructions. Instructions are decoded into micro-operations, as in the other processors, and the micro-operations are register-renamed and placed into an out-of-order speculative pool of pending operations. Another new feature is that execution is done in dataflow order. As you know, dataflow machines were invented at some point in time, and the basic idea was that as soon as the operands of an operation are available, that operation is executed. A somewhat similar concept is used here: when the operands are ready, you perform that particular operation, and this is how instructions are executed in the Pentium Pro.
The Pentium II and III processors use the P6 microarchitecture, which is three-way superscalar, as I have already mentioned. The pipelined microarchitecture features a 12-stage superpipeline. In a superpipeline, compared with the conventional pipeline shown here, two separate operations can be performed in a single conventional cycle, say one operation in one half of the cycle and another in the other half; this is known as superpipelining. So it combines a three-way superscalar organization with a 12-stage superpipeline to improve performance. It trades less work per pipe stage for more stages, which means it achieves a higher clock rate. As you know, the number of stages decides the clock frequency: the more stages there are, the higher the clock frequency can be. That was another feature used here.
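The link between pipeline depth and clock frequency can be illustrated with a simple textbook-style model: if the total logic delay of an instruction is divided evenly over n stages and each stage adds a fixed latch overhead, the cycle time is roughly the logic delay divided by n, plus the overhead. The sketch below evaluates that model. The 20 ns logic delay and 0.5 ns overhead are made-up numbers chosen for illustration, not figures for any Intel processor.

```python
# Sketch: cycle time and clock frequency as a function of pipeline depth.
# Model: cycle_time = total_logic_delay / stages + latch_overhead.
# The default delay values are hypothetical.


def cycle_time_ns(total_logic_ns: float, stages: int, latch_overhead_ns: float) -> float:
    """Cycle time when the logic is split evenly across the stages."""
    return total_logic_ns / stages + latch_overhead_ns


def clock_mhz(total_logic_ns: float, stages: int, latch_overhead_ns: float) -> float:
    """Clock frequency in MHz corresponding to the cycle time."""
    return 1000.0 / cycle_time_ns(total_logic_ns, stages, latch_overhead_ns)


if __name__ == "__main__":
    for n in (5, 12, 14, 20):
        t = cycle_time_ns(20.0, n, 0.5)
        f = clock_mhz(20.0, n, 0.5)
        print(f"{n:2d} stages: cycle {t:.2f} ns, clock {f:.0f} MHz")
```

Because the latch overhead does not shrink as stages are added, the gain from each extra stage gets smaller, and deeper pipelines also pay more for each branch misprediction.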
So, a larger number of stages has been used so that a higher clock frequency can be used. This is the Pentium II and III microarchitecture. This is the bus interface unit, connected to the system bus from which instructions are fetched; this is the L2 cache, and here are the L1 instruction cache and the L1 data cache. The instructions and data which are fetched from the memory using the bus interface unit are stored in the L1 instruction cache and the L1 data cache, and it also uses the L2 cache. Then it has the fetch and decode unit, the dispatch and execute unit, and the retire unit. After fetching, the instruction goes to the fetch and decode unit for decoding, and whenever it has to be executed it goes to the dispatch and execute unit. This is where the instruction pool is held, and then it goes to the retire unit for storing the result in the registers. Here you have the registers where the retire unit stores the various register values. So, this gives you the Pentium II and III microarchitecture. This is the same thing with the different units shown in more detail: bus interface unit, instruction fetch unit, instruction decode unit, branch target buffer, microcode instruction sequencer, memory reorder buffer, data cache, the instruction cache here, and the L2 cache. This is the memory interface unit, the various functional units are shown here, and this is the reorder buffer and the retirement register unit. So, this is the microarchitecture of the Pentium II and III. It has an in-order section and an out-of-order section. The branch prediction unit uses a two-level scheme for branch prediction, using a branch target buffer containing 512 entries. It maintains the branch history information and the predicted branch target address.
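A two-level branch prediction scheme of this kind can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions, not the actual P6 circuit: the lecture only gives the 512 entries, so the 4-bit per-branch history, the 2-bit saturating counters, the simple modulo indexing and the class name `TwoLevelPredictor` are all hypothetical choices for illustration.

```python
class TwoLevelPredictor:
    """Per-branch history (level 1) selects a 2-bit counter (level 2)."""

    def __init__(self, entries=512, history_bits=4):
        self.entries = entries
        self.history_mask = (1 << history_bits) - 1
        self.history = [0] * entries                      # recent outcomes per entry
        self.counters = [[1] * (1 << history_bits)        # 1 = weakly not taken
                         for _ in range(entries)]
        self.target = [None] * entries                    # predicted target address

    def predict(self, pc):
        i = pc % self.entries
        taken = self.counters[i][self.history[i]] >= 2
        return taken, (self.target[i] if taken else None)

    def update(self, pc, taken, target):
        i = pc % self.entries
        h = self.history[i]
        c = self.counters[i][h]
        self.counters[i][h] = min(3, c + 1) if taken else max(0, c - 1)
        if taken:
            self.target[i] = target
        self.history[i] = ((h << 1) | int(taken)) & self.history_mask


def run(outcomes, pc=0x4000, target=0x3F00):
    """Feed a sequence of outcomes for one branch; return a list of mispredict flags."""
    predictor = TwoLevelPredictor()
    flags = []
    for taken in outcomes:
        predicted, _ = predictor.predict(pc)
        flags.append(predicted != taken)
        predictor.update(pc, taken, target)
    return flags


# A loop branch that is taken three times and then falls through, repeated 100 times.
pattern = [True, True, True, False] * 100
flags = run(pattern)
print("mispredictions in warm-up:", sum(flags[:8]))
print("mispredictions afterwards:", sum(flags[8:]))
```

Because each position in the repeating pattern is preceded by a different 4-bit history, the predictor learns the loop exit as well as the taken iterations once the counters have warmed up, which a single 2-bit counter per branch cannot do.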
So, in the 512 entries the history information as well as the branch target address is stored, which allows branch prediction in a very effective way. Whenever the prediction is not correct, the penalty is quite heavy: the misprediction penalty is at least 11 cycles, that is the minimum, and on average it is about 15 cycles. The decoder breaks IA-32 instructions down into micro-operations, which I have already mentioned; each is comprised of an opcode, two source operands and one destination operand. Micro-operations are of fixed length, and most instructions are decoded into 1 to 4 micro-operations. More complex instructions are handled as a sequence of micro-operations. In the in-order section, register renaming is done: logical IA-32 register references are converted into references to physical registers. Then it has the reservation station unit (RSU) with 20 entries, as I have already mentioned, and the reorder buffer (ROB) with 40 entries. Out-of-order execution is then performed with the help of the RSU, which forms the central instruction window with 20 reservation stations, each capable of hosting one micro-operation. Micro-operations are issued to the functional units according to the data flow constraints and resource availability, without regard to the original ordering of the program. So, using the data flow concept, it performs out-of-order execution with the help of the RSU and the ROB. This is the issue and execute unit. There are 3 kinds of units here: the MMX functional unit, the jump functional unit and the integer functional unit. These are on port 0 and port 1, and then there are the load and store units. So, there are 2 integer units on ports 0 and 1, and the load unit and the store units are provided on ports 2, 3 and 4. These are the reservation stations, and this is the in-order retire section, where the micro-operations are retired.
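The three steps described above (in-order renaming, out-of-order issue from the reservation stations, in-order retirement) can be put together in a small Python sketch. It is a simplified model under stated assumptions, not the real P6 logic: the `Uop` record, the latencies, the issue and retire width of 3, and the function names `rename`, `issue_out_of_order` and `retire_in_order` are all hypothetical, and functional-unit ports are not modelled.

```python
from collections import namedtuple

# Hypothetical micro-op: one destination, up to two sources, a latency in cycles.
Uop = namedtuple("Uop", "name dst src1 src2 latency")


def rename(uops):
    """In order: give every result a fresh physical register (removes false dependences)."""
    mapping, counter, renamed = {}, 0, []
    for u in uops:
        srcs = []
        for s in (u.src1, u.src2):
            if s is not None and s not in mapping:
                mapping[s] = "p%d" % counter      # live-in value, already available
                counter += 1
            srcs.append(mapping[s] if s is not None else None)
        mapping[u.dst] = "p%d" % counter
        counter += 1
        renamed.append(Uop(u.name, mapping[u.dst], srcs[0], srcs[1], u.latency))
    return renamed


def issue_out_of_order(renamed, issue_width=3, window=20):
    """Each cycle, issue up to `issue_width` micro-ops whose operands are ready."""
    produced = {u.dst for u in renamed}
    ready_at, issue_cycle, pending, cycle = {}, {}, list(renamed), 0

    def ready(src):
        if src is None or src not in produced:
            return True                           # no operand, or a live-in register
        return src in ready_at and ready_at[src] <= cycle

    while pending:
        chosen = []
        for u in pending[:window]:                # only the window is visible
            if len(chosen) == issue_width:
                break
            if ready(u.src1) and ready(u.src2):
                chosen.append(u)
        for u in chosen:
            issue_cycle[u.name] = cycle
            ready_at[u.dst] = cycle + u.latency
            pending.remove(u)
        cycle += 1
    return issue_cycle


def retire_in_order(renamed, issue_cycle, retire_width=3):
    """Retire in program order, at most `retire_width` micro-ops per cycle."""
    retire_cycle, cycle, slots = {}, 0, 0
    for u in renamed:
        done = issue_cycle[u.name] + u.latency
        if done > cycle:
            cycle, slots = done, 0
        elif slots == retire_width:
            cycle, slots = cycle + 1, 0
        retire_cycle[u.name] = cycle
        slots += 1
    return retire_cycle


program = [
    Uop("load", "r1", "r2", None, 3),   # r1 <- memory[r2], assumed 3-cycle latency
    Uop("add1", "r3", "r1", "r4", 1),   # depends on the load
    Uop("sub",  "r5", "r6", "r7", 1),   # independent of the load
    Uop("add2", "r1", "r5", "r6", 1),   # reuses r1; renaming removes the conflict
]
renamed = rename(program)
issued = issue_out_of_order(renamed)
retired = retire_in_order(renamed, issued)
print(issued)    # {'load': 0, 'sub': 0, 'add2': 1, 'add1': 3}
print(retired)   # {'load': 3, 'add1': 4, 'sub': 4, 'add2': 4}
```

In this example `sub` and `add2` issue ahead of `add1`, which is still waiting for the load, yet all of them retire only after the load has retired, so the register state is always updated in program order.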
So, the basic function of the retire section is to store the data into the registers. This is the internal diagram; let us not go into the details of it. So, I have briefly covered the microarchitecture of the Pentium II and Pentium III, starting from the 8086. Thank you.