So, last time we discussed storage devices, mainly the disk and the tape. The remainder of computer technology is made up of memory and the processor, the things which go inside any box that you see. Today, in both sessions, we will discuss these components of the architecture, particularly the system architecture and the system buses, how these things are connected to each other. We will also briefly describe how the software layers which sit on top of this hardware bind themselves to the hardware on one side and to the applications that we write on the other.

The cache is the fastest and most costly form of storage; it is volatile and it is managed by the computer system hardware. It is even faster than the main memory, which is otherwise the fastest among all storage devices. It offers fast access, tens to hundreds of nanoseconds, but it is generally too small in size and too expensive to store the entire database. So we will go back to our premise that our information system data, which is required to be persistent, meaning it should always be available unaffected by any failure of power and so on, will never be stored in memory; it will be stored on disk. But for processing, any information must be brought into the computer's box, and it is in this context that it is useful to look at a simple architectural diagram, which we shall expand on later.

So if you look at the various components of technology: last time we discussed magnetic disks and magnetic tapes, and on these we store non-volatile data, meaning data will remain there permanently. The computer itself internally has a processor, or central processing unit as we call it, and it has the main memory, in which not only the data but even the instructions are stored. Now the processor does all the computational work, which means that if it has to process something, the data must be fed into the processor. Just like memory has multiple locations, the processor also has certain temporary memory locations, which are called registers. It is like a register or a notebook that you keep: the processor gets data from memory into these registers, performs addition, subtraction, comparison, multiplication, whatever operation, and then pushes the data back from the registers into memory.

The communication between memory and processor happens through a specific channel, which is often called the bus of the computer, or the memory system bus. That means the data from memory must actually come out onto this bus, travel along it, and get into the processor, and vice versa; it is a two-way communication. If you imagine this bus to be some kind of network between memory and processor, you will suddenly realize that all the issues we discussed in networking, namely the delays, the movement of packets, the protocols required and so on, are in some sense required even for this communication. In fact, the protocols we discussed in networking are called communication protocols, and since there is communication between processor and memory, there has to be some protocol here too. Apart from facilitating the movement of data, the protocols also act as an overhead: the time required for the protocol itself to execute is an overhead.
As I said, the access time of memory is typically tens to hundreds of nanoseconds. You know the nanosecond, right: second, millisecond, microsecond, nanosecond. Nanoseconds are extremely small. What this means is that if, from just outside the memory, you want to access the contents of a memory location, it will take this much time, and if from just outside the memory you want to write some contents into a memory location, it will also take this much time. But what about the time required for moving the data from memory all the way to the processor and back? That is a non-trivial time.

The registers, on the other hand, and the ability of the processor to carry out calculations, have been becoming faster and faster. That, as we shall see later, is determined by the processor clock. The processor clock is one which keeps generating pulses at a regular frequency called the clock frequency, and execution happens at those pulses: at one pulse some instruction is executed, at another pulse another instruction is executed, something of that sort. These pulses have been becoming faster and faster; you have heard of clock rates of two gigahertz. So what is two gigahertz? How many pulses per second do you get if the clock is, say, one gigahertz? Anybody can do the quick arithmetic: giga is 10 to the power 9, so if you have 10 to the power 9 cycles per second, the time required for one pulse will be 1 upon 10 to the power 9 seconds, or 10 to the power minus 9 seconds, which is one nanosecond. So any processor working even at a one gigahertz clock can generate a pulse every nanosecond. Not necessarily will every instruction be executed in one nanosecond, but today's processor architecture, as we shall see, actually permits an instruction to be executed every cycle.

Now an instruction cannot execute in isolation; an instruction requires some data to act on. If the data is inside the registers, the processing of that data, addition, subtraction, comparison and so on, can all be done in one instruction. But if getting the data from memory to the processor, or pushing it back, requires tens or hundreds of nanoseconds, clearly the processor is going to wait stupidly for the data to arrive. In order to avoid this waiting time, what people do when they design these processors is to put an extra memory here, and this memory is called the cache. The cache memory is as fast as the registers, meaning it can respond in one nanosecond or less, and it is intrinsically tied to the processor, so the protocol overhead for movement of information between cache and processor is very minimal. Ideally, then, we should put all our data and all our instructions in the cache itself so that the processor can quickly access them. But cache memory is extremely costly because it is very fast; it is as costly as the intrinsic circuit registers of the processor itself, much costlier than the main memory. Consequently, the mechanism deployed in modern computer systems is that some of the memory data, based on some algorithms of usage, is actually pushed into the cache. Whenever the processor requires access to some data, it first tries to locate it in the cache; if it finds it there, the work is finished; if it does not find it, it goes to the memory.
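Going back to the clock arithmetic for a moment, here is a minimal C sketch of the frequency-to-cycle-time conversion; the two clock rates used are just illustrative values:

```c
#include <stdio.h>

int main(void) {
    /* Illustrative clock rates in hertz (1 GHz = 1e9 cycles/second). */
    double clocks_hz[] = { 1e9, 2e9 };
    for (int i = 0; i < 2; i++) {
        double cycle_ns = 1e9 / clocks_hz[i];   /* 1/f seconds -> nanoseconds */
        printf("%.0f GHz clock -> %.2f ns per pulse\n",
               clocks_hz[i] / 1e9, cycle_ns);
    }
    return 0;   /* prints 1.00 ns for 1 GHz, 0.50 ns for 2 GHz */
}
```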
Now, typically in a computer program you have, say, a loop: do i = 1 to 100, calculating something for the elements of an array. It stands to reason that while processing these instructions you will first access one location, then the second location, then the third, then the fourth, since these consecutive locations have been allocated to an array, which is normally the case. Consequently, what is intrinsic to these algorithms is that whenever the processor does not find something in the cache and has to get it from memory, it will not fetch data from only one location over the bus; it will transfer data for something like 20, 30, a hundred consecutive locations and stuff them into the cache, so that the next time the processor wants some data, the likelihood of that data being found in the cache itself increases. The achievement of these algorithms is phenomenal. They talk of a cache hit rate, that is, the percentage of the time that the processor wants to access some data and finds it in the cache itself rather than having to go to memory; this cache hit rate is often in excess of 95%. Of course, there is an asynchronous movement of data between the cache and the memory, which is handled independently by cache controllers.

Memory and cache are both multi-access-point devices. Although memory is shown here connected at one port, there are in effect multiple doors to it, so that while the processor is accessing one part of the memory, the disk may be accessing some other part, pushing data in or reading data out; there is a lot of parallelism built in here. Similarly there is a lot of parallelism between cache and memory, and between cache and processor.

So in a nutshell, cache memory is almost as fast as the internal register memory of the processor. It is of course much more costly, and therefore you do not have a very large amount of it. It is, for example, not uncommon these days to have something like 64 or 128 megabytes of cache when the main memory is of the order of two or four gigabytes; when main memories of the order of one megabyte were common, the cache would be of the size of 64 kilobytes. But people always try to put in this kind of buffer; that is the purpose of the cache. Is that clear, the purpose of the cache?

So when you decide on the configuration of a computer, some of the parameters you would be expected to look at are: for the processor, its speed; for the memory, its size and its access time, just like the seek time and rotational delay for a disk, because 10 nanoseconds versus 100 nanoseconds would make one memory ten times slower than the other; and then the amount of cache and the kind of cache hit rate you get. These are obviously important parameters from the perspective of movement of data.

So coming back to our discussion: the cache is the fastest and most costly form of storage, and it is managed by the computer system hardware. The main memory is the next fastest element; it has fast access, but generally it is not big enough to store the entire database. Let me also tell you that of late, in the last 15 years or so, there have been efforts in a direction called main memory databases. In these cases you put the entire database in main memory itself; then we are not talking about one or two gigabytes of memory, we are talking about hundreds of gigabytes of memory in which the entire database sits. Why would such a requirement be there, and what
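Before going further, let me connect back to the locality idea above. Here is a minimal C sketch contrasting a traversal that walks consecutive memory locations with one that jumps around; the array size is an illustrative choice:

```c
#include <stdio.h>

#define N 1024

static double a[N][N];   /* row-major: a[i][0], a[i][1], ... are consecutive */

int main(void) {
    double sum = 0.0;

    /* Cache-friendly: walks consecutive locations, so one cache-line
       fill from memory serves many subsequent accesses. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    /* Cache-hostile: jumps N*8 bytes between accesses, so far more
       accesses miss the cache and go out to main memory. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];

    printf("%f\n", sum);
    return 0;
}
```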
would happen if the memory fails? In such cases you never let the memory fail: such memory is always what is known as battery-backed, which means it is almost the equivalent of non-volatile storage, although it is random access memory.

The capacities of main memory in wide use currently are up to a few gigabytes. Capacities have gone up, and per-byte costs have decreased steadily and rapidly; semiconductor memory costs have been reducing drastically and will continue to fall further. As they fall, there is always a desire to use larger and larger memory, but simultaneously disk technology also gets pushed further, and disks too get cheaper and cheaper. Any idea what Rs. 1500 would buy today in terms of a hard disk? Will you get a hard disk for 1500 rupees? Yes, you will. And what is the capacity? 80 GB. So notice that if you can get 1 gigabyte of main memory for 1500 rupees, you get 80 gigabytes of stable storage, or even slightly more, for the same amount. That ratio continues, although it has been diminishing, so by the time you can get 80 gigabytes of main memory you will still get about 500 gigabytes of disk storage for the same price, and this distinction will continue. But what it means for us is that, increasingly, smaller applications with databases of the size of a few gigabytes, which were traditionally always put on disk, may in a short while have in-memory databases, if you have a battery backup or a UPS-supported PC, and then your processing capacity and performance will improve enormously. Why? Because you simply don't have to do disk accesses. Just to make a comparison: memory access is of the order of tens or hundreds of nanoseconds, and any location requires the same access time. Imagine that you are building B-trees or B+ trees in memory: to reach a particular pointer to data you require, let's say, four, five, six accesses, and then one more access to get the data that pointer points to. From the disk, by contrast, you read a large block, and reading that large block costs many memory cycles to bring all that data in; but all of this in-memory work would still be at least a thousand times faster than any disk access today. And that kind of performance is now likely; hardware component availability will permit you to have it.

So, to recapitulate: per-byte costs have decreased steadily and rapidly, but the contents remain volatile. Main memory contents are lost if a power failure or a system crash occurs, and for persistent data you cannot live with that.

You are familiar with flash memory, all the USB pen drives that you carry. Flash memory is something on which data can be written at a location only once, but the location can be erased and written to again. Why? What is fundamentally different between flash memory and normal memory? In normal memory, the number of times you can read or write a location is practically infinite; nothing happens to that memory location. You keep writing, keep reading; remove the power and the contents are lost; again you can write something, and so on; there is no difference between the reading and writing operations. In flash memory, however, while you can read any number of times, if you want to write to a location then there are two possibilities.
Either you are writing it for the first time, in which case it takes a normal memory write, or you are writing it again, meaning something has already been written there, in which case there is a process called erasing the contents. So you have what we call electrically erasable memory, EPROMs or EEPROMs, and that erase action requires time. More importantly, how many times you can write and erase is not the same as how many times you can read. Erasing typically has to be done on an entire bank of memory; there are intrinsic characteristics like this because of the nature of the technology used. But once you have written something onto the flash memory, you can read it almost as fast as main memory, only slightly slower; you don't have tens-of-nanoseconds speed, but reading is of a hundred-nanoseconds-plus kind of speed. So you suddenly have a different dimension of technology: flash memory is much cheaper than main memory but still much costlier than disk. The reason you should be keenly following flash disks is that, for your generation and the generations next, the flash disk will become a genuine contender for storing the non-volatile data of your databases.

Yes? (to a question) You see, when I mentioned that there are now solid state disks available, those solid state disks come with an interface which is itself designed like the hot-swappable interface of a conventional disk. As a matter of fact, the packaging of a solid state disk is exactly like any other ordinary disk, so that the two are indistinguishable except for the speed of operations. So in a normal storage enclosure designed to take hot-swappable disks, it is possible to insert them; these days they even come with device drivers very similar to SCSI or anything else. Those layers add some overhead, but since the intrinsic performance is so much faster, the overheads don't matter. So the integration of flash memory into regular solid state disks, as a replacement for or complement to regular disks, has started happening in the last few years and will continue to happen. That's why I said your generation of information engineers has the privilege of witnessing this transition, and therefore also the privilege of deciding how to optimally and most cost-effectively use these kinds of flash disks along with conventional disks. Is this clear?

The problem is that flash has a limited number of write cycles, or erase cycles, and that's the reason why you may not be comfortable with it; by the way, the number of such write-erase cycles is increasing every year with technology. Reads are roughly as fast as main memory, but writes are slow: a write may take a few microseconds, and erasing is slower still. People are improving the write speeds, but writing continues to be much slower than reading. The cost per unit of storage is roughly similar to main memory; this is the biggest advantage, that roughly at main memory price you are able to get solid state storage which is non-volatile, so you don't require battery backup. So even with these problems, you can live with it. Obviously the biggest usage of flash memory today is in embedded devices: all digital cameras take a storage card so that your photographs remain there, and speech recorders and music players all have this kind of storage. In fact, flash memory is often used to carry the software versions for embedded systems.
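As a toy illustration of this write-once-then-erase behaviour, here is a small C model of a single flash block; the block size and erase limit are made-up numbers, not taken from any real device, and a real controller would copy live data out before erasing rather than wiping the block as this sketch does:

```c
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE   16        /* illustrative block size */
#define ERASE_LIMIT  100000    /* illustrative wear limit  */

struct flash_block {
    unsigned char data[BLOCK_SIZE];
    unsigned char written[BLOCK_SIZE];  /* 1 if location already written */
    long erase_count;
};

static void erase_block(struct flash_block *b) {
    memset(b->data, 0xFF, BLOCK_SIZE);  /* erased state */
    memset(b->written, 0, BLOCK_SIZE);
    b->erase_count++;                   /* each erase wears the block */
}

/* Writing a fresh location is a plain write; rewriting forces an
   erase of the entire block first, and erases are limited. */
static int flash_write(struct flash_block *b, int loc, unsigned char v) {
    if (b->written[loc]) {
        if (b->erase_count >= ERASE_LIMIT) return -1;  /* worn out */
        erase_block(b);
    }
    b->data[loc] = v;
    b->written[loc] = 1;
    return 0;
}

static unsigned char flash_read(const struct flash_block *b, int loc) {
    return b->data[loc];                /* reads are fast and unlimited */
}

int main(void) {
    struct flash_block b = { .erase_count = 0 };
    erase_block(&b);                    /* start from the erased state  */
    flash_write(&b, 3, 42);             /* first write: plain write     */
    flash_write(&b, 3, 43);             /* rewrite: forces a block erase */
    printf("loc 3 = %d after %ld erases\n", flash_read(&b, 3), b.erase_count);
    return 0;
}
```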
So you have a processor and some memory which drives, let's say, a washing machine. A washing machine has certain functionality, and please note that almost all the functionality you see in such gadgets is driven by software; software decides the function. Of course the software then drives the actuators, the sensors and everything, but it is essentially the software. So if you want to replace the software, how do you do that? You could go into the washing machine and do something, but in modern days it is fashionable to build embedded systems which can actually take their functional programs from a socket into which such contents are inserted as a USB flash drive, and this will become more and more common as we go ahead. So, electrically erasable programmable read-only memory: as I mentioned, 'read-only' is a misnomer; the moment it is erasable and programmable, you can write to it. The only points being made here are that writing is slow, erasing is slower still, but reading is fast, and the cost is not very high. Do remember this: going forward you will increasingly find flash disks replacing conventional magnetic disks for many applications. As was pointed out, in embedded systems it is already happening; you don't require a conventional magnetic disk anymore, these flash disks are good enough, and almost all digital cameras and similar devices have them. Some of you might have seen, long ago, a handycam which came with a CD or DVD: you would write directly onto a CD or DVD. You don't do that anymore; you write directly to solid state storage, and that solid state storage can later be used to transfer the information from one place to another.

Next we discuss the central processing unit, which traditionally is said to comprise a control unit and an ALU, or arithmetic and logic unit. The control unit traditionally controls the sequencing of signals and the movement of information across the circuits of computation, and the actual computational circuits are called the arithmetic and logic unit. So the ALU is the one responsible for the actual execution of instructions. Many of you would know the basic mechanism of instruction execution, but for those of you who do not have a background in electronics and computer processors, I would like to very briefly present at least a simplistic picture of how instructions are executed inside the computer, so you get an idea of what happens inside.

What I am depicting here is, let's say, the main memory of the computer, which contains instructions. Let me first tell you that the way digital computers work, the memory is used not only to store the data but also to store the instructions of the program. Here we are not talking about a Fortran program or a C program or an SQL program, but about the program which results when you compile the high level language program and link it with all the libraries to create what is known as an executable binary. An executable binary contains pure binary instructions which the computer can execute. All the instructions of a program ordinarily reside on the disk as part of that program, but whenever you tell the system to execute the program, the entire program, meaning all its instructions, comes and sits inside the computer's main memory. What I am depicting here are two of these instructions. A typical instruction in binary format has two components.
One component is called the operation code, or opcode. For example, 0 1 0 1 1 0 1 0 may be an operation code implying that an addition of some numbers is to be done, 1 0 1 1 0 0 0 1 may mean a subtraction is to be done, some other code may mean put this data into this location, some other code may mean take the data from that location and bring it to the processor, another code may mean compare these two numbers. Whenever an instruction is executed, it will typically require some data to be fetched from memory and handled, and that data is pointed to by the remaining part of the instruction, known as the address part. I have shown here an 8-bit operation code and, as the address part, however many bits are needed to address a memory location.

But suppose I can store only one address. When the operation code says add two numbers, I need to specify three locations, right? Two numbers to be added, both sitting somewhere in memory, and the result also has to be kept somewhere in memory. How will I specify three locations? So people do a trick. When they design the circuit, they decide what the meaning of an addition with a single address is: you have a register inside the processor, and we assume the register already contains one of the numbers; when the address is specified, the contents of that address are added to the register, and the result is put back in the register itself. So suppose I want to add 7 and 5 to get the result 12, and let's say I have the number 7 in one location and 5 in another. By executing one instruction I get the 7 into the register; the register now contains 7. Then I execute an add instruction whose address part points to the 5. What that instruction means is: take the contents of the register, take the contents of the address shown here, which is the number 5, add the 7 and the 5, and put the result in the register itself. So when you need to specify three addresses, two for the operands and one for the result, you can get away with specifying only one address by letting the register be the default address for the other two: one of the operands is in the register, and the result will also be in the register. I hope you can appreciate this simple trick. Of course this means that if you really want to add two numbers, you first have to fetch one of them from memory into the register, for which there has to be one instruction, then you perform the addition, and finally you have to take the result back from the register and put it in some actual memory location. So you have to execute three instructions to carry out one addition; but since instructions execute fast, that is okay. This is the way instructions work inside the machine.
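Here is a small C sketch of such a single-address (accumulator) machine running exactly this three-instruction add; the opcode names and the little program are, of course, made up for illustration:

```c
#include <stdio.h>

/* A toy single-address machine: each instruction is an opcode plus
   one memory address; the accumulator register is the implicit
   second operand and the implicit destination. */
enum { OP_LOAD, OP_ADD, OP_STORE, OP_HALT };

struct instr { int op; int addr; };

int main(void) {
    int mem[8] = { 7, 5, 0 };     /* mem[0]=7, mem[1]=5, mem[2]=result */
    int acc = 0;                  /* the register (accumulator) */

    /* Three instructions to carry out one addition, as described above. */
    struct instr prog[] = {
        { OP_LOAD,  0 },          /* acc <- mem[0]        (acc = 7)  */
        { OP_ADD,   1 },          /* acc <- acc + mem[1]  (acc = 12) */
        { OP_STORE, 2 },          /* mem[2] <- acc                   */
        { OP_HALT,  0 },
    };

    for (int pc = 0; ; pc++) {            /* pc is the program counter */
        struct instr in = prog[pc];       /* fetch */
        switch (in.op) {                  /* decode */
        case OP_LOAD:  acc = mem[in.addr];   break;  /* execute */
        case OP_ADD:   acc += mem[in.addr];  break;
        case OP_STORE: mem[in.addr] = acc;   break;
        case OP_HALT:  printf("result = %d\n", mem[2]); return 0;
        }
    }
}
```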
However, how exactly does one instruction execute? Let's say this is that instruction; consider it an add instruction, so there is an address part, it has to add to the existing register value, which let's say is 7, and the address points to the 5. First of all, as long as the instruction is in main memory it cannot be executed. Who executes instructions? The processor. And the processor has two parts, as we just mentioned: the arithmetic and logic unit, which as the name suggests is responsible for carrying out addition, subtraction and so on, and the control unit. But this whole thing cannot do any operation unless the instruction is brought from the memory into the processor. This particular operation of getting the instruction from the main memory into the processor is called a fetch operation: when the fetch operation is executed, the computer goes to the memory location, fetches the entire content and brings it into the control unit.

Once the instruction has been fetched inside the CPU, the computer will carry out a decode operation. The instruction is now here; what does decode mean? Decode means the processor looks at the operation code part of the instruction, sees that it says 0 1 0 1 1 0 1 0, and suddenly recognizes: ah, this means addition. Some other bit pattern may mean subtraction, another may mean load this memory content into the register, another may mean store the register content into memory, whatever. After the decode operation the computer understands that it has to perform an add activity. That add activity will be performed by the arithmetic and logic unit, with respect to the register, assumed to contain one value, and with respect to the other address, found during the decode phase, from which the other operand has to be fetched. All these operations are done by the ALU, and that part is called the execute phase.

There are two questions we need to answer. One: how does the computer know that it has to fetch the instruction from this location and not from some other location? This is the beauty of the architecture of the basic digital computer, which is still known as the von Neumann architecture. Have you heard of this gentleman? I think I have mentioned his name once; he is the creator of the concept of the stored program digital computer. He was a great mathematician-cum-engineer of the last century; he is also credited with a complete new field of knowledge called operations research, and he wrote a well-known treatise along with Oskar Morgenstern, the Theory of Games and Economic Behavior. But he was also the creator of this particular logic. So, remember I mentioned the executable program; let's say it has 1000 instructions. An executable program typically executes its instructions one after another. So what the operating system does is put all the 1000 instructions in memory, and whatever the starting address is, that address is passed on to the processor saying: you have to start from here. The register holding that address is called the program counter; it says, I am executing the instruction at this point. The computer automatically fetches the instruction from the location the program counter contains, decodes it, executes it, and once that instruction is executed, unless something else happens, it goes to the next location: it simply increments the program counter by one, fetches the next instruction from the next location, and so on, automatically. So you don't have to worry about where to go to fetch an instruction. Of course you may have jump instructions, if instructions, etc., in which case, when you execute an instruction of the jump type, that instruction does no arithmetic operation but simply changes the program counter; it says, don't get the next instruction from this location, go somewhere else and get it. So this is how you decide where to get the instructions from.

Second: when you execute these instructions, now that we know each instruction requires a fetch, a decode and an execute, how long should it take the computer to completely execute one instruction?
Clearly there are three phases. We already agreed that any activity inside the computer happens when a clock pulse is generated, so at least one clock pulse is required to fetch an instruction, another to decode it, and another to begin execution. Whether the execution completes within a single clock cycle, or the ALU circuitry requires four, five, ten, twenty clocks to complete, depends upon the circuitry, but a minimum of three clock cycles appears to be required to execute one instruction. Now this would make the whole operation very slow, so the designers asked: can I make this faster? And they created the notion of a pipeline. What does a pipeline mean? Suppose an instruction has been fetched and the processor is busy decoding it. While that instruction is being decoded, the memory is free, so the processor says: why not go and fetch the next instruction while the previous one is being decoded? Now imagine the previous instruction is being executed; the decoder unit is free, so the decoder unit decodes the next instruction that has been fetched, and the fetching unit goes and fetches the third instruction. Consequently you have a pipeline: fetch here, decode here, execute here; when one instruction is being fetched, the previous one is being decoded, and the one before that is being executed. Once an instruction is executed it goes out of the pipeline; the next one is then executed, the one behind it is decoded, and another one is fetched. Assume, for simplicity, that each of these activities requires one clock cycle. Then, while executing one instruction end to end still requires three clock cycles, at the end of this pipeline you start seeing one instruction getting executed every cycle: in one cycle this instruction is executed, in the next cycle the one which was decoded gets executed. Consequently you have one-cycle-per-instruction kind of behavior, so if your clock rate is two gigahertz, you can actually execute two giga instructions per second of this type, as opposed to only two-thirds of that.

Obviously, the execute phase of decoded instructions may independently take a large number of cycles, and that continues to be the major distinction between two types of computer architectures. One is called the complex instruction set architecture and the other the reduced instruction set architecture: a reduced instruction set computer is called a RISC machine, and as opposed to RISC, the conventional kind is called a CISC machine. In a CISC machine, the execution of an instruction typically requires a variable number of cycles depending upon the complexity of the instruction: addition, subtraction, multiplication and division take much longer; loading and storing data from memory takes less time; compare-and-branch takes a different amount of time again. These are the different things that happen in CISC. The RISC machines, because of their design, have a reduced number of instructions.
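Going back to the pipeline arithmetic, the claim above can be written down in a few lines of C; the instruction count is an arbitrary illustration:

```c
#include <stdio.h>

int main(void) {
    /* Three stages (fetch, decode, execute), one cycle each; n is an
       illustrative instruction count. */
    long n = 1000;
    long serial_cycles    = 3 * n;       /* one instruction at a time    */
    long pipelined_cycles = n + (3 - 1); /* fill the pipe, then 1/cycle  */

    printf("serial: %ld cycles, pipelined: %ld cycles\n",
           serial_cycles, pipelined_cycles);
    printf("steady-state throughput: 1 instruction per cycle\n");
    return 0;   /* at 2 GHz this gives ~2 giga instructions per second */
}
```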
Consequently, a conventional high level language program may compile into many more RISC instructions than CISC instructions, but the guarantee is that a RISC machine will execute one instruction per cycle, and the pipeline gives you this one-cycle-per-instruction kind of execution time.

Today's processor chips no longer contain a single ALU and control unit, that is, a single processor; they contain what are known as multiple cores. A core is a combination of an ALU and a control unit and is able to independently execute instructions. Consequently, if you have a four-core processor, what it means is that you have actually got a four-processor machine, but these four processors are not separate, as in four different processor chips or boards; they are all integrated inside a single chip. Now imagine the complexity of the architecture required to support this kind of thing. Obviously, if a core is an independent processor, you must provide it its own cache, you must provide an independent path from it to memory, you must provide isolation of the instruction execution of that core while other instructions are being executed by another core, and so on. In short, whatever you would do with multiple processors existing inside a normal computer architecture, which we shall see in a moment, you are now required to do inside a single chip when you have multiple cores like this. The purpose of sharing all of this is to indicate that processor technology is also rapidly making strides and giving you performance possibilities which were hitherto not available.

To go back to our slide, then: is it clear that the central processing unit has a control unit and an arithmetic and logic unit, and that the way instructions are executed is that every instruction needs to be fetched, decoded and then executed, after which the computer goes on to execute the next instruction in sequence? Because it automatically fetches instructions in sequence, you have this behavior of a stored program computer where instructions are executed one after another, and you now understand the notion of the pipeline that gets intrinsically created inside, through the additional circuitry.

This is again a recap of something I already mentioned: the processor has registers, which are something like fast memory locations internal to the processor; data read from memory has to be processed and written back, and the processing happens inside the processor using these registers as temporary storage. A clock synchronizes the actions; typical clocks could be 300 megahertz, 500 megahertz, 1 gigahertz. This has improved from about 1 kilohertz upwards, so you can imagine the enormous speed increase which has happened in computers. 1 kilohertz means 1000 cycles per second, so one clock would be one millisecond; that was the fastest speed in those days, and memory access times were also of the order of milliseconds, so it was okay; nothing happened faster than that. All machines were what we would now call complex instruction set computers: the typical execution time of any arithmetic instruction would be anywhere between 10 and 200 cycles, and if you had floating point arithmetic to be done, it was very complex indeed. However, things have changed drastically; that is how the speeds have improved.

Another recap: we mentioned RISC and CISC processors; RISC stands for reduced instruction set computers, and CISC therefore stands for complex instruction set computers.
Normal computers, which are complex instruction set computers, were never called that until the design of the reduced instruction set computer emerged, after which all the older style computers came to be called complex instruction set computers. In terms of the popular architecture, the one known as the x86 architecture, we are familiar with the Intel processors: Intel 286, 386, 486, Intel Pentium, and so on. All the Intel processors up to the 80486 were strictly CISC processors. The Pentium and beyond have a RISC kind of core, but because they need to maintain backward compatibility, these processors are both CISC and RISC: externally they execute the CISC instruction set, though internally their execution is RISC-like.

We have already discussed the notion of pipelining. Pipelining becomes very elaborate now: for example, instead of having a single fetch unit, can I have multiple fetch units? Can I have multiple decode units? Multiple execution units? And having multiple cores inside the same processor means: can I have multiple combinations of all of these? Which is exactly what you see now. Pipelining is a beautiful engineering concept which actually gives you larger throughput even though an individual instruction may still require multiple cycles or multiple pulses.

The number of instructions you can execute per second has always been a crude measure of the performance capability of a processor. The early processors could execute a few thousand instructions per second; then it went to hundreds of thousands of instructions per second. The first processor which could execute one million instructions per second was called a one MIPS processor; it was a processor from a company called Digital Equipment Corporation. The DEC VAX machines, the 11/750 and 11/780, for the first time deployed a processor which could execute one million instructions per second. In today's terminology, if one instruction is executed per cycle, what clock rate would one million instructions per second mean? One megahertz. Compare one megahertz with one gigahertz: at a very crude level, how much faster is a processor with a one gigahertz clock? One gigahertz is 1000 megahertz, so a thousand times faster; you have 1000 MIPS processors today. Not very long ago, the 11/750 and 11/780, machines of the early 80s which continued up to the mid 80s, that is, just 20-25 years ago, were a thousand times slower, and they were considered the then fastest processors; ordinary processors were slower still. That is just to recapitulate the history of this movement.

MIPS was therefore an important parameter for measuring performance. MIPS was not defined over a RISC type of processor where every instruction takes the same time to execute, namely one cycle; in those days different instructions took different numbers of cycles, and if a computer had to execute one million instructions per second, you had to define what combination of instructions you were considering. Consequently people created an equivalent of a measurable load: so many add instructions, so many multiply instructions, so many divide instructions, so many jumps, so many loads, so many stores, a variety of things; you take this conglomerate together, find out how long it takes to execute, and divide that by the total number of instructions executed to find out the MIPS rating of the processor. So MIPS was not a single type of simple instruction being executed again and again;
it was a well-defined suite of instructions, a combination representing a typical computational workload. It of course handled only integer operations. For comparison, it was very obvious that for computational purposes floating point operations cost much more; you agree with that? Adding or subtracting two simple numbers versus adding, subtracting or multiplying two floating point numbers: the floating point case is much more complicated than integer arithmetic. Consequently there was another measure called FLOPS, floating point operations per second, and you again tended to measure millions of floating point operations per second. Computers which actually did heavy computing in the early days, to justify that name, measured their performance in megaflops, and even today you measure the performance of supercomputers in terms of FLOPS, because the floating point operation is considered the most complex instruction, and how many floating point operations you can execute per second is an important measure. The parallel computers of today don't perform at one, two or ten megaflops; they don't even perform at one gigaflop; they aim at teraflops. And for teraflops you just don't have a single processor which can execute instructions at that speed, so very obviously you require a different strategy, as we shall shortly see in the next session: the idea of putting multiple processors together to harness their computational power, to get much more computational speed than is otherwise possible with a single processor, is what has resulted in the different kinds of computer architectures. We shall see that in a short while, but first let's complete the discussion on the basic measures of performance for a processor. So: MIPS and megaflops; is that clear? These were the early measures.

The more common measures applied today to the basic computational capability of a processor are SPECint and SPECfp. SPEC is another specification, much like a combination of a certain number of instructions put together, measuring the execution time of those instructions and finding an average, except that where MIPS and FLOPS represent combinations belonging to the old style, SPEC has come to be the new specification of the combination of instructions considered a measurement kernel, or measurement benchmark. This benchmark exists for both integer and floating point operations, and consequently you have SPECint and SPECfp. So instead of MIPS and megaflops, which nobody really talks about today, you talk of SPECint and SPECfp; these are the two parameters invariably published by the manufacturers of processors. As far as an individual processor's raw computational capability is concerned, it is always measured in terms of SPECint and SPECfp. As we shall see later, this capability need not necessarily translate into a corresponding capability for doing information system transactions fast; for that there is a set of other parameters and characterizations required, and we shall comment on that performance measure separately. But as far as the raw processing capability of a processor is concerned, you measure it in terms of SPECint and SPECfp. Is that clear?
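In the spirit of those older measures, here is a crude C sketch that times a fixed number of floating point operations and reports megaflops; the loop kernel and counts are arbitrary stand-ins, not a real benchmark suite like SPECfp:

```c
#include <stdio.h>
#include <time.h>

int main(void) {
    long n = 100000000L;           /* 1e8 multiply-adds = 2e8 flops */
    double x = 1.0000001, s = 0.0;

    clock_t t0 = clock();
    for (long i = 0; i < n; i++)
        s = s * x + 1.0;           /* one multiply + one add */
    clock_t t1 = clock();

    double secs   = (double)(t1 - t0) / CLOCKS_PER_SEC;
    double mflops = (2.0 * n) / secs / 1e6;
    /* print s so the compiler cannot optimize the loop away */
    printf("s=%g  %.1f MFLOPS\n", s, mflops);
    return 0;
}
```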
The native language is a notion which says: given a processor, it can execute instructions written in its native language, and two processors have different native languages if, for the same instruction, they use different coding schemes. As simple as this: if for one computer 1 0 1 0 1 1 means addition, but the same pattern means subtraction for another, then that fellow's language is different. No matter what the native instruction set, or native language, of a processor is, we are buffered from all that, because we have compilers: a compiler translates a high level language, which is common, into the particular native language code of that processor. That is the reason you cannot survive without compilers; when you buy any new machine, you must have compilers which will translate your programs into the native language of that particular processor.

In terms of memory, there are multiple chips; this is slightly old information: 256 Kb, 1 Mb, 4 Mb, 8 Mb, 16 Mb, 32 Mb chips. When I say Kb, notice that this is a small b: kilobits, megabits. An individual chip holds only bits. If you want to make a word of 8 bits, where you want to store a byte, you typically put 8 such chips together: chip 1 represents the 0th bit, chip 2 the first bit, and so on, whether for 8, 9, 16 or 32 bits. Each chip can be handled independently in terms of reading or writing a bit at a location, so, much like a cylinder on a disk, you have, across these 8 chips, your location 1 which goes across them, then location 2, location 3, and so on. The larger the capacity of a chip in bits, the larger the total byte capacity when you put them all together. It is customary not to put just 8 bits to organize even a byte-sized word; you normally have additional chips corresponding to parity bits; you can have one parity bit, two parity bits, error correcting codes, error detecting codes; all of these are deployed on the main memory.

The word length and the address length are two important concepts. Word length is the capacity of a unit, and address length determines how many such units you can have. For example, a word length of 8 bits means each memory location is 1 byte long; a word length of 64 bits means each memory location is 64 bits, or 8 bytes, long. But how many such locations you can have depends upon how many addresses you can form. The situation is comparable to a hypothetical residential colony of houses, where each house has a certain capacity in terms of rooms: that is the word length, how many rooms there are in a house. How many houses can you have in the colony? That depends upon the addressing scheme. Suppose you have decided that you should be able to uniquely address each house by a three-digit number; then the smallest number you can assign a house is 000 and the largest is 999, so you can uniquely identify a thousand houses, no matter what the capacity of an individual house is: whether a house has four rooms or one room or eight rooms doesn't matter. In computer technology, all houses have the same number of rooms, the same number of bits or bytes, and that is the capacity of a house, the word length; and the address length is how many bits you specify to uniquely address a particular memory location. So if you have a 64-bit addressing scheme, how many addresses can you specify? 2 to the power 64, so you can have 2 to the power 64 distinct locations; whether each location is one byte or two bytes or four bytes or eight bytes depends upon the word length.
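Both the addressing arithmetic and the parity idea above can be put in a few lines of C; the address widths and the word value chosen are illustrative:

```c
#include <stdio.h>

int main(void) {
    /* Distinct addresses for a given address length. */
    int widths[] = { 8, 16, 32 };
    for (int i = 0; i < 3; i++)
        printf("%2d-bit addressing -> %llu locations\n",
               widths[i], 1ULL << widths[i]);
    /* 64-bit addressing gives 2^64 locations, one past what a plain
       unsigned long long shift can display without overflow. */

    /* An extra parity chip stores one bit per word: under an even-parity
       scheme, the stored bit keeps the total count of 1s even. */
    unsigned char word = 0xB1;                /* 1011 0001 */
    int ones = 0;
    for (int b = 0; b < 8; b++)
        ones += (word >> b) & 1;
    printf("word 0x%02X has %d one-bits -> stored parity bit %d\n",
           word, ones, ones & 1);
    return 0;
}
```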
Ordinarily, almost all computers have memory organized as byte memory, meaning each location is one byte, and how many locations you can have depends upon the addressing scheme. I already mentioned single-point versus multi-point access. Single-point access means you build the memory structure such that you can access it only through one door, so while one fellow is accessing the memory, nobody else can. Multi-point access is a very common thing in memory: since memory is the main place from which instructions have to go to a processor, data may have to go to disk or come from disk, data may have to go to a printer unit, and there may be multiple processors which all require the memory; consequently the memory has four doors, eight doors, 24 doors, multiple access points, so that different devices on the bus can access different portions of the memory. Naturally, nobody wants multiple parties accessing the same location of the memory at once.

(to a question about 16-bit and 32-bit addresses) What he is pointing out is the fallacy that we talk of huge storage capacities, but if the addressing scheme is smaller, how do we handle that? We had a very peculiar situation at a particular computer company in those days in India; we used to design computers and design and write operating systems and compilers, because import was not possible then. The only processors that came were 8-bit processors, which had a 16-bit addressing scheme. How much maximum memory can you have with 16 address bits? 2 to the power 16, that is 64 kilobytes; so the 8-bit microprocessor based CP/M machines typically had 64 kilobytes of memory, because that was the maximum memory possible. Then came the processors with a 20-bit addressing scheme, which suddenly meant you could address 2 to the power 20, that is, one megabyte of memory, and they said: here you have one megabyte of memory. One such computer was purchased by a metallurgical engineering department, on the claim that you could have one megabyte of memory. A research group there was working on a lot of computational work where they required larger arrays, so a professor took his Fortran program, in which he had earlier declared smaller arrays because bigger ones could not be accommodated, declared bigger arrays this time, and started running the program; and the compiler said: cannot allocate memory. And he said, what is this nonsense, the computer has more memory. The computer had larger memory, but what they were given was the old compiler; the old compiler knew that the maximum memory was 64 KB, so it would not allocate more than 64 kilobytes to any single array. So you can see how the limitations can persist even after the basic limitation is removed, because it is an end-to-end system: you must make sure that every component handles the larger capacity when it becomes available at the lower level. That's a fact of life; it happens.

Now take the same thing for disk: how long should a disk pointer be? Imagine the pointer length is 32 bits, a pointer being an address of a disk block. If you have 32-bit pointers, how many disk blocks can you address? 2 to the power 32, which is 4 giga blocks. What if you have a disk capacity which is much more than this? How will you access it?
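To put numbers on this pointer arithmetic, here is a C sketch computing how much disk a block pointer of a given width can address; the 4 KB block size is an illustrative assumption:

```c
#include <stdio.h>

int main(void) {
    unsigned long long block_size = 4096;   /* illustrative block size */
    int widths[] = { 16, 32, 48 };
    for (int i = 0; i < 3; i++) {
        unsigned long long blocks = 1ULL << widths[i];
        double bytes = (double)blocks * (double)block_size;
        printf("%2d-bit pointer: %llu blocks = %.2f GB addressable\n",
               widths[i], blocks, bytes / 1e9);
    }
    /* 32-bit pointers: 2^32 = ~4 giga blocks, ~17.6 TB at 4 KB blocks */
    return 0;
}
```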
Suppose you are interested in accessing an individual block, one of the identifiable blocks in a disk conglomerate of 1000 disks, and you want to treat all these 1000 disks as a single volume and identify each block individually: you can't. Is this limitation based on the processor? No, because how long your pointer should be is a decision taken by whoever is writing the software; the processor has nothing to do with what the length of a pointer for accessing a block should be. A disk is a disk: you can divide it into as many blocks as you want, you can make blocks of very small size or of larger size. These aberrations, as was pointed out, have happened in the entire industry through all the years, and the effort is to keep modifying things to remove them.

On avoiding this problem of address space: there is one particular computer, the IBM AS/400; some of you might have heard of it. The whole AS/400 architecture was originally designed by Dr. Frank Soltis, and I remember when I met him many years ago in Rochester, Minnesota, a very cold place, where the AS/400 work happens. He said: people talk of abstraction and object orientation; we did all that in the AS/400 long ago. The AS/400 has some processor with its memory access, word length, address space and so on, but the entire AS/400 software was conceived and written assuming a 128-bit pointer size. So the address space logically was always 128 bits, no matter whether the underlying processor was 8-bit, 16-bit, 32-bit or anything else, and this abstraction was superimposed on whatever hardware constituted the AS/400. I would say that was a very visionary decision; not all decisions are taken like that, particularly for commodity computers. So yes, the address space does matter, and it is prudent now to talk about address spaces, both for memory and for disk, of 64 bits, minimum 32 bits; in fact 32-bit is fast going out of popular usage because it is considered clearly inadequate for the growing memory inside computers.

(to a question) Disk access time will not improve because of this; the access time is still determined by the disk rotation speed and the arm movement: once you have to move the arm, the physical movement of the arm and of the platter still determines the access time, so that will not change.

(to a question) When we talk about memory we always go in multiples of two; so if you go in multiples of two, a disk should be 64 GB; why is it 80 GB, and where does that number come from? Actually it may even be 72 GB. Yes, it's an interesting question: when we talk of memory we generally talk in terms of multiples of two; why don't we do that when we talk of disks? Let us analyze it from the perspective of how these devices evolved. The memory address space, which is actually the defining parameter for the amount of memory you can have, has always been measured in bits: the address space is 16 bits, 32 bits, whatever. Since you follow the binary system in addressing, all capacities are integer powers of two, 2 raised to the power something, and that is how memory capacity is always defined, in terms of 2 to the power something. On the other hand, if you take the disk, the disk is a rotational magnetic medium: depending upon how many platters you have, how many tracks you have on each platter, and what the capacity of an individual track or sector is, the
total capacity will be determined it was not uncommon for a long time for the disk tracks to have six platters and ten surfaces six platters should have twelve surfaces so what they used to do is because of the packaging the top surface could not have a arm and the bottom surface of the last disk could not have enough so you have ten surfaces on which you had this disk