Today we are going to discuss the technology components that we have so far noted only peripherally. Primarily our concern will be the storage devices and the processing devices, that is, the processors or CPUs. Some devices are called peripheral devices: the disk, the tape, the mouse, the keyboard, the monitor and such things. We will not spend too much time on peripheral devices now, although I will give you references later so that you can look them up. The most important peripheral device, which is not really a peripheral, is the storage device or the disk, because in information systems much of our information resides on it. So we shall look at the disk in a bit more depth. We shall also very quickly look at the storage hierarchy that a computer system presents. And of course, whatever information you store, in whichever portion of the hierarchy, has to be processed. So we need to understand a little more about the processing capabilities of a computer and what they mean to us in terms of information processing capability. A computer processor executes basic instructions at a phenomenally fast rate, billions of instructions per second. But we are concerned with how the information that is relevant to us, the business information, is processed, and how to correlate our kind of information processing requirements with the computer's innate ability to process information. In short, we are going to look inside the computer now and see what exactly it holds for us. Although we will be discussing technicalities that pertain to what you may call computer architecture, our emphasis is most certainly not on the electronics of computers, nor too much on the internal intricacies, but on what impact these technologies have on information processing.
So in the first session we will discuss the storage device, mostly the disk, and in the second session we will discuss the main memory and the processor architecture, which together make up the computer's box. Firstly, how do you classify storage of information? Storage could be of a variety of kinds. You can classify it based on the speed with which data can be accessed: data is stored somewhere, and how quickly can you access it? Then the cost of storing the data, per unit of data. And finally the reliability: how reliable is the storage? Will the data be lost if power fails or if the system crashes? What happens if the storage device itself fails, a physical device failure? Because in any information system you cannot afford to lose data.
Before continuing with this discussion, let me ask you some questions from your knowledge of common information storage that you and I come across. Look at our physical library. We have books stored there. With what speed can you access a book? Let's say you are locating a book on databases, so you go to the library. What is your search mechanism? There are more than a lakh of books, 100,000, 200,000, 300,000, whatever the number. Surely you don't start searching from rack A, then rack B, then rack C. So what do you do? Yes, you have a classification number or an accession number. How would that help? The accession number will point to a particular rack or place where that book is. But the point is you have to know the accession number. Let's say you know that there is a book by Korth, Silberschatz and Sudarshan on databases. You don't mug up the accession number as IBM 147 315 or whatever. How do you find that? You do a search, and you do a search on what? On a catalog. In the old-style libraries you will have card decks; in a modern computerized library you can do a search on the computer. But if you have a card deck, usually you will search the cards by title
or by author, and once you identify a particular book you find the accession number, you find the place, and then you go physically to that rack and take the book out. Without this indexing mechanism you will agree that it would be nearly impossible to search for a book. You would only be searching for the book throughout the semester. If you take, let's say, 15 seconds to take out one book and see whether that's the book you want or not, you can multiply 3 lakh by 15 seconds and find out the number of seconds and the number of hours required (about 4.5 million seconds, or 1,250 hours). Impossible, essentially. So indexing is a key thing, whichever be the storage mechanism.
How do you calculate the cost per unit of data in the case of physical storage of books? There are two things here. The physical book itself occupies a place on the library rack. So the total cost of the racks plus the cost of the building in which the racks are housed, divided by the total number of books, will give you the average storage cost per book. There is a capital investment in that building and the racks, and there is an operational expenditure on running the library. If you apportion that entire thing, you can get approximately how much money storage alone will cost per book; and of course the original cost of the book itself is an additional cost.
Consider information searching nearer home. Each one of you would have multiple notebooks in which you take notes. You will have notebooks belonging to, let's say, a previous semester, and you would have recorded something somewhere. Have you calculated the cost of storing one notebook full of information in terms of bytes? How many bytes can you write in a notebook? Take one byte to be one character, counting blanks. On one page, how many bytes can be written, how many characters? Nobody has done that kind of calculation. So that is the cost per unit of data. What is the speed with which data can be accessed? Imagine that you have notes for this session taken
on one of the notebooks, and you have, let's say, ten such notebooks, and you have to appear for a surprise quiz or an exam, and you are searching. First you will have to identify the particular notebook which contains this. How long may that take? There are only ten notebooks; in the worst case you do a serial search, and in about a minute's time you will be able to access that one. Within that notebook, how will you find out where the material for this session is? Or if it is not this session but a particular key that you are searching for, like schema design, how will you find it within a notebook? You flip pages. Please remember that very rarely do we organize our own notebooks with indexes, although many notebooks provide for that first-page index. We do not write: this is the topic, this is the page number. That is because we can quickly skim through the pages and figure it out. But what is the time that is actually required? Suppose you want to locate, say, information about normal forms in your hostel room. About a minute to search for the right notebook, and another half a minute or one minute to go to that particular topic, so about two minutes. Is that a fair judgment? OK.
What is the reliability? There is no power failure for notebooks, but I may lose my notebook, or there might be a fire in my room and all the notebooks may be burnt. That data loss we can call a system crash. A system crash means the book is just lost, or soiled; it falls into a bucket of water. I remember one notebook was put on a bicycle carrier by a friend of mine. This happened 30 years ago. When he came out there was no notebook on the bicycle; a cow was eating that notebook. We jokingly told him that his notes were liked even by the cow, but he said, "What about me? My notes are gone." So all kinds of funny things can happen in life. How do you safeguard against them? The notebook is not a very good example, because there is a natural
safeguard: my friends' notebooks. Since there are typically 30 to 40 students attending a class, if I lose my notes I can always depend upon somebody else's. If nobody has the notes available, there is a textbook on the basis of which I can sort of reconstruct them. So you solve the problem of data loss in real life by ensuring redundancy.
What if there is no redundancy somewhere? Take your identity card, for instance. Take the case of IIT Bombay: you lose your identity card. What happens? The immediate implication: if you are not well and you go to the hospital, they may throw you out. If you want to go into the lab and the watchman asks for your identity card, you can't get in. You need an identity. But the identity card can be replicated, because the original identity card's information is maintained somewhere else as a backup, and from that backup your identity card can be restored. It is one piece of information.
In general, you will then agree that if I want to characterize storage, I must worry about the speed with which I can access data, the cost of storing that data, and the possibility of loss of data. Ideally, the speed with which I can access the data should be instantaneous. Ideally, the cost per unit of data should be minuscule. And ideally, I should have an arrangement such that my data is never lost; I can always resurrect it. Of course, if the original copy is lost, I understand that I will have to spend some time in resurrecting it. We saw that in the case of a disk crash: how you recover using log records and so on.
It is in the context of these three questions that we shall examine the storage hierarchy inside the computer. Please understand that as far as the computer and its storage and peripheral devices are concerned, the computer is not worried about whether your data is about roll numbers and marks of students or whether it is your account number and balance. It doesn't care. The computer recognizes information
storage as a series of bits, zeros and ones, organized in a certain way, and the computer technology will worry about the accessing of those bits or bytes. So no interpretation is attached; interpretation is your problem. But at what speed can you access the required data? What is the cost of storing that data, the bits and bytes? And in case something happens to the storage, what is the way in which you can recover that data? We then have to map the bits and bytes onto our meaningful information and figure out how long it will take us to access the data, the cost of the data, and so on. This is just the background.
First, in terms of computer storage, we discriminate between volatile storage and non-volatile storage. Since computer technology is electronics, we speak of electronic storage as volatile storage if the contents are lost when power is switched off. Invariably the main memory of any computer system falls under this category. It is a volatile store: you switch off the machine and the contents are lost; you switch it on again and the contents are unpredictable. Unless you push something into that memory again, the memory contents are meaningless. Clearly, volatile storage is not suitable for our information processing needs from a long-term storage perspective. Non-volatile storage is one where the contents persist even when the power goes off. This includes secondary and tertiary storage, as well as what is called battery-backed-up main memory.
As I said, main memory is definitely volatile storage; it is called primary memory. The secondary and tertiary memories are pieces of storage which are non-volatile. So in case the main memory contents are lost, you will always use that secondary or tertiary memory, from which you can get the data back into the main memory. Sometimes you do require main memory itself to continue to contain what you needed, in which case you will provide a separate battery backup, so even if the main power is lost
the battery-backed memory continues to hold the same data. This is because electronic memory today is essentially built of what we call flip-flops, which can store zeros and ones electronically. When the power is switched off, those flip-flops lose their state, so afterwards there could be a zero or a one. In old times the main memory used to be made of magnetic cores. The magnetic cores, once magnetized either this way or that way, would not lose their magnetism even if power was switched off, so the magnetic core memory used to be non-volatile. Today's main memory is volatile.
This is the storage hierarchy. The hierarchy is described in order of the time that it takes to access information inside that memory. The fastest, from that perspective, is the cache memory. The next fastest is the main memory. Slower than main memory is flash memory. Slower than flash memory is the disk. Slower than the disk is the optical disk, and slowest are tapes. In this hierarchy, cache and main memory are volatile storage; all the other pieces of the hierarchy are non-volatile. So ideally, from an information system perspective, I would like to use only disks or flash memory or optical disks or tapes.
If the last four categories of the hierarchy are non-volatile, then the question naturally would be: why the heck do we use this kind of memory at all? What is the fun in having volatile memory if I am going to lose its contents? The problem is the way computers are architected. In the traditional von Neumann architecture, which was originally proposed in the late 1940s, you always have a combination of memory and processor: memory and central processing unit together form a computer. The memory, which is called the main memory, along with the CPU, is live only if there is power. If there is no power, neither the CPU nor the memory can function. That is how computers have been architected. Consequently, if at all you want to use computers to process your information,
your information being on non-volatile storage does not help unless that information can be brought inside the computer's main memory. A computer cannot process any information on disk or anywhere else; it can process only that information which is inside the main memory. So you have to constantly bring information from non-volatile storage into the memory, push it back, and so on, and that is why the cache and main memory are important. We shall see the distinction between cache and main memory later, in the second session, when we discuss the memory hierarchies along with the processor and look at the computer architecture. In this session we are going to concentrate on the disk, which is the mainstay of our storage systems.
Here is another way of describing the storage hierarchy: primary, secondary and tertiary. I used these terms some time ago. Primary storage is the fastest medium but is volatile; it comprises cache and main memory. Secondary storage is the next level in the hierarchy. It is non-volatile, with a moderately fast access time, but much slower than main memory or cache. Secondary storage is almost always called online storage. Online storage means something which is always physically connected to the computer. So whenever you switch on the computer, not only do the processor and main memory start working, but any online storage, typically a disk, is always accessible. It is just like a PC: you boot the PC, the operating system gets loaded, and you have access to your disk drive, all the directories, and so on.
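The hierarchy just described can be written down as a small ordered table. What follows is a minimal sketch in Python, not anything shown in the lecture: the level names, their fastest-to-slowest order and the volatile/non-volatile split come from the discussion above, while the names `StorageLevel`, `HIERARCHY`, `non_volatile_levels` and `is_faster` are illustrative assumptions. No access times are filled in, because none were quoted at this point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageLevel:
    name: str
    volatile: bool  # True if contents are lost when power is switched off


# Ordered fastest to slowest, in the order listed in the lecture.
HIERARCHY = [
    StorageLevel("cache", True),
    StorageLevel("main memory", True),
    StorageLevel("flash memory", False),
    StorageLevel("magnetic disk", False),
    StorageLevel("optical disk", False),
    StorageLevel("tape", False),
]


def non_volatile_levels(levels):
    """Names of the levels whose contents survive a power failure."""
    return [level.name for level in levels if not level.volatile]


def is_faster(a, b, levels=HIERARCHY):
    """True if the level named `a` sits above (is faster than) the level named `b`."""
    names = [level.name for level in levels]
    return names.index(a) < names.index(b)


print(non_volatile_levels(HIERARCHY))
print(is_faster("main memory", "magnetic disk"))
```

The point of the sketch is only that speed and volatility are separate attributes: the two fastest levels are exactly the ones that cannot hold information across a power failure.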
The tertiary storage is also non-volatile. It has slow access time and it is called offline storage. Typically a floppy or a CD would be tertiary storage, or a tape, because it is not ordinarily located inside the machine. So even if you switch on the machine, the storage is not accessible immediately unless you go, open a cupboard, pull out a CD and push that CD in. Till then it is somewhere else. That naturally has an impact on the time to access information. If some information is on a CD and not on the disk, you can't access it immediately unless you go and bring out that CD and push it into your computer. So online storage and offline storage both refer to non-volatile storage, but they differ in the access time, and primary storage is cache or main memory. Those are the three categorizations.
This is a schematic diagram of a disk drive. What you see on the right here is called the arm assembly. These are magnetic heads. You will notice that this magnetic head is seen here, but what you don't see is the magnetic head which is meant for the lower surface of the top disk, because that lower surface is also magnetic. Then you have another disk here, again with an upper surface and a lower surface, and another disk here. How many such disks can you have? All of these disks are put on a single spindle. There is one limitation there, of course: all the disks will rotate at exactly the same speed; there is no difference. Secondly, now you cannot just say "go to track number so-and-so and sector number so-and-so", because now there is the disk number also. How do you identify the disk number? The disk itself has two surfaces, so you may call them disk one surface one, disk one surface two, disk two surface one, disk two surface two, etcetera. Alternatively, because there is a magnetic head on each surface, you may simply number the heads: head one, head two, head three, head four, head five, head six, head seven, as many heads as you have, and that will tell you exactly which surface you want to read. The problem with
this arrangement is this. Suppose I want to read something from track 5 on the first disk and then after that from track 28 on the second disk. When I am reading track 5 on the first disk, all the other heads are also exactly on track 5 of their corresponding disks. It makes much more sense, then, that rather than writing or reading at arbitrary track numbers and head numbers, I say that I will read a given track on all the disks, or I will write the same track on all the disks. What will this save from the reading and writing time? The movement of the head across the tracks, because once a particular head is positioned on, say, the fifth track, all the other heads are also positioned on the fifth track of their respective surfaces.
Secondly, in a single rotation of the disk, all the heads, which can read and write independently, can be writing simultaneously. Consequently I get a lot of parallelism. If I have, let us say, five disks, five disks will mean how many surfaces? Ten surfaces. And if a track has a certain capacity, x bytes, then I can write 10x bytes in a single rotation. All such simultaneously writable tracks can be assumed to form what is known as a cylinder. Those of you who have done engineering drawing can imagine that if you take a cross section through all the disks, you see a cylinder. So this is the kind of cylinder that you have. How many surfaces does the cylinder have? This is the first surface, the second is below it, the third, the fourth below that, the fifth, the sixth, etcetera. If you have five disks, a cylinder has ten tracks inside it. So now you are talking about a track T here; however, there is surface one, surface two, surface three, surface four, etcetera. And it makes sense to read or write one cylinder full of information, because once you move the magnetic arm on top of a particular track, all the tracks within that cylinder are aligned with the
corresponding arms, so consequently you can read or write information very fast. This is exactly how a disk drive works.
In the very early days of computers, when such disk packs were made, these arms could be withdrawn and the entire spindle with its disks could be taken out, so these were actually called removable disk packs. Just to give you some idea of the cost: the first online disk pack that we had was on a machine called EC 1030, which we got in 1974 at IIT. In the late sixties these were introduced on IBM mainframe machines. I think there were four or five disks in a pack. The capacity of a disk pack was 7.25 megabytes. You may find it laughable, but that was a very large storage then. The disk pack used to be very heavy; lifting it would take some effort. The disk drive itself was roughly the size of this table. The disk drive cost was five lakh rupees, and the cost of a disk pack was 56,000 rupees, for 7.25 megabytes. You put it inside, and that was the early disk storage. Most of the storage used to be on magnetic tapes, but magnetic tapes were sequential devices, as we shall see. The point is that even though those devices were costly in those days, they were preferred for information processing because they permitted direct access, and we shall be spending some time on what direct access means in terms of our information.
Is this mechanism clear? Does anybody have any doubt about this mechanism? You basically move the arms across the surface, and as an arm rests on a particular track, you can read or write on all the tracks of the so-called hypothetical cylinder that is formed by the same track number on each of the surfaces. Yes? These are all tied to the spindle. When the spindle rotates, all of them move simultaneously. So if this disk is rotating, this one is also rotating, and this one is also rotating; they all rotate together exactly, like a rigid mechanical combination.
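The cylinder arithmetic above is simple enough to sketch in a few lines. This is an illustrative Python sketch under stated assumptions, not code from the lecture: the function names are made up, the head-numbering rule is just the "head one, head two, head three" scheme described earlier with disks and surfaces counted from 1, and the 10,000 bytes per track in the example is a hypothetical figure standing in for the lecture's x.

```python
def surfaces(num_disks: int) -> int:
    """Each disk on the spindle has an upper and a lower magnetic surface."""
    return 2 * num_disks


def cylinder_capacity(num_disks: int, bytes_per_track: int) -> int:
    """Bytes in one cylinder: the same track number taken on every surface.

    This is also the amount that can be read or written in one rotation
    without moving the arm assembly, since every head works independently.
    """
    return surfaces(num_disks) * bytes_per_track


def head_number(disk: int, surface: int) -> int:
    """Map (disk, surface), both counted from 1, to a single head number.

    disk 1 surface 1 -> head 1, disk 1 surface 2 -> head 2,
    disk 2 surface 1 -> head 3, and so on.
    """
    if disk < 1 or surface not in (1, 2):
        raise ValueError("disk must be >= 1 and surface must be 1 or 2")
    return 2 * (disk - 1) + surface


# Five disks give ten surfaces, so a cylinder holds 10x bytes.
x = 10_000  # hypothetical bytes per track
print(surfaces(5))              # 10
print(cylinder_capacity(5, x))  # 100000
print(head_number(2, 1))        # 3
```

With this numbering a location on the pack is fully identified by a cylinder (track) number, a head number and a sector number, and only a change of cylinder number requires any arm movement.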
No, because there are two different arms. This is one arm and this is another arm. This arm is accessing information from the magnetic coating on the top surface; this arm is accessing information from the magnetic coating on the bottom surface. So both surfaces are accessible, and there is an electronic mechanism to read or write independently from each magnetic head. If there are ten magnetic heads, ten reads or writes can happen simultaneously. You must appreciate why this is possible. The electronic speed with which the bits and bytes can pass between the magnetic heads and the inside of the computer is tremendously fast. As opposed to that, the movement of the disk is so slow that even if I read all of these simultaneously, I am not going to have any problem buffering that entire data and pushing it in, or collecting a large amount of data from the computer and separating it out. That is not a problem, as we shall see. So all of that is simultaneously accessible.
What is not simultaneously doable is this: I cannot be reading from track 5 using this arm and from track 7 using that arm. If I am reading track 5 from this surface, all the arms are positioned exactly on track 5. These arms don't move independently of each other. There is an arm assembly, and that entire set of arms moves forward or back. That is the limitation, and that is the reason why we form this concept of a cylinder. Otherwise the notion of a cylinder would not be there. Is that clear? Any questions?
You can have anti-clockwise or clockwise, but the way it is designed, it rotates only in one direction. Try to imagine something moving at, say, a thousand rpm, and just because you are somewhere on the track you say "now rotate anti-clockwise". How long will it take to stop this rotation and restart it? You are apparently not connected with mechanical engineering, and that is good, because if you were I would seriously doubt your degree. There is inertia, physical inertia of movement, and therefore, you see,
these disks, when they rotate, do not stop; they keep rotating perpetually, whether you are reading or not. If you stop, it is not as if you switch it on and can immediately start reading. You have to let the disk speed stabilize, and stabilizing that speed will require some time, at least one or two seconds. One or two seconds is a long time. If every time you want to read or write you shut down the disk and start it again, it would be extremely painful in terms of the speeds of the rest of the mechanism. All right?
So that is the reason why disks typically move only in one direction. And all the arms move together for the simple reason that you want to amortize the cost of the motor which moves the arm forward and backward. If you wanted each arm to move separately you could design that, but then you would have to provide a motor for each of the arms and it would not be cost effective. So all of that is connected with costing; that is all, design-wise. By the way, to answer this question more fairly, there have been attempts where a single disk has been made with multiple arms: one arm here, one arm here, one arm there, and those arms could be positioned separately. So as the disk moves you could read from multiple tracks, or you could say this arm will read only these sectors and those arms will read only those sectors, effectively increasing the speed of retrieval of data. But in a commoditized world you generally standardize the complete manufacturing and operational process, and that is how it is now.
Sorry? CDs are mostly single-sided, so the CD drive, if you look at it, will have a single head; it won't have two heads. But there are double-sided CDs available, and there are double-sided DVDs available. These are technical differences; fundamentally it does not matter. All of that is either increasing capacity or increasing speed or both. But ultimately the boundaries of movement will decide the speed with which you can access information, and
two things that you should always remember as being of paramount importance for a disk are these two: seek time and latency. As long as you can do simple arithmetic with these two quantities, you can know exactly whether a particular disk is useful for your purpose or not. As simple as that.
Yes? We don't ensure that. The disk drive and the processor, under the command of the operating system which controls the disk subsystem, ensure all that. If necessary it will not read as fast as the physical disk drive may permit. For example, suppose you have connected 400 disk drives to a single computer; 400 is possible. Now you have 400 controllers, but the computer's internal bus is not capable of handling that kind of traffic. Then even though you may issue an instruction to read from all 400 disks, the operating system internally will decide that it will first read from these 20, next from those 20, next from those 20. You and I don't have to worry about it. If we were to worry at that level, we would not be building information systems; we would all be computer technologists worrying about bits and bytes and reading. The idea of the technology is to buffer the user from the technology-specific points. What we are discussing here is just to understand how these technologies work, but we are buffered by several layers, which we shall see shortly.
So, coming back again to the access time to information, whether you want to read or whether you want to write, there are two main components. Seek time: the time to move the access arm. And latency, or rotational delay. The general term latency is perhaps the most important term in information processing, because you have to worry about the latency of moving or accessing information. But in the context of disk drives latency is used to mean rotational delay, which is on average half a rotation. Is it very clear why it is half a rotation on average? Because you might have just passed that point or you might be just reaching that point, so it is either zero
latency or one full rotational delay. And seek time, of course, is the time to move the access arm. Using these two simple things you can do very reasonable computations to find out whether your requirements are met or not.
Here are some storage access timings. Rotational speed for a floppy disk: 360 revolutions per minute. So what is the time for one rotation? How do you find that out? 360 rotations per minute means 360 rotations in 60 seconds, so 6 rotations in one second; 1000 milliseconds divided by 6 is about 166.7 milliseconds for one full rotation. The figure shown is 83.3 milliseconds (not exactly 83.3, but 83.333 and so on) because the latency is half a rotation, not one full rotation. Is that number correct? Can somebody cross-check? So 83.3 milliseconds is the time required on an average to wait for the rotation to bring around the place where you have recorded that information. The average seek time is 4 milliseconds to 20 milliseconds: 20 milliseconds in the case of a floppy disk, 4 milliseconds in the case of a hard disk. The seek time may appear to be 5 times more, which is OK, but that is not the important thing. For a hard disk the rotational speed is much faster. Earlier hard disks used to be 3600 rpm; modern hard disks typically rotate at 7200 rpm.
7200 revolutions per minute, by the way, is a very fast speed. That should answer your question: when you are rotating something at 7200 revolutions per minute, you cannot stop it and move it the other way around. You can compare it with a spinning top, a bhovra; have you played with one any time? You spin that top rapidly and it starts rotating. Now imagine having to stop it and turn it the other way around. The effort required will be too much. So that is again why you keep rotating in one direction. In fact, if a hard disk is at a standstill, just to start rotating will require a few
seconds before it stabilizes at this speed. That is why hard disks in information systems are generally never stopped. You will find that with some laptops, if you have not been using the disk for quite some time, say you are doing some simple editing and no disk access is made, the disk will stop. At that moment try to save your file: some sound will come, the disk will start rotating, and it takes some time before the file is saved. While the disk is rotating, you say save and it gets saved very quickly. The difference is because of this. So is this clear?
The fundamental difference between a floppy and a hard disk is that the floppy rotates only at 360 revolutions per minute while the hard disk rotates at 7200 revolutions per minute. Another difference: the magnetic head in a floppy drive actually touches the floppy surface. It touches the surface and reads and writes, and because it touches, there is mechanical friction, and after some time the surface can be completely worn out. That is why a floppy goes bad very quickly. If you take a worn floppy out and remove the cover, you can practically see holes; we call it a see-through floppy. In a hard disk the magnetic head never ever touches the surface. That means it is a more difficult technology: the head actually floats on top of the surface with a gap of a few microns, and it is beautifully organized, because at 7200 rpm, if the head were to touch the surface, the head would break or the surface would be scratched. So that is a different technology altogether. In fact, many times when you have a major disaster with a disk they call it a head crash. That means somewhere the mechanism did not work and one head actually touched the disk, and that is the end of it. So that technology is different, and it is because of that that the disk can rotate at this speed and read and write at that speed.
Anyway, coming back to our original problem of the time to access information: rotational delay, which is
latency, meaning half a rotation on average, and the average seek time of 4 to 20 milliseconds; these are the two key parameters. So let's try to solve a problem here. I told you that you can simply do arithmetic. You have a 1 gigabyte disk. You don't get a 1 gigabyte disk very easily now; you get 40 gigabyte disks, so you can translate that appropriately. Let's imagine that the hard disk has 800 cylinders, it rotates at 7200 rpm, and the average seek time is 12 milliseconds. And you have a floppy disk: the capacity is 1.44 megabytes, rotation is 360 rpm, and the average seek time is 40 milliseconds. These are the parameters given. Find the time required to back up a 512 megabyte sequential file on floppies.
So imagine that a file of 512 megabytes has been written; it is about half the capacity of the disk. Whichever way it is stored, you can assume it is stored on sequential cylinders of the disk, so that you can most easily keep moving from track to track. That is a sequential file, and you have to store it on floppies. Obviously 512 megabytes will not fit onto a single floppy, because the floppy capacity is 1.44 megabytes; it will require multiple floppies. But how long will it take to back up a 512 megabyte sequential file on floppy disks? So you have a 1 gigabyte disk and you have floppies as a backup. You want to safeguard against the disk crashing or something going wrong. Your file is important; let's say the file contains all your accounting transactions or medical history or other important information. As far as the disk is concerned, it does not give a damn about the semantics of that information; the file has so many bytes and they need to be copied. So as information system designers you have to work out a backup and restore scheme in case of a problem. So if this is the problem, tell me how long it will take. Can you do some simple arithmetic? I always claim that at least 50 percent of the major problems of the world can be solved by simple arithmetic, provided you put the
parameters right. I would like to know your answer: how many seconds, minutes, or hours might it take? It's wonderful: more than 8 seconds. You could have also said more than 0 seconds. But 8 seconds, is it realistic? What, perhaps a little more than 8 seconds for writing the entire file on floppies? Okay, very, very interesting. You are moving the floppy disk at 360 revolutions per minute. First of all, it is very obvious to you that writing on a floppy is going to be much slower than reading from the disk, so the reading speed actually does not matter as long as it is much more than the writing speed. Are you seriously suggesting that within 8 seconds I can write 356 floppies? So 8 seconds is not the correct timing; I have to back up 512 megabytes of data.
So here is a solution. First I analyze the given data; I do look at my problem seriously. I look at my disk: the cylinder capacity is 1.2 gigabytes divided by 800, because there are 800 tracks, and that means each track is one cylinder. So I can read one cylinder in one rotation, wherever I have put the heads. The cylinder capacity is roughly 1.5 megabytes, which I say is roughly a floppy's capacity. That means this much information can be read in one rotation after a seek. It will take about 12 milliseconds average seek time, and 4.16 milliseconds is the average rotational delay, in total about 16.16 milliseconds. That means the time it takes to read one floppy-full of information from the disk is 16.16 milliseconds. I need not even worry about that; that is the purpose of showing these approximate calculations.
For backing up 512 megabytes, over 350 floppies would be needed; that is the first observation, because the capacity is 1.44 megabytes. The time required to write on a floppy, how much is that? One rotation takes 166.6 milliseconds and the seek time is 40 milliseconds, so one track is written in about 200 milliseconds. Remember, I am not reading, I am writing, so one full rotation
means I can write one track. After writing one track I go to the next track, then the next track, and so on. So one track is written in 200 milliseconds. How many tracks? The original problem did not specify that; if you were allowed, you would immediately ask me how many tracks, because without it you cannot find out how long it will take. Suppose I told you that a single track contains the entire 1.44 megabytes; then your entire solution is 200 milliseconds. But if there are 80 tracks, how much does that become? The time to write one floppy is 80 into 200 milliseconds. Please note that this has nothing to do with the speed with which the information can be handled inside the main memory. This is the time physically required for the stupid floppy to write, and it is not as if 1.44 MB can come in one shot and get written. Remember, while the disk is a direct access device, at one time I can write on only one track; only when that track is fully written can I go to the next track, then the next track. There are 80 tracks, so it will take 16 seconds just to write one floppy. Not even 8 seconds, 16 seconds. You took half a revolution, but when you are writing you have to write for one full rotation. So 16 seconds is the time to write one floppy.
Now you have written that floppy. What do you have to do now? Let's say the backup program tells you the floppy is written. You have to take that floppy out, put some kind of label on it, and insert a new floppy. Is that a trivial job? Once we insert a floppy, the operating system has to recognize that a new floppy has been put in. Is that a trivial activity? Those of you who have noticed know that it takes two to four seconds. Then there is the time for taking the floppy out, labeling it, and making an entry in a record log. Why? Because just labeling is not sufficient; you must maintain a manual register: floppy number 123, 124, 125, whatever, with some information. Assume that it is ten seconds, okay, because it is human time. You can say no, no,
no, I'm very efficient, it takes me far less. Okay, far less, but still some time, a non-trivial, non-zero time. What does it add up to? 16 seconds plus this time. You know, I got a very clever answer two batches ago. He said: the moment the program says that a floppy is over, I will keep the other floppy ready, take the first one out and put the new one in, and then while it is writing I will do the labeling and so on. Clever answer. So you take it out and put it in, but still the operating system has to recognize it, and you have to take it out and put it somewhere, so maybe two seconds. So don't take ten seconds, but at least two seconds, plus sixteen seconds. Ordinary mortals like me will take ten seconds, so sixteen plus ten seconds is the time per floppy. How much does it amount to? Close to three hours: with about half a minute per floppy, 350 floppies will need 175 minutes, or close to three hours.
This is if the operation is continuous. What if one floppy gets wrongly labeled? Now imagine the human errors or machine errors: one floppy goes bad while recording. What do you do? Typically you will not even come to know that the recording has gone bad unless you try to read it, because backup programs of this kind just write; they do not really confirm that what has been written is exactly what you wanted to write. If you want that, double the time; otherwise, take the risk. So what if a floppy does not read later?
In short, what is the correct answer to this problem? The correct answer is that it is wrong to use a floppy as a backup for large files, from an information system perspective. That is the correct answer. It is not the arithmetic itself, but the arithmetic is so skewed that we should see this immediately. We should still do all these computations, but if somebody says, I am giving you a computer system with one gigabyte of disk and a floppy drive for backup, say thank you very much, not good enough. That is if my disk capacity is one gigabyte, and now I have what, 40
gigabytes, 80 gigabytes, 100 gigabytes. That is why people do not give you floppy drives anymore. They give you what? CDs, rewritable CDs, and now writable DVDs. So what is the capacity there? You can now map the same problem; only the scales are different. Imagine you have a 150 gigabyte disk and a 760 megabyte CD-ROM; you have a writable CD, and the people who sold you that machine would have convinced you that the CD can be used for backup. Now imagine that you have a 100 gigabyte file which you want to back up on CDs. Some 130 CDs have to be written, and each CD, after writing, you have to read to confirm that it can be read. Fortunately CDs are more reliable than floppies, but still there are all those CDs, and you have to label each one. So is that a correct backup strategy? It is not. Just remember that these ratios are very, very important when you talk about backup devices and things like that.
Although the example is centered around backup timing and so on, the fundamental fact of life is that the capacity of the disk drive, the latency, and the seek time are the cardinal things by which you can decide how long it takes to access information. Let's go ahead and concentrate now only on reading and writing information meaningfully for our purposes. For backup and restore operations you need special solutions, which exist, and you can permit incremental backup. You will very rarely put a half gigabyte file as a single file; a half gigabyte file does not fall from the sky in the first place. Take some kind of master file to which you keep adding, let's say a file of records of alumni. Now the alumni, all 33,000 of them, did not fall from the sky overnight; they have been accumulated, 300 or 350 every year. So I can keep backing them up every year as the file grows. That is called an incremental backup. It is just an example, but that is the kind of backup strategy, in the context of information systems, that you would like to derive.
However, the real advantage of the disk is not for just storing the information
and dumping it in the backup. The real advantage is the ability to access it very quickly, and that is why the notion of directly accessing data on disk is very important. Here is a thumb rule: at 7200 rpm, the number of input/output operations that you can conduct on a disk per second is about 50 blocks. We have not defined what a block is. A block is some unit of storage; in the context of the diagram that I drew, consider a sector as a block, so a sector is a portion of a track. Now imagine that in actual practice the information that you seek to read or write is not organized beautifully in the same cylinder, within the same sector, and so on. Some information is written here, some is written there, some is written somewhere else. Consequently, to access any piece of information, which for the time being we call a block, I have to move my arm somewhere at random and wait for about half a rotation. This is called a random I/O.
Now how many random I/Os can I do per second? I have said here 50, 50 blocks per second. Can you justify this 50 blocks per second? If in one second I can access 50 blocks, then what is the time required to access one block? One second divided by 50, that is a thousand milliseconds divided by 50, which is 20 milliseconds. You will realize that the average seek time and the average rotational delay together could be about 20 milliseconds for some slower disks. It has improved significantly. Now imagine that the 20 milliseconds has become 10 milliseconds. You cannot make it much faster than 10, because there is the rotational delay; that is the killer. The seek time is not much for a hard disk, but if you want to randomly search somewhere, the rotational delay is always there. So instead of 50 you can now say that the access rate is about 100 blocks per second.
What we can do now is very important. We imagine that on a hard disk of a fairly large capacity, the information that is relevant to us, whether
it is account information, the academic information of a student, census information, or insurance policy information, is stored in blocks, and I want to randomly access a block. If I imagine that my information is spread across the disk, then ordinarily the number of blocks I can access per second is about 50 to 100. And in the past 30 years the number of blocks that can be accessed per second has moved only from 25 to 100, that's all. Electronics, on the other hand, has moved far more rapidly, so memory speeds are extraordinarily faster by comparison. This is about random access.
Now, given 100 accesses, 100 blocks per second, how long would it take to access one required record? Suppose I want to read, let's say, one student's CPI record. How big is that record? The roll number, name, CPI, hostel number; take our schema, an 80 or 100 byte record. Obviously, if I know the block on the disk in which that student's record sits, the time I will require for a random access is the time to read one block, which could be 1 by 50 seconds or 1 by 100 seconds, provided I know which block. How many blocks can there be? What is the notion of a block? That is an important, what should I say, component of direct access. A block is typically a fixed-size unit of storage defined by the operating system, not by the disk. I recommend that you read about the Unix file system, which has a very interesting mechanism of indexing. It actually formats the disk, puts aside several blocks just to keep information about files, and then maintains a list of free blocks which are available. Whenever you ask for storage, it allocates those blocks. You might edit a file and delete some portion, in which case you release some blocks. Say block number 23 is released by you; then the Unix operating system will allocate block number 23 to someone else who asks for an additional block later. So I might have a file which is recorded, let's say, in block number thousand one, thousand
two, thousand three, and then 23, because 23 became the available block. So consequently there is no guarantee that my information will necessarily be written in sequentially contiguous blocks. Why is that important? Because sequentially contiguous blocks, at least originally when the disk was formatted, were actually on contiguous tracks or contiguous cylinders, so reading or writing them would have been a bit faster. So I have assumed that reading or writing any block will take about one hundredth of a second. Consequently I must minimize the number of blocks that I read for any information handling. Is that clear?
Imagine now that I have the data for all the library books on a disk. Remember the index that we mentioned; now we are trying to do the search on a computer. I want to search for a particular library book; I have its title, this, that, and the rack position, and I have stored all of these data on the disk. So how will I do that search quickly? For that I have constructed an example here, an example of index schemes on the disk. Assume that there are records which I have stored, records like CPI records or employee records or whatever. Suppose each record has a pointer or address, like a position in an array. So can we not prepare an index? Let's say there is a key of that record, say A23P415697; this is the attribute that we know, let's say it is a part number. The part number's record is stored in some block, and let's say this is the pointer to that block, and let's say it is an 8 digit number. An 8 digit number means I can have a lot of blocks; truly, modern disks have that many blocks. Suppose for each key that I have, I prepare an index table like this. Then will it not be easier for me? I search this index, and in this index I get a pointer. Then I go to that pointer; to go to that pointer I will require one block access, right, once I actually get the pointer. But
how do I get the pointer? Because I don't know that your record is in block 127 and somebody else's record is in block 1000-something. How will I know? So for you, for him, I will identify some key, which is the roll number in your case; in this case this is the key. So suppose I make an index like this; what will it mean? Where do we store this index? Suppose we store this index itself as a file on the disk. How many blocks will the file occupy? It depends on the size of the file.
Imagine I am putting together a database of all the students appearing for the 12th standard in, let us say, the southern states: easily about 1.28 crore records. How many blocks will an index of 1.28 crore records occupy? Imagine the block size to be 4096 bytes. Let's go back here: the key is 24 bytes and the pointer is 8 bytes, totally 32 bytes, so one index entry is 32 bytes. How many entries can I put in a block? 128 entries. Assume there are 1.28 crore records; I have deliberately chosen this number. Then I will have 1 lakh blocks just for the index. So the assumption was that I will quickly look at the index and then go to that block. The time required to go to that block is 1 by 100 seconds or 1 by 50 seconds, but to find that block number I will have to read these 1 lakh blocks. How many seconds, assuming 100 blocks per second? I have to read 1 lakh blocks, or on average half a lakh, because I might find the entry somewhere halfway. Can I do a binary search? I can't, because I don't know where these blocks are written. So the average time required to search the index, if scanned sequentially, assuming 50 blocks per second, is about 17 minutes. After 17 minutes I will get the block which contains that fellow's information, and I say, ah, now within 1 by 50 of a second I get your information. Is that good? Not useful. So you see the non-trivial problem that we are talking about. Indexing is something that occurs to us very quickly, but indexing on the disk, when each block, if read or written randomly, will require 1
by 50 seconds or 1 by 100 seconds. We think that is a very small time, but if the index itself has 1 lakh blocks, how will you handle that?
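The arithmetic behind the 17 minutes can be sketched in a few lines of Python. This is only a rough sketch under the lecture's figures: 1.28 crore (12,800,000) records, 128 index entries per block (at 32 bytes per entry that corresponds to a 4096-byte block), and 50 random block reads per second. The function name `index_scan_estimate` and its parameter names are made up for illustration; they do not come from any library.

```python
import math


def index_scan_estimate(num_records, entries_per_block, blocks_per_second):
    """Estimate the size of a flat on-disk index and the average time to scan it.

    Assumes one index entry per record and a sequential scan that, on average,
    finds the wanted entry after reading half of the index blocks.
    """
    # Number of blocks needed to hold one entry per record.
    index_blocks = math.ceil(num_records / entries_per_block)
    # On average the entry is found halfway through the index.
    avg_blocks_read = index_blocks / 2
    # Each block costs one random I/O at the given rate.
    avg_seconds = avg_blocks_read / blocks_per_second
    return index_blocks, avg_seconds


# Figures from the lecture: 1.28 crore records, 128 entries per block,
# and 50 random block reads per second.
blocks, seconds = index_scan_estimate(12_800_000, 128, 50)
print(f"index blocks: {blocks}")
print(f"average scan time: {seconds:.0f} s, about {seconds / 60:.1f} minutes")
```

With these numbers the index occupies 100,000 blocks, and a scan that finds the entry halfway takes 1000 seconds on average, roughly 17 minutes. Even at 100 blocks per second the same scan still takes over 8 minutes, which is why a flat index read block by block is not good enough.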