on hashing, part 1. I am Rashmi Dixie. So, let us begin the session. At the end of this session, students will be able to explain the types of hashing used in databases. This is the learning outcome of this session.

So, what is hashing? Hashing is an effective technique to calculate the direct location of a data record on disk without using an index structure. Hashing uses a hash function, with the search key as a parameter, to generate the address of a data record. An ideal hash function is uniform and random. Uniform means each bucket is assigned the same number of search key values from the set of all possible values. Random means each bucket will have the same number of records assigned to it, irrespective of the actual distribution of search key values in the file. So, uniform and random are the two characteristics of an ideal hash function.

There are two types of hashing: static hashing and dynamic hashing. So, what are the characteristics of static hashing? The primary pages are fixed, allocated sequentially, and never deallocated, and overflow pages are used if needed. Look at this diagram. The pages are fixed, allocated sequentially, and never deallocated, and if required, overflow pages are used. So, h(k) mod M gives you the bucket to which the data entry with key k belongs, where M is the number of buckets. This formula is used to find the address of the data with search key k.

Now, we will see an example of hash file organization: the hash file organization of the instructor file using department name as the key. First, I will explain. There are 8 buckets, the binary representation of the ith character is assumed to be the integer i, and the hash function returns the sum of the binary representations of the characters modulo 8. So, for example, we are using the hash function on department name.
So, the hash of Music is 1 and the hash of History is 2: the sum of the binary representations of the characters, modulo the number of buckets. So, Music 1, History 2, Physics 3, and Electrical Engineering also 3. The buckets are 0, 1, 2, 3, 4, 5, 6, 7. So, this is the hash file organization of the instructor file using department name as the key. Music is 1, so the Music record is in bucket 1. For Physics the hash value is 3, so the address of the record whose department name is Physics is bucket 3.

Now, how do we handle it if overflow occurs? Bucket overflow can occur because of insufficient buckets and because of skew in the distribution of records. Skew can occur for two reasons: multiple records have the same search key value, or the chosen hash function produces a non-uniform distribution of key values. Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is handled by using overflow buckets. So, whenever there is an insufficient number of buckets, or multiple records have the same search key, bucket overflow can occur. We cannot eliminate bucket overflow; we can only try to keep it to a minimum.

So, how do we handle bucket overflow? With overflow chaining, the overflow buckets of a given bucket are chained together in a linked list. That means extra buckets are attached to handle the overflow of a bucket, and this scheme is called closed hashing. An alternative scheme, called open hashing, does not use overflow buckets, so it is not suitable for database applications. So, there are two techniques to handle bucket overflow. The first is closed hashing, which attaches overflow buckets, and the second is open hashing, which does not use overflow buckets and is not suitable for database applications.

Now, the next point is hash indices. Hashing can be used not only for file organization, but also for index structure creation.
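Before moving on to hash indices, the instructor-file example and the overflow chaining described above can be sketched in code. This is only an illustrative sketch, not the lecture's own implementation. It assumes eight buckets (0 to 7, as in the diagram), letters mapped to their alphabet positions (a = 1, ..., z = 26) with other characters ignored, and a page capacity of two records chosen purely to force an overflow. With these assumptions the function reproduces the values quoted above (Music 1, History 2, Physics 3, Elec. Eng. 3). The class name `StaticHashFile` and the record labels are hypothetical.

```python
NUM_BUCKETS = 8       # assumption: buckets 0..7, as in the diagram
BUCKET_CAPACITY = 2   # assumption: tiny page size, chosen to force an overflow


def h(dept_name, num_buckets=NUM_BUCKETS):
    """Sum of letter positions (a=1 ... z=26) modulo the number of buckets."""
    total = sum(ord(ch) - ord("a") + 1
                for ch in dept_name.lower()
                if "a" <= ch <= "z")      # spaces and dots are ignored
    return total % num_buckets


class StaticHashFile:
    """Fixed number of buckets; a full bucket gets overflow pages chained on."""

    def __init__(self, num_buckets=NUM_BUCKETS, capacity=BUCKET_CAPACITY):
        self.capacity = capacity
        # Each bucket is a chain of pages: page 0 is the primary page,
        # any later pages are overflow pages linked after it.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key, record):
        chain = self.buckets[h(key, len(self.buckets))]
        if len(chain[-1]) == self.capacity:
            chain.append([])              # attach a new overflow page
        chain[-1].append((key, record))

    def lookup(self, key):
        # Only one bucket is searched, but its whole chain must be scanned.
        chain = self.buckets[h(key, len(self.buckets))]
        return [rec for page in chain for k, rec in page if k == key]

    def overflow_pages(self, bucket_no):
        return len(self.buckets[bucket_no]) - 1


demo = StaticHashFile()
for dept, rec in [("Music", "M1"), ("History", "H1"), ("Physics", "P1"),
                  ("Physics", "P2"), ("Elec. Eng.", "E1")]:
    demo.insert(dept, rec)

for dept in ["Music", "History", "Physics", "Elec. Eng."]:
    print(dept, "-> bucket", h(dept))
print("overflow pages on bucket 3:", demo.overflow_pages(3))
```

Physics and Elec. Eng. both hash to bucket 3, so the third record that lands there no longer fits on the primary page and goes to a chained overflow page.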
A hash index organizes the search keys, with their associated record pointers, into a hash file structure. Hash indices are always secondary indices. If the file itself is organized using hashing, a separate primary hash index on it using the same search key is unnecessary. However, we use the term hash index to refer to both secondary index structures and hash-organized files. Look at the example of a hash index. With the help of hashing, a hash index file is created with buckets 0 to 7, and each entry points to a record.

Now, all of you students, please pause the video and try to answer: what are the deficiencies of static hashing? Up to now we have seen static hashing. Now, try to figure out its deficiencies.

So, in static hashing, the hash function h maps search key values to a fixed set B of bucket addresses. But a database grows or shrinks with time. We cannot be sure about the size of the database: it may increase as the number of records increases, and it may decrease as the number of records decreases. If the initial number of buckets is too small and the file grows, performance will degrade due to too many overflows. If space is allocated for anticipated growth, a significant amount of space will be wasted initially. So, allocate more and space is wasted; allocate less and buckets overflow. And if the database shrinks, space will again be wasted.

So, what is the solution to the deficiencies of static hashing? One solution is periodic reorganization of the file with a new hash function. But periodic reorganization of the file sounds expensive. Why? Periodic reorganization increases maintenance and disrupts normal operation. Shrinking and growing of a database is natural. So, how many times will you go for periodic reorganization?
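To put rough numbers on the trade-off described above, here is a small back-of-the-envelope sketch. It assumes an ideal, perfectly uniform hash function, so every bucket receives an equal share of the records. The function name `static_hash_stats` and all the sizes used are hypothetical and chosen only for illustration.

```python
import math


def static_hash_stats(num_records, num_buckets, page_capacity):
    """Return (overflow pages per bucket, unused slots on primary pages).

    Assumes an ideal hash function that spreads records evenly.
    """
    # Records in the fullest bucket under a perfectly even spread.
    per_bucket = math.ceil(num_records / num_buckets)
    # Pages needed to hold them; the primary page always exists.
    pages_per_chain = max(1, math.ceil(per_bucket / page_capacity))
    overflow_per_bucket = pages_per_chain - 1
    # Slots on primary pages that hold no record (wasted space).
    primary_slots = num_buckets * page_capacity
    unused_primary_slots = max(0, primary_slots - num_records)
    return overflow_per_bucket, unused_primary_slots


# Too few buckets, and the file grows from 16 to 320 records.
print(static_hash_stats(16, 8, 4))      # (0, 16)
print(static_hash_stats(320, 8, 4))     # (9, 0)

# Space reserved for anticipated growth while the file is still small.
print(static_hash_stats(16, 1024, 4))   # (0, 4080)
```

In the second case every lookup may have to read a chain of ten pages instead of one. In the third case almost all of the allocated space sits empty.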
So, periodic reorganization is expensive and it disrupts normal operation. A better solution is to allow the number of buckets to be modified dynamically. What is the disadvantage of static hashing? The buckets are fixed. So, we try to allow the number of buckets to change dynamically. Let us start with a small database, so there will be few buckets. As the database grows, buckets will keep being added, and if the database shrinks, unused buckets will be deleted automatically. This is called dynamic hashing. We will see in the next video what dynamic hashing is. So, this is the reference. Thank you.