15.1.1. Introduction¶
Hashing is a method for storing and retrieving records from a database.It lets you insert, delete, and search for records based on a searchkey value.When properly implemented, these operations can be performedin constant time.In fact, a properly tuned hash system typically looks at onlyone or two records for each search, insert, or delete operation.This is far better than the \(O(\log n)\) average cost requiredto do a binary search on a sorted array of \(n\) records,or the \(O(\log n)\) average cost required to do an operationon a binary search tree.However, even though hashing is based on a very simple idea,it is surprisingly difficult to implement properly.Designers need to pay careful attention to all of the detailsinvolved with implementing a hash system.
A hash system stores records in an array called a hash table,which we will call HT
.Hashing works by performing a computation on a search keyK
in a way that is intended to identify the position inHT
that contains the record with key K
.The function that does this calculation is called thehash function,and will be denoted by the letter h.Since hashing schemes place records in the table in whatever ordersatisfies the needs of the address calculation, records arenot ordered by value.A position in the hash table is also known as a slot.The number of slots in hash table HT
will be denoted by thevariable \(M\) with slots numbered from 0 to \(M-1\).
The goal for a hashing system is to arrange things such that,for any key value K
and some hash function \(h\),\(i = \mathbf{h}(K)\) is a slot in the table such that\(0 <= i < M\),and we have the key of the record stored atHT[i]
equal to K
.
Hashing is not good for applications where multiplerecords with the same key value are permitted.Hashing is not a good method for answering range searches.In other words, we cannot easily find all records (if any) whose keyvalues fall within a certain range.Nor can we easily find the record with the minimum or maximum keyvalue, or visit the records in key order.Hashing is most appropriate for answering the question, ‘What record,if any, has key value K
?’For applications where all search is done by exact-match queries,hashing is the search method of choice because it is extremelyefficient when implemented correctly.As this tutorial shows, however, there are many approachesto hashing and it is easy to devise an inefficient implementation.Hashing is suitable for both in-memory and disk-based searching andis one of the two most widely used methods for organizing largedatabases stored on disk (the other is the B-tree).
As a simple (though unrealistic) example of hashing,consider storing \(n\) records, each with a unique key value inthe range 0 to \(n-1\).A record with key k
can be stored inHT[k]
, and so the hash function is\(\mathbf{h}(k) = k\).To find the record with key value k
, look inHT[k]
.
Settings
Saving...
Server Error
Resubmit
In most applications, there are many more values in the key rangethan there are slots in the hash table.For a more realistic example, suppose the key can take any value inthe range 0 to 65,535 (i.e., the key is a two-byte unsigned integer),and that we expect to store approximately 1000 records at any given time.It is impractical in this situation to use a hash table with65,536 slots, because then the vast majority of the slots would beleft empty.Instead, we must devise a hash function that allows us to store therecords in a much smaller table.Because the key range is larger than the size of the table,at least some of the slots must be mapped to from multiple key values.Given a hash function h and two keys \(k_1\) and\(k_2\), if\(\mathbf{h}(k_1) = \beta = \mathbf{h}(k_2)\)where \(\beta\) is a slot inthe table, then we say that \(k_1\) and \(k_2\) have acollision at slot \(\beta\) under hash function h.
Finding a record with key value K
in a database organized by hashingfollows a two-step procedure:
Compute the table location \(\mathbf{h}(K)\).
Starting with slot \(\mathbf{h}(K)\), locate the recordcontaining key
K
using (if necessary) acollision resolutionpolicy .