Hash Tables (2024)

Hash Tables: Content Addressable

  • Hash tables provide content-addressable data structures
    • Access elements using a key rather than an index
      • Key is frequently a string
      • Arrays, by contrast, provide access via an integer index

  • Code Example
    phonebook = {"Jack": "831-1111", "Jill": "831-9999"}
    print(phonebook["Jack"])

  • Accessing by index or key
    • Access by index uses index value to calculate address of element
    • Access by key requires some kind of search to find element

  • Common Use: Compiler symbol table

  • Possible implementations:
    • Balanced Search Tree: O(lg n) search
    • Hash Table: expected constant time search

    Hash Table: Common in Modern Languages

    • Modern languages provide hash tables
      • Either built-in: Perl, Python, Ruby
      • Or via a library: Java HashMap, Ada Hashed_Map

    • Common names
      • Map
      • Dictionary
      • Hash Map
      • Associative array

    Problem Overview

    • How to do fast insertion, search, deletion of data with keys

    • Hash tables give expected case behavior of O(1)
      • Better than balanced tree
      • However, worst case behavior is O(n)

    • History:
      • Invented at IBM, around 1950
      • One of first uses of linked lists
      • Related to content addressable memory:
        • Look up memory location by its contents (ie value of key)
        • Used for page tables, etc

    • Presentation based on CLRS, Knuth3, Neop.

    Hash Table: Implementation Illustration

    • Consider 100 records with keys 1 .. 100
      • Easy implementation: array of 100 records: O(1) search

    • Consider 100 records with 9-digit ID as keys
      • Array of 10^9 records is not practical (array is large and sparse)
      • Hash table solution:
        • Array of 100 records
        • Use last 2 digits of ID as the key (ie array index)
        • Fast, but must deal with problem of collisions
          • Collision: two records whose IDs map to the same slot (ie share the same last 2 digits)
        • Need a better way to get the 2 digit key
          • Using the last 2 digits may lead to too many collisions
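
    • A small Python sketch of this idea (the 9-digit IDs and names below are invented for illustration):

      # Toy example: hash 9-digit IDs into a 100-slot table using the last 2 digits.
      records = {387112204: "Ann", 542998331: "Bob", 610574504: "Cal"}

      table = [None] * 100
      for student_id, name in records.items():
          slot = student_id % 100              # last two digits of the ID
          if table[slot] is not None:
              print("collision at slot", slot) # 387112204 and 610574504 both end in 04
          table[slot] = (student_id, name)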

    Hash Tables: Problem Generalization

    • Goal: store and search a set S of records

    • Assumptions:
      • Each element of S has a unique key

      • Keys are drawn from some universe U

      • |S| << |U|, so allocating array of size |U| would waste space

      • |U| is too large to allocate an array 1 .. |U|

    • Solution: Store set S in array T
      • T: array(0 .. m-1) of Data_Records
        • |T| = m

      • How big should m be (ie how big should T be)?
        • |T| = m = O(|S|)
        • In practice, m ≤ c·|S| for a small constant c, typically c = 2 or 3

    Hash Function

    • We want to assign each data record to a specific element of T
      • This is done by using a function that maps keys to array elements

      • We define a function h that maps k to h(k)

      • We store record k in slot h(k), not in slot k
        • Sometimes we use the phrase "record k" or "element k" to mean the element with key k

    • Call the function h, a hash function
      • Function h maps each key k to a single slot of T
      • h: U → 0 .. m-1
      • That is, h(k) ∈ 0 .. m-1

    • We say that k hashes to slot h(k)
      • A slot is a location in hash table

    • We will consider hash functions after looking at collisions and performance
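
    • A minimal sketch of such a mapping in Python, here built from Python's built-in hash (an assumption of this sketch, not a function defined in these notes):

      m = 8                                  # number of slots in T

      def h(key):
          """Map an arbitrary (hashable) key to a slot in 0 .. m-1."""
          return hash(key) % m

      for k in ("Jack", "Jill", 91):
          print(k, "hashes to slot", h(k))   # every key lands in 0 .. m-1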

    Collisions

    • What if two or more keys hash to the same slot?
      • We call this a collision

    • When are collisions possible/guaranteed?
      • if |S| ≤ m then collisions are possible (but not guaranteed)
      • if |S| > m then a collision is guaranteed (by the pigeonhole principle)

    • Two methods of handling collisions:
      1. Chaining (usually the better method)
      2. Open addressing

    • Other terms:
      • Chaining: aka open hashing
      • Open addressing: aka closed hashing
      • One text says that open hashing is aka open addressing; other sources disagree

  • We focus on chaining

    Chaining

    • Put all elements that hash to the same slot into a linked list

    • Slot j contains a pointer to the head of the list of all stored elements that hash to j

    • If no elements hash to j, slot j contains NIL
      • Table is initially all nil

    • Diagram: Universe of keys U, Set of keys S = {k: k in 1 .. 8}, slots in 1 .. 8
      • h(1) = h(4) = 1
      • h(2) = h(5) = h(7) = 5
      • h(3) = 7
      • h(6) = h(8) = 8
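
    • A sketch of the resulting structure in Python, hard-coding the h values from the diagram and using Python lists in place of linked lists (slots 1 .. 8 are stored at indices 0 .. 7):

      m = 8
      T = [None] * m                        # None plays the role of NIL

      h = {1: 1, 4: 1, 2: 5, 5: 5, 7: 5, 3: 7, 6: 8, 8: 8}   # from the diagram

      for k in range(1, 9):                 # insert keys 1 .. 8
          slot = h[k] - 1
          if T[slot] is None:
              T[slot] = []
          T[slot].insert(0, k)              # insert at the head of the chain

      print(T)                              # slot 5 (index 4) holds the chain [7, 5, 2]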

    Chaining - Insert

    • Insert(T, x) inserts x at the head of list T(h(x.key))

    • Worst case running time: O(1)

    • Precondition: x isn't already in the list
      • Guaranteeing this precondition requires a search; its cost is analyzed below

    Chaining - Delete

    • Delete(T, x) deletes x from the list T(h(x.key))

    • No search of the list is needed: x is a pointer to the list node

    • Worst case: O(1) with doubly linked lists

    • Worst case with singly linked lists: time to find x's predecessor in its list
      • Depends on the load factor (see below)

    Chaining - Search

    • Search(T, k) searches for an element x with x.key = k in the list T(h(k))

    • Running time: proportional to length of list T(h(k))
      • Analyze below
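
    • A compact sketch of the three chaining operations in Python; lists stand in for the doubly linked lists, so delete here scans the chain rather than unlinking a node in O(1):

      class ChainedHashTable:
          def __init__(self, m):
              self.m = m
              self.T = [[] for _ in range(m)]      # one (initially empty) chain per slot

          def _h(self, key):
              return hash(key) % self.m

          def insert(self, key, value):            # O(1): prepend to the chain
              self.T[self._h(key)].insert(0, (key, value))

          def search(self, key):                   # proportional to the chain length
              for k, v in self.T[self._h(key)]:
                  if k == key:
                      return v
              return None

          def delete(self, key):                   # O(1) with real doubly linked nodes
              chain = self.T[self._h(key)]
              for i, (k, _) in enumerate(chain):
                  if k == key:
                      del chain[i]
                      return

      phone = ChainedHashTable(4)
      phone.insert("Jack", "831-1111")
      phone.insert("Jill", "831-9999")
      print(phone.search("Jack"))                  # 831-1111
      phone.delete("Jack")
      print(phone.search("Jack"))                  # None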

    Load Factor: α

    • Analysis of hashing with chaining uses the load factor

    • α = n / m

    • n = number of elements in the table
    • m = number of slots in the table = number of linked lists

    Simple Uniform Hashing

    • Our analysis of hashing will assume simple uniform hashing

    • Simple uniform hashing: any given element is equally likely to hash into any of the m slots in the table, independently of where other elements hash

    • Thus, the probability that a given element x_i maps to slot j is 1/m

    • The probability that two keys map to the same slot is also 1/m

    • If k elements are inserted, what is the expected number of elements at slot j? $$ \sum_{i=1}^k 1/m = (1/m) \sum_{i=1}^k 1 = (1/m) * k = k/m$$
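
    • A quick simulation (not from the notes) that checks this prediction: hash k random keys into m slots many times and average the number landing in one fixed slot j:

      import random

      m, k, trials = 100, 1000, 10_000
      j = 0                                        # track one particular slot
      total = 0
      for _ in range(trials):
          total += sum(1 for _ in range(k) if random.randrange(m) == j)
      print(total / trials, "vs predicted", k / m) # both are approximately 10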

    Worst Case Analysis of Search (Hashing with Chaining)

    • Search - Worst case: all n elements hash to the same slot
    • Assume m slots

    • Worst case: Θ(n), plus time to compute hash

    • What is the probability of the worst case occurring?
    • $$ m \times \left ( \frac{1}{m}\right )^{n} = m^{-n+1} $$
    • In the opening example - 100 keys and 100 slots:
    • $$ 100 \times \left ( \frac{1}{100}\right )^{100} = 100^{-99} = 10^{-198} $$

    Analyzing Search: Some Notation and Facts

    • n_j = length of the list at T(j) [ie slot j]
      • n = n_0 + n_1 + ... + n_{m-1}

    • Expected value of n_j = α = n / m

    • Time to compute hash function is O(1)

    Average Case Analysis of Search: Two Cases

    • Case 1: Unsuccessful search - hash table contains no element with key k

    • Case 2: Successful search - hash table contains an element with key k

    Average Case Analysis of Unsuccessful Search

    • Any key k that is not in the table is equally likely to hash to any of the m slots (Why?)

    • An unsuccessful search must, of course, examine every element of the list at slot T(h(k))

    • Expected length of the list at T(h(k)) is α

    • Total time: Θ(1 + α), including cost of computing h(k)
      • Add 1 for time to compute value of h(k)

    Average Case Analysis of Successful Search - Simple Version

    • Assume all lists are length α = n/m (and that m divides n evenly so that α = n/m is an integer)

    • Number of elements examined: the position of the element with key k in its list (it is equally likely to be at any of the α positions)

    • Then average number of comparisons is (1 + n/m) / 2 = Θ(1/2 + α/2): $$ 1/\alpha \times \sum_{i=1}^\alpha i = 1/\alpha \times (\alpha(\alpha + 1)) / 2 = (1 + \alpha) / 2 $$
    • Including the cost of computing the hash function gives Θ(3/2 + α/2) = Θ(1 + α)
      • Same as unsuccessful

    Average Case Analysis of Successful Search - More Complex

    • What if lists are not same length?

    • Probability that each list is searched is proportional to number of elements in that list

    • The number of elements examined during a successful search for x depends on the number of elements to the left of x in the list T(h(x.key))

    • Need to calculate the average, over the n elements xi in the table, of how many elements were inserted into xi's list after xi was inserted

    • Let's think about x_1, x_2, x_3, ..., x_8 being inserted into a table of size 4 (ie n=8, m=4)
      • Assume lists of size 2, 3, 1, 2
      • Let x_i be the ith element added to the table
      • Where does x_1 go?
      • Where does x_2 go?
      • After x_i is entered, what is the probability that x_{i+1} will map to the same list (ie end up to x_i's left)?
      • After x_i is entered, what is the expected number of elements to its left?

    Average Case Analysis of Successful Search (continued)

    • Let x_i be the ith element added to the table

    • On average, how many of the remaining n-i elements will map to the same list as element x_i?

      • Answer: $\sum_{j=i+1}^n 1/m = (1/m) \sum_{j=i+1}^n 1 = (1/m)(n-i) = (n-i)/m$

    • Thus, on average, the number of elements examined to find element x_i is
      $1 + \sum_{j=i+1}^n 1/m = 1 + (n-i)/m$

    • Thus, on average, the cost of a successful search is the average, over all x_i, of the cost of finding x_i
      • This average is calculated as follows:
      • (1/n) [number of elements examined to find x_i, summed over all x_i]
        $= (1/n) \sum_{i=1}^{n} \left(1 + (n-i)/m \right)$
        $= (1/n) \left[ n + \sum_{i=1}^{n} (n-i)/m \right]$
        $= 1 + (1/n) \sum_{i=1}^{n} (n-i)/m$
        $= 1 + (1/(mn)) \sum_{i=1}^{n} (n-i)$
        $= 1 + (1/(mn)) \sum_{j=0}^{n-1} j$
        $= 1 + (1/(mn)) \cdot (n-1)n/2$
        $= 1 + (n-1)/(2m)$
        $= 1 + n/(2m) - 1/(2m)$
        $= 1 + \alpha/2 - \alpha/(2n)$

    • Adding the time to compute h(x) gives Θ(2 + α/2 - α/(2n)) = Θ(1 + α)

    Average Case Analysis of Successful Search: Bottom Line

    • If n = O(m) [ie number of elements proportional to size of table]
    • then α = O(1)
    • and Search time = O(1)

    Comparison with Binary Search

    • Binary search run time = O(lg n)

    • In a hash table with n=m, what is the probability that search takes lg n time?
      • (ie that any list is length lg n)

    • Probability that a given bucket contains k keys ≤ ${n \choose k} \times (1/m)^k$
      • The probability that k particular keys all land in a given slot is (1/m)^k
      • There are ${n \choose k}$ ways of choosing k elements from n elements

    • For n = m = 2^16 and k = lg n = 16, summing this bound over all m buckets gives a probability of roughly 3 in 10^9 that any list reaches length lg n
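
    • A worked instance of that bound, with n = m = 2^16 and k = lg n = 16 (the "any list" step applies a union bound over the m buckets): $$ {n \choose k} \left(\frac{1}{m}\right)^{k} \le \frac{n^k}{k!} \cdot \frac{1}{n^k} = \frac{1}{16!} \approx 4.8 \times 10^{-14}, \qquad m \cdot \frac{1}{16!} \approx 2^{16} \times 4.8 \times 10^{-14} \approx 3 \times 10^{-9} $$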

    Hash Functions

    • Goal: provide Simple Uniform Hashing (ie each slot equally likely)

    • Problem: Probability distribution of keys usually unknown
    • Problem: Keys not drawn independently

    • Requirement: Must map key to an integer

    • Two common functions: Division and Multiplication

    Mapping Keys to Integers

    • Example: Consider character strings to be numbers in base 128 = 2^7
      • 128 basic ASCII values
    • Example: String "CLRS"
      • ASCII: C=67, L=76, R=82, S=83

    • "CLRS" yields 67*128^3 + 76*128^2 + 82*128^1 + 83*128^0 = 141,764,947
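
    • A small Python sketch of this radix-128 conversion (the resulting integer would still be reduced mod m by one of the methods below):

      def string_to_int(s):
          """Interpret an ASCII string as a number in base 128."""
          value = 0
          for ch in s:
              value = value * 128 + ord(ch)
          return value

      print(string_to_int("CLRS"))   # 141764947 = 67*128**3 + 76*128**2 + 82*128 + 83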

    Division Method

    • h(k) = k mod m

    • Example: m = 20, k = 91, yields h(k) = 11

    • Advantage: fast

    • Disadvantage: must avoid certain values of m
      • avoid powers of 2 (k mod 2^p keeps only the low-order p bits of k)
      • m = 2^p - 1 can also be bad

    • Typical choice: m should be a prime not too close to an exact power of 2
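
    • A small demonstration (with invented keys) of why a power-of-two m is a poor choice: keys that agree in their low-order bits always collide:

      keys = [0x1234, 0x5234, 0x9234]     # differ only in their high-order bits

      m_bad  = 256                        # 2^8: h(k) keeps only the low 8 bits
      m_good = 251                        # a prime not too close to a power of 2

      print([k % m_bad  for k in keys])   # [52, 52, 52] -- every key collides
      print([k % m_good for k in keys])   # [142, 211, 29] -- spread out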

    Multiplication Method

    • h(k) = floor(m * (k*A mod 1))
      • A in range 0 < A < 1
      • kA mod 1 = kA - floor(kA) = fractional part of kA

    • Disadvantage: slower than division

    • Advantage: Value of m not critical

    • Some values of A work better than others

    • Knuth suggests A = (sqrt(5) - 1)/2
      • What is this number?
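
    • A sketch of the multiplication method in Python, using Knuth's suggested A = (sqrt(5) - 1)/2 ≈ 0.618 (the reciprocal of the golden ratio):

      import math

      A = (math.sqrt(5) - 1) / 2          # ≈ 0.6180339887

      def h_mult(k, m):
          """h(k) = floor(m * (k*A mod 1))"""
          return int(m * ((k * A) % 1.0))

      print([h_mult(k, 8) for k in range(1, 6)])   # [4, 1, 6, 3, 0]: slots in 0 .. 7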

    Universal Hashing

    • Problem with hashing: for any specific hash function, some specific set of keys will cause worst case performance
      • Hacker could deliberately use that set of keys

    • Avoid case where all values hash to same location by
      • Choosing a hash function at random from a set of possible functions

    • Having no fixed hash function guarantees that no specific set of values will lead to poor performance
      • Similar to picking random pivots for quicksort
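
    • A sketch of one standard universal family, h(k) = ((a*k + b) mod p) mod m, with a and b drawn at random when the table is created; the prime p here is an assumption of the sketch and just has to exceed every key:

      import random

      def random_hash(m, p=2_147_483_647):       # p = 2^31 - 1, a prime larger than any key used
          a = random.randrange(1, p)             # a in 1 .. p-1
          b = random.randrange(0, p)             # b in 0 .. p-1
          return lambda k: ((a * k + b) % p) % m

      h = random_hash(100)
      print([h(k) for k in (1, 2, 3)])           # an adversary cannot predict these slots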

    Open Addressing

    • Store colliding elements in open slots of the table instead of in a list
    • AKA: Closed hashing
    • Schemes:
      • Linear probing:
        • store collision value in next open slot, in order
        • h(k, i) = (h'(k) + i) mod m
        • tends to have clustering
      • Quadratic probing and Double hashing:
        • store collision value in open slot, some distance away
        • h(k, i) = (h'(k) + c*i + d*i^2) mod m [quadratic probing]
        • h(k, i) = (h'(k) + i*s(k)) mod m [double hashing]
        • tends to have less clustering than Linear
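
    • A minimal linear-probing sketch in Python (insert and search only; deletion under open addressing needs "deleted" markers and is omitted):

      m = 11
      table = [None] * m

      def insert(key, value):
          """Linear probing: try slots (h'(k) + i) mod m for i = 0, 1, 2, ..."""
          for i in range(m):
              slot = (hash(key) + i) % m
              if table[slot] is None or table[slot][0] == key:
                  table[slot] = (key, value)
                  return
          raise RuntimeError("table full")

      def search(key):
          for i in range(m):
              slot = (hash(key) + i) % m
              if table[slot] is None:
                  return None                    # an empty slot means the key is absent
              if table[slot][0] == key:
                  return table[slot][1]
          return None

      insert("Jack", "831-1111")
      insert("Jill", "831-9999")
      print(search("Jack"), search("Bob"))       # 831-1111 None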

    Perfect Hashing

    • Worst case performance: O(1)
    • Requires a fixed set of keys

    • Uses two levels of hashing:
      • Collisions at a slot go into a secondary hash table instead of a list
      • Each slot has its own hash function, chosen to be collision-free for the keys that land in that slot
      • Worst case space can be shown to be O(n)
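
    • A sketch of the two-level (FKS-style) scheme for a fixed set of integer keys, reusing the universal family from above (redefined here so the block is self-contained); the keys and prime are invented for illustration:

      import random

      def make_universal(m, p=2_147_483_647):
          a, b = random.randrange(1, p), random.randrange(0, p)
          return lambda k: ((a * k + b) % p) % m

      def build_perfect(keys):
          n = len(keys)
          h1 = make_universal(n)                     # level 1: n slots
          buckets = [[] for _ in range(n)]
          for k in keys:
              buckets[h1(k)].append(k)
          tables, funcs = [], []
          for bucket in buckets:                     # level 2: a table of size nj^2 per slot
              size = len(bucket) ** 2
              if size == 0:
                  tables.append([]); funcs.append(None); continue
              while True:                            # re-pick until collision-free
                  h2, slots, ok = make_universal(size), [None] * size, True
                  for k in bucket:
                      j = h2(k)
                      if slots[j] is not None:
                          ok = False; break
                      slots[j] = k
                  if ok:
                      tables.append(slots); funcs.append(h2); break
          def contains(k):                           # O(1) worst-case lookup
              f = funcs[h1(k)]
              return f is not None and tables[h1(k)][f(k)] == k
          return contains

      contains = build_perfect([10, 22, 37, 40, 52, 60, 70, 75])
      print(contains(37), contains(38))              # True False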