Testing different image hash functions – Content Blockchain (2024)

Menu

For the image-ID component of the ISCC – that is, the content ID of image files – we need a hash function which, for minor changes to the file, produces an identical hash, or rather one that is as similar as possible while producing a small number of false positive collisions.

Requirements

In view of the generic deployability of the ISCC, we place value on finding an algorithm that has a moderate computation requirements and that is easy to implement. Calculation duration can easily get out of hand when processing images.For example, a file sized 1920×1080 contains 2,073,600 pixels which might need to be processed individually. To further narrow down the number of eligible hash functions, we have defined image modifications against which our hash should ideally be robust:

Should be robust against:

  • Changes in brightness (5% – 20%)
  • Changes in contrast (5% – 20%)
  • Gamma correction
  • Watermark
  • JPEG compression (5% – 20%)
  • Scaling (50% – 300%)
  • Grayscale

Should be partially robust against:

  • Salt and Pepper
  • Multiplicative noise
  • Cropping (1% – 5%)
  • Gaussian smoothing (5% – 20%)
  • Color adjustments

Does not have to be robust against:

  • Rotation
  • Skewing

Eligible hash functions

Except for block hashing, we have implemented all hash functions and used no third party libraries other than PIL. Firstly we use PIL to convert the images to grayscale. According to its documentation PIL uses ITU-R 601-2 luma transform for this. Secondly, we are using PIL to scale the images, which is described here. For resampling we use bicubic interpolation, which is described here.

The sample picture used in this document was taken from Morguefile.com.

Average Hash

The average hash algorithm first converts the input image to grayscale and then scales it down. In our case, as we want to generate a 64 bit hash, the image is scaled down to 8×8 pixels.

Next, the average of all gray values of the image is calculated and then the pixels are examined one by one from left to right. If the gray value is larger than the average, a 1 is added to the hash, otherwise a 0.

Testing different image hash functions – Content Blockchain (2)

Testing different image hash functions – Content Blockchain (3)

Testing different image hash functions – Content Blockchain (4)

Testing different image hash functions – Content Blockchain (5)

Hash: 0000000000010000000000000010000001000010100001101111111111111110

The block hash algorithm divides the image into blocks and generates a value for each block, either 1 or 0. These values are combined serially from left to right into a hash. As we need a 64 bit hash, we divide the image into 64 blocks.

As the block hash algorithm does neither grayscale conversion nor does it scale the image down, it was initially rather slow, especially when processing larger images. However, we were able to solve this problem by scaling down the input image to 256×256 pixels; thus, every block still has 16 pixels, but the calculation duration was improved considerably. Additionally, we first converted each image to grayscale.

Testing different image hash functions – Content Blockchain (6)

Testing different image hash functions – Content Blockchain (7)

Testing different image hash functions – Content Blockchain (8)

Testing different image hash functions – Content Blockchain (9)

Testing different image hash functions – Content Blockchain (10)

Hash: 0011100010011100000011100110001101000011100001110100001011100110

Difference Hash

Similar to the average hash algorithm, the difference hash algorithm initially generates a grayscale image from the input image, which in our case is then scaled down to 9×8 pixels. From each row, the first 8 pixels are examined serially from left to right and compared to their neighbor to the right, which, analogous to the average hash algorithm, results in a 64 bit hash.

Testing different image hash functions – Content Blockchain (11)

Testing different image hash functions – Content Blockchain (12)

Testing different image hash functions – Content Blockchain (13)

Testing different image hash functions – Content Blockchain (14)

Hash: 1111000000110000101110001100111010000110010011001000111010001110

Median Hash

The median hash algorithm works similar to the average hash algorithm, except that it does not compare to the average but to the median.

Testing different image hash functions – Content Blockchain (15)

Testing different image hash functions – Content Blockchain (16)

Testing different image hash functions – Content Blockchain (17)

Testing different image hash functions – Content Blockchain (18)

Hash: 0000000010011100000011000110001101000010100001111111111111111111

Perceptual Hash

The perceptual hash algorithm, too, initially calculates the gray value image and scales it down. In our case, we desire a factor of 4, which is why we scaled down to 8*4×8*4, that is, a 32×32 image.

To this image we apply a discrete cosine transform, first per row and afterwards per column.

Discrete cosine transform:Testing different image hash functions – Content Blockchain (19)

The pixels with high frequencies are now located in the upper left corner, which is why we crop the image to the upper left 8×8 pixels. Next, we calculate the median of the gray values in this image and generate, analogous to the median hash algorithm, a hash value from the image.

Testing different image hash functions – Content Blockchain (20)

Testing different image hash functions – Content Blockchain (21)

Testing different image hash functions – Content Blockchain (22)

Testing different image hash functions – Content Blockchain (23)

Testing different image hash functions – Content Blockchain (24)

Testing different image hash functions – Content Blockchain (25)

Testing different image hash functions – Content Blockchain (26)

Hash: 1010010010101101100110011011001101100010100100000111011010101110

Wavelet Hash

Analogous to the average hash algorithm, the wavelet hash algorithm also generates a gray value image sized 8×8. Next, a two-dimensional wavelet transform is applied to the image. Our tests have revealed that the results improve when we set the top row to 0, that is, to black and re-apply the wavelet transform three times. We took this idea from the image hash library imagehash.

Next, analogous to the perceptual hash algorithm, each pixel is compared to the median and the hash is calculated.

Testing different image hash functions – Content Blockchain (27)

Testing different image hash functions – Content Blockchain (28)

Testing different image hash functions – Content Blockchain (29)

Testing different image hash functions – Content Blockchain (30)

Testing different image hash functions – Content Blockchain (31)

Testing different image hash functions – Content Blockchain (32)

Testing different image hash functions – Content Blockchain (33)

Testing different image hash functions – Content Blockchain (34)

Hash: 0000010110111001111110011111101111111010000000000000011100000101

We took the block hash from the library blockhash-python, implemented the average hash, difference hash, perceptual hash and wavelet hash algorithms based on the imagehash library and developed the median hash algorithm ourselves.

Our Tests

We tested the hash functions against each other using the image database Caltech101which contains 9143 images. For every image we created 10 individual test images with slight, randomized modifications.

We have:

  • increased and decreased brightness
  • increased and decreased contrast
  • added a watermark
  • converted the image to grayscale
  • scaled it down
  • cropped the borders
  • applied Gaussian smoothing
  • applied JPEG compression

This resulted in 100,584 test images. Next, the individual hashes for each image were calculated and stored along with the image number and the type of error (i. e., the modification applied) in an ElasticSearch index.

We conducted three different tests on each hash:

  1. Grouping by image and counting the number of instances where a hash was calculated that differed from that of the source image.
  2. Grouping by image and calculating the average Hamming distance of a modified image from the source image.
  3. Grouping by hash and counting the number of collisions, that is, the number of images that were assigned the same hash value despite them not belonging to the same source image.

Results

Different hash than the source image

Type of erroraHashbHashdHashmHashpHashwHash
Gaussian smoothing1828 (20.0%)1241 (13.6%)3807 (41.6%)2122 (23.2%)786 (8.6%)971 (10.6%)
Grayscale0 (0%)0 (0%)0 (0%)0 (0%)0 (0%)0 (0%)
Increased brightness3874 (42.4%)3206 (35.1%)5357 (58.6%)3451 (37.7%)3986 (43.6%)2844 (31.1%)
Decreased brightness1717 (18.8%)809 (8.8%)4935 (54.0%)2115 (37.7%)1030 (11.3%)1420 (15.5%)
JPEG compression455 (5.0%)598 (6.5%)1616 (17.7%)658 (7.2%)546 (6.0%)514 (5.6%)
Increased contrast2559 (28.0%)2062 (22.6%)4568 (50.0%)2474 (27.1%)2460 (26.9%)2197 (24.0%)
Decreased contrast2056 (22.5%)766 (8.4%)5223 (51.1%)2154 (23.6%)1063 (11.6%)2026 (22.2%)
Scaled554 (6.1%)1297 (14.2%)2091 (22.9%)851 (9.3%)664 (7.3%)614 (6.7%)
Watermark3600 (39.4%)5046 (55.2%)7167 (78.4%)4029 (44.1%)4242 (46.4%)2896 (31.7%)
Cropped8543 (93.4%)8750 (95.7%)9075 (99.2%)8514 (93.1%)9088 (99.4%)7840 (85.7%)
Total25,186 (25.0%)23,775 (23.6%)43,839 (43.6%)26,368 (26.2%)23,865 (23.7%)21,322 (21.2%)

Average Hamming distance

Type of error

aHash

bHash

dHash

mHash

pHash

wHash

Gaussian smoothing

0.24

0.31

0.61

0.31

0.17

0.20

Grayscale

0.00

0.00

0.00

0.00

0.00

0.00

Increased brightness

0.81

1.22

1.28

0.94

1.14

0.68

Decreased brightness

0.24

0.20

1.15

0.48

0.23

0.31

JPEG compression

0.06

0.13

0.21

0.10

0.12

0.10

Increased contrast

0.38

0.60

0.84

0.43

0.58

0.52

Decreased contrast

0.29

0.59

0.84

0.48

0.23

0.52

Scaled

0.07

0.34

0.28

0.12

0.15

0.12

Watermark

0.64

2.46

2.51

1.30

1.06

0.67

Cropped

4.11

6.02

6.23

4.69

7.47

2.98

Total

0.68

1.14

1.44

0.88

1.12

0.61

Collisions

Hash

Collisions

aHash

4438

bHash

711

dHash

421

mHash

4432

pHash

483

wHash

7866

We also tried to combine the hashes with AND, OR, XOR or with a similarity hash, but the results were at best as good as those of the best of the hashes thus combined.

Calculation duration

Hash

Average calculation duration per image

aHash

2.25 ms

bHash

112.70 ms

dHash

0.33 ms

mHash

0.33 ms

pHash

60.05 ms

wHash

0.91 ms

As the perceptual hash showed good results in overall deviations and the average Hamming distance while producing few false positives and being twice as fast as the block hash algorithm, we opted for the perceptual hash algorithm.

Our building blocks

Content Identifiers

Smart Licenses

Transaction Models

Scenarios

Usage Based Voluntary Payments

Content Resale

Crowd Content Acquisition

Content Marketing Platform

Content Timestamping

Resources

Essays and Articles

Drafts and Concepts

Partner Requirements

Research

About us

Contact

Imprint

Privacy Policy

Testing different image hash functions – Content Blockchain (35)

©2019 – The Content Blockchain Project

Testing different image hash functions – Content Blockchain (2024)
Top Articles
How To Make Money On Robinhood [Ultimate Guide]
Believe it or not, you can make great pizza without cheese. Here’s how.
Maxtrack Live
Spectrum Gdvr-2007
Craigslist Free En Dallas Tx
Chicago Neighborhoods: Lincoln Square & Ravenswood - Chicago Moms
News - Rachel Stevens at RachelStevens.com
Summit County Juvenile Court
Get train & bus departures - Android
Driving Directions To Fedex
Insidious 5 Showtimes Near Cinemark Tinseltown 290 And Xd
Top Financial Advisors in the U.S.
7.2: Introduction to the Endocrine System
Goteach11
How To Delete Bravodate Account
104 Presidential Ct Lafayette La 70503
Power Outage Map Albany Ny
Slushy Beer Strain
Animal Eye Clinic Huntersville Nc
Hijab Hookup Trendy
4156303136
The Witcher 3 Wild Hunt: Map of important locations M19
Cashtapp Atm Near Me
London Ups Store
Niche Crime Rate
Classic | Cyclone RakeAmerica's #1 Lawn and Leaf Vacuum
Csi Tv Series Wiki
Yard Goats Score
Www.publicsurplus.com Motor Pool
Schedule An Oil Change At Walmart
Quadcitiesdaily
Food Universe Near Me Circular
Employee Health Upmc
Criterion Dryer Review
Weather Underground Durham
Will there be a The Tower season 4? Latest news and speculation
Moonrise Time Tonight Near Me
What Happened To Father Anthony Mary Ewtn
Lowell Car Accident Lawyer Kiley Law Group
Unlock The Secrets Of "Skip The Game" Greensboro North Carolina
Rage Of Harrogath Bugged
Craigslist Free Manhattan
Conan Exiles Armor Flexibility Kit
Directions To Cvs Pharmacy
Mbfs Com Login
Mitchell Kronish Obituary
Arcanis Secret Santa
Sinai Sdn 2023
Suppress Spell Damage Poe
Madden 23 Can't Hire Offensive Coordinator
Amourdelavie
Latest Posts
Article information

Author: Domingo Moore

Last Updated:

Views: 6120

Rating: 4.2 / 5 (53 voted)

Reviews: 84% of readers found this page helpful

Author information

Name: Domingo Moore

Birthday: 1997-05-20

Address: 6485 Kohler Route, Antonioton, VT 77375-0299

Phone: +3213869077934

Job: Sales Analyst

Hobby: Kayaking, Roller skating, Cabaret, Rugby, Homebrewing, Creative writing, amateur radio

Introduction: My name is Domingo Moore, I am a attractive, gorgeous, funny, jolly, spotless, nice, fantastic person who loves writing and wants to share my knowledge and understanding with you.