TinEye is a search engine built for finding images.
It currently indexes 901 million images, and retrieval is quite fast. For the popular Lenna image (512 × 512 pixels, file size 463 KB, MIME type image/png) it reports:
233 results
searched over 901 million images in 1.031 seconds
What algorithm makes the search so efficient? Comparing the submitted image against every indexed image in real time is not feasible, so the engine compares the submitted image with a database of signatures/hashes built by the TinEye robots/crawlers, similar to what Google does with web pages.
I’ve done some experiments with the TinEye search engine to estimate which features the hash function uses to compare images.
The sample image used (JPEG image, 980×306, 221.5 KB) is from the Arab Emirates, and TinEye effectively indexes it as a single image (it reports 20 results, but they are all identical: the same website in different languages).
20 results
searched over 901 million images in 0.138 seconds
I investigate the comparison signature by taking the original image, processing it, and submitting the result for search. The first approach works in the space domain and the second one in the frequency domain.
Space domain analysis
In the lists below, the left side gives the filter applied to the original image and the right side gives the number of results reported by TinEye.
Gaussian noise 10×10 pixel: 20 results
Gaussian noise 15×15 pixel: 12 results
Gaussian noise 20×20 pixel: 0 results
Gaussian noise 50×50 pixel: 0 results
Pixelize 4×4 pixel: 20 results
Pixelize 8×8 pixel: 12 results
contrast 60%: 20 results
contrast 90%: 12 results
contrast 100%: 12 results
linear color mapping: 0 results
exponential color mapping: 12 results
double exponential color mapping: 0 results

dilate 1 pass: 20 results
erode 1 pass: 12 results
erode 2 pass: 12 results
sharpen 50 pixel wide: 20 results
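The post does not say which tool produced these filters, so the following is only a minimal sketch of what a "Pixelize N×N" step might look like, assuming it means replacing each N×N tile of a grayscale image with its mean value. The function name `pixelize` and the use of NumPy are assumptions for illustration.

```python
import numpy as np

def pixelize(image, block):
    """Replace every block x block tile of a 2-D grayscale array with its mean.

    Assumed interpretation of "Pixelize NxN"; the original tool may differ.
    """
    height, width = image.shape
    out = image.astype(float).copy()
    for y in range(0, height, block):
        for x in range(0, width, block):
            tile = out[y:y + block, x:x + block]  # view into `out`
            tile[...] = tile.mean()               # flatten the tile to one value
    return out

# Tiny synthetic example: a 4x4 ramp pixelized with 2x2 tiles.
sample = np.arange(16).reshape(4, 4)
pixelized = pixelize(sample, 2)
```

Averaging over tiles leaves the overall mean brightness unchanged while removing detail finer than the tile size, which is one way to probe how much fine detail the signature depends on.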
Flips
Flip horizontal: 0 results
Flip vertical: 0 results
Rotation
1 degree rotation: 20 results
6 degree rotation: 20 results
90 degree rotation: 0 results
Cropping
original image size: 980×306
crop    size      relative area   found   results
crop1   512×304   51.90%          Yes     22
crop2   369×304   37.41%          Yes     22
crop3   237×150   11.85%          No
crop4   190×222   14.06%          No
crop5   225×306   22.96%          Yes     3
crop6   225×215   16.13%          No
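The "rel" figures above appear to be the crop area as a percentage of the original 980×306 image. A short sketch that recomputes them (the names `ORIGINAL`, `CROPS` and `relative_area` are just illustrative):

```python
ORIGINAL = (980, 306)  # width, height of the sample image

CROPS = {
    "crop1": (512, 304),
    "crop2": (369, 304),
    "crop3": (237, 150),
    "crop4": (190, 222),
    "crop5": (225, 306),
    "crop6": (225, 215),
}

def relative_area(crop, original=ORIGINAL):
    """Return the crop area as a percentage of the original image area."""
    return 100.0 * (crop[0] * crop[1]) / (original[0] * original[1])

for name, size in CROPS.items():
    print(f"{name}: {size[0]}x{size[1]} -> {relative_area(size):.2f}%")
```

With rounding to two decimals, crop4 comes out as 14.07% rather than 14.06%, so the value listed above looks truncated rather than rounded. In these experiments every crop that was found covers at least about 23% of the original area, and every crop that was missed covers about 16% or less.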
colorize (hue 200, saturation 30, light 0): Yes, 12 results
theta1, threshold 205 to 255: No
sobel filter: 0 results
It’s very surprising to see that the hashing function isn’t robust to mirroring or to a 90 degree rotation, even though it tolerates small rotations (1 and 6 degrees).
My guess is that the comparison function combines several measures of the image, possibly:
- color statistics
- image segmentation
- frequency domain (maybe)
and then uses a Bayesian classifier to decide whether the image is similar or not.
Why the frequency domain? Because a horizontal flip mirrors the FFT coefficients (about the vertical axis) while leaving the histogram of the image unchanged, and TinEye no longer finds the flipped image. A signature based on the histogram alone would not notice the flip.

Indeed, applying a low-pass filter in the frequency domain still gives positive results even though the histogram of the image changes. I’m investigating different filters at the moment to discover more (please be patient, I don’t have much time right now).
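The flip argument can be checked numerically. The sketch below uses a small random array as a stand-in for the sample image (an assumption, the real image is not reproduced here) and NumPy's FFT. It shows that a horizontal flip leaves the histogram untouched but changes the FFT coefficients, whose magnitudes end up mirrored along the horizontal frequency axis.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8))  # synthetic stand-in image
flipped = image[:, ::-1]                   # horizontal flip

# 1) The gray-level histogram does not change under a flip.
hist_original = np.bincount(image.ravel(), minlength=256)
hist_flipped = np.bincount(flipped.ravel(), minlength=256)
same_histogram = np.array_equal(hist_original, hist_flipped)

# 2) The FFT coefficients do change.
F = np.fft.fft2(image)
G = np.fft.fft2(flipped)
same_spectrum = np.allclose(F, G)

# 3) Their magnitudes are mirrored: |G[v, u]| == |F[v, -u mod N]|.
mirrored = np.roll(F[:, ::-1], 1, axis=1)
magnitudes_mirrored = np.allclose(np.abs(G), np.abs(mirrored))

print(same_histogram, same_spectrum, magnitudes_mirrored)
```

So a feature that is sensitive to the layout of the spectrum (or to spatial layout in general) can tell an image from its mirror, while a plain histogram cannot.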
Frequency domain (high pass)
I’m using a high-pass filter to check whether the hash takes into account the higher frequencies, which the human eye hardly perceives. It seems that TinEye does consider them, because I got positive results even with different histograms.
FFT high pass 1 (down to 100 pixels, up to 0 pixels, tolerance 5%): 12 results
FFT high pass 2 (down to 10 pixels, up to 0 pixels): 12 results
FFT high pass 3 (down to 2 pixels, up to 0 pixels): 12 results
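For readers who want to reproduce this kind of test, here is a minimal sketch of a high-pass filter in the frequency domain. It is not the filter used above: the tool behind the "down to N pixels" and "tolerance" settings is not named in this post, so this version assumes a simple hard radial cutoff, and the name `fft_high_pass` and its `cutoff` parameter are illustrative.

```python
import numpy as np

def fft_high_pass(image, cutoff):
    """Zero all frequencies whose radius (in cycles per image) is below `cutoff`.

    Simplified ideal high-pass filter on a 2-D grayscale array.
    """
    height, width = image.shape
    fy = np.fft.fftfreq(height) * height   # vertical frequency indices
    fx = np.fft.fftfreq(width) * width     # horizontal frequency indices
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)

    spectrum = np.fft.fft2(image)
    spectrum[radius < cutoff] = 0          # drop the low frequencies
    return np.real(np.fft.ifft2(spectrum))

# A flat image has only a DC component, so nothing survives the filter.
flat = np.full((8, 8), 7.0)
filtered_flat = fft_high_pass(flat, cutoff=1)
```

Because the lowest frequencies carry the overall brightness and large color areas, removing them changes the histogram considerably while edges and fine texture stay in place.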
Entropy
I recently discovered that TinEye doesn’t compute a signature if the image has low entropy, which is quite interesting.
For instance an image with an entropy of :
is not accepted for the hashing
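How the entropy is measured is not stated here, so the following sketch assumes the usual Shannon entropy of the gray-level histogram, in bits per pixel. The function name `shannon_entropy` is illustrative, and the threshold TinEye applies is unknown.

```python
import math
from collections import Counter

def shannon_entropy(pixels):
    """Shannon entropy (bits per pixel) of a flat sequence of gray levels."""
    counts = Counter(pixels)
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy

uniform_gray = [128] * 100        # a single flat color
two_levels = [0, 255] * 50        # half black, half white

print(shannon_entropy(uniform_gray))  # 0.0
print(shannon_entropy(two_levels))    # 1.0
```

A flat or nearly flat image scores close to zero, which fits the idea that such an image carries too little information to build a distinctive signature.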