Update: Since writing this blog post I’ve been using the Open Source Pastec application to do image similarity search. It works well, although it’s not quite as good as the commercial MatchEngine service provided by TinEye. I’ve also written a Pastec node module that you may find helpful.
I’ve been working on a few projects in my spare time and one service, in particular, would greatly benefit from a high quality image similarity search.
I’ve been trying a number of the Open (and non-Open) Source tools (a great list of which is on Wikipedia here). Thus far none of the tools that I’ve found are of high-enough quality to warrant further pursuit. They either do simple color comparison, basic wavelet/outline comparison, or some form of hashing – none of which appears to work very well beyond basic images. Some of the best algorithms are either caught up in University research programs (generally unreleased) or are available as corporate search engines.
In an ideal world I’d like something with the quality of TinEye (I’d even be open to using TinEye’s commercial services but they haven’t gotten back to me as of yet – I suspect that they’re mostly interested in dealing with large corporate clients).
In short: Does anyone have a lead on a high quality image similarity search tool (using Content-Based Image Retrieval)? I’m open to Open Source, closed source, or even paid API service – as long as it works well.
Note: I’m looking to use this on a private collection of images, so a service like Google Image Search (or TinEye’s normal commercial API) is not a suitable alternative – they both search images on the open web.
John Resig (February 13, 2012 at 4:51 pm)
Thus far I’ve used: imgSeek, Windsurf, pHash, and libpuzzle – amongst others. Other suggestions are certainly welcome!
Tomas Corral (February 13, 2012 at 4:51 pm)
http://tcorral.github.com/IM.js/
IM.js is an Image Matcher using canvas.
I hope it can help you!
Best regards!
John Resig (February 13, 2012 at 4:53 pm)
@Tomas: Thank you for the suggestion! Unfortunately what I’m looking for is much more complex than that. It must be able to handle images of different sizes, different colors, rotations, scaling, and distortion. Some of the projects that I linked to above get close to that but still fall short.
Tomas Corral (February 13, 2012 at 5:15 pm)
Ok. Thanks!
James (February 13, 2012 at 5:32 pm)
Do you want to compare two images for similarity or find stuff in photos like Google Goggles? If the former, ImageMagick’s compare may do the trick.
http://www.imagemagick.org/script/compare.php
John Resig (February 13, 2012 at 5:38 pm)
@James: More that I want to compare two images. Finding things in an image is a good sub-goal but the technique that Image Magick’s compare offers is too limited (and slow) to scale. Being able to operate against a couple hundred thousand images is a must and doing one-on-one sub-image comparisons will certainly be trying.
@_dhar (February 13, 2012 at 5:58 pm)
I heard about http://www.moodstocks.com/ but never used it. Not sure if it really fits your needs…
Eric Conner (February 13, 2012 at 6:05 pm)
Probably not exactly what you are looking for, but ImageNet (http://www.image-net.org/) may be of interest. It tries to match images to the WordNet hierarchy.
John Resig (February 13, 2012 at 6:08 pm)
@_dhar: That service does look like it would be useful for me – unfortunately it also looks like it would be prohibitively expensive. With 200,000 images it would cost ~20,000 Euros a month to keep it running. Thanks for the suggestion though!
John Resig (February 13, 2012 at 6:09 pm)
@Eric: You’re right in that it’s not exactly what I’m looking for, but that’s a very cool project, nonetheless. Thank you for bringing it to my attention!
Jon (February 13, 2012 at 6:13 pm)
Woah, a bit beyond me- but found this earlier, may be helpful- maybe not. Good luck!
http://www.cse.ust.hk/image_forensics/
John Resig (February 13, 2012 at 6:39 pm)
@Jon: That’s much closer to what I’m looking for! I’ll dig into this a bit more. Thank you for the suggestion!
Jon (February 13, 2012 at 6:51 pm)
No worries, glad to help.
Graeme (February 13, 2012 at 6:52 pm)
Have you come across SIFT and SURF? I worked briefly with them during my masters and they seemed pretty powerful. I think SURF is open source and has been implemented in the OpenCV library.
Graeme (February 13, 2012 at 6:54 pm)
(Starter) links: http://en.wikipedia.org/wiki/SURF
http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
Nick (February 13, 2012 at 7:31 pm)
I knew someone who did an undergraduate thesis in this area.
He would calculate a video delta frame between two images (using a variety of video codecs and encoders), and then measure the complexity of that delta frame.
The result was pretty good, and he even had a cool jQuery demo which arranged a collection of images spatially based on their similarity.
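A minimal sketch of the delta-frame idea described above. It does not use a video codec: as a stand-in, it takes the per-pixel difference of two equal-size grayscale images and uses the zlib-compressed size of that difference as a rough complexity measure. The function name and the flat-list image representation are assumptions made for illustration.

```python
import zlib


def delta_complexity(img_a, img_b):
    """Compressed size (in bytes) of the per-pixel difference of two images.

    Both images are flat sequences of 0-255 grayscale values of equal
    length. A smaller result suggests the images are more alike.
    zlib is a stand-in here for the video encoders mentioned above.
    """
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same number of pixels")
    # Wrap each difference into 0..255 so it fits into a single byte.
    delta = bytes((a - b) % 256 for a, b in zip(img_a, img_b))
    return len(zlib.compress(delta, 9))
```

Images of different sizes would need to be resized to a common resolution first, and a real codec would also compensate for motion, which this sketch does not attempt.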
Cor Bosman (February 13, 2012 at 8:41 pm)
I just hooked up an image library with 88000+ images to a PHP website using imgSeek. Works like a charm for image similarity searches, and has a workable PHP interface through either SOAP or XMLRPC. Images are found pretty much instantly, but the program is a memory hog. Using 500+MB RES memory right now.
Asa Baylus (February 13, 2012 at 9:08 pm)
Adobe’s Pixel Nuggets looks pretty amazing. I’m not sure where the project stands, but you might be able to work something out with them.
http://tv.adobe.com/watch/max-2011-sneak-peeks/max-2011-sneak-peek-pixel-nuggets/
Eitan (February 13, 2012 at 11:01 pm)
Hey,
You can perhaps try out: http://vision.stanford.edu/projects/objectbank/index.html#software.
I’ll explain more as needed.
Good luck!
wheresrhys (February 14, 2012 at 4:26 am)
http://www.cs.bath.ac.uk/brown/autostitch/autostitch.html is very good at recognising image overlaps, so could maybe be adapted to do an image search. On the site they say they’re open to being approached to develop new products using it, so maybe it’s possible to get a hold of just the image comparison component.
John Noel (February 14, 2012 at 5:10 am)
Looking at some of the ones you’ve tried and from my own limited knowledge of the field – is it maybe worth trying to combine the different methods? So try different pipelines e.g. imgSeek > pHash > Windsurf > libpuzzle, to see which provides the best matches? Or similarly try two or three of them async and see which images they agree on?
That way at least you’re not reliant on one monolithic similarity metric but rather on many, which allows for pluggable enhancements if a new, “better” similarity method comes out in the near future.
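The "see which images they agree on" idea could be sketched roughly as follows. The code assumes each tool has already produced a list of candidate image ids for the query; it does not call imgSeek, pHash, Windsurf, or libpuzzle, and the function name and `min_votes` parameter are hypothetical.

```python
from collections import Counter


def agreed_matches(results_by_method, min_votes=2):
    """Return image ids proposed by at least `min_votes` methods.

    `results_by_method` maps a method name to the list of image ids that
    method returned for one query. Results are ordered by vote count
    (highest first), then by id for a stable order.
    """
    votes = Counter()
    for matches in results_by_method.values():
        votes.update(set(matches))  # each method votes once per image
    agreed = [(image_id, n) for image_id, n in votes.items() if n >= min_votes]
    agreed.sort(key=lambda item: (-item[1], item[0]))
    return [image_id for image_id, _ in agreed]
```

Because each method is just a key in the dictionary, a new similarity method can be plugged in without changing the voting logic.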
frank (February 14, 2012 at 9:05 am)
OpenCV has a good collection of algorithms (SURF, SIFT, ANN, etc.).
Not a complete solution but a very good starting point for implementing one.
Darcy Parker (February 14, 2012 at 9:42 am)
Have you looked at MPEG7? http://en.wikipedia.org/wiki/MPEG-7
There are many strategies for indexing images. Solutions based on MPEG7 are readily available, work well for digital images, and can handle noise/differences between similar images. I believe there have been some mashups for Flickr that use MPEG7. And if I recall correctly, iPhoto and Picasa use MPEG7 (or a very similar indexing method) to identify pictures of people that are similar so that you can reuse metadata (tags).
Scott Trudeau (February 14, 2012 at 12:33 pm)
John,
I’ve had a chance to play with some of Idée’s (TinEye maker) APIs and they are pretty amazing (their color/palette indexing is also pretty impressive for large image sets). I did a cursory review of what is out there a bit over a year ago and while there is a lot of academic research addressing these sorts of problems, as far as I know there isn’t an easily accessible service/library/etc that does this sort of thing easily/well/efficiently. I hope I missed something and/or there are some good solutions that have evolved since then … Idée’s prices were unfortunately too high for us to justify at the time for our own application and we haven’t revisited this. Good luck!
Otavio (February 14, 2012 at 4:05 pm)
Hello,
You can take a look at Eva tool:
http://www.recod.ic.unicamp.br/~otavio/eva/
Daniel (February 15, 2012 at 6:43 am)
Hi John,
we did this image similarity search service called similoo: http://www.similoo.com
Maybe it is suitable for you?
kellan (February 15, 2012 at 6:40 pm)
If you’re looking for “TinEye-like” you’re probably looking for “Near Duplicate Detection and Sub-image retrieval” aka LSH or locality sensitive hashing. Lots and lots of knobs to twiddle in this direction based on how much alteration you’re trying to collapse and also how fast you want indexing to be, and how much data you’re willing to store per photo. OpenCV isn’t the most accessible of toolkits, but it’s the most complete.
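A rough sketch of one simple LSH scheme for near-duplicate lookup. It assumes every image already has a 64-bit binary fingerprint (for example from some perceptual hash, which is not computed here). The fingerprint is split into bands; images sharing any band with the query become candidates, and candidates are then verified by Hamming distance. The class name and parameters are hypothetical, and the band count is one of the "knobs" mentioned above.

```python
from collections import defaultdict


class BandIndex:
    """Banded LSH index over fixed-width binary fingerprints."""

    def __init__(self, bits=64, bands=4):
        if bits % bands:
            raise ValueError("bits must be divisible by bands")
        self.bands = bands
        self.width = bits // bands
        self.mask = (1 << self.width) - 1
        # One hash table per band: band value -> set of image ids.
        self.buckets = [defaultdict(set) for _ in range(bands)]
        self.fingerprints = {}

    def _keys(self, fingerprint):
        # Slice the fingerprint into `bands` chunks of `width` bits each.
        return [(fingerprint >> (i * self.width)) & self.mask
                for i in range(self.bands)]

    def add(self, image_id, fingerprint):
        self.fingerprints[image_id] = fingerprint
        for table, key in zip(self.buckets, self._keys(fingerprint)):
            table[key].add(image_id)

    def query(self, fingerprint, max_distance=3):
        """Return ids within `max_distance` differing bits, closest first."""
        candidates = set()
        for table, key in zip(self.buckets, self._keys(fingerprint)):
            candidates |= table.get(key, set())
        hits = []
        for image_id in candidates:
            distance = bin(self.fingerprints[image_id] ^ fingerprint).count("1")
            if distance <= max_distance:
                hits.append((distance, image_id))
        return [image_id for _, image_id in sorted(hits)]
```

With 4 bands, any fingerprint that differs from the query in at most 3 bits must match it exactly in at least one band, so no such near-duplicate is missed. More bands tolerate more alteration at the cost of more candidates to verify and more data stored per photo.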
Ricardo Cabral (February 15, 2012 at 7:19 pm)
Hi John. What would you say were the shortcomings after evaluating imgSeek (or isk-daemon)?
Darryl (February 16, 2012 at 7:29 am)
Check out this blog post:
http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html
Marcin Szajek (February 17, 2012 at 5:51 am)
Hello,
look at http://www.itraff.pl/english/
It is the website of a company that has its own photo recognition technology.
If you need more information or want to test it feel free to contact me (szajek[at]programa.pl)
Marcin Szajek (February 17, 2012 at 5:56 am)
I forgot to mention that we were one of the best European startups at StartupFest Wien 2011 (SaveUp – iOS & Android) and won Hack4Europe! 2011 Edition (the highest business value category) organized by the European Union Commission (Art4Europe Android app)
William Riley-Land (February 17, 2012 at 5:31 pm)
I wonder if you might take some cues from the article “Medical Image Registration using Evolutionary Computation” in IEEE Computational Intelligence Magazine Vol. 6 Num. 4. It’s not geared toward search, but a lot of the techniques would be applicable.
From the article: “Image registration … is used to align … images acquired … at different times … from different viewpoints … IR aims to estimate the best geometric transformation leading to the best possible overlap [between the images].”
The article goes on to detail some similarity metrics and strategies for finding high-scoring transformations of one image onto another.
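To make the registration idea concrete, here is a small sketch that searches for the best alignment of one image onto another. It is not the evolutionary approach from the article: it only considers integer translations and tries them all by brute force, scoring each by the mean absolute difference over the overlapping region. Images are assumed to be equal-size 2D lists of grayscale values, and the function name is hypothetical.

```python
def best_translation(fixed, moving, max_shift=3):
    """Find the shift (dx, dy) that best overlays `moving` on `fixed`.

    A shift (dx, dy) compares fixed[y][x] with moving[y + dy][x + dx].
    Returns (dx, dy, score), where score is the mean absolute pixel
    difference over the overlapping region (lower is better).
    """
    height, width = len(fixed), len(fixed[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            for y in range(height):
                for x in range(width):
                    my, mx = y + dy, x + dx
                    if 0 <= my < height and 0 <= mx < width:
                        total += abs(fixed[y][x] - moving[my][mx])
                        count += 1
            if count == 0:
                continue  # no overlap at this shift
            score = total / count
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2], best[0]
```

Handling rotation, scaling, and distortion means searching a much larger transformation space, which is where smarter search strategies such as the article's evolutionary methods come in.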
Layne Lin (February 19, 2012 at 2:31 pm)
Have you tried the correlation coefficient formula?
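Assuming this refers to the Pearson correlation coefficient applied to pixel intensities, a minimal sketch would look like this. Images are assumed to be equal-length flat lists of grayscale values; the function name is hypothetical.

```python
import math


def correlation_coefficient(img_a, img_b):
    """Pearson correlation between two equal-size grayscale images.

    Returns a value in [-1, 1]: 1 means the pixel values are perfectly
    linearly related, 0 means no linear relationship.
    """
    n = len(img_a)
    if n == 0 or n != len(img_b):
        raise ValueError("images must be non-empty and the same size")
    mean_a = sum(img_a) / n
    mean_b = sum(img_b) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(img_a, img_b))
    var_a = sum((a - mean_a) ** 2 for a in img_a)
    var_b = sum((b - mean_b) ** 2 for b in img_b)
    if var_a == 0 or var_b == 0:
        # A completely flat image has no defined correlation;
        # this sketch treats it as "no match".
        return 0.0
    return covariance / math.sqrt(var_a * var_b)
```

This tolerates uniform brightness and contrast changes, but not rotation, scaling, or cropping, since pixels are compared position by position.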
ram (February 19, 2012 at 9:11 pm)
Did you try Caliph image matching? It’s pretty scalable for large sets of images.
http://sourceforge.net/projects/caliph-emir/
ram (February 19, 2012 at 9:15 pm)
http://www.semanticmetadata.net/wiki/doku.php?id=start is the link to a tutorial on using the Caliph/LIRE Java API library.
logan henriquez (February 21, 2012 at 11:02 pm)
John
I built an image search engine for the art gallery world a few years ago and used FIRE as part of the image similarity score:
http://code.google.com/p/fire-cbir/
As part of that project we compared millions of images against each other so you could pick an artwork and find all the other artworks with similar look, genre, etc. At the time we compared many of the engines out there including some of those proprietary and university research based systems you mentioned. FIRE was one of the best at the time. The issue with all these systems is that they’re sensitive to elements of the image that humans consider irrelevant, for instance if it has a funky border (e.g. the frame of an artwork or the background of a sculpture). None of the CBIR systems produced results that humans would consider good by themselves. We augmented the FIRE similarity score with other data culled from other sources, such as the artwork’s artist, genre, etc. and together the results were fairly good. The other issue with these systems is that they’re fairly computationally intensive – we ran FIRE as a background job, computing a score and a ranked “most similar” artwork list for every artwork and storing that for the front end to retrieve results from – no different from any other search engine.
Best of luck.
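The pattern described above (blend a visual similarity score with metadata, then precompute a ranked list per artwork in a background job) could be sketched like this. The `visual_score` callable is a stand-in for whatever the CBIR engine returns and does not represent FIRE's actual API; the field names, the 0.7 weight, and the function names are all assumptions for illustration.

```python
def metadata_similarity(meta_a, meta_b, fields=("artist", "genre")):
    """Fraction of the given metadata fields on which two artworks agree."""
    matches = sum(
        1 for field in fields
        if meta_a.get(field) is not None and meta_a.get(field) == meta_b.get(field)
    )
    return matches / len(fields)


def precompute_rankings(artworks, visual_score, top_k=5, visual_weight=0.7):
    """Build a "most similar" list for every artwork.

    `artworks` maps an artwork id to its metadata dict.
    `visual_score(id_a, id_b)` is assumed to return a similarity in [0, 1].
    Returns a dict of id -> list of up to `top_k` ids, best match first.
    """
    rankings = {}
    for id_a, meta_a in artworks.items():
        scored = []
        for id_b, meta_b in artworks.items():
            if id_a == id_b:
                continue
            score = (visual_weight * visual_score(id_a, id_b)
                     + (1 - visual_weight) * metadata_similarity(meta_a, meta_b))
            scored.append((score, id_b))
        # Highest score first; ties broken by id for a stable order.
        scored.sort(key=lambda item: (-item[0], item[1]))
        rankings[id_a] = [id_b for _, id_b in scored[:top_k]]
    return rankings
```

The all-pairs loop is quadratic in the number of artworks, which is why it makes sense as an offline job whose stored output the front end simply looks up.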
Pierre Chapuis (February 27, 2012 at 11:13 am)
@john Hello, I work at Moodstocks and I have just noticed this post. We do not really do similarity search but rather exact matching, however it looks like it could fit your use case.
Regarding pricing: we don’t charge for non-production use, and we actually charge most of our corporate clients annual licenses rather than pay-for-use fees. We will update our pricing page to reflect this soon.
In the meantime feel free to contact us if you want to try to figure out a pricing that works for you (contact [at] moodstocks.com).