The Code

All the bits and pieces are in our GitHub; specifically, the following pieces make up the whole:

  • Browser Extensions - a single repository from which we build both Firefox and Chrome extensions with shared code between them (GNU GPLv3)
  • Blockhash - the algorithm we use to identify images (MIT license). An RFC (in progress) describes the algorithm in more detail. There are also implementations in Python and JavaScript (both MIT licensed).
  • HmSearch - a C++ implementation of a Hamming distance search algorithm (MIT license). We also provide Node bindings for HmSearch. The master branch works against a Kyoto Cabinet backend, but branches exist for working against LevelDB or PostgreSQL. We currently use PostgreSQL as our own backend.
  • Commons Hasher - our script that reads database dumps from Wikimedia, retrieves information from the Wikimedia API, and calculates hashes of the images retrieved (GNU GPLv3).
  • Catalog - our catalog infrastructure, which provides the API that the browser extensions communicate with (GNU GPLv3).
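
To give a rough feel for how image hashing and Hamming distance search fit together, here is a minimal Python sketch. It is not the Blockhash or HmSearch code: `simple_block_hash` is a simplified block-mean hash written for illustration, and `search` is a brute-force scan standing in for the indexed lookup a real search library performs. All function names, the tiny grid size, and the sample values are assumptions made for this example.

```python
from statistics import median


def simple_block_hash(pixels, blocks=2):
    """Hash a grayscale image (a list of rows of ints) into an integer.

    Illustrative only: the image is split into blocks x blocks cells, and
    each cell contributes one bit, set when the cell's mean brightness is
    above the median of all cell means.
    """
    height, width = len(pixels), len(pixels[0])
    if height % blocks or width % blocks:
        raise ValueError("image size must be divisible by the block count")
    cell_h, cell_w = height // blocks, width // blocks

    means = []
    for by in range(blocks):
        for bx in range(blocks):
            total = sum(
                pixels[y][x]
                for y in range(by * cell_h, (by + 1) * cell_h)
                for x in range(bx * cell_w, (bx + 1) * cell_w)
            )
            means.append(total / (cell_h * cell_w))

    threshold = median(means)
    value = 0
    for mean in means:
        # Append one bit per cell, most significant bit first.
        value = (value << 1) | (1 if mean > threshold else 0)
    return value


def hamming_distance(a, b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def search(query, known_hashes, max_distance):
    """Brute-force lookup: every known hash within max_distance of query.

    Returns (distance, hash) pairs sorted by distance.
    """
    hits = []
    for known in known_hashes:
        distance = hamming_distance(query, known)
        if distance <= max_distance:
            hits.append((distance, known))
    return sorted(hits)


if __name__ == "__main__":
    image = [
        [0, 0, 255, 255],
        [0, 0, 255, 255],
        [255, 255, 0, 0],
        [255, 255, 0, 0],
    ]
    query = simple_block_hash(image)
    print(search(query, [0b0110, 0b0111, 0b1001], max_distance=1))
```

Near-duplicate images tend to produce hashes that differ in only a few bits, which is why matching is done by Hamming distance rather than by exact equality. The linear scan above costs one comparison per stored hash; avoiding that cost on a large catalog is the reason to use a dedicated search implementation.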



The catalog API is documented with Apiary, and we encourage you to check it out. Please note that while the API supports user profiles and extensive information about works, we currently only use the API calls that look up and provide information about works, none of which require authentication.

Information about the photographs in the catalog is carried within W3C Media Annotations. There are two examples (for JavaScript and Python) on GitHub that are a good starting point for learning how to communicate with the catalog.