On December 14th, we talked about how best to store metadata (date/time, keywords, camera model, exposure properties, etc.) in our database. To get a feeling for the kind of metadata we want to keep and for how different services (FTP, Flickr, Picasa, HTTP) store metadata, we are going to implement a simple web service that can communicate with demo stores (on Flickr, etc.) and read out the metadata available there, e.g. via the Flickr API.
Metadata that could be fetched includes:
- (embedded) thumbnails
- date/time
- keywords, tags
- photographer, copyright
- camera model/make
- exposure properties
- orientation
- vendor-specific metadata (so-called MakerNotes)
- GEO-Tags (GPS data)
The metadata sandbox will be a place to try out metadata-related code. So that we can test our code and explore the possibilities, the sandbox should provide at least the following functions:
- Extracting/displaying of thumbnails
- Text-based search through metadata (e.g. all pictures tagged "vienna")
- Chronological sorting of images from multiple stores
- Detect duplicate images (binary identical files with different filenames)
- Manually sort pictures
- Manually unselecting pictures (to be excluded from the feed)
- Generate feeds for iPhoto (example feed)
- Generate XMP-file for single pictures
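Three of the functions listed above (text-based search, chronological sorting across stores, duplicate detection) can be tried out without touching any real store. The following is only a minimal sketch: the record layout (`store`, `name`, `data`, `taken`, `tags`), the sample pictures, and all function names are assumptions made for illustration, not part of picurl.

```python
import hashlib
from collections import defaultdict
from datetime import datetime

# Hypothetical example records; in the sandbox these would be read
# on the fly from the demo stores instead of being hard-coded.
PICTURES = [
    {"store": "ftp", "name": "IMG_0001.jpg", "data": b"same-bytes",
     "taken": datetime(2007, 12, 1, 10, 30), "tags": ["vienna", "opera"]},
    {"store": "flickr", "name": "vienna_opera.jpg", "data": b"same-bytes",
     "taken": datetime(2007, 12, 1, 10, 30), "tags": ["Vienna"]},
    {"store": "picasa", "name": "graz.jpg", "data": b"other-bytes",
     "taken": datetime(2007, 11, 20, 9, 0), "tags": ["graz"]},
]


def search_by_tag(pictures, tag):
    """Return all pictures carrying the given tag (case-insensitive)."""
    wanted = tag.lower()
    return [p for p in pictures if wanted in [t.lower() for t in p["tags"]]]


def sort_chronologically(pictures):
    """Return the pictures ordered by capture time, regardless of store."""
    return sorted(pictures, key=lambda p: p["taken"])


def find_duplicates(pictures):
    """Group filenames whose file contents are binary identical."""
    groups = defaultdict(list)
    for p in pictures:
        digest = hashlib.sha256(p["data"]).hexdigest()
        groups[digest].append(p["name"])
    return [names for names in groups.values() if len(names) > 1]


vienna = [p["name"] for p in search_by_tag(PICTURES, "vienna")]
ordered = [p["name"] for p in sort_chronologically(PICTURES)]
duplicates = find_duplicates(PICTURES)
```

Comparing content hashes only catches files that are binary identical, which is exactly the case described in the list; the same picture re-encoded or resized by a store would not be detected this way.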
The following libraries and frameworks might be helpful in implementing the metadata sandbox:
- The big picture: http://code.google.com/p/thebigpicture
- web.py: http://webpy.org/
There are still some open questions that have to be dealt with:
- Keep web service up-to-date with latest SVN commits (triggers, manual updates or branches)
To get the most out of our metadata sandbox, we will start with simple JPEG files (EXIF) and then try to extend the metadata sandbox to also deal with TIFF and Camera RAW formats. Flickr and Picasa have the ability to geocode pictures online. These geotags should also be read and used by picurl.
In the metadata sandbox, we do not yet save metadata in a database, but restrict the available data to a controlled set of example pictures. Metadata is read on the fly and is not cached or saved. Caching and saving (the "database") will be designed and implemented after we have played around with the metadata sandbox a bit.
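For the JPEG/EXIF starting point, geotags are stored as degrees, minutes and seconds (each a rational number) plus a hemisphere reference (N/S or E/W), while online services usually work with decimal degrees. A small sketch of the conversion follows; it assumes the EXIF reader hands us each rational as a (numerator, denominator) pair, and the function name and sample coordinates are made up for illustration.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style (degrees, minutes, seconds) to decimal degrees.

    dms is a sequence of three (numerator, denominator) pairs,
    ref is the hemisphere reference: "N", "S", "E" or "W".
    """
    degrees, minutes, seconds = (Fraction(*part) for part in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    if ref in ("S", "W"):
        decimal = -decimal
    return float(decimal)


# Illustrative values only, not taken from a real file.
latitude = dms_to_decimal(((48, 1), (12, 1), (30, 1)), "N")
longitude = dms_to_decimal(((16, 1), (22, 1), (12, 1)), "E")
```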
Core Concepts of the Sandbox
Achieve the lowest common multiple
Metadata standards and APIs often overlap and save identical data in different structures (e.g. a description of an image can be saved in an IPTC attribute, in a Flickr field, or even in the filename). The sandbox should abstract the metadata extraction and map/merge identical data fields in a sensible way. This means there should be only one description field, even if the image carries several description metadata elements.
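One possible way to merge overlapping fields is a priority table: for every unified field, list the source-specific fields that may carry it, and take the first non-empty value. The sketch below is an assumption about how this could look, not a decided design; the priority order, the source names, and the exact field keys are illustrative only.

```python
# Hypothetical mapping: unified field -> (source, source-specific key),
# listed in priority order. Keys and priorities are illustrative only.
FIELD_MAP = {
    "description": [
        ("iptc", "Caption/Abstract"),
        ("flickr", "description"),
        ("exif", "ImageDescription"),
    ],
    "keywords": [
        ("iptc", "Keywords"),
        ("flickr", "tags"),
    ],
}


def merge_metadata(sources):
    """Collapse per-source metadata into one value per unified field.

    sources maps a source name ("exif", "iptc", "flickr", ...) to the
    raw metadata dictionary read from that source.
    """
    merged = {}
    for field, candidates in FIELD_MAP.items():
        for source, key in candidates:
            value = sources.get(source, {}).get(key)
            if value:  # skip missing and empty values
                merged[field] = value
                break
    return merged


example = merge_metadata({
    "exif": {"ImageDescription": ""},
    "iptc": {"Caption/Abstract": "Opera house at night"},
    "flickr": {"description": "opera", "tags": ["vienna", "opera"]},
})
```

Here the IPTC caption wins over the Flickr description, and the keywords fall back to the Flickr tags because no IPTC keywords are present. Whether "first non-empty value wins" is the right rule, or whether values should be combined instead, is one of the things the sandbox is meant to find out.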
Bridging the Metadata Gap
Each metadata standard has its own focus and field of application. EXIF is focused on saving camera settings at the time a picture is taken, IPTC is a descriptive API, Adobe XMP is an "overall container", and Flickr/Picasa concentrate more on the photo-sharing aspect. Furthermore, the storage location and/or format of an image determines whether a metadata API can be used (you can't use the Flickr API for images on your FTP server, and IPTC is not possible for RAW files). The goal of picurl is to "level out" these differences and allow metadata to be saved independently of the storage. The sandbox will therefore serve as a layer above all these APIs and save all metadata that can't be stored through the standards available for the store (e.g. think of geotagging pictures on a read-only FTP store).
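The "layer above all these APIs" idea can be sketched as an overlay: try to write to the store first, and keep the value in the layer itself if the store refuses. Both classes below are hypothetical and exist only for this illustration; the overlay lives in memory, in line with the sandbox not using a database yet.

```python
class ReadOnlyStore:
    """Hypothetical store that exposes metadata but rejects writes,
    e.g. a read-only FTP server."""

    def __init__(self, metadata):
        self._metadata = metadata

    def read(self, image_id):
        return dict(self._metadata.get(image_id, {}))

    def write(self, image_id, field, value):
        raise PermissionError("store is read-only")


class MetadataLayer:
    """Keeps metadata that the underlying store cannot save itself."""

    def __init__(self, store):
        self.store = store
        self.overlay = {}  # image_id -> {field: value}, in memory only

    def set(self, image_id, field, value):
        try:
            self.store.write(image_id, field, value)
        except PermissionError:
            # The store cannot hold this value, so the layer keeps it.
            self.overlay.setdefault(image_id, {})[field] = value

    def get(self, image_id):
        merged = self.store.read(image_id)
        merged.update(self.overlay.get(image_id, {}))
        return merged


store = ReadOnlyStore({"IMG_0001.jpg": {"title": "Opera"}})
layer = MetadataLayer(store)
layer.set("IMG_0001.jpg", "gps", (48.2083, 16.37))
combined = layer.get("IMG_0001.jpg")
```

Callers only talk to the layer, so a geotag set on a picture from a read-only FTP store reads back the same way as one that a store saved natively.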
