How picurl gets all the metadata it needs
This document presents the metadata standards and providers relevant to picurl. It also highlights their strengths and weaknesses and discusses some general issues in retrieving metadata.
If you are looking for a list of the metadata fetched by picurl, see MetadataCategories.
General Considerations for retrieving metadata
Data Reliability
All metadata standards presented in this document specify the *quantity and format* of metadata, not its quality, so not all retrieved metadata can be trusted. The EXIF Orientation tag is an example: most cameras have no built-in sensor to detect whether the camera was rotated during shooting. Nevertheless, they still set the Orientation tag, and always to landscape format (image width > height).
Data Redundancy
Although every metadata format has its own domain of application, the formats sometimes overlap. For example, copyright information can be provided both in EXIF and in IPTC. Some redundancies are the result of "silent imports": both Flickr and Picasa try to import existing IPTC data from an uploaded image into their title and description fields.
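One way to cope with redundant fields is a fixed precedence order plus a record of disagreements. The following Python sketch assumes that each source has already been parsed into a plain dictionary with shared field names; the function name `merge_metadata`, the field names and the XMP > IPTC > EXIF precedence are hypothetical choices for illustration, not picurl's documented behavior.

```python
from typing import Dict, List, Tuple

# Hypothetical precedence: sources earlier in the list win over later ones.
SOURCE_ORDER = ["xmp", "iptc", "exif"]


def merge_metadata(
    sources: Dict[str, Dict[str, str]],
) -> Tuple[Dict[str, str], Dict[str, List[Tuple[str, str]]]]:
    """Merge per-source metadata dicts into one record.

    Returns the merged record and, for every field whose sources disagree,
    the list of (source, value) pairs so the conflict can be reviewed.
    """
    merged: Dict[str, str] = {}
    seen: Dict[str, List[Tuple[str, str]]] = {}
    for source in SOURCE_ORDER:
        for field, value in sources.get(source, {}).items():
            value = value.strip()
            if not value:
                continue  # ignore empty fields
            seen.setdefault(field, []).append((source, value))
            merged.setdefault(field, value)  # first (highest-priority) value wins
    conflicts = {
        field: pairs
        for field, pairs in seen.items()
        if len({v for _, v in pairs}) > 1
    }
    return merged, conflicts


record, conflicts = merge_metadata({
    "exif": {"copyright": "J. Doe"},
    "iptc": {"copyright": "(c) J. Doe", "headline": "Harbour"},
})
# record keeps the IPTC copyright; conflicts lists both copyright variants.
```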
Cost of Retrieval
Retrieving metadata costs both computing time and network traffic, especially when accessing the APIs of photo-sharing services. For example, it takes three API calls to retrieve the title, description and geographical coordinates of a Flickr.com photo, which can slow down the inventory of a large photo collection dramatically.
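A back-of-the-envelope estimate shows how quickly this adds up. The sketch below assumes purely sequential requests and an average round trip of 0.5 s per call; both the function `estimate_inventory_cost` and the latency figure are illustrative assumptions, not measurements.

```python
from typing import Tuple


def estimate_inventory_cost(photos: int, calls_per_photo: int = 3,
                            seconds_per_call: float = 0.5) -> Tuple[int, float]:
    """Return (total API calls, total seconds) for a sequential inventory.

    calls_per_photo defaults to the three Flickr calls per photo;
    seconds_per_call is an assumed average round-trip time.
    """
    calls = photos * calls_per_photo
    return calls, calls * seconds_per_call


calls, seconds = estimate_inventory_cost(10_000)
print(f"{calls} calls, about {seconds / 3600:.1f} hours")  # prints "30000 calls, about 4.2 hours"
```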
Metadata standards and providers
EXIF
EXIF is an abbreviation for Exchangeable Image File Format. It is a specification for the image file format used by digital cameras. EXIF is based on the existing JPEG and TIFF specifications and extends them with a broad range of metadata, including:
- Date and time information. Digital cameras will record the current date and time and save this in the metadata.
- Camera settings. This includes static information such as the camera model and make, and information that varies with each image such as orientation, aperture, shutter speed, focal length, metering mode, and ISO speed information.
- A thumbnail for previewing the picture on the camera's LCD screen, in file managers, or in photo manipulation software.
- Descriptions and copyright information.
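To illustrate how EXIF builds on TIFF inside a JPEG file, here is a minimal, standard-library-only Python sketch that locates the APP1 "Exif" segment and decodes a few IFD0 tags (Make, Model, Orientation, DateTime). It is a simplification: it ignores the Exif sub-IFD (where tags such as DateTimeOriginal and the exposure settings live), MakerNotes and the thumbnail IFD, and the function names are hypothetical. For real files, a mature library or ExifTool is the safer choice.

```python
import struct
from typing import Dict, Optional, Union

# A few IFD0 tags defined by the TIFF/EXIF specifications.
TAG_NAMES = {0x010F: "Make", 0x0110: "Model", 0x0112: "Orientation", 0x0132: "DateTime"}


def find_exif_segment(jpeg: bytes) -> Optional[bytes]:
    """Return the TIFF structure inside the first APP1 'Exif' segment, or None."""
    if jpeg[:2] != b"\xff\xd8":  # every JPEG file starts with the SOI marker
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(jpeg):
        marker, length = struct.unpack(">HH", jpeg[pos:pos + 4])
        if marker == 0xFFDA:  # start of scan: metadata segments come before this
            break
        payload = jpeg[pos + 4:pos + 2 + length]  # length includes its own 2 bytes
        if marker == 0xFFE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]
        pos += 2 + length
    return None


def read_ifd0(tiff: bytes) -> Dict[str, Union[str, int]]:
    """Decode the ASCII and single-SHORT tags from TAG_NAMES found in IFD0."""
    order = {b"II": "<", b"MM": ">"}[tiff[:2]]  # Intel (little) or Motorola (big) endian
    magic, ifd_offset = struct.unpack(order + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF header")
    (count,) = struct.unpack(order + "H", tiff[ifd_offset:ifd_offset + 2])
    result: Dict[str, Union[str, int]] = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        entry = tiff[start:start + 12]
        tag, typ, n = struct.unpack(order + "HHI", entry[:8])
        name = TAG_NAMES.get(tag)
        if name is None:
            continue
        if typ == 2:  # ASCII, NUL-terminated; values up to 4 bytes are stored inline
            if n <= 4:
                raw = entry[8:8 + n]
            else:
                (offset,) = struct.unpack(order + "I", entry[8:12])
                raw = tiff[offset:offset + n]
            result[name] = raw.rstrip(b"\x00").decode("ascii", "replace")
        elif typ == 3 and n == 1:  # one SHORT, stored in the first two value bytes
            (result[name],) = struct.unpack(order + "H", entry[8:10])
    return result


def read_basic_exif(path: str) -> Dict[str, Union[str, int]]:
    """Convenience wrapper: read a JPEG file and decode its basic IFD0 tags."""
    with open(path, "rb") as f:
        tiff = find_exif_segment(f.read())
    return read_ifd0(tiff) if tiff else {}
```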
Problems of EXIF metadata:
- The EXIF specification only covers JPEG and TIFF files; Camera RAW files fall outside it. The metadata structure of RAW files is therefore up to the camera manufacturer and has to be decoded individually (e.g. with dcraw or ExifTool).
- Some essential information is stored in proprietary, manufacturer-specific MakerNote tags.
- Up to EXIF 2.3, there is no way to record time-zone information.
- Some EXIF tags provide misleading information (e.g. the Orientation tag is set even though the camera cannot detect its orientation).
- Many EXIF tags are optional, so their presence cannot be relied upon.
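The missing time zone shows up as soon as an EXIF date is decoded. EXIF stores date/time values as `YYYY:MM:DD HH:MM:SS` strings, so parsing one yields a naive datetime. In the Python sketch below, the helper name `parse_exif_datetime` and the idea of supplying the UTC offset from outside (e.g. a user setting) are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_exif_datetime(value: str, utc_offset_hours: Optional[float] = None) -> datetime:
    """Parse an EXIF date/time string such as "2009:07:14 18:30:05".

    The string itself carries no time zone, so the result is naive (tzinfo is
    None) unless the caller supplies an offset from some other source.
    """
    taken = datetime.strptime(value.strip("\x00 "), "%Y:%m:%d %H:%M:%S")
    if utc_offset_hours is None:
        return taken
    return taken.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))


naive = parse_exif_datetime("2009:07:14 18:30:05")
print(naive.tzinfo)  # prints "None": the actual moment in time is ambiguous

# Assumed offset (e.g. from a user setting): UTC+2.
aware = parse_exif_datetime("2009:07:14 18:30:05", utc_offset_hours=2)
print(aware.astimezone(timezone.utc).isoformat())  # prints "2009-07-14T16:30:05+00:00"
```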
IPTC
IPTC metadata were employed by Adobe Systems Inc. to describe photos already in the early nineties. A subset of the IPTC "Information Interchange Model - IIM" was adopted as the well known "IPTC Headers" for Photoshop, JPEG and TIFF image files which currently describe millions of professional digital photos.
(quoting the IPTC/XMP Website)
IPTC provides descriptive, non-technical information about a photo. This includes the headline (~ title) and caption (~ description), but also copyright and textual location information. As the field names suggest, IPTC is geared towards press photography. This is still its primary application domain, but hobby photographers also store their descriptions in IPTC fields (or rather, their imaging software does).
Two IPTC standards exist: the older one (IPTC/NAA) stores its data directly in the image header, while the newer one (IPTC4XMP) embeds the information in an XMP document, which can be saved either within the image header or as a standalone file.
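In the older IPTC/NAA (IIM) format, each field is a dataset: a 0x1C tag marker, a record number, a dataset number, a two-byte big-endian length and the value. The Python sketch below parses such a block, assuming it has already been extracted from the file (in JPEG files it usually sits inside a Photoshop APP13 segment as image resource 0x0404). The field-name mapping, the Latin-1 default encoding and the function name `parse_iim` are assumptions; extended (long) datasets are not handled.

```python
import struct
from typing import Dict, List

# Record 2 ("application record") datasets of interest; names are hypothetical.
IIM_FIELDS = {
    5: "object_name",
    25: "keywords",
    80: "by_line",
    90: "city",
    101: "country",
    105: "headline",
    116: "copyright_notice",
    120: "caption",
}


def parse_iim(data: bytes, encoding: str = "latin-1") -> Dict[str, List[str]]:
    """Parse IPTC-IIM datasets (0x1C, record, dataset, 2-byte length, value)."""
    fields: Dict[str, List[str]] = {}
    pos = 0
    while pos + 5 <= len(data):
        if data[pos] != 0x1C:
            break  # no tag marker: end of the IIM data (or padding)
        record, dataset, length = struct.unpack(">BBH", data[pos + 1:pos + 5])
        if length & 0x8000:
            raise ValueError("extended datasets are not handled in this sketch")
        value = data[pos + 5:pos + 5 + length]
        pos += 5 + length
        name = IIM_FIELDS.get(dataset) if record == 2 else None
        if name:
            # Some datasets (e.g. keywords) may repeat, so values are collected in lists.
            fields.setdefault(name, []).append(value.decode(encoding, "replace"))
    return fields
```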
- IPTC4XMP Website
- Imaging Software that supports IPTC tags
- Information on the IPTC/NAA standard (note the table of sample IPTC fields at the bottom of the page)
Adobe XMP
The Adobe Extensible Metadata Platform (XMP) is a standard for processing and storing standardized and proprietary metadata, created by Adobe Systems Inc. It is based on a subset of the W3C Resource Description Framework (RDF) and is serialized as XML. XMP serves as an XML container format for other metadata such as EXIF or IPTC, but can also be extended with custom metadata.
XMP data can be saved as a standalone .xmp text file and can also be embedded in the following image file formats:
- TIFF - Tag 700
- JPEG - Application segment 1 (0xFFE1) with segment header "http://ns.adobe.com/xap/1.0/\x00"
- JPEG 2000 - 'uuid' box with UUID 0xBE7ACFCB97A942E89C71999491E3AFAC
- PSD - Location (?)
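For JPEG, finding the embedded XMP packet amounts to walking the segment list until an APP1 segment starts with the namespace header listed above. A minimal Python sketch using only the standard library; the function names are hypothetical, and it ignores Extended XMP (packets split across several APP1 segments) as well as the other container formats.

```python
import struct
import xml.etree.ElementTree as ET
from typing import Optional

XMP_HEADER = b"http://ns.adobe.com/xap/1.0/\x00"  # APP1 signature for standard XMP
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def find_xmp_packet(jpeg: bytes) -> Optional[bytes]:
    """Return the raw XMP packet from the first XMP APP1 segment, or None."""
    if jpeg[:2] != b"\xff\xd8":  # SOI marker
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(jpeg):
        marker, length = struct.unpack(">HH", jpeg[pos:pos + 4])
        if marker == 0xFFDA:  # start of scan: no metadata segments after this
            break
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xFFE1 and payload.startswith(XMP_HEADER):
            return payload[len(XMP_HEADER):]
        pos += 2 + length
    return None


def dc_title(packet: bytes) -> Optional[str]:
    """Return the first dc:title value (an rdf:Alt language alternative), if any."""
    root = ET.fromstring(packet)
    li = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    return li.text if li is not None else None
```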
Its main drawbacks are:
- Adobe's own applications lack full XMP editing capabilities: the user can only edit information from the predefined schemas (IPTC and possibly EXIF), not from a custom metadata schema.
- It introduces redundancy (the EXIF and IPTC information is repeated in the XMP document).
- It increases the file size, which matters especially for web photos.
Google Picasa
Flickr.com
Filesystem
TODO: gather some information
Creative Commons
The Creative Commons license model bridges the gap between artists and copyright law. It allows artists to specify, in a simple yet detailed way, under which conditions they want to release their works to the public. After the artist answers a few simple questions such as "May others edit or remix your work?", the system chooses a suitable license. Simple icons show the main restrictions of the license (e.g. no commercial use) in a human-readable way, while a detailed license text ensures that it holds up legally in different countries. Creative Commons licenses are free to use; for more information, see the Creative Commons website.
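The chooser logic can be sketched as a small function that maps the two main questions (commercial use, modifications) to one of the six standard licenses. This is an illustrative Python sketch, not Creative Commons' own implementation; the function name `choose_cc_license` and the default license version 4.0 are assumptions.

```python
def choose_cc_license(allow_commercial: bool, modifications: str,
                      version: str = "4.0") -> str:
    """Map the two main chooser questions to a Creative Commons license URL.

    modifications: "yes", "no" or "share-alike" (allowed if others share alike).
    Attribution (BY) is part of all six standard licenses.
    """
    parts = ["by"]
    if not allow_commercial:
        parts.append("nc")  # NonCommercial
    if modifications == "no":
        parts.append("nd")  # NoDerivatives
    elif modifications == "share-alike":
        parts.append("sa")  # ShareAlike
    elif modifications != "yes":
        raise ValueError("modifications must be 'yes', 'no' or 'share-alike'")
    return f"https://creativecommons.org/licenses/{'-'.join(parts)}/{version}/"


print(choose_cc_license(False, "share-alike"))
# prints "https://creativecommons.org/licenses/by-nc-sa/4.0/"
```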
Retrieval Strategy
There are essentially two ways to retrieve metadata from a photo:
- when it is hosted on a photo-sharing site such as Flickr or Picasa, you can use API calls to get the data.
- when it is stored in a local or remote filesystem (hard disk, FTP, ...), you have to parse the file header of the photo yourself.
Our retrieval strategy is simple: use the API when the photo is hosted on Flickr or Picasa; otherwise, parse the file header.
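The strategy boils down to a simple dispatch on the photo's location. In this Python sketch, the host lists, the helper names and the returned method names are hypothetical placeholders; real code would hand off to the respective API client or file parser.

```python
from typing import Iterable
from urllib.parse import urlparse

# Hypothetical host lists; the actual picurl configuration may differ.
FLICKR_HOSTS = ("flickr.com",)
PICASA_HOSTS = ("picasaweb.google.com",)


def _matches(host: str, domains: Iterable[str]) -> bool:
    """True if host equals one of the domains or is a subdomain of it."""
    return any(host == d or host.endswith("." + d) for d in domains)


def choose_retrieval_method(location: str) -> str:
    """Return 'flickr-api', 'picasa-api' or 'parse-file' for a photo location."""
    host = (urlparse(location).hostname or "").lower()  # local paths have no host
    if _matches(host, FLICKR_HOSTS):
        return "flickr-api"
    if _matches(host, PICASA_HOSTS):
        return "picasa-api"
    # Not hosted on a supported photo-sharing site: read the file header ourselves.
    return "parse-file"
```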
