Introduction
As discussed in MetadataMatching, metadata standards provide a wealth of information about a photo, ranging from shooting properties (e.g. flash used) to geographical information. At first sight this amount is overwhelming. On the other hand, each metadata standard carries a lot of "dead meat": data fields that contain no data or dummy data, or proprietary fields that only apply to a specific camera model (e.g. EXIF MakerNotes).
In this document I try to extract the meaningful meta-information from various standards (EXIF, IPTC, XMP) and providers (photo sharing services like Flickr/Picasa). This is the "good meat" that should be saved in queryable form in the picurl database. This means that you can, for example, search for all pictures taken with a Canon Rebel XTi.
All the other information from a metadata standard is just saved in a dictionary, which means you can only view and edit it.
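The split between queryable fields and the catch-all dictionary could be sketched as follows. This is only an illustration: the class name `PhotoMetadata`, the helper `split_metadata`, and the choice of queryable fields are assumptions, not part of the picurl design.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Hypothetical selection of "good meat" fields that get their own queryable slot.
QUERYABLE_FIELDS = {"title", "camera_make", "camera_model"}


@dataclass
class PhotoMetadata:
    title: Optional[str] = None
    camera_make: Optional[str] = None
    camera_model: Optional[str] = None
    # Everything else lands here: it can be viewed and edited, but not searched.
    extra: Dict[str, Any] = field(default_factory=dict)


def split_metadata(raw: Dict[str, Any]) -> PhotoMetadata:
    """Separate queryable fields from the remaining metadata."""
    known = {k: v for k, v in raw.items() if k in QUERYABLE_FIELDS}
    extra = {k: v for k, v in raw.items() if k not in QUERYABLE_FIELDS}
    return PhotoMetadata(extra=extra, **known)


photos = [
    split_metadata({"camera_make": "Canon", "camera_model": "Rebel XTi", "MakerNote": b"\x00"}),
    split_metadata({"camera_make": "Nikon", "camera_model": "D80"}),
]
# A query such as "all pictures taken by a Canon" only touches queryable fields.
canon_photos = [p for p in photos if p.camera_make == "Canon"]
```

In a real database the queryable fields would become indexed columns, while `extra` would be stored as an opaque key/value blob.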
The metadata is sorted by category and not by metadata standard. This should ease review and the creation of a database schema. The document tries to mention all possible sources for a metadata type (e.g. a photo caption can be defined in IPTC, in Flickr or in Picasa).
Remarks
- This document specifies the metadata that picurl should be able to READ, not write. A lot of the gathered information is read-only.
- After each EXIF tag name I have noted the EXIF tag number in hexadecimal, e.g. make has the tag number 10F.
- Flickr.com allows retrieval of all EXIF data of a photo via flickr.photos.getExif (see FlickrExifResponseSample).
- Google Picasa currently provides only a very limited set of EXIF metadata via its API. Where applicable, the corresponding metadata parameter is noted.
- Parsing IPTC information is IMPORTANT, as the Google Picasa desktop software saves all captions in the IPTC Caption field of an image.
Metadata categories
Image description
- Title: Title of the photo, 255 chars maximum.
EXIF: n/a, IPTC: Headline, Flickr: title property of photo, Picasa: n/a
- Description: Description (caption) of the photo, 255 chars maximum.
EXIF: n/a, IPTC: Caption, Flickr: description property of photo, Picasa: <media:description>
- Photographer: Photographer/owner of the image.
EXIF: Copyright (?), IPTC: Copyright, Picasa: username, Flickr: real name if provided, else username
- Area Notes: Descriptions that refer to a rectangular part of the image, saved with coordinates (see http://www.flickr.com/photos/picurlpy/2077849553/ for an example). Currently only supported by flickr.com.
Image categorisation
- Album: Folder/Photoset/Album that serves as a container for thematically related photos. Filesystem: Folder, Picasa: Album, Flickr: Photoset
- Rating: Rating of the image (1-5 stars). Not implemented in metadata standards.
- Tags: Keywords/Tags assigned to the picture.
EXIF: n/a, IPTC: Keywords, Flickr: tags, Picasa: <media:keywords>
- Flag: Permanently select an image
Technical properties
We are interested in the following properties:
- Camera Manufacturer
Manufacturer of the Camera (e.g. Canon) as a string
EXIF: make (10F), Picasa <exif:make>
- Camera Model
Camera Model (e.g. Rebel Xti) as string, sometimes with Manufacturer name
EXIF: model (110), Picasa <exif:model>
- Orientation
Image orientation in terms of rows and columns. Not all cameras provide correct values (an orientation sensor is required).
EXIF: orientation (112), Picasa n/a
- Exposure Time
The exposure time given as "tuple fraction", e.g. (10, 1110) = 10/1110 = 1/111s
EXIF: ExposureTime (829A), Picasa <exif:exposure>
- ISO Speed/Equivalent
Describes the sensitivity setting of the CMOS sensor compared to traditional film. For more info check http://tinyurl.com/34gdqv
EXIF: ISOSpeedRatings (8827), Picasa <exif:iso>
- Flash used?
EXIF does not just return Yes/No; 22 possible modes are defined in the EXIF specification (red-eye reduction, compulsory mode, ...). Picasa: just True/False.
EXIF: Flash (9209), Picasa <exif:flash>
- Exposure program
If a preset exposure program is used at shooting time (esp. Portrait/Landscape), it is recorded in this tag.
EXIF: ExposureProgram (8822), Picasa n/a
- Light Source
Numerical codes for different light sources (1=Daylight, 2=Fluorescent, ...)
EXIF: LightSource (9208), Picasa n/a
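Two of the properties above need a small conversion step before they are useful for queries: the exposure time arrives as a "tuple fraction", and the EXIF Flash value encodes more than Yes/No. A minimal sketch, assuming the raw values have already been read by some EXIF parser; the function names are hypothetical, and the flash check relies on bit 0 of the Flash value meaning "flash fired", as defined in the EXIF specification.

```python
from fractions import Fraction


def exposure_seconds(raw):
    """Convert an EXIF rational (numerator, denominator) into a reduced Fraction."""
    numerator, denominator = raw
    if denominator == 0:
        # Some cameras write dummy data; treat it as "unknown".
        return None
    return Fraction(numerator, denominator)


def flash_fired(flash_value):
    """Reduce the EXIF Flash mode value to the True/False that Picasa reports."""
    return bool(flash_value & 0x1)


exposure = exposure_seconds((10, 1110))
# Human-readable form, e.g. for display next to the photo.
exposure_label = f"{exposure.numerator}/{exposure.denominator} s"
```

Storing the reduced fraction (or its float value) makes exposure times from different cameras comparable, since the same duration can be written with different numerators and denominators.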
Date & Time
- Picture Taken
Unix timestamp of the creation date of the picture. Metadata precedence:
- Flickr (taken property) or Picasa (<gphoto:timestamp>); if not applicable, fall back to
- EXIF: DateTimeOriginal (9003); if not applicable, fall back to
- Filesystem (Creation Date)
- Picture Uploaded
- Flickr (uploaded property); if not applicable, fall back to
- Picasa (Creation Date of containing album); if not applicable, fall back to
- Filesystem (Creation Date); if not applicable, fall back to
- EXIF: DateTimeOriginal (9003)
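The precedence rules above amount to "take the first source that has a value". A minimal sketch, assuming the individual sources have already been queried and return `None` when not applicable. All function names are hypothetical. EXIF date strings carry no time zone, so interpreting them as UTC is an assumption made here for simplicity.

```python
import calendar
import time


def exif_datetime_to_timestamp(value):
    """Parse an EXIF date string ('YYYY:MM:DD HH:MM:SS') into a Unix timestamp.

    Assumption: the string is interpreted as UTC.
    """
    try:
        parsed = time.strptime(value, "%Y:%m:%d %H:%M:%S")
    except (TypeError, ValueError):
        return None
    return calendar.timegm(parsed)


def first_available(*candidates):
    """Return the first candidate that is not None, in order of precedence."""
    for value in candidates:
        if value is not None:
            return value
    return None


def picture_taken(service_ts=None, exif_datetime=None, file_ctime=None):
    """Flickr/Picasa timestamp, then EXIF, then filesystem creation date."""
    return first_available(
        service_ts,
        exif_datetime_to_timestamp(exif_datetime),
        file_ctime,
    )
```

The same `first_available` helper would cover "Picture Uploaded" by passing the candidates in that entry's order.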
Thumbnails
- Thumbnail Extraction
returns a thumbnail of the image having the following properties:
- longest side max. 160px
- showing all photo content = not cropped
- can have wrong rotation
Extraction procedure:
- Flickr: return Thumbnail via URL construction scheme, if not applicable
- Picasa: get Thumbnail via API Call (?), if not applicable
- get Thumbnail from EXIF data.
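The size constraints listed above (longest side max. 160px, not cropped) can be expressed as a small calculation. This sketch only computes the target dimensions; the actual resizing would be left to an imaging library. The function name and the decision never to upscale smaller images are assumptions.

```python
def thumbnail_size(width, height, longest_side=160):
    """Scale (width, height) so the longest side is at most `longest_side`.

    The aspect ratio is preserved, so the full photo content stays visible.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)
    if longest <= longest_side:
        # Assumption: images that are already small enough are left untouched.
        return (width, height)
    new_width = max(1, round(width * longest_side / longest))
    new_height = max(1, round(height * longest_side / longest))
    return (new_width, new_height)
```

Because thumbnails can have the wrong rotation, the width and height passed in are those of the stored image, not of the image as displayed.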
Image Versions/Identification
- Post-Header-Checksum: Skip the header (metadata) of an image file and create a checksum of the first 20 kB of the image data. This allows quick identification of duplicates, even when the metadata differs. Google Picasa provides a checksum of its own to prevent duplicate uploads.
- Metadata could also include download location for high-res-version of image (TODO: make a detailed concept).
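The post-header checksum could look roughly like this for JPEG files. It is a simplified sketch under several assumptions: only JPEG is handled, the "header" is taken to be everything up to and including the Start-of-Scan segment header, SHA-1 is an arbitrary choice of hash, and files that cannot be parsed are hashed from the first byte. The function names are hypothetical.

```python
import hashlib
import struct

CHUNK_SIZE = 20 * 1024  # the "first 20kb" of image data


def jpeg_image_data_offset(data):
    """Return the offset of the first byte after the SOS header, or None.

    Simplified walk over JPEG segments: each one is 0xFF, a marker byte,
    and a 2-byte big-endian length that includes the length field itself.
    """
    if data[:2] != b"\xff\xd8":  # SOI marker missing: not a JPEG
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xDA:  # SOS: compressed image data follows its header
            return pos + 2 + length
        pos += 2 + length
    return None


def post_header_checksum(data):
    """Checksum of the first 20 kB of image data, ignoring the metadata header."""
    offset = jpeg_image_data_offset(data)
    if offset is None:
        offset = 0  # assumption: fall back to hashing from the start of the file
    return hashlib.sha1(data[offset:offset + CHUNK_SIZE]).hexdigest()
```

Two copies of the same photo whose EXIF or IPTC blocks were edited separately would still produce the same checksum, since only bytes after the header are hashed.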
Geolocation
- GPS Latitude and Longitude (necessary to locate a point) can be retrieved via Picasa/Flickr API calls or via EXIF data
- TODO: EXIF further specifies GPSLongitudeRef (East or West) and GPSLatitudeRef (North or South) - do we have to consider this for conversion?
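Regarding the TODO: EXIF stores latitude and longitude as unsigned degrees, minutes and seconds, so the Ref tags are the only place where the hemisphere is recorded and they do matter for conversion. A minimal sketch, assuming the three rational values have already been converted to plain numbers; the function name is hypothetical.

```python
def gps_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus a Ref letter to signed decimal degrees.

    ref is GPSLatitudeRef ('N'/'S') or GPSLongitudeRef ('E'/'W').
    South and West become negative values.
    """
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    if ref.upper() in ("S", "W"):
        value = -value
    return value


latitude = gps_to_decimal((48, 12, 30), "N")
longitude = gps_to_decimal((16, 22, 0), "W")
```

Signed decimal degrees are easy to compare with coordinates from web APIs, which is convenient when the same photo has both sources.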
Image License
- Copyright Information
Copyright Information as Text, e.g. (c) 2008 by Joe Doe. No reproduction without prior arrangement
Copyright Extraction procedure:
- for Flickr photos: return a string following the pattern
Copyright YEAR_OF_POSTING by REAL_NAME_OF_USER (URL_TO_FLICKR_USERPAGE).
If the user supplied no real name, the Flickr username can be used instead.
If a Creative Commons license was defined, the string is extended with
Shared under the Creative Commons CC_LICENSE_NAME-License (URL_OF_CC_LICENSE).
- for Picasa photos: return a string following the pattern
Copyright YEAR_OF_GPHOTO_TIMESTAMP by Google Picasa User GOOGLE_USERNAME (URL_TO_PICASA_PHOTOPAGE).
- for other pictures: try to use the IPTC Copyright field
- for pictures without copyright information no text is saved. This can be set later in picurl.
- CreativeCommons License Information
Currently, we can encounter Creative Commons-Licenses in two ways:
- On flickr.com, users can assign a Creative Commons license to their photos. This information is retrievable via the flickr.photos.getInfo API call.
- For Offline Content, Creative Commons recommends the embedding of an XMP template containing license information into the file header. See CreativeCommonsXmpSample for an example.
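The copyright string patterns from the extraction procedure above can be sketched as plain string formatting. The function and parameter names are hypothetical, and the values are assumed to have been fetched from the respective APIs beforehand.

```python
def flickr_copyright(year, username, user_url, real_name=None,
                     cc_name=None, cc_url=None):
    """Build the copyright text for a Flickr photo."""
    # Fall back to the Flickr username if no real name was supplied.
    owner = real_name or username
    text = f"Copyright {year} by {owner} ({user_url})."
    if cc_name and cc_url:
        text += f" Shared under the Creative Commons {cc_name}-License ({cc_url})."
    return text


def picasa_copyright(year, google_username, photo_url):
    """Build the copyright text for a Picasa photo."""
    return f"Copyright {year} by Google Picasa User {google_username} ({photo_url})."


example = flickr_copyright(
    2008, "joedoe", "http://example.com/joedoe",
    cc_name="Attribution", cc_url="http://example.com/cc-by",
)
```

For pictures from other sources, the IPTC Copyright field would be used as-is, and no text is stored when no copyright information exists.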
