Introduction

As discussed in MetadataMatching, metadata standards provide a large amount of information about a photo, ranging from shooting properties (e.g. flash used) to geographical information. At first sight this amount is overwhelming. On the other hand, each metadata standard carries a lot of "dead meat": data fields that contain no data or only dummy data, or proprietary fields that only apply to a specific camera model (e.g. EXIF MakerNotes).

In this document I try to extract the meaningful metadata from various standards (EXIF, IPTC, XMP) and providers (photo sharing services like Flickr and Picasa). This is the "good meat" that should be saved in queryable form in the picurl database. This means that you can, for example, search for all pictures taken with a Canon Rebel XTi.

All other information from a metadata standard is simply saved in a dictionary, which means you can view and edit it, but not search by it.
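The split between queryable fields and the catch-all dictionary could look roughly like the following sketch. The field names in QUERYABLE_FIELDS and the function split_metadata are hypothetical and only serve as an illustration; the actual picurl schema is not defined here.

```python
from typing import Any, Dict, Tuple

# Hypothetical set of "good meat" fields that get their own queryable column.
QUERYABLE_FIELDS = {"title", "description", "camera_make", "camera_model", "taken"}


def split_metadata(raw: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split raw metadata into (queryable, other).

    'queryable' holds the fields that would be stored in searchable form,
    'other' holds everything else and is kept as a plain dictionary.
    """
    queryable = {k: v for k, v in raw.items() if k in QUERYABLE_FIELDS}
    other = {k: v for k, v in raw.items() if k not in QUERYABLE_FIELDS}
    return queryable, other
```

With this sketch, a search such as "all pictures taken with a given camera model" would only ever touch the first dictionary.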

The metadata is sorted by category and not by metadata standard. This should ease the review and the creation of a database schema. The document tries to mention all possible sources for a metadata type (e.g. a photo caption can be defined in IPTC, in Flickr or in Picasa).

Remarks

  • This document specifies metadata that picurl should be able to READ, not write. A lot of the gathered information is read-only.
  • Following each EXIF tag name, the tag number is given in hex, e.g. make has the tag number 10F.
  • Flickr.com allows the retrieval of all EXIF data of a photo using flickr.photos.getExif - see FlickrExifResponseSample.
  • Google Picasa currently provides a very limited set of EXIF metadata via its API. Where applicable, the corresponding Picasa metadata element is given.
  • Parsing IPTC information is IMPORTANT, as the Google Picasa desktop software saves all captions in the IPTC Caption field of an image.

Metadata categories

Image description

  1. Title: Title of the photo, 255 chars maximum.
    EXIF: n/a, IPTC: Headline, Flickr: title-property of photo, Picasa: n/a
  2. Description: Description of the photo, 255 chars maximum.
    EXIF: n/a, IPTC: Caption, Flickr: description-property of photo, Picasa: <media:description>
  3. Photographer: Photographer/Owner of the image
    EXIF: Copyright (?), IPTC: Copyright, Picasa: Username, Flickr: realname if provided, else username
  4. Area Notes: Descriptions that refer to a rectangular part of the image, saved with coordinates (see http://www.flickr.com/photos/picurlpy/2077849553/ for an example). Currently only supported by flickr.com

Image categorisation

  1. Album: Folder/Photoset/Album that serves as a container for thematically related photos. Filesystem: Folder, Picasa: Album, Flickr: Photoset
  2. Rating: Rating of the image (1-5 stars). Not implemented in metadata standards.
  3. Tags: Keywords/Tags assigned to the picture.
    EXIF: n/a, IPTC: Keywords, Flickr: tags, Picasa: <media:keywords>
  4. Flag: Permanently select an image

Technical properties

We are interested in the following properties:

  1. Camera Manufacturer: Manufacturer of the camera (e.g. Canon) as a string
    EXIF: make (10F), Picasa <exif:make>
  2. Camera Model: Camera model (e.g. Rebel XTi) as a string, sometimes including the manufacturer name
    EXIF: model (110), Picasa <exif:model>
  3. Orientation: Image orientation viewed in terms of rows and columns. Not all cameras provide correct values (an orientation sensor is required)
    EXIF: orientation (112), Picasa n/a
  4. Exposure Time: The exposure time given as a "tuple fraction", e.g. (10, 1110) = 10/1110 = 1/111s
    EXIF: ExposureTime (829A), Picasa <exif:exposure>
  5. ISO Speed/Equivalent: Describes the sensitivity setting of the image sensor compared to traditional film. For more info check http://tinyurl.com/34gdqv
    EXIF: ISOSpeedRatings (8827), Picasa <exif:iso>
  6. Flash used?: The EXIF value is not a simple Yes/No; the EXIF specification defines 22 possible modes (red-eye reduction, compulsory mode, ...). Picasa just returns True/False.
    EXIF: Flash (9209), Picasa <exif:flash>
  7. Exposure Program: If a preset exposure program is used at shooting time (esp. Portrait/Landscape), it is recorded in this tag.
    EXIF: ExposureProgram (8822), Picasa n/a
  8. Light Source: Numerical codes for different light sources (1=Daylight, 2=Fluorescent, ...)
    EXIF: LightSource (9208), Picasa n/a
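Two of the properties above need a small conversion before they are useful for display or queries. The following is a minimal sketch, assuming the exposure time arrives as a (numerator, denominator) tuple as described above and the flash value as a plain integer. The function names are hypothetical. The only fact taken from the EXIF specification is that bit 0 of the Flash value indicates whether the flash fired.

```python
from fractions import Fraction
from typing import Tuple


def format_exposure(raw: Tuple[int, int]) -> str:
    """Reduce an EXIF exposure tuple and format it for display."""
    value = Fraction(raw[0], raw[1])  # e.g. (10, 1110) reduces to 1/111
    if value < 1:
        return f"{value.numerator}/{value.denominator}s"
    return f"{float(value):g}s"


def flash_fired(flash_value: int) -> bool:
    """Collapse the EXIF Flash value to the True/False that Picasa reports.

    Bit 0 of the EXIF Flash value is set when the flash fired; the
    remaining bits encode the mode (return light, red-eye reduction, ...).
    """
    return bool(flash_value & 0x1)
```

Reducing the flash value this way would let EXIF and Picasa data share a single boolean column, while the full EXIF value can still be kept in the dictionary.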

Date & Time

  1. Picture Taken: Unix timestamp of the creation date of the picture. Metadata precedence:
    1. Flickr (taken property) or Picasa (<gphoto:timestamp>), if not applicable fall back to
    2. EXIF: DateTimeOriginal (9003), if not applicable fall back to
    3. Filesystem (Creation Date)
  2. Picture Uploaded: Metadata precedence:
    1. Flickr (uploaded property), if not applicable fall back to
    2. Picasa (Creation Date of containing album), if not applicable fall back to
    3. Filesystem (Creation Date), if not applicable fall back to
    4. EXIF: DateTimeOriginal (9003)
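The precedence for "Picture Taken" can be sketched as a simple fallback chain. This is only an illustration: the function names are hypothetical, the service timestamp is assumed to be already converted to seconds, and because EXIF date strings ("YYYY:MM:DD HH:MM:SS") carry no time zone, the sketch treats them as UTC.

```python
import calendar
import time
from typing import Optional


def exif_to_timestamp(value: str) -> int:
    """Convert an EXIF date string to a Unix timestamp (assumed to be UTC)."""
    parsed = time.strptime(value, "%Y:%m:%d %H:%M:%S")
    return calendar.timegm(parsed)


def picture_taken(service_ts: Optional[int],
                  exif_datetime: Optional[str],
                  file_ctime: Optional[float]) -> Optional[int]:
    """Return the first usable timestamp: service, then EXIF, then filesystem."""
    if service_ts is not None:
        return int(service_ts)
    if exif_datetime:
        try:
            return exif_to_timestamp(exif_datetime)
        except ValueError:
            pass  # unparsable EXIF date, fall through to the filesystem
    if file_ctime is not None:
        return int(file_ctime)
    return None
```

"Picture Uploaded" would follow the same pattern with its own order of sources.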

Thumbnails

  1. Thumbnail Extraction returns a thumbnail of the image having the following properties:
    • longest side max. 160px
    • showing all photo content = not cropped
    • can have wrong rotation

Extraction procedure:

  1. Flickr: return Thumbnail via URL construction scheme, if not applicable
  2. Picasa: get Thumbnail via API Call (?), if not applicable
  3. get Thumbnail from EXIF data.
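When a thumbnail from one of these sources has to be checked or rescaled, the size constraint above (longest side max. 160px, not cropped) comes down to a single scale factor for both sides. A minimal sketch, with a hypothetical function name and the assumption that smaller images are never scaled up:

```python
from typing import Tuple


def thumbnail_size(width: int, height: int, max_side: int = 160) -> Tuple[int, int]:
    """Return dimensions that fit within max_side while keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # assumption: never upscale small images
    # Scale both sides by the same factor so nothing is cropped.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height
```

For a 3888x2592 image this yields 160x107, and the same values swapped for a portrait image.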

Image Versions/Identification

  • Post-Header-Checksum: Skip the header (metadata) of an image file and create a checksum of the first 20 KB of the image data. This allows quick identification of duplicates, even if the metadata of the copies differs. Google Picasa provides its own checksum to prevent duplicate uploads.
  • Metadata could also include download location for high-res-version of image (TODO: make a detailed concept).
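A post-header checksum for JPEG files could be sketched as follows. This is an assumption-laden illustration and not a finished concept: it treats everything before the Start-of-Scan marker (FFDA) as header, uses MD5 as an arbitrary choice of hash, and does not handle other file formats or unusual standalone markers before the scan. The function name is hypothetical.

```python
import hashlib
import struct


def post_header_checksum(data: bytes, length: int = 20 * 1024) -> str:
    """Hash the first `length` bytes starting at the JPEG Start-of-Scan marker."""
    if data[:2] != b"\xff\xd8":  # every JPEG stream starts with SOI
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("corrupt segment marker")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        if marker == 0xDA:  # Start of Scan: compressed image data follows
            return hashlib.md5(data[pos:pos + length]).hexdigest()
        # Any other segment: 2-byte big-endian length that includes itself.
        (segment_length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + segment_length
    raise ValueError("no image data found")
```

Two copies of a photo that differ only in their EXIF or IPTC segments would get the same checksum under this scheme.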

Geolocation

  • GPS Latitude and Longitude (necessary to locate a point) can be retrieved via Picasa/Flickr API calls or via EXIF data
  • TODO: EXIF further specifies GPSLongitudeRef (East or West) and GPSLatitudeRef (North or South) - do we have to consider this for conversion?
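Regarding the TODO above: EXIF stores GPSLatitude and GPSLongitude as three unsigned rationals (degrees, minutes, seconds), so the hemisphere is only available through the Ref tags and is needed to obtain signed decimal degrees. A minimal sketch of the conversion, assuming each rational arrives as a (numerator, denominator) tuple; the function name is hypothetical.

```python
from fractions import Fraction
from typing import Sequence, Tuple

Rational = Tuple[int, int]


def dms_to_decimal(dms: Sequence[Rational], ref: str) -> float:
    """Convert EXIF degrees/minutes/seconds plus a Ref letter to decimal degrees."""
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and West longitudes are negative in decimal notation.
    if ref.upper() in ("S", "W"):
        value = -value
    return float(value)
```

For example, ((48, 1), (12, 1), (30, 1)) with ref "N" gives roughly 48.2083, and the same tuple with ref "S" gives the negative value.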

Image License

  1. Copyright Information: Copyright information as text, e.g. "(c) 2008 by Joe Doe. No reproduction without prior arrangement". Copyright extraction procedure:
    • for flickr photos:
      • return a string following the pattern
                 Copyright YEAR_OF_POSTING by REAL_NAME_OF_USER (URL_TO_FLICKR_USERPAGE).
        
        if the user supplied no real name, one can use the flickr username
      • If a Creative Commons License was defined, the string is expanded with
                 Shared under the Creative Commons CC_LICENSE_NAME-License (URL_OF_CC_LICENSE).
        
    • for Picasa photos: return a string following the pattern
               Copyright YEAR_OF_GPHOTO_TIMESTAMP by Google Picasa User GOOGLE_USERNAME (URL_TO_PICASA_PHOTOPAGE).
      
    • for other pictures: try to use the IPTC Copyright field
    • for pictures without Copyright Information no text is saved. This can be set later in picurl.
  2. CreativeCommons License Information

Currently, we can encounter Creative Commons-Licenses in two ways:

  • On flickr.com, users can assign a Creative Commons license to their photos - this information is retrievable via the flickr.photos.getInfo API call
  • For Offline Content, Creative Commons recommends the embedding of an XMP template containing license information into the file header. See CreativeCommonsXmpSample for an example.
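The copyright string for Flickr photos described under "Copyright Information" above, including the optional Creative Commons extension, could be assembled as in the following sketch. The function and parameter names are hypothetical, and all values are assumed to have been fetched from the Flickr API beforehand.

```python
from typing import Optional


def flickr_copyright(year: int,
                     real_name: Optional[str],
                     username: str,
                     user_url: str,
                     cc_name: Optional[str] = None,
                     cc_url: Optional[str] = None) -> str:
    """Build the copyright text for a Flickr photo following the pattern above."""
    owner = real_name or username  # fall back to the username if no real name
    text = f"Copyright {year} by {owner} ({user_url})."
    if cc_name and cc_url:
        text += f" Shared under the Creative Commons {cc_name}-License ({cc_url})."
    return text
```

The Picasa pattern differs only in its fixed wording and could be built the same way.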