Format Conversion Strategies for Long-Term Preservation

The Bentley Historical Library is committed to the long-term preservation of and access to its digital collections. Because the library must contend with thousands of potential file formats, the Division of Digital Curation has adopted a three-tier approach to facilitate the preservation and conversion of digital content:

  • Tier 1: Materials produced in sustainable formats will be maintained in their original version.
  • Tier 2: Common “at-risk” formats will be converted to preservation-quality file types to retain important features and functionalities.
  • Tier 3: All other content will receive basic bit-level preservation.
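
The three tiers amount to a lookup from a file's format to a preservation action. The Python sketch below illustrates that idea under a simplifying assumption: formats are identified by file extension alone. In practice, identification against a registry such as PRONOM is more reliable. Only a few of the formats listed in this document are included, and the names `TIER_1`, `TIER_2_TARGETS`, and `preservation_tier` are hypothetical. PDF is deliberately left out because an extension cannot distinguish PDF/A from ordinary PDF.

```python
from pathlib import Path

# Hypothetical subset of Tier 1 (sustainable) extensions drawn from this document.
TIER_1 = {".docx", ".xlsx", ".pptx", ".odt", ".ods", ".odp", ".txt", ".csv",
          ".wav", ".flac", ".tif", ".tiff", ".png", ".svg", ".mbox"}

# Hypothetical subset of Tier 2 (at-risk) extensions mapped to preservation targets.
TIER_2_TARGETS = {
    ".wma": "WAV", ".doc": "DOCX", ".xls": "XLSX", ".ppt": "PPTX",
    ".eml": "MBOX", ".pst": "MBOX", ".bmp": "TIFF", ".psd": "TIFF",
    ".ai": "SVG", ".eps": "PDF/A", ".flv": "MP4 (H.264)",
}


def preservation_tier(filename: str) -> int:
    """Return 1, 2, or 3 using only the file extension (a simplification)."""
    ext = Path(filename).suffix.lower()  # ".DOC" and ".doc" are treated alike
    if ext in TIER_1:
        return 1  # keep the original as-is
    if ext in TIER_2_TARGETS:
        return 2  # keep the original and create a preservation copy
    return 3  # everything else: bit-level preservation only
```

For example, `preservation_tier("Minutes.DOC")` returns 2, and a file with an unlisted extension falls through to Tier 3.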

This document provides further information on the Bentley Historical Library’s accepted preservation formats and conversion strategies.


Tier 1: Preservation of Sustainable Formats

The library has identified a number of sustainable file formats that are widely used and/or non-proprietary. Many of these have been recognized as international standards by bodies such as the International Organization for Standardization (ISO), Ecma International, and the Organization for the Advancement of Structured Information Standards (OASIS). Peer institutions and experts in the digital curation community have also acknowledged the longevity of these formats, including the Library of Congress’s National Digital Information Infrastructure and Preservation Program.

Digital materials stored in these file formats should remain usable to researchers and administrative units at the University of Michigan for the foreseeable future. The Bentley Historical Library will therefore preserve the original version of content stored in these sustainable formats at the time of accession. The Division of Digital Curation will monitor community best practices and technological advances in case a migration to alternative preservation formats becomes necessary.

Find basic descriptions of file formats or search the PRONOM Technical Registry for specifications and more in-depth information.

Recommended formats by media type:

Office Documents and Text-Based Files
  • DOCX: MS Word Open XML Document (created in MS Office 2007 and 2010)
  • XLSX: MS Excel Open XML Document (created in MS Office 2007 and 2010)
  • PPTX: MS PowerPoint Open XML Document (created in MS Office 2007 and 2010)
  • ODT: OpenDocument Text Document (created in OpenOffice)
  • ODS: OpenDocument Spreadsheet (created in OpenOffice)
  • ODP: OpenDocument Presentation (created in OpenOffice)
  • PDF/A: Portable Document Format (Archival)
  • TXT: Plain Text File (ANSI or UTF-8 encoded)
  • RTF: Rich Text Format File
  • XML: Extensible Markup Language Data File
  • CSV: Comma-Separated Values File
  • TSV: Tab-Separated Values File

Audio Files
  • WAV: Waveform Audio File Format
  • AIFF: Audio Interchange File Format
  • MP3: Moving Picture Experts Group (MPEG) Audio Layer III
  • FLAC: Free Lossless Audio Codec File
  • OGG: Ogg Vorbis Audio File

Video Files
  • MPEG-1/2: Moving Picture Experts Group
  • AVI: Audio Video Interleave File (uncompressed)
  • MOV: QuickTime Movie (uncompressed)
  • MP4: MPEG-4 Part 14 (with H.264 encoding)
  • MJ2: Motion JPEG 2000
  • DV: Digital Video File (non-proprietary)

Raster (or Bitmap) Image Files
  • TIFF: Tagged Image File Format
  • JPEG/JFIF: Joint Photographic Experts Group JPEG File Interchange Format (lossy compression)
  • JPEG 2000: Joint Photographic Experts Group 2000 (lossless compression)
  • GIF: Graphics Interchange Format
  • PNG: Portable Network Graphics

Vector Image Files
  • SVG: Scalable Vector Graphics File

Email Files
  • MBOX: Mailbox File
  NOTE: A major limitation of ‘free’ web-mail services such as Gmail, Yahoo, or Hotmail is that messages cannot easily be downloaded or exported to a different email client or to your desktop. Desktop clients such as Mozilla Thunderbird, Outlook, or MacMail may allow you to save local copies of messages in a platform-independent form.

Database Files
  • CSV: Comma-Separated Values File
  • SIARD: Software Independent Archiving of Relational Databases (open XML format)
  • MySQL SQL: Structured Query Language File (MySQL is an open-source relational database management system)
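
The Tier 1 list accepts plain-text files that are ANSI or UTF-8 encoded. The Python sketch below (not a documented Bentley procedure) tests whether a file's bytes are valid UTF-8. It uses an incremental decoder so that large files can be read in chunks, and the function name `is_valid_utf8` is hypothetical. The check only establishes whether UTF-8 decoding succeeds. Pure ASCII text passes even if it was saved as ANSI, and a file that fails may still be legitimate ANSI text.

```python
import codecs
from pathlib import Path


def is_valid_utf8(path: Path, chunk_size: int = 1 << 20) -> bool:
    """Return True if the whole file decodes as UTF-8 (strict mode)."""
    decoder = codecs.getincrementaldecoder("utf-8")()  # errors="strict" by default
    try:
        with open(path, "rb") as f:
            # Multibyte characters split across chunk boundaries are buffered
            # by the incremental decoder rather than reported as errors.
            while chunk := f.read(chunk_size):
                decoder.decode(chunk)
            decoder.decode(b"", final=True)  # reject a truncated sequence at EOF
    except UnicodeDecodeError:
        return False
    return True
```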


Tier 2: Conversion of At-Risk Formats

The digital curation community has long recognized the disadvantages of proprietary formats (which can be opened only with specific software) and of content encoded with “lossy” compression (i.e., compression that permanently discards some data to reduce file size). The Bentley Historical Library will therefore convert the most common at-risk formats to preservation-quality sustainable formats. The original version of the content will also be maintained alongside the preservation copy to ensure the authenticity of the Bentley Library’s digital collections. These conversion strategies reflect the policies and practices of peer institutions as well as the National Digital Information Infrastructure and Preservation Program.

Visit the Library of Congress Sustainability of Digital Formats site for more information on preservation issues and descriptions of preferred formats.

At-risk formats by media type, with their preservation targets:

Audio Files → WAV format (preferably Broadcast WAVE)
  • WMA: Windows Media Audio File
  • RA: Real Audio File
  • SND: Apple Sound File
  • AU: Sun Audio File

Office Documents and Text-Based Files → MS Office Open XML (OOXML) format
  • DOC: MS Word 1997-2003 Document
  • PPT: MS PowerPoint 1997-2003 Presentation
  • XLS: MS Excel 1997-2003 Spreadsheet

Email Files → MBOX format
  • EML: Email Message File
  • PST: Outlook Personal Information Store File
  • Eudora Mail and approximately 40 other formats

Raster Image Files → TIFF format
  • BMP: Windows Bitmap
  • PSD: Adobe Photoshop Document
  • FPX: FlashPix Bitmap
  • PCD: Kodak Photo CD Image
  • PCT: Apple Picture File
  • TGA: Targa Graphic

Vector Image Files → SVG format
  • AI: Adobe Illustrator
  • WMF: Windows Metafile

Vector Image Files → PDF/A format
  • PS: PostScript
  • EPS: Encapsulated PostScript

Video Files → MPEG-4 format (with H.264 encoding)
  • SWF: Shockwave Flash
  • FLV: Flash Video
  • WMV: Windows Media Video
  • RV (or RM): Real Video
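
For email, Python's standard-library `mailbox` module offers one way to consolidate individual EML messages into a single MBOX file. The sketch below is an illustration only, not the Bentley's actual workflow. It assumes that the EML files sit in a single directory, and the function name `eml_to_mbox` is hypothetical. The sketch only reads the originals and never modifies them, which is consistent with the policy of keeping the original alongside the preservation copy. Proprietary formats such as PST require dedicated tools and are not covered here.

```python
import email
import mailbox
from pathlib import Path


def eml_to_mbox(eml_dir: Path, mbox_path: Path) -> int:
    """Append every *.eml file in eml_dir to an MBOX file; return how many.

    The EML originals are opened read-only and left untouched, so they can be
    retained alongside the MBOX preservation copy.
    """
    box = mailbox.mbox(str(mbox_path))  # creates the file if it does not exist
    count = 0
    try:
        box.lock()  # advisory lock while writing
        for eml_path in sorted(eml_dir.glob("*.eml")):
            with open(eml_path, "rb") as f:
                message = email.message_from_binary_file(f)
            box.add(message)  # mailbox writes a "From " separator line per message
            count += 1
    finally:
        box.close()  # flushes pending writes and releases the lock
    return count
```

For example, `eml_to_mbox(Path("accession/eml"), Path("accession/messages.mbox"))` (hypothetical paths) would write a single MBOX file and leave the EML files in place.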


Tier 3: Bit-Level Preservation

Because it is infeasible to create conversion plans for the tens of thousands of formats in existence, the Bentley Historical Library will ensure that digital holdings in all other formats (i.e., those not specifically identified in this document) receive bit-level preservation. Integrity checks and regular replacement of storage media will preserve the raw data stored in these files (i.e., the “stream” of 0s and 1s) in its original state. This work is conducted by trusted partners in the University of Michigan Library Information Technology division and Information and Technology Services. The library acknowledges that hardware or software obsolescence may reduce the functionality of these files or render them inaccessible. At the same time, faithful preservation of the bitstreams will allow the library to take advantage of future developments in emulation technology.
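
Integrity checks of the kind described above are commonly implemented as fixity checks. A cryptographic checksum is recorded for each file at ingest and recomputed later, and any mismatch signals that the bitstream has changed. The Python sketch below illustrates this approach, assuming SHA-256 checksums and an in-memory manifest. The function names are hypothetical, and this document does not describe the checks actually run by the University of Michigan partners.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def make_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose checksum has changed."""
    problems = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            problems.append(rel)
    return problems
```

An empty list from `verify_manifest` means every recorded bitstream is still intact. Any path it returns would need to be restored from another copy.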

Please contact the Division of Digital Curation with questions or comments regarding the Bentley Historical Library’s digital preservation and conversion strategies.