University of Texas at Austin

University of Texas Libraries




Metadata Task Force


Proposed Interchange Format for KG Metadata Registry Content

Batch Archive (BAR) Overview

The Batch Archive (BAR) is a structured format that uses directories and text files to represent collections of digital assets. It is easy to set up and maintain, uses non-proprietary file formats, and allows collections to be uploaded into any number of databases and applications through batch routines.

The hierarchical structure of the directories defines the items within the collection. The topmost level is the archive directory, which contains the item directories; within each item directory, text and XML files hold all the relevant metadata for the item. The digital assets themselves may be placed in the item directories, but this is not required.

Each item within the collection has its own manifest file, a dublin_core.xml file, and an optional <archive_name>.xml file.

Components and Definitions

archive directory – The topmost directory, named after the collection, where the item directories will reside. Synonymous with archive name.

item directory – Represents an item in the collection. Contains the manifest, dublin_core.xml, the optional <archive_name>.xml, and optionally the digital assets associated with the item.

manifest – Text file which contains one entry per line for each file associated with an item. Either filenames or URLs may be used.

dublin_core.xml – Item metadata expressed in a qualified Dublin Core schema. Several Dublin Core elements can be used for each item.

The dublin_core.xml file has the following format, in which each Dublin Core element has its own entry in a <dcvalue> tag. Three attributes are currently available on the <dcvalue> tag:

  • element - the Dublin Core element
  • qualifier - the element's qualifier
  • language - (optional) ISO language code for the element
<dublin_core>
    <dcvalue element="title" qualifier="none">A Tale of Two Cities</dcvalue>
    <dcvalue element="date" qualifier="issued">1990</dcvalue>
    <dcvalue element="title" qualifier="alternate" language="fr">J'aime les Printemps</dcvalue>
</dublin_core>
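
A file in this format could be generated programmatically. The following is a minimal Python sketch, assuming the standard library's xml.etree.ElementTree; the function names build_dublin_core and dublin_core_to_string are hypothetical and not part of the BAR proposal.

```python
import xml.etree.ElementTree as ET


def build_dublin_core(values):
    """Build a <dublin_core> element from (element, qualifier, language, text) tuples.

    Hypothetical helper for illustration; pass None as the language to omit
    the optional language attribute.
    """
    root = ET.Element("dublin_core")
    for element, qualifier, language, text in values:
        attrs = {"element": element, "qualifier": qualifier}
        if language:  # the language attribute is optional
            attrs["language"] = language
        dcvalue = ET.SubElement(root, "dcvalue", attrs)
        dcvalue.text = text
    return root


def dublin_core_to_string(values):
    """Serialize the metadata tuples to an XML string."""
    return ET.tostring(build_dublin_core(values), encoding="unicode")


# The three entries from the example above.
example = [
    ("title", "none", None, "A Tale of Two Cities"),
    ("date", "issued", None, "1990"),
    ("title", "alternate", "fr", "J'aime les Printemps"),
]
xml_text = dublin_core_to_string(example)
```

Building the tree with a library rather than by string concatenation keeps the output well formed, since quoting and escaping of attribute values and text are handled by the serializer.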

<archive_name>.xml – Additional item metadata; the file employs a non-qualified Dublin Core schema specific to a given collection.

BAR Directory Structure

archive_directory/
    item_one/
        manifest
        dublin_core.xml
        file1.doc
        file2.doc
    item_two/
        manifest
        dublin_core.xml
        file1.jpg
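
Because the directory layout itself defines the items, a batch routine can discover them by walking the archive directory. The following Python sketch illustrates this under the assumption that every subdirectory of the archive directory is an item; list_items and missing_required are hypothetical names chosen for illustration.

```python
from pathlib import Path


def list_items(archive_dir):
    """Return {item_name: sorted file names} for each item directory.

    Assumes every subdirectory of the archive directory is an item.
    """
    archive = Path(archive_dir)
    items = {}
    for item_dir in sorted(p for p in archive.iterdir() if p.is_dir()):
        items[item_dir.name] = sorted(
            f.name for f in item_dir.iterdir() if f.is_file()
        )
    return items


def missing_required(files, required=("manifest", "dublin_core.xml")):
    """List the required per-item files that are absent from `files`."""
    return [name for name in required if name not in files]
```

The optional <archive_name>.xml file is deliberately left out of the required list.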

       

For item_one, the manifest file might have the following entries:

        file1.doc
        file2.doc
        http://www.foo-bar.edu/somefile.pdf
        http://www.foo-bar.edu/someotherfile.wav
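
Since a manifest mixes filenames and URLs, one entry per line, a loader has to tell the two apart. A minimal Python sketch follows, assuming that any entry with an http, https or ftp scheme is a URL and everything else is a local filename; parse_manifest is a hypothetical name.

```python
from urllib.parse import urlparse

# Assumption: only these schemes are treated as URLs.
URL_SCHEMES = ("http", "https", "ftp")


def parse_manifest(text):
    """Split manifest text into (local_files, urls), one entry per line."""
    local_files, urls = [], []
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        if urlparse(entry).scheme in URL_SCHEMES:
            urls.append(entry)
        else:
            local_files.append(entry)
    return local_files, urls
```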

 

The best way to illustrate the structure of the Batch Archive is with a concrete example. The following represents items from the Archive of Indigenous Languages of Latin America (AILLA):

 
AILLA/
    ACU1M1/
        manifest
        dublin_core.xml
        ailla.xml
        ACU1M1A1.pdf
        ACU1M1A1.wav
        ACU1M1A1.mp3
    CAA1M1/
        manifest
        dublin_core.xml
        ailla.xml
        CAA1M1A1.mp3
        CAA1M1A1.wav
        CAA1M1A1.pdf
        CAA1M1B1.mp3
 
The item identifiers ACU1M1 and CAA1M1 are arbitrary identifiers that AILLA uses to 
catalog its items. Other possibilities might be ITEM_001, ITEM_002 or ailla_1, ailla_2, etc.
 
Item ACU1M1’s manifest file contains the following three lines:
   ACU1M1A1.pdf
   ACU1M1A1.wav
   ACU1M1A1.mp3
 
If a file is associated with the item but is not present in the item directory, a line 
such as this may be added:
 
 http://www.ailla.org/media/achuar/ACUM1A1.doc
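
A simple consistency check follows from this rule: every manifest entry that is not a URL should name a file in the item directory. The Python sketch below assumes that entries containing "://" are URLs; missing_local_files is a hypothetical helper, not part of the proposal.

```python
from pathlib import Path


def missing_local_files(item_dir):
    """Return manifest entries that are not URLs and not found in item_dir."""
    item = Path(item_dir)
    lines = (item / "manifest").read_text().splitlines()
    entries = [line.strip() for line in lines if line.strip()]
    return [
        entry
        for entry in entries
        if "://" not in entry and not (item / entry).exists()
    ]
```

An empty result means every locally referenced file is present; URL entries are not fetched or checked.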
 
The dublin_core.xml file for item ACU1M1:
 
<?xml version="1.0" encoding="ISO-8859-1"?>
<dublin_core>
<dcvalue element="title" qualifier="none">Achuar</dcvalue>
<dcvalue element="identifier" qualifier="other">ACU1M1</dcvalue>
<dcvalue element="language" qualifier="none">Achuar</dcvalue>
<dcvalue element="coverage" qualifier="spatial">Ecuador</dcvalue>
<dcvalue element="description" qualifier="abstract" language="en">A ceremonial visiting conversation volunteered by two Achuar men, Nayásh and Chiriáp, in the house of the first, settled on the upper Setuchi river, on September 22, 1974.</dcvalue>
<dcvalue element="subject" qualifier="other">Conversation</dcvalue>
<dcvalue element="contributor" qualifier="other">Maurizio Gnerre</dcvalue>
</dublin_core> 
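
A batch routine that loads such a file into a database first needs to read the entries back. The following Python sketch uses the standard library's xml.etree.ElementTree and assumes the file content is passed in as bytes, so that the encoding declaration in the XML prolog is honored; read_dublin_core is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def read_dublin_core(xml_bytes):
    """Parse dublin_core.xml content into a list of dicts, one per <dcvalue>."""
    root = ET.fromstring(xml_bytes)
    if root.tag != "dublin_core":
        raise ValueError("expected <dublin_core> as the root element")
    records = []
    for dcvalue in root.findall("dcvalue"):
        records.append(
            {
                "element": dcvalue.get("element"),
                "qualifier": dcvalue.get("qualifier"),
                "language": dcvalue.get("language"),  # None when absent
                "value": dcvalue.text or "",
            }
        )
    return records
```

For a file on disk, the content could be obtained with open(path, "rb").read() before calling the function.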

Format Requirements

  • archive directory – The name of the archive directory should contain no characters other than alphanumerics, periods (.) and delimiters such as underscores (_) and hyphens (-), and should contain no spaces, tabs or line breaks. Directory names should be as succinct as possible and should not exceed 64 characters. Alphabetic characters should be upper case only (e.g. AILLA, RUNYON, EPOETRY).
  • item directory – Use short, concise names for items and collections. Ideally, item identifiers should correspond to the actual item names. The same character requirements as those of the archive directory apply, with the exception that lower case characters may be used.
  • manifest – This file should be named “manifest”, in lower case characters only. The filenames contained therein should, where applicable, have proper MIME type extensions as defined by RFC 1521 and RFC 1522. Filenames should contain no spaces, tabs or line breaks, and no characters other than alphanumerics, periods (.), underscores (_) and hyphens (-). Each filename must match the name of the symbolic link in the item directory to which it corresponds. URLs must conform to RFC 1738. All URLs in the manifest file should be updated as necessary.
  • dublin_core.xml – This file should conform to a specific qualified Dublin Core metadata schema and be a well formed XML document under the XML 1.0 Specification (http://www.w3.org/TR/REC-xml). Use qualified Dublin Core metadata whenever possible, since it is easier to manage a single XML file.
  • <archive_name>.xml – As with dublin_core.xml, it must conform to standards for a well formed XML document under the XML 1.0 Specification. Although no DTD is required, each file for items within a single collection should reference the same schema.
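
The naming rules above lend themselves to automated checking before a batch upload. The Python sketch below is one possible reading of them: it assumes the permitted delimiters are exactly underscores and hyphens, and that the 64-character limit also applies to item directories, which the requirements do not state explicitly. All function names are hypothetical.

```python
import re

# Alphanumerics, periods, underscores and hyphens only; no whitespace.
NAME_CHARS = re.compile(r"[A-Za-z0-9._-]+")
MAX_DIR_NAME_LENGTH = 64


def is_valid_archive_name(name):
    """Archive directory: allowed characters, at most 64 long, upper case only."""
    return (
        len(name) <= MAX_DIR_NAME_LENGTH
        and NAME_CHARS.fullmatch(name) is not None
        and name == name.upper()
    )


def is_valid_item_name(name):
    """Item directory: same characters, lower case permitted.

    Assumption: the 64-character limit is applied here as well.
    """
    return (
        len(name) <= MAX_DIR_NAME_LENGTH
        and NAME_CHARS.fullmatch(name) is not None
    )


def is_valid_file_name(name):
    """Manifest filename: allowed characters only, no spaces or tabs."""
    return NAME_CHARS.fullmatch(name) is not None
```

For example, is_valid_archive_name("AILLA") is true, while "Ailla" fails the upper case rule and "MY ARCHIVE" fails the character rule.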