TARO - Texas Archival Resources Online Administrative Mode

How Do I...?


500K XML file size limit and splitting files (Suggested)

TARO no longer rejects submissions because they exceed 500 kilobytes. Repositories that still wish to split their large files may do so. To split XML files larger than 500K, you can adopt either of two approaches:

1. Break the finding aid into completely discrete electronic parts or volumes, each of which stands on its own, using a criterion of your choosing (for instance, year ranges), so that every file is under 500K. Each file has its own filename (e.g. 00001.xml, 00002.xml, etc.). Several TARO repositories have done this in the past.

2. Use the EAD 2002 linking elements to split a large XML file into a series of smaller connected files. The TARO support site has a page describing the practice we want followed when linking to objects, other XML files, and so on. A link to another part looks like this:


<archref href="http://www.lib.utexas.edu/taro/utlac/00245/lac-00245p2.html" actuate="onrequest" show="new" linktype="simple">Part II: Seventeenth Century Documents</archref>

Pay close attention to both the directory and the filename in the example. The directory is the original five-digit number, while the filename is the part name with the institution prefix we have created for you. If you do not know your institution's prefix, contact either Minnie or Fred and we will send you the information.
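The directory and filename rule above can be sketched in Python. This is only an illustration: the base URL and path layout are inferred from the single example on this page, and the function name and parameters (repo_dir, number, part_name, label) are hypothetical, so confirm the pattern for your institution before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumption: base URL and path layout are inferred from the one example
# link on this page; they are not a documented TARO API.
BASE_URL = "http://www.lib.utexas.edu/taro"


def build_archref(repo_dir, number, part_name, label):
    """Return an <archref> string that links to one part of a split finding aid.

    repo_dir  -- repository directory on the server (e.g. "utlac")
    number    -- original five-digit number, used as the directory name
    part_name -- part filename without extension, including the
                 institution prefix (e.g. "lac-00245p2")
    label     -- link text shown to the reader
    """
    href = f"{BASE_URL}/{repo_dir}/{number}/{part_name}.html"
    element = ET.Element(
        "archref",
        {"href": href, "actuate": "onrequest", "show": "new", "linktype": "simple"},
    )
    element.text = label
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    print(build_archref("utlac", "00245", "lac-00245p2",
                        "Part II: Seventeenth Century Documents"))
```

Building the element with ElementTree rather than string concatenation means characters such as & in a part title are escaped automatically.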

If you are contemplating this second option, note that it requires a slight change to the submission process. As an example, let's assume you have an XML file, 00245.xml, which is over 500K in size, and you wish to use EAD 2002 linking tags to create a series of smaller connected XML files.

Having done this, when it comes time to submit the files to us, you will want to name these smaller but connected files in a way that keeps them together as a unit. If, in this instance, you split 00245.xml into four parts, we would want you to name the files like this:
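To find out which of your files are over 500K in the first place, a short script along these lines may help. It is a sketch only: it assumes "500K" means 500 * 1024 bytes (this page does not say), and the function name is hypothetical.

```python
from pathlib import Path

# Assumption: "500K" is read as 500 * 1024 bytes; this page does not
# define the figure more precisely.
SIZE_LIMIT_BYTES = 500 * 1024


def oversized_xml_files(directory, limit=SIZE_LIMIT_BYTES):
    """Return (filename, size_in_bytes) for each .xml file larger than limit."""
    results = []
    for path in sorted(Path(directory).glob("*.xml")):
        size = path.stat().st_size
        if size > limit:
            results.append((path.name, size))
    return results


if __name__ == "__main__":
    # Lists candidates for splitting in the current directory.
    for name, size in oversized_xml_files("."):
        print(f"{name}: {size} bytes")
```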


In each part, you will want to use the corresponding part name in the eadid in place of the original five-digit number (xxxxx), NOT in addition to it.


<eadid countrycode="us" mainagencycode="TxU-LA">urn:taro:utexas.blac.00245p2</eadid>
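The substitution can be sketched as a small Python helper. The "p" plus part number suffix is inferred from the 00245p2 example above, and the function name is hypothetical, so treat this as an assumption rather than a stated TARO rule.

```python
import re


def part_eadid(original_eadid, part_number):
    """Replace the trailing five-digit number in an eadid with its part name.

    Assumption: a part name is the original number followed by "p" and the
    part number, as in the 00245p2 example on this page.
    """
    # Match an eadid ending in exactly five digits (not six or more).
    match = re.fullmatch(r"(.*?)(?<!\d)(\d{5})", original_eadid)
    if match is None:
        raise ValueError(
            f"eadid does not end in a five-digit number: {original_eadid!r}"
        )
    stem, number = match.groups()
    return f"{stem}{number}p{part_number}"


if __name__ == "__main__":
    print(part_eadid("urn:taro:utexas.blac.00245", 2))
```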

You will also need to create individual "Part" titles for these files so that they don't show up on the browse pages as a long list of repeating titles.

        <titleproper>A Guide to the Ann W. Richards Papers, 1933-2000 [Part 1] </titleproper>
        (any other tags needed)
      (any other tags needed)

You can choose to add as much or as little duplicate information as you wish; we only ask that each part include these three pieces:

    Creator <origination label="Creator">
    Title <unittitle label="Title" encodinganalog="245">
    Repository <repository label="Repository" encodinganalog="852$a">
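Before uploading, you could check each part for these three elements with something like the following. This is a minimal sketch with hypothetical function names; it looks only for the element names listed above, anywhere in the file, and does not validate against the EAD 2002 DTD.

```python
import xml.etree.ElementTree as ET

# The three pieces this page asks every part to carry, keyed by element name.
REQUIRED = {
    "origination": "Creator",
    "unittitle": "Title",
    "repository": "Repository",
}


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}unittitle' if present."""
    return tag.rsplit("}", 1)[-1]


def missing_required(xml_text):
    """Return the labels of required elements that do not appear in xml_text."""
    root = ET.fromstring(xml_text)
    present = {
        local_name(el.tag) for el in root.iter() if isinstance(el.tag, str)
    }
    return [label for name, label in REQUIRED.items() if name not in present]
```

An empty list means all three pieces were found. Files that rely on entities declared in an external DTD may not parse with ElementTree as-is, so a failure to parse is not necessarily a problem with the finding aid.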

When you have finished editing, do not simply move the files into your account on the TARO server alongside your other files. Instead, create a directory in your account named "00245" and place the parts in that directory.
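If your account is reachable as a local or mounted directory (an assumption; you may instead be uploading with a file transfer client), the step could be sketched like this. The function name and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def stage_parts(part_files, account_dir, number):
    """Move the part files into a directory named after the original number.

    part_files  -- paths to the edited part files
    account_dir -- path to your account directory (assumed locally reachable)
    number      -- original five-digit number, e.g. "00245"
    """
    target = Path(account_dir) / number
    # Create the directory if needed; reuse it if it already exists.
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for part in part_files:
        destination = target / Path(part).name
        shutil.move(str(part), str(destination))
        moved.append(destination)
    return moved
```

Keeping the parts in their own directory is what lets them be processed together as a unit.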

Since these files require special processing, please tell either Minnie or Fred that you have uploaded them, and give the appropriate file and folder name(s).

While this new guideline applies to ANY future submission, whether it is a new file or an updated file, we also suggest you consider splitting already submitted files that are over 500K in size. However, we will not impose a deadline for doing so.


Copyright © The University of Texas at Austin.
Produced by the University of Texas Libraries.