Saturday, December 13, 2008

Metadata in Context (Pitt's DRL)

This is an assignment completed as part of my MLIS coursework. The goal was to look at a digital collection, read its documentation, study its use of metadata, dissect its workflow, and speak to a metadata librarian at the institution. I chose the University of Pittsburgh's Digital Research Library (D-Scribe) partly for the sake of convenience, but also because doing so gave me the opportunity to speak regularly with a metadata librarian and visit the DRL to get a better sense of the process.

The University of Pittsburgh's Digital Research Library


(see "Documentation" in lower right hand corner)

According to the DRL's Mission Statement: The ULS proactively identifies and evaluates resources for conversion into electronic format to be hosted and served by the ULS.

According to the DRL’s "Guidelines for working with the DRL to create Image collections" document: The DRL supports the teaching and research mission of the university through the creation and maintenance of web-accessible digital research collections.

Collection Content:

The collections primarily include digitized historical content with a focus on Pennsylvania and Pittsburgh. They are mostly photographs, postcards, maps, and other images, along with texts, books, and finding aids. They are browsable and searchable HERE.

The metadata schemes that are used include:

MARC (Machine-Readable Cataloging) for the text collections.

MARCXML (Machine-Readable Cataloging using an Extensible Markup Language schema) also for the text collections.

TEI Metadata (Text Encoding Initiative) also for the text collections.

EAD (Encoded Archival Description) for archival finding aids. The DRL uses EAD encoded in XML.

Dublin Core for image collections.

MARC records for the cataloged items come in and are turned into MARCXML, which is then converted into a TEI header. The TEI header is pretty similar to MARC, is used for the print collections, and can also hold structural metadata. Metadata is usually extracted from existing records, but when records do not exist and the contributing agency does not have the means to catalog the items, a TEI record is created from scratch.
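To make the MARCXML-to-TEI step a little more concrete, here is a minimal sketch in Python. It is not the DRL's actual conversion, which I did not see in code form. The sample record, the function names, and the choice to map only MARC 245$a (title) and 100$a (author) are my own assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Library of Congress MARCXML schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record, not taken from the DRL.
SAMPLE_MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">A history of Pittsburgh</subfield>
  </datafield>
</record>"""


def subfield(record, tag, code):
    """Return the text of the first matching MARC subfield, or None."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    return node.text.strip() if node is not None and node.text else None


def marcxml_to_tei_header(marcxml):
    """Build a bare-bones teiHeader from a single MARCXML record."""
    record = ET.fromstring(marcxml)
    header = ET.Element("teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    # MARC 245$a holds the title proper.
    ET.SubElement(title_stmt, "title").text = subfield(record, "245", "a")
    # MARC 100$a holds the main personal author, when there is one.
    author = subfield(record, "100", "a")
    if author:
        ET.SubElement(title_stmt, "author").text = author
    return header


header = marcxml_to_tei_header(SAMPLE_MARCXML)
print(ET.tostring(header, encoding="unicode"))
```

A real crosswalk would carry over many more fields (publication details, subject headings, identifiers), but the shape of the step is the same: read fields out of MARCXML and write them into the matching parts of the TEI header.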

Dublin Core is used almost exclusively for the image collections, with a few local modifications. DC is also required for the DRL's participation in the Open Archives Initiative, which ultimately increases access to the collection.

Dublin Core is also central to the DRL's participation in the Open Archives Initiative (OAI). The OAI metadata protocol provides a standard for sharing information about digital objects so that diverse collections from multiple institutions can be searched together, thus increasing awareness, use, and communication about digital resources in the academic community. The OAI metadata protocol requires use of the Dublin core standard to describe digital objects.

"Guidelines for Working with the DRL to Create Image Collections"

The final product for textual items is the TEI including bibliographic info, structural data, links to images, and OCR text. The final product for images is the Dublin Core record and the link to the corresponding image.
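For the image side, a Dublin Core record with a link to its image might look something like the sketch below, written in the oai_dc shape that OAI harvesters expect. The two namespace URIs are the standard ones. The field values, the example.org URL, and the function name are hypothetical, and the DRL's local modifications to DC are not represented here.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for unqualified Dublin Core in OAI-PMH.
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def build_oai_dc(fields):
    """Build an oai_dc record from a dict of element name -> list of values."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, values in fields.items():
        # Every Dublin Core element is optional and repeatable.
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return root


# Hypothetical image record; the identifier stands in for the image link.
record = build_oai_dc({
    "title": ["Street scene, Pittsburgh"],
    "subject": ["Pittsburgh (Pa.)", "Streets"],
    "type": ["Image"],
    "identifier": ["http://example.org/images/0001"],
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because every element is repeatable, a record can carry several subject headings, which is where the LCSHs and local keywords would end up.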

Controlled Vocabularies:

MARC records for print material always have LCSHs (Library of Congress Subject Headings) assigned to them. The same is true for all items coming from the Archives Service Center (ASC), and finding aids also have LCSHs assigned to them by the ASC or another archivist. Whenever possible the DRL tries to have the content provider assign LCSHs, as it is sort of a "preferred standard." However, sometimes collections coming in to be digitized have general LCSHs to describe the collection and then locally created keywords or subject headings to describe the items in the collection.

Types of Metadata:

The structural metadata is primarily for books in the print collections; it becomes part of the TEI metadata and is saved as XML. It is generated mostly with software developed in-house.

Descriptive metadata is used for pretty much all collections, in either DC or TEI.

Administrative metadata isn't really used all that much. Currently it is only included in certain fields of the DC and TEI records. Apparently the DRL is working on including MIX (Metadata for Images in XML).

The Basic Text Workflow Seems to be:

- get a list of items to be scanned with ID numbers from the books' barcodes and MARC records.

- MARC is converted to MARCXML, then MARCXML is converted to TEI.

- the books are sent to the DRL (they are tracked on the computer system through each step of this process).

- the books are processed and sorted depending on scanning methods and materials.

- the books are scanned, first in grayscale for text and then in color for images.

- structural metadata is created using in-house software.

- item is reviewed as part of quality control.

- final XML is created from TEI header, structural metadata for the body, links to the corresponding images, and OCR data.

- update the index for collections and update the online collections.

- return deliverables (I know this at least consists of spreadsheets of URLs and bar codes) to Technical Services.
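The "final XML" step in the list above can be sketched roughly as follows. This is only my own illustration of how a header, structural divisions, image links, and OCR text could sit together in one file. The element layout is loosely TEI-like (the `facs` attribute on `pb` comes from TEI P5), and the function name, page features, file names, and OCR strings are all hypothetical. The DRL's in-house software almost certainly differs in the details.

```python
import xml.etree.ElementTree as ET


def assemble_text_item(title, pages):
    """Combine header, structure, image links, and OCR into one XML tree."""
    tei = ET.Element("TEI")

    # Descriptive metadata goes in the header.
    header = ET.SubElement(tei, "teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = title

    # Structural metadata, image links, and OCR go in the body.
    body = ET.SubElement(ET.SubElement(tei, "text"), "body")
    for number, page in enumerate(pages, start=1):
        # One division per page, typed by its structural feature.
        div = ET.SubElement(body, "div", type=page["feature"])
        # The page break element links to the scanned page image.
        ET.SubElement(div, "pb", n=str(number), facs=page["image"])
        ET.SubElement(div, "p").text = page["ocr"]
    return tei


# Hypothetical two-page item.
item = assemble_text_item(
    "A history of Pittsburgh",
    [
        {"feature": "titlepage", "image": "0001.tif", "ocr": "A HISTORY OF PITTSBURGH"},
        {"feature": "chapter", "image": "0002.tif", "ocr": "Chapter one begins here."},
    ],
)
print(ET.tostring(item, encoding="unicode"))
```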
