Friday, December 12, 2008

CONTENTdm

I did some group work this past semester on CONTENTdm and thought I would post a little write up about the experience. Pieces of this were taken from a larger group final paper on CONTENTdm, Dublin Core, and VRA Core. There will be more of these types of things coming in the near future.


HOW DOES CONTENTdm WORK?

Only administrative accounts are able to create new collections in CONTENTdm. Similarly, only CONTENTdm administrators can add, delete, and modify user accounts, index metadata, make items searchable within the collection, implement and manage controlled vocabularies, and approve items and projects into the collection. In larger organizations, one or two positions would be responsible for managing workflow and providing quality control for content and metadata. The content and metadata themselves would be provided by other catalogers or metadata librarians with standard CONTENTdm user accounts.

Standard accounts allow users to upload content, provide metadata, and edit some search, display, and formatting options for collections. These features will be further explained below and are mostly concerned with rights management and structural metadata. Users with standard accounts upload content into projects, where items are stored and organized until they receive approval from the organization’s CONTENTdm administrators and are added to the collection. These items are sent to a queue on the administrative side of the CONTENTdm acquisition station, where the metadata is checked for accuracy, completeness, and quality.

Content and metadata records can be uploaded in bulk to make larger projects more efficient and so that digitization tasks can be divided throughout a whole department. Once items are uploaded into a project, the metadata can be assigned item by item or applied to multiple items using spreadsheet-like features. This series of screens would seem familiar to anyone accustomed to editing information about songs and albums in iTunes. The spreadsheet view also looks a great deal like the social networking website LibraryThing, and the two work quite similarly. Users only need to double-click on boxes to complete or edit metadata fields, and there is an array of shortcuts to aid this process across multiple items.

Controlled vocabularies help maintain standards and consistency and allow for faster metadata entry. If the CONTENTdm collection has a controlled vocabulary associated with it, then users cannot add terms to their metadata records that are not already contained within that vocabulary. This vocabulary can be pre-established, like the Library of Congress Thesaurus for Graphic Materials, or it can be built from scratch and imported into CONTENTdm. A vocabulary needs to be a list of approved terms saved as a simple text document with only one term per line. However, a pre-established vocabulary can also be modified within CONTENTdm. Terms can be added or removed to suit a particular organization’s information needs. Users can submit terms for approval by the administrator as they upload content and records to their project. Controlled vocabularies can also be built automatically from records that have already been created and indexed. This vocabulary can then be applied across collections and shared with other institutions using CONTENTdm.
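
To make the one-term-per-line format concrete, here is a rough Python sketch of how a vocabulary file could be read, how a record's terms could be checked against it, and how a vocabulary could be collected from existing records. This is not CONTENTdm's own code or API. The function names, the "subject" field, the sample terms, and the exact (case-sensitive) matching are all assumptions made for illustration.

```python
import io


def load_vocabulary(lines):
    """Read one approved term per line, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def unapproved_terms(record_terms, vocabulary):
    """Return the terms in a record that are not in the vocabulary."""
    # Assumption: matching is exact and case-sensitive.
    return [term for term in record_terms if term not in vocabulary]


def build_vocabulary(records, field):
    """Collect the distinct terms already used in one field across records."""
    terms = set()
    for record in records:
        terms.update(record.get(field, []))
    return sorted(terms)


# Stand-in for a plain text vocabulary file (hypothetical terms).
vocab_file = io.StringIO("Bridges\nLighthouses\n\nRailroads\n")
vocabulary = load_vocabulary(vocab_file)

# Terms a user tries to enter in a record's subject field.
rejected = unapproved_terms(["Bridges", "Canals"], vocabulary)
print(rejected)

# Building a vocabulary from records that already exist.
existing = [{"subject": ["Canals", "Bridges"]}, {"subject": ["Bridges"]}]
derived = build_vocabulary(existing, "subject")
print(derived)
```

In this sketch a rejected term would be the kind of thing a user submits to the administrator for approval rather than adding to the record directly.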

CONTENTdm works with any file format, including audio and video and other file types corresponding to plugins and applications working with the user’s web browser. Options available for audio and video collections include the ability to divide files into segments and to assign metadata, structural and descriptive, to each, while still allowing them to be retrieved as a whole. CONTENTdm will assign images to these file types or provide the option to upload a representative image.

In image-based collections there are three ways to indicate copyright information directly on items within the collection. The first is banding, which adds a color band and text to the bottom of images. The second is branding, which puts a small image or logo in the lower right hand corner of all images in the collection. The third is watermarking, which embeds an image into the center of each image in the collection to indicate ownership and copyright. The original image is retained within the system and the watermarked version is displayed for the user.

When working with text-based images, like PDFs or scanned documents, collections can be indexed by the item or by the page. Full text is automatically extracted from born-digital PDFs and included in the full text metadata field in the item’s record. A thumbnail image is also automatically generated to represent the document in the collection. If the PDF or text-based image is not born digital, the item can be run through optical character recognition software and the resulting text can be included in the full text field for the item.


MY/THE GROUP'S EXPERIENCE WITH CONTENTdm


According to the documentation provided by OCLC, our academic CONTENTdm demo setup was supposed to already contain collections built specifically for several different metadata standards (varieties of Dublin Core and VRA) and designed for different types of content. However, this wasn’t the case and we were unable to reconfigure the existing collections to suit our needs or to create new collections that all users in our group could access.

Many of the collections that we found already contained a great deal of content, but we were unable to access much of it. Most of what we could access could not be deleted or modified. Our CONTENTdm group continually had issues with access, which eventually led to the early termination of the project in favor of a research-based project on metadata, Dublin Core, VRA, and CONTENTdm. It was assumed that there was simply a miscommunication between professors and OCLC because the problems our group encountered were clearly unexpected.

We found that a project of this nature required a full-time tech support person to put together. As group leader and faculty liaison, I ended up in this role. While some issues were eventually sorted out (everyone was finally able to install the acquisition station, set up accounts, log in, begin uploading items, and start testing CONTENTdm), more arose. The fact that we were unable to continue with the project as initially planned was partially because of the complications involved in providing virtual technical support to students collaborating from across the country.

Some of the specific problems encountered by the group included: projects disappearing after being created; members only being able to access one collection; members being unable to view what had been uploaded, approved and indexed; members having access to entirely different collections; and members having issues running the Acquisition Station on non-Windows machines. This last issue of only providing software for PCs seemed unfortunate considering the growing number of people switching from PC to Linux and Apple, especially those dealing with the manipulation of large image files. Perhaps these developments are still to come.

Despite OCLC's claims that CONTENTdm is scalable, flexible, and customizable, the main issues we encountered concerned the way that it seems to only allow for specific workflows. This could be because the group was only working with an academic demo or it could simply be the nature of turnkey library software. However, the group found that CONTENTdm worked best when digitization labor was divided into specific compartmentalized tasks and roles. These roles end up being manifested in CONTENTdm in a rigid hierarchy of access and privileges that could easily become a hindrance for certain projects (namely smaller projects with only a few librarians involved or projects with a workflow based on collaboration and with more fluid or modular roles). From our experience it did not appear that CONTENTdm was particularly adaptable.

We were never able to find a way to get around the division between administrative and standard user roles and privileges. That being said, we were able to use the administrative password to fill all roles, uploading and approving. It is possible to create multiple administrative passwords from the server side of things (which we did not have access to in this demo version), and it is also conceivable that a smaller library would have one or two administrators covering the entire workflow. However, it does appear that this setup would be less than seamless and might become a bit tiresome down the line.

It would have been great to see how CONTENTdm works with OCLC Connexion and WorldCat, how it works on a consortial level, and to get a better sense of its use as an institutional repository, as these areas seem as though they could be its strengths. While several members of the group expressed some pretty strong negative feelings about CONTENTdm, I would have to say that it is difficult to get a real sense of how it would work for different types of projects from a demo version. It certainly seems to have its weaknesses (I have doubts about its adaptability and it seems much less intuitive and user-friendly than OCLC’s advertising would have us believe), but it might very well be the best option available to libraries in need of a turnkey solution requiring minimal in-house technical support.
