Saturday, December 13, 2008

MARC and Dublin Core

This is a short assignment I did for the Independent Study of Metadata that was part of my MLIS coursework. There is nothing earth-shattering below, but it's sort of interesting to see records broken down like this. I pretty much just write in those thin Sharpies so it may be hard to read, but if you are familiar with MARC then it doesn't really matter. The numbers correspond to where information is located in each record.


Everything that's in the Dublin Core record is also found in the MARC record. The most notable (and obvious) difference is that Dublin Core is readable and much more intelligible than MARC's cryptic numeric tags and fixed fields. Even qualified Dublin Core is pretty easy to figure out. However, MARC includes quite a bit of information not found in the Dublin Core record. Most of this relates to holdings information and the descriptive information found in the fixed fields. The Dublin Core record also leaves out some of the numbers automatically generated by WorldCat. It seems like Dublin Core is not as concerned with describing the book's format (probably a poor choice of words), which is what quite a few of the fixed fields deal with. It doesn't have a bibliography note, and it doesn't seem to be as concerned with the source of information for the catalog record. The cataloging source and modifying agency are absent, and so are the local holdings information, where the record was cataloged, etc.

While Dublin Core might be missing some information, none of it seems all that essential. The information in the fixed fields is almost never put to any great use. As far as I know, OPACs don't allow users to narrow search results based on any of the information found in the fixed fields. You can't simply call up a list of English biographies that include illustrations, are held in Massachusetts libraries, and were published in 1973. So, really, nothing all that useful is lost for the user.
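To make the fixed fields a little less cryptic, here is a minimal Python sketch of how a program could read the 008 field of a book record and apply part of the filter described above. The character positions follow the MARC 21 layout for books as I understand it. The sample field and the function names are made up for illustration, and holdings (the "Massachusetts libraries" part) live elsewhere in the record, so they are not covered here.

```python
def decode_008_book(field):
    """Pull a few coded values out of a 40-character MARC 008 field (books)."""
    if len(field) != 40:
        raise ValueError("an 008 field should be exactly 40 characters")
    return {
        "date1": field[7:11],                   # positions 07-10: first date
        "place": field[15:18].strip(),          # positions 15-17: place of publication
        "illustrations": field[18:22].strip(),  # positions 18-21: illustration codes
        "biography": field[34].strip(),         # position 34: biography code
        "language": field[35:38],               # positions 35-37: language code
    }


def is_illustrated_english_biography(field, year):
    """True if the 008 says: English, published in `year`, illustrated, biographical."""
    info = decode_008_book(field)
    return (
        info["language"] == "eng"
        and info["date1"] == year
        and "a" in info["illustrations"]                # 'a' = illustrations
        and info["biography"] in {"a", "b", "c", "d"}   # any biography code
    )


# Invented sample field, assembled piece by piece so the positions are visible.
sample_008 = (
    "730101"   # 00-05: date entered on file (made up)
    "s"        # 06: single known date
    "1973"     # 07-10: date 1
    "    "     # 11-14: date 2 (blank)
    "mau"      # 15-17: place of publication (Massachusetts)
    "a   "     # 18-21: illustrations
    "      "   # 22-27: audience, form of item, nature of contents (blank)
    " 000"     # 28-31: government pub (blank), conference, festschrift, index
    " 0"       # 32-33: undefined, literary form (0 = not fiction)
    "b"        # 34: individual biography
    "eng"      # 35-37: language
    " d"       # 38-39: modified record, cataloging source
)

print(decode_008_book(sample_008))
print(is_illustrated_english_biography(sample_008, "1973"))
```

So the data to support that kind of search is sitting in the record; it is the catalog interface that would have to expose it.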

The major downside to Dublin Core would have to be that it is not nearly as standardized, and because it is pretty customizable, records are not as easily shared. The main thing lost in Dublin Core is precision, but it can be applied to a wider range of resources (physical or digital) more easily (although MARC has been adapted and simplified to work for archivists, and simple crosswalks from MARC to DC have been created).
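The crosswalk idea can be sketched in a few lines of Python. The tag-to-element table below is a hypothetical, heavily trimmed mapping that I put together for illustration. Real crosswalks are longer and differ in the details (I believe some send the 100 field to contributor rather than creator), and the sample record is invented.

```python
# Hypothetical, heavily trimmed MARC-tag-to-Dublin-Core table (illustration only).
MARC_TO_DC = {
    "020": "identifier",
    "100": "creator",
    "245": "title",
    "260": "publisher",
    "650": "subject",
}


def crosswalk(marc_fields):
    """Turn a list of (tag, value) pairs into a dict of DC element -> list of values."""
    dc_record = {}
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # no DC home for this field in our table, so it is dropped
        dc_record.setdefault(element, []).append(value)
    return dc_record


# Invented sample record; the 040 (cataloging source) has nowhere to go.
sample_record = [
    ("040", "DLC"),
    ("100", "Doe, Jane"),
    ("245", "An example biography"),
    ("650", "Librarians"),
    ("650", "Biography"),
]

dc = crosswalk(sample_record)
for element, values in dc.items():
    print(element, values)
```

Notice that the 040 simply falls on the floor, which is the loss of precision in miniature: moving from MARC to DC is easy, but the trip is one-way for anything without a matching element.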


Metadata Extraction Tools

For this assignment I also played around with DC Dot, a simple metadata extractor that can be found at http://www.ukoln.ac.uk/metadata/dcdoc/. It was up and running when I first played with it, but it seems to be down right now. I'm not sure what the deal is. I entered the website for Pitt's Library and Information Science program (http://www.ischool.pitt.edu/) and this is the information that I got:

<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/">
<meta name="DC.format" content="">
<meta name="DC.format" content="13347 bytes">
<meta name="DC.identifier" scheme="DCTERMS.URI" content="http://www.ischool.pitt.edu">

I am not sure where this information comes from other than the URL, which I inputted. I assume the format information is from the program loading the site and then sending back how many bytes of information the page contained. The general idea of metadata extraction is interesting but I wonder how useful it really is at this point (or at least this particular tool).

I entered several different URLs and couldn't get it to provide any more information. I wonder how much more information could be taken from other portions of the webpage. Could it just pull from plain text? Could it judge the text size to get a title? Could it use the title on the top bar? Could it look at all the pages on a website, rather than just individual pages? Right now this just doesn't seem that useful. However, being able to extract metadata from webpages would obviously increase findability in the long run and would certainly be a better approach than expecting web designers (or just anyone making a webpage) to encode DC in their HTML. Someone will eventually have to organize this mess.
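As a rough answer to the "title on the top bar" question, here is a small Python sketch of how an extractor might pull the page title and byte count out of some HTML and write them back out as DC meta tags. This is my own guess at the general approach, not how DC Dot actually works. The class and function names, the sample page, and the example.edu URL are all made up.

```python
from html import escape
from html.parser import HTMLParser


class TitleGrabber(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)


def extract_dc(html_text, url):
    """Guess a few Dublin Core values from a page and return them as meta tags."""
    grabber = TitleGrabber()
    grabber.feed(html_text)
    grabber.close()
    # Collapse runs of whitespace inside the title.
    title = " ".join("".join(grabber.title_parts).split())
    size = len(html_text.encode("utf-8"))
    tags = []
    if title:
        tags.append('<meta name="DC.title" content="%s">' % escape(title))
    tags.append('<meta name="DC.format" content="%d bytes">' % size)
    tags.append(
        '<meta name="DC.identifier" scheme="DCTERMS.URI" content="%s">' % escape(url)
    )
    return tags


# Invented sample page, not the real site.
page = (
    "<html><head><title>School of  Information Sciences</title></head>"
    "<body><h1>Welcome</h1></body></html>"
)
for line in extract_dc(page, "http://www.example.edu/"):
    print(line)
```

Even this much only works when the page author bothered to write a sensible title, which is sort of the whole problem.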

"A barrier to electronic resource cataloging is that many library professionals and information specialists continue to believe that cataloging web resources is a waste of time; it is better to make we pages (essentially webliographies or lists) because many of the web resources are too ephemeral to be included in the library catalog. However, new tools such as URL link checkers make the maintenance of metadata for web resources much simpler. It is more efficient to have users start with the library catalog as a single gateway to the universe of knowledge, no matter the format or type of information sought."

Anita S. Coleman's "From Cataloging to Metadata: Dublin Core Records for the Library Catalog"

Also, a simple URL checker to play with can be found HERE.
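For a sense of what a link checker like the ones Coleman mentions does under the hood, here is a rough Python sketch using only the standard library. The function names and the "400 and above counts as dead" rule are my own assumptions, and a real tool would also need to deal with redirects, retries, and servers that refuse HEAD requests.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code   # the server answered, but with an error code
    except (urllib.error.URLError, OSError, ValueError):
        return None       # bad URL, DNS failure, timeout, and so on


def find_dead_links(urls, get_status=http_status):
    """Return the URLs whose status is missing or 400 and above."""
    dead = []
    for url in urls:
        status = get_status(url)
        if status is None or status >= 400:
            dead.append(url)
    return dead
```

Running find_dead_links over the URLs stored in a batch of catalog records would give a list of records to review, which is the kind of maintenance the quote is talking about. The get_status parameter is there so the logic can be tried out with canned results instead of hitting the network.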
