
This Drupal module fetches and parses OAI_DC (Dublin Core) metadata records from OAI-PMH services, as defined by the Open Archives Initiative (http://www.openarchives.org/). It depends on the Feeds module.

Features

  • Harvests from OAI-PMH repositories, honoring resumptionTokens and compression. Deleted records are not supported yet.
  • Maps OAI_DC metadata into Feeds targets (CCK, taxonomy, etc.). Other metadata schemas could be supported by this or other modules in the future; for now, you can roll your own using existing modules.
  • Can harvest from the entire repository or from a single set. Available sets are loaded via AHAH when creating the importer.
  • You can set up multiple harvesting rules/mappings per set and/or repository, as desired.
  • Record storage handled by Feeds: nodes (CCK support), raw database, etc. Extensible with other modules.
  • Cron scheduling is handled by Feeds: harvests can run as often as every cron run or as rarely as once a month.
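
As a rough illustration of what harvesting and mapping involve (this is not the module's own PHP code), the Python sketch below parses one ListRecords response, collects the Dublin Core elements of each record, and reads the resumptionToken that signals another page of results. The sample XML, the identifier oai:example.org:1, and the function name parse_list_records are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH and Dublin Core specifications.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical, hand-written response used only for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2010-01-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
          <dc:subject>Geology</dc:subject>
          <dc:subject>Education</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter("{%s}record" % OAI_NS):
        header = record.find("{%s}header" % OAI_NS)
        item = {
            "identifier": header.findtext("{%s}identifier" % OAI_NS),
            "dc": {},
        }
        metadata = record.find("{%s}metadata" % OAI_NS)
        if metadata is not None:
            for element in metadata.iter():
                # Keep only Dublin Core elements; each may repeat.
                if element.tag.startswith("{%s}" % DC_NS):
                    name = element.tag.split("}", 1)[1]
                    value = (element.text or "").strip()
                    item["dc"].setdefault(name, []).append(value)
        records.append(item)
    # A missing or empty token means there are no more pages.
    token = root.findtext(".//{%s}resumptionToken" % OAI_NS)
    token = token.strip() if token else None
    return records, token


records, token = parse_list_records(SAMPLE_RESPONSE)
```

Because Dublin Core elements may repeat, each one is stored as a list of values, which is roughly how a multi-valued CCK field or a taxonomy mapping would expect to receive them.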

Requirements

You need the Feeds module and its dependencies (Ctools, etc.). This module has been tested with Feeds 6.x-1.x-beta10.
Recommended additional modules: CCK and Link (to hold the resource URLs).

Usage

  • Enable all modules: Feeds (and its dependencies) and the "Feeds OAI-PMH Fetcher and Parser" module.
  • Create a new node type for the importer, for instance "OAI repository". Each node of this type will hold one feed for the importer configured later on, so you could have a different set and/or repository per node.
  • Create a new node type for the imported records. Add a field to it for each Dublin Core element that will be imported, for instance: description, publisher, type, format, subject, date, etc. You could also add taxonomy vocabularies for this node type.
  • Create a new Feeds importer at admin/build/feeds/create.
  • Configure your importer as you would any other, except for these settings:
    • On Basic Settings, in "Attach to content type" choose the importer node type created earlier. I recommend you uncheck "Import on submission".
    • For Fetcher, choose "HTTP OAI-PMH Fetcher".
    • For Parser, choose "OAI parser".
    • For Processor, choose "Node processor".
    • Under the node processor settings, for "Content type" select the node type created earlier for the imported records.
  • Add a new node of the importer node type (for instance, "OAI repository").
    • In the "Feed" fieldgroup, enter the URL of the OAI-PMH endpoint in the "URL" field. For instance: http://www.dlese.org/oai/provider
    • The "Set to fetch" options box will be populated if you have JavaScript enabled. If not, just save and re-edit the node to see the available sets.
    • Save the node and then click on the "Import" tab above the node. This will create a node for each record in the repository's selected set. Note that large repositories can take a long time to harvest. If a repository sends a limited number of records per query, you will have to run cron or hit "Import" repeatedly.
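
For readers curious about what is requested from the endpoint during these steps, here is a minimal Python sketch of OAI-PMH request URLs, written from the protocol specification and not taken from the module's code. The function name build_request_url and the set name "example-set" are assumptions for illustration only. ListSets is the verb that returns a repository's sets, and a follow-up ListRecords request carries only the resumptionToken, since the protocol treats it as an exclusive argument.

```python
from urllib.parse import urlencode


def build_request_url(base_url, verb, set_spec=None,
                      metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH request URL (hypothetical helper)."""
    params = {"verb": verb}
    if resumption_token:
        # resumptionToken is exclusive: no set or metadataPrefix alongside it.
        params["resumptionToken"] = resumption_token
    elif verb == "ListRecords":
        params["metadataPrefix"] = metadata_prefix
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


endpoint = "http://www.dlese.org/oai/provider"

# Request that lists the sets a repository offers.
list_sets_url = build_request_url(endpoint, "ListSets")

# First page of records for one (hypothetical) set.
first_page_url = build_request_url(endpoint, "ListRecords",
                                   set_spec="example-set")

# Later pages: pass back the token returned by the previous response.
next_page_url = build_request_url(endpoint, "ListRecords",
                                  resumption_token="token-2")
```

Each run of cron or click on "Import" corresponds, loosely, to issuing one or more of these follow-up requests until the repository stops returning a token.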

Similar modules

  • The eXtensible Catalog (XC) Drupal Toolkit: includes an OAI Harvester. By itself, it does not store harvested records, but it can use modules from XC (or others implementing the required hooks) for storage, indexing and browsing. The XC toolkit is a rather powerful all-in-one solution, whereas Feeds OAI-PMH takes a more atomic, building-block approach.
  • Dublin Core to CCK: requires XC's OAI Harvester. It can harvest Dublin Core metadata into a hardcoded node type, putting data in CCK text fields. It is not configurable and does not seem to handle updates to existing nodes.
  • Drupal OAI PMH: for Drupal 5 only, although commenters have posted a Drupal 6 version. It is a stand-alone module that only supports mapping into taxonomy.

If you want to make your Drupal site a provider instead of a harvester, see the OAI-PMH Module and OAI2 for CCK projects.

More information

For a listing of available repositories, see the module's README.TXT.

This project has been sponsored by the Center for Innovation in Technology and Education, Tecnológico de Monterrey.
