Allow importing from different OPACs [#355602]

This would let Drupal serve as a central search index for different OPACs.

Bonus points: The "same item" held by different libraries occupies one node only (instead of one node per OPAC for the "same item"). This would mean showing the user all holdings from all crawled OPACs within the same node. We would have to determine how to detect the "same item" and which MARC record prevails (most recently changed? a priority for each OPAC?).

Comment	File	Size	Author
#24	millennium-355602-24.patch	59.77 KB	janusman
#24	355602-mappings.png	44.24 KB	janusman
#24	355602-crawl.png	25.33 KB	janusman
#24	355602-authentication.png	11.9 KB	janusman
#24	355602-sources.png	33.3 KB	janusman
#23	millennium-355602-23.patch	60.96 KB	janusman
#20	millennium-355602-20.patch	53.21 KB	janusman
#19	millennium-355602-19.patch	50.54 KB	janusman
#17	millennium-355602-17.patch	31.56 KB	janusman
#16	Changes for multi-OPAC support 19mar2010.pdf	349.04 KB	janusman
#12	millennium-355602.patch	703 bytes	tituomin
#6	millennium-355602-6.patch	2.44 KB	janusman
#4	millennium-355602-4.patch	8.48 KB	janusman

Comments

Comment #1

janusman commented 13 April 2009 at 22:54

Probably "same record" would only be feasible by using LC or OCLC numbers in MARC.
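
A minimal sketch of that idea, assuming the OCLC and LC numbers have already been pulled out of each MARC record into a plain array. The function names and the 'oclc' / 'lccn' array keys are hypothetical, and the normalization is deliberately naive (real identifiers carry prefixes such as "ocm" or "(OCoLC)" that would need extra handling):

```php
<?php
/**
 * Hypothetical helper: derive a "same item" key from identifiers already
 * extracted from a MARC record. The keys 'oclc' and 'lccn' are assumptions
 * made for this sketch, not part of the module.
 */
function example_same_item_key(array $record) {
  // Prefer the OCLC number, then fall back to the LC control number.
  foreach (array('oclc', 'lccn') as $field) {
    if (!empty($record[$field])) {
      // Naive normalization: lowercase, keep only letters and digits.
      $value = preg_replace('/[^a-z0-9]/', '', strtolower($record[$field]));
      if ($value !== '') {
        return $field . ':' . $value;
      }
    }
  }
  // No usable identifier: the record cannot be matched to any other.
  return NULL;
}

/**
 * Groups records from several OPACs by their "same item" key.
 */
function example_group_records(array $records) {
  $groups = array();
  foreach ($records as $index => $record) {
    $key = example_same_item_key($record);
    // Unmatched records get a unique key so they are never merged.
    if ($key === NULL) {
      $key = 'unmatched:' . $index;
    }
    $groups[$key][] = $record;
  }
  return $groups;
}
```

Each resulting group would correspond to one node; which record in the group supplies the bibliographic data is still the open question from the issue summary.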

Comment #2

janusman commented 30 October 2009 at 22:26

As of today, the underlying code and database tables are almost finished to allow this.

TODO:
* reimporting items, checking availability, etc. are still not looking at the originating URL from {millennium_node_bib} and instead assume the current URL.
* it should be possible to configure different opacs in the settings page, and then just pick which to import from during manual import.
* "This is the same item" (a.k.a. FRBR?) algorithms. Perhaps not in the scope for this project, but could be an add-on module that could intervene during the import process?

Comment #3

tituomin commented 2 November 2009 at 13:51

Looks interesting! About FRBR-ish algorithms: I've come to the conclusion in my own project, that each FRBR entity type needs its own CCK-enabled node type. For example, I have a new node type for works, containing nodereferences to the different manifestations (which are actually the very nodes generated by Millennium Integration). I think this is a good solution: you can use different heuristics to derive works from manifestations, comparing titles, authors and other things (maybe even using WorldCat / LibraryThing APIs). The results so far are promising, not quite finished yet though. Same approach might work for this feature..

Comment #4

janusman commented 30 November 2009 at 21:49

Status	File	Size
new	millennium-355602-4.patch	8.48 KB

I'm committing this patch for now; mainly it includes an optional base_url argument in lots of functions so that the information is fetched from the WebOPAC the record was imported from instead of the current module's settings.
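
Roughly, the fallback works like the sketch below. This is not the patch itself: the function name and both parameter names are hypothetical, and the module-wide setting is passed in as a plain argument so the example runs on its own.

```php
<?php
/**
 * Hypothetical sketch: decide which WebOPAC to query for a record.
 *
 * $record_base_url stands for the URL stored with the record at import
 * time; $configured_base_url stands for the module-wide setting.
 */
function example_resolve_base_url($record_base_url, $configured_base_url) {
  $base_url = trim((string) $record_base_url);
  // Fall back to the module setting only when the record has no source.
  if ($base_url === '') {
    $base_url = trim((string) $configured_base_url);
  }
  // Drop any trailing slash so paths can be appended consistently.
  return rtrim($base_url, '/');
}
```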

Comment #5

janusman commented 30 November 2009 at 23:54

Status:

Active

» Needs review

Setting to needs review

Comment #6

janusman commented 1 December 2009 at 00:13

Status	File	Size
new	millennium-355602-6.patch	2.44 KB

Missed a few.

Comment #7

janusman commented 1 December 2009 at 00:22

Status:

Needs review

» Active

Committed last patch.

Need more testing.

Comment #8

janusman commented 1 December 2009 at 17:37

I think the only thing missing is to explicitly let the admin configure different OPACs on the settings screen.

Comment #9

janusman commented 1 December 2009 at 21:04

Uh, no; the mass import functions are missing quite a bit. For instance, refreshing records will default to the currently set WebOPAC instead of the source for that record.

Comment #10

janusman commented 21 January 2010 at 16:43

Working on an extensive patch that would get us a LOT closer to having each imported record "know" where it was imported from.

After that would be some way to have a single node pointing to the same item in its different locations.

Comment #11

janusman commented 25 January 2010 at 16:19

Mega-kitten-killer patch is in: http://drupal.org/cvs?commit=318164

This broke some things afterwards but I think the current DEV version is stable enough for testing; you do need to run update.php if you want to test.

Things pending:
* Expose the innards to a UI which lets admins specify settings for different OPACs, and also report back on status of each (e.g. total number of items from each OPAC, report fetch time independently, etc.)

I'm thinking I won't dare try to FRBR-ize stuff with some quick-but-badly-thought-out mechanism of my own... for now I guess providing hooks to let other modules sort out if things are equivalent at import-time would be a start. FRBR-izing also requires rethinking the DB architecture, figure out how the holdings information would be shown, etc. So, each bib record from each OPAC would still be a separate node for now.

Comment #12

tituomin commented 9 February 2010 at 13:44

Status	File	Size
new	millennium-355602.patch	703 bytes

I got some mild warnings from update_6003:

warning: array_merge() [function.array-merge]: Argument #2 is not an array in /var/www/musa/update.php on line 174.
warning: Invalid argument supplied for foreach() in /var/www/musa/update.php on line 338.

Patch included. I'll try to do some testing.

Comment #13

tituomin commented 26 February 2010 at 12:30

Status:

Active

» Needs review

Comment #14

janusman commented 26 February 2010 at 14:27

Status:

Needs review

» Active

Committed #13. Thanks!

Comment #15

janusman commented 10 March 2010 at 14:33

Ok, TODO:

* We need the user to enable one OPAC fairly quickly and enter additional ones (or just fill in the name of base URLs from items imported).

* I think we will only enable auto-crawling for ONE of the OPACs for now. In the future we could use a simple algorithm like round-robin on each cron run (OPAC 1 on cron run 1, OPAC 2 on cron run 2, OPAC 3 on cron run 3, back to OPAC 1 on run 4, etc.)

* The module should let admins map OPAC base urls to actual names of the libraries/catalogs containing them. Then the items could inherit that name too in taxonomy. Idea: Right now the "Mappings" settings tab lets one map to an "availability" vocabulary... maybe the library name could be the parent term for the availability. I think this also means moving this particular setting out of the mapping tab (which is global for all MARC->Taxonomy) and move it into the settings for each OPAC.

* The base url widget on all settings screens could now change to an autocomplete.

* The status report could benefit from splitting up reports for each OPAC. For now maybe it should just say the number of different OPACs and each line on data tables should mention the base URL or name.
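
The round-robin idea from the second item above could look something like this sketch. The function name and parameters are hypothetical; the caller would need to persist the returned index between cron runs (in Drupal 6 that could plausibly be done with variable_get() / variable_set(), but that wiring is an assumption and is left out here).

```php
<?php
/**
 * Hypothetical sketch of round-robin source selection across cron runs.
 *
 * @param array $base_urls
 *   List of configured OPAC base URLs.
 * @param int $last_index
 *   Index crawled on the previous run, or -1 if nothing was crawled yet.
 *
 * @return array
 *   array($next_index, $base_url), or array(-1, NULL) if no OPACs exist.
 */
function example_next_source(array $base_urls, $last_index) {
  // Reindex so positions are 0..n-1 regardless of the original keys.
  $base_urls = array_values($base_urls);
  $count = count($base_urls);
  if ($count === 0) {
    return array(-1, NULL);
  }
  // Guard against bogus stored values below -1.
  $last_index = max(-1, (int) $last_index);
  // Wrap around; this also copes with the list having shrunk since last run.
  $next = ($last_index + 1) % $count;
  return array($next, $base_urls[$next]);
}
```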

Comment #16

janusman commented 19 March 2010 at 17:02

Status	File	Size
new	Changes for multi-OPAC support 19mar2010.pdf	349.04 KB

Mockups for what this would look like...

Comment #17

janusman commented 19 March 2010 at 23:48

Status	File	Size
new	millennium-355602-17.patch	31.56 KB

Ok, part 1 of a big patch to make this work.

Done:
* Add/remove multiple OPACs and their names.
* Switched crawl settings to own tab; a select element lets one pick an OPAC from the source table.
* update.php code

After applying patch please empty caches. Run update.php to migrate the existing configured OPAC (and from all imported nodes) into the new source table.

Todo (to come, hopefully, in next patch)
* Taxonomy handling upon source add/remove.
* Taxonomy options
* Taxonomy mapping on node import/update.
* opac name display in holdings table (easy)
* Put back functionality (removed here) for millennium_filter called with only a record # (no base url) and also preview/import records one-by-one.

Buggy:
* AHAH in conjunction with autocomplete in Batch Import. Thinking of switching to just the select box instead of autocomplete text field.

Wishlist:
* Option to check "remove" in source table that will also delete nodes.
* Show number of imported nodes per source.

Comment #18

janusman commented 19 March 2010 at 23:49

Forgot to mention this also disables the millennium_auth module, as it needs some way to define a single OPAC to associate with logins.

Comment #19

janusman commented 22 March 2010 at 20:50

Status:

Active

» Needs review

Status	File	Size
new	millennium-355602-19.patch	50.54 KB

New patch. Now working:
- taxonomy terms for opac names
- rename taxonomy when opac name is renamed
- authentication (must set up using new "authentication" tab)

Yet to do:
* preview/import records one-by-one (needs to accept a full URL or a base URL along with a record number... or receive a record number and ask for a source OPAC before importing)
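
A rough sketch of how that input could be told apart. Everything here is an assumption for illustration: the function name, the "/record=<recnum>" URL pattern, and the one-letter-plus-digits record number format are not taken from the patch.

```php
<?php
/**
 * Hypothetical sketch: split user input into a base URL and record number.
 *
 * Returns array('base_url' => ..., 'recnum' => ...), or NULL if the input
 * is not recognized. A NULL 'base_url' means the caller still has to ask
 * which source OPAC to import from.
 */
function example_parse_record_input($input, $default_base_url = NULL) {
  $input = trim($input);
  // Case 1: a full URL that contains "/record=<recnum>" (assumed pattern).
  if (preg_match('@^(https?://[^/]+)/record=([a-z]\d+)@i', $input, $matches)) {
    return array('base_url' => $matches[1], 'recnum' => $matches[2]);
  }
  // Case 2: a bare record number; use the default source if one was given.
  if (preg_match('/^[a-z]\d+$/i', $input)) {
    return array('base_url' => $default_base_url, 'recnum' => $input);
  }
  return NULL;
}
```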

Wishlist:
* Option to check "remove" in source table that will also delete nodes.

Comment #20

janusman commented 23 March 2010 at 04:03

Status	File	Size
new	millennium-355602-20.patch	53.21 KB

Ok, final patch for review.

Only thing left would be the wishlist item, an option to check "remove" in source table that will also delete nodes. =) However, this belongs in another issue.

Comment #21

janusman commented 23 March 2010 at 04:15

This is a self-review =)

+++ contributions/modules/millennium/millennium.admin.inc Locally Modified (Based On 1.1.2.31)
@@ -56,7 +42,165 @@
+/**
+ * Submit handler for settings form; handles special values that are not
+ */
+function millennium_admin_settings_form_submit($form, &$form_state) {

Comment was truncated.

+++ contributions/modules/millennium/millennium.admin.inc Locally Modified (Based On 1.1.2.31)
@@ -56,7 +42,165 @@
+  #dpm($form_state);

Remove this debug code.

+++ contributions/modules/millennium/millennium.import.inc Locally Modified (Based On 1.1.2.20)
@@ -163,19 +158,13 @@
+function millennium_fetch_records_via_bookcart($recnums, $complete_holdings = false, $base_url) {

These args should also prolly be rearranged.

+++ contributions/modules/millennium/millennium.module Locally Modified (Based On 1.13.2.33.2.2.2.86)
@@ -647,12 +654,19 @@
+  $items['millennium_autocomplete_js'] = array(
+    'page callback' => 'millennium_autocomplete_js',
+    'type' => MENU_CALLBACK,
+    'access arguments' => array('administer millennium'),
+    'file' => 'millennium.pages.inc',
+  );

I think this is no longer needed.

+++ contributions/modules/millennium/millennium.module Locally Modified (Based On 1.13.2.33.2.2.2.86)
@@ -1563,7 +1576,7 @@
+function millennium_fetch_recordpage($recnum, $mode = "plain", $base_url) { // TODO Fix argument order

Fix argument order

+++ contributions/modules/millennium/millennium.module Locally Modified (Based On 1.13.2.33.2.2.2.86)
@@ -1590,15 +1603,7 @@
+function millennium_permalink($recnum, $mode = 'plain', $base_url) { // TODO fix argument order

Fix these arguments
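
The three signatures flagged above share one problem: a required $base_url follows optional parameters, so callers can never actually omit $mode (and PHP 8.0 and later emit a deprecation notice for this pattern). A minimal sketch of the reordering, using a hypothetical function name and an illustrative URL pattern rather than the module's real permalink format:

```php
<?php
/**
 * Hypothetical reordered signature: required arguments first, optional last.
 * The URL pattern below is illustrative only.
 */
function example_permalink($base_url, $recnum, $mode = 'plain') {
  $url = rtrim($base_url, '/') . '/record=' . rawurlencode($recnum);
  // Only append the mode when it differs from the default.
  if ($mode !== 'plain') {
    $url .= '?mode=' . rawurlencode($mode);
  }
  return $url;
}
```

With this order the default for $mode becomes usable again, at the cost of touching every existing call site.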

+++ contributions/modules/millennium/millennium.pages.inc Locally Modified (Based On 1.1.2.5)
@@ -313,11 +324,29 @@
+
+/**
+ * Callback function for base_url autocomplete form elements
+ */
+function millennium_autocomplete_js() {
+  $suggestions = array();
+  $search_parts = explode('/', trim($_GET['q']));
+  $search_string = implode('/', array_slice($search_parts, 1));
+  $sources = variable_get("millennium_sources", array());
+  // Look for $search_string in all sources
+  foreach ($sources as $base_url => $source_data) {
+    if (strpos($base_url, $search_string) === 0) {
+      $suggestions[$base_url] = $base_url;
+    }
+  }
+  drupal_json($suggestions);
+  exit;
+}

This is no longer needed I think?

+++ contributions/modules/millennium/millennium_auth.module Locally Modified (Based On 1.1.2.6)
@@ -303,11 +352,22 @@
+  $millennium_baseurl = variable_get('millennium_auth_default_base_url', '');
+  // Use HTTPs if settings indicate so. TODO make this automatic?
+  if (variable_get('millennium_auth_use_https', FALSE)) {
+    $millennium_baseurl = str_replace("http://", "https://", $millennium_baseurl);
+  }
+
   // Connect to Millennium and get the patron's information
-  $patroninfo_data = patroninfo_start_session(millennium_get_real_baseurl(), $username, $lastname, $pin);
+  $patroninfo_data = patroninfo_start_session($millennium_baseurl, $username, $lastname, $pin);

Change variable from $millennium_baseurl into just $base_url

Powered by Dreditor.

Comment #22

janusman commented 23 March 2010 at 04:15

Status:

Needs review

» Needs work

Comment #23

janusman commented 23 March 2010 at 22:38

Status:

Needs work

» Needs review

Status	File	Size
new	millennium-355602-23.patch	60.96 KB

This looks like it has a shot =)

Comment #24

janusman commented 25 March 2010 at 15:37

Status:

Needs review

» Fixed

Status	File	Size
new	355602-sources.png	33.3 KB
new	355602-authentication.png	11.9 KB
new	355602-crawl.png	25.33 KB
new	355602-mappings.png	44.24 KB
new	millennium-355602-24.patch	59.77 KB

Yay! Committed this patch: minimal changes from the one in #23.

#355602 by janusman, tituomin: Changed Allow importing from different OPACs.

See attached screenshots to see how the interface looks =)

Comment #25

8 April 2010 at 15:40

Status:

Fixed

» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.
