This would let Drupal serve as a central search index for different OPACs.
Bonus points: The "same item" held by different libraries would occupy a single node (instead of one node per OPAC). The user would then see the holdings from all crawled OPACs within that one node. We would have to determine how to detect the "same item" and which MARC record prevails (the most recently changed? a priority for each OPAC?).
| Comment | File | Size | Author |
|---|---|---|---|
| #24 | millennium-355602-24.patch | 59.77 KB | janusman |
| #24 | 355602-mappings.png | 44.24 KB | janusman |
| #24 | 355602-crawl.png | 25.33 KB | janusman |
| #24 | 355602-authentication.png | 11.9 KB | janusman |
| #24 | 355602-sources.png | 33.3 KB | janusman |
Comments
Comment #1
janusman commented
Detecting the "same record" would probably only be feasible by using the LC or OCLC numbers in MARC.
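As a rough illustration of that idea (not code from the module), a match key could be derived from the OCLC number in MARC 035 $a or, failing that, the LC control number in 010 $a. The `match_key` function and the simplified dict-based record structure are assumptions made for this sketch.

```python
import re


def match_key(record):
    """Return a cross-catalog key for a record, or None if it has neither number.

    `record` is a hypothetical simplified structure:
    {tag: [{subfield_code: value}, ...]}.
    """
    # Prefer the OCLC number: MARC 035 $a, conventionally prefixed "(OCoLC)".
    for field in record.get("035", []):
        m = re.match(r"\(OCoLC\)\D*0*(\d+)", field.get("a", ""))
        if m:
            return "oclc:" + m.group(1)
    # Fall back to the LC control number: MARC 010 $a, whitespace removed.
    for field in record.get("010", []):
        value = "".join(field.get("a", "").split())
        if value:
            return "lccn:" + value.lower()
    return None
```

Two records imported from different OPACs would be candidates for the "same item" when their keys are equal; records with neither number get `None` and would stay separate.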
Comment #2
janusman commented
As of today, the underlying code and database tables are almost finished to allow this.
TODO:
* reimporting items, checking availability, etc. are still not looking at the originating URL from {millennium_node_bib} and instead assume the current URL.
* it should be possible to configure different opacs in the settings page, and then just pick which to import from during manual import.
* "This is the same item" (a.k.a. FRBR?) algorithms. Perhaps not in the scope for this project, but could be an add-on module that could intervene during the import process?
Comment #3
tituomin commented
Looks interesting! About FRBR-ish algorithms: in my own project I've come to the conclusion that each FRBR entity type needs its own CCK-enabled node type. For example, I have a new node type for works, containing nodereferences to the different manifestations (which are actually the very nodes generated by Millennium Integration). I think this is a good solution: you can use different heuristics to derive works from manifestations, comparing titles, authors and other things (maybe even using WorldCat / LibraryThing APIs). The results so far are promising, though not quite finished yet. The same approach might work for this feature.
Comment #4
janusman commented
I'm committing this patch for now; mainly it adds an optional base_url argument to lots of functions so that information is fetched from the WebOPAC the record was imported from instead of the one in the module's current settings.
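The pattern described here, an optional base URL that falls back to the configured one, can be sketched as follows. This is an illustration only: `record_url`, `DEFAULT_BASE_URL`, the example hosts, and the `/record=` URL form are assumptions, not the module's actual code.

```python
# Stands in for the module-wide WebOPAC setting (hypothetical value).
DEFAULT_BASE_URL = "http://opac.example.edu"


def record_url(recnum, base_url=None):
    """Build a record URL, preferring the OPAC the record was imported from."""
    # Use the record's own source when given; otherwise fall back to the setting.
    base = (base_url or DEFAULT_BASE_URL).rstrip("/")
    return f"{base}/record={recnum}"
```

Callers that know a record's source pass it explicitly; older callers that pass nothing keep their previous behavior.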
Comment #5
janusman commented
Setting to "needs review".
Comment #6
janusman commented
Missed a few.
Comment #7
janusman commented
Committed last patch.
Need more testing.
Comment #8
janusman commented
I think the only thing missing is to explicitly let the admin configure different OPACs on the settings screen.
Comment #9
janusman commented
Uh, no; the mass import functions are missing quite a bit. For instance, refreshing records will default to the currently set WebOPAC instead of the source for that record.
Comment #10
janusman commented
Working on an extensive patch that would get us a LOT closer to having each imported record "know" where it was imported from.
After that, the next step would be some way to have a single node pointing to the same item in its different locations.
Comment #11
janusman commented
Mega-kitten-killer patch is in: http://drupal.org/cvs?commit=318164
This broke some things afterwards but I think the current DEV version is stable enough for testing; you do need to run update.php if you want to test.
Things pending:
* Expose the innards to a UI which lets admins specify settings for different OPACs, and also report back on status of each (e.g. total number of items from each OPAC, report fetch time independently, etc.)
I'm thinking I won't dare try to FRBR-ize stuff with some quick-but-badly-thought-out mechanism of my own... for now I guess providing hooks to let other modules sort out whether things are equivalent at import time would be a start. FRBR-izing also requires rethinking the DB architecture, figuring out how the holdings information would be shown, etc. So, each bib record from each OPAC would still be a separate node for now.
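To make the hook idea concrete, here is a language-neutral sketch in Python (the module itself would use Drupal's hook system). All names here, including `register_equivalence_hook` and `find_equivalent`, are hypothetical.

```python
# Callbacks contributed by add-on modules; each takes (new_record, existing_record)
# and returns True when it considers the two to be the same item.
_equivalence_hooks = []


def register_equivalence_hook(func):
    """Register a callback that decides whether two records are equivalent."""
    _equivalence_hooks.append(func)
    return func


def find_equivalent(new_record, existing_records):
    """Return the first existing record any hook judges equivalent, else None."""
    for existing in existing_records:
        if any(hook(new_record, existing) for hook in _equivalence_hooks):
            return existing
    return None
```

With no hooks registered, `find_equivalent` always returns `None`, which matches the current behavior of one node per bib record per OPAC.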
Comment #12
tituomin commented
I got some mild warnings from update_6003:
Patch included. I'll try to do some testing.
Comment #13
tituomin commented
Comment #14
janusman commented
Committed #13. Thanks!
Comment #15
janusman commented
Ok, TODO:
* We need the user to enable one OPAC fairly quickly and enter additional ones (or just fill in the names for base URLs of items already imported).
* I think we will only enable auto-crawling for ONE of the OPACs for now. In the future we could do simple algorithm like round-robin on each cron run (opac 1 on cron run 1, opac 2 on cron run 2, opac 3 on cron run 3, back to opac 1 on run #4, etc)
* The module should let admins map OPAC base urls to actual names of the libraries/catalogs containing them. Then the items could inherit that name too in taxonomy. Idea: Right now the "Mappings" settings tab lets one map to an "availability" vocabulary... maybe the library name could be the parent term for the availability. I think this also means moving this particular setting out of the mapping tab (which is global for all MARC->Taxonomy) and move it into the settings for each OPAC.
* The base url widget on all settings screens could now change to an autocomplete.
* The status report could benefit from splitting up reports for each OPAC. For now maybe it should just say the number of different OPACs and each line on data tables should mention the base URL or name.
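The round-robin idea from the list above can be sketched as follows. This is only an illustration; `next_opac` and the way the last index is persisted between cron runs are assumptions.

```python
def next_opac(opacs, last_index):
    """Pick the OPAC to crawl on this cron run.

    `last_index` is the index crawled on the previous run (-1 before the
    first run); the caller is expected to store the returned index.
    """
    if not opacs:
        return None, last_index
    # Advance by one and wrap around to the first OPAC after the last one.
    index = (last_index + 1) % len(opacs)
    return opacs[index], index
```

With three configured OPACs, four consecutive cron runs would crawl the first, second, third, and then the first again.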
Comment #16
janusman commented
Mockups for what this would look like...
Comment #17
janusman commented
Ok, part 1 of a big patch to make this work.
Done:
* Add/remove multiple OPACs and their names.
* Switched crawl settings to own tab; a select element lets one pick an OPAC from the source table.
* update.php code
After applying the patch, please empty caches. Run update.php to migrate the existing configured OPAC (and the OPACs of all imported nodes) into the new source table.
Todo (to come, hopefully, in next patch):
* Taxonomy handling upon source add/remove.
* Taxonomy options
* Taxonomy mapping on node import/update.
* opac name display in holdings table (easy)
* Put back functionality (removed here) for millennium_filter called with only a record # (no base url) and also preview/import records one-by-one.
Buggy:
* AHAH in conjunction with autocomplete in Batch Import. Thinking of switching to just the select box instead of autocomplete text field.
Wishlist:
* Option to check "remove" in source table that will also delete nodes.
* Show number of imported nodes per source.
Comment #18
janusman commented
Forgot to mention this also disables the millennium_auth module, as it needs some way to define a single OPAC to associate with logins.
Comment #19
janusman commented
New patch. Now working:
- taxonomy terms for opac names
- rename taxonomy when opac name is renamed
- authentication (must set up using new "authentication" tab)
Yet to do:
* preview/import records one-by-one (needs to accept a full URL or a base URL along with a record number... or receive a record number and ask for a source OPAC before importing)
Wishlist:
* Option to check "remove" in source table that will also delete nodes.
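For the one-by-one import item above, the input handling could look roughly like this. It is a sketch under assumptions: `parse_record_reference` is a hypothetical name, and it assumes record URLs of the form `/record=b1234567` and bib record numbers made of "b" followed by digits.

```python
import re
from urllib.parse import urlsplit

RECORD_RE = re.compile(r"record=(b\d+)")  # assumed WebOPAC record URL form


def parse_record_reference(text, default_base_url=None):
    """Return (base_url, record_number) from a full URL or a bare record number.

    base_url is None when only a record number was given and no default
    exists, meaning the caller still has to ask for a source OPAC.
    """
    text = text.strip()
    parts = urlsplit(text)
    if parts.scheme in ("http", "https") and parts.netloc:
        m = RECORD_RE.search(parts.path)
        if not m:
            raise ValueError("no record number found in URL")
        return f"{parts.scheme}://{parts.netloc}", m.group(1)
    if re.fullmatch(r"b\d+", text):
        return default_base_url, text
    raise ValueError("expected a record URL or a record number")
```

A `None` base URL in the result is the signal to prompt for a source OPAC before importing.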
Comment #20
janusman commented
Ok, final patch for review.
The only thing left would be the wishlist item: an option to check "remove" in the source table that will also delete nodes. =) However, this belongs in another issue.
Comment #21
janusman commented
This is a self-review =)
Comment was truncated.
Remove this debug code.
These args should also prolly be rearranged.
I think this is no longer needed.
Fix argument order
Fix the arguments
This is no longer needed I think?
Change variable from $millennium_baseurl into just $base_url
Powered by Dreditor.
Comment #22
janusman commented
Comment #23
janusman commented
This looks like it has a shot =)
Comment #24
janusman commented
Yay! Committed this patch, with minimal changes from the one in #23.
#355602 by janusman, tituomin: Changed Allow importing from different OPACs.
See attached screenshots to see how the interface looks =)