Integrate with other external data sources (WorldCat, LibraryThing, Open Library)
janusman - January 14, 2009 - 15:26
| Project: | Millennium Integration |
| Version: | 6.x-2.x-dev |
| Component: | Miscellaneous |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
| Issue tags: | RDF |
Description
Amazon should probably be ruled out because of its API licensing (uses should "mainly" aim to redirect users to Amazon).
As the author of this article mentions (http://journal.code4lib.org/articles/105 ), Open Biblio is probably the most open and the least worrisome legally; I don't know about its coverage, growth, dependability, etc.

#1
Also see code from:
http://drupal.org/project/bookpost
which is based on:
http://wordpress.org/extend/plugins/openbook-book-data/
#2
This article http://arxiv.org/ftp/arxiv/papers/0805/0805.2855.pdf mentions other "external datasets" (useful for Geographic tags?)
Also check out http://inkdroid.org/journal/2008/01/04/following-your-nose-to-the-web-of...
#3
Would this help?
guessing publisher from ISBN prefix
http://worldcat.org/devnet/blog/2009/01/guessing_publisher_from_isbn_p.html
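A rough sketch of what guessing a publisher from an ISBN prefix could look like. This is not the method from the linked post; the function names and the tiny prefix table are made up for illustration, and a real table would have to come from the ISBN agency ranges or harvested catalogue data.

```python
# Illustrative table only: ISBN-10 "group + registrant" digits -> publisher.
KNOWN_PREFIXES = {
    "014": "Penguin",
    "019": "Oxford University Press",
    "0262": "MIT Press",
}


def normalize_isbn(isbn):
    """Strip hyphens and spaces, and drop the 978 prefix of an ISBN-13."""
    digits = "".join(ch for ch in isbn if ch.isdigit() or ch in "xX")
    if len(digits) == 13 and digits.startswith("978"):
        digits = digits[3:]
    return digits


def guess_publisher(isbn, table=KNOWN_PREFIXES):
    """Return the publisher for the longest matching prefix, or None."""
    digits = normalize_isbn(isbn)
    # Try the longest candidate prefix first so more specific entries win.
    for length in range(min(7, len(digits)), 1, -1):
        publisher = table.get(digits[:length])
        if publisher:
            return publisher
    return None
```

Something like `guess_publisher("978-0-19-853453-1")` would then fall through to the "019" entry. Unknown prefixes return None, so the caller can keep whatever publisher the record already has.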
#4
See this:
Open Library embeddable Book Reader
http://openlibrary.org/dev/docs/bookreader
And more:
http://wiki.code4lib.org/index.php/OSBW_Existing_Software
#5
Perhaps this is all just a subset of linked data? In that case, any useful information about the item (current prices, ratings at amazon or others), the authors (biographies, pictures, etc.), the publisher (homepage, addresses?), people who own/read it, libraries that hold it, etc... would all be targets for this issue. =)
#6
Another data source: the Insight web service from Random House.
To get direct links to cover images, TOCs and sample pages, see: http://www.randomhouse.biz/webservices/insight/spec.php#G
To embed a widget with a book, see: http://www.randomhouse.biz/webservices/insight/widget/userguide
#7
For now I went ahead and committed to 6-DEV an embedded Google Books widget. Probably needs some work though.
#8
I think this should be more modular. I plan on having additional modules that tie into the drupal_alter() hooks and on adding other hooks (e.g. alter the biblio data table, alter the holdings table, etc.), so that those other modules would be in charge of adding more information.
This would mean moving the code that adds the Library of Congress information and the Google Book Search link and widget into other modules.
It would also open up the possibility of these new modules not depending on millennium.module at all, since Google Books and the others just need an ISBN or similar information to work and are not tied to Millennium. For instance, these modules could work with information from the Biblio module, with a CCK field deemed to hold an identifier of some sort, or with millennium.module's stored biblio array.
We could then have modules at different steps:
* Enrichment during import. Say your III record is missing the number of pages; it could be fetched from another source and added to the record. Or say you want to import the Table of Contents from LOC (like we do now).
* Enrichment during viewing (adding online fulltext viewers, adding links, etc.), kind of like WebBridge (or whatever it's called these days?) does on III.
How does this sound?
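To make the two steps concrete, here is a minimal, language-neutral sketch of the idea (in Python rather than Drupal PHP). Everything in it is hypothetical: the phase names, the `enricher` decorator, and the sample data are made up, and in Drupal the registration would go through hooks instead.

```python
from collections import defaultdict

# Hypothetical phase names mirroring the two steps listed above.
IMPORT, VIEW = "import", "view"

_enrichers = defaultdict(list)

# Made-up sample data standing in for an external web service.
SAMPLE_PAGE_COUNTS = {"0000000000": 123}


def enricher(phase):
    """Decorator that registers a function as an enricher for one phase."""
    def register(func):
        _enrichers[phase].append(func)
        return func
    return register


def enrich(phase, record):
    """Run every enricher registered for the phase on the record."""
    for func in _enrichers[phase]:
        func(record)
    return record


@enricher(IMPORT)
def fill_missing_pages(record):
    # Only needs an ISBN, so it does not depend on where the record came from.
    pages = SAMPLE_PAGE_COUNTS.get(record.get("isbn"))
    if "pages" not in record and pages is not None:
        record["pages"] = pages


@enricher(VIEW)
def add_preview_link(record):
    if record.get("isbn"):
        record.setdefault("links", []).append("preview:" + record["isbn"])
```

Calling `enrich(IMPORT, record)` while importing and `enrich(VIEW, record)` while rendering keeps the two steps separate, and each enricher only looks at the identifier it needs.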
#9
I'm in the middle of coding a new component with this same idea. I can send a preliminary version later. I'm currently using it for cover images, but will also be using it for Wikipedia, Google Books, Millennium availability data, etc. (Anything, really.)
Features (currently implemented):
Supports a flexible model to deal with different types of external data differently.
Features (coming up):
I'll try to give a preview as soon as possible.
#10
Awesome!! Please share when you have something/anything =)
#11
Just another note:
LibraryThing API:
http://www.librarything.com/services/librarything.ck.getwork.php
I'm thinking this is the *most* useful; I think a module could ask for the API key and enforce an upper limit on calls per day to stay within the terms of service.
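A minimal sketch of the calls-per-day idea, assuming a simple counter that resets when the date changes. The class name is hypothetical and the limit is whatever the terms of service allow; a real module would persist the counter (e.g. in the Drupal variables table) instead of keeping it in memory.

```python
import datetime


class DailyQuota:
    """Counts calls per calendar day and refuses calls past the limit."""

    def __init__(self, limit, today=datetime.date.today):
        self.limit = limit
        self._today = today      # injectable clock, handy for testing
        self._day = today()
        self._used = 0

    def try_acquire(self):
        """Return True and count the call if today's quota allows it."""
        day = self._today()
        if day != self._day:
            # A new day has started: reset the counter.
            self._day, self._used = day, 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True
```

The module would call `try_acquire()` before each API request and skip the enrichment (or fall back to cached data) when it returns False.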
#12
See latest commit for the beginnings of (very basic, humble, horribly unscalable) configurable enrichment options for the module =)
http://drupal.org/cvs?commit=297112
BTW, yes, I killed, like, 4 kittens with that commit.
#13
Re: #10. Here is a preliminary version of the server-side logic. The design is probably not as good as it could be yet; I'll refactor it when I add the caching logic and other optimizations.
By the way, for now you should install this in a directory "bibmash" right under the Drupal root. (There is probably a better place.)
There is no client-side JavaScript code yet, but you can see the JSON output in your browser if you go to http://site/bibmash/bibmash.php?id=*valid_millennium_integration_node_id*. LastFM cover images are really the only concrete feature here. (I'm making a music site.)
Here are my basic design choices:
ajax requests, and loading the whole Drupal environment isn't very wise in this case.
About the files:
The class model is this:
(urls for coverimages). A metadata object knows how to present itself.
I hope all this isn't overkill. I will try to keep it simple, and I think the object-oriented model fits this use quite well because there is a clear need for different types of data and different sources, and inheritance keeps it extensible.
Any comments? =)
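For readers without the attachment, here is a rough guess at the shape of such a class model, written in Python only for illustration. The class and field names are assumptions, not the names used in bibmash: a base metadata class that knows how to present itself, and a cover image subclass that overrides the presentation.

```python
import json


class Metadata:
    """Base type: one piece of external data from one source."""

    kind = "generic"

    def __init__(self, source, value):
        self.source = source
        self.value = value

    def present(self):
        """Return a plain dict describing this item."""
        return {"kind": self.kind, "source": self.source, "value": self.value}


class CoverImage(Metadata):
    """A cover image presents its value as a URL."""

    kind = "coverimage"

    def present(self):
        data = super().present()
        data["url"] = data.pop("value")
        return data


def to_json(items):
    """Serialize a list of metadata objects for an AJAX response."""
    return json.dumps([item.present() for item in items])
```

New kinds of data would then be new subclasses, and the code that builds the JSON response would not need to change.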
#14
Sorry, I had some trouble with the automatic insertion of the license notice at the top of each source file. So *here* is a working version.
#15
For examples of cover image handling, see:
http://dobby.darienlibrary.org/websvn/listing.php?repname=locum&path=%2F...
http://github.com/eby/sopac2-contrib/tree/master/covercache/
http://cheerfulcurmudgeon.com/2008/08/11/caching-free-librarything-book-...
#16
This patch (committed) adds a cover image "finder" to the "enrichment" module.
It doesn't download the full images. Instead, it fetches only a portion of each file from a variety of services and uses that fragment to determine whether the image is usable (by checking the image width).
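As an illustration of the width check (not the committed code), here is a sketch that reads the width out of the first bytes of a file. It is written in Python and only handles PNG and GIF, whose headers store the width at fixed offsets; JPEG would need a scan for the frame marker. The function names and the `min_width` threshold are assumptions.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def image_width(head):
    """Return the pixel width from the first bytes of a PNG or GIF, else None."""
    # PNG: 8-byte signature, 4-byte chunk length, "IHDR", then big-endian width.
    if head[:8] == PNG_SIGNATURE and len(head) >= 24 and head[12:16] == b"IHDR":
        return struct.unpack(">I", head[16:20])[0]
    # GIF: 6-byte header followed by a little-endian 16-bit width.
    if head[:6] in (b"GIF87a", b"GIF89a") and len(head) >= 10:
        return struct.unpack("<H", head[6:8])[0]
    return None


def is_usable_cover(head, min_width=50):
    """Treat an image as usable only if its width is known and large enough."""
    width = image_width(head)
    return width is not None and width >= min_width
```

The first few dozen bytes of a response are enough for both formats, so a ranged or truncated request would do.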
Next would probably be a simple caching mechanism and some way for the user to configure this.