
Crawl or import from a URL with a list (ftlist or search result)

Project: Millennium OPAC Integration
Version: 6.x-2.x-dev
Component: Miscellaneous
Category: feature request
Priority: normal
Assigned: janusman
Status: closed (fixed)

Issue Summary

Some libraries have featured lists of items; it would be nice to be able to get the module to import/update items from those lists.

Also, just crawling from a search might be good too. For example: import the items returned by a search for "branch:branch123 mattype:mattypea", or by a plain keyword search.

It would also be nice to configure a maximum number of pages to crawl (or a maximum total number of items to import), since a search could potentially yield up to 32,000 items (the limit for ftlists is unknown).

Comments

#1

This would be a good feature. Importing items from an RSS feed would be useful too (a simple regexp would be able to extract bib-ids from an RSS feed, or from any other web source for that matter).
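As a rough illustration of the regexp idea, here is a minimal Python sketch (not the module's code, which is PHP). It assumes record numbers look like "b" followed by seven digits, as in the "b4751054" value quoted in a watchdog message later in this thread. The feed content and the opac.example.org host are made up for the example.

```python
import re

# Assumption: a record number is "b" plus exactly seven digits (e.g. b4751054).
BIB_RE = re.compile(r"\bb\d{7}\b")

def extract_bib_ids(text):
    """Return unique record numbers in order of first appearance."""
    seen = []
    for match in BIB_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen

# Hypothetical feed snippet; any text source would be handled the same way.
sample_feed = """
<item><link>http://opac.example.org/record=b4751054~S9</link></item>
<item><link>http://opac.example.org/record=b1234567~S9</link></item>
<item><link>http://opac.example.org/record=b4751054~S9</link></item>
"""

print(extract_bib_ids(sample_feed))  # ['b4751054', 'b1234567']
```

Because the pattern does not care about the surrounding markup, the same function would work on RSS, HTML, or plain text, at the cost of possible false positives on unrelated strings that happen to match.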

I'm not sure how well Millennium supports crawling from a search, though. At least, the record ids are not visible in the search results. Maybe using the cart somehow could enable this.

#2

On first look, search seems simple enough, too. The "Add to cart" button or checkbox contains the required bib number, so again, it's just a matter of a regexp. Finding the "Next page" link also seems [relatively] straightforward. =)
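The approach described here (pull bib numbers from the cart checkboxes, then follow the "Next page" link) could be sketched as below, including the page and item limits asked for in the issue summary. This is a Python sketch under stated assumptions, not the committed patch: the checkbox markup, the "Next" link format, and the `crawl`/`fetch` names are all hypothetical, and canned pages stand in for real HTTP responses.

```python
import re

# Hypothetical markup: a cart checkbox carrying the bib number, and a "Next" link.
CART_RE = re.compile(r'name="save"\s+value="(b\d{7})"')
NEXT_RE = re.compile(r'<a href="([^"]+)">Next</a>')

def crawl(start_url, fetch, max_pages=5, max_items=100):
    """Follow "next page" links, collecting bib numbers until a limit is hit."""
    found, url, pages = [], start_url, 0
    while url and pages < max_pages and len(found) < max_items:
        html = fetch(url)
        pages += 1
        for bib in CART_RE.findall(html):
            if bib not in found and len(found) < max_items:
                found.append(bib)
        nxt = NEXT_RE.search(html)
        url = nxt.group(1) if nxt else None
    return found

# Canned pages used instead of real requests, so the sketch runs offline.
PAGES = {
    "/search?page=1": '<input type="checkbox" name="save" value="b0000001">'
                      '<input type="checkbox" name="save" value="b0000002">'
                      '<a href="/search?page=2">Next</a>',
    "/search?page=2": '<input type="checkbox" name="save" value="b0000003">'
                      '<a href="/search?page=3">Next</a>',
    "/search?page=3": '<input type="checkbox" name="save" value="b0000004">',
}

print(crawl("/search?page=1", PAGES.__getitem__, max_pages=2))
# ['b0000001', 'b0000002', 'b0000003']
```

Passing the fetch function in as a parameter keeps the crawl logic testable without network access; in a real setting it would be replaced by whatever performs the HTTP request.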

Extra points: I'd love a bookmarklet to say, "import all items on the current page", or maybe "Import my current bookcart's contents" =)

#3

Status: active » needs review

A first patch to kick things off. Rough, a bit slow, but works in my testing =)

2010-02-17_224118.png

Attachment | Size | Status | Test result | Operations
millennium-619450-3.patch | 8.77 KB | Ignored: Check issue status. | None | None
2010-02-17_224118.png | 22.11 KB | Ignored: Check issue status. | None | None

#4

Assigned to: Anonymous » janusman
Status: needs review » active

Committed following patch.
TODO: import from just any given URL (which could include an FTList)

Attachment | Size | Status | Test result | Operations
millennium-619450-4.patch | 2.75 KB | Ignored: Check issue status. | None | None

#5

Status: active » needs work

This still has some issues with characters that are not UTF-8 encoded getting garbled during import.

For instance, if this record is in the import results:

http://sabio.library.arizona.edu/search~S9/?searchtype=X&searcharg=educa...

which is titled
Jovens lideranças comunitárias e direitos humanos.

this message appears in watchdog; note that "lideranças" is cut off:

Batch import error: no record number given in row: array ( 'id' => '32636', 'session' => '126688338312', 'data' => 'a:3:{s:10:"bib_recnum";s:8:"b4751054";s:5:"title";s:50:"Jovens lideran', )

#6

Status: needs work » active

Fixed this by centralizing all requests to a new function millennium_http_request() which is a proxy for drupal_http_request(); the new function converts to UTF-8 depending on the response's charset.
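For readers unfamiliar with the idea, the charset-dependent conversion could look roughly like the Python sketch below. It is not the code of millennium_http_request() (which is PHP and not shown in this thread). The `to_utf8` name, the regexp, and the ISO-8859-1 fallback are assumptions made for illustration.

```python
import re

def to_utf8(body: bytes, content_type: str, default: str = "iso-8859-1") -> bytes:
    """Re-encode a response body as UTF-8, using the charset named in Content-Type."""
    match = re.search(r'charset="?([\w.:-]+)', content_type, re.IGNORECASE)
    charset = match.group(1) if match else default  # assumed fallback charset
    try:
        text = body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: fall back rather than fail the import.
        text = body.decode(default, errors="replace")
    return text.encode("utf-8")

# The title from comment #5, as a Latin-1 server might send it.
latin1_body = "Jovens lideranças comunitárias".encode("iso-8859-1")
fixed = to_utf8(latin1_body, "text/html; charset=ISO-8859-1")
print(fixed.decode("utf-8"))  # Jovens lideranças comunitárias
```

Doing the conversion in one place means every caller receives UTF-8, so downstream code that stores or serializes titles no longer has to guess the encoding.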

TODO: Add UI to allow admins to import from an arbitrary URL.

#7

Committed this patch.

This is a screenshot for the UI.

Attachment | Size | Status | Test result | Operations
millennium-619450-7.patch | 7.21 KB | Ignored: Check issue status. | None | None
2010-02-23_095820.png | 47.15 KB | Ignored: Check issue status. | None | None

#8

Status: active » fixed

#9

Status: fixed » closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.
