The mass cart import failed for several items. The comparison logic seems a bit fragile, like the comments said. I caught two bugs in _millennium_only_letters: it didn't work for international letters (switched to mb_ereg_...) and the MARC indicators should be replaced with a space (at least here).
| Comment | File | Size | Author |
|---|---|---|---|
| #10 | millennium-726580-10.patch | 10.52 KB | janusman |
| #9 | millennium-726580-9.patch | 9.92 KB | janusman |
| #8 | millennium-726580-8.patch | 9.91 KB | janusman |
| #6 | millennium-726580-6.patch | 6.25 KB | janusman |
| #3 | millennium-726580-3.patch | 3.9 KB | janusman |
Comments
Comment #1
tituomin commentedComment #2
janusman commentedBecause it's difficult to really know what PCRE support admins will have compiled into their PHP instance (see http://drupal.org/node/315486#comment-1117887) I'm hesitant to place this code in. I think the correct way it to use what search.module does: use a constant that defines "non-word" characters instead of using \W.
So, I prefer this new patch... needs testing.
Powered by Dreditor.
Comment #3
janusman commentedI can't seem to find good subjects to test this with!
I *did* find ONE record that did import better with your patch than with no patch or my patch from #2 in b1112000-b1112999 @www.helmet.fi ...
I rewrote my patch to the one below and it seems to fix that one missing record.
I would welcome more record numbers where you found your patch solved the problem, to actually do a good comparison and testing =)
Comment #4
tituomin commentedTry
For me they all fail -- with your patch they work!That one fails because the title is all non-ASCII.
Comment #5
tituomin commentedAha! Curiously, because the first one fails, the rest fail too.. If you remove the first from the list, the rest are ok. I made a new issue & patch for this bug: #729016: Title mismatch causes whole block to be set as not found. As for this issue, I think your patch is ok! Actually, I realized you don't even need to replace the subfield indicators with spaces, since it's just an ad hoc test and makes the same transformation for both operands in the test..
Comment #6
janusman commentedOk, after a LOT of testing, this is what I came up with:
1) Settled with mb_ereg_replace instead of preg_replace for _millennium_only_letters() ... as for some reason (in 2 different test environments) trying to use preg_replace in some instances choked with some MARC records containing som UTF-8 characters in high ranges; mb_ereg_replace() worked perfectly. I added (in another commit) a dependency on that function in order to use the module (hoping it won't be too much of a requisite)
2) I did some fine-grained testing to check if what we're getting from millennium_fetch_records_via_bookcart() can be trusted or not (see new function _millennium_confirm_found_not_found()) by checking, one by one, each found and not-found record. The result?
www.helmet.fi, crawl of 1113600 - 1113799Returns false positives for these 4 records
adds all those items to the book cart EXCEPT the last one (b887257); if the cart is cleared and only that one is added:
it seems to work.
SO, this patch makes things better, but it still needs to fix the false positives... it shouldn't be happening at all in the example above because we're supposed to have run out of MARC-bookcart title matches before we hit those positives.
Comment #7
janusman commentedBy the previous comment, I didn't make clear that I think that trying to fix false negatives is a lost battle--it might depend totally on the "mood" of the WebOPAC =) That or I'm just not looking well enough (probably!)
BTW I think I have the false positives fixed; will post an updated patch soon.
Comment #8
janusman commentedI think I have nailed it =)
This patch:
* Reduces cookie lifetime to 5 minutes (seemed to cause problems in some OPACs)
* Reduces items-at-a-time to add to bookcart to 20, total bookcart size is max 40 before emptied
* Eliminates false positives:
Still using mb_ereg_replace() for millennium_only_letters():
Comment #9
janusman commentedThis patch just makes a small change from the previous one, changing
to
(since non-english OPACs would not show their translation of "No such record" and the test would fail).
Comment #10
janusman commentedEnough testing! Committing this to stabilize things.