While indexing my document base, I stumbled upon a document with a control character (8) in its path (don't ask...).
I had to modify the apachesolr_node_to_document() function as shown in the code below. Sorry for not providing a patch: I'm not currently in sync with the *latest* CVS (though I always keep an eye on it), and I need to test Solr 1.4 first.
...
if ($output && $output != $path) {
  $document->path = $output;
}
...
became
...
if ($output && $output != $path) {
  $document->path = apachesolr_strip_ctl_chars($output);
}
...
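For readers unfamiliar with the helper, here is a minimal sketch of what a control-character stripper of this kind could look like. The function name example_strip_ctl_chars() and the sample path are hypothetical and only for illustration; the sketch assumes the goal is to replace ASCII control characters other than tab, line feed and carriage return (which XML 1.0 does not allow) with a space, and it may differ from what the module's own function does.

```php
<?php
// Hypothetical helper for illustration only; not the module's actual code.
function example_strip_ctl_chars($text) {
  // Replace ASCII control characters below 0x20 with a space, leaving
  // tab (0x09), line feed (0x0A) and carriage return (0x0D) untouched.
  return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', ' ', $text);
}

// Made-up path containing a backspace character (char 8).
$path = "forum/topic\x08name";
$clean = example_strip_ctl_chars($path);
// $clean is now "forum/topic name".
```

Replacing with a space rather than deleting the character keeps adjacent words from being fused together, which matters more for body text than for paths.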
| Comment | File | Size | Author |
|---|---|---|---|
| #1 | clean-path-360227-1.patch | 683 bytes | pwolanin |
Comments
Comment #1
pwolanin commented
Seems like a reasonable change. Ideally we might actually do this in the underlying PHP library for all fields added to a document.
Please check this patch.
Comment #2
flexer commented
The patch is OK.
You're right... Anyway, I'm indexing 700,000 documents (comments from a very old phpBB2 forum) and I got this problem for ONE document only.
BTW, I wrote a custom apachesolr_node_to_document() to index every comment as a single Solr document, using an isfield as the $cid... it works very nicely :)
Comment #3
flexer commented
Comment #4
pwolanin commented
Committed to 6.x
Comment #5
pwolanin commented