i get lots of

Undefined index: EM in /var/www/glw/drupal/sites/all/modules/apachesolr/apachesolr.index.inc on line 236.

warnings when indexing some crappy old html.

seems that the regex to get the tags is case insensitive, but the code assumes that the tags are always lower-cased, causing the warnings.

attached patch fixes that.
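
For illustration, here is a minimal sketch of the mismatch and one way to avoid it. It is not the module's code: the regex, the two-entry map and the helper name collect_tag_text() are assumptions made up for this example. The only point is that the captured tag name is lower-cased before it is used as an array key.

```php
<?php
// Hypothetical subset of a tag => field map; the keys are lower-case.
$tags_to_index = array(
  'h1' => 'tags_h1',
  'em' => 'tags_inline',
);

/**
 * Illustrative helper (not the module's function): collects text per field.
 */
function collect_tag_text($html, array $tags_to_index) {
  $fields = array();
  // The i flag lets the pattern match <EM> as well as <em>.
  preg_match_all('@<(h1|em)[^>]*>(.*?)</\1>@is', $html, $matches, PREG_SET_ORDER);
  foreach ($matches as $match) {
    // Normalize the captured tag name before using it as an array key.
    // Without this, 'EM' is looked up in a map that only has 'em'.
    $tag = strtolower($match[1]);
    if (!isset($tags_to_index[$tag])) {
      continue;
    }
    $field = $tags_to_index[$tag];
    if (!isset($fields[$field])) {
      $fields[$field] = '';
    }
    $fields[$field] .= ' ' . trim(strip_tags($match[2]));
  }
  return $fields;
}

$result = collect_tag_text('<EM>old</EM> <h1>Title</h1>', $tags_to_index);
```

With the strtolower() call in place, the upper-case EM ends up in the same field as em instead of triggering an undefined index notice.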

Comments

Anonymous’s picture

Status: Active » Needs review
Anonymous’s picture

also, just noting something else i don't have time to create a patch for yet. in apachesolr_add_tags_to_document() and related functions, code does this sort of thing:

        $document->{'ts_vid_'. $ancestor->vid .'_names'} .= ' '. $name;

this causes all sorts of errors in my logs, for example:

Undefined index: ts_vid_2_names in /var/www/glw/drupal/sites/all/modules/apachesolr/SolrPhpClient/Apache/Solr/Document.php on line 322.

the code seems to assume the field was already set elsewhere, which would explain the append? there are more cases like it. i've suppressed the errors with this in Apache_Solr_Document:

    public function __get($key)
    {
        if (!isset($this->_fields[$key])) {
          return NULL;
        }
        return $this->_fields[$key];
    }

but this seems to be a bug in the drupal module code...
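
A rough sketch of what a caller-side fix could look like, assuming the document class exposes its fields through magic methods. SketchDocument and append_name() are hypothetical stand-ins written for this example, not the real Apache_Solr_Document or any function in the module.

```php
<?php
// Stand-in for a document class with magic field access (an assumption,
// not the real Apache_Solr_Document).
class SketchDocument {
  protected $_fields = array();

  public function __get($key) {
    // Reading an unset key here is what produces the notice.
    return $this->_fields[$key];
  }

  public function __set($key, $value) {
    $this->_fields[$key] = $value;
  }

  public function __isset($key) {
    return isset($this->_fields[$key]);
  }
}

// Hypothetical helper: initialize the field before appending to it,
// so the .= operator never reads an undefined index.
function append_name($document, $field, $name) {
  if (!isset($document->{$field})) {
    $document->{$field} = '';
  }
  $document->{$field} .= ' ' . $name;
}

$document = new SketchDocument();
append_name($document, 'ts_vid_2_names', 'fruit');
append_name($document, 'ts_vid_2_names', 'apple');
```

This keeps the leading space that the original append produces and leaves __get() untouched, so a genuinely missing field still shows up as a notice elsewhere.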

pwolanin’s picture

patch looks reasonable

robertdouglass’s picture

Status: Needs review » Patch (to be ported)
Attached file: new, 1013 bytes

Even old crappy HTML can be UTF8, right? Using drupal_strtolower() and applying. Thanks!

robertdouglass’s picture

Attached file: new, 1.03 KB

The lowercasing call is in the wrong place. This is the one I'm committing to 6.2.

robertdouglass’s picture

#763072 by robertDouglass, justinrandell | pwolanin: Fixed warnings when indexing old, crappy html.

pwolanin’s picture

needs to be ported to which branch?

robertdouglass’s picture

Version: 6.x-2.x-dev » 6.x-1.x-dev

Sorry - to 6.1.

pwolanin’s picture

Status: Patch (to be ported) » Needs work

No, we never have multi-byte tags. I think the original patch was correct.

  $tags_to_index = variable_get('apachesolr_tags_to_index', array(
    'h1' => 'tags_h1',
    'h2' => 'tags_h2_h3',
    'h3' => 'tags_h2_h3',
    'h4' => 'tags_h4_h5_h6',
    'h5' => 'tags_h4_h5_h6',
    'h6' => 'tags_h4_h5_h6',
    'u' => 'tags_inline',
    'b' => 'tags_inline',
    'i' => 'tags_inline',
    'strong' => 'tags_inline',
    'em' => 'tags_inline',
    'a' => 'tags_a'
  ));

The array keys here are what we are matching.

pwolanin’s picture

Version: 6.x-1.x-dev » 6.x-2.x-dev

.

pwolanin’s picture

Attached file: new, 1.85 KB

Here's the patch I'm committing to 6.x-1.x

pwolanin’s picture

Version: 6.x-2.x-dev » 5.x-2.x-dev
Status: Needs work » Patch (to be ported)

fixed in 6.x-2.x. Needs to be ported to D5.

jpmckinney’s picture

Status: Patch (to be ported) » Fixed

Fixed in 5.x-2.x.
http://drupal.org/cvs?commit=361222

Note that the unrelated issue in #2 has since been resolved.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.