Looking at the field bias page, I see the field tos_name_formatted listed without a label, whereas all the other fields have useful labels. I confirmed that this field is not listed in apachesolr_field_name_map().

Is it really necessary to index this field? Looking in apachesolr_index_node_solr_document(), I see the author's name being indexed twice: once unchanged in tos_name and once after being run through check_plain() in tos_name_formatted.

  // Author information
  $document->ss_name = $node->name;
  // We want the name to be searchable for keywords.
  $document->tos_name = $node->name;

  // Index formatted username so it can be searched and sorted on.
  $account = (object) array('uid' => $node->uid, 'name' => $node->name);

  $username = check_plain($account->name);

  $document->ss_name_formatted = $username;
  $document->tos_name_formatted = $username;

This seems a little redundant to me. There may be a use case I'm not seeing here, but I would think that you're really not going to see a difference between the check_plain() version and the unaltered one, so searching against either of them should yield the same result, right?

I'm not sure if removing this field or providing a name for the mapping function is the way to go here.

Comment | File | Size | Author
#5 | 1923764.patch | 610 bytes | mkalkbrenner

Comments

nick_vh

Status: Active » Fixed

tos stands for "text omit norms":

Set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms.

We also use this for taxonomy terms. This does some stemming (using the text analysers) in combination with the complete string. We can facet using the ss_* fields and search for parts of a username using the tos_* fields.
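As a rough illustration of the distinction described above, the following sketch imitates exact matching on a string field versus token matching on a text field. It is not how Solr is implemented: both function names and the sample username are hypothetical, and the real text analysers also apply stemming, which is left out here.

```php
<?php
// Hypothetical stand-in for an ss_* (string) field: the whole value
// must match exactly, which is what makes it suitable for faceting.
function example_string_field_matches($indexed, $query) {
  return $indexed === $query;
}

// Hypothetical stand-in for a tos_* (text) field: the value is lowercased
// and split into tokens, so a single word of the name can match.
// Stemming is deliberately omitted from this sketch.
function example_text_field_matches($indexed, $query) {
  $tokens = preg_split('/\W+/', strtolower($indexed), -1, PREG_SPLIT_NO_EMPTY);
  return in_array(strtolower($query), $tokens, true);
}

$name = 'John Smith';
var_dump(example_string_field_matches($name, 'smith'));
var_dump(example_text_field_matches($name, 'smith'));
```

Under these assumptions, only the tokenized variant matches a search for part of the username.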

Does that make more sense? I'm going to close it as such; please reopen if you think this is still problematic.

kevin.dutra

Status: Fixed » Active

Yes, I understand why you would want an ss_* version and a tos_* version. My confusion is why there is a *_name field set and a *_name_formatted field set.

nick_vh

Does this issue (http://drupal.org/node/1161608) answer your questions? There might be cases where you want the username without formatting, and I think we also did not want to change the behaviour too much.

Are you suggesting that the *_formatted fields should be removed and that we should always index the formatted version to ->name?

kevin.dutra

Ah, ok that makes sense. I wasn't aware of the format_username() function. I think that makes complete sense for D7.

Since D6 doesn't have that function, check_plain() is being used in its place for 6.x-3.x. Basically all check_plain() does in D6 is encode certain characters as HTML entities. I would think this would make the field less useful for searching, because users are more likely to search on "&" than on "&amp;" (for instance).
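A minimal sketch of that effect, assuming a stand-in for check_plain() built on PHP's htmlspecialchars(). The function name example_check_plain() and the sample username are hypothetical, and the real D6 function also validates the input as UTF-8.

```php
<?php
// Hypothetical stand-in for Drupal 6's check_plain(): it encodes HTML
// special characters. The real function also checks for valid UTF-8.
function example_check_plain($text) {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Sample username containing a character that gets encoded.
$name = 'Tom & Jerry';

echo $name, "\n";                      // Tom & Jerry
echo example_check_plain($name), "\n"; // Tom &amp; Jerry
```

The second value is what would end up in the *_name_formatted fields, so a keyword search would run against the encoded form rather than the literal "&".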

At any rate, I'm not opposed to it being there, I just think it needs a proper label in apachesolr_field_name_map() so that on the bias page an administrator knows what's in that field and why it's different from tos_name.

kevin.dutra

Issue summary: View changes

Adding a code sample

mkalkbrenner

Assigned: Unassigned » mkalkbrenner
Issue summary: View changes
Status: Active » Needs review
File status: new
File size: 610 bytes

I agree with kevin.dutra:

I just think tos_name_formatted needs a proper label in apachesolr_field_name_map() so that on the bias page an administrator knows what's in that field and why it's different from tos_name.

Additionally, the missing label causes an error in apachesolr_multilingual.

Since the field is treated differently but not actually formatted, I will use the label "Author name (Stemmed)".
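The idea can be sketched as follows. This is not the attached patch or the module's actual code: example_field_name_map() is a hypothetical stand-in for apachesolr_field_name_map(), the label for tos_name is assumed, and the fallback to the raw field name is an assumption about how unlabeled fields end up displayed.

```php
<?php
// Hypothetical stand-in for apachesolr_field_name_map(); the real
// function and the committed change may look different.
function example_field_name_map($field_name) {
  $map = array(
    // Assumed label for the existing author name field.
    'tos_name' => 'Author name',
    // Label proposed in this issue for the previously unlabeled field.
    'tos_name_formatted' => 'Author name (Stemmed)',
  );
  // Assumption: fields without a label fall back to their raw name.
  return isset($map[$field_name]) ? $map[$field_name] : $field_name;
}

echo example_field_name_map('tos_name_formatted'), "\n";
```

With an entry like this in place, the bias page would have a readable label to show instead of the bare field name.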

mkalkbrenner

Title: tos_name_unformatted missing from field name map » tos_name_formatted missing from field name map

  • Commit 9fe98e2 on 6.x-3.x by mkalkbrenner:
    Issue #1923764 by kevin.dutra, mkalkbrenner: tos_name_formatted missing...

nick_vh

Status: Needs review » Fixed

Seems this has been fixed? :)

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.