Looking at the field bias page, I see the field tos_name_formatted listed by its raw name, whereas all the other fields have useful labels. I confirmed that this field is not listed in apachesolr_field_name_map().
Does this field really need to be indexed? Looking in apachesolr_index_node_solr_document(), I see the author's name being indexed twice, once unchanged in tos_name and once having been run through check_plain() in tos_name_formatted.
// Author information
$document->ss_name = $node->name;
// We want the name to be searchable for keywords.
$document->tos_name = $node->name;
// Index formatted username so it can be searched and sorted on.
$account = (object) array('uid' => $node->uid, 'name' => $node->name);
$username = check_plain($account->name);
$document->ss_name_formatted = $username;
$document->tos_name_formatted = $username;
This seems a little redundant to me. There may be a use case I'm not seeing here, but I would think that you're really not going to see a difference between the check_plain() version and the unaltered one, so searching against either of them should yield the same result, right?
I'm not sure if removing this field or providing a name for the mapping function is the way to go here.
| Comment | File | Size | Author |
|---|---|---|---|
| #5 | 1923764.patch | 610 bytes | mkalkbrenner |
Comments
Comment #1
nick_vh
tos stands for text omit norms:
We also use this for taxonomy terms. It applies some stemming (using the text analysers) in combination with the complete string. We can facet using the ss_* fields and search for parts of a username using the tos_* fields.
Does that make more sense? I'm going to close it as such; please reopen if you think this is still problematic.
Comment #2
kevin.dutra commented
Yes, I understand why you would want an ss_* version and a tos_* version. My confusion is why there is a *_name field set and a *_name_formatted field set.
Comment #3
nick_vh
Does this issue, http://drupal.org/node/1161608, answer your questions? There might be cases where you want the username without formatting, and I think we also did not want to change the behaviour too much.
Are you suggesting the formatted field should be removed and we should always index the formatted version to ->name?
Comment #4
kevin.dutra commented
Ah, ok that makes sense. I wasn't aware of the format_username() function. I think that makes complete sense for D7.
Since D6 doesn't have that function, check_plain() is being used in its place for 6.x-3.x. Basically all check_plain() does in D6 is encode certain characters as HTML entities. I would think this would make it less useful for searching on because users are more likely to search on "&" rather than "&amp;" (for instance).
At any rate, I'm not opposed to it being there, I just think it needs a proper label in apachesolr_field_name_map() so that on the bias page an administrator knows what's in that field and why it's different from tos_name.
Comment #4.0
kevin.dutra commented
Adding a code sample
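To illustrate the entity-encoding concern from comment #4, here is a minimal standalone sketch. It assumes D6's check_plain() boils down to htmlspecialchars() with ENT_QUOTES (the real function also validates UTF-8 first); the helper name d6_check_plain_sketch() and the sample username are made up for this example.

```php
<?php
// Simplified stand-in for Drupal 6's check_plain(). The real function also
// checks that the input is valid UTF-8. The function name is hypothetical.
function d6_check_plain_sketch($text) {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Sample author name containing a character that gets entity-encoded.
$raw = 'Tom & Jerry';
$encoded = d6_check_plain_sketch($raw);

echo $raw, "\n";     // What would be indexed in tos_name.
echo $encoded, "\n"; // What would be indexed in tos_name_formatted.
```

For a username without special characters the two values are identical, so the two fields only diverge for names containing characters such as &, <, > or quotes.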
Comment #5
mkalkbrenner
I agree with kevin.dutra:
Additionally, the missing label causes an error in apachesolr_multilingual.
Because the field is treated differently but not actually formatted, I will use the label "Author name (Stemmed)".
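As a rough sketch of why a missing map entry shows up as a raw field name on the bias page, the lookup can be modelled as below. This is not the module's actual code: field_label_sketch() is a hypothetical helper, the existing 'Author name' label is illustrative, and the fallback to the raw name is an assumption based on the behaviour described in the original report.

```php
<?php
// Hypothetical, simplified model of a field-name-to-label lookup.
function field_label_sketch($field_name, array $map) {
  // Assumption: fields without a label fall back to their raw Solr name.
  return isset($map[$field_name]) ? $map[$field_name] : $field_name;
}

// Illustrative map before the fix; the existing label text is made up.
$map = array('tos_name' => 'Author name');
$before = field_label_sketch('tos_name_formatted', $map);

// Add the label proposed in comment #5.
$map['tos_name_formatted'] = 'Author name (Stemmed)';
$after = field_label_sketch('tos_name_formatted', $map);

echo $before, "\n"; // Raw field name, as seen on the bias page.
echo $after, "\n";  // Human-readable label once the entry exists.
```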
Comment #6
mkalkbrenner
Comment #8
nick_vh
Seems this has been fixed? :)