I use the author's name in the path alias for each of that author's book nodes.

path: rowling/harrypotter/

If you search for "rowling", every node whose path contains the word "rowling" appears in the search results.

From my point of view, this is not desired.

I propose to change the schema.xml from
<field name="path" type="text" indexed="true" stored="true"/>

to
<field name="path" type="text" indexed="false" stored="true"/>

Comment   File                   Size      Author
#2        patch_apachesolr.txt   3.68 KB   brainski

Comments

brainski’s picture

Title: Proposed change of schema.xml » Flag for indexing path alias or not

I misunderstood the schema.xml file.

While indexing the nodes, the path aliases are added to the text as well. This might not be useful, as described above. Maybe one could implement a flag in the settings that controls whether the path aliases are indexed.

$indexPathAlias = false;

if ($indexPathAlias) {
  // Path aliases can have important information about the content.
  // Add them to the index as well.
  if (function_exists('drupal_get_path_alias')) {
    // Add any path alias to the index, looking first for language specific
    // aliases but using language neutral aliases otherwise.
    $language = empty($node->language) ? '' : $node->language;
    $path = 'node/' . $node->nid;
    $output = drupal_get_path_alias($path, $language);
    if ($output && $output != $path) {
      $document->path = $output;
      $text .= $output;
    }
  }
}
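
A minimal sketch of how such a flag might be read from the site settings instead of being hard-coded. The variable name 'apachesolr_index_path_alias' and the helper example_append_path_alias() are assumptions made for illustration and are not part of the module. The variable_get() stand-in only exists so that the snippet runs outside Drupal.

```php
<?php
// Stand-in for Drupal's variable_get() so the sketch runs outside Drupal.
if (!function_exists('variable_get')) {
  function variable_get($name, $default = NULL) {
    global $conf;
    return isset($conf[$name]) ? $conf[$name] : $default;
  }
}

// Hypothetical helper: appends the alias to the indexed text only when the
// (assumed) setting 'apachesolr_index_path_alias' is enabled.
function example_append_path_alias($text, $alias) {
  if (variable_get('apachesolr_index_path_alias', TRUE)) {
    $text .= ' ' . $alias;
  }
  return $text;
}

// Simulate an administrator switching the flag off.
$GLOBALS['conf']['apachesolr_index_path_alias'] = FALSE;
echo example_append_path_alias('Harry Potter', 'rowling/harrypotter'), "\n";
```

With the flag switched off, the alias never reaches the indexed text, so a search for "rowling" would no longer match on the path alone.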

brainski’s picture

Status: Active » Needs review
Attachment status: new, file size: 3.68 KB

After fiddling around with Eclipse for more than an hour, I was finally able to create my first PATCH! Hurray! :-)

And here is the description.

- New flag in settings: Exclude Path Alias from index. Indexing the path alias created a lot of problems in my index. I also saw a lot of duplicate entries.
- New flag in settings: Include only the body field of the node in the index. This option is useful if you have nodes that contain views with a lot of redundant information. Because I want only the relevant information of the node in the index, I added this option.

I corrected a typo in the code:
Old: $text = check_plain($node->title) . $node->body;
New: $text = check_plain($node->title) .' '. $node->body;
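
To illustrate why the missing space matters, here is a small standalone sketch. The title and body values are invented, check_plain() is left out because it is a Drupal function, and the whitespace split is only a crude stand-in for the real Solr analyzer.

```php
<?php
// Illustrative values only; in the module these come from $node->title and $node->body.
$title = 'Harry Potter';
$body = 'Rowling wrote seven books.';

// Old behaviour: no separator, so the last title word fuses with the first body word.
$old = $title . $body;
// New behaviour: a single space keeps the two words apart.
$new = $title . ' ' . $body;

// Crude whitespace tokenizer, standing in for the real Solr analyzer.
$old_tokens = preg_split('/\s+/', $old);
$new_tokens = preg_split('/\s+/', $new);

var_dump(in_array('Potter', $old_tokens, true)); // bool(false)
var_dump(in_array('Potter', $new_tokens, true)); // bool(true)
```

Without the separator, the text contains the single token "PotterRowling", so neither "Potter" nor "Rowling" is found as a word of its own.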

I tested this functionality very carefully. It would be great if someone could review this and then commit it to the dev version.

brainski’s picture

Title: Flag for indexing path alias or not » Patch for excluding pathalias / rendered body from index (new module settings)

changed title

JacobSingh’s picture

Hi Brainski,

This is indeed an interesting although probably larger issue. I don't think the settings page should be cluttered with options like this because there are probably dozens more in the offing like:

- Only index these CCK fields
- Only index these node types
- Only index these vocabularies

etc. etc...

Some of these options may be covered by the module; many will need to be customized by administrators in their Solr instance and/or in Drupal. My feeling is that we really need to split up the module into a more plugin-based architecture. So the apachesolr_node module would provide the basics, whereas apachesolr_path might provide the options for indexing the path, and would be a LOCAL_TASK of the main settings page.

What do you think of this? I'm not saying your issue isn't relevant, just that we could go on tacking options onto that main page related to how the content gets indexed and it would become quite a complicated page to look at, and the module would be full of bloat.

brainski’s picture

I don't see it the way you do. These are only 2 options and they can solve a lot of problems. If you compare the settings page of apachesolr with the one from pathauto, you will agree with me that the apachesolr settings page is almost empty.

For me it's better to have everything in one place than in 20 different modules. And I expect every user who implements Apache Solr to be able to handle two or more additional checkboxes, because he was able to manage the complexity of Solr.

What do you think?

robertdouglass’s picture

@brainski: thanks for the patch - and congratulations on rolling your own =)

@JacobSingh: More modular is fine. I can see a lot of things being refactored out into plugins or separate modules. However, I think that there could be a section on the configuration page that has checkboxes for all the things that there are to be indexed. If someone doesn't want paths, they can uncheck it. If they don't want taxonomy, they uncheck it. This could be implemented by a hook.

In any case, I see this as post version 1.0 work.

brainski’s picture

What do you mean by post version 1.0 work? Has someone tested this patch? Was it already committed to the dev version?

robertdouglass’s picture

@brainski: we did a prioritization exercise of the issue queue and marked as "critical" all issues that we want to close to be feature complete for a 1.0 release of this module. There are a lot of issues in the queue. Some are left out based on a gut feeling of priority. I decided to address the settings issue in your patch post 1.0 release.

The space between title and body has been committed in another issue, thanks for pointing it out to us.

brainski’s picture

OK, thanks for the feedback. If I can support you with this issue, please send me an email. I have some capacity for development work on the Solr module.

pwolanin’s picture

Status: Needs review » Closed (fixed)

seems to be no longer relevant