Intended audience: website administrators (or other technical individuals) responsible for configuring Drupal search.

The document is divided into Basic Settings, Advanced Settings, and Popularity Report sections. Configuring the Basic Settings should be sufficient for those who want a quick setup and do not have demanding requirements for their site search interface. The Advanced Settings section is useful for anyone who wants to get the most out of the module and optimize search for their site's needs. Anyone who wants to view the popularity values should also read the brief Popularity Report section at the bottom of this document.

Basic Settings

The basic settings, located at /admin/config/search/apachesolr/apachesolr_popularity, are divided into three subsections: Solr Settings, Basic Popularity Configurations, and Clear Popularity Values.

Solr Settings

The Solr Settings subsection contains the minimum fields required to use this module. At the top is the name of the default Solr server and its URL as specified by the Apache Solr Search Integration module. If the server cannot be found, an error is displayed with a link to the Apache Solr Search Integration settings, which gives the user quick access to configure it properly.

Following the Solr server information, there is a checkbox to enable the module and a textbox to input the absolute path of the default Solr server’s data directory. This directory is set up during installation of Apache Solr. It is commonly called “data” and may be in the Solr root directory within the Java Servlet (e.g., Apache Tomcat) directory. If you do not have this information available, consult the system administrator for more information on the location of the Solr data directory.
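
Before entering the path, it can help to confirm that it is absolute and points to an existing directory. The following is a minimal sketch of such a check, not part of the module; the function name check_data_dir and the example path are hypothetical and should be replaced with values from your own installation.

```python
import os


def check_data_dir(path):
    """Return a list of problems found with a candidate Solr data path."""
    problems = []
    if not os.path.isabs(path):
        problems.append("path is not absolute")
    if not os.path.isdir(path):
        problems.append("path is not an existing directory")
    return problems


# Hypothetical location; replace with the path used by your installation.
candidate = "/opt/tomcat/solr/data"
for problem in check_data_dir(candidate):
    print(f"{candidate}: {problem}")
```

An empty result only means the path looks plausible; it does not confirm that the directory belongs to the default Solr server.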

Once Enable Apache Solr Node Popularity has been checked, the Solr data path has been correctly entered, and the configurations are saved, the module will begin tracking popularity data and will incorporate it into Solr’s search ranking.

To disable popularity tracking, uncheck Enable Apache Solr Node Popularity and save the configurations. Upon unchecking the checkbox, a new checkbox will appear asking whether the tracking data should be removed. Generally, the data should be removed; however, if tracking will only be disabled temporarily, it may be worth keeping the data so that it can be reused once the module is re-enabled.

Basic Popularity Configurations

The Popularity influence is the amount by which the module can modify the ranking results. A setting of 100% multiplies a page’s search value by a factor between 1x and 3x, which moves the page up or down in the rankings relative to other pages. Only the most popular node on the site will have the full 3x boost; most nodes will fall somewhere within that range. The popularity influence can be adjusted to reflect how important the popularity of a node is to your site. A value of 50% will have a range from 1x to 1.5x and a value of 400% will have a range from 1x to 12x. Sites that want search results to be greatly influenced by popularity should choose a large value, and sites that want minimal influence from popularity should choose a small value.

The Forgetting window determines how quickly previous popularity data fades. Roughly speaking, the data begins to fade after this many days. Lower values result in a large amount of forgetting, which favors newer popularity data, and higher values result in minimal forgetting, which causes old and new popularity data to be treated roughly equally. Websites with high hit counts, or with content whose demand varies greatly, should use a low value; websites with low hit counts, or with fairly static content in stable demand, should choose a higher value.

This subsection also provides two checkboxes to prevent certain types of pages from being tracked. Only track published pages prevents tracking of unpublished pages. This is useful since nodes may receive multiple “editing” hits during creation prior to publishing, and these hits may not correlate with other users’ demand for the node. The second checkbox, Do not track the front page, is useful because the front page is not typically searched for directly, yet it receives a large number of hits. These hits give the front page a very high popularity and reduce the relative popularity of the other nodes, which may decrease the effectiveness of the module. Disabling front page tracking is generally recommended.

Clear Popularity Values

To reset all popularity values, check Reset popularity values for all nodes, check Confirm reset, and save the configurations. Note that this cannot be undone.

Advanced Settings

The advanced settings are useful for admins who wish to tailor the popularity values to the needs of their site. Once tracking is enabled in the Basic Settings, the advanced settings can be enabled by checking Enable advanced settings, which is located at /admin/config/search/apachesolr/apachesolr_popularity/advanced_settings.

Popularity compression on high-traffic nodes compresses (reduces) the popularity of very high-traffic nodes to prevent them from dominating the search results. Currently, the three options are None, Moderate, and High. Select None to increase the influence of popular nodes, Moderate to provide some reduction in influence, and High for a large reduction.

Initial popularity sets the popularity of newly tracked nodes, which allows new nodes to be boosted in priority. The default option is Average, which assumes the initial popularity is the average (mean) popularity of all currently tracked nodes. This is generally better than starting at zero, since before any tracking information is available, the (a priori) estimate with the least error is the average node popularity. If this is not preferred, other values can be selected. Lowest sets the popularity value to zero, Low sets it between zero and the average, Highest sets it to one hundred, and High sets it between one hundred and the average. Setting it to High is recommended for giving newly created nodes boosted priority in the search results.
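
The five options can be sketched as a simple mapping. This is an illustration only, not the module's code: the function name is hypothetical, "between" is assumed to mean the midpoint, and the fallback to zero when no nodes are tracked yet is also an assumption.

```python
from statistics import mean

MAX_POPULARITY = 100.0  # popularity ranges from zero to one hundred


def initial_popularity(option, tracked_popularities):
    """Return the starting popularity for a newly tracked node."""
    # Assumption: with no tracked nodes yet, treat the average as zero.
    average = mean(tracked_popularities) if tracked_popularities else 0.0
    values = {
        "Lowest": 0.0,
        "Low": (0.0 + average) / 2,              # assumed midpoint
        "Average": average,
        "High": (MAX_POPULARITY + average) / 2,  # assumed midpoint
        "Highest": MAX_POPULARITY,
    }
    return values[option]


tracked = [20.0, 40.0]  # example popularity values, made up for illustration
for option in ("Lowest", "Low", "Average", "High", "Highest"):
    print(option, initial_popularity(option, tracked))
```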

Initial popularity decay time determines how long (and how strongly) the initial popularity is held. The larger the value, the longer the initial value is held. This does not override the popularity calculations, but instead influences them. Setting it to zero will override the initial value at the very first popularity update (at the next cron run). A value of 0.2 will result in the popularity being roughly halfway between the initial popularity and the calculated popularity at 0.2 days. A value of 2 will result in the value being halfway at 2 days, and so on.
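
One way to picture this behavior is a half-life blend between the initial and the calculated popularity. The sketch below assumes an exponential half-life model, which matches the "halfway at the decay time" description but is not confirmed to be the formula the module uses; the function and parameter names are hypothetical.

```python
def blended_popularity(initial, calculated, age_days, decay_days):
    """Blend initial and calculated popularity (assumed half-life model)."""
    if decay_days <= 0:
        # A decay time of zero drops the initial value at the first update.
        return calculated
    # Weight of the initial value halves every `decay_days` days.
    weight = 0.5 ** (age_days / decay_days)
    return weight * initial + (1.0 - weight) * calculated


# A node that started at 50 but whose tracked hits suggest 10.
for age in (0.0, 0.2, 0.4, 1.0):
    print(age, blended_popularity(50.0, 10.0, age, decay_days=0.2))
```

Under this model the initial value never overrides the calculated one; it only pulls the result toward itself, less so as the node ages.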

Low-popularity influence determines how easy it is to find unpopular pages. Setting it to zero will make the least popular pages very difficult to find in the search results, and a higher value will make them easier to find. Note that this will never make low-popularity nodes easier to find than the more popular ones; it only sets the lower bound on the effects of popularity. The range of values is determined by the low-popularity influence and the popularity influence mentioned in the basic section. At the time of writing, the amount of influence of popularity on search results is within the range of:

Low_popularity_influence TO Low_popularity_influence + 2 x Popularity_influence

For example, a low-popularity influence of 1 and a popularity influence of 100% would result in a range from 1x to 3x. A low-popularity influence of 0.5 and a popularity influence of 100% would result in a range from 0.5x to 2.5x. A low-popularity influence of 0.5 and a popularity influence of 200% would result in a range from 0.5x to 4.5x. And so on. The specific ranking value for each node can be anywhere within this range.
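
The range formula above can be checked with a few lines of code. This is a minimal sketch of the documented formula only; the function name is hypothetical, and the popularity influence is assumed to be entered as a percentage (100% becomes 1.0).

```python
def multiplier_range(low_popularity_influence, popularity_influence_percent):
    """Return (lowest, highest) ranking multiplier from the stated formula."""
    influence = popularity_influence_percent / 100.0  # 100% -> 1.0
    lowest = float(low_popularity_influence)
    highest = lowest + 2 * influence
    return lowest, highest


# The three examples from the text.
for low, percent in [(1, 100), (0.5, 100), (0.5, 200)]:
    lowest, highest = multiplier_range(low, percent)
    print(f"low={low}, influence={percent}% -> {lowest}x to {highest}x")
```

The sketch only computes the bounds; where a given node falls within them depends on its popularity.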

Popularity Report

The Popular pages report displays all popularity-related values and is useful for customizing the module. It displays the node title, ranking multiplier, popularity, recent count, time tracked, and tracking flags. The popularity is the popularity of the node as discussed throughout the document and ranges from zero to one hundred. The ranking multiplier is the actual value by which Solr ranking values are multiplied, and is based on the node popularity. The recent count is the number of page hits since the last popularity update, which happens at each cron run. The time tracked is the amount of time the module has been tracking the specific node. The tracking flags are any special flags used to mark the node. Currently, the only flags are unpublished status and front page status.

Only nodes that are marked as tracked in the Basic Settings are displayed in this report.

Credits

Developer:
Jonathan Gagne (jongagne)
http://drupal.org/user/2409764

This project was funded by:
OPIN Software
http://www.opin.ca/