I've got an issue that I'm hoping has a simple solution.

I have a setup where we use the secure login module to force administrators and content editors to use https://mysite.com instead of the public-facing http://mysite.com. This means when we go to manually re-index the site at https://mysite.com/admin/config/search/apachesolr, all of the indexed pages are stored with the url of https://mysite.com/my-page.

This problem gets worse when a site has a development or preview URL before it goes live and the index was built from that test URL (https://test.mysite.com/my-page): it becomes really obvious that something's wrong.

If Drupal is reindexing pages or content as a result of a save or edit, it works fine and uses the correct http://mysite.com url.

The current workaround for us is to turn off Secure Login, log out, log back in without the https:// in the url, kick off the index, then turn Secure Login back on.

I want to be able to manually kick off the index, without having to disable Secure Login.
I can see a bunch of issues and changes where the links have been changed from being absolute to relative and vice-versa (#667650: Results of apachesolr_process_response should return absolute URLs, #337879: Store relative not absolute paths, #1765938: Move the variable_get() for "apachesolr_environments" after the cache_set() so that URLs can be modified dynamically), but I'm struggling to figure out what's current and how I can fix my issue. Any help appreciated.

Comments

marblegravy’s picture

Status: Active » Closed (duplicate)

This is very closely related to #1881164: Wrong domain in the path of the search results (using domain access module), and the comment at http://drupal.org/node/1881164#comment-7028786 was enough to help me solve this for now.

My version of hook_apachesolr_process_results() looks like this:

<?php
/**
 * Implements hook_apachesolr_process_results().
 *
 * Strip down search result links from absolute to relative paths.
 */
function mymodule_apachesolr_process_results(&$results, DrupalSolrQueryInterface $query) {
  foreach ($results as $id => $result) {
    // Rebuild the link as a relative URL. Because this hook is local to this
    // site, remote sites can still use this index.
    $url = parse_url($result['link']);
    if (!empty($url['path'])) {
      // Drop the leading slash so url() receives an internal path such as
      // "my-page" rather than "/my-page".
      $results[$id]['link'] = url(ltrim($url['path'], '/'));
    }
  }
}
?>
marblegravy’s picture

After thinking about this overnight, I think this version might be more robust in case clean_urls are turned off or the page has a query string for whatever reason.

<?php
/**
 * Implements hook_apachesolr_process_results().
 *
 * Strip down search result links from absolute to relative paths.
 */
function mymodule_apachesolr_process_results(&$results, DrupalSolrQueryInterface $query) {
  foreach ($results as $id => $result) {
    // Break up the result link.
    $url = parse_url($result['link']);
    // Rebuild a relative link, keeping any query string.
    $path = isset($url['path']) ? $url['path'] : '/';
    $query_string = isset($url['query']) ? '?' . $url['query'] : '';
    // Provide the relative link back for rendering.
    $results[$id]['link'] = $path . $query_string;
  }
}
?>
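
To make the effect of the rebuild above easier to see, here is a minimal sketch that runs outside Drupal. The helper name mymodule_relative_link() is hypothetical (it is not part of apachesolr or core), and the sample URLs are just the ones from the original post plus a made-up non-clean URL:

```php
<?php
/**
 * Hypothetical helper: reduce an absolute URL to a host-relative link.
 *
 * Mirrors the parse_url() logic in the hook above; not part of any module.
 */
function mymodule_relative_link($link) {
  $url = parse_url($link);
  if ($url === FALSE) {
    // Seriously malformed URL: leave the link untouched.
    return $link;
  }
  $path = isset($url['path']) ? $url['path'] : '/';
  $query_string = isset($url['query']) ? '?' . $url['query'] : '';
  return $path . $query_string;
}

// The scheme and host are dropped, so https:// and test domains no longer matter.
print mymodule_relative_link('https://mysite.com/my-page') . "\n";
// With clean URLs off, the ?q= query string is kept.
print mymodule_relative_link('https://test.mysite.com/?q=node/1') . "\n";
```

This prints /my-page and /?q=node/1. Note that any fragment (#anchor) in the stored link is discarded, which is probably fine for search results.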
marblegravy’s picture

Final version... completely different approach which should be the most reliable version provided that:

  • The index only contains content from this site (and external, non-drupal sites)
  • You're only indexing nodes. It should work if you also index users or entities, but I haven't tested it.

All I'm doing here is throwing out the URL sent from Solr, then letting Drupal generate the most appropriate path it can, based on the raw path of the result.

This has the benefit of working with clean_urls on or off, it also works if the paths are supposed to have query strings or whatever, and it takes care not to try and re-create paths for content that the current site doesn't know about.

<?php
/**
 * Implements hook_apachesolr_process_results().
 *
 * Replace the indexed absolute link with one generated by this site.
 */
function mymodule_apachesolr_process_results(&$results, DrupalSolrQueryInterface $query) {
  foreach ($results as $id => $result) {
    // Use the result's raw path to generate the URL. If there is no path,
    // keep the link that Solr provided.
    if (!empty($result['fields']['path'])) {
      // The path appears to be valid, so create a new link for the result.
      $results[$id]['link'] = url($result['fields']['path']);
    }
    // No value for path for whatever reason: don't change anything.
  }
}
?>
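
For anyone wanting to check the fallback behaviour without a bootstrapped site, the loop above can be sketched as a plain function. Everything here is an assumption for illustration: mymodule_rewrite_links() is a hypothetical name, the $url_generator argument stands in for Drupal's url(), and the sample results are invented rather than taken from a real index:

```php
<?php
/**
 * Hypothetical, Drupal-free version of the link-rewriting loop.
 *
 * $url_generator stands in for url() so the sketch can run on its own.
 */
function mymodule_rewrite_links(array $results, $url_generator) {
  foreach ($results as $id => $result) {
    if (!empty($result['fields']['path'])) {
      // A raw path is available: regenerate the link locally.
      $results[$id]['link'] = call_user_func($url_generator, $result['fields']['path']);
    }
    // Otherwise keep whatever link came back from Solr.
  }
  return $results;
}

// Invented sample data: one local result with a raw path, one without.
$results = array(
  array(
    'link' => 'https://test.mysite.com/my-page',
    'fields' => array('path' => 'node/1'),
  ),
  array(
    'link' => 'http://example.org/remote-page',
    'fields' => array(),
  ),
);

// Stand-in generator that only prefixes a slash. The real url() would also
// resolve path aliases and respect the clean URL setting.
$rewritten = mymodule_rewrite_links($results, function ($path) {
  return '/' . $path;
});

print $rewritten[0]['link'] . "\n";
print $rewritten[1]['link'] . "\n";
```

The first result comes back as /node/1 and the second keeps its original http://example.org/remote-page link, which is the "don't touch what this site doesn't know about" behaviour described above.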