I cannot get the search_node_links table to populate at all.

I have modified search.module around the filter_format_allowcache line, as discussed in a few other threads
(such as the issue with the non-cacheable PHP input format, http://drupal.org/node/664124 )

I have disabled/removed and then re-enabled the search module to start from scratch.
I have made sure cron runs, and all that.

Regardless, the node links table remains empty.

The links on my pages look like this: <a href="/node/##"

I found a 'fix', but it's way too hardcoded and I just don't know how to do this correctly. More importantly, is it a bug or a consequence of my system configuration?

At my site the Drupal code is in a subdirectory, cms, like this: mydomain.com/cms
I have an .htaccess rewrite rule in my site root, not in the Drupal root, so URLs without cms in the path have it added:
calls to mydomain.com/node get rewritten (301) to mydomain.com/cms/node

What I did that worked was this - 3 changes - all in search.module

1- This line :
@href=['"]?(?:http\://mydomain\.org/cms/|/cms/)(?:\?q=)?/?((?![a-z]+:)[^'">]+)['">]@i

I added a check for /node/:
@href=['"]?(?:http\://mydomain\.org/cms/|/node/|/cms/)(?:\?q=)?/?((?![a-z]+:)[^'">]+)['">]@i

By the way, at this point $base_url = http://mydomain.org/cms and base_path() = /cms/

2- At this line
$path = drupal_get_normal_path($match[1]);

Change to:

$path = 'node/' . drupal_get_normal_path($match[1]);

Obviously hard-coding node/ is a bad way to do this.

3- The change referenced in other discussions, which basically removes this line:
if (filter_format_allowcache($node->format)

So, what's broken? My subdirectory/redirect setup, or the search module, or both? Or neither (feature request)?

Allan

Comments

gpk’s picture

Are you using PHP input format? Does step 3 actually make any difference?

My hunch would be that because your links don't point to the actual URL of the linked content on the site, that is what is causing your problem.

I think the line you are referring to in search.module is this one:
http://drupalcode.org/viewvc/drupal/drupal/modules/search/search.module?...
but from what you've posted I can't see exactly what the change is that you have made.

[update] Oh I see, what you've quoted is perhaps the value of $node_regexp.

In which case I think you should replace /node/ by /
I can't see any way of avoiding having to patch search.module to get the behavior you require, because your links basically point to somewhere outside of your site.

jhodgdon’s picture

subscribe

allan1015’s picture

Hi,

No, I don't have the PHP filter enabled.
I did mean the value of $node_regexp.

The single change you recommended worked: I just added an alternative (an OR) for / to the pattern,
so
$node_regexp = '@href=[\'"]?(?:'. preg_quote($base_url, '@') .'/|'. preg_quote(base_path(), '@') .')(?:\?q=)?/?((?![a-z]+:)[^\'">]+)[\'">]@i';
becomes
$node_regexp = '@href=[\'"]?(?:'. preg_quote($base_url, '@') .'/|/|'. preg_quote(base_path(), '@') .')(?:\?q=)?/?((?![a-z]+:)[^\'">]+)[\'">]@i';

I was able to remove the other changes
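
A minimal sketch of what the extra '/|' alternative changes, assuming the values quoted earlier in the thread ($base_url = http://mydomain.org/cms, base_path() = /cms/). build_node_regexp() and captured_path() are hypothetical helpers, not part of search.module; the base path is passed in as a plain argument so the snippet can run outside Drupal.

```php
<?php
// Hypothetical stand-ins for Drupal's $base_url and base_path() on this site.
$base_url = 'http://mydomain.org/cms';
$base_path = '/cms/';

// Builds the link pattern as quoted in this thread. $extra is '' for the
// stock pattern, or '/|' for the patched one.
function build_node_regexp($base_url, $base_path, $extra = '') {
  return '@href=[\'"]?(?:' . preg_quote($base_url, '@') . '/|' . $extra
    . preg_quote($base_path, '@') . ')(?:\?q=)?/?((?![a-z]+:)[^\'">]+)[\'">]@i';
}

// Returns the captured path, or NULL when the tag is not treated as a site link.
function captured_path($regexp, $tag) {
  return preg_match($regexp, $tag, $match) ? $match[1] : NULL;
}

$stock = build_node_regexp($base_url, $base_path);
$patched = build_node_regexp($base_url, $base_path, '/|');

var_dump(captured_path($stock, 'a href="/node/12"'));       // NULL
var_dump(captured_path($stock, 'a href="/cms/node/12"'));   // string(7) "node/12"
var_dump(captured_path($patched, 'a href="/node/12"'));     // string(7) "node/12"
var_dump(captured_path($patched, 'a href="/cms/node/12"')); // string(11) "cms/node/12"
```

In the patched pattern the bare / alternative is tried before base_path(), so a link that does include the subdirectory is captured with the cms/ prefix still attached. That appears harmless for node/## paths, but it may matter for URL aliases, since the captured string is what gets passed to drupal_get_normal_path().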

>>>I can't see any way of avoiding having to patch search.module to get the behavior you require, because your links basically point to somewhere outside of your site.

OK, well, having Drupal in a subdirectory and hiding (well, leaving out) the subdirectory in links, whether links from my own pages or links to the site from outside, isn't that uncommon, is it?

I did test, and href="/subdirectory/node/##" works with or without the hack to $node_regexp.

Using href="/node/##" needs the hack to $node_regexp that checks for /.

Can I make a feature request to support Drupal in subdirectories like this?

Allan

jhodgdon’s picture

There is probably a better way to do what you are trying to do, that wouldn't break this functionality, I would think? But maybe I am not fully understanding what you were trying to accomplish... Can you explain?

EDIT/ADDED: And also explain what your setup is -- how you are making Drupal do what you want it to do. Thanks.

jhodgdon’s picture

Category: bug » support
Status: Active » Fixed

Wait, I see.

You have example.com not running Drupal, and Drupal is in a subdirectory example.com/cms.
You want it so that calls to example.com/node/* get redirected to example.com/cms/node/*.

So no, I don't think we will patch Drupal so that if you put a link to example.com/node/345 in a node, Drupal will figure out that this is really a link to example.com/cms/node/345 (within the Drupal site) and add this to the node links table. How would that possibly work?

I guess this is a support request for your particular site then, and you have resolved it.

allan1015’s picture

Hi

I'll give a shorter statement of the requirement and then walk through it.

*************Short version:

Currently Drupal Search only considers absolute links
href="example.com/drupalbase"
and links relative to the Drupal base/subfolder
href="/drupalbase/..."

as being on the same site, and therefore as valid backlinks
(where drupalbase = base_path()).

The request, or bug, is that any link with a leading /, as in href="/...", be considered a link to the same site and hence indexed as a backlink. This is consistent with URL/link rules as I know them, where href="/... is taken to be a relative link.

Net: any HTML relative link should be considered and indexed as a backlink, not just ones pointing to drupalbase.

Is that clearer?

*************Long Version

I basically have Drupal sitting in a folder /drupal, but I do not need to put the subfolder name in the URLs.

So these urls work wonderfully :
example.com/node/123
example.com/title-of-a-node (url alias)

Of course calls to
example.com/drupal/node/123
example.com/drupal/title-of-a-node

also work, but who wants that silly /drupal in all my references, links, etc.?

Anyway, none of that is my issue; it all works fine so far with everything but backlinks.

What I want is for links on one node that point to another node to look like

/node/123, or more fully <a href="/node/123">Click Me</a>

That's what I do, and it all works wonderfully.

I don't want to have to use href="/drupal/node/123"

HOWEVER, search only backlinks these formats:
href="/drupal/node/123"
href="example.com/drupal/node/123"

>>add this to the node links table. How would that possibly work?

Look, the addition of two simple characters '/|' is all it took to make Drupal search support this subdirectory configuration

The search_node_links table simply looks like this:

1 node 123 page-title (Sid, Type, Nid, Caption)

It doesn't seem like a big, complicated thing where Drupal has to figure out some magical stuff, as you describe.
Actually this isn't asking Drupal search to figure out anything, just asking whether it could be less restrictive.

There is a check in search.module to see if the link points to a node on this site

// Check if link points to a node on this site
 if (preg_match($node_regexp, $value, $match)) {

because of the regular expression used this check only accepts :

href="example.com/drupal/ (an absolute link)
href="/drupal/ (a site relative link to the drupal base folder only)

I am asking that href="/... be recognized as relative to the same site, and that Drupal search drop the forced requirement of the Drupal physical subfolder.

Simply adding |/| as a condition, an OR, to the regex accomplishes this.

What this in effect says is that if the link starts with a "/" then we assume it is relative to the site and points to another document on the site, and therefore counts as a backlink.

Isn't that how href links work? Anything with a leading /, like href="/...., is seen as relative to the root, on the same site?

I reject the notion, though I am open to comment, that my site config is all that unique or special, or that this issue is in any way a one-off for me. It seems to be more about a basic assumption of what constitutes a 'link on this site', and current Drupal search is too restrictive in that assumption.

So the real short version:
Any link with a leading /, as in href="/, should be considered a link to the same site and a backlink.

Hope that at least clarifies the issue - thanks for your time

jhodgdon’s picture

If a link goes to part of the site that's outside the drupal root, then it's outside of Drupal, and Drupal doesn't (officially) know anything about it. So the search_node_links table is not going to include these as official links to the node. We cannot guess what URLs you might be using to alias links.

allan1015’s picture

Well, if the Drupal Search module (the backlink subpart) wants to be narrow in scope about what it considers a valid backlink, there's not much I can do.

I can point out that
your restriction is limiting valid site implementation options and forcing longer and more complicated links.
Your argument that it will require something special of 'Drupal' is not even discussed, and hardly proven.

I do want to point out that you write as if you speak for Drupal, but we are discussing a policy within a very narrow and specific function. Across Drupal this configuration is supported and documented, and even has configuration choices inside Drupal.
I run this configuration at several sites, and so far nothing but this little backlink code seems to care.

And if you want to take a narrow interpretation of what constitutes a valid relative link within a website, one that is at odds with standard HTML rules, there's not much I can do.

Well, I can state that my impression is that your point that Drupal needs to Know or Guess at anything at all is a facetious argument. Could you please explain, with a bit more than hand-waving, why backlinks are only valid if the Drupal base is in the URL?
What is it you need to Do or Know or Guess,

other than stop forcing a non-standard definition of a site-relative link?

Allan

allan1015’s picture

I thought about this, and the issue really is much narrower and more specific than I wrote above.

Links of the form href="/node/123" are valid links on Drupal pages.
Drupal and Apache can be configured to make this work, and I am not aware of any other issues with this form.
This is not a one-off or special case.

So, bug or feature, Search backlinks should recognize this form of link as a valid 'internal' link.

Using the counter-logic presented so far, it seems safe to assume that /node/123 is pointing back to a Drupal page.

allan1015’s picture

So currently Drupal Search backlinks only support absolute references or relative links with the base path hardcoded in the link.

Relative links of the form '/node/123' should also be considered valid backlinks.

The following mod checks whether the base path is in the link and injects it if not, but only for links like href="/node/123"

in search.module around line 496

if ($tagname == 'a') {

// MOD - Allows for node links of the form href="/node/123"
// Checks whether the Drupal base path is in the link; if not, adds it - but only for /node style links

   // The pattern needs its own delimiters; preg_quote() only escapes them.
   if (!preg_match('@' . preg_quote(base_path(), '@') . '@', $value)) {
      $value = str_ireplace('/node', base_path() . 'node', $value);
   }

// End

   // Check if link points to a node on this site
   if (preg_match($node_regexp, $value, $match)) {

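
The MOD above can be tried outside Drupal with a sketch like the following. add_base_path_to_node_link() is a hypothetical helper, the base path is passed in as an argument rather than read from base_path(), and a plain strpos() check is swapped in for the preg_match() test, so this is an approximation of the idea rather than the posted code.

```php
<?php
// Hypothetical helper: applies the idea of the MOD to one tag's text.
// $base_path stands in for Drupal's base_path(), e.g. '/cms/'.
function add_base_path_to_node_link($value, $base_path) {
  // Leave the tag alone if the base path already appears in it.
  if (strpos($value, $base_path) !== FALSE) {
    return $value;
  }
  // Otherwise rewrite /node to <base_path>node, as the MOD does.
  return str_ireplace('/node', $base_path . 'node', $value);
}

echo add_base_path_to_node_link('a href="/node/123"', '/cms/'), "\n";
echo add_base_path_to_node_link('a href="/cms/node/123"', '/cms/'), "\n";
echo add_base_path_to_node_link('a href="/title-of-a-node"', '/cms/'), "\n";
```

Only the first tag is changed (to href="/cms/node/123"); links that already carry the base path, and aliased links such as /title-of-a-node, pass through untouched, so under this approach aliased root-relative links would still not be picked up as backlinks.
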
jhodgdon’s picture

Title: search_node_links not updating, no backlinks, when I use href="/node/##" in my links » search_node_links not updating, for links that are not within the Drupal installation

The problem is that you cannot assume that a link like example.com/node/23 is a link *within the same drupal site* if Drupal is installed at example.com/subdir.

For instance, I have a site where I have several completely separate Drupal installations. One is for the top level domain example.com, and another is in example.com/subdir. So if I happened to link from the subdir installation to the top-level installation, I would DEFINITELY not want to treat example.com/node/23 for instance as a link to example.com/subdir/node/23. They are not the same pages on my site.

This is one counter example... In any case, we don't do things in the core of Drupal to handle non-standard Drupal installations. You could write a module that would do what you want to do for your site.

allan1015’s picture

[UPDATED]

Ahh OK, examples are helping.

In your scenario would you even be using href="/node/123/" for your links, in either domain?
It would be hard in the subdomain.
It should be OK in the top domain.

But your argument isn't valid, is it? You're making this dire case that some really bad thing would come of having one Drupal domain report a backlink from another Drupal domain. Actually that sounds pretty useful, but it's not what I'm talking about.

My point is about this almost specious argument of how bad this would be. It's like you're trying to make it sound bad, or dire, but perhaps you just don't get it, or I don't get it, so let's walk through it.

The code I provided would insert base_path() for the domain that search was running against, as part of a condition.

Scenario-1
A base-Drupal site has a page with href="/node/123"
You run Search and build backlinks. What I am suggesting would work fine; no subdomain would be inserted in the link's path.
That page would be listed as a backlink for node 123, with no cross domain/site/instance issues.

Scenario-2
A subdir-Drupal site has a page with href="/node/123"
You run Search and build backlinks. What I am suggesting would work fine; the subdomain would be inserted in the link's path.
That page would be listed as a backlink for node 123, with no cross domain/site/instance issues.

Again, only the base_path for the domain that the search is running against would ever be used.
The base_path is not part of the stored results, so no data is corrupted.

>>>The problem is that you cannot assume that a link like example.com/node/23 is a link *within the same drupal site* if Drupal is installed at example.com/subdir.

I still don't get what assumptions you think are going on. Where are assumptions being made? The base_path value for that site is being used, so what is being assumed?

It still feels like you're throwing up arguments for argument's sake, heels dug in and all, or one of us isn't thinking this through.

I am open to, and anxious to hear, any scenarios of a standard Drupal install where what I've suggested actually breaks something.

>> to handle non-standard Drupal installations

Could you explain what is NON STANDARD about my drupal installation?

A standard Drupal installation must be: _______________________________

allan1015’s picture

Title: search_node_links not updating, for links that are not within the Drupal installation » search_node_links not updating, for VALID links of the form href="/node/#/

Did someone change the title of this issue?

That is really uncalled for. What a crock, to retitle something in such an inaccurate fashion. Wow, talk about political maneuvering.
Sleazy.

This is what was there today:
search_node_links not updating, for links that are not within the Drupal installation

I am changing it back to something that is closer to the truth of the issue.

gpk’s picture

The real problem we have here is that although some of the docs talk about it being possible to run Drupal in a subdirectory of a domain, support for this is at best patchy across both core and contrib modules. Although I experimented with setting up Drupal sites to work in subdirectories for a while, I decided a year or two back that it was easiest to always put the Drupal install in the top folder of a domain/subdomain, and if I wanted dev/test instances then just take out a new subdomain for that. Otherwise integration with various contrib modules (e.g. WYSIWYG) got to be a headache if you wanted the site to remain portable (e.g. take a copy of a base Drupal site and set it up as a dev/test instance in a subfolder).

I think it's pretty non-standard to rely on 301 redirects to get embedded links in pages to work.

A scenario to consider:
both the base Drupal and subdomain Drupal sites have a node 123, and there is no redirect for /node/* in .htaccess.
On the subdomain site you create a link to /node/123 (i.e. the base site).
If search.module is patched as you suggest then the search indexer will treat this as a link to node 123 on the current site, which it isn't...
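
A small sketch of that scenario, with hypothetical values throughout (example.com as the host, /subdir/ as the second install's base path). It only compares the URL a browser would request for a root-relative href with the URL the patched indexer would effectively be assuming; it does not call any Drupal code.

```php
<?php
// Hypothetical values: a base site at example.com and a separate Drupal
// install under /subdir/, with no redirect between them.
$host = 'http://example.com';
$base_path = '/subdir/';
$href = '/node/123';

// What a browser requests: a root-relative href resolves against the host only.
$browser_target = $host . $href;

// What the patched indexer would assume: the same path under its own base path.
$indexer_target = $host . $base_path . ltrim($href, '/');

echo $browser_target, "\n";  // http://example.com/node/123
echo $indexer_target, "\n";  // http://example.com/subdir/node/123
var_dump($browser_target === $indexer_target);  // bool(false)
```

With a redirect like the one described earlier in the thread the two targets end up on the same page; without it they are different pages, which is the case this scenario is about.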

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.