Make search indexing smart to handle nodes wth php redirects [#286263]

Comment	File	Size	Author
#43	286263-disabled_redirects_idea.patch	1.7 KB	sime
#42	286263-disabled_redirects_idea.patch	791 bytes	sime
#26	286263.patch	7.47 KB	robeano
#17	286263.patch	7.47 KB	douggreen
#16	286263.patch	7.47 KB	douggreen
#8	286263.patch	8.12 KB	douggreen
#2	286263.patch	1.36 KB	douggreen
#1	286263.patch	1.37 KB	douggreen

Comment #1

douggreen commented 25 August 2008 at 14:25

Version:	6.3	» 6.4
Status:	Active	» Needs review

Status	File	Size
new	286263.patch	1.37 KB

(I finally got around to this on the plane to DrupalCon, and since webchick was just an aisle over, I discussed this with her.)

I think that the only solution is to set a global variable when in cron, and then check this global variable before doing a drupal_goto.

It's conceivable that other modules will try to render the node from within cron, and the real problem is cron shouldn't be redirected. I think that this is a reasonable general solution.

I have a similar patch for 7.x, but just in case there's some tweaking to this one, I'll wait to post it until we get this one done.

Log in or register to post comments

Comment #2

douggreen commented 25 August 2008 at 14:37

Status	File	Size
new	286263.patch	1.36 KB

Actually, here's a slightly different version, that avoids on logical comparison (the !).

Log in or register to post comments

Comment #3

jeremy commented 25 August 2008 at 16:25

Status:

Needs review

» Needs work

There are several problems with this solution: on d.o., nodes are calling header() directly to do redirects, not calling drupal_goto() -- I simply used the latter as an example way to easily duplicate the problem. Also, even if the calls were to drupal_goto(), we'll end up indexing the contained PHP code, as with your patch we return the node body instead of executing the contained PHP code -- it is wrong to index the php, as then searches for "drupal_goto" will return incorrect results.

The solution I used in the Xapian module was to make indexing stateful, and to detect the nodes that we fail to index, skipping them. This had a few advantages: it doesn't require a patch to core, it works with any method for redirecting pages, and it doesn't index content that shouldn't be indexed. Unfortunately it also feels a bit like a kludge, and it can be slow to index a website with a lot of redirects in it as it tries each failed node twice (to be sure it's not a memory issue or some other intermittent issue).

Log in or register to post comments

Comment #4

damien tournoud commented 25 August 2008 at 16:32

I suggest we create a special menu callback, something like node/<nid>/raw, and make a drupal_http_request() to that during the indexing.

Log in or register to post comments

Comment #5

jeremy commented 25 August 2008 at 18:38

@Damien: That doesn't solve the issue at hand. How would you handle redirects? You're either going to index raw PHP code, or you're going to index the same node multiple times (the original, and the one that's pointing to it via a redirect). Neither is desirable imo.

Log in or register to post comments

Comment #6

damien tournoud commented 25 August 2008 at 18:47

I've not been that clear.

What I suggest is to make a special path (node/nid/raw) where the node is rendered and returned as-is (without being embedded in a theme('page')). The search indexers would call drupal_http_request() on that path and thus index the *result* of the rendering (never the PHP code).

How would you handle redirects?

Easy: you don't index pages that produce a redirect (ie. you don't follow redirects).

This architecture would allow us to render the node and index in separate threads. This way, we avoid pitfalls of the current strategy (for example: PHP fatal errors and other exit() currently block the indexing altogether, etc.).

Log in or register to post comments

Comment #7

arhak commented 25 August 2008 at 19:02

subscribing

Log in or register to post comments

Comment #8

douggreen commented 26 August 2008 at 13:36

Status	File	Size
new	286263.patch	8.12 KB

@dww, If the code was really doing a drupal_goto, I don't see how we are indexing raw php code. Wouldn't the php filter simply return the empty string, and thus we would index nothing?

However, I get the problem about states and failures. So here's another patch that includes both the Cron global variable, and some basic state information. I've changed the {search_dataset}.reindex column to an unsigned int (don't forget to run update.php), and then set it to the negative value of the time, during indexing. Nodes that get stuck will always have this negative value. Some contrib module might choose to re-run these X-times, but I didn't put that in core. I did add a bit more information to the admin/settings/search page though, so that it will tell you how many of these nodes are stuck in pending.

There's also one more minor change to the current behavior, which is, if you are 100% indexed, it no longer says "There are 0 items left to index." I removed this because I thought it looked funny to say "99% of the site has been indexed. There are 0 items left to index. There is 1 item pending or failed indexing."

Log in or register to post comments

Comment #9

douggreen commented 28 August 2008 at 05:19

Status:

Needs work

» Needs review

Opps, forgot to change the Status, which explains why no one has reviewed this yet.

Log in or register to post comments

Comment #11

douggreen commented 28 August 2008 at 05:19

Version:

6.4

» 6.x-dev

Log in or register to post comments

Comment #12

douggreen commented 28 August 2008 at 06:44

Title:

Drupal 6 search indexer never completes in some setups

» Drupal 6 search indexer never completes when nodes have redirecting php

Log in or register to post comments

Comment #13

damienmckenna

TN, USA

commented 9 September 2008 at 02:16

Might this help with the search status message too? #305659: Patch for node_search($op=status) to give correct status message totals

Log in or register to post comments

Comment #14

douggreen commented 10 September 2008 at 10:10

@DamienMcKenna, as you've noticed in #305569: Module Amazon's Amazon Filter Won't Work, that problem has nothing to do with core search, and IMO is not related to this at all.

Log in or register to post comments

Comment #15

rares commented 17 September 2008 at 01:08

subscribing to the issue, I think it's fine that the search module evaluates php when doing indexing, but there should definitely be no redirections when doing so, or cron will never complete. one could patch search, but another way of fixing this is to add a line in drupal_goto to exit the function if cron_semaphore is on.

Log in or register to post comments

Comment #16

douggreen commented 18 September 2008 at 14:05

Status	File	Size
new	286263.patch	7.47 KB

@rares, good idea! I've updated the patch to not use the new drupal_get_cron function, but just check cron_semaphore.

Log in or register to post comments

Comment #17

douggreen commented 29 December 2008 at 13:29

Status	File	Size
new	286263.patch	7.47 KB

I ran into this problem again today, our home site hasn't indexed anything in a couple weeks. This patch fixed the problem (... that and changing a header() call in node php to drupal_goto).

Is there another solution out there? I don't consider dww's header() to be a show-stopper. In my opinion, header() in the node php is a user error, and should be changed to drupal_goto().

I've attached a new patch just to increment the update handler.

Log in or register to post comments

Comment #18

Bevan commented 29 December 2008 at 22:00

#111744: Add configuration to exclude node types from search indexing was marked duplicate

Log in or register to post comments

Comment #19

salvis

he/his

commented 14 January 2009 at 11:24

Comment Mover for D5 produces nodes with drupal_goto() calls, which cause indexing failures when upgrading to D6 (see #357508: Incompatibility with D6 core Search module?).

I don't know how the Search indexing works exactly, but doesn't it index titles, too. So, if drupal_goto() calls are simply ignored during indexing, Search results would include nodes with drupal_goto() calls, which the user cannot see (because they redirect).

That and the annoyance of having items that will forever be "pending or failed" make this a bad solution, imo. Shouldn't redirecting nodes be skipped during indexing, so that they can never show up in Search results?

Log in or register to post comments

Comment #20

damienmckenna

TN, USA

commented 14 January 2009 at 15:58

I'd go the other way. On a site I run we store all off-site destinations using cck_redirection fields so that we *can* index those pages and be able to search for them based on keywords. A common example would be the 3rd party store service being used, we have a page that is indexed with lots of store-related keywords so if someone searches for those they'll get a link to the store, rather than swearing at the site because they can't find where to buy our merchandise.

Log in or register to post comments

Comment #21

Bevan commented 14 January 2009 at 22:47

#19 and #20; Damien and Salvis, I think you've identified two separate use cases that need solutions;

Content item needs to be indexed so that it can show up in search queries, but doesn't have a unique URL (or it's unique URL should redirect to another website/page).
Content item is for internal / private use and should never be indexed, searched for or viewable. Attempts to view it redirect elsewhere. The search engine needs to be told NOT to index it.

This d.o issue node and patch deals with the first case, but not the second. In the second case, by telling the search engine not to index the content items of a certain type, the issue this patch solves goes away and is not even a problem.

I think there are some threads comments and modules that might allow you to disable indexing of certain types. Try #111744: Add configuration to exclude node types from search indexing and Search config module for starters. Searching d.o may reveal more recent developments.

Log in or register to post comments

Comment #22

salvis

he/his

commented 17 January 2009 at 10:06

Having Search results that you can't actually view (because they drupal_goto() elsewhere) seems odd at first, but I understand your use case now.

Still, having items that will forever be "pending or failed" (and without having any way to find out which items they are) isn't such a great solution.

Maybe drupal_goto() with a status code of 301 could be interpreted as use case 2.

Comment Mover for D6 is not using drupal_goto() anymore, and as far as it is concerned, it seems more appropriate now to convert the D5 nodes to the new mechanism and avoid the issue here.

Log in or register to post comments

Comment #23

rootwork

he/him

English

🇺🇸 US-Pacific 🌎 Chinook (Multnomah, Clackamas) & Cowlitz lands 🌹 Portland, OR

commented 23 January 2009 at 07:51

Is it possible that a bunch of these errors:

warning: Invalid argument supplied for foreach() in /home/modules/taxonomy/taxonomy.module on line 1189.

upon running cron, and a lack of additional search indexing beyond a certain point, could be a symptom of this problem? I've been trying to track down the source of these errors and failed indexing over at #295235: Invalid argument supplied to foreach() in taxonomy.module (line 1189) due to incomplete/partially-deleted nodes but this seems promising.

However, I tried applying the patch and running update, and the errors and failed indexing remain. Any idea if this could be related? The errors and the failed indexing could, of course, be two separate problems.

Is there any way to quickly find out if any nodes have content entered with the PHP filter on them from the database side? I inherited this site with lots of content, and I'm not sure if there are nodes with PHP redirecting, but there might well be.

Log in or register to post comments

Comment #24

salvis

he/his

commented 24 January 2009 at 16:07

@quixoticlife: I don't think so.

Use SQL to check your nodes, something like

SELECT * FROM node_revisions WHERE body LIKE '%drupal_goto%';

Look at the context to verify whether it's PHP code or just text about drupal_goto.

Log in or register to post comments

Comment #25

rootwork

he/him

English

🇺🇸 US-Pacific 🌎 Chinook (Multnomah, Clackamas) & Cowlitz lands 🌹 Portland, OR

commented 2 February 2009 at 07:30

It turned out I did have a page with a drupal_goto function (a cache-clear page left over from D5) but that didn't get rid of the errors. I think they're remnants of old nodes that weren't fully deleted.

Thanks for your help!

Log in or register to post comments

Comment #26

robeano commented 5 May 2009 at 23:03

Status	File	Size
new	286263.patch	7.47 KB

I've attached a new patch which increments the update handler.

Log in or register to post comments

Comment #27

berdir

German

Switzerland

commented 26 June 2009 at 12:13

function drupal_goto($path = '', $query = NULL, $fragment = NULL, $http_response_code = 302) { + // Ignore drupal_goto's within cron.php. + if (variable_get('cron_semaphore', 0)) { + return; + } This will also disable drupal_goto() for every other request while cron is running, I think. Not a good idea...

Log in or register to post comments

Comment #28

douggreen commented 26 June 2009 at 13:16

A drupal_goto during cron is a bad idea... I can't think of any reason why you'd want drupal_goto to work, why is this "not a good idea." Drupal goto will forward you from cron.php to some other page!?

Log in or register to post comments

Comment #29

berdir

German

Switzerland

commented 26 June 2009 at 13:23

#28
Yes, but the cron_semaphore value is saved in the database, if a user does access a site *during* a cron run, drupal_goto() will not work. Instead, it should probably be a static variable, or something like that, so that only the actual cron crun is affected from that check.

Log in or register to post comments

Comment #30

douggreen commented 5 September 2009 at 02:27

Title:	Drupal 6 search indexer never completes when nodes have redirecting php	» Search indexer never completes when nodes have redirecting php
Version:	6.x-dev	» 7.x-dev

updating version to 7.x, we need to solve this before code freeze!

Log in or register to post comments

Comment #31

douggreen commented 5 September 2009 at 02:27

Priority:

Normal

» Critical

Log in or register to post comments

Comment #32

5 September 2009 at 02:35

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Log in or register to post comments

Comment #34

sime

any/any

Melbourne

commented 28 October 2009 at 13:29

Status:

Needs work

» Needs review

We only need to avoid sending headers.

function drupal_goto($path = '', array $options = array(), $http_response_code = 302) {
  ...
  drupal_alter('drupal_goto', $path, $options, $http_response_code);

  // A module may wish to abort redirect by setting $path to FALSE in hook_drupal_goto_alter()
  if ($path === FALSE) {
    return;
  }
  ...
}

(Sorry, can't install D7 on my mamp right now, I would have tried a patch.)

Log in or register to post comments

Comment #35

sime

any/any

Melbourne

commented 28 October 2009 at 13:34

Oh, implicit in the above solution is an appropriate listener. Something like:

function system_drupal_goto_alter(&$path, &$options, &$http_response_code) {  
  if (SOME CHECK THAT CRON IS RUNNING) {
    $path = FALSE;
  }
}

function search_drupal_goto_alter(&$path, &$options, &$http_response_code) {  
  if (SOME CHECK THAT SEARCH IS INDEXING) {
    $path = FALSE;
  }
}

Log in or register to post comments

Comment #36

sime

any/any

Melbourne

commented 28 October 2009 at 13:35

Status:

Needs review

» Needs work

it's not a patch so it needs work still I think

Log in or register to post comments

Comment #37

sime

any/any

Melbourne

commented 28 October 2009 at 13:43

It also occurs to me that this could be fairly marked 'by design' or 'won't fix'.

If you are issuing a redirect in a node, you should be checking for the context in which your node is being displayed. For example, it would be your responsibility to ensure a redirect didn't occur in a preview or in a view.

Log in or register to post comments

Comment #38

Bevan commented 29 October 2009 at 03:35

I believe that if (SOME CHECK THAT SEARCH IS INDEXING) should be something like if (variable_get('cron_semaphore', FALSE))

Log in or register to post comments

Comment #39

Bevan commented 29 October 2009 at 03:48

Duplicate of #38

Log in or register to post comments

Comment #40

berdir

German

Switzerland

commented 29 October 2009 at 07:31

See #27/29 why variable_get() does not work.

Log in or register to post comments

Comment #41

sime

any/any

Melbourne

commented 29 October 2009 at 22:00

@bevan also search indexing can run outside of a cron run. you can do it on the search admin page.
Edit: no I'm wrong about that.

Log in or register to post comments

Comment #42

sime

any/any

Melbourne

commented 30 October 2009 at 02:43

Status:

Needs work

» Needs review

Status	File	Size
new	286263-disabled_redirects_idea.patch	791 bytes

OK. I may have broken some rules here with the use of drupal_static(), but it makes sense to me!

I have some other stupid ideas for this, so just testing the water with this very simple patch.

I actually think it *should* be possible to disable header sending, but all through core header() is being called directly, so such an idea is not a simple patch.

Log in or register to post comments

Comment #43

sime

any/any

Melbourne

commented 30 October 2009 at 02:45

Status	File	Size
new	286263-disabled_redirects_idea.patch	1.7 KB

forgot function context

Log in or register to post comments

Comment #44

sime

any/any

Melbourne

commented 31 October 2009 at 02:34

Title:	Search indexer never completes when nodes have redirecting php	» Make search indexing smart to handle nodes wth php redirects
Version:	7.x-dev	» 8.x-dev
Category:	bug	» feature
Priority:	Critical	» Normal

Because of #37, I believe this is not a critical bug.

Also, because of the hook_drupal_goto_alter(), it is possible to write a workaround module in contrib.

As a feature request it goes to 8.x I presume.

Log in or register to post comments

Comment #45

Dave Cohen commented 20 November 2009 at 17:52

I just noticed this for the first time on a D6 site. I've followed this thread and while solutions sound reasonable for D8, that's long way off and I'm wondering what I can do right now.

My first idea is to put hook_cron() into my own module, weight it to come before search_cron(), and in that hook I'll print something, so that it will too late for search_cron() to set headers. Haven't tried that yet, but I'll let you know if it works...

Log in or register to post comments

Comment #46

sime

any/any

Melbourne

commented 22 November 2009 at 10:36

Hey Dave, not sure if it's useful, but when I've had this problem it's been with code in the node body. I've wrapped the problem code in an if statement to only run if we're on node/% path.

Log in or register to post comments

Comment #47

cledman commented 14 December 2009 at 19:30

hi... could you help me?
it looks like your function is missing some parts like:

drupal_cron_run();
}
else {
Index: includes/common.inc

(where ends this else?) or..
function drupal_goto($path = '', array $

drupal_alter('drupal_goto', $path, $options, $http_response_code);

(where this function starts?)

Log in or register to post comments

Comment #48

cledman commented 15 December 2009 at 09:42

hi.. i solve my problem, but could you complete my question latter?

Log in or register to post comments

Comment #49

pexxi commented 29 December 2009 at 21:33

My simple workaround...

At top of drupal_cron_run():

define('DRUPAL_CRON_MODE', TRUE);

At top of drupal_goto():

if (defined('DRUPAL_CRON_MODE'))
{
 return;
}

This allows detection of 'cron-mode' anywhere in PHP-code with minimum of added code and doesn't limit users running out-of cron-mode.

Log in or register to post comments

Comment #50

jaypabs commented 11 January 2010 at 07:15

Is the patch applicable to v6.15? I am experiencing the same problem also. When I disabled search module cron execute successfully.

I cannot determine what is the correct answer from the comments above.

Log in or register to post comments

Comment #51

As If commented 19 January 2010 at 00:11

I just used sime's method on a 6.13 site. It works fine. It doesn't attempt to solve the bigger problem of the indexer executing drupal_goto(); instead it simply makes sure the indexer doesn't get a chance to.

All you have to do is edit every page or block where you use drupal_goto(), and make sure it only gets called if arg(0) == 'node'.

Log in or register to post comments

Comment #52

dave reid

he/him

English

Nebraska USA

commented 20 January 2010 at 15:17

There are an infinite number of things I could put in PHP. I could use header(). I could use lots of other things that could break search.module when indexing. I also don't see how it would be possible to stop drupal_goto() by using hook_drupal_goto_alter(). There are no checks after the alter hook is run and it calls header() no matter what.

Log in or register to post comments

Comment #53

As If commented 20 January 2010 at 20:28

I don't know if you're responding to me or to the earlier thread, but you don't need to do anything in a hook function. When I said "Sime's method" I simply meant: Wrap the goto (or the location header code, or anything you do that breaks the indexer) inside an if(arg(0) == 'node'). That's all you need to do.

Log in or register to post comments

Comment #54

Dave Cohen commented 22 January 2010 at 05:32

Here's a hack I'm planning to use. Seems to work so far. Drawback is you get PHP warnings about headers already sent when running cron.php.

I'm not suggesting this as a permanent fix for drupal, just a workaround. The diff below shows a few lines that I added to the beginning of cron.php.

Index: cron.php
===================================================================
--- cron.php	(revision 2124)
+++ cron.php	(working copy)
@@ -6,6 +6,11 @@
  * Handles incoming requests to fire off regularly-scheduled tasks (cron jobs).
  */
 
+header('Content-Type:text/html; charset=utf-8');
+header("Cache-Control: max-age=1, must-revalidate");
+print("\n\n"); flush();
+
+
 include_once './includes/bootstrap.inc';
 drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
 drupal_cron_run();

Log in or register to post comments

Comment #55

Dave Cohen commented 26 January 2010 at 23:09

Has anyone noticed you can have the same problem in both search indexing and on the search results page?

That is, if your node does something wacky like a drupal_goto() or add javascript to the page or anything like that, code will run when rendering search results, too. I only encountered this after applying my patch above, because now the contents of these nodes is actually being indexed.

So, I like sime's idea more and more, as a workaround. I'd like to see core somehow prevent this. Maybe not indexing nodes with a php input format by default? I'm not sure the best solution; it's tricky.

Log in or register to post comments

Comment #56

As If commented 26 January 2010 at 23:29

> Maybe not indexing nodes with a php input format by default?

Yeah, I thought of that too, but many people (including myself on occasion) are using PHP to generate or manipulate some actual content. So that content wouldn't get indexed either. This *has* to be the reason we are even bothering to index PHP pages at all.

My next thought was "index only the *output* of PHP statements" - i.e., only the stuff that gets echo'd or print'd to the screen. We would execute the PHP during indexing (which we seem to already be doing), and I imagine then we would capture the output to a temporary table and index *that*. But obviously we are talking about major core-hacking at this point.

Log in or register to post comments

Comment #57

Dave Cohen commented 27 January 2010 at 00:46

I think it's reasonable to say, if the content is dynamic (i.e. generated from PHP input filter) then don't include it in search. As with all great ideas, not everyone will agree. ;)

One way to implement that would be to enhance the input filter API. Now, input filters can declare themselves as dynamic. We can make use of that, or add a way for them to declare "exclude me from search." Then in filter.module, add a check for whether its a search request and if so, return empty string. To implement this would require changes to format.module and search.module (so that format module would have some way of knowing whether the current request is to build search index or search results.

An almost-as-good approach is to enforce this only for content search. The node.module already keeps track of whether the request is search related (in $node->build_mode), so the change there could be quite small. Something like the following diff, which looks more complicated than it is; it's really just an if clause added to node_build_content():

Index: modules/node/node.module
===================================================================
--- modules/node/node.module	(revision 2124)
+++ modules/node/node.module	(working copy)
@@ -1068,16 +1068,24 @@
     $node->build_mode = NODE_BUILD_NORMAL;
   }
 
-  // Remove the delimiter (if any) that separates the teaser from the body.
-  $node->body = isset($node->body) ? str_replace('<!--break-->', '', $node->body) : '';
-
-  // The 'view' hook can be implemented to overwrite the default function
-  // to display nodes.
-  if (node_hook($node, 'view')) {
-    $node = node_invoke($node, 'view', $teaser, $page);
+  if (($node->build_mode == NODE_BUILD_SEARCH_RESULT ||
+      $node->build_mode == NODE_BUILD_SEARCH_INDEX) &&
+      !filter_format_allowcache($node->format)) {
+    // Omit dynamic content from search.
+    $node->body = '';
   }
   else {
-    $node = node_prepare($node, $teaser);
+    // Remove the delimiter (if any) that separates the teaser from the body.
+    $node->body = isset($node->body) ? str_replace('<!--break-->', '', $node->body) : '';
+    
+    // The 'view' hook can be implemented to overwrite the default function
+    // to display nodes.
+    if (node_hook($node, 'view')) {
+      $node = node_invoke($node, 'view', $teaser, $page);
+    }
+    else {
+      $node = node_prepare($node, $teaser);
+    }
   }
 
   // Allow modules to make their own additions to the node.

I doubt consensus could ever be reached to make a change like this, but I plan to use something like this on my sites.

hack core, fix core - it's in the eye of the beholder.

Log in or register to post comments

Comment #58

manderson725 commented 1 March 2010 at 18:01

Apparently I posted on the wrong issue. My issue is that i cannot run cron from the browser when Search module is enabled. I get the below error.

Fatal error: Class 'view' not found in /var/www/drupal6/includes/common.inc(1685) : eval()'d code on line 3

Disabling Search allows cron to run, but i'm not sure what the error indicates, a view with bad php code? The comments above seem like a great discussion but I don't understand enough of what's going on to fix it without good directions. I would appreciate any help/suggestions where to look/how to get around this.

Log in or register to post comments

Comment #59

jhodgdon

she/her

English

commented 3 March 2010 at 01:23

Your issue sounds similar: If you have a node that is doing some kind of a PHP evaluation, and the PHP evaluation fails when the node is indexed for searching, cron will stop. If you disable the search module, the node will not be indexed, the PHP evaluation will not fail, and cron will run.

Log in or register to post comments

Comment #60

jhodgdon

she/her

English

commented 3 March 2010 at 01:24

Version:	8.x-dev	» 7.x-dev
Category:	feature	» bug

I don't know why this was changed from a bug report to a feature request. I'm putting it back to being a bug report.

Log in or register to post comments

Comment #61

dave reid

he/him

English

Nebraska USA

commented 3 March 2010 at 02:03

Because it is a feature request? Currently we don't special case anything if the node has PHP content, so if you have improper PHP, that's a bug in your node, not the search module. Since we're having to add functionality to avoid bugs in node content, not actual Drupal code, it sounds like a feature request to me.

Log in or register to post comments

Comment #62

jhodgdon

she/her

English

commented 3 March 2010 at 02:05

Version:	7.x-dev	» 8.x-dev
Category:	bug	» feature

OK.

Log in or register to post comments

Comment #63

dave reid

he/him

English

Nebraska USA

commented 3 March 2010 at 02:10

Yeah, let's really tackle this in D8 properly. At least we have contrib solutions we can explore with the new APIs in D7.

Log in or register to post comments

Comment #64

Dave Cohen commented 3 March 2010 at 02:14

One could argue this is a bug. Since PHP content is dynamic, the node's content may change based on who's viewing the node and when. So what the search module indexes may be correct for anonymous users but not registered ones. In such a case, the search will return improper results to those registered users. Drupal knows the content is dynamic, and so chooses to leave it out of the filter cache. Perhaps Drupal should leave dynamic content out of the search index as well. (Essentially what I try to do in comment #57.)

Log in or register to post comments

Comment #65

damienmckenna

TN, USA

commented 3 March 2010 at 02:53

Could it be handled by a the following rules:

If you need to do a redirect *ALWAYS* use drupal_goto()
drupal_goto() will always be made smart enough to handle common use cases (would it be too late to add a hook to it?).
If you don't use drupal_goto() you're on your own.

??

Log in or register to post comments

Comment #66

jhodgdon

she/her

English

commented 3 March 2010 at 15:44

RE #64: that is an interesting idea, but I still think I agree with Dave Reid that it's a feature request that search should do better with dynamic content.

RE #65: Yes, it's probably too late to do something smart with drupal_goto() for Drupal 7. I'm not sure what "smartness" it would need, though?

Log in or register to post comments

Comment #67

damienmckenna

TN, USA

commented 3 March 2010 at 15:51

jhodgdon: I wonder if maybe a tiny hook to allow for overriding whether goto's are acted on could be sneaked under the radar? =)

Log in or register to post comments

Comment #68

jhodgdon

she/her

English

commented 3 March 2010 at 16:22

What would drupal_goto() do if it was told not to act -- what rendering would happen for this node, and what would be indexed or returned in searching? I think this needs more thought than a sneaking in under the radar...

Log in or register to post comments

Comment #69

Junro commented 28 March 2010 at 23:11

subscribe

Log in or register to post comments

Comment #70

Todd Young commented 9 April 2010 at 14:20

subscribe

Log in or register to post comments

Comment #71

icecreamyou commented 22 April 2010 at 05:28

How about something like this?

foreach(debug_backtrace() as $a) {
  if ($a['function'] == 'drupal_cron_run') {
    return;
  }
}

Log in or register to post comments

Comment #72

izmeez commented 1 May 2010 at 20:56

subscribing

Log in or register to post comments

Comment #73

kmillecam commented 7 May 2010 at 23:42

Hmm,

This arg(0) fix isn't working for me. Indexing continues to stop when it hits the node that has this in its body:

<?php
  if (arg(0) == 'node') {
    $referer = explode("?", referer_uri());
    if ($referer[0] == 'http://example.com/dmc-testdrive') {
      print '<p>Thank you for evaluating the Dell Management Console. You may obtain the evaluation Virtual Machine (.VHD) by downloading following files and executing the self-extracting executable:</p>';
      print '<ol>';
      print '<li><a href="/files/3/dmstestdrive/DMC VHD/DMC 1.1.1 VHD.part01.exe">DMC 1.1.1 VHD.part01.exe (700MB)</a></li>';
      print '<li><a href="/files/3/dmstestdrive/DMC VHD/DMC 1.1.1 VHD.part02.rar">DMC 1.1.1 VHD.part02.rar (700MB)</a></li>';
      print '<li><a href="/files/3/dmstestdrive/DMC VHD/DMC 1.1.1 VHD.part03.rar">DMC 1.1.1 VHD.part03.rar (700MB)</a></li>';
      print '<li><a href="/files/3/dmstestdrive/DMC VHD/DMC 1.1.1 VHD.part04.rar">DMC 1.1.1 VHD.part04.rar (700MB)</a></li>';
      print '<li><a href="/files/3/dmstestdrive/DMC VHD/DMC 1.1.1 VHD.part05.rar">DMC 1.1.1 VHD.part05.rar (700MB)</a></li>';
      print '<li><a href="/files/3/dmstestdrive/DMC VHD/DMC 1.1.1 VHD.part06.rar">DMC 1.1.1 VHD.part06.rar (231MB)</a></li>';
      print '</ol>';
    }
    else {
      drupal_goto($path = '/node', $query = NULL, $fragment = NULL, $http_response_code = 302);
    }
  }
?>

Log in or register to post comments

Comment #74

As If commented 8 May 2010 at 05:21

I'm wondering if your first line is messing things up somehow, because it is getting executed. Why are you saying if(arg(0) == 'node') anyway? Why can't you just say if($page)?

Log in or register to post comments

Comment #75

kmillecam commented 9 May 2010 at 21:24

@As If -- I was just wrapping my PHP function in the if statement described (a couple of times) earlier in this thread.

@ all, I did end up using a solution described here (hack to filter.module) that fixed the problem: http://drupal.org/node/361171

Log in or register to post comments

Comment #76

As If commented 12 May 2010 at 02:50

Oh, I thought you were showing us the raw body (prior to the fix). Hm. I cannot see why that didn't work.

So in your new alternative, you are simply looking for format == 3 and bypassing those?

Log in or register to post comments

Comment #77

Bevan commented 12 May 2010 at 02:14

A possible solution for people with this issue is http://drupal.org/project/search_config

Log in or register to post comments

Comment #78

arhak commented 28 May 2010 at 13:41

#68 as an example, wikipedia got its own way to handle no-redirects

Log in or register to post comments

Comment #79

d_arc commented 14 July 2010 at 08:05

subscribe

Log in or register to post comments

Comment #80

ikeigenwijs commented 12 April 2011 at 14:20

if(arg(0) == 'node'){
search breaking code

}

Is the only one that did the trick for me.

thx a lot

Log in or register to post comments

Comment #81

ikeigenwijs commented 11 May 2011 at 08:29

I was wrong, did not work either, the search continues.

Log in or register to post comments

Comment #82

Gabriel R. commented 14 July 2011 at 16:42

A bug from 2008 is a feature request in 2011.
Oh Drupal, this is so you. I love you, you schyzo.

Here's an easy fix for those running cron in CLI server side. This will check that:
- the user can administer nodes, so you could see the page and its EDIT button.
- the page is open in a browser, i.e. SERVER_ADDR is set.

if ( ! (user_access('administer nodes') || empty($_SERVER['SERVER_ADDR']) ) ) drupal_goto('node/123');

Log in or register to post comments

Comment #83

jonathan1055 commented 19 July 2011 at 19:27

Subscribe.
... and apologies to everyone who is going to get an e-mail telling them this. If there was a better way to subscribe to a thread then I would do it.

Log in or register to post comments

Comment #84

DWB Internet commented 12 October 2011 at 07:22

#43: 286263-disabled_redirects_idea.patch queued for re-testing.

Log in or register to post comments

Comment #85

12 October 2011 at 07:23

Status:

Needs review

» Needs work

The last submitted patch, 286263-disabled_redirects_idea.patch, failed testing.

Log in or register to post comments

Comment #86

damienmckenna

TN, USA

commented 7 November 2011 at 04:26

The Field Redirection module may help on D7 sites by allowing the redirection functionality to be handled on specific view modes, rather than for all uses of a given field.

Log in or register to post comments

Comment #87

damienmckenna

TN, USA

commented 9 July 2012 at 20:53

Patch #43 needs to be rerolled, and, honestly, rewritten.

Log in or register to post comments

Comment #88

ianthomas_uk

he/him

English

Brighton, UK

commented 7 January 2014 at 23:26

Version:	8.x-dev	» 7.x-dev
Issue summary:	View changes

The php filter module has been removed from Drupal 8 for exactly this sort of reason - it's too powerful and can't be locked down.

I'll set this back to D7 in case anyone wants to work on a 'fix', but my recommendation to anyone encountering this problem is to disable the php filter module and find alternative ways to do what you're trying to do (e.g. write a custom module).

Log in or register to post comments

Comment #89

ianthomas_uk

he/him

English

Brighton, UK

commented 7 January 2014 at 23:31

Status:

Needs work

» Closed (duplicate)

Actually this is a duplicate of #752184: Keep running cron when bad PHP content is encountered

Log in or register to post comments

Comment #90

bburg

he/him

Washington, DC

commented 27 May 2015 at 18:48

Commenting here since this issue seems to be highest search result on this particular bug. I frequently come across older sites with this problem when I trying to reindex the site's content. Since this is a manual reindexing, and not through cron, the redirect still happens. Changing checks to look for the "batch" path item seems to avoid redirecting during patch API processing.

if( $_SERVER['SCRIPT_NAME'] != '/cron.php' && arg(0) != 'batch' ) {
  drupal_goto("somewhere");
}

Log in or register to post comments

Make search indexing smart to handle nodes wth php redirects

Comments