As suggested earlier (http://drupal.org/node/22035) I want to open an issue to backport the performance fix to 4.6

We have a url_alias table of 8,000 entries / 1M and now even when we turn off path & pathauto modules it is still being fully loaded, slowing the site to a crawl (15-40 sec load)

This fix is desperately needed, I am sure not only by us. 4.7 is way out there (especially with contrib modules) and we are already dependent on these paths for search traffic.

Please help !!!

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Adrian Freed’s picture

I second the motion. This is a showstopper for many drupal projects for me including a million+ node site.

dkruglyak’s picture

Title: Backport performance fix to 4.6 » Backport path alias performance fix to 4.6
Project: Pathauto » Drupal core
Version: 4.6.x-1.x-dev » 4.6.5
Component: Code » base system

Re-filing under Drupal base system. Also note a discussion on some benchmarks in our case: http://drupal.org/node/43431

dkruglyak’s picture

Found OLD patch, have questions. It is introduced here http://drupal.org/node/22035#comment-29593, with disclaimer that it is broken.

Could you help me understand what version that patch applies to? HEAD/4.7? 4.6.x? How different is it from 4.6.5? I can try hacking but any guidance is much appreciated. Maybe you could point me to which functions to work on and where to start modifying 4.6.5...

Jeremy’s picture

Assigned: Unassigned » Jeremy
Status: Active » Needs review
FileSize
7.78 KB

Here's a backport from CVS HEAD that applies to 4.6.5. It works well in my own testing, but I certainly suggest that you test it first before using it in a production environment.

Feedback of how it helps on busy 4.6 sites would be welcome. It'd also be interesting to see some benchmark results.

dkruglyak’s picture

I just finished my own try to do the backport and found a bug in the original code, which also got carried into the patch.

Many of the aliases on my system do not get looked up. When rendering links, a call to drupal_get_normal_path before drupal_get_path_alias corrupts the map. I fixed the drupal_lookup_path function and rewrote it for greater efficiency by getting rid of array_search with two maps. Frontpage load improves 20s->0.7s.

Please review for commit to both 4.6.5 patch & HEAD:

function drupal_lookup_path($action, $path) {
	static $src_map = array();
	static $dst_map = array();
	static $numset = NULL;
	
	if ($numset === NULL) {
		$numset = db_result(db_query('SELECT COUNT(pid) FROM {url_alias}'));
	}
	
	if ($action == 'wipe') {
		$src_map = array();
		$dst_map = array();
	}
	elseif ($numset > 0 && $path != '') {
    if ($action == 'source') {
		if (isset($src_map[$path])) {
			return $src_map[$path];
		}
		if ($alias = db_result(db_query("SELECT dst FROM {url_alias} WHERE src = '%s'", $path))) {
			$src_map[$path] = $alias;
			$dst_map[$alias] = $path;
			return $alias;
		}
    }
    elseif ($action == 'alias') {
		if (!isset($dst_map[$path])) {
			if ($src = db_result(db_query("SELECT src FROM {url_alias} WHERE dst = '%s'", $path))) {
				$src_map[$src] = $path;
				$dst_map[$path] = $src;
				return $src;
			}
		}
		}
	}
	
  return FALSE;
}
dkruglyak’s picture

If you are applying this patch, do not forget to build index on src column in url_alias table.

chx’s picture

dkruglyak , you need to read http;//drupal.org/patch

dkruglyak’s picture

Thanks for the link. I am still not sure how to properly submit a "patch-of-a-patch". Could you provide any pointers on that?

In any case, the code is available for review / testing, it is just one function changed.

chx’s picture

it's easy: apply the patch, change what you think needs changed and provide a new patch.

Jeremy’s picture

FileSize
8.02 KB

Dkruglyak, your changes didn't work on my test server but I updated my patch to use two maps following your claim that this offers a dramatic improvement in performance.

Is anyone willing to benchmark Drupal 4.6.5 versus Drupal 4.6.5 + my first patch versus Drupal 4.6.5 + the attached patch?

BTW: Both patches update update.inc, so the easiest way to update your database is to run update_132 from update.php. (How many times can you use update in a sentence?)

dkruglyak’s picture

Sorry, I think my code broke "as is" since I renamed $map only in one function, but it is used in other places as well. To fix another bug you should remove these lines from bootstrap.inc :

+        else {
+          $alias_map[$path] = $path;
+          $system_map[$path] = $path;
+        }

The problem is that the first call to drupal_get_normal_path for "node/xxxx" populates maps with "node/xxxx":"node/xxxx" pairs. Subsequent call to drupal_get_path_alias instead of fetching the alias from DB gets alias equal to normal path from the map.

This sequence of calls is not invoked for every link, but is obviously a bug. Should be fixed both in 4.6 and HEAD.

Jeremy’s picture

Can you detail how to trigger the bug you're seeing?

It looks to me like the code is doing the right thing. It first does a db_query to see if there's an alias for the current path. Only if it doesn't find an alias in the url_alias table does it trigger the logic you mention above which prevents it from doing additional db_queries on the same path.

dkruglyak’s picture

The code is not doing the right thing. When drupal_get_normal_path is called for a normal alias the maps get corrupted with normal/normal pair, even when alias exists.

I originally discovered problem with links in "Recent blog posts" block. The problem does not show inside regular blog posts, maybe because of different call sequence.

matt westgate’s picture

FileSize
5.03 KB

This patch takes the current working path aliasing for Drupal 4.7 and backports it to 4.6. You'll need to run update.php to add an index to the url_alias table.

This patch is a lifesaver for sites with thousands of aliases. The main difference is that aliases are loaded on demand rather than trying to cache the entire path table into memory for each request (i.e., many small, fast queries vs one huge, slow query).

Adrian Freed’s picture

Matt, I didn't have much luck with that patch. I put the changes in carefully by hand and noticed that the mysql update
ALTER TABLE url_alias ADD INDEX ('src') worked by hand without the quotes around src.

Also my path.module has a call to drupal_rebuild_path_map() which was removed by your patch.

moshe weitzman’s picture

I added a note to the pathauto project page which links here. I fear that lots of sites have grown slow because of this bug.

matt westgate’s picture

FileSize
5.05 KB

New patch attached which should fix the above issues.

Bill St. Clair’s picture

Matt,

Just installed your path_alias_backport2.patch in my Drupal 4.6.5 installation. I did a test import of over 2,600 nodes from the HTML files of my old site. It got very slow, on my 2 GHz Pentium M laptop. The patch sped it right back up.

Thank you, thank you, thank you.

dopry’s picture

Status: Needs review » Reviewed & tested by the community

Tested with 4.6.5, upgrade goes fine. no apparent bugs.

Looking at the code, moving to selecting and caching the specific record needed seems like a vast improvement over selecting them all at one. Seems like it will save processing time and memory. I don't have a lot of aliases to benchmark against, so can't provide hard numbers.

Setting to ready to commit since its already accepted in 4.7 and applies cleanly to 4.6.5.

Adrian Freed’s picture

I have had good luck with the patch.
It does result in lots of fast querie:there are few hosting services out there crazy enough to meter
mysql requests so a small minority of users may be inconvenienced.

Dries’s picture

Status: Reviewed & tested by the community » Closed (won't fix)

Drupal 4.6 is frozen.

dgrant’s picture

What happens when 4.7 is finally released and I want to upgrade? Do I lose the new "function update_132()" that you replaced? Actually, I just checked Drupal 4.7 updates.inc file and update_132 happens to be a postgresql-only update, so it won't affect me if this update doesn't get applied since I don't use postgresql.

Marandb’s picture

I applied the patch & it made quite a bit of difference on my site. (4.6.5)

Just a note: the patch would NOT run for me using stock 4.6.5 files. (Failed, Failed, Failed) I had to patch each file manually. I hope it is ok that I attached a zip with all the patched files (4.6.5 Stock). Patching is a PAIN in the rear for those of us who are not Unix smart / web developers.

Anyway that said, here are a few quick before & after stats.

HomePage - 189 queries
Before: 269.98ms
After: 60.26ms

Forum Topic - 401 queries
Before: 574.00ms
After: 70.41ms

Page - 160 queries
Before: 424.58ms
After: 52.45ms

-- Marand B.

yongrokc’s picture

It's first time I'm trying to update the database of my first installation of drupal, 4.6.5.
But, I can't seem to find update.php... Where is it and how do I go about using it?

Thanks,
Andy

njehlen’s picture

Category: task » bug

I tried using this patch (specifically, I used the files in the attachment that Marandb included) when I installed pathauto, but i get this error:

Fatal error: Call to undefined function: filter_xss_bad_protocol() in /home/action/public_html/ace/includes/common.inc on line 630

Any ideas what I might be doing wrong?

chx’s picture

You are running an outdated 4.6.x , update to latest 4.6.x.

insomoz’s picture

This works, I have just tested, I can now finally edit, delete, add content, without having to worry about my running like a snail.

Thanks to the person who released this script.
Load that would hit 30+ and cliimb is now a steady 0.01

venkat-rk’s picture

Insomoz, did you run the patch attached to comment #17 or comment #23?

asb’s picture

Status: Closed (won't fix) » Fixed

I applied the patch submitted by Jeremy on January 10, 2006 - 03:08 and ran update.php after this; everything seems to be working fine.

The patch has solved a major performace issue for me; page loading times have decreased from approx. 16-20 sec. to 1-2 sec. or less; accessing the admin menu took 20-30 secs before and takes now just 2 secs. Editing the menu system was impossible because of timeouts (90 sec. in php.ini) and now takes approx. 30 sec. and - more importantly - makes editing the menus possible again.

Thank you VERY MUCH for this patch!

Regards -asb

klance’s picture

Thanks to matt westgate for this patch. My site is running much faster, though the menu system is still very slow. Just wanted to point out that when I applied the patch, i18n broke, so I made similar alterations to i18n_get_normal_path($path) so that it wouldn't complain anymore. Seems to have done the trick.

klance’s picture

I was a little hasty on that verdict, and there are a few mistakes in the patch I submitted. When accessing language-specific administration pages, the previous patch leads to a Page Not Found error for URLs preceded by a locale. This patch fixes that problem, and I have tested it thoroughly.

jasonwhat’s picture

Clearer instructions? There are a bunch of different patches going on here and it might be helpful to have some specific instructions of what files need patched. The path.module file does not get patched right?

Rather than applying the patch, I used the already patched files and replaced my old ones. I'm on a CivicSpace install. However when I went to update.php there was "no update". (note: my install had no updates.inc file) Is there something I can do to change this? Or does someone have the SQL statements so I could just apply them directly through php admin? Will using this patch cause complications when upgrading to 4.7?

RobRoy’s picture

I think that last patch from matt westgate (comment #17) is the money one. I just patched my Drupal 4.6.5 and it works fine. If you don't have updates.inc, all you need to do is add an INDEX on the src field.

ALTER TABLE url_alias ADD INDEX (src);
kc’s picture

I used the files from #17 but I get this as I try to acess my admin/categories with 7000 terms

Maximum execution time of 30 seconds exceeded in /home/xyz/public_html/includes/bootstrap.inc on line 389

same when I try to access my sitemap with all 7000 categories w/aliases

Stefan Nagtegaal’s picture

Is this performance fix currently in HEAD or not? if not, I'll make a patch for it..
Sorry for asking, but I do not have time (this moment) to check if it's present myself...

pamphile’s picture

I know this question does not belong here, but inquiring minds want to know...
How do you apply a Drupal patch ? Where can I read more on this ?

see:
http://drupal.org/diffandpatch
http://drupal.org/node/19036

nathandigriz’s picture

Does this patch still work on version 4.6.6? upgrade to 4.6.6 after applying the patch?

mediafrenzy’s picture

I'd also like to know which patch works for 4.6.6? I'm a bit confused by the number of different patches being discussed..

Thanks very much.

dkruglyak’s picture

Title: Backport path alias performance fix to 4.6 » Consolidate Patches: Backport path alias performance fix to 4.6
Status: Fixed » Needs work

It appears that there are two critical patches making significant changes to path / alias processing. They should be consolidated for 4.6.5 / 4.6.6, since they touch the same files and I am not sure how to do that.

I created a new issue here: http://drupal.org/node/56972

tng@neuralgourmet.com’s picture

OK. I have a problem. I applied Matt's patch from post #17 to a Drupal 4.6.5 system without any errors. This took care of the slow page loads beautifully. Unfortunately, after I applied the patch this error would be thrown every time anyone posted anything:
Fatal error: Unknown function: drupal_rebuild_path_map() in /home/xxxxxx/public_html/modules/path.module on line 148

I reversed the patch and everything is back to the way it was before -- including slow page loads.

What would be my best course of action? Can I just take path.module from 4.6.6 and copy it into modules/ ?

Thanks for any suggestions in advance.

eurokate’s picture

I applied the path alias patch and also the i18n patch above. My site sped up noticeably, but the i18n module was still broken. When trying to access any nodes with a language prefix I got a page not found error. Is the i18n patch submitted by klance correct? I couldn't make sense of it.

chx’s picture

Version: 4.6.5 » 4.6.6
Assigned: Jeremy » chx
Status: Needs work » Reviewed & tested by the community
FileSize
4.77 KB

This patch is badly needed though the patch was not something that likely could get into 4.6 but now it is. Why? Because this version does not change any API because that's not acceptable. This is now only a performance fix and nothing else. I have not removed drupal_get_path_map (though now it's unused by core) nor I changed conf_url_rewrite. This is the way a backport should be done for Drupal.

I have tested with and without conf_url_rewrite.

Patch is against DRUPAL-4-6.

bradlis7’s picture

Path seems to be working fine for me.

+1

bradlis7’s picture

with the current patch applied I mean

xeeya’s picture

Sadly I`m getting the following error when I applied the above latest patch for 4.6. I`ve also run update.php after the patch.


Warning: main(includes/bootstrap.inc): failed to open stream: Permission denied in C:\Programme\xampp\htdocs\frightening\index.php on line 12

Warning: main(): Failed opening 'includes/bootstrap.inc' for inclusion (include_path='.;C:\Programme\xampp\php\pear\') in C:\Programme\xampp\htdocs\frightening\index.php on line 12

Fatal error: Call to undefined function: drupal_page_header() in C:\Programme\xampp\htdocs\frightening\index.php on line 13

:(

xeeya’s picture

Sorry, of course I meant 4.6.6 . The drupal installation runs on a local machine.

chx’s picture

xeeya, your report has nothing to do with this patch, your webserver can't read bootstrap.inc , you reuploaded wrong or whatever. Please do not litter this issue further.

beginner’s picture

Version: 4.6.6 » 4.6.8

The patch still applies cleanly.
in #21 dries said he wouldn't fix. Chx: have you talked to him?

Are patches still applied to the 4.6 branch?
http://drupal.org/project/issues?projects=3060&versions=11555,11262,1023...
With this one, there are 3 RTBC patches that have been sitting for weeks.

chx’s picture

Yes, I talked with him and my patch is not the same as it was before -- mine has no API changes.

magico’s picture

When will this patch be applied to 4.6?

Dries’s picture

Status: Reviewed & tested by the community » Closed (won't fix)

We're not going to backport this to 4.6. It's not a bugfix.