Broken autoloader: convert includes/requires to use absolute paths
dopry - May 17, 2008 - 16:12
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | base system |
| Category: | bug report |
| Priority: | critical |
| Assigned: | Unassigned |
| Status: | fixed |
Description
There are a couple stat calls involved every time you call include, more if you're using relative paths, which Drupal does. This patch adds absolute paths to all core includes and a new global $drupal_root.
| Attachment | Size |
|---|---|
| drupal_asolute_paths.patch | 36.11 KB |

#1
Is there really a notable difference performance-wise between './foo.inc' and realpath(__FILE__) .'/foo.inc'? I know that not having the leading . causes full PATH traversal, which is bad, but I'm not aware of a difference in the above. Can you provide benchmarks?
I really really really don't want to introduce another global variable unless it really really really makes a difference; and even then, it should be a constant, not a global variable.
#2
The gain is minimal but there is on there. Using an absolute path saves a getcwd system call for every require_once. It also save having to open and stat the file for subsequent require_once calls. This applies to include_once as well. For the single require_once the difference is minimal, but there.
You will also see a greater impact for files that may get included multiple times, which I believe happens in the theme layer for D6. There will also be similar benefits for sites running APC with apc.stat=0.
The following tests are system level and really only detail what is happening under the hood as proof of my statements. The real impact test will come with benchmarking, and will probably be more relevant to disk i/o constrained systems that rely heavily on file templates, or servers with large numbers of open files.
The work is based on a presentation by the php 5.2 maintainer at http://ilia.ws/files/phptek2007_performance.pdf regarding performance tuning PHP.
<?phprequire_once('./includes/install.inc');
require_once('./includes/install.inc');
then strace php test.php... it results in:
Then try it with absolute paths...
<?phprequire_once('/path/to/install.inc');
require_once('/path/to/install.inc');
the strace output is:
The absolute path version has two less cwd calls and 1 less open call and 1 less fstat64 call...
Some system level benchmarks suggest require_once performance with absolute paths is faster...
<?php
$name = 'require_once(relative path)';
$iterations = 1000;
$start = microtime(true);
for ($x = 0; $x < $iterations; $x++) {
require_once('./junk.php');
}
$time = microtime(true) - $start;
echo "$name -> ($iterations iterations): $time\n";
$name = 'require_once(absolute path)';
$iterations = 1000;
$start = microtime(true);
for ($x = 0; $x < $iterations; $x++) {
require_once('/home/anarkh/public_html/drupal-head/includes/junk.php');
}
$time = microtime(true) - $start;
echo "$name -> ($iterations iterations): $time\n";
$name = 'require_once(relative path)';
$iterations = 10000;
$start = microtime(true);
for ($x = 0; $x < $iterations; $x++) {
require_once('./junk.php');
}
$time = microtime(true) - $start;
echo "$name -> ($iterations iterations): $time\n";
$name = 'require_once(absolute path)';
$iterations = 10000;
$start = microtime(true);
for ($x = 0; $x < $iterations; $x++) {
require_once('/home/anarkh/public_html/drupal-head/includes/junk.php');
}
$time = microtime(true) - $start;
echo "$name -> ($iterations iterations): $time\n";
require_once(relative path) -> (1000 iterations): 0.0175869464874
require_once(absolute path) -> (1000 iterations): 0.00283122062683
require_once(relative path) -> (10000 iterations): 0.21484708786
require_once(absolute path) -> (10000 iterations): 0.073392868042
#3
here is a revised version with constants, and some minor spacing issues cleaned up. I guess I'll setup a benchmarking environment if no one else has one setup.
.darrel.
#4
so far benchmarking a base install with AB isn't very telling one way or the other. I did however get a clean room benchmarking environment set up this morning... I got two servers on their own switch with nothing else to do, yeah! I'll populate my installations with some data tonight and pull out siege and see how things go.
#5
The microbenchmarks look like the absolute path is definitely faster, but I'm curious why ab wouldn't show the same. Perhaps it's a case of it not making enough of a difference in actual practice?
If siege sees a difference, then let's go for it.
#6
Seems worth the change.
It's a bit more learning to always prefix.
Should we maybe do this?
<?php
define('DRUPAL_INCLUDE', 0);
define('DRUPAL_REQUIRE', 1);
function drupal_include($path, $type = DRUPAL_INCLUDE) {
switch ($type) {
case DRUPAL_INCLUDE:
include_once(DRUPAL_ROOT . '/' . $path);
break;
case DRUPAL_REQUIRE:
require_once(DRUPAL_ROOT . '/' . $path);
break;
}
}
?>
#7
What actual purpose would switching on include/require serve? Drupal breaks if you try to do either one and it fails; it just breaks differently. :-)
#8
Well, good point. Maybe they should all be requires? My thought was just to introduce a new function to call to avoid having to add the new constant everywhere we do an include.
#9
There is also another reason for using absolute paths:
As mentioned on http://php.net/session_set_save_handler, the current working directory may change during script termination (try adding
var_dump(getcwd());to sess_write() in includes/session.inc). When the current working directory is different from the Drupal root, include_onces with relative paths will fail.If Drupal tries to load new files during shutdown, errors like the following appear in the Apache error log:
So the suggested change is not only a performance optimization but also a bug fix.
#10
How many places in core actually still use require and include directly, now that the code registry is in place? Isn't all of the code-loading centralized already, not requiring a drupal_include() API function?
Edit: grep shows a lot of places; forget what I said.
#11
The patch consists mainly of lines beginning
include_once()orrequire_once(). There are many of them.#12
As an alternative to defining DRUPAL_ROOT in each top-level .php file, you could use $_SERVER[DOCUMENT_ROOT] instead.
#13
Constants should be defined only once, maybe in bootstrap.inc.
#14
Patch cannot be applied to HEAD.
#15
Raising this to critical, because it breaks PDO object loading in sess_write.
#16
I'm working on that.
#17
Here is a proper patch. Changes:
* Standardize all code style to
(include|require)_once 'file', not(include_require)_once('file')* Standardize *_inc variables not to use any trailing './'
* Standardize registry pathes not to use any trailing './'
* Upgraded to HEAD
#18
Very minor, but there are some small code style violations (lots of comments without full stops, some of which, admittedly, were in the code previously. Some are new.)
#19
@keith.smith: No comments were harmed during the making of this patch. There are no new comments and no previous comments were touched. That's out of the scope of that patch to change comments.
... but that makes me thing that the
DRUPAL_ROOTshould get documented.#20
Well.
These appear to be added, or changed, or something, for instance, and have no terminating period. But yes, its certainly no big thing.
+ extract($variables, EXTR_SKIP); // Extract the variables to a local namespace+ ob_start(); // Start output buffering
+ include DRUPAL_ROOT . '/' . $file; // Include the file
+ $contents = ob_get_contents(); // Get the contents of the buffer
+ ob_end_clean(); // End buffering and discard
+ return $contents; // Return the contents
#21
Patch looks good (visually).
However, there's a logical change in module_load_include(), which might be unsafe, better leave this for another issue.
Also, the coding-style fixes in theme_render_template() should go into another issue (as already mentioned by Keith).
Tiny, but confusing: Why is this done differently?
+ define('DRUPAL_ROOT', dirname(realpath(__FILE__)));
+ define('DRUPAL_ROOT', getcwd());
(SimpleTest)
#22
@sun:
Please, read the patch. The change is tiny, and is only made not to call is_file() on an empty variable if drupal_get_path is not found.
There is no change *at all* here. I only shifted the comments left. I didn't change their style nor their wording. That's for another issue.
Both affected scripts (scripts/drupal.sh and scripts/run-tests.sh) are in the "scripts" directory. Their dirname(realpath(__FILE__)) is that directory, not the Drupal root directory.
No change needed to the patch, back to CNR.
#23
Damien : your patch is clean and use absolute path offer better performances (#2)
Hower patch#17 fail on :
\www\drupal\7.x\install.php (Cannot apply hunk @@ 7 )
\www\drupal\7.x\update.php (Cannot apply hunk @@ 6 )
\www\drupal\7.x\includes\install.inc (Cannot apply hunk @@ 9 )
#24
Example of constant doxygen documentation :
/*** Root directory of Drupal installation.
*/
#25
Quick reroll, including documentation of the constant.
#26
Note that this patch has *nothing* to do with performance. It is just here to fix HEAD that breaks because the current directory can change when PHP enters shutdown functions (like sess_write), breaking our dynamic code loading.
#27
Patch applies (with some fuzz) and produced no errors. Visually I can find nothing that ought to be different.
#28
Patch failed on :
install.php (Cannot apply hunk @@ 7 )
update.php (Cannot apply hunk @@ 6 )
I think, after reroll, the patch will be RTBC
#29
Quick reroll.
#30
#31
#32
According to #9
#33
Not tested this patch, but it's blocking #297860: Regression in sess_write (and it should use a merge)
#34
My session tests pass just fine without this patch (82 passes, 0 fails, 0 exceptions). This looks like an important patch, that is prone to breakage. I think we should write a test case for this patch.
#35
@Dries, do you get the BlogAPI, XML-RPC, Path and Actions failures when #297860: Regression in sess_write (and it should use a merge) is applied? The fails are pretty obscure in those tests though so I agree it'd be a good to have a test case.
#36
@34: I'm probably entirely missing the point, but I don't see how "a" test case can cover all the changes in this patch. Surely code coverage of existing tests will ensure that nothing new is broken by it. The only other thing I can think of would be to always unset the include path by setting it to an empty string or somesuch when running tests. That would make sure that all include/require files are properly specified.
Also there is an "include" (not "include_once") in color.module that has escaped this patch, in http://api.drupal.org/api/function/color_get_info/7. Whether there are any other instances of plain "include" in D7 I'm not sure.
Finally, a very trivial point, the form
include_once DRUPAL_ROOT . "/$file";seems marginally clearer to me than
include_once DRUPAL_ROOT . '/' . $file;[@Damien: great patch by the way, must have taken ages to sort out all the different cases]
#37
Marking back to review.
#38
thanks for the share
======================================================
website design | website development | submit site
#39
@37: actually the color.module issue makes this CNW
#40
@gpk: good catch.
Here is an updated patch, including the missing include in color.module. I double-checked that no require or include were missing. I think we covered virtually all cases.
Please run the full test suite with this patch. I think it will be enough to verify that we don't brake anything.
#41
Back to CNR.
#42
5390 passes, 0 fails, 0 exceptions on a fresh install of the current HEAD (patched of course :)).
#43
Subscribing. Will check this in a few minutes.
#44
Ok, nevermind. The changes in here touch a lot of stuff, including command-line scripts that don't have tests, so I'll need to look at this when I have more time available. Will get to it later this week.
#45
What file to apply this patch? Is it something like: patch -p0 < drupal directory/259623-absolute-includes-4.patch?
#46
You first have to be in the drupal directory. Then do
patch -p0 < name_of_patch.patch.#47
There are a couple of rejects in the last patch. Here is a re-roll that applies cleanly.
I installed HEAD from scratch using the patch and everything looks fine.
I will conduct some stress tests.
#48
Here are the benchmarks. 25 concurrent users hitting the home page on a default install for 2 minutes.
Marginal improvement with the patch, statistically insignificant.
Trans, Elap Time, Resp Time, Trans Rate, Concurrent, OKAY, FailedWithout patch 6060, 120.02, 0.49, 50.49, 24.94, 6060, 0
With patch 6199, 120.16, 0.48, 51.59, 24.96, 6199, 0
#49
@kbahey: please note this is *NOT* a performance patch.
#50
I know, and it should not decrease it either. Which is the case, so we are good.
#51
We brainstormed with webchick on this, and she came up with this summary for the patch:
#52
Uh. OK, I am totally confused here.
1) How is this patch suddenly not about performance? That was the point when I last looked at it.
2) How exactly does using an absolute path rather than making PHP figure out the absolute path have any impact whatsoever on the registry loading MergeQuery? If the registry needs to write things it already manually includes the DB files it needs to break the circular dependency there. The only merge in session handling should be in sess_write(), by which point the database and registry are long since loaded.
So, uh, huh?
#53
@Crell:
There is a note on the register_shutdown_function help page:
When we enter
sess_writeand all classes required for MergeQuery has not been loaded yet, the autoloader tries to loadmysql/query.incbut fails with:PHP Fatal error: require_once() [<a href='function.require'>function.require</a>]: Failed opening required './includes/database/mysql/query.inc'... because the current directory changed in the meantime.
#54
Ahhh... OK, that makes sense. (Well, not Apache's silliness but the fix does.) I'm assuming everything else in this patch *is* performance related, as per the start of the thread?
#55
Ok, I ran all the tests (which all passed) and looked this over one more time. From the web side, all looks good. The command-line scripts are missing PHPDoc over their defines, but these are also in the middle of executable code, so I'm not sure it'd do any good anyway. run-tests.sh looks like it needs some general love, since running it without an argument spits out a bunch of errors. But this happens without the patch as well.
Committed to HEAD. Thanks!
#56
@54: not entirely - there is the question of consistency in the way includes/requires are handled, and it also fixes #296194: Some includes are not anchored.
#57
Uggh, cross post...
#58
Hm. I am probably going to have to revert this. :(
On installation, I now get the following directly after entering my DB info:
PDOException: SQLSTATE[42S02]: Base table or view not found: 1146 Table '7x.variable' doesn't exist in /Applications/MAMP/htdocs/7x/includes/database/database.inc on line 1017
Though even without this patch I still get:
PDOException: SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ') ORDER BY fit DESC LIMIT 0, 1' at line 1 in /Applications/MAMP/htdocs/7x/includes/database/database.inc on line 1018
It's generating a broken query:
PDOException: SELECT * FROM {menu_router} WHERE path IN () ORDER BY fit DESC LIMIT 0, 1 - Array ( ) SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ') ORDER BY fit DESC LIMIT 0, 1' at line 1 in /Applications/MAMP/htdocs/7x/includes/database/database.inc on line 365
This comes from /Applications/MAMP/htdocs/7x/includes/menu.inc line 367.
Anyone got a quick fix?
#59
#60
@webchick: please don't blame the patch. I'm unable to reproduce the issue but it is a known one, even on Drupal 6:
http://drupal.org/node/280015
http://drupal.org/node/223062
http://drupal.org/node/287632
http://drupal.org/node/303378
While the root cause of this error remains to be identified, chx made a quick fix: http://drupal.org/node/261148
While you are at it, because you can reproduce the error, maybe you could grab some information on why and how this happens: at which step of the installer does this happen, what is the state of the database when this happen, etc?
#61
Ok, I documented what's happening over at #311251: Fix broken installer in HEAD. Help appreciated.
#62
This is a nice patch.
But it added a line to every script that has to execute drupal. Namely,
define('DRUPAL_ROOT', dirname(realpath(__FILE__)))#63
Couple things:
1. This never made it to the upgrade docs: http://drupal.org/node/224333
2. I'd like to see if Moshe's suggestion @ #62 would work. That'd make for far less code duplication.
#64
I think this looks good - although I like Nedjo's idea of a simple wrapper function drupal_include($file). This would simplify the calls and mean we have a single place to change the behavior in the future if we need to.
Also, I am wondering if the define could possibly live in settings.php. This would allow people to change this if they have an odd configuration, such as all php source (except index.php of course) living outside of the webroot, which is occasionally requested.
#65
I support drupal_include($file) as well. I was actually working on this a few weeks ago for a different reason (moving all of drupal into a separate directory).
It could be done in a way that allows inclusion from an alternate base directory too, in case some modules want to use drupal_include from that:
<?php
function drupal_include($file, $base_dir = '') {
if (!$base_dir) {
$base_dir = DRUPAL_ROOT;
}
require_once($base_dir . '/' . $file);
}
?>
Also, as moshe says, defining DRUPAL_ROOT in bootstrap.inc will avoid repeating it everywhere.
#66
To clarify, this patch has already been committed. It needs docs.
#67
@66: See http://drupal.org/node/224333#absolute_includes.
Does Moshe's suggestion @62 need a separate issue?
#68
Yes please!
#69
#319154: Remove code duplication - define('DRUPAL_ROOT', dirname(realpath(__FILE__))).