Don't SELECT user uid = 0
robertDouglass - August 3, 2006 - 18:13
| Project: | Drupal |
| Version: | x.y.z |
| Component: | user.module |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed |
Description
A client just brought up an interesting point; there is never any need to SELECT * FROM users WHERE uid=0, is there? Likewise, updating the login and data columns for uid is equally senseless. Could we spare ourselves all the uid=0 queries, perhaps by checking for uid=0 in user_load? (I haven't searched to see if there are any current implementations of these ideas... I'm just looking for feedback).

#1
In particular, it looks like this code from sessions.inc could be improved:
<?php
function sess_read($key) {drupal_set_message("getting a normal session");
global $user;
// retrieve data for a $user object
$result = db_query("SELECT sid FROM {sessions} WHERE sid = '%s'", $key);
if (!db_num_rows($result)) {
$result = db_query("SELECT u.* FROM {users} u WHERE u.uid = 0");
}
else {
$result = db_query("SELECT u.*, s.* FROM {users} u INNER JOIN {sessions} s ON u.uid = s.uid WHERE s.sid = '%s'", $key);
}
?>
$user = new stdClass(); $user->uid = 0;do the trick?That would save a query per page load if it worked.
#2
funny, I thought about the same today. Yes, we need the select since we store session data in the sessions table. Things like language switches rely on this.
#3
Sorry, I didn't understand that. I know we need to select from the sessions table for whatever sid we get, but do we ever need to select from the users table for uid = 0, and why?
Do you think my idea for the "SELECT FROM sessions LEFT JOIN users" would work?
#4
Oops, yeeah, we probably don't need that.
#5
Sweet! It works. Three queries condensed into one.
#6
There is likewise no need to select uid = 0 in user_load. With this patch, I just build an object, call the user_module_invoke('load'...), and return it. This saves two queries when anonymous users access non-cached pages.
#7
I'm a bit tired looking at the patch, but can't the bottom of that patch change that second IF to an ELSE IF to prevent the query building there?
Otherwise the patch looks great, knocking off one query page is a nice performance boost :)
#8
The $user object gets returned before then, so it never gets to that.
#9
For the anonymous user we still unpack $user?
<?php$user = drupal_unpack($user);
?>
I think we might be able to refactor sess_read() some more -- we'd need less if-checks.
The change to user_load is a bit icky.
#10
Apologies if this patch is weird in any way... it was resisting making a nice patch and I had to tweak it by hand.
#11
this one is better.
#12
On my test setup, this patch improves performance by 4% (when serving cached pages to anonymous users). That's a substantial improvement. :-)
When serving non-cached pages to anonymous users, performance improves by
There are two small improvements left for user_read():
1. Change
if (!$user = ...)toif ($user = ...)by switching things around. This eliminate a '!'.2. The last line of sess_read() can be eliminated
return !empty($user->session) ? $user->session : '';. Just do the returns in the if-else clauses. This saves an '!empty()' test in the critical path.Also, anonymous users can have session information too (eg. to store comment display settings) ... Would that still work after your patch? Mmm. Your patch seems to assume that when session information is found, it must be from an authenticated user ... or something. Please double-check.
#13
When serving non-cached pages to anonymous users, performance improves by ... 1%. ;-)
#14
Made the code changes.
The sessions work just the same way they did before. The initial query in sess_read is this:
SELECT u. * , s. *FROM sessions s
LEFT JOIN users u ON s.uid = u.uid
WHERE s.sid = /* php session id */
LIMIT 0 , 30
For an anonymous user, it returns all the information from the sessions table and nothing from the users table by virtue of being a left join. I tested the comment settings and they work as expected.
#15
patch
#16
Do we really need a left join? All users, including uid 0, exist in the users table.
I think we can improve session handling a bit further. When no session ID is sent by the client, it is generated by PHP, but we still check if it exists in the database (twice even). We can skip this step because we already know that no session exists. Aside from performing better in ab-benchmarks, it also saves a few queries for the first request made by a client and more important, crawlers. On many sites crawlers are responsible for a significant percentage of all requests (like it or not).
pseudo code:
- if (empty($_COOKIE[session_name()]) : no session ID was sent by the client. Create the simple uid 0 $user object.
- else try loading it from the database. If a session is found, return that user object
- else, (client sent a session id that no longer exists, probably expired) return the simple uid 0 object
We can add a similar optimisation to sess_write as well:
- if (empty($_COOKIE[session_name()] && empty($value)) : do nothing
That way we can serve pages to crawlers without a single database query - assuming nothing was written to $_SESSION.
I'm not sure about combining the queries though, IIRC this used to be a single query in the past, needs some investigating, it was a long time ago ;)
#17
I am too curious why you replyca an inner by a left join. left joins are usually more expensive and also usually not needed.
#18
Bart: great idea! That will help a lot with agressive crawlers. :-)
#19
The patch that split the query was http://drupal.org/node/44947
Maybe we could do "select * from session where key=X" and only when uid > 0, "select * from user where uid = X" and merge the two results. But i'm no DB expert, i don't know which is faster
#20
Smart people! Great suggestions. If nobody beats me to it I'll re-roll this tomorrow. I'm looking forward to Dries putting it through his performance testing environment.
#21
Bart: will sess_read() be called with $key = NULL/0/empty string? We'll want to test that.
#22
does the above
#23
I believe the sess_write() changes are incorrect. We still need to update the access time in the user table, or things like the "who's online" block stop working properly.
I removed the changes to sess_write() and benchmarked this patch. The result is a 13% speed-up for cached serving pages!
#24
I believe that the code is correct (or at least it is as correct as the existing code). If you run it with the debugging code added, you can verify that the first request (no cookie) goes into the first "if", and would otherwise not go into the second "if", thus the query in between is superfluous:
<?php
function sess_write($key, $value) {
global $user;
// If the client doesn't have a session, and one isn't being created ($value), do nothing.
if (empty($_COOKIE[session_name()]) && empty($value)) {watchdog('session', '0');
return TRUE;
}
$result = db_queryd("SELECT sid FROM {sessions} WHERE sid = '%s'", $key);
if (!db_num_rows($result)) {watchdog('session', '1');
// Only save session data when when the browser sends a cookie. This keeps
// crawlers out of session table. This improves speed up queries, reduces
// memory, and gives more useful statistics. We can't eliminate anonymous
// session table rows without breaking throttle module and "Who's Online"
// block.
if ($user->uid || $value || count($_COOKIE)) {watchdog('session', '2');
db_queryd("INSERT INTO {sessions} (sid, uid, cache, hostname, session, timestamp) VALUES ('%s', %d, %d, '%s', '%s', %d)", $key, $user->uid, $user->cache, $_SERVER["REMOTE_ADDR"], $value, time());
}
}
else {watchdog('session', '3');
db_queryd("UPDATE {sessions} SET uid = %d, cache = %d, hostname = '%s', session = '%s', timestamp = %d WHERE sid = '%s'", $user->uid, $user->cache, $_SERVER["REMOTE_ADDR"], $value, time(), $key);
// TODO: this can be an expensive query. Perhaps only execute it every x minutes. Requires investigation into cache expiration.
if ($user->uid) {watchdog('session', '4');
db_queryd("UPDATE {users} SET access = %d WHERE uid = %d", time(), $user->uid);
}
}
return TRUE;
}
?>
The question is, in the first if, do we want to already write to the session table? I think not. What would we write, after all?
#25
drupal_anonymous_user() will need some function documentation.
#26
Changed the syntax for the parameter validation to something less vertical.
#27
!*%$ wrong patch for this issue.... ignore the last one.
#28
I added documentation to drupal_anonymous_user and moved it to bootstrap.inc. It doesn't belong in session.inc because that contradicts my goal of making our session handling pluggable (because any implementing session API would have to recreate drupal_anonymous_user as well). So it seems it belongs in bootstrap.
#29
#30
Tested the last version of this patch (unmodified) and performance increases by 20% when serving cached pages, and by 2% when serving non-cached pages. That's big! :)
We should shif focus to non-cached pages now...
#31
i think the name column for uid=0 is used to give anonymous user a unique name: 'anonymous coward, anonymous hero, etc.' we had a UI for this but i can't find it right now. are we ditching this feature? not sure if we can store this in variables without incurring performance penalty.
#32
There is still a setting for the anonymous user name. If you can't find it, we should move it. Where did you look for it initially, or where did you expect to find it?
IIRC, we added the uid=0 column so we can do inner joins rather than left joins.
#33
We indeed added uid = 0 for doing inner joins which are faster. The name and all the other fields are not used. We did have a lenghty discussion on the matter, but we never switched away from using the variables we have for this.
#34
the first place i looked was admin/user/settings ... we should set this name in drupal_anonymous_user(), right? then we can remove the variable_get() from theme_username() and anywhere else
#35
Committed this to CVS HEAD, partially.
I didn't commit the user.module part because there is no good reason to load the anonymous user. Modules that load the anonymous user through user_load() are b0rked.
Marking this 'code needs work' as Moshe's suggestions might need further follow-up.
#36
if modules loading the anonymous user through user_load are b0rked, then user.module is guilty of the crime.
#37
Looks good to me?
#38
#39
Committed to HEAD.
#40