Don't SELECT user uid = 0

robertDouglass - August 3, 2006 - 18:13
Project:Drupal
Version:x.y.z
Component:user.module
Category:bug report
Priority:normal
Assigned:Unassigned
Status:closed
Description

A client just brought up an interesting point; there is never any need to SELECT * FROM users WHERE uid=0, is there? Likewise, updating the login and data columns for uid is equally senseless. Could we spare ourselves all the uid=0 queries, perhaps by checking for uid=0 in user_load? (I haven't searched to see if there are any current implementations of these ideas... I'm just looking for feedback).

#1

robertDouglass - August 9, 2006 - 17:07

In particular, it looks like this code from sessions.inc could be improved:

<?php
function sess_read($key) {drupal_set_message("getting a normal session");
  global
$user;

 
// retrieve data for a $user object
 
$result = db_query("SELECT sid FROM {sessions} WHERE sid = '%s'", $key);
  if (!
db_num_rows($result)) {
   
$result = db_query("SELECT u.* FROM {users} u WHERE u.uid = 0");
  }
  else {
   
$result = db_query("SELECT u.*, s.* FROM {users} u INNER JOIN {sessions} s ON u.uid = s.uid WHERE s.sid = '%s'", $key);
  }
?>

  • Do we really need to SELECT where u.uid = 0? Wouldn't $user = new stdClass(); $user->uid = 0; do the trick?
  • Couldn't we do the third query as a LEFT join from sessions to users and use that query both to see if the client has a session AND if they have a user account?

That would save a query per page load if it worked.

#2

Gerhard Killesreiter - August 9, 2006 - 17:17

funny, I thought about the same today. Yes, we need the select since we store session data in the sessions table. Things like language switches rely on this.

#3

robertDouglass - August 9, 2006 - 17:24

Sorry, I didn't understand that. I know we need to select from the sessions table for whatever sid we get, but do we ever need to select from the users table for uid = 0, and why?

Do you think my idea for the "SELECT FROM sessions LEFT JOIN users" would work?

#4

Gerhard Killesreiter - August 9, 2006 - 17:29

Oops, yeeah, we probably don't need that.

#5

robertDouglass - August 9, 2006 - 20:09
Category:support request» bug report
Status:active» needs review

Sweet! It works. Three queries condensed into one.

AttachmentSize
sessions_0.patch 1.18 KB

#6

robertDouglass - August 9, 2006 - 21:47

There is likewise no need to select uid = 0 in user_load. With this patch, I just build an object, call the user_module_invoke('load'...), and return it. This saves two queries when anonymous users access non-cached pages.

AttachmentSize
session-user.patch 2.07 KB

#7

m3avrck - August 10, 2006 - 04:12

I'm a bit tired looking at the patch, but can't the bottom of that patch change that second IF to an ELSE IF to prevent the query building there?

Otherwise the patch looks great, knocking off one query page is a nice performance boost :)

#8

robertDouglass - August 10, 2006 - 06:22

The $user object gets returned before then, so it never gets to that.

#9

Dries - August 11, 2006 - 18:22

For the anonymous user we still unpack $user?

<?php
   $user
= drupal_unpack($user);
?>

I think we might be able to refactor sess_read() some more -- we'd need less if-checks.

The change to user_load is a bit icky.

#10

robertDouglass - August 11, 2006 - 22:06

Apologies if this patch is weird in any way... it was resisting making a nice patch and I had to tweak it by hand.

AttachmentSize
user-sessions.patch 2.71 KB

#11

robertDouglass - August 11, 2006 - 22:22

this one is better.

AttachmentSize
session-user_0.patch 2.7 KB

#12

Dries - August 13, 2006 - 09:09

On my test setup, this patch improves performance by 4% (when serving cached pages to anonymous users). That's a substantial improvement. :-)

When serving non-cached pages to anonymous users, performance improves by

There are two small improvements left for user_read():

1. Change if (!$user = ...) to if ($user = ...) by switching things around. This eliminate a '!'.

2. The last line of sess_read() can be eliminated return !empty($user->session) ? $user->session : '';. Just do the returns in the if-else clauses. This saves an '!empty()' test in the critical path.

Also, anonymous users can have session information too (eg. to store comment display settings) ... Would that still work after your patch? Mmm. Your patch seems to assume that when session information is found, it must be from an authenticated user ... or something. Please double-check.

#13

Dries - August 13, 2006 - 09:10

When serving non-cached pages to anonymous users, performance improves by ... 1%. ;-)

#14

robertDouglass - August 13, 2006 - 10:30

Made the code changes.

The sessions work just the same way they did before. The initial query in sess_read is this:

SELECT u. * , s. *
FROM sessions s
LEFT JOIN users u ON s.uid = u.uid
WHERE s.sid = /* php session id */
LIMIT 0 , 30

For an anonymous user, it returns all the information from the sessions table and nothing from the users table by virtue of being a left join. I tested the comment settings and they work as expected.

#15

robertDouglass - August 13, 2006 - 10:30

patch

AttachmentSize
session-user_1.patch 2.81 KB

#16

Bart Jansens - August 13, 2006 - 14:12

Do we really need a left join? All users, including uid 0, exist in the users table.

I think we can improve session handling a bit further. When no session ID is sent by the client, it is generated by PHP, but we still check if it exists in the database (twice even). We can skip this step because we already know that no session exists. Aside from performing better in ab-benchmarks, it also saves a few queries for the first request made by a client and more important, crawlers. On many sites crawlers are responsible for a significant percentage of all requests (like it or not).

pseudo code:
- if (empty($_COOKIE[session_name()]) : no session ID was sent by the client. Create the simple uid 0 $user object.
- else try loading it from the database. If a session is found, return that user object
- else, (client sent a session id that no longer exists, probably expired) return the simple uid 0 object

We can add a similar optimisation to sess_write as well:
- if (empty($_COOKIE[session_name()] && empty($value)) : do nothing

That way we can serve pages to crawlers without a single database query - assuming nothing was written to $_SESSION.

I'm not sure about combining the queries though, IIRC this used to be a single query in the past, needs some investigating, it was a long time ago ;)

#17

killes@www.drop.org - August 13, 2006 - 14:18

I am too curious why you replyca an inner by a left join. left joins are usually more expensive and also usually not needed.

#18

Dries - August 13, 2006 - 17:46

Bart: great idea! That will help a lot with agressive crawlers. :-)

#19

Bart Jansens - August 13, 2006 - 19:22

The patch that split the query was http://drupal.org/node/44947

Maybe we could do "select * from session where key=X" and only when uid > 0, "select * from user where uid = X" and merge the two results. But i'm no DB expert, i don't know which is faster

#20

robertDouglass - August 13, 2006 - 20:27

Smart people! Great suggestions. If nobody beats me to it I'll re-roll this tomorrow. I'm looking forward to Dries putting it through his performance testing environment.

#21

Dries - August 14, 2006 - 11:46

Bart: will sess_read() be called with $key = NULL/0/empty string? We'll want to test that.

#22

robertDouglass - August 14, 2006 - 14:10

does the above

AttachmentSize
sessions-user.patch 3.91 KB

#23

Dries - August 14, 2006 - 15:25

I believe the sess_write() changes are incorrect. We still need to update the access time in the user table, or things like the "who's online" block stop working properly.

I removed the changes to sess_write() and benchmarked this patch. The result is a 13% speed-up for cached serving pages!

#24

robertDouglass - August 14, 2006 - 16:25

I believe that the code is correct (or at least it is as correct as the existing code). If you run it with the debugging code added, you can verify that the first request (no cookie) goes into the first "if", and would otherwise not go into the second "if", thus the query in between is superfluous:

<?php
function sess_write($key, $value) {
  global
$user;

 
// If the client doesn't have a session, and one isn't being created ($value), do nothing.
 
if (empty($_COOKIE[session_name()]) && empty($value)) {watchdog('session', '0');
     return
TRUE;
  }

 
$result = db_queryd("SELECT sid FROM {sessions} WHERE sid = '%s'", $key);

  if (!
db_num_rows($result)) {watchdog('session', '1');
   
// Only save session data when when the browser sends a cookie. This keeps
    // crawlers out of session table. This improves speed up queries, reduces
    // memory, and gives more useful statistics. We can't eliminate anonymous
    // session table rows without breaking throttle module and "Who's Online"
    // block.
   
if ($user->uid || $value || count($_COOKIE)) {watchdog('session', '2');
     
db_queryd("INSERT INTO {sessions} (sid, uid, cache, hostname, session, timestamp) VALUES ('%s', %d, %d, '%s', '%s', %d)", $key, $user->uid, $user->cache, $_SERVER["REMOTE_ADDR"], $value, time());
    }
  }
  else {
watchdog('session', '3');
   
db_queryd("UPDATE {sessions} SET uid = %d, cache = %d, hostname = '%s', session = '%s', timestamp = %d WHERE sid = '%s'", $user->uid, $user->cache, $_SERVER["REMOTE_ADDR"], $value, time(), $key);

   
// TODO: this can be an expensive query. Perhaps only execute it every x minutes. Requires investigation into cache expiration.
   
if ($user->uid) {watchdog('session', '4');
     
db_queryd("UPDATE {users} SET access = %d WHERE uid = %d", time(), $user->uid);
    }
  }

  return
TRUE;
}
?>

The question is, in the first if, do we want to already write to the session table? I think not. What would we write, after all?

#25

drumm - August 15, 2006 - 06:56
Status:needs review» needs work

drupal_anonymous_user() will need some function documentation.

#26

robertDouglass - August 15, 2006 - 07:40

Changed the syntax for the parameter validation to something less vertical.

AttachmentSize
pluggablesessions_5.patch 5.92 KB

#27

robertDouglass - August 15, 2006 - 07:41

!*%$ wrong patch for this issue.... ignore the last one.

#28

robertDouglass - August 15, 2006 - 07:49

I added documentation to drupal_anonymous_user and moved it to bootstrap.inc. It doesn't belong in session.inc because that contradicts my goal of making our session handling pluggable (because any implementing session API would have to recreate drupal_anonymous_user as well). So it seems it belongs in bootstrap.

AttachmentSize
sessions-user_0.patch 4.7 KB

#29

robertDouglass - August 15, 2006 - 08:28
Status:needs work» needs review

#30

Dries - August 15, 2006 - 11:50
Status:needs review» reviewed & tested by the community

Tested the last version of this patch (unmodified) and performance increases by 20% when serving cached pages, and by 2% when serving non-cached pages. That's big! :)

We should shif focus to non-cached pages now...

#31

moshe weitzman - August 16, 2006 - 03:51

i think the name column for uid=0 is used to give anonymous user a unique name: 'anonymous coward, anonymous hero, etc.' we had a UI for this but i can't find it right now. are we ditching this feature? not sure if we can store this in variables without incurring performance penalty.

#32

Dries - August 16, 2006 - 04:50

There is still a setting for the anonymous user name. If you can't find it, we should move it. Where did you look for it initially, or where did you expect to find it?

IIRC, we added the uid=0 column so we can do inner joins rather than left joins.

#33

killes@www.drop.org - August 16, 2006 - 06:41

We indeed added uid = 0 for doing inner joins which are faster. The name and all the other fields are not used. We did have a lenghty discussion on the matter, but we never switched away from using the variables we have for this.

#34

moshe weitzman - August 16, 2006 - 12:23

the first place i looked was admin/user/settings ... we should set this name in drupal_anonymous_user(), right? then we can remove the variable_get() from theme_username() and anywhere else

#35

Dries - August 16, 2006 - 13:15
Status:reviewed & tested by the community» needs work

Committed this to CVS HEAD, partially.

I didn't commit the user.module part because there is no good reason to load the anonymous user. Modules that load the anonymous user through user_load() are b0rked.

Marking this 'code needs work' as Moshe's suggestions might need further follow-up.

#36

robertDouglass - September 6, 2006 - 12:47
Status:needs work» needs review

if modules loading the anonymous user through user_load are b0rked, then user.module is guilty of the crime.

AttachmentSize
uid0.patch.txt 648 bytes

#37

m3avrck - October 2, 2006 - 00:06

Looks good to me?

#38

robertDouglass - October 18, 2006 - 19:10
Status:needs review» reviewed & tested by the community

#39

drumm - November 7, 2006 - 05:49
Status:reviewed & tested by the community» fixed

Committed to HEAD.

#40

Anonymous - November 21, 2006 - 06:02
Status:fixed» closed
 
 

Drupal is a registered trademark of Dries Buytaert.