From #21:

  1. It's semantically correct. Anonymous posts aren't all by the fictitious user zero, they're all by an undefined ("NULL") user.
  2. 0 = 0, but NULL <> NULL. We shouldn't be pretending any anonymous post is by the "same" author as another anonymous post.
  3. We lose the "magic" anonymous user row in the users table.
  4. Should we ever use foreign keys, we can automate anonymizing user content on user deletion.
  5. We have to do annoying things to even use the value zero in an auto-incrementing MySQL column.
  6. We can better enforce what UID columns allow anonymous values by choosing whether to allow or disallow NULL values.
  7. Because we already handle the anonymous user as a special case (a variable_get with no link) when rendering usernames, we can stop the confusing practice of joining anonymous user content onto the users table only to ignore everything we get from it.
Files: 
CommentFileSizeAuthor
#56 350407-56.patch2.71 KBrlmumford
PASSED: [[SimpleTest]]: [MySQL] 52,206 pass(es).
[ View ]
#49 350407.patch2.58 KBEclipseGc
FAILED: [[SimpleTest]]: [MySQL] 51,341 pass(es), 1 fail(s), and 0 exception(s).
[ View ]
#47 350407-47.patch1.94 KBrlmumford
FAILED: [[SimpleTest]]: [MySQL] 51,327 pass(es), 1 fail(s), and 0 exception(s).
[ View ]
#45 350407.patch1.45 KBEclipseGc
FAILED: [[SimpleTest]]: [MySQL] Drupal installation failed.
[ View ]
#25 350407-uid-is-totally-NULL-1-D7.patch8.28 KBDave Reid
Failed: Failed to apply patch.
[ View ]

Comments

Title:This uid totally should be NULL instead of 0. Anonymous uid totally should be NULL instead of 0.

Could you please point to some specifics how it would benefit being NULL instead of 0 since this would be a lot of work to change this one little thing?

Ever seen install.php ? tried to import a users table? not easy... that in alone would worth it. Also, it's theoretically more correct.

agreed - it is theoretically more correct. might be better to approach the problem with a 'layer' in between... since a lot of problems (and re-coding presumably) would be caused by a return or use of NULL... why not return a -1 value instead? to be clear - the value in the table would be NULL and the value in use by drupal or returned on a call would be -1.

just an idea... ;)

chx is referring to #204411: Anonymous user id breaks on MySQL export/import (xID of Zero get's auto-incremented on external xSQL events) and the following code in system.install (!= install.php):

<?php
 
// This sets the above two users uid 0 (anonymous). We avoid an explicit 0
  // otherwise MySQL might insert the next auto_increment value.
 
db_query("UPDATE {users} SET uid = uid - uid WHERE name = '%s'", '');
 
// This sets uid 1 (superuser). We skip uid 2 but that's not a big problem.
 
db_query("UPDATE {users} SET uid = 1 WHERE name = '%s'", 'placeholder-for-uid-1');
?>

Also note that if a field can be NULL and said field has a UNIQUE key on it then you can have multiple rows with NULL values because in SQL, the NULL value is never true in comparison to any other value, even NULL. This sounds like useful to me.

-1 is not a good idea, you would break a lot of code totally unnecessary. Beauty of this feature is that if ($user->uid) still works happily.

Not sure this is a good idea. Wouldn't the performance of SQL queries suffer? I don't think you can do joind on fields containing NULL

I'm taking a look at how this would be done and will help with the patch when it becomes possible to do so. One thing is we'll have to drop the primary key on {users} since uid cannot be a primary key if it allows NULL values. It will need to be changed to a unique key instead. From http://dev.mysql.com/doc/refman/5.1/en/create-table.html:

A UNIQUE index creates a constraint such that all values in the index must be distinct. An error occurs if you try to add a new row with a key value that matches an existing row. For all engines, a UNIQUE index allows multiple NULL values for columns that can contain NULL.

A PRIMARY KEY is a unique index where all key columns must be defined as NOT NULL. If they are not explicitly declared as NOT NULL, MySQL declares them so implicitly (and silently). A table can have only one PRIMARY KEY. If you do not have a PRIMARY KEY and an application asks for the PRIMARY KEY in your tables, MySQL returns the first UNIQUE index that has no NULL columns as the PRIMARY KEY.

Side node while looking through this last night. Do we have a good reason for keeping an anonymous user record in the {users} table? user_load() could easily use drupal_anonymous_user().

+1000

Dave, we keep that in the table to make INNER JOINs possible.

@killes: Joining on anonymous users it not a sensible operation because they're, well, anonymous. It doesn't mean anything for two items to both come from "anonymous," so the SQL model of NULL <> NULL is appropriate. Even the idea of the UID being NULL is entirely sensible because the user has no true UID.

Moreover, if we ever did start using foreign keys, there are two approaches we'd consider: CASCADE DELETE and SET NULL. SET NULL would anonymize a user's contributions without deleting them.

@Dave and @chx: We'd remove the anonymous user row from the users table. The current row is a hack-ish placeholder. I see no reason to use INNER JOINs on the users table if we want to include anonymous data. We should just LEFT JOIN.

Yeah there's no point to having a NULL/empty record in the users table just so we can do inner joins. We won't have to implement a sequence API just to make the uid serial work, it won't have problems with import/exports anymore, and we can still make NULL the default uid (or leave it as 0 since the main problem would be fixed). Seems the optimal solution. I'll take a look at implementing that as well.

My casual benchmarks on the Drupal.org servers show this:

mysql> SELECT SQL_NO_CACHE COUNT(*) FROM node n LEFT JOIN users u ON u.uid = n.uid;
+----------+
| COUNT(*) |
+----------+
|   340862 |
+----------+
1 row in set (0.43 sec)
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM node n INNER JOIN users u ON u.uid = n.uid;
+----------+
| COUNT(*) |
+----------+
|   340862 |
+----------+
1 row in set (0.43 sec)
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM node n LEFT JOIN users u ON u.uid = n.uid;
+----------+
| COUNT(*) |
+----------+
|   340862 |
+----------+
1 row in set (0.36 sec)
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM node n INNER JOIN users u ON u.uid = n.uid;
+----------+
| COUNT(*) |
+----------+
|   340862 |
+----------+
1 row in set (0.30 sec)
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM node n LEFT JOIN users u ON u.uid = n.uid;
+----------+
| COUNT(*) |
+----------+
|   340862 |
+----------+
1 row in set (0.42 sec)
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM node n INNER JOIN users u ON u.uid = n.uid;
+----------+
| COUNT(*) |
+----------+
|   340862 |
+----------+
1 row in set (0.34 sec)
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM node n LEFT JOIN users u ON u.uid = n.uid;
+----------+
| COUNT(*) |
+----------+
|   340862 |
+----------+
1 row in set (0.33 sec)
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM node n INNER JOIN users u ON u.uid = n.uid;
+----------+
| COUNT(*) |
+----------+
|   340862 |
+----------+
1 row in set (0.31 sec)
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM node n LEFT JOIN users u ON u.uid = n.uid;
+----------+
| COUNT(*) |
+----------+
|   340862 |
+----------+
1 row in set (0.33 sec)
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM node n INNER JOIN users u ON u.uid = n.uid;
+----------+
| COUNT(*) |
+----------+
|   340862 |
+----------+
1 row in set (0.31 sec)
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM node n LEFT JOIN users u ON u.uid = n.uid;
+----------+
| COUNT(*) |
+----------+
|   340862 |
+----------+
1 row in set (0.42 sec)
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM node n INNER JOIN users u ON u.uid = n.uid;
+----------+
| COUNT(*) |
+----------+
|   340862 |
+----------+
1 row in set (0.33 sec)
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM node n LEFT JOIN users u ON u.uid = n.uid;
+----------+
| COUNT(*) |
+----------+
|   340862 |
+----------+
1 row in set (0.41 sec)
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM node n INNER JOIN users u ON u.uid = n.uid;
+----------+
| COUNT(*) |
+----------+
|   340862 |
+----------+
1 row in set (0.35 sec)
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM node n LEFT JOIN users u ON u.uid = n.uid;
+----------+
| COUNT(*) |
+----------+
|   340862 |
+----------+
1 row in set (0.36 sec)
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM node n INNER JOIN users u ON u.uid = n.uid;
+----------+
| COUNT(*) |
+----------+
|   340862 |
+----------+
1 row in set (0.31 sec)

So, it looks like a tiny performance hit of <0.05s for 340K nodes. There's also potential for performance gains for anonymously published data because MySQL knows that NULL <> NULL and doesn't have to bother JOINing with the user table on those rows.

The reason INNER is so much faster in some cases (through not in this one) is because INNER JOINs allow MySQL to conveniently start from either side of the JOIN; the JOIN is symmetric. Because a LEFT JOIN doesn't restrict its result set based on rows in the right-hand table, MySQL can't usefully restrict its working set based on starting there. This isn't a significant concern for the users table because we generally use it to pull extra data on users and not to restrict results.

If we did want to use the users table to restrict results, we would still use an INNER JOIN, even with the anonymous UID being NULL.

I'm not sure removing the uid = 0 line is very wise.

The main benefit of this line is that we automatically get name = 'anonymous' when joining {node} or {comment} to {user} for anonymous content. Removing that line would mean special casing this everywhere, including in places that doesn't really require more complexity (think... Views).

@Damien We don't have to have special cases everywhere because everything should render usernames through theme functions. Moreover, we already have as much of a "special case" as we would need: we don't link "Anonymous" to /user/0. The extra step of also outputting "Anonymous" is no greater work than deciding to not link "Anonymous" and link all real accounts.

Yeah there's literally nothing valuable when doing joins on users for anonymous users. Every field is empty, and it's up to the query to do something like echo (!$result->uid ? variable_get('anonymous', t('Anonymous')) : $result->name);.

After reading through this I'm still not seeing the win of switching to NULL...

@drewish:

* It's semantically correct. Anonymous posts aren't all by the fictitious user zero, they're all by an undefined ("NULL") user.
* 0 = 0, but NULL <> NULL. We shouldn't be pretending any anonymous post is by the "same" author as another anonymous post.
* We lose the "magic" anonymous user row in the users table.
* Should we ever use foreign keys, we can automate anonymizing user content on user deletion.
* We have to do annoying things to even use the value zero in an auto-incrementing MySQL column.
* We can better enforce what UID columns allow anonymous values by choosing whether to allow or disallow NULL values.
* Because we already handle the anonymous user as a special case (a variable_get with no link) when rendering usernames, we can stop the confusing practice of joining anonymous user content onto the users table only to ignore everything we get from it.

I'd say that's a pretty good list of reasons. Thanks for the summary ;)

Issue tags:+anonymous users

Component:database system» base system
Status:Active» Needs review
StatusFileSize
new8.28 KB
Failed: Failed to apply patch.
[ View ]

First step, remove any comparisons to 0 and instead simply do $uid or !$uid.

Hm, sure, but why did comment use === I wonder.

The code was introduced in #147417: Only save cookie for anonymous users and looks like there was no real reason to do an identical check. Comparing !$user->uid works just fine in that case.

Status:Needs review» Needs work

The last submitted patch failed testing.

Not too long ago (post Drupal.org D6 upgrade), we were specifically seeing the semantic harm in uid 0 for anonymous visitors at drupal.org/project/user. It was showing projects without an assigned user, which is not equivalent to all anonymous visitors owning those projects. Having uid 0 for anonymous users invites those sorts of errors.

Did this ever go anywhere?

Just thought I'd point out that we'll want the uid column in {users} to start at 1, if we want to still be able to do if($user->uid).

Version:7.x-dev» 8.x-dev

Probably not going to happen in Drupal 7.

subscribing

Totally!

Subscribing.

Running into the same problem with the quotes module. I'd like "Anonymous" to be aid (author id) NULL instead of 0, but I can't because it's an AUTO_INCREMENT field. Therefore, subscribing...

Just to clear up some confusion, uid is technically no longer AUTO_INCREMENT, it was changed to a regular 'int' field as of #356074: Provide a sequences API. However no upgrade path was provided in that patch, so D6->D7 db's will still be set to AUTO_INCREMENT, which is a bug :)

Interesting. No strong preference, but all I can say is that it's a Drupalism to have an actual value for uid 0 - other systems indeed have no value (NULL) for anonymous users.

So, while this change feels a bit like "ugh, wow." I can definitely see the case of sanity and consistency.

you have no idea of what you are getting into by changing an often used value into NULL . SQL NULL handling is madness.

Edit: yes I said years ago theoretically more correct and theoretically it might be TRUE, in reality, ouch.

@chx ... I second that. Recently had a series of cascading problems from a NULL field db change. Suffice to say, this change, if adopted, would best be part of a major version release (#31) and decided as early as possible. Real life db normalization should teach us many things; ie Best in theory is not always best in reality.

Also, anonymous users should be just userS and handled in an anonymous user bundle to handle the need for different fields as in comment module ie shut this issue down in favor of #355513: Create an anonymous user API

Status:Needs work» Closed (duplicate)

I agree with #40.

Better approach at #355513: Create an anonymous user API

Issue summary:View changes

Updated issue summary.

SInce #355513: Create an anonymous user API has stalled,m I think this might still be a good way forward. I updated the Summary.

Title:Anonymous uid totally should be NULL instead of 0. Anonymous should not appear in the users table at all

Status:Closed (duplicate)» Active

Can only guess you meant to re-open this issue.

That said, this change proposal seems to make sense on its own, regardless of whether we'll do #355513: Create an anonymous user API in addition or not.

I.e.: you either get NULL or you get actual account information. That is, because aforementioned issue only intends to provide anonymous account info under certain conditions, so getting NULL for anonymous users won't be completely obsolete.

Status:Active» Needs review
StatusFileSize
new1.45 KB
FAILED: [[SimpleTest]]: [MySQL] Drupal installation failed.
[ View ]

Ok, Drupal 8 leverages typed data in a bunch of places, and short of special casing the ever loving crap out of users within it, I think we have to consider this is a requirement for release at this point. I'm moving this to major and providing the beginnings of a patch to see what fails and if I can get this moving some.

Eclipse

Status:Needs review» Needs work

The last submitted patch, 350407.patch, failed testing.

Status:Needs work» Needs review
StatusFileSize
new1.94 KB
FAILED: [[SimpleTest]]: [MySQL] 51,327 pass(es), 1 fail(s), and 0 exception(s).
[ View ]

This patch also comments out a line in theme.inc that was causing the install process to break. It is now left to user module to add the 'user' variable to the default variables using hook_template_preprocess_default_variables_alter(). [I think it does this already]

Status:Needs review» Needs work

The last submitted patch, 350407-47.patch, failed testing.

StatusFileSize
new2.58 KB
FAILED: [[SimpleTest]]: [MySQL] 51,341 pass(es), 1 fail(s), and 0 exception(s).
[ View ]

Ok, another revision to fix that test.

Status:Needs work» Needs review

Status:Needs review» Needs work
Issue tags:-anonymous users

The last submitted patch, 350407.patch, failed testing.

Status:Needs work» Needs review

#49: 350407.patch queued for re-testing.

I don't know why that failed, it passes for me locally on that test. retesting.

Eclipse

Status:Needs review» Needs work
Issue tags:+anonymous users

The last submitted patch, 350407.patch, failed testing.

nvm, I'm crazy that doesn't pass locally. Will get it tomorrow.

Been looking through CommentFormController and it looks like $comment->is_anonymous is not being set to true in buildEntity() - it's actually not being set to anything - despite being set correctly in the form. That means the 'name' doesn't get stored with the comment (as they assume a real user has posted it).

Status:Needs work» Needs review
StatusFileSize
new2.71 KB
PASSED: [[SimpleTest]]: [MySQL] 52,206 pass(es).
[ View ]

I've done some more digging and the problem actually seems to be in the CommentStorageController around line 150.

<?php
     
// We test the value with '===' because we need to modify anonymous
      // users as well.
     
if ($comment->uid->target_id === $user->uid && isset($user->name)) {
       
$comment->name->value = $user->name;
      }
?>

I'm not entirely sure what this code is trying to do however, as the anonymous user is a now an entity isset($user->name) is always going to be true and so the name set on the comment is never going to stick. I think we can change this to the following:

<?php
     
// We test the value with '===' because we need to modify anonymous
      // users as well.
     
if ($comment->uid->target_id === $user->uid && $user->uid) {
       
$comment->name->value = $user->name;
      }
?>

Not sure why we can't just remove the commented-out code from _template_preprocess_default_variables().

Status:Needs review» Active

For anyone else that didn't know, the current patches were moved to #1634280: drupal_anonymous_user() should return a User entity. Moving back to active for the original proposal.

Issue summary:View changes

Updated issue summary.