Add uid to username to prevent race condition in account generation code
| Project: | Email Registration |
| Version: | 5.x-1.3 |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs work |
In email_registration_user, if the generated username already exists, the code will issue a query to find the last numeric sequence number in use with that username, then add one to it and change the username to that. But because Drupal is generally run in a multi-process or multi-thread environment and the check and the update do not happen atomically, it is possible for another user to create an account with the same generated name, which would be assigned the same sequence number. The sequence is like this:
- User 1 creates account 'racer@example.com'
- User 2 creates account 'racer@aol.com' at about the same time
- email_registration_user runs for User 1, finds that username racer already exists, chooses sequence number 1
- email_registration_user runs for User 2, finds that username racer already exists, chooses sequence number 1
- email_registration_user runs for User 1, changes username to racer_1
- email_registration_user runs for User 2, changes username to racer_1. SQL statement fails because of duplicate username, and user gets a weird SQL error in their browser
My solution is just to always append their UID. We already know it is unique. This also simplifies the query, and make it work on Postgres (see http://drupal.org/node/421078).
Patch is attached.
| Attachment | Size |
|---|---|
| email_registration_username_uid.patch | 1002 bytes |

#1
I just posted a patch to another issue when I saw this and it got me thinking. Why not just replace the @ with an underscore and put the whole email address in. There shouldn't be any collisions at all that way, plus every user name could be computed from their email address without added information.
#2
There is a privacy issue with revealing the user's entire email address as their username, if their username is ever visible.
#3
I agree with scottgifford - that we can't show the entire e-mail address. Even showing partial e-mail is a potential issue because spammers can guess the domain (yahoo, hotmail, gmail will cover 75% of the users on most sites).
What about #247717: provide a hook so other modules can help generate the user name which would let the admin define a style for the username generation including UID. On small sites the race condition will never be a problem and uids in the name are kind of annoying so they could be ommitted there.
#4
IMO it is a mistake to ignore the race condition, even for smaller sites. Don't sites generally avoid showing the username if they are using email registration, and so the username doesn't matter at all?
Without using the UID, the trick would be to either make the change atomically (perhaps by locking the SQL tables), or else to detect and handle the error, and try again with a new username. That's probably a better solution overall, but using the UID was fast and easy. :)
#5
This would fix also #421078 postgres compatibility issue. Adding _UID certainly would make the function less complicated although there would need to be a check for existing usernames for sites already using the module.
#6
Hrm, so you're saying that maybe I create an account for "user@example.com that is assigned the UID "123", so username "user_123" is generated. Previously a user has signed in and already has the name "user_123". I hadn't thought about that; we don't have to worry about it on our site because we have used this patch since the beginning.
Maybe the easiest solution is to just lock the tables, then. Adding a new user is a fairly infrequent event, and the tables will just be locked for a short time.