Need help writing an SQL query to import 21,000 users into Drupal

Keyz - May 21, 2008 - 18:57

Hey Drupal guys and gals :)

I'm working on converting my site to Drupal. The site, neverside.com, currently runs on a custom CMS (programmed by someone besides myself) and has quite a lot of content and users that I need to import into Drupal... about 21,000 users... 95,000 forum threads (nodes)... and 711,000 replies (comments). The site is currently dead/inactive, but with Drupal's magic I shall revive it! :D I'll be importing the nodes and comments as well, though trying to tackle users first.

I've investigated the User Import module, and determined it may not be usable for my needs (multi-hundred MB size of the CVS files aside, it's important that all my User IDs remain exactly the same to ensure that all the other content stays in sync with the correct users, even if some rows in my user table are missing - it looks like the User Import module will import users and create new ones in sequence, possibly changing a user's ID if there are any missing rows. I think the only way to ensure that IDs remain the same is to manually import the users via SQL.

As I'm not very proficient with programming or databases, I'm hoping someone can help me craft the correct SQL query I'll need to manually add users into Drupal's users table, and help me make sure it's done correctly.

Besides the users table itself, this is what I'm aware of what I need to do for the users. If there are other things I'm missing, please let me know.

  • I'm aware that I'll need to update the "users_uid" value to the last uid # in the sequences table once I've imported.
  • I've checked to ensure that passwords between the systems use the same md5 method, so are compatible.
  • I don't need to set any user roles (only moderators, and I'll do that by hand).
  • The usernames are already sanitized and free of special characters, as the current CMS gives every user their own subdomain (user.neverside.com)... the new site won't have that, though it means the names are cleaned.
  • I've selected the fields I have that match those for a Drupal user... there are additional fields too but I plan to make node profiles for that).
  • The old database stores only filename.jpg for avatars (e.g. Drupal picture field)... will need to include the files/pictures path in front of the filename when inserting the file names into Drupal's picture field. I'm not sure at all how to accomplish this.

Here is the SQL I'm using to export users from my current database:

SELECT user_id, username, password, email, signature, join_date, last_activity, last_visit, blocked_status, avatar FROM user ORDER BY user_id ASC;

The fields in the Drupal users table that I don't have and would need to set to whatever is default are: mode, sort, threshold, theme, timezone, language, init, data (I see that init is a duplicate of the email address).

So to sum it all up...

  • I plan to import the exported data into a temporary table in Drupal and run (via SSH/terminal) the SQL to import the users from there (I'm developing locally so the live database is on a different server). Is this the best way to proceed with that, and is there anything I should know about making an extra table in Drupal's database?
  • Would you help me write the SQL query needed to correctly import the data and make shiny, sparkly authentic Drupal users? :)

Ultimately the site will run on Drupal 6, as there are several more months of development on the site, so I can wait for the magic goodness of CCK and Views, Advanced Forum & Profile, etc on Drupal 6... however if needed I can import into Drupal 5 and upgrade to 6 later.

Thanks a ton for your help! Drupal rules!

- David

Small update

Keyz - May 21, 2008 - 22:07

Small update...

I realized I do in fact have the timezone field for my old CMS users, it's just stored in a different format. It's stored as a single digit, e.g. 0, -1, -2, etc. In the PHP on the site it appears to multiply this by 3600 seconds to arrive at the same numbers Drupal uses.

So I've added this part to my SELECT query: timezone * 3600 AS timezone ... which appears to set the numbers to Drupal's format.

Thanks in advance for your help! :D

- David

Progress

Keyz - May 22, 2008 - 11:18

Made some progress and think I "may" yet be able to use User Import module after all. I figured out a way of skimming through all ~21,000 user rows and confirming that there are no missing IDs at all. To my amazement there's not one missing. So my new plan is to import the basic data that User Import will accept and insert the rest of the fields manually with SQL.

Wish me luck! :D

just a tip

alihammad - May 26, 2008 - 22:01

I am not sure what encryption you use for your passwords but drupal uses md5. So, keep that in perspective!

Ali Hammad Raza
WordsValley

Yep I'd been worried about

Keyz - May 26, 2008 - 22:39

Yep I'd been worried about that before, but someone recommend that to test I go ahead an make a user with the same password between the two systems and compare the md5 hash to see if they match. Fortunately they matched, so passwords should import correctly :) No salt or other stuff to worry about.

I completed my first test round and everything went smoothly (once I'd filtered and corrected invalid and duplicate emails). Logged in as an imported user, and everything looks great. Still have to generate profile nodes (for use with Bio or Node Profile) to accept a variety of additional profile data I have.... and also need to manually run some SQL to properly set a few user preferences, such as their Contact form being enabled or not (still working on that, as the value is stored serialized in the "data" field).

Thanks!

wish you luck (y)

alihammad - May 27, 2008 - 14:29

Ali Hammad Raza
WordsValley

 
 

Drupal is a registered trademark of Dries Buytaert.