Hey Drupal guys and gals :D
I'm working on migrating a large amount of content to the new Drupal-powered version of my site neverside.com. I have about 21,000 users, 95,000 forum threads (nodes)... and 711,000 replies (comments). I imported users using User Import, and the nodes manually with SQL. So far I've successfully imported the users and nodes, as well as set correct taxonomy terms for the nodes to appear in the correct forum categories. I'm a bit stuck on getting the comments imported correctly though.
I have this query written to add my old system's posts to Drupal's comments table (I have a temp table that contains all the relevant data I have available for importing into Drupal... the comments/replies of my old system are sorted by timestamp, no threading at all):
INSERT INTO comments (cid, pid, nid, uid, subject, comment, hostname, timestamp, score, status, format, thread, users, name, mail, homepage)
SELECT
p_thread_id,
'0',
p_user_id,
p_title,
p_post,
'',
p_time,
'',
'0',
'1',
'',
'a:1:{i:0;i:0;}',
'',
''
FROM neverside_replies;
The main thing I'm unsure about is how to correctly populate the "thread" column. I've looked at how Drupal fills this column by default using the "int2vancode and vancode2int functions" (which I don't understand completely). Even when comments are not threaded, it seems to store this value in a special format. I noticed that using Devel module's content generation tool, comments are generated with a value of "0" for thread, and it seems to work fine. However I'm not sure if this is the best way to do this for my actual content (e.g. might it break if someone adds new comments to an old node?). When I'm done I know I need to also update the sequences table as well as rebuild node_comment_statistics.
In addition to the "thread" column, if there are also any other tips on importing comments correctly I'd greatly appreciate it :D
Thanks a bunch guys! :D
- David
Comments
I was wondering!
just out of curiosity, why are you not using comment_save() (drupal api) from within a drupal node ?
Ali Hammad Raza
WordsValley
Hi Ali - Thanks, I'll have a
Hi Ali -
Thanks, I'll have a look at it on the API docs to see if it helps me understand better... however I'm importing everything directly via SQL, because I am not any good with PHP yet, and because it is such a huge amount of content doing it over the database is faster and less prone to page timeouts/etc. Also as not all my pages and their comments have sequential IDs (e.g. some deleted over time) this is the best way I can think of to ensure that everything stays properly connected to each other, maintaining their original IDs.
- If anyone knows, I'd still love some pointers in the right direction on how to turn what I have (posts grouped by node ID and ordered by timestamp) into correct values for the comments "thread" field. Devel sets it just to
- I also do have the comment IP addresses I could import, though does anyone know if this can be left blank? The IPs are useless at this point (up to 8 years old).
I've also corrected some of my SQL statement:
For my Wordpress to Drupal
For my Wordpress to Drupal converter, this is how I've handled the 'thread' field of the comments table.
After each post/node is added (copied across from Wordrpess) an SQL Select query (recordset) retrieves all the comments, if any, for the particular post ordered by comment date (ascending - oldest first).
The thread field merely starts at 1 for the first and increments. The thread field is formatted as a string "00" so 1 is "01". Further there is a slash at the end of the string. So your thread field values will look like these:
01/
02/
03/
Remember, for the next post/node, start all over again with the comments 01/, 02/ and so on.
For Wordpress, comments are not threaded, one follows the other, so it was basically easy to implement.
Hope this helps.
___________________________
Steven Taylor
http://prime357.org
Thanks Steven...
Thanks Steven...
One thing I'm curious about... I noticed that "real" comments generated as normal by Drupal start off in that format, 01/, 02/, etc... though when it gets to the tenth comment for a given node, it instead uses 0a/, 0b/, etc (it continues adding alphabet letters in this format past comment 20, which is set to 0k/... after it gets to 0z/ it then switches to 110/ on comment 37 and continues... not sure what happens after that). If it were just using concat to add a 0 and slash to the CID (per node) I'd be good to go, though the alphabet letters throw a wrench into the mix for me. I haven't yet figured out how to accommodate this sort of thing in my SQL statement.
I tested adding my own threaded comment reply to a comment generated by Devel module (which just sets all thread values to 0), and the result was a thread value of 0.00/ for my new reply comment, and the comment appearing visually on the comment listing associated with a different comment than I had pressed the reply link for. So that answers one question for me... doing things Devel module's way may allow comments to function normally so long as I disable threading on future comments (or enable it and just know that threading will break on all old nodes if new comments are added). To keep the option for threaded comments open if desired though, it looks like I'm going to have to correctly populate the thread field. All my nodes are forum nodes, so revived old forum threads are a definite possibility, as tends to happen on forums.
I'm definitely in over my head on this one haha...
Thanks everyone for your help! :D
Vancode
If you're going to do it the raw SQL way (as I did) then you'll need to use the 'int2vancode' and possibly 'vancode2int'. A 'Vancode' must be a word to describe what is supposed to be in the 'thread' column.
http://api.drupal.org/api/function/int2vancode/5
http://api.drupal.org/api/function/vancode2int/5
Since these functions don't use any other drupal code you could just copy and paste them into any scripts of your own. Worked for me.
Thanks, very helpful. I'll
Thanks, very helpful. I'll be creating the necessary functions for my next release of the Wordpress converter.
___________________________
Steven Taylor
http://prime357.org
Hi dejbar - Thanks :) Did
Hi dejbar -
Thanks :) Did you, after importing the rest of the fields of your comments as best you could using SQL, process the thread column using a PHP script so you could make use of those functions? If you did and still have it, I'd love to take a look at that script. If not that's ok, I have a friend who's great with PHP who may be able to recreate a script based on those functions (unfortunately I'm not very good at PHP yet myself). I definitely wouldn't have been able to import the whole site with a PHP script (the CSV files are hundreds of MB), though a script to process a single column about a million times may be ok haha :D
Thanks!
- David
I know this is an old
I know this is an old thread, but I just imported and generated the thread field using SQL, so here's the SELECT query to generate the vancode for the PARENT POSTs (level 1 posts, which have threads like '01/', '02/', etc) only (I added 'isroot' field to the table just for testing, so you should change that WHERE part :D )
with a bit modification you could run it gradually from level 1 to level n
Hope that helps for a starting point
Thank you
Thanks very much Safri :) I will give it a try. The site in question is still under development (had a long break) so I'm sure the info will still be very helpful.
Again, thanks!
-- David
davidnewkerk.com | absolutecross.com
View my Drupal lessons & guides
Script for thread and comment statistics
I know it's been a while but I ran into a similar situation recently after merging a few sites and importing comments from disqus. At the end of the day I cobbled together this script to both rebuild the node_comments_statistics table and the comment.thread. As there's no clear cut solution posted here, I figured I'd post it incase someone else comes a looking. (Cross posted on my site http://seanreiser.com/projects/2009/06/06/rebuilding-drupal-comment-stat... )
Thanks sreiser, very useful
Thanks sreiser, very useful
sina.salek.ws, Software Manager & Lead developer
Feel freedom with open source softwares
You saved my life!
I had 4000 threaded comments to import from PostNuke for the new version of flayrah.com. Would have taken me an age to figure this out - so thanks a lot.
I did have to change line 18 to:
to handle the case where a name had a quotation mark in it, though.
Hi, the correct function name
Hi, the correct function name to use is: db_escape_string()
M.
Freelancer Senior Drupal Developer -- http://www.ziobuddalabs.it
Sean, Your script saved me
Sean,
Your script saved me from lot's of headaches, thank you very much!
Hi Sean, I am trying to use
Hi Sean,
I am trying to use your excellent script to correct thread and comment statistics for a huge website with more than 17000 nodes being imported to Drupal.
It started to work very smoothly in the beginning and correctly showed number of comments for the first nodes. After it would went on for very long time (couple hours maybe) it had stuck at node #15945, for which and all nodes after which it did not show correct number of comments. I ran script again and it stuck exactly at the same node.
So my question, is it possible to and if yes how exactly I modify the script so that it started to crawl from the node #15945 and not from the very beginning if I start it again?
Thanks,
Yngens
Hi Yngens- I know you've been
Hi Yngens-
I know you've been persistant trying to get in touch with me today about this. I'm sorry I haven't been available sooner, I was taking a day off. I really haven't thought about this code in about a year (I wrote it, it did what I needed, I moved on), but looking at the code I don't see why it would stop at 15945 (or any other node).
First off, where the code hasn't been written to limit the run to a subset of nodes you could add a "where nid => XXXXX" to the first 3 lines in order to run against a certain group of nodes.
To debug this I'd start off starting a run at node 15945. If it hangs up again I'd start at 15946 and see if it runs through to the end. If it does there is some corruption in the comment chain for 15495 (example 2 nodes pointing at each other as their parent).
I will repost this the other places you've left this comment and reply to your email.
Sean, I just deleted all the
Sean,
I just deleted all the comments and the problematic node itself and ran script without any interruption again. It worked great this time, thank you!
D7 script for thread and comment statistics
Been working on migrating comments from Postnuke->Drupal 7 and had to modify Sean's script above - use at your own risk!
Regards,
L33tdawg@hackinthebox.org
After beating my head against
After beating my head against this a bit myself, I found that the import script above was close but wasn't quite getting the desired result, at least for a thread with multiple levels of nesting. I looked at the equivalent chunk of code in comment_save and updated the thread setting behavior for comments without parents.
This script should correctly update comment threads:
After that, I'm relying on devel_rebuild_node_comment_statistics(); to update counts and the stats table.
Hope this helps someone else!
Instead of running the update
The above update scripts did a partial job for me, so I had to run these two commands to finish the job for the counts and user names:
update node_comment_statistics, users set last_comment_name=users.name where node_comment_statistics.last_comment_uid=users.uid;
update node_comment_statistics set comment_count=(SELECT COUNT(comment.nid) FROM comment WHERE node_comment_statistics.nid=comment.nid GROUP BY comment.nid);
I didn't update the cid field though yet in the comment table, but everything is appearing ok in Advanced Forum/Forum lists and counts.