How to correctly import comments (especially the comments "thread" field)?

Keyz - May 30, 2008 - 19:59

Hey Drupal guys and gals :D

I'm working on migrating a large amount of content to the new Drupal-powered version of my site neverside.com. I have about 21,000 users, 95,000 forum threads (nodes)... and 711,000 replies (comments). I imported users using User Import, and the nodes manually with SQL. So far I've successfully imported the users and nodes, as well as set correct taxonomy terms for the nodes to appear in the correct forum categories. I'm a bit stuck on getting the comments imported correctly though.

I have this query written to add my old system's posts to Drupal's comments table (I have a temp table that contains all the relevant data I have available for importing into Drupal... the comments/replies of my old system are sorted by timestamp, no threading at all):

INSERT INTO comments (cid, pid, nid, uid, subject, comment, hostname, timestamp, score, status, format, thread, users, name, mail, homepage)
SELECT
p_thread_id,
'0',
p_user_id,
p_title,
p_post,
'',
p_time,
'',
'0',
'1',
'',
'a:1:{i:0;i:0;}',
'',
''
FROM neverside_replies;

The main thing I'm unsure about is how to correctly populate the "thread" column. I've looked at how Drupal fills this column by default using the "int2vancode and vancode2int functions" (which I don't understand completely). Even when comments are not threaded, it seems to store this value in a special format. I noticed that using Devel module's content generation tool, comments are generated with a value of "0" for thread, and it seems to work fine. However I'm not sure if this is the best way to do this for my actual content (e.g. might it break if someone adds new comments to an old node?). When I'm done I know I need to also update the sequences table as well as rebuild node_comment_statistics.

In addition to the "thread" column, if there are also any other tips on importing comments correctly I'd greatly appreciate it :D

Thanks a bunch guys! :D

- David

I was wondering!

alihammad - May 31, 2008 - 10:11

just out of curiosity, why are you not using comment_save() (drupal api) from within a drupal node ?

Ali Hammad Raza
WordsValley

Hi Ali - Thanks, I'll have a

Keyz - May 31, 2008 - 11:37

Hi Ali -

Thanks, I'll have a look at it on the API docs to see if it helps me understand better... however I'm importing everything directly via SQL, because I am not any good with PHP yet, and because it is such a huge amount of content doing it over the database is faster and less prone to page timeouts/etc. Also as not all my pages and their comments have sequential IDs (e.g. some deleted over time) this is the best way I can think of to ensure that everything stays properly connected to each other, maintaining their original IDs.

- If anyone knows, I'd still love some pointers in the right direction on how to turn what I have (posts grouped by node ID and ordered by timestamp) into correct values for the comments "thread" field. Devel sets it just to
- I also do have the comment IP addresses I could import, though does anyone know if this can be left blank? The IPs are useless at this point (up to 8 years old).

I've also corrected some of my SQL statement:

INSERT INTO comments (cid, pid, nid, uid, subject, comment, hostname, timestamp, score, status, format, thread, users, name, mail, homepage)
SELECT
p_id,
'0',
'p_thread_id',
p_user_id,
p_title,
p_post,
'',
p_time,
'0',
'0',
'1',
thread,
'a:1:{i:0;i:0;}',
'username',
'',
''
FROM neverside_replies;

For my Wordpress to Drupal

superjacent - May 31, 2008 - 12:10

For my Wordpress to Drupal converter, this is how I've handled the 'thread' field of the comments table.

After each post/node is added (copied across from Wordrpess) an SQL Select query (recordset) retrieves all the comments, if any, for the particular post ordered by comment date (ascending - oldest first).

The thread field merely starts at 1 for the first and increments. The thread field is formatted as a string "00" so 1 is "01". Further there is a slash at the end of the string. So your thread field values will look like these:

01/
02/
03/

Remember, for the next post/node, start all over again with the comments 01/, 02/ and so on.

For Wordpress, comments are not threaded, one follows the other, so it was basically easy to implement.

Hope this helps.

___________________________

Steven Taylor
http://prime357.org

Thanks Steven...

Keyz - May 31, 2008 - 21:34

Thanks Steven...

One thing I'm curious about... I noticed that "real" comments generated as normal by Drupal start off in that format, 01/, 02/, etc... though when it gets to the tenth comment for a given node, it instead uses 0a/, 0b/, etc (it continues adding alphabet letters in this format past comment 20, which is set to 0k/... after it gets to 0z/ it then switches to 110/ on comment 37 and continues... not sure what happens after that). If it were just using concat to add a 0 and slash to the CID (per node) I'd be good to go, though the alphabet letters throw a wrench into the mix for me. I haven't yet figured out how to accommodate this sort of thing in my SQL statement.

I tested adding my own threaded comment reply to a comment generated by Devel module (which just sets all thread values to 0), and the result was a thread value of 0.00/ for my new reply comment, and the comment appearing visually on the comment listing associated with a different comment than I had pressed the reply link for. So that answers one question for me... doing things Devel module's way may allow comments to function normally so long as I disable threading on future comments (or enable it and just know that threading will break on all old nodes if new comments are added). To keep the option for threaded comments open if desired though, it looks like I'm going to have to correctly populate the thread field. All my nodes are forum nodes, so revived old forum threads are a definite possibility, as tends to happen on forums.

I'm definitely in over my head on this one haha...

Thanks everyone for your help! :D

Vancode

dejbar - June 2, 2008 - 06:55

If you're going to do it the raw SQL way (as I did) then you'll need to use the 'int2vancode' and possibly 'vancode2int'. A 'Vancode' must be a word to describe what is supposed to be in the 'thread' column.

http://api.drupal.org/api/function/int2vancode/5
http://api.drupal.org/api/function/vancode2int/5

Since these functions don't use any other drupal code you could just copy and paste them into any scripts of your own. Worked for me.

Thanks, very helpful. I'll

superjacent - June 2, 2008 - 07:35

Thanks, very helpful. I'll be creating the necessary functions for my next release of the Wordpress converter.

___________________________

Steven Taylor
http://prime357.org

Hi dejbar - Thanks :) Did

Keyz - June 3, 2008 - 19:30

Hi dejbar -

Thanks :) Did you, after importing the rest of the fields of your comments as best you could using SQL, process the thread column using a PHP script so you could make use of those functions? If you did and still have it, I'd love to take a look at that script. If not that's ok, I have a friend who's great with PHP who may be able to recreate a script based on those functions (unfortunately I'm not very good at PHP yet myself). I definitely wouldn't have been able to import the whole site with a PHP script (the CSV files are hundreds of MB), though a script to process a single column about a million times may be ok haha :D

Thanks!

- David

I know this is an old

natrio - January 26, 2009 - 11:00

I know this is an old thread, but I just imported and generated the thread field using SQL, so here's the SELECT query to generate the vancode for the PARENT POSTs (level 1 posts, which have threads like '01/', '02/', etc) only (I added 'isroot' field to the table just for testing, so you should change that WHERE part :D )

select concat(cast(char(length(conv(rank, 10, 36)) + ord('0') -1) as char), cast(conv(rank,10,36) as char)), nid, pid, cid, subject, comment, thread
from (
  select if (@nid=nid, @num := @num + 1, @num:=1 ) rank, @nid :=nid,
  nid, pid, cid, thread, subject,
  comment from comments e, (select @num:=0) d, (select @nid:=0) h
  where isroot=1
  order by nid asc, pid asc
  ) u ;

with a bit modification you could run it gradually from level 1 to level n

Hope that helps for a starting point

Thank you

Keyz - January 26, 2009 - 11:54

Thanks very much Safri :) I will give it a try. The site in question is still under development (had a long break) so I'm sure the info will still be very helpful.

Again, thanks!

-- David
davidnewkerk.com | absolutecross.com
View my Drupal lessons & guides

Script for thread and comment statistics

sreiser - June 9, 2009 - 17:46

I know it's been a while but I ran into a similar situation recently after merging a few sites and importing comments from disqus. At the end of the day I cobbled together this script to both rebuild the node_comments_statistics table and the comment.thread. As there's no clear cut solution posted here, I figured I'd post it incase someone else comes a looking. (Cross posted on my site http://seanreiser.com/projects/2009/06/06/rebuilding-drupal-comment-stat... )

db_query("UPDATE {comments} set thread = ''");

db_query("DELETE FROM {node_comment_statistics}");

$nodes = db_query("SELECT nid,uid,changed FROM {node} WHERE status=1");
while ($m = db_fetch_object($nodes)) {
    $nid = $m->nid;

    // Calculate node_comment_statistics

    $comments=db_query("SELECT nid,max(cid) FROM {comments} WHERE status=0 AND nid=$nid GROUP by nid");
    if($l = db_fetch_object($comments)) {
        $cid = $l->{"max(cid)"};

        $lastcomments = db_query("SELECT uid,name,timestamp FROM {comments} WHERE cid=$cid");
        $n = db_fetch_object($lastcomments);
        $uid = $n->uid;
        $name = $n->name;
        $timestamp = $n->timestamp;

        $counts = db_query("SELECT count(*) FROM {comments} WHERE status=0 AND nid=$nid");
        $n = db_fetch_object($counts);
        $count = $n->{"count(*)"};
    } else {
        $timestamp = $m->changed;
        $name = NULL;
        $uid = $m->uid;
        $count = 0;
    }
    db_query("INSERT INTO {node_comment_statistics} (nid,last_comment_timestamp,last_comment_name,last_comment_uid,comment_count) VALUES ('$nid','$timestamp','$name','$uid','$count')");



    // Build Comment Threads for Node


    $comments=db_query("SELECT cid, pid FROM {comments} WHERE nid=$nid ORDER BY timestamp");
   
    $nodethread = 0;
    while ($comment = db_fetch_object($comments)){
               
        if ($comment->pid == '0'){
            $thread = int2vancode(++$nodethread) . "/";
        }
        else {
           
            $parent = _comment_load($comment->pid);
            // This is comment with a parent comment: we increase
            // the part of the thread value at the proper depth.

            // Get the parent comment

            // Strip the "/" from the end of the parent thread.
            $parent->thread = (string) rtrim((string) $parent->thread, '/');

            // Get the max value in _this_ thread.
            $max = db_result(db_query("SELECT MAX(thread) FROM {comments} WHERE thread LIKE '%s.%%' AND nid = %d", $parent->thread, $edit['nid']));

            if ($max == '') {
                // First child of this parent.
                $thread = $parent->thread .'.'. int2vancode(0) .'/';
              }
            else {
                // Strip the "/" at the end of the thread.
                $max = rtrim($max, '/');

                // We need to get the value at the correct depth.
                $parts = explode('.', $max);
                $parent_depth = count(explode('.', $parent->thread));
                $last = $parts[$parent_depth];

                // Finally, build the thread field for this new comment.
                $thread = $parent->thread .'.'. int2vancode(vancode2int($last) + 1) .'/';
           
            }
        }
        db_query("UPDATE {comments} set thread = '$thread' where cid = $comment->cid");
    }

}

Thanks sreiser, very useful

sinasalek - July 20, 2009 - 17:37

Thanks sreiser, very useful

sina.salek.ws, CIO & Lead developer
Feel freedom with open source softwares

 
 

Drupal is a registered trademark of Dries Buytaert.