Hello,

the update script got out of memory (initially 128Mb limit then 256Mb limit) when upgrading to version 6.x a large site with over 400.000 private message entries in privatemsg table .

I had to tweek the privatemsg_update_6000() to continue from last mid updated and launch it 4 times in order to complete. I tried first to find where the memory hole was, but I could not .

Thank you

Comments

ronnieserr’s picture

Dear quiquee,

I have ***EXACTLY*** the same issue. I tried upping php.ini memory limit to 512M, but that failed to. My sense like you is that privatemsg_update_6000() was not tested on large privatemsg table.

What to do? How did you tweak privatemsg_update_6000()? I was thinking of adding a LIMIT clause to all the selects to, say, 1000, and see what happens.

Does it make sense contacting the maintainer of this module?

Thanks ... looking forwards to hearing from you.

berdir’s picture

Priority: Critical » Major

Yes, that function hasn't been designed to deal with large amounts of private messages. Doing it now is rather complicated. If you can give me an anonymized version of your data, I can try it. Write me a message through the contact form to get my mail address or join #drupal-games in IRC.

quiquee’s picture

I have basically rewritten the install script to do the updates in a way should work for everyone, and much much faster. I will post the patch as soon as I finish the testing.

Regards

quiquee’s picture

This is the patch. It is not thoroughly tested but it did the work for me.

I hope it can as well serve others.

Regards

igorik’s picture

yes, I have the same issue. No offense quiquee but I would rather welcome patch from Berdir or some word of Berdir about last patch, if it is safe and correct.

berdir’s picture

Status: Needs work » Needs review
berdir’s picture

Checking with testbot, will review later

berdir’s picture

Version: 6.x-1.3 » 6.x-1.x-dev
igorik’s picture

Hi

Sorry for my missunderstanding, I am using 6.x.2.x, not 6x.1. but my update script with using latest dev 2.x buidl fails and server will stop - for lack of memory with update script
then stop all privatemsg module, this is my bug report, not sure if it is connected with this:
http://drupal.org/node/1116134

Status: Needs review » Needs work

The last submitted patch, privatemsg-update6xLargeSite-1111528.patch, failed testing.

berdir’s picture

+++ privatemsg.install	2011-04-03 22:06:17.000000000 +0200
@@ -215,6 +215,11 @@ function privatemsg_update_6000() {
+    'indexes'=> array(
+		      'mid'               => array('mid'),
+		      'thread_id'         => array('thread_id'),
+		      'uid'         => array('uid'),

Use two spaces instead of tabs. Also, you must not rewrite history. Meaning, you can't change the result of the upgrade path because later upgrade functions might depend on the existence of the indexes.

+++ privatemsg.install	2011-04-03 22:06:17.000000000 +0200
@@ -291,89 +291,66 @@ function privatemsg_update_6000() {
   // Step 2c: Next the data from the archive table - notice all these messages have been deleted both by the author and the recipient.
+  ¶
   if (db_table_exists('privatemsg_archive')) {
-    $data = db_query("SELECT * FROM {privatemsg_archive}");
-    while ($result = db_fetch_array($data)) {
-      if ($result['thread'] == 0) {
-        $result['thread_id'] = $result['id'];
-      }
-      else{
-        $result['thread_id'] = 0;
-      }
-      if ($result['author'] <> $result['recipient']) {
-        db_query("INSERT INTO {pm_message} (mid, author, subject, body, timestamp) VALUES ( %d, %d, '%s', '%s', %d )", $result['id'], $result['author'], $result['subject'], $result['message'], $result['timestamp']);
-        db_query("INSERT INTO {temp_pm_index} (mid, thread, folder, thread_id, uid, is_new, deleted) VALUES ( %d, %d, %d, %d, %d, %d, %d )", $result['id'], $result['thread'], $result['folder'], $result['thread_id'], $result['recipient'], 0, 1);
-      }
-      db_query("INSERT INTO {temp_pm_index} (mid, thread, folder, thread_id, uid, is_new, deleted) VALUES ( %d, %d, %d, %d, %d, %d, %d )", $result['id'], $result['thread'], 0, $result['thread_id'], $result['author'], 0, 1);
-    }

Trailing spaces

+++ privatemsg.install	2011-04-03 22:06:17.000000000 +0200
@@ -291,89 +291,66 @@ function privatemsg_update_6000() {
+    db_query("INSERT INTO {temp_pm_index} (mid, thread, folder, thread_id, uid, is_new, deleted) SELECT id, thread, folder, id, recipient, newmsg, recipient_del FROM {privatemsg_archive} WHERE thread=0" );
+    db_query("INSERT INTO {temp_pm_index} (mid, thread, folder, thread_id, uid, is_new, deleted) SELECT id, thread, folder, 0, recipient, newmsg, recipient_del FROM {privatemsg_archive} WHERE thread<>0");
+    db_query("INSERT INTO {temp_pm_index} (mid, thread, folder, thread_id, uid, is_new, deleted) SELECT id,thread,0,id,author,0, author_del  FROM {privatemsg_archive} WHERE thread=0") ;

Is that separation of thread_id 0 really necessary? if it is 0, then we can just let it insert that?

+++ privatemsg.install	2011-04-03 22:06:17.000000000 +0200
@@ -291,89 +291,66 @@ function privatemsg_update_6000() {
+    db_query("INSERT INTO {temp_pm_index} (mid, thread, folder, thread_id, uid, is_new, deleted) SELECT id, thread, folder, id, recipient, newmsg, recipient_del FROM {privatemsg} WHERE thread=0" );
+    db_query("INSERT INTO {temp_pm_index} (mid, thread, folder, thread_id, uid, is_new, deleted) SELECT id, thread, folder, 0, recipient, newmsg, recipient_del FROM {privatemsg} WHERE thread<>0");
+    db_query("INSERT INTO {temp_pm_index} (mid, thread, folder, thread_id, uid, is_new, deleted) SELECT id,thread,0,id,author,0, author_del  FROM {privatemsg} WHERE thread=0") ;
+    db_query("INSERT INTO {temp_pm_index} (mid, thread, folder, thread_id, uid, is_new, deleted) SELECT id,thread,0,0,author,0,author_del FROM {privatemsg} WHERE thread<>0") ;

Same as above. Maybe this could even be combined with the privatemsg_archive query with UNION? Not sure if it would be any faster..

+++ privatemsg.install	2011-04-03 22:06:17.000000000 +0200
@@ -291,89 +291,66 @@ function privatemsg_update_6000() {
+  // Step 3a: Fix the thread dat
+  db_query("DROP TABLE IF EXISTS {temptable}") ;
+  db_query("CREATE TABLE temptable select thread, min(mid) as mid  FROM temp_pm_index where thread_id=0 group by thread" ) ;
+  db_query("create index t2 on temptable ( thread) ") ; ¶
+  db_query("create index t1 on temp_pm_index ( thread_id)" ) ; ¶

You removed the "a" at the end of a comment.

Drupal has an API to create temporary tables, see http://api.drupal.org/api/drupal/includes--database.mysqli.inc/function/...

Also, some trailing spaces and spaces before ";".

+++ privatemsg.install	2011-04-03 22:06:17.000000000 +0200
@@ -291,89 +291,66 @@ function privatemsg_update_6000() {
+  db_query("INSERT INTO {pm_index} (mid, thread_id, uid, is_new, deleted) SELECT mid, thread_id, uid, is_new, deleted FROM {temp_pm_index}  " ) ;

Never seen that syntax before. This also makes me wonder why we need temp_pm_index in the first place...? ( I haven't written that update function, there probably is a reason...)

+++ privatemsg.install	2011-04-03 22:06:17.000000000 +0200
@@ -291,89 +291,66 @@ function privatemsg_update_6000() {
+  if ( 1==2 ) { ¶
+    db_drop_table($ret, 'privatemsg');
+    db_drop_table($ret, 'privatemsg_archive');
+    db_drop_table($ret, 'privatemsg_folder');
+    if (db_table_exists('privatemsg_block_user')) {
+      db_drop_table($ret, 'privatemsg_block_user');
+    }
+    db_drop_table($ret, 'temp_pm_index');
+    if (db_table_exists('temptable')) {
+      db_drop_table($ret, 'temptable');

You probably left that 1==2 condition there for testing, but we need it :)

+++ privatemsg.install	2011-04-03 22:06:17.000000000 +0200
@@ -424,8 +401,8 @@ function privatemsg_update_6003() {
-  $result = db_query($sql);
-  while ($row = db_fetch_object($result)) {
+  //$result = db_query($sql);

Same here, you can't just disable this, is the query too slow?

Powered by Dreditor.

ronnieserr’s picture

privatemsg-update6xLargeSite-1111528.patch failed for me too.

quiquee’s picture

Version: 6.x-1.x-dev » 6.x-1.3

Berdir, thanks for the review. If you want me to help testing it from a clean install and repost only after testing, then I am happy to do. Whatever I can do to help you and others.

I try to reproduce the behaviour of the existing script with mass updates rather than one by one fetch-insert/update. I did it on a reduced version of my data for testing (80.000 instead of +400,000 messages) and I may have broken something along the way as I implemented as it was breaking, never done a clean one pass updatedb run. As said, if you think my contribution is worth I am happy to retest and fix. If you want to do it yourself and post the patch, I will test using the final patch. That is probably the best solution

regards

quiquee’s picture

No worries. It is you who can be offended for me to post something which was not thoroughly tested ! :-)
I hope I can help to make it work asap

berdir’s picture

Version: 6.x-1.3 » 6.x-1.x-dev

You are more than welcome to continue working on the patch! This is hard thing to get right, it is important that the patch is tested on multiple systems with different data.

Don't get my long review the wrong way, I am rather picky and there is nothing wrong with posting patches early, see http://www.webchick.net/embrace-the-chaos.

I've received some test data from ronnieserr, once you updated your patch, I will try it on that myself. I have no Drupal 5 system, so I was never able to test that upgrade path, have not written it and in fact, never even touched Privatemsg for Drupal 5.x ;)

If we can get it working with your mass-update queries (which are certainly better than what we have currently!), great. If not, we need to try splitting it up into multiple batch runs, by processing only a limited amount of rows per request. However, that is going to be even complexer, since we can't add new update functions and would have to track the progress over multiple steps.

In short, please continue working on this!

quiquee’s picture

Assigned: Unassigned » quiquee
Status: Needs work » Needs review
StatusFileSize
new10.4 KB

Hello,

I went through the different errors in detail.
The first 4 would cause the archived messages (those deleted by both author and recipient) to disapear after the upgrade. I fixed that and now it runs clean. For the test I have used the sample tables that ronnie sent (380k messages) and on my own data (500k) . The upgrade takes slightly more than 8 minutes on a powerful server.

The other warnings were not critical and were part of the original upgrade script when upgrading from drupal 5. As they are coming from attemps to drop indexes that dont exist when upgrading from v5 I decided to comment out those lines. The alternative would be to check if indexes exist, but for that we need the db_index_exists that comes only in v7.

I have then tested sending messages, tagging them and deleting them, and it works as expected. However, it is very very slow. I am sure there are missing indexes that would need to be added. I am not going to go through that now, but I will definitely need to investigate that before I can move my site to production on v6.

Please report any problem with the patch.
Thank you!

berdir’s picture

Status: Needs review » Needs work
+++ /home/http/gameshare/drupal-6.15/sites/all/modules/privatemsg/privatemsg.install	2011-04-08 01:00:49.000000000 +0200
@@ -215,6 +215,11 @@ function privatemsg_update_6000() {
+    'indexes'=> array(
+	'mid' => array('mid'),
+	'thread_id' => array('thread_id'),
+	'uid' => array('uid'),

Still some tabs instead of spaces here.

+++ /home/http/gameshare/drupal-6.15/sites/all/modules/privatemsg/privatemsg.install	2011-04-08 01:00:49.000000000 +0200
@@ -291,82 +291,77 @@ function privatemsg_update_6000() {
+  foreach ( array("privatemsg_archive", "privatemsg") as $oldMessageTable) {
+  ¶
+    $recipient_del='recipient_del';
+    $author_del='author_del';
+   ¶

Many trailing spaces in most parts of the patch. I'll re-roll and clean them up myself.

Also, that variable should be named $old_message_table.

+++ /home/http/gameshare/drupal-6.15/sites/all/modules/privatemsg/privatemsg.install	2011-04-08 01:00:49.000000000 +0200
@@ -424,10 +422,7 @@ function privatemsg_update_6003() {
-  $result = db_query($sql);
-  while ($row = db_fetch_object($result)) {
-      privatemsg_message_change_status($row->mid, PRIVATEMSG_READ, $row );
-  }

Again, why are you removing this part?

+++ /home/http/gameshare/drupal-6.15/sites/all/modules/privatemsg/privatemsg.install	2011-04-08 01:00:49.000000000 +0200
@@ -490,9 +485,16 @@ function privatemsg_update_6007() {
-  db_drop_index($ret, 'pm_message', 'author');
-  db_drop_index($ret, 'pm_message', 'subject');
-  db_drop_index($ret, 'pm_message', 'timestamp');
+  // This works only in v7. ¶
+  // for now just let leave the indexes if the ever exist
+  // ¶
+  //  if (db_index_exists('pm_message', 'author')) {
+  // db_drop_index($ret, 'pm_message', 'author');
+  //}
+
+  // db_drop_index($ret, 'pm_message', 'subject');
+  // db_drop_index($ret, 'pm_message', 'timestamp') ;

They only don't exist because you did not create them. That's exactly the kind of problems that turn up if you change history :)

Also, I've added a backport of db_index_exists() to 6.x-2.x already, so we could backport that.

Setting to needs work, I'll work on this a bit myself.

Powered by Dreditor.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new10.23 KB

Ok, worked a bit on the patch...

- Cleaned up coding style, removed unecessary spaces, splitting comments at 80 chars, using tabs instead, ...
- Used db_query_temporary() for temptable
- I confirmed that the query in 6003 runs forever. Instead of just removing it however, I added a check so that it is skipped when you update directly from Drupal 5 because that fixes a data corruption that only happens if you used an earlier version of privatemsg.

The patch took 1m34s with drush using the data from ronnieserr.

I can also confirm the really bad performance after the upgrade. A big part of that is because of privatemsg_filter, which has a really really bad query in 6.x-1.x. This is already improved a lot in 6.x-2.x and I'm working on making it fast in #744374: Introduce a pm_thread table

Also not that there is an upgrade issue when upgrading from 6.x-1.x to 6.x-2.x currently, see #1122188: Update fails because privatemsg_filter updates run after privatemsg.

berdir’s picture

StatusFileSize
new10.28 KB

Another update.

- I made my own simple test site with d5 to be able to better test if it works correctly. Noticed that is_new is always set to 0, updated the patch to actually set that correctly
- I also verified that I can update to 6.x-2.x with the patch posted in #1122188: Update fails because privatemsg_filter updates run after privatemsg (note that it will take a long time but it's using the batch system correctly so that's ok).

ronnieserr’s picture

I have run privatemsg-update6xLargeSite-1111528-v2.patch on privagemsg 6.x-1.x-dev successfully. No update errors. Thank you very much.

I'll be checking functionality in the next few days.

quiquee’s picture

Berdir, thank you very much. I will be using 6.x-2 as I need the extra speed. I plan to test it with my half million message database and 50k users over the coming days and will report any issue. Let me know if you need some special test on anything

regards

berdir’s picture

Sure, any help testing the two linked issues (or anything else for that matter) would be welcome.

berdir’s picture

Can you confirm that the patch I posted works for you too? Going to commit this soon...

berdir’s picture

Status: Needs review » Fixed

Commited to 6.x-1.x and 6.x-2.x.

Thanks for reporting, reviewing, and testing!

A new development snapshot will be available within the next 12 hours. This improvement will be available in the next official release.

Powered by Dreditor (triage sandbox) and Triage transitions

Status: Fixed » Closed (fixed)
Issue tags: -6x upgrade

Automatically closed -- issue fixed for 2 weeks with no activity.