I'm trying to restore a MySQL database dump from a Drupal 6.19 installation; restoring the dump fails because of "duplicate entry" errors:
mysql -u root -p database_drupal < dbcontent.sql
Enter password: [...]
ERROR 1062 (23000) at line 5751: Duplicate entry 'das-415-node' for key 'word_sid_type'
Lines 5750 and 5751 read:
$ sed -n '5750p' dbcontent.sql | more
/*!40000 ALTER TABLE `search_index` DISABLE KEYS */;
...
$ sed -n '5751p' dbcontent.sql | more
INSERT INTO `search_index` VALUES ('situation',1,'node',0.816452), [...]
(a very long line follows).
As it seems, line 5751 trieds to insert duplicates into the search index. Simply deleting the "damaged" line does not help:
$ sed '5752d' dbcontent.sql > dbcontent_fixed.sql
...
$ mysql -u root -p database_drupal < dbcontent_fixed.sql
Enter password: [...]
ERROR 1062 (23000) at line 5751: Duplicate entry 'das-415-node' for key 'word_sid_type'
...
$ mysql -u root -p cinedat_drupal < dbcontent_fixed_3.sql
Enter password: [...]
ERROR 1062 (23000) at line 5752: Duplicate entry 'daß-3230-node' for key 'word_sid_type'
I tried to delete line after line a couple of times, but there are always duplicates left. As it seems there is no easy way to restore this database dump. That's quite bad as it renders database backups quite useless for normal users.
Table structure of 'search_index' from the live database (not the dump):
mysql> describe search_index
-> ;
+-------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+-------+
| word | varchar(50) | NO | MUL | | |
| sid | int(10) unsigned | NO | MUL | 0 | |
| type | varchar(16) | YES | | | |
| score | float | YES | | NULL | |
+-------+------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
Strangely I seem to be unable to locate the duplicates in the live database:
mysql> ALTER IGNORE TABLE search_index ADD UNIQUE INDEX (sid, word, type);
Query OK, 1985279 rows affected (1 min 37.63 sec)
Records: 1985279 Duplicates: 0 Warnings: 0
And:
mysql> select word, sid, type, count(*) from search_index where sid<10000 group by word, sid, type having count(*) > 1;
Empty set (2.15 sec)
I'm not enough a database guy to make sense out of this findings.
Duplicates in the search indexer have been discussed several times before; e.g.:
- #218403: Duplicate entry errors in search indexer (against D6.9)
- #143160: search_index has duplicate (sid, word, type, fromsid=0) entries (against D6.x-dev)
Also there a a couple of related postings in the the forums, e.g. duplicate search results (180032). However, as it seems there's still a way for duplicates to get into the search_index in current D6 versions. The affected site was upgraded from D5 a couple of months ago, but after the upgrade the search index has been rebuilt via ./admin/settings/search (but I did not drop the 'search_index' table manually), so I believe this might still be an issue in D6.
Any ideas or suggestions?
-asb
Comments
Comment #1
jhodgdonThe only suggestion I can make is to go to your MySQL console and truncate all the search-related tables. Don't drop the tables themselves -- just empty them out. Then run cron until your site is fully indexed.
I imagine those entries were put in there by your D5 site. Those issues you referenced were fixed many many months ago.
Comment #2
damien tournoud commentedYou probably switched from MySQL 5.0 to MySQL 5.1. The collations used in 5.1 are somewhat different, so importing a 5.0 dump into MySQL 5.1 is not totally supported. As described by Jennifer, it doesn't really matter anyway, as you can just throw away the content of this table and reindex.
Comment #3
asb commentedThank you for your replies.
No, I haven't yet switched from from MySQL 5.0 to MySQL 5.1, I'm still running 5.0.51a-24+lenny4 on Debian.
I remember to have struggled with collation issues in the past (D4 to D5 migration and latin-1 to utf-8 conversion), but I believe to have these issues properly fixed, and I'm pretty sure that the affected site wasn't even one of those started on D4.
As I said, I have rebuilt the search index after the last major upgrade; if this doesn't suffice it's good new knowledge (which would really need to be added to the "best practices" section of the handbook pages).
The real problem here is that database dumps become useless as backups because (to my limited knowledge) I can not truncate any search related tables in a database dump. If someone really needs his/her backup, then runs into this issue and has to fix a 1-gigabyte-database-dump-file with sed..., well, good look with that!
However, If you are sure that this duplicate entry issue is really fixed, I'll have to truncate all search tables. If I'm lucky, in a couple of months my users will be able to search our content again without Google ;)
Greetings, -asb
Comment #4
damien tournoud commentedWell, if MySQL cannot import its own dump, the bug is in MySQL, not Drupal. That said, I never heard about something like that.
Comment #5
asb commentedThat's quite polemic, isn't it?
If Drupal does not ensure the integrity of the data structures it stores, the database can't do anything about it but to refuse to process the inconsistent data. It's quite pointless to point fingers in such a way.
Try the MySQL Reference Manual or Google: http://www.google.de/search?hl=de&hs=Qr3&rls=de&q=mysql+%22ERROR+1062%22...
Comment #6
damien tournoud commentedNo, that's not polemic at all. It's MySQL job to ensure the data consistency. If the data was there in your database and you dump it to a file, MySQL is supposed to reimport it cleanly. If it doesn't it's a bug in MySQL.
You completely misunderstood what I said. The database accepted the data, because it is there in your database.
Comment #7
jhodgdonDamien is correct: MySQL is supposed to enforce the integrity of data, not Drupal. MySQL should not accept an entry if it would create a duplicate in the index (that is what those other issues you mentioned were reporting -- Drupal was trying to shove bad data into MySQL and MySQL wasn't accepting it).
And I think/hope MySQL should not allow you to create a new "unique" index on an existing table if that index would have duplicate values. So something went wrong in your MySQL, and if MySQL cannot read its own dumps, that cannot be blamed on Drupal. Sorry.
Anyway, hopefully truncating the tables and reindexing has resolved your problem...
Comment #8
asb commentedAs I said, reindexing my sites will takes several months, so I'll know more maybe between spring and summer '11.
The best recommendation for people doing backups would be not to use something like the fullsitebackup.sh shell script or it's derivates as they dump the complete database and thus might (will) create broken backups. I'm using these scripts for years, and just checked the dumps from my other sites, and none could be restored because of these duplicate entry issue (including some sites that were started on D6 and never went throught a major upgrade).
Comment #9
Dropfish commentedIn my case it worked to just remove the relevant INSERT statements from the dump. I don't know enough about the implementation to be sure, but my hope is that I'm only removing information that will be regenerated soon anyway.
So if you're desperate to restore from that only remaining database dump of yours, try this:
egrep -v '^INSERT INTO `search_(index|total)`' broken-dump.sql | mysql drupal6Comment #10
jhodgdonI did some more investigation. The problem is that if your site was updated from Drupal 5, the function system_update_6036() added the unique index to the search table, but it did it in "ignore" mode, meaning that it ignored duplicates. So at that point, your search table is in a bad state where it cannot be dumped/restored. The only solutions are to (a) run some kind of a query to remove duplicates or (b) truncate the table from the dump and reindex.
But I am changing this back to a bug report. The other issues you mentioned that were "fixed" caused this problem, because they didn't want to bother figuring out which records needed to be removed from the search index (and there probably is no good way to do it other than dumping the entire search index).
Meaning this is probably a bug that we won't fix, but it is nonetheless a real bug.
Comment #11
jhodgdonforgot to change the status too
Comment #12
asb commentedThank you for looking into this. Would you recommend to truncate the four search related tables (
search_dataset,search_index,search_node_linksandsearch_total), or should it suffice to reindex via./admin/settings/search?And, just as a reminder, I believe to be getting this ERROR 1062 as well with dumps of sites that were not upgraded from D5. But of course potentially anything could have interfered with the core search, and I'll have to verify this claim.
However, that's what I've done: Dumped the database of one affected site with the 'Backup and Migrate' module which allows to exclude tables from the database dump; this dump I set up locally, started the reindexing via
./admin/settings/searchand loopeddrush crona couple of times. Result:The "..." show segments I cut off from the (very long) PHP warning for better readability. I don't know too much about databases and I might be wrong, but I believe that Drupal 6 is trying to insert duplicate entries into a clean search table (actually I'm running Pressflow 6.19.92, but I don't think that the kitchen guys have fiddled with the core search).
Comment #13
jhodgdonPossibly any site that was running an early version of Drupal 6.x and upgraded to a later one would have run the system_update_6036() function (I'm not absolutely certain of when that went in), which would have left the duplicate index entries in the tables.
If you want to clean your tables, I think the only way is to truncate all four of the tables. I'm pretty sure that reindexing won't take care of it, because those entries would not be attempted to add again and would probably not be removed by this process. They might, but I think truncating would be the best solution (then running cron a bunch of times to reindex). You should probably also click the reindex button, as search modules can do extra things in response.
Comment #14
jhodgdonPlease don't go back and edit previous comments - it makes it difficult to follow the sequence...
So regarding the newly edited #12, the function search_index() is generating that query, and it does:
The function search_wipe() is doing:
So I don't see how this error you reported could be happening, unless you are running cron twice at the same time. Which I don't think should be possible, but maybe it is.
Comment #15
asb commentedSorry, my editing and your reply overlapsed.
I dont think that
drush croncan run twice since it only frees the shell prompt after finishing one run, but theoretically the "normal"./codemight have been triggered at the same time; ah, and we're running the 'JobScheduler' module. I'll try again on the live site and not use drush:Plus
./admin/settings/search/wipe, let's see what happens in the next days.Comment #16
asb commentedWhen changing servers, I've been hit again by this. Again, restoring the database dump fails:
This is a different site, and it was definitely not started with D5. Watever is causing this, it's still existing in D6.
It is more than frightening that any backup might be corrupt and not restorable. Changing priority because of the severity of corrupt database dumps, and because the issue obviously is not fixed in D6.
Comment #17
asb commentedand again:
Comment #18
asb commentedRelated issue: #772678: Database default collation is not respected
(this time it is an update from MySQL 5.0.x to 5.1.x)
Comment #19
asb commentedand one more:
Common denominator: All these sites are localized.
Comment #20
sunAFAICS, the small but important difference to #772678: Database default collation is not respected is that
@asb: However, did you already try whether you can import your database using a snapshot created by latest Demo module?
Comment #21
asb commentedIn #2, Damian suggested a connection between these duplicate entries when switching from MySQL 5.0 to MySQL 5.1. As of December 2010, this was not the case for me (Debian "Lenny"), but now now it might be (Debian "Squeeze").
I "resolved" the database restore issue by repeating the steps from #15 (dropped all search related tables from the source database, created a new database dump which could be imported smoothly under MySQL 5.1). Since I'm just doing a normal server switch, I still have the source databases available. If this'd be a disaster recovery, I'd probably have a major problem with the "corrupt" database dumps.
Maybe relevant:
mysqladmin -u username -p create databasename, but withmysql> CREATE DATABASE database CHARACTER SET 'utf8';.Re. Demonstration site (Sandbox / Snapshot): I'll try to educate myself how this module works (the Drupal sites on the old server are down).
Comment #22
hosais commentedHi,
I posted my problem at http://drupal.org/node/1223586#comment-4768356.
I think it is similar problem here. I would like to search(view query) accented characters in my web site. I did some tests.
1) Changed the database (utf8-general-ci => utf8-unicode-ci),
2) found that drupal executed "SET NAMES UTF8.
Drupal showed entries no matter their accented after searching. On the other hand, I tested in phpMySQL with "SET NAMES utf8;" + SQL query (copied from drupal view). It worked.
Now I have no idea what is the problem. If anyone has any suggestions, please tell me. Thank you very much.
hosais
Comment #23
asb commentedI was hit on just another site by this issue again when buld updating the search index through
drush search-index:And btw, this is a site with MySQL 5.0, not MySQL 5.1.
Comment #24
asb commentedWe were hit once again by this core issue. When restoring a database dump with
we got:
This time it's a site started before D6 (actually it was 4.6), and the data is migrated from MySQL 5.0 (old server) to MySQL 5.1 (new server). And again I can not try the 'demo' module as the Drupal site on the old server is down (file/domain already transferred) :-(
Fun side note: As rebuilding the site index via normal cron runs takes months, I tried
drush site-indexwhich dies with a PHP memory limit of 2000M (2 gigabytes!) due to a lack of memory (#1303480: PHP Fatal error: Allowed memory size of 209715200 bytes exhausted).Comment #25
alexmoreno commentedi'm of the oppinion of asb. There must be an error in Drupal search. I've run in the same problem and after some hours wasted/learning about the problem, simply truncating or uninstalling search module the mysqldumps worked again.
Thanks.
Comment #26
howdytom commentedThank you so much for your comment. I was searching for hours for a solution.
Finally I've disabled the search and search404 module in D6. Than I've uninstalled both modules. Now the duplicate entry error message disappeared.
Comment #27
jakob123 commentedSo Google brought me here as this bug still ranks pretty high. I am facing the MySQL-cannot-import-its-own-dump problem. A few points that I think are worth noting for others that will find this page when rushing to upgrade from EOL'd MySQL 5.0. (Ubuntu 8.04 has been EOL'd yesterday).
1) system_update_6036() cannot be blamed. "ALTER TABLE IGNORE" means that duplicate rows are DELETED. ( see http://dev.mysql.com/doc/refman/5.0/en/alter-table.html )
2) Drupal is NOT to blame if MySQL cannot import its own dump. The inability to import a dump from MySQL 5.0 into MySQL 5.1 is a bug in MySQL: http://bugs.mysql.com/bug.php?id=43593 (in 5.1, "ß" and "s" are deemed identical) . The last post describes how to get around that problem.
3) The "Duplicate entry" messages you get as the search module indexes your site are annoying but harmless. MySQL is refusing the duplicate entries and keeps the table integrity.
Comment #28
asb commentedI tend to disagree. The duplicates that Drupal generates in it's database are everything but "harmless". They still keep database backups from being processable in MySQL 5.1. MySQL bug #43593 referenced in #27 is about 5.0 to 5.1 migrations which are not the issue here. And I strongly doubt that Oracle or anyone else will make a RDMBs process garbled data. Not even Photoshop tries to load corrupt images, why should a database import invalid database dumps that have been corrupted by the database application, not by the RDBMs?
The only way around this is to tell everybody that all search-related tables have to be dropped if you are planning to restore your backup.
Comment #29
jhodgdonThe only Drupal bug identified in this issue was in migrating from Drupal 5 to 6, which at this point is probably a "wont' fix" issue. The other bug identified is that if you are running one MySQL version and export/dump out your Search tables in your Drupal database, it may not be importable into other MySQL versions or databases set up with different collations. This is not something Drupal can fix.
So I'm just going to mark this issue as "won't fix". If someone can come up with a way that Drupal can be patched to overcome some of these issues, please feel free to reopen the issue.