When running the upgrade to b5, this is what happens:
Update #173
* CREATE TABLE {file_revisions} ( fid int(10) unsigned NOT NULL default 0, vid int(10) unsigned NOT NULL default 0, description varchar(255) NOT NULL default '', list tinyint(1) unsigned NOT NULL default 0, PRIMARY KEY (fid, vid) ) /*!40100 DEFAULT CHARACTER SET utf8 */
* Failed: INSERT INTO {file_revisions} SELECT fid, vid, description, list FROM {files}
* CREATE TABLE {files_copy} AS SELECT * FROM {files}
* DROP TABLE {files}
* CREATE TABLE {files} ( fid int(10) unsigned NOT NULL default 0, nid int(10) unsigned NOT NULL default 0, filename varchar(255) NOT NULL default '', filepath varchar(255) NOT NULL default '', filemime varchar(255) NOT NULL default '', filesize int(10) unsigned NOT NULL default 0, PRIMARY KEY (fid) ) /*!40100 DEFAULT CHARACTER SET utf8 */
* INSERT IGNORE INTO {files} SELECT fid, nid, filename, filepath, filemime, filesize FROM {files_copy}
* DROP TABLE {files_copy}
As you can see, the critical step of inserting the data into the file_revisions table fails. The process should abort at that point, because the steps that follow result in data loss.
I'll research *why* it fails... whether it is specific to my system and data or a larger problem. In the meantime, perhaps someone could revise that update so that it is impossible to lose data.
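The danger described above can be seen in miniature. The sketch below uses SQLite as a stand-in for MySQL (the table shapes follow the log above, but the rows and the guard logic are illustrative, not the actual updates.inc code): the INSERT fails because the old files table has no vid column, and the safe behaviour is to check that result before dropping anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Old-style files table: no vid column, as in a pre-revisions schema.
cur.execute("CREATE TABLE files (fid INTEGER, nid INTEGER, description TEXT)")
cur.execute("INSERT INTO files VALUES (1, 5, 'report.pdf')")

cur.execute("""CREATE TABLE file_revisions (
    fid INTEGER NOT NULL DEFAULT 0,
    vid INTEGER NOT NULL DEFAULT 0,
    description TEXT NOT NULL DEFAULT '',
    PRIMARY KEY (fid, vid))""")

ok = True
try:
    # Fails with "no such column: vid", like ERROR 1054 in comment #1.
    cur.execute("INSERT INTO file_revisions "
                "SELECT fid, vid, description FROM files")
except sqlite3.OperationalError:
    ok = False

# The safe behaviour: only continue rebuilding (and dropping) files
# if the copy into file_revisions actually succeeded.
if ok:
    cur.execute("DROP TABLE files")

rows = cur.execute("SELECT COUNT(*) FROM files").fetchone()[0]
print(ok, rows)  # the insert failed, but the data survived
```

The unpatched update skips the `if ok:` guard, which is exactly how the data loss happens.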
Comment | File | Size | Author |
---|---|---|---|
#63 | 53177_6.patch | 4.73 KB | dww |
#61 | 53177_5.patch | 4.58 KB | dopry |
#57 | 53177_4.patch | 5.08 KB | dopry |
#55 | 53177_3.patch | 4.73 KB | Cvbge |
#54 | 53177_2.patch | 4.77 KB | Cvbge |
Comments
Comment #1
Dries CreditAttribution: Dries commented

I can reproduce this problem. When I execute the query on the command line, I get:
INSERT INTO file_revisions SELECT fid, vid, description, list FROM files;
ERROR 1054 (00000): Unknown column 'vid' in 'field list'
Comment #2
robertDouglass CreditAttribution: robertDouglass commented

I get a different failure:
Comment #3
robertDouglass CreditAttribution: robertDouglass commented

The reason why my update fails the way it does is clear: I have multiple entries in the files table that have the same fid and vid. I've been using this version of the database.mysql file:
$Id: database.mysql,v 1.222 2006/01/24 18:23:41 dries Exp $
where the files definition reads:
I would guess that there should have been a primary key on vid, fid. I wonder how many people will have files tables that have duplicates like I do?
Comment #4
robertDouglass CreditAttribution: robertDouglass commented

To answer my own question, all of the people with databases older than:
version 1.224, Wed Feb 22 10:06:46 2006 UTC
Comment #5
robertDouglass CreditAttribution: robertDouglass commented

To clarify: my duplicate records share the same fid, nid, and vid. The primary key should have been (fid) alone, I think.
Comment #6
robertDouglass CreditAttribution: robertDouglass commented

Changing the query to this resolved the issue for me:
INSERT INTO {file_revisions} SELECT DISTINCT(fid), vid, description, list FROM {files}
I'm guessing that I ended up with the same copy of the file that I was getting in the web application, but maybe this needs to be confirmed. I can't roll a patch right now because I'm behind my client's firewall on their machine, but the code is on lines 1601 and 1628 of updates.inc.
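Robert's DISTINCT fix can be sanity-checked outside Drupal. This sketch uses SQLite in place of MySQL, with invented rows shaped like the duplicates from comment #3: the original query aborts on the duplicate (fid, vid) primary key, while the DISTINCT version collapses the identical rows and succeeds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# files table with an exactly duplicated row, as described in comment #14.
cur.execute("CREATE TABLE files "
            "(fid INTEGER, vid INTEGER, description TEXT, list INTEGER)")
cur.executemany("INSERT INTO files VALUES (?, ?, ?, ?)",
                [(1, 7, 'a.txt', 0), (1, 7, 'a.txt', 0), (2, 7, 'b.txt', 1)])

cur.execute("""CREATE TABLE file_revisions (
    fid INTEGER, vid INTEGER, description TEXT, list INTEGER,
    PRIMARY KEY (fid, vid))""")

# The original update's query chokes on the duplicate (fid, vid) pair.
try:
    cur.execute("INSERT INTO file_revisions "
                "SELECT fid, vid, description, list FROM files")
except sqlite3.IntegrityError as e:
    print("plain insert failed:", e)

# With DISTINCT, the identical rows collapse and the insert goes through.
cur.execute("INSERT INTO file_revisions "
            "SELECT DISTINCT fid, vid, description, list FROM files")
count = cur.execute("SELECT COUNT(*) FROM file_revisions").fetchone()[0]
print(count)  # 2
```

Note this only rescues rows that are duplicated on *every* column; the thread's later comments (#58 onward) cover why that caveat matters.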
Comment #7
Dries CreditAttribution: Dries commented

Shouldn't we add a primary key too?
Comment #8
robertDouglass CreditAttribution: robertDouglass commented

I think it is already there.
Comment #9
sime

I've recreated the issue quite easily, at least the specific case that Robert is describing. I installed 4.7b4 and activated the upload module. I had a quick look at the table and noted that there is no primary key, just two non-unique indexes on fid and vid.
Then I created a new page node and added two files. I checked the files table and saw two rows had been created.
Then I edited the node and saved it. No additional rows in the files table. OK.
Then I edited the node, ticked "revision" and saved it. Looking in the files table, the two rows were duplicated exactly. (Tab-separated file attached.)
On Robert's suggestion, I was going to make a patch and see if I could upgrade by applying a "DISTINCT" clause to the SQL in the appropriate places, but I'm really not clear about what the upgrade process is yet. To be sure, I looked in the 4.7b5 database.mysql file and saw that fid is now a primary key, so we can be sure that the upgrade wouldn't work as is.
It's a bit late for me here, so I'll log off and see how this issue is going tomorrow.
Simon
Comment #10
dopry CreditAttribution: dopry commented

Attached patch with DISTINCT... Curiously, the pgsql version already had a DISTINCT in the SELECT that built the new files table.
@sime
Your output seems correct... Everything is identical between the two sets except the vid.
Comment #11
dopry CreditAttribution: dopry commented

umm yeah... I always forget to update that status thing...
Comment #12
killes@www.drop.org CreditAttribution: killes@www.drop.org commented

I am pretty sure that simply adding DISTINCT might "fix" the problem by losing data. Sime's output is not buggy; it is as it should be. The files are attached both to the first and the second revision, hence the double entry.
Comment #13
dopry CreditAttribution: dopry commented

killes, no data should be lost. The unique per-vid data has already been copied to the file_revisions table.
What remains, and what is being placed in the new files table from files_copy, should be the same for all vids sharing an fid.
Comment #14
robertDouglass CreditAttribution: robertDouglass commented

I had grotesquely mangled data where fid, nid and vid were exactly duplicated:
fid nid vid
1 5 7
1 5 7
like this. When that is the case, there is no recovery possible... take one and be happy. What is your suggestion, Gerhard?
Comment #15
dopry CreditAttribution: dopry commented

@robertDouglass: did the duplication occur with the DISTINCT patch, or did you get mangling of data by using the DISTINCT patch? Do you have a copy of the data I can work with?
If it is just duplication that is causing the problem, something like...
CREATE TABLE files_copy AS SELECT * FROM files;
DROP TABLE files;
CREATE TABLE files AS SELECT DISTINCT(fid), nid, filename, filepath, filemime, filesize FROM files_copy;
DROP TABLE files_copy;
may do the job...
Comment #16
robertDouglass CreditAttribution: robertDouglass commented

This is what I had in mind. Hope the patch is ok... I had to work pretty hard to make it (not on my own environment, still).
Comment #17
dopry CreditAttribution: dopry commented

You may want to use DISTINCT on vid; otherwise you will lose revision information.
Comment #18
robertDouglass CreditAttribution: robertDouglass commented

@dopry: that might work. Can you roll a patch that does that? Like I said, rolling patches is hard for me from where I am.
Backing up: there are two issues here. First, this upgrade is dangerous for anyone who has a files table that was created after the revisions patch went in, because the update fails to copy the information from the temporary revisions table yet still deletes the table, despite the failure.
Second, we need to assess how many corrupt files tables we've created in the wild, and what the best way to recover is. These are two separate tasks, and my patch addresses the corrupt-tables-in-the-wild scenario.
Comment #19
dopry CreditAttribution: dopry commented

...
I still don't know what you started with and what you ended up with, or whether the duplicates you had in the files table were because of the missing DISTINCT on the second query, which rebuilt the files table... I'd like to be able to reproduce the error you experienced.
...
Comment #20
robertDouglass CreditAttribution: robertDouglass commented

This is what my files table looked like *before* the update:
Comment #21
robertDouglass CreditAttribution: robertDouglass commented

lol, so much for being able to set the input format. As you were supposed to easily see from the table that I made (which then had its tags stripped), each row is exactly the same. This is due to a bug that existed in Drupal before this update, but which has been corrected by this update. However, the flawed data that the bug created still exists (like above), and the update process, as is, will destroy *all* of the file revisions data if it encounters a table like the one I have. My patch handles the flawed data well enough, considering how broken the data is to begin with. Perhaps it would be better to say DISTINCT(vid) instead, though in my case it wouldn't help any, I don't think.
Comment #22
Dries CreditAttribution: Dries commented

On drupal.org, we have lots of those:
Looks kinda fubar to me ...
Comment #23
dopry CreditAttribution: dopry commented

Is drupal.org using, or has it used, image.module? The _original, thumbnail, preview suggest so... :)
Do these files have associated mime/size/desc fields that are accurate, or are those fields empty?
Comment #24
sepeck CreditAttribution: sepeck commented

Yes, Drupal.org uses the image module.
http://drupal.org/node/27367
Comment #25
Dries CreditAttribution: Dries commented

dopry, some files have mimetype information, others do not. :/
Comment #26
dopry CreditAttribution: dopry commented

I'm not sure that we can really address pre-existing file table corruption. I think that is something administrators are going to have to figure out for themselves, or ask for help on... I don't think many people in the wild will experience this... I think we are most likely to see this kind of corruption coming from people who track HEAD, apply updates inconsistently, and run buggy core or contrib modules.
I think admins will have to take care of that...
I rewrote the update with the DISTINCT clauses fixed... On MySQL, you may get duplications in the revisions table if fid and vid match but there are different descriptions or list options. Its DISTINCT is apparently broken: no matter how you write it, it will only do DISTINCTROW.
I also switched to saving the files table by using an ALTER ... RENAME to files_copy, to preserve keys, etc.
I fixed it so files_copy will not be dropped if an error occurred on any of the queries.
I would appreciate it if someone familiar with PostgreSQL would test the queries... I could only check the DISTINCT clause syntax; I don't run PostgreSQL locally.
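The flow dopry describes can be sketched as follows, again with SQLite standing in for MySQL (the table definitions and data are illustrative, not the patch itself): rename the original table aside as a backup, rebuild, and only drop the backup once every step has succeeded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE files (fid INTEGER PRIMARY KEY, filename TEXT)")
cur.execute("INSERT INTO files VALUES (1, 'a.txt')")

success = True
try:
    # Keep the original table (with its data and keys) under a backup name.
    cur.execute("ALTER TABLE files RENAME TO files_copy")
    # Rebuild files with the new definition, deduplicating on the way back.
    cur.execute("CREATE TABLE files "
                "(fid INTEGER PRIMARY KEY, filename TEXT NOT NULL DEFAULT '')")
    cur.execute("INSERT INTO files "
                "SELECT DISTINCT fid, filename FROM files_copy")
except sqlite3.Error:
    success = False

# Only discard the backup if every query above succeeded.
if success:
    cur.execute("DROP TABLE files_copy")

tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['files']
```

If any step fails, files_copy remains in place, which is the recovery guarantee dopry's rewrite adds.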
Comment #27
chx CreditAttribution: chx commented

I think we should only care about 4.6 -> HEAD upgrades; oldHEAD -> HEAD upgrades are of little concern IMO.
Comment #28
robertDouglass CreditAttribution: robertDouglass commented

Ok. So we forget about data consistency, fine. We need to stop deleting the temporary files table if the update fails, though. This is a really hard blow. The whole update needs to fail if the INSERT ... SELECT fails.
Comment #29
robertDouglass CreditAttribution: robertDouglass commented

But what is wrong with one of the patch versions here that lets the oldbeta -> 4.7 upgrade succeed? It won't hurt the 4.6 -> 4.7 upgrade. Take a look at the last patch submitted. What is the worst that can happen from it?
Comment #30
dopry CreditAttribution: dopry commented

The last patch does address the problem with MySQL upgrades and row duplication in the tables. It also provides a backup files table in case something goes wrong. I don't think we should have to go that far, since the upgrade instructions include directions to back up your database before upgrading!!!
What's wrong with keeping this backup table around on partial upgrades? Well, nothing, besides that someone may forget about it and drag it around for a while; however, if we did that with everything, a bad upgrade would quickly leave you with a big mess of _copy tables. Also, a second run of the update will now fail because the table files_copy already exists...
If you want real data consistency we should be using transactions!
However, before we talk about the issue more:
Review/Test the stinking patch!
It works for me on MySQL, but I tested the queries by hand with a combination of the data I was supplied by both Dries and Robert. I hadn't really experienced the problem before; my personal site doesn't have any revisions, so the process went flawlessly for me to begin with.
.darrel.
Comment #31
robertDouglass CreditAttribution: robertDouglass commented

Sorry, I didn't see the care you had taken to recover from failure. Thanks! I'll test in the morning.
Comment #32
dopry CreditAttribution: dopry commented

@robertDouglass, if you don't have time you can always send me a dump of your database after a DELETE FROM users, and I can test the updates with access checks disabled. I'd really like to get this issue closed.
Comment #33
dopry CreditAttribution: dopry commented

might as well assign it to myself.
Comment #34
robertDouglass CreditAttribution: robertDouglass commented

I'm in the process of testing the patch (finally). It succeeds in handling failure gracefully, and it applies cleanly, but it still chokes on my data set.
The data set is attached.
Comment #35
robertDouglass CreditAttribution: robertDouglass commented

Before we get too hung up on the state of the data set, it should be noted that the original goal of this issue (not nuking the table if the upgrade fails) has been met. The data set that I uploaded is the product of a Drupal application that has been under constant development and customization since around October 2005, and has seen around 20 updates to the Drupal code base. It is clearly the exception, not the rule. So I think it is probably important to determine if anyone else has a files table as fubar as mine. Is the Drupal.org files table this messed up? Anyone else?
Comment #36
icenogle CreditAttribution: icenogle commented

I installed beta6, and first tried the unpatched updates.inc. It hung immediately. Then I applied the patch, and it hung again.
I'm attaching my files table dump.
Darrell Icenogle
Comment #37
webchick

icenogle: just a quick note... when you change the title when you reply to an issue, you actually change the title for the entire issue. :) Switching it back.
Comment #38
icenogle CreditAttribution: icenogle commented

Wondered how that happened... Sorry.
Comment #39
robertDouglass CreditAttribution: robertDouglass commented

Can you specify which patch you applied? The most complete is dopry's patch: http://drupal.org/files/issues/53177_0.patch
Comment #40
icenogle CreditAttribution: icenogle commented

I used 53177. I thought I was using the latest. I'll try the other and report back.
Darrell
Comment #41
icenogle CreditAttribution: icenogle commented

My mistake... it was actually 53177_0.patch that I tried. (Wish it were otherwise.)
Darrell
Comment #42
robertDouglass CreditAttribution: robertDouglass commented

icenogle, did you send us the files dump from *after* the failed update? It would be more important to see the files table from before the update, please.
Comment #43
icenogle CreditAttribution: icenogle commented

No, that was before the update.
I have a test site, and a live site. The live site is running beta4, and I haven't tried taking it past beta4. I have tried beta5 on the test site, given up, and tried beta6.
The table dump is from a db dump from the live site.
What is it that made you think it was from after the update? Doesn't system_update_173() create a new file_revisions table, and possibly a copy of the files table?
Darrell
Comment #44
nedjo

Should we be cleaning up the files table (removing duplicate rows) in the first part of the update?
Comment #45
dopry CreditAttribution: dopry commented

@icenogle
Did you attempt it with the patch, after restoring your database?
@nedjo
No, I do not want to clean duplicates out of my files table backup. I want it to be a copy of the original.
@all
I finally have some time to get back on this today... I'll do some testing. For those of you doing updates from one beta to another, make sure you update from a clean database backup. Otherwise the first update_173 will split your files table, and the patched version will have no idea what to do.
Comment #46
icenogle CreditAttribution: icenogle commented

> Did you attempt with the patch, after restoring your database?
Yes, I did. I'm trying to be clear, here.
I just tried re-loading the original db and dumping the files table again, and I diff'ed them. No differences.
Perhaps someone could tell me why they think I'm sending the post-update files table? What is the symptom of a split files table? (I'm just not a MySQL guy, though I'm gaining.)
Thanks,
Darrell
Comment #47
robertDouglass CreditAttribution: robertDouglass commented

I asked because you seem to only have one file (with no revisions of it) in the database. The particular problem we're trying to solve revolves around files that have multiple revisions. Not that your data isn't valid or helpful, it is both; we just weren't expecting your db table to cause a problem.
Comment #48
chx CreditAttribution: chx commented

Are we trying to fix the update of beta X databases? May I ask why?
Comment #49
icenogle CreditAttribution: icenogle commented

Okay...
I'm here, and willing to do whatever would be helpful. I'm a C++ guy, and don't know how to go about debugging these scripts.
I'll check in once in a while. Let me know if there is something I can do...
Darrell
Comment #50
icenogle CreditAttribution: icenogle commented

Sorry for the distraction, folks. My problem was of the "get rid of the js and the problem goes away" category. Wrong bug.
I'll slink away, now...
Darrell
Comment #51
Junyor CreditAttribution: Junyor commented

I originally wrote the code, so let me try to clear some things up. First, dopry's latest patch looks good and prevents a possible disaster if the update doesn't work, though I have not tested it. The DISTINCT when inserting into the file_revisions table is a good idea, but I didn't find a need for it in my testing. The reason I made a copy of the files table, rather than renaming it, was to fix the default definition of the files table. In 4.6, the files table defaults were strings, not integers:
In HEAD, it's like this:
If you don't do the copy, you don't get the new table definition.
Next, the mysql portion of the update doesn't use DISTINCT, but uses INSERT IGNORE, instead. INSERT IGNORE drops duplicate data, pretty much the same way DISTINCT does.
If y'all don't think those changes are necessary, no worries. :)
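Junyor's point about INSERT IGNORE can be illustrated with SQLite's equivalent, `INSERT OR IGNORE` (MySQL spells it `INSERT IGNORE`; the rows here are invented): rows that would violate the primary key are silently skipped instead of aborting the whole statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE files_copy (fid INTEGER, filename TEXT)")
cur.executemany("INSERT INTO files_copy VALUES (?, ?)",
                [(1, 'a.txt'), (1, 'a.txt'), (2, 'b.txt')])

cur.execute("CREATE TABLE files (fid INTEGER PRIMARY KEY, filename TEXT)")
# Duplicate fids are dropped instead of aborting the whole INSERT.
cur.execute("INSERT OR IGNORE INTO files "
            "SELECT fid, filename FROM files_copy")
rows = cur.execute("SELECT fid, filename FROM files ORDER BY fid").fetchall()
print(rows)  # [(1, 'a.txt'), (2, 'b.txt')]
```

Unlike DISTINCT, which compares whole rows, the IGNORE variant dedupes on the key alone, which is why it behaves differently when duplicated fids carry different data.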
Comment #52
dopry CreditAttribution: dopry commented

Here is an updated version.
Apparently at some point some contrib modules, or HEAD/beta versions of Drupal, have corrupted drupal.org's files tables. I expect that some of these problems could have occurred elsewhere, but what the hey, I added some sanity/data integrity checks.
Namely, a DELETE FROM {files} WHERE fid = 0;
and UPDATE {files} SET vid = nid WHERE vid = 0;
Integrity seems to check out for the drupal.org files table.
@robertDouglass... you have the same fid associated with two different nids. You'll have to correct that manually. See fid = 32.
@all
This update goes out of its way to make sure you have a backup of the files table if anything goes wrong.
However, I'm not going to fix any more revision-related errors for HEAD/beta upgrades.
Revisions did not even exist in 4.6.
If you have any problems upgrading, it is a result of running on beta/HEAD, and you should be prepared to fix your database manually.
This patch in its current state should be sufficient for the majority of updates.
Comment #53
dopry CreditAttribution: dopry commented

I'm ready to downgrade this issue to normal, as it is unrelated to a 4.6 -> 4.7 upgrade, and none of the people with beta -> beta or HEAD -> HEAD upgrades are testing it.
Comment #54
Cvbge CreditAttribution: Cvbge commented

This patch looks weird.
Anyway, I've fixed it to work with PostgreSQL:
- CREATE TABLE ... _AS_ SELECT ...
- not prefixing temporary tables (not needed + won't work with PostgreSQL)
The sequence must be updated, because we were inserting existing values and not using it, and the old sequence was deleted when the old table was deleted.
Question: MySQL uses DISTINCT on all fields, while PostgreSQL uses DISTINCT ON (fid, vid) and ON (fid). I think the MySQL version might be wrong. If we have e.g. a (fid, nid) pair and fid is the PK, then it could try to insert (10, 20) and (10, 30), because such rows are different. That would fail, as fid is the PK.
I haven't tested the patch; I will do so tomorrow (well, technically it's today ;))
Comment #55
Cvbge CreditAttribution: Cvbge commented

Old patch
Comment #56
Cvbge CreditAttribution: Cvbge commented

I've done an update of an old CVS checkout (updates 172+ were run). I had no errors, but I had an empty {files} table.
I've put some garbage into the table and tried again; still no errors. I don't know if the conversion was correct, though.
This is not connected with this issue, but after the update, when I view a forum thread, the replies (comments) are displayed as titles only (collapsed), not with the full body as previously. Has anyone else had this problem?
Comment #57
dopry CreditAttribution: dopry commented

I agree the patch is weird... because it's trying to fix things that shouldn't sanely happen, and to normalize a table.
The DISTINCT (fid, vid) is there on purpose... There will be duplicate fids in the file_revisions table, one for every revision. It should probably be PK'd on vid, with fid just a key, and it should be a DISTINCT(vid).
So maybe:
CREATE TABLE file_revisions SELECT DISTINCT(vid) vid, fid, list, description FROM files_tmp;
CREATE TABLE files SELECT DISTINCT(fid) fid, filename, filepath, filemime, filesize FROM files_tmp;
I was fixated on fids for some reason :).
Comment #58
Cvbge CreditAttribution: Cvbge commented

I'm not sure if this is OK.
The database.*sql definition of {files} has a PK on fid, and {file_revisions} on (fid, vid). Thus, when selecting rows from the old {files}, we should get rows with unique (fid, vid) pairs for {file_revisions} and with unique fid for the new {files}.
The _3 patch does DISTINCT ON (fid, vid) for {file_revisions} and DISTINCT ON (fid) for {files} for PostgreSQL, thus selecting "unique" rows. But the MySQL version uses just DISTINCT, so I understand it selects "unique" rows, where a "unique" row is one that differs from all other rows on *all* columns, not only the (fid, vid) pair or fid. I think this is wrong. Please correct me if I'm wrong.
In _4 you have changed from a PK on (fid, vid) to a PK on (vid) only. Is this correct? If yes, then database.*sql should also be updated.
Comment #59
killes@www.drop.org CreditAttribution: killes@www.drop.org commented

There is probably something fishy going on. Before the update, drupal.org had 428 entries in files; afterwards it has 409.
Comment #60
dopry CreditAttribution: dopry commented

@killes,
That sounds about right... There are several 'empty' rows in the drupal.org {files} table...
@Cvbge:
You are correct: MySQL is selecting a distinct row. It cannot do DISTINCT ON (fid, vid)... It just doesn't work. It's what we want, but it doesn't work. Your last patch should do, since I don't really want to switch the PKs or database.mysql. I wish MySQL could SELECT DISTINCT ON (fid, vid)... but it will only select distinct across all the columns you are selecting; it will not restrict the distinctness to the columns in the DISTINCT clause. That may be why Junyor used INSERT IGNORE originally... A distinct row should be adequate for the file_revisions table in most cases, especially if your files table isn't already corrupt.
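dopry's observation is easy to verify; SQLite shares MySQL's behaviour here (the sample rows are invented). Writing `DISTINCT(fid)` is just `DISTINCT` with parentheses around the first column, and uniqueness is still judged across every selected column, not fid alone:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE files (fid INTEGER, vid INTEGER)")
# Same fid, two different vids -- one file, two revisions.
cur.executemany("INSERT INTO files VALUES (?, ?)", [(1, 7), (1, 8)])

# "DISTINCT(fid)" does not restrict distinctness to fid:
rows = cur.execute(
    "SELECT DISTINCT(fid), vid FROM files ORDER BY vid").fetchall()
print(rows)  # [(1, 7), (1, 8)] -- both rows survive, fid is duplicated
```

This is why the PostgreSQL `DISTINCT ON (fid)` form has no direct MySQL equivalent, and why the thread falls back to row-wise DISTINCT plus INSERT IGNORE.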
Comment #61
dopry CreditAttribution: dopry commented

Here's our update of Cvbge's patch with the files_tmp table properly prefixed...
@Cvbge: I really appreciate you testing this out on PostgreSQL. I'll have to sit down and learn a bit about it one day.
Comment #62
dww

Ugh, killes now tells us you can't prefix temporary tables. ;)
To review: 53177_3.patch is the current best-guess candidate for this issue... Anyone else looking here should test/review that one until further notice.
thanks,
-derek
Comment #63
dww

At killes's suggestion, here's a new version of 53177_3.patch that *does* prefix the {files_tmp} table, but just doesn't declare it TEMPORARY. This is better, and should work just fine. I also trimmed trailing whitespace in a few places in the update.
Comment #64
Junyor CreditAttribution: Junyor commented

@dopry: Why not use the INSERT IGNORE for MySQL, since the DISTINCT might not work?
Comment #65
killes@www.drop.org CreditAttribution: killes@www.drop.org commented

dopry is right; our files table is quite a mess. I've tried it again and compared file counts before (including the bogus entries) and after (minus the bogus entries), and the difference is exactly the number of bogus entries.
Comment #66
killes@www.drop.org CreditAttribution: killes@www.drop.org commented

The funny thing is that files aren't versioned at all if we update from 4.6. :p
applied