When running the upgrade to b5, this is what happens:
Update #173
* CREATE TABLE {file_revisions} ( fid int(10) unsigned NOT NULL default 0, vid int(10) unsigned NOT NULL default 0, description varchar(255) NOT NULL default '', list tinyint(1) unsigned NOT NULL default 0, PRIMARY KEY (fid, vid) ) /*!40100 DEFAULT CHARACTER SET utf8 */
* Failed: INSERT INTO {file_revisions} SELECT fid, vid, description, list FROM {files}
* CREATE TABLE {files_copy} AS SELECT * FROM {files}
* DROP TABLE {files}
* CREATE TABLE {files} ( fid int(10) unsigned NOT NULL default 0, nid int(10) unsigned NOT NULL default 0, filename varchar(255) NOT NULL default '', filepath varchar(255) NOT NULL default '', filemime varchar(255) NOT NULL default '', filesize int(10) unsigned NOT NULL default 0, PRIMARY KEY (fid) ) /*!40100 DEFAULT CHARACTER SET utf8 */
* INSERT IGNORE INTO {files} SELECT fid, nid, filename, filepath, filemime, filesize FROM {files_copy}
* DROP TABLE {files_copy}
As you can see, the critical step of inserting the data into the file_revisions table fails. The process should abort at that point, because the steps that follow result in data loss.
I'll research *why* it fails... whether it is specific to my system and data or a larger problem. In the meantime, perhaps someone could revise that update so that it is impossible to lose data.
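The danger described above can be seen in miniature. The sketch below uses SQLite as a stand-in for MySQL (the table shapes follow the log above, but the rows and the guard logic are illustrative, not the actual updates.inc code): the INSERT fails because the old files table has no vid column, and the safe behaviour is to check that result before dropping anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Old-style files table: no vid column, as in a pre-revisions schema.
cur.execute("CREATE TABLE files (fid INTEGER, nid INTEGER, description TEXT)")
cur.execute("INSERT INTO files VALUES (1, 5, 'report.pdf')")

cur.execute("""CREATE TABLE file_revisions (
    fid INTEGER NOT NULL DEFAULT 0,
    vid INTEGER NOT NULL DEFAULT 0,
    description TEXT NOT NULL DEFAULT '',
    PRIMARY KEY (fid, vid))""")

ok = True
try:
    # Fails with "no such column: vid", like ERROR 1054 in comment #1.
    cur.execute("INSERT INTO file_revisions "
                "SELECT fid, vid, description FROM files")
except sqlite3.OperationalError:
    ok = False

# The safe behaviour: only continue rebuilding (and dropping) files
# if the copy into file_revisions actually succeeded.
if ok:
    cur.execute("DROP TABLE files")

rows = cur.execute("SELECT COUNT(*) FROM files").fetchone()[0]
print(ok, rows)  # the insert failed, but the data survived
```

The unpatched update skips the `if ok:` guard, which is exactly how the data loss happens.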
Comment | File | Size | Author |
---|---|---|---|
#63 | 53177_6.patch | 4.73 KB | dww |
#61 | 53177_5.patch | 4.58 KB | dopry |
#57 | 53177_4.patch | 5.08 KB | dopry |
#55 | 53177_3.patch | 4.73 KB | Cvbge |
#54 | 53177_2.patch | 4.77 KB | Cvbge |
Comments
Comment #1
Dries CreditAttribution: Dries commented

I can reproduce this problem. When I execute the query on the command line, I get:
INSERT INTO file_revisions SELECT fid, vid, description, list FROM files;
ERROR 1054 (00000): Unknown column 'vid' in 'field list'
Comment #2
robertDouglass CreditAttribution: robertDouglass commented

I get a different failure:
Comment #3
robertDouglass CreditAttribution: robertDouglass commented

The reason why my update fails the way it does is clear: I have multiple entries in the files table that have the same fid and vid. I've been using this version of the database.mysql file:
$Id: database.mysql,v 1.222 2006/01/24 18:23:41 dries Exp $
where the files definition reads:
I would guess that there should have been a primary key on vid, fid. I wonder how many people will have files tables that have duplicates like I do?
Comment #4
robertDouglass CreditAttribution: robertDouglass commented

To answer my own question, all of the people with databases older than:
version 1.224, Wed Feb 22 10:06:46 2006 UTC
Comment #5
robertDouglass CreditAttribution: robertDouglass commented

To clarify: my duplicate records share the same fid, nid, and vid. The primary key should have been (fid) alone, I think.
Comment #6
robertDouglass CreditAttribution: robertDouglass commented

Changing the query to this resolved the issue for me:
INSERT INTO {file_revisions} SELECT DISTINCT(fid), vid, description, list FROM {files}
I'm guessing that I ended up with the same copy of the file that I was getting in the web application, but maybe this needs to be confirmed. I can't roll a patch right now because I'm behind my client's firewall on their machine, but the code is on lines 1601 and 1628 of updates.inc.
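Robert's DISTINCT fix can be sanity-checked outside Drupal. This sketch uses SQLite in place of MySQL, with invented rows shaped like the duplicates from comment #3: the original query aborts on the duplicate (fid, vid) primary key, while the DISTINCT version collapses the identical rows and succeeds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# files table with an exactly duplicated row, as described in comment #14.
cur.execute("CREATE TABLE files "
            "(fid INTEGER, vid INTEGER, description TEXT, list INTEGER)")
cur.executemany("INSERT INTO files VALUES (?, ?, ?, ?)",
                [(1, 7, 'a.txt', 0), (1, 7, 'a.txt', 0), (2, 7, 'b.txt', 1)])

cur.execute("""CREATE TABLE file_revisions (
    fid INTEGER, vid INTEGER, description TEXT, list INTEGER,
    PRIMARY KEY (fid, vid))""")

# The original update's query chokes on the duplicate (fid, vid) pair.
try:
    cur.execute("INSERT INTO file_revisions "
                "SELECT fid, vid, description, list FROM files")
except sqlite3.IntegrityError as e:
    print("plain insert failed:", e)

# With DISTINCT, the identical rows collapse and the insert goes through.
cur.execute("INSERT INTO file_revisions "
            "SELECT DISTINCT fid, vid, description, list FROM files")
count = cur.execute("SELECT COUNT(*) FROM file_revisions").fetchone()[0]
print(count)  # 2
```

Note this only rescues rows that are duplicated on *every* column; the thread's later comments (#58 onward) cover why that caveat matters.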
Comment #7
Dries CreditAttribution: Dries commented

Shouldn't we add a primary key too?
Comment #8
robertDouglass CreditAttribution: robertDouglass commented

I think it is already there.
Comment #9
sime

I've recreated the issue quite easily, at least the specific case that Robert is describing. I installed 4.7b4 and activated the upload module. I had a quick look at the table and noted that there is no primary key, just two non-unique indexes on fid and vid.
Then I created a new page node and added two files. I checked the files table and saw two rows had been created.
Then I edited the node and saved it. No additional rows in the files table. OK.
Then I edited the node, ticked "revision" and saved it. Looking in the files table, the two rows were duplicated exactly. (Tab-separated file attached.)
On Robert's suggestion, I was going to make a patch and see if I could upgrade by applying a "DISTINCT" clause to the SQL in the appropriate places, but I'm really not clear about what the upgrade process is yet. To be sure, I looked in the 4.7b5 database.mysql file and saw that fid is now a primary key, so we can be sure that the upgrade wouldn't work as is.
It's a bit late for me here, so I'll log off and see how this issue is going tomorrow.
Simon
Comment #10
dopry CreditAttribution: dopry commented

Attached patch with DISTINCT... Curiously, the pgsql version already had a DISTINCT in the SELECT that built the new files table.
@sime
Your output seems correct... Everything is identical between the two sets except the vid.
Comment #11
dopry CreditAttribution: dopry commented

umm yeah... I always forget to update that status thing...
Comment #12
killes@www.drop.org CreditAttribution: killes@www.drop.org commented

I am pretty sure that simply adding DISTINCT might "fix" the problem by losing data. Sime's output is not buggy; it is as it should be. The files are attached both to the first and the second revision, hence the double entry.
Comment #13
dopry CreditAttribution: dopry commented

killes, no data should be lost. The unique per-vid data has already been copied to the file_revisions table.
What remains, and what is being placed in the new files table from files_copy, should be the same for all vids sharing an fid.
Comment #14
robertDouglass CreditAttribution: robertDouglass commented

I had grotesquely mangled data where fid, nid and vid were exactly duplicated:
fid nid vid
1 5 7
1 5 7
like this. When that is the case, there is no recovery possible... take one and be happy. What is your suggestion, Gerhard?
Comment #15
dopry CreditAttribution: dopry commented

@robertDouglass: did the duplication occur with the DISTINCT patch, or did you get mangling of data by using the DISTINCT patch? Do you have a copy of the data I can work with?
If it is just duplication that is causing the problem, something like...
CREATE TABLE files_copy AS SELECT * FROM files;
DROP TABLE files;
CREATE TABLE files AS SELECT DISTINCT(fid), nid, filename, filepath, filemime, filesize FROM files_copy;
DROP TABLE files_copy;
may do the job...
Comment #16
robertDouglass CreditAttribution: robertDouglass commented

This is what I had in mind. Hope the patch is ok... I had to work pretty hard to make it (not on my own environment, still).
Comment #17
dopry CreditAttribution: dopry commented

You may want to use DISTINCT on vid; otherwise you will lose revision information.
Comment #18
robertDouglass CreditAttribution: robertDouglass commented

@dopry: that might work. Can you roll a patch that does that? Like I said, rolling patches is hard for me from where I am.
Backing up: there are two issues here. First, this upgrade is dangerous for anyone who has a files table that was created after the revisions patch went in, because the update fails to copy the information from the temporary revisions table yet still deletes the table, despite the failure.
Second, we need to assess how many corrupt files tables we've created in the wild, and what the best way to recover is. These are two separate tasks, and my patch addresses the corrupt-tables-in-the-wild scenario.
Comment #19
dopry CreditAttribution: dopry commented

...
I still don't know what you started with and what you ended up with, or whether the duplicates you had in the files table were because of the missing DISTINCT on the second query, which rebuilt the files table... I'd like to be able to reproduce the error you experienced.
...
Comment #20
robertDouglass CreditAttribution: robertDouglass commented

This is what my files table looked like *before* the update:
Comment #21
robertDouglass CreditAttribution: robertDouglass commented

lol, so much for being able to set the input format. As you were supposed to easily see from the table that I made (which then had its tags stripped), each row is exactly the same. This is due to a bug that existed in Drupal before this update, but which has been corrected by this update. However, the flawed data that the bug created still exists (like above), and the update process, as is, will destroy *all* of the file revisions data if it encounters a table like the one I have. My patch handles the flawed data well enough, considering how broken the data is to begin with. Perhaps it would be better to say DISTINCT(vid) instead, though in my case it wouldn't help any, I don't think.
Comment #22
Dries CreditAttribution: Dries commented

On drupal.org, we have lots of those:
Looks kinda fubar to me ...
Comment #23
dopry CreditAttribution: dopry commented

Is drupal.org using, or has it used, image.module? The _original, thumbnail, preview suggest so... :)
Do these files have associated mime/size/desc fields that are accurate, or are those fields empty?
Comment #24
sepeck CreditAttribution: sepeck commented

Yes, Drupal.org uses the image module.
http://drupal.org/node/27367
Comment #25
Dries CreditAttribution: Dries commented

dopry, some files have mimetype information, others do not. :/
Comment #26
dopry CreditAttribution: dopry commented

I'm not sure that we can really address pre-existing file table corruption. I think that is something administrators are going to have to figure out for themselves, or ask for help on... I don't think many people in the wild will experience this... I think we are most likely to see this kind of corruption coming from people who track HEAD, apply updates inconsistently, and run buggy core or contrib modules.
I think admins will have to take care of that...
I rewrote the update with the DISTINCT clauses fixed... On MySQL, you may get duplications in the revisions table if fid and vid match but there are different descriptions or list options. Its DISTINCT is apparently broken: no matter how you write it, it will only do DISTINCTROW.
I also switched to saving the files table by using an ALTER ... RENAME to files_copy, to preserve keys, etc.
I fixed it so files_copy will not be dropped if an error occurred on any of the queries.
I would appreciate it if someone familiar with PostgreSQL would test the queries... I could only check the DISTINCT clause syntax; I don't run PostgreSQL locally.
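The flow dopry describes can be sketched as follows, again with SQLite standing in for MySQL (the table definitions and data are illustrative, not the patch itself): rename the original table aside as a backup, rebuild, and only drop the backup once every step has succeeded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE files (fid INTEGER PRIMARY KEY, filename TEXT)")
cur.execute("INSERT INTO files VALUES (1, 'a.txt')")

success = True
try:
    # Keep the original table (with its data and keys) under a backup name.
    cur.execute("ALTER TABLE files RENAME TO files_copy")
    # Rebuild files with the new definition, deduplicating on the way back.
    cur.execute("CREATE TABLE files "
                "(fid INTEGER PRIMARY KEY, filename TEXT NOT NULL DEFAULT '')")
    cur.execute("INSERT INTO files "
                "SELECT DISTINCT fid, filename FROM files_copy")
except sqlite3.Error:
    success = False

# Only discard the backup if every query above succeeded.
if success:
    cur.execute("DROP TABLE files_copy")

tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['files']
```

If any step fails, files_copy remains in place, which is the recovery guarantee dopry's rewrite adds.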
Comment #27
chx CreditAttribution: chx commented

I think we should only care about 4.6 -> HEAD upgrades; oldHEAD -> HEAD upgrades are of little concern IMO.
Comment #28
robertDouglass CreditAttribution: robertDouglass commented

Ok. So we forget about data consistency, fine. We need to stop deleting the temporary files table if the update fails, though. This is a really hard blow. The whole update needs to fail if the INSERT ... SELECT fails.
Comment #29
robertDouglass CreditAttribution: robertDouglass commented

But what is wrong with one of the patch versions here that lets the oldbeta -> 4.7 upgrade succeed? It won't hurt the 4.6 -> 4.7 upgrade. Take a look at the last patch submitted. What is the worst that can happen from it?
Comment #30
dopry CreditAttribution: dopry commented

The last patch does address the problem with MySQL upgrades and row duplication in the tables. It also provides a backup files table in case something goes wrong. I don't think we should have to go that far, since the upgrade instructions include directions to back up your database before upgrading!!!
What's wrong with keeping this backup table around on partial upgrades? Well, nothing, besides that someone may forget about it and drag it around for a while; however, if we did that with everything, a bad upgrade would quickly leave you with a big mess of _copy tables. Also, a second run of the update will now fail because the table files_copy already exists...
If you want real data consistency we should be using transactions!
However, before we talk about the issue more:
Review/Test the stinking patch!
It works for me on MySQL, but I tested the queries by hand with a combination of the data I was supplied by both Dries and Robert. I hadn't really experienced the problem before; my personal site doesn't have any revisions, so the process went flawlessly for me to begin with.
.darrel.
Comment #31
robertDouglass CreditAttribution: robertDouglass commented

Sorry, I didn't see the care you had taken to recover from failure. Thanks! I'll test in the morning.
Comment #32
dopry CreditAttribution: dopry commented

@robertDouglass, if you don't have time you can always send me a dump of your database after a DELETE FROM users, and I can test the updates with access checks disabled. I'd really like to get this issue closed.
Comment #33
dopry CreditAttribution: dopry commented

might as well assign it to myself.
Comment #34
robertDouglass CreditAttribution: robertDouglass commented

I'm in the process of testing the patch (finally). It succeeds in handling failure gracefully, and it applies cleanly, but it still chokes on my data set.
The data set is attached.
Comment #35
robertDouglass CreditAttribution: robertDouglass commented

Before we get too hung up on the state of the data set, it should be noted that the original goal of this issue (not nuking the table if the upgrade fails) has been met. The data set that I uploaded is the product of a Drupal application that has been under constant development and customization since around October 2005, and has seen around 20 updates to the Drupal code base. It is clearly the exception, not the rule. So I think it is probably important to determine if anyone else has a files table as fubar as mine. Is the Drupal.org files table this messed up? Anyone else?
Comment #36
icenogle CreditAttribution: icenogle commented

I installed beta6, and first tried the unpatched updates.inc. It hung immediately. Then I applied the patch, and it hung again.
I'm attaching my files table dump.
Darrell Icenogle
Comment #37
webchick

icenogle: just a quick note... when you change the title when you reply to an issue, you actually change the title for the entire issue. :) Switching it back.
Comment #38
icenogle CreditAttribution: icenogle commented

Wondered how that happened... Sorry.
Comment #39
robertDouglass CreditAttribution: robertDouglass commented

Can you specify which patch you applied? The most complete is dopry's patch: http://drupal.org/files/issues/53177_0.patch
Comment #40
icenogle CreditAttribution: icenogle commented

I used 53177. I thought I was using the latest. I'll try the other and report back.
Darrell
Comment #41
icenogle CreditAttribution: icenogle commented

My mistake... it was actually 53177_0.patch that I tried. (Wish it were otherwise.)
Darrell
Comment #42
robertDouglass CreditAttribution: robertDouglass commented

icenogle, did you send us the files dump from *after* the failed update? It would be more important to see the files table from before the update, please.
Comment #43
icenogle CreditAttribution: icenogle commented

No, that was before the update.
I have a test site, and a live site. The live site is running beta4, and I haven't tried taking it past beta4. I have tried beta5 on the test site, given up, and tried beta6.
The table dump is from a db dump from the live site.
What is it that made you think it was from after the update? Doesn't system_update_173() create a new file_revisions table, and possibly a copy of the files table?
Darrell
Comment #44
nedjo

Should we be cleaning up the files table (removing duplicate rows) in the first part of the update?
Comment #45
dopry CreditAttribution: dopry commented

@icenogle
Did you attempt it with the patch, after restoring your database?
@nedjo
No, I do not want to clean duplicates out of my files table backup. I want it to be a copy of the original.
@all
I finally have some time to get back on this today... I'll do some testing. For those of you doing updates from one beta to another, make sure you update from a clean database backup. Otherwise the first update_173 will split your files table, and the patched version will have no idea what to do.
Comment #46
icenogle CreditAttribution: icenogle commented

> Did you attempt with the patch, after restoring your database?
Yes, I did. I'm trying to be clear, here.
I just tried re-loading the original db and dumping the files table again, and I diff'ed them. No differences.
Perhaps someone could tell me why they think I'm sending the post-update files table? What is the symptom of a split files table? (I'm just not a MySQL guy, though I'm gaining.)
Thanks,
Darrell
Comment #47
robertDouglass CreditAttribution: robertDouglass commented

I asked because you seem to only have one file (with no revisions of it) in the database. The particular problem we're trying to solve revolves around files that have multiple revisions. Not that your data isn't valid or helpful, it is both; we just weren't expecting your db table to cause a problem.
Comment #48
chx CreditAttribution: chx commented

Are we trying to fix the update of beta X databases? May I ask why?
Comment #49
icenogle CreditAttribution: icenogle commented

Okay...
I'm here, and willing to do whatever would be helpful. I'm a C++ guy, and don't know how to go about debugging these scripts.
I'll check in once in a while. Let me know if there is something I can do...
Darrell
Comment #50
icenogle CreditAttribution: icenogle commented

Sorry for the distraction, folks. My problem was of the "get rid of the js and the problem goes away" category. Wrong bug.
I'll slink away, now...
Darrell
Comment #51
Junyor CreditAttribution: Junyor commented

I originally wrote the code, so let me try to clear some things up. First, dopry's latest patch looks good and prevents a possible disaster if the update doesn't work, though I have not tested it. The DISTINCT when inserting into the file_revisions table is a good idea, but I didn't find a need for it in my testing. The reason I made a copy of the files table, rather than renaming it, was to fix the default definition of the files table. In 4.6, the files table defaults were strings, not integers:
In HEAD, it's like this:
If you don't do the copy, you don't get the new table definition.
Next, the mysql portion of the update doesn't use DISTINCT, but uses INSERT IGNORE, instead. INSERT IGNORE drops duplicate data, pretty much the same way DISTINCT does.
If y'all don't think those changes are necessary, no worries. :)
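Junyor's point about INSERT IGNORE can be illustrated with SQLite's equivalent, `INSERT OR IGNORE` (MySQL spells it `INSERT IGNORE`; the rows here are invented): rows that would violate the primary key are silently skipped instead of aborting the whole statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE files_copy (fid INTEGER, filename TEXT)")
cur.executemany("INSERT INTO files_copy VALUES (?, ?)",
                [(1, 'a.txt'), (1, 'a.txt'), (2, 'b.txt')])

cur.execute("CREATE TABLE files (fid INTEGER PRIMARY KEY, filename TEXT)")
# Duplicate fids are dropped instead of aborting the whole INSERT.
cur.execute("INSERT OR IGNORE INTO files "
            "SELECT fid, filename FROM files_copy")
rows = cur.execute("SELECT fid, filename FROM files ORDER BY fid").fetchall()
print(rows)  # [(1, 'a.txt'), (2, 'b.txt')]
```

Unlike DISTINCT, which compares whole rows, the IGNORE variant dedupes on the key alone, which is why it behaves differently when duplicated fids carry different data.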
Comment #52
dopry CreditAttribution: dopry commented

Here is an updated version.
Apparently at some point some contrib modules, or HEAD/beta versions of Drupal, have corrupted drupal.org's files tables. I expect that some of these problems could have occurred elsewhere, but what the hey, I added some sanity/data integrity checks.
Namely, a DELETE FROM {files} WHERE fid = 0;
and UPDATE {files} SET vid = nid WHERE vid = 0;
Integrity seems to check out for the drupal.org files table.
@robertDouglass... you have the same fid associated with two different nids. You'll have to correct that manually. See fid = 32.
@all
This update goes out of its way to make sure you have a backup of the files table if anything goes wrong.
However, I'm not going to fix any more revision-related errors for HEAD/beta upgrades.
Revisions did not even exist in 4.6.
If you have any problems upgrading, it is a result of running on beta/HEAD, and you should be prepared to fix your database manually.
This patch in its current state should be sufficient for the majority of updates.
Comment #53
dopry CreditAttribution: dopry commented

I'm ready to downgrade this issue to normal, as it is unrelated to a 4.6 -> 4.7 upgrade, and none of the people with beta -> beta or HEAD -> HEAD upgrades are testing it.
Comment #54
Cvbge CreditAttribution: Cvbge commented

This patch looks weird.
Anyway, I've fixed it to work with PostgreSQL:
- CREATE TABLE ... _AS_ SELECT ...
- not prefixing temporary tables (not needed + won't work with PostgreSQL)
The sequence must be updated, because we were inserting existing values and not using it, and the old sequence was deleted when the old table was deleted.
Question: MySQL uses DISTINCT on all fields, while PostgreSQL uses DISTINCT ON (fid, vid) and ON (fid). I think the MySQL version might be wrong. If we have e.g. a (fid, nid) pair and fid is the PK, then it could try to insert (10, 20) and (10, 30), because such rows are different. That would fail, as fid is the PK.
I haven't tested the patch; I will do so tomorrow (well, technically it's today ;))
Comment #55
Cvbge CreditAttribution: Cvbge commented

Old patch
Comment #56
Cvbge CreditAttribution: Cvbge commented

I've done an update of an old CVS checkout (updates 172+ were run). I had no errors, but I had an empty {files} table.
I've put some garbage into the table and tried again; still no errors. I don't know if the conversion was correct, though.
This is not connected with this issue, but after the update, when I view a forum thread, the replies (comments) are displayed as titles only (collapsed), not with the full body as previously. Has anyone else had this problem?
Comment #57
dopry CreditAttribution: dopry commented

I agree the patch is weird... because it's trying to fix things that shouldn't sanely happen, and to normalize a table.
The DISTINCT (fid, vid) is there on purpose... There will be duplicate fids in the file_revisions table, one for every revision. It should probably be PK'd on vid, with fid just a key, and it should be a DISTINCT(vid).
So maybe:
CREATE TABLE file_revisions SELECT DISTINCT(vid) vid, fid, list, description FROM files_tmp;
CREATE TABLE files SELECT DISTINCT(fid) fid, filename, filepath, filemime, filesize FROM files_tmp;
I was fixated on fids for some reason :).
Comment #58
Cvbge CreditAttribution: Cvbge commented

I'm not sure if this is OK.
The database.*sql definition of {files} has a PK on fid, and {file_revisions} on (fid, vid). Thus, when selecting rows from the old {files}, we should get rows with unique (fid, vid) pairs for {file_revisions} and with unique fid for the new {files}.
The _3 patch does DISTINCT ON (fid, vid) for {file_revisions} and DISTINCT ON (fid) for {files} for PostgreSQL, thus selecting "unique" rows. But the MySQL version uses just DISTINCT, so I understand it selects "unique" rows, where a "unique" row is one that differs from all other rows on *all* columns, not only the (fid, vid) pair or fid. I think this is wrong. Please correct me if I'm wrong.
In _4 you have changed from a PK on (fid, vid) to a PK on (vid) only. Is this correct? If yes, then database.*sql should also be updated.
Comment #59
killes@www.drop.org CreditAttribution: killes@www.drop.org commented

There is probably something fishy going on. Before the update, drupal.org had 428 entries in files; afterwards it has 409.
Comment #60
dopry CreditAttribution: dopry commented

@killes,
That sounds about right... There are several 'empty' rows in the drupal.org {files} table...
@Cvbge:
You are correct: MySQL is selecting a distinct row. It cannot do DISTINCT ON (fid, vid)... It just doesn't work. It's what we want, but it doesn't work. Your last patch should do, since I don't really want to switch the PKs or database.mysql. I wish MySQL could SELECT DISTINCT ON (fid, vid)... but it will only select distinct across all the columns you are selecting; it will not restrict the distinctness to the columns in the DISTINCT clause. That may be why Junyor used INSERT IGNORE originally... A distinct row should be adequate for the file_revisions table in most cases, especially if your files table isn't already corrupt.
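dopry's observation is easy to verify; SQLite shares MySQL's behaviour here (the sample rows are invented). Writing `DISTINCT(fid)` is just `DISTINCT` with parentheses around the first column, and uniqueness is still judged across every selected column, not fid alone:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE files (fid INTEGER, vid INTEGER)")
# Same fid, two different vids -- one file, two revisions.
cur.executemany("INSERT INTO files VALUES (?, ?)", [(1, 7), (1, 8)])

# "DISTINCT(fid)" does not restrict distinctness to fid:
rows = cur.execute(
    "SELECT DISTINCT(fid), vid FROM files ORDER BY vid").fetchall()
print(rows)  # [(1, 7), (1, 8)] -- both rows survive, fid is duplicated
```

This is why the PostgreSQL `DISTINCT ON (fid)` form has no direct MySQL equivalent, and why the thread falls back to row-wise DISTINCT plus INSERT IGNORE.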
Comment #61
dopry CreditAttribution: dopry commented

Here's our update of Cvbge's patch with the files_tmp table properly prefixed...
@Cvbge: I really appreciate you testing this out on PostgreSQL. I'll have to sit down and learn a bit about it one day.
Comment #62
dww

Ugh, killes now tells us you can't prefix temporary tables. ;)
To review: 53177_3.patch is the current best-guess candidate for this issue... Anyone else looking here should test/review that one until further notice.
thanks,
-derek
Comment #63
dww

At killes's suggestion, here's a new version of 53177_3.patch that *does* prefix the {files_tmp} table, but just doesn't declare it TEMPORARY. This is better, and should work just fine. I also trimmed trailing whitespace in a few places in the update.
Comment #64
Junyor CreditAttribution: Junyor commented

@dopry: Why not use the INSERT IGNORE for MySQL, since the DISTINCT might not work?
Comment #65
killes@www.drop.org CreditAttribution: killes@www.drop.org commented

dopry is right; our files table is quite a mess. I've tried it again and compared file counts before (including the bogus entries) and after (minus the bogus entries), and the difference is exactly the number of bogus entries.
Comment #66
killes@www.drop.org CreditAttribution: killes@www.drop.org commented

The funny thing is that files aren't versioned at all if we update from 4.6. :p
applied