Jump to:
| Project: | Drupal core |
| Version: | 7.x-dev |
| Component: | database update system |
| Category: | bug report |
| Priority: | major |
| Assigned: | Unassigned |
| Status: | needs work |
| Issue tags: | D7 upgrade path |
Issue Summary
Problem/Motivation
You get an error while updating from a D6 site
system module
Update #7061
Failed: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'public://sites/default/files/HotspotFinder.apk' for key 'uri': INSERT INTO {file_managed} (fid, uid, filename, uri, filemime, filesize, status, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, Proposed resolution
This issue has no solutions yet.
workaround / analysis
As boran tells in #2 you could analyze your files table for the fids
Some SQL to help you identify the duplicate filepath.
SELECT count(*), min(fid), max(fid), filepath FROM files GROUP BY filepath HAVING count(*) > 1 ORDER BY count(*);Next you could delete the lowest fids with the next SQLs. Be aware that this does not fix the problem but you could test other parts of the upgrade process. Some node/user/? could mis there FIDs
CREATE TABLE temp_fids AS SELECT min(fid) fid FROM files GROUP BY filepath HAVING count(*)>1;
DELETE FROM files WHERE fid in (SELECT fid FROM temp_fids);As observed in #11 a relation between published and unpublished nodes could be a cause.
Remaining tasks
Is this related to #966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case
Original report by boran
system module
Update #7061
Failed: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'public://sites/default/files/HotspotFinder.apk' for key 'uri': INSERT INTO {file_managed} (fid, uid, filename, uri, filemime, filesize, status, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7); Array ( [:db_insert_placeholder_0] => 56 [:db_insert_placeholder_1] => 38 [:db_insert_placeholder_2] => HotspotFinder.apk [:db_insert_placeholder_3] => public://sites/default/files/HotspotFinder.apk [:db_insert_placeholder_4] => application/octet-stream [:db_insert_placeholder_5] => 507667 [:db_insert_placeholder_6] => 1 [:db_insert_placeholder_7] => 1253524245 ) in system_update_7061() (line 2808 of /disk2/www/drupal7/modules/system/system.install).On the D6 site there are entries in the file table with the same filepath:
| fid | uid | filename | filepath | filemime | filesize | status | timestamp
| 54 | 38 | HotspotFinder.apk | sites/default/files/HotspotFinder.apk | application/octet-stream | 178090 | 1 | 1252920928 |
| 56 | 38 | HotspotFinder.apk | sites/default/files/HotspotFinder.apk | application/octet-stream | 507667 | 1 | 1253524245 |On D7 the file_managed table has the uri column as a unique key.
So the upgrade script fails, as noted above, when one tries to insert a file with the same path.
possible solutions:
- if there duplicates, only upgrade the first one, reference that, and issues a warning to the User. The user will need to manually update any non core modules that might reference the duplicate file entries.
- make a copy of the file for each duplicate and reference that
Comments
#1
I have just experienced this error too, upgrading from 6.22 > 7.8.
Is there a workaround for the time being?
#2
I was working on a patch to handle the first file and warn about the others, but am lacking time right now.
Suggestion:
query your file table to see what duplicates you have (do a group by on the filepath). If these are very few:
- note the fid, and find out in what other tables reference the fid (cck fields, nodes.. )
- manually delete the duplicates from the file table,
- upgrade to 7.x
- manually correct the affected nodes/fields or whatever the affected file was attached too.
#3
Duplicate of #966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case.
#4
@catch: this report does not have different case file names.
I reopened this as I guess the root case here is different.
As I commented on #966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case (http://drupal.org/node/966210#comment-5116402) my system had no case sensitive problem but duplicate entries based on uid in the files table.
What worries me here it the mentioned files have different sizes. Which one should be used?
#5
I tried to delete the duplicate file rows by first checking for how many duplicates there are
[edit: replaced filename by filepath see comment #6]
SELECT min(fid) FROM files GROUP BY filepath HAVING count(*)>1;then define a delete query
DELETE FROM files where fid in (SELECT min(fid) FROM files GROUP BY filepath HAVING count(*)>1);which is not allowed by mysql so I created a temp table having the lowest fid to delete
CREATE TABLE temp_clemens AS SELECT min(fid) fid FROM files GROUP BY filepath HAVING count(*)>1;The delete the rows to further test the upgrade.
DELETE FROM files WHERE fid IN (SELECT fid FROM temp_clemens);my 2cents
#6
Hmmm comment #5 should mention filepath instead of filename as this is about the D7 file URI schemes like public://img.png
#7
#5,
Based on information on the MySQL site, this is the query I used to workaround this issue myself:
CREATE TABLE tmp SELECT * FROM files GROUP BY filepath, filename;DROP TABLE files;
ALTER TABLE tmp RENAME TO files;
Basically does the same thing.
I still don't know what effect it will have on the site to have removed columns from the Files table, but I would guess that it's not really a good thing. If anything, there should be some sort of find and replace functionality that replaces the removed FIDs with the leftover FIDs.
#8
@Deciphered #7
What you did was dropping the unique constraint on the filepath column. And others.
(I should keep the files table by renaming it instead of dropping)
So yes ... that is a nice temporary workaround for testing the rest of the upgrade process.
But no ... I guess when you delete one of the fids through the UI the FILE will be deleted too. So the other fid becomes useless.
#9
So there was never a unique key on filepath in Drupal 6, but we have a unique key on uri in D7. This looks like at least major to me. Also tagging with upgrade path.
#10
#11
A very weird observation I have is this SQL. Note this is data from a D5 site.
SELECT n.title, n.type, n.status, f.filepath FROM node n INNER JOIN files f ON n.nid = f.nidWHERE f.filepath IN (
SELECT filepath FROM files GROUP BY filepath HAVING count(*) > 1 ORDER BY count(*)
)
ORDER BY f.filepath, n.status;
This give a list like this
+--------------------------------------------------+--------------+--------+---------------------------------------------------------------+| title | type | status | filepath |
+--------------------------------------------------+--------------+--------+---------------------------------------------------------------+
| 2x2 Value Matrix | type_x | 0 | files/2x2_Value_Matrix.doc |
| 2x2 Value Matrix | type_x | 1 | files/2x2_Value_Matrix.doc |
...
That is _ALL_ duplicate filepath entries have different node.status
Maybe the cause was #92820: uploaded file attached to multiple nodes when submitting with two node/adds open
So my fix can be deleting all the files of all unpublished nodes and let the site owner repair them based on status and title.
[edit]
DELETE files FROM files INNER JOIN node n ON files.nid = n.nid WHERE n.status = 0;[/edit]
(Puzzle for this particular site is why they created duplicate node entries in the first place)
#12
A problem with #11 is the NIDs between the unpublished and published nodes are not next or near to each other. They differ between 94 - 220. But the published nodes all have consecutive NIDs
select f.fid, g.fid, (g.fid - f.fid) fdiv, f.filepath, g.filepath FROM files f INNER JOIN files g ON f.filepath = g.filepath WHERE f.fid <> g.fid ORDER BY g.fid;I should analyze the created and updated times. Maybe next time as I still have to upgrade a site :)
#13
Appreciated Clemens, your solution at #5 allowed me to execute the system update, so far so good. I have other errors now with other modules but hopefully I am a step closer for a functioning d7 site :)
#14
Running into same error. I'd say this major.
#15
@markwk: it is quite useless just saying me too :)
Please tell us more about your particular site. And ... :)
[Stock response from Dreditor templates and macros.]
Please update the issue summary by editing the issue. This helps people to understand this issue quicker.
#16
@clemens.tolboom: it's even more "quite useless" to say what is useless or useful for someone else :P
I'm updating a site to D7 that started in something hybrid vBulletin/D5 and now in D6. It has lots of files and due to the nature of uploads quite a few have same names with different cases. I'm currently just deleting the duplicates in the db so I can move forward with other upgrade issues.
I don't really feel like digging into a core issue/patch to solve this issue so revising my end to make Drupal upgrade and be happy.
I haven't gotten around to coding a solution but since it involves only 60-100 files, I'll probably add two more steps to the bash script:
1. to take the duplicate files and append a _1 to the file, so where filename.doc and FileName.doc are creating problems, the more recent one will become Filename_1.doc.
2. rename the filepath with the revised file name.
#17
@markwk, this issue is not about files that only differ in case. Issue #966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case deals with those.
I too encountered this on a D6-D7 upgrade on a site that was upgraded from D5. The issue seems to be caused by node revisions created in the D5-site. Even though the revisions in my case point to the same filepath, they are all linked to different fids via the upload modules db table.
If others can confirm, and we can indeed pinpoint upload as the culprit a proper cause of action would be to find all fids pointing to the same filepath, keep one record and update the upload table to replace the spurious fids with the saved one. A good place to do so would IMO be in a new hook_update_N for a future D6 release.
#18
I don't see how we can do a generic fix, unless one decides to take the first file, ignore the duplicates and print a warning.
The impact of ignoring the duplicate will be site specific, i.e. could make little difference or mean content is lost.
However by not providing an automated fix, sysadmins are obliged to dive into the sql and make the fixes manually. Although they lose time, they are at least more aware of the data loss risks. This issue doe not seem to affect that many sites? (would be nice if one could se the number of followers to this thread :)
#19
@boran, if the cause of this is the D5 upload module (and not the filesystem itself), it should be possible to provide a generic fix.
#20
Ah. In my case there only D6. no D5.
#21
I'm also having this issue as a result of upgrading a large site from Drupal 5 to 6 and finally 7. I have over 9,000 files with duplicate names, between 2 and 2,500 copies of each, leaving a total of over 28,000 conflicted files.
None of these are due to revisions. Simply deleting references to copies is not an option since that will result in thousands of pages with missing content. That only leaves a few options, none of them good:
1. Change the definition of the file_managed table to allow the uri to be non-unique. That would be quick and easy, but I have no idea what problems that might cause down the line.
2. Create a table mapping duplicate file paths to the first fid with that path, reduce the file_managed table and work through all fields that reference files, replacing the fids. This might also have issues down the line since multiple content items will be referencing the same file in the table, and who knows what will happen if one of those content items delete a referenced file?
3. Make sure I have a copy of all the referenced files and make a script to duplicate and rename all the referenced files, then update the files table with the new filenames and filepaths before migrating again. This would probably be the safest but will result in thousands of duplicate files, not to mention being time consuming to implement.
Any advice?
#22
Scott, can you check if those files were managed by the upload module?
#23
@Heine How could one check for that? In #17 you mentioned the upload module. Could you elaborate a little on that? Ie some sql example?
#24
Scott i am upgrading a site from D5, I have run into the same issue with multiple duplicate files. What was your resolution?
#25
I hope I'm not about to post this twice, post failed last time.
This script resolves the issue by passing over the database generating symlinks for each duplicate file, and then updating the "filepath" attribute, so that when the URI's are generated, they are all to unique locations. Not an ideal solution, but if you have less than a couple of thousand duplicates it only causes 2 or 3 megabytes of fluff in the file system.
I was a bit lazy and created it with absolute paths, so you'll need to adjust it a bit to be appropriate to your specific drupal installation.
Obviously this assumes a Linux based install.
Hope that helps someone.
function clean_this_mess_up($username, $password, $database){
mysql_connect("localhost", $username, $password);
mysql_select_db($database);
$i = 0;
while(1 == 1){
$duplicatePath = "duplicate_files_".$i;
$result = mysql_query("SELECT filepath, min(fid) fid FROM files GROUP BY filepath HAVING count(*)>1");
if(mysql_num_rows($result) == "0"){
echo "No duplicates left. \n";
return true;
}
mkdir("/local/www/drupal/".$duplicatePath, 0755, true);
$i++;
while ($row = mysql_fetch_array($result)) {
echo $row['filepath']."\n";
$path = $row['filepath'];
$pos = strrpos($path, "/");
$dir = substr($path, 0, $pos);
if(!file_exists("/local/www/drupal/".$path)){
if(!is_dir("/local/www/drupal/".$duplicatePath."/".$dir)){
mkdir("/local/www/drupal/".$duplicatePath."/".$dir, 0755, true);
}
touch("/local/www/drupal/".$duplicatePath."/".$path);
}
else{
echo $duplicatePath."/".$dir."\n";
if(!is_dir("/local/www/drupal/".$duplicatePath."/".$dir)){
mkdir("/local/www/drupal/".$duplicatePath."/".$dir, 0755, true);
}
$exec = "ln -s \"/local/www/drupal/$path\" \"/local/www/drupal/$duplicatePath/$path\" \n";
exec ( $exec );
}
}
mysql_free_result($result);
$exec = "chown -R www-data:www-data \"/local/www/drupal/$duplicatePath\" ";
exec ( $exec );
$result = mysql_query("CREATE TABLE temp_concat AS SELECT min(fid) fid FROM files GROUP BY filepath HAVING count(*)>1;");
$result = mysql_query('UPDATE files SET filepath = CONCAT("'.$duplicatePath.'/'.'",filepath) WHERE fid IN (SELECT fid FROM temp_concat);');
$result = mysql_query('drop table temp_concat;');
}
}
#26
Is there no better solution for this? Like Scott I have several thousand files and am moving up from Drupal 4.7-5-6 and now I wanted to upgrade to 7. Should i even bother? What happens between 7 and 8?
#27
@James Tipler - I am curious about your code in #25, where do you put that and at what point does it get run? Thanks.
#28
Had the same problem upgrading a module
Could a solution be something like:
<?php
$file['uri'] = $scheme . str_replace($basename, '', $file['uri']);
$file['uri'] = file_stream_wrapper_uri_normalize($file['uri']);
$duplicate = db_select('file_managed', 'f')
->fields('f', array('fid'))
->condition('uri', $file['uri'])
->execute()
->fetchField();
if ($duplicate && $duplicate != $file['fid']) {
$file['fid'] = $duplicate;
}
?>
At least the upgrade now works and the file can be associated with the right entity.
#29
Same problem here, on a site upgraded from 4.7 to 5 then 6, and now upgrading to 7. After looking into it a bit, I think it happens because in earlier versions upload could allow different revisions of the same node to use the same uploaded file, whereas more recent versions (D6) renamed duplicate files. In my case, I notice the entries in {upload} have the same description, belong to the same nid, and different file objects have been created: notice how the first entry is for fid 129 and all the later ones for fid 134 bound to the same actual file path. (note: file and site names anonymized)
fid vid description list nid weight
129 786 bar.jpg 0 217 0
134 792 bar.jpg 0 217 0
134 795 bar.jpg 0 217 0
134 796 bar.jpg 0 217 0
134 799 bar.jpg 0 217 0
134 803 bar.jpg 0 217 0
134 813 bar.jpg 0 217 0
134 815 bar.jpg 0 217 0
In turn, just before update 7061, this gives these entries in {files}
fid uid filename filepath filemime filesize status timestamp
129 57 bar.jpg sites/foo/files/bar.jpg image/jpeg 36131 1 0
134 57 bar.jpg sites/foo/files/bar.jpg image/jpeg 36131 1 0
And the duplicate happens because the update tries to insert an entry for fid 134 in {file_managed} after it has first inserted this row:
File ID uid filename uri filemime filesize status timestamp
129 57 bar.jpg public://sites/foo/files/bar.jpg image/jpeg 36131 1 0
But, obviously, this URI would be a duplicate of the original one.
Considering this, I think there can not be a database-only solution: any completely correct solution should work around the different file reuse cases by replicating reused files, renaming the copies as needed, referencing the updated file name instead of the original one, and only then applying the DB update strategy.
In my case, this happened for only one fid, so I just removed the offending node revision before running update 7061, removed fid 129 manually from files and upload, and all went like a charm.