I have duplicate entries in my aggregator. Actually, I have 10 entries that are identical.

I deleted them from the MySQL table, but they came back the next time cron ran. I updated to the CVS version and the problem persisted.

See the problem here:

http://www.gassavers.org/aggregator/

Can anyone offer a solution? I would like to keep the aggregator, but not if it keeps pulling duplicate entries.

Comments

Prometheus6’s picture

Got a url for the feed itself?

pfaocle’s picture

This has happened to me a few times, both with 4.6.3 and recent HEAD versions. The feed responsible was http://www.demolicious.org/node/feed

Mateo’s picture

breyten’s picture

Mateo, it's because (in your case) the permalink URLs are more than 255 characters long, so they get cut off when the items are saved. Not sure what we should do about it.
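To illustrate why a cut-off permalink could lead to repeated entries, here is a minimal sketch. It assumes the duplicate check compares the full incoming link against the stored value, which the thread does not spell out. The helpers store_link() and is_duplicate() and the example URL are hypothetical, not aggregator.module code, and the sketch uses modern PHP syntax rather than what 4.6 ran on.

```php
<?php
// Hypothetical stand-in for a varchar(255) column: only the first
// 255 characters of the link survive the save.
function store_link(string $link, int $max = 255): string {
    return substr($link, 0, $max);
}

// Hypothetical duplicate check (an assumption, not the module's code):
// an item counts as already saved only if its full link exactly
// matches a stored value.
function is_duplicate(string $link, array $stored): bool {
    return in_array($link, $stored, true);
}

$stored = [];
// Made-up permalink that is longer than 255 characters.
$link = 'http://example.com/item?q=' . str_repeat('a', 300);

// Simulate three cron runs that each fetch the same feed item.
for ($run = 1; $run <= 3; $run++) {
    if (!is_duplicate($link, $stored)) {
        $stored[] = store_link($link);
    }
}

// One row is added per run, because the truncated stored value
// never equals the full link.
echo count($stored), "\n";
```

Under these assumptions the count grows by one on every run, which would match the behaviour of deleted rows reappearing after the next cron run.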

Mateo’s picture

Thank you for the reply. I will just create multiple feeds using different keywords.

dopry’s picture

The best solution would be:
ALTER TABLE `aggregator_item` CHANGE `link` `link` TEXT NOT NULL

I think, according to the MySQL docs, it's no more expensive in disk space than varchar.
Does anyone know how to roll DB changes into a patch? I figure I should patch database.mysql, but I don't know the new update.php stuff yet.

This applies to 4.6.5 and head.

dopry’s picture

Status: Active » Needs review

for 4.6.5

function update_X() {
  $ret = array();

  if ($GLOBALS['db_type'] == 'mysql') {
    // MySQL can change the column type in place.
    $ret[] = update_sql("ALTER TABLE {aggregator_item} CHANGE link link TEXT NOT NULL");
  }
  elseif ($GLOBALS['db_type'] == 'pgsql') {
    // PostgreSQL: add a new TEXT column, copy the data over, then drop the old column.
    $ret[] = update_sql("ALTER TABLE {aggregator_item} RENAME link TO link_old");
    $ret[] = update_sql("ALTER TABLE {aggregator_item} ADD link TEXT");
    $ret[] = update_sql("UPDATE {aggregator_item} SET link = link_old");
    $ret[] = update_sql("ALTER TABLE {aggregator_item} ALTER link SET NOT NULL");
    $ret[] = update_sql("ALTER TABLE {aggregator_item} ALTER link SET DEFAULT ''");
    $ret[] = update_sql("ALTER TABLE {aggregator_item} DROP link_old");
  }

  return $ret;
}

for 4.7.0-beta4

function system_update_X() {
  $ret = array();

  if ($GLOBALS['db_type'] == 'mysql') {
    // MySQL can change the column type in place.
    $ret[] = update_sql("ALTER TABLE {aggregator_item} CHANGE link link TEXT NOT NULL");
  }
  elseif ($GLOBALS['db_type'] == 'pgsql') {
    // PostgreSQL: add a new TEXT column, copy the data over, then drop the old column.
    $ret[] = update_sql("ALTER TABLE {aggregator_item} RENAME link TO link_old");
    $ret[] = update_sql("ALTER TABLE {aggregator_item} ADD link TEXT");
    $ret[] = update_sql("UPDATE {aggregator_item} SET link = link_old");
    $ret[] = update_sql("ALTER TABLE {aggregator_item} ALTER link SET NOT NULL");
    $ret[] = update_sql("ALTER TABLE {aggregator_item} ALTER link SET DEFAULT ''");
    $ret[] = update_sql("ALTER TABLE {aggregator_item} DROP link_old");
  }

  return $ret;
}

I'm not sure how to roll this as a real patch... any takers?

Dries’s picture

$ grep "link " ~/cvs/web/drupal/database/database.mysql
  link varchar(255) NOT NULL default '',
  UNIQUE KEY link (url),
  link varchar(255) NOT NULL default '',
  link varchar(255) NOT NULL default '',
  link varchar(255) NOT NULL default '',
$ grep "url " ~/cvs/web/drupal/database/database.mysql
  url varchar(255) default NULL,
  url varchar(255) NOT NULL default '',
$ grep "referer " ~/cvs/web/drupal/database/database.mysql
  referer varchar(128) NOT NULL default '',
$ grep "homepage " ~/cvs/web/drupal/database/database.mysql
  homepage varchar(255) default NULL,

We pretty much standardized on URLs being no longer than 255 characters ... Of course, we could change that.

URLs longer than 255 characters are not likely to occur; I suggest dropping the priority of this problem. It doesn't affect most people, and when it does, it doesn't render your site useless.

Dries’s picture

Priority: Critical » Normal

Morbus Iff’s picture

Duplicate checking in general needs work - see the new approach outlined here.

mfarroyo’s picture

I am new to Drupal, so I have not looked into why there are duplicate entries in the aggregator_item table. Instead, what I did was create a MySQL index that hides the duplicates. The MySQL statement is as follows:

ALTER IGNORE TABLE aggregator_item ADD UNIQUE INDEX(fid,title);

This is issued to the drupal database (mysql -u -p drupal). If anyone knows of any untoward effects of that index, I would appreciate some feedback.

Thanks
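One possible side effect of a unique index on (fid, title) can be sketched as follows. This is an in-memory simulation, not a MySQL session: insert_item() is a hypothetical helper that mimics the uniqueness rule, and the titles and URLs are made up. A real MySQL server would raise a duplicate-key error on a plain INSERT instead of returning false.

```php
<?php
// Hypothetical in-memory stand-in for a UNIQUE index on (fid, title):
// a row is accepted only if no earlier row has the same feed ID and title.
function insert_item(array &$index, int $fid, string $title, string $link): bool {
    $key = $fid . "\0" . $title;
    if (isset($index[$key])) {
        return false; // rejected by the simulated unique index
    }
    $index[$key] = $link;
    return true;
}

$index = [];
// First copy of an item is accepted.
$a = insert_item($index, 1, 'Weekly update', 'http://example.com/1');
// A true duplicate of the same item is rejected, as intended.
$b = insert_item($index, 1, 'Weekly update', 'http://example.com/1');
// A different post in the same feed that reuses the title is rejected too.
$c = insert_item($index, 1, 'Weekly update', 'http://example.com/2');
// The same title in another feed is still accepted.
$d = insert_item($index, 2, 'Weekly update', 'http://example.com/3');

var_dump($a, $b, $c, $d);
```

So the index would presumably also drop or block distinct items whenever a feed reuses a title, which may or may not matter depending on the feeds involved.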

magico’s picture

What should be done about this? Is this a work in progress in HEAD?

magico’s picture

Version: 4.6.3 » 4.6.9
Status: Needs review » Closed (won't fix)

Closing this in favour of #10.