Problem/Motivation
The title column of the {aggregator_item}
table is a 255char varchar; however, the aggregator doesn't try to truncate it or check the length before saving it. Consequently, titles over 255 characters cause a fatal php error and mysql dies trying to create the feed item (which kills cron execution).
ERROR:
PDOException: SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'title' at row 1: INSERT INTO {aggregator_item} (title, link, author, description, guid, timestamp, fid) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6); Array ( [:db_insert_placeholder_0] => Testimony of Rear Adm P. Zukunft, U.S.C.G. Asst Commandant for Marine Safety, Security, and Stewardship, before House Comm on Homeland Security, Subcomm on Border & Maritime Security, Protecting Maritime Borders:Leveraging Law Enforcement Cooperation to Enhance Security along America's Coasts [:db_insert_placeholder_1] => http://www.dhs.gov/ynews/testimony/20110712-zukunft-protecting-maritime-borders.shtm [:db_insert_placeholder_2] => [:db_insert_placeholder_3] => Rear Admiral Paul F. Zukunft spoke about Coast Guard cooperation with law enforcement partners at the federal, state, local, territorial and tribal levels. He discuss current cooperation in the areas of maritime drug and alien migrant interdiction as well as joint capabilities under development. [:db_insert_placeholder_4] => http://www.dhs.gov/ynews/testimony/20110712-zukunft-protecting-maritime-borders.shtm [:db_insert_placeholder_5] => 1310468400 [:db_insert_placeholder_6] => 1 ) in aggregator_save_item() (line 150 of /var/www/upi/drupal/modules/aggregator/aggregator.processor.inc).
Proposed resolution
Truncate the title to 255 chars before saving.
Previously, a patch was applied to use a larger field for the title (#30). However, there are several reasons to truncate the field at 255 characters (see #70).
Remaining tasks
- Roll back #30.
- Discuss whether to truncate the field at 255 characters (see #70).
- Roll a new patch for D8, D7, and D6 that truncates the field as #11 with the test from #30.
User interface changes
None. (Titles over 255 characters are not handled at all, so there is no change if we begin truncating them.)
API changes
Schema change in {aggregator_item}
(from 255 varchar to normal text).
Original report by jenpasch
Aggregator_item's title column is a 255char varchar. One title in our feed is (insanely!) more than 255 characters, and aggregator doesn't try to truncate it or check the length before saving it. This causes a fatal php error and mysql dies trying to create the feed item (which kills cron execution).
Comment | File | Size | Author |
---|---|---|---|
#74 | rollback_1244116.patch | 1.49 KB | xjm |
#66 | aggregator.patch | 2.36 KB | catch |
#50 | 1244116-aggregator-titles-title-text-column-rev4-drupal7.patch | 2.38 KB | brianV |
#46 | 1244116-aggregator-titles-title-text-column-rev3-drupal7.patch | 2.38 KB | brianV |
#45 | 1244116-aggregator-titles-title-text-column-rev3-D7.patch | 2.38 KB | brianV |
Comments
Comment #1
brianV CreditAttribution: brianV commentedTo slightly correct, if the title of a feed item being processed is longer than 255 characters, the query to write the aggregator item fails as the title is larger than the title column, which is declared as a varchar(255). This has been killing cron runs for us, as it generates a fatal error.
The attached patch truncates the title field of the aggregator item before passing to aggregator_save_item().
Comment #2
brianV CreditAttribution: brianV commentedComment #3
brianV CreditAttribution: brianV commentedHmm... patch apparently didn't want to attach...
Comment #4
Bojhan CreditAttribution: Bojhan commentedComment #5
tim.plunkettWhy not increase the size of the database column?
Comment #6
EvanDonovan CreditAttribution: EvanDonovan commentedShouldn't this be at least a major bug since it causes a PHP fatal error?
I think it makes more sense to truncate rather than increase the size of the database column since you don't want feed titles to be that long, and it would potentially slow down performance.
Plus this avoids having to write an update routine whereas increase the size of the database column would.
Comment #7
tim.plunkettFair enough. But shouldn't this be D8 first?
Comment #8
catchPatch looks good.
Are there tests for aggregator_save_item()? If so could we add to those for this issue?
Comment #9
catchComment #11
catchThis should pass.
If someone wants to come up with a different 255 char title that's fine, although I think that one shows why we'd want to truncate rather increasing the column size :)
Comment #12
chx CreditAttribution: chx commentedNice :)
Comment #13
Dries CreditAttribution: Dries commentedI wonder if this is the right fix.
Certainly, this patch is a step forward but I can easily see someone file a report a bug report stating that his/her title has been incorrectly truncated.
As an end-user, I would not expect my data to get truncated.
Comment #14
catchI think 255 characters is long enough, stealing an example from http://frankkoehl.com/2009/05/what-255-characters-looks-like/
it's pretty long for a title :)
The RSS 2.0 spec says:
(emphasis added).
0.91 spec is here:
According to http://backend.userland.com/rss091
Length restrictions were subsequently lifted completely, but as a convention it seems OK to enforce 255.
Comment #15
Morbus IffAs a counter-point, both RSS 0.91 and RSS 2.0 are from the "same guy", which is notoriously different from the RSS 1.0 and Atom specs (unlike logic, RSS 1.0 is an entirely different spec/approach than RSS 2.0 - 2.0 is not "newer" or "better" than 1.0). Anyways, in RSS 1.0, the "suggested" limit is 100 characters, but there is no limit imposed on text constructs in the Atom RFC (4287).
With that said, I'm OK with truncating to 255. It's pretty abnormal to see a title longer than 255 in a syndicated feed.
Comment #16
Dave ReidTruncating to 255 is perfectly reasonable to me.
Comment #17
xjmTagging.
Comment #18
nodiac CreditAttribution: nodiac commentedI disagree!
In the science world, there are many titles that are paragraphs so they can be searched. Truncating the titles breaks this strategy. I've also seen trends in longer titles in other areas. There are people for whom 'title' is something other than what we're used to. We should accomodate them.
Comment #19
Morbus Iff@nodiac: But, presumably, if you're concerned about SEO, you're also putting the article's full payload into the RSS
body, which would get sucked into the aggregator tables just fine, thus answering your search ability desire, no?
@all: I'm *not against* changing the column to something larger, mind.
The right question, here, might be: is anyone *against* changing the database table?
If no one has any decent logic against doing it, the path of least resistance and WTF is to change the db, not the data.
Comment #19.0
mrmofo CreditAttribution: mrmofo commentedIssue summary template.
Comment #20
xjmIssue summary by mrmofo.
So I guess we are still NR here to address whether there's any reason not to just make the column bigger.
Comment #21
brianV CreditAttribution: brianV commentedOk, here is a new patch for D8 implementing the alternative fix of increasing the size of the 'title' column in aggregator_item from VARCHAR(255) to TEXT. Also added the updated test from catch's patch.
EDIT: Since we eventually want to backport this patch to D7, I assume if this approach is correct, I need to roll a set of twin patches which make the schema change in D7, and remove aggregator_update_8000() as the schema will already be changed? Not sure what the proper protocol is for backporting patches which affect the schema.
RE-EDIT: Davereid suggested the best backporting plan is to truncate in D6, make the schema change w/ hook_update_N() in D7, and just the schema change in D8.
Comment #23
brianV CreditAttribution: brianV commentedFailed because I neglected to remove the default value from the text column. New patch attached.
Comment #23.0
brianV CreditAttribution: brianV commentedUpdated issue summary.
Comment #23.1
brianV CreditAttribution: brianV commentedUpdate issue summary to reflect new patch.
Comment #24
dawehnerInstead of using sql directly it would be better to use db_change_field
7 days to next Drupal core point release.
Comment #25
catchfwiw I'm not against changing the column size either, however I did consider posting to planet with a 254 char title to see how broke it would look.
The backport plan looks good to me.
Comment #26
xjmTagging for D6 backport based on #21.
Comment #27
xjmAh, the update hook also needs to be renamed to
aggregator_update_7000()
I believe.Comment #28
xjmRerolled to use
db_change_field()
. Needs testing.Comment #29
xjmTrailing whitespace fail. I do read my patches before uploading them. I swear!
Comment #29.0
xjmupdate summary
Comment #29.1
xjmUpdated issue summary.
Comment #30
crashtest_ CreditAttribution: crashtest_ commentedRerolling this patch without the D7 update function to make it D8 specific on catch's suggestion.
Ran the tests, all passed!
Comment #31
catchWe can't test 7-7 upgrades yet, but the Drupal 6-7 upgrade tests will include changing that column. Since that passed this looks fine to me. It might be worth putting some actual data in the aggregator tables as part of one of the upgrade databases maybe though - did not check to see if that's already there.
Comment #32
catchWe can't test 7-7 upgrades yet, but the Drupal 6-7 upgrade tests will include changing that column. Since that passed this looks fine to me. It might be worth putting some actual data in the aggregator tables as part of one of the upgrade databases maybe though - did not check to see if that's already there.
Comment #32.0
catchPunctuation nitpick.
Comment #32.1
xjmUpdated issue summary.
Comment #32.2
xjmUpdated issue summary.
Comment #33
twistor CreditAttribution: twistor commentedTests pass, applies cleanly, works as advertised.
Comment #34
aspilicious CreditAttribution: aspilicious commentedSrry to put this back but catch mentioned in #32 that we need to check for the aggregator data in the database upgrade test.
Comment #35
twistor CreditAttribution: twistor commented@aspilicious, where, how do we do this?
Comment #36
catchLook in modules/simpletest/tests/upgrade
If any of the database dumps contain actual feed data (not just the actual aggregator tables) then we're good. If not it'd be good to add some.
I don't think that's required for this patch to be committed, but it'd be nice to have actual data being upgraded in the test if it's not already.
Comment #37
aspilicious CreditAttribution: aspilicious commentedCatch does this data need a title longer than 255 characters?
Comment #38
twistor CreditAttribution: twistor commented@catch, so we're back into the 7.x branch. This is the first time I've ever stared into the abyss that is drupal-6.bare.database.php/drupal-6.filled.database.php. As far as I can tell, it's just aggregator tables, how would someone go about adding data to this beasty?
Comment #39
catch@aspicilious - no because that's impossible at the moment due to this bug ;)
@twistor the easiest thing is to have a Drupal 6 site with aggregator, set up a feed, run cron. Then use the scripts/dump-something.sh included with Drupal 7 to dump it. Then copy and paste the aggregator section into drupal-6.filled.database.php -just the db_insert() with the feed config + feed data hunks would be plenty.
Comment #40
catchMarking #30 RTBC for D8. I don't think we need the data in aggregator tables for Drupal 7 upgrade testing just for this patch (it'd be nice to have in general), but if we do this doesn't affect the 8.x patch anyway.
Comment #41
catchComment #42
catchAdding quick fix tag to identify relatively straightforward issues in the major queue.
Comment #43
webchickThis patch in #29 doesn't seem to apply. aggregator's already up to _7002.
Comment #44
catchLikely needs 7.x-extra defgroup adding for aggregator too.
Comment #45
brianV CreditAttribution: brianV commentedUpdated D7 patch added with aggregator_install_7003() instead of aggregator_install_7000(). Also, added the proper Doxygen groupings.
Bumping down to 7.x to test this patch.
Comment #46
brianV CreditAttribution: brianV commentedpatch is renamed so testbox picks it up. same patch as #45...
Comment #47
brianV CreditAttribution: brianV commentedRemoving the 'needs backport to D6' tag.
I tested this troublesome feed in D6 HEAD, and by default, the feed items are truncated to 255 characters without any errors or issues.
So, I do not believe this bug requires a patch in D6. Also, this won't be problematic on a D6-D7 upgrade as there won't be any aggregator items with titles longer than 255 characters in the upgraded D7 site (until the next long feed item is fetched, anyways...
The D8 patch in #30 and the D7 patch in #46 should be all we need provided the bot is happy.
Comment #48
brianV CreditAttribution: brianV commentedD7 patch tested OK, so bumping up to D8.
#30 (D8) and #46 (D7) are the two current patches.
Comment #49
catchBack to RTBC.
Comment #50
brianV CreditAttribution: brianV commentedUrgh, hate to do this, but realized I had used 'defgroup' instead of 'addtogroup' for the D7 update hook...
Other than that slight text change, patches are identical.
Comment #51
catchupdates-extra makes my brain hurt. Opened #1266752: Document updates-7.x-extra defgroup pattern . Re-re-RTBCing.
Comment #52
brianV CreditAttribution: brianV commentedOk, so for the committers, Current RTBC patches are:
#50 for D7
#30 for D8
Comment #54
crashtest_ CreditAttribution: crashtest_ commented#50: 1244116-aggregator-titles-title-text-column-rev4-drupal7.patch queued for re-testing.
Comment #55
xjm@CrashTest_ et al: The issue is against D8, so testbot is trying to apply the D7 patch to D8. This is likely why it's failing. To make testbot ignore a D7 patch generally, add the suffix
-d7
to the patch name. (In fact, I believe-dN
will make the patch be ignored, where N is any number.)Catch marked this RTBC, so if/when testbot comes back with a fail again we can set it back to RTBC.
Comment #56
crashtest_ CreditAttribution: crashtest_ commented@xjm, Ahah! Sorry, didn't know that fact. Thanks for the enlightenment, I will quit clicking re-test :)
Pat
Comment #58
brianV CreditAttribution: brianV commentedHehe... Yeah, sorry. I called it drupal7 in an earlier revision so it would test when I switched the version of the bug to D7. I never fixed that (oops!), and hence it's testing and failing under D8...
Comment #58.0
brianV CreditAttribution: brianV commentedUpdated issue summary.
Comment #59
brianV CreditAttribution: brianV commentedAs requested by webchick, the RTBC candidates patches are:
#50 for D7
#30 for D8
Comment #60
sunPatch in #30 for D8 is OK. The backport in #50 is not.
Comment #61
brianV CreditAttribution: brianV commentedRetesting against 7.x to demonstrate it is indeed green. It was before, but someone retested it against 8.x...
Comment #62
brianV CreditAttribution: brianV commented#50: 1244116-aggregator-titles-title-text-column-rev4-drupal7.patch queued for re-testing.
Comment #63
brianV CreditAttribution: brianV commentedThere - 50 is now showing as a pass again. Unless sun has a specific reason why this patch is bad, I am considering it RTBC.
D7 - #50
D8 - #30
Comment #64
Dries CreditAttribution: Dries commentedCommitted #30 to 8.x. Moving to 7.x.
Comment #65
sunShould be /**
1) Typo in "daddtogroup".
2) @addtogroup only takes a group name; the label/title is defined elsewhere.
Multi-line comment style hiccup... first line should be /**
20 days to next Drupal core point release.
Comment #66
catchShould all be dealt with in this one.
Comment #67
sunThanks!
Comment #68
dawehnerI would vote to backport this bring first the initial patch in and then fix the things mentioned in http://drupal.org/node/1244116#comment-4962716 is this okay?
Comment #69
sunNot needed, only the patch in #66 contains the lines from #65. I deliberately waited for the D8 patch to be committed first.
Comment #70
catchI'm moving this back to D8, there was another good reason to leave this at 255 that was not discussed.
MySQL cannot make temporary tables in memory with TEXT or BLOB columns, varchar is OK though.
This means any query like these:
needs to be reviewed for index usage before changing column type, as well as the likelihood of contrib modules or views querying these tables.
Marking RTBC for a rollback of the currently committed patch, then we can discuss whether to evaluate the queries or just truncate all over again.
Comment #71
catchNote #30 is the patch that needs rollback.
Comment #72
catchOpened #1284658: MySQL strict mode is causing PDO Exceptions when lengths of varchar are exceeded to discuss the overall issue here.
Comment #72.0
catchUpdate issue summary.
Comment #73
xjmUpdated summary based on #70.
Comment #74
xjmFor convenience, this patch rolls back #30.
Comment #75
Dries CreditAttribution: Dries commented1) Rolled the patch back on 8.x.
2) As far as I can see, this patch was never applied to 7.x.
1 + 2 means that I'm moving this back to 'needs work'.
Comment #76
catchThanks for the rollback, let's try to figure this out in #1284658: MySQL strict mode is causing PDO Exceptions when lengths of varchar are exceeded.
Comment #77
xjmSo this issue is blocked then?
Comment #78
laura s CreditAttribution: laura s commentedSubscribe. (Have at least one site going live with a hacked core to work around this fatal error.)
Comment #79
robert.laszlo CreditAttribution: robert.laszlo commentedSubscribe
Comment #80
webchickCan someone post some before/after EXPLAINs on queries in aggregator module that touch the "title" field with/without the patch? I'm unclear whether catch's concern is a real one or a theoretical one, and it'd be nice to find out by exactly how much this degrades performance.
Comment #81
sun"Though what kind of wonky weird content or post title in what kind of shitty feed actually has more than two hundred and fifty five characters in the first place? Isn't that the point where we should rather ask for bogus configuration and just cut off???"
^-- 255 characters
Certainly, the possible length may be reduced when using Unicode byte sequences, but that's actually not even what the OP reported - the case in the OP was:
"Testimony of Rear Adm P. Zukunft, U.S.C.G. Asst Commandant for Marine Safety, Security, and Stewardship, before House Comm on Homeland Security, Subcomm on Border & Maritime Security, Protecting Maritime Borders:Leveraging Law Enforcement Cooperation to Enhance Security along America's Coasts"
There's an easy stance here: If you really want to support such "titles" (for whatever reason), use Feeds module in contrib and map the title to a (long) text field.
In short, I'd recommend to go back to #11.
Comment #82
webchickI agree. #11 would certainly be preferable to a fatal error. And the truncation doesn't strike me as a huge problem because aggregator items are, by definition, ephemeral. It's one of the things I love about aggregator module over Feeds.
Comment #83
webchick#11: 1244116-titles-with-test.patch queued for re-testing.
Comment #84
brianV CreditAttribution: brianV commentedUpdate - the retest of #11 passed.
Comment #85
sun#11 is RTBC for me then.
Comment #86
catchI can't commit #11 since it's my patch.
Also I think we need to resolve #1284658: MySQL strict mode is causing PDO Exceptions when lengths of varchar are exceeded even if it's just a documentation issue - so we don't have to go through this for every patch (since there are 3-4 like this already just in core), so I'm going to postpone this on that one.
Comment #87
catchThat other issue is resolved, and we decided to truncate in PHP unless there's a very good reason to change schema, which in this case there's a high likelihood of temp tables written to disk.
Moving this back to RTBC and assigning to webchick since it's my patch.
Comment #88
brianV CreditAttribution: brianV commented#11: 1244116-titles-with-test.patch queued for re-testing.
Edit: Passed.
Comment #89
webchickI committed and pushed #11 to 8.x and 7.x. I see the argument around unexpected data truncation, but IMO here it's ok because these items get cleared out on cron anyway.
Comment #90
n4rky CreditAttribution: n4rky commentedAre you sure you're using a canonically correct approach to this problem?
PDOException: SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'link' at row 1: INSERT INTO {aggregator_item} (title, link, author, description, guid, timestamp, fid) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6); Array ( [:db_insert_placeholder_0] => Archbishop of Canterbury confronts Robert Mugabe in Zimbabwe over controversial Anglican split [:db_insert_placeholder_1] => http://telegraph.feedsportal.com/c/32726/f/568853/s/192bb7a2/l/0L0Steleg... [:db_insert_placeholder_2] => Telegraph Staff [:db_insert_placeholder_3] => Archbishop of Canterbury Rowan Williams meets with Zimbabwe President Robert Mugabe in Harare to discuss a controversial split in the Anglican Church within the troubled African state [:db_insert_placeholder_4] => http://www.telegraph.co.uk/news/worldnews/africaandindianocean/zimbabwe/... [:db_insert_placeholder_5] => 1318274656 [:db_insert_placeholder_6] => 145 ) in aggregator_save_item() (line 150 of /home/www/disunitedstates.com/drupal-7.8/modules/aggregator/aggregator.processor.inc).
Status Report contents:
Drupal 7.8
OK
Access to update.php Protected
Info
CAPTCHA Already 0 blocked form submissions
OK
CTools CSS Cache Exists
OK
Configuration file Protected
OK
Cron maintenance tasks Last run 14 min 28 sec ago
You can run cron manually.
To run cron from outside the site, go to http://disunitedstates.com/cron.php?cron_key=elided
OK
Database system MySQL, MariaDB, or equivalent
OK
Database system version 5.1.54-1ubuntu4
OK
Database updates Up to date
Warning
Drupal core update status No available releases found
There was a problem checking available updates for Drupal. See the available updates page for more information and to install your missing updates.
OK
File system Writable (public download method)
OK
GD library PNG support 2.0
OK
GD library rotate and desaturate effects 2.0
OK
Module and theme update status Up to date
OK
Node Access Permissions 97 permissions in use
If the site is experiencing problems with permissions to content, you may have to rebuild the permissions cache. Rebuilding will remove all privileges to content and replace them with permissions based on the current modules and settings. Rebuilding may take some time if there is a lot of content or complex permission settings. After rebuilding has completed, content will automatically use the new permissions. Rebuild permissions
OK
OpenID Math library Installed
OK
PHP 5.3.5-1ubuntu7.3 (more information)
OK
PHP extensions Enabled
OK
PHP memory limit 128M
OK
PHP register globals Disabled
OK
Page title version Enabled (page_title table contains 0 rows)
OK
Unicode library PHP Mbstring Extension
OK
Update notifications Enabled
OK
Upload progress Enabled (PECL uploadprogress)
OK
Web server Apache/2.2.17 (Ubuntu)
OK
XML sitemap Last attempted generation on Sun, 10/09/2011 - 16:00 (1 week 1 day ago).
OK
XML sitemap cache directory Writable
OK
Zencoder 1.2
I attempted to apply patch in #11 since this is what seems to have been the consensus here as to the correct fix. I got two hunks rejected. Since the patch consists of two hunks that's 100 percent failure.
Comment #91
n4rky CreditAttribution: n4rky commentedA Google search indicates this is biting a number of sites. And considering further the bits of this discussion I read, I would point out that search engine optimization (SEO) is not all users' sole consideration. Some of us are actually using Drupal as a content management system. And we need it to manage content.
Comment #92
catchThis has been committed to 7.x but it is not in a stable release yet (it was committed after 7.8 came out).
Comment #93
DocuAnt CreditAttribution: DocuAnt commentedThe problem still exists for links longer than 255 characters. :-)
Comment #94
DocuAnt CreditAttribution: DocuAnt commentedAnd the problem exists for authors longer than 255 characters. :-)
Comment #95
brianV CreditAttribution: brianV commentedDocuAnt:
This is an old bug, please don't re-open it. It concerns the title field, not the author or link fields.
Now with respect to your reports:
1. The link field has been fixed in D8, and that patch need to be backported to D7: #218004: Allow aggregator feed URLs longer than 255 characters
2. I am opening a new bug report w.r.t the author field. I'll make a new post with a link to it.
In the meantime, I am reverting this issue to it's old title and status.
Comment #96
brianV CreditAttribution: brianV commentedIssue with respect to the author field: #1622974: Aggregator fails to parse author entries longer than 255 characters
Comment #97
DocuAnt CreditAttribution: DocuAnt commentedThanks! :-)
... and sorry for the confusion.
Comment #97.0
DocuAnt CreditAttribution: DocuAnt commentedUpdated issue summary.
Comment #98
jorisx CreditAttribution: jorisx commentedok, I now get this same error but then for the description field?!
PDOException: SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'description' at row 1: INSERT INTO {aggregator_item}
Any help would be super appreciated because now the whole cron run is canceled on this error ...