Convert {watchdog}.referer and {accesslog}.url from VARCHAR to TEXT
Frando - January 8, 2007 - 23:13
| Project: | Drupal |
| Version: | 6.x-dev |
| Component: | statistics.module |
| Category: | bug report |
| Priority: | critical |
| Assigned: | Dave Reid |
| Status: | closed |
| Issue tags: | D7 upgrade path |
Description
While updating the nodeformpopup module to Drupal 5, I was confronted with a few very long referers (due to heaps of GETs), and that resulted in this PHP MySQL warning (it occured while creating a node via the nodeformpopup module):
Warning: Data too long for column 'referer' at row 1 query: INSERT INTO watchdog (uid, type, message, severity, link, location, referer, hostname, timestamp)
The attached patch checks the string length of the referer when creating a watchdog entry and trims it to 128 chars if it is longer.
| Attachment | Size | Status | Test result | Operations |
|---|---|---|---|---|
| bootstrap.inc_18.patch | 1.18 KB | Ignored | None | None |

#1
This should have a comment explaining why we are throwing away data.
Any patches for HEAD should make the referrer column a text column which has plenty of characters.
#2
I agree with drumm that this is better fixed at the database level. I notice in system.install that in system_update_162 referer is changed to a text field for pgsql, so really there should be another update for mysql.
Patch attached. This is my first core patch submission, please be gentle.
#3
Just realized this should have been changed to CNR on patch submission
#4
This stuff still didn't get in 5.7.
I'm not that convinced that this should be turned into a text field since somehow referer and REMOTE_ADDR are a sort of external input.
If you're running on limited web/DB space it could be a serious problem.
http://drupal.org/node/198233
I'd call this a serious bug because if someone is trying to tampering your site he just have to put rubbish in referer to stay under radars.
I'm already applying my patch to all my pg sites.
#5
Since we can't make a decision on truncate or lengthen, this should go to 6.x, which has more active reviewers. 6.x's dblog module has a varchar(128), which is not long enough for URLs. It should be either varchar(128) or text.
On client projects, I always make URL fields text. I used to use varchar(255), but I got asked about truncation every time. I think long URLs are something that should be expected. All the other software works with longer URLs: http://www.boutell.com/newfaq/misc/urllength.html.
#6
Thanks for the pointer to "URL" lenght. I thought about it and I thought that different programs may have different limits but after all it'd be a matter of what the users of drupal consider reasonable.
I'd think that they can't make good forecasts about how sane other webmasters are and they will prefer to conserve a "long" referer but they should know in advance what reasonable requested_uri() should be.
What about making them longer *and* truncating them?
BTW it is not just the referer but the requested_uri() that should be truncated (they both are external input).
People may not have the option to fine tune Apache and saving in the logs 8K most of the time is not what people would like to do on Drupal.
ISP space may not take into account the web server logs but most of the time it consider DB space.
If you really would like to be paranoid you'd check if the URLs are too long, trim them and *log* they were too long.
If someone is trying to access a not existent page with a 3K URL on my site I'd like to know it.
The lenght at which they will be truncated could be configured in settings.php or in a variable.
I consider important that watchdog works always.
#7
So... 255 or text? And if text, what arbitrary length limit to truncate on?
#8
As drumm pointed out Apache support URL as long as 8K (for me it is longer than reasonable in most cases)... other software support different length. Some URL from google may be pretty long but I definitively doubt they can reach 2K.
Will it add too much overhead to make it configurable?
I'd set the default length to something as long as 2K.
I really don't know if it is worth to log that the log was truncated... it could be useful to spot if someone is trying to tamper your site or tune the length at which you're trimming... I don't know if it could be considered a serious veicle of attack once you trim... surely I would consider it a possible hostile activity... but maybe if you're that serious, you may tune apache or whatever... and not use drupal as an IDS... that's start to be too phylosophical and will make the decision about truncating delayed further...
Making it truncating at a configurable lenght should be surely better than the current situation and it is easier to decide upon.
If we are checking if the length is above that limit and log it too.... I wouldn't make 2 log entries since there is a chance you lose relationship between the 2 entries.
Better flag a too long entry in the entry itself to make the log atomic.
#9
locales.module seems to be affected by the same problem.
Going to open a separate issue. URL max length should be treated uniformly across drupal.
$lid = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s'", ...
#10
Personally, I don't like the idea of throwing away data.
-1 to cropping URLs
+1 to configuration options for this
At all rates, this is now a discussion for 7.x; doesn't time fly? If we come to a decision we can backport later.
Robin
#11
You're going to throw away data anyway since there are hardcoded limits in: browsers and servers.
Now... the current limit is too small and data get thrown away in MySQL (without notice?) anyway.
What's going to happen in real life?
The url should be trimmed by the browser or the server and most realistically if you've a very long url
1) you've made some mistake in assigning a too long url to your pages (pathauto or some of your editors)
2) some crawler got crazy
3) someone is trying to exploit something or cause a DOS
If your web server is working properly... it will trim the url. If it is not working properly you may be in trouble.
Now suppose you'd like to understand what's going on from the logs...
There are very few chances you need 8Kb of data to "debug" 1)
Most people (all?) will be hardly interested in collecting crawlers garbage.
You'd be better to protect yourself from 3).
And all this will happen just if... your web server is not working as expected... otherwise the url will get trimmed anyway and won't reach php at all.
If you want to turn drupal into an IDS you may want to write somewhere that the url was trimmed. I'd think that this task should be left to the logging facility of the web server or a real IDS, not to drupal.
[1]I think people doing forensic that would like to collect a 100K executable sent through a URL are a very small minority of drupal users and still there would be better system to intercept an extraordinary long url that doesn't put you at risk of exceptional traffic.
BTW I think there is a similar limit in access table as well.
#12
Attached patch changes the referrer fields in watchdog and accesslog tables to type text. Upgrade paths provided.
#13
I noticed that the accesslog in the statistics module has the same problem. (i.e. once the watchdog works, you bump in this one!)
I guess D7 is affected. I'm working mainly with D6.
Thank you.
Alexis Wilke
#14
For now, let's keep this issue assigned to 7.x-dev. Once it gets cleared for that, we can start backporting this. Patch in #12 is good for 7.x review.
#15
We've 2 kind of URLS we should take care of... request and referer.
Surely Apache trims requested URL and that is sane... but is it going to trim referrer too?
It seems it does:
192.168.1.12 - - [10/Oct/2008:11:00:57 +0200] "GET /drupal/ HTTP/1.0" 400 8391 "-" "Wget/1.11.4"
at around 8K default...
I didn't dig further but it seems you can even tune those limits in Apache cfg.
So Apache (or any other reasonable web server) seems the place where to trim (it's the first place where you may incur in resource exhaustion), not drupal.
To make it clearer, all my motivations for trimming in drupal are gone and I'd say text is enough.
#16
Code in #12 looks good. Are we able to provide a test?
#17
I indeed think that #12 is a good solution and that's what I was thinking to do first. On the other hand, it will use a lot more space in the database if you get a lot of hits with long referrers.
#18
subscribe
#19
The last submitted patch failed testing.
#20
I thought this was fixed in D7 by changing the type to TEXT instead of a VARCHAR. Why is this bug still open? I'd like to know.
Thank you.
Alexis W.
#21
Because the patch in this issue hasn't been committed yet and both fields are still 'varchar' types. See {accesslog}.url and {watchdog}.referer.
#22
For reference here's the error message (no longer just a warning) d7 is spitting out:
ErrorPDOException: INSERT INTO {watchdog} (uid, type, message, variables, severity, link, location, referer, hostname, timestamp) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4, :db_insert_placeholder_5, :db_insert_placeholder_6, :db_insert_placeholder_7, :db_insert_placeholder_8, :db_insert_placeholder_9) - Array ( [:db_insert_placeholder_0] => 1 [:db_insert_placeholder_1] => user [:db_insert_placeholder_2] => Session closed for %name. [:db_insert_placeholder_3] => a:1:{s:5:"%name";s:7:"mfb";} [:db_insert_placeholder_4] => 5 [:db_insert_placeholder_5] => [:db_insert_placeholder_6] => http://localhost/d7/user/logout [:db_insert_placeholder_7] => ***very long referer removed*** [:db_insert_placeholder_8] => 127.0.0.1 [:db_insert_placeholder_9] => 1228299052 ) SQLSTATE[22001]: String data, right truncated: 1406 Data too long for column 'referer' at row 1 in dblog_watchdog() (line 135 of /modules/dblog/dblog.module).
The website encountered an unexpected error. Please try again later.
#23
Re-rolled for head.
#24
Table name was wrong in title...
#25
It seems preferable to just add another update script in dblog.install, so those who have d7 installed can upgrade?
#26
Yeah I suppose that's better. I also went in and fixed my big pet peeve of the "referer" misspelling. It should be "referrer." Renamed the {accesslog}.url to {accesslog}.referrer for consistency as well (url implies the current url, and not a referrer). All dblog, syslog, and statistics tests passed.
From the Wikipedia article on "referrer":
#27
Committed to CVS HEAD. Thanks.
#28
Dries, could you roll back and review #26? I'd prefer to fix the whole misspelling of "referrer" as well.
#29
aww I always thought 'referer' was kinda cute.. :p
Can u fix up the whitespace on the arrays so the values all line up? several of them are now one char off.
#30
Whitespace fixed. Again note, that #23 will need to be rolled back from core, and this applied instead.
#31
New follow-up patchthat will apply to head without any rolling back of patches.
#32
Forgot the initial changes in the _schema functions to the column names.
#33
I'm going to mark this as fixed and file a separate issue for fixing the "referer" spelling error.
#34
Where is an issue with backport this into 6.x ?
#35
This is on my hitlist to backport today...
#36
subscribing :-)
#37
Ah! Stop cross-posting! :)
#38
Patch for 6.x. If this is committed, I will have to change the upgrade paths on D7 with a followup-patch.
#39
Updates should be 6001 (6000 is default)
#40
andypost, not true. From http://api.drupal.org/api/function/hook_update_N/6
I'll bet that if you look in your {system} table right now, the dblog and statistics modules will have a schema of '0' before this patch. There's even a system_update_6000.
#41
Here's an updated D6 patch and updated follow-up D7 patch that provides the update comment blocks that I forgot to put in...
#42
@Dave Reid you right with version numbers but it (6000) works only if update from 5-branch
I've tried to run update.php and only 6001 shown - 6000 ignored by update.php on 6.x
#43
@andypost. It works just fine for me. You've probably applied some patch, ran the update so now the schema for dblog and statistics is set to '6000'. You'll need to reset your install, or reset the 'schema' column in the {system} table for the dblog and statistics module to '0', which simulates what all the current 6.x installs have out there. I have attached screenshots of the update on a newly-patched 6.x install on my machine.
#44
@Dave Reid sorry - you right, {system}.schema_version was 6000
The following queries were executed
dblog module
Update #6000
* ALTER TABLE {watchdog} RENAME referer TO referer_old
* ALTER TABLE {watchdog} ADD COLUMN referer text
* UPDATE {watchdog} SET referer = referer_old
* ALTER TABLE {watchdog} DROP COLUMN referer_old
#45
Doesn't matter whether it works for you guys, but we do never ever substantially change an update function. This ruins the upgrade path within the D7 trunk.
We can add an update_6000 function to D6 and a new update_7002 to D7.
Then we need to do the following:
in update_7001: wrap "db_change_field(... 'referer', ...)" in "if (db_column_exists('referer') {...} else {...}" or (better) check the column name first, and then do the "db_change_field" on the appropriate column.
in update_7002: wrap "db_change_field(... 'referer', ...)" in "if (db_column_exists('referer') {...}"
This is ugly but necessary. Complaints that the upgrade path doesn't work are even uglier.
Also, please create a new issue for the forgotten comment blocks update. Don't roll that in as this is an independent followup, and unnecessarily complicates this already complicated issue.
#46
Pancho, that's why when the 6.x patch is committed, the 7.x follow-up patch in #41 will need to be committed. Don't worry about the 7.x upgrade path. It's not finalized. We don't need to wrap everything in db_column_exists, the column names are still the same. db_change_column will still work in the 7.x paths.
#47
Oh, I'm really sorry - I mixed it up with #343626: Fix "referrer" spelling errors and rename {accesslog}.url to {accesslog}.referrer.
What remains valid from #45 is the last point to split off the forgotten comment blocks issue. I don't see it in the patch either way, so please make sure it doesn't get forgotten.
Sorry again.
#48
Subscribing. Running into long URLs with a search appliance: faceted search is cool, but makes for some really long URLs. It would be great to get this resolved! Thanks!
#49
bug #221020: locales_source.location is just varchar(255) and it is not truncated is the same nature but Gábor Hojtsy has opinion do not change database so it's different
#50
Patch #41 for d6 works fine on live sites so RTBC
#51
Sorry, previous was wrong - here patch for d6 last patch outdated because t() removed from table descriptions
Patch for D7 (migration) should be applied latter
#52
Andy,
You have to choose one or the other method. Either you do it by making sure you never go over the limit or you remove the limit. I think it is better to remove the limit since in 99% of the cases it won't go over the usual size of about 50 to 100 characters. But whenever it happens that a URL is 300, it still works as expected.
Entry #13 (by me) is the same solution as #221020: locales_source.location is just varchar(255) and it is not truncated i.e. truncate the data.
Thank you.
Alexis
#53
@Alexis, as #26 already in CVS by #27 - limit should be removed
better review my last patch
#54
I applied the patch (#51) by hand, and it upgraded and is working successfully!
If someone can verify the patch applies, I think this could be considered reviewed!
#55
Marked #371668: watchdog table referer column not big enough as a duplicate of this issue.
#56
It's unfortunate this didn't make it into D6.10. If no one else can test that the patch applies, I'll do it (just can't right now).
#57
Hi,
I got the same problem : a vrey long uri called by another server I do not control.
I applied the patch and now it work fine.
Hopefully, this issue will be fixed in next core versions.
Tks!
#58
As for me patch from #51 applies cleanly but nobody test it except me
#59
This looks good to me. I noticed this problem for some ago in PostgreSQL but haven't bothered fixing it since it was only a db logging bug. I'm happy to RTBC this but we'll need to fix the upgrade in D7 too. #41's patch http://drupal.org/files/issues/107824-followup-D7_1.patch looks like it'll do the trick.
#60
Hi @all,
will this make it into the next Drupal 6.x core update?
Cheers
hctom
#61
Tested patch posted #51 on Drupal 6.12 MySQL, works fine thanks, subscribing as this'd be nice to see in 6x core
#62
About URL length in browsers
IE 2083 - http://support.microsoft.com/kb/208427
Summary - http://www.boutell.com/newfaq/misc/urllength.html
Suppose this numbers help Gabor to include this patch into 6.x
#63
Finally committed #51 to Drupal 6.x! Now open again for the follow up patch on Drupal 7. Opening as critical to make sure it gets resolved before release.
#64
In fact, we should not mark the two added updates as 5.x to 6.x updates, because then they would be removed in the D7 release. All D6 -> D6 updates are kept though to allow for updating from any D6 version to any D7 version by tradition. So I've fixed the grouping comments to reflect that with our standard 6.x-extra group. Committed this to 6.x too.
#65
so trivial, rtbc
#66
@andypost: as said, I did commit #64. So this issue is left needs work for the fixup of the 7.x update path.
#67
Patch to fix upgrade path
Should we leave empty
defgroup updates-6.x-to-7.xsection in statistics.install?#68
current patch breaks the upgrade path. We need to rename statistics_update_7000 to statistics_update_6000 and add dblog_update_6000() to match what is currently in D6. Otherwise people upgrading from a non-updated version of D6 to D7 still get the full upgrade path.
#69
@Dave take a look at #278592: Sync 6.x extra updates with HEAD and Gabor's comment http://drupal.org/node/221020#comment-2038462
#70
@andypost: That applies for 5.x to 6.x updates. This is a 6.x to 6.x update. Read #64 above: "All D6 -> D6 updates are kept though to allow for updating from any D6 version to any D7 version by tradition." That's what we should be doing here.
#71
New patch adds
updates-6.x-extrasection#72
Added formating for d6->d7
#73
Changed comment, RTBC as minor change
#74
patch was lost, re attach here
#75
taggin
#76
Should wait til #278592: Sync 6.x extra updates with HEAD
#77
Fixed in #278592: Sync 6.x extra updates with HEAD
#78
Automatically closed -- issue fixed for 2 weeks with no activity.