Option to Disable IP Logging
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | watchdog.module |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
The ability to disable IP logging is important for sites that have already disabled IP logging in their server environment (in order to help protect users from government attempts to identify them by seizing servers or subpoenaing data).
Background: The United States government, at least, has demonstrated a desire and willingness to capture large numbers of IP addresses from ISPs or demand swathes of IP address data from web service providers, and also to misuse IP address information to raid people's homes. Most disturbingly, at least one city government has issued a subpoena in an attempt to get IP addresses to identify activists and journalistic sources.
I understand from this thread that there are more places than the watchdog that record IP address information, but with watchdog now a required part of core, it must establish best practice for allowing or assisting the disabling of IP logs.
Thanks greatly for any feedback and assistance,
ben melançon :: http://AgaricDesign.com

#1
The following code in settings.php should make sure the IP address cannot be logged. I can't see that it would cause any problems:
$_SERVER['REMOTE_ADDR']='0.0.0.0';
Given the simplicity and effectiveness of this hack, is it worth developing, testing and maintaining code to do the same thing?
#2
Perhaps it should be added to the settings.php file but commented out (lots of things are) with a "to do X, uncomment this line" comment?
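A rough sketch of what such a commented-out entry could look like, in the style settings.php already uses for optional overrides. The comment wording is my assumption, not text from core:

```php
/**
 * IP address logging (hypothetical wording, not from core).
 *
 * To stop Drupal from recording visitor IP addresses, uncomment the line
 * below. Every request will then appear to come from 0.0.0.0, so anything
 * that relies on per-visitor addresses (the contact form's hourly limit,
 * for example) will treat all visitors as one.
 */
# $_SERVER['REMOTE_ADDR'] = '0.0.0.0';
```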
#3
Will do and will report if there are problems, so the line could be added.
It would be nice to stop the *logging* of IP addresses without stopping their *use*. (This is particularly the case with the ability to use IP addresses to guess at location, as done by FolkJam.org).
One question is why the IP address is logged in a bunch of places if it isn't used. Aside from the contact form (setting the number of submissions per hour to a high number is a workaround for its use of IP addresses), where else should we be attentive to possible side effects of setting
$_SERVER['REMOTE_ADDR']='0.0.0.0'?
~ ben :: Agaric Design Collective :: http://AgaricDesign.com
#4
Hm. Here's an idea. Could a contrib module set the IP address to an md5 of itself? That way it's still unique, so flood control works, but it's hard/impossible to track back to the original person.
Someone with a more paranoid security mind, would that work? :-)
#5
This is still important.
I hear from the Indymedia Worcester group that Akismet, for one, gets flaky without IP addresses, so for a site that has to allow anonymous content but cannot allow IP logging of its users (the classic Indymedia setup), the settings.php hack to set all IP addresses to naught is not a practical solution.
Constant scrubbing looks like the only current approach to help protect users from intrusion of their privacy.
Really, though, it should be an option to simply tell Drupal core, at least, not to log IP addresses in the first place.
Any thoughts on this or Crell's idea to use one-way encryption of IP addresses?
~ben
People Who Give a Damn :: http://pwgd.org/ :: Building the infrastructure of a network for everyone
Agaric Design Collective :: http://AgaricDesign.com/ :: Open Source Web Development
#6
subscribing
#7
I don't think one-way encryption would stop the government in this case. The problem is that there are only 2^32 IPv4 addresses to go through, so even somebody with a below-average computer could calculate and compare all possible values within a few hours, especially if they're targeting a single user. Maybe somebody with more experience with encryption has a good idea.
By the way, I'm able to calculate md5sums (not doing any comparisons or output) of a /8 subnet in about 3 minutes, running a PHP script on a Celeron processor under a Xen VM. Obviously those are not ideal conditions for cracking. It's just to point out that the government wouldn't take long to map IP addresses to hashes.
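To make the point concrete, here is a minimal sketch of the exhaustive search, assuming the stored value is a plain unsalted md5 of the dotted-quad address. The function name and the sample address are made up for illustration, and the search is limited to a /16 so it finishes quickly. The full address space is the same loop over 2^32 values.

```php
<?php
/**
 * Recover an IPv4 address from its unsalted md5 by trying every candidate
 * in a range. find_ip_for_hash() is a hypothetical helper, not a Drupal API.
 *
 * @param string $target_hash md5 hex digest found in the logs.
 * @param string $first_ip    First address of the range to search.
 * @param int    $count       Number of consecutive addresses to try.
 * @return string|null        The matching address, or NULL if none matched.
 */
function find_ip_for_hash($target_hash, $first_ip, $count) {
  $start = ip2long($first_ip);
  for ($i = 0; $i < $count; $i++) {
    $candidate = long2ip($start + $i);
    if (md5($candidate) === $target_hash) {
      return $candidate;
    }
  }
  return NULL;
}

// What a hashing module would have stored for one visitor (sample address).
$logged_hash = md5('192.0.2.57');

// Search the 65,536 addresses of 192.0.0.0/16.
$recovered = find_ip_for_hash($logged_hash, '192.0.0.0', 65536);
echo $recovered, "\n";
```

The hash only hides the address from someone who is not willing to run a loop like this.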
#8
What about using a semi-random fudge factor? e.g.,
sha1(floor(time() / 3600) . $ip_address)?
That would keep the hashed address changing every hour, which would only marginally impact the flood control.
The base problem is that if the site is tracking users (flood control), it has to do so in some unique way. If it's done in a unique way, it's potentially trackable. You'd have to completely disable flood control and a few other things if you wanted a completely anonymous site.
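The hourly variant can be sketched as below. The function name is hypothetical, and the timestamp is passed in as a parameter only so the behaviour can be shown with fixed values. Note that the hour bucket is not a secret, so this changes the token every hour but does not by itself stop the kind of exhaustive search described in #7.

```php
<?php
/**
 * Hash an IP address together with the current hour bucket.
 * hourly_ip_token() is a hypothetical helper, not a Drupal API.
 *
 * @param string $ip_address Dotted-quad address.
 * @param int    $now        Unix timestamp (normally time()).
 * @return string            40-character sha1 hex digest.
 */
function hourly_ip_token($ip_address, $now) {
  $hour_bucket = (int) floor($now / 3600);
  return sha1($hour_bucket . $ip_address);
}

$t = 1200000000;                                          // arbitrary fixed timestamp
$same_hour  = hourly_ip_token('192.0.2.57', $t);
$minute_on  = hourly_ip_token('192.0.2.57', $t + 60);     // still the same hour bucket
$next_hour  = hourly_ip_token('192.0.2.57', $t + 3600);   // next hour bucket

var_dump($same_hour === $minute_on);  // tokens match within the hour
var_dump($same_hour === $next_hour);  // tokens differ across hours
```

Within one bucket the token is stable, so flood control can still count repeat requests. Counts reset when the hour rolls over.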
#9
I created an IP anonymizer module -- http://drupal.org/project/ip_anon -- to scrub logged IP addresses on each cron run. The retention period is configurable per table so e.g. you can clear out session IPs immediately but leave IPs in the flood table for an hour.
Since the IPs are still recorded in the database at least temporarily, forensic methods might still be able to recover them from the hard disk. Ideally in Drupal 7 there could be an option to disable IP logging in the sessions, comments, accesslog and watchdog tables. In flood and poll_votes tables IPs are actually useful so I'm not so concerned.
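The per-table retention idea can be sketched roughly as follows. This is not the module's actual code: plain arrays stand in for database tables, and the function name, the 'hostname' and 'timestamp' keys, and the 0.0.0.0 placeholder are assumptions for illustration.

```php
<?php
/**
 * Blank out the IP in every row older than the retention period.
 * scrub_old_ips() is a hypothetical helper; $rows stands in for a table.
 *
 * @param array $rows      Rows with 'hostname' and 'timestamp' keys.
 * @param int   $retention Seconds an IP may be kept (0 = scrub at once).
 * @param int   $now       Unix timestamp of the cron run.
 * @return array           The rows with expired IPs replaced.
 */
function scrub_old_ips(array $rows, $retention, $now) {
  foreach ($rows as $key => $row) {
    if ($row['timestamp'] <= $now - $retention) {
      $rows[$key]['hostname'] = '0.0.0.0';
    }
  }
  return $rows;
}

$now = 1200000000;  // arbitrary fixed time for the example cron run
$tables = array(
  'sessions' => array(
    array('hostname' => '192.0.2.57', 'timestamp' => $now - 10),
  ),
  'flood' => array(
    array('hostname' => '192.0.2.57', 'timestamp' => $now - 10),
    array('hostname' => '198.51.100.4', 'timestamp' => $now - 7200),
  ),
);

// Retention per table: sessions are scrubbed immediately, flood after an hour.
$retention = array('sessions' => 0, 'flood' => 3600);

foreach ($tables as $name => $rows) {
  $tables[$name] = scrub_old_ips($rows, $retention[$name], $now);
}
print_r($tables);
```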
#10
What about an optional wrapper function, along the lines of custom_url_rewrite - this has minimal overhead and would allow various approaches in contrib. I don't think a single core approach is possible, because it depends a lot on what the privacy requirements are and how much traffic the site gets (sites with little traffic are harder to anonymize).
We should probably have wrappers for datetimes as well as IPs, because they can also be used to identify users/activities (by matching with ISP traffic logs, for example).
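A minimal sketch of the optional-wrapper idea for IPs, assuming core would check for a site-defined function before storing an address, much as custom_url_rewrite is only called when it exists. Both function names here are hypothetical.

```php
<?php
/**
 * Return the value that should be stored for an IP address.
 * If the site defines custom_ip_rewrite() (hypothetical name), its result
 * is used. Otherwise the address is stored unchanged.
 */
function logged_ip_address($ip_address) {
  if (function_exists('custom_ip_rewrite')) {
    return custom_ip_rewrite($ip_address);
  }
  return $ip_address;
}

/**
 * Example site-specific override, e.g. placed in settings.php or provided
 * by a contrib module. This one simply discards the address.
 */
function custom_ip_rewrite($ip_address) {
  return '0.0.0.0';
}

echo logged_ip_address('192.0.2.57'), "\n";
```

Sites that define nothing pay only for one function_exists() check, and a contrib module could swap in hashing, truncation or anything else.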
#11
Generic hooks would also work, something like
comment_invoke_comment($edit, 'preinsert'), which would allow the timestamp and hostname to be altered before they are inserted. The same would be needed for the session, accesslog and watchdog tables.
#12
Hooks and conditional functions can have unpleasant overhead along the critical path. A handlers-style approach is probably going to be more performant: http://www.garfieldtech.com/blog/drupal-handler-rfc and http://drupal.org/project/handler
Yeah, not in core yet, but for swappable systems that is the way to go.
#13
@Crell
Do you think this is a good feature to use to try to get handler-style swappability in core, or is there another issue out there that should be the standard-bearer for the introduction of this pattern to D7?
#14
pwolanin is trying to get a very simplified version of handlers (honestly not handlers, but a factory approach which is the basic idea of handlers) working for #259103: fix pluggable password hashing framework.
For right now, we could probably get away with the same sort of approach. The benefit is that if you make the logging doohickey pluggable as an object, you instantiate the object once and just call a method on it each time. You can easily swap out the class (via a variable_get()?), and once you have paid the cost of creating the object and storing it in a static variable, every subsequent method call costs virtually the same as calling a function.
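A rough sketch of that factory approach, under stated assumptions: the interface, class names, the 'ip_logger_class' setting and example_variable_get() are all made up for illustration. The last one is a stand-in for Drupal's variable_get() so the sketch runs on its own.

```php
<?php
// Stand-in for Drupal's variable storage so this runs outside Drupal.
$GLOBALS['conf'] = array('ip_logger_class' => 'NullIpLogger');

function example_variable_get($name, $default) {
  return isset($GLOBALS['conf'][$name]) ? $GLOBALS['conf'][$name] : $default;
}

// Hypothetical contract for whatever decides which IP value gets stored.
interface IpLoggerInterface {
  public function prepare($ip_address);
}

// Default behaviour: store the address unchanged.
class DefaultIpLogger implements IpLoggerInterface {
  public function prepare($ip_address) {
    return $ip_address;
  }
}

// Privacy-minded replacement: never store the real address.
class NullIpLogger implements IpLoggerInterface {
  public function prepare($ip_address) {
    return '0.0.0.0';
  }
}

/**
 * Factory: build the configured logger once and keep it in a static
 * variable, so later calls only pay for a method call.
 */
function ip_logger() {
  static $logger;
  if (!isset($logger)) {
    $class = example_variable_get('ip_logger_class', 'DefaultIpLogger');
    $logger = new $class();
  }
  return $logger;
}

echo ip_logger()->prepare('192.0.2.57'), "\n";
```

Swapping behaviour is then a matter of changing one setting to a different class name, with no hook invocation on the critical path.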