manually block IP addresses / detect IP floods
| Project: | Spam |
| Version: | 6.x-1.x-dev |
| Component: | Miscellaneous |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
We have recently started getting some spam (seemingly just on the weekend!) that contains only one or two URL links and where the text is pretty innocuous ... not quite on topic, but not wildly off topic, and it all has something to do with automobiles. These are getting scores like 44 and our threshold is something like 72, so they have to be noticed and manually marked. I'm wondering if:
1) It is possible to tell after the fact what IP these are coming from so that one could tell if they are all coming from the same source.
2) If it is a consistent IP, is it possible to add it to the block list? I am a little puzzled about how the block list works since it has had as many as three entries, but now only has one.
3) One can report this spam to whatever agency is maintaining the block list.
4) There is any other technique one might use to catch this spam automatically.
A sample post is
Nice to read your story and may I wish you all the best for the future, honda auto, 2464, and, :-)),
The word honda and the word and at the end are links to blogreporter.biz. Others are to spurl.net.

#1
The IP should be listed in the spam overview page. You can filter out a specific IP (before the automatic IP filter in the duplicate module gets it, let's say) by creating a custom filter that checks the referrer field for the IP you're looking to block.
#2
I'd like to re-open this inquiry to at least get a documentation fix, if not a functionality improvement.
I am curious for starters as to why the Blocked IP list in the Spam Log report seems to only rarely have anything in it. It would seem to me that there should be a function in this module for blacklisting IP addresses which are associated with known sources of spam. We certainly often get bursts of spam from the same IP and it would be helpful to, at the least, manually blacklist that IP so that one didn't need to wade through the cleanup. Something automatic with a review process would be even better.
Having to make a new filter for each new instance ... however one does that ... is not user friendly since the content administrator for the impacted sites isn't really a techie.
#3
admin/settings/spam/filters/duplicate contains IP blocking functions. Perhaps this should be moved out into a separate filter somehow? I don't understand why it's only in the duplicate, since it doesn't take much to randomise parts of each spam message and fool a duplicate detector.
Perhaps it would make sense to auto-block any user that posts something that scores above a certain threshold for any combination of filters?
@tamhas: needs work is for patches.
#4
FWIW: In my experience, blocking IP addresses has very limited application. Sometimes there's an annoying user who won't give up and it comes in handy, but more often than not spam attacks come from zombie farms which have a nearly unlimited supply of IP addresses at their disposal. This is why the IP blocking currently lives only in the duplicate filter -- if an IP starts showing up multiple times posting spam, then it's time to automatically block the IP...
As for the request to manually block an IP address -- Drupal has the ability to _completely_ block an IP address built into it. Perhaps the spam module should make it easier for admins to access this functionality? Granted, the spam module treats blocked IPs differently -- it allows blocked users to view web content, only preventing them from posting new content.
#5
We aren't seeing much in the way of true duplicates ... multiple on the same theme, yes, but with enough variation to not get picked up as duplicates.
While I recognize that the IP blocking isn't going to help a lot of spam, we certainly have seen cases where it would, particularly if it were automatic since we will get a whole burst of spam from a single source.
#6
I had found the built-in Drupal blocking, but the problem with that is that there is nothing in the list of visitors to tell one whether a given visitor is a spammer or something else. E.g., our top visitor resolves to googlebot, but the third-ranked visitor is 94.102.51.22, which doesn't resolve to anything ... over 1000 visits.
#7
It does seem that it would be of limited use at best to block a particular IP in general -- as has been mentioned a few times, the most common use case is to prevent a burst of spam content from an IP, which has to be caught in time to prevent the burst -- and, considering that it can come from "zombie" machines which may often be behind a shared IP address out of a pool, the block wouldn't have much use after a particular period of time.
So, just to throw this out there, it sounds like what we *really* want is a per-IP flood filter. If an IP posts, say, 10 content items in a minute's span (or whatever), all 10 content items (and any future ones for a certain timeout; say, an hour) are retroactively marked as spam and unpublished.
When it's put that way, I think a flood filter is potentially a very good idea. Does that sound like it would be a solution to the problem you're facing?
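To make the proposal concrete, here is a minimal Python sketch of such a per-IP flood filter. It is not code from the spam module: the class name `FloodFilter`, its parameters, and the in-memory storage are all assumptions for illustration. The defaults mirror the numbers floated above (10 items within 60 seconds, one-hour timeout).

```python
from collections import defaultdict, deque


class FloodFilter:
    """Hypothetical per-IP flood detector (illustrative only)."""

    def __init__(self, max_posts=10, window=60, penalty=3600):
        self.max_posts = max_posts      # posts allowed inside the window
        self.window = window            # window length in seconds
        self.penalty = penalty          # how long a flooding IP stays flagged
        self.recent = defaultdict(deque)  # ip -> deque of (timestamp, post_id)
        self.blocked_until = {}           # ip -> timestamp when the flag expires

    def record(self, ip, post_id, now):
        """Register a post; return the post ids that should be marked as spam."""
        # While the penalty is running, every new post from the IP is flagged.
        if self.blocked_until.get(ip, 0) > now:
            return [post_id]

        queue = self.recent[ip]
        queue.append((now, post_id))
        # Drop posts that have fallen out of the sliding window.
        while queue and queue[0][0] <= now - self.window:
            queue.popleft()

        if len(queue) >= self.max_posts:
            # Retroactively flag everything still inside the window.
            flagged = [pid for _, pid in queue]
            queue.clear()
            self.blocked_until[ip] = now + self.penalty
            return flagged
        return []
```

A caller would unpublish every id returned by `record()`. A real filter would need persistent storage rather than a dictionary, and the thresholds would presumably be admin settings.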
#8
The pattern we have been seeing is not really a flood in the sense of a burst of 10 messages within a minute or two, but more a flood like 10 messages in a day possibly continuing over some days.
From what we see, I am inclined to block the IP after only one or two spam, just to keep down the amount of spam we need to filter.
#9
@tamhas: the duplicate module will do this for you -- if an IP has posted more than N pieces of spam, the IP is blocked. The IP is only unblocked if 1) the spam content is manually marked as not-spam, or 2) the spam content is deleted. Is this not what you're looking for?
What gnassar is talking about is monitoring the content posting rate for a given IP address -- if it goes above a certain threshold, then consider the IP a spammer. The threshold would have to be a fairly small window of time, however, as otherwise you may end up penalizing an active member of your community.
I very much like the idea of a flood detection filter. It should be simple enough to write, and then it's a matter of watching it in the wild to see if it's doing any good. It may be good to layer it -- i.e., >5 posts in 1 minute = 85% chance of being a spammer, >10 posts in a minute = 99% chance... (similar to the node_age filter).
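The layering could look roughly like the Python sketch below. The function name `flood_score`, the tier table, and the choice to return 0 when no tier matches are assumptions for illustration, not the module's API; the two default tiers are the ones suggested above.

```python
# Each tier is (posts that must be exceeded, spam probability in percent).
# Hypothetical defaults taken from the figures in this comment.
DEFAULT_TIERS = ((5, 85), (10, 99))


def flood_score(posts_in_window, tiers=DEFAULT_TIERS):
    """Return the score of the highest tier whose post count is exceeded."""
    # Check the strictest tier first so the highest matching score wins.
    for min_posts, score in sorted(tiers, reverse=True):
        if posts_in_window > min_posts:
            return score
    # No tier matched: this sketch treats that as "no opinion".
    return 0
```

Passing a different `tiers` tuple is how a site could tune the thresholds, e.g. `flood_score(3, tiers=((2, 50),))` for a low-traffic site.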
#10
Two layers is certainly a good idea. It would be good to make all the variables customisable.
#11
I'm not seeing any sign of IPs being blocked. Once, long ago, there were one or two in the blocked IP list, but they have long since disappeared and no new ones have appeared despite numerous spam from the same address. I suppose the problem here is that we are deleting the spam, but if we didn't we would have some pages with hundreds of spam contents ... not visible to the rest of the world, since they are unpublished, but visible to either of the administrators. Even just a few days can build up quite a list of some of their "favorite" pages.
#12
There's a feature still missing from the 6.x branch of the spam module: auto spam deletion. It's supposed to be a configurable option, delete spam after N days. The idea being that IPs are only black-listed temporarily.
I'm updating the title, as a manual permanent IP block is different and will live in its own block -- I think it can live in the same module that detects floods of activity from the same IP address.
This won't happen until after 6.x-1.0 is released, which should be early next week.
#13
I can see that one might want to expire an IP block after a while, but I'm a bit puzzled about it disappearing as soon as one deletes the spam. If one keeps up with deleting the spam, which seems like good admin practice, then the IP block is almost useless.
#14
Just to throw in a data point ... we just got 88 spam comments from the same IP.
#15
In the last couple of days, we had 20 IP addresses with 2 or more spam each, all but one with 4 or fewer. I recognize that banning an IP which has only produced 2 spam may not be that effective, but I am inclined to do so on the chance that the same site is going to continue to send out new spam, i.e., 2 today, 2 tomorrow, 2 the next day, etc. so that we only ever see 2 at a time, but cumulatively it is a lot more.
Banning these with current tools is pretty tedious since one has to search through the top visitors list a page at a time, so it is easy for a new IP with only a couple of spam to be 14 or 15 pages into the list. With a 1 page list and a working sort by IP, one could move pretty quickly, but having to page through and use ^F to find the IP can take a long time per IP.
Is there a SQL query I could use to do these bans? Clearly, that isn't user friendly enough for the content admin, but if I could take a standard query, plug in the IP, and have it tell me the number of hits and set it to ban, that might be worth doing as an interim step.
Also, the content admin tells me that in the recent batch there are many that appear to be from the same person since the author name is something like jon, jon1, jon2, etc. It would be handy to collect all the IPs that met some pattern like this.
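As a rough illustration of that last idea, the Python sketch below groups IPs by author name with any trailing digits stripped, so jon, jon1, and jon2 fall into one group. It assumes the (author, IP) pairs have already been exported from the spam log somehow; the function name and the `min_variants` parameter are hypothetical.

```python
import re
from collections import defaultdict


def group_ips_by_author_stem(posts, min_variants=2):
    """Map each author-name stem to the IPs that used a variant of it.

    posts: iterable of (author_name, ip) pairs.
    Only stems seen with at least min_variants distinct names are returned.
    """
    names = defaultdict(set)  # stem -> distinct author names seen
    ips = defaultdict(set)    # stem -> distinct IPs seen
    for author, ip in posts:
        # "jon", "jon1", "Jon2" all reduce to the stem "jon".
        stem = re.sub(r"\d+$", "", author.strip().lower())
        if not stem:
            continue  # skip names that were digits only
        names[stem].add(author)
        ips[stem].add(ip)
    return {
        stem: sorted(ips[stem])
        for stem in names
        if len(names[stem]) >= min_variants
    }
```

This only catches the simple numbered-suffix pattern described here; spammers who vary the name in other ways would need a different rule.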
#16
FWIW: I'm not opposed to implementing this functionality. All filters have their own use, though I do not believe there is any silver bullet. (What works well today, is almost guaranteed to not work in all cases "tomorrow" as the spammers always learn and evolve...)
Anyway, patches are very welcome... :)
#17
Another data point ... in the last day, we got 308 spam comments. 104 of those came from one IP! Other IPs were responsible for 41, 27, 12, and 10 each. That's about 63% (194 of 308) from 5 IPs. There were 38 IPs with two or more spam.
Seems to me this must be a lot more common than supposed, unless our experience is unusual.
#18
Tamhas: I don't think the data is useful. We know the problem, and we know that this is a worthwhile solution. It will happen in time, as soon as someone gets annoyed enough to draft up a filter module. Please be patient (or have a go at poking around the Drupal API and have a crack at it yourself ;) )
#19
I've only dabbled in PHP, so I'm not going to be much help in providing code.
We have started keeping a list of IPs and the number of instances of spam in which they occur. We have to keep this externally since we need to keep deleting the spam to make it manageable. One of the things we have quickly discovered is that, in addition to the IPs responsible for a large burst of spam on a single day, there are also IPs which have a pattern of posting 1-4 spam, but do so day after day. These would never bubble to the top if one is cleaning up every day, but are clearly something one would like to notice and respond to.
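For what it's worth, the kind of external tally described here can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not something the module provides: the function name, the (day, IP, count) log format, and the two thresholds are all assumptions.

```python
from collections import defaultdict


def repeat_offenders(daily_log, min_days=3, min_total=5):
    """Find IPs that post a little spam on many separate days.

    daily_log: iterable of (day_label, ip, spam_count) entries, recorded
    before the spam itself is deleted from the site.
    """
    totals = defaultdict(int)  # ip -> cumulative spam count
    days = defaultdict(set)    # ip -> distinct days with spam
    for day, ip, count in daily_log:
        totals[ip] += count
        days[ip].add(day)
    return sorted(
        ip
        for ip in totals
        if len(days[ip]) >= min_days and totals[ip] >= min_total
    )
```

Because the tally is cumulative, an IP posting 2 spam a day surfaces after a few days even though it never stands out on any single day. Single-day bursts are deliberately left to a separate per-day check.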