
URL filtering for domains doesn't catch all sub-domains

Project: Spam
Version: 4.5.x-1.x-dev
Component: URL Filter
Category: feature request
Priority: normal
Assigned: Jeremy
Status: closed (won't fix)

Issue Summary

Adding a URL filter for a domain does not automatically filter all sub-domains. So, say I want to block all sub-domains at example.com. I'd have to enter each sub-domain individually instead of the domain once. The help text implies that you can do this, but it seems to actually be talking about sub-domains, not domains. Either the help text should be updated or this functionality should be added. I'd prefer the latter. ;)

In the UI, you could include a checkbox on the Add and Edit pages to "Include all sub-domains for this domain". On the URL filter list page, these domains would be listed as "*.example.com".

Comments

#1

Assigned to: Anonymous » Jeremy

When auto-learning "domains", the URL filter is actually learning sub-domains (i.e., host and domain). This is intentional, as the later matching logic is quite zealous and I didn't want to start getting a lot of false positives. Perhaps instead I should update the preg_match code in spam_filters_url() to match URLs more carefully.

If you wish to catch all subdomains within a domain, simply list the root domain. (Go to "administer >> spam >> URL filters" and add a new entry or edit an existing one.)

For example, if you want to match *.example.com, simply enter the domain as "example.com". If instead you want to catch only "test.example.com", then enter the full subdomain: "test.example.com".
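
The matching rule described here can be sketched as follows. This is only an illustration in Python, not the module's actual code (the module is PHP and does its matching with preg_match in spam_filters_url()); the function name `host_matches_filter` is hypothetical.

```python
from urllib.parse import urlparse


def host_matches_filter(url: str, filter_domain: str) -> bool:
    """Return True if the URL's host is the filter domain or one of its sub-domains."""
    host = (urlparse(url).hostname or "").lower()
    filter_domain = filter_domain.lower().strip(".")
    if not filter_domain:
        return False
    # Match the domain itself, or any host ending in ".<domain>".
    # The leading dot keeps "notexample.com" from matching "example.com".
    return host == filter_domain or host.endswith("." + filter_domain)
```

Under this rule an entry of "example.com" catches "test.example.com" as well, while an entry of "test.example.com" catches only that host and anything below it.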

For more complicated pattern matching, you should define a custom filter using regular expressions. I do not intend to add regular expression or wildcard support to the URL filter.

Does this work for you? Or do you still believe there is some missing/broken functionality? I will update the documentation per the above explanation.

#2

I've been getting hit with the online casino spam. So, I manually added a filter for one of the domains. The current spams didn't get filtered automatically (I don't know why I expected that; a "Filter now" function would be nice). So, I used the comment.module.patch to mark the comments as spam en masse. A filter was added for the sub-domain, completely ignoring the already existing domain filter.

I guess I'd like to see two things:
1) A "Filter now" function to search for matching spams on the URL filter page
2) Have the URL filter check for existing domains before adding sub-domains

#3

Category: bug report » feature request

Marking this as a feature request.

> 1) A "Filter now" function to search for matching spams on the URL filter page

This is something I've thought about for a while, and intend to implement. However, I saw it more as a per-filter option, for testing filters. That is, under the operations column you'd click "test filter", and then see a list of all matching comments/nodes. On that page, you'd be able to "mark all as spam", or to select a subset and mark them as spam.
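
A rough sketch of what such a per-filter test could do is given below, in Python for illustration only. The `Comment` class, `find_matches`, and `mark_as_spam` are hypothetical names, and the simple URL regular expression is an assumption, not the module's own pattern.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Comment:
    cid: int
    body: str
    is_spam: bool = False


# Assumed, simplified pattern: capture the host part of http(s) URLs.
URL_HOST = re.compile(r"https?://([a-z0-9.-]+)", re.IGNORECASE)


def find_matches(filter_domain: str, comments: Iterable[Comment]) -> List[Comment]:
    """List the comments that contain a URL on the filtered domain."""
    domain = filter_domain.lower()
    matches = []
    for comment in comments:
        # Strip a trailing dot picked up from sentence punctuation.
        hosts = [h.lower().rstrip(".") for h in URL_HOST.findall(comment.body)]
        if any(h == domain or h.endswith("." + domain) for h in hosts):
            matches.append(comment)
    return matches


def mark_as_spam(matches: Iterable[Comment], selected: Optional[Set[int]] = None) -> None:
    """Mark all matches as spam, or only those whose ids are in `selected`."""
    for comment in matches:
        if selected is None or comment.cid in selected:
            comment.is_spam = True
```

Calling `mark_as_spam(matches)` corresponds to "mark all as spam"; passing a set of comment ids corresponds to marking a selected subset.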

I intend to add this functionality to the custom filter page, too.

Would it be necessary to have another button allowing you to test all filters at once, too? (or to select n filters and test them together?) You'd only need it once, to clean up pre-existing spam. But it'd be mighty useful for that.

> 2) Have the URL filter check for existing domains before adding sub-domains

Hmm. This is easy enough to accomplish, but I'm debating the merits. Can you open a separate feature request for this?

(My concern is: for this to have any value, when adding domains it would also have to check for pre-existing sub-domains and remove them. That could potentially be confusing to an administrator. Then again, I've already got so many URL filter entries automatically added in my database that I'd never notice if one of them went away... And it would be good to minimize the clutter. Okay, I'll probably implement this.)
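
The two checks discussed here (skip a sub-domain that an existing domain already covers, and drop existing sub-domains when their parent domain is added) could look roughly like this. It is a Python sketch under assumptions: the filter list is modeled as a plain dict of domain to match count, and `add_filter` is a hypothetical name, not a function in the module.

```python
from typing import Dict


def is_covered(host: str, domain: str) -> bool:
    """True if `host` equals `domain` or is a sub-domain of it."""
    return host == domain or host.endswith("." + domain)


def add_filter(filters: Dict[str, int], new_entry: str) -> Dict[str, int]:
    """Return an updated copy of `filters` (domain -> match count)."""
    new_entry = new_entry.lower()
    # An existing entry already covers the new one: nothing to add.
    if any(is_covered(new_entry, existing) for existing in filters):
        return dict(filters)
    # Drop existing sub-domains that the new entry makes redundant.
    # Note: in this sketch their match counts are simply discarded.
    kept = {d: n for d, n in filters.items() if not is_covered(d, new_entry)}
    kept[new_entry] = 0
    return kept
```

For example, adding "example.com" to a list that holds "foo.example.com" removes the sub-domain entry, and a later attempt to add "bar.example.com" leaves the list unchanged.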

#4

It seems like the filter test issue should be the separate feature request, as this issue was originally about the sub-domain vs. domain issue. I've submitted it as http://drupal.org/node/14382.

Your concern has brought another couple of issues to mind. First, you may want to be able to filter out a domain, but allow a certain sub-domain. So, you need some kind of whitelist. Then, you want to be able to organize all filters from one domain together, regardless of the sub-domain. So, when ordering alphabetically you should see:

- *.example.com
- foo.example.com
- farboo.com

instead of:

- example.com
- farboo.com
- foo.example.com

Of course, the whole issue of domain filters is complicated further by localized domains like *.co.uk, as you can't use a single "." as an indicator that something is a domain.
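
One way to get the grouped ordering is to sort on the host labels in reverse order, as in the Python sketch below (`domain_sort_key` is a hypothetical name). This only groups entries that share trailing labels; it does not work out where the registered domain ends for cases like *.co.uk, which would need a list of known suffixes.

```python
from typing import Tuple


def domain_sort_key(entry: str) -> Tuple[str, ...]:
    """Sort key that compares entries label by label, starting from the right."""
    # Ignore a leading "*." wildcard prefix, then reverse the labels:
    # "foo.example.com" -> ("com", "example", "foo")
    labels = entry.lower().lstrip("*.").split(".")
    return tuple(reversed(labels))


entries = ["foo.example.com", "farboo.com", "*.example.com"]
ordered = sorted(entries, key=domain_sort_key)
# ordered == ["*.example.com", "foo.example.com", "farboo.com"]
```

Because a shorter tuple sorts before a longer one with the same prefix, the domain entry lands directly above its sub-domains.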

To address your concern, you can either remove a sub-domain when adding a domain, or disable it in some way. You'd have to be careful that removing a sub-domain doesn't lose all the information associated with it, such as the number of matches.

#5

The complexity you're describing sounds a bit much to me. Everything you've suggested for the URL filter is possible with custom filters. Thus I'm not sure I see the need to duplicate effort like this (whitelisting subdomains, etc).

Removing a subdomain would absolutely get rid of its statistics for that filter. The URL filter data is stored in the tokenizer table, though for the complexity you're describing I'd probably have to move it into its own table.

I'll give these feature requests more thought, and may implement a subset of them.

#6

I will not be adding this complexity to the module.
