For the last day or two, sites listed on drupal.org have been getting a lot of comment spam. Each comment is an off-topic quote, apparently pulled from some database, followed by a link to a casino site. What follows is partly a technical analysis of the spam attack on Drupal sites, partly a start on "how to prevent comment spam", and some ideas on how we can fight this together. There is room for discussion, so put your comments in this thread (as long as you don't mention the war^H^H^Hcasino :-)
TECHNICAL
The spammer is using lots of IP addresses. These IP addresses are most likely not associated with the spammer. They are badly configured proxy servers that are being misused, or cracked boxes on which the cracker installed a proxy server for his or her own use. So you might want to complain to the abuse desk of the ISP behind each IP address, but that won't solve the problem.
The spam itself is a short off-topic quote with a link to a casino site. The quote is most likely taken from a database. The plugged sites have the form www. + a casino term + 3 digits + .com, for example www. poker-rooms-777.com. Some of the plugged sites have already disappeared from the whois database. The sites seem to be hosted in China, or at least that is what APNIC returns for 222.47.62.198 (the IP address of www. poker-rooms-777.com):
inetnum: 222.32.0.0 - 222.63.255.255
netname: CRTC
descr: CHINA RAILWAY TELECOMMUNICATIONS CENTER
descr: 22F Yuetan Mansion,Xicheng District,Beijing,P.R.China
country: CN
The registrants of the sites appear to be Russian, at least according to what they filled in in the whois database:
Domain name: poker-rooms-777.com
Registrant:
Inna Fridman (2PH7Q) gazelhofman@yahoo.com
SilverStar
balshaya nikitskaya 23
Moscow, RU 52333
Russian Federation
Phone: +7 (095)2917973
The user agent of the poster is
Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)
It seems that the lame spammer starts with a low nid and works its way up, so you will likely find the spam on your first posting (nid=1) and then on each following nid. Sometimes there are two spam comments on a nid (from different IP addresses), but mostly you will find one spam message.
The action of the script is:
POST /myblog/comment/reply/32? HTTP/1.0
So far I have seen the spam come from 1648 IP addresses. I thought about adding the list here so people can drop it into their firewall, but others might misuse the list, so I am not posting it here.
WHAT TO DO
You might want to drop the offending IP addresses in your firewall table to stop them from using your service. This will not work: as long as there are misconfigured proxy servers or boxes that can be cracked, you will lose this way. For the time being you might want to turn off the option for anonymous users to post comments. This will stop the bot, but it will make your site less attractive for visitors, since they have to create an account to interact with the content on your site.
You should install Jeremy's excellent spam module and feed it with spam words. This is a good solution, though there are some issues with the tracker: spam won't be published, but the tracker will still show the posting the spam was attached to as new. The spam that is already in the database can be deleted from the MySQL command prompt or from within phpMyAdmin. There is no easy way to get rid of bulk comments within Drupal yet, but work is underway. To delete mass spam messages, search for a word that only appears in the spam and enter in MySQL:
mysql -u user -ppassword databasename
mysql> DELETE FROM `comments` WHERE `comment` LIKE '%free-casino-games%';
Though this won't make me popular, I think it is best not to advertise which engine is running your site (so no "powered by Drupal" text or logo) and, most of all, not to advertise your site on the Drupal list. Though I am proud to use Drupal, it is bad security to say what you are running.
AND NOW
I think we ought to compile a list of spamvertised sites and have it published here by someone we trust, in a way that cannot be influenced by spammers. Note that spammers might take revenge by DDoSing drupal.org; they have done that to other sites before. There is a need for a captcha module for posting, which would stall most bots.
But most of all, never EVER do business with people who spam. Don't go to their sites, don't buy their products, and spread the word. As long as spammers can make a profit, they will spam...
Comments
Advertising Drupal
It is trivial for a bot to see if a site is running Drupal (URLs, location of stylesheets, form structure/element names, ...). Whether or not you have a "powered by Drupal" button or text on your site does not matter at all ;).
Don't confuse a spammer's lack of ethics with a lack of intelligence.
Also, it is probably useful not to include the "mysql" command and "mysql>" prompt in your instructions. Most people will use phpMyAdmin or something similar to administer their database. Others should be smart enough to know the syntax of the "mysql" command.
yes
Yes, it is easy to see if a site is running Drupal once you know what Drupal's URIs look like. That will change, however, when other engines adopt the Drupal clean URL approach as well. :-)
But remember how long it took before nmap could tell which HTTP server was running, instead of just that an HTTP server was running: fingerprinting a service is not that easy. There are 100+ PHP CMSes in 1000+ versions out there, and Drupal is not the most used one.
But I am afraid you are right: a lack of ethics does not imply a lack of intelligence. Still, I think it is unwise to say what you are running. I will still say that I run Drupal on LAMP, but I would not advise a company to do so.
--
groets
bertb
--
groets
bert boerland
Be careful with that SQL query
Deleting comment spam like that leaves your node_comment_statistics table in an inconsistent state and will, among other things, result in incorrect comment counts.
and
So, could someone post a less crude way of deleting bulk comments?
--
groets
bertb
--
groets
bert boerland
Hmmm
Any way to fix the node_comment_statistics table if you have already done a similar mass delete of these spam comments? Maybe something like a rebuild_comment_stats script?
My site is a personal blog, so it's no biggie if this can't be fixed easily, as Drupal is still functional; the comment counters are just all off... Guess this is what I get for reacting too quickly :)
1st, get the fixed
First, get the fixed spam.module; I used the patch from here:
http://drupal.org/node/14263
Here's what I used on my website:
create content -> page
paste the following code and hit preview.
It worked for me, but if it ruins your site, I'm not responsible!
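<?php
// Recompute the comment statistics (such as the comment count) for every
// node that has a row in node_comment_statistics.
$result = db_query("SELECT nid FROM {node_comment_statistics}");
while ($node = db_fetch_object($result)) {
  echo "Manipulating node: " . $node->nid . "<br />\n";
  _comment_update_node_statistics($node->nid);
}
?>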
Perfect!
Worked like a charm! Thanks msameer!
Recommendation for mass marking of spam
Jeremy has included a patch to the standard comment module in the spam module's optional folder that enables mass marking of comments as spam. It is extremely helpful.
more URLs
This site has had a total of 1291 spam comment postings, in about 24 hours. Ouch!
The script is advertising 4 URLs here:
www .1-poker-games.biz
www .free-casino-games-000.com
www .poker-rooms-777.com
www .texas-holdem-0.com
So it is not just one URL.
I've noticed that most of the posts contain the word "poker", so I've added it as a custom filter with a "usually spam" effect.
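For people who want to experiment outside the spam module, here is a minimal sketch (modern PHP, 7.1+) of the same two checks: links pointing at the four domains listed above count as spam, and the word "poker" alone only counts as "usually spam". The function names, the keyword list and the three verdict strings are hypothetical illustrations, not the spam module's API.

```php
<?php
// Hypothetical sketch: classify a comment by the sites it links to and by
// custom keywords. The domain list is the four URLs reported in this thread;
// the keyword list and function names are made up for illustration.

const SPAM_DOMAINS = [
    '1-poker-games.biz',
    'free-casino-games-000.com',
    'poker-rooms-777.com',
    'texas-holdem-0.com',
];

const USUALLY_SPAM_WORDS = ['poker'];

// Return the lowercase host names of all http(s) links in $text,
// without trailing dots or a leading "www.".
function extract_link_hosts(string $text): array
{
    preg_match_all('~https?://([a-z0-9.-]+)~i', $text, $matches);
    $hosts = [];
    foreach ($matches[1] as $host) {
        $host = rtrim(strtolower($host), '.');
        if (strncmp($host, 'www.', 4) === 0) {
            $host = substr($host, 4);
        }
        $hosts[] = $host;
    }
    return $hosts;
}

// Returns 'spam' (links to a known spamvertised site), 'usually spam'
// (contains a custom keyword) or 'ok'.
function classify_comment(string $comment): string
{
    foreach (extract_link_hosts($comment) as $host) {
        if (in_array($host, SPAM_DOMAINS, true)) {
            return 'spam';
        }
    }
    foreach (USUALLY_SPAM_WORDS as $word) {
        if (stripos($comment, $word) !== false) {
            return 'usually spam';
        }
    }
    return 'ok';
}
```

Matching on the host name rather than the full URL means the spammer cannot dodge the list just by changing the path after the domain.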
Blocking that spammer
I took the very extreme measure of blocking access to their exact user agent:
SetEnvIfNoCase User-Agent "\(compatible; MSIE 5.5; Windows 98; Win 9x 4.90\)" denyThis
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=denyThis
</Limit>
Also, closing comments on all older nodes will prevent them from hitting. Use a SQL statement such as 'update node set comment=1 where nid
Since I've done those two things I haven't seen any more spams from them.
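If your host does not allow .htaccess overrides, roughly the same user-agent match can be done in PHP. This is a minimal sketch assuming PHP 7+; the function name and where you call it are hypothetical. One deliberate difference from the Apache rule: it only refuses POST requests, so anyone still browsing with that old browser can read the site but not post.

```php
<?php
// Hypothetical PHP version of the .htaccess user-agent rule, restricted to
// POST requests so that readers with this old browser can still browse.

const SPAMBOT_UA_FRAGMENT = '(compatible; MSIE 5.5; Windows 98; Win 9x 4.90)';

function is_comment_spambot(array $server): bool
{
    $agent = $server['HTTP_USER_AGENT'] ?? '';
    $method = $server['REQUEST_METHOD'] ?? 'GET';
    // Case-insensitive substring match, like SetEnvIfNoCase.
    return $method === 'POST' && stripos($agent, SPAMBOT_UA_FRAGMENT) !== false;
}

// Possible use near the top of index.php (an assumption, not Drupal API):
// if (is_comment_spambot($_SERVER)) {
//     header('HTTP/1.0 403 Forbidden');
//     exit;
// }
```

Like the Apache rule, this only stops this particular script; changing the user agent string is trivial for the spammer.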
--
Mike Cohen, http://www.mcdevzone.com/
Blacklisting
I started getting these comment spams, too. One other type of irritating hit to my web site is bad crawlers. I get a ton of repeated hits to /title/node every few days, plus other bad crawls. So I was thinking of a solution where I can mark IPs and comments as spam from within Drupal. Drupal could then send that info to a central place (just as our login info can be shared). If a few people mark the same IPs and comments as spam, they're blacklisted. Then every site could check the shared blacklist and reject hits from those on the list.
One big problem is bandwidth. We don't want to search a central blacklist for every hit or post. So maybe part of the cron job would be to download the blacklist from the central repository, then each site could check it itself.
I'm not sure how good a solution this could be, but I could start creating the module. Anyone have any thoughts? Maybe I'll formalize the idea some more and post it as a separate forum topic.
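To make the idea a bit more concrete, here is a minimal sketch (PHP 7.1+) of its two halves. Everything in it is hypothetical: there is no such central service or module yet, and the function names, the threshold of 3 distinct sites and the one-IP-per-line file format are assumptions. The central side only blacklists an IP once several different sites have reported it; the site side reads the list that cron downloaded, so each hit is a local lookup and costs no bandwidth.

```php
<?php
// Hypothetical sketch of a shared IP blacklist; no such service exists.

// Central side: given reports as [site, ip] pairs, return the IPs that at
// least $threshold distinct sites have marked as spam.
function build_blacklist(array $reports, int $threshold = 3): array
{
    $sitesPerIp = [];
    foreach ($reports as [$site, $ip]) {
        $sitesPerIp[$ip][$site] = true; // repeated reports from one site count once
    }
    $blacklist = [];
    foreach ($sitesPerIp as $ip => $sites) {
        if (count($sites) >= $threshold) {
            $blacklist[] = $ip;
        }
    }
    sort($blacklist);
    return $blacklist;
}

// Site side: cron saves the downloaded list to a local file, one IP per
// line. Load it as a hash set (IP => index) for fast lookups.
function load_blacklist(string $path): array
{
    $lines = @file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return $lines === false ? [] : array_flip(array_map('trim', $lines));
}

function is_blacklisted(string $ip, array $set): bool
{
    return isset($set[$ip]);
}
```

Requiring several independent sites to agree is what keeps a single site owner from blacklisting an IP on their own.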
---
Don't use Microsoft software
the more techniques the better
I think blacklisting is a useful tool, but only up to a point. I've been thinking about comment-spamming lately, and here are some other ideas:
1. Add a field to comments that contains a "spamminess" score.
2. Accept comments from anonymous users, but mark every comment that is not submitted by a logged-in user with a "might be spam" score. Periodically look back over recently submitted comments and if many comments contain the same URL, raise their spamminess score. If many comments come from the same IP number, likewise.
3. Include a lot of honeypot "submit" buttons on the form. Hide all of them using CSS. If a script "clicks" on any of those buttons, you know that it's spam.
4. Look at recent activity on a node. If there have been no comments in a long time on a given node, new comments are more likely to be spam.
5. Once a comment has reached a spam-threshold, it would be deleted or moved into a penalty box, contents added to a Bayesian filter, other information reported back to a central spam repository, etc.
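Ideas 1 to 5 can be combined into a single score. Here is a minimal sketch, assuming PHP 7+. The weights, the 90-day "quiet node" cutoff, the threshold of 5 and the array keys are all made-up assumptions for illustration, not what the spam module does. Idea 3 is treated as conclusive: if a hidden honeypot button was submitted, the comment is spam regardless of anything else.

```php
<?php
// Hypothetical spamminess score combining ideas 1-5 above. All weights,
// cutoffs and field names are assumptions for illustration.

const SPAM_THRESHOLD = 5;

// $comment: ['anonymous' => bool, 'url' => ?string, 'ip' => string,
//            'honeypot_clicked' => bool, 'days_since_last_comment' => int]
// $recent:  recently submitted comments, each with 'url' and 'ip' keys.
function spamminess(array $comment, array $recent): int
{
    // Idea 3: a CSS-hidden submit button was "clicked", so it is a script.
    if ($comment['honeypot_clicked']) {
        return PHP_INT_MAX;
    }
    $score = 0;
    // Idea 2: anything not posted by a logged-in user "might be spam".
    if ($comment['anonymous']) {
        $score += 1;
    }
    // Idea 2: many recent comments with the same URL or from the same IP.
    $sameUrl = 0;
    $sameIp = 0;
    foreach ($recent as $other) {
        if ($comment['url'] !== null && $other['url'] === $comment['url']) {
            $sameUrl++;
        }
        if ($other['ip'] === $comment['ip']) {
            $sameIp++;
        }
    }
    $score += min($sameUrl, 5) + min($sameIp, 3);
    // Idea 4: a comment on a node that has been quiet for a long time.
    if ($comment['days_since_last_comment'] > 90) {
        $score += 1;
    }
    return $score;
}

// Idea 5: past the threshold, the comment goes to the penalty box.
function is_spam(array $comment, array $recent): bool
{
    return spamminess($comment, $recent) >= SPAM_THRESHOLD;
}
```

Capping the repeat counts keeps one noisy signal from dominating, while several weak signals together still cross the threshold.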
Adam Rice
Mostly there
Unless I misunderstand you, 1, 2 and 5 already are part of the spam module for Drupal. 3 looks very interesting. 4 would not be that helpful for me, as I do get people digging up old posts and commenting on them ... the time has not seemed to be much of an indicator of spam.
The last part of 5 would be great -- I believe the module already checks something, but whether that's a dynamically updated database, I don't know. I could see that becoming a problem, however. Irresponsible use in, for example, a flame war on some cretin's site could lead to legitimate domains, usernames, etc. getting tagged as spam, with implications for hundreds or thousands of sites. So the Bayesian analysis would probably be required on the database itself, as well.
Anyway, the module has saved me so far. I am much appreciative!
--
mediagirl.org
Use Encoded Image in Submit Form
As an alternative to filtering spam on submit, what about a way to verify that a post is coming from a human? A lot of websites use a JPG/GIF image showing a code; only when you type that code into the submit form can you successfully submit. Does anyone think this might help?
Coolcrap
Tech toys, gadgets and gizmo news from around the net.
[...]Alot of website use a
Yes, that helps. Those are so-called captchas. The captcha module does that kind of thing.
edit: As I already mentioned in another thread, you can reduce the server's work by changing the comment links to something else and redirecting requests for the usual comment URLs to an empty page (avoiding the need to generate a 404). That will force the spammer either to modify the script (they fear work, haha) or to switch to smarter (crawling) bots, which can be knocked out by captchas.
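The "empty page for the old comment URLs" part could look roughly like this minimal sketch (PHP 7+). It assumes the real comment form has already been moved to a different path, and that you call the check at the very top of index.php before Drupal bootstraps; both are assumptions, and the function name and pattern are hypothetical. It matches both the clean URL form seen in the spam (/myblog/comment/reply/32?) and the ?q=comment/reply/32 form.

```php
<?php
// Hypothetical early exit for POSTs to the old comment/reply/NID path, so
// the bot gets an empty page without a full Drupal bootstrap or a 404.

// Matches ".../comment/reply/32" as well as "?q=comment/reply/32".
const OLD_REPLY_PATTERN = '~(?:^|[/=])comment/reply/\d+~';

function should_short_circuit(string $method, string $requestUri): bool
{
    return $method === 'POST' && preg_match(OLD_REPLY_PATTERN, $requestUri) === 1;
}

// Possible use at the top of index.php (assumption):
// if (should_short_circuit($_SERVER['REQUEST_METHOD'], $_SERVER['REQUEST_URI'])) {
//     header('Content-Length: 0');
//     exit;
// }
```

GET requests are left alone so that any old links to the reply page still reach Drupal normally.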
Ah, that's what they're called
Ah, my ignorance is showing again. Captchas, what a clever name.
Coolcrap
Tech toys, gadgets and gizmo news from around the net.
maybe a simple way
Captchas have their own problems (usability for blind users?).
I don't know all of the modules yet (I'm making a summary right now), but maybe it's possible to alter the comment function to ask the user a simple question set by the admin (and changed from time to time), or one picked from a question list, e.g. "please type twenty-four in digits" or similar.
I've seen that on a WordPress blog and I like it: anonymous users can still post, and you even get to test the user's intelligence.
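A minimal sketch of the question-list idea (PHP 7+). The questions, the function names and the advice to keep the question index in the session rather than in a hidden form field are assumptions for illustration, not an existing module. Unlike an image captcha, the question is plain text, so a screen reader can read it out.

```php
<?php
// Hypothetical question-list check. The admin maintains QUESTIONS; store
// the picked index server-side (e.g. in the session), not in a hidden form
// field, so a bot cannot swap in a question it already knows the answer to.

const QUESTIONS = [
    ['Please type twenty-four in digits', '24'],
    ['What colour is the sky on a clear day?', 'blue'],
];

// Pick a random question index to show with the comment form.
function pick_question(): int
{
    return random_int(0, count(QUESTIONS) - 1);
}

// Compare the submitted answer, ignoring case and surrounding whitespace.
function check_answer(int $index, string $answer): bool
{
    if (!array_key_exists($index, QUESTIONS)) {
        return false;
    }
    return strcasecmp(trim($answer), QUESTIONS[$index][1]) === 0;
}
```

Changing the questions from time to time matters: a bot written for one site can store the answers, so a fixed question only stops generic scripts.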
Novel
That's a novel approach and I guess it would still meet usability requirements.
Coolcrap
Tech toys, gadgets and gizmo news from around the net.