I've been trying to figure out how to cut down on spam on Drupal.org. I'm working on an automated solution, but in the meantime, I've come up with some queries that give us a good number of spammers to ban.

(Note: all queries run against a dev site that's a few months old, so numbers may not be accurate)

For instance,

select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%viagra%" and u.status=1;
gives us 51 spammers.

select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%ugg%" and u.status=1;
gives us a whopping 852 spammers.

select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%porn%" and u.status=1;
...345 spammers.

select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%sex%" and u.status=1;
...874 spammers.

select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%cialis%" and u.status=1;
...961 spammers.

select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%levitra%" and u.status=1;
...14 spammers (evidently, this is not so popular anymore)

select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%phentermine%" and u.status=1;
...26 spammers (nor is this one)

select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%prescription%" and u.status=1;
...61 spammers.

select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%pharmacy%" and u.status=1;
...73 spammers.

select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%slots%" and u.status=1;
...45 spammers.

select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%bingo%" and u.status=1;
...56 spammers.

select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%poker%" and u.status=1;
...186 spammers.

select distinct pv.uid from profile_values pv join users u on u.uid = pv.uid where pv.value like "%casino%" and u.status=1;
...193 spammers.

select uid,comment from comments where comment like "%dofollow%";
This only gives 2, but I've seen the dofollow "trick" a lot on Drupal.org - in fact, the automated solution that I'm working on will look for that, as it's a good indication that the commenter is a spammer.

Altogether: 3,739 spammers (though they may not all be unique).

Have fun :)

Comments

mikey_p’s picture

Status: Active » Needs work

These queries don't filter on specific field in profile_values and a few of these could conceivably overlap in other fields such as industries worked, companies, etc. Some of these could easily be legitimate interests of users that aren't spammers. (Just search d.o for poker, there's even a project application for a drupal poker module.) Because of that it'd probably be best to also query on date of last login, number of posts/comments, etc or manually review the results from these queries.

For example check for users that have never logged in, with u.login = 0 or users that have not accepted git consent (very unlikely that anyone that has is a spammer) with u.git_consent = 1.

cweagans’s picture

New query:

SELECT DISTINCT
  pv.uid from profile_values pv
JOIN
  users u on u.uid = pv.uid
WHERE
  (
    pv.value like "%viagra%" OR
    pv.value like "%ugg%" OR
    pv.value like "%porn%" OR
    pv.value like "%sex%" OR
    pv.value like "%cialis%" OR
    pv.value like "%levitra%" OR
    pv.value like "%phentermine%" OR
    pv.value like "%prescription%" OR
    pv.value like "%pharmacy%" OR
    pv.value like "%slots%" OR
    pv.value like "%bingo%" OR
    pv.value like "%poker%" OR
    pv.value like "%casino%"
  )
  AND
  (
    pv.fid = 27 OR
    pv.fid = 18 OR
    pv.fid = 76 OR
    pv.fid = 74 OR
    pv.fid = 26 OR
    pv.fid = 25
  ) AND
  u.status=1 AND
  u.git_consent=0
ORDER BY
  pv.uid ASC;

For your reference, here's the fields I'm looking at (basically anywhere on the user profile settings where there's a textarea):

SELECT
  fid, title
FROM
  profile_fields pf
WHERE
(
  pf.fid = 27 OR
  pf.fid = 18 OR
  pf.fid = 76 OR
  pf.fid = 74 OR
  pf.fid = 26 OR
  pf.fid = 25
);

+-----+------------------------------+
| fid | title                        |
+-----+------------------------------+
|  18 | Drupal contributions         | 
|  25 | Companies worked for         | 
|  26 | Industries worked in         | 
|  27 | Interests                    | 
|  74 | Bio                          | 
|  76 | Roles in working with Drupal | 
+-----+------------------------------+

Returns 1749 rows on the spam dev site.

killes@www.drop.org’s picture

Status: Needs work » Fixed

we blocked a couple 10k of these.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.