Senpai just brought this up in irc - view the source of http://drupal.org/profile, and notice that none of the homepage links have rel="nofollow" - this means that all those links are generating link juice and page rank for the accounts, and may well explain why we have so many profile spammers.

IMO, it'd be good to fix this in core, but we probably need a quick fix for this on d.o itself. (How about jQuery to add the rel="nofollow" to all links from various paths or something? Edit: this was a stupid idea, crawlers don't have JS.)

Comments

mfb’s picture

aww, that would have been pretty nifty if googlebot now does full js execution on every pageview.

dman’s picture

OMG I just noticed this now also.

(Doing my fortnightly trawl for sock puppets that like sex and mp3s)
No Wonder it's such a honeypot.
Sheesh.
Please bump this up in priority. We know it's easy.

The 'interests' section of d.o. is an embarrassment. The whole thing should be dropped via robots.txt.
While we are at that, can someone run a SQL that just deletes all users that never made a post and are older than 6 months? Or something? SRSLY.
Playing whack-a-mole with the spammers is all very well, but just wrong.
And run a scan for anyone whose login name is identical with their website and never posted. I don't trust them.

Going further off-topic - surely there's no harm in actually DELETING these bad users that never made a post? I get the feeling it would free up the username space and improve the user experience for next year's lot of new members.

Sorry to clutter the issue - just thoughts. There's gotta be MILLIONS of dummy accounts on d.o.
.dan.

gerhard killesreiter’s picture

I am not sure about this one. Many legitimate pages could use that link juice for legitimate reasons. I have actually implemented a blacklist that will allow a site maintainer to block inappropriate homepage urls.

dman’s picture

Status: Reviewed & tested by the community » Active

legitimate uses are legitimate, yep.
But if nofollow is appropriate on forum posts - where links may also be legitimate - then the same logic should apply.
... For all the difference it will make. The spammers don't know or care and will continue anyway I guess.

And the clean up you did last month has certainly helped heaps. That's much cooler. I found it hard coming up with nasty keyphrases to get hits on.
But Google still has crap in it ... and will continue to forever unless the freetext make-up-your-own-URL behaviour is blocked or bounced to a 'no results' page. You see the URLs that once existed are now empty, but still valid google pages.
I think blocking is easier, but bounce is a reasonable solution too.

moshe weitzman’s picture

Status: Active » Reviewed & tested by the community

The point here is to inform other users about what you do, not to use drupal.org link juice. I agree with nofollow.

gerhard killesreiter’s picture

There's no patch.

I've patched profile.module to return a 404 for interest and industry pages that don't have an entry. This way Google should drop them after a while. They are already excluded by robots.txt. Not sure if I did that correctly.
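The 404 behaviour described here can be sketched roughly as follows. This is illustrative Python, not the actual profile.module patch; `listing_response` and `profiles_by_term` are hypothetical names, and the lookup table stands in for whatever query the real listing page runs.

```python
from typing import Dict, List, Tuple


def listing_response(term: str,
                     profiles_by_term: Dict[str, List[str]]) -> Tuple[int, List[str]]:
    """Return (HTTP status, user names) for a listing page such as an interest page."""
    users = profiles_by_term.get(term, [])
    if not users:
        # No profile uses this term: answer 404 instead of an empty 200 page,
        # so made-up URLs do not stay valid pages for crawlers.
        return 404, []
    return 200, users
```

The point of the design is that a free-text URL nobody actually uses no longer produces a valid (if empty) page.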

michelle’s picture

Status: Active » Reviewed & tested by the community

It looks like I'm in the minority so not sure there's any point in commenting but at least I've got my vote in. I spend a lot of time helping out d.o and I'm not alone in that. Having the homepage on the profile page give link juice was a nice perk and gave a little back to me and others who spend their time making this place better. The page rank on that page comes mostly from d.o but also has to do with how much it's linked to. Not everyone has the same PR on their page. So I feel like I earned a bit of that juice that I'm using.

If it's abused, I'd rather see ways of cracking down on the abusers rather than taking it away from those of us who use it legitimately.

Michelle

michelle’s picture

Status: Reviewed & tested by the community » Active

Cross posted status change.

mfb’s picture

If there's a need to add rel="nofollow" it would be nice to make it role-based, so new users get it but not someone who has been contributing for years.
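A minimal sketch of what a role-based rule could look like, written in Python rather than Drupal's PHP. The role name in `TRUSTED_ROLES` and the function `homepage_link` are assumptions made for illustration; nothing here is an existing Drupal API.

```python
from html import escape
from typing import Iterable

# Hypothetical set of roles whose homepage links are rendered without nofollow.
TRUSTED_ROLES = frozenset({"contributor"})


def homepage_link(url: str, roles: Iterable[str]) -> str:
    """Render a profile homepage link, adding rel="nofollow" for untrusted users."""
    trusted = any(role in TRUSTED_ROLES for role in roles)
    rel = "" if trusted else ' rel="nofollow"'
    # Escape the URL both as an attribute value and as the visible link text.
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(url)}</a>'
```

New accounts would get the nofollow variant by default and lose it only once they are granted a trusted role.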

JohnForsythe’s picture

I agree with Michelle.

It's been shown nofollowing links does not decrease spam. Spam in the forums hasn't decreased as a result, and it's unlikely profile spam would either. This change would only hurt legitimate users.

gerhard killesreiter’s picture

Apparently, I forgot to block profile/companies. This is done now and removal from the index has been requested.

dman’s picture

Status: Needs review » Reviewed & tested by the community

Michelle - what juice there was, was (until last month at least) being sucked up by absolute thousands of bastards selling pills and weight loss and ringtones, viagra and sex.
Huge amounts of them are now taken care of, thanks to Gerhard and some magic queries.
So this changes things somewhat - but (I believe) there are still thousands out there, some with no more than their URL.

If Gerhard is confident that the processes in place will pro-actively prevent this building up again ... and we can somehow do a bit of weeding, then it looks like the nofollow could be forgotten about, and the section would retain its value.

But currently, your juice is still coming from a pool that many others have pissed in.

Agreed on the futility of hoping that nofollow will stop spam - it just won't reward them for doing it. :-/

dman’s picture

Status: Active » Needs review

Sorry, x-post on the issue, sorta.

Gerhard? Have you tried a query to find duplicate URLs in different profiles?

kbahey’s picture

Status: Reviewed & tested by the community » Needs review

I am proposing a cron query on a nightly basis searching for certain keywords, and emailing this to the infra and/or the webmasters list.
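The scanning half of such a nightly job might look like the sketch below (Python, for illustration only). The keyword list is an assumption drawn from the kinds of spam mentioned earlier in this issue, `flag_profiles` is a hypothetical name, and the emailing step is left out.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical keyword list; a real one would be maintained by the webmasters.
KEYWORDS = ("viagra", "ringtones", "weight loss", "pills")


def flag_profiles(profiles: Dict[int, str],
                  keywords: Iterable[str] = KEYWORDS) -> List[int]:
    """Return the sorted uids whose profile text contains any keyword."""
    words = [k for k in keywords if k]
    if not words:
        # An empty pattern would match every profile, so bail out early.
        return []
    pattern = re.compile("|".join(re.escape(k) for k in words), re.IGNORECASE)
    return sorted(uid for uid, text in profiles.items() if pattern.search(text))
```

Plain substring matching like this will produce false positives, so the output is better treated as a review list than as a block list.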

This can be part of the http://drupal.org/project/drupalorg collection of modules.

dman’s picture

You mean like to see if any more people have signed in overnight expressing an interest in money? I'm getting a 50% hit rate from them... Or maybe they are just behaving like real estate agents, it's a bit hard to tell sometimes there...
:-B

gerhard killesreiter’s picture

The query I use is:

select concat('http://drupal.org/user/', u.uid, '/edit'), p.value
from users u
inner join profile_values p on u.uid = p.uid
where p.fid = 13 and p.value like 'http://%' and u.status = 1
order by u.uid desc
limit 10;

This will give me the latest 10 users which have chosen a homepage. If there are duplicates then I will investigate. We could make the result of this query publicly visible somewhere.
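To surface the duplicates directly instead of eyeballing the latest ten rows, a grouped variant of the query above could be used. The sketch below runs it against an in-memory SQLite database from Python; the two-table schema is an assumption inferred from the query, and `make_demo_db`, `duplicate_homepages` and the sample rows are made up for illustration.

```python
import sqlite3

HOMEPAGE_FID = 13  # field id of the homepage field, taken from the query above


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (uid INTEGER PRIMARY KEY, status INTEGER);
        CREATE TABLE profile_values (fid INTEGER, uid INTEGER, value TEXT);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 1), (4, 0)])
    conn.executemany("INSERT INTO profile_values VALUES (?, ?, ?)", [
        (13, 1, "http://spam.example/"),
        (13, 2, "http://spam.example/"),
        (13, 3, "http://alice.example/"),
        (13, 4, "http://spam.example/"),  # uid 4 is already blocked (status 0)
    ])
    return conn


def duplicate_homepages(conn: sqlite3.Connection, min_count: int = 2):
    """Return (url, count) rows for homepages shared by min_count or more active users."""
    sql = """
        SELECT p.value, COUNT(*) AS n
        FROM profile_values p
        INNER JOIN users u ON u.uid = p.uid
        WHERE p.fid = ? AND p.value LIKE 'http://%' AND u.status = 1
        GROUP BY p.value
        HAVING COUNT(*) >= ?
        ORDER BY n DESC, p.value
    """
    return conn.execute(sql, (HOMEPAGE_FID, min_count)).fetchall()
```

Raising `min_count` narrows the result to the most heavily shared URLs. A shared URL is only a hint, since colleagues may legitimately link to the same site.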

dman’s picture

Yep, that'll be helping. Good work on the quick 404 bounce fix, BTW ;-)

I keep on finding new anomalies ...
Like users who have attended every Drupal event ever and signed up a few months ago.

All I have to do is imagine that I was really dumb, but thought I was being clever ... :-}
idjits.

gerhard killesreiter’s picture

Google was quick and removed profile/companies already.

gerhard killesreiter’s picture

Note: Links on the aggregator don't have rel=nofollow either.

catch’s picture

I'm less concerned about the aggregator: if people spam Planet we can remove them from it and publicly humiliate them. User profiles have many tens of thousands fewer eyes on them.

HS’s picture

This is a huge issue these days. Over 300 profiles created on some sites I know of in just a few days to get backlinks.

Having 'no follow' actually works, but it will take a long time before the word gets out that spamming on Drupal sites is useless. The individuals spamming profiles enter their captcha correctly, so it's not a spam bot but an actual person; 'no follow' has to be implemented at some point.

I bet there is already a list floating out there with 1000s of Drupal based sites that people can submit their links to.

webchick’s picture

Priority: Normal » Critical

This is happening again: http://drupal.org/node/1478056

This is such a huge problem that cweagans pointed out that "SEO experts" are actually recommending that spammers post profiles on Drupal.org for link juice.

I'm escalating this to critical.

killes@www.drop.org’s picture

The profile spammers on the referenced issue were on g.d.o, not d.o. I couldn't find a comment by cweagans regarding a statement about SEO spammers.

I think a fix for d.o would be to allow the "dofollow" homepage URL for contributors, i.e. people with the Git vetted user or documentation team role.

greggles’s picture

Many of the profile spammers were on d.o, not g.d.o.

Adding rel=nofollow to d.o hasn't stemmed the tide here so I'm reluctant to add it back for g.d.o. I have been thinking about the number of spammers on g.d.o and that we could look at blocked accounts on d.o compared to blocked accounts on g.d.o per month and see whether there is a trend in either direction. Certainly if g.d.o is contributing more to d.o being a vehicle for spam I'm open to changes.

HS’s picture

OK - as it turns out, "nofollow" is useless. Yahoo, Bing and other search engines, with the exception of Google, do not respect "nofollow". So, all these backlinks do count.

I've removed all URL fields in profile and the 'about me' field.

killes@www.drop.org’s picture

This is unfortunately still an issue: we have a huge number of profile spammers who only link back to their "homepage".

Today, I got a 2nd email from google about spam being on drupal.org and that they had taken a "manual spam action".

Google doesn't say where it sees spam, though. The only place that could IMO matter to google is actually the homepage field on the profiles.

I've done a count and there are 96 distinct URLs in the homepage field that have more than 10 references totalling almost 2000 users.
Some of these urls are legit (if you discount that we require this to be a personal profile, ie linking to your employer is a bit shady).

The real problem is that there are probably many more URLs which are referenced fewer than 10 times or only once.

Now I need a drush command that blocks users based on values of their profile fields...

killes@www.drop.org’s picture

I've created such a command and I've started to block some of the profiles.

However, this isn't really a solution as we are unlikely to notice the vast majority of such spammers.

So, we should find a way to add nofollow to the user pages for users who aren't contributors yet.

I've looked into doing this on D6, but I don't think this would work without a core hack. Does anybody know if this can be done in D7?

killes@www.drop.org’s picture

#1772152: "Please evaluate profile spam view" was created to help with this issue.

I've been banning quite a few users using my drush script (now allowing for LIKE queries...), but this is not something a single person can clean up.

killes@www.drop.org’s picture

I've blocked nearly 20k accounts over the past two days.

However, these were the more obvious ones where somebody spam-bombed a site and then used d.o accounts to direct link-juice to these spam posts.

Under the assumption that no real user would have the spammed site as his website, I was able to block all these accounts using LIKE queries:

drush profile-block homepage --match 'http://%hydra-aqua.com%'
You are about to block 177 users. Proceed? (y/n): y
(time passes)
Successfully blocked 177 users having set profile field homepage to http://%hydra-aqua.com%

That was one of the cases where the linkjuice was directly sent to the spamvertized site.
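For readers unfamiliar with LIKE patterns, the sketch below shows what a pattern such as the one in the drush call would and would not match. It assumes the --match value ends up in a SQL LIKE comparison, which the mention of LIKE queries suggests but the thread does not show, and it assumes a case-insensitive collation. `like_match` is a hypothetical Python helper, not part of drush.

```python
import re


def like_match(pattern: str, value: str) -> bool:
    """Approximate SQL LIKE: % matches any run of characters, _ exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Everything else, including '.' and '-', is matched literally.
            parts.append(re.escape(ch))
    # LIKE must match the whole value; IGNORECASE mimics a case-insensitive collation.
    return re.fullmatch("".join(parts), value, re.IGNORECASE | re.DOTALL) is not None
```

With the pattern 'http://%hydra-aqua.com%', a value such as http://www.hydra-aqua.com/shop matches, while an https:// variant of the same URL does not, because the pattern starts with the literal http:// prefix.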

Obviously, this is less easy for sites such as wordpress.com or blogspot.com where people might have their real sites.
However, spammers will often link to a particular blog entry rather than the blog in general, so these are gone too.
There were also nearly 1k linked YouTube videos (not channels).

A lot of cleanup still has to be done.

I've generally had a cursory look over the uids and I am quite sure I didn't block anybody important. That's no guarantee, though...

Useful query to get the domains where the homepage url directs to:

select SUBSTRING_INDEX(SUBSTRING_INDEX(p.value, '/', 3), '.', -2) as svalue, count(*) as count
from profile_values p
inner join users u on u.uid = p.uid
where p.fid = 13 and length(p.value) > 5 and u.status = 1
group by svalue
order by count desc
limit 50;

check out the actual values:

select p.value, count(*) as count
from profile_values p
inner join users u on u.uid = p.uid
where p.fid = 13 and length(p.value) > 5 and u.status = 1 and p.value like '%foo.com%'
group by p.value desc;
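The nested SUBSTRING_INDEX expression in the first query is terse, so here is a Python sketch of what it computes. `substring_index` follows MySQL's documented behaviour for positive and negative counts; `homepage_domain` and the sample URLs are made up for illustration.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Mimic MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    parts = text.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    if count < 0:
        # Everything to the right of the count-th delimiter from the end.
        return delim.join(parts[count:])
    return ""


def homepage_domain(url: str) -> str:
    """Same grouping key as the query: cut after the host, keep the last two dot-parts."""
    host_part = substring_index(url, "/", 3)   # e.g. 'http://www.foo.com'
    return substring_index(host_part, ".", -2)
```

For http://www.foo.com/some/page this gives foo.com. Two quirks are worth knowing when reading the grouped output: a URL without a subdomain, such as http://foo.com/, keeps its scheme (the key is http://foo.com), and hosts under two-part suffixes collapse to the suffix (http://blog.example.co.uk/x yields co.uk).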

killes@www.drop.org’s picture

greggles pointed out to me that the users will not have gotten blocked by bakery. I was assuming they'd get blocked as well. :(

killes@www.drop.org’s picture

I've blocked roughly 3k accounts on g.d.o too and created #1774088: "Block spammy user profiles" to explain the process. sreynen has taken this over for g.d.o.

killes@www.drop.org’s picture

Blocked an additional 750 accounts who used rel="follow" in their bio.

One contributor was caught by this; I re-enabled his account and sent a message asking him not to do that.

eliza411’s picture

Status: Needs review » Closed (fixed)

Closing old issues. Please re-open if needed.

Project: Drupal.org infrastructure » Drupal.org customizations
Component: Drupal.org module » Miscellaneous