I am getting bombarded with email notifications for flags. I have two flags set up for each node with the JavaScript toggle option, and anonymous users have permission to flag. It appears that a web crawler / spider is indexing the site and following the flag link on every single node, evidently ignoring the "nofollow" attribute. I just added /flag/ to robots.txt, but I don't know if that will help, since a crawler that ignores "nofollow" may well ignore robots.txt too.

What's worse, every single one of the non-global flags that got hit by the web crawler now appears as "reset" (the setting for unflagging) for authenticated users. If I click reset, it gives me the message as if I had clicked the flag originally, but it doesn't count the new click in the database and reset is still showing. It does seem to work as expected for anonymous users (i.e. unclicked and click is properly recorded). The only way to clear the flag status for authenticated users when this happens is to manually delete the flag entries for that node from the database.

Is there a better way to present the flags to anonymous users so that web crawlers are completely unable to click the links? Perhaps something based on JavaScript that can't be clicked by non-human visitors?

Comment | File | Size | Author
#15 | flag_prevent_spiders.patch | 4.04 KB | quicksketch

Comments

Coupon Code Swap’s picture

Okay. Some more info on this. I checked the access logs, and the offending IP is 192.84.19.225.

When I checked the whois for that IP it traces to 8x8 - http://www.8x8.com/

I don't know why a VOIP internet service would be crawling the website. In fact, I had service with them years ago.

So, it seems there are spiders out there that will crawl websites without honoring the "nofollow" attribute, and this is a serious issue if you have actions set up to send notifications when nodes are flagged.

quicksketch’s picture

Category: bug » support
Priority: Critical » Normal

I don't think we'll fundamentally change the way flags work for anonymous users. I'd suggest setting up a different flag for your anonymous users that behaves differently (such as requiring higher thresholds).

Coupon Code Swap’s picture

For some flags, it is important to receive an email notification after just 1 click (i.e. a user reports abuse / inappropriate content). Would it be possible to add an option, built on the Session API, that limits the total number of flags a visitor can click per session? If limited to 10 or whatever, that would keep the ornery spiders at bay when they do hit the site.
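The per-session cap suggested here could look roughly like the sketch below. It is only an illustration: `allowFlag`, `flagCounts`, and `MAX_FLAGS_PER_SESSION` are hypothetical names, not part of Flag or Session API, and the counter lives in memory rather than in a real session store.

```javascript
// Hypothetical per-session cap on flag clicks; not part of the Flag module.
const MAX_FLAGS_PER_SESSION = 10; // the "10 or whatever" figure from the comment

// In-memory counter keyed by session ID. A real site would keep this in
// the session or the database so it persists across requests.
const flagCounts = new Map();

/**
 * Records a flag attempt for the given session.
 * Returns true if the flag should be accepted, false once the cap is reached.
 */
function allowFlag(sessionId, limit = MAX_FLAGS_PER_SESSION) {
  const used = flagCounts.get(sessionId) || 0;
  if (used >= limit) {
    return false; // over the cap: ignore the click and send no notification
  }
  flagCounts.set(sessionId, used + 1);
  return true;
}
```

One caveat with this approach: it only helps if the crawler keeps its session cookie. A client that discards cookies would start a fresh session, and a fresh counter, on every request.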

Bilmar’s picture

subscribing

Coupon Code Swap’s picture

How about generating the flag links for anonymous users with this code?

<a rel="nofollow" href="#" onclick='javascript:self.location.href="/flag/flag/..."; return false'>NO SPIDERS ALLOWED</a>

Seems like this would be a very easy thing to implement and I don't think there are any spiders that follow a URL specified with onclick.

moshe weitzman’s picture

Perhaps an option to require JavaScript in order to get a flag link? Most crawlers won't execute JS.

Coupon Code Swap’s picture

Okay. I tried adjusting /flag/theme/flag.tpl.php with the link proposed above. After clicking a flag I get "An HTTP error 200 occurred."

So, it isn't as simple as just replacing the flag links. I think that flag.js could be adjusted to work with the onclick URL rather than the href. I'm looking through the JavaScript for flag.js to see if I can figure this out.

If somebody already knows that this isn't possible, please stop me before I waste a bunch of time.

fumbling’s picture

Subscribing

crea’s picture

Subs

jrust’s picture

Subscribing

sam.clayton@gmail.com’s picture

undoIT's idea about rate-limiting anonymous users' ability to flag sounds smart, or something in that vein.

Especially since this seems to be an isolated case originating from a single IP address of questionable value as an indexer, I would override and modify flag.tpl.php in your theme. Wrap the existing code in a conditional that causes the flag link not to be included on the page when $_SERVER['REMOTE_ADDR'] is equal to the offending IP address.

This is presuming you don't just want to block the IP address's traffic altogether, which I would be tempted to do in this case.
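The conditional described above boils down to a single membership check. The real override would be PHP inside flag.tpl.php, testing `$_SERVER['REMOTE_ADDR']`; the JavaScript below only illustrates the logic, and `BLOCKED_ADDRESSES` and `shouldShowFlagLink` are hypothetical names.

```javascript
// Addresses whose visitors should not be shown flag links.
// 192.84.19.225 is the IP reported earlier in this thread.
const BLOCKED_ADDRESSES = new Set(['192.84.19.225']);

/**
 * Returns false when the request comes from a blocked address,
 * in which case the template would simply omit the flag link.
 */
function shouldShowFlagLink(remoteAddr) {
  return !BLOCKED_ADDRESSES.has(String(remoteAddr).trim());
}
```

Keeping the addresses in a set makes it easy to add more if other crawlers turn up, though a hard-coded list will not keep pace with a crawler that changes IPs.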

rburgundy’s picture

+1 to prevent spiders from spamming our sites.
I hope someone will be able to provide a patch to be reviewed.
Thanks!

robby.smith’s picture

+1 subscribing. I was wondering if there has been any development in this area? Thank you!

BenK’s picture

Subscribing...

quicksketch’s picture

Category: support » feature
Status: Active » Fixed
Status | File | Size
new | flag_prevent_spiders.patch | 4.04 KB

In getting ready for the Drupal 7 port I wanted to fix this issue before going forward. The solution I've gone with is similar to undoIt's #5 recommendation, but it makes very few changes to the code structure overall and has no impact at all on current theming or registered users.

The links on the page for anonymous users will lead to a 403 page unless JavaScript is enabled. This is done by adding "has_js=1" to the query string for anonymous users. If the page handler that receives the request does not find this query string, access is denied (with a reasonable error message). In effect, anonymous users must have JavaScript enabled in order to use flag links. Flagging through confirmation forms is unchanged.
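As a rough illustration of the check described above (not the code from the patch, which is PHP inside the module), the two halves might look like this. `addHasJs`, `flagAccessStatus`, and the placeholder base URL are assumptions made for the sketch.

```javascript
// Placeholder origin, needed only so relative paths can be parsed.
const BASE = 'http://example.com';

// Client side: runs only in browsers that execute JavaScript, so only
// those visitors end up with the marker on their flag links.
function addHasJs(href) {
  const url = new URL(href, BASE);
  url.searchParams.set('has_js', '1');
  return url.pathname + url.search;
}

// Server side: decide whether a flag request may proceed.
// Anonymous requests without has_js=1 are denied; registered users
// are unaffected.
function flagAccessStatus(requestPath, isAnonymous) {
  const url = new URL(requestPath, BASE);
  if (isAnonymous && url.searchParams.get('has_js') !== '1') {
    return 403; // a crawler following the raw href lands here
  }
  return 200;
}
```

For example, a crawler that requests the plain link gets `flagAccessStatus('/flag/flag/report_scam/720', true) === 403`, while a browser that ran `addHasJs` on the same path first is let through.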

Committed so I can get a pre-Drupal 7 release out before attempting the port.

Coupon Code Swap’s picture

Great news! I've been getting hit a lot with spiders recently and it has been very time-consuming sorting out the legit flags from the spider flags. I have updated to beta 3 and will be testing. Thanks quicksketch :)

Coupon Code Swap’s picture

... so far so good. The patch is working beautifully and the spiders aren't bugging me any more.

tyler-durden’s picture

Hey everyone,
I'm trying to get this working, but no luck. My settings are all correct (I have spent an eternity testing everything, including disabling my cache programs), but I just can't get the link to show for an anonymous user. Here are the versions I am running:

Flag 6.x-2.0-beta3
Flag actions 6.x-2.0-beta3
Flag Note 6.x-2.x-dev
Session API 6.x-1.2

I'm questioning the Session API version, since I have not seen any documentation on which one should be used with this, and everything here is a beta or dev release, which I do not normally use. Please advise...

quicksketch’s picture

This might seem like a strange question to ask, but have you edited the flag and enabled the anonymous role? I'd also suggest removing any custom theming you've done, since the templates changed between 1.x and 2.x.

tyler-durden’s picture

Anonymous is enabled, and I am currently using Contemplate for templates. I had 1.x on my site at one time, but I only played around with it for a few minutes and never used it after that, and eventually I removed it.

quicksketch’s picture

I suggest you uninstall and reinstall the module or check that you don't need to run update.php. Another module may be removing the links or they might not be printed out in your theme.

tyler-durden’s picture

OK, I will uninstall and reinstall. I swear the first time I did it, I saw it show up during anonymous testing.

Here is the code Contemplate spits out:
<span class="flag-wrapper flag-report-scam flag-report-scam-720"> <a href="/flag/flag/report_scam/720?destination=admin%2Fcontent%2Fnode-type%2Fed_classified%2Ftemplate&amp;token=5017dae6fba6018d8dbd4c0c44d25726" title="" class="flag flag-action flag-link-toggle" rel="nofollow">Report a Scam</a><span class="flag-throbber">&nbsp;</span> </span>

It shows just fine when you are logged in, but disappears when logged out.

tyler-durden’s picture

I uninstalled and completely removed these programs. I deleted the files and reuploaded Flag 6.x-2.0-beta3, nothing else.

The Anonymous link does not show, and I have tested multiple themes.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

shopdogg’s picture

+