I am getting bombarded with email notifications for flags. I have two flags set up for each node with the JavaScript toggle option, and anonymous users have permission to flag. It appears that a web crawler / spider is indexing the site and clicking the flags on every single node, obviously ignoring the "nofollow" attribute. I just added /flag/ to robots.txt, but I don't know if this will help if this particular web crawler is not honoring the "nofollow" attribute.
What's worse, every single one of the non-global flags that got hit by the web crawler now appears as "reset" (the setting for unflagging) for authenticated users. If I click reset, it gives me the message as if I had clicked the flag originally, but it doesn't count the new click in the database and reset is still showing. It does seem to work as expected for anonymous users (i.e. unclicked and click is properly recorded). The only way to clear the flag status for authenticated users when this happens is to manually delete the flag entries for that node from the database.
Is there a better way to present the flags to anonymous users so that web crawlers are completely unable to click the links? Perhaps something based on JavaScript that can't be clicked by non-human visitors?
| Comment | File | Size | Author |
|---|---|---|---|
| #15 | flag_prevent_spiders.patch | 4.04 KB | quicksketch |
Comments
Comment #1
Coupon Code Swap commented
Okay. Some more info on this. I checked the access logs and the offending IP is 192.84.19.225
When I checked the whois for that IP it traces to 8x8 - http://www.8x8.com/
I don't know why a VOIP internet service would be crawling the website. In fact, I had service with them years ago.
So, it seems there are spiders out there that will crawl websites without honoring the "nofollow" attribute, and this is a serious issue if you have actions set up to send notifications when nodes are flagged.
Comment #2
quicksketch
I don't think we'll fundamentally change the way flags work for anonymous users. I'd suggest setting up a different flag for your anonymous users that behaves differently (such as requiring higher thresholds).
Comment #3
Coupon Code Swap commented
For some flags, it is important to receive email notification after just 1 click (i.e. a user reports abuse / inappropriate content). Would it be possible to add an option, working with Session API, that limits the total number of flags a visitor can click per session? If limited to 10 or whatever, that would keep the ornery spiders at bay when they do hit the site.
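For illustration, here is a minimal JavaScript sketch of the per-session cap suggested above. It is not code from the Flag module or Session API: `tryFlag`, the `session` object, and `MAX_FLAGS_PER_SESSION` are all hypothetical names, and `session` stands in for whatever per-visitor storage the site provides.

```javascript
// Hypothetical sketch: cap how many distinct nodes one session may flag.
// The limit of 10 is an assumption taken from the suggestion above.
const MAX_FLAGS_PER_SESSION = 10;

function tryFlag(session, nodeId) {
  // Lazily create the set of node IDs this session has already flagged.
  const flagged = session.flagged || (session.flagged = new Set());
  if (flagged.has(nodeId)) {
    return true; // already counted; a repeat click costs nothing extra
  }
  if (flagged.size >= MAX_FLAGS_PER_SESSION) {
    return false; // limit reached: reject the flag, send no notification
  }
  flagged.add(nodeId);
  return true;
}
```

A cap like this only helps if the crawler keeps its session cookie between requests; a client that starts a fresh session on every request would never reach the limit.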
Comment #4
Bilmar commented
subscribing
Comment #5
Coupon Code Swap commented
How about generating the flag links for anonymous users with this code?
<a rel="nofollow" href="#" onclick='javascript:self.location.href="/flag/flag/..."; return false'>NO SPIDERS ALLOWED</a>
Seems like this would be a very easy thing to implement, and I don't think there are any spiders that follow a URL specified with onclick.
Comment #6
moshe weitzman commented
Perhaps an option to require JavaScript in order to get a flag link. Most crawlers won't execute JS.
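One way to picture "require JavaScript to get a flag link" is the sketch below. It assumes the server prints an empty placeholder instead of a link for anonymous users, and a script turns it into a real link. The `flag-placeholder` class, the `data-flag-url` and `data-flag-label` attributes, and `activateFlagLinks` are all made up for this example; none of them come from flag.js.

```javascript
// Hypothetical placeholder markup the server would print for anonymous users:
//   <span class="flag-placeholder" data-flag-url="/flag/flag/..."
//         data-flag-label="Report"></span>
// A client that does not run JavaScript never sees an <a> element to follow.
function activateFlagLinks(doc) {
  let count = 0;
  for (const el of doc.querySelectorAll('.flag-placeholder')) {
    const link = doc.createElement('a');
    link.setAttribute('href', el.getAttribute('data-flag-url'));
    link.setAttribute('rel', 'nofollow');
    link.textContent = el.getAttribute('data-flag-label');
    el.appendChild(link);
    count += 1;
  }
  return count; // number of links created
}
```

In a browser this would be called as `activateFlagLinks(document)` once the page has loaded. A crawler that does execute JavaScript would still see the links, so this only filters out the simpler ones.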
Comment #7
Coupon Code Swap commented
Okay. I tried adjusting /flag/theme/flag.tpl.php with the link proposed above. After clicking a flag I get "An HTTP error 200 occurred."
So, it isn't as simple as just replacing the flag links. I think that flag.js could be adjusted to work with the onclick URL rather than the href. I'm looking through the JavaScript for flag.js to see if I can figure this out.
If somebody already knows that this isn't possible, please stop me before I waste a bunch of time.
Comment #8
fumbling commented
Subscribing
Comment #9
crea commented
Subs
Comment #10
jrust commented
Subscribing
Comment #11
sam.clayton@gmail.com commented
undoIT's idea of rate-limiting anonymous users' ability to flag sounds smart, or something in that vein.
Especially since this seems to be an isolated case originating from a single IP address of questionable value as an indexer, I would override and modify flag.tpl.php in your theme. Wrap the existing code in a conditional that causes the flag link not to be included on the page when $_SERVER['REMOTE_ADDR'] is equal to the offending IP address.
This is presuming you don't just want to block the IP address's traffic altogether, which I would be tempted to do in this case.
Comment #12
rburgundy commented
+1 to prevent spiders from spamming our sites
I hope someone will be able to provide a patch to be reviewed.
Thanks!
Comment #13
robby.smith commented
+1 subscribing - I was wondering if there has been any development in this area? Thank you!
Comment #14
BenK commented
Subscribing...
Comment #15
quicksketch
In getting ready for the Drupal 7 port I wanted to fix this issue before going forward. The solution I've gone with is similar to undoIT's #5 recommendation, but it makes very few changes to the code structure overall and has no impact at all on current theming or registered users.
Essentially, the links on the page for anonymous users will lead to a 403 page unless JavaScript is enabled. This is done by adding "has_js=1" to the query string for anonymous users. If this query string is not found by the page handler that responds, access is denied (with a respectable error message). In effect, anonymous users need JavaScript in order to use flag links. Flagging through confirmation forms is unchanged.
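The has_js=1 check described above could look roughly like the sketch below. This is not the committed patch (see flag_prevent_spiders.patch for that). It assumes a client-side script appends the parameter and the handler checks for it, and the names `addHasJs` and `flagAccessAllowed` are hypothetical.

```javascript
// Client side (assumed): append has_js=1 to a flag link's URL.
// Assumes the href carries no "#fragment", as with the flag links in this thread.
function addHasJs(href) {
  if (/[?&]has_js=1(&|$)/.test(href)) {
    return href; // already marked
  }
  return href + (href.includes('?') ? '&' : '?') + 'has_js=1';
}

// Server side (assumed): decide whether a flag request may proceed.
// A false result corresponds to the 403 "access denied" response.
function flagAccessAllowed(isAnonymous, queryString) {
  if (!isAnonymous) {
    return true; // registered users are unaffected
  }
  return new URLSearchParams(queryString).get('has_js') === '1';
}
```

A spider that follows the plain href never sends has_js=1, so its request is denied before any flag is recorded or any notification goes out.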
Committed so I can get a pre-Drupal 7 release out before attempting the port.
Comment #16
Coupon Code Swap commented
Great news! I've been getting hit a lot by spiders recently and it has been very time-consuming sorting out the legit flags from the spider flags. I have updated to beta 3 and will be testing. Thanks quicksketch :)
Comment #17
Coupon Code Swap commented
... so far so good. The patch is working beautifully and the spiders aren't bugging me any more.
Comment #18
tyler-durden commented
Hey everyone,
I'm trying to get this working, but no such luck. My settings are all correct (I have spent an eternity testing everything, including disabling my cache programs), but I just can't get the link to show for an anonymous user. Here are the versions I am running:
Flag 6.x-2.0-beta3
Flag actions 6.x-2.0-beta3
Flag Note 6.x-2.x-dev
Session API 6.x-1.2
I'm questioning Session API, since I have not seen any documentation on which version should be used with this, and everything is on beta and dev releases, which I do not normally use. Please advise...
Comment #19
quicksketch
This might seem like a strange question to ask, but have you edited the flag and enabled the anonymous role? I'd also suggest removing any custom theming you've done, since the templates changed between 1.x and 2.x.
Comment #20
tyler-durden commented
Anonymous is enabled, and I am currently using Contemplate for templates. I had 1.x on my site at one time, but I only played around with it for a few minutes and never used it after that, and eventually I removed it.
Comment #21
quicksketch
I suggest you uninstall and reinstall the module or check that you don't need to run update.php. Another module may be removing the links or they might not be printed out in your theme.
Comment #22
tyler-durden commented
Ok, I will uninstall and reinstall. I swear the first time I did it, I saw it show up for anonymous testing.
Here is the code Contemplate spits out:
<span class="flag-wrapper flag-report-scam flag-report-scam-720"> <a href="/flag/flag/report_scam/720?destination=admin%2Fcontent%2Fnode-type%2Fed_classified%2Ftemplate&token=5017dae6fba6018d8dbd4c0c44d25726" title="" class="flag flag-action flag-link-toggle" rel="nofollow">Report a Scam</a><span class="flag-throbber"> </span> </span>
It shows just fine when you are logged in, but disappears when logged out.
Comment #23
tyler-durden commented
I uninstalled and completely removed these modules. I deleted the files and reuploaded Flag 6.x-2.0-beta3, nothing else.
The anonymous link still does not show, and I have tested multiple themes.
Comment #25
shopdogg commented
+