I've got 5 pages of log entries between 11/06/2012 06:08 and 11/06/2012 11:21 where Googlebot* is spamming the crap out of an Amazon store. Can't do anything about Google wanting to spam, but it's triggering an Amazon error, which will probably get Amazon pissed enough to disable/kill the Amazon API account.

Included two of the log entries for analysis.

Anyone have a quick patch that can sanitize the request to Amazon so it doesn't kick errors?

Thanks,
Sam

* Seems like a single Google IP, which has been banned. Google has been really crappy this year with its bots being twits.

# drush pmi amazon
Project : amazon
Type : module
Title : Amazon API
Description : Provides integration with the Amazon Ecommerce APIs.
Version : 6.x-1.4
Package : Amazon
Core : 6.x
PHP : 5.2
Status : enabled
Path : sites/all/modules/amazon
Schema version : 6016
Requires : none
Required by : aat_legacy, amazon_examples, amazon_filter, amazon_media, amazon_search, amazon_store, asin,
amazon_store_hooks_listener

amazon 11/06/2012 11:42 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 11:21 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 11:18 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 11:15 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 11:13 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 11:06 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 11:02 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 11:01 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 11:00 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:59 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:55 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:53 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:52 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:49 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:49 Amazon items could not be updated. Guest
warning amazon 11/06/2012 10:49 Error retrieving Amazon item ... Guest
warning amazon 11/06/2012 10:49 Error retrieving Amazon item ... Guest
amazon 11/06/2012 10:47 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:45 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:45 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:44 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:43 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:41 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:39 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:39 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:37 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:35 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:35 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:32 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:30 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:28 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:26 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:24 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:23 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:22 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:18 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:16 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:15 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:15 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:14 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:12 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:12 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:10 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:09 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:04 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:04 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:03 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:02 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:01 There was an error accessing amazon. Message=Amazon ... Guest
amazon 11/06/2012 10:00 There was an error accessing amazon. Message=Amazon ... Guest

Type amazon
Date Tuesday, November 6, 2012 - 11:42
User Guest
Location /amazon_store?page=10&SearchIndex=HealthPersonalCare&Brand=.&BrowseNode=3763261&MinPercentageOff=25
Referrer
Message There was an error accessing amazon. Message=Amazon error returned. Code=AWS.ParameterOutOfRange}, Message=The value you specified for ItemPage is invalid. Valid values must be between 1 and 10. //, results=SimpleXMLElement Object ( [OperationRequest] => SimpleXMLElement Object ( [HTTPHeaders] => SimpleXMLElement Object ( [Header] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => UserAgent [Value] => Drupal (+http://drupal.org/) ) ) ) [RequestId] => c92c2693-7b35-4dca-8853-cf2dfa9fbf33 [Arguments] => SimpleXMLElement Object ( [Argument] => Array ( [0] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => Condition [Value] => New ) ) [1] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => Brand [Value] => . ) ) [2] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => Operation [Value] => ItemSearch ) ) [3] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => Service [Value] => AWSECommerceService ) ) [4] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => Signature [Value] => nD2IcBS8nseLfSV70BFarlJ/+mX9eAP1CoETxIKL7bU= ) ) [5] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => MerchantId [Value] => Amazon ) ) [6] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => ItemPage [Value] => 11 ) ) [7] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => AssociateTag [Value] => mimu04-20 ) ) [8] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => BrowseNode [Value] => 3763261 ) ) [9] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => Version [Value] => 2011-08-01 ) ) [10] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => AWSAccessKeyId [Value] => AKIAIKNLDPNEPGSZRARA ) ) [11] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => Timestamp [Value] => 2012-11-06T16:42:31Z ) ) [12] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => ResponseGroup [Value] => 
Variations,Images,ItemAttributes,OfferFull,EditorialReview,SearchBins ) ) [13] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => SearchIndex [Value] => HealthPersonalCare ) ) ) ) [RequestProcessingTime] => 0.0031210000000000 ) [Items] => SimpleXMLElement Object ( [Request] => SimpleXMLElement Object ( [IsValid] => False [ItemSearchRequest] => SimpleXMLElement Object ( [Brand] => . [BrowseNode] => 3763261 [Condition] => New [ItemPage] => 11 [MerchantId] => Deprecated [ResponseGroup] => Array ( [0] => Variations [1] => Images [2] => ItemAttributes [3] => OfferFull [4] => EditorialReview [5] => SearchBins ) [SearchIndex] => HealthPersonalCare ) [Errors] => SimpleXMLElement Object ( [Error] => SimpleXMLElement Object ( [Code] => AWS.ParameterOutOfRange [Message] => The value you specified for ItemPage is invalid. Valid values must be between 1 and 10. ) ) ) ) )
Severity notice
Hostname 66.249.73.162

Type amazon
Date Tuesday, November 6, 2012 - 10:00
User Guest
Location /amazon_store?page=1079&SearchIndex=HealthPersonalCare&Brand=4711&BrowseNode=3777371&MinPercentageOff=50
Referrer
Message There was an error accessing amazon. Message=Amazon error returned. Code=AWS.ParameterOutOfRange}, Message=The value you specified for ItemPage is invalid. Valid values must be between 1 and 10. //, results=SimpleXMLElement Object ( [OperationRequest] => SimpleXMLElement Object ( [HTTPHeaders] => SimpleXMLElement Object ( [Header] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => UserAgent [Value] => Drupal (+http://drupal.org/) ) ) ) [RequestId] => ced9b546-1f28-44f7-a1a7-17cfe54c2c80 [Arguments] => SimpleXMLElement Object ( [Argument] => Array ( [0] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => Condition [Value] => New ) ) [1] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => Brand [Value] => 4711 ) ) [2] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => Operation [Value] => ItemSearch ) ) [3] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => Service [Value] => AWSECommerceService ) ) [4] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => Signature [Value] => 9EdEV3WphL14btuXHpNmxOr/KQ5aO1sj3NkJjsSk6m8= ) ) [5] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => MerchantId [Value] => Amazon ) ) [6] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => ItemPage [Value] => 1080 ) ) [7] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => AssociateTag [Value] => mimu04-20 ) ) [8] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => BrowseNode [Value] => 3777371 ) ) [9] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => Version [Value] => 2011-08-01 ) ) [10] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => AWSAccessKeyId [Value] => AKIAIKNLDPNEPGSZRARA ) ) [11] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => Timestamp [Value] => 2012-11-06T15:00:55Z ) ) [12] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => ResponseGroup [Value] => 
Variations,Images,ItemAttributes,OfferFull,EditorialReview,SearchBins ) ) [13] => SimpleXMLElement Object ( [@attributes] => Array ( [Name] => SearchIndex [Value] => HealthPersonalCare ) ) ) ) [RequestProcessingTime] => 0.0024230000000000 ) [Items] => SimpleXMLElement Object ( [Request] => SimpleXMLElement Object ( [IsValid] => False [ItemSearchRequest] => SimpleXMLElement Object ( [Brand] => 4711 [BrowseNode] => 3777371 [Condition] => New [ItemPage] => 1080 [MerchantId] => Deprecated [ResponseGroup] => Array ( [0] => Variations [1] => Images [2] => ItemAttributes [3] => OfferFull [4] => EditorialReview [5] => SearchBins ) [SearchIndex] => HealthPersonalCare ) [Errors] => SimpleXMLElement Object ( [Error] => SimpleXMLElement Object ( [Code] => AWS.ParameterOutOfRange [Message] => The value you specified for ItemPage is invalid. Valid values must be between 1 and 10. ) ) ) ) )
Severity notice
Hostname 66.249.73.162

Comments

willvincent

Category: bug » support

Add this to your robots.txt:

Disallow: /amazon_store/
Disallow: /?q=amazon_store/
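
A quick check on the rules themselves, as a sketch: the logged requests are for /amazon_store?page=..., with no slash after "amazon_store", and robots.txt rules are path-prefix matches, so `Disallow: /amazon_store/` may not cover them. The snippet uses Python's standard `urllib.robotparser` purely as a stand-in matcher. Google's own parser may differ in details, and the `is_allowed` helper and the `User-agent: *` line are assumptions added for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules, path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the given Disallow rules."""
    parser = RobotFileParser()
    # Assumption: the rules sit under a catch-all "User-agent: *" group.
    parser.parse(["User-agent: *"] + list(rules))
    return parser.can_fetch(agent, path)


# Shortened form of the first logged Location.
LOGGED_PATH = "/amazon_store?page=10&SearchIndex=HealthPersonalCare"

# Rule as posted, with a trailing slash.
with_slash = is_allowed(["Disallow: /amazon_store/"], LOGGED_PATH)
# Same rule without the trailing slash.
without_slash = is_allowed(["Disallow: /amazon_store"], LOGGED_PATH)

print("allowed with trailing slash:", with_slash)
print("allowed without trailing slash:", without_slash)
```

If the same holds for Googlebot, `Disallow: /amazon_store` (no trailing slash) would be the rule to try, at the cost of also matching any other path that starts with /amazon_store.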
Michael-IDA

Category: support » bug

If Google will respect that... But it's a good call.

And 'tis a bug, not a request.

Edit:
The bug is that the amazon module isn't checking that the params it's sending to Amazon are valid.

Michael-IDA

Title: Google bot spam creating massive numbers of errors » AWS.Parameter(s) not being checked for validity

Changed the title for relevance.

willvincent

The bot should respect those entries in robots.txt

The only other significant option for preventing bot access to things in the store would be to deny access based on user agent, that could be a pain, and feels dirty.

Michael-IDA

"The bot should respect those entries in robots.txt"

made me laugh. Google bots are *&@holes. They do what they want*. We live with it.

“would be to deny access based on user agent, that could be a pain, and feels dirty.”

No, I'm not requesting that. And agree with your sentiments.

Just want some sort of sanity check by the module for the parameters it's shipping to Amazon. In this specific case, the “ItemPage” parameter must be between 1 and 10. In the second example the URL had page=1079, which went to Amazon as ItemPage 1080, so my suggestion would be to return an error-type page to the user. Something very simple, like the base store page with no explanation, as a normal user probably won't send crap anyway.
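
A minimal sketch of that sanity check, in Python rather than the module's PHP. It assumes what the two log entries suggest: the URL's `page` value is zero-based and is sent to Amazon as ItemPage = page + 1. The function name `item_page_from_pager` and the constant are hypothetical, not part of the amazon module.

```python
MAX_ITEM_PAGE = 10  # upper bound quoted in the logged Amazon error message


def item_page_from_pager(raw_page):
    """Map a zero-based pager value to a valid ItemPage, or None if invalid.

    Returning None means: do not call Amazon; show the base store page or
    an error page instead.
    """
    try:
        page = int(raw_page)
    except (TypeError, ValueError):
        # Missing or non-numeric input is treated as invalid.
        return None
    item_page = page + 1  # assumption: pager is zero-based, ItemPage is one-based
    if 1 <= item_page <= MAX_ITEM_PAGE:
        return item_page
    return None


# The two logged requests (page=10 and page=1079) would both be stopped.
for raw in ("0", "9", "10", "1079", "abc"):
    print(raw, "->", item_page_from_pager(raw))
```

With a check like this in front of the ItemSearch call, neither of the two logged requests would ever reach Amazon.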

And, even if the module could, automatically adding entries to robots.txt is the wrong approach as some sites will want specific/custom built store pages indexed.

Best,
Sam

* Just try to get them to respect a crawl delay. Google won't respect it even if you set it in their own Webmaster Tools ...

willvincent

Title: AWS.Parameter(s) not being checked for validity » Verify AWS.Parameter(s) before making calls to webservice
Category: bug » feature

OK, changing this to a feature request then, since it involves implementing new functionality.

We need to verify parameters, and then either remove invalid ones and still process the request, redirect to an error page, or simply return a status code.

I think giving people the option of which behavior to use on their site is good.

Cleansing the request would probably be possible in most cases. In your example of the invalid ItemPage parameter, the value could be reset to 10 if it's outside the range of 1-10...

A status code might be a good choice. I could see any of the following being good options:
400 Bad Request
403 Forbidden
404 Not Found
perhaps 301 Moved Permanently, with a redirect either back to the base store page, or to the error page.
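
The site-selectable behaviours listed above could be sketched roughly as follows. This is Python for illustration only, not the module's PHP. The policy names, the `handle_item_page` function, its return shape, and the default 404 and /amazon_store values are all hypothetical choices, not existing module settings.

```python
from enum import Enum


class InvalidPagePolicy(Enum):
    CLAMP = "clamp"        # cleanse the value and still call Amazon
    REJECT = "reject"      # answer with an HTTP status code
    REDIRECT = "redirect"  # send the visitor back to the store


def handle_item_page(item_page, policy, reject_status=404,
                     store_path="/amazon_store"):
    """Decide what to do with an ItemPage value before calling Amazon.

    Returns an (action, value) tuple:
      ("call_amazon", page), ("status", code) or ("redirect", path).
    """
    if 1 <= item_page <= 10:
        return ("call_amazon", item_page)
    if policy is InvalidPagePolicy.CLAMP:
        # Pull the value back into the 1-10 range Amazon accepts.
        return ("call_amazon", min(max(item_page, 1), 10))
    if policy is InvalidPagePolicy.REJECT:
        return ("status", reject_status)
    return ("redirect", store_path)


for policy in InvalidPagePolicy:
    print(policy.value, handle_item_page(1080, policy))
```

Only the clamp policy still spends an Amazon request on a bad value; the other two answer locally.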

What I'm curious about, though, is how a bot came up with a value of 1079 for that parameter. I'm suspicious about it truly being Googlebot.

Michael-IDA

Hi Will,

Giving people the option(s) is probably best.

Although I guess I'm somewhat against the request-cleaning route. My standpoint: if the user clicks on a page built by the module, the chance of a bad parameter is very low, so (again guessing) most bad parameters come from something outside the norm (bot, browser corruption, ??). I don't want these forwarded on to Amazon, as under those conditions it's a waste of bandwidth and might eventually trigger an Amazon API “bad account” event. But giving the user the option on what to do with them would be the 'ideal' choice.

66.249.73.162 is Google ( http://www.projecthoneypot.org/ip_66.249.73.162 ) and seems pretty consistent with its methodology, i.e. it finds something to increment values through and churns away. Eventually it will stop, but I don't think the stop trigger is real time, as I've seen it iterate over 404s.
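
For anyone who wants a second check on whether an address really is Googlebot, Google documents a reverse-then-forward DNS test. Below is a rough Python sketch of that idea. The function name, the injectable `reverse`/`forward` resolvers (there so it can be tried without network access), and the two-suffix list are assumptions for illustration; Google's current documentation should be treated as the authority on which hostnames count.

```python
import socket

# Assumption: hostnames ending in these suffixes are treated as Google crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse=None, forward=None):
    """Reverse-resolve `ip`, check the hostname suffix, then confirm that the
    hostname resolves forward to the same address."""
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The forward lookup must include the original address.
        return ip in forward(host)
    except OSError:
        # Covers socket.herror / socket.gaierror: no usable DNS answer.
        return False
```

Calling `is_verified_googlebot("66.249.73.162")` with the default resolvers performs live DNS lookups, so the result depends on the DNS records at the time it is run.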

Hmm, refining a thought, I guess I'd like a redirection back to the base store page, with a message block at the top stating something like, “The option(s) you have selected could not be found.” Something like the system message for, “The comments have been deleted.”

Michael-IDA

Update:

Adding this to your robots.txt:
Disallow: /amazon_store/
Disallow: /?q=amazon_store/

doesn't bother Google at all. It just switched to a new IP this morning and started pounding away again.

willvincent

Google is likely caching the previous version of robots.txt.