I replaced Captcha with Botcha two days ago and thought everything was great, since Boost could finally cache my pages.

But immediately after installing Botcha, my status report started showing the following error:
http://www.mysite.com/contact?29d1_name=17&37d5_name=56&7066_name=6c&179...

I upgraded from 7.1.4 to the 7.1.5 release, but the issue persists.

I think the issue occurs when Google crawls the site. The IP address for this error belongs to Google.

Comments

iva2k’s picture

Thanks for reporting.

This should happen only when a Botcha-protected form is submitted, or when a crawler follows the path in the form's action= property, which would be strange.

A recommended workaround is to add paths like "/contact" and "/user/register" to robots.txt. Crawlers should not be indexing contact and registration pages, as there is no information in them. If a crawler ignores robots.txt, it is not Google.

As for a real fix, I think the ObscureURL recipe should be modified to have no initial token in the form's action= property. That would eliminate this effect.

While at it, ObscureURL should remove the token after form submission, so it does not remain in the destination and subsequent page loads. This is not a serious concern, but it pollutes logs and user caches with unnecessary clutter.
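
For illustration, a minimal robots.txt fragment covering the two example paths above might look like the following. This is only a sketch: the paths are the examples named in this comment, and any other Botcha-protected form paths on a given site would have to be added by hand.

```
User-agent: *
Disallow: /contact
Disallow: /user/register
```

Because Disallow rules match by URL prefix, "/contact" also covers "/contact?..." with any query string, but it would equally match any other path that starts with "/contact".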

kent_drupal’s picture

Apart from the /contact and /user/register pages, I have other webform pages that need to stay indexable, so I cannot exclude them in robots.txt.

I didn't understand your ObscureURL solution. Could you explain what needs to be changed in the code or settings?
I can help with a patch. I am sure many people are affected by this.

iva2k’s picture

A patch would be great.
The ObscureURL recipe sends out a secure token broken into pieces, together with a piece of JavaScript that combines the pieces and pastes the result into the query part of the form's submission URL. Botcha then verifies the correctness of this secure token on the server. Currently, one of the secure token pieces is sent out in the initial form submission URL (the action property of the <form> tag). Since Botcha regenerates the secure tokens on each form submission, the URL query never repeats, which creates an infinite chain of URLs. Crawlers apparently pick up the whole URL from the form tag and follow it. If this piece of the secure token is removed, the URL will be free of the &***_name=*** bit, and crawlers will follow it back to the same page, eliminating the infinite loop. I hope this explanation makes it clearer.
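
To make the mechanism above concrete, here is a rough JavaScript sketch of how token pieces could be combined into the submission URL's query. It is not Botcha's actual code: the function name buildSubmitUrl, its parameters, and the sample values are all hypothetical and chosen only for illustration.

```javascript
// Illustrative sketch only; every name and value here is hypothetical.
// Combines three token pieces and appends them to the form action as a query.
function buildSubmitUrl(action, fieldName, tokHead, fieldValue, pos, tokTail) {
  // The middle piece is hidden inside a form field value, after a known prefix.
  const middle = fieldValue.substring(pos);
  const token = tokHead + middle + tokTail;
  // Append to an existing query string if the action already has one.
  const sep = action.includes('?') ? '&' : '?';
  return action + sep + encodeURIComponent(fieldName) + '=' + encodeURIComponent(token);
}

// Two renders of the same form with freshly generated field names and tokens.
const first = buildSubmitUrl('/contact', '29d1_name', '1', 'abc_7', 4, 'x');
const second = buildSubmitUrl('/contact', '37d5_name', '5', 'abc_6', 4, 'y');

console.log(first);  // /contact?29d1_name=17x
console.log(second); // /contact?37d5_name=56y
```

Since the field name and token differ on every render, each resulting URL is new to a crawler and to a page cache, which is how the chain of never-repeating URLs described above can arise.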

kent_drupal’s picture

There are several different Botcha recipes, and I don't understand the purpose of each.
But I think this is the code you are talking about:

file: botcha.botcha.inc (line#334) v7.1.5

  // Describe URL field. JS will return token in URL field.
  $recipe->url_elements = array(
    $field_name_url => array(
      '#type' => 'textfield',
      '#default_value' => '',
      '!valid_token' => $secure_token,
    ),
  );

  $selector = "input.$field_class";
  $submit = _botcha_url($form['#action'],
    array('query' => array($field_name_url => '__replace__')));
  $submit = preg_replace('/__replace__/',
    $js_tok1 . '\'+v+\'' . $js_tok2   // $secure_token
  , $submit);

  $recipe->js = <<<END
(function ($) {
  Drupal.behaviors.{$js_name} = {
    attach: function (context, settings) {
      $("{$selector}").each(function() {
        f=$(this)[0];
        if (f.value.indexOf("{$js_match}")==0){
          v=f.value.substring({$js_pos});
          form=$(this).parents("form#{$form_id}")[0];
          $(form)[0].action ='{$submit}';
        }
      });
    }
  };
}(jQuery));
END;

  return $recipe;
}

iva2k’s picture

Yep, that's the "ObscureURL" recipe. Looking at it now, I don't see the initial piece of the secure token in form.action. I must have imagined it (or forgotten; it was a long time ago). It must be that crawlers do run the JS, let it fill in the action property, and then follow it. That complicates things a bit, and a redesign or patch may require a lot of work that would not be portable to HEAD, so I withdraw my first proposal to patch.

For your specific problem, you can just disable this recipe. In this 1.x branch there is no recipe book GUI, so the only way is to comment out the line where the recipe is included in the recipe book. Plenty of other recipes remain.

kent_drupal’s picture

This is _botcha_recipe3. So you are suggesting that I comment it out in $recipe_book, as below?

else {
  $recipe_book = array(
    '_botcha_recipe1',
    '_botcha_recipe2',
    '_botcha_recipe3',
    '_botcha_recipe4',
  );

kent_drupal’s picture

The change below didn't help.

else {
  $recipe_book = array(
    '_botcha_recipe1',
    '_botcha_recipe2',
    //'_botcha_recipe3',
    //'_botcha_recipe4',
  );

Am I missing something?

iva2k’s picture

Yes, the code in #7 is correct. The code in #6 did not have the comment-out slashes. You said #7 did not help? That does not sound right. Did you check that the file actually changed on your server?

Can you send me a link to the website in question? I can check if the form is constructed properly.

kent_drupal’s picture

While I was making the changes in #7, I think the Google crawler was still working on the site, so two junk files were created in the cache.

I deleted them and waited overnight to see if they came back. They didn't.

Is there any impact from commenting out the third recipe?

iva2k’s picture

Priority: Major » Normal

The remaining two recipes are pretty strong on their own. The third one was added to keep increasing Botcha's strength and stay ahead of spambot development. Sadly, it has some unresolved issues, but for now disabling it does not make Botcha weaker.

iva2k’s picture

Issue tags: +ObscureURL

iva2k’s picture

Title: Botcha causing junk files in boost » Botcha ObscureURL causing junk files in boost
Issue tags: -ObscureURL

iva2k’s picture

Issue tags: +ObscureURL

Tagging again.

iva2k’s picture

Issue summary: View changes

Updated the error.