
D6 Port for First Click Free? What about Googlebot/crawler access?

Project: First Click Free
Version: 5.x-1.x-dev
Component: Code
Category: feature request
Priority: normal
Assigned: Unassigned
Status: active
Issue tags: D6

Issue Summary

The title says it all really. Is there a plan to upgrade this or is there a preferred way to achieve the same result using something else in D6?

Comments

#1

Hi Adrian,
I am very busy at the moment, so please contact me through the Drupal contact form about a sponsored (that is, paid) port of the module.
To be honest, the module is not top-notch quality, so I am hesitant to upgrade it.
Talk soon,
Francesco

#2

Hi Adrian!
I ran this module through deadwood, to no avail.
You can use the IP_login module to do this. Just create a user and associate an IP address (or range of addresses) with it. I create a user such as 'Mary Elgoog' (that's Google backwards), mark the profile private, and enter the range of IP addresses. You'll have to create a user for each IP address or range of IP addresses. Go to http://www.iplists.com to get the current list.
CAUTION: Google (and others) will PENALIZE your page rank or even delist you if they think that you are not presenting the same content to them as you do to the general public! This is called 'cloaking' and is against their terms of service. They use unpublished IP addresses and different user-agents to detect this.
There has been considerable discussion about how some sites, such as the New York Times, can get away with this without penalty, by the way.
Regards,
Brian Brown, Ph.D.
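
The IP-range association described above comes down to checking whether a visitor's address falls inside a configured range. A minimal sketch of that check follows; the function names are hypothetical, this is not the IP_login module's code, and the addresses in the usage note come from the reserved documentation ranges rather than any real crawler list.

```php
<?php
/**
 * Converts a dotted IPv4 address to an unsigned number.
 * Returns FALSE when the string is not a valid IPv4 address.
 * (Hypothetical helper, for illustration only.)
 */
function fcf_ip_to_number($ip) {
  $long = ip2long($ip);
  if ($long === FALSE) {
    return FALSE;
  }
  // ip2long() can return a negative integer on 32-bit builds;
  // formatting with %u yields the unsigned value in either case.
  return (float) sprintf('%u', $long);
}

/**
 * Returns TRUE when $ip lies between $start and $end, inclusive.
 * (Hypothetical helper, for illustration only.)
 */
function fcf_ip_in_range($ip, $start, $end) {
  $n = fcf_ip_to_number($ip);
  $low = fcf_ip_to_number($start);
  $high = fcf_ip_to_number($end);
  if ($n === FALSE || $low === FALSE || $high === FALSE) {
    return FALSE;
  }
  return $low <= $n && $n <= $high;
}
```

For example, fcf_ip_in_range('192.0.2.17', '192.0.2.0', '192.0.2.255') is TRUE, while an address outside the range or a malformed string gives FALSE.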

#3

Hi Brian,

Thanks for the info. I'll check it out, but I think that to avoid any conflict with Google's guidelines I can't use the source IP address and should use the referrer instead. More info in this post on Google's official webmaster blog:

http://googlewebmastercentral.blogspot.com/2008/10/first-click-free-for-...
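
The referrer-based check mentioned here could be sketched roughly as follows. The function name fcf_is_google_referer() and the host pattern are assumptions made for illustration, not the module's code. The pattern accepts google.<tld> and google.<xx>.<tld> hosts with any subdomain. The Referer header is supplied by the client, so this only identifies well-behaved visitors.

```php
<?php
/**
 * Returns TRUE when the referring URL's host looks like a Google domain,
 * e.g. www.google.com or news.google.co.uk.
 * (Hypothetical helper; the host pattern is an illustrative assumption.)
 */
function fcf_is_google_referer($referer) {
  if (!is_string($referer) || $referer === '') {
    // No Referer header was sent.
    return FALSE;
  }
  $host = parse_url($referer, PHP_URL_HOST);
  if (!is_string($host)) {
    // Malformed URL or URL without a host part.
    return FALSE;
  }
  $host = strtolower($host);
  // "google" must be a whole label, followed by a short suffix such as
  // "com" or "co.uk", and nothing else after that.
  return preg_match('/(^|\.)google\.[a-z]{2,3}(\.[a-z]{2})?$/', $host) === 1;
}
```

Inside a module it would be called with the request's header, for example fcf_is_google_referer(isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : ''), which also avoids a notice when the header is missing.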

#4

Hmmmm... then that does it in for this. The problem arises when a referer [sic] and/or user-agent is spoofed. Very easy to do. There are even a couple of Firefox extensions to do this. I am surfing as Slurp at the moment.

However the link you posted says:

"... You need to configure your website to serve the full text of each document when the request is identified as coming from Googlebot via the user-agent and IP-address." [emphasis added]

If the IP address is from Google, it is axiomatic that the user-agent is Googlebot. The problem arises when you do not allow a surfer who is refered [sic] by Google to view the content (or at least that one page). Google considers that to be cloaking: http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip...

-B

#5

"The problem arises when a referer [sic] and/or user-agent is spoofed. Very easy to do"

That depends on your audience. There's no perfect solution to this, but at least with the FCF method your ranking with Google stays strong. If you have premium content that you want Google to crawl, it can't be that secret, surely?

If you really need to lock it down, you can add reverse DNS checking (http://www.google.com/support/webmasters/bin/answer.py?answer=80553) to make sure that a visitor claiming to be Googlebot really is, although this is not completely infallible either.
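
A minimal sketch of the reverse DNS check mentioned above: look up the host name for the visitor's IP, confirm it ends in googlebot.com or google.com, then resolve that name forward and confirm it returns the same IP. The function name is hypothetical, and the two lookup parameters exist only so the logic can be exercised without real DNS queries; by default they are PHP's gethostbyaddr() and gethostbyname().

```php
<?php
/**
 * Returns TRUE when $ip reverse-resolves to a Google crawler host name
 * and that host name resolves forward to the same $ip.
 * (Hypothetical helper; $reverse and $forward are injectable for testing.)
 */
function fcf_is_verified_googlebot($ip, $reverse = 'gethostbyaddr', $forward = 'gethostbyname') {
  $host = call_user_func($reverse, $ip);
  // gethostbyaddr() returns the unmodified IP when the lookup fails
  // and FALSE on malformed input.
  if (!is_string($host) || $host === $ip) {
    return FALSE;
  }
  $host = strtolower(rtrim($host, '.'));
  if (!preg_match('/\.(googlebot|google)\.com$/', $host)) {
    return FALSE;
  }
  // Forward-confirm: the claimed host name must map back to the same IP.
  // gethostbyname() returns the unmodified host name on failure.
  return call_user_func($forward, $host) === $ip;
}
```

Both lookups block the request, so a real module would probably run the check only when the user-agent claims to be Googlebot and cache the result per IP.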

#6

Hi,
I want to upgrade a site that uses a module called First Click Free to Drupal 6.
I used the Deadwood module to convert the .module and .info files.
Here is what I received:
<?php

// implementation of hook_menu
function fcf_menu() {
  $items = array();

  /* TODO
     Non menu code that was placed in hook_menu under the '!$may_cache' block
     so that it could be run during initialization, should now be moved to hook_init.
     Previously we called hook_init twice, once early in the bootstrap process, second
     just after the bootstrap has finished. The first instance is now called boot
     instead of init.

     In Drupal 6, there are now two hooks that can be used by modules to execute code
     at the beginning of a page request. hook_boot() replaces hook_init() in Drupal 5
     and runs on each page request, even for cached pages. hook_init() now only runs
     for non-cached pages and thus can be used for code that was previously placed in
     hook_menu() with $may_cache = FALSE:

     Dynamic menu items under a '!$may_cache' block can often be simplified
     to remove references to arg(n) and use of '%' to check
     conditions. See http://drupal.org/node/103114.

     The title and description arguments should not have strings wrapped in t(),
     because translation of these happen in a later stage in the menu system.
  */
  $items['admin/settings/fcf'] = array(
    'title' => 'First Click Free',
    'description' => 'Configure First Click Free role.',
    'page callback' => 'drupal_get_form',
    'access arguments' => array('fcf_settings')
  );
  return $items;
}

// implementation of hook_init
function fcf_init() {
  global $user;
  /* if user is anonymous and we are coming from Google, add to $user a special role
     to let him see the article.
     Parse referrer to check if click comes from Google.
     As per http://tinyurl.com/2jtwoc, check http://*.google.*
  */
  $components = parse_url($_SERVER['HTTP_REFERER']); // subdomain.google.com
  $host = $components['host'];
  list($tld, $domain, $subdomain) = array_reverse(explode('.', $host)); // (com, google, subdomain)

  if (variable_get('fcf_debug', 0) || $user->uid == 0 && $domain == "google" ) {
    // fetch special role using rid from settings. default to 1 = anonymous user rid
    $role = db_fetch_object(db_query('SELECT * FROM {role} WHERE rid = %s', variable_get('fcf_rid', 1)));
    if ($role) { // should be always true, but we check nevertheless to avoid errors
      $user->roles[$role->rid] = $role->name;
      // if we are in debug mode, show an informative message
      if (variable_get('fcf_debug', 0)) {
        $message = t("First Click Free debug mode activated. You may toggle debug mode on !link.", array("!link" => l(t("First Click Free settings"), "admin/settings/fcf")));
      }
      else {
        $message = t("Customize the message displayed to users when referred by Google. It's useful for helping or instructing them to register if they wish.");
      }
      drupal_set_message($message, "fcf");
    }
  }
}

// settings form (called from _menu)
function fcf_settings() {
  $roles = user_roles(TRUE); // first argument TRUE excludes anonymous role from listing
  $form['fcf_rid'] = array(
    '#type' => 'select',
    '#title' => t('Extra for First Click Free users'),
    '#default_value' => variable_get('fcf_rid', 1),
    '#options' => $roles,
  );
  $form['fcf_debug'] = array(
    '#type' => 'checkbox',
    '#title' => t('Enable debug mode.'),
    '#default_value' => variable_get('fcf_debug', 0),
  );
  return system_settings_form($form);
}

fcf.info :
name = First Click Free
description = Let anonymous users read content linked from Google News
package = Other
version = 0.2

; Information added by drupal.org packaging script on 2007-08-19
version = "5.x-1.x-dev"
project = "fcf"
datestamp = "1187481789"

core = 6.x

The module has been enabled in D6; however, I am not able to see the menu item or the settings at
admin/settings/fcf.
Can anyone help with the changes needed in the module file to make it work?

Regards
Sagar
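
A likely reason the settings page does not appear with the converted code above: the menu item passes no form ID to drupal_get_form(), and it uses the form ID 'fcf_settings' as if it were a permission name. A minimal sketch of a Drupal 6 style menu item follows. It assumes the core 'administer site configuration' permission is suitable for this page, a choice not taken from the module, and it is not the code from any attached port.

```php
<?php
// Implementation of hook_menu() for Drupal 6 (illustrative sketch).
function fcf_menu() {
  $items = array();
  $items['admin/settings/fcf'] = array(
    'title' => 'First Click Free',
    'description' => 'Configure First Click Free role.',
    // drupal_get_form() builds the form whose ID is given in 'page arguments'.
    'page callback' => 'drupal_get_form',
    'page arguments' => array('fcf_settings'),
    // 'access arguments' holds permission names, not form IDs.
    // Assumption: reuse a core permission instead of defining a new one.
    'access arguments' => array('administer site configuration'),
  );
  return $items;
}
```

After hook_menu() changes, the menu router has to be rebuilt (for example by clearing the site cache) before the new path is picked up.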

#7

Hi, Sagar!
I am unable to help you, but perhaps some kind soul will!
Thanks for your efforts!
=Brian

#8

Hi,

I have converted this module to D6; anyone who was struggling can use this.
It works well.

Cheers
Sagar

Attachment    Size
fcf.tar_.gz   7.53 KB

#9

Dear Sagar,
Thank you SO MUCH!!! You are wonderful!
Regards,
Brian

#10

Title: D6 Port for First Click Free? » D6 Port for First Click Free? What about Googlebot/crawler access?

Thanks all for the work.

I wonder if anyone has really tested whether this module allows Googlebot to crawl protected (member-only) pages. In my case, it works fine for allowing access to users referred by Google (I tested it by enabling access control on an already crawled page), but it does not seem to let Google crawl protected pages in the first place. I wonder if this is how the module is meant to work! If not, I would appreciate any hints on how to achieve that.

Please see here my experience so far trying to make this work.

I am using the D6 port kindly attached above by Sagar.

Cheers