In /admin/logs/referrers, I see links from my own site. This is useless, as the good (i.e. informational) results are crowded out of the list. I seem to remember it working before my site went live. What could be wrong?

From /admin/logs/referrers:
"This page shows you all external referrers. These are links pointing to your web site from outside your web site."

Comments

bsimon

I've seen the same thing a few times. Seemed to happen in patches. Didn't see any obvious problems from it.

I wondered if it could be some strange behavior of a proxy (on the incoming user's ISP), but that's just a guess.

You can use Recent Hits to figure out the IP addresses of the hits you're seeing in the referrer list; that might help you understand what's happening.

dman

You do realize that www.yourdomain.com and yourdomain.com are treated as totally different servers as far as most protocols (e.g. HTTP) go?

.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/

ricmadeira

The problem is most likely what dman says.

Look into your .htaccess file; there's a setting there that allows you to force the address to go one way or the other (with the www. or without it). That should get rid of those URLs in your referrer list.

Ningbo1

Yeah, sure I know that. But both are still *my site*. From /admin/logs/referrers:

"This page shows you all external referrers. These are links pointing to your web site from outside your web site."

www.example.com and example.com are the same web site. They are certainly not from "outside your web site."

dman

They certainly are different.
They may end up displaying the same content (in your particular configuration), but the URIs are different.

api.drupal.org is not the same site as drupal.org
There is no special meaning to the string 'www', it's just a convention.

I don't have a fix for you apart from fixing up your host so that it redirects to just one or the other, like many sites do. Apparently it's also really helpful for search engines.

Take it up with the W3C if you disagree, but for host resolution, XSS security, DNS records and all, they are technically different sites. They just happen to often behave the same.

I appreciate what you are wanting to happen, but the current behaviour is correct.
A configuration could probably be added to the module to exclude other, specified domains, or to treat them as aliases. Put it on the wish list.
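A minimal sketch of what such an alias setting might look like, written in Python purely for illustration. `SITE_ALIASES` and `is_external_referrer` are hypothetical names, not part of the statistics module; the check simply compares the referrer's host name against a set of host names the owner has declared to be one site.

```python
from urllib.parse import urlsplit

# Hypothetical setting: host names the site owner wants treated as one site.
# (Assumption for illustration; the statistics module is not claimed to have this.)
SITE_ALIASES = frozenset({"example.com", "www.example.com"})


def is_external_referrer(referrer_url: str, aliases: frozenset = SITE_ALIASES) -> bool:
    """Return True if the referrer's host is not one of the configured aliases."""
    # urlsplit().hostname is lower-cased and has any :port removed.
    host = urlsplit(referrer_url).hostname
    if host is None:
        # Empty or host-less referrer: nothing to report.
        return False
    return host not in aliases


referrers = [
    "http://www.example.com/node/1",
    "http://example.com/about",
    "http://drupal.org/forum",
]
print([r for r in referrers if is_external_referrer(r)])
# prints ['http://drupal.org/forum']
```

An explicit list keeps the default behaviour strict while letting each site opt in to treating its own aliases as internal.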

.dan.

Ningbo1

I still don't understand how it's "outside my site". They are certainly both inside. I am fully aware of W3C and DNS standards.

"I appreciate what you are wanting to happen, but the current behaviour is correct."

Negative. The page is not showing external referrers, it is showing internal referrers. Other statistics software deals just fine with this issue - it is only here in Drupal's referrer log that the problem occurs. I suppose I am the first person in the world whose website is accessible with www or without?

dman

Would you argue that http://www.w3.org/ and http://validator.w3.org/ are the same site?
Is http://test.yoursite.com/ the same as http://yoursite.com/ or https://members.yoursite.com/ ?
Can you imagine a case where http://pedantic.blogspot.com/ would not want to be able to log referrals from http://silly.blogspot.com/ ?

They share a second-level domain, but they are not regarded as being totally identical.
And as I said, 'www' is not some magic default keyword, it's a legacy, explicitly chosen convention.

The definition of internal is, in this case, the same host, and the host is the unique hostname. You are talking about two different hostnames. They may (in rare circumstances) end up on different machines in different parts of the world.
Filtering that to extend to the same second-level domain is possible, and a fair request. The cookie specification allows a method to do so, as do many email resolution methods. As far as the network is concerned, it's an optional, wildcard sort of behavior, and not automatic.
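To make the two rules concrete, here is a small Python sketch (an illustration only, not Drupal's actual code). `same_host` applies the strict "same host name" rule, and `domain_match` follows the cookie-style domain matching of RFC 6265, section 5.1.3, where a host matches a domain if it is identical to it or ends in "." followed by that domain. Both function names are made up for this example.

```python
import ipaddress
from urllib.parse import urlsplit


def same_host(referrer_url: str, site_host: str) -> bool:
    """Strict rule: the referrer is internal only if the host names are identical."""
    return urlsplit(referrer_url).hostname == site_host.lower()


def domain_match(host: str, domain: str) -> bool:
    """Cookie-style matching (after RFC 6265, section 5.1.3)."""
    host, domain = host.lower(), domain.lower()
    if host == domain:
        return True
    try:
        # IP addresses never match a parent domain; only identical strings match.
        ipaddress.ip_address(host)
        return False
    except ValueError:
        pass
    # The suffix must start at a label boundary, so "badexample.com" does not match.
    return host.endswith("." + domain)


ref = "http://www.example.com/node/1"
print(same_host(ref, "example.com"))                   # False: logged as "external"
print(domain_match("www.example.com", "example.com"))  # True: same domain, cookie-style
print(domain_match("example.com", "www.example.com"))  # False: only sub-domains match
```

Note the asymmetry in the last line: cookie-style matching only accepts sub-domains of the configured domain, so a site would have to be configured as example.com, not www.example.com, for both names to count as internal.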

Many folk configure their servers to resolve to one canonical version of the site name. As mentioned, search engine optimizers are rabid about it. That's regarded as a good solution to your dilemma. Try it.

On sites that simply serve both sites equally via name-based aliasing, I've encountered session and cookie problems, where (depending on the methods used) you can find yourself logged in to www.site.com but not to site.com, or vice versa. I don't know whether this has been worked around in Drupal.
I've certainly seen it with XSS sandboxing: you are prohibited from making AJAX requests from one of these to the other, leading to hard-to-spot problems.
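In browsers, the AJAX restriction described here comes from the same-origin policy, which compares scheme, host and port. A simplified Python sketch of that comparison (it ignores details such as `document.domain` and CORS headers, and `origin`/`same_origin` are illustrative names only) shows why www.site.com and site.com count as different origins:

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Ports implied when a URL does not spell one out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    """Return the (scheme, host, port) triple compared in same-origin checks."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    """True only if scheme, host name and port are all equal."""
    return origin(a) == origin(b)


print(same_origin("http://www.site.com/page", "http://site.com/ajax"))  # False
print(same_origin("http://site.com/", "http://site.com:80/ajax"))       # True
```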

If you are aware of DNS behaviour, you'll realize that it's a protocol-level issue. It can be overridden and fuzzed in the statistics analysis code, and I think it would be a nice extra to have, but what you are seeing is the result of explicitly following the actual data.

Anyway, I'm not actually 100% sure that this is the cause of your pain, have you been able to verify it?

.dan.

Ningbo1

Of course www.example.com and example.com are the same site. Duh! Just because there's no RFC that says it, doesn't mean it's not the daily reality of the internet.

"The definition of internal is, in this case, the same host, and the host is the unique hostname."

That is an extremely limited, obsessively pedantic viewpoint. I'm talking about internal versus external. Obviously, anything with my TLD is internal. I use lots of stats packages, and they seem to have no problem with this. Only Drupal comes out and says, "but it's not the same site! Here, let me spam your referrer logs with useless data!"

"...what you are seeing is the result of explicitly following the actual data."

Yeah. I figured that out. However, this rigid behavior leads to undesirable results. Whatever, man...do what you want...I forgot what a pain in the *** it is to try to reason with a nerd when he thinks he's technically correct. At least someone responded to this post, unlike the other 5 I posted.

dman

Look, I've been trying to help you understand the problem so that it would be possible to find a fix for it. If you want to stamp your feet about it, you'll get no closer to an actual fix.
You are denying that this is the reality of TCP and DNS. You are denying that this is what is being written into your server logs.

Sure, it's usually safe to assume that those two different addresses will get you to the same place. I've tried to point out it's just an assumption, and you are swearing black and blue that it's a hard fact. I've seen hundreds of sites that have failed to configure both, even if you never have. They probably should do so, but...

I once had the misfortune of administering a site which had an Apache server on www.sitename.tld and an IIS one on sitename.tld. This was quite stupid, you'll agree (a legacy migration thing), but our logs for the two sites were also separate, as you can imagine.
Seeing the referrers from A to B helped us track down old links.

"The definition of internal is, in this case, the same host, and the host is the unique hostname."

"That is an extremely limited, obsessively pedantic viewpoint."

Not if you host more than one site. It's a very useful fact that sites with different names, even if they are under the same second-level domain get resolved differently.

It seems fussy to you when in this case you want your personal thing to just work. But you refuse to believe this result even has a cause.

"Obviously, anything with my TLD is internal."

[pedantic]it's not a TLD ;)[/pedantic]
Yet intertwingling links between third-level domain hosts is still worthy of being logged by referrer. Maybe not for your personal corner of the world, but for larger ones.
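For anyone unsure of the terminology, a tiny Python illustration (the function name is made up) of how a host name splits into labels. The top-level domain is only the last label, and the "last two labels" are not always one site owner's domain: co.nz, for instance, is shared by many unrelated .co.nz sites, so a naive "same last two labels" rule would lump them all together.

```python
def describe_labels(hostname: str) -> dict:
    """Split a host name into the pieces discussed in this thread."""
    # Host names are case-insensitive; a trailing dot marks a fully qualified name.
    labels = hostname.lower().rstrip(".").split(".")
    return {
        "top_level_domain": labels[-1],
        "last_two_labels": ".".join(labels[-2:]),
        "full_hostname": ".".join(labels),
    }


print(describe_labels("www.example.com"))
# {'top_level_domain': 'com', 'last_two_labels': 'example.com', 'full_hostname': 'www.example.com'}
print(describe_labels("www.coders.co.nz"))
# {'top_level_domain': 'nz', 'last_two_labels': 'co.nz', 'full_hostname': 'www.coders.co.nz'}
```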

I have no vested interest in the stats package or anything. I've agreed several times that your feature request would be a nice-to-have, and I'm mildly surprised that the option doesn't already exist.

I've pointed out some of the pitfalls of not understanding why two different hostnames are treated as being two different hosts. SSH Security keys also are issued for only one or the other AFAIK. Usually you can only FTP into one or the other.

I've told you how your particular problem can be fixed.
Look around. You'll probably find it hard to log in to http://www.drupal.org, http://blogger.com or http://microsoft.com ... try your own favorite secure registration site. They all understand the issue, and solve it in exactly this way.

Your site:server:host:hostname:domain are different terms with slightly different implications. You keep deliberately confusing 'site' with 'hostname'. Pedantic, maybe, but this distinction seems to be the root of the misunderstanding.

Yet you seem to want to make the point that the reason for your problem doesn't exist.

You can't fix it that way.
Sorry for actually trying to help you solve your bloody problem.

.dan.

eiland

Hi all,

I'm fully aware of the technical origins of this thing. So how about just giving the .htaccess line which allows you to force the address to go one way or the other (with the www. or without it)? That would be helpful :)

dman

It is indeed a great solution to the problem :-)
Please inspect your .htaccess file and follow the instructions you'll find inside.

  # If your site can be accessed both with and without the 'www.' prefix, you
  # can use one of the following settings to redirect users to your preferred
  # URL, either WITH or WITHOUT the 'www.' prefix. Choose ONLY one option:
  ...
  ..

.dan.
If you are asking a question you think should be documented, please provide a link to the handbook where you think the answer should be found.
| http://www.coders.co.nz/ |

eiland

Ah, that was easy! .htaccess is very difficult in my book, so I never opened it, but it turned out to be not-so-complicated.

La M

This only seemed to work for me if I uncommented lines 96 and 97 in my .htaccess file, not lines 90 and 91.

Now, when I go to http://www.mydomain.com/admin/reports/referrers I get redirected to http://mydomain.com/admin/reports/referrers and a report that excludes my own domain.

---------------------------------------------------------------------------------
# If your site can be accessed both with and without the 'www.' prefix, you
# can use one of the following settings to redirect users to your preferred
# URL, either WITH or WITHOUT the 'www.' prefix. Choose ONLY one option:
# To redirect all users to access the site WITH the 'www.' prefix,
# (http://example.com/... will be redirected to http://www.example.com/...)
# adapt and uncomment the following:
90 # RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
91 # RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
# To redirect all users to access the site WITHOUT the 'www.' prefix,
# (http://www.example.com/... will be redirected to http://example.com/...)
# uncomment and adapt the following:

96 RewriteCond %{HTTP_HOST} ^www\.mydomain\.com$ [NC]
97 RewriteRule ^(.*)$ http://mydomain.com/$1 [L,R=301]

----------------------------------------------------------------------------------
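To see what the two uncommented lines (96 and 97) do to a request, here is a rough Python model of them. It only illustrates the rule's effect and is not how Apache's mod_rewrite is implemented; `canonical_redirect` is a made-up name, and mydomain.com is the placeholder from the snippet. In a per-directory .htaccess context the RewriteRule pattern is matched against the path without its leading slash, which is why the target is written as http://mydomain.com/$1.

```python
import re
from typing import Optional

# Same pattern as the RewriteCond; [NC] means case-insensitive.
WWW_HOST = re.compile(r"^www\.mydomain\.com$", re.IGNORECASE)


def canonical_redirect(host: str, path: str, query: str = "") -> Optional[str]:
    """Return the 301 Location for a www.mydomain.com request, or None if no redirect applies."""
    if not WWW_HOST.match(host):
        return None  # RewriteCond failed: the request is served as-is.
    # In .htaccess context, ^(.*)$ sees the path without its leading slash;
    # the captured text is what $1 inserts into the target.
    captured = path[1:] if path.startswith("/") else path
    target = "http://mydomain.com/" + captured
    # With no "?" in the substitution, mod_rewrite passes the query string through.
    if query:
        target += "?" + query
    return target


print(canonical_redirect("www.mydomain.com", "/admin/reports/referrers"))
# http://mydomain.com/admin/reports/referrers
print(canonical_redirect("mydomain.com", "/admin/reports/referrers"))
# None
```

With the redirect in place, visitors should only ever browse pages on one host name, so clicks on internal links no longer arrive with the other host name as their referrer.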