Wrong URLs in xmlsitemap

Expier - July 18, 2009 - 15:53
Project:XML sitemap
Version:5.x-1.6
Component:Other
Category:support request
Priority:normal
Assigned:Unassigned
Status:closed
Description

I have wrong URLs in my sitemap. They look like http://[my_site].com/home/[my_folder_on hosting]/data/www/[my_site].com/node/XX, but they should look like http://[my_site].com/node/XX.
The "/home/[my_folder_on hosting]/data/www/[my_site].com/" part is the path on my hosting where the site physically is.

When I try to open the sitemap in a browser, I get an error like:
"Can not load XSLT style sheet
http://[my_site].com/home/[my_folder_on hosting]/data/www/[my_site].com/files/xmlsitemap/gss.xsl"

#1

kiamlaluno - July 18, 2009 - 21:06
Status:active» postponed (maintainer needs more info)

The URLs used for the links shown in the sitemap are returned from Drupal; are you sure your Drupal site is correctly configured?

#2

Expier - July 22, 2009 - 14:38

How can I check this? The site works right. All links are working...
I have an idea what the reason could be: I submit the sitemap every time cron runs. My cron is configured through the ISP control panel. To run cron I had to add the following lines to the cron.php file:
$_SERVER['HTTP_HOST'] = 'www.[my_site].com';
$_SERVER['REMOTE_ADDR'] = '[server_ip]';
$_SERVER['REQUEST_METHOD'] = 'GET';

chdir('/home/[my_folder]/data/www/[my_site].com');

But why do the errors occur even when I am running cron manually through the browser (just going to www.[my_site].com/sitemap.xml)?

#3

kiamlaluno - July 19, 2009 - 18:02
Status:postponed (maintainer needs more info)» active

I guess there could be a conflict between what you set and the following code, which Drupal executes at bootstrap:

<?php
if (isset($base_url)) {
  // Parse fixed base URL from settings.php.
  $parts = parse_url($base_url);
  if (!isset($parts['path'])) {
    $parts['path'] = '';
  }
  $base_path = $parts['path'] . '/';
  // Build $base_root (everything until first slash after "scheme://").
  $base_root = substr($base_url, 0, strlen($base_url) - strlen($parts['path']));
}
else {
  // Create base URL
  $base_root = (isset($_SERVER['HTTPS']) && $_SERVER['HTTPS'] == 'on') ? 'https' : 'http';

  $base_url = $base_root .= '://'. $_SERVER['HTTP_HOST'];

  // $_SERVER['SCRIPT_NAME'] can, in contrast to $_SERVER['PHP_SELF'], not
  // be modified by a visitor.
  if ($dir = trim(dirname($_SERVER['SCRIPT_NAME']), '\,/')) {
    $base_path = "/$dir";
    $base_url .= $base_path;
    $base_path .= '/';
  }
  else {
    $base_path = '/';
  }
}
?>

The effect of seeing the directory path in the URL could be caused by changing the current directory.
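
A related way the directory can end up in the URL is sketched below. It assumes cron.php was started by the command-line PHP binary with an absolute script path, in which case $_SERVER['SCRIPT_NAME'] holds a filesystem path rather than a URL path. The function sketch_base_url() and the example.com values are hypothetical; the function only mirrors the fallback branch quoted above so it can be tried with different inputs.

```php
<?php
// Hypothetical helper, not part of Drupal: mirrors the fallback branch of the
// quoted bootstrap code, but takes the server variables as an argument.
function sketch_base_url(array $server) {
  $scheme = (isset($server['HTTPS']) && $server['HTTPS'] == 'on') ? 'https' : 'http';
  $base_url = $scheme . '://' . $server['HTTP_HOST'];

  // Same trim mask as the quoted code: strips backslashes, commas and slashes.
  $dir = trim(dirname($server['SCRIPT_NAME']), '\,/');
  if ($dir) {
    $base_url .= '/' . $dir;
  }
  return $base_url;
}

// Web request: SCRIPT_NAME is the URL path of the script.
echo sketch_base_url(array(
  'HTTP_HOST' => 'www.example.com',
  'SCRIPT_NAME' => '/cron.php',
)), "\n";
// http://www.example.com

// Assumed command-line run with an absolute path: SCRIPT_NAME is a
// filesystem path, so the directory leaks into the base URL.
echo sketch_base_url(array(
  'HTTP_HOST' => 'www.example.com',
  'SCRIPT_NAME' => '/home/user/data/www/example.com/cron.php',
)), "\n";
// http://www.example.com/home/user/data/www/example.com
```

The second result has the same shape as the URLs reported in this issue, which is why the way cron.php is invoked looks like a plausible cause.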

#4

earnie - July 20, 2009 - 11:41
Category:bug report» support request
Status:active» postponed (maintainer needs more info)

What happens if you remove your modifications to cron.php? Setting to support request since you've modified the Drupal core source.

#5

kiamlaluno - July 20, 2009 - 15:58

What is the $base_url value you have in your settings.php?

#6

Expier - July 22, 2009 - 14:39

I removed the modifications, but the sitemap did not become right. It became right only after disabling and re-enabling the module.

#7

Expier - July 21, 2009 - 20:08

I have no $base_url; it was not set for my site.... Actually, I can't check whether it would fix this issue because I am using poormanscron now... I was trying to repeat this bug, but have no results

#8

earnie - July 22, 2009 - 11:57

I was trying to repeat this bug, but have no results

Are you saying the "bug", i.e. the results you were experiencing, no longer exists?

#9

Expier - July 22, 2009 - 12:33

Call it a bug, call it an issue; whatever you call it, the situation is the same. I tried to replicate it but failed. I could not do it every time just to tell you: "Yes, THIS IS THE BUG"; no, I will not do it. I just need my cron running, and I have it with poormanscron, and I just need my sitemap working, and I have that too.
I am just trying to help the people who maintain this project by reporting this issue to them... If that was not helpful, then never mind - close it.

#10

kiamlaluno - July 22, 2009 - 12:46

Every issue report is welcome, and it is given the weight it deserves. Without users reporting the issues they are having, developers could not know of problems present in the code they develop (also because they cannot check the interactions between their module and the whole plethora of existing modules).

The remark made by earnie is to point out that maybe this is not an XML sitemap bug.
Based on what you said (your site didn't have a $base_url set), the problem is caused by the custom code you had to use to set up your cron task. In fact, when $base_url is not set, Drupal executes the following code:

<?php
// Create base URL
$base_root = (isset($_SERVER['HTTPS']) && $_SERVER['HTTPS'] == 'on') ? 'https' : 'http';

$base_url = $base_root .= '://'. $_SERVER['HTTP_HOST'];

// $_SERVER['SCRIPT_NAME'] can, in contrast to $_SERVER['PHP_SELF'], not
// be modified by a visitor.
if ($dir = trim(dirname($_SERVER['SCRIPT_NAME']), '\,/')) {
  $base_path = "/$dir";
  $base_url .= $base_path;
  $base_path .= '/';
}
else {
  $base_path = '/';
}
?>

If you don't want to see wrong URLs but still need the custom code to enable the cron tasks, try setting $base_url in settings.php.
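
As a rough illustration of that suggestion, the sketch below sets $base_url the way it would appear in settings.php and then repeats the "fixed base URL" branch of Drupal's bootstrap code to show what gets derived from it. The host www.example.com is a placeholder, not the reporter's real site.

```php
<?php
// In settings.php (www.example.com is a placeholder for the real host name).
$base_url = 'http://www.example.com';  // no trailing slash

// What the "fixed base URL" branch of the bootstrap code derives from it.
// $_SERVER['SCRIPT_NAME'] is never consulted on this path.
$parts = parse_url($base_url);
if (!isset($parts['path'])) {
  $parts['path'] = '';
}
$base_path = $parts['path'] . '/';
$base_root = substr($base_url, 0, strlen($base_url) - strlen($parts['path']));

echo $base_path, "\n";  // /
echo $base_root, "\n";  // http://www.example.com
```

Because this branch never looks at the script path, the generated URLs should no longer depend on how cron.php is started.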

#11

kiamlaluno - July 22, 2009 - 12:50

I was trying to repeat this bug, but have no results.

Do you mean that now the URLs shown in the sitemap are correct as you expect them to be?

#12

earnie - July 22, 2009 - 13:18

This module is used by thousands of users (http://drupal.org/project/usage/xmlsitemap), which is why I have a problem calling this a bug. I'm trying to understand why you are having an issue with your reported URLs and no one else is. Except for what Kiam has pointed you to, I know of no other fix. Modify your settings.php file to add the $base_url; do you then have good results? (see http://api.drupal.org/api/function/conf_init/5)

The other question would be what modules do you have installed? Maybe another module is interfering with the results of xmlsitemap.

#13

Expier - July 22, 2009 - 14:38

Look at what I have done:
- I removed the modifications I had made to cron.php.
- I disabled and re-enabled the XML Sitemap module.
- I checked sitemap.xml - it had no errors.

Then Kiam told me to set $base_url in settings.php, and I did the following:
- Restored the modifications to cron.php.
- Ran cron a few times.
- Checked my sitemap (I wanted to see the error replicated before making changes to settings.php) - and I did not see wrong URLs, i.e. the URLs were right.

So now I am not using the modified cron.php (I am using poormanscron, which runs well without modifications), and I could not leave cron running with the modifications for some time because it is my live server.
I will try to get a backup of my site and a dump of the db from the time the error occurred, and if I can, I will check whether $base_url fixes this.

#14

kiamlaluno - July 22, 2009 - 14:24

Actually, version 5.x-1.6 of XML sitemap uses a custom function to get the URL of a Drupal link, and it could be that this function needs to be updated. In this specific case, anyway, I think the problem is primarily caused by the modifications made to cron.php (and the fact that $base_url is not set in settings.php).

We can then say that the problem is specific to the particular configuration, and the issue is not caused by XML sitemap.

#15

murokoma - November 3, 2009 - 01:17
Priority:normal» critical
Status:postponed (maintainer needs more info)» active

Hi there,

I'm having the same problem here on two installations on the same server, using the 1.1 stable version.

In Google Webmaster Tools, I get the messages below. For your information: my site is at www.maciejewski.ch and my sitemap is at www.maciejewski.ch/sitemap.xml.
Physically, the server path to www.maciejewski.ch IS http://web892.login-24.hoststar.ch/maciejewski/, so I assume the plugin somehow pulls the server path instead of the real Drupal path.

I would appreciate any help, thanks! I'll gladly provide more information.
Robert

(putting this back to active with more information).

***********

Fehler 7
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/
Problem ermittelt am: 01.11.2009
Fehler 61
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/great-new-collabo...
Problem ermittelt am: 01.11.2009
Fehler 73
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/little-update-twi...
Problem ermittelt am: 01.11.2009
Fehler 163
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/jaw-dropping-russ...
Problem ermittelt am: 01.11.2009
Fehler 481
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/another-really-up...
Problem ermittelt am: 01.11.2009
Fehler 487
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/doing-seminar-kmu...
Problem ermittelt am: 01.11.2009
Fehler 505
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/its-twitter-time
Problem ermittelt am: 01.11.2009
Fehler 541
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/amazing-dancer-mu...
Problem ermittelt am: 01.11.2009
Fehler 559
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/and-your-money-go...
Problem ermittelt am: 01.11.2009
Fehler 571
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/8-us-firms-have-a...
Problem ermittelt am: 01.11.2009
Fehler 655
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/quick-addition-ye...
Problem ermittelt am: 01.11.2009
Fehler 679
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/must-see-very-int...
Problem ermittelt am: 01.11.2009
Fehler 697
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/about-great-cab-r...
Problem ermittelt am: 01.11.2009
Fehler 715
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/another-proof-its...
Problem ermittelt am: 01.11.2009
Fehler 721
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/ha-thats-just-inc...
Problem ermittelt am: 01.11.2009
Fehler 727
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/horrible-marketin...
Problem ermittelt am: 01.11.2009
Fehler 733
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/quote-week-0
Problem ermittelt am: 01.11.2009
Fehler 739
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/google-drives-me-...
Problem ermittelt am: 01.11.2009
Fehler 745
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/quote-week
Problem ermittelt am: 01.11.2009
Fehler 751
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/its-just-so-good-...
Problem ermittelt am: 01.11.2009
Fehler 757
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/making-it-easier-...
Problem ermittelt am: 01.11.2009
Fehler 769
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/lorem-ipsum-gener...
Problem ermittelt am: 01.11.2009
Fehler 793
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogs/good-nuuz-all-ze-no...
Problem ermittelt am: 01.11.2009
Fehler 799
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogs/gratuliere-guido
Problem ermittelt am: 01.11.2009
Fehler 835
URL nicht zulässig
Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig.
URL: http://web892.login-24.hoststar.ch/maciejewski/blogsen/nice-story-about-...
Problem ermittelt am: 01.11.2009

#16

murokoma - November 3, 2009 - 01:19

Ah well, and as it's German: "Fehler" is error, and the number after it is the sitemap line; it says "URL not valid" (URL nicht zulässig) and "This URL is not valid at this position for an XML sitemap" (Diese URL ist für eine XML-Sitemap an dieser Position nicht zulässig).
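
The sitemaps.org protocol requires every URL in a sitemap to be on the same host as the sitemap itself, which is what this kind of message points to. A minimal sketch of a check for such entries follows. The function sketch_foreign_locs() and the sample host names are hypothetical, and the regular expression is only a rough scan for <loc> elements, not a full XML parse.

```php
<?php
// Hypothetical helper: returns the <loc> URLs whose host differs from the
// host the sitemap is expected to be served from.
function sketch_foreign_locs($sitemap_xml, $expected_host) {
  $foreign = array();

  // Rough scan for <loc>...</loc>; assumes plain text content in each element.
  preg_match_all('#<loc>\s*(.*?)\s*</loc>#is', $sitemap_xml, $matches);

  foreach ($matches[1] as $loc) {
    $host = (string) parse_url($loc, PHP_URL_HOST);
    // Host names are case-insensitive, so compare without regard to case.
    if (strcasecmp($host, $expected_host) !== 0) {
      $foreign[] = $loc;
    }
  }
  return $foreign;
}

// Made-up sample data for illustration only.
$xml = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  . '<url><loc>http://www.example.com/node/1</loc></url>'
  . '<url><loc>http://web1.hosting.example/site/node/1</loc></url>'
  . '</urlset>';

print_r(sketch_foreign_locs($xml, 'www.example.com'));
```

Running it against the real sitemap.xml with the public host name would list the entries a search engine is likely to reject.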

#17

murokoma - November 3, 2009 - 01:22

Oh damn it, I just saw that this was filed for 5.x-1.6 - I am using D6 1.1; what should I do to file this against the correct version? I don't want to mess around with the versioning of this post....

#18

kiamlaluno - November 3, 2009 - 09:59
Priority:critical» normal
Status:active» fixed

Open a new issue for the problem you are having.

I am marking this report as fixed because the OP has never replied to my previous comment.

#19

System Message - November 17, 2009 - 10:00
Status:fixed» closed

Automatically closed -- issue fixed for 2 weeks with no activity.

 
 
