I have both the XML Sitemap and Secure Pages modules installed.

Secure pages is set to switch the site over to https for the following:

node/add*
node/*/edit
user/*
admin*

However, all URLs in the sitemap are https instead of http. Any clues as to how to get them to be http would be nice (all pages that Google should be bothering with are http).

As it is, since the site is set to switch back to http for any non-protected pages, Google just moans about too many redirects (the redirect from https to http).
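To make the setup above concrete, here is a minimal Python sketch of which paths the four patterns would send to https. It is an illustration only: Secure Pages does its own matching in PHP, and the function name `expected_scheme` is a hypothetical helper, not part of either module.

```python
from fnmatch import fnmatchcase

# The four patterns listed above, copied as-is.
SECURE_PATTERNS = ["node/add*", "node/*/edit", "user/*", "admin*"]


def expected_scheme(path: str) -> str:
    """Return the scheme a visitor would end up on for a Drupal path.

    Assumption: '*' matches any run of characters, including '/'.
    """
    path = path.strip("/")
    if any(fnmatchcase(path, pattern) for pattern in SECURE_PATTERNS):
        return "https"
    return "http"


if __name__ == "__main__":
    for p in ["node/22", "node/22/edit", "user/1", "admin/settings", "testnode"]:
        print(p, "->", expected_scheme(p))
```

Every path a search engine should crawl (plain node views and their aliases) falls on the http side, which is why https links in the sitemap always end in a redirect.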

Comment  File               Size     Author
#24      Selection_056.png  95.6 KB  dave reid

Comments

chrissearle’s picture

Status: Active » Closed (fixed)

Deleted the files/xmlsitemap/sitemap.xml.gz and regenerated using the http: URL rather than https: - looks OK now.

blb’s picture

Status: Closed (fixed) » Active

I'm having the same problem, and while the workaround of regenerating the file also worked, it seems to me that what happens when the file is regenerated automatically is still an open question.

xmlsitemap should never generate a map with https links in the files (or for the xsl and css files, which it also does on occasion). There is also a problem with it sending pings with https for the URL of the sitemap itself.

There has got to be a way to fix it so secure_pages and xmlsitemap play together nicely.

chrissearle’s picture

Have to admit that I left it set to only update at cron - but - it would be nice if it would play nicer :)

dedicated’s picture

Can you explain how you regenerated the file using HTTP? Thanks.

chrissearle’s picture

Somewhere in your Drupal site's file area there is a cache file generated by xmlsitemap. Look for something like /xmlsitemap/sitemap.xml.gz. The actual location depends on your files config (public/private and path). If you delete that and then call cron.php over http, it will regenerate.

dedicated’s picture

Sorry, I wasn't clear. I can regenerate the sitemap file, but how did you manage to get the content listed with http instead of https?

chrissearle’s picture

For me, http vs. https depends on whether you trigger the update via http or https. So when I edit a post, the request is https, and therefore the links in the sitemap are https. For that reason I turned off "generate on post".

However - securepages module doesn't force https for cron.php (at least for me) - so calling cron via http meant that I got http links in the map.

If deleting the cache then calling http://site/cron.php gives you https links then I'm afraid you're seeing something I am not :( But make sure you are not calling https://site/cron.php or triggering any update of the map from an https link (such as edit post).
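The behaviour described in this comment can be sketched in a few lines of Python. This is not the module's code; it only assumes, as the comment reports, that sitemap links are built from the scheme and host of whatever request triggered the regeneration. `sitemap_link` is a hypothetical name.

```python
from urllib.parse import urljoin, urlsplit


def sitemap_link(triggering_request_url: str, alias: str) -> str:
    """Build an absolute link the way the thread describes:
    the scheme and host come from the request that triggered the rebuild."""
    parts = urlsplit(triggering_request_url)
    base_url = f"{parts.scheme}://{parts.netloc}/"
    return urljoin(base_url, alias.lstrip("/"))


if __name__ == "__main__":
    # Rebuild triggered by cron over plain http.
    print(sitemap_link("http://site/cron.php", "testnode"))
    # Rebuild triggered by saving a node on an https edit page.
    print(sitemap_link("https://site/node/22/edit", "testnode"))
```

Under that assumption the same alias comes out as http when cron triggers the rebuild and as https when a node save on a secured edit page does.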

darren oh’s picture

chrissearle: Excellent explanation. Thank you for helping to support this project!

chrissearle’s picture

Thank you for providing it :)

avpaderno’s picture

Title: XML Sitemap together with secure pages - https in sitemap » Using https:// in sitemaps

avpaderno’s picture

Status: Active » Fixed

I am changing the status to fixed, as the cause of the problem and how to fix it have both been reported.

chrissearle’s picture

Are you sure? We have a workaround sure - but we are not able to use "generate on post" together with the secure pages module.

For me, a fix would be that it worked. However, I have no idea how to work out which protocol a node will be viewed under while inside the save step of the edit form.

If not then perhaps the module could check for secure pages and have a warning on the settings page ?

avpaderno’s picture

Status: Fixed » Active

cron.php should never run under https:// IMHO.
The module could check the URLs it generated, but it's not clear what it should do if it finds any URLs starting with https://.

Would you be able to give me a use case which justifies the addition of code to the module?
What options should the module, in your opinion, offer for handling secure pages? Take into consideration that there would be people who want to use the sitemap even if it contains https:// URLs.

chrissearle’s picture

Sorry - I was perhaps unclear.

Running this under cron is not an issue as long as there is no https in the cache.

I'm successfully running secure pages and xmlsitemap with cron right now.

The issue comes with running "generate on post" rather than on cron. Everything in this post from here on is in regard to "generate on post", not "generate on cron".

Worth noting is that if the _entire_ site is https then this is also a non-issue (in other words it is not a problem if view and edit are both on https). This means we can't just strip https.

The issue comes where we create a sitemap on one protocol and the view is on another.

The reason seems to be that the node/*/edit page is https at the point of submit. In other words, the constructed URL for the view (correctly handling aliases etc.) is prefixed with the protocol of the edit page.

So - for example - if I edit node: /testnode which internally has a drupal path of node/22 then the view path is

http://domain/testnode

The edit path is

https://domain/node/22/edit

And the sitemap path is

https://domain/testnode

Just done some more digging: $base_url changes. In edit it is https, in view it is http.

Now - what can be done to fix this? Perhaps nothing - I'm really moving out of my depth here.

However, I did wonder if it was worth adding something to the admin form: if the user enables "generate on post" AND secure pages is installed/enabled, show a message that says "You appear to be using secure pages - if generate on post is used and edit is via https then https URLs will be used for the sitemap" or something similar. Just to reduce users wondering what happened.
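The proposed check is small enough to sketch. The Python below is only an outline of the logic, not module code: the function name, the `generate_on_post` flag and the machine name `securepages` are assumptions made for illustration.

```python
from typing import Iterable, Optional

WARNING = (
    "You appear to be using secure pages - if generate on post is used "
    "and edit is via https then https URLs will be used for the sitemap"
)


def secure_pages_warning(enabled_modules: Iterable[str],
                         generate_on_post: bool) -> Optional[str]:
    """Return the warning text when both conditions hold, otherwise None.

    'securepages' is an assumed machine name for the Secure Pages module.
    """
    if generate_on_post and "securepages" in set(enabled_modules):
        return WARNING
    return None


if __name__ == "__main__":
    print(secure_pages_warning(["xmlsitemap", "securepages"], True))
    print(secure_pages_warning(["xmlsitemap", "securepages"], False))
```

With cron-only generation the function returns nothing, which matches the observation that the cron setup works fine alongside secure pages.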

If there was a way to say "get base_path and find out if secure pages is in effect for this url" it would be nice to handle it. Can't see anything useful in securepages code tho.

I find both slightly ugly - one module having to code around another - but I don't see a more elegant way forward :(

Perhaps it's just a documentation issue: mention that this is a known incompatibility and that the solution is to only use cron.

avpaderno’s picture

Version: 5.x-1.6 » 5.x-2.x-dev
Status: Active » Postponed

Thanks for your reply.
What you would like to see is a warning message that would advise the user if he is using both XML Sitemap and Secure Pages, and the sitemap is being sent to the search engine after the editing of nodes.

I think that is something that can be done; once we have resolved the other issues, we can add the warning you suggested.

avpaderno’s picture

Component: xmlsitemap » xmlsitemap.module
Status: Postponed » Closed (won't fix)

The code will not be changed to support such cases.

xurizaemon’s picture

To work around this issue:

1. At admin/settings/xmlsitemap/engines set "Submit site map when updated" OFF and "Submit site map on cron run" ON.

2. Then call cron.php via HTTP.
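After applying the two steps above, it may be worth confirming that the regenerated sitemap really contains no https links. The following Python sketch does that for a sitemap already loaded as a string. It assumes the standard sitemaps.org namespace, and `https_locs` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by the sitemaps.org 0.9 protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def https_locs(sitemap_xml: str) -> List[str]:
    """Return every <loc> value in the sitemap that starts with https://."""
    root = ET.fromstring(sitemap_xml)
    locs = (loc.text.strip() for loc in root.findall(".//sm:loc", NS) if loc.text)
    return [url for url in locs if url.startswith("https://")]


if __name__ == "__main__":
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://domain/testnode</loc></url>
      <url><loc>https://domain/othernode</loc></url>
    </urlset>"""
    print(https_locs(sample))
```

An empty list means the workaround took effect; any entries returned are links that would still trigger the https to http redirect.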

I was a bit confused by the UI here, and feel that the option labels don't make it clear how to set "generate on post" (as described by chrissearle above).

The first option at admin/settings/xmlsitemap/engines (aka "generate on post") is labelled,

Submit site map when updated.
If enabled, search engines will be notified of changes to the site map each time it is updated.

What this doesn't say is what's meant by 'it'. The setting will make your sitemap regenerate each time a post is updated, but refers passively to "each time it is updated", which made me think that there was a different setting which controlled when the sitemap actually got regenerated.

I suggest that the description of this element be changed for clarification, eg:

Submit site map on content updates
If enabled, search engines will be notified of changes to the site map each time the site content is updated.

avpaderno’s picture

If you don't mind, I will open a feature request for this. I think you are correct about the UI not being clear in that.

Thanks for your help.

avpaderno’s picture

See #466654: Change 'Submit the sitemap when updated' title and description, which has already been implemented in CVS for the Drupal 5 version of the project.

Thanks again for your suggestion.

Anonymous’s picture

Status: Closed (won't fix) » Active

Ran into the same problem, thanks for posting the fix.

Might be easier (and more peace of mind) to have a field where you can set the domain name in the backend.

dave reid’s picture

Status: Active » Closed (won't fix)

The Drupal 5 version is no longer supported. We've implemented exactly this in 6.x-2.x and 7.x-2.x already.

Anonymous’s picture

Thank you for your reply.

My bad. I was searching through the issue queue and overlooked the version of the module.

"We've implemented exactly this in 6.x-2.x and 7.x-2.x already."

Where? Do you mean the search engine fields? They are greyed out.

Did a search on:

site:drupal.org XMLsitemap set url.
site:drupal.org XMLsitemap set domain.

Anonymous’s picture

Version: 5.x-2.x-dev » 6.x-1.2
Status: Closed (won't fix) » Active

dave reid’s picture

Version: 6.x-1.2 » 5.x-2.x-dev
Status: Active » Closed (won't fix)
Status  File               Size
new     Selection_056.png  95.6 KB

If you're seeing grayed out search engines, you're using 6.x-1.x, not 6.x-2.x. If you are actually using 6.x-2.x, on the admin/settings/xmlsitemap/settings page you'll see a 'Default base URL' field which is used by default to generate all the links in the sitemap.

Anonymous’s picture

Status: Closed (won't fix) » Fixed

Thank you

Anonymous’s picture

Status: Fixed » Closed (won't fix)

mennonot’s picture

For anyone who comes across this thread and finds that none of the above solutions work: I found that my problem was the "Default base URL" field under advanced settings at /admin/settings/xmlsitemap/settings. I had mine set to https, probably because secure pages was installed previously and this setting had never been changed. I changed it to http, then clicked on rebuild links (/admin/settings/xmlsitemap/rebuild), and that fixed the problem.
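As a rough illustration of why the "Default base URL" setting matters, the Python sketch below rewrites a link so that its scheme and host come from a configured base URL rather than from the request that triggered the rebuild. It is a simplification under stated assumptions: it ignores any sub-directory in the base URL, and `apply_base_url` is a hypothetical name, not the module's function.

```python
from urllib.parse import urlsplit, urlunsplit


def apply_base_url(link: str, default_base_url: str) -> str:
    """Return the link with scheme and host taken from default_base_url.

    Path, query string and fragment of the original link are kept.
    """
    base = urlsplit(default_base_url)
    parts = urlsplit(link)
    return urlunsplit(
        (base.scheme, base.netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(apply_base_url("https://domain/testnode", "http://domain"))
```

The flip side, as this comment found, is that a base URL left at https produces https links no matter how cron is called.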

cborgia’s picture

Version: 5.x-2.x-dev » 7.x-2.x-dev

I echo what #27 says above, with an update for 2012: the advanced settings are now at admin/config/search/xmlsitemap/settings. Change your Default base URL to http instead of https.

imiksu’s picture

You might be interested in this issue: #1863526: Allow administrators to set URI scheme, which contains a patch for always generating URLs with either the HTTP or the HTTPS scheme.

avpaderno’s picture

Version: 7.x-2.x-dev » 5.x-2.x-dev