Closed (won't fix)
Project:
XML sitemap
Version:
5.x-2.x-dev
Component:
xmlsitemap.module
Priority:
Normal
Category:
Support request
Assigned:
Unassigned
Reporter:
Created:
18 Aug 2008 at 11:21 UTC
Updated:
15 Dec 2012 at 15:41 UTC
Comments
Comment #1
chrissearle commented
Deleted the files/xmlsitemap/sitemap.xml.gz and regenerated using the http: URL rather than https: - looks OK now.
Comment #2
blb commented
I'm having the same problem, and while the workaround of regenerating the file also worked, what happens when the file is regenerated automatically is still an open question.
xmlsitemap should never generate a map with https links in the files (or for the XSL and CSS files, which it also does on occasion). There is also a problem with it sending pings with https for the URL of the sitemap itself.
There has got to be a way to fix it so secure_pages and xmlsitemap play together nicely.
Comment #3
chrissearle commented
I have to admit that I left it set to update only at cron, but it would be nice if it played nicer :)
Comment #4
dedicated commented
Can you explain how you regenerated the file using HTTP? Thanks.
Comment #5
chrissearle commented
Somewhere in your Drupal site's file area there is a cache file generated by xmlsitemap. Look for something like /xmlsitemap/sitemap.xml.gz. The actual location depends on your files config (public/private and path). If you delete that file and then call cron.php over http, the sitemap will be regenerated.
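The two steps in this comment (delete the cache file, then call cron.php over plain http) can be sketched in Python. This is only an illustration, not part of the module: the function names `regenerate_sitemap` and `_http_get`, the cache file path and the cron URL are all assumptions you would replace with your own values.

```python
import os
import urllib.request


def _http_get(url):
    """Fetch a URL and return the HTTP status code."""
    with urllib.request.urlopen(url) as response:
        return response.status


def regenerate_sitemap(cache_file, cron_url, fetch=_http_get):
    """Delete the cached sitemap, then trigger cron over plain http.

    cache_file and cron_url are site-specific (hypothetical values here).
    """
    if not cron_url.startswith("http://"):
        # Triggering the rebuild over https is what puts https links in the map.
        raise ValueError("cron_url must use http://, got: " + cron_url)
    if os.path.exists(cache_file):
        os.remove(cache_file)
    # Requesting cron.php lets the module rebuild its cache file.
    return fetch(cron_url)
```

A call might look like `regenerate_sitemap("files/xmlsitemap/sitemap.xml.gz", "http://site/cron.php")`, with the path adjusted to your files configuration.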
Comment #6
dedicated commented
Sorry, I wasn't clear. I can regenerate the sitemap file, but how did you manage to get the content listed with http instead of https?
Comment #7
chrissearle commented
For me, http vs. https depends on whether the update is triggered via http or https. When I edit a post, the request is https, so the links in the sitemap are https as well. For that reason I turned off "generate on post".
However, the securepages module doesn't force https for cron.php (at least for me), so calling cron via http meant that I got http links in the map.
If deleting the cache and then calling http://site/cron.php gives you https links, then I'm afraid you're seeing something I am not :( But make sure you are not calling https://site/cron.php or triggering any update of the map from an https link (such as edit post).
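To confirm which protocol ended up in a regenerated sitemap, the file can be scanned for https links. The sketch below is an assumption-laden illustration rather than anything the module ships: `https_links` and `https_links_in_file` are made-up names, and the tag match ignores the XML namespace because the exact namespace used by a given module version is not stated in this thread.

```python
import gzip
import xml.etree.ElementTree as ET


def https_links(sitemap_xml):
    """Return every <loc> URL in the sitemap that starts with https://."""
    root = ET.fromstring(sitemap_xml)
    found = []
    for element in root.iter():
        # Compare only the local tag name so any sitemap namespace is accepted.
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text:
            url = element.text.strip()
            if url.startswith("https://"):
                found.append(url)
    return found


def https_links_in_file(path):
    """Run the same check on a gzip-compressed cache file such as sitemap.xml.gz."""
    with gzip.open(path, "rb") as handle:
        return https_links(handle.read())
```

An empty list from `https_links_in_file(...)` would suggest the map was built over http.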
Comment #8
darren oh
chrissearle: Excellent explanation. Thank you for helping to support this project!
Comment #9
chrissearle commented
Thank you for providing it :)
Comment #10
avpaderno
Comment #11
avpaderno
I am changing the status to fixed, as the cause of the problem and how to fix it have been reported.
Comment #12
chrissearle commented
Are you sure? We have a workaround, sure, but we are not able to use "generate on post" together with the secure pages module.
For me, a fix would be that it worked. However, I have no idea how to work out, from inside the save step of the edit function, which protocol the node view uses.
If that can't be done, then perhaps the module could check for secure pages and show a warning on the settings page?
Comment #13
avpaderno
cron.php should never run under https://, IMHO.
The module could check the URLs it generates, but it's not clear what it should do if it finds any URLs starting with https://.
Would you be able to give me a use case that justifies adding code to the module?
In your opinion, what options should the module offer for handling secure pages? Take into consideration that there would be people who want to use the sitemap even if it contains https:// URLs.
Comment #14
chrissearle commented
Sorry - I was perhaps unclear.
Running this under cron is not an issue as long as there is no https in the cache.
I'm successfully running secure pages and xmlsitemap with cron right now.
The issue comes with running "generate on post" rather than on cron. Everything in this post from here on is in regard to "generate on post", not "generate on cron".
Worth noting is that if the _entire_ site is https then this is also a non-issue (in other words it is not a problem if view and edit are both on https). This means we can't just strip https.
The issue comes where we create a sitemap on one protocol and the view is on another.
The reason seems to be that the node/*/edit page is https at the point of submit. In other words, the constructed URL for the view (correctly handling aliases etc.) is prefixed with the protocol of the edit page.
So - for example - if I edit node: /testnode which internally has a drupal path of node/22 then the view path is
http://domain/testnode
The edit path is
https://domain/node/22/edit
And the sitemap path is
https://domain/testnode
Just done some more digging - $base_path changes. In edit - https, in view - http.
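The effect described here can be illustrated with a small Python sketch. It assumes, as this comment suggests, that the sitemap link is built by prefixing the alias with the base URL of whichever request triggered the rebuild; `sitemap_link` is a hypothetical helper, not the module's code.

```python
from urllib.parse import urljoin


def sitemap_link(request_base_url, alias):
    """Build an absolute link by prefixing the alias with the request's base URL."""
    return urljoin(request_base_url.rstrip("/") + "/", alias.lstrip("/"))


# Rebuild triggered from the http view or from cron over http:
print(sitemap_link("http://domain", "/testnode"))   # http://domain/testnode

# Rebuild triggered while saving from the https edit page:
print(sitemap_link("https://domain", "/testnode"))  # https://domain/testnode
```

The alias is identical in both calls; only the protocol of the triggering request differs, which matches the mismatch shown in the example paths above.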
Now - what can be done to fix this? Perhaps nothing - I'm really moving out of my depth here.
However, I did wonder whether it was worth adding something to the admin form: if the user enables generate on post AND secure pages is installed/enabled, show a message such as "You appear to be using secure pages - if generate on post is used and edit is via https then https URLs will be used for the sitemap". Just to reduce users wondering what happened.
If there was a way to say "get base_path and find out if secure pages is in effect for this url" it would be nice to handle it. Can't see anything useful in securepages code tho.
I find both slightly ugly - one module having to code around another - but I don't see a more elegant way forward :(
Perhaps it's just a documentation issue - mention that this is a known incompatibility and the solution is to only use cron.
Comment #15
avpaderno
Thanks for your reply.
What you would like to see is a warning message that advises the user when he is using both XML Sitemap and Secure Pages and the sitemap is being sent to the search engine after nodes are edited.
I think that is something that can be done; once we have resolved other issues, we can add the warning you suggested.
Comment #16
avpaderno
The code will not be changed to support such cases.
Comment #17
xurizaemon
To work around this issue:
1. At admin/settings/xmlsitemap/engines set "Submit site map when updated" OFF and "Submit site map on cron run" ON.
2. Then call cron.php via HTTP.
I was a bit confused by the UI here, and feel that the option labels don't make it clear how to set "generate on post" (as described by chrissearle above).
The first option at admin/settings/xmlsitemap/engines (aka "generate on post") is labelled,
What this doesn't say is what's meant by 'it'. The setting will make your sitemap regenerate each time a post is updated, but refers passively to "each time it is updated", which made me think that there was a different setting which controlled when the sitemap actually got regenerated.
I suggest that the description of this element be changed for clarification, eg:
Comment #18
avpaderno
If you don't mind, I will open a feature request for this. I think you are correct about the UI not being clear in that.
Thanks for your help.
Comment #19
avpaderno
See #466654: Change 'Submit the sitemap when updated' title and description, which has already been implemented in CVS for the Drupal 5 version of the project.
Thanks again for your suggestion.
Comment #20
Anonymous (not verified) commented
Ran into the same problem, thanks for posting the fix.
Might be easier (and more peace of mind) to have a field where you can set the domain name in the backend.
Comment #21
dave reid
The Drupal 5 version is no longer supported. We've implemented exactly this in 6.x-2.x and 7.x-2.x already.
Comment #22
Anonymous (not verified) commented
Thank you for your reply.
My bad. I was searching through the issue queue and overlooked the version of the module.
Where? Do you mean the search engine fields? They are greyed out.
Did a search on:
site:drupal.org XMLsitemap set url.
site:drupal.org XMLsitemap set domain.
Comment #23
Anonymous (not verified) commented
Comment #24
dave reid
If you're seeing grayed out search engines, you're using 6.x-1.x, not 6.x-2.x. If you are actually using 6.x-2.x, on the admin/settings/xmlsitemap/settings page you'll see a 'Default base URL' field which is used by default to generate all the links in the sitemap.
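The idea behind a 'Default base URL' setting can be sketched in Python: every link takes its scheme and host from one configured value instead of from the request that triggered the rebuild. This is an illustration only, not the module's implementation; `apply_base_url` is a hypothetical name, and the sketch deliberately ignores any path component in the base URL (for example a site installed in a subdirectory).

```python
from urllib.parse import urlsplit, urlunsplit


def apply_base_url(link, default_base_url):
    """Rewrite link so its scheme and host come from default_base_url."""
    base = urlsplit(default_base_url)
    parts = urlsplit(link)
    # Keep the path, query and fragment; replace only scheme and host.
    return urlunsplit((base.scheme, base.netloc, parts.path, parts.query, parts.fragment))


print(apply_base_url("https://domain/testnode", "http://domain"))  # http://domain/testnode
```

With a fixed base URL, it no longer matters whether the rebuild was triggered from an http or an https page.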
Comment #25
Anonymous (not verified) commented
Thank you
Comment #26
Anonymous (not verified) commented
Comment #27
mennonot commented
For anyone who comes across this thread and finds that none of the above solutions work: my problem was the Default base URL setting, under advanced settings at /admin/settings/xmlsitemap/settings. I had mine set to https, probably because secure pages was installed previously and this setting had never been changed. I changed it to http, then clicked on rebuild links (/admin/settings/xmlsitemap/rebuild), and that fixed the problem.
Comment #28
cborgia commented
I echo what #27 says above. Here is an update for 2012: the advanced settings are now at admin/config/search/xmlsitemap/settings. Change your Default base URL to http instead of https.
Comment #29
imiksu
You might be interested in this issue: #1863526: Allow administrators to set URI scheme, which contains a patch for always generating URLs with the HTTP or HTTPS scheme.
Comment #30
avpaderno