Closed (fixed)
Project: Google Sitemap
Version: 4.7.x-1.x-dev
Component: Code
Priority: Normal
Category: Support request
Assigned:
Reporter:
Created: 14 May 2007 at 20:09 UTC
Updated: 24 Oct 2008 at 09:08 UTC
Comments
Comment #1
wwwoliondorcom commented:
Hi,
I have the same problem.
Did you manage to get a perfect sitemap by uninstalling and reinstalling sitemap and pathauto?
It didn't work for me. I only get the clean URL in my sitemap if I edit and then re-save each node (without changing anything in the node).
Thanks.
http://drupal.org/node/143929
Comment #2
badjava commented:
I haven't touched pathauto since it appears to be working fine. But yes, if I re-install XML sitemap, it creates a perfect sitemap from the existing content right after installation. After that, any new content doesn't use the URL aliases.
Comment #3
badjava commented:
I had a look in the database at the gsitemap table. All the problematic nodes have NULL for the pid value. I manually edited these records, setting the correct pid value by looking it up in the url_alias table, and that appears to have fixed the problem. Updating the content node after it's submitted does not fix the problem either. Any idea why the pid field is not getting set correctly?
Comment #4
badjava commented:
I've looked at this some more and I think I've figured out why you need to update content in order to get the XML sitemap to generate properly with the URL alias. I am a newbie to Drupal, so please bear with me.
I assume there is rhyme and reason to the order in which modules do whatever they need to do when content is created, updated, etc. It appears as though the sitemap entry is created before pathauto creates the alias, which could explain why an update is required. When adding new content, the alias is created after the sitemap entry. When you update the content, the alias is already there and the sitemap is then updated with the correct alias.
I think the pid changing problem I reported earlier is caused by the module using the old pid, with pathauto creating a new alias after the XML sitemap entry is created. The gsitemap table always shows the old pid from before the content was updated. I assume that the XML sitemap module has some code that verifies the integrity of the pid field in the gsitemap table, because if it's not correct, it uses the node/nid URL and not the URL alias.
Thoughts?
Comment #5
wwwoliondorcom commented:
Actually, I don't know what to do and hope somebody will help to fix this bug.
I tried uninstalling sitemap again, but it doesn't make a new sitemap with clean URLs when I install it again.
Thanks for your help.
Comment #6
Ryanbach commented:
Yes, I would like this fixed.
Having node/x in the sitemap can cause duplicate content penalties in many/most search engines.
That is not good!
Comment #7
samirnassar commented:
Check out the globalredirect module: http://drupal.org/project/globalredirect. It redirects numbered node URLs to their aliases with a 301 and, after activation, requires no intervention. XMLSitemap shouldn't behave badly, but until it gets fixed, and even after it gets fixed, Global Redirect is pretty nice.
Comment #8
wwwoliondorcom commented:
Thanks for this good idea!
Comment #9
hquadrat commented:
I have a similar problem: when I add content to my Drupal site, the sitemap is not generated correctly. All "directory links" are no longer listed, only the links to the actual pages that have a priority assigned. In addition, some new nodes get listed with their node ID, as stated above. However, to generate a sitemap reflecting the settings made, I just need to visit the XML sitemap settings page in my admin section and re-save it. As I find it important to have the directory links (as defined in pathauto) in the sitemap too, and they just get discarded when content is added, I consider this bug to be critical.
Comment #10
wwwoliondorcom commented:
Help!
Comment #11
darren oh commented:
In order for Pathauto's directories to be added to the site map, Pathauto will need to be rewritten so that it can add links to the site map using hook_gsitemap.
One reason the URL aliases generated by Pathauto are not being used could be that Pathauto sets the alias after the node has been saved to the gsitemap table. Has anyone tried making sure that Pathauto has a higher priority in the system table so it runs before gsitemap?
Comment #12
samirnassar commented:
How do we set a higher priority for a module?
Comment #13
darren oh commented:
There will be a Drupal 5 module for setting module weights available soon at http://drupal.org/project/moduleweight.
Comment #14
wwwoliondorcom commented:
While waiting for this new module, how can I do it manually?
Thanks.
Comment #15
darren oh commented:
You adjust the weight column in the system table.
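For anyone following along, a minimal sketch of how that column might be inspected in MySQL before changing anything. The column names (`name`, `type`, `status`, `weight`) are assumed from the Drupal 5 `system` table, and a table prefix, if the site uses one, is not accounted for.

```sql
-- Assumed schema: Drupal 5 `system` table, no table prefix.
-- Modules with a lower weight have their hooks invoked first.
-- The table name is backticked because SYSTEM is a reserved word in newer MySQL versions.
SELECT name, status, weight
FROM `system`
WHERE type = 'module'
ORDER BY weight, name;
```

Comparing the rows for pathauto and gsitemap in this output shows which of the two currently runs first when a node is saved.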
Comment #16
MacRonin commented:
I have been keeping up with the new versions of sitemap. I was getting the correct path (the full path alias) and not the node numbers up till May 4th. Since then, my sitemap entries are usually of the node/### type. The only exceptions are when I have gone back and manually edited the URL path.
About Global Redirect (Global Redirect 5.x-1.1): I had it installed both before and since the change, and it doesn't stop the complete impact. I can't be positive that there is a direct correlation, but the reason I was looking at my sitemap file to begin with was that I thought something might be wrong, given a drop in my level of activity from the search engines (primarily Google). This drop in activity occurred at about the same time as the switch to the node/### format. So even with the redirect, it appears that Google finds the content at the node/### names less interesting.
I had the .9 version installed until just now, and just upgraded to the .10 version.
Could the switch away from gsitemap as the module name have changed the order of module execution?
Comment #17
MacRonin commented:
Sorry, bad handwriting.
"up till May 4th" should have been "up till May 7th".
Comment #18
darren oh commented:
What happened was that we stopped using url() to generate the links each time the site map was viewed and started joining the gsitemap table to the url_alias table using the pid saved at node creation.
Comment #19
wwwoliondorcom commented:
I have also had this problem since the 7th or 8th of May.
Have you tried changing the module priority?
Comment #20
darren oh commented:
Since I don't use Pathauto, I can't test this for you.
Comment #21
badjava commented:
Yes, adjusting the weight field fixed the problem for me. Instead of setting the priority higher for Pathauto, I set the priority lower for XML Sitemap, since I assume it's probably fine for the sitemap to be generated later, as it shouldn't rely on much, if anything.
There were some modules that had a weight of 10, so I changed gsitemap's weight from 0 to 5. I then added new content and the XML sitemap was created properly.
Thanks for your help.
Comment #22
MacRonin commented:
Thanks for testing this option out and passing along the info. I'll be giving it a try myself shortly.
Do you know if it updates the previously created node/### entries?
And did you use the http://drupal.org/project/moduleweight module to change the weights, or did you go directly to the database?
Comment #23
wwwoliondorcom commented:
At first, in the system table in the database, all weights were set to 0, except metatags at 10.
Using the weight module, I changed the sitemap weight in this table from 0 to 1.
Now all the new nodes have the clean URL in the sitemap. Thank you!
Now I just need to find out how to update the previous nodes that still don't have the clean URL in the sitemap.
Comment #24
adminfor@inforo.com.ar commented:
Fixed yesterday using the priority change. Thanks a lot.
About fixing previous nodes: you can change the NULL value in the sitemap table (manually, or by matching against the 'node/nnnnn' entries in url_alias), or just edit and save each node, using the wrong sitemap as a reference (obviously...).
Enjoy!!!
Gustavo
http://www.inforo.com.ar
Comment #25
wwwoliondorcom commented:
Hello,
I have too many nodes to edit the NULL pid value by hand.
How can I completely delete my sitemap and generate a new one?
And is there any way to automatically edit and then save all my nodes to get the clean URLs in the sitemap?
Thanks.
Comment #26
badjava commented:
If you can't update the gsitemap table manually or update each node, the next best thing is to disable the XML sitemap module and then uninstall it. When you re-enable it, a new sitemap will be created, and it should contain all your correct URL aliases.
Comment #27
MacRonin commented:
I installed the weight module and set xmlsitemap to a weight of 1, created a new entry, and then went back and checked the sitemap. It's back to using the URL as generated by pathauto. Thanks for the info.
Does anyone know if this weight setting will survive upgrades of the sitemap module? I'm guessing it would stay, but I want to make sure.
Comment #28
Ryanbach commented:
Does anyone have the SQL code to change gsitemap to a priority of 1? This should be added to the module so that we only have to run update.php when we install the latest version (and thus fix this annoying bug...).
Comment #29
badjava commented:
Not sure if this helps, but this is the code generated by phpMyAdmin:
Comment #30
Ryanbach commented:
We really need to set the gsitemap module to a priority of 1 in gsitemap.install, using the _update function to do so. Then no one will have this problem, hopefully.
Comment #31
TiViTi commented:
Same problem. Please fix it.
Comment #32
TiViTi commented:
Open phpMyAdmin,
go to the system table,
find the gsitemap row,
and change the priority (the weight column) from 0 to 1.
Everything works fine for me now :)
Comment #33
Ryanbach commented:
Yeah, I know, and that works in the meantime, but until gsitemap.install sets the weight of the gsitemap row in the system table to 1, this isn't going to be fixed.
Comment #34
Tobias Maier commented:
Please don't change the issue's title.
Comment #35
Ryanbach commented:
UPDATE system SET weight = 1 WHERE name = 'gsitemap';
should work...
Comment #36
vovaodei commented:
Yep,
worked for me also...
Comment #37
ashtonium commented:
OK, no one has posted a patch yet, so here you go.
I like to leave breathing room between my weight values if possible, so this patch sets the weight value to 5, but as long as the module is run after pathauto it should all be good.
I've tested the upgrade and the install processes and both set the module weight properly. Since a number of people have already successfully tested the end result, I'm marking this as RTBC.
Comment #38
moshe weitzman commented:
Why does sitemap avoid using url()? For faster sitemap generation?
No other module I know of does that. Without looking into it, I would say that sitemap is the misbehaving child here. It is silly to force people to regenerate sitemaps and fiddle with module weights. Every other part of Drupal immediately starts using an alias once it has been defined.
Comment #39
moshe weitzman commented:
Now I have looked at the code and see that we have our own copy of url(), our own drupal_get_normal_path(), and our own alias column. This strikes me as a heavy price for a speed improvement.
In gsitemap_output_chunk(), we do a query to retrieve a bunch of nodes for writing to the sitemap file. Could we not JOIN with the url_alias table at that time and avoid storing our own cache?
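To make the suggestion concrete, a rough sketch of what a join at output time could look like. This is not the module's actual query: the `node` columns (`nid`, `status`) and `url_alias` columns (`src`, `dst`) are assumed from the Drupal 5 schema, and table prefixes are ignored.

```sql
-- Hypothetical output-time lookup: resolve each published node's alias
-- directly from url_alias instead of relying on a pid stored at node creation.
-- Falls back to node/<nid> when no alias exists.
SELECT n.nid,
       COALESCE(ua.dst, CONCAT('node/', n.nid)) AS path
FROM node n
LEFT JOIN url_alias ua
       ON ua.src = CONCAT('node/', n.nid)
WHERE n.status = 1
ORDER BY n.nid;
```

One caveat with this shape: a node that has several aliases would come back as several rows, so a real implementation would need to pick one of them.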
Comment #40
mr.j commented:
I applied the weight fix and it works, but of course all the old nodes are still referenced using /node/ in the sitemap. So, for anyone who wants to do a bulk update on all of their nodes after applying the weight fix, use the devel module's PHP box and execute this code. It loads each of your nodes and saves them one at a time. After running this, all the node references in the site map will use the URL aliases.
Comment #41
liam mcdermott commented:
Confirming this problem, and that the patch works. It should be applied forthwith; there's no way such an important module should be left in this broken state!
Comment #42
rkendall commented:
Yes, this is an annoying bug. Pathauto and gsitemap are both essential modules in my opinion, and it would be good if they worked well together.
What was the reason for this (7th May) change? And can it be reverted?
Comment #43
darren oh commented:
The reason was that using url() on a site with hundreds of thousands of nodes is so resource intensive that the site map would never be generated if it were done each time. It takes long enough to initialize the site map just once.
Comment #44
rkendall commented:
Thanks for responding, Darren. Looks like you are doing a good job on this module.
Comment #45
darren oh commented:
By the way, in response to #39, the current version of gsitemap does not keep its own copy of the alias column. It joins to the url_alias table using the pid column. We use our own version of url() because we already have the alias and don't need to look it up.
Because it is impossible to predict another module's weight, this will have to be solved (partially) with a more complicated patch.
Comment #46
darren oh commented:
Since I don't use Pathauto, I need someone to test this patch and let me know if there are any problems.
Comment #47
rkendall commented:
I'll give it a test. Cheers.
Comment #48
rkendall commented:
The patch is working for me without any problems so far.
Comment #49
rkendall commented:
Sorry, forgot to mention: I tested with Drupal 5.2 and Pathauto 5.x-2.0-beta3 (current releases).
Comment #50
darren oh commented:
Good enough. Fixed in CVS commit 83635.
Comment #51
moshe weitzman commented:
Re #45: I see that in xmlsitemap_nodeapi(insert) we still record the path alias in the xmlsitemap table. Perhaps it isn't used anymore when generating, but it certainly is recorded.
Comment #52
darren oh commented:
Moshe, are you sure we're talking about the same version?
Comment #53
lunen33 commented:
Sorry, I'm new to Drupal.
I can't seem to get the patch to work correctly. Can someone attach the updated sitemap.install file with the patched code?
Thank you
Comment #54
darren oh commented:
http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/gsitemap/gs...
Comment #55
lunen33 commented:
Thanks for the quick response.
I'm able to generate URL aliases for taxonomy terms for new nodes, but the nodes themselves are still not getting an alias in the sitemap. I installed the patch and pathauto-5.x-2.0-beta. Maybe it's an issue with pathauto. Do you recommend I change the module weighting as mentioned above?
Comment #56
darren oh commented:
If you use the new gsitemap.install, it should change the module weight when you run update.php.
Comment #57
(not verified) commented
Comment #58
anoopjohn commented:
Hi,
I faced the same issue and fixed it by changing the weight of the sitemap module. But you don't have to update the pid values on your own. Here is a simple query that does it for you:
update
gsitemap left join url_alias
on concat('node/',gsitemap.nid) = url_alias.src
set gsitemap.pid=url_alias.pid
where gsitemap.pid is null and url_alias.pid is not null
Try the select query first if you wish to verify that you are going to update the correct records.
select * from
gsitemap left join url_alias
on concat('node/',gsitemap.nid) = url_alias.src
where gsitemap.pid is null and url_alias.pid is not null
cheers
Anoop
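As a follow-up check, one might count the rows that are still unlinked after running the update. This is only a sketch reusing the table and column names from the queries above (`gsitemap.nid`, `gsitemap.pid`, `url_alias.src`, `url_alias.pid`); it assumes MySQL and no table prefix.

```sql
-- Rows that still have no pid even though a matching alias exists.
-- After the UPDATE above has run, this is expected to return 0.
SELECT COUNT(*) AS still_unlinked
FROM gsitemap
LEFT JOIN url_alias
       ON CONCAT('node/', gsitemap.nid) = url_alias.src
WHERE gsitemap.pid IS NULL
  AND url_alias.pid IS NOT NULL;
```

Nodes that have no alias at all are deliberately excluded here, since there is nothing to link them to.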
Comment #59
Oceria commented:
This might be a stupid question, but how do we set the weight of xml-sitemap correctly? Should it be something like "-5" or "+5"? I tried "+2" and re-saved pages, then ran cron.php, but nothing changed...
*Edit* Maybe I should read a bit better. The answer is "+5". Giving that a try, peeps.
Comment #60
darren oh commented:
Running cron won't change anything. You have to re-save the nodes.
Comment #61
Oceria commented:
It's even worse: I changed the weight to +2 for sitemap, but my new nodes are still built like /node/# instead of the alias. What am I doing wrong? I am using Drupal 5.3, pathauto 5.x-2.0 and gsitemap 5.x-1.11.
The weight of pathauto is set to 1 (schema version 4) and gsitemap is set to 2 (schema version -1).
Do I need to change the schema version instead? (Both are still as I first found them.)
Comment #62
darren oh commented:
Nothing is going to change for your existing nodes. You must re-save them. You can modify your database directly with the SQL query in #58.
Comment #63
Oceria commented:
Sorry, I was not very clear in post #61: I meant that a new post, made after changing the weight of the sitemap module to 2, still showed up as "/node/#". This was a newly written node. The older nodes are indeed not affected and need to be changed by hand, or with the script.
PS: Thanks for replying so quickly! :)
Comment #64
Oceria commented:
Well, that was stupid: in the system table there was still an entry for xmlsitemap, and that is the one I had been making very heavy, instead of gsitemap. Let's see if that works. I'll keep you posted!
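A mix-up like this can be spotted by listing the candidate rows side by side. A small sketch, assuming the Drupal 5 `system` table columns `name`, `status` and `weight`, where a `status` of 1 means the module is enabled:

```sql
-- Show which sitemap module row is actually enabled and what weight each row carries.
-- The table name is backticked because SYSTEM is a reserved word in newer MySQL versions.
SELECT name, status, weight
FROM `system`
WHERE name IN ('gsitemap', 'xmlsitemap', 'pathauto');
```

The weight only matters on the row of the module that is enabled, and it needs to be higher than pathauto's for the sitemap entry to be written after the alias.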
Comment #65
Oceria commented:
And it works as advertised. My problem was a PEBKAC.
Comment #66
Rollie commented:
With respect, could someone add a comment to the sitemap module page http://drupal.org/project/xmlsitemap about it not working properly with the pathauto module? I never would have installed it if I had seen the comments above (going back to the middle of last year), and now I've lost a whole bunch of aliases.
Thanks.
Comment #67
Oceria commented:
With respect, this issue has been resolved in previous versions of pathauto/xmlsitemap. A module has even been written that helps in setting the module weight, which determines when a module acts as a node is added, but this should not be needed in the latest versions of both pathauto and xmlsitemap.
The problem occurs when you had gsitemap installed previously and then replaced it with xmlsitemap. The old database entries of gsitemap are not removed. Pathauto then might use the obsolete gsitemap weight to set its own weight.
Solution:
In maintenance mode, set pathauto to bulk generate new aliases. You need to run cron multiple times to complete that task. Your sitemap will then use the rewritten URLs.
This worked perfectly and is now working as it should on my Drupal 5.6 installation, with xmlsitemap version 5.x-1.4 and pathauto version 5.x-2.0.
Closing this topic again. Please stop changing the status of this topic.
(Due to an error, the 5.x and 6.x versions are not shown in the Drupal taxonomy. Therefore I am forced to set it to "4.7-1.x-dev", but this topic belongs to 5.x-1.x-dev.)
Comment #68
akatangac commented:
Is it not enough, or just inefficient, to change line 73 of xmlsitemap_node.module to use drupal_get_path_alias?
'#loc' => xmlsitemap_url('node/'. $node->nid, drupal_get_path_alias('node/'. $node->nid), NULL, NULL, TRUE),
I did this and everything seems fine. Is there a problem with doing this?
Comment #69
dinaiz commented:
There's also another sitemap module, much simpler, but it seems to work.
You can get it here: http://dinaiz-two-dot-zero.blogspot.com/2008/06/easily-add-sitemap-to-dr...
(Original post here : http://www.seo-expert-blog.com/tutorial/drupal-6-xml-sitemap-for-nodes#c...)
Comment #70
ari-meetai commented:
I had the same problem with Pathauto in 6.x (clean URLs not being generated).
The fix was restoring Pathauto's weight to 0.
Thanks.
Comment #71
kungfumaster commented:
Had the same issue as above. In our case, Pathauto was installed after XML Sitemap. This is how we fixed it:
- Pathauto's weight is 1, so we made the Xmlsitemap weight >2
- Ran the updates shown in #58 above, but changed the table names to xmlsitemap_node
- Then applied the same update (code below) to xmlsitemap_term so that taxonomy URL aliases are also picked up in the sitemap.
Hope this is useful to someone.
thanks
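The code from this comment is not preserved here. As a hedged illustration only, an adaptation of the #58 update for taxonomy terms might look like the following. It assumes that `xmlsitemap_term` has `tid` and `pid` columns and that term paths are stored in `url_alias.src` as `taxonomy/term/<tid>`; neither is confirmed by this thread, so check the table structure and run a SELECT with the same join first.

```sql
-- Hypothetical adaptation of the #58 update for taxonomy terms.
-- Assumed columns: xmlsitemap_term.tid, xmlsitemap_term.pid.
UPDATE xmlsitemap_term
LEFT JOIN url_alias
       ON CONCAT('taxonomy/term/', xmlsitemap_term.tid) = url_alias.src
SET xmlsitemap_term.pid = url_alias.pid
WHERE xmlsitemap_term.pid IS NULL
  AND url_alias.pid IS NOT NULL;
```

As with the node query, taking a database backup before running a multi-table UPDATE is the cautious route.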