| Project: | XML sitemap |
| Version: | 5.x-1.6 |
| Component: | xmlsitemap.module |
| Category: | support request |
| Priority: | critical |
| Assigned: | Unassigned |
| Status: | closed (fixed) |
Issue Summary
I found several references to mysite.com/node/xxx in my sitemap. This is a problem because I disallow robots from accessing anything under node/* to prevent duplicate content.
When I went to the url on the sitemap, node/xxx, I saw that I was redirected to the correct path alias.
When I examined the xmlsitemap_node table, I noticed that the pid that was referenced for that node no longer existed.
I believe this is because the xmlsitemap_node table is not updated when you use the bulk generate feature of pathauto. I can confirm this does not update the xmlsitemap_node table because I enabled the usernode module which created nodes with no alias. I checked my sitemap, sure enough, they were node/xxx. I then went to pathauto and bulk generated aliases for usernodes as profiles/xxx. I then checked my sitemap and the nodes were still listed as node/xxx.
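To check whether a site is affected, something along these lines should list the nodes whose stored pid points at an alias that no longer exists. This is only a sketch for Drupal 5: the function name example_xmlsitemap_stale_nids is made up for illustration and is not part of XML Sitemap, and it assumes the pid column of xmlsitemap_node behaves as described above.

```php
<?php
/**
 * Hypothetical helper (not part of xmlsitemap): list the nids in
 * {xmlsitemap_node} whose pid has no matching row in {url_alias}.
 */
function example_xmlsitemap_stale_nids() {
  $nids = array();
  $result = db_query("
    SELECT xn.nid
    FROM {xmlsitemap_node} xn
    LEFT JOIN {url_alias} ua ON ua.pid = xn.pid
    WHERE xn.pid IS NOT NULL AND ua.pid IS NULL
  ");
  while ($row = db_fetch_object($result)) {
    $nids[] = $row->nid;
  }
  return $nids;
}
```

It can be run from a page using the PHP input format, for example with print implode(', ', example_xmlsitemap_stale_nids()); an empty result would mean every stored pid still exists in url_alias.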
To solve this problem I ran a simple PHP script, placed in an unpublished page, that re-saves all usernodes so that their aliases are updated:
<?php
// Re-save every usernode so that xmlsitemap picks up its current alias.
$res = db_query("SELECT n.nid FROM {node} n WHERE n.type = 'usernode'");
print '<ol>';
while ($n = db_fetch_object($res)) {
  // node_save() already takes its argument by reference, so the call-time
  // "&" is unnecessary (and deprecated in later PHP versions).
  node_save($n);
  print '<li>'. l($n->title, 'node/'. $n->nid) .'</li>';
}
print '</ol>';
?>
I believe this is also the cause when taxonomy terms are not updated in the sitemap.
Comments
#1
Got the same boring problem.
The solution for me was to disable XML Sitemap and contributor modules, then also _remove_ them, then enable them again and reconfigure content types etc., which is weird.
I don't want thousands of nodes to be re-saved (the modified time changes, the process is slow, ...).
I'm thinking of a small tool that backs up the XML Sitemap settings, then disables, uninstalls, installs and enables the modules that were set up, and restores the settings.
Is there any discussion out there about a reliable solution for this behavior of XML Sitemap?
#2
We experienced the same problem on our site; however, we originally did not have pathauto installed.
After reading around and trying some techniques I found here and on other sites, I wound up editing xmlsitemap_node.module in the function _xmlsitemap_node_links (line 51). All I did was force Drupal to grab a new alias after the joins (I commented where I edited).
I installed pathauto when trying some of the techniques I found, but none of them helped. This is the only solution I was able to get to work. Can anyone tell me if this is a good or bad fix? While I understand what I did below, I don’t have a complete grasp on the entire code base for either Drupal or this module, so I’m not sure if this is going to break something down the road, etc.
Edit: I also wanted to mention that I tried using the module weights module before attempting to edit anything. That did not work for me, although I do still have it set up so that the xmlsitemap modules run after everything else. To test my results, I needed to disable cache in includes/bootstrap.inc.
<?php
function _xmlsitemap_node_links($excludes = array()) {
  $links = array();
  if (module_exists('comment')) {
    $sql = "
      SELECT n.nid, n.type, n.promote, s.comment_count, n.changed, xn.previously_changed, s.last_comment_timestamp, xn.previous_comment, xn.priority_override, ua.dst AS alias
      FROM {node} n
      LEFT JOIN {node_comment_statistics} s ON s.nid = n.nid";
  }
  else {
    $sql = "
      SELECT n.nid, n.type, n.promote, n.changed, xn.previously_changed, xn.priority_override, ua.dst AS alias
      FROM {node} n";
  }
  $sql .= "
    LEFT JOIN {xmlsitemap_node} xn ON xn.nid = n.nid
    LEFT JOIN {url_alias} ua ON ua.pid = xn.pid
    WHERE n.status > 0
    AND (n.type NOT IN ('". implode("', '", $excludes) ."') AND xn.priority_override IS NULL OR xn.priority_override >= 0)
    AND n.nid <> %d";
  $result = db_query(db_rewrite_sql($sql), _xmlsitemap_node_frontpage());
  while ($node = db_fetch_object($result)) {
    // Edit :: Feb 20
    // Force Drupal to lookup the path for this node
    $new_alias = drupal_lookup_path('alias', 'node/'. $node->nid);
    // drupal_lookup_path() returns FALSE when the node has no alias, so fall
    // back to the system path in that case.
    if (!$new_alias) {
      $new_alias = 'node/'. $node->nid;
    }
    // End
    $links[] = array(
      'nid' => $node->nid,
      // Edit :: Feb 20
      // Replaced "'node/'. $node->nid" with "$new_alias"
      '#loc' => xmlsitemap_url($new_alias, $node->alias, NULL, NULL, TRUE),
      // End edit
      '#lastmod' => variable_get('xmlsitemap_node_count_comments', TRUE) ? max($node->changed, $node->last_comment_timestamp) : $node->changed,
      '#changefreq' => xmlsitemap_node_frequency($node),
      '#priority' => xmlsitemap_node_priority($node),
    );
  }
  return $links;
}
?>
#3
This is a dupe, please delete this comment. Sorry.
#4
I have tried disabling/removing and then re-adding both pathauto and XML Sitemap. I tried setting the module weight to 1 with everything else at 0, and manually ran cron several times... nothing seems to work. My sitemap is at http://www.yourautorights.com/sitemap.xml.
Seems I can only get alias URLs to show in xml sitemap if I go to each individual page and resubmit or resend each page. Is this the way to do this or should I be doing something else?
Also, Google has given me the following warning for my sitemaps
HTTP Error: Found: 301 (Moved permanently)
URL: http://www.yourautorights.com/taxonomy/term/34
URL: http://www.yourautorights.com/taxonomy/term/22
Feb 23, 2008
URLs not followed
When we tested a sample of the URLs from your Sitemap, we found that some URLs were not accessible to Googlebot because they contained too many redirects. Please change the URLs in your Sitemap that redirect and replace them with the destination URL (the redirect target). All valid URLs will still be submitted.
I would appreciate any detailed help I could get. Thank you in advance for your time and attention.
#5
Disabling the XML Sitemap modules is not enough. You must also uninstall them.
#6
I set the module weight to 10 so it's the last thing to load, and it works 99% of the time.
I still get a few nodes (8 out of 250) that don't get their aliased paths, but most of them do.
I also use Global Redirect, so even if the non-aliased links are there, they are redirected by a 301 to the right URL, which is better than nothing.
#7
I did uninstall both pathauto and XML Sitemap a few times. I re-installed both several times, ran cron several times, adjusted weights before and after, etc. I methodically tried every variation of every suggested fix (other than touching PHP code), both here and through a Google search, but still have the same problems. I can only get the sitemap to show aliases if I update each item individually, with or without a change to the item. I would be happy to give you access if it helps to find and fix the issues with XML Sitemap.
I like this module and recognize its importance for SEO, but it definitely has issues that have not been fixed or dealt with elsewhere.
Thanks again for your time and assistance.
#8
Here is a solution to update the table via an SQL query...
UPDATE xmlsitemap_node x, url_alias u SET x.pid = u.pid WHERE u.src = CONCAT('node/', x.nid);
UPDATE xmlsitemap_term x, url_alias u SET x.pid = u.pid WHERE u.src = CONCAT('taxonomy/term/', x.tid);
I just did this to my site - it was suffering the same problem. It took less than a second to update over 1000 nodes. Actually - it probably needs a little tweaking for nodes which don't have an alias... But the basics are there.
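For the rows whose alias was deleted and never recreated, one possible tweak is to reset the stale pid to NULL so the row no longer points at a missing alias. This is an untested sketch, MySQL only; the function name is made up for illustration, and it assumes the pid column in both tables accepts NULL.

```php
<?php
/**
 * Hypothetical helper (not part of xmlsitemap): clear pids that point at
 * deleted aliases in the node and term tables. MySQL syntax only.
 */
function example_xmlsitemap_clear_orphan_pids() {
  foreach (array('xmlsitemap_node', 'xmlsitemap_term') as $table) {
    // Rows with a pid that has no matching {url_alias} row get pid = NULL.
    db_query("
      UPDATE {". $table ."} x
      LEFT JOIN {url_alias} u ON u.pid = x.pid
      SET x.pid = NULL
      WHERE x.pid IS NOT NULL AND u.pid IS NULL
    ");
  }
}
```

Since the two UPDATE statements above join on src, they already repair rows that have a current alias; this sketch only touches the rows that are left without one.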
There is probably more that needs doing (maybe for nodes with no alias?). It also doesn't cover the xmlsitemap_additional table... I assume a similar principle applies?!
#9
The SQL queries in #8 solved this for me. Note that you'll have to delete the cached sitemap file (in files/xmlsitemap) so it will be regenerated.
Raising priority to critical, as pathauto is one of the most popular contrib modules.
#10
I wrote a tiny module that just hooks cron to update the node and taxonomy aliases that weren't updated in the xmlsitemap tables. There might be a better way to do this, but this works for me and it only runs during cron, so it's lightweight.
Download the zip and extract it inside your xmlsitemap folder in sites/all/modules. Enable the module and run cron.
Test it and let me know if this fixes it... it worked for me.
#11
Yes, it fixes it, but in the xmlsitemapauto.info file the comma in the line dependencies = xmlsitemap, pathauto needs to be removed, leaving just a space:
dependencies = xmlsitemap pathauto
Otherwise Drupal does not allow the module to be installed, because the dependency check fails.
#12
Thanks,
So this one should work... I removed the comma
#13
xmlsitemap_pathauto module solved this for me!
#14
Ditto!!!!
xmlsitemap_pathauto module solved this for me!
Thanks!!!
#15
Duplicate of issue 198173.
#16
The solution in #12 worked for me too
thanks so much
#17
Hi,
Could you prepare a patch for Drupal 6 based on the #12 solution!?
I have the same issue ("/node/..." URLs in the sitemap for "bulk generated" aliases) but this patch doesn't work for me :(
#18
Hi,
I've made some changes to the xmlsitemap_pathauto module (thank you, wmclark) for my Drupal 6.6 test site. As Drupal 6 has the url_alias table by default (I don't have the pathauto module installed at all, but have had the same issue, as some recent aliases were simply not taken into consideration when the sitemap was generated...), I've decided to remove the dependency and rename the module to avoid any potential confusion.
Module - as it is - seems to work fine for me, so here it is - if anyone else needs it...
Maybe it would be convenient to have such an option (to regenerate or reset the xmlsitemap "manually") within XML Sitemap module itself (some admin setting or button or so).
#19
Hi,
First off I've set status to active because having had the same problem I'm certain it's not a duplicate of #198173. That issue was about the time it takes to do a large (usually first time) update. This is about xmlsitemap not picking up changes to url alias made by pathauto. Apologies if I've violated protocol but I'm a newbie.
Secondly, I don't think that xmlsitemap_pathauto and xmlsitemap_fix_aliases in #12 and #18 are a complete fix. The former worked for me the first time I tried it, but then a bit of pathauto shenanigans later (deleting aliases and recreating them via pathauto bulk update) I was back where I started, with xmlsitemap_pathauto proving completely ineffective.
The problem is that the xmlsitemap_node table (and xmlsitemap_term I presume) is not updated when pathauto does a bulk update. The xmlsitemap_pathauto module fixes this for nodes that have never been aliased but if like me you've deleted and bulk updated then the old pids in xmlsitemap_node table are all out of sync and don't get updated.
The proper fix, I imagine, is to patch the xmlsitemap_node_cron hook in xmlsitemap_node.module (same for xmlsitemap_term) to update any rows with invalid pids (i.e. pids that don't exist in the url_alias table), and safest to check that the src corresponds to the nid/tid (just in case the pid sequence got reset?!?).
As for a workaround, I'd say that nicholasThompson's approach in #8 is closest to the mark.
More apologies for not providing a patch but I'm up against it at the mo...
#20
Ooops! I was a bit hasty in saying that the xmlsitemap_node was not updated after a bulk pathauto update. It is, but the sql that does it in the cron_hook only updates rows with NULL pids and sits in the if block so it only runs if there's a new alias (which may explain the apparently random behaviour reported here and elsewhere). I propose that changing the WHERE clause to 'WHERE xn.pid <> ua.pid' should do the trick... and bringing the update statements out of the if block.
A few minutes later and I've tried this (with mysql only) and it works. Here's my diff from the 5.x-1.6 version:
@@ -362,6 +362,8 @@
  * Implementation of hook_cron().
  */
 function xmlsitemap_node_cron() {
+  $updated = FALSE;
+
   if (db_result(db_query_range("SELECT COUNT(*) FROM {node} n LEFT JOIN {xmlsitemap_node} xn ON xn.nid = n.nid WHERE xn.nid IS NULL", 0, 1))) {
     $query = "
       INSERT INTO {xmlsitemap_node} (nid, last_changed, last_comment, previous_comment)
@@ -373,25 +375,30 @@
       GROUP BY n.nid, n.changed, s.last_comment_timestamp
     ";
     db_query($query);
-    switch ($GLOBALS['db_type']) {
-      case 'mysql':
-      case 'mysqli':
-        db_query("
-          UPDATE {xmlsitemap_node} xn INNER JOIN {url_alias} ua
-          ON ua.src = CONCAT('node/', CAST(xn.nid AS CHAR))
-          SET xn.pid = ua.pid
-          WHERE xn.pid IS NULL
-        ");
-        break;
-      case 'pgsql':
-        db_query("
-          UPDATE {xmlsitemap_node}
-          SET pid = {url_alias}.pid
-          FROM {url_alias}
-          WHERE {url_alias}.src = CONCAT('node/', CAST(nid AS VARCHAR)) AND {xmlsitemap_node}.pid IS NULL
-        ");
-        break;
-    }
+    $updated = TRUE;
+  }
+
+  switch ($GLOBALS['db_type']) {
+    case 'mysql':
+    case 'mysqli':
+      $query = "
+        UPDATE {xmlsitemap_node} xn INNER JOIN {url_alias} ua
+        ON ua.src = CONCAT('node/', CAST(xn.nid AS CHAR))
+        SET xn.pid = ua.pid
+        WHERE xn.pid <> ua.pid
+      ";
+      break;
+    case 'pgsql':
+      $query = "
+        UPDATE {xmlsitemap_node}
+        SET pid = {url_alias}.pid
+        FROM {url_alias}
+        WHERE {url_alias}.src = CONCAT('node/', CAST(nid AS VARCHAR)) AND {xmlsitemap_node}.pid <> {url_alias}.pid
+      ";
+      break;
+  }
+
+  if ($updated || db_affected_rows(db_query($query))) {
     xmlsitemap_update_sitemap();
   }
 }
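One caveat worth checking with the xn.pid <> ua.pid condition: in SQL a comparison against NULL is never true, so rows whose pid is still NULL (which is what the original WHERE xn.pid IS NULL clause was catching) would no longer be picked up. A sketch of a condition that covers both cases follows; it is MySQL only and untested against the module, and the function name is made up for illustration.

```php
<?php
/**
 * Hypothetical helper (not part of xmlsitemap): sync pids for rows that are
 * either unset (NULL) or out of date. MySQL syntax only.
 *
 * @return
 *   TRUE if at least one row was changed.
 */
function example_xmlsitemap_node_sync_pids() {
  db_query("
    UPDATE {xmlsitemap_node} xn INNER JOIN {url_alias} ua
    ON ua.src = CONCAT('node/', CAST(xn.nid AS CHAR))
    SET xn.pid = ua.pid
    WHERE xn.pid IS NULL OR xn.pid <> ua.pid
  ");
  // Drupal 5's db_affected_rows() reports on the most recent query.
  return db_affected_rows() > 0;
}
```

In the patched hook this would presumably replace the final check, along the lines of if ($updated || example_xmlsitemap_node_sync_pids()) { xmlsitemap_update_sitemap(); }.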
#21
I updated the version this issue refers to. If you still have 5.1.4, update to 5.1.6.
Anyway, new code and fixes will only be added to 5.x-2.x-dev, which is the base for the future 5.2.0 version.
#22
#23
I report here a way to solve the issue, taken from #202923: path URL alias not getting through to sitemap.
#24
I am setting this report to fixed as it has not received any feedback.
#25
Automatically closed -- issue fixed for 2 weeks with no activity.