I'm getting an error when installing xmlsitemap_node

    * warning: pg_query() [function.pg-query]: Query failed: ERROR: duplicate key violates unique constraint "xmlsitemap_node_pkey" in /home/davidt/www/drivesouth/includes/database.pgsql.inc on line 125.
    * user warning: query: INSERT INTO xmlsitemap_node (nid, pid, last_changed, last_comment, previous_comment) SELECT n.nid, ua.pid, n.changed, s.last_comment_timestamp, MAX(c.timestamp) FROM node n LEFT JOIN node_comment_statistics s ON s.nid = n.nid LEFT OUTER JOIN comments c ON c.nid = n.nid AND c.timestamp < s.last_comment_timestamp LEFT JOIN xmlsitemap_node xn ON xn.nid = n.nid LEFT JOIN url_alias ua ON ua.src = 'node/' || n.nid WHERE xn.nid IS NULL GROUP BY n.nid, ua.pid, n.changed, s.last_comment_timestamp in /home/davidt/www/drivesouth/includes/database.pgsql.inc on line 145.

Drupal 5.2 / PostgreSQL 8.2

Any ideas?

DT

Comments

darren oh’s picture

Run this query and report your result:

SELECT n.nid, ua.pid, n.changed, s.last_comment_timestamp, MAX(c.timestamp) FROM node n
LEFT JOIN node_comment_statistics s ON s.nid = n.nid
LEFT OUTER JOIN comments c ON c.nid = n.nid AND c.timestamp < s.last_comment_timestamp
LEFT JOIN xmlsitemap_node xn ON xn.nid = n.nid
LEFT JOIN url_alias ua ON ua.src = 'node/' || n.nid
WHERE xn.nid IS NULL
GROUP BY n.nid, ua.pid, n.changed, s.last_comment_timestamp
HAVING COUNT(n.nid) > 1
darren oh’s picture

Status: Active » Postponed (maintainer needs more info)
davidwhthomas’s picture

StatusFileSize
new99.13 KB

Thanks for the reply.

The query returned 1 result.

Please see the attached screenshot.

regards,

DT

P.S I think this may be related to the node paths being the plain path as opposed to the clean urls path in the sitemap file.
Would be great to get it resolved!

davidwhthomas’s picture

Status: Postponed (maintainer needs more info) » Active
darren oh’s picture

Now let's see if we can find out why that node is being duplicated. Try this query:

SELECT n.nid, ua.pid, n.changed, s.last_comment_timestamp, MAX(c.timestamp) FROM node n
LEFT JOIN node_comment_statistics s ON s.nid = n.nid
LEFT OUTER JOIN comments c ON c.nid = n.nid AND c.timestamp < s.last_comment_timestamp
LEFT JOIN xmlsitemap_node xn ON xn.nid = n.nid
LEFT JOIN url_alias ua ON ua.src = 'node/' || n.nid
WHERE xn.nid IS NULL AND n.nid = 799
GROUP BY n.nid, ua.pid, n.changed, s.last_comment_timestamp
darren oh’s picture

Status: Active » Postponed (maintainer needs more info)
davidwhthomas’s picture

StatusFileSize
new97.58 KB

Thanks for the reply,

Here's the output from the query

(see attached screenshot)

DT

darren oh’s picture

It didn't show any duplicate entries. I've seen a similar unexplained error in issue 192928.

davidwhthomas’s picture

Status: Postponed (maintainer needs more info) » Active

still getting the first error in the logs each cron run.

Is there a way to fix it?

darren oh’s picture

Status: Active » Postponed (maintainer needs more info)

Testing is needed to find a query that works. Contact me directly if you would like help with this.

davidwhthomas’s picture

Thanks for the reply,

I've had to disable the cron functionality for the time being because of this error.

(renamed hook_cron to hook_cron_disabled)

I can't give you direct access to our database / system as it's behind firewalls and the only way in is ssh.

But I'm happy to test different queries if you can post them here and give you the results promptly.

thanks again for your help, it would be great to get this resolved.

regards,

DT

darren oh’s picture

Posting here is too slow. I can use PHP code to run queries from within Drupal, or we can chat and you can run the queries. Either way, it would be best for you to contact me directly.

cheechq’s picture

I was having the same problem with Drupal 5.3 and mySQL. The problem is that the query does not account for multilpe rows in the url_alias with the same value in the src field. I did not fix the query (davidwhthomas and project administrators can do this) but I did find a work around for my scenario. In my case, I had multilpe rows in the url_alias with the same value in the src field because I have the pathauto module installed and I entered a value in the "URL path settings" form field on the create content (aka node) that is different from what pathauto would generate. When I did this two rows were inserted into the url_alias table with the same src field value, one created automatically by pathauto and one for the value I entered in the "URL path settings" form field. To work around this I just updated my nodes and cleared the "URL path settings" from each node. Then I used the link created by pathauto when reference this content throughout my site.

oboema’s picture

I an also having problems with the xml site map module;

When I run Cron i get the following error

user warning: INSERT TABLE 'xmlsitemap_node' isn't allowed in FROM table list query: INSERT INTO xmlsitemap_node (nid, pid, last_changed, last_comment, previous_comment) SELECT n.nid, ua.pid, n.changed, s.last_comment_timestamp, MAX(c.timestamp) FROM node n LEFT JOIN node_comment_statistics s ON s.nid = n.nid LEFT OUTER JOIN comments c ON c.nid = n.nid AND c.timestamp < s.last_comment_timestamp LEFT JOIN xmlsitemap_node xn ON xn.nid = n.nid LEFT JOIN url_alias ua ON ua.src = CONCAT('node/', n.nid) WHERE xn.nid IS NULL GROUP BY n.nid, ua.pid, n.changed, s.last_comment_timestamp in /var/www/vhosts/thehaguecentral.com/httpdocs/includes/database.mysql.inc on line 172.

I am running the latest drupal the dec. 18th xmlsitemap dev release and I have path auto and clean urls installed but I do not have clean urls in my sitemap

http://thehaguecentral.com/sitemap.xml

Can anyone help me?

deviantintegral’s picture

StatusFileSize
new1.36 KB

I've attached a patch which seems to fix the issue for me. I'm using MySQL. This is against stable.

Of course, now I realize it's the same as the query above for pgsql :P

I haven't touched the pgsql code as I have no way to test it.

Thanks!
--Andrew

davidwhthomas’s picture

deviantintegral: thanks for this, it looks like a reasonable solution.

Ths code is standard sql and will work in postgresql or mysql.

With the modification, duplicate url alias pids should be subsumed into grouping by nid with this patch which should overcome the duplicate nid key error.

I will test it out tomorrow and post back - looks promising :-)

DT

darren oh’s picture

Status: Postponed (maintainer needs more info) » Needs work

PostgreSQL will not select a single value from a group of values. Thus, when results are grouped, we must group by each column from which we wish to select a single value. If we do not wish to group by ua.pid, we must use an aggregate function to retrieve a value from each group of results.

MySQL, on the other hand, will select a single value from a group of results if asked to do so. However, the value it selects is random and unpredictable, so it's poor design to rely on this MySQL quirk.

We just need to decide on the right aggregate function:

MySQL 5.0 Reference Manual
PostgreSQL 8.2: Aggregate Functions

davidwhthomas’s picture

The patch by deviantintegral

worked for me.

Are there any issues I should consider? or is it ok to use this?

DT

davidwhthomas’s picture

In answer to my own question, with the patch applied I get:

pg_query(): Query failed: ERROR: column "ua.pid" must appear in the GROUP BY clause or be used in an aggregate function in /var/www/.../includes/database.pgsql.inc on line 125.

It looks like you're right there with an aggregate function being required, Darren.

any suggestions?

DT

davidwhthomas’s picture

adding the aggregate function max() worked.

This should select the latest and greatest pid.

/**
 * Implementation of hook_cron().
 */
function xmlsitemap_node_cron() {
  if (db_result(db_query_range("SELECT COUNT(*) FROM {node} n LEFT JOIN {xmlsitemap_node} xn ON xn.nid = n.nid WHERE xn.nid IS NULL", 0, 1))) {
    $query = "
      INSERT INTO {xmlsitemap_node} (nid, pid, last_changed, last_comment, previous_comment)
      SELECT n.nid, max(ua.pid), n.changed, s.last_comment_timestamp, MAX(c.timestamp) FROM {node} n
      LEFT JOIN {node_comment_statistics} s ON s.nid = n.nid
      LEFT OUTER JOIN {comments} c ON c.nid = n.nid AND c.timestamp < s.last_comment_timestamp
      LEFT JOIN {xmlsitemap_node} xn ON xn.nid = n.nid";
    switch ($GLOBALS['db_type']) {
      case 'mysql':
      case 'mysqli':
        $query .= "
          LEFT JOIN {url_alias} ua ON ua.src = CONCAT('node/', n.nid)";
        break;
      case 'pgsql':
        $query .= "
          LEFT JOIN {url_alias} ua ON ua.src = 'node/' || n.nid";
        break;
    }
    $query .= "
      WHERE xn.nid IS NULL
      GROUP BY n.nid, n.changed, s.last_comment_timestamp
    ";
    db_query($query);
    xmlsitemap_update_sitemap();
  }
}

note: the max(ua.pid)

DT

darren oh’s picture

Ideally, we would get the same result as drupal_lookup_path(), which simply returns the first result.

darren oh’s picture

Ideally, we would get the same result as drupal_lookup_path(), which simply returns the first result.

davidwhthomas’s picture

Thanks for the reply,

Just to let you know,

max(ua.pid)

gets the first / top result and is working for me.

DT

darren oh’s picture

Just to clarify, was that tested on a node with multiple URL aliases?

davidwhthomas’s picture

I assume so, it's the same DB (more or less) that raised the original error in this thread.

Is there a query I can run to check if duplicate aliases are in the url_aliases table?

I tried:

SELECT src, count(*) FROM url_alias GROUP BY src;

but couldn't see any dupes.

DT

darren oh’s picture

Try this: SELECT * FROM url_alias GROUP BY src HAVING count(pid) > 1

If that doesn't show duplicates, we haven't identified the cause of this bug.

davidwhthomas’s picture

The query gave :

'pid must appear in group by clause...' etc...

so I tried:

SELECT * FROM url_alias GROUP BY src, pid, dst HAVING count(pid) > 1;

and got:

0 results.

However, this query:

SELECT src, count(*) FROM url_alias GROUP BY src HAVING count(src) > 1;

gave 1 result

psql=> SELECT src, count(*) FROM url_alias GROUP BY src HAVING count(src) > 1;
    src    | count
-----------+-------
 node/1120 |     2
(1 row)

and lastly:

psql=> SELECT * FROM url_alias WHERE src = 'node/1120';
 pid  |    src    |          dst
------+-----------+------------------------
 8829 | node/1120 | A1GP-competition
 8835 | node/1120 | info/1120/competitions
(2 rows)

DT

darren oh’s picture

Sorry for giving you a bad query. Now the the question is, does

SELECT dst FROM {url_alias} WHERE src = 'node/1120' LIMIT 1

produce A1GP-competition or info/1120/competitions?

davidwhthomas’s picture

No problem, thanks for your help on this! xmlsitemap is a great module and worth refining :-)

Here's the latest query result

psql=> SELECT dst FROM url_alias WHERE src = 'node/1120' LIMIT 1;
          dst
------------------------
 info/1120/competitions
(1 row)
darren oh’s picture

Status: Needs work » Fixed

Fixed in CVS commit 94686. This whole issue will become irrelevant when issue 198173 is fixed.

darren oh’s picture

Title: SQL error when enabling xmlsitemap_node » SQL error when node has multiple aliases
darren oh’s picture

Version: 5.x-1.x-dev » 5.x-1.4
Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.