The description of taxonomy_node_get_terms is as follows:

Find all terms associated to the given node, ordered by vocabulary and term weight.

It actually isn't supposed to sort by vid/vocabulary (the description is ambiguous though) but in my opinion it should be, because it greatly simplifies the usage in modules. That's why a wrote a patch (which probably isn't good enough ;-)).

Comments

David Lesieur’s picture

I think the description actually means "Find all terms associated to the given node, ordered by vocabulary weight and term weight". Could you give an example where sorting also by vid greatly simplifies a module?

killes@www.drop.org’s picture

Version: 4.7.1 » x.y.z

features go into cvs.

webchick’s picture

Version: x.y.z » 6.x-dev
Status: Needs review » Needs work

No longer applies.

catch’s picture

Version: 6.x-dev » 7.x-dev
catch’s picture

Status: Needs work » Closed (won't fix)

'vocabulary and term weight' makes plenty of sense especially with the query being self explanatory, IMO this is won't fix.

jhodgdon’s picture

Status: Closed (won't fix) » Needs review

I think this decision of "won't fix" should be revisited. It's also a bug in D6, and it really makes the function fairly worthless if you have multiple vocabularies for a single node type. At a minimum the documentation should be revised to reflect what the function does (shouldn't have to read the code, that's the reason we have documentation).

The reason is that if you have multiple vocabularies on a given node type, and you haven't dragged the vocabularies to assign them weights (i.e. just left them alphabetical), the terms do not come out grouped by vocabulary. In fact, if you've assigned weights to the terms within one vocabularly, perhaps, and left the others alphabetical, you'll see your terms in some kind of mixture of alphabetical and by weight (you don't generally see the weights at all on the screen in D6/D7, because the interface is drag and drop rather than assigning numbers, so who knows what weights are being assigned behind the scenes?).

My preference would be to change the function to make the ORDER BY clause in the query be v.weight, v.name, t.weight, t.name, so that the terms would be grouped by vocabulary no matter what. This is NOT what the above patch does (that uses v.vid, which I don't think makes any sense at all).

If you are not going to make that change, then please consider changing the documentation of the function, because to me "ordered by vocabulary" implies that the terms would be grouped together by vocabulary, in the same order I would see the vocabularies on the Taxonomy list page. Change it to "ordered by vocabulary weight" and put a note in saying that if you haven't changed the vocab weights to something other than alphabetical, there is no ordering by vocabulary at all.

cburschka’s picture

I also have so far thought that all "weight" fields were merely an overriding value to affect the natural sort order (by ID or alphabet). The inference would be that if you use any weight field for sorting, you would also need to use the natural order as a second sorting field. This was just my interpretation, of course.

catch’s picture

Status: Needs review » Needs work

Not sure why the testbot never made it here, but that patch won't apply any more now the taxonomy has been converted to dbtng.

jhodgdon’s picture

Category: feature » bug
Status: Needs work » Needs review
StatusFileSize
new603 bytes

Here is a patch for D7 that adds the vocabulary name as a backup sort order, after vocabulary weight.

catch’s picture

jhodgdon - could you post an EXPLAIN on the new query, or even better benchmarks?

jhodgdon’s picture

Ummm. Not sure how to acheive that, any pointers?

The two current users of this function in Drupal 6 core (taxonomy_nodeapi and taxonomy_form_alter), and the one in Drupal 7 core (taxonomy_form_alter), won't benefit from this patch, by the way, because they don't care about the order of the returned terms.

catch’s picture

If you've got a D6 site with a version of the patch, easiest thing to do is get devel query log, copy the query from there, put into mysql command line or phpmyadmin, and then prefix with 'explain'.

For benchmarking, probably give anonymous users access to create/edit a content type with a taxonomy select on it (ideally a large vocabulary - at least couple of hundred terms and at least 1-2000 in the database altogether) - then run ab -c1 -n500 on that page before and after the patch. I have a feeling the query is currently so unoptimised that adding the extra sort won't make it noticeably worse, but if it does we should know.

jhodgdon’s picture

OK, here's Explain output anyway...

Here's a sample query from D6, before patching:

SELECT t.* FROM d1_term_node r INNER JOIN d1_term_data t ON r.tid = t.tid INNER JOIN d1_vocabulary v ON t.vid = v.vid WHERE r.vid = 1 ORDER BY v.weight, t.weight, t.name

Explain (may need to widen your browser screen to see this well):

+----+-------------+-------+--------+--------------------------------+---------+---------+---------------+------+---------------------------------+
| id | select_type | table | type   | possible_keys                  | key     | key_len | ref           | rows | Extra                           |
+----+-------------+-------+--------+--------------------------------+---------+---------+---------------+------+---------------------------------+
|  1 | SIMPLE      | r     | ref    | PRIMARY,vid                    | vid     | 4       | const         |    1 | Using temporary; Using filesort | 
|  1 | SIMPLE      | t     | eq_ref | PRIMARY,taxonomy_tree,vid_name | PRIMARY | 4       | drupal6.r.tid |    1 |                                 | 
|  1 | SIMPLE      | v     | eq_ref | PRIMARY                        | PRIMARY | 4       | drupal6.t.vid |    1 |                                 | 
+----+-------------+-------+--------+--------------------------------+---------+---------+---------------+------+---------------------------------+

After patching (cannot apply the patch exactly as DB format is different, but it's the same effect), the query is:

SELECT t.* FROM d1_term_node r INNER JOIN d1_term_data t ON r.tid = t.tid INNER JOIN d1_vocabulary v ON t.vid = v.vid WHERE r.vid = 1 ORDER BY v.weight, v.name, t.weight, t.name

And the EXPLAIN output is:

+----+-------------+-------+--------+--------------------------------+---------+---------+---------------+------+---------------------------------+
| id | select_type | table | type   | possible_keys                  | key     | key_len | ref           | rows | Extra                           |
+----+-------------+-------+--------+--------------------------------+---------+---------+---------------+------+---------------------------------+
|  1 | SIMPLE      | r     | ref    | PRIMARY,vid                    | vid     | 4       | const         |    1 | Using temporary; Using filesort | 
|  1 | SIMPLE      | t     | eq_ref | PRIMARY,taxonomy_tree,vid_name | PRIMARY | 4       | drupal6.r.tid |    1 |                                 | 
|  1 | SIMPLE      | v     | eq_ref | PRIMARY                        | PRIMARY | 4       | drupal6.t.vid |    1 |                                 | 
+----+-------------+-------+--------+--------------------------------+---------+---------+---------------+------+---------------------------------+

For reference (from looking at http://api.drupal.org/api/function/taxonomy_schema/6):

term_data has indices:

    'primary key' => array('tid'),
    'indexes' => array(
      'taxonomy_tree' => array('vid', 'weight', 'name'),
      'vid_name' => array('vid', 'name'),
    ),

and vocabulary has indices:

   'primary key' => array('vid'),
    'indexes' => array(
      'list' => array('weight', 'name'),
    ),

D7 has the same indices for the equivalent tables (table names changed to taxonomy_term_data and taxonomy_vocabulary).

So it doesn't look like the "list" index is being used, even in the new query. Maybe we should index separately on "weight" and "name"? Shall I add that to the patch? Thoughts?

I'll see what I can do about generating lots of DB size in order to benchmark...

jhodgdon’s picture

Interestingly, the term_node table has indices for vid and nid, but not tid (this is true in D6 as well as D7). It might help to add an index on tid as well? Pretty much any query on term_node would tend to be joined to term_data on tid, wouldn't it?

I'm not a database index efficiency expert though... I'm not sure which parts of queries indices actually help optimize, is it just WHERE and ORDER BY, or do indices help optimize JOIN ON as well?

catch’s picture

I think it's pretty unlikely the existing query uses indexes - order by across tables usually results in the worst possible query - that temporary table in both EXPLAINs sounds about right. What I'm not sure about is whether adding the order by on vocabulary name makes the existing bad query worse, or has no impact either way.

The term_node tid index seems covered by the primary key already I think?

jhodgdon’s picture

Not sure. term_node's primary key is the combination (tid,vid).

Benchmarks coming shortly...

jhodgdon’s picture

Here are benchmark results, using ab -c1 -n500 on a node edit page, as catch suggested.

I generated 1000 nodes of this type with the devel module (there were a few others already there, but not many), and this content type uses two vocabularies (also generated with devel; one has 90 entries and the other has 10).

Benchmark results for 500 page queries:

- old query (no v.name ordering):

Concurrency Level:      1
Time taken for tests:   281.484 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      8815000 bytes
HTML transferred:       8542000 bytes
Requests per second:    1.78 [#/sec] (mean)
Time per request:       562.967 [ms] (mean)
Time per request:       562.967 [ms] (mean, across all concurrent requests)
Transfer rate:          30.58 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   408  563 164.1    480     926
Waiting:      378  525 156.8    443     873
Total:        408  563 164.1    480     926

Percentage of the requests served within a certain time (ms)
  50%    480
  66%    635
  75%    714
  80%    738
  90%    841
  95%    851
  98%    857
  99%    861
 100%    926 (longest request)

- new query (with v.name ordering):

Concurrency Level:      1
Time taken for tests:   285.488 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      8815000 bytes
HTML transferred:       8542000 bytes
Requests per second:    1.75 [#/sec] (mean)
Time per request:       570.977 [ms] (mean)
Time per request:       570.977 [ms] (mean, across all concurrent requests)
Transfer rate:          30.15 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   408  571 163.8    498     864
Waiting:      379  531 156.2    467     809
Total:        408  571 163.8    498     864

Percentage of the requests served within a certain time (ms)
  50%    498
  66%    620
  75%    717
  80%    753
  90%    844
  95%    852
  98%    857
  99%    860
 100%    864 (longest request)
jhodgdon’s picture

Just for kicks, I manually added a couple of indices to the database (name, weight, and tid in term_data, and weight in vocabulary). I re-ran the benchmark on the unmodified query (with no vocabulary.vid sorting):

Concurrency Level:      1
Time taken for tests:   255.347 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      8815000 bytes
HTML transferred:       8542000 bytes
Requests per second:    1.96 [#/sec] (mean)
Time per request:       510.693 [ms] (mean)
Time per request:       510.693 [ms] (mean, across all concurrent requests)
Transfer rate:          33.71 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   407  511 141.0    419     858
Waiting:      371  475 134.4    389     804
Total:        407  511 141.1    419     858

Percentage of the requests served within a certain time (ms)
  50%    419
  66%    494
  75%    589
  80%    649
  90%    741
  95%    841
  98%    853
  99%    856
 100%    858 (longest request)
jhodgdon’s picture

What do you think? It looks to me like adding the extra sorting term is a pretty small effect (mean time went up by 8ms, and median went up by 18ms, and both are way less than 1 standard deviation). Adding indices to the tables has a bigger effect (about 50-60ms time reduction).

Status: Needs review » Needs work

The last submitted patch failed testing.

jhodgdon’s picture

Status: Needs work » Needs review

Setting back to Needs Review in hopes that the patch will get retested. The test bot was apparently malfunctioning (I had about 7 patches fail in that time period, all were doc, none broke the HEAD install I am pretty sure).

Status: Needs review » Needs work

The last submitted patch failed testing.

andypost’s picture

Status: Needs work » Needs review

Bot was failed

jhodgdon’s picture

This patch still applies cleanly, and the benchmark tests have been done. Any hope of getting a review?

jhodgdon’s picture

Issue tags: +API change

adding tag

catch’s picture

I'm very much hoping we can remove this entirely with #412518: Convert taxonomy_node_* related code to use field API + upgrade path.

Status: Needs review » Needs work

The last submitted patch failed testing.

jhodgdon’s picture

Version: 7.x-dev » 6.x-dev

Apparently, taxonomy terms are now attached to nodes using the field API. So this function no longer exists in Drupal 7. And it looks like each taxonomy field in D7 has a vocabulary ID on it, so presumably you would query each field separately to find the terms. Anyway, the whole issue just doesn't apply to D7.

I would still like the function to be fixed in Drupal 6, because I think it's a bug that the results are not grouped by vocabulary, since the doc says they should be. Or else it's a doc bug.

Anyway... setting to D6...

jhodgdon’s picture

And by the way, the above benchmarks were done in Drupal 6.

jhodgdon’s picture

StatusFileSize
new1.06 KB

Here is a patch for Drupal 6.

jhodgdon’s picture

Status: Needs work » Needs review
andypost’s picture

StatusFileSize
new1.96 KB

This patch works fins and +1 for rtbc this but this is a part of issue - only a view

When you at the edit node form - vocabularies are sorted wrong

function taxonomy_form_alter()
....
    $c = db_query(db_rewrite_sql("SELECT v.* FROM {vocabulary} v INNER JOIN {vocabulary_node_types} n ON v.vid = n.vid WHERE n.type = '%s' ORDER BY v.weight, v.name", 'v', 'vid'), $node->type);

Order is same as patch provides but after tracing I found that order of vocabs is changed in final drupal_render()

So I think to make order of vacabs with same weight we need add "SubWeight" in taxonomy_form_alter() to edit widgets

jhodgdon’s picture

#33 sounds like a separate issue to me - that vocabularies on the node edit page are not presented by order of their weights assigned on the taxonomy admin screens -- or at least, I think that is what you are saying?

The issue filed here is that this specific function, taxonomy_node_get_terms(), does not do what it says it does.

andypost’s picture

@jhodgdon yes, I file another issue about

So we need more reviews on your patch #31 to rtbc, I've tested it already

thedavidmeister’s picture

Status: Needs review » Closed (won't fix)

Aside from the fact that the patch no longer applies:

error: taxonomy/taxonomy.module: No such file or directory

This would be an API change for D6 for a problem that doesn't exist in D7+ so I don't think it will ever be fixed.