Project:Leftandright - Nested Set Taxonomy
Version:7.x-1.x-dev
Component:Code
Category:feature request
Priority:normal
Assigned:Unassigned
Status:active

Issue Summary

Submit a patch. Advocate for the change. Garner and respond to feedback. Changes to taxonomy can languish in contrib forever and ever, or they can find champions who push them into core. Be a champion.

Comments

#1

Pushing this into core means Drupal 7, which I have no experience with, and sadly don't have the time to start. I'd love to, although I'm really not sure how many people this would benefit, certainly Googling, and searching the Drupal site itself there is a lot of comment on the use of "Large taxonomies", but what percentage of the total users this is, is hard to say.

Thanks for your comments anyway Robert, I'll leave this issue open as a reminder, of what I should be doing with this module.

#2

Great contrib! If possible, a patch in the queue for review would be a good start.

#3

+1 to Robert

Great contrib!

Warm regards from sunny México!
:-)
Pepe

#4

Have you guys seen this one?

http://drupal.org/project/lineage

It is meant to address the same issue of super large taxos with the nested tree algorithm.

#5

Sorry giorgio79, you've not read the module's project page properly. It definitely isn't designed to improve Drupal's handling of large taxonomies.

#6

Thanks but I think I have. :) Taxonomy Lineage as I understand it is meant to handle super large hierarchical taxonomies, and the emphasis there is on hierarchy...Currently Drupal Taxonomy module stores only the parent child relationships and it has to rejoin itself to get the full hierarchical taxonomy line for a multi level hierarchy (like geo hierarchy such as continent->country->city->locality...). Taxonomy Lineage solves this issue. As I understand yours is good for free tagging single level taxos as well...

#7

Back to the topic of this issue:

The module requires a small patch to "taxonomy.module" which is fully backwards compatible, meaning that it will work with, or without leftandright installed. Installation of the module requires the taxonomy.module file to be replaced with the one provided.

Is there a link to the D7 issue queue where you are proposing this change?

#8

Version:5.x-1.1» 7.x-1.x-dev

The 7.x version of this module is much more likely to fit into core than the 6.x version. For this reason, I'll leave this issue open and possibly look at suggesting it in the future.

#9

Honestly, I hope this will never get into the core.

Let me be clear: the author has made a decent job in developing a nested set implementation for taxonomies (although some parts, like the insertion of terms, might be improved) and the module works as expected, so I am not blaming the author. It is the nested set model itself that is flawed, in principle and in practice. First of all, in the relational model, relationships are established by equality of values, not by implicit rules as the one dictated by the nested set model. A consequence of the use of an implicit rule is that referential integrity is not possible. Second, taxonomies in Drupal are not tree-like structures, but, in general, they are directed acyclic graphs (directed graphs without cycles), so nested sets do not fit. Third, queries in the nested set model are not so intuitive (this may also be seen as implied by the first point I've made). Fourth, the nested set model is not as efficient as it may seem. A simpler, cleaner, more general and more effective solution would consist in explicitly maintaining the ancestor-descendant relationship between taxonomy terms (or, more precisely, the transitive closure of the term graph), in a table like AD(ancestor,descendant). The trade-off is that more space is needed to store the AD table (but, even at 16 bytes/row, it's maybe a few 10MB for millions of terms).

The last point is worth elaborating a bit, as the goal of the module is to provide a scalable solution for taxonomies. Assume that hierarchies are tree-like (otherwise, nested sets cannot be applied) and consider just two operations: the insertion of a node and the query “all the ancestors of a given node n”. How do the nested set model and the transitive closure model compare wrt those two tasks?

  • Insertion: with nested sets, you need to update all the nodes above and to the right of the node that you are inserting. In the worst case and on the average, that means O(n) updates, with n number of terms, no matter what the shape of the taxonomy is. In the transitive closure model, your need to insert a number of rows proportional to the depth of the taxonomy. This is also O(n) insertions in the (rather unusual) worst-case of a linear chain of terms, but, if the taxonomy is approximately a balanced tree, it is O(log n) insertions, and it may become even less in practice if the taxonomy is shallow. Besides, in the former case, you have to query all the nodes satisfying certain inequality conditions, while in the latter case, you just have to search for the pairs (ancestor,descendant) where the descendant is a specific node, thus making a better use of indexing structures.
  • Ancestor query: in the nested set model, you need to find all the nodes whose [lft,rgt] interval contains the [lft,rgt] interval of the given node, which requires a join of the corresponding table by itself on an inequality condition. The possibilities for optimization are somewhat limited. In the transitive closure model, all you need to do is essentially “select ancestor from AD where descendant = n”, which, again, can make effective use of indexes.

The above are just two examples, but they are significant because (1) insertion is fundamental to populate the taxonomy, and (2) the ancestor/descendant queries are one of the main reasons to improve over the current implementation.

The transitive-closure approach has been widely studied in the scientific literature: see, for example, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.53 or http://www.cs.wright.edu/people/faculty/gdong/ViewMaintenanceSurveySigmo... or, if you do not want to delve into too much technicality, take a look at this presentation: http://www.slideshare.net/billkarwin/models-for-hierarchical-data.

So, my suggestion to the author is to change the implementation into one based on a transitive-closure table, with a gain in design, clarity and performance.

nobody click here