Thanks to a comment from Dries on the developer list, I'm bringing this here. It needs to be discussed before I go ahead and do it, because there is at least one potential snag.
- The Issue
- Sorting any list of objects in a hierarchy requires a recursive algorithm unless additional information is stored in the database.
- My proposed solution
- The Nested Trees solution is what I used for the Taxonomy Lineage module. In this solution, an API would be formed which affected nodes can use to update their lineage trees. The table Lineage Trees currently has an id field, a depth field and the lineage signature, which is effectively a concatenated string of the hierarchy above the affected object; the strings are encoded so that they sort in the correct order. For this solution, I would simply add a type field, which would be set to 'taxonomy' or 'book' or 'comment' or whatever other system wanted to use the lineage functions.
- Interface
- Only two functions need to be exposed as an API. lineage_update_object and lineage_delete_object. However, the problem arises in that the API needs some information about the object it's going to update, and this information is quite different in book than taxonomy than comment.
Specifically, the API needs to know 3 things. How to construct the object's lineage signature. How to find the object's parent. How to find the object's children. I'd thought about passing along this information as a query that is passed to the API, but that becomes problematic with constructing the lineage signature. I believe a callback is probably the best solution, but I'm not completely convinced yet.
- Problems
- Right now the system doesn't support multiple inheritance at all. I believe that, actually, it might be able to if, during the update of an object's lineage, it traces up all possible inheritance paths, produces a list of truncated lineage signatures and then creates multiple signatures for each child. However, this takes away the uniqueness factor that the current system relies on (using a delete query and then an insert) so maintaining this system could lead to orphaned lineages if not very careful.
- API
-
lineage_update_object($object, $parent = NULL)- This function updates the lineage signature for the object and all children. The parent is optional; it's included only to save a database query since in many cases, the immediate parent will be known at the time of the call. (And in hook_taxonomy, this function is called before the object is written to the database, so it can't just look up the parent).
lineage_delete_object($object, $children = false)- Deletes the object from the lineage. If $children is true, delete all the children too; this isn't necessary in the taxonomy case since it has to maintain the hierarchy tree as well, and it does this. However, book doesn't, I think, and instead its child pages become orphans.
hook_lineage($op, $object)- $op = 'parent'. The hook should return an object with var $id and var $signature for the parent of $object. This can be NULL.
$op = 'signature'. The hook should return a signature for the given object.
$op = 'children'. The hook should return an array of objects, each with $id and $signature populated.
- Multiple Inheritance
- In order to handle multiple inheritance properly, I think all that really needs to happen is to add one more field to the table, 'parent', so that you can identify which lineage tree is important during maintenance. You need to add that field to the delete_object call above as well. Are there other issues with this I'm not thinking of?
- Other Issues
- The book and taxonomy modules sort the same way -- weight, then title. Weight is easly converted to a string using sprintf, and the title is already sortable. Comment sorts purely on date, however, and it sorts descending. Meaning that in order to sort properly in a lineage, the comment date will have to be inverted, and then padded to the proper width. Alternatively, instead of doing that at all, if the comment module knows how many comments there already are, it can just number them in the order that it wants. This will take some thought. It's sort of orthogonal to this proposal but it's an important consideration. I welcome any discussion on this one.
Where in core would such a system live? Does it deserve it's own .inc? I'm not familiar with how these decisions get made, so I can't even answer this one at this time. I encourage people who are closer to core to help answer this one.
I welcome any comments, critiques or issues one might have with this.
Comments
Book module
Moshe pointed out on the list that the book module supports multiple inheritance in code, if not in UI.
What if this multiple inheritance were in the form of aliases? Where each leaf is a node-type that specifically points to the actual node, but has enough information to form the breadcrumb properly? But it doesn't contain any data, so that you don't run into problems with copying.
-- Merlin
[Point the finger: Assign Blame!]
[Read my writing: ehalseymiles.com]
-- Merlin
[Read my writing: ehalseymiles.com]
[Read my Coding blog: Angry Donuts]
something like this?
instead of directly using nid (or tid) in the tree, use a hid (hierarchy id) and have a (hid, nid) and (hid, tid) table.
--
Read my developer blog on Drupal4hu. | The news is Now Public
--
Drupal development: making the world better, one patch at a time. | A bedroom without a teddy is like a face without a smile.
Ok, I can see the utility of
Ok, I can see the utility of this. I'll give it some thought.
-- Merlin
[Point the finger: Assign Blame!]
[Read my writing: ehalseymiles.com]
-- Merlin
[Read my writing: ehalseymiles.com]
[Read my Coding blog: Angry Donuts]
The existing Lineage code
The existing Lineage code can be views in CVS. I link here for convenience.
-- Merlin
[Point the finger: Assign Blame!]
[Read my writing: ehalseymiles.com]
-- Merlin
[Read my writing: ehalseymiles.com]
[Read my Coding blog: Angry Donuts]
I don't see the need
I don't think the "I don't like recursion" argument is enough to justify code which will be much more complex, while offering no clear benefit.
The argument in the cited article about recursion being inefficient seems to be grounded in misunderstanding. The biggest component of cost here will be the database queries involved in actually getting the contents. Someone correct me if I'm wrong, but with a properly tuned database and sensible queries, one complex query returning 100 records isn't going to be that much more efficient that 100 simple selects returning one record.
There's also much more futzing about required when you need to move a node, for example.
I don't want to sound negative, but the very name 'nested trees' indicates something wrong with this approach. What are the trees nested in? A tree. And what is tree of trees? A tree. So why not just call it a tree and be done with it.
Finally, if a structural API needs to know what's inside the nodes it purports to structure, then something is wrong with the design.
If we're really concerned about the cost of database operations on trees, it might be better to consider caching the structure, or perhaps do away with encoding the structure in the database at all. Why not represent the tree as a PHP structure, persist it, deal with the complexities of caching and transactions instead?
I do agree, however, that an elegant solution to separating aggregate structures such as trees and lists from specific instances such as book and 'todo' is necessary.
Just my 2 cents worth
--
puregin
I don't think the "I don't
The problem with recursion is that it requires quite a bit of code, that that code isn't straightforward. Currently, every place in the system that uses it the code has to jump around a bit, load a bunch of stuff into memory and sort it there. If nothing else, it increases memory usage which we know to be at a premium in Drupal.
The clear benefit is simplicity of code. The reason this came about is that my Views module wants to be able to get a listing of nodes in one query, in order to remain generic. At the moment, it can't. Using the Taxonomy Lineage module, it can. Likewise, being able to get a book hierarchy in one simple query could be very valuable. While I would have to run tests, I'm pretty sure we would find that on a table with, say, 1,000 records, the nested-trees approach would return the desired recordset much faster than the existing approach.
Yes, but the API already handles this. In fact, that's the API's whole raison d'etre, and yes, it's a weighty operation, but updates happen a lot less than reads.
I can't argue against the name 'nested trees' sounding bad, unfortunately. I don't like the name either. And I don't think the name really describes the solution. It's really a hierarchy encoding to create a sort key. In fact, I would say that this is just a fancy index.
The API doesn't need to know what's inside the node, except in the sense that it needs to know how nodes are sorted, and how to find the parent/child of a given node when updating children. That's the reason to provide a callback to whomever's using the API.
Wait, doesn't 'persist' it mean storing it in the database? I'm not sure I really understand what you're getting at here. I'm inferring storing a serialized PHP object, and I'm sure you don't mean that. It's clear that's a direction Drupal is moving away from.
Thank you very much for your commentary! This is helpful.
-- Merlin
[Point the finger: Assign Blame!]
[Read my writing: ehalseymiles.com]
-- Merlin
[Read my writing: ehalseymiles.com]
[Read my Coding blog: Angry Donuts]
Comment module
We use something similar for comments; not having to execute one or two queries for each comment attached to a node is absolutely necessary.
Comment signatures and sort orderings
Taken from comment.module: