Separate revisions from node.module
| Project: | Drupal |
| Version: | 8.x-dev |
| Component: | node system |
| Category: | task |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs work |
There is a good deal of code (300 loc and counting) in node module that deals with revisions. I think we could see many benefits from stripping this out (at least the UI part) into its own separate module.
Advantages:
- People who don't use and will never use revisions don't need to load that code on every page view.
- With the code separated out, we could do some fancy stuff with revisions like integrate diffing and allow moderated revisions without further complicating the logic (or adding to the 3,000 lines of code) of node.module.
- If we go 'all the way', and make it so that the extra join on the node_revisions table is only done when revisions are enabled for a particular node, this will provide performance benefits (chx's idea).
I'm going to start simple and just strip out the blatantly revision-related stuff (permissions, UI, log field, revisions checkbox).

#1
Here's a preliminary patch. Still needs lots and lots of work. For the most part, this is a copy/paste job from node.module, except for:
- moved columns around in node / node_revisions table in preparation for tweaking various queries
- made log message actually use the theme_node_revisions_log_message function (as opposed to it just sitting there like it is currently)
- changed 3rd argument from $user to $account in node/node_revisions_user, per coding standards.
Still TODO:
* - help page.
* - node_load.
* - node_save.
* - upgrade path.
* - rework a bazillion queries :)
The idea is to only join on node_revisions if:
a) the node is revision-enabled
b) you are dealing with a revision other than the currently published one.
This requires copying the body/teaser/format fields back to {node}. So the stuff in {node} will always be the currently published revision's stuff. No need to go digging around the node_revisions table unless you're on a page like node/#/revisions.
#2
node_save updated. node_load isn't though, so patch is still pretty useless atm.
#3
Updated patch, with some node_load action...only querying the node table unless a particular revision is specified.
"in theory" I'd like to get the query to node_revisions out of node_load entirely. Not sure yet how to do that though... I talked with chx about this and he pointed out that I can't just override $node->body it in nodeapi op 'load' because other modules might want to override its output.
#4
This patch has two small changes:
- Gets rid of the theme_revisions_overview function; This was an attempt to give some control to module authors over the output of that page, but I'll futz with that more in a separate patch.
- Adds a return $node->nid to the end of node_save, so calling modules can know what they just added.
#5
also getting rid of node_revisions_load function I was dinking around with but which wasn't doing anything.
#6
Here's a patch that fixes a bunch of stuff so that node_revisions.module should now be (mostly) working.
Known issues:
- When you revert a revision, timestamp is not there so it defaults to Jan 1 1970.
- I had to take out the where condition in node_load in order to get it to not show "page not found" on node/#/revisions/# .. this is probably sub-optimal. :P
Enough of a patch is here now though that this could probably be benchmarked to see if it's worth going the rest of the way.
#7
Even though there are still issues, I've marked this code needs review ... several people have expressed interest in benchmarking this, and I'd love to get some feedback on it from them before I go further.
#8
here's a patch with the -up option, which also gets rid of the return $node->nid from above (will handle fighting for that in a separate patch :)) and adds in a missing check for node_access('delete') on the revisions.
Here's the command for those who are interested:
svn diff --diff-cmd /usr/bin/diff -x "-up" > node_revisions.patch
#9
Here's the start of an upgrade path. Doesn't work yet. I'll try and look more when I get home later.
#10
I gave a little love to the update path.
#11
Now with brand spankin' new upgrade path!*
*BACK UP before running it. It destroys the node and node revisions table, dontchaknow. ;)
#12
chx caught a logic error in the upgrade path. fixed now. This patch also moves the CREATE {node_revisions} stuff to the node_revisions.install file and adds in some missing $Id$s.
#13
Yeesh. :P thanks chx for reminding me not to compare table values that don't exist yet. ;)
#14
Needed a re-roll.
#15
subscribing
#16
Minor: changed the system.install function to system_update_2005().
#17
me too
#18
I need to update this latest patch, since it's missing, oh. the node_revision module. :P
#19
Ok! This should have all the bits. :)
Btw, that wasn't a slam against Chris Kennedy's re-rolling.. I meant the re-rolling I did at "way too late a.m." the other day. ;)
#20
I don't have time to re-roll the patch atm, but I was talking to Nate / quicksketch about this the other day, and he had a great idea.
In order to truly de-couple node and node_revisions.module, we should make a separate node_revisions_revision_load($nid, $vid) function in the node_revisions.module, and completely remove any references to revision in node_load.
#21
Needs a complete rewrite after the module split patch. The other benefits sound very nice.
#22
I definitely think this would provide a substantial benefit to a large number of users like me that don't have any need for node revisions on some of my sites. Will help test.
#23
I'm obviously not going to have time to work on this anymore. ;) But would love to review and commit such a patch. :D
#24
For all the reasons webchick advanced, I absolutely support this request!
This is logically the first step.
Secondly we would integrate diffing.
Third important step would be integrating Save as draft's logic into the new core node_revisions.module, which simply means: The latest revision doesn't necessarily need to be the one that is currently published.
Fourth step: Of Revision Moderation there should be only a few lines left that could be merged into modr8.
Whether we then want to have modr8 in core or not, that's an entirely different issue.
Does that sound like a good plan?
#25
... awesome !
#26
... but it's probably not gonna happen this way: diffing will be included rightaway, because the other patch is close to rtbc. This means we need to wait until that one is committed and then reconsider how we want to split node.module.
So the best thing we can do right at the moment is to review the "diff in core" patch. The more reviews the better, so we can soon return to this issue here.
#27
This makes complete sense to me, and then adding drafts and simplifying moderation are good plans too.
#28
Oh my,
I've been working all day on a new version of this patch. As you all know, many things have changed in the meanwhile, so I had to start all over again.
But now that most of the work seemed to be done, I was hit by a logic error, which turned out to be a major problem:
I see the following options:
In both cases, if later Node Revisioning gets enabled again, we need to copy over our new nodes from {node} to {node_revisions}.
Either of these ways will work. But we need to take a decision which way we want to go with this patch. Remember, this is about improving performance for small sites that don't need node revisioning. Please throw in your cents ;)
P.S. Only for the records, I enclose my latest patch. It follows the first solution, but doesn't implement a) or b) yet. At the moment this is not worth testing (so I added ".txt" to keep it from being slandered by the testbed ;-).
(1st edit: clarification on solution 2)
(2nd edit: clarification on solution 3)
#29
Does the current patch allow enough to benchmark the difference between 1 and 3? I'd be interested to know how much difference the extra join makes on node/1 and node pages. If it's not a great deal of gain, then it would simplify things a lot to keep it as, is except for moving the code around so that's not loaded unless necessary - both option 1 and 2 are going to take quite a bit of thought and complexity to keep things synced nicely.
I've also been wondering if we could remove the join on the user table - either by denormalising name, and/or moving that into user_nodeapi and only adding the fields if authoring information / user pictures are enabled.
#30
I clarified the alternatives in my post above. Hopefully this is also what catch thought them to be, but after the edits I can't nail him down on his opinion anymore ;-)
We are actually having a fourth alternative:
#31
Thanks catch for guessing with me, but I agree that we need hard facts, i.e. benchmarks. Therefore we need to materialize some (maybe two) of the four alternatives to a working patch, and then benchmark them against each other and the status quo. We can't do this for all of them - this would just be too much wasted work. To help us pick two of the four alternatives (at least for now), I did some "theoretical benchmarks". If they give us a general idea about the performance in comparison, they exactly fulfill what they are meant to.
In theory, things should be as following:
+2 for 500 lines less code loaded
-1/-2 for 1 resp. 2 queries to determine {node}.vid
-2 for needing to write a full record on {node}
+5 for no need to write a record on {node_revision}
+3 for no need to update {node}.vid
+1 for 1500 lines less code loaded
Notes:
My personal opinion?
1 and 3 look promising:
Some more input on this would be great, otherwise my plans would be to start working out alternative 1 to compare its performance to the status quo. This would show how large the difference between good and evil actually is. Then it might be easier to decide on the route to go.
#32
I reworked the last patch to reflect the renaming of tables, the introduction of node_load_multiple() and other changes in core.
However the results of the benchmarks I made are not impressive at all. I tested both / and /node/xyz with and without cache. Also I benchmarked a PHP page running 100 heavy node_load_multiple() queries (with $reset flag on).
Apache Bench shows no significant improvements in overall performance.
The Devel query log indicates an improvement around 1/3ms per node_load_multiple() query. This is virtually nothing compared to the heavy 'cache-get' queries or 'upload' (which should get a batch loader as well).
Obviously the additional join doesn't make any difference, and the performance increase of node_load_multiple() can't be further improved.
This certainly doesn't mean we should not make the {node} table independent from {node_revisions}, if we have other reasons. But - if I got it right - we should not expect a significant performance increase from that.
If somebody could repeat and verify my benchmarks (maybe on a more reliable testing server than mine), we should be ready to decide on our path and move forward.
The enclosed patch separates node_revision.module from node.module. Note that the 'Node revision' module should stay disabled, and saving nodes in the stripped node.module will also fail (because determining the vid has not yet been implemented, see above). However, loading and viewing nodes works fine, which at the moment is all we need for doing benchmarks.
#33
Switching this to CNR, so it has at least a chance to get some review... :)
#34
The last submitted patch failed testing.
#35
Not bad - I expected more fails... as said in #32, everything needed for benchmarking works fine.
Some more benchmarks for verification and feedback on the general plan is appreciated.
#36
Nice work Pancho! I do see a small improvement in the benchmarks [you did] with the patch, not to mention how this will make node editing/creation less complicated for a large number of Drupal sites.
#37
That'd be cool! Could you please post the figures and how they were measured?
Also, note that the question is not if we separate out the node revisioning code, but to which extent node.module should be independent from the {node_revision} table.
So either way we go, we're both getting the UI simplification for simple websites and opening up the possibility to add more features to node revisioning (diffing, for example).
#38
It'd be very much worth considering splitting out the loading of revisions from node_load() - or at least doing it using hook_query_alter() or similar. node revisions, and the lowercase queries in user and taxonomy module are the main barriers to re-using code in the new multiple_load() functions - which could probably be abstracted into one or two common helper functions otherwise.
#39
Please note that node_revisions in its current form may be something of the past thanks to the fields in core initiative. Storing the full $node object in a node_archived_revisions table or something like this could perhaps be enough, and would really simplify the handling code.
#40
subscribe
#41
Please keep in mind the importance of node moderation when playing with revision system. It remains a gaping hole in Core. One I sincerely hope is plugged in Drupal 7.
See: #218755: support revisions in different states
#42
Should the priority be optimizing performance for small sites, or optimizing performance for large sites? It seems to me that larger, more complex sites are more likely to push Drupal's performance limits and hence would see more benefit from optimization. These sites also seem more likely to make use of revisions (especially if dif and moderation are included). Simplifying things and improving performance for sites that don't use revisions is a worthy goal, but shouldn't the adoption of these changes be contingent on their not having a negative impact on performance when revisions are enabled? Just a thought.
#43
Given we have fields in core, shouldn't Body be a separate field? Especially when it's optional anyway.
#44
DamienMcKenna : See #372743: Body and teaser as fields
#45
yched: thanks for referencing that, it's a related change but wasn't mentioned here previously. Cool :)
#46
I haven't checked the status of this for a while - a useful cleanup in itself would be a node_load_revision() function to remove all the $vid special casing from node_load_multiple().
#47
Do we still wait for #120955: Integrate diff into core with this?
It was linked here as RTBC last September, but I just came back from rerolling a patch last updated in December, and the discussion there left it unclear if it would happen at all.
#48
My thoughts are to get this in, and then we can take #120955: Integrate diff into core further afterwards. I know that Dries has some reservations about Diff in Core, so it might seem that we will have a better win on getting the split done first, which then would make more sense in having Diff integrated to Revisions.
My thought is to postpone #120955: Integrate diff into core on this issue, get Revisions split into it's own module which we can then work on improving without playing around with Node module.
And another thought as to why splitting Revisions off could be worth while - long term, we could look to expand Revisions to cover more than just Nodes, possibly?
Soli Deo Gloria
#49
Let's do this once #372743: Body and teaser as fields is in, which will make it a much smaller patch.
#50
#372743: Body and teaser as fields is in, this shouldn't be postponed for that issue's documentation. :)
#51
#52
Now that it is postponed, instead of using diff, I guess a good 'auditing' system to track changes on specific fields could be a good approach. Although the output of diff could be used to view the changes, it is not focused on content auditing, well.. entity (why only content, users would require auditing also) auditing.