Separate revisions from node.module

webchick - February 20, 2007 - 17:27
Project:Drupal
Version:8.x-dev
Component:node system
Category:task
Priority:normal
Assigned:Unassigned
Status:needs work
Description

There is a good deal of code (300 loc and counting) in node module that deals with revisions. I think we could see many benefits from stripping this out (at least the UI part) into its own separate module.

Advantages:
- People who don't use and will never use revisions don't need to load that code on every page view.
- With the code separated out, we could do some fancy stuff with revisions like integrate diffing and allow moderated revisions without further complicating the logic (or adding to the 3,000 lines of code) of node.module.
- If we go 'all the way', and make it so that the extra join on the node_revisions table is only done when revisions are enabled for a particular node, this will provide performance benefits (chx's idea).

I'm going to start simple and just strip out the blatantly revision-related stuff (permissions, UI, log field, revisions checkbox).

#1

webchick - February 24, 2007 - 01:59
Category:bug report» feature request
Status:active» needs work

Here's a preliminary patch. Still needs lots and lots of work. For the most part, this is a copy/paste job from node.module, except for:
- moved columns around in node / node_revisions table in preparation for tweaking various queries
- made log message actually use the theme_node_revisions_log_message function (as opposed to it just sitting there like it is currently)
- changed 3rd argument from $user to $account in node/node_revisions_user, per coding standards.

Still TODO:
* - help page.
* - node_load.
* - node_save.
* - upgrade path.
* - rework a bazillion queries :)

The idea is to only join on node_revisions if:
a) the node is revision-enabled
b) you are dealing with a revision other than the currently published one.

This requires copying the body/teaser/format fields back to {node}. So the stuff in {node} will always be the currently published revision's stuff. No need to go digging around the node_revisions table unless you're on a page like node/#/revisions.

AttachmentSizeStatusTest resultOperations
node_revisions_0.patch25.94 KBIgnoredNoneNone

#2

webchick - February 24, 2007 - 02:58

node_save updated. node_load isn't though, so patch is still pretty useless atm.

AttachmentSizeStatusTest resultOperations
node_revisions_1.patch32.18 KBIgnoredNoneNone

#3

webchick - February 24, 2007 - 03:10

Updated patch, with some node_load action...only querying the node table unless a particular revision is specified.

"in theory" I'd like to get the query to node_revisions out of node_load entirely. Not sure yet how to do that though... I talked with chx about this and he pointed out that I can't just override $node->body it in nodeapi op 'load' because other modules might want to override its output.

AttachmentSizeStatusTest resultOperations
node_revisions_2.patch34.12 KBIgnoredNoneNone

#4

webchick - February 24, 2007 - 03:19

This patch has two small changes:
- Gets rid of the theme_revisions_overview function; This was an attempt to give some control to module authors over the output of that page, but I'll futz with that more in a separate patch.
- Adds a return $node->nid to the end of node_save, so calling modules can know what they just added.

AttachmentSizeStatusTest resultOperations
node_revisions_3.patch34.19 KBIgnoredNoneNone

#5

webchick - February 24, 2007 - 03:21

also getting rid of node_revisions_load function I was dinking around with but which wasn't doing anything.

AttachmentSizeStatusTest resultOperations
node_revisions_4.patch32.62 KBIgnoredNoneNone

#6

webchick - February 24, 2007 - 04:04

Here's a patch that fixes a bunch of stuff so that node_revisions.module should now be (mostly) working.

Known issues:
- When you revert a revision, timestamp is not there so it defaults to Jan 1 1970.
- I had to take out the where condition in node_load in order to get it to not show "page not found" on node/#/revisions/# .. this is probably sub-optimal. :P

Enough of a patch is here now though that this could probably be benchmarked to see if it's worth going the rest of the way.

AttachmentSizeStatusTest resultOperations
node_revisions_5.patch32.97 KBIgnoredNoneNone

#7

webchick - February 24, 2007 - 04:16
Status:needs work» needs review

Even though there are still issues, I've marked this code needs review ... several people have expressed interest in benchmarking this, and I'd love to get some feedback on it from them before I go further.

#8

webchick - February 24, 2007 - 13:40

here's a patch with the -up option, which also gets rid of the return $node->nid from above (will handle fighting for that in a separate patch :)) and adds in a missing check for node_access('delete') on the revisions.

Here's the command for those who are interested:
svn diff --diff-cmd /usr/bin/diff -x "-up" > node_revisions.patch

AttachmentSizeStatusTest resultOperations
node_revisions_6.patch33.49 KBIgnoredNoneNone

#9

webchick - February 24, 2007 - 14:56

Here's the start of an upgrade path. Doesn't work yet. I'll try and look more when I get home later.

AttachmentSizeStatusTest resultOperations
node_revisions_7.patch34.76 KBIgnoredNoneNone

#10

chx - February 24, 2007 - 18:48

I gave a little love to the update path.

AttachmentSizeStatusTest resultOperations
node_revisions_8.patch35.07 KBIgnoredNoneNone

#11

webchick - February 24, 2007 - 20:31

Now with brand spankin' new upgrade path!*

*BACK UP before running it. It destroys the node and node revisions table, dontchaknow. ;)

AttachmentSizeStatusTest resultOperations
node_revisions_9.patch22.97 KBIgnoredNoneNone

#12

webchick - February 24, 2007 - 20:48

chx caught a logic error in the upgrade path. fixed now. This patch also moves the CREATE {node_revisions} stuff to the node_revisions.install file and adds in some missing $Id$s.

AttachmentSizeStatusTest resultOperations
node_revisions_10.patch38.31 KBIgnoredNoneNone

#13

webchick - February 24, 2007 - 20:52

Yeesh. :P thanks chx for reminding me not to compare table values that don't exist yet. ;)

AttachmentSizeStatusTest resultOperations
node_revisions_11.patch38.31 KBIgnoredNoneNone

#14

webchick - March 23, 2007 - 06:02

Needed a re-roll.

AttachmentSizeStatusTest resultOperations
node_revisions_13.patch24.46 KBIgnoredNoneNone

#15

rötzi - March 25, 2007 - 22:43

subscribing

#16

ChrisKennedy - March 25, 2007 - 23:21

Minor: changed the system.install function to system_update_2005().

AttachmentSizeStatusTest resultOperations
node_revisions_14.patch24.17 KBIgnoredNoneNone

#17

toemaz - March 25, 2007 - 23:36

me too

#18

webchick - March 26, 2007 - 00:08
Status:needs review» needs work

I need to update this latest patch, since it's missing, oh. the node_revision module. :P

#19

webchick - March 26, 2007 - 01:29
Status:needs work» needs review

Ok! This should have all the bits. :)

Btw, that wasn't a slam against Chris Kennedy's re-rolling.. I meant the re-rolling I did at "way too late a.m." the other day. ;)

AttachmentSizeStatusTest resultOperations
node_revisions_15.patch37.2 KBIgnoredNoneNone

#20

webchick - March 27, 2007 - 20:57
Status:needs review» needs work

I don't have time to re-roll the patch atm, but I was talking to Nate / quicksketch about this the other day, and he had a great idea.

In order to truly de-couple node and node_revisions.module, we should make a separate node_revisions_revision_load($nid, $vid) function in the node_revisions.module, and completely remove any references to revision in node_load.

#21

catch - October 29, 2007 - 17:16
Version:6.x-dev» 7.x-dev

Needs a complete rewrite after the module split patch. The other benefits sound very nice.

#22

Dave Reid - September 6, 2008 - 20:40

I definitely think this would provide a substantial benefit to a large number of users like me that don't have any need for node revisions on some of my sites. Will help test.

#23

webchick - September 6, 2008 - 21:58
Assigned to:webchick» Anonymous

I'm obviously not going to have time to work on this anymore. ;) But would love to review and commit such a patch. :D

#24

Pancho - September 12, 2008 - 08:51

For all the reasons webchick advanced, I absolutely support this request!

This is logically the first step.
Secondly we would integrate diffing.
Third important step would be integrating Save as draft's logic into the new core node_revisions.module, which simply means: The latest revision doesn't necessarily need to be the one that is currently published.
Fourth step: Of Revision Moderation there should be only a few lines left that could be merged into modr8.
Whether we then want to have modr8 in core or not, that's an entirely different issue.

Does that sound like a good plan?

#25

lilou - September 12, 2008 - 10:06

Does that sound like a good plan ?

... awesome !

#26

Pancho - September 12, 2008 - 12:28

... but it's probably not gonna happen this way: diffing will be included rightaway, because the other patch is close to rtbc. This means we need to wait until that one is committed and then reconsider how we want to split node.module.
So the best thing we can do right at the moment is to review the "diff in core" patch. The more reviews the better, so we can soon return to this issue here.

#27

catch - September 12, 2008 - 17:28

This makes complete sense to me, and then adding drafts and simplifying moderation are good plans too.

#28

Pancho - December 2, 2008 - 02:27
Category:feature request» task

Oh my,

I've been working all day on a new version of this patch. As you all know, many things have changed in the meanwhile, so I had to start all over again.
But now that most of the work seemed to be done, I was hit by a logic error, which turned out to be a major problem:

If {node_revisions} has never been installed:
We can copy our {node}.nid to {node}.vid.
If {node_revisions} is installed and Node Revisioning is enabled:
We can copy the appropriate {node_revisions}.vid to {node}.vid.
If {node_revisions} is installed, but Node Revisioning has been disabled:
We can't assume the next unused {node}.vid without querying {node_revisions}.

I see the following options:

  1. We start without a {node_revisions} table (so the node table needs to be self-sufficient). As soon as Node Revisioning gets installed, we copy over the existing nodes and start using both tables. Now, if Node Revisioning is being disabled, we can double-check if any node has additional revisions. If not, we can automatically uninstall {node_revisions}. But if yes:
    1. we reintroduce some kind of sequence variable, so we know the next vid to use in {node}.
    2. instead we could query the highest {node_revisions}.vid and the highest {node}.vid and take the max of both +1 as the next vid.

    In both cases, if later Node Revisioning gets enabled again, we need to copy over our new nodes from {node} to {node_revisions}.

  2. We start without a {node_revisions} table, but once Node Revisioning has been installed we keep using the table until a full uninstall. As long as Node Revisioning is only disabled, we keep syncing {node_revisions} on every added, edited or deleted {node} entry. Should Node Revisioning be enabled again, we can just keep on working.
  3. Or we keep creating the {node_revisions} table on every install (self-sufficient, though) and keep using it. Only code that is specific to manipulating multiple revisions is split out into a new module. Then we don't have to cope with this, but have the least performance gains for simple websites w/o revisioning.

Either of these ways will work. But we need to take a decision which way we want to go with this patch. Remember, this is about improving performance for small sites that don't need node revisioning. Please throw in your cents ;)

P.S. Only for the records, I enclose my latest patch. It follows the first solution, but doesn't implement a) or b) yet. At the moment this is not worth testing (so I added ".txt" to keep it from being slandered by the testbed ;-).

(1st edit: clarification on solution 2)
(2nd edit: clarification on solution 3)

AttachmentSizeStatusTest resultOperations
node_revisions_16.patch.txt62.93 KBIgnoredNoneNone

#29

catch - December 1, 2008 - 16:18

Does the current patch allow enough to benchmark the difference between 1 and 3? I'd be interested to know how much difference the extra join makes on node/1 and node pages. If it's not a great deal of gain, then it would simplify things a lot to keep it as, is except for moving the code around so that's not loaded unless necessary - both option 1 and 2 are going to take quite a bit of thought and complexity to keep things synced nicely.

I've also been wondering if we could remove the join on the user table - either by denormalising name, and/or moving that into user_nodeapi and only adding the fields if authoring information / user pictures are enabled.

#30

Pancho - December 2, 2008 - 02:33

I clarified the alternatives in my post above. Hopefully this is also what catch thought them to be, but after the edits I can't nail him down on his opinion anymore ;-)

We are actually having a fourth alternative:

  1. Finally we can leave the data structure as-is, which means that we still have to join {node-revision} on viewing a node. As in (3), only code that is specific to manipulating multiple revisions is split out into a new module.

#31

Pancho - December 2, 2008 - 03:16

Thanks catch for guessing with me, but I agree that we need hard facts, i.e. benchmarks. Therefore we need to materialize some (maybe two) of the four alternatives to a working patch, and then benchmark them against each other and the status quo. We can't do this for all of them - this would just be too much wasted work. To help us pick two of the four alternatives (at least for now), I did some "theoretical benchmarks". If they give us a general idea about the performance in comparison, they exactly fulfill what they are meant to.

In theory, things should be as following:

LOAD/UPDATE NODE alternative 1 alternative 2 alternative 3 alternative 4 status quo
Node Revisioning uninstalled
10/5
10/-1
2/1
0/0
Node Revisioning installed, but disabled 10/4 10/-2
Node Revisioning installed and enabled
8/-2 0/0
LOAD NODE: UPDATE NODE:
+8 for no join on {node_revision}
+2 for 500 lines less code loaded
-1 for needing to determine whether {node_revision} exists
-1/-2 for 1 resp. 2 queries to determine {node}.vid
-2 for needing to write a full record on {node}
+5 for no need to write a record on {node_revision}
+3 for no need to update {node}.vid
+1 for 1500 lines less code loaded

Notes:

  • More is better.
  • Diff and save_as_draft are already expected to be included afterwards, which should only affect the amount of loaded code. If you disagree, you need to substract 1 (for loads) resp. 0,5 (for updates) from the first two rows, except from the status quo.
  • Note that the values are just freely weighed estimates, that might or might not turn out to be close to the truth. Also I'm obviously weighing loads higher than updates.
  • My personal opinion?
    1 and 3 look promising:

    • While 1 should bring us maximum performance, it is a bit tricky to implement. Especially the syncing after reenabling Node Revisioning will be quite a task - fortunately we don't need to care too much about this, just to get it benchmarked.
    • 2 would be a compromise, if 1 in the end doesn't work out. Ensuring data integrity is no issue, and at least with a completely uninstalled {node_revisions} table even updates should perform pretty well.
    • 3 is rather defensive and easy to implement, but should still give us good performance on loads. The slightly worse performance on updates is still acceptable.
    • 4 would be a last resort, if we otherwise wouldn't get anything done. In the end it is just some code reorganization which still brings us better modularization, a less cluttered admin UI and some minor performance improvements.
    • Some more input on this would be great, otherwise my plans would be to start working out alternative 1 to compare its performance to the status quo. This would show how large the difference between good and evil actually is. Then it might be easier to decide on the route to go.

#32

Pancho - December 8, 2008 - 17:24

I reworked the last patch to reflect the renaming of tables, the introduction of node_load_multiple() and other changes in core.

However the results of the benchmarks I made are not impressive at all. I tested both / and /node/xyz with and without cache. Also I benchmarked a PHP page running 100 heavy node_load_multiple() queries (with $reset flag on).

Apache Bench shows no significant improvements in overall performance.
The Devel query log indicates an improvement around 1/3ms per node_load_multiple() query. This is virtually nothing compared to the heavy 'cache-get' queries or 'upload' (which should get a batch loader as well).

Obviously the additional join doesn't make any difference, and the performance increase of node_load_multiple() can't be further improved.

This certainly doesn't mean we should not make the {node} table independent from {node_revisions}, if we have other reasons. But - if I got it right - we should not expect a significant performance increase from that.

If somebody could repeat and verify my benchmarks (maybe on a more reliable testing server than mine), we should be ready to decide on our path and move forward.

The enclosed patch separates node_revision.module from node.module. Note that the 'Node revision' module should stay disabled, and saving nodes in the stripped node.module will also fail (because determining the vid has not yet been implemented, see above). However, loading and viewing nodes works fine, which at the moment is all we need for doing benchmarks.

AttachmentSizeStatusTest resultOperations
node_revisions_17.patch63.79 KBIdleFailed: 4938 passes, 26 fails, 27 exceptionsView details | Re-test
as-is.txt4.89 KBIgnoredNoneNone
no_join_on_node-revision.txt4.9 KBIgnoredNoneNone

#33

Pancho - December 8, 2008 - 15:57
Status:needs work» needs review

Switching this to CNR, so it has at least a chance to get some review... :)

#34

System Message - December 8, 2008 - 16:20
Status:needs review» needs work

The last submitted patch failed testing.

#35

Pancho - December 8, 2008 - 16:37
Status:needs work» needs review

Not bad - I expected more fails... as said in #32, everything needed for benchmarking works fine.
Some more benchmarks for verification and feedback on the general plan is appreciated.

#36

Dave Reid - December 11, 2008 - 20:04

Nice work Pancho! I do see a small improvement in the benchmarks [you did] with the patch, not to mention how this will make node editing/creation less complicated for a large number of Drupal sites.

#37

Pancho - December 8, 2008 - 17:33

That'd be cool! Could you please post the figures and how they were measured?

Also, note that the question is not if we separate out the node revisioning code, but to which extent node.module should be independent from the {node_revision} table.
So either way we go, we're both getting the UI simplification for simple websites and opening up the possibility to add more features to node revisioning (diffing, for example).

#38

catch - December 19, 2008 - 01:01

It'd be very much worth considering splitting out the loading of revisions from node_load() - or at least doing it using hook_query_alter() or similar. node revisions, and the lowercase queries in user and taxonomy module are the main barriers to re-using code in the new multiple_load() functions - which could probably be abstracted into one or two common helper functions otherwise.

#39

Damien Tournoud - December 19, 2008 - 01:07

Please note that node_revisions in its current form may be something of the past thanks to the fields in core initiative. Storing the full $node object in a node_archived_revisions table or something like this could perhaps be enough, and would really simplify the handling code.

#40

bjaspan - December 27, 2008 - 14:31

subscribe

#41

jstoller - January 3, 2009 - 20:34

Please keep in mind the importance of node moderation when playing with revision system. It remains a gaping hole in Core. One I sincerely hope is plugged in Drupal 7.

See: #218755: support revisions in different states

#42

jstoller - March 9, 2009 - 21:50

Should the priority be optimizing performance for small sites, or optimizing performance for large sites? It seems to me that larger, more complex sites are more likely to push Drupal's performance limits and hence would see more benefit from optimization. These sites also seem more likely to make use of revisions (especially if dif and moderation are included). Simplifying things and improving performance for sites that don't use revisions is a worthy goal, but shouldn't the adoption of these changes be contingent on their not having a negative impact on performance when revisions are enabled? Just a thought.

#43

DamienMcKenna - April 9, 2009 - 18:17

Given we have fields in core, shouldn't Body be a separate field? Especially when it's optional anyway.

#44

yched - April 9, 2009 - 20:50

#45

DamienMcKenna - April 9, 2009 - 20:53

yched: thanks for referencing that, it's a related change but wasn't mentioned here previously. Cool :)

#46

catch - April 11, 2009 - 16:46

I haven't checked the status of this for a while - a useful cleanup in itself would be a node_load_revision() function to remove all the $vid special casing from node_load_multiple().

#47

Arancaytar - May 19, 2009 - 00:23

Do we still wait for #120955: Integrate diff into core with this?

It was linked here as RTBC last September, but I just came back from rerolling a patch last updated in December, and the discussion there left it unclear if it would happen at all.

#48

wretched sinner... - May 19, 2009 - 00:45

My thoughts are to get this in, and then we can take #120955: Integrate diff into core further afterwards. I know that Dries has some reservations about Diff in Core, so it might seem that we will have a better win on getting the split done first, which then would make more sense in having Diff integrated to Revisions.

My thought is to postpone #120955: Integrate diff into core on this issue, get Revisions split into it's own module which we can then work on improving without playing around with Node module.

And another thought as to why splitting Revisions off could be worth while - long term, we could look to expand Revisions to cover more than just Nodes, possibly?

Soli Deo Gloria

#49

catch - June 2, 2009 - 00:03
Status:needs review» postponed

Let's do this once #372743: Body and teaser as fields is in, which will make it a much smaller patch.

#50

cwgordon7 - June 21, 2009 - 07:58
Status:postponed» needs work

#372743: Body and teaser as fields is in, this shouldn't be postponed for that issue's documentation. :)

#51

catch - October 13, 2009 - 12:12
Version:7.x-dev» 8.x-dev

#52

ilo - October 30, 2009 - 01:04

Now that it is postponed, instead of using diff, I guess a good 'auditing' system to track changes on specific fields could be a good approach. Although the output of diff could be used to view the changes, it is not focused on content auditing, well.. entity (why only content, users would require auditing also) auditing.

 
 

Drupal is a registered trademark of Dries Buytaert.