Comment body as field
catch - August 2, 2009 - 19:02
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | comment.module |
| Category: | task |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | reviewed & tested by the community |
| Issue tags: | beta blocker, Exception code freeze, Fields in Core, RDF |
Description
Now we have fieldable comments, it makes sense to have comment body and subject as fields. Follow-up from #504666: Make comments fieldable.

#1
Adding tag.
#2
Bump. Not sure I'm going to have time to work on this before Wednesday, but an additional sweetener is this would remove a database query per check_markup() (cache_get(), so only with database caching)) - which is up to 300 on pages with long comment threads.
#3
Here's an initial patch:
Took on both comment body and comment subject - you might want just body for a blog or forum, or just title + video for something else, so makes sense to have them both fields and both optional.
Removed some of the hard-coded references to comment->subject - we either use 'in reply to $node->title' or $comment->cid in those situations.
Still TODO:
fix tests. I got weird taxonomy errors locally so had trouble even seeing what the breakage was.
Data migration for the upgrade path should be possible to just do an INSERT INTO SELECT FROM - but should fix tests first.
CNR for the test bot ony, fair bit still to do here.
#4
All comment, tracker and forum tests pass with this, will let the test bot handle the others. Shouldn't be too far off, needs benchmarks and upgrade path now.
#5
Since we no longer have a title, which is optional, but not really optional since it's hard coded into queries, so was filled in anyway with whatever garbage was at the top of a comment in a _submit handler, I had to change the comment block a bit. Obligatory screenshot. IMO it's better like this.
#6
The last submitted patch failed testing.
#7
Down to three fails in search module now. This is due to #523894: Comment indexing does not include the fully rendered comment which may have to be merged in here..
#8
The last submitted patch failed testing.
#9
Crazy idea:
Current behavior is: comment_submit() fills {comment}.subject with the beginning of {comment}.body if there's no explicit subject.
Maybe we could change that slightly:
- comment_submit() calls field_attach_view() on the whole content, renders it, strips HTML, and stores the result in {comment}.precomputed or whatever.
- where we previously displayed $comment->subject (comment administration, UI messages...), we can use
substr($comment->precomputed, 0, N)- comment_node_update_index() can simply return $comment->precomputed, and we solve performance concerns of running field_attach_view() on an arbitrary large # of comments at search_index time.
+ // @TODO do we need to check if any non-comment bundle has these fields+ // assigned before deleting? If so, what happens then?
+
+ // Delete comment_body field.
+ field_delete_field('comment_body');
Well, I was just recently thinking that having 'comment_body' available in Field UI to be shared on non-comment entities makes little sense, and will mainly clutter the 'add existing field' selector and confuse people.
I think this calls for an (optional)
$field['entity_types'] = array('comment')property, forbidding the creation of an instance on non-comment bundles.#10
The pre-computed property doesn't sound too bad to me - we still wouldn't require a subject field but it'd give something from the comment itself to look at on comment administration and elsewhere. One issue with using it for search is it might mess up search rankings for semantic HTML tags a little - but really, these are comments, and that's very broken how it is now, so I think it's fine if comment text gets a flat ranking regardless of markup.
#11
re search index: Yeah, I had the same reasoning.
Although, if we want to let the HTML in comments affect node search rankings, we could store the pre-computed output as HTML, and do the strip tags in
substr(strip_tags($comment->precomputed), 0, N)when doing the 'print beginning of comment in admin pages and messages'.Not sure that's worth it, though.
#12
Created #568084: Limit the scope of some fields to some entity types.
#13
The precomputed value is basically a cache and thats just a pain to keep fresh considering how dynamic fields can be. IMO not a good idea. I am not too worried about the indexing performance. Those pages are only viewed by a computer.
#14
"The precomputed value is basically a cache and thats just a pain to keep fresh considering how dynamic fields can be"
Well, we do know that field values only ever change when there's a call to field_attach_presave(). Comment module could implement hook_field_attach_presave() to recompute the value whenever a comment gets saved or updated ?
#15
Hmm. I'm inclined to agree with moshe. It's a problem already and a search issue rather than comment specifically. In which case we just need to fix the fails.
#16
All tests should pass now. Because we render the full comment thread in search indexing, some tests which assumed just subject and body were broke, also last patch wasn't returning anything in comment_node_update_index().
#17
Upgrade path is working - getting notices in batch.inc but seems unrelated to this patch.
Here's how it looks for performance. Generated one node with 300 comments, displaying 50 per page.
HEAD:
Executed 100 queries in 50.89 milliseconds. Page execution time was 243.59 ms.
ab -c1 -n1000 http://d7.7/node/1
Concurrency Level: 1
Time taken for tests: 235.325 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 131357000 bytes
HTML transferred: 130892000 bytes
Requests per second: 4.25 [#/sec] (mean)
Time per request: 235.325 [ms] (mean)
Time per request: 235.325 [ms] (mean, across all concurrent requests)
Transfer rate: 545.11 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 220 235 11.7 232 302
Waiting: 212 226 11.4 223 292
Total: 220 235 11.7 232 302
Patch:
Executed 51 queries in 35.04 milliseconds. Page execution time was 297.99 ms.
Document Path: /node/1
Document Length: 156276 bytes
Concurrency Level: 1
Time taken for tests: 278.143 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 156741000 bytes
HTML transferred: 156276000 bytes
Requests per second: 3.60 [#/sec] (mean)
Time per request: 278.143 [ms] (mean)
Time per request: 278.143 [ms] (mean, across all concurrent requests)
Transfer rate: 550.32 [Kbytes/sec] received
30% less time in the database. 15% slower overall.
#18
Probably that same old 'field rendering is costly' issue ?
#19
Very likely, yes.
#20
Tidied up some code comments and small change to comment.tpl.php to put the permalink somewhere more sensible.
#21
Hate to be a nitpicker here, especially since I am looking at this patch for an example for converting taxonomy term names and descriptions to fields, but is the compromise of removing comment subjects from the administrative interface necessary?
e.g.
- 'subject' => array('data' => t('Subject'), 'field' => 'subject'),- t('Are you sure you want to delete the comment %title?', array('%title' => $comment->subject)),+ t('Are you sure you want to delete the comment by %name', array('%name' => $comment->name)),
Is this potentially temporary simplicity for the patch, or is code size and performance a serious consideration in pulling this data from fields?
ben, agaric
#22
It's quite possible to imagine a comment with either a subject or a body, so makes sense to have those optional IMO - I might want to allow only video comments, or only one line subjects + ratings, or only bodies.
With term names, I think we'd want to make that a locked field since the term is it's own distinct entity with it's own management interface - can't have a term without a name.
We could potentially do that for comment->subject too, but personally I find the current behaviour of requiring it and filling it in if it's not there pretty horrible in terms of how comment subjects actually look when presented - and I've not seen many other systems with a similar implementation - most just have a single text field, or text field + rating etc. so preventing that (apart from altering the field definition if that allows you to unlock fields) seems like an unnecessary restriction to me.
#23
I'm a little behind still on grasping field and entity APi stuff, but support getting conversions in if possible [i.e. subsbcribe]
#24
Catch:
On my end, I'm seeing gremlins in the changes to themes/garland/comment.tpl.php. These need to be submitted as encoded entities?
–When that happens, I can mark RTBC.
#25
Hmm, I've seen that before in other patches, and been told it's OK - I guess it's how patch itself interprets those characters. in core, they aren't htmlencoded, so not sure how to deal with that encoding issue.
Edit: figured it out. Firefox defaults to showing patch files in ISO - if you manually change to UTF-8 it'll show up fine as the correct character.
#26
Reviewed on the plane - some actual problems and some questions
I should probably read the issue more, but I was confused by the fact that there is no UI to manage comment fields - this is a code-only feature?
$permalink is not documented in the .tpl file doxygen - is that a new variable?
If I uninstall comment module and install it again, there is a problem that the comment form on any pre-exitiong node type seems to be broken - no fields show up. This may well be a field API bug though.
A bug here - in the second line it should be $node->title, not $comment->title I think
@@ -2059,8 +2029,8 @@ function comment_form_submit($form, &$fo$redirect = array('comment/' . $comment->cid, array(), 'comment-' . $comment->cid);
}
else {
- watchdog('content', 'Comment: unauthorized comment submitted or comment submitted to a closed post %subject.', array('%subject' => $comment->subject), WATCHDOG_WARNING);
- drupal_set_message(t('Comment: unauthorized comment submitted or comment submitted to a closed post %subject.', array('%subject' => $comment->subject)), 'error');
+ watchdog('content', 'Comment: unauthorized comment submitted or comment submitted to a closed post %title.', array('%title' => $node->title), WATCHDOG_WARNING);
+ drupal_set_message(t('Comment: unauthorized comment submitted or comment submitted to a closed post %title.', array('%title' => $comment->title)), 'error');
minor typo leading to a bug here "comment->$comment->cid":
@@ -2085,7 +2055,7 @@ function template_preprocess_comment(&$v$variables['new'] = !empty($comment->new) ? t('new') : '';
$variables['picture'] = theme_get_setting('toggle_comment_user_picture') ? theme('user_picture', $comment) : '';
$variables['signature'] = $comment->signature;
- $variables['title'] = l($comment->subject, 'comment/' . $comment->cid, array('fragment' => "comment-$comment->cid"));
+ $variables['permalink'] = l('#', 'comment/' . $comment->cid, array('fragment' => "comment->$comment->cid"));
Also the construction of the fragment is inconsistent - concat vs "" interpolated, though that's possibly exising. Should really be a helper function likely.
The default display settings seem a bit ugly - the field labels show - can we hide them?
Subject:
hahaha
Body:
some text here
If I try to run the update code with no nodes having comments there is an SQL error:
Message: PDOException: SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'AS etid, 0 AS deleted, 'zxx' AS language, 0 AS deltaFROM
comment c
INNER JOIN ' at line 1: INSERT INTO {field_data_comment_body_2} (entity_id, bundle, comment_body_value, comment_body_format, etid, deleted, language, delta) SELECT c.cid AS entity_id, n.type AS bundle, c.comment AS comment_body_value, c.format AS comment_body_format, AS etid, 0 AS deleted, 'zxx' AS language, 0 AS delta
FROM
{comment} c
INNER JOIN {node} n ON c.nid = n.nid; Array
(
)
in comment_update_7007() (line 250 of modules/comment/comment.install).
The error is from an empty IN(), so maybe it needs to be checked.
#27
UI for comment fields is in #537750: Where can we tie a UI for fields on comments ?. Without the UI, you'll just get subject and body by default. I'm hoping placing the UI later will be a 'code slush' patch.
Rest of the review looks good, will re-roll a bit later. I'll look into hiding field labels here, although personally I'd be tempted to make that the default setting for labels.
#28
Untested re-roll which incorporates most of pwolanin's review (comment_enable() was missing a node_types_rebuild(), not a field API bug thankfully).
The building of the comment permalink is indeed inconsistent, I think this was probably inherited from the building of the node/$nid/#comment-$cid links in there before. Since it's not introduced here, I opened a new issue for a helper function #570266: Add helper function to generate comment permalinks which since it doesn't change any APIs ought to be possible to get in during code slush cleanup. Having rolled a couple of patches with those links, it's a real pain to type each time.
Unresolved:
label on textfields - is it worth trying to default that to off in general? If not I agree we can try to do that in the instance settings here.
UI for adding fields - this becomes a bit more important with this patch, but we should deal with it in the existing issue.
#29
The last submitted patch failed testing.
#30
Fix stupid syntax error.
#31
Now hiding subject and body labels by default.

#32
The last submitted patch failed testing.
#33
Sorry, minus debug this time.
#34
sub
#35
Lots of improvements, but I still see that there are no comment for fields on existing node types when I uninstall then re-install comment module
#36
update path:
- directly casting {node}.type to 'bundle' misses the 'comment_node_type_' prefix currently used by comment bundles
- the instances created in the upgrade path should also have their display settings adjusted (maybe a helper function would be useful here ?)
- the update function doesn't create instances of the 'subject' field if subject was disabled, but the data migration queries will migrate the corresponding values no matter what, thus inserting stale field data for objects that don't have an instance of the field.
- are we confident that a single-step, 'INSERT FROM (query)' migration will not timeout on comments-heavy sites ? FWIW, webchick agreed that turning heavy update funcs into multistep could be accepted as followup (critical) bugfixes.
#37
- directly casting {node}.type to 'bundle' misses the 'comment_node_type_' prefix currently used by comment bundles
OK. We can probably use CONCAT there, or just do a query per node-type.
- the instances created in the upgrade path should also have their display settings adjusted (maybe a helper function would be useful here ?)
That makes sense.
- the update function doesn't create instances of the 'subject' field if subject was disabled, but the data migration queries will migrate the corresponding values no matter what, thus inserting stale field data for objects that don't have an instance of the field.
Arghh. That's a good reason to do this per-content type then.
- are we confident that a single-step, 'INSERT FROM (query)' migration will not timeout on comments-heavy sites ? FWIW, webchick agreed that making heavy upgrade funcs multistep could be accepted as followup (critical) bugfixes.
I was split on this one - MySQL ought to be well optimized for INSERT ... SELECT, so doing it in one pass would both be faster for sites without all that many comments, and big sites should be running update.php with drush really - where it'd be also faster than a multi-pass. However it's potentially a bit more risky in that someone can try to upgrade a huge site via the browser with a short PHP time limit, and the multi-pass would save them there.
I can't get update.php to work in HEAD at the moment, which means while the upgrade path worked a couple of days ago, there's no way to test it now - either for other people to test it, or for me to properly re-roll to fix yched's points. Not really sure what to do about that.
#38
Converted to a multi-pass update (per content type though, so still the potential for some long queries, but I really think that's an edge case which should be handle by command line updates rather than slowing things down for everyone else). And also set labels to disabled in the update path.
#39
FYI, time spent in mysql does not count against php timeout. So we should never use batch API when a single (long) SQL query will do.
#40
Fixed some batch API stuff via yched in irc. Given Moshe's reminder about SQL execution time, we're definitely good on a pass per content type here.
#41
And yet another typo.
#42
Just removed the hardcoded 'zxx' strings in favor of FIELD_LANGUAGE_NONE in tests.
#43
- Fixed a typo in comments (Hide field lables by default)
- Fixed some flawed 'multipass' logic in comment_update_7008
- Added: drop comment.format column at the end of the update.
#44
The last submitted patch failed testing.
#45
Re-rolled.
#46
The last submitted patch failed testing.
#47
Straight re-roll of yched's patch from #43 again to see where we are with test failures.
#48
The last submitted patch failed testing.
#49
Moving to D8.
#50
RDF wants this, marking as not quite dead yet.
#51
rdf.module could save some logic if we were using fields in comment. I've rerolled the patch and tried it successfully with the RDF module (no comment specific logic needed in rdf.module). comment.test remains to be fixed.
#52
d.o returned an error. trying again with the patch.
#53
#54
The last submitted patch failed testing.
#55
I guess I'm fine going forward with this in D7 if Dries is, but IMO it should be possible to RDF-enable fields even if they don't go through field API. For example, Profile module's fields are not going to be converted to Field API this release, it's looking like. There will also be contributed modules that don't embrace the field API convention but will still want to leverage RDF. So from this perspective, Comment module could serve as an example of how to get RDF working for non-Field API fields?
#56
I think we can make this patch a fair bit simpler by doing body but not subject - we didn't really come up with a solution to having both body and subject optional (although it makes sense for one of them to be optional on most sites) in terms of administration etc. Comment body as field would just be a straight conversion though.
#57
Speaking for the RDF(a) effort, we have strong interest in converting comment body to a field in the interests of avoiding special casing comments with workaround code (which scor will be explaining in greater detail) even though, rest assured, the RDF API can and does apply to arbitrary non-FieldAPI field-like things (and here we come to understand that Drupal has more concepts than there are words in English).
#58
At present the rdf module does not deal very well with the comment body because it is not a field and therefore inconsistent with the other entities (node for example has its body as a field). The user bundle with profile.module is an exception, but profile.module is better than comment because it has the concept of fields (they are just not Field API fields). Profile has tpl files in which we can potentially add RDFa attributes. But the body of comment has no such tpl wrapper that we can use. Let's look at an example. I added a field to a comment bundle to illustrate the difference between a real field and the current comment body. This is the code it produces
<?php<div property="content:encoded" class="content">
<p>content of the comment body goes here...</p>
<p>and we can have as many tags as we want here</p>
<p>the body does not have a parent tag of its own which wraps the whole body content. what a pity, if only this was a field.</p>
<div class="field field-name-firstname field-type-text field-label-above clearfix">
<div class="field-label">Field label: </div>
<div class="field-items">
<div class="field-item even">the content of the field goes here</div>
</div>
</div>
</div>
</div>
?>
Comment body lacks a proper HTML wrapper to which we can attach RDFa markup. There might be way to work around this by adding markup to #prefix and #suffix, but would feel like a hack since the Field API would give that for free. Right now the RDFa markup (property="content:encoded") is in the top HTML tag because there is no other wrapping tag to use. The consequence of this is that the RDFa in the fields of comment are being ignored by nature of the RDFa processing rules: any attribute property means that the content of the tag is considered a literal (plain text or XML/HTML), hence the reason why we're trying to put the RDFa markup in a parent tag which is as close as possible of the content we want to annotate.We can specially modify comment body for RDFa, do ugly workarounds, or simply make it a Field proper.
Catch suggested to scale down the patch to just body as field and that would be enough already for the rdf patch to deal automatically with the comment body. Doing comment body is UI cleanup - since it's inconsistent with node at the moment.
EDIT: It would be great to have Dries' thumbs up for this before anyone invest time working on the patch.
#59
I spoke with Dries this evening. The gist is that common-sense clean-ups like this are okay, but it's critical that we do not further degrade performance. So please accompany patches with benchmarks.
#60
fix title ;-)
#61
Patch updated to only convert comment body to a Field. I haven't fixed the tests yet, but Drupal installs and you can create comments without blowing the site up! I've left some commented bits of patches here and there, but will clean that up after a good night sleep. For now, just curious to see how the bot will digest that.
#62
The last submitted patch failed testing.
#63
subscribing
#64
I may have stumbled upon one of the issues with the patch.
To see the form that Drupal was receiving, I output the return value of the call to drupalGet on line 34 of comment.test. In the form, I noticed that the textarea name was set to 'comment_body[zxx][0][value]' instead of 'comment'.
When I changed the array key from 'comment' to 'comment_body[zxx][0][value]' in the test on line 27, I went from 47 fails to 2 fails.
#65
Added to the last patch:
parent::attachLoad($comments);With these changes, Anonymous Comments, Comment Approval, and Comment RSS tests now pass. There are still 77 fails in the other 4 comment tests.
#66
Removed the changes to theme_comment_block() that were in the patch because they do not reflect what is currently in core and are giving errors.
#67
Corrected the variables used when truncating the comment to create the subject in comment_submit().
This now passes the Comment Block tests, which leaves 3 tests still failing.
#68
Corrected variables in comment_preview and in the interface test based on comment 64.
This passes the Interface and Paging tests, so just the Preview test to go in the Comments test group.
#69
Corrected a bunch of variables left in the test file based on comment 64.
This now works for all Comment tests, as well as RDFa and Trigger.
Still fails for Tracker.
#70
and the bot says....
#71
Variables corrected in trigger and tracker tests as per comment 64.
Rerolled to work with latest version of head.
#72
The last submitted patch failed testing.
#73
Can't do this:
+ $query->join('field_data_comment_body', 'b', 'c.cid = b.entity_id');Because field storage is pluggable. Instead any code checking for $comment->comment needs to look for the field location $comment->field_body[FIELD_LANGUAGE_NONE][0] etc.. If there's a direct query on the {comment} table including comment body, that needs to be converted to comment_load_multiple().
#74
Fixed comment_node_update_index() to include $mode and $comments_per_page. Fixed search.test to account for the new comment body. The previous test was searching for comment title although the search keyword was the body of the comment. I'm not sure why but with comment body as field, the subject of the comment does not appear entirely but only the end. In general, if the title is too long, it won't necessarily appear completely in the search results. So instead of expecting the comment subject for a search of the body, I have split the test into two searches, the first for the subject (checking the subject appears) and the second searching for the body and checking it appears (plus other body full HTML tests).
#75
kicking the bot again...
#76
The last submitted patch failed testing.
#77
Thanks for pointing out the error, catch.
I have used comment_load_multiple.
EDIT: I need to reroll this real quick
#78
Ok, this time for real.
The edit from the last patch is at line 82 of comment.admin.inc.
#79
I assume this needs testing?
#80
The last submitted patch failed testing.
#81
Since the last patch was uploaded, a new comment schema update was committed to comment.install, so needed to update the numbers on the schema.
This rerolled patch should apply cleanly, but the Search Engine Ranking test that failed after #74 continues to fail, so this still needs work.
#82
#83
The last submitted patch failed testing.
#84
comment_update_7012() was defined twice in comment.install. rerolled.
I did some basic performance benchmark like catch did at #17. (Generated one node with 300 comments, displaying 50 per page.) but could not find any noticeable impact with this patch (4.82 #/sec for both). I'd like someone else to confirm that please :). devel generate does work for D7, except you might have to retry a few times until it creates the nodes/comments for real. (It worked for me with 10 nodes and 300 comments).
#85
The last submitted patch failed testing.
#86
fixes the search ranking test and removes some trailing spaces.
#87
The last submitted patch failed testing.
#88
fixes the remaining failing tests in field_ui.test and forum.test.
#89
+++ modules/comment/comment.module@@ -1911,8 +1908,7 @@ function comment_preview($comment) {
+ $comment->format = $comment->comment_body['zxx'][0]['format'];
ignore these
comment_body['zxx']which need to be replaced with the right variable. Nevertheless this patch is ready for reviews and benchmarks. Note that I ran some benchmark tests but could not see any performance impact with this patch, but I'd like someone else to confirm this.#90
shame on me for not setting the right permission for the anonymous users to access the comments, that's lame (thanks catch for pointing this out). some quick benchmarking with 300 comments, 50 displayed and ab -c 1 -n 100 gives me 2.34 request / sec without the patch and 2.67 request / sec with the patch applied. I don't want to draw any conclusion without someone else with more benchmark experience confirming this.
#91
As before PHP performance is degraded, number of queries reduced due to skipping the separate filter cache for text fields. So that doesn't help non db-caching sites, although they'd also be getting the entire comment body from memcache instead of the db, so maybe it'd help a bit.
Personally I think the results are OK enough that potentially scalability improvements from using field API down the line cancel it out, and we should be focusing on #438570: Field API conversion leads to lots of expensive function calls and other bottlenecks rather than keeping stuff like this out because Field API is slow - that's a bad situation to be in with Drupal 7's most touted feature.
Devel output: (anonymous user with access to see devel stuff - ignore everything except the query counts since the timings vary significantly between requests).
HEAD:
Page execution time was 492.52 ms. Executed 109 queries in 47.05 milliseconds.
Page execution time was 546.5 ms. Executed 58 queries in 61.62 milliseconds.
ab -c1 -n1000 node/2
HEAD:
Concurrency Level: 1
Time taken for tests: 428.009 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 140984000 bytes
HTML transferred: 140458000 bytes
Requests per second: 2.34 [#/sec] (mean)
Time per request: 428.009 [ms] (mean)
Time per request: 428.009 [ms] (mean, across all concurrent requests)
Transfer rate: 321.68 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 394 428 26.9 422 613
Waiting: 384 417 26.1 411 603
Total: 394 428 26.9 422 613
Patch:
Document Path: /node/2
Document Length: 150997 bytes
Concurrency Level: 1
Time taken for tests: 476.468 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 151523000 bytes
HTML transferred: 150997000 bytes
Requests per second: 2.10 [#/sec] (mean)
Time per request: 476.468 [ms] (mean)
Time per request: 476.468 [ms] (mean, across all concurrent requests)
Transfer rate: 310.56 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 450 476 21.5 473 792
Waiting: 439 464 20.9 461 780
Total: 450 476 21.5 473 792
#92
Cross posted with scor, looks like his come out slightly faster with the patch - that may be well be down to the specs of our respective machines and how MySQL is set up on each, either way it shows the performance impact isn't overly bad here.
#93
I reran some benchmarks again, following this procedure:
1. install HEAD. enable devel generate and create one node with 300 comments. allow anonymous to access comments. benchmark /node/1.
2. patch HEAD with #88. run update.php (there should be 2 updates). benchmark /node/1.
this procedure should be fair since it ensures we're benchmarking with exactly the same set of comments.
HEAD without patch:
Concurrency Level: 1
Time taken for tests: 362.018 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 152513000 bytes
HTML transferred: 152000000 bytes
Requests per second: 2.76 [#/sec] (mean)
Time per request: 362.018 [ms] (mean)
Time per request: 362.018 [ms] (mean, across all concurrent requests)
Transfer rate: 411.41 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 2
Processing: 342 362 18.3 356 505
Waiting: 328 347 17.1 342 486
Total: 342 362 18.3 356 505
HEAD with patch:
Concurrency Level: 1
Time taken for tests: 375.987 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 162763000 bytes
HTML transferred: 162250000 bytes
Requests per second: 2.66 [#/sec] (mean)
Time per request: 375.987 [ms] (mean)
Time per request: 375.987 [ms] (mean, across all concurrent requests)
Transfer rate: 422.75 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.2 0 5
Processing: 360 376 9.3 375 483
Waiting: 346 361 8.5 359 431
Total: 360 376 9.3 375 483
that's an overall 5 % performance loss.
Here are devel output (by anonymous user):
- cold caches
HEAD without patch: Executed 238 queries in 109.61 milliseconds. Page execution time was 585.52 ms.
HEAD with patch: Executed 139 queries in 139.77 milliseconds. Page execution time was 652.35 ms.
- warm caches
HEAD without patch: Executed 106 queries in 35.06 milliseconds. Page execution time was 368.53 ms.
HEAD with patch: Executed 55 queries in 22.02 milliseconds. Page execution time was 383.7 ms.
#94
It seems the patch still has parts where $comment->subject is replaced by $comment->cid or $comment->user in displayed messages, administrative lists, etc.
That was needed in a previous state of the patch where 'Subject' was moved to a 'field' too, but given that the scope has been reduced to only comment body now, are those changes still needed ?
#95
Other remarks:
+++ modules/comment/comment.install@@ -207,6 +223,92 @@ function comment_update_7010() {
+ $body_instance['bundle'] = 'comment_node_' . $info->type;
Shameless plug: #613630: Remove prefix on comment bundles names removes those 'comment_node_' prefixes, they are not needed anymore. Needs someone to RTBC.
They should stay in this patch until that other one is committed, of course.
+++ modules/comment/comment.admin.inc@@ -72,26 +72,37 @@ function comment_admin_overview($form, &$form_state, $arg) {
+ // We need to load the comment object to get the comment_body field which is
+ // stored in its own table. To do this, get an array of cids.
+ foreach ($result as $comment) {
+ $comments[] = $comment;
+ $cid[] = $comment->cid;
+ }
+ $comment_objects = comment_load_multiple($cid);
I don't get why we bother with both $comments (fetched from a direct query) and $comment_objects (loaded by comment_load_multiple()). If we need to load the full comments anyway, why not have the query select only the relevant cids, and pass them to comment_load_multiple() ?
+++ modules/comment/comment.install@@ -207,6 +223,92 @@ function comment_update_7010() {
+function comment_update_7013(&$context) {
+ $ret = array();
The official name of the param in hook_update_N() is $sandbox. $context is for generic batch API callbacks, update functions have an extra layer.
(node_update_7006 is wrong and should be fixed separately).
+ No more $ret arrays in update functions now.
Don't return anything, and set '#finished' in $sandbox['#finished'].
+++ modules/comment/comment.pages.inc@@ -108,7 +108,7 @@ function comment_reply($node, $pid = NULL) {
- $comment->comment_format = $comment->format;
+// $comment->comment_format = $comment->format;
Should I stay or should I go ? ;-)
I'm on crack. Are you, too?
#96
I was reviewing this today, I had thought the same thing and couldn't remember why we hadn't just pulled cids. But I think we do need to both query from the database and load the full comment.
We need things besides cid in the query because:
We can remove the join with the users table if we pass in the comment_object instead of comment to theme_username, though.
I will be posting a patch with that change shortly.
#97
In this patch:
Still needs to be done:
Going to kick the bot to make sure this passes as expected.
#98
re #96:
Right, I forgot about the tablesort.
Then the query should only include the columns needed for the tablesort.
->fields('c', array('subject', 'nid', 'cid', 'changed', 'status', 'name', 'homepage'))should be
->fields('c', array('cid', 'subject', 'name', 'changed'))The actual comments data should be loaded by comment_load_multiple() from the cids. Less data flying around, no more $comments / $comment_objects confusion.
The arrays collecting cids to load should be names $cids instead of $cid, which is confusing.
Also, I still wonder about #94.
#99
Patch includes:
Still needs to be done:
#100
@linclark: you can ignore this permission glitch, it was just necessary to run the benchmarks.
+++ modules/comment/comment.module 22 Nov 2009 11:49:26 -0000@@ -1353,8 +1360,6 @@ function comment_save($comment) {
@@ -1425,6 +1430,7 @@ function comment_delete_multiple($cids)
@@ -1425,6 +1430,7 @@ function comment_delete_multiple($cids)
comment_delete_multiple($child_cids);
_comment_update_node_statistics($comment->nid);
}
+ //parent::attachLoad($comments);
}
}
I think we haven't determine whether we need this or not. Can someone familiar with the Field API enlighten us? (this was commented out in #65).
+++ modules/tracker/tracker.test 22 Nov 2009 11:49:28 -0000@@ -74,7 +74,7 @@ class TrackerTest extends DrupalWebTestC
- 'comment' => $this->randomName(20),
+ 'comment_body[zxx][0][value]' => $this->randomName(20),
zxx needs to be FIELD_LANGUAGE_NONE
This review is powered by Dreditor.
#101
fix the remaining bits of [zxx].
#102
Scor asked me to give this another round of review.
Aside from my comment in #98 (i.e second point in #95), here are my remarks:
+++ modules/comment/comment.install@@ -10,6 +10,9 @@
+ // Delete comment_body field.
+ field_delete_field('comment_body');
+
This is only valid if we allow fields to be restricted to some entity types.
Short of that, you cannot be sure that the 'comment_body' field has not been reused on other entities than comment.
I do think we need that API addition anyway, because having the 'comment body' field show up in the 'add existing field' UI in nodes will be pretty confusing.
I don't mind keeping the current code for now, but then this feature becomes a critical followup.
+++ modules/comment/comment.install@@ -43,6 +46,19 @@ function comment_enable() {
+ // Create comment body field.
+ // @todo this should be done in comment_install, but causes exceptions
+ // in simpletest due to dependencies between entity_get_info() and
+ // taxonomy_install().
Could we just give it another shot - things *might* have gotten better meanwhile ?
+++ modules/comment/comment.install@@ -43,6 +46,19 @@ function comment_enable() {
+ // Create instances on all existing node types.
+ node_types_rebuild();
I'm not sure I get this.
+++ modules/comment/comment.install@@ -207,6 +223,87 @@ function comment_update_7010() {
+ // Hide field labels by default.
Nitpick: 'label' (singular) might be more accurate ?
+++ modules/comment/comment.install@@ -207,6 +223,87 @@ function comment_update_7010() {
+ foreach (node_type_get_types() as $info) {
+ $body_instance['bundle'] = 'comment_node_' . $info->type;
+ field_create_instance($body_instance);
+ }
Hm - on all node types ? or only the ones with comment enabled ?
[edit: all node types, because some might have comments but with comments disabled afterwards, and we don't want to lose that data. I think this deserves a comment][edit: forget about this, I'm on crack - code is fine]
+++ modules/comment/comment.install@@ -207,6 +223,87 @@ function comment_update_7010() {
+ $sandbox['#finished'] = 0;
not needed
+++ modules/comment/comment.install@@ -207,6 +223,87 @@ function comment_update_7010() {
+ if (!$comments) {
+ // The update will finish when $sandbox['types'] is an empty array.
+ // Since we have nothing to do if there's no comments on the site, except
+ // drop the columns, set that here for convenience.
+ $sandbox['types'] = array();
+ }
+ else {
+ $sandbox['etid'] = _field_sql_storage_etid('comment');
+ $sandbox['types'] = node_type_get_types();
+ }
could be shortened to
$sandbox['types'] = array();if ($comments) {
$sandbox['etid'] = _field_sql_storage_etid('comment');
$sandbox['types'] = node_type_get_types();
}
+++ modules/comment/comment.module@@ -1425,6 +1430,7 @@ function comment_delete_multiple($cids) {
+ //parent::attachLoad($comments);
Should I stay or should I go ? ;-)
I'm on crack. Are you, too?
#103
rerolled the patch since it didn't apply any longer. kicking the bot to make sure all tests are passing.
from #98
removed status which was still in the latest patch. The other comments from #98 should have been addressed by #99, i.e. all comment data is loaded from comment_load_multiple(), except the cids of course.
still need to work on the other comments from yched.
#104
All of yched points have been addressed.
Let's follow up in a different issue since this feature should be available and is not comment specific. This patch has been waiting here for a long while...
no luck. It works during a normal install but not during testing.
FieldException: Attempt to create a field of unknown type <em>text_long</em>. in field_create_field() (line 279 of /Applications/MAMP/htdocs/d7git/modules/field/field.crud.inc).field_info_field_types() returns an empty array during the tests. This sounds like a bug in testing which should be fixed in its own issue.
I improved the comment. It also works without having an explicit node_types_rebuild() in hook_enable() but that's because node_types_rebuild() is called after enabling the module. Not sure we should remove it though since it's there for consistency (create field, then create instances).
#105
+++ modules/comment/comment.install@@ -43,6 +46,20 @@ function comment_enable() {
+ // Create comment body field.
+ // @todo this should be done in comment_install, but causes exceptions
+ // in simpletest due to dependencies between entity_get_info() and
+ // taxonomy_install().
I'm OK for investigating this in a separate issue, but the comment doesn't seem accurate - dependencies between entity_get_info() / taxonomy_install() ?
- The error you mention (Attempt to create a field of unknown type text_long) seems related to _field_info_collate_types() cached info, not entity info.
- there's no taxonomy_install() ;-)
+++ modules/comment/comment.admin.inc@@ -67,31 +67,38 @@ function comment_admin_overview($form, &$form_state, $arg) {
+ foreach ($result as $comment) {
Thanks for the refactoring, much clearer now. Last nitpick: I'd suggest naming the var $row or $record instead of $comment, the query doesn't return actual comment objects.
+++ modules/comment/comment.install@@ -43,6 +46,20 @@ function comment_enable() {
+ // Create comment body field instances on all existing node types via
+ // comment_node_type_insert().
+ node_types_rebuild();
Sorry to be blunt but I still don't get this.
If comment.module gets enabled months after the initial install (or gets disabled and re-enabled months after), then a node_types_rebuild() won't trigger a hook_node_type_insert() because there are no new content types on the site.
If the code that needs to run on comment "module being enabled" and on "node type being created", then I'd suggest calling the same helper function in both hooks, but not artificially firing one hook from the other.
This review is powered by Dreditor.
#106
Thanks yched, we're almost there ;) I've fixed all your comments. There is now a helper function to cover your last concern.
#107
+++ modules/comment/comment.module@@ -239,7 +239,7 @@ function comment_count_unpublished() {
function comment_node_type_insert($info) {
- field_attach_create_bundle('comment', 'comment_node_' . $info->type);
+ _comment_body_field_instance_create($info);
}
Hm, I don't think we want to remove the call to f_a_create_bundle() ?
I'm on crack. Are you, too?
#108
@yched, it's not removed, it's moved with the rest in _comment_body_field_instance_create():
+++ modules/comment/comment.module@@ -271,6 +271,30 @@ function comment_node_type_delete($info) {
+function _comment_body_field_instance_create($info) {
+ field_attach_create_bundle('comment', 'comment_node_' . $info->type);
#109
Oops, right. Well, I don't think that's correct. Creating the bundle for comment-on-some-node-type does not belong to a function whose advertised role is to create a field instance. It does exactly belong to hook_node_type_insert() ;-)
#110
oh, right, I've moved it back in comment_node_type_insert().
#111
Thanks ! RTBC, then :-).
#112
bumping this issue, it's been RTBC for a few days now. Since it changes comment body into field, we should get this in before any beta comes out. (Note that this is also the condition for having proper RDFa support for comments and streamlines another old built in field cruft...)
#113
Dries, I'll leave this one to you. This is a pretty major API change, but if you are okay with it, given your knowledge of RDF stuff, I won't squabble. :)
#114
I'm joining late; I think in principle this is awesome, but I haven't reviewed the code in detail. I wanted to generate some new benchmarks, but ran into errors.
1. Patch doesn't apply and needs a reroll.
2. I tried to reroll, but then ran into errors. Perhaps caused by:
+++ modules/comment/comment.install@@ -214,6 +237,80 @@ function comment_update_7011() {
+ // On the last pass of the update, $sandbox['types'] will be empty.
+ if (empty($sandbox['types'])) {
+ db_drop_field('comment', 'subject');
+ }
Is this correct? If so, should there also be a removal of 'subject' from hook_schema()?
This review is powered by Dreditor.
#115
@effulgentsia right, this piece had nothing to do there. Rerolled patch without it since we do not touch the comment subject. Applied the patch on a working HEAD with node and comment content and ran update.php with no problem. rerolled patch so you can go ahead with your benchmark.
#116
now with the new LANGUAGE_NONE constant.
#117
created #652246: Remove comment body specific cruft for RDFa support as follow up issue for when this is in.
#118
Human tested on articles, forum, and custom content; and with tracker. Functions as before, and makes RDFa goodness easier (and also makes comment body play well with custom comment fields).
The comment subject appears below the comment body (and also the body is labeled "body" instead of "comment'), but if no one has complained about this in 117 comments let's just pretend i didn't bring it up. That can be done in a UI fix after this gets in. (Note: If it cannot be corrected by way of what is passed through field_attach_form, subject and author can be manually weighted -1 and -2 respectively and the form looks as it looks for comments pre-patch.)
benjamin, agaric
#119
comment_update_7013() was not dropping the column comment and format after moving the data to field, I've added these to the patch. Also fixed the form element ordering pointed by Benjamin. See comment form as logged in user and anonymous user.
I also ran some benchmarks which showed 6% performance impact on my localhost:
HEAD 3.28 #/s - 308ms
patched 3.10 #/s - 322ms
#120
Oh, wonderful! I do feel better now.
#121
On my computer, I see a 19% performance degradation for an anonymous user with permission to view comments viewing an article node with 50 comments (200ms vs 238ms). That matches what catch saw in #91. However, as catch points out in that comment, the numbers are very sensitive to machine specs, since there's both rendering factors and db query factors, so I'm not surprised that scor is seeing only a 6% change. My procedure was to install HEAD, use devel_generate to make a node with 50 comments, benchmark, apply patch 119 and run update.php, and benchmark again.
This is a frustrating situation. On the one hand, I think this is an architecturally awesome thing to do, and my preference would be to see it land, because on any site I work on, I can use the drupal_render() cache to optimize comment display, and if on a particular site I couldn't, I would scale up hardware, and would see that as a good trade-off for getting the benefits of this patch. And I agree with scor that if it's to be in D7 at all, it should land before the first beta. And some very smart people are working on improving the performance of Field API: #438570: Field API conversion leads to lots of expensive function calls. On the other hand, we're a community, and even a 6% performance impact (and certainly a 19% one) will upset a significant portion of the community. Note Moshe's comment when a 6% degradation was introduced for contextual links: #607244-8: Decrease performance impact of contextual links. And that was for something that we knew could be turned off by people who care more about performance than features (as of yesterday, it's now a module that can be disabled). This patch, however, is not optional. If it lands, it cannot be disabled.
Tough decision, and I'm glad I'm not the one that needs to make it. Dries: good luck :).
#122
Obviously no. Six or nineteen , too much.
#123
@effulgentsia: interesting results. What system do you use? For what it's worth, I ran another series of benchmark on a basic site5 hosting server. I installed HEAD on a subdomain and generated 10 nodes with 150 comments max. I cloned the db onto a second subdomain, applied the patch and ran update.php. In order to account for the fluctuating server load, I used a script to alternate the ab command from one site to the other:
for i in {1..100}do
echo "Round $i"
ab -c1 -n1 http://538164-head.drupalrdf.openspring.net/node/9 >> bench_results_head.txt
ab -c1 -n1 http://538164-patch.drupalrdf.openspring.net/node/9 >> bench_results_patch.txt
done
I parsed the results and the poorest average result I got was still less than 8%. The sites are public so feel free to check them out.
EDIT: @effulgentsia catch's results in #91 are closer to 11% than 19% - though I admit either is still high :(
#124
And if comment body is not meant to become a field, why having fieldable comment to begin with if core doesn't even take advantage of it?
#125
re #123: 8% for a page displaying
150 fields[edit: er, more probably 50 fields, since default comments per page is 50 - my remark still applies though] sounds within the realm of the optimization brought by #652834: Field formatters as render arrays (see moshe's benches in #438570-60: Field API conversion leads to lots of expensive function calls).Would be great if you could bench HEAD vs "this patch + #652834: Field formatters as render arrays", see what we come up with.
#126
You're right. Also, I realized that after running update.php, if I also clear drupal caches (admin/config/development/performance), then the performance on my machine (MAMP) improves to only 10% degradation (I keep forgetting that update.php doesn't empty drupal caches any more). So there's less machine sensitivity than I thought, and our benchmarks are compatible with each other: awesome!
I'm about to test out the impact of #652834: Field formatters as render arrays and will post the results.
#127
I'm actually seeing degradation with #652834: Field formatters as render arrays:
HEAD: 205ms
Patch 119: 225ms
Patch 119 + Patch 652834: 227ms
Let's continue work on that issue, and post back here when it successfully optimizes comment body to within acceptable levels.
#128
Almost all of the performance difference can be traced down to theme('field'). It's a template file, and those are slow. If you add
unset($variables['content']['comment_body']['#theme']);at the end oftemplate_preprocess_comment(), then it removes the calls to theme('field') and the performance impact of the patch drops to 2%, which can probably be optimized even lower. I'm not saying we necessarily want to do that, though I think doing that would be preferable to not having the patch land at all, but there's probably something we can do to make comment_body special and get themed as a field in a more optimized way (or maybe even optimize theme('field') generically). I'll post more thoughts here when I'm able to work on it again.#129
I've looked a bit at template flow and didn't find a way to speed it up. Hope you guys can.
Meanwhile, maybe we turn 'field' theme hook into a function instead of a template? Anyone who wanted template way could do so with hook_theme_regestry_alter, or just call out to a template from a theme function. I agree that this is far preferable to shying away from making stuff like comment body into a field.