Task description
One of the things that has an great impact on Drupal's success is the widely adopted agreement on clean, sophisticated, and well-documented code. In an abstract view, we are not only using a language (PHP), but also using a dialect to articulate what we are doing. This means that core developers and most contrib module maintainers try to ensure that everyone else in the community understands all parts of the application and is able to fix or improve the code without language barriers. Without this solid ground, no one would be able to participate and contribute in the way the Drupal community members are doing. Many patches against Drupal core are holden back or even rejected because of unsuitable code.
To support developers in using the proper syntax, formatting, naming, and adhering to all other development guidelines for Drupal, the Code Review module has been developed, which outputs notices for all code that is considered wrong in terms of Drupal Coding Standards. While Coder does only a code review and leaves you with the results, there is also the coder_format script shipped with the Coder module, that strives to clean and re-format a whole file to adhere to Drupal's Coding Standards.
Cleaning the coding-style of a contributed module with coder_format is usually a 5-minute job. Coder_format can be run recursively on all files in a directory by invoking it with the --batch-replace argument. See inline documentation in coder_format.php for further information.
However, due to its current beta-state, there are situations where coder_format applies wrong formattings to some lines or all lines that follow some lines of complex syntax in a file.
  • The goal of this GHOP task is to fix and improve the coder_format script, so that it can be applied recursively on Drupal 6 core without introducing wrong formattings (and fixing existing wrong formattings throughout Drupal core, of course).
  • The result of this GHOP task must be a single patch file against coder_format.inc that is ready to be committed. To clarify in advance: not a patch against Drupal core -- Patch files against Drupal core are only needed to communicate the current results/errors of coder_format in this issue.
  • To work on this task, you should have a local development environment, be able to run a PHP script from the command line, know how to access a CVS server, and how to checkout and create patches against a CVS module. You should have a (better good) visual diff tool to compare a resulting file with the original.
  • You will have to deal with rather complex regular expressions, Drupal Coding Standards, reviewing of Drupal core (with and without Coder module), creating and applying patches, writing helpful inline documentation, and identifying wrong formattings.
Resources
Primary contact
Daniel F. Kudwien (sun)
I'm pretty sure that fellow Drupallers like Doug, Stella, Core developers, subscribers of the Coding Standards Group and also developers in the #drupal and #drupal-ghop IRC channels will help out on any questions related to missing information about Drupal Coding Standards.

Please bear in mind that coder_format is already performing quite well. So the referenced resources about coding standards are only necessary to find out how something should look like if there are any wrong formattings. It is quite hard to believe that the attached patch is already based on a completely re-formatted Drupal core, which means that all files have been re-written by coder_format.
Pre- and post-processors are already part of the script, so the mentioned post-processor for cleaning up lines containing only white-space is definitely not a big deal. Also, the mentioned bug in drupal_urlencode() is clearly caused by the pre-processor coder_preprocessor_inline_comment().

For the sake of completeness I'm adding a patch file against Drupal 6 core that is the result of coder_format. It's about 1,5 MB, so I needed to zip it. You will discover the first formatting error in drupal_urlencode() which leads to wrong indents in all following lines. Furthermore, the patch would be dramatically smaller, if there was a post-processor that removed any white-space from all lines containing only white-space (this is one of the needed improvements).

CommentFileSizeAuthor
#60 coder-DRUPAL-6--1.patch8.51 KBsun
#59 coder-DRUPAL-6--1.patch7.17 KBsun
#58 coder-DRUPAL-6--1.patch9.48 KBsun
#57 coder-format-gamma-4.patch9 KBezyang
#56 coder-format-gamma-3.patch8.44 KBezyang
#55 coder-format-gamma-2.patch5.8 KBezyang
#53 drupal6.patch969.73 KBsun
#52 coder-format-gamma-1.patch10.79 KBezyang
#49 coder_format.tgz13.48 KBsun
#48 coder_format.inc-3.patch2.71 KBezyang
#47 coder_format-17.patch32.46 KBezyang
#46 coder_format-16.patch30.4 KBezyang
#45 coder_format-15.patch27.65 KBezyang
#44 coder_format.inc-2.patch7.47 KBezyang
#43 coder-DRUPAL-6--1.patch61.98 KBsun
#42 coder_format.inc_.patch31.84 KBezyang
#41 drupal6.patch922.09 KBsun
#40 coder-DRUPAL-6--1.clean-up.patch47.39 KBsun
#38 coder-DRUPAL-6--1.16-new-tests.patch37.72 KBsun
#37 coder_format-16.patch25.08 KBezyang
#36 coder_format-15.patch24.03 KBezyang
#34 coder-DRUPAL-6--1.coder_format-tests.patch41.07 KBsun
#33 coder-DRUPAL-6--1.coder_format-tests.patch32.7 KBsun
#32 coder-DRUPAL-6--1.coder_format-tests.patch31.59 KBsun
#30 coder_format.zip5.35 KBezyang
#30 coder_format-12.patch17.92 KBezyang
#29 coder_format-11.patch16.61 KBezyang
#26 coder_format-10.patch16.6 KBezyang
#25 coder_format-9.patch15.28 KBezyang
#22 coder_format-8.patch13.78 KBezyang
#21 coder_format-7.patch11.79 KBezyang
#18 coder_format-5.patch9.99 KBezyang
#18 coder_format-6.patch11.08 KBezyang
#10 coder_format-1-1.patch1.09 KBezyang
#10 coder_format-comment.patch1.83 KBezyang
#10 common.inc_.patch54.48 KBezyang
#10 database.inc_.patch7.11 KBezyang
#10 database.mysql_.inc_.patch5.92 KBezyang
#10 database.mysql-common.inc_.patch6.8 KBezyang
#6 common.inc_.patch94.84 KBezyang
#5 cache.inc_.patch1.98 KBezyang
#5 cache-install.inc_.patch728 bytesezyang
#4 bootstrap.inc_.patch16.89 KBezyang
#3 batch.inc_.patch11.25 KBezyang
#2 coder_format-3.patch1.86 KBezyang
#2 actions.patch4.83 KBezyang
drupal6.patch.zip360.67 KBsun

Comments

aclight’s picture

Title: GHOP #x: Improve coder_format to re-format Drupal 6 core without wrong formattings » GHOP #136: Improve coder_format to re-format Drupal 6 core without wrong formattings

The offical GHOP task related to this issue can be found at:
http://code.google.com/p/google-highly-open-participation-drupal/issues/...

ezyang’s picture

StatusFileSize
new4.83 KB
new1.86 KB

See comments #1 and #2 here: http://drupal.org/node/149826#comment-693264 ... I accidentally posted them in the wrong place. I hope it's not too much of an inconvenience. I'll repost below:

The first order of business is to get coder_format working on Windows. This patch fixes a number of problems I found. Specifically:

  • There were bad replacements for Windows/Mac newline normalization, resulting in literal "\n"s all over the place
  • coder_postprocessor_array_rearrange totally nukes the file contents, so it had to be completely disabled. I'll see if I can fix it.
  • A conf_path() dependency was recently added, so I stubbed that function too

Coder needs a dose of its own medicine, but that will be done in good time. :-)

Here are the changes coder identified for the action.inc file. Most of them are sensible, but a few are hazy. Specifically:

  • -46,9 +47,9; -75,12 +76,13; -93,9 +95,9: The spacing added to justify the three assignments seems excessive. Maybe coder should have an "upper limit" of how far it goes before it gives up and starts another batch?
  • -210,9 +211,9: Coder appears to have collapsed the indent here; how curious!
ezyang’s picture

Assigned: Unassigned » ezyang
Status: Active » Needs review
StatusFileSize
new11.25 KB

batch.inc is fairly straightforward; same issues as action.inc, but also, we have a lot of reference assignments converted from =& func() to = &func(). I hope this is the intended behavior, although it is not mentioned in Drupal or PEAR's coding style manual.

ezyang’s picture

StatusFileSize
new16.89 KB

bootstrap.inc:

  • Is list( , $session_name) or list(, $session_name) preferred? Coder likes the latter.
  • -584,28 +596,30: It doesn't look like Coder capitalizes inline comments. Should this be added?
  • -755,21 +770,21: Coder appears to have undone the "nice" formatting. Is this desired?
  • -1104,7 +1120,7: Coder prefers (object)array() over (object) array()
ezyang’s picture

StatusFileSize
new728 bytes
new1.98 KB

cache.inc and cache-install.inc are clean except for some minor whitespace changes.

ezyang’s picture

StatusFileSize
new94.84 KB

Ah, the first major problems are found in common.inc:

  • Coder un-indents an array at -2271,20 +2314,24; I am positive this is not desired, the array should be indented right one more.
  • -2334,7 +2382,8: Another bug with comments: // inside /* */ should not trigger the comment processor
  • -2342,12 +2391,13: Similar to the above, this actually breaks the code. Also, for some strange reason, the return is indented improperly. In fact, this appears to affect everything afterwards.

I'll terminate analysis for now and fix these issues. But before I do, any insights into what may be causing these problems?

sun’s picture

http://drupal.org/node/149826 contains the latest code of coder_format including a Windows installer now. The first two mentioned issues in #2 are fixed in that patch. Please use that version for your work on this task. I hope it will be committed ASAP.
Also, if you are working on Windows, you really should use the Windows Explorer integration to quickly and recursively clean/unclean the coding-style of Drupal core.

General:

  • Please try to summarize all findings in one issue follow-up, so this issue is not so cluttered.
  • When referring to wrong formattings or formattings in question, please add the filename of the patch you are referring to in your comments. (You attached two patches in #2 [which is good], but didn't include the filename in your list.)
  • Please use <code> tags in your follow-ups where appropriate.
  • To keep further attached patches as small as possible, you should begin with the proposed post-processor that removes white-space from all lines that only contain white-space.
  • When posting patches for coder_format (itself), please create them from the base directory of the coder module. Also, please use the -p option for diff. See http://drupal.org/patch/create for further info. Diff header lines in your patches for coder_format should always look like this:
    diff -u -p -r1.2.4.2 coder_format.inc
    --- scripts/coder_format/coder_format.inc	9 Jan 2008 13:31:12 -0000	1.2.4.2
    +++ scripts/coder_format/coder_format.inc	16 Jan 2008 10:53:05 -0000
    @@ -829,8 +829,8 @@ function coder_preprocessor_line_breaks_
    

Questions:

#2: conf_path
Great.
#2: actions_16.patch: Extra spacing for variable assignments
First issue found is unimportant (may be dealt with later), the second is caused by a maximum character limit in coder_replace_multiple_vars().
#3: batch.inc_.patch: Assign by reference
Yes, there is no rule. But since variables assigned by reference in function signatures (i.e. mymodule_foo($op, &$node)), and functions declared by reference (i.e. &mymodule_foo();) both use this style, we should use it for variable declarations, too.
#4: bootstrap.inc_.patch: list()
No extra white-space looks good.
#4: bootstrap.inc_.patch: Capitalizing inline comments
Not part of this issue, might be added later.
#4: bootstrap.inc_.patch: Extra spacing for multi-line arrays
Of course there are things, that can't be taken into account in such a generic approach. Since we also have multi-line arrays for menu items and would not want to apply extra spacing to them, this can only be disregarded. However, if you have an idea, let me know.
#4: bootstrap.inc_.patch: (object) conversion
I don't see a problem here, coder_format properly re-formatted to (object)array()
#6: common.inc_.patch
- bug: indents after return: yep
- bug: double slashes in multi-line comments: yep
- bug: wrongly interpreted double-slashes: definitely! This bug in drupal_url_encode() is the cause for the wrong indent in all following lines.

Now, writing this answer was a bit cumbersome. How about creating a Wiki page at http://groups.drupal.org/coding-standards-and-performance-optimization containing a table (or the like) with all findings, questions and answers?

sun’s picture

Status: Needs review » Needs work

Sorry, small addition FYI: To disable a pre-/post-processor, simply comment out the key '#search', and it won't be executed. To receive debugging info on the command line, just add the item '#debug' => TRUE,.

sun’s picture

@ezyang: http://drupal.org/node/149826 has been committed, so be sure to use a fresh checkout of coder_format (or wait approx. 12 hours until a new development snapshot has been packaged).

ezyang’s picture

I've made the wiki page: http://groups.drupal.org/node/8224 I'll post here when I make updates to that page, unless you say otherwise.

Also, it looks like conf_path() fix hasn't be committed yet, so here's a cleaner patch for the issue. We also have a (separate) patch for the two comment issues, which also appears to fix the indent problem.

Updated common.inc patch, with one more issue described on the wiki page. When it's verified, change it to OPEN.

Also attached are "clean" patches for other includes. They had no problems.

sun’s picture

Reminder: Last patch was not created from coder.module's directory. If you create patches for Drupal contrib modules, everyone expects them to be based on the root of the corresponding module (except Drupal core). See http://drupal.org/patch/create Check your directory for further information.

I've committed that patch, because it included only basic changes to get this task started. Upcoming patches will be kept in this issue until the issue is set to RTBC. I don't know the GHOP rules in detail, but I think one patch including all improvements will give us a better overview about the results of this task.

Wiki page: The current "File" and "Location" won't work out, because lines in "Location" refer to a patch file, while "File" refers to another file. Proposed procedure:
- Let's always refer to actual files in Drupal core - thus, also refer to line number(s) in those files (instead of temporary patch files). Everyone is able to track these issues in their own setups then.
- Let's move "File" + "Location" to the beginning of the table, and rename "Location" to "Line(s)".
- "Resolution" is actually a "Status".
- Could you add one example (!) filename and line number for the first three rows?
- It probably makes sense to add the file path to filenames (includes/bootstrap.inc), and order all table rows alphabetically.

Posting here and there will keep this issue warm and publicly visible, which is good.

ezyang’s picture

Quick note, there were updates to the post above before you posted. Composing a response now (will make a new comment, this might take a while).

sun’s picture

Patch files: Simply stating that a file does not include wrong formattings is enough. Please don't attach patches for those files. Instead, we might add a table row for each file indicating the status 'OK' - or what do you think? Too much clutter? Then a second table for reviewed, but 'ok' files would sound appropriate to me.

Please forget my (foolish) comment about line numbers. Let's replace them with function names instead. (Line numbers change quite often, and we still have 7 critical issues for D6)

Any updates regarding the white-space post-processor (to keep further patches smaller)?

sun’s picture

Hyphen in -1 should not be separated
Counter example: includes/install.inc:320

You've added a comma in coder_preprocessor_inline_comment() without updating the explanation (badly needed):

    // [^;,\$] prevents matching of CVS keyword Id comment and double slashes
    //   in quotes (f.e. "W3C//DTD").
ezyang’s picture

My apologies for all the mistakes. I'll try to iron out these errors. I've updated the table with your suggestions. I will handle the comma thing (it too is to prevent matching of double slashes in quotes, but I'll make it explicit that this is also for multiline arrays). I totally missed the whitespace bug too, so I'll need to fix that.

I'll post cumulative patches for coder_format. This makes it more difficult to see what changed, but wth. I'm also going to abstain from posting coder_format diffs, but I will be posting test-cases (hopefully, we can get them automated or something).

sun’s picture

Re: cumulative patches: That's absolutely okay, just keep on posting any updates! This is why: By reviewing all Drupal core files, we are creating a definitive list of bugs or bogus results at the Wiki page. This is our test plan. It is not only checked during development, but will also completely checked again in front of committing all fixes. It is okay, if only you and me are able to track the changes between coder_format patches in follow-ups of this issue, as long as others are able to understand (through inline/PHPdoc comments) how coder_format is working.

I hope that the existing inline documentation is sufficient enough to understand how coder_format works. Since you are one of the first developers that tries to understand and enhance it, feel free to improve the inline documentation in places you think it is missing or not sufficient. That will surely help other developers to enhance the script in the future.

While randomly looking through files, I discovered another two bugs:

includes/common.inc:drupal_common_theme()
Wrong indent of closing braces (due to missing comma in surrounding multi-line array).
includes/install.inc:drupal_rewrite_settings()
Inline comment in inline comment must not be re-formatted.

However, I didn't look thoroughly through both files, so there may be more issues.

ezyang’s picture

Updated: all outstanding issues fixed except for one! Whitespace issue required a globals hack, but I don't think it should be a problem as we are not running coder_format_string recursively.

The coder_format documentation is quite good. There are some design decisions (such as the heavy reliance on regexes) that I don't agree with, but it works well and I, so far, have been able to get it to do what I want it to do.

For the two issues you brought up, I've fixed the // // one. As for the indent of closing braces, I'm not sure if this actually is intended behavior: isn't this syntax:

$array = array(
  'foo', 'bar'
  );
// and so on

Valid?

ezyang’s picture

StatusFileSize
new11.08 KB
new9.99 KB

The previous patch broke a lot of spacing things, due to a faulty implementation of the lookbehind check. I've converted coder_br into accepting $result as a reference, which solves the problem and also looks cleaner!

Update 1: Here's another update, with includes up to locale.inc checked. 12 more to go!

Update 2: Analysis of last 12 files done! Mostly everything is savvy; I'd just like you to checkup on some of the new issues before I fix them.

ezyang’s picture

Status: Needs work » Needs review

Marking code needs review

sun’s picture

Status: Needs review » Needs work

Great work so far! I really like the changes you made to coder_br()'s $result handling.

However:

  • Please create a new follow-up if you want to post an updated patch.
  • Try to avoid variable name abbreviations ($in_function_decl may be $in_function_declaration or at least $in_func_declaration)
  • PHPdoc explanation for $in_function_declaration is missing. I don't understand why it is set to FALSE in case '(', and why exactly it has been added.
  • There is no @note PHPdoc command, just remove it and the indent. Everything between the one-line function short-description and function parameters are considered "notes". :)
  • I'm not sure, if the added indent handling for previous lines in coder_br() is secure. Personally, I'd feel better with a new post-processor that scans the whole file for lines containing only white-space (indent) and replaces them with a single new-line character. Additionally, the inline documentation is a bit meaningless, the order of arguments in the for statement is looking wrong, and all 'if' statements need to use curly braces even in situations where they are technically optional (see coder_postprocessor_if_curly_braces() ). Also, if I'm not totally wrong, accessing a single character in a string is done via $string{123}. Thus,
      // Perform backwards check.
      for ($i = strlen($result) - 1; $i--; $i >= 0) {
        if ($result[$i] == ' ') continue;
    

    should probably be

      // Check if previous character is white-space. [if you wrote it that way, you would probably understand my concerns]
      for ($i = strlen($result) - 1; $i >= 0; $i--) {
        if ($result{$i} == ' ') {
          continue;
        }
    

All other questions have been answered in the wiki page.

ezyang’s picture

StatusFileSize
new11.79 KB

Issues addressed.

I respectfully disagree with the proposition that newline collapsing should be a post-processor: this is because there's no way to tell if we're inside a string/heredoc/etc or not. Granted, in most cases, some removed whitespace won't make a difference, but things are safest if we do rtrimming when the code has explicitly given us permission to input whitespace.

Also, curly brace syntax will be deprecated with PHP 6, so I'm sticking with square brackets.

P.S. Curiously enough, the for loop still works properly in both cases: when $i hits zero it evaluates false and terminates the loop! I have fixed it, however.

ezyang’s picture

StatusFileSize
new13.78 KB

Juicy new multiline array indenter added. Also, docs added for the hyphen fix.

sun’s picture

Wow! That's totally awesome! :)

Unfortunately, I've discovered some very odd results in common.inc:
- drupal_add_feed(): The line containing drupal_add_link(array('rel' => 'alternate', has been deleted.
- drupal_add_js(): The line containing array('basePath' => base_path()), has been deleted.
- watchdog_severity_levels(): The whole array has been deleted.

Your arguments about trimming white-space are good. I've marked the corresponding issue fixed.

sun’s picture

I'm not sure if we really should investigate the new multiline array indent processor further. As mentioned before, multiline arrays are also used in hook_menu(), and from my knowledge, many developers would not be happy to see their menu items indented like this, f.e. in aggregator.module.

I'd say: Let's keep this in mind (and in the patch), disable the processor (for now), and focus on the other bugs. Would that be okay with you?

ezyang’s picture

StatusFileSize
new15.28 KB

Ok.

Did you merge the common.inc :: drupal_common_theme() issues? I would prefer it the other way, as when they're merged, I can't set the status for a specific task otherwise; I think I've fixed "coder_preprocessor_ml_array_add_comma() should add a comma to multi-line array values, which required for case ')'." but not the above issue.

A lot of processors need the same regexps (for example, there is a lot of multi-line array handling). How should we reuse these regular expressions?

ezyang’s picture

StatusFileSize
new16.6 KB

I'm suspicious about this patch. It seems to work, but I haven't tested it enough. This fixes mail.inc :: drupal_mail_send, and should (theoretically) also make some of the comma pre-processing obsolete.

sun’s picture

Re: drupal_common_theme() issues: Can't remember. If it was me, it happened not intentional during a clean-up.

Re: preprocessor_ml_array_add_comma: Performed only a quick check - unfortunately, it introduced a bug in common.inc:url().

sun’s picture

eerm... IRC? *g*

ezyang’s picture

StatusFileSize
new16.61 KB

Updated preprocessor_ml_array_add_comma(), should be working properly now. Sorry, I have to run now. If you're still around in about three hours, I'll be back.

ezyang’s picture

StatusFileSize
new17.92 KB
new5.35 KB

Unzip this file in your tests/ directory for coder, set things up by reading the README.txt file, apply the latest coder patch, and then fire up the SimpleTest and enjoy insta-unit-tests for Coder! This will make regression testing a lot easier. Whee!

sun’s picture

Boy! That is a fantastic idea! It blows my head!

Since introducing unit tests for coder_format is a bit out of scope of this issue, I'll try to work with you in parallel on that. Unfortunately, I have no experience with SimpleTest yet.

sun’s picture

StatusFileSize
new31.59 KB
  • Cleaned coding style of coder_format SimpleTest files (of course, excluding .phpt files) using coder_format ;P
    • Only minor fixes were needed afterwards. A great example of a real world usage for coder_format.
    • Discovered new bug: Missing space in var $foo statements.
  • Merged Text_Diff_Renderer_parallel.php into CoderTestFile.php.
  • Cleaned-up README.txt, and added step-by-step instructions for unfamiliar guys like me ;)

The latest patch for coder_format is included in this patch. Since this is an AWESOME way to speed up our reviews, I would propose to include all these changes in future patches. I hope I can get into touch with Doug to coordinate this. Without coordination, we would need to place those new test files into the coder_format directory.

sun’s picture

StatusFileSize
new32.7 KB

New patch,

- moves coder_format tests into scripts/coder_format/tests/
- creates a stub file tests/coder_format.test to let SimpleTest know of these tests.
- allows to add <?php tags in the first line of all coder_format tests for proper syntax highlighting in editors.
- includes some more code clean-up in the new SimpleTest functions (actually, I've created the previous patch in the wrong directory...).

I'll start to add the existing unit test cases from the wiki page now, so hopefully, you can focus on fixing the outstanding bugs later on.

sun’s picture

StatusFileSize
new41.07 KB

Now we are talking! :)

sun’s picture

SimpleTests have been committed.

ezyang’s picture

StatusFileSize
new24.03 KB

All tests pass. There have been some fairly major architectural changes in this patch: namely, I'm starting to move some of the processors to the tokenizer. Why? Regexps are quick and dirty, but they cause lots of problems when you hit the edge-cases. Using the tokenizer lets us use a much richer set of contextual rules.

It feels pretty impossible to separate the patches, so there are changes to the tests in here too. Namely, they've been revised into 1 KB or less chunks.

ezyang’s picture

StatusFileSize
new25.08 KB

Comment processing moved into the tokenizer. In this case, putting it there greatly simplifies our code by eliminating the need for checking all of those edge cases (is it in a comment, a multiline, an array, inside a string, etc). It looks like some tests are missing from the previous patch, this one has them.

sun’s picture

StatusFileSize
new37.72 KB

Edward: This rocks!

This is the first time that Drupal 6 core does not output a fatal error after applying coder_format! Yeeeeha!

I'm already pretty sure that, once finished, all this work will gain very much attention among Drupal developers.

The new multiline array processing is quite impressive. I did not find any error.

I've tried to clean-up the inline documentation a bit. However, I'm not sure I understood the new variables correctly. If I'm not totally wrong, $in_multiline and $in_array are both controllers for inline comment placement? If they are, the PHPdoc for them could be a bit more detailed.

I agree with you regarding small unit tests. However, most of them require some context to check real world examples. Furthermore, we would need plenty of test files then. Especially regarding comments and parenthesis, there are many tests that have to be passed. After reviewing core files in includes/ once again, I've added some important tests now. Do whatever you think is best to do, but please don't delete 'em.. ;)

Regarding inline comments immediately following an function/if/elseif statement, I was not sure what the proper default should be. Sometimes a developer is commenting the statement itself, sometimes the contents of it. IMHO, moving them always above the remarked line should work out in 99% of all cases. To make this point clear, I added two functions (namely db_query_range() and _locale_import_read_po()) to comments.phpt.

I guess that --EXPECT-- is optional now?

I've replaced variables.phpt with some code that I actually wanted to use in my last patch.

IMHO it would be a good idea to stuff all possibilities into unit tests. There are only two days left until this task needs to be completed, and we might get some help from contributors to review Drupal core and find all wrong formattings. If all of them are covered by unit tests, you could concentrate on fixing coder_format instead of reviewing core files.

Finally, is it possible that the SimpleTest result table is reversed, i.e. actual results should be left? In failing tests, lines in the expected result are marked instead of lines in the actual result.

ezyang’s picture

Whoo! Just a few clarifications before I continue hammering away:

If they are, the PHPdoc for them could be a bit more detailed.

They're slightly magical. I'll document them better.

However, most of them require some context to check real world examples.

I agree. However, that's not the point of unit tests per se; we should be careful to partition real-world tests into their own area.

I guess that --EXPECT-- is optional now?

It's always been optional. When omitted, we expect the input to come out.

Finally, is it possible that the SimpleTest result table is reversed, i.e. actual results should be left? In failing tests, lines in the expected result are marked instead of lines in the actual result.

Hmm... that would be a SimpleTest bug. I'm not very happy with the way they're displaying results, so expect a barrage of patches there (I've already opened up another bug requesting they hide pass cases by default--a trivial fix, but they have to agree to it).

sun’s picture

StatusFileSize
new47.39 KB

Applied coder_format on coder_format.

sun’s picture

StatusFileSize
new922.09 KB

For reference: This is patch against complete Drupal core, using the latest version of coder_format in this thread.

Among possible wrong formattings applied by coder_format, you'll see that Drupal core indeed contains bad coding-style. :)

ezyang’s picture

StatusFileSize
new31.84 KB

This patch is only for coder_format.inc. Please apply it in the proper directory, thanks. (I couldn't get CVS to ignore my test file changes). Fixed are the variable alignment and curly-brace index syntax.

Lowdown on current issues:

Multiline array comma-adding with trailing comments, triggered by:

<?php
$v = array(
'foo' => 'asdf',
'as' => 23 // something
);

We don't know when to add the trailing comma until we hit the parenthesis, but by then it's very difficult to get back to the proper string index to add it (a naive algorithm will accidentally append the comma to the comment). Somehow, this data needs to be made available.

sun’s picture

StatusFileSize
new61.98 KB

New patch,

- implemented logic to allow multiple tests in one test file
- splitted existing tests and added a some more
- fixed output of test results to be displayed in the page content
- added handling for class member function properties (such as abstract, private, public, protected)
- fixed class properties var
- improved some inline documentation

I've made these changes since they are not in the scope of this task.

Also, contrary to my earlier posts, I'd suggest to commit this patch now, so further patches will be smaller.

ezyang’s picture

StatusFileSize
new7.47 KB

Updated patch which fixes the multiline comma issue.

ezyang’s picture

StatusFileSize
new27.65 KB

All tests pass!

ezyang’s picture

StatusFileSize
new30.4 KB

Now with docs!

ezyang’s picture

StatusFileSize
new32.46 KB

Even better docs, and some bugfixes in the unit testing framework.

ezyang’s picture

StatusFileSize
new2.71 KB

Newest patch.

sun’s picture

Status: Needs work » Needs review
StatusFileSize
new13.48 KB

Please help ezyang to review Drupal core! Time's running out!

1) Download Drupal 6 tarball and extract it somewhere on your drive.
2) Checkout Coder from CVS (or download attached archive). If you choose to use an existing Drupal installation, make sure to create a backup first.
3) Open a command line.
4) Read the included README.txt for coder_format to find out how to apply coder_format recursively on Drupal core. Basically, it should work out like this (assuming Drupal core is in /websites/drupal):

Windows:

# cd path\to\coder_format\
# php coder_format.php --batch-replace c:\websites\drupal

Linux:

# cd /path/to/coder_format
# php coder_format.php --batch-replace /websites/drupal
webchick’s picture

I gave this a run-through tonight. Very impressive! I spot-checked a few files in core (not the whole thing) and for the most part it looks a-ok.

There are some clean-up things and bike-shed arguments that will likely need to happen before this is 'formally' marked fixed, but ezyang appears to have fulfilled the requirements for this task (a patch to coder_format that's RTBC, as it has been several times now), and he also has a good track record for coming back and cleaning up on tasks post-closure if any issues come up. Since tha_sun probably wont be around for another ~8-10 hours, and we're down to about 27 hours left for student to claim tasks, I'm going to go ahead and mark this closed in the Google tracker. Great job!!

sun’s picture

Indeed, Edward, your performance has been awesome! I really enjoyed to work with you on this task.

Given the concerns that were raised during this task's proposal, I can only say: The goals have been accomplished (by far exceeded), and development for coder_format has become much easier because of the implementation of SimpleTests.

@ezyang: If you could just post a patch including your latest improvements for inline PHP tags, that would be great. I'll do a code clean-up for coder_format, and a thorough review for Drupal core then. If you want to (I hope so), we can then join forces after you've completed your other GHOP tasks.

ezyang’s picture

StatusFileSize
new10.79 KB

Not a problem. Here's the updated patch. I accidently ran coder_format on coder_format, so there's also some stylistic changes in there too.

sun’s picture

Status: Needs review » Reviewed & tested by the community
StatusFileSize
new969.73 KB

Heh... from now on, applying coder_format on a file can't be considered as an accident anymore!

All tests pass, so I've committed all changes in #52. Attached is a new patch against Drupal core for review.

I will add (and commit) unit tests for all wrong formattings I can find throughout Drupal core. Unfortunately, I've already found these:
(i) A case statement in a case statement (filter.module) gets wrong indents.
(ii) Line breaks are added to inline PHP that includes a control structure (like an if).
(iii) Line breaks are added after curly braces in a quoted string (usually found in some JavaScript generating code). Confusingly, the new unit test passes, but locale.inc is completely broken due to _locale_rebuild_js().

Setting to RTBC so this hopefully gains some more attention, reviewers, subscribers, and follow-ups.

@whoever: Feel free to post any pointers to wrong formattings. If you do so, please provide the same information like in the wiki page.

/me keeps dreaming of a patch against Drupal 6 core, and coder_format being added to the scripts directory in core...

ezyang’s picture

(message scrubbed)

ezyang’s picture

StatusFileSize
new5.8 KB

Case statement in case statement fixed. A quick note: In the future, try to make test-cases shorter. The really long lines in that one made it difficult to read the diff; I reduced it in my patch.

ezyang’s picture

StatusFileSize
new8.44 KB

New patch that causes all tests to pass. Now will investigate the JS bug.

ezyang’s picture

StatusFileSize
new9 KB

Hammered out a bunch of bugs with the case and case patch. I'm not seeing the locale.inc bug, sorry. We likely need docs for all the new code, so just tell me what you don't understand and I'll document it!

sun’s picture

StatusFileSize
new9.48 KB

Gotcha! Spent some time debugging the locale mystery...

I needed to switch the order of conditions for opening curly braces:

-          if (!$after_php && (!$in_variable && !$in_quote && !$in_object && substr(rtrim($result), -1) != '$' || substr(rtrim($result), -1) == ')')) {
+          $c = substr(rtrim($result), -1);
+          if (!$after_php && !$in_quote && (!$in_variable && !$in_object && $c != '$' || $c == ')')) {

All tests still pass, but locale_rebuild_js() in locale.inc looks finally correct now. However, I still don't get why the remaining conditions need to be enclosed by extra braces.

sun’s picture

Status: Reviewed & tested by the community » Needs review
StatusFileSize
new7.17 KB

Fixed coder_preprocessor_switch_duplicate_exit() and cleaned-up some of the regular expressions.

Also, please note that http://drupal.org/node/213072 has been committed, so usage of coder_format.php has slightly changed.

sun’s picture

StatusFileSize
new8.51 KB

Post-processor coder_postprocessor_if_curly_braces() is finally working now, yay!

dww’s picture

I looked over http://drupal.org/files/issues/drupal6.patch from #53, especially stuff from update.module, since that's almost entirely my code. ;) Some valid concerns and bikeshed-esque nitpicks:

A) This looks like a bug:

@@ -133,12 +135,12 @@ function _update_cron_notify() {
  * This uses PHP4's lame XML parsing, but it works.
  */
 class update_xml_parser {
-  var $projects = array();
-  var $current_project;
-  var $current_release;
-  var $current_term;
-  var $current_tag;
-  var $current_object;
+  var$projects = array();
+  var$current_project;
+  var$current_release;
+  var$current_term;
+  var$current_tag;
+  var$current_object;

B) I really don't think hunks like this improve readability of the code, which is ultimately the point of consistent use of coding standards:

@@ -36,8 +37,8 @@ function theme_update_report($data) {
     return $output;
   }
 
-  $header = array();
-  $rows = array();
+  $header             = array();
+  $rows               = array();
 
   $notification_level = variable_get('update_notification_threshold', 'all');

C) I find this is an even more important regression for readability:

@@ -181,8 +193,8 @@ function theme_update_report($data) {
     $row .= t('Includes: %includes', array('%includes' => implode(', ', $project['includes'])));
     $row .= "</div>\n";
 
-    $row .= "</div>\n"; // info div.
-
+    // info div.
+    $row .= "</div>\n";
     if (!isset($rows[$project['project_type']])) {
       $rows[$project['project_type']] = array();
     }

Especially when there are multiple lines close by like this, closing their own div's, I really want to see the code comments label the exact line they're referring to. There's nothing in our coding conventions that says 1-liners at the end of a given line of code are forbidden (for good reason), and just because it confuses your regular expressions in here is not a good reason to call this a violation of code style or otherwise mark it as an error.

D) Do we actually have any specific guidelines for how to indent multi-line if clauses? I know we generally tend to avoid those in the coding style, but there is some pretty complicated if logic in update status module, and I think it's much more readable as multiple lines. I'm not sure the changes recommended here are any improvement to readability, either. I think it's visually clear how the logic is grouped via the indentation I used, so I'm curious what justification there is (if any) for the proposed changes. ;)

ezyang’s picture

These are good concerns you bring up. A is a bug, and is due to insufficient testing of OOP code; I'll check it out. B is, curiously enough, also a bug: alignments should not extend over blank lines.

C is contentious. I have heard that comments should never ever be on the same line as code, in Drupal. If this is not the case, we can undo this!

For D, do you have an example?

sun’s picture

Patch in #53 has become quite outdated in the meantime. So, A + B might already be solved.

Regarding C, you'll find a in-depth discussion here: http://drupal.org/node/150626
After reviewing plenty of reformatted code by coder_format I have seen many places (in core and contrib) where these changes really improved the readability, and the sense of a comment. Of course, there might be inline comments that may loose their context. However, given your snippet, IMHO this is exactly a great example of an inline comment that rather confuses than explains something. If that DIV was opened in another function or somewhere between the lines, a helpful comment could rather look like:

-    $row .= "</div>\n"; // info div.
-
+    // Close surrounding container, opened in foo_open_container()
+    $row .= "</div>\n";

So, generally spoken, these re-formattings help developers to ensure that their comments are in fact helpful to others who try to understand the code.

dww’s picture

A) Cool, thanks.

B) Frankly, I generally find "alignment" spacing to make things harder to read, and there's no mention of it anywhere to be found at http://drupal.org/coding-standards. Examples of it in core are rare. Seems like this is a personal preference, and not generally agreed upon, so I don't think it's wise for this code to enforce it.

C) Again, no mention at all at http://drupal.org/coding-standards. The closest reference is: "C style comments (/* */) and standard C++ comments (//) are both fine.". Well, C++ comments like this are often used inline with the code they clarify. ;)

D) There are a bunch of hunks in the patch, for example:

@@ -149,9 +159,10 @@ function theme_update_report($data) {
         $row .= theme('update_version', $project['releases'][$project['latest_version']], t('Latest version:'), 'version-latest');
       }
       if ($project['install_type'] == 'dev'
-          && $project['status'] != UPDATE_CURRENT
-          && isset($project['dev_version'])
-          && $project['recommended'] != $project['dev_version']) {
+        && $project['status'] != UPDATE_CURRENT
+        && isset($project['dev_version'])
+        && $project['recommended'] != $project['dev_version']
+      ) {
         $row .= theme('update_version', $project['releases'][$project['dev_version']], t('Development version:'), 'version-latest');
       }
     }

The original indentation aligns the if clauses inside the if () itself, so it's clear those belong to the if. The proposed indentation just normally indents, as if those are lines of code inside the code block defined by the if, not the if itself. see what i mean?

dww’s picture

ezyang’s picture

Issue tags: +GHOP

B. Actually, aligning assignments is a fairly common practice. I've had it recommended to me by George Schlossnagle; I think it does improve readability IF an excessive amount is NOT added.

D. Ew... that looks like some nasty hacks to the indentation code. I understand what you're talking about though. My personal preference is:

if (
    $cond ||
    $cond2
) {
    // Code
}

But that's because I use four spaces for indents. The convention makes less sense when there's only two indents:

if (
  $cond ||
  $cond2
) {
  // Code
}

I think, however, there are two conclusions to be made:

  • Not all coding standards are agreed upon!
  • For areas of ambiguity, coder_format should either leave them alone, or implement multiple behaviors so that people can choose

coder_format configuration will require quite of bit of architectural restructuring, though.

sun’s picture

Component: Code » Coder Format
Assigned: ezyang » sun
klausi’s picture

Status: Needs review » Closed (won't fix)

Coder for Drupal 6 is now frozen and only security fixes will be applied. Feel free to update this issue and reopen against 7.x-2.x or 8.x-2.x.