http://code.google.com/p/google-highly-open-participation-drupal/issues/...

Type: Code PHP VersionControl
Author: greg.knaddison

Changes in the code base can show whether or not a module is being maintained, or if it is receiving a lot of development and is perhaps unstable.

The following steps will enable typical site admins to see how many lines of code have changed in a module

* The Version Control API currently records the number of lines changed in each commit of code. This information is in the database but not available via the API. The first task is to add code to this module so that other modules can get information about.
o Design the function signature. What information will the function need to return the data? Can some of the variables be optional? If they are optional, what should the function do when they are excluded?
o Write a query that will use the data in the function to find the information in the database and return it. Be sure to use the Drupal Database API so that your queries will be secure.
o Consider what kind of return value makes sense for this function. An array? A single number?
* Review the Metrics module's system for adding measurements and add code that integrates the function you created in the first part of the task into the Metrics module
* Provide patches for both of these changes to the issue queue for each module

The final deliverable is a pair of patches to the Metrics and Version Control API modules that exposes the information in an API and includes that information into the Metrics module

Resources:

* Version Control API http://drupal.org/project/versioncontrol
* Project Metrics Module http://drupal.org/project/metrics
* Database Interaction page in the Secure Code handbook
http://drupal.org/node/101496

Estimated time:

3 days

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

webchick’s picture

Title: #30: Enable Metrics and Version Control API modules to show lines of code changed » GHOP #30: Enable Metrics and Version Control API modules to show lines of code changed
Project: Google Highly Open Participation Contest (GHOP) » Version Control API
Version: » 5.x-1.0-rc4
Component: GHOP Task » Code
jpetso’s picture

Whoo! Go Corsix go.

corsix’s picture

Ok, started on the first part of the task.
Added a new API function to versioncontrol.module (ghop30_versioncontrol.patch) - versioncontrol_get_commit_statistics. This is intended to compliment versioncontrol_get_commit_actions; one gets very granular details/actions of a commit, whereas the other gets general statistics of the entire commit (currently the number of lines added, number of lines removed and number of actions).
Added this function to versioncontrol_fakevcs.module (ghop30_versioncontrol_cvs.patch) and versioncontrol_cvs.module (ghop30_versioncontrol_cvs.patch) and extended commitlog.module to add these numbers to the commit log (ghop30_commitlog.patch).

(all patches against CVS head)

1) Should patches for multiple files be combined into one, and if so, how?
2) Metrics works on a node / node ID, so how can this relate to a (set of) commits?

jpetso’s picture

Status: Active » Needs work

Heya Corsix. I'm currently a bit stressed with university courses so I only took a short look into the patches (without applying, testing or committing them). As for your questions:

1) Combining all files in one patch would be nice, but is not absolutely required. The ideal patch comes per project and feature, preferably - so, in this case the Version Control API and the CVS backend would get one patch each. Only if you like to do it that way, it's no problem to apply them separately.

2) The Version Control / Project Node integration hooks up commits with (project) nodes, that is, you create a node and assign a directory in the repository to it. That's what drupal.org is supposed to use soon, and installing it is far easier than Version Control API & backend ;-)
In order to get all commits for a certain project node, you probably want to do something like this:

// Prepare the constraints parameter for versioncontrol_get_commits().
$constraints = array(); // no constraints apart from project specific ones

if (module_exists('versioncontrol_project')) {
  // Add the node ID as additional, versioncontrol_project specific constraint.
  $constraints = versioncontrol_project_get_commit_constraints(
    $constraints, array('nids' => array($nid))
  );
}
$commits = versioncontrol_get_commits($constraints);

Improvement proposals for your patches, from a quick look at it:

* The API documentation (including the FakeVCS part) is ok, and the integration in the CVS backend leaves nothing to wish for.
* Fetching commit actions is assumed to be potentially resource intensive or slow, so it should be avoided to call it more than once whenever possible. I would prefer $commit_actions to be retrieved by the caller (needs to be done anyways in most cases) and given to the statistics function as an additional parameter so that you don't need to call get_commit_actions() there. Minor nitpicking, of course ;)
* Major nitpicking: the user visible text (including the handling of plurals) thing that you add to Commit Log is not internationalized, thus not translatable into other languages. Fortunately, Drupal has a nice function called t() (link to API documentation) which not only translates texts but also handles plurals in a way that languages with more than two plural forms (yeah, those exist) can adapt the texts accordingly. Using t() for all user visible texts (except log entries maybe) is a hard requirement.
* I'm wondering if the standard deviation is informative enough to be displayed in the Commit Log. It's a nice metric I guess, but the log is already very crowded - when I browse commits, I'd probably prefer to see the number of changed lines for each file, whereas if I'm the Metrics module then the total number of lines per commit and maybe the average of changed lines are interesting to me. In any case I think the squared number of lines is too specific and only covers a minor part of use cases, I think it should be left out of the API function and be calculated further up the stack, like in Metrics itself.
* $total_num_actions can very easily (and efficiently) be retrieved by doing count($commit_actions). Nevertheless, a statistics function can certainly include even easily retrievable details. (Right? Other opinions?)

Nevermind the not-yet-perfect points, I greatly enjoy this set of patches. The remaining stuff is mostly subject to personal taste (apart from using t(), of course) and/or thinking about usability, neither of which is subject to the GHOP as far as I can see. Would be no problem to do the rest of the work by myself if that is necessary (give me one week to reach the christmas holidays, though), any pointers from you official people (greggles, webchick, snufkin) on how to proceed with this issue?

corsix’s picture

* I'll add $commit_actions as an optional parameter, so that it can either be specified for efficiency, or left absent for convenience of when the caller isn't interested in actions. Additionally, some future backends may not want/need the actions in order to get statistics.
* I agree that s.d. isn't needed in the commit log (nor is the action count), however I would prefer to leave the squares in the API, as without them s.d. cannot be calculated further up the stack - if lines_added is 5, lines_added_squared may not be 25 (I tried to show this in the fakeCVS example). On the other hand, I cannot think why you would want to know the s.d., but options should be left open incase a need arises in the future.
* If $commit_actions is optional, then there is argument for keeping $total_num_actions in the API for when $commit_actions is obtained by the calling function. Furthermore, a statistics function should be self contained, even if that means duplicating information that can be obtained through other means.

I'll work on part 2, and the other things you picked up on.

jpetso’s picture

> * I'll add $commit_actions as an optional parameter, so that it can either be specified for efficiency,
> or left absent for convenience of when the caller isn't interested in actions. Additionally, some
> future backends may not want/need the actions in order to get statistics.

Good point, making it optional is probably the best of both worlds :)

> * I agree that s.d. isn't needed in the commit log (nor is the action count), however I would prefer
> to leave the squares in the API, as without them s.d. cannot be calculated further up the stack -
> if lines_added is 5, lines_added_squared may not be 25 (I tried to show this in the fakeCVS example).

Oh yeah, right. Note that in the current state, it's not possible for callers to retrieve the number of changed lines for a single commit action - I'd really like to have this, and as a bonus we could drop the squared value out of the return value. Perhaps something like

return array(
  'lines_added' => $lines_added,
  'lines_removed' => $lines_removed,
  'number_actions' => $num_actions,
  'action_statistics' => array(
    $action1_key => array(
      'lines_added' => $lines_added,
      'lines_removed' => $lines_removed,
    ),
    ...
  ),
);

What do you think? Good idea or less so?
Anyways, I'm going to sleep. Good luck with part 2!

corsix’s picture

Status: Needs work » Needs review
FileSize
629 bytes
4.65 KB

Updated part 1.

Implemented part 2 (versioncontrol.inc.zip) - should this be posted in an issue in the metrics issue queue?

jpetso’s picture

Status: Needs review » Needs work

Ah, so that's how Metrics works! Cool stuff, drewish certainly is the man :D (and so are you, Corsix!)

As for the patches, I didn't mean them to be posted as .zip (since when does drupal.org allow this, anyways?) but rather as one .patch file that is comprised of multiple file changes. Zip files are discouraged on drupal.org, although out of other reasons ;)

So I guess you currently do something like cvs diff -u versioncontrol.module > ghop30_versioncontrol_1.patch.
If you just do cvs diff -u > ghop30_versioncontrol_1.patch (without specifying the file) then it diffs the whole directory, and you get a single file for multiple changes.

Looking at the code, it seems that you don't need to get any patch into the Metrics module; rather than that it would be best to keep the code where it is closest to... and that would be the project node integration module. So if I got Metrics right, then one only needs to rename metrics_versioncontrol_metrics_functions() to versioncontrol_project_metrics_functions() and metrics_versioncontrol_lines_changed() to versioncontrol_project_metrics_lines_changed(), and I can add it to the versioncontrol_project module without Metrics being actually involved :D

That one will then be named versioncontrol_project.metrics.php, or versioncontrol_project_metrics.php, or versioncontrol_project-metrics.php... is there any policy on naming additionally included files? (webchick?) well whatever.

Short statements for the rest of the code:

* Totally cool. Get your favorite Drupal GHOP administrator and tell her that jpetso thinks you should get the prize, based on these patches.
* I believe that the $statistics parameter that you added to theme_commitlog_commit_actions() is still a remainder from the previous version, right? Because it's not used in that specific function. Or is there any other reason why it should be there still?
* If $commit_actions is optional, it should probably be set to NULL as default value for the function parameter.
* Also, I think more than three or four action statistics example in the FakeVCS function don't add any more information value, but well, that's really minor now ;)
* Have I mentioned that this is really nice work?

Thanks for your contribution, Corsix! I'll apply it next weekend (with the mentioned changes, whether you perform those or I do), until then I've got a pretty busy week still 8-)

corsix’s picture

* $statistics parameter : I thought about adding the 'lines changed per action statistics' to the actions in the commit log, but then decided that it wasn't needed, as the log gives the totals for each commit. So yes, it's not currently used in the function. It can either be used to add lines added/removed on a per-action basis, or it can be removed.
* $commit_actions : It defaults to NULL in the frontend (function versioncontrol_get_commit_statistics($commit, $commit_actions = NULL), so it is slightly redundant to do the same at the backend (or this could be my C roots showing though; function definitions have default values, function implementations don't).
* FaveVCS : I wanted the get_commit_statistics example to use the same 8 actions as the get_commit_actions function, to show that they are two sides of the same coin

corsix’s picture

Status: Needs work » Needs review
FileSize
2.32 KB
3.39 KB
8.94 KB

Strangely, cvs diff -u works recursively when run in a *nix virtual machine, whereas it wasn't when run on my main windows box. Hence the patches below are nicely organised all-in-one per-module.

* Decided to remove the $statistics parameter from commitlog_commit_actions
* Decided to add $commit_actions defaulting to NULL in the fakeVCS and CVS implementations of the function
* Moved the metric function into versioncontrol_project

jpetso’s picture

Status: Needs review » Reviewed & tested by the community

Decent, that's in a state where I would say it's worthy of committing. For future use, remember that you can also check on $something == NULL in a cleaner (and faster) way by checking for isset($something), or its negation respectively. Nonetheless, RTBC!

greggles’s picture

Great work guys - I forgot to "subscribe" to this issue with a followup so my apologies for not staying more involved. jpetso and corsix - great work!

hunmonk’s picture

Status: Reviewed & tested by the community » Fixed

committed all three patches to HEAD.

@jpetso: please file issues for the tweaks you mentioned

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

  • Commit 67382be on 5.x-1.x, 5.x-2.x, 6.x-1.x, repository-families, drush-vc-sync-unlock by hunmonk:
    #197212 by corsix: Enable Metrics and Version Control API modules to...

  • Commit 67382be on 5.x-1.x, 5.x-2.x, 6.x-1.x, repository-families by hunmonk:
    #197212 by corsix: Enable Metrics and Version Control API modules to...