The CVS module gives us data on the people who commit code, but we have no automated data collection on the people who contributed to the code being committed. It would be nice to parse the commit messages automatically and extract the usernames, so that we have an automated way to list the people who contribute to modules and Drupal core. This was done manually several times before, and it can be based on the CVS commit messages. The suggested CVS commit message format says we should mention the issue number, then the contributor usernames, and then explain the issue.
One possible problem with automated parsing is that usernames are not always spelled correctly. Also, some people tend to include real names instead of usernames. So there will be some false data on the site, but in general this can help collect the contributors.
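As a rough illustration of the kind of heuristic parsing meant here, the following is a minimal Python sketch, not the actual drupalorg or CVS module code. It assumes messages shaped like "#issue by user1, user2: what", and the names `parse_commit_message`, `resolve_usernames`, and the sample usernames are hypothetical. Names that do not match a known account (misspellings, real names) are kept in a separate list instead of being silently dropped.

```python
import re

# Assumed message shape: "#123456 by alice, bob: Fixed the thing."
COMMIT_RE = re.compile(
    r"^\s*#(?P<issue>\d+)\s+by\s+(?P<users>[^:]+):\s*(?P<summary>.*)$",
    re.IGNORECASE | re.DOTALL,
)


def parse_commit_message(message):
    """Return (issue_number, [names], summary), or None if the format does not match."""
    match = COMMIT_RE.match(message)
    if match is None:
        return None
    # Contributors are assumed to be separated by commas or the word "and".
    raw_names = re.split(r",|\band\b", match.group("users"))
    names = [name.strip() for name in raw_names if name.strip()]
    return int(match.group("issue")), names, match.group("summary").strip()


def resolve_usernames(names, known_usernames):
    """Split parsed names into known usernames (case-insensitive) and unmatched names."""
    lookup = {username.lower(): username for username in known_usernames}
    matched, unmatched = [], []
    for name in names:
        canonical = lookup.get(name.lower())
        if canonical is not None:
            matched.append(canonical)
        else:
            # Probably a misspelling or a real name; needs manual review.
            unmatched.append(name)
    return matched, unmatched
```

The unmatched list is where the "false data" mentioned above would surface, so a site could review those entries by hand rather than publish them.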
Previous work includes:
- SQL based on just having comments on issues related to releases: https://infrastructure.drupal.org/node/68
- SQL based on having attachments on issues, by date: https://infrastructure.drupal.org/node/37
- SQL based on collaborators on issues: https://infrastructure.drupal.org/node/25
- Work based on cvs-release-notes.php and spreadsheets: http://groups.drupal.org/node/8497
I am creating this issue for the drupalorg module, since parsing these messages is closely tied to the Drupal CVS commit message standards and would not be applicable outside this project. If it is deemed suitable, we can always make this a submodule of the project or CVS module.
Comments
Comment #1
webchick
Heck yeah! subscribe.
Comment #2
dww
See also #52285: Links to users referenced in commit messages
In a way, I'd rather it was:
since that'd make it easier to parse, and we could have links to the users as per #52285. The world is full of @user syntax, so it's not like that's a stretch for people to understand. However, changing people's commit message habits (not to mention the 184636 existing commit messages) is going to be quite a challenge. So, perhaps a heuristic along the lines of what Gabor proposes is the best we can come up with.
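To show why an @user convention would be easier to parse, here is a small Python sketch under stated assumptions: usernames are taken to consist of letters, digits, and underscores, optionally joined by dots or hyphens, which is narrower than what Drupal usernames actually allow. The function name `extract_mentions` is hypothetical.

```python
import re

# An "@" that does not follow a word character, dot, or another "@"
# (so e-mail addresses are skipped), followed by an assumed username shape.
MENTION_RE = re.compile(r"(?<![\w.@])@([A-Za-z0-9_]+(?:[.-][A-Za-z0-9_]+)*)")


def extract_mentions(message):
    """Return the distinct @user names in a message, in order of first appearance."""
    return list(dict.fromkeys(MENTION_RE.findall(message)))
```

Unlike the "by user1, user2:" heuristic, this needs no guess about where the contributor list ends, though names containing spaces would still need some quoting rule.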
That said, I don't mind putting this code in cvs.module. d.o isn't the only site that does something like "#issue by user: what" commit messages. cvs.module already has the code for #issue to become a link. We can just document how it parses the messages, and folks who want to use its features can follow its conventions.
Someday (hopefully once it's versioncontrol_cvs, not just cvs.module itself), I'd like to add code so that these issue numbers are parsed as the commit happens, and there's a trivial lookup table for commit id to issue nid mappings. Then, whenever you view an issue, there's an inline table (or block, whatever) of all the commits that reference this issue nid. We *could* do that now, but it'd be punishingly slow for the DB... But, now I'm getting off topic. ;) However, if we had a more fool-proof parsing method (e.g. @user) we could do a similar thing for users and commit messages and use the (very cheap to index) int lookup table for issue nid, project nid, commit id, uid...
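The lookup-table idea can be sketched roughly as follows. This is an illustrative Python example using an in-memory SQLite database, not versioncontrol or cvs.module code; the table name `commit_issue`, its columns, and the function names are all assumptions. Issue numbers are extracted once, when the commit is recorded, so that listing the commits for an issue becomes a cheap indexed integer lookup.

```python
import re
import sqlite3

ISSUE_REF_RE = re.compile(r"#(\d+)")


def create_index(conn):
    """Create the hypothetical commit-to-issue lookup table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS commit_issue ("
        " commit_id INTEGER NOT NULL,"
        " issue_nid INTEGER NOT NULL,"
        " PRIMARY KEY (commit_id, issue_nid))"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS commit_issue_nid ON commit_issue (issue_nid)"
    )


def index_commit(conn, commit_id, message):
    """Parse issue references at commit time and store them; return the nids found."""
    nids = sorted({int(nid) for nid in ISSUE_REF_RE.findall(message)})
    conn.executemany(
        "INSERT OR IGNORE INTO commit_issue (commit_id, issue_nid) VALUES (?, ?)",
        [(commit_id, nid) for nid in nids],
    )
    return nids


def commits_for_issue(conn, issue_nid):
    """Return the ids of all commits that reference the given issue nid."""
    rows = conn.execute(
        "SELECT commit_id FROM commit_issue WHERE issue_nid = ? ORDER BY commit_id",
        (issue_nid,),
    )
    return [row[0] for row in rows]
```

The same table could carry project nid and uid columns if user references were parsed as reliably as issue numbers.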
Comment #3
jpetso commented
In the long run, I'd like to decouple that specific attribution format from versioncontrol, because it's hard to parse, not extensible at all, and other projects also use other formats. (I think Git does a much better job with its Approved-by: [user] format.) But for the time being, I guess on-the-fly parsing is the best we can get, and switching formats at this time is going to be hard if not unfeasible.
Comment #4
dww
Re: my comments at #2, see #443000: When viewing an issue, display a list of commits that reference that issue # for more...