Note: We consider this task to be more important than the related task of implementing a Git backend for the Version Control API. However, you may choose whatever task interests you most.

The Version Control API module is a relatively new module that provides functions for interfacing with the server side of version control systems (VCS). To work, the Version Control API needs at least one VCS backend module that provides the specific VCS's functionality. At the moment, only a backend for CVS has been written (see Resources).

For this task, you will create a module similar to the Version Control API -- CVS backend module but which instead provides an implementation of the Mercurial version control system. You should also have a look at the example "FakeVCS backend" that ships with Version Control API itself, and the overall OVERVIEW.txt for a better understanding of the API's main concepts.

Your module should have functionality similar to what is currently present in the CVS backend.

Deliverables:
* A new versioncontrol_hg.module that implements the required functions for Version Control API backends. These are mostly functions that transform the revision data from the database representation to the API's array format. Here's the exact list of required functions:
  * hook_versioncontrol_backends()
  * versioncontrol_hg_get_commit_actions()
  * versioncontrol_hg_get_directory_item()
  * versioncontrol_hg_get_commit_branches()
  * versioncontrol_hg_get_branched_items()
  * versioncontrol_hg_get_tagged_items()
  * versioncontrol_hg_get_current_item_branch()
  * versioncontrol_hg_get_current_item_tag()
  * versioncontrol_hg_get_parent_item()
  * ...and others that you might consider practicable - in particular, you'll probably need versioncontrol_hg_commit() for managing additional commit data in the database.
* Functionality to import commits from hg logs, similar to the CVS backend's "log fetching" functionality.
* Hook scripts that enable recording and access control for commits, similar to the CVS backend's xcvs-* scripts.
* The task will be complete when the submitted module is marked as RTBC by one of the mentors.
* Develop a database table structure to store information that is required for repository configuration, user account properties and displaying transactions.
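
For the log import deliverable, one possible starting point is to have `hg log` print one machine-readable line per changeset and parse that. The sketch below is written in Python for brevity (the module itself would be PHP). The tab-separated field layout, the `Changeset` type and the `parse_hg_log` name are assumptions made for illustration; they are not part of the Version Control API or the CVS backend.

```python
from typing import NamedTuple

# Assumed invocation that produces the input (one line per changeset):
#   hg log --template '{node}\t{author}\t{date|hgdate}\t{branch}\t{desc|firstline}\n'


class Changeset(NamedTuple):
    revision: str  # full changeset hash
    username: str  # raw author string as recorded by Mercurial
    date: int      # commit time as a Unix timestamp
    branch: str    # named branch of the changeset
    message: str   # first line of the commit message


def parse_hg_log(output: str) -> list[Changeset]:
    """Parse the tab-separated log lines described above."""
    changesets = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split on the first four tabs only, so tabs in the message survive.
        node, author, hgdate, branch, message = line.split("\t", 4)
        # The hgdate filter prints "<unix time> <timezone offset>".
        timestamp = int(hgdate.split()[0])
        changesets.append(Changeset(node, author, timestamp, branch, message))
    return changesets
```

The parsed records would then be written to the backend's tables during the cron run, much like the CVS backend's log fetching.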

Resources:
* Version Control API Module (http://drupal.org/project/versioncontrol)
* Version Control API -- CVS Backend (http://drupal.org/project/versioncontrol_cvs)
* Mercurial (http://www.selenic.com/mercurial/wiki/)
* Mercurial commit hooks (http://hgbook.red-bean.com/hgbookch10.html)

Contact:
* chx (http://drupal.org/user/9446)
* jpetso (http://drupal.org/user/56020)

Comment | File | Size | Author
#7 | hg-commitlog.png | 43.29 KB | ezyang

Comments

jpetso’s picture

This issue on Google's GHOP issue tracker: http://code.google.com/p/google-highly-open-participation-drupal/issues/...
Claimed by ezyang. (Go Edward go!)

jpetso’s picture

ezyang has created a project for the Mercurial backend at http://drupal.org/project/versioncontrol_hg - please update your subscription settings there so this issue can be moved over to the new project.

@aclight: could you notify ezyang of this issue's existence, and that any updates should not only go into the GHOP issue tracker but in here as well? thanks!

aclight’s picture

Title: GHOP #1XX: Write a Version Control API backend for the Mercurial RCS » GHOP #172: Write a Version Control API backend for the Mercurial RCS
Project: Version Control API » Version Control API -- Mercurial Backend
Version: 5.x-1.0-rc4 »

I'm fixing the title and moving this into the Mercurial queue.

ezyang’s picture

Ok, I will post issues here. For now, the only problems are unimplemented features and some concerns raised at the bottom of the README.txt file.

jpetso’s picture

Ok, thanks for posting here.

For the concerns in the README.txt file, I would consider the following:
* The exact time of the branch and tag operations is not really that critical, it's just a measure so that they'll show up correctly in the commit log (once displaying commits, branch ops and tag ops in one list is implemented there). So I'd suggest that tag operations are assigned the time of the changeset plus one (== a second after the commit) and for the branch operations as well, assuming it's feasible to retrieve the changeset that this branch was branched off.
* What you are using as 'node' is really the 'revision' property of {versioncontrol_commits}. Good point about the missing index for that column, I'm going to change that in versioncontrol.install. (Assuming that MySQL and Postgres let me assign an index to a 255-length varchar. Need to try that out, otherwise it may still be a good idea to shrink that field.) If you could replace 'node' with 'revision', I'd very much appreciate that.
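
The timestamp suggestion boils down to a one-line rule. A minimal sketch in Python (the module itself would be PHP), where `operation_date` is a hypothetical helper name:

```python
def operation_date(changeset_date: int) -> int:
    """Return the time to record for a tag or branch operation.

    Placing the operation one second after its changeset keeps it sorted
    directly after that commit once both appear in a single list.
    """
    return changeset_date + 1


# A combined log sorted by time then lists the commit first, the tag second.
commit_time = 1200000000
log = sorted([("tag", operation_date(commit_time)), ("commit", commit_time)],
             key=lambda entry: entry[1])
```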

The helper library looks good so far, I'm looking forward to the rest of the module :)

Also note my write-ups about commit restrictions and user authentication from today - short version: you don't have to implement access control scripts, and account import/export is too complex to support all kinds of authentication methods that are provided by Mercurial, so at maximum an .htaccess/.htpasswd file would be nice. (That's the least pressing of all issues, though.)

If you're interested, you can also have a look at the issue for the Git backend, there's quite a bit of information there already, some of which also applies to Mercurial.

ezyang’s picture

Assigned: ezyang » Unassigned

The module now works! I'd like some feedback on the things I posted in the KNOWN ISSUES section of the README. In particular, I'd like to know which repository-specific information commitlog uses, what to do about hashes versus numeric revision IDs, docblock duplication, refactoring of versioncontrol to remove common code from git/hg/svn/cvs, and the nodeid lookup issue (i.e. how to let commitlog know that we're missing info). Thank you!

To set up the repository, enable the module (if you enabled it before, you may need to disable, uninstall, and then load again), use the standard repository creation interface, and point it to an existing Mercurial repository on your computer. Run cron, and then check the commit log.

ezyang’s picture

Assigned: Unassigned » ezyang
Status: Active » Needs review
Status | File | Size
new | hg-commitlog.png | 43.29 KB

With the latest commits, there has been quite a bit of code cleanup, and a proper implementation of source item detection. There are now no major issues with the module (save for missing functionality!). I've attached an image of the commit log for the curious.

I've been keeping my eye on the git implementation, and I've specifically avoided (I think) all of the issues posed in #21, as well as the earlier ones.

jpetso’s picture

Assigned: Unassigned » ezyang
Status: Needs review » Needs work

Ok, I just lost like two hours (a bit less, maybe) of writing up stuff in here. So, 1. sorry for that, and 2. sorry for being late with a review altogether. Running out of time again (sounds familiar), so I'll just tackle the most important points:

  1. Maybe it's insufficiently documented, but leading slashes are actually the standard for paths throughout the Version Control API, so they should definitely be passed with a leading slash and probably be stored this way as well. Commit Log may rely on this, backend modules are expected to pass paths that way.
  2. versioncontrol_hg_get_branched_items() and versioncontrol_hg_get_tagged_items() just need to return an empty array, saying that the branch/tag operation spans the whole repository.
  3. As for versioncontrol_hg_get_parent_item() being "somewhat icky": Git and CVS don't version directories either, so the directory should just contain an empty revision. It will still be possible to keep navigating around in the same tag/branch/changeset by remembering the "state" (operation array and/or commit branch, mostly) as an item inside the 'hg_specific' array. This is also the reason that get_parent_item() is done by the backend instead of by the API module itself.
  4. Account tracking is done by Version Control API and Commit Log itself, you only need to ensure that the 'username' is the same for accounts and commits. So if you use the email as 'username', the user has to register an account with the email address as 'username' and Version Control API / Commit Log automatically does all the mapping.
  5. To be honest, I don't really like the *very* normalized current table structure. It might be fast, but it looks a bit ugly. I tend to think that the difference in performance between joins and normalized tables is negligible, especially when considering that the API module still has architectural issues (Major issues #1a, #1b and #2) whose fixes will increase performance far more than micro-optimizations like joins vs. normalization could ever do. If you want to help out with performance, please set aside the listed Todo items from your README.txt for now, and consider chiming in with the upstream issues instead. Golden rule: reducing the number of queries is always more performant than optimizing them :D
  6. Btw, the Version Control API issues mentioned above were triggered by your suggestion in the README.txt as well... I had good reasons to let the backends do item revision and commit branch tracking, but I can see now that centralizing more of that stuff has probably better reasons still.
  7. A commit can happen in multiple branches also in Mercurial - when the original commit branch is branched off later, you've got two branches then and can't tell which one the original one was. Please consider a {versioncontrol_hg_commit_branches} table like the one in the most recent versions of the Git backend.
  8. Access control hooks are arguably less important, but it would still be nice to have a hook that logs the commits - even if it's just the poor man's version like in the Git backend, where just the repository update function is being called.
  9. I think using the SHA-1 hashes for the 'revision' property is really the right thing to do - the limitation here is on the side of Commit Log which should provide a hook to "theme" (cut down) the hash to a more recognizable substring, like hgweb and gitweb do.
  10. Likewise, emails are probably the right thing to track, they should just be displayed as real names when being themed by commitlog/versioncontrol_hg. For that matter, an additional column in {versioncontrol_hg_commits} being 'realname' would be a nice thing to track, this would enable real name displays like on this hgweb example. Apart from that, I don't think that displaying emails is a critical privacy issue - everyone who is able to clone the repository can see the mail addresses too. But yeah, it's true that listing them on the web page makes them more visible and thus prone to spambots.
  11. As for docblock duplication, all hooks are documented in either hook_versioncontrol.php or versioncontrol_fakevcs.module, so duplicating them or not is the sole decision of the module maintainer, i.e. you. Feel free to remove the duplicated docs if you prefer it that way.
  12. I didn't find the time to try out your backend so I can't confirm this, but the commitlog screenshot suggests that source items, actions and paths are passed incorrectly in places: the "new:" label doesn't include a directory (which should at least be "/" or some other subdirectory - taken from the "directory" property of the $commit array), and the missing hashes for the foo.txt and lossless.style files mean that the commit 'action' doesn't match the 'source items' array. Might be a minor issue, though... I'm not sure what goes wrong here.
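
Points 1, 9 and 10 describe small data transformations, which can be sketched as follows. This is Python for illustration only (the module itself would be PHP), and all three function names are hypothetical. The 12-character default mirrors the length of Mercurial's `short` template filter, and `split_author` assumes the usual "Real Name <email>" author convention.

```python
from email.utils import parseaddr


def api_path(hg_path: str) -> str:
    """Turn a repository-relative Mercurial path into a leading-slash path."""
    return "/" + hg_path.lstrip("/")


def short_revision(revision: str, length: int = 12) -> str:
    """Cut a full changeset hash down to a short prefix for display."""
    return revision[:length]


def split_author(author: str) -> tuple[str, str]:
    """Split an author string into (realname, email)."""
    return parseaddr(author)
```

With these helpers, the full hash and the email stay in the database as 'revision' and 'username', while the shortened hash and the real name are used only for display.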

I think that's it from my side... leaving until the 8th of February, so chx will do any remaining reviews and evaluate your work. There might be some edge cases that are not yet perfectly handled, but overall I'd say that you did good research and developed the backend from a solid base, so later changes should be relatively easy to do. Congrats, thanks, and good luck for the remaining days of the GHOP! I'm off now. *shwoop*

aclight’s picture

Re #10 above (whether to display email addresses or not): What about somewhat hiding them like Google does on Google code? So, you might have ez.....@example.com instead of ezyang@example.com.
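
A minimal sketch of that kind of masking, in Python for illustration (the theming code itself would be PHP). The function name and the choice to keep two characters are assumptions picked to reproduce the example above:

```python
def obscure_email(email: str, keep: int = 2, filler: str = ".....") -> str:
    """Hide most of the local part of an email address for display."""
    local, separator, domain = email.partition("@")
    if not separator:
        # Not an email address; leave the value untouched.
        return email
    return local[:keep] + filler + "@" + domain
```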

ezyang’s picture

Thank you for taking two hours to review this module. Your comments are greatly appreciated!

For reference, the points can be categorized as such:

No action needed: 6
Upstream: 9, 10
New features: 2, 3, 4, 8
Modifications: 5, 12
Trivial: 1, 11
Unknown: 7

For point seven I have to respectfully disagree: Mercurial commits only apply to one branch. For example, in your use case, "commit on master and then branch off drupal-6.x", the commit on master would count as one commit, and then the "branch off" would be another commit. Or, in the case that drupal-6.x already existed, it would be a "merge" not a "branch" (but still would get its own commit). Re-recording the commit to the master branch under the "drupal-6.x" branch wouldn't make any sense, because the data would be duplicate, and it never actually happened to "drupal-6.x" (it's a "virtual" commit).

My proposed behavior does lead to a limitation, which is that branch history stops on copy/rename. We can, however, resolve this by checking the parents of the earliest branch revision and then further retrieving the history for them (I don't know if this can be done in one SQL query or not).
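
The parent-following idea can be sketched outside SQL as a plain graph walk. The example below is Python for illustration; the in-memory dictionary stands in for the database tables, and the function name and data layout are assumptions. Local revision numbers are used as keys because parents always have lower numbers than their children.

```python
def branch_history(changesets: dict[int, tuple[str, list[int]]],
                   branch: str) -> list[int]:
    """Return a branch's own changesets plus the ancestors of its earliest one.

    `changesets` maps a local revision number to (branch name, parent revisions).
    The result is sorted newest first.
    """
    own = sorted(rev for rev, (name, _) in changesets.items() if name == branch)
    if not own:
        return []
    seen = set(own)
    # Start from the parents of the earliest changeset on the branch.
    pending = list(changesets[own[0]][1])
    while pending:
        rev = pending.pop()
        if rev in seen:
            continue
        seen.add(rev)
        pending.extend(changesets[rev][1])
    return sorted(seen, reverse=True)


example = {
    0: ("default", []),
    1: ("default", [0]),
    2: ("stable", [1]),   # "stable" is branched off revision 1
    3: ("default", [1]),  # later work on "default" is not part of "stable"
    4: ("stable", [2]),
}
```

Here the history of "stable" contains revisions 4, 2, 1 and 0, but not revision 3.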

I agree with everything else and will be addressing them shortly.

ezyang’s picture

Status: Needs work » Needs review

I'm considering this GHOP-task complete, although there are a few still unimplemented features that are out-of-scope for GHOP (and probably more kinks to work out with real-world use).

I've also chosen to disagree with point five, simply due to the fact that I would need really complicated joins otherwise (there's already some really convoluted ones for DELETEs), and there is never any situation in which the normalized (not denormalized; check the vocabulary) database would be easier to use/better.

ezyang’s picture

Version: » 5.x-1.x-dev
Status: Needs review » Closed (fixed)

I'm marking this as closed, since GHOP is over. Further problems with the module should get their own issues.

jpetso’s picture

Ok, back from my holidays :)
I'm glad this task worked out so nicely, thanks a lot for working on the module!

Disagreement on points 5 and 7 accepted - I was outright wrong with 7 (changesets are indeed assigned to exactly one branch only) and the argument on 5 is valid. Sorry for wrong usage of vocabulary for normalization x-)

Yeah, and one more thing (unrelated to this issue, but I need to put it somewhere): you don't need to manually close other issues - "fixed" is a perfectly fine state for issues that have been fixed, and issues with status "fixed" will be "closed" by the closebot after two weeks of being marked as "fixed".

So, great work. I very much appreciate your involvement with upstream issues, and would be glad if you'd stick around for a while :)