The Version Control API module is a relatively new module that provides functions for interfacing with the server side of version control systems (VCS). In order to work, Version Control API needs at least one VCS backend module that provides the specific VCS's functionality. At the moment, only a back end for CVS has been written (see references).

For this task, you will create a module similar to the Version Control API -- CVS backend module but which instead provides an implementation of the Git version control system. You should also have a look at the example "FakeVCS backend" that ships with Version Control API itself, and the overall OVERVIEW.txt for a better understanding of the API's main concepts.

Your module should have functionality similar to what is currently present in the CVS backend.

Deliverables:
* A new versioncontrol_git.module that implements the required functions for Version Control API backends. These are mostly functions that transform the revision data from the database representation to the API's array format. Here's the exact list of required functions:
* hook_versioncontrol_backends()
* versioncontrol_git_get_commit_actions()
* versioncontrol_git_get_directory_item()
* versioncontrol_git_get_commit_branches()
* versioncontrol_git_get_branched_items()
* versioncontrol_git_get_tagged_items()
* versioncontrol_git_get_current_item_branch()
* versioncontrol_git_get_current_item_tag()
* versioncontrol_git_get_parent_item()
* ...and others that you might consider practicable - in particular, you'll probably need versioncontrol_git_commit() for managing additional commit data in the database.
* Functionality to import commits from Git logs, similar to the CVS backend's "log fetching" functionality.
* Hook scripts that enable recording and access control for commits, similar to the CVS backend's xcvs-* scripts.
* The task will be complete when the submitted module is marked as RTBC by one of the mentors.
* Develop a database table structure to store information that is required for repository configuration, user account properties and displaying transactions.
Resources:
* Version Control API Module ( http://drupal.org/project/versioncontrol)
* Version Control API -- CVS Backend ( http://drupal.org/project/versioncontrol_cvs )
* Git (http://git.or.cz/)
* Git commit hooks (http://www.kernel.org/pub/software/scm/git/docs/hooks.html )

Contact:
* chx ( http://drupal.org/user/9446)
* jpetso (http://drupal.org/user/56020)

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

jpetso’s picture

This issue on Google's GHOP issue tracker: http://code.google.com/p/google-highly-open-participation-drupal/issues/...
Claimed by boombatower. (Go Jimmy go!)

boombatower’s picture

I found a bug in the install hook of the Version Control API and documented it here: http://drupal.org/node/213581.

boombatower’s picture

Title: GHOP #170: Write a Version Control API backend for the Git RCS » GHOP #1XX: Write a Version Control API backend for the Git RCS
FileSize
10.77 KB

I think I have completed most, if not all the requirements for this module. Please let me know if I'm missing anything.

Also not that if there is some functionality that is out of the scope of this task I would not be opposed to working on it outside of the GHOP.

aclight’s picture

Title: GHOP #1XX: Write a Version Control API backend for the Git RCS » GHOP #170: Write a Version Control API backend for the Git RCS
Assigned: Unassigned » boombatower
Status: Active » Needs review
dww’s picture

Title: GHOP #1XX: Write a Version Control API backend for the Git RCS » GHOP #170: Write a Version Control API backend for the Git RCS

I'll have to take a closer look at this another time, but this is extremely promising and exciting. Thanks!

boombatower’s picture

jpetso’s picture

Status: Needs review » Needs work

That's a great start, and the log parsing code looks like it could work nicely. (Haven't tested yet, I'm going to do that over the weekend.)

However there's a few points so I won't yet mark it as RTBC... I'd even go as far as to claim that the hardest part is still lying ahead of you. Here are my initial observations:

  • The module still uses commit_id in places whereas your database scheme already uses vc_op_id, which means the module cannot work as is. (Have you tried viewing the recorded commits in the Commit Log?)
  • Calling just "git log" only retrieves commits of one single branch (the active one), and that is likely to be "master". We need commits from all branches, though.
  • The reason that the CVS backend doesn't record branch and tag operations is because they can't reliably retrieved from the CVS logs. Git, however, should be able to get us the commit identifiers at least of created tags, so additional functionality to capture those as well would be nice. See below for the database considerations for that.
  • Git always branches the whole repository instead of single files, so the {versioncontrol_git_item_branch_points} table is unnecessary. Rather than that, we probably want a {versioncontrol_git_branch_operations} table (additional info to {versioncontrol_branch_operations}) that can hold a 'vc_op_id' (for the branch operation, primary key), the 'revision' that specifies in which state the repository was branched, and the source_branch_id. Getting this info out of the logs won't be all too easy and is probably out of scope for the GHOP issue, but the database schema should be ready to hold the info that we need.
  • Likewise, tags are usually also done repository-wide - tags can also apply to single files, though. So what we need to capture here is primarily the commit identifier (SHA-1 hash, a.k.a. 'revision' column) that specifies the state of the repository when this tag was created. That info should go into a {versioncontrol_git_tag_operations} "addendum" table, like the branch one mentioned above (with 'vc_op_id' identifying the tag op in {versioncontrol_tag_operations}). I'm not sure if it's possible to determine whether tags apply to the whole repository or to single files... the {versioncontrol_git_item_tags} table would only be useful if we can distinguish between those cases.
  • I note that you took versioncontrol_cvs_get_previous_commit_id() and adapted it so it returns the previous revision in time. That's actually not the real purpose of that function - the intention behind this is to be able to retrieve the history of a file, back and forth within a branch. So the previous revision should not be the previous one in time but the previous one in that specific branch, and that info should rather be retrieved from the log itself than from the database. The SVN backend has a 'source_revision' column in its item_revisions table, but even that will probably not suffice for the Git backend because git can merge multiple files into one. So there can't be just one previous revision, instead there might be multiple source revisions for a given file. How this should be recorded in the database (or if multiple source files should be recorded at all) needs to be figured out still.
  • In .admin.inc, commenting out 'updated' in the extract_repository_data() function is not necessary, and you can safely delete 'modules' and 'run_as_user' as those make little sense for Git.
  • And... I hate to say it, but chx meant that including both the log parser and the hook scripts in the list of deliverables would be ok. So... the log parser is here, but the hook scripts are yet to be done.
  • And one minor point: Git is not "GNU Interactive Tools" (that's another package with a similar name) and not fully capitalized by its authors - so it's "Git" rather than "GIT". Nothing to greatly worry about, though :)

Those are all points that I can think of at the moment... there may be more once I get to test the module. Don't let this list get you down - the work you did up to now is very cool, it just needs some refinements :P

Would everyone agree that it's probably a good idea that the code should be committed to the versioncontrol_git project, and we go with patches to the then-current state from here on? Also, as the Git backend project exists now, I think this issue should move there - make sure to have issue subscriptions enabled for that module.

aclight’s picture

Would everyone agree that it's probably a good idea that the code should be committed to the versioncontrol_git project, and we go with patches to the then-current state from here on? Also, as the Git backend project exists now, I think this issue should move there - make sure to have issue subscriptions enabled for that module.

Now that boombatower created a Version Control API -- Git Backend project, I agree that this issue should be moved there. I'll let boombatower do that, so that way he can subscribe to all issues of that project first if he hasn't already. (As a reminder, everyone will need to edit their Subscription prefs. since new projects don't inherit the "All issues", "My issues", etc. subscription options set before the new project was created).

As for committing the code, if it makes sense I think it will be easier to track this from the GHOP perspective if everything for that task is kept i one issue. However, I realize that this is a rather large task, and if it makes more sense for boombatower and/or jpetso to commit some starter code and then create new issues dealing with modifications to that code, that's perfectly fine with me. If that happens, a link to each issue that needs to be completed for this GHOP task to be marked complete on this issue would be helpful.

As others have said, this is great work so far.

dww’s picture

FWIW (and I'm not learned in the ways of GHOP admin'ing), I think it'd be best to commit what he's got already to cvs.d.o/contributions/modules/versioncontrol_git, and fix these things as followup patches/commits, instead of doing it all via reposting changed files (or patches) in here. That's what version control is for, after all. ;) Better to get the code up where others can see it, since that'll make it easier to test and to get reviews. Clearly, after only a few more iterations, this will be "done" as far as GHOP is concerned, and you can just upload the latest revisions of the files to the code.google.com site to make them happy...

That's my preference, anyway.

Thanks again to everyone -- boombatower for the code, jpetso + aclight for the reviews, etc, etc.

p.s. Utterly out of scope for this issue, but I figured I'd point interested parties here: http://groups.drupal.org/node/8102 -- that's what needs to happen before we can deploy versioncontrol_api on d.o, which is obviously of interest to everyone involved in this effort, too. Seems like we're not going to be able to do that before the D6 port of project*, so it's not the top priority anymore, but it'd be great to get that list hammered out in the next few months. Getting vc_api on d.o will be a huge step towards hardening and stability that'll make vc_* much more legitimate in the eyes of site builders around the world.

aclight’s picture

@dww: That's perfectly fine to commit what's available now and then patch from there. As you said, that is what version control is for, and we might as well take advantage of such features.

@all: Once the code from this issue is committed it will be a little unclear from looking at the official Google task tracker what issue to look at to monitor progress. So, if boombatower can just keep us posted occasionally there, and then post when you feel you've completed the task, we can look through the issues created in response to this GHOP task and make sure you've completed the requirements and mark this as closed. Or, if chx stays involved in the review process, he can just close the GHOP task when boombatower's done.

So, in summary, do as you'd like with respect to using the tools available to you, and we'll keep up with the GHOP task status as necessary.

boombatower’s picture

Project: Version Control API » Version Control API -- Git backend
Version: 5.x-1.0-rc4 »

I have fixed a few of the requests, from jpetso, and will work on the others, but I figured I would post a response to get the ball rolling on a few of them.

  • The commit log works great. That is how I tested my module to make sure that it worked. If you would rather I do something differently let me know. Needs Response
  • Reads from all branches and records commit as being part of particular branch. Fixed - commit #97691
  • May be out of scope. (will attempt inside GHOP if enough time) Active
  • May be out of scope. (will attempt inside GHOP if enough time) Active
    • Removed versioncontrol_git_item_branch_points and made related changes. Fixed - commit #97718
  • It records the tag information to the best of my knowledge. There may be a better way to do it and get more information, if so let me know. Fixed - commit #97717
  • Not sure that this needs to be changed by my understanding, but I'm not sure I fully understand what is being requested. Needs Response
  • Left over code. Especial on the modules stuff still wasn't a 100% sure that GIT didn't have something similar. Fixed
  • Implemented to the extent that I could. I have the Version Control API being called when a commit has been submitted. Git doesn't appear to provide any interface for figuring out the user attempting to make a commit before it has been committed. That makes it relatively impossible to complete the commit access portion of the script. It doesn't seem to have any hooks for tags. The generate password script could be completed, but the only documentation I have seen uses Git with Apache (similar to SVN) and I'm not sure if that is correct, and I've had a hard time setting up accounts through the API. Needs Response - commit #97746 & #97747
  • Not sure why I didn't relize that. Fixed

The original code has been committed in commit #97667. Any items marked 'fixed' without a commit reference are in the original commit.

Other commits:

  • Multi-line comments maintain multiple lines. commit #97693
  • Administration repository edit bug that cause the page to throw an error any time it was saved. commit #97744
boombatower’s picture

Status: Needs work » Needs review

I updated the post above as I completed tasks, but now I believe the work needs to be reviewed.

If further discussion of the xgit scripts is necessary a new thread could be created.

As a reminder the project for this is located: http://drupal.org/project/versioncontrol_git. The CVS can be checked out from contributions/modules/versioncontrol_git.

jpetso’s picture

Status: Needs review » Needs work

Man, you're frickin' awesome. Started testing now, but I don't have much time right now (yay for meeting friends to play a game of Worms or two). Short report:

  • Creating the repository worked fine, the admin form seems to be ok.
  • I think I was wrong about the commit_id as you're only using that for internally storing the SHA-1 hash. So that shouldn't be any problem for the module working correctly. Nevertheless, I'd appreciate if you could rename 'commit_id' to 'revision' in the database and all code that has $commit or $commit action arrays. (Rationale: 'revision' is supposed to be used for the VCS specific commit/revision identifier all over Version Control API, so that would be nice for consistency.)
  • Fetching the log yielded a success message, however when trying to look at these commits in the Commit Log, I got an error. It seems that the Version Control API didn't get the repository right for at least one commit, but the inserted data in the database looks ok at first glance. I'll have to investigate this later.
  • How can 'git tag' work without the '-l' (as in 'list') option? Not sure if this actually works, but at least on the command line I get a usage error message on my system when calling a plain 'git tag' without the option.
  • You can remove the 'modules' and 'run_as_user' repository properties from the database as well.
  • Leaving the second 'Needs Response' bullet point for later, I don't have the time right now.
  • As for the generate-passwd script, I think you can leave this out (initially?). The workflow of Git is fundamentally different to CVS, and doesn't really rely on passwords but more one maintainer who has direct SSH access to the repository server (if I got it correctly... not sure if there are additional ways to do access control for Git). As Version Control API is meant to be server-side only, I think it would make more sense to use the {pre,post}-receive and/or the [post-]update hooks instead of post-commit which is only called client-side if I'm correct. Those hooks can also check on tag names (I think branch names as well, iirc) and are better suited for our purposes, I think. The "committing user" issue could be a problem, let me think and investigate a bit more on that.

Ok, gotta go! Maybe I can catch you on IRC sometime, but this seems to work out really nicely :D

aclight’s picture

@boombatower: A few things related to this task:

1. It's probably not a good idea to make major edits to your comments on issues, because it gets confusing quickly. If you need to open new issues to track certain tasks that's fine.

2. Find me in IRC and I'll send you a copy of the xsvn scripts I wrote for my port of the cvslog module to work with subversion. I don't know if those would be useful to you at all, but at least they will give you another example of some functional commit scripts.

3. Regarding getting the commit user, the way I do this in subversion is to call svnlook from the commit hook scripts. svnlook is a binary that comes with subversion which allows one to inspect information about a previous commit or a pending commit. I don't know if Git has a similar feature, but you might see if it does.

boombatower’s picture

Status: Needs work » Needs review

@jpetso: Summary of changes:

  • Changed 'commit_id' to 'revision'. Fixed commit - #97858
  • Running git tag without -l works on my box, but I changed it so that it is more correct and possible works on all setups. Fixed commit - #97863
  • Removed 'modules' and 'run_as_user' from database and one code reference. Fixed commit - #97865
  • Changed README.txt to inform user to use the post-update hook since it better suits our needs. Fixed commit - #97871

@aclight: Response to your points:

  • I have a copy of the xcvs scripts which provide a nice example, but the methods used are special to CVS and Git doesn't seem to provide similar features.
  • I will keep looking around for commit user, but the only stuff I found allowed you to get the commit message, nothing more.
jpetso’s picture

@aclight: The main problem that I see with retrieving the committer in the access control scripts is that it doesn't have to be the person who actually contributed the patch. Git remembers who did the original commit even when all the commits are merged into a different repository, and we only want to check on the "uploader", not the one that initially did the commit.

So, in order to solve the problem of associating "uploaders" with accounts, we need to know how an uploader appears on the system. Does Git assume that any user with write access to the repository has sufficient rights? In that case, we might want to use the current system user for the commit scripts. There might also be a shortcoming in the Version Control API's access control API as it doesn't distinguish between committer and uploader, guess I should introduce an optional uploader username parameter or something.

In short, we need more research on 1. how Git's standard workflow works with regards to user identities, and 2. if it can be different to that standard workflow, and if so, how.

Apart from that, the module starts to become really nice. In order to keep the whole thing manageable, I'd suggest we have a list of "active" and "needs response" tasklets, and let's assign those to letters - a), b), c), etc.
Seems I don't find any time for a proper answering/improvement session, need to go to sleep now as I'm going to get up and drive in 6 hours and still haven't packed. But well... :P

morphir’s picture

the ssh key should be submitted when you register or apply for a git account to the central repository(git.drupal.org).
The RSA encryption makes this secure as hell. See git web on how this is done.
Short debrief:

  1. You commit your code and changes locally on your own computer. That's what decentralized means (primarily).
  2. When you are pleased with your results, you push your changes upstream to the central git repository
  3. However, your push does not inflict with anyone else (read scenario underneath)

Git opens up for having several trees against one project.
[Scenario]
Views module would have one master tree maintained by Earl Miles, and one "fork" maintained by jpetso. This way, earl can watch any research or crazy attempts that are being done trying to improve views, and ultimately merge that fork into the master tree when or if the fork matures.

Now, we could introduce "The Drupal developer forest" - which lists all trees, or links to all trees. How about that?

jpetso’s picture

I had a short (...NOT!) research on commit access restrictions and user authentication today, and came to the conclusion that commit access scripts are not necessary for distributed version control systems. So that eliminates the need for a "pre-receive" or "update" hook script, and leaves us with migrating post-commit to post-receive.

jpetso’s picture

Status: Needs review » Needs work

Oh right, and the nice people in #git also had a proposal on how to retrieve the latest commits. They advised not to base anything on the commit date/time (which is naturally a good idea for nonlinear commit trees like in Git) and rather go from the latest commit of each branch. That means we'll need a table like {versioncontrol_git_latest_commits} with the columns 'branch' and 'revision' (= SHA-1 hash) and on each update we'd store the latest commit on each branch (as retrieved by git log) in there. I could imagine a repository update algorithm like this - pseudo code, should only communicate the idea, not usable as is. Er... it grew a bit large... well what the heck, I like it :)

$latest_commits = [get all branch/commit pairs from {versioncontrol_git_latest_commits}]; // $latest_commits[$branch] == $sha1_hash
$branches = [get all branches from git branch];
$obsolete_branches = array();

// remember branches that don't exist anymore in the current repository (have been merged, or deleted)
foreach ($latest_commits as $branch => $latest_commit) {
  if (!in_array($branch, $branches)) {
    $obsolete_branches[] = $branch;
  }
}
// track new branches from the beginning
foreach ($branches in $branch) {
  if (!isset($latest_commits[$branch])) {
    $latest_commits[$branch] = FALSE;
  }
}

// record the commits
foreach ($latest_commits as $branch => $latest_commit) {

  // generate either '[hash]..' or '..[branch]' or '[hash]..[branch]' as range specification for git log
  $range_spec = $latest_commit ? ($latest_commit .'..') : '..';
  if (!in_array($branch, $obsolete_branches)) {
    $range_spec .= $branch;
  }

  $commits = [get all commits from git log];
  $is_latest_commit = TRUE; // the first entry in 'git log' is the most recent one

  foreach ($commits as $commit) {
    if ($is_latest_commit) {
      $latest_commits[$branch] = $commit->revision;
      $is_latest_commit = FALSE;
    }
    if ($commit->revision is not in {versioncontrol_git_item_revisions}(revision)) {
      insert into {versioncontrol_git_item_revisions} values (...);
    }

    versioncontrol_ensure_branch($repo_id, $branch);
    if (($commit->revision, $branch) is not in {versioncontrol_git_commit_branches}) {
      insert into {versioncontrol_git_commit_branches} values ($commit->revision, $branch);
    }
  }
  delete from {versioncontrol_git_latest_commits} where branch = $branch;
  insert into {versioncontrol_git_latest_commits} values ($branch, $latest_commits[$branch]);
}

// delete obsolete branches
foreach ($obsolete_branches as $branch) {
  delete from {versioncontrol_git_latest_commits} where branch = $branch;
  delete from {versioncontrol_git_commit_branches} where branch = $branch;
  versioncontrol_delete_branch($repo_id, $branch);
}

Something like this should work pretty well... the branches should be stored (and retrieved, and deleted) as branch ids, but apart from that I think I got it quite realistic. Bonus points for storing all previous revisions in {versioncontrol_git_item_source_revisions}(revision, source_revision). Well, bonus points... actually that should really be done.

boombatower’s picture

Status: Needs work » Needs review

@jpetso: summary of changes:

  • Changed log retrieval to use last commit hash instead of date and rewrote repository update method. Fixed - commit #98533
  • Fixed issue with '..[branch]' not returning anyting. Fixed - commit #98535
  • Store previous revisions database. Fixed - commit #98544
jpetso’s picture

Status: Needs review » Needs work

Very cool, it's really starting to get ready for real-life usage. Not yet perfect, however:

  1. Now that you use latest commits for log retrieval, the 'updated' repository column is not needed at all anymore. You can remove it in the .install file, all occurrences of $repository['updated'] and SQL queries involving that property.
  2. In log_update_repository(), no need to query {versioncontrol_branches} for the branch id - Version Control API has a nice function for that with even caching integrated. So you'd rather call versioncontrol_get_branch_id($branch, $repository['repo_id']) and check for NULL in case the branch doesn't exist. (You can currently find that API function in versioncontrol-backend.inc, starting from line 1154.)
  3. Don't forget to keep the comments and documentation up to date: in line 43, the '..[branch]' part of the comment should now be '[branch]' only, and the description of the get_current_revision() and get_previous_revision() functions doesn't match the code anymore. Also, a typo in line 45: s/obsolute/obsolete/.
  4. {versioncontrol_git_latest_commits} should definitely have a repo_id column, so that you can delete them when the corresponding repository is deleted (implement versioncontrol_git_repository() for that matter, there's a template for that function in versionconrol_fakevcs.module).
  5. Hm... thinking about it a bit more, {versioncontrol_git_item_source_revisions} is actually a misnomer in its current form: it tracks source commits, not source revisions of items. Let's recap: its goal is to provide a correct $commit_actions array. That is:
    • action == VERSIONCONTROL_ACTION_ADDED if there's a 'current item' and no 'source items'.
    • action == VERSIONCONTROL_ACTION_DELETED if there's no 'current item' and one element in 'source items'.
    • action == VERSIONCONTROL_ACTION_MODIFIED if there's a 'current item' and one item with the same path and different revision in 'source items'.
    • action == VERSIONCONTROL_ACTION_COPIED or VERSIONCONTROL_ACTION_MOVED if there's a 'current item' and one item with different path and revision in 'source items'. Not sure if it's possible to track this in Git, so in practice this might work out as "added" (plus "deleted" for a move).
    • action == VERSIONCONTROL_ACTION_MERGED if there's a 'current item' and multiple 'source items'. This case doesn't exist for CVS but exists in distributed VCS, which means there can be no single "previous item" in a commit action (in the general case, at least) in Git. As a consequence, the versioncontrol_git_get_previous_revision() function, if appropriate at all, would need to return an array of previous revisions. I'm not sure if Git can merge from source items whose path is different to the target path... if so, then this function would be wrong altogether and we would need functionality that gets us an array of whole source items instead an array of source revisions (as not only the revision but also the path can be different).

    Now, the {versioncontrol_git_item_source_revisions} table is supposed to help us to find out which source items there are for a given current item. (And the other way round, for item history purposes.) With the current approach, you can only distinguish between the first three cases, but especially the "merged" action is very prominent and widely used in distributed version control too. So, I don't know if it's feasible to find out, but it would help a lot to have {versioncontrol_git_item_source_revisions} hold the columns (item_revision_id, source_item_revision_id), that is, storing file-level parents instead of commit parents. In any case, you should clean up entries from that table in versioncontrol_git_commit() when a commit is being removed.

  6. A commit can be present on multiple branches (happens e.g. when you commit on master and then branch off drupal-6.x), so there cannot be a single branch mapped to a commit. The CVS backend does that only because we can find out the exact branch that was used when for the commit by looking at the revision number, but you can't do that for Git, so you don't know which branch was the original one. That in turn means that {versioncontrol_git_commits} is wrong, it should instead be an n-to-n mapping (primary key: vc_op_id plus branch_id) and as a consequence renamed to {versioncontrol_git_commit_branches}. That even provides a nice replacement for the current {versioncontrol_git_item_source_revisions}, as it contains all commits on a given branch and thus has all source commits in it as well. Code-wise, you then need to adapt get_commit_branches() to return all branches for a given commit (== vc_op_id) and remove the VERSIONCONTROL_FLAG_AUTOADD_COMMITS flag from the hook_versioncontrol_backends() implementation so that the API module doesn't try to query {versioncontrol_git_commits}.
  7. While we're at it, you should also remove the VERSIONCONTROL_CAPABILITY_COMMIT_RESTRICTIONS capability flag, and also the VERSIONCONTROL_CAPABILITY_BRANCH_TAG_RESTRICTIONS one - in case you implement it, you can always re-add the flag. (dww stated in the recent discussion that branch/tag access hooks might be needed after all - I'm still not convinced that this is in scope for the GHOP task, though).

Sorry for all that nitpicking... don't get scared off by all that stuff, you're doing a fabulous job. It's only that distributed VCS provide a lot of possibilities, and therefore present a lot of edge cases to someone who tracks the commit history.

Those are all issues that remain, if I'm correct. Nothing left from earlier comments, so we can use this as the new starting point. I believe that when these issues are fixed (and no new ones arise by suboptimal implementation of those), the GHOP task will be done, with the Git backend as one hell of a module.

I'm leaving for skiing holidays in two days and taking tomorrow mostly off for preparation, so this is probably my last review in this issue. (No promises on that, though.) Need to get hold of chx and tell him to do the final reviews. Maybe aclight can also help out. Have fun, and thanks for the fish! boombatower, you rock!

...oh, and I'll be back on the 9th of February, see you then!

mlncn’s picture

Subscribing in awe.

benjamin, Agaric Design Collective

boombatower’s picture

  1. I figured that would still be useful to see when looking at the repository through the administration interface Needs Response
  2. Replaced query with API function call. Fixed - commit #98669
  3. Yea, I had noticed the '..[branch]' comment issue. Fixed - commit #98668
    Renamed get_source_revision() and updated comment. I'm not sure what is wrong with the documentation for get_current_revision() Fixed - commit #98670
  4. Add repository id and implemented repository hook. Fixed - commit #98674
  5. Not totally sure what is wanted since #6 seems to say it will replace this. Needs Response
    Clean up entries in versioncontrol_git_item_source_revisions when a commit is deleted. Fixed - commit #98757
  6. Rewritten to allow for a commit to be in multiple branches. Required quiet a bit of changing. Fixed - commit #98718
  7. Removed restriction capability flags. Fixed - commit #98677

A few other commits that fixed/changed some miscellaneous items were also committed.

gordon’s picture

Status: Needs work » Needs review

subscribing

e-Commerce is already using git at http://git.drupalecommerce.org, and if we were to use something like git we could restrict access even more and patches can be past as git patches which will information like the original author or the patch and who the committer is. These patches can be signed giving a chain of evidence so you can track down exactly who did the code without any doubt.

jpetso@drupal.org’s picture

Status: Needs review » Needs work

Still a bit time left before leaving! I know that I should review the Mercurial backend first, but whatever... let's get this finished (sorry ezyang).

ad 1.: Good point, let's keep the 'updated' value then. (maybe with a comment somewhere that it's only there for "documentation" purposes and not needed for the backend to work correctly.)

ad 3.: You're right, get_current_revision() is alright - sorry for the riot.

ad 5. What I meant was "it will replace the table as it is now", not "the table is unnecessary altogether". So as last major point, I would like to see this table transformed from a mapping of commits to a mapping of items. Stealing from my own Version Control API issue:

db_query("CREATE TABLE {versioncontrol_item_git_source_revisions} (
  item_revision_id int unsigned NOT NULL default 0,
  source_item_revision_id int unsigned NOT NULL default 0,
  PRIMARY KEY (item_revision_id, source_item_revision_id)
) /*!40100 DEFAULT CHARACTER SET utf8 */");

The difference is that you can retrieve all source items in get_commit_actions() with this structure, so if a file has been merged from two source files, you can provide two items in the 'source items' array. In other words, you'd track source revisions on a file level rather than on a commit level. Also, get_source_revision($revision) (single return value, identified by the commit id) will need to change to get_source_items($item) (return array possibly containing multiple items, identified by the item_revision_id) and move over to the .module file. I hope that explanation was good enough, in combination with re-reading 5. from above it should contain the info that is necessary to implement it. If not... well, don't know what to say more, and I'm running out of time again.

Other minor points:

8. It seems to me that the commit_branches_id in {versioncontrol_git_commit_branches} is unused and not necessary, if there's not a good reason for it then I'd suggest you remove it and set (vc_op_id, branch_id) as primary key. repo_id in that table is imho not totally necessary as well, because the linked vc_op_id already has that information. Haven't thought about this much, but that's what I currently think about this.

9. Just noticed that you are using {versioncontrol_git_item_revisions}(revision) for the commit id - that's not really what I meant before. CVS does it this way because there are no commit-wide revision numbers, so it has to be stored for each file individually. This is better (and easier) with other VCS as there is only one revision number / commit id per commit, and that is supposed to be stored as $commit['revision']. It then goes into {versioncontrol_commits}(revision) and will be autoretrieved from there, so when doing get_commit_actions() you can just assign $commit['revision'] to $item['revision'], and you can get rid of the hackish functionality that retrieves the commit id from {versioncontrol_git_item_revisions}. So, please delete 'revision' from that table and get it into $commit['revision'] instead (which you currently leave empty).

10. Related: if you fix 9. and add VERSIONCONTROL_CAPABILITY_ATOMIC_COMMITS to the backend capabilities, then commit log will display the SHA-1 hash as commit identifier instead of the vc_op_id. (At least, it should work this way - untested because I didn't have a matching backend yet.)

So, only one major issue left? Seems like next time this will be RTBC, or stuff. I love it.

dww’s picture

2 minor points:

a) yes, i think we'll still need tag/branch validation checks, but that's outside the scope of this GHOP task (not that it'd be all that hard, but it's certainly not essential to consider this "done").

b) as rocking as all of this is, no one should confuse this effort with the idea that d.o itself will inevitably move to Git as soon as this is committed. there are still vast unanswered questions about such a switch, a lot of work to do before d.o can use VersionControl API at all, much less versioncontrol_git. i don't say this to squelch anyone's enthusiasm, especially not boombatower's, but just to appropriately manage everyone's expectations. ;)

i'll be offline most of today, but hopefully available tomorrow sometime to do some final reviews.

thanks everyone!

boombatower’s picture

Status: Needs work » Needs review

@jpetso: summary of changes:

  • Rewrote source_revisions code and table, redesigned delete commit and repository to handle the new changes. Fixed - commit #99025
  • Removed unnecessary columns from versioncontrol_git_commit_branches. Not exactly sure why that was in there. Fixed - commit #99028
  • Removed the revision column from versioncontrol_git_item_revisions and use the versioncontrol_commits revision column. Fixed - commit #99029

It seems we had quiet a bit of miscommunication. Many of the things that have been requested to be changed over the course of the thread are things that I purposely changed. Oh well, it seems that everything is working and the module should be just about, if not ready, to be reviewed and released.

One thing I noticed is that when a deleted file commit is being viewed in the log the first letter is cut off. For example:
code.txt turns into ode.txt. I'm gonna guess this has something to do with everything being in the root folder, but that is purely a gusss.

@all: I see there are a few things that I can add to the module, but since none of them are necessary I think it is safe to say I can add them after GHOP.

dww’s picture

Version: » 5.x-1.0
Status: Needs review » Fixed

This code is clearly Done Enough(tm) for a first stab at a functioning git backend. It's all been committed, gone through the ringer for reviews, etc. In fact, chx seems to have already marked this closed at google's tracker. So, I'm marking this 'fixed' for the initial task here, as well.

I just started looking and found a few minor issues that I submitted separately (http://drupal.org/node/217424 and http://drupal.org/node/217425), but clearly, this is almost entirely ready, and we can move reviews, testing, etc into the issue queue for this new module just as any other.

Fantastic work, boombatower! Welcome aboard being a productive contributor to Drupal. ;) Looking forward to working with you more in the future.

p.s. boombatower: chx said over at the google tracker thingy that you still need to upload code there -- please do so to ensure that you get full credit for this work.

boombatower’s picture

I fixed the two minor issues you found and I have uploaded the final code to GHOP.

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

  • Commit 1d93b6f on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower.
    
    Initial commit for Version Control API -- GIT...
  • Commit b1648fd on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: reads from all branches and records the branch...
  • Commit d3c4839 on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: record tag information from logs and a few...
  • Commit acbd3e7 on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: removed versioncontrol_git_item_branch_points...
  • Commit 2770b21 on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: basic xgit scripts and modifications to...
  • Commit 4ed4235 on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: changed 'commit_id' to 'revision'.
    
    
  • Commit b3bfd75 on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: added -l argument to branch and tag list calls...
  • Commit 9a86e1e on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: removed 'modules' and 'run_as_user' from...
  • Commit 5a67161 on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: use the post-update hook instead of the post-...
  • Commit ee2fb54 on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: changed the log retreval to use last commit hash...
  • Commit 70064d2 on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: store previous revision in database.
    
    
  • Commit d749c5e on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: replaced query with version control API call.
    
    
  • Commit 159ccea on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: renamed get_previous_revision() to get...
  • Commit 8bbc38f on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: added repository id to source revisions and...
  • Commit 5694124 on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: removed repository capabilities that will no...
  • Commit 524633c on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: rewritten to allow for a commit to be in...
  • Commit 162e8bd on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: clean up entries in...
  • Commit c44f8fd on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: added comment explaining that updated field is...
  • Commit 2ae06ac on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: rewrote source_revisions code and table,...
  • Commit 0a9e5c7 on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: removed uncessary columns.
    
    
  • Commit 7f6fd92 on 5.x-1.x, 6.x-1.x, 6.x-2.x, fix-invalid-default-branches, fullsync-memory by boombatower:
    #212731 by boombatower: removed the revision column from...