After discussion with Sam Boyer and David Strauss at CPH2010, it was determined that there isn't a perfect solution for our git-hosting needs. We want the following:
1) High Performance - Able to easily handle our users concurrently checking out/checking in/branching/etc. Also with the ability to handle the fact that we may have a lot of temporary issue branches.
2) SSH Key Authentication - Wanted by many core developers
3) Password Authentication - Wanted by less technical users
4) User DB Not Tied to LDAP/UNIX/PAM - Being tied to real user accounts is just not doable; we have too many CVS accounts for this to be manageable.
5) Would really rather not use a non-standard, patched ssh daemon.
We had two main options going into our discussion at CPH2010: gitosis/gitolite (patched ssh daemon, key only, etc.) and the new git smart HTTP transport. Neither option contains all the wanted features, and both would end up being hacks in one way or another. The better of the two is likely the HTTP transport, but either option would require some development to fit our workflow, and we wouldn't be able to use keys with the HTTP transport, which really bothers me.
As this discussion went on, we started discussing how Launchpad fixed this problem. What they did is take the Twisted framework (which includes an SSH implementation named Conch), use it to write an SSH wrapper around bzr, and then tie that into their ACL database. Twisted is a high-performance Python networking framework that is easy to develop with and can handle a fairly high level of concurrency (it's the standard epoll/event-based networking daemon). It became fairly obvious that developing a git version of this Twisted daemon, one that can authenticate via keys or passwords and integrate easily with our ACL system, would be the work of a few days, possibly a week.
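To make the "SSH wrapper tied into an ACL database" idea concrete, here is a minimal sketch of the decision such a daemon has to make once a user is authenticated: parse the command the git client sends over SSH and check it against an ACL. It deliberately leaves out Twisted/Conch. The function names, the in-memory `PUSH_ACL` dictionary, the usernames, and the "everyone may read" rule are all assumptions for illustration, not the design that was settled on.

```python
import shlex

# Hypothetical ACL: project name -> usernames allowed to push.
# In the plan above this would come from the drupal.org ACL data instead.
PUSH_ACL = {"example_project": {"alice", "bob"}}

READ_COMMANDS = {"git-upload-pack", "git-upload-archive"}
WRITE_COMMANDS = {"git-receive-pack"}


def parse_git_command(command_line):
    """Split an SSH exec request such as "git-upload-pack 'example_project.git'"."""
    parts = shlex.split(command_line)
    if len(parts) != 2:
        raise ValueError("expected '<git command> <repository path>'")
    command, path = parts
    if command not in READ_COMMANDS | WRITE_COMMANDS:
        raise ValueError("not a git transport command: %r" % command)
    project = path.strip("/")
    if project.endswith(".git"):
        project = project[:-4]
    # Reject anything that could escape the repository root.
    if not project or "/" in project or project.startswith("."):
        raise ValueError("suspicious repository path: %r" % path)
    return command, project


def is_allowed(username, command_line, acl=PUSH_ACL):
    """Return True if the user may run the requested git command."""
    command, project = parse_git_command(command_line)
    if command in READ_COMMANDS:
        return True  # assumption: every authenticated user may fetch
    return username in acl.get(project, set())
```

A real daemon would run the git command as a subprocess only after `is_allowed` returns True, and would close the channel otherwise.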
We are not alone in realizing this. Later research turned up http://deadpuck.net/blag/serving-git/, which is basically a step-by-step guide for creating exactly such a daemon.
The current plan of action is to schedule a time for David and me to meet up for a git daemon sprint and crank out this daemon and integrate it with drupal.org. I'm going to start development before this, to ensure that performance is what we want. I'll post a github link at some point with the code for the first version. Comments are appreciated, as this is a decision in progress and we could certainly be wrong in our conclusions.
Comments
Comment #1
damien tournoud commented
What happened for six months? We came to this *exact* same conclusion during the Git-sprint back at Drupalcon SF. I even started working on this and implementing a basic wrapper.
Oh, actually I know what happened. Deplorable decision making.
*Sigh*
Comment #2
nnewton commented
@Damien: I wasn't involved in this six months ago, so I don't know :). I'd imagine what happened is a focus on earlier parts of this, since this isn't needed right now and won't be needed for quite a while actually. Do you have a link to the code you worked on? I'd love to not duplicate your effort.
-N
Comment #3
webchick
Tagging for tracking purposes. Thanks a lot for the detailed write-up, Narayan!
Comment #4
webchick
As for "what happened," I'm not sure either. But if this was documented in the issue queue as a decision everyone reached, I must've missed it.
Comment #5
sdboyer commented
The first I _ever_ heard about using a twisted-based system was when David, Narayan and I discussed it at DC CPH. I did miss the initial discussion at DCSF, unfortunately, but the fact that a twisted-based solution was never mentioned in the big discussion (#714034: Determine the access control solution for git) and that the primary discussion throughout there was oriented towards gitolite leaves me very confused as to how this was at all decided six months ago. Moreover, no one present at those discussions ever brought it up to me directly in the numerous personal conversations I've had on the topic since then.
I understand that you've felt disenfranchised by this process, Damien, but this vitriol you occasionally show up and shoot at us is hurtful, and in this case also seems baseless. We've done a boatload of communicating, and to the extent that it's possible used the tools available in the d.o community public space to come to the necessary decisions. There have been a number of people who've participated in the work and decisionmaking process quite a bit, enough that I feel pretty good about how open the process has been with respect to the decisions we've needed to take.
Truth is, I've avoided approaching you about most anything related to the migration because of how strongly you expressed your displeasure with the process when the contract was initially announced. If that was the wrong response on my part, and if you'd like to be more involved, I can try to loop you in more.
Comment #6
sdboyer commented
retagging the lost tag
Comment #7
webchick
My mistake! This is the implementation ticket. We're not going to get that done this sprint. :)
Comment #8
mikey_p commented
I just want to point out that while this is pretty much the exact same conclusion that was reached in SF, the real issue here is that the smart http transport option has been thoroughly evaluated and ruled out as an option for drupal.org. The http option wasn't even considered in the discussion at SF, and I don't think that most of the documentation about it existed at the time.
Comment #9
sdboyer commented
Gonna try to get this done, or at least significant progress made, during this sprint.
Comment #10
halstead commented
When looking over this some questions came up. Can anyone answer these or point me to the person who knows the answers?
Do the CVS ACLs or equivalent already exist in Drupal.org's database or is migrating the current ACL system part of the git-ssh issue?
What kind of load is expected for git (are there any cvs stats available)?
Comment #11
dww
Current CVS ACLs are only mapping projects and user IDs. There's nothing more fine-grained (like per-branch ACLs). All those live in the d.o DB.
#880818: Remove CVS access tab in favor of the project maintainers form
The data itself lives in the {cvs_project_maintainers} mapping table.
The ACLs are enforced by the "xcvs" scripts that are included as part of the CVS Integration module.
Hope that helps. Let me know if you need any other info about the current CVS ACLs.
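As a rough illustration of what a project-to-user mapping like this allows, the sketch below checks maintainership against an in-memory SQLite table. Only the table name `cvs_project_maintainers` comes from the comment above; the column names (`nid`, `uid`), the sample rows, and the helper names are assumptions, not the actual drupal.org schema.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory table standing in for the d.o database."""
    db = sqlite3.connect(":memory:")
    # Column names are assumed for illustration; the real schema may differ.
    db.execute("CREATE TABLE cvs_project_maintainers (nid INTEGER, uid INTEGER)")
    db.executemany(
        "INSERT INTO cvs_project_maintainers (nid, uid) VALUES (?, ?)",
        [(100, 1), (100, 2), (200, 2)],  # made-up project and user IDs
    )
    return db


def is_maintainer(db, project_nid, uid):
    """Project-level check only: there are no per-branch ACLs to consult."""
    row = db.execute(
        "SELECT 1 FROM cvs_project_maintainers WHERE nid = ? AND uid = ? LIMIT 1",
        (project_nid, uid),
    ).fetchone()
    return row is not None
```

Because the mapping is just (project, user) pairs, a git daemon could answer "may this user push to this project?" with a single indexed lookup.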
Comment #12
sdboyer commented
As for load, I'd expect it'll get heavier than CVS pretty quickly, and continue to increase as we expand the number of ways people can use git with d.o.
Comment #13
tizzo commented
This is something I have been wondering about. Sam mentioned something about using varnish to cache the authentication calls.
Do we want to expose a web service that the twisted server checks against or do we want python to connect directly to the DBs?
I'm guessing we want to keep it as loosely coupled as possible and will have to provide some kind of cacheable response. Is there a ticket for this anywhere yet?
Comment #14
tizzo commented
Apparently this is already underway though not yet anywhere public for the community to participate in or review.
Comment #15
tizzo commented
Comment #16
eliza411 commented
Assigning to me for follow up per Sprint 4 planning meeting.
Comment #17
marvil07 commented
subscribing
Comment #18
chizu commented
Comment #19
nnewton commented
@Chizu, let me know if you need any information on this. David and I were going to work on this at a later point, but I think we are both busy with SSL et al. ATM.
@eliza Can you ping me on IRC at your convenience?
Comment #20
marvil07 commented
related #720664-34: Create a "ssh_key" module, to allow upload of SSH keys to drupal.org user profiles ?
Comment #21
tizzo commented
Chizu and I each got working demos of this up and running and they are currently posted on github.
Both versions were based on the sample code nnarayan found. Each of our implementations had some strengths and some areas that still needed work so we are now in the process of combining them.
Chizu's version lives on his github account: Git Drupal
Mine lives on mine: Drupal.org Git Daemons, and it authenticates against the placeholder module Project Git Auth.
While my code wasn't quite as tidy and needs more cleanup, it was closer to the eventual model we want to implement so we will be pushing my repository forward as we merge the best of the two implementations.
What we have working right now:
With what we have done today we could do a sprint demo that walks all the way through creating a project and getting a repository, authenticating, adding other users and collaborating on development!
YAY TEAM!
Comment #22
Anonymous (not verified) commented
awesome!
i've forked tizzo's repo on github, and will get a working dev setup going this weekend so i can join in the fun.
Comment #23
dww
Note: it's rare that projects are deleted. And even when they are, it's rarer still that we remove their directory from CVS. Do we ever want to be deleting Git repos? Maybe this code only needs to worry about create, not delete.
Comment #24
tizzo commented
@dww: Yeah, I had considered that. For my own testing I wanted to be able to easily delete test data and get the git repos with it but I'm open to taking that back out.
Also Sam brought up a good point on IRC yesterday: Failing over to username and password is not an option because D.O allows for non-ssh safe usernames. As a result we'll have to scrap that. Everyone uses public keys and we sort out the training issues as they come up. Sorry rfay!
Comment #25
webchick
If we go with the currently planned proposal to deal with Git applications (basically, there isn't one, and people can create projects but not releases until approved), the ability to delete projects and repos gets more important.
Comment #26
tizzo commented
@webchick: Well, we might want to differentiate. Approved projects don't get trashed but sandboxes can.
Also, ideally we should be able to spin up any number of developer sandboxes per user, right? We'd also need to have a provision for doing that mapping in the ssh daemon.
Comment #27
webchick
Currently, sandboxes are slated for phase 3 (post-launch) because there's quite a bit of extra work to do there (as far as I know). But we should probably have an after-talk about this on our call today. #961144: Determine/finalize technical requirements for post-Git migration project approval process has brought this up again.
Comment #28
halstead commented
Regarding #24, chizu and I did some tests and the twisted ssh implementation doesn't seem to have trouble with most (any?) drupal.org usernames. We tested with some odd ones like Системаи нави-штан' and didn't have trouble. Are there other difficult names we should test?
So we should be able to deliver the desired password authentication functionality. Yay!
Comment #29
mikey_p commented
I think that spaces are a concern.
Comment #30
halstead commented
Spaces are not a problem for the twisted implementation of SSH nor for OpenSSH.
Comment #31
tizzo commented
@halstead: Killer! We'll need to update the module to provide a service that the daemon can query against. I'll add that to the Drupal module this weekend.
I guess it'll be something like vcs-git-auth?user=[url encoded user name]&pass=[md5 hash] ?
We want to use query strings so that a varnish cache can sit in between twisted and Drupal (posts aren't supported).
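The query-string lookup proposed above could be built roughly as follows. This is a sketch only: the `vcs-git-auth` path and the `user`/`pass` parameters come from the comment, while the host name, the function name, and hashing the UTF-8 bytes of the password are assumptions. MD5 is used because it is the hash named in the thread, not because it is a good choice for protecting passwords.

```python
import hashlib
from urllib.parse import urlencode

# Hypothetical internal host; only the "vcs-git-auth" path is from the thread.
BASE_URL = "http://drupal.example/vcs-git-auth"


def build_auth_url(username, password, base_url=BASE_URL):
    """Build a GET URL so an HTTP cache can key on the full query string."""
    pass_hash = hashlib.md5(password.encode("utf-8")).hexdigest()
    # urlencode percent-encodes the UTF-8 bytes of each value, so usernames
    # with spaces or non-ASCII characters survive the round trip.
    query = urlencode({"user": username, "pass": pass_hash})
    return base_url + "?" + query
```

Since the same username and password always produce the same URL, a cache in front of Drupal can answer repeated checks without hitting PHP. The flip side is that the URL then works as a credential in its own right, so it should be kept out of logs and off untrusted networks.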
Sending the password hashed isn't the most secure thing we could do but it seems marginally better than sending the cleartext and doing proper asymmetric encryption between python and drupal would raise the configuration barrier and be a pain.
As long as this is happening behind a firewall I suppose it shouldn't matter too much. For anyone using this stuff where both machines aren't behind a firewall they could access the drupal service over SSL (query strings are sent inside the tunnel, just not hostnames) or they could use something like pound to tunnel ssl traffic to varnish (varnish doesn't support SSL either).
How can we verify that all characters allowed in usernames (what passes in user_validate_name) will pass in SSH?
Comment #32
halstead commented
@tizzo the user name must be in ISO-10646 UTF-8 encoding [RFC3629] which works for any string user_validate_name() allows.
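To show why arbitrary usernames are not a problem at the protocol level: SSH carries string fields as a 4-byte big-endian length followed by raw bytes, so spaces and non-ASCII characters need no escaping once the name is UTF-8 encoded. The helpers below are an illustrative sketch with made-up names, not code from either daemon.

```python
import struct


def ssh_string(text):
    """Encode text as an SSH 'string': uint32 length prefix + UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack(">I", len(data)) + data


def read_ssh_string(buf):
    """Decode one length-prefixed UTF-8 string from the start of buf."""
    (length,) = struct.unpack(">I", buf[:4])
    return buf[4:4 + length].decode("utf-8")
```

Any username Python can encode as UTF-8 round-trips through these two functions unchanged, including names with spaces or Cyrillic characters.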
Comment #33
eliza411 commented
Tagging for consideration in git sprint 5, preferably as discrete issues.
Comment #34
tizzo commented
This is implemented so we can call it done: https://github.com/tizzo/Drupal.org-Git-Daemons
It still needs some work, but the discrete follow up issues we have identified can be found at: