Coming from http://groups.drupal.org/node/48818?page=2#comment-133743

One of the reasons cited for drupal.org losing contributors to GitHub and the like is that you can't get a CVS account unless you meet the requirements, and one of the requirements is to have a finished and working module/theme/etc. (and one that does not duplicate others' code).

So here's an idea. Just give a CVS account to any damn person who wants one, but limit their commit access to /sandbox/[username] until the time comes that they fulfill the requirements at http://drupal.org/node/59 and can move over some of their experimental code (which would handily come with commit logs so we could evaluate their Drupal knowledge as part of the application process).

This seems like win, win, winnity win to me:

- /modules and /themes remain "protected", so code there only comes from people who've met our approval.
- Our end-users aren't exposed to buggy, experimental code.
- New contributors can get used to CVS, to maintaining projects, and to the community 'spirit' in advance of doing their "formal" application.
- The CVS application approval team can now review actual commit logs instead of a huge-ass chunk of code with no context.
- We help stop the hemorrhaging of Drupal code to external servers.

Possible disadvantages:
- Policing to make sure people aren't committing XSS vulnerabilities or other horrors (less so an issue now that this is displayed from drupalcode.org)
- Policing it so people aren't uploading warez or other crap.
- More work for CVS admins to check a box (I volunteer to help with this)

Any others? Can we do it anyway? ;) Please? ;)

Comments

Dave Reid

+1000000000

I would love to have CVS enabled for all users by default so they can start using sandboxes. Once they want to put something up as an official project, we'll have something with history to look at.

apaderno

I don't see any reason not to do so.
Is it possible to restrict which directories users can commit to, or to allow them to commit code without being able to create a project? As things stand, when users have a CVS account they can create project nodes.

The advantage would be that CVS reviews would no longer be a cycle of uploading code, waiting for a review, reading what should be changed, and uploading new code; plus, users would get used to committing code to CVS.

webchick

Project: Drupal.org site moderators » Drupal.org infrastructure

Actually, this is probably best in the infrastructure queue.

@kiam: I'm no CVS expert, but I definitely know that's possible to do in Subversion, and I imagine it's possible to do in CVS too, which is why I can commit to revision_moderation but not to views, for example.

On IRC pwolanin implied that there might be some hard-coded assumptions in the CVS integration module (like access to CVS at all == create project access) that would make this a bit tricky though.

pwolanin

Sadly - it's likely we'd need to dig into project code for this not to be a disaster. Maybe there is some other way - e.g. creating a 3rd repo? drupal-sandbox?

Or let's just deal with this when we switch to $dvcs

Gerhard Killesreiter

one of the requirements is to have a finished and working module/theme/etc. (and one that does not duplicate others' code).

I wouldn't say that the theme/module has to be _finished_. It should demonstrate that the submitter knows what he is doing and what the general plan is (so that the non-duplication requirement can be checked).

apaderno

Actually, in http://drupal.org/node/539608 the words used are the following:

Many people write good motivation messages and then mess it up right at the end with here's a link to my contribution, it's about 75% done. Please, when applying for a CVS account supply a link to your contribution that you believe is complete. Attempting to review uncompleted work is difficult and time consuming (yes, remove all those dpr() and self reminder comments). Reviewing completed work is much simpler as there is already a level of expectation in the reviewer. Try to make our job of reviewing your work a joy.

Apart from the reasoning here (making the reviewers' work simpler is not a secondary concern, anyway), I think another reason not to accept incomplete code is that we want to review the code that will actually go into CVS. If we accepted code that is 50% complete, some users would show as little code as possible (the less code there is, the lower the probability that reviewers point out something that needs to be changed), and they would not show the parts they have already written, which may have security issues (I know it is always possible for users to add code with security issues after their CVS application has been approved).

I think it is important to make users understand that they should apply for a CVS account when they have a module or theme ready to be placed in CVS, not before (though I don't mean that all the bugs must be resolved). Otherwise there would be nothing to review, and we might as well return to the old CVS applications where no code was reviewed; that would certainly solve the problem of users waiting for somebody to review their code, and the problem of users hosting their Drupal code elsewhere.
If the review process was added, I guess it was because there was a need to decide who really deserves a CVS account, or at least to filter out people who apply for one without really understanding why they are applying.

Gerhard Killesreiter

There must be enough code to review to get an idea of what the finished product is supposed to do and that the author has understood Drupal's APIs. That's really all we need.

pwolanin

Well - I think it's clear that a priority for a switch to a non-CVS system should be separating the ability to commit to projects or branches (e.g. being a co-maintainer) from the ability to create new projects.

apaderno

Maybe I am wrong, but the permission to create project nodes is not related to the permission to commit code to a project; to commit code to a project in CVS, I must be in the list of users with CVS access to that project (with the exception of projects like Documentation and Sandbox). If there is code that ties being in the CVS access list to being able to create project nodes, then that code should be changed.

To implement this proposal, there should be two different roles where now there is just the "CVS user" role: users whose CVS application has been approved would get a new role (e.g., "Repository user"), while the other users would keep the current role (which could be renamed to "project maintainer").

Directories that are not associated with a project can be written to by all users; this problem is not specific to the proposal, as there are already users who accidentally commit files to /contributions/module, which then need to be deleted.

apaderno

Splitting the current role into two different roles would also make it possible to control who creates projects, which has already been proposed.

Damien Tournoud

Title: Proposal: Give a CVS sandbox-only account to any user who wants one » Host sandbox GIT repositories for any user who wants one

At this point, this is not a proposal anymore, we need to do that :)

alexanderpas

Component: Other » GIT

charlesjc

Some random thoughts:

  • It is likely that each user will require a sandbox repository per project that they work on. While it is possible to have a repository with multiple independent root commits (a root commit is the start of history in a repository), this will rapidly lead to pollution of the branch and/or tag namespaces in git. For example, if module A has a version tagged as 7.x-1.1, then module B in the same repository cannot have a version tagged as 7.x-1.1.
  • Given that there may be a lot of repositories, when a user asks to clone a module, it may make sense to use the git alternates functionality to save space. See the --shared option to git-clone at http://kernel.org/pub/software/scm/git/docs/git-clone.html
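
To make the tag-namespace point concrete, here is a toy Python model. It is hypothetical and not Git's API; it only mimics the fact that Git keeps tags under `refs/tags/` in one flat namespace per repository. The commit IDs are placeholders.

```python
class Repository:
    """Toy model of one Git repository's tag namespace (not a real Git API).

    Git stores tags as refs/tags/<name>; there is a single such namespace per
    repository, however many independent root commits the repository has.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.tags: dict[str, str] = {}  # full ref name -> commit ID

    def tag(self, tag_name: str, commit: str) -> None:
        ref = f"refs/tags/{tag_name}"
        if ref in self.tags:
            raise ValueError(f"{ref} already exists in {self.name}")
        self.tags[ref] = commit


# Two modules sharing one sandbox repository: the second release tag collides.
shared = Repository("shared-sandbox")
shared.tag("7.x-1.1", "commit-from-module-a")  # placeholder commit IDs
try:
    shared.tag("7.x-1.1", "commit-from-module-b")
except ValueError as err:
    print(err)  # refs/tags/7.x-1.1 already exists in shared-sandbox

# One repository per module: both modules can tag 7.x-1.1 independently.
module_a, module_b = Repository("module_a"), Repository("module_b")
module_a.tag("7.x-1.1", "commit-from-module-a")
module_b.tag("7.x-1.1", "commit-from-module-b")
```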

sdboyer

#11 @DamZ - I agree, this will be a fundamental requirement for the new git architecture. Anyone with a d.o account will need to have an associated git sandbox. I also think that trying to implement this prior to the migration would be too complex, without much added benefit.

#13 @charlesjc - one module, one repo. At no point will the new architecture place more than one discrete 'drupal thing' - module, theme, etc. - into a single repository. There are no substantial gains from doing so, but a lot of drawbacks.

Re: shared repos: disk space is cheap. My current full mirror of CVS is only about 6GB, and with the hardlinking that goes on by default, additional repositories (especially bare ones) are not going to add much space. The reason I've pondered using --shared in the past was the possibility that it might implicitly auto-fetch, but I don't think that's actually the case.

With this in place, the application becomes the gateway to a d.o role: whether you can create project nodes. Which, I think, is nice. Plus, the sandbox will be an easy place for application reviewers to look when evaluating whether or not someone should be granted that role.

sdboyer

Actually, I'm gonna back up on the --shared comment - looks like we could use it to pretty good effect. See the comment from the github guy:

http://www.nardol.org/2009/2/7/github-saving-space

I do think we're not really going to be in a place where this is a huge concern, given a) hardlinking and b) what we have hosted on d.o is really not going to grow THAT big...but I might be wrong. And if it's as easy as using shared and doing a non-pruning gc, then it should be cake.

CorniI

Component: GIT » Database

subscribe...

CorniI

Component: Database » Git

I definitely did not intend this component change, sorry.

Crell

Legal sidebar: I'm not against loosening up sandboxes for people, but no one should be able to put code onto d.o, in whatever structure, unless they've gone through the click-through "I agree to only upload stuff I'm legally allowed to upload and that is GPLv2+ and I won't do anything to get Drupal sued" agreement. Automating that, fine, but it still needs to be there and a clear step for people to agree to before they can distribute code through our servers.

apaderno

I'm not against loosening up sandboxes for people, but no one should be able to put code onto d.o, in whatever structure, unless they've gone through the click-through "I agree to only upload stuff I'm legally allowed to upload and that is GPLv2+ and I won't do anything to get Drupal sued" agreement.

Since anyone who needs access to the sandbox must first apply for a CVS account (users can't be given a CVS account unless they apply for one), that agreement should be added to Apply for contributions CVS access. Uploading GPL-licensed code is already a requirement, but there is no explicit agreement about it.

I changed the content of Apply for contributions CVS access and added a sentence that makes explicit what Crell described. It should be made clearer what "Drupal" means in that context (the Drupal Association?).

sdboyer

I have thought about the licensing issue with some concern. Just so we understand our options, would it be feasible for the code in sandboxes to be exempt from the licensing requirement? Project repos (and by extension, issue repos) would still definitely have the GPLv2+ requirement.

Crell

sdboyer: No. If it comes off of our servers, then we (the DA, technically) are liable for distributing it. Saying "we don't actually police this part" is not a protection, unless we respond to any and all complaints by anyone by deleting content without investigation. (AKA, the DMCA.)

Kiam: Oh dear lord! That was *not* text to use directly!!! I have already removed it from the page. The *real* version of that text is already on the CVS application form itself along with the necessary checkbox. (I cannot get to it now because I have a CVS account.) That text was already worked out with our attorney. If we're going to put a second copy of it on the instructions page, it should be the exact same text, not a tongue-in-cheek informal summary thereof (which is what I had above).

Right now this is already covered (unless someone has changed it since I was last looking into this issue). My point is that if we are going to start handing out git sandbox accounts like Chicago city patronage jobs we need to ensure that we *keep* that step for our own legal CYA.

apaderno

@Crell: I apologize; comment #18 confused me.

As I noted in comment #19, users who get access to the sandbox would first get a CVS account; therefore, they would apply for a CVS account and agree to terms covering the code they commit to the Drupal.org repository (yes, the CVS application form already has a checkbox about the license the code must be released under).

webchick

@kiam: No, we're talking about changing this up, by necessity.

Under the new rules, in order to get a Git account they would simply go to their Drupal.org user profile, under the "Contribute code" tab (or whatever), and check a box that says "I agree to upload only GPLv2+ code" and then press the "Give me a Git account!" button. This would immediately give them a personal sandbox and commit access to the per-issue repos we're discussing setting up at http://groups.drupal.org/node/50438. We would then revoke access from anyone who abused this privilege.

If they want to create a project that's downloadable by end users, then under the current rules, the first time they'd go through the approval process that is our current CVS approval process, only instead of uploading a tarball, they'd point folks off to their personal drupal.org sandbox with their code in it where the application approvers would review the code and leave comments. It still remains to be seen whether we'd have that step one-time as we do now or per-project or just scrap it altogether.

Or at least that's my understanding.

webchick

Incidentally, Crell, if we instituted this policy for all Git account holders, both new and existing CVS folks, once we make the switch, we could potentially solve all potential legal ambiguity around d.o contributions in one fell swoop. Could you maybe discuss with our lawyers what the wording around this checkbox should be, and pop over to a separate issue to discuss it?

CorniI

@#23:
New contributors need to upload an SSH key before getting a Git account, because that's what is used for logging in, and we need _good_ documentation on how to do this on all platforms.

pwolanin

Given that a user should be able to associate multiple SSH keys with one or more sandbox Git repos, and that SSH keys could potentially be shared, it seems like we should require the license-terms check-off every time an SSH key is added.

pwolanin

Github and other sites already have reasonably good docs - however, those seem to be all rights reserved, not CC. At least initially we can point users there until we write our own.

webchick

Incidentally, if there are any more of these "bite-sized chunks" kinds of tasks, please pull them into separate issues and cross-link. The more attackable something looks, the more likely someone is going to do it. :) It also forces us to think through all the actual steps it will take to get this done, which helps to figure out what kind of time it might take and what kind of extra resources we might need.

apaderno

@webchick: Thanks for the explanation; now it's clearer to me how things are going to be changed.

Gerhard Killesreiter

I am not convinced the proposed procedure is too helpful.

webchick

Gerhard: Do you mean #23? Could you elaborate on your concerns? Over at http://groups.drupal.org/node/50438 we've been hashing things out and this arrangement really seems like the best (and probably only) way to keep development and collaboration centralized here at d.o.

Gerhard Killesreiter

Yes. I simply don't believe that free cookies will fix anything. However, I also don't believe there's a terribly big problem in the first place.

You have to understand the implications of this change: Anybody can simply create an account on d.o and find free storage space for whatever they like to store (I don't believe reviewing the contents is even a slight possibility).

webchick

Yes, I do understand those implications... they were mentioned in the OP. And I am quite open to ideas on how to address them. Possibilities:

- pre-commit hooks that check against a list of allowed file extensions, much like we have in the contributions/profiles directories: .inc, .php, .module, .install, .css, .js, .png...
- disallow committing of any files > 1MB in size (or similar)
- Some kind of "auto-scanner" script run before commit that greps for the following and rejects them (and/or bans the account):
a) text/headers from licenses other than GPL
b) security violations e.g. http://ha.ckers.org/xss.html
c) binary data

That would probably stop most of the warez kiddies' tactics. I agree that monitoring each and every commit is not realistic.

Or, if you have a suggestion for a "low-weight" approval process that would both ensure we don't lose any of our current patch-level contributors, but also not leave it so open/free-for-all, I'm all ears on that side as well.
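
As a rough illustration of how cheap the automated checks in that list could be, here is a minimal Python sketch of a per-file check. Assumptions: the extension sets contain only the extensions actually named (the original list trails off with "..."); 1 MiB stands in for "1MB (or similar)"; the license markers are hypothetical placeholders that would need legal review; and the XSS check (b) is left out, because grepping for it reliably is much harder. Since .png is allowed but binary, the NUL-byte and license checks run only on the text extensions.

```python
import os

# Only the extensions named in the comment; the original list ends in "...",
# so a real allow-list would be longer.
TEXT_EXTENSIONS = {".inc", ".php", ".module", ".install", ".css", ".js"}
BINARY_EXTENSIONS = {".png"}
MAX_BYTES = 1024 * 1024  # stands in for "> 1MB (or similar)"

# Hypothetical markers of non-GPL license headers; a real list needs legal review.
FOREIGN_LICENSE_MARKERS = ("All rights reserved", "Apache License", "Mozilla Public License")


def check_file(path: str, data: bytes) -> list[str]:
    """Return reasons to reject one committed file; an empty list means it passes."""
    reasons: list[str] = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in TEXT_EXTENSIONS | BINARY_EXTENSIONS:
        reasons.append(f"extension {ext or '(none)'} is not allowed")
    if len(data) > MAX_BYTES:
        reasons.append(f"{len(data)} bytes exceeds the {MAX_BYTES}-byte limit")
    if ext in TEXT_EXTENSIONS:
        # Common, crude binary heuristic: a NUL byte inside a supposed text file.
        if b"\x00" in data:
            reasons.append("binary data in a text file")
        text = data.decode("utf-8", errors="replace").lower()
        for marker in FOREIGN_LICENSE_MARKERS:
            if marker.lower() in text:
                reasons.append(f"possible non-GPL license text: {marker!r}")
    return reasons
```

On the server side, a Git `pre-receive` hook (which reads `<old-value> <new-value> <ref-name>` lines on standard input) could run each added or modified file in the pushed commits through a check like this and reject the whole push by exiting non-zero if any reasons come back.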

Gerhard Killesreiter

Yes, if we find people who will implement such kind of checks we can consider doing this.

webchick

Great, thanks Gerhard. Created critical sub-task #720700: Come up with an update hook to mitigate abuse of sandboxes.

Crell

@webchick: I must have missed something because I don't get how using ssh-authenticated git sandboxes solves anything with regards to GPL-verification better than we have now. Anyone with CVS access now has to go through a click-through checkbox. The same would/should be true with git. Anyone just making a clone, rolling a patch, and posting it as they do now does not, and would still not. So I don't get how it would change anything. (Feel free to clarify in the linked issue if that makes more sense.)

Gerhard Killesreiter

Another point to consider is disk space.

A neat use case for such a sandbox would be to host any Drupal site that I develop (assuming I don't care for proprietary code).

That's easily 10MB per site if not more. I don't think we should allow such use cases.

CorniI

Well, usually you'd just have a module in the sandbox. Git compresses very well, and you're not expected to have a whole Drupal installation in your sandbox. And if you clone a project in your local sandbox, Git uses hardlinks to save space. By the way, a whole Drupal 6 install without history is about 1.5MB. And we want to limit the disk space a sandbox can use as well.

Crell

Another random question from a relative git-novice who has never actually used github: Is the intent for every user to have one (1) sandbox, into which they put all of their "crazy module ideas that may or may not get released?" Or one per such module? Because if they're all in the same sandbox in separate directories (as we do in CVS now), you can't branch them separately. The entire repository clone branches as one (AFAIK). OTOH, if we're allowing an unlimited number of sandbox repositories per user (so that they can be "fully" developed and branched and so forth before getting promoted to the "real" module repository) that's potentially a lot of new repositories and the namespace and diskspace(?) issues that come with that.

How are we planning to address that?

CorniI

The details of the sandboxes are not fully worked out yet, but each sandbox will definitely be either an empty repository, where you can start developing something (like a new module you propose, which you may end up scrapping instead of releasing), or a clone of an existing project repository. Git uses hardlinks for the object files that repositories have in common, so it saves most of the space a clone would otherwise need, as long as both repositories are stored on the same hard disk. And really, this is all about source code, which doesn't take much disk space, especially when compressed, as Git does. Throw in a RAID 1 with 1 TB of space, which doesn't cost much even if you choose good drives, and the system will scale space-wise at least until 2020.

Crell

OK, so diskspace is a non-issue (yay!). That still leaves the question of whether we have git://git.drupal.org/sandboxes/crell.git for all of my random stuff I'm working on (effectively making branching useless) or git://git.drupal.org/sandboxes/crell/crazy-idea.git for each random thing I'm working on (creating potentially a few zillion more repositories).

If the answer is "I dunno yet", that's cool, I just want to make sure the question is in someone's head. :-)

(This is in the front of my mind right now because I recently did check such a crazy idea into my CVS sandbox before reformatting my hard drive, only to discover that, oh yeah, right, CVS doesn't check in subdirectories unless you explicitly tell it to separately, so I lost half of my work. *stab stab stab stab*)
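
The two layouts in question can be written down as URL builders. This Python sketch reuses Crell's example URLs; the `BASE` constant and the name-validation rule are assumptions for illustration, not a decided scheme.

```python
import re

BASE = "git://git.drupal.org/sandboxes"  # taken from the example URLs in the comment

# Hypothetical naming rule for user and project path components.
NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]*")


def _check(name: str) -> str:
    """Reject path components that could escape the sandbox tree (e.g. '../x')."""
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"invalid sandbox path component: {name!r}")
    return name


def per_user_url(user: str) -> str:
    """Option 1: one repository per user; every idea shares its branches and tags."""
    return f"{BASE}/{_check(user)}.git"


def per_project_url(user: str, project: str) -> str:
    """Option 2: one repository per idea; each gets its own branches and tags."""
    return f"{BASE}/{_check(user)}/{_check(project)}.git"


print(per_user_url("crell"))                   # git://git.drupal.org/sandboxes/crell.git
print(per_project_url("crell", "crazy-idea"))  # git://git.drupal.org/sandboxes/crell/crazy-idea.git
```

Option 2 is the one that matches sdboyer's earlier "one module, one repo" comment; its cost is simply more repositories, which the hardlinking and --shared discussion suggests is manageable.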

pwolanin

@Crell - re: the GPL issue, we are no better off if people manually roll and post the patch. However, if they must have agreed to the terms in order to push to any issue repo, likely we will be better off (especially for large contributions) than we are now.

That's why in the ssh keys issue I wanted to make sure we include GPL agreement - it's certainly possible that you could add ssh keys and be able to push to issue repos without ever requesting a sandbox repo.

Crell

Ah, I think I follow. Yes, you should not be able to push code to git.drupal.org anywhere without first agreeing to the GPL click-through.
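
Putting the rules from this thread together (no push anywhere without the GPL click-through, sandboxes writable only by their owner as in the original /sandbox/[username] idea, per-issue repos open to anyone who has agreed, project repos limited to maintainers), a push-authorization check might look like this Python sketch. The `sandboxes/`, `issues/` and `projects/` path layout and the `Account` fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical d.o account record as seen by the Git server."""
    name: str
    agreed_to_terms: bool = False  # the GPLv2+ click-through
    maintains: set[str] = field(default_factory=set)  # project repos this user may push to


def may_push(account: Account, repo_path: str) -> bool:
    if not account.agreed_to_terms:
        return False  # no push to git.drupal.org anywhere without the agreement
    # Own sandbox only; the trailing slash stops "crell" from matching "crell2".
    if repo_path.startswith(f"sandboxes/{account.name}/"):
        return True
    if repo_path.startswith("issues/"):
        return True  # per-issue repos: open to any account that has agreed
    return repo_path in account.maintains  # project repos: maintainers only
```

Whether the agreement is tracked once per account or re-confirmed per SSH key (as pwolanin suggested) only changes where `agreed_to_terms` comes from.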

sdboyer

Issue tags: +git phase 3

tagging

wik

subscribing

mlncn

Can we have an issue queue per sandbox ("crell") or, preferred, per sandbox project ("crell/crazy_idea")?

marvil07

Josh The Geek

This got fixed, but I'm not sure where.

anarcat

Status: Active » Needs review

Indeed, it's unclear how or when this was fixed, but I think since it's documented it's probably up, right?

Josh The Geek

Title: Host sandbox GIT repositories for any user who wants one » Host sandbox Git repositories for any user who wants one

:)

dww

Status: Needs review » Fixed
Issue tags: -git phase 3 +git phase 2

#986718: Add support for sandbox projects, along with a series of other issues, was deployed live as part of the initial phase 2 launch.

Status: Fixed » Closed (fixed)
Issue tags: -git phase 2

Automatically closed -- issue fixed for 2 weeks with no activity.