All Git commits have metadata for committer and author in the form of “John Doe <e-mail address>”. Currently, the commits imported from CVS are stored in the form of “mikl <mikl>” – i.e. the d.o username as both name and e-mail address.

When people start committing directly to Git, it will change to have the details the user provides in his .gitconfig. That will obviously cause a mismatch between the two.

The way most Git sites do it is that commits are matched to users via the e-mail address. So if I commit as mikkel@example.com and that is also one of my registered e-mails, the commits will be credited to my user account.
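
As a rough illustration of the e-mail matching described above, here is a minimal Python sketch. The data, the second address and the function names are hypothetical, and treating addresses as case-insensitive is an assumption of the sketch, not something any particular Git site is known to do.

```python
# Hypothetical data: uid -> e-mail addresses registered on that account.
REGISTERED_EMAILS = {
    58679: {"mikkel@example.com", "mikl@example.org"},
}


def build_index(registered):
    """Invert uid -> addresses into a lowercase address -> uid lookup table."""
    index = {}
    for uid, emails in registered.items():
        for email in emails:
            index[email.strip().lower()] = uid
    return index


def credit_commit(author_email, index):
    """Return the uid credited for a commit, or None if no account matches."""
    return index.get(author_email.strip().lower())


index = build_index(REGISTERED_EMAILS)
print(credit_commit("mikkel@example.com", index))
```

A commit whose address is not registered anywhere simply yields None, which is the mismatch case this issue is about.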

I think we currently have both name and e-mail available on d.o, and in my opinion the best thing to do would be to simply use those for the metadata, but if that is not an option, I’d like to suggest that we at least change the e-mail to something more unique, like username@git.drupal.org or something, so we’ll be able to distinguish the ported commits and figure out which user they belong to.

Comment | File | Size | Author
#23 | git-workflow.png | 86.16 KB | webchick

Comments

mparker17’s picture

I don't particularly like the idea of my personal e-mail address floating out there in the wild for anyone to find. I get enough spam as it is. I'd much rather that people who want to contact me use the personal contact form I've enabled on my Drupal user account page.

I like the way that Github identifies users: by an SSH key. That way, I can set user.email in my .gitconfig file to be "mparker17@work" or "mparker17@home" or whatever. I'd also be happy with keeping my user.email set to username@git.drupal.org

rfay’s picture

I guess I think that "realname <email>" is the natural git way to do this, but don't blame people for not liking that. Too many people will be annoyed about that to do it.

mparker17's "username@git.drupal.org" would work fine.

mikl’s picture

Or perhaps uid@git.drupal.org, since people can change their names – so in my case 58679@git.drupal.org

mparker17’s picture

That sounds like a great idea to me!

bojanz’s picture

I understand the reasoning behind using the UID, but what most people look for is the username itself. It needs to be shown somewhere.

Bojan Zivanovic / bojanz@git.drupal.org sounds cool (although I have no problem with using my real email, already do it on GitHub)

rfay’s picture

And I don't think it's that horrible that usernames may change over time. It's happened even with core committers, and it hasn't ruined anything. username++, username@git.drupal.org++

webchick’s picture

I like username@git.drupal.org as well. :) And no worries on it not syncing with the Drupal.org username; our current CVS ones don't either.

webchick’s picture

Status: Active » Needs review
sdboyer’s picture

Issue tags: +git phase 2

Some of this has already been discussed in #716300: Formatting of ~/.gitconfig and git user data. This conversation is, to some extent, a duplicate of that one. I'll leave it open for the moment, though, and tag it.

My goal in this discussion is to walk the line between protecting privacy and following the general patterns used in the wider git community (which is to provide a real email address, though typically an intentionally public-facing one).

To be clear, while doing direct repository browsing will reveal email addresses, regardless of the solution we decide on here we will have author information point to d.o profiles on the equivalent to http://drupal.org/cvs .

mparker17’s picture

"To be clear, while doing direct repository browsing will reveal email addresses, regardless of the solution we decide on here we will have author information point to d.o profiles on the equivalent to http://drupal.org/cvs ."

This does make me feel better. Brief clarification though: when you talk about e-mails being revealed through direct browsing above, does this include browsing the web interface at http://git.drupal.org/ (which redirects to http://git.drupalcode.org/ at the time of writing) or would someone see e-mail addresses only if they were to git clone some part of the repository?

mparker17’s picture

"I like username@git.drupal.org as well. :) And no worries on it not syncing with the Drupal.org username; our current CVS ones don't either."

Random, crazy, off-the-wall idea that might satisfy both sides of this discussion: what if username@git.drupal.org forwarded to the e-mail address in our user profile but also respected the hourly threshold of Drupal's per-user contact form (i.e.: "3 submits per hour" rule)?

eliza411’s picture

Issue tags: +git sprint 5

Tagging for consideration in git sprint 5

sdboyer’s picture

Assigned: Unassigned » sdboyer

Gonna wrap this up in sprint 5.

sdboyer’s picture

Priority: Normal » Critical

After some further reflection and a clearer plan on how versioncontrol will actually handle mapping raw git user data to d.o users (short version - we'll have a pluggable, extensible system for performing the mapping, so we can implement as many different mapping strategies as we want), I've come to a solution that should satisfy all camps, and not require a ton of coding. Let me start by running through the relevant considerations:

  • Any format we recommend for git's user.email data must be some sort of real, unique, verifiable identifier of a person. A proper email address is preferable, but not strictly necessary. What really matters is that we can take that raw data and map it to a d.o user account.
  • We already go to considerable lengths to protect d.o users' email privacy, and we shouldn't reverse that trend. The simplest and most immediately intuitive option is to link to d.o user profiles.
  • At the same time, it is pretty standard practice in the wider git world to use a real email address for user.email. And, where reasonably possible, we should adopt common practices of the wider world. More concretely: a d.o user profile link could never, ever resolve to a user account on github, but an email address can. It's very VERY important to understand that even if commits do contain real email addresses, at NO point will those emails EVER be visible over any d.o-based HTTP offerings. Spambots would have to clone a git repository and look at raw git data to get emails.
  • We have an email address from every d.o user already, but it wouldn't be right to just appropriate these email addresses and reuse them for this purpose. So we need to provide a choice between the options we're going to support. And beyond that, people could very reasonably want to associate their git commits with a different email address than the one attached to their d.o accounts.
  • Past and future data are related, but different. We can offer a choice to each individual user about how their old CVS commits are mapped, but that oughtn't lock them into that choice for all future commits they make.
  • It's in everyone's best interest to make our statistical data as accurate as possible. This is true especially now that we're going to git and (once we have per-issue repos) work done by contributors will be counted as first-class robot-readable data, rather than second-class human-readable scrawl in commit messages. Accurate correlation between raw vcs user data & d.o users is predicated on making this a flexible, forgiving system.

With all that in mind, then, here's my plan:

  • In the final version, we'll allow mapping based on either a d.o profile link, or an email address. We also need to accommodate multiple email addresses, so we'll need something like the Multiple Emails module installed on d.o. And we're actually gonna need that sooner rather than later, because…
  • Existing CVS account holders will need to be given a choice as to how they want their historical commit mapping done. One place to add it could be on the CVS form at http://drupal.org/user//edit/cvs, but wherever it goes, we need to present a set of radio options:
    1. The default option (and the option we'll use if no data is ever entered) will be to use a profile link.
    2. The second radio option will be to use an email address. If this option is selected, a dependent select list will be shown, allowing the user to pick from all the email addresses they have registered with the system - their base email address, or ones added via our multiple email addresses module solution.
  • Once this additional form is in place, we email all CVS account holders to notify them that it's there, and give them a window (probably a week) during which they can make their choice.
  • After the window closes, we dump all the data into a format the migration scripts can use, and once the other hash-changing ducks are in a row, we'll slot that data in and every run of the migration scripts will use them, right up until launch day.
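
To make the "profile link or email address" idea a bit more concrete, here is a minimal Python sketch of pluggable mapping strategies. It is not the versioncontrol implementation: every name, the sample data and the profile-link pattern are assumptions made for illustration, and returning 0 for "not mapped" is simply the convention chosen here.

```python
import re
from typing import Optional

# Made-up sample data for illustration only.
EMAIL_TO_UID = {"sam@example.com": 1001}
KNOWN_UIDS = {1001, 58679}

# Assumed shape of a d.o profile link used as an identifier.
PROFILE_LINK = re.compile(r"^https?://drupal\.org/user/(\d+)/?$")


def map_by_profile_link(identifier: str) -> Optional[int]:
    """Strategy 1: the identifier is a link to a d.o user profile."""
    match = PROFILE_LINK.match(identifier.strip())
    if match and int(match.group(1)) in KNOWN_UIDS:
        return int(match.group(1))
    return None


def map_by_email(identifier: str) -> Optional[int]:
    """Strategy 2: the identifier is one of a user's registered addresses."""
    return EMAIL_TO_UID.get(identifier.strip().lower())


# New strategies can be appended without touching map_to_uid().
STRATEGIES = [map_by_profile_link, map_by_email]


def map_to_uid(identifier: str) -> int:
    """Try each strategy in order; 0 means no strategy could map the value."""
    for strategy in STRATEGIES:
        uid = strategy(identifier)
        if uid is not None:
            return uid
    return 0
```

Keeping the strategies in a plain list is one simple way to get the "as many different mapping strategies as we want" property; a real system would presumably register them through its own plugin mechanism.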

As I said, I think this proposal is adequate and ready to roll. And if I don't hear any big, whiny objections to it in a couple days, I'm gonna mark this fixed and we're gonna roll into implementation :)

sdboyer’s picture

Oh, and, sorry - username@git.drupal.org is not a good idea for a few reasons:

  • It's not intuitive to outsiders. Are those real email addresses? Do they work? (no they don't - they're just cute pseudo-identifiers. so yeah, I think #12 is a really bad idea :P).
  • Usernames can have funky characters that, while with proper UTF-8 goodness SHOULD be fine, will still require escaping and other nastiness (hello, spaces!).
  • It _is_ a big deal if the username changes. We need to keep everything in sync so that statistics & view listings are all correct.

A profile link has none of these problems, and thus is vastly preferable.

webchick’s picture

Hm. I actually think we need to pick one single way and apply it to everyone. Otherwise, reading commit messages will get extremely hairy, as will displaying data from commit messages (if $user->email is a URL, print it this way, else if it's an email address, print it that way). Providing choices also means additional coding, additional documentation to explain the differences and pros/cons between the two, etc. Ick.

I guess I'm confused why we can't do the Multiple Email addresses thing, since Git natively works with email addresses, and no matter which e-mail is associated with a commit, replace it with a link to their user profile once commits are pushed. That protects contributors' e-mail addresses and ensures that the most important data about a person on d.o -- their d.o profile -- remains the primary way of identifying them.

Or do I miss a cluestick?

sdboyer’s picture

"Otherwise, reading commit messages will get extremely hairy, as will displaying data from commit messages (if $user->email is a URL, print it this way, else if it's an email address, print it that way)."

First question is whether you're talking about "reading commit messages" as in the output of git log, or if you're talking about reading the commit messages that appear on d.o. WRT the output of git log, I actually tend to find the 'name' portion to be what I look at to quickly identify who made a commit, not the email, and I think that's what most people do. What people put for their name is entirely up to them (we won't be using it at all for the purpose of mapping), but I suspect most people will invest some effort in keeping it consistent.

As for output on d.o, here's the cluestick :) vcapi keeps 'author_uid' and 'committer_uid' fields in its records of commits - foreign keys to {users}.uid. These fields contain the result of the mapping logic, which is pluggable & extensible - and therefore, can handle the variety of different possible types of user.email strings from git. Mapping logic is run and these fields are populated when the commit data is initially read in (and later re-run on cron for commits that failed to map to a known user). The only thing that Views ever looks at is these uid fields - the endpoint of the mapping logic. All Views ever has to do is turn a uid into a link to a profile.
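
The flow described here (map once when the commit is read in, then retry failures on cron) could look roughly like the Python sketch below. Only the author_uid and committer_uid field names come from the description above; the class, the function names, and the use of 0 for an unmapped field are assumptions of the sketch.

```python
from dataclasses import dataclass


@dataclass
class CommitRecord:
    revision: str
    author: str             # raw identifier read from the VCS
    committer: str          # raw identifier read from the VCS
    author_uid: int = 0     # assumed convention: 0 = not mapped
    committer_uid: int = 0  # assumed convention: 0 = not mapped


def apply_mapping(record, mapper):
    """Run the mapping logic when the commit data is first read in."""
    record.author_uid = mapper(record.author)
    record.committer_uid = mapper(record.committer)


def remap_unmapped(records, mapper):
    """Cron-style pass: retry only fields that failed to map earlier.

    Returns the number of fields that were newly mapped.
    """
    newly_mapped = 0
    for record in records:
        if record.author_uid == 0:
            record.author_uid = mapper(record.author)
            if record.author_uid != 0:
                newly_mapped += 1
        if record.committer_uid == 0:
            record.committer_uid = mapper(record.committer)
            if record.committer_uid != 0:
                newly_mapped += 1
    return newly_mapped
```

Anything that renders commits then only needs the two uid fields, which is the point made above about Views.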

dww’s picture

Choice is inevitable here, since people can configure their Git clients however they want. Granted, we could make an arbitrary rule that says "only if your Git email address [sic] is really a link to your d.o user profile will we associate the commits with your d.o account", but a) that's not going to mean everyone's going to pay attention to the rule and b) it doesn't necessarily make it any easier to code this stuff (unless we go out of our way to build an inflexible hard-coded system from the start, which would be rather silly).

That said, a 1 week window is *way* too short to expect all CVS account holders to get the email, read it, understand the implications of the choices they have, and make their decision. For example, I could easily be offline for more than a week at a time over the next few months, and then I'd miss my chance. Assuming there are no major objections to the kind of plan Sam spelled out above, I think we need to move forward on the multiple addresses module and radio button ASAP to give existing CVS users more time to make this choice.

Cheers,
-Derek

mikl’s picture

I agree with Derek here. Getting this done could take a while, and I do think it's necessary. The commits I make with Git get tagged with my real e-mail, and if those I've done with CVS don't get the same, it will be kinda confusing when the Git history is used elsewhere, like on Github.

webchick’s picture

"Choice is inevitable here, since people can configure their Git clients however they want."

Sure, but they won't abuse user.email to put a d.o profile address in there unless we explicitly tell them to do this. And we shouldn't. We should just tell them to put in whatever they'd normally put there, and resolve it to a d.o user profile link when it's output on the site (which is what Sam says).

So why offer them a choice to abuse user.email for something it's not intended for, and that would be totally bizarre and d.o-specific?

webchick’s picture

I mean, if I'm concerned about people knowing my email, I can always stick yourmom@mailinator.com in there. But it should still be an email address. The setting is called user.email. :)

I still feel like I'm missing something here, so I guess I'll request an after-talk on our call today.

webchick’s picture

Status | File | Size
new | git-workflow.png | 86.16 KB

In other words, here's how I would expect this to work:

Read my email out of .gitconfig, map it to a user ID, and show my username / link to my profile on the commit view.

Drupal.org would cross-reference the e-mail address associated with the commits with its "multiple mail" table, and return the user ID, which VCAPI stores. Then on commit message views, it does theme('username').

And I guess if you try and perform a push and the multiple email looker upper doesn't find your email, it kicks back an error and directs you to your user profile.

No choices. No documentation. No complicated explanations about identity. No weird timeboxing of having to make some kind of major decision. We just deal with it. And a person's username/d.o profile remains their primary means of identity on the site.

mikl’s picture

#23: I think the choice is mainly concerned with what is to be done with all the CVS commits when converted to Git. Should they have user.email set? And if so, to what address?

marvil07’s picture

The diagram is pretty clear :-), thanks!

Yep, "Multiple email lookup thinger thing" is #979040: Make pluggable the process of mapping of raw vcs data to Drupal users.

The only problem I see is what we are supposed to show when author_uid or committer_uid is 0 (not mapped; this can happen if someone is not including mail/whatever-we-map-to). It is actually going to happen on #970244: Create a views handler to map operation author/committer to their drupal user. I mean, if we just show the plain VCS data for author/committer, it would follow the backend data, which means whatever the VCS user has put in their git configuration.

webchick’s picture

Re: #24, I think we just use Drupal.org's users.mail field. I don't quite see why we can't do this, if the following is true:

"It's very VERY important to understand that even if commits do contain real email addresses, at NO point will those emails EVER be visible over any d.o-based HTTP offerings. Spambots would have to clone a git repository and look at raw git data to get emails."

Or, if we want to be especially paranoid in preparation for said future smart spambots, we can simply make all incoming records map to http://drupal.org/user/XXXX as Sam said. Done and done.

I'm still not quite understanding why we need to offer users the choice on how to map their old data. It seems like picking the way we deal with legacy commits is a policy change firmly under our control. VCAPI doesn't care one way or the other, because it has the uid association, which is the only thing that matters in terms of associating "karma".

#25: IMO you stop this at the push level with some validation hooks that check for a condition where it can't resolve someone's user.email property to something in the multiple_mails table. We reject the push and tell them to add the email address XXX@XXX.XXX to their Drupal.org user account under the "Multiple mails" tab.

We probably also then need some UI validation so that we don't allow people to delete an e-mail address associated with their account if it's associated with one or more commit messages in VCAPI.
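
A minimal Python sketch of those two checks, assuming the registered addresses and the addresses used in commits are available as plain sets. The exception class, function names and error wording are hypothetical; this is not a real Git hook or VCAPI API.

```python
class PushRejected(Exception):
    """Raised when a push contains an address that cannot be resolved."""


def validate_push(commit_emails, registered_emails):
    """Reject the push if any commit address is not registered on d.o."""
    unknown = sorted({e for e in commit_emails if e not in registered_emails})
    if unknown:
        raise PushRejected(
            "Add these addresses to your Drupal.org account: "
            + ", ".join(unknown)
        )


def can_delete_email(email, emails_used_in_commits):
    """UI-side guard: an address that appears on commits must be kept."""
    return email not in emails_used_in_commits
```

As written, the push check rejects every address it cannot resolve, including addresses belonging to other people whose commits are being pushed.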

mikl’s picture

#26: Well, I am all for just sticking people's e-mail in there, but I know that there are privacy concerns here.

As for validating e-mails on push, that would be troublesome, since you may want to push commits made by someone else, and not want to associate their e-mail with your d.o account.
Additionally, if the user has not set an e-mail address, user.email falls back to the Unix username plus the machine's hostname, so dude@MacBookPro.local or similar. Probably not something you want to associate with your d.o account either.

I think we should be as flexible as possible in this regard. Git is a new tool to people, and there is plenty of stuff to be confused about, without us adding additional complexity.

webchick’s picture

Ah, that's true about pulling in commits from other folks who may or may not have accounts on d.o. I hadn't thought about that.

What if we treat it like theme('username') does then? Store 0 as the uid in VCAPI, but reference whatever user.name is from Git when displaying:

"23c431a by webchick (not verified)"

I guess we need a cron job to periodically attempt to re-map these unmapped contact records then? Hm.

marvil07’s picture

"Store 0 as the uid in VCAPI,"

That's the current behaviour ;-)

"but reference whatever user.name is from Git"

Sounds like a good idea. I mean, instead of printing the plain {versioncontrol_operations}.committer (or author), pass it to a function to extract the name (this is naturally only for the git backend), so an analog of VersioncontrolBackend::formatRevisionIdentifier() for author/committer should be fine (proved on views already in #976136: Let backends overwrite revision field output on views using backend class).
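
As an illustration of the name extraction, here is a small Python sketch that uses the standard library's email.utils.parseaddr. It assumes the raw author/committer value is stored in Git's usual "Name <email>" form, which may not match what the backend actually stores; the function name and parameters are hypothetical, and the "(not verified)" wording is taken from the example earlier in this thread.

```python
from email.utils import parseaddr


def display_author(short_hash, raw_identity, uid=0, username=None):
    """Format one commit line for display.

    Mapped commits (non-zero uid) show the d.o username. Unmapped commits
    fall back to the name taken from the raw VCS identity.
    """
    if uid and username:
        return f"{short_hash} by {username}"
    name, address = parseaddr(raw_identity)
    shown = name or address or raw_identity
    return f"{short_hash} by {shown} (not verified)"


print(display_author("23c431a", "webchick <webchick@example.com>"))
```

Other backends would need their own extraction, since this parsing only makes sense for Git-style identities.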

mikl’s picture

#28: As for the (not verified) part, that may indicate that other commits with your name on them are verified to be yours. That is not so, unless you were to start GPG-signing your commits (that possibility is actually one of Git's design parameters).

There's nothing stopping me from setting my Git up to use “Angie Byron” and “angie@lolbots.com” for user.name and user.email and then having my Git actions on d.o show up as yours. That's the price of decentralisation, so we should probably not do anything to indicate that these data are completely reliable.

sdboyer’s picture

"What if we treat it like theme('username') does then? Store 0 as the uid in VCAPI, but reference whatever user.name is from Git when displaying:
'23c431a by webchick (not verified)'"

"I guess we need a cron job to periodically attempt to re-map these unmapped contact records then?"

Quite literally exactly my plan :) Really, all of it. The handler, and the cron job.

sdboyer’s picture

@webchick #898816-26: Consider using real names and/or e-mail addresses for the author/committer metadata?

"I'm still not quite understanding why we need to offer users the choice on how to map their old data. It seems like picking the way we deal with legacy commits is a policy change firmly under our control. VCAPI doesn't care one way or the other, because it has the uid association, which is the only thing that matters in terms of associating 'karma'."

Maybe I wasn't clear - people are making the choice about how their old commits get mapped, but this is ALSO a choice they'd be free to make going forward. If someone (like say mparker17 in #898816-2: Consider using real names and/or e-mail addresses for the author/committer metadata?!), doesn't ever want an email address to go floating anywhere - and that is really, REALLY well within their right - then they can choose to have their old CVS commits use the profile link, AND make all their new git commits use the profile link as well. It'll still work.

@webchick #898816-21: Consider using real names and/or e-mail addresses for the author/committer metadata?

"So why offer them a choice to abuse user.email for something it's not intended for, and that would be totally bizarre and d.o-specific?"

Lemme put this in no uncertain terms: I believe that neither I, nor anyone else, have the right to take personal information (an email address) given to me with a particular set of expectations about where it is publicly displayed, and then make the arbitrary choice to start displaying that information in some other public manner. I am bound, at least ethically and possibly legally, to consult them prior to playing with their privacy - and that consultation is what I'm proposing here.

If you accept all that, then it's just a question of what the most intuitive alternative is going to be - and I explained why I prefer the profile link in #898816-16: Consider using real names and/or e-mail addresses for the author/committer metadata?. And since we can't make it just work for legacy commits, the approach gets grandfathered in.

@dww #898816-19: Consider using real names and/or e-mail addresses for the author/committer metadata?

"That said, a 1 week window is *way* too short to expect all CVS account holders to get the email, read it, understand the implications of the choices they have, and make their decision."

You're absolutely right, it was stupid to even put that there. I put it there because we were originally trying to make this happen at the same time as our public unveil, but there's really just no way in hell that'll happen. Really, the better plan would be to basically let people decide right up until (a few days before) launch.

@mikl

"#28: As for the (not verified) part, that may indicate that other commits with your name on them are verified to be yours. That is not so, unless you were to start GPG-signing your commits (that possibility is actually one of Git's design parameters)."

"There's nothing stopping me from setting my Git up to use “Angie Byron” and “angie@lolbots.com” for user.name and user.email and then having my Git actions on d.o show up as yours. That's the price of decentralisation, so we should probably not do anything to indicate that these data are completely reliable."

You're quite correct. Given that the only thing one can really do is give OTHER people credit for their own work, though, my thought on this has always been that it'd be more of a nuisance than a cause for real concern. However, if it does become a problem, we'll be able to consult the push history to figure out who's actually been putting in erroneous authorship information, and deal with the situation accordingly.

sdboyer’s picture

Status: Needs review » Fixed

On Friday's sprint wrap-up call, Angie convinced me/us that we'd be OK using the pseudo-email (e.g., sdboyer@no-reply.drupal.org). Of the points I raised in #898816-16: Consider using real names and/or e-mail addresses for the author/committer metadata?, the only true blocker is the possibility of user name changes, as it causes a potentially nasty data inconsistency issue. We can get around that one of two ways:

  1. Disable username changes on d.o. This is not a bad idea anyway - there's no good use case for changing usernames, but plenty of negative issues that arise from it.
  2. If disabling username changes doesn't fly, then we can still just force-register the fake emails into the multiple email module.
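
Option 2 can be sketched in a few lines of Python. The no-reply.drupal.org domain comes from the example above; the function names and the dict standing in for the multiple email module's table are assumptions, and the sketch does no escaping of unusual characters in usernames.

```python
NO_REPLY_DOMAIN = "no-reply.drupal.org"


def pseudo_email(username):
    """Build the pseudo-address for a username (no escaping is attempted)."""
    return f"{username}@{NO_REPLY_DOMAIN}"


def force_register(uid, username, email_table):
    """Record the pseudo-address against the uid in a stand-in table.

    Addresses registered under an earlier username stay in the table, so
    commits made before a rename still resolve to the same uid.
    """
    address = pseudo_email(username)
    email_table.setdefault(address, uid)
    return address


# Hypothetical rename: the uid keeps both pseudo-addresses.
table = {}
force_register(1001, "sdboyer", table)
force_register(1001, "sam", table)
print(table)
```

Because old pseudo-addresses are never removed, a rename only adds a row, which is how this option sidesteps the data inconsistency raised above.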

So we get to respect git standards, have consistent logs, and respect people's privacy. woot!

Status: Fixed » Closed (fixed)
Issue tags: -git phase 2, -git sprint 5

Automatically closed -- issue fixed for 2 weeks with no activity.