If you download a Drupal module via "drush dl" (or a web download), you will see something slightly different from what you get with git clone:
- The "$Id$" keyword is expanded to "$Id ..... blabla ..$".
- The *.info file gets a few extra lines.
- The project dir gets a LICENSE.txt.

This is a bit odd if you want to use git to compare your working (hacked) version of a module with the original. A lot of noise shows up, making it difficult to identify the real differences.
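
To make the noise concrete, here is a minimal Python sketch that undoes two of the packaging changes on a file's text before diffing: it collapses an expanded "$Id: ... $" back to "$Id$" and cuts the extra lines off a *.info file. The marker pattern for the packaging comment and the sample strings are assumptions for illustration; check them against an actual tarball before relying on this.

```python
import re

# Assumption: the packaging script announces its additions in *.info files
# with a comment line shaped like this. Adjust the pattern to your tarballs.
PACKAGING_MARKER = re.compile(r"^; Information added by .* packaging script")

# Matches an expanded keyword such as "$Id: example.module,v 1.2 ... Exp $".
EXPANDED_ID = re.compile(r"\$Id:[^$\n]*\$")


def collapse_id_keywords(text: str) -> str:
    """Turn every expanded "$Id: ... $" back into the bare "$Id$" form."""
    return EXPANDED_ID.sub("$Id$", text)


def strip_packaging_lines(info_text: str) -> str:
    """Drop the marker line and everything after it from a *.info file."""
    kept = []
    for line in info_text.splitlines():
        if PACKAGING_MARKER.match(line):
            break
        kept.append(line)
    # Remove blank lines that were left just before the marker.
    while kept and not kept[-1].strip():
        kept.pop()
    return "\n".join(kept) + "\n" if kept else ""
```

Running both functions over the downloaded copy leaves only LICENSE.txt as an extra file, so a diff against the git checkout shows mostly the real local changes.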

What can we do about it?
- for each project, create a separate public repository with the processed ($Id$ expanded etc) module code, read-only to anyone except the packaging script. (this is why it has to be a separate repo)
- the original (unprocessed) project repository is added as a remote
- for every release branch and tag of the remote (unprocessed), we add a processed child tag or branch on this separate repository.

Now, if you want to git diff or merge, you can simply clone the processed repository.

Or does something like this already exist on a 3rd party site?

Comments

sdboyer’s picture

This discussion started playing out on g.d.o. Capturing the relevant part of the thread:

sdboyer:

Ah, I actually misunderstood this a bit at first because I didn't read the link. So:

- As I said, $Id$ tags should be gone. They no longer have any place in Git repositories...though I'm a bit ashamed to admit that, after spending an inordinately long time getting all the logic in place to make that happen, it turns out there's the ident gitattribute that would allow us to continue simulating the feature. Oh well. Anyway, point is that the tags should pretty much be gone from repositories.
- LICENSE.txt won't make it into repositories anytime soon. It's been suggested to me by legal-y people that this is a potential issue, though that conversation's gone dead in the water, and I'm operating under the assumption that it's a small enough problem that we can deal with it.
- The information in the .info file is for the update manager. That's rendered totally unnecessary by git_deploy.
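
For readers who have not met the ident attribute mentioned above: when a path has `ident` set in .gitattributes (for example a line like `*.module ident`), git replaces "$Id$" on checkout with "$Id: ", the 40-character blob object name, and " $". That is a content hash, not the CVS-style filename/revision/date string. The Python sketch below only simulates that behaviour to show what the expanded text looks like; the function names are made up for this example and it does not call git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object name git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def simulate_ident_checkout(stored: bytes) -> bytes:
    """Expand "$Id$" roughly the way the ident attribute does on checkout.

    The hash is taken over the stored content, where the keyword is still
    in its collapsed "$Id$" form.
    """
    blob_id = git_blob_id(stored).encode("ascii")
    return stored.replace(b"$Id$", b"$Id: " + blob_id + b" $")
```

Because the expanded value depends only on the file content, it would still differ from the "$Id ..." strings in old tarballs, so it would not remove the diff noise for previous releases.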

Updating existing projects to use dog is not something I'm looking to deal with right away. It's important, obviously, but the basic functionality needs to be working for new sites before we can think about scooting existing ones into this format.

And to be clear, it's not just "especially" relevant for existing projects. It's ONLY relevant for existing projects. If you're using dog, you should never ever ever EVER have a single tarball from d.o in your system. All git repos, all the time, period. Hybridizing makes things unnecessarily complicated.

donquixote:

"never ever ever EVER have a single tarball"

The idea was, if the processed stuff (that is, LICENSE.txt included) was provided as a (read-only) repo, then we could start from there.
For the git workflow it would be nice to have a LICENSE.txt added, but we don't need expanded $Id$ (good to know it's gone), and we can discuss if we want the stuff in *.info or rather not.

I personally like the *.info stuff, because it's an easy way to know the version of a module. And probably the "available updates check" also uses this information. Yes, we don't drush up anymore, but we might still want to have the warning messages about available security updates.

So, I think it is reasonable to ask in that linked issue, if d.o. could provide a repo with the processed module releases. And once we have that, I imagine we all want to switch to that one, if only for the LICENSE.txt.

We could even think about an intermediate repo that only has the LICENSE.txt, but nothing else.
And if d.o. does not want to, someone could set up the same thing on GitHub or somewhere else.

sdboyer:

Individual projects can add a LICENSE.txt. Truth is that core, at the very least, should probably put one in. That's how it gets in the git repo.

So when you say ~"a read-only repo of the processed stuff," there are a few possibilities to what that could mean:

- A fresh new git repo containing a generated tarball that's been checked in, with one commit per tarball.
- A repo with the additional files/changes made in a new commit on the tip of every branch.

The first proposal has already been out there for a long time: http://drupal.org/node/806484 . I don't like release repositories, because IMO they solve a problem that doesn't really exist - and in the process obliterate everything useful about git history. If you want to use tarballs, then USE TARBALLS. Don't just wrap their data in a git repository because "hey, we use Git now!" If you want to do that locally, fine - but I don't see a reason to invest infra time and resources in doing it. Beyond that, I see it as an inferior method for sitebuilding, so I'd actually rather we not support it at all, as that'll give the impression that it's a good idea.

The second method is simply not feasible, period. We'd have to have background workers do nothing but continually rebase a commit on top of tens of thousands of repositories - ALL of which are copies of the real repos, and need their own repo location strategy, management when things go wrong, etc. And all of that so that people have a repo they can clone which does an upstream rebase on every single push. So every single merge from upstream will be painful and nasty.

As for the information in the .info file and managing upstream updates (security or otherwise), I'll say it again: git_deploy takes care of that.

The real question is - what problem are you trying to solve with this?

And..."I like doing it this way" isn't a reason. Dog is about codifying some best practices into real, assumable rules - not about accommodating every possible way to put together a Drupal site. That's what we've already had for ten years.

donquixote:

"The second method is simply not feasible, period."

All of this can be automated.
Maybe it will be resource-expensive - in this case we should probably discard the idea. But maybe it is not.
The release repo would have the original repo as a remote, and it would have its own branches for all published releases.

For every new release, it would check out that version from the original repo, add the LICENSE.txt and *.info stuff, and commit the result into the release branch. That's the minimal thing, which does not require any merge or rebase or whatever.
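
A rough Python sketch of that minimal per-release step, to show how little is involved. Everything named here is an assumption for illustration: the remote name "upstream", the branch name "processed", the "processed-<tag>" tag scheme, and the contents passed in as license_text and info_lines. The git commands are passed through a small runner function so the logic can be tried with a fake runner before pointing it at a real repository.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def git_runner(repo: Path) -> Runner:
    """Return a function that runs one git command inside `repo`."""
    def run(args: Sequence[str]) -> str:
        result = subprocess.run(["git", *args], cwd=repo, check=True,
                                capture_output=True, text=True)
        return result.stdout.strip()
    return run


def commit_processed_release(run: Runner, repo: Path, tag: str,
                             license_text: str, info_lines: str,
                             branch: str = "processed") -> None:
    """Record upstream `tag` plus packaging additions as one commit on `branch`."""
    run(["fetch", "upstream", "--tags"])
    run(["checkout", branch])
    # Make index and working tree match the upstream tag; HEAD stays on `branch`.
    run(["read-tree", "--reset", "-u", tag])
    # Add the files and lines that the packaging step would add.
    (repo / "LICENSE.txt").write_text(license_text)
    for info_file in sorted(repo.rglob("*.info")):
        with info_file.open("a") as handle:
            handle.write(info_lines)
    run(["add", "--all"])
    run(["commit", "-m", f"Processed release {tag}"])
    run(["tag", f"processed-{tag}"])
```

With a real clone, `commit_processed_release(git_runner(repo), repo, "6.x-1.1", license_text, info_lines)` would be the call, where the tag name is only an example. No merge or rebase happens: each processed commit simply replaces the tree with the next release.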

The benefit is small, but so is the cost - or if it is not, we just say goodbye to this idea.

If we want to be a bit smarter, then we need to somehow make both origin-1.1 and release-1.0 parents of release-1.1. Not sure how exactly we would do that, probably involves merge and/or rebase. But still, it would be automatic.
And, in case of merge conflicts, we can always take the version from origin, then add the usual stuff (LICENSE + info), and declare this to be our merge result.
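
One possible way to get both parents without running a real merge is git's plumbing: stage the desired result (origin's files plus LICENSE and info additions), write it as a tree, and create a commit that simply names both parents. Since the tree is declared rather than merged, there are no conflicts to resolve. The Python sketch below assumes the index already holds that result; the function name, ref names and runner function are hypothetical, and this is untested against a real drupal.org project.

```python
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def commit_with_two_parents(run: Runner, upstream_tag: str,
                            previous_release: str, branch: str) -> str:
    """Commit the current index as a child of both the previous processed
    release and the upstream tag, then point `branch` at the new commit."""
    # Snapshot whatever is staged (upstream files plus packaging additions).
    tree = run(["write-tree"])
    commit = run([
        "commit-tree", tree,
        "-p", previous_release,  # first parent: the processed history
        "-p", upstream_tag,      # second parent: the unprocessed release
        "-m", f"Processed release {upstream_tag}",
    ])
    # Move the branch to the new commit.
    run(["update-ref", f"refs/heads/{branch}", commit])
    return commit
```

The `run` argument is any function that executes one git command in the repository and returns its trimmed output, so the two object names flow from `write-tree` into `commit-tree` and then into `update-ref`.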

I need to read a bit more about git, but from what I know so far, this should work.
Expensive or not, only a test can tell.

donquixote’s picture

Thanks!

sdboyer’s picture

You are seriously trying my patience. I am enormously in favor of having conversations about improving drupal.org and our Git infra, but when you simply blow past the crucial issues I've raised and keep on throwing crap out there without addressing my key questions, it goes from "potentially useful conversation" to "colossal waste of time."

Please read what I have written. .info file modifications in Git are pointless and stupid. It is a solved problem - git_deploy. That is a *FAR* better solution than introducing a whole new network of repositories. I've said it twice now, and you blow right past it. Putting those modifications in Git is like giving someone a rusty compass when they're already carrying a portable GPS.

"I need to read a bit more about git, but from what I know so far, this should work.
Expensive or not, only a test can tell."

Wrong. I can tell you: a test will indicate this is trivial in system resource terms. The problem is the total-system complexity this would introduce. Without some significant, defined benefit, there is *no* way this is worth it.

"For every new release, it would check out that version from the original repo, add the LICENSE.txt and *.info stuff, and commit the result into the release branch. That's the minimal thing, which does not require any merge or rebase or whatever."

As for doing this just off of tags, rather than on every branch tip, yes, that's less insane. But it's still pretty pointless. Which brings us back to my question - another you blew right past - What problem are you trying to solve?

I'm sorry, but until you (or someone else) can answer the question of what problem this solves, providing specific use cases/legal arguments (re: LICENSE.txt), this proposal is dead in the water.

sdboyer’s picture

Status: Active » Closed (won't fix)

updating the status to reflect "dead in the water"

Damien Tournoud’s picture

donquixote’s picture

@Damien Tournoud,
More or less the same idea. No mention of LICENSE.txt, though. How important do you consider the LICENSE.txt?

@sdboyer,
The benefit within your "dog" context would be the added LICENSE.txt.
Is this relevant? I dunno. You assume yourself this is a "small enough problem that we can deal with it". Others might disagree. Personally I don't know.
But if someone would give you the LICENSE.txt for free, then I imagine you would probably take it.

Outside of the "dog" context, these release repos would allow people to download individual modules with git, and avoid some of the git noise this would bring if the LICENSE.txt was missing.
Just consider some people might play with git occasionally, until they feel confident enough to switch the entire project's workflow.
Again this benefit (of less noise in changesets) is quite small, but if you get it "for free", then why not.

So, we have these "small, maybe irrelevant" benefits vs a "simply not feasible, period" implementation / maintenance / whatever cost.
We found that the "simply not feasible, period" is wrong. The "for free" is also wrong, but if someone (Damien) already made it, then for anyone else it can be regarded as "for free".

donquixote’s picture

Btw..
the $Id$ is no longer expanded in git, but it is expanded in all tarballs that were generated for previous releases. And it is expanded in installed modules on existing sites.

My own use case was exactly this:
I wanted to use git to compare an older version of a module to the respective version in git, to find if and where it is locally modified.
This proved to be quite a pain, because the git version did not have the $Id$ expanded, etc.

I know there are other ways to do this kind of comparison:
- the "hacked" module
- drush dl the old version to some folder and then compare
- drush dl the old version, and then create my own git repo

But at this time I really wanted to find out what is possible with git.