Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
$Id$
tags are going to be defunct after the migration, but the current plan dictates that they'll still be in files wherever they were before, albeit unexpanded. It'd be great if we could have a ready-made script (cross platform, even??) that will go through and remove the tags in all their various forms, all in one fell swoop.
It'd be especially nice to have this for phase 2, as folks are far more likely to use it as part of their initial switchover and acclimation to git than if we make such a script available months down the line.
Comment | File | Size | Author |
---|---|---|---|
#21 | git_attribute_expansion_testing.png | 85.29 KB | rocketeerbkw |
Comments
Comment #1
sdboyer CreditAttribution: sdboyer commentedThis really isn't a big thing, though...
Comment #2
ryanaghdam CreditAttribution: ryanaghdam commentedComment #3
mikey_p CreditAttribution: mikey_p commentedI very much doubt this, I would bet most of the tags are actually expanded.
Comment #4
sdboyer CreditAttribution: sdboyer commentedNo, they'll be unexpanded. That's going to happen as part of the migration path I've engineered. It'll actually be like they never existed at all - cvs2svn will use the kill keywords option when exporting the cvs revisions, so they'll never be present at all in the git history.
Comment #5
rocketeerbkw CreditAttribution: rocketeerbkw commentedShould all CVS keywords be accounted for? I had to google for a list of them and found this http://ximbiot.com/cvs/manual/cvs-1.11.6/cvs_12.html
Issue title specifies $Id$ but description says
Comment #6
webchickSam, is there documentation for how one runs through the migration path locally if someone* wanted to work on this, against some realistic test data?
* Not volunteering! :D
Comment #7
sdboyer CreditAttribution: sdboyer commented@rocketeerbkw: only the $Id$ tag is strictly necessary. That's all that we're using in a standard way with Drupal; if folks are using those other ones in their files, that's their thing. Then again, if you're gonna do $Id$, it probably wouldn't be hard to do the others, too...maybe we provide two versions of the script.
@webchick: unfortunately, there isn't any documentation on doing the full migration path on one's own. I've been meaning to do it up, but haven't worked on the migration path itself in a little while and so have let it slide. There are some scripts on github that DamZ and I worked on which will get you most of the way there, but will unfortunately not do this _specific_ piece correctly because to make cvs2svn use kill keywords, you have to hack it the code. I could probably include that as part of the instructions, though...
Comment #8
chx CreditAttribution: chx commentedAnd this is good why?
Comment #9
xmacinfoIs there a way to keep $Id$ in the files with GIT? Or at least transform them to comments?
Without these tags it will be next to impossible for most of our users to distinguish which file is more recent, or which version to file is, etc.
Also, when posting issues on d.o., we will not be able to say which version of the file we need help with.
This is specially true to users who work with FTP and don't use any version control.
Comment #10
sdboyer CreditAttribution: sdboyer commented@chx - I don't mind you being snarky as long as you're specific about it. Do you not like the idea of stripping the tags? That's fine, you don't have to apply it to all those contrib modules you maintain. Or are you objecting to not having $Id$-type metadata present in the files themselves? That's a different discussion, dovetailing with...
@xmacinfo - $Id$ will be present in the files after the migration occurs, but it will be unexpanded (e.g., it will be just literally, '$Id$', without any revision information). We could do it in a way where the tags are still expanded in the final revision, but (IMO) doing so would be pointless and confusing: pointless, because once we switch to git, that metadata will be forever frozen in whatever state it was in when imported from CVS, and confusing, because as soon as that file is changed but the metadata stays the same, the metadata is actually saying something UNTRUE about the state of that file.
If your users use the CVS revision number to identify the version of the file they're working with, then I'm sorry, but that's just the wrong way to do things. If they need to identify the version of the file but aren't using version control, then they should use the module version information provided by Drupal itself. If your users are regularly working with -dev releases and/or making modifications to files, then they should be using version control. Relying on CVS metadata tags, which will only be there if the author bothered to include the tag, is just a deficient way of doing things.
Comment #11
sdboyer CreditAttribution: sdboyer commented@rocketeerbkw: Oh, just realized I half-answered your question. I said "in all their various forms" because the tag can be in any kind of comment, in any file type. So we have to account for more than just
// $Id$
, but also/* $Id$ */
, as an example.Comment #12
chx CreditAttribution: chx commentedSo in case this was not evident: the $Id$ tag gives you a human readable way to check on the freshness of a file. It's not a machine thing.
Comment #13
xmacinfo@sdboyer: Agreed, in an ideal world everything should be under version control. However, there are real life example where having the revision number displayed inside the file would be important.
For example, the .htaccess file and the settings.php files. These two are often modified in production to override settings for various reasons. These two files do not always change from release to release.
A FTP user uploading files to its shared hosting repository cannot version control the file. For most of Drupal files, a simple file replacement works nicely, but for settings.php and .htaccess (if some settings are overridden), we need a way to know about the changes.
With CVS metadata, that was easy to spot a change in those two files. Without metadata, we will need to diff the files, see if there are changes, and if there are changes, apply the changes to the file.
There are a lot of support issues just for those two files and I am looking for a way to display version information inside these files.
So I guess I worry only about these two files: settings.php and .htaccess.
And I worry about the vast majority of users that download a tarball and do not use version control or cannot use version control.
How complicated would it to display revision number inside settings.php and .htaccess?
Comment #14
sdboyer CreditAttribution: sdboyer commentedSo, just had a long conversation about this in IRC, wherein the points xmacinfo raises were covered, in addition to others. Mea culpa, we do need to retain some kind of version string. I think unexpanding the keywords for the git history is still a good idea (not doing so will make git merges with CVS history in them awkward), but we do still need to retain some version numbering in the files. Git has a placeholder expansion system that we can use to do more than what was possible with CVS - see #720598-13: Consider using git-archive expansion for .info files for the list. The big drawback is that you can't immediately look SHAs and know which came first, as you could with CVS numbering. So the best we can do for that is dates.
In any case, let's repurpose this thread to a) decide on what we want to replace $Id$ with, then b) create a script, or at least figure out the regex, for doing so.
Note that a script may not end up being feasible, as we may need to rewrite ALL of the history with this new formatting system, which will mean applying the regex to every revision during the migration process itself. The reason why that might be necessary has flitted out of my mind, but I'm putting it out there anyway.
Comment #15
xmacinfoAny example of what it would look like?
Also, does this replacement need to be visible on all files or only on a subset of files?
As for specific information inside the *new* tag, if the 'date' is available and printed there, I am sure this will be enough for the use cases I have in mind.
Comment #16
rocketeerbkw CreditAttribution: rocketeerbkw commented@sdboyer: thanks for the clarification, it seems CVS will also expand
$Id: $
A)
From some CVS docs:
Shouldn't we replace $Id$ with something that will show the same info? Perhaps even on one line as well? (although the full sha1 is long for that)
B)
So for replacing unexpanded the following regex should work
\$Id[^$]*\$
Does that mean we need to replace expanded version of $Id$ as well?
Comment #17
rocketeerbkw CreditAttribution: rocketeerbkw commentedI should note that
\$Id[^$]*\$
is PCRE in case the script that's eventually written doesn't support PCREComment #18
pwolanin CreditAttribution: pwolanin commentedI thought the conclusion in IRC was to leave the expanded $Id$ tags in the imported files, and replace them in the tip of each branch so they can be git archive expaned when we make releases?
Comment #19
sdboyer CreditAttribution: sdboyer commented@xmacinfo: chx had some ideas about that. I think he was happy with just
%ai
ultimately. But I'd like to see people make some proposals here, discuss pros/cons :)@rocketeerbkw: yes, that's the conclusion other people helped me to come to, and what this discussion should now be about - what to replace $Id$ tags with that'll give similarly useful data once we're into git.
@pwolanin: I wasn't clear on leaving them expanded in the historical files, but it does seem like that's the preferred direction, so sure. (That also has the side benefit of making it relatively easy to set up the migration steps so that other people can test it, as they'll no longer need to hack cvs2svn). And then, yes, the script should replace the expanded tags in the tip of each branch. That does mean we'll have historical CVS metadata in there which doesn't actually refer to anything real or check-out-able, but that's probably OK.
Also, it's worth noting that I'm not quite sure how these placeholder expansions work - if they just expand the date of the last commit regardless of whether a file was modified in that commit, they'll be much more limited in their usefulness. I hope that's not what they do, but we ARE moving away from a non-atomic system. I'll experiment with that when I get a chance.
Comment #20
xmacinfoLet's hope the placeholder will expand correctly. I was not aware that GIT offered these.
And as CHX, I shall be happy with
%ai
. I don't think any other information is relevant for the end-users.Comment #21
rocketeerbkw CreditAttribution: rocketeerbkw commentedIt appears they are expanded in all files based on the most recent commit. I've attached a screenshot that demonstrates this.
I used
git archive -o test.zip HEAD
to expand the text in file1/2/3.txtComment #22
xmacinfoThe more I think about this, the more I think we should inject one way or the other a revision info only in two files:
Can a script inject some type of revision info (the date being sufficient, here) inside those two files?
Comment #23
pwolanin CreditAttribution: pwolanin commented@xmacinfo - we are not just talking about Drupal core here, but rather every contrib module as well.
Comment #24
xmacinfo@pwolanin - Granted. My main concerns are for core. ;-)
Comment #25
Barry_Fisher CreditAttribution: Barry_Fisher commentedShould it/ could it be the case that there is the option of expansion rules presented when downloading a release?
I guess this would come under the new Project module integration? It would seem that by having these tags remaining is a convenience for people not using version control on production servers. Perhaps I'm missing some other uses here but I think how the Id tags will remain to be useful will depend on the end user/developer. For a crude way of comparing latest files- I would just look at the modification date- but that's just me. Anything more complex then a standard diff is easy enough- especially with IDEs and GUI tools as good as they are.
The situation with most shared hosts is that version control can't be used and so expansion tags could be useful for some folks. For dedicated machines less so.
My point here is that it's horses for courses and the user should have the ability to decide whether expansion tags are included in downloads.
Any thoughts on how a choice could be integrated?
An afterthought.... should Id tags be left out altogether from source files as per discussed before and then have a date/version injected into the download if the user wants it?
Comment #26
fgmI discussed this issue (like probably most people coming from SCCS/RCS/CVS do) a few months ago with several people on #git and the best (IMHO) suggestion, once past their initial outrage ("one of /these/ again") that came out of it was to
- do not touch the individual files in a normal checkout/pull, à la CVS
- generate the version info in one file in the packaged releases
- yes, even -dev releases, and maybe especially them, since these are the only ones which do not carry a meaningful version information otherwise,
this provides a easy, human-readable way, of identifying which version of a package is actually being used, and does not mess with the people using live checkouts and suggesting patches. These can use git themselves to examine version information anyway.
Comment #27
chx CreditAttribution: chx commentedSo here is the scenario: you have a client, unknown origins. She has some Drupal. You need to determine and quickly whether module X is up to date. Right now, regardless of what way that file landed there it will have a definite version number in it and anyone capable of operating an FTP client can do it. Even over the phone. Easily. This is not something we want to lose. Therefore adding keywords on packaging is unacceptable because what if the original developer have not used the package but git? And then uploaded via FTP and now there is no git to check with? Next, what if a shop does not use git, their checkout won't have drupal.org versions? This sounds bad.
Comment #28
fgm@chx: this is exactly how I presented the problem at that time, and the solution satisfied me, considering anything on a customer site would be either
- a production release (hence packaged and even with a clean drupal version)
- a dev version (hence packaged, with drupal date info)
- a VCS (whichever) checkout done on-site (hence containing VCS version info)
However, I had not envisioned a developer pushing to a customer something that is not even a -dev version (hence packaged), AND doing so not via a VCS checkout, but a FTP from his own checkout without the VCS info. That sounds real bad practice, but after having audited some customer sites, I can now believe anything - including this - can happen.
So how about a checkout hook generating a version file, and that file being removed by a pre-commit hook: that way the files stay pristine for git comparisons, and there is still a human-readable version.
Comment #29
pwolanin CreditAttribution: pwolanin commentedA thought - in that case maybe the .info file in git shoudl be a template (.info.tpl or some such), and the checkout could create the functional .info file?
The problem here is that I'm not sure if these sorts of checkout hooks come along when you do a git clone.
Comment #30
sdboyer CreditAttribution: sdboyer commented@fgm: yeah, that's the basic pattern of argument I've made in the past - that the metadata is only useful if you're working with a packaged release, not something that's been directly cloned out. In the latter case, you should just use git to find out the state of items. And yes, given what rocketeerbkw demonstrated, it's quite pointless to have keyword expansion in all files if they're just going to reflect the latest snapshot, not data pertinent to that specific file.
There's no question that we'll need to have at least a single file with the latest version in it. The real question is whether or not that's going to be enough to satisfy these other, legitimate use cases. If it isn't, I can only think of two choices: we patch git to create the sort of placeholders we need, or we write something more elaborate into the packaging process that basically does its own keyword expansion. The latter might simply be prohibitively expensive in terms of sheer number of git commands it runs: for every branch being packaged, it would need to run something like
git rev-list --max-count=1 --topo-order <branch name> -- <file name>
. Git's fast, but unless we could figure out a way to batch that operation, I don't know if it's feasible.Comment #31
webchickPersonally, the "person who was savvy enough to build their site with Git checkouts on localhost but only has FTP access to the server" use case is not compelling enough for me to possibly delay this entire process for weeks trying to figure out how to hack in support for keyword expansion into Git. If they were savvy enough to checkout the stuff from Git in the first place, they're savvy enough to re-download the stuff from the sever and use Git to check the version numbers from their local computer. I guess my only concern is whether we can have the equivalent of a "Git deploy" module without this. If so, I don't see this as something worth expending a whole lot of effort on it, when there is a lot of other things to do to get the migration done.
I am, however, curious what the Git equivalent of 'svn info' is. I couldn't seem to find one that was not a shell script on someone's blog. But as long as there's a way for someone with a Git checkout to run
git foo command
and get the commit number the files correspond to, I'm fine with replacing $Id$ with nothing, personally.Comment #32
fgm@webchick: this script provides information similar to svn info, although not in a format as easily parseable. Could be a start.
Comment #33
hunmonk CreditAttribution: hunmonk commentedsdboyer and i were discussing this issue more today. i agree w/ webchick that we should definitely not over-complicate this issue. given that, i'd like to propse the following plan:
this solves the issue in a simple, straightforward manner for anybody that a) downloads packages, or b) checks out via git, which should be the vast majority of use cases.
Comment #34
sdboyer CreditAttribution: sdboyer commentedAddendums to hunmonk's items:
a. Clobbering (that is, unexpanding) $Id$ everywhere means we'll have no problems with stale tags. Think about it - after we migrate, if we leave $Id$ tags expanded then they'll remain unchanged in files, telling people something very WRONG about the file version. And no, it's not as easy to just clobber tags at branch tips - that would require a separate script, very much like the one this thread was originally, and should maybe again be, about.
b. Including last-modified SHA1/date for each file in a package is going to be bound by the issues I described in the latter half of #819874-30: Decide on a replacement for $Id$ tags in files. It might turn out to be fine, but I'm gonna be worried about it until we actually benchmark it.
Comment #35
dwwSince webchick raises it in #31, let me just say as the maintainer of the cvs_deploy module that it doesn't care about $Id$ tags at all. It's just directly inspecting the CVS metadata about a directory (the stuff you could find out with "cvs status") and using some hooks to tell update status/manager about that. So, FWIW, this thread has no bearing on the feasibility of a git_deploy module whatsoever.
Cheers,
-Derek
Comment #36
kbahey CreditAttribution: kbahey commented@hunmonk
I basically like the manifest file idea.
What I am not clear on is under such a scenario, how do we address the following use cases?
1. Someone deploys a site via CVS, but then edit them locally. I know this is bad, but it does happen. Right now with CVS and $Id$, I have a starting point on what the original files were, and can diff them against it to see if they were hacked.
2. Someone deploys a site via a Git checkout, then edits them locally. I don't have $Id$ anymore, so I don't know which branch they deployed from, so I can diff/merge against it.
Comment #37
dwwOh, and while I'm spreading useful info, please see #606592-4: Allow updating core with the update manager about how a manifest file could be useful for the update manager.
Comment #38
apadernoThose cases seem similar to the case reported from webchick in comment #31, for which she says we should not be interested in edge cases.
Comment #39
hunmonk CreditAttribution: hunmonk commented@kbahey: wouldn't it be best practice in this case to deploy a checkout from a git clone, and include the repo with the uploaded files? i think that would be possible, and thus allow for querying the repo if necessary, even if you have to re-download it from a shared server to do so. i guess that equals more bandwidth/disk space used, but worth the trade off i would think.
Comment #40
webchickWell, also, unless I'm reading #33b wrong, said evil hacker of core would have their MANIFEST.txt file to reference to know where (branch/tag/whatever), who (committer) and when (timestamp) the files originally came from, no? Isn't that just as good as $Id$? Or possibly better, depending on what other goodies we put in MANIFEST.txt?
Comment #41
kbahey CreditAttribution: kbahey commented@hunmonk
I was not advocating the above use cases. I was stating what do you do if you are faced with such a case.
To summarize, the use case is more like: "you took over a site from someone else who went against best practices" what is the course of action here?
If we use %awhatever in the file itself in a comment (i.e. replacing $Id$ with %aX), it would have the same effect as $Id$.
Granted, this is an edge case, and should not derail the whole process.
Comment #42
marvil07 CreditAttribution: marvil07 commented@webchick: about "git info": http://marvil07.soup.io/post/65026709/git-info (kind of OT, that's why it's outside)
Comment #43
sdboyer CreditAttribution: sdboyer commentedUnassigning.
Comment #44
webchickTagging. This is a task that might be good for a volunteer to tackle once we get a decision here.
Comment #45
webchickOops. :P
Comment #46
sirkitree CreditAttribution: sirkitree commentedI'd certainly prefer not to have info appended to my files. A separate manifest sounds like an awesome thing to have. We're already moving that way in our projects as it is with .make files to register what all (core, contrib, custom) is in the project so this approach totally makes sense to me for individual modules and such.
Also would love to know where some documentation is on this whole process to that I could actually try it out and work on this issue once a decision is made.
Comment #47
fgmWhy could this "manifest" file not be the .info file ? It would mesh nicely with other uses of the info file to hold version information and source URL, it seems.
Comment #48
xmacinfoThe .info file is parsed by Drupal for dependencies, version number, description, etc.
The MANIFEST.TXT file would not be parsed by Drupal and the content of this file would serve as information only.
Comment #49
sdboyer CreditAttribution: sdboyer commented@fgm - First, info files are already a morbidly obese hodgepodge. Also, info files need a fully bootstrapped drupal context to make sense - a good manifest needs nothing more than
md5
,cat
andfind
to be verified. Maybe most important, if we're creating a list of hashes of every file in a package, then that list can't be put IN one of the files it needs to hash.Comment #50
fgm@xmacinfo: yes, that was precisely the point
@sdboyer: about obese format: true; especially seeing the format extensions Features puts there. bootstrapped drupal not needed, though. But the third argument kills the suggestion indeed, you're right.
This leaves us in a somehow annoying situation, though: we will then have TWO metadata files in packages, the new manifest file, and the info file, which will surely be derided by critics. It reminds me of the ephemeral .schema files introduced then shelved during the D6 development process.
Comment #51
sdboyer CreditAttribution: sdboyer commentedTaking a note from gentoo, which has just about the most crazy-powerful packaging system in the universe:
That's 1) a metadata xml file for displaying info when searching in portage, 2) a Manifest file containing hashes, 3) a Changelog corresponding to changes in portage not (just) new apache releases, and 3) individual ebuild files that contain build instructions. Critics can bite me. :)
Comment #52
webchickFixing tag.
Comment #54
xmacinfoAlthough the manifext example on #51 is impressive. But for end usersm it's missing key information:
The date of last change. ;-)
Comment #55
sdboyer CreditAttribution: sdboyer commentedShould be phase 2.
Comment #56
sdboyer CreditAttribution: sdboyer commented@xmacinfo - The manifest in #51 is from gentoo, is VERY much not human-facing in their system, and is just an example. Not necessarily what we're going to go with. Including a datestamp in that big line of data wouldn't be difficult. Humans can use the datestamp, and machines can use the hashes. Everyone's happy.
Also, just a quick note because I just noticed #41 - @kbahey, the whole approach of using placeholder expansion is moot because it expands based on the current commit, whether or not the file was modified in that commit. So all placeholders look identical.
I'm gonna write up a definitive recommendation on a path forward for this soon.
Comment #57
webchickComment #58
chrisstrahl CreditAttribution: chrisstrahl commentedsdboyer to publish an update on this and resolve the issue.
Comment #59
sdboyer CreditAttribution: sdboyer commentedMan, I hate it when "very soon" turns into more than a month. Well, better late than never - here's how we should resolve this one. Please reopen the issue if there are glaring problems with what I've laid out.
There are a few pieces to this, so I'll break it out item by item.
$Id$
tags; git.drupalcode.org already does this, and has been doing it for a while. This'll prevent merge conflicts, will ensure stale information doesn't make it into the new, shiny git repos and end up confusing people.$Id$
in all projects. I've opened a separate issue for that (#914280: Create a script to remove all $Id$ tags from projects), so that this issue can be just about the decision.$Id$
with anything else. Drupal files will be unadorned by in-file meta-packaging info.Comment #61
chx CreditAttribution: chx commentedNote that although noone asked for me, a manifest file does work for me. It works with the "hapless client over the phone case". The manifest is like any other file in a module package so it gets updated the same time no matter the deployment. Good enough for me.
Comment #62
sdboyer CreditAttribution: sdboyer commentedGlad to hear it.