Closed (won't fix)
Project:
Drupal.org customizations
Version:
6.x-3.x-dev
Component:
Code
Priority:
Normal
Category:
Task
Assigned:
Issue tags:
Reporter:
Created:
17 Sep 2010 at 04:27 UTC
Updated:
17 Apr 2015 at 17:38 UTC
Jump to comment: Most recent, Most recent file
Comments
Comment #1
dmitrig01 commentedI know this is only marginally related but this has the potential to greatly help drush make in being more intelligent about what's a project, where to put projects, how to re-make a file, etc.
Comment #2
dwwThis might also help the Update manager. See #986616: Update Manager fails when the primary module/theme for a project lives in a subdirectory However, I think we're screwed there in the (cvs|git)_deploy cases... *shrug* No, I don't remember an older issue about this. The concept has come up numerous times, but I think this is the first time anyone actually opened a separate issue to discuss it.
Comment #3
chx commentedI do not know what phase 3 means here but this is fairly mandatory to have the moment git is deployed. I thought we agreed on this when $Id$ got removed.
Comment #4
sdboyer commented@chx - yikes, thanks for responding on this issue and resurfacing, this had slipped off my radar since I initially tagged it phase 3 and I haven't been watching that queue (I'm rechecking it now). Phase 2 is everything leading up to initial launch, phase 3 is everything after, and I agree that the upshot of that discussion was that we'd need to launch with this. So I'm retagging it, and we'll talk about it at tomorrow sprint/sooner.
I suspect we're not going to be able to get this done for the initial public testing launch since that's only a week away, but shortly thereafter. So I've tagged this for sprint 9, our next sprint.
Comment #5
mikey_p commentedI skimmed the other issue, is there a spec that we're trying to match with this?
Comment #6
sdboyer commentedNo, we kicked around some ideas in that other thread, but never settled on anything definitive. Step 1 here is defining such a spec. In #819874-51: Decide on a replacement for $Id$ tags in files, I copied in a manifest used by gentoo's portage as an example. That's one place we could start from.
Comment #7
dww(I know it's terribly confusing, but project_package is actually for something completely different than releases and packaging releases). ;)
Defining the spec should definitely continue to happen. But no one should actually start writing a patch for this until the dust settles on #781246: Write Git-specific packaging plugin for d.o. So, I'm almost tempted to mark this postponed. But yeah, we need a spec, so it should remain active.
Comment #8
eliza411 commentedRetagging to git sprint 10. this isn't going to happen this sprint.
Comment #9
sdboyer commentedNow that we've gotten #781246: Write Git-specific packaging plugin for d.o together, this is ready to be worked on. And honestly, shouldn't be too tough. Let's say the spec for this is:
(last modified commit SHA1) (file md5) (relative filepath)
Putting the relative filepath last is important, I think, since it can contain spaces; we're guaranteed the others won't. Anybody have a problem with this proposal? Have I missed something?
Comment #10
dwwShould there be a header with meta-data about the contents overall? URL to the Git repo it came from? Project URL? Release URL? Creation date? Anything else that would also show up in the release-history XML for that project or release?
Also, for each entry, do we want to include the date of the last commit SHA1?
Comment #11
sdboyer commentedRe: header - yeah, I'd be good with that. Commented lines (leading hash character) with anon clone URL. Project URL seems fine, too. I dunno if creation date matters that much, but I don't know the update systems.
Human-readable timestamps at the end of each line indicating the last modification time for each file would also be important. If something programmatic actually needs the data, it can always convert it to a more usable format, but a big part of the goal here is making the timestamps usable for people, per all the discussion in the original issue.
Comment #12
dwwGiven that this MANIFEST is going to include a Git hash, and given that we're talking about a new file created during packaging and included in the various archives, this needs to live in the d.o-specific packaging ctools plugin, not in project* itself.
Another thing that would be really helpful is if the header included *just* the project shortname. I know that's technically going to be in some of the URLs, but it'd be better not to have to parse that out of URLs and just have it directly. Probably wouldn't hurt to include the version string while we're at it.
Comment #13
dwwHere's a patch and an example of the resulting MANIFEST.txt file for project-6.x-1.x-dev.
Comment #14
dwwAnd now with a human-readable key as the last header entry to explain WTF the payload of the file is.
Comment #15
dwwNow with the Git SHA1 that the release itself is being generated from in the header...
Comment #16
dwwCommitted #15 to DRUPAL-6--3 of drupalorg. We might want to tweak this later to deal with a few of the TODO comments I left in the code, but this is good enough for now.
Comment #17
chx commentedHrm, I am not too happy with the line breaks. Can we remove them?Edit: my browser fooled me, sorry! Can we remove the timezone +0000 if it's +0000 always?Comment #18
sdboyer commentedYeah, dww mentioned wanting to remove those, and it's consistent with the ISO standard, so I'm fine with doing that.
Comment #19
sdboyer commentedWhoops, xpost.
Comment #20
sdboyer commentedSorry for not posting this earlier, but...
On further reflection, I'd like to roll this patch back. When I tweeted about this shortly after the patch went in, Crell responded by pointing out that we've invented yet another Drupal-specific pseudo-standard here. I was (quite) annoyed initially, but couldn't really contest his basic point, so I asked him to point me to an alternative. He pointed at PECL initially, and suggested that I talk to mbutcher who'd done a fair bit with it. Turns out PECL's XML-based packaging metadata really doesn't help us at all, but Phar just might. In particular, its manifest format should be of interesting to us: http://us2.php.net/manual/en/phar.fileformat.phar.php and http://us2.php.net/manual/en/phar.fileformat.manifestfile.php .
Without much experience actually using Phar, looking through the set of options it provides, it's very tempting. It can generate tar or zip, use bzip/gzip/zlib compression, and maybe most importantly can generate phars, which is itself a potentially very useful format for distributing components (not the least for the case where you're trying to keep clueless clients from modifying files). We have no use for phars right now, but if we ever want to use them, we'll need to start using Phar. And I don't think that's an unrealistic possibility. Also, it's got built-in support for assembling a package from an Iterator instead of just a literal filesystem location, which means it'd probably be possible for a suffiicently enterprising mind to write an entirely stream-based package assembly system that means packages in their finalized form never, ever have to hit disk. And, y'know, we avoid yet more NIH.
There are other benefits, too, but those are the big ones. As for disadvantages, I see three:
- Using Phar means PHP 5.3, which our infra isn't on quite yet.
- The Phar manifest uses totally not-human-readable UNIX timestamp instead of an ISO date.
- The code itself gets a bit more complicated, especially if we go the in-memory-assembly route. I'm loathe to even really mention this as a disadvantage, though. Optimizing lower-level operations can mean avoiding lowest-common-denominator approaches, and we shouldn't hesitate to reap those benefits for a second.
The MANIFEST is an important and really useful thing, but as soon as we settle on a standard, we're pretty much stuck with it. Which is really why I'd like to pull this back out and try to be a little more deliberate about this decision than just dww and I bandying some code back and forth for 90 minutes. IMO, this really isn't a launch blocker, as we can always go back and retroactively generate packages. It's much more important that we make a good, forward-thinking decision about this.
Whatever we do, though there is an immediate problem with the patch that was committed. I can't remember if switching to the manifest meant we abandoned the decorating of .info files, but this logic:
now throws errors because $info_files never gets defined. It would be great if this could get immediate attention, whether we intend to keep the manifests or not.
Comment #21
Crell commentedOn the infra side, Phar is bundled with PHP 5.3 but can be installed on late-model PHP 5.2 as well: http://de.php.net/manual/en/phar.requirements.php
I am unsurprisingly in favor of using an existing manifest format over rolling our own. While it may be a bit more complex, this is a file that is mostly intended for automation and for minimal human checks (eg, quick glance to see if a number is bigger than another number) so that's OK. The potential benefits of leveraging an existing standard, one that was already designed to handle a variety of situations, outweigh the non-purpose-built downsides.
I have not looked into Phar itself in much detail so I cannot yet comment if that is the "right" format to use, but sdboyer's analysis above suggests that it is. It is a format designed for packaging up and distributing applications and modules, so it does seem like a natural fit. (I believe it was modeled on Java's "JAR" file format, at least in original concept if not in implementation.)
Comment #22
dwwI fixed the .info processing at http://drupal.org/cvs?commit=500632 (only in DRUPAL-6--3 branch due to this patch).
I don't know enough about phar yet to know if it's the right solution.
I had a clarifying chat in IRC with sdboyer, and luckily, we're on the same page about a number of things:
A) If we're going to use a phar-compliant manifest file, it's going to be b/c we're actually creating phar archives via their whole class system and getting the manifest for free.
B) It's not obvious phar is the right solution for us, that requires more research.
C) Generating archives on-demand at download time is not a win for us, and not a reason to use phar. Disk is cheaper than server CPU/RAM, especially since in this case, disk is being served to disk by separate hardware (ftp.d.o) that's optimized for the job.
D) Sam was only talking about perhaps being able to create the phar archive in RAM in the first place, instead of doing it initially via file I/O. However, again, this is basically pointless/premature optimization -- util.d.o (where packaging will be running) is not having I/O problems now. If we want to reduce file I/O during packaging, #381956: Mark project releases as "to be updated" only if pushes happen there would be a vastly more important win than in-RAM phar'ing... ;)
E) It's not a Git-launch blocker to switch to using phar for this stuff. Really, that's a separate proposal that needs time and thought, not a hasty decision to be rushed in.
-----
Things we're not sure we agree on yet:
F) I don't want the possibility of someday using something cooler to prevent us from doing something useful now. "Don't let perfect be the enemy of good" and all that. Sam characterizes this as: "i tend to look at it not as perfect enemy of the good, but how a quick solution to a not-pressing problem could obstruct much niftier things in the future". If/when we have something better, I think we could either tell people "sorry, our homebrew is being replaced by $cooler, deal with it", or we could continue to let working code work, provide our MANIFEST.txt for those that want it, and let phar or whomever else add something else next to it for folks who want that.
G) This MANIFEST file will help the Update manager in core (a lot). There are a number of problems this could help solve. So, I think this is more of a pressing problem than I think Sam does.
-----
Still leaving this as needs work to either roll-back per #20 or strip off the useless timezone per #18. However, I need more clarity on the plan before doing either... ;)
Comment #23
sdboyer commentedI'm not sure I actually agree on D), but that's OK - it's a later discussion quite separate from this one.
Comment #24
webchickHonestly, I'd rather push this off post-launch. While I agree with G) that this will ultimately be a really awesome thing for our community, there is still some contention around the ultimate solution, and we definitely don't want to do this twice.
Let's take another look at this once the big stressful thing is over and we can have some fresh eyes, and get the larger community involved in the discussion.
Comment #25
sdboyer commentedI've reverted the two commits made here so far, so that we a) don't have the spectre of this hanging over our heads with launch (there's already plenty of things like that, tyvm) and b) aren't generating inaccurate tarballs with gd.d.o anymore.
Comment #26
sdboyer commentedWe've officially punted this to post-launch, updating tags accordingly.
Comment #27
bwinett commentedJust wanted to note that there is an open security vulnerability with Phar: http://www.securityfocus.com/bid/47545/info .
Comment #38
mikey_p commentedThis seems to have broken sometime somewhat recently, maybe in the Drupal 7 upgrade?
Comment #40
drummWhat exactly is broken?
Comment #41
mikey_p commentedSorry, I thought this was what also added the list of projects and versions to the release node for a distribution.
Comment #42
drummSeems like we haven't been missing having this.