in working on http://drupal.org/node/128827 and thinking about http://drupal.org/node/125742 it's now clear that the existing XML-RPC protocol between update_status.module and project_release.module isn't good enough. :(

once upon a time, i thought update_status sent an array of (module name, major version) pairs in the request, along with a single field to indicate the core version (since you can't possibly have multiple versions of core on the same site). however, really it just sends an array of names, and a single core *and* major revision number, or no revision number at all if it wants to find out about the latest "default" version (as defined by the project admins) for each module. :( the reason for this is that it was easier to write a single (and therefore, faster) DB query in the project server code to answer a request if all the major revisions are the same. :( in fact, update_status.module itself currently gets this wrong for sites that have different major revisions installed -- it's just hard-coded to always ask about major revision 1. :(

so, either:

  1. we have to change the protocol between update_status and project, so that we send (name, major) pairs, which would be better for update_status, and better for project_usage, but worse for project's XML-RPC server (more expensive to answer the requests b/c we'd need a more complicated query, or perhaps more queries).
  2. forget about tracking major revision in all this, which is pretty much out of the question -- update_status can't actually provide the right answers if major revision isn't involved.
  3. have update_status send multiple queries, 1 for each set of modules at a given major revision. this seems inefficient and wonky. it makes it harder to store the queries for recording usage info, and wouldn't necessarily be any faster for project to answer the XML-RPC requests (since there are more of them), than putting all the data in a single request.
  4. work on the project* DB schema to consider ways to cache the info and make these XML-RPC requests less expensive to answer, so there's no problem with more complicated queries. for example, maybe project_release should just maintain a {project_release_latest} table with project_nid, core_version, major_revision, and release_nid as the fields (more or less).
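To make options 1 and 4 concrete, here is a minimal sketch (in Python, purely for illustration, not the project_release code) of a request made of (name, major) pairs being answered from a precomputed "latest release" table. The dictionary stands in for the proposed {project_release_latest} table; the sample rows, the use of names and version strings instead of nids, and the function name `latest_releases` are all assumptions made up for this example.

```python
# Hypothetical in-memory stand-in for the proposed {project_release_latest}
# table: (project name, core version, major revision) -> latest release.
# The thread keys the table on project_nid/release_nid; names and version
# strings are used here only to keep the sketch readable. Rows are invented.
LATEST = {
    ("views", "5.x", 1): "5.x-1.6",
    ("views", "5.x", 2): "5.x-2.0",
    ("signup", "5.x", 1): "5.x-1.0",
}


def latest_releases(core_version, modules):
    """Answer a request of (name, major) pairs with one lookup per pair."""
    return {
        (name, major): LATEST.get((name, core_version, major))
        for name, major in modules
    }


# One core version per site, but a separate major revision per module.
answer = latest_releases("5.x", [("views", 2), ("signup", 1), ("cck", 1)])
```

With a cache table like this, sending a separate major revision per module no longer forces a more complicated query; unknown projects simply come back as None.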

a sort-of-related complication is how this protocol can be used to correctly flag updates as including a security fix. it'd be easy enough to have a new "release type" taxonomy, so you can flag individual release nodes as providing a security fix. however, say 5.x-1.2 is a security fix, a site currently has 5.x-1.1, but the most recent release is now 5.x-1.3, which fixes bugs, but not security holes. update_status should still tell the admin they need to upgrade to 5.x-1.3 because of a security problem, even though that was from 5.x-1.2, not 5.x-1.3. :( so, it seems like part of the answer that project_release's XML-RPC server sends back has to be something like "last security update on this branch", and let update_status do the work to decide if it needs to flag a given out-of-date module using that info. i only raise this here while we're considering how to fix this protocol.
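A rough sketch of the client-side decision described above, assuming the server returns both the latest release and the "last security update on this branch". It is written in Python for illustration only; the function names and the status strings are hypothetical, and it only handles official releases of the form 5.x-1.2 on a single branch (no dev snapshots).

```python
def parse_patch(version):
    """Return the patch level of an official release, e.g. '5.x-1.2' -> 2."""
    return int(version.rsplit(".", 1)[1])


def update_status(installed, latest, last_security):
    """Classify an installed release against the server's answer.

    last_security is the most recent security release on the branch,
    or None if the branch never had one.
    """
    current = parse_patch(installed)
    if last_security is not None and current < parse_patch(last_security):
        # The site missed a security fix, even if the newest release
        # itself is only a bug-fix release.
        return "security update required"
    if current < parse_patch(latest):
        return "update available"
    return "up to date"


# The case from the post: running 5.x-1.1, 5.x-1.2 was the security fix,
# and 5.x-1.3 is the newest (non-security) release.
status = update_status("5.x-1.1", "5.x-1.3", "5.x-1.2")
```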

the goal is to get update_status into core for 6.x, so time's really running out if we're going to get this right before the code freeze. hence, the critical priority for this issue.

Comments

drewish’s picture

the last bit about security status ties in with one feature that chx was asking for in my SoC project. he wanted to make sure that security got factored in. having the ability to mark the cvs tag releases (I'm not sure what terminology you use to distinguish the HEAD -dev release from the "stable" tag releases) as containing a security issue would be very critical to evaluating a module's quality.

drewish’s picture

Couple more thoughts:
* As I sort of said in the last post, I think the security issue flag should be per-release, but it's not really applicable to -dev/HEAD releases so I don't know if a taxonomy is the right way to do this. It might be better off as a release node field.
* I don't know if I'm seeing the "or" part of #1 and #4; they both sound like the right idea to me.

dww’s picture

the basic terminology we use is:

official release
something from a specific cvs tag with a specific version
development snapshot
something from a cvs branch, which is periodically rebuilt, and therefore, a moving target

see http://drupal.org/handbook/cvs/releases/types for more.

anyway, yeah, flagging official releases as including a security fix has been part of the design goals since day 0 of the new release system. the taxonomy i propose above is most of what we need. however, it just gets tricky with the update_status.module for the reasons i outlined above, which is why i'm bringing it up here.

dww’s picture

oh yeah, and to clarify, i only envision this taxonomy applying to official releases, not dev snapshots. however, i've run into this same problem of wanting a taxonomy to only apply to a subset of nodes of a given type (e.g. the database compatibility term i'd also love to see on projects only makes sense for modules and installation profiles, not translations or themes) but sadly, there doesn't seem to be a good, general solution to this right now. so, meanwhile, we might have to play some tricks with form_alter() to just hide this taxonomy selector when creating release nodes that are development snapshots (we already play some tricks like that to alter the taxonomy UI on project nodes for the "Project type" radio buttons vs. the "Project categories" multi-selects to classify modules)...

that said, i still think a taxonomy is the right solution for this, instead of something hard-coded into project_release.module as another field directly in the {project_release_nodes} table. here's why:

  1. even with some hard-coded logic, a "release type" taxonomy is more flexible for other sites than something hard-coded about "security update", especially since some sites want to use project_release without cvs.module (and the notion of dev snapshot vs. official release is in the realm of cvs.module, not project_release).
  2. this really is about classifying release nodes, and that's what taxonomy is good for.
  3. taxonomy already provides a bunch of nice features for free (e.g. we'd get an RSS feed of all release nodes tagged with "security update", etc).
  4. there's already some special-case taxonomy stuff in project* (e.g. the project type mentioned above, but also the "Drupal core compatibility" taxonomy), so there's precedent for this sort of thing.

however, it does get a little messy once we get into the XML-RPC protocol, and this whole business of reporting security updates. then, the general, flexible niceness of a taxonomy can't remain clean and pure. someone has to get dirty and have specific knowledge of the security update term, and give it some special case treatment, because of the problem in the original post about a site running 5.x-1.1, which missed the 5.x-1.2 security update, and now 5.x-1.3 (which itself is not a security update) is already released. :(

dries’s picture

  1. Using the $base_url is not 100% either. If someone is using poorman's cron, he might send requests from http://example.com OR http://www.example.com/. I don't think there is anything we can do about this, unless we use "application keys".
  2. Server side hashing of the URLs is naive and provides nothing but a false sense of security. Let's not get sidetracked too far with that. It's easy enough to come up with a list of Drupal sites ...
  3. A daily cron-job that iterates over all requests doesn't seem like the proper thing to do. I also don't think we should store the data in serialized format. We have to process these requests anyway -- why postpone them to a cron run? It saves absolutely no work, it only creates more as we also have to store them in an intermediate format. Let's just update the project module table(s) as the XML-RPC requests come in, and then worry about how frequently these tables are queried by the data visualizer/reporter. The project modules should simply cache their queries for x days.

dries’s picture

Argh, I typed a long reply here but a 500 ate it. I'll re-post it later on, have to run now.

dww’s picture

attempting to summarize a few long discussions with merlinofchaos, webchick, and dries...

  1. merlin, webchick and i came up with a more fancy protocol that should include all the info the client could ever need. a draft of which can (for now) be found here: http://drupal.pastebin.us/25412
  2. once dries came around, and we started brainstorming some more, he and webchick proposed we move *all* the brains into the client, so that the server has very little work to do at all. basically, project_release.module would maintain .xml files for each project, with an entire historical dump of all releases. then, when the client (update_status.module for now, or system.module in 6.x, if all goes well) sends a request for N projects, it just gets back N .xml files, 1 for each project. then, the client has *everything* it could possibly need, and the server's task is reduced to saving usage stats, and sending back these .xml files. no DB queries (except to save the stats) required.

one possible downside of #2 is that not only would the client need to parse these XML files (good god, let's not turn this into another theme .info file thread!), more importantly, the client would have to have all the brains required to figure out stuff like "what's the latest release?", "is there a security update in between?", etc, etc. if we ever change anything about how project_release works, if most of these brains live in project_release, we can change everything in one place. once it's in the client (and especially once that client is in core), we're going to be pretty locked into how everything works.

Dries asked me to post info on the "brains" of the code i'm talking about. basically, see function project_release_data() from http://drupal.org/files/issues/xmlrpc_22.patch (care of http://drupal.org/node/48580#comment-191462). or, just look in the current source:
http://cvs.drupal.org/viewcvs/drupal/contributions/modules/project/relea...

basically, all the smarts are currently handled by building the right SQL query, since the db schema related to release nodes was designed to answer these kinds of queries. so, we'd have to basically do the same kind of logic in php on the array of parsed info we got from these .xml files.

i'm still torn, since the idea of putting all the brains in the client is great from a performance standpoint, but a little scary for other reasons (especially if/when the client moves into system.module in core)...

more brainstorming required, clearly. ;)

nedjo’s picture

Considering that we have XML-RPC parsing in core, we could use that as the format for the (static) XML files. Then the Drupal client would just have an array to deal with. So basically we'd be generating an XML-RPC file and caching it rather than responding on demand.

merlinofchaos’s picture

I don't know enough about the guts of our xmlrpc library to really comment; but if we can basically format the messages ahead of time and store flat-files it seems like a fantastic idea. Server load would be well reduced.

dww’s picture

Title: fix XML-RPC protocol for update_status.module » fix protocol for update_status.module

summary from another IRC session, this time with Dries and merlin at the same time. ;)

  • everyone seems to like the static .xml files to reduce the load on server.
  • everyone likes formatting these .xml files with the same layout as the XML-RPC code expects, so that parsing is effectively free.
  • open issues with the .xml files:
    • is keeping these .xml files accurate the responsibility of the dreaded packaging script?
    • how to store usage data? (see below)
    • how to aggregate site-specific data? (e.g. "sites that use foo.module also use bar.module")
    • how much brains do we have to put into the client to handle these big history dumps for each module (e.g. latest release, default branches, security updates, etc, etc -- don't want the client to get too complex, especially once it's in system.module for D6).

ideas about storing usage data:

  • we'd want something like http://updates.drupal.org/project/views/releases.xml?version=5.x-1.2 to record the currently installed version of views (and to answer the complete release history of views.module)
  • we could *just* record the IP of the request, however:
    • this will give bad results for multi-site (e.g. how many drupal sites are hosted with the same IP by bryght or civicspace?) and for shared hosting (e.g. a handful of IPs at dreamhost and site5 will have a huge # of (totally unrelated) drupal sites reporting from them).
    • IP is easy to spoof if you don't care about the answer
    • makes it basically impossible to have meaningful site-specific stats
  • could do some kind of anonymous hostkey, as discussed in http://drupal.org/node/128827:
    • more accurate stats overall
    • can have meaningful site-specific stats
    • still easy (even easier) to spoof/poison
  • the only *truly* un-spoofable, trustworthy solution is private application keys, some kind of public/private key exchange, etc. but, that'd make the usage stats about as useful as the ones from drupal.module right now, since it'd be such a huge barrier to setup, so few sites would participate. :(
  • regardless of how we store them, we'll want to age-out the usage stats over some period of time, so that bogus results don't poison the data forever. we probably want to save historical summaries by week or by month, anyway, to be able to datamine the usage trends over time...
  • even if we go with the site key, we could still store the IP along with the data, as a way to help detect and react to abuse, if we suspected it. e.g., if a ton of requests all come from the same IP, we could have a whitelist of known IPs for shared hosting/multi-site installs, but if some rogue IP shows up with 1000s of requests for foo.module (to artificially inflate the stats), we'd a) be able to remove all the data from that IP, and b) block that IP or do other things to punish the abuse.
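The last bullet could look roughly like the following sketch (Python, illustration only). It assumes each usage record is stored as an (ip, site key, project) tuple; the record layout, the threshold, the function names and the sample addresses are all invented for the example and are not part of any agreed design.

```python
from collections import Counter


def flag_rogue_ips(requests, whitelist, threshold):
    """Return non-whitelisted IPs that sent more than `threshold`
    reports for a single project."""
    counts = Counter((ip, project) for ip, _site_key, project in requests)
    return {
        ip
        for (ip, _project), n in counts.items()
        if n > threshold and ip not in whitelist
    }


def drop_ips(requests, rogue):
    """Remove every usage record that came from a flagged IP."""
    return [record for record in requests if record[0] not in rogue]


# 10.0.0.3 is a known shared host, so its volume is expected;
# 10.0.0.1 sends the same volume but is not on the whitelist.
requests = (
    [("10.0.0.1", "key%d" % i, "foo") for i in range(5)]
    + [("10.0.0.3", "site%d" % i, "foo") for i in range(5)]
    + [("10.0.0.2", "abc", "foo")]
)
rogue = flag_rogue_ips(requests, whitelist={"10.0.0.3"}, threshold=3)
cleaned = drop_ips(requests, rogue)
```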

Dries still didn't seem too thrilled with the hostkey thing, but merlin and I think that's the approach with the best "price/performance" ratio (i.e. most accurate stats for the least trouble).

everyone agrees this effort is important, so more thinking is required. however, there's a sense of urgency about getting this figured out, a new server side implemented, a new version of update_status.module as a 5.x client that supports it, and a patch against system.module for D6 that moves the client code into core, all before the june 1st code freeze. yikes... ;)

dww’s picture

eek, sorry, i realized i was a little bit sloppy with terminology. to clarify:

  • "hostkey" == "sitekey" == md5hash($baseurl . site's random private key). see Steven's comment at http://drupal.org/node/128827#comment-220176 (#6) in particular for motivation and details.
  • "application key" == some public/private keypair thingy that's exchanged with d.o somehow.

and, to be clear, no one's actually in favor of the application key approach, since it's way too much work, and no one would bother to use it.
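For reference, the sitekey definition above amounts to something like this sketch (Python for illustration; the real client would be PHP). The function name and the sample private key are made up. It also shows the weakness raised earlier in the thread: the same site reached as example.com and www.example.com produces two different keys.

```python
import hashlib


def site_key(base_url, private_key):
    """md5 of the base URL concatenated with the site's random private key."""
    return hashlib.md5((base_url + private_key).encode("utf-8")).hexdigest()


# Hypothetical private key; a real site would generate this randomly once.
PRIVATE_KEY = "not-a-real-secret"

key_plain = site_key("http://example.com", PRIVATE_KEY)
key_www = site_key("http://www.example.com", PRIVATE_KEY)
```

The key is stable for a given base URL and does not reveal the URL to the server, but nothing stops a client from inventing as many keys as it likes, which is the spoofing concern listed above.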

Paul Natsuo Kishimoto’s picture

This is exciting - subscribing!

Am I correct in thinking that the XML-RPC requests are actually serving two purposes? Namely:

  • determining the latest versions / security updates of installed modules, and
  • allowing d.o to collect usage information in modules.

Food for thought: these could be separated. From my (limited) experience with apt-based package managers, they download lists containing information (latest versions, dependencies, descriptions, details of security updates) on all packages. With 24000 packages in the Ubuntu repositories, the lists are less than 5 MB. There is a popularity contest analogous to Drupal's usage stats, but it functions independently from the package manager. Users can turn it off.

There are several repositories; analogues in Drupal:

  • 5.x compatible modules
    • official releases
    • development snapshots
  • 6.x compatible modules
    • official releases
    • development snapshots

If the processing onus is truly on the client (as in package managers), d.o would generate the very few module list files (gzip'ed) and serve them (straight from cache or via public download) as requested. If a client wasn't running any modules under development, it wouldn't download the snapshots list. Complete lists also enable the client to inform users "If you want Module X, version 5.x-1.2, you must also install these versions of Modules J, K and L. Here are the download links for all four."

In short, though, lessons from package managers (which seems to be, in the very very long run, where this is going):

  • Module (package) lists: small in number, higher on bandwidth, lower on repository (d.o) DB overhead.
  • 'Brainy' clients need data; better to provide it all than do lots of processing to provide just enough.
  • Popularity is fairly simple on its own, but complicates module (package) management and vice versa.

I hope that isn't too far off-topic. My point is that the above would obviate the need for an advanced protocol for popularity/usage tracking; the client would send d.o only the list of the modules it was running. It would obviate the need for *any* protocol at all for update status checking, in favour of an XML package list format. And, of course, that everyone else is doing it isn't sufficient reason to follow suit (unless it really works!)

dww’s picture

FYI: see http://drupal.org/node/142120 about setting up a "Release type" taxonomy on d.o as per my comments in #4 above. that's basically in the critical path for all of this security update flagging stuff to really work properly...

dww’s picture

after considerable further thought and more discussions in IRC, here are my current ideas about this:

  1. separate history dumps for each project and for each version of core make sense, since a 6.x install will never care about the release history for 5.x-* releases, etc. however, separate histories for official and dev snapshot releases don't make sense to me.
  2. putting all release history into a single file is a bad idea for a few reasons:
    • makes it impossible to record usage data from the queries
    • the entire history will basically be stale every 5-20 minutes, whereas if we keep all the projects separate, it's likely that the release history for any given project will be valid for days or weeks at a time.
    • much more bandwidth to transfer
    • requires more processing at the client to find what it needs
  3. the release history for a given project can be made stale by either a) a new release tarball or b) changes to an existing release node on d.o (e.g. to change the "Release type" taxonomy terms after the fact, to unpublish a release node, if an admin changes the title and version string to correct errors, etc, etc). therefore, we're going to have to keep track of changes to the release node that would invalidate the history cache and cause that to be regenerated.
  4. we want these history files written and owned by a privileged user, (much like the tarballs themselves) not httpd. the vast majority of the time, the history will be invalidated by a new release tarball. so the packaging script is really the perfect place to implement the .xml history generation.
  5. instead of making things more complicated, whenever *any* releases in a given history file are stale, we just brute-force regenerate that entire history file for that project, instead of trying to be smart and find the stale parts of the history and update them. this will just require 1 moderate DB query per stale history file, and it'll vastly simplify the backend code in project_release and the packaging script to use this method.
  6. the best way to handle this is as follows:
    • add a new "history_stale" column (bit) in {project_release_nodes}
    • any time we hit project_release_insert() or project_release_update() we set that bit to 1
    • the packaging script will continue to do what it does now to decide what tarballs to create.
    • during the packaging run, when the script updates the release node to give it the new tarball name, date, md5hash, and to publish the node, it will also set the history_stale bit. in case of catastrophic failure, so long as the release got published, we'll know the history for that project is stale and will know to regenerate it.
    • at the end of the run, it will query for all unique project+core_version pairs that have the "history_stale" bit set in the DB (mostly the ones it just updated, but potentially other release nodes that were modified via d.o, too) to make a list of all project history files that need to be regenerated.
    • after all of the .xml history files are re-generated, the script will update the {project_release_nodes} table to clear the "history_stale" bit on all entries. the only bummer is that we'll either have to LOCK this table while we generate the history, split up the clearing of the bit to only do it immediately after each project we generate, or run the risk that someone edits a release node while all this stuff is going on, and we miss the fact that the history is stale for that project.
  7. history files will be written to a new directory tree under the web root: files/release-history/[project_name]/[core_version].xml. i suppose we could just do everything in 1 giant directory, and just name the files [project_name]-[core_version].xml but i'm slightly worried about performance, and i think subdirs might be better. this point doesn't really matter, and we could change it later, since i'm not envisioning that clients pull the file directly from apache, but instead go through a menu call-back
  8. we'll want a menu callback on d.o to a) serve up a requested .xml file and b) optionally record usage stats as described above and in other places. menu callback will be at http://drupal.org/project/release-history/[project_name]/[core_version] -- e.g. http://drupal.org/project/release-history/views/5.x for 5.x-* releases of views.
    • i don't want this to be a regular item, tab on the project nodes, etc, since regular users shouldn't normally navigate to this page, anyway
    • the module .info files reliably give us the project name, but not the project nid, so it's safer to use the project_name here.
  9. i've got a pretty simple "schema" for the XML for the release history (see below). the hope is that this is trivial enough to parse with php's native xml parser (yes, php4), or if not, with a very trivial parser of our own. if it turns out we need to tweak the below representation a little bit to make it easier to parse with php's native parser, i'm fine with that.
  10. for maximum flexibility and functionality, the release history will include *all* taxonomy terms that are associated with each release. it's up to the client to decide which terms it cares about and how it treats them specially.
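As a sketch of points 6 and 7 (Python, illustration only): collect the unique project + core version pairs whose "history_stale" bit is set, and map each pair to its file under files/release-history/[project_name]/[core_version].xml. The row layout as dictionaries and both function names are assumptions; the real data would come from {project_release_nodes}.

```python
def stale_histories(release_nodes):
    """Unique (project, core_version) pairs that have a stale release."""
    return sorted(
        {
            (node["project"], node["core_version"])
            for node in release_nodes
            if node["history_stale"]
        }
    )


def history_path(files_root, project_name, core_version):
    """Location of one history file in the proposed directory tree."""
    return "/".join(
        [files_root, "release-history", project_name, core_version + ".xml"]
    )


# Invented rows standing in for {project_release_nodes}.
nodes = [
    {"project": "views", "core_version": "5.x", "history_stale": True},
    {"project": "views", "core_version": "5.x", "history_stale": True},
    {"project": "signup", "core_version": "5.x", "history_stale": False},
    {"project": "cck", "core_version": "6.x", "history_stale": True},
]
to_regenerate = [history_path("files", p, c) for p, c in stale_histories(nodes)]
```

Two stale views releases for 5.x still yield a single file to regenerate, which matches the brute-force "rebuild the whole history file" approach in point 5.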

the .xml files for release histories would be something like this (this is cut and paste from a local test site, the URLs, nids, etc will all be different for real):

<project>
 <name>signup</name>
 <link>http://iskra.local/drupal-5/node/1</link>
 <api_version>5.x</api_version>
 <default_major>1</default_major>
<releases>
 <release>
  <name>signup 5.x-1.0</name>
  <version>5.x-1.0</version>
  <version_major>1</version_major>
  <version_patch>0</version_patch>
  <release_link>http://iskra.local/drupal-5/node/3</release_link>
  <download_link><a href="http://iskra.local/drupal-5/files/projects/signup-5.x-1.0.tar.gz">signup-5.x-1.0.tar.gz</a></download_link>
  <date>1178379854</date>
  <mdhash>795dc6cca4f569e1447f3ac166182023</mdhash>
  <terms>
   <term><name>Release type</name><value>Security update</value></term>
   <term><name>Database compatibility</name><value>MySQL</value></term>
   <term><name>Database compatibility</name><value>PostgeSQL</value></term>
  </terms>
 </release>
 <release>
  <name>signup 5.x-1.x-dev</name>
  <version>5.x-1.x-dev</version>
  <version_major>1</version_major>
  <version_extra>dev</version_extra>
  <release_link>http://iskra.local/drupal-5/node/4</release_link>
  <download_link><a href="http://iskra.local/drupal-5/files/projects/signup-5.x-1.x-dev.tar.gz"></a></download_link>
  <date>1178379858</date>
  <mdhash>795dc6cca4f569e1447f3ac835720123</mdhash>
  <terms>
   <term><name>Database compatibility</name><value>MySQL</value></term>
   <term><name>Database compatibility</name><value>PostgeSQL</value></term>
  </terms>
 </release>
</releases>
</project>
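To get a feel for how much "brains" the client needs for a file like this, here is a small parsing sketch. It uses Python's standard ElementTree rather than php's native parser, so it only illustrates the shape of the data, not the eventual client code. The embedded XML is a trimmed copy of the sample above; the function name and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample history file above.
SAMPLE = """<project>
 <name>signup</name>
 <api_version>5.x</api_version>
 <default_major>1</default_major>
 <releases>
  <release>
   <version>5.x-1.0</version>
   <version_major>1</version_major>
   <version_patch>0</version_patch>
   <terms>
    <term><name>Release type</name><value>Security update</value></term>
   </terms>
  </release>
  <release>
   <version>5.x-1.x-dev</version>
   <version_major>1</version_major>
   <version_extra>dev</version_extra>
   <terms></terms>
  </release>
 </releases>
</project>"""


def parse_history(xml_text):
    """Return (project name, list of release dicts) from a history file."""
    root = ET.fromstring(xml_text)
    releases = []
    for release in root.iter("release"):
        terms = [
            (term.findtext("name"), term.findtext("value"))
            for term in release.iter("term")
        ]
        releases.append(
            {
                "version": release.findtext("version"),
                "major": int(release.findtext("version_major")),
                # Dev snapshots have no <version_patch>, so this is None.
                "patch": release.findtext("version_patch"),
                "security": ("Release type", "Security update") in terms,
            }
        )
    return root.findtext("name"), releases


name, releases = parse_history(SAMPLE)
```

Since every term is included, the client decides for itself that ("Release type", "Security update") is the pair it treats specially, as point 10 describes.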

my concrete action plan:

  • add the "Release type" vocabulary on d.o and start getting some raw data in there (e.g. i'll go back and add the "Security update" term to some of the releases i've made to address various SAs). see http://drupal.org/node/142120
  • write a patch against project_release to include a db update for this new "history_stale" bit in {project_release_nodes}, and the project_release.* code to set it to 1 whenever needed.
  • roll a patch against package-release-nodes.php to do all this .xml history generation on demand (this is mostly done on a local test site -- that's where most of the example .xml came from).
  • run the history generator once on *all* projects/versions to initially populate the directory tree on the d.o filesystem.
  • write a patch against project_release.module to register a simple menu callback (as discussed above) to serve up the appropriate .xml history file based on the incoming request. this callback will also be where we log usage info, but that's phase 2.

as soon as i get the initial .xml files up there, merlin can start working on the new brains in the client for update_status 5.x-2.0, so we can get a working prototype out in the wild to make sure it's all happy while we work on a) storing usage data and b) a patch for D6 core.

any objections to anything above? speak now or forever hold your peace. ;)

thanks,
-derek

dww’s picture

dries and i chatted via IM about this. mostly he loves it. he suggested a few simplifying assumptions:

  1. we should forget about the history_stale bit in {project_release_nodes}. let's just (at least initially) re-generate all release history files every N hours (say, N=6). he considers it a feature that there's a little delay, anyway. ;)
  2. if it was easier and faster to implement (not talking performance, we can always optimize/tweak the server side later), he'd even be happy storing the individual XML files (pre-generated, of course) directly in the DB and serving them from cache, instead of the filesystem.

i'm all in favor of #1, that sounds great. not sure about #2 yet, honestly, i think files on disk will be easier to get working quickly, but i could be wrong.

dww’s picture

Status: Active » Needs work
Attachment: new, 8.52 KB

here's a mostly working prototype to generate the xml files. currently, it only logs to the watchdog, instead of spitting out files, but i have to run right now and can't spend the extra 15 minutes to get this fully working. ;) however, i wanted to share what i've got as a backup and so others can review and poke holes in the mean time...

dww’s picture

Status: Needs work » Needs review
Attachment: new, 10.91 KB

code-complete and tested locally. writes everything (carefully) to the file system. the live .xml files are only touched via an atomic call to rename(), so there's no fear of clients accidentally grabbing a partially-written version if they happen to ask while this script is in the middle of generating files. i guess the next step is to let this rip on d.o. however, i wouldn't mind a 2nd pair of eyes before i did that, just to make sure everything is cool. of course, we're not writing to the DB at all, and only writing to a very specific, well-known directory tree, so there's really no harm that could be done. but, better safe than sorry. ;)
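The write-then-rename pattern mentioned above looks roughly like this. It is a Python sketch of the general technique, not the attached PHP script; `write_atomically` is a hypothetical name. The temporary file is created in the same directory as the target so that the final rename stays on one filesystem.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write data to a temp file next to `path`, then rename it into place.

    Readers see either the old file or the new one, never a partial write.
    """
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(data)
        # os.replace() renames over an existing file in a single step.
        os.replace(tmp_path, path)
    except Exception:
        # Clean up the temp file if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

For example, `write_atomically("files/release-history/views/5.x.xml", xml_text)` would replace the live file for views 5.x without a window where a client could fetch half of it.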

dww’s picture

Attachment: new, 11.43 KB

a few extra check_plains(), just to be safe... thanks to webchick for having the extra paranoia filter on while reviewing. ;)

while i was at it, i added a big comment to where we create the filenames we use to explain what parts are safe and what parts aren't. and, for pure paranoia, i added checks to remove '/' from the Drupal core compatibility vocab terms when generating the filename... tried it on a test site and it worked nicely to avoid "../../../" style attacks using these vocab terms.
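The filename check described above can be sketched like this (Python, illustration only; the function names are hypothetical). It only strips '/' from the core compatibility term, as the comment describes, and assumes the project name has already been validated elsewhere.

```python
import posixpath


def safe_term(term):
    """Strip '/' so a vocabulary term cannot add path components."""
    return term.replace("/", "")


def history_file(root, project_name, core_term):
    """Build the history filename from a project name and a core term."""
    return posixpath.join(root, project_name, safe_term(core_term) + ".xml")


# A hostile term collapses into a harmless (if odd) filename
# that stays inside the project's own directory.
hostile = history_file("files/release-history", "views", "../../../etc/passwd")
normal = history_file("files/release-history", "views", "5.x")
```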

drewish’s picture

looked over #18 and had a few comments.

i wouldn't mind seeing $fatal_err = FALSE before it's first used. it'd be E_ALL compatible and make its lifetime a bit clearer.

not to be too picky but i thought the Drupal coding standard was to capitalize TRUE and FALSE

you've got what looks like some dead code:

$project_uri = $project->uri;

should 'Published' and 'Unpublished' really be translated? It seems like it's being used as a constant more than a label.
$xml .= ' <status>'. ($release->status ? t('Published') : t('Unpublished')) ."</status>\n";

would it make sense for $drupal_root, $site_name, $dest_root, $dest_rel and $dest_full to be defined as constants rather than global variables? they don't change and it'd help them stand out...

also would you mind posting one of the output files?

dww’s picture

Status: Needs review » Needs work
Attachment: new, 1.81 KB

thanks for the careful review, drewish. actually, last night before i went to sleep, i started working on the project_release.module side of things, and decided that it'd be best (and more consistent with the rest of drupal) to have a setting for the directory to use for this, so a bunch of the $dest_* stuff is gone in my workspace now. just putting the finishing touches on that. i'll fold in your other suggestions, for sure.

re: TRUE vs. true, wasn't there recently a thread on devel@ that said 'true' was a few % faster? i don't really care, i'll stick with TRUE for consistency, until core switches...

and yeah, i guess there's no good reason to t() published vs. unpublished in this case. ;)

attached is a sample output file from a test site. it's full of bogus data, and not all of the releases even have files attached to them, so some of the attrs are missing, but it should give you an idea. (speaking of which, just noticed another bug: we shouldn't include the <download_link> tag if there's no file attached to the release, yet.) looks a lot like what i posted above in #14.

dww’s picture

Status: Needs work » Needs review
Attachment: new, 11.34 KB

new improved version of the generation script, including all of drewish's suggestions. also relies on a variable_get() for the relative directory stuff, instead of having that configured in here, since project_release.module needs the same info. initial patch against project_release.module coming next.

dww’s picture

Attachment: new, 5.02 KB

initial patch against project_release.module. doesn't actually serve up the .xml files yet, but it does everything else (all the validation, handling error cases, adds the necessary settings, etc).

drewish’s picture

Looking at the output..

It seems like these should be made consistent. Perhaps without the a href bit? less parsing and all...

  <release_link>http://iskra.local/drupal-5/node/9</release_link>
  <download_link><a href="http://iskra.local/drupal-5/files/"></a></download_link>

It's probably been mentioned before but why are we including unpublished releases?

Should we be including the version in the name? It seems redundant. Is that already baked in to the node title?

  <name>signup 5.x-1.x-dev</name>
  <version>5.x-1.x-dev</version>

And now looking at the code...

DIR_SEP isn't really needed... PHP's pretty good about treating \ and / the same on windows... You could use the built in DIRECTORY_SEPARATOR http://us.php.net/manual/en/ref.dir.php

it may be over thinking it but depending on which version of PHP is being used and how the script is called, $argv[0] may have '/usr/bin/php' as a value.

drupal_chdir() isn't being called from anywhere.

dww’s picture

Attachment: new, 11.07 KB

re: including info about unpublished release nodes: if you happen to be running a version that's since been unpublished, that's *really* bad news, and the client should be able to tell you that. however, i agree it's dumb to include links to the release node and the download link in this case -- we just want to include enough info for the client to identify it's the same release, and then to be able to freak out appropriately. ;)
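a rough sketch of that rule, in python rather than the PHP the generator script is actually written in. the name, version, release_link and download_link elements come from the sample output above; the release wrapper, the status element, and the function and its parameters are assumptions made up for illustration. both links are emitted as plain URLs, and download_link is skipped when there's no file:

```python
import xml.etree.ElementTree as ET


def release_element(name, version, published, release_url=None, file_url=None):
    """Build one release entry (hypothetical helper, not the real script)."""
    release = ET.Element("release")  # wrapper element name is an assumption
    ET.SubElement(release, "name").text = name
    ET.SubElement(release, "version").text = version
    # <status> is an assumed element; the thread only says clients must be
    # able to tell that a release has been unpublished.
    ET.SubElement(release, "status").text = (
        "published" if published else "unpublished"
    )
    if published:
        # Links are only useful for releases people can still visit.
        if release_url:
            ET.SubElement(release, "release_link").text = release_url
        # No file attached yet: leave <download_link> out entirely.
        if file_url:
            ET.SubElement(release, "download_link").text = file_url
    return release
```

an unpublished release then carries just enough for the client to match it against what's installed, and nothing it could link to.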

otherwise, i incorporated all of drewish's suggestions in #23. thanks!

so, here's the new generator-script...

dww’s picture

finished the project_release.module side of things to actually send back XML. ;) for consistency, all errors at this menu callback are reported via XML, too, so once you land on that page, you always see XML (e.g. if the requested project is not found, has no releases for the requested version, etc).
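to make the "errors are XML too" idea concrete, here's a minimal sketch in python (the real callback is PHP inside project_release.module). the error and project element names, the message wording, and the in-memory dict standing in for the generated files are all assumptions for illustration:

```python
import xml.etree.ElementTree as ET


def error_xml(message):
    """Wrap an error message in XML (the <error> name is an assumption)."""
    root = ET.Element("error")
    root.text = message
    return ET.tostring(root, encoding="unicode")


def history_response(project, core_version, histories):
    """Always return XML, whether or not the lookup succeeds.

    `histories` is a hypothetical dict: project -> core version -> versions.
    """
    if project not in histories:
        return error_xml("No release history found for project %s." % project)
    versions = histories[project].get(core_version)
    if not versions:
        return error_xml(
            "No releases of %s found for %s." % (project, core_version)
        )
    root = ET.Element("project")
    for version in versions:
        ET.SubElement(root, "version").text = version
    return ET.tostring(root, encoding="unicode")
```

the point of the design is that a client only ever needs one parser: it checks the root element and either reads the history or reports the error.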

IMHO, both parts of this are now RTBC, but additional reviews wouldn't hurt. i'll probably deploy on scratch.d.o this afternoon and let it rip, to see how things are looking.

dww’s picture

Attachment: new, 1.52 KB

oh, and re: "Should we be including the version in the name? It seems redundant. Is that already baked in to the node title?" -- yeah, the version is just part of the node title for the release nodes.

as requested, here's a new copy of the generated xml file (still with bogus data from the test site... soon y'all will just be able to see this directly on s.d.o...)

dww’s picture

ran the generator script on s.d.o, and it took less than a minute (and the initial run is the one that has to create the most directories). a subsequent run took only 12 seconds to re-generate all the files. ;) granted, that's on s.d.o, which is using the new beefy DB server, but still...

i also applied my patch against project_release.module on s.d.o, so the menu callback is now working. for example:

http://scratch.drupal.org/project/release-history/views/5.x
http://scratch.drupal.org/project/release-history/project_issue/5.x
...

obviously, your browser is going to do funny things to display the XML, but if you view source, you'll see it's all happy. the project_issue example includes some taxonomy terms, too, since i had already classified some of the security releases i've done with the new Release type terms.
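the URL pattern in those two examples is project/release-history/, then the project short name, then the core version. a client could build it like this (a python sketch; the function name is made up, and a real client such as update_status would be PHP):

```python
from urllib.parse import quote


def release_history_url(base_url, project, core_version):
    """Build the release-history URL for one project and core version."""
    parts = [
        base_url.rstrip("/"),
        "project",
        "release-history",
        quote(project, safe=""),       # escape anything odd in the name
        quote(core_version, safe=""),
    ]
    return "/".join(parts)


print(release_history_url("http://scratch.drupal.org", "views", "5.x"))
# prints http://scratch.drupal.org/project/release-history/views/5.x
```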

any final tweaks to any of the behavior or comments on the code? i'd like to commit all of this to CVS and install it for real on d.o.

drewish’s picture

sweet! it looks good to me. when you think about it most of the execution time is probably spent running your error-triple-checking code ;)

dww’s picture

re: when you think about it most of the execution time is probably spent running your error-triple-checking code ;)

hah, you laugh, but i've been developing project* code on d.o for over a year now, and it *never* ceases to amaze me how often the impossible happens once i move to the live environment. ;) sure enough, the script generated 25 errors on those "impossible" cases that should never happen, due to the weirdness that is the live d.o data...

about 20 of those appear to be from behavior i didn't think about before, which i now can't decide if it's a bug or not (probably it is). apparently, it's not uncommon for entire project nodes to be deleted (i thought we usually unpublished them, but i guess not). anyway, there's nothing in project* that notices when you delete a project node that goes off to delete all the release nodes for that project. so, we've got a bunch of entries in {project_release_nodes} pointing to projects that no longer exist. who knew? ;) submitted http://drupal.org/node/142957 about this, so if you're interested in discussing further, go there.
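for anyone who wants to see how many of these there are, the check is a LEFT JOIN from {project_release_nodes} to {node}. here's a toy sketch using python's sqlite3 with an in-memory database. the two-column schema is a simplification, and treating pid as the project's nid is an assumption for illustration, so check it against the real tables before running anything like this on d.o:

```python
import sqlite3


def orphaned_release_nids(conn):
    """Return release nids whose project node no longer exists."""
    rows = conn.execute(
        "SELECT r.nid FROM project_release_nodes r "
        "LEFT JOIN node n ON n.nid = r.pid "
        "WHERE n.nid IS NULL ORDER BY r.nid"
    ).fetchall()
    return [nid for (nid,) in rows]


# Toy data: project 1 still exists, project 2 was deleted, but its two
# release nodes (11 and 12) were left behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY);
CREATE TABLE project_release_nodes (nid INTEGER PRIMARY KEY, pid INTEGER);
INSERT INTO node VALUES (1), (10), (11), (12);
INSERT INTO project_release_nodes VALUES (10, 1), (11, 2), (12, 2);
""")
print(orphaned_release_nids(conn))  # prints [11, 12]
```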

the other ~5 are from cases where the "HEAD" release node has been assigned a core compatibility term (e.g. 5.x or 6.x) without having a reasonable version string. i really need to finish up http://drupal.org/node/89699.

dww’s picture

Attachment: new, 11.18 KB

in IRC, merlin and i decided we might want the project short name returned in the XML, for example, if you wanted to fetch a bunch, smoosh them together, and parse all at once. furthermore, "name" is rather ambiguous for the full project name (aka "title"). so, here's a new version that calls the full name "title" and adds the short name as "short_name".
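here's roughly what the "fetch a bunch, smoosh them together, parse all at once" case looks like, as a python sketch. title and short_name are the element names from this version of the script; the project root element, the projects wrapper, and the function itself are assumptions for illustration:

```python
import xml.etree.ElementTree as ET


def index_by_short_name(xml_documents):
    """Combine several per-project documents and map short_name -> title."""
    combined = ET.Element("projects")  # hypothetical wrapper element
    for doc in xml_documents:
        combined.append(ET.fromstring(doc))
    # Without short_name in each document there would be no reliable key
    # to tell the combined projects apart.
    return {
        project.findtext("short_name"): project.findtext("title")
        for project in combined.iter("project")
    }
```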

dww’s picture

Status: Needs review » Fixed

committed to HEAD and installed on d.o.

the initial run took a *lot* longer on d.o than on s.d.o: 7 minutes and 5 seconds, instead of ~45 seconds. i guess the old DB server really is orders of magnitude worse than the new one. the second run took 5 minutes and 21 seconds, as opposed to ~15 seconds on s.d.o. yikes.

added a cron job as drupal-cron, too, to re-generate every 6 hours at 25 minutes after the hour. we can tweak that as needed.

Anonymous’s picture

Status: Fixed » Closed (fixed)