Closed (fixed)
Project:
Project
Version:
5.x-1.x-dev
Component:
Releases
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
8 Feb 2006 at 23:45 UTC
Updated:
2 Mar 2007 at 14:46 UTC
Jump to comment: Most recent file
Comments
Comment #1
Chris Johnson commentedSee also issue 48582.
Comment #2
Chris Johnson commentedIn this issue, we are looking at a simple but effective means for drupal installs to check for available update information.
Under this approach, an install would query a central server (e.g., drupal.org) for the latest available version of a particular module or theme.
To respond to this request, the project module would return a simple text file in php_ini format, e.g.:
md5:93284729238747
url:http://www.drupal.org/project
version:4.7.3
Needed:
* a URL (as a menu item?) to be called via a GET from the client which will return the above data.
* a way to generate a checksum and version number for each file included in a module or theme. The project module already generates checksums on the nightly tarball for each module or theme.
This proposal comes from the Drupal Vancouver conference, and was the consensus suggestion of: Neil Drumm, Jeremy Andrews, Chris Johnson, Nedjo Rogers and Omar Bickell.
This issue is for the server-side code which will be in the project module. Issue 48582 is for the client-side code which will probably be in the system module.
Comment #3
nedjoHere, to get the ball rolling, is a very rough idea of how we could do this. Anyone please wade in and make improvements.
Warning: I haven't yet taken the time to understand how we're generating and storing version data, so my guesses here are very likely to be all wrong.
This patch declares a new menu item at, e.g.,:
/node/43/updates
where 43 is the nid of a project node. This is expected to be accessed with an additional argument of a major and minor version, e.g., 5.0:
/node/43/updates/5.0
What is returned is three lines in the format:
patch_version = 1
file_path = http://example.com/files/somefile.tar.gz
file_hash = 93284729238747
where 1 is the highest available 'patch version' for the given major and minor version for that project. If no such version exists, the return value is:
patch_version = null
To use this data, a contrib module would issue a get request, passing data parsed from the VERSION define in system.module.
The contrib module would also need access to the base url for the project, in this case, /node/43. I'm thinking we've put something relevant in the .info files?
On this basis, the contrib module would determine if the highest available version is higher than the currently installed one and respond accordingly.
Issues/questions:
1. I'm forgetting how the arguments fed into menu callbacks work. Likely we don't need to explicitly fetch the $version from an arg.
2. How are the version data actually structured? What exactly should we really be trying to return?
3. Is project_release.module the logical place to put this? I'm thinking so because it's release-related.
Comment #4
dwwthanks for taking an initial stab at this, nedjo!
sadly, i don't have much time to devote to this right now, since i'm working on 2 critical things before the 5.0 release:
http://drupal.org/node/105192
http://drupal.org/node/105986
however, a few quick replies:
a) if you haven't already, i'd give a thorough read to http://drupal.org/node/94154. instead of every install generating a separate http request for every contrib, the general idea is that a single request for a big list of latest versions would be best.
b) yes, we store the "project" automatically in the .info files. this is the "project short name" from the project node, which in theory always gives you the project page, since we auto-generate the http://drupal.org/project/[foo] aliases. but, the idea was the request would have a list of these "project" names (as distinct from the human-readable name that lives in the "name" field), and their corresponding "version" values.
c) version info is (almost) all in the {project_release_nodes} table. probably all we need to return is the human readable version string ("version"). the additional version info comes from the "Project release compatibility API" taxonomy, which on d.o is called "Drupal core compatibility". but, this stuff is encapsulated in the version string already. i'll be the first to admit it's all a little wonky, and it's not yet really documented anywhere (except in the code), so by all means, ask questions and i'll do the best i can to clear up any confusion.
d) yes, project_release.module is the right place for this stuff.
that's all for now, back to my critical 5.x upgrade tasks for d.o... ;)
thanks!
-derek
Comment #5
nedjoAs per discussion in http://drupal.org/node/94154, this patch is changed to:
* use xmlrpc
* return data for all projects.
A contrib module implementing this would issue an xmlrpc request as follows for all projects:
or for a single project:
Question: how do we determine what is the highest available release for a given major/minor release combination? In the current patch, I've matched the version value through a LIKE where clause and then done a simple sort by version DESC, but this is not likely to work.
Comment #6
dopry commentedIt would be nice if the data array were better keyed... like with the name form the info files...
and shouldn't the drupal_xmlrpc be project_release_xmlrpc?
Either way this is awesome code brewing future goodness... as soon as this is available on drupal.org it will be awesome... big kudos.
Comment #7
dwwstill no time to look closely, but a few thoughts:
@nedjo: look at the schema for the {project_release_nodes} table for the details on how we store version info. also, if you look at, for example,
project_release_project_edit_releases(), you'll see some code that's computing the most recent releases for each "core version" (which is really a taxonomy, we just call it "Drupal core compatibility" on d.o).@dopry: thanks for looking and sharing your thoughts. however, in general, i'm pretty down on the idea of project*.module parsing .info files. .info files are *totally specific* to drupal.org. we can't hard-code this module to parse those. if we're going to get any data out of .info files, we either need to:
a) put that into the d.o-specific hacks module (project_drupal_org.module or some such).
b) have the packaging script (which is already d.o-specific) take data out of the .info files (it already finds *.info so it can update the version string and module identification correctly) and stuff that into the DB in some generic way.
both approaches require more thought. so, meanwhile, if there's ever a solution that doesn't involve the .info files, that's my preference (though i admit it's lame there are now (at least) 4 unique places you can name your module: the filename in CVS (which determines your hook names), the title of the project node, the "project short name" in the project node, and the "name" in the .info file). at most, that should be 2: machine-readable (filename, hooknames + project short name) vs. human readable (project title and .info file). maybe the packaging script should always overwrite what's in the .info file with the project title. ;) we already enforce that the "project short name" has to match the directory in CVS. so long as the file names match the directory name, we shouldn't get any divergence there...
Comment #8
moshe weitzman commentednice.
i would like to see a bit more summary info about each project in the XML. for example, the author name, author uid, project.module description . if it isn't too hard, taxonomy term names and urls would be nice too.
the motivation here is that other apps will want to consume this data. specifically, one might setup a module which publishes the modules in use at a site and gives the site owner a chance to comment on how he is using them. or a 3rd aprty site might use this data to provide a ratings and reviews service for projects.
Comment #9
merlinofchaos commentedNedjo: Bless you for getting to this!
Comments:
1) why is the callback named drupal_xmlrpc()? Shouldn't it be project_release_xmlrpc?
2) The comment on project_release_ping is wrong.
3) I would not name the function 'ping', I would name it something more descriptive for what you're doing. query_project perhaps.
4) Version data probably needs to be paired with api version data -- while the 'version' string includes this, it'd be easier on the client if it doesn't have to parse this field, and I believe all of the data is available. Or maybe it needs to be parsed, but let's not make the client parse it.
5) Allow the query to request versions only for a specific API version -- any given Drupal site will only want updates for a given version, though something that gets all API versions or a list of API versions would be useful too (i.e, can I upgrade from Drupal 4.7 --> 5)
6) As for additional data we make available, I'm not sure how much we can add that is useful at this time, though I take Moshe's point and project description and taxonomy makes a lot of sense. The taxonomy should just be available on the project node, too.
Comment #10
Paul Natsuo Kishimoto commentedNedjo, this is great—thanks! To add to the discussion:
@dww: I have a 'noob' question -) Is there a historical reason there isn't a hook_version() or something like it for modules?
@merlinofchaos: about your 4) and 5), Nedjo's sample PHP in #5 passes a major.minor version, which maps easily to an API version. I'll have to take a closer look at the contents of the Project DB tables to be sure, but Nedjo's implementation looks right, i.e. accept "5.0", "4.7.3" etc. as a parameter, then use SQL LIKE (or truncate to "5.", "4.7.") to match API versions in the database. .info files contain 4.7.x-1.0/5.x-1.1/5.x-1.2/etc.-style versions, so the client doesn't need parse—merely compare. If the XMLRPC and installed versions don't match, it's guaranteed the version on d.o is newer.
Comment #11
nedjoHere's an update to fix some of the errors noted and sketch out some of the suggestions. Plus some documentation.
We support an 'all' option for $version as well as $project, to fetch all versions for a given project, or for all projects.
As per Moshe's and Paul's suggestions, we introduce an option to fetch extended data (though I haven't put anything in there yet).
The query fetching the version data is now certainly broken, since the DISTINCT will prevent getting all versions.
Seeing some of the actual data stored in {project_relase_nodes} would be very handy for those of us still struggling to grok this. Could someone post the results of, e.g.,
SELECT * FROM project_release_node WHERE pid = 38878;
for views.module on drupal.org or scratch.drupal.org (views because it has multiple releases per major.minor version), or some similar data?
Comment #12
dwwan important thing to remember is that the "Drupal core compatibilty" stuff is all just a taxonomy, so that data lives in {term_node}:
make sense?
sorry for the delay, i've been slammed. :( let me know if there's anything else i can do to help clarify what's going on.
thanks!
-derek
p.s. apologies for the nasty formatting. unconed claimed that [code] wasn't going to consume whitespace anymore after the upgrade to 5.x, but it seems to be still doing that. :( alas.
Comment #13
dwwsubset of the columns, hopefully easier to read:
cheers,
-derek
Comment #14
forngren commentedSubscribe.
Comment #15
merlinofchaos commentedI've redone Nedjo's patch. I've actually simplified it a fair bit. Since I think XMLRPC is very picky about arguments, I created two calls; one that takes a string for a single project, and one that takes an array of projects so a Drupal site can get all installed modules' version data at once.
This one returns only the needed data for versioning -- I appreciate Moshe's desire to do more, but I'm concerned about query expense; therefore we should start small and expand to more later. For example, returning taxonomy terms requires an extra query per project.
I'm not enthused about $project == 'all' -- I think that's just too heavy. I believe we should limit it to just projects owned by Drupal.
I don't have a working project install, so this is completely untested. I'm hoping dww can vet it just enough to prove it doesn't break hideously due to typos, and then maybe we can stick it on s.d.o and write a quick client to see what data we get back.
Comment #16
merlinofchaos commentedMinor update.
Comment #17
dwwthanks for taking another stab at this, merlin!
however, yup, still needs work. ;)
+ $where .= "AND p.uri = '%s'";
...
+ $where = 'WHERE '. implode(' AND ', $where) : '';
...
+ $query .= " WHERE " . implode(' AND ', $where);
thanks again, i have high hopes... this looks quite good.
Comment #18
merlinofchaos commentedHere's an update which fixes some of the blatant typos dww found. Thanks for that review. Much of it was a side effect of not properly adapting the original code.
In fact 'all' is support on projects (with the 'all' keyword no filter gets added on the project table), but I'm hesitant about that feature and feel that maybe we should remove it.
This patch does not address allowing the client to specify the major version number; that's easy if getting one project, but it fundamentally changes the array structure, and that takes a little more thought. I'm guessing the best structure for that would be this:
array(
array('project_name', optional_major_version_number),
);
The structure that initially came to mind looked like this:
array(
'project_name' => version_number,
)
But that makes it difficult to be optional, since 0 is valid, you'd have to put in a NULL, and I find that awkward.
Comment #19
merlinofchaos commentedWhoops. Updated patch fixes the fact that I forgot to join in the project_release_nodes table and was somehow thinking that was the base table.
Comment #20
merlinofchaos commentedDoh, I missed 4 and 5. This patch fixes.
The whitespace changes are just Komodo doing that automatically. I manually removed those frome arlier patches but forgot later. So let's try again. I'm sure there will be several "one more times" =)
Comment #21
nedjoThanks for improving this Earl! Excellent work. In particular you've got a much better handle than I do on how we're handling versioning.
How I read your patch (with a couple of questions on points I'm unclear on):
A minor tweak needed to the
hook_xmlrpc()implementation. The third element of the arrays, where we specify data types, takes the type of the return value as its first element. So instead ofarray('string', 'string'),we need
array('array', 'string', 'string'),and an 'array' in the second part too.
Comment #22
merlinofchaos commentedNedjo: We restrict the query to official releases only. At the moment this won't return a dev release. I believe this is the optimal solution because dev releases are very ephemeral, and I think if this system attempts to report on them it will only cause noise. You can tell if a dev release has been updated only by comparing the filedate, which is a little different.
That said, we *could* try to put something in to get dev releases, and provide filedates...which could lead to sites updating their -dev modules on a roughly nightly basis. Maybe this is a good thing, as it's very close to updating from CVS.
Regarding the hook_xmlrpc implementation: Interesting, I misread the docs on hook_xmlrpc about this. Next patch will include the 'array' there.
Regarding the specific question 1: Yes we have two implementations. I don't KNOW that this is necessary, but I believe xmlrpc can't send a string in place of an array or vice versa, so two methods are necessary. Alternatively, we could reduce it to just one method and require project to always be an array, and if you're getting just one, that's fine. But then, we can't easily have an 'all' keyword very easily.
2) Yes, we're sending the taxonomy term title, which in Drupal's case is 5.x or 4.7.x or 4.6.x, et al. We are considering allowing the client to also send the major version number, so for example I could ask "Tell me what is the most current version of Views 2" as opposed to simply just "most current Views".
Thanks for the commentary, Nedjo! Sorry I answered out of order =)
Comment #23
merlinofchaos commentedUpdate based on Nedjo's comments.
While this is in for review, there are still 2 things that are 'pending' in this discussion:
1) client allowing major version to control what version gets returned.
2) Moshe suggested caching as an alternative to 'all'. Using cache_set with a 15 minute expiration time seems perfectly doable, and would reduce system load quite a bit. I'm basically in favor of that as a plan, and will be fairly easy. Does anyone have any thoughts on that?
Comment #24
nedjoThis seems like a lower priority than getting the basic stuff working. My feeling is we can leave this for a followup patch when we have a strong use case.
Yes, this would save a lot of processing so seems worth doing.
We'll need something to test this so tonight I started to rough in a module to fetch update info from drupal.org. I put it in my sandbox, http://cvs.drupal.org/viewcvs/drupal/contributions/sandbox/nedjo/modules.... It's not working yet, but I'll fix it up when I find time. If anyone wants to apply fixes/changes, pls go ahead. The aim at this point is just to get something that works for testing, not to complete a final module.
Comment #25
dwwfirst, i'm just thrilled y'all are working on this and making progress!
re: major versions and a use case... it's pretty clear we have a use case now: there's a certain insanely popular module that already has multiple major revisions compatible with 5.x core: views. if merlin makes a 5.x-2.1 release, and a site is running 5.x-1.3, this code shouldn't tell the site that 5.x-2.1 is the "most recent" release, since they're specifically *not* running the 2.* series yet (presumably, because they want the stable 1.* functionality without all of merlin's crazy ideas and new features for "Views 2"...
that said, i'd be great to see *anything* actually working, so don't feel compelled to completely solve this problem in the first iteration. but, it'd be nice to avoid churning the protocol for this command too much, so if it's possible to at least plan for needing this in some kind of forwards/backwards compatible way, that'd be great, even if implementing it correctly is a little more than we want to deal with in round 1. OTOH, it shouldn't be *that* much more difficult to just get this right in the first place. it's certainly trivial as far as the query is concerned, it's more about how the client asks, it seems.
re: caching: sure, that's great. only question is if we only cache for 'all', or if we attempt to cache per project and/or per query... 'all' seems obvious. not sure if it's worth trying to cache per project or per query (array of projects), but i'm open to the idea. i guess you could just md5hash the array of project names and cache based on that, on the theory that there might be a lot of sites with the same set of modules (think installation profiles/distributions). but even then, it seems like the caching might be overkill... i dunno.
even more than the major revision stuff, i think the caching can be harmlessly added in phase 2, since it won't even effect the protocol at all. it's just an optional optimization on the server side whenever and however we feel like adding it.
anyway, great work on this, folks, and i'm very excited to see real progress being made. i'll have to review merlin's latest patch in more detail when i'm less exhausted. if it seems reasonable, i'll put it on scratch.d.o so we can try out nedjo's client and start to see this in action! ;)
p.s. i just double checked, and i guess there's no 5.x-2.* releases of views, yet. ;) but, there almost certainly will be soon. and once there is, this is a real problem. there's already a DRUPAL-5--2 branch for panels, which is also quite popular (though not as much as views), and i've already got 4.7.x-1.* and 4.7.x-2.* versions of project, project_issue and cvslog (which are obviously much less popular than views, but still). there are currently 21 modules with DRUPAL4-7--2 branches, and 3 with a DRUPAL-5--2 branch, and those numbers are certain to grow at least some as time goes on...
Comment #26
dwwthis is getting really close. i think i could actually install this on s.d.o now.
however, some nits, mostly cosmetic:
i'd rather the XMLRPC command name and the function name matched each other (aside from . vs _)
$project = db_fetch_object($result), since the query itself is actually returning more about a release than a project. the nid is the release nid, for example. fundamentally, we're getting back releases, that happen to have a little data about their parent project (pid, title, etc), not the other way around. i think i'd prefer to call this variable$releaseinstead of$project. being familiar with project_release code, i did a double-take when i saw:$project->pida little later, since projects don't normally have pids, but releases do...thanks again for working on this. it'd be super slick to have this solid enough to commit before wednesday night. ;)
cheers,
-derek
Comment #27
dwwhehe, and some syntax errors too... that indentation problem i mentioned is partially to blame. ;)
here's a new patch that at least has no php errors, and fixes #1, #3 and #5 from my previous comment.
long-term TODO:
a) caching (at least for 'all')
b) query based on major versions, too (with continued support for not specifying it, and getting the info for whatever major version {project_release_default_versions} says to use)
c) query based on numeric tid, instead of string term name?
however, i just noticed a more serious bug in the current implementation:
INNER JOIN {project_release_default_versions}not every project has records in this table. i'll grant this is a little confusing, and perhaps its own bug. :( maybe we should just say that *all* projects should maintain an appropriate record in this table for all api_compatibility tids it has a release for, even if there's only 1 major revision with releases for a specific tid. however, until that's the case, we need to LEFT JOIN it, and if it's NULL, do something smart (like ORDER BY version_major or something, to always get the latest release of the highest major version...).
Comment #28
dwwin spite of the known bugs, i just installed my patch from the previous comment on s.d.o. basic project node navigation and download pages still seem to work (which basically just confirms there are no more syntax errors, since otherwise, this patch isn't touching any existing code). so, nedjo, feel free to start playing with your client code and seeing if you can get any reasonable answers back. ;)
Comment #29
nedjoI've been trying to test this, so far without success. I don't know what's expected as a value for $version, I've tried various things but I always get an empty array returned. merlin, dww, can you clear this up?
Here's some testing code (can be run in devel.module's php block on any test site):
Comment #30
dwwthanks for the client side code, nedjo. i debugged for a little while and found a few major bugs in the SQL query. ;) that's the main reason you're getting empty results...
however, you also want to use one of the names of the "Drupal core compatibility" vocabulary terms. e.g. "5.x" or "4.7.x". VERSION is set to stuff like "5.0", "5.1-dev", etc, and that won't do.
as soon as i upload the patch to this issue, i'll install the fixed copy of the server code on s.d.o... 1 sec.
Comment #31
dwwwhoops, one more bug that didn't make it into my patch.
this doesn't work:
db_query($query, $join_parameters + $parameters);for some reason i don't fully understand, you need this:
db_query($query, array_merge($join_parameters, $parameters));anyway, attached patch solves it, and is now installed on s.d.o.
i just tried this:
and got this:
can i get a WOO HOO?! ;) nice work, folks, we're getting there... the links are clearly broken, i'll take a quick look at that.
Comment #32
dwwfixing the links. just silly minor errors in the SQL (not selecting for pid at all, and yet more filepath vs. file_path mixups).
Comment #33
dwwhehe, whoops, except i forgot about
project_release_download_link(), which, for example, handles the download-from-ftp.osuosl.org link re-writing stuff...Comment #34
dwwexcept we just want the raw URL, not the html-formated download link. ;)
Comment #35
dwwand we want the right # of args for the function. ;)
Comment #36
nedjoNice! Thanks for all the fixes! This is exciting!
I've updated the draft module in my sandbox and it's displaying some useful information, screenshot attached.
To use the module:
1. install it
2. run cron
3. visit admin/logs/updates (also linked to on modules page)
Comment #37
dwwand, since it's so trivial to serve, and potentially extremely useful for clients, include the file date and md5hash in the answer, too.
Comment #38
dwwok, after further investigation, i don't think i care about the projects that don't have records in {project_release_default_versions}. there's only 248 of them, and they seem to either have no published releases at all, or only have a HEAD release node, which doesn't know what version of core it's compatible with. in either case, there'd be nothing to report, anyway. so, that concern from comment #27 is hereby off the table. that leaves us with:
a) caching results for 'all'.
b) querying based on a custom major revision.
c) querying based on numeric tid instead of term name.
merlin and i just discussed (c) in IRC, and agree it's a bad idea, so that's out, too. (a) should be trivial, and isn't even required for phase 1. perhaps i'll roll a quick patch for that.
(b) is tricky, since there's no particularly good way to query on arbitrary major revisions for each project in a big array in a single query. so, we'll probably just say that if you use the array of projects or 'all', you get the defaults, and querying for a specific major revision only works if you query a single project. so, at least sites could say "i'm using the defaults for everything, except foo, bar, and baz, which are at "5.x-2" (or whatever), which would generate 4 XML-RPC requests, resulting in 4 queries. that seems like not the end of the world. and, since we're supporting individual project queries anyway, it doesn't seem to hurt to allow the client to optionally specify a major revision to restrict the results to...
Comment #39
merlinofchaos commentedAccording to chx, we don't need to specify the arguments to xmlrpc *at all* which gives us more flexibility. We can safely do this because we're writing the client, and we have pretty much already checked for sane defaults anyhow.
This version allows for an optional major version -- it must be the same for all projects specified. It also returns a little more data. Hopefully I made no typos!
Comment #40
dww- merlin's patch wiped out a few changes to project_release_download_link(), which i've restored.
- cleaned up a few comments
- added caching for 'all'
i think that's everything. ;) this no longer needs work... it needs reviews. i'll install on s.d.o as soon as the patch is uploaded.
Comment #41
dwwwhoops, minor bug in the cache stuff... merlin had renamed $version to $api_version as the function argument (which is a good move), but my cache code didn't notice. now fixed.
Comment #42
dwwand if you specify both 'all' and a major version, the cache id is wrong. ;) so, if the $version_major is specified, we now add that to the $cid we construct.
Comment #43
dwwsome minor typos/bugs in the previous patch. wow, this issue is just tearing through the revisions. ;)
Comment #44
moshe weitzman commentednice work, all.
it occurs to me that perhaps we only want to offer the 'all' version .that would be much kinder on the server, since everyone will receive the same cached document. the downside is a bit more bandwidth consumption.
Comment #45
merlinofchaos commentedI don't believe that unserializing an array of 1200 or so arrays of around 8 items apiece will be any friendlier to the system than a query for what will probably be a dozen or two projects. I could be wrong -- we could easily do some benchmarks and see.
Comment #46
moshe weitzman commented@earl - all that unserializing has to be once every 15 mins and then cached as XML - for everyone. the 'give me my own XML' approach has to be built custom for each of drupal's 100,000 sites or however many we'll have for this release. seems like a clear decision to me.
Comment #47
merlinofchaos commentedUh, no, it has to be unserialized when it's pulled from the cache. We're not storing raw XML; the XMLRPC client we have doesn't let us cache at that level. We're storing the array, which is serialized in the cache table. It must be unserialized every time it is hit.
I did run across a bug. It currently believes that Drupal 5.0-beta1.tar.gz is the most current Drupal. This is clearly wrong; not sure what's causing this offhand.
Nedjo, I'm toying with projectinfo a bit; I'm curious why you're not using theme('table') for your table. Also, there's quite a few things I'd like to see out of this. Do you want to create a real project out of this, or would you like me to? You took the initiative to write this; I'll be more than happy to continue it and maintain the project until we can get it into core for Drupal 6, but if you want to I am more than willing to let you. Either way, once we have a project we can discuss the desired feature set there.
Comment #48
nedjoEarl, please go ahead and create a project for this and code away. I put it in my sandbox mainly because I wasn't sure what we'd want to call it. I'll be happy to review ideas or contribute. The theming is following for convenience what's in
theme_status_report()(and recycling the class system-status-report). Make all the changes you like.Comment #49
moshe weitzman commentedah, i didn't catch that. thanks.
this brings up the question of why use XMLRPC at all? What is wrong with a regular callback that returns XML. Perhaps the only argument we need is major version of drupal. That can be a typical argument in menu system. Sorry to be a pest here. Feel free to just ignore me.
Comment #50
merlinofchaos commentedWell, why parse all the XML ourselves when XMLRPC already does it?
XMLRPC is nice and standard; it'd really suck to create our own protocol just to shave a little bit off.
I may be wrong about the kind of performance hit this will be, but we already have a lot of Drupal sites out there doing reporting, right? That should give us some kind of indication of how many sites are likely to use this sort of feature as well.
In any case, the current module updater module is set to check only once per day; (er, I think; at least, it will be by the time it's released), and it's not set to check at a specific time; which means we should get them spread out over a not fully random distribution but also not all lumped together at 3am or something.
Comment #51
dww@merlin: I did run across a bug. It currently believes that Drupal 5.0-beta1.tar.gz is the most current Drupal. This is clearly wrong; not sure what's causing this offhand.
part of the problem is that we're running on s.d.o, so don't be confused that 5.1 isn't showing up, since that didn't happen until after the last time s.d.o got refreshed from the d.o DB. however, there is a bug in the query, too. we don't deal with version_extra in the ORDER BY, so 5.0 and 5.0-beta1 are equivalent, as far as the query is concerned, and you get semi-random results. however, we can't ORDER BY on extra, since there's no guarantee that the order of releases people choose with their own crazy "beta", "rc", etc conventions, will always work out to a nice clean, alphabetic ordering. other parts of project_release.module and friends just use an ORDER BY on prn.file_date as the tie-breaker if we have 2 otherwise equivalent version strings. so, even though 5.0 and 5.0-beta1 are the same from the major/minor/patch (but not extra) perspective, 5.0 was later, so it has a bigger timestamp value, so it'll win.
@moshe: i mostly agree w/ merlin. it's a pain in the ass to reimplment the wheel, and until we have real benchmarks or data that the performance is a problem, i'd rather avoid a) re-doing any of this work or b) making it any more custom/complicated than it has to be. sure, the XML-RPC will generate some new load. OTOH, the project browsing pages on d.o are some of the most expensive, nasty queries our poor DB server has to handle. if we had a wide deployment of projectinfo.module (or whatever it's going to be called), perhaps that would cut down (significantly?) at how many times/people are constantly trawling through the project browsing pages... ;)
the one thing i disagree with is the "reporting" vs. "project update info querying" crowds. the reporting stuff is tied directly into the drupal.module, and only a small subset actually care to mess with such distributed authentication. furthermore, many sites have a (justifiable) aversion to the "my software phones home to tell its makers about me" functionality. so, i'm guessing it's a relatively tiny % of drupal sites that actually turn on the report-home code. however, *tons* of sites (i daresay *all* of them) will want this updater (which is why this should become part of core for D6, if at all possible). so, i don't think it's fair to say that the relatively small number of sites that report to d.o is any indication of how many people will be using this code once it "hits the shelves".
anyway, attached patch fixes the query as described above.
Comment #52
dwwFYI: i'm happy with this, and think it's RTBC. unfortunately, i'm not going to mark it as such, nor am i going to commit it just yet, since it seems there's lingering doubt over the performance implications. however, merlinofchaos and nedjo both have commit access, and my blessings to commit this patch whenever they see fit. of course, if this doesn't happen while i'm gone, i'll pick up the torch again when i return. i just don't want anyone to feel like this issue needs to be blocked on any further input from me...
good luck! ;)
-derek
Comment #53
fgmWith killes' help, we did some benchmarking from a remote client in php-gtk2 to s.d.o. and got the following results, all averaged over 100 passes.
Deviations from the 450ms per normal request were higher within a series of test than from one series of test with 1 module (up to 670ms at some point) to another with 10 modules, so it appears that a reasonably small number of modules in the request does not significantly alter the response time.
Of course, it is to be expected that requesting a large number of modules will at some point become significant, but up to 10 modules, this does not appear to be the case : external variations of load on the drupal servers appear to be much more significant than the number of modules requested.
Sample test harness is attached. It must be renamed to rpc_test_harness.tgz because d.o. rejects .gz attachments on comments.
Comment #54
merlinofchaos commentedWith killes' ok, I believe this patch (which includes gzip compression) will be RBTC.
I'd like to get this on scratch to make sure it works and get projectinfo using the right data.
Comment #55
merlinofchaos commentedEr, try this one instead.
Comment #56
merlinofchaos commentedOk, killes' pointed out a small problem with the previous patch. Also, this version caches for an hour.
Comment #57
moshe weitzman commentedearl committed this.
Comment #58
Chris Johnson commentedWow! It's great to see this. Nedjo, thanks for resurrecting these ideas and getting into code. Earl and Derek, thanks much for pitching in.
It's really great to see an idea I had several years ago get improved and implemented. I'm sure others had similar ideas independently and in parallel -- I'm not claiming credit here, it's just something I've wanted to see for a long time.
Comment #59
(not verified) commented