Comments

Chris Johnson’s picture

See also issue 48582.

Chris Johnson’s picture

Title: Track project files via an indicator and provide a page (URL) which returns current version » provide module version info from drupal.org

In this issue, we are looking at a simple but effective means for drupal installs to check for available update information.

Under this approach, an install would query a central server (e.g., drupal.org) for the latest available version of a particular module or theme.

To respond to this request, the project module would return a simple text file in php_ini format, e.g.:

md5:93284729238747
url:http://www.drupal.org/project
version:4.7.3

Needed:

* a URL (as a menu item?) to be called via a GET from the client which will return the above data.
* a way to generate a checksum and version number for each file included in a module or theme. The project module already generates checksums on the nightly tarball for each module or theme.

This proposal comes from the Drupal Vancouver conference, and was the consensus suggestion of: Neil Drumm, Jeremy Andrews, Chris Johnson, Nedjo Rogers and Omar Bickell.

This issue is for the server-side code which will be in the project module. Issue 48582 is for the client-side code which will probably be in the system module.

nedjo’s picture

Version: x.y.z » 5.x-1.x-dev
Status: Active » Needs work
StatusFileSize
new2.17 KB

Here, to get the ball rolling, is a very rough idea of how we could do this. Anyone please wade in and make improvements.

Warning: I haven't yet taken the time to understand how we're generating and storing version data, so my guesses here are very likely to be all wrong.

This patch declares a new menu item at, e.g.,:

/node/43/updates

where 43 is the nid of a project node. This is expected to be accessed with an additional argument of a major and minor version, e.g., 5.0:

/node/43/updates/5.0

What is returned is three lines in the format:

patch_version = 1
file_path = http://example.com/files/somefile.tar.gz
file_hash = 93284729238747

where 1 is the highest available 'patch version' for the given major and minor version for that project. If no such version exists, the return value is:

patch_version = null

To use this data, a contrib module would issue a get request, passing data parsed from the VERSION define in system.module.

The contrib module would also need access to the base url for the project, in this case, /node/43. I'm thinking we've put something relevant in the .info files?

On this basis, the contrib module would determine if the highest available version is higher than the currently installed one and respond accordingly.

Issues/questions:

1. I'm forgetting how the arguments fed into menu callbacks work. Likely we don't need to explicitly fetch the $version from an arg.

2. How are the version data actually structured? What exactly should we really be trying to return?

3. Is project_release.module the logical place to put this? I'm thinking so because it's release-related.

dww’s picture

thanks for taking an initial stab at this, nedjo!

sadly, i don't have much time to devote to this right now, since i'm working on 2 critical things before the 5.0 release:
http://drupal.org/node/105192
http://drupal.org/node/105986

however, a few quick replies:

a) if you haven't already, i'd give a thorough read to http://drupal.org/node/94154. instead of every install generating a separate http request for every contrib, the general idea is that a single request for a big list of latest versions would be best.

b) yes, we store the "project" automatically in the .info files. this is the "project short name" from the project node, which in theory always gives you the project page, since we auto-generate the http://drupal.org/project/[foo] aliases. but, the idea was the request would have a list of these "project" names (as distinct from the human-readable name that lives in the "name" field), and their corresponding "version" values.

c) version info is (almost) all in the {project_release_nodes} table. probably all we need to return is the human readable version string ("version"). the additional version info comes from the "Project release compatibility API" taxonomy, which on d.o is called "Drupal core compatibility". but, this stuff is encapsulated in the version string already. i'll be the first to admit it's all a little wonky, and it's not yet really documented anywhere (except in the code), so by all means, ask questions and i'll do the best i can to clear up any confusion.

d) yes, project_release.module is the right place for this stuff.

that's all for now, back to my critical 5.x upgrade tasks for d.o... ;)

thanks!
-derek

nedjo’s picture

Assigned: Unassigned » nedjo
Status: Needs work » Needs review
StatusFileSize
new1.74 KB

As per discussion in http://drupal.org/node/94154, this patch is changed to:

* use xmlrpc
* return data for all projects.

A contrib module implementing this would issue an xmlrpc request as follows for all projects:

  $server = variable_get('project_updates_server', 'http://drupal.org/xmlrpc.php');
  $project = 'all';
  // Version value to be parsed from the VERSION define in system.module.
  $version = '5.0';
  $result = xmlrpc($server, 'project.release.ping', $project, $version);

or for a single project:

  $server = variable_get('project_updates_server', 'http://drupal.org/xmlrpc.php');
  $project = 'views';
  $version = '5.0';
  $result = xmlrpc($server, 'project.release.ping', $project, $version);

Question: how do we determine what is the highest available release for a given major/minor release combination? In the current patch, I've matched the version value through a LIKE where clause and then done a simple sort by version DESC, but this is not likely to work.

dopry’s picture

It would be nice if the data array were better keyed... like with the name form the info files...
and shouldn't the drupal_xmlrpc be project_release_xmlrpc?

Either way this is awesome code brewing future goodness... as soon as this is available on drupal.org it will be awesome... big kudos.

dww’s picture

still no time to look closely, but a few thoughts:

@nedjo: look at the schema for the {project_release_nodes} table for the details on how we store version info. also, if you look at, for example, project_release_project_edit_releases(), you'll see some code that's computing the most recent releases for each "core version" (which is really a taxonomy, we just call it "Drupal core compatibility" on d.o).

@dopry: thanks for looking and sharing your thoughts. however, in general, i'm pretty down on the idea of project*.module parsing .info files. .info files are *totally specific* to drupal.org. we can't hard-code this module to parse those. if we're going to get any data out of .info files, we either need to:
a) put that into the d.o-specific hacks module (project_drupal_org.module or some such).
b) have the packaging script (which is already d.o-specific) take data out of the .info files (it already finds *.info so it can update the version string and module identification correctly) and stuff that into the DB in some generic way.

both approaches require more thought. so, meanwhile, if there's ever a solution that doesn't involve the .info files, that's my preference (though i admit it's lame there are now (at least) 4 unique places you can name your module: the filename in CVS (which determines your hook names), the title of the project node, the "project short name" in the project node, and the "name" in the .info file). at most, that should be 2: machine-readable (filename, hooknames + project short name) vs. human readable (project title and .info file). maybe the packaging script should always overwrite what's in the .info file with the project title. ;) we already enforce that the "project short name" has to match the directory in CVS. so long as the file names match the directory name, we shouldn't get any divergence there...

moshe weitzman’s picture

nice.

i would like to see a bit more summary info about each project in the XML. for example, the author name, author uid, project.module description . if it isn't too hard, taxonomy term names and urls would be nice too.

the motivation here is that other apps will want to consume this data. specifically, one might setup a module which publishes the modules in use at a site and gives the site owner a chance to comment on how he is using them. or a 3rd aprty site might use this data to provide a ratings and reviews service for projects.

merlinofchaos’s picture

Nedjo: Bless you for getting to this!

Comments:

1) why is the callback named drupal_xmlrpc()? Shouldn't it be project_release_xmlrpc?

2) The comment on project_release_ping is wrong.

3) I would not name the function 'ping', I would name it something more descriptive for what you're doing. query_project perhaps.

4) Version data probably needs to be paired with api version data -- while the 'version' string includes this, it'd be easier on the client if it doesn't have to parse this field, and I believe all of the data is available. Or maybe it needs to be parsed, but let's not make the client parse it.

5) Allow the query to request versions only for a specific API version -- any given Drupal site will only want updates for a given version, though something that gets all API versions or a list of API versions would be useful too (i.e, can I upgrade from Drupal 4.7 --> 5)

6) As for additional data we make available, I'm not sure how much we can add that is useful at this time, though I take Moshe's point and project description and taxonomy makes a lot of sense. The taxonomy should just be available on the project node, too.

Paul Natsuo Kishimoto’s picture

Nedjo, this is great—thanks! To add to the discussion:

  • From the perspective of 94154 adding taxonomy & other data multiplies the size of the RPC result, but doesn't help provide a yes/no answer to "Is an updated version available?" If a user is considering a new module for installation, (s)he would be better off reading and understanding the full description on the project's page on d.o, rather than a one-line summary. However, we could add an XMLRPC parameter that selects basic/extended information, and include dependencies in the extra data.
  • I agree with merlinofchaos re: 'ping'. Perhaps 'list' instead?

@dww: I have a 'noob' question -) Is there a historical reason there isn't a hook_version() or something like it for modules?

@merlinofchaos: about your 4) and 5), Nedjo's sample PHP in #5 passes a major.minor version, which maps easily to an API version. I'll have to take a closer look at the contents of the Project DB tables to be sure, but Nedjo's implementation looks right, i.e. accept "5.0", "4.7.3" etc. as a parameter, then use SQL LIKE (or truncate to "5.", "4.7.") to match API versions in the database. .info files contain 4.7.x-1.0/5.x-1.1/5.x-1.2/etc.-style versions, so the client doesn't need parse—merely compare. If the XMLRPC and installed versions don't match, it's guaranteed the version on d.o is newer.

nedjo’s picture

Status: Needs review » Needs work
StatusFileSize
new3.99 KB

Here's an update to fix some of the errors noted and sketch out some of the suggestions. Plus some documentation.

We support an 'all' option for $version as well as $project, to fetch all versions for a given project, or for all projects.

As per Moshe's and Paul's suggestions, we introduce an option to fetch extended data (though I haven't put anything in there yet).

The query fetching the version data is now certainly broken, since the DISTINCT will prevent getting all versions.

Seeing some of the actual data stored in {project_relase_nodes} would be very handy for those of us still struggling to grok this. Could someone post the results of, e.g.,

SELECT * FROM project_release_node WHERE pid = 38878;

for views.module on drupal.org or scratch.drupal.org (views because it has multiple releases per major.minor version), or some similar data?

dww’s picture

+--------+-------+---------------+----------------------+-------------------------------------------+------------+----------------------------------+---------+---------------+---------------+---------------+---------------+
| nid    | pid   | version       | tag                  | file_path                                 | file_date  | file_hash                        | rebuild | version_major | version_minor | version_patch | version_extra |
+--------+-------+---------------+----------------------+-------------------------------------------+------------+----------------------------------+---------+---------------+---------------+---------------+---------------+
|  95897 | 38878 | HEAD          | HEAD                 | files/projects/views-HEAD.tar.gz          | 1168935636 | c0ddae18760ec673d66a1bcbc9b36fdf |       1 |          NULL |          NULL |          NULL | dev           |
|  95898 | 38878 | 4.6.x-1.x-dev | DRUPAL-4-6           | files/projects/views-4.6.x-1.x-dev.tar.gz | 1163377307 | ee17e834c70b4bdf75ab19a9542d8c86 |       1 |             1 |          NULL |          NULL | dev           |
|  95899 | 38878 | 4.7.x-1.x-dev | DRUPAL-4-7           | files/projects/views-4.7.x-1.x-dev.tar.gz | 1168935642 | 65986fdcbc5a44e7d74582aeed4fcf3c |       1 |             1 |          NULL |          NULL | dev           |
|  96979 | 38878 | 5.x-1.0       | DRUPAL-5--1-0        | files/projects/views-5.x-1.0.tar.gz       | 1163237525 | e28a9956eec799519419a4381a4dbcf1 |       0 |             1 |          NULL |             0 | NULL          |
|  96980 | 38878 | 4.7.x-1.0     | DRUPAL-4-7--1-0      | files/projects/views-4.7.x-1.0.tar.gz     | 1163237525 | 50f54bfce65124e790adeb53bc6aa038 |       0 |             1 |          NULL |             0 | NULL          |
|  97825 | 38878 | 5.x-1.1-beta  | DRUPAL-5--1-1-BETA   | files/projects/views-5.x-1.1-beta.tar.gz  | 1163619003 | b5470a54544ab83302e710da1c18551c |       0 |             1 |          NULL |             1 | beta          |
|  97827 | 38878 | 4.7.x-1.1     | DRUPAL-4-7--1-1      | files/projects/views-4.7.x-1.1.tar.gz     | 1163619303 | f6881c97e93c942561fb44198fc618b9 |       0 |             1 |          NULL |             1 | NULL          |
| 101134 | 38878 | 5.x-1.2-beta1 | DRUPAL-5--1-2-BETA-1 | files/projects/views-5.x-1.2-beta1.tar.gz | 1165155303 | 9aacfce74f7741312c165f75005ff270 |       0 |             1 |          NULL |             2 | beta1         |
| 101135 | 38878 | 5.x-1.x-dev   | DRUPAL-5             | files/projects/views-5.x-1.x-dev.tar.gz   | 1165176533 | 6ba58eb7e2bc0f613b6efdb835c7ab7e |       1 |             1 |          NULL |          NULL | dev           |
| 101159 | 38878 | 4.7.x-1.2     | DRUPAL-4-7--1-2      | files/projects/views-4.7.x-1.2.tar.gz     | 1165155306 | c5dad72986d2c30feb0bf888ef4bbe6f |       0 |             1 |          NULL |             2 | NULL          |
| 102419 | 38878 | 5.x-1.3-beta1 | DRUPAL-5--1-3-BETA-1 | files/projects/views-5.x-1.3-beta1.tar.gz | 1165701902 | e8f1456d85a502edc0421e19fa224038 |       0 |             1 |          NULL |             3 | beta1         |
| 102420 | 38878 | 4.7.x-1.3     | DRUPAL-4-7--1-3      | files/projects/views-4.7.x-1.3.tar.gz     | 1165702204 | e3bb550ae97c84088769015a366a22a2 |       0 |             1 |          NULL |             3 | NULL          |
| 104384 | 38878 | 5.x-1.4-rc1   | DRUPAL-5--1-4-RC1    | files/projects/views-5.x-1.4-rc1.tar.gz   | 1166582406 | 4d8ab7f30f8d1de2bc2d52d4686dfe00 |       0 |             1 |          NULL |             4 | rc1           |
| 104385 | 38878 | 4.7.x-1.4     | DRUPAL-4-7--1-4      | files/projects/views-4.7.x-1.4.tar.gz     | 1166582703 | 4d1bb5c5779d04eab50fe2261a0b5e49 |       0 |             1 |          NULL |             4 | NULL          |
| 104450 | 38878 | 5.x-1.4-2rc1  | DRUPAL-5--1-4-2-RC1  | files/projects/views-5.x-1.4-2rc1.tar.gz  | 1166614504 | c430daadbeb8478f8a29a464aec5774a |       0 |             1 |          NULL |             4 | 2rc1          |
| 104452 | 38878 | 4.7.x-1.4-2   | DRUPAL-4-7--1-4-2    | files/projects/views-4.7.x-1.4-2.tar.gz   | 1166614509 | 5fb566ceb796d2c11f59bb6f74dea117 |       0 |             1 |          NULL |             4 | 2             |
+--------+-------+---------------+----------------------+-------------------------------------------+------------+----------------------------------+---------+---------------+---------------+---------------+---------------+
16 rows in set (0.00 sec)

an important thing to remember is that the "Drupal core compatibilty" stuff is all just a taxonomy, so that data lives in {term_node}:

mysql> SELECT p.nid, p.pid, p.version, t.tid FROM project_release_nodes p INNER JOIN term_node t ON p.nid = t.nid WHERE p.pid = 38878;
+--------+-------+---------------+-----+
| nid    | pid   | version       | tid |
+--------+-------+---------------+-----+
|  95898 | 38878 | 4.6.x-1.x-dev |  80 |
|  95899 | 38878 | 4.7.x-1.x-dev |  79 |
|  96979 | 38878 | 5.x-1.0       |  78 |
|  96980 | 38878 | 4.7.x-1.0     |  79 |
|  97825 | 38878 | 5.x-1.1-beta  |  78 |
|  97827 | 38878 | 4.7.x-1.1     |  79 |
| 101134 | 38878 | 5.x-1.2-beta1 |  78 |
| 101135 | 38878 | 5.x-1.x-dev   |  78 |
| 101159 | 38878 | 4.7.x-1.2     |  79 |
| 102419 | 38878 | 5.x-1.3-beta1 |  78 |
| 102420 | 38878 | 4.7.x-1.3     |  79 |
| 104384 | 38878 | 5.x-1.4-rc1   |  78 |
| 104385 | 38878 | 4.7.x-1.4     |  79 |
| 104450 | 38878 | 5.x-1.4-2rc1  |  78 |
| 104452 | 38878 | 4.7.x-1.4-2   |  79 |
+--------+-------+---------------+-----+
15 rows in set (0.00 sec)

make sense?

sorry for the delay, i've been slammed. :( let me know if there's anything else i can do to help clarify what's going on.

thanks!
-derek

p.s. apologies for the nasty formatting. unconed claimed that [code] wasn't going to consume whitespace anymore after the upgrade to 5.x, but it seems to be still doing that. :( alas.

dww’s picture

subset of the columns, hopefully easier to read:

mysql> SELECT nid, version, tag, version_major, version_minor, version_patch, version_extra FROM project_release_nodes WHERE pid = 38878;
+--------+---------------+----------------------+---------------+---------------+---------------+---------------+
| nid    | version       | tag                  | version_major | version_minor | version_patch | version_extra |
+--------+---------------+----------------------+---------------+---------------+---------------+---------------+
|  95897 | HEAD          | HEAD                 |          NULL |          NULL |          NULL | dev           |
|  95898 | 4.6.x-1.x-dev | DRUPAL-4-6           |             1 |          NULL |          NULL | dev           |
|  95899 | 4.7.x-1.x-dev | DRUPAL-4-7           |             1 |          NULL |          NULL | dev           |
|  96979 | 5.x-1.0       | DRUPAL-5--1-0        |             1 |          NULL |             0 | NULL          |
|  96980 | 4.7.x-1.0     | DRUPAL-4-7--1-0      |             1 |          NULL |             0 | NULL          |
|  97825 | 5.x-1.1-beta  | DRUPAL-5--1-1-BETA   |             1 |          NULL |             1 | beta          |
|  97827 | 4.7.x-1.1     | DRUPAL-4-7--1-1      |             1 |          NULL |             1 | NULL          |
| 101134 | 5.x-1.2-beta1 | DRUPAL-5--1-2-BETA-1 |             1 |          NULL |             2 | beta1         |
| 101135 | 5.x-1.x-dev   | DRUPAL-5             |             1 |          NULL |          NULL | dev           |
| 101159 | 4.7.x-1.2     | DRUPAL-4-7--1-2      |             1 |          NULL |             2 | NULL          |
| 102419 | 5.x-1.3-beta1 | DRUPAL-5--1-3-BETA-1 |             1 |          NULL |             3 | beta1         |
| 102420 | 4.7.x-1.3     | DRUPAL-4-7--1-3      |             1 |          NULL |             3 | NULL          |
| 104384 | 5.x-1.4-rc1   | DRUPAL-5--1-4-RC1    |             1 |          NULL |             4 | rc1           |
| 104385 | 4.7.x-1.4     | DRUPAL-4-7--1-4      |             1 |          NULL |             4 | NULL          |
| 104450 | 5.x-1.4-2rc1  | DRUPAL-5--1-4-2-RC1  |             1 |          NULL |             4 | 2rc1          |
| 104452 | 4.7.x-1.4-2   | DRUPAL-4-7--1-4-2    |             1 |          NULL |             4 | 2             |
+--------+---------------+----------------------+---------------+---------------+---------------+---------------+
16 rows in set (0.00 sec)

cheers,
-derek

forngren’s picture

Subscribe.

merlinofchaos’s picture

StatusFileSize
new3.93 KB

I've redone Nedjo's patch. I've actually simplified it a fair bit. Since I think XMLRPC is very picky about arguments, I created two calls; one that takes a string for a single project, and one that takes an array of projects so a Drupal site can get all installed modules' version data at once.

This one returns only the needed data for versioning -- I appreciate Moshe's desire to do more, but I'm concerned about query expense; therefore we should start small and expand to more later. For example, returning taxonomy terms requires an extra query per project.

I'm not enthused about $project == 'all' -- I think that's just too heavy. I believe we should limit it to just projects owned by Drupal.

I don't have a working project install, so this is completely untested. I'm hoping dww can vet it just enough to prove it doesn't break hideously due to typos, and then maybe we can stick it on s.d.o and write a quick client to see what data we get back.

merlinofchaos’s picture

StatusFileSize
new8.91 KB

Minor update.

dww’s picture

thanks for taking another stab at this, merlin!

however, yup, still needs work. ;)

  1. handling of $where is fubar:
    + $where .= "AND p.uri = '%s'";
    ...
    + $where = 'WHERE '. implode(' AND ', $where) : '';
    ...
    + $query .= " WHERE " . implode(' AND ', $where);
  2. nice to see the JOIN {project_release_default_versions}, however, i'm not sure that's what we always want. seems like we need an optional arg for a site to ask "what's the most recent version of views 5.x-2.*?" not just "what's the most recent version of views 5.x for the maintainer's idea of the default series...". so, if the optional arg for "2" isn't specified, we ask {project_release_default_versions}, but if the client wants a specific major version, we should let 'em have it.
  3. lots of the comments are inaccurate. one example is confusion over "API version" (the taxonomy term, e.g. the "5.x" part above) and the "major revision" (the "2" part). another is confusion over 'all' ... some comments say it's supported, others don't (and i'm too tired to tell for sure, but it looks like 'all' is not in fact supported).
  4. should the returned data include a link to the project node? the project homepage property isn't the same thing...
  5. seems to include a bunch of unrelated whitespace-only changes, which i'd like to keep separate if it's not too much of a pain.

thanks again, i have high hopes... this looks quite good.

merlinofchaos’s picture

StatusFileSize
new8.88 KB

Here's an update which fixes some of the blatant typos dww found. Thanks for that review. Much of it was a side effect of not properly adapting the original code.

In fact 'all' is support on projects (with the 'all' keyword no filter gets added on the project table), but I'm hesitant about that feature and feel that maybe we should remove it.

This patch does not address allowing the client to specify the major version number; that's easy if getting one project, but it fundamentally changes the array structure, and that takes a little more thought. I'm guessing the best structure for that would be this:

array(
array('project_name', optional_major_version_number),
);

The structure that initially came to mind looked like this:

array(
'project_name' => version_number,
)

But that makes it difficult to be optional, since 0 is valid, you'd have to put in a NULL, and I find that awkward.

merlinofchaos’s picture

StatusFileSize
new8.99 KB

Whoops. Updated patch fixes the fact that I forgot to join in the project_release_nodes table and was somehow thinking that was the base table.

merlinofchaos’s picture

Status: Needs work » Needs review
StatusFileSize
new4.1 KB

Doh, I missed 4 and 5. This patch fixes.

The whitespace changes are just Komodo doing that automatically. I manually removed those frome arlier patches but forgot later. So let's try again. I'm sure there will be several "one more times" =)

nedjo’s picture

Assigned: nedjo » Unassigned
Status: Needs review » Needs work

Thanks for improving this Earl! Excellent work. In particular you've got a much better handle than I do on how we're handling versioning.

How I read your patch (with a couple of questions on points I'm unclear on):

  1. We have two separate XMLRPC methods, one for either all or a single project and one for an array of specific projects.
  2. In both cases, we feed in a version string representing a version of Drupal (or, potentially, some other software) core. (Do I have this right? Are we instead considering sending version information about the particular module being queried for?)
  3. We build a query to find the 'highest' compatible version of the module or modules being fetched. (Do we return an official release if available, only returning data on a branch if no official release is there? Or do we want to ignore branches altogether, returning data only on official releases?)
  4. We return the module data as an array, including link data (nice!).

A minor tweak needed to the hook_xmlrpc() implementation. The third element of the arrays, where we specify data types, takes the type of the return value as its first element. So instead of

    array('string', 'string'),

we need

    array('array', 'string', 'string'),

and an 'array' in the second part too.

merlinofchaos’s picture

Nedjo: We restrict the query to official releases only. At the moment this won't return a dev release. I believe this is the optimal solution because dev releases are very ephemeral, and I think if this system attempts to report on them it will only cause noise. You can tell if a dev release has been updated only by comparing the filedate, which is a little different.

That said, we *could* try to put something in to get dev releases, and provide filedates...which could lead to sites updating their -dev modules on a roughly nightly basis. Maybe this is a good thing, as it's very close to updating from CVS.

Regarding the hook_xmlrpc implementation: Interesting, I misread the docs on hook_xmlrpc about this. Next patch will include the 'array' there.

Regarding the specific question 1: Yes we have two implementations. I don't KNOW that this is necessary, but I believe xmlrpc can't send a string in place of an array or vice versa, so two methods are necessary. Alternatively, we could reduce it to just one method and require project to always be an array, and if you're getting just one, that's fine. But then, we can't easily have an 'all' keyword very easily.

2) Yes, we're sending the taxonomy term title, which in Drupal's case is 5.x or 4.7.x or 4.6.x, et al. We are considering allowing the client to also send the major version number, so for example I could ask "Tell me what is the most current version of Views 2" as opposed to simply just "most current Views".

Thanks for the commentary, Nedjo! Sorry I answered out of order =)

merlinofchaos’s picture

Status: Needs work » Needs review
StatusFileSize
new4.2 KB

Update based on Nedjo's comments.

While this is in for review, there are still 2 things that are 'pending' in this discussion:

1) client allowing major version to control what version gets returned.
2) Moshe suggested caching as an alternative to 'all'. Using cache_set with a 15 minute expiration time seems perfectly doable, and would reduce system load quite a bit. I'm basically in favor of that as a plan, and will be fairly easy. Does anyone have any thoughts on that?

nedjo’s picture

1) client allowing major version to control what version gets returned.

This seems like a lower priority than getting the basic stuff working. My feeling is we can leave this for a followup patch when we have a strong use case.

2) Moshe suggested caching

Yes, this would save a lot of processing so seems worth doing.

We'll need something to test this so tonight I started to rough in a module to fetch update info from drupal.org. I put it in my sandbox, http://cvs.drupal.org/viewcvs/drupal/contributions/sandbox/nedjo/modules.... It's not working yet, but I'll fix it up when I find time. If anyone wants to apply fixes/changes, pls go ahead. The aim at this point is just to get something that works for testing, not to complete a final module.

dww’s picture

first, i'm just thrilled y'all are working on this and making progress!

re: major versions and a use case... it's pretty clear we have a use case now: there's a certain insanely popular module that already has multiple major revisions compatible with 5.x core: views. if merlin makes a 5.x-2.1 release, and a site is running 5.x-1.3, this code shouldn't tell the site that 5.x-2.1 is the "most recent" release, since they're specifically *not* running the 2.* series yet (presumably, because they want the stable 1.* functionality without all of merlin's crazy ideas and new features for "Views 2"...

that said, i'd be great to see *anything* actually working, so don't feel compelled to completely solve this problem in the first iteration. but, it'd be nice to avoid churning the protocol for this command too much, so if it's possible to at least plan for needing this in some kind of forwards/backwards compatible way, that'd be great, even if implementing it correctly is a little more than we want to deal with in round 1. OTOH, it shouldn't be *that* much more difficult to just get this right in the first place. it's certainly trivial as far as the query is concerned, it's more about how the client asks, it seems.

re: caching: sure, that's great. only question is if we only cache for 'all', or if we attempt to cache per project and/or per query... 'all' seems obvious. not sure if it's worth trying to cache per project or per query (array of projects), but i'm open to the idea. i guess you could just md5hash the array of project names and cache based on that, on the theory that there might be a lot of sites with the same set of modules (think installation profiles/distributions). but even then, it seems like the caching might be overkill... i dunno.

even more than the major revision stuff, i think the caching can be harmlessly added in phase 2, since it won't even effect the protocol at all. it's just an optional optimization on the server side whenever and however we feel like adding it.

anyway, great work on this, folks, and i'm very excited to see real progress being made. i'll have to review merlin's latest patch in more detail when i'm less exhausted. if it seems reasonable, i'll put it on scratch.d.o so we can try out nedjo's client and start to see this in action! ;)

p.s. i just double checked, and i guess there's no 5.x-2.* releases of views, yet. ;) but, there almost certainly will be soon. and once there is, this is a real problem. there's already a DRUPAL-5--2 branch for panels, which is also quite popular (though not as much as views), and i've already got 4.7.x-1.* and 4.7.x-2.* versions of project, project_issue and cvslog (which are obviously much less popular than views, but still). there are currently 21 modules with DRUPAL4-7--2 branches, and 3 with a DRUPAL-5--2 branch, and those numbers are certain to grow at least some as time goes on...

dww’s picture

Status: Needs review » Needs work

this is getting really close. i think i could actually install this on s.d.o now.

however, some nits, mostly cosmetic:

  1. this is unfortunate:
    +    'project.version.data',
    +    'project_release_data',
    

    i'd rather the XMLRPC command name and the function name matched each other (aside from . vs _)

  2. should we support querying based on core compatibility tid, not just term name? term names are subject to change (however unlikely that might be), though tids are forever.
  3. this is a little confusing: $project = db_fetch_object($result), since the query itself is actually returning more about a release than a project. the nid is the release nid, for example. fundamentally, we're getting back releases, that happen to have a little data about their parent project (pid, title, etc), not the other way around. i think i'd prefer to call this variable $release instead of $project. being familiar with project_release code, i did a double-take when i saw: $project->pid a little later, since projects don't normally have pids, but releases do...
  4. on a similar vein, why are we returning the title of the project node, and not the title of the release node? you wanted to return the project name and version string separately, instead of the release title, which contains both, since it's nicer to just display the version string, and you didn't want clients to have to parse, presumably. right?
  5. minor indentation problem around here:
    +    if (!isset($data[$project->uri])) {
    +    $data[$project->uri] = array(
    +      'name' => check_plain($project->title),
    

thanks again for working on this. it'd be super slick to have this solid enough to commit before wednesday night. ;)

cheers,
-derek

dww’s picture

StatusFileSize
new4.26 KB

hehe, and some syntax errors too... that indentation problem i mentioned is partially to blame. ;)
here's a new patch that at least has no php errors, and fixes #1, #3 and #5 from my previous comment.

long-term TODO:
a) caching (at least for 'all')
b) query based on major versions, too (with continued support for not specifying it, and getting the info for whatever major version {project_release_default_versions} says to use)
c) query based on numeric tid, instead of string term name?

however, i just noticed a more serious bug in the current implementation:
INNER JOIN {project_release_default_versions}

not every project has records in this table. i'll grant this is a little confusing, and perhaps its own bug. :( maybe we should just say that *all* projects should maintain an appropriate record in this table for all api_compatibility tids it has a release for, even if there's only 1 major revision with releases for a specific tid. however, until that's the case, we need to LEFT JOIN it, and if it's NULL, do something smart (like ORDER BY version_major or something, to always get the latest release of the highest major version...).

dww’s picture

in spite of the known bugs, i just installed my patch from the previous comment on s.d.o. basic project node navigation and download pages still seem to work (which basically just confirms there are no more syntax errors, since otherwise, this patch isn't touching any existing code). so, nedjo, feel free to start playing with your client code and seeing if you can get any reasonable answers back. ;)

nedjo’s picture

I've been trying to test this, so far without success. I don't know what's expected as a value for $version, I've tried various things but I always get an empty array returned. merlin, dww, can you clear this up?

Here's some testing code (can be run in devel.module's php block on any test site):


$projects = array('views', 'project', 'cck');
$version = VERSION;
$result = xmlrpc('http://scratch.drupal.org/xmlrpc.php', 'project.release.data'. (is_array($projects) ? '.multi' : ''), $projects, $version);
  if ($result === FALSE) {
    drupal_set_message(t('Error %code: %message', array('%code' => xmlrpc_errno(), '%message' => xmlrpc_error_msg())), 'error');
  }
print_r($result);

dww’s picture

StatusFileSize
new4.27 KB

thanks for the client side code, nedjo. i debugged for a little while and found a few major bugs in the SQL query. ;) that's the main reason you're getting empty results...

however, you also want to use one of the names of the "Drupal core compatibility" vocabulary terms. e.g. "5.x" or "4.7.x". VERSION is set to stuff like "5.0", "5.1-dev", etc, and that won't do.

as soon as i upload the patch to this issue, i'll install the fixed copy of the server code on s.d.o... 1 sec.

dww’s picture

StatusFileSize
new4.29 KB

whoops, one more bug that didn't make it into my patch.

this doesn't work:
db_query($query, $join_parameters + $parameters);

for some reason i don't fully understand, you need this:
db_query($query, array_merge($join_parameters, $parameters));

anyway, attached patch solves it, and is now installed on s.d.o.

i just tried this:

$projects = array('views', 'project', 'cck');
$version = '4.7.x';
$result = xmlrpc('http://scratch.drupal.org/xmlrpc.php', 'project.release.data'. (is_array($projects) ? '.multi' : ''), $projects, $version);
if ($result === FALSE) {
  drupal_set_message(t('Error %code: %message', array('%code' => xmlrpc_errno(), '%message' => xmlrpc_error_msg())), 'error');
}
print_r($result);

and got this:

Array
(
    [views] => Array
        (
            [name] => Views
            [version] => 4.7.x-1.5
            [link] => http://scratch.drupal.org/node/
            [download] => http://scratch.drupal.org/
        )

    [cck] => Array
        (
            [name] => Content Construction Kit (CCK)
            [version] => 4.7.x-1.2
            [link] => http://scratch.drupal.org/node/
            [download] => http://scratch.drupal.org/
        )

    [project] => Array
        (
            [name] => Project
            [version] => 4.7.x-1.2
            [link] => http://scratch.drupal.org/node/
            [download] => http://scratch.drupal.org/
          )

)

can i get a WOO HOO?! ;) nice work, folks, we're getting there... the links are clearly broken, i'll take a quick look at that.

dww’s picture

StatusFileSize
new4.3 KB

fixing the links. just silly minor errors in the SQL (not selecting for pid at all, and yet more filepath vs. file_path mixups).

dww’s picture

StatusFileSize
new4.3 KB

hehe, whoops, except i forgot about project_release_download_link(), which, for example, handles the download-from-ftp.osuosl.org link re-writing stuff...

dww’s picture

StatusFileSize
new7.01 KB

except we just want the raw URL, not the html-formated download link. ;)

dww’s picture

StatusFileSize
new7.02 KB

and we want the right # of args for the function. ;)

nedjo’s picture

StatusFileSize
new8.63 KB

Nice! Thanks for all the fixes! This is exciting!

I've updated the draft module in my sandbox and it's displaying some useful information, screenshot attached.

To use the module:

1. install it
2. run cron
3. visit admin/logs/updates (also linked to on modules page)

dww’s picture

StatusFileSize
new7.13 KB

and, since it's so trivial to serve, and potentially extremely useful for clients, include the file date and md5hash in the answer, too.

dww’s picture

ok, after further investigation, i don't think i care about the projects that don't have records in {project_release_default_versions}. there's only 248 of them, and they seem to either have no published releases at all, or only have a HEAD release node, which doesn't know what version of core it's compatible with. in either case, there'd be nothing to report, anyway. so, that concern from comment #27 is hereby off the table. that leaves us with:

a) caching results for 'all'.
b) querying based on a custom major revision.
c) querying based on numeric tid instead of term name.

merlin and i just discussed (c) in IRC, and agree it's a bad idea, so that's out, too. (a) should be trivial, and isn't even required for phase 1. perhaps i'll roll a quick patch for that.

(b) is tricky, since there's no particularly good way to query on arbitrary major revisions for each project in a big array in a single query. so, we'll probably just say that if you use the array of projects or 'all', you get the defaults, and querying for a specific major revision only works if you query a single project. so, at least sites could say "i'm using the defaults for everything, except foo, bar, and baz, which are at "5.x-2" (or whatever), which would generate 4 XML-RPC requests, resulting in 4 queries. that seems like not the end of the world. and, since we're supporting individual project queries anyway, it doesn't seem to hurt to allow the client to optionally specify a major revision to restrict the results to...

merlinofchaos’s picture

StatusFileSize
new4.83 KB

According to chx, we don't need to specify the arguments to xmlrpc *at all* which gives us more flexibility. We can safely do this because we're writing the client, and we have pretty much already checked for sane defaults anyhow.

This version allows for an optional major version -- it must be the same for all projects specified. It also returns a little more data. Hopefully I made no typos!

dww’s picture

Status: Needs work » Needs review
StatusFileSize
new8.04 KB

- merlin's patch wiped out a few changes to project_release_download_link(), which i've restored.
- cleaned up a few comments
- added caching for 'all'

i think that's everything. ;) this no longer needs work... it needs reviews. i'll install on s.d.o as soon as the patch is uploaded.

dww’s picture

StatusFileSize
new8.04 KB

whoops, minor bug in the cache stuff... merlin had renamed $version to $api_version as the function argument (which is a good move), but my cache code didn't notice. now fixed.

dww’s picture

StatusFileSize
new8.08 KB

and if you specify both 'all' and a major version, the cache id is wrong. ;) so, if the $version_major is specified, we now add that to the $cid we construct.

dww’s picture

StatusFileSize
new8.11 KB

some minor typos/bugs in the previous patch. wow, this issue is just tearing through the revisions. ;)

moshe weitzman’s picture

nice work, all.

it occurs to me that perhaps we only want to offer the 'all' version .that would be much kinder on the server, since everyone will receive the same cached document. the downside is a bit more bandwidth consumption.

merlinofchaos’s picture

I don't believe that unserializing an array of 1200 or so arrays of around 8 items apiece will be any friendlier to the system than a query for what will probably be a dozen or two projects. I could be wrong -- we could easily do some benchmarks and see.

moshe weitzman’s picture

@earl - all that unserializing has to be once every 15 mins and then cached as XML - for everyone. the 'give me my own XML' approach has to be built custom for each of drupal's 100,000 sites or however many we'll have for this release. seems like a clear decision to me.

merlinofchaos’s picture

Uh, no, it has to be unserialized when it's pulled from the cache. We're not storing raw XML; the XMLRPC client we have doesn't let us cache at that level. We're storing the array, which is serialized in the cache table. It must be unserialized every time it is hit.

I did run across a bug. It currently believes that Drupal 5.0-beta1.tar.gz is the most current Drupal. This is clearly wrong; not sure what's causing this offhand.

Nedjo, I'm toying with projectinfo a bit; I'm curious why you're not using theme('table') for your table. Also, there's quite a few things I'd like to see out of this. Do you want to create a real project out of this, or would you like me to? You took the initiative to write this; I'll be more than happy to continue it and maintain the project until we can get it into core for Drupal 6, but if you want to I am more than willing to let you. Either way, once we have a project we can discuss the desired feature set there.

nedjo’s picture

Earl, please go ahead and create a project for this and code away. I put it in my sandbox mainly because I wasn't sure what we'd want to call it. I'll be happy to review ideas or contribute. The theming is following for convenience what's in theme_status_report() (and recycling the class system-status-report). Make all the changes you like.

moshe weitzman’s picture

ah, i didn't catch that. thanks.

this brings up the question of why use XMLRPC at all? What is wrong with a regular callback that returns XML. Perhaps the only argument we need is major version of drupal. That can be a typical argument in menu system. Sorry to be a pest here. Feel free to just ignore me.

merlinofchaos’s picture

Well, why parse all the XML ourselves when XMLRPC already does it?

XMLRPC is nice and standard; it'd really suck to create our own protocol just to shave a little bit off.

I may be wrong about the kind of performance hit this will be, but we already have a lot of Drupal sites out there doing reporting, right? That should give us some kind of indication of how many sites are likely to use this sort of feature as well.

In any case, the current module updater module is set to check only once per day; (er, I think; at least, it will be by the time it's released), and it's not set to check at a specific time; which means we should get them spread out over a not fully random distribution but also not all lumped together at 3am or something.

dww’s picture

StatusFileSize
new8.13 KB

@merlin: I did run across a bug. It currently believes that Drupal 5.0-beta1.tar.gz is the most current Drupal. This is clearly wrong; not sure what's causing this offhand.

part of the problem is that we're running on s.d.o, so don't be confused that 5.1 isn't showing up, since that didn't happen until after the last time s.d.o got refreshed from the d.o DB. however, there is a bug in the query, too. we don't deal with version_extra in the ORDER BY, so 5.0 and 5.0-beta1 are equivalent, as far as the query is concerned, and you get semi-random results. however, we can't ORDER BY on extra, since there's no guarantee that the order of releases people choose with their own crazy "beta", "rc", etc conventions, will always work out to a nice clean, alphabetic ordering. other parts of project_release.module and friends just use an ORDER BY on prn.file_date as the tie-breaker if we have 2 otherwise equivalent version strings. so, even though 5.0 and 5.0-beta1 are the same from the major/minor/patch (but not extra) perspective, 5.0 was later, so it has a bigger timestamp value, so it'll win.

@moshe: i mostly agree w/ merlin. it's a pain in the ass to reimplment the wheel, and until we have real benchmarks or data that the performance is a problem, i'd rather avoid a) re-doing any of this work or b) making it any more custom/complicated than it has to be. sure, the XML-RPC will generate some new load. OTOH, the project browsing pages on d.o are some of the most expensive, nasty queries our poor DB server has to handle. if we had a wide deployment of projectinfo.module (or whatever it's going to be called), perhaps that would cut down (significantly?) at how many times/people are constantly trawling through the project browsing pages... ;)

the one thing i disagree with is the "reporting" vs. "project update info querying" crowds. the reporting stuff is tied directly into the drupal.module, and only a small subset actually care to mess with such distributed authentication. furthermore, many sites have a (justifiable) aversion to the "my software phones home to tell its makers about me" functionality. so, i'm guessing it's a relatively tiny % of drupal sites that actually turn on the report-home code. however, *tons* of sites (i daresay *all* of them) will want this updater (which is why this should become part of core for D6, if at all possible). so, i don't think it's fair to say that the relatively small number of sites that report to d.o is any indication of how many people will be using this code once it "hits the shelves".

anyway, attached patch fixes the query as described above.

dww’s picture

FYI: i'm happy with this, and think it's RTBC. unfortunately, i'm not going to mark it as such, nor am i going to commit it just yet, since it seems there's lingering doubt over the performance implications. however, merlinofchaos and nedjo both have commit access, and my blessings to commit this patch whenever they see fit. of course, if this doesn't happen while i'm gone, i'll pick up the torch again when i return. i just don't want anyone to feel like this issue needs to be blocked on any further input from me...

good luck! ;)
-derek

fgm’s picture

StatusFileSize
new1.91 KB

With killes' help, we did some benchmarking from a remote client in php-gtk2 to s.d.o. and got the following results, all averaged over 100 passes.

  • empty request: 430ms
  • normal request, for 1 to 10 modules in array form, for 1 release: 430 (1 module) to 470ms (10 modules)
  • "all" request
    • original version: 3660ms
    • remove the call to unserialize, leaving client to unserialize itself: 2500ms
    • compress and make readable upon cache_set with base64_encode(gzencode(serialize($data), 9, FORCE_GZIP)): 680ms

Deviations from the 450ms per normal request were higher within a series of test than from one series of test with 1 module (up to 670ms at some point) to another with 10 modules, so it appears that a reasonably small number of modules in the request does not significantly alter the response time.

Of course, it is to be expected that requesting a large number of modules will at some point become significant, but up to 10 modules, this does not appear to be the case : external variations of load on the drupal servers appear to be much more significant than the number of modules requested.

Sample test harness is attached. It must be renamed to rpc_test_harness.tgz because d.o. rejects .gz attachments on comments.

merlinofchaos’s picture

Status: Needs review » Reviewed & tested by the community
StatusFileSize
new6.3 KB

With killes' ok, I believe this patch (which includes gzip compression) will be RBTC.

I'd like to get this on scratch to make sure it works and get projectinfo using the right data.

merlinofchaos’s picture

StatusFileSize
new6.26 KB

Er, try this one instead.

merlinofchaos’s picture

StatusFileSize
new6.27 KB

Ok, killes' pointed out a small problem with the previous patch. Also, this version caches for an hour.

moshe weitzman’s picture

Status: Reviewed & tested by the community » Fixed

earl committed this.

Chris Johnson’s picture

Wow! It's great to see this. Nedjo, thanks for resurrecting these ideas and getting into code. Earl and Derek, thanks much for pitching in.

It's really great to see an idea I had several years ago get improved and implemented. I'm sure others had similar ideas independently and in parallel -- I'm not claiming credit here, it's just something I've wanted to see for a long time.

Anonymous’s picture

Status: Fixed » Closed (fixed)