Add possibility to retrieve a list of projects from the server [#157514]

Here is a patch which adds the ability to retrieve a list of projects from the server. This kind of feature is needed for my Google Summer of Code project (the localization server), so I can keep a list of projects synchronized for localization.

The attached patch builds on the *release* XML generation system:

- adds a new type of XML output, which lists all projects with their short names, titles and project page URIs
- writes an XML file with this data into the release XML space

For this to work, I needed to modify the release XML storage code slightly, so proper error messages are used. I also invented the special 'project-list' project to retrieve this listing, and an 'all' core compatibility (which, as discussed with dww, would be used for other things later too).

This approach makes the output XML directly servable by the already working project XML server script, since requesting the 'project-list' pseudo-project with the 'all' core compatibility returns the list. Possible problems with this approach:

- This script generates XML files for releases, and lives in the project/releases subproject. Having a list of *projects* here (kind of unrelated to releases) is not 100% logical. (I also submitted this feature request against the releases component, because the patch affects that part.)
- I sneaked this list into the project release XML namespace, so the project list is a kind of easter egg: it provides different information in a different format. This is also not 100% logical.

The code should work, but we should discuss these points so that this service ends up in a proper place, both in the source code and in the output.

Comment	File	Size	Author
#52	157514_project_release_xml_not_suck_51.patch	2.6 KB	aclight
#39	project-list-all-20080411.patch	11.35 KB	gábor hojtsy
#36	project-list-all-proper-username.patch	11.29 KB	gábor hojtsy
#34	project-list-all-fixed.patch	11.29 KB	gábor hojtsy
#33	project-list-all-reroll.patch	10.5 KB	gábor hojtsy
#28	project-list-all.patch	9.63 KB	gábor hojtsy
#24	mw_67.patch	9.78 KB	moshe weitzman
#22	mw_66.patch	9.47 KB	moshe weitzman
#20	mw_65.patch	3.32 KB	moshe weitzman
#12	project-list-allxml.patch_2.txt	6.54 KB	dww
#7	project-list-allxml.patch	6.51 KB	gábor hojtsy
#3	project_all_list.patch_2.txt	6.07 KB	dww
#2	project_all_list.patch_1.txt	5.85 KB	dww
	project_all_list.patch	5.75 KB	gábor hojtsy

Comments

Comment #1

gábor hojtsy

he/him

Hungarian

Hungary

commented 7 July 2007 at 12:08

BTW the patch also fixes a bug: the code had "@old, $new" in an error message, where it should have been "@old, @new".

Comment #2

dww

we/he/they

commented 13 July 2007 at 01:25

Status:

Needs review

» Needs work

Status	File	Size
new	project_all_list.patch_1.txt	5.85 KB

Yeah, the code looks good. However, I agree it is a little bit weird to put the project list into the "release-history" tree and related code. On the other hand, it's a little unfortunate to duplicate some of the code from project-release-create-history.php (and from project-release-serve-history.php, for that matter).

So, I'm quite torn on the Right(tm) approach to this. From the elegance standpoint, I agree that fetching this XML project list from something like:

http://updates.drupal.org/project-list

makes more sense. On the other hand, if we included more release-specific information in here, such as the list of compatibility terms that a given project has releases for, then it might make more sense to keep it under release-history. It's still a little weird that it's got a different XML "schema" (not sure if that's the right terminology), but at least it'd still be related to releases in a more obvious way.

So, I'm setting this to "needs work". There are 2 possible paths back to "needs review":

1. Move it completely out of modules/project/release and "release-history". Put a project-create-xml.inc file at modules/project that's shared by both project-release-create-history.php and project-create-list.php for any common code we can share. Write a new thin project-serve-list.php script for modules/project to live at htdocs/project-list. Again, consider if there's code we want to share (maybe not, since it's so small, and most of the code in project-release-serve-history.php is to handle the query args and eventually to store usage data, so it's probably not worth the hassle of a .inc file for a tiny few shared functions).
2. Add some release-specific data to the generated project-list XML file, e.g. compatibility terms (perhaps with total # of release nodes for each term?) and leave it where it is. I guess I'm ok with "project-list" as the project name, though any bright ideas on better alternatives would be welcome.

I ended up renaming "project-release-history.php" to "project-release-create-history.php" so it fits more nicely with "project-release-serve-history.php". So, attached patch is just a re-roll of yours relative to the new file (and removes some offsets that snuck in from a few other recent commits to this script). That should be a starting point for approach #2 if that's how you'd like to proceed. As I said, I'm torn on which way is better, so I'm leaving this open for consideration/discussion.

Thanks,
-Derek

Comment #3

dww

commented 13 July 2007 at 01:29

Status	File	Size
new	project_all_list.patch_2.txt	6.07 KB

Committed a fix to that watchdog() typo you mentioned in your original post (http://drupal.org/cvs?commit=73326), so here's a re-roll that still applies cleanly.

Comment #4

moshe weitzman commented 31 July 2007 at 20:49

subscribe - i need this too for a module rating demo that i am fooling with. i might try to edit the patch as suggested. we'll see.

Comment #5

moshe weitzman commented 1 August 2007 at 06:24

what are "compatibility terms"? how might they be represented in the XML? any other release data?

Comment #6

dww

commented 1 August 2007 at 14:58

"compatibility terms" are the taxonomy terms that specify what version of core a release is compatible with.
For example, the "5.x" term is here: http://drupal.org/taxonomy/term/78

The "5.x" is part of the URL used to fetch project history:
http://updates.drupal.org/release-history/og/5.x

That file contains info for all releases of og compatible with 5.x core. Since core versions are never cross compatible, any site running a given version of core can only possibly care about releases for that version.

If you look at that XML file, you'll see how we currently represent this:

<api_version>5.x</api_version>
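
The URL scheme described above can be sketched as follows. This is only an illustration in Python, not code from the project module: `release_history_url` is a hypothetical helper, and the base URL is simply the one used in the examples in this thread. The same pattern covers a real project and the proposed 'project-list' pseudo-project with the 'all' compatibility.

```python
from urllib.parse import quote

# Base URL taken from the examples in this thread.
BASE_URL = "http://updates.drupal.org/release-history"


def release_history_url(project: str, api_version: str) -> str:
    """Build the release-history URL for a project short name and a core compatibility term."""
    # quote() guards against characters that are not URL-safe; short names
    # such as "og" and terms such as "5.x" pass through unchanged.
    return f"{BASE_URL}/{quote(project, safe='')}/{quote(api_version, safe='')}"


# Hypothetical helper name; the two calls mirror URLs mentioned in this issue.
print(release_history_url("og", "5.x"))
print(release_history_url("project-list", "all"))
```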

Let me know if you have other questions. I'm still not sure which of the two options in comment #2 is the right way to go, so input on that is also welcome.

Thanks,
-Derek

Comment #7

gábor hojtsy

commented 6 August 2007 at 23:09

Status:

Needs work

» Needs review

Status	File	Size
new	project-list-allxml.patch	6.51 KB

The patch did not apply anymore, so first I rerolled it. Then added core compatibility terms, with the IMHO expected api_versions container tag and api_version list item tag (designed after the release XML structure).

The number of releases for a given project might be useful, but what would be most valuable for me is the date of the last updated release, to be used as a kind of if-modified-since check in the localization server. I did not include this one yet, because I'd prefer reviews of this patch first, if possible.

Comment #8

moshe weitzman commented 7 August 2007 at 00:29

I have always used format_xml_elements() to construct XML. It does all the nice escaping that's needed. If we don't have user input here, we could get away without it, but it is the best practice.

I would appreciate having the username of the project nid author.

Comment #9

gábor hojtsy

commented 7 August 2007 at 16:47

I opted to not rewrite what is in there, but extend current functionality with current practices.

It would be great to come up with a sensible tag/format to include the username.

Comment #10

moshe weitzman commented 7 August 2007 at 19:39

you can see from http://drupal.org/rss.xml that our standard way of listing username in nodes is <dc:creator>bertboerland@www.drop.org</dc:creator>. i suggest we stick with that for now.

Comment #11

dww

commented 8 August 2007 at 01:27

Status:

Needs review

» Needs work

@moshe: <dc:creator>, huh? What's the "dc:" part stand for? ;)

@gabor:

A) I tried running this on d.o and it doesn't work. It doesn't generate any <api_version> tags for any projects. Oh, and I see why... there's an SQL error in there. :( Notice the extra ',' at the end of your SELECT clause, before the FROM.

B) This isn't needed: INNER JOIN {vocabulary} v ON td.vid = v.vid -- {term_data} (which you're already JOIN'ing on) already has the vid, and it doesn't look like you're using {vocabulary} for anything else.

C) I'd add an ORDER BY td.weight ASC in there to order the API compatibility terms in a deterministic way.

D) The implementation itself is clean in the sense of code re-use and simplicity. However, it seems rather evil in terms of performance:

- The *entire* XML array for all projects is constructed in RAM. This hasn't yet been a problem with the per-project stuff, since those files are relatively small XML files, but the one for the project-list is 445K. Keep in mind that the whole query result objects are in RAM, too, so the total footprint of this is going to be quite large.
- This generates ~2000 separate queries to get all the API terms for each project in a rather tight loop. Seems like it wouldn't be that hard to just get everything for the project-list in the initial query? I'm not sure which is worse for d.o -- tons of really small queries or one big one.
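
As a rough illustration of the "one big query" alternative, here is a minimal sketch in Python with an in-memory SQLite database standing in for the real one. The table names follow the ones used in this issue, but the schema is heavily simplified, the sample rows and the vocabulary id are made up, and `api_versions_per_project` is a hypothetical helper. It says nothing about which option is actually cheaper on d.o.

```python
import sqlite3
from collections import defaultdict

API_VID = 1  # hypothetical vocabulary id for the API compatibility terms

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the Drupal tables; the rows are invented sample data.
conn.executescript("""
    CREATE TABLE project_release_nodes (nid INTEGER, pid INTEGER);
    CREATE TABLE term_node (nid INTEGER, tid INTEGER);
    CREATE TABLE term_data (tid INTEGER, vid INTEGER, name TEXT, weight INTEGER);
    INSERT INTO project_release_nodes VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO term_data VALUES (78, 1, '5.x', 1), (79, 1, '4.7.x', 2);
    INSERT INTO term_node VALUES (10, 78), (11, 79), (12, 78);
""")


def api_versions_per_project(conn):
    """Fetch the API terms of all projects with one query, then group them in memory."""
    rows = conn.execute(
        """
        SELECT DISTINCT prn.pid, td.name, td.weight
        FROM project_release_nodes prn
        INNER JOIN term_node tn ON prn.nid = tn.nid
        INNER JOIN term_data td ON tn.tid = td.tid
        WHERE td.vid = ?
        ORDER BY prn.pid, td.weight ASC
        """,
        (API_VID,),
    )
    versions = defaultdict(list)
    for pid, name, _weight in rows:
        versions[pid].append(name)
    return dict(versions)


print(api_versions_per_project(conn))
```

The grouping step replaces the per-project loop of queries; the trade-off is that the full result set of the single query has to be walked in one pass.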

Comment #12

dww

commented 8 August 2007 at 01:31

Status:

Needs work

» Needs review

Status	File	Size
new	project-list-allxml.patch_2.txt	6.54 KB

This patch fixes A-C. D is still an open question. Of course, the username, etc, isn't in there, yet (though we can always add to the XML schema later).

This works, as evidenced by: http://updates.drupal.org/release-history/project-list/all

I think I'd be ok with committing this to CVS and deploying it via cron on d.o, but I'd like a 2nd opinion on D before I do.

Cheers,
-Derek

Comment #13

gábor hojtsy

commented 8 August 2007 at 09:47

Great, thanks! We should indeed discuss performance issues here. Of course, if we removed the API terms and made this a project list only, the file size would be cut in half, and the query performance would get a boost. We added this feature in the first place to make this 'release history' compliant, but my project will drop this data right away. In case we don't see a realistic need for this data apart from elegance, we can indeed move this code out of the release history part and optimize elsewhere.

While looking at this list, I also noticed that I'll need to blacklist several project types (stub projects like "Drupal.org webmasters", translation projects themselves, and so on). It is not easy to tell from this data what type of project is in question. If that works for you, my needs would be better served with an XML which contains information about what type of project we are looking at. So I'd add even more data to the XML to reflect this (evil grin).

BTW DC refers to Dublin Core metadata, and would introduce namespaces in this XML file.

Comment #14

dww

commented 8 August 2007 at 15:59

Status:

Needs review

» Needs work

Right, I realize I'm the idiot who wanted the API compatibility terms in the list in the first place. ;) Those comments should have been @dww as much as @gabor. ;)

That said, I do think they're useful, and this script only runs every 6 hours, not on every page load, so the bad performance isn't *that* big a deal...

E) Looking at this more, I'm not sure about the <link> tag. Perhaps we should give the node/[nid] version, instead? These direct project aliases have been causing all sorts of grief and URL namespace conflicts. But, I guess they're so entrenched, they're here to stay, so I guess it's ok. However, there are a handful of very old projects that have never been edited since my fix went in that updates these aliases automatically (previously, it was all manual effort by d.o admins), so a handful of projects don't have aliases that work. The nid is sure to always work, so I'm leaning towards that... Of course users of this data that want to present "human-readable" links can always construct them based on the "short_name" field, which is how the URL aliases are built in the first place.

F) Including the project type taxonomy identifier (much like we do with the release type taxonomy terms) would be great.

G) There's no handling of unpublished projects at all. Either they should be excluded from the list in the first place (probably) or at the very least indicated as such in the XML (which is what we do for unpublished release nodes, in case update_status wants to print a really strong warning that "this release is no longer available on drupal.org and is therefore unsupported...", etc). Including the <status> in the project list doesn't make as much sense to me, I think it'd be better to just restrict the query for the list of projects to (n.status = 1) in the first place.

H) If we wanted to get crazy, we could also include tags that indicate if releases are enabled for the project or not.

I) If we *really* wanted the performance to get terrible, we could include tags about the project's issue queue (which is technically optional as well), for example, the current count of issues in each state, etc, etc. ;)

J) We still have Moshe's request for info about the current owner of the project node.

So, setting this to needs work for E, F, and G. H, and I might never happen, or could be moved to another issue. I'm still unclear about J -- do we just want the username, probably we also want their uid and/or link to their /user/[uid] page? Do we just want the project node owner or do we want to include everyone with CVS access?

Comment #15

moshe weitzman commented 8 August 2007 at 15:59

Status:

Needs work

» Needs review

I think release info is quite nice in this feed, so i propose we keep it and optimize if the strain hurts drupal.org or another user of this feature. I don't *have* to have it, so I am OK with either outcome.

Comment #16

dww

commented 8 August 2007 at 16:16

I just realized how confusing this might be: "So, setting this to needs work for E, F, and G. H, and I might never happen, or could be moved to another issue.". ;) Let me try again:

So, I'm setting this issue to needs work for (E), (F), and (G). Points (H) and (I) might never happen, or could be moved to another issue.

;)

Comment #17

dww

commented 8 August 2007 at 16:18

Status:

Needs review

» Needs work

Whoops, and I missed that moshe and I replied simultaneously and he clobbered my "needs work"... ;) Ahh, the joys of the issue queue first thing in the morning.

Comment #18

moshe weitzman commented 8 August 2007 at 16:39

i was thinking that we have username and link elements which show the username and URL *for just the project owner*.

Comment #19

gábor hojtsy

commented 8 August 2007 at 20:09

@dww:

(E): I think the link is better. You give reasons yourself, why it cannot be automated to generate the link from the short_name. There are lots of projects in the list without aliases.
(F): Sure.
(G): IMHO we just should not list unpublished stuff, that's it.
(H): Why? :)
(I): I don't need this, and I don't think anyone else would either. This file will get big enough to download and especially to *parse* anyway.
(J): Yes, I also think Moshe meant the owner, definitely not all contributors.

I don't have time right now to work on an improved patch unfortunately, but will have time tomorrow (it is 22:00 here).

Comment #20

moshe weitzman commented 27 August 2007 at 15:56

Status:

Needs work

» Needs review

Status	File	Size
new	mw_65.patch	3.32 KB

This patch resolves all outstanding items:

E: we include a LINK whose value is determined by url()
F: i added these category elements to both individual project xml files and the new site-wide one
G: don't include unpublished projects
J: added owner of project node as dc:creator - is consistent with node_rss_item(). i added to both individual project xml files and the new site-wide one

I generated the patch from all the way down to /sites/all/modules/project/release.

Comment #21

hunmonk commented 27 August 2007 at 21:29

Status:

Needs review

» Needs work

call me crazy, but this doesn't look like the right patch at all...

Comment #22

moshe weitzman commented 28 August 2007 at 00:13

Status	File	Size
new	mw_66.patch	9.47 KB

yikes. those are the lines i had to comment out to get the drupalorg_testing profile to finish on php5. anyway, here is the right patch.

Comment #23

hunmonk commented 28 August 2007 at 03:00

- spacing needs to be fixed for n.uid=u.uid in a couple of spots.
- we reordered the args in project_release_history_write_xml() -- seems like we should reorder the Doxygen accordingly.
- why are we sanitizing output here: $xml .= ' <title>'. check_plain($project->title) ."</title>\n";, but not here: $xml .= "<category domain=\"$taxonomy_url\">$term->name</category>\n";. overall it looks like output filtering is missing in several places.

Comment #24

moshe weitzman commented 29 August 2007 at 02:45

Status:

Needs work

» Needs review

Status	File	Size
new	mw_67.patch	9.78 KB

fixed all items on hunmonk's list. good call on the output filtering. i added it in a couple of places.

Comment #25

gábor hojtsy

commented 30 November 2007 at 18:54

dww: any issues with this patch?

Comment #26

dww

commented 30 November 2007 at 20:08

Status:

Needs review

» Needs work

Sorry, totally fell off my radar. Yes, there are a few issues. There are a couple of potential XSS bugs in this hunk:

@@ -174,6 +174,12 @@ function project_release_history_generat
   $xml .= '<title>'. check_plain($project->title) ."</title>\n";
   $xml .= '<short_name>'. check_plain($project->uri) ."</short_name>\n";
   $xml .= '<link>'. url("node/$project->nid", NULL, NULL, TRUE) ."</link>\n";
+  $xml .= "<dc:creator>$project->name</dc:creator>\n";
+  $terms =  taxonomy_node_get_terms_by_vocabulary($project->nid, _project_get_vid());
+  foreach ($terms as $term) {
+    $taxonomy_url = url("taxonomy/$term->tid", NULL, NULL, TRUE);
+    $xml .= "<category domain=\"$taxonomy_url\">$term->name</category>\n";
+  }
   $xml .= '<api_version>'. check_plain($api_version) ."</api_version>\n";
   $xml .= "<default_major>$project->major</default_major>\n";
   $xml .= "<releases>\n";

In particular:

A) $project->name

B) $term->name

One other problem:

C) This URL is bogus: url("taxonomy/$term->tid"), should be taxonomy/term/$term->tid. This is broken both in the above hunk and down in project_list_generate().

That's all I can spot right now, there might be more...
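
To make points A) and B) concrete, here is a minimal sketch of escaping every dynamic value before it goes into the XML. It is written in Python with `xml.sax.saxutils` as a stand-in for Drupal's check_plain(); `creator_and_category_xml` is a hypothetical helper and the sample values are invented, so treat it as an illustration of the idea rather than the fix for the patch.

```python
from xml.sax.saxutils import escape, quoteattr


def creator_and_category_xml(username: str, term_name: str, taxonomy_url: str) -> str:
    """Emit the <dc:creator> and <category> lines with every dynamic value escaped."""
    lines = [
        # escape() converts &, < and > so user-supplied names cannot inject markup.
        f"<dc:creator>{escape(username)}</dc:creator>",
        # quoteattr() escapes the value and adds the surrounding quotes.
        f"<category domain={quoteattr(taxonomy_url)}>{escape(term_name)}</category>",
    ]
    return "\n".join(lines) + "\n"


# Invented sample values, chosen to contain characters that need escaping.
print(creator_and_category_xml("a<b>&c", "Modules & themes", "http://example.com/taxonomy/term/1"))
```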

Comment #27

adrian commented 23 January 2008 at 14:00

subscribing.

i'm going to need this for Hostmaster

Comment #28

gábor hojtsy

commented 20 February 2008 at 20:50

Status:

Needs work

» Needs review

Status	File	Size
new	project-list-all.patch	9.63 KB

@dww: I went through all your suggestions, and fixed those. Interestingly, the "all projects" list had these XSS holes covered in the patch already. Strange. I also implemented some coding style fixes (whitespace, quotes, etc), but otherwise did not find issues. Back for review again. (Patch rolled from project module root, unlike Moshe's previous patch).

Comment #29

dww

commented 7 March 2008 at 17:51

Status:

Needs review

» Needs work

Code looked ok on visual inspection, so I put it on p.d.o and ran it. Unfortunately, the resulting XML is invalid:

http://project.drupal.org/release-history/project-list/all

error on line 7 at column 65: Namespace prefix dc on creator is not defined
error on line 27 at column 339: Namespace prefix dc on creator is not defined
error on line 34 at column 420: Namespace prefix dc on creator is not defined
error on line 41 at column 501: Namespace prefix dc on creator is not defined
error on line 58 at column 731: Namespace prefix dc on creator is not defined
error on line 75 at column 961: Namespace prefix dc on creator is not defined
error on line 89 at column 1139: Namespace prefix dc on creator is not defined
error on line 106 at column 1369: Namespace prefix dc on creator is not defined
error on line 123 at column 1599: Namespace prefix dc on creator is not defined
error on line 134 at column 1729: Namespace prefix dc on creator is not defined
error on line 149 at column 1923: Namespace prefix dc on creator is not defined
error on line 165 at column 2137: Namespace prefix dc on creator is not defined
error on line 184 at column 2399: Namespace prefix dc on creator is not defined
error on line 198 at column 2581: Namespace prefix dc on creator is not defined
error on line 212 at column 2759: Namespace prefix dc on creator is not defined
error on line 228 at column 2973: Namespace prefix dc on creator is not defined
error on line 245 at column 3215: Namespace prefix dc on creator is not defined
error on line 264 at column 3477: Namespace prefix dc on creator is not defined
error on line 281 at column 3707: Namespace prefix dc on creator is not defined
error on line 299 at column 3953: Namespace prefix dc on creator is not defined
error on line 314 at column 4147: Namespace prefix dc on creator is not defined
error on line 333 at column 4409: Namespace prefix dc on creator is not defined
error on line 347 at column 4601: Namespace prefix dc on creator is not defined
error on line 363 at column 4815: Namespace prefix dc on creator is not defined
error on line 379 at column 5029: Namespace prefix dc on creator is not defined

Comment #30

gábor hojtsy

commented 8 March 2008 at 02:16

The dc namespace should be defined at the beginning of the XML for that. I'll look into this a bit later if nobody beats me to it.
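
The effect of the missing declaration can be reproduced with any namespace-aware parser. Below is a minimal sketch using Python's xml.etree.ElementTree as a stand-in. Declaring the prefix on the root `<projects>` element is an assumption made for the example, and `parses` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Dublin Core elements namespace URI.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Same content twice: once without and once with the dc prefix declared.
without_ns = "<projects><project><dc:creator>Drupal</dc:creator></project></projects>"
with_ns = (
    f'<projects xmlns:dc="{DC_NS}">'
    "<project><dc:creator>Drupal</dc:creator></project></projects>"
)


def parses(xml_text: str) -> bool:
    """Return True if a namespace-aware parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses(without_ns))  # the undeclared dc prefix is rejected
print(parses(with_ns))

# Once declared, the element is addressed by its full namespace URI.
creator = ET.fromstring(with_ns).find(f"project/{{{DC_NS}}}creator")
print(creator.text)
```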

Also, it looks like the links are quite bad. I don't know whether that is because of "issues" with (or the nature of) project.drupal.org or bugs in the code:

<projects>
 <project>
  <title>Drupal</title>
  <short_name>drupal</short_name>
  <link>http://project.drupal.org/var/www/project.drupal.org/htdocs/sites/all/modules/project/release/project/drupal</link>
  <dc:creator>Drupal</dc:creator>
<category domain="http://project.drupal.org/var/www/project.drupal.org/htdocs/sites/all/modules/project/release/taxonomy/term/13">Drupal project</category>
  <api_versions>
   <api_version>7.x</api_version>
   <api_version>6.x</api_version>
   <api_version>5.x</api_version>
   <api_version>4.7.x</api_version>
   <api_version>4.6.x</api_version>
   <api_version>4.5.x</api_version>
   <api_version>4.4.x</api_version>
   <api_version>4.3.x</api_version>
   <api_version>4.2.x</api_version>
   <api_version>4.1.x</api_version>
   <api_version>4.0.x</api_version>
  </api_versions>
 </project>

Comment #31

dww

commented 8 March 2008 at 08:38

Thanks for looking into the dc namespace thing.
The bogus links are just from how I ran it on p.d.o, don't worry about that.

Cheers,
-Derek

Comment #32

dww

commented 8 March 2008 at 16:07

Note to whoever works on this next: I committed #204140: Modify project-release-create-history.php to use {project_release_supported_versions} so this needs a re-roll to deal with conflicts, too. Thanks.

Comment #33

gábor hojtsy

commented 20 March 2008 at 10:54

Status:

Needs work

» Needs review

Status	File	Size
new	project-list-all-reroll.patch	10.5 KB

OK, here is a reroll. It should fix the missing Dublin Core namespace and the conflicts. Reviewing the code I (still) have some concerns:

- user name information (dc:creator) crept into this patch above, but has really nothing to do with listing projects
- category information for project nodes is in the patch, but uses different tags than the release category (e.g. security, bug fix, new features) listing

The user name functionality is not in scope for this issue, although some of the issues revolved around that part of the patch. I need the category information to exclude non-code (e.g. translation, DROP, etc.) projects from the l10n_server listings.

I need project listing information for l10n_server, and high level category information for projects, and that's it. I don't even need project release information in the project list. That was included so that we can put this into the release listing script. However, the project release listing XML schema changed a lot with http://drupal.org/node/204140 being committed, so we probably need to take a fresh look at that as well. Again, I don't need the release information directly there, and it makes this generated XML very huuuge, so I am fine with removing that as well, keeping an even simpler project list. That's what *I* need and am willing to work on further. Also, some of the schema changes simply need to be copied over; for example, project status information (published, unpublished, etc.) would be great in the project list.

Comment #34

gábor hojtsy

commented 20 March 2008 at 11:48

Status	File	Size
new	project-list-all-fixed.patch	11.29 KB

Updated version with:

- aliased user 'name' to 'user_name' to remove ambiguity with project title
- carried release tag schema over to project tag listing ('terms' and 'term' with 'name' and 'value' instead of 'category' with 'domain'), this also removes the link to the category, which was not represented in the release terms
- carried over project_status information to project list based on project node 'status' flag

Kept the DC creator intact. Still to discuss the api_versions list in the project list (at the end of patch), but otherwise should be good to go IMHO.

Comment #35

dries commented 21 March 2008 at 08:12

Tiny detail: in Drupal we use 'username' instead of 'user_name'.

Comment #36

gábor hojtsy

commented 25 March 2008 at 16:05

Status	File	Size
new	project-list-all-proper-username.patch	11.29 KB

- Patch resolving Dries' issue with user_name vs username. Otherwise no change in the patch!
- Checked the API versions concern I had. The patch has project data like this:

 <project>
  <title>Drupal</title>
  ...
  <api_versions>
   <api_version>7.x</api_version>
   <api_version>6.x</api_version>
   <api_version>5.x</api_version>
   <api_version>4.7.x</api_version>
   <api_version>4.6.x</api_version>
   <api_version>4.5.x</api_version>
   <api_version>4.4.x</api_version>
   <api_version>4.3.x</api_version>
   <api_version>4.2.x</api_version>
   <api_version>4.1.x</api_version>
   <api_version>4.0.x</api_version>
  </api_versions>
 </project>
...
 <project>
  <title>Authentication</title>
  ...
  <api_versions>
   <api_version>4.2.x</api_version>
   <api_version>4.1.x</api_version>
   <api_version>4.0.x</api_version>
  </api_versions>
 </project>
 <project>
  <title>Bbcode</title>
  ...
  <api_versions>
   <api_version>5.x</api_version>
   <api_version>4.7.x</api_version>
   <api_version>4.6.x</api_version>
   <api_version>4.5.x</api_version>
   <api_version>4.4.x</api_version>
   <api_version>4.3.x</api_version>
   <api_version>4.2.x</api_version>
  </api_versions>
 </project>

This is compared to data output for a specific project:

<title>Bbcode</title>
<short_name>bbcode</short_name>
<api_version>6.x</api_version>
<recommended_major>1</recommended_major>
<supported_majors>1</supported_majors>
<default_major>1</default_major>
<project_status>published</project_status>
<link>http://drupal.org/project/bbcode</link>
<releases>
 ...
</releases>

So this looks consistent with the current project tag usage, with api_versions wrapping the api_version tags in the "all projects" listing.

All-in-all I hope this should be good to go!
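
For anyone consuming the list, the api_versions wrapper is easy to walk. The following is a minimal sketch of a client in Python, assuming the structure shown above and a `<projects>` root element. The sample document is trimmed from the output quoted in this comment, and both helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed sample modelled on the listing above; elided elements are left out.
SAMPLE = """
<projects>
 <project>
  <title>Authentication</title>
  <api_versions>
   <api_version>4.2.x</api_version>
   <api_version>4.1.x</api_version>
   <api_version>4.0.x</api_version>
  </api_versions>
 </project>
 <project>
  <title>Bbcode</title>
  <api_versions>
   <api_version>5.x</api_version>
   <api_version>4.7.x</api_version>
  </api_versions>
 </project>
</projects>
"""


def api_versions_by_title(xml_text: str) -> dict:
    """Map each project title to the list of API versions it has releases for."""
    root = ET.fromstring(xml_text.strip())
    return {
        project.findtext("title"): [v.text for v in project.findall("api_versions/api_version")]
        for project in root.findall("project")
    }


def projects_supporting(xml_text: str, api_version: str) -> list:
    """Return the titles of projects that list the given API version."""
    return [
        title
        for title, versions in api_versions_by_title(xml_text).items()
        if api_version in versions
    ]


print(api_versions_by_title(SAMPLE))
print(projects_supporting(SAMPLE, "5.x"))
```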

Comment #37

gábor hojtsy

commented 11 April 2008 at 14:10

Anything else I should do about this patch? As I wrote above 3 weeks ago, "All-in-all I hope this should be good to go!".

Comment #38

aclight commented 11 April 2008 at 14:33

Status:

Needs review

» Needs work

+      $term_query = db_query("SELECT DISTINCT(td.tid), td.name AS term_name FROM {project_release_nodes} prn INNER JOIN {term_node} tn ON prn.nid = tn.nid INNER JOIN {term_data} td ON tn.tid = td.tid WHERE prn.pid = %d AND td.vid = %d ORDER BY td.weight ASC", $project->nid);

Doesn't this line (from project_list_generate()) need one more argument for db_query()? There are two %d placeholders but just one argument.
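
The mismatch is the kind of thing a strict database layer rejects outright. Here is a minimal sketch using Python's sqlite3 as a stand-in, with `?` placeholders instead of `%d`, a simplified table, one invented sample row, and a hypothetical `run` helper. Drupal's db_query() may well behave differently when an argument is missing, so this only illustrates the counting problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for {term_data}; the row is invented sample data.
conn.execute("CREATE TABLE term_data (tid INTEGER, vid INTEGER, name TEXT)")
conn.execute("INSERT INTO term_data VALUES (78, 1, '5.x')")

# Two placeholders, so two arguments are required.
QUERY = "SELECT name FROM term_data WHERE tid = ? AND vid = ?"


def run(params):
    """Run QUERY with the given arguments, returning rows or the binding error message."""
    try:
        return conn.execute(QUERY, params).fetchall()
    except sqlite3.ProgrammingError as exc:
        return f"error: {exc}"


print(run((78,)))    # one argument for two placeholders is rejected
print(run((78, 1)))  # both placeholders bound
```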

Comment #39

gábor hojtsy

commented 11 April 2008 at 15:40

Status:

Needs work

» Needs review

Status	File	Size
new	project-list-all-20080411.patch	11.35 KB

Oh, right. We need the $api_vid = _project_release_get_api_vid() code. Added.

Comment #40

aclight commented 12 April 2008 at 16:09

Status:

Needs review

» Needs work

I tested this and indeed it does create XML. The code in the patch looks good to me.

As for the XML generated here, it doesn't seem to be valid XML, at least according to http://www.stg.brown.edu/service/xmlvalid/

I don't know much about XML, so maybe it's not a problem that this doesn't validate or the site I was using is not a good choice for validation testing.

One thing I noticed about the XML output for individual projects is that it looks like you need to indent most of it by two additional spaces. Here's some example output:

<project xmlns:dc="http://purl.org/dc/elements/1.1/">
<title>Project issue tracking</title>
<short_name>project_issue</short_name>
<dc:creator>site1</dc:creator>
  <terms>
   <term><name>Project types</name><value>Modules</value></term>
   <term><name>Project types</name><value>Developer</value></term>
  </terms>
<api_version>5.x</api_version>
<recommended_major>1</recommended_major>
<supported_majors>1,2</supported_majors>
<default_major>1</default_major>
<project_status>published</project_status>
<link>http://localhostrgdrupal/project/project_issue</link>
<releases>
  [major snip]
 <release>
  <name>project_issue 5.x-0.1-beta</name>
  <version>5.x-0.1-beta</version>
  <tag>DRUPAL-5--0-1-BETA</tag>
  <version_major>0</version_major>
  <version_minor>0</version_minor>
  <version_patch>1</version_patch>
  <version_extra>beta</version_extra>
  <status>published</status>
  <release_link>http://localhostrgdrupal/node/140</release_link>
  <download_link>http://localhostrgdrupal/files/project/project_issue-5.x-0.1-beta.tar.gz</download_link>
  <date>1169599520</date>
  <mdhash>8854aac14c1a6ed2a5b5f10add93f87e</mdhash>
  <terms>
   <term><name>Release type</name><value>New features</value></term>
   <term><name>Release type</name><value>Bug fixes</value></term>
   <term><name>Release type</name><value>Security update</value></term>
  </terms>
 </release>
</releases>
</project>

Seems to me that everything between <project> and </project> should be indented extra.

Setting to CNW for the indentation issue. If I'm wrong there, that's fine.

Comment #41

gábor hojtsy

commented 14 April 2008 at 08:35

The indentation itself should not make any XML invalid or valid. Do you have a copy of what that site said about the invalidity of this XML?
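
A minimal sketch of the indentation point, using Python's xml.etree.ElementTree as a stand-in parser (not the validator mentioned in #40): the same elements with and without indentation reduce to the same structure once whitespace-only text is ignored. The `structure` helper is hypothetical and the sample is cut down to two elements.

```python
import xml.etree.ElementTree as ET

# The same two elements, once on a single line and once indented.
flat = "<project><title>Bbcode</title><short_name>bbcode</short_name></project>"
indented = """<project>
  <title>Bbcode</title>
  <short_name>bbcode</short_name>
</project>"""


def structure(element):
    """Reduce an element to (tag, stripped text, children), ignoring whitespace-only text."""
    return (
        element.tag,
        (element.text or "").strip(),
        [structure(child) for child in element],
    )


# Both documents parse, and they describe the same tree.
same = structure(ET.fromstring(flat)) == structure(ET.fromstring(indented))
print(same)
```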

Comment #42

aclight commented 14 April 2008 at 11:03

Sorry, I wasn't meaning to imply that the indentation had anything to do with the XML not validating. Those were two separate issues in my mind.

As for the validation errors, I pasted the XML from #40 into the validation engine I mentioned in #40, and here was the result:

Errors:

line 1, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: project 
line 1, [user-supplied text]:
    error (1203): attribute can't be checked because element is not defined: project 
line 2, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: title 
line 2, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 2, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: title 
line 2, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: title 
line 3, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: short_name 
line 3, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 3, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: short_name 
line 3, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: short_name 
line 4, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: dc:creator 
line 4, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: dc:creator 
line 4, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: dc:creator 
line 4, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 5, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: terms 
line 6, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: term 
line 6, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: term 
line 6, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: value 
line 6, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: value 
line 6, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 6, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: value 
line 6, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: name 
line 6, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: name 
line 6, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 6, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: name 
line 6, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: term 
line 7, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: term 
line 7, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: term 
line 7, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: value 
line 7, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: value 
line 7, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 7, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: value 
line 7, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: name 
line 7, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: name 
line 7, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 7, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: name 
line 7, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: term 
line 8, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: terms 
line 8, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: terms 
line 9, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: api_version 
line 9, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: api_version 
line 9, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 9, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: api_version 
line 10, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: recommended_major 
line 10, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 10, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: recommended_major 
line 10, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: recommended_major 
line 11, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: supported_majors 
line 11, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 11, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: supported_majors 
line 11, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: supported_majors 
line 12, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: default_major 
line 12, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: default_major 
line 12, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: default_major 
line 12, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 13, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: project_status 
line 13, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 13, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: project_status 
line 13, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: project_status 
line 14, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: link 
line 14, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: link 
line 14, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 14, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: link 
line 15, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: releases 
line 17, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 17, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: release 
line 18, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: name 
line 18, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 18, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: name 
line 18, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: name 
line 19, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: version 
line 19, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: version 
line 19, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: version 
line 19, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 20, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: tag 
line 20, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 20, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: tag 
line 20, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: tag 
line 21, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: version_major 
line 21, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: version_major 
line 21, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: version_major 
line 21, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 22, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: version_minor 
line 22, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 22, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: version_minor 
line 22, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: version_minor 
line 23, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: version_patch 
line 23, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 23, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: version_patch 
line 23, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: version_patch 
line 24, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: version_extra 
line 24, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: version_extra 
line 24, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: version_extra 
line 24, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 25, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: status 
line 25, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 25, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: status 
line 25, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: status 
line 26, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: release_link 
line 26, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: release_link 
line 26, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 26, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: release_link 
line 27, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: download_link 
line 27, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 27, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: download_link 
line 27, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: download_link 
line 28, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: date 
line 28, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 28, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: date 
line 28, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: date 
line 29, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: mdhash 
line 29, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: mdhash 
line 29, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: mdhash 
line 29, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 30, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: terms 
line 31, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: term 
line 31, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: term 
line 31, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: value 
line 31, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: value 
line 31, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 31, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: name 
line 31, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: value 
line 31, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: name 
line 31, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 31, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: name 
line 31, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: term 
line 32, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: term 
line 32, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: term 
line 32, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: value 
line 32, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: value 
line 32, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 32, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: value 
line 32, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: name 
line 32, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: name 
line 32, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 32, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: name 
line 32, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: term 
line 33, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: term 
line 33, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: term 
line 33, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: value 
line 33, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: value 
line 33, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 33, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: name 
line 33, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: value 
line 33, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: name 
line 33, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: name 
line 33, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData) 
line 33, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: term 
line 34, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: terms 
line 34, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: terms 
line 35, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: release 
line 35, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: release 
line 36, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: releases 
line 36, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: releases 
line 37, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: project 
line 37, [user-supplied text]:
    error (402): EOF encountered; no doctype declaration found: project

As I said above, I know next to nothing about XML, so I'm not sure that this site is a good choice for XML validation. Maybe there's something subtle that I'm not aware of that makes it a poor choice. It was just the first hit I got when searching Google for whatever terms I used.

Comment #43

gábor hojtsy commented 14 April 2008 at 21:36

Status:	Needs work	» Needs review

All of these seem to be "undeclared element" or "lacks content model" errors, which are really just a sign that project module does not refer to a defined XML schema to validate against. All you can do is check the well-formedness of the generated XML; as there is no schema to validate against, the XML will never validate. This is true of all files generated by project module, before or after this patch. So these validation errors should not hold up the patch.

Also, the XML output for individual projects is not generated by this patch; the patch only slightly modifies it. So the fact that the existing code does not indent nicely should not be a problem for this patch.

With all this considered, I think you did not identify any issues with this patch. Did you notice anything else that may hold this back from being committed?
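The distinction drawn in this comment, between XML that is well-formed and XML that validates against a schema, can be checked locally without any online validator. The following is a minimal sketch in Python, assuming only the standard library; the sample strings are made up for illustration and are not output of project module. It tests well-formedness only, which is all that can be tested when no DTD or schema exists.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML.

    This does NOT validate against a DTD or schema; it only checks
    that tags are balanced, properly nested, and so on.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# Illustrative samples (assumptions, not real project module output).
GOOD = (
    '<project xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<title>Project issue tracking</title>"
    "<short_name>project_issue</short_name>"
    "<dc:creator>site1</dc:creator>"
    "</project>"
)
# Mismatched closing tag: not well-formed.
BAD = "<project><title>Project issue tracking</short_name></project>"

print(is_well_formed(GOOD), is_well_formed(BAD))
```

Indentation plays no part in either check, which matches the point made in #41.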

Comment #44

aclight commented 14 April 2008 at 21:59

Status:	Needs review	» Reviewed & tested by the community

Anyone who has hung out in the project* for long enough would know that just because a patch doesn't introduce new errors/issues, that doesn't mean that this fact alone should not hold up the patch. :)

But, no, I didn't find any other issues with the patch other than those I've mentioned. I'll RTBC it and then dww/hunmonk can have their way with it.

Comment #45

hunmonk commented 16 April 2008 at 03:59

visual inspection of the code looks good. since dww is really most familiar with this functionality, i'm going to leave it to him to commit.

Comment #46

dww commented 11 June 2008 at 03:23

Status:	Reviewed & tested by the community	» Fixed

I still feel a little dirty with this list living under "release-history", but whatever, this issue has dragged on for way too long and I don't have time/energy to rewrite everything for a more elegant URL. After final review, committed to DRUPAL-5 and HEAD, and deployed on d.o.

Behold:

http://updates.drupal.org/release-history/project-list/all

Heh, great...

% wget http://updates.drupal.org/release-history/project-list/all
% wc all
   47690   53641 1572577 all

Good luck downloading and parsing this monster. ;)

Anyway, thanks to everyone who helped with this, and sorry for the delays.
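For a file of the size quoted above (roughly 1.5 MB and 47,690 lines), a client does not have to load the whole document into memory. The sketch below streams it one project at a time with Python's standard library. The element names `project`, `title`, `short_name` and `link` are taken from the per-project XML shown in this thread; the `<projects>` wrapper and the sample URLs are assumptions for illustration, since the exact layout of the list file is not shown here.

```python
import io
import xml.etree.ElementTree as ET

def iter_projects(source):
    """Yield (short_name, title, link) for each <project> in a binary file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "project":
            yield (
                elem.findtext("short_name", default=""),
                elem.findtext("title", default=""),
                elem.findtext("link", default=""),
            )
            # Drop the children we have already consumed to keep memory flat.
            elem.clear()

# Hypothetical sample standing in for the downloaded "all" file.
SAMPLE = b"""<projects>
 <project><title>Project</title><short_name>project</short_name>
  <link>http://example.com/project/project</link></project>
 <project><title>Project issue tracking</title><short_name>project_issue</short_name>
  <link>http://example.com/project/project_issue</link></project>
</projects>"""

projects = list(iter_projects(io.BytesIO(SAMPLE)))
for short_name, title, link in projects:
    print(short_name, title, link)
```

With a real download, the `io.BytesIO(SAMPLE)` argument would be replaced by a file opened in binary mode.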

Comment #47

dww commented 11 June 2008 at 23:29

Category:	feature	» bug
Status:	Fixed	» Needs work

Crap, the new XML breaks the parser in update(_status)?. :( See #269444: Some xml information being lost by parser. Ugh.

Comment #48

Anonymous (not verified) commented 14 September 2008 at 04:04

I REALLY need this file to be available for my project (plugin_manager.) Either that or something like it. What can I do?

Comment #50

dww commented 14 September 2008 at 04:18

If you're up to it, you can reroll this patch in such a way that the changes to create the new file don't change the contents of the existing files in any way, even while parts of the code are being shared. Or, if there are any changes to the content of the existing files, make sure that you test the existing update_status parser with them. See #269444 for some info about how the previous format of the data was breaking things. Thanks.

Comment #51

aclight commented 13 October 2008 at 03:04

Assigned:	gábor hojtsy	» Unassigned

It looks to me like the problem is that the order of stuff in the XML confuses the parser and that breaks stuff. With the patch applied, the xml looks something like this:

<project xmlns:dc="http://purl.org/dc/elements/1.1/">
<title>Project</title>
<short_name>project</short_name>
<dc:creator>site1</dc:creator>
  <terms>
   <term><name>Project types</name><value>Modules</value></term>
   <term><name>Project types</name><value>Developer</value></term>
  </terms>
<api_version>5.x</api_version>
<recommended_major>1</recommended_major>
<supported_majors>1</supported_majors>
<default_major>1</default_major>
<project_status>published</project_status>
<link>http://Drupal/project/project</link>
<releases>
....

If I manually edit the XML to look like below, update_status (D5 at least) seems to work fine:

<project xmlns:dc="http://purl.org/dc/elements/1.1/">
<title>Project</title>
<short_name>project</short_name>
<dc:creator>site1</dc:creator>
<api_version>5.x</api_version>
<recommended_major>1</recommended_major>
<supported_majors>1</supported_majors>
<default_major>1</default_major>
<project_status>published</project_status>
<link>http://Drupal/project/project</link>
  <terms>
   <term><name>Project types</name><value>Modules</value></term>
   <term><name>Project types</name><value>Developer</value></term>
  </terms>
<releases>
....

So I think if we take a look at project-release-create-history.php and re-order the code so that the XML looks like this latter example, we won't need to modify the code of update_status. It's probably possible to fix this in the XML parser as well, but ugh, XML parsing in PHP4 sucks.

Comment #52

aclight commented 13 October 2008 at 03:22

Status:	Needs work	» Needs review

Status	File	Size
new	157514_project_release_xml_not_suck_51.patch	2.6 KB

Try this on for size. Slightly tested.

Comment #53

gábor hojtsy commented 24 October 2008 at 13:19

I am not sure that this is as simple as putting the terms list at the end. I've looked at the update.fetch.inc code in Drupal 6, and the update XML parser appears to assume that TERM and TERMS tags belong to the release currently being parsed. This is why TERMS eats up the rest of the data before RELEASES. So adding new tags like DC:CREATOR is not a problem, but reusing tags in other contexts (TERMS relating to the project, not to a release) is a no-go with the current update_status parser, as far as I can see. Excerpt from the close tag parser:

      case 'TERM':
        unset($this->current_object);
        $term_name = $this->current_term['name'];
        if (!isset($this->current_release['terms'])) {
          $this->current_release['terms'] = array();
        }
        if (!isset($this->current_release['terms'][$term_name])) {
          $this->current_release['terms'][$term_name] = array();
        }
        $this->current_release['terms'][$term_name][] = $this->current_term['value'];
        break;
      case 'TERMS':
        $this->current_object = &$this->current_release;
        break;

So we can either solve this by fixing the update_status parser (complicated, needs synchronization with project module d.o deployment), come up with new tag names for project terms (ugly), decouple the project list generation from the project data generation (more code), or back out this feature altogether (lost feature, although it was only ever available in a -dev release).

We've seen several people above express their need for this feature, Hostmaster/Aegir and Plugin manager developers being examples, but from the looks of it only I worked on actually implementing it, and indeed we ended up with a huge monster file which would be complicated to manage. My focus changed to running the l10n_server on the same database as project module, so I am not interested in this functionality anymore. To get to a release sooner rather than later, I'd say we should roll back this patch, and let the remaining people who are interested in it figure out a smaller, more compact and update_status-compatible version. We've wasted enough time on this.

Comment #54

gábor hojtsy commented 24 October 2008 at 13:31

To clarify the analysis, the closing TERMS makes the "current release" the current object (instead of the current project in the project context), and therefore ends up storing all the rest of the tags before the RELEASES tag in a "dead release", which will never be stored. If the TERMS for projects are moved to the end, it just makes update status build up this "dead release" at the end (which would not be attached to the project, because there is no closing RELEASES tag, which invokes the attaching code). So if we can live with a bit of useless processing in update_status, then just moving project TERMS to the end does not hurt. But I guess there are already performance complaints against update_status, so this does not seem like an attractive solution.
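The "dead release" effect described above can be reproduced with a toy model. The sketch below is written in Python and is not the update_status parser: only the rule "closing TERMS makes the current release the current object" comes from the excerpt in #53, while the function name `toy_parse`, the rest of the state handling, and the two sample documents are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

def toy_parse(xml_text):
    """Toy model of the state handling described above; NOT the real parser."""
    project = {"releases": []}
    current_release = {}   # a never-stored "dead release" until <release> opens
    current = project      # object that receives simple tag values
    in_term = False
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        tag = elem.tag
        if event == "start":
            if tag == "release":
                current_release = {}
                current = current_release
            elif tag == "term":
                in_term = True
            continue
        if tag == "term":
            in_term = False
        elif tag == "terms":
            # As in the quoted excerpt: closing TERMS always selects the release.
            current = current_release
        elif tag == "release":
            project["releases"].append(current_release)
        elif tag in ("releases", "project") or in_term:
            pass           # containers and term name/value are ignored here
        else:
            current[tag] = (elem.text or "").strip()
    return project

TERMS = "<terms><term><name>Project types</name><value>Modules</value></term></terms>"
RELEASES = "<releases><release><version>5.x-1.0</version></release></releases>"
HEAD = "<project><title>Project</title>"
API = "<api_version>5.x</api_version>"

before = toy_parse(HEAD + TERMS + API + RELEASES + "</project>")  # terms first
after = toy_parse(HEAD + API + TERMS + RELEASES + "</project>")   # terms last
print("api_version" in before, after.get("api_version"))
```

In this model, with the terms first, `api_version` lands in the discarded release and is missing from the project; with the terms last, the project keeps it and only the project terms themselves are dropped.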

Comment #55

aclight commented 24 October 2008 at 13:32

I only tested my patch in #52 on D5 using the update_status module, but it looks like the parser class is the same in both the D5 and D6 versions.

I tested my patch by first setting up a D5 site running update_status. I modified the update_status code of that site to fetch information from another local site, to which I had applied my patch in #52.

Before applying the patch, when I checked the status of modules on the test site, the information for some modules would be incorrect. After applying the patch, the information was correct.

In addition, if I print_r()'ed the parsed output before and after applying the patch, before the patch the major_version, etc. attributes for the project itself were not present (though for the releases of the project they were present). After the patch, those attributes were also present.

So, I do believe the patch at least improves the situation. I can't say for sure that it entirely fixes the problem, because I don't fully understand what the problem is.

I'd hate to see all the effort that went into this patch, both by the writers and reviewers, be completely wasted by just backing out the original commit.

Perhaps dww could give my patch a quick review and then install it on p.d.o, where it should be much easier for us to test it.

I agree that if we end up needing to fix the update_status parser itself then we might postpone this issue, but if it's relatively easy to fix in project I don't see why we should just give up.

Comment #56

gábor hojtsy commented 24 October 2008 at 13:57

@aclight: we probably cross-posted. The addendum I posted in #54 explains that when moving TERMS to the end you should not see any unusual data in the structure generated by update_status; just the project terms will be missing. Those are parsed and added to a "phantom release" which is not stored. So indeed, it looks like moving the project TERMS to the end fixes the parsing, but it does leave the update_status parser in a state where it parses and throws away some stuff. That could be fixed in an upcoming commit to actually store the project TERMS on the project array, so that data is not dropped. These are two "independent" fixes. Even applying only one of these fixes the problem. The project module fix is just a workaround, however, for the simple update_status parser, until that is fixed properly. Doing the project module fix first enables update_status module to keep running fine, and then, whenever appropriate, update_status can be fixed independently to actually store the data. Better not waste resources on the update_status side like this.

Comment #57

dww commented 24 October 2008 at 17:00

a) The XML parser in update.module is being fixed in D7 at #324443: Update XML Parsing in update module by replacing it with PHP5-only parsing. I don't think it's worth fixing the PHP4 parser in D6 or D5 contrib.

b) update_status doesn't care about these extra tags, so I'm either fine with not including them at all or including them in a way that the parser just throws them out. update_status is resource intensive b/c it saves more data than it needs, not that the parsing itself is that expensive. I believe that RAM is the bottleneck, not CPU. I personally have no problem including some extra data that update_status doesn't need and will ignore, if that data is useful for other clients of this XML than update_status.

c) I'm also fine just splitting out the project list generation into something else. I had hoped that code reuse would win, but in practice, the PHP4 parser gave us an unanticipated hassle, which is what derailed progress in here.

d) Yes, I'm happy to put aclight's patch on p.d.o in a little while -- I just need to eat, first. ;) Stay tuned.

Comment #58

gábor hojtsy commented 24 October 2008 at 17:08

@dww (re #57): Ok, then moving TERMS to the end should work.

Comment #59

dww commented 24 October 2008 at 17:54

Indeed, aclight's patch seems to generate XML that even D5 update_status can understand. I installed it on p.d.o and ran the generator, but quickly realized that'd make testing this hard since p.d.o's {node} and {project_release_nodes} tables are woefully out of date (and a full DB sync is currently a pain).

So, I set this up in a parallel release history directory on updates.d.o itself:

http://updates.drupal.org/release-history-2/drupal/6.x
http://updates.drupal.org/release-history-2/project-list/all
...

On a D5 test site, you can put this in your settings.php $conf array:

  'update_status_fetch_url' => 'http://updates.drupal.org/release-history-2',

On a D6 test site, you need this in your $conf array:

  'update_fetch_url' => 'http://updates.drupal.org/release-history-2',

If you do that, you should be able to test update(_status)? XML parsing on the new files. Light testing on my end revealed no problems. Other interested parties should please test ASAP and report your findings here, so I can just commit aclight's patch and deploy it for real.

Thanks,
-Derek
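The URLs listed in this comment follow a base/project/core pattern, with `project-list` and `all` filling the two slots for the project listing. As a rough sketch in Python (the helper name `release_history_url` is hypothetical and not part of any Drupal module), a test client could build the addresses for either release history directory like this:

```python
def release_history_url(base, project, core):
    """Join a release-history base URL, a project short name and a core version."""
    return "/".join([base.rstrip("/"), project, core])

# Base URL of the parallel test directory mentioned in this comment.
TEST_BASE = "http://updates.drupal.org/release-history-2"

drupal_url = release_history_url(TEST_BASE, "drupal", "6.x")
list_url = release_history_url(TEST_BASE, "project-list", "all")
print(drupal_url)
print(list_url)
```

Swapping `TEST_BASE` for the regular release-history base would give the production addresses in the same way.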

Comment #60

gábor hojtsy commented 24 October 2008 at 18:17

Status:	Needs review	» Reviewed & tested by the community

I've tested update status in Drupal 6 with http://updates.drupal.org/release-history-2 and it works for me on a site where I have up to date, dev, outdated, etc. versions of different modules. Looks good to go to me.

Comment #61

dww commented 24 October 2008 at 20:50

Category:	bug	» feature
Status:	Reviewed & tested by the community	» Fixed

Great, thanks. Committed #52 to HEAD and DRUPAL-5. Deployed on d.o, and manually re-ran the script to generate live data. I just ripped out all the release-history-2 stuff, too. Pointing my update_status test site back to updates.d.o and all is still working fine. Yay. ;) Setting this back to a feature request for posterity, since that's what it was before #46...

Comment #62

Anonymous (not verified) commented 7 November 2008 at 20:52

Status:	Fixed	» Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.

Comment #63

stefan freudenberg commented 22 March 2009 at 10:06

Is there a way to retrieve an API specific list of projects from the server? I tried http://updates.drupal.org/release-history/project-list/6.x but that does not work.

Comment #64

dww commented 22 March 2009 at 15:52

"Is there a way to retrieve an API specific list of projects from the server?"

No. Please open a new feature request for new features. ;)

Comment #65

tm01xx commented 8 July 2009 at 05:49

Hi,

Looking at the information in the XML, is it possible to include the description of a project (currently it has only 'title', 'api_version', etc., but nothing like 'description')? Is there anywhere else we could retrieve that information?

Thank you,

Comment #66

aclight commented 8 July 2009 at 11:10

@tm01xx: I think that this has been requested before, but the problem with providing the description is that for many projects it is quite long. The XML that is generated is already quite long, so providing the description might cause additional problems. As far as I know it's not possible to get the description of projects short of screen scraping.