Add possibility to retrieve a list of projects from the server

Gábor Hojtsy - July 7, 2007 - 12:07
Project:Project
Version:5.x-1.x-dev
Component:Releases
Category:bug report
Priority:normal
Assigned:Gábor Hojtsy
Status:patch (code needs work)
Description

Here is a patch which adds the ability to retrieve a list of projects from the server. This kind of feature is needed for my Google Summer of Code project (the localization server), so I can have a list of projects synchronized for localization.

The attached patch extends the *release* XML generation system:

- adds a new type of XML output, which lists all projects with their short names, titles and project page URIs
- generates an XML file with this data to the release XML space
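A minimal sketch of what such a listing could look like, written as plain PHP so it runs outside Drupal. It assumes each project is an array with hypothetical title, short_name and link keys; the actual patch reads these from the database and uses Drupal's own helpers:

```php
<?php
// Hypothetical sketch: build the project-list XML from plain arrays.
// The keys 'title', 'short_name' and 'link' are assumptions, not the patch's API.
function project_list_xml(array $projects): string {
  $xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<projects>\n";
  foreach ($projects as $project) {
    $xml .= "<project>\n";
    foreach (['title', 'short_name', 'link'] as $tag) {
      // Escape every value so '&', '<' and quotes cannot break the document.
      $value = htmlspecialchars($project[$tag], ENT_QUOTES, 'UTF-8');
      $xml .= "  <$tag>$value</$tag>\n";
    }
    $xml .= "</project>\n";
  }
  return $xml . "</projects>\n";
}

echo project_list_xml([
  ['title' => 'Organic groups', 'short_name' => 'og', 'link' => 'http://drupal.org/project/og'],
]);
```

Because the element names are fixed and only the values vary, escaping the values is enough to keep the output well-formed.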

For this to work, I needed to modify the release XML storage code slightly, so proper error messages are used. I also invented the special 'project-list' project to retrieve this listing, and an 'all' core compatibility (which dww and I discussed as something that would be used for other things later, too).

This approach made the output XML directly servable by the already working project XML server script, as we have a 'project-list' pseudo-project and an 'all' core compatibility, which will return the list. Possible problems with this approach:

- This script generates XML files for releases, and is in the project/releases subproject. Having a list of *projects* here (kind of unrelated to releases) is not 100% logical. (I also submitted this feature request against the releases component, because the patch affects that part).
- I sneaked this list into the project release XML namespace, so the project list is a kind of easter egg: it provides different information in a different format. This is also not 100% logical.

The code should work, but we had better discuss these things to position this service in a proper place, both in the source code and in the output.

Attachment: project_all_list.patch (5.75 KB)

#1

Gábor Hojtsy - July 7, 2007 - 12:08

BTW the patch also fixes a bug: the code had "@old, $new" in an error message, where it should have been "@old, @new".

#2

dww - July 13, 2007 - 01:25
Status:patch (code needs review)» patch (code needs work)

Yeah, the code looks good. However, I agree it's a little bit weird to put the project list into the "release-history" tree and related code. On the other hand, it would be a little unfortunate to duplicate some of the code from project-release-create-history.php (and from project-release-serve-history.php, for that matter).

So, I'm quite torn on the Right(tm) approach to this. From the elegance standpoint, I agree that fetching this XML project list from something like:

http://updates.drupal.org/project-list

makes more sense. On the other hand, if we included more release-specific information in here, such as the list of compatibility terms that a given project has releases for, then it might make more sense to keep it under release-history. It's still a little weird that it's got a different XML "schema" (not sure if that's the right terminology), but at least it'd still be related to releases in a more obvious way.

So, I'm setting this to "needs work". There are 2 possible paths back to "needs review":

  1. Move it completely out of modules/project/release and "release-history". Put a project-create-xml.inc file at modules/project that's shared by both project-release-create-history.php and project-create-list.php for any common code we can share. Write a new thin project-serve-list.php script for modules/project to live at htdocs/project-list. Again, consider if there's code we want to share (maybe not, since it's so small, and most of the code in project-release-serve-history.php is to handle the query args and eventually to store usage data, so it's probably not worth the hassle of a .inc file for a tiny few shared functions).
  2. Add some release-specific data to the generated project-list XML file, e.g. compatibility terms (perhaps with total # of release nodes for each term?) and leave it where it is. I guess I'm ok with "project-list" as the project name, though any bright ideas on better alternatives would be welcome.

I ended up renaming "project-release-history.php" to "project-release-create-history.php" so it fits more nicely with "project-release-serve-history.php". So, attached patch is just a re-roll of yours relative to the new file (and removes some offsets that snuck in from a few other recent commits to this script). That should be a starting point for approach #2 if that's how you'd like to proceed. As I said, I'm torn on which way is better, so I'm leaving this open for consideration/discussion.

Thanks,
-Derek

Attachment: project_all_list.patch_1.txt (5.85 KB)

#3

dww - July 13, 2007 - 01:29

Committed a fix to that watchdog() typo you mentioned in your original post (http://drupal.org/cvs?commit=73326), so here's a re-roll that still applies cleanly.

Attachment: project_all_list.patch_2.txt (6.07 KB)

#4

moshe weitzman - July 31, 2007 - 20:49

subscribe - i need this too for a module rating demo that i am fooling with. i might try to edit the patch as suggested. we'll see.

#5

moshe weitzman - August 1, 2007 - 06:24

what are "compatibility terms"? how might they be represented in the XML? any other release data?

#6

dww - August 1, 2007 - 14:58

"compatibility terms" are the taxonomy terms that specify what version of core a release is compatible with.
For example, the "5.x" term is here: http://drupal.org/taxonomy/term/78

The "5.x" is part of the URL used to fetch project history:
http://updates.drupal.org/release-history/og/5.x

That file contains info for all releases of og compatible with 5.x core. Since core versions are never cross compatible, any site running a given version of core can only possibly care about releases for that version.

If you look at that XML file, you'll see how we currently represent this:

<api_version>5.x</api_version>

Let me know if you have other questions. I'm still not sure which of the two options in comment #2 is the right way to go, so input on that is also welcome.

Thanks,
-Derek

#7

Gábor Hojtsy - August 6, 2007 - 23:09
Status:patch (code needs work)» patch (code needs review)

The patch did not apply anymore, so first I rerolled it. Then I added core compatibility terms, with the (IMHO expected) api_versions container tag and api_version list item tag, modeled after the release XML structure.
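A small sketch of the container/list-item shape described above, in plain PHP. The function name is hypothetical, and it assumes the terms arrive already ordered (for example by term weight, as suggested later in the thread):

```php
<?php
// Hypothetical sketch: render one project's api_versions container.
// $versions is assumed to arrive already ordered (e.g. by term weight).
function api_versions_xml(array $versions): string {
  if (!$versions) {
    return '';  // Assumption: omit the container when a project has no terms.
  }
  $xml = "  <api_versions>\n";
  foreach ($versions as $version) {
    $xml .= '   <api_version>' . htmlspecialchars($version, ENT_QUOTES, 'UTF-8') . "</api_version>\n";
  }
  return $xml . "  </api_versions>\n";
}

echo api_versions_xml(['6.x', '5.x', '4.7.x']);
```

Omitting an empty container is just one possible choice; emitting an empty api_versions element would be equally valid XML.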

The number of releases for a given project might be useful, but what would be most valuable for me is the date of the last updated release, to be used as a kind of if-modified-since check in the localization server. I did not include this yet, because I'd prefer reviews of this patch first, if possible.

Attachment: project-list-allxml.patch (6.51 KB)

#8

moshe weitzman - August 7, 2007 - 00:29

I have always used form_xml_elements() to construct XML. It does all the nice escaping that's needed. If we don't have user input here, we could get away without it, but it is the best practice.
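To illustrate why the escaping matters, here is a plain-PHP sketch (the helper names are hypothetical; in Drupal, check_plain(), used elsewhere in this thread, plays the escaping role). A project title containing "&" or "<" breaks the document unless it is escaped:

```php
<?php
// Hypothetical helper: wrap an escaped value in a tag.
function xml_element(string $tag, string $value): string {
  return "<$tag>" . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . "</$tag>";
}

// Returns true if $xml parses as a well-formed document.
function is_well_formed(string $xml): bool {
  libxml_use_internal_errors(true);  // Collect parse errors instead of printing warnings.
  $doc = new DOMDocument();
  $ok = $doc->loadXML($xml);
  libxml_clear_errors();
  return $ok;
}

$title = 'Views & Panels <beta>';
var_dump(is_well_formed("<title>$title</title>"));      // bool(false)
var_dump(is_well_formed(xml_element('title', $title))); // bool(true)
```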

I would appreciate having the username of the project node's author.

#9

Gábor Hojtsy - August 7, 2007 - 16:47

I opted not to rewrite what is already there, but to extend the existing functionality following the practices already in place.

It would be great to come up with a sensible tag/format to include the username.

#10

moshe weitzman - August 7, 2007 - 19:39

you can see from http://drupal.org/rss.xml that our standard way of listing the username in nodes is <dc:creator>bertboerland@www.drop.org</dc:creator>. i suggest we stick with that for now.

#11

dww - August 8, 2007 - 01:27
Status:patch (code needs review)» patch (code needs work)

@moshe: <dc:creator>, huh? What does the "dc:" part stand for? ;)

@gabor:

A) I tried running this on d.o and it doesn't work. It doesn't generate any <api_version> tags for any projects. Oh, and I see why... there's an SQL error in there. :( Notice the extra ',' at the end of your SELECT clause, before the FROM.

B) This isn't needed: INNER JOIN {vocabulary} v ON td.vid = v.vid -- {term_data} (which you're already JOIN'ing on) already has the vid, and it doesn't look like you're using {vocabulary} for anything else.

C) I'd add an ORDER BY td.weight ASC in there to order the API compatibility terms in a deterministic way.

D) The implementation itself is clean in the sense of code re-use and simplicity. However, it seems rather evil in terms of performance:

  1. The *entire* XML array for all projects is constructed in RAM. This hasn't yet been a problem with the per-project stuff, since those files are relatively small XML files, but the one for the project-list is 445K. Keep in mind that the whole query result objects are in RAM, too, so the total footprint of this is going to be quite large.
  2. This generates ~2000 separate queries to get all the API terms for each project in a rather tight loop. Seems like it wouldn't be that hard to just get everything for the project-list in the initial query? I'm not sure which is worse for d.o -- tons of really small queries or one big one.
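One way to address both points, sketched in plain PHP under assumptions: fetch a single result set ordered by project and then term weight (the $rows array stands in for that query result; the nid, title and term_name column names are hypothetical), group consecutive rows per project, and write each project element to a stream as soon as it is complete, instead of accumulating the whole document in a string. Whether one large query is actually cheaper on d.o is exactly the open question above; this only shows the shape.

```php
<?php
// Hypothetical sketch: stream the project list instead of building it in RAM.
// $rows stands in for ONE query result ordered by nid, then term weight;
// the column names nid, title and term_name are assumptions.
function write_project_list(iterable $rows, $out): void {
  $esc = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
  $current = null;   // nid of the project whose rows are being collected
  $title = '';
  $versions = [];
  // Write the collected project (if any) as one <project> element.
  $flush = function () use (&$current, &$title, &$versions, $out, $esc): void {
    if ($current === null) {
      return;
    }
    fwrite($out, "<project>\n  <title>" . $esc($title) . "</title>\n");
    if ($versions) {
      fwrite($out, "  <api_versions>\n");
      foreach ($versions as $version) {
        fwrite($out, '   <api_version>' . $esc($version) . "</api_version>\n");
      }
      fwrite($out, "  </api_versions>\n");
    }
    fwrite($out, "</project>\n");
  };

  fwrite($out, "<projects>\n");
  foreach ($rows as $row) {
    if ($row['nid'] !== $current) {
      $flush();                 // Previous project is complete; emit it now.
      $current = $row['nid'];
      $title = $row['title'];
      $versions = [];
    }
    if ($row['term_name'] !== null) {
      $versions[] = $row['term_name'];  // A LEFT JOIN may yield no term.
    }
  }
  $flush();                     // Emit the last project.
  fwrite($out, "</projects>\n");
}

$rows = [
  ['nid' => 1, 'title' => 'Drupal', 'term_name' => '6.x'],
  ['nid' => 1, 'title' => 'Drupal', 'term_name' => '5.x'],
  ['nid' => 2, 'title' => 'Bbcode', 'term_name' => '5.x'],
];
write_project_list($rows, fopen('php://output', 'w'));
```

Since only one project's rows are held at a time, memory use stays roughly constant no matter how many projects there are, provided the database layer also returns rows incrementally.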

#12

dww - August 8, 2007 - 01:31
Status:patch (code needs work)» patch (code needs review)

This patch fixes A-C. D is still an open question. Of course, the username, etc., isn't in there yet (though we can always add to the XML schema later).

This works, as evidenced by: http://updates.drupal.org/release-history/project-list/all

I think I'd be ok with committing this to CVS and deploying it via cron on d.o, but I'd like a 2nd opinion on D before I do.

Cheers,
-Derek

Attachment: project-list-allxml.patch_2.txt (6.54 KB)

#13

Gábor Hojtsy - August 8, 2007 - 09:47

Great, thanks! We should indeed discuss performance issues here. Of course, if we removed the API terms and made this a project list only, the file size would be cut in half, and query performance would get a big boost. We added this feature in the first place to make this 'release history' compliant, but my project will drop this data right away. If we don't see a realistic need for this data beyond elegance, we can indeed move this code out of the release history part and optimize elsewhere.

While looking at this list, I also noticed that I'll need to blacklist several project types (stub projects like "Drupal.org webmasters", translation projects themselves, and so on). It is not easy to tell from this data what type of project is in question. If that's OK with you, my needs would be better served by an XML file that also says what type of project we are looking at. So I'd add even more data to the XML to reflect this (evil grin).

BTW DC refers to Dublin Core metadata, and would introduce namespaces in this XML file.

#14

dww - August 8, 2007 - 15:59
Status:patch (code needs review)» patch (code needs work)

Right, I realize I'm the idiot who wanted the API compatibility terms in the list in the first place. ;) Those comments should have been @dww as much as @gabor. ;)

That said, I do think they're useful, and this script only runs every 6 hours, not on every page load, so the bad performance isn't *that* big a deal...

E) Looking at this more, I'm not sure about the <link> tag. Perhaps we should give the node/[nid] version, instead? These direct project aliases have been causing all sorts of grief and URL namespace conflicts. But, I guess they're so entrenched, they're here to stay, so I guess it's ok. However, there are a handful of very old projects that have never been edited since my fix went in that updates these aliases automatically (previously, it was all manual effort by d.o admins), so a handful of projects don't have aliases that work. The nid is sure to always work, so I'm leaning towards that... Of course users of this data that want to present "human-readable" links can always construct them based on the "short_name" field, which is how the URL aliases are built in the first place.

F) Including the project type taxonomy identifier (much like we do with the release type taxonomy terms) would be great.

G) There's no handling of unpublished projects at all. Either they should be excluded from the list in the first place (probably) or at the very least indicated as such in the XML (which is what we do for unpublished release nodes, in case update_status wants to print a really strong warning that "this release is no longer available on drupal.org and is therefore unsupported...", etc). Including the <status> in the project list doesn't make as much sense to me, I think it'd be better to just restrict the query for the list of projects to (n.status = 1) in the first place.

H) If we wanted to get crazy, we could also include tags that indicate if releases are enabled for the project or not.

I) If we *really* wanted the performance to get terrible, we could include tags about the project's issue queue (which is technically optional as well), for example, the current count of issues in each state, etc, etc. ;)

J) We still have Moshe's request for info about the current owner of the project node.

So, setting this to needs work for E, F, and G. H, and I might never happen, or could be moved to another issue. I'm still unclear about J -- do we just want the username, probably we also want their uid and/or link to their /user/[uid] page? Do we just want the project node owner or do we want to include everyone with CVS access?

#15

moshe weitzman - August 8, 2007 - 15:59
Status:patch (code needs work)» patch (code needs review)

I think release info is quite nice in this feed, so i propose we keep it and optimize if the strain hurts drupal.org or another user of this feature. I don't *have* to have it, so I am OK with either outcome.

#16

dww - August 8, 2007 - 16:16

I just realized how confusing this might be: "So, setting this to needs work for E, F, and G. H, and I might never happen, or could be moved to another issue.". ;) Let me try again:

So, I'm setting this issue to needs work for (E), (F), and (G). Points (H) and (I) might never happen, or could be moved to another issue.

;)

#17

dww - August 8, 2007 - 16:18
Status:patch (code needs review)» patch (code needs work)

Whoops, and I missed that moshe and I replied simultaneously and he clobbered my "needs work"... ;) Ahh, the joys of the issue queue first thing in the morning.

#18

moshe weitzman - August 8, 2007 - 16:39

i was thinking that we have username and link elements which show the username and URL *for just the project owner*.

#19

Gábor Hojtsy - August 8, 2007 - 20:09

@dww:

(E): I think the link is better. You gave the reasons yourself why the link cannot reliably be generated from the short_name: there are lots of projects in the list without aliases.
(F): Sure.
(G): IMHO we just should not list unpublished stuff, that's it.
(H): Why? :)
(I): I don't need this, and I don't think anyone else would either. This file will be big enough to download and especially to *parse* anyway.
(J): Yes, I also think Moshe meant the owner, definitely not all contributors.

I don't have time right now to work on an improved patch unfortunately, but will have time tomorrow (it is 22:00 here).

#20

moshe weitzman - August 27, 2007 - 15:56
Status:patch (code needs work)» patch (code needs review)

This patch resolves all outstanding items:

E: we include a LINK whose value is determined by url()
F: i added these category elements to both individual project xml files and the new site-wide one
G: don't include unpublished projects
J: added owner of project node as dc:creator - is consistent with node_rss_item(). i added to both individual project xml files and the new site-wide one

I generated the patch from all the way down to /sites/all/modules/project/release.

Attachment: mw_65.patch (3.32 KB)

#21

hunmonk - August 27, 2007 - 21:29
Status:patch (code needs review)» patch (code needs work)

call me crazy, but this doesn't look like the right patch at all...

#22

moshe weitzman - August 28, 2007 - 00:13

yikes. those are the lines i had to comment out to get the drupalorg_testing profile to finish on php5. anyway, here is the right patch.

Attachment: mw_66.patch (9.47 KB)

#23

hunmonk - August 28, 2007 - 03:00
  • spacing needs to be fixed for n.uid=u.uid in a couple of spots.
  • we reordered the args in project_release_history_write_xml() -- seems like we should reorder the Doxygen accordingly
  • why are we sanitizing output here: $xml .= '  <title>'. check_plain($project->title) ."</title>\n";, but not here: $xml .= "<category domain=\"$taxonomy_url\">$term->name</category>\n";. overall it looks like output filtering is missing in several places.

#24

moshe weitzman - August 29, 2007 - 02:45
Status:patch (code needs work)» patch (code needs review)

fixed all items on hunmonk's list. good call on the output filtering. i added it in a couple of places.

Attachment: mw_67.patch (9.78 KB)

#25

Gábor Hojtsy - November 30, 2007 - 18:54

dww: any issues with this patch?

#26

dww - November 30, 2007 - 20:08
Status:patch (code needs review)» patch (code needs work)

Sorry, totally fell off my radar. Yes, there are a few issues. There are a couple of potential XSS bugs in this hunk:

@@ -174,6 +174,12 @@ function project_release_history_generat
   $xml .= '<title>'. check_plain($project->title) ."</title>\n";
   $xml .= '<short_name>'. check_plain($project->uri) ."</short_name>\n";
   $xml .= '<link>'. url("node/$project->nid", NULL, NULL, TRUE) ."</link>\n";
+  $xml .= "<dc:creator>$project->name</dc:creator>\n";
+  $terms =  taxonomy_node_get_terms_by_vocabulary($project->nid, _project_get_vid());
+  foreach ($terms as $term) {
+    $taxonomy_url = url("taxonomy/$term->tid", NULL, NULL, TRUE);
+    $xml .= "<category domain=\"$taxonomy_url\">$term->name</category>\n";
+  }
   $xml .= '<api_version>'. check_plain($api_version) ."</api_version>\n";
   $xml .= "<default_major>$project->major</default_major>\n";
   $xml .= "<releases>\n";

In particular:

A) $project->name

B) $term->name

One other problem:

C) This URL is bogus: url("taxonomy/$term->tid"), should be taxonomy/term/$term->tid. This is broken both in the above hunk and down in project_list_generate().
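A minimal sketch of those two lines with both XSS points (A, B) and the URL (C) addressed. It is plain PHP rather than Drupal code: the hypothetical $base_url parameter stands in for url(..., TRUE), and check_plain() is approximated with htmlspecialchars():

```php
<?php
// Hypothetical sketch of the dc:creator and category lines with escaping.
// $terms maps term id => term name; $base_url stands in for url(..., TRUE).
function creator_and_categories_xml(string $creator, array $terms, string $base_url): string {
  $xml = '<dc:creator>' . htmlspecialchars($creator, ENT_QUOTES, 'UTF-8') . "</dc:creator>\n";
  foreach ($terms as $tid => $name) {
    // Term pages live at taxonomy/term/[tid], not taxonomy/[tid].
    $domain = htmlspecialchars("$base_url/taxonomy/term/" . (int) $tid, ENT_QUOTES, 'UTF-8');
    $xml .= "<category domain=\"$domain\">" . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . "</category>\n";
  }
  return $xml;
}

echo creator_and_categories_xml('Drupal', [13 => 'Drupal project'], 'http://drupal.org');
```

Escaping the attribute value as well as the element text matters because ENT_QUOTES also encodes the double quote that would otherwise terminate domain="...".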

That's all I can spot right now, there might be more...

#27

adrian - January 23, 2008 - 14:00

subscribing.

i'm going to need this for Hostmaster

#28

Gábor Hojtsy - February 20, 2008 - 20:50
Status:patch (code needs work)» patch (code needs review)

@dww: I went through all your suggestions, and fixed those. Interestingly, the "all projects" list had these XSS holes covered in the patch already. Strange. I also implemented some coding style fixes (whitespace, quotes, etc), but otherwise did not find issues. Back for review again. (Patch rolled from project module root, unlike Moshe's previous patch).

Attachment: project-list-all.patch (9.63 KB)

#29

dww - March 7, 2008 - 17:51
Status:patch (code needs review)» patch (code needs work)

Code looked ok on visual inspection, so I put it on p.d.o and ran it. Unfortunately, the resulting XML is invalid:

http://project.drupal.org/release-history/project-list/all

error on line 7 at column 65: Namespace prefix dc on creator is not defined
error on line 27 at column 339: Namespace prefix dc on creator is not defined
error on line 34 at column 420: Namespace prefix dc on creator is not defined
error on line 41 at column 501: Namespace prefix dc on creator is not defined
error on line 58 at column 731: Namespace prefix dc on creator is not defined
error on line 75 at column 961: Namespace prefix dc on creator is not defined
error on line 89 at column 1139: Namespace prefix dc on creator is not defined
error on line 106 at column 1369: Namespace prefix dc on creator is not defined
error on line 123 at column 1599: Namespace prefix dc on creator is not defined
error on line 134 at column 1729: Namespace prefix dc on creator is not defined
error on line 149 at column 1923: Namespace prefix dc on creator is not defined
error on line 165 at column 2137: Namespace prefix dc on creator is not defined
error on line 184 at column 2399: Namespace prefix dc on creator is not defined
error on line 198 at column 2581: Namespace prefix dc on creator is not defined
error on line 212 at column 2759: Namespace prefix dc on creator is not defined
error on line 228 at column 2973: Namespace prefix dc on creator is not defined
error on line 245 at column 3215: Namespace prefix dc on creator is not defined
error on line 264 at column 3477: Namespace prefix dc on creator is not defined
error on line 281 at column 3707: Namespace prefix dc on creator is not defined
error on line 299 at column 3953: Namespace prefix dc on creator is not defined
error on line 314 at column 4147: Namespace prefix dc on creator is not defined
error on line 333 at column 4409: Namespace prefix dc on creator is not defined
error on line 347 at column 4601: Namespace prefix dc on creator is not defined
error on line 363 at column 4815: Namespace prefix dc on creator is not defined
error on line 379 at column 5029: Namespace prefix dc on creator is not defined

#30

Gábor Hojtsy - March 8, 2008 - 02:16

The dc namespace should be defined at the beginning of the XML for that. I'll look into this a bit later if nobody beats me to it.
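A sketch of the fix, in plain PHP: the "dc" prefix has to be bound to a namespace URI on an ancestor element, and binding it once on the root element covers every dc:creator in the list. The URI below is the Dublin Core elements namespace that the per-project output later declares; the helper name is hypothetical:

```php
<?php
// Collect libxml's messages for a document instead of printing them.
function libxml_messages(string $xml): array {
  libxml_use_internal_errors(true);
  libxml_clear_errors();
  (new DOMDocument())->loadXML($xml);
  $messages = array_map(fn($error) => trim($error->message), libxml_get_errors());
  libxml_clear_errors();
  return $messages;
}

// Dublin Core elements namespace.
const DC_NS = 'http://purl.org/dc/elements/1.1/';

$body = '<project><title>Drupal</title><dc:creator>Drupal</dc:creator></project>';
// Without a declaration, libxml complains about the undefined "dc" prefix.
print_r(libxml_messages("<projects>$body</projects>"));
// Declared once on the root element, the same body parses cleanly.
print_r(libxml_messages('<projects xmlns:dc="' . DC_NS . "\">$body</projects>"));
```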

Also, it looks like the links are quite bad. I don't know whether that is because of "issues" with (or the nature of) project.drupal.org or because of bugs in the code:

<projects>
<project>
  <title>Drupal</title>
  <short_name>drupal</short_name>
  <link>http://project.drupal.org/var/www/project.drupal.org/htdocs/sites/all/modules/project/release/project/drupal</link>
  <dc:creator>Drupal</dc:creator>
<category domain="http://project.drupal.org/var/www/project.drupal.org/htdocs/sites/all/modules/project/release/taxonomy/term/13">Drupal project</category>
  <api_versions>
   <api_version>7.x</api_version>
   <api_version>6.x</api_version>
   <api_version>5.x</api_version>
   <api_version>4.7.x</api_version>
   <api_version>4.6.x</api_version>
   <api_version>4.5.x</api_version>
   <api_version>4.4.x</api_version>
   <api_version>4.3.x</api_version>
   <api_version>4.2.x</api_version>
   <api_version>4.1.x</api_version>
   <api_version>4.0.x</api_version>
  </api_versions>
</project>

#31

dww - March 8, 2008 - 08:38

Thanks for looking into the dc namespace thing.
The bogus links are just from how I ran it on p.d.o, don't worry about that.

Cheers,
-Derek

#32

dww - March 8, 2008 - 16:07

Note to whomever works on this next: I committed #204140: Modify project-release-create-history.php to use {project_release_supported_versions} so this needs a re-roll to deal with conflicts, too. Thanks.

#33

Gábor Hojtsy - March 20, 2008 - 10:54
Status:patch (code needs work)» patch (code needs review)

OK, here is a reroll. It should fix the missing Dublin Core namespace and the conflicts. Reviewing the code I (still) have some concerns:

- user name information (dc:creator) crept into this patch above, but has really nothing to do with listing projects
- category information for project nodes is in the patch, but uses different tags than the release category listing (eg. security, bug fix, new features)

The user name functionality is not in scope of this issue, although some of the issues evolved around that part of the patch. The category information I need to exclude non-code (eg. translation, DROP, etc.) projects from the l10n_server listings.

I need project listing information for l10n_server, and high level category information for projects, and that's it. I don't even need project release information in the project list. That was included so that we could put this into the release listing script. However, the project release listing XML schema changed a lot when http://drupal.org/node/204140 was committed, so we probably need to take a fresh look at that as well. Again, I don't need the release information directly there, and it makes this generated XML very huuuge, so I am fine with removing that as well, keeping an even simpler project list. That's what *I* need, and I'm willing to cut more there. Also, some of the schema changes simply need to be copied over; for example, project status information (published, unpublished, etc.) would be great in the project list.

Attachment: project-list-all-reroll.patch (10.5 KB)

#34

Gábor Hojtsy - March 20, 2008 - 11:48

Updated version with:

- aliased user 'name' to 'user_name' to remove ambiguity with project title
- carried release tag schema over to project tag listing ('terms' and 'term' with 'name' and 'value' instead of 'category' with 'domain'), this also removes the link to the category, which was not represented in the release terms
- carried over project_status information to project list based on project node 'status' flag
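A plain-PHP sketch of the last two items: a terms block using the release-style name/value pairs, and project_status derived from the node's status flag. The function name and its (vocabulary, term) input pairs are hypothetical; the published/unpublished values match those mentioned in this thread:

```php
<?php
// Hypothetical sketch: <terms> block plus <project_status> for one project.
// $terms is a list of [vocabulary name, term name] pairs; $status is the
// node's status flag (1 = published, 0 = unpublished).
function project_terms_and_status_xml(array $terms, int $status): string {
  $esc = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
  $xml = '';
  if ($terms) {
    $xml .= "  <terms>\n";
    foreach ($terms as [$vocabulary, $term]) {
      $xml .= '   <term><name>' . $esc($vocabulary) . '</name><value>' . $esc($term) . "</value></term>\n";
    }
    $xml .= "  </terms>\n";
  }
  $xml .= '<project_status>' . ($status ? 'published' : 'unpublished') . "</project_status>\n";
  return $xml;
}

echo project_terms_and_status_xml([['Project types', 'Modules'], ['Project types', 'Developer']], 1);
```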

Kept the DC creator intact. Still to discuss the api_versions list in the project list (at the end of patch), but otherwise should be good to go IMHO.

Attachment: project-list-all-fixed.patch (11.29 KB)

#35

Dries - March 21, 2008 - 08:12

Tiny detail: in Drupal we use 'username' instead of 'user_name'.

#36

Gábor Hojtsy - March 25, 2008 - 16:05

- Patch resolving Dries' issue with user_name vs username. Otherwise no change in the patch!
- Checked the API versions concern I had. The patch has project data like this:

<project>
  <title>Drupal</title>
  ...
  <api_versions>
   <api_version>7.x</api_version>
   <api_version>6.x</api_version>
   <api_version>5.x</api_version>
   <api_version>4.7.x</api_version>
   <api_version>4.6.x</api_version>
   <api_version>4.5.x</api_version>
   <api_version>4.4.x</api_version>
   <api_version>4.3.x</api_version>
   <api_version>4.2.x</api_version>
   <api_version>4.1.x</api_version>
   <api_version>4.0.x</api_version>
  </api_versions>
</project>
...
<project>
  <title>Authentication</title>
  ...
  <api_versions>
   <api_version>4.2.x</api_version>
   <api_version>4.1.x</api_version>
   <api_version>4.0.x</api_version>
  </api_versions>
</project>
<project>
  <title>Bbcode</title>
  ...
  <api_versions>
   <api_version>5.x</api_version>
   <api_version>4.7.x</api_version>
   <api_version>4.6.x</api_version>
   <api_version>4.5.x</api_version>
   <api_version>4.4.x</api_version>
   <api_version>4.3.x</api_version>
   <api_version>4.2.x</api_version>
  </api_versions>
</project>

This is compared to data output for a specific project:

<title>Bbcode</title>
<short_name>bbcode</short_name>
<api_version>6.x</api_version>
<recommended_major>1</recommended_major>
<supported_majors>1</supported_majors>
<default_major>1</default_major>
<project_status>published</project_status>
<link>http://drupal.org/project/bbcode</link>
<releases>
...
</releases>

So having api_versions wrap the api_version tags in the "all project" listing looks consistent with the current project tag usage.

All-in-all I hope this should be good to go!

Attachment: project-list-all-proper-username.patch (11.29 KB)

#37

Gábor Hojtsy - April 11, 2008 - 14:10

Anything else I should do about this patch? As I wrote above 3 weeks ago, "All-in-all I hope this should be good to go!".

#38

aclight - April 11, 2008 - 14:33
Status:patch (code needs review)» patch (code needs work)

+      $term_query = db_query("SELECT DISTINCT(td.tid), td.name AS term_name FROM {project_release_nodes} prn INNER JOIN {term_node} tn ON prn.nid = tn.nid INNER JOIN {term_data} td ON tn.tid = td.tid WHERE prn.pid = %d AND td.vid = %d ORDER BY td.weight ASC", $project->nid);

Doesn't this line (from project_list_generate()) need one more argument for db_query()? There are two %d placeholders but just one argument.
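This kind of mismatch is easy to miss by eye. A hypothetical debugging aid (not part of Drupal) counts the placeholders in a query string so the count can be compared with the arguments. It assumes the Drupal 5/6 db_query() placeholder set %d, %s, %f, %b, with %% as a literal percent sign:

```php
<?php
// Hypothetical helper: count db_query()-style placeholders in $sql.
// Assumed placeholder set: %d, %s, %f, %b; "%%" is a literal percent sign.
function count_query_placeholders(string $sql): int {
  $sql = str_replace('%%', '', $sql);   // Drop escaped literal percent signs first.
  return preg_match_all('/%[dsfb]/', $sql);
}

$sql = "SELECT DISTINCT(td.tid), td.name AS term_name FROM {project_release_nodes} prn "
  . "INNER JOIN {term_node} tn ON prn.nid = tn.nid INNER JOIN {term_data} td ON tn.tid = td.tid "
  . "WHERE prn.pid = %d AND td.vid = %d ORDER BY td.weight ASC";
$args = [42];  // Only the project nid was passed; the vocabulary id is missing.
printf("placeholders: %d, arguments: %d\n", count_query_placeholders($sql), count($args));
// prints "placeholders: 2, arguments: 1"
```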

#39

Gábor Hojtsy - April 11, 2008 - 15:40
Status:patch (code needs work)» patch (code needs review)

Oh, right. We need the $api_vid = _project_release_get_api_vid() code. Added.

Attachment: project-list-all-20080411.patch (11.35 KB)

#40

aclight - April 12, 2008 - 16:09
Status:patch (code needs review)» patch (code needs work)

I tested this and indeed it does create XML. The code in the patch looks good to me.

As for the XML generated here, it doesn't seem to be valid XML, at least according to http://www.stg.brown.edu/service/xmlvalid/

I don't know much about XML, so maybe it's not a problem that this doesn't validate or the site I was using is not a good choice for validation testing.

One thing I noticed about the XML output for individual projects is that it looks like you need to indent most of it by two additional spaces. Here's some example output:

<project xmlns:dc="http://purl.org/dc/elements/1.1/">
<title>Project issue tracking</title>
<short_name>project_issue</short_name>
<dc:creator>site1</dc:creator>
  <terms>
   <term><name>Project types</name><value>Modules</value></term>
   <term><name>Project types</name><value>Developer</value></term>
  </terms>
<api_version>5.x</api_version>
<recommended_major>1</recommended_major>
<supported_majors>1,2</supported_majors>
<default_major>1</default_major>
<project_status>published</project_status>
<link>http://localhostrgdrupal/project/project_issue</link>
<releases>
  [major snip]
<release>
  <name>project_issue 5.x-0.1-beta</name>
  <version>5.x-0.1-beta</version>
  <tag>DRUPAL-5--0-1-BETA</tag>
  <version_major>0</version_major>
  <version_minor>0</version_minor>
  <version_patch>1</version_patch>
  <version_extra>beta</version_extra>
  <status>published</status>
  <release_link>http://localhostrgdrupal/node/140</release_link>
  <download_link>http://localhostrgdrupal/files/project/project_issue-5.x-0.1-beta.tar.gz</download_link>
  <date>1169599520</date>
  <mdhash>8854aac14c1a6ed2a5b5f10add93f87e</mdhash>
  <terms>
   <term><name>Release type</name><value>New features</value></term>
   <term><name>Release type</name><value>Bug fixes</value></term>
   <term><name>Release type</name><value>Security update</value></term>
  </terms>
</release>
</releases>
</project>

Seems to me that everything between <project> and </project> should be indented extra.

Setting to CNW for the indentation issue. If I'm wrong there, that's fine.

#41

Gábor Hojtsy - April 14, 2008 - 08:35

The indentation itself should not make any XML invalid or valid. Do you have a copy of what was that site saying about invalidity of this XML?
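That point is easy to check: a parser sees the same element structure whether or not the contents of project are indented, because whitespace between elements only produces text nodes. A small plain-PHP sketch (the helper name is hypothetical):

```php
<?php
// List element names in document order, ignoring whitespace-only text nodes.
function element_names(string $xml): array {
  $doc = new DOMDocument();
  $doc->preserveWhiteSpace = false;  // Must be set before loading.
  $doc->loadXML($xml);
  $names = [];
  foreach ($doc->getElementsByTagName('*') as $el) {
    $names[] = $el->nodeName;
  }
  return $names;
}

$flat     = "<project>\n<title>Bbcode</title>\n<short_name>bbcode</short_name>\n</project>";
$indented = "<project>\n  <title>Bbcode</title>\n  <short_name>bbcode</short_name>\n</project>";
var_dump(element_names($flat) === element_names($indented));  // bool(true)
```

So the extra indentation is a readability improvement for humans, not a correctness fix.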

#42

aclight - April 14, 2008 - 11:03

Sorry, I wasn't meaning to imply that the indentation had anything to do with the XML not validating. Those were two separate issues in my mind.

As for the validation errors, I pasted the XML from #40 into the validation engine I mentioned in #40, and here was the result:

Errors:

line 1, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: project
line 1, [user-supplied text]:
    error (1203): attribute can't be checked because element is not defined: project
line 2, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: title
line 2, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 2, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: title
line 2, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: title
line 3, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: short_name
line 3, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 3, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: short_name
line 3, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: short_name
line 4, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: dc:creator
line 4, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: dc:creator
line 4, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: dc:creator
line 4, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 5, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: terms
line 6, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: term
line 6, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: term
line 6, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: value
line 6, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: value
line 6, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 6, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: value
line 6, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: name
line 6, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: name
line 6, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 6, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: name
line 6, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: term
line 7, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: term
line 7, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: term
line 7, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: value
line 7, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: value
line 7, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 7, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: value
line 7, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: name
line 7, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: name
line 7, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 7, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: name
line 7, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: term
line 8, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: terms
line 8, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: terms
line 9, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: api_version
line 9, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: api_version
line 9, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 9, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: api_version
line 10, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: recommended_major
line 10, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 10, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: recommended_major
line 10, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: recommended_major
line 11, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: supported_majors
line 11, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 11, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: supported_majors
line 11, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: supported_majors
line 12, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: default_major
line 12, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: default_major
line 12, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: default_major
line 12, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 13, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: project_status
line 13, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 13, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: project_status
line 13, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: project_status
line 14, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: link
line 14, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: link
line 14, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 14, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: link
line 15, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: releases
line 17, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 17, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: release
line 18, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: name
line 18, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 18, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: name
line 18, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: name
line 19, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: version
line 19, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: version
line 19, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: version
line 19, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 20, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: tag
line 20, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 20, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: tag
line 20, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: tag
line 21, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: version_major
line 21, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: version_major
line 21, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: version_major
line 21, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 22, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: version_minor
line 22, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 22, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: version_minor
line 22, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: version_minor
line 23, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: version_patch
line 23, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 23, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: version_patch
line 23, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: version_patch
line 24, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: version_extra
line 24, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: version_extra
line 24, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: version_extra
line 24, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 25, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: status
line 25, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 25, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: status
line 25, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: status
line 26, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: release_link
line 26, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: release_link
line 26, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 26, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: release_link
line 27, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: download_link
line 27, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 27, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: download_link
line 27, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: download_link
line 28, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: date
line 28, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 28, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: date
line 28, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: date
line 29, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: mdhash
line 29, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: mdhash
line 29, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: mdhash
line 29, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 30, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: terms
line 31, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: term
line 31, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: term
line 31, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: value
line 31, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: value
line 31, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 31, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: name
line 31, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: value
line 31, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: name
line 31, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 31, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: name
line 31, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: term
line 32, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: term
line 32, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: term
line 32, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: value
line 32, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: value
line 32, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 32, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: value
line 32, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: name
line 32, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: name
line 32, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 32, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: name
line 32, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: term
line 33, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: term
line 33, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: term
line 33, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: value
line 33, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: value
line 33, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 33, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: name
line 33, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: value
line 33, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: name
line 33, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: name
line 33, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: (CharData)
line 33, [user-supplied text]:
    error (1102): tag uses GI for an undeclared element: term
line 34, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: terms
line 34, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: terms
line 35, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: release
line 35, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: release
line 36, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: releases
line 36, [user-supplied text]:
    error (1150): enclosing tag undefined or lacks content model; can't check child: releases
line 37, [user-supplied text]:
    error (1103): end tag uses GI for an undeclared element: project
line 37, [user-supplied text]:
    error (402): EOF encountered; no doctype declaration found: project

As I said above, I know next to nothing about XML, so I'm not sure that this site is a good choice for XML validation. Maybe there's something subtle that I'm not aware of that makes it a poor choice. It was just the first hit I got when searching Google for whatever terms I used.

#43

Gábor Hojtsy - April 14, 2008 - 21:36
Status:patch (code needs work)» patch (code needs review)

Nearly all of these are "undeclared element" or "lacks content model" errors, and the last one just says that no doctype declaration was found. Together they really just show that project module does not refer to a defined XML schema to validate against. Since there is no schema, the XML can never validate; all you can do is check that the generated XML is well-formed. This is true of all files generated by project module, before or after this patch, so these validation errors should not hold up the patch.
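The difference between the two kinds of check can be sketched in Python with the standard library's `xml.etree.ElementTree`, which only checks well-formedness and never consults a DTD or schema. The `is_well_formed` helper and the sample fragments below are hypothetical; the good fragment just mimics element names from the #40 output and assumes the root declares the Dublin Core namespace used by `dc:creator`.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML.

    This checks matched tags, proper nesting and a single root element.
    It does NOT validate against a DTD or schema, so elements such as
    <short_name> do not need to be declared anywhere.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragment shaped like the project XML: undeclared elements,
# but well-formed, so it is accepted.
good = """<project xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Example</title>
  <short_name>example</short_name>
  <dc:creator>someone</dc:creator>
</project>"""

# The same idea with a mismatched closing tag: not well-formed.
bad = "<project><title>Example</titel></project>"

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

A validator like the one used in #42 goes a step further and needs a DTD or schema to check against; without one, it reports every element as undeclared even when the document is perfectly well-formed.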

Also, the XML output for individual projects is not generated by this patch; the patch only slightly modifies it. So the fact that the existing code does not indent that output nicely should not be a problem for this patch.

With all this considered, I think you did not identify any issues with this patch. Did you notice anything else that might keep it from being committed?

#44

aclight - April 14, 2008 - 21:59
Status:patch (code needs review)» patch (reviewed & tested by the community)

Anyone who has hung out in the project* issue queues long enough knows that a patch not introducing new errors/issues is not, by itself, a guarantee that it won't be held up. :)

But, no, I didn't find any other issues with the patch other than those I've mentioned. I'll RTBC it and then dww/hunmonk can have their way with it.

#45

hunmonk - April 16, 2008 - 03:59

visual inspection of the code looks good. since dww is really most familiar with this functionality, i'm going to leave it to him to commit.

#46

dww - June 11, 2008 - 03:23
Status:patch (reviewed & tested by the community)» fixed

I still feel a little dirty with this list living under "release-history", but whatever, this issue has dragged on for way too long and I don't have time/energy to rewrite everything for a more elegant URL. After final review, committed to DRUPAL-5 and HEAD, and deployed on d.o.

Behold:

http://updates.drupal.org/release-history/project-list/all

Heh, great...

% wget http://updates.drupal.org/release-history/project-list/all
% wc all
   47690   53641 1572577 all

Good luck downloading and parsing this monster. ;)
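For anyone who does want to parse it, here is a minimal streaming sketch in Python. It assumes each entry is a `<project>` element with `<short_name>`, `<title>` and `<link>` children. Those element names are guessed from the description of the list (short names, titles and project page URIs) and have not been checked against the live feed. `iter_projects` is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple


def iter_projects(source: BinaryIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (short_name, title, link) for every <project> element.

    ASSUMPTION: the element names are guessed, not taken from the real
    project-list feed; adjust them after inspecting the actual file.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "project":
            # On the "end" event, all children of <project> have been parsed.
            yield (
                elem.findtext("short_name", default=""),
                elem.findtext("title", default=""),
                elem.findtext("link", default=""),
            )
            # Drop the children and text we no longer need; the emptied
            # <project> element itself stays attached to the root.
            elem.clear()


# Made-up sample in the assumed format.
sample = b"""<projects>
  <project><title>Views</title><short_name>views</short_name><link>http://drupal.org/project/views</link></project>
  <project><title>CCK</title><short_name>cck</short_name><link>http://drupal.org/project/cck</link></project>
</projects>"""

for short_name, title, link in iter_projects(io.BytesIO(sample)):
    print(short_name, title, link)
```

At roughly 1.5 MB, the file would also fit in memory with a plain `ET.parse`. Streaming just keeps memory use flat if the list keeps growing. To run it on the downloaded file, open `all` in binary mode (`open("all", "rb")`) and pass the file object to `iter_projects`.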

Anyway, thanks to everyone who helped with this, and sorry for the delays.

#47

dww - June 11, 2008 - 23:29
Category:feature request» bug report
Status:fixed» patch (code needs work)

Crap, the new XML breaks the parser in update(_status)?. :( See #269444: Some xml information being lost by parser. Ugh.