Some thoughts on the execution - I think it should use pure HTML for documentation [#285290]

Oh dear. I was really happy when I saw this module mentioned. I've been thinking about some useful sort of extension to the module help for, um, 2 years now.
But.

I've just scanned the CVS code (I haven't tried it out yet; I'm on a non-dev box).
It's looking incredibly heavy for what I thought was a pretty simple job. I guess it got a few features thrown at it, or the search indexing turned out to be tricky.

I've got (at least) two concerns that stop me from even trying to join in and follow its lead.

#1 The *.html files are not actually HTML. Arg! They are incomplete, invalid HTML fragments. We suddenly lose all the practicality of using HTML as a doc format if we don't follow the rules.
Would it have been so hard to specify real HTML input and then strip out the content of the body tags?
#1.b The links to each other are (as far as I can see) proprietary and made up ... and don't work out of context, like, y'know, when viewing the help files directly, offline. I can see why they are there, but I'd think of using a link rel= attribute or something for that meta-link.
#2 The *.ini file syntax puts metadata in yet another place, using yet another syntax. Couldn't we have embedded that sort of info in the docs themselves, using well-worn meta tags and/or semantic/microformat tags? Or even a toc.html index? The parsing might look a tiny bit more challenging, but given the many horrible hoops the search-and-replace text parser is already jumping through, it looks to me like it's time to drink the XHTML Kool-Aid and start trusting PHP5 and its DOM tools.
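To make the "real HTML in, body out" idea concrete, here is a minimal sketch in Python (standing in for PHP5's DOM tools). It assumes the help page is well-formed XHTML. It pulls the title, the `<meta name=... content=...>` pairs that could replace the .ini keys, the `<link rel=... href=...>` meta-links, and the inner markup of `<body>`. The meta name `weight` and the rel value `contents` are illustrative assumptions, not an existing convention of the module.

```python
import xml.etree.ElementTree as ET


def _local_name(tag):
    # ElementTree reports namespaced tags as "{uri}name"; keep only "name".
    return tag.split("}", 1)[1] if tag.startswith("{") else tag


def parse_help_page(xhtml):
    """Split a complete XHTML help page into metadata and body content."""
    root = ET.fromstring(xhtml)
    # Drop the XHTML namespace so lookups and re-serialization stay simple.
    for el in root.iter():
        if isinstance(el.tag, str):
            el.tag = _local_name(el.tag)

    head = root.find("head")
    title_el = head.find("title") if head is not None else None
    title = (title_el.text or "").strip() if title_el is not None else None

    # <meta name="..." content="..."> pairs stand in for the .ini keys.
    meta = {m.get("name"): m.get("content", "")
            for m in root.iter("meta") if m.get("name")}

    # <link rel="..." href="..."> carries the meta-links between pages;
    # the href stays a plain relative file name, so it also works offline.
    links = [(l.get("rel"), l.get("href"))
             for l in root.iter("link") if l.get("rel")]

    # Inner markup of <body>: its leading text plus each child serialized
    # (ET.tostring includes each child's trailing text).
    body = root.find("body")
    inner = ""
    if body is not None:
        inner = (body.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in body)

    return {"title": title, "meta": meta, "links": links, "body": inner.strip()}


# Hypothetical help page: valid on its own, readable in any browser.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Using the widget</title>
  <meta name="weight" content="2" />
  <link rel="contents" href="index.html" />
</head>
<body>
  <p>Read <a href="setup.html">setup</a> first.</p>
</body>
</html>"""

if __name__ == "__main__":
    print(parse_help_page(SAMPLE))
```

The same file opens fine in a browser or file manager, and the help tool only needs the body plus a few named fields. Note that named entities such as `&nbsp;` would need DTD handling, which this sketch does not do.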

This critique is just a version of how I would have approached it, and I have different priorities/requirements than were in force in this module I guess. Please take it as constructive a code review as I can manage :-}

Important for me is:

HTML documentation should be as accessible offline as a README.txt: valid for opening in a file browser, as published raw files, or in the online CVS archive.
Editing of those files can be done with standard tools and standard conventions: an HTML editor and HTML Tidy.
Cross-referencing links should work as expected on a flat file system too. Making them more clever and weaving them into the context of the 'help system' should be done by the tool that's rewriting them into its own context.
It should not involve inventing any new syntax when things can be stolen from existing conventions like doxygen, DocBook and HTML itself. It's got to be drop-in-and-play: an HTML manual that's self-contained and just happens to be compatible with another tool that knows how to read it. Drop one valid help/index.html file, created with any tool, into the directory and the minimum requirement is met.
It should be invisibly extensible to allow for new features. All features should be optional. This is why I like a microformat approach.
Ini files annoy me.

... :-(
So I really like the idea, and was hoping to come join in and try a D5 backport. But the current architecture approach doesn't make me happy enough to join in.

... just some feedback, I hope you see where I'm coming from.
regards, .dan.

Comments

Comment #1

merlinofchaos commented 21 July 2008 at 16:24

Sorry. Most of this stuff never really occurred to me. I rather wish you'd performed this review before summer of code started so that Gurpartap could have gotten your feedback, but I fear it is probably a little late now.

Some comments:

The .ini format is very easy, and I know nothing about the other formats you mention, so they aren't really something I would even have thought of. Also, everything you mention would mean much heavier code, which is another thing you're having trouble with, so apparently there's a 'lose' there either way.

Dealing with embedded URLs in Drupal turned out to be a major pain in the patookie because of things like clean urls and the fact that you can't really rely on where in your structure even things like "administer >> site building >> modules" will be. To be able to get that URL properly you need something special.

And I have no idea how you'd cross-reference links into another module.

doxygen is not an HTML documentor, so I don't think there's anything that could be stolen from there. Maybe rel metadata could've worked but I never thought of that. I'm still dubious.

Embedding meta info into the doc itself would've required a very difficult parse to pick up the index tree. It would've required scanning and parsing every help file in the system, which sounds very heavy to me.

ini files seem to annoy lots of people, but they're at least clean and easy to deal with.

So all in all, not sure what I can do here.

Comment #2

dman commented 22 July 2008 at 01:12

Yeah, I didn't see that the first time around, only saw it in passing yesterday. Sorry for being late to the game.

Well, I'm much more comfortable (nowadays) with metadata extraction from DOM structures, although it was painful for years. So I don't see it as at all tricky anymore, especially compared to a ton of regexps. You need this sort of function anyway to rewrite the links accurately, and I think that once you get over that hump, the cross-referencing tricks would be a LOT easier, as the link elements would be much more workable.
However, it does depend a bit on familiarity with the XML tools. PHP5+XML is now (or should be) ubiquitous enough to stop worrying about hacks, especially on the developer machines this module is targeted at.

I know what it's like to rewrite URLs in mixed contexts, but there are ways through once you bite the bullet to rewrite everything using a proper tokenizer/DOM. The rewriter is already working hard enough to do that (tricky indeed) - I just want it to start with a more valid base source to do its magic on.
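As a rough illustration of rewriting links through a parser rather than regexps, the Python sketch below walks every `<a>` element in a body fragment. It rewrites only bare same-directory links such as `setup.html#step2` and leaves absolute URLs, `mailto:`, in-page anchors and paths containing a slash untouched. The target scheme `help/<module>/<topic>` and the names `rewrite_links`, `module` and `base` are hypothetical. Cross-module links are deliberately out of scope. A real Drupal implementation would still need to run the result through Drupal's own URL handling to cope with clean URLs.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def rewrite_links(fragment, module, base="help"):
    """Rewrite same-directory links like "setup.html#step2" into
    hypothetical help-system paths like "help/<module>/setup#step2"."""
    # Wrap the fragment so it parses as a single well-formed element.
    wrapper = ET.fromstring("<div>" + fragment + "</div>")
    for a in wrapper.iter("a"):
        href = a.get("href")
        if not href:
            continue
        parts = urlsplit(href)
        # Skip absolute URLs, mailto: and anything with a directory part.
        if parts.scheme or parts.netloc or "/" in parts.path:
            continue
        # Skip in-page anchors ("#top", empty path) and non-HTML targets.
        if not parts.path.endswith(".html"):
            continue
        topic = parts.path[: -len(".html")]
        new_href = f"{base}/{module}/{topic}"
        if parts.fragment:
            new_href += "#" + parts.fragment
        a.set("href", new_href)
    # Serialize the wrapper's contents without the <div> itself.
    return (wrapper.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in wrapper)
```

Because the source files keep ordinary relative links, they still work when browsed as flat files. Only the tool that imports them changes the targets.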

I mention doxygen mostly for reference with terminology and approach, not necessarily syntax. Although there is absolutely nothing to stop a commented doxygen block working just fine inside an HTML file.

Seeing as (from what I can see) there's a lot of indexing happening anyway, I'm not too worried about the processing weight. I'm imagining these as being admin-only references, although you'd have a point when these are public.

I can live with ini files for primary meta-indexing now, if you think they really parse that much better. But if you'd be into formalizing *.html files as, um, HTML files I could play with a demo of the methods I'd suggest. If we can make those xhtml-strict, then most other stuff can be transparently layered in. Which is one of my points ;-)

Comment #3

merlinofchaos commented 22 July 2008 at 02:51

Here's a downside: PHP4 compatibility is a requirement. And we all know how nasty PHP4 and XML are.

Comment #4

merlinofchaos commented 22 July 2008 at 02:53

If you think you can do a demo in PHP4, and it won't take TOO much effort (I'd hate for you to waste a lot of effort if it's a direction we ultimately choose not to pursue) I would like to see it. But PHP5 only would be a Drupal 7 thing and I'm personally less involved with that and Gurpartap would currently be the guy to convince. And his project is past the halfway point so there probably isn't time to do that.

Comment #5

dman commented 22 July 2008 at 03:11

OK, if PHP4 is a stumbling block I will totally withdraw. I've lived through it and don't want to again. I'll not be mocking that up.
It just looked like this was an advanced feature module and PHP5 requirements were reasonable this year.

Still, using normal regexps we can just grab the contents of the body element and proceed as before, without any real backwards incompatibility. But anything I'd be contributing would be PHP5 only. If that's not a go... OK.
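The regexp fallback mentioned here could look roughly like the Python sketch below. It needs no DOM support, so the same pattern would port to PHP4's preg functions. A full page yields its `<body>` contents. Input without a `<body>` tag is treated as an existing fragment and passed through, which keeps the current fragment-style files working. The name `extract_body` is hypothetical. The pattern assumes no literal `</body>` appears inside a comment or script within the body.

```python
import re

# Case-insensitive, and "." must also match newlines inside the body.
_BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def extract_body(html):
    """Return the contents of <body> for a full page, or the input
    unchanged (trimmed) if it is already a bare fragment."""
    match = _BODY_RE.search(html)
    return (match.group(1) if match else html).strip()
```

For example, `extract_body("<p>fragment</p>")` returns the fragment as-is, while a complete page with `<BODY class='x'>` returns only the markup between the body tags.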

Comment #6

merlinofchaos commented 22 July 2008 at 16:09

I'll see if I can find Gurpartap and point him at this issue.

I wrote this primarily to be a companion to Views; Views for Drupal 6 is PHP4 compatible, so this must be too. For Drupal 7, that's another story entirely.

Comment #7

redndahead commented 1 April 2009 at 23:02

Status: Active » Closed (won't fix)

Marking as won't fix. D7 doesn't seem to be moving in this direction, so I think it's something to bring up for D8.
