Add XHTML validation to simpletest

catch - January 6, 2009 - 20:33
Project:Drupal
Version:7.x-dev
Component:simpletest.module
Category:feature request
Priority:normal
Assigned:boombatower
Status:postponed
Issue tags:Favorite-of-Dries, XHTML validation
Description

Quoting Dries: "I'm wondering if we could add a XHTML validation step to D7's test framework so that every page that is tested automatically gets validated ..."
Cross posting from #326527: form id="node-form" declared twice

#1

boombatower - January 20, 2009 - 08:47
Assigned to:Anonymous» boombatower

Since we already use DOM to parse document should be easy.

DOMDocument::validate -> http://us3.php.net/manual/en/domdocument.validate.php

#2

boombatower - January 20, 2009 - 08:49
Status:active» needs review

Here is some testing I have done.

The validation seems to work great...checks some subtle things I tried.

I included some code MYDOMDocument in a user comment to collect the actually errors. This may/maynot be something we want to collect. Also the class requires cleanup (and I can't get to work either).

AttachmentSizeStatusTest resultOperations
simpletest_dom.patch3.76 KBIdleFailed: 6999 passes, 1 fail, 21726 exceptionsView details | Re-test

#3

System Message - January 20, 2009 - 09:20
Status:needs review» needs work

The last submitted patch failed testing.

#4

mr.baileys - March 16, 2009 - 07:58

I had something similar over at #402356: XHTML validator needed in SimpleTest. (now marked as duplicate).

Some comments after reviewing your patch:

  1. As I understand it, the MyDOMDocument class is used to trap the errors thrown by the validation functions. I think using libxml_use_internal_errors() and libxml_get_errors() is cleaner and shorter (although I'm not sure about compatibility across PHP (sub-)versions).
  2. When I used DOMDocument::validate(), the performance was terrible. I traced this down to the DTD (and its related resources) being downloaded on each request. I've found that XML caches are the solution for this, but couldn't get one set up on my end (Windows...), and doing so automatically doesn't seem feasible. I'm wondering what your experience with regards to performance is? Maybe this is a Windows issue? My final solution was to bundle the DTDs with SimpleTest, but this results in a large patch (which ni turn might make it unsuitable to be commited in the end) and might prove more difficult to manage in the long term.

I've attached my patch here for comparison/documentation (I don't mean to hijack your issue). Also set this to CNR to let the testbot play with it once.

Also note that there is a similar effort by Damien Tournoud in (#402254: How do we do on XHTML validation?). While not full XHTML validation, it does bring to light a number of bugs by just changing one line in SimpleTest.

AttachmentSizeStatusTest resultOperations
XHTML_validation.patch193.78 KBIdleFailed: 10582 passes, 6888 fails, 2366 exceptionsView details | Re-test

#5

mr.baileys - March 16, 2009 - 08:03
Status:needs work» needs review

Also set this to CNR to let the testbot play with it once.

#6

boombatower - March 16, 2009 - 08:12

The performance I got was terrible as well and that was the same conclusion I drew...and if you delete the DTD references from the template it runs better (my hack to confirm).

Thus I am open to whatever solution you have if it runs better. I just remembered that existed and played with it to see what I could get to work. Glad to see someone else interested in this.

#7

mr.baileys - March 16, 2009 - 08:26

Problem is: by dropping the DTD I think we're defeating the purpose of XHTML validation, since it's the DTD that drives most of it. Haven't tested this, but my fear is that be removing the DTD, we're essentially downgrading to just HTML validation.

Maybe we should start by building a test case containing things we want the validator to notice? (note that some of these will get caught even by normal HTML validation as they're not XHTML specific):

  • Uppercase element names
  • Unclosed tags
  • Unmatched tags
  • Disallowed tags
  • Tags that are only allowed within certain other tags
  • Invalid element attributes
  • Duplicate ID attributes
  • Incorrectly encoded entities
  • ...

#8

System Message - March 16, 2009 - 08:45
Status:needs review» needs work

The last submitted patch failed testing.

#9

boombatower - March 16, 2009 - 20:55

I don't think we should remove the DTD tags...I would just commenting that that in fact seems to be the problem and we need some sort of solution to make this feasible.

#10

Dries - March 17, 2009 - 15:38
Issue tags:+Favorite-of-Dries

Awesome. One option, but I'm not sure I like that, is to make the validation optional; i.e. there could be a checkbox on the SimpleTest page that allows people to enable/disable validation. HTML validation is probably not something we'd need to run all the time, if gets run once in a while, that might be "good enough".

#11

Damien Tournoud - March 17, 2009 - 15:46

I don't think we actually need all that, at least not right now. DOMDocument::loadHTML() seems to know a lot about HTML, including (at least part of) the DTDs. I just posted a simple patch to suppress error hiding in #402254: How do we do on XHTML validation?... it resulted in 1442 exceptions. We should first concentrate on fixing those, then come back to full-fledged validation to see if it makes a difference.

#12

Damien Tournoud - March 17, 2009 - 15:47
Status:needs work» postponed

Let's postpone this one until we investigate the 1442 exceptions of #402254: How do we do on XHTML validation?.

#13

Damien Tournoud - March 17, 2009 - 20:53
Issue tags:-Invalid markup, -XHTML

Uniformizing tags.

#14

Dave Reid - July 16, 2009 - 14:41

+1 for this. Would be even better if we could somehow get a way to validate the RSS/XML feeds that Drupal generates.

 
 

Drupal is a registered trademark of Dries Buytaert.