This is not strictly the fault of wysiwyg_imageupload, yet it is down to the way that wysiwyg_imageupload does what it does.

Nice, valid XHTML Strict source

<img src="image.jpg" alt="an image" />

becomes

<img src="image.jpg" alt="an image">

whenever "Image Uploading" is enabled for the WYSIWYG format.
This happens to "normal" image embeds, not the ones added by wysiwyg_imageupload itself, and also to a few other trivial source-code cases.

This is a problem for me because:
A: the rest of the site and theme used to validate perfectly. Now it can't.
B: I am using another input filter that does clever things using XPath processing on what used to be valid XHTML.

I spent about 6 hours on this today, first tracking it down to here, then tracing it and trying (and failing) to find workarounds.

It's all about how tinymce-3.js chooses to use jQuery in the attach and detach callbacks.
For reasons I've now researched until my eyes bleed:
- TinyMCE proper does not use jQuery, and does a fine job of maintaining markup validity.
- wysiwyg_imageupload uses jQuery to do a little pre- and post-processing of the text whenever the text editor is enabled.
- jQuery, specifically the $content.html() call, doesn't give a damn about validity or the DOCTYPE and always returns old-style HTML (no closing singleton slash), not XHTML.

Stand-alone WYSIWYG.module + TinyMCE is fine, but wysiwyg_imageupload calls jQuery to read, parse, manipulate and write out the text content, and that round trip corrupts it.
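To illustrate the difference outside the browser, here is a small Python analogy (Python 3.8+ assumed, so attribute order is preserved). It is not jQuery or TinyMCE code: it parses the same img tag with the standard-library ElementTree and writes it out twice, once with XML serialization and once with HTML serialization. The HTML form drops the closing slash the same way the $content.html() round trip does, since jQuery's .html() getter returns the element's innerHTML, which a page served as text/html serializes as HTML.

```python
import xml.etree.ElementTree as ET

# The valid XHTML from the example above.
xhtml_fragment = '<img src="image.jpg" alt="an image" />'
element = ET.fromstring(xhtml_fragment)

# XML serialization keeps the empty-element slash.
as_xml = ET.tostring(element, encoding="unicode", method="xml")

# HTML serialization writes img as a void element with no slash,
# which is what the innerHTML-based round trip produces.
as_html = ET.tostring(element, encoding="unicode", method="html")

print(as_xml)   # <img src="image.jpg" alt="an image" />
print(as_html)  # <img src="image.jpg" alt="an image">
```

Nothing about the element changed between the two calls; only the serialization method did, which is why "just reading and writing back" is enough to lose XHTML validity.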

I have totally failed to find a way to sensibly prevent this from happening, or to get jQuery to return better markup.

I searched a LOT and am not proposing any solution; I just need to document here that this should rate as a "Known issue".

If any genius can solve this well, I'd like to see it.

* I don't want to discard the current, sensible jQuery utility calls entirely, but it looks like that's what it would take.
* I don't like the idea of a regexp repair job after the fact. That's just horrid.
* I tried working around the issue by skipping the round trip when the attach/detach processes actually have nothing to do, but that just becomes unstable.

For my actual use case, I'm going to give up and get my XHTML parser to re-tidy the bad input before processing, as that's the most robust way to protect myself against invalid markup.
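A minimal sketch of that re-tidy step, assuming the stored content is an HTML fragment and using Python's standard-library html.parser rather than a regexp. The class and function names (XhtmlRetidier, retidy) are hypothetical, not part of any Drupal module. It re-emits void elements such as img and br as self-closing tags and gives minimized attributes an explicit value; it does not repair unclosed non-void elements, so a full tidy library would be more thorough.

```python
from html import escape
from html.parser import HTMLParser

# HTML void elements: XHTML requires these to be written as <tag ... />.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class XhtmlRetidier(HTMLParser):
    """Hypothetical helper: re-serializes an HTML fragment as XHTML-style markup."""

    def __init__(self):
        # convert_charrefs=True turns &amp;, &nbsp; etc. into characters,
        # which handle_data re-escapes where XML requires it.
        super().__init__(convert_charrefs=True)
        self.out = []

    @staticmethod
    def _attrs(attrs):
        # Minimized attributes (e.g. "checked") arrive with value None;
        # XHTML needs an explicit value, so repeat the attribute name.
        return "".join(
            f' {name}="{escape(name if value is None else value, quote=True)}"'
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        close = " />" if tag in VOID_ELEMENTS else ">"
        self.out.append(f"<{tag}{self._attrs(attrs)}{close}")

    def handle_startendtag(self, tag, attrs):
        # Input already written as <tag ... />: keep it self-closing.
        self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        # Stray end tags for void elements (e.g. </img>) are dropped.
        if tag not in VOID_ELEMENTS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def retidy(fragment):
    parser = XhtmlRetidier()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(retidy('<p><img src="image.jpg" alt="an image"><br>Tom &amp; Jerry</p>'))
```

Using a real parser here, instead of patching `>` into `/>` with a regexp, means attribute values that happen to contain `>` or quotes are handled correctly.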

Comments

EugenMayer

Title: WYSIWYG Image Upload (+ tinyMCE) can break XHTML validity. img tag singletons lose their closing slash. » Be XHTML valid
Category: bug » task
Priority: Normal » Minor

Thank you for the detailed description and the time you took to actually deal with the issue yourself.

In my eyes, that's not really a bug but rather cosmetics for HTML purists. Nevertheless, I am open to fixing those things to get a "cleaner" web :) . AFAICS this is just about fixing the theming methods; that's pretty much it. If you are talking about "in editing" mode, I would rather tell you that validity there is a "dream to accomplish", and I would then rather close this ticket. We can hope to reach this goal with HTML5 editors in the future.

For the output ("rendered") node view, just fix the theming methods.

Patches are welcome

dman

Yeah, it's not something that the theming methods touch.

This issue in general isn't actually very closely related to the wysiwyg_image_upload functionality and the way it manages images at all, so I appreciate it's not really a bug here.
It's more that choosing to *use* this module exposes an underlying issue that would occur any time you choose to:
- load valid XHTML
- run jquery manipulations over it
- dump the result back to the page as old-style HTML
- and then *save* that result into the database.

The things that got broken for me were other img tags, not ones managed by wysiwyg_image_upload; they got corrupted as collateral damage because the module's plugin JS *did* go through the steps above.

It became an issue for me not as a purist thing (though I certainly am one), but because elsewhere in the site we scan and parse the node contents for semantic enhancements using XML tools.
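The four steps listed above, followed by the downstream XML parsing, can be sketched the same way with Python's ElementTree as a stand-in for the browser. The markup, the "managed" class and the added title attribute are hypothetical placeholders for whatever the plugin actually touches. The point is that the second, untouched image is corrupted too, because the whole fragment is re-serialized as HTML, and an XML parser then rejects the saved text.

```python
import xml.etree.ElementTree as ET

# Step 1: load valid XHTML (wrapped in one root element so it parses as XML).
# Hypothetical content: one plugin-managed image, one ordinary image.
source = ('<div><img class="managed" src="a.jpg" alt="managed" />'
          '<img src="b.jpg" alt="ordinary" /></div>')
root = ET.fromstring(source)

# Step 2: manipulate only the managed image (stand-in for the plugin's edits).
for img in root.iter("img"):
    if img.get("class") == "managed":
        img.set("title", "processed")

# Step 3: dump the whole fragment back out as old-style HTML.
saved = ET.tostring(root, encoding="unicode", method="html")
print(saved)

# Step 4: after saving, XML tooling tries to parse the stored text.
try:
    ET.fromstring(saved)
    well_formed = True
except ET.ParseError as err:
    well_formed = False
    print("XML tools reject the saved text:", err)
```

Without the closing slashes, the XML parser reads the second img as a child of the first and then hits a mismatched `</div>`, so the parse fails even though only one element was deliberately changed.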

Until now, using WYSIWYG+TinyMCE maintained that data integrity for me: I got XHTML in, XHTML out when editing, even when doing all sorts of things with the editor.
Adding this plugin to the mix accidentally spoiled that. Disabling the plugin returned to the correct behavior.
Yes, it's in "edit mode" that this breakage happens, but the breakage is caused directly by the actions of the tinymce.js file.

But no, it's not really a bug against this module. IMO it's against jQuery.
It's just this edge-case combination, jQuery messing with the textarea whose contents then end up back in my database, that exposes this as a problem.

I mostly wanted to *document* that this was a *problem*, as I failed to find any helpful research on it in my searches.