Hi,

I am trying to find out what might be causing the /> to be changed to > on image tags when I enable the Teaser break plugin.

I discovered the problem and narrowed down the issue in this post: http://drupal.org/node/504946.

I don't know enough about what is contributing to changing the tag, but as an experiment, I commented out this in the break.js.

  /**
   * Replace images with <!--break--> tags in content upon detaching editor.
   */
  detach: function(content, settings, instanceId) {
    var $content = $('<div>' + content + '</div>'); // No .outerHTML() in jQuery :(
    // #404532: document.createComment() required or IE will strip the comment.
    // #474908: IE 8 breaks when using jQuery methods to replace the elements.
    // @todo Add a generic implementation for all Drupal plugins for this.
    $.each($('img.wysiwyg-break', $content), function (i, elem) {
      elem.parentNode.insertBefore(document.createComment('break'), elem);
      elem.parentNode.removeChild(elem);
    });
    return $content.html();
  },

Now the img tags are valid xhtml as the /> is not removed, but of course the break comment is not rendered.

Please let me know if you need any more info.

Cheers, eezz.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

sun’s picture

Hm. The question is, what exactly triggers the bug.

Is it

    var $content = $('<div>' + content + '</div>'); // No .outerHTML() in jQuery :(
...
    return $content.html();

or is it

    $.each($('img.wysiwyg-break', $content), function (i, elem) {
      elem.parentNode.insertBefore(document.createComment('break'), elem);
      elem.parentNode.removeChild(elem);
    });

or is it

      elem.parentNode.insertBefore(document.createComment('break'), elem);
      elem.parentNode.removeChild(elem);

?

eezz’s picture

Hi sun, I commented out those items individually to narrow it down.

var $content = $('<div>' + content + '</div>'); // No .outerHTML() in jQuery :(
...
    return $content.html();

Commenting out that one breaks the editor. The "source editor" popup is just blank.

$.each($('img.wysiwyg-break', $content), function (i, elem) {
      elem.parentNode.insertBefore(document.createComment('break'), elem);
      elem.parentNode.removeChild(elem);
    });

Commenting out this line still strips off the closing /

 elem.parentNode.insertBefore(document.createComment('break'), elem);
      elem.parentNode.removeChild(elem);

Same result as above.

I don't have the skills to know what is going on here, but if there is anything else you think I should try, let me know.

Cheers, Ian

eezz’s picture

Hi,

I tried it using the FCK Editor, thinking that would make a difference, of course the same issue occurs.

However, I have discovered that Drupal uses node_teaser() to put in the breaks anyway ;). I didn't know that because I am quite new to Drupal API.

So I can safely turn off the Teaser Break plugin and in the cases where I want to modify where the break goes I can input the break comment manually.

I guess that doesn't resolve the bug, but at least a work around.

Cheers, eezz

TwoD’s picture

Assigned: Unassigned » TwoD

I'll look into this. The Teaser Break plugin has so far proven to be a good issue bait, allowing us to catch a couple of "curiosities" which we hopefully can avoid in the future and I'd love to keep working with it. Keeping all output [Strict] XHTML is hard, especially since we don't always have precise control of how an editor or browser behaves, but we of course strive to do it as it's a requirement for many users.

We very much appreciate everyone finding, reporting and investigating bugs like these! I'll get right on it once I've had a couple of pancakes. Coding on an empty stomach just leads to more "curiosities" in my experience! =)

TwoD’s picture

Title: img tag invalid with Teaser Break Plugin enabled » Teaser Break and Image Assist plugins dropping trailing slashes (XHTML vs HTML)

So far I've found that var $content = $('<div>' + content + '</div>'); // No .outerHTML() in jQuery :( is where the self closing tag is removed.
Note: For me, the actual replacement didn't happen in break.js because I also had the Image Assist plugin enabled, which uses the same outerHTML trick.

It appears that jQuery (or rather the browser) does not treat the editor contents as XHTML but plain HTML. So any new markup passed into the DOM will come out as HTML, without the trailing slashes.
The only workaround I could find was supposedly to set the HTTP header content type to application/xhtml+xml (includes/common.inc). Drupal now uses text/html for compatibility reasons but at least the default themes should be XHTML compliant. That should tell the JavaScript engine to create XHTML markup instead.

I tried changing this on my dev machine but ran into an undefined validation error which seemed to be in jquery.js. (The page displayed fine but scripts didn't run.) Btw, turn off the Drupal Firebug module if you're going to try this, it produces hundreds of <font> tags and other nasty things which makes your browser choke!

eezz’s picture

Hi TwoD, thanks for investigating. If you need me to test anything, let me know. Given that Drupal adds a teaser break on its own (something that I didn't know until just the other day).

Cheers, eezz

mrfelton’s picture

subs

TwoD’s picture

TwoD’s picture

Status: Active » Needs review

Renaming issue so it's easier to find.
Summary:
The uppercase tags, missing slashes and absolute URLs are all created at the same point in time; when the contents are parsed into a DOM by Drupal plugins or our TinyMCE implementation. That uses the browser's internal functions.
Now, browser's wont output XHTML unless the page they were delivered really is XHTML (MIME from server is application/xhtml+xm and the document validates against XHTML * 1.0 or 1.1)
This is where the problems start. Drupal sends the generated "XHTML" as html/text for compatibility reasons (Loooong story). Only special cases/subsets of XHTML 1.0 (NOT 1.1) are allowed to be sent using html/text, and even so it might be considered broken. When browsers see html/text along with a DOCTYPE for XHTML 1.0, they have to make a qualified guess at which markup was actually used (it's very common for XHTML DOCTYPEd documents to be regular HTML due to bad authoring or copy/pasting). In most cases, IE goes for HTML (aka 'tag soup') as that's what it's most familiar with. Subsequently, all code generated by the browser will be in the wrong markup...

The test-patches referenced above attempt to correct this by not sending the content via the browser at all, which makes for a bit of string manipulation instead.

I'll mark further XHTML issues duplicates of this one unless they belong to a 'new' part of the module or display symptoms which does not really fit here. Note that this issue will be closed if the two issues referenced in #8 are both fixed and it can be confirmed they together or separately solve the problems mentioned here.

sun’s picture

Title: Teaser Break and Image Assist plugins dropping trailing slashes (XHTML vs HTML) » Invalid XHTML: missing trailing slashes, absolute urls and uppercase tags.
Status: Needs review » Active

wow, awesome summary, TwoD!

Since all of this runs in an IFRAME, I guess that our goal and target needs to be to change the content type for 1) the designMode documents of the editors and 2) all HTML that is transferred from/to Drupal for interaction with editors...?

http://wiki.moxiecode.com/index.php/TinyMCE:Configuration/doctype

http://docs.cksource.com/ckeditor_api/symbols/CKEDITOR.config.html#.docType

TwoD’s picture

Status: Needs review » Active

@sun. Thanks!
About those settings. I've already tried that with TinyMCE. It did not affect the problems as they do not originate from inside the iFrames. The output from the editor is correct, but it's mangled once put through our files, which are all loaded in the context of the main page. Besides, that "I can only start typing by clicking on the first row"-problem in IE would annoy the heck out of users and create lots of bug reports...

You can verify that the Drupal plugins and the TinyMCE implementation's prepareContent() cause the problem by inserting debugging code there or just stepping through it in Firebug.

BrightBold’s picture

I am using WYSIWYG 6.x-2.dev from 11/23/2009 with CKEditor, and everything was working beautifully until someone tried to update the site from IE8 (instead of the usual FF, Safari, or Chrome) and all her pages came out double-spaced. All the tags were being converted to uppercase, and then weren't recognized, so the line break converter was adding extra paragraph tags around everything and creating tag insanity.

Found this thread and turned off Teaser Break, now all is well. I know this report doesn't shed any light on the issue, but hopefully it will help others searching on this problem to find this useful thread. I never would have suspected Teaser Break as the culprit - I was off on a line break converter wild goose chase.

Thanks for clearing up that mystery!

TwoD’s picture

@BrightBold, If you have the opportunity, could you please try the patches mentioned in #8 and see if they solve the problem? (You should only need the patch in the first issue if you're not going to use TinyMCE.) Btw, your post also confirms this issue does not appear when using CKEditor without TB, and it's nice to know the info in the issue itself helped someone. :)

gregarios’s picture

My problem is with IMG tags. When the teaser-break button is simply enabled, all my IMG tags are non-xhtml compliant, like <img src="http://url.com/img.jpg"> even when not using the teaser-break at all. However, when I turn off the teaser-break button in the WYSIWYG Editor settings, then my images are correctly formatted as <img src="http://url.com/img.jpg" />. It doesn't matter whether I actually use a teaser-break or not, just having the button enabled causes it.

This has been tested on at least 3 different Drupal sites, using TinyMCE 3.2.7.

Another strangeness: EVEN if you manually place a closing slash at the end of the image tag, by modifying the source code -- if you look at the source by clicking the "Disable rich-text" link or use the "HTML" button, then click it again to bring back the editor, the slash will be gone!

TwoD’s picture

@gregarios, why this happens is explained in #9, and there are a couple of patches mentioned in #8 which work around this problem. They were created to demonstrate that not parsing the content into a DOM and serializing it back again is actually the problem, just doing string operations on it will preserve the HTML. Basically, it's the browser which is messing things up (its internal functions are used for the parsing/serializing), so we don't let the browser touch the code. As this code is is used for substituting special markers in the content with the placeholders, and back again, it is executed each time content is put into or taken out of the editor, that's why you can't fix the image tag manually.

jide’s picture

@TwoD: After some digging, I came to the very same conclusion as yours :)

Anyway, as this DOM to non standard HTML in IE issue is likely to happen again, it might be useful to have alternative solutions. I sent to you some libraries that could help on IRC, but I think this one is the best I found :

http://xhtmljs.codeplex.com/

I think it would be very useful to have something like this available for plugin developers.

Looks pretty solid, but is under a BSD license. Maybe contacting author could help.

gregarios’s picture

Anyway, as this DOM to non standard HTML in IE issue is likely to happen again

I'm using Firefox and Safari. Not exclusive to IE apparently.

gregarios’s picture

The patches referenced in #8 do fix my problem with non-XHTML IMG tags. After applying the first patch (manually) to v2.0, it still didn't work. However, after then applying the second patch, it worked to fix the problem.

sun’s picture

Title: Invalid XHTML: missing trailing slashes, absolute urls and uppercase tags. » Invalid XHTML: missing trailing slashes, absolute urls and uppercase tags

#573878: Teaser break causes invalid HTML has been marked as duplicate, but definitely contains further information related to this issue.

- http://www.stevetucker.co.uk/page-innerxhtml.php is another + smaller xhtml library, but may require some fixes.

- http://drupal.org/node/573878#comment-2397220 contains some test-cases.

jide’s picture

Some more resources. In the TinyEditor I mention in #708558: Add TinyEditor, there is a portion of code that uses regexps to convert to xhtml, in a similar way to what jquery.xhtml() does. Not rock solid, but could help for a simple approach.

if(this.xhtml){
	v=v.replace(/<[^>]*>/g,function(match){return match.toLowerCase()});
	v=v.replace(/<span class="apple-style-span">(.*)<\/span>/g,'$1');
	v=v.replace(/ class="apple-style-span"/g,'');
	v=v.replace(/<span style="">/g,'');
	v=v.replace(/<br>/g,'<br />');
	v=v.replace(/<br ?\/?>$/g,'');
	v=v.replace(/^<br ?\/?>/g,'');
	v=v.replace(/(<img [^>]+[^\/])>/gi,'$1 />');
	v=v.replace(/<b\b[^>]*>(.*?)<\/b[^>]*>/g,'<strong>$1</strong>');
	v=v.replace(/<i\b[^>]*>(.*?)<\/i[^>]*>/g,'<em>$1</em>');
	v=v.replace(/<u\b[^>]*>(.*?)<\/u[^>]*>/g,'<span style="text-decoration:underline">$1</span>');
	v=v.replace(/<(b|strong|em|i|u) style="font-weight: normal;?">(.*)<\/(b|strong|em|i|u)>/g,'$2');
	v=v.replace(/<(b|strong|em|i|u) style="(.*)">(.*)<\/(b|strong|em|i|u)>/g,'<span style="$2"><$4>$3</$4></span>');
	v=v.replace(/<span style="font-weight: normal;?">(.*)<\/span>/g,'$1');
	v=v.replace(/<span style="font-weight: bold;?">(.*)<\/span>/g,'<strong>$1</strong>');
	v=v.replace(/<span style="font-style: italic;?">(.*)<\/span>/g,'<em>$1</em>');
	v=v.replace(/<span style="font-weight: bold;?">(.*)<\/span>|<b\b[^>]*>(.*?)<\/b[^>]*>/g,'<strong>$1</strong>')
}
gregarios’s picture

In case anyone is interested, attached is a patch rolled against 6.x-2.0 to combine the fixes found in the #8 comments above. It fixes the issues with non-closing XHTML tags and teaser-break placement near paragraph tags. I've been using it for the last day inputting nodes at the following website which now passes the W3C Markup Validation Service tests: www.kpbj.com

If you've been using the broken 2.0 version to place images and break tags, you can fix them by simply opening those nodes for editing, then saving them with no changes. The fixes seem to automatically get applied. :-)

geerlingguy’s picture

Subscribe - I hit this problem with Internet Explorer (not a problem in Safari), and the linebreaks.js filter that I have applied causes extra
tags to be added to any capitalized HTML tag, causing tons of extra space on the page... Solution is the same for me as #12.

(Reference to linebreaks.js issue: #513998: Plugin to convert p- and br-tags to newlines)

adrianmak’s picture

how about the absolute urls issue of inserting images?
This must cause a lot of problem when developing site in local and move to production server where domain name is different.

gregarios’s picture

how about the absolute urls issue of inserting images?

Not a problem for me. When I move the site, I do it by creating a database dump, then do a find/replace using sed on the database file, changing anything with "http://oldserver.com" to "http://newserver.com", then restore to the production server. Keep in mind this only works if the Domain name is the same char length for the dev server as the production server.

It works backwards too:
I use this method to develop for the Production site by making an exact backup of the production site on the dev server to work out kinks in new software, test new CSS stuff, etc, before making it live. You just need to enable Reroute Email module on the dev site whenever you work on it to prevent users from getting emails form the dev site. Lets me test subscription events, etc, on a live well used site before doing the changes to the production site.

TwoD’s picture

I've actually been thinking about that today and was about to post here that we must not forget to deal with that ourselves (as no HTML to XHTML lib would).
The editors seem to solve this by adding an extra attribute on tags which has a "src" or "href" attribute, something like "_cke_saved_src" and also setting it whenever the url changes. As the browser doesn't know it's an url, it will not make it absolute if relative. We would have to do the same before passing the contents to Drupal plugins, and then revert the attributes after they've attached/detached.
A problem with that is what happens if a module is designed to change the ulr in some tags or inserts new tags with urls? If src/href doesn't match our custom attribute (or it's undefined), we'd have to use the src/href value instead but we can't know if it is supposed to be absolute or relative because the browser may have changed it if a plugin parsed the contents into a DOM tree.

The more I think about situations like this the messier it gets. We can probably not trust the plugins to preserve source formatting either if using DOM manipulation, which effectively nullifies those editor options. If we tell all API users to use XHTML-specific methods to generate their output this might be easier to control. Source formatting control is ruled out with the http://xhtmljs.codeplex.com/ library. That lib does have a formatting option, but we can't guarantee plugins will use it, and that formatting won't be affected by settings made on the editor [when we have the ability to set those].
The library at http://www.stevetucker.co.uk/page-innerxhtml.php is much better at preserving the formatting as it uses the nodeValue property. It is not as good as the first lib at fixing tags, but I'm tinkering with it looks promising while being simple. (We actually only need the innerXHTML function, the other one deals with writing and creating nodes from a string.) We still have the url problem tho.

BrightBold’s picture

Sorry I never came back to test the patch in 8 as requested. I'd forgotten about this issue until it came up on another site. Question: Is the patch in #21 in the latest dev version, or do I need to patch it even if I upgrade?

gregarios’s picture

Sorry I never came back to test the patch in 8 as requested. I'd forgotten about this issue until it came up on another site. Question: Is the patch in #21 in the latest dev version, or do I need to patch it even if I upgrade?

The patch in #21 is a culmination of all the patches found in #8. The patch has not been incorporated from what I can tell, yet. The patch is for 6.x-2.0, not the -dev version, so you should check the dev first for the patch changes. Hopefully a maintainer will comment in here about the state of this thread soon. (hint, hint!)

gregarios’s picture

The patch from #21 was not implemented in this latest release, and the new 6.x-2.1 version still has the W3C invalid image tags problem with missing self-closing slashes. :-/

catofcheshir’s picture

Just want to say: I'm apply patch from #21 and it works now. Great!

gregarios’s picture

Just want to say: I'm apply patch from #21 and it works now. Great!

My patch works even on 6.x-2.1? I tried applying it manually, and it is working again. :-)

gregarios’s picture

Attached is an updated patch to apply to 6.x-2.1. Please, maintainers... would you implement this? :-)

sun’s picture

Status: Active » Needs work

There is a patch.

TwoD’s picture

That patch still has all my commented out debug calls in it, which isn't so pretty hehe.
Didn't we agree earlier on that instead of abandoning the string->DOM->string conversion in favor of string manipulations only, we would implement something similar to .xhtml() instead of .html()?

Referring to the last part of #25 and earlier; the innerXHTML lib has a really interesting method we could use if we also add checks for things like not adding </br>. It preserves as much as possible of the formatting done by the editor, and I'm a big fan of that hehe.

The major problem left to solve with that approach is URL preservation. Using only string manipulations would preserve URLs as they are never touched by the browser, but even if we do this, other modules implementing Drupal plugins may not. So we'd probably have to do that anyway.

Anyway, that is mostly what's going through my head when thinking about this issue and I'm trying to come up with a solid solution, but I'm being distracted by other things at the moment. :s

gregarios’s picture

That patch still has all my commented out debug calls in it, which isn't so pretty hehe.

I didn't create the fixes found in my patch... I only created the patch based on the references patches in comment #8, which fixed all the issues I was having.

If you have a better way, by all means implement it. I've got no alliances with any particular way of fixing it, so long as it just gets fixed. ALL I know is the patch I put together was able to fix my issues, and it enables my website to validate as W3C XHTML 1.0 Strict even with lightboxes with captions and images and the whole jumble. :-)

I'm not even sure I understand what is holding this up, but I defer to your better judgment on the method, so long as the outcome is right. I anxiously await a release with the final fixes implemented.

m4olivei’s picture

subscribe

m4olivei’s picture

I have a site with this issue. Specifically we started noticing sporadic nodes with uppercase tags. After reading through this and other tickets its clear that not all nodes are affected since this issue is browser specific and not all of our users use IE. I've applied the patch in #31 and will be testing it over the next couple of days. I'll let you know if I find anything.

Is this patch supposed to also fix the issue where the

is inserted inside opening and closing tags such that it causes tag mismatches when showing just teaser?

Thanks,
~Matt

gregarios’s picture

Is this patch supposed to also fix the issue where the is inserted inside opening and closing tags such that it causes tag mismatches when showing just teaser?

yes.

Some of these problems it fixes are not browser specific by the way.

TwoD’s picture

The patch above will only partially fix the problem with the break placeholder being inserted inside other tags and won't catch when it's inserted in for example a div. You need the HTML Corrector filter enabled to be sure that your output stays valid. To fix this completely, the Teaser Break plugin (and all other Drupal plugins) need better access to the selection upon inserting the placeholder, then it could split the tags as needed. The level of support for retrieving the selection as we need it varies greatly between editors, so that has been postponed for Wysiwyg 3.x and is dealt with in another issue.

Note that we will most likely not apply a patch like the one above as we've decided to go another route to ensure XHTML compliance, one which also allows use to use DOM methods to manipulate the contents sent to Drupal plugins, see above (and the other linked issues) for details on that discussion.

gregarios’s picture

Seems to me if you have your stock Drupal "HTML Corrector" filter in the right order, then there is no problem with closing tags being missing. (except for the missing self-closing tags of images which this also fixes)

I was always able to place a teaser break anywhere I wanted and it always closed everything just fine in the teasers. My issue was only with self-closing img tags being missing.

Elijah Lynn’s picture

@m4olivei, in response to comment #36 (http://drupal.org/node/510552#comment-2738530)

I had the same problem with TinyMCE creating invalid XHTML by capitalizing all my <p> elements. I turned off "apply source formatting" in the WYSIWYG Profile for TinyMCE in the "Cleanup and Output" area (/admin/settings/wysiwyg/profile/XX/edit) and I started getting valid XHTML, however, I had to edit each node and save again.

jerry’s picture

Subscribing.

te-brian’s picture

Just adding my experiences here:

I've had the .html() call screw up markup in two different circumstances so far that haven't been mentioned yet. The first case was someone posting youtube videos from IE7. Some of the attributes were getting stripped off of the object tags (if memory serves). The second and more recent problem was with someone working with a table in IE7. The combination of all the table tags (TABLE, TBODY, TR, TD) being uppercased and the main input filter we use (htmlawed) was causing the contents of the td tags to be dropped.

I can confirm that the exact line where the screw-up happens is the innerHTML() call in html()'s implementation in jQuery.

I haven't tried the above patch, kinda waiting to see what course of action WYSIWYG is likely to take.

Was anything decided about whether or not to use an innerXHTML library?

geerlingguy’s picture

@te-brian: I fear that's happening for some of my users too (w/r/t IE7 YouTube embedding). Ugh.

jared12’s picture

The patch from Comment #31 seems to have fixed the problem I was having with relative links (#495828: In TinyMCE in IE7 Relative Links Become Absolute Links if Teaser Break is Enabled).

Thank you TwoD, sun, gregarios and whoever else.

Drupal 6.16
Wysiwyg 6.x-2.1
CKEditor 3.2.1.5372
IE 7.0.5730.13

Jimboy’s picture

I just noticed that enabling the HTML Corrector filter seems to fix this issue (at least for
and Only local images are allowed.).

Do I still have to apply patches?

TwoD’s picture

There's currently no patch available for the approach we've decided to go with to fix this issue, but #31 has a patch which temporary fixes the problems. The #31 patch also fixes the absolute-url-issue (see #44), something we have yet found a good workaround for with the approach we've decided to go with (hence the delay in fixing this).

Jimboy’s picture

Can you explain why there is a need for patches / fixing when the filter takes care of this?

gregarios’s picture

I'd just apply the approach that is timely, tested, and partially fixes things IMHO. Then find a better fix at your leisure. Not like it can't be undone later transparently. ;-) Jus' sayin. :-)

mattwmc’s picture

Version: 6.x-2.0 » 6.x-2.1
Component: Plugins » Editor - CKeditor

I don't have teaser break on and when using CKeditor get added:

<p>&nbsp;</p>

and

&nbsp;&nbsp;

when pasting from Microsoft Office Word Viewer.

Patch didn't make a difference - as I assume because teaser break is already off.

TwoD’s picture

Component: Editor - CKeditor » Plugins

Then that's a different issue, this one is about a confirmed problem with the Teaser Break plugin (and others) due to how jQuery's .html() method works.
Please reopen #825908: When pasting from Word CKeditor adds extra space if this is contained to CKEditor (without any cross-editor plugins enabled) and not possible to reproduce on http://ckeditor.com/demo.

mattwmc’s picture

Sorry, someone pointed me here.

Thanks.

maddentim’s picture

subscribing

vinmassaro’s picture

subscribing

supersteph’s picture

could this be the same issue that is stripping backslashes from my text fields?

TwoD’s picture

Backslashes? I haven't noticed such a problem, but I haven't used many backslashes either... I don't think that would be part of this issue, as backslashes are allowed in HTML and the browser shouldn't strip those out during string->DOM->string conversions even if it doesn't generate XHTML the way we want it to. Maybe it's an entity encoding issue?

curtaindog’s picture

Some people have reported HTML corrector fixes the casing issue but I'm getting pages where the opening tag is rendered in uppercase while the closing tag is in lowercase. Is it something I'm doing wrong or do I need to raise an issue against HTML corrector?

sun’s picture

Priority: Normal » Major
JonesUI’s picture

I've read the various threads about this issue because I have a related that issue that was not solved by applying the patch at #31.

My config:

Drupal 6.15
wysiwyg 6.x-2.1
TinyMCE 3.3.7
Teaser Break plugin active
IE7

I am seeing the node pre-pended to href and src URLs:
Example: <img src="sites/default/files/uploads/image.jpg" alt="" />
Becomes: <img src="node/159/sites/default/files/uploads/image.jpg" alt="" />

Only happens in IE with Teaser Break activated (Firefox works fine).

And its cumulative... If I edit the same page multiple times, the img src might look like:
<img src="node/159/node/159/node/159/node/159/node/159/node/159/node/159/node/159/node/159/node/159/node/159/node/159/node/159/node/159/sites/default/files/uploads/image.jpg" alt="" />

I'm adding this to this issue because it seems to be related and the patch doesn't fix it.

gregarios’s picture

I've read the various threads about this issue because I have a related that issue that was not solved by applying the patch at #31.

I believe your issue is an unrelated issue.
You may want to start a new issue for if you want to give it attention.

gregarios’s picture

Version: 6.x-2.1 » 6.x-2.2
FileSize
3.35 KB

A revised patch for the 6.x-2.2 version if anyone is interested. Pretty much the same as comment #31's.
Seems the maintainers don't like this fix, but it does work well for me — keeps things nice and XHTML valid. ;-)
(Tested with Drupal 6.20, Wysiwyg 6.x-2.2, TinyMCE 3.3.9.2)

powery’s picture

Thanks! #60 Works very well.

gregarios’s picture

Version: 6.x-2.2 » 6.x-2.3

NOTE: Patch in #60 is still compatible with, and is still needed by, WYSIWYG 6.x-2.3.

TwoD’s picture

Version: 6.x-2.3 » 7.x-2.x-dev

Bumping to 7.x-2.x-dev as that's where changes will be applied first (and the issue still exists there).
I've made some progress on a more generic solution for this and will try to create a patch ASAP. The Absolute URL problem still needs addressing in what I've got, but let's try hit the easier targets first.

Shai’s picture

@gregarios, thank you so much, works for me patch at #60.

geerlingguy’s picture

Patch in #60 works well... can someone re-roll against 7.x, so we can commit there, then backport?

BarisW’s picture

This is great, the patch in #60 solves my issues with uppercase tags in IE.
Applied to 6.x-2.3

Thanks a lot!

spgd01’s picture

Still an issue:
I have:
Drupal 6.20
WYSIWYG 6.x-2.3
using IMCE
with and without Teaser break

Editing in IE 8

The uppercase Tags and Spacing and erroneous tags are still present

In Firefox and IE

There are no closing tags for images and line brakes

<br> should be <br />
gbaudoin’s picture

Had this issue with the tinymce html6 video. (the params for the flash fallback were removed and the

tag was in uppercase), #60 (applied on wysiwyg 2.3) resolved it. Thanks !
StevenWill’s picture

subscribing

cookiesunshinex’s picture

I have a site where I can't get valid XHTML using the markup validator at http://validator.w3.org/

Drupal 6.20
Wysiwyg 6.x-2.3
CKEditor 3.5.1.6398
IMCE 6.x-2.1
IMCE Wysiwyg bridge 6.x-1.1

My theme uses XHTML 1.0 Transitional, and we have found that the WYSIWYG editor causes the following issues:

* - instead of <br /> the WYSIWYG editor adds <br>
* - instead of <img src="" alt="" /> the WYSIWYG editor removes trailing / at the end

Are my issue the same that are reported in this thread? Will the patch in #60 fix these issues?

gregarios’s picture

In regard to comment #70, the patch from #60 should help those two issues.

eyolf’s picture

Version: 7.x-2.x-dev » 6.x-2.3

I assume this is the same problem: if a <--!break--> tag for some reason is placed within a

, the div tag is not closed, obviously resulting in a skewed distribution of contents in the following blocks on the page.
I'm using TinyMCE, and the break in question is part of a view with some customization (including some replacement command); I haven't tested it in a vanilla installation, but since the error shows up in the described case, I consider it a bug in any case. I'm ready to do further testing if necessary, though.
spgd01’s picture

Ok, the patch seams to work on 6.x-2.3 for IE 8 (I thought it was already added to 6.x-2.3) however line brakes and images still are not valid HTML they are still:

<br> should be <br />

and no closing "/" for images. Did I miss something? Is everyone else getting valid image and line brakes with the patch in #60?

spgd01’s picture

I still have these issues with the patch form #60. Did you have any luck?

TwoD’s picture

The patch in #60 was created as a temporary workaround until we've got a way to let other plugins produce valid XHTML as well - plugins that might not be possible to refactor to a set of Regular Expressions. The patch keeps the browser from modifying the contents while the Teaser Break plugin is handling it. The same problem exists in the TinyMCE implementation itself, in it's prepareContent() method to be exact and that would require a second patch. When Teaser Break is used with any other supported editor, only #60 is needed.

I have been working on a larger patch which uses a completely rewritten version of the innerXHTML library first mentioned by sun in #19 to do Markup->DOM->Markup conversions via a utility function.
As mentioned in #19, the innerXHTML library wasn't perfect and needed many changes, and still needs some more, but it seemed like the better option.

Instead of having each plugin convert the markup string given to their attach()/detach() methods, they should now call Drupal.wysiwyg.utilities.modifyAsDom(content, callback), where callback is a function taking a DOM node (or array of DOM nodes) as its first argument. The argument can be manipulated as needed by the callback, which doesn't need to return anything. Once the callback is done, the utility function will convert the DOM back to a markup string and return it.

I found this callback pattern to be the easiest solution for plugin developers as it doesn't require them to manually serialize or unseralize content while taking cross-browser issues into consideration. Using a callback lets Wysiwyg module preserve and restore white-spaces used for formatting (show up as #text nodes in the [XHTML] DOM). It's also an opt-in solution where existing plugins that only work on strings do not need any modifications to work with the new code.

This EXPERIMENTAL code can be found the new xhtml_callback branch.

Known issues: UPDATED

  1. IE turns relative URLs into absolute URLs. - This should be pretty easy to fix by storing/manipulating the original URLs in a temporary attribute, which the editors already do internally. Plugin writers need to be made aware of this quirk. - Fix committed!
  2. IE (8?) adds unnecessary attributes to <img> and <ol> tags, such as start="fileopen" and loop="1". Should be pretty easy to fix as well by filtering them out during serialization, as already done with colspan="1" & rowspan="1". - Fix committed!
  3. TinyMCE does not preserve white-space formatting. (I think this is "just" a regression, I could swear I had it working earlier). - Not sure there actually was a problem here since I can't even get the official TinyMCE demo to do indents, only line breaks.
  4. Code style needs to be "Drupalified", could take advantage of jQuery all over the place.
TwoD’s picture

HTML entities like &nbsp;,&amp; or &Delta; were not being preserved in the unserialize() call because it uses .innerHTML. I escape entites by temporarily replacing them with <span data-wysiwyg-protected-entity="1">ENTITYNAME</span> in the content string and convert them back to text nodes after we've got a DOM again.
I think this is more robust and less likely to interfere with real content than just using plain text replacements.

silloyd’s picture

subscribing

smscotten’s picture

Version: 6.x-2.3 » 6.x-2.4

Applied patch from #60. Using Wysiwyg 6.x-2.4, JQuery Update 6.x-1.1.

IE8 still converts lowercase tags into uppercase, which then creates extra space between lines because line break converter doesn't recognize that the uppercase p tags are p tags, and so adds more p and br tags. Looks like this:

    <p></p><P>Keep IE</p><br />
<P>From Adding Spaces</p><br />
<P>more</p><br />
<P>like</p><br />
<P>this</p>

Browsers, on the other hand, recognize both the uppercase and lowercase tags and the breaks too and so my IE users' comments look very stretched out.

I turned the line break converter off. That fixed it in MSIE, but I'm concerned about the next person who visits my site with JavaScript turned off (or everyone with an iPad where editable doesn't work) and writes a multiparagraph post that gets turned into one run-on sentence. I have more users on MSIE than on iPad but fixing the formatting issue on MSIE by disabling the line break converter is much more destructive to the iPad/non-JS users.

bzrudi’s picture

subscribing

TwoD’s picture

@smscotten, if you're using TinyMCE that patch is probably not enough. The TinyMCE integration's prepareContent method also causes this issue and needs a similar fix. (This is not a problem in the xhtml_callback branch.)

aimutch’s picture

I "fixed" my problem with URLs being converted into absolute URLs by disabling the Teaser Break button in the WYSIWYG settings. In my case, there seems to be a connection to IMCE. For users that don't have a profile set in IMCE, this behavior doesn't happen. For those that do have a profile in IMCE, I have to disable the Teaser Break button or URLs to images and links get converted in absolute URLs.
- TinyMCE 3.4.2
- IMCE - 6.x-2.2
- WYSIWYG - 6.x-2.3
- IMCE Wysiwyg API bridge - 6.x-1.1

TwoD’s picture

@aimuch, with the xhtml_callback branch that is no longer an issue so you should be able to keep Teaser Break active if testing it, see #75.

aimutch’s picture

Just to be clear, the fix is to apply the patch in #60?

TwoD’s picture

Version: 7.x-2.x-dev » 6.x-2.4
FileSize
12.45 KB

No, the #60 patch is a temporary workaround for people who can't wait for a proper [and by other plugin modules reusable] fix in the xhtml_callback branch, but still need to use the Teaser Break plugin.
For people using TinyMCE and Teaser Break [or other cross-editor plugins], the patch in #60 isn't enough because the same problem happens there when those plugins are active.

The xhtml_callback branch works with all editors supporting the Teaser Break button, and other plugin modules could use the same fixed code as that version of Teaser Break does.

We need people to test the Wysiwyg and Teaser Break plugin version found in the xhtml_callback branch (D7) so we know it's reliable enough "in the wild" and that other developers consider this a usable solution. Then we can apply this solution to Wysiwyg 7.x-2.x-dev as well as backport it to 6.x-2.x-dev.

See #75 for details on what the changes do.

I'm also attaching the current (rerolled) differences between master and the xhtml_callback branch as a patch here.
Testing it will essentially be the same as testing the xhtml_callback branch, with the difference that the Wysiwyg version in the xhtml_callback branch does not contain the unrelated changes made since that breanch was created.

This is a big patch so the steps leading here are probably not that obvious, please reference the xhtml_branch log.

UPDATE: Needs a small tweak for entities to be preserved in all browsers. + node.innerText + should have been + (node.textContent || node.innerText) +. Fix already pushed to xhtml_callback branch.

Shai’s picture

@TwoD, about two weeks ago I briefly tried the xhtml_callback branch but, but it crashed the site. I'll try to retest in the next couple of days and promise to report any errors I encounter (I didn't last time... I just gave up). Thanks for the great work.

Shai Gluskin

TwoD’s picture

Version: 6.x-2.4 » 7.x-2.x-dev

I just pushed some small fixes to xhtml_callback so make sure you pull changes before testing.
One was a fix for an error thrown when trying to merge (normalize) sibling text nodes under nodes that did not support the operation.
I found it while making the patch in #84 so it's already included there. There were also some non-critical fixes and comment cleanups, see the branch logs for details.

gregarios’s picture

For people using TinyMCE and Teaser Break [or other cross-editor plugins], the patch in #60 isn't enough because the same problem happens there when those plugins are active.

I'm using TinyMCE and Teaser Break and the patch in #60 fixed it for me — jus' sayin. ;-)

TwoD’s picture

Are you also saving content using IE? If not, these problems will be smaller since other browsers tend to generate XHTML (or close to it) instead of HTML when using .innerHTML without some of the precautions used in the xhtml_callback branch and the last patch.

gregarios’s picture

Are you also saving content using IE? If not, these problems will be smaller since other browsers tend to generate XHTML (or close to it) instead of HTML when using .innerHTML without some of the precautions used in the xhtml_callback branch and the last patch.

No, actually. I'm using Firefox or Safari.

SandraL’s picture

Just thought I'd note here that the media browser plugin appears to cause this problem as well.

(After reading this thread, I disabled the teaser plugin but the issue didn't go away, so I started disabling other plugins to find the culprit.)

bzrudi’s picture

Hi,

just applied the patch from #84 and played some minutes with it. The problem with missing img tag closing seems to be fixed. Teaser break is working also as expected. What I found is that there is a problem with some special signs and quotes. Every single or double quote, as well as -, and maybe others are replaced by &undefined; within the text.

Just to let you know. This is with Firefox 5.0 and ckeditor...

cheers bzrudi

TwoD’s picture

Re #91:

+++ wysiwyg.jsundefined
@@ -230,6 +230,259 @@ Drupal.wysiwyg.getParams = function(element, params) {
+      // Text nodes can't have children so return right away.
+      return document.createTextNode('&' + node.innerText + ';');
+    }

+ node.innerText + should have been + (node.textContent || node.innerText) +.
Fix pushed to xhtml_callback branch. Replacing that snippet in the patch before applying it should work too.

Powered by Dreditor.

mrfelton’s picture

Version: 6.x-2.4 » 7.x-2.x-dev
FileSize
3.4 KB

Here is the patch from #60 against 6.x-2.x in a format that will apply with drush make, for those that need it.

rosborn’s picture

sub

chicagomom’s picture

Version: 7.x-2.x-dev » 6.x-2.4

results from patch manager for patch in #93:

Hmm...  Looks like a unified diff to me...
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/plugins/break/break.js b/plugins/break/break.js
|index 54aac4c..6d8f025 100644
|--- a/plugins/break/break.js
|+++ b/plugins/break/break.js
--------------------------
File to patch:
Skip this patch? [y]
Skipping patch.
Hunk #1 ignored at 38.
Hunk #2 ignored at 53.
2 out of 2 hunks ignored
done
maddentim’s picture

@chicagomom, this means that your patch program can't find the file to change. This patch was generated from inside the wysiwyg directory. You need to run the patch from the same spot with the option -p0 and it should find it....

bzrudi’s picture

Version: 6.x-2.4 » 7.x-2.x-dev

Hi,

FYI: just gave the current patch (including the entities tweak) another try. Problem with entities seems solved as well as the closing img slash problem. However, another problem I found is, that if you make use of teaser break in a paragraph section the closing paragraph is missing.

So what should read <p>foobar<!--break--></p> 
results in <p>foobar<!--break-->
and the closing </p> is missing

That makes W3C failing for valid XHTML. Any idea?

cheers,
bzrudi

TwoD’s picture

I'm assuming you mean the #84 patch, not the #93 one since it uses the older RegExp approach, right?

Which browser and editor did you test with?

bzrudi’s picture

Yes, I use patch from #84. Browsers tested, Firefox 7.x and Chrome together with the CKeditor (guess V 3.6.1).
W3C results for my site
Thanks!
bzrudi

mrfelton’s picture

Confirming that as per #90, having the Media browser plugin enabled causes the same problems. Also something that doesn't seem to have been mentioned in this thread is that this also causes valid <script></script> to be completely stripped out of content. Making it impossible for content authors to embed scripts within their content if either the teaser break plugin or the media browser plugin are enabled (you do not have to use them, they just simply have to be enabled).

mikedub’s picture

Using the patch in #84 with the fix on 6.x-2.4, I am having problems with ampersands in URLs. I get a bunch of "data-wysiwyg-protected-entity" spans inserted which multiply every time the editor loads.

Also, what is the status on this being rolled into 7 or 6 versions?

sezio’s picture

I solved the problem by using the patch #60. Thanks !

ayalon’s picture

Hi!

Patch #60 works for me. Patch #75 of the maintainer does not work. You can easily test it.

  1. Enable the page break plugin first!
  2. Set config.allowedContent = true!
  3. switch to the source code
  4. add a script tag like this in the source code: <script type="text/javascript" src="http://www.google.ch/script.js"></script>
  5. switch to the normal mode
  6. switch to the source code mode: the script tag is gone. using patch #60 fixes this
TwoD’s picture

@ayalon, #75 has no patch, so I assume you tested the xhtml_callback branch.
I haven't had time to update that to support CKEditor 4 yet, so you would have to be using CKEditor 3 with that. Therefor, your step 2 does nothing.

Adding the script tag to the list of tags to mask out before serialization fixes the issue you are describing.
I'll get back to this issue once the queue has been cleared up a bit. We've taken over maintaining the innerXHTML script for use with Wysiwyg and plugins, but it needs a bit of a rewrite before it's reliable enough to work across all browsers etc.

JediSange’s picture

Issue summary: View changes
FileSize
1.43 KB

Ok, so the problem was the following line of code:

var $content = $('<div>' + content + '</div>');

This, for whatever reason, was removing valid markup (possibly as a security precaution). The patch I've attached is the no JQuery solution to the problem that should leave you with a clean markup. Please let's fix this and close this forsaken issue.

Edit: To follow up on this, if you examined it closely then JQuery actually broke out the tags it was stripping into higher indexes in the array. In my case, a script tag was getting parsed out and the $content[0] was the div, whereas the $content[1] was the script. Effectively you could've used this method and still recompiled the result to the same ends down the road. But this is a more elegant solution that should work.

JediSange’s picture

Status: Needs work » Needs review
TwoD’s picture

Status: Needs review » Needs work

innerHTML IS the problem here. It's a mess in IE (and older versions of some browsers).
jQuery uses/used innerHTML too. IE, for one, spits out HTML markup instead of XHTML - causing the missing slashes and uppercase tags issues, it may or may not strip scripts, comments, as well as rearranging all kinds of stuff.

Also, IE8 and below does not support the DOMParser object.

JediSange’s picture

Those are fair points. That all being said, the previous version was unusable due to the fact that it wasn't working as intended. And innerHTML was not the problem that was stripping out things like script tags (for me, at least). It's very possible this is still not great for older browsers, but for modern versions of Chrome/FF/Safari this appears to do the job.

The DOMParser I can probably rewrite to use something else, but in terms of the innerHTML problem -- Do we know of a solution for that? I was not aware it was so buggy in IE. I'm all for dropping IE support. :)

JediSange’s picture

And just to echo on the point a bit more: This might be the wrong thread for that code. I found my way here looking for why it stripped out script tags specifically. My particular problem was that the script tag was getting pulled out into higher indexes from JQuery and then ultimately lost because of the Page Break button. Even if this code doesn't fix the XHTML issue, I still think it's a strict upgrade from the current code base to fix the issue of erroneously stripping tags away (pending we rewrite the DOMParser bit if IE8- is a concern).

reevo’s picture

The patch supplied in #105 is working for me. I'm attaching a reformatted version for those who want to apply this as part of a drush make file.

Update: ignore me. This patch does not work on Mac Safari.

bennybobw’s picture

I'd like to pitch in and help with this ticket since I'm running into this problem with a bunch of sites. @TwoD you said in #107

We've taken over maintaining the innerXHTML script for use with Wysiwyg and plugins, but it needs a bit of a rewrite before it's reliable enough to work across all browsers etc.

Can you submit a list of what things are still not reliable across all browsers? If so, I'm happy to start work back up on this ticket.

mgifford’s picture