New <break> breaks wysiwyg editors and W3C validation

By blogarithme on 26 Dec 2006 at 13:51 UTC

Because compliance with standards is important, most wysiwyg editors try to generate a valid HTML document. When you manually insert a <tag> but not the </tag>, ‘FCK editor’ and ‘widgEditor’ do it for you: they add a </tag> at the end of the document. This is what happens with the new <break>!

When Drupal 5.rc1 generates the page, it removes the <break> tag but leaves the </break>. The resulting document is not valid (a closing </break> without an opening <break>).
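
The mismatch described above can be reproduced with a small sketch. It assumes the editor has already appended the closing tag, and it models the rendering step with a plain string replacement, which is only a stand-in for whatever Drupal actually does. The names `TagBalance` and `orphan_closers` are made up for this illustration.

```python
from html.parser import HTMLParser


class TagBalance(HTMLParser):
    """Collect closing tags that have no matching opening tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.orphans = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Remove the most recent opener with the same name.
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]
        else:
            self.orphans.append(tag)


def orphan_closers(html):
    checker = TagBalance()
    checker.feed(html)
    checker.close()
    return checker.orphans


# What the editor saves: it has added </break> at the end by itself.
editor_output = "<p>teaser</p><break><p>rest of the post</p></break>"
# Stand-in for the rendering step: only the opening pseudo-tag is removed.
rendered = editor_output.replace("<break>", "", 1)

print(orphan_closers(editor_output))
print(orphan_closers(rendered))
```

The saved text is balanced, while the rendered text carries a lone `</break>`, which is what a validator then complains about.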

Note: if you decide to replace <break> with <break/>, a new problem appears: both FCK and widgEditor replace “xxxx <break/> yyyyy” with “xxxx <break> yyyyy </break>”.

I think the only solution that avoids breaking many Drupal installations is to continue to use the well-known <!--break-->. For those who work at the HTML level, the comment syntax is not a problem. If you want to offer a simpler solution for those using wysiwyg editors, you could also accept a [break] or /break/ keyword (not a tag).

The problems are the same with ‘FCK editor’ and ‘widgEditor’. I haven’t tested the TinyMCE editor.

Important: currently, before composing a message, I have to know whether the site runs Drupal 4.x or Drupal 5.x to know which 'break' syntax to use.

Comments

What did you want to achieve with this post?

chx commented 26 December 2006 at 14:08

This issue is discussed in http://drupal.org/node/87145#comment-169474 , your followup is after it so I presume you read Steven's big answer. Moving this discussion out of the issue into the forum buys us little aside from FUD.

The /break is removed by filter_xss as it never appears in the list of valid HTML tags... If it is not removed then you need to file an issue.

As this issue already caused quite some debate, I will keep an eye on this thread, remove comments and if needed, make the whole thing read-only as I see fit. Read the linked issue and think before posting.
--
The news is Now Public | Drupal development: making the world better, one patch at a time. | A bedroom without a teddy is like a face without a smile.


step back from the hostility

ericg commented 26 December 2006 at 15:29

the user obviously feels that a bad decision is being made.

the discussion you link to shows that the general consensus is that the change to the break tag is not necessary and a bad decision

this user is trying to highlight that in an appropriate way, posting that concern to drupal in a rational and clear manner

please don't abuse your power and delete comments or otherwise shut down this discussion

It is one thing to be helpful and point out an already existing thread but I think you have crossed a line.

Yes, there is another (very long and hard to read) thread in the issues track. why not put it where more people might see and be able to give feedback (using node comments instead of issue follow-up, which will give easier to read formatting)

And now that I'm done with that, let me say that the proposed change from a valid html comment to invalid markup is a bad one that is not necessary or useful in the long run (in my opinion).

I have found it useful when I train end-users that the break tag is an html comment tag. This allows for them to understand that comments are things that can control things behind the scenes and keeps them separate from html tags that they use for design of their text.

But, I am also a bit strange in that I tend to avoid any wysiwyg editors. I find that they all end up confusing users rather than educating them or giving them the feeling of control over their content that is the goal of using a CMS. (Using quicktags.module I can give them an easy way to modify their content and let them learn html at the same time.)

hostility

chx commented 26 December 2006 at 17:29

Yes, I am hostile because this is an inappropriate place -- the issue is the place to discuss this, and what's worse, he has followed up there, so he should know it. All this 'invalid markup' talk is nonsense, as the linked comment (no need to read the whole thing...) tells you clearly: this pseudo tag never appears in the output.

I warned beforehand because I much feared what kind of debate would spring up. As long as it's courteous and reasonable, rest assured I will not do anything.
--
The news is Now Public | Drupal development: making the world better, one patch at a time. | A bedroom without a teddy is like a face without a smile.


wrong on two points

chx commented 27 December 2006 at 01:21

And now that I'm done with that, let me say that the proposed change from a valid html comment to invalid markup is a bad one that is not necessary or useful in the long run (in my opinion).

The data in the database is not valid markup. No one ever said that. It's the filter system that tries to bring order to the user-entered chaos.

I have found it useful when I train end-users that the break tag is an html comment tag. This allows for them to understand that comments are things that can control things behind the scenes and keeps them separate from html tags that they use for design of their text.

Using controlling comments is actually pretty evil. It's something that Microsoft does. Comments are present only to help people understand things better, and any proper parser skips them. The old tag was a hack.
--
The news is Now Public | Drupal development: making the world better, one patch at a time. | A bedroom without a teddy is like a face without a smile.


a serious question

ericg commented 27 December 2006 at 02:22

how is the new tag less of a hack?

seriously, I'm interested in why you think this is any less of a hack.

I object.

dman commented 27 December 2006 at 03:11

First,

Thank you for posting this topic in the public arena

I hadn't seen the issue thread before now, and appreciate having had a chance to read it.

The data in the database is not valid markup. No one ever said that.

Well, the data in my database always was. You can say it's not by design, but it was a good thing when it was. This one change now deliberately makes it invalid for what I see as a trivial win. Fixing a problem that wasn't broken (HTML comment syntax confuses some people) by destroying the ability to work with modern XML tools and such.

Much of my work (contributions to the HTML tidy module, the import html module, the XSL filter module, etc.) enforces good, valid syntax on the text it works with. Usually this means the unfiltered database stuff. My input validator for htmltidy, for example, prevents bad HTML from getting into the DB in the first place. This new made-up pseudo-tag is bad HTML.
I've done lots of XML, lots of XSL, lots of DTDs, namespaces and related stuff. I now appreciate why these annoying, strict validation rules are required, and why they are a good thing.

I've spent a decade migrating and translating old handmade or custom-made sites from one system to another to another. Web-wide, early hacks often made up their own tags and then did on-the-fly substitutions. This may have worked for them, but it produced crap migration, editing, validation and maintainability issues.
Eventually some of the editors stopped doing that, and Frontpage, Dreamweaver etc. started hiding their custom extensions in comments. Yeah, it's messy, but it didn't break everyone else's way of doing things and the source files were mostly portable again.

With all of the validation tools and standards available to us nowadays, and lessons learnt from past problems, I think it's a great leap backwards to deliberately break what was (at least potentially) otherwise perfectly valid source. Modern practices may allow you to legally extend your custom additions via namespaces <drupal:break /> but I don't think that'll please either side of this discussion.

To say as chx does that the body cell in the database is not meant to correlate with a classic HTML source file, or that it should never be touched without output filtering precludes a lot of power (input/output API, content synchronization, static backups, serialization) that we currently HAVE and would be broken by this change.

Sorry, I've done too much content conversion between different CMSs on different platforms to get tied down to the my-database-is-the-only-answer lock-in. Thinking your pages will only ever exist inside Drupal with its output filters always applied is short-sighted.
Imagine a client-side HTML editor that could be tied in to edit Drupal pages via RPC or WebDAV, much like some current blog tools or FTP-capable text editors already do. This one invalidating change deliberately breaks compatibility with such a system.

I think either keeping the comment tag version, and/or supporting the visual markdown [break] version to help the rich-text editors, is fine.

OK. Maybe this post should have been in the issue thread, but my rant is about the continuing validity of my data, as per this thread, not about the UI discussions that went on there.

.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/

_{.dan. is the New Zealand Drupal Developer working on Government Web Standards}

huh

chx commented 27 December 2006 at 10:37

Are you telling me that you were importing/exporting data from/to the Drupal database without using the Drupal API?

"my-database-is-the-only-answer" -- no. But my code is the only gateway to my database.

No matter how many validations you do, the node.body is actually two things in one, the teaser and the body, nicely separated by a pseudotag. On export, you need to separate out teaser anyways...
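
As an illustration of that separation step, here is a minimal sketch. It assumes the stored body is a plain string and that the delimiter is either the older comment form or the newer pseudo-tag. `split_teaser` is a hypothetical helper, not part of the Drupal API.

```python
def split_teaser(body, delimiters=("<!--break-->", "<break>")):
    """Return (teaser, rest). The teaser is None when no delimiter is present.

    Delimiters are tried in the order given, so the comment form wins
    if a body happens to contain both.
    """
    for delimiter in delimiters:
        head, found, tail = body.partition(delimiter)
        if found:
            return head, tail
    return None, body


print(split_teaser("<p>Intro.</p><!--break--><p>More.</p>"))
print(split_teaser("<p>Intro.</p><break><p>More.</p>"))
print(split_teaser("<p>No teaser marker here.</p>"))
```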
--
The news is Now Public | Drupal development: making the world better, one patch at a time. | A bedroom without a teddy is like a face without a smile.


I'm not a great fan of the

dman commented 27 December 2006 at 16:19

I'm not a great fan of node->body being two things at once to begin with, but it works up to a point.
Teaser can be abstracted from it as a convenience, fine, but it's not an intrinsic part of it.
Even in the database, there is node.body and node.teaser. If the DB can handle that concept, why do we have to break the body HTML code to continue storing it?

I saw the filters as garnishes or pretty-printers that get applied to the content just before display, not things that change my content's substance or validity.
If the change was important enough to actually change the semantics of the body, it should have been done at input time - really an 'input filter' and then stored in its fixed state.
Deliberately saving bad code then trusting an output filter to fix it every time is just not the right way forward ... in my way of thinking.

I'd use the API to export data, sure, but I don't treat the filter as part of the data, and don't export filtered. It's an optional pretty-printer that will help display the node as was intended, but it shouldn't make the node worthless if it's not applied any more than a missing css should make your HTML corrupt.

Maybe I have a different concept of 'filters' than other folk. OK.
I thought we had:
Something Usable + Enhancements = Viewable page

Instead, if the filters are ALWAYS REQUIRED before content is accessed we have:
Something Custom Encoded + Decoder = Viewable page

I don't see that model as better.

For import/export, my concept is to be able to take a node from system A to system B - either of them being Drupal or not. I need not have to take filter 'n' and its configurations along for the ride just to make the node, as a stand-alone page, readable.
Neither should I be required to filter it on the way out, as that would proscribe importing it to another filtering Drupal system.

Thus, I want the node data to be reasonably self-contained. Asking for something that is currently 100% valid HTML to continue to be valid HTML is not a big ask. Deliberately breaking it for no big win like this change does seems wrong.

There are infinite validating alternatives.
The comment version was fine.
<br class='pagebreak' /> would work for me.
wiki-like-syntax --- would be fine. markdown [break] is not bad.
WHY does the pseudo-HTML choice have to be the one thing that would instantly invalidate HTML, XHTML, wysiwyg editors, third party tools, future upgrades and everything?
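
A rough way to compare the alternatives listed above is to check each one for XML well-formedness. This is a sketch only: well-formedness is a weaker test than W3C validation against a DTD, the sample fragments are invented, and `is_well_formed` is a hypothetical helper built on Python's standard `xml.etree.ElementTree` parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """True if the fragment parses as XML once wrapped in a root element."""
    try:
        ET.fromstring("<div>" + fragment + "</div>")
        return True
    except ET.ParseError:
        return False


candidates = {
    "comment": "<p>teaser</p><!--break--><p>body</p>",
    "classed br": "<p>teaser</p><br class='pagebreak' /><p>body</p>",
    "keyword": "<p>teaser</p>[break]<p>body</p>",
    "pseudo-tag": "<p>teaser</p><break><p>body</p>",
}

for name, fragment in candidates.items():
    print(name, is_well_formed(fragment))
```

Only the unclosed pseudo-tag fails to parse here. A self-closed `<break />` would parse too, but it would still not be an element any HTML or XHTML DTD knows about.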

.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/

_{.dan. is the New Zealand Drupal Developer working on Government Web Standards}

The answer.

eaton commented 5 January 2007 at 06:44

Any direct use of the content in the node table requires post processing. Either you use the drupal filter system, or you roll your own. In the latter case, a simple str_replace is all that is needed. it is not the end of the world.
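
For readers rolling their own post-processing, the replacement could look like the sketch below. It is written in Python rather than PHP, so `str.replace` stands in for `str_replace`. The choice to map the pseudo-tag back to the comment form and to drop stray closing tags is an assumption made for the example, and `neutralize_break` is a hypothetical name.

```python
def neutralize_break(body):
    """Swap the <break> pseudo-tag for the comment form and drop stray closers."""
    return body.replace("</break>", "").replace("<break>", "<!--break-->")


print(neutralize_break("<p>teaser</p><break><p>body</p>"))
print(neutralize_break("<p>teaser</p><break><p>body</p></break>"))
```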

Sorry.

--
Lullabot! | Eaton's blog | VotingAPI discussion

--
Eaton — Partner at Autogram

a very proprietary way of thinking

ericg commented 5 January 2007 at 15:20

This disturbs me mainly because it is a very proprietary way of thinking. Free/Open Source software should always (when possible) extend existing standards.

To justify moving away from standards by suggesting that the data in my Drupal site's db should only be accessed by Drupal is not a very f/oss way of looking at things.

Anyone that has ever built systems that are a combination/interaction of many different codebases; anyone that has ever done large scale moving of data from one cms to another knows this well. One of the things that attracted me to drupal in the first place was how its filters and such never modified the user-entered content and encouraged the use of current html standards.

Excerpt into core

peterx commented 23 January 2007 at 23:27

I agree with placing some discussion in forum threads. I post issues and find they do not appear in "My recent posts" unless there is an update, which makes issues useless for public discussions.

I agree with all the XML, the conversion stuff, and have converted a Drupal site to another system. If data contains any HTML or XML then it should be XML compliant.

I vote for the following in the order listed:

1. Put Excerpt into core and update Excerpt to use Javascript so everyone can have real teasers defaulting to the first part of the body. We can type in the body then optionally edit the teaser before hitting Preview.
2. Put Excerpt into core so everyone can have real teasers.
3. <teaser></teaser>
4. <drupal:break />

When my Drupal is broken (after a new module screws up) or PHP is broken (rare over the last few years) or Apache is broken (never with Apache 2) or the ISP screwed up httpd.conf (does the sun rise?) or the server permissions are screwed up (which they like to do on any week containing a Tuesday), I can access my data using SQL. I would hate to then be screwed by invalid XML.

petermoulding.com/web_architect

I am wrong..

chx commented 27 December 2006 at 01:39

... regarding the /filter . While filter_xss indeed removes them, full HTML does not run filter_xss. Steven said:

For WYSIWYG, there should be a button to insert a break automatically, and it should be displayed in a nice fashion (horizontal rule with 'read more' label or something). This functionality has existed for a while now.

--
The news is Now Public | Drupal development: making the world better, one patch at a time. | A bedroom without a teddy is like a face without a smile.


May be someone knows, how to

chess2u.com commented 23 January 2007 at 09:11

Maybe someone knows how to implement "pagebreak" in Drupal 5.0?
------
DruChess
WinDict
------

<break> is how its done in

vm commented 23 January 2007 at 23:39

<break> is how it's done in Drupal 5.

nope

sepeck commented 24 January 2007 at 00:10

It's still <!--break-->. The change was rolled back.

-Steven Peck
---------
Test site, always start with a test site.
_{Drupal Best Practices Guide -|- Black Mountain}


lol - its too hard to keep

vm commented 24 January 2007 at 00:11

lol - it's too hard to keep up with the bouncing ball. Thanks for the correction!

There are 2 types of breaks

chess2u.com commented 13 February 2007 at 14:47

There are 2 types of breaks in 4.7:

<!--break--> is to break preview
<!--pagebreak--> to divide content into pages

The second one is not implemented yet!

------
DruChess
WinDict
------

Somewhat confusing

cburschka commented 13 February 2007 at 15:08

It confused me to no end that one week the database schema update replaced all my (non-existent) <!--break--> instances with <break>, and the next update reversed it... but I'm glad this was done nonetheless, since I share the OP's opinion regarding pseudotags.

--Aran

Fix the editors

peterx commented 13 February 2007 at 21:00

The FCK editor and widgEditor need to be fixed so you can specify the XML elements that are or can be single tags.

Tagging of the form [amazon title 123456] is too restrictive and produces the wrong result when the filter does not work or the tag is slightly wrong. An element of the form <amazon type="title" asin="123456" /> produces a flexible approach and disappears when the element is wrong or the filter is wrong.

With XML elements you have the choice of a Drupal filter per element or combining them in XSL. I implemented XSL for one Drupal 4.6 site and XSL was fast. I expect that combining several Drupal filters into one XSL will be the fastest long term approach. We need valid XML and working editors for XSL.

petermoulding.com/web_architect

Nooo.

dman commented 14 February 2007 at 16:35

<amazon type="title" asin="123456" /> produces a flexible approach and disappears when the element is wrong or the filter is wrong.

That invalid tag does not 'disappear', it sits in the background causing your pages to not validate!

If you want to 'extend' HTML arbitrarily like that, learn the rules on how to do it right.
HTML pages and HTML tags are in the HTML namespace.
For your suggestion to be legal, it must be <your:amazon type="title" asin="123456" /> and have a corresponding xmlns:your declaration in the page header. Otherwise you are deliberately creating bad markup. That's a hassle.
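
The namespace rule can be checked with any namespace-aware XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`. The URI `http://example.org/your` is a placeholder invented for the example, and `parses` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/your"  # placeholder namespace URI, assumed for this example

# Prefixed element with no xmlns:your declaration anywhere in scope.
undeclared = '<div><your:amazon type="title" asin="123456" /></div>'
# Same element, with the prefix bound on an enclosing element.
declared = (
    '<div xmlns:your="%s"><your:amazon type="title" asin="123456" /></div>' % NS
)


def parses(document):
    """True if a namespace-aware XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(parses(undeclared))
print(parses(declared))
# The parser reports the element under its full namespace name.
print(ET.fromstring(declared)[0].tag)
```

The undeclared prefix is rejected outright, while the declared version parses and the element is reported under its namespace URI.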

Markdown IS an ugly work-around, and it would be nice to phase out. However it's simple, visible and it works. It doesn't scare the average user away either.

My favoured option - if you need complex embeds - is to extend from within, using microformat-style annotations. This is very compatible with XSL, and you don't need to make up a new ad-hoc XML dialect to do it.
<span class='amazon-title' title='123456' />
However, it's a bit less intuitive, and you still need tool support, but at least the support can be generic.
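
A generic tool could pick such annotations out with an ordinary HTML parser. The following is a minimal sketch using Python's standard `html.parser`. The `amazon-` class prefix follows the span above, while `AmazonSpanFinder` and `find_amazon_spans` are names made up for the example.

```python
from html.parser import HTMLParser


class AmazonSpanFinder(HTMLParser):
    """Collect (kind, asin) pairs from <span class="amazon-KIND" title="ASIN">."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attributes = dict(attrs)
        for name in (attributes.get("class") or "").split():
            if name.startswith("amazon-"):
                kind = name[len("amazon-"):]
                self.found.append((kind, attributes.get("title")))


def find_amazon_spans(html):
    finder = AmazonSpanFinder()
    finder.feed(html)
    finder.close()
    return finder.found


sample = "<p>See <span class='amazon-title' title='123456' /> today.</p>"
print(find_amazon_spans(sample))
```

Because the markup is plain XHTML with an extra class, a page stays valid even when no filter ever processes the span.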

For really complex embeds, I guess you should use the OBJECT tag. That's what it's there for, and it can degrade legally.

.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/

_{.dan. is the New Zealand Drupal Developer working on Government Web Standards}

Amazon Filter as an example

peterx commented 14 February 2007 at 21:46

The details of XML namespaces are a little beyond this discussion but Drupal could host the XML namespace definitions for Drupal modules and provide a page pointing to tutorials on namespaces.

Amazon filter currently provides three options and is documented as [amazon cover|price|title ASIN]. The XML version, using namespaces could be the following, which is even easier than using type="".

<html xmlns:amazon="http://xml.drupal.org/amazon" xml:lang="en">

<amazon:cover asin="123456" />
<amazon:price asin="123456" />
<amazon:title asin="123456" />
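
To show how a filter might consume such markup, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the namespace URI from the snippet above (a proposal in this post, not a published namespace) and wraps the three elements in a `<body>` purely so the sample is a complete document. `amazon_requests` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

AMAZON_NS = "http://xml.drupal.org/amazon"  # URI proposed in the post, assumed here

page = """<html xmlns:amazon="http://xml.drupal.org/amazon" xml:lang="en">
<body>
<amazon:cover asin="123456" />
<amazon:price asin="123456" />
<amazon:title asin="123456" />
</body>
</html>"""


def amazon_requests(document):
    """Return (kind, asin) for every element in the amazon namespace."""
    root = ET.fromstring(document)
    prefix = "{%s}" % AMAZON_NS
    return [
        (element.tag[len(prefix):], element.get("asin"))
        for element in root.iter()
        if isinstance(element.tag, str) and element.tag.startswith(prefix)
    ]


print(amazon_requests(page))
```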

petermoulding.com/web_architect

You are correct, but I don't think you are right ;-)

dman commented 15 February 2007 at 04:06

Yes, that is one of the possible paths that could be followed, and would even work.

I've played with some similar ideas looking at XMLNode and with my experiments in making an SVG node type and VRML/H-Anim nodes. (Proof of concept only)

Although I've proven to myself that it CAN be done, I'm not able to say it's intuitive, or even explicable or sustainable.
Nodes don't (currently) have the ability to notate multiple namespaces, although that could be developed. Perhaps the filters would push themselves into the header as a matter of course. Your suggestion of Drupal being the namespace is a good one.

The path you describe is valid, I just don't know that the effort and pitfalls would be worth the result. As you say, namespaces are out-of-scope here ... but are one of the possible technical solutions. Very technical, however, and would require, as you say, extra documentation and work.
I think that with microformats (just extra classes in your XHTML) that entire messy problem can be sidestepped.

.dan.
How to troubleshoot Drupal | http://www.coders.co.nz/

_{.dan. is the New Zealand Drupal Developer working on Government Web Standards}
