If you use FCKeditor (I haven't tested TinyMCE) together with Pagination, you can get pages that fail validation at validator.w3.org.

The reason is that FCKeditor wraps the pagination tag in a paragraph, i.e.

<p>[ header = next page ]</p>

So the first page ends with a <p> that has no end tag, and the next page starts with a </p> that has no start tag.
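A minimal PHP sketch of the effect, assuming the editor stored the marker as its own paragraph and that the text is split on the bare marker with `explode()`. This is an illustration only, not how Pagination itself splits content, and `p_balance()` is a hypothetical helper:

```php
<?php
// Editor output: the page-break marker has been wrapped in its own paragraph.
$text = '<p>First page.</p><p>[ header = next page ]</p><p>Second page.</p>';

// Naive split on the marker text alone (illustration only).
$pages = explode('[ header = next page ]', $text);

// Positive balance = unclosed <p>; negative = orphaned </p>.
function p_balance(string $html): int {
  return substr_count($html, '<p>') - substr_count($html, '</p>');
}

foreach ($pages as $i => $page) {
  printf("Page %d: %s (balance %+d)\n", $i + 1, $page, p_balance($page));
}
// Page 1: <p>First page.</p><p> (balance +1)
// Page 2: </p><p>Second page.</p> (balance -1)
```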

I fixed this issue by adding a function and changing this section of the function _get_pages() in pagination.module:

$textsize = strlen($text);
$previous_start_tags = '';

do {
  $section = _get_section_break($text, $cutoff);
  $match = (is_array($section) && isset($section['break'])) ? $section['break'] : 0;
  $header = (is_array($section) && isset($section['header'])) ? $section['header'] : null;
  $section = is_array($section) ? $section['text'] : $section;
  //$pages[] = $section;  // original line, now replaced by the code below
  $start_tags = '';
  $end_tags = '';
  // Collect the tags left open in this section and the end tags that close them.
  _pagination_xhtml_unterminated_elements($section, $start_tags, $end_tags);
  // Reopen the previous page's open tags, then close this page's open tags.
  $pages[] = $previous_start_tags . $section . $end_tags;
  $previous_start_tags = $start_tags;
  $text = substr($text, strlen($section) + $match);
  // ... (rest of the loop unchanged)

/**
 * Get the unterminated elements in the XHTML code.
 * Useful when splitting XHTML code.
 *
 * @param string $xhtml
 *   the code to be parsed
 * @param string $start_tags
 *   used to return the unterminated elements
 * @param string $end_tags
 *   used to return the missing end tags
 */
function _pagination_xhtml_unterminated_elements($xhtml, &$start_tags, &$end_tags)
{
  $unclosed_tags = array();
  $end_tags = array();
  $regexp = '/<[^\!]{1}[^\/]*?>/';
  preg_match_all($regexp, $xhtml, $matches);
  $tags = $matches[0];
  foreach ($tags as $tag)
  {
    if (substr($tag, 0, 2) != '</')
    {
      // start tag
      $unclosed_tags[] = $tag;
    }
    else
    {
      // end tag, delete the open tag
      end($unclosed_tags);
      unset( $unclosed_tags[key($unclosed_tags)] );
    }
  }
  foreach($unclosed_tags as $tag)
  {
    // Build the end tag from the tag name: '<div class="x">' -> '</div>', '<p>' -> '</p>'
    if (strpos($tag, ' ')) $end_tags[] = '</' . substr($tag, 1, strpos($tag, ' ')-1) . '>';
    else $end_tags[] = '</' . substr($tag, 1, strpos($tag, '>')-1) . '>';
  }
  $end_tags = array_reverse($end_tags);
  $start_tags = implode($unclosed_tags);
  $end_tags = implode($end_tags);
}
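The same technique can be sketched as a standalone script: a stack of open tags, the missing end tags appended to each page, and the open tags re-emitted at the start of the next page. The names `unclosed_tags()`, `closing_tag()` and `paginate_sections()` are hypothetical, the sections are assumed to be already split, and the scan covers the carried tags plus the section (as in the revised call posted later in this thread). It uses a tag regex that also matches a bare `<p>` and skips `<!...>` and self-closing tags; like the function above, it assumes well-nested markup and does not special-case void elements written without a slash (`<br>`, `<img ...>`).

```php
<?php
// Hypothetical helper: opening tags still unclosed in $html, outermost first.
function unclosed_tags(string $html): array {
  $stack = [];
  // Opening or closing tags; skips comments/doctypes (<!...>) and
  // self-closing tags such as <br />.
  preg_match_all('/<[^!>](?:[^>]*[^\/>])?>/', $html, $m);
  foreach ($m[0] as $tag) {
    if ($tag[1] === '/') {
      array_pop($stack);  // closing tag: assume it closes the innermost open tag
    } else {
      $stack[] = $tag;    // opening tag: keep it verbatim, attributes included
    }
  }
  return $stack;
}

// Hypothetical helper: '<div class="x">' -> '</div>'
function closing_tag(string $open): string {
  preg_match('/^<([a-zA-Z][a-zA-Z0-9]*)/', $open, $m);
  return '</' . $m[1] . '>';
}

// Reopen what the previous page left open, then close whatever is
// still open at the end of this section.
function paginate_sections(array $sections): array {
  $pages = [];
  $carry = '';
  foreach ($sections as $section) {
    $html = $carry . $section;
    $open = unclosed_tags($html);
    $pages[] = $html . implode('', array_map('closing_tag', array_reverse($open)));
    $carry = implode('', $open);
  }
  return $pages;
}

$sections = explode('[ header = next page ]',
  '<p>First page.</p><p>[ header = next page ]</p><p>Second page.</p>');
print_r(paginate_sections($sections));
```

For the paragraph-wrapped marker this yields `<p>First page.</p><p></p>` and `<p></p><p>Second page.</p>`: each page is balanced, at the cost of an empty paragraph on either side of the break.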

Attachment (comment #1): pagination.zip, 7.13 KB, by sersim

Comments

sersim wrote:


Sorry, the tags in my post were parsed as HTML and disappeared!

The reason is that FCKeditor wraps the pagination tag in a paragraph, i.e.
<p>[ header = next page ]</p>
So the first page ends with a <p> that has no end tag, and the next page starts with a </p> that has no start tag.

mundanity wrote:

Hi Sersim,

Thanks for the information. I'll take a closer look at what FCKeditor does, compare it to TinyMCE, and see if there's a compromise I can put in. I appreciate the code; if you don't mind, I may use it once I've taken a closer look.

sersim wrote:

Hi mundowen,
The same thing happens with TinyMCE. You can test it here: http://tinymce.moxiecode.com/examples/full.php

If you paste [ header = next page ] into the editor and switch to HTML, you will see that it has been transformed into <p>[ header = next page ]</p>.
You can remove <p> and </p> in the HTML Source Editor and click Update, but if you reopen the HTML Source Editor, you will see that <p> and </p> have been added again.

I would be glad if you added my code to Pagination.

sersim wrote:

I've updated the regular expression used in _pagination_xhtml_unterminated_elements().
This is the new one:

  $regexp = '/<[^\!]{1}[^>]*?[^\/>]{1}>/';
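A quick PHP check of this expression against a few sample tags. Because `[^\!]{1}` and `[^\/>]{1}` each consume one character, a start tag needs at least two characters between `<` and `>`, so `<p class="x">` and `</p>` match but a bare `<p>` does not; in `_pagination_xhtml_unterminated_elements()` the following `</p>` would then remove whatever tag was opened before it. The `$variant` pattern, which makes the middle part optional, is a suggested alternative, not code from this thread:

```php
<?php
$updated = '/<[^\!]{1}[^>]*?[^\/>]{1}>/';  // expression posted above
$variant = '/<[^!>](?:[^>]*[^\/>])?>/';     // suggested variant: middle part optional

$samples = ['<p>', '</p>', '<p class="x">', '<a href="http://example.com/">', '<br />', '<!-- note -->'];

// 1 = counted as a start/end tag, 0 = ignored
foreach ($samples as $tag) {
  printf("%-32s updated=%d variant=%d\n", $tag,
    preg_match($updated, $tag), preg_match($variant, $tag));
}
```

Both patterns ignore `<br />` and `<!-- note -->`; they differ only on the bare `<p>`.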
mundanity wrote:

Hi Sersim,

Sorry for not getting back to you earlier. I went a different route with this, for reasons similar to those I posted on the other thread about HTML parsing. For now I'm happy to stick with the way Drupal's node_teaser() handles the situation (break at <p> tags). I've modified the initial header regex to look for optional <p> tags, which avoids the FCKeditor/TinyMCE issue. A new release containing this fix should be out shortly; I'm just cleaning up a few other things, as I refactored the core pagination code. Thanks for your help in identifying this issue.
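The thread does not show the modified header regex, so the following is only a hypothetical PHP pattern illustrating the idea: match the marker together with an optional `<p>...</p>` wrapper, so the wrapper is consumed with the marker instead of being cut in half. It assumes a bare `<p>` without attributes, and it does nothing for other elements left open around the marker:

```php
<?php
// Hypothetical pattern, not the module's actual regex: the marker,
// optionally wrapped in <p>...</p>, with the header title captured.
$marker = '/(?:<p>\s*)?\[\s*header\s*=\s*([^\]]*?)\s*\](?:\s*<\/p>)?/i';

$text = '<p>First page.</p><p>[ header = next page ]</p><p>Second page.</p>';

$sections = preg_split($marker, $text);   // page bodies
preg_match_all($marker, $text, $m);       // $m[1] holds the header titles

print_r($sections);
print_r($m[1]);
```

Here `$sections` is `['<p>First page.</p>', '<p>Second page.</p>']` and `$m[1]` is `['next page']`.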

mundanity wrote:

Version: 6.x-1.3 » 6.x-1.x-dev
Status: Needs review » Fixed

Hi Sersim,

The latest dev version should address this specific issue. I want to knock out a few more issues before the official release (the target is the end of the week at the latest), but if you want to take a look, I'd be happy to get your feedback. Apparently the dev version is only rebuilt every 12 hours or so, so it may be a while before the latest build appears.

sersim wrote:

The problem is not only with the <p> tag.
For example, a post like this (it's a real case) breaks the page rendering:

<div>
text
[ header = next page ]
<div>
text
[ header = next page ]
<div>
text
[ header = next page ]
</div>
</div>
</div>

My patch handles these cases too.

Update
The line which calls the function has to be replaced with this one:

_pagination_xhtml_unterminated_elements($previous_start_tags . $section, $start_tags, $end_tags);
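A PHP sketch comparing the original call (scanning only `$section`) with the updated one (scanning `$previous_start_tags . $section`) on the nested `<div>` example above, with whitespace removed. The helpers are hypothetical simplifications of the code in this thread, and `div_balance()` just counts `<div>` minus `</div>` per page (0 means balanced):

```php
<?php
// Hypothetical helper: opening tags still unclosed in $html, outermost first.
function unclosed_tags(string $html): array {
  $stack = [];
  preg_match_all('/<[^!>](?:[^>]*[^\/>])?>/', $html, $m);
  foreach ($m[0] as $tag) {
    if ($tag[1] === '/') {
      array_pop($stack);  // end tag closes the innermost open tag
    } else {
      $stack[] = $tag;    // start tag
    }
  }
  return $stack;
}

// Loosely mirrors the loop in _get_pages(): $include_carry = false scans only
// the section (original call), true scans carried tags + section (updated call).
// End tags are built as "</" + rest of the tag, so this only suits tags without
// attributes, such as the bare <div> used here.
function paginate(string $text, bool $include_carry): array {
  $pages = [];
  $carry = '';
  foreach (explode('[ header = next page ]', $text) as $section) {
    $open = unclosed_tags($include_carry ? $carry . $section : $section);
    $close = '';
    foreach (array_reverse($open) as $tag) {
      $close .= '</' . substr($tag, 1);  // "<div>" -> "</div>"
    }
    $pages[] = $carry . $section . $close;
    $carry = implode('', $open);
  }
  return $pages;
}

function div_balance(string $html): int {
  return substr_count($html, '<div>') - substr_count($html, '</div>');
}

$text = '<div>text[ header = next page ]<div>text[ header = next page ]'
      . '<div>text[ header = next page ]</div></div></div>';

foreach (['original' => false, 'updated' => true] as $label => $flag) {
  echo $label, ': ', implode(' ', array_map('div_balance', paginate($text, $flag))), "\n";
}
// original: 0 1 1 -2
// updated: 0 0 0 0
```

With only `$section` scanned, pages 2 and 3 each end with one unclosed `<div>` and the last page has two surplus `</div>` tags; including the carried tags in the scan keeps every page balanced.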
mundanity wrote:

Status: Fixed » Active

Hi Sersim,

Hmm, let me try to re-think this one as well.

giozzz wrote:

Hello! The patch and solution described in #4 worked for me! Thanks a lot.

mundanity wrote:

Title: Using FCKeditor, Pagination could produce not validated pages » HTML breaks due to regex for "complex" HTML based content

Just re-titling so I get a better sense of what needs to be looked at.