"?>" string prematurely stops syntax highlighting

dman - November 17, 2005 - 23:47
Project:Code Filter
Version:6.x-1.x-dev
Component:Code
Category:bug report
Priority:normal
Assigned:Unassigned
Status:needs work
Issue tags:GHOP
Description

Note: This is now a GHOP issue: http://code.google.com/p/google-highly-open-participation-drupal/issues/...

My code happened to include a function with the line

$found = preg_match("|<title.?>(.*)</title>|",$source,$match);

It seems the codefilter regexp found the ?> and decided it was the end of the ?php block, te rest of my listing came out plaintext.
This is to be expected when running regexps over source that includes regexps.

It made me realize that maybe my code was slightly wrong to begin with anyway, so no patch, sorry, but FYI, dropping out of codefilter prematurely may be due to this string (?>) being found in the source.

Example follows :)

<?php
/**
* Go to the net and retrieve a defined HTML page, fetch its title
* Should not call this too often, as it's obviously network-heavy
* Very simple scan for 'title' tag in the target
*/
function fetch_remote_title($path){
 
$source = file_get_contents($path);
  if(
$source){
   
$found = preg_match("|<title.
?>
(.*)|",$source,$match);
if($found){
$title = $match[1];
drupal_set_message("Looked up remote link and found title:'$title' ");
return $title;
}
}
}

?>

#1

deekayen - February 8, 2006 - 05:53

I haven't seen this problem on the PHP.net manual comments. Maybe we can rip a preg_replace from there?

#2

deekayen - February 8, 2006 - 16:00

I'm able to break the Drupal codefilter regexp by pasting the codefilter_process_php function into a page. When I paste the same code into the PHP.net comment box, hit preview, and even add some erroneous end PHP tags ?>, it still doesn't seem to break, so I tracked down the PHP highlight process for the PHP manual note display:

<?php
manual_note_display
(time(), stripslashes($user), stripslashes($note), FALSE);

function
manual_note_display($date, $name, $text, $id)
{
   if (
$name) { $name = "\n  <strong>" . htmlspecialchars($name) . "</strong><br />"; }
  
$datestr = date("d-M-Y h:i", $date);
  
$datestr = ($id ? "\n  <a href=\"#$id\">$datestr</a>" : "\n  $datestr");
  
$anchor  = ($id ? "<a name=\"$id\"></a>\n " : "");
  
$text    = clean_note($text);

  
// If the viewer is logged in, show admin options
  
if (isset($_COOKIE['MAGIC_COOKIE']) && $id) {
 
      
$admin = "\n  <span class=\"action\">\n  " .
     
      
make_popup_link(
          
'http://master.php.net/manage/user-notes.php?action=edit+' . $id,
          
make_image('notes-edit.gif', 'edit note'),
          
'admin',
          
'scrollbars=yes,width=650,height=400'
      
) . "\n  " .
 
      
make_popup_link(
          
'http://master.php.net/manage/user-notes.php?action=reject+' . $id,
          
make_image('notes-reject.gif', 'reject note'),
          
'admin',
          
'scrollbars=no,width=300,height=200'
      
) . "\n  " .
 
      
make_popup_link(
          
'http://master.php.net/manage/user-notes.php?action=delete+' . $id,
          
make_image('notes-delete.gif', 'delete note'),
          
'admin',
          
'scrollbars=no,width=300,height=200'
      
) . "\n  </span>";
     
   } else {
$admin = ''; }
 
   echo <<<USER_NOTE_TEXT

{$anchor}<div class="note">{$admin}{$name}{$datestr}
  <div class="text">
{$text}
  </div>
</div>
USER_NOTE_TEXT;

}

function
clean_note($text)
{
  
// Highlight PHP source
  
$text = highlight_php(trim($text), TRUE);

  
// Turn urls into links
  
$text = preg_replace(
      
'!((mailto:|(http|ftp|nntp|news):\/\/).*?)(\s|<|\)|"|\\|\'|$)!',
      
'<a href="\1" target="_blank">\1</a>\4',
      
$text
  
);
 
   return
$text;
}

function
highlight_php($code, $return = FALSE)
{
  
// Using OB, as highlight_string() only supports
   // returning the result from 4.2.0
  
ob_start();
  
highlight_string($code);
  
$highlighted = ob_get_contents();
  
ob_end_clean();

  
// This should eventually be a php_syntax_check() call when we move to PHP5
   // But use this ugly hack for now to avoid code snippets with bad syntax screwing up the highlighter
  
if(strstr($highlighted,"include/layout.inc</b>")) $highlighted = "<code>&quot;.htmlentities($code).&quot;</code>";

  
// Fix output to use CSS classes and wrap well
  
$highlighted = '<div class="phpcode">' . str_replace(
       array(
          
'&nbsp;',
          
'<br />',
          
'<font color="',        // for PHP 4
          
'<span style="color: ', // from PHP 5.0.0RC1
          
'</font>',
          
"\n ",
          
'  '
      
),
       array(
          
' ',
          
"<br />\n",
          
'<span class="',
          
'<span class="',
          
'</span>',
          
"\n&nbsp;",
          
'&nbsp; '
      
),
      
$highlighted
  
) . '</div>';
 
   if (
$return) { return $highlighted; }
   else { echo
$highlighted; }
}
?>

#3

deekayen - February 8, 2006 - 17:31

The attached file breaks down the previous code into a more basic form. It can still be fooled, but not as easily.

AttachmentSize
codefiltertest.php.txt 1.99 KB

#4

deekayen - February 8, 2006 - 19:32
Title:regexp eror drops out of parsing halfway» regexp error drops out of parsing halfway

This version only puts code tags around the PHP instead of the whole string.

AttachmentSize
codefiltertest.php_0.txt 2.66 KB

#5

deekayen - February 8, 2006 - 20:58

Please don't take my code as a sign I plan on closing this bug. The filter hooks are too much of a headache. To summarize, the PHP.net way I've discussed is to highlight_string() the whole string. Then I added a substr to trim off the code tags added by highlight_string(), and str_replace to go add the code tags back for instances of open and close PHP tags. That removes the problem of having a regex try to identify the start and end of PHP code.

#6

JohnAlbin - August 17, 2007 - 22:41
Title:regexp error drops out of parsing halfway» "?>" string prematurely stops syntax highlighting

#7

hass - August 18, 2007 - 06:17

subscribe

#8

corsix - December 1, 2007 - 21:13
Status:active» needs review

Now a GHOP issue: http://code.google.com/p/google-highly-open-participation-drupal/issues/...

Attached patch is against the Drupal 6 version of Codefilter (http://drupal.org/files/issues/codefilter_d6_1.patch), and uses uses a non-regex method to detect which ?> is the correct one to end the highlighting, as a regex cannot easily detect if a ?> is within a multiline comment or a string.

AttachmentSize
codefilter_ghop58.patch 4.32 KB

#9

deekayen - December 3, 2007 - 02:25
Status:needs review» needs work

Still doesn't fix the bug. For example, pasting the source of the codefilter module to a node still breaks in and out of proper highlighting.

AttachmentSize
codefilter_GHOP.png 428.04 KB

#10

webchick - December 3, 2007 - 06:57
Title:"?>" string prematurely stops syntax highlighting» GHOP #58: "?>" string prematurely stops syntax highlighting

#11

corsix - December 3, 2007 - 16:58
Status:needs work» needs review

Updated patch to fix the issue it had with "?>" within a single quoted string.

AttachmentSize
codefilter_ghop58_1.patch 4.93 KB

#12

deekayen - December 3, 2007 - 17:47
Status:needs review» needs work

It is improved, but still fails tests in #3 & #4. Also watch your coding standards (e.g. if statements http://drupal.org/coding-standards).

Writing special cases for every occurrence where it breaks out seems like a sub-optimal solution to me.

#13

cYu - December 3, 2007 - 17:53

Pasting the following code in the body of a page

<?php
echo " \" ";
?>

results in

<?php
echo " " ";
?>

with this latest patch.

Using that same echo inside of code tags instead of PHP tags, however, gives the expected result of

echo " \" ";

#14

corsix - December 3, 2007 - 23:01

As preg_replace-with-/e is no longer used for preparing the PHP code, and codefilter_escape is used, which undoes some odd preg_replace-with-/e behaviour on slashes, slashes are being removing when the shouldn't.
Fixed this (test it in comments - http://corsix.org/drupal60b3/node/2), but still needs some work regarding escaped html entites.

#15

soxofaan - December 4, 2007 - 09:37

subscribing, since it is also a problem for http://drupal.org/project/geshifilter

#16

corsix - December 4, 2007 - 14:53
Status:needs work» needs review

> Writing special cases for every occurrence where it breaks out seems like a sub-optimal solution to me.
IMO, it was accounting for odd behaviour with highlight_string, as highlight_string should not remove any of the string, just like there was existing code to account for odd preg_replace-with-/e behaviour on slashes.

Adjusted the if statements that went against the coding standards, and further bug fixes to the code.

AttachmentSize
codefilter_ghop58_2.patch 6.46 KB

#17

corsix - December 4, 2007 - 15:59

Patch revised for coding standards, E_ALL compliance, and to re-enable unhighlighted sections, which broke with patch version 2.

AttachmentSize
codefilter_ghop58_3.patch 7.14 KB

#18

deekayen - December 4, 2007 - 16:12
Status:needs review» needs work

Code tags inline should keep the included text inline. Code tags with a line break involved should parse the contents out into the grey box as formatted plaintext. For example, the attached file should have tester inline as a fixed width font, then code highlight the echo, then put the last block in a box. Instead, the first line is boxed and doesn't quit that code block till it gets to the second end code tag.

AttachmentSize
GHOP_58-2.txt 67 bytes

#19

deekayen - December 4, 2007 - 16:39
Status:needs work» needs review

I'm giving a well deserved good job on killing the test cases, and then the supplemental test cases from cYu. I'm switching back to CNR, but only so John can note to check the code implementation. The functionality seems to pass all the hell and fire I could think of putting it through.

#20

cYu - December 4, 2007 - 16:56

I ran as many standard and obscure cases as I could think of and this most recent patch handled all of them well. Great work staying with this one and patching up these issues so quickly.

#21

JohnAlbin - December 4, 2007 - 22:45

I’ll review this tonight.

#22

JohnAlbin - December 5, 2007 - 20:14

It took me a while to review this because I was getting a really weird cacheing issue. (If you are doing before/after comparisons of content before/after a patch, you have to change the content, and not just edit with no changes and save, otherwise the "after" result will still look like the "before" content.) Anyway…

It looks like the patch works as advertised regarding ?> strings. Nice work! But, a couple of notes/questions…

Why did you change the way newlines are escaped in codefilter_escape()? This isn’t a criticism; I just want to understand why it was done.

And, if you look at line 102 of the original file: preg_replace('@[\[<](\?php|%)(.+?)(\?|%)[\]>]@se', you’ll see that, in order to work with the php input filter, the codefilter module supports php delimiters with square brackets in addition to those with angle brakets: [?php, and [%. You’ll need to make sure the module continues to support those delimeters.

Lastly, and this might be scope creep (let me know if you feel this should be a separate issue), but the content, <code><?php print "</code>"; ?></code>, is broken (in both the patched and un-patched versions.)

#23

JohnAlbin - December 5, 2007 - 20:39
Status:needs review» needs work

#24

corsix - December 5, 2007 - 21:57
Status:needs work» needs review

Newlines:
When putting the source of codefilter.module into a post, I found that the "& #10;" in the actual source was being replace by a newline (which might be fine when the source is dealing with html, but that cannot be assumed), as "& #10;" was being used as a temporary newline replacement. As "<" and ">" are replaced by fun (and conveniently invalid) unicode characters between prepare and process, using something similar for newlines seemed a consistent solution to the problem.

The following code contains & #10; between the quotes, not a newline:
<?php echo "&#10;"; ?>

Square brackets:
Ok, added support for them.

php within code:
This is hopefully fixed as well.

AttachmentSize
codefilter_ghop58_4.patch 10.28 KB

#25

JohnAlbin - December 5, 2007 - 22:29
Status:needs review» needs work

When applying this patch, I get: Hunk #5 FAILED at 226.

#26

corsix - December 5, 2007 - 22:46
Status:needs work» needs review

Strange; I did the patch against CVS head, so it shouldn't be failing, seeing as every other patch I've posted is against CVS head and has worked. I'll look into it.

Works fine here:

E:\HTML\Apache\users\drupal60b3\modules>set CVSROOT=:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal-contrib

E:\HTML\Apache\users\drupal60b3\modules>"C:\Program Files\TortoiseCVS\cvs.exe" -q checkout -d codefilter -P contributions/modules/codefilter
U codefilter/CHANGELOG.txt
U codefilter/INSTALL.txt
U codefilter/README.txt
U codefilter/codefilter.css
U codefilter/codefilter.info
U codefilter/codefilter.module
U codefilter/translations/codefilter.pot
U codefilter/translations/hu.po

E:\HTML\Apache\users\drupal60b3\modules>cd codefilter

E:\HTML\Apache\users\drupal60b3\modules\codefilter>patch < codefilter_ghop58_4.patch
patching file `codefilter.module'

E:\HTML\Apache\users\drupal60b3\modules\codefilter>

#27

JohnAlbin - December 6, 2007 - 01:48

Sorry, my fault. I was applying it to DRUPAL-5.

#28

JohnAlbin - December 6, 2007 - 04:42
Status:needs review» needs work

It looks like everything is working now.

Can we get some more testing on this recent patch?

I think there are some optimizations that we could do, but I don't have time this week to really give it the kind of analysis it deserves.

If you add PHP docblocks to the functions you added, fix the typos in your comments (like “Iff”), and remove any trailing spaces from any lines, I'll mark this RTBC.

#29

corsix - December 6, 2007 - 09:35

I did actually mean iff (http://en.wikipedia.org/wiki/Iff). Regardless, I'll go over the comments & docblocks.

I would also be interested to see what kind of optimisations are possible.

#30

corsix - December 6, 2007 - 19:10
Status:needs work» needs review

Added docblocks, removed trailing whitespace, did a spelling check, and fixed a bug relating to php blocks (not) within code blocks;

<code>
some_code();
</code>
<?php
outside_of_the_code_block();
?>
<code>
some_more_code();
</code>

This would be treated as one big code block rather than two seperate ones, as all php blocks after a code block would extend the code block to include them.

AttachmentSize
codefilter_ghop58_5.patch 13.38 KB

#31

JohnAlbin - January 11, 2008 - 00:34
Status:needs review» needs work

See how \xFE and \xFF can’t be used in Drupal 5.6: http://drupal.org/node/208636

#32

zeta ζ - January 15, 2008 - 18:21

subscribe …

#33

corsix - January 15, 2008 - 18:31

This is for Codefilter 6.x-1.x-dev, so not relevant to Drupal 5.6. As the issue with \xFE and \xFF has been resolved at the aforementioned node, it can be ported to the 6.x branch if it hasn't been already, but I do not see how it is related to the original problem of "?>" stopping highlighting.

#34

cYu - January 24, 2008 - 20:31

I think the square bracket support may not be working. I get an incorrect result when surrounding

$a = str_replace('[%]', '', $b);

in php tags.

#35

corsix - January 24, 2008 - 23:33

Could you provide a screenshot of it failing?

The quick test I did looked fine (attached image).

AttachmentSize
Image1.png 43.38 KB

#36

cYu - January 25, 2008 - 15:05

My mistake Corsix, I was assuming that your patch was already in the current version of Codefilter which is where I was testing. Grabbing 1.25 and applying your latest patch outputs the correct results for the case I mentioned.

Oddly enough though, surrounding that same example with code tags instead of php tags messes up the output.

AttachmentSize
codefilter_bug.png 23.41 KB

#37

JohnAlbin - February 1, 2008 - 20:02
Title:GHOP #58: "?>" string prematurely stops syntax highlighting» "?>" string prematurely stops syntax highlighting

I'm glad Peter is getting credit for this in the GHOP. He did a lot of work and the patch does fix the bug.

However, there is a lot of code change to fix what is really just a minor bug. I need to create some easy way to test code patches for regression testing and for performance. Especially since this module is used on d.o.

Until I get some Simple Tests created, big patches for minor bugs will have to wait.

 
 

Drupal is a registered trademark of Dries Buytaert.