Note: This is now a GHOP issue: http://code.google.com/p/google-highly-open-participation-drupal/issues/...

My code happened to include a function with the line

$found = preg_match("|<title.?>(.*)</title>|",$source,$match);

It seems the codefilter regexp found the ?> and decided it was the end of the ?php block, te rest of my listing came out plaintext.
This is to be expected when running regexps over source that includes regexps.

It made me realize that maybe my code was slightly wrong to begin with anyway, so no patch, sorry, but FYI, dropping out of codefilter prematurely may be due to this string (?>) being found in the source.

Example follows :)


/**
 * Go to the net and retrieve a defined HTML page, fetch its title
 * Should not call this too often, as it's obviously network-heavy
 * Very simple scan for 'title' tag in the target
 */
function fetch_remote_title($path){
  $source = file_get_contents($path);
  if($source){
    $found = preg_match("|<title.

(.*)|",$source,$match);
if($found){
$title = $match[1];
drupal_set_message("Looked up remote link and found title:'$title' ");
return $title;
}
}
}

?>

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

deekayen’s picture

I haven't seen this problem on the PHP.net manual comments. Maybe we can rip a preg_replace from there?

deekayen’s picture

I'm able to break the Drupal codefilter regexp by pasting the codefilter_process_php function into a page. When I paste the same code into the PHP.net comment box, hit preview, and even add some erroneous end PHP tags ?>, it still doesn't seem to break, so I tracked down the PHP highlight process for the PHP manual note display:


manual_note_display(time(), stripslashes($user), stripslashes($note), FALSE);

function manual_note_display($date, $name, $text, $id)
{
   if ($name) { $name = "\n  <strong>" . htmlspecialchars($name) . "</strong><br />"; }
   $datestr = date("d-M-Y h:i", $date);
   $datestr = ($id ? "\n  <a href=\"#$id\">$datestr</a>" : "\n  $datestr");
   $anchor  = ($id ? "<a name=\"$id\"></a>\n " : "");
   $text    = clean_note($text);

   // If the viewer is logged in, show admin options
   if (isset($_COOKIE['MAGIC_COOKIE']) && $id) {
  
       $admin = "\n  <span class=\"action\">\n  " .
      
       make_popup_link(
           'http://master.php.net/manage/user-notes.php?action=edit+' . $id,
           make_image('notes-edit.gif', 'edit note'),
           'admin',
           'scrollbars=yes,width=650,height=400'
       ) . "\n  " .
  
       make_popup_link(
           'http://master.php.net/manage/user-notes.php?action=reject+' . $id,
           make_image('notes-reject.gif', 'reject note'),
           'admin',
           'scrollbars=no,width=300,height=200'
       ) . "\n  " .
  
       make_popup_link(
           'http://master.php.net/manage/user-notes.php?action=delete+' . $id,
           make_image('notes-delete.gif', 'delete note'),
           'admin',
           'scrollbars=no,width=300,height=200'
       ) . "\n  </span>";
      
   } else { $admin = ''; }
  
   echo <<<USER_NOTE_TEXT

 {$anchor}<div class="note">{$admin}{$name}{$datestr}
  <div class="text">
{$text}
  </div>
 </div>
USER_NOTE_TEXT;
 
}

function clean_note($text)
{
   // Highlight PHP source
   $text = highlight_php(trim($text), TRUE);

   // Turn urls into links
   $text = preg_replace(
       '!((mailto:|(http|ftp|nntp|news):\/\/).*?)(\s|<|\)|"|\\|\'|$)!',
       '<a href="\1" target="_blank">\1</a>\4',
       $text
   );
  
   return $text;
}

function highlight_php($code, $return = FALSE)
{
   // Using OB, as highlight_string() only supports
   // returning the result from 4.2.0
   ob_start();
   highlight_string($code);
   $highlighted = ob_get_contents();
   ob_end_clean();

   // This should eventually be a php_syntax_check() call when we move to PHP5
   // But use this ugly hack for now to avoid code snippets with bad syntax screwing up the highlighter 
   if(strstr($highlighted,"include/layout.inc</b>")) $highlighted = "&quot;.htmlentities($code).&quot;";
 
   // Fix output to use CSS classes and wrap well
   $highlighted = '<div class="phpcode">' . str_replace(
       array(
           '&nbsp;',
           '<br />',
           '<font color="',        // for PHP 4
           '<span style="color: ', // from PHP 5.0.0RC1
           '</font>',
           "\n ",
           '  '
       ),
       array(
           ' ',
           "<br />\n",
           '<span class="',
           '<span class="',
           '</span>',
           "\n&nbsp;",
           '&nbsp; '
       ),
       $highlighted
   ) . '</div>';
  
   if ($return) { return $highlighted; }
   else { echo $highlighted; }
}

deekayen’s picture

FileSize
1.99 KB

The attached file breaks down the previous code into a more basic form. It can still be fooled, but not as easily.

deekayen’s picture

Title: regexp eror drops out of parsing halfway » regexp error drops out of parsing halfway
FileSize
2.66 KB

This version only puts code tags around the PHP instead of the whole string.

deekayen’s picture

Please don't take my code as a sign I plan on closing this bug. The filter hooks are too much of a headache. To summarize, the PHP.net way I've discussed is to highlight_string() the whole string. Then I added a substr to trim off the code tags added by highlight_string(), and str_replace to go add the code tags back for instances of open and close PHP tags. That removes the problem of having a regex try to identify the start and end of PHP code.

JohnAlbin’s picture

Title: regexp error drops out of parsing halfway » "?>" string prematurely stops syntax highlighting
hass’s picture

subscribe

corsix’s picture

Status: Active » Needs review
FileSize
4.32 KB

Now a GHOP issue: http://code.google.com/p/google-highly-open-participation-drupal/issues/...

Attached patch is against the Drupal 6 version of Codefilter (http://drupal.org/files/issues/codefilter_d6_1.patch), and uses uses a non-regex method to detect which ?> is the correct one to end the highlighting, as a regex cannot easily detect if a ?> is within a multiline comment or a string.

deekayen’s picture

Status: Needs review » Needs work
FileSize
428.04 KB

Still doesn't fix the bug. For example, pasting the source of the codefilter module to a node still breaks in and out of proper highlighting.

webchick’s picture

Title: "?>" string prematurely stops syntax highlighting » GHOP #58: "?>" string prematurely stops syntax highlighting
corsix’s picture

Status: Needs work » Needs review
FileSize
4.93 KB

Updated patch to fix the issue it had with "?>" within a single quoted string.

deekayen’s picture

Status: Needs review » Needs work

It is improved, but still fails tests in #3 & #4. Also watch your coding standards (e.g. if statements http://drupal.org/coding-standards).

Writing special cases for every occurrence where it breaks out seems like a sub-optimal solution to me.

cYu’s picture

Pasting the following code in the body of a page

<?php
echo " \" ";
?>

results in

<?php
echo " " ";
?>

with this latest patch.

Using that same echo inside of code tags instead of PHP tags, however, gives the expected result of

echo " \" ";
corsix’s picture

As preg_replace-with-/e is no longer used for preparing the PHP code, and codefilter_escape is used, which undoes some odd preg_replace-with-/e behaviour on slashes, slashes are being removing when the shouldn't.
Fixed this (test it in comments - http://corsix.org/drupal60b3/node/2), but still needs some work regarding escaped html entites.

soxofaan’s picture

subscribing, since it is also a problem for http://drupal.org/project/geshifilter

corsix’s picture

Status: Needs work » Needs review
FileSize
6.46 KB

> Writing special cases for every occurrence where it breaks out seems like a sub-optimal solution to me.
IMO, it was accounting for odd behaviour with highlight_string, as highlight_string should not remove any of the string, just like there was existing code to account for odd preg_replace-with-/e behaviour on slashes.

Adjusted the if statements that went against the coding standards, and further bug fixes to the code.

corsix’s picture

FileSize
7.14 KB

Patch revised for coding standards, E_ALL compliance, and to re-enable unhighlighted sections, which broke with patch version 2.

deekayen’s picture

Status: Needs review » Needs work
FileSize
67 bytes

Code tags inline should keep the included text inline. Code tags with a line break involved should parse the contents out into the grey box as formatted plaintext. For example, the attached file should have tester inline as a fixed width font, then code highlight the echo, then put the last block in a box. Instead, the first line is boxed and doesn't quit that code block till it gets to the second end code tag.

deekayen’s picture

Status: Needs work » Needs review

I'm giving a well deserved good job on killing the test cases, and then the supplemental test cases from cYu. I'm switching back to CNR, but only so John can note to check the code implementation. The functionality seems to pass all the hell and fire I could think of putting it through.

cYu’s picture

I ran as many standard and obscure cases as I could think of and this most recent patch handled all of them well. Great work staying with this one and patching up these issues so quickly.

JohnAlbin’s picture

I’ll review this tonight.

JohnAlbin’s picture

It took me a while to review this because I was getting a really weird cacheing issue. (If you are doing before/after comparisons of content before/after a patch, you have to change the content, and not just edit with no changes and save, otherwise the "after" result will still look like the "before" content.) Anyway…

It looks like the patch works as advertised regarding ?> strings. Nice work! But, a couple of notes/questions…

Why did you change the way newlines are escaped in codefilter_escape()? This isn’t a criticism; I just want to understand why it was done.

And, if you look at line 102 of the original file: preg_replace('@[\[<](\?php|%)(.+?)(\?|%)[\]>]@se', you’ll see that, in order to work with the php input filter, the codefilter module supports php delimiters with square brackets in addition to those with angle brakets: [?php, and [%. You’ll need to make sure the module continues to support those delimeters.

Lastly, and this might be scope creep (let me know if you feel this should be a separate issue), but the content, <code><?php print "</code>"; ?></code>, is broken (in both the patched and un-patched versions.)

JohnAlbin’s picture

Status: Needs review » Needs work
corsix’s picture

Status: Needs work » Needs review
FileSize
10.28 KB

Newlines:
When putting the source of codefilter.module into a post, I found that the "& #10;" in the actual source was being replace by a newline (which might be fine when the source is dealing with html, but that cannot be assumed), as "& #10;" was being used as a temporary newline replacement. As "<" and ">" are replaced by fun (and conveniently invalid) unicode characters between prepare and process, using something similar for newlines seemed a consistent solution to the problem.

The following code contains & #10; between the quotes, not a newline:
<?php echo "&#10;"; ?>

Square brackets:
Ok, added support for them.

php within code:
This is hopefully fixed as well.

JohnAlbin’s picture

Status: Needs review » Needs work

When applying this patch, I get: Hunk #5 FAILED at 226.

corsix’s picture

Status: Needs work » Needs review

Strange; I did the patch against CVS head, so it shouldn't be failing, seeing as every other patch I've posted is against CVS head and has worked. I'll look into it.

Works fine here:

E:\HTML\Apache\users\drupal60b3\modules>set CVSROOT=:pserver:anonymous:anonymous@cvs.drupal.org:/cvs/drupal-contrib

E:\HTML\Apache\users\drupal60b3\modules>"C:\Program Files\TortoiseCVS\cvs.exe" -q checkout -d codefilter -P contributions/modules/codefilter
U codefilter/CHANGELOG.txt
U codefilter/INSTALL.txt
U codefilter/README.txt
U codefilter/codefilter.css
U codefilter/codefilter.info
U codefilter/codefilter.module
U codefilter/translations/codefilter.pot
U codefilter/translations/hu.po

E:\HTML\Apache\users\drupal60b3\modules>cd codefilter

E:\HTML\Apache\users\drupal60b3\modules\codefilter>patch < codefilter_ghop58_4.patch
patching file `codefilter.module'

E:\HTML\Apache\users\drupal60b3\modules\codefilter>
JohnAlbin’s picture

Sorry, my fault. I was applying it to DRUPAL-5.

JohnAlbin’s picture

Status: Needs review » Needs work

It looks like everything is working now.

Can we get some more testing on this recent patch?

I think there are some optimizations that we could do, but I don't have time this week to really give it the kind of analysis it deserves.

If you add PHP docblocks to the functions you added, fix the typos in your comments (like “Iff”), and remove any trailing spaces from any lines, I'll mark this RTBC.

corsix’s picture

I did actually mean iff (http://en.wikipedia.org/wiki/Iff). Regardless, I'll go over the comments & docblocks.

I would also be interested to see what kind of optimisations are possible.

corsix’s picture

Status: Needs work » Needs review
FileSize
13.38 KB

Added docblocks, removed trailing whitespace, did a spelling check, and fixed a bug relating to php blocks (not) within code blocks;

<code>
some_code();
</code>
<?php
outside_of_the_code_block();
?>
<code>
some_more_code();
</code>

This would be treated as one big code block rather than two seperate ones, as all php blocks after a code block would extend the code block to include them.

JohnAlbin’s picture

Status: Needs review » Needs work

See how \xFE and \xFF can’t be used in Drupal 5.6: http://drupal.org/node/208636

zeta ζ’s picture

subscribe …

corsix’s picture

This is for Codefilter 6.x-1.x-dev, so not relevant to Drupal 5.6. As the issue with \xFE and \xFF has been resolved at the aforementioned node, it can be ported to the 6.x branch if it hasn't been already, but I do not see how it is related to the original problem of "?>" stopping highlighting.

cYu’s picture

I think the square bracket support may not be working. I get an incorrect result when surrounding

$a = str_replace('[%]', '', $b);

in php tags.

corsix’s picture

FileSize
43.38 KB

Could you provide a screenshot of it failing?

The quick test I did looked fine (attached image).

cYu’s picture

FileSize
23.41 KB

My mistake Corsix, I was assuming that your patch was already in the current version of Codefilter which is where I was testing. Grabbing 1.25 and applying your latest patch outputs the correct results for the case I mentioned.

Oddly enough though, surrounding that same example with code tags instead of php tags messes up the output.

JohnAlbin’s picture

Title: GHOP #58: "?>" string prematurely stops syntax highlighting » "?>" string prematurely stops syntax highlighting

I'm glad Peter is getting credit for this in the GHOP. He did a lot of work and the patch does fix the bug.

However, there is a lot of code change to fix what is really just a minor bug. I need to create some easy way to test code patches for regression testing and for performance. Especially since this module is used on d.o.

Until I get some Simple Tests created, big patches for minor bugs will have to wait.