an inelegant string (i wrote it) of the form:
ΑΝΤΙ-ΛΥΣΙΣΤΡΑΤΗ
is the title of a blog entry which is imported as an rss feed using the aggregator module.

in the bluemarine theme, with the aggregator block enabled and on the left-hand side, this string is displayed as a newline followed by 86 contiguous display positions, which is inaesthetic in three-column themes generally. reading html entity values is as easy as reading hex (or octal), but the html entity values could simply be decoded. the browser used is a recent (debian) mozilla.

the following fixes the presenting problem, namely the post title in the sidebar being wicked long and undecoded:

[ebw@abenaki.wabanaki.net:96]% diff -c drupal47/includes/bootstrap.inc drupal-4.7.0/includes/bootstrap.inc
*** drupal47/includes/bootstrap.inc Sun May 7 07:18:55 2006
--- drupal-4.7.0/includes/bootstrap.inc Fri Apr 28 07:52:31 2006
***************
*** 596,605 ****
return htmlspecialchars($text, ENT_QUOTES);
}

- function check_e_plain($text) {
- return htmlspecialchars_decode($text, ENT_QUOTES);
- }
-
/**
* Since request_uri() is only available on Apache, we generate an
* equivalent using other environment variables.
--- 596,601 ----

[ebw@abenaki.wabanaki.net:97]% diff -c drupal47/modules/aggregator.module drupal-4.7.0/modules/aggregator.module
*** drupal47/modules/aggregator.module Sat May 6 22:30:52 2006
--- drupal-4.7.0/modules/aggregator.module Fri Apr 28 05:52:57 2006
***************
*** 1290,1296 ****
}

// Display the external link to the item.
! $output .= 'link) .'">'. check_e_plain($item->title) ."\n";

return $output;
}
--- 1290,1296 ----
}

// Display the external link to the item.
! $output .= 'link) .'">'. check_plain($item->title) ."\n";

return $output;
}

it is not a general solution for every place where aggregator-acquired encoded characters are displayed. "properly" displayed may mean either naive (still encoded) but without violence to display width (truncated or line folded), or decoded.

since there may be issues with html, hence the need for a check_plain() wrapper around htmlspecialchars(), this feature/bug/issue should be reviewed by someone who knows why that wrapper was inserted into bootstrap.inc and used so pervasively.
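a minimal sketch of what the two wrappers do, run as a standalone php script outside drupal (so the two functions are redefined here exactly as they appear in the diff). the sample titles are made up for illustration, and htmlspecialchars_decode() needs php 5.1 or later. it suggests why the decoding wrapper deserves review: it leaves numeric entities alone, which fixes the display, but it also turns escaped markup back into live markup.

```php
<?php
// the 4.7 wrapper, as shown in the diff above.
function check_plain($text) {
  return htmlspecialchars($text, ENT_QUOTES);
}

// the proposed wrapper, as shown in the diff above.
function check_e_plain($text) {
  return htmlspecialchars_decode($text, ENT_QUOTES);
}

// hypothetical feed title that already contains numeric entities.
$title = '&#913;&#925;';

// check_plain() escapes the ampersands a second time, so the browser
// shows the entity text literally instead of the characters.
echo check_plain($title), "\n";     // &amp;#913;&amp;#925;

// check_e_plain() does not touch numeric entities, so the browser
// receives them intact and renders the characters.
echo check_e_plain($title), "\n";   // &#913;&#925;

// hypothetical hostile title: decoding makes escaped markup live again.
$hostile = '&lt;script&gt;alert(1)&lt;/script&gt;';
echo check_e_plain($hostile), "\n"; // <script>alert(1)</script>
```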

the string could be displayed as: ΑΝΤΙ-ΛΥΣΙΣΤΡΑΤΗ

Comments

ebw’s picture

part of the strings in the 2nd diff were eaten by some tag eater. since the only thing interesting in the diff line is after the eaten part, just look at line 1293.

Steven’s picture

Please use the code and PHP tags to post code.

Steven’s picture

Status: Needs review » Active

And be sure to attach an actual patch.

Steve Dondley’s picture

I ran into a similar problem with html entities in a title that were already encoded. They end up getting encoded again and get screwed up.

I supplied a patch at: http://drupal.org/node/61456

ebw’s picture

hmm. let's try this:

ΑΝΤΙ-ΛΥΣΙΣΤΡΑΤΗ

o.k. that's deliciously ugly in preview.

now for the context diff that ... does the right thing, but only in one place, not everywhere it could, but that is why i suggested that someone think about why the wrapped htmlspecialchars() is so pervasive.

*** drupal-4.7.0/includes/bootstrap.inc Fri Apr 28 07:52:31 2006
--- drupal47/includes/bootstrap.inc     Sun May  7 07:18:55 2006
***************
*** 596,601 ****
--- 596,605 ----
    return htmlspecialchars($text, ENT_QUOTES);
  }
  
+ function check_e_plain($text) {
+   return htmlspecialchars_decode($text, ENT_QUOTES);
+ }
+ 
  /**
   * Since request_uri() is only available on Apache, we generate an
   * equivalent using other environment variables.
*** drupal-4.7.0/modules/aggregator.module      Fri Apr 28 05:52:57 2006
--- drupal47/modules/aggregator.module  Sat May  6 22:30:52 2006
***************
*** 1290,1296 ****
    }
  
    // Display the external link to the item.
!   $output .= '<a href="'. check_url($item->link) .'">'. check_plain($item->title) ."</a>\n";
  
    return $output;
  }
--- 1290,1296 ----
    }
  
    // Display the external link to the item.
!   $output .= '<a href="'. check_url($item->link) .'">'. check_e_plain($item->title) ."</a>\n";
  
    return $output;
  }

oddly, i'm not seeing previews, but it could just be my eyes.

ebw’s picture

sigh.

obviously preview and postview handle entities of the form & # nnn ; (spaces elided) differently.

the original string can be had for a song here: http://wampum.wabanaki.net/vault/2006/05/002748.html. just point an instance of the rss aggregator at wampum and an ugly string can be your friend.

now there are other places in the aggregator.module code where the string is incorrectly decoded, and i suppose i can find them all, and then the question may be different -- a module-local wrapper?
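one shape a module-local wrapper could take, as a sketch only: decode whatever entities the feed supplied into real utf-8 characters, then escape exactly once. the name _aggregator_title_plain() is hypothetical and not part of drupal, and the sketch assumes a php build whose html_entity_decode() handles utf-8 output.

```php
<?php
// hypothetical module-local helper; the name is an assumption, not drupal api.
function _aggregator_title_plain($text) {
  // first turn any entities (named or numeric) into real utf-8 characters...
  $decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
  // ...then escape once, so markup stays inert and nothing is double-encoded.
  return htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');
}

// numeric entities come out as the characters themselves.
echo _aggregator_title_plain('&#913;&#925;'), "\n";

// escaped markup comes out still escaped, not live.
echo _aggregator_title_plain('&lt;b&gt;bold&lt;/b&gt;'), "\n";
```

the trade-off against simply decoding is that anything that looks like markup in a title stays visible as text rather than being interpreted by the browser.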

ebw’s picture

localizing the change in the wrapper to the module, there are two instances i've identified where html entities in rss feed item titles are incorrectly displayed. the following works for me.

[ebw@abenaki.wabanaki.net:84]% diff -c drupal-4.7.0/modules/aggregator.module drupal47/modules/aggregator.module

*** drupal-4.7.0/modules/aggregator.module      Fri Apr 28 05:52:57 2006
--- drupal47/modules/aggregator.module  Mon May  8 09:31:39 2006
***************
*** 1290,1296 ****
    }
  
    // Display the external link to the item.
!   $output .= '<a href="'. check_url($item->link) .'">'. check_plain($item->title) ."</a>\n";
  
    return $output;
  }
--- 1290,1296 ----
    }
  
    // Display the external link to the item.
!   $output .= '<a href="'. check_url($item->link) .'">'. check_e_plain($item->title) ."</a>\n";
  
    return $output;
  }
***************
*** 1332,1338 ****
    }
  
    $output .= "<div class=\"feed-item\">\n";
!   $output .= '<h3 class="feed-item-title"><a href="'. check_url($item->link) .'">'. check_plain($item->title) ."</a></h3>\n";
    $output .= "<div class=\"feed-item-meta\">$source <span class=\"feed-item-date\">$source_date</span></div>\n";
  
    if ($item->description) {
--- 1332,1338 ----
    }
  
    $output .= "<div class=\"feed-item\">\n";
!   $output .= '<h3 class="feed-item-title"><a href="'. check_url($item->link) .'">'. check_e_plain($item->title) ."</a></h3>\n";
    $output .= "<div class=\"feed-item-meta\">$source <span class=\"feed-item-date\">$source_date</span></div>\n";
  
    if ($item->description) {
***************
*** 1365,1368 ****
--- 1365,1375 ----
   */
  function _aggregator_items($count) {
    return format_plural($count, '1 item', '%count items');
+ }
+ 
+ /**
+  * correctly render html entities
+  */
+ function check_e_plain($text) {
+   return htmlspecialchars_decode($text, ENT_QUOTES);
  }

stevenpatz’s picture

Is there a patch we can test?

magico’s picture

@ebw: could you check if this happens with 4.7.5 (or better, with 5.x) and give us some screenshots of the problem, so we can verify what can be done?

Thanks

ebw’s picture

still present in 4.7.6 (core upgrade).

ebw’s picture

and still broken in 5.1

magico’s picture

Version: 4.7.0 » 5.1

dpearcefl’s picture

Status: Active » Closed (won't fix)

Considering the age of this issue with no comments and that Drupal 5 is no longer supported, I'm closing this ticket.