I had the web page below listed in one of my RSS feeds. The page's TITLE contained the character #149, and that broke the feed. By #149 I mean "ampersand #149 semicolon". I'm spelling it out rather than typing the actual HTML in case that breaks something else.

http://ii.best.vwh.net/internet/messaging/imap/isps/

Is #149 a legitimate character to put in a web page title? I contacted the author of the page about it and said I would also check with Drupal to see whether the burden is on Drupal to fix a "bug."

Comments

tdailey:

import.module appears to trust the originating site to provide a valid title and does no error checking on it. That is usually fine, but in your case it is obviously causing trouble.

In my 4.3.2 source, the section I would look at is this one, in function import_refresh():

      if ($item["TITLE"]) {
        $title = $item["TITLE"];
        // ...
      }

You might want to add some scrubbing code here. I suggest looking at example 5 on this page:

http://us4.php.net/preg_replace
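As a rough illustration of what such scrubbing could look like, here is a minimal sketch. It is not example 5 from that page and not part of import.module. The helper name scrub_title() is hypothetical. Rather than just stripping the reference, it assumes the feed author meant Windows-1252 punctuation: references in the 128-159 range (C1 control characters in Unicode) are rewritten to the code points current browsers map them to, so &#149; becomes the bullet &#8226;. Any leftovers with no Windows-1252 meaning are dropped. It relies on preg_replace() accepting arrays of patterns and replacements, which it applies in order. It only handles character references, not raw 0x80-0x9F bytes in the title.

```php
<?php
// Hypothetical helper, not part of import.module. Rewrites numeric character
// references in the 128-159 range, which are C1 control characters in Unicode
// but are often used on Windows to mean Windows-1252 punctuation (&#149; = bullet).
function scrub_title($title) {
  // Windows-1252 byte value => intended Unicode code point.
  $cp1252 = array(
    128 => 0x20AC, 130 => 0x201A, 131 => 0x0192, 132 => 0x201E,
    133 => 0x2026, 134 => 0x2020, 135 => 0x2021, 136 => 0x02C6,
    137 => 0x2030, 138 => 0x0160, 139 => 0x2039, 140 => 0x0152,
    142 => 0x017D, 145 => 0x2018, 146 => 0x2019, 147 => 0x201C,
    148 => 0x201D, 149 => 0x2022, 150 => 0x2013, 151 => 0x2014,
    152 => 0x02DC, 153 => 0x2122, 154 => 0x0161, 155 => 0x203A,
    156 => 0x0153, 158 => 0x017E, 159 => 0x0178,
  );

  $search = array();
  $replace = array();
  foreach ($cp1252 as $byte => $codepoint) {
    // Match decimal (&#149;) and hex (&#x95;) forms, optional leading zeros.
    $search[] = '/&#(?:0*' . $byte . '|x0*' . dechex($byte) . ');/i';
    $replace[] = '&#' . $codepoint . ';';
  }
  // Whatever is still in 128-159 (129, 141, 143, 144, 157) has no
  // Windows-1252 meaning, so remove it.
  $search[] = '/&#(?:0*(?:12[89]|1[3-5][0-9])|x0*[89][0-9a-f]);/i';
  $replace[] = '';

  return preg_replace($search, $replace, $title);
}

echo scrub_title('IMAP &#149; ISPs'), "\n"; // prints "IMAP &#8226; ISPs"
```

In import_refresh() the call would go where the title is assigned, for example `$title = scrub_title($item["TITLE"]);`. Mapping instead of deleting keeps the bullet the page author intended. If you would rather lose it, the final pattern alone, widened to the whole range, would do the job.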