By jfdill
One of my RSS feeds listed the web page below, which has the character #149 in the TITLE of the page, and that broke the feed. By #149 I mean "ampersand #149 semicolon"; I'm spelling it out rather than typing the actual HTML in case that would break something else.
http://ii.best.vwh.net/internet/messaging/imap/isps/
Is #149 a legitimate character to put into a web page title? I contacted the author of the document about it and said I would also check with Drupal to see whether the burden is on Drupal to fix a "bug."
Comments
import.module appears to trust the originating site to provide a valid title and does no error checking on it. That is usually fine, but in your case it is obviously causing trouble.
In my 4.3.2 source, the section that I would look at is in function import_refresh.
You might want to add some scrubbing code to this. I suggest looking at example 5 here:
http://us4.php.net/preg_replace
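As a rough idea of what such scrubbing could look like, here is a minimal sketch using preg_replace. It assumes the trouble comes from decimal numeric character references in the 128-159 range, such as the #149 one described above: in Unicode and ISO-8859-1 those numbers are control characters, even though Windows-1252 uses 149 for a bullet. The function name scrub_feed_title is made up for illustration and is not part of import.module; where to call it inside import_refresh depends on your copy of the source.

```php
<?php
// Hypothetical helper, NOT part of import.module: scrub a feed item
// title before it is stored or displayed.
function scrub_feed_title($title) {
  // Remove decimal numeric character references numbered 128 to 159
  // (for example "ampersand #149 semicolon"). Assumption: these are
  // the references that break the feed.
  $title = preg_replace('/&#(12[89]|1[345][0-9]);/', '', $title);

  // Collapse any run of whitespace left behind into a single space.
  $title = preg_replace('/\s+/', ' ', $title);

  return trim($title);
}
```

Other references, such as the ampersand entity or #169 for the copyright sign, pass through unchanged. This sketch drops the offending reference outright; replacing it with a plain character such as "*" would be just as reasonable if you want to keep a visible separator in the title.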