HTML tags (e.g. from Google Alerts) are shown in the results of a feed imported by aggregator.

This is shown as
&#39;Het bijtellen van extra tijd is aan de <b>scheidsrechter</b>&#39

Drupal 7 has the same problem as described in http://drupal.org/node/311511. I have created a separate issue because that one was resolved for the 6.x version.

Any help would be appreciated.

Comment    File         Size        Author
#6         feeds.png    15.91 KB    edvanleeuwen

Comments

edvanleeuwen commented:

Status: Active » Needs review

It seems that this can be resolved by combining the 6.x version solutions.

In aggregator.parser.inc, function aggregator_parse_feed, replace

    // Resolve the item's title. If no title is found, we use up to 40
    // characters of the description ending at a word boundary, but not
    // splitting potential entities.
    if (!empty($item['title'])) {
      $item['title'] = $item['title'];
    }

by

    // Resolve the item's title. If no title is found, we use up to 40
    // characters of the description ending at a word boundary, but not
    // splitting potential entities.
    if (!empty($item['title'])) {
      $item['title'] = html_entity_decode(strip_tags($item['title']), ENT_QUOTES, "utf-8");
    }

I am not sure whether this causes any side effects, but for now I am happy with it.
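
To see what the replacement line does to a title like the one in the original report, here is a minimal standalone sketch. It assumes plain PHP outside Drupal; clean_feed_title() is a hypothetical helper name, not a Drupal function, and the sample string is the one quoted above with the closing entity written out in full (&#39;).

```php
<?php
// Hypothetical helper (not part of Drupal): the same expression as the
// replacement line above, wrapped in a function so it can run standalone.
function clean_feed_title($title) {
  // Remove HTML tags first, then turn entities such as &#39; into characters.
  return html_entity_decode(strip_tags($title), ENT_QUOTES, 'UTF-8');
}

// Sample title from the report; the trailing semicolon is an assumption.
$raw = '&#39;Het bijtellen van extra tijd is aan de <b>scheidsrechter</b>&#39;';
echo clean_feed_title($raw), "\n";
// Prints: 'Het bijtellen van extra tijd is aan de scheidsrechter'
```

The result is plain text, so it can still be escaped again at display time without the tags or entities becoming visible.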

FSMDrupal commented:

This worked in Drupal 7 for me. It is similar to another post I saw, but this one includes the strip_tags() call, which seemed to be the key to making it work.

Thanks

edvanleeuwen commented:

Version: 7.15 » 7.17
Status: Needs review » Reviewed & tested by the community

Can anyone help me with getting this into the mainstream files?

David_Rothstein commented:

Status: Reviewed & tested by the community » Active

There is no patch here.

David_Rothstein commented:

Status: Active » Postponed (maintainer needs more info)

In any case, it's not 100% clear if this is about the titles or content of the feed items, but:

  • For feed content, the tags listed in the "Allowed HTML tags" section at admin/config/services/aggregator/settings should already be let through.
  • For feed titles, Drupal does escape the HTML, but note that it does the same thing for e.g. node titles too... Putting HTML in titles like that doesn't work in general.

Either way, I can't reproduce any issue with quotes getting escaped (as in the original example above, &#39;Het bijtellen van extra tijd is aan de <b>scheidsrechter</b>&#39), just HTML tags.

Finally, if there were a place to change this, it would probably be in the theme preprocess layer? See this code from aggregator.pages.inc:

function template_preprocess_aggregator_item(&$variables) {
....
  $variables['feed_title'] = check_plain($item->title);
  $variables['content'] = aggregator_filter_xss($item->description);

This shows how the title and content are processed differently before they are displayed.
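
As a rough sketch of what a change in the preprocess layer could look like, a theme could override the title variable instead of patching core. This assumes a theme named mytheme (a placeholder) and that $variables['item'] holds the feed item object; the check_plain() stub is only there so the snippet runs outside Drupal, and it mirrors Drupal 7's function, which wraps htmlspecialchars().

```php
<?php
// Stub for running outside Drupal only; inside Drupal 7 the real
// check_plain() is used, which wraps htmlspecialchars() the same way.
if (!function_exists('check_plain')) {
  function check_plain($text) {
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
  }
}

// Hypothetical theme-level override ("mytheme" is a placeholder name).
function mytheme_preprocess_aggregator_item(&$variables) {
  // Assumption: the feed item object is available under the 'item' key.
  $item = $variables['item'];
  // Strip tags and decode entities, then escape once for safe output.
  $plain = html_entity_decode(strip_tags($item->title), ENT_QUOTES, 'UTF-8');
  $variables['feed_title'] = check_plain($plain);
}

// Standalone demonstration with a made-up item.
$item = new stdClass();
$item->title = 'Extra tijd is aan de <b>scheidsrechter</b>';
$variables = array('item' => $item);
mytheme_preprocess_aggregator_item($variables);
echo $variables['feed_title'], "\n";
// Prints: Extra tijd is aan de scheidsrechter
```

Other places that print item titles, such as blocks, may go through different theme functions and would need their own override.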

edvanleeuwen commented:

Status: new, file size: 15.91 KB

Thanks for your reply, David.

> In any case, it's not 100% clear if this is about the titles or content of the feed items

It is for the titles. I have attached a screenshot of two blocks, which show the results of a feed from the KNVB.nl website and one from Google Alerts:
[Image: Feeds example]

> For feed titles, Drupal does escape the HTML, but note that it does the same thing for e.g. node titles too... Putting HTML in titles like that doesn't work in general.

This is what I get when adding an Atom or RSS feed from Google Alerts. It is not something I have control over myself.

> Finally, if there were a place to change this, it would probably be in the theme preprocess layer?

I cannot say; unfortunately, I am no expert on this.

If you feel that this is something which is not going to be incorporated into the mainstream, it is fine with me. I am quite happy with the solution described above, although I have to apply it every time core is updated.

edvanleeuwen commented:

Status: Postponed (maintainer needs more info) » Closed (duplicate)

Declared duplicate: https://drupal.org/node/61456