Output of description and dc.description is not properly sanitized [#971428]

Hi,

When meta tag content is generated automatically from the node content, the output is not properly sanitized: markup from the node's text fields is inserted verbatim into the <meta name="description"... and <meta name="dc.description"... tags.

Configuration from ./admin/content/nodewords:

"Generate meta tag content when the meta tag content is empty"
Generation source: "Generate meta tags content from the node teaser"

Steps to reproduce: install a third-party input filter from contrib, e.g. mediawiki_filter, and configure nodewords as described above. The generated HTML then contains output like this:

<meta name="description" content="&#039;&#039;&#039;A famous name&#039;&#039;&#039; from Czech Republic is &#039;&#039;&#039;doing an excellent job&#039;&#039;&#039; in &#039;&#039;&#039;healthcare&#039;&#039;&#039; for the &#039;&#039;&#039;[[United Nations]]&#039;&#039;&#039;." />

&#039;&#039;&#039; is the MediaWiki bold markup (''') encoded as HTML character references. MediaWiki-style links ([[...]]) are not interpreted at all. The same happens with the <meta name="dc.description"... tag, if it is used.

I don't know if this is intended behaviour, but as it is, the output is useless as a meta tag. I'd think that the content from the node body would have to run through the input filter first to become proper HTML, and then all HTML tags would have to be stripped before the metatag token is inserted into the header of the resulting HTML page.
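The order proposed above (run the input filter, strip all tags, then escape once for the attribute) can be sketched in Python. This is not nodewords code: `toy_mediawiki_filter()` is a hypothetical stand-in for mediawiki_filter that only understands bold and internal links, and the 160-character limit is an assumed value.

```python
import html
import re
from html.parser import HTMLParser


def toy_mediawiki_filter(text: str) -> str:
    # Hypothetical stand-in for a real input filter: handles only
    # '''bold''' and [[Internal link]] syntax.
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"\[\[([^\]|]+)\]\]", r'<a href="/wiki/\1">\1</a>', text)
    return text


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities in text
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags(markup: str) -> str:
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


def meta_description(teaser: str, max_length: int = 160) -> str:
    filtered = toy_mediawiki_filter(teaser)  # 1. input filter -> HTML
    plain = strip_tags(filtered)             # 2. drop every tag
    plain = " ".join(plain.split())          # 3. collapse whitespace
    plain = plain[:max_length]               # 4. assumed length limit
    return html.escape(plain, quote=True)    # 5. escape once for content=""
```

Truncating before escaping avoids cutting an entity such as `&amp;` in half.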

Thanks & greetings, -asb

Attached file: nodewords-n971428-8.patch (587 bytes), uploaded by damienmckenna in comment #8

Comments

Comment #1

damienmckenna (TN, USA) commented 21 November 2010 at 18:29

Issue tags: +v6.x-1.12 blocker

Tag.

Comment #3

pomliane commented 2 February 2011 at 12:56

Hi,
Same issue here, without any contrib input filter enabled.
The following meta tags are not displayed correctly: keywords, copyright, description, abstract, dc.contributor, dc.creator, dc.description, dc.publisher, dc.title.

Comment #4

antiorario commented 15 April 2011 at 08:03

Seems like Nodewords should take input filters into account. I use Markdown, which obviously ends up appearing in the meta tags.

Comment #5

quicksketch (he/him) commented 2 May 2011 at 20:09

Version: 6.x-1.11 » 6.x-1.x-dev
Issue tags: -v6.x-1.12 blocker

As this issue exists in both 1.11 and the latest 1.12-RC, it doesn't look likely to get fixed in the 1.12 release. At the same time though, this is an obvious problem. I'm kicking this issue to the next version, which we can address once we get the long-overdue 1.12 release out.

Comment #6

damienmckenna (TN, USA) commented 10 September 2011 at 04:15

Status: Active » Postponed (maintainer needs more info)

This should go in nodewords_metatag_from_node_content() in nodewords.module. The question, however, is whether we should use node_build_content(), node_view(), or something custom that applies the input filter manually. I'm personally leaning towards node_build_content(); any other suggestions?

Comment #7

damienmckenna (TN, USA) commented 28 November 2012 at 15:22

Status: Postponed (maintainer needs more info) » Active

Comment #8

damienmckenna (TN, USA) commented 5 December 2012 at 14:35

Status: Active » Needs review

File: nodewords-n971428-8.patch (587 bytes, new)

This adds a simple check_markup() call on the output before any other parsing is done.
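Why the order matters can be shown with a toy Python example, assuming the later parsing in nodewords includes stripping tags. `toy_filter()` and the regex-based `strip_tags()` are hypothetical and only good enough for this example input; they are not the check_markup() or nodewords implementations.

```python
import re


def toy_filter(text: str) -> str:
    # Hypothetical input filter: converts '''bold''' into <b>bold</b>.
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)


def strip_tags(text: str) -> str:
    # Deliberately naive tag stripper; adequate only for this toy input.
    return re.sub(r"<[^>]+>", "", text)


def filter_then_strip(text: str) -> str:
    # Filter first, so the tags it produces are removed afterwards.
    return strip_tags(toy_filter(text))


def strip_then_filter(text: str) -> str:
    # Filter last: the tags it produces leak into the meta tag.
    return toy_filter(strip_tags(text))


source = "'''healthcare''' for the UN"
print(filter_then_strip(source))  # healthcare for the UN
print(strip_then_filter(source))  # <b>healthcare</b> for the UN
```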

Comment #9

damienmckenna (TN, USA) commented 5 December 2012 at 14:38

Status: Needs review » Fixed

Committed.

Comment #10

19 December 2012 at 14:40

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.
