In various places, titles are double-escaped, such as with check_plain(). For example, if the title is "Liam's Test", it gets escaped to Liam's Test (which is what should happen before display) then to Liam's Test.

Steps to recreate:
- Create a new web page with a title contained a single quote.
- View the source of the node: the span element inside the node markup with property="schema:name" will have double-escaped @content value.

Use Google's RDFa Parser to test.

There are reports of this problem in various places and I suspect they are all caused by the same core bug:
#1189550: RSS Style plugin escapes feed title twice
#1302760: ampersand from Title gets escaped (or double escaped?)
#1363358: Shortcut set titles are double-escaped with check_plain()
#1468014: Double Escapes the Title? Escape characters showing up in the node title.
#1955072: Node/feed item titles are double escaped

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Liam Morland’s picture

Title: Titles are often doubled-escapted » Titles are often double-escaped
David_Rothstein’s picture

I'm not sure there's a central place that would be doing the double escaping ... seems more likely that maybe there are different causes for each? A lot of times people use @-style placeholders in t() somewhat blindly (which is usually exactly what they should be doing) but if the thing they're inserting has already been escaped previously, then doing that blindly is where the double-escaping would come from.

Liam Morland’s picture

It looks to me like Drupal is not consistent about when a string is plaintext and when it is HTML. At any given moment in the page-build process it needs to be clear which it is.

It certainly could be different causes for each. These issues are related in that it is the same essential problem in each case.

I'm pretty sure this should not be solved by un-escaping the strings to un-do excessive escaping.

In any case, there is a core bug here since the issue is observed in the dc:title meta element.

David_Rothstein’s picture

Not un-escaping, just using t('!string') rather than t('@string') to avoid double-escaping if you're in a situation where you know the string is already pre-sanitized.

In any case, there is a core bug here since the issue is observed in the dc:title meta element.

That part should probably be moved to the RDF core issue queue component, but in any case my guess is it's because rdf_preprocess_node() is doing this:

    $element = array(
      '#tag' => 'meta',
      '#attributes' => array(
        'content' => $variables['title'],
        'about' => $variables['node_url'],
      ),
    );

But $variable['title'] is presanitized by template_preprocess_node(), and Drupal sanitizes HTML attributes too when they're printed, so I think that's where the double-escaping comes from in this case. Probably the RDF module should use $variables['node']->title here instead?

Liam Morland’s picture

Yes, the difference between ! and @ is subtle and it doesn't help that people may think it is OK to sanitize too much.

Your solution sounds reasonable to me, but I am really not familiar enough with that part of the code to know.

Liam Morland’s picture

Component: node system » rdf.module
Status: Active » Needs review
FileSize
446 bytes

@#4: That fixes it for the meta element. Patch attached.

Status: Needs review » Needs work

The last submitted patch, core_2011918_title_double_escape.patch, failed testing.

David_Rothstein’s picture

Status: Needs work » Needs review
David_Rothstein’s picture

Title: Titles are often double-escaped » Titles are often double-escaped (including in the content attribute of the dc:title meta element for nodes)

That looks right to me.

Does this exist in Drupal 8 too? I think it probably would...

Liam Morland’s picture

Version: 7.x-dev » 8.x-dev
FileSize
466 bytes

D8 version.

scor’s picture

Good catch. Verified that this patch works on Google RDFa parser, before: Liam's Test, and after: Liam's Test

I also checked the user and taxonomy term code in rdf.module, and we use the raw value so we're fine ($account->name and $term->label()).

I'm uploading a patch that uses the label() method on the node, and which updates the node RDF test to include a single quote in the node title for the regular test. I'm curious to see if others would prefer to have it as a separate chunk of code at the end of the regular node test (create a new node for the purpose of testing the single quote).

chrisshattuck’s picture

I tested the patch, and it works on the property="dc:title" example.

I feel like including the single quotation mark in the same test is efficient, but am curious if combining two tests is in general not as good of a practice for regression testing? (I'm just now learning more about testing, so this is a genuine question.)

chrisshattuck’s picture

I may be missing something, but does the test modification need to add an assertion somewhere that tests if the double-escaping is happening?

scor’s picture

@chrisshattuck not as it is... because the test would fail without the fix, it did fail on my machine. you are welcome to upload a patch with just the test hunk (without the rdf.module change) to illustrate that the test is effectively working (and failing).

Pycouz’s picture

Hi all!

There is the same problem in titles of RSS feed when using fields in Views. The ' is modified to '.

Any help?

Liam Morland’s picture

Pycouz’s picture

Sorry for the trouble. Thank you.

jesse.d’s picture

This looks good to me.

NodeAttributesTest passes locally.

In a manual test, dc:title content before: Jesse's Test and after:Jesse's Test.
And in Google RDFa parser: before: Jesse's Test and after: Jesse's Test

jesse.d’s picture

jesse.d’s picture

Status: Needs review » Needs work

Patch failed to apply locally. Rerolling.

jesse.d’s picture

Status: Needs work » Needs review
FileSize
1.33 KB

The updated patch is attached.

scor’s picture

Status: Needs review » Needs work

needs reroll after https://drupal.org/node/1323830#comment-7713521 was committed.

jesse.d’s picture

Status: Needs work » Needs review
FileSize
1.55 KB

Rerolled after https://drupal.org/node/1323830#comment-7713521 was committed.

It looked good to me locally testing in google's rdfa parser.

jesse.d’s picture

Issue summary: View changes

Updated issue summary.

kay_v’s picture

Status: Needs review » Reviewed & tested by the community

The patch fixes the problem. Checks out both in the browser and in Google's RDFa Parser.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

I've run the changed test without the patch and it caught the error. With the fix the tests pass. Nice work!

Committed 5778af9 and pushed to 8.x. Thanks!

Liam Morland’s picture

Version: 8.x-dev » 7.x-dev
Status: Fixed » Needs review
FileSize
446 bytes

D7 version (same as in #6). It does not have a test because the tests in the RDF module are completely different in D7 and I don't know how to port the test.

jesse.d’s picture

D7 version including an updated test.

scor’s picture

Looks good to me. We should probably check that the new test fails when the fix isn't applied.

kay_v’s picture

Status: Needs review » Reviewed & tested by the community

confirmed on fresh Drupal 7 site - all good to go, so marking RTBC.

Tor Arne Thune’s picture

I've been using the patch from #26 on production for some weeks now.
+1 to the RTBC.

Tor Arne Thune’s picture

Issue summary: View changes

update with regard to previous commit and schema.org mappings

David_Rothstein’s picture

Issue summary: View changes
Status: Reviewed & tested by the community » Fixed

I verified locally that the test fails without the rest of the patch, so good to go.

I fixed a code style issue on commit (missing spaces before and after the string-concatenation operator) and also added the code comment from the Drupal 8 patch (seemed like an equally useful comment for Drupal 7)...

Committed to 7.x - thanks! http://drupalcode.org/project/drupal.git/commit/6331211

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.