HTML entities double-escaped in titles

gnassar - March 2, 2007 - 18:05
Project:Page Title
Version:5.x-2.x-dev
Component:Code
Category:bug report
Priority:normal
Assigned:nicholasThompson
Status:needs review
Description

Current code runs all titles through check_plain (by using @ in the t() function). These are already check_plained by Drupal in drupal_get_title(), so any quotes or such get entity-coded twice, resulting in something like " so you get an entity code in the title instead of the quote. Patch above replaces the @'s in the title string with !'s.

AttachmentSize
page_title.module-HEAD-r0.patch4.15 KB

#1

NikLP - March 3, 2007 - 18:40

Yeeeeaaaah... you didn't see this then?

It works, and it only changes one line of code...

#2

gnassar - March 5, 2007 - 21:27

No, didn't notice it. Thank you. (I think I was confused by the "rendering" bit; or perhaps if you mentioned HTML entities in the title?)

It does only change one line, while mine changes 8 or so, if I recall. But I think what may be more important is making sure it doesn't break anything else. Yours gets the set_value function to return the right value, but you'd have to fix the return value for all externally facing functions as well, especially the nodeapi hook. But more importantly, I think titles should probably be handled the same in this module as they are in native Drupal without the module. The t(@text) handling of this would have worked fine in 4.6 and 4.7, but in 5 for these to internally be handled the same, that has to be changed to t(!text). That is what my code does, and then I don't have to "kludge" any of the return values at the end.

#3

NikLP - March 6, 2007 - 02:47

Are you saying that this is a porting error then...?

I didn't (at the time) spot anywhere else in the module that was exposing the title variable, so I thought it was fairly safe to ""kludge"" it :)
If your patch is more appropriate to D5 coding standards, then I'll tag mine as duff and jump on board this! :)

#4

gnassar - March 6, 2007 - 23:06

Well, the nodeapi hook uses the title var as well, but more to the point, the title var is the same as is used in Drupal core, and I don't know whether there are any places where that is used in core that this module doesn't override, and I would not want to take that risk.

And yes, this is a very common porting bug. I've run into similar bugs with other modules as well. 4.7 didn't automatically check_plain results of t() functions, while 5 does. I do feel this is the right way to address this bug; then again, I wrote the patch, so I may be biased :-) which is why I left it up for review. I'd love to hear anyone else's opinions on it.

#5

NikLP - March 8, 2007 - 18:53

FYI, I have unescaped page titles in other areas as well, that are more likely to be associated with problems in core than anything else (I think) - I renamed a blog "News & Events" and the ampersand is coming thru as you-know-what.

#6

gnassar - April 11, 2007 - 17:16
Status:needs review» reviewed & tested by the community

#7

freeman - April 24, 2007 - 16:57

Thanks ! Works for me, 's and &'s comes out ok.

I tried to solve this and the following problem earlier (and was partially successful - failed on the 's). Even with the above patch, if you use Views module, you'll see the <em> tags in the html title in edit mode. I don't know if this appears elsewhere in other contexts, but the following (adapted from my earlier solution) resolved it :

Around page_title.module:158, apply the strip_tags for the $drupal_title variable. So

  // this is the normal title. For node pages it is $node->title
  $drupal_title = drupal_get_title();

  // this is the custom page title. For node pages it is $node->page_title.
  $page_title = (page_title_set_title()) ? strip_tags(page_title_set_title()) : $drupal_title;

becomes:

  // this is the normal title. For node pages it is $node->title
  $drupal_title = drupal_get_title();

  // this is the custom page title. For node pages it is $node->page_title.
  $page_title = (page_title_set_title()) ? page_title_set_title() : $drupal_title;
  $page_title = strip_tags($page_title);

This solution is based on looking at /themes/engine/phptemplate.engine. There's a strip_tags around drupal_get_title() in constructing $head_title, whereas none was in the current module.

Does anyone think this is worth commiting ?

#8

nicholasThompson - April 26, 2007 - 19:03
Assigned to:gnassar» nicholasThompson
Status:reviewed & tested by the community» fixed

gnassar's patch and freeman's suggestions committed...

http://drupal.org/cvs?commit=65459

Thanks guys

#9

killes@www.drop.org - May 11, 2007 - 12:24
Status:fixed» closed

#10

funana - November 9, 2009 - 13:33
Version:5.x-1.x-dev» 5.x-2.2
Status:closed» active

Exactly the same behaviour in 5.x-2.2 version of the module. Apostrophes are put out as &#039;.
Used an Apostrophe for user's blogs: [page-title] | [author-name-raw]'s Blog.

#11

akahn - November 10, 2009 - 20:32

Yep, I'm still experiencing this on 5.x-2.2.

Removing the call to check_plain() in the return statement of page_title_page_get_title() solves the issue. But I'm not sure if this user input is sanitized elsewhere, and thus I'm not sure if this is safe to do.

#12

nicholasThompson - November 23, 2009 - 18:06
Version:5.x-2.2» 5.x-2.x-dev
Status:active» needs review

I've submitted some fixes for this to the dev branches - can you guys please review and let me know if it fixes your problems? I cannot replicate any of the old double-encode situations on my dev box...

Common problems (for me) were:
(1) Node Titles with ampersands and/or greater than/less than symbols or quotes
(2) Nodes with Page Titles that contain characters mentioned in (1)
(3) Menu Titles containing encodable characters such as ampersands or quotes, such as admin/logs/access-denied...

 
 

Drupal is a registered trademark of Dries Buytaert.