When you use a field that can contain special HTML characters (such as quotes or apostrophes) for node title generation, these characters are converted twice.

First by the token module, and then by the node module.

So if we use "&", it is first converted to "&amp;" and then to "&amp;amp;".

To fix this bug, we need to change this line:

$output = token_replace($output, 'node', $node);
to this one:
$output = htmlspecialchars_decode(token_replace($output, 'node', $node));
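A minimal sketch of the double conversion described above. It assumes htmlspecialchars() with ENT_QUOTES is a fair stand-in for both escaping passes (the token module's and the node module's); escape_html() is a hypothetical helper, not a Drupal function. ENT_QUOTES is passed to the decode explicitly, as the later patch in #3 does.

```php
<?php
// Hypothetical helper standing in for one check_plain()-style escaping pass.
function escape_html(string $text): string {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$value = 'Tom & Jerry';

// Current behaviour: escaped once by token replacement, then again on output.
$token  = escape_html($value);   // "Tom &amp; Jerry"
$output = escape_html($token);   // "Tom &amp;amp; Jerry", displayed as "Tom &amp; Jerry"

// Proposed fix: decode once right after token replacement,
// so only the output escaping pass remains.
$fixed = escape_html(htmlspecialchars_decode($token, ENT_QUOTES));  // "Tom &amp; Jerry"

echo $output, "\n", $fixed, "\n";
```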

Comments

This didn't do it for me, alas. I think the problem has something to do with token vs. token-raw.


I adjusted the code from the description slightly to use

$output = htmlspecialchars_decode(token_replace($output, 'node', $node), ENT_QUOTES);

This seems to work for me with apostrophes, double and single quotes. I'm not sure if it's the right approach, though. It looks like this method leaves the wrong code in place and then adds extra code to reverse its effect, instead of finding out what's wrong and removing or fixing it.

Patch attached.

#3 works great!

#3 didn't work for me until I looked at the core check_plain() function, which is how the non-raw tokens are created, including the location token I was having a problem with. In check_plain(), the UTF-8 option is set. So I kept line 135 the same and added a line right underneath it that looks like this:
$output = html_entity_decode($output, ENT_QUOTES, 'UTF-8');

The patch in #3 is using htmlspecialchars_decode, which doesn't appear to support the necessary charset property like html_entity_decode does.
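One visible difference between the two functions: htmlspecialchars_decode() only converts the special HTML entities (&amp;, &quot;, &#039;, &lt;, &gt;) back, while html_entity_decode() with a charset also turns named entities such as &eacute; into UTF-8 characters. A small comparison on a made-up string:

```php
<?php
$encoded = 'Caf&eacute; &amp; Bar&#039;s';

// Only the special characters are decoded; &eacute; is left as-is.
$special = htmlspecialchars_decode($encoded, ENT_QUOTES);      // "Caf&eacute; & Bar's"

// All entities are decoded into UTF-8 characters.
$entity  = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');  // "Café & Bar's"

echo $special, "\n", $entity, "\n";
```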

#5 appears to be working fine for me, tested on D 7.x-1.0

#5 works for me too on 6.x-1.2


Here's #5 as a patch. Works for me on 6.x-1.2; needs to be ported & tested to 6.x-1.x-dev and perhaps 7.x.

Status:Patch (to be ported)» Needs review

It doesn't appear this patch has been committed to any branch, so "patch (to be ported)" seems to be a wrong status.

Yay!! The patch from #8 applies cleanly when running git apply from within ./sites/all/modules/auto_nodetitle, and even better: it actually fixes the issue (at least for me ;) against 6.x-1.2. Side effects encountered so far: none.

Thank you, mvc & lonehorseend!

Status:Needs review» Patch (to be ported)

Status:Patch (to be ported)» Reviewed & tested by the community

I have confirmed that my patch in #8 applies cleanly against 6.x-1.2 and 6.x-1.x-dev, and I see several posts from people saying it works for them, so I'm going to set the status to RTBC.

asb, "patch (to be ported)" is a status reserved for patches that have already been accepted into one branch. After this is accepted into 6.x we can use that status to flag this for porting to 7.x, but not yet.

The below code is the patch from #8 re-rolled for 7.x-1.x-dev.
I'm not attaching the patch here or changing issue status because the D6 version is still RTBC.

diff --git a/auto_nodetitle.module b/auto_nodetitle.module
index 30e6e4f..d3bbc39 100644
--- a/auto_nodetitle.module
+++ b/auto_nodetitle.module
@@ -129,6 +129,7 @@ function auto_nodetitle_operations_update($nodes) {
function _auto_nodetitle_patternprocessor($pattern, $node) {
   // Replace tokens.
   $output = token_replace($pattern, array('node' => $node), array('sanitize' => FALSE, 'clear' => TRUE));
+  $output = html_entity_decode($output, ENT_QUOTES, 'UTF-8');
   // Evalute PHP.
   if (variable_get('ant_php_' . $node->type, 0)) {
     $output = auto_nodetitle_eval($output, $node);

Version:6.x-1.x-dev» 7.x-1.x-dev

This patch still hasn't fixed our problem. (I'm working with JeffSchuler.) We applied the patch, but still, any apostrophe in an included field caused the nodetitle not to generate. We eventually rewrote the code to avoid using tokens altogether.

The problem seems to be in the pattern processor, but as I'm not a code guy, I won't be able to fix it. But I have definitely narrowed the problem down to that particular function.

Added the line from #13 on 7.x-1.0 and it fixed the problem.

Version:7.x-1.x-dev» 6.x-1.x-dev

This issue and patch #8 are still for 6.x-1.x-dev; please don't change the issue version until it's been committed to that branch.

#5 worked like a charm for me! Thanks for the fix!

I just ran into this issue, I am creating Auto Nodetitles from EXIF data from images exported from Adobe Lightroom. The fix for Drupal 7 from #13 worked for me. Thank you many times @jeffschuler!

Version:6.x-1.x-dev» 7.x-1.x-dev

Okay, we have reviewed and tested patches for D6 and D7, and this seems ready to commit to both branches, so I'm going to go ahead and change the version number here to the latest dev, just to get the maintainers' attention. :)

Priority:Normal» Major

Since this seems not to be committed anyway, and the maintainers don't seem to even read the module's issue queue: I ran into a situation where the patch from #8 does not work for me. Example string for testing: Don't Come Knocking.

My node title is built through PHP with this snippet:

<?php
  $titel = '[field_titel_reference-title-raw]';
  $untertitel = '[field_untertitel-raw]';
  $medium = '[field_medium-term-raw]';
  $label = '[field_label_taxonomy-term-raw]';
  $release = '[field_legacy_asin-release-Y]';
  if (empty($release)) {
    return $titel . ' ' . $untertitel . ' - ' . $medium . ' - ' . $label;
  }
  else {
    return $titel . ' ' . $untertitel . ' - ' . $medium . ' - ' . $label . ' (' . $release . ')';
  }
?>

If $titel has a string like the one mentioned above as its value, the resulting node title turns out empty. In "normal" cases, auto_nodetitle works with this PHP snippet.

Other strings where this fails as well:

  • Let's Make Money
  • I'll never die alone
  • It's Always a Pleasure

So in my case it's always this darn ' (apostrophe, probably 7-bit ASCII code 0x27 (39)) that results in empty node titles. Empty node titles must not happen in Drupal, thus adjusting priority to "major". This really needs to be fixed!
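One plausible explanation, assuming (as in the _auto_nodetitle_patternprocessor() code posted in this issue) that tokens are replaced before the PHP is evaluated: a raw value is pasted into the PHP source text itself, so an apostrophe closes the single-quoted literal early and the code no longer parses. In this sketch, strtr() is a hypothetical stand-in for token_replace(), and catching ParseError (PHP 7+) mimics an evaluation that yields nothing; how auto_nodetitle_eval() itself reports the failure is not modelled here.

```php
<?php
// Hypothetical stand-in for token_replace(): plain textual substitution.
function substitute_tokens(string $code, array $values): string {
  return strtr($code, $values);
}

// Evaluates the generated code; returns '' when it does not parse.
function eval_title(string $code): string {
  try {
    return (string) eval($code);
  }
  catch (ParseError $e) {
    return '';
  }
}

$snippet = "\$titel = '[field_titel_reference-title-raw]'; return \$titel;";

$ok  = substitute_tokens($snippet, ['[field_titel_reference-title-raw]' => 'Paris, Texas']);
$bad = substitute_tokens($snippet, ['[field_titel_reference-title-raw]' => "Don't Come Knocking"]);

echo eval_title($ok), "\n";  // Paris, Texas
var_dump(eval_title($bad));  // string(0) ""
```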

Status:Reviewed & tested by the community» Needs work

Sounds like this needs work.

...or the PHP snippet circumvents the patch. I don't know enough about PHP to guess which encoding/decoding happens when.

Thanks @lonehorseend, #5 worked perfectly!

JeffSchuler's patch in #13 works great.

Let me add one small change to this that also solves asb's issue with PHP evaluation in #20.

If you move html_entity_decode() after the PHP evaluation, UTF-8 characters come out correctly in the title. I have tested this successfully with both special characters and Latin characters, using both PHP and tokens, in 7.x-1.0.

<?php
function _auto_nodetitle_patternprocessor($pattern, $node) {
  // Replace tokens.
  $output = token_replace($pattern, array('node' => $node), array('sanitize' => FALSE, 'clear' => TRUE));
  // Evaluate PHP.
  if (variable_get('ant_php_' . $node->type, 0)) {
    $output = auto_nodetitle_eval($output, $node);
  }
  // Decode HTML entities only after the PHP evaluation.
  $output = html_entity_decode($output, ENT_QUOTES, 'UTF-8');
  // Strip tags.
  $output = preg_replace('/[\t\n\r\0\x0B]/', '', strip_tags($output));
  return $output;
}
?>
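Why the ordering can matter, in a sketch that assumes sanitized (non-raw) token values escaped check_plain()-style: the apostrophe reaches the PHP code as &#039;, so it cannot end the string literal, and decoding after evaluation restores it. strtr() stands in for token_replace(), and sanitize_token() and process_title() are hypothetical names.

```php
<?php
// check_plain()-style escaping, standing in for a sanitized token value.
function sanitize_token(string $text): string {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Replace tokens, evaluate the PHP, and only then decode entities.
function process_title(string $pattern, array $tokens): string {
  // array_map() with a single array keeps the token keys.
  $code   = strtr($pattern, array_map('sanitize_token', $tokens));
  $output = (string) eval($code);
  return html_entity_decode($output, ENT_QUOTES, 'UTF-8');
}

$pattern = "\$titel = '[title]'; return \$titel . ' - DVD';";
echo process_title($pattern, ['[title]' => "Don't Come Knocking"]), "\n";
// Don't Come Knocking - DVD
```

With -raw tokens the apostrophe is never escaped in the first place, so under these assumptions the reordering alone would not rescue a snippet like the one in #20.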

Hm, the D6 version looks slightly different:

/**
  * Helper function to generate the title according to the PHP code.
  * Right now its only a wrapper, but if this is to be expanded, here is the place to be.
  * @return a title string
  */
function _auto_nodetitle_patternprocessor($output, $node) {
  if (module_exists('token')) {
    $output = token_replace($output, 'node', $node);
    $output = html_entity_decode($output, ENT_QUOTES, 'UTF-8');
  }
  if (variable_get('ant_php_'. $node->type, 0)) {
    $output = auto_nodetitle_eval($output, $node);
  }
  if (variable_get('ant_php_'. $node->type, 0) || module_exists('token')) {
    $output = preg_replace('/[\t\n\r\0\x0B]/', '', strip_tags($output));
  }
  return $output;
}

E.g., instead of $output = token_replace($pattern, array('node' => $node), array('sanitize' => FALSE, 'clear' => TRUE));, we just have $output = token_replace($output, 'node', $node);

If I change function _auto_nodetitle_patternprocessor() as follows, I still get an empty node title if it contains an apostrophe ('):

/**
  * Helper function to generate the title according to the PHP code.
  * Right now its only a wrapper, but if this is to be expanded, here is the place to be.
  * @return a title string
  */
function _auto_nodetitle_patternprocessor($output, $node) {
  if (module_exists('token')) {
    $output = token_replace($output, 'node', $node);
  }
  if (variable_get('ant_php_'. $node->type, 0)) {
    $output = auto_nodetitle_eval($output, $node);
    $output = html_entity_decode($output, ENT_QUOTES, 'UTF-8');
  }
  if (variable_get('ant_php_'. $node->type, 0) || module_exists('token')) {
    $output = preg_replace('/[\t\n\r\0\x0B]/', '', strip_tags($output));
  }
  return $output;
}

My issue is slightly different, though it might be related - tell me if it should go into a separate queue.

I have a piece of content with a node title that has several letters in parentheses. These are stripped out fine in the URL, but search will not work on the content of that particular node. The title is: House for (W)ren(t). I can search on "House" and bring up the content, but searching for anything in the content after that returns no hits.

Is this a known problem? I haven't yet found any issue that matches this.

Thanks

@muranod you should create a separate issue.

#13 saved my pants on D7 using 7.x-1.x-dev. Thanks!

#13 worked for me as well. Please commit.

#13 needs to be submitted as a patch before it can be committed.

Version:7.x-1.x-dev» 6.x-1.2

#8 did not work for me.

Sample title: Polysorbate 80 SPECIFIC TESTS/Fats and Fixed Oils, Peroxide Value <401> 01-Dec-2012

I had to move $output = html_entity_decode($output, ENT_QUOTES, 'UTF-8'); to just before return $output;

function _auto_nodetitle_patternprocessor($output, $node) {
  if (module_exists('token')) {
    // see http://drupal.org/node/355067
    token_get_values('node', NULL, TRUE);
    $output = token_replace($output, 'node', $node);
  }
  if (variable_get('ant_php_'. $node->type, 0)) {
    $output = auto_nodetitle_eval($output, $node);
  }
  if (variable_get('ant_php_'. $node->type, 0) || module_exists('token')) {
    $output = preg_replace('/[\t\n\r\0\x0B]/', '', strip_tags($output));
  }

  $output = html_entity_decode($output, ENT_QUOTES, 'UTF-8');

  return $output;
}
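The sample title suggests why the decode has to come after strip_tags(): once &lt;401&gt; is decoded back to <401>, strip_tags() treats it as a tag and removes it. A sketch with hypothetical helper names, assuming check_plain()-style escaping of the token value; the strip step reuses the module's own preg_replace()/strip_tags() line.

```php
<?php
// check_plain()-style escaping, standing in for a non-raw token value.
function sanitize_value(string $text): string {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// The "strip tags" step from _auto_nodetitle_patternprocessor().
function strip_title(string $text): string {
  return preg_replace('/[\t\n\r\0\x0B]/', '', strip_tags($text));
}

function decode_title(string $text): string {
  return html_entity_decode($text, ENT_QUOTES, 'UTF-8');
}

$title = 'Polysorbate 80 SPECIFIC TESTS/Fats and Fixed Oils, Peroxide Value <401> 01-Dec-2012';
$token = sanitize_value($title);  // "... Peroxide Value &lt;401&gt; 01-Dec-2012"

$decode_then_strip = strip_title(decode_title($token));  // "<401>" is stripped as a tag
$strip_then_decode = decode_title(strip_title($token));  // entities survive strip_tags()

var_dump($decode_then_strip === $title);  // bool(false)
var_dump($strip_then_decode === $title);  // bool(true)
```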


jeffschuler's change in #8 worked for me. I am using auto node titles of this pattern [node:field_model_name]-[node:field_model_family]. I have many instances where the first token [node:field_model_name] is replaced by strings like this:
('80)
('81)
('90) etc.

I've attached a patch for #8. This is against 7.x-1.x.

Status:Needs work» Needs review

Status:Needs review» Closed (works as designed)

I've run into the same problem. However, the solutions proposed here are the wrong fix. Ultimately, auto_nodetitle is doing the right thing (which is neither encoding nor decoding the title). The root of the problem is that you need to use the raw token values for this to work. In D6, that just means you need to use the [...-raw] version of tokens you're using for this, and it should work fine.

However, there's no totally satisfactory solution in D7. By default, the field tokens provided by D7 token.module are using the sanitized display values. There's been a lot of discussion around this:

#1940076: No Raw value?
#1713164: Raw field tokens
#691078: Field tokens (500+ comments) :/
...
I just bumped #1713164 to a major task and pointed here.

Meanwhile, a possible work-around (depending on the needs of your site) is:
- go to the "manage display" tab on the node type you're trying to use auto_nodetitle on
- open the "custom display settings" fieldset
- click on the "Tokens" checkbox to be able to configure the "token" view mode and click "Save"
- click on the "token" subtab (e.g. admin/structure/types/manage/[foo]/display/token)
- change the format on the fields you're trying to use for the title to be 'plain text' and save

After that, auto_nodetitle will work properly, even with fields that contain '. However, *all* references to [node:field_whatever] will now use the raw/plain value, not the filtered/sanitized one. So, depending on where you're using the tokens, you need to be careful to filter the output before display, lest you open your site up to an XSS vulnerability. text.module will still strip tags, so it's not really going to be an XSS vulnerability, but you still have to be careful if you're making heavy use of these tokens.

An alternative work-around is to define your own custom raw tokens for the fields you care about (until token.module is doing that for us).

However, all the patches proposed in this issue are wrong, and none of them should be committed to auto_nodetitle. The fix is to get the raw values in the first place, not to use sanitized values and then try to un-sanitize them back to raw.

Also, #878570: Do not sanitize title in function _auto_nodetitle_patternprocessor is definitely a bug, and depending on your situation, that might be causing you problems, too. But that's different than what's being attempted here.

Hope that's all clear... ;)

Thanks,
-Derek

Status:Closed (works as designed)» Active

@dww: I'm using the "[...-raw] version of tokens" (cf. #20), and it does not work fine. That's the point of this issue.

If #878570 is the root cause that "[...-raw] version of tokens" do not work, this issue would be a duplicate, right?

Status:Active» Closed (duplicate)

Status:Closed (duplicate)» Closed (works as designed)

@asb: the point of this issue is how to get "special" characters into the title. Why your PHP code isn't working is unrelated to the proposals here to "unsanitize" the title. I don't feel any obligation to debug your code for you. ;) You can try my patches at the other issue, but that's a separate bug. This issue should remain "works as designed", because auto_nodetitle does not, and should not, sanitize the title on input.

Thanks,
-Derek