Well, at least for Chinese. If a user leaves a comment without a subject, comment.module is supposed to extract one automatically from the comment body. Unfortunately it doesn't work for comments written in Chinese: in the 'most recent comments' block, the subject field is shown empty. This problem has been known since 4.6.0 came out, and there was a workaround (credit goes to kzeng):

Change the following lines in comment.module (4.6.x):

...
  if (trim($edit['subject']) == '') {
    // The body may be in any format, so we:
    // 1) Filter it into HTML
    // 2) Strip out all HTML tags
    // 3) Convert entities back to plain-text.
    $edit['subject'] = truncate_utf8(decode_entities(strip_tags(check_output($edit['comment'], $edit['format']))), 29, TRUE);
  }
...

to:

...
  if (trim($edit['subject']) == '') {
    $edit['subject'] = truncate_utf8(strip_tags($edit['comment']), 29);
  }
...

The above workaround works for 4.6.x, but doesn't seem to work for 4.7.0beta1.

-Dami

Comment  File                   Size       Author
#4       truncate_spaces.patch  570 bytes  Steven

Comments

Wesley Tanaka

Apply this patch first:

http://drupal.org/node/40754

dami

Tried the patch but it doesn't solve the problem.

Wesley Tanaka

step 1. apply http://drupal.org/node/40754
step 2. apply the kzeng workaround

Steven

Title: Comment subject auto-extraction not working for multibyte languages » Comment subject auto-extraction depends on spaces
Status: Active » Needs review
Attachment: new, 570 bytes

The problem is not multibyte characters (with UTF-8, almost everything is multibyte). It's spaces. The same problem occurs in languages other than Chinese or Japanese when the first word is very long.
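
To illustrate the failure described above, here is a minimal sketch of word-safe truncation. It is not Drupal's truncate_utf8() and it does not reproduce the attached patch; both function names are hypothetical, and the code assumes PHP's mbstring extension is available. The first version backs up to the last space and returns an empty string when there is none; the second falls back to a hard cut.

```php
<?php
// Hypothetical helper, NOT Drupal's truncate_utf8(): naive word-safe cut.
function sketch_truncate_naive($text, $len) {
  if (mb_strlen($text, 'UTF-8') <= $len) {
    return $text;
  }
  $cut = mb_substr($text, 0, $len, 'UTF-8');
  // Back up to the last space. Without any space nothing is left.
  $pos = mb_strrpos($cut, ' ', 0, 'UTF-8');
  return $pos === FALSE ? '' : mb_substr($cut, 0, $pos, 'UTF-8');
}

// Hypothetical helper: same idea, but keep the hard cut when the
// truncated text contains no space to break on.
function sketch_truncate_fallback($text, $len) {
  if (mb_strlen($text, 'UTF-8') <= $len) {
    return $text;
  }
  $cut = mb_substr($text, 0, $len, 'UTF-8');
  $pos = mb_strrpos($cut, ' ', 0, 'UTF-8');
  return $pos === FALSE ? $cut : mb_substr($cut, 0, $pos, 'UTF-8');
}
```

With a body that has no spaces, the naive version yields an empty subject while the fallback version yields the first $len characters. Text that does contain spaces is cut at a word boundary by both.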

dami

Thanks wtanaka and Steven.
Your two patches together solve the problem:
1. apply wtanaka's patch to comment.module
2. apply Steven's patch to unicode.inc
(Without Steven's patch, a long comment with no spaces between words, which is common in Chinese, shows a blank subject.)

kzeng's work-around is no longer needed with the above two patches applied.

Thanks!
-Dami

dami

Version: 4.7.0-beta1 » 4.7.0-beta3
Status: Needs review » Needs work

Just a quick update:
I still have this problem in beta3, and the patches need to be updated to apply to beta3.

Vidarls

I think I might have found the source of the problem:
The problem is not Chinese, nor is it spaces. The problem is that comment.module tries to alter the form values in hook_validate. I don't know whether it is a bug or a feature, but it does not work: any changes made to the form values in hook_validate stay within that function.
The code that extracts a subject from the body works really well (at least for my simple Western language).
I've made a quick and inelegant temporary solution to the problem:
copy the subject extraction code into the comment_save function:

function comment_save($edit) {
  global $user;
  
  //ugly hack to ensure subject is populated:
  if (trim($edit['subject']) == '') {
    // The body may be in any format, so we:
    // 1) Filter it into HTML
    // 2) Strip out all HTML tags
    // 3) Convert entities back to plain-text.
    // Note: format is checked by check_markup().
    $edit['subject'] = truncate_utf8(decode_entities(strip_tags(check_markup($edit['comment'], $edit['format']))), 29, TRUE);
  }
  //endof ugly hack
  
  if (user_access('post comments') && (user_access('administer comments') || node_comment_mode($edit['nid']) == COMMENT_NODE_READ_WRITE)) {

This may or may not be the best way of fixing this, but at least it works for now, until someone with a deeper understanding of Drupal's inner workings comes along and does something neat and elegant.
I have not tried the patches in this thread, http://drupal.org/node/40754, because I can't get my head around patching on Windows (yes, I'm a bit lazy), so this problem might already be solved there.
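
The hook_validate behaviour described above is consistent with ordinary PHP argument passing, where an array is copied unless the parameter is declared by reference. The sketch below uses hypothetical stand-in functions rather than the real comment.module hooks, and it does not confirm how the 4.7 form code actually passes its values; it only shows the language rule.

```php
<?php
// Hypothetical stand-in for a validate hook that receives a copy.
function sketch_validate_by_value($edit) {
  if (trim($edit['subject']) == '') {
    // Only the local copy changes; the caller's array is untouched.
    $edit['subject'] = 'derived subject';
  }
}

// Hypothetical stand-in that receives the caller's array by reference.
function sketch_validate_by_reference(&$edit) {
  if (trim($edit['subject']) == '') {
    $edit['subject'] = 'derived subject';
  }
}

$edit = array('subject' => '', 'comment' => 'body text');

sketch_validate_by_value($edit);
$after_value = $edit['subject'];      // subject is still empty here

sketch_validate_by_reference($edit);
$after_reference = $edit['subject'];  // subject is now filled in
```

If the values reach the validate step by value, filling in the subject there cannot affect what comment_save later receives, which would explain why moving the extraction into comment_save works.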

Jaza

Version: 4.7.0-beta3 » 4.7.0

Haven't done any testing on this myself, but it looks like this issue probably still applies in the 4.7 branch (and CVS as well).

killes@www.drop.org

Status: Needs work » Fixed

This appears to be fixed; long words get broken up.

Anonymous

Status: Fixed » Closed (fixed)