It seems that v2.2 doesn't recognize words containing ā, ś, ū, ñ, and possibly other characters. I have the PHP mbstring extension installed. It worked fine in version 1.7: all terms were correctly tagged.

Comment | File | Size | Author
#10 | glossary1.png | 34.32 KB | martig
#10 | glossary2.png | 20.02 KB | martig
#10 | glossary3.png | 31.84 KB | martig

Comments

nancydru

I'm not sure what you mean by "doesn't recognize." I have a vocabulary with one term that starts with é and another that starts with ñ, and they show up fine, although to get them in the correct order you must set the alphabet on the "Alphabet" settings tab. You may also need to adjust the collation of the term_data table, as was reported some time ago and is (partially) documented in http://drupal.org/node/201763.

nancydru

Status: Active » Postponed (maintainer needs more info)
martig

The terms are displayed fine in the vocabulary, but they are not linked from the nodes.

nancydru

One more question: do these characters begin the term, or do they appear elsewhere in it? ("école" as opposed to "plissé")

It's kind of strange, because version 1.7 came out before most of the language support was added.

nancydru

I just set up a test node containing "In order to test foreign language support, I have the words école, plissé, and ñachi in my glossary." and all the terms are flagged as they should be.

martig

Hmm, it seems the problem is not what it first appeared to be.

Only some of the terms are linked to their descriptions, and which ones get linked seems quite random (unrelated to foreign characters).

Here's an example: http://teosoofia.ee/nikola_tesla_1853_1943/vedaanta_filosoofia_moju_niko... It contains quite a few terms, but at the moment I don't see any of them tagged (I use the acronym tag).

I also found these errors in the log, always in pairs:

Unknown table 'term_data' in where clause query: SELECT t.name, t.description, t.tid, COUNT(tn.nid) as nodes FROM term_data t LEFT JOIN term_node tn USING(tid) LEFT JOIN term_data catd ON term_data.tid = catd.tid WHERE (term_data.tid IN (251, 252, 232, 234, 250, 233, 280, 236, 376, 240, 241, 242, 246, 247, 0) OR catd.vid NOT IN (12)) AND ( t.vid=1 ) GROUP BY t.tid, t.name, t.description ORDER BY LENGTH(t.name) DESC /drupal/includes/database.mysql.inc on line 172.

Unknown table 'term_data' in where clause query: SELECT ts.tid, ts.name FROM term_synonym ts JOIN term_data t USING(tid) LEFT JOIN term_data catd ON term_data.tid = catd.tid WHERE (term_data.tid IN (251, 252, 232, 234, 250, 233, 280, 236, 376, 240, 241, 242, 246, 247, 0) OR catd.vid NOT IN (12)) AND ( t.vid=1) /drupal/includes/database.mysql.inc on line 172.

I haven't managed to reproduce these under my own user account. They occur a few times a day and only for guests.

nancydru

Well, let's deal with that last part first: http://drupal.org/node/239880 is fixed in the -dev version. There is also http://drupal.org/node/233752#comment-793342.

After midnight (GMT) tonight, download the -dev version, verify that your acronym links are styled visibly, save the settings again (even if you made no changes), and retry it.

martig

I installed the dev version, but it's still not tagging all the terms. Only some get tagged.

nancydru

Assigned: Unassigned » nancydru

Well, it's progress, I guess. Can you see any pattern to this? Please tell me all your settings.

martig


No, there doesn't seem to be any recognizable pattern. I've attached screenshots of the glossary settings. The link style is acronym.
PHP version is 4.4.8 and MySQL is 4.1.7. I haven't installed any new modules since upgrading from Glossary 1.7.

nancydru

Wow, that is the oldest MySQL I have ever seen! BTW, are you aware that 7.x will require PHP 5.2+ (probably at least 5.2.5) and MySQL 5+? Time to pressure your host to upgrade.

I'll see if there's anything that stands out.

nancydru

Well, I see one thing that has confused some people: you have "only first match" enabled, which means only the first occurrence of a term in a node is flagged.

Another issue discusses the problem of non-English characters: for example, "école" does not match "&eacute;cole" or "&#233;cole".
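To illustrate the entity mismatch: a plain byte comparison of "école" against HTML that stores the é as an entity finds nothing. The sketch below is a hypothetical helper (glossary_term_found_sketch() is not part of the Glossary module) that assumes PHP 5.4+ and UTF-8 content. It decodes entities before searching and uses a Unicode-aware "whole word" test via the /u modifier and \p{L} lookarounds.

```php
<?php
// Hypothetical helper, not the Glossary module's code: report whether a
// term occurs in HTML even when accented letters are stored as named
// (&eacute;) or numeric (&#233;) entities.
function glossary_term_found_sketch($term, $html) {
  // Decode entities so "&eacute;cole" and "&#233;cole" both become "école".
  $text = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
  // Whole-word match: no Unicode letter, digit or underscore may touch the
  // term on either side. The /u modifier makes PCRE read UTF-8 characters.
  $pattern = '/(?<![\p{L}\p{N}_])' . preg_quote($term, '/') . '(?![\p{L}\p{N}_])/u';
  return preg_match($pattern, $text) === 1;
}

var_dump(strpos('la &eacute;cole', 'école'));                // bool(false)
var_dump(glossary_term_found_sketch('école', 'la &eacute;cole')); // bool(true)
```

This only checks for a match. Actually rewriting the stored HTML would need more care, because decoding also turns &lt; into <.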

martig

Yes, I want only the first match to be tagged.
One example: http://teosoofia.ee/nikola_tesla_1853_1943/vedaanta_filosoofia_moju_niko... The word Ākāśa is not tagged, and neither are many other terms.

nancydru

Is the term (in the vocabulary) entered in exactly the same way that the word is typed in the content?

martig

Yes, the terms are entered exactly the same way. I tried copy-pasting some of the terms, but it didn't change anything.

I also tried creating a new node with the content copied from the example node, and guess what: all the terms were tagged in the new node. That's really strange.

nancydru

That sounds like a caching issue. Make sure you've cleared the cache_filter table.

martig

Hmm, earlier I had copied only the text to a new node, but now I copied the HTML and the result is the same as in the old node. I took a peek at the module's code and found that, for example, the <pre> tag is blocked. Maybe this should also be configurable.

nancydru

Probably it just needs to be documented better, because there are elements like <pre> and <code> where the module should not operate.
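Here is a minimal sketch of why text wrapped in <pre> stays untagged. It assumes the filter splits the HTML into blocked segments (whole blocked elements plus any tag markup) and searchable text, and that "only first match" stops after the first replacement. This is not the Glossary module's actual implementation. glossary_mark_sketch() and its parameters are hypothetical, and closures require PHP 5.3+.

```php
<?php
// Hypothetical sketch, not the Glossary module's code: wrap a term in
// <acronym> only outside blocked elements, optionally on its first match only.
function glossary_mark_sketch($html, $term, $title, array $blocking_tags, $first_only = true) {
  // Whole blocked elements first, then any single tag, so neither blocked
  // content nor attribute values are searched.
  $alts = array();
  foreach ($blocking_tags as $tag) {
    $q = preg_quote($tag, '/');
    $alts[] = "<$q\\b[^>]*>.*?<\\/$q>";
  }
  $alts[] = '<[^>]*>';
  $split = '/(' . implode('|', $alts) . ')/is';
  $pieces = preg_split($split, $html, -1, PREG_SPLIT_DELIM_CAPTURE);

  // Unicode-aware whole-word match for the term.
  $word = '/(?<![\p{L}\p{N}_])' . preg_quote($term, '/') . '(?![\p{L}\p{N}_])/u';
  $title = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
  $wrap = function ($m) use ($title) {
    return '<acronym title="' . $title . '">' . $m[0] . '</acronym>';
  };

  $done = false;
  foreach ($pieces as $i => $piece) {
    // With one capturing group, odd indexes hold the skipped markup.
    if ($i % 2 === 1 || ($first_only && $done)) {
      continue;
    }
    $count = 0;
    $pieces[$i] = preg_replace_callback($word, $wrap, $piece, $first_only ? 1 : -1, $count);
    if ($count > 0) {
      $done = true;
    }
  }
  return implode('', $pieces);
}

echo glossary_mark_sketch('<pre>Ākāśa</pre> then Ākāśa', 'Ākāśa', 'Akasha', array('code', 'pre'));
```

When a whole article is wrapped in <pre> just to get a monospaced font, every occurrence lands in a blocked segment and nothing is tagged. A blocking tag such as "a" is matched with \b after the name, so <abbr> and <acronym> are not mistaken for <a>.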

mwrochna

martig - were <pre> tags the only problem?

There's no reason to keep them hardcoded: someone may use them in another way (as here, where <pre> isn't escaping code, just switching to a monospaced font), or may not use them at all, in which case dropping them from the list makes the filter faster. Also, we won't have to document it once it's in the settings: just add an update and remove pre and code from $open_tags and $close_tags (lines 404-405).

/**
 * Implementation of hook_update_N().
 *
 * Move the hardcoded <code> and <pre> tags into the configurable
 * blocking tags of every input format.
 */
function glossary_update_5103() {
  $ret = array();
  $result = db_query('SELECT format, name FROM {filter_formats}');
  while ($filter = db_fetch_array($result)) {
    $format = $filter['format'];
    // Only formats whose setting was saved; unsaved ones fall back to the
    // default passed to variable_get().
    $value = variable_get("glossary_blocking_tags_$format", NULL);
    if (!is_null($value)) {
      $value = "code pre $value";
      variable_set("glossary_blocking_tags_$format", $value);
    }
  }
  return $ret;
}

(The data is serialized, so we can't just run an UPDATE query.) We still need '<a ' hardcoded because of the space; we can document it together with [code], right where [no-glossary] is mentioned.
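One property the update above does not have is idempotence: running it twice would store "code pre code pre ...". The sketch below is a hypothetical variant of just the string transformation (glossary_add_blocking_tags_sketch() is an illustrative name, not part of glossary.install). It prepends only the tags that are missing from the space-separated list.

```php
<?php
// Hypothetical variant of the value transformation in glossary_update_5103():
// prepend "code pre" to a format's blocking tags, skipping tags that are
// already listed so re-running the update does not duplicate them.
function glossary_add_blocking_tags_sketch($value, array $tags = array('code', 'pre')) {
  if (is_null($value)) {
    // Setting never saved for this format: leave it to the variable_get() default.
    return NULL;
  }
  $existing = preg_split('/\s+/', trim($value), -1, PREG_SPLIT_NO_EMPTY);
  $missing = array_diff($tags, $existing);
  return trim(implode(' ', $missing) . ' ' . implode(' ', $existing));
}

echo glossary_add_blocking_tags_sketch('abbr acronym'), "\n";          // code pre abbr acronym
echo glossary_add_blocking_tags_sketch('code pre abbr acronym'), "\n"; // code pre abbr acronym
```

Inside the update, it would replace the "code pre $value" line, with variable_set() called as before.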

nancydru

Thank you, Marcin.

For those who don't know him, Marcin (mwrochna) is the person who did a significant rewrite of this module (during the GHOP season) that gives it the ability to skip parts of the content.

I will look at how to better document this.

nancydru

Status: Postponed (maintainer needs more info) » Active
nancydru

Status: Active » Patch (to be ported)

Changing status so I'm reminded to fix it.

nancydru

Status: Patch (to be ported) » Postponed (maintainer needs more info)

I just committed a fix to the form description and filter tips to mention these hard-coded tags.

I'm still thinking about moving those tags to the exposed list of blocking tags. It's certainly easy enough, but I don't want to break existing installations for those who are already using this module. So, I need comments, people.

mwrochna

I'm not sure I understand. It won't break anything if they run the update (and a red warning will remind them to). Setting $value = "code pre $value"; only adds those tags: nothing gets lost and the behaviour stays exactly the same.
Just add glossary_update_5103() from #19 to glossary.install, change $open_tags to array('[no-glossary]', '<', '<a ', '[code'), change $close_tags accordingly, and change the defaults in the variable_get() calls from 'abbr acronym' to 'code pre abbr acronym'. Or did I forget something?

nancydru

You might be surprised how many people don't run update.php when updating modules. You are right that the impact should be close to nonexistent, but I would really like more eyes on this.

AaronCollier

I think this could be useful, and people would at least have the option. Honestly, if people don't run update.php every time, they should expect problems; the reminder is written right there.

So +1 from me.

nancydru

Status: Postponed (maintainer needs more info) » Active
nancydru

Status: Active » Fixed

Committed to both branches.

Anonymous

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.