First non-English <h> title is not shown in TOC

dami - May 5, 2008 - 03:46
Project: Table of Contents
Version: 6.x-3.x-dev
Component: Code
Category: bug report
Priority: normal
Assigned: AlexisWilke
Status: closed
Description

If the title of the first <h> tag is not in English, i.e. it doesn't contain any of the characters [A-z0-9 ], then it won't show up in the TOC. The reason is that the auto-generated anchor id will be empty for this first tag. Subsequent tags do not have this problem.

Looking at the code, it looks like the filtered result is inconsistent when the title of an "h" tag is empty. If it's the first empty tag, it is ignored in the TOC. However, all subsequent empty tags are assigned a CSS id and, as a result, show up in the TOC.
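A minimal sketch of how such an empty anchor id can arise. It is written in Python rather than the module's PHP, and the function name `sanitize_anchor`, the exact character class, and the removal of spaces are assumptions based on the `[A-z0-9 ]` set quoted above, not the module's actual code.

```python
import re

def sanitize_anchor(title: str) -> str:
    """Keep only ASCII letters, digits and spaces, then drop the spaces."""
    # Hypothetical filter mirroring the character set quoted in the report.
    kept = re.sub(r"[^A-Za-z0-9 ]", "", title)
    return kept.replace(" ", "")

# An English title survives the filter; a title without ASCII letters does not.
print(repr(sanitize_anchor("Current Drupal core initiatives")))
print(repr(sanitize_anchor("目录")))
```

When nothing survives the filter the id is the empty string, which would explain a heading that cannot be linked from the TOC.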

#1

dami - May 5, 2008 - 03:49
Status: active » needs review

This patch seems to fix the problem, but I am not sure whether it breaks anything else.

Attachment    Size
empty_title.patch 667 bytes

#2

dami - May 5, 2008 - 15:31

The patch in #1 will actually change the existing anchor numbering. Here is a patch that only deals with <h> tags that do not have any valid anchor id characters (i.e. [A-z0-9 ]) in their title. With this patch, any truly empty <h> tags will also show up in the TOC; if this is not wanted, we may add another line of code to deal with it.
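The idea of touching only headings without usable characters can be sketched as follows. This is an illustration in Python, not the content of the attached patch; `assign_anchor_ids` and the `toc-heading-` prefix are hypothetical names.

```python
import re

def assign_anchor_ids(titles):
    """Return one anchor id per title, with a numbered fallback for empty ids."""
    ids = []
    fallback_count = 0
    for title in titles:
        anchor = re.sub(r"[^A-Za-z0-9]", "", title)
        if not anchor:
            # Hypothetical fallback: nothing survived the filter, so number it.
            fallback_count += 1
            anchor = f"toc-heading-{fallback_count}"
        ids.append(anchor)
    return ids

print(assign_anchor_ids(["目录", "Intro", ""]))
```

Headings whose titles already produce an id keep it unchanged, so existing links are not renumbered. As noted above, a truly empty heading also receives a fallback id in this sketch.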

Attachment    Size
headinganchors_emptytags.patch 520 bytes

#3

deviantintegral - May 11, 2008 - 21:38

I'm trying to find information, but I'm not getting a clear answer: what are the valid characters for XHTML attribute values? If they are more than just A-z 0-9, we should allow them, rather than just assigning an arbitrary attribute. If not, then some function should convert the string into something valid and use that instead.

--Andrew

#4

deviantintegral - December 31, 2008 - 18:25
Version: 5.x-2.1 » 5.x-2.x-dev
Status: needs review » needs work

Can someone determine if this is still an issue in the latest development version?

#5

dami - January 19, 2009 - 01:08
Version: 5.x-2.x-dev » 6.x-2.x-dev

I have upgraded my site to 6.x, so I couldn't test 5.x-dev.
But the problem still exists in both 6.x-2.2 and 6.x-2.x-dev.

#6

jpfle - March 16, 2009 - 18:52

deviantintegral wrote:

I'm trying to find information, but I'm not getting a clear answer: what are the valid characters for XHTML attribute values? If they are more than just A-z 0-9, we should allow them, rather than just assigning an arbitrary attribute. If not, then some function should convert the string into something valid and use that instead.

On the page http://www.w3.org/TR/xhtml1/#C_8 we can read:

Note that the collection of legal values in XML 1.0 Section 2.3, production 5 is much larger than that permitted to be used in the ID and NAME types defined in HTML 4. When defining fragment identifiers to be backward-compatible, only strings matching the pattern [A-Za-z][A-Za-z0-9:_.-]* should be used. See Section 6.2 of [HTML4] for more information.

And in the section 6.2 on http://www.w3.org/TR/html4/types.html#h-6.2 :

ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").
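The pattern quoted from the spec can be checked directly. A small Python sketch follows; the function name `is_valid_html4_id` is made up for illustration.

```python
import re

# Pattern quoted from XHTML 1.0 section C.8 / HTML 4 section 6.2.
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")

def is_valid_html4_id(value: str) -> bool:
    """True when the whole string is a backward-compatible ID token."""
    return ID_PATTERN.fullmatch(value) is not None

for candidate in ["Ete", "section_1.2:a-b", "1st-section", "Été", ""]:
    print(repr(candidate), is_valid_html4_id(candidate))
```

Note that an id starting with a digit, or an empty id, does not match.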

Also, the ID's content is sanitized, but the sanitizing is not adapted to words with diacritics. For example, say we have this title in French:

<h2>Été</h2>

it will be changed to:

<h2 id="t">Été</h2>

and the url will be:

/page#t

It should be:

/page#Ete

Diacritics should be replaced with their non-diacritic equivalents (e.g. é => e).

So I am attaching a patch to accept [A-Za-z][A-Za-z0-9:_.-]* and to improve the handling of diacritics when the anchor is sanitized.

Edit: I forgot to also attach the i18n-ascii.txt file (it must be placed in the module's directory).
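For illustration, the é => e replacement can be sketched in Python using Unicode decomposition. This is a swapped-in technique: the patch uses the i18n-ascii.txt mapping file, whose format is not shown in this thread, and `strip_diacritics` is a hypothetical name.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Remove combining marks so that accented letters become plain ones."""
    # NFD splits "É" into "E" followed by a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("Été"))
```

Letters that have no decomposition, such as "ø" or "ß", pass through unchanged with this approach, which is one reason an explicit mapping table can be preferable.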

Attachment    Size
headinganchors.module.patch 871 bytes

#7

jpfle - March 16, 2009 - 18:53

File i18n-ascii.txt attached.

Attachment    Size
i18n-ascii.txt 5.21 KB

#8

deviantintegral - March 18, 2009 - 00:55

Thanks for the patch. Finding the info about the XHTML spec was very useful. I've updated it slightly with more comments, and included the i18n-ascii.txt file in the patch. I've tested against the 5.x version.

Do you see this as RTBC?

Attachment    Size
254722_transliterate_anchors_5.x.patch 7.29 KB
254722_transliterate_anchors_6.x.patch 7.3 KB

#9

deviantintegral - March 18, 2009 - 20:25
Status: needs work » needs review

#10

jpfle - March 19, 2009 - 01:12

Some comments:

1) I think it would be more efficient to parse i18n-ascii.txt only once, so I've moved it outside the "foreach" loop. See the new patch.

2) I think that anchors would be more readable if spaces were replaced by hyphens. See this example:

Without hyphen: /page#CurrentDrupalcoreinitiatives
With hyphen: /page#Current-Drupal-core-initiatives

I've put a suggestion in the patch: $anchor = preg_replace("/ +/", "-", $anchor);

3) I'm not sure about this one. What about doing the same thing with apostrophes? Example with "L'arrivée de l'été l'a lentement réchauffé":

Without apostrophe: /page#Larrivee-de-lete-la-lentement-rechauffe
With apostrophe: /page#L-arrivee-de-l-ete-l-a-lentement-rechauffe

Personally, I replace apostrophes with hyphens in Pathauto, but the usage of apostrophes probably varies between languages.

The best solution would surely be some settings in the module's admin pages, as in Pathauto, but that means more code. What do you think?
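To make the two variants in points 2) and 3) concrete, here is a Python sketch of the whole chain. It is not the attached patch: `make_anchor` and its `apostrophe_to_hyphen` flag are hypothetical, and diacritics are stripped via Unicode decomposition instead of the i18n-ascii.txt file.

```python
import re
import unicodedata

def make_anchor(title: str, apostrophe_to_hyphen: bool = False) -> str:
    """Build an anchor: strip accents, hyphenate spaces, drop other characters."""
    decomposed = unicodedata.normalize("NFD", title)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if apostrophe_to_hyphen:
        # Optional behaviour discussed in point 3).
        text = text.replace("'", "-")
    # Same idea as the suggested preg_replace("/ +/", "-", $anchor).
    text = re.sub(r" +", "-", text)
    # Keep only characters allowed after the first letter of an ID token.
    return re.sub(r"[^A-Za-z0-9:_.-]", "", text)

title = "L'arrivée de l'été l'a lentement réchauffé"
print(make_anchor(title))
print(make_anchor(title, apostrophe_to_hyphen=True))
```

A boolean flag like this is the kind of thing an admin setting would toggle. The sketch does not check that the result starts with a letter.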

Attachment    Size
headinganchors.module.drupal6.patch 1.39 KB

#11

jpfle - May 16, 2009 - 20:42

Hi deviantintegral,

I wonder if you have any comments about the proposed patch?

Thanks.

#12

AlexisWilke - June 8, 2009 - 08:37

Hi Gay Luron,

My comment would be that it makes it quite complicated to add more code to support apostrophes one way or another. Especially because it will make the module slower. More code means slower module...

I will look into that later, but it sounds like a good idea to put a dash by default since that would most certainly look okay in most languages.

Thank you.
Alexis Wilke

#13

AlexisWilke - July 5, 2009 - 09:59
Version: 6.x-2.x-dev » 6.x-3.x-dev
Assigned to: Anonymous » AlexisWilke
Status: needs review » fixed

Okay, the transliteration is installed in 3.x-dev. HOWEVER, with FCKeditor, a character such as é is transformed into &eacute;, which means the transliteration does NOT work in that case. I guess we should add all the default HTML entities to the i18n-ascii.txt file? Please feel free to re-open this issue if you provide that fix.
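One alternative to listing entities in the mapping file would be to decode them before transliterating. A Python sketch of that idea follows; it is not what 3.x-dev does, and `anchor_source_text` is a hypothetical name.

```python
import html
import unicodedata

def anchor_source_text(heading_text: str) -> str:
    """Decode HTML entities first, then strip diacritics."""
    # "&Eacute;t&eacute;" becomes "Été" before any transliteration happens.
    decoded = html.unescape(heading_text)
    decomposed = unicodedata.normalize("NFD", decoded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(anchor_source_text("&Eacute;t&eacute;"))
```

Without the decoding step the transliteration only sees the literal characters of the entity, such as "&", "e", "a", "c", and never the accented letter itself.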

Thank you.
Alexis Wilke

#14

jpfle - July 18, 2009 - 22:27

Hi Alexis,

I don't know if we should add the default HTML entities to the file, because the «problem» can be resolved (I guess, I haven't tested it) in FCKeditor with this configuration:

FCKConfig.ProcessHTMLEntities = false ;

See ProcessHTMLEntities.

#15

AlexisWilke - July 19, 2009 - 06:47

Okay, I guess that should be documented somewhere... Most people would not know about such a thing!

Thank you for the pointer
Alexis Wilke

Addition: Added info in the TOC front page for now.

#16

System Message - August 2, 2009 - 06:50
Status: fixed » closed

Automatically closed -- issue fixed for 2 weeks with no activity.

 
 
