I've just noted that form_clean_id() doesn't remove slashes, which are not allowed for IDs.

quote of the w3c xhtml 1.0 Recommendation:
Note that the collection of legal values in XML 1.0 Section 2.3, production 5 is much larger than that permitted to be used in the ID and NAME types defined in HTML 4. When defining fragment identifiers to be backward-compatible, only strings matching the pattern [A-Za-z][A-Za-z0-9:_.-]* should be used. See Section 6.2 of [HTML4] for more information.

The attached patch improves form_clean_id to follow this recommendation, except for the first character which isn't forced to be [A-Za-z].

Comments

JirkaRybka’s picture

Seems to me the brackets '][' will now produce two hyphens instead of one. That might be a problem (?)

fago’s picture

StatusFileSize
new642 bytes

ops, of course this would be a problem, thanks! Attached is an updated patch, which handles this case too.

ducdebreme’s picture

How is the status of this patch? Why not including into the core HEAD?
It sound to be a good idea to have a positive filter on the ids...
Thanks
Stefan

darren oh’s picture

Version: 7.x-dev » 6.x-dev
StatusFileSize
new765 bytes

Updated to allow the process to be reversed if necessary.

darren oh’s picture

Version: 6.x-dev » 7.x-dev

Version: 6.x-dev » 7.x-dev
Status: Needs review » Needs work

The last submitted patch failed testing.

liam morland’s picture

Status: Needs work » Needs review

What is wrong with the original patch?

If multiple hyphens in a row is a problem, then add '/--+/' as a pattern to replace with '-'.

Any reason why this fix can't also be made in D6?

darren oh’s picture

The problem with the original patch is that it doesn't handle all invalid characters. The Drupal 6 maintainers do not accept fixes that have not first been made in Drupal 7.

liam morland’s picture

This code, from the patch, replaces with a hyphen all characters that are not valid in id attributes: $id = preg_replace('/[^A-Za-z0-9:.-]/', '-', $id);.

It also removes underscore, which it doesn't have to, but which is currently removed.

The only thing it doesn't do is guaranty that the first character is a letter, but that is probably not needed since strings coming through this function start with "edit-" or similar anyway.

darren oh’s picture

Sorry, I forgot. The only change in my patch was to make the process reversible. However, mine needs more work.

liam morland’s picture

StatusFileSize
new498 bytes

Attached is the original patch, rerolled against D7. If it is not desirable to generate multiple hyphens in a row, a minor change to the regex will fix that.

Status: Needs review » Needs work

The last submitted patch failed testing.

liam morland’s picture

Status: Needs work » Closed (duplicate)