Allow Internationalized domain names (new link to the discussion below)

Project:Link
Version:6.x-2.9
Component:Code
Category:bug report
Priority:normal
Assigned:jcfiala
Status:closed (duplicate)
Issue tags:IDN

Issue Summary

I get "Not a valid URL" when trying a URI which has Hebrew characters. This is a big issue - there is no logic behind this, as paths and file names may have characters other than Latin ones these days.

Thanks!

Comments

#1

Same for the German umlauts (äöü) and ß.

These do appear in working URLs
www.grüne.de

#2

I can confirm the same: it is not possible to add a URL with 'ñ' in it.

#3

Assigned to:Anonymous» jcfiala

Quite right, folks - there shouldn't be any reason for that. I'll move this issue up in my mental list. That said, if anyone would like to work on a patch, that would be great!

#4

Title:Allow non-English characters in a URI» Allow Internationalized domain name and non-ASCII characters in a URI

#5

Version:6.x-2.5» 6.x-2.6

Better title, I agree. Also, let's get the Version number correct, ya?

#6

#7

Title:Allow Internationalized domain name and non-ASCII characters in a URI» Allow Internationalized domain name

#8

Title:Allow Internationalized domain name» Allow Internationalized domain names

#9

+

#10

Version:6.x-2.6» 6.x-2.x-dev
Priority:critical» normal
Status:active» postponed (maintainer needs more info)

Alright, so here's what's been done. If you're using international urls that aren't working, then you can now (in the latest patches) go into the field and turn off url validation. This will allow people to enter just about any url they feel like, without throwing an error.

I've also added the ß and ñ.

Hebrew characters are giving me a bit of trouble, though. Is there an expert in the house who can work with me on how to change the regular expressions in link_validate_url() to include the Hebrew characters without listing every possible one?

#11

@#10
Good to hear that some progress has been made. However, I am afraid that neither of the options is good:

Turning off validation - well, this is going too far - one of the greater benefits of using the link module is that it prevents entering a bad string by mistake.

Adding specific characters - there will be more to find as people who use other languages run into this issue; adding more and more specific characters is an endless task.

I will try to get some ideas from people on how to solve this problem in a more generic way.

Amir

#12

"ß" is not allowed in domain/host names (only in the path)! ä, ü, ö are allowed.

#13

"ß" is not allowed in domain/host names (only in the path)! ä, ü, ö are allowed.

hass, you have to give the code time to get into the dev release. I only uploaded it an hour ago.

#14

@11: I agree that turning off validation is excessive!

The base problem is that I need to modify the regex to handle/include Unicode characters, without including Unicode symbols. This is something I'm not familiar enough with to do yet - I attempted to add \u#### to the regex, and got errors back out again. I'm open to suggestions and help of any sort!
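
For reference, a minimal sketch of one way to match letters from any script without listing them, assuming the validation uses PHP's preg_* (PCRE) functions. PCRE does not accept \u#### escapes; code points are written as \x{####}, and the pattern needs the "u" modifier so that pattern and subject are treated as UTF-8. Unicode property classes such as \p{L} (letters), \p{M} (combining marks) and \p{N} (numbers) cover every script while leaving out the symbol categories. The function name example_is_unicode_word() is hypothetical and is not part of the Link module.

```php
<?php
// Hypothetical helper, NOT part of the Link module: checks that a single
// host label or path segment contains only letters, combining marks,
// digits and hyphens, in any script.
function example_is_unicode_word($text) {
  // \p{L} = letters, \p{M} = combining marks, \p{N} = numbers.
  // The "u" modifier switches PCRE to UTF-8 mode.
  return preg_match('/^[\p{L}\p{M}\p{N}\-]+$/u', $text) === 1;
}

// An explicit code point range uses \x{...}, not \u....
// U+05D0 to U+05EA are the Hebrew letters alef through tav.
function example_is_hebrew_letters($text) {
  return preg_match('/^[\x{05D0}-\x{05EA}]+$/u', $text) === 1;
}

var_dump(example_is_unicode_word('הוספת-פורום')); // Hebrew letters and a hyphen
var_dump(example_is_unicode_word('grüne'));       // Latin letters with an umlaut
var_dump(example_is_unicode_word('★'));           // a symbol, so it is rejected
var_dump(example_is_hebrew_letters('פורום'));
```

A real URL pattern would use such a class in place of the a-z ranges for the parts where non-ASCII letters should be accepted; which parts those are is a separate decision.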

#15

Is there an RFC for this we can use for the validation?

#16

@jcfiala: No, "ß" is a letter that is not allowed in hostnames. You cannot register domains with this letter. If you have allowed it in your code, this is wrong.

#17

@hass

Ah, that's what you meant. Do you have a reference that declares "ß" character-non-grata for hostnames?

#18

#19

Status:postponed (maintainer needs more info)» active

Interesting.

Okay, I'll change the code so that "ß" is not allowed in the domain name, but is allowed in the path.

#20

Status:active» fixed

Okay, the ß, which I have learned from Wikipedia to call the Eszett character, is no longer allowed in domain names, but can be used in other locations.

#21

chx suggested in #389278: Create IDN encoding and decoding functions to write an extra module only for IDN validation. I also think this would be a great idea... but I'm not familiar enough with all IDN rules to maintain such a module.

#22

Status:fixed» closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

#23

Status:closed (fixed)» active

@#20 - jcfiala - This is not fixed at all; reactivating.

#24

Status:active» postponed (maintainer needs more info)

Amir, can you please give more detail on what, exactly, is not fixed at all, and which version you were testing when you were trying it?

Giving exact urls that failed and should not have (or which were accepted and should not have) is _really_ useful for writing tests.

#25

OK - here is a link to an item on the Drupal Israel site - you may see gibberish if you have no Hebrew fonts installed:

http://www.drupal.org.il/content/הוספת-פורום

Thanks.

#26

And I can't set a URL for Cyrillic domain names.
For example: http://президент.рф

#27

Version:6.x-2.x-dev» 6.x-2.9
Status:postponed (maintainer needs more info)» active

I think a general solution is required, regardless of culture.

#28

I quite agree that a general solution is required. That's part of the reason why this ticket is currently stalled - I don't have a general solution on hand, and higher-priority items are keeping me busy.

#29

I think you can convert the URL to Punycode and then validate as usual.

Complete Punycode converter class: http://www.phpclasses.org/browse/download/1/file/5845/name/idna_convert.....

Just

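<?php
  // Requires the idna_convert class linked above.
  $IDN = new idna_convert();

  // Encode the internationalized name to its Punycode (ASCII) form.
  $punycoded = $IDN->encode($url);

  // Validate $punycoded as usual.
?>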

And don't forget the "рф" domain :-)
Is this helpful?
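
A possible variant of the same idea, as a rough sketch only: convert just the host part to Punycode and leave the rest of the URL alone, so that an existing ASCII-only host check can run on the result. This assumes PHP's intl extension, which provides idn_to_ascii(); the function name example_url_to_ascii_host() is hypothetical and is not part of the Link module or of the idna_convert class.

```php
<?php
// Hypothetical helper, NOT part of the Link module. Assumes the intl
// extension is loaded (it provides idn_to_ascii()).
function example_url_to_ascii_host($url) {
  // Split into scheme, host and remainder with a UTF-8 aware pattern.
  // Simplification: user:password@ parts are not handled.
  if (!preg_match('~^([a-z][a-z0-9+.\-]*://)([^/?#:]+)(.*)$~iu', $url, $m)) {
    return FALSE;
  }
  // Convert the host labels to their ASCII ("xn--...") form.
  $host = idn_to_ascii($m[2], IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
  if ($host === FALSE) {
    return FALSE;
  }
  // Port, path, query and fragment are passed through unchanged.
  return $m[1] . $host . $m[3];
}

// The returned host is plain ASCII and can go through the usual validation.
var_dump(example_url_to_ascii_host('http://президент.рф'));
var_dump(example_url_to_ascii_host('http://example.com/path?x=1'));
```

Only the host is converted here. A non-ASCII path, as in the Hebrew example from #25, would still need either percent-encoding or a path pattern that accepts Unicode letters.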

#30

Title:Allow Internationalized domain names» Allow Internationalized domain names (new link to the discussion below)
Status:active» closed (duplicate)

@unic, I thought the same. Can you please post your idea anew? There is a chance for support on this.

Please go to #1319520: Gathered: Internationalized domain names (punycode) to join a discussion I set up on this and to contribute to a faster implementation. I am marking this issue as a duplicate to make sure that everybody joins the centralized discussion on this task, which I would like to give more attention to next.

Thanks for the effort.

#31

Hi-

Is there anything that works as of now?

At the moment, if you put something like äriportaal.ee in a link field, it generates a link like yourdomain.com/%C3%A4riportaal.ee

What can be done?

Regards,

Virgo