Refactor regex for better validation and to allow for special chars in urls
neo2049 - August 2, 2009 - 19:07
| Project: | Follow |
| Version: | 7.x-1.x-dev |
| Component: | Code |
| Category: | task |
| Priority: | normal |
| Assigned: | q0rban |
| Status: | closed |
Description
Hi,
I've been able to add facebook, twitter and flickr to the follow module but when I try last.fm it says my url is invalid. I have checked the url so many times but no joy. I think the module doesn't like the .fm in the url.
Please help.

#1
Hi, I'm sorry you are having troubles.
I can't replicate this. Can you paste the url here that you are unable to use? Thanks!
#2
The URL I am trying is http://www.last.fm/music/Musica+Reverso
Thanks for your help.
#3
hmm, I imagine it's the plus symbol actually. Try http://www.last.fm/music/Musica%2BReverso
#4
Thanks man! It works perfectly now.
#5
I'm going to keep this active in case it continues to be a problem for other people.
#6
Bumping down to normal
#7
#8
started working on this for D7, adding knwn service urls to build better validation
#9
here's a pass at a D7 patch, not really tested yet.
#10
picking up the suggestion here: http://drupal.org/node/619276#comment-2218064 for a more complex data structure to help with validation and display
#11
#12
fixes a bug with the user block theme function call also.
#13
remove debug code, improve error message.
#14
Allowing special characters is hard, since there is only a single text area. Basically to do it correctly we need to parse the URL on input and store the parts separately in the DB. I think drupal core is doing something like this in the menu module.
I'm planning to commit the above patch, since solving the other concern seems liek a more substantial problem that at first sight.
#15
Marked #622118: Follow doesn't accept valid Flickr link dues to @ symbol as a duplicate of this issue.
Also, anyone that is having trouble with special characters can refer to this doc and convert the special characters by hand.
@pwolanin, is there any way to parse out the url and pass all fragments to urlencode?
#16
@q0rban - yes, see: http://api.drupal.org/api/function/menu_edit_item_validate/7
but then I'd suggest the DB storage needs to change. In fact the module should not even need to urlencode anything if using the l() function properly due to http://api.drupal.org/api/function/drupal_http_build_query/7
#17
Ok, then I think the only thing the regex needs to do is make sure it has the domain in the url. Then we still need to somehow filter for XSS before saving to the DB, correct?
#18
@q0rban - no, l() should filter this on output, so using that well will make everything both easier and more robust. In fact, if we are going to parse the URL, likely we can make the regex simpler.
We will essentially need to save to the DB 2 possible ways:
1) a path + serialized options array as 2 columns
2) a path, query, and fragment columns (query sill likely as a serialized array)
#19
Hmm... I'm going to have to think about this.. Off the bat I'm not crazy about parsing it and saving it to the DB that way.
#20
@q0rban - well that's how drupal core saves links - what's the downside?
#21
We should apply these 2 functions:
http://www.php.net/manual/en/function.parse-url.php
http://www.php.net/manual/en/function.parse-str.php
#22
#23
could also look at using http://api.drupal.org/api/function/drupal_parse_url/7, but that's not as readily backported to 6.x
#24
Minor tweaks to regex generation and help/error text. About ready to commit this.
#25
#26
opps - broke a clean install. Follow-up patch to fix this - committing.
#27
Automatically closed -- issue fixed for 2 weeks with no activity.