I like the idea behind this module, but it doesn't work very well on pages of mine that have a lot of links. Following all the directions to the letter, I ended up a lot of errors. My /files/urlicon/ directory has several favicons now, but the pages that have a lot of links always time out (resulting in a blank page).

    * warning: parse_url(http:///favicon.ico) [function.parse-url]: Unable to parse url in /home/fabmin/public_html/includes/common.inc on line 366.
    * warning: parse_url(http:///en.wikipedia.org/wiki/Metaverse) [function.parse-url]: Unable to parse url in /home/fabmin/public_html/modules/contrib/urlicon/urlicon.module on line 68.
    * warning: fsockopen() [function.fsockopen]: php_network_getaddresses: getaddrinfo failed: Name or service not known in /home/fabmin/public_html/includes/common.inc on line 371.

It would be nice if it would make note of outgoing links and update the /files/urlicon/ directory every so often instead of doing it on each page load.

Finally, with urlicon.module is enabled the node ID is printed at the beginning of the node body.

CommentFileSizeAuthor
#1 urlicon.patch3.18 KBjaydub

Comments

jaydub’s picture

Status: Active » Needs review
StatusFileSize
new3.18 KB

I have put together a patch to address this issue along with a few other items.

First off, this being a filter, it will not be run every time if the there is a filter cache entry and the cache test (which I do not know much about) will trigger whether or not this module will do its thing.

Changes:

1) I modified the regex for finding the URL. I didn't see the need to save each section of the match that was in the original regex. Now it just stores the contents of the HREF and the text of the <a> tag.

2) The issue of printing out the node ID was due to the drupal_set_message functional call. The code was simply passing the node ID to this function which results in not very informative output. I put in some text to indicate that this module was doing something.

3) Modified the urlicon_replace callback function to address the following:

After fetching the headers of the request for URL favicon, need to test if the response indicated the existance
of a favicon. So first test is if HTTP response code of 200. Also tested if Content-Length was > 0. Using drupal.org
as a test returned HTTP 200 but zero-length content. Not sure how to get the drupal.org favicon :)

Next test expanded the Content-Type from image/x-icon to also include text/plain. I came across this when testing
with www.nytimes.com . Not sure why they return text/plain but to make it work I added that.

Only if the above conditions are met is the CSS file edited.

TODO

Right now this filter will run for both the trimmed and full version of a node. This results in two <link> entries in the resulting HTML as well as double entries in the node's CSS file. Ideally would be able to just make a single entry.

sanduhrs’s picture

Assigned: Unassigned » sanduhrs

Thanks for the input. I will have a look at it tonight.
vg

sanduhrs’s picture

Status: Needs review » Fixed

Based on the patch I did some improvements on the code and commited it to HEAD.
- Improved performance if favicon already availlable.
- Improved regex
- Added support for Content-Type: text/plain
- suppress warnings in case of invalid urls

Thanks.

jaydub’s picture

Status: Fixed » Active

I came across something else after submitting the patch to you...the regex will capture internal links which will not have the hostname. i see that you've added in an IF test for url['host'] however you could probably just add in the 'http' string as part of the regex to reduce the number of calls (helpful in a heavily internal linked page).

so, (.+?) becomes (http:\/\/.+?) or even just (http.+?). This is after the search for the 'href'. This will also mean that we will avoid looking up any non-http links such as ftp or mailto.

sanduhrs’s picture

Status: Active » Fixed

You're absolutely right.
Changed the regex accordingly and commited it.
Thanks.

Anonymous’s picture

Status: Fixed » Closed (fixed)