No Picture links supported

BEfH - March 7, 2007 - 07:51
Project:Related links
Version:5.x-1.0-beta
Component:Miscellaneous
Category:bug report
Priority:normal
Assigned:Unassigned
Status:active
Description

I use a link with a picture inside it on one of my static pages, but this module does not parse it correctly and shows the img tag in the block. :(
It would be OK if it looked for the alt or, better, the title attribute, or used the HTML code.
PS: Sorry for my bad English...
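
The alt/title fallback suggested above could look roughly like the following. This is only a sketch: `example_img_label()` is a hypothetical helper, not something the Related links module provides, and it assumes quoted attribute values with no nested quotes of the same kind.

```php
<?php
// Hypothetical helper (not part of the Related links module): return a text
// label for an <img> tag, preferring the title attribute over alt.
function example_img_label($img_tag) {
  foreach (array('title', 'alt') as $attribute) {
    // Match attribute="value" or attribute='value'. The lookbehind keeps
    // names such as data-alt from being mistaken for alt.
    $pattern = '/(?<![\w-])' . $attribute . '\s*=\s*(["\'])(.*?)\1/is';
    if (preg_match($pattern, $img_tag, $match) && trim($match[2]) !== '') {
      return trim($match[2]);
    }
  }
  // Neither attribute is present (or both are empty).
  return '';
}

$img = '<img src="http://example.com/cover.jpg" alt="cover of Some Book" height="75" width="50" />';
print example_img_label($img) . "\n";
// prints: cover of Some Book
```

The returned label could then stand in for the image as the link text in the block.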

#1

csc4 - March 28, 2007 - 11:06

This is actually a 4.7.x and 5.x issue.

Are there any regex gurus out there who could offer some help? I'm seeing this a lot as I use the amazontools module, and the links generated from the images are horrible:

<h2>Links from Article Text</h2><ul><li><a nicetitle="Matched text: &lt;a href=&quot;http://www.amazon.co.uk/gp/redirect.html%3FASIN=0743275284%26tag=googletag%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0743275284%253FSubscriptionId=1XFK01HK9NZWGPENWGG2&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;http://ec1.images-amazon.com/images/P/0743275284.01._SCTHUMBZZZ_.jpg&quot; height=&quot;75&quot; width=&quot;50&quot; alt=&quot;cover of The Writing on the Wall: Why We Must Embrace China as a Partner or Face It as an Enemy&quot; /&gt;&lt;/a&gt;" href="http://www.amazon.co.uk/gp/redirect.html%3FASIN=0743275284%26tag=googletag%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0743275284%253FSubscriptionId=1XFK01HK9NZWGPENWGG2">&lt;img src="http://ec1.images-amazon.com/images/P/0743275284.01._SCTHUMBZZZ_.jpg" height="75" width="50" alt="cover of The Writing on the Wall: Why We Must Embrace China as a Partner or Face It as an Enemy" /&gt;</a></li><li><a nicetitle="Matched text: &lt;a href=&quot;http://www.amazon.co.uk/gp/redirect.html%3FASIN=0743275284%26tag=googletag%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0743275284%253FSubscriptionId=1XFK01HK9NZWGPENWGG2&quot; target=&quot;_blank&quot;&gt;The Writing on the Wall: Why We Must Embrace China as a Partner or Face It as an Enemy&lt;br&gt;&lt;/a&gt;" href="http://www.amazon.co.uk/gp/redirect.html%3FASIN=0743275284%26tag=googletag%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0743275284%253FSubscriptionId=1XFK01HK9NZWGPENWGG2">The Writing on the Wall: Why We Must Embrace China as a Partner or Face It as an Enemy&lt;br&gt;</a></li></ul></div>

The original source it is parsing looks like this:

<table class="class=" amazontools_related=""><tbody><tr><td><a href="http://www.amazon.co.uk/gp/redirect.html%3FASIN=0743275284%26tag=googletag%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0743275284%253FSubscriptionId=1XFK01HK9NZWGPENWGG2" target="_blank"><img src="http://ec1.images-amazon.com/images/P/0743275284.01._SCTHUMBZZZ_.jpg" alt="cover of The Writing on the Wall: Why We Must Embrace China as a Partner or Face It as an Enemy" height="75" width="50"></a></td><td><a href="http://www.amazon.co.uk/gp/redirect.html%3FASIN=0743275284%26tag=googletag%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/0743275284%253FSubscriptionId=1XFK01HK9NZWGPENWGG2" target="_blank">The Writing on the Wall: Why We Must Embrace China as a Partner or Face It as an Enemy<br></a>author: Will Hutton<br>asin: 0743275284

I found http://drupal.org/node/53880#comment-101916 which suggested

$output = preg_replace('#<a href="/\?q=glossary[^"]+" title="[^"]+"><img src="/[^"]+" /></a>#', '', $output );

to strip out glossary image links, but I'm not sure how to change this so that it strips the img tags from Related Links. I've tried some things myself but I just don't seem to be getting anywhere.

I believe the issue is around line 103:
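
For what it's worth, the glossary pattern quoted above is very specific: it expects an href starting with /?q=glossary, a title attribute, an img whose src starts with "/", and a closing " />". The Amazon markup has an absolute src and extra attributes, so it would not match. A looser pattern that removes any img tag might look like the sketch below. `example_strip_img_tags()` is a hypothetical name, and the pattern assumes no ">" appears inside an attribute value.

```php
<?php
// Hypothetical helper: remove every <img ...> tag from an HTML fragment.
function example_strip_img_tags($html) {
  // "<img", a word boundary, then everything up to the next ">",
  // matched case-insensitively.
  return preg_replace('#<img\b[^>]*>#i', '', $html);
}

$anchor = '<a href="http://example.com/book" target="_blank"><img src="http://example.com/cover.jpg" alt="cover" height="75" width="50"></a>';
print example_strip_img_tags($anchor) . "\n";
// prints: <a href="http://example.com/book" target="_blank"></a>
```

Note that an image-only link is left with empty link text, so whatever calls this would probably want to skip such links or give them some other label.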

              if (in_array(RELATEDLINKS_PARSED, variable_get('relatedlinks_types', array(RELATEDLINKS_PARSED)))) {
                // Rather than parsing out only the URI + link text, an attempt is
                // made to retain any other attributes present.
                preg_match_all('#(<a [^>]+>[^<]+</a>)#', $node->body, $matches);
                if (count($matches[1])) {
                  $links = array();
                  // Check URIs for duplicates.
                  foreach ($matches[1] as $index => $link) {
                    preg_match('#href\s*=\s*["]*([^"\s>]*)#', $link, $match);
                    $link = rtrim($match[1], '/');
                    if (!in_array($link, $links)) {
                      $links[] = $link;
                    }
                    else {
                      // Unset duplicate.
                      unset($matches[1][$index]);
                    }
                  }
                  _relatedlinks_add_links($node->nid, $matches[1], RELATEDLINKS_PARSED);
                }
              }

I tried

                  foreach ($matches[1] as $index => $link) {
                    preg_match('#href\s*=\s*["]*([^"\s>]*)#', $link, $match);
                    $link = rtrim($match[1], '/');
                    $link = preg_replace('#<img src="/[^"]+" />#', '', $link);
                    if (!in_array($link, $links)) {
                      $links[] = $link;
                    }
                    else {
                      // Unset duplicate.
                      unset($matches[1][$index]);
                    }
                  }
but I don't seem to be getting anywhere.

Anyone out there good at regex?
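
One likely reason the attempt above has no effect: by the time preg_replace() runs, $link has already been overwritten with the bare href value, so there is no img markup left in it to remove. $link is also only used for the duplicate check, while $matches[1] is what gets passed to _relatedlinks_add_links(). On top of that, the pattern only matches an img whose src starts with "/" and which has no other attributes. A stand-alone sketch of the loop that edits the matched anchor HTML instead follows. `example_clean_links()` is a hypothetical name, and dropping links that end up with no text is an assumption about the desired behaviour, not something the module does.

```php
<?php
// Hypothetical stand-alone version of the loop; example_clean_links() is an
// illustrative name, not a function in the Related links module.
function example_clean_links($anchors) {
  $seen = array();
  foreach ($anchors as $index => $anchor) {
    // Same href extraction as in the module code quoted above.
    if (!preg_match('#href\s*=\s*["]*([^"\s>]*)#', $anchor, $match)) {
      unset($anchors[$index]);
      continue;
    }
    $uri = rtrim($match[1], '/');
    // Strip <img> tags from the anchor HTML itself, not from the URI.
    $text = preg_replace('#<img\b[^>]*>#i', '', $anchor);
    // Drop duplicates, and (assumption) links with no visible text left.
    if (in_array($uri, $seen) || trim(strip_tags($text)) === '') {
      unset($anchors[$index]);
      continue;
    }
    $seen[] = $uri;
    $anchors[$index] = $text;
  }
  return $anchors;
}

$anchors = array(
  '<a href="http://example.com/book" target="_blank"><img src="http://example.com/cover.jpg" alt="cover"></a>',
  '<a href="http://example.com/book" target="_blank">Some Book<br></a>',
);
// Only the second (text) link survives; the image-only link is dropped.
print_r(example_clean_links($anchors));
```

Because the image-only link is dropped before its URI is recorded, the text link to the same URI is still kept rather than being treated as a duplicate.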

#2

csc4 - January 21, 2008 - 10:06

Is there really no one out there who can help with this regex?

 
 
