Hey,

One thing I've always been useless at is regular expressions. Basically I need a regex to parse out three elements of an anchor (a) tag: the URL, the title attribute, and the link text between the a tags. So:

<a href="/my/url" title="a description">link text</a>

becomes an array:

array('url' => '/my/url', 'title' => 'a description', 'text' => 'link text')

or somesuch. Any help is appreciated, thanks!

Comments

he_who_shall_not_be_named

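<?php
  $url = '<a href="/my/url" title="a description">link text</a>';
  $a = array();

  // First pass: capture the href value (group 1) and the link text (group 2).
  if (preg_match('/<a[\s]+.*href[\s]*=[\s]*["\']([^"\']*)["\'].*>(.*)<\/a>/miU', $url, $matches)) {
    $a['url'] = $matches[1];
    $a['text'] = $matches[2];
  }

  // Second pass: capture the title value, wherever it sits inside the tag.
  if (preg_match('/<a[\s]+.*title[\s]*=[\s]*["\']([^"\']*)["\'].*>.*<\/a>/miU', $url, $matches)) {
    $a['title'] = $matches[1];
  }

  print_r($a);
?>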
profix898

I think you can capture all three at once. Why not?

<?php
  $url = '<a href="/my/url" title="a description">link text</a>';
  $a = array();
  // One pass: href (group 1), title (group 2), link text (group 3).
  // The pattern expects href to come before title.
  if (preg_match('/<a[\s]+.*href=[\']?[\"]?([^\"\']*)[\']?[\"]?[\s]+.*title=[\']?[\"]?([^\"\']*)[\']?[\"]?>([^<]*)<\/a>/miU', $url, $matches)) {
    $a['url'] = $matches[1];
    $a['title'] = $matches[2];
    $a['text'] = $matches[3];
  }
  print_r($a);
?>

There is a great tool available at http://www.weitz.de/regex-coach/ which can help you develop new regexps. For your purpose, try its 'split' option.

he_who_shall_not_be_named

Change

$url = '<a href="/my/url" title="a description">link text</a>';

to

$url = '<a title="a description" href="/my/url">link text</a>';

which is the same link (title swapped with href), but your regex can't parse that :)
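
One way around the ordering problem, as a minimal sketch rather than a definitive answer: match the tag first, then collect its attributes in a second pass, so their order stops mattering. The helper name parse_anchor() is made up for this example, and the sketch assumes quoted attribute values with no > inside them.

```php
<?php
// Hypothetical helper (not from the thread): returns url, title and text,
// or null when the input contains no anchor tag.
function parse_anchor($html) {
  // Group 1: the raw attribute string. Group 2: the link text.
  if (!preg_match('/<a\b([^>]*)>(.*?)<\/a>/is', $html, $tag)) {
    return null;
  }

  // Collect every name="value" or name='value' pair, in whatever order.
  // \2 makes the closing quote match the opening one.
  $attrs = array();
  preg_match_all('/([\w-]+)\s*=\s*(["\'])(.*?)\2/s', $tag[1], $pairs, PREG_SET_ORDER);
  foreach ($pairs as $pair) {
    $attrs[strtolower($pair[1])] = $pair[3];
  }

  return array(
    'url'   => isset($attrs['href']) ? $attrs['href'] : null,
    'title' => isset($attrs['title']) ? $attrs['title'] : null,
    'text'  => $tag[2],
  );
}

print_r(parse_anchor('<a title="a description" href="/my/url">link text</a>'));
```

Both attribute orders give the same array. For anything beyond simple, well-formed links, an HTML parser is usually a safer choice than a regex.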

pfaocle

Excellent, thank you very much. Here's the application:

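  foreach ($links as $link) {
    // Pull the title attribute out of the rendered link; use an empty string when there is none.
    $title = preg_match('/<a[\s]+.*title[\s]*=[\s]*["\']([^"\']*)["\'].*>.*<\/a>/miU', $link, $matches) ? $matches[1] : '';
    $links_out .= '<div class="pm-item">'. $link .'<span class="small">'. $title .'</span></div>';
  }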

which sits in a PHPTemplate template file for the primary/secondary links, allowing me to insert the title attribute of the link underneath the link itself in the theme.

Thanks again!

---
paul byrne
paul.leafish.co.uk | www.leafish.co.uk

felipensp

You don't need the 'm' and 'U' modifiers.
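
A small sketch of why (the sample string is made up for illustration): 'm' only changes how ^ and $ behave, and the patterns in this thread use neither, while 'U' just flips greedy and lazy matching, which a ? after the quantifier does as well.

```php
<?php
// Made-up sample with two links, so greedy and lazy matching differ.
$html = '<a href="/a">one</a> <a href="/b">two</a>';

// Default (greedy): (.*) runs on to the last </a>.
preg_match('/>(.*)<\/a>/', $html, $greedy);

// 'm' changes nothing here, because the pattern has no ^ or $.
preg_match('/>(.*)<\/a>/m', $html, $multiline);

// 'U' flips greediness, so (.*) stops at the first </a>.
preg_match('/>(.*)<\/a>/U', $html, $ungreedy);

// Same result without 'U', using a lazy quantifier.
preg_match('/>(.*?)<\/a>/', $html, $lazy);

var_dump($greedy[1] === $multiline[1]); // bool(true)
var_dump($ungreedy[1] === $lazy[1]);    // bool(true)
echo $lazy[1], "\n";                    // one
```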

My suggestion:

  foreach ($links as $link) {
    // Group 1: double-quoted value, group 2: single-quoted value, group 3: unquoted value.
    preg_match('/<a\s+(?:(?!title=).)*title\s*=\s*(?:\x22((?:(?!(?<!\\\)\x22).)+)|\x27((?:(?!(?<!\\\)\x27).)+)|([^\x22\x27\s>]+))/i', $link, $matches);
    if (!empty($matches[1])) {
      $title = $matches[1];
    } elseif (!empty($matches[2])) {
      $title = $matches[2];
    } elseif (!empty($matches[3])) {
      $title = $matches[3];
    } else {
      $title = ''; // no title attribute found
    }
    $links_out .= '<div class="pm-item">'. $link .'<span class="small">'. $title .'</span></div>';
  }

It accepts these valid values:

<a href="foo.php" title="foo 'foo">foo</a>
<a href="foo.php" title='foo "foo"'>foo</a>
<a href="foo.php" title=foo>foo</a>
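
A quick way to check those three cases, as a sketch: extract_title() is a hypothetical wrapper around the same pattern, not something from the original post.

```php
<?php
// Hypothetical wrapper around the title pattern above.
// Returns the title value, or null when the link has no title attribute.
function extract_title($link) {
  $pattern = '/<a\s+(?:(?!title=).)*title\s*=\s*(?:\x22((?:(?!(?<!\\\)\x22).)+)|\x27((?:(?!(?<!\\\)\x27).)+)|([^\x22\x27\s>]+))/i';
  if (!preg_match($pattern, $link, $matches)) {
    return null;
  }
  // Only one of the three alternatives can match; return whichever did.
  for ($i = 1; $i <= 3; $i++) {
    if (isset($matches[$i]) && $matches[$i] !== '') {
      return $matches[$i];
    }
  }
  return null;
}

$samples = array(
  '<a href="foo.php" title="foo \'foo">foo</a>',   // double-quoted value
  '<a href="foo.php" title=\'foo "foo"\'>foo</a>', // single-quoted value
  '<a href="foo.php" title=foo>foo</a>',           // unquoted value
);

foreach ($samples as $sample) {
  var_dump(extract_title($sample));
}
```

Comparing against '' instead of using empty() also keeps a title of "0", which empty() would treat as missing.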