removing anchor tags from node bodies

By bschoudel on 27 Feb 2009 at 02:35 UTC

I'm using FeedAPI to create nodes from various RSS feeds.

Some of the feeds produce sloppy node descriptions that put many unwanted anchor tags into the node body.

What would be the best way to strip these anchor tags out?

Comments

Find anything

emackn commented 28 October 2009 at 18:34

You ever dig up anything about this?

REGEX or REGEXP to remove tags

jghyde commented 22 January 2010 at 01:55

I ran into a similar problem, only I wanted to strip out all image tags that were embedded in the $content variable of node.tpl.php. So I looked at the raw HTML and saw that it wrapped a div around each image. It looked like this:

<div style="width: 604px" class="image-attach-body"><a href="/image/joe-hyde"><img src="http://www.hydeinteractive.com/sites/default/files/images/me1.jpg" alt="Joe Hyde" title="Joe Hyde"  class="image image-preview " width="604" height="402" /></a></div>

The common traits? Each image was wrapped inside a div with the same class name:

<div ... class="image-attach-body" ...> ... </div>

And so, not wanting to get into the preprocess functions in template.php, I decided that the quickest way to get rid of those images was to use the good ol' PHP function preg_replace. All I had to do was find a regex pattern that matched a div with that class name and replace the tag and everything in it, images included, with, well, nuthin' (i.e. "").

I searched for a good replacement regexp (regex) on the Web.

This is a very good regular expression tool to test your output: http://regex.larsolavtorvik.com/
and I found a good pattern here:
http://stackoverflow.com/questions/226562/how-can-i-remove-an-entire-htm...

I then applied the pattern to the $content variable.

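$content = preg_replace('/<div[^>]*class=\"image-attach-body\"[^>]*>(.*?)<\/div>/is', '', $content);

Note: the "i" modifier ignores case, and the "s" modifier lets "." match newlines, so the match can span multiple lines. (The "m" modifier only changes where ^ and $ match, so it does not help when the div is spread over several lines.)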
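For anyone who wants to try the pattern outside of a template, here is a minimal sketch of the difference between the "s" and "m" modifiers. The function name strip_image_attach_divs() and the sample markup are made up for illustration; they are not part of Drupal or the image module.

```php
<?php
// Hypothetical helper: removes every div with class "image-attach-body",
// including everything inside it.
function strip_image_attach_divs($html) {
  // "i" ignores case; "s" lets "." match newlines so a div may span lines.
  return preg_replace('/<div[^>]*class="image-attach-body"[^>]*>.*?<\/div>/is', '', $html);
}

// Made-up sample standing in for the $content variable of node.tpl.php.
$content = "<p>Intro text.</p>\n"
  . "<div style=\"width: 604px\" class=\"image-attach-body\">\n"
  . "  <a href=\"/image/joe-hyde\"><img src=\"me1.jpg\" alt=\"Joe Hyde\" /></a>\n"
  . "</div>\n"
  . "<p>Body text.</p>";

// With "s" the whole multi-line div is removed.
$with_s = strip_image_attach_divs($content);

// With "m" instead of "s", "." stops at the first newline, so nothing matches
// and the markup comes back unchanged.
$with_m = preg_replace('/<div[^>]*class="image-attach-body"[^>]*>.*?<\/div>/im', '', $content);

print $with_s;
```

One caveat: the lazy ".*?" stops at the first closing </div>, so this only behaves when the wrapper does not contain nested divs.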

It worked!

Now, when you

print $content;

inside node.tpl.php, it no longer displays the images!
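The same preg_replace approach could probably be adapted to the original question about anchor tags. This is only a sketch: strip_anchor_tags() is a hypothetical helper name, not a Drupal or FeedAPI function, and it assumes you want to keep the link text and drop only the tags themselves.

```php
<?php
// Hypothetical helper: removes opening and closing <a> tags but keeps
// whatever text they wrap.
function strip_anchor_tags($html) {
  // Matches <a>, <a ...> and </a> in any case. The "a" must be followed by
  // whitespace or ">", so tags such as <abbr> or <address> are left alone.
  return preg_replace('/<\/?a(\s[^>]*)?>/i', '', $html);
}

// Made-up sample node body.
$body = '<p>Read <a href="http://example.com/story">the full story</a> or <A HREF="/more">more</A>.</p>';

print strip_anchor_tags($body);
// prints: <p>Read the full story or more.</p>
```

Where you call it is up to you; in node.tpl.php it would be applied to $content just before printing, the same way as the div pattern above.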

And I am so stoked.

Joe
http://www.hydeinteractive.com/

Local News Platform Built on Drupal
http://sanangelolive.com/
