Stripping out junk from embed codes (regex between tags)

Last updated April 24, 2012. Created by rootwork on April 24, 2012.

I had a site where users could embed content from a third-party provider. The provider's default embed code contained a lot of junk markup: a wrapping div with forced styles, plus links to the provider's site, to the user's profile on that site, and to a listing page for every tag on the content.

I used the Computed Field module to strip out everything except the content between the <object> tags, and it worked well.

First, create a plain text field for the embed code. Then create a computed field using the snippet below. The snippet assumes the plain text field is named embed_code, which it reads as field_embed_code.

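<?php
// Grab the embed code from the plain text field.
$body = $node->field_embed_code[0]['value'];
// Collapse each run of two or more whitespace characters into one space.
$body = preg_replace('/\s\s+/', ' ', $body);
// Capture everything between the first pair of object tags. The "i" flag
// ignores case; the "s" flag lets "." match any remaining line breaks.
$pattern = '/<object[^>]*>(.*?)<\/object>/is';
if (preg_match($pattern, $body, $matches)) {
  // Add back bare object tags (the original attributes are dropped) and
  // store the result.
  $node_field[0]['value'] = '<object>' . $matches[1] . '</object>';
}
else {
  // No object tags found: store an empty value instead of broken markup.
  $node_field[0]['value'] = '';
}
?>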

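Note that preg_match() stops at the first pair of tags, so only the first match is returned even if the embed code contains several. $matches[0] holds the whole match including the tags, and $matches[1] holds the first capture group, which is the content between them.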
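If the embed code might contain more than one pair of object tags, preg_match_all() can collect all of them. The following is only a minimal sketch: the function name embed_object_contents() and the sample markup are made up for illustration and are not part of Computed Field.

```php
<?php
// Hypothetical helper (not part of Computed Field): returns the inner
// content of every <object>...</object> pair found in $html.
function embed_object_contents($html) {
  // Collapse each run of two or more whitespace characters into one space.
  $html = preg_replace('/\s\s+/', ' ', $html);
  // preg_match_all() collects every match; index 1 of the result holds
  // capture group 1 for each match, in document order.
  preg_match_all('/<object[^>]*>(.*?)<\/object>/is', $html, $matches);
  return $matches[1];
}

// Made-up sample input with two object pairs and some junk around them.
$sample = '<div><object width="1"><param name="a"></object> junk '
  . '<object><embed src="b"></object></div>';
$contents = embed_object_contents($sample);
// $contents[0] is the inside of the first pair, $contents[1] the second.
```

In a computed field you could then pick one entry, or wrap each entry in object tags and join them, depending on what the field should display.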

The same approach works for the content between any pair of tags: replace object in the pattern with the name of the tag you want to match.
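As a rough sketch of that generalization, the tag name can be passed in as a parameter. The function name content_between_tags() and the sample markup are hypothetical, chosen only for this example. Like the snippet above, it relies on a regular expression, so it will not cope with tags of the same name nested inside each other.

```php
<?php
// Hypothetical helper: returns the content of the first <$tag>...</$tag>
// pair in $html, or NULL when no such pair is present.
function content_between_tags($html, $tag) {
  // preg_quote() escapes regex metacharacters in the tag name; the second
  // argument also escapes the "/" delimiter.
  $name = preg_quote($tag, '/');
  // \b keeps "p" from matching "<pre>"; "i" ignores case and "s" lets "."
  // match line breaks.
  $pattern = '/<' . $name . '\b[^>]*>(.*?)<\/' . $name . '>/is';
  if (preg_match($pattern, $html, $matches)) {
    return $matches[1];
  }
  return NULL;
}

// Made-up sample input: an iframe wrapped in a styled div.
$html = "<div class=\"x\"><iframe src=\"http://example.com/v\">\nfallback</iframe></div>";
$inner = content_between_tags($html, 'iframe');
```

Returning NULL when nothing matches lets the calling code decide whether to store an empty value or keep the original embed code.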

About this page

Drupal version: Drupal 6.x
Level: Beginner
Audience: Programmers
Keywords: computed field

Drupal’s online documentation is © 2000-2013 by the individual contributors and can be used in accordance with the Creative Commons License, Attribution-ShareAlike 2.0. PHP code is distributed under the GNU General Public License. Comments on documentation pages are used to improve content and then deleted.