By Fluffy Convict on
Drupal currently lacks proper UBB support, so I thought it would be a good idea to write a stack-based UBB parser. This is my first one, and it is proving to be a challenge. I am stuck now and could really use some help. I have two questions about the code below:
- The code gets stuck in a never-ending loop when the first tag in $text is a closing tag (e.g. [/b]). Why, and what solution would you suggest?
- I want the parser to support "double tags" (tags that require an opening and a closing tag, like [b]) and "single tags" (like [hr], and perhaps ones you define yourself). I'm not sure how to implement the handling of single tags here. Any suggestions?
Eventually, I want to make the parser extensible by modules through a _ubb hook, e.g. [my_tag property="value"] would be handled by my_tag_ubb(array('property' => 'value')).
<?php
$text = 'The [b foo="fVal"]quick [i bar="bVal"]brown[/i] [u]fox[/b] jumped over the
[b]lazy[/b] dog. [img src="http://www.google.com/intl/en_ALL/images/logo.gif"]';
echo ubbParse($text);

function ubbParse($text) {
  $validDoubleTags = array('b', 'i', 'foo', 'span', 'table', 'tr', 'td', 'cite');
  $validSingleTags = array('img', 'hr');
  $lastNode = array();
  $stack = array();
  $size = 0;
  while (false !== ($node = ubbSearchTag($text))) {
    // First node, special conditions
    if (empty($lastNode)) {
      // Text before node on stack
      array_push($stack, substr($text, 0, $node['OPEN']));
      $size = array_push($stack, $node); // First node on stack
    }
    // All but the first node
    else {
      // Opening tag
      if ('/' != substr($node['TAG'], 0, 1)) {
        // Text before node on stack
        array_push($stack, substr($text, $lastNode['CLOSE'] + 1, $node['OPEN'] - 1 - $lastNode['CLOSE']));
        // Add node to the stack
        $size = array_push($stack, $node);
      }
      // Closing tag
      else {
        $prevText = '';
        // Grab all text on the end of the array
        $prevText = _popText($stack);
        // Text from lastNode to current node (fox, lazy)
        $prevText .= substr($text, $lastNode['CLOSE'] + 1, $node['OPEN'] - 1 - $lastNode['CLOSE']);
        $match = array();
        do {
          $match = array_pop($stack);
          if (in_array($match['TAG'], $validDoubleTags)) {
            $output = "<{$match['TAG']}";
            if (isset($match['ATTR'])) {
              foreach ($match['ATTR'] as $id => $attribute) {
                $output .= " $id=\"$attribute\"";
              }
            }
            $output .= ">{$prevText}</{$match['TAG']}>";
            $prevText = _popText($stack) . $output;
          }
          else {
            // Do not automagically close the tag
            $output = "[{$match['TAG']}]{$prevText}";
            $prevText = _popText($stack) . $output;
          }
        }
        while ($match['TAG'] != substr($node['TAG'], 1));
        array_push($stack, $prevText);
      }
    }
    $lastNode = $node;
  }
  $stack[] = substr($text, $lastNode['CLOSE'] + 1, strlen($text) - 1 - $lastNode['CLOSE']);
  return implode('', $stack);
}

/**
 * Every time it's called, it returns the next tag in $text
 *
 * @param string $text
 * @return array|false UBB Node or false when no more nodes
 */
function ubbSearchTag($text) {
  static $curr = 0;
  $length = strlen($text);
  $open = strpos($text, '[', $curr);
  $close = strpos($text, ']', $curr);
  if (0 == $close) {
    return false;
  }
  $slice = substr($text, $open, $close - $open);
  $tag = array();
  preg_match('/\[([^ \[\]]*) ?/i', $slice, $tag);
  $attributes = array();
  preg_match_all('/ ([^="\[\]]*)="([^="]*)"/i', $slice, $attributes);
  $node = array(
    'TAG' => strtolower($tag[1]),
    'OPEN' => $open,
    'CLOSE' => $close,
  );
  foreach ($attributes[0] as $key => $attribute) {
    $node['ATTR'][$attributes[1][$key]] = $attributes[2][$key];
  }
  unset($tag, $attributes);
  $curr = $close + 1;
  return $node;
}

/**
 * Pops off all array elements that only contain text and return them all as one
 * string
 */
function _popText(&$stack) {
  $text = '';
  while (!is_array($stack[count($stack) - 1])) {
    $text = array_pop($stack) . $text;
    if (0 == count($stack)) {
      break;
    }
  }
  return $text;
}
?>
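One possible way to approach both questions, offered as a sketch rather than a drop-in fix for the code above. As far as I can tell, the do ... while loop keeps popping until it finds the matching opening tag, so a closing tag whose opener is not on the stack ends up popping an empty stack forever. The sketch below checks the stack before unwinding, and it handles single tags by emitting them immediately without ever pushing them. The names ubbOpenTag() and ubbRender() are hypothetical, and the tag syntax accepted by the regular expression is an assumption.

```php
<?php
// Sketch only: ubbOpenTag() and ubbRender() are hypothetical names, not part
// of the posted parser, and the syntax accepted by $pattern is an assumption.

/** Builds '<tag attr="value"' without the closing bracket. */
function ubbOpenTag($tag, array $attr) {
  $html = '<' . $tag;
  foreach ($attr as $name => $value) {
    $html .= ' ' . $name . '="' . htmlspecialchars($value) . '"';
  }
  return $html;
}

function ubbRender($text, array $doubleTags, array $singleTags) {
  // Frame 0 is the document root; every open double tag adds a frame.
  $stack = array(array('tag' => null, 'attr' => array(), 'raw' => '', 'out' => ''));
  $pattern = '/\[(\/?)([a-z0-9_]+)((?:\s+[a-z0-9_]+="[^"]*")*)\s*\]/i';
  $pos = 0;
  while (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE, $pos)) {
    $top = count($stack) - 1;
    // Plain text between the previous tag and this one.
    $stack[$top]['out'] .= substr($text, $pos, $m[0][1] - $pos);
    $pos = $m[0][1] + strlen($m[0][0]);
    $raw = $m[0][0];
    $closing = ('/' === $m[1][0]);
    $tag = strtolower($m[2][0]);
    $attr = array();
    preg_match_all('/([a-z0-9_]+)="([^"]*)"/i', $m[3][0], $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
      $attr[strtolower($pair[1])] = $pair[2];
    }

    if (!$closing && in_array($tag, $singleTags)) {
      // Single tag: emit right away, never push it on the stack.
      $stack[$top]['out'] .= ubbOpenTag($tag, $attr) . ' />';
    }
    elseif (!$closing && in_array($tag, $doubleTags)) {
      $stack[] = array('tag' => $tag, 'attr' => $attr, 'raw' => $raw, 'out' => '');
    }
    elseif ($closing && in_array($tag, array_column($stack, 'tag'), true)) {
      // The opener is known to be on the stack, so this loop must terminate.
      do {
        $frame = array_pop($stack);
        $top = count($stack) - 1;
        if ($frame['tag'] === $tag) {
          $stack[$top]['out'] .= ubbOpenTag($tag, $frame['attr']) . '>' . $frame['out'] . "</$tag>";
        }
        else {
          // Unclosed inner tag: put it back as literal text.
          $stack[$top]['out'] .= $frame['raw'] . $frame['out'];
        }
      } while ($frame['tag'] !== $tag);
    }
    else {
      // Unknown tag, or closing tag without an opener: keep it as text.
      $stack[$top]['out'] .= $raw;
    }
  }
  $stack[count($stack) - 1]['out'] .= substr($text, $pos);

  // Tags still open at the end of the text are written back literally.
  $result = '';
  foreach ($stack as $frame) {
    $result .= $frame['raw'] . $frame['out'];
  }
  return $result;
}
```

With b as a double tag and hr as a single tag, ubbRender('[/b]x', array('b'), array('hr')) returns the input unchanged, and ubbRender('a[b]c[hr]d[/b]e', array('b'), array('hr')) returns 'a<b>c<hr />d</b>e'. Keeping the raw tag text in each frame is what lets unmatched tags be restored exactly as typed.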
Any help would be greatly appreciated (and credited, of course :))!
Comments
Interest for DRUSL?
I have been thinking about this since I came across the wonderful macrotags module. Eventually I would like to turn this parser into more than just a parser for UBB tags. I would like to create something called the Drupal Scripting Language (DRUSL?). I think it could be of great use, and it is currently missing. Please read on.
If you want to enable users of your website to create content, Drupal does not make it easy for them to insert different content parts into their story. In my view, it should be easy to do something like this:
You get the point. By default, the parser will convert UBB tags into XHTML. Modules can define their own tags by implementing a parser_tag hook; for example, image.module would implement parser_gallery(). Parsing the text above, the parser would call parser_gallery(array('album' => 'vacation 2008', 'thumb' => 'small')). That function could return either the correct output or an error to the parser.
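The hook lookup described above could be sketched roughly as follows. This is only an illustration under assumptions: parserInvokeTag() is a hypothetical dispatcher name, the parser_gallery() body is a stand-in for whatever image.module would really return, and signalling an error by returning false is one possible convention, not something the proposal fixes.

```php
<?php
// Sketch only: parserInvokeTag() is a hypothetical dispatcher, and the
// "return false on error" convention is an assumption.

/**
 * Calls parser_<tag>() if a module defines it; otherwise, or when the
 * handler reports an error, the original tag text is returned untouched.
 */
function parserInvokeTag($tag, array $attributes, $raw) {
  $callback = 'parser_' . $tag;
  // Only accept simple tag names before building a function name from them.
  if (!preg_match('/^[a-z0-9_]+$/', $tag) || !function_exists($callback)) {
    return $raw;
  }
  $result = $callback($attributes);
  return is_string($result) ? $result : $raw;
}

/** Stand-in for a handler that image.module might provide. */
function parser_gallery(array $attributes) {
  if (empty($attributes['album'])) {
    return false; // Error: the parser keeps the tag as typed.
  }
  $thumb = isset($attributes['thumb']) ? $attributes['thumb'] : 'small';
  return '<div class="gallery" data-album="' . htmlspecialchars($attributes['album'])
    . '" data-thumb="' . htmlspecialchars($thumb) . '"></div>';
}
```

Falling back to the raw tag text means an unknown or misused tag stays visible to the author instead of silently disappearing.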
I have written some pseudocode for this and am working on the parser itself (which is going to be my first stack-based one). I do need help developing this module, though. I hope some of you are interested; I think it would be a real addition to content creation!
What do you think?