So, uploading stuff to my actual web server at work showed me something interesting:
***
warning: Compilation failed: unrecognized character after (?< at offset 11 in /var/www/html/sites/all/modules/mmedia/contrib/media_attach/media_attach.module on line 453.
***
25 times

The big difference is that the server runs RHEL 4 (IIRC), with goodies like php 4.3.9. So - I'm curious about what this unrecognized character might be (to my untrained eye it looks like '_'). Anyway, the module didn't spew these errors before chritmas - apart from that I have no clue. Do you?

CommentFileSizeAuthor
#5 media_attach.module.patch1.25 KBIbn al-Hazardous

Comments

Ibn al-Hazardous’s picture

As far as I can tell - it's a problem with the regexp on line 453. I'm looking into it.

Ibn al-Hazardous’s picture

/\[attach(?:(?:(?:(?:\w+="(?:\\\\"|[^"]|)+"));*)+|)\]/

Hmm, that was a nice one... ;)

If you explain what it is you want to match - I can write a regexp that's compatible with 4.3 (I have the advantage of having a machine to test it on).

rhys’s picture

The idea is to match the following sorts of tags:

  • [attach]
  • [attach:setting="value"]
  • [attach:setting1="\"value in quotes\"";setting2="another setting";setting3="and so on"]

The first provides a display mechanism for all of the attachments.
The second is simply with one setting.
The third is multiple settings, with escaped quotes within values allowed.

This covers quite a bit, and will allow for things such as playlisted displays, slideshows, etc., to be performed. I hope that by doing this, we can also improve the method of how we display things, utilizing things such as what you wrote, to provide mechanism to deal with these things (including default transforms, with override capacity, selection of presenter, etc).

This tagging is different to the general filter mechanism, since it has to take into account whether it's in a teaser or a body.

Ibn al-Hazardous’s picture

Ah, I can see how a more featurefull regexp makes it simpler to parse this kind of tag. I'm digging through the code - and I'll be back when I come up with something that works on older PHP too. :)

Edit: Oh, the very interesting thing is that regexp no 2 works fine:

'/(?<key>\w+)=(?<value>"(?:\\\\"|[^"]|)+")/'

Edit again: part of this is swallowed by the tag-filter - let's see if this works better.

Last edit: regexp no 2 didn't work fine, it just didn't throw an error - it did nothing at all with php4 :(

Ibn al-Hazardous’s picture

StatusFileSize
new1.25 KB

Ok, here's a patch. It populates $settings with the oldfashioned split-technique. Works on the examples you posted before. :)

rhys’s picture

Status: Active » Needs work

The difficulty I have with the changes that you have in the patch, is that a badly written tag can still be passed. This leads to some serious problems with XSS, or other potential problems, including that there is no possibility to have a ] within one of the settings within the tag.

This is why there was a more detailed regex.

What I'm going to try, is to take out the special regex specifiers (?, and ?:), although I'm fairly sure that ?: functions for PHP4 (I saw it being used with core).

I really want to be very specific for faulty created tags, and the patch you supplied doesn't do enough error checking.

Ibn al-Hazardous’s picture

Ah, I can see your point. It mustn't match the next ] but the next ] that is not enclosed in "". And in the second instance it mustn't split on all ; but again only on those not enclosed in "". That makes your regexp a lot clearer to me. <:)

So, a reasonable test-set would look somtheing like this

Text[attach] more [attach:setting="test for evi] trap"]  don't attach] [attach:setting1="more \" and \" more";setting2="and ; more";setting3="and the last = one"].

And this should yield (if we put a print_r in the loop):

$settings=( )
$settings=(
  'setting'=>'test for evi] trap'
)
$settings=(
  'setting1'=>'more \" and \" more',
  'setting2'=>'and ; more',
  'setting3'=>'and the last = one'
)

Right? Any more test cases I haven't thought of?

What if a malicious entity inserts funny characters in the key?
[attach:set]ing="gottcha!"]
Should we just invalidate all of the tag then?
Can the key be named anything, or just "setting\d*"?
And lastly, are both single and double quotes ok for enclosing the setting value?

rhys’s picture

Basically, the idea would be to invalidate any tag which does not meet the regex standards (i.e. completely ignore -- leaving it to be dealt with by the drupal system).

The key can be named anything that is a "\w\d*" sequence.

I wasn't planning on supporting both styles of quotes (since this complicates the regex somewhat unnecessarily) - only the double quotes. If you can think of a reason, and a simple way to make it function, then feel free to add it. (Since I'll then try to update the general media filter to become somewhat closer in style).

rhys’s picture

Status: Needs work » Fixed

I've changed it so that the regex doesn't used the (?<> specifier... This should now function with PHP4.
I'll do the same with the media filter also, as it has the same problem.

Updated in the CVS. Actually quite a few things where added.

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.