This is the first project of lolandese on drupal.org. It is going through a one-time Project Application process now. After approval it will be promoted as a full project.

You are welcome to:

  • use the module like any other project. See Installing sandbox projects for instructions or grab the latest package here and install as usual.
  • create issues as usual.
  • post an issue as a comment (only if this project is still a sandbox) on the Project Application page where you will find all initial issues for this project.

Comments

lolandese’s picture

Issue summary: View changes

Readability

dman’s picture

Any sane person looking at the one-liner in the current code

$text = preg_replace_callback('~<((div ([^>]*)(class\\s*=\\s*["\'](nofloat|' . trim($rejector[0]) . '|' . trim($rejector[1]) . ')["\'])([^>]*)>(.*?)</div)|(span([^>]*)(class\\s*=\\s*["\'](float|' . trim($selector[0]) . '|' . trim($selector[1]) . ')["\'])([^>]*)>(.*?)</span)|(a[^>]*)(href=)([^>]*)(><img[^>]*>(.*?)(</img>)?</a)|(img (.+?)))>~is', 'autofloat_filter_callback', $text);

Would be doing you a favor by linking you to why parsing HTML with regular expressions is not just a bad idea, but downright diabolical in non-trivial cases.
Hat tip to coding horror - which this is.

I'm not saying it's not clever.
I'm not saying that it doesn't work for you on your current one use-case.

But I'm warning you (as someone who has been terribly guilty of similar sins in the past) - this is madness.

lolandese’s picture

I agree 100% and maybe even more. I would be grateful for some specific hints how to use an XML-parser instead of a regex for this case. It would allow to add a class to the existing element instead of wrapping it in a 'span'. I looked into it but couldn't form a clear idea on how to do that. Adding a dependency on the QueryPath module would be okay if that should be the way to go.

dman’s picture

I have heard love for QueryPath and agree that may be a good track to follow.

Personally, having been doing this since many years before QueryPath became available, I've developed a habit of combining DOM XML parsing and XPATH queries.
They never work out quite as concise as they should be but I do at least know I'm getting *exactly* what I'm looking for. Or just crashing due to XML encoding issues. It was a crapshoot in PHP4 - but at least I got either a correct rewrite or a known failure - not random occasional half-borked results.

I'm not fully recommending my method - but I am at least able to defend it above RegExps.

this code example from the file_ownage project is a little newer and takes advantages of stuff now in PHP5. This is how I
* find img tags in a doc
* rewrite them to modify attributes.
* ... actually that's it.

Again, it's only tangentially similar to what you are doing .... but thought I'd throw it out as at least an alternative approach.

.. but if QueryPath lives up to the hype - certainly investigate that too.

lolandese’s picture

Having some example code helps a lot, especially because it's commented nicely.
Thanks for putting me on the right track.

lolandese’s picture

Version: » 7.x-1.x-dev
Status: Active » Fixed

Slightly specious (fallacious) arguments not to use an XML-parser:

  • performance concern of parsing the DOM several times in iteration
  • library dependency to do it right
  • it makes the code at least twice as big and three times as complex
  • adding a span wrapper can hardly be considered HTML manipulation. It can't have dramatic consequences if spans are added around the wrong elements or not added around the right ones.

..but the real reason is that my level of coding is not up for the task yet for this use case. Using an XML-parser within Drupal modules is high on my wishlist though. It is one of those capabilities that allow to build the real interesting modules and patches.

I enhanced the readability of the regex by adding an x-modifier to be able to split it over multiple lines and add inline comments. More info at http://net.tutsplus.com/tutorials/php/advanced-regular-expression-tips-a...

During the application process none of the reviewers perceived using a RegEx as a problem. Let's see what happens now that AutoFloat turned into a full project. For this reason only a development version is released for now. If issues arise due to the use of a regular expression, I should consider to make a substantial time-investment to study and build a release that uses an XML-parser instead. Meanwhile patches are more than welcome.

lolandese’s picture

Issue summary: View changes

Updated issue summary.

dman’s picture

I understand that it may not be a perfect fit - was just a peer-review sort of alternative suggestion for your consideration.
Yes, it's got a heck of a learning curve, and some performance implications unless you think ahead (it's possible to parse just once).
I won't dispute your choice. :-B

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Anonymous’s picture

Issue summary: View changes

Changed the download link to point to the official release.

lolandese’s picture

Issue summary: View changes
Related issues: +#1955308: Better parsing method

It has been a while, but I opened a 2.x branch that uses XPath instead of a regular expression.

I have to contradict what I said in #5:

  • It does not have a dependency on other modules or libraries.
  • It has less code lines, 31 insertions(+), 50 deletions(-)

Feedback is appreciated.

Thanks for the help previously provided.