Turn this sandbox into a full project. [#1680940]

This is the first project of lolandese on drupal.org. ~~It is going through a one-time Project Application process now.~~ After approval it will be promoted as a full project.

You are welcome to:

use the module like any other project. See Installing sandbox projects for instructions or grab the latest package here and install as usual.
create issues as usual.
~~post an issue as a comment (only if this project is still a sandbox) on the Project Application page where you will find all initial issues for this project.~~

Comments

Comment #0.0

lolandese commented 12 July 2012 at 12:34

Issue summary:

View changes

Readability

Comment #1

dman commented 12 July 2012 at 12:52

Any sane person looking at the one-liner in the current code

$text = preg_replace_callback('~<((div ([^>]*)(class\\s*=\\s*["\'](nofloat|' . trim($rejector[0]) . '|' . trim($rejector[1]) . ')["\'])([^>]*)>(.*?)</div)|(span([^>]*)(class\\s*=\\s*["\'](float|' . trim($selector[0]) . '|' . trim($selector[1]) . ')["\'])([^>]*)>(.*?)</span)|(a[^>]*)(href=)([^>]*)(><img[^>]*>(.*?)(</img>)?</a)|(img (.+?)))>~is', 'autofloat_filter_callback', $text);

Would be doing you a favor by linking you to why parsing HTML with regular expressions is not just a bad idea, but downright diabolical in non-trivial cases.
Hat tip to coding horror - which this is.

I'm not saying it's not clever.
I'm not saying that it doesn't work for you on your current one use-case.

But I'm warning you (as someone who has been terribly guilty of similar sins in the past) - this is madness.

Comment #2

lolandese commented 12 July 2012 at 13:05

I agree 100% and maybe even more. I would be grateful for some specific hints how to use an XML-parser instead of a regex for this case. It would allow to add a class to the existing element instead of wrapping it in a 'span'. I looked into it but couldn't form a clear idea on how to do that. Adding a dependency on the QueryPath module would be okay if that should be the way to go.

Comment #3

dman commented 12 July 2012 at 13:37

I have heard love for QueryPath and agree that may be a good track to follow.

Personally, having been doing this since many years before QueryPath became available, I've developed a habit of combining DOM XML parsing and XPATH queries.
They never work out quite as concise as they should be but I do at least know I'm getting *exactly* what I'm looking for. Or just crashing due to XML encoding issues. It was a crapshoot in PHP4 - but at least I got either a correct rewrite or a known failure - not random occasional half-borked results.

I'm not fully recommending my method - but I am at least able to defend it above RegExps.

this code example from the file_ownage project is a little newer and takes advantages of stuff now in PHP5. This is how I
* find img tags in a doc
* rewrite them to modify attributes.
* ... actually that's it.

Again, it's only tangentially similar to what you are doing .... but thought I'd throw it out as at least an alternative approach.

.. but if QueryPath lives up to the hype - certainly investigate that too.

Comment #4

lolandese commented 12 July 2012 at 20:40

Having some example code helps a lot, especially because it's commented nicely.
Thanks for putting me on the right track.

Comment #5

lolandese commented 21 September 2012 at 13:04

Version:		» 7.x-1.x-dev
Status:	Active	» Fixed

Slightly specious (fallacious) arguments not to use an XML-parser:

performance concern of parsing the DOM several times in iteration
library dependency to do it right
it makes the code at least twice as big and three times as complex
adding a span wrapper can hardly be considered HTML manipulation. It can't have dramatic consequences if spans are added around the wrong elements or not added around the right ones.

..but the real reason is that my level of coding is not up for the task yet for this use case. Using an XML-parser within Drupal modules is high on my wishlist though. It is one of those capabilities that allow to build the real interesting modules and patches.

I enhanced the readability of the regex by adding an x-modifier to be able to split it over multiple lines and add inline comments. More info at http://net.tutsplus.com/tutorials/php/advanced-regular-expression-tips-a...

During the application process none of the reviewers perceived using a RegEx as a problem. Let's see what happens now that AutoFloat turned into a full project. For this reason only a development version is released for now. If issues arise due to the use of a regular expression, I should consider to make a substantial time-investment to study and build a release that uses an XML-parser instead. Meanwhile patches are more than welcome.

Comment #5.0

lolandese commented 21 September 2012 at 13:04

Issue summary:

View changes

Updated issue summary.

Comment #6

dman commented 27 September 2012 at 11:13

I understand that it may not be a perfect fit - was just a peer-review sort of alternative suggestion for your consideration.
Yes, it's got a heck of a learning curve, and some performance implications unless you think ahead (it's possible to parse just once).
I won't dispute your choice. :-B

Comment #7

11 October 2012 at 11:21

Status:

Fixed

» Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Comment #7.0

(not verified) commented 11 October 2012 at 11:21

Issue summary:

View changes

Changed the download link to point to the official release.

Comment #8

lolandese commented 9 December 2013 at 19:00

Issue summary:	View changes
Related issues:		+#1955308: Better parsing method

It has been a while, but I opened a 2.x branch that uses XPath instead of a regular expression.

I have to contradict what I said in #5:

It does not have a dependency on other modules or libraries.
It has less code lines, 31 insertions(+), 50 deletions(-)

Feedback is appreciated.

Thanks for the help previously provided.

Turn this sandbox into a full project.

Comments

Comment #0.0

Comment #1

Comment #2

Comment #3

Comment #4

Comment #5

Comment #5.0

Comment #6

Comment #7

Comment #7.0

Comment #8

Related issues

News items

Our community

Documentation

Drupal code base

Governance of community