Closed (fixed)
Project:
AutoFloat
Version:
7.x-1.x-dev
Component:
Miscellaneous
Priority:
Normal
Category:
Task
Assigned:
Unassigned
Reporter:
Created:
12 Jul 2012 at 08:42 UTC
Updated:
9 Dec 2013 at 18:59 UTC
Jump to comment: Most recent
Comments
Comment #0.0
lolandese commentedReadability
Comment #1
dman commentedAny sane person looking at the one-liner in the current code
Would be doing you a favor by linking you to why parsing HTML with regular expressions is not just a bad idea, but downright diabolical in non-trivial cases.
Hat tip to coding horror - which this is.
I'm not saying it's not clever.
I'm not saying that it doesn't work for you on your current one use-case.
But I'm warning you (as someone who has been terribly guilty of similar sins in the past) - this is madness.
Comment #2
lolandese commentedI agree 100% and maybe even more. I would be grateful for some specific hints how to use an XML-parser instead of a regex for this case. It would allow to add a class to the existing element instead of wrapping it in a 'span'. I looked into it but couldn't form a clear idea on how to do that. Adding a dependency on the QueryPath module would be okay if that should be the way to go.
Comment #3
dman commentedI have heard love for QueryPath and agree that may be a good track to follow.
Personally, having been doing this since many years before QueryPath became available, I've developed a habit of combining DOM XML parsing and XPATH queries.
They never work out quite as concise as they should be but I do at least know I'm getting *exactly* what I'm looking for. Or just crashing due to XML encoding issues. It was a crapshoot in PHP4 - but at least I got either a correct rewrite or a known failure - not random occasional half-borked results.
I'm not fully recommending my method - but I am at least able to defend it above RegExps.
this code example from the file_ownage project is a little newer and takes advantages of stuff now in PHP5. This is how I
* find img tags in a doc
* rewrite them to modify attributes.
* ... actually that's it.
Again, it's only tangentially similar to what you are doing .... but thought I'd throw it out as at least an alternative approach.
.. but if QueryPath lives up to the hype - certainly investigate that too.
Comment #4
lolandese commentedHaving some example code helps a lot, especially because it's commented nicely.
Thanks for putting me on the right track.
Comment #5
lolandese commentedSlightly specious (fallacious) arguments not to use an XML-parser:
..but the real reason is that my level of coding is not up for the task yet for this use case. Using an XML-parser within Drupal modules is high on my wishlist though. It is one of those capabilities that allow to build the real interesting modules and patches.
I enhanced the readability of the regex by adding an x-modifier to be able to split it over multiple lines and add inline comments. More info at http://net.tutsplus.com/tutorials/php/advanced-regular-expression-tips-a...
During the application process none of the reviewers perceived using a RegEx as a problem. Let's see what happens now that AutoFloat turned into a full project. For this reason only a development version is released for now. If issues arise due to the use of a regular expression, I should consider to make a substantial time-investment to study and build a release that uses an XML-parser instead. Meanwhile patches are more than welcome.
Comment #5.0
lolandese commentedUpdated issue summary.
Comment #6
dman commentedI understand that it may not be a perfect fit - was just a peer-review sort of alternative suggestion for your consideration.
Yes, it's got a heck of a learning curve, and some performance implications unless you think ahead (it's possible to parse just once).
I won't dispute your choice. :-B
Comment #7.0
(not verified) commentedChanged the download link to point to the official release.
Comment #8
lolandese commentedIt has been a while, but I opened a 2.x branch that uses XPath instead of a regular expression.
I have to contradict what I said in #5:
Feedback is appreciated.
Thanks for the help previously provided.