Regular Expressions (preg_match)
arhak - September 25, 2008 - 22:03
| Project: | Parsing API |
| Version: | 6.x-1.7 |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | arhak |
| Status: | closed |
Description
Would be nice to have some Regular Expression support for advanced users
maybe using preg_match/preg_replace and offering the same facilities as with strpos, etc.

#1
Regex is pretty advanced, but I have thought of some regex versions of the before/before_last(), etc functions. I dont need them 98% of the time though since these really eliminate the need for regex to begin with. I think the ONLY advantage would be speed. The regex versions might have very very slight speed advantage.
#2
Well, the advantage I find on regex (in the scope of this module) is scrapping dynamic pages that vary through time but keep identifiable structure, thus can't be a simple strpos, but a regex is needed.
Certainly this won't be most of the cases.
#3
It could surely help with structural searching. If you have time to write up some functions similar to what I've got here then I could put them in. Currently I'm bangin my head against the wall trying to make a miles version of the Vincenty formula and the last time I did trig was about 50 billion years ago. If you do write up some functions for it, if you could put together an example usage as well, I'll also add those to the docs pages.
#4
I can't say no to such an offer, regretfully right now I'm about to take a site to live, so these kind of issues are solved some how.
I really would like to see regex in Parsing API since I use it and I don't have time neither CVS capabilities to maintain a module.
I will queue this.
Regex are no problem at all (generic examples might be though...)
#5
In the attachment are two modules:
- parsing_api 6.x-2.x-dev (new functions added, minor improvements to old ones)
- parsing_api_test 6.x-2.x-dev (testing functions for the new regex functions, read bellow)
Done
Sorry, not following. What is that "Vincenty formula" about?
Well... this is the tough part for me.
I find tons of uses, but none for demonstration purpose. Regex are too complex, thus for advanced use. Nevertheless, there are some small examples in the tests functions, but those are so simple that can be perfectly done without regex.
To use the testing functions there is a shortcut:
- enable both modules (parsing_api & parsing_api_test)
- create a new page or story with the code
echo parsing_api_test();(between php tags and with PHP Code as Input Format, which requires PHP Filter core optional module to be enabled)- Note: the displayed data doesn't come from online data, but from a sample fragment to speed up testing, to test against live pages read the following steps.
To use the testing functions with online data:
- do the same as above, but passing a drupal.org/project name as argument, for instance
echo parsing_api_test('views');orecho parsing_api_test('parsing_api');(between php tags ... and so on, read above steps)#6
Note: I didn't recommend using Execute PHP from Devel module because the html output will be passed through
check_plain, but definitely will be much faster#7
Great I'll take a look at this and let you know what I think. The examples dont have to be to extensive, just as an easy proof of concept so it wont matter if they are practical/usable or not.
#8
#9
I've added these functions along with minor changes. I prefer if statements over shorthand as it's easier to read. There isnt any performance difference between the two. I also added the php5 support for the strrpos() function and the surround function with a couple mods so that it could be used with multi_between(). I will be adding the test cases to the docs page.
#10
strrevpospenalty evaluating PHP version on each invocation, shouldn't be that wayplease return to proposed if-else function declaration for performance issue
I'll check this out later, just a sight to check for missing
is_boolon proposed patch, but I see they were inserted back on.#11
1
Duplicated code implies duplicated effort to maintain, test, and keep the code bug-free.
Thus, the logic of the functions should be written once and separately returned depending whether it must be plain checked or not.
Instead of:
if (!check_plain) {$result = some(computed($value));
return $result;
}
some(computed($value));
return check_plain($result);
It should be:
$result = some(computed($value));if (check_plain) {
result = check_plain($result);
}
return $result;
I personally prefer
return $check_plain ? check_plain($result) : $result;because it keeps very simple to maintain every's function return statement and is not cryptic at all.Currently there are several functions with non-trivial code written twice (except
multi_between,strrevposand the*_pattern, all the rest)2
The function
surroundwas meant to be aside thecheck_plainissue, given that it's more useful fot tag surrounding and is not a ripper function, but an augmenting one.BTW, function
surroundwas never meant to returnnull, I recall:if ($surround_when_empty_as_well && empty($what)){
$what = $when_empty;
}
return empty($what) ? $when_empty : ($prefix . $what . $suffix);
3
The function
strrevposshouldn't compare the current version of php on each call, rather it should be conditionally defined (as my initial suggestion) even when a function being conditionally declared looks odd.#12
just correcting the target version
#13
#14
kind of rude closing this issue without an explanation even a comment to the one who contributed a considerable piece of code.
nevertheless... correcting the issue state as
won't fix#15
Its closed because the topic was getting off the topic of the original post. I added the regex exps. The rest was nothing more than personal preference opinions. And if you would carefully look over the original surround function, you'll see that it would indeed return blank in some instances. I corrected it to do what you originally intended to do which was to always return something. Also, any code I write will have a checkplain feature. Surround didnt have it at all. So there you have it. Closed because regex was put in. Not "wont fix" because I already did add the things you contributed. If theres something wrong with the implementations then there should be a separate issue started. It makes things easier when using the search feature on drupal. Besides, all issues get auto-closed after 2 weeks of inactivity anyways so I'll just change it to fixed and in 2 weeks it'll be closed anyways. As for the if statement check on the function call. Makes sense and I did change it, just havent commited anything yet. I tried to benchmark it to see if it would make any difference and was unable to get any different results but the logic does make sense to only do the iffy statement once rather than on each call.
#16
My fault, sorry for that.
You're many times right:
I didn't realize on this, certainly the regex were added.
I agree different issues should be treated separately (nevertheless I won't open any other issue since it's enough with that last reply of you)
Right, there was an infrequently bug if both
$whatand$when_emptywere empty at the same time.BUT I recall your comment in the code
<?php//This surround function was originally coded to return null when the value was empty. I've instead changed it to at least return the
//withthis andthis variables because it would be useful in the multi_between() function.
?>
Which doesn't reflect that at all! The original function wasn't intended to return NULL in any case. Certainly neither empty when requested
$surround_when_empty_as_welland that was a flaw of mine.(Besides, you mention some utility for the
multi_betweenfunction which is not usingsurroundat all)I think that comment in the code is out of place.
Correct, there was a bug in the proposed code, never the intention (neither the ability) to return NULL (unless a null argument being passed).
As you said, a matter of taste. It seems too paranoid since the function
surroundis more useful internally, meaning when all the input was already throughcheck_plain, but that's another issue...Right, again my mistake, it seems I was typing as if I were in a fresh new issue being rejected. Sorry for that too.
I understand I misused this issue thread, but what could avoid it was a simple comment stating "out of topic, original post already fixed" or something...
Certainly another issue, but thanks for agreeing on that, there is no hurry for such commit.
Of course, you would have to look for an old PC (which won't ever be a server) to be able to notice the difference between benchmark test.
Finally, sorry for posting this reply on an issue thread, but it might look awful not to do it being mostly agree with you.
--project followup subject--
Automatically closed -- issue fixed for two weeks with no activity.
#17
Automatically closed -- issue fixed for two weeks with no activity.