Regular Expressions (preg_match)

arhak - September 25, 2008 - 22:03
Project:Parsing API
Version:6.x-1.7
Component:Code
Category:feature request
Priority:normal
Assigned:arhak
Status:closed
Description

Would be nice to have some Regular Expression support for advanced users
maybe using preg_match/preg_replace and offering the same facilities as with strpos, etc.

#1

crystaldawn - September 26, 2008 - 17:07

Regex is pretty advanced, but I have thought of some regex versions of the before/before_last(), etc functions. I dont need them 98% of the time though since these really eliminate the need for regex to begin with. I think the ONLY advantage would be speed. The regex versions might have very very slight speed advantage.

#2

arhak - September 28, 2008 - 22:26

Well, the advantage I find on regex (in the scope of this module) is scrapping dynamic pages that vary through time but keep identifiable structure, thus can't be a simple strpos, but a regex is needed.
Certainly this won't be most of the cases.

#3

crystaldawn - September 28, 2008 - 22:38

It could surely help with structural searching. If you have time to write up some functions similar to what I've got here then I could put them in. Currently I'm bangin my head against the wall trying to make a miles version of the Vincenty formula and the last time I did trig was about 50 billion years ago. If you do write up some functions for it, if you could put together an example usage as well, I'll also add those to the docs pages.

#4

arhak - September 28, 2008 - 23:19
Assigned to:Anonymous» arhak
Status:active» postponed

I can't say no to such an offer, regretfully right now I'm about to take a site to live, so these kind of issues are solved some how.
I really would like to see regex in Parsing API since I use it and I don't have time neither CVS capabilities to maintain a module.
I will queue this.
Regex are no problem at all (generic examples might be though...)

#5

arhak - September 29, 2008 - 12:00
Status:postponed» needs review

In the attachment are two modules:
- parsing_api 6.x-2.x-dev (new functions added, minor improvements to old ones)
- parsing_api_test 6.x-2.x-dev (testing functions for the new regex functions, read bellow)

If you have time to write up some functions similar to what I've got here then I could put them in.

Done

Currently I'm bangin my head against the wall trying to make a miles version of the Vincenty formula and the last time I did trig was about 50 billion years ago.

Sorry, not following. What is that "Vincenty formula" about?

if you could put together an example usage as well

Well... this is the tough part for me.
I find tons of uses, but none for demonstration purpose. Regex are too complex, thus for advanced use. Nevertheless, there are some small examples in the tests functions, but those are so simple that can be perfectly done without regex.

To use the testing functions there is a shortcut:
- enable both modules (parsing_api & parsing_api_test)
- create a new page or story with the code echo parsing_api_test(); (between php tags and with PHP Code as Input Format, which requires PHP Filter core optional module to be enabled)
- Note: the displayed data doesn't come from online data, but from a sample fragment to speed up testing, to test against live pages read the following steps.

To use the testing functions with online data:
- do the same as above, but passing a drupal.org/project name as argument, for instance echo parsing_api_test('views'); or echo parsing_api_test('parsing_api'); (between php tags ... and so on, read above steps)

AttachmentSize
parsing_api-dev.zip 6.61 KB

#6

arhak - September 29, 2008 - 12:04

Note: I didn't recommend using Execute PHP from Devel module because the html output will be passed through check_plain, but definitely will be much faster

#7

crystaldawn - September 29, 2008 - 20:06

Great I'll take a look at this and let you know what I think. The examples dont have to be to extensive, just as an easy proof of concept so it wont matter if they are practical/usable or not.

#8

crystaldawn - September 29, 2008 - 20:06
Status:needs review» patch (to be ported)

#9

crystaldawn - September 29, 2008 - 23:36
Status:patch (to be ported)» fixed

I've added these functions along with minor changes. I prefer if statements over shorthand as it's easier to read. There isnt any performance difference between the two. I also added the php5 support for the strrpos() function and the surround function with a couple mods so that it could be used with multi_between(). I will be adding the test cases to the docs page.

#10

arhak - September 30, 2008 - 05:00

strrevpos penalty evaluating PHP version on each invocation, shouldn't be that way
please return to proposed if-else function declaration for performance issue
I'll check this out later, just a sight to check for missing is_bool on proposed patch, but I see they were inserted back on.

#11

arhak - October 2, 2008 - 17:12
Status:fixed» needs work

1
Duplicated code implies duplicated effort to maintain, test, and keep the code bug-free.
Thus, the logic of the functions should be written once and separately returned depending whether it must be plain checked or not.

Instead of:

  if (!check_plain) {
    $result = some(computed($value));
    return $result;
  }
  some(computed($value));
  return check_plain($result);

It should be:
  $result = some(computed($value));
  if (check_plain) {
    result = check_plain($result);
  }
  return $result;

I personally prefer return $check_plain ? check_plain($result) : $result; because it keeps very simple to maintain every's function return statement and is not cryptic at all.

Currently there are several functions with non-trivial code written twice (except multi_between, strrevpos and the *_pattern, all the rest)

2
The function surround was meant to be aside the check_plain issue, given that it's more useful fot tag surrounding and is not a ripper function, but an augmenting one.

BTW, function surround was never meant to return null, I recall:

  if ($surround_when_empty_as_well && empty($what))
  {
    $what = $when_empty;
  }
  return empty($what) ? $when_empty : ($prefix . $what . $suffix);

3
The function strrevpos shouldn't compare the current version of php on each call, rather it should be conditionally defined (as my initial suggestion) even when a function being conditionally declared looks odd.

#12

arhak - October 2, 2008 - 17:15
Version:6.x-2.0» 6.x-1.7

just correcting the target version

#13

crystaldawn - October 3, 2008 - 20:07
Status:needs work» closed

#14

arhak - October 5, 2008 - 20:05
Status:closed» won't fix

kind of rude closing this issue without an explanation even a comment to the one who contributed a considerable piece of code.
nevertheless... correcting the issue state as won't fix

#15

crystaldawn - October 6, 2008 - 15:32
Status:won't fix» fixed

Its closed because the topic was getting off the topic of the original post. I added the regex exps. The rest was nothing more than personal preference opinions. And if you would carefully look over the original surround function, you'll see that it would indeed return blank in some instances. I corrected it to do what you originally intended to do which was to always return something. Also, any code I write will have a checkplain feature. Surround didnt have it at all. So there you have it. Closed because regex was put in. Not "wont fix" because I already did add the things you contributed. If theres something wrong with the implementations then there should be a separate issue started. It makes things easier when using the search feature on drupal. Besides, all issues get auto-closed after 2 weeks of inactivity anyways so I'll just change it to fixed and in 2 weeks it'll be closed anyways. As for the if statement check on the function call. Makes sense and I did change it, just havent commited anything yet. I tried to benchmark it to see if it would make any difference and was unable to get any different results but the logic does make sense to only do the iffy statement once rather than on each call.

#16

arhak - October 8, 2008 - 06:54

My fault, sorry for that.
You're many times right:

the topic was getting off the topic of the original post

I didn't realize on this, certainly the regex were added.
I agree different issues should be treated separately (nevertheless I won't open any other issue since it's enough with that last reply of you)

if you would carefully look over the original surround function, you'll see that it would indeed return blank in some instances.

Right, there was an infrequently bug if both $what and $when_empty were empty at the same time.
BUT I recall your comment in the code

<?php
//This surround function was originally coded to return null when the value was empty.  I've instead changed it to at least return the
//withthis andthis variables because it would be useful in the multi_between() function.
?>

Which doesn't reflect that at all! The original function wasn't intended to return NULL in any case. Certainly neither empty when requested $surround_when_empty_as_well and that was a flaw of mine.
(Besides, you mention some utility for the multi_between function which is not using surround at all)
I think that comment in the code is out of place.

I corrected it to do what you originally intended to do which was to always return something.

Correct, there was a bug in the proposed code, never the intention (neither the ability) to return NULL (unless a null argument being passed).

Also, any code I write will have a checkplain feature.

As you said, a matter of taste. It seems too paranoid since the function surround is more useful internally, meaning when all the input was already through check_plain, but that's another issue...

Not "wont fix" because I already did add the things you contributed.

Right, again my mistake, it seems I was typing as if I were in a fresh new issue being rejected. Sorry for that too.

Besides, all issues get auto-closed after 2 weeks of inactivity anyways so I'll just change it to fixed and in 2 weeks it'll be closed anyways.

I understand I misused this issue thread, but what could avoid it was a simple comment stating "out of topic, original post already fixed" or something...

As for the if statement check on the function call. Makes sense and I did change it, just havent commited anything yet.

Certainly another issue, but thanks for agreeing on that, there is no hurry for such commit.

I tried to benchmark it to see if it would make any difference and was unable to get any different results but the logic does make sense to only do the iffy statement once rather than on each call.

Of course, you would have to look for an old PC (which won't ever be a server) to be able to notice the difference between benchmark test.

Finally, sorry for posting this reply on an issue thread, but it might look awful not to do it being mostly agree with you.

--project followup subject--

Anonymous (not verified) - October 22, 2008 - 07:13

Automatically closed -- issue fixed for two weeks with no activity.

#17

Anonymous (not verified) - October 22, 2008 - 07:22
Status:fixed» closed

Automatically closed -- issue fixed for two weeks with no activity.

 
 

Drupal is a registered trademark of Dries Buytaert.