the current regex PIFT uses to determine which uploaded files should be marked for patch testing is: /(\.diff|\.patch)$/, which translated means "only mark files for testing that end in .patch or .diff
the problem here, as pointed out by both webchick and dww, is that even though we're only pulling patches for issues that are set as 7.x in the Drupal project, it's common practice to post backport patches in the same issue, and also common practice to end those in .patch or .diff.
this can lead to a situation where a patch for Drupal 6 is improperly tested, and most likely will fail. it's possible to work around that just by respecting the current filename regex we filter on, by doing, say, foo.patch.d6.txt (note that you can't do foo.patch.d6 currently, b/c comment upload would reject that extension). however, this is a bit of an awkward approach.
what i'd like to do is twofold:
- determine what the best naming convention is for these backport patches. this would ideally be something that most people are already doing. i've been told that it's unofficial practice to do something like foo-d6.patch to indicate the patch is against the drupal 6 API. whether it's that or something else, we should try to determine what the naming convention for these backport patches looks like
- once #1 is set, then we need a regex whiz to express the convention so i can change our current setting to be more robust. i'm thinking the regex should generally do something like "test everything that ends in .patch or .diff, except this/these patterns.
note that due to the design of PIFT, this new regex will only be applied to files uploaded after the setting is updated, so the sooner we do this the better ;)
| Comment | File | Size | Author |
|---|---|---|---|
| #34 | pift-regex.png | 57.85 KB | sun |
Comments
Comment #1
gábor hojtsyIf the current regex is
/(\.diff|\.patch)$/then a new regex could be/(?<=-d6|\.d6)(\.diff|\.patch)$/which uses negative look-behind to detect only .patch and .diff where it is not named with -d6 or .d6 before the final extension. Untested but should work from previous experience.As for defining what the common ground for patch name ending for backports is a good question. I think -d6 and .d6 suffixes are common, so it makes sense to support both. Better one can only tell with a drupal.org database dump, taking a look at names of the files attached to issues.
Maintainability for later versions should also be considered. D5 backport patches are also posted in some issues. What if the .d7.patch becomes the backport patch (while we are working on D8?).
Comment #2
hunmonk commentedwe can alter the regex as necessary as newer versions of drupal come out.
can you rewrite that regex you posted to also support -d5.patch and .d5.patch as well? we should include those...
Comment #3
gábor hojtsySure, that would be
/(?<=-d6|\.d6|-d5|\.d5)(\.diff|\.patch)$/Comment #4
gábor hojtsyOh, wow, sorry we should catch where this does not match, not where this matches, so make it:
/(?<!-d6|\.d6|-d5|\.d5)(\.diff|\.patch)$/. To break it down, () is a separated "subsearch"?<!means we should look back in the pattern and check that the value inside the subsearch is not there. The different options are the possible values separated with | so they are in an OR relationship.Comment #5
dwwNote: D6 is just as common as d6 in my experience, so we should support both cases. And, instead of instead of "unwinding" all the exclusion options, why not just use
[.-][Dd][56]? That'll make it much easier to maintain this regexp in the future as we add/remove other versions of core...Comment #6
gábor hojtsydww. your regex as-is matches -d and -D without a number, so I assume you meant to use () around it before the question mark. If we are only going for these combinations, then yours is absolutely simpler.
Comment #7
mfbWould be nice if project module allowed you to add a metadata tag to each attachment, indicating if it's a patch and what branch it's a patch on.
Comment #8
dww@gabor: the ? was related to the "why not" of the english prose, and wasn't included inside the code tag for the regexp. ;) Sorry that wasn't clear.
@mfb: yes it would, i said the same thing to hunmonk in IRC and he laughed at me, saying that was way out of scope...
Comment #9
aclight commentedI think it would also be useful to allow the underscore character to separate the patch name from a version segment. So, to modify dww's pseudo-regex in #5, I'd go with:
Comment #10
hunmonk commented@dww: i'm pretty sure what i said was it was out of scope for where we're at in the system, and that it was a good idea long term. one step at a time!!
Comment #11
boombatower commented/me notices that the issues he brought up in original conversations with hunmonk that were dismissed re-emerge.
Comment #12
hunmonk commented@boombatower: this is why it's good to have more than one person looking at the problem :)
glad you're on board to help me get this across the finish line.
Comment #13
Island Usurper commentedJust putting the latest version of the regex at the bottom of the thread. Make sure everybody's on the same page. :)
/(?<![.-_][Dd][56])(\.diff|\.patch)$/I can't think of any more improvements, so +1 from me.
Comment #14
sunCould we also include "drupal5"/"drupal6" and CVS branch names "DRUPAL-5/DRUPAL-6"? Most often, I'm creating patches on dedicated core test sites, checked out into corresponding directory names, so all of these should not match:
I'd also like to propose to remove .diff from the regex, so we have a (yes, special) way of uploading patches that are NOT intended to be tested at all.
Comment #15
hunmonk commented@all: for purposes of consistency, i think (if we can) that we should limit the back check to whatever is right before .patch/.diff
@sun: so from above, you'd want naming more like:
Comment #16
boombatower commentedI like
whatever.d6.patchas that is consistent with file extensions likepifr.pages.incand is short and sweet.Comment #17
sunwell, if we want to test the suffix only, *d6.patch/*d5.patch would be sufficient.
My suggestion was based on my personal workflow/habit. I am using a script to create patches, which automatically takes the parent folder name (i.e. "drupal6" or "drupal-DRUPAL-6") as basename and extends it with ".patch". Afterwards, I'm just adding a descriptor in between, i.e. resulting in "drupal6.whatever-issue.patch".
If I am forced to, I can certainly append ".d6" to my descriptors where appropriate, or automate my environment even further ;)
Comment #18
hunmonk commented@sun: i think we want a naming convention here that's tight, which means no garbage between the parts we're testing to me. i do think we can offer the flexibility in #13 or similar.
Comment #19
catchd6/D6/d5/D5 seems reasonable to me. Doesn't seem like we need a lot more flexibility than that.
We'll need to document the creation in drupal.org/patch somewhere - if one or two people post patches don't meet the convention, then, well, that's tough, and it's easy to repost the same thing again differently named. Far worse things happen to issues than that.
Comment #20
sunKeep it simple. Also, pretty much talk about separating the "dX" suffix from the rest of the filename, while it's very unlikely that someone names a D7 patch "fixed6.patch" (note it *must* end in "d6" or "d5") - hence:
or if you prefer case-sensitive (while not really required):
Comment #21
hunmonk commentedshould we use the one in #13 or #20?
i'm partial to the #13...
Comment #22
dwwi like #13 better, too.
Comment #23
boombatower commented#13 is .d6.patch....right? If so then I would prefer that.
Comment #24
plachI've always seen the version suffix somehow separated from the rest of the patch name, so having the most common separators in the regexp is better imHo
+1 for the #13
Comment #25
catch#13 works for me too.
Comment #26
hunmonk commentedthe regex in #13 doesn't work properly, as this filename is still passing through:
pass-D5.patch
i'm guessing it's because it interprets the dash as a range symbol instead of a dash itself?
Comment #27
hunmonk commentedcouple of things:
/(?<![.\-_][Dd][56])(\.diff|\.patch)$/solves the dash problemfoo.d5.patchandfoo.d5_0.patchshould be caught by the regex as not testable.Comment #28
sunFiles to match:
Always .patch and also .diff.
Files NOT to match:
"whatever" is an arbitrary string (*) of course. The prefix for d5/d6 can be a dot (.), hiven (-), or underscore (_).
Comment #29
Island Usurper commentedUploads could go farther than even
foo.d5_13.patch, right? Might as well future-proof the _X check./(?<![._-][Dd][56](_\d+)?)(\.diff|\.patch)$/Comment #30
chx commentedI would use two regexes here,
if (preg_match('/\.(patch|diff)$/') && !preg_match('/[._-]d[56](_\d+)?\./)Comment #31
sunFWIW, this would be the required regular expression:
Comment #32
Island Usurper commentedThat's right. I forgot that look-behind assertions have to be fixed-length.
#31 won't work because it would not match
foo_0.diff, which we do want to be tested.The presence of _XX doesn't affect whether it is tested or not, so it does not belong in the look-behind assertion.
I'm undecided if this is easier or harder to maintain than two preg_match() calls.
Comment #33
hunmonk commented@Island Usurper: this is a configurable setting, and has to be done in one regex.
also, whatever.d6_0.patch seems to match the regex in #32 when it shouldn't
Comment #34
sunSorry, either we need a regex guru/magician here, or I think this simply cannot be done in one preg_match() at all. Attached is a screenshot of several different approaches, including the negation of the complete matching criteria - without any luck.
Thus, I'd agree with chx: Implement 1 regex to match positive, and another one to match negative, i.e.
PASS 1 (positive):
PASS 2 (negative):
Comment #35
hunmonk commentedactually, i think what might need to happen is an adjustment to how PIFT checks the filename -- ie, check the filename the user submitted instead of the actual filename saved. i'll have to check and see how hard this would be...
Comment #36
alexanderpas commentedCritical as per:
http://drupal.org/node/329973
Comment #37
alexanderpas commentedComment #38
boombatower commented#35: I'm thinking it would be better to check the actual filename as the other is more of an editable user input field. (as I'm sure you know)
I think just having a regex match and regex exclude setting would cover this as chx pointed out in #30.
Comment #39
Island Usurper commentedIt can be done, it's just even uglier than most:
Consider this proof of concept rather than something that should be used, though.
Comment #40
sunAll my tests pass on #39.
However, it won't catch three appended numbers, i.e. whatever.d6_123.patch, so I'd still prefer two chained tests, as mentioned in #34 and #30.
Comment #41
catchWe definitely have few patch names which are close to or over three digit suffixes now. I wish I didn't know that piece of information.
Comment #42
hunmonk commented@boombatower: i agree that the filename is less stable than the filepath, but i don't think the regex should have to include accounting for stuff that is happening under the hood in project_issue. it seems more consistent to me to allow the admin to set the regex to what the users actually name the file. furthermore, although there's no support for it now, it also opens up the possibility for somebody to rename an erroneously named file in order to add/remove it from the list of testable files.
in keeping with this, i've adjusted pift to use the regex against the filename instead of the filepath, so no more need to account for the _X munging in the regex.
currently i have
/(?<![.\-_][Dd][56])(\.diff|\.patch)$/installed on project.drupal.org, along with the new code that examines by filename. if somebody can check this out by posting some patches there, that would be great.apologies for asking people to try to tackle the _X thing unnecessarily...
Comment #43
hunmonk commentedthis has been tested and deployed on drupal.org