Optimize CSS option causes php cgi to segfault in pcre function "match"
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | base system |
| Category: | bug report |
| Priority: | critical |
| Assigned: | Unassigned |
| Status: | needs review |
| Issue tags: | CSS aggregation, Quick fix |
OS: FreeBSD 7.1-RELEASE-p4
Webserver: lighttpd-1.4.22
PHP 5.2.9 with Suhosin-Patch 0.9.7
PCRE version: pcre-7.9 *!
When using a zen-based theme, the webserver returns an "HTTP Error 500 - Internal server error", because the php cgi process ( both fastcgi and normal cgi versions tested ) segfaults.
I traced the segfault in the php-cgi.core dump file with gdb at the match routine in the pcre library. I also tried increasing and decreasing the pcre.backtrack_limit and pcre.recursion_limit, but it didn't help. ( also: memory_limit = 128M )
I traced the drupal php code which triggers this segfault to this preg_replace call in the drupal_load_stylesheet function:
Lines 1066-1073:
if ($_optimize) {
// Perform some safe CSS optimizations.
$contents = preg_replace('<
\s*([@{}:;,]|\)\s|\s\()\s* | # Remove whitespace around separators, but keep space around parentheses.
/\*([^*\\\\]|\*(?!/))+\*/ | # Remove comments that are not CSS hacks.
[\n\r] # Remove line breaks.
>x', '\1', $contents);
}I assume the problem lies at the regex and this version of pcre, because the exact same configuration of (drupal,theme,database) doesn't cause the crash on another system ( debian-lenny libpcre3 (7.6-2.1) ) .
The crash only happens on a certain pattern in the css ( here in $contents ), and that's what i'm trying to find out now. I'll post it when I pinpoint it.
I found the same issue here, but it's slightly different, since only one comment has the exact same issue: http://drupal.org/node/331915
Excuse my bad english.
Greetz,
Raul

#1
In the forum is a general overview about this issue, including some more information about the server environment:
D6 on FreeBSD 7.0 & Lighttpd - 500 Internal Server Error
(it's the same D6 site, and we could really need some help with this!)
Greetings, -asb
#2
Hi,
first, i want to make a little correction:
The Php Code which triggers the segfault is located at lines 1966-1973, in the file includes/common.inc .
second, i pinpointed the php code which always reproduces the bug. It's:
$contents = "/* aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
*/
";
$contents = preg_replace('<
\s*([@{}:;,]|\)\s|\s\()\s* | # Remove whitespace around separators, but keep space around parentheses.
/\*([^*\\\\]|\*(?!/))+\*/ | # Remove comments that are not CSS hacks.
[\n\r] # Remove line breaks.
>x', '\1', $contents);
Just put that code in a separate php file, run it and it should crash. ( Eventually increase the number of aaa's, feel free to insert malicious machine code, etc. ;) )
This DoS only seems to affect php version 5.2.9 built against pcre 7.9, since i tested it on following machines:
So, a Drupal Theme must only include a css file with that comment lines somewhere in it, and than drupal/php/webserver crashes ( given it has pcre 7.9 & php 5.2.9 ). It Seems the problem is when there are too many characters inside a multiline comment.
#3
Seems the standalone pcre version i gave was irrelevant, since another version is bundeled with every php release.
Also, i managed to reduce the php code to the core of the problem:
$contents = 'd' . str_repeat('a', 1900) . 'b';$contents = preg_replace('/d(a)+b/', '\1', $contents);
And, I observerd, the segfault doesn't occur if I disable mhash.so extension of php from extensions.ini.
#4
I have the same exact symptom on another totally different system: vista, apache 2.0.x and latest version of php (5.2.9-2)
Debugging the dieing apache process shows a bunch of 'match' commands before the stack overflow occurs.
(actually i have 64Mo memory limit)
Here is a trace.
Disabling css&js preprocess fixes it.
#5
@raulgigea: nice work on investigating the issue, but this is doesn't seems to be a Drupal bug. Why not opening a bug on the PHP bug tracker?
#6
Hi, well because I wasn't really sure if the development (trunk/head) version of php still has this bug, in which case i should report the bug to the FreeBSD package maintainers, and still not here.
On the other hand, this post might help some users which upgrade to php 5.2.9, which i presume will happend soon, because 5.2.9 ( as a new version) is reaching the unix distros anytime now. Unless i can stop it with a bug report ;).
But i'm compiling the development version of php as i type, and will either open a php bug or post here the result, if it works.
#7
Okay, it actually does crash with the cvs snapshot version of php. So here's the new php bug: http://bugs.php.net/bug.php?id=48153 . <- everybody vote for that, and maybe they fix it in the next version :)
#8
Hi,
as you might have read, the php guys didn't accept it as a bug aswell.
For a big enough pattern, the preg_match function simply takes too much stack memory. It's not a bug, its the same as coding a recursive function that goes too deep into recursion.
It is the programmers responsability to check that his code doesn't eat up all the stack and choke.
In this case, Lines 1966-1973, in the file includes/common.inc works in most configurations of php, but some systems don't have big enough stack size, or other php modules eat up more stackmemory so that the left one goes out faster than usual.
In any case, if we construct unusual test cases - for example:
- insert >60000 bytes into a multiline Comment in an css file, than it almost surely segfaults on any configuration ( os, libs & php ).
The problem is where the os & php & libs are poorly configured, this case occurs with much less bytes and it can happend that a usual longer multiline comment is able to trigger this stack overflow.
For those cases i provided a patch, that is equivalent to that regex, but withouth using PCRE functions thus without the stack overflow.
Greetz,
Raul
#9
Raul, great work. I had the 500 Error here too when enabling css compression. Your patch fixed the problem. Thanks so much!
Thomas
#10
Is there any chance to get this patch into core, or will we have to maintain this bloody patch for every Drupal release?
Thanks -asb
#11
Hi!
I have same problem. I tried turn on optimization CSS and JS and site working ok.
#12
This bug is still killing Drupal everytime core is updated (and thus our patch is being overwritten).
Rauls patch from #8 again has to be modified, this time we're getting:
Parse error: syntax error, unexpected '*' in /usr/local/www/drupal/includes/common.inc on line 1927This is extremely annoying and an absolute showstopper to run Drupal on FreeBSD. Updating the applicable version of Drupal core to 6.14 and setting the priority to "critical" since it forces us to not install security updates.
Greetings, -asb
#13
I have the same, or a similar error on my windows xp localhost.
I was able to reduce the regex and the "offensive" CSS, and make a standalone php script to reproduce the error.
I get the error on localhost, but not on my webspace. So I guess it depends a lot on the configuration.
In the "offenive css", it is enough to remove a few letters (no matter which exactly) to make it work.
#14
Is this related to this #543892: Win XAMPP 1.7.2 Regex PHP Bug - Optimize css & long css comments fails?
#15
Could be, who knows..
I will link it from there, hope it helps!
#16
The default
settings.phpfile now ships with (commented-out) configuration parameters that will allow you to increase the PCRE limits on your system. Just comment out the parameters and set the limits that make sense for your particular case. Closing this bug.#17
Hi.
The solution from #16 is to add this to your settings.php
ini_set('pcre.backtrack_limit', 200000);ini_set('pcre.recursion_limit', 200000);
I have tried with values up to
ini_set('pcre.backtrack_limit', 40000000);ini_set('pcre.recursion_limit', 40000000);
without any luck.
I am on
FreeBSD 7.2-RELEASE-p2
Lighttpd 1.4.23
PHP 5.2.11
Suhosin-Patch 0.9.7
I am switching status back to active, since the solution from #16 doesn't seem to do it.
#18
@Damien (#16):
Please have a look at the inital bug report from Raul; there he writes on April 24, 2009: "I also tried increasing and decreasing the pcre.backtrack_limit and pcre.recursion_limit, but it didn't help".
Fiddling with pcre.backtrack_limit and pcre.recursion_limit is where we failed before filing this bug report half a year ago. The whole point of the patch from #8 is that modifying pcre.backtrack_limit and/or pcre.recursion_limit does not solve this bug, at least not on FreeBSD.
It would be very kind if the powers-that-be could have a look at the instructions from #2 and would try to reproduce this bug. It shouldn't be too hard. Alternatively we would be very glad to receive instructions where else we could file this bug report; since this weakness probably could be used to execute malicious code, it might be useful to inform the security team?
Greetings, -asb
#19
Hi.
I have found out that the regexp that was striping out the comments wasn't so well crafted, and that's why it was segfaulting PHP through endless recursion.
The attached patch fixes the problem: I have tested it with comment chunks of 50K+ and it works fine and quick.
I hope it will help the other FreeBSD+Lighttp fellows in this thread ;)
Please test it and report back.
#20
Wonderful!
The modified regex solves the problem for me on Windows XP (for the test script I created). I did not do other tests yet..
#21
Nice patch -- worked for me too. Thanks!
#22
Here is the same patch as #19 against Drupal head.
If some more of us can test it and verify that it's working as expected, then it can get committed to core.
#23
Could you provide a sample input that triggers the segfault?
#24
@Damien
This code
<?php$contents = '/*' . str_repeat('a', 60000) . '*/ Test1';
$contents = preg_replace('<
\s*([@{}:;,]|\)\s|\s\()\s* | # Remove whitespace around separators, but keep space around parentheses.
/\*([^*\\\\]|\*(?!/))+\*/ | # Remove comments that are not CSS hacks.
[\n\r] # Remove line breaks.
>x', '\1', $contents);
print $contents;
?>
produces this
http://www.webistas.gr/segfault.php
while this code
<?php$contents = '/*' . str_repeat('a', 60000) . '*/ Test1';
$contents = preg_replace('<
\s*([@{}:;,]|\)\s|\s\()\s* | # Remove whitespace around separators, but keep space around parentheses.
/\*[^*\\\\]*\*+([^/*][^*]*\*+)*/ | # Remove comments that are not CSS hacks.
[\n\r] # Remove line breaks.
>x', '\1', $contents);
print $contents;
?>
produces this
http://www.webistas.gr/no-segfault.php
Both the above are running on
FreeBSD 7.2-RELEASE-p2
Lighttpd 1.4.23
PHP 5.2.11
Suhosin-Patch 0.9.7.
I hope it will help you verify it.
#25
I had PHP/Apache crashing with CSS/JS compression on a Win 2003 server. Just applied the patch #19. With compression applied and server still functioning, my YSlow grade has jumped 2 levels. Thanks for patch and I'm +1 for an update to core (no brainer IMHO).
Not knowing much about this, is there still any merit in also making the PCRE increases for backtrack and recursion? I have the luxury of my own server with loads of memory so have no problem bigging up resources.
#26
@snorkers
I am sorry but
pcre.backtrack_limitandpcre.recursion_limitwill not help you get faster... In fact, your YSlow grade increased because you managed to activate css compression/aggregation without crashing the server by applying the patch, and not because of the patch itself.Talking about speed, the patch - besides fixing the original issue - runs also faster. Some rough-cast benchmarking yielded:
Comment block size -> Patched version times faster than core version of regex
100b -> 3,3
500b -> 17,5
1Kb -> 19,4
2Kb -> 38,3
5Kb -> 55,6
10Kb -> 65,2
Offcourse, this speed benefit applies only when
drupal_load_stylesheet_content()function runs, so only the first visitor of a page will notice, nonetheless it's a welcomed sideffect.@Damien
I am wondering what is the meaning behind status needs work. Is there something that needs to be worked out about the #22 patch?
#27
@Tri:
Could you explain what was wrong with the old regex, so we can all learn something? That would be great!
#28
@donquixote:
It wouldn't hurt too much to read the whole thread, especially the initial issue's description, before requesting "explanations" in #27, wouldn't it?
Thank you!
#29
I did read the full thread, I do understand that the regex caused problems, and I am very happy that Tri found this solution.
But, most of what has been said here was plain empirical tapping in the dark, until Tri found that a different regex exists that does the job and doesn't break.
What I would be interested to know: Is this really a poorly written regex, or is it just the PCRE regex engine that does not like it? And if it is poorly written, then what exactly is wrong with it? Is it an invalid regex? Does it have performance problems? Maybe I can learn something for the next time that I have to build a regular expression myself - if Tri does not mind to write an explanation.
#30
/\*([^*\\\\]|\*(?!/))+\*/That code is really poorly documented, but it seems like the objective of this regexp is to capture all the CSS comments that do not end with
\*/(which is the old IE 5 mac hack).I'm not really sure that it makes sense to continue supporting this hack nowadays, but if we want to, this regexp could probably be simplified this way:
/\*(.+?)(?<!/)\*/For D7, I would argue removing the support for IE 5 Mac hacks altogether.
#31
@donquixote
Regular expressions is a programming language in itself, considered by many as a "dark art", because of the complex internals that are hidden behind its condensed - often cryptic -syntax. It's very powerful, but with "power comes responsibility", as the old computing moto is saying.
By feeding this code to PHP
<?phpfunction foo() {
foo();
}
foo();
?>
you won't blame the parser when it will segfault, because it's an obvious coding mistake. Doing the same though, through a regexp isn't so obvious...
So no, it doesn't have to do with the engine.
It's not an invalid regexp, because an invalid regexp doesn't even get parsed.
Going in great detail about regexp theory is out of this thread's scope.
In sort
/\*([^*\\\\]|\*(?!/))+\*/is matching super-linear, a regexp programming jargon for describing situations where the number of possible matches grows exponentially by the length of the fed text. The construct "anything not star, not backslash" combined with "or" and "lookahead" inside the "one or more", leads the engine to try all possible permutations, which means lots of backtracking.First of all it has stability problems, which is why this thread was opened in the first place. After stability comes performance (see #26)
I hope it's clearer now.
@Damien Tournoud
I believe that talking about css techniques and support for IE 5 Mac hacks is out of this thread's scope also. A fundamental concept of the Drupal architecture is never alter a user’s original data, so silently striping code that someone has chosen to include in his css through a core function is out of the question.
Besides that
/\*(.+?)(?<!/)\*/doesn't work as expected since it's striping out\*/hacks and doesn't strip multiline comments.I still don't understand the status switch from #23.
So
I am switching priority back to critical since this bug crashes Drupal if someone has long enough comment blocks in her css or it's preventing people to enable a core function.
I am switching status to "needs review" since I don't see what needs to be worked out in patch from #22 and I am assigning this to myself.
I am attaching the #22 patch against Drupal 6.14.
#32
The last submitted patch failed testing.
#33
@Tri:
Thanks a lot! This is exactly the kind of explanation I was hoping for. I am still scratching my head about the details, but this is now my own problem..
About your latest patch: I think you need to suffix the filename with "D6" or something to avoid it being tested for D7?
#34
Reroll against head.
#35
#36
Supporting the IE 5 Mac Hack *is the source of the performance problem*. Discussing if we want to continue supporting this outdated (and as a consequence useless) hack if perfectly in the scope of this issue.
The look-ahead expression (
(?<!/)) is specifically designed not to strip out \*/ hacks. I'm not sure how the previous regexp managed to strip multiline comments.Anyway, we need to add tests for all that regular expression voodoo.
#37
<?php/*([^*]|*(?!/))+*/ | # Remove comments that are not CSS hacks.
?>
Inside
5.x: http://api.drupal.org/api/function/drupal_build_css_cache/5
6.x: http://api.drupal.org/api/function/drupal_load_stylesheet/6
7.x: http://api.drupal.org/api/function/drupal_load_stylesheet_content/7
#38
#39
We need tests, and I still don't know what you think is wrong with my simple
/\*(.+?)(?<!/)\*/.#40
Absolutely. Here is one.
Result:
- Tri's regex keeps the first part of the ie5 mac hack, but not the second part that ends the hack.
- The old regex had the same problem (so it was quite pointless with ie5 mac hacks)
- Tri's regex as standalone does not completely remove the comment. It needs the part that says "remove whitespace".
- Damien Tournoud's regex removes the ie5 mac hack (as expected).
- none of them causes a crash.
EDIT:
The things I pointed out about Tri's regexp are not relevant int he context of the patch: The patched version works the same as the unpatched version - this is what matters.
#41
This statement is inaccurate. The regex that is causing the problem was intended to support the IE 5 Mac Hack. This does not mean that supporting this hack requires a poor regex. If we can find a fast regular expression which successfully preserves the IE 5 Mac Hack, we shall use it.
<?php$content = preg_replace('<
((/\*.*\\\\\*/([^/]|/[^\*])*/\*([^\*]|\*[^/])*\*/)|/\*([^\*]|\*[^/])*\*/) # Remove comments that are not CSS hacks.
>x', '\2', $content);
?>
It works for my simple examples, but some other tests would be helpful.
This one tries to preserve the trailing comment as well. If that is not what we want, we can further simplify.
EDIT:
Added a few characters to allow /* in the trailing comment.
Btw:
Noone cares about the length of the regular expression, as long as it is fast, and there is a documentation somewhere.
#42
The ie5 mac hack comes in two flavors.
The first is that if you insert a comment containing a backslash MacIE5 would ignore the next (single) rule, so no second part that ends the hack is needed.
The second is that if the \ is immediately followed by the closing */, you can comment out a whole block of rules for MacIE5 and in this case you need a second part that ends the hack.
The proposed regexp works for the first flavor as it did the original from core. I just rewrite it to be stable and efficient.
The proposed patches are leaving the "remove whitespace" part of the regexp intact, only the "remove comments that are not CSS hacks" is changed, I don't know why it exists in your test as "case 4".
In my tests both regexp from #41 and #39, are striping out every comment regardless of hacks or not.
#43
Case 4 exists more out of curiosity - I like to understand things.
But I would say yes, the patch does the job in my test cases, in that it behaves the same way as the old version.
#44
I have worked out the patch some more and this version respects both flavors of ie-mac hack, multiline or single line.
The trick is to add a backslash also at the end hack block in a first pass (it doesn't hurt ie-mac as my tests have shown) so the second pass which strips the comments doesn't touch it.
I am attaching patches for d6 and head along with a test script that checks for 60K comment blocks, both versions of the ie-mac hack and multiline, multistar comment blocks. It also displays the elapsed time to complete the tests.
Please test it and report back.
#45
The patches/files as #44 with a small fix. #44 was adding an extra \ also at the begging ie-mac hack comment block.
Now it's not.
#46
The last submitted patch failed testing.
#47
There was an other css agreegation related bug here drupal_load_stylesheet_content() removes whitespace improperly, creating invalid CSS which ended up commiting a patch that removed the "# Remove line breaks" part of the regexp discussed in this issue.
The regexp is still segfaulting on large enough comments so I am rerolling my patch against HEAD. Unfortunately, I am too busy at the moment to educate myself about writing simpletests, as the guys over that issue did. I understand that this is the correct way to go, but I can't for now. In the attached zip, you will find a css file that can be used to prove the bug, so someone fluent in writing simpletests is welcomed to use it.
Just to sum up what has been done in this thread, in order to spare new comers here from reading the whole thing, besides the original segfaulting on large comments bug, an other bug was found were ending ie-mac hack comment blocks were erroneously getting stripped out. The proposed patch fixes them both.
#48
#49
subscribe
#50
We're battling an unwinnable battle here with the wrong tools: You can't successfully parse CSS with regular expressions. Several minor improvements have been committed in D7, but for D8 we definitely need to parse CSS: #494498: Add CSS parsing capability to Drupal.
That said, we still need to address this for D7.
#51
@Tri,
Thanks so much for this work. This bug was driving me crazy.
Manually applied #47 to 6.14 and it fixed the crashed site problem with no other apparent side-effects.
Since in my ignorance I'm guessing that the testing system uses the version number in the issue data, I'm going to post the D6 patch here #331915: Performance: When enable optimization for css and javascript ... site goes down instant!.
#52
fix for bug works for me as well
#53
@rfay (#50):
Is this a documented fact from computational complexity theory? Or is it just "difficult" ? Remember, we are not completely parsing CSS here, we only want to get rid of comments. That's a smaller problem, I would say (but again, don't know if that's a proven fact).
#54
@kentr
I am glad that I helped :). The patch for D6 is actually at #45. The testbed says "failed" for the D7 patch, it doesn't check for D6. I am using it in all my D6 sites and it works flawlessly.
I am also glad that there is life again in this thread. Since status is now RTBC, I believe it might be time for the patch to make it into core. Objections anyone?
#55
I should mention that #472820: drupal_load_stylesheet_content() removes whitespace improperly, creating invalid CSS (mentioned in #47 on its commit to D7) also got committed to D6 the other day, which will (I believe) cause the D6 patch from #45 to need a re-roll.
#56
@rfay
You are right. I missed the D6 commit of your patch. So here is a re-roll against D6.14.
#57
Subscribe, I'm seeing this problem on my Windows machine too.
#58
@marcvangend
Does the patch fix the issue for you? If so please say so.
#59
I think we should remove support for the IE5mac hack in D7. It was released in 2000, hasn't been updated since 2003, is unavailable for download since 2006. And most importantly, NO ONE USES IT.
Its nonsensical to jump through hoops to make sure the hack continues to work in Drupal 7. Support for it is already in D6, so I guess we'll have to leave it. But if we can fix the segfault problem and simplify the regex in D7, we should do it.
Is that regex supporting any other comment-related hacks besides the IE5mac one? It doesn't look like it. Here's a detailed explanation of the
/\*([^*\\\\]|\*(?!/))+\*/ # Remove comments that are not CSS hacks.regex that is currently in D7:/\* # match "/*"( # start of matching sub-pattern
[^*\\\\] # match a single character that is not "*" or "\"
| # OR
\*(?!/) # match "*" that is not followd by "/" (memory-exploding negative look-ahead)
) # end of matching sub-pattern
+ # match one or more of the sub-pattern above
\*/ # match "*/"
In the attached patch, I've removed the check for a backslash in a comment and modified Tri's regex very slightly. BTW, Tri, your regex is awesome; I'd have never thought of it. Here's a break down of the regex in my patch:
/\* # match "/*"[^*]* # 0 or more characters that are not "*"
\*+ # 1 or more "*" characters
( # start of matching sub-pattern
[^/] # single character that is not "/"
[^*]* # zero or more characters that are not "*"
\*+ # one or more "*"
) # end of matching sub-pattern
* # match zero or more of the sub-pattern above
/ # match "/"
Still needs simpletests.
#60
@mikeytown2: I had this problem in two D6 sites I copied to localhost for upgrades and new features. On one site the patch from #56 solved the problem, the other site I haven't tested yet.
By the way: many thanks to all of you who are working on this. Personally, I suck at regex, so I'm glad I don't have to fix this :-)
#61
@JohnAlbin
I believe you are mistaking css parsing with css authoring. This issue has to do with css parsing - any css that someone might wish to include in his site, has to stay intact - functionally speaking. I agree with you that IE-Mac is outdated but this is a matter of css authoring. This thought came up again in this very thread (#30, #31 second part, #36 & #41).
Regexp's were never meant to be read by humans. They are crafted with the machine in mind. In the regexp world, the easier for the eye - the harder for the machine and vise-versa, holds true.
Actually I don't see any serious hoops. The proposed patch is in fact two lines of code and it is not even PHP. A few character classes and a few greedy repeats is all that is. From this we get a) stability (from what other people have verified in this thread b) speed (see #26) and c) IE-Mac support for whom might want it (improved! (#44)).
About the simpletests, I understand that it is the defacto way to test patches in Drupal, though, since the patch is plain regexp stuff - no PHP or Drupal constructs used, I believe it can be tested standalone... Since I am not familiar with simpletest works, I can't write one myself, so if anyone can help on that it would be nice.
I am re-attaching (not re-rolling - so people that have tested it so far don't have to bother reapplying) my patches for D7 & D6 from #47 & #56 so people coming here searching, can find the one they need without jumping from post to post. Also in the zip there is a .css for tests and .php to test standalone.
#62
I would call #61 RTBC
#63
And if you don't enable CSS optimization, your IE5mac hack will remain. But if you want Drupal to alter your CSS so that it is optimized, I don't see the problem with stripping comments. We should not add extra code to support IE5mac in D7's CSS optimizer. I agree with Damien on this point.
Readability is not what I meant when I said "simplify". I meant less code == less time to compile the regex == faster execution. Your patch does simplify the existing regex (can't say enough good things about that part of your patch), but it also adds an additional call to preg_replace() per stylesheet. On a site with 35 stylesheets, that's an additional 35 calls to preg_replace just to support IE5mac. That's what I'm objecting to.
Also, your speed tests are against a version of the patch before you added the extra preg_replace().
Finally, your
/\*[^*\\\\]*\*+([^/*][^*]*\*+)*/contains an unneeded * in it. Temporarily ignoring the fact that the backslash checking should be removed, it should (at the very least) be:/\*[^*\\\\]*\*+([^/][^*]*\*+)*/because the regex part just before the[^/]is a greedy\*+so[^/*]would never match a *.#64
@JohnAlbin:
Thanks for looking at optimizing the regex. Would you mind testing your regex and rolling a patch?
Either way, IMHO it's important to get *something* committed so that CSS aggregation won't break sites - to help people who are using D6.
#65
John, thanks for the kudos and for revising my regexp. Though,
is not true. Greedy repeats are "non-capturing", which means that they can "lend" a character if the following pattern needs it to match. So
*+([^a])foowill match "****foo" while*+([^a*])foowill not.About IE-Mac, I see your point. All this regexp talking is scaring people away and they don't get involved in something that is IMHO a matter of decision taking about how to strike a balance between efficiency and "politic correctness". I am going to sum up the possible choices in one post so the community can make up their mind.
The ie5 mac hack comes in two flavors.
The first is that if you insert a comment containing a backslash, MacIE5 would ignore the next (single) rule, (lets call it hack v.1).
The second is that if the \ is immediately followed by the closing */, you can comment out a whole block of rules for MacIE5 and in this case you need a second comment part after the block of rules that ends the hack, (lets call it hack v.2).
The original implementation from core supports hack v.1 only (when it doesn't segfault on large comment blocks). So the choices we have are:
1) Drop the IE-Mac hack all together.
This will be the fastest of all. Scores high on efficiency but alters the functionality of the original css. If we follow this path, we have to place a warning at the "Performance page" bellow the "Enable CSS optimization" about hacks not supported.
2) Support hack v.1 only as the original did.
This is the patch from #22 (needs re-roll for current HEAD). This is something like 20-60 times faster than core (#26). Still alters the functionality of the original css and will need a warning about multi line hacks not supported.
3) Support both hack v.1 & hack v.2
This is the patch from #61. Some rough testing shows that it is 3-4 times faster than core.
The speed benefits applies only when
drupal_load_stylesheet_content()function runs, so only the first visitor of a page will notice.Please everybody say what you believe is the correct way to go. After the path will be shown, we can use the correct regexp to take us there...
#66
I vote for option 1. I haven't had to support IE5 in ages and the gain in performance is much more important to me.
IE5 for Mac was replaced as the default browser by Safari in 2003, with the release of OSX 10.3. Do you remember? That's when iMacs looked like this. Drupal doesn't even try to be compatible with it's previous major release, so I don't see why we should sacrifice performance IE5 for Mac.
If the majority here decides that full support of IE5 hacks is important, I want to propose to offer three choices on /admin/settings/performance:
Optimize CSS files:O Disabled
O Performance mode (faster, can remove CSS hacks for older browsers)
O Compatibility mode (slower, respects all CSS hacks)
#67
I should point out that unless I'm off track here, the performance issues *only* come into play when the aggregated CSS has to be rebuilt, as when CSS Aggregation is turned on or when a file is changed, making it far less of a real performance issue - it does not occur on every page load, but rather more when administrative events happen.
#68
Does Safari work on older Macs too?
#69
Definitely not on every page load. What about rebuilding the cache? But, optimization in general is a good thing, so this would be a good time (while it's in our face) to implement an acceptable optimization lest it get lost in the shuffle...
Did the code that attempted to remove comments while preserving the Mac hack ever work reliably? If not, take it out, because it wasn't ever something people could depend on anyway.
But if it did work, then it was a core feature of Drupal, so removing it doesn't seem to be our call. In this case, i vote Tri's option #2. Adding support for hack v.2 if it was never in the first place is a step backwards, IMO.
I like marcvangend's solution for providing the option to the user - at least for D7. If it's realized and easily backported to D6, nice but not a big deal.
#70
@kentr, yes, nearly the only time this performance would be an issue is at cache rebuild time. In my world-view of performance optimization, that makes it irrelevant to even think about because that's a very unusual event.
#71
@tri, @rfay: you're right, the code isn't executed on every page load. I guess I forgot about that because I've seen the error occur again and again and again...
If performance isn't really an issue here, I definitely think we should choose option 3. Not because I suddenly care about MacIE5, but because it's better for usability/UX to avoid warnings whenever possible. I can imagine that many site administrators don't even know if their theme uses MacIE5 hacks. It would be a shame if they don't dare to turn on CSS optimization, just because they cannot judge if the warning applies to their situation.
#72
Agree as well, do it right the first time. Allow this to not fail on CSS hacks.
#73
Admin performance (with "rebuild theme registry on every page") is my time and my money, so...
well, but in this particular case, it's not such a big difference.
I have never bothered to implement any IE Mac support in my projects, and probably none of those posting in this thread have, but that's not really the point is it?
I did a quick search for "ie mac site:drupal.org"
http://www.google.de/search?hl=de&client=opera&rls=de&q=site%3Adrupal.or...
Some of these issues contain posts from 2007, which is not that long ago.
And one quote I found in a mailing list:
http://lists.drupal.org/pipermail/themes/2005-December/000088.html
Exactly what I said - still used on older machines :(
Some stats would be interesting! And certainly more relevant than some personal opinions posted in this thread.
EDIT:
Ok let's get 3) in. It's not a bottleneck even for admin pages, so why care?
#74
Everyone is in agreement here that we leave in the macIE css hacks correct?
#61 is RTBC
#75
Subscribe (ref D6 issue: http://drupal.org/node/331915)
#76
The code comments in #61 are missing a space after the pound character. Needs quick reroll.
#77
Rerolled it myself and committed it to CVS HEAD. Thanks.
#78
Thanks Dries.
If I understand correctly, 'CVS HEAD' means that is fixed in D7 - or will it be rolled into D6 as well?
#79
We need it for 6.x now
#61 is RTBC
#80
I would like to thank everybody in this thread for their help in reviewing, testing and voting for the various options - especially, Donquixote for discovering the hack v.2 bug and JohnAlbin for revising and thought provoking objections.
See you everybody at the next issue.
I am attaching patch from #61 for D6 with the missing spaces.
#81
+1 RTBC, Tested patch from #80 under
Apache/2.2.9 (FreeBSD) DAV/2 PHP/5.2.6 mod_ssl/2.2.9 OpenSSL/0.9.8eFreeBSD h2.itl.ua 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Wed Oct 29 09:48:51 UTC 2008 axl@h2.itl.ua:/usr/obj/usr/src/sys/H2 i386
There's little difference within 6.14 and 6-dev caused by #472820: drupal_load_stylesheet_content() removes whitespace improperly, creating invalid CSS
#82
Tested the patch from #80 under
Apache/2.2.14, PHP/5.2.11, FreeBSD delsi.pair.com 7.2-STABLE FreeBSD 7.2-STABLE #0: Wed Oct 21 12:38:36 EDT 2009 erik5@kyack.pair.com:/usr/obj/usr/src/sys/7PAIRe i386and it works great!
#83
#642974: CSS aggregation removes multi-line ie-mac hacks when the hack rules contain a star
#84
This apparently broke the very thing it was supposed to support.
From #642974: CSS aggregation removes multi-line ie-mac hacks when the hack rules contain a star:
So let's just cut the crap and remove support for this IE-mac hack completely.
And: why in hell was that committed without the corresponding tests? Frankly, we should not accept any regexp-that-nobody-understands without a comprehensive test suite.
#85
I am working on the CSS aggregator: #641472: Fix CSS Aggregator.
Latest patch still contains that bug, but i am working on it.
#86
This fixes the original segfault bug and supports v.1 mac hacks exactly as the original from core did. The #84 has to do with v.2 (multiline) mac hacks. Since v.2 mack hacks are a kind of a "new" feature, I believe it has to be discussed in its own thread an not here.
Let's do it here #642974: CSS aggregation removes multi-line ie-mac hacks when the hack rules contain a star
#87
@Tri:
I don't get it. I thought we had option 3 ? If 3) removes the hack, it does not do what it is intended to do.
@Damien Tournoud
> So let's just cut the crap and remove support for this IE-mac hack completely.
Then why does some core CSS still use this hack? If this was discovered in even core's common.css, then I would not be surprised if a bunch of modules and themes do also have css hacks.
#88
@donquixote
We do have v.2 mac-hacks as stated in 3). But Casey at #83, discovered a bug: If the block of rules of the v.2 hack contain a star, the first
preg_replacefails to detect the second comment part that ends the hack in order to add a slash to it, so the secondpreg_replacejust strips it off.Since this thread has come a long way and started as a critical segfault bug, I felt is better to address this bug in a thread of it's own rather than reopening this one.
#89
Let's not move fixing bugs introduced by this patch to a different issue, or we will have a hard time figuring out when/how is this ready for D6 (which it does not yet seem so). Also, Damien seemed to have other outstanding issues as well, like adding tests for this.
#90
Ok, so the action comes back here again from #642974: CSS aggregation removes multi-line ie-mac hacks when the hack rules contain a star.
I am attaching a new patch that uses a different approach to the conditional comment stripping, since trying to identify start & end v.2 ie-mac hack comment blocks through regexps is getting too complex. I use a regexp to match all comments and comment strip logic is implemented in a function.
In case you are worried about performance: In some rough benchmarks I have done, I found that it takes ~0.040sec to proccess 25 css files coming from a D6 setup with zen, views, admin_menu, date_popup, filefield & panels on my rather slow dev box.
I added a css unit test to check for mac hacks v.1 & v.2. It also has a huge 60Kb comment block to test for segfaults, so don't get scared by the size of the patch, it's just the test css files.
Please test it and report back.
#91
Patch #90 worked for me on my 6.15 install (had to tweak the patch a little to make it apply). Now I can re-enable CSS aggregation on my sites.
BTW I think this is the direction to go. Trying to match hacks in multiline comments using a single (multi branch) regexp may be problematic, IMHO.
@Tri, if I understand your patch correctly, you moved the comment stripping logic to another preg_replace_callback() call, but kept the branch separator in the original regexp, however there's no need for it anymore. You can chop off the "|" separator from the regexp.