Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Semicolons in @import urls (e.g., certain types of Google Fonts variants) don't work with aggregation. In addition, there are a number of modifiers (e.g., media queries or supports-queries) on the @import syntax that are not captured by our test coverage. See: https://developer.mozilla.org/en-US/docs/Web/CSS/@import and https://regex101.com/r/H2SOFh/1
Also, the CSS collection optimizer doesn't have test coverage.
Comment | File | Size | Author |
---|---|---|---|
#73 | 2936067-73-backport.patch | 925 bytes | pfrenssen |
#29 | interdiff_27-29.txt | 663 bytes | ravi.shankar |
#27 | 2936067-test-only.patch | 7.2 KB | eiriksm |
#26 | 2936067-test-only.patch | 7.2 KB | eiriksm |
#25 | 2936067-25.patch | 835 bytes | patrickkrueger |
Issue fork drupal-2936067
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
DuaelFrComment #3
agentrickardThis is an invalid URL. The ampersand and semicolon are reserved characters with special meaning and need to be encoded.
The
&
should be recoded as%26
using a function like http://php.net/manual/en/function.urlencode.phpSee https://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved... and https://tools.ietf.org/html/rfc1738
Comment #4
agentrickardThe '&' here is being used to pass query arguments, and needs to be an actual & and not
&
https://fonts.googleapis.com/css?family=Sedgwick+Ave+Display&subset=lati...
Comment #5
DuaelFr@agentrickard it was a real life example. Yes, this one has an easy workaround but it is real and it is working in all browsers so it's strange that Drupal breaks it when aggregation is enabled.
The thing is that, according to the RFC3986, semicolons can be used as "sub-delims". It's not very common but it exists. If you read the "Notes" section of the PHP urlencode function documentation, you'll be teased that both "&", ";" and "&" can be found as separator.
I wasn't able to start writing a test for
\Drupal\Core\Asset\CssCollectionOptimizer
yet but I think it's going to be needed.The attached patch has been inspired from
\Drupal\Core\Asset\CssOptimizer::processCss()
and the new Regex can be tested ->here<-. It fixes the issue mentionned in the summary.Comment #6
bradjones1Indeed this is very much in the wild; e.g., fonts from Google Fonts produce such an embed code.
It is definitely uncommon to see a semicolon in a URL, but my read of RFC 3986 comports with the above comment.
Let's bump the version and see if this still passes.
Comment #7
bradjones1Pass! I think this just needs a solid addition to the test suite.
Comment #8
proweb.ua CreditAttribution: proweb.ua commented#5 works
Comment #9
webdrips CreditAttribution: webdrips commented#5 works for me too.
The line that was breaking our site:
Comment #10
tanubansal CreditAttribution: tanubansal at Salsa Digital commentedTested #5
RTBC + 1
Comment #12
Clemens Sahs CreditAttribution: Clemens Sahs commentedTested #5
RTBC + 1
needed for 8.x too!
Comment #13
bradjones1Still needs tests :-)
Comment #14
Cosmin Hodis-Mindras CreditAttribution: Cosmin Hodis-Mindras commentedTested #5
RTBC + 1
Comment #15
Clemens Sahs CreditAttribution: Clemens Sahs commentedI think this works back for drupal 8.x to
Comment #16
bradjones1Concerned about including an external URL (even from somewhere as "stable" as you'd expect Google Fonts to be) in tests - even if there's a temporary network failure in retrieving, this would be more fragile than we probably need. I imagine a text fixture could serve just as well.
Comment #17
Clemens Sahs CreditAttribution: Clemens Sahs commentedIn the case we make some network interactions, yes you are right and my full approval.
But in this case (CssOptimizer) we make simple string interaction.
at the same time we must edit the following, too.
Please correct my if I miss something?
Comment #18
bradjones1Ah, yes, sorry - brain fart. I think then let's just change the fonts.googleapis.com to fonts.example.com, per the RFC on example domains and so as to not "endorse" any particular font vendor by reference?
Comment #19
AndyF CreditAttribution: AndyF at Fabb for FRUITION commentedAlso I think the regex as it stands might fail to match some (edge!) cases correctly?
@import "cu'st;om.css";
@import 'cus(t;om.css';
@import 'cus)t;om.css';
@import 'cus(t);om.css';
@import url(http://example.com/cu'st;om.css);
@import url(http://example.com/cus(t;om.css);
@import url(http://example.com/cus)t;om.css);
@import url(http://example.com/cu(s)t;om.css);
See https://regex101.com/r/97dCNQ/2.
Thanks!
Comment #21
bradjones1Hi Andy!
You raise a good point - in recent years internationalization (IRIs) and things like funky Google Fonts URLs have clouded the question of "what is a valid URL" to the point where it's rather difficult to regex it. I went back and looked at the current code, which basically boils down to matching any text starting with
@import
through to a semicolon:https://regex101.com/r/ybSIaY/1/
As you can see, this matches all the existing test cases but incorrectly truncates the URL containing a semicolon.
I have updated this regex to now be
'/(*ANYCRLF)(*BSR_ANYCRLF)@import.*;(\R|$)/i'
which now matches from the start of the import statement to the end of the line (or to the end of the file, if for instance the file doesn't end with a newline. This shouldn't happen per the spec but might be worth trying to catch anyway?)https://regex101.com/r/5YE5Rr/1
So far it looks like this helps clean things up, passes the expanded test coverage and doesn't have any regression on the existing aggregation unit test. The thing this will not capture is any import statements spanning multiple lines. I imagine this was the intent of the
[^;]
in the existing regex. AFAICT this is not a common pattern and I'm not quite sure there's a way to write a regex that does all the things we need here and handles the newlines in a platform agnostic way, but I could be wrong. We don't have this multiline condition in the current test coverage and honestly I'm not sure it's valid per the CSS spec, so this may be worth a possible regression to get Google Fonts with semicolons working, which is a way more common condition.Comment #23
bradjones1Comment #24
bradjones1Does the CSS collection optimizer not have tests? Looks like it might not?
Comment #25
patrickkruegerRerolling 2936067-5.patch against Drupal 8.9.xUpdate: 2936067-5.patch is applying successfully against Drupal 8.9.x – there was some misconfiguration in our build process causing this irritation.
Comment #26
eiriksmNot sure how polite it is to just take over someones merge request, but here is a test only patch that fails for me.
I kept the changes to the optimized css files, even though we could use css strings directly. Seems practical, if we changed something that in turn ended up changing those, somehow.
Comment #27
eiriksmok here is one without that phpcs error
Comment #29
ravi.shankar CreditAttribution: ravi.shankar at OpenSense Labs commentedJust fixed failed test of patch #27.
Comment #31
bradjones1Comment #32
bradjones1Updating the title to handle some additional edge cases that are material to our handling of @import; might as well knock them out since this ticket has expanded to cover the collection optimizer, as well.
Comment #33
bradjones1Would be interesting to see if this resolves any of the referenced issues; some propose changes in behaviour rather than bugfix but this is the only one with tests AFAIK.
Comment #35
majorrobot CreditAttribution: majorrobot at CivicActions for Global Game Jam commentedThe patch in #25 worked for us. Would be great to see this merged!
Our pre-aggregated CSS:
@import url(https://fonts.googleapis.com/css2?family=Nunito+Sans:wght@200;400;700&display=swap);@import url(https://fonts.googleapis.com/css2?family=Noto+Sans+TC:wght@400;700&display=swap);
And after aggregation:
@import url(https://fonts.googleapis.com/css2?family=Nunito+Sans:wght@200;@import url(https://fonts.googleapis.com/css2?family=Noto+Sans+TC:wght@400;
It appears the optimizer saw the semicolon, stopped, and went to the next rule.
Comment #36
bradjones1@majorrobot if you think it's RTBC can you mark it as such?
Comment #37
majorrobot CreditAttribution: majorrobot at CivicActions for Global Game Jam commented@bradjones1 I haven't verified the tests. I'm not clear if that is a blocker to RTBC?
Comment #38
bradjones1@majorrobot Not sure what you mean by "verified" but if you have functionally validated this (your comment above) and taken a look at the patch and think it's ready for framework maintainers to review, RTBC it.
Comment #39
majorrobot CreditAttribution: majorrobot at CivicActions for Global Game Jam commentedThanks @bradjones1. Done!
Comment #40
bradjones1Comment #41
szeidler CreditAttribution: szeidler at Ramsalt Lab commentedWe're using the merge request changes successfully on a couple of projects, to make the Google Fonts import work as expected.
Comment #42
alexpottI think this is a pretty major bug if CSS aggregation is failing on snippets provided by google fonts then that's quite unexpected.
While reviewing the code I think there is one part of the change that's not necessary - or if it is then it needs test coverage to prove it.
Comment #43
anrikun CreditAttribution: anrikun commentedA quick fix for D7 users (port of patch at #25)
Comment #44
pfrenssenThis needs a reroll now that #2669074: Convert file_create_url() & file_url_transform_relative() to service, deprecate it is in.
Comment #45
pfrenssenThis is a bugfix so it can stay on 9.2.x for the time being. Then we have no merge conflict since #2669074: Convert file_create_url() & file_url_transform_relative() to service, deprecate it is a feature request for 9.3.x.
Comment #46
pfrenssenI have updated the test so that inline and external imports are now mixed. I will now revert to the original approach using
preg_replace()
to address comment #2936067-42: CSS aggregation fails on many variations of @import.Comment #47
pfrenssenOK it is working as expected. Remark #42 is addressed. Ready for next review!
Comment #48
idimopoulos CreditAttribution: idimopoulos for European Commission and European Union Institutions, Agencies and Bodies commentedIt worked fine for us. Remarks addressed. RTBC +1.
Comment #49
lauriii@nod_ pointed out #19 which led into discovering that there are at least some advanced variations where the previous regex would have recognized an import statement but the new regex wouldn't:
Discussed with @nod_ and @alexpott about use cases where the new regex will recognize invalid CSS syntax. We thought that would be acceptable because that's essentially pre-existing problem, and this issue won't make it worse. Also, the new regex essentially decreases the likelihood of CSS aggregator created syntax errors.
Comment #50
bradjones1@lauriii thanks for the review. Can you be more specific what you mean by the "new" regex? I had proposed a change to the regex in https://git.drupalcode.org/project/drupal/-/merge_requests/241/diffs?com... which I think later got backed out. Can you be more specific what you think needs to be changed here? It's not clear to me from your comment. Do we also need to include any additional test (not valid, but common mistake) patterns you mention, a la those in #19? Thanks.
Comment #51
lauriiiI was referring to the regex that has been changed as a new regex since the pattern changed significantly from the regex that we use currently. I'm sorry because that could have been clearer.
I think what needs to be done is add test coverage that proves that the two examples I posted in #49 work with the changes.
Comment #52
bradjones1OK thanks for the clarification.
See https://regex101.com/r/QuK3Pp/1 which is based on the set presented in #19 but with an updated regex to:
/@import\s*(?:url\(\s*)?[\'"]?([^\'"\]+)([\'")]+\)?.*;)/ig
The difference being the removal of the
U
(ungreedy) flag as well as removing(
and)
as invalid characters inside of the URL.Pushed this in https://git.drupalcode.org/project/drupal/-/merge_requests/241/diffs?com...
I'll admit it's been a while since I last looked at this issue so I'm not sure if making this greedy will have undesired effects, but I suppose that's what the test suite is for!
Rebased to 9.3.x.
Comment #53
bradjones1Realized I included some new URL patterns to the test but these also need to be included in the aggregated fixtures.
Comment #54
AndyF CreditAttribution: AndyF at Fabb commentedThanks for all your work on this @bradjones1!
This is probably a silly question, but if we have a greedy
.*;
on the end, does the new regex buy us anything over just matching@import.+;
?Comment #55
nod_Seems a regex will be tricky, we need to catch those things to have a complete solution:
https://developer.mozilla.org/en-US/docs/Web/CSS/string#syntax
https://developer.mozilla.org/en-US/docs/Web/CSS/url()#values
Comment #56
pfrenssenLooks like the regex posted above by @bradjones1 in #52 covers those edge cases?
https://regex101.com/r/QuK3Pp/1
Is there a case that is not covered by this example? I see that both the regular strings and
url()
declarations are covered with semicolons and parentheses inside them.Comment #57
pfrenssenOK I see that the escaped quotes were not covered, but they are working with the same regex. Here I have added additional test cases for escaped quotes and double quotes as described on https://developer.mozilla.org/en-US/docs/Web/CSS/string#syntax
https://regex101.com/r/ThvAX6/1
Comment #58
nod_Oh nice :) Regex seems to have problems when several @import statements are on the same line
Comment #59
pfrenssenSince this has been rebased on 9.3.x the patch no longer applies on 9.2.x and below. Here is a quick reroll of the patch for people that need to unbreak their sites right now.
This patch doesn't include the test coverage but can be applied on a wide range of Drupal versions: 8.8.x, 8.9.x, 9.0.x, 9.1.x, 9.2.x and 9.3.x with varying degrees of offset.
Comment #60
AndyF CreditAttribution: AndyF at Fabb commentedI wonder if it's unnecessarily complicated though? It mostly optionally matches at the front and then has a
.*;
at the end. If we remove the optional parts, I think the regex reduces to@import[^\'"\]+)([\'")]+.*;
(https://regex101.com/r/ESu2qg/1). At that point it's not clear to me that it offers anything over a simple@import.+;
(https://regex101.com/r/EfhPdD/1). Sorry if I'm missing something already discussed!Comment #61
pfrenssenNeither of those work with inline statements as mentioned by _nod in #58.
The inlining makes it a whole lot more complex. I have been experimenting with it but I'm stumped on the parentheses and semicolons that can be part of a URL inside
url()
, for example@import url(http://example.com/cu(s)t;om.css);
. I have no idea how we can reliably detect the ending parentheses and semicolon.This is already detecting inline @import statements that do not use
url()
:https://regex101.com/r/Az82S0/1
Comment #62
pfrenssenOK we need to narrow down what is considered to be a valid URL. It is not fully documented on https://developer.mozilla.org/en-US/docs/Web/CSS/url()
I've been experimenting in the Firefox CSS engine, and these URLs are NOT valid:
url(http://example.com/cu'stom.css);
url(http://example.com/cu"stom.css);
url(http://example.com/cus(t;om.css);
url(http://example.com/cus)t;om.css) ;
url(http://example.com/cu(s)t;om.css);
url(http://user:pass@example.com/cu(s)t;om.css);
url (http://example.com);
- note the space between "url" and the opening parenthesis. This is invalidTo make the above valid they can either be wrapped in quotes or escaped with backslashes. These test cases are valid:
url( " http://user:pass@example.com/c'u(s)t;o\"m.css " ) ;
url( ' http://user:pass@example.com/c"u(s)t;o\'m.css ' ) ;
url( http://user:pass@example.com/c\'u\(s\)t;om.css ) ;
This makes a lot more sense :)
Comment #63
pfrenssenI have added the url() matching on quoted and double quoted strings: https://regex101.com/r/nNj8Sn/1
Still missing is the quoteless but escaped case
url( http://user:pass@example.com/c\'u\(s\)t;om.css ) ;
I have no more time unfortunately to continue on this today.
Comment #64
pfrenssenI settled on this
/@import\s*(?:(['"])?(?:\\\1|.)*\1.*|url\(\s*(?:(['"])?(?:\\\2|.)*\2|(?:\\[\)\'\"]|[^'")])*)\s*\).*);/gUi
Passes all the test cases, on separate lines or inline: https://regex101.com/r/X6B10k/1
Comment #65
pfrenssenI am getting errors when trying it in a real site :-/
On PHP 8.0 this throws
JIT stack limit exhausted
. It works though when I disablepcre.jit
and increase thepcre.backtrack_limit
to 10000000 but that is not a good idea.I'm guessing it is caused by having 2 option groups in combination with non-greedy matching. However this seems to be necessary, we need to be able to identify matching quote pairs.
Possibly we can greatly reduce the number of recursions if we split this in 2 separate regexes? One for the string URLs and one for the
url()
pattern?Comment #66
pfrenssenI did some more testing but matching quoted URLs using option groups and backtracking are out of the question. It works fine on individual test cases, but when testing with 100kb minified CSS files they are too heavy.
There are 5 possible cases of import statements. I added a simple regex for each case and execute them in series. This is fast and memory efficient even with large files.
We still need to repair the tests and extend them with some of the cases discovered above.
Comment #67
pfrenssenBackport for Drupal 9.2.x and below.
Comment #68
nod_I like the solution, this might change the source order of @import though, depending on what is matched when.
Comment #69
nod_Looks like we can combine things:
Comment #70
nod_looks like it's working, one fail, haven't look to see if test or code is "wrong"
Comment #71
pfrenssenI'm going to work on this for a few hours, take a look at the test coverage. I will unassign as soon as I am ready.
I will also create a downloadable patch compatible with older D9 versions to help out @ultrabob.
Comment #72
pfrenssenThere is a bug in the regex, it is not detecting some import statements correctly, such as the following:
@import url(http://example.com/c\"us\(t;o\'m.css);
@import "cu\"st;om.css";
@import url( http://user:pass@example.com/c\'u\(s\)t;om.css ) ;
I'll see if I can reconstruct it from the 5 separate ones. It looks like something went wrong possibly in the escaping of the quotes. Perhaps we should use NOWDOC to define the expression.
Comment #73
pfrenssenStopping for today. Using NOWDOC is indeed a good way to import the regular expression. It is also possible to import it in quoted form but this involves tricky triple backslashes. If we keep it as a NOWDOC then this can also be trivially copy/pasted into an online regex builder tool for future work. Otherwise we'll have to strip out the escapes again.
While working on the tests I noticed that there are some bugs in the code that strips out duplicate whitespace. There are some regexes that will detect quoted strings, but they get tripped by unquoted import statements that contain escaped quotes. For example something like
@import url(some\'file.css);
will trip up whitespace deduplication in the rest of the line. This is out of scope for this issue though. I will file a followup.Here is also a version of the patch without test coverage for people who want to use this on older versions of Drupal 8/9.
Comment #74
pfrenssenCreated followup for the whitespace deduplication issue: #3241196: Whitespace deduplication tripped up by escaped quotes in URLs
Comment #75
pfrenssenAwesome, looking good from my side. I also tested it using a real life project that was affected by this bug and it works fine now. No more problems with the stack limit. Setting back to NR
Comment #76
nod_Same happy with it now too
Comment #78
nod_Comment #79
nod_looks good to me, need someone to RTBC this :)
Comment #80
ultrabob CreditAttribution: ultrabob commentedI spun up a copy of a site that is known to have this issue, upgraded it to 9.3.x, and confirmed that css was aggregating and the issue persisted. I then applied the patch, and css was still aggregating and the problem was fixed.
I went through the code as well, and while the regex is a monster, that I'm not sure I fully grasped, the code and tests both look good.
Comment #81
alexpottUpdating the link in the IS to the current regex and using @pfrenssen's examples...
Comment #82
alexpottSorted out commit credit. I creditted people who reviewed the patch and commented on the code via gitlab. I didn't credit @ravi.shankar because their fix was not a fix. I didn't credit @patrickkrueger since the upload was for Drupal 8.9 and the patch for that existed and applied already. And I didn't @anrikun because D7 patches need a separate issue as per policy.
Comment #83
alexpottCommitted 7374845 and pushed to 9.3.x. Thanks!
Committed 5aa154d and pushed to 9.2.x. Thanks!
I backported the change and fixed the conflict in core/tests/Drupal/Tests/Core/Asset/CssOptimizerUnitTest.php and ran all the tests in core/tests/Drupal/Tests/Core/Asset - everything looks good.