Semicolons in @import urls (e.g., certain types of Google Fonts variants) don't work with aggregation. In addition, there are a number of modifiers (e.g., media queries or supports-queries) on the @import syntax that are not captured by our test coverage. See: https://developer.mozilla.org/en-US/docs/Web/CSS/@import and https://regex101.com/r/H2SOFh/1

Also, the CSS collection optimizer doesn't have test coverage.

Issue fork drupal-2936067

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

DuaelFr created an issue. See original summary.

DuaelFr’s picture

Issue tags: +CSS aggregation
agentrickard’s picture

Status: Active » Closed (won't fix)

This is an invalid URL. The ampersand and semicolon are reserved characters with special meaning and need to be encoded.

The & should be recoded as %26 using a function like http://php.net/manual/en/function.urlencode.php

See https://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved... and https://tools.ietf.org/html/rfc1738

agentrickard’s picture

The '&' here is being used to pass query arguments, and needs to be an actual & and not &

https://fonts.googleapis.com/css?family=Sedgwick+Ave+Display&subset=lati...

DuaelFr’s picture

Status: Closed (won't fix) » Needs review
FileSize
829 bytes

@agentrickard it was a real life example. Yes, this one has an easy workaround but it is real and it is working in all browsers so it's strange that Drupal breaks it when aggregation is enabled.

The thing is that, according to the RFC3986, semicolons can be used as "sub-delims". It's not very common but it exists. If you read the "Notes" section of the PHP urlencode function documentation, you'll be teased that both "&", ";" and "&" can be found as separator.

I wasn't able to start writing a test for \Drupal\Core\Asset\CssCollectionOptimizer yet but I think it's going to be needed.

The attached patch has been inspired from \Drupal\Core\Asset\CssOptimizer::processCss() and the new Regex can be tested ->here<-. It fixes the issue mentionned in the summary.

bradjones1’s picture

Version: 8.4.x-dev » 9.1.x-dev

Indeed this is very much in the wild; e.g., fonts from Google Fonts produce such an embed code.

It is definitely uncommon to see a semicolon in a URL, but my read of RFC 3986 comports with the above comment.

Let's bump the version and see if this still passes.

bradjones1’s picture

Pass! I think this just needs a solid addition to the test suite.

proweb.ua’s picture

#5 works

webdrips’s picture

#5 works for me too.

The line that was breaking our site:

@import url('https://fonts.googleapis.com/css2?family=Roboto:wght@400;500;700;900&display=swap');
tanubansal’s picture

Tested #5
RTBC + 1

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Clemens Sahs’s picture

Tested #5
RTBC + 1

needed for 8.x too!

bradjones1’s picture

Still needs tests :-)

Cosmin Hodis-Mindras’s picture

Issue summary: View changes

Tested #5
RTBC + 1

Clemens Sahs’s picture

I think this works back for drupal 8.x to

bradjones1’s picture

Status: Needs review » Needs work
+++ b/core/tests/Drupal/Tests/Core/Asset/css_test_files/css_input_with_import.css
@@ -4,6 +4,7 @@
+@import url("https://fonts.googleapis.com/css2?family=Roboto+Mono:wght@300;400&family=Roboto:ital,wght@0,300;0,400;1,300;1,400&display=swap");

Concerned about including an external URL (even from somewhere as "stable" as you'd expect Google Fonts to be) in tests - even if there's a temporary network failure in retrieving, this would be more fragile than we probably need. I imagine a text fixture could serve just as well.

Clemens Sahs’s picture

even if there's a temporary network failure in retrieving, this would be more fragile than we probably need.

In the case we make some network interactions, yes you are right and my full approval.

But in this case (CssOptimizer) we make simple string interaction.

at the same time we must edit the following, too.

@import url("http://example.com/style.css");
@import url("//example.com/style.css");

Please correct my if I miss something?

bradjones1’s picture

Ah, yes, sorry - brain fart. I think then let's just change the fonts.googleapis.com to fonts.example.com, per the RFC on example domains and so as to not "endorse" any particular font vendor by reference?

AndyF’s picture

Also I think the regex as it stands might fail to match some (edge!) cases correctly?

  1. @import "cu'st;om.css";
  2. @import 'cus(t;om.css';
  3. @import 'cus)t;om.css';
  4. @import 'cus(t);om.css';
  5. @import url(http://example.com/cu'st;om.css);
  6. @import url(http://example.com/cus(t;om.css);
  7. @import url(http://example.com/cus)t;om.css);
  8. @import url(http://example.com/cu(s)t;om.css);

See https://regex101.com/r/97dCNQ/2.

Thanks!

bradjones1’s picture

Status: Needs work » Needs review

Hi Andy!

You raise a good point - in recent years internationalization (IRIs) and things like funky Google Fonts URLs have clouded the question of "what is a valid URL" to the point where it's rather difficult to regex it. I went back and looked at the current code, which basically boils down to matching any text starting with @import through to a semicolon:

https://regex101.com/r/ybSIaY/1/

As you can see, this matches all the existing test cases but incorrectly truncates the URL containing a semicolon.

I have updated this regex to now be '/(*ANYCRLF)(*BSR_ANYCRLF)@import.*;(\R|$)/i' which now matches from the start of the import statement to the end of the line (or to the end of the file, if for instance the file doesn't end with a newline. This shouldn't happen per the spec but might be worth trying to catch anyway?)

https://regex101.com/r/5YE5Rr/1

So far it looks like this helps clean things up, passes the expanded test coverage and doesn't have any regression on the existing aggregation unit test. The thing this will not capture is any import statements spanning multiple lines. I imagine this was the intent of the [^;] in the existing regex. AFAICT this is not a common pattern and I'm not quite sure there's a way to write a regex that does all the things we need here and handles the newlines in a platform agnostic way, but I could be wrong. We don't have this multiline condition in the current test coverage and honestly I'm not sure it's valid per the CSS spec, so this may be worth a possible regression to get Google Fonts with semicolons working, which is a way more common condition.

bradjones1’s picture

Status: Needs review » Needs work
bradjones1’s picture

Does the CSS collection optimizer not have tests? Looks like it might not?

patrickkrueger’s picture

Rerolling 2936067-5.patch against Drupal 8.9.x

Update: 2936067-5.patch is applying successfully against Drupal 8.9.x – there was some misconfiguration in our build process causing this irritation.

eiriksm’s picture

Status: Needs work » Needs review
FileSize
7.2 KB

Not sure how polite it is to just take over someones merge request, but here is a test only patch that fails for me.

I kept the changes to the optimized css files, even though we could use css strings directly. Seems practical, if we changed something that in turn ended up changing those, somehow.

eiriksm’s picture

ok here is one without that phpcs error

Status: Needs review » Needs work

The last submitted patch, 27: 2936067-test-only.patch, failed testing. View results

ravi.shankar’s picture

Status: Needs work » Needs review
FileSize
663 bytes
7.2 KB

Just fixed failed test of patch #27.

Status: Needs review » Needs work

The last submitted patch, 29: 2936067-test-only.patch, failed testing. View results

bradjones1’s picture

Title: CSS aggregation fails if an @import contains a semicolon » CSS aggregation fails on many variations of @import
Issue summary: View changes
bradjones1’s picture

Issue summary: View changes

Updating the title to handle some additional edge cases that are material to our handling of @import; might as well knock them out since this ticket has expanded to cover the collection optimizer, as well.

bradjones1’s picture

Status: Needs work » Needs review

Would be interesting to see if this resolves any of the referenced issues; some propose changes in behaviour rather than bugfix but this is the only one with tests AFAIK.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

majorrobot’s picture

The patch in #25 worked for us. Would be great to see this merged!

Our pre-aggregated CSS:
@import url(https://fonts.googleapis.com/css2?family=Nunito+Sans:wght@200;400;700&display=swap);@import url(https://fonts.googleapis.com/css2?family=Noto+Sans+TC:wght@400;700&display=swap);

And after aggregation:
@import url(https://fonts.googleapis.com/css2?family=Nunito+Sans:wght@200;@import url(https://fonts.googleapis.com/css2?family=Noto+Sans+TC:wght@400;

It appears the optimizer saw the semicolon, stopped, and went to the next rule.

bradjones1’s picture

@majorrobot if you think it's RTBC can you mark it as such?

majorrobot’s picture

@bradjones1 I haven't verified the tests. I'm not clear if that is a blocker to RTBC?

bradjones1’s picture

@majorrobot Not sure what you mean by "verified" but if you have functionally validated this (your comment above) and taken a look at the patch and think it's ready for framework maintainers to review, RTBC it.

majorrobot’s picture

Status: Needs review » Reviewed & tested by the community

Thanks @bradjones1. Done!

bradjones1’s picture

Issue tags: -Needs tests
szeidler’s picture

We're using the merge request changes successfully on a couple of projects, to make the Google Fonts import work as expected.

alexpott’s picture

Priority: Normal » Major
Status: Reviewed & tested by the community » Needs work

I think this is a pretty major bug if CSS aggregation is failing on snippets provided by google fonts then that's quite unexpected.

While reviewing the code I think there is one part of the change that's not necessary - or if it is then it needs test coverage to prove it.

anrikun’s picture

A quick fix for D7 users (port of patch at #25)

pfrenssen’s picture

pfrenssen’s picture

Version: 9.3.x-dev » 9.2.x-dev
Issue tags: -Needs reroll

This is a bugfix so it can stay on 9.2.x for the time being. Then we have no merge conflict since #2669074: Convert file_create_url() & file_url_transform_relative() to service, deprecate it is a feature request for 9.3.x.

pfrenssen’s picture

I have updated the test so that inline and external imports are now mixed. I will now revert to the original approach using preg_replace() to address comment #2936067-42: CSS aggregation fails on many variations of @import.

pfrenssen’s picture

Status: Needs work » Needs review

OK it is working as expected. Remark #42 is addressed. Ready for next review!

idimopoulos’s picture

Status: Needs review » Reviewed & tested by the community

It worked fine for us. Remarks addressed. RTBC +1.

lauriii’s picture

Status: Reviewed & tested by the community » Needs work

@nod_ pointed out #19 which led into discovering that there are at least some advanced variations where the previous regex would have recognized an import statement but the new regex wouldn't:

@import url(http://example.com/cus(tom.css);
@import url(http://example.com/cu(s)tom.css);

Discussed with @nod_ and @alexpott about use cases where the new regex will recognize invalid CSS syntax. We thought that would be acceptable because that's essentially pre-existing problem, and this issue won't make it worse. Also, the new regex essentially decreases the likelihood of CSS aggregator created syntax errors.

bradjones1’s picture

@lauriii thanks for the review. Can you be more specific what you mean by the "new" regex? I had proposed a change to the regex in https://git.drupalcode.org/project/drupal/-/merge_requests/241/diffs?com... which I think later got backed out. Can you be more specific what you think needs to be changed here? It's not clear to me from your comment. Do we also need to include any additional test (not valid, but common mistake) patterns you mention, a la those in #19? Thanks.

lauriii’s picture

I was referring to the regex that has been changed as a new regex since the pattern changed significantly from the regex that we use currently. I'm sorry because that could have been clearer.

I think what needs to be done is add test coverage that proves that the two examples I posted in #49 work with the changes.

bradjones1’s picture

Version: 9.2.x-dev » 9.3.x-dev
Status: Needs work » Needs review

OK thanks for the clarification.

See https://regex101.com/r/QuK3Pp/1 which is based on the set presented in #19 but with an updated regex to:

/@import\s*(?:url\(\s*)?[\'"]?([^\'"\]+)([\'")]+\)?.*;)/ig

The difference being the removal of the U (ungreedy) flag as well as removing ( and ) as invalid characters inside of the URL.

Pushed this in https://git.drupalcode.org/project/drupal/-/merge_requests/241/diffs?com...

I'll admit it's been a while since I last looked at this issue so I'm not sure if making this greedy will have undesired effects, but I suppose that's what the test suite is for!

Rebased to 9.3.x.

bradjones1’s picture

Status: Needs review » Needs work

Realized I included some new URL patterns to the test but these also need to be included in the aggregated fixtures.

AndyF’s picture

Thanks for all your work on this @bradjones1!

The difference being the removal of the U (ungreedy) flag as well as removing ( and ) as invalid characters inside of the URL.

This is probably a silly question, but if we have a greedy .*; on the end, does the new regex buy us anything over just matching @import.+;?

nod_’s picture

Seems a regex will be tricky, we need to catch those things to have a complete solution:
https://developer.mozilla.org/en-US/docs/Web/CSS/string#syntax
https://developer.mozilla.org/en-US/docs/Web/CSS/url()#values

pfrenssen’s picture

Looks like the regex posted above by @bradjones1 in #52 covers those edge cases?

https://regex101.com/r/QuK3Pp/1

Is there a case that is not covered by this example? I see that both the regular strings and url() declarations are covered with semicolons and parentheses inside them.

pfrenssen’s picture

OK I see that the escaped quotes were not covered, but they are working with the same regex. Here I have added additional test cases for escaped quotes and double quotes as described on https://developer.mozilla.org/en-US/docs/Web/CSS/string#syntax

https://regex101.com/r/ThvAX6/1

nod_’s picture

Oh nice :) Regex seems to have problems when several @import statements are on the same line

pfrenssen’s picture

Since this has been rebased on 9.3.x the patch no longer applies on 9.2.x and below. Here is a quick reroll of the patch for people that need to unbreak their sites right now.

This patch doesn't include the test coverage but can be applied on a wide range of Drupal versions: 8.8.x, 8.9.x, 9.0.x, 9.1.x, 9.2.x and 9.3.x with varying degrees of offset.

AndyF’s picture

Looks like the regex posted above by @bradjones1 in #52 covers those edge cases?

I wonder if it's unnecessarily complicated though? It mostly optionally matches at the front and then has a .*; at the end. If we remove the optional parts, I think the regex reduces to @import[^\'"\]+)([\'")]+.*; (https://regex101.com/r/ESu2qg/1). At that point it's not clear to me that it offers anything over a simple @import.+; (https://regex101.com/r/EfhPdD/1). Sorry if I'm missing something already discussed!

pfrenssen’s picture

Neither of those work with inline statements as mentioned by _nod in #58.

The inlining makes it a whole lot more complex. I have been experimenting with it but I'm stumped on the parentheses and semicolons that can be part of a URL inside url(), for example @import url(http://example.com/cu(s)t;om.css);. I have no idea how we can reliably detect the ending parentheses and semicolon.

This is already detecting inline @import statements that do not use url():

https://regex101.com/r/Az82S0/1

pfrenssen’s picture

OK we need to narrow down what is considered to be a valid URL. It is not fully documented on https://developer.mozilla.org/en-US/docs/Web/CSS/url()

I've been experimenting in the Firefox CSS engine, and these URLs are NOT valid:

  • url(http://example.com/cu'stom.css);
  • url(http://example.com/cu"stom.css);
  • url(http://example.com/cus(t;om.css);
  • url(http://example.com/cus)t;om.css) ;
  • url(http://example.com/cu(s)t;om.css);
  • url(http://user:pass@example.com/cu(s)t;om.css);
  • url (http://example.com); - note the space between "url" and the opening parenthesis. This is invalid

To make the above valid they can either be wrapped in quotes or escaped with backslashes. These test cases are valid:

  • url( " http://user:pass@example.com/c'u(s)t;o\"m.css " ) ;
  • url( ' http://user:pass@example.com/c"u(s)t;o\'m.css ' ) ;
  • url( http://user:pass@example.com/c\'u\(s\)t;om.css ) ;

This makes a lot more sense :)

pfrenssen’s picture

I have added the url() matching on quoted and double quoted strings: https://regex101.com/r/nNj8Sn/1

Still missing is the quoteless but escaped case url( http://user:pass@example.com/c\'u\(s\)t;om.css ) ;

I have no more time unfortunately to continue on this today.

pfrenssen’s picture

I settled on this /@import\s*(?:(['"])?(?:\\\1|.)*\1.*|url\(\s*(?:(['"])?(?:\\\2|.)*\2|(?:\\[\)\'\"]|[^'")])*)\s*\).*);/gUi

Passes all the test cases, on separate lines or inline: https://regex101.com/r/X6B10k/1

pfrenssen’s picture

I am getting errors when trying it in a real site :-/

On PHP 8.0 this throws JIT stack limit exhausted. It works though when I disable pcre.jit and increase the pcre.backtrack_limit to 10000000 but that is not a good idea.

I'm guessing it is caused by having 2 option groups in combination with non-greedy matching. However this seems to be necessary, we need to be able to identify matching quote pairs.

Possibly we can greatly reduce the number of recursions if we split this in 2 separate regexes? One for the string URLs and one for the url() pattern?

pfrenssen’s picture

I did some more testing but matching quoted URLs using option groups and backtracking are out of the question. It works fine on individual test cases, but when testing with 100kb minified CSS files they are too heavy.

There are 5 possible cases of import statements. I added a simple regex for each case and execute them in series. This is fast and memory efficient even with large files.

We still need to repair the tests and extend them with some of the cases discovered above.

pfrenssen’s picture

FileSize
2.04 KB

Backport for Drupal 9.2.x and below.

nod_’s picture

I like the solution, this might change the source order of @import though, depending on what is matched when.

nod_’s picture

Looks like we can combine things:

@import\s*(\'(?:\\\'|.)*\'|"(?:\\"|.)*"|url\(\s*(?:\\[\)\\\'\"]|[^\'")])*\s*\)|url\(\s*\'(?:\\\'|.)*\'\s*\)|url\(\s*"(?:\\"|.)*"\s*\)).*;
nod_’s picture

looks like it's working, one fail, haven't look to see if test or code is "wrong"

pfrenssen’s picture

Assigned: Unassigned » pfrenssen

I'm going to work on this for a few hours, take a look at the test coverage. I will unassign as soon as I am ready.

I will also create a downloadable patch compatible with older D9 versions to help out @ultrabob.

pfrenssen’s picture

There is a bug in the regex, it is not detecting some import statements correctly, such as the following:

  • @import url(http://example.com/c\"us\(t;o\'m.css);
  • @import "cu\"st;om.css";
  • @import url( http://user:pass@example.com/c\'u\(s\)t;om.css ) ;

I'll see if I can reconstruct it from the 5 separate ones. It looks like something went wrong possibly in the escaping of the quotes. Perhaps we should use NOWDOC to define the expression.

pfrenssen’s picture

Stopping for today. Using NOWDOC is indeed a good way to import the regular expression. It is also possible to import it in quoted form but this involves tricky triple backslashes. If we keep it as a NOWDOC then this can also be trivially copy/pasted into an online regex builder tool for future work. Otherwise we'll have to strip out the escapes again.

While working on the tests I noticed that there are some bugs in the code that strips out duplicate whitespace. There are some regexes that will detect quoted strings, but they get tripped by unquoted import statements that contain escaped quotes. For example something like @import url(some\'file.css); will trip up whitespace deduplication in the rest of the line. This is out of scope for this issue though. I will file a followup.

Here is also a version of the patch without test coverage for people who want to use this on older versions of Drupal 8/9.

pfrenssen’s picture

Created followup for the whitespace deduplication issue: #3241196: Whitespace deduplication tripped up by escaped quotes in URLs

pfrenssen’s picture

Status: Needs work » Needs review

Awesome, looking good from my side. I also tested it using a real life project that was affected by this bug and it works fine now. No more problems with the stack limit. Setting back to NR

nod_’s picture

Same happy with it now too

Status: Needs review » Needs work

The last submitted patch, 73: 2936067-73-backport.patch, failed testing. View results

nod_’s picture

Status: Needs work » Needs review
nod_’s picture

looks good to me, need someone to RTBC this :)

ultrabob’s picture

Status: Needs review » Reviewed & tested by the community

I spun up a copy of a site that is known to have this issue, upgraded it to 9.3.x, and confirmed that css was aggregating and the issue persisted. I then applied the patch, and css was still aggregating and the problem was fixed.

I went through the code as well, and while the regex is a monster, that I'm not sure I fully grasped, the code and tests both look good.

alexpott’s picture

Issue summary: View changes

Updating the link in the IS to the current regex and using @pfrenssen's examples...

alexpott’s picture

Sorted out commit credit. I creditted people who reviewed the patch and commented on the code via gitlab. I didn't credit @ravi.shankar because their fix was not a fix. I didn't credit @patrickkrueger since the upload was for Drupal 8.9 and the patch for that existed and applied already. And I didn't @anrikun because D7 patches need a separate issue as per policy.

alexpott’s picture

Version: 9.3.x-dev » 9.2.x-dev
Status: Reviewed & tested by the community » Fixed

Committed 7374845 and pushed to 9.3.x. Thanks!
Committed 5aa154d and pushed to 9.2.x. Thanks!

I backported the change and fixed the conflict in core/tests/Drupal/Tests/Core/Asset/CssOptimizerUnitTest.php and ran all the tests in core/tests/Drupal/Tests/Core/Asset - everything looks good.

  • alexpott committed 7374845 on 9.3.x
    Issue #2936067 by pfrenssen, bradjones1, nod_, eiriksm, DuaelFr, Clemens...

  • alexpott committed 5aa154d on 9.2.x
    Issue #2936067 by pfrenssen, bradjones1, nod_, eiriksm, DuaelFr, Clemens...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.