Sweet project! Have you thought about moving over to use the CDN module?

Comments

chriscalip’s picture

Version: » 6.x-1.x-dev

Well, I thought about it, and this module really complements the CDN module. The CDN module rewrites the links inside the HTML page, while this module, in conjunction with advagg, rewrites the links in the aggregated CSS files.

And I could use some opinions on this:

I may be wrong, but the CDN admin panel at admin/settings/cdn/details is set up to map by file type. Usually people would do

CDN Mapping:
http://img1.drupal.org|jpg
http://img2.drupal.org|gif
http://img3.drupal.org|png
http://img4.drupal.org|ico

This module, on the other hand, uses sequential replacement, so its mapping is just a list of domains:
http://img1.drupal.org
http://img2.drupal.org
http://img3.drupal.org
http://img4.drupal.org

Given a data set of
background:url('sites/all/themes/do/1.png');
background:url('sites/all/themes/do/2.png');
background:url('sites/all/themes/do/3.png');
background:url('sites/all/themes/do/4.png');
background:url('sites/all/themes/do/5.png');
background:url('sites/all/themes/do/6.png');
background:url('sites/all/themes/do/7.png');
background:url('sites/all/themes/do/8.png');

The result would be:
background:url('http://img1.drupal.org/sites/all/themes/do/1.png');
background:url('http://img2.drupal.org/sites/all/themes/do/2.png');
background:url('http://img3.drupal.org/sites/all/themes/do/3.png');
background:url('http://img4.drupal.org/sites/all/themes/do/4.png');
background:url('http://img1.drupal.org/sites/all/themes/do/5.png');
background:url('http://img2.drupal.org/sites/all/themes/do/6.png');
background:url('http://img3.drupal.org/sites/all/themes/do/7.png');
background:url('http://img4.drupal.org/sites/all/themes/do/8.png');
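
The sequential replacement shown above can be sketched in a few lines of standalone PHP. This is only an illustration, not the module's actual code: the function name example_round_robin_rewrite() and the simplified regular expression are assumptions made for the example.

```php
<?php
/**
 * Hypothetical sketch: prefix each url() reference with the next domain
 * in the list, wrapping around when the list is exhausted.
 */
function example_round_robin_rewrite($css, array $domains) {
  $count = count($domains);
  if ($count === 0) {
    // Nothing to map to, leave the CSS untouched.
    return $css;
  }
  $counter = 0;
  return preg_replace_callback(
    '/url\(\s*[\'"]?\/?([^\'")]+)[\'"]?\s*\)/i',
    function ($matches) use (&$counter, $domains, $count) {
      // Pick the domain by position, then advance the counter.
      $domain = $domains[$counter % $count];
      $counter++;
      return "url('" . $domain . '/' . $matches[1] . "')";
    },
    $css
  );
}

$domains = array('http://img1.drupal.org', 'http://img2.drupal.org');
$css = "a{background:url('sites/all/themes/do/1.png');}\n"
  . "b{background:url('sites/all/themes/do/2.png');}\n"
  . "c{background:url('sites/all/themes/do/3.png');}\n";
echo example_round_robin_rewrite($css, $domains);
```

With two domains, the three references come out on img1, img2 and img1 again. The sketch does not skip absolute URLs or data: URIs, which a real implementation would have to handle.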

chriscalip’s picture

Having said that, what would the user experience for managing this be? I can't picture a way to consolidate these two different requirements into a single admin interface, i.e. the mapping text area field. What do you think?

mikeytown2’s picture

I helped Wim Leers with some prototype code that deals with this exact situation. It is available in the CDN module. See readme.txt for details.

Your mapping on admin/settings/cdn/details would look like

http://img1.drupal.org
http://img2.drupal.org
http://img3.drupal.org
http://img4.drupal.org

And then the "PHP code for cdn_pick_server()" on admin/settings/cdn/other would look like

$filename = basename($servers_for_file[0]['url']);
$unique_file_id = hexdec(substr(md5($filename), 0, 5));
return $servers_for_file[$unique_file_id % count($servers_for_file)];

This will spread all cdn requests fairly equally across the 4 different img* domains.
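
To get a feel for how evenly this spreads requests, the same arithmetic can be run outside Drupal over a batch of names. This is a rough sketch: example_pick_server_index() and the generated file names are made up for the illustration, and only the md5/hexdec/modulo steps come from the snippet above.

```php
<?php
/**
 * Same arithmetic as cdn_pick_server() above, wrapped in a standalone
 * function (hypothetical name) that returns the server index.
 */
function example_pick_server_index($filename, $server_count) {
  $unique_file_id = hexdec(substr(md5($filename), 0, 5));
  return $unique_file_id % $server_count;
}

$server_count = 4;
$buckets = array_fill(0, $server_count, 0);
for ($i = 0; $i < 1000; $i++) {
  // Made-up file names, purely for illustration.
  $buckets[example_pick_server_index("image-$i.png", $server_count)]++;
}
// Number of files that landed on each of the 4 servers.
print_r($buckets);
```

The exact counts depend on the file names used, so none are quoted here; the point is only that each name always maps to the same index.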

chriscalip’s picture

Integration with CDN is very possible, because both $cdn_basic_mapping and $parallel_css_settings are just strings of URLs.

$parallel_css_settings is always:
http://img1.drupal.org
http://img2.drupal.org
http://img3.drupal.org
http://img4.drupal.org

While $cdn_basic_mapping can be
http://img1.drupal.org
http://img2.drupal.org
http://img3.drupal.org
http://img4.drupal.org

With:
$CDN_PICK_SERVER_PHP_CODE_VARIABLE:

$filename = basename($servers_for_file[0]['url']);
$unique_file_id = hexdec(substr(md5($filename), 0, 5));
return $servers_for_file[$unique_file_id % count($servers_for_file)];

or this:
http://img1.drupal.org|png
http://img2.drupal.org|gif
http://img3.drupal.org|jpg
http://img4.drupal.org|ico

The devil is in the details

<?php
/**
 * Implementation of hook_advagg_css_alter().
 */
function parallel_css_advagg_css_alter(&$contents, $files, $bundle_md5) {
  $parallel_css_settings = variable_get('parallel_css_settings', NULL);
  if (empty($parallel_css_settings)) {
    return;
  }
  $parallel_css_counter = 0;
  // The setting holds one domain URL per line.
  $paralell_css_settings_urls = explode("\n", $parallel_css_settings);
  // Clean up: trim stray whitespace (e.g. "\r") and drop empty lines.
  foreach ($paralell_css_settings_urls as $key => $value) {
    $value = trim($value);
    if (strlen($value) <= 1) {
      unset($paralell_css_settings_urls[$key]);
    }
    else {
      $paralell_css_settings_urls[$key] = $value;
    }
  }
  // Reindex so that (counter % count) always hits an existing key.
  $paralell_css_settings_urls = array_values($paralell_css_settings_urls);
  $paralell_css_settings_count = count($paralell_css_settings_urls);
  // Store the counter, URL list and URL count in static storage.
  ctools_static('parallel_css_counter', $parallel_css_counter);
  ctools_static('parallel_css_settings_urls', $paralell_css_settings_urls);
  ctools_static('parallel_css_settings_count', $paralell_css_settings_count);
  // Rewrite url(...) references first, then src=... references.
  $contents = preg_replace_callback('/url\(\s*[\'"]?\/?(.+?)[\'"]?\s*\)/i', "_parallel_css_replace_url", $contents);
  $contents = preg_replace_callback('/src=\s*[\'"]?\/?(.+?)[\'"]?\s*\)/i', "_parallel_css_replace_src", $contents);
}
?>

chriscalip’s picture

I can assume that during the advagg process, parallel_css always works with whatever mapping URL(s) are selected, and that we want to balance them out as evenly as possible.

It's pretty much the same formula:

<?php
$servers_for_file[$unique_file_id % count($servers_for_file)];
// OR
$paralell_css_settings_urls[$parallel_css_counter % $paralell_css_settings_count]
?>

With that said, I can do it like this:

<?php

// If cdn basic mapping is available, assign as the mapping url array
// else if parallel css mapping is available, assign as the mapping url array
// else do nothing 

// then process the mapping url array and the contents data
?>
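
The fallback order in the comments above could look roughly like the following. It is a sketch under assumptions: both helper names are hypothetical, the two settings are passed in as plain strings instead of being read with variable_get(), and any "|extension" suffix on a CDN mapping line is simply dropped, which ignores the per-file-type logic.

```php
<?php
/**
 * Hypothetical helper: turn a newline-separated settings string into a
 * clean list of domain URLs. For "url|extensions" lines only the URL
 * part is kept (a simplification).
 */
function example_parse_mapping($raw) {
  $urls = array();
  foreach (preg_split('/\r\n|\r|\n/', (string) $raw) as $line) {
    $line = trim($line);
    if ($line === '') {
      continue;
    }
    $parts = explode('|', $line);
    // Drop a trailing slash so URLs can be joined with '/' later.
    $urls[] = rtrim(trim($parts[0]), '/');
  }
  return $urls;
}

/**
 * Hypothetical helper: prefer the CDN mapping, fall back to the
 * parallel_css mapping, return an empty array if neither is set.
 */
function example_select_mapping($cdn_basic_mapping, $parallel_css_settings) {
  $urls = example_parse_mapping($cdn_basic_mapping);
  if (!empty($urls)) {
    return $urls;
  }
  return example_parse_mapping($parallel_css_settings);
}

print_r(example_select_mapping("", "http://img1.drupal.org\nhttp://img2.drupal.org\n"));
```

An empty result would correspond to the "else do nothing" branch.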

So I can either keep the parallel CSS mapping admin as a separate module, for folks who don't want to use the CDN module but still want to load-balance their CSS aggregates, or just remove the admin aspect of parallel CSS and use CDN.

What do you think?

Peter Bowey’s picture

@chriscalip

I love the idea = +1.

Load Balancing => 'yes'

Notes: I have not started using this module yet, I prefer to read the source and see where it is going...:)

Well done!

mikeytown2’s picture

You want some sort of hash on the filename; that way the same file will always come from the same server, and thus your browser will always have it cached. I don't think your current code does that. Also, set the weight of this module to be heavier than css_emimage.

Peter Bowey’s picture

Referring to #7

In the 'ancient' non CMS days .... :)
I used a 'parallel' URI CDN 'hash' like this: (0-2)

$isx=true;
$isx=++$isx%3;

eg: ('old timer' HTML method sample):

<?php session_set_cookie_params(900,"/","www.pbcomp.com.au",NULL,TRUE); session_cache_limiter(FALSE); session_start(); $isx=true; $page_title=''; ?>
...
...
<link rel="stylesheet" type="text/css" media="screen,projection" href="http://small.gdlcdn.com/802C5F/static<?php echo $isx=++$isx%3;?>/css/master.css" />

DNS:

static0.computerdocs.com.au.    240     IN      A       165.228.91.94
static1.computerdocs.com.au.    240     IN      A       165.228.91.94
static2.computerdocs.com.au.    240     IN      A       165.228.91.94

mikeytown2’s picture

@peter bowey
In regards to #8, that works great until the order of your link tags changes; once it changes, you have to re-download the same CSS file from a different domain instead of getting it from your browser cache. Or, in this case, if you add/remove a url() link at the top of a CSS file, then all the url() references will point to a different server.

The code below shows how the filename hash thing works. If you change the number of servers, then the modulus will be different. This isn't perfect by any means, but in terms of code complexity vs. getting it right, it's a pretty good tradeoff. The url() changes when the number of available servers changes, which makes sense.

$files = array(
  '/css/master.css' => -1,
  '/css/alpha.css' => -1,
  '/css/beta.css' => -1,
  '/css/gama.css' => -1,
  '/css/delta.css' => -1,
);

$servers = array(
  0 => 'http://img1.drupal.org',
  1 => 'http://img2.drupal.org',
  2 => 'http://img3.drupal.org',
  3 => 'http://img4.drupal.org',
  4 => 'http://img5.drupal.org',
);

$file_ids = array();
foreach ($files as $filepath => $value) {
  $filename = basename($filepath);
  $unique_file_id = hexdec(substr(md5($filename), 0, 5));
  $files[$filepath] = $servers[$unique_file_id % count($servers)];
  $file_ids[$filepath] = $unique_file_id;
}

echo str_replace('    ', '&nbsp;&nbsp;&nbsp;&nbsp;', nl2br(htmlentities(print_r($files, TRUE))));
echo str_replace('    ', '&nbsp;&nbsp;&nbsp;&nbsp;', nl2br(htmlentities(print_r($file_ids, TRUE))));

Outputs

Array
(
    [/css/master.css] => http://img4.drupal.org
    [/css/alpha.css] => http://img2.drupal.org
    [/css/beta.css] => http://img3.drupal.org
    [/css/gama.css] => http://img1.drupal.org
    [/css/delta.css] => http://img3.drupal.org
)
Array
(
    [/css/master.css] => 945108
    [/css/alpha.css] => 63356
    [/css/beta.css] => 684072
    [/css/gama.css] => 366595
    [/css/delta.css] => 161037
)
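
The difference between the counter approach from #8 and the filename hash can be shown side by side. This is a small sketch with made-up paths and hypothetical function names; it adds one new reference at the top of the list and checks which assignments move.

```php
<?php
$servers = array(
  'http://img1.drupal.org',
  'http://img2.drupal.org',
  'http://img3.drupal.org',
);

/** Counter-based: the server depends on the position in the list. */
function example_assign_by_counter(array $paths, array $servers) {
  $result = array();
  foreach (array_values($paths) as $i => $path) {
    $result[$path] = $servers[$i % count($servers)];
  }
  return $result;
}

/** Hash-based: the server depends only on the file name. */
function example_assign_by_hash(array $paths, array $servers) {
  $result = array();
  foreach ($paths as $path) {
    $id = hexdec(substr(md5(basename($path)), 0, 5));
    $result[$path] = $servers[$id % count($servers)];
  }
  return $result;
}

$before = array('/img/a.png', '/img/b.png', '/img/c.png');
// Same files, with one new reference added at the top.
$after = array_merge(array('/img/new.png'), $before);

$counter_before = example_assign_by_counter($before, $servers);
$counter_after = example_assign_by_counter($after, $servers);
$hash_before = example_assign_by_hash($before, $servers);
$hash_after = example_assign_by_hash($after, $servers);

foreach ($before as $path) {
  printf("%s counter: %s, hash: %s\n", $path,
    $counter_before[$path] === $counter_after[$path] ? 'same' : 'moved',
    $hash_before[$path] === $hash_after[$path] ? 'same' : 'moved');
}
```

Every existing file moves to a different server under the counter scheme, while the hash scheme keeps each file where it was.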

Peter Bowey’s picture

@mikeytown2

Many thanks!

That is an 'acceptable' method!

I will plan to integrate it into the advagg + parallel interface 'thingy'

I appreciate your goodwill and time in encouraging an 'old dog non-cms coder'.
* I am still learning the correct Drupal 'bark' - it is not 'woof - woof' - more like 'callback sometime grrrr' * :)

chriscalip’s picture

Hey mikey,

I made you a co-maintainer, in case you want to handle the hash code; I'll whip up the CDN integration thing. Sorry, I'm talking with a client and can't respond for a while.

Peter Bowey’s picture

@mikeytown2 project support count = +1
Mike, must be about 22+ projects you love + support :)
I elect that you have 26 hours per day, the rest of us 24....

chriscalip’s picture

Yikes, I thought about #11 more... it's just that I wasn't aware of the concept. I can quickly research and implement it, but if you want to take care of it (at least that part of the module), that's okay too :)

mikeytown2’s picture

I'll be busy over here for a little while so the ball is in your court :)
http://groups.drupal.org/node/154564

chriscalip’s picture

I got this; it should be finished by tomorrow. Need to sleep and all.

Peter Bowey’s picture

@chriscalip

I think @mikeytown2 has planted enough 'good seed' to get this 'hash code' rose 'in bloom' :)

chriscalip’s picture

I could not sleep. This is interesting.

I re-read the messages and realized that I am not getting the big picture here.
Picking up on the clues "asset collective" and "to be heavier than css_emimage", I started
reading the issue queues of several modules, including advagg and css_emimage.
Having said that, I just want to be clear on what we are trying to pull off here.

Drupal site http://www.example.com
has several css files including the following

/modules/system/system.css 
   url("/modules/system/misc/1.png")
   url("/modules/system/misc/2.png")
   url("/modules/system/misc/3.png")
/sites/all/themes/ninesixtyrobots/css/main.css
   url("/sites/all/themes/ninesixtyrobots/4.png")
   url("/sites/all/themes/ninesixtyrobots/5.png")
   url("/sites/all/themes/ninesixtyrobots/6.png")
   url("/sites/all/themes/ninesixtyrobots/7.png")
   url("/sites/all/themes/ninesixtyrobots/8.png")

site admin installs cdn, advagg, parallel_css, and css_emimage.

CDN mapping url:

http://img1.drupal.org
http://img2.drupal.org
http://img3.drupal.org
http://img4.drupal.org
http://s1.amazonaws.com/drupal-cdn
http://drupal-cdn.s1.amazonaws.com

Three scenarios:

Scenario A: parallel_css, advagg compress css, and core advagg css/js are enabled. css_emimage is not.

During the css aggregation process, because parallel_css has a weight of -10 (see parallel_css.install), it gets first dibs
on hook_advagg_css_alter. parallel_css gets the mapping url array from cdn_basic_mapping and then proceeds to the replacement
process. After the replacement process, $contents gets passed to the other implementers of hook_advagg_css_alter, and at the
end of the process we get an aggregated file, css_0f8107b462965cd0d36e3ad9a51359e7_0.css, containing among its contents:

   url("http://img1.drupal.org/modules/system/misc/1.png")
   url("http://img2.drupal.org/modules/system/misc/2.png")
   url("http://img3.drupal.org/modules/system/misc/3.png")
   url("http://img4.drupal.org/sites/all/themes/ninesixtyrobots/4.png")
   url("http://s1.amazonaws.com/drupal-cdn/sites/all/themes/ninesixtyrobots/5.png")
   url("http://drupal-cdn.s1.amazonaws.com/sites/all/themes/ninesixtyrobots/6.png")
   url("http://img1.drupal.org/sites/all/themes/ninesixtyrobots/7.png")
   url("http://img2.drupal.org/sites/all/themes/ninesixtyrobots/8.png")

--- So why does parallel_css need to implement hash code if the other modules are doing it?

Scenario B: parallel_css, advagg compress css, core advagg css/js, and css_emimage are enabled.
parallel_css is lightest.

we get an aggregated file of css_0f8107b462965cd0d36e3ad9a51359e7_1.css containing among its contents:

   url("XXXX-CSS-EmbeddedString-XXXXX")
   url("XXXX-CSS-EmbeddedString-XXXXX")
   url("XXXX-CSS-EmbeddedString-XXXXX")
   url("XXXX-CSS-EmbeddedString-XXXXX")
   url("XXXX-CSS-EmbeddedString-XXXXX")
   url("XXXX-CSS-EmbeddedString-XXXXX")
   url("XXXX-CSS-EmbeddedString-XXXXX")
   url("XXXX-CSS-EmbeddedString-XXXXX")

--- Is Css Embedded Image able to handle a domain name ???

Scenario C: parallel_css, advagg compress css, core advagg css/js, and css_emimage are enabled.
parallel_css is heavier than css_emimage.

we get an aggregated file of css_0f8107b462965cd0d36e3ad9a51359e7_2.css containing among its contents:

   url("http://img1.drupal.org/XXXX-CSS-EmbeddedString-XXXXX")
   url("http://img2.drupal.org/XXXX-CSS-EmbeddedString-XXXXX")
   url("http://img3.drupal.org/XXXX-CSS-EmbeddedString-XXXXX")
   url("http://img4.drupal.org/XXXX-CSS-EmbeddedString-XXXXX")
   url("http://s1.amazonaws.com/drupal-cdn/XXXX-CSS-EmbeddedString-XXXXX")
   url("http://drupal-cdn.s1.amazonaws.com/XXXX-CSS-EmbeddedString-XXXXX")
   url("http://img1.drupal.org/XXXX-CSS-EmbeddedString-XXXXX")
   url("http://img2.drupal.org/XXXX-CSS-EmbeddedString-XXXXX")

--- are these strings valid ???

mikeytown2’s picture

XXXX-CSS-EmbeddedString-XXXXX is a BASE64-encoded version of that file. You get the benefits of an image sprite without some of the hassles that come with it. So this module (Parrallel CSS) needs to check that the url() is not base64 encoded and is a file. css_emimage will only drop in 32kb of image data into the CSS file, so anything larger will then be processed by this module.
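
A check along those lines might look like the sketch below. The function name is hypothetical, and the rule it applies (skip data: URIs and anything that already has a scheme or is protocol-relative) is an assumption about what "is a file" should mean here, not the module's actual logic.

```php
<?php
/**
 * Hypothetical check: TRUE when a url() target looks like a plain file
 * reference that could be moved to another domain, FALSE for inline
 * data: URIs and for references that already point at a host.
 */
function example_is_rewritable_url($target) {
  // Strip whitespace and the optional quotes around the target.
  $target = trim($target, " \t\n\r'\"");
  if ($target === '') {
    return FALSE;
  }
  // Embedded images from css_emimage start with "data:".
  if (stripos($target, 'data:') === 0) {
    return FALSE;
  }
  // "http://...", "https://..." or protocol-relative "//host/...".
  if (preg_match('#^([a-z][a-z0-9+.\-]*:)?//#i', $target)) {
    return FALSE;
  }
  return TRUE;
}

var_dump(example_is_rewritable_url("'/modules/system/misc/1.png'"));
var_dump(example_is_rewritable_url('data:image/png;base64,iVBORw0KGgo='));
```

Only targets that pass this check would then be handed to the domain-picking code.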

chriscalip’s picture

Mikeytown2 and Peter Bowey, you guys went pretty deep; I could not follow what you were saying at first. mbutcher and I figured it out and even made some improvements.

First I wanted to make sure that this is the concept that we are trying to achieve.

1.png from 1.css is loaded the first time as img1.d.o/1.png
at the next pages: 1.png from 2.css is appearing as img2.d.o/1.png

what we want to make sure is 1.png is always attached to the same server.
1.png is always http://img1.d.o/1.png from any aggregated css.

Although the distributed set is not always optimal, I agree that this is the best way in the long run.

//Equal Distribution algorithm
$paralell_css_settings_urls[$parallel_css_counter % $paralell_css_settings_count]
// result set of (img1,img2,img3,img4,img1,img2,img3,img4)

//Hash based algorithm
$servers_for_file[$unique_file_id % count($servers_for_file)];
// result set of (img1,img1,img1,img2,img1,img2,img3,img4)

Matt Butcher made some suggestions on how to speed it up from:

md5
Overall Summary
Total Incl. Wall Time (microsec): 4,401 microsecs
Total Incl. MemUse (bytes): 102,348 bytes
Total Incl. PeakMemUse (bytes): 199,084 bytes
Number of Function Calls: 466

$filename = basename("$matches[1]");
$unique_file_id = hexdec(substr(md5($filename), 0, 5));
$replaced_text = "src='".$paralell_css_settings_urls[$unique_file_id % $paralell_css_settings_count]."/$matches[1]'";

TO:

crc32
Overall Summary
Total Incl. Wall Time (microsec): 3,173 microsecs
Total Incl. MemUse (bytes): 101,664 bytes
Total Incl. PeakMemUse (bytes): 181,248 bytes
Number of Function Calls: 420

  $filename = basename("$matches[1]");
  $unique_file_id = abs(crc32($filename));
  $replaced_text = "src='".$paralell_css_settings_urls[$unique_file_id % $paralell_css_settings_count]."/$matches[1]'";
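
For anyone who wants to repeat the comparison without a profiler, a rough timing loop is enough to see the relative cost of the two hash steps. This is a sketch only: example_time_hash() and the generated file names are made up, and the numbers it prints will differ from the profiler figures above and from machine to machine.

```php
<?php
/**
 * Hypothetical helper: time a hash callback over a list of names,
 * repeated $rounds times, and return the elapsed seconds.
 */
function example_time_hash($hash_callback, array $names, $rounds) {
  $start = microtime(TRUE);
  for ($r = 0; $r < $rounds; $r++) {
    foreach ($names as $name) {
      call_user_func($hash_callback, $name);
    }
  }
  return microtime(TRUE) - $start;
}

// Made-up file names, purely for illustration.
$names = array();
for ($i = 0; $i < 500; $i++) {
  $names[] = "image-$i.png";
}

// The two ways of deriving a numeric id that are compared above.
$md5_pick = function ($name) {
  return hexdec(substr(md5($name), 0, 5));
};
$crc_pick = function ($name) {
  return abs(crc32($name));
};

printf("md5:   %.6f s\n", example_time_hash($md5_pick, $names, 100));
printf("crc32: %.6f s\n", example_time_hash($crc_pick, $names, 100));
```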

Peter Bowey’s picture

@chriscalip

Many Thanks for working this through.
The use of crc32() may not be unique enough in some cases, hence the reasoning for using md5().

CRC is more an error detection method than a serious hash function. It helps in identifying, say, 'corrupted files' rather than uniquely identifying them.

Given a file and a CRC32 checksum, it is relatively simple to make small modifications to the file so that it has the desired checksum. There is no easy way to do this with md5 sums.

CRC32 is useful for, say, a communications checksum, because it's fast, efficient and effective at catching the kinds of errors that happen over a communications line (short bursts of errors, at most, in relatively small block sizes). It's easy to implement and long predates MD5.

But if you're using it for anything other than a simple communications checksum, 'it's being abused'.

chriscalip’s picture

@peter bowey

My pleasure, it's a fun project for me. http://drupalcode.org/project/parallel_css.git/commit/18974e3 Done.

Peter Bowey’s picture

Refer #19:

See http://brainspl.at/articles/2006/12/29/speed-up-page-loads

Most browsers will only open up 2 concurrent connections per cname. this means that if all of your assets are being served from http://example.com and you have a lot of little images and scripts on the page, the clients browser will only open two connections to example.com and pipeline or use those two connections to download all the assets on the page.

By using a wildcard subdomain or manually setting dns so that you spread out the static assets over a few different subdomains, you let the browser open two connections per subdomain so all the assets will download in a more parallel fashion. This can be the difference between your page loads lagging at the end while they load up all the little assets and having the page snap into place and seem a lot quicker to the user.

So if we use a simple little view helper method for all of our image urls we can spread the load out by faking the browser into thinking it is connecting to multiple servers. For example if we serve all our images from these subdomains:

asset1.example.com
asset2.example.com
asset3.example.com
asset4.example.com

This will give us 8 concurrent connections from the browser to the server for static assets which dramatically decreases page load time. The thing to watch out for is that you always want to serve the same asset from the same subdomain or else you defeat browser caching and won’t gain anything from this trick. So we will use a Zlib hash of the asset url modulo 4 to choose a subdomain. Here is a simple helper:

require 'zlib'

  # balance images across many domains to force the opening of more connections  
  # updated to use Zlib.crc32 instead of md5 as per
  # comment from David
  def balanced_asset_url(asset)        
    idx = (Zlib.crc32(asset || "error" ) % 4) + 1
    %!http://asset#{idx}.#{request.domain}#{asset}!
  end

Then use it like this:

<%= image_tag balanced_asset_url('/images/foo.png') %>

By hashing the asset path we make sure that each time this helper is called for the same asset it will always return the same subdomain.

This technique is most useful when you have many objects on a page that need to make an additional http request each to render. By tricking the browser into making more concurrent connections when fetching assets we can speed up our page load times and make our sites seem more ‘snappy’

The above 'quote' is only meant as an idea 'template' and 'brain food' :)
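
For readers more at home in PHP than Ruby, the quoted helper translates roughly as follows. This is a loose port under assumptions: the function name and the $domain and $shards parameters are hypothetical, and the domain is passed in explicitly instead of being read from the request.

```php
<?php
/**
 * Hypothetical PHP port of the Ruby helper quoted above: hash the
 * asset path with crc32 and pick one of $shards numbered subdomains.
 */
function example_balanced_asset_url($asset, $domain, $shards = 4) {
  // Subdomains are numbered from 1, as in the quoted helper.
  $idx = (abs(crc32($asset)) % $shards) + 1;
  return 'http://asset' . $idx . '.' . $domain . $asset;
}

echo example_balanced_asset_url('/images/foo.png', 'example.com'), "\n";
```

Calling it twice with the same path returns the same subdomain, which is the property the quoted article relies on for browser caching.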

chriscalip’s picture

I think we have achieved this now. :) 1.png will always be assigned to the same domain.

@TODO if cdn_basic_mapping exist use that instead of the parallel_css_mapping
@TODO make parallel_css weight more heavy than css_emimage

Peter Bowey’s picture

Refer #23
@chriscalip

Looking through the latest code @ http://drupalcode.org/project/parallel_css.git/blob_plain/refs/heads/6.x...

The above code methods look good to me.
I will test this 'real-time' today! :)

@TODO if cdn_basic_mapping exist use that instead of the parallel_css_mapping
@TODO make parallel_css weight more heavy than css_emimage

+1

Many thanks for contributing to Drupal projects!

Peter Bowey’s picture

Refer #23

It is also interesting reading through other projects / ideas that used this parallel asset method:

See -> http://statichtml.com/2010/use-unique-ips-for-sharded-asset-hosts.html

One of the golden rules for front-end performance optimisation — one recommended by both Yahoo's YSlow and Google's Page Speed — is to split your page assets across multiple hostnames to allow web browsers to download more of those assets in parallel. Unfortunately it turns out that some consumer-grade network devices will block traffic to sites that use these techniques if the asset hosts all have the same IP address.

Consequently, if your site downloads page assets from multiple hosts — often referred to as domain sharding — make sure they all have separate IP addresses.
...
...
Timeout woes, SYN Flood to Host

Unfortunately, days later we started to get a steady trickle of customers complaining that they were getting timeout errors when accessing the LOVEFiLM website. They were reporting that the first page loaded, but most (though not all) of the images were broken. Any subsequent page requests all failed with a timeout error. Other than the symptoms, the customers had very little in common; ISPs, operating systems and browsers all seemed to be affected proportionately to our visitor stats.

The problem turned out to be caused by a well-intentioned but ultimately misguided setting baked into the stateful firewall built into certain consumer-grade ADSL routers. These routers track the number of unfinished TCP connections — that is, outbound TCP connections where the SYN packet has been sent but the router has yet to see a SYN ACK response from the server, otherwise known as embryonic connections — to each IP address. If the number of unfinished TCP connections to an individual IP address exceeds a given threshold, all subsequent packets to that IP are silently dropped for a period of 5 minutes. In the user's web browser, this results in timeout errors for any requests that did not make it through before the door was slammed shut.

The setting in question is commonly labelled Maximum unfinished TCP/UDP connections per host. On some devices such as the Belkin F5D7630 this setting is configurable through a hidden page in the router's web-based admin interface, but on others the threshold is simply baked into the firmware and cannot be changed. Worse, some devices ship with a default value as low as 10 for this setting. Modern web browsers make anywhere between 6 and 15 HTTP connections per hostname, so loading static assets from more than one hostname is almost certain to trigger this rule.

The only clue a user would have that their router was causing the connection to be blocked is the SYN Flood to Host entry in their firewall logs:

07/13/2010 21:02:38 **SYN Flood to Host** 192.168.2.4, 55112->> 194.117.248.100, 80 (from ATM1 Outbound)

I can only assume that this setting is an attempt to lessen the effectiveness of DDoS attacks from the client-side. A noble intention, to be sure, but preventing or lessening the effectiveness of DDoS attacks on websites is not something I would consider to be within the domain of a consumer-grade ADSL modem. By all means protect the user against inbound DoS/DDoS attacks, but blocking outbound traffic based on what the router manufacturer deems to be normal usage seems like a step too far.
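
Whether a set of asset hosts falls into the situation described in the article can be checked with a few lines of PHP. This is a sketch with a hypothetical function name; the hostnames and addresses in the example are placeholders, and the resolver is stubbed so the example runs without network access (the default resolver is PHP's gethostbyname(), which returns the hostname unchanged when a lookup fails).

```php
<?php
/**
 * Hypothetical helper: group hostnames by resolved IP address and
 * return only the addresses shared by more than one hostname.
 */
function example_find_shared_ips(array $hostnames, $resolver = 'gethostbyname') {
  $by_ip = array();
  foreach ($hostnames as $host) {
    $ip = call_user_func($resolver, $host);
    $by_ip[$ip][] = $host;
  }
  return array_filter($by_ip, function ($hosts) {
    return count($hosts) > 1;
  });
}

// Stub resolver with placeholder data, so no DNS lookup is needed.
$stub_resolver = function ($host) {
  $map = array(
    'img1.example.com' => '192.0.2.10',
    'img2.example.com' => '192.0.2.10',
    'img3.example.com' => '192.0.2.11',
  );
  return $map[$host];
};

print_r(example_find_shared_ips(
  array('img1.example.com', 'img2.example.com', 'img3.example.com'),
  $stub_resolver
));
```

An empty result means every host in the list resolved to its own address.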

Overloading of brain food (sorry!)... :)

chriscalip’s picture

@TODO make parallel_css weight more heavy than css_emimage

http://drupalcode.org/project/parallel_css.git/commit/32c84d8 Done.

chriscalip’s picture

Refer #25. Oh joy! My company website is like that: http://www.straightnorth.com
We are pretty much using (img1.straightnorth.com, img2.straightnorth.com, img3.straightnorth.com, img4.straightnorth.com, css.straightnorth.com), all pointing to the same IP :(

Peter Bowey’s picture

Refer #27

@chriscalip

*smile* That is only meant to be a 'heads up' about some 'possible issues' + how some 'typically older consumer' grade ADSL routers offer 'crude' 'firewall' protection... eg: "SYN Flood to Host" :)

Unfortunately it turns out that some consumer-grade network devices will block traffic to sites that use these techniques if the asset hosts all have the same IP address.

Personally, I use a dual-wan Linksys RV082 ADSL2+ on two active ADSL2+ lines - with two static IP's... feeding a dedicated Linux Server (3 x Ethernet Ports / Gateway). In this event, I have 'disabled' the Linksys RV082 WAN firmware 'crud protection' and use Linux 'packet stateful' firewall..

Peter Bowey’s picture

Refer #26
@chriscalip

Good work Chris! +1

Just one to go: :)

@TODO if cdn_basic_mapping exist use that instead of the parallel_css_mapping

Of interest see the following Drupal CDN links:
http://drupal.org/node/962266
http://drupal.org/node/956164

Notes: Google is pushing a growing number of hits for your module:

Showing results for drupal advagg
Search Results

Advanced CSS/JS Aggregation | drupal.org
drupal.org/project/advagg
19 Feb 2011 ... If the user has the permission of "bypass advanced aggregation" then adding ?advagg=0 to the end of the URL will turn off aggregation for ...
Parrallel CSS - AdvAgg Plugin | drupal.org
drupal.org/project/parallel_css
8 Jun 2011 ... Inspired by the request AdvAgg - Use the CDN module for ...

+1:)

chriscalip’s picture

This is a bit tricky. I am troubled that with CDN's approach, only those who know PHP will be able to pull this off.
http://drupal.org/node/962266

We need a better approach here:

What do you think of this:
@ /admin/settings/advagg/parallel-css

If function_exists('cdn_file_url_alter'):

[X] Use Available CDN Mapping and CDN pick-server
----------------------------------------------------------------
Be sure to read: http://drupal.org/node/962266
----------------------------------------------------------------
URL:
----------------------------------------------------------------
Enter the domain URLs you want included, one per line. Warning: don't include a '/' at the end of the domain URL.

* For example http://img1.drupal.org
* http://img2.drupal.org
* http://img3.drupal.org
* http://img4.drupal.org
* https://s1.amazonaws.com/drupal_cdn

In addition, for SEO purposes (to prevent duplicate content), please update the .htaccess file

In between these two lines:

# RewriteBase /

# Rewrite URLs of the form 'x' to the form 'index.php?q=x'.

* # Parallel CSS - Start
* RewriteCond %{HTTP_HOST} img1.drupal.org [NC]
* RewriteCond %{REQUEST_URI} !\.(png|gif|jpg|jpeg|ico)$ [NC]
* RewriteRule ^(.*)$ http://www.drupal.org/$1 [L,R=301]
*
* RewriteCond %{HTTP_HOST} img2.drupal.org [NC]
* RewriteCond %{REQUEST_URI} !\.(png|gif|jpg|jpeg|ico)$ [NC]
* RewriteRule ^(.*)$ http://www.drupal.org/$1 [L,R=301]
*
* RewriteCond %{HTTP_HOST} img3.drupal.org [NC]
* RewriteCond %{REQUEST_URI} !\.(png|gif|jpg|jpeg|ico)$ [NC]
* RewriteRule ^(.*)$ http://www.drupal.org/$1 [L,R=301]
*
* RewriteCond %{HTTP_HOST} img4.drupal.org [NC]
* RewriteCond %{REQUEST_URI} !\.(png|gif|jpg|jpeg|ico)$ [NC]
* RewriteRule ^(.*)$ http://www.drupal.org/$1 [L,R=301]
*
* RewriteCond %{HTTP_HOST} s1.amazonaws.com/drupal_cdn [NC]
* RewriteCond %{REQUEST_URI} !\.(png|gif|jpg|jpeg|ico)$ [NC]
* RewriteRule ^(.*)$ http://www.drupal.org/$1 [L,R=301]
# Parallel CSS - End

----------------------------------------------------------------

mikeytown2’s picture

Instead of htaccess rules, there is an issue for CDN in regards to SEO: #1060358: CDN and SEO.
It's fairly high on my priority list, as in it might get done in 2 weeks.

Peter Bowey’s picture

Refer #30

"Oh No", not Apache .htaccess rules 'again'.... :(

"tongue-in-cheek"

I use exclusively Nginx, that poor Apache 2.x 'sod' died for me 2 years past (R.I.P.)

Research Reference: http://drupal.org/node/1060358#comment-4333802

chriscalip’s picture

#32

I mean .... I am giving an option for people to use the CDN mapping and cdn_pick_server instead of using parallel_css mapping and logic.

Pretty much a checkbox in the admin settings page of parallel_css

[ YES OR NO ] [X] Use Available CDN Mapping and CDN pick-server
----------------------------------------------------------------
Be sure to read: http://drupal.org/node/962266
-----------------------------------------------------------------

Peter Bowey’s picture

Refer #33
@chriscalip

Sounds good to me Chris! +1
I got the 'shakes' when I saw that .htaccess 'thingy :)

chriscalip’s picture

ok dokes. going with that option.

Peter Bowey’s picture

Refer #30 + #31

For those setups affected by CDN 'duplicate' SEO, a partial solution exists here ->
http://drupal.org/project/files_proxy

mikeytown2’s picture

@peter bowey
Not the right solution. We need to send out a 404 at a minimum or a 301 ideally if someone tries to access html content on your server through the CDN.
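
The rule described here (redirect anything that is not a static asset when it is requested through a CDN hostname) can be written as a small decision function. This is a sketch, not CDN module code: the function name is hypothetical, the extension list is borrowed from the .htaccess proposal in #30, and the hostnames are examples.

```php
<?php
/**
 * Hypothetical decision function: for a request arriving on a CDN
 * hostname, return a 301 to the canonical site unless the path looks
 * like a static asset. Returns NULL when no redirect is needed.
 */
function example_cdn_host_response($host, $request_uri, array $cdn_hosts, $canonical_base) {
  if (!in_array(strtolower($host), $cdn_hosts, TRUE)) {
    // Request came in on the normal site domain.
    return NULL;
  }
  $path = parse_url($request_uri, PHP_URL_PATH);
  if (is_string($path) && preg_match('/\.(png|gif|jpg|jpeg|ico)$/i', $path)) {
    // Static asset: serve it from the CDN host as usual.
    return NULL;
  }
  return array(
    'status' => 301,
    'location' => $canonical_base . $request_uri,
  );
}

$cdn_hosts = array('img1.drupal.org', 'img2.drupal.org');
print_r(example_cdn_host_response('img1.drupal.org', '/node/1', $cdn_hosts, 'http://www.drupal.org'));
```

Swapping the 301 for a 404 would give the minimum behaviour mentioned above.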

Peter Bowey’s picture

Refer #37
@mikeytown2

Thanks, I misunderstood the 'doc' reading @ http://drupal.org/project/files_proxy

In nginx.conf, I use something like this for the CDN private back-channel URI path (what the CDN pulls from):

    # Drupal STATIC DOMAIN = Static Assets for CDN PULL
    server {
        server_name                 cdn1.peterbowey.com.au cdn2.peterbowey.com.au cdn3.peterbowey.com.au cdn4.peterbowey.com.au;
        root                        /var/www/virtual/peterbowey.com.au;
        limit_conn                  gulag 12;               # max concurrent connections per client /ip
        index                       index.php index.html;
        if_modified_since           exact;
        access_log                  /var/log/nginx/wp_static.log main buffer=32k;
//...
//...
    # Avoid bandwidth stealing (Media resources) - serve 1x1 transparent GIF
    valid_referers none blocked server_names www.peterbowey.com.au www.pbcomp.com.au small.gdlcdn.com ~(peterbowey.com.au.|google.);  # reduce linking from outside
    if ($invalid_referer) {
    	return 403;
    }
    # Deny illegal Host headers
    if ($host !~* ^(cdn1.peterbowey.com.au|cdn2.peterbowey.com.au|cdn3.peterbowey.com.au|cdn4.peterbowey.com.au|www.peterbowey.com.au)$ ) { # allow access for CDN + self
    	return 444;
    }
//...
//...
    # send our static not cached requests to our drupal PHP Dynamic Domain with clean URLs support (301)
    location @drupal {
            rewrite ^/(.*)$  $scheme://www.peterbowey.com.au/index.php?q=$1 last;
    }
//...
//...
   # deny access to any php files
   location ~* ^.+\.php$ {
            deny all;
   }

Then something like this on the Drupal PHP side:

    server {                # DRUPAL DYNAMIC SECTION:
        server_name         www.peterbowey.com.au;
        root                /var/www/virtual/peterbowey.com.au;
        limit_conn          gulag 20;                                           # max concurrent connections per client /ip
        index               index.php index.html;

        access_log          /var/log/nginx/peter-drupal.log main buffer=32k;
        error_log           /var/log/nginx/bad-error.log;

        # Deny illegal Host headers
        if ($host ~* ^(cdn1.peterbowey.com.au|cdn2.peterbowey.com.au|cdn3.peterbowey.com.au|cdn4.peterbowey.com.au)$ ) {  # Remote CDN should NOT come here
            rewrite ^ $scheme://cdn1.peterbowey.com.au$request_uri permanent;   # send it to the correct static host
        }

chriscalip’s picture

Hey, how expensive is it to get several IPs and hosting accounts and just have them rsync? I'm trying to solve my straightnorth.com problem of imgX all pointing to the same IP.

chriscalip’s picture

BTW, CDN makes use of hook_file_url_alter via cdn_file_url_alter. That function is a beast, with user access checks, CDN testing checks and cdn_devel_page_stats stuff. I am currently pretty much copying and pasting the important parts of cdn_file_url_alter, or I could go the route of calling cdn_file_url_alter directly. What do you guys think?

Peter Bowey’s picture

Refer #39

*Same IPs*
That should only be a 'problem' if the router starts blocking packets. See #28

Unfortunately it turns out that some consumer-grade network devices will block traffic to sites that use these techniques if the asset hosts all have the same IP address.

chriscalip’s picture

#41

Do you mean the router of the users looking at the site, or the router of the site's hosting company?

Peter Bowey’s picture

Refer #42

A) = Host / Server router

Side-note: Obviously, you 'hire' hosting. I run my own dedicated server -'in-house' :)

All I pay for, is 2 x ADSL2+ 'public' lines / connections (100Gb x 2 - per-month use)...

In your case, I do not think that a professional 'host' company would have 'that issue' with their modern routers!

chriscalip’s picture

#43
thank you.

chriscalip’s picture

First working prototype of CDN integration; very basic.
http://drupalcode.org/project/parallel_css.git/commit/6f62c02

For now we still have to go to
/admin/settings/advagg/parallel-css
and check the box [X] use available cdn mapping and cdn_pick_server of cdn

This doesn't yet support the following CDN features:
a.) CDN supports HTTPS
b.) Drupal paths entered in this blacklist will not serve any files from the CDN. This blacklist is applied for all users.
c.) Drupal paths entered in this blacklist will not serve any files from the CDN. This blacklist is applied for authenticated users only.

Peter Bowey’s picture

chriscalip’s picture

These will probably have to be separate issues. Right now, I don't know how to pull these off.

mikeytown2’s picture

Why are you copying the cdn_file_url_alter function? Just require the CDN module and be done with it. Or am I missing something? Run the image references in the CSS through file_create_url, or, if the site runs CDN on an unpatched Drupal, detect that by borrowing the first part of cdn_init() (variable_get(CDN_THEME_LAYER_FALLBACK_VARIABLE, FALSE) == TRUE) and then call cdn_file_url_alter directly.

Have it look something like this:

  // CDN Support.
  if (module_exists('cdn')) {
    $status = variable_get(CDN_STATUS_VARIABLE, CDN_DISABLED);
    // Only act when CDN is enabled, or in testing mode for a user who may test it.
    if ($status == CDN_ENABLED || ($status == CDN_TESTING && user_access(CDN_PERM_ACCESS_TESTING))) {
      if (variable_get(CDN_THEME_LAYER_FALLBACK_VARIABLE, FALSE) == TRUE) {
        // Unpatched core: file_create_url() never invokes hook_file_url_alter(),
        // so call CDN's implementation directly. It alters $path by reference.
        cdn_file_url_alter($path);
        return $path;
      }
      else {
        // Patched core: file_create_url() invokes hook_file_url_alter() itself.
        return file_create_url($path);
      }
    }
  }
  else {
    // "Simple" fallback if CDN is not installed. Don't re-implement a module's logic.
  }

chriscalip’s picture

doh! or even better

$cdn_replacement = advagg_build_uri($path);

Incidentally, this is the function that is prone to leaving relative URLs with ../.. segments in place, which causes bugs like http://drupal.org/node/1183062

I am hoping that somewhere in the process advagg_build_css_bundle always gets run.
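
To make the ../.. problem mentioned above concrete, here is a minimal sketch of collapsing dot segments in a Drupal-relative path before a CDN domain is prepended. The function name example_collapse_dot_segments() is hypothetical and is not part of advagg, CDN, or parallel_css; it assumes the input is a path relative to the Drupal root, so a leading slash is dropped and any ".." that cannot be resolved is left as-is.

```php
<?php
/**
 * Hypothetical helper: collapses "." and ".." segments in a relative path.
 *
 * Assumes $path is relative to the Drupal root. Empty segments (including a
 * leading slash) are dropped; ".." segments with no parent are kept.
 */
function example_collapse_dot_segments($path) {
  $out = array();
  foreach (explode('/', $path) as $segment) {
    // Skip empty segments ("//") and current-directory markers.
    if ($segment === '' || $segment === '.') {
      continue;
    }
    if ($segment === '..') {
      // Step up one level if there is a real parent segment to remove.
      if (!empty($out) && end($out) !== '..') {
        array_pop($out);
      }
      else {
        $out[] = '..';
      }
      continue;
    }
    $out[] = $segment;
  }
  return implode('/', $out);
}

// Usage: a CSS file in .../do/css referencing ../images/1.png.
print example_collapse_dot_segments('sites/all/themes/do/css/../images/1.png');
```

Whether this kind of cleanup belongs in this module or in advagg_build_uri() is a design choice; the sketch only shows what the cleaned path would look like.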

mikeytown2’s picture

Good idea!
I've added the fallback logic on my end, so advagg_build_uri() looks a lot like #48 (#1185786: allow for URLs to get CDN-ed even if cdn patch is not applied). As for #1183062: Support for URI (path) rather than Domain, how that works is configurable in the CDN module. The reason it wasn't working is that, by default, CDN disables itself on all paths that start with admin/*; I have a special case to handle those now.

chriscalip’s picture

Peter Bowey’s picture

@chriscalip
@mikeytown2

Nice teamwork Mike and Chris!

@chriscalip, your module may have possibly saved me from moving my D6 Core to D7 (a long story within...) .. :)

@mikeytown2, the updated advagg_build_uri() you applied has made the 'great Code Sun' shine here :)
see http://drupal.org/node/1185786. Great, see my comment above to Chris.

Many thanks for a useful module Chris, additionally - we have learned some 'cool stuff'.
Teamwork = Cool!

chriscalip’s picture

It was! Let's do it again sometime.

chriscalip’s picture

Status: Active » Fixed

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.