This issue deals with bringing the library download feature (seen below) to drupal-org.make.

101 	To download an external library (this will be placed in sites/all/libraries by default):
102 	====
103 	libraries[tinymce][download][type] = "get"
104 	libraries[tinymce][download][url] = "http://downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.7/tinymce_3_2_7.zip"
105 	libraries[tinymce][directory_name] = "tinymce"
106 	====
107 	
108 	To download an external library and place it in an alternate location (this
109 	code downloads jQuery UI and places it in modules/jquery_ui/jquery.ui):
110 	====
111 	libraries[jquery_ui][download][type] = "get"
112 	libraries[jquery_ui][download][url] = "http://jquery-ui.googlecode.com/files/jquery.ui-1.6.zip"
113 	libraries[jquery_ui][directory_name] = jquery.ui
114 	libraries[jquery_ui][destination] = modules/jquery_ui
115 	==== 

For security reasons, the "get" feature of drush.make was removed from drupal.org's implementation. The consensus appears to be that if a white-list is implemented to verify no malicious or illegal code was being packaged and stored on drupal.org, the "get" feature would be enabled.

Is there a way to verify the following line:

libraries[jquery_ui][download][url] = "http://jquery-ui.googlecode.com/files/jquery.ui-1.6.zip"

...against a white-list that contains something like the following?

http://downloads.sourceforge.net/project/tinymce/*

Comments

dmitrig01’s picture

I would prefer to have d.o introduce a new property, something like externals[] and just keep a list of name => url. WHat do you think?

philbar’s picture

If the property is the same as the current convention "libraries[]", then the .make file can be used apart form d.o. This seems more ideal since this is intended for use on install profiles and some people might want to package the install profile on their own.

philbar’s picture

Also, correct me if I'm wrong, but it sounds like your assuming the whitelist could have the full download URL. I don't think a system like this will be sustainable.

Something like externals[tinymce] will not work because:

1. The location of tinymce could change. The full download URL should be the resposibility of the install profile maintainer because it would be much to much work for the whitelist maintainer to manage contantly changing download locations.
2. It doesn't account for different versions. Not every module could be expected to work with the latest version of the library.

For this reason, the Install Profile maintainer should provide the URL which downloads the version of the library they want.

dmitrig01’s picture

(1) Well if the URL changes, how is it going to be marked as on the whitelist anyway?
(2) easy enough to solve - have different versions just as anything else does - different URLs point to different versions.

philbar’s picture

The whitelist you are proposing would look like this for only one library, TinyMCE, with only version 3.2.7:

http://downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.7/tinymce_3_2_7.zip
http://downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.6/tinymce_3_2_6.zip
http://downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.5/tinymce_3_2_5.zip
http://downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.4/tinymce_3_2_4.zip
http://downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.3/tinymce_3_2_3.zip
http://downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.2/tinymce_3_2_2.zip
http://downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.1/tinymce_3_2_1.zip
http://downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.0/tinymce_3_2_0.zip

Or the whitelist I'm suggesting:

http://downloads.sourceforge.net/project/tinymce/*

Which whitelist is easier to maintain if the TinyMCE developers changes their download location to the following:

http://tinymce.moxiecode.com/download/tinymce-3-2-7.zip

It's your call. If you don't think it will be too much work for the whitelist maintainer when a change like this happens, than go ahead.

I imagine it would be a nightmare if sourceforce.net or Google Code changes their download locations because many libraries are hosted on these sites and dozens of URLs would need to be changed by a single whitelist maintainer. This seems like an unnecessary bottleneck.

dmitrig01’s picture

I have a brilliant idea - ask the (potential) whitelist maintainer!

philbar’s picture

It appears we are waiting to choose the whitelist maintainer until this issue is complete.

Maybe we should just make the decision here.

I sincerely feel using wildcards instead exact URLs will be the best approach since it accomplishes what we need to keep malicious code out of distributions and will result in a smaller whitelist file which in turn makes it easier to maintain.

cweagans’s picture

I will volunteer to maintain the whitelist, and I think that it would be best to start simple (specific download URLs) and add features as needed (wildcards and such).

philbar’s picture

Are you suggesting something like this:

// TinyMCE
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.3/tinymce_3_3.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.3rc1/tinymce_3_3rc1.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.3b2/tinymce_3_3b2.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.3b1/tinymce_3_3b1.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.7/tinymce_3_2_7.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.6/tinymce_3_2_6.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.5/tinymce_3_2_5.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.4/tinymce_3_2_4.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.3/tinymce_3_2_3.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.2/tinymce_3_2_2.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.1/tinymce_3_2_1.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.0/tinymce_3_2_0.zip

// TinyMCE for jQuery
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.3/tinymce_3_3_jquery.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.3rc1/tinymce_3_3rc1_jquery.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.3b2/tinymce_3_3b2_jquery.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.3b1/tinymce_3_3b1_jquery.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.7/tinymce_3_2_7_jquery.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.6/tinymce_3_2_6_jquery.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.5/tinymce_3_2_5_jquery.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.4/tinymce_3_2_4_jquery.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.3/tinymce_3_2_3_jquery.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.2/tinymce_3_2_2_jquery.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.1/tinymce_3_2_1_jquery.zip
http:\/\/downloads.sourceforge.net/project/tinymce/TinyMCE/3.2.0/tinymce_3_2_0_jquery.zip

// MooTools
http:\/\/mootools.net/download/get/mootools-1.2.4-core-yc.js
http:\/\/mootools.net/download/get/mootools-1.2.3-core-yc.js
http:\/\/mootools.net/download/get/mootools-1.2.2-core-yc.js
http:\/\/mootools.net/download/get/mootools-1.2.1-core-yc.js
http:\/\/mootools.net/download/get/mootools-1.2.0-core-yc.js

Wouldn't it be easier for the following to be the whitelist format?

// TinyMCE
http:\/\/downloads.sourceforge.net/project/tinymce/\*

// MooTools
http:\/\/mootools.net/download/get/\*

I just don't want to have to rely on the whitelist maintainer to update the list every time a new version of the library comes out. It seems like more to manage than is necessary. With specific download URLs you would have to be updating it weekly with the release of every new library version. Why don't we save the hassel and just whitelist entire libraries rather than individual library versions.

philbar’s picture

alex_b’s picture

Let's support wildcards. It reflects the use case better. We will need to decide on allowing external libraries on a library per library basis, not a library version basis.

dmitrig01’s picture

I'm convinced. Someone wanna code this up?

philbar’s picture

We need to work with drush_make_add_libraries found on line 161 of drush_make.drush.inc

I propose something like the following:

// Contents of whitelist.inc
http:\/\/downloads.sourceforge.net\/project\/tinymce\/
http:\/\/jquery-ui.googlecode.com\/

// Include Whitelist File to be Maintained
$whitelist = include 'whitelist.inc';

// Convert whitelist to array based on linebreaks
$whitepath = explode("/n", $whitelist);

// Check to see if library path is on whitelist 
if (strpos($basepath, $whitepath)) {
    
    // 
    echo $basepath;
}

NOTE: i doubt any of the above code works, because I'm new to PHP. But hopefully it's a start.

andrechun’s picture

How about something like this?

function drush_make_add_libraries($install_path, $tmp_path, $base_path, $info, $queue) {
  if (!empty($info['libraries'])) {

    foreach ($info['libraries'] as $key => $library) {
      if (!is_string($key) || !is_array($library)) {
        // TODO Print a prettier message
        continue;
      }

      if (DRUSH_MAKE_DO) {
        $is_merge_array = drush_make_libraries_whitelist($library['download']['url']);
        if (!$is_merge_array) {
          drush_make_error('BUILD_ERROR', dt('Not on white list: '.$library['download']['url']));		
        }
      } else {
        $is_merge_array = true;	
      }


      if ($is_merge_array) {
        // Merge the known data onto the library info.
        $library += array(
          'name'           => $key,
          'core'           => $info['core'],
          'base_path'      => $base_path,
          'install_path'   => $install_path,
          'tmp_path'       => $tmp_path,
          'version'        => drush_get_option('drush-make-version-best'),
          'location'       => drush_get_option('drush-make-update-default-url'),
          'subdir'         => '',
          'directory_name' => $key,
          'queue'          => $queue,
          'type'           => 'library',
        );
        $init = $queue->addItem('drush_make_initialize_project', array($library));
        $queue->addItem('drush_make_run', array(), $init);
      }
    }
  }
}


//new function
function drush_make_libraries_whitelist($url) {
  $whitelist = explode("\n", file_get_contents('whitelist.inc'));
  foreach($whitelist as $whitelist_url) {
    if ($whitelist_url == substr($url, 0, strlen($whitelist_url)) {
      return true;
    }
  }
  return false;
}
philbar’s picture

Status: Active » Needs review
dmitrig01’s picture

Status: Needs review » Active

There's no patch yet. There is a plugin system. Look in contrib/drush_make_d_o.drush.inc for more.

dmitrig01’s picture

before moving into code I want to talk to the d.o people

q0rban’s picture

dixon_’s picture

This makes most sense to me: If a whitelist *is* provided that will be taken into account. And if a whitelist *isn't* provided everything will pass through. This isn't reflected in the pseudo code in #15, what I can see.

What would be the best approach? This functionality can be useful for things outside of the usecase for drupal.org also.

dmitrig01’s picture

yeah, the code itself isn't really an issue, semantics are more of the issue here.

dmitrig01’s picture

So, 5 things need to happen for this to get in.

1) We need to appoint a whitelist maintainer.
2) We need to figure out what in format the whitelist will be kept.
3) The whitelist maintainer (and others potentially) need to create the whitelist.
4) This needs to be coded in Drush Make
5) It needs to be deployed on Drupal.org

I'll handle #4, and #5 needs to happen after all the others, but #1-3 still need to be solved.

Amazon’s picture

Dmitrig01 and I discussed the issue this morning. I've also bumped into dww mid-researching.

Here's the current plan:
1) We need to appoint a whitelist maintainer.
I've reached out to crell and cweagans to see what needs to be done for this. I've volunteered to kick it off, but I'll need to pass it along to some others quickly.

2) We need to figure out in what format the whitelist will be kept.
For Drupal.org the white-list content type has been created. http://drupal.org/node/add/packaging-whitelist

3) The whitelist maintainer (and others potentially) need to create the whitelist.
Paraphrased from dww:

the canonical storage of the whitelist will be these nodes. the nodes are elements of the list.

We want both a machine-readable feed in whatever format he thinks is best (JSON, XML, whatever) and a human-readable view so that the humans can easily see what's available for their distros.

We further discussed which issue queue to track these changes. End users will submit an issue in the packaging white list component of the drupal.org infrastructure project to get packages added to the white-list content type.

Dmitrig01 will need to create the white-list content type view on drupal.org that drush make can read.

4) This needs to be coded in Drush Make
Dmitrig01 will add this capability to drush make.

5) It needs to be deployed on Drupal.org
Dmitrig01 has access to a local development environment by asking for a shell account to access the druapl.org BZR. Write access to a BZR development branch, combined with Hudson will allow him to deploy and test his code in a drupal.org development environment.

Finally, I've sent dmitrig01 a copy of 3281d's plan for developing and deploying this so he's in sync with dww and hunmonk.

dww’s picture

Status: Active » Postponed

Good summary, thanks. Just to clarify a few things:

Meta summary and battle plan is probably best kept up to date at the Install Profile Packaging Community Initiative handbook page. That has the benefit of a) already having a bunch of this plan spelled out and b) being a handbook page that anyone can edit and keep updated, instead of a comment in one of the issues for a specific part of the overall plan, which no one else can edit.

This issue is specifically about fixing drush_make to read and enforce the whitelist once it exists. As such, postponed is probably the most appropriate status here, since drush_make can't enforce a whitelist that doesn't yet exist. ;)

#779452: Whitelist for external dependencies has more discussion about what the white list nodes themselves are for and how they should be used. That's where the next action needs to happen -- views for the white list nodes. The policy stuff is also being discussed over there, although the creation of a packaging whitelist maintainer role should probably be split out into a separate infra issue to avoid confusion.

Hurray that dmitri01 wants to help move this forward, thanks!!!

-Derek

rickvug’s picture

Amazon's comment #23 provides a clear path forward. It sounds like points 4 and 5 are where the work is. Starting and maintaining the whitelist itself should be quite simple.

webchick’s picture

Subscribe and tagging.

jhedstrom’s picture

Status: Postponed » Active

Un-postponing, since Drush Make can support an arbitrary whitelist before there is an official one, it just needs the code and tests to do so.

dww’s picture

Project: Drush Make » Drupal.org drush
Version: 6.x-2.x-dev »

d.o-specific drush validation code now lives in http://drupal.org/project/drupalorg_drush. Moving this issue to the right queue.

hunmonk’s picture

Status: Active » Needs review

a working patch for this lives on the 684788-library-whitelist branch. leveraging the work at #779452: Whitelist for external dependencies, it loads the whitelist, runs it through a regex-ifier function, then tries to match all libraries entries against the list. if you run drush with the -d switch, there's some nice debug output describing what it's doing.

still to do:

  1. no tests written, but because i built in a command line switch to specify the location of the whitelist resource, and because the loader function can take a filepath, we can definitely test this stuff
  2. since we haven't settled on a final format for whitelist entries over at #779452: Whitelist for external dependencies, the regex-ifer function is just a simple hack that transforms any asterisk into the regex equivalent .+. if we do keep this approach, that function will need to be expanded to provide proper escaping for regex special characters. if we decide that the whitelist entries are straight regexes, we can simply rip it out.
hunmonk’s picture

latest code at the 684788-library-whitelist branch addresses the issues in #29.

as discussed over at #779452-39: Whitelist for external dependencies, the incoming whitelist entries are now full regexes, so the code on the consuming end that dealt with transforming it has been ripped out.

i added packaging_whitelist.json to the test suite, and rewrote the library tests to use it, so we now have test coverage for both a 'good' and 'bad' download location for a library.

i believe all the major issues have been addressed here.

dww’s picture

Status: Needs review » Needs work

Reviewing this branch, I see a few issues:

A) Instead of adding drupalorg_drush_get_remote_file(), any reason not to just use drush_download_file()?

B) This way of verifying isn't ideal:

+    if (!empty($info['libraries'])) {
+      $library_transformer = new DrushMakeDo_LibraryWhitelist('make');
+      if ($library_transformer->whitelist_loaded()) {
+        foreach ($info['libraries'] as $library => $library_data) {
+          $pass = $pass && $library_transformer->verify($library_data, $library);
+          $info['libraries'][$library] = $library_data;
+        }
+      }
...

Due to short-circuit evaluation, as soon as we hit one verification failure, $pass will be FALSE and we'll never verify any other libraries in the make file. IMHO it's better to loop through everything so we can provide complete info to the end user about anything in their .make file that's not kosher, instead of just bailing out at the first error and only telling them about that.

C) There appears to be some inconsistency in the handling of the whitelist URLs. Sometimes we honor the command-line arg to specify your own, other times we hard-code based on the constant, e.g. here:

+  function error_message($url, $name) {
+    return dt("The library '!library' cannot be downloaded from !url, the URL must be in the whitelist available at !whitelist.", array('!library' => $name, '!url' => $url, '!whitelist' => DRUPALORG_DRUSH_PACKAGING_LIBRARY_WHITELIST));
+  }

D) Maybe not a bug, but I don't understand why sometimes we use drush_log() and other times we use $this->log().

E) Not clear to me why function verify() is taking $library_data by reference.

F) Codestyle: newline after the return; or break; for each case statement in a switch().

G) I'm concerned about this:

+        if (preg_match("@$regex@", $url)) {

We'd better ensure the whitelist regexs never use '@' internally. Either we should automatically escape any '@' characters in inside each regexp, or perhaps add validation over at #779452: Whitelist for external dependencies so that the list of URLs never includes '@'. I think automatically escaping in drupalorg_drush is cleaner and better than a semi-arbitrary restriction on the whitelist regexp, and relying on server-side validation instead of just doing the safe thing in the client.

H) Shouldn't verify() return FALSE instead of NULL if there's no 'url' defined?

That's what I saw on a visual review. Running the phpunit tests seems to have worked:

Time: 45 seconds, Memory: 4.50Mb
OK (8 tests, 10 assertions)

I haven't closely reviewed the tests themselves, nor manually tested myself.

dww’s picture

FYI: I realized point B effects other verification code paths in the plugin, so I think it's cleaner to handle that separately:
#1401738: Fix verification logic so we find all errors and report them at once

hunmonk’s picture

Assigned: Unassigned » hunmonk
Status: Needs work » Needs review

A) drupalorg_drush_get_remote_file() is a more robust function than drush_download_file(), therefore i would vote to try and get it merged into drush core instead of using drush core's inferior function. so, leaving that in for now...

C) cleaned this up to use only one define

D) $this->log specifically logs a make build error, whereas drush_log(), as used here, outputs debug messages. i added a TODO to either cleanup the name of the log wrapper function, or remove it entirely.

G) added a support function to properly escape the regex delimiter in the whitelist url regexes, and wrap them in the delimiter.

E), F), H) simple, all fixed.

dww’s picture

Status: Needs review » Fixed

Found a few more minor problems, worked with hunmonk to fix them. We then did a squash merge to put all of this into master via one commit (since Chad wasn't doing careful commit messages along the way, and we don't really need the specific small-scale history now that this is done).

Again, I'd normally tag this "needs drupal.org deployment" instead of calling it fixed, but that's really for #1365536: Switch distribution packaging system to use just drush core and drupalorg_drush and http://drupal.org/community-initiatives/drupalorg/distribution-packaging itself...

Status: Fixed » Closed (fixed)
Issue tags: -drupal.org distribution blockers

Automatically closed -- issue fixed for 2 weeks with no activity.